42
todos/SQ-020-cluster-membership.md
Normal file
@@ -0,0 +1,42 @@
# SQ-020: Cluster Membership (Gossip)

**Status:** `[ ] TODO`
**Blocked by:** SQ-009, SQ-019
**Priority:** Medium

## Description

Nodes discover each other via seed list and maintain a membership list through periodic heartbeats.

## Files to Create/Modify

- `crates/sq-cluster/src/lib.rs` - module exports
- `crates/sq-cluster/src/membership.rs` - seed list, join, heartbeat, failure detection
- `crates/sq-server/src/grpc/cluster.rs` - ClusterService Join/Heartbeat RPC impl
- `crates/sq-server/src/cli/serve.rs` - add --seeds CLI flag

## Configuration

```
SQ_SEEDS=node1:6060,node2:6060    # Seed node addresses
SQ_NODE_ID=node-1                 # Unique node ID
SQ_HEARTBEAT_INTERVAL_MS=5000     # Heartbeat every 5s
SQ_FAILURE_THRESHOLD=3            # Missed heartbeats before suspected
```
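A minimal sketch of how these variables might be parsed, not the planned implementation. `ClusterConfig` and `from_lookup` are hypothetical names. The sketch assumes `SQ_NODE_ID` is mandatory and that the other values fall back to the example values shown above. An unset or empty `SQ_SEEDS` yields an empty seed list, which corresponds to the single-node case in the acceptance criteria.

```rust
use std::time::Duration;

/// Hypothetical config struct; field names are illustrative, not from the ticket.
#[derive(Debug, Clone, PartialEq)]
pub struct ClusterConfig {
    pub seeds: Vec<String>,
    pub node_id: String,
    pub heartbeat_interval: Duration,
    pub failure_threshold: u32,
}

impl ClusterConfig {
    /// Builds a config from any key -> value lookup,
    /// e.g. `|k| std::env::var(k).ok()` for real environment variables.
    pub fn from_lookup<F>(get: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Unset or empty SQ_SEEDS means no seeds (single-node cluster).
        let seeds: Vec<String> = get("SQ_SEEDS")
            .unwrap_or_default()
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect();

        // Assumption: the node ID must be provided explicitly.
        let node_id = get("SQ_NODE_ID")
            .filter(|s| !s.is_empty())
            .ok_or_else(|| "SQ_NODE_ID must be set".to_string())?;

        // Assumption: defaults mirror the example values (5000 ms, 3 misses).
        let interval_ms = match get("SQ_HEARTBEAT_INTERVAL_MS") {
            Some(v) => v
                .parse::<u64>()
                .map_err(|e| format!("SQ_HEARTBEAT_INTERVAL_MS: {e}"))?,
            None => 5000,
        };
        let failure_threshold = match get("SQ_FAILURE_THRESHOLD") {
            Some(v) => v
                .parse::<u32>()
                .map_err(|e| format!("SQ_FAILURE_THRESHOLD: {e}"))?,
            None => 3,
        };

        Ok(Self {
            seeds,
            node_id,
            heartbeat_interval: Duration::from_millis(interval_ms),
            failure_threshold,
        })
    }
}
```

Taking a lookup closure instead of reading `std::env` directly keeps the parser testable without mutating process-wide environment state.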

## Membership State Machine

```
Unknown -> Alive      (on Join response or Heartbeat)
Alive -> Suspected    (missed 3 heartbeats)
Suspected -> Dead     (suspected for 30 seconds)
Dead -> Alive         (on successful re-Join)
```
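The transitions above can be sketched as a pure function. `MemberState`, `Event`, `next_state`, and the two constants are hypothetical names chosen for illustration. The table does not list a `Suspected -> Alive` transition, so this sketch leaves a suspected node unchanged when a heartbeat arrives; whether that is intended is an open question for the implementation.

```rust
use std::time::Duration;

/// Hypothetical state type mirroring the four states in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemberState {
    Unknown,
    Alive,
    Suspected,
    Dead,
}

/// Hypothetical events that drive the transitions.
#[derive(Debug, Clone, Copy)]
pub enum Event {
    /// Join response received, or a successful re-Join.
    JoinOk,
    /// A heartbeat arrived from the peer.
    Heartbeat,
    /// Number of consecutive heartbeats missed so far.
    Missed(u32),
    /// How long the peer has been in the Suspected state.
    SuspectedFor(Duration),
}

// Values taken from the table; a real implementation would presumably
// read the threshold from SQ_FAILURE_THRESHOLD instead.
pub const FAILURE_THRESHOLD: u32 = 3;
pub const SUSPECT_TIMEOUT: Duration = Duration::from_secs(30);

/// Returns the next state; any pair not listed in the table is a no-op.
pub fn next_state(state: MemberState, event: Event) -> MemberState {
    use MemberState::*;
    match (state, event) {
        (Unknown, Event::JoinOk) | (Unknown, Event::Heartbeat) => Alive,
        (Alive, Event::Missed(n)) if n >= FAILURE_THRESHOLD => Suspected,
        (Suspected, Event::SuspectedFor(d)) if d >= SUSPECT_TIMEOUT => Dead,
        (Dead, Event::JoinOk) => Alive,
        (s, _) => s,
    }
}
```

Keeping the transition logic free of timers and I/O means the table can be unit-tested row by row, with the heartbeat loop only responsible for producing events.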

## Acceptance Criteria

- [ ] Start 3 nodes with seed list, all discover each other
- [ ] Status RPC shows all 3 nodes as "alive"
- [ ] Stop one node, others detect it as "suspected" then "dead"
- [ ] Restart dead node, it re-joins and becomes "alive"
- [ ] Node with no seeds starts as single-node cluster
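When writing the stop-one-node test, it may help to derive wait times from the configuration rather than hard-coding sleeps. The helpers below are a sketch with hypothetical names. They assume a heartbeat counts as missed once per elapsed interval, which the ticket does not specify.

```rust
use std::time::Duration;

/// Silence required before a peer is marked "suspected",
/// assuming one missed heartbeat per elapsed interval.
pub fn time_to_suspected(interval: Duration, threshold: u32) -> Duration {
    interval * threshold
}

/// Silence required before a peer is marked "dead":
/// the suspicion delay plus the time spent in the suspected state.
pub fn time_to_dead(interval: Duration, threshold: u32, suspect_timeout: Duration) -> Duration {
    time_to_suspected(interval, threshold) + suspect_timeout
}
```

Under that assumption, the example values (5000 ms interval, threshold 3, 30 s suspicion window) give 15 s until "suspected" and 45 s until "dead", so a test would likely use much shorter intervals.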