# SQ-020: Cluster Membership (Gossip)

**Status:** `[ ] TODO`
**Blocked by:** SQ-009, SQ-019
**Priority:** Medium

## Description

Nodes discover each other via a seed list and maintain a membership list through periodic heartbeats.
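
One way to picture the membership list is a map from node ID to the time of the last heartbeat received. The sketch below is a minimal illustration under that assumption, not the planned `membership.rs` design; `MembershipList`, `record_heartbeat`, `missed_heartbeats`, and `members` are hypothetical names.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical membership table: node ID -> time of the last heartbeat seen.
#[derive(Debug, Default)]
pub struct MembershipList {
    last_seen: HashMap<String, Instant>,
}

impl MembershipList {
    /// Record a heartbeat (or a successful Join) from `node_id` at time `now`.
    pub fn record_heartbeat(&mut self, node_id: &str, now: Instant) {
        self.last_seen.insert(node_id.to_string(), now);
    }

    /// Whole heartbeat intervals elapsed since `node_id` was last heard from,
    /// or `None` if the node is not in the list.
    pub fn missed_heartbeats(
        &self,
        node_id: &str,
        now: Instant,
        interval: Duration,
    ) -> Option<u32> {
        let last = self.last_seen.get(node_id)?;
        let elapsed = now.saturating_duration_since(*last);
        // `.max(1)` guards against a zero interval.
        Some((elapsed.as_millis() / interval.as_millis().max(1)) as u32)
    }

    /// IDs of all known members, sorted for stable output.
    pub fn members(&self) -> Vec<String> {
        let mut ids: Vec<String> = self.last_seen.keys().cloned().collect();
        ids.sort();
        ids
    }
}
```

With a 5-second interval, a node last heard from 16 seconds ago has missed three whole intervals, which is the kind of count a failure detector could compare against a threshold.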

## Files to Create/Modify

- `crates/sq-cluster/src/lib.rs` - module exports
- `crates/sq-cluster/src/membership.rs` - seed list, join, heartbeat, failure detection
- `crates/sq-server/src/grpc/cluster.rs` - `ClusterService` Join/Heartbeat RPC impl
- `crates/sq-server/src/cli/serve.rs` - add `--seeds` CLI flag

## Configuration

```
SQ_SEEDS=node1:6060,node2:6060   # Seed node addresses
SQ_NODE_ID=node-1                # Unique node ID
SQ_HEARTBEAT_INTERVAL_MS=5000    # Heartbeat every 5s
SQ_FAILURE_THRESHOLD=3           # Missed heartbeats before suspected
```
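
The settings above could be loaded into a typed struct along these lines. This is a sketch only: `ClusterConfig`, `from_lookup`, `parse_seeds`, and `parse_or` are hypothetical names, treating `SQ_NODE_ID` as required is an assumption, and the fallback values simply mirror the example rather than any documented defaults.

```rust
use std::time::Duration;

/// Hypothetical typed view of the four settings.
#[derive(Debug, Clone, PartialEq)]
pub struct ClusterConfig {
    pub seeds: Vec<String>,
    pub node_id: String,
    pub heartbeat_interval: Duration,
    pub failure_threshold: u32,
}

impl ClusterConfig {
    /// Build a config from any key lookup. Fallbacks (5000 ms, 3) are
    /// assumptions copied from the example values.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // A missing SQ_SEEDS yields an empty list (single-node start).
        let seeds = lookup("SQ_SEEDS")
            .map(|raw| parse_seeds(&raw))
            .unwrap_or_default();
        // Assumption: a node ID must always be supplied.
        let node_id = lookup("SQ_NODE_ID")
            .ok_or_else(|| "SQ_NODE_ID is required".to_string())?;
        let interval_ms = parse_or(&lookup, "SQ_HEARTBEAT_INTERVAL_MS", 5000u64)?;
        let failure_threshold = parse_or(&lookup, "SQ_FAILURE_THRESHOLD", 3u32)?;
        Ok(Self {
            seeds,
            node_id,
            heartbeat_interval: Duration::from_millis(interval_ms),
            failure_threshold,
        })
    }
}

/// Split a comma-separated seed list, trimming whitespace and dropping empties.
pub fn parse_seeds(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Parse `key` if present, otherwise fall back to `default`.
fn parse_or<T, F>(lookup: &F, key: &str, default: T) -> Result<T, String>
where
    T: std::str::FromStr,
    F: Fn(&str) -> Option<String>,
{
    match lookup(key) {
        Some(raw) => raw
            .trim()
            .parse::<T>()
            .map_err(|_| format!("{key} has an invalid value: {raw}")),
        None => Ok(default),
    }
}
```

In a server this might be called as `ClusterConfig::from_lookup(|key| std::env::var(key).ok())`. Taking a lookup closure keeps the parser testable without touching the process environment.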

## Membership State Machine

```
Unknown   -> Alive     (on Join response or Heartbeat)
Alive     -> Suspected (missed SQ_FAILURE_THRESHOLD heartbeats; 3 in the example above)
Suspected -> Dead      (suspected for 30 seconds)
Dead      -> Alive     (on successful re-Join)
```
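
The transitions above can be sketched as a pure function over a state and an event. This is an illustrative sketch, not the planned implementation: `MemberState`, `Event`, `next_state`, and the two constants are hypothetical names, and the threshold of 3 and the 30-second timeout are taken from the values in this ticket. Only the four listed edges are encoded; any other combination leaves the state unchanged.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemberState {
    Unknown,
    Alive,
    Suspected,
    Dead,
}

/// Hypothetical inputs to the state machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    /// A Join response, or a successful re-Join.
    JoinOk,
    /// A heartbeat was received from the node.
    Heartbeat,
    /// The node has missed this many consecutive heartbeats.
    MissedHeartbeats(u32),
    /// The node has been in `Suspected` for this long.
    SuspectedFor(Duration),
}

/// Assumed values, taken from the example configuration and the table above.
pub const FAILURE_THRESHOLD: u32 = 3;
pub const SUSPECT_TIMEOUT: Duration = Duration::from_secs(30);

pub fn next_state(state: MemberState, event: Event) -> MemberState {
    use MemberState::*;
    match (state, event) {
        (Unknown, Event::JoinOk) | (Unknown, Event::Heartbeat) => Alive,
        (Alive, Event::MissedHeartbeats(n)) if n >= FAILURE_THRESHOLD => Suspected,
        (Suspected, Event::SuspectedFor(d)) if d >= SUSPECT_TIMEOUT => Dead,
        (Dead, Event::JoinOk) => Alive,
        // The ticket lists no other edges (for example, no Suspected -> Alive
        // on a late heartbeat), so everything else is left unchanged here.
        (unchanged, _) => unchanged,
    }
}
```

Keeping the function free of clocks and I/O means each edge can be unit-tested directly, with the caller responsible for turning timers and RPC results into events.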

## Acceptance Criteria

- [ ] Start 3 nodes with a seed list; all nodes discover each other
- [ ] Status RPC shows all 3 nodes as "alive"
- [ ] Stop one node; the others detect it as "suspected", then "dead"
- [ ] Restart the dead node; it re-joins and becomes "alive"
- [ ] A node with no seeds starts as a single-node cluster