sq/todos/SQ-020-cluster-membership.md
2026-02-26 21:52:50 +01:00

SQ-020: Cluster Membership (Gossip)

Status: [ ] TODO
Blocked by: SQ-009, SQ-019
Priority: Medium

Description

Nodes discover each other via a seed list and maintain a membership list through periodic heartbeats.

Files to Create/Modify

  • crates/sq-cluster/src/lib.rs - module exports
  • crates/sq-cluster/src/membership.rs - seed list, join, heartbeat, failure detection
  • crates/sq-server/src/grpc/cluster.rs - ClusterService Join/Heartbeat RPC impl
  • crates/sq-server/src/cli/serve.rs - add --seeds CLI flag

Configuration

SQ_SEEDS=node1:6060,node2:6060   # Seed node addresses
SQ_NODE_ID=node-1                 # Unique node ID
SQ_HEARTBEAT_INTERVAL_MS=5000    # Heartbeat every 5s
SQ_FAILURE_THRESHOLD=3           # Missed heartbeats before suspected
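
A minimal sketch of how these variables might be parsed, offered as an illustration rather than the planned implementation. `ClusterConfig` and `from_lookup` are hypothetical names that do not appear in this ticket, and the fallback values (5000 ms, threshold 3) are assumptions taken from the example values above. An empty or missing `SQ_SEEDS` yields an empty seed list, which matches the single-node case in the acceptance criteria.

```rust
use std::time::Duration;

/// Hypothetical config struct; names are illustrative, not from the ticket.
#[derive(Debug, Clone, PartialEq)]
pub struct ClusterConfig {
    pub seeds: Vec<String>,
    pub node_id: String,
    pub heartbeat_interval: Duration,
    pub failure_threshold: u32,
}

impl ClusterConfig {
    /// Build the config from any key -> value lookup,
    /// e.g. `|k| std::env::var(k).ok()` for real environment variables.
    pub fn from_lookup<F>(get: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Comma-separated seed addresses; blanks are ignored.
        let seeds: Vec<String> = get("SQ_SEEDS")
            .unwrap_or_default()
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect();

        let node_id = get("SQ_NODE_ID").ok_or("SQ_NODE_ID is required")?;

        // Assumed default: 5000 ms, as in the example above.
        let interval_ms = match get("SQ_HEARTBEAT_INTERVAL_MS") {
            Some(v) => v
                .parse::<u64>()
                .map_err(|e| format!("SQ_HEARTBEAT_INTERVAL_MS: {e}"))?,
            None => 5000,
        };

        // Assumed default: 3 missed heartbeats, as in the example above.
        let failure_threshold = match get("SQ_FAILURE_THRESHOLD") {
            Some(v) => v
                .parse::<u32>()
                .map_err(|e| format!("SQ_FAILURE_THRESHOLD: {e}"))?,
            None => 3,
        };

        Ok(Self {
            seeds,
            node_id,
            heartbeat_interval: Duration::from_millis(interval_ms),
            failure_threshold,
        })
    }
}
```

Taking a lookup closure instead of reading `std::env` directly keeps the parser testable without mutating process-wide state.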

Membership State Machine

Unknown -> Alive (on Join response or Heartbeat)
Alive -> Suspected (missed SQ_FAILURE_THRESHOLD heartbeats, 3 in the example config)
Suspected -> Dead (suspected for 30 seconds)
Dead -> Alive (on successful re-Join)
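
The four transitions above can be sketched as a pure function, assuming the caller tracks missed-heartbeat counts and suspicion time. `MemberState`, `Event`, and `next` are hypothetical names, not part of this ticket. The 30-second suspicion timeout is passed in as a parameter here, which is an assumption; the ticket states it as a fixed value. Any state/event pair the ticket does not list (for example a heartbeat from a Suspected node) leaves the state unchanged in this sketch.

```rust
use std::time::Duration;

/// Hypothetical membership states mirroring the state machine above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemberState {
    Unknown,
    Alive,
    Suspected,
    Dead,
}

/// Hypothetical inputs that can drive a transition.
pub enum Event {
    /// A Join response (or successful re-Join) was received.
    Join,
    /// A Heartbeat was received.
    Heartbeat,
    /// This many consecutive heartbeats have been missed.
    MissedHeartbeats(u32),
    /// The node has been in Suspected for this long.
    SuspectedFor(Duration),
}

/// Apply one event; only the transitions listed in the ticket change state.
pub fn next(
    state: MemberState,
    event: Event,
    failure_threshold: u32,
    suspect_timeout: Duration,
) -> MemberState {
    use MemberState::*;
    match (state, event) {
        // Unknown -> Alive (on Join response or Heartbeat)
        (Unknown, Event::Join) | (Unknown, Event::Heartbeat) => Alive,
        // Alive -> Suspected (missed `failure_threshold` heartbeats)
        (Alive, Event::MissedHeartbeats(n)) if n >= failure_threshold => Suspected,
        // Suspected -> Dead (suspected for `suspect_timeout`)
        (Suspected, Event::SuspectedFor(d)) if d >= suspect_timeout => Dead,
        // Dead -> Alive (on successful re-Join)
        (Dead, Event::Join) => Alive,
        // Everything else is unspecified in the ticket: keep the state.
        (s, _) => s,
    }
}
```

Keeping the transition logic free of clocks and I/O means the failure detector in `membership.rs` could be unit-tested without timers or a running cluster.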

Acceptance Criteria

  • Start 3 nodes with a seed list; all discover each other
  • Status RPC shows all 3 nodes as "alive"
  • Stop one node; the others detect it as "suspected", then "dead"
  • Restart the dead node; it re-joins and becomes "alive"
  • A node with no seeds starts as a single-node cluster