sq/todos/SQ-021-write-replication.md
2026-02-27 12:15:43 +01:00

SQ-021: Write Replication

Status: [x] DONE
Blocked by: SQ-020, SQ-010
Priority: High

Description

Writes are replicated to a majority of the replica set before the coordinator acks the client. The approach is a simple quorum: the coordinator writes locally, sends the entries to its peers, and waits for a majority to ack.

Files to Create/Modify

  • crates/sq-cluster/src/replication.rs - Replicator with quorum logic
  • crates/sq-server/src/grpc/cluster.rs - ReplicateEntries RPC impl
  • crates/sq-server/src/grpc/data_plane.rs - update Publish to use Replicator

Replication Flow

  1. Coordinator receives Publish request
  2. Coordinator writes to local WAL, assigns offset
  3. Coordinator sends ReplicateEntries to all known alive peers
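  4. Coordinator waits for W acks (W = floor(N/2) + 1, where N = replication factor). The 2/3 case under Acceptance Criteria implies the coordinator's own WAL write counts as one of the W acks.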
  5. On quorum reached: ack to client
  6. On quorum timeout: return error to client
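
The steps above can be sketched with the standard library alone. This is a minimal illustration, not the actual `Replicator` in `crates/sq-cluster/src/replication.rs`: `Peer`, `replicate`, `write_quorum`, and `ReplicationError` are hypothetical names, threads and a channel stand in for the `ReplicateEntries` RPC, and the local WAL write is assumed to have succeeded and to count as one ack.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Write quorum for replication factor `n`: W = floor(n/2) + 1.
pub fn write_quorum(n: usize) -> usize {
    n / 2 + 1
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplicationError {
    /// The deadline passed, or every peer finished, before W acks arrived.
    QuorumNotReached { acks: usize, needed: usize },
}

/// Hypothetical stand-in for a peer; `alive` simulates whether its
/// ReplicateEntries call would succeed.
#[derive(Clone, Copy)]
pub struct Peer {
    pub alive: bool,
}

/// Steps 3-6 of the flow: fan out to peers, then wait for quorum or give up.
/// Assumption: the local WAL write (step 2) already succeeded and counts as one ack.
pub fn replicate(
    peers: &[Peer],
    replication_factor: usize,
    timeout: Duration,
) -> Result<usize, ReplicationError> {
    let needed = write_quorum(replication_factor);
    let mut acks = 1; // the coordinator's own WAL write
    let (tx, rx) = mpsc::channel::<()>();

    for peer in peers.iter().copied() {
        let tx = tx.clone();
        thread::spawn(move || {
            // A real implementation would issue the ReplicateEntries RPC here.
            if peer.alive {
                // The receiver may already be gone if quorum was reached; ignore that.
                let _ = tx.send(());
            }
        });
    }
    // Drop the original sender so the channel disconnects once all workers finish.
    drop(tx);

    let deadline = Instant::now() + timeout;
    while acks < needed {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(()) => acks += 1,
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return Err(ReplicationError::QuorumNotReached { acks, needed });
            }
        }
    }
    Ok(acks) // step 5: quorum reached, the caller can ack the client
}
```

With a replication factor of 3, `write_quorum(3)` is 2, so `replicate` returns as soon as one peer acks and does not wait for the slower one. Returning early is a design choice of this sketch; the remaining peer still receives the entries in the background.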

Acceptance Criteria

  • 3-node cluster: publish message, verify all 3 nodes have it in WAL
  • 3-node cluster, 1 node down: publish succeeds (2/3 quorum)
  • 3-node cluster, 2 nodes down: publish fails (no quorum)
  • ACK_MODE_LOCAL: ack after local WAL only (skip replication)
  • ACK_MODE_NONE: return immediately, replicate async
  • Replication timeout: configurable, default 5 seconds
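
The ack modes and the timeout default from the criteria above could be modelled roughly as follows. All type and function names here are hypothetical, including the `Quorum` variant for the default behaviour, which the ticket does not name. The sketch also assumes that ACK_MODE_NONE does not wait for the local WAL write, which the ticket does not state explicitly.

```rust
use std::time::Duration;

/// Ack modes from the acceptance criteria. `Quorum` is a hypothetical
/// name for the default majority-ack behaviour.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AckMode {
    /// ACK_MODE_NONE: return immediately, replicate asynchronously.
    None,
    /// ACK_MODE_LOCAL: ack after the local WAL write only.
    Local,
    /// Default: ack after W = floor(N/2) + 1 acks.
    Quorum,
}

/// Hypothetical config holder for the replication timeout.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReplicationConfig {
    pub timeout: Duration,
}

impl Default for ReplicationConfig {
    fn default() -> Self {
        // Default of 5 seconds, per the acceptance criteria.
        Self { timeout: Duration::from_secs(5) }
    }
}

/// What the publish path does before and after acking the client.
#[derive(Debug, PartialEq, Eq)]
pub struct PublishPlan {
    pub wait_for_local_wal: bool,
    pub wait_for_quorum: bool,
    pub replicate_in_background: bool,
}

pub fn plan_for(mode: AckMode) -> PublishPlan {
    match mode {
        // Assumption: "return immediately" means not even the local write is awaited.
        AckMode::None => PublishPlan {
            wait_for_local_wal: false,
            wait_for_quorum: false,
            replicate_in_background: true,
        },
        // Replication is skipped entirely in this mode.
        AckMode::Local => PublishPlan {
            wait_for_local_wal: true,
            wait_for_quorum: false,
            replicate_in_background: false,
        },
        AckMode::Quorum => PublishPlan {
            wait_for_local_wal: true,
            wait_for_quorum: true,
            replicate_in_background: false,
        },
    }
}
```

Keeping the mode-to-behaviour mapping in one function makes each acceptance criterion checkable without a running cluster; the Publish handler would then branch on the returned plan.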