SQ-021: Write Replication
Status: [ ] TODO
Blocked by: SQ-020, SQ-010
Priority: High
Description
Writes are replicated across N nodes (N = replication factor), and the client is acked once a majority of them hold the write. Simple quorum approach: the coordinator writes locally, sends the entries to its peers, and waits for a majority ack.
Files to Create/Modify
- crates/sq-cluster/src/replication.rs - Replicator with quorum logic
- crates/sq-server/src/grpc/cluster.rs - ReplicateEntries RPC impl
- crates/sq-server/src/grpc/data_plane.rs - update Publish to use Replicator
Replication Flow
- Coordinator receives Publish request
- Coordinator writes to local WAL, assigns offset
- Coordinator sends ReplicateEntries to all known alive peers
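- Coordinator waits for W acks (W = floor(N/2) + 1, where N = replication factor); its own local WAL write counts as one ack, which is what lets a 3-node cluster with 1 node down reach 2/3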
- On quorum reached: ack to client
- On quorum timeout: return error to client
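The quorum wait in the flow above could be sketched roughly as follows. This is a minimal illustration, not the planned Replicator: `quorum_size`, `wait_for_quorum`, and `ReplicationError` are hypothetical names, peer responses are modeled as a plain `std::sync::mpsc` channel of booleans instead of ReplicateEntries RPC results, and the coordinator's local WAL write is assumed to count as the first ack.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Write quorum for replication factor `n`: W = floor(n/2) + 1.
pub fn quorum_size(n: usize) -> usize {
    n / 2 + 1
}

/// Hypothetical error type; the real crate may model this differently.
#[derive(Debug, PartialEq, Eq)]
pub enum ReplicationError {
    /// Quorum was not reached before the deadline (or all peers hung up).
    NoQuorum { acks: usize, needed: usize },
}

/// Blocks until `W` acks are collected or `timeout` elapses.
///
/// `peer_acks` yields `true` for a peer that persisted the entries and
/// `false` for a peer that rejected them. Returns the number of acks
/// counted at the moment quorum was reached.
pub fn wait_for_quorum(
    peer_acks: &Receiver<bool>,
    replication_factor: usize,
    timeout: Duration,
) -> Result<usize, ReplicationError> {
    let needed = quorum_size(replication_factor);
    // Assumption: the local WAL write already succeeded and counts as one ack.
    let mut acks = 1;
    let deadline = Instant::now() + timeout;

    while acks < needed {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match peer_acks.recv_timeout(remaining) {
            Ok(true) => acks += 1,
            Ok(false) => {} // peer rejected the entries; keep waiting for others
            // Timed out, or every sender was dropped: quorum is unreachable.
            Err(_) => return Err(ReplicationError::NoQuorum { acks, needed }),
        }
    }
    Ok(acks)
}
```

With N = 3 the function returns as soon as one peer acks, since the local write plus one peer gives W = 2. If no peer answers within the timeout it returns `NoQuorum`, which the coordinator would turn into an error for the client.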
Acceptance Criteria
- 3-node cluster: publish message, verify all 3 nodes have it in WAL
- 3-node cluster, 1 node down: publish succeeds (2/3 quorum)
- 3-node cluster, 2 nodes down: publish fails (no quorum)
- ACK_MODE_LOCAL: ack after local WAL only (skip replication)
- ACK_MODE_NONE: return immediately, replicate async
- Replication timeout: configurable, default 5 seconds
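One way to picture how the ack modes and the timeout default relate is the sketch below. The type and function names (`AckMode`, `ReplicationConfig`, `acks_before_reply`) are illustrative assumptions. The ticket names only ACK_MODE_LOCAL and ACK_MODE_NONE, so the `Quorum` variant stands in for whatever the default mode is called, and the default replication factor of 3 is borrowed from the 3-node test cluster, not specified anywhere.

```rust
use std::time::Duration;

/// Hypothetical mirror of the ack modes named in the acceptance criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AckMode {
    /// Assumed name for the default mode: wait for a majority.
    Quorum,
    /// ACK_MODE_LOCAL: ack after the local WAL write only.
    Local,
    /// ACK_MODE_NONE: reply immediately, replicate asynchronously.
    None,
}

/// Hypothetical configuration struct for the replicator.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReplicationConfig {
    pub replication_factor: usize,
    pub timeout: Duration,
}

impl Default for ReplicationConfig {
    fn default() -> Self {
        Self {
            replication_factor: 3, // assumption, matches the 3-node test cluster
            timeout: Duration::from_secs(5), // default from the acceptance criteria
        }
    }
}

/// Number of acks (local WAL write included) required before the
/// coordinator replies to the client.
pub fn acks_before_reply(mode: AckMode, replication_factor: usize) -> usize {
    match mode {
        AckMode::Quorum => replication_factor / 2 + 1,
        AckMode::Local => 1,
        AckMode::None => 0,
    }
}
```

Expressing each mode as a required ack count keeps the Publish path to a single wait loop. The ticket does not say whether ACK_MODE_NONE still performs the local WAL write before replying, so the sketch leaves that open.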