sq/todos/SQ-021-write-replication.md
2026-02-26 21:52:50 +01:00

# SQ-021: Write Replication
**Status:** `[ ] TODO`
**Blocked by:** SQ-020, SQ-010
**Priority:** High
## Description
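Writes are replicated to a majority of the N replicas before the coordinator acks the client, where N is the replication factor. Simple quorum approach: the coordinator writes locally, sends the entries to its peers, and waits for a majority ack.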
## Files to Create/Modify
- `crates/sq-cluster/src/replication.rs` - Replicator with quorum logic
- `crates/sq-server/src/grpc/cluster.rs` - ReplicateEntries RPC impl
- `crates/sq-server/src/grpc/data_plane.rs` - update Publish to use Replicator
## Replication Flow
1. Coordinator receives Publish request
2. Coordinator writes to local WAL, assigns offset
3. Coordinator sends ReplicateEntries to all known alive peers
4. Coordinator waits for W acks (W = floor(N/2) + 1, where N = replication factor; the coordinator's own local write counts as one ack, so N = 3 needs one peer ack)
5. On quorum reached: ack to client
6. On quorum timeout: return error to client
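
The flow above can be sketched with the standard library alone. This is a minimal illustration, not the planned `Replicator`: `quorum_size`, `replicate`, `PeerSend`, and `ReplicationError` are hypothetical names, each peer RPC is modeled as a closure that returns `true` on ack, and the coordinator's local WAL write is assumed to count as the first ack.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Write quorum for replication factor `n`: W = floor(n/2) + 1.
pub fn quorum_size(n: usize) -> usize {
    n / 2 + 1
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplicationError {
    /// Quorum was not reached before the timeout (or every peer had answered).
    QuorumTimeout { acks: usize, needed: usize },
}

/// Hypothetical stand-in for a ReplicateEntries call to one peer:
/// returns `true` if the peer acked, `false` if it failed.
pub type PeerSend = Box<dyn FnOnce() -> bool + Send + 'static>;

/// Assumes the local WAL write (step 2) has already succeeded.
/// Returns the number of acks collected once quorum is reached.
pub fn replicate(
    n: usize,
    peers: Vec<PeerSend>,
    timeout: Duration,
) -> Result<usize, ReplicationError> {
    let needed = quorum_size(n);
    let mut acks = 1; // assumption: the coordinator's local write counts toward W

    // Step 3: send to all peers concurrently.
    let (tx, rx) = mpsc::channel();
    for send in peers {
        let tx = tx.clone();
        thread::spawn(move || {
            let _ = tx.send(send());
        });
    }
    drop(tx); // the channel disconnects once every peer thread has reported

    // Step 4: wait for W acks, bounded by a single overall deadline.
    let deadline = Instant::now() + timeout;
    while acks < needed {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(true) => acks += 1,
            Ok(false) => {} // peer failed; keep waiting for the others
            // Step 6: timed out, or no peers left to answer.
            Err(_) => return Err(ReplicationError::QuorumTimeout { acks, needed }),
        }
    }
    Ok(acks) // Step 5: quorum reached, safe to ack the client
}
```

Returning as soon as W acks arrive means slower peers finish in the background; a real implementation would likely use async tasks rather than one thread per peer.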
## Acceptance Criteria
- [ ] 3-node cluster: publish message, verify all 3 nodes have it in WAL
- [ ] 3-node cluster, 1 node down: publish succeeds (2/3 quorum)
- [ ] 3-node cluster, 2 nodes down: publish fails (no quorum)
- [ ] ACK_MODE_LOCAL: ack after local WAL only (skip replication)
- [ ] ACK_MODE_NONE: return immediately, replicate async
- [ ] Replication timeout: configurable, default 5 seconds
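
One way the ack modes and the timeout default from the criteria above might be modeled is sketched here. All type and field names are hypothetical; `AckMode::Quorum` in particular is an assumed name for the default majority mode, which this ticket does not name. `ACK_MODE_LOCAL` is read literally as skipping replication altogether.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AckMode {
    Quorum, // assumed name for the default majority-ack mode
    Local,  // ACK_MODE_LOCAL
    None,   // ACK_MODE_NONE
}

/// The point in the publish path at which the client is acked.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AckPoint {
    Immediately,
    AfterLocalWal,
    AfterQuorum,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PublishPlan {
    pub ack_point: AckPoint,
    /// Whether entries are sent to peers at all.
    pub replicate: bool,
}

pub fn plan_for(mode: AckMode) -> PublishPlan {
    match mode {
        // Wait for W acks before answering the client.
        AckMode::Quorum => PublishPlan { ack_point: AckPoint::AfterQuorum, replicate: true },
        // Ack after the local WAL write only; replication is skipped.
        AckMode::Local => PublishPlan { ack_point: AckPoint::AfterLocalWal, replicate: false },
        // Return immediately; replication continues asynchronously.
        AckMode::None => PublishPlan { ack_point: AckPoint::Immediately, replicate: true },
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReplicationConfig {
    pub replication_timeout: Duration,
}

impl Default for ReplicationConfig {
    fn default() -> Self {
        // Default from the acceptance criteria: 5 seconds.
        Self { replication_timeout: Duration::from_secs(5) }
    }
}
```

Keeping the mode-to-behavior mapping in one pure function makes each acceptance criterion checkable without standing up a cluster.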