# SQ-021: Write Replication

**Status:** `[ ] TODO`
**Blocked by:** SQ-020, SQ-010
**Priority:** High

## Description

Each write is replicated to N nodes (N = replication factor, coordinator included) before the client is acked. This uses a simple quorum approach: the coordinator writes locally, sends the entries to its peers, and waits for acks from a majority.
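The majority arithmetic can be pinned down with a small helper. This is a minimal sketch: `quorum_size` and `has_quorum` are hypothetical names, not part of the planned `replication.rs`. It assumes the coordinator's own local WAL write counts as one ack, which is what the "1 node down: publish succeeds (2/3 quorum)" acceptance criterion implies.

```rust
/// Write quorum for replication factor `n` (coordinator included):
/// a strict majority, W = floor(n / 2) + 1. Unsigned integer division
/// already rounds down, so no explicit floor is needed.
pub fn quorum_size(n: usize) -> usize {
    assert!(n > 0, "replication factor must be at least 1");
    n / 2 + 1
}

/// `acks` counts the coordinator's local WAL write plus every peer ack
/// (assumption: the local write is included in the tally).
pub fn has_quorum(acks: usize, n: usize) -> bool {
    acks >= quorum_size(n)
}
```

For N = 3, W = 2. The coordinator plus one peer is enough, so one node can be down. With two nodes down, only the coordinator's own ack remains and the write fails. Note that N = 4 gives W = 3, so an even replication factor tolerates no more failures than N = 3.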
## Files to Create/Modify

- `crates/sq-cluster/src/replication.rs` - `Replicator` with quorum logic
- `crates/sq-server/src/grpc/cluster.rs` - `ReplicateEntries` RPC impl
- `crates/sq-server/src/grpc/data_plane.rs` - update `Publish` to use `Replicator`
## Replication Flow

1. Coordinator receives a Publish request.
2. Coordinator writes the entries to its local WAL and assigns offsets.
3. Coordinator sends ReplicateEntries to all known alive peers.
4. Coordinator waits for W acks, where W = floor(N/2) + 1 and N = replication factor. Its own local WAL write counts as one of the W acks.
5. On quorum reached: ack to the client.
6. On quorum timeout: return an error to the client.
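Steps 4-6 amount to "count acks until W or a deadline, whichever comes first." Below is a minimal sketch of that loop. Peer responses are modeled as a `std::sync::mpsc` channel of `bool` instead of real `ReplicateEntries` gRPC replies. `await_quorum` and `ReplicationOutcome` are hypothetical names. As above, it assumes the local WAL write has already succeeded and counts as one ack.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Result of waiting for replica acks (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum ReplicationOutcome {
    /// Quorum reached: safe to ack the client (step 5).
    Committed { acks: usize },
    /// Deadline passed or all peers gone before quorum (step 6).
    QuorumFailed { acks: usize, needed: usize },
}

/// Waits for W = floor(N/2) + 1 acks. Each `true` on `peer_acks` is one
/// peer ack; `false` is a peer-side failure and is simply not counted.
pub fn await_quorum(
    replication_factor: usize,
    peer_acks: mpsc::Receiver<bool>,
    timeout: Duration,
) -> ReplicationOutcome {
    let needed = replication_factor / 2 + 1;
    let deadline = Instant::now() + timeout;
    let mut acks = 1; // assumption: coordinator's local WAL write counts as one ack

    while acks < needed {
        // Shrink the wait on every iteration so the total never exceeds `timeout`.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match peer_acks.recv_timeout(remaining) {
            Ok(true) => acks += 1,
            Ok(false) => {} // rejected by a peer; keep waiting for the others
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return ReplicationOutcome::QuorumFailed { acks, needed };
            }
        }
    }
    ReplicationOutcome::Committed { acks }
}
```

The function returns as soon as quorum is reached and does not wait for slower peers. In this sketch, any remaining `ReplicateEntries` calls would keep running in the background and are not awaited by the client path.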
## Acceptance Criteria

- [ ] 3-node cluster: publish a message, verify all 3 nodes have it in their WAL
- [ ] 3-node cluster, 1 node down: publish succeeds (2/3 quorum)
- [ ] 3-node cluster, 2 nodes down: publish fails (no quorum)
- [ ] ACK_MODE_LOCAL: ack after the local WAL write only (skip replication)
- [ ] ACK_MODE_NONE: return immediately, replicate asynchronously
- [ ] Replication timeout: configurable, default 5 seconds
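The ack modes and the timeout setting map naturally onto a small policy function that the `Publish` handler could consult. This is a sketch only. `AckMode`, `AckPolicy`, `ReplicationConfig` and `ack_policy` are hypothetical names. The `Quorum` variant is assumed to be the default mode, because the criteria name only ACK_MODE_LOCAL and ACK_MODE_NONE.

```rust
use std::time::Duration;

/// Hypothetical Rust mirror of the ACK_MODE_* values in the criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AckMode {
    /// Assumed default: ack only after a replication quorum.
    Quorum,
    /// ACK_MODE_LOCAL: ack after the local WAL write, skip replication.
    Local,
    /// ACK_MODE_NONE: return immediately, replicate asynchronously.
    None,
}

/// Replication settings; the timeout is configurable and defaults to 5 s.
#[derive(Debug, Clone)]
pub struct ReplicationConfig {
    pub timeout: Duration,
}

impl Default for ReplicationConfig {
    fn default() -> Self {
        Self { timeout: Duration::from_secs(5) }
    }
}

/// What must happen before the client is acked.
#[derive(Debug, PartialEq, Eq)]
pub enum AckPolicy {
    /// Wait up to `timeout` for W acks, then error out.
    AwaitQuorum { timeout: Duration },
    /// Wait for the local WAL write only; no ReplicateEntries calls.
    LocalOnly,
    /// Ack right away; replication runs in the background.
    Immediate,
}

pub fn ack_policy(mode: AckMode, cfg: &ReplicationConfig) -> AckPolicy {
    match mode {
        AckMode::Quorum => AckPolicy::AwaitQuorum { timeout: cfg.timeout },
        AckMode::Local => AckPolicy::LocalOnly,
        AckMode::None => AckPolicy::Immediate,
    }
}
```

Keeping the mode-to-behavior mapping in one pure function lets the ACK_MODE criteria be tested without standing up a cluster. Only the quorum path needs the multi-node tests.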