# SQ-021: Write Replication

**Status:** `[x] DONE`
**Blocked by:** SQ-020, SQ-010
**Priority:** High

## Description

Writes are replicated to peers before the coordinator acks the client. The approach is a simple quorum: the coordinator writes locally, sends the entries to its peers, and waits for a majority to ack.
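
As a rough illustration of what "majority" means here, the sketch below computes the quorum size for a few cluster sizes and how many unavailable nodes each can tolerate. The function names `quorum_size` and `tolerated_failures` are hypothetical and not taken from the codebase.

```rust
/// Smallest number of acks that forms a majority of `replication_factor` nodes.
/// Hypothetical helper, for illustration only.
fn quorum_size(replication_factor: usize) -> usize {
    replication_factor / 2 + 1
}

/// Number of nodes that can be unavailable while a majority is still reachable.
fn tolerated_failures(replication_factor: usize) -> usize {
    replication_factor - quorum_size(replication_factor)
}

fn main() {
    for n in [1, 2, 3, 5] {
        println!(
            "N = {}: quorum = {}, tolerated failures = {}",
            n,
            quorum_size(n),
            tolerated_failures(n)
        );
    }
}
```

With a replication factor of 3, the quorum is 2 and one node may be down, which lines up with the 3-node scenarios this ticket tests.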

## Files to Create/Modify

- `crates/sq-cluster/src/replication.rs` - Replicator with quorum logic
- `crates/sq-server/src/grpc/cluster.rs` - ReplicateEntries RPC impl
- `crates/sq-server/src/grpc/data_plane.rs` - update Publish to use Replicator

## Replication Flow

1. Coordinator receives Publish request
2. Coordinator writes to local WAL, assigns offset
3. Coordinator sends ReplicateEntries to all known alive peers
4. Coordinator waits for W acks (W = floor(N/2) + 1, where N = replication factor)
5. On quorum reached: ack to client
6. On quorum timeout: return error to client
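
The flow above can be sketched with standard-library threads and channels standing in for the real RPCs. This is a minimal simulation, not the actual `Replicator`: the `Peer` type, the `publish` function, and the `PublishError` enum are hypothetical. It also assumes the coordinator's own WAL write counts as one of the W acks, which is what the "2/3 quorum" wording in the acceptance criteria suggests.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a peer: `Some(delay)` acks after `delay`,
/// `None` models a peer that is down and never acks.
type Peer = Option<Duration>;

#[derive(Debug, PartialEq)]
enum PublishError {
    /// Fewer than `needed` acks arrived before the deadline.
    NoQuorum { acks: usize, needed: usize },
}

/// Simulates steps 2-6. Assumption: the local write counts toward W.
fn publish(
    peers: &[Peer],
    replication_factor: usize,
    timeout: Duration,
) -> Result<u64, PublishError> {
    let needed = replication_factor / 2 + 1;

    // Step 2: the local WAL write is simulated; it assigns the offset.
    let offset = 0u64;
    let mut acks = 1;

    // Step 3: fan out to every peer that is alive.
    let (tx, rx) = mpsc::channel::<()>();
    for peer in peers {
        if let Some(delay) = *peer {
            let tx = tx.clone();
            thread::spawn(move || {
                thread::sleep(delay); // simulated network + remote WAL write
                let _ = tx.send(()); // ack; ignored if the coordinator gave up
            });
        }
    }
    drop(tx); // the receiver disconnects once all peer threads have finished

    // Step 4: wait for acks until the quorum is reached or the deadline passes.
    let deadline = Instant::now() + timeout;
    while acks < needed {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(()) => acks += 1,
            // Step 6: timed out, or no peer is left that could still ack.
            Err(_) => return Err(PublishError::NoQuorum { acks, needed }),
        }
    }

    // Step 5: quorum reached, ack to the client with the assigned offset.
    Ok(offset)
}

fn main() {
    let fast = Some(Duration::from_millis(10));
    let timeout = Duration::from_millis(500);
    println!("all up:   {:?}", publish(&[fast, fast], 3, timeout));
    println!("one down: {:?}", publish(&[fast, None], 3, timeout));
    println!("two down: {:?}", publish(&[None, None], 3, timeout));
}
```

In this sketch the coordinator returns as soon as W acks have arrived, so a slow third replica does not delay the client. Whether the remaining peers are retried afterwards is outside what this ticket specifies.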

## Acceptance Criteria

- [ ] 3-node cluster: publish message, verify all 3 nodes have it in WAL
- [ ] 3-node cluster, 1 node down: publish succeeds (2/3 quorum)
- [ ] 3-node cluster, 2 nodes down: publish fails (no quorum)
- [ ] ACK_MODE_LOCAL: ack after local WAL only (skip replication)
- [ ] ACK_MODE_NONE: return immediately, replicate async
- [ ] Replication timeout: configurable, default 5 seconds
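
One way to picture the ack-mode and timeout criteria is a small dispatch table plus a config struct with a default. Everything named here is hypothetical: the `AckMode::Quorum` variant is an assumed name for the default quorum path, and `AckPoint`, `Replication`, `plan`, and `ReplicationConfig` are illustrative only. The behaviour for `Local` and `None` follows the criteria as written.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum AckMode {
    Quorum, // assumed name for the default quorum path
    Local,  // ACK_MODE_LOCAL
    None,   // ACK_MODE_NONE
}

/// When the client receives its ack.
#[derive(Debug, PartialEq)]
enum AckPoint {
    Immediately,
    AfterLocalWal,
    AfterQuorum,
}

/// What happens to replication for this request.
#[derive(Debug, PartialEq)]
enum Replication {
    Skipped,
    Async,
    WaitForQuorum,
}

/// Maps an ack mode to the behaviour described in the acceptance criteria.
fn plan(mode: AckMode) -> (AckPoint, Replication) {
    match mode {
        AckMode::Quorum => (AckPoint::AfterQuorum, Replication::WaitForQuorum),
        AckMode::Local => (AckPoint::AfterLocalWal, Replication::Skipped),
        AckMode::None => (AckPoint::Immediately, Replication::Async),
    }
}

/// Hypothetical config holding the configurable replication timeout.
#[derive(Debug, Clone, PartialEq)]
struct ReplicationConfig {
    timeout: Duration,
}

impl Default for ReplicationConfig {
    fn default() -> Self {
        // Default of 5 seconds, per the acceptance criteria.
        Self { timeout: Duration::from_secs(5) }
    }
}

fn main() {
    let cfg = ReplicationConfig::default();
    println!("replication timeout = {:?}", cfg.timeout);
    for mode in [AckMode::Quorum, AckMode::Local, AckMode::None] {
        println!("{:?} -> {:?}", mode, plan(mode));
    }
}
```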