sq/todos/SQ-022-simulation-tests.md
2026-02-27 12:15:43 +01:00

SQ-022: Multi-Node Simulation Tests

Status: [x] DONE
Blocked by: SQ-021, SQ-019
Priority: High

Description

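A full simulation test suite modeled on TigerBeetle's deterministic simulation testing. It spins up multiple nodes on virtual I/O, injects faults, and verifies invariants. Every run is driven by a random seed, so any failing run can be replayed exactly.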
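The following is a minimal sketch of what "virtual I/O" and "inject faults" can mean here. It makes two assumptions: every nondeterministic choice comes from one seeded PRNG, and a node's disk can be modeled as a byte buffer with a volatile write cache. `Prng`, `VirtualDisk`, `IoError` and `simulate` are hypothetical names, not part of sq-sim. SplitMix64 is one convenient generator among many.

```rust
/// SplitMix64: a small, dependency-free PRNG. The same seed always yields the
/// same sequence, so a failing run can be replayed exactly from its seed.
pub struct Prng(u64);

impl Prng {
    pub fn new(seed: u64) -> Self {
        Prng(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// True with probability roughly `num / den` (modulo bias is fine for tests).
    pub fn chance(&mut self, num: u64, den: u64) -> bool {
        self.next_u64() % den < num
    }
}

#[derive(Debug, PartialEq)]
pub enum IoError {
    DiskFull,
}

/// Hypothetical in-memory disk: writes land in a volatile buffer and only
/// become durable after `fsync`; `crash` discards everything not yet synced.
pub struct VirtualDisk {
    capacity: usize,
    durable: Vec<u8>,
    buffered: Vec<u8>,
}

impl VirtualDisk {
    pub fn new(capacity: usize) -> Self {
        VirtualDisk { capacity, durable: Vec::new(), buffered: Vec::new() }
    }

    pub fn write(&mut self, data: &[u8]) -> Result<(), IoError> {
        if self.durable.len() + self.buffered.len() + data.len() > self.capacity {
            return Err(IoError::DiskFull); // an S03-style fault
        }
        self.buffered.extend_from_slice(data);
        Ok(())
    }

    /// Move all buffered bytes into durable storage.
    pub fn fsync(&mut self) {
        self.durable.append(&mut self.buffered);
    }

    /// An S04-style fault: unsynced bytes are lost, synced bytes survive.
    pub fn crash(&mut self) {
        self.buffered.clear();
    }

    pub fn contents(&self) -> &[u8] {
        &self.durable
    }
}

/// Drive one disk for `steps` operations. Every decision (crash, fsync) comes
/// from the seeded PRNG; the 1-in-20 crash rate is an arbitrary example value.
pub fn simulate(seed: u64, steps: usize) -> Vec<u8> {
    let mut rng = Prng::new(seed);
    let mut disk = VirtualDisk::new(64);
    for i in 0..steps {
        if rng.chance(1, 20) {
            disk.crash();
        } else if disk.write(&[i as u8]).is_ok() && rng.chance(1, 2) {
            disk.fsync();
        }
    }
    disk.contents().to_vec()
}
```

Because the crash points depend only on the seed, rerunning a failing seed reproduces the same sequence of faults. The multi-seed acceptance criterion relies on this property.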

Files to Create/Modify

  • crates/sq-sim/src/runtime.rs - test harness for multi-node simulation
  • crates/sq-sim/tests/invariants.rs - invariant checker functions
  • crates/sq-sim/tests/scenarios/mod.rs - declares the scenario submodules
  • crates/sq-sim/tests/scenarios/single_node.rs - S01-S04
  • crates/sq-sim/tests/scenarios/multi_node.rs - S05-S08
  • crates/sq-sim/tests/scenarios/failures.rs - S09-S12
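The multi-node harness in runtime.rs needs a network that the test fully controls. The following is a minimal sketch under three assumptions: messages are delivered one at a time, the delivery order is chosen by a seeded PRNG, and a partition drops (rather than delays) traffic on cut links. `SimNetwork`, `Message` and `NodeId` are hypothetical names.

```rust
use std::collections::BTreeSet;

pub type NodeId = usize;

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub from: NodeId,
    pub to: NodeId,
    pub payload: u64,
}

/// Hypothetical simulated network: messages sit "in flight" until `step`
/// delivers one, chosen by a seeded PRNG so reordering is random but
/// reproducible. Directed links listed in `cut` drop their traffic.
pub struct SimNetwork {
    rng_state: u64,
    in_flight: Vec<Message>,
    cut: BTreeSet<(NodeId, NodeId)>,
}

impl SimNetwork {
    pub fn new(seed: u64) -> Self {
        SimNetwork { rng_state: seed, in_flight: Vec::new(), cut: BTreeSet::new() }
    }

    // SplitMix64 step; any deterministic PRNG would do.
    fn next_u64(&mut self) -> u64 {
        self.rng_state = self.rng_state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.rng_state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    pub fn send(&mut self, msg: Message) {
        self.in_flight.push(msg);
    }

    /// Cut every link between group `a` and group `b`, in both directions.
    pub fn partition(&mut self, a: &[NodeId], b: &[NodeId]) {
        for &x in a {
            for &y in b {
                self.cut.insert((x, y));
                self.cut.insert((y, x));
            }
        }
    }

    pub fn heal(&mut self) {
        self.cut.clear();
    }

    /// Deliver one randomly chosen in-flight message. Messages on a cut link
    /// are dropped. Returns None once nothing deliverable is left.
    pub fn step(&mut self) -> Option<Message> {
        while !self.in_flight.is_empty() {
            let i = (self.next_u64() % self.in_flight.len() as u64) as usize;
            let msg = self.in_flight.swap_remove(i);
            if !self.cut.contains(&(msg.from, msg.to)) {
                return Some(msg);
            }
        }
        None
    }
}
```

Dropping cut-link messages models a lossy partition. A harness that wants to exercise retransmission could instead hold those messages until `heal` is called.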

Scenarios

  • S01: Single node, single producer, single consumer - baseline
  • S02: Single node, concurrent producers - offset ordering
  • S03: Single node, disk full during write - graceful error
  • S04: Single node, crash and restart - WAL recovery
  • S05: Three nodes, normal operation - replication works
  • S06: Three nodes, one crashes - remaining two continue
  • S07: Three nodes, network partition (2+1) - majority continues
  • S08: Three nodes, S3 outage - local WAL accumulates
  • S09: Consumer group - committed offsets are preserved
  • S10: High-throughput burst - no message loss
  • S11: Slow consumer with WAL trimming - falls back to S3
  • S12: Node rejoins after long absence - catches up
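Scenarios S05 to S07 all depend on the quorum rule: a write is acknowledged only when a majority of the three nodes holds it. In a 2+1 split, the two-node side (2 × 2 > 3) keeps going and the single node does not. The toy model below is a sketch under that assumption, not sq's actual replication protocol. `Cluster` is hypothetical. Every reachable node in the producer's group replicates synchronously, and catch-up after healing simply copies the longest log.

```rust
/// Toy model of S05-S07: `group[i]` is node i's partition group. A write is
/// acked only if the producer's group is a strict majority of the cluster;
/// minority-side writes are refused rather than buffered.
pub struct Cluster {
    pub logs: Vec<Vec<u64>>,
    group: Vec<usize>,
}

impl Cluster {
    pub fn new(nodes: usize) -> Self {
        Cluster { logs: vec![Vec::new(); nodes], group: vec![0; nodes] }
    }

    /// Assign partition groups, e.g. `vec![0, 0, 1]` for the S07 2+1 split.
    pub fn partition(&mut self, group: Vec<usize>) {
        assert_eq!(group.len(), self.logs.len());
        self.group = group;
    }

    /// Append `value` via `node`; returns true only if the write is acked.
    pub fn produce(&mut self, node: usize, value: u64) -> bool {
        let n = self.logs.len();
        let reachable: Vec<usize> =
            (0..n).filter(|&i| self.group[i] == self.group[node]).collect();
        if reachable.len() * 2 <= n {
            return false; // no quorum: refuse, so no unacked data diverges
        }
        for i in reachable {
            self.logs[i].push(value);
        }
        true
    }

    /// Heal the partition and let lagging replicas copy the longest log.
    /// Safe in this toy only because minority nodes never append.
    pub fn heal(&mut self) {
        self.group = vec![0; self.logs.len()];
        let longest = self
            .logs
            .iter()
            .max_by_key(|log| log.len())
            .cloned()
            .unwrap_or_default();
        for log in &mut self.logs {
            *log = longest.clone();
        }
    }
}
```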

Invariants (checked after every step)

  1. No acked message is ever lost
  2. Offsets are strictly monotonic, with no gaps
  3. Every read passes its CRC check
  4. Consumer group offsets never regress
  5. Once a network partition heals, replicas converge
  6. The WAL is never trimmed past data that S3 has not yet confirmed
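The following is a sketch of invariant checkers for tests/invariants.rs. It makes three assumptions: records use a flat `Record` type, the checksum is CRC-32 (IEEE), and a node's log is contiguous from its first offset. The task does not name the CRC polynomial, so CRC-32C would be an equally plausible choice. Each checker returns a message naming the invariant and the offending offset or group. Invariant 5 needs cluster-wide replica state and is left out. `Record` and all function names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical record shape; the real crate's types will differ.
pub struct Record {
    pub offset: u64,
    pub payload: Vec<u8>,
    pub crc: u32,
}

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Invariant 1: every acked offset is still present in the log.
pub fn check_acked_present(acked: &[u64], log: &[Record]) -> Result<(), String> {
    let present: HashSet<u64> = log.iter().map(|r| r.offset).collect();
    match acked.iter().copied().find(|o| !present.contains(o)) {
        Some(o) => Err(format!("invariant 1: acked offset {o} is missing from the log")),
        None => Ok(()),
    }
}

/// Invariant 2: each offset is exactly one more than the previous one.
pub fn check_offsets(log: &[Record]) -> Result<(), String> {
    for pair in log.windows(2) {
        let (prev, next) = (pair[0].offset, pair[1].offset);
        if next != prev + 1 {
            return Err(format!("invariant 2: offset {prev} is followed by {next}"));
        }
    }
    Ok(())
}

/// Invariant 3: the stored CRC matches the payload.
pub fn check_crc(log: &[Record]) -> Result<(), String> {
    for r in log {
        let actual = crc32(&r.payload);
        if actual != r.crc {
            return Err(format!(
                "invariant 3: CRC mismatch at offset {}: stored {:#010x}, computed {:#010x}",
                r.offset, r.crc, actual
            ));
        }
    }
    Ok(())
}

/// Invariant 4: no group's committed offset moves backwards or disappears.
pub fn check_group_offsets(
    before: &HashMap<String, u64>,
    after: &HashMap<String, u64>,
) -> Result<(), String> {
    for (group, &old) in before {
        match after.get(group) {
            Some(&new) if new < old => {
                return Err(format!("invariant 4: group {group} regressed from {old} to {new}"))
            }
            None => return Err(format!("invariant 4: group {group} lost committed offset {old}")),
            _ => {}
        }
    }
    Ok(())
}

/// Invariant 6: the WAL may only discard offsets that S3 has confirmed.
/// Both arguments are exclusive upper bounds ("everything below this").
pub fn check_wal_trim(wal_trimmed_below: u64, s3_confirmed_below: u64) -> Result<(), String> {
    if wal_trimmed_below > s3_confirmed_below {
        return Err(format!(
            "invariant 6: WAL trimmed below {wal_trimmed_below}, S3 confirmed only below {s3_confirmed_below}"
        ));
    }
    Ok(())
}
```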

Acceptance Criteria

  • All 12 scenarios pass
  • Each scenario runs with at least 10 random seeds
  • Invariant violations produce clear diagnostic output
  • The full suite completes in under 60 seconds
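A small seed-sweep helper can enforce three of these criteria together: at least 10 seeds per scenario, readable diagnostics, and a time limit. The following is a minimal sketch. It assumes each scenario is a closure from a seed to `Result<(), (step, message)>`. `sweep` is a hypothetical name.

```rust
use std::ops::Range;
use std::time::{Duration, Instant};

/// Run one scenario across a range of seeds. The closure returns
/// `Err((step, message))` when an invariant fails. Every failing seed is
/// collected (not just the first), so one report lists all seeds to replay.
pub fn sweep<F>(name: &str, seeds: Range<u64>, budget: Duration, mut scenario: F) -> Result<(), String>
where
    F: FnMut(u64) -> Result<(), (usize, String)>,
{
    let start = Instant::now();
    let mut report = String::new();
    for seed in seeds {
        if let Err((step, message)) = scenario(seed) {
            report.push_str(&format!("{name}: seed={seed} step={step} violated: {message}\n"));
        }
    }
    let elapsed = start.elapsed();
    if elapsed > budget {
        report.push_str(&format!("{name}: took {elapsed:?}, over the {budget:?} budget\n"));
    }
    if report.is_empty() {
        Ok(())
    } else {
        Err(report)
    }
}
```

A scenario test might call `sweep("S01", 0..10, Duration::from_secs(5), run_s01)` and pass any returned report to `panic!("{report}")`. The failing seeds would then appear verbatim in `cargo test` output. `run_s01` is a placeholder. Each per-scenario budget would be a share of the 60-second total.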