CORRECTNESS
Built to be trusted, not just tested.
A database earns trust by construction. EzraDB's consistency-critical core is formally model-checked, exercised by deterministic simulation, and fuzzed for query correctness — with every failure reproducible from a seed. Here is exactly how, and exactly where the line is.
Formally model-checked
TLA+ and Stateright models of Raft replication, leader-lease, MVCC/SSI serializability, parallel-commit, online-DDL.
Deterministic simulation
FoundationDB-style DST covering all five Raft safety properties, with seed-reproducible replay.
Metamorphic fuzzing
SQLancer-style TLP and NoREC query-correctness fuzzing.
Mutation-gated
CI-gated mutation testing on correctness-critical crates.
BATTLE-TESTED, MEASURABLY
v0.1 — tested like it can never die.
The ambition is the reliability bar set by the most trusted databases in the world: the database that never dies. That is an engineering practice, not a slogan — a published 502-line mission-critical standard with four pillars of gates, audited at every phase. The full CI stack — 29 lanes covering fuzzing, deterministic simulation, mutation testing, sanitizers, and differential fuzzing — runs on dedicated build machines on every gate.
THE PLAYBOOK
The techniques the most trusted databases use — in use here.
Marked honestly: shipped is verifiable in the codebase today; building is specced and in progress.
Deterministic simulation
shippedthe FoundationDB playbook
Seeded, simulated clusters with partitions and healing. All five Raft safety properties are checked, and any failure replays exactly from a single seed.
Metamorphic query fuzzing
shippedthe SQLancer / CockroachDB playbook
TLP and NoREC oracles diff EzraDB against PostgreSQL 18, and a cross-engine sweep asserts row, column, and hybrid engines never diverge.
Formal model-checking
shippedthe TLA+ practice
Raft replication, MVCC/SSI visibility, and parallel commit are model-checked in Stateright; leader-lease and online-DDL safety in TLA+. The spec must change before the code can.
Every bug becomes a test
shippedthe SQLite rule
Every fixed bug lands with a regression test pinned to it permanently — ~356 so far, and they never get deleted.
Mutation-gated kernels
shippedrare anywhere
In four correctness-critical crates (optimizer, runtime filters, hash join, memory broker), a single surviving mutant fails the build.
Kill −9 durability drills
shippedthe "never dies" bar
Write, kill all three nodes, restart on the same volumes: the data survives and quorum re-forms. A soak harness hammers restarts under sustained load — it has already caught a real consensus bug before any user could.
Jepsen-style nemesis testing
buildingthe Jepsen playbook
A published 12-workload, 14-nemesis test plan with the harness in active development.
72-hour continuous soak
buildingthe mission-critical spec
Specced under the published engineering standard; hardware provisioned.
FORMAL METHODS
Five core protocols, model-checked in TLA+ and Stateright.
The protocols where a subtle race means data loss are specified as executable models and checked before a line of implementation lands. Model-checking explores the state space for the invariant violations that unit tests never think to write.
Raft replication
Log agreement and leader election across replicas.
Leader-lease safety
No two leaseholders serve a range at the same time.
MVCC / SSI serializability
Snapshot isolation and serializable schedules hold.
Parallel-commit
Atomic multi-range commit — no torn or partial transactions.
Online-DDL safety
Schema changes stay safe under concurrent traffic.
Spec PR before code
Protocol changes are spec-first: the model and its invariants ship in a spec PR and pass the checker before implementation is written. The spec is the source of truth, not an afterthought.
DETERMINISTIC SIMULATION
FoundationDB-style simulation, replayable from a seed.
The cluster runs against a simulated world — clock, disk, and network under our control — so we can inject partitions, reorderings, and crashes on a single thread. Every run is deterministic: a failing seed replays the exact schedule, bit for bit, until the bug is fixed.
All five Raft safety properties, continuously checked
At most one leader per term.
A leader never overwrites or deletes its own log.
Identical entries imply identical prefixes.
Committed entries survive in every future leader.
No two nodes apply different commands at an index.
seed → identical schedule → identical failure. Reproduce, then fix.
QUERY-CORRECTNESS FUZZING
SQLancer-style metamorphic fuzzing, run nightly.
Metamorphic testing finds wrong answers with no oracle: transform a query in a result-preserving way and any divergence is a real bug. Paired with multi-layer boundary fuzzing, it hunts the logic errors that crash-only fuzzers miss.
TLP
Ternary Logic Partitioning splits a predicate into TRUE / FALSE / NULL branches whose union must equal the unpartitioned result.
NoREC
Non-optimizing Reference Engine Construction compares an optimized plan against a naive rewrite — the row counts must agree.
Boundary fuzzing
Multi-layer fuzzing at the wire, parser, and type boundaries, plus PostgreSQL-isolationtester-style permutation tests for SI / SSI conformance.
CI GATES
Tests that prove the tests, and engines that must agree.
Mutation testing asks a harder question than coverage: if we deliberately break the code, does a test notice? And because the same SQL runs on two very different storage engines, we gate on them producing identical answers.
Mutation testing
CI-gated mutation testing on the correctness-critical crates — the optimizer, runtime filters, hash join, and memory broker. A single injected fault that survives the suite fails the build.
Row-vs-column parity gate
Every TPC-H query runs on both the row and column engines and must return byte-identical results. One engine, one set of answers.
WHERE THE LINE IS
What these methods do — and what they don't.
- This is bounded model-checking over small state spaces — strong evidence of correctness, not a machine-checked proof. We say model-checked, never “proven.”
- A Jepsen harness is in progress; consistency today is covered by deterministic simulation and formal model-checking, not a Jepsen result.
Pre-1.0, and honest about it. Formally model-checked core, reproducible benchmarks, a public roadmap.