CORRECTNESS

Built to be trusted, not just tested.

A database earns trust by construction. EzraDB's consistency-critical core is formally model-checked, exercised by deterministic simulation, and fuzzed for query correctness — with every failure reproducible from a seed. Here is exactly how, and exactly where the line is.

Formally model-checked coreDeterministic simulationSeed-reproducible replay

Formally model-checked

TLA+ and Stateright models of Raft replication, leader-lease, MVCC/SSI serializability, parallel-commit, online-DDL.

Deterministic simulation

FoundationDB-style DST covering all five Raft safety properties, with seed-reproducible replay.

Metamorphic fuzzing

SQLancer-style TLP and NoREC query-correctness fuzzing.

Mutation-gated

CI-gated mutation testing on correctness-critical crates.

BATTLE-TESTED, MEASURABLY

v0.1 — tested like it can never die.

The ambition is the reliability bar set by the most trusted databases in the world: the database that never dies. That is an engineering practice, not a slogan — a published 502-line mission-critical standard with four pillars of gates, audited at every phase. The full CI stack — 29 lanes covering fuzzing, deterministic simulation, mutation testing, sanitizers, and differential fuzzing — runs on dedicated build machines on every gate.

2,701sqllogictest files — ~20,900 checked queries

14,300+test functions across 109 crates

~356pinned regression tests — every fixed bug stays fixed, forever

15fuzz targets, plus differential fuzzing vs PostgreSQL 18

5protocols formally model-checked (Stateright + TLA+)

66fault-injection failpoints across 9 crates

107end-to-end scenarios — crash-mid-DDL to auth races

17+5driver lanes + ORM suites validated against stock PostgreSQL 18

56hardening releases shipped in 39 days

THE PLAYBOOK

The techniques the most trusted databases use — in use here.

Marked honestly: shipped is verifiable in the codebase today; building is specced and in progress.

Deterministic simulation

shipped

the FoundationDB playbook

Seeded, simulated clusters with partitions and healing. All five Raft safety properties are checked, and any failure replays exactly from a single seed.

Metamorphic query fuzzing

shipped

the SQLancer / CockroachDB playbook

TLP and NoREC oracles diff EzraDB against PostgreSQL 18, and a cross-engine sweep asserts row, column, and hybrid engines never diverge.

Formal model-checking

shipped

the TLA+ practice

Raft replication, MVCC/SSI visibility, and parallel commit are model-checked in Stateright; leader-lease and online-DDL safety in TLA+. The spec must change before the code can.

Every bug becomes a test

shipped

the SQLite rule

Every fixed bug lands with a regression test pinned to it permanently — ~356 so far, and they never get deleted.

Mutation-gated kernels

shipped

rare anywhere

In four correctness-critical crates (optimizer, runtime filters, hash join, memory broker), a single surviving mutant fails the build.

Kill −9 durability drills

shipped

the "never dies" bar

Write, kill all three nodes, restart on the same volumes: the data survives and quorum re-forms. A soak harness hammers restarts under sustained load — it has already caught a real consensus bug before any user could.

Jepsen-style nemesis testing

building

the Jepsen playbook

A published 12-workload, 14-nemesis test plan with the harness in active development.

72-hour continuous soak

building

the mission-critical spec

Specced under the published engineering standard; hardware provisioned.

FORMAL METHODS

Five core protocols, model-checked in TLA+ and Stateright.

The protocols where a subtle race means data loss are specified as executable models and checked before a line of implementation lands. Model-checking explores the state space for the invariant violations that unit tests never think to write.

Raft replication

Log agreement and leader election across replicas.

Leader-lease safety

No two leaseholders serve a range at the same time.

MVCC / SSI serializability

Snapshot isolation and serializable schedules hold.

Parallel-commit

Atomic multi-range commit — no torn or partial transactions.

Online-DDL safety

Schema changes stay safe under concurrent traffic.

Spec PR before code

Protocol changes are spec-first: the model and its invariants ship in a spec PR and pass the checker before implementation is written. The spec is the source of truth, not an afterthought.

DETERMINISTIC SIMULATION

FoundationDB-style simulation, replayable from a seed.

The cluster runs against a simulated world — clock, disk, and network under our control — so we can inject partitions, reorderings, and crashes on a single thread. Every run is deterministic: a failing seed replays the exact schedule, bit for bit, until the bug is fixed.

All five Raft safety properties, continuously checked

Election Safety

At most one leader per term.

Leader Append-Only

A leader never overwrites or deletes its own log.

Log Matching

Identical entries imply identical prefixes.

Leader Completeness

Committed entries survive in every future leader.

State Machine Safety

No two nodes apply different commands at an index.

seed → identical schedule → identical failure. Reproduce, then fix.

QUERY-CORRECTNESS FUZZING

SQLancer-style metamorphic fuzzing, run nightly.

Metamorphic testing finds wrong answers with no oracle: transform a query in a result-preserving way and any divergence is a real bug. Paired with multi-layer boundary fuzzing, it hunts the logic errors that crash-only fuzzers miss.

TLP

Ternary Logic Partitioning splits a predicate into TRUE / FALSE / NULL branches whose union must equal the unpartitioned result.

NoREC

Non-optimizing Reference Engine Construction compares an optimized plan against a naive rewrite — the row counts must agree.

Boundary fuzzing

Multi-layer fuzzing at the wire, parser, and type boundaries, plus PostgreSQL-isolationtester-style permutation tests for SI / SSI conformance.

CI GATES

Tests that prove the tests, and engines that must agree.

Mutation testing asks a harder question than coverage: if we deliberately break the code, does a test notice? And because the same SQL runs on two very different storage engines, we gate on them producing identical answers.

Mutation testing

CI-gated mutation testing on the correctness-critical crates — the optimizer, runtime filters, hash join, and memory broker. A single injected fault that survives the suite fails the build.

Row-vs-column parity gate

Every TPC-H query runs on both the row and column engines and must return byte-identical results. One engine, one set of answers.

22 / 22TPC-H queries, byte-identical across engines

0Tolerated divergences

WHERE THE LINE IS

What these methods do — and what they don't.

This is bounded model-checking over small state spaces — strong evidence of correctness, not a machine-checked proof. We say model-checked, never “proven.”
A Jepsen harness is in progress; consistency today is covered by deterministic simulation and formal model-checking, not a Jepsen result.

Pre-1.0, and honest about it. Formally model-checked core, reproducible benchmarks, a public roadmap.