CSMT (Binary Trie)
The Compact Sparse Merkle Tree is MTS's binary trie implementation. It uses single-bit branching (L/R directions) with path compression via jump fields.
Info
This page covers CSMT-specific details. For the shared interface, see MTS Interface. For general Merkle tree concepts, see Concepts.
Overview
CSMT is the original implementation in this package. It provides:
- Binary trie with path compression (Indirect nodes with jump fields)
- CBOR-encoded inclusion proofs (see Inclusion Proof Format)
- Completeness proofs over prefix-grouped subtrees
- Secondary indexing via configurable treePrefix
- CLI tool for interactive operations (see CLI Manual)
- TypeScript proof verifier for browser/Node.js (see TypeScript Verifier)
Key Types
FromKV - Key/Value Conversion
data FromKV k v a = FromKV
{ fromK :: k -> Key -- User key to binary path
, fromV :: v -> a -- User value to hash
, treePrefix :: v -> Key -- Optional prefix for secondary indexing
}
The default fromKVHashes hashes both key and value with Blake2b-256 and
uses no prefix (treePrefix = const []).
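As a toy illustration of the shape of FromKV, the sketch below redefines the types locally (the real module layout may differ) and builds a non-hashing instance. bytesToKey and plainKV are hypothetical names; the shipped default fromKVHashes uses Blake2b-256 rather than the identity.

```haskell
import Data.Bits (testBit)
import Data.Word (Word8)

-- Local sketch of the types on this page; the real module layout may differ.
type Key = [Bool]  -- binary path, one Bool per bit

data FromKV k v a = FromKV
  { fromK      :: k -> Key   -- user key to binary path
  , fromV      :: v -> a     -- user value to hash
  , treePrefix :: v -> Key   -- optional prefix for secondary indexing
  }

-- Hypothetical helper: expand bytes into a most-significant-bit-first path.
bytesToKey :: [Word8] -> Key
bytesToKey = concatMap (\w -> [testBit w i | i <- [7, 6 .. 0]])

-- A toy FromKV that skips hashing (illustration only; the real default,
-- fromKVHashes, hashes both key and value with Blake2b-256).
plainKV :: FromKV [Word8] [Word8] [Word8]
plainKV = FromKV
  { fromK      = bytesToKey
  , fromV      = id
  , treePrefix = const []    -- no secondary indexing
  }
```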
Indirect - Tree Nodes
data Indirect a = Indirect
{ jump :: Key -- Path compression: bits to skip
, value :: a -- Hash value
}
Hashing - Hash Operations
data Hashing a = Hashing
{ rootHash :: Indirect a -> a
, combineHash :: Indirect a -> Indirect a -> a
}
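A minimal way to see how the two Hashing operations fit together is a toy instance over strings, so the structure is visible in the output. The types are copied locally for a self-contained sketch; a real instance would fold the jump bits into Blake2b-256 digests.

```haskell
-- Local sketch of the node and hashing types from this page.
type Key = [Bool]

data Indirect a = Indirect
  { jump  :: Key  -- path compression: bits to skip
  , value :: a    -- hash value
  }

data Hashing a = Hashing
  { rootHash    :: Indirect a -> a
  , combineHash :: Indirect a -> Indirect a -> a
  }

-- Toy "hashing" over strings for illustration only; a real instance
-- would mix the jump bits into a Blake2b-256 digest.
showHashing :: Hashing String
showHashing = Hashing
  { rootHash    = \n -> "root:" ++ bits (jump n) ++ ":" ++ value n
  , combineHash = \l r -> "H(" ++ value l ++ "," ++ value r ++ ")"
  }
  where bits = map (\b -> if b then '1' else '0')
```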
Storage
CSMT uses two columns:
| Column | Key | Value |
|---|---|---|
| KV | User key (k) | User value (v) |
| CSMT | treePrefix(v) <> fromK(k) | Indirect a (jump + hash) |
See Storage Layer for serialization details.
Inclusion Proofs
CSMT inclusion proofs are CBOR-encoded and self-contained. They include the key, value hash, root hash, proof steps (sibling hashes), and root jump path. Verification is pure computation.
See Inclusion Proof Format for the CDDL specification and verification algorithm.
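The fold-style verification can be sketched as follows, with simplified Step/Hash types and a toy combiner standing in for the real node hash (jump-path handling is omitted; the actual layout is in the Inclusion Proof Format page):

```haskell
type Hash = String

-- One proof step: the sibling subtree hash and our side at that node.
data Step = Step { siblingHash :: Hash, weAreLeft :: Bool }

-- Toy combiner standing in for the real Blake2b-based node hash.
combine :: Hash -> Hash -> Hash
combine l r = "H(" ++ l ++ "," ++ r ++ ")"

-- Pure verification: recompute the root from the leaf hash by folding
-- the sibling hashes upward, then compare with the claimed root.
verify :: Hash -> [Step] -> Hash -> Bool
verify leaf steps root = foldl step leaf steps == root
  where
    step acc (Step sib left)
      | left      = combine acc sib
      | otherwise = combine sib acc
```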
Completeness Proofs
CSMT supports completeness proofs via the CSMT.Proof.Completeness
module. When treePrefix groups entries by a common prefix, a
completeness proof demonstrates that a given set of leaves is the
entire contents of that subtree:
- Collect all leaves under the prefix (collectValues)
- Generate a proof (generateProof)
- Verify by folding the proof and comparing the reconstructed root
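The verify step can be sketched as a pure fold, using a toy combiner and an assumed right-leaning shape; verifyComplete is a hypothetical name, and the real proof from generateProof encodes the actual subtree structure rather than a fixed shape.

```haskell
type Hash = String

-- Toy combiner standing in for the real Blake2b-based node hash.
combine :: Hash -> Hash -> Hash
combine l r = "H(" ++ l ++ "," ++ r ++ ")"

-- Toy completeness check: fold the claimed (non-empty) leaf set into a
-- root with an assumed right-leaning shape, then compare with the
-- trusted subtree root.
verifyComplete :: [Hash] -> Hash -> Bool
verifyComplete leaves root = foldr1 combine leaves == root
```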
CLI Tool
The mts executable provides an interactive CLI for CSMT operations.
It reads from CSMT_DB_PATH and supports insert, delete, query, root
hash, and proof generation/verification.
See CLI Manual for full usage.
Parallel Population (patchParallel)
For bulk-loading large datasets (e.g. Cardano UTxO restoration) or
replaying journal entries, CSMT.Populate.patchParallel builds
independent bucket transactions from a batch of operations:
- Prepare: the caller runs expandToBucketDepth to push below the bucket boundary any path-compressed node whose jump crosses it. This ensures each bucket's subtree is self-contained.
- Bucket: patchParallel groups operations by the first N bits of the tree key and returns one Transaction per non-empty bucket. Each transaction applies tree ops (inserting/deleting Direct nodes) and deletes the corresponding journal entries.
- Execute: the caller runs the returned transactions concurrently (e.g. via mapConcurrently_). The subtrees write to disjoint regions of the CSMT column.
- Merge: the caller runs mergeSubtreeRoots to read the subtree roots and rebuild the top N levels with correct path compression.
Works on both empty and non-empty trees. Supports inserts and deletes
via the PatchOp type.
patchParallel
:: (GCompare d, Ord jk, Monad m)
=> Int -- bucket bits (e.g. 4 → 16 buckets)
-> Key -- global prefix
-> Hashing a
-> Selector d Key (Indirect a) -- CSMT column
-> Selector d jk v -- journal column
-> [(jk, PatchOp Key a)] -- (journal key, tree op) pairs
-> [Transaction m cf d ops ()] -- independent bucket transactions
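The grouping performed in the Bucket step can be sketched with simplified types. bucketOf and bucketize are hypothetical names introduced for illustration; the real patchParallel additionally builds the per-bucket transactions and journal deletes.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Simplified stand-in for the real tree key.
type Key = [Bool]

-- Interpret the first n bits of the key as a bucket index.
bucketOf :: Int -> Key -> Int
bucketOf n = foldl' (\acc b -> acc * 2 + fromEnum b) 0 . take n

-- Group operations by bucket index, preserving per-bucket order.
bucketize :: Int -> [(Key, op)] -> Map.Map Int [(Key, op)]
bucketize n =
  Map.fromListWith (flip (++)) . map (\p@(k, _) -> (bucketOf n k, [p]))
```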
Benchmarks on a development machine (RocksDB, -O2 -threaded -N):
| N entries | Sequential | 4 bits (16x) | 8 bits (256x) |
|---|---|---|---|
| 1,000 | 6,500/s | 34,000/s (5x) | 43,000/s (7x) |
| 5,000 | 5,000/s | 29,000/s (6x) | 35,000/s (7x) |
| 10,000 | 4,700/s | 25,000/s (5x) | 32,000/s (7x) |
| 50,000 | 3,700/s | 19,000/s (5x) | 24,000/s (6x) |
Split-Mode Operations (Ops GADT)
The Ops GADT provides bidirectional mode transitions for the full
KVOnly ↔ Full lifecycle:
data Ops (mode :: Mode) m cf d ops k v a where
OpsKVOnly :: { kvCommon :: CommonOps, toFull :: IO (Maybe ...) } -> Ops 'KVOnly ...
OpsFull :: { fullCommon :: CommonOps, opsRootHash :: ..., toKVOnly :: IO (Maybe ...) } -> Ops 'Full ...
- KVOnly: insert/delete write KV + journal.
toFullreplays the journal viapatchParallelwith concurrent execution. - Full: insert/delete write KV + update CSMT tree.
toKVOnlyverifies journal is empty before transitioning. CommonOps:opsInsert,opsDelete,opsQuery— available in both modes.
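The mode-indexed pattern can be reproduced in miniature. Everything below is a simplified stand-in: the real type parameters, CommonOps fields, and failure conditions are elided, and the demo transitions always succeed.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Toy sketch of the two-mode shape; field names mirror the GADT above.
data Mode = KVOnly | Full

data Ops (mode :: Mode) where
  OpsKVOnly :: { toFull   :: IO (Maybe (Ops 'Full))   } -> Ops 'KVOnly
  OpsFull   :: { toKVOnly :: IO (Maybe (Ops 'KVOnly)) } -> Ops 'Full

-- A trivial pair of states that can transition back and forth.
demoKVOnly :: Ops 'KVOnly
demoKVOnly = OpsKVOnly { toFull = pure (Just demoFull) }

demoFull :: Ops 'Full
demoFull = OpsFull { toKVOnly = pure (Just demoKVOnly) }
```

The type index makes illegal transitions unrepresentable: only a value of type Ops 'KVOnly offers toFull, and only Ops 'Full offers toKVOnly.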
Crash Safety
The toFull transition performs three non-atomic steps:
- expandToBucketDepth — separate transaction
- mapConcurrently_ (runTx . snd) bucketTxns — N parallel transactions (each deletes its journal entries)
- mergeSubtreeRoots — separate transaction
A sentinel flag in the journal column brackets this sequence:
1. Write sentinel + expandToBucketDepth (one transaction)
2. replayLoop (parallel bucket transactions)
3. mergeSubtreeRoots + delete sentinel (one transaction)
If the process crashes at any point between steps 1 and 3, the next toFull call
detects the sentinel, runs mergeSubtreeRoots to fix the tree
top, deletes the sentinel, then replays remaining journal entries
normally.
Sentinel format: key = "" (empty, sorts first), value =
0xFF || Word8(bucketBits) || encodedPrefix.
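Building the sentinel value from that layout is a one-liner; mkSentinel is a hypothetical helper shown only to pin down the byte order.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Sketch of the sentinel value layout described above:
--   0xFF || Word8(bucketBits) || encodedPrefix
mkSentinel :: Word8 -> BS.ByteString -> BS.ByteString
mkSentinel bucketBits encodedPrefix =
  BS.cons 0xFF (BS.cons bucketBits encodedPrefix)
```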
Recovery guarantees:
- Each bucket transaction is atomic (tree ops + journal deletes)
- Committed buckets are consistent; their journal entries are gone
- Uncommitted entries remain in the journal for re-replay
- expandToBucketDepth is idempotent
- mergeSubtreeRoots reads subtree roots and rebuilds the top; safe to re-run after partial bucket commits
The DbState type exposes recovery at database open time:
data DbState m cf d ops k v a
= NeedsRecovery (IO (DbState m cf d ops k v a))
| Ready (ReadyState m cf d ops k v a)
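A caller can drain recovery steps at open time with a simple loop. The sketch below elides the type parameters and stands in a String for the ready state; openReady is a hypothetical helper.

```haskell
-- Simplified DbState with the type parameters elided.
data DbState = NeedsRecovery (IO DbState) | Ready String

-- Run recovery actions until the database reports Ready.
openReady :: DbState -> IO String
openReady (Ready s)           = pure s
openReady (NeedsRecovery run) = run >>= openReady
```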
Benchmarks: CSMT vs MPF
Unified benchmark comparing CSMT (binary trie, Haskell/RocksDB) against MPF (16-ary trie, Haskell/RocksDB and JS/LevelDB reference). All use blake2b-hashed keys for identical trie depth. Proofs use Aiken CBOR for MPF and compact CBOR for CSMT.
N = 1,000
| | Insert | Proof gen | Delete | Proof CBOR | DB size |
|---|---|---|---|---|---|
| CSMT (Haskell/RocksDB) | 3,542/s | 5,486/s | 4,154/s | 453 bytes | 1,053 KB |
| MPF (Haskell/RocksDB) | 3,146/s | 3,073/s | 2,860/s | 426 bytes | 414 KB |
| MPF JS (LevelDB) | 897/s | 1,467/s | 1,050/s | 426 bytes | 4,362 KB |
N = 10,000
| | Insert | Proof gen | Delete | Proof CBOR | DB size |
|---|---|---|---|---|---|
| CSMT (Haskell/RocksDB) | 2,750/s | 4,285/s | 3,170/s | 582 bytes | 12,412 KB |
| MPF (Haskell/RocksDB) | 2,406/s | 2,451/s | 2,212/s | 538 bytes | 3,557 KB |
| MPF JS (LevelDB) | 723/s | 1,082/s | 805/s | 538 bytes | 10,172 KB |
N = 100,000
| | Insert | Proof gen | Delete | Proof CBOR | DB size |
|---|---|---|---|---|---|
| CSMT (Haskell/RocksDB) | 2,146/s | 3,410/s | 2,482/s | 711 bytes | 169,778 KB |
| MPF (Haskell/RocksDB) | 1,807/s | 1,846/s | 1,710/s | 646 bytes | 38,185 KB |
Key findings
- CSMT is faster on all operations (1.1-1.2x on insert, 1.4-1.5x on delete, 1.7-1.8x on proof generation)
- MPF proofs are 6-9% smaller (426 vs 453 bytes at N=1K); the gap widens slightly at larger N
- MPF DB is 2.5-4.5x smaller than CSMT across all sizes
- Both Haskell implementations outperform the JS MPF reference: CSMT by roughly 3.3-4x, MPF Haskell by roughly 2-3.5x
- CSMT compact proofs are 58% smaller than the old CBOR encoding (453 vs 1,079 at N=1K)
Proof size details
| N | CSMT old CBOR | CSMT compact | MPF Aiken CBOR |
|---|---|---|---|
| 1,000 | 1,079 bytes | 453 bytes | 426 bytes |
| 10,000 | 1,195 bytes | 582 bytes | 538 bytes |
| 100,000 | 1,321 bytes | 711 bytes | 646 bytes |
patchParallel (CSMT bulk population)
For initial population from N entries, patchParallel provides a 5-7x
speedup over sequential insertion by distributing work across 16-256
independent subtree buckets; see the table under "Parallel Population
(patchParallel)" above for the full numbers.
Worked Example
See Worked Example for a step-by-step walkthrough of CSMT storage and hash computation.