Skip to content

Library API

This page covers using the MTS library in Haskell applications. There are three levels of API:

  1. MTS Interface (recommended) - Implementation-agnostic
  2. CSMT Direct API - CSMT-specific operations
  3. MPF Direct API - MPF-specific operations

The MerkleTreeStore record provides a uniform API for both implementations. See MTS Interface for the full type definition.

Basic Usage

import MTS.Interface (MerkleTreeStore(..))

example :: MerkleTreeStore imp IO -> IO ()
example store = do
    -- Insert
    mtsInsert store "key1" "value1"
    mtsInsert store "key2" "value2"

    -- Root hash
    mroot <- mtsRootHash store
    print mroot

    -- Inclusion proof
    mp <- mtsMkProof store "key1"
    case mp of
        Nothing -> putStrLn "Key not found"
        Just proof -> do
            ok <- mtsVerifyProof store "value1" proof
            print ok  -- True

    -- Batch insert
    mtsBatchInsert store [("key3", "value3"), ("key4", "value4")]

    -- Delete
    mtsDelete store "key1"

Constructing Stores

See MTS Interface for csmtMerkleTreeStore and mpfMerkleTreeStore constructor signatures and examples.

2. CSMT Direct API

For CSMT-specific features (completeness proofs, CBOR proof format, CLI integration), use the mts:csmt sub-library directly.

Modules

Module Purpose
CSMT Re-exports the public API
CSMT.Hashes Blake2b-256 operations, fromKVHashes, hashHashing
CSMT.Interface FromKV, Hashing, Indirect, root
CSMT.Insertion inserting, expandToBucketDepth, mergeSubtreeRoots
CSMT.Deletion deleting
CSMT.Populate patchParallel, PatchOp — bucketed parallel replay
CSMT.Proof.Insertion buildInclusionProof, verifyInclusionProof, computeRootHash
CSMT.Proof.Completeness generateProof, collectValues, foldProof
CSMT.Backend.RocksDB RocksDB persistent backend
CSMT.Backend.Pure In-memory backend for testing
CSMT.Backend.Standalone Column selectors and codecs
CSMT.MTS CsmtImpl, Ops GADT, CommonOps, DbState, csmtMerkleTreeStore

Insert and Delete

import CSMT.Hashes (fromKVHashes)
import CSMT.Insertion (inserting)
import CSMT.Deletion (deleting)

-- In a transaction context:
inserting fromKVHashes hashing StandaloneKVCol StandaloneCSMTCol "key" "value"
deleting  fromKVHashes hashing StandaloneKVCol StandaloneCSMTCol "key"

Inclusion Proofs

import CSMT.Proof.Insertion (buildInclusionProof, verifyInclusionProof)

-- Generate (in transaction)
result <- buildInclusionProof fromKVHashes StandaloneKVCol StandaloneCSMTCol "key"
-- result :: Maybe (ByteString, InclusionProof Hash)

-- Verify (pure, requires trusted root hash)
verifyInclusionProof hashing trustedRootHash proof  -- :: Bool

Completeness Proofs

import CSMT.Proof.Completeness (generateProof, collectValues, foldProof)

-- Collect leaves under a prefix
leaves <- collectValues StandaloneCSMTCol prefix

-- Generate proof
mproof <- generateProof StandaloneCSMTCol prefix

-- Verify
let computed = foldProof (combineHash hashing) leaves proof

Custom Key/Value Types

import CSMT.Interface (FromKV(..))

myFromKV :: FromKV MyKey MyValue Hash
myFromKV = FromKV
    { fromK      = myKeyToPath
    , fromV      = myValueToHash
    , treePrefix = const []
    }

The treePrefix field enables secondary indexing by prepending a prefix derived from the value to the tree key.

Column Selectors

CSMT uses type-safe GADT column selectors:

  • StandaloneKVCol - Key-value column
  • StandaloneCSMTCol - CSMT tree column
  • StandaloneJournalCol - Journal column (KVOnly replay)

KVOnly Mode

For high-throughput ingest without tree overhead, use KVOnly mode via the Ops GADT:

import CSMT.MTS (mkKVOnlyOps, Ops(..), CommonOps(..))

-- Build KVOnly ops
let ops = mkKVOnlyOps prefix bucketBits chunkSize
            kvCol csmtCol journalCol journalIso
            fromKV hashing runTx runTxReplay trace

-- Insert (writes KV + journal, no tree)
runTx $ opsInsert (kvCommon ops) key value

-- Transition to Full (replays journal via patchParallel)
Just fullOps <- toFull ops
rootHash <- runTx $ opsRootHash fullOps

Crash Recovery

openOps checks for a sentinel flag left by a crashed toFull transition and returns a DbState value:

stateDiagram-v2
    [*] --> openOps
    openOps --> NeedsRecovery : sentinel present
    openOps --> Ready : no sentinel

    NeedsRecovery --> Ready : merge subtrees + delete sentinel

    Ready --> ChooseKVOnly
    Ready --> ChooseFull

    state "Mode transitions" as modes {
        ChooseKVOnly --> KVOnly
        ChooseFull --> Full : replays journal

        KVOnly --> Full : toFull (replay)
        Full --> KVOnly : toKVOnly (journal empty)
    }

If the process crashes mid-toFull (after the sentinel is written but before merge completes), the next openOps call detects the sentinel, runs mergeSubtreeRoots, and returns a clean Ready state.

Transaction Runners

mkKVOnlyOps and openOps take two transaction runners:

  • runTx — guarded (e.g. MVar-locked), used for normal insert/delete operations where concurrent block processing requires serialization.
  • runTxReplay — unguarded, used exclusively during the toFull journal replay where patchParallel splits work into independent bucket transactions.
flowchart TB
    subgraph "Normal operations"
        ins[opsInsert / opsDelete] -->|runTx| db[(Database)]
    end

    subgraph "toFull replay"
        journal[Journal chunks] --> patch[patchParallel]
        patch --> b1[Bucket 1]
        patch --> b2[Bucket 2]
        patch --> bN[Bucket N]
        b1 -->|runTxReplay| db
        b2 -->|runTxReplay| db
        bN -->|runTxReplay| db
    end

    style b1 fill:#e8f5e9
    style b2 fill:#e8f5e9
    style bN fill:#e8f5e9

Bucket transactions are safe to run without a lock because patchParallel partitions operations by tree key prefix — each bucket writes to a disjoint subtree, so there are no data races.

Parallel Replay

The toFull transition replays journal entries using patchParallel, which splits operations by tree key prefix into independent bucket transactions that run concurrently. See CSMT for benchmarks.

3. MPF Direct API

For MPF-specific features (batch/streaming inserts, hex key manipulation), use the mts:mpf sub-library directly.

Modules

Module Purpose
MPF Re-exports the public API
MPF.Hashes Blake2b-256 operations, fromHexKVHashes, mpfHashing
MPF.Interface FromHexKV, HexIndirect, HexKey, HexDigit
MPF.Insertion inserting, insertingBatch, insertingChunked, insertingStream
MPF.Deletion deleting
MPF.Proof.Insertion mkMPFInclusionProof, verifyMPFInclusionProof, foldMPFProof
MPF.Backend.RocksDB RocksDB persistent backend
MPF.Backend.Pure In-memory backend for testing
MPF.Backend.Standalone Column selectors and codecs
MPF.MTS MpfImpl, mpfMerkleTreeStore

Insert Modes

import MPF.Insertion (inserting, insertingBatch, insertingChunked, insertingStream)

-- Sequential (small datasets)
inserting fromKV hashing kvCol mpfCol key value

-- Batch (medium datasets, O(n log n))
insertingBatch fromKV hashing kvCol mpfCol [(k1, v1), (k2, v2)]

-- Chunked (large datasets, bounded memory)
insertingChunked fromKV hashing kvCol mpfCol chunkSize pairs

-- Streaming (very large datasets, ~16x lower peak memory)
insertingStream fromKV hashing kvCol mpfCol pairs

Hex Key Operations

import MPF.Interface (byteStringToHexKey, hexKeyToByteString, HexDigit(..), HexKey)

let key = byteStringToHexKey "hello"  -- [HexDigit 6, HexDigit 8, ...]
let bs  = hexKeyToByteString key       -- round-trips back

Column Selectors

MPF uses its own GADT column selectors:

  • MPFStandaloneKVCol - Key-value column
  • MPFStandaloneMPFCol - MPF tree column

Error Handling

  • Nothing from proof/root operations means "key not found" or "tree empty"
  • Invalid proofs return False from verification
  • Database errors surface as exceptions
  • MPF completeness proof operations currently fail

Performance Tips

  1. Use the MTS interface when you don't need implementation-specific features
  2. Batch inserts for MPF are significantly faster than sequential for large datasets
  3. Streaming inserts reduce peak memory by ~16x by processing subtrees independently
  4. Group transactions to amortize RocksDB write overhead