
Bank Transaction Benchmark

A realistic banking workload benchmark comparing five embedded key-value storage engines on atomic transactional operations modeled after TPC-B.

Workload

Each successful transfer performs 5 key-value operations in a single atomic transaction:

  1. Read source account balance
  2. Read destination account balance
  3. Update source balance (debit)
  4. Update destination balance (credit)
  5. Insert transaction log entry (big-endian sequence number key with transfer details)

This mirrors the TPC-B debit-credit pattern (3 updates + 1 select + 1 insert) and exercises both random-access updates and sequential-key inserts within the same transaction.
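As an illustration, the five steps can be modeled in a few lines of Python against a plain dict standing in for the engine (a sketch only: the `Store` class, the `acct:` key prefix, and the log-value format are invented here, not the benchmark's actual schema):

```python
import struct

class Store:
    """Toy stand-in for a transactional KV engine: a dict plus a log sequence."""
    def __init__(self, accounts):
        self.kv = {f"acct:{name}".encode(): bal for name, bal in accounts.items()}
        self.seq = 0

    def transfer(self, src, dst, amount):
        """One atomic transfer = 2 reads + 2 updates + 1 insert."""
        k_src, k_dst = f"acct:{src}".encode(), f"acct:{dst}".encode()
        bal_src = self.kv[k_src]            # 1. read source balance
        bal_dst = self.kv[k_dst]            # 2. read destination balance
        if bal_src < amount:
            return False                    # insufficient funds: attempt fails
        self.kv[k_src] = bal_src - amount   # 3. debit source
        self.kv[k_dst] = bal_dst + amount   # 4. credit destination
        self.seq += 1
        log_key = struct.pack(">Q", self.seq)   # 5. big-endian sequence key
        self.kv[log_key] = f"{src}->{dst}:{amount}".encode()
        return True

s = Store({"alice": 1_000_000, "bob": 1_000_000})
assert s.transfer("alice", "bob", 250)
assert s.kv[b"acct:alice"] + s.kv[b"acct:bob"] == 2_000_000  # balance conservation
```

Packing the sequence number big-endian means the log keys sort chronologically under byte-wise comparison, which is what makes the log inserts sequential-key writes.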

  • 1,000,000 accounts with random names (dictionary words + synthetic binary/decimal keys)
  • 10,000,000 transfer attempts per phase (6,856,951 successful in write-only phase)
  • Triangular access distribution -- some accounts are "hot," mimicking real-world Pareto-like skew
  • Deterministic -- identical RNG seed ensures every engine processes the exact same workload
  • Validated -- balance conservation and transaction log entry count verified after completion
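One common way to get a triangular access skew from a deterministic seed (an assumption for illustration; the benchmark's actual account-selection code is not shown here) is to take the minimum of two uniform draws:

```python
import random

def pick_account(rng, n_accounts):
    # min of two uniforms has triangular density 2*(1 - x):
    # account 0 is drawn roughly twice as often as the median account.
    u = min(rng.random(), rng.random())
    return int(u * n_accounts)

# Same seed on every engine => byte-identical workload.
rng_a, rng_b = random.Random(12345), random.Random(12345)
seq_a = [pick_account(rng_a, 1_000_000) for _ in range(1000)]
seq_b = [pick_account(rng_b, 1_000_000) for _ in range(1000)]
assert seq_a == seq_b

# Lower-numbered accounts are "hot": ~75% of draws land in the lower half.
hits_low = sum(a < 500_000 for a in seq_a)
assert hits_low > 600
```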

Fairness Controls

All engines use identical batching and sync parameters:

| Parameter | Value |
|---|---|
| Batch size | 100 transfers per commit |
| Sync frequency | Every 100 commits |
| Sync mode | none (no forced durability) |
| Initial balance | 1,000,000 per account |
| RNG seed | 12345 |

Results: Apple M5 Max (ARM64, macOS)

Transaction Throughput

The core metric -- sustained transfers per second over 10M operations. Each successful transfer performs 2 reads + 2 updates + 1 insert.

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB'}}}}%%
xychart-beta
    title "Transaction Throughput — Write-Only (transfers/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Transfers per second" 0 --> 400000
    bar [376691, 356703, 225937, 126341, 57138]

| Engine | Transfers/sec | KV Ops/sec | Relative |
|---|---|---|---|
| PsiTri | 376,691 | 1,883,455 | 1.00x |
| PsiTriRocks | 356,703 | 1,783,515 | 0.95x |
| TidesDB | 225,937 | 1,129,685 | 0.60x |
| RocksDB | 126,341 | 631,705 | 0.34x |
| MDBX | 57,138 | 285,690 | 0.15x |

Each transfer performs 5 key-value operations (2 reads + 2 updates + 1 insert), so the effective KV ops/sec is 5x the transfer rate.

PsiTri's DWAL layer absorbs burst writes in an in-memory ART (adaptive radix trie) buffer, then drains them to the COW trie via background merge threads. This decouples write latency from COW cost -- the writer thread never blocks on trie mutations. There are no compaction stalls and no memtable flushes; the ART buffer provides sub-microsecond commit latency while the merge pipeline sustains throughput.

Bulk Load

Inserting 1M accounts with initial balances in a single batch transaction.

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#16A34A'}}}}%%
xychart-beta
    title "Bulk Load — 1M Accounts (ops/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Operations per second" 0 --> 3500000
    bar [3060276, 1795315, 959912, 1961333, 3078055]

| Engine | Time | Ops/sec |
|---|---|---|
| PsiTri | 0.33s | 3.06M |
| MDBX | 0.33s | 3.08M |
| RocksDB | 0.51s | 1.96M |
| PsiTriRocks | 0.56s | 1.80M |
| TidesDB | 1.04s | 0.96M |

Transaction Time

Wall-clock time for the 10M transfer phase.

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#DC2626'}}}}%%
xychart-beta
    title "Transaction Phase Wall Time — Write-Only (seconds)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Seconds" 0 --> 180
    bar [26.547, 28.034, 44.260, 79.151, 175.012]

| Engine | Time | vs. PsiTri |
|---|---|---|
| PsiTri | 26.5s | -- |
| PsiTriRocks | 28.0s | +5.6% |
| TidesDB | 44.3s | +66% |
| RocksDB | 79.2s | +198% |
| MDBX | 175.0s | +558% |

PsiTri completes the same work in less than one-sixth of the time MDBX requires.

Validation Scan

Full scan reading all 1M accounts plus ~6.9M transaction log entries.

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#9333EA'}}}}%%
xychart-beta
    title "Validation Scan — 7.9M Entries (ops/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Operations per second" 0 --> 115000000
    bar [27619781, 37644460, 1843029, 14153353, 109856761]

| Engine | Time | Ops/sec |
|---|---|---|
| MDBX | 0.072s | 109.9M |
| PsiTriRocks | 0.209s | 37.6M |
| PsiTri | 0.284s | 27.6M |
| RocksDB | 0.555s | 14.2M |
| TidesDB | 4.263s | 1.84M |

MDBX dominates sequential scanning -- its B+tree stores keys in sorted order with contiguous leaf pages.

Storage Efficiency

Reachable Data Size

The theoretical minimum raw data size is 275 MB. This chart shows bytes occupied by live, reachable objects.

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB'}}}}%%
xychart-beta
    title "Reachable Data Size (MB)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Megabytes" 0 --> 400
    bar [303, 304, 263, 314, 366]

| Engine | Reachable Data | vs. Theoretical (275 MB) | File Size |
|---|---|---|---|
| TidesDB | 263 MB | 0.96x | 263 MB |
| PsiTri | 303 MB | 1.10x | 1,088 MB |
| PsiTriRocks | 304 MB | 1.11x | 1,210 MB |
| RocksDB | 314 MB | 1.14x | 320 MB |
| MDBX | 366 MB | 1.33x | 640 MB |

PsiTri's reachable data is only 1.10x the theoretical minimum, thanks to graduated leaf node sizing.

File Size

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#DC2626'}}}}%%
xychart-beta
    title "On-Disk File Size (MB)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Megabytes" 0 --> 1300
    bar [1088, 1210, 263, 320, 640]

PsiTri's file size is 3.6x its reachable data. The dead space consists of copy-on-write duplicates awaiting compaction and allocator free space within segments. The background compactor keeps growth bounded during sustained write workloads.

Concurrent Read Performance

The benchmark runs a second phase with a concurrent reader thread performing Pareto-distributed point lookups.

Write Throughput Under Read Load

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB'}}}}%%
xychart-beta
    title "Write Throughput: Write-Only vs Write+Read (tx/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Transfers per second" 0 --> 400000
    bar [376691, 356703, 225937, 126341, 57138]
    bar [382416, 250564, 228892, 124864, 51858]

| Engine | Write-Only | Write+Read | Write Impact | Reader reads/sec |
|---|---|---|---|---|
| PsiTri | 376,691 | 382,416 | +1.5% | 1,719,249 |
| PsiTriRocks | 356,703 | 250,564 | -29.8% | 1,403,776 |
| TidesDB | 225,937 | 228,892 | +1.3% | 457,579 |
| RocksDB | 126,341 | 124,864 | -1.2% | 329,917 |
| MDBX | 57,138 | 51,858 | -9.2% | 1,750,952 |

PsiTri shows zero write degradation from concurrent reads. Its memory-mapped MVCC architecture means readers access the same physical pages with no locking.

Reader Throughput

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#9333EA'}}}}%%
xychart-beta
    title "Concurrent Reader Throughput (reads/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Reads per second" 0 --> 1800000
    bar [1719249, 1403776, 457579, 329917, 1750952]

Summary

| Engine | Architecture | Strength | Weakness |
|---|---|---|---|
| PsiTri | Radix/B-tree hybrid, mmap copy-on-write | Fastest transactions (377K/s), zero read contention | Larger file footprint |
| PsiTriMDBX | PsiTri via MDBX API shim | 12x faster writes than MDBX, drop-in replacement | Reads ~10% slower than native MDBX in buffered mode |
| PsiTriRocks | PsiTri via RocksDB API shim | Drop-in RocksDB replacement | 29% write penalty with concurrent reads |
| TidesDB | Skip-list + SSTables | Good tx speed (226K/s), compact | Slow scan, 100K txn op limit |
| RocksDB | LSM-tree | Compact storage, minimal read impact | 3.0x slower than PsiTri |
| MDBX | B+tree, MVCC copy-on-write | Fastest scan (110M/s) and reads (1.75M/s) | 6.6x slower transactions |

All five engines pass validation: balance conservation verified and transaction log entry counts match across both phases.

PsiTriMDBX: Drop-In MDBX Replacement

PsiTriMDBX provides an MDBX-compatible C/C++ API backed by PsiTri's DWAL layer. The same mdbx_put / mdbx_get / mdbx_cursor_get calls work unchanged — only the linker target changes. This section compares PsiTriMDBX against native libmdbx using the identical bank benchmark binary (same source, different link target).

Workload: 100K accounts, 1M transactions, batch=100, sync=none, seed=12345.

Write Throughput

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB,#DC2626'}}}}%%
xychart-beta
    title "MDBX API: Write-Only Throughput (tx/sec)"
    x-axis ["PsiTriMDBX", "Native MDBX"]
    y-axis "Transfers per second" 0 --> 500000
    bar [452716, 37853]

| Engine | Write-Only | Write+Read (latest) | Ratio |
|---|---|---|---|
| PsiTriMDBX | 452,716 tx/s | 373,476 tx/s | 12x / 4.7x faster |
| Native MDBX | 37,853 tx/s | 79,081 tx/s | 1.0x |

PsiTriMDBX writes are 12x faster than native MDBX. The DWAL layer absorbs writes into an in-memory btree and batches them for background merge into the persistent trie, avoiding MDBX's page-level copy-on-write overhead.

Read Modes

PsiTriMDBX exposes three read modes that control which DWAL layers are searched on each mdbx_get call (see MDBX migration guide):

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB,#16A34A,#7C3AED,#DC2626'}}}}%%
xychart-beta
    title "MDBX API: Reader Throughput by Read Mode (reads/sec)"
    x-axis ["Trie", "Buffered", "Latest", "Native MDBX"]
    y-axis "Reads per second" 0 --> 1600000
    bar [1534646, 1165690, 323626, 1276659]

| Read Mode | Layers | Reader (reads/s) | Write+Read (tx/s) | Write Impact |
|---|---|---|---|---|
| Trie | Tri only | 1,535K | 260K | -42% |
| Buffered | RO + Tri | 1,166K | 254K | -39% |
| Latest | RW + RO + Tri | 324K | 373K | -20% |
| Native MDBX | B+tree | 1,277K | 79K | +109% |

  • Trie mode reads only the merged COW trie — 20% faster than native MDBX on point lookups, with zero lock contention. The tradeoff: very recently committed data may not be visible until the background merge completes.
  • Buffered mode adds the RO btree snapshot — within 10% of native MDBX read speed, still with no lock contention with writers.
  • Latest mode checks all three layers including the active RW btree (shared lock). Lowest write impact (-20%) but slowest reads due to writer contention.
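The layer search can be pictured as a schematic fallthrough, newest layer first (a toy model of the mode semantics described above, not the PsiTriMDBX implementation):

```python
# Schematic DWAL layers, newest to oldest:
rw_btree = {"k3": "v3-active-batch"}     # active RW btree (shared lock to read)
ro_btree = {"k2": "v2-committed"}        # committed snapshot, not yet merged
trie     = {"k1": "v1-merged"}           # background-merged COW trie

LAYERS = {
    "trie":     [trie],                      # fastest, may miss recent commits
    "buffered": [ro_btree, trie],            # sees committed data, no writer lock
    "latest":   [rw_btree, ro_btree, trie],  # sees everything, contends with writer
}

def get(key, mode):
    for layer in LAYERS[mode]:
        if key in layer:
            return layer[key]
    return None

assert get("k1", "trie") == "v1-merged"
assert get("k2", "trie") is None             # invisible until the merge completes
assert get("k2", "buffered") == "v2-committed"
assert get("k3", "latest") == "v3-active-batch"
```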

Total Wall Time

| Engine | Total | vs. Native MDBX |
|---|---|---|
| PsiTriMDBX (latest) | 4.9s | 8.3x faster |
| PsiTriMDBX (buffered) | 6.8s | 6.0x faster |
| Native MDBX | 40.8s | 1.0x |

Both engines pass validation: balance conservation and log entry counts verified.


Results: AMD EPYC-Turin (x86-64, Linux)

Environment

| Component | Spec |
|---|---|
| CPU | AMD EPYC-Turin, 8 cores / 16 threads (SMT), 2.4 GHz |
| ISA extensions | SSE2, SSE4.1, SSSE3, AVX2, AVX-512F/BW/DQ/VL/VBMI/VNNI |
| RAM | 121 GB |
| Storage | 960 GB virtual disk (cloud VM — Linux 6.17, 4 KB page size) |
| Compiler | Clang 20, C++20, -O3 -flto -march=native |

Transaction Throughput

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB'}}}}%%
xychart-beta
    title "Transaction Throughput — Write-Only (transfers/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Transfers per second" 0 --> 280000
    bar [163614, 159938, 167093, 104102, 239041]

| Engine | Transfers/sec | KV Ops/sec | Relative to PsiTri |
|---|---|---|---|
| PsiTri | 163,614 | 818,070 | 1.00x |
| PsiTriRocks | 159,938 | 799,690 | 0.98x |
| TidesDB | 167,093 | 835,465 | 1.02x |
| RocksDB | 104,102 | 520,510 | 0.64x |
| MDBX | 239,041 | 1,195,205 | 1.46x |

On this x86 cloud VM, MDBX's B+tree layout benefits from 4 KB OS pages and x86 hardware prefetchers. The gap is largely a page-copy artifact: MDBX copies one full page per write, and a 4 KB x86 page is one-quarter the size of the 16 KB page on M5 Max, so each write copies 4x less data (see cross-platform comparison).

Bulk Load

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#16A34A'}}}}%%
xychart-beta
    title "Bulk Load — 1M Accounts (ops/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Operations per second" 0 --> 3500000
    bar [2876212, 960000, 630000, 1160000, 2230000]

| Engine | Time | Ops/sec |
|---|---|---|
| PsiTri | 0.35s | 2.88M |
| MDBX | 0.45s | 2.23M |
| RocksDB | 0.86s | 1.16M |
| PsiTriRocks | 1.05s | 0.96M |
| TidesDB | 1.58s | 0.63M |

PsiTri leads bulk load on x86 — sequential arena writes benefit from AVX-512 copy_branches (9x speedup over scalar).

Transaction Time (Write-Only Phase)

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#DC2626'}}}}%%
xychart-beta
    title "Transaction Phase Wall Time — Write-Only (seconds)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Seconds" 0 --> 110
    bar [61.1, 62.5, 59.8, 96.1, 41.8]

| Engine | Time | vs. PsiTri |
|---|---|---|
| PsiTri | 61.1s | -- |
| PsiTriRocks | 62.5s | +2% |
| TidesDB | 59.8s | -2% |
| RocksDB | 96.1s | +57% |
| MDBX | 41.8s | -32% |

Validation Scan

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#9333EA'}}}}%%
xychart-beta
    title "Validation Scan — 7.9M Entries (ops/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Operations per second" 0 --> 42000000
    bar [19628559, 19300000, 1200000, 11200000, 38100000]

| Engine | Time | Ops/sec |
|---|---|---|
| MDBX | 0.38s | 38.1M |
| PsiTri | 0.73s | 19.6M |
| PsiTriRocks | 0.74s | 19.3M |
| RocksDB | 1.28s | 11.2M |
| TidesDB | 11.9s | 1.20M |

PsiTri and PsiTriRocks are nearly identical on scan, and both are comfortably ahead of RocksDB and TidesDB.

Concurrent Read Performance

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB'}}}}%%
xychart-beta
    title "Write Throughput: Write-Only vs Write+Read (tx/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Transfers per second" 0 --> 280000
    bar [163614, 159938, 167093, 104102, 239041]
    bar [181168, 121276, 163275, 87423, 229280]
%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#9333EA'}}}}%%
xychart-beta
    title "Concurrent Reader Throughput (reads/sec)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Reads per second" 0 --> 1200000
    bar [1117771, 785868, 326596, 223149, 927734]

| Engine | Write-Only | Write+Read | Write Impact | Reader reads/sec |
|---|---|---|---|---|
| PsiTri | 163,614 | 181,168 | +10.7% | 1,117,771 |
| PsiTriRocks | 159,938 | 121,276 | -24.2% | 785,868 |
| TidesDB | 167,093 | 163,275 | -2.3% | 326,596 |
| RocksDB | 104,102 | 87,423 | -16.0% | 223,149 |
| MDBX | 239,041 | 229,280 | -4.1% | 927,734 |

PsiTri shows no write degradation from concurrent reads (throughput actually rises 10.7%), consistent with its lock-free MVCC architecture, and it delivers the highest reader throughput of any engine on this platform (1.12M reads/sec), ahead of MDBX (928K).

Storage

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB'}}}}%%
xychart-beta
    title "Reachable Data Size (MB)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Megabytes" 0 --> 1400
    bar [561, 558, 675, 567, 1257]
%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#DC2626'}}}}%%
xychart-beta
    title "On-Disk File Size (MB)"
    x-axis ["PsiTri", "PsiTriRocks", "TidesDB", "RocksDB", "MDBX"]
    y-axis "Megabytes" 0 --> 2500
    bar [2080, 2298, 675, 579, 1344]

| Engine | Reachable | File Size |
|---|---|---|
| PsiTriRocks | 558 MB | 2,298 MB |
| PsiTri | 561 MB | 2,080 MB |
| RocksDB | 567 MB | 579 MB |
| TidesDB | 675 MB | 675 MB |
| MDBX | 1,257 MB | 1,344 MB |

PsiTri and PsiTriRocks have the most compact reachable data. The larger file size reflects COW free space awaiting compaction, consistent with the M5 Max results.

Commit Granularity

Batch size controls how many transfers are grouped into a single atomic commit. Larger batches amortize commit overhead and improve throughput, but at the cost of read-visibility latency: a concurrent reader cannot see any transfer in the batch until the whole batch commits.

Note: In PsiTri, uncommitted writes are buffered in the DWAL's in-memory ART map and backed by a WAL file for durability. Committing appends to the WAL buffer with no I/O; the background merge thread drains committed data to the COW trie asynchronously. In RocksDB, large batches mirror the memtable (up to ~64 MB for a 1M-transfer batch). TidesDB has a hard limit of 100K ops per transaction; at 5 ops per transfer that allows at most ~20K transfers per commit, so it is only valid at batch=100 and is excluded from the scaling charts.
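The visibility tradeoff can be sketched with a writer that publishes staged writes only at commit boundaries (a toy model with a hypothetical `run_batches` helper; real engines publish a new root pointer or sequence number rather than copying a dict):

```python
def run_batches(transfers, batch_size):
    committed = {}   # what a concurrent reader can see
    pending = {}     # writes staged inside the current batch
    commits = 0
    for i, (key, val) in enumerate(transfers):
        pending[key] = val
        if (i + 1) % batch_size == 0:
            committed.update(pending)   # atomic publish of the whole batch
            pending.clear()
            commits += 1
    return committed, commits

transfers = [(f"k{i}", i) for i in range(1000)]
small, n_small = run_batches(transfers, 100)
big, n_big = run_batches(transfers, 1000)
assert small == big                  # same final state either way
assert (n_small, n_big) == (10, 1)   # 10x fewer commit points at batch=1000
```

Larger batches amortize the per-commit cost over more transfers, but a reader polling `committed` sees updates arrive in coarser, later chunks.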

Write-Only Throughput vs Batch Size

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB,#7C3AED,#DC2626,#D97706'}}}}%%
xychart-beta
    title "Write-Only Throughput vs Batch Size (tx/sec)"
    x-axis ["100", "10K", "100K", "1M"]
    y-axis "Transfers per second" 0 --> 650000
    line [163614, 254421, 481361, 593556]
    line [159186, 212378, 274096, 360102]
    line [242402, 286117, 475811, 603124]
    line [108884, 159623, 144383, 185796]

Series order: PsiTri (blue), PsiTriRocks (purple), MDBX (red), RocksDB (amber)

| Engine | batch=100 | batch=10K | batch=100K | batch=1M | Peak gain |
|---|---|---|---|---|---|
| PsiTri | 163,614 | 254,421 | 481,361 | 593,556 | +263% |
| MDBX | 242,402 | 286,117 | 475,811 | 603,124 | +149% |
| PsiTriRocks | 159,186 | 212,378 | 274,096 | 360,102 | +126% |
| RocksDB | 108,884 | 159,623 | 144,383 | 185,796 | +71% |
| TidesDB | 180,654 | N/A | N/A | N/A | N/A |

PsiTri and MDBX both reach ~600K tx/sec at batch=1M and scale similarly — PsiTri benefits from amortizing the root-pointer CAS, while MDBX amortizes its B+tree page COW. RocksDB shows a dip at batch=100K before recovering at 1M, likely reflecting memtable pressure interacting with compaction.

Write+Read Throughput vs Batch Size

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB,#7C3AED,#DC2626,#D97706'}}}}%%
xychart-beta
    title "Write+Read Throughput vs Batch Size (tx/sec)"
    x-axis ["100", "10K", "100K", "1M"]
    y-axis "Transfers per second" 0 --> 550000
    line [181168, 272840, 468694, 509351]
    line [117094, 177691, 230013, 361591]
    line [263524, 292664, 444587, 490456]
    line [119185, 138015, 144091, 181406]

Series order: PsiTri (blue), PsiTriRocks (purple), MDBX (red), RocksDB (amber)

| Engine | batch=100 | batch=10K | batch=100K | batch=1M |
|---|---|---|---|---|
| PsiTri | 181,168 | 272,840 | 468,694 | 509,351 |
| MDBX | 263,524 | 292,664 | 444,587 | 490,456 |
| PsiTriRocks | 117,094 | 177,691 | 230,013 | 361,591 |
| RocksDB | 119,185 | 138,015 | 144,091 | 181,406 |
| TidesDB | 178,478 | N/A | N/A | N/A |

PsiTri overtakes MDBX in write+read mode at batch=100K and beyond. MDBX's reader/writer lock becomes relatively more costly as the write thread holds the lock for longer batches, while PsiTri's lock-free MVCC is unaffected.

Concurrent Reader Throughput vs Batch Size

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB,#7C3AED,#DC2626,#D97706'}}}}%%
xychart-beta
    title "Concurrent Reader Throughput vs Batch Size (reads/sec)"
    x-axis ["100", "10K", "100K", "1M"]
    y-axis "Reads per second" 0 --> 1800000
    bar [1117771, 1278498, 1358428, 1356727]
    bar [872316, 1095787, 1388806, 1669767]
    bar [1055486, 1166392, 1306705, 1100886]
    bar [275903, 324247, 347816, 296904]

Series order: PsiTri (blue), PsiTriRocks (purple), MDBX (red), RocksDB (amber)

| Engine | batch=100 | batch=10K | batch=100K | batch=1M |
|---|---|---|---|---|
| PsiTri | 1,117,771 | 1,278,498 | 1,358,428 | 1,356,727 |
| PsiTriRocks | 872,316 | 1,095,787 | 1,388,806 | 1,669,767 |
| MDBX | 1,055,486 | 1,166,392 | 1,306,705 | 1,100,886 |
| RocksDB | 275,903 | 324,247 | 347,816 | 296,904 |

PsiTri and PsiTriRocks reader throughput improves monotonically with batch size: larger batches mean the writer holds the current root longer, giving readers more time on a stable snapshot. MDBX reader throughput peaks at batch=100K and drops at batch=1M, where longer write transactions increase reader-visible stall events. RocksDB readers are consistently the slowest across all batch sizes.

Per-Round Throughput Curves

The write+read phase runs 10M transfers in 10 rounds of 1M each. The round-by-round breakdown reveals stability (or instability) in each engine's throughput profile.

| Round | PsiTri (batch=1M) | MDBX (batch=100K) | RocksDB (batch=10K) | PsiTriRocks (batch=100K) |
|---|---|---|---|---|
| 1 | 548K | 453K | 165K | 254K |
| 2 | 523K | 453K | 157K | 249K |
| 3 | 510K | 452K | 156K | 243K |
| 4 | 504K | 452K | 142K | 235K |
| 5 | 505K | 452K | 139K | 230K |
| 6 | 508K | 453K | 134K | 229K |
| 7 | 505K | 452K | 133K | 230K |
| 8 | 505K | 452K | 134K | 230K |
| 9 | 508K | 450K | 137K | 230K |
| 10 | 509K | 444K | 138K | 230K |

  • PsiTri (batch=1M): Starts hot (548K) as the working set is warm, settles to a stable ~505–509K after round 3. No compaction stalls.
  • MDBX (batch=100K): Essentially flat at 452–453K through round 9, with a small dip in round 10. Highly predictable.
  • RocksDB (batch=10K): Falls from 165K to a 133–134K trough in rounds 6–8 — a classic compaction cliff as L0→L1 compaction competes with writes — then partially recovers to 138K.
  • PsiTriRocks (batch=100K): Sharp drop from 254K to ~230K by round 5 (RocksDB compaction absorbing the write path through the shim), then fully stable through rounds 5–10.

MDBX File Bloat Under Large Batches

MDBX copies full pages on each write. Larger batches create more dirty pages per commit, and MDBX's free-page recycler can leave significant dead space in the file.

| Batch size | File size | Live data | Waste |
|---|---|---|---|
| 100 | 1,344 MB | 1,257 MB | 6% |
| 10K | 3,968 MB | 1,257 MB | 68% |
| 100K | 4,928 MB | 1,257 MB | 74% |
| 1M | 2,048 MB | 1,257 MB | 39% |

File bloat peaks at batch=100K (4,928 MB — nearly 4x live data) then shrinks at batch=1M as fewer, larger commits leave the free-page recycler more time per commit to consolidate space. PsiTri's file size is not meaningfully affected by batch size because its COW operates at 64-byte node granularity rather than 4 KB page granularity.
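The waste column is simply 1 - live/file, which is easy to verify against the table above:

```python
# (file size MB, live data MB) per batch size, copied from the table above
runs = {100: (1344, 1257), 10_000: (3968, 1257),
        100_000: (4928, 1257), 1_000_000: (2048, 1257)}
waste = {b: round(100 * (1 - live / size)) for b, (size, live) in runs.items()}
assert waste == {100: 6, 10_000: 68, 100_000: 74, 1_000_000: 39}
```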


Results: AMD EPYC-Turin — Five Engines × Four Batch Sizes (April 2026)

This run adds PsiTri's DWAL mode (buffered writes via ART + WAL) alongside direct mode and sweeps four batch sizes on the same Vultr cloud VM used in the prior AMD section above. Workload parameters differ from the prior run (100K accounts, 2M transactions, no sync) so absolute numbers are not directly comparable to the single-batch results above; use these charts to compare engines and batch-size scaling relative to each other.

Environment

| Component | Spec |
|---|---|
| CPU | AMD EPYC-Turin, 8 cores / 16 threads (SMT) |
| RAM | 121 GB |
| OS | Ubuntu Linux 6.17.0-14-generic x86_64 (Vultr cloud VM) |
| Compiler | Clang 20, C++20, -O3 -flto -march=native |
| Kernel page size | 4 KB |

Workload

| Parameter | Value |
|---|---|
| Accounts | 100,000 |
| Transactions | 2,000,000 per phase |
| Batch sizes swept | 1 · 1,000 · 100,000 · 1,000,000 |
| Sync mode | none |
| Sync every | never |
| RNG seed | 12345 |

Write-Only Throughput vs Batch Size

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB,#16A34A,#7C3AED,#D97706,#DC2626'}}}}%%
xychart-beta
    title "Write-Only Throughput vs Batch Size (tx/sec)"
    x-axis ["1", "1K", "100K", "1M"]
    y-axis "Transfers per second" 0 --> 1100000
    line [101190, 416986, 1012173, 1064827]
    line [783944, 993090, 775373, 820138]
    line [727044, 803153, 624526, 856796]
    line [251842, 301194, 292238, 461033]
    line [525868, 482735, 341122, 344629]

Series: PsiTri direct (blue) · PsiTri DWAL (green) · PsiTriRocks (purple) · RocksDB (amber) · TidesDB (red)

| Engine | batch=1 | batch=1K | batch=100K | batch=1M | Peak |
|---|---|---|---|---|---|
| PsiTri | 101,190 | 416,986 | 1,012,173 | 1,064,827 | 1.06M |
| PsiTri (DWAL) | 783,944 | 993,090 | 775,373 | 820,138 | 993K |
| PsiTriRocks | 727,044 | 803,153 | 624,526 | 856,796 | 857K |
| TidesDB | 525,868 | 482,735 | 341,122 | 344,629 | 526K |
| RocksDB | 251,842 | 301,194 | 292,238 | 461,033 | 461K |

PsiTri in direct mode peaks at 1.06M tx/sec at large batches -- the COW cost is fully amortized across the batch. PsiTri in DWAL mode dominates at small batches (7.7x faster than direct mode at batch=1), because the ART buffer absorbs per-commit COW overhead. By batch=1M the two modes land within about 25% of each other (820K vs. 1.06M tx/sec), since batching alone already amortizes the COW cost.

Write+Read Throughput vs Batch Size

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB,#16A34A,#7C3AED,#D97706,#DC2626'}}}}%%
xychart-beta
    title "Write+Read Throughput vs Batch Size (tx/sec)"
    x-axis ["1", "1K", "100K", "1M"]
    y-axis "Transfers per second" 0 --> 1100000
    line [114569, 343664, 1051946, 1074253]
    line [733942, 863056, 698728, 826451]
    line [560835, 712217, 562915, 902071]
    line [280863, 231596, 266907, 467095]
    line [491786, 481292, 333894, 219403]

Series: PsiTri direct (blue) · PsiTri DWAL (green) · PsiTriRocks (purple) · RocksDB (amber) · TidesDB (red)

| Engine | batch=1 | batch=1K | batch=100K | batch=1M | Write impact at 1M |
|---|---|---|---|---|---|
| PsiTri | 114,569 | 343,664 | 1,051,946 | 1,074,253 | +0.9% |
| PsiTri (DWAL) | 733,942 | 863,056 | 698,728 | 826,451 | +0.8% |
| PsiTriRocks | 560,835 | 712,217 | 562,915 | 902,071 | +5.3% |
| RocksDB | 280,863 | 231,596 | 266,907 | 467,095 | +1.3% |
| TidesDB† | 491,786 | 481,292 | 333,894 | 219,403 | −36.3% |

PsiTri write throughput is unaffected or slightly boosted by a concurrent reader across all batch sizes — consistent with its lock-free MVCC. TidesDB shows severe writer degradation at batch=1M (−36%), and fails correctness at batch ≥ 1K (see note below).

Concurrent Reader Throughput vs Batch Size

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#2563EB,#16A34A,#7C3AED,#D97706,#DC2626'}}}}%%
xychart-beta
    title "Concurrent Reader Throughput vs Batch Size (reads/sec)"
    x-axis ["1", "1K", "100K", "1M"]
    y-axis "Reads per second" 0 --> 5000000
    line [2618658, 1611067, 4691063, 4850451]
    line [4100122, 3935602, 4218272, 4840387]
    line [4334633, 4127036, 4316397, 4878738]
    line [928308, 778775, 639254, 409677]
    line [1561115, 1633270, 1379504, 937190]

Series: PsiTri direct (blue) · PsiTri DWAL (green) · PsiTriRocks (purple) · RocksDB (amber) · TidesDB (red)

| Engine | batch=1 | batch=1K | batch=100K | batch=1M |
|---|---|---|---|---|
| PsiTriRocks | 4,334,633 | 4,127,036 | 4,316,397 | 4,878,738 |
| PsiTri | 2,618,658 | 1,611,067 | 4,691,063 | 4,850,451 |
| PsiTri (DWAL) | 4,100,122 | 3,935,602 | 4,218,272 | 4,840,387 |
| TidesDB | 1,561,115 | 1,633,270 | 1,379,504 | 937,190 |
| RocksDB | 928,308 | 778,775 | 639,254 | 409,677 |

All three psitri-family engines deliver 4–4.9M reads/sec at large batches. PsiTri reader throughput scales steeply with batch size (2.6M→4.9M) — the writer holds the current root longer at larger batches, giving readers more time on a stable, hot snapshot. PsiTri DWAL reader throughput is broadly flat (~4M) across batch sizes. RocksDB reader throughput falls with batch size, likely as write amplification from larger memtables crowds out read I/O.

Correctness

| Engine | batch=1 | batch=1K | batch=100K | batch=1M |
|---|---|---|---|---|
| PsiTri | PASS | PASS | PASS | PASS |
| PsiTri (DWAL) | PASS | PASS | PASS | PASS |
| PsiTriRocks | PASS | PASS | PASS | PASS |
| RocksDB | PASS | PASS | PASS | PASS |
| TidesDB | PASS | FAIL | FAIL | FAIL |

†TidesDB balance conservation fails at batch ≥ 1K under concurrent read load (observed balance exceeds expected by 0.008–32% depending on batch size), indicating a transaction isolation bug when large write batches overlap with concurrent readers. Only the batch=1 result is valid for TidesDB.


Cross-Platform Comparison

The most striking cross-platform shift is MDBX, not PsiTri.

Write Throughput: ARM M5 Max vs x86 EPYC-Turin

| Engine | M5 Max (ARM64) | EPYC-Turin (x86) | Change |
|---|---|---|---|
| PsiTri | 376,691 | 163,614 | -57% |
| PsiTriRocks | 356,703 | 159,938 | -55% |
| TidesDB | 225,937 | 167,093 | -26% |
| RocksDB | 126,341 | 104,102 | -18% |
| MDBX | 57,138 | 239,041 | +318% |

Every engine except MDBX is slower on the x86 VM than on M5 Max — this is a cloud VM versus a high-end workstation, so absolute numbers aren't directly comparable. What matters is the relative ordering and the magnitude of MDBX's swing.

Why does MDBX improve so much on x86?

MDBX (like LMDB) uses page-level copy-on-write: every write copies the entire page containing the modified key. On M5 Max the OS page size is 16 KB; on x86 Linux it is 4 KB. Each write copies 4x less data on x86, which maps almost exactly onto the 4.2x throughput increase. This is an OS page size effect, not an architectural one.
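The back-of-envelope arithmetic (a simplification: real write amplification also depends on tree depth and dirty-page coalescing):

```python
# Page-level COW copies at least one full page per modified key.
page_m5, page_x86 = 16 * 1024, 4 * 1024
assert page_m5 // page_x86 == 4       # 4x less data copied per write on x86

tx_m5, tx_x86 = 57_138, 239_041       # MDBX transfers/sec from the tables above
speedup = tx_x86 / tx_m5
assert 4.1 < speedup < 4.3            # ~4.2x, close to the 4x page-size ratio
```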

PsiTri's COW operates at 64-byte node granularity and is indifferent to OS page size, so it does not benefit from the same effect. Its absolute throughput drops on the cloud VM due to higher memory access latency compared to M5 Max's Unified Memory Architecture, but its relative standing against RocksDB and TidesDB stays consistent across both platforms.

Consistent patterns across both platforms:

  • PsiTri leads or ties for bulk load on both platforms
  • PsiTri and PsiTriRocks lead reachable data compactness on both platforms
  • MDBX leads sequential scan on both platforms
  • PsiTri and MDBX lead concurrent reader throughput on both platforms
  • RocksDB is consistently mid-pack on transactions

Reproducing

# Build all engines (from repo root)
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
      -DBUILD_ROCKSDB_BENCH=ON \
      -DBUILD_TIDESDB_BENCH=ON \
      -B build/release

cmake --build build/release -j8 --target \
      bank-bench-psitri \
      bank-bench-psitrirocks \
      bank-bench-rocksdb \
      bank-bench-mdbx \
      bank-bench-tidesdb

# Run each engine with identical parameters
for engine in psitri psitrirocks rocksdb mdbx tidesdb; do
    build/release/bin/bank-bench-${engine} \
        --num-accounts=1000000 \
        --num-transactions=10000000 \
        --batch-size=100 \
        --sync-every=100 \
        --db-path=/tmp/bb_${engine}
done

CLI Options

| Flag | Default | Description |
|---|---|---|
| --num-accounts | 1,000,000 | Number of bank accounts |
| --num-transactions | 10,000,000 | Number of transfer attempts |
| --batch-size | 1 | Transfers per commit |
| --sync-every | 0 | Sync to disk every N commits (0 = never) |
| --sync-mode | none | Durability: none, async, sync |
| --seed | 12345 | RNG seed for reproducibility |
| --db-path | /tmp/bank_bench_db | Database directory |
| --initial-balance | 1,000,000 | Starting balance per account |
| --reads-per-tx | 100 | Point reads per reader thread batch |

Environment

Apple M5 Max (ARM64)

  • Hardware: Apple M5 Max, 128 GB Unified Memory
  • OS: macOS (Darwin 25.3.0)
  • Compiler: Clang 17 (LLVM), C++20, -O3 -flto=thin
  • Engine versions: RocksDB 9.9.3, libmdbx 0.13.11, TidesDB 8.9.4

AMD EPYC-Turin (x86-64)

  • Hardware: AMD EPYC-Turin, 8 cores / 16 threads, 2.4 GHz, 121 GB RAM (cloud VM — Vultr)
  • ISA: AVX-512F/BW/DQ/VL/VBMI/VBMI2/VNNI, AVX2, SSSE3, SSE4.1
  • OS: Ubuntu Linux 6.17.0-14-generic x86_64, 4 KB page size
  • Compiler: Clang 20 (LLVM), C++20, -O3 -flto -march=native
  • Engine versions: RocksDB (built from source), TidesDB (built from source)
  • April 2026 batch-scaling run: 5 engines + PsiTri DWAL mode; 100K accounts, 2M tx, batch sizes 1/1K/100K/1M, sync=none
  • April 2026 full comparison: 6 engines including SQLite; 1M accounts, 1M tx, batch=1, sync=none

Full Engine Comparison (April 2026, AMD EPYC-Turin)

1M accounts, 1M transactions, batch=1 (per-transfer commit), sync=none. Each transfer = 5 KV operations (2 reads + 2 updates + 1 insert).

Write-Only Throughput

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#7b1fa2'}}}}%%
xychart-beta
    title "Bank Benchmark — Write-Only (KV ops/sec, 1M accounts)"
    x-axis ["DWAL", "PsiTriRocks", "RocksDB", "PsiTri-SQLite", "System SQLite"]
    y-axis "KV operations per second" 0 --> 2500000
    bar [2138670, 1847230, 787940, 264175, 248690]

| Engine | API | Transfers/sec | KV Ops/sec | vs RocksDB |
|---|---|---|---|---|
| PsiTri (DWAL) | Native | 427,734 | 2,138,670 | 2.7x |
| PsiTriRocks | RocksDB | 369,446 | 1,847,230 | 2.3x |
| RocksDB | RocksDB | 157,588 | 787,940 | 1.0x |
| PsiTri-SQLite | SQLite | 52,835 | 264,175 | |
| System SQLite | SQLite | 49,738 | 248,690 | |

Concurrent Read + Write

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#7b1fa2'}}}}%%
xychart-beta
    title "Bank Benchmark — Write + Concurrent Reader (KV ops/sec)"
    x-axis ["DWAL", "PsiTriRocks", "RocksDB", "PsiTri-SQLite", "System SQLite"]
    y-axis "KV operations per second" 0 --> 2000000
    bar [1746370, 1748115, 573760, 262740, 77485]

| Engine | API | Write+Read tx/sec | Write Impact | Reader reads/sec |
|---|---|---|---|---|
| PsiTri (DWAL) | Native | 349,274 | -18% | 1,447,979 |
| PsiTriRocks | RocksDB | 349,623 | -5% | 1,416,652 |
| RocksDB | RocksDB | 114,752 | -27% | 289,627 |
| PsiTri-SQLite | SQLite | 52,548 | -0.5% | 215,104 |
| System SQLite | SQLite | 15,497 | -69% | 15,917 |

Drop-In Replacement Speedups

Same API, same workload — only the storage engine changes:

| Comparison | Write Speedup | Read Speedup | Write Impact |
|---|---|---|---|
| PsiTriRocks vs RocksDB | 2.3x | 4.9x | -5% vs -27% |
| PsiTri-SQLite vs System SQLite | 1.06x | 13.5x | -0.5% vs -69% |

PsiTri-SQLite's -0.5% write impact under concurrent reads means readers are effectively invisible to the writer — compared to System SQLite's -69% degradation where the reader blocks the writer during WAL checkpointing.