Random Upsert Benchmark: 200M Keys¶

This benchmark compares sustained random write throughput across PsiTri, RocksDB, and MDBX as the dataset grows from zero to 200 million keys -- well beyond the point where in-memory buffering helps and compaction/page management dominates performance.

Workload¶

Each round inserts 1 million random key-value pairs using upsert semantics. Keys are 8-byte hashes of a sequence number (uniformly random distribution), values are 256 bytes. Writes are committed in batches of 100.

Parameter	Value
Operation	Random upsert (hashed 64-bit keys)
Rounds	200 (1M ops per round = 200M total)
Batch size	100 ops per commit
Value size	256 bytes
Key size	8 bytes
Concurrent readers	0 (write-only)

Engine Configuration¶

PsiTri (DWAL mode): pinned_cache=256 MB, merge_threads=2, max_rw=100K entries, sync=none
RocksDB: default options (create_if_missing), WriteBatch size=100
MDBX: UTTERLY_NOSYNC, commit_interval=100 ops, map_size=200 GB, MDBX_UPSERT

Results¶

Throughput Over Time¶

---
config:
    theme: base
    themeVariables:
        xyChart:
            backgroundColor: "#ffffff"
            titleColor: "#222222"
            xAxisLabelColor: "#222222"
            xAxisTitleColor: "#222222"
            xAxisTickColor: "#666666"
            xAxisLineColor: "#666666"
            yAxisLabelColor: "#222222"
            yAxisTitleColor: "#222222"
            yAxisTickColor: "#666666"
            yAxisLineColor: "#666666"
            plotColorPalette: "#7b1fa2,#e65100,#1565c0"
---
xychart-beta
    title "Random Upsert Throughput (smoothed, ops/sec)"
    x-axis "Keys (millions)" [10, 30, 50, 70, 90, 110, 130, 150, 170, 190]
    y-axis "Operations/sec (thousands)" 0 --> 1800
    line [1588, 1488, 1364, 1335, 1101, 1044, 1029, 924, 936, 879]
    line [1155, 1053, 1001, 957, 760, 643, 450, 279, 291, 351]
    line [296, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Chart legend

Purple = PsiTri, Orange = RocksDB, Blue = MDBX (stopped at 20M keys)

Summary¶

Metric	PsiTri	RocksDB	MDBX
Rounds completed	200	200	20 (MAP_FULL)
Total ops	200M	200M	20M
Total time	740s	--	--
Avg first 30 rounds	1,588,320/s	1,155,254/s	296,806/s
Avg first 100 rounds	1,311,423/s	1,000,853/s	--
Avg all 200 rounds	1,129,318/s	712,824/s	--
Avg last 30 rounds	909,379/s	390,162/s	--
Peak throughput	2,334,318/s	1,622,951/s	471,055/s
Min throughput	24,791/s	85,161/s	162,650/s
Final DB size	76.7 GB	51.5 GB	MAP_FULL at 200 GB

PsiTri vs RocksDB Speedup¶

The performance gap widens as the dataset grows. PsiTri starts 1.37x faster and finishes 2.33x faster:

%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#7b1fa2'}}}}%%
xychart-beta
    title "PsiTri Speedup Over RocksDB"
    x-axis ["First 30 rounds", "First 100 rounds", "Full 200 rounds", "Last 30 rounds"]
    y-axis "Speedup (x)" 0 --> 3
    bar [1.37, 1.31, 1.58, 2.33]

Period	PsiTri	RocksDB	Speedup
First 30 rounds (in-RAM)	1.59M/s	1.16M/s	1.37x
First 100 rounds	1.31M/s	1.00M/s	1.31x
Full 200 rounds	1.13M/s	713K/s	1.58x
Last 30 rounds (beyond-RAM)	909K/s	390K/s	2.33x

Analysis¶

PsiTri¶

PsiTri sustains high throughput throughout the entire 200-round run. The DWAL write buffer absorbs bursts while the background merge thread drains data into the COW trie. Periodic merge stalls cause brief drops to ~25-80K/s but recover immediately in the next round.

The key advantage is predictable degradation: as the dataset grows beyond RAM, throughput decreases gradually due to page cache pressure, but there are no compaction cliffs. At 200M keys (76.7 GB on disk), PsiTri still delivers 909K ops/sec averaged over the final 30 rounds.

Space amplification is ~1.6x theoretical (76.7 GB for ~49 GB of raw data), reflecting COW segment overhead and DWAL buffering.

RocksDB¶

RocksDB shows strong initial throughput thanks to its LSM write-ahead-log buffering. However, it exhibits the classic compaction stall pattern: throughput drops to 85-125K/s every 3-5 rounds as background compaction falls behind.

Beyond round 150, throughput collapses to 200-400K/s sustained as the LSM tree grows deeper and compaction becomes the bottleneck. This is the fundamental LSM tradeoff -- deferred work eventually catches up.

RocksDB is the most space-efficient at 51.5 GB (~1.05x theoretical) thanks to SSTable compaction eliminating dead data.

MDBX (libmdbx)¶

MDBX's COW B-tree cannot reclaim pages fast enough with frequent commits (every 100 ops). It hits MAP_FULL after only 20 million keys despite a 200 GB map allocation -- a 10x space amplification.

With commit_interval raised to 10,000, MDBX reaches 31M keys before MAP_FULL. The fundamental issue is that page-level COW (4KB granularity) creates massive write amplification on random workloads, and the freelist cannot keep pace with page allocation.

MDBX is not competitive for high-frequency random write workloads.

Environment¶

Spec	Value
Host	Vultr VPS
CPU	AMD EPYC-Turin, 16 vCPUs
RAM	128 GB
OS	Linux 6.17.0-20-generic x86_64
Filesystem	ext4 (NVMe)

Raw Data¶

Per-round CSV data for charting is available in docs/data/random_upsert_200r/.

Reproducing¶

cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=clang-20 -DCMAKE_CXX_COMPILER=clang++-20 \
    -DBUILD_ROCKSDB_BENCH=ON -B build/release
cmake --build build/release -j16

# PsiTri
./build/release/bin/dwal-bench --rounds 200 --batch 100 --value-size 256 \
    --mode upsert-rand --pinned-cache 256

# RocksDB
./build/release/bin/dwal-bench --rounds 200 --batch 100 --value-size 256 \
    --mode upsert-rand --engine rocksdb

# MDBX
./build/release/bin/dwal-bench --rounds 200 --batch 100 --value-size 256 \
    --mode upsert-rand --engine mdbx