Random Upsert Benchmark: 200M Keys
This benchmark compares sustained random-write throughput across PsiTri, RocksDB, and MDBX as the dataset grows from zero to 200 million keys, well past the point where in-memory buffering helps and into the regime where compaction and page management dominate performance.
Workload
Each round writes 1 million random key-value pairs using upsert semantics. Keys are 8-byte hashes of a sequence number, which gives a uniformly random key distribution; values are 256 bytes. Writes are committed in batches of 100.
| Parameter | Value |
|---|---|
| Operation | Random upsert (hashed 64-bit keys) |
| Rounds | 200 (1M ops per round = 200M total) |
| Batch size | 100 ops per commit |
| Value size | 256 bytes |
| Key size | 8 bytes |
| Concurrent readers | 0 (write-only) |
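The workload above can be sketched in a few lines of Python. This is only an illustration of the key and batch shape, not the benchmark's actual generator: the source does not say which hash function is used, so BLAKE2b with an 8-byte digest is an assumption, and a plain `dict` stands in for the storage engine.

```python
import hashlib
import os
import struct

KEY_SIZE = 8      # bytes, from the workload table
VALUE_SIZE = 256  # bytes, from the workload table
BATCH_SIZE = 100  # ops per commit, from the workload table


def make_key(seq: int) -> bytes:
    """Hash a sequence number to an 8-byte key (BLAKE2b is an assumed choice)."""
    return hashlib.blake2b(struct.pack("<Q", seq), digest_size=KEY_SIZE).digest()


def batches(start_seq: int, ops: int, batch_size: int = BATCH_SIZE):
    """Yield lists of (key, value) pairs, one list per commit."""
    value = os.urandom(VALUE_SIZE)  # contents are arbitrary; only the size matters
    batch = []
    for seq in range(start_seq, start_seq + ops):
        batch.append((make_key(seq), value))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch, if ops is not a multiple of batch_size
        yield batch


def run_round(store: dict, round_index: int, ops_per_round: int) -> None:
    """Apply one round of upserts to a dict standing in for the engine."""
    for batch in batches(round_index * ops_per_round, ops_per_round):
        store.update(batch)  # upsert: insert or overwrite; one update() per "commit"
```

Because consecutive sequence numbers hash to unrelated keys, each batch touches keys scattered across the whole key space, which is what makes the workload hard on page-oriented and LSM engines alike.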
Engine Configuration
- PsiTri (DWAL mode): pinned_cache=256 MB, merge_threads=2, max_rw=100K entries, sync=none
- RocksDB: default options (create_if_missing), WriteBatch size=100
- MDBX: UTTERLY_NOSYNC, commit_interval=100 ops, map_size=200 GB, MDBX_UPSERT
Results
Throughput Over Time
---
config:
  theme: base
  themeVariables:
    xyChart:
      backgroundColor: "#ffffff"
      titleColor: "#222222"
      xAxisLabelColor: "#222222"
      xAxisTitleColor: "#222222"
      xAxisTickColor: "#666666"
      xAxisLineColor: "#666666"
      yAxisLabelColor: "#222222"
      yAxisTitleColor: "#222222"
      yAxisTickColor: "#666666"
      yAxisLineColor: "#666666"
      plotColorPalette: "#7b1fa2,#e65100,#1565c0"
---
xychart-beta
title "Random Upsert Throughput (smoothed, ops/sec)"
x-axis "Keys (millions)" [10, 30, 50, 70, 90, 110, 130, 150, 170, 190]
y-axis "Operations/sec (thousands)" 0 --> 1800
line [1588, 1488, 1364, 1335, 1101, 1044, 1029, 924, 936, 879]
line [1155, 1053, 1001, 957, 760, 643, 450, 279, 291, 351]
line [296, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Chart legend
Purple = PsiTri, Orange = RocksDB, Blue = MDBX (stopped at 20M keys)
Summary
| Metric | PsiTri | RocksDB | MDBX |
|---|---|---|---|
| Rounds completed | 200 | 200 | 20 (MAP_FULL) |
| Total ops | 200M | 200M | 20M |
| Total time | 740s | -- | -- |
| Avg first 30 rounds | 1,588,320/s | 1,155,254/s | 296,806/s |
| Avg first 100 rounds | 1,311,423/s | 1,000,853/s | -- |
| Avg all 200 rounds | 1,129,318/s | 712,824/s | -- |
| Avg last 30 rounds | 909,379/s | 390,162/s | -- |
| Peak throughput | 2,334,318/s | 1,622,951/s | 471,055/s |
| Min throughput | 24,791/s | 85,161/s | 162,650/s |
| Final DB size | 76.7 GB | 51.5 GB | MAP_FULL at 200 GB |
PsiTri vs RocksDB Speedup
The performance gap widens as the dataset grows, most sharply in the second half of the run. PsiTri is 1.37x faster over the first 30 rounds and 2.33x faster over the last 30:
%%{init: {'theme': 'base', 'themeVariables': {'xyChart': {'plotColorPalette': '#7b1fa2'}}}}%%
xychart-beta
title "PsiTri Speedup Over RocksDB"
x-axis ["First 30 rounds", "First 100 rounds", "Full 200 rounds", "Last 30 rounds"]
y-axis "Speedup (x)" 0 --> 3
bar [1.37, 1.31, 1.58, 2.33]
| Period | PsiTri | RocksDB | Speedup |
|---|---|---|---|
| First 30 rounds (in-RAM) | 1.59M/s | 1.16M/s | 1.37x |
| First 100 rounds | 1.31M/s | 1.00M/s | 1.31x |
| Full 200 rounds | 1.13M/s | 713K/s | 1.58x |
| Last 30 rounds (beyond-RAM) | 909K/s | 390K/s | 2.33x |
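The speedup column is simply the ratio of the two averages from the Summary table. A minimal check, using the unrounded figures reported there:

```python
# (PsiTri ops/sec, RocksDB ops/sec) taken from the Summary table.
averages = {
    "First 30 rounds": (1_588_320, 1_155_254),
    "First 100 rounds": (1_311_423, 1_000_853),
    "Full 200 rounds": (1_129_318, 712_824),
    "Last 30 rounds": (909_379, 390_162),
}

# Speedup = PsiTri throughput divided by RocksDB throughput, rounded to 2 places.
speedups = {
    period: round(psitri / rocksdb, 2)
    for period, (psitri, rocksdb) in averages.items()
}

for period, ratio in speedups.items():
    print(f"{period}: {ratio:.2f}x")
```

Note that the windows overlap (the first 100 rounds include the first 30), so the ratios are not independent samples.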
Analysis
PsiTri
PsiTri sustains high throughput throughout the entire 200-round run. The DWAL write buffer absorbs bursts while the background merge threads drain data into the COW trie. Periodic merge stalls cause brief drops to ~25-80K/s, but throughput recovers in the next round.
The key advantage is predictable degradation: as the dataset grows beyond RAM, throughput decreases gradually due to page cache pressure, but there are no compaction cliffs. At 200M keys (76.7 GB on disk), PsiTri still delivers 909K ops/sec averaged over the final 30 rounds.
Space amplification is ~1.6x theoretical (76.7 GB for ~49 GB of raw data), reflecting COW segment overhead and DWAL buffering.
RocksDB
RocksDB shows strong initial throughput thanks to its LSM design, which buffers writes in an in-memory memtable backed by a sequentially written write-ahead log. However, it exhibits the classic compaction stall pattern: throughput drops to 85-125K/s every 3-5 rounds as background compaction falls behind.
Beyond round 150, throughput collapses to 200-400K/s sustained as the LSM tree grows deeper and compaction becomes the bottleneck. This is the fundamental LSM tradeoff -- deferred work eventually catches up.
RocksDB is the most space-efficient at 51.5 GB (~1.05x theoretical) thanks to SSTable compaction eliminating dead data.
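The space-amplification figures quoted for PsiTri and RocksDB can be checked with a short calculation. The sketch below assumes the reported sizes are in binary gigabytes (GiB), which is what makes 200M pairs of 264 bytes come out to the "~49 GB" of raw data mentioned in the analysis; that unit interpretation is an assumption, not something the source states.

```python
KEYS = 200_000_000
KEY_SIZE = 8      # bytes
VALUE_SIZE = 256  # bytes
GIB = 2**30       # assumed unit for the reported "GB" figures

# Theoretical minimum: every key-value pair stored once, with no overhead.
raw_bytes = KEYS * (KEY_SIZE + VALUE_SIZE)
raw_gib = raw_bytes / GIB


def space_amplification(db_size_gib: float) -> float:
    """On-disk size divided by the raw key-value payload."""
    return db_size_gib / raw_gib


psitri_amp = space_amplification(76.7)   # final PsiTri DB size from the Summary table
rocksdb_amp = space_amplification(51.5)  # final RocksDB DB size from the Summary table

print(f"raw data: {raw_gib:.1f} GiB")
print(f"PsiTri:   {psitri_amp:.2f}x")
print(f"RocksDB:  {rocksdb_amp:.2f}x")
```

The calculation assumes all 200M hashed keys are distinct, so no upsert overwrites an earlier one.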
MDBX (libmdbx)
MDBX's COW B-tree cannot reclaim pages fast enough with frequent commits (every 100 ops). It hits MAP_FULL after only 20 million keys despite a 200 GB map allocation. That is about 10 KB of map space per key, or roughly 40x the ~4.9 GB of raw key-value data written by that point.
With commit_interval raised to 10,000, MDBX reaches 31M keys before MAP_FULL. The fundamental issue is that page-level COW (4KB granularity) creates massive write amplification on random workloads, and the freelist cannot keep pace with page allocation.
MDBX is not competitive for high-frequency random write workloads.
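The write amplification of page-level COW can be bounded with simple arithmetic. The sketch below is a back-of-the-envelope model, not a description of libmdbx internals: the tree depth of 4 is a hypothetical value chosen for illustration, and the upper bound assumes every upsert in a commit lands on a different leaf and shares only the root with the others.

```python
PAGE_SIZE = 4096       # bytes, the 4 KB COW granularity
RECORD_SIZE = 8 + 256  # bytes per key-value pair in this workload

# Lower bound: even if only the leaf is rewritten, a 264-byte change
# dirties a full 4 KB page.
leaf_amplification = PAGE_SIZE / RECORD_SIZE


def pages_copied_upper_bound(batch_size: int, depth: int) -> int:
    """Worst case per commit: one shared root plus a private path per upsert."""
    return 1 + batch_size * (depth - 1)


def write_amplification_upper_bound(batch_size: int, depth: int) -> float:
    """Bytes of pages copied divided by bytes of logical data written."""
    copied = pages_copied_upper_bound(batch_size, depth) * PAGE_SIZE
    return copied / (batch_size * RECORD_SIZE)


ASSUMED_DEPTH = 4  # hypothetical B-tree depth, for illustration only

print(f"leaf-only lower bound: {leaf_amplification:.1f}x")
print(f"upper bound at depth {ASSUMED_DEPTH}, batch 100: "
      f"{write_amplification_upper_bound(100, ASSUMED_DEPTH):.1f}x")
```

Real behavior falls between the two bounds: larger commits let more upserts share interior pages, which is consistent with the run reaching more keys when the commit interval is raised.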
Environment
| Spec | Value |
|---|---|
| Host | Vultr VPS |
| CPU | AMD EPYC-Turin, 16 vCPUs |
| RAM | 128 GB |
| OS | Linux 6.17.0-20-generic x86_64 |
| Filesystem | ext4 (NVMe) |
Raw Data
Per-round CSV data for charting is available in docs/data/random_upsert_200r/.
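As a sketch of how the summary metrics might be recomputed from per-round data, the helper below takes a list of per-round throughput values and derives the windowed averages, peak, and minimum. The CSV schema is not documented here, so the `ops_per_sec` column name in `load_rates` is hypothetical, and treating each average as an arithmetic mean of per-round rates is also an assumption.

```python
import csv
from statistics import mean


def load_rates(path: str, column: str = "ops_per_sec") -> list[float]:
    """Read one throughput value per round; the column name is hypothetical."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)]


def summarize(rates: list[float], window: int = 30) -> dict[str, float]:
    """Compute the same kinds of metrics as the Summary table."""
    return {
        "rounds": len(rates),
        "avg_first_window": mean(rates[:window]),
        "avg_all": mean(rates),
        "avg_last_window": mean(rates[-window:]),
        "peak": max(rates),
        "min": min(rates),
    }
```

A mean of per-round rates weights every round equally regardless of how long it took, so it can differ noticeably from total operations divided by total wall-clock time when some rounds stall.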
Reproducing
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=clang-20 -DCMAKE_CXX_COMPILER=clang++-20 \
-DBUILD_ROCKSDB_BENCH=ON -B build/release
cmake --build build/release -j16
# PsiTri
./build/release/bin/dwal-bench --rounds 200 --batch 100 --value-size 256 \
--mode upsert-rand --pinned-cache 256
# RocksDB
./build/release/bin/dwal-bench --rounds 200 --batch 100 --value-size 256 \
--mode upsert-rand --engine rocksdb
# MDBX
./build/release/bin/dwal-bench --rounds 200 --batch 100 --value-size 256 \
--mode upsert-rand --engine mdbx