Transaction and Cursor Contract
This page defines the public transaction contract for PsiTri. The implementation must satisfy this developer experience. Internal data structures, cursor pools, and MDBX compatibility code should be shaped by this contract, not the other way around.
Mental Model
PsiTri has up to 512 top-level roots. Each root is an independent ordered key space. A committed top-level root is visible to readers and is treated as shared state.
A write transaction creates current write state for the root or tree it opens.
Reads inside that transaction read the current write state, including writes
that have not been turned back into a tree yet. This is the default and cheap
path.
A tree does not have to start from a top-level root. Use
create_temporary_tree() to build a detached tree, then put it somewhere by
upserting it as a subtree value or publishing it to a top-level root with
set_root().
PsiTri can hand application code an explicit retained tree object. A tree
may come from create_temporary_tree(), a top-level root snapshot, or an
existing subtree. Passing a tree to upsert_subtree() or set_root() stores
that tree identity in a new place. This is powerful: applications can manage
their own snapshots and copy them between roots and subtrees without
deep-copying all keys and values.
To modify a tree, start a write transaction from it:
```cpp
auto edit = ws->start_write_transaction(tree_handle,
                                        psitri::tx_mode::expect_success);
edit.upsert("field", "value");
auto updated_tree = edit.get_tree();  // publishing is a separate step
```
start_write_transaction(tree) does not give the transaction a root slot or
parent key to publish into. It only edits that tree and lets the caller ask for
the resulting tree. Publishing is a separate step: store the returned tree
with upsert_subtree() or set_root().
That power has a real footgun. PsiTri stores subtrees by reference-counted tree identity. If application code stores a tree inside itself, inside one of its descendants, or otherwise creates a cycle, the cycle may keep storage alive forever. The public API makes tree movement explicit, but it does not try to prove the user's ownership graph is acyclic.
Snapshot ownership is different from current-state reads. A snapshot cursor
pins a stable view so it can survive later mutation. A copyable tree obtained
from get_root() or get_subtree() can also retain a stable tree identity.
That is useful and correct when the application needs isolation or wants to
archive a tree, but it can make later writes copy more data.
Write Sessions
A write session is a thread-owned allocator context. It owns a 32 MB write buffer that is flushed and recycled when the session closes. Create the session on the thread that will use it, keep it for the life of that worker thread, and create many transactions from it.
```cpp
void writer_thread(std::shared_ptr<psitri::database> db)
{
    // One long-lived write session per worker thread.
    auto ws = db->start_write_session();
    while (auto job = next_job())
    {
        // Transactions are the short-lived objects.
        auto tx = ws->start_transaction(job->root, psitri::tx_mode::expect_success);
        apply_job(tx, *job);
        tx.commit();
    }
}
```
Do not create a write session for each small operation:
```cpp
for (auto& job : jobs)
{
    auto ws = db->start_write_session();  // wrong: creates allocator contexts repeatedly
    auto tx = ws->start_transaction(job.root);
    apply_job(tx, job);
    tx.commit();
}
```
Contract:
- A write session must be created by the thread that uses it.
- A write session must not be passed to another thread.
- A write session should be long-lived, normally one per worker thread.
- Each write session owns a 32 MB write buffer. Applications should budget for that per active write session.
- Closing a write session flushes/recycles its write buffer.
- Transactions, cursors, and subtree scopes are the short-lived objects.
- An application may have at most 50 active write sessions. The allocator has 64 session slots total; 14 are reserved for background work such as current and future compaction services.
- Creating write sessions repeatedly for small operations is an API misuse and should show up clearly in examples, docs, and diagnostics.
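The slot budget in this list can be modeled as a fixed slot table. The sketch below assumes a hypothetical session_slots class, which is not PsiTri's allocator. It has 64 slots in total, holds 14 back for background services, and fails clearly once 50 application sessions are active. Each slot also records the thread that acquired it, so use from another thread can be detected.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <stdexcept>
#include <thread>

// Hypothetical model of the write-session slot budget (not PsiTri's allocator).
class session_slots
{
  public:
    static constexpr int total_slots    = 64;
    static constexpr int reserved_slots = 14;                         // background work
    static constexpr int app_slots      = total_slots - reserved_slots;  // 50

    // Returns a slot index, or std::nullopt once the application budget is spent.
    std::optional<int> acquire_app_slot()
    {
        if (app_in_use_ == app_slots)
            return std::nullopt;  // clear failure instead of silent growth
        for (int i = reserved_slots; i < total_slots; ++i)
        {
            if (!owner_[i])
            {
                owner_[i] = std::this_thread::get_id();  // creating thread owns it
                ++app_in_use_;
                return i;
            }
        }
        return std::nullopt;
    }

    void release(int slot)
    {
        check_owner(slot);
        owner_[slot].reset();
        --app_in_use_;
    }

    // A session may only be used by the thread that created it.
    void check_owner(int slot) const
    {
        if (owner_[slot] != std::this_thread::get_id())
            throw std::logic_error("write session used from another thread");
    }

    int app_in_use() const { return app_in_use_; }

  private:
    std::array<std::optional<std::thread::id>, total_slots> owner_{};
    int app_in_use_ = 0;
};
```

The sketch shows why creating a session per operation counts as misuse. Every concurrent session occupies one of only 50 application slots and, per the contract above, a 32 MB write buffer.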
Basic Write Transaction
Use expect_success when the transaction is likely to commit and may perform
many writes to the same root.
```cpp
#include <cassert>
#include <psitri/database.hpp>
#include <psitri/transaction.hpp>

auto db = psitri::database::open(
    "app.db",
    psitri::open_mode::create_or_open);
auto ws = db->start_write_session();
auto tx = ws->start_transaction(0, psitri::tx_mode::expect_success);
tx.upsert("account:alice", "100");
tx.upsert("account:bob", "50");

// Reads inside the transaction see its own uncommitted writes.
bool found = tx.get("account:alice", [](psitri::value_view value) {
    assert(value == "100");
});
assert(found);

tx.commit();
```
Contract:
- get() inside a write transaction reads the transaction's current write state.
- get(key, lambda) is the zero-copy read path. The value_view passed to the lambda is valid only for the life of that lambda call.
- get<std::string>(key) and get(key, buffer) are copying convenience APIs. Use them when the value must outlive the read callback.
- commit() publishes the final state atomically.
- Destroying a live transaction aborts it.
- expect_success may prepare a private writable version at transaction start.
- A transaction does not hold a compactor/recycling read lock for its whole lifetime. PsiTri takes short internal read locks around operations that need them. Longer zero-copy value lifetimes must be explicit at the call site.
Zero-Copy Reads
Use the lambda form for hot reads:
```cpp
auto tx = ws->start_transaction(0, psitri::tx_mode::expect_success);
bool found = tx.get("user:alice", [](psitri::value_view value) {
    parse_user(value);  // value is valid only inside this call
});
```
Contract:
- The value_view is borrowed directly from the tree; PsiTri does not copy the value into a temporary buffer.
- get(key, lambda) returns true when the key contains a normal value and the lambda was called. It returns false otherwise, for example for a missing key. Subtree values are accessed through subtree APIs.
- The view is valid only during the lambda call. Do not store it, return it, or pass it to asynchronous work.
- The lambda should be short-running. While the view is live, PsiTri may hold a read lock or equivalent protection that delays compaction/recycling.
- Do CPU-light parsing, comparison, or immediate serialization inside the lambda. Copy the value out first if later work may block, allocate heavily, call user plugins, wait on I/O, or take locks.
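The borrow-in-a-callback shape can be illustrated outside PsiTri. The sketch below assumes a hypothetical toy_store backed by std::map, with std::string_view standing in for psitri::value_view. Its get(key, fn) passes the callback a view into the stored value without copying it and reports whether the callback ran. Its get_copy() returns an owned std::string for values that must outlive the read.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical store that only illustrates the borrow-in-a-callback contract.
class toy_store
{
  public:
    void upsert(std::string key, std::string value)
    {
        data_[std::move(key)] = std::move(value);
    }

    // Zero-copy read: the view points into the stored string and is only
    // guaranteed valid while the callback runs.
    template <typename Fn>
    bool get(std::string_view key, Fn&& fn) const
    {
        auto it = data_.find(key);
        if (it == data_.end())
            return false;
        fn(std::string_view{it->second});
        return true;
    }

    // Copying read: the result owns its bytes and may outlive later writes.
    std::optional<std::string> get_copy(std::string_view key) const
    {
        std::optional<std::string> out;
        get(key, [&](std::string_view v) { out.emplace(v); });
        return out;
    }

  private:
    // std::less<> enables lookup by std::string_view without building a std::string.
    std::map<std::string, std::string, std::less<>> data_;
};
```

The sketch has no compactor. In PsiTri, the protection that keeps the view alive is also what delays compaction and recycling, which is why the callback should stay short.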
If a caller intentionally needs a borrowed value view to live beyond one callback, make that cost visible with an explicit RAII pin:
```cpp
auto pin = tx.pin_values();  // keep this scope tight
auto cur = tx.cursor();
if (cur.lower_bound("user:alice") && cur.key() == "user:alice")
{
    psitri::value_view value = cur.value(pin);
    parse_and_compare(value);
}
```
Contract:
- A value pin protects the memory backing borrowed value_view objects from compactor/recycling movement until the pin is destroyed.
- A value pin is not a snapshot. It does not make a current-state cursor stable across writes, commits, aborts, or cursor invalidation.
- Keep value pins tightly scoped. Holding a pin longer than necessary can delay compaction and segment reuse.
- Prefer get(key, lambda) for normal hot reads, and copying reads for values that must survive slow work.
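A value pin is an RAII guard. The sketch below assumes a hypothetical pin_registry that a compactor would check before moving value memory. The names value_pin and can_relocate() are illustrative only and are not PsiTri's API.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical registry that counts live value pins.
class pin_registry
{
  public:
    // A compactor would relocate value memory only while no pins are live.
    bool can_relocate() const { return live_pins_.load() == 0; }

  private:
    friend class value_pin;
    std::atomic<int> live_pins_{0};
};

// RAII guard: relocation is blocked from construction until destruction.
class value_pin
{
  public:
    explicit value_pin(pin_registry& reg) : reg_(reg) { ++reg_.live_pins_; }
    ~value_pin() { --reg_.live_pins_; }
    value_pin(const value_pin&)            = delete;
    value_pin& operator=(const value_pin&) = delete;

  private:
    pin_registry& reg_;
};
```

The guard only keeps memory in place. It says nothing about whether a cursor still points at a valid entry after a write, which is why a pin is not a snapshot.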
Copying reads are still useful when ownership matters:
```cpp
auto owned = tx.get<std::string>("user:alice");  // copies

std::string reused;
tx.get("user:alice", &reused);  // copies into caller buffer
```
Known-Key Fast Paths
PsiTri has separate paths for exact-key operations and ordered operations. Use the exact-key APIs when the application already knows the key.
```cpp
// Exact-key point read. Fast lookup path; the lambda form is also zero-copy.
tx.get("user:alice", [](psitri::value_view value) {
    use_user(value);
});

// Exact-key owned read. Fast lookup path, then copies into std::string.
auto owned = tx.get<std::string>("user:alice");

// Ordered search. Use this when you need the first key at or after a boundary.
auto cur = tx.cursor();
cur.lower_bound("user:alice");
```
The same distinction applies to writes:
```cpp
// Fast path when the key is expected to exist.
if (!tx.update("user:alice", new_value))
    handle_missing_user();

// General path when the key may or may not exist.
tx.upsert("user:alice", new_value);
```
Contract:
- get(key, lambda), get<T>(key), and get(key, buffer*) are point reads. When the exact key is known, they can use the hash lookup fast path.
- lower_bound() is an ordered-positioning operation. It must find the first key at or after the search key, so it uses ordered search semantics even when the caller passes an exact key.
- update(key, value) is the preferred write when the caller knows the key should already exist. It can use the existing-key fast path and reports missing keys with false.
- upsert(key, value) is the general insert-or-update API. Because it must handle both existing and missing keys, it uses ordered search semantics.
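The split between point operations and ordered operations can be pictured as two indexes over the same keys. The sketch below assumes a hypothetical dual_index that keeps a std::unordered_map for exact-key work beside a std::map for ordered work. It illustrates the contract only and does not describe PsiTri's node layout.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical illustration: exact-key operations use a hash index,
// ordered operations use a sorted index over the same values.
class dual_index
{
  public:
    dual_index()                             = default;
    dual_index(const dual_index&)            = delete;  // hash_ points into ordered_
    dual_index& operator=(const dual_index&) = delete;

    // Point read: hash lookup only.
    std::optional<std::string> get(const std::string& key) const
    {
        auto it = hash_.find(key);
        if (it == hash_.end())
            return std::nullopt;
        return *it->second;
    }

    // Existing-key write: hash lookup; a missing key is reported with false.
    bool update(const std::string& key, std::string value)
    {
        auto it = hash_.find(key);
        if (it == hash_.end())
            return false;
        *it->second = std::move(value);
        return true;
    }

    // Insert-or-update: goes through the ordered index because it may insert.
    void upsert(const std::string& key, std::string value)
    {
        auto [it, inserted] = ordered_.insert_or_assign(key, std::move(value));
        if (inserted)
            hash_.emplace(key, &it->second);  // std::map nodes never move
    }

    // Ordered positioning: first key at or after `key`.
    std::optional<std::string> lower_bound_key(const std::string& key) const
    {
        auto it = ordered_.lower_bound(key);
        if (it == ordered_.end())
            return std::nullopt;
        return it->first;
    }

  private:
    std::map<std::string, std::string> ordered_;
    std::unordered_map<std::string, std::string*> hash_;
};
```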
Tree Objects
Use create_temporary_tree() when you want an empty tree that is not yet
attached to any top-level root. It returns a tree. Edit that tree by starting
a write transaction from it.
```cpp
auto profile = ws->create_temporary_tree();
auto build_profile = ws->start_write_transaction(
    std::move(profile),
    psitri::tx_mode::expect_success);
build_profile.upsert("name", "Alice");
build_profile.upsert("timezone", "America/Chicago");
profile = build_profile.get_tree();  // reassigns the moved-from handle
```
The detached tree can be stored under a key:
```cpp
auto users = ws->start_transaction(0, psitri::tx_mode::expect_success);
users.upsert_subtree("user:alice:profile", std::move(profile));
users.commit();
```
Use std::move() when you do not need the handle afterward. Moving avoids the
extra reference-count retain/release traffic that a copy would perform.
Alternatively, a detached tree can become a top-level root:
```cpp
auto index = ws->create_temporary_tree();
auto index_edit = ws->start_write_transaction(std::move(index),
                                              psitri::tx_mode::expect_success);
index_edit.upsert("email:alice@example.com", "user:alice");
index_edit.upsert("email:bob@example.com", "user:bob");
index = index_edit.get_tree();
ws->set_root(7, std::move(index));
```
Tree handles can also come from existing roots or subtrees. This stores the same tree identity somewhere else; it is not a deep copy of every key/value pair.
```cpp
auto rs = db->start_read_session();

// Retains the root snapshot visible to this read session.
psitri::tree root_snapshot = rs->get_root(2);

auto archive = ws->start_transaction(7, psitri::tx_mode::expect_success);
archive.upsert_subtree("snapshots/root-2", root_snapshot);
archive.commit();
```
If the same snapshot should be installed in more than one place, copy the
tree handle or pass the same handle again. Copying a tree copies the
underlying smart pointer. It points at the same tree identity; it is not a deep
copy.
```cpp
auto root_snapshot  = rs->get_root(2);
auto another_handle = root_snapshot;  // copies the tree smart pointer

auto tx = ws->start_transaction(7, psitri::tx_mode::expect_success);
tx.upsert_subtree("copy-a", root_snapshot);
tx.upsert_subtree("copy-b", another_handle);
tx.commit();
```
Contract:
- A tree object is a copyable retained smart pointer to a tree identity.
- Copying a tree retains the same tree identity. It is a shallow smart-pointer copy, not a deep data copy.
- A tree created by create_temporary_tree() is detached. It is not attached to any top-level root slot until it is stored.
- A detached tree is edited through start_write_transaction(tree, mode).
- start_write_transaction(tree) has no commit target. It never publishes to a root slot or parent key by itself.
- get_tree() on a write transaction returns the transaction's current tree. That returned tree can then be stored with upsert_subtree() or set_root().
- A detached tree is not visible to readers until it is stored with upsert_subtree() on a transaction that eventually commits to a top-level root, or stored directly with set_root().
- Passing a tree to upsert_subtree() or set_root() stores a retained copy of that tree identity.
- Moving a tree into upsert_subtree() or set_root() stores the same tree identity while avoiding extra reference-count traffic on the caller's handle.
- commit() is not meaningful for a write transaction that was opened from a tree, because there is no root slot or parent key to publish to.
- Abort or destruction releases an unstored detached tree and discards an unfinished write transaction.
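These handle rules follow std::shared_ptr semantics, so they can be checked in isolation. The sketch below models tree as std::shared_ptr<const tree_data>, where tree_data and parent_tree are hypothetical stand-ins and not PsiTri types. Copying a handle raises the reference count and shares the identity. Moving it transfers the reference without changing the count.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for a tree's contents and its retained handle.
using tree_data = std::map<std::string, std::string>;
using tree      = std::shared_ptr<const tree_data>;

// Hypothetical parent that stores subtrees by identity, never by deep copy.
struct parent_tree
{
    std::map<std::string, tree> subtrees;

    // Taking the handle by value lets the caller choose copy or move.
    void upsert_subtree(const std::string& key, tree t) { subtrees[key] = std::move(t); }
};
```

Neither store in the test deep-copies tree_data. Both keys end up referring to one object, which is also why storing a tree inside itself is dangerous.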
Cycle footgun:
- A stored subtree is a reference to a tree identity, not an owned deep copy.
- Do not store a tree inside itself.
- Do not store a tree inside one of its descendants.
- Do not build an indirect cycle such as A contains B, B contains C, and C contains A.
- PsiTri may not be able to detect all cycles cheaply. Code that creates cycles can leak storage because reference counts never drain.
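Stored subtrees are reference-counted identities, so a cycle behaves exactly like a std::shared_ptr cycle. The sketch below uses a hypothetical node type. After the last outside handle is dropped, a std::weak_ptr shows that both objects are still alive, because each one keeps the other's count above zero.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical node that stores children by shared identity, like a stored subtree.
struct node
{
    std::map<std::string, std::shared_ptr<node>> children;
};

// Builds A -> B -> A and returns only a weak observer of A.
std::weak_ptr<node> make_cycle()
{
    auto a = std::make_shared<node>();
    auto b = std::make_shared<node>();
    a->children["b"] = b;
    b->children["a"] = a;  // closes the cycle
    return a;
}  // a and b go out of scope, but each still holds the other

// Same shape without the back edge.
std::weak_ptr<node> make_chain()
{
    auto a = std::make_shared<node>();
    a->children["b"] = std::make_shared<node>();
    return a;
}  // last handle dropped: both nodes are freed
```

In the test, the cycle stays alive until one edge is removed by hand. Nothing in reference counting alone would ever remove it.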
These shapes are legal only when the application knows they do not create a cycle:
```cpp
auto tree = tx.get_subtree("key");
tx.upsert_subtree("key2", tree);

auto root = rs->get_root(2);
tx.upsert_subtree("data", root);
```
To make an independent deep copy, iterate the source tree into a new detached tree, then store that tree in its destination.
Speculative Transaction
Use expect_failure when abort is common or the transaction may do only a few
writes. The API contract is the same: reads still see writes made by the
transaction.
```cpp
auto tx = ws->start_transaction(0, psitri::tx_mode::expect_failure);
tx.upsert("session:42", "pending");

// Buffered writes are still visible to reads in the same transaction.
bool found = tx.get("session:42", [](psitri::value_view value) {
    assert(value == "pending");
});
assert(found);

if (request_is_valid())
    tx.commit();
else
    tx.abort();
```
Contract:
- expect_failure may buffer writes before touching the persistent tree.
- Point reads and range cursors must still read buffered writes.
- Commit produces the same visible result as expect_success.
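One way to meet this contract is an overlay. Buffered writes, including tombstones for removes, sit in front of the committed tree, and every read checks the buffer first. The sketch below assumes a hypothetical write_buffer over a std::map that stands in for the committed tree. PsiTri's actual buffering may differ.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

using kv_map = std::map<std::string, std::string>;

// Hypothetical speculative transaction: `base` is untouched until commit().
class write_buffer
{
  public:
    explicit write_buffer(kv_map& base) : base_(base) {}

    void upsert(const std::string& k, std::string v) { pending_[k] = std::move(v); }
    void remove(const std::string& k) { pending_[k] = std::nullopt; }  // tombstone

    // Reads see buffered writes first, then the committed state.
    std::optional<std::string> get(const std::string& k) const
    {
        if (auto it = pending_.find(k); it != pending_.end())
            return it->second;  // buffered value or tombstone
        if (auto it = base_.find(k); it != base_.end())
            return it->second;
        return std::nullopt;
    }

    // Applies the buffer; the visible result matches writing directly.
    void commit()
    {
        for (auto& [k, v] : pending_)
        {
            if (v)
                base_[k] = std::move(*v);
            else
                base_.erase(k);
        }
        pending_.clear();
    }

    void abort() { pending_.clear(); }  // the committed state never changed

  private:
    kv_map& base_;
    std::map<std::string, std::optional<std::string>> pending_;
};
```

The sketch covers point reads only. A range cursor would merge pending_ and the base map in key order and skip tombstones.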
Current-State Cursors
Use cursor() on a write transaction when you want to read or iterate the
transaction's current write state.
```cpp
auto tx = ws->start_transaction(0, psitri::tx_mode::expect_success);
tx.upsert("user:001", "Alice");
tx.upsert("user:002", "Bob");

{
    auto cur = tx.cursor();
    for (cur.lower_bound("user:"); !cur.is_end(); cur.next())
    {
        if (!cur.key().starts_with("user:"))
            break;
        cur.get_value([&](psitri::value_view value) {
            process_user(cur.key(), value);
        });
    }
}  // cursor scope ends before the next mutation

tx.upsert("user:003", "Carol");
tx.commit();
```
Contract:
- tx.cursor() does not create a snapshot and does not pin the root.
- The cursor sees committed data plus uncommitted writes in the transaction.
- Creating and destroying current-state cursors is cheap after warmup.
- A current-state cursor is valid only until the transaction mutates, commits, aborts, or is destroyed.
Do not mutate while a current-state cursor is live:
```cpp
auto cur = tx.cursor();
cur.lower_bound("user:");
tx.upsert("user:004", "Dana");  // invalid: cur is still live
```
The implementation must catch this in debug builds, and may return an error in checked release APIs. End the cursor scope before mutating:
```cpp
{
    auto cur = tx.cursor();
    cur.lower_bound("user:");
    read_some_keys(cur);
}
tx.upsert("user:004", "Dana");  // ok
```
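A debug check for this rule can be as small as a live-cursor counter on the transaction. The sketch below assumes hypothetical checked_tx and cursor_guard types. Creating a cursor increments the counter and destroying it decrements the counter. Any mutation while the counter is non-zero throws rather than silently invalidating the iteration. The contract allows a debug-only assertion; the sketch throws in all builds for simplicity.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical transaction that rejects writes while current-state cursors are live.
class checked_tx
{
  public:
    class cursor_guard
    {
      public:
        explicit cursor_guard(checked_tx& tx) : tx_(tx) { ++tx_.live_cursors_; }
        ~cursor_guard() { --tx_.live_cursors_; }
        cursor_guard(const cursor_guard&)            = delete;
        cursor_guard& operator=(const cursor_guard&) = delete;

        // Reads the current write state directly; no snapshot is taken.
        const std::map<std::string, std::string>& view() const { return tx_.data_; }

      private:
        checked_tx& tx_;
    };

    cursor_guard cursor() { return cursor_guard(*this); }  // C++17 guaranteed elision

    void upsert(const std::string& k, const std::string& v)
    {
        if (live_cursors_ != 0)
            throw std::logic_error("mutation while a current-state cursor is live");
        data_[k] = v;
    }

  private:
    std::map<std::string, std::string> data_;
    int live_cursors_ = 0;
};
```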
Snapshot Cursors
Use snapshot_cursor() only when you need a stable view that survives later
mutation.
```cpp
auto tx = ws->start_transaction(0, psitri::tx_mode::expect_success);
tx.upsert("k1", "before");

auto snapshot = tx.snapshot_cursor();
tx.upsert("k1", "after");

assert(snapshot.get<std::string>("k1") == std::optional<std::string>{"before"});
assert(tx.get<std::string>("k1") == std::optional<std::string>{"after"});  // copies
```
Read-only sessions also use explicit snapshot naming: read_session::snapshot_cursor(root) opens a read-only cursor over a root's committed state.
Contract:
- snapshot_cursor() pins a retained tree view.
- A snapshot cursor remains stable across later writes.
- A snapshot cursor may force later writes to copy shared nodes.
- Snapshot cursor creation is explicit in the API name.
- Retained tree snapshot ownership is explicit in the return type: get_root() and get_subtree() return tree, not cursor.
- A snapshot cursor is a read-only isolation handle. Use get_root() or get_subtree() when the application needs a retained tree handle.
- A subtree opened from a snapshot uses the subtree version stored in that snapshot.
Snapshot implications:
- A snapshot keeps the objects it can see alive until the snapshot is released.
- Writes after the snapshot may need to COW nodes that the snapshot still sees.
- Long-lived snapshots can delay reclamation and compaction.
- Use snapshots for stable reads, but release them promptly on write-heavy workloads.
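The write cost described above comes from copy-on-write. The sketch below assumes a hypothetical cow_tree whose state is held in a std::shared_ptr. A snapshot retains the current pointer, so the next write must copy the data because it is shared. PsiTri copies individual nodes rather than a whole map, but the trigger is the same. Note that use_count() is only a reliable signal in single-threaded code like this sketch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

using kv_map = std::map<std::string, std::string>;

// Hypothetical copy-on-write tree: snapshots share state until a write happens.
class cow_tree
{
  public:
    cow_tree() : state_(std::make_shared<kv_map>()) {}

    // A snapshot is just another reference to the current state.
    std::shared_ptr<const kv_map> snapshot() const { return state_; }

    void upsert(const std::string& k, const std::string& v)
    {
        if (state_.use_count() > 1)  // shared with a snapshot: copy before writing
        {
            state_ = std::make_shared<kv_map>(*state_);
            ++copies_;
        }
        (*state_)[k] = v;
    }

    std::string get(const std::string& k) const { return state_->at(k); }
    int copies() const { return copies_; }

  private:
    std::shared_ptr<kv_map> state_;
    int copies_ = 0;
};
```

Only the first write after the snapshot pays for a copy. Holding the snapshot longer does not cause more copies in this model, but it keeps the old state alive, which is the reclamation cost noted above.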
There are two different ways to copy from a snapshot.
To preserve the exact tree snapshot without deep-copying every key/value pair,
move a tree handle:
```cpp
auto rs = db->start_read_session();
auto old_root = rs->get_root(0);

auto tx = ws->start_transaction(7, psitri::tx_mode::expect_success);
tx.upsert_subtree("archived-root-0", std::move(old_root));  // same tree identity
tx.commit();
```
To create an independent deep copy, iterate the snapshot into a new detached tree:
```cpp
auto rs = db->start_read_session();
auto old_profile = rs->get_root(0).get_subtree("user:alice:profile");
auto old_profile_cur = old_profile.cursor();

auto copy = ws->create_temporary_tree();
auto copy_edit = ws->start_write_transaction(std::move(copy),
                                             psitri::tx_mode::expect_success);
for (old_profile_cur.seek_begin(); !old_profile_cur.is_end(); old_profile_cur.next())
{
    // Copies each value into the new, independent tree.
    old_profile_cur.get_value([&](psitri::value_view value) {
        copy_edit.upsert(old_profile_cur.key(), value);
    });
}
copy = copy_edit.get_tree();

auto tx = ws->start_transaction(0, psitri::tx_mode::expect_success);
tx.upsert_subtree("user:bob:profile", std::move(copy));
tx.commit();
```
The first form stores another reference to the same snapshot tree. The second form creates a new tree with independent contents.
Nested Transactions
Use sub_transaction() when part of a larger transaction should be committed or
aborted independently before the parent commits.
```cpp
auto tx = ws->start_transaction(0, psitri::tx_mode::expect_success);
tx.upsert("order:100", "open");
{
    auto save = tx.sub_transaction();
    save.upsert("order:100:item:1", "book");
    save.upsert("order:100:item:2", "pen");
    save.commit();  // saves into tx, not into the database
}
assert(tx.get<std::string>("order:100:item:1") == std::optional<std::string>{"book"});
tx.commit();
```
Abort drops only the child work:
```cpp
auto tx = ws->start_transaction(0, psitri::tx_mode::expect_success);
tx.upsert("cart:1", "open");
{
    auto save = tx.sub_transaction();
    save.upsert("cart:1:coupon", "INVALID");
    save.abort();
}
assert(!tx.get<std::string>("cart:1:coupon"));
tx.commit();
```
Contract:
- A subtransaction is move-only and cannot outlive its parent.
- Committing a subtransaction saves its final state into the parent transaction.
- Aborting a subtransaction leaves the parent unchanged.
- Parent commit is still required to publish the final result.
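The subtransaction rules map onto a simple save/restore model. The sketch below assumes a hypothetical nested_tx that gives each child a private copy of its parent's state. Calling commit() writes the child's final state back into the parent, while abort, or destruction without commit, discards it. A real implementation would avoid copying the whole state; the sketch only shows the visible behavior.

```cpp
#include <cassert>
#include <map>
#include <string>

using kv_map = std::map<std::string, std::string>;

// Hypothetical transaction whose children work on a private copy of its state.
class nested_tx
{
  public:
    explicit nested_tx(kv_map& target) : target_(&target), state_(target) {}

    nested_tx(const nested_tx&)            = delete;
    nested_tx& operator=(const nested_tx&) = delete;
    nested_tx(nested_tx&&)                 = default;  // move-only, like a subtransaction

    nested_tx sub_transaction() { return nested_tx(state_); }

    void upsert(const std::string& k, const std::string& v) { state_[k] = v; }
    bool contains(const std::string& k) const { return state_.count(k) != 0; }

    // Saves into the parent; for the outermost transaction, "publishes".
    void commit() { *target_ = state_; }
    void abort() { state_ = *target_; }  // back to the parent's state

  private:
    kv_map* target_;  // parent state (or the database, for the outermost)
    kv_map  state_;
};
```

The raw pointer means the sketch does not enforce the "cannot outlive its parent" rule; PsiTri's contract requires it.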
Subtree Transactions
A subtree transaction edits a tree stored as a value under a parent key. On commit, the subtree transaction writes its final tree identity back to that key in the parent transaction.
```cpp
auto tx = ws->start_transaction(0, psitri::tx_mode::expect_success);
{
    auto profile = tx.subtree_transaction(
        "user:alice:profile",
        psitri::subtree_open::create_if_missing);
    profile.upsert("name", "Alice");
    profile.upsert("timezone", "America/Chicago");
    profile.commit();  // writes the new tree identity to one key in tx
}
assert(tx.is_subtree("user:alice:profile"));
tx.commit();
```
Open an existing subtree the same way:
```cpp
auto tx = ws->start_transaction(0, psitri::tx_mode::expect_success);
{
    auto profile = tx.subtree_transaction(
        "user:alice:profile",
        psitri::subtree_open::must_exist);
    profile.upsert("timezone", "UTC");
    profile.commit();
}
tx.commit();
```
Contract:
- A subtree transaction has the same read, cursor, subtransaction, and expect_success/expect_failure behavior as a top-root transaction.
- A subtree commit updates the parent transaction at exactly one key.
- A subtree abort leaves the parent key unchanged.
- A stored subtree is a full tree_id, not just a root address.
- A freshly created subtree can be built as a detached tree and then stored with upsert_subtree().
- Snapshot cursors that open a subtree get a read-only snapshot view using the subtree version stored in their snapshot.
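The one-key rule can be sketched with a parent map whose values hold tree identities. The sketch assumes hypothetical parent_map and subtree_tx types, with a subtree modeled as std::shared_ptr<const kv_map>. The subtree transaction edits a private copy. On commit, it replaces exactly one parent entry with a new identity. If it is destroyed without commit, that entry is left alone. A missing key starts empty, which mirrors create_if_missing.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

using kv_map     = std::map<std::string, std::string>;
using tree       = std::shared_ptr<const kv_map>;
using parent_map = std::map<std::string, tree>;  // hypothetical parent state

class subtree_tx
{
  public:
    subtree_tx(parent_map& parent, std::string key)
        : parent_(parent), key_(std::move(key))
    {
        auto it = parent_.find(key_);
        if (it != parent_.end() && it->second)
            edit_ = *it->second;  // start from the stored version
    }

    void upsert(const std::string& k, const std::string& v) { edit_[k] = v; }

    // Writes the new tree identity back to exactly one parent key.
    void commit() { parent_[key_] = std::make_shared<kv_map>(edit_); }

  private:
    parent_map& parent_;
    std::string key_;
    kv_map      edit_;
};
```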
Multi-Root Transactions
Use a multi-root transaction when one logical operation must update independent top-level roots atomically.
```cpp
#include <array>
#include <span>

std::array roots = {
    psitri::root_access{0, psitri::root_mode::write},  // users
    psitri::root_access{1, psitri::root_mode::write},  // email index
    psitri::root_access{2, psitri::root_mode::read},   // config
};
auto tx = ws->start_transaction(std::span{roots},
                                psitri::tx_mode::expect_success);

auto users = tx.root(0);
auto email = tx.root(1);
auto cfg   = tx.root(2);

users.upsert("user:42", user_record);
email.upsert("alice@example.com", "user:42");
cfg.get("signup_policy", [&](psitri::value_view policy) {
    apply_policy(policy);
});

tx.commit();  // publishes roots 0 and 1 atomically
```
Contract:
- Each top-level root has independent current write state.
- Write roots are published atomically by the transaction commit.
- Read roots are readable but not writable.
- Roots are locked in deterministic root-index order.
- A current-state cursor from one root is invalidated by writes to that same root. It is not invalidated by writes to independent roots.
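Locking roots in root-index order is the standard way to avoid deadlock when two transactions touch overlapping root sets. The sketch assumes a hypothetical root_locks table of std::mutex and a simplified root_request struct, which may differ from psitri::root_access. Requested roots are sorted by index before any lock is taken, so every transaction acquires shared roots in the same order. The sketch also assumes each root appears once per request.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical, simplified request; `write` would decide what commit() publishes.
struct root_request
{
    int  index;
    bool write;
};

class root_locks
{
  public:
    static constexpr int max_roots = 512;

    // Locks every requested root in ascending index order and returns that order.
    std::vector<int> lock_all(std::vector<root_request> roots)
    {
        std::sort(roots.begin(), roots.end(),
                  [](const root_request& a, const root_request& b) { return a.index < b.index; });
        std::vector<int> order;
        for (const auto& r : roots)
        {
            mutexes_[r.index].lock();
            order.push_back(r.index);
        }
        return order;
    }

    // Releases in reverse acquisition order.
    void unlock_all(const std::vector<int>& order)
    {
        for (auto it = order.rbegin(); it != order.rend(); ++it)
            mutexes_[*it].unlock();
    }

  private:
    std::array<std::mutex, max_roots> mutexes_;
};
```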
Choosing Cursor APIs
| Need | API | Copies value? | Pins root? | Sees uncommitted writes? |
|---|---|---|---|---|
| Hot point read inside writer | tx.get(key, lambda) | No | No | Yes |
| Owned point read | tx.get<std::string>(key) | Yes | No | Yes |
| Reused caller buffer | tx.get(key, &buffer) | Yes | No | Yes |
| Borrow view across a small local scope | auto pin = tx.pin_values() + cur.value(pin) | No | No | Yes |
| Iterate current writer state | tx.cursor() | No by default | No | Yes |
| Stable writer snapshot | tx.snapshot_cursor() | Optional per read | Yes | State at creation |
| Read-only snapshot | read_session::snapshot_cursor(root) | Optional per read | Yes | Committed state only |
| Movable root/subtree snapshot | get_root() / get_subtree() | No by default | Yes | State at handle creation |
| Build detached tree | write_session::create_temporary_tree() | No | No | Yes |
| MDBX read-write cursor | transaction current-state cursor | No by default | No | Yes |
| MDBX read-only cursor | snapshot cursor | No by default | Yes | Committed state only |
MDBX Shim Reference
The MDBX shim uses the same contract:
| MDBX operation | PsiTri behavior |
|---|---|
| mdbx_get() in a read-write transaction | Reads the current write state, including prior mdbx_put() and mdbx_del() calls in that transaction. |
| mdbx_cursor_open() in a read-write transaction | Acquires a pooled current-state cursor from the MDBX_txn. It does not open a PsiTri snapshot. |
| mdbx_cursor_get() in a read-write transaction | Iterates committed data plus uncommitted writes from the same transaction. |
| mdbx_put() / mdbx_del() | Invalidates current-state cursors opened earlier on the same DBI/root unless the write is performed through that cursor API and the shim explicitly repositions it. |
| mdbx_txn_commit() / mdbx_txn_abort() | Invalidates all cursors owned by that transaction and returns their storage to the transaction cursor pool. |
| mdbx_txn_begin(... MDBX_TXN_RDONLY ...) | Opens an explicit snapshot lifetime. Read-only cursors are snapshot cursors. |
| mdbx_txn_reset() / mdbx_txn_renew() | Releases snapshot ownership and cursor state, then reuses the transaction/session storage where possible. |
The shim must not accidentally convert read-write cursor opens into snapshot opens. Native MDBX users expect read-write transactions to read their own writes; PsiTri must preserve that behavior without pinning the current root on every short cursor session.
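The pooling behavior can be sketched without MDBX. In the shim, mdbx_cursor_open() takes a pooled cursor, and commit or abort invalidates every cursor and returns its storage. The sketch assumes a hypothetical cursor_pool that owns cursor_state objects and counts how many it allocates. After the first round of opens, later opens reuse pooled storage instead of allocating new cursor state.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical per-cursor storage (a real shim would keep position, DBI, etc.).
struct cursor_state
{
    int  dbi   = 0;
    bool valid = false;
};

// Hypothetical pool owned by a transaction.
class cursor_pool
{
  public:
    cursor_state* open(int dbi)
    {
        cursor_state* c;
        if (free_.empty())
        {
            storage_.push_back(std::make_unique<cursor_state>());  // warmup allocation
            ++allocations_;
            c = storage_.back().get();
        }
        else
        {
            c = free_.back();  // reuse pooled storage
            free_.pop_back();
        }
        c->dbi   = dbi;
        c->valid = true;
        live_.push_back(c);
        return c;
    }

    // Commit or abort: invalidate every cursor and return its storage to the pool.
    void end_transaction()
    {
        for (cursor_state* c : live_)
        {
            c->valid = false;
            free_.push_back(c);
        }
        live_.clear();
    }

    int allocations() const { return allocations_; }

  private:
    std::vector<std::unique_ptr<cursor_state>> storage_;
    std::vector<cursor_state*> free_;
    std::vector<cursor_state*> live_;
    int allocations_ = 0;
};
```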
Testable Promises
The implementation should have tests for each promise below:
- A write transaction get(key, lambda) sees its own insert, update, remove, and range remove before commit without copying the value.
- The zero-copy value view passed to get(key, lambda) is invalid after the lambda returns.
- Long-running zero-copy callbacks are documented as delaying compaction or recycling while the protected view is live.
- get<std::string>(key) copies into an owned temporary.
- Exact-key get() is the documented fast path when the caller knows the key; lower_bound() is for ordered positioning and range scans.
- update() is the documented fast path when the caller knows the key should exist; upsert() is the general insert-or-update path.
- A write transaction cursor() sees its own inserts and hides its own tombstones before commit.
- Repeated current-state cursor acquire/release allocates only during warmup.
- Creating a current-state cursor does not retain the root and does not force COW.
- Mutating a tree while one of its current-state cursors is live is caught.
- A snapshot cursor remains stable after later writes.
- A snapshot cursor can open a subtree and sees the subtree version stored in that snapshot.
- A subtree transaction commit upserts exactly one subtree tree_id into its parent key.
- A subtree transaction abort leaves the parent key unchanged.
- create_temporary_tree() can build a detached tree; copying it retains the same tree identity, while moving it into upsert_subtree() or set_root() avoids extra reference-count traffic.
- start_write_transaction(tree) can modify a tree without publishing it to a root slot or parent key.
- get_tree() on a write transaction returns the edited tree, and storing that returned tree is the explicit publish step.
- A detached tree is not visible to readers until it is stored in a transaction that commits to a top-level root or is installed with set_root().
- get_root() and get_subtree() return copyable retained tree objects that can be stored elsewhere.
- Copying or moving a snapshot tree into another root/subtree preserves the snapshot's tree identity; it does not deep-copy the contents.
- The docs and diagnostics clearly warn that storing trees cyclically can leak storage.
- A nested transaction commit saves into its parent; nested abort leaves the parent unchanged.
- A multi-root transaction publishes all write roots atomically.
- MDBX read-write mdbx_get() and mdbx_cursor_get() read their own writes.
- MDBX read-only transactions retain snapshot isolation.
- Creating more than 50 active application write sessions fails clearly or is rejected by configuration.
- Examples and benchmarks create write sessions once per worker thread, not once per operation.