CUBRID MVCC — Snapshot Construction, Active-MVCCID Tracking, and Vacuum Coordination
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Source verification (as of 2026-04-29)
- Beyond CUBRID — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Multiversion concurrency control (MVCC) keeps multiple timestamped versions of each record so that reads and writes do not have to block each other on the same row. Database Internals (Petrov, ch. 5) frames it as one of three families of concurrency control alongside optimistic (OCC) and pessimistic (PCC) schemes, distinguished by the property that “reads can continue accessing older values until the new ones are committed” — coordination is pushed down to visibility rather than mutual exclusion. The dominant isolation level built on top of MVCC is snapshot isolation (SI): each transaction takes a logical snapshot of the database at start, executes its queries against that snapshot, and only commits if the values it modified were not changed concurrently. SI prevents dirty reads, non-repeatable reads, phantoms (for the snapshot), and lost updates, but admits write skew — the canonical example being two transactions that each preserve a local invariant but jointly violate it (Database Internals §“Isolation Levels”, §“Multiversion Concurrency Control”; see also [FEKETE04], [HELLERSTEIN07]).
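Write skew is easier to see in a toy model than in prose. The sketch below is a deliberately tiny SI engine invented for this illustration (the names `si_db`, `si_tx`, `go_off_call` are hypothetical and unrelated to CUBRID): each transaction works on a private copy of the committed state, and commit applies first-committer-wins only to rows the transaction wrote. Two transactions that each check "someone else is still on call" and then update different rows both commit, and the invariant is broken.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy SI engine (illustration only): committed state plus a per-row
// record of which commit last wrote the row.
struct si_db {
  std::map<std::string, bool> on_call;          // committed state
  std::map<std::string, int> last_writer_seq;   // row -> commit sequence number
  int commit_seq = 0;
};

struct si_tx {
  std::map<std::string, bool> snapshot;   // private copy taken at start
  int start_seq;                          // commit sequence seen at start
  std::map<std::string, bool> writes;     // buffered writes
};

inline si_tx begin(const si_db &db) { return si_tx{db.on_call, db.commit_seq, {}}; }

// Application rule: go off call only if the snapshot shows another doctor on call.
inline void go_off_call(si_tx &tx, const std::string &who) {
  int others_on_call = 0;
  for (const auto &row : tx.snapshot) {
    if (row.first != who && row.second) ++others_on_call;
  }
  if (others_on_call > 0) tx.writes[who] = false;
}

// First-committer-wins: abort only if a row we wrote was committed by
// someone else after our snapshot. Rows we merely read are not checked.
inline bool commit(si_db &db, const si_tx &tx) {
  for (const auto &w : tx.writes) {
    const auto it = db.last_writer_seq.find(w.first);
    if (it != db.last_writer_seq.end() && it->second > tx.start_seq) return false;
  }
  ++db.commit_seq;
  for (const auto &w : tx.writes) {
    db.on_call[w.first] = w.second;
    db.last_writer_seq[w.first] = db.commit_seq;
  }
  return true;
}
```

Because the two write sets are disjoint, the conflict check never fires; only a read-aware mechanism (predicate tracking or locks) can catch this anomaly.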
Two implementation choices follow from the SI model and shape every MVCC engine:
- How to identify versions and decide visibility. Each transaction gets a monotonically increasing identifier. A version is visible to a snapshot iff its inserter committed before the snapshot was taken and it has no deleter that committed before the snapshot was taken (a deleter that is still in flight, or that commits later, does not hide the version). The set of “active at snapshot time” transaction IDs is therefore the central data structure.
- How to reclaim dead versions. Once a row version is older than the oldest snapshot held by any live transaction, it is unreachable and can be vacuumed. The “oldest visible MVCCID” is the global low-water mark that gates this reclamation. A long-running write transaction holds this watermark down and stalls vacuum — a structural limitation of MVCC noted in the textbook and visible in PostgreSQL, MySQL, and CUBRID alike.
CUBRID implements snapshot isolation with monotonically incremented MVCCIDs, an in-memory table of currently active MVCCIDs, and a separate vacuum process. The rest of this document traces how each piece is realized in the source.
Common DBMS Design
The textbook gives the model; this section names the engineering
conventions that almost every SI/MVCC engine — PostgreSQL, Oracle,
InnoDB, SQL Server, CUBRID — adopts in some form. CUBRID’s specific
choices in §“CUBRID’s Approach” are best read as one set of dials
within this shared design space, not as inventions. The picture sits
naturally next to its two siblings: a Lock Manager enforcing
write/write (and, under stricter isolation, read/write) serialization,
and a Vacuum process reclaiming versions no live snapshot can
reach. The three legs share one language — the MVCCID — and three
rendezvous points: the per-record header, the per-transaction
snapshot, and the global “oldest visible” watermark.
Per-record version metadata
Every row carries enough information to answer “is this version
visible to me?” without consulting a central registry on each read.
The minimum stamp is (inserted_by, deleted_by) plus a pointer to the
previous version. PostgreSQL keeps xmin / xmax inline on the heap
page; Oracle and InnoDB push the prior version into undo segments and
store an undo locator on the live row. The choice cascades into
garbage-collection cost — see In-place vs out-of-place below.
Active set as snapshot
At snapshot acquisition the engine captures which transactions are still in flight. The naive representation is a sorted list of in-flight IDs; visibility becomes a binary search. Real systems compress with three layers, common across engines:
- A bit array over a sliding window of recent IDs (O(1) probe).
- An overflow list for outliers — long-running transactions whose IDs have aged out of the window.
- Cached low / high-watermark scalars that short-circuit the common case before any structure is touched.
The window size is the central knob — too small and outliers dominate, too large and copying the snapshot itself is the bottleneck.
Reclamation watermark
Versions older than the lowest live snapshot’s lower bound are
unreachable and may be reclaimed. The single global “oldest visible”
MVCCID is the watermark; reclamation (PostgreSQL VACUUM, Oracle UNDO
trim, CUBRID vacuum_master) is gated by it. Every SI engine carries
the same structural cost: one long-running write transaction pins the
watermark and stalls reclamation regardless of how many shorter
transactions have completed.
In-place vs out-of-place old versions
- In-place (PostgreSQL): old versions live next to the current row on the heap. Pro: simple read path. Con: bloat, the need for HOT updates and a heavyweight scan-everything vacuum.
- Out-of-place (Oracle, MySQL InnoDB, CUBRID): old versions live in a separate area — undo segments, or the redo / undo log — and the current row carries a pointer (LSN, undo locator). Pro: the heap stays compact, the log is already structured for reclamation. Con: reading an older version costs an extra indirection.
The choice cascades into vacuum complexity, recovery semantics, and the shape of the version chain.
Snapshot acquisition timing is the isolation knob
Most SI engines reuse the same MVCC machinery across isolation levels and vary only the snapshot acquisition timing:
- Read Committed — a fresh snapshot per statement.
- Repeatable Read / SI — one snapshot at transaction start.
- Serializable — SI alone admits write skew. Two production responses: predicate locking (PostgreSQL SSI) or a fallback to lock-based serialization on writes (the route CUBRID appears to take, with the lock manager carrying this load; see cubrid-lock-manager.md, and note that the write-skew path is still listed under “Open questions” below).
Theory ↔ CUBRID mapping
The textbook concepts of §“Theoretical Background” map to CUBRID’s
named entities as follows. §“CUBRID’s Approach” is the slow zoom
into each row.
| Theory | CUBRID name |
|---|---|
| Per-version timestamp | MVCCID — 64-bit counter, lazily issued on first write |
| Inserter / deleter stamps in record | mvcc_rec_header.mvcc_ins_id, mvcc_del_id |
| Old-version chain (out-of-place) | mvcc_rec_header.prev_version_lsa → log-resident copy |
| “Active at snapshot time” set | mvcc_active_tran — bit array + long-tran overflow array |
| Per-transaction snapshot | mvcc_snapshot — active set + low/high MVCCID scalars |
| Visibility predicate | mvcc_satisfies_snapshot — 3-valued result |
| Global registry | mvcctable + m_trans_status_history[2048] ring |
| Oldest visible watermark | mvcctable::m_oldest_visible (atomic) |
CUBRID’s Approach
CUBRID instantiates the conventions above with three moving parts: a
global mvcctable that owns the active-set bookkeeping, a
per-transaction mvcc_info hung off the transaction descriptor, and
a separate vacuum process that reads the global watermark. The
distinguishing choices are: (1) lazy MVCCID issuance — only writers
consume IDs; (2) the active set is encoded as a bit array plus an
overflow list, not a sorted set; (3) snapshot construction is
lock-free against the commit path, validated by per-slot atomic
versions in a 2048-slot history ring.

Figure 1 — The big picture: a central mvcc_trans_status registry
on the left, transactions in the middle each carrying their own
snapshot and MVCCID, and the user-visible table on the right being
read or written. The arrows mark the three core operations: snapshot
creation against the registry, MVCCID deactivation on commit, and
the per-snapshot read against the table. (Source: original mvcc
analysis deck, slide 5.)
How an MVCC operation flows
flowchart LR
A["transaction begin"] --> B{"first write?\n(DDL / DML)"}
B -- "yes" --> C["mvcctable::get_new_mvccid\n→ assign MVCCID"]
B -- "no (read-only)" --> D
C --> D["statement runs"]
D --> E{"need a snapshot?\n(by isolation level)"}
E -- "yes" --> F["mvcctable::build_mvcc_info\n→ lock-free copy of active set"]
E -- "no" --> G
F --> G["per-record visibility:\nmvcc_satisfies_snapshot"]
G --> H["commit / rollback"]
H --> I["mvcctable::complete_mvcc\n→ flip bit, publish ring slot,\nmaybe advance lowest_active"]
I --> J["vacuum reclaims versions\n< m_oldest_visible"]
Each labeled box is unpacked in the subsections below. The boxes do not move; only the level of detail increases.
Component overview
flowchart LR
subgraph TX["per-transaction state (log_tdes)"]
MI["mvcc_info\n• id (own MVCCID)\n• snapshot\n• recent_lowest_active\n• sub_ids"]
end
subgraph TBL["global mvcctable (log_Gl.mvcc_table)"]
CUR["m_current_trans_status\n(live mvcc_active_tran)"]
HIST["m_trans_status_history[2048]\n(cyclic ring of past states)"]
OV["m_oldest_visible\n(vacuum watermark)"]
LV["m_transaction_lowest_visible_mvccids[]\n(per-tran snapshot floor)"]
end
subgraph LOG["active log volume header"]
NEXT["log_Gl.hdr.mvcc_next_id\n(MVCCID counter)"]
end
VAC[("vacuum master")]
MI -- build_mvcc_info --> HIST
CUR -- complete_mvcc / get_new_mvccid --> HIST
CUR -- get_new_mvccid --> NEXT
LV -- update_global_oldest_visible --> OV
OV --> VAC
MVCCID assignment policy
- An MVCCID is allocated lazily, on the transaction’s first write operation (DDL or DML). Read-only transactions never receive an MVCCID and so do not consume the active-set capacity.
- Exactly one MVCCID per write transaction. Subsequent writes in the same transaction reuse it; sub-transactions get separate IDs.
- The MVCCID counter itself is not owned by the MVCC table — it lives in the active log volume header.
```cpp
// mvcctable::get_new_mvccid — src/transaction/mvcc_table.cpp
MVCCID
mvcctable::get_new_mvccid ()
{
  MVCCID id;

  m_new_mvccid_lock.lock ();
  id = log_Gl.hdr.mvcc_next_id;
  MVCCID_FORWARD (log_Gl.hdr.mvcc_next_id);
  m_new_mvccid_lock.unlock ();

  return id;
}
```

The dedicated m_new_mvccid_lock keeps MVCCID issuance off the hot
m_active_trans_mutex path; the comment in mvcc_table.hpp notes that
this could in principle be replaced with atomic operations.
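The assignment policy can be sketched in isolation. The block below is a minimal model, not CUBRID code: `mvccid_issuer_sketch`, `atomic_issuer_sketch`, `tx_sketch`, and the `MVCCID_NONE` marker are hypothetical names, and the counter is a plain member rather than a log-header field. It shows the mutex-guarded counter, the atomic variant that the header comment alludes to, and lazy assignment on first write.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

using mvccid_t = std::uint64_t;
constexpr mvccid_t MVCCID_NONE = 0;   // hypothetical "no ID assigned yet" marker

// Counter guarded by its own small mutex, mirroring the excerpt's structure.
class mvccid_issuer_sketch {
public:
  explicit mvccid_issuer_sketch(mvccid_t first) : next_(first) {}
  mvccid_t get_new() {
    std::lock_guard<std::mutex> guard(lock_);
    return next_++;
  }
private:
  std::mutex lock_;
  mvccid_t next_;
};

// The same contract with an atomic counter instead of a mutex.
class atomic_issuer_sketch {
public:
  explicit atomic_issuer_sketch(mvccid_t first) : next_(first) {}
  mvccid_t get_new() { return next_.fetch_add(1); }
private:
  std::atomic<mvccid_t> next_;
};

// Lazy assignment: only the first write consumes an ID; readers never do.
struct tx_sketch {
  mvccid_t id = MVCCID_NONE;
  mvccid_t id_for_write(mvccid_issuer_sketch &issuer) {
    if (id == MVCCID_NONE) id = issuer.get_new();
    return id;
  }
};
```

A transaction that never calls `id_for_write` keeps `MVCCID_NONE`, which is the property that lets read-only transactions stay out of the active set.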
Per-record header
Every heap and index record carries an mvcc_rec_header. The flag byte
controls which optional fields are physically present, so unused MVCC
slots cost zero bytes:
```c
// mvcc_rec_header — src/transaction/mvcc.h
struct mvcc_rec_header
{
  INT32 mvcc_flag:8;            /* MVCC flags */
  INT32 repid:24;               /* representation id */
  int chn;                      /* cache coherency number */
  MVCCID mvcc_ins_id;           /* MVCC insert id */
  MVCCID mvcc_del_id;           /* MVCC delete id */
  LOG_LSA prev_version_lsa;     /* log address of previous version */
};
```

flowchart TB
R0["row v0 (current in heap)\nins_id = 7, del_id = 12\nprev_version_lsa → log entry A"]
R1["row v_-1 (in log)\nins_id = 3, del_id = 7\nprev_version_lsa → log entry B"]
R2["row v_-2 (in log)\nins_id = 0, del_id = 3\nprev_version_lsa = NULL"]
R0 --> R1 --> R2
Older versions are not kept inline next to the current version (as in
PostgreSQL) — they are reachable via prev_version_lsa chained back
into the log, where vacuum and recovery can find them.
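A reader that finds the heap version too new follows the chain backwards. The sketch below models that walk with the three versions from the diagram above. Everything here is a simplification for illustration: the log is an in-memory map keyed by a made-up LSA integer, and visibility uses a single hypothetical "horizon" (every ID below it is treated as committed before the snapshot) instead of a real snapshot.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

using mvccid_t = std::uint64_t;
using lsa_t = std::int64_t;                      // stand-in for LOG_LSA
constexpr lsa_t NULL_LSA = -1;
constexpr mvccid_t NO_DELETER = ~mvccid_t{0};    // hypothetical marker

enum class verdict { satisfied, too_old, too_new };

struct row_version {
  mvccid_t ins_id;
  mvccid_t del_id;
  lsa_t prev_version_lsa;
};

using version_log = std::unordered_map<lsa_t, row_version>;

// Simplified visibility: IDs below `horizon` committed before the snapshot.
inline verdict judge(const row_version &v, mvccid_t horizon) {
  if (v.ins_id >= horizon) return verdict::too_new;
  if (v.del_id != NO_DELETER && v.del_id < horizon) return verdict::too_old;
  return verdict::satisfied;
}

// Start at the heap version; step to the previous version only while the
// current one is too new. Stop on the first satisfied or too-old verdict.
inline std::optional<row_version> find_visible(const row_version &heap_row,
                                               const version_log &log,
                                               mvccid_t horizon) {
  const row_version *v = &heap_row;
  for (;;) {
    const verdict result = judge(*v, horizon);
    if (result == verdict::satisfied) return *v;
    if (result == verdict::too_old || v->prev_version_lsa == NULL_LSA) return std::nullopt;
    const auto it = log.find(v->prev_version_lsa);
    if (it == log.end()) return std::nullopt;   // older version no longer available
    v = &it->second;
  }
}
```

With the diagram's chain, a horizon of 5 lands on the middle version (inserted by 3, superseded by 7), a horizon of 2 lands on the oldest, and a horizon of 20 sees the row as already deleted.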
Active-MVCCID tracking — mvcc_active_tran
The distinctive piece in CUBRID is how the set of currently active MVCCIDs
is encoded. A naive std::set<MVCCID> would dominate the cost of
building snapshots and of the visibility check. CUBRID uses a bit
array + overflow array hybrid:
```cpp
// mvcc_active_tran (private members) — src/transaction/mvcc_active_tran.hpp
private:
  using unit_type = std::uint64_t;

  static const size_t BITAREA_MAX_SIZE = 500;   // 500 * 64 = 32k MVCCIDs
  static const unit_type ALL_ACTIVE = 0;
  static const unit_type ALL_COMMITTED = (unit_type) -1;

  /* bit area to store MVCCIDS status - size BITAREA_MAX_SIZE */
  unit_type *m_bit_area;
  /* first MVCCID whose status is stored in bit area */
  volatile MVCCID m_bit_area_start_mvccid;
  /* the area length expressed in bits */
  volatile size_t m_bit_area_length;

  /* long time transaction mvccid array */
  MVCCID *m_long_tran_mvccids;
  volatile size_t m_long_tran_mvccids_length;
```

Each bit represents one MVCCID; 0 = active, 1 = completed (committed
or rolled back). The bit at offset i covers
m_bit_area_start_mvccid + i. Default cap is 500 units = 32 000
MVCCIDs of recent history. IDs that fall behind that window because
their owners are still running get evicted into the sorted
m_long_tran_mvccids array, sized by max_transactions.
flowchart LR
subgraph BA["m_bit_area (LSB → MSB by MVCCID)"]
direction LR
U0["unit 0\n0011 0111 ..."]
U1["unit 1\n1111 1111 ..."]
U2["unit 2\n0110 0110 ..."]
DOTS["..."]
UN["unit ≤ 499"]
U0 --> U1 --> U2 --> DOTS --> UN
end
START["m_bit_area_start_mvccid\n(MVCCID at unit-0 LSB)"] --> U0
LT["m_long_tran_mvccids[]\n(sorted, MVCCIDs older than start)"]
BA -. "evicted" .-> LT

Figure 2 — Bit-array layout: 500 units × 64 bits = 32 000 MVCCIDs
of recent history. m_bit_area_start_mvccid anchors unit 0’s LSB,
so MVCCID = start + bit_offset. Bit value 0 = active, 1 =
completed. (Source: deck slide 12.)
Visibility check (mvcc_active_tran::is_active): if the queried MVCCID
predates m_bit_area_start_mvccid, scan the long-transaction array;
else look up the bit. Common case (recently issued IDs) is O(1).
When an MVCCID is completed, the bit is flipped. If the bit area’s
prefix has many fully-completed units (ALL_COMMITTED), those units
are LTRIM-ed (m_bit_area_start_mvccid advances). If the area then
still exceeds LONG_TRAN_THRESHOLD, residual still-active IDs are
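migrated into the long-transaction array. Two scalars short-circuit
common queries: the highest completed MVCCID and the lowest active
MVCCID, produced by compute_highest_completed_mvccid and
compute_lowest_active_mvccid and cached rather than recomputed per probe.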
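The probe, completion, trim, and migration steps described above fit in a small model. The class below is a sketch under simplifying assumptions, not the CUBRID implementation: it uses `std::vector` instead of raw arrays, has no capacity limit or thresholds (the caller decides when to evict), and the method names `track`, `complete`, and `evict_first_unit` are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using mvccid_t = std::uint64_t;
constexpr std::uint64_t UNIT_ALL_COMMITTED = ~std::uint64_t{0};

class active_set_sketch {
public:
  explicit active_set_sketch(mvccid_t start) : start_(start) {}

  // Give `id` a bit in the area; new bits start at 0 (= active).
  void track(mvccid_t id) {
    assert(id >= start_);
    const std::size_t offset = static_cast<std::size_t>(id - start_);
    if (offset >= length_) {
      length_ = offset + 1;
      units_.resize((length_ + 63) / 64, 0);
    }
  }

  bool is_active(mvccid_t id) const {
    if (id < start_) {   // aged out of the window: scan the overflow array
      return std::find(long_tran_.begin(), long_tran_.end(), id) != long_tran_.end();
    }
    const std::size_t offset = static_cast<std::size_t>(id - start_);
    if (offset >= length_) return true;   // no bit recorded yet, so not completed
    return ((units_[offset / 64] >> (offset % 64)) & 1u) == 0;
  }

  void complete(mvccid_t id) {
    if (id < start_) {   // a long transaction finishing
      long_tran_.erase(std::remove(long_tran_.begin(), long_tran_.end(), id), long_tran_.end());
      return;
    }
    track(id);
    const std::size_t offset = static_cast<std::size_t>(id - start_);
    units_[offset / 64] |= std::uint64_t{1} << (offset % 64);
    // LTRIM: drop leading units in which all 64 IDs have completed.
    while (!units_.empty() && units_.front() == UNIT_ALL_COMMITTED) drop_first_unit();
  }

  // Move the still-active IDs of a full first unit into the overflow array.
  void evict_first_unit() {
    if (length_ < 64) return;   // only a fully tracked unit can be evicted
    for (std::size_t bit = 0; bit < 64; ++bit) {
      if (((units_.front() >> bit) & 1u) == 0) long_tran_.push_back(start_ + bit);
    }
    drop_first_unit();
  }

  mvccid_t start() const { return start_; }

private:
  void drop_first_unit() {
    units_.erase(units_.begin());
    start_ += 64;
    length_ -= 64;
  }

  mvccid_t start_;
  std::size_t length_ = 0;             // area length in bits
  std::vector<std::uint64_t> units_;   // bit = 1 means completed
  std::vector<mvccid_t> long_tran_;    // ascending, all below start_
};
```

If ID 100 is the only straggler in the first unit, `evict_first_unit()` advances the window to 164 while `is_active(100)` keeps returning true from the overflow array, and every other ID below the new start reads as completed.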

Figure 3 — lowest_active_mvccid cached scalar. ○ = active,
● = completed. Everything strictly below the cached value is
known-completed without probing the bit area, which short-circuits
the common case in mvcc_is_id_in_snapshot. (Source: deck slide 19.)
MVCC table — mvcctable
The MVCC table is the global registry. The relevant private members:

```cpp
// mvcctable (private members) — src/transaction/mvcc_table.hpp
class mvcctable
{
    /* ... public API ... */
  private:
    static const size_t HISTORY_MAX_SIZE = 2048;   // must be a power of 2
    static const size_t HISTORY_INDEX_MASK = HISTORY_MAX_SIZE - 1;

    lowest_active_mvccid_type *m_transaction_lowest_visible_mvccids;
    size_t m_transaction_lowest_visible_mvccids_size;
    lowest_active_mvccid_type m_current_status_lowest_active_mvccid;

    mvcc_trans_status m_current_trans_status;
    std::atomic<size_t> m_trans_status_history_position;
    mvcc_trans_status *m_trans_status_history;

    std::mutex m_new_mvccid_lock;
    std::mutex m_active_trans_mutex;

    std::atomic<MVCCID> m_oldest_visible;
    std::atomic<size_t> m_ov_lock_count;
};
```

Two mutexes split the contention:
m_new_mvccid_lock for monotonic ID issuance,
m_active_trans_mutex for the active-set transitions. The history ring
plus an atomic version counter on each slot lets readers (snapshot
builders) operate lock-free.
Snapshot construction — build_mvcc_info
```cpp
// mvcctable::build_mvcc_info — src/transaction/mvcc_table.cpp
// (lock-free retry loop, condensed)
while (true)
  {
    snapshot_retry_count++;
    /* ... set transaction's lowest_visible to MVCCID_ALL_VISIBLE,
     * then to crt_status_lowest_active, in this order ... */

    index = m_trans_status_history_position.load ();
    assert (index < HISTORY_MAX_SIZE);

    const mvcc_trans_status &trans_status = m_trans_status_history[index];

    trans_status_version = trans_status.m_version.load ();
    trans_status.m_active_mvccs.copy_to (tdes.mvccinfo.snapshot.m_active_mvccs,
                                         mvcc_active_tran::copy_safety::THREAD_UNSAFE);

    /* ... load global stats ... */

    if (trans_status_version == trans_status.m_version.load ())
      {
        // no version change; copying status was successful
        break;
      }
    else
      {
        // a failed copy may break data validity
        tdes.mvccinfo.snapshot.m_active_mvccs.reset_active_transactions ();
      }
  }

tdes.mvccinfo.recent_snapshot_lowest_active_mvccid = crt_status_lowest_active;
tdes.mvccinfo.snapshot.snapshot_fnc = mvcc_satisfies_snapshot;
tdes.mvccinfo.snapshot.lowest_active_mvccid = crt_status_lowest_active;
tdes.mvccinfo.snapshot.highest_completed_mvccid = highest_completed_mvccid;
tdes.mvccinfo.snapshot.valid = true;
```

This is the lock-free read pattern in full. The reader picks up the
ring index, reads the slot’s m_version before and after copying
the active-set, and retries if a writer touched the slot in between.
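The version-validated copy can be reduced to a single slot to make the retry visible. The sketch below is not the CUBRID code: it drops the ring, stores the payload in atomics so the model is free of data races, and swaps in the classic seqlock convention (odd version = write in progress), which the excerpt above does not use. The `interfere` hook is a test device, hypothetical and for illustration only, that lets a single-threaded run simulate a writer landing between the copy and the re-check.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

struct versioned_slot {
  std::atomic<std::uint64_t> version{0};                 // even = stable, odd = being written
  std::array<std::atomic<std::uint64_t>, 4> words{};     // stand-in for the active-set bits
};

// Writer: mark unstable, rewrite the payload, mark stable again.
inline void publish(versioned_slot &slot, std::uint64_t value) {
  slot.version.fetch_add(1);                             // version becomes odd
  for (auto &w : slot.words) w.store(value);
  slot.version.fetch_add(1);                             // version becomes even
}

// Reader: copy, then accept the copy only if the version did not move
// and no write was in progress. Returns the number of attempts.
inline int copy_stable(const versioned_slot &slot,
                       std::array<std::uint64_t, 4> &out,
                       std::function<void()> interfere = nullptr) {
  int attempts = 0;
  for (;;) {
    ++attempts;
    const std::uint64_t before = slot.version.load();
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = slot.words[i].load();
    if (interfere && attempts == 1) interfere();         // simulated concurrent writer
    const std::uint64_t after = slot.version.load();
    if (before == after && before % 2 == 0) return attempts;
  }
}
```

An undisturbed copy succeeds on the first attempt; a copy that overlaps a `publish` is discarded and repeated, which is the event the retry counter in the excerpt measures.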
sequenceDiagram
participant TX as Transaction
participant TBL as mvcctable
participant RING as history[pos]
participant SNAP as tx.snapshot
TX->>TBL: build_mvcc_info(tdes)
loop until version stable
TBL->>RING: pos = m_trans_status_history_position.load()
TBL->>RING: v1 = ring[pos].m_version
TBL->>SNAP: copy active_mvccs (bit_area + long_tran)
TBL->>RING: v2 = ring[pos].m_version
alt v1 == v2
note right of TBL: stable copy — break
else changed
TBL->>SNAP: reset_active_transactions
note right of TBL: retry — perfmon counts retries
end
end
TBL->>SNAP: lowest_active = crt_status_lowest_active
TBL->>SNAP: highest_completed = computed from copy
TBL->>SNAP: snapshot_fnc = mvcc_satisfies_snapshot
TBL->>SNAP: valid = true
The snapshot is read-only after construction. SI guarantees that queries within the same snapshot see the same set of committed versions.
Snapshot acquisition timing is driven by the isolation level, set
by the executor before calling logtb_get_mvcc_snapshot:
| Isolation level | When snapshot is taken |
|---|---|
| READ COMMITTED (4) | Before each statement that touches existing rows |
| REPEATABLE READ (5) | Once, at transaction start |
| SERIALIZABLE (6) | Once, at transaction start |
Even at READ COMMITTED, statements that do not access existing data
(CREATE, plain DROP, TRUNCATE not implemented as DELETE) skip
snapshot acquisition.
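The timing table can be expressed as a small per-statement hook. This is a sketch of the policy only: `tx_timing_sketch` and `before_statement` are hypothetical names, the real decision is made by the callers of logtb_get_mvcc_snapshot, and the model takes the once-per-transaction snapshot lazily at the first statement that needs one.

```cpp
#include <cassert>

// Numeric values follow the isolation-level table above.
enum class isolation_level { read_committed = 4, repeatable_read = 5, serializable = 6 };

struct tx_timing_sketch {
  explicit tx_timing_sketch(isolation_level l) : level(l) {}
  isolation_level level;
  bool snapshot_valid = false;
  int snapshots_built = 0;   // how many times a snapshot was constructed
};

// Hypothetical hook run before each statement.
inline void before_statement(tx_timing_sketch &tx, bool touches_existing_rows) {
  if (!touches_existing_rows) return;                    // e.g. CREATE: no snapshot needed
  if (tx.level == isolation_level::read_committed) {
    tx.snapshot_valid = false;                           // force a fresh snapshot per statement
  }
  if (!tx.snapshot_valid) {
    ++tx.snapshots_built;
    tx.snapshot_valid = true;
  }
}
```

Running the same three statements under READ COMMITTED and REPEATABLE READ shows the difference directly: the first builds one snapshot per row-touching statement, the second builds exactly one.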
Visibility predicate — mvcc_satisfies_snapshot
The predicate has two top-level branches (deleted vs. not-deleted) and returns one of three verdicts:

```c
// mvcc_satisfies_snapshot — src/transaction/mvcc.c (condensed)
MVCC_SATISFIES_SNAPSHOT_RESULT
mvcc_satisfies_snapshot (THREAD_ENTRY * thread_p, MVCC_REC_HEADER * rec_header, MVCC_SNAPSHOT * snapshot)
{
  if (!MVCC_IS_HEADER_DELID_VALID (rec_header))
    {
      /* Record is not deleted */
      if (!MVCC_IS_FLAG_SET (rec_header, OR_MVCC_FLAG_VALID_INSID))
        return SNAPSHOT_SATISFIED;      /* visible to all */
      else if (MVCC_IS_REC_INSERTED_BY_ME (...))
        return SNAPSHOT_SATISFIED;      /* my own insert */
      else if (MVCC_IS_REC_INSERTER_IN_SNAPSHOT (...))
        return TOO_NEW_FOR_SNAPSHOT;    /* inserter active or
                                         * committed after snap */
      else
        return SNAPSHOT_SATISFIED;      /* committed before */
    }
  else
    {
      /* Record is deleted */
      if (MVCC_IS_REC_DELETED_BY_ME (...))
        return TOO_OLD_FOR_SNAPSHOT;    /* I deleted it */
      else if (MVCC_IS_REC_INSERTER_IN_SNAPSHOT (...))
        return TOO_NEW_FOR_SNAPSHOT;    /* inserter still active */
      else if (MVCC_IS_REC_DELETER_IN_SNAPSHOT (...))
        return SNAPSHOT_SATISFIED;      /* deleter active /
                                         * committed-after-snap */
      else
        return TOO_OLD_FOR_SNAPSHOT;    /* deleter committed
                                         * before snap */
    }
}
```

flowchart TD
A["record header"] --> B{"deleted?\n(DELID flag valid)"}
B -- "no" --> C{"VALID_INSID flag?"}
C -- "no" --> R1["SNAPSHOT_SATISFIED\n(visible to all)"]
C -- "yes" --> D{"inserted by me?"}
D -- "yes" --> R2["SNAPSHOT_SATISFIED"]
D -- "no" --> E{"inserter in snapshot's\nactive set?"}
E -- "yes" --> R3["TOO_NEW_FOR_SNAPSHOT\n→ walk prev_version_lsa"]
E -- "no" --> R4["SNAPSHOT_SATISFIED"]
B -- "yes" --> F{"deleted by me?"}
F -- "yes" --> R5["TOO_OLD_FOR_SNAPSHOT"]
F -- "no" --> G{"inserter in snapshot's\nactive set?"}
G -- "yes" --> R6["TOO_NEW_FOR_SNAPSHOT"]
G -- "no" --> H{"deleter in snapshot's\nactive set?"}
H -- "yes" --> R7["SNAPSHOT_SATISFIED\n(deleter not yet visible)"]
H -- "no" --> R8["TOO_OLD_FOR_SNAPSHOT"]
The “is the inserter/deleter in the snapshot’s active set?” check drills into the bit-array fast path:
```c
// mvcc_is_id_in_snapshot — src/transaction/mvcc.c
STATIC_INLINE bool
mvcc_is_id_in_snapshot (THREAD_ENTRY * thread_p, MVCCID mvcc_id, MVCC_SNAPSHOT * snapshot)
{
  if (MVCC_ID_PRECEDES (mvcc_id, snapshot->lowest_active_mvccid))
    return false;               /* certainly committed before snap */

  if (MVCC_ID_FOLLOW_OR_EQUAL (mvcc_id, snapshot->highest_completed_mvccid))
    return true;                /* certainly active or future */

  return snapshot->m_active_mvccs.is_active (mvcc_id);  /* probe bit area */
}
```

The two scalar bounds (lowest_active, highest_completed) eliminate
the bit-array probe for the bulk of MVCCIDs.
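A worked example with concrete IDs shows both the short-circuits and the decision tree. The model below mirrors the two condensed excerpts but is not the CUBRID code: `snapshot_sketch` keeps the active set as a plain vector, `NO_ID` is a hypothetical stand-in for an absent insert or delete ID, and the upper bound is assumed to be exclusive (one past the highest completed ID), which is what makes the `>=` comparison safe. That last point is an assumption of this sketch, not something verified against the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using mvccid_t = std::uint64_t;
constexpr mvccid_t NO_ID = 0;   // hypothetical "field not present" marker

enum class verdict { satisfied, too_old, too_new };

struct snapshot_sketch {
  mvccid_t my_id;                  // the reading transaction's own MVCCID
  mvccid_t lowest_active;          // everything below is completed
  mvccid_t upper_bound;            // assumed exclusive: everything at or above is unseen
  std::vector<mvccid_t> active;    // IDs in flight when the snapshot was taken

  bool contains(mvccid_t id) const {
    if (id < lowest_active) return false;   // short-circuit 1
    if (id >= upper_bound) return true;     // short-circuit 2
    return std::find(active.begin(), active.end(), id) != active.end();
  }
};

inline verdict satisfies(const snapshot_sketch &s, mvccid_t ins_id, mvccid_t del_id) {
  if (del_id == NO_ID) {                                  // record is not deleted
    if (ins_id == NO_ID || ins_id == s.my_id) return verdict::satisfied;
    return s.contains(ins_id) ? verdict::too_new : verdict::satisfied;
  }
  if (del_id == s.my_id) return verdict::too_old;         // deleted by the reader itself
  if (ins_id != NO_ID && s.contains(ins_id)) return verdict::too_new;
  return s.contains(del_id) ? verdict::satisfied : verdict::too_old;
}
```

For a reader with MVCCID 40 and snapshot `{20, 31, 40}` bounded by 20 and 41: a row inserted by 10 is visible without touching the active set, a row inserted by 50 is too new for the same reason, and only IDs between the bounds (25, 31) require the probe.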

Figure 4 — Visibility worked example. Three concurrent transactions
(A: snapshot {18, 19, 30}; B: snapshot {19, 30, 32}, MVCCID 32;
C: snapshot {19, 30, 32, 34}, MVCCID 34) reading four record versions
on the left. The colored circles on the right enumerate which
snapshots see which version. Note the asymmetry of insert vs. delete
visibility — record (ins=18, del=32) is visible to A (deleter not
yet committed at A’s snapshot), to B (the deleter itself), and not
visible to C (deleter committed before C’s snapshot). (Source: deck
slide 26.)
Commit / rollback — complete_mvcc
The commit path is where the active-set state and the history ring advance together:

```cpp
// mvcctable::complete_mvcc — src/transaction/mvcc_table.cpp (condensed)
void
mvcctable::complete_mvcc (int tran_index, MVCCID mvccid, bool committed)
{
  std::unique_lock<std::mutex> ulock (m_active_trans_mutex);

  mvcc_trans_status::version_type next_version;
  size_t next_index;
  mvcc_trans_status &next_status = next_trans_status_start (next_version, next_index);

  /* ... stats update if committed ... */

  // update current trans status
  m_current_trans_status.m_active_mvccs.set_inactive_mvccid (mvccid);
  m_current_trans_status.m_last_completed_mvccid = mvccid;
  m_current_trans_status.m_event_type = committed ? COMMIT : ROLLBACK;

  // finish next trans status (publish to ring)
  next_tran_status_finish (next_status, next_index);

  /* ... bookkeeping for vacuum's lowest_visible array ... */

  ulock.unlock ();

  // advance lowest_active outside the lock when warranted
  MVCCID global_lowest_active = m_current_status_lowest_active_mvccid;
  if (global_lowest_active == mvccid
      || MVCC_ID_PRECEDES (mvccid, next_status.m_active_mvccs.get_bit_area_start_mvccid ()))
    {
      MVCCID new_lowest_active = next_status.m_active_mvccs.compute_lowest_active_mvccid ();
      if (next_status.m_version.load () == next_version)
        advance_oldest_active (new_lowest_active);
    }
}
```

Sequence for a single commit/rollback:
sequenceDiagram
participant TX as Transaction
participant CUR as m_current_trans_status
participant RING as history ring
participant LV as lowest_visible[tran_index]
TX->>CUR: lock m_active_trans_mutex
CUR->>RING: next_trans_status_start → reserve slot N+1, bump version
TX->>CUR: m_active_mvccs.set_inactive_mvccid(mvccid)
TX->>CUR: m_last_completed = mvccid — event_type = COMMIT or ROLLBACK
CUR->>RING: next_tran_status_finish → copy CUR into slot, store position
TX->>LV: if committed clamp to mvccid — if rollback set MVCCID_NULL
TX->>CUR: unlock
opt mvccid was the lowest active
TX->>CUR: compute_lowest_active_mvccid + advance_oldest_active
end
m_trans_status_history_position is the atomic that snapshot readers
load — bumping it last is what makes the new state visible to them.
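The "write the slot, publish the position last" ordering can be shown with a much smaller ring. The class below is a sketch, not the CUBRID structure: `status_ring_sketch` is a hypothetical name, the ring has 8 slots instead of 2048, and each slot carries only a version and the last completed ID rather than a full active set.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

class status_ring_sketch {
public:
  static constexpr std::size_t SIZE = 8;              // power of two, so masking wraps the index
  static constexpr std::size_t INDEX_MASK = SIZE - 1;

  // Writers are serialized by a mutex, as in the commit path above.
  void complete(std::uint64_t mvccid) {
    std::lock_guard<std::mutex> guard(mutex_);
    const std::size_t next = (position_.load() + 1) & INDEX_MASK;
    slot &s = ring_[next];
    s.version.store(++version_counter_);   // a reader still copying the old occupant will fail validation
    s.last_completed.store(mvccid);        // fill the slot
    position_.store(next);                 // publish last: readers now start from this slot
  }

  std::uint64_t last_completed() const {
    return ring_[position_.load()].last_completed.load();
  }
  std::size_t position() const { return position_.load(); }

private:
  struct slot {
    std::atomic<std::uint64_t> version{0};
    std::atomic<std::uint64_t> last_completed{0};
  };

  std::array<slot, SIZE> ring_;
  std::atomic<std::size_t> position_{0};
  std::uint64_t version_counter_ = 0;      // guarded by mutex_
  std::mutex mutex_;
};
```

Nine completions on the 8-slot ring wrap the position back to index 1, and a reader that loads the position always lands on a fully written slot because the position store is the final step.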
Vacuum coordination
Vacuum cannot remove a version that is still visible to any live
snapshot. CUBRID’s vacuum master periodically calls
mvcctable::update_global_oldest_visible, which sweeps every
m_transaction_lowest_visible_mvccids[idx] plus the live
m_current_status_lowest_active_mvccid:
```cpp
// mvcctable::compute_oldest_visible_mvccid — src/transaction/mvcc_table.cpp (excerpt)
MVCCID lowest_active_mvccid = oldest_active_get (m_current_status_lowest_active_mvccid, 0,
                                                 oldest_active_event::GET_OLDEST_ACTIVE);

for (size_t idx = 0; idx < m_transaction_lowest_visible_mvccids_size; idx++)
  {
    loaded_tran_mvccid = oldest_active_get (m_transaction_lowest_visible_mvccids[idx], idx,
                                            oldest_active_event::GET_OLDEST_ACTIVE);
    if (loaded_tran_mvccid == MVCCID_ALL_VISIBLE)
      {
        waiting_mvccids_pos.append (idx);       /* re-check later */
      }
    else if (loaded_tran_mvccid != MVCCID_NULL
             && MVCC_ID_PRECEDES (loaded_tran_mvccid, lowest_active_mvccid))
      {
        lowest_active_mvccid = loaded_tran_mvccid;
      }
  }
```

The vacuum master then publishes the result into the atomic
m_oldest_visible, which is the single value the per-record
mvcc_satisfies_vacuum reads.
The well-known cost of this design: a single long-running write
transaction with a small MVCCID pins m_oldest_visible and prevents
vacuuming of any version newer than it, regardless of how many shorter
transactions have come and gone.
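The sweep reduces to a minimum over per-transaction floors, with two sentinel cases. The function below is a sketch of that reduction only: the sentinel values `ID_NULL` and `ID_ALL_VISIBLE` are placeholders chosen for the example, and instead of re-checking slots that are mid-snapshot (as the excerpt's waiting list does) it simply reports their indexes to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using mvccid_t = std::uint64_t;
constexpr mvccid_t ID_NULL = 0;          // placeholder: transaction holds no snapshot
constexpr mvccid_t ID_ALL_VISIBLE = 1;   // placeholder: snapshot construction in progress

struct watermark_result {
  mvccid_t oldest;                    // candidate for the global watermark
  std::vector<std::size_t> pending;   // slots that must be re-read before publishing
};

inline watermark_result compute_oldest(mvccid_t global_lowest_active,
                                       const std::vector<mvccid_t> &floors) {
  watermark_result result{global_lowest_active, {}};
  for (std::size_t idx = 0; idx < floors.size(); ++idx) {
    const mvccid_t floor = floors[idx];
    if (floor == ID_ALL_VISIBLE) {
      result.pending.push_back(idx);            // floor not known yet
    } else if (floor != ID_NULL && floor < result.oldest) {
      result.oldest = floor;                    // an older snapshot pins the watermark
    }
  }
  return result;
}
```

With a global lowest active ID of 18 and floors `{NULL, 13, 17, ALL_VISIBLE}`, the candidate watermark is 13 and slot 3 is reported as pending; a single floor of 13 holds the result there no matter how many newer transactions finish.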

Figure 5 — Vacuum watermark calculation. The 2048-slot history ring
holds three live versions v0/v1/v2 with per-version active-set
snapshots {10, 13, 17}, {13, 17}, {13, 17, 18}. The
m_transaction_lowest_visible_mvccids[] array gives each in-flight
transaction’s snapshot floor (MVCCID_NULL = transaction ended,
ignored). m_oldest_visible is the minimum of all live floors —
here 13 — and is what vacuum_master consults to decide which
versions are reclaimable. (Source: deck slide 30.)
Source Walkthrough
Anchor on symbol names, not line numbers. The CUBRID source moves; a function name (or struct/enum tag) is the stable handle. Use
`git grep -n '<symbol>' src/transaction/` to locate the current position. The line numbers cited in this section were observed when the document was last updated and are intended only as quick hints.
Header definitions (src/transaction/)
- struct mvcc_rec_header (in mvcc.h) — on-record MVCC fields (flag byte, ins/del MVCCID, prev_version_lsa).
- enum mvcc_satisfies_snapshot_result (in mvcc.h) — the three visibility outcomes (SNAPSHOT_SATISFIED, TOO_OLD_FOR_SNAPSHOT, TOO_NEW_FOR_SNAPSHOT).
- struct mvcc_snapshot (in mvcc.h) — embedded m_active_mvccs plus lowest_active / highest_completed scalars.
- struct mvcc_info (in mvcc.h) — per-active-transaction MVCC state, hung off log_tdes.
- struct mvcc_trans_status (in mvcc_table.hpp) — one slot in the history ring; the live status is the same type.
- class mvcctable (in mvcc_table.hpp) — global table with the history ring and the two mutexes.
- struct mvcc_active_tran (in mvcc_active_tran.hpp) — bit-array + long-tran active set.
Hot paths (src/transaction/)
- mvcctable::build_mvcc_info (in mvcc_table.cpp) — lock-free snapshot copy with version-validated retry.
- mvcctable::compute_oldest_visible_mvccid (in mvcc_table.cpp).
- mvcctable::is_active (in mvcc_table.cpp) — delegates to the embedded mvcc_active_tran.
- mvcctable::complete_mvcc (in mvcc_table.cpp) — commit/rollback hook; advances the history ring.
- mvcctable::get_new_mvccid (in mvcc_table.cpp) — issuance under m_new_mvccid_lock against log_Gl.hdr.mvcc_next_id.
- mvcctable::update_global_oldest_visible (in mvcc_table.cpp) — vacuum’s source of truth for m_oldest_visible.
Visibility evaluation (src/transaction/mvcc.c)
- mvcc_is_id_in_snapshot — low-water / high-water short-circuits followed by a bit-area probe.
- mvcc_is_active_id — per-tran fast path against recent_snapshot_lowest_active_mvccid.
- mvcc_satisfies_snapshot — the decision tree (deleted-or-not branch, then inserter/deleter visibility).
- mvcc_is_not_deleted_for_snapshot — DML “still-deletable” check.
- mvcc_satisfies_vacuum — vacuum’s per-record decision.
- mvcc_satisfies_delete — five-state classification at delete time (DELETE_RECORD_INSERT_IN_PROGRESS / _CAN_DELETE / _DELETED / _DELETE_IN_PROGRESS / _SELF_DELETED).
Position hints as of this revision
These line numbers held when the document was last updated. If you
land at a different definition, the symbol name above is authoritative;
update the table on your way through.
| Symbol | File | Line |
|---|---|---|
| struct mvcc_rec_header | mvcc.h | 38 |
| enum mvcc_satisfies_snapshot_result | mvcc.h | 159 |
| struct mvcc_snapshot | mvcc.h | 173 |
| struct mvcc_info | mvcc.h | 196 |
| struct mvcc_trans_status | mvcc_table.hpp | 40 |
| class mvcctable | mvcc_table.hpp | 64 |
| struct mvcc_active_tran | mvcc_active_tran.hpp | 31 |
| mvcctable::build_mvcc_info | mvcc_table.cpp | 226 |
| mvcctable::compute_oldest_visible_mvccid | mvcc_table.cpp | 355 |
| mvcctable::is_active | mvcc_table.cpp | 423 |
| mvcctable::complete_mvcc | mvcc_table.cpp | 465 |
| mvcctable::get_new_mvccid | mvcc_table.cpp | 566 |
| mvcctable::update_global_oldest_visible | mvcc_table.cpp | 617 |
| mvcc_is_id_in_snapshot | mvcc.c | 91 |
| mvcc_is_active_id | mvcc.c | 123 |
| mvcc_satisfies_snapshot | mvcc.c | 156 |
| mvcc_is_not_deleted_for_snapshot | mvcc.c | 280 |
| mvcc_satisfies_vacuum | mvcc.c | 321 |
| mvcc_satisfies_delete | mvcc.c | 389 |
Source verification (as of 2026-04-29)
Each entry is a fact about the current source — readable without the original analysis materials. The trailing note shows how it was checked and, where relevant, historical drift or limits of verification. Open questions follow as the curator’s recorded gaps; future readers should treat them as starting points, not as known bugs.
Verified facts
- The bit-area cap is BITAREA_MAX_SIZE = 500 units (= 32 000 MVCCIDs of recent history). Hard-coded in mvcc_active_tran.hpp; the migration thresholds (LONG_TRAN, CLEANUP) live alongside it in mvcc_active_tran.cpp. Not a runtime parameter — tuning requires a code change.
- The MVCCID counter is owned by the active log volume header (log_Gl.hdr.mvcc_next_id), not by the MVCC table. Confirmed in mvcctable::get_new_mvccid (mvcc_table.cpp). A dedicated m_new_mvccid_lock keeps issuance off the active-set hot path; the mvcc_table.hpp comment notes this could in principle be replaced with atomic ops.
- The MVCC layer only exposes build_mvcc_info; isolation-level snapshot-timing policy lives outside src/transaction/mvcc*.c. The “RC = per-statement, RR/SR = once at start” rule is enforced by the call sites of logtb_get_mvcc_snapshot in transaction-descriptor / xasl code. Auditing the per-statement RC behavior requires stepping outside this document’s scope.
- mvcc_rec_header carries three documented flag bits inside a 5-bit mask OR_MVCC_FLAG_MASK = 0x1f. Documented bits: VALID_INSID, VALID_DELID, VALID_PREV_VERSION. Two bits in the mask are currently unused — see Open Questions §1.
- Sub-transactions exist in code (mvcc_info::sub_ids, complete_sub_mvcc) and shape savepoint semantics. Present in mvcc.h and mvcc_table.cpp but not covered in the body of this document — a follow-up analysis is owed.
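The flag-mask fact, together with the earlier remark that unused header slots cost zero bytes, can be made concrete with a little arithmetic. The constants below are assumptions for illustration: only the mask value 0x1f and the three flag names come from the text, while the individual bit positions and the byte sizes (an 8-byte fixed part, 8 bytes per MVCCID, 8 bytes per LSA) are guesses that should be checked against the source before being relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed bit positions; the text gives only the flag names and the mask.
constexpr std::uint8_t FLAG_VALID_INSID = 0x01;
constexpr std::uint8_t FLAG_VALID_DELID = 0x02;
constexpr std::uint8_t FLAG_VALID_PREV_VERSION = 0x04;
constexpr std::uint8_t FLAG_MASK = 0x1f;          // from the text

// Assumed on-record sizes in bytes.
constexpr std::size_t FIXED_BYTES = 8;            // flag + repid word, plus chn
constexpr std::size_t MVCCID_BYTES = 8;
constexpr std::size_t LSA_BYTES = 8;

// Size of the header actually stored for a given flag byte:
// each optional field is present only when its flag is set.
constexpr std::size_t header_bytes(std::uint8_t flags) {
  return FIXED_BYTES
         + ((flags & FLAG_VALID_INSID) ? MVCCID_BYTES : 0)
         + ((flags & FLAG_VALID_DELID) ? MVCCID_BYTES : 0)
         + ((flags & FLAG_VALID_PREV_VERSION) ? LSA_BYTES : 0);
}

// Bits inside the mask that no documented flag uses.
constexpr std::uint8_t unused_mask_bits() {
  return static_cast<std::uint8_t>(
      FLAG_MASK & ~(FLAG_VALID_INSID | FLAG_VALID_DELID | FLAG_VALID_PREV_VERSION));
}
```

Under these assumptions a record with no flags set stores 8 header bytes and a fully populated one stores 32, and the leftover mask bits come out as 0x18, which is the pair of unused bits raised in the first open question.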
Open questions
1. Two unused bits in OR_MVCC_FLAG_MASK. Reserved for a planned feature (distributed MVCC? tombstone-without-deleter?), or just slack? Investigation path: trace the bit definitions through git history and search for any in-flight CBRD tickets touching the mask.
2. Saturation behavior of the 2048-slot history ring. Under what workload does HISTORY_MAX_SIZE = 2048 saturate, and what does CUBRID do when a snapshot’s source slot is overwritten before the snapshot is fully built? The atomic m_version validation in build_mvcc_info drives a retry loop (snapshot_retry_count), but the worst-case retry bound is unknown. Investigation path: instrument snapshot_retry_count under a contended workload.
3. Write-skew handling under SERIALIZABLE. SI alone admits write skew; PostgreSQL SSI detects-and-aborts, while CUBRID almost certainly falls back to lock-based serialization. Investigation path: trace the SERIALIZABLE write path through lock_object calls; cross-reference cubrid-lock-manager.md §“NON2PL” and §“Beyond CUBRID”.
Beyond CUBRID — Comparative Designs & Research Frontiers
Pointers, not analysis. Each bullet is a starting handle for a follow-up doc; depth here is intentionally shallow.
- PostgreSQL SSI — Serializable Snapshot Isolation (Cahill et al., SIGMOD 2008; Ports & Grittner, VLDB 2012) augments SI with predicate locking and dependency-graph cycle detection to catch write skew at commit time. CUBRID’s SERIALIZABLE relies on the lock manager instead. A side-by-side cost comparison would tell us what we trade by avoiding predicate locking.
- In-memory MVCC engines (HyPer, Hekaton, Cicada) redesign version chains for cache-aware in-memory layouts and often eliminate the central registry. CUBRID is disk-resident; the comparison is orthogonal but instructive for distinguishing costs that are intrinsic to MVCC from those intrinsic to disk-resident MVCC. Wu et al., An Empirical Evaluation of In-Memory Multi-Version Concurrency Control (VLDB 2017) surveys the design space.
- In-place vs out-of-place trade-off, measured. PostgreSQL’s bloat / HOT problem and CUBRID’s vacuum read-amplification are symmetric costs of the same underlying choice. Following the empirical literature here would let us put numbers on what we pay for the undo-log version chain.
- Concurrency control at high core counts. Yu et al., Staring into the Abyss (VLDB 2015) — a benchmark of seven CC protocols at 1 000 cores. Relevant if CUBRID is to scale beyond current per-server core counts.
- Hybrid OCC + MVCC. Modern engines often pair MVCC reads with
  optimistic write validation. Whether CUBRID’s NON2PL mechanism is
  a stepping stone toward this is itself an open question — see the
  cross-reference in cubrid-lock-manager.md §“Beyond CUBRID”.
The intent of this section is to seed next documents, not to analyze. Each bullet should become its own curated note when its turn comes.
Sources
Raw analyses (under raw/code-analysis/cubrid/storage/mvcc/)
- mvcc 코드 분석 ver 2.pdf (slide render; the title translates to “mvcc code analysis ver 2”)
- mvcc 코드 분석 ver 2.pptx (slide source — cleaner text extraction)
Textbook chapters (under knowledge/research/dbms-general/)
- Database Internals (Petrov), Ch. 5 “Transaction Processing and Recovery”, §“Multiversion Concurrency Control” (≈ line 4002), §“Isolation Levels” (≈ line 4136), §“Snapshot Isolation” (≈ line 11266 in the distributed-transactions chapter).
Notion (CUBRID DEV WIKI)
- Storage – Concurrency 코드 분석 (“Storage – Concurrency code analysis”) — module-level positioning of Lock Manager · MVCC · Vacuum on top of Heap Manager · Page Buffer; the source of the “three-leg” framing in §“Common DBMS Design”.
CUBRID source (under /data/hgryoo/references/cubrid/)
- src/transaction/mvcc.h
- src/transaction/mvcc.c
- src/transaction/mvcc_table.hpp
- src/transaction/mvcc_table.cpp
- src/transaction/mvcc_active_tran.hpp
- src/transaction/mvcc_active_tran.cpp