
CUBRID MVCC — Snapshot Construction, Active-MVCCID Tracking, and Vacuum Coordination


Multiversion concurrency control (MVCC) keeps multiple timestamped versions of each record so that reads and writes do not have to block each other on the same row. Database Internals (Petrov, ch. 5) frames it as one of three families of concurrency control alongside optimistic (OCC) and pessimistic (PCC) schemes, distinguished by the property that “reads can continue accessing older values until the new ones are committed” — coordination is pushed down to visibility rather than mutual exclusion. The dominant isolation level built on top of MVCC is snapshot isolation (SI): each transaction takes a logical snapshot of the database at start, executes its queries against that snapshot, and only commits if the values it modified were not changed concurrently. SI prevents dirty reads, non-repeatable reads, phantoms (for the snapshot), and lost updates, but admits write skew — the canonical example being two transactions that each preserve a local invariant but jointly violate it (Database Internals §“Isolation Levels”, §“Multiversion Concurrency Control”; see also [FEKETE04], [HELLERSTEIN07]).

Two implementation choices follow from the SI model and shape every MVCC engine:

  1. How to identify versions and decide visibility. Each transaction gets a monotonically increasing identifier. A version is visible to a snapshot iff its inserter committed before the snapshot was taken and its deleter, if any, had not committed by then (a transaction's own writes are the usual exception). The set of transaction IDs that were active at snapshot time is therefore the central data structure.
  2. How to reclaim dead versions. Once a version has been deleted by a transaction that committed before the oldest snapshot held by any live transaction, no snapshot can reach it and it can be vacuumed. The “oldest visible MVCCID” is the global low-water mark that gates this reclamation. A long-running transaction holds this watermark down and stalls vacuum, a structural limitation of MVCC noted in the textbook and visible in PostgreSQL, MySQL, and CUBRID alike.
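The visibility rule in item 1 can be made concrete with a small C++ sketch. It assumes a PostgreSQL-style snapshot made of a lower bound `xmin`, an upper bound `xmax`, and a sorted list of IDs still in flight between them, plus a sorted list of aborted IDs standing in for a commit-status table. All names are hypothetical, and a transaction's own uncommitted writes are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using TxnId = std::uint64_t;

// Snapshot: IDs below xmin had finished, IDs at or above xmax had not started,
// and in_flight lists (sorted) the IDs in between that were still running.
struct Snapshot {
  TxnId xmin;
  TxnId xmax;
  std::vector<TxnId> in_flight;
};

// True if `id` had committed before the snapshot was taken.
// `aborted` is a sorted stand-in for a commit-status table (hypothetical).
bool committed_before(TxnId id, const Snapshot& s, const std::vector<TxnId>& aborted) {
  if (id >= s.xmax) return false;  // started after the snapshot
  if (id >= s.xmin &&
      std::binary_search(s.in_flight.begin(), s.in_flight.end(), id))
    return false;                  // still running at snapshot time
  return !std::binary_search(aborted.begin(), aborted.end(), id);  // finished: committed unless aborted
}

struct Version {
  TxnId inserted_by;
  std::optional<TxnId> deleted_by;  // empty: never deleted
};

// Inserter committed before the snapshot, and no deleter committed before it.
bool visible(const Version& v, const Snapshot& s, const std::vector<TxnId>& aborted) {
  if (!committed_before(v.inserted_by, s, aborted)) return false;
  return !v.deleted_by || !committed_before(*v.deleted_by, s, aborted);
}
```

With a snapshot `{xmin = 10, xmax = 20, in_flight = {12, 15}}`, a version deleted by transaction 15 (still running) stays visible, while one deleted by transaction 8 (committed earlier) does not.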

CUBRID implements snapshot isolation with monotonically incremented MVCCIDs, an in-memory table of currently active MVCCIDs, and a separate vacuum process. The rest of this document traces how each piece is realized in the source.

Common DBMS Design

The textbook gives the model; this section names the engineering conventions that almost every SI/MVCC engine (PostgreSQL, Oracle, InnoDB, SQL Server, CUBRID) adopts in some form. CUBRID's specific choices in §“CUBRID's Approach” are best read as one set of dials within this shared design space, not as inventions. The picture sits naturally next to its two siblings: a Lock Manager enforcing write/write (and, under stricter isolation, read/write) serialization, and a Vacuum process reclaiming versions no live snapshot can reach. The three legs share one language, the MVCCID, and three rendezvous points: the per-record header, the per-transaction snapshot, and the global “oldest visible” watermark.

Every row carries enough information to answer “is this version visible to me?” without consulting a central registry on each read. The minimum stamp is (inserted_by, deleted_by) plus a link to the row's other versions. PostgreSQL keeps xmin / xmax inline in each heap tuple and links an updated tuple forward to its newer version; Oracle and InnoDB push the prior version into undo segments and store an undo locator on the live row. The choice cascades into garbage-collection cost (see In-place vs out-of-place below).

At snapshot acquisition the engine captures which transactions are still in flight. The naive representation is a sorted list of in-flight IDs; visibility becomes a binary search. Real systems compress with three layers, common across engines:

  • A bit array over a sliding window of recent IDs (O(1) probe).
  • An overflow list for outliers — long-running transactions whose IDs have aged out of the window.
  • Cached low / high-watermark scalars that short-circuit the common case before any structure is touched.

The window size is the central knob — too small and outliers dominate, too large and copying the snapshot itself is the bottleneck.
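A minimal sketch of the three layers, assuming a window that covers every ID from `window_start` up to the high watermark; the struct and field names are illustrative rather than taken from any particular engine. The order of checks mirrors the cost: two scalar comparisons first, the overflow list only for IDs that have aged out of the window, and an O(1) bit probe otherwise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using TxnId = std::uint64_t;

// Three-layer encoding of "in flight at snapshot time" (illustrative names).
struct ActiveSet {
  TxnId low_watermark;          // every ID below this had finished
  TxnId high_watermark;         // every ID at or above this had not started
  TxnId window_start;           // first ID covered by `window`
  std::vector<bool> window;     // window[i]: is window_start + i in flight?
  std::vector<TxnId> overflow;  // sorted in-flight IDs older than window_start

  // True if `id` must be treated as not yet committed for this snapshot.
  bool counts_as_active(TxnId id) const {
    if (id < low_watermark) return false;   // scalar short-circuit
    if (id >= high_watermark) return true;  // started after the snapshot
    if (id < window_start)                  // aged out of the window
      return std::binary_search(overflow.begin(), overflow.end(), id);
    TxnId offset = id - window_start;
    assert(offset < window.size());         // window spans up to high_watermark
    return window[offset];                  // O(1) probe
  }
};
```

Only long-running outliers pay the binary search; a small overflow list keeps that cost bounded, which is the trade-off the window size controls.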

Versions older than the lowest live snapshot’s lower bound are unreachable and may be reclaimed. The single global “oldest visible” MVCCID is the watermark; reclamation (PostgreSQL VACUUM, Oracle UNDO trim, CUBRID vacuum_master) is gated by it. Every SI engine carries the same structural cost: one long-running write transaction pins the watermark and stalls reclamation regardless of how many shorter transactions have completed.

In-place vs out-of-place

  • In-place (PostgreSQL): old versions live next to the current row on the heap. Pro: simple read path. Con: bloat, the need for HOT updates and a heavyweight scan-everything vacuum.
  • Out-of-place (Oracle, MySQL InnoDB, CUBRID): old versions live in a separate area — undo segments, or the redo / undo log — and the current row carries a pointer (LSN, undo locator). Pro: the heap stays compact, the log is already structured for reclamation. Con: reading an older version costs an extra indirection.

The choice cascades into vacuum complexity, recovery semantics, and the shape of the version chain.

Snapshot acquisition timing is the isolation knob


Most SI engines reuse the same MVCC machinery across isolation levels and vary only the snapshot acquisition timing:

  • Read Committed: a fresh snapshot per statement.
  • Repeatable Read / SI: one snapshot at transaction start.
  • Serializable: SI alone admits write skew. Two production responses: predicate locking (PostgreSQL SSI) or a fallback to lock-based serialization on writes. The latter appears to be CUBRID's choice, with the lock manager carrying this load (see the companion analysis and Open Questions §3).

The textbook concepts of §“Theoretical Background” map to CUBRID's named entities as follows; §“CUBRID's Approach” then zooms into each row in turn.

| Theory | CUBRID name |
|---|---|
| Per-version timestamp | MVCCID: 64-bit counter, lazily issued on first write |
| Inserter / deleter stamps in record | mvcc_rec_header.mvcc_ins_id, mvcc_del_id |
| Old-version chain (out-of-place) | mvcc_rec_header.prev_version_lsa → log-resident copy |
| “Active at snapshot time” set | mvcc_active_tran: bit array + long-tran overflow array |
| Per-transaction snapshot | mvcc_snapshot: active set + low/high MVCCID scalars |
| Visibility predicate | mvcc_satisfies_snapshot: 3-valued result |
| Global registry | mvcctable + m_trans_status_history[2048] ring |
| Oldest visible watermark | mvcctable::m_oldest_visible (atomic) |

CUBRID's Approach

CUBRID instantiates the conventions above with three moving parts: a global mvcctable that owns the active-set bookkeeping, a per-transaction mvcc_info hung off the transaction descriptor, and a separate vacuum process that reads the global watermark. The distinguishing choices are: (1) lazy MVCCID issuance, so only writers consume IDs; (2) the active set is encoded as a bit array plus an overflow list, not a sorted set; (3) snapshot construction is lock-free against the commit path, validated by per-slot atomic versions in a 2048-slot history ring.

Figure 1: Overall structure. The big picture: a central mvcc_trans_status registry on the left, transactions in the middle each carrying their own snapshot and MVCCID, and the user-visible table on the right being read or written. The arrows mark the three core operations: snapshot creation against the registry, MVCCID deactivation on commit, and the per-snapshot read against the table. (Source: original mvcc analysis deck, slide 5.)

flowchart LR
  A["transaction begin"] --> B{"first write?\n(DDL / DML)"}
  B -- "yes" --> C["mvcctable::get_new_mvccid\n→ assign MVCCID"]
  B -- "no (read-only)" --> D
  C --> D["statement runs"]
  D --> E{"need a snapshot?\n(by isolation level)"}
  E -- "yes" --> F["mvcctable::build_mvcc_info\n→ lock-free copy of active set"]
  E -- "no" --> G
  F --> G["per-record visibility:\nmvcc_satisfies_snapshot"]
  G --> H["commit / rollback"]
  H --> I["mvcctable::complete_mvcc\n→ flip bit, publish ring slot,\nmaybe advance lowest_active"]
  I --> J["vacuum reclaims versions\n< m_oldest_visible"]

Each labeled box is unpacked in the subsections below. The boxes do not move; only the level of detail increases.

flowchart LR
  subgraph TX["per-transaction state (log_tdes)"]
    MI["mvcc_info\n• id (own MVCCID)\n• snapshot\n• recent_lowest_active\n• sub_ids"]
  end

  subgraph TBL["global mvcctable (log_Gl.mvcc_table)"]
    CUR["m_current_trans_status\n(live mvcc_active_tran)"]
    HIST["m_trans_status_history[2048]\n(cyclic ring of past states)"]
    OV["m_oldest_visible\n(vacuum watermark)"]
    LV["m_transaction_lowest_visible_mvccids[]\n(per-tran snapshot floor)"]
  end

  subgraph LOG["active log volume header"]
    NEXT["log_Gl.hdr.mvcc_next_id\n(MVCCID counter)"]
  end

  VAC[("vacuum master")]

  MI -- build_mvcc_info --> HIST
  CUR -- complete_mvcc / get_new_mvccid --> HIST
  CUR -- get_new_mvccid --> NEXT
  LV -- update_global_oldest_visible --> OV
  OV --> VAC

  • An MVCCID is allocated lazily, on the transaction's first write operation (DDL or DML). Read-only transactions never receive an MVCCID and so do not consume active-set capacity.
  • Each write transaction gets exactly one MVCCID; subsequent writes in the same transaction reuse it. Sub-transactions receive separate IDs of their own.
  • The MVCCID counter itself is not owned by the MVCC table; it lives in the active log volume header.

```cpp
// mvcctable::get_new_mvccid, src/transaction/mvcc_table.cpp
MVCCID
mvcctable::get_new_mvccid ()
{
  MVCCID id;

  m_new_mvccid_lock.lock ();
  id = log_Gl.hdr.mvcc_next_id;               // hand out the current counter value
  MVCCID_FORWARD (log_Gl.hdr.mvcc_next_id);   // advance the counter in the log header
  m_new_mvccid_lock.unlock ();

  return id;
}
```

The dedicated m_new_mvccid_lock keeps MVCCID issuance off the hot m_active_trans_mutex path; the comment in mvcc_table.hpp notes that this could in principle be replaced with atomic operations.
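The header comment's suggestion can be sketched next to the mutex version. This is a hypothetical illustration, not CUBRID code: it assumes `MVCCID_FORWARD` behaves as a plain increment and keeps the counter in a `std::atomic` rather than in `log_Gl.hdr`. The `id_for_write` helper (also hypothetical) shows lazy issuance: a transaction gets an ID on its first write and reuses it afterwards, while a read-only transaction never asks.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

using MVCCID = std::uint64_t;

// Mutex-guarded issuance, the same shape as mvcctable::get_new_mvccid.
class MutexIssuer {
 public:
  explicit MutexIssuer(MVCCID first) : next_(first) {}
  MVCCID get_new() {
    std::lock_guard<std::mutex> guard(lock_);
    return next_++;  // assumes MVCCID_FORWARD is a plain increment
  }

 private:
  std::mutex lock_;
  MVCCID next_;
};

// Lock-free alternative hinted at in mvcc_table.hpp: a single fetch_add.
class AtomicIssuer {
 public:
  explicit AtomicIssuer(MVCCID first) : next_(first) {}
  MVCCID get_new() { return next_.fetch_add(1, std::memory_order_relaxed); }

 private:
  std::atomic<MVCCID> next_;
};

// Hypothetical per-transaction state; 0 is used here as "no MVCCID yet".
struct TxnMvccInfo {
  MVCCID id = 0;
};

// Lazy issuance: the first write allocates, later writes reuse the same ID.
MVCCID id_for_write(TxnMvccInfo& txn, AtomicIssuer& issuer) {
  if (txn.id == 0) txn.id = issuer.get_new();
  return txn.id;
}
```

Relaxed ordering suffices in this sketch because callers only need distinct, increasing values; the counter does not publish any other data.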

Every heap and index record carries an mvcc_rec_header. The flag byte controls which optional fields are physically present, so unused MVCC slots cost zero bytes:

```c
// mvcc_rec_header, src/transaction/mvcc.h
struct mvcc_rec_header
{
  INT32 mvcc_flag:8;          /* MVCC flags */
  INT32 repid:24;             /* representation id */
  int chn;                    /* cache coherency number */
  MVCCID mvcc_ins_id;         /* MVCC insert id */
  MVCCID mvcc_del_id;         /* MVCC delete id */
  LOG_LSA prev_version_lsa;   /* log address of previous version */
};
```

flowchart TB
  R0["row v0 (current in heap)\nins_id = 7, del_id = 12\nprev_version_lsa → log entry A"]
  R1["row v_-1 (in log)\nins_id = 3, del_id = 7\nprev_version_lsa → log entry B"]
  R2["row v_-2 (in log)\nins_id = 0, del_id = 3\nprev_version_lsa = NULL"]
  R0 --> R1 --> R2

Older versions are not kept inline next to the current version (as in PostgreSQL) — they are reachable via prev_version_lsa chained back into the log, where vacuum and recovery can find them.
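Reading an older version is then a walk down this chain. The sketch below simplifies under stated assumptions: a `LOG_LSA` is modelled as a single integer with 0 as the null address, the log is an in-memory map, and the visibility test is passed in as a callback. It follows `prev_version_lsa` only on a too-new verdict, stopping at the first visible version or on a too-old verdict (the row is already deleted for this snapshot). Type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>

using MVCCID = std::uint64_t;
using LSA = std::uint64_t;    // hypothetical: one integer instead of LOG_LSA
constexpr LSA NULL_LSA = 0;   // end of the chain

enum class Verdict { SATISFIED, TOO_OLD, TOO_NEW };

struct Version {
  MVCCID ins_id;
  LSA prev_version_lsa;
  int payload;  // stands in for the row image
};

// Walk back from the heap copy until the snapshot accepts a version.
std::optional<Version> find_visible(
    Version current, const std::unordered_map<LSA, Version>& log_area,
    const std::function<Verdict(const Version&)>& satisfies) {
  for (;;) {
    switch (satisfies(current)) {
      case Verdict::SATISFIED:
        return current;
      case Verdict::TOO_OLD:
        return std::nullopt;  // already deleted as far as this snapshot is concerned
      case Verdict::TOO_NEW:
        if (current.prev_version_lsa == NULL_LSA) return std::nullopt;  // no older version
        current = log_area.at(current.prev_version_lsa);  // one extra indirection per hop
        break;
    }
  }
}
```

Each hop costs a log read in the real system, which is the extra indirection the out-of-place design pays on the read path.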

Active-MVCCID tracking — mvcc_active_tran


The distinctive piece in CUBRID is how the set of currently active MVCCIDs is encoded. A naive std::set<MVCCID> would dominate the cost of building snapshots (every snapshot copies the set) and of the visibility check. CUBRID uses a bit array + overflow array hybrid:

```cpp
// mvcc_active_tran (private members), src/transaction/mvcc_active_tran.hpp
  private:
    using unit_type = std::uint64_t;

    static const size_t BITAREA_MAX_SIZE = 500;   // 500 * 64 = 32k MVCCIDs
    static const unit_type ALL_ACTIVE = 0;
    static const unit_type ALL_COMMITTED = (unit_type) -1;

    /* bit area to store MVCCIDS status - size BITAREA_MAX_SIZE */
    unit_type *m_bit_area;
    /* first MVCCID whose status is stored in bit area */
    volatile MVCCID m_bit_area_start_mvccid;
    /* the area length expressed in bits */
    volatile size_t m_bit_area_length;

    /* long time transaction mvccid array */
    MVCCID *m_long_tran_mvccids;
    volatile size_t m_long_tran_mvccids_length;
```

Each bit represents one MVCCID; 0 = active, 1 = completed (committed or rolled back). The bit at offset i covers m_bit_area_start_mvccid + i. Default cap is 500 units = 32 000 MVCCIDs of recent history. IDs that fall behind that window because their owners are still running get evicted into the sorted m_long_tran_mvccids array, sized by max_transactions.
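The offset arithmetic in that paragraph can be sketched as follows. The struct is a simplified, hypothetical stand-in for `mvcc_active_tran` (a `std::vector` instead of the raw `m_bit_area`, no `volatile` fields), and answering "active" for IDs past the tracked range is an assumption of the sketch, not a transcription of CUBRID's `is_active`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using MVCCID = std::uint64_t;
using unit_type = std::uint64_t;
constexpr std::size_t UNIT_BITS = 64;

// Simplified model of the layout: bit i (LSB-first within each unit) covers
// start + i; 0 = active, 1 = completed. Names are illustrative.
struct BitArea {
  MVCCID start;                   // plays the role of m_bit_area_start_mvccid
  std::size_t length_bits;        // plays the role of m_bit_area_length
  std::vector<unit_type> units;   // plays the role of m_bit_area
  std::vector<MVCCID> long_tran;  // plays the role of m_long_tran_mvccids, sorted

  bool is_active(MVCCID id) const {
    if (id < start)  // older than the window: only long transactions can still be active
      return std::binary_search(long_tran.begin(), long_tran.end(), id);
    const std::size_t offset = static_cast<std::size_t>(id - start);
    if (offset >= length_bits) return true;  // assumption: not yet completed
    const unit_type unit = units[offset / UNIT_BITS];
    const unit_type mask = unit_type{1} << (offset % UNIT_BITS);
    return (unit & mask) == 0;               // 0 = still active
  }
};
```

Because "completed" is encoded as 1, a unit equal to all ones (`ALL_COMMITTED`) can be recognized with a single comparison, which is what makes the prefix trimming described below cheap.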

flowchart LR
  subgraph BA["m_bit_area (LSB → MSB by MVCCID)"]
    direction LR
    U0["unit 0\n0011 0111 ..."]
    U1["unit 1\n1111 1111 ..."]
    U2["unit 2\n0110 0110 ..."]
    DOTS["..."]
    UN["unit ≤ 499"]
    U0 --> U1 --> U2 --> DOTS --> UN
  end
  START["m_bit_area_start_mvccid\n(MVCCID at unit-0 LSB)"] --> U0
  LT["m_long_tran_mvccids[]\n(sorted, MVCCIDs older than start)"]
  BA -. "evicted" .-> LT

Figure 2: Bit-array layout. 500 units × 64 bits = 32 000 MVCCIDs of recent history. m_bit_area_start_mvccid anchors unit 0's LSB, so MVCCID = start + bit_offset. Bit value 0 = active, 1 = completed. (Source: deck slide 12.)

Visibility check (mvcc_active_tran::is_active): if the queried MVCCID predates m_bit_area_start_mvccid, scan the long-transaction array; else look up the bit. Common case (recently issued IDs) is O(1).

When an MVCCID is completed, its bit is flipped. If the leading units of the bit area are entirely completed (ALL_COMMITTED), those units are LTRIM-ed and m_bit_area_start_mvccid advances past them. If the area then still exceeds LONG_TRAN_THRESHOLD, residual still-active IDs are migrated into the long-transaction array. Two cached scalars, produced by compute_lowest_active_mvccid and compute_highest_completed_mvccid, short-circuit common queries.
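The completion path can be sketched with a deque of 64-bit units, assuming the same bit convention (0 = active, 1 = completed). It flips the bit, trims leading `ALL_COMMITTED` units by advancing the start MVCCID, and recomputes the lowest active ID. The names are hypothetical, and the migration of lingering IDs into the long-transaction array is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

using MVCCID = std::uint64_t;
using unit_type = std::uint64_t;
constexpr std::size_t UNIT_BITS = 64;
constexpr unit_type ALL_COMMITTED = ~unit_type{0};

struct ActiveWindow {
  MVCCID start;                 // MVCCID covered by bit 0 of units.front()
  std::deque<unit_type> units;  // 0 = active, 1 = completed

  void set_inactive(MVCCID id) {
    assert(id >= start);  // older IDs would live in the long-transaction array
    std::size_t offset = static_cast<std::size_t>(id - start);
    while (offset / UNIT_BITS >= units.size()) units.push_back(0);  // grow with "active" bits
    units[offset / UNIT_BITS] |= unit_type{1} << (offset % UNIT_BITS);
    ltrim();
  }

  // Drop fully completed leading units and advance start past them.
  void ltrim() {
    while (!units.empty() && units.front() == ALL_COMMITTED) {
      units.pop_front();
      start += UNIT_BITS;
    }
  }

  // First MVCCID whose bit is still 0; everything below it is known-completed.
  MVCCID lowest_active() const {
    for (std::size_t u = 0; u < units.size(); ++u) {
      if (units[u] == ALL_COMMITTED) continue;
      for (std::size_t b = 0; b < UNIT_BITS; ++b)
        if ((units[u] & (unit_type{1} << b)) == 0) return start + u * UNIT_BITS + b;
    }
    return start + units.size() * UNIT_BITS;  // nothing active inside the window
  }
};
```

A single straggler keeps its whole unit alive: until ID 5 completes in the example below, unit 0 cannot be trimmed even though its other 63 IDs are done.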

Figure 3: lowest_active_mvccid cached scalar. ○ = active, ● = completed. Everything strictly below the cached value is known-completed without probing the bit area, which short-circuits the common case in mvcc_is_id_in_snapshot. (Source: deck slide 19.)

The MVCC table is the global registry. The relevant private members:

```cpp
// mvcctable (private members), src/transaction/mvcc_table.hpp
class mvcctable
{
  /* ... public API ... */

  private:
    static const size_t HISTORY_MAX_SIZE = 2048;   // must be a power of 2
    static const size_t HISTORY_INDEX_MASK = HISTORY_MAX_SIZE - 1;

    lowest_active_mvccid_type *m_transaction_lowest_visible_mvccids;
    size_t m_transaction_lowest_visible_mvccids_size;
    lowest_active_mvccid_type m_current_status_lowest_active_mvccid;

    mvcc_trans_status m_current_trans_status;
    std::atomic<size_t> m_trans_status_history_position;
    mvcc_trans_status *m_trans_status_history;

    std::mutex m_new_mvccid_lock;
    std::mutex m_active_trans_mutex;

    std::atomic<MVCCID> m_oldest_visible;
    std::atomic<size_t> m_ov_lock_count;
};
```

Two mutexes split the contention: m_new_mvccid_lock for monotonic ID issuance, m_active_trans_mutex for the active-set transitions. The history ring plus an atomic version counter on each slot lets readers (snapshot builders) operate lock-free.

```cpp
// mvcctable::build_mvcc_info, src/transaction/mvcc_table.cpp
// (lock-free retry loop, condensed)
while (true)
  {
    snapshot_retry_count++;

    /* ... set transaction's lowest_visible to MVCCID_ALL_VISIBLE,
     * then to crt_status_lowest_active, in this order ... */

    index = m_trans_status_history_position.load ();
    assert (index < HISTORY_MAX_SIZE);

    const mvcc_trans_status &trans_status = m_trans_status_history[index];

    trans_status_version = trans_status.m_version.load ();
    trans_status.m_active_mvccs.copy_to (tdes.mvccinfo.snapshot.m_active_mvccs,
                                         mvcc_active_tran::copy_safety::THREAD_UNSAFE);

    /* ... load global stats ... */

    if (trans_status_version == trans_status.m_version.load ())
      {
        // no version change; copying status was successful
        break;
      }
    else
      {
        // a failed copy may break data validity
        tdes.mvccinfo.snapshot.m_active_mvccs.reset_active_transactions ();
      }
  }

tdes.mvccinfo.recent_snapshot_lowest_active_mvccid = crt_status_lowest_active;

tdes.mvccinfo.snapshot.snapshot_fnc = mvcc_satisfies_snapshot;
tdes.mvccinfo.snapshot.lowest_active_mvccid = crt_status_lowest_active;
tdes.mvccinfo.snapshot.highest_completed_mvccid = highest_completed_mvccid;
tdes.mvccinfo.snapshot.valid = true;
```

This is the lock-free read pattern in full. The reader picks up the ring index, reads the slot’s m_version before and after copying the active-set, and retries if a writer touched the slot in between.
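The same pattern can be reproduced in standalone C++. This is a seqlock-style variant under stated assumptions, not CUBRID's code: the slot payload is reduced to two scalars held in atomics (so concurrent copying is not a data race), writers are serialized by a mutex as `complete_mvcc` is by `m_active_trans_mutex`, and the writer bumps the slot version twice (odd while the slot is being filled, even when stable), whereas the excerpt above only shows the reader side. The ring has 8 slots instead of 2048, and all type names are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

using MVCCID = std::uint64_t;

// One ring slot; every field is atomic so a racing copy is well-defined.
struct Slot {
  std::atomic<std::uint64_t> version{0};  // odd while a writer is filling the slot
  std::atomic<MVCCID> lowest_active{0};
  std::atomic<MVCCID> last_completed{0};
};

struct StatusCopy { MVCCID lowest_active; MVCCID last_completed; };

class StatusRing {
 public:
  static constexpr std::size_t SIZE = 8;  // power of two (2048 in CUBRID)
  static constexpr std::size_t MASK = SIZE - 1;

  // Writer: fill the next slot, then publish its index.
  void publish(StatusCopy s) {
    std::lock_guard<std::mutex> guard(writer_lock_);
    std::size_t next = (position_.load(std::memory_order_relaxed) + 1) & MASK;
    Slot& slot = slots_[next];
    std::uint64_t v = slot.version.load(std::memory_order_relaxed);
    slot.version.store(v + 1, std::memory_order_relaxed);  // odd: in progress
    std::atomic_thread_fence(std::memory_order_release);
    slot.lowest_active.store(s.lowest_active, std::memory_order_relaxed);
    slot.last_completed.store(s.last_completed, std::memory_order_relaxed);
    slot.version.store(v + 2, std::memory_order_release);  // even: stable
    position_.store(next, std::memory_order_release);      // newest slot is now visible
  }

  // Lock-free reader: copy the newest slot, retry if its version moved.
  StatusCopy read(std::size_t* retries = nullptr) const {
    for (;;) {
      const Slot& slot = slots_[position_.load(std::memory_order_acquire)];
      std::uint64_t v1 = slot.version.load(std::memory_order_acquire);
      if (v1 % 2 == 0) {
        StatusCopy copy{slot.lowest_active.load(std::memory_order_relaxed),
                        slot.last_completed.load(std::memory_order_relaxed)};
        std::atomic_thread_fence(std::memory_order_acquire);
        if (slot.version.load(std::memory_order_relaxed) == v1) return copy;
      }
      if (retries) ++*retries;  // analogous to snapshot_retry_count
    }
  }

 private:
  std::array<Slot, SIZE> slots_;
  std::atomic<std::size_t> position_{0};
  std::mutex writer_lock_;
};
```

Publishing the position last mirrors the role of `m_trans_status_history_position`: a reader that sees the new index also sees the finished slot behind it.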

sequenceDiagram
    participant TX as Transaction
    participant TBL as mvcctable
    participant RING as history[pos]
    participant SNAP as tx.snapshot

    TX->>TBL: build_mvcc_info(tdes)
    loop until version stable
        TBL->>RING: pos = m_trans_status_history_position.load()
        TBL->>RING: v1 = ring[pos].m_version
        TBL->>SNAP: copy active_mvccs (bit_area + long_tran)
        TBL->>RING: v2 = ring[pos].m_version
        alt v1 == v2
            note right of TBL: stable copy — break
        else changed
            TBL->>SNAP: reset_active_transactions
            note right of TBL: retry — perfmon counts retries
        end
    end
    TBL->>SNAP: lowest_active = crt_status_lowest_active
    TBL->>SNAP: highest_completed = computed from copy
    TBL->>SNAP: snapshot_fnc = mvcc_satisfies_snapshot
    TBL->>SNAP: valid = true

The snapshot is read-only after construction. SI guarantees that queries within the same snapshot see the same set of committed versions.

Snapshot acquisition timing is driven by the isolation level, set by the executor before calling logtb_get_mvcc_snapshot:

| Isolation level | When the snapshot is taken |
|---|---|
| READ COMMITTED (4) | Before each statement that touches existing rows |
| REPEATABLE READ (5) | Once, at transaction start |
| SERIALIZABLE (6) | Once, at transaction start |

Even at READ COMMITTED, statements that do not access existing data (CREATE, plain DROP, TRUNCATE not implemented as DELETE) skip snapshot acquisition.
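The timing rule reduces to a small decision function. The enum values follow the table above; `StatementInfo`, `needs_new_snapshot`, and the `have_snapshot` flag are hypothetical names, since the real policy lives in the callers of `logtb_get_mvcc_snapshot`. The sketch builds the REPEATABLE READ / SERIALIZABLE snapshot at the first statement that needs one and reuses it afterwards, which is one reading of "once, at transaction start".

```cpp
#include <cassert>

// Numeric values follow the isolation-level table; the names are illustrative.
enum class Isolation { READ_COMMITTED = 4, REPEATABLE_READ = 5, SERIALIZABLE = 6 };

struct StatementInfo {
  bool reads_existing_rows;  // false for CREATE, plain DROP, and similar statements
};

// Should a new snapshot be built before this statement runs?
bool needs_new_snapshot(Isolation level, const StatementInfo& stmt, bool have_snapshot) {
  if (!stmt.reads_existing_rows) return false;  // nothing to read, no snapshot
  switch (level) {
    case Isolation::READ_COMMITTED:
      return true;            // fresh snapshot per statement
    case Isolation::REPEATABLE_READ:
    case Isolation::SERIALIZABLE:
      return !have_snapshot;  // built once, then reused for the rest of the transaction
  }
  return false;  // unreachable for the values above
}
```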

Visibility predicate — mvcc_satisfies_snapshot


The predicate has two top-level branches (deleted vs. not-deleted) and returns one of three verdicts:

```c
// mvcc_satisfies_snapshot, src/transaction/mvcc.c (condensed)
MVCC_SATISFIES_SNAPSHOT_RESULT
mvcc_satisfies_snapshot (THREAD_ENTRY * thread_p,
                         MVCC_REC_HEADER * rec_header,
                         MVCC_SNAPSHOT * snapshot)
{
  if (!MVCC_IS_HEADER_DELID_VALID (rec_header))
    {
      /* Record is not deleted */
      if (!MVCC_IS_FLAG_SET (rec_header, OR_MVCC_FLAG_VALID_INSID))
        return SNAPSHOT_SATISFIED;      /* visible to all */
      else if (MVCC_IS_REC_INSERTED_BY_ME (...))
        return SNAPSHOT_SATISFIED;      /* my own insert */
      else if (MVCC_IS_REC_INSERTER_IN_SNAPSHOT (...))
        return TOO_NEW_FOR_SNAPSHOT;    /* inserter active or
                                         * committed after snap */
      else
        return SNAPSHOT_SATISFIED;      /* committed before */
    }
  else
    {
      /* Record is deleted */
      if (MVCC_IS_REC_DELETED_BY_ME (...))
        return TOO_OLD_FOR_SNAPSHOT;    /* I deleted it */
      else if (MVCC_IS_REC_INSERTER_IN_SNAPSHOT (...))
        return TOO_NEW_FOR_SNAPSHOT;    /* inserter still active */
      else if (MVCC_IS_REC_DELETER_IN_SNAPSHOT (...))
        return SNAPSHOT_SATISFIED;      /* deleter active /
                                         * committed-after-snap */
      else
        return TOO_OLD_FOR_SNAPSHOT;    /* deleter committed
                                         * before snap */
    }
}
```

flowchart TD
  A["record header"] --> B{"deleted?\n(DELID flag valid)"}
  B -- "no" --> C{"VALID_INSID flag?"}
  C -- "no" --> R1["SNAPSHOT_SATISFIED\n(visible to all)"]
  C -- "yes" --> D{"inserted by me?"}
  D -- "yes" --> R2["SNAPSHOT_SATISFIED"]
  D -- "no" --> E{"inserter in snapshot's\nactive set?"}
  E -- "yes" --> R3["TOO_NEW_FOR_SNAPSHOT\n→ walk prev_version_lsa"]
  E -- "no" --> R4["SNAPSHOT_SATISFIED"]

  B -- "yes" --> F{"deleted by me?"}
  F -- "yes" --> R5["TOO_OLD_FOR_SNAPSHOT"]
  F -- "no" --> G{"inserter in snapshot's\nactive set?"}
  G -- "yes" --> R6["TOO_NEW_FOR_SNAPSHOT"]
  G -- "no" --> H{"deleter in snapshot's\nactive set?"}
  H -- "yes" --> R7["SNAPSHOT_SATISFIED\n(deleter not yet visible)"]
  H -- "no" --> R8["TOO_OLD_FOR_SNAPSHOT"]

The “is the inserter/deleter in the snapshot’s active set?” check drills into the bit-array fast path:

```c
// mvcc_is_id_in_snapshot, src/transaction/mvcc.c
STATIC_INLINE bool
mvcc_is_id_in_snapshot (THREAD_ENTRY * thread_p, MVCCID mvcc_id, MVCC_SNAPSHOT * snapshot)
{
  if (MVCC_ID_PRECEDES (mvcc_id, snapshot->lowest_active_mvccid))
    return false;                                       /* certainly committed before snap */
  if (MVCC_ID_FOLLOW_OR_EQUAL (mvcc_id, snapshot->highest_completed_mvccid))
    return true;                                        /* certainly active or future */
  return snapshot->m_active_mvccs.is_active (mvcc_id);  /* probe bit area */
}
```

The two scalar bounds (lowest_active, highest_completed) eliminate the bit-array probe for the bulk of MVCCIDs.
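Both pieces can be combined into a runnable sketch. The types are simplified, hypothetical stand-ins for `MVCC_SNAPSHOT` and `MVCC_REC_HEADER`: a `std::set` replaces the bit area, empty `std::optional` fields replace the header flags, and `high_bound` plays the role of `highest_completed_mvccid` in the bound test. The branch order follows the condensed C excerpt above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

using MVCCID = std::uint64_t;

enum class SnapshotResult { SATISFIED, TOO_OLD_FOR_SNAPSHOT, TOO_NEW_FOR_SNAPSHOT };

// Hypothetical, simplified stand-ins for MVCC_SNAPSHOT and MVCC_REC_HEADER.
struct Snapshot {
  MVCCID lowest_active;          // below: completed before the snapshot
  MVCCID high_bound;             // at or above: treated as active or not yet started
  std::set<MVCCID> active;       // replaces the bit area
  std::optional<MVCCID> own_id;  // reader's MVCCID, empty for a read-only transaction
};

struct RecHeader {
  std::optional<MVCCID> ins_id;  // empty: VALID_INSID not set, visible to all
  std::optional<MVCCID> del_id;  // empty: not deleted
};

bool is_mine(const std::optional<MVCCID>& id, const Snapshot& s) {
  return id && s.own_id && *id == *s.own_id;
}

// Same bound tests as mvcc_is_id_in_snapshot, then the set probe.
bool id_in_snapshot(MVCCID id, const Snapshot& s) {
  if (id < s.lowest_active) return false;
  if (id >= s.high_bound) return true;
  return s.active.count(id) != 0;
}

// True if the inserter is still invisible to this snapshot.
bool inserter_in_snapshot(const RecHeader& h, const Snapshot& s) {
  return h.ins_id && !is_mine(h.ins_id, s) && id_in_snapshot(*h.ins_id, s);
}

SnapshotResult satisfies_snapshot(const RecHeader& h, const Snapshot& s) {
  if (!h.del_id) {  // not deleted
    return inserter_in_snapshot(h, s) ? SnapshotResult::TOO_NEW_FOR_SNAPSHOT
                                      : SnapshotResult::SATISFIED;
  }
  if (is_mine(h.del_id, s)) return SnapshotResult::TOO_OLD_FOR_SNAPSHOT;  // I deleted it
  if (inserter_in_snapshot(h, s)) return SnapshotResult::TOO_NEW_FOR_SNAPSHOT;
  return id_in_snapshot(*h.del_id, s) ? SnapshotResult::SATISFIED  // deleter not visible yet
                                      : SnapshotResult::TOO_OLD_FOR_SNAPSHOT;
}
```

Only IDs between the two bounds reach the set probe; in the bit-array version that probe is the O(1) lookup shown earlier.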

Figure 4: Visibility worked example. Three concurrent transactions (A: snapshot {18, 19, 30}; B: snapshot {19, 30, 32}, MVCCID 32; C: snapshot {19, 30, 32, 34}, MVCCID 34) reading four record versions on the left. The colored circles on the right enumerate which snapshots see which version. Note the asymmetry of insert vs. delete visibility: record (ins=18, del=32) is visible to A (deleter not yet committed at A's snapshot), to B (the deleter itself), and not visible to C (deleter committed before C's snapshot). (Source: deck slide 26.)

The commit path is where the active-set state and the history ring advance together:

```cpp
// mvcctable::complete_mvcc, src/transaction/mvcc_table.cpp (condensed)
void
mvcctable::complete_mvcc (int tran_index, MVCCID mvccid, bool committed)
{
  std::unique_lock<std::mutex> ulock (m_active_trans_mutex);

  mvcc_trans_status::version_type next_version;
  size_t next_index;
  mvcc_trans_status &next_status = next_trans_status_start (next_version, next_index);

  /* ... stats update if committed ... */

  // update current trans status
  m_current_trans_status.m_active_mvccs.set_inactive_mvccid (mvccid);
  m_current_trans_status.m_last_completed_mvccid = mvccid;
  m_current_trans_status.m_event_type = committed ? COMMIT : ROLLBACK;

  // finish next trans status (publish to ring)
  next_tran_status_finish (next_status, next_index);

  /* ... bookkeeping for vacuum's lowest_visible array ... */

  ulock.unlock ();

  // advance lowest_active outside the lock when warranted
  MVCCID global_lowest_active = m_current_status_lowest_active_mvccid;
  if (global_lowest_active == mvccid
      || MVCC_ID_PRECEDES (mvccid, next_status.m_active_mvccs.get_bit_area_start_mvccid ()))
    {
      MVCCID new_lowest_active = next_status.m_active_mvccs.compute_lowest_active_mvccid ();
      if (next_status.m_version.load () == next_version)
        {
          // only if no later writer has reused this ring slot meanwhile
          advance_oldest_active (new_lowest_active);
        }
    }
}
```

Sequence for a single commit/rollback:

sequenceDiagram
    participant TX as Transaction
    participant CUR as m_current_trans_status
    participant RING as history ring
    participant LV as lowest_visible[tran_index]

    TX->>CUR: lock m_active_trans_mutex
    CUR->>RING: next_trans_status_start → reserve slot N+1, bump version
    TX->>CUR: m_active_mvccs.set_inactive_mvccid(mvccid)
    TX->>CUR: m_last_completed = mvccid — event_type = COMMIT or ROLLBACK
    CUR->>RING: next_tran_status_finish → copy CUR into slot, store position
    TX->>LV: if committed clamp to mvccid — if rollback set MVCCID_NULL
    TX->>CUR: unlock
    opt mvccid was the lowest active
        TX->>CUR: compute_lowest_active_mvccid + advance_oldest_active
    end

m_trans_status_history_position is the atomic that snapshot readers load — bumping it last is what makes the new state visible to them.

Vacuum cannot remove a version that is still visible to any live snapshot. CUBRID's vacuum master periodically calls mvcctable::update_global_oldest_visible, which (through compute_oldest_visible_mvccid) sweeps every m_transaction_lowest_visible_mvccids[idx] plus the live m_current_status_lowest_active_mvccid:

```cpp
// mvcctable::compute_oldest_visible_mvccid, src/transaction/mvcc_table.cpp (excerpt)
MVCCID lowest_active_mvccid = oldest_active_get (m_current_status_lowest_active_mvccid, 0,
                                                 oldest_active_event::GET_OLDEST_ACTIVE);

for (size_t idx = 0; idx < m_transaction_lowest_visible_mvccids_size; idx++)
  {
    loaded_tran_mvccid = oldest_active_get (m_transaction_lowest_visible_mvccids[idx], idx,
                                            oldest_active_event::GET_OLDEST_ACTIVE);
    if (loaded_tran_mvccid == MVCCID_ALL_VISIBLE)
      {
        waiting_mvccids_pos.append (idx);   /* floor not published yet: re-check later */
      }
    else if (loaded_tran_mvccid != MVCCID_NULL
             && MVCC_ID_PRECEDES (loaded_tran_mvccid, lowest_active_mvccid))
      {
        lowest_active_mvccid = loaded_tran_mvccid;
      }
  }
```

The vacuum master then publishes the result into the atomic m_oldest_visible, which is the single value the per-record mvcc_satisfies_vacuum reads.

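The well-known cost of this design: one long-running transaction pins m_oldest_visible, either because its own MVCCID stays active (a writer) or because its floor in m_transaction_lowest_visible_mvccids stays low (any transaction holding an old snapshot). Vacuum then cannot reclaim any version whose deleter is at or above that watermark, regardless of how many shorter transactions have come and gone.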
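The watermark and the per-record decision it feeds can be sketched in a few lines. This assumes each per-transaction floor is either an MVCCID or empty (standing in for `MVCCID_NULL`), skips the `MVCCID_ALL_VISIBLE` re-check, and treats a deleted version as reclaimable once its deleter precedes the watermark, assuming the delete committed. The function names are hypothetical; the real `mvcc_satisfies_vacuum` distinguishes more cases.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using MVCCID = std::uint64_t;

// Minimum over the global lowest active MVCCID and every live per-transaction floor.
// An empty floor stands in for MVCCID_NULL (no snapshot / transaction ended).
MVCCID compute_oldest_visible(MVCCID global_lowest_active,
                              const std::vector<std::optional<MVCCID>>& floors) {
  MVCCID oldest = global_lowest_active;
  for (const auto& tran_floor : floors)
    if (tran_floor && *tran_floor < oldest) oldest = *tran_floor;
  return oldest;
}

// A deleted version is unreachable once its (committed) deleter precedes the
// watermark: every live snapshot already sees the delete.
bool reclaimable(std::optional<MVCCID> del_id, MVCCID oldest_visible) {
  return del_id && *del_id < oldest_visible;
}
```

One stale floor (13 in the example below) is enough to hold back every version deleted at or after it, however recent the other floors are.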

Figure 5: Vacuum watermark calculation. The 2048-slot history ring holds three live versions v0/v1/v2 with per-version active-set snapshots {10, 13, 17}, {13, 17}, {13, 17, 18}. The m_transaction_lowest_visible_mvccids[] array gives each in-flight transaction's snapshot floor (MVCCID_NULL = transaction ended, ignored). m_oldest_visible is the minimum of all live floors (here 13) and is what the vacuum master consults to decide which versions are reclaimable. (Source: deck slide 30.)

Anchor on symbol names, not line numbers. The CUBRID source moves; a function name (or struct/enum tag) is the stable handle. Use git grep -n '<symbol>' src/transaction/ to locate the current position. The line numbers cited in this section were observed when the document was last updated and are intended only as quick hints.

  • struct mvcc_rec_header (in mvcc.h) — on-record MVCC fields (flag byte, ins/del MVCCID, prev_version_lsa).
  • enum mvcc_satisfies_snapshot_result (in mvcc.h) — the three visibility outcomes (SNAPSHOT_SATISFIED, TOO_OLD_FOR_SNAPSHOT, TOO_NEW_FOR_SNAPSHOT).
  • struct mvcc_snapshot (in mvcc.h) — embedded m_active_mvccs plus lowest_active/highest_completed scalars.
  • struct mvcc_info (in mvcc.h) — per-active-transaction MVCC state, hung off log_tdes.
  • struct mvcc_trans_status (in mvcc_table.hpp) — one slot in the history ring; the live status is the same type.
  • class mvcctable (in mvcc_table.hpp) — global table with the history ring and the two mutexes.
  • struct mvcc_active_tran (in mvcc_active_tran.hpp): bit-array + long-tran active set.
  • mvcctable::build_mvcc_info (in mvcc_table.cpp) — lock-free snapshot copy with version-validated retry.
  • mvcctable::compute_oldest_visible_mvccid (in mvcc_table.cpp).
  • mvcctable::is_active (in mvcc_table.cpp) — delegates to the embedded mvcc_active_tran.
  • mvcctable::complete_mvcc (in mvcc_table.cpp) — commit/rollback hook; advances the history ring.
  • mvcctable::get_new_mvccid (in mvcc_table.cpp) — issuance under m_new_mvccid_lock against log_Gl.hdr.mvcc_next_id.
  • mvcctable::update_global_oldest_visible (in mvcc_table.cpp) — vacuum’s source of truth for m_oldest_visible.

Visibility evaluation (src/transaction/mvcc.c)

  • mvcc_is_id_in_snapshot — low-water / high-water short-circuits followed by a bit-area probe.
  • mvcc_is_active_id — per-tran fast path against recent_snapshot_lowest_active_mvccid.
  • mvcc_satisfies_snapshot — the decision tree (deleted-or-not branch, then inserter/deleter visibility).
  • mvcc_is_not_deleted_for_snapshot — DML “still-deletable” check.
  • mvcc_satisfies_vacuum — vacuum’s per-record decision.
  • mvcc_satisfies_delete — five-state classification at delete time (DELETE_RECORD_INSERT_IN_PROGRESS / _CAN_DELETE / _DELETED / _DELETE_IN_PROGRESS / _SELF_DELETED).

These line numbers held when the document was last updated. If you land at a different definition, the symbol name above is authoritative; update the table on your way through.

| Symbol | File | Line |
|---|---|---|
| struct mvcc_rec_header | mvcc.h | 38 |
| enum mvcc_satisfies_snapshot_result | mvcc.h | 159 |
| struct mvcc_snapshot | mvcc.h | 173 |
| struct mvcc_info | mvcc.h | 196 |
| struct mvcc_trans_status | mvcc_table.hpp | 40 |
| class mvcctable | mvcc_table.hpp | 64 |
| struct mvcc_active_tran | mvcc_active_tran.hpp | 31 |
| mvcctable::build_mvcc_info | mvcc_table.cpp | 226 |
| mvcctable::compute_oldest_visible_mvccid | mvcc_table.cpp | 355 |
| mvcctable::is_active | mvcc_table.cpp | 423 |
| mvcctable::complete_mvcc | mvcc_table.cpp | 465 |
| mvcctable::get_new_mvccid | mvcc_table.cpp | 566 |
| mvcctable::update_global_oldest_visible | mvcc_table.cpp | 617 |
| mvcc_is_id_in_snapshot | mvcc.c | 91 |
| mvcc_is_active_id | mvcc.c | 123 |
| mvcc_satisfies_snapshot | mvcc.c | 156 |
| mvcc_is_not_deleted_for_snapshot | mvcc.c | 280 |
| mvcc_satisfies_vacuum | mvcc.c | 321 |
| mvcc_satisfies_delete | mvcc.c | 389 |

Each entry is a fact about the current source — readable without the original analysis materials. The trailing note shows how it was checked and, where relevant, historical drift or limits of verification. Open questions follow as the curator’s recorded gaps; future readers should treat them as starting points, not as known bugs.

  • The bit-area cap is BITAREA_MAX_SIZE = 500 units (= 32 000 MVCCIDs of recent history). Hard-coded in mvcc_active_tran.hpp; the migration thresholds (LONG_TRAN, CLEANUP) live alongside it in mvcc_active_tran.cpp. Not a runtime parameter — tuning requires a code change.

  • The MVCCID counter is owned by the active log volume header (log_Gl.hdr.mvcc_next_id), not by the MVCC table. Confirmed in mvcctable::get_new_mvccid (mvcc_table.cpp). A dedicated m_new_mvccid_lock keeps issuance off the active-set hot path; the mvcc_table.hpp comment notes this could in principle be replaced with atomic ops.

  • The MVCC layer only exposes build_mvcc_info; isolation-level snapshot-timing policy lives outside the src/transaction/mvcc* files. The “RC = per-statement, RR/SR = once at start” rule is enforced by the call sites of logtb_get_mvcc_snapshot in transaction-descriptor / xasl code. Auditing the per-statement RC behavior requires stepping outside this document's scope.

  • mvcc_rec_header carries three documented flag bits inside a 5-bit mask OR_MVCC_FLAG_MASK = 0x1f. Documented bits: VALID_INSID, VALID_DELID, VALID_PREV_VERSION. Two bits in the mask are currently unused — see Open Questions §1.

  • Sub-transactions exist in code (mvcc_info::sub_ids, complete_sub_mvcc) and shape savepoint semantics. Present in mvcc.h and mvcc_table.cpp but not covered in the body of this document — a follow-up analysis is owed.

Open Questions

  1. Two unused bits in OR_MVCC_FLAG_MASK. Reserved for a planned feature (distributed MVCC? tombstone-without-deleter?), or just slack? Investigation path: trace the bit definitions through git history and search for any in-flight CBRD tickets touching the mask.

  2. Saturation behavior of the 2048-slot history ring. Under what workload does HISTORY_MAX_SIZE = 2048 saturate, and what does CUBRID do when a snapshot’s source slot is overwritten before the snapshot is fully built? The atomic m_version validation in build_mvcc_info drives a retry loop (snapshot_retry_count), but the worst-case retry bound is unknown. Investigation path: instrument snapshot_retry_count under a contended workload.

  3. Write-skew handling under SERIALIZABLE. SI alone admits write skew; PostgreSQL SSI detects-and-aborts, while CUBRID almost certainly falls back to lock-based serialization. Investigation path: trace the SERIALIZABLE write path through lock_object calls; cross-reference cubrid-lock-manager.md §“NON2PL” and §“Beyond CUBRID”.

Beyond CUBRID — Comparative Designs & Research Frontiers


Pointers, not analysis. Each bullet is a starting handle for a follow-up doc; depth here is intentionally shallow.

  • PostgreSQL SSI. Serializable Snapshot Isolation (Cahill et al., SIGMOD 2008; Ports & Grittner, VLDB 2012) augments SI with non-blocking SIREAD predicate locks that track rw-antidependencies, and aborts a transaction when it finds a “dangerous structure” of two consecutive rw-antidependencies, a conservative test for the cycles that produce write skew. CUBRID's SERIALIZABLE relies on the lock manager instead. A side-by-side cost comparison would tell us what we trade by avoiding predicate locking.
  • In-memory MVCC engines (HyPer, Hekaton, Cicada) redesign version chains for cache-aware in-memory layouts and often eliminate the central registry. CUBRID is disk-resident; the comparison is orthogonal but instructive for distinguishing costs that are intrinsic to MVCC from those intrinsic to disk-resident MVCC. Wu et al., An Empirical Evaluation of In-Memory Multi-Version Concurrency Control (VLDB 2017), surveys the design space.
  • In-place vs out-of-place trade-off, measured. PostgreSQL’s bloat / HOT problem and CUBRID’s vacuum read-amplification are symmetric costs of the same underlying choice. Following the empirical literature here would let us put numbers on what we pay for the undo-log version chain.
  • Concurrency control at high core counts. Yu et al., Staring into the Abyss (VLDB 2015) — a benchmark of seven CC protocols at 1 000 cores. Relevant if CUBRID is to scale beyond current per-server core counts.
  • Hybrid OCC + MVCC. Modern engines often pair MVCC reads with optimistic write validation. Whether CUBRID’s NON2PL mechanism is a stepping stone toward this is itself an open question — see the cross-reference in cubrid-lock-manager.md §“Beyond CUBRID”.

The intent of this section is to seed next documents, not to analyze. Each bullet should become its own curated note when its turn comes.

Raw analyses (under raw/code-analysis/cubrid/storage/mvcc/)

  • mvcc 코드 분석 ver 2.pdf (“MVCC code analysis ver 2”; slide render)
  • mvcc 코드 분석 ver 2.pptx (slide source, cleaner text extraction)

Textbook chapters (under knowledge/research/dbms-general/)

  • Database Internals (Petrov), Ch. 5 “Transaction Processing and Recovery”, §“Multiversion Concurrency Control” (≈ line 4002), §“Isolation Levels” (≈ line 4136), §“Snapshot Isolation” (≈ line 11266 in the distributed-transactions chapter).
  • Storage – Concurrency 코드 분석 (“Storage – Concurrency code analysis”): module-level positioning of Lock Manager · MVCC · Vacuum on top of Heap Manager · Page Buffer; the source of the “three-leg” framing in §“Common DBMS Design”.

CUBRID source (under /data/hgryoo/references/cubrid/)

  • src/transaction/mvcc.h
  • src/transaction/mvcc.c
  • src/transaction/mvcc_table.hpp
  • src/transaction/mvcc_table.cpp
  • src/transaction/mvcc_active_tran.hpp
  • src/transaction/mvcc_active_tran.cpp