CUBRID MVCC — Code-Level Deep Dive

Where this document fits: The high-level analysis cubrid-mvcc.md covers design intent and theoretical background. This document traces every branch and field at the code level. Each chapter is self-contained, but reading in order follows the full lifecycle of a row version and the snapshot that decides its visibility inside the kernel.

Contents:

| Ch | Title |
|---|---|
| 1 | Data-Structure Map |
| 2 | Initialization and Memory Layout |
| 3 | MVCCID Birth and the On-Record Header |
| 4 | Active-Set Reads: Bit-Area Probe and Cached Scalars |
| 5 | Snapshot Construction |
| 6 | Visibility Evaluation |
| 7 | Sibling Predicates for Delete Dirty and Vacuum |
| 8 | Commit and the History Ring Advance |
| 9 | Vacuum Coordination and the Oldest-Visible Watermark |
| 10 | Sub-Transactions and Special Paths |

Chapter 1: Data-Structure Map

Field-by-field reference for every structure the MVCC module owns. SI theory is not re-derived here; see cubrid-mvcc.md.

| Header | Owns |
|---|---|
| mvcc.h | mvcc_rec_header, mvcc_snapshot, mvcc_info, the three result enums |
| mvcc_active_tran.hpp | mvcc_active_tran (bit-area active-set engine) |
| mvcc_table.hpp | mvcc_trans_status, mvcctable (global coordinator) |
| storage_common.h | MVCCID typedef + sentinel ladder |

1.1 The MVCCID type and its sentinel ladder

One unsigned 64-bit counter; low values are reserved sentinels (ids 1, 2 are skipped, first real id is 4).

// MVCCID + sentinels -- src/storage/storage_common.h
typedef UINT64 MVCCID; /* MVCC ID */
#define MVCCID_NULL (0)
#define MVCCID_ALL_VISIBLE ((MVCCID) 3) /* visible for all transactions */
#define MVCCID_FIRST ((MVCCID) 4)

| Value | Name | Role / why |
|---|---|---|
| 0 | MVCCID_NULL | “no id”; unset/uninitialized field |
| 3 | MVCCID_ALL_VISIBLE | predates any live snapshot; visible to all (set by vacuum stripping the insert id) |
| 4 | MVCCID_FIRST | first id a normal tran can get; counter never falls below |

Predicate macros classify an id (MVCCID_IS_VALID = != MVCCID_NULL, MVCCID_IS_NORMAL = >= MVCCID_FIRST); the step macro skips the reserved band:

// MVCCID_FORWARD -- src/storage/storage_common.h
#define MVCCID_FORWARD(id) \
do { (id)++; if ((id) < MVCCID_FIRST) (id) = MVCCID_FIRST; } while (0)

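Invariant: no live id falls in the reserved band below MVCCID_FIRST (values 0 through 3). MVCCID_FORWARD snaps any post-increment result below 4 back to 4, so a real id never collides with MVCCID_NULL (0) or MVCCID_ALL_VISIBLE (3), and the unused values 1 and 2 are skipped as well.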
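As a minimal sketch of this invariant, the snippet below copies the constants and the step macro from the excerpt above into a standalone file. It assumes `UINT64` is equivalent to `std::uint64_t`; `forward_once` is a hypothetical helper introduced only for illustration and is not part of CUBRID.

```cpp
#include <cassert>
#include <cstdint>

// Standalone copies of the definitions quoted above.
// Assumption: UINT64 is equivalent to std::uint64_t.
typedef std::uint64_t MVCCID;
#define MVCCID_NULL (0)
#define MVCCID_ALL_VISIBLE ((MVCCID) 3)
#define MVCCID_FIRST ((MVCCID) 4)
#define MVCCID_FORWARD(id) \
  do { (id)++; if ((id) < MVCCID_FIRST) (id) = MVCCID_FIRST; } while (0)

// Hypothetical helper (not in CUBRID): apply one step and return the result.
inline MVCCID forward_once (MVCCID id)
{
  MVCCID_FORWARD (id);
  return id;
}
```

Under these definitions, a step from any value below MVCCID_FIRST lands on MVCCID_FIRST, and so does the unsigned wrap from the largest 64-bit value, because the increment wraps to 0 before the comparison runs.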

1.2 mvcc_rec_header — the on-record MVCC stamp

The per-record header stored with each heap object — the only MVCC struct on disk; everything else here is in-memory.

// mvcc_rec_header -- src/transaction/mvcc.h
struct mvcc_rec_header
{
INT32 mvcc_flag:8;
INT32 repid:24;
int chn;
MVCCID mvcc_ins_id;
MVCCID mvcc_del_id;
LOG_LSA prev_version_lsa;
};

| Field | Role / why |
|---|---|
| mvcc_flag:8 | low byte of the packed INT32; OR_MVCC_FLAG_* bits say which fields are present |
| repid:24 | upper 24 bits; schema representation, packed in the same word |
| chn | change counter, written instead of a delete id when no DELID; detects stale caches |
| mvcc_ins_id | birth stamp; did the inserter commit before my snapshot |
| mvcc_del_id | death stamp; did the deleter commit before my snapshot |
| prev_version_lsa | version-chain link; a too-new record walks back through it |

The OR_MVCC_FLAG_* bits (in object_representation_constants.h) tag which fields are meaningful — VALID_INSID/VALID_DELID/VALID_PREV_VERSION under OR_MVCC_FLAG_MASK.

Invariant — DELID and chn are logically exclusive, not a physical union. The on-disk header writes only one (selected by OR_MVCC_FLAG_VALID_DELID); MVCC_IS_HEADER_DELID_VALID gates delete-id reads on that flag and MVCCID_IS_VALID, so a chn is never read as a tran id.

The initializer zeroes flags/repid, sets chn to NULL_CHN (-1), both ids to MVCCID_NULL, the LSA null:

// MVCC_REC_HEADER_INITIALIZER -- src/transaction/mvcc.h
#define MVCC_REC_HEADER_INITIALIZER \
{ 0, 0, NULL_CHN, MVCCID_NULL, MVCCID_NULL, LSA_INITIALIZER }

MVCC_IS_HEADER_ALL_VISIBLE detects the vacuumed-clean case: no insid/delid flags set and mvcc_ins_id == MVCCID_ALL_VISIBLE.

1.3 mvcc_active_tran — the bit-area active set

Answers “is MVCCID x still active?” lock-free. Embedded by value in both mvcc_snapshot and mvcc_trans_status. Chapter 4 dissects the probe.

// mvcc_active_tran private state -- src/transaction/mvcc_active_tran.hpp
using unit_type = std::uint64_t;
static const size_t BITAREA_MAX_SIZE = 500;
static const unit_type ALL_ACTIVE = 0;
static const unit_type ALL_COMMITTED = (unit_type) -1;
unit_type *m_bit_area;
volatile MVCCID m_bit_area_start_mvccid;
volatile size_t m_bit_area_length;
MVCCID *m_long_tran_mvccids;
volatile size_t m_long_tran_mvccids_length;
bool m_initialized;

| Field | Role / why |
|---|---|
| m_bit_area | 64-bit words, one bit per id: set = completed, clear = active |
| m_bit_area_start_mvccid | id of bit 0; window left edge, offset = id - start |
| m_bit_area_length | length in bits; trims from the left as old ids complete |
| m_long_tran_mvccids | overflow array of still-active ids past the left edge; lets the window slide |
| m_long_tran_mvccids_length | overflow-scan bound |
| m_initialized | guard between default-constructed and live; used by finalize/reset |

ALL_ACTIVE (0) / ALL_COMMITTED ((unit_type)-1) are the all-clear / all-set word patterns; BITAREA_MAX_SIZE (500 words) caps the window at 500 × 64 = 32 000 ids before overflow.

Invariant — bit offset stays inside [0, m_bit_area_length). The Chapter 4 probe range-checks before touching m_bit_area, else it reads past the 500-word area; volatile window fields let readers snapshot (start, length) while a committer slides it.
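The range check can be illustrated with a simplified, single-threaded sketch. `window_sketch` and `is_active_sketch` are hypothetical names, the real probe is covered in Chapter 4, and plain `<` stands in for whatever id-ordering helper the engine uses; only the bit polarity (set = completed, clear = active) and the offset rule come from the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using unit_type = std::uint64_t;
constexpr std::size_t UNIT_BIT_COUNT = 64;

// Hypothetical stand-in for the window state; not the CUBRID class.
struct window_sketch
{
  std::vector<unit_type> bit_area;      // set bit = completed, clear = active
  std::uint64_t start_mvccid;           // id mapped to bit 0
  std::size_t length_bits;              // live bits in the window
  std::vector<std::uint64_t> long_tran; // still-active ids left of the window
};

inline bool is_active_sketch (const window_sketch &w, std::uint64_t id)
{
  if (id < w.start_mvccid)
    {
      // Left of the window: active only if listed in the overflow array.
      for (std::uint64_t lt : w.long_tran)
        {
          if (lt == id) { return true; }
        }
      return false;
    }
  std::size_t offset = static_cast<std::size_t> (id - w.start_mvccid);
  if (offset >= w.length_bits)
    {
      return true; // right of the window: not recorded as completed yet
    }
  // Only now is it safe to index the bit area.
  unit_type word = w.bit_area[offset / UNIT_BIT_COUNT];
  unit_type mask = unit_type{1} << (offset % UNIT_BIT_COUNT);
  return (word & mask) == 0; // clear bit = still active
}
```

The bit area is indexed only after both comparisons pass, which is the ordering the invariant requires.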

operator= is deleted so m_bit_area is never shallow-copied; callers use copy_to, parameterized by enum class copy_safety { THREAD_SAFE, THREAD_UNSAFE }.

1.4 mvcc_snapshot — a transaction’s frozen view

The immutable picture of “who had committed” when a read began.

// mvcc_snapshot -- src/transaction/mvcc.h
struct mvcc_snapshot
{
MVCCID lowest_active_mvccid;
MVCCID highest_completed_mvccid;
mvcc_active_tran m_active_mvccs;
MVCC_SNAPSHOT_FUNC snapshot_fnc;
bool valid;
// ... mvcc_snapshot() ctor + reset() omitted ...
mvcc_snapshot &operator= (const mvcc_snapshot& snapshot) = delete;
void copy_to (mvcc_snapshot & dest) const;
};

| Field | Role / why |
|---|---|
| lowest_active_mvccid | low watermark; id < this is committed-visible without the bits |
| highest_completed_mvccid | high watermark; id > this is invisible without the bits |
| m_active_mvccs | embedded mvcc_active_tran; precise answer between the watermarks |
| snapshot_fnc | swappable predicate; same snapshot serves normal vs dirty reads |
| valid | false = not built / reset on a reused tdes slot, triggers a rebuild |

MVCC_SNAPSHOT_FUNC is a (*)(THREAD_ENTRY *, MVCC_REC_HEADER *, MVCC_SNAPSHOT *) returning MVCC_SATISFIES_SNAPSHOT_RESULT.

Invariant — the watermarks bracket the bit area (lowest_active_mvccid <= highest_completed_mvccid + 1, bit area consulted only between them). The deleted operator= forces copies through copy_to, deep-copying the bits so watermarks and bits stay consistent.
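A rough sketch of the bracketing, under the assumption that plain integer comparison is an acceptable stand-in for the engine's id ordering: `watermarks_sketch`, `bracket_sketch`, and `classify` are hypothetical names, and the sketch says only which ids would need the bit area, not what the final visibility verdict is.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical three-way bracket; not a CUBRID type.
enum class bracket_sketch
{
  BELOW_LOW_WATERMARK,  // completed before the snapshot, no bit probe needed
  BETWEEN_WATERMARKS,   // the only range where the bit area is consulted
  ABOVE_HIGH_WATERMARK  // not completed at snapshot time, no bit probe needed
};

struct watermarks_sketch
{
  std::uint64_t lowest_active;
  std::uint64_t highest_completed;
};

inline bracket_sketch classify (const watermarks_sketch &s, std::uint64_t id)
{
  if (id < s.lowest_active)
    {
      return bracket_sketch::BELOW_LOW_WATERMARK;
    }
  if (id > s.highest_completed)
    {
      return bracket_sketch::ABOVE_HIGH_WATERMARK;
    }
  return bracket_sketch::BETWEEN_WATERMARKS;
}
```

In the boundary case lowest_active_mvccid == highest_completed_mvccid + 1, no id falls between the watermarks, so the bits would never be consulted.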

1.5 mvcc_info — per-transaction MVCC state on the tdes

One per active transaction descriptor (log_tdes).

// mvcc_info -- src/transaction/mvcc.h
struct mvcc_info
{
MVCC_SNAPSHOT snapshot;
MVCCID id;
MVCCID recent_snapshot_lowest_active_mvccid;
std::vector<MVCCID> sub_ids;
bool is_sub_active;
// ... mvcc_info() ctor + init() + reset() omitted ...
void copy_to (mvcc_info & dest) const;
};

| Field | Role / why |
|---|---|
| snapshot | current mvcc_snapshot (by value); rebuilt per statement (RC) or held (SR) |
| id | this tran’s MVCCID, MVCCID_NULL until first write; lazily assigned (Ch 3) |
| recent_snapshot_lowest_active_mvccid | cached low watermark; fast “definitely inactive” cutoff so is_active skips the global table |
| sub_ids | sub-transaction ids, one per nested sub (Ch 10) |
| is_sub_active | true while a sub-transaction is open, so visibility also checks sub_ids |

1.6 mvcc_trans_status — one immutable history snapshot of the active set

The global table keeps a ring of versioned status records; each slot is an mvcc_trans_status.

// mvcc_trans_status -- src/transaction/mvcc_table.hpp
struct mvcc_trans_status
{
using version_type = unsigned int;
enum event_type { COMMIT, ROLLBACK, SUBTRAN };
mvcc_active_tran m_active_mvccs;
MVCCID m_last_completed_mvccid; // just for info
event_type m_event_type; // just for info
std::atomic<version_type> m_version;
};

| Field | Role / why |
|---|---|
| m_active_mvccs | the active-set bits as of this version |
| m_last_completed_mvccid | diagnostic only; last id completed at this version |
| m_event_type | diagnostic only; COMMIT / ROLLBACK / SUBTRAN tag |
| m_version | seqlock linchpin: read, copy bits, re-read; equal-and-even = consistent (Ch 5) |

Invariant — m_version brackets a consistent copy. A committer bumps it before and after mutating the bits; a reader seeing the same version before/after its copy knows no write interleaved (Ch 5). The two diagnostic fields are not part of this contract.
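The read, copy, re-read pattern can be sketched as follows. This is a single-threaded illustration only: `status_sketch` and `try_copy` are hypothetical names, the `during_copy` hook stands in for a concurrent committer, and only the equality recheck is shown. The real reader path is the subject of Chapter 5.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a versioned status record; not the CUBRID struct.
struct status_sketch
{
  std::vector<std::uint64_t> bits;
  std::atomic<unsigned int> version{0};
};

// Copies src.bits into dest and reports whether the version was the same
// before and after the copy. A false result means the copy must be discarded
// and retried. `during_copy` is a test hook that simulates a writer.
inline bool try_copy (status_sketch &src, std::vector<std::uint64_t> &dest,
                      std::function<void ()> during_copy = {})
{
  unsigned int before = src.version.load ();
  dest = src.bits;
  if (during_copy)
    {
      during_copy ();
    }
  unsigned int after = src.version.load ();
  return before == after;
}
```

A caller would loop on `try_copy` until it returns true. The sketch deliberately leaves out the memory-ordering and data-race concerns that a real lock-free reader has to handle.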

mvcctable is the global coordinator. A single instance lives in log_Gl.mvcc_table; it owns id assignment, the status history ring, the per-tran lowest-visible array, and the oldest-visible watermark that drives vacuum (lowest_active_mvccid_type is std::atomic<MVCCID>).

// mvcctable private members -- src/transaction/mvcc_table.hpp
static const size_t HISTORY_MAX_SIZE = 2048; // must be a power of 2
static const size_t HISTORY_INDEX_MASK = HISTORY_MAX_SIZE - 1;
lowest_active_mvccid_type *m_transaction_lowest_visible_mvccids; /* size = NUM_TOTAL_TRAN_INDICES */
size_t m_transaction_lowest_visible_mvccids_size;
lowest_active_mvccid_type m_current_status_lowest_active_mvccid;
mvcc_trans_status m_current_trans_status;
std::atomic<size_t> m_trans_status_history_position;
mvcc_trans_status *m_trans_status_history; /* ring of HISTORY_MAX_SIZE */
std::mutex m_new_mvccid_lock;
std::mutex m_active_trans_mutex;
std::atomic<MVCCID> m_oldest_visible;
std::atomic<size_t> m_ov_lock_count;

| Field | Role / why |
|---|---|
| m_transaction_lowest_visible_mvccids | per-tran atomic array; each tran publishes its oldest needed id, the min feeds oldest-visible (Ch 9) |
| m_transaction_lowest_visible_mvccids_size | min-scan bound; = NUM_TOTAL_TRAN_INDICES |
| m_current_status_lowest_active_mvccid | atomic low watermark of current status; fast snapshot lower bound |
| m_current_trans_status | the live, newest status; mutated under m_active_trans_mutex |
| m_trans_status_history_position | atomic ring cursor at newest slot; masked by HISTORY_INDEX_MASK |
| m_trans_status_history | ring of HISTORY_MAX_SIZE; readers grab a stable past version (Ch 5/8) |
| m_new_mvccid_lock | serializes get_new_mvccid / get_two_new_mvccid for monotonic ids |
| m_active_trans_mutex | serializes current-status mutation + history advance |
| m_oldest_visible | atomic global watermark; vacuum reclaims nothing newer |
| m_ov_lock_count | soft pin; nonzero freezes m_oldest_visible for a stable floor (Ch 9) |

Invariant — HISTORY_MAX_SIZE is a power of two. The ring index pos & HISTORY_INDEX_MASK is a correct modulo only then; otherwise the mask aliases non-adjacent slots and hands readers the wrong past version.
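The mask-as-modulo claim is easy to check with compile-time arithmetic. The two constants are copied from the excerpt above; `is_power_of_two` and `next_slot` are hypothetical helper names used only in this sketch.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t HISTORY_MAX_SIZE = 2048;
constexpr std::size_t HISTORY_INDEX_MASK = HISTORY_MAX_SIZE - 1;

// Hypothetical helpers for illustration.
constexpr bool is_power_of_two (std::size_t n)
{
  return n != 0 && (n & (n - 1)) == 0;
}

constexpr std::size_t next_slot (std::size_t pos)
{
  return (pos + 1) & HISTORY_INDEX_MASK;
}

static_assert (is_power_of_two (HISTORY_MAX_SIZE), "mask-as-modulo needs a power of two");
static_assert (next_slot (2047) == 0, "the ring wraps to slot 0");
static_assert ((2050 & HISTORY_INDEX_MASK) == 2050 % HISTORY_MAX_SIZE, "mask equals modulo");

// Counter-example with a size of 1000 (not a power of two): 1000 & 999 is 992,
// while 1000 % 1000 is 0, so the mask would land on the wrong slot.
static_assert ((1000 & 999) != 1000 % 1000, "mask is not a modulo for size 1000");
```

With a size of 1000 the cursor that should wrap to slot 0 would be sent to slot 992 instead, which is the aliasing the invariant warns about.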

Invariant: m_oldest_visible only moves forward, and only while unpinned (it is monotonic and stays frozen whenever m_ov_lock_count > 0). Otherwise vacuum could reclaim a version that a running reader still needs, and that reader would see a missing row (Ch 9).
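One way to picture the "monotonic, frozen while pinned" rule is the sketch below. `oldest_visible_sketch` and `try_advance` are hypothetical, and the sketch ignores the window between the pin check and the store that a real implementation has to close; Chapter 9 covers the actual coordination.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a pinnable, forward-only watermark; not CUBRID code.
struct oldest_visible_sketch
{
  std::atomic<std::uint64_t> oldest{4};   // starts at MVCCID_FIRST (4)
  std::atomic<std::size_t> lock_count{0}; // nonzero = pinned

  // Tries to raise the watermark to `candidate` and returns the value in
  // effect afterwards. The watermark never moves backward and never moves
  // while pinned.
  std::uint64_t try_advance (std::uint64_t candidate)
  {
    if (lock_count.load () > 0)
      {
        return oldest.load (); // pinned: keep the current floor
      }
    std::uint64_t cur = oldest.load ();
    while (candidate > cur && !oldest.compare_exchange_weak (cur, candidate))
      {
        // cur was refreshed by the failed exchange; the loop re-checks it
      }
    return oldest.load ();
  }
};
```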

Visibility returns a typed enum, not a bool, so callers distinguish “too old” from “too new” (the latter triggers a version-chain walk).

// mvcc_satisfies_snapshot_result -- src/transaction/mvcc.h
enum mvcc_satisfies_snapshot_result
{ TOO_OLD_FOR_SNAPSHOT, SNAPSHOT_SATISFIED, TOO_NEW_FOR_SNAPSHOT };

TOO_OLD_FOR_SNAPSHOT = dead to me, stop; SNAPSHOT_SATISFIED = read it; TOO_NEW_FOR_SNAPSHOT = born after my snapshot, follow prev_version_lsa.
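How a caller might branch on the three results can be sketched with an in-memory chain. Everything except the enum is hypothetical: `version_sketch` replaces a heap record, an integer index replaces prev_version_lsa, and the precomputed `verdict` field stands in for the result of calling snapshot_fnc on that version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum mvcc_satisfies_snapshot_result
{ TOO_OLD_FOR_SNAPSHOT, SNAPSHOT_SATISFIED, TOO_NEW_FOR_SNAPSHOT };

// Hypothetical in-memory version; `prev` is the index of the older version,
// or -1 when the chain ends.
struct version_sketch
{
  int payload;
  mvcc_satisfies_snapshot_result verdict;
  int prev;
};

// Returns the index of the version to read, or -1 if none is visible.
inline int find_visible (const std::vector<version_sketch> &chain, int newest)
{
  for (int cur = newest; cur >= 0; cur = chain[static_cast<std::size_t> (cur)].prev)
    {
      switch (chain[static_cast<std::size_t> (cur)].verdict)
        {
        case SNAPSHOT_SATISFIED:
          return cur;  // read this version
        case TOO_OLD_FOR_SNAPSHOT:
          return -1;   // dead to this snapshot, stop walking
        case TOO_NEW_FOR_SNAPSHOT:
          break;       // born after the snapshot, try the older version
        }
    }
  return -1;
}
```

A plain bool could not tell the last two cases apart, so the walk would not know whether to stop or to keep following the chain.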

// mvcc_satisfies_delete_result -- src/transaction/mvcc.h
enum mvcc_satisfies_delete_result
{
DELETE_RECORD_INSERT_IN_PROGRESS, DELETE_RECORD_CAN_DELETE,
DELETE_RECORD_DELETED, DELETE_RECORD_DELETE_IN_PROGRESS,
DELETE_RECORD_SELF_DELETED
};

INSERT_IN_PROGRESS = inserter uncommitted, cannot touch; CAN_DELETE = clear; DELETED = another tran committed the delete; DELETE_IN_PROGRESS = a live tran holds it, wait or abort; SELF_DELETED = I deleted it in this tran.

// mvcc_satisfies_vacuum_result -- src/transaction/mvcc.h
enum mvcc_satisfies_vacuum_result
{
VACUUM_RECORD_REMOVE, VACUUM_RECORD_DELETE_INSID_PREV_VER,
VACUUM_RECORD_CANNOT_VACUUM
};

REMOVE = reclaim the dead version; DELETE_INSID_PREV_VER = keep row, drop the useless insert id + prev-version link; CANNOT_VACUUM = not reclaimable yet. These enums are dissected against their producing functions in Chapters 6–7.

One global mvcctable in log_Gl, one mvcc_info per tdes, and the on-disk mvcc_rec_header that visibility compares against the snapshot in that mvcc_info. Every embedded mvcc_active_tran is an independent value-copy, not a shared pointer — so a reader holds a stable snapshot while the global status advances.

graph TD
  subgraph Global["log_Gl"]
    MT["mvcctable"]
    MT --> CUR["m_current_trans_status"]
    MT --> RING["m_trans_status_history[2048]"]
    MT --> LVA["m_transaction_lowest_visible_mvccids[]"]
    MT --> OV["m_oldest_visible + m_ov_lock_count"]
    CUR --> CAT["m_active_mvccs (by value)"]
    RING --> RAT["m_active_mvccs (by value)"]
  end

  subgraph Tdes["log_tdes (per transaction)"]
    MI["mvcc_info"]
    MI --> SNAP["snapshot (mvcc_snapshot)"]
    MI --> ID["id"]
    MI --> SUB["sub_ids + is_sub_active"]
    SNAP --> SAT["m_active_mvccs (by value)"]
    SNAP --> WM["lowest/highest watermarks"]
  end

  subgraph Disk["Heap record (on disk)"]
    RH["mvcc_rec_header"]
    RH --> INS["mvcc_ins_id / mvcc_del_id"]
    RH --> PV["prev_version_lsa"]
  end

  MT -. "build_mvcc_info copies bits + watermarks" .-> SNAP
  SNAP -. "snapshot_fnc compares" .-> RH
  PV -. "version chain walk" .-> Disk

Figure 1-1. Solid arrows = containment (by value unless labelled); dashed arrows = runtime data flows: table seeds a snapshot, snapshot_fnc evaluates a header, a too-new record walks prev_version_lsa.

  • The legacy macro MVCC_IS_REC_DELETED_BY still dereferences a delid_chn.mvcc_del_id union member that is gone from mvcc_rec_header (chn and mvcc_del_id are now separate fields). It is dead code; live reads use MVCC_GET_DELID / MVCC_IS_HEADER_DELID_VALID.

  1. MVCCID is a 64-bit counter with a reserved low band (0/3/4 = MVCCID_NULL/MVCCID_ALL_VISIBLE/MVCCID_FIRST); MVCCID_FORWARD keeps live ids above it.
  2. mvcc_rec_header is the only on-disk MVCC struct; mvcc_flag:8 selects which of mvcc_ins_id, mvcc_del_id (vs chn), prev_version_lsa apply — DELID/chn exclusive on disk, distinct in memory.
  3. mvcc_active_tran is a sliding bit window plus a long-tran overflow array (offset = id - m_bit_area_start_mvccid, window ≤ 500×64 ids, volatile fields for lock-free reads).
  4. mvcc_snapshot brackets the bit area with two watermarks, takes a swappable snapshot_fnc, and copies only via copy_to.
  5. mvcc_info is the per-tdes bundle of snapshot, own id, the fast recent_snapshot_lowest_active_mvccid cutoff, and sub-transaction state.
  6. mvcc_trans_status is a seqlock-versioned active-set snapshot validated by m_version alone; the other two fields are diagnostic.
  7. mvcctable is the single global coordinator — id minting, a power-of-two history ring, the lowest-visible array, and the monotonic pinnable m_oldest_visible watermark gating vacuum.

Chapter 2: Initialization and Memory Layout

Reader question: before any transaction runs, how is each MVCC structure allocated, sized, and bootstrapped, and where do the magic sizes come from? Chapter 1 mapped the three owning objects; here we trace every constructor, initialize/finalize, reset*, and size helper. Visibility theory lives in cubrid-mvcc.md.

The MVCCID counter does not live in any of these three structures — it lives in the log header, log_Gl.hdr.mvcc_next_id (log_storage.hpp); the table only reads and forwards it under m_new_mvccid_lock and seeds its own derived start markers from it. This governs 2.6; the structs here hold only derived state.

  • Dynamic, per-tran-index — sized to logtb_get_number_of_total_tran_indices () (= log_Gl.trantable.num_total_indices): m_long_tran_mvccids and m_transaction_lowest_visible_mvccids. Re-sizable as the tran table grows.
  • Fixed, compile-time — bit area BITAREA_MAX_SIZE = 500 units, ring HISTORY_MAX_SIZE = 2048 slots. Never resize; overflow migrates (bit area to long-tran array) or overwrites (ring wraps), never reallocs.
// mvcc_active_tran (private) -- src/transaction/mvcc_active_tran.hpp
using unit_type = std::uint64_t;
static const size_t BITAREA_MAX_SIZE = 500; // 500 units, fixed
static const size_t UNIT_BIT_COUNT = sizeof (unit_type) * BYTE_BIT_COUNT; // 64
static const size_t BITAREA_MAX_MEMSIZE = BITAREA_MAX_SIZE * UNIT_BYTE_COUNT; // 4000 bytes
static const size_t BITAREA_MAX_BITS = BITAREA_MAX_SIZE * UNIT_BIT_COUNT; // 32000 bits
static const unit_type ALL_ACTIVE = 0;
static const unit_type ALL_COMMITTED = (unit_type) -1;

A unit_type is 64 bits, so the bit area is 500 words = 4000 bytes tracking 32000 MVCCIDs. ALL_ACTIVE = 0 is load-bearing: a new[]()-zeroed buffer already means “every slot active,” so init never scrubs. The helpers convert between the three units:

// mvcc_active_tran helpers -- src/transaction/mvcc_active_tran.cpp
size_t bit_size_to_unit_size (size_t b) { return (b + UNIT_BIT_COUNT - 1) / UNIT_BIT_COUNT; } // bits->words, ceil
size_t units_to_bits (size_t n) { return n * UNIT_BIT_COUNT; } // words -> bits
size_t units_to_bytes (size_t n) { return n * UNIT_BYTE_COUNT; } // words -> bytes
size_t get_area_size () const { return bit_size_to_unit_size (m_bit_area_length); } // LIVE words
size_t get_bit_area_memsize () const { return units_to_bytes (get_area_size ()); } // LIVE bytes

get_area_size() is live words (from m_bit_area_length); BITAREA_MAX_SIZE is allocated words. The buffer is always full-width; reset and copy paths touch only get_bit_area_memsize() bytes — hence trailing words must stay ALL_ACTIVE.
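The unit conversions can be checked with a few worked values. The three helpers are rewritten here as free `constexpr` functions so the arithmetic runs at compile time; the definition `UNIT_BYTE_COUNT = sizeof (unit_type)` is an assumption, since the excerpt above uses the constant without showing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using unit_type = std::uint64_t;
constexpr std::size_t UNIT_BYTE_COUNT = sizeof (unit_type);  // assumed definition: 8
constexpr std::size_t UNIT_BIT_COUNT = UNIT_BYTE_COUNT * 8;  // 64

constexpr std::size_t bit_size_to_unit_size (std::size_t b)
{
  return (b + UNIT_BIT_COUNT - 1) / UNIT_BIT_COUNT;  // bits -> words, rounded up
}
constexpr std::size_t units_to_bits (std::size_t n) { return n * UNIT_BIT_COUNT; }
constexpr std::size_t units_to_bytes (std::size_t n) { return n * UNIT_BYTE_COUNT; }

static_assert (bit_size_to_unit_size (0) == 0, "empty window uses no words");
static_assert (bit_size_to_unit_size (1) == 1, "one bit still needs one word");
static_assert (bit_size_to_unit_size (64) == 1, "a full word");
static_assert (bit_size_to_unit_size (65) == 2, "one bit over spills into a second word");
static_assert (units_to_bytes (bit_size_to_unit_size (65)) == 16, "live bytes for 65 bits");
static_assert (units_to_bits (500) == 32000, "matches BITAREA_MAX_BITS");
static_assert (units_to_bytes (500) == 4000, "matches BITAREA_MAX_MEMSIZE");
```

So a window with m_bit_area_length = 65 has a live prefix of 2 words (16 bytes), while the allocation behind it stays at 500 words (4000 bytes).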

flowchart LR
  subgraph mvcctable["mvcctable (one per log_Gl)"]
    A["m_transaction_lowest_visible_mvccids<br/>atomic MVCCID [ total_tran_indices ]"]
    B["m_trans_status_history<br/>mvcc_trans_status [ 2048 ]"]
    C["m_current_trans_status<br/>mvcc_trans_status"]
  end
  C --> F["m_active_mvccs : mvcc_active_tran"]
  B --> G["[i].m_active_mvccs : mvcc_active_tran"]
  F --> H["m_bit_area : unit_type [500]<br/>m_long_tran_mvccids : MVCCID [ total_tran_indices ]"]
  G --> H

Figure 2-1. Ownership and sizing axes. The two MVCCID arrays are dynamic-axis; the [500] bit area and [2048] ring are fixed-axis. Each mvcc_trans_status (current + 2048 ring) embeds a full mvcc_active_tran, so a live table holds 2049 bit areas.

2.2 mvcc_active_tran: construction, initialize, finalize, reset

The default constructor builds an empty, uninitialized object — no heap, pointers NULL, start marker MVCCID_FIRST. initialize makes the struct’s only two heap allocations, guarded for idempotency:

// mvcc_active_tran::mvcc_active_tran -- src/transaction/mvcc_active_tran.cpp
mvcc_active_tran::mvcc_active_tran ()
: m_bit_area (NULL) , m_bit_area_start_mvccid (MVCCID_FIRST) /* <- 4, never 0 */ , m_bit_area_length (0)
, m_long_tran_mvccids (NULL) , m_long_tran_mvccids_length (0) , m_initialized (false) { }
// mvcc_active_tran::initialize -- src/transaction/mvcc_active_tran.cpp
void mvcc_active_tran::initialize ()
{
if (m_initialized) { return; } /* <- branch 1: already up, no-op */
m_bit_area = new unit_type[BITAREA_MAX_SIZE] (); /* <- () zero-inits => ALL_ACTIVE */
m_bit_area_start_mvccid = MVCCID_FIRST;
m_bit_area_length = 0;
m_long_tran_mvccids = new MVCCID[long_tran_max_size ()] (); /* <- sized to tran indices */
m_long_tran_mvccids_length = 0;
m_initialized = true;
}

The trailing () value-initializes both arrays to zero; because ALL_ACTIVE == 0, that is a valid “all active, length 0” state. long_tran_max_size () returns logtb_get_number_of_total_tran_indices () — the ceiling on simultaneously-active “long” transactions (older than m_bit_area_start_mvccid); add_long_transaction asserts m_long_tran_mvccids_length < long_tran_max_size ().

finalize frees both arrays, nulls them, drops the flag — unlike ~mvcc_active_tran, it resets state so the object can be initialized again. reset and reset_active_transactions are not finalize: both keep allocations and only wipe content (snapshot-copy retry path, Chapter 5). They differ: reset memsets only the live prefix (get_bit_area_memsize (), guarded against a zero-byte call) and unseeds the start to MVCCID_NULL; reset_active_transactions memsets the whole BITAREA_MAX_MEMSIZE and keeps the start. The full clear is needed because a failed lock-free copy (version changed mid-memcpy) may have written garbage past the prefix that a prefix-only clear would miss before retry.

// mvcc_active_tran::finalize -- src/transaction/mvcc_active_tran.cpp
void mvcc_active_tran::finalize ()
{ delete [] m_bit_area; m_bit_area = NULL; delete [] m_long_tran_mvccids; m_long_tran_mvccids = NULL; m_initialized = false; }
// mvcc_active_tran::reset -- src/transaction/mvcc_active_tran.cpp
void mvcc_active_tran::reset ()
{
if (!m_initialized) { return; } /* <- branch 1: bare object => no-op */
if (m_bit_area_length > 0) /* <- branch 2: memset only LIVE prefix */
{ std::memset (m_bit_area, 0, get_bit_area_memsize ()); }
m_bit_area_length = 0;
m_bit_area_start_mvccid = MVCCID_NULL; /* <- NULL (0), not MVCCID_FIRST */
m_long_tran_mvccids_length = 0;
check_valid ();
}
// mvcc_active_tran::reset_active_transactions -- src/transaction/mvcc_active_tran.cpp
void mvcc_active_tran::reset_active_transactions ()
{ std::memset (m_bit_area, 0, BITAREA_MAX_MEMSIZE); /* <- full 4000 bytes */ m_bit_area_length = 0; m_long_tran_mvccids_length = 0; }

Invariant (trailing-words-clear). Every word past get_unit_of(m_bit_area_length) + 1 up to m_bit_area + BITAREA_MAX_SIZE equals ALL_ACTIVE (0), and bits past m_bit_area_length in the last partial word are 0. Enforced by check_valid (an #ifndef NDEBUG loop asserting *p_area == ALL_ACTIVE); maintained by initialize, reset, ltrim_area, reset_active_transactions. If violated, compute_lowest_active_mvccid / compute_highest_completed_mvccid read stale set-bits beyond the live length and report a wrong watermark, corrupting visibility.
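A check in the spirit of this invariant might look like the sketch below. It is not the body of check_valid; `trailing_words_clear` is a hypothetical function, it takes the buffer as a `std::vector`, and it assumes length_bits never exceeds the buffer's bit capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using unit_type = std::uint64_t;
constexpr std::size_t UNIT_BIT_COUNT = 64;
constexpr unit_type ALL_ACTIVE = 0;

// Returns true when every bit at or beyond length_bits is zero.
// Assumes length_bits <= area.size () * UNIT_BIT_COUNT.
inline bool trailing_words_clear (const std::vector<unit_type> &area, std::size_t length_bits)
{
  std::size_t full_words = length_bits / UNIT_BIT_COUNT;
  std::size_t tail_bits = length_bits % UNIT_BIT_COUNT;
  std::size_t first_clear_word = full_words;
  if (tail_bits != 0)
    {
      // Bits at or above tail_bits in the last partial word must be clear.
      if ((area[full_words] >> tail_bits) != 0)
        {
          return false;
        }
      first_clear_word = full_words + 1;
    }
  for (std::size_t i = first_clear_word; i < area.size (); i++)
    {
      if (area[i] != ALL_ACTIVE)
        {
          return false;
        }
    }
  return true;
}
```

A stray set bit past the live length fails the check, and that is the situation in which a watermark scan would otherwise mistake leftover data for a completed id.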

| Field | Role | Why it exists |
|---|---|---|
| m_bit_area | Window of BITAREA_MAX_SIZE words; bit value 0 = active, 1 = committed | One bit per recent MVCCID, 64/word |
| m_bit_area_start_mvccid (volatile MVCCID) | MVCCID mapped to bit offset 0 | Anchors the window |
| m_bit_area_length (volatile size_t) | Bits in use (not bytes, not alloc size) | Bounds scans/memsets to live prefix |
| m_long_tran_mvccids | Ascending array of active MVCCIDs below the window | Overflow store for older transactions |
| m_long_tran_mvccids_length (volatile size_t) | Live entries in the long-tran array | Bounds scans; asserted < long_tran_max_size() |
| m_initialized | Has initialize run (and not finalized) | Idempotent init, safe-no-op reset |

volatile on three fields reflects lock-free reads of the active set mutated under m_active_trans_mutex; the real fence is the version recheck in build_mvcc_info (Chapter 5).

2.3 mvcc_trans_status: the version-tagged wrapper

A thin envelope: one mvcc_active_tran plus bookkeeping and the atomic version counter readers spin on. The ctor sets version 0 and neutral info fields; initialize delegates down and re-stamps version 0 (it may run on a recycled object whose version was bumped); finalize mirrors it.

// mvcc_trans_status -- src/transaction/mvcc_table.hpp
struct mvcc_trans_status
{
using version_type = unsigned int;
enum event_type { COMMIT, ROLLBACK, SUBTRAN };
mvcc_active_tran m_active_mvccs;
MVCCID m_last_completed_mvccid; // just for info
event_type m_event_type; // just for info
std::atomic<version_type> m_version;
};
// ctor / initialize / finalize -- src/transaction/mvcc_table.cpp
mvcc_trans_status::mvcc_trans_status ()
: m_active_mvccs () , m_last_completed_mvccid (MVCCID_NULL) , m_event_type (COMMIT) , m_version (0) { }
void mvcc_trans_status::initialize () { m_active_mvccs.initialize (); m_version = 0; }
void mvcc_trans_status::finalize () { m_active_mvccs.finalize (); }

| Field | Role | Why it exists |
|---|---|---|
| m_active_mvccs | The active-transaction bitmap for this snapshot | The payload; rest is metadata |
| m_last_completed_mvccid | Last MVCCID committed/rolled back here | Info/debug; visibility ignores it |
| m_event_type | COMMIT/ROLLBACK/SUBTRAN | Info; debugger trace of ring advances |
| m_version (atomic) | Monotonic stamp bumped on every status change | Lock-free guard: re-read unchanged = consistent copy |

2.4 mvcctable::initialize: the 2048-slot ring and the per-tran array

mvcctable is the single owning object (log_Gl.mvcc_table). Its constructor wires every member to a benign default but allocates nothing (markers MVCCID_FIRST/MVCCID_NULL, pointers NULL, sizes/counts 0). initialize (called once at boot from logtb_define_trantable_log_latch) does the allocation:

// mvcctable::initialize -- src/transaction/mvcc_table.cpp
void mvcctable::initialize ()
{
m_current_trans_status.initialize (); /* <- 1: seed the live status */
m_trans_status_history = new mvcc_trans_status[HISTORY_MAX_SIZE]; /* <- 2: 2048 slots, ctors only */
for (size_t idx = 0; idx < HISTORY_MAX_SIZE; idx++)
{ m_trans_status_history[idx].initialize (); } /* <- 3: each slot allocs its bit area */
m_trans_status_history_position = 0; /* <- 4: ring head at slot 0 */
m_current_status_lowest_active_mvccid = MVCCID_FIRST; /* <- 5: nothing older than 4 active */
alloc_transaction_lowest_active (); /* <- 6: per-tran array */
}

Step ordering is deliberate: the bare new[2048] only runs constructors (each embedded active set NULL); the per-slot initialize does the heap work. So a live table holds 2049 mvcc_active_tran instances (current + 2048 ring), each a 4000-byte buffer. HISTORY_MAX_SIZE = 2048 is a power of two so the ring wraps with & HISTORY_INDEX_MASK (= 2047), not a modulo.
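The footprint implied by these numbers can be worked out directly. The sketch below counts only the two arrays each mvcc_active_tran allocates and assumes an 8-byte MVCCID and an 8-byte unit_type; the function and constant names are hypothetical, and the tran-index count passed in is an arbitrary example value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t BIT_AREA_BYTES = 500 * sizeof (std::uint64_t);  // 4000
constexpr std::size_t INSTANCES = 2048 + 1;  // ring slots + current status

// Total bytes held by all bit areas of a live table.
constexpr std::size_t bit_area_total ()
{
  return INSTANCES * BIT_AREA_BYTES;
}

// Total bytes held by all long-tran arrays for a given tran-index count.
constexpr std::size_t long_tran_total (std::size_t total_tran_indices)
{
  return INSTANCES * total_tran_indices * sizeof (std::uint64_t);
}

static_assert (bit_area_total () == 8196000, "2049 x 4000 bytes");
static_assert (long_tran_total (100) == 1639200, "2049 x 100 x 8 bytes");
```

Under these assumptions the bit areas alone come to 8,196,000 bytes (roughly 7.8 MiB), and the long-tran arrays add 2049 × 8 bytes for every transaction index.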

Invariant (history-position-in-range). m_trans_status_history_position < HISTORY_MAX_SIZE always. Enforced at init (0) and on advance by (pos + 1) & HISTORY_INDEX_MASK (masks into [0, 2047]); build_mvcc_info and is_active assert (index < HISTORY_MAX_SIZE). If violated, readers index past the ring and read unrelated memory as an active set.

finalize tears down in reverse, zeroing the per-tran size so a later alloc re-allocates. It does not loop the ring slots: delete [] m_trans_status_history runs each ~mvcc_trans_status then ~mvcc_active_tran, freeing every slot’s buffers; the current status’s bit area is freed explicitly via m_current_trans_status.finalize ().

// mvcctable::finalize -- src/transaction/mvcc_table.cpp
void mvcctable::finalize ()
{
m_current_trans_status.finalize ();
delete [] m_trans_status_history; m_trans_status_history = NULL;
delete [] m_transaction_lowest_visible_mvccids; m_transaction_lowest_visible_mvccids = NULL;
m_transaction_lowest_visible_mvccids_size = 0; /* <- forces re-alloc on next init */
}
flowchart TD
  S(["mvcctable::initialize"]) --> A["m_current_trans_status.initialize()<br/>=> allocs current bit area + long-tran array"]
  A --> B["new mvcc_trans_status[2048]<br/>=> 2048 default ctors, arrays still NULL"]
  B --> C{"loop idx 0..2047"}
  C -->|each| D["history[idx].initialize()<br/>=> allocs that slot's bit area + long-tran array"]
  C -->|done| E["history_position = 0"]
  E --> F["current_status_lowest_active = MVCCID_FIRST"]
  F --> G(["alloc_transaction_lowest_active()"])

Figure 2-2. mvcctable::initialize control flow.

2.5 alloc_transaction_lowest_active: size-change detection

The one allocation that can run more than once — at boot and on every transaction-table resize — written as a re-alloc-on-resize check:

// mvcctable::alloc_transaction_lowest_active -- src/transaction/mvcc_table.cpp
void mvcctable::alloc_transaction_lowest_active ()
{
if (m_transaction_lowest_visible_mvccids_size != (size_t) logtb_get_number_of_total_tran_indices ())
{ // first time or tran table resized
delete [] m_transaction_lowest_visible_mvccids; /* <- delete NULL is fine first time */
m_transaction_lowest_visible_mvccids_size = logtb_get_number_of_total_tran_indices ();
m_transaction_lowest_visible_mvccids =
new lowest_active_mvccid_type[m_transaction_lowest_visible_mvccids_size] (); // all 0 = MVCCID_NULL
}
}

Two branches: (1) size matches — body skipped (steady-state when reached redundantly, e.g. logtb_expand_trantable re-invoked with no net change); (2) size differs (first call has stored size 0, or a genuine resize) — old array freed (delete [] NULL is legal), size updated, new lowest_active_mvccid_type (= std::atomic<MVCCID>) array value-initialized so every element reads MVCCID_NULL (0) = “no snapshot lower bound.” The hook is wired in logtb_expand_trantable, so the array tracks growth without re-running initialize.
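The two branches can be exercised in isolation with a small stand-in. `lowest_visible_sketch` and `ensure_size` are hypothetical names, and the requested size is passed in as a parameter instead of being read from logtb_get_number_of_total_tran_indices ().

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-tran array and its size guard.
struct lowest_visible_sketch
{
  std::atomic<std::uint64_t> *ids = nullptr;
  std::size_t size = 0;

  lowest_visible_sketch () = default;
  lowest_visible_sketch (const lowest_visible_sketch &) = delete;
  lowest_visible_sketch &operator= (const lowest_visible_sketch &) = delete;
  ~lowest_visible_sketch () { delete [] ids; }

  // Returns true when a (re)allocation happened.
  bool ensure_size (std::size_t total_tran_indices)
  {
    if (size == total_tran_indices)
      {
        return false;  // branch 1: size matches, nothing to do
      }
    delete [] ids;     // branch 2: deleting a null pointer is legal
    size = total_tran_indices;
    ids = new std::atomic<std::uint64_t>[size] ();  // value-init: every slot reads 0
    return true;
  }
};
```

As in the excerpt, a resize discards the old values instead of copying them, so every slot reads 0 (MVCCID_NULL) again after the array grows.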

Invariant (per-tran-array sized to live tran count). m_transaction_lowest_visible_mvccids_size == logtb_get_number_of_total_tran_indices () whenever the array is used. Enforced by this guard each time the table could grow. If violated (array too small), build_mvcc_info / complete_mvcc indexing [tdes.tran_index] writes out of bounds; build_mvcc_info defends with assert (tdes.tran_index < logtb_get_number_of_total_tran_indices ()).

| Field | Role | Why it exists |
|---|---|---|
| m_transaction_lowest_visible_mvccids | One atomic<MVCCID> per tran index: lowest visible MVCCID | Source for global oldest-visible; dynamic axis |
| m_transaction_lowest_visible_mvccids_size | Allocated length of that array | Resize detection in alloc_transaction_lowest_active |
| m_current_status_lowest_active_mvccid (atomic) | Lowest active MVCCID for the current status | Fast watermark; seeds snapshots |
| m_current_trans_status | Single live, mutable status (under mutex) | Write target; published into the ring |
| m_trans_status_history_position (atomic) | Index of most-recently-published ring slot | Lock-free reader entry point |
| m_trans_status_history | 2048-slot ring of published read-only snapshots | Stable past status without blocking writers |
| m_new_mvccid_lock (mutex) | Guards read-and-forward of log_Gl.hdr.mvcc_next_id | Serializes MVCCID issuance |
| m_active_trans_mutex (mutex) | Guards current-status mutation + ring publish | One completion mutates status at a time |
| m_oldest_visible (atomic) | Cached global oldest-visible for vacuum | Avoids re-scanning per query |
| m_ov_lock_count (atomic) | Holders pinning m_oldest_visible | Vacuum freezes the watermark (Chapter 9) |

2.6 Boot/restart seeding: reset_start_mvccid

initialize seeds markers to MVCCID_FIRST, but a real boot/restart has already issued MVCCIDs, so markers must be re-seeded from the persisted counter (introduced above):

// mvcctable::reset_start_mvccid -- src/transaction/mvcc_table.cpp
void mvcctable::reset_start_mvccid () // not thread safe (header comment)
{
m_current_trans_status.m_active_mvccs.reset_start_mvccid (log_Gl.hdr.mvcc_next_id);
assert (m_trans_status_history_position < HISTORY_MAX_SIZE);
m_trans_status_history[m_trans_status_history_position]
.m_active_mvccs.reset_start_mvccid (log_Gl.hdr.mvcc_next_id); /* <- only the CURRENT ring slot */
m_current_status_lowest_active_mvccid.store (log_Gl.hdr.mvcc_next_id);
}
// mvcc_active_tran::reset_start_mvccid -- src/transaction/mvcc_active_tran.cpp
void mvcc_active_tran::reset_start_mvccid (MVCCID mvccid)
{ m_bit_area_start_mvccid = mvccid; if (m_initialized) { check_valid (); } }

It re-points three things at the recovered value: the current status’s active set, the currently-published ring slot’s active set, and the global lowest-active. Only the slot at m_trans_status_history_position is touched — it alone is live after restart; the other 2047 are stale-but-zeroed and re-published as the ring advances. The only code that advances log_Gl.hdr.mvcc_next_id is get_new_mvccid/get_two_new_mvccid (MVCCID_FORWARD under m_new_mvccid_lock). reset_start_mvccid runs from log_initialize_internal and three points in log_recovery.c (after analysis, after redo, finish), each right after the header is rebuilt; “not thread safe” is fine because these precede concurrent transactions.

stateDiagram-v2
  [*] --> Constructed : mvcctable ctor \n markers MVCCID_FIRST, arrays NULL
  Constructed --> Initialized : initialize \n allocs ring + per-tran array, seeds MVCCID_FIRST
  Initialized --> Reseeded : reset_start_mvccid \n markers to log_Gl.hdr.mvcc_next_id
  Reseeded --> Reseeded : recovery re-runs reset_start_mvccid
  Reseeded --> Serving : recovery done, accept transactions
  Serving --> Finalized : finalize \n free ring, free per-tran array
  Finalized --> [*]

Figure 2-3. mvcctable lifecycle: initialize once; reset_start_mvccid once per recovery phase; alloc_transaction_lowest_active re-runs on resize.

  1. The MVCCID counter is not in the table — it lives in log_Gl.hdr.mvcc_next_id, advanced only by get_new_mvccid/get_two_new_mvccid under m_new_mvccid_lock; the table holds derived markers re-synced via reset_start_mvccid.
  2. Two sizing axes: m_long_tran_mvccids and m_transaction_lowest_visible_mvccids are sized to logtb_get_number_of_total_tran_indices () (dynamic); the bit area (500 words = 32000 bits) and the ring (2048) are fixed, absorbing overflow by migration/wrap.
  3. ALL_ACTIVE == 0 makes zero-init meaningful — every new[]() already means “all active, length 0,” so initialize needs no scrub and the trailing-words-clear invariant holds for free.
  4. A live table holds 2049 bit areas — current status plus 2048 ring slots; the bare new[2048] only runs ctors, the per-slot initialize does the heap work.
  5. initialize/reset/reset_active_transactions/finalize are distinct — allocate; zero the live prefix and unseed to MVCCID_NULL; zero the full buffer for the failed-copy retry; free and de-initialize.
  6. alloc_transaction_lowest_active is the only re-runnable allocation — its size-change guard reallocates only when total_tran_indices differs (slots value-init to MVCCID_NULL); wired into logtb_expand_trantable.
  7. reset_start_mvccid is the boot/restart seam — re-points the current status, the current ring slot, and the global lowest-active at the recovered counter; single-threaded recovery only.

Chapter 3: MVCCID Birth and the On-Record Header

This chapter answers two questions: when does a version acquire its MVCCID, and how is that stamp serialized onto a heap record so that records which never needed delete or prev-version metadata cost zero extra bytes? The companion (cubrid-mvcc.md, MVCCID assignment policy / Per-record header) establishes why CUBRID uses lazy issuance and a flag-driven header; here we trace every branch. The lifecycle splits into birth (a 64-bit MVCCID minted lazily, once per writing transaction, under a spinlock) and serialization (that MVCCID plus an optional delete-id and prev-version pointer, encoded into a variable-length header whose length is determined by a 5-bit flag field in the high byte of the rep word).

3.1 Birth: lazy issuance through curr_mvcc_info->id

A read-only transaction never writes, so it never needs an MVCCID. The transaction descriptor (LOG_TDES) carries an mvccinfo.id that is MVCCID_NULL (= 0) until the first write demands a stamp. The gate is logtb_get_current_mvccid:

// logtb_get_current_mvccid -- src/transaction/log_tran_table.c
if (MVCCID_IS_VALID (curr_mvcc_info->id) == false) /* <- mint UNCONDITIONALLY, first write only */
  curr_mvcc_info->id = log_Gl.mvcc_table.get_new_mvccid ();
if (!tdes->mvccinfo.sub_ids.empty ())
  return tdes->mvccinfo.sub_ids.back (); /* <- sub shadows parent; parent already minted */
return curr_mvcc_info->id;

Order is load-bearing: the mint/validity test runs first and unconditionally, so even when a sub-transaction is open the parent id is minted (or confirmed valid) before sub_ids returns the sub-id. Branches: (1) id valid -> no allocation; the same MVCCID serves every row (atomic visibility); (2) sub_ids non-empty -> the sub-id (Chapter 10); (3) otherwise the parent id.

Invariant — one stamp per writing transaction. curr_mvcc_info->id transitions MVCCID_NULL -> <normal id> exactly once, only on a write, enforced by the MVCCID_IS_VALID guard and reset to MVCCID_NULL only at transaction end (logtb_complete_mvcc / reset()). Re-minting mid-transaction would give two rows of the same transaction different stamps — a reader could see a torn write.

A non-lazy entry exists for crash recovery: logtb_rv_assign_mvccid_for_undo_recovery forces tdes->mvccinfo.id = mvccid straight from a log record — at recovery the id is already known and restored verbatim, not re-minted. The “valid”/“all-visible” contract comes from storage_common.h: MVCCID_NULL is 0, MVCCID_ALL_VISIBLE is the literal 3, MVCCID_FIRST is 4, so a normal id is always >= 4. These reserved low values let the header overload “no insert id” to mean “visible to everyone” (§3.5).

flowchart TD
  A["needs current MVCCID"] --> D{"id valid?"}
  D -- "yes" --> S{"sub_ids empty?"}
  D -- "no, first write" --> F["get_new_mvccid(): id = mvcc_next_id; FORWARD"]
  F --> S
  S -- "no" --> C["return sub_ids.back()"]
  S -- "yes" --> E["return curr_mvcc_info->id"]

Figure 3-1: lazy issuance in logtb_get_current_mvccid; mint/validity test precedes the sub_ids test, matching source order.
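
The ordering argument above can be tried out in isolation. The following is a minimal sketch, not the CUBRID code: `ToyAllocator`, `ToyMvccInfo`, and `get_current_mvccid` are hypothetical stand-ins for `log_Gl.mvcc_table`, `tdes->mvccinfo`, and `logtb_get_current_mvccid`, and the allocator omits the lock and the wrap guard.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; only the control flow mirrors the text above.
using MVCCID = std::uint64_t;
constexpr MVCCID MVCCID_NULL = 0;
constexpr MVCCID MVCCID_FIRST = 4;

struct ToyAllocator              // stands in for log_Gl.mvcc_table
{
  MVCCID next_id = MVCCID_FIRST;
  MVCCID get_new_mvccid () { return next_id++; }   // single-threaded sketch: no lock, no wrap guard
};

struct ToyMvccInfo               // stands in for tdes->mvccinfo
{
  MVCCID id = MVCCID_NULL;
  std::vector<MVCCID> sub_ids;
};

// Mint the parent id on first use, then let an open sub-transaction shadow it.
MVCCID get_current_mvccid (ToyMvccInfo &info, ToyAllocator &alloc)
{
  if (info.id == MVCCID_NULL)
    info.id = alloc.get_new_mvccid ();   // runs at most once per transaction
  assert (info.id >= MVCCID_FIRST);      // the parent id is valid from here on

  if (!info.sub_ids.empty ())
    return info.sub_ids.back ();         // sub-id wins, but the parent was minted first
  return info.id;
}
```

Calling `get_current_mvccid` twice on the same `ToyMvccInfo` returns the same id and advances the allocator only once, which is the one-stamp-per-writing-transaction property.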

3.2 The allocator: get_new_mvccid and get_two_new_mvccid

The counter lives in the global log header (log_Gl.hdr.mvcc_next_id), bumped under a dedicated lock with a tiny critical section:

// mvcctable::get_new_mvccid -- src/transaction/mvcc_table.cpp
m_new_mvccid_lock.lock ();
id = log_Gl.hdr.mvcc_next_id;
MVCCID_FORWARD (log_Gl.hdr.mvcc_next_id); /* <- ++, skipping reserved 0..3 */
m_new_mvccid_lock.unlock ();

MVCCID_FORWARD is (id)++ with a wrap-guard: if the post-increment lands below MVCCID_FIRST (4) — only at the 64-bit unsigned wrap — it snaps back to 4, so 0..3 are never handed out live.

Invariant — monotonic, gap-tolerant issuance. Under m_new_mvccid_lock each caller reads then forwards, so every id is strictly greater than the last. Gaps are expected and harmless — a transaction may mint an id and roll back; visibility depends on ordering, not contiguity. Dropping the lock would let two threads read the same value and share a stamp, breaking §3.1.

get_two_new_mvccid handles the parent+sub case where a transaction’s first write happens inside a sub-transaction. It forwards twice in one lock acquisition (first to the parent, second to the sub). The single caller, logtb_get_new_subtransaction_mvccid, mints just one (get_new_mvccid) if the parent already has a valid id; only when the parent is still MVCCID_NULL does it call get_two_new_mvccid, so the parent always gets the smaller of the pair — it must sort before its own sub-transaction in the active set (Chapter 10).
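
As an illustration of the allocator described in this section, here is a small self-contained sketch. It assumes the forward step behaves as stated above (increment, then snap to MVCCID_FIRST if the result lands in the reserved band). `ToyMvccidAllocator` and `mvccid_forward` are hypothetical names, and `std::mutex` is swapped in for whatever lock type `m_new_mvccid_lock` really is.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>

using MVCCID = std::uint64_t;
constexpr MVCCID MVCCID_FIRST = 4;

// Increment, snapping back to MVCCID_FIRST when the result falls in the
// reserved band 0..3 (reachable only when the 64-bit counter wraps).
inline void mvccid_forward (MVCCID &id)
{
  ++id;
  if (id < MVCCID_FIRST)
    id = MVCCID_FIRST;
}

class ToyMvccidAllocator
{
public:
  explicit ToyMvccidAllocator (MVCCID start = MVCCID_FIRST) : m_next (start)
  {
    assert (start >= MVCCID_FIRST);
  }

  MVCCID get_new ()
  {
    std::lock_guard<std::mutex> guard (m_lock);
    MVCCID id = m_next;
    mvccid_forward (m_next);
    return id;
  }

  // Returns {parent, sub}. Both ids are taken under one lock acquisition,
  // so no other caller can slip an id between them.
  std::pair<MVCCID, MVCCID> get_two_new ()
  {
    std::lock_guard<std::mutex> guard (m_lock);
    MVCCID parent = m_next;
    mvccid_forward (m_next);
    MVCCID sub = m_next;
    mvccid_forward (m_next);
    return {parent, sub};
  }

private:
  std::mutex m_lock;
  MVCCID m_next;
};
```

Away from the wrap point, the pair returned by `get_two_new` is two consecutive ids with the parent first, which is the ordering the active set relies on.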

or_mvcc_get_header deserializes a record into a MVCC_REC_HEADER. It is wider than the on-disk form: all fields always exist in RAM, only the flagged ones are written back.

// mvcc_rec_header -- src/transaction/mvcc.h
struct mvcc_rec_header {
  INT32 mvcc_flag:8; /* MVCC flags */
  INT32 repid:24; /* representation id */
  int chn; /* cache coherency number */
  MVCCID mvcc_ins_id; /* MVCC insert id */
  MVCCID mvcc_del_id; /* MVCC delete id */
  LOG_LSA prev_version_lsa; /* log address of previous version */
};

| Field | Role | Why it exists |
|---|---|---|
| mvcc_flag:8 | low 5 bits (OR_MVCC_FLAG_MASK = 0x1f) say which optional members are present on disk | The flag is the schema for the variable-length encoding: it decides header length and every branch in §3.4 |
| repid:24 | Representation id of the row’s schema version | Packed into the same 32-bit word (flags top byte, repid low 24); one OR_GET_INT recovers both |
| chn | Cache coherency number, bumped on non-MVCC updates. Role-shared with delete-id: VALID_DELID clear (live) -> slot is a real CHN, mvcc_del_id in-RAM MVCCID_NULL; set -> slot superseded, real deleter MVCCID 8 bytes later | A record is CHN-bearing or DELID-bearing in meaning, never both at once; see the VALID_DELID invariant in §3.4 |
| mvcc_ins_id | MVCCID of the inserting transaction | Birth stamp; compared against a reader’s snapshot for insert-visibility |
| mvcc_del_id | MVCCID of the deleting/updating transaction | Present only on delete/in-place-update; common-case absence saves 8 bytes |
| prev_version_lsa | Log LSA of the previous version | Walks the version chain backward; present only on updated rows |

classDiagram
  class mvcc_rec_header {
    +INT32 mvcc_flag : 8
    +INT32 repid : 24
    +int chn
    +MVCCID mvcc_ins_id
    +MVCCID mvcc_del_id
    +LOG_LSA prev_version_lsa
  }
  class OR_MVCC_FLAGS {
    VALID_INSID 0x01
    VALID_DELID 0x02
    VALID_PREV_VERSION 0x04
    MASK 0x1f
  }
  mvcc_rec_header --> OR_MVCC_FLAGS : low 5 bits select on-disk fields

Figure 3-2: the in-memory header and the flags that govern its on-disk projection.

3.4 The on-disk layout and its offset arithmetic

The first 4-byte word (OR_REP_OFFSET, OR_MVCC_REP_SIZE = 4) packs repid in its low 24 bits and the flags in its high byte, shifted by OR_MVCC_FLAG_SHIFT_BITS = 24. The CHN word (OR_CHN_SIZE = 4) always follows. Everything after is conditional and cumulative — each optional offset adds the size of every earlier present field:

// offset macros -- src/base/object_representation.h
#define OR_MVCC_INSERT_ID_OFFSET (OR_CHN_OFFSET + OR_CHN_SIZE) /* = 8 */
#define OR_MVCC_DELETE_ID_OFFSET(f) \
(OR_MVCC_INSERT_ID_OFFSET + (((f) & OR_MVCC_FLAG_VALID_INSID) ? OR_MVCC_INSERT_ID_SIZE : 0))
#define OR_MVCC_PREV_VERSION_LSA_OFFSET(f) \
(OR_MVCC_DELETE_ID_OFFSET(f) + (((f) & OR_MVCC_FLAG_VALID_DELID) ? OR_MVCC_DELETE_ID_SIZE : 0))

A record with only VALID_INSID puts nothing where delete-id would go. The flag-to-length lookup is mvcc_header_size_lookup[8] (indexed by the 3 active bits): flag 000 -> 8, 001/010 -> 16, 011 -> 24, … 111 -> 32, adding OR_MVCCID_SIZE per id flag and OR_MVCC_PREV_VERSION_LSA_SIZE for the prev-version bit. Endpoints are named OR_MVCC_MIN_HEADER_SIZE = 8 (no optional fields) and OR_MVCC_MAX_HEADER_SIZE = 32 (all three). A live, never-updated row carries a 16-byte header (rep + CHN + insid); the delete-id and prev-version slots simply do not exist on the page. This is the “unused slots cost zero bytes” property — it falls out of the conditional offset macros plus the size table.

Invariant — flag bits and physical length agree. mvcc_header_size_lookup[flag] must equal the bytes the or_mvcc_set_* sequence writes. or_mvcc_set_header enforces it: it compares old_mvcc_size vs new_mvcc_size and, if they differ, calls HEAP_MOVE_INSIDE_RECORD to grow/shrink the slot region so payload bytes are not overwritten. A setter writing a field whose flag was clear (or vice-versa) would misalign every later offset and parse the body as garbage.
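
The cumulative offset arithmetic can be checked with a few constexpr functions. This is a sketch under the sizes quoted in this section (4-byte rep word, 4-byte CHN, 8-byte ids, 8-byte prev-version LSA). The constant and function names are local to the example and are not the macros from object_representation.h.

```cpp
#include <cassert>
#include <cstddef>

// Sizes and flag values as quoted in the text; names are local to this sketch.
constexpr unsigned FLAG_VALID_INSID = 0x01;
constexpr unsigned FLAG_VALID_DELID = 0x02;
constexpr unsigned FLAG_VALID_PREV_VERSION = 0x04;
constexpr std::size_t REP_SIZE = 4;
constexpr std::size_t CHN_SIZE = 4;
constexpr std::size_t MVCCID_SIZE = 8;
constexpr std::size_t PREV_LSA_SIZE = 8;

// The insert id, when present, always starts right after rep word + CHN.
constexpr std::size_t insert_id_offset () { return REP_SIZE + CHN_SIZE; }

// Each later offset adds the size of every earlier field that is present.
constexpr std::size_t delete_id_offset (unsigned f)
{
  return insert_id_offset () + ((f & FLAG_VALID_INSID) ? MVCCID_SIZE : 0);
}

constexpr std::size_t prev_version_lsa_offset (unsigned f)
{
  return delete_id_offset (f) + ((f & FLAG_VALID_DELID) ? MVCCID_SIZE : 0);
}

// Total header length: the role the size lookup table plays in the text.
constexpr std::size_t header_size (unsigned f)
{
  return prev_version_lsa_offset (f) + ((f & FLAG_VALID_PREV_VERSION) ? PREV_LSA_SIZE : 0);
}

static_assert (header_size (0) == 8, "no optional fields: minimum header");
static_assert (header_size (FLAG_VALID_INSID) == 16, "live, never-updated row");
static_assert (header_size (0x07) == 32, "all three optional fields: maximum header");
```

Evaluating `header_size` for flags 0 through 7 reproduces the eight entries of the lookup table, so the table and the offset macros cannot disagree as long as both are derived from the same field sizes.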

3.4.1 or_mvcc_get_header — branch by branch

Deserialization reads in fixed field order, each step consulting the flag: repid and mvcc_flag are unpacked from the rep word, then or_mvcc_get_chn (always), or_mvcc_get_insid, or_mvcc_get_delid, and or_mvcc_get_prev_version_lsa (each flag-gated), with if (rc != NO_ERROR) goto exit_on_error; after each — exit_on_error returns the code, falling back to er_errid() / ER_FAILED. The helpers share one governing property: a flag-gated getter that finds its flag clear returns a sentinel and leaves buf->ptr exactly where it was, so the next read stays aligned and the cumulative offsets remain self-consistent without the reader touching the offset macros. Only the sentinel differs per field:

  • or_mvcc_get_insid: flag clear -> returns MVCCID_ALL_VISIBLE; set -> reads a BIGINT, advances OR_MVCCID_SIZE.

    // or_mvcc_get_insid -- src/base/object_representation_sr.c
    if (!(mvcc_flags & OR_MVCC_FLAG_VALID_INSID))
    return MVCCID_ALL_VISIBLE; /* <- ptr NOT advanced */
    // ... reads BIGINT, buf->ptr += OR_MVCCID_SIZE ...
  • or_mvcc_get_delid: clear -> returns MVCCID_NULL; set -> reads-and-advances.

  • or_mvcc_get_chn: unconditional — CHN has no flag (the slot always exists); reads OR_INT_SIZE, advances. Whether the slot is a CHN or a displaced delete id is decided by VALID_DELID (§3.3).

  • or_mvcc_get_prev_version_lsa: clear -> LSA_SET_NULL; set -> copies 8 bytes via struct assignment, advances.
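
The "flag clear means sentinel and no cursor movement" rule shared by these getters can be sketched as follows. `ToyBuf`, `read_id`, `get_insid`, and `get_delid` are hypothetical simplifications: the real helpers take an OR_BUF and report errors, and the byte order used here is simply the host's, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using MVCCID = std::uint64_t;
constexpr MVCCID MVCCID_NULL = 0;
constexpr MVCCID MVCCID_ALL_VISIBLE = 3;
constexpr unsigned FLAG_VALID_INSID = 0x01;
constexpr unsigned FLAG_VALID_DELID = 0x02;

struct ToyBuf { const unsigned char *ptr; };   // read cursor only

// Reads 8 bytes in host byte order and advances the cursor.
static MVCCID read_id (ToyBuf &buf)
{
  assert (buf.ptr != nullptr);
  MVCCID value;
  std::memcpy (&value, buf.ptr, sizeof value);
  buf.ptr += sizeof value;
  return value;
}

MVCCID get_insid (ToyBuf &buf, unsigned flags)
{
  if (!(flags & FLAG_VALID_INSID))
    return MVCCID_ALL_VISIBLE;   // field absent: sentinel, cursor untouched
  return read_id (buf);
}

MVCCID get_delid (ToyBuf &buf, unsigned flags)
{
  if (!(flags & FLAG_VALID_DELID))
    return MVCCID_NULL;          // field absent: sentinel, cursor untouched
  return read_id (buf);
}
```

With only VALID_DELID set, `get_insid` returns the all-visible sentinel without moving the cursor, so the following `get_delid` reads the delete id from the first optional slot.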

3.4.2 or_mvcc_set_header, or_mvcc_add_header, and the setters

or_mvcc_set_header rewrites an existing header (the resize-aware path above); or_mvcc_add_header prepends one to a fresh record (asserts record->length == 0, sets record->length to bytes written). Both run the same sequence — set_repid_and_flags -> set_chn -> set_insid -> set_delid -> set_prev_version_lsa — each short-circuiting on its flag: or_mvcc_set_insid returns NO_ERROR writing nothing when VALID_INSID is clear, else or_put_bigint; or_mvcc_set_delid and or_mvcc_set_prev_version_lsa mirror it; or_mvcc_set_chn is unconditional. The only structural difference: set first calls HEAP_MOVE_INSIDE_RECORD to reconcile size; add appends to a zero-length record.

or_mvcc_get_flag / or_mvcc_set_flag are narrow accessors used when only the flag byte must change:

// or_mvcc_set_flag -- src/base/object_representation_sr.c
repid_and_flag = OR_GET_INT (record->data + OR_REP_OFFSET);
repid_and_flag &= ~OR_MVCC_FLAG_MASK; /* <- clears LOW 5 bits 0x1f */
repid_and_flag += ((flags & OR_MVCC_FLAG_MASK) << OR_MVCC_FLAG_SHIFT_BITS); /* <- ADD into bits 24+ */
// ... or_put_int writes it back at OR_REP_OFFSET ...

Note the quirk: the mask target (low 5 bits) is not where the flags are combined (bits 24+), and the combine is +=, not bitwise OR — it works only because the high flag region was already cleared by whatever last wrote the word. or_mvcc_set_flag does not resize the record, so callers must separately ensure the physical layout matches the new flag — a sharp edge versus set_header.
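
For comparison, the conventional idiom for replacing a bit field is to clear the field at its shifted position and then OR in the new value. The sketch below shows that idiom for the rep-word layout from §3.4 (repid in the low 24 bits, flags shifted by 24). It is not the CUBRID function; all names are local to the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t FLAG_MASK = 0x1f;        // five flag bits
constexpr unsigned FLAG_SHIFT_BITS = 24;         // flags live in the high byte
constexpr std::uint32_t REPID_MASK = 0x00ffffff; // repid lives in the low 24 bits

constexpr std::uint32_t pack_rep_word (std::uint32_t repid, std::uint32_t flags)
{
  return (repid & REPID_MASK) | ((flags & FLAG_MASK) << FLAG_SHIFT_BITS);
}

constexpr std::uint32_t unpack_repid (std::uint32_t word)
{
  return word & REPID_MASK;
}

constexpr std::uint32_t unpack_flags (std::uint32_t word)
{
  return (word >> FLAG_SHIFT_BITS) & FLAG_MASK;
}

// Clear the flag field where it actually sits, then OR in the new flags.
// The result does not depend on what the field held before.
constexpr std::uint32_t replace_flags (std::uint32_t word, std::uint32_t flags)
{
  return (word & ~(FLAG_MASK << FLAG_SHIFT_BITS)) | ((flags & FLAG_MASK) << FLAG_SHIFT_BITS);
}

static_assert (unpack_flags (replace_flags (pack_rep_word (7, 0x01), 0x06)) == 0x06,
               "old flags are fully replaced");
```

Because the clear and the combine target the same bits, `replace_flags` leaves the repid untouched and is idempotent, which is the property a reader would normally expect from a flag setter.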

flowchart TD
  A["or_mvcc_set_header(record, hdr)"] --> B["old=lookup[old_flag]\nnew=lookup[hdr.flag]"]
  B --> C{"old != new?"}
  C -- "yes" --> D{"area_size big enough?"}
  D -- "no" --> E["assert(false); exit_on_error"]
  D -- "yes" --> F["HEAP_MOVE_INSIDE_RECORD"]
  C -- "no" --> G["set repid+flags"]
  F --> G
  G --> H["set_chn -> set_insid -> set_delid -> set_prev_version_lsa"]
  H --> I["NO_ERROR"]

Figure 3-3: resize-aware header rewrite in or_mvcc_set_header.

3.5 Heap-layer entry points and where the stamp lands

heap_get_mvcc_header dispatches on context->record_type:

  • REC_HOME: spage_get_record(..., PEEK) then or_mvcc_get_header; a PEEK failure or a get error both assert(false) and return S_ERROR (impossible by construction: page latched, slot validated).

  • REC_BIGONE: needs a forward page; delegates to the overflow reader.

  • REC_RELOCATION: reads the forward page at context->forward_oid.slotid, with the same get + error handling.

  • default: assert(false), S_ERROR; any other type is a caller bug.

heap_get_mvcc_rec_header_from_overflow is the special case: overflow records always store a maximum-size header. If the caller passes a NULL peek_recdes, it falls back to a stack ovf_recdes scratch buffer before pointing ->data at overflow_get_first_page_data (ovf_page) and forcing ->length = OR_MVCC_MAX_HEADER_SIZE, then calls or_mvcc_get_header, because on overflow pages the optional fields are always materialized (its sibling heap_set_mvcc_rec_header_on_overflow force-sets VALID_INSID/VALID_DELID and asserts OR_MVCC_MAX_HEADER_SIZE). The zero-cost-slot optimization is thus a home/relocation property only; big records trade space for a fixed layout.

Where does the insert stamp get written? The freshly built record carries VALID_INSID, set in heap_attrinfo_transform_header_to_disk (and heap_insert_adjust_recdes_header) via the repid_bits |= (OR_MVCC_FLAG_VALID_INSID << OR_MVCC_FLAG_SHIFT_BITS) branch, but the insid slot is initially 0. The real MVCCID is fetched via logtb_get_current_mvccid(thread_p) at log time. heap_mvcc_log_insert forces lazy issuance per §3.1 and logs only the rep-word, CHN, and body as redo crumbs, never the insid bytes. It branches on the record type and on the logging mode. For a non-REC_BIGONE record it emits four redo crumbs (record type, the rep-word OR_INT_SIZE, CHN OR_INT_SIZE, then the body from OR_HEADER_SIZE(p_recdes->data) onward, that is, past the insid slot); for REC_BIGONE it skips the header crumbs and logs only the record type plus the full record body (the overflow page carries its own max-size header). When thread_p->no_logging is set it calls log_append_undo_crumbs (RVHF_MVCC_INSERT, ...) with no redo, otherwise log_append_undoredo_crumbs:

// heap_mvcc_log_insert -- src/storage/heap_file.c
redo_crumbs[n_redo_crumbs].length = sizeof (p_recdes->type); /* <- always: record type */
redo_crumbs[n_redo_crumbs++].data = &p_recdes->type;
if (p_recdes->type != REC_BIGONE)
  {
    // ... rep-word OR_INT_SIZE, CHN OR_INT_SIZE crumbs ...
    data_copy_offset = OR_HEADER_SIZE (p_recdes->data); /* <- body starts past insid slot */
  } /* <- REC_BIGONE: data_copy_offset stays 0 */
redo_crumbs[n_redo_crumbs].length = p_recdes->length - data_copy_offset;
redo_crumbs[n_redo_crumbs++].data = p_recdes->data + data_copy_offset;
if (thread_p->no_logging)
  log_append_undo_crumbs (thread_p, RVHF_MVCC_INSERT, p_addr, 0, NULL); /* <- undo only */
else
  log_append_undoredo_crumbs (thread_p, RVHF_MVCC_INSERT, p_addr, 0, n_redo_crumbs, NULL, redo_crumbs);

On redo, heap_rv_mvcc_redo_insert re-stamps via MVCC_SET_INSID (&mvcc_rec_header, rcv->mvcc_id) then or_mvcc_add_header. So the MVCCID is logically issued lazily (§3.1), threaded through the log record’s mvcc_id, and physically written into the insid slot by the redo/apply path — recovery re-derives the stamp from rcv->mvcc_id, avoiding a double source of truth.

3.6 Interpreting the flags: the MVCC_IS_HEADER_* macros

Once a header is in RAM, callers ask three boolean questions, each combining a flag test with a value test:

// MVCC_IS_HEADER_* -- src/transaction/mvcc.h
#define MVCC_IS_HEADER_DELID_VALID(h) \
(MVCC_IS_FLAG_SET (h, OR_MVCC_FLAG_VALID_DELID) && MVCCID_IS_VALID (MVCC_GET_DELID (h)))
#define MVCC_IS_HEADER_INSID_NOT_ALL_VISIBLE(h) \
(MVCC_IS_FLAG_SET (h, OR_MVCC_FLAG_VALID_INSID) && MVCC_GET_INSID (h) != MVCCID_ALL_VISIBLE)
#define MVCC_IS_HEADER_ALL_VISIBLE(h) \
(!MVCC_IS_FLAG_SET (h, OR_MVCC_FLAG_VALID_INSID|OR_MVCC_FLAG_VALID_DELID) \
&& MVCC_GET_INSID (h) == MVCCID_ALL_VISIBLE)

The double test guards a half-built header: a flag may be set while the id is still MVCCID_NULL (during construction), or the in-RAM struct may hold MVCCID_ALL_VISIBLE with no flag set (what or_mvcc_get_insid returns when VALID_INSID is clear). ALL_VISIBLE is the steady state of a row old enough that vacuum stripped its insid flag — no insid, no delid, in-RAM insid reads as literal 3 — so it is unconditionally visible with no snapshot comparison. These feed Chapter 6 (visibility) and Chapter 7 (vacuum predicates).
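
To make the flag-plus-value tests concrete, the three predicates can be modeled on a plain struct. This is a sketch: `ToyHeader` and the function names are hypothetical, and the flag test is assumed to mean "any of the given bits is set", which is how the macros above read.

```cpp
#include <cassert>
#include <cstdint>

using MVCCID = std::uint64_t;
constexpr MVCCID MVCCID_NULL = 0;
constexpr MVCCID MVCCID_ALL_VISIBLE = 3;
constexpr unsigned FLAG_VALID_INSID = 0x01;
constexpr unsigned FLAG_VALID_DELID = 0x02;

struct ToyHeader            // in-RAM header, reduced to what the predicates read
{
  unsigned flags;
  MVCCID ins_id;
  MVCCID del_id;
};

// Assumption of this sketch: "flag set" means any of the requested bits is set.
inline bool is_flag_set (const ToyHeader &h, unsigned flags)
{
  return (h.flags & flags) != 0;
}

inline bool is_delid_valid (const ToyHeader &h)
{
  return is_flag_set (h, FLAG_VALID_DELID) && h.del_id != MVCCID_NULL;
}

inline bool is_insid_not_all_visible (const ToyHeader &h)
{
  return is_flag_set (h, FLAG_VALID_INSID) && h.ins_id != MVCCID_ALL_VISIBLE;
}

inline bool is_all_visible (const ToyHeader &h)
{
  return !is_flag_set (h, FLAG_VALID_INSID | FLAG_VALID_DELID)
         && h.ins_id == MVCCID_ALL_VISIBLE;
}
```

A vacuumed row (`{0, 3, 0}`) satisfies only `is_all_visible`; a half-built header with VALID_DELID set but a null delete id fails `is_delid_valid`, which is the case the double test exists for.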

3.7 Open question — the two spare flag bits

OR_MVCC_FLAG_MASK reserves five bits (0x1f), but only three are defined: VALID_INSID (0x01), VALID_DELID (0x02), VALID_PREV_VERSION (0x04). Bits 0x08 and 0x10 are masked-in and shifted but never assigned a meaning, and mvcc_header_size_lookup is sized [8] — it indexes only the three defined bits, so setting 0x08 would index out of bounds. The header comment implies deliberate reserve, but the intended use is undocumented. Anyone adding a fourth on-record MVCC field must widen mvcc_header_size_lookup, extend every cumulative offset macro, and audit the OR_MVCC_MAX_HEADER_SIZE = 32 assertion on overflow pages.

  1. MVCCIDs are issued lazily, once per writing transaction. logtb_get_current_mvccid mints curr_mvcc_info->id only when MVCCID_NULL; the mint/validity test runs unconditionally before the sub_ids branch, so even a sub-transaction path mints the parent id first.
  2. The allocator is a tiny lock-guarded counter. get_new_mvccid reads and MVCCID_FORWARDs log_Gl.hdr.mvcc_next_id under m_new_mvccid_lock; get_two_new_mvccid forwards twice for the parent+sub bootstrap, giving the parent the lower id. Gaps from rollbacks are harmless.
  3. mvcc_rec_header is wider in RAM than on disk. mvcc_flag’s low 5 bits decide which of insid, delid, and prev_version_lsa are physically present; the chn slot is role-shared with the delete-id region.
  4. Zero-cost unused slots come from cumulative offset macros plus mvcc_header_size_lookup[8]. A live un-updated row carries a 16-byte header. Bounds: OR_MVCC_MIN_HEADER_SIZE = 8, OR_MVCC_MAX_HEADER_SIZE = 32.
  5. Get/set helpers are flag-gated and pointer-consistent. A getter that skips a field must not advance the pointer; or_mvcc_set_header reconciles size via HEAP_MOVE_INSIDE_RECORD, while or_mvcc_set_flag clears the low 5 bits and +=-combines new flags into bits 24+ without resizing — a sharp edge.
  6. The stamp is written on the recovery/apply path. Insert builds a header with VALID_INSID (via heap_attrinfo_transform_header_to_disk / heap_insert_adjust_recdes_header) and a zero insid; heap_mvcc_log_insert forces issuance, and heap_rv_mvcc_redo_insert’s MVCC_SET_INSID(..., rcv->mvcc_id) is where the real MVCCID lands. Overflow records always store the max-size header.
  7. Two of the five flag bits are unused and unreachable through the [8]-wide size table — a constraint for any future on-record MVCC field.

Chapter 4: Active-Set Reads — Bit-Area Probe and Cached Scalars

Chapters 1–3 built the active-set representation (sliding m_bit_area, overflow m_long_tran_mvccids, on-record header). This chapter answers: given an MVCCID, how does the code decide whether it is still active, and how are the two cached short-circuit scalars (lowest_active_mvccid, highest_completed_mvccid) derived from the raw bits?

Three layers stack: the bit-area probe mvcc_active_tran::is_active; the derivation scans compute_highest_completed_mvccid / compute_lowest_active_mvccid that flatten the bit area into a scalar; and the wrappers mvcc_is_id_in_snapshot / mvcc_is_active_id that try the scalars first and fall back to the probe only inside the active window. For why the short-circuit is the fast path, see Snapshot Visibility in cubrid-mvcc.md; this chapter traces the code.

The probe lives on mvcc_active_tran, the scalars on mvcc_snapshot, the per-transaction cache on mvcc_info.

flowchart TB
  INFO["mvcc_info (per active tran, in LOG_TDES)<br/>id / recent_snapshot_lowest_active_mvccid / sub_ids"]
  SNAPSHOT["mvcc_snapshot<br/>lowest_active_mvccid / highest_completed_mvccid (scalars)"]
  ACTIVE["mvcc_active_tran<br/>m_bit_area + start + length / m_long_tran_mvccids"]
  INFO -->|owns snapshot| SNAPSHOT
  SNAPSHOT -->|owns m_active_mvccs| ACTIVE

Figure 4-1. mvcc_info owns a mvcc_snapshot, which owns the mvcc_active_tran bit area plus the two derived scalars.

mvcc_active_tran — mapped in Ch. 1; read-path fields recapped.

| Field | Role | Why it exists |
|---|---|---|
| m_bit_area | unit_type[BITAREA_MAX_SIZE]; bit i set ⇒ start+i committed | Dense window of recently-completed MVCCIDs |
| m_bit_area_start_mvccid | MVCCID at bit offset 0 | Probe subtracts it to make a bit offset |
| m_bit_area_length | Window length in bits | Bounds-check; 0 ⇒ all active |
| m_long_tran_mvccids | Sorted-ascending active MVCCIDs below window | Long-runners the window evicted |
| m_long_tran_mvccids_length | Count of the above | Loop bound; [0] is global lowest active |
| m_initialized | Allocation guard | Read paths assume m_bit_area != NULL |

mvcc_snapshot — the caller-facing snapshot record.

| Field | Role | Why it exists |
|---|---|---|
| lowest_active_mvccid | Scalar: anything strictly below is committed | 1st short-circuit: PRECEDES ⇒ not in snapshot |
| highest_completed_mvccid | Scalar: anything >= is active wrt this snapshot | 2nd short-circuit: FOLLOW_OR_EQUAL ⇒ in snapshot |
| m_active_mvccs | The mvcc_active_tran bit area copied at build time | Authoritative answer inside the active window |
| snapshot_fnc | Visibility function bound to the snapshot | Set by build_mvcc_info; Ch. 5–6 |
| valid | Whether the snapshot is populated | Guards stale reads |

mvcc_info — attached to every active transaction’s LOG_TDES.

| Field | Role | Why it exists |
|---|---|---|
| snapshot | The transaction’s own mvcc_snapshot | Built once per statement/transaction |
| id | Own MVCCID (MVCCID_NULL if not written) | logtb_is_current_mvccid matches it: own writes are active to me |
| recent_snapshot_lowest_active_mvccid | Lowest-active from the most recent snapshot | mvcc_is_active_id short-circuit: below ⇒ committed, skip table |
| sub_ids | Running sub-transaction MVCCIDs | logtb_is_current_mvccid also matches these (Ch. 10) |
| is_sub_active | Sub-transaction running | Sub-transaction bookkeeping (Ch. 10) |

Invariant (bit semantics). A set bit means committed, a clear bit means active — inverse of the naive reading, so is_active returns !is_set(position). ALL_ACTIVE = 0 and ALL_COMMITTED = (unit_type) -1 are the extreme units. Flip this polarity and every visibility decision inverts — committed rows vanish.

Every probe reduces to “which unit, which bit”. Four tiny but load-bearing inline helpers do the arithmetic; an off-by-one corrupts visibility for the whole window.

// get_bit_offset / get_unit_of / get_mask_of / is_set -- src/transaction/mvcc_active_tran.cpp
size_t get_bit_offset (MVCCID mvccid) const
{ return static_cast<size_t> (mvccid - m_bit_area_start_mvccid); } /* <- MVCCID to bit index */
unit_type *get_unit_of (size_t bit_offset) const
{ return m_bit_area + (bit_offset / UNIT_BIT_COUNT); } /* <- which 64-bit word (UNIT_BIT_COUNT==64) */
static unit_type get_mask_of (size_t bit_offset)
{ return ((unit_type) 1) << (bit_offset & 0x3F); } /* <- bit within word; &0x3F == mod 64 */
bool is_set (size_t bit_offset) const
{ return ((*get_unit_of (bit_offset)) & get_mask_of (bit_offset)) != 0; }

get_unit_of divides by 64 for the word; get_mask_of’s & 0x3F is a fast % 64 for the intra-word bit; is_set composes them. is_set has no bounds check — callers must verify the offset against m_bit_area_length first (§4.3 invariant).
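
A worked instance of the arithmetic, as a standalone sketch: with a window starting at MVCCID 1000, MVCCID 1070 has bit offset 70, which lands in word 1 at intra-word bit 6. The free functions below are hypothetical equivalents of the member helpers, taking the bit area as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using unit_type = std::uint64_t;
constexpr std::size_t UNIT_BIT_COUNT = 64;

// Which 64-bit word holds this bit.
constexpr std::size_t unit_index_of (std::size_t bit_offset)
{
  return bit_offset / UNIT_BIT_COUNT;
}

// Single-bit mask inside that word; & 0x3F is the same as % 64.
constexpr unit_type mask_of (std::size_t bit_offset)
{
  return unit_type {1} << (bit_offset & 0x3F);
}

// No bounds check here: the caller must already have compared
// bit_offset against the window length.
inline bool is_set (const unit_type *area, std::size_t bit_offset)
{
  assert (area != nullptr);
  return (area[unit_index_of (bit_offset)] & mask_of (bit_offset)) != 0;
}

static_assert (unit_index_of (1070 - 1000) == 1, "offset 70 is in the second word");
static_assert (mask_of (1070 - 1000) == (unit_type {1} << 6), "offset 70 is bit 6 of that word");
```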

The bottom of the stack: “is this MVCCID active in this captured active set”, in three mutually exclusive cases.

// mvcc_active_tran::is_active -- src/transaction/mvcc_active_tran.cpp
if (MVCC_ID_PRECEDES (mvccid, m_bit_area_start_mvccid)) /* CASE 1: below the window */
  {
    if (m_long_tran_mvccids != NULL)
      for (size_t i = 0; i < m_long_tran_mvccids_length; i++)
        if (mvccid == m_long_tran_mvccids[i])
          return true; /* <- in long-tran overflow: active */
    return false; /* <- below window, not long-tran: committed */
  }
else if (m_bit_area_length == 0) /* CASE 2: empty window */
  return true; /* <- nothing committed yet: active */
else /* CASE 3: inside / above the window */
  {
    size_t position = get_bit_offset (mvccid);
    if (position < m_bit_area_length)
      return !is_set (position); /* <- in window: set bit == committed */
    else
      return true; /* <- above highest tracked bit: active */
  }

MVCC_ID_PRECEDES(id1, id2) is (id1) < (id2); the CASE 1/2/3 comments above annotate each branch.

Invariant (probe bounds safety). is_set/get_unit_of are dereferenced only after the position < m_bit_area_length test in CASE 3, reached only when m_bit_area_length != 0. The branch ordering is load-bearing — moving CASE 2 after CASE 3 would let a zero-length window reach is_set. The “all bits beyond m_bit_area_length are ALL_ACTIVE” invariant (check_valid, Ch. 2) makes the out-of-range case safe to call active.
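
The three cases and their ordering can be exercised with a small model. This is a sketch, not the CUBRID class: `ToyActiveSet` is hypothetical and uses `std::vector` in place of the raw arrays, but it keeps the same polarity (set bit means committed) and the same case order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using MVCCID = std::uint64_t;

struct ToyActiveSet
{
  std::vector<std::uint64_t> bit_area;   // set bit == committed
  MVCCID start = 4;                      // MVCCID at bit offset 0
  std::size_t length = 0;                // number of tracked bits
  std::vector<MVCCID> long_tran;         // active ids below start, ascending

  bool is_active (MVCCID id) const
  {
    if (id < start)                      // CASE 1: below the window
      {
        for (MVCCID lt : long_tran)
          if (lt == id)
            return true;                 // long-running transaction: still active
        return false;                    // otherwise committed
      }
    if (length == 0)                     // CASE 2: empty window
      return true;
    const std::size_t pos = static_cast<std::size_t> (id - start);
    if (pos >= length)                   // CASE 3b: above the highest tracked bit
      return true;
    assert (pos / 64 < bit_area.size ()); // storage must cover every tracked bit
    return ((bit_area[pos / 64] >> (pos % 64)) & 1u) == 0;   // CASE 3a: clear bit == active
  }
};
```

With `start = 100`, `length = 3`, and bits 0 and 2 set, ids 100 and 102 are committed, 101 is active, and anything from 103 upward is active because it lies beyond the tracked range.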

4.4 compute_highest_completed_mvccid — top-down highest set bit

Flattens the bit area into the highest completed MVCCID; build_mvcc_info (Ch. 5) MVCCID_FORWARDs the result by one.

// mvcc_active_tran::compute_highest_completed_mvccid -- src/transaction/mvcc_active_tran.cpp
if (m_bit_area_length == 0)
  return m_bit_area_start_mvccid - 1; /* <- EMPTY: nothing completed; one below start */
// ... declarations condensed ...
for (highest_completed_bit_area = get_unit_of (m_bit_area_length - 1); /* <- scan units top-down */
     highest_completed_bit_area >= m_bit_area; --highest_completed_bit_area)
  {
    bits = *highest_completed_bit_area;
    if (bits == 0) continue; /* <- ALL_ACTIVE unit: keep going down */
    for (bit_pos = 0, count_bits = UNIT_BIT_COUNT / 2; count_bits > 0; count_bits /= 2)
      if (bits >= (1ULL << count_bits)) /* <- in-word search for highest set bit */
        { bit_pos += count_bits; bits >>= count_bits; }
    highest_bit_position = bit_pos;
    break;
  }
if (highest_completed_bit_area < m_bit_area) /* <- ran off bottom: no set bit anywhere */
  return m_bit_area_start_mvccid - 1;
else
  return get_mvccid (units_to_bits (highest_completed_bit_area - m_bit_area) + highest_bit_position);

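Empty window and not-found both yield m_bit_area_start_mvccid - 1. The found path scans units top-down (skipping ALL_ACTIVE units) and runs a 6-step (log2 64) in-word binary search for the most-significant set bit: a software find-last-set whose result is the bit index itself, i.e. 63 minus what a count-leading-zeros instruction would report.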
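
The in-word search can be lifted out and tested on its own. The function below is a sketch that mirrors the inner loop of the excerpt; `highest_set_bit` is a name chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Index (0..63) of the most-significant set bit. Precondition: bits != 0.
// Each step asks "is there a set bit at or above position count_bits?";
// if so, that many low bits are discarded and added to the answer.
inline unsigned highest_set_bit (std::uint64_t bits)
{
  assert (bits != 0);
  unsigned bit_pos = 0;
  for (unsigned count_bits = 32; count_bits > 0; count_bits /= 2)
    {
      if (bits >= (std::uint64_t {1} << count_bits))
        {
          bit_pos += count_bits;
          bits >>= count_bits;
        }
    }
  return bit_pos;
}
```

The loop always runs six iterations (32, 16, 8, 4, 2, 1), so the cost is constant per word regardless of where the bit sits.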

4.5 compute_lowest_active_mvccid — bottom-up lowest clear bit

The mirror: lowest still-active MVCCID. complete_mvcc (Ch. 8) calls it to advance the watermark.

// mvcc_active_tran::compute_lowest_active_mvccid -- src/transaction/mvcc_active_tran.cpp
if (m_long_tran_mvccids_length > 0 && m_long_tran_mvccids != NULL)
  return m_long_tran_mvccids[0]; /* <- SHORTCUT: sorted, [0] is lowest active */
if (m_bit_area_length == 0)
  return m_bit_area_start_mvccid; /* <- EMPTY: lowest active is window start */
unit_type *end_bit_area = get_unit_of (m_bit_area_length - 1);
size_t lowest_bit_pos = 0; // other declarations condensed
for (lowest_active_bit_area = m_bit_area; lowest_active_bit_area <= end_bit_area; ++lowest_active_bit_area)
  {
    bits = *lowest_active_bit_area;
    if (bits == ALL_COMMITTED) /* <- whole word committed: skip 64 bits */
      { lowest_bit_pos += UNIT_BIT_COUNT; continue; }
    for (bit_pos = 0, count_bits = UNIT_BIT_COUNT / 2; count_bits > 0; count_bits /= 2)
      {
        mask = (1ULL << count_bits) - 1;
        if ((bits & mask) == mask) /* <- low count_bits all set: clear bit is higher */
          { bit_pos += count_bits; bits >>= count_bits; }
      }
    lowest_bit_pos += bit_pos;
    break;
  }
if (lowest_active_bit_area > end_bit_area) /* <- every tracked bit set: no active bit */
  return get_mvccid (m_bit_area_length);
else
  return get_mvccid (lowest_bit_pos);

The early returns cover the sorted-array [0] shortcut and the empty window; otherwise units are scanned upward (skipping ALL_COMMITTED by +64) and the inner loop mirrors §4.4 for the least-significant clear bit. Not-found returns get_mvccid(m_bit_area_length).
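
The mirrored in-word search can be isolated the same way. This sketch reproduces the inner loop for the least-significant clear bit; `lowest_clear_bit` is a name chosen for this example, and the precondition matches the excerpt's skip of ALL_COMMITTED words.

```cpp
#include <cassert>
#include <cstdint>

// Index (0..63) of the least-significant clear bit.
// Precondition: bits is not all ones (such words are skipped by the caller).
// Each step asks "are the low count_bits bits all set?"; if so, the first
// clear bit must be higher, so those bits are discarded and counted.
inline unsigned lowest_clear_bit (std::uint64_t bits)
{
  assert (bits != ~std::uint64_t {0});
  unsigned bit_pos = 0;
  for (unsigned count_bits = 32; count_bits > 0; count_bits /= 2)
    {
      const std::uint64_t mask = (std::uint64_t {1} << count_bits) - 1;
      if ((bits & mask) == mask)
        {
          bit_pos += count_bits;
          bits >>= count_bits;
        }
    }
  return bit_pos;
}
```

For a word of `0x7` (ids at offsets 0, 1, and 2 committed) the result is 3: the first still-active id in that word sits at offset 3.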

Invariant (sorted overflow array). The [0] shortcut is correct only because m_long_tran_mvccids is ascending — add_long_transaction asserts each new entry exceeds the last. If the order broke, the watermark would jump forward and VACUUM could reclaim still-visible versions.

4.6 Snapshot-layer wrapper — mvcc_is_id_in_snapshot

Puts the two scalars in front of the probe so the scan runs only inside the active window.

// mvcc_is_id_in_snapshot -- src/transaction/mvcc.c (signature/assert elided)
if (MVCC_ID_PRECEDES (mvcc_id, snapshot->lowest_active_mvccid))
  return false; /* <- below lowest active: committed before snapshot */
if (MVCC_ID_FOLLOW_OR_EQUAL (mvcc_id, snapshot->highest_completed_mvccid))
  return true; /* <- at/above highest completed: active wrt snapshot */
return snapshot->m_active_mvccs.is_active (mvcc_id); /* <- gray zone: consult the bit area */

Only the gray band reaches the §4.3 probe. (MVCC_ID_FOLLOW_OR_EQUAL is >=; build_mvcc_info sets highest_completed_mvccid = compute_highest_completed_mvccid() + 1 — the MVCCID_FORWARD in Ch. 5.)
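
The two-scalar gate in front of the probe can be modeled without the bit area. In this sketch `ToySnapshot` is hypothetical, and a `std::set` of active ids stands in for the `m_active_mvccs.is_active` probe. `highest_completed` holds the already-forwarded value (highest completed id plus one), as described above.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using MVCCID = std::uint64_t;

struct ToySnapshot
{
  MVCCID lowest_active;            // everything strictly below is committed
  MVCCID highest_completed;        // already forwarded by one: everything >= is active
  std::set<MVCCID> active_in_band; // stand-in for the bit-area probe
};

// true  == the id was still active when the snapshot was taken
// false == the id had committed before the snapshot
inline bool is_id_in_snapshot (const ToySnapshot &snap, MVCCID id)
{
  assert (snap.lowest_active <= snap.highest_completed);
  if (id < snap.lowest_active)
    return false;                  // first short-circuit
  if (id >= snap.highest_completed)
    return true;                   // second short-circuit
  return snap.active_in_band.count (id) != 0;   // gray band only
}
```

With `lowest_active = 10`, `highest_completed = 15`, and active ids {10, 12}, only ids 10 through 14 ever reach the set lookup; 9 and below, and 15 and above, are answered by the scalars alone.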

4.7 Snapshot-layer wrapper — mvcc_is_active_id

Answers “is this MVCCID active right now” (not against a frozen snapshot; dirty/delete paths in Ch. 7).

// mvcc_is_active_id -- src/transaction/mvcc.c (tdes lookup / asserts elided)
curr_mvcc_info = &tdes->mvccinfo;
if (MVCC_ID_PRECEDES (mvccid, curr_mvcc_info->recent_snapshot_lowest_active_mvccid))
return false; /* <- below recent lowest-active: committed */
if (logtb_is_current_mvccid (thread_p, mvccid))
return true; /* <- my own id or a sub-id: active to me */
return log_Gl.mvcc_table.is_active (mvccid); /* <- otherwise ask the shared table */

Both short-circuits answer without locking — below recent_snapshot_lowest_active_mvccid (committed), or matched by logtb_is_current_mvccid against own id/sub_ids (active to itself); otherwise it consults mvcctable::is_active (§4.8).

4.8 Table-level mvcctable::is_active — version-validated retry

The shared active set lives in the lock-free history ring (m_trans_status_history, Ch. 8). A committing transaction can swap the live entry mid-scan, so is_active validates m_version around the probe.

// mvcctable::is_active -- src/transaction/mvcc_table.cpp (decls elided)
do
{
index = m_trans_status_history_position.load (); /* <- current ring slot */
version = m_trans_status_history[index].m_version.load ();/* <- snapshot the version BEFORE */
ret_active = m_trans_status_history[index].m_active_mvccs.is_active (mvccid); /* <- §4.3 probe */
}
while (version != m_trans_status_history[index].m_version.load ()); /* <- version moved? redo */
return ret_active;

Invariant (version stability). A read is trusted only if m_version matches before and after the probe. Writers publish a new status slot with an incremented m_version, then advance m_trans_status_history_position to it (next_trans_status_start stamps the bumped version on the next slot; next_tran_status_finish advances the position once the slot is built; both are covered in Ch. 8). A reader that latched index + version before the swap sees a mismatch on reload and retries. There is no retry cap: progress relies on each writer update being brief relative to the reads that retry around it.
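The writer and reader halves of this protocol can be sketched on a miniature ring. Everything here is an illustrative assumption: `toy_slot` and `toy_ring` are hypothetical types, the payload is one integer rather than a bit area, a single writer is assumed, and all shared fields are atomic so the sketch itself has no data race.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical miniature of one history slot.
struct toy_slot
{
  std::atomic<std::uint64_t> version {0};
  std::atomic<std::uint64_t> payload {0};
};

struct toy_ring
{
  static constexpr std::size_t SIZE = 4;
  std::array<toy_slot, SIZE> slots;
  std::atomic<std::size_t> position {0};
  std::uint64_t next_version = 0;   // touched by the single writer only

  // Writer: stamp the bumped version on the next slot, build it, then advance.
  void publish (std::uint64_t new_payload)
  {
    const std::size_t next = (position.load () + 1) % SIZE;
    slots[next].version.store (++next_version);
    slots[next].payload.store (new_payload);
    position.store (next);
  }

  // Reader: trust the payload only if the version is unchanged around the read.
  std::uint64_t read () const
  {
    std::size_t index;
    std::uint64_t version, value;
    do
      {
        index = position.load ();
        version = slots[index].version.load ();
        value = slots[index].payload.load ();
      }
    while (version != slots[index].version.load ());
    return value;
  }
};
```

The check matters once the ring wraps: a reader that latched a slot index can find that slot being rewritten for a later commit, and the changed version is what exposes it.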

  1. Set bit means committed, clear bit means active (inverse of the naive reading); is_active returns !is_set(position), with ALL_ACTIVE = 0 / ALL_COMMITTED = -1 as the extreme units.
  2. is_active has three ordered cases: below the window (scan sorted long-tran array, else committed), empty (active), inside/above (bit lookup, or active above the tracked length); the order keeps is_set from being dereferenced on an empty/out-of-range window.
  3. The two derivation scans mirror each other — highest set bit top-down vs. lowest clear bit bottom-up — both a 6-step in-word binary search with explicit not-found fallbacks (start - 1, get_mvccid(m_bit_area_length)).
  4. compute_lowest_active_mvccid short-circuits on m_long_tran_mvccids[0] because the overflow array is kept sorted ascending.
  5. The cached scalars front-run the probe: mvcc_is_id_in_snapshot touches the bit area only for the gray band between the two scalars; mvcc_is_active_id adds the recent_snapshot_lowest_active_mvccid cache and logtb_is_current_mvccid (own id + sub-ids).
  6. The shared table read is optimistic, not locked: mvcctable::is_active validates m_version around the probe, retrying on change.

A reading transaction photographs the global commit state into a private structure, then reads against that photograph rather than freezing the system. This chapter dissects how the photograph is taken: the entry guard in logtb_get_mvcc_snapshot, the lock-free retry loop inside mvcctable::build_mvcc_info, the strict-order publish dance that keeps VACUUM honest, and which bytes fill each snapshot field. The meaning of a snapshot is in the companion cubrid-mvcc.md §“Snapshot isolation”; Ch. 6 consumes the structure built here; Ch. 4 explains the mvcc_active_tran layout it copies.

5.1 The four structures this chapter fills

A snapshot is not one object. It is mvcc_snapshot nested inside mvcc_info, fed by a copy of one mvcc_trans_status slot, whose payload is a mvcc_active_tran bit-area. Figure 5-1 shows the containment.

flowchart LR
  subgraph TDES["log_tdes (per transaction)"]
    INFO["mvcc_info mvccinfo"]
  end
  INFO --> SNAP["mvcc_snapshot snapshot"]
  SNAP --> ACT["mvcc_active_tran m_active_mvccs"]
  subgraph GLOBAL["mvcctable (global)"]
    HIST["m_trans_status_history[index]<br/>mvcc_trans_status"]
  end
  HIST --> HACT["mvcc_active_tran m_active_mvccs"]
  HACT -. "copy_to THREAD_UNSAFE" .-> ACT

Figure 5-1. Snapshot construction copies one global history slot’s bit-area into the transaction-private snapshot.

mvcc_snapshot (mvcc.h) — the read-against photograph:

| Field | Role | Why it exists |
| --- | --- | --- |
| lowest_active_mvccid | Anything < this is committed; never needs a bit probe | Fast lower-bound cutoff during visibility (Ch. 6) |
| highest_completed_mvccid | Anything >= this had not completed when the snapshot was taken, hence invisible | Fast upper-bound cutoff during visibility (Ch. 6) |
| m_active_mvccs | Bit-area: per-MVCCID committed/active status for the gap between the two bounds | Exact answer for IDs in the ambiguous middle range |
| snapshot_fnc | Function pointer to the visibility predicate, set to mvcc_satisfies_snapshot | Lets callers invoke visibility polymorphically (dirty/snapshot variants share the call shape) |
| valid | True once fully built; checked by the entry guard | Avoids rebuilding mid-transaction (RR/SR) and signals RC invalidation |

mvcc_info (mvcc.h) — the per-transaction MVCC envelope:

| Field | Role | Why it exists |
| --- | --- | --- |
| snapshot | The mvcc_snapshot built here | The read photograph |
| id | This transaction’s own MVCCID (Ch. 3); MVCCID_NULL until first write | Self-visibility checks |
| recent_snapshot_lowest_active_mvccid | Cached copy of the snapshot’s lowest active | A second fast cutoff used outside the snapshot struct (sibling predicates, Ch. 7) |
| sub_ids | Sub-transaction MVCCID stack (Ch. 10) | Savepoint / nested-statement visibility |
| is_sub_active | True while a sub-transaction runs | Routes special paths (Ch. 10) |

mvcc_trans_status (mvcc_table.hpp) — one global commit-state ring slot:

| Field | Role | Why it exists |
| --- | --- | --- |
| m_active_mvccs | The authoritative live bit-area at the moment this slot was published | The data the snapshot copies |
| m_last_completed_mvccid | Last MVCCID completed when slot was written; “just for info” | Debugging / history forensics only |
| m_event_type | COMMIT/ROLLBACK/SUBTRAN, “just for info” | Debugging only |
| m_version | atomic<version_type> bumped each time the writer rewrites this slot | The lock-free guard: read it before and after the copy |

mvcc_active_tran (mvcc_active_tran.hpp) — the bit-area itself. Deep semantics (bit packing, long-tran migration, BITAREA_MAX_SIZE = 500) are in Ch. 4; this chapter only exercises copy and reset (§5.4). All six fields:

| Field | Role | Why it exists |
| --- | --- | --- |
| m_bit_area | Pointer to the unit_type[] bit buffer; bit n = status of start + n | The packed committed/active map (Ch. 4) |
| m_bit_area_start_mvccid | MVCCID mapped by bit 0 of m_bit_area | Anchors the bit range to an absolute MVCCID |
| m_bit_area_length | Live length in bits; bits past it are all-active (0) | Bounds the valid prefix; drives get_bit_area_memsize |
| m_long_tran_mvccids | Overflow array of still-active MVCCIDs older than start | Holds long transactions that fell off the bit-area window |
| m_long_tran_mvccids_length | Count of entries in the overflow array | Bounds the long-tran copy/scan |
| m_initialized | True once buffers are allocated | copy_to/check_valid assert on it before touching buffers |

5.2 The entry guard: who is allowed a snapshot

Every read path funnels through logtb_get_mvcc_snapshot. It is a guard, not a builder — it decides whether to (re)build at all.

// logtb_get_mvcc_snapshot -- src/transaction/log_tran_table.c
LOG_TDES *tdes = LOG_FIND_TDES (LOG_FIND_THREAD_TRAN_INDEX (thread_p));
if (!tdes->is_active_worker_transaction ())
{
return NULL; /* <- system trans read latest committed, no MVCC photo */
}
assert (tdes != NULL); /* <- in source: AFTER the early return */
THREAD_ENTRY *main_thread_p = NULL;
if (thread_p->m_px_orig_thread_entry != NULL)
{
main_thread_p = thread_get_main_thread (thread_p);
pthread_mutex_lock (&main_thread_p->m_px_lock_mutex); /* <- parallel-px workers share one snapshot */
}
if (!tdes->mvccinfo.snapshot.valid)
{
log_Gl.mvcc_table.build_mvcc_info (*tdes); /* <- only build when invalid */
}
if (main_thread_p != NULL)
{
pthread_mutex_unlock (&main_thread_p->m_px_lock_mutex);
}
return &tdes->mvccinfo.snapshot;

Branch accounting:

  1. Not an active worker transaction → return NULL. System transactions (VACUUM, checkpoint, recovery) have no MVCC photo; callers treat NULL as “see everything committed.”
  2. Parallel-px worker (m_px_orig_thread_entry != NULL) → take the main thread’s m_px_lock_mutex. px sub-threads share the main transaction’s tdes, so the lock serializes the valid check and build — two workers must not both call build_mvcc_info on one tdes.
  3. Snapshot valid → skip the build (common for RR/SR after the first statement); invalid → build, then unlock if step 2 locked. Always return the pointer.

Invariant — one build per validity epoch. build_mvcc_info runs only while snapshot.valid == false. RR/SR stay valid for the whole transaction (built once); RC’s logtb_invalidate_snapshot_data sets valid = false per statement (§5.6). Building while valid == true would overwrite a live snapshot mid-scan and corrupt in-flight visibility decisions.

5.3 build_mvcc_info: the lock-free copy with version re-check

mvcctable::build_mvcc_info must copy a global ring slot that the commit path may be rewriting concurrently, without taking a lock. It does so with an optimistic version re-check. Figure 5-2 is the authoritative control flow; the notes below add only what the diagram cannot show.

flowchart TD
  A["initialize snapshot bit-area"] --> B["tx_lowest_active =<br/>load m_transaction_lowest_visible[tran_index]"]
  B --> C{"MVCCID_IS_VALID<br/>tx_lowest_active?"}
  C -- "no = not yet published" --> D["set my slot = MVCCID_ALL_VISIBLE<br/>then read crt_status_lowest_active<br/>then store it back -- strict order"]
  C -- "yes = already published" --> E["crt_status_lowest_active =<br/>load m_current_status_lowest_active"]
  D --> R["snapshot_retry_count++"]
  E --> R
  R --> F["index = history_position.load"]
  F --> G["ver = slot.m_version.load"]
  G --> H["slot.m_active_mvccs.copy_to<br/>dest, THREAD_UNSAFE"]
  H --> I["logtb_load_global_statistics_to_tran"]
  I --> J{"ver == slot.m_version.load?"}
  J -- "yes = stable" --> K["break"]
  J -- "no = writer raced us" --> L["dest.reset_active_transactions"]
  L --> R
  K --> N["check_valid; fill scalar fields"]

Figure 5-2. build_mvcc_info principal control flow. The non-fatal statistics-load error path (node I) is described in prose below; it sets an error and continues without breaking the loop.

The lowest-visible publish dance: the documented VACUUM race. This is the subtlest code in the function. When !MVCCID_IS_VALID (tx_lowest_active) (no lowest-visible value published yet), it performs three atomic operations in strict order: publish the sentinel MVCCID_ALL_VISIBLE into its own slot, read the global lowest active, and store that global value back into the slot.

// mvcctable::build_mvcc_info -- src/transaction/mvcc_table.cpp
oldest_active_set (m_transaction_lowest_visible_mvccids[tdes.tran_index],
tdes.tran_index, MVCCID_ALL_VISIBLE, oldest_active_event::BUILD_MVCC_INFO);
/* Is important that between next two code lines to not have delays (e.g. instructions adding). */
crt_status_lowest_active = oldest_active_get (m_current_status_lowest_active_mvccid, 0,
oldest_active_event::BUILD_MVCC_INFO);
oldest_active_set (m_transaction_lowest_visible_mvccids[tdes.tran_index],
tdes.tran_index, crt_status_lowest_active, oldest_active_event::BUILD_MVCC_INFO);

The in-source comment walks a 5-step scenario the sentinel defeats. The key steps, quoted verbatim:

  • the transaction having global lowest active MVCCID commits, so the global value is updated (advanced)
  • the VACUUM thread computes the MVCCID threshold as the updated global lowest active MVCCID
  • the snapshot thread resumes and p_transaction_lowest_active_mvccid is set to initial value of global lowest active MVCCID
  • the VACUUM thread computes the threshold again and found a value (initial global lowest active MVCCID) less than the previous threshold

That is: VACUUM computes the threshold from the advanced global lowest, then the suspended snapshot thread resumes and stores its older initial value, so VACUUM’s next threshold comes out lower than the previous one, moving the watermark backward. Publishing the sentinel first makes compute_oldest_visible_mvccid (Ch. 9) wait for this slot until the real value is stored. When tx_lowest_active is already valid (the common case once the transaction has published a value), the dance is skipped and only the single m_current_status_lowest_active_mvccid load runs.
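The ordering can be sketched with a few atomics. This is an illustration under loud assumptions: `toy_table`, `publish_lowest_visible`, and `try_oldest_visible` are hypothetical names, and the VACUUM side is simplified to "report that a value is pending" rather than reproducing how compute_oldest_visible_mvccid actually waits.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

using toy_mvccid = std::uint64_t;
constexpr toy_mvccid TOY_NULL = 0;         // mirrors MVCCID_NULL
constexpr toy_mvccid TOY_ALL_VISIBLE = 3;  // mirrors MVCCID_ALL_VISIBLE

struct toy_table
{
  std::atomic<toy_mvccid> global_lowest_active {4};
  std::array<std::atomic<toy_mvccid>, 4> tx_lowest_visible {};  // zero == TOY_NULL

  // Snapshot side: sentinel first, then read the global value, then store it back.
  toy_mvccid publish_lowest_visible (std::size_t tran_index)
  {
    if (tx_lowest_visible[tran_index].load () != TOY_NULL)
      return global_lowest_active.load ();       // already published: single load
    tx_lowest_visible[tran_index].store (TOY_ALL_VISIBLE);
    const toy_mvccid lowest = global_lowest_active.load ();
    tx_lowest_visible[tran_index].store (lowest);
    return lowest;
  }

  // VACUUM side (assumed behaviour): a sentinel means "value pending", so no
  // threshold is produced and the caller is expected to look again.
  bool try_oldest_visible (toy_mvccid &oldest) const
  {
    toy_mvccid candidate = global_lowest_active.load ();
    for (const auto &slot : tx_lowest_visible)
      {
        const toy_mvccid v = slot.load ();
        if (v == TOY_ALL_VISIBLE)
          return false;
        if (v != TOY_NULL && v < candidate)
          candidate = v;
      }
    oldest = candidate;
    return true;
  }
};
```

With the sentinel in place, the scanner can never complete a pass that misses a slot whose value is about to appear, which is the window the in-source scenario describes.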

Snapshotting the history slot. m_trans_status_history_position always points at the current (newest) slot; the commit path advances it (Ch. 8). The index is read once, then that slot’s m_version is read before the copy.

// mvcctable::build_mvcc_info -- src/transaction/mvcc_table.cpp
index = m_trans_status_history_position.load ();
assert (index < HISTORY_MAX_SIZE);
const mvcc_trans_status &trans_status = m_trans_status_history[index];
trans_status_version = trans_status.m_version.load (); /* <- version BEFORE copy */
trans_status.m_active_mvccs.copy_to (tdes.mvccinfo.snapshot.m_active_mvccs,
mvcc_active_tran::copy_safety::THREAD_UNSAFE);

THREAD_UNSAFE is deliberate: check_valid cannot run mid-copy on a slot that may mutate under us; validity is verified only after the loop confirms stability. logtb_load_global_statistics_to_tran runs next; on error it sets ER_MVCC_CANT_GET_SNAPSHOT but does not abort the build or break the loop.

The version re-check — the lock-free pivot.

// mvcctable::build_mvcc_info -- src/transaction/mvcc_table.cpp
if (trans_status_version == trans_status.m_version.load ()) /* <- version AFTER copy */
{
break; /* <- writer did not touch slot; copy is consistent */
}
else
{
tdes.mvccinfo.snapshot.m_active_mvccs.reset_active_transactions (); /* <- discard torn copy */
}

Invariant — version-stable copy. The bit-area handed to the caller equals exactly one published mvcc_trans_status image, because the same m_version was observed before and after copy_to. The commit path bumps m_version around every slot rewrite (Ch. 8), so any concurrent write is detected and forces a retry; on detection reset_active_transactions zeroes the destination so a stale tail cannot poison the next attempt. Without the re-check, a snapshot could mix a pre-commit bit-area with post-commit bounds, and visibility (Ch. 6) would answer inconsistently for the racing MVCCID.
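A torn copy is hard to provoke on demand, so the sketch below injects one deterministically. It is a minimal model, not the CUBRID code: `toy_source`, `toy_dest`, and `toy_build` are hypothetical, and the `after_copy` hook is a test device that plays the role of a concurrent writer between the copy and the re-check.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

struct toy_source
{
  std::atomic<std::uint64_t> version {0};
  std::atomic<std::uint64_t> bits {0};
};

struct toy_dest
{
  std::uint64_t bits = 0;
  void reset () { bits = 0; }   // discard a possibly torn copy
};

// Copies src into dest, retrying while the version moves.
// Returns the number of passes; passes - 1 is the number of retries.
inline std::size_t toy_build (toy_source &src, toy_dest &dest,
                              const std::function<void (std::size_t)> &after_copy)
{
  std::size_t passes = 0;
  while (true)
    {
      ++passes;
      const std::uint64_t before = src.version.load ();  // version BEFORE copy
      dest.bits = src.bits.load ();
      after_copy (passes);                                // injected interference
      if (before == src.version.load ())                  // version AFTER copy
        break;
      dest.reset ();
    }
  return passes;
}
```

A hook that rewrites the source on the first pass forces exactly one retry, and the destination ends up holding the second, stable image.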

Scalar fill after the loop.

// mvcctable::build_mvcc_info -- src/transaction/mvcc_table.cpp
tdes.mvccinfo.snapshot.m_active_mvccs.check_valid (); /* <- now safe to validate */
highest_completed_mvccid = tdes.mvccinfo.snapshot.m_active_mvccs.compute_highest_completed_mvccid ();
MVCCID_FORWARD (highest_completed_mvccid); /* <- exclusive upper bound */
tdes.mvccinfo.recent_snapshot_lowest_active_mvccid = crt_status_lowest_active;
tdes.mvccinfo.snapshot.snapshot_fnc = mvcc_satisfies_snapshot;
tdes.mvccinfo.snapshot.lowest_active_mvccid = crt_status_lowest_active;
tdes.mvccinfo.snapshot.highest_completed_mvccid = highest_completed_mvccid;
tdes.mvccinfo.snapshot.valid = true; /* <- LAST; publishes the snapshot */

compute_highest_completed_mvccid (Ch. 4) returns m_bit_area_start_mvccid - 1 if empty; MVCCID_FORWARD advances it into the exclusive upper bound. Both lowest fields take crt_status_lowest_active; valid = true is set last so a peer never observes a half-filled snapshot. Perf accounting then adds snapshot_retry_count - 1 to PSTAT_LOG_SNAPSHOT_RETRY_COUNTERS (the “minus one” drops the mandatory first pass, making the metric contention, not work) and elapsed time to PSTAT_LOG_SNAPSHOT_TIME_COUNTERS.

The copy is a sized memcpy with a shrink-clear and a long-transaction tail, gated by the safety flag.

// mvcc_active_tran::copy_to -- src/transaction/mvcc_active_tran.cpp
assert (m_initialized && dest.m_initialized);
if (safety == copy_safety::THREAD_SAFE)
{
check_valid (); /* <- source validated only when safe */
dest.check_valid ();
}
size_t new_bit_area_memsize = get_bit_area_memsize (); /* <- source live bytes */
size_t old_bit_area_memsize = dest.get_bit_area_memsize (); /* <- dest's previous live bytes */
char *dest_bit_area = (char *) dest.m_bit_area;
if (new_bit_area_memsize > 0)
{
std::memcpy (dest_bit_area, m_bit_area, new_bit_area_memsize);
}
if (old_bit_area_memsize > new_bit_area_memsize)
{
/* <- dest was longer last time; zero the now-unused tail */
std::memset (dest_bit_area + new_bit_area_memsize, 0, old_bit_area_memsize - new_bit_area_memsize);
}
if (m_long_tran_mvccids_length > 0)
{
std::memcpy (dest.m_long_tran_mvccids, m_long_tran_mvccids, get_long_tran_memsize ());
}
dest.m_bit_area_start_mvccid = m_bit_area_start_mvccid;
dest.m_bit_area_length = m_bit_area_length;
dest.m_long_tran_mvccids_length = m_long_tran_mvccids_length;
if (safety == copy_safety::THREAD_SAFE)
{
dest.check_valid ();
}

The five branches, in order: under THREAD_SAFE, check_valid runs on both sides before the copy and on the destination after it (clone wrappers, §5.5, over quiescent sources), while THREAD_UNSAFE skips both checks (build_mvcc_info over a live slot, relying on the version re-check); new_bit_area_memsize > 0 guards the memcpy, so an empty source (a fresh system) copies nothing; old_bit_area_memsize > new_bit_area_memsize zeroes the leftover tail when the destination previously held a longer area; m_long_tran_mvccids_length > 0 copies the overflow array (Ch. 4). The scalar assignments then mirror the metadata.

Invariant — the tail is always all-active (zero). check_valid asserts in debug builds that every bit past m_bit_area_length is ALL_ACTIVE (0), and that long-tran MVCCIDs are strictly ordered and precede m_bit_area_start_mvccid. The shrink-clear above and reset_active_transactions keep this true. A stale committed tail bit would make compute_highest_completed_mvccid report an MVCCID outside the active set, corrupting the upper bound.
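The shrink-clear is easy to check on a small buffer. The sketch below is a simplified model: `toy_bit_area` is hypothetical, its capacity of 8 units is arbitrary, the live size is assumed to round up to whole 64-bit units, and the tail check works at unit granularity only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct toy_bit_area
{
  static constexpr std::size_t MAX_UNITS = 8;      // hypothetical capacity
  std::array<std::uint64_t, MAX_UNITS> units {};   // zero == all-active
  std::size_t length_bits = 0;

  // Live bytes, rounded up to whole units (an assumption of this sketch).
  std::size_t memsize () const
  {
    return ((length_bits + 63) / 64) * sizeof (std::uint64_t);
  }

  void copy_to (toy_bit_area &dest) const
  {
    const std::size_t new_size = memsize ();
    const std::size_t old_size = dest.memsize ();
    char *dest_bytes = reinterpret_cast<char *> (dest.units.data ());
    if (new_size > 0)
      std::memcpy (dest_bytes, units.data (), new_size);
    if (old_size > new_size)
      std::memset (dest_bytes + new_size, 0, old_size - new_size);  // shrink-clear
    dest.length_bits = length_bits;
  }

  // True when every unit past the live prefix is zero.
  bool tail_is_zero () const
  {
    for (std::size_t u = memsize () / sizeof (std::uint64_t); u < MAX_UNITS; ++u)
      if (units[u] != 0)
        return false;
    return true;
  }
};
```

Copying a one-unit source over a three-unit destination leaves units 1 and 2 zeroed; without the memset they would keep stale committed bits.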

reset_active_transactions is the blunt reset used on a torn copy:

// mvcc_active_tran::reset_active_transactions -- src/transaction/mvcc_active_tran.cpp
std::memset (m_bit_area, 0, BITAREA_MAX_MEMSIZE); /* <- zero the WHOLE max buffer, not just live part */
m_bit_area_length = 0;
m_long_tran_mvccids_length = 0;

It zeroes the entire BITAREA_MAX_MEMSIZE and drops both lengths to zero, leaves m_bit_area_start_mvccid untouched, and does not call check_valid (the caller is mid-retry, not yet consistent).

mvcc_snapshot::copy_to and mvcc_info::copy_to are not on the hot path — they clone an already-built snapshot (e.g. parent → sub-transaction, Ch. 10), using THREAD_SAFE because the source is a finished, non-mutating local. mvcc_snapshot::copy_to calls dest.m_active_mvccs.initialize (), then copy_to (..., THREAD_SAFE), then mirrors lowest_active_mvccid, highest_completed_mvccid, snapshot_fnc, and valid — every field, so the clone is usable without a rebuild. mvcc_info::copy_to calls this->snapshot.copy_to (dest.snapshot) then layers the envelope fields:

// mvcc_info::copy_to -- src/transaction/mvcc.c
dest.id = this->id;
dest.recent_snapshot_lowest_active_mvccid = this->recent_snapshot_lowest_active_mvccid;
dest.sub_ids = this->sub_ids; /* <- std::vector deep copy */
dest.is_sub_active = this->is_sub_active;

build_mvcc_info is isolation-agnostic. The acquisition timing lives at the call sites and at logtb_invalidate_snapshot_data, not here.

| Isolation | Snapshot taken | Mechanism |
| --- | --- | --- |
| TRAN_READ_COMMITTED | Once per statement | logtb_invalidate_snapshot_data sets valid = false at each statement boundary; next logtb_get_mvcc_snapshot rebuilds |
| TRAN_REPEATABLE_READ | Once per transaction | First logtb_get_mvcc_snapshot builds; valid stays true (invalidate is a no-op) |
| TRAN_SERIALIZABLE | Once per transaction | Same as RR for snapshot acquisition |

The guard is logtb_invalidate_snapshot_data:

// logtb_invalidate_snapshot_data -- src/transaction/log_tran_table.c
if (tdes == NULL || tdes->isolation >= TRAN_REPEATABLE_READ)
{
return NO_ERROR; /* <- RR/SR keep their snapshot across statements */
}
if (tdes->mvccinfo.snapshot.valid)
{
tdes->mvccinfo.snapshot.valid = false; /* <- RC: drop it so next read rebuilds */
logtb_tran_reset_count_optim_state (thread_p);
}

RC sees transactions that committed between its statements; RR/SR are pinned to the first statement’s photograph. The >= TRAN_REPEATABLE_READ test relies on the enum ordering RC < RR < SR. None of this touches build_mvcc_info.
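The interplay between the entry guard and the invalidation call can be counted in a toy model. All names here (`toy_isolation`, `toy_tdes`, `toy_count_builds`) are hypothetical, the enum values are illustrative and only preserve the RC < RR < SR ordering, and each statement is assumed to read twice before the statement boundary.

```cpp
#include <cassert>

// Ordering matters: the guard below relies on RC < RR < SR.
enum toy_isolation { TOY_READ_COMMITTED, TOY_REPEATABLE_READ, TOY_SERIALIZABLE };

struct toy_tdes
{
  toy_isolation isolation = TOY_READ_COMMITTED;
  bool snapshot_valid = false;
  int builds = 0;
};

// Entry guard: build only when the snapshot is invalid.
inline void toy_get_snapshot (toy_tdes &tdes)
{
  if (!tdes.snapshot_valid)
    {
      ++tdes.builds;
      tdes.snapshot_valid = true;
    }
}

// Statement boundary: only isolation levels below RR drop the snapshot.
inline void toy_invalidate_snapshot (toy_tdes &tdes)
{
  if (tdes.isolation >= TOY_REPEATABLE_READ)
    return;
  tdes.snapshot_valid = false;
}

inline int toy_count_builds (toy_isolation isolation, int statements)
{
  toy_tdes tdes;
  tdes.isolation = isolation;
  for (int s = 0; s < statements; ++s)
    {
      toy_get_snapshot (tdes);   // first read of the statement
      toy_get_snapshot (tdes);   // second read reuses the same snapshot
      toy_invalidate_snapshot (tdes);
    }
  return tdes.builds;
}
```

Three statements cost three builds under RC and one build under RR or SR, while the second read inside a statement never triggers a build at any level.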

  1. The entry guard gates everything. logtb_get_mvcc_snapshot returns NULL for system transactions, serializes parallel-px workers on m_px_lock_mutex, and builds only when valid == false.
  2. Construction is lock-free via version re-check. build_mvcc_info reads m_version before and after a THREAD_UNSAFE copy_to; equal → coherent (break), unequal → torn (reset_active_transactions and retry).
  3. The lowest-visible dance prevents a backward VACUUM watermark. Write the MVCCID_ALL_VISIBLE sentinel first, read the global lowest, store it back with no intervening work, so VACUUM (Ch. 9) waits rather than undercut it.
  4. THREAD_UNSAFE vs THREAD_SAFE is about the source. The build path copies a live mutating slot (skips check_valid, relies on the re-check); clone paths copy quiescent locals (validate).
  5. The bit-area tail must stay zero. copy_to’s shrink-clear, reset_active_transactions’s full-buffer memset, and check_valid’s asserts keep no stale committed bit past m_bit_area_length.
  6. Scalar fields fill in a fixed order, valid last. highest_completed = MVCCID_FORWARD(...) is the exclusive upper bound; both lowest fields take the global lowest; snapshot_fnc = mvcc_satisfies_snapshot.
  7. Isolation timing is external. RC rebuilds per statement (via logtb_invalidate_snapshot_data), RR/SR build once; the builder is isolation-agnostic.

title: “CUBRID MVCC Detail — Chapter 6: Visibility Evaluation”
category: code-analysis
project: cubrid
module: mvcc
sources: [raw/code-analysis/cubrid/storage/mvcc/]
references: [src/transaction/mvcc.c, src/transaction/mvcc.h]
summary: “Branch-complete dissection of mvcc_satisfies_snapshot — the verdict for every insert/delete state, the insert/delete asymmetry, and the perfmon classification leaves.”
created: 2026-06-07
updated: 2026-06-07
tags: [mvcc, visibility, snapshot, perfmon, detail]

The MVCC apparatus of Chapters 2–5 exists to answer one yes/no question, asked millions of times per second: given the snapshot this transaction reads under and the MVCC header on a record version, is this the version I should see? This chapter dissects mvcc_satisfies_snapshot branch by branch — every conditional, return, and perfmon leaf. For the conceptual framing (snapshot as a half-open MVCCID interval, what “active” means, the committed-before / active / committed-after model) see the companion cubrid-mvcc.md, §“Snapshot semantics” and §“Visibility”; this chapter does not re-derive it.

6.1 The three inputs and the three outputs

Section titled “6.1 The three inputs and the three outputs”

mvcc_satisfies_snapshot is a pure decision function over the thread (who am I), a record header, and a snapshot: it mutates neither the header nor the snapshot (perfmon accounting aside), unlike mvcc_satisfies_dirty (Ch. 7), which mutates the snapshot. The verdict is one of three enum values:

// mvcc_satisfies_snapshot_result -- src/transaction/mvcc.h
enum mvcc_satisfies_snapshot_result
{
TOO_OLD_FOR_SNAPSHOT, /* not visible, deleted by me or deleted by inactive transaction */
SNAPSHOT_SATISFIED, /* is visible and valid */
TOO_NEW_FOR_SNAPSHOT /* not visible, inserter is still active.
* ... check previous versions in log ... */
};

| Verdict | Meaning | What the caller does next |
| --- | --- | --- |
| SNAPSHOT_SATISFIED | This exact version is the one the reader sees. | Return the record. |
| TOO_NEW_FOR_SNAPSHOT | Born too late; an older version may be visible. | Walk prev_version_lsa back one link and re-evaluate (Ch. 3). |
| TOO_OLD_FOR_SNAPSHOT | Already dead from the reader’s view; no older version can save it. | Stop: the reader sees nothing on this chain head. |

The directionality invariant. TOO_NEW points backward along the version chain; TOO_OLD is terminal. A delete committed within the snapshot can never be undone by an older version — that version is the same logical row that was deleted. Hence only TOO_NEW’s enum comment mentions “check previous versions in log.” §6.7 gives the worked example.

mvcc_satisfies_snapshot reads two structs: mvcc_rec_header (per-version stamps) and mvcc_snapshot (the visibility frontier). Both are defined in full in Chapter 1; the field tables below cover every member.

mvcc_rec_header (src/transaction/mvcc.h) — fields mvcc_flag:8, repid:24, chn, mvcc_ins_id, mvcc_del_id, prev_version_lsa:

| Field | Role here | Why it exists |
| --- | --- | --- |
| mvcc_flag | Read via MVCC_IS_HEADER_DELID_VALID and MVCC_IS_FLAG_SET(.., VALID_INSID) to learn which stamps are present. | A version may lack an insert stamp (vacuum stripped it) or delete stamp (never deleted). |
| repid / chn | Not read. | Representation id / cache-coherency number (record format and client cache). |
| mvcc_ins_id | Inserter’s MVCCID; compared vs snapshot and “me.” | The transaction whose commit makes this version appear. |
| mvcc_del_id | Deleter’s MVCCID; compared vs snapshot and “me.” | The transaction whose commit makes this version disappear. |
| prev_version_lsa | Not read inside; the link the caller follows on TOO_NEW. | Threads the version chain backward (Ch. 3). |

The flag bits live in object_representation_constants.h: OR_MVCC_FLAG_VALID_INSID = 0x01, OR_MVCC_FLAG_VALID_DELID = 0x02. MVCC_IS_HEADER_DELID_VALID(h) is MVCC_IS_FLAG_SET (h, OR_MVCC_FLAG_VALID_DELID) && MVCCID_IS_VALID (MVCC_GET_DELID (h)) — the flag set and the id not MVCCID_NULL.

mvcc_snapshot (src/transaction/mvcc.h) — fields lowest_active_mvccid, highest_completed_mvccid, m_active_mvccs, snapshot_fnc, valid, plus member functions (ctor, reset, deleted operator=, copy_to):

| Field | Role here | Why it exists |
| --- | --- | --- |
| lowest_active_mvccid | mvcc_is_id_in_snapshot: id strictly below is committed-before (visible). | Lower bound of the in-doubt band; fast reject. |
| highest_completed_mvccid | mvcc_is_id_in_snapshot: id at-or-above is active at snapshot time (in-snapshot). | Upper bound; fast accept. |
| m_active_mvccs | Bit-area / cached-scalar probe for ids inside the band (Ch. 4). | Exact membership test for concurrent ids. |
| snapshot_fnc | Not read; it is the pointer that selected this function. | Plugs in snapshot / dirty / delete polymorphically. |
| valid | Not read (caller guarantees a built snapshot). | Marks a snapshot as constructed (Ch. 5). |
| member functions | Not invoked here. | Construction / copy helpers (Ch. 5). |

flowchart TD
  H["mvcc_rec_header<br/>mvcc_flag, mvcc_ins_id, mvcc_del_id"]
  S["mvcc_snapshot<br/>lowest_active_mvccid, highest_completed_mvccid, m_active_mvccs"]
  F["mvcc_satisfies_snapshot"]
  H -->|"DELID_VALID? INSID flag? ins_id / del_id"| F
  S -->|"is_id_in_snapshot(ins_id / del_id)"| F
  TH["thread_p<br/>logtb_is_current_mvccid -> me?"]
  TH -->|"INSERTED_BY_ME / DELETED_BY_ME"| F
  F --> V{"verdict"}
  V --> R1["SNAPSHOT_SATISFIED"]
  V --> R2["TOO_NEW_FOR_SNAPSHOT"]
  V --> R3["TOO_OLD_FOR_SNAPSHOT"]

Figure 6-1 — Inputs to the verdict. Header supplies stamps and flags; snapshot supplies the frontier; thread supplies identity.

The first branch is the only structural fork in the function:

// mvcc_satisfies_snapshot -- src/transaction/mvcc.c
assert (rec_header != NULL && snapshot != NULL);
if (!MVCC_IS_HEADER_DELID_VALID (rec_header))
{
/* The record is not deleted */
// ... insert-side ladder (§6.4) ...
}
else
{
/* The record is deleted */
// ... delete-side ladder (§6.5) ...
}

A version is “not deleted” when VALID_DELID is clear or mvcc_del_id == MVCCID_NULL — both fold into MVCC_IS_HEADER_DELID_VALID. The not-deleted side asks only “did the inserter become visible to me?”; the deleted side reasons about inserter and deleter, so its ladder is longer. Both lean on mvcc_is_id_in_snapshot, the band test for one MVCCID:

// mvcc_is_id_in_snapshot -- src/transaction/mvcc.c (body, condensed)
if (MVCC_ID_PRECEDES (mvcc_id, snapshot->lowest_active_mvccid))
return false; /* below band -> committed-before, NOT in-snapshot */
if (MVCC_ID_FOLLOW_OR_EQUAL (mvcc_id, snapshot->highest_completed_mvccid))
return true; /* at/above band -> not completed, IS in-snapshot */
return snapshot->m_active_mvccs.is_active (mvcc_id); /* inside band -> exact probe (Ch. 4) */

“In snapshot” means this MVCCID was still active, or had not yet started, when the snapshot was taken — its effects must be invisible. The macros MVCC_IS_REC_INSERTER_IN_SNAPSHOT / MVCC_IS_REC_DELETER_IN_SNAPSHOT are thin wrappers feeding mvcc_ins_id / mvcc_del_id into this helper.

// mvcc_satisfies_snapshot -- src/transaction/mvcc.c (not-deleted branch, condensed)
if (!MVCC_IS_HEADER_DELID_VALID (rec_header))
{
if (!MVCC_IS_FLAG_SET (rec_header, OR_MVCC_FLAG_VALID_INSID))
{ /* ... perfmon ... */ return SNAPSHOT_SATISFIED; } /* (a) no insert stamp -> all-visible */
else if (MVCC_IS_REC_INSERTED_BY_ME (thread_p, rec_header))
{ /* ... perfmon ... */ return SNAPSHOT_SATISFIED; } /* (b) I inserted it */
else if (MVCC_IS_REC_INSERTER_IN_SNAPSHOT (thread_p, rec_header, snapshot))
{ /* ... perfmon ... */ return TOO_NEW_FOR_SNAPSHOT; } /* (c) inserter still in-snapshot */
else
{ /* ... perfmon ... */ return SNAPSHOT_SATISFIED; } /* (d) inserter committed before snapshot */
}

Figure 6-2 encodes the four short-circuit cases: (a) a stripped insert stamp means vacuum already declared the version all-visible; (b) MVCC_IS_REC_INSERTED_BY_ME reaches logtb_is_current_mvccid, which also matches the transaction’s own sub-transaction ids (Ch. 10); (c) the lone TOO_NEW path; (d) the by-elimination residue, the only leaf with sub-classification (§6.8).

Insert-side invariant. On the not-deleted branch the verdict is TOO_NEW iff the inserter is concurrent (case c); otherwise SNAPSHOT_SATISFIED. Enforced by ordering the unconditional-visible tests (a, b) before the in-snapshot test (c) — identity before frontier. If (c) fired for an inserter that had committed before the snapshot, the reader would chase a previous version and surface a stale row; correctness rests on mvcc_is_id_in_snapshot being exact inside the band (Ch. 4).

flowchart TD
  A{"VALID_INSID flag set?"}
  A -->|"no"| AV["SNAPSHOT_SATISFIED<br/>INSERTED_VACUUMED / VISIBLE"]
  A -->|"yes"| B{"INSERTED_BY_ME?"}
  B -->|"yes"| BV["SNAPSHOT_SATISFIED<br/>INSERTED_CURR_TRAN / VISIBLE"]
  B -->|"no"| C{"inserter IN_SNAPSHOT?"}
  C -->|"yes"| CV["TOO_NEW_FOR_SNAPSHOT<br/>INSERTED_OTHER_TRAN / INVISIBLE"]
  C -->|"no"| DV["SNAPSHOT_SATISFIED<br/>INSERTED_COMMITED[_LOST] / VISIBLE"]

Figure 6-2 — Not-deleted ladder. Only the in-snapshot inserter yields TOO_NEW.

When MVCC_IS_HEADER_DELID_VALID is true the version carries a committed-or-pending delete stamp, so both inserter and deleter matter.

// mvcc_satisfies_snapshot -- src/transaction/mvcc.c (deleted branch, condensed)
else
{
if (MVCC_IS_REC_DELETED_BY_ME (thread_p, rec_header))
{ /* ... perfmon ... */ return TOO_OLD_FOR_SNAPSHOT; } /* (e) I deleted it */
else if (MVCC_IS_REC_INSERTER_IN_SNAPSHOT (thread_p, rec_header, snapshot))
{
/* !!TODO: Is this check necessary? It seems that if inserter is active, then so will be the deleter (actually
* they will be the same). It only adds an extra-check in a function frequently called.
*/
/* ... perfmon ... */ return TOO_NEW_FOR_SNAPSHOT; /* (f) inserter in-snapshot */
}
else if (MVCC_IS_REC_DELETER_IN_SNAPSHOT (thread_p, rec_header, snapshot))
{ /* ... perfmon ... */ return SNAPSHOT_SATISFIED; } /* (g) deleter in-snapshot -> delete not yet visible, row still shown */
else
{ /* ... perfmon ... */ return TOO_OLD_FOR_SNAPSHOT; } /* (h) deleter committed before snapshot */
}

Figure 6-3 encodes the cases: (e) terminal — chasing back would resurrect my own delete; (f) a concurrent transaction inserted then deleted before committing, so the version never existed for the reader (TOO_NEW); (g) the lone visible deleted-leaf — inserter committed-before but the deleter is still concurrent, so the row shows as if undeleted; (h) fall-through — both stamps committed-before (TOO_OLD, terminal). The !!TODO on (f) is discussed in §6.10.

Delete-side invariant. On the deleted branch, SNAPSHOT_SATISFIED only in (g) (deleter concurrent); TOO_NEW only in (f) (inserter concurrent); every other case is TOO_OLD. Ordering: (e) before (f) decides a self-delete by identity, not frontier; (f) before (g) because an in-snapshot inserter is a strictly stronger reason to look backward. Testing (g) first would let an insert-and-delete by one concurrent transaction wrongly return SNAPSHOT_SATISFIED, surfacing a row that never committed.

flowchart TD
  E{"DELETED_BY_ME?"}
  E -->|"yes"| EV["TOO_OLD_FOR_SNAPSHOT<br/>DELETED_CURR_TRAN / INVISIBLE"]
  E -->|"no"| F{"inserter IN_SNAPSHOT?"}
  F -->|"yes"| FV["TOO_NEW_FOR_SNAPSHOT<br/>INSERTED_DELETED / INVISIBLE"]
  F -->|"no"| G{"deleter IN_SNAPSHOT?"}
  G -->|"yes"| GV["SNAPSHOT_SATISFIED<br/>DELETED_OTHER_TRAN / VISIBLE"]
  G -->|"no"| HV["TOO_OLD_FOR_SNAPSHOT<br/>DELETED_COMMITTED[_LOST] / INVISIBLE"]

Figure 6-3 — Deleted ladder. Visible only when the deleter is still concurrent.

6.6 How TOO_NEW drives the version-chain walk

mvcc_satisfies_snapshot never touches prev_version_lsa; it only emits TOO_NEW_FOR_SNAPSHOT, telling the caller to walk it (the link is set at insert/update time, Ch. 3). The scan/fetch layer (heap_file.c) interprets the verdict: SNAPSHOT_SATISFIED returns the version; TOO_OLD means the chain head is dead; TOO_NEW dereferences prev_version_lsa, loads the prior header, and re-calls. The walk terminates because each link’s inserter MVCCID is strictly older than the prior link’s, so eventually an in-snapshot test fails (case d) or the chain ends at a null LSA. Only cases (c) and (f) produce TOO_NEW.
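The caller-side loop described above can be sketched as follows. This is a toy model, not the heap_file.c code: `ToyVersion`, `toy_satisfies`, and `toy_find_visible` are hypothetical names, the chain is a vector indexed by position instead of LSAs, and the in-snapshot test is reduced to a single threshold on the insert id (delete stamps are ignored).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using MVCCID = std::uint64_t;

// Hypothetical in-memory stand-in for one row version; the real link is prev_version_lsa.
struct ToyVersion
{
  MVCCID ins_id;   // inserter MVCCID
  int prev;        // index of the previous version; -1 plays the role of a null LSA
};

enum class Verdict { TooOld, Satisfied, TooNew };

// Simplified insert-side test (assumption): ids at or above lowest_active are in-snapshot.
Verdict toy_satisfies (const ToyVersion &v, MVCCID lowest_active)
{
  return v.ins_id >= lowest_active ? Verdict::TooNew : Verdict::Satisfied;
}

// Caller-side walk: returns the index of the first acceptable version, or -1 if none.
int toy_find_visible (const std::vector<ToyVersion> &chain, int head, MVCCID lowest_active)
{
  for (int cur = head; cur != -1; cur = chain[cur].prev)
    {
      assert (cur >= 0 && cur < (int) chain.size ());
      if (toy_satisfies (chain[cur], lowest_active) != Verdict::TooNew)
        {
          return cur;   // verdict is not TOO_NEW, so the walk stops here
        }
      // TOO_NEW: follow the back link and re-evaluate the older version
    }
  return -1;            // chain ended at the "null LSA"
}
```

Because every step moves to a strictly older insert id, the loop either finds a version below the threshold or runs off the end of the chain.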

6.7 The insert/delete asymmetry — a worked example

Reader R holds a snapshot with frontier [lowest=100, highest=100): transaction 100 is the only active id R knows of, and the snapshot was taken just before 100 did anything. Four scenarios against the §6.4–§6.5 ladders:

| Scenario | Header | Helper results | Case | Verdict |
|---|---|---|---|---|
| Committed T50 insert, never deleted | ins=50, no DELID | inserter not in-snapshot (50 < 100) | (d) | SNAPSHOT_SATISFIED |
| Concurrent T100 insert, uncommitted | ins=100, no DELID | inserter in-snapshot (100 >= 100) | (c) | TOO_NEW -> walk prev |
| T50 insert, concurrent T100 delete | ins=50, del=100 | inserter not in-snapshot; deleter in-snapshot | (g) | SNAPSHOT_SATISFIED |
| T50 insert, committed T60 delete | ins=50, del=60 | neither in-snapshot | (h) | TOO_OLD |

Rows 2 and 3 are the asymmetry: a too-new insert sends the reader backward for an older version; a too-new delete keeps the row visible (a delete uncommitted from R’s view has not happened). An insert MVCCID gates a version’s appearance; a delete MVCCID gates its disappearance.
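The four scenarios can be replayed against a toy model of the two ladders. This is a sketch under stated assumptions: `ToyHeader`, `ToySnapshot`, and `toy_satisfies_snapshot` are hypothetical names, an absent stamp is modelled as id 0, and the in-snapshot test is reduced to `id >= lowest_active` (the real test also consults the bit area, Ch. 4).

```cpp
#include <cassert>
#include <cstdint>

using MVCCID = std::uint64_t;
const MVCCID TOY_NULL = 0;   // stands in for "stamp not present"

enum class Verdict { TooOld, Satisfied, TooNew };

struct ToyHeader { MVCCID ins_id; MVCCID del_id; };      // TOY_NULL means the stamp is absent
struct ToySnapshot { MVCCID lowest_active; MVCCID me; }; // me = reader's own id, TOY_NULL if none

// Simplified frozen test (assumption): everything at or above lowest_active is concurrent.
bool toy_in_snapshot (MVCCID id, const ToySnapshot &s)
{
  return id >= s.lowest_active;
}

Verdict toy_satisfies_snapshot (const ToyHeader &h, const ToySnapshot &s)
{
  if (h.del_id == TOY_NULL)
    {
      if (h.ins_id == TOY_NULL) return Verdict::Satisfied;        // (a) insert stamp stripped
      if (h.ins_id == s.me) return Verdict::Satisfied;            // (b) inserted by me
      if (toy_in_snapshot (h.ins_id, s)) return Verdict::TooNew;  // (c) inserter concurrent
      return Verdict::Satisfied;                                  // (d) inserter committed before
    }
  if (h.del_id == s.me) return Verdict::TooOld;                   // (e) deleted by me
  if (toy_in_snapshot (h.ins_id, s)) return Verdict::TooNew;      // (f) inserter concurrent
  if (toy_in_snapshot (h.del_id, s)) return Verdict::Satisfied;   // (g) deleter concurrent
  return Verdict::TooOld;                                         // (h) deleter committed before
}
```

With `lowest_active = 100`, the headers (50, none), (100, none), (50, 100), and (50, 60) land on cases (d), (c), (g), and (h) respectively, matching the table.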

Every leaf, when tracking is on, calls perfmon_mvcc_snapshot(thread_p, snapshot_type, rec_type, visibility): snapshot_type is always PERF_SNAPSHOT_SATISFIES_SNAPSHOT, rec_type classifies why, and visibility is the outcome. The call is guarded by perfmon_is_perf_tracking_and_active (PERFMON_ACTIVATION_FLAG_MVCC_SNAPSHOT), so the hot path pays nothing when tracking is off. The rec_type bucket per leaf is PERF_SNAPSHOT_RECORD_<suffix>, where Figures 6-2/6-3 already give the eight suffixes and visibility bits: (a) INSERTED_VACUUMED, (b) INSERTED_CURR_TRAN, (c) INSERTED_OTHER_TRAN, (d) INSERTED_COMMITED, (e) DELETED_CURR_TRAN, (f) INSERTED_DELETED, (g) DELETED_OTHER_TRAN, (h) DELETED_COMMITTED. Two leaves split a step further: (d) and (h) each have a _LOST variant, INSERTED_COMMITED_LOST (d’) and DELETED_COMMITTED_LOST (h’).

The _LOST variants are the interesting ones. In the committed-inserter leaf (d) the code asks an extra question before choosing between (d) and (d’):

// mvcc_satisfies_snapshot -- src/transaction/mvcc.c (case d, perfmon detail)
if (rec_header->mvcc_ins_id != MVCCID_ALL_VISIBLE && vacuum_is_mvccid_vacuumed (rec_header->mvcc_ins_id))
{
perfmon_mvcc_snapshot (thread_p, PERF_SNAPSHOT_SATISFIES_SNAPSHOT,
PERF_SNAPSHOT_RECORD_INSERTED_COMMITED_LOST, PERF_SNAPSHOT_VISIBLE);
}
else
{
perfmon_mvcc_snapshot (thread_p, PERF_SNAPSHOT_SATISFIES_SNAPSHOT,
PERF_SNAPSHOT_RECORD_INSERTED_COMMITED, PERF_SNAPSHOT_VISIBLE);
}

vacuum_is_mvccid_vacuumed (in vacuum.h) returns true when the MVCCID is older than vacuum’s oldest-visible watermark (Ch. 9): vacuum was entitled to strip this stamp but the version still carries it. So _LOST counts versions still wearing a stamp vacuum should have removed — a vacuum-lag measure. The != MVCCID_ALL_VISIBLE guard skips the all-visible sentinel (never a real id). The delete-side _LOST leaf (h’) is the symmetric probe on mvcc_del_id, sans guard since a delete stamp is never the sentinel.

6.9 The visibility invariant, stated precisely

Visibility invariant. For a built (non-dirty) snapshot S and header H, mvcc_satisfies_snapshot returns SNAPSHOT_SATISFIED iff H’s inserter is visible to S (committed-before, or me, or vacuum-stripped) AND H’s deleter is not-yet-visible (no delete stamp, or deleter concurrent); TOO_NEW_FOR_SNAPSHOT exactly when the inserter is concurrent (regardless of delete state); TOO_OLD_FOR_SNAPSHOT exactly when the inserter is visible but the deleter is also visible-or-me. The per-side ordering arguments in §6.4 and §6.5 enforce it; any reordering produces phantom rows (visible too-new versions) or vanished rows (invisible committed-before versions).

  • The !!TODO on case (f). The comment doubts the inserter-in-snapshot test is reachable independently of the deleter test. Sound for the common case (one transaction inserts and deletes), but not obviously safe to remove: a row inserted by concurrent T_a then deleted by concurrent T_b (a != b, both in-snapshot) would, without (f), fall through to (g) and return SNAPSHOT_SATISFIED — exposing a row whose insert never committed. Whether such a header can arise depends on locking rules outside mvcc.c. Treat the branch as load-bearing.
  • No side effects. mvcc_satisfies_snapshot never writes to its snapshot argument. Its sibling mvcc_satisfies_dirty (same file) does mutate snapshot->lowest_active_mvccid / highest_completed_mvccid; snapshot_fnc selects which of the two runs. Ch. 7 covers the dirty/delete/vacuum siblings.
  1. mvcc_satisfies_snapshot is pure and side-effect-free, with one top split on MVCC_IS_HEADER_DELID_VALID yielding SNAPSHOT_SATISFIED, TOO_NEW_FOR_SNAPSHOT, or TOO_OLD_FOR_SNAPSHOT.
  2. Not-deleted ladder, four ordered cases: no insert stamp (visible), inserted-by-me (visible), inserter-in-snapshot (TOO_NEW), fall-through committed-before (visible). Only the in-snapshot inserter looks backward.
  3. Deleted ladder, four ordered cases: deleted-by-me (TOO_OLD), inserter-in-snapshot (TOO_NEW), deleter-in-snapshot (visible), fall-through (TOO_OLD). Visible only when the deleter is still concurrent.
  4. The asymmetry: a too-new insert hides this version (look older); a too-new delete keeps it visible. TOO_NEW walks prev_version_lsa backward; TOO_OLD is terminal.
  5. Every leaf reports a PERF_SNAPSHOT_* bucket; the _LOST buckets fire when vacuum_is_mvccid_vacuumed says a stamp should already be gone — a vacuum-lag measure, guarded against MVCCID_ALL_VISIBLE on the insert side.
  6. Visibility invariant: visible iff inserter visible AND deleter not-yet-visible; TOO_NEW iff inserter concurrent; TOO_OLD iff inserter visible but deleter visible-or-me. Case ordering (identity before frontier) enforces it.
  7. The !!TODO on the deleted-branch inserter test is an open redundancy question, not dead code: removing it risks exposing rows whose insert never committed when insert and delete come from two distinct concurrent transactions.

Chapter 7: Sibling Predicates for Delete Dirty and Vacuum

mvcc_satisfies_snapshot (Chapter 6) answers reads. But the same MVCC_REC_HEADER — the four-field on-record stamp from Chapter 3 — is interrogated by three other callers that need a different verdict from the same bytes: a writer about to delete/update needs liveness (any in-progress deleter to block behind?); a dirty read (mvcc_satisfies_dirty) needs in-progress writers’ effects and reports which MVCCID it treated as active; vacuum needs to know whether a dead version is old enough that no running transaction can still need it.

Each tilts one comparison differently — frozen snapshot membership vs. raw liveness vs. a static watermark, the axis carried by the 7.7 contrast table. The header-decode idioms (MVCC_IS_HEADER_DELID_VALID, MVCC_IS_FLAG_SET) are assumed from Chapters 3 and 6.

7.1 The two comparison primitives: ACTIVE vs. IN-SNAPSHOT

Everything hinges on two helper-macro families. The _ACTIVE family wraps the live probe; the _IN_SNAPSHOT family wraps the frozen one (the DELETER variants pass mvcc_del_id instead of mvcc_ins_id):

// MVCC_IS_REC_{INSERTER,DELETER}_{ACTIVE,IN_SNAPSHOT} -- src/transaction/mvcc.c
#define MVCC_IS_REC_INSERTER_ACTIVE(thread_p, rec_header_p) \
(mvcc_is_active_id (thread_p, (rec_header_p)->mvcc_ins_id))
#define MVCC_IS_REC_INSERTER_IN_SNAPSHOT(thread_p, rec_header_p, snapshot) \
(mvcc_is_id_in_snapshot (thread_p, (rec_header_p)->mvcc_ins_id, (snapshot)))

mvcc_is_id_in_snapshot is the frozen test — it compares against the captured snapshot bounds and bit-area (Chapter 4): “was this writer active at the instant the snapshot was taken?” mvcc_is_active_id is the live test — it consults the current global active set, not a captured copy:

// mvcc_is_active_id -- src/transaction/mvcc.c
if (MVCC_ID_PRECEDES (mvccid, curr_mvcc_info->recent_snapshot_lowest_active_mvccid))
return false; /* below recent watermark: long dead */
if (logtb_is_current_mvccid (thread_p, mvccid))
return true; /* mine (or my sub-tx) */
return log_Gl.mvcc_table.is_active (mvccid); /* live global probe, not the snapshot copy */

Invariant: delete uses liveness, read uses the frozen snapshot. A reader probes the captured snapshot (mvcc_is_id_in_snapshot) for a stable view; a writer probes the live set (mvcc_is_active_id) and blocks on any deleter active right now, so it never loses an update. Swapping the two breaks one side or the other: a writer on the frozen test can miss a deleter that appeared after the snapshot and lose an update, while a reader on the live test loses its stable view as transactions commit after its snapshot.

The watermark macros MVCC_IS_REC_{INSERTED,DELETED}_SINCE_MVCCID (vacuum only) are pure arithmetic: INSERTED_SINCE is !MVCC_ID_PRECEDES(ins_id, mvcc_id), i.e. ins_id >= mvcc_id, with no table probe.
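The difference between the frozen and live probes can be shown with a toy model. This is a sketch, not the real structures: `ToyActiveSet`, `ToySnapshot`, `toy_take_snapshot`, and `toy_complete` are hypothetical names, and the active set is a plain `std::set` rather than the bit area plus long-transaction array of Chapter 4.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using MVCCID = std::uint64_t;

// Hypothetical stand-in for the global active set.
struct ToyActiveSet
{
  std::set<MVCCID> active;
  bool is_active (MVCCID id) const { return active.count (id) != 0; }   // live probe
};

// A frozen snapshot is a copy of the active set taken at one instant.
struct ToySnapshot
{
  std::set<MVCCID> captured;
  bool contains (MVCCID id) const { return captured.count (id) != 0; }  // frozen probe
};

ToySnapshot toy_take_snapshot (const ToyActiveSet &g)
{
  return ToySnapshot { g.active };   // copies; later changes to g do not reach the snapshot
}

// Retire an id from the live set (commit or rollback).
void toy_complete (ToyActiveSet &g, MVCCID id)
{
  g.active.erase (id);
}
```

After a transaction completes, the live probe reports it inactive while a snapshot taken earlier still reports it as in-snapshot; that divergence is exactly why readers and writers use different primitives.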

The four predicates return three enums. mvcc_satisfies_snapshot_result (from Chapter 6) is reused by ..._dirty and ..._is_not_deleted_for_snapshot with a narrower domain.

mvcc_satisfies_snapshot_result (returned by ..._dirty, ..._is_not_deleted_for_snapshot):

| Value | Role | Why it exists |
|---|---|---|
| TOO_OLD_FOR_SNAPSHOT | not visible; deleted by me or by a committed tx | “dead for good” vs. “not yet born” so a chain walker stops |
| SNAPSHOT_SATISFIED | visible and valid | the only “yes” answer |
| TOO_NEW_FOR_SNAPSHOT | not visible; inserter still active, so follow prev_version_lsa | only mvcc_satisfies_snapshot returns it (7.4, 7.5 never do) |

mvcc_satisfies_delete_result (returned by ..._delete):

| Value | Role | Why it exists |
|---|---|---|
| DELETE_RECORD_INSERT_IN_PROGRESS | inserted by another active tx | not yet committed-visible; do not touch |
| DELETE_RECORD_CAN_DELETE | visible, valid; proceed | all-visible, by-me, or inserter-committed |
| DELETE_RECORD_DELETED | deleted by a committed tx | target gone; caller raises serialization/not-found |
| DELETE_RECORD_DELETE_IN_PROGRESS | deleted by another active tx | caller must wait on the deleter (lock-manager handoff, see cubrid-lock-manager-detail) then retry |
| DELETE_RECORD_SELF_DELETED | deleted by this tx | idempotent; treated as removed |

mvcc_satisfies_vacuum_result (returned by ..._vacuum):

| Value | Role | Why it exists |
|---|---|---|
| VACUUM_RECORD_REMOVE | physically remove the whole record | deleter committed before the oldest reader; never seen again |
| VACUUM_RECORD_DELETE_INSID_PREV_VER | keep record, strip insert MVCCID and prev_version_lsa | all-visible-and-live: stamp and chain are dead weight, row still needed |
| VACUUM_RECORD_CANNOT_VACUUM | leave alone | already vacuumed, or recently inserted/deleted; a running tx may need it |

7.3 mvcc_satisfies_delete — the five-state liveness verdict

The predicate a DELETE or UPDATE runs against the heap row it intends to modify. It takes no snapshot argument — liveness is always “now”. Top split is MVCC_IS_HEADER_DELID_VALID: does the row carry a delete stamp?

Not-yet-deleted branch (!MVCC_IS_HEADER_DELID_VALID):

// mvcc_satisfies_delete -- src/transaction/mvcc.c
if (!MVCC_IS_HEADER_DELID_VALID (rec_header))
{
if (!MVCC_IS_FLAG_SET (rec_header, OR_MVCC_FLAG_VALID_INSID))
return DELETE_RECORD_CAN_DELETE; /* no insid stamp: all-visible */
if (MVCC_IS_REC_INSERTED_BY_ME (thread_p, rec_header))
return DELETE_RECORD_CAN_DELETE; /* only I can see it; safe to drop */
else if (MVCC_IS_REC_INSERTER_ACTIVE (thread_p, rec_header))
return DELETE_RECORD_INSERT_IN_PROGRESS; /* another tx is still inserting */
else /* inserter committed; ... perfmon ... */
return DELETE_RECORD_CAN_DELETE;
}

Four terminal sub-branches: no VALID_INSID (insid vacuumed away, Chapter 9), inserted-by-me, and inserter-committed all yield CAN_DELETE; only an ACTIVE inserter yields INSERT_IN_PROGRESS. The else perfmon block only splits “committed” from “committed-then-insid-vacuumed” for stats; no result change.

Already-deleted branch (else) — symmetric three-way split, same live probe:

// mvcc_satisfies_delete -- src/transaction/mvcc.c (else arm)
else if (MVCC_IS_REC_DELETED_BY_ME (thread_p, rec_header))
return DELETE_RECORD_SELF_DELETED;
else if (MVCC_IS_REC_DELETER_ACTIVE (thread_p, rec_header))
return DELETE_RECORD_DELETE_IN_PROGRESS; /* must WAIT on that deleter */
else /* ... perfmon ... */
return DELETE_RECORD_DELETED; /* deleter committed: target gone */

Three terminal sub-branches: deleted-by-me → SELF_DELETED; deleter-committed → DELETED; deleter still ACTIVE → DELETE_IN_PROGRESS — the case that requires the live probe: the caller blocks behind the in-progress deleter and re-reads (a frozen snapshot could miss a post-snapshot deleter and lose the update). Figure 7-1 maps both DELID_VALID arms onto the five verdicts.

flowchart TD
  A["DELID valid?"] -->|no| B["ins state?"]
  A -->|yes| F["del state?"]
  B -->|else| C1["CAN_DELETE"]
  B -->|ins ACTIVE| G1["INSERT_IN_PROGRESS"]
  F -->|mine| H1["SELF_DELETED"]
  F -->|del ACTIVE| H2["DELETE_IN_PROGRESS"]
  F -->|committed| H3["DELETED"]

7.4 mvcc_satisfies_dirty — the side-effecting predicate

mvcc_satisfies_dirty answers read-uncommitted visibility: a dirty read sees committed and in-progress effects. The function header warns it has side effects, changing snapshot->lowest_active_mvccid and snapshot->highest_completed_mvccid — and that the snapshot argument can never be the transaction snapshot. Here snapshot is a scratch struct whose two scalars are output channels: the predicate zeroes both, walks the same DELID_VALID split as 7.3 with the live ACTIVE probe, and stamps a scalar in exactly one of two mutually-exclusive ACTIVE arms:

// mvcc_satisfies_dirty -- src/transaction/mvcc.c
snapshot->lowest_active_mvccid = MVCCID_NULL; /* both scalars cleared up front */
snapshot->highest_completed_mvccid = MVCCID_NULL;
// ... not-deleted arm: only the ACTIVE-inserter branch writes a scalar ...
else if (MVCC_IS_REC_INSERTER_ACTIVE (thread_p, rec_header))
snapshot->lowest_active_mvccid = MVCC_GET_INSID (rec_header); /* side effect, then SATISFIED */
// ... already-deleted arm: only the ACTIVE-deleter branch writes a scalar ...
else if (MVCC_IS_REC_DELETER_ACTIVE (thread_p, rec_header))
snapshot->highest_completed_mvccid = rec_header->mvcc_del_id; /* side effect, then SATISFIED */

Branch shape mirrors 7.3. Not-deleted arm, four sub-branches all SNAPSHOT_SATISFIED: no VALID_INSID / inserted-by-me / inserter-committed write nothing; only inserter ACTIVE stamps lowest_active_mvccid. Already-deleted arm, three sub-branches: deleted-by-me and deleter-committed → TOO_OLD_FOR_SNAPSHOT; only deleter ACTIVE → SNAPSHOT_SATISFIED, stamping highest_completed_mvccid. Dirty never returns TOO_NEW because it accepts active inserters as visible.

Invariant — dirty’s two scalars are an output, not the transaction view. On the real snapshot (Chapters 5–6) these fields are captured inputs bounding the active set; here they are outputs on a throwaway struct, at most one non-NULL per call. Passing the live snapshot would corrupt its bounds — hence the header note.
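The output-channel behaviour can be modelled in a few lines. This is a sketch under assumptions: `ToyHeader`, `ToyScratch`, and `toy_satisfies_dirty` are hypothetical names, an absent stamp is id 0, the live active set is a `std::set`, and the caller's own id is passed explicitly instead of being read from the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using MVCCID = std::uint64_t;
const MVCCID TOY_NULL = 0;   // stands in for MVCCID_NULL / "stamp absent"

enum class Verdict { TooOld, Satisfied };   // dirty never yields TOO_NEW

struct ToyHeader { MVCCID ins_id; MVCCID del_id; };
struct ToyScratch { MVCCID lowest_active; MVCCID highest_completed; };   // output channels

Verdict toy_satisfies_dirty (const ToyHeader &h, MVCCID me,
                             const std::set<MVCCID> &live, ToyScratch &out)
{
  out.lowest_active = TOY_NULL;        // both scalars cleared up front
  out.highest_completed = TOY_NULL;
  if (h.del_id == TOY_NULL)
    {
      if (h.ins_id != TOY_NULL && h.ins_id != me && live.count (h.ins_id) != 0)
        {
          out.lowest_active = h.ins_id;    // report the active inserter
        }
      return Verdict::Satisfied;           // every not-deleted leaf is visible
    }
  if (h.del_id != me && live.count (h.del_id) != 0)
    {
      out.highest_completed = h.del_id;    // report the active deleter
      return Verdict::Satisfied;
    }
  return Verdict::TooOld;                  // deleted by me, or deleter committed
}
```

Whatever the scratch struct held before the call is overwritten, and at most one of the two scalars is non-null afterwards, which is why the transaction's real snapshot must never be passed in.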

7.5 mvcc_is_not_deleted_for_snapshot — the cheap still-deletable check

The lightest predicate: “is this row not deleted from my snapshot’s view?” — used where the caller already knows the row is otherwise visible. Unlike delete and dirty, it uses frozen IN-SNAPSHOT semantics.

// mvcc_is_not_deleted_for_snapshot -- src/transaction/mvcc.c
if (!MVCC_IS_HEADER_DELID_VALID (rec_header))
return SNAPSHOT_SATISFIED; /* never deleted: trivially "not deleted" */
else if (MVCC_IS_REC_DELETED_BY_ME (thread_p, rec_header))
return TOO_OLD_FOR_SNAPSHOT; /* I deleted it: gone for me */
else if (MVCC_IS_REC_DELETER_IN_SNAPSHOT (thread_p, rec_header, snapshot)) /* frozen */
return SNAPSHOT_SATISFIED; /* deleter active/after-snapshot: still here */
else
return TOO_OLD_FOR_SNAPSHOT; /* deleter committed before snapshot: gone */

Four terminal branches — the not-deleted short-circuit plus a three-way deleted arm — no insert-side logic (the caller’s job), no TOO_NEW. The deleter test is MVCC_IS_REC_DELETER_IN_SNAPSHOT (frozen), not delete’s ACTIVE probe, because this is a read-style verdict: an in-snapshot deleter (active, or committed after the snapshot) is not yet visible, so the row counts as “not deleted”.

7.6 mvcc_satisfies_vacuum — the three-way watermark verdict

Vacuum takes only oldest_mvccid, the oldest-active watermark from Chapter 9. Below it no running transaction can need a version, so the decision is pure arithmetic. The outer split is “not deleted, or deleted too recently (del_id >= oldest) to remove wholesale.”

// mvcc_satisfies_vacuum -- src/transaction/mvcc.c
if (!MVCC_IS_HEADER_DELID_VALID (rec_header) || MVCC_IS_REC_DELETED_SINCE_MVCCID (rec_header, oldest_mvccid))
{
if (!MVCC_IS_HEADER_INSID_NOT_ALL_VISIBLE (rec_header)
|| MVCC_IS_REC_INSERTED_SINCE_MVCCID (rec_header, oldest_mvccid))
return VACUUM_RECORD_CANNOT_VACUUM; /* insid gone OR inserted too recently; ...perfmon... */
else
return VACUUM_RECORD_DELETE_INSID_PREV_VER; /* inserter committed before oldest: insid dead weight */
}
else
return VACUUM_RECORD_REMOVE; /* deleter committed before oldest: nobody sees it */

Three terminal outcomes, every branch accounted for: (1) REMOVE (outer else) — del_id < oldest_mvccid; (2) CANNOT_VACUUM (first arm) when the insid is gone (!INSID_NOT_ALL_VISIBLE) or inserted-since-oldest (ins_id >= oldest_mvccid); (3) DELETE_INSID_PREV_VER (inner else) — inserter committed before oldest_mvccid with insid still present, so the insert stamp and prev_version_lsa are dead metadata.

Invariant — vacuum only looks backward past a single global watermark. Every comparison is header_id vs. oldest_mvccid — no snapshot, no live probe. The watermark is monotonic non-decreasing (Chapter 9), so a REMOVE verdict can never later become needed; an oldest_mvccid set too high would vacuum a still-visible version — hence Chapter 9’s conservative computation.
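The three-way verdict reduces to comparisons against one number, as this toy version shows. It is a sketch with hypothetical names (`ToyHeader`, `toy_satisfies_vacuum`); an absent stamp is modelled as id 0, and "insid gone" is assumed to mean either no insert stamp or the all-visible sentinel (3).

```cpp
#include <cassert>
#include <cstdint>

using MVCCID = std::uint64_t;
const MVCCID TOY_NULL = 0;          // stamp absent
const MVCCID TOY_ALL_VISIBLE = 3;   // mirrors MVCCID_ALL_VISIBLE

enum class VacuumVerdict { Remove, DeleteInsidPrevVer, CannotVacuum };

struct ToyHeader { MVCCID ins_id; MVCCID del_id; };

VacuumVerdict toy_satisfies_vacuum (const ToyHeader &h, MVCCID oldest)
{
  bool not_deleted = (h.del_id == TOY_NULL);
  if (not_deleted || h.del_id >= oldest)            // cannot remove the record wholesale
    {
      bool insid_gone = (h.ins_id == TOY_NULL || h.ins_id == TOY_ALL_VISIBLE);
      if (insid_gone || h.ins_id >= oldest)
        {
          return VacuumVerdict::CannotVacuum;       // nothing to strip, or inserted too recently
        }
      return VacuumVerdict::DeleteInsidPrevVer;     // insert stamp is dead weight
    }
  return VacuumVerdict::Remove;                     // deleter is older than the watermark
}
```

Note that a recently deleted row whose inserter is old still gets its insert stamp stripped: the outer condition only rules out whole-record removal.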

7.7 Four predicates, one header — the contrast table

| Predicate | Extra input | Comparison primitive | Sees in-progress writers? | Result domain | Side effects |
|---|---|---|---|---|---|
| mvcc_satisfies_snapshot (Ch.6) | transaction snapshot | frozen IN-SNAPSHOT | no (active inserter -> TOO_NEW) | 3-state snapshot | none |
| mvcc_is_not_deleted_for_snapshot | snapshot | frozen IN-SNAPSHOT (deleter only) | no | 2 of 3 snapshot (no TOO_NEW) | none |
| mvcc_satisfies_dirty | scratch snapshot struct | live ACTIVE | yes (active writer -> SNAPSHOT_SATISFIED) | 2 of 3 snapshot (no TOO_NEW) | writes lowest_active/highest_completed |
| mvcc_satisfies_delete | none | live ACTIVE | yes, as wait signals | 5-state delete | none |
| mvcc_satisfies_vacuum | oldest_mvccid watermark | watermark SINCE | no; only fully-past versions act | 3-state vacuum | none |

The comparison-primitive column is what distinguishes them; the DELID_VALID split, the inserted-by-me short-circuit, and the perfmon bookkeeping are shared.

  1. The four predicates differ almost entirely in one comparison primitive: mvcc_is_id_in_snapshot (frozen, reads), mvcc_is_active_id (live, delete and dirty), or MVCC_ID_PRECEDES against oldest_mvccid (watermark, vacuum). Only mvcc_satisfies_snapshot ever returns TOO_NEW.
  2. mvcc_satisfies_delete uses the live ACTIVE probe so a writer blocks behind an in-progress deleter (DELETE_IN_PROGRESS), avoiding lost updates. Five states: a 4-branch not-deleted arm (all but one CAN_DELETE) and a 3-branch already-deleted arm (SELF_DELETED / DELETE_IN_PROGRESS / DELETED).
  3. mvcc_satisfies_dirty is the only side-effecting predicate, also live: on SNAPSHOT_SATISFIED it stamps the active inserter into lowest_active_mvccid or the active deleter into highest_completed_mvccid (never both) — outputs on a scratch struct, never the real one.
  4. mvcc_is_not_deleted_for_snapshot is the cheap, delete-only, frozen check: four branches (not-deleted short-circuit plus a three-way deleted arm) via MVCC_IS_REC_DELETER_IN_SNAPSHOT, no insert logic, no TOO_NEW.
  5. mvcc_satisfies_vacuum is pure watermark arithmetic: REMOVE when the deleter committed before oldest_mvccid, DELETE_INSID_PREV_VER when the inserter did, CANNOT_VACUUM otherwise — safe only because the watermark is conservatively low.

Chapter 8: Commit and the History Ring Advance

When a write transaction finishes, three things must happen, in an order that survives a crash and stays correct for lock-free snapshot readers: the transaction’s MVCCID is marked inactive in the global active set, the new active set is published into the history ring so concurrent build_mvcc_info callers see it, and the global oldest-active watermark VACUUM trusts is advanced — but not so far that VACUUM erases data the still-uncommitted transaction may need to recover. This chapter traces that path end to end.

For the read side of these structures (bit-area probe, cached scalars, version-recheck on read) see the high-level companion cubrid-mvcc.md and Chapters 4–5. Here we cover only the write side: retirement, publication, and watermark maintenance.

Commit touches all three central structs (full field roles for mvcc_active_tran are in Ch. 1; the commit-relevant maintenance fields are repeated here).

mvcctable — the process-global coordinator (one instance, log_Gl.mvcc_table):

| Field | Role | Why it exists |
|---|---|---|
| m_transaction_lowest_visible_mvccids | per-tran-index atomic<MVCCID>; oldest MVCCID this tran must keep visible | VACUUM floor; commit clamps the committer’s slot |
| m_transaction_lowest_visible_mvccids_size | length of that array | realloc guard |
| m_current_status_lowest_active_mvccid | atomic global oldest-active watermark | advance_oldest_active CAS-bumps it; VACUUM reads it |
| m_current_trans_status | the live mvcc_trans_status, mutated under the mutex | never read lock-free |
| m_trans_status_history_position | atomic ring index of the newest published status | the single visibility store |
| m_trans_status_history | ring of HISTORY_MAX_SIZE (2048) status slots | lock-free readers grab a recent snapshot |
| m_active_trans_mutex | serializes status mutation | one completer at a time |
| m_new_mvccid_lock, m_oldest_visible, m_ov_lock_count | MVCCID issuance + oldest-visible watermark | off commit path; Ch. 3 and Ch. 9 |

mvcc_trans_status — one global “snapshot generation”:

| Field | Role | Why it exists |
|---|---|---|
| m_active_mvccs | the mvcc_active_tran payload | the active-set data |
| m_last_completed_mvccid | last MVCCID retired into this status | highest_completed hints |
| m_event_type | COMMIT, ROLLBACK, or SUBTRAN | post-mortem of the generation |
| m_version | atomic<version_type>, bumped per generation | reader recheck token |

mvcc_active_tran — every field (maintenance side at commit):

| Field | Role at commit | Why it exists |
|---|---|---|
| m_bit_area | 500 uint64_t units; bit set = MVCCID committed | O(1) recent-MVCCID status |
| m_bit_area_start_mvccid | MVCCID mapped to bit 0 | window base; ltrim_area advances it |
| m_bit_area_length | window length in bits | grown by set_bitarea_mvccid, shrunk by trims |
| m_long_tran_mvccids | sorted array of active MVCCIDs older than the window | window can slide past stragglers |
| m_long_tran_mvccids_length | long-tran entry count | bounds array; drives compute_lowest_active_mvccid |
| m_initialized | lifecycle flag; asserted by copy_to and reset_start_mvccid | guards init/finalize idempotency, no use-before-init |

flowchart LR
  CUR["m_current_trans_status<br/>(live, under mutex)"] -->|copy_to THREAD_SAFE| RING["m_trans_status_history[2048]"]
  POS["m_trans_status_history_position"] -->|newest| RING
  CUR -->|set_inactive_mvccid| AT["m_active_mvccs<br/>m_bit_area / m_long_tran_mvccids"]
  CUR -.->|compute_lowest + CAS| LOW["m_current_status_lowest_active_mvccid"]

Figure 8-1: the live status feeds the published ring (copy_to), the active set (set_inactive_mvccid), and the watermark (CAS).

8.2 logtb_complete_mvcc — caller and read-only fast path

logtb_complete_mvcc (in log_tran_table.c) runs at every commit and rollback, first deciding whether the transaction even has an MVCCID to retire:

// logtb_complete_mvcc -- src/transaction/log_tran_table.c
mvccid = curr_mvcc_info->id;
tran_index = LOG_FIND_THREAD_TRAN_INDEX (thread_p);
if (MVCCID_IS_VALID (mvccid))
{
mvcc_table->complete_mvcc (tran_index, mvccid, committed); /* <- write tran: full path */
}
else
{
if (committed && logtb_tran_update_all_global_unique_stats (thread_p) != NO_ERROR)
{ assert (false); }
/* read-only tran never allocated an MVCCID; just drop its visibility floor */
log_Gl.mvcc_table.reset_transaction_lowest_active (tran_index); /* <- stores MVCCID_NULL */
}
curr_mvcc_info->recent_snapshot_lowest_active_mvccid = MVCCID_NULL;
// ... condensed: reset count-optim state, curr_mvcc_info->reset (), perf ...

A read-only tran (no MVCCID) has nothing in any active set: it skips complete_mvcc, and reset_transaction_lowest_active stores MVCCID_NULL into its slot, releasing its floor. No mutex, no ring advance. Everything below is the write path.

The body of mvcctable::complete_mvcc runs under m_active_trans_mutex until an explicit ulock.unlock (). Branch-complete walkthrough:

  1. Lock m_active_trans_mutex.
  2. next_trans_status_start (§8.4) — reserve the next ring slot, bump the version, invalidate the slot.
  3. if (committed) → logtb_tran_update_all_global_unique_stats; failure trips assert (false). else (rollback) skip.
  4. set_inactive_mvccid(mvccid) (§8.7), then set m_last_completed_mvccid = mvccid and m_event_type = COMMIT/ROLLBACK.
  5. next_tran_status_finish (§8.5) — copy the active set, publish the position.
  6. Clamp branch: if (committed), clamp the floor up to mvccid (only when the slot is MVCCID_NULL or precedes mvccid); else set the floor to MVCCID_NULL.
  7. unlock.
  8. Post-unlock advance: if global == mvccid or mvccid precedes bit_area_start → compute_lowest_active_mvccid, then if the version is unchanged → advance_oldest_active, else skip the stale result. Otherwise skip entirely.

Step 6, the clamp branch, adjusts the committer’s own slot in m_transaction_lowest_visible_mvccids[tran_index]:

// mvcctable::complete_mvcc -- src/transaction/mvcc_table.cpp
if (committed)
{
/* be sure that transaction modifications can't be vacuumed up to LOG_COMMIT. ...
* It will be set to NULL after LOG_COMMIT */
MVCCID tran_lowest_active = oldest_active_get (m_transaction_lowest_visible_mvccids[tran_index], ...);
if (tran_lowest_active == MVCCID_NULL || MVCC_ID_PRECEDES (tran_lowest_active, mvccid))
{
oldest_active_set (..., mvccid, ...); /* <- clamp UP to mvccid, never down */
}
}
else
{
oldest_active_set (..., MVCCID_NULL, ...); /* <- rollback releases the floor immediately */
}

Invariant — VACUUM must not pass a committing transaction before its LOG_COMMIT. Commit raises the slot to mvccid (only if MVCCID_NULL or strictly older), pinning VACUUM at or below mvccid until LOG_COMMIT is durable; it resets to MVCCID_NULL only after LOG_COMMIT. Enforced by the clamp condition, which never lowers the floor here. If violated VACUUM could erase this tran’s modifications, and a crash before LOG_COMMIT leaves them unrecoverable. Rollback has no such hazard, so it drops the floor immediately.
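The clamp rule is small enough to model directly. This is a sketch, not the mvcc_table.cpp code: `toy_clamp_floor` is a hypothetical name, the per-transaction slot is a bare `std::atomic`, id 0 stands in for MVCCID_NULL, and "precedes" is assumed to be a plain `<` on ids.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using MVCCID = std::uint64_t;
const MVCCID TOY_NULL = 0;   // stands in for MVCCID_NULL

// Adjust one transaction's visibility floor at completion time.
void toy_clamp_floor (std::atomic<MVCCID> &slot, MVCCID mvccid, bool committed)
{
  if (!committed)
    {
      slot.store (TOY_NULL);   // rollback releases the floor immediately
      return;
    }
  MVCCID cur = slot.load ();
  if (cur == TOY_NULL || cur < mvccid)
    {
      slot.store (mvccid);     // clamp up to mvccid, never down
    }
  // otherwise the slot already holds a floor at or above mvccid; leave it alone
}
```

The separate load and store are safe here only because, in the original, this runs while the completer holds the active-transactions mutex; the toy does not model that lock.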

8.4 next_trans_status_start — reserve and invalidate

// mvcctable::next_trans_status_start -- src/transaction/mvcc_table.cpp
next_index = (m_trans_status_history_position.load () + 1) & HISTORY_INDEX_MASK; /* ring +1 */
next_version = ++m_current_trans_status.m_version; /* bump GLOBAL version */
mvcc_trans_status &next_trans_status = m_trans_status_history[next_index];
next_trans_status.m_version.store (next_version); /* poison the target slot */
return next_trans_status;

Three effects under the mutex: the ring index advances modulo HISTORY_MAX_SIZE (2048, power of two, so & HISTORY_INDEX_MASK wraps); the current status’s version increments (the reader’s recheck token); and the target slot’s version is stamped to next_version before its payload exists.

Invariant — a half-written slot is detectably stale, and publication is the last store. The slot’s m_version is stamped before its bit-area is populated; next_tran_status_finish (§8.5) fills the payload and only then stores m_trans_status_history_position — the single publish. The version is a plain monotonically incrementing unsigned int (version_type), +1 per generation; the reader’s recheck is an exact-value compare (next_status.m_version.load () == next_version), not seqlock parity. Enforced by stamp-first + publish-last statement ordering plus the reader recheck (Ch. 4/5). If violated a reader could match an advanced version over partial bits, accepting a torn active set.

8.5 next_tran_status_finish — publish last

// mvcctable::next_tran_status_finish -- src/transaction/mvcc_table.cpp
m_current_trans_status.m_active_mvccs.copy_to (next_trans_status.m_active_mvccs,
mvcc_active_tran::copy_safety::THREAD_SAFE); /* deep-copy the active set */
next_trans_status.m_last_completed_mvccid = m_current_trans_status.m_last_completed_mvccid;
next_trans_status.m_event_type = m_current_trans_status.m_event_type;
m_trans_status_history_position.store (next_index); /* <- THE publish */

copy_safety::THREAD_SAFE makes copy_to run check_valid on source and destination. The payload fills the already-version-stamped slot; the trailing m_trans_status_history_position.store is the publish (see the §8.4 invariant). Until it lands, readers see the previous slot as newest.
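The stamp-first, publish-last ordering of §8.4 and §8.5 can be sketched with a tiny ring. Everything here is hypothetical (`ToySlot`, `ToyRing`, an 8-slot ring, a single integer payload in place of the active set), and the reader protocol shown (read version, copy, re-read version) is an assumption about how the recheck is used. The sketch is meant to be exercised single-threaded; it does not model the mutex or make the payload copy itself race-free.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t TOY_RING_SIZE = 8;                 // power of two; the real ring has 2048 slots
const std::size_t TOY_RING_MASK = TOY_RING_SIZE - 1;

struct ToySlot
{
  std::atomic<unsigned> version {0};
  std::uint64_t payload = 0;                         // stands in for the copied active set
};

struct ToyRing
{
  ToySlot slots[TOY_RING_SIZE];
  std::atomic<std::size_t> position {0};             // index of the newest published slot
  unsigned current_version = 0;                      // writer-side generation counter

  // Writer: stamp the target slot, fill it, and publish the index last.
  void publish (std::uint64_t payload)
  {
    std::size_t next = (position.load () + 1) & TOY_RING_MASK;
    unsigned v = ++current_version;
    slots[next].version.store (v);                   // stamp first: slot is now detectably "in flux"
    slots[next].payload = payload;                   // fill the payload
    position.store (next);                           // the single publish
  }

  // Reader: copy the newest slot; accept the copy only if the version did not move.
  bool try_read (std::uint64_t &out) const
  {
    std::size_t idx = position.load ();
    unsigned before = slots[idx].version.load ();
    std::uint64_t copy = slots[idx].payload;
    if (slots[idx].version.load () != before)
      {
        return false;                                // slot was reused mid-copy; caller retries
      }
    out = copy;
    return true;
  }
};
```

Publishing eight more times after the first publish wraps the index back to where it started, which is the `& HISTORY_INDEX_MASK` arithmetic in miniature.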

8.6 advance_oldest_active — the post-unlock CAS loop

After unlocking, complete_mvcc recomputes the global watermark only when the retired mvccid was the bottleneck:

// mvcctable::complete_mvcc (post-unlock) -- src/transaction/mvcc_table.cpp
MVCCID global_lowest_active = m_current_status_lowest_active_mvccid;
if (global_lowest_active == mvccid
|| MVCC_ID_PRECEDES (mvccid, next_status.m_active_mvccs.get_bit_area_start_mvccid ()))
{
MVCCID new_lowest_active = next_status.m_active_mvccs.compute_lowest_active_mvccid ();
if (next_status.m_version.load () == next_version) /* <- recheck: result still ours? */
{
advance_oldest_active (new_lowest_active);
}
}

Two trigger conditions. Recompute only if (a) mvccid equals the global watermark, meaning it was the oldest, or (b) mvccid precedes the slot’s bit_area_start_mvccid, meaning a long transaction finished and the floor lives in the long-tran array. Otherwise the watermark is unaffected and the work is skipped.

Version recheck. compute_lowest_active_mvccid reads next_status lock-free; if another completer reused the slot, its version differs and the value is discarded.

// mvcctable::advance_oldest_active -- src/transaction/mvcc_table.cpp
do
{
crt_oldest_active = m_current_status_lowest_active_mvccid.load ();
if (crt_oldest_active >= next_oldest_active)
{ return; } /* <- monotonic guard */
}
while (!m_current_status_lowest_active_mvccid.compare_exchange_strong (crt_oldest_active, next_oldest_active));

Invariant: the global watermark is monotonically non-decreasing. advance_oldest_active only ever raises it. This is enforced by the crt_oldest_active >= next_oldest_active early return inside the CAS loop, re-evaluated on each retry. If violated, VACUUM could see the value drop and reclaim data an active reader needs. The CAS handles racing completers: the higher value wins, and losers re-read and bail.
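The monotonic guard plus CAS retry can be isolated into a few lines. The following is a sketch of the same pattern on a bare std::atomic; advance_monotonic is a hypothetical name, and plain integer comparison stands in for the MVCCID ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Raise `watermark` to `candidate` unless it is already at or above it.
// Returns true when this call performed the store.
inline bool advance_monotonic (std::atomic<std::uint64_t> &watermark, std::uint64_t candidate)
{
  std::uint64_t current = watermark.load ();
  do
    {
      if (current >= candidate)
        {
          return false;   // monotonic guard: never lower the value
        }
      // On CAS failure `current` is refreshed with the value another thread stored,
      // so the guard above is re-evaluated against it on the next iteration.
    }
  while (!watermark.compare_exchange_strong (current, candidate));
  return true;
}
```

A caller that loses the race either retries against the newer value or returns early because the newer value is already high enough.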

8.7 set_inactive_mvccid — routing the retirement

Back inside the mutex, set_inactive_mvccid routes the retiring MVCCID:

// mvcc_active_tran::set_inactive_mvccid -- src/transaction/mvcc_active_tran.cpp
if (MVCC_ID_PRECEDES (mvccid, m_bit_area_start_mvccid))
{
remove_long_transaction (mvccid); /* <- slid out of the window: it's a long tran */
}
else
{
set_bitarea_mvccid (mvccid); /* <- still in the window: set its committed bit */
}

An MVCCID older than the window base is in the long-tran array; everything else lives in the bit area (two-tier design, Ch. 4). The bit-area branch (set_bitarea_mvccid) then runs three maintenance triggers, traced in §8.9.

8.8 remove_long_transaction and add_long_transaction

// mvcc_active_tran::remove_long_transaction -- src/transaction/mvcc_active_tran.cpp
assert (m_long_tran_mvccids_length > 0);
for (i = 0; i < m_long_tran_mvccids_length - 1; i++)
{
if (m_long_tran_mvccids[i] == mvccid)
{
size_t memsize = (m_long_tran_mvccids_length - i - 1) * sizeof (MVCCID);
std::memmove (&m_long_tran_mvccids[i], &m_long_tran_mvccids[i + 1], memsize); /* close the gap */
break;
}
}
assert ((i < m_long_tran_mvccids_length - 1) || m_long_tran_mvccids[i] == mvccid);
--m_long_tran_mvccids_length;

Linear scan; on match, memmove closes the gap (keeping the array dense and sorted), then length decrements. The loop stops at length - 1: a last-element target is never matched in the body, but the trailing assert confirms it was the tail and the unconditional --m_long_tran_mvccids_length drops it.
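The tail case is easy to misread, so here is a standalone sketch of the same loop shape on a plain array. remove_sorted is a hypothetical helper, and the caller is assumed to guarantee that the id is present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Remove `id` from the dense ascending array ids[0..length).
// The loop deliberately stops at length - 1, mirroring the shape described above.
inline void remove_sorted (std::uint64_t *ids, std::size_t &length, std::uint64_t id)
{
  assert (length > 0);
  std::size_t i;
  for (i = 0; i < length - 1; i++)
    {
      if (ids[i] == id)
        {
          // Shift the survivors down by one to close the gap.
          std::memmove (&ids[i], &ids[i + 1], (length - i - 1) * sizeof (std::uint64_t));
          break;
        }
    }
  // Either a match was found before the tail, or the tail itself is the target.
  assert (i < length - 1 || ids[i] == id);
  --length;   // for a tail target this decrement alone performs the removal
}
```

Removing the last element needs no memmove at all: the loop falls through, the assert confirms the tail holds the target, and the decrement drops it.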

add_long_transaction is the inverse, used only during migration:

// mvcc_active_tran::add_long_transaction -- src/transaction/mvcc_active_tran.cpp
assert (m_long_tran_mvccids_length < long_tran_max_size ());
assert (m_long_tran_mvccids_length == 0 || m_long_tran_mvccids[m_long_tran_mvccids_length - 1] < mvccid);
m_long_tran_mvccids[m_long_tran_mvccids_length++] = mvccid; /* append; caller guarantees ascending */

Invariant: the long-tran array is sorted ascending and bounded. This is enforced by add_long_transaction’s two asserts (each append exceeds the prior tail; length below long_tran_max_size); the migration source iterates the bit-area low-to-high, so appends are ascending. If violated, the watermark is wrong: compute_lowest_active_mvccid (Ch. 6/9) returns m_long_tran_mvccids[0] as the minimum, which is only correct for a sorted array.

8.9 set_bitarea_mvccid — set the bit and trigger maintenance

// mvcc_active_tran::set_bitarea_mvccid -- src/transaction/mvcc_active_tran.cpp
const size_t CLEANUP_THRESHOLD = UNIT_BIT_COUNT; /* 64 bits */
const size_t LONG_TRAN_THRESHOLD = BITAREA_MAX_BITS - long_tran_max_size ();
size_t position = get_bit_offset (mvccid);
if (position >= BITAREA_MAX_BITS)
{
cleanup_migrate_to_long_transations (); /* <- window full: force migration to make room */
position = get_bit_offset (mvccid); /* recompute: start_mvccid moved */
}
assert (position < BITAREA_MAX_BITS);
if (position >= m_bit_area_length)
{
m_bit_area_length = position + 1; /* extend; new bits already zero (ALL_ACTIVE) */
}
*get_unit_of (position) |= get_mask_of (position); /* set the committed bit */
check_valid ();
if (m_bit_area_length > CLEANUP_THRESHOLD)
{ /* trim all-committed prefix units */
for (first_not_all_committed = 0; first_not_all_committed < get_area_size (); first_not_all_committed++)
if (m_bit_area[first_not_all_committed] != ALL_COMMITTED) break;
ltrim_area (first_not_all_committed);
check_valid ();
}
if (m_bit_area_length > LONG_TRAN_THRESHOLD)
{ cleanup_migrate_to_long_transations (); }

Overflow branch. If the offset reaches BITAREA_MAX_BITS (500 × 64 = 32000 bits), the window cannot hold it: cleanup_migrate_to_long_transations slides the window forward, the offset is recomputed against the new base, and the assert guarantees it fits.

Extend branch. A bit past the current length simply raises m_bit_area_length. Storage is pre-zeroed (ALL_ACTIVE), so no clearing is needed.

Cleanup threshold. Once length exceeds one unit (64), the code finds the first unit that is not ALL_COMMITTED and ltrim_areas everything before it. This is the cheap, common compaction.

Long-tran threshold. If length still exceeds LONG_TRAN_THRESHOLD (bit-area max minus long-tran capacity), remaining stragglers migrate wholesale, leaving exactly enough long-tran slots for every possible active tran.

8.10 ltrim_area

// mvcc_active_tran::ltrim_area -- src/transaction/mvcc_active_tran.cpp
if (trim_size == 0) { return; }
size_t new_memsize = (get_area_size () - trim_size) * sizeof (unit_type);
if (new_memsize > 0)
{ std::memmove (m_bit_area, &m_bit_area[trim_size], new_memsize); } /* shift survivors down */
size_t trimmed_bits = units_to_bits (trim_size);
m_bit_area_length -= trimmed_bits;
m_bit_area_start_mvccid += trimmed_bits; /* base advances */
std::memset (&m_bit_area[get_area_size ()], ALL_ACTIVE, trim_size * sizeof (unit_type)); /* re-zero tail */

ltrim_area removes trim_size units from the front: survivors memmove down, length shrinks by the trimmed bit count, and m_bit_area_start_mvccid advances by the same amount so the MVCCID↔offset mapping stays consistent. Vacated tail units reset to ALL_ACTIVE (0).

Invariant: units beyond bit_area_length are always ALL_ACTIVE (0). This is enforced by the trailing memset here, the pre-zeroed initialize, and check_valid’s debug loop. If violated, the “extend by raising length” shortcut in set_bitarea_mvccid inherits stale committed bits, marking never-seen MVCCIDs committed.
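The interaction between setting a bit, trimming the committed prefix, and keeping the tail zeroed can be exercised on a toy window. This is a sketch only: window_sketch and its member names are hypothetical, it uses std::vector instead of the fixed array, and it omits the overflow and long-tran branches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct window_sketch
{
  static constexpr std::size_t UNIT_BITS = 64;
  std::vector<std::uint64_t> units;   // fixed capacity; a zero bit means "still active"
  std::size_t length_bits = 0;        // bits currently in use
  std::uint64_t start_id = 0;         // id mapped to bit 0

  window_sketch (std::size_t capacity_units, std::uint64_t first_id)
    : units (capacity_units, 0), start_id (first_id) {}

  std::size_t used_units () const { return (length_bits + UNIT_BITS - 1) / UNIT_BITS; }

  // Set the committed bit, extending the length when the bit lies past it.
  void mark_committed (std::uint64_t id)
  {
    std::size_t pos = static_cast<std::size_t> (id - start_id);
    assert (pos < units.size () * UNIT_BITS);
    if (pos >= length_bits) { length_bits = pos + 1; }   // new bits are already zero
    units[pos / UNIT_BITS] |= (std::uint64_t{1} << (pos % UNIT_BITS));
  }

  // Drop every leading unit whose 64 bits are all set, then re-zero the vacated tail.
  void trim_committed_prefix ()
  {
    std::size_t used = used_units ();
    std::size_t trim = 0;
    while (trim < used && units[trim] == ~std::uint64_t{0}) { ++trim; }
    if (trim == 0) { return; }
    for (std::size_t i = trim; i < used; i++) { units[i - trim] = units[i]; }
    for (std::size_t i = used - trim; i < used; i++) { units[i] = 0; }
    length_bits -= trim * UNIT_BITS;
    start_id += trim * UNIT_BITS;   // base advances so id-to-offset mapping stays valid
  }
};
```

After a trim, an id that used to sit at offset 70 sits at offset 6, and the unit it vacated reads as zero again, which is what lets a later extend skip any clearing.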

8.11 cleanup_migrate_to_long_transations — keep 16, evict the rest

// mvcc_active_tran::cleanup_migrate_to_long_transations -- src/transaction/mvcc_active_tran.cpp
const size_t BITAREA_SIZE_AFTER_CLEANUP = 16;
size_t delete_count = get_area_size () - BITAREA_SIZE_AFTER_CLEANUP;
for (size_t i = 0; i < delete_count; i++)
{
bits = m_bit_area[i];
for (bit_pos = 0, mask = 1, long_tran_mvccid = get_mvccid (i * UNIT_BIT_COUNT);
bit_pos < UNIT_BIT_COUNT && bits != ALL_COMMITTED;
++bit_pos, mask <<= 1, ++long_tran_mvccid)
{
if ((bits & mask) == 0) /* bit clear == still active */
{
add_long_transaction (long_tran_mvccid); /* push straggler to long-tran array */
bits |= mask; /* set locally to allow early ALL_COMMITTED exit */
}
}
}
ltrim_area (delete_count);

Retain the most-recent 16 units, evict the older get_area_size () - 16. In each evicted unit, every clear bit (still-active MVCCID) is appended to the long-tran array; setting the bit locally lets the inner loop short-circuit once the unit reads ALL_COMMITTED. Then ltrim_area (delete_count) drops the migrated units. Low-to-high scanning makes appends ascending, satisfying the §8.8 sort invariant.
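The inner scan can be tried in isolation. The sketch below assumes a hypothetical free function collect_stragglers that returns the ids instead of appending to a member array; the bit convention (clear means still active) and the early exit follow the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the ids whose bit is clear (still active) in the first `evict_units` units.
// `start_id` is the id mapped to bit 0 of units[0].
inline std::vector<std::uint64_t>
collect_stragglers (const std::vector<std::uint64_t> &units, std::size_t evict_units, std::uint64_t start_id)
{
  constexpr std::size_t UNIT_BITS = 64;
  constexpr std::uint64_t ALL_COMMITTED = ~std::uint64_t{0};
  std::vector<std::uint64_t> long_ids;
  for (std::size_t u = 0; u < evict_units && u < units.size (); u++)
    {
      std::uint64_t bits = units[u];   // local copy; the source unit is not modified
      std::uint64_t id = start_id + u * UNIT_BITS;
      std::uint64_t mask = 1;
      for (std::size_t b = 0; b < UNIT_BITS && bits != ALL_COMMITTED; ++b, mask <<= 1, ++id)
        {
          if ((bits & mask) == 0)
            {
              long_ids.push_back (id);   // ascending because the scan is low-to-high
              bits |= mask;              // lets the loop stop once nothing is left to find
            }
        }
    }
  return long_ids;
}
```

A unit with a single straggler ends its inner loop right after that bit is found, rather than scanning all 64 positions.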

8.12 check_valid — the debug invariant gate

// mvcc_active_tran::check_valid -- src/transaction/mvcc_active_tran.cpp (debug-only, #if !defined(NDEBUG))
// 1. bits in the final partial unit, past bit_area_length, must be 0
if ((m_bit_area_length % UNIT_BIT_COUNT) != 0)
{
size_t last_bit_pos = m_bit_area_length - 1;
unit_type last_unit = *get_unit_of (last_bit_pos);
for (size_t i = (last_bit_pos + 1); i < UNIT_BIT_COUNT; i++)
if ((get_mask_of (i) & last_unit) != 0)
{ assert (false); } /* a set bit past the length is corruption */
}
// 2. every unit fully past bit_area_length must equal ALL_ACTIVE
for (unit_type *p_area = get_unit_of (m_bit_area_length) + 1; p_area < m_bit_area + BITAREA_MAX_SIZE; ++p_area)
if (*p_area != ALL_ACTIVE) { assert (false); }
// 3. long-tran array is ascending and every entry precedes bit_area_start_mvccid
for (size_t i = 0; i < m_long_tran_mvccids_length; i++)
{
assert (MVCC_ID_PRECEDES (m_long_tran_mvccids[i], m_bit_area_start_mvccid));
assert (i == 0 || MVCC_ID_PRECEDES (m_long_tran_mvccids[i - 1], m_long_tran_mvccids[i]));
}

check_valid is a no-op in release builds (#if !defined (NDEBUG)) but runs after every commit-path mutation (set_bitarea_mvccid, ltrim_area, cleanup_migrate_to_long_transations, remove_long_transaction, copy_to under THREAD_SAFE). Its first conditional fires only when m_bit_area_length is not unit-aligned: it scans the final partial unit from last_bit_pos + 1 to UNIT_BIT_COUNT, asserting each bit clear. #2 checks whole units past the length; #3 the sorted long-tran array strictly below the base. Any violation aborts in debug, surfacing maintenance bugs at the point of corruption.

  1. A read-only commit is nearly free. With no MVCCID, logtb_complete_mvcc skips complete_mvcc and only resets the visibility slot to MVCCID_NULL; only write transactions take the mutex-protected path.

  2. Publication is a single, last store. next_trans_status_start stamps the slot’s version (a plain monotonic unsigned int, +1 per generation, rechecked by exact equality) before the payload exists; next_tran_status_finish copies the active set and only then stores m_trans_status_history_position — which, with version-recheck, makes the new snapshot atomically visible to lock-free readers.

  3. Commit raises the visibility floor, rollback drops it. Commit clamps the per-tran slot up to mvccid (never down) so VACUUM cannot reclaim data before LOG_COMMIT is durable; rollback sets MVCCID_NULL at once.

  4. The watermark advance is lazy and monotonic. Post-unlock, complete_mvcc recomputes oldest-active only when the retired MVCCID was the bottleneck, rechecks the slot version, then advance_oldest_active CAS-bumps the watermark, which only ever increases.

  5. Retirement routes by window position and self-compacts. set_inactive_mvccid sends sub-window MVCCIDs to remove_long_transaction and the rest to set_bitarea_mvccid, whose three triggers compact the window: ltrim_area drops the all-committed prefix past 64 bits, cleanup_migrate_to_long_transations keeps 16 units past LONG_TRAN_THRESHOLD, and a BITAREA_MAX_BITS overflow forces migration. check_valid (debug-only) asserts clean tail bits/units and a sorted long-tran array below the base after every mutation.

Chapter 9: Vacuum Coordination and the Oldest-Visible Watermark

MVCC keeps an old version until no live snapshot can need it. That decision collapses into one global scalar — the oldest-visible MVCCID watermark, mvcctable::m_oldest_visible. This chapter: how it is computed across every live snapshot, how vacuum consumes it to drive VACUUM_RECORD_REMOVE, and why one long-running small-MVCCID writer freezes reclamation database-wide. The companion cubrid-mvcc.md covers why vacuum exists and the master/worker split; inputs (m_transaction_lowest_visible_mvccids[] seeded with MVCCID_ALL_VISIBLE) came from Ch.4/Ch.5, the mvcc_satisfies_vacuum predicate from Ch.7. Here we close the loop.

9.1 The mvcctable struct — watermark fields in context

History-ring fields were covered in Ch.8. This chapter’s watermark substate:

// class mvcctable (watermark fields) -- src/transaction/mvcc_table.hpp
using lowest_active_mvccid_type = std::atomic<MVCCID>; // ... history-ring fields elided (Ch.8) ...
lowest_active_mvccid_type *m_transaction_lowest_visible_mvccids; /* per-tran array */
size_t m_transaction_lowest_visible_mvccids_size;
lowest_active_mvccid_type m_current_status_lowest_active_mvccid; /* global floor */
std::atomic<MVCCID> m_oldest_visible; /* cached watermark */
std::atomic<size_t> m_ov_lock_count; /* >0 pins the watermark */

| Field | Role | Why it exists |
| --- | --- | --- |
| m_transaction_lowest_visible_mvccids | Array, one atomic<MVCCID> per tran index; each slot is the oldest MVCCID that transaction’s snapshot must keep visible. | Per-snapshot input to the min. Sized to logtb_get_number_of_total_tran_indices(). |
| m_transaction_lowest_visible_mvccids_size | Cached array length. | Sweep iterates without re-querying the transaction table. |
| m_current_status_lowest_active_mvccid | Global floor: lowest active MVCCID in the current trans-status; advanced monotonically by advance_oldest_active (Ch.8). | Seeds the sweep so a transaction that has not yet published its per-tran value is still bounded. |
| m_oldest_visible | Cached watermark; stored by update_global_oldest_visible, read by get_global_oldest_visible. | One atomic read per vacuum consumer; recompute amortized to the master heartbeat. |
| m_ov_lock_count | Count of operations that have pinned the watermark; > 0 means the cached value must not advance. | A caller that reads get_global_oldest_visible() and acts on it holds the floor steady until it finishes. |
| m_current_trans_status / m_trans_status_history / m_trans_status_history_position / m_new_mvccid_lock / m_active_trans_mutex | Owned by other chapters: the history ring and status mutex (Ch.8), the MVCCID-allocation lock (Ch.3). | Not watermark state; the lock-free sweep reads none of them. |

flowchart TB
  FLOOR["m_current_status_lowest_active_mvccid\n(global floor, monotonic)"] --> COMPUTE["compute_oldest_visible_mvccid()"]
  ARR["m_transaction_lowest_visible_mvccids[0..N]\n(per-snapshot inputs)"] --> COMPUTE
  COMPUTE --> UPDATE["update_global_oldest_visible()"]
  LC["m_ov_lock_count\n(pin counter)"] --> UPDATE
  UPDATE --> OV["m_oldest_visible\n(cached watermark)"]
  OV --> GET["get_global_oldest_visible()"] --> VAC["vacuum consumers\n(threshold_mvccid)"]

Figure 9-1 — Watermark substate, from inputs to the cached scalar vacuum reads.

Invariant (watermark monotonicity): m_oldest_visible never decreases. update_global_oldest_visible enforces it with assert (m_oldest_visible.load () <= oldest_visible) before the store. A regression would let a worker holding an older threshold_mvccid remove a version a newer-but-lower snapshot still needs. The monotonic floor m_current_status_lowest_active_mvccid keeps the computed min non-decreasing.

9.2 compute_oldest_visible_mvccid — the cross-snapshot sweep

The function is const and lock-free: it reads atomics and never takes m_active_trans_mutex. It returns the minimum MVCCID any live snapshot can still see.

// mvcctable::compute_oldest_visible_mvccid -- src/transaction/mvcc_table.cpp
cubmem::appendable_array<size_t, 32> waiting_mvccids_pos;
MVCCID lowest_active_mvccid = oldest_active_get (m_current_status_lowest_active_mvccid, 0, /*...*/); /* <- seed = floor */
for (size_t idx = 0; idx < m_transaction_lowest_visible_mvccids_size; idx++) {
MVCCID loaded = oldest_active_get (m_transaction_lowest_visible_mvccids[idx], idx, /*...*/);
if (loaded == MVCCID_ALL_VISIBLE) waiting_mvccids_pos.append (idx); /* <- in flight; defer (9.2.1) */
else if (loaded != MVCCID_NULL && MVCC_ID_PRECEDES (loaded, lowest_active_mvccid))
lowest_active_mvccid = loaded; /* <- min; NULL = ended, ignored */
}
// ... re-check loop for deferred slots (9.2.1) ...
assert (MVCCID_IS_NORMAL (lowest_active_mvccid)); /* return value */

The sweep classifies each slot by the three sentinel cases Ch.4/Ch.5 wrote there: MVCCID_ALL_VISIBLE (== 3) means build_mvcc_info is mid-flight (slot claimed, real value not yet published) — defer the index and re-check; MVCCID_NULL (== 0) means the transaction ended (written by reset_transaction_lowest_active, 9.5) — ignore; any >= MVCCID_FIRST is a published value — take the min via MVCC_ID_PRECEDES.
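The three-way classification can be written as a single pass over plain integers. This is a simplified sketch: sweep_once and the ID_* constants are hypothetical names, the slots are an ordinary vector rather than atomics, and integer less-than stands in for MVCC_ID_PRECEDES.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t ID_NULL = 0;          // transaction ended: ignore
constexpr std::uint64_t ID_ALL_VISIBLE = 3;   // slot claimed, value not yet published: defer

// One pass over the per-transaction slots, seeded with the global floor.
// Indices that still hold ID_ALL_VISIBLE are appended to `deferred` for a later re-check.
inline std::uint64_t
sweep_once (const std::vector<std::uint64_t> &slots, std::uint64_t floor, std::vector<std::size_t> &deferred)
{
  std::uint64_t lowest = floor;
  for (std::size_t idx = 0; idx < slots.size (); idx++)
    {
      std::uint64_t loaded = slots[idx];
      if (loaded == ID_ALL_VISIBLE)
        {
          deferred.push_back (idx);
        }
      else if (loaded != ID_NULL && loaded < lowest)
        {
          lowest = loaded;   // a published value below the running minimum
        }
    }
  return lowest;
}
```

With slots {0, 3, 40, 25, 90} and a floor of 50, the pass yields 25 and defers index 1; the slot holding 0 and the slot holding 90 do not affect the result.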

9.2.1 The deferred re-check loop and the 20-retry backoff

MVCCID_ALL_VISIBLE is transient (Ch.5: stamp, read floor, overwrite), so the loop spins until each deferred slot publishes a real value or drops to MVCCID_NULL:

// mvcctable::compute_oldest_visible_mvccid (re-check loop) -- src/transaction/mvcc_table.cpp
size_t retry_count = 0;
while (waiting_mvccids_pos.get_size () > 0) {
if (++retry_count % 20 == 0) { thread_sleep (10); } /* <- 10ms backoff every 20 spins */
for (size_t i = waiting_mvccids_pos.get_size () - 1; i < waiting_mvccids_pos.get_size (); --i) {
/* reverse walk: decrement past 0 wraps >= size, so erase shrinks the tail safely */
size_t pos = waiting_mvccids_pos.get_array ()[i];
MVCCID loaded = oldest_active_get (m_transaction_lowest_visible_mvccids[pos], pos, /*...*/);
if (loaded == MVCCID_ALL_VISIBLE) { continue; } /* <- still unset; keep in set */
if (loaded != MVCCID_NULL && MVCC_ID_PRECEDES (loaded, lowest_active_mvccid))
lowest_active_mvccid = loaded;
waiting_mvccids_pos.erase (i); /* <- resolved (value or NULL); drop */
}
}

A re-read still showing MVCCID_ALL_VISIBLE keeps the slot (continue); any other value resolves it — a normal MVCCID smaller than the current min lowers the min, anything else is dropped via erase.
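The reverse walk relies on unsigned wrap-around for its exit condition, which is worth seeing on its own. The sketch below uses std::vector and a hypothetical helper erase_resolved_reverse in place of cubmem::appendable_array, whose interface is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Erase every element for which `resolved` returns true, walking from the back.
// The condition `i < v.size ()` also ends the walk: decrementing 0 wraps to SIZE_MAX,
// which is never below the size. An empty vector skips the loop for the same reason.
template <typename Pred>
inline void erase_resolved_reverse (std::vector<std::size_t> &v, Pred resolved)
{
  for (std::size_t i = v.size () - 1; i < v.size (); --i)
    {
      if (resolved (v[i]))
        {
          // Erasing at i only shifts elements above i, which were already visited.
          v.erase (v.begin () + static_cast<std::ptrdiff_t> (i));
        }
    }
}
```

Walking backward means an erase never moves an unvisited element, so no index adjustment is needed after a removal.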

Invariant (sweep terminates with a normal result): the function ends with assert (MVCCID_IS_NORMAL (...)). The seed is always >= MVCCID_FIRST, and every MVCCID_ALL_VISIBLE slot resolves because the publishing writer runs its two lines back-to-back (Ch.5); the loop cannot hang under correct operation.

flowchart TD
  A["seed = m_current_status_lowest_active_mvccid"] --> B["sweep idx 0..size"]
  B --> C{slot value?}
  C -->|ALL_VISIBLE| D["defer: append idx"]
  C -->|NULL| E["ignore"]
  C -->|normal| F["if PRECEDES min: min = slot"]
  D --> G{waiting set empty?}
  E --> G
  F --> G
  G -->|yes| Z["assert NORMAL; return min"]
  G -->|no| H["retry++; every 20th sleep 10ms"]
  H --> I["reverse-walk waiting set"] --> J{re-read slot}
  J -->|still ALL_VISIBLE| K["keep"] --> G
  J -->|normal/NULL| L["maybe update min; erase"] --> G

Figure 9-2 — compute_oldest_visible_mvccid control flow, all branches.

9.3 update_global_oldest_visible — the pinned double-check store

The master heartbeat (9.6) recomputes only if no operation pins the watermark. m_ov_lock_count is checked twice — before computing and after the sweep, before the store.

// mvcctable::update_global_oldest_visible -- src/transaction/mvcc_table.cpp
MVCCID mvcctable::update_global_oldest_visible ()
{
if (m_ov_lock_count == 0) /* <- gate 1: skip work if pinned */
{
MVCCID oldest_visible = compute_oldest_visible_mvccid ();
if (m_ov_lock_count == 0) /* <- gate 2: pin may have arrived during sweep */
{
assert (m_oldest_visible.load () <= oldest_visible); /* monotonicity (9.1) */
m_oldest_visible.store (oldest_visible);
}
}
return m_oldest_visible.load (); /* <- always return cached (possibly stale) */
}

Three outcomes: a pin at gate 1 skips the sweep and returns the cached value; a pin arriving during the sweep (gate 2 != 0) discards the fresh value; only == 0 at both gates asserts monotonicity and stores. Gate 2 matters because the sweep can take milliseconds — and a plain atomic load suffices, since a pinning caller increments m_ov_lock_count before reading the watermark, so the happens-before edge lives on the caller side.
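The three outcomes can be reproduced with a toy in which the sweep is a caller-supplied callable. This is a sketch under assumptions: watermark_sketch and its members are hypothetical, and the callable lets a test simulate a pin that arrives while the sweep is running.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct watermark_sketch
{
  std::atomic<std::uint64_t> oldest_visible{4};
  std::atomic<std::size_t> lock_count{0};

  // `compute` stands in for the sweep; it may take a while, hence the second gate.
  template <typename Compute>
  std::uint64_t update (Compute compute)
  {
    if (lock_count.load () == 0)                     // gate 1: skip the work if pinned
      {
        std::uint64_t fresh = compute ();
        if (lock_count.load () == 0)                 // gate 2: a pin may have arrived meanwhile
          {
            assert (oldest_visible.load () <= fresh);
            oldest_visible.store (fresh);
          }
      }
    return oldest_visible.load ();                   // cached value, possibly stale
  }
};
```

Passing a callable that increments lock_count before returning models the mid-sweep pin: the fresh value is computed but never stored.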

9.4 The pin API — lock / unlock / is_locked / get

All four are trivial atomic operations:

// pin API -- src/transaction/mvcc_table.cpp
MVCCID mvcctable::get_global_oldest_visible () const { return m_oldest_visible.load (); }
void mvcctable::lock_global_oldest_visible () { ++m_ov_lock_count; }
void mvcctable::unlock_global_oldest_visible () { assert (m_ov_lock_count > 0); --m_ov_lock_count; }
bool mvcctable::is_global_oldest_visible_locked () const { return m_ov_lock_count != 0; }

get_global_oldest_visible is the vacuum fast path; lock/unlock are the pin pair; is_..._locked reports whether any pin is outstanding.

Invariant (balanced pins): every lock pairs with exactly one unlock; the unlock asserts m_ov_lock_count > 0, tripping on double-unlock or missing-lock. The log_tdes wrappers carry the pairing across function boundaries: the unlock lives in log_complete (log_manager.c), while the matching lock is taken on the locator side (locator_sr.c, e.g. in xlocator_upgrade_instances_domain just before heap_vacuum_all_objects, and in redistribute_partition_data) so the watermark stays pinned while that operation reads get_global_oldest_visible(). A leaked pin freezes m_oldest_visible forever and stops reclamation.
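The balance requirement is the same one a scope guard enforces. The sketch below is purely illustrative and is not how the source pairs the calls (there the lock and unlock sit in different functions, as described above); pin_guard_sketch is a hypothetical class operating on a bare counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical scope guard: one increment on construction, one decrement on destruction.
class pin_guard_sketch
{
public:
  explicit pin_guard_sketch (std::atomic<std::size_t> &count) : m_count (count) { ++m_count; }
  ~pin_guard_sketch ()
  {
    assert (m_count.load () > 0);   // same tripwire as the unlock assert
    --m_count;
  }
  pin_guard_sketch (const pin_guard_sketch &) = delete;
  pin_guard_sketch &operator= (const pin_guard_sketch &) = delete;

private:
  std::atomic<std::size_t> &m_count;
};
```

Nested guards show why a counter rather than a flag is used: the watermark stays pinned until the last outstanding pin is released.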

9.5 reset_transaction_lowest_active — clearing a slot at transaction end

A finished transaction’s slot must return to MVCCID_NULL. This is the only writer of MVCCID_NULL into the per-tran array from the commit path:

// mvcctable::reset_transaction_lowest_active -- src/transaction/mvcc_table.cpp
void mvcctable::reset_transaction_lowest_active (int tran_index)
{
oldest_active_set (m_transaction_lowest_visible_mvccids[tran_index], tran_index, MVCCID_NULL,
oldest_active_event::RESET);
}

Ordering against LOG_COMMIT is the reason for the pin (9.4). log_complete does, in order: append commit/abort record → drop pin → then, if committed, reset the slot:

// log_complete (commit tail) -- src/transaction/log_manager.c
log_append_commit_log (thread_p, tdes, &commit_lsa);
/* ... */
tdes->unlock_global_oldest_visible_mvccid (); /* <- drop the pin first */
if (iscommitted == LOG_COMMIT)
log_Gl.mvcc_table.reset_transaction_lowest_active (LOG_FIND_THREAD_TRAN_INDEX (thread_p));

complete_mvcc (Ch.8) already set the slot to this transaction’s own mvccid, and the pin held the watermark steady; only after LOG_COMMIT is appended is the pin dropped and the slot reset. Resetting earlier would let vacuum clean modifications a post-crash recovery expects to exist.

9.6 The vacuum side — master heartbeat and per-record consumption

On every vacuum_master_task::execute the master refreshes and captures the watermark. The same refresh also runs at vacuum_boot (“for debug only”) and in vacuum_data_load_and_recover, so a fresh server always has a watermark before the first job:

// vacuum_master_task::execute -- src/query/vacuum.c
m_oldest_visible_mvccid = log_Gl.mvcc_table.update_global_oldest_visible ();

The master gates block eligibility on it:

// vacuum_master_task::is_cursor_entry_ready_to_vacuum -- src/query/vacuum.c
if (m_cursor.get_current_entry ().newest_mvccid >= m_oldest_visible_mvccid)
return false; /* <- newest still visible; whole block not vacuumable */

Blocks are scanned in blockid order and a later block cannot be ready if the current one is not, so the master breaks on the first not-ready block.
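The gate and the break-on-first-failure scan can be sketched as follows. block_sketch and count_ready_blocks are hypothetical names, and the input is assumed to be already ordered by blockid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct block_sketch
{
  std::uint64_t blockid;
  std::uint64_t newest_mvccid;
};

// Count the leading blocks whose newest id is strictly below the watermark.
// The scan stops at the first block that is not ready.
inline std::size_t count_ready_blocks (const std::vector<block_sketch> &blocks, std::uint64_t oldest_visible)
{
  std::size_t ready = 0;
  for (const block_sketch &b : blocks)
    {
      if (b.newest_mvccid >= oldest_visible)
        {
          break;   // newest id may still be visible to some snapshot
        }
      ++ready;
    }
  return ready;
}
```

Note that the comparison is >=, so a block whose newest MVCCID equals the watermark is still held back.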

9.6.1 vacuum_process_log_block — capturing the threshold per job

Each worker re-reads the watermark into a local threshold_mvccid, then a debug-only tripwire (compiled under #if !defined (NDEBUG)) bounds every op three ways:

// vacuum_process_log_block -- src/query/vacuum.c
MVCCID threshold_mvccid = log_Gl.mvcc_table.get_global_oldest_visible (); /* <- one atomic load */
#if !defined (NDEBUG)
if (MVCC_ID_FOLLOW_OR_EQUAL (mvccid, threshold_mvccid) /* not yet below watermark? */
|| MVCC_ID_PRECEDES (mvccid, data->oldest_visible_mvccid) /* older than block floor? */
|| MVCC_ID_PRECEDES (data->newest_mvccid, mvccid)) /* newer than block ceiling? */
{ assert (0); logpb_fatal_error (thread_p, true, ARG_FILE_LINE, "vacuum_process_log_block"); goto end; }
#endif

VACUUM_DATA_ENTRY::oldest_visible_mvccid (captured at log time, 9.6.3) bounds the job from below, live threshold_mvccid from above. An op at-or-above the current watermark should never reach a job — the master gate would have deferred its block — hence the assert. Gathered heap objects go to vacuum_heap_page, which carries threshold_mvccid to the record predicate.
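Stated positively, the tripwire fires unless all three bounds hold. The predicate below is a sketch of that condition on plain integers; op_within_job_bounds is a hypothetical name and integer comparison stands in for the MVCC_ID_* macros.

```cpp
#include <cassert>
#include <cstdint>

// True when an operation's MVCCID is legal for a vacuum job:
// below the live watermark, and inside the [floor, ceiling] recorded for its block.
inline bool op_within_job_bounds (std::uint64_t mvccid, std::uint64_t threshold_mvccid,
                                  std::uint64_t block_oldest_visible, std::uint64_t block_newest)
{
  const bool below_watermark = mvccid < threshold_mvccid;
  const bool at_or_above_block_floor = mvccid >= block_oldest_visible;
  const bool at_or_below_block_ceiling = mvccid <= block_newest;
  return below_watermark && at_or_above_block_floor && at_or_below_block_ceiling;
}
```

Each clause is the negation of one disjunct in the tripwire, so a false result corresponds to the assert (0) path.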

9.6.2 mvcc_satisfies_vacuum — the per-record verdict

vacuum_heap_page asserts MVCCID_IS_NORMAL (threshold_mvccid) and dispatches on the Ch.7 predicate’s verdict for each candidate record:

// vacuum_heap_page (per-record) -- src/query/vacuum.c
helper.can_vacuum = mvcc_satisfies_vacuum (thread_p, &helper.mvcc_header, threshold_mvccid);
if (helper.can_vacuum == VACUUM_RECORD_REMOVE)
vacuum_heap_record (thread_p, &helper); /* <- whole version dies */
else if (helper.can_vacuum == VACUUM_RECORD_DELETE_INSID_PREV_VER)
vacuum_heap_record_insid_and_prev_version (thread_p, &helper); /* <- shrink header */
/* else VACUUM_RECORD_CANNOT_VACUUM: leave it */

The predicate (whose body is dissected in Ch.7) takes oldest_mvccid, which here is the watermark — so m_oldest_visible alone decides each verdict against the record header:

| Record state vs. watermark | Verdict | Effect |
| --- | --- | --- |
| Deleter committed and delete MVCCID < watermark | VACUUM_RECORD_REMOVE | Remove entirely. |
| Not deleted (or deleted >= watermark) and insert all-visible / < watermark | VACUUM_RECORD_DELETE_INSID_PREV_VER | Keep version; trim insert MVCCID + prev-version LSA. |
| Inserted >= watermark, or insert not all-visible | VACUUM_RECORD_CANNOT_VACUUM | A live snapshot may need it; leave it. |

The predicate is invoked from the per-record job path in vacuum_heap_page; two other sites exist — is_not_vacuumed_and_lost (a consistency check against vacuum_Data.oldest_unvacuumed_mvccid) and vacuum_rv_check_at_undo (the undo-time recheck against get_global_oldest_visible()) — but neither is on the block-job hot path. All three feed the same monotonic watermark lineage.

9.6.3 Block-level capture of the watermark at log time

VACUUM_DATA_ENTRY records the watermark as of when the block was logged, so the job keeps that floor even after the global watermark advances:

// struct vacuum_data_entry -- src/query/vacuum.c
struct vacuum_data_entry {
VACUUM_LOG_BLOCKID blockid;
LOG_LSA start_lsa; // lsa of last mvcc op log record in block
MVCCID oldest_visible_mvccid; // oldest visible MVCCID while block was logged
MVCCID newest_mvccid; // newest MVCCID in log block
// ...
};

On append, oldest_visible_mvccid is asserted <= get_global_oldest_visible() and >= the previous block’s value — the 9.1 monotonicity invariant projected onto the block stream.

9.7 The structural limitation — one small writer pins everything

The watermark is a global minimum, only ever as fresh as the oldest live snapshot. An idle T_old holding a small MVCCID keeps its slot small, so every sweep takes it as the min and m_oldest_visible freezes:

flowchart LR
  TOLD["T_old (small MVCCID, idle)"] -->|slot stays small| ARR["per-tran array min"]
  ARR --> WM["m_oldest_visible frozen"]
  WM --> PRED["mvcc_satisfies_vacuum -> CANNOT_VACUUM"]
  PRED --> ACC["dead versions accumulate"]

Figure 9-3 — A single long-running writer pins the global watermark and stalls reclamation database-wide.

This is inherent to a single watermark with no per-table or per-tablespace scope: the slowest snapshot governs the system. It is the MVCC analogue of a long transaction holding back the xmin horizon, and with it autovacuum, in PostgreSQL. Remedies are operational (bound transaction lifetime, avoid idle-in-transaction). The m_ov_lock_count pin (9.3/9.4) is a deliberate bounded version of the same freeze, scoped to one pinned operation window.

  1. The watermark is one scalar, m_oldest_visible — the min MVCCID any live snapshot can see; vacuum reads it via get_global_oldest_visible, recomputed at the master heartbeat.
  2. compute_oldest_visible_mvccid sweeps the per-tran array lock-free, seeded by the monotonic floor: MVCCID_ALL_VISIBLE → defer, MVCCID_NULL → ignore, normal → min; the deferred slots re-check until they publish a value (10 ms backoff every 20 spins), then assert a normal result.
  3. update_global_oldest_visible double-checks m_ov_lock_count (before sweep, before store), so a mid-sweep pin discards the new value; the store asserts monotonicity.
  4. The pin freezes the watermark across the pinned operation window; the lock is taken on the locator side, the unlock in log_complete, after which the slot is reset to MVCCID_NULL by reset_transaction_lowest_active.
  5. Vacuum consumes the watermark twice: master block gate (newest_mvccid >= m_oldest_visible_mvccid → skip), and worker threshold_mvccid into mvcc_satisfies_vacuum, which alone picks REMOVE / DELETE_INSID_PREV_VER / CANNOT_VACUUM.
  6. One small-MVCCID long-running writer pins the watermark and stalls reclamation database-wide — the cost of a single global minimum with no per-object scoping; the remedy is operational.

Chapter 10: Sub-Transactions and Special Paths

Chapters 3 through 9 traced the clean lifecycle: one transaction gets one MVCCID, stamps a record, becomes visible at commit, is reclaimed by vacuum. This chapter covers the paths that do not fit that model: sub-transactions (savepoints / system operations) that need their own MVCCID while the parent is still open and complete before it; MVCC-disabled classes (root class, _db_serial, collation/HA cached-OID classes) whose records carry no MVCC header fields; restart seeding via reset_start_mvccid; and the 2048-ring saturation question deferred from Chapter 5.

For the snapshot/visibility theory these paths perturb, see cubrid-mvcc.md. For the sub-transaction boundary vs. lock escalation and SERIALIZABLE write-skew, see §10.8.

This section closes coverage of mvcc_info (per-transaction state, Ch.1/Ch.3) and mvcc_trans_status (one ring slot, Ch.1/Ch.8) by adding the fields only the sub-transaction paths use.

// struct mvcc_info -- src/transaction/mvcc.h
struct mvcc_info
{
MVCC_SNAPSHOT snapshot; /* MVCC Snapshot */
MVCCID id; /* the transaction's own MVCCID (Ch.3) */
MVCCID recent_snapshot_lowest_active_mvccid; /* fast-reject floor (Ch.4) */
std::vector<MVCCID> sub_ids; /* MVCC sub-transaction ID array */
bool is_sub_active; /* true while a sub-transaction is running */
// ... methods condensed ...
};

| Field | Role | Why it exists |
| --- | --- | --- |
| snapshot | The read view (built Ch.5, consumed Ch.6). | One snapshot per worker per consistency window. |
| id | The transaction’s own MVCCID, lazily allocated (Ch.3). | The stamp written into record headers by top-level writes. |
| recent_snapshot_lowest_active_mvccid | Floor below which any MVCCID is definitely inactive; the fast-reject gate in mvcc_is_active_id (Ch.4). | Avoids a global probe for old IDs. |
| sub_ids | LIFO stack of MVCCIDs for nested sub-transactions, newest at back(). | A parent may need several MVCCIDs over its life, one per open system operation; they must complete in reverse order. |
| is_sub_active | Flag set true while a sub-transaction owns the “current” write identity. | Signals the active write MVCCID is sub_ids.back(), not id. Mirrored in copy_to but never read inside the MVCC core; informational state for passive servers. |

sub_ids is a stack, not a set. logtb_assign_subtransaction_mvccid only push_backs; logtb_complete_sub_mvcc only pop_backs the value read from back(). Out-of-order completion would pop the wrong id; CUBRID’s system-operation/savepoint machinery guarantees the required LIFO nesting.
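The LIFO discipline can be sketched in isolation. The block below is a minimal stand-in, assuming a hypothetical `SubIdStack` type that is not part of CUBRID; it only mirrors the push_back / back() / pop_back pattern described above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using MVCCID = std::uint64_t;

// Toy stand-in for the sub-transaction part of mvcc_info (hypothetical type).
struct SubIdStack
{
  std::vector<MVCCID> sub_ids;   // newest sub-id at back()

  // Open a nested sub-transaction: the new id goes to the back.
  void open (MVCCID sub_id)
  {
    sub_ids.push_back (sub_id);
  }

  // Complete the innermost sub-transaction and return the id that was closed.
  MVCCID complete_innermost ()
  {
    assert (!sub_ids.empty ());
    MVCCID closed = sub_ids.back ();
    sub_ids.pop_back ();
    return closed;
  }
};
```

Opening 11, 12, 13 and completing twice closes 13 and then 12, leaving 11 as the innermost open sub. Completing an id other than back() is simply not expressible through this interface, which is the property the real code relies on.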

// struct mvcc_trans_status -- src/transaction/mvcc_table.hpp
struct mvcc_trans_status
{
enum event_type { COMMIT, ROLLBACK, SUBTRAN };
mvcc_active_tran m_active_mvccs; /* the bit-area + long-list snapshot */
MVCCID m_last_completed_mvccid; // just for info
event_type m_event_type; // just for info
std::atomic<version_type> m_version;
// ... methods condensed ...
};

| Field | Role | Why it exists |
|---|---|---|
| m_active_mvccs | The active-transaction set (bit-area + long-list) snapshotted in this slot. | The payload a snapshot builder copies (Ch.4/Ch.5). |
| m_last_completed_mvccid | MVCCID of the completion that produced this slot. Diagnostic only. | Debug/trace aid; not read by visibility. |
| m_event_type | Tags the completion: COMMIT, ROLLBACK, or SUBTRAN. | Diagnostic. Distinguishes a sub-completion from a top-level one. |
| m_version | Monotonic counter bumped on every status transition; re-read by snapshot builders to detect mid-copy mutation (the Ch.5 retry loop). | The lock-free consistency check letting readers copy without holding the mutex. |

Role matrix for m_event_type:

| Producing call | m_event_type | Advances oldest? |
|---|---|---|
| complete_mvcc(.., committed=true) | COMMIT | Yes: committed work may raise the floor. |
| complete_mvcc(.., committed=false) | ROLLBACK | Yes: the rolled-back ID leaves the active set. |
| complete_sub_mvcc | SUBTRAN intended; actually left untouched (§10.5 bug) | No: the parent is still open, so the sub-id can never be the lowest. |

logtb_get_new_subtransaction_mvccid is the entry point when a system operation or savepoint needs to write under its own identity.

// logtb_get_new_subtransaction_mvccid -- src/transaction/log_tran_table.c
void
logtb_get_new_subtransaction_mvccid (THREAD_ENTRY * thread_p, MVCC_INFO * curr_mvcc_info)
{
MVCCID mvcc_subid;
mvcctable *mvcc_table = &log_Gl.mvcc_table;
if (MVCCID_IS_VALID (curr_mvcc_info->id))
{
mvcc_subid = mvcc_table->get_new_mvccid (); /* parent already has an id */
}
else
{
mvcc_table->get_two_new_mvccid (curr_mvcc_info->id, mvcc_subid); /* seed parent + sub */
}
logtb_assign_subtransaction_mvccid (thread_p, curr_mvcc_info, mvcc_subid);
}

The function branches on whether the parent already owns an MVCCID. If the parent id is valid, one id is allocated via get_new_mvccid. If it is MVCCID_NULL, the parent never wrote (Ch.3’s lazy allocation never fired); since visibility (Ch.6) requires the parent’s id to precede its sub’s, get_two_new_mvccid pulls two consecutive ids under one lock: the first goes to the parent (by reference), the second to the sub:

// mvcctable::get_two_new_mvccid -- src/transaction/mvcc_table.cpp
void
mvcctable::get_two_new_mvccid (MVCCID &first, MVCCID &second)
{
m_new_mvccid_lock.lock ();
first = log_Gl.hdr.mvcc_next_id; MVCCID_FORWARD (log_Gl.hdr.mvcc_next_id);
second = log_Gl.hdr.mvcc_next_id; MVCCID_FORWARD (log_Gl.hdr.mvcc_next_id);
m_new_mvccid_lock.unlock ();
}

Invariant: parent id strictly precedes every sub-id. Both branches keep the parent below the sub: in the first, the sub-id is drawn from the counter after the parent’s id was issued; in the second, the parent takes first and the sub takes second = first + 1, both under one m_new_mvccid_lock so nothing slots between them. Visibility relies on this so a sub-stamped record reads as “newer” than a parent-stamped one.

flowchart TD
  A["logtb_get_new_subtransaction_mvccid"] --> B{"MVCCID_IS_VALID(curr_mvcc_info->id)?"}
  B -- "yes (parent stamped)" --> C["get_new_mvccid() -> mvcc_subid"]
  B -- "no (parent unstamped)" --> D["get_two_new_mvccid(id, mvcc_subid)<br/>id := first, mvcc_subid := second"]
  C --> E["logtb_assign_subtransaction_mvccid"]
  D --> E
  E --> F["sub_ids.push_back(mvcc_subid)"]

Figure 10-1: branch structure of sub-transaction MVCCID allocation.

logtb_assign_subtransaction_mvccid carries the load-bearing assertion:

// logtb_assign_subtransaction_mvccid -- src/transaction/log_tran_table.c
static void
logtb_assign_subtransaction_mvccid (THREAD_ENTRY * thread_p, MVCC_INFO * curr_mvcc_info, MVCCID mvcc_subid)
{
assert (MVCCID_IS_VALID (curr_mvcc_info->id)); /* <- parent MUST be stamped by now */
curr_mvcc_info->sub_ids.push_back (mvcc_subid);
}

By the time we push, the parent’s id is valid (valid on entry, or just set by get_two_new_mvccid); a push onto an unstamped parent is a bug, caught here in debug builds.
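The two allocation branches and the assertion can be exercised with a toy model. This is a sketch under stated assumptions: `ToyIdAllocator`, `ToyMvccInfo` and `open_subtransaction` are hypothetical names, the counter starts at MVCCID_FIRST instead of living in the log header, the step rule (increment, then skip the reserved band) is assumed, and plain `<` stands in for MVCC_ID_PRECEDES.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

using MVCCID = std::uint64_t;
constexpr MVCCID kNull = 0;    // mirrors MVCCID_NULL
constexpr MVCCID kFirst = 4;   // mirrors MVCCID_FIRST

// Toy allocator (hypothetical); the real counter is log_Gl.hdr.mvcc_next_id.
class ToyIdAllocator
{
public:
  MVCCID get_new ()
  {
    std::lock_guard<std::mutex> guard (m_lock);
    return take ();
  }

  // Both ids are drawn inside one critical section, so no other
  // allocation can land between them.
  void get_two_new (MVCCID &first, MVCCID &second)
  {
    std::lock_guard<std::mutex> guard (m_lock);
    first = take ();
    second = take ();
  }

private:
  MVCCID take ()
  {
    MVCCID id = m_next;
    // Assumed step rule: increment, then skip the reserved band below kFirst.
    do { ++m_next; } while (m_next < kFirst);
    return id;
  }

  std::mutex m_lock;
  MVCCID m_next = kFirst;
};

struct ToyMvccInfo
{
  MVCCID id = kNull;             // parent id, lazily assigned
  std::vector<MVCCID> sub_ids;   // newest sub-id at back()
};

// Mirrors the two branches of logtb_get_new_subtransaction_mvccid.
MVCCID open_subtransaction (ToyIdAllocator &alloc, ToyMvccInfo &info)
{
  MVCCID sub_id = kNull;
  if (info.id != kNull)
    {
      sub_id = alloc.get_new ();             // parent already stamped
    }
  else
    {
      alloc.get_two_new (info.id, sub_id);   // stamp parent and sub together
    }
  assert (info.id != kNull && info.id < sub_id);   // parent precedes sub
  info.sub_ids.push_back (sub_id);
  return sub_id;
}
```

On a fresh allocator, the first call stamps the parent with 4 and the sub with 5; a second call leaves the parent at 4 and hands out 6.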

10.3 A parent sees its own sub-transaction writes

A sub-stamped record must read back to the parent (and its later subs) as “written by me”, never as foreign active work — the job of logtb_is_current_mvccid, reached through Chapter 6’s MVCC_IS_REC_INSERTED_BY_ME / MVCC_IS_REC_DELETED_BY_ME macros.

// logtb_is_current_mvccid -- src/transaction/log_tran_table.c
bool
logtb_is_current_mvccid (THREAD_ENTRY * thread_p, MVCCID mvccid)
{
// ... condensed: tdes lookup + assert ...
MVCC_INFO *curr_mvcc_info = &tdes->mvccinfo;
if (curr_mvcc_info->id == mvccid)
{
return true; /* the parent's own id */
}
else if (curr_mvcc_info->sub_ids.size () > 0)
{
for (size_t i = 0; i < curr_mvcc_info->sub_ids.size (); i++)
{
if (curr_mvcc_info->sub_ids[i] == mvccid)
{
return true; /* one of my sub-transactions */
}
}
}
return false;
}

Every exit: (1) id == mvccid → true, the parent’s top-level write. (2) Else if sub_ids is non-empty, linear-scan the whole vector (i < size(), not just back()): nested system operations may stack several sub-ids, and a record from an earlier still-open sub must also count as “mine”. (3) Empty or no match → false, falling through to the snapshot-based active check. MVCC_IS_REC_INSERTED_BY_ME expands straight to logtb_is_current_mvccid (thread_p, rec_header->mvcc_ins_id).

The companions logtb_find_current_mvccid / logtb_get_current_mvccid resolve the write identity from the other end: sub_ids.back() if non-empty (innermost open sub), else id. So a write while a sub is open is stamped with the sub-id, and logtb_is_current_mvccid guarantees it reads back as “mine”.
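The pairing of "which id do I write with" and "which ids count as mine" can be sketched as two small functions. The names `ToyMvccInfo`, `current_write_id` and `is_mine` are hypothetical; the logic follows the description above (back() for writes, whole-vector scan for ownership).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using MVCCID = std::uint64_t;

struct ToyMvccInfo
{
  MVCCID id = 0;                 // 0 stands for MVCCID_NULL
  std::vector<MVCCID> sub_ids;   // newest sub-id at back()
};

// Write identity: the innermost open sub-transaction if any, else the parent id.
MVCCID current_write_id (const ToyMvccInfo &info)
{
  return info.sub_ids.empty () ? info.id : info.sub_ids.back ();
}

// Ownership test: the parent id or any still-open sub-id counts as "mine".
bool is_mine (const ToyMvccInfo &info, MVCCID mvccid)
{
  if (info.id == mvccid)
    {
      return true;
    }
  return std::find (info.sub_ids.begin (), info.sub_ids.end (), mvccid) != info.sub_ids.end ();
}
```

With parent 10 and open subs 11 and 12, writes are stamped 12, while 10, 11 and 12 all read back as owned and 13 does not.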

mvcc_is_active_id (Ch.4) layers the fast-reject floor on top:

// mvcc_is_active_id -- src/transaction/mvcc.c
STATIC_INLINE bool
mvcc_is_active_id (THREAD_ENTRY * thread_p, MVCCID mvccid)
{
// ... condensed: tdes lookup + assert ...
MVCC_INFO *curr_mvcc_info = &tdes->mvccinfo;
if (MVCC_ID_PRECEDES (mvccid, curr_mvcc_info->recent_snapshot_lowest_active_mvccid))
{
return false; /* below the floor: definitely inactive */
}
if (logtb_is_current_mvccid (thread_p, mvccid))
{
return true; /* mine (parent or any sub) */
}
return log_Gl.mvcc_table.is_active (mvccid); /* foreign: global probe */
}

stateDiagram-v2
  [*] --> CheckFloor
  CheckFloor --> Inactive: mvccid precedes recent_lowest
  CheckFloor --> CheckMine: at or above floor
  CheckMine --> Active: id or any sub_id matches
  CheckMine --> GlobalProbe: no local match
  GlobalProbe --> Active: mvcc_table.is_active true
  GlobalProbe --> Inactive: not in active set

Figure 10-2: mvcc_is_active_id — the local sub_ids check sits between the cheap floor reject and the expensive global probe.

A sub ends before its parent. logtb_complete_sub_mvcc runs the per-transaction half, then patches the parent’s live snapshot.

// logtb_complete_sub_mvcc -- src/transaction/log_tran_table.c
void
logtb_complete_sub_mvcc (THREAD_ENTRY * thread_p, LOG_TDES * tdes)
{
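MVCC_INFO *curr_mvcc_info = &tdes->mvccinfo;
mvcctable *mvcc_table = &log_Gl.mvcc_table; /* same global table as in logtb_get_new_subtransaction_mvccid */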
MVCCID mvcc_sub_id = curr_mvcc_info->sub_ids.back (); /* innermost open sub */
mvcc_table->complete_sub_mvcc (mvcc_sub_id); /* global half */
curr_mvcc_info->sub_ids.pop_back (); /* drop it from the stack */
if (tdes->mvccinfo.snapshot.valid)
{
MVCC_SNAPSHOT *snapshot = &tdes->mvccinfo.snapshot;
if (mvcc_sub_id >= snapshot->highest_completed_mvccid)
{
snapshot->highest_completed_mvccid = mvcc_sub_id;
MVCCID_FORWARD (snapshot->highest_completed_mvccid);
}
snapshot->m_active_mvccs.set_inactive_mvccid (mvcc_sub_id);
}
}

Branches: (1) Read sub_ids.back() (LIFO §10.1), call the global complete_sub_mvcc (§10.5), then pop_back. After the pop, logtb_is_current_mvccid no longer matches the sub-id for new reads, so the fix-up must repair the existing snapshot. (2) Valid snapshot → if mvcc_sub_id >= highest_completed_mvccid, raise the ceiling one past the sub-id; then unconditionally clear it from the active bit-area (set_inactive_mvccid). (3) No valid snapshot (READ COMMITTED between statements, or none yet) → skip; the next build_mvcc_info picks up the global state updated in step 1.

Invariant: a parent’s snapshot never loses sight of its own committed sub-transaction. The sub-id was allocated after the snapshot’s ceiling, so an unpatched snapshot would judge it “too new”; the ceiling-raise plus active-set clear repair both halves of the Ch.6 predicate so the parent reads its own sub’s rows immediately.
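The effect of the fix-up can be shown on a toy snapshot. This is a sketch, assuming a hypothetical `ToySnapshot` with an exclusive ceiling and an explicit active set; `treats_as_active` is an assumed simplification of the Ch.6 test, `sub_id + 1` stands in for MVCCID_FORWARD, and the toy does not model how other ids between the old ceiling and the sub-id are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using MVCCID = std::uint64_t;

// Toy snapshot (hypothetical): an exclusive ceiling plus an explicit active set.
struct ToySnapshot
{
  MVCCID highest_completed = 0;   // ids at or above this read as "too new"
  std::set<MVCCID> active;        // ids below the ceiling that were still running

  // Assumed simplification of the Ch.6 test: true means "not yet visible".
  bool treats_as_active (MVCCID id) const
  {
    return id >= highest_completed || active.count (id) != 0;
  }

  // Mirrors the fix-up in logtb_complete_sub_mvcc.
  void patch_for_completed_sub (MVCCID sub_id)
  {
    if (sub_id >= highest_completed)
      {
        highest_completed = sub_id + 1;   // raise the ceiling one past the sub-id
      }
    active.erase (sub_id);                // and clear it from the active set
  }
};
```

For a snapshot with ceiling 20, a sub-id of 25 is "too new" before the patch and readable after it, with the ceiling moved to 26.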

mvcctable::complete_sub_mvcc is the global counterpart — almost identical to complete_mvcc (Ch.8) but omitting the oldest-active recompute.

// mvcctable::complete_sub_mvcc -- src/transaction/mvcc_table.cpp
void
mvcctable::complete_sub_mvcc (MVCCID mvccid)
{
assert (MVCCID_IS_VALID (mvccid));
std::unique_lock<std::mutex> ulock (m_active_trans_mutex); /* only one status change at a time */
mvcc_trans_status::version_type next_version;
size_t next_index;
mvcc_trans_status &next_status = next_trans_status_start (next_version, next_index);
// update current trans status
m_current_trans_status.m_active_mvccs.set_inactive_mvccid (mvccid);
m_current_trans_status.m_last_completed_mvccid = mvccid;
m_current_trans_status.m_last_completed_mvccid = mvcc_trans_status::SUBTRAN; /* source-as-is; see note */
next_tran_status_finish (next_status, next_index); /* publish new ring slot */
ulock.unlock ();
// mvccid can't be lowest, so no need to update it here
}

Walkthrough: (1) take m_active_trans_mutex. (2) next_trans_status_start bumps m_version and reserves+invalidates the next slot (Ch.8’s version protocol — the bump makes a concurrent snapshot copy retry). (3) clear the sub-id from the current status active set. (4) record the info fields, then publish via next_tran_status_finish (copies the active set into the reserved slot, advances m_trans_status_history_position). (5) No advance_oldest_active — the comment says it: an open parent holds an older id, so the sub-id can never be the oldest-visible watermark (Ch.9). The double assignment to m_last_completed_mvccid is a copy-paste slip vs. complete_mvcc; harmless since both fields are // just for info (Open Question #2).

flowchart TD
  A["complete_sub_mvcc(mvccid)"] --> B["lock m_active_trans_mutex"]
  B --> C["next_trans_status_start<br/>bump m_version, reserve slot"]
  C --> D["m_current.set_inactive_mvccid(mvccid)"]
  D --> E["record info fields"]
  E --> F["next_tran_status_finish<br/>copy active set, advance position"]
  F --> G["unlock"]
  G --> H["return — NO advance_oldest_active"]

Figure 10-3: complete_sub_mvcc flow — note the absent oldest-active recompute.
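The reason for skipping the oldest-active recompute can be checked with a plain ordered set. The sketch below assumes a hypothetical model where the active set is a `std::set` and the lowest active id is its minimum; it is not the bit-area implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using MVCCID = std::uint64_t;

// Lowest active id of a toy active set; 0 when nothing is active.
MVCCID lowest_active (const std::set<MVCCID> &active)
{
  return active.empty () ? 0 : *active.begin ();
}

// Removes an id from the active set and reports whether the minimum moved.
bool completion_moves_lowest (std::set<MVCCID> &active, MVCCID mvccid)
{
  MVCCID before = lowest_active (active);
  active.erase (mvccid);
  return lowest_active (active) != before;
}
```

With parent 10, sub 11 and an unrelated 14 active, completing 11 leaves the minimum at 10. Completing the parent itself moves it, which matches the split between complete_sub_mvcc and complete_mvcc described above.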

mvcc_is_mvcc_disabled_class decides participation purely from the class OID:

// mvcc_is_mvcc_disabled_class -- src/transaction/mvcc.c
bool
mvcc_is_mvcc_disabled_class (const OID * class_oid)
{
if (OID_ISNULL (class_oid) || OID_IS_ROOTOID (class_oid))
{
return true; /* root class (the class-of-classes) */
}
if (oid_is_serial (class_oid))
{
return true; /* _db_serial: serial/auto-increment generators */
}
if (oid_check_cached_class_oid (OID_CACHE_COLLATION_CLASS_ID, class_oid))
{
return true; /* _db_collation */
}
if (oid_check_cached_class_oid (OID_CACHE_HA_APPLY_INFO_CLASS_ID, class_oid))
{
return true; /* HA apply-info catalog */
}
return false; /* normal MVCC class */
}

| Branch | Class | Why MVCC is disabled |
|---|---|---|
| OID_ISNULL or OID_IS_ROOTOID | Root class (schema metaclass) | Catalog bootstrap; cannot itself be versioned. |
| oid_is_serial | _db_serial | Generated values must be globally visible at once; a versioned serial would let two txns draw the same value. |
| OID_CACHE_COLLATION_CLASS_ID | _db_collation | Effectively static metadata; in-place is cheaper. |
| OID_CACHE_HA_APPLY_INFO_CLASS_ID | HA apply-info | Replication progress must be observed without snapshot lag. |

What “MVCC disabled” means for a record: its header carries no OR_MVCC_FLAG_VALID_INSID flag, so mvcc_ins_id reads as MVCCID_ALL_VISIBLE, and every visibility/vacuum entry guards on that value. In mvcc_satisfies_snapshot (Ch.6) the first branch short-circuits to SNAPSHOT_SATISFIED (always visible); the perfmon block below skips its ..._LOST accounting via the same != MVCCID_ALL_VISIBLE test; and the vacuum predicate mvcc_satisfies_vacuum asks the identical question through the MVCC_IS_HEADER_INSID_NOT_ALL_VISIBLE macro:

// mvcc_satisfies_snapshot guard -- src/transaction/mvcc.c
if (rec_header->mvcc_ins_id != MVCCID_ALL_VISIBLE && vacuum_is_mvccid_vacuumed (rec_header->mvcc_ins_id))
{
perfmon_mvcc_snapshot (thread_p, PERF_SNAPSHOT_SATISFIES_SNAPSHOT,
PERF_SNAPSHOT_RECORD_INSERTED_COMMITED_LOST, PERF_SNAPSHOT_VISIBLE);
}
// ... condensed: same != MVCCID_ALL_VISIBLE guard recurs in mvcc_satisfies_delete / _dirty ...

Invariant: a disabled-class record is never handed to the active/visible machinery. With mvcc_ins_id == MVCCID_ALL_VISIBLE, those guards read false, so the row is committed-visible and un-vacuumable on the insert side. Callers memoize the verdict per class (heap insert) rather than re-walk it per row.
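How a missing insert-id flag turns into "always visible" can be sketched as a header decode. The helper names are hypothetical and the flag's bit value is an assumption made for illustration; only the rule itself (no valid-insid flag means the insert id reads as MVCCID_ALL_VISIBLE) comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

using MVCCID = std::uint64_t;
constexpr MVCCID kAllVisible = 3;               // mirrors MVCCID_ALL_VISIBLE
constexpr std::uint8_t kFlagValidInsid = 0x1;   // assumed bit value, illustration only

// Insert id as the visibility code would see it.
MVCCID effective_insert_id (std::uint8_t flags, MVCCID stored_insid)
{
  return (flags & kFlagValidInsid) != 0 ? stored_insid : kAllVisible;
}

// Shape of the guard: only a real insert id reaches the snapshot machinery.
bool needs_snapshot_check (std::uint8_t flags, MVCCID stored_insid)
{
  return effective_insert_id (flags, stored_insid) != kAllVisible;
}
```

A header without the flag never asks for a snapshot check, whatever bytes sit where the insert id would be; a header with the flag and a normal id does.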

Cross-check. The function header comment lists “root class and _db_serial, db_partition”, but the code checks collation and HA apply-info, not partition. The comment is stale; the branch table above follows the actual oid_check_cached_class_oid branches.

10.7 Restart seeding — re-anchoring the bit-area

After recovery, the in-memory mvcctable is re-anchored to the MVCCID counter restored into log_Gl.hdr.mvcc_next_id:

// mvcctable::reset_start_mvccid -- src/transaction/mvcc_table.cpp
void
mvcctable::reset_start_mvccid ()
{
m_current_trans_status.m_active_mvccs.reset_start_mvccid (log_Gl.hdr.mvcc_next_id);
assert (m_trans_status_history_position < HISTORY_MAX_SIZE);
m_trans_status_history[m_trans_status_history_position].m_active_mvccs.reset_start_mvccid (log_Gl.hdr.mvcc_next_id);
m_current_status_lowest_active_mvccid.store (log_Gl.hdr.mvcc_next_id);
}

Three places, all seeded from the same restored counter: (1) the current status’s active-set start (m_bit_area_start_mvccid, per-class half below); (2) the current ring slot’s active-set start — the other 2047 slots stay untouched, overwritten lazily as completions cycle the ring (Ch.8); (3) the cached m_current_status_lowest_active_mvccid scalar, set to mvcc_next_id (no active transactions yet).

// mvcc_active_tran::reset_start_mvccid -- src/transaction/mvcc_active_tran.cpp
void
mvcc_active_tran::reset_start_mvccid (MVCCID mvccid)
{
m_bit_area_start_mvccid = mvccid;
if (m_initialized)
{
check_valid (); /* debug: bits past length must be zero */
}
}

Invariant: after restart, the active-set origin equals the next-to-issue MVCCID, with an empty active region. Every id the recovered database issues is >= mvcc_next_id, so the bit-area starts empty and correctly positioned. Called from the boot path in log_manager.c and three points in log_recovery.c (after analysis, after redo, after the final pass); explicitly // not thread safe, running single-threaded before any worker can build a snapshot.
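The re-anchoring can be pictured with a toy active set that tracks one flag per id from a movable origin. `ToyActiveSet` is hypothetical and deliberately narrower than mvcc_active_tran: it has no long-transaction list, and `is_tracked_active` answers only for ids the toy has been told about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using MVCCID = std::uint64_t;

// Toy active set (hypothetical): completed[i] describes id start + i.
struct ToyActiveSet
{
  MVCCID start = 4;
  std::vector<bool> completed;

  // Restart seeding: move the origin while nothing is tracked yet.
  void reset_start (MVCCID next_id)
  {
    assert (completed.empty ());
    start = next_id;
  }

  // Register the next issued id as active; ids must arrive in order.
  void track_new (MVCCID id)
  {
    assert (id == start + completed.size ());
    completed.push_back (false);
  }

  // True only for ids inside the tracked window that have not completed.
  bool is_tracked_active (MVCCID id) const
  {
    if (id < start)
      {
        return false;   // below the origin: predates the restart
      }
    std::size_t pos = static_cast<std::size_t> (id - start);
    return pos < completed.size () && !completed[pos];
  }
};
```

After `reset_start (1000)` the window is empty: 999 and 1000 both read as not active, and 1000 becomes active only once it is issued and tracked.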

  1. 2048-ring saturation under a slow snapshot build. Chapter 5’s build_mvcc_info copies a slot’s m_active_mvccs, then re-checks the captured trans_status_version against m_version.load (), resetting and looping on mismatch. The retry defends against a single concurrent mutation, but whether it is provably safe against a full-ring (HISTORY_MAX_SIZE = 2048) overwrite during one uninterrupted copy_to — where the slot mutates without a distinguishable version change — is not established by code or comments. Not expected in practice, but no explicit bound enforces it.

  2. complete_sub_mvcc informational-field bug (§10.5). The double assignment to m_last_completed_mvccid (second writes the SUBTRAN enum) plus the never-assigned m_event_type look like a copy-paste slip vs. complete_mvcc. Harmless (both // just for info), but a reader trusting the field on a sub-tran slot gets the enum, not an MVCCID.

  3. is_sub_active write path. Copied in mvcc_info::copy_to but never set by the MVCC core read here; its producer lives in the savepoint/system-operation layer. For visibility, sub_ids emptiness is the operative signal.
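The copy-then-recheck loop behind open question 1 can be sketched single-threaded. `ToySlot` and `copy_with_version_check` are hypothetical; the `between` parameter is a test hook that simulates a writer acting between the copy and the re-check. The sketch shows only the single-mutation case the retry is known to handle and does not settle the full-ring question.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Toy ring slot (hypothetical): a payload guarded by a version counter.
struct ToySlot
{
  std::atomic<std::uint64_t> version {0};
  std::vector<std::uint64_t> payload;
};

// Copy the payload, retrying while the version changes under the copy.
// `between` runs after the copy and before the re-check (test hook).
// Returns the number of attempts it took.
int copy_with_version_check (const ToySlot &slot, std::vector<std::uint64_t> &out,
                             const std::function<void ()> &between)
{
  int attempts = 0;
  for (;;)
    {
      ++attempts;
      std::uint64_t seen = slot.version.load ();
      out = slot.payload;
      between ();
      if (slot.version.load () == seen)
        {
          return attempts;
        }
      out.clear ();   // discard the torn copy and loop
    }
}
```

If the hook mutates the slot and bumps the version once, the first attempt is discarded and the second returns the new payload. A mutation that left the version looking unchanged would pass the check, which is exactly the case the open question asks about.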

For the sub-transaction boundary vs. lock acquisition, escalation, and SERIALIZABLE write-skew detection — where MVCC visibility alone is insufficient and locks must close the gap — see the lock-manager detail companion (cubrid-lock-manager-detail.md), chapters on escalation and serializable conflict handling.

  1. Sub-transactions get their own MVCCIDs on a LIFO sub_ids stack. logtb_get_new_subtransaction_mvccid allocates one id (parent stamped) or two atomically via get_two_new_mvccid (parent unstamped), always keeping the parent below every sub-id.
  2. A parent sees its own and its subs’ writes via logtb_is_current_mvccid, which checks id then linear-scans the whole sub_ids vector — not just the top — so an earlier still-open sub’s write also reads back as “mine”.
  3. Sub-completion is a snapshot fix-up, not a vacuum event. logtb_complete_sub_mvcc bumps the snapshot’s highest_completed_mvccid past the sub-id and clears it from the active set; mvcctable::complete_sub_mvcc publishes a SUBTRAN slot but skips advance_oldest_active since an open parent’s sub can never be the oldest.
  4. MVCC-disabled classes carry no insert id. mvcc_is_mvcc_disabled_class returns true for the root class, _db_serial, _db_collation, and HA apply-info; their records read MVCCID_ALL_VISIBLE, short-circuiting every visibility/vacuum guard to “always visible, never reclaimed”.
  5. Restart re-anchors the table from the log header. reset_start_mvccid sets the current-status and current-ring-slot bit-area origins plus the cached lowest-active scalar to log_Gl.hdr.mvcc_next_id, leaving the active region empty — single-threaded during boot and at three recovery checkpoints.
  6. Two latent issues are open questions: a full-ring (2048) overwrite during one in-flight snapshot copy, and the complete_sub_mvcc informational-field double-assignment — neither affects correctness in observed paths.

| Symbol | File | Line |
|---|---|---|
| OR_MVCC_INSERT_ID_OFFSET | src/base/object_representation.h | 483 |
| OR_MVCC_DELETE_ID_OFFSET | src/base/object_representation.h | 486 |
| OR_MVCC_PREV_VERSION_LSA_OFFSET | src/base/object_representation.h | 490 |
| OR_GET_MVCC_FLAG | src/base/object_representation.h | 548 |
| OR_MVCC_MAX_HEADER_SIZE | src/base/object_representation_constants.h | 142 |
| OR_MVCC_MIN_HEADER_SIZE | src/base/object_representation_constants.h | 145 |
| OR_MVCC_FLAG_MASK | src/base/object_representation_constants.h | 160 |
| OR_MVCC_FLAG_VALID_INSID | src/base/object_representation_constants.h | 165 |
| OR_MVCC_FLAG_VALID_DELID | src/base/object_representation_constants.h | 168 |
| OR_MVCC_FLAG_VALID_PREV_VERSION | src/base/object_representation_constants.h | 171 |
| or_mvcc_get_header | src/base/object_representation_sr.c | 4237 |
| or_mvcc_set_header | src/base/object_representation_sr.c | 4296 |
| or_mvcc_add_header | src/base/object_representation_sr.c | 4381 |
| or_mvcc_get_flag | src/base/object_representation_sr.c | 4473 |
| or_mvcc_set_flag | src/base/object_representation_sr.c | 4488 |
| or_mvcc_get_insid | src/base/object_representation_sr.c | 4517 |
| or_mvcc_set_insid | src/base/object_representation_sr.c | 4544 |
| or_mvcc_get_delid | src/base/object_representation_sr.c | 4564 |
| or_mvcc_get_chn | src/base/object_representation_sr.c | 4592 |
| or_mvcc_set_delid | src/base/object_representation_sr.c | 4617 |
| or_mvcc_set_chn | src/base/object_representation_sr.c | 4638 |
| or_mvcc_set_prev_version_lsa | src/base/object_representation_sr.c | 4654 |
| or_mvcc_get_prev_version_lsa | src/base/object_representation_sr.c | 4680 |
| PERF_SNAPSHOT_SATISFIES_SNAPSHOT | src/base/perf_monitor.h | 238 |
| PERF_SNAPSHOT_RECORD_INSERTED_VACUUMED | src/base/perf_monitor.h | 246 |
| PERF_SNAPSHOT_RECORD_INSERTED_COMMITED_LOST | src/base/perf_monitor.h | 250 |
| PERF_SNAPSHOT_RECORD_INSERTED_DELETED | src/base/perf_monitor.h | 252 |
| PERF_SNAPSHOT_RECORD_DELETED_COMMITTED_LOST | src/base/perf_monitor.h | 257 |
| perfmon_mvcc_snapshot | src/base/perf_monitor.h | 1693 |
| mvcc_header_size_lookup | src/object/object_representation.c | 70 |
| vacuum_data_entry | src/query/vacuum.c | 104 |
| vacuum_boot | src/query/vacuum.c | 1291 |
| vacuum_heap_page | src/query/vacuum.c | 1577 |
| vacuum_master_task::execute | src/query/vacuum.c | 3002 |
| vacuum_master_task::is_cursor_entry_ready_to_vacuum | src/query/vacuum.c | 3106 |
| vacuum_process_log_block | src/query/vacuum.c | 3251 |
| is_not_vacuumed_and_lost | src/query/vacuum.c | 7379 |
| vacuum_rv_check_at_undo | src/query/vacuum.c | 7627 |
| vacuum_is_mvccid_vacuumed | src/query/vacuum.h | 271 |
| heap_get_mvcc_header | src/storage/heap_file.c | 7747 |
| heap_attrinfo_transform_header_to_disk | src/storage/heap_file.c | 11937 |
| heap_mvcc_log_insert | src/storage/heap_file.c | 16371 |
| heap_rv_mvcc_redo_insert | src/storage/heap_file.c | 16442 |
| heap_get_mvcc_rec_header_from_overflow | src/storage/heap_file.c | 19541 |
| heap_insert_adjust_recdes_header | src/storage/heap_file.c | 20540 |
| NULL_CHN | src/storage/storage_common.h | 66 |
| MVCCID | src/storage/storage_common.h | 186 |
| MVCCID_NULL | src/storage/storage_common.h | 327 |
| MVCCID_ALL_VISIBLE | src/storage/storage_common.h | 329 |
| MVCCID_FIRST | src/storage/storage_common.h | 330 |
| MVCCID_IS_NORMAL | src/storage/storage_common.h | 335 |
| MVCCID_FORWARD | src/storage/storage_common.h | 343 |
| xlocator_upgrade_instances_domain | src/transaction/locator_sr.c | 12126 |
| log_Gl.mvcc_table | src/transaction/log_impl.h | 707 |
| mvcc_next_id | src/transaction/log_storage.hpp | 131 |
| logtb_expand_trantable | src/transaction/log_tran_table.c | 251 |
| logtb_define_trantable | src/transaction/log_tran_table.c | 366 |
| logtb_get_number_of_total_tran_indices | src/transaction/log_tran_table.c | 696 |
| logtb_rv_assign_mvccid_for_undo_recovery | src/transaction/log_tran_table.c | 1115 |
| logtb_invalidate_snapshot_data | src/transaction/log_tran_table.c | 3861 |
| logtb_find_current_mvccid | src/transaction/log_tran_table.c | 3910 |
| logtb_get_current_mvccid | src/transaction/log_tran_table.c | 3939 |
| logtb_is_current_mvccid | src/transaction/log_tran_table.c | 3972 |
| logtb_get_mvcc_snapshot | src/transaction/log_tran_table.c | 4007 |
| logtb_complete_mvcc | src/transaction/log_tran_table.c | 4050 |
| logtb_get_new_subtransaction_mvccid | src/transaction/log_tran_table.c | 4547 |
| logtb_assign_subtransaction_mvccid | src/transaction/log_tran_table.c | 4578 |
| logtb_complete_sub_mvcc | src/transaction/log_tran_table.c | 4593 |
| log_tdes::lock_global_oldest_visible_mvccid | src/transaction/log_tran_table.c | 6220 |
| log_tdes::unlock_global_oldest_visible_mvccid | src/transaction/log_tran_table.c | 6230 |
| MVCC_IS_REC_INSERTER_ACTIVE | src/transaction/mvcc.c | 46 |
| MVCC_IS_REC_DELETER_ACTIVE | src/transaction/mvcc.c | 49 |
| MVCC_IS_REC_INSERTER_IN_SNAPSHOT | src/transaction/mvcc.c | 52 |
| MVCC_IS_REC_DELETER_IN_SNAPSHOT | src/transaction/mvcc.c | 55 |
| MVCC_IS_REC_INSERTED_SINCE_MVCCID | src/transaction/mvcc.c | 58 |
| MVCC_IS_REC_DELETED_SINCE_MVCCID | src/transaction/mvcc.c | 61 |
| mvcc_is_id_in_snapshot | src/transaction/mvcc.c | 90 |
| mvcc_is_active_id | src/transaction/mvcc.c | 122 |
| mvcc_satisfies_snapshot | src/transaction/mvcc.c | 156 |
| mvcc_is_not_deleted_for_snapshot | src/transaction/mvcc.c | 280 |
| mvcc_satisfies_vacuum | src/transaction/mvcc.c | 321 |
| mvcc_satisfies_delete | src/transaction/mvcc.c | 389 |
| mvcc_satisfies_dirty | src/transaction/mvcc.c | 513 |
| mvcc_is_mvcc_disabled_class | src/transaction/mvcc.c | 628 |
| mvcc_snapshot::copy_to | src/transaction/mvcc.c | 679 |
| mvcc_info::copy_to | src/transaction/mvcc.c | 714 |
| mvcc_rec_header | src/transaction/mvcc.h | 38 |
| MVCC_REC_HEADER_INITIALIZER | src/transaction/mvcc.h | 47 |
| MVCC_IS_HEADER_DELID_VALID | src/transaction/mvcc.h | 87 |
| MVCC_IS_HEADER_INSID_NOT_ALL_VISIBLE | src/transaction/mvcc.h | 91 |
| MVCC_IS_HEADER_ALL_VISIBLE | src/transaction/mvcc.h | 95 |
| MVCC_IS_REC_INSERTED_BY_ME | src/transaction/mvcc.h | 118 |
| MVCC_IS_REC_DELETED_BY_ME | src/transaction/mvcc.h | 122 |
| MVCC_IS_REC_DELETED_BY | src/transaction/mvcc.h | 130 |
| MVCC_ID_PRECEDES | src/transaction/mvcc.h | 141 |
| MVCC_ID_FOLLOW_OR_EQUAL | src/transaction/mvcc.h | 142 |
| MVCC_GET_PREV_VERSION_LSA | src/transaction/mvcc.h | 156 |
| mvcc_satisfies_snapshot_result | src/transaction/mvcc.h | 159 |
| MVCC_SNAPSHOT_FUNC | src/transaction/mvcc.h | 171 |
| mvcc_snapshot | src/transaction/mvcc.h | 173 |
| mvcc_info | src/transaction/mvcc.h | 196 |
| mvcc_satisfies_delete_result | src/transaction/mvcc.h | 222 |
| mvcc_satisfies_vacuum_result | src/transaction/mvcc.h | 232 |
| mvcc_active_tran::mvcc_active_tran | src/transaction/mvcc_active_tran.cpp | 31 |
| mvcc_active_tran::initialize | src/transaction/mvcc_active_tran.cpp | 47 |
| mvcc_active_tran::finalize | src/transaction/mvcc_active_tran.cpp | 62 |
| mvcc_active_tran::reset | src/transaction/mvcc_active_tran.cpp | 74 |
| mvcc_active_tran::long_tran_max_size | src/transaction/mvcc_active_tran.cpp | 99 |
| mvcc_active_tran::bit_size_to_unit_size | src/transaction/mvcc_active_tran.cpp | 105 |
| mvcc_active_tran::units_to_bits | src/transaction/mvcc_active_tran.cpp | 111 |
| mvcc_active_tran::units_to_bytes | src/transaction/mvcc_active_tran.cpp | 117 |
| mvcc_active_tran::get_mask_of | src/transaction/mvcc_active_tran.cpp | 123 |
| mvcc_active_tran::get_bit_offset | src/transaction/mvcc_active_tran.cpp | 129 |
| mvcc_active_tran::get_mvccid | src/transaction/mvcc_active_tran.cpp | 135 |
| mvcc_active_tran::get_unit_of | src/transaction/mvcc_active_tran.cpp | 141 |
| mvcc_active_tran::is_set | src/transaction/mvcc_active_tran.cpp | 147 |
| mvcc_active_tran::get_area_size | src/transaction/mvcc_active_tran.cpp | 153 |
| mvcc_active_tran::get_bit_area_memsize | src/transaction/mvcc_active_tran.cpp | 159 |
| mvcc_active_tran::compute_highest_completed_mvccid | src/transaction/mvcc_active_tran.cpp | 171 |
| mvcc_active_tran::compute_lowest_active_mvccid | src/transaction/mvcc_active_tran.cpp | 220 |
| mvcc_active_tran::copy_to | src/transaction/mvcc_active_tran.cpp | 280 |
| mvcc_active_tran::is_active | src/transaction/mvcc_active_tran.cpp | 318 |
| mvcc_active_tran::remove_long_transaction | src/transaction/mvcc_active_tran.cpp | 356 |
| mvcc_active_tran::add_long_transaction | src/transaction/mvcc_active_tran.cpp | 377 |
| mvcc_active_tran::ltrim_area | src/transaction/mvcc_active_tran.cpp | 386 |
| mvcc_active_tran::set_bitarea_mvccid | src/transaction/mvcc_active_tran.cpp | 414 |
| mvcc_active_tran::cleanup_migrate_to_long_transations | src/transaction/mvcc_active_tran.cpp | 462 |
| mvcc_active_tran::set_inactive_mvccid | src/transaction/mvcc_active_tran.cpp | 493 |
| mvcc_active_tran::reset_start_mvccid | src/transaction/mvcc_active_tran.cpp | 506 |
| mvcc_active_tran::reset_active_transactions | src/transaction/mvcc_active_tran.cpp | 517 |
| mvcc_active_tran::check_valid | src/transaction/mvcc_active_tran.cpp | 525 |
| mvcc_active_tran | src/transaction/mvcc_active_tran.hpp | 31 |
| mvcc_active_tran::unit_type | src/transaction/mvcc_active_tran.hpp | 63 |
| BITAREA_MAX_SIZE | src/transaction/mvcc_active_tran.hpp | 65 |
| mvcc_active_tran::BITAREA_MAX_SIZE | src/transaction/mvcc_active_tran.hpp | 65 |
| UNIT_BIT_COUNT | src/transaction/mvcc_active_tran.hpp | 69 |
| BITAREA_MAX_MEMSIZE | src/transaction/mvcc_active_tran.hpp | 71 |
| BITAREA_MAX_BITS | src/transaction/mvcc_active_tran.hpp | 72 |
| ALL_ACTIVE | src/transaction/mvcc_active_tran.hpp | 74 |
| mvcc_active_tran::ALL_ACTIVE | src/transaction/mvcc_active_tran.hpp | 74 |
| mvcc_active_tran::ALL_COMMITTED | src/transaction/mvcc_active_tran.hpp | 75 |
| mvcc_active_tran::m_bit_area | src/transaction/mvcc_active_tran.hpp | 78 |
| mvcc_active_tran::m_bit_area_start_mvccid | src/transaction/mvcc_active_tran.hpp | 80 |
| mvcc_active_tran::m_bit_area_length | src/transaction/mvcc_active_tran.hpp | 82 |
| mvcc_active_tran::m_long_tran_mvccids | src/transaction/mvcc_active_tran.hpp | 85 |
| mvcc_active_tran::m_long_tran_mvccids_length | src/transaction/mvcc_active_tran.hpp | 87 |
| mvcc_active_tran::m_initialized | src/transaction/mvcc_active_tran.hpp | 89 |
| oldest_active_set | src/transaction/mvcc_table.cpp | 92 |
| oldest_active_get | src/transaction/mvcc_table.cpp | 102 |
| mvcc_trans_status::mvcc_trans_status | src/transaction/mvcc_table.cpp | 116 |
| mvcc_trans_status::initialize | src/transaction/mvcc_table.cpp | 128 |
| mvcc_trans_status::finalize | src/transaction/mvcc_table.cpp | 135 |
| mvcctable::advance_oldest_active | src/transaction/mvcc_table.cpp | 142 |
| mvcctable::mvcctable | src/transaction/mvcc_table.cpp | 164 |
| mvcctable::initialize | src/transaction/mvcc_table.cpp | 184 |
| mvcctable::alloc_transaction_lowest_active | src/transaction/mvcc_table.cpp | 199 |
| mvcctable::finalize | src/transaction/mvcc_table.cpp | 212 |
| mvcctable::build_mvcc_info | src/transaction/mvcc_table.cpp | 226 |
| mvcctable::compute_oldest_visible_mvccid | src/transaction/mvcc_table.cpp | 355 |
| mvcctable::is_active | src/transaction/mvcc_table.cpp | 422 |
| mvcctable::next_trans_status_start | src/transaction/mvcc_table.cpp | 441 |
| mvcctable::next_tran_status_finish | src/transaction/mvcc_table.cpp | 455 |
| mvcctable::complete_mvcc | src/transaction/mvcc_table.cpp | 465 |
| mvcctable::complete_sub_mvcc | src/transaction/mvcc_table.cpp | 541 |
| mvcctable::get_new_mvccid | src/transaction/mvcc_table.cpp | 565 |
| mvcctable::get_two_new_mvccid | src/transaction/mvcc_table.cpp | 579 |
| mvcctable::reset_transaction_lowest_active | src/transaction/mvcc_table.cpp | 593 |
| mvcctable::reset_start_mvccid | src/transaction/mvcc_table.cpp | 599 |
| mvcctable::get_global_oldest_visible | src/transaction/mvcc_table.cpp | 611 |
| mvcctable::update_global_oldest_visible | src/transaction/mvcc_table.cpp | 617 |
| mvcctable::lock_global_oldest_visible | src/transaction/mvcc_table.cpp | 632 |
| mvcctable::unlock_global_oldest_visible | src/transaction/mvcc_table.cpp | 638 |
| mvcctable::is_global_oldest_visible_locked | src/transaction/mvcc_table.cpp | 645 |
| mvcc_trans_status | src/transaction/mvcc_table.hpp | 40 |
| mvcctable | src/transaction/mvcc_table.hpp | 64 |
| HISTORY_MAX_SIZE | src/transaction/mvcc_table.hpp | 97 |
| mvcctable::HISTORY_MAX_SIZE | src/transaction/mvcc_table.hpp | 97 |
| HISTORY_INDEX_MASK | src/transaction/mvcc_table.hpp | 98 |
| mvcctable::m_transaction_lowest_visible_mvccids | src/transaction/mvcc_table.hpp | 101 |
| mvcctable::m_current_status_lowest_active_mvccid | src/transaction/mvcc_table.hpp | 104 |
| mvcctable::m_current_trans_status | src/transaction/mvcc_table.hpp | 107 |
| m_trans_status_history_position | src/transaction/mvcc_table.hpp | 110 |
| mvcctable::m_trans_status_history_position | src/transaction/mvcc_table.hpp | 110 |
| mvcctable::m_trans_status_history | src/transaction/mvcc_table.hpp | 111 |
| mvcctable::m_oldest_visible | src/transaction/mvcc_table.hpp | 118 |
| mvcctable::m_ov_lock_count | src/transaction/mvcc_table.hpp | 119 |

  • cubrid-mvcc.md — the high-level companion (design intent, theory).
  • Raw analyses under raw/code-analysis/cubrid/storage/mvcc/.
  • Code: src/transaction/mvcc.{h,c}, mvcc_table.{hpp,cpp}, mvcc_active_tran.{hpp,cpp}; MVCC record headers in src/storage/heap_file.c; vacuum coordination in src/query/vacuum.c.
  • Methodology: knowledge/methodology/code-analysis-detail-doc.md.