CUBRID Heap Manager — Code-Level Deep Dive
Where this document fits: The high-level analysis cubrid-heap-manager.md covers design intent and theoretical background. This document traces every branch and field at the code level. Each chapter is self-contained, but reading in order follows the full lifecycle of a single heap record inside the kernel.
Chapter 1: Data Structure Map
The field-by-field reference the rest of the document leans on. MVCC visibility,
slotted-page theory, and the forwarding/overflow rationale are not re-derived
here — see the high-level companion (cubrid-heap-manager.md). Three layers,
stitched by the record OID: the page layer (slotted_page.h), the heap-file layer
(heap_file.c), and the transient operation/read bundles (heap_file.h).
1.1 The relationship map
```mermaid
graph TB
  subgraph disk["On-disk heap page"]
    HDR["SPAGE_HEADER"]
    SLOT0["SPAGE_SLOT[0]"]
    SLOTN["SPAGE_SLOT[1..n]"]
    HDR --> SLOT0
    HDR --> SLOTN
  end
  STATS["HEAP_HDR_STATS<br/>header page slot 0"]
  CHAIN["HEAP_CHAIN<br/>data page slot 0"]
  HFID["HFID<br/>vfid + hpgid"] -->|hpgid| STATS
  STATS -->|next_vpid| CHAIN
  CHAIN -->|prev/next_vpid| CHAIN
  subgraph mem["In-memory bundles"]
    OPCTX["HEAP_OPERATION_CONTEXT<br/>4x PGBUF_WATCHER"]
    GETCTX["HEAP_GET_CONTEXT<br/>2x PGBUF_WATCHER"]
    SCAN["HEAP_SCANCACHE<br/>snapshot / page_latch"]
    NODE["HEAP_SCANCACHE_NODE"]
    SCAN --> NODE
    OPCTX -->|scan_cache_p| SCAN
    GETCTX -->|scan_cache| SCAN
  end
  OPCTX -->|hfid| HFID
  NODE -->|hfid| HFID
```
Figure 1-1. Disk structures (top), the heap-file spine (HFID to header-page
HEAP_HDR_STATS to the HEAP_CHAIN doubly-linked list), and the in-memory
bundles that latch into those pages during one operation.
Invariant: `HEAP_HEADER_AND_CHAIN_SLOTID == 0`. Slot 0 of every heap page is reserved metadata: `HEAP_HDR_STATS` on the header page, `HEAP_CHAIN` on every other page. This is enforced twice: slot 0 is allocated at page-init so a normal `spage_insert` never returns it, and `HEAP_ISJUNK_OID` rejects any OID with `slotid == 0`. A user record in slot 0 would make the chain walk read record bytes as a chain/stats struct, corrupting the heap.
```c
// HEAP_HEADER_AND_CHAIN_SLOTID -- src/storage/heap_file.h
#define HEAP_HEADER_AND_CHAIN_SLOTID 0  /* Slot for chain and header */
```

```c
// HEAP_ISJUNK_OID -- src/storage/heap_file.h
#define HEAP_ISJUNK_OID(oid) \
  ((oid)->slotid == HEAP_HEADER_AND_CHAIN_SLOTID \
   || (oid)->slotid < 0 || (oid)->volid < 0 || (oid)->pageid < 0)
```

1.2 Page layer — SPAGE_HEADER and SPAGE_SLOT
A fixed header, record bytes growing forward from just past it, and the slot array growing backward from the page end. Mechanics are Chapter 2.
```c
// spage_header -- src/storage/slotted_page.h (comments in table)
struct spage_header
{
  PGNSLOTS num_slots, num_records;
  INT16 anchor_type;                    /* ANCHORED / ANCHORED_DONT_REUSE_SLOTS / UNANCHORED_* */
  unsigned short alignment;             /* char, short, int, double */
  int total_free, cont_free, offset_to_free_area;
  int reserved1;
  int flags;                            /* always SPAGE_HEADER_FLAG_NONE */
  unsigned int is_saving:1;             /* save-for-undo */
  unsigned int need_update_best_hint:1; /* best-hint refresh */
  unsigned int reserved_bits:30;
};
```

| Field | Role / why it exists |
|---|---|
| num_slots, num_records | Slot-array length and live count; slots persist after delete, so num_slots >= num_records. |
| anchor_type | ANCHORED/ANCHORED_DONT_REUSE_SLOTS/UNANCHORED_*. Heap uses anchored → stable slot ids → OIDs never move. |
| alignment | Record alignment in bytes (heap uses INT_ALIGNMENT). |
| total_free | All free bytes; tested vs record+slot size to see if it fits after compaction. |
| cont_free | Bytes in the single contiguous gap; if short but total_free suffices, page is compacted first. |
| offset_to_free_area | Bump pointer where a new record is written. |
| reserved1 | Reserved int; keeps header 8-byte aligned. |
| flags | Page flags; only SPAGE_HEADER_FLAG_NONE used. |
| is_saving | 1 bit: save-space-for-undo (Ch 10). |
| need_update_best_hint | 1 bit: best hint stale, refresh estimates (Ch 9). |
| reserved_bits | 30-bit padding; pins the bitfield-word layout. |
```c
// spage_slot -- src/storage/slotted_page.h (4-byte disk slot)
struct spage_slot
{
  unsigned int offset_to_record:14; /* Byte offset to start of record */
  unsigned int record_length:14;    /* Length of record */
  unsigned int record_type:4;       /* REC_HOME, REC_NEWHOME, ... */
};
```

| Field | Role / why it exists |
|---|---|
| offset_to_record (14b) | Offset to record bytes; the indirection making slot ids stable (compaction rewrites only this). Caps at 16383. |
| record_length (14b) | Record byte length; bounds copy/peek. Caps an on-page record at 16383 bytes (larger → overflow). |
| record_type (4b) | REC_HOME/NEWHOME/RELOCATION/BIGONE/DELETED_WILL_REUSE/MARKDELETED/ASSIGN_ADDRESS/…; the branch key for every flow (Ch 3). |
The whole slot is exactly 4 bytes (14+14+4 = 32 bits).
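The 14/14/4 packing can be checked with a small stand-alone sketch. Everything named `demo_*` below is hypothetical and only mirrors the field widths quoted above; it is not CUBRID code. Bit-field layout is implementation-defined in C, so the 4-byte size is an assumption that holds on the common GCC/Clang ABIs rather than a language guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SPAGE_SLOT: same 14/14/4 split as the source. */
struct demo_slot
{
  unsigned int offset_to_record:14;
  unsigned int record_length:14;
  unsigned int record_type:4;
};

/* Largest value a 14-bit field can hold: 2^14 - 1 = 16383. */
enum { DEMO_SLOT_FIELD_MAX = (1 << 14) - 1 };

/* Returns 1 when a record of `length` bytes at `offset` fits the slot fields. */
static int
demo_slot_can_describe (unsigned int offset, unsigned int length)
{
  return offset <= DEMO_SLOT_FIELD_MAX && length <= DEMO_SLOT_FIELD_MAX;
}

/* Fills a slot; the caller is expected to have checked demo_slot_can_describe (). */
static struct demo_slot
demo_slot_make (unsigned int offset, unsigned int length, unsigned int type)
{
  struct demo_slot s;

  s.offset_to_record = offset;
  s.record_length = length;
  s.record_type = type & 0xF;   /* keep only the 4 bits the field can store */
  return s;
}
```

A record longer than 16383 bytes fails `demo_slot_can_describe`, which is the situation the table describes as going to overflow.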
1.3 Heap-file layer — HFID, HEAP_HDR_STATS, HEAP_CHAIN
HFID is the heap’s name: a file id plus the header page’s id.

```c
// hfid -- src/storage/storage_common.h
struct hfid
{
  VFID vfid;    /* Volume and file identifier */
  INT32 hpgid;  /* First page identifier (the header page) */
};
```

| Field | Role / why it exists |
|---|---|
| vfid | (volid, fileid) of the heap’s file; used for every page alloc/dealloc. |
| hpgid | Page id of the header page, the heap entry point; with vfid.volid gives the header VPID that next_vpid walks from. |
HEAP_HDR_STATS is the heap’s global control block, in slot 0 of the header page.
Its nested estimates block is the best-space hint cache (Chapter 9), not logged.
```c
// heap_hdr_stats -- src/storage/heap_file.c (comments elided; see table)
struct heap_hdr_stats
{
  OID class_oid;                /* the first field MUST be class_oid */
  VFID ovf_vfid;
  VPID next_vpid;               /* the 2nd page of the heap file */
  int unfill_space;
  struct
  {
    int num_pages, num_recs;
    float recs_sumlen;
    int num_other_high_best, num_high_best, num_substitutions;
    int num_second_best, head_second_best, tail_second_best, head;
    VPID last_vpid;             /* todo: move out of estimates */
    VPID full_search_vpid;
    VPID second_best[HEAP_NUM_BEST_SPACESTATS];         /* 10 hints */
    HEAP_BESTSPACE best[HEAP_NUM_BEST_SPACESTATS];      /* 10 hints */
  } estimates;
  int reserve0_for_future, reserve1_for_future, reserve2_for_future;
};
```

| Field | Role / why it exists |
|---|---|
| class_oid | OID of the stored class. Must be first: slot 0’s leading OID is the class id, shared with HEAP_CHAIN for validation. |
| ovf_vfid | Overflow file id (holds REC_BIGONE bodies); null until first overflow. |
| next_vpid | VPID of the 2nd page, the head of the page chain. |
| unfill_space | Free-space floor for inserts; headroom so updates grow in place. |
| estimates.num_pages, num_recs, recs_sumlen | Estimated page/record count and total record bytes; recs_sumlen derives average length. |
| estimates.num_other_high_best | Good pages not in best[]; triggers fuller search before growing the file. |
| estimates.num_high_best | best[] entries >= HEAP_DROP_FREE_SPACE; at zero, rescans. |
| estimates.num_substitutions | Substitution count; feeds second-best promotion. |
| estimates.num_second_best, head_second_best, tail_second_best | Count, read index (oldest), write index of the second_best[] ring. |
| estimates.head | Head index of best[] ring; where alloc/scan starts. |
| estimates.last_vpid | Last/append page; todo to relocate out of estimates. |
| estimates.full_search_vpid | Resume point for an incremental scan across calls. |
| estimates.second_best[10] | Ring of 10 decent VPIDs; used when best[] runs dry. |
| estimates.best[10] | Ring of 10 HEAP_BESTSPACE; primary hint set. HEAP_NUM_BEST_SPACESTATS == 10. |
| reserve0/1/2_for_future | Padding ints; reserved. |
Invariant: `estimates` is a hint, never the truth. Changes are not logged (“only used for hints,” “may not be accurate,” “may contain duplicated pages”). Consumers re-validate a candidate’s real `total_free` before use; trusting `best[]` blindly could target a full page (Chapter 9).
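The "re-validate before use" rule can be illustrated with a minimal sketch. The `demo_*` names, the callback that stands in for fixing a page and reading its real free space, and the toy page table are all assumptions made for illustration; the real best-space logic is the subject of Chapter 9.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_BEST 10        /* mirrors HEAP_NUM_BEST_SPACESTATS == 10 */

/* Hypothetical hint entry: a page id plus an estimated free size. */
struct demo_hint
{
  int pageid;                   /* -1 marks an unused entry */
  int freespace;                /* estimate only; may be stale */
};

/* Hypothetical callback standing in for "fix the page and read total_free". */
typedef int (*demo_real_free_fn) (int pageid);

/* Returns the first hinted page whose REAL free space fits `needed`,
 * correcting stale estimates on the way; -1 if no hint survives. */
static int
demo_pick_page (struct demo_hint *best, int needed, demo_real_free_fn real_free)
{
  for (int i = 0; i < DEMO_NUM_BEST; i++)
    {
      if (best[i].pageid < 0 || best[i].freespace < needed)
        {
          continue;             /* unused entry, or the estimate already rules it out */
        }
      int actual = real_free (best[i].pageid);
      best[i].freespace = actual;       /* refresh the hint with the truth */
      if (actual >= needed)
        {
          return best[i].pageid;
        }
    }
  return -1;
}

/* Toy page table for the demonstration: only page 9 really has room. */
static int
demo_real_free (int pageid)
{
  return pageid == 9 ? 500 : 0;
}
```

With a stale entry claiming 900 free bytes on page 7, the sketch skips it after the real check and settles on page 9, which is the behavior the invariant asks of any consumer of `best[]`.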
HEAP_CHAIN lives in slot 0 of every non-header page and links the data pages into a doubly-linked list.

```c
// heap_chain -- src/storage/heap_file.c (Double-linked)
struct heap_chain
{
  OID class_oid;        /* the first must be class_oid */
  VPID prev_vpid;       /* Previous page */
  VPID next_vpid;       /* Next page */
  MVCCID max_mvccid;    /* Max MVCCID of any MVCC operation on this page */
  INT32 flags;          /* 2 high bits encode vacuum state */
};
```

| Field | Role / why it exists |
|---|---|
| class_oid | Class OID (same leading-field convention); page-validate without distinguishing header vs data page. |
| prev_vpid | Previous page VPID; backward traversal and chain repair. |
| next_vpid | Next page VPID; forward traversal, which is what the spine HEAP_HDR_STATS.next_vpid feeds. |
| max_mvccid | Largest MVCCID of any op here. Vacuum predictor: when it precedes vacuum’s oldest MVCCID, the page is fully vacuumed. Init MVCCID_NULL (Ch 8). |
| flags | INT32; top 2 bits = vacuum status (HEAP_PAGE_VACUUM_NONE/ONCE/UNKNOWN, mask 0xC0000000). Rest reserved. |
Vacuum status is packed into the two high bits of flags, accessed only through
HEAP_PAGE_SET_VACUUM_STATUS / HEAP_PAGE_GET_VACUUM_STATUS:
```c
// HEAP_PAGE_FLAG_VACUUM_STATUS_MASK -- src/storage/heap_file.c
#define HEAP_PAGE_FLAG_VACUUM_STATUS_MASK 0xC0000000
#define HEAP_PAGE_FLAG_VACUUM_ONCE        0x80000000
#define HEAP_PAGE_FLAG_VACUUM_UNKNOWN     0x40000000
// status == NONE => both bits clear /* <- the HEAP_PAGE_VACUUM_NONE encoding */
```

Invariant: `max_mvccid` is monotonically non-decreasing per page. Every MVCC op does `if (MVCC_ID_PRECEDES (chain->max_mvccid, mvccid)) chain->max_mvccid = mvccid;` (debug-asserted). If it moved backward, vacuum could deallocate a page still holding pending versions.
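To make the bit packing and the monotonic rule concrete, here is a minimal sketch. The mask values are the ones quoted above; the `demo_*` functions, the enum, the unsigned `flags` type, and the plain `<` comparison standing in for `MVCC_ID_PRECEDES` are assumptions of this sketch, not the bodies of the real accessor macros.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_VACUUM_STATUS_MASK 0xC0000000u     /* same values as the source */
#define DEMO_VACUUM_ONCE        0x80000000u
#define DEMO_VACUUM_UNKNOWN     0x40000000u

enum demo_vacuum_status { DEMO_NONE, DEMO_ONCE, DEMO_UNKNOWN };

/* Hypothetical reduced chain record: only the two fields discussed here. */
struct demo_chain
{
  uint64_t max_mvccid;
  uint32_t flags;       /* source uses INT32; unsigned keeps the bit ops well-defined */
};

/* Clears both status bits, then sets the one matching `st` (none for DEMO_NONE). */
static void
demo_set_vacuum_status (struct demo_chain *chain, enum demo_vacuum_status st)
{
  chain->flags &= ~DEMO_VACUUM_STATUS_MASK;
  if (st == DEMO_ONCE)
    {
      chain->flags |= DEMO_VACUUM_ONCE;
    }
  else if (st == DEMO_UNKNOWN)
    {
      chain->flags |= DEMO_VACUUM_UNKNOWN;
    }
}

/* Decodes the two high bits; any pattern other than NONE/ONCE is reported
 * as UNKNOWN in this sketch. */
static enum demo_vacuum_status
demo_get_vacuum_status (const struct demo_chain *chain)
{
  uint32_t bits = chain->flags & DEMO_VACUUM_STATUS_MASK;

  if (bits == 0)
    {
      return DEMO_NONE;
    }
  return bits == DEMO_VACUUM_ONCE ? DEMO_ONCE : DEMO_UNKNOWN;
}

/* Monotonic update: max_mvccid only ever moves forward. */
static void
demo_note_mvcc_op (struct demo_chain *chain, uint64_t mvccid)
{
  if (chain->max_mvccid < mvccid)
    {
      chain->max_mvccid = mvccid;
    }
}
```

The low 30 bits are left untouched by the setter, which matches the table's note that the rest of `flags` is reserved.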
1.4 Operation bundle — HEAP_OPERATION_CONTEXT and its enums
The single argument threaded through every write: type and update_in_place
drive dispatch; up to four watchers, a home-record stack buffer, and the output OID
ride along.
```c
// HEAP_OPERATION_TYPE -- src/storage/heap_file.h
typedef enum
{
  HEAP_OPERATION_NONE = 0,
  HEAP_OPERATION_INSERT,
  HEAP_OPERATION_DELETE,
  HEAP_OPERATION_UPDATE
} HEAP_OPERATION_TYPE;
```

```c
// update_inplace_style -- src/storage/heap_file.h
enum update_inplace_style
{
  UPDATE_INPLACE_NONE = 0,              /* None */
  UPDATE_INPLACE_CURRENT_MVCCID = 1,    /* non-MVCC in-place update with current MVCC ID */
  UPDATE_INPLACE_OLD_MVCCID = 2         /* non-MVCC in-place update, preserves old MVCC ID */
};
typedef enum update_inplace_style UPDATE_INPLACE_STYLE;
#define HEAP_IS_UPDATE_INPLACE(update_inplace_style) \
  ((update_inplace_style) != UPDATE_INPLACE_NONE)
```

HEAP_OPERATION_TYPE tags the write path. UPDATE_INPLACE_STYLE is orthogonal: an
MVCC update runs physically in place yet maps to UPDATE_INPLACE_NONE (source:
“mvcc update is also executed inplace, but coresponds to UPDATE_INPLACE_NONE”).
So UPDATE_INPLACE_NONE means a new logical version, while HEAP_IS_UPDATE_INPLACE
(styles 1/2) marks a true non-MVCC overwrite. Chapter 6 walks the branches.
```c
// heap_operation_context -- src/storage/heap_file.h (condensed)
struct heap_operation_context
{
  HEAP_OPERATION_TYPE type;
  UPDATE_INPLACE_STYLE update_in_place;
  HFID hfid;
  OID oid;
  OID class_oid;
  RECDES *recdes_p;
  HEAP_SCANCACHE *scan_cache_p;
  RECDES map_recdes;
  OID ovf_oid;                          /* overflow transient */
  RECDES home_recdes;
  char home_recdes_buffer[IO_MAX_PAGE_SIZE + MAX_ALIGNMENT];
  INT16 record_type;
  FILE_TYPE file_type;
  PGBUF_WATCHER home_page_watcher, overflow_page_watcher, header_page_watcher, forward_page_watcher;
  PGBUF_WATCHER *home_page_watcher_p, *overflow_page_watcher_p,
    *header_page_watcher_p, *forward_page_watcher_p;    /* the handles */
  OID res_oid;
  bool is_logical_old;                  /* logical output */
  bool is_redistribute_insert_with_delid;
  bool is_bulk_op;
  bool use_bulk_logging;
  bool do_supplemental_log;
  LOG_LSA supp_undo_lsa, supp_redo_lsa;
  PERF_UTIME_TRACKER *time_track;       /* perf stat dump */
};
```

| Field | Role / why it exists |
|---|---|
| type | Which write op (INSERT/DELETE/UPDATE/NONE); drives dispatch. |
| update_in_place | Update style; mutate vs new-version relocation (Ch 6). |
| hfid | Target heap; names file/header page for alloc + best-space. |
| oid | Input OID (delete/update target; ignored for insert), i.e. the home address. |
| class_oid | Class; locking, MVCC header build, index maintenance. |
| recdes_p | Caller’s record descriptor, i.e. the new bytes. |
| scan_cache_p | Optional reuse of latched pages + snapshot. |
| map_recdes | Map record built during overflow insert (points at the overflow object). |
| ovf_oid | Overflow object location; set for REC_BIGONE. |
| home_recdes | Descriptor for the fetched home record (read before mutation). |
| home_recdes_buffer | Inline stack buffer backing home_recdes; avoids heap alloc. |
| record_type | Type of the original record before mutation. |
| file_type | FILE_HEAP/FILE_HEAP_REUSE_SLOTS; slot reuse (Ch 7). |
| home/overflow/header/forward_page_watcher | Four embedded PGBUF_WATCHER storage slots (overflow → REC_BIGONE, header → best-space update, forward → REC_RELOCATION/REC_NEWHOME). |
| `*_watcher_p` (×4) | Pointers to the four watchers (the handles); null = page not involved. |
| res_oid | Output OID; for insert, the assigned address. |
| is_logical_old | Output: initial record was not REC_ASSIGN_ADDRESS; for logging. |
| is_redistribute_insert_with_delid | Insert from a partition redistribute carrying a valid delid. |
| is_bulk_op, use_bulk_logging, do_supplemental_log, supp_undo_lsa, supp_redo_lsa | Logging/bulk control: bulk-insert flag (also disables MVCC ops), bulk log path, supplemental-log enable, and the supplemental undo/redo image LSAs. |
| time_track | PERF_UTIME_TRACKER *; perf-stat dump. |
Invariant: pages are touched only through the `*_watcher_p` pointers. The four embedded `PGBUF_WATCHER`s are storage; the `*_watcher_p` pointers are the handles (“should not be referenced directly”). Code points one at its embedded watcher to latch that page, NULL otherwise; cleanup unfixes exactly the non-NULL ones. Touching an embedded watcher directly risks a double-unfix or leaked latch.
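The storage-versus-handle split can be sketched in a few lines. The types and functions below (`demo_watcher`, `demo_op_context`, `demo_attach_home`, `demo_context_cleanup`) are hypothetical reductions with two watchers instead of four; they only model the pattern the invariant describes, not the page-buffer API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PGBUF_WATCHER: just tracks whether a page is fixed. */
struct demo_watcher
{
  int fixed;
};

struct demo_op_context
{
  struct demo_watcher home_page_watcher;        /* storage only */
  struct demo_watcher forward_page_watcher;     /* storage only */
  struct demo_watcher *home_page_watcher_p;     /* handle; NULL = page not involved */
  struct demo_watcher *forward_page_watcher_p;  /* handle; NULL = page not involved */
};

static void
demo_context_init (struct demo_op_context *ctx)
{
  ctx->home_page_watcher.fixed = 0;
  ctx->forward_page_watcher.fixed = 0;
  ctx->home_page_watcher_p = NULL;
  ctx->forward_page_watcher_p = NULL;
}

/* Points the handle at its embedded storage and "fixes" the page. */
static void
demo_attach_home (struct demo_op_context *ctx)
{
  ctx->home_page_watcher_p = &ctx->home_page_watcher;
  assert (!ctx->home_page_watcher_p->fixed);
  ctx->home_page_watcher_p->fixed = 1;
}

/* Unfixes exactly the pages whose handle is non-NULL; returns how many. */
static int
demo_context_cleanup (struct demo_op_context *ctx)
{
  struct demo_watcher **handles[] = { &ctx->home_page_watcher_p, &ctx->forward_page_watcher_p };
  int unfixed = 0;

  for (size_t i = 0; i < sizeof handles / sizeof handles[0]; i++)
    {
      if (*handles[i] != NULL)
        {
          (*handles[i])->fixed = 0;
          *handles[i] = NULL;   /* a second cleanup becomes a no-op */
          unfixed++;
        }
    }
  return unfixed;
}
```

Because cleanup keys off the handles and clears them, calling it twice cannot unfix the same page twice, which is the failure the invariant warns about.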
1.5 Read bundle — HEAP_GET_CONTEXT
The read-side counterpart. It needs only two watchers, home and forward, since a read never touches the header or overflow page the way a write does.
```c
// heap_get_context -- src/storage/heap_file.h (comments in table)
struct heap_get_context
{
  INT16 record_type;
  const OID *oid_p;
  OID forward_oid;              /* of REC_RELOCATION or REC_BIGONE */
  OID *class_oid_p;
  RECDES *recdes_p;
  HEAP_SCANCACHE *scan_cache;
  PGBUF_WATCHER home_page_watcher, fwd_page_watcher;
  bool ispeeking;               /* PEEK or COPY */
  int old_chn;
  PGBUF_LATCH_MODE latch_mode;
};
```

| Field | Role / why it exists |
|---|---|
| record_type | Type at oid_p; branch key for forwarding (Ch 5): REC_HOME in place, REC_RELOCATION/REC_BIGONE chase forward_oid. |
| oid_p | Requested OID (input, const); where the read starts. |
| forward_oid | OID the home slot forwards to; second hop, filled when record_type demands. |
| class_oid_p | Class OID (in/out); needed for snapshot/CHN. |
| recdes_p | Where bytes are returned; into the page (PEEK) or a copy buffer (COPY). |
| scan_cache | Governing HEAP_SCANCACHE; supplies mvcc_snapshot. |
| home_page_watcher, fwd_page_watcher | Home and forward page watchers; forward latched on the second hop, null until forwarding. |
| ispeeking | PEEK (zero-copy, holds latch) vs COPY (frees latch sooner). |
| old_chn | Caller’s cached CHN; matching the record’s CHN skips the copy (see companion). |
| latch_mode | READ normally, WRITE for e.g. serial increment. |
1.6 Scan bundle — HEAP_SCANCACHE and HEAP_SCANCACHE_NODE
The longest-lived bundle: it survives many get/next calls, caching a fixed page,
the snapshot, and the latch/lock policy. It embeds one HEAP_SCANCACHE_NODE for the
current heap plus a list of more for partitioned scans.
```c
// heap_scancache_node -- src/storage/heap_file.h
struct heap_scancache_node
{
  HFID hfid;                    /* Heap file of scan */
  OID class_oid;                /* Class oid of scanned instances */
  const char *classname;
};
```

```cpp
// heap_scancache -- src/storage/heap_file.h (condensed C++ class; comments in table)
struct heap_scancache
{
  int debug_initpattern;
  HEAP_SCANCACHE_NODE node;
  LOCK page_latch;              /* NULL_LOCK to skip per-page lock */
  bool cache_last_fix_page, mvcc_disabled_class;
  PGBUF_WATCHER page_watcher;
  int num_btids;
  multi_index_unique_stats *m_index_stats;
  FILE_TYPE file_type;
  MVCC_SNAPSHOT *mvcc_snapshot;
  HEAP_SCANCACHE_NODE_LIST *partition_list;
private:
  cubmem::single_block_allocator *m_area;       /* the one private member */
};
```

| Field | Role / why it exists |
|---|---|
| node.hfid | Heap being scanned; the file whose pages the scan walks. |
| node.class_oid | Class of scanned instances; locking and visibility. |
| node.classname | Cached class name; logging/diagnostics, avoids re-lookup per record. |
| debug_initpattern | Init sentinel; catches use of an uninitialized scancache in debug builds. |
| node | Current HEAP_SCANCACHE_NODE; set via HEAP_SCANCACHE_SET_NODE. |
| page_latch | LOCK for heap pages, or NULL_LOCK when the class is already locked S/SIX/X. |
| cache_last_fix_page | Keep last fixed page + area memory; avoids re-fixing when many records sit on one page. |
| mvcc_disabled_class | Class is non-MVCC; skips snapshot visibility for catalog/non-MVCC classes. |
| page_watcher | Watcher holding the cached fixed page; the handle for cache_last_fix_page. |
| num_btids | Index count on the class; sizes index-maintenance work for scan-driven updates. |
| m_index_stats | Per-index unique stats (a source comment questions if it belongs here). |
| file_type | FILE_HEAP/FILE_HEAP_REUSE_SLOTS; same slot-reuse decision, available to the scan. |
| mvcc_snapshot | Governing MVCC snapshot; single source of visibility, passed to the get context. |
| partition_list | List of HEAP_SCANCACHE_NODEs for sub-heaps; a partitioned scan crosses several heaps. |
| (private) m_area | The cubmem::single_block_allocator * backing the COPY-area methods. |
Invariant: a non-NULL `page_latch` is required unless the class lock already covers the page. It “may be `NULL_LOCK` when it is secure to skip lock on heap pages”, i.e. when the class is held S/SIX/X. Leaving `NULL_LOCK` without that covering lock lets two transactions touch the same page without serialization.
1.7 HEAP_BESTSPACE
The unit the best-space machinery (Chapter 9) trades in. Each estimates.best[]
entry is one, and the global best-space hash caches them too.
```c
// heap_bestspace -- src/storage/heap_file.h
struct heap_bestspace
{
  VPID vpid;            /* Vpid of one of the best pages */
  int freespace;        /* Estimated free space in this page */
};
```

| Field | Role / why it exists |
|---|---|
| vpid | (volid, pageid) of a good-free-space page; the page an insert tries first. |
| freespace | Estimated free bytes; ranks candidates, re-validated against real SPAGE_HEADER.total_free before use (per §1.3). |
1.8 Chapter summary — key takeaways
- Three layers, one OID thread. Slotted page stores bytes; the heap-file layer (`HFID` → `HEAP_HDR_STATS` → chained `HEAP_CHAIN`s) gives identity and order; the bundles carry latches and policy through one call.
- Slot 0 is sacred. `HEAP_HEADER_AND_CHAIN_SLOTID == 0` reserves it for `HEAP_HDR_STATS` (header) or `HEAP_CHAIN` (data); `HEAP_ISJUNK_OID` keeps user OIDs out.
- `SPAGE_SLOT` is a 4-byte 14/14/4 bitfield. `offset_to_record` is the stable-address indirection, `record_length` bounds the span, 4-bit `record_type` is the branch key every flow switches on.
- `HEAP_HDR_STATS.estimates` and `HEAP_BESTSPACE` are advisory and unlogged. Best/second-best rings (10 each) are hints, re-validated against real free space.
- `HEAP_CHAIN` packs vacuum state into `flags`. Top two bits encode `VACUUM_NONE/ONCE/UNKNOWN`; `max_mvccid` (monotonic) predicts a clean page.
- `HEAP_OPERATION_CONTEXT` owns four watchers via indirection. `*_watcher_p` pointers are the only legal handles; `type` + `update_in_place` select the path.
- `HEAP_GET_CONTEXT` is the lighter read twin (home + forward only), and `HEAP_SCANCACHE` (with `HEAP_SCANCACHE_NODE`) holds the snapshot, cached page, and `page_latch` policy across many calls.
Chapter 2: Slotted Page Primitives and Page Initialization
Every heap record (REC_HOME, REC_NEWHOME, REC_BIGONE, REC_RELOCATION) lives inside a slotted page. The heap layer never touches payload directly; it asks slotted_page.c to carve out, resize, and reclaim variable-length areas, and gets back a stable slot id. This chapter dissects that substrate: layout, the free-space invariant, the four anchor types, and the slot primitives Chapters 4–8 call down into. The high-level companion (cubrid-heap-manager.md) explains why CUBRID splits a stable OID from a moving byte offset; here we trace how the slot id stays fixed while the offset moves.
2.1 The page in memory
A slotted page is a fixed buffer (SPAGE_DB_PAGESIZE) with the SPAGE_HEADER at offset 0. Record payloads start just past the header and grow toward higher offsets, the slot array starts at the page end and grows toward lower offsets, and the gap between them is the contiguous free area. SPAGE_SLOT is the 4-byte indirection unit, three packed bit-fields:

```c
// spage_slot -- src/storage/slotted_page.h
struct spage_slot
{
  unsigned int offset_to_record:14; /* byte offset of record start */
  unsigned int record_length:14;    /* current record length */
  unsigned int record_type:4;       /* REC_HOME, REC_NEWHOME, ... */
};
```

The 14-bit fields cap a page at 16 KB, CUBRID’s max page size. record_type is what Chapter 3 reads to dispatch interpretation. Slot N is found by counting backward from the last 4 bytes (spage_find_slot: slot_p = page_p + SPAGE_DB_PAGESIZE - sizeof(SPAGE_SLOT); slot_p -= slot_id;). This geometry lets the slot array (laid out slotN..slot0, ending at the page end) and the record area meet at the free gap, whose first byte is offset_to_free_area.
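The backward slot addressing is easy to get wrong by one, so here is the arithmetic as a small sketch. The page size, the `demo_*` helper names, and working in byte offsets rather than pointers are assumptions for illustration; the formula itself is the one quoted from `spage_find_slot`.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 16384    /* assumed page size (the 16 KB maximum) */
#define DEMO_SLOT_SIZE 4        /* sizeof (SPAGE_SLOT) per the text */

/* Byte offset of slot `slot_id` within the page: slot 0 occupies the last
 * DEMO_SLOT_SIZE bytes, slot 1 the four bytes before it, and so on. */
static size_t
demo_slot_offset (int slot_id)
{
  return DEMO_PAGE_SIZE - DEMO_SLOT_SIZE - (size_t) slot_id * DEMO_SLOT_SIZE;
}

/* First byte of the slot array when `num_slots` slots exist. */
static size_t
demo_slot_array_start (int num_slots)
{
  return DEMO_PAGE_SIZE - (size_t) num_slots * DEMO_SLOT_SIZE;
}

/* Size of the gap between the record write cursor and the slot array. */
static size_t
demo_gap_bytes (size_t offset_to_free_area, int num_slots)
{
  return demo_slot_array_start (num_slots) - offset_to_free_area;
}
```

Adding a slot lowers `demo_slot_array_start` by four bytes while writing a record raises `offset_to_free_area`, so the two regions close in on the gap from opposite sides.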
The SPAGE_HEADER carries the bookkeeping this chapter manipulates: num_slots (array length / iteration bound) and num_records (live count, so num_slots - num_records is the reuse pool); anchor_type (§2.3) and alignment; the three free-space counters below; and is_saving, arming the undo-reserve of §2.7.
INVARIANT (header consistency). At every primitive’s entry/exit: `total_free >= 0`, `0 <= cont_free <= total_free`, `offset_to_free_area < SPAGE_DB_PAGESIZE` and `alignment`-aligned, `0 <= num_records <= num_slots`. Enforced by `spage_verify_header` / `SPAGE_VERIFY_HEADER`. If violated, free-space arithmetic is corrupt and a later insert can overwrite a live record.
2.2 Page bring-up: spage_initialize and spage_verify_header
spage_initialize is the only function that establishes the starting invariant from scratch, with no branches (a debug-only assert (spage_is_valid_anchor_type (slot_type)) guards the 1..4 anchor range). It zeroes num_slots/num_records, stores is_saving and anchor_type, then sets total_free = DB_ALIGN (SPAGE_DB_PAGESIZE - sizeof (SPAGE_HEADER), alignment), cont_free = total_free, and offset_to_free_area = DB_ALIGN (sizeof (SPAGE_HEADER), alignment). The result is the canonical empty page: cont_free == total_free, both counts zero, cursor just past the aligned header. total_free excludes the header but does not yet subtract slot-array space; slot bytes are charged lazily per allocation.
spage_verify_header is the runtime auditor behind §2.1: it ANDs all bound checks and on failure formats the header, raises ER_SP_INVALID_HEADER, and hits assert (false). SPAGE_VERIFY_HEADER is its macro form, invoked at nearly every primitive boundary.
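The bring-up arithmetic and the header audit described above fit in a short sketch. The reduced header struct, the assumed 32-byte header size, the `DEMO_ALIGN` round-up macro standing in for `DB_ALIGN`, and the function names are all hypothetical; only the formulas and the bound checks come from the text.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE   16384  /* assumed SPAGE_DB_PAGESIZE */
#define DEMO_HEADER_SIZE 32     /* assumed sizeof (SPAGE_HEADER) */

/* Rounds `v` up to a multiple of `a` (a power of two); stand-in for DB_ALIGN. */
#define DEMO_ALIGN(v, a) (((v) + (a) - 1) & ~((a) - 1))

/* Hypothetical reduced header: only the fields the invariant mentions. */
struct demo_header
{
  int num_slots, num_records, alignment;
  int total_free, cont_free, offset_to_free_area;
};

/* Establishes the canonical empty page, following the formulas in the text. */
static void
demo_initialize (struct demo_header *h, int alignment)
{
  h->num_slots = 0;
  h->num_records = 0;
  h->alignment = alignment;
  h->total_free = DEMO_ALIGN (DEMO_PAGE_SIZE - DEMO_HEADER_SIZE, alignment);
  h->cont_free = h->total_free;
  h->offset_to_free_area = DEMO_ALIGN (DEMO_HEADER_SIZE, alignment);
}

/* ANDs every bound check; returns 1 when the header is consistent. */
static int
demo_verify_header (const struct demo_header *h)
{
  return h->total_free >= 0
    && h->cont_free >= 0 && h->cont_free <= h->total_free
    && h->offset_to_free_area < DEMO_PAGE_SIZE
    && h->offset_to_free_area % h->alignment == 0
    && h->num_records >= 0 && h->num_records <= h->num_slots;
}
```

A freshly initialized header passes the audit by construction; nudging any single counter out of range (for example `cont_free` above `total_free`) makes it fail, which is the condition the real auditor reports as an invalid header.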
2.3 The four anchor types and slot reuse
| Anchor type | On delete, slot id is… | Reuse |
|---|---|---|
| ANCHORED (1) | Kept; marked REC_DELETED_WILL_REUSE | Same id reusable by later insert |
| ANCHORED_DONT_REUSE_SLOTS (2) | Kept; marked REC_MARKDELETED | Never reused until spage_reclaim; heap’s choice so an OID is never silently reassigned |
| UNANCHORED_ANY_SEQUENCE (3) | Removed; last slot moved into hole | Ids unstable, order not preserved |
| UNANCHORED_KEEP_SEQUENCE (4) | Removed; higher slots memmove down | Ids unstable, order preserved |
Heap data pages are ANCHORED_DONT_REUSE_SLOTS: an OID is (volid, pageid, slotid), so the slotid must outlive the record. spage_is_valid_anchor_type rejects anything outside 1..4.
2.4 Free-space checks: spage_has_enough_total_space, spage_has_enough_contiguous_space, spage_check_space
spage_has_enough_total_space answers “room at all”: it is true for space <= 0; otherwise it tests space <= total_free, first subtracting spage_get_total_saved_spaces(...) on an is_saving page (reserved undo). spage_has_enough_contiguous_space is space <= cont_free || spage_compact(...) == NO_ERROR, so it triggers compaction as a side effect. spage_check_space composes both: total fails → SP_DOESNT_FIT; else contiguous fails (compaction errored) → SP_ERROR; else SP_SUCCESS.
Return-code contract. `SP_SUCCESS (1)` = done; `SP_DOESNT_FIT (3)` = too full even after compaction, caller must find another page; `SP_ERROR (-1)` = hard internal failure (compaction error, corrupt slot, illegal op for the anchor), not “try elsewhere.” Confusing them is a correctness bug: an `SP_DOESNT_FIT` mistaken for `SP_ERROR` aborts a transaction that should have moved pages.
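The composition of the two checks and the three return codes can be sketched as follows. The `demo_page` struct, the `saved` field modelling reserved undo space, and the `compact_fails` switch that simulates a compaction error are assumptions for illustration; the numeric codes are the ones stated in the contract.

```c
#include <assert.h>

enum { DEMO_SP_ERROR = -1, DEMO_SP_SUCCESS = 1, DEMO_SP_DOESNT_FIT = 3 };

/* Hypothetical page model: counters plus a switch to simulate compaction failure. */
struct demo_page
{
  int total_free, cont_free;
  int saved;            /* bytes reserved for undo on an is_saving page */
  int compact_fails;    /* 1 = pretend compaction hits an internal error */
};

/* Stand-in for spage_compact: on success all free space becomes contiguous. */
static int
demo_compact (struct demo_page *p)
{
  if (p->compact_fails)
    {
      return -1;
    }
  p->cont_free = p->total_free;
  return 0;
}

static int
demo_has_enough_total (const struct demo_page *p, int space)
{
  return space <= 0 || space <= p->total_free - p->saved;
}

/* Note the side effect: compaction runs when the contiguous gap is too small. */
static int
demo_has_enough_contiguous (struct demo_page *p, int space)
{
  return space <= p->cont_free || demo_compact (p) == 0;
}

static int
demo_check_space (struct demo_page *p, int space)
{
  if (!demo_has_enough_total (p, space))
    {
      return DEMO_SP_DOESNT_FIT;        /* caller should try another page */
    }
  if (!demo_has_enough_contiguous (p, space))
    {
      return DEMO_SP_ERROR;             /* compaction itself failed */
    }
  return DEMO_SP_SUCCESS;
}
```

A caller that treats `DEMO_SP_DOESNT_FIT` as a signal to pick another page and `DEMO_SP_ERROR` as a reason to stop follows the contract spelled out above.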
2.5 Allocating a slot: spage_find_free_slot and spage_find_empty_slot
spage_find_free_slot picks an id: if every slot is live (num_slots == num_records) there is no hole, so slot_id = num_slots (append); otherwise it scans forward from start_slot for the first slot with record_type == REC_DELETED_WILL_REUSE and reuses that id. Only REC_DELETED_WILL_REUSE is reusable: REC_MARKDELETED (the ANCHORED_DONT_REUSE_SLOTS tombstone) is deliberately not matched, so heap OIDs avoid reassignment until vacuum reclaims them.
spage_find_empty_slot is the allocate-and-reserve primitive, branch by branch:

1. spage_has_enough_total_space fails → SP_DOESNT_FIT.
2. spage_find_free_slot returning SP_ERROR or an id > num_slots → SP_ERROR.
3. A new id (slot_id == num_slots) adds sizeof(SPAGE_SLOT) to space and runs spage_check_space (the array eats the gap), returning that status if not SP_SUCCESS, else num_slots++.
4. A reused hole already verified total space, so only cont_free is re-checked (fail → SP_ERROR).
5. Both converge on the reserve block: set the slot, num_records++, debit both counters by space, advance offset_to_free_area → SP_SUCCESS.
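The slot-picking rule can be sketched over a plain array of record types. The enum values and the function name are hypothetical, and the fall-through to "append" when the scan finds no reusable tombstone is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical record-type tags; the numeric values are illustrative only. */
enum
{
  DEMO_REC_HOME = 1,
  DEMO_REC_MARKDELETED = 2,
  DEMO_REC_DELETED_WILL_REUSE = 3
};

/* Picks a slot id for a new record:
 *  - no holes (num_slots == num_records): append at num_slots;
 *  - otherwise the first DELETED_WILL_REUSE slot at or after start_slot;
 *  - if the scan finds none, fall back to appending (sketch assumption).
 * MARKDELETED tombstones are never matched, so their ids are not handed out. */
static int
demo_find_free_slot (const int *types, int num_slots, int num_records, int start_slot)
{
  if (num_slots == num_records)
    {
      return num_slots;
    }
  for (int id = start_slot; id < num_slots; id++)
    {
      if (types[id] == DEMO_REC_DELETED_WILL_REUSE)
        {
          return id;
        }
    }
  return num_slots;
}
```

On a page whose only tombstones are `DEMO_REC_MARKDELETED`, the sketch appends a fresh id instead of recycling one, which is how an OID avoids being silently reassigned.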
2.6 Insert, in-place update, and the two-tier update path
spage_insert composes spage_find_slot_for_insert(...) then, on SP_SUCCESS, spage_insert_data(...). spage_find_slot_for_insert calls spage_check_record_for_insert, which rejects oversize records with SP_DOESNT_FIT and rewrites a REC_MARKDELETED/REC_DELETED_WILL_REUSE descriptor type into REC_HOME (no inserting a tombstone). spage_insert_data branches on REC_ASSIGN_ADDRESS (a TRANID placeholder) versus a normal memcpy, each overflow-checked.
spage_insert_at is the explicit-id variant for UNANCHORED pages and recovery: validate slot_id <= num_slots (SP_ERROR/ER_SP_UNKNOWN_SLOTID on overflow), then via spage_find_empty_slot_at dispatch to spage_add_new_slot (append) or spage_take_slot_in_use (rejects re-targeting an in-use slot on an ANCHORED* page with ER_SP_BAD_INSERTION_SLOT, else shifts the array up).
Update. spage_update saves total_free_save, checks fit (spage_check_updatable → SP_DOESNT_FIT if no room), then branches on size: length <= slot_p->record_length takes spage_update_record_in_place (fast), else spage_update_record_after_compact (grow). When is_saving it reserves the net change via spage_save_space(..., total_free - total_free_save), returning SP_ERROR on failure.
The fast path (spage_update_record_in_place) sets slot_p->record_length, memcpys into the existing offset_to_record, and does total_free -= space. Because space <= 0 for a non-growing update, that subtraction increases free space. It branches on is_located_end = spage_is_record_located_at_end(...): only a last-in-area record also does cont_free -= space and offset_to_free_area += space (cursor pulls back); otherwise the freed bytes stay fragmented. The grow path (spage_update_record_after_compact, Chapter 6) does a tail-compaction, full spage_compact, or rolls back.
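The sign convention on the fast path (subtracting a non-positive `space` to give bytes back) is easier to follow in code. This sketch models only the counters; it ignores alignment padding and the actual byte copy, and every `demo_*` name is hypothetical.

```c
#include <assert.h>

struct demo_page
{
  int total_free, cont_free, offset_to_free_area;
};

struct demo_slot
{
  int offset_to_record, record_length;
};

/* Non-growing in-place update: `space` = new length - old length, so space <= 0.
 * Subtracting it from the counters therefore increases free space. */
static void
demo_update_in_place (struct demo_page *page, struct demo_slot *slot, int new_length)
{
  int space = new_length - slot->record_length;
  /* The record is "located at end" when it touches the write cursor. */
  int is_located_end = slot->offset_to_record + slot->record_length == page->offset_to_free_area;

  assert (space <= 0);
  slot->record_length = new_length;
  page->total_free -= space;
  if (is_located_end)
    {
      page->cont_free -= space;                 /* freed bytes join the gap */
      page->offset_to_free_area += space;       /* cursor pulls back */
    }
  /* Otherwise the freed bytes stay fragmented until a compaction. */
}
```

Shrinking a record in the middle of the record area raises only `total_free`, which is exactly the kind of fragmentation that later makes `cont_free` fall short and triggers compaction.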
INVARIANT (savings symmetry). When `is_saving`, the net free-space change is handed to `spage_save_space` as `total_free - total_free_save`. If skipped, a concurrent transaction could consume bytes this one needs to roll back into, breaking undo.
2.7 Delete, tombstone, and the savings mechanism
spage_delete returns the slot id on success, NULL_SLOTID on failure, branching on anchor_type. Figure 2-1 is branch-complete:

Figure 2-1. spage_delete control flow

```mermaid
flowchart TD
  B{"slot NULL?"} -->|yes| R0["UNKNOWN_SLOTID; NULL_SLOTID"]
  B -->|no| C["num_records--; total_free += freed"]
  C --> G{"anchor?"}
  G -->|ANCHORED| H["EMPTY; WILL_REUSE"]
  G -->|DONT_REUSE| I["EMPTY; MARKDELETED"]
  G -->|UNANCHORED| J["shift_down; freed += slot sz"]
  G -->|default| R1["assert; NULL_SLOTID"]
  H --> K{"is_saving?"}
  I --> K
  J --> K
  K -->|yes| L{"save_space ok?"}
  L -->|no| R2["NULL_SLOTID"]
  L -->|yes| N["dirty; return slot_id"]
  K -->|no| N
```
The total_free += free_space credit is unconditional; the cont_free/cursor adjustment applies only when the deleted record was last in the area (spage_is_record_located_at_end, else a hole is left for compaction). The UNANCHORED cases also reclaim the slot (free_space += sizeof(SPAGE_SLOT)) and assert is_saving == false.
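A reduced sketch of the delete accounting, restricted to the two anchored variants, follows. The unanchored branches, the savings step, and alignment padding are deliberately left out, and the `demo_*` types, the empty-offset marker value, and the return convention are assumptions of this sketch.

```c
#include <assert.h>

#define DEMO_EMPTY_OFFSET 0     /* hypothetical "slot holds no record" marker */
#define DEMO_NULL_SLOTID (-1)

enum demo_anchor { DEMO_ANCHORED = 1, DEMO_ANCHORED_DONT_REUSE_SLOTS = 2 };
enum demo_rectype { DEMO_REC_HOME, DEMO_REC_MARKDELETED, DEMO_REC_DELETED_WILL_REUSE };

struct demo_slot
{
  int offset, length;
  enum demo_rectype type;
};

struct demo_page
{
  enum demo_anchor anchor_type;
  int num_slots, num_records;
  int total_free, cont_free, offset_to_free_area;
};

/* Deletes the record in `slot_id`, leaving a tombstone whose type depends on
 * the anchor. Returns the slot id, or DEMO_NULL_SLOTID when there is nothing
 * to delete or the anchor type is outside this sketch's scope. */
static int
demo_delete (struct demo_page *p, struct demo_slot *slots, int slot_id)
{
  if (slot_id < 0 || slot_id >= p->num_slots || slots[slot_id].offset == DEMO_EMPTY_OFFSET)
    {
      return DEMO_NULL_SLOTID;
    }
  if (p->anchor_type != DEMO_ANCHORED && p->anchor_type != DEMO_ANCHORED_DONT_REUSE_SLOTS)
    {
      return DEMO_NULL_SLOTID;
    }

  struct demo_slot *s = &slots[slot_id];
  int freed = s->length;

  p->num_records--;
  p->total_free += freed;                       /* unconditional credit */
  if (s->offset + s->length == p->offset_to_free_area)
    {
      p->cont_free += freed;                    /* last record: gap grows */
      p->offset_to_free_area -= freed;
    }
  s->offset = DEMO_EMPTY_OFFSET;
  s->type = (p->anchor_type == DEMO_ANCHORED) ? DEMO_REC_DELETED_WILL_REUSE : DEMO_REC_MARKDELETED;
  return slot_id;
}
```

Note that `num_slots` never changes in either anchored branch: the id survives as a tombstone, and only the tombstone's type decides whether a later insert may take it.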
spage_mark_deleted_slot_as_reusable is the vacuum-side flip (Chapter 8): it downgrades an empty tombstone to REC_DELETED_WILL_REUSE so spage_find_free_slot hands the id back out. Its two failing branches raise distinct error codes: an out-of-range slot_id (< 0 or >= num_slots) → SP_ERROR/ER_SP_UNKNOWN_SLOTID; a slot that is not an empty tombstone (must have offset_to_record == SPAGE_EMPTY_OFFSET and a REC_MARKDELETED/REC_DELETED_WILL_REUSE type), i.e. a live record → SP_ERROR/ER_SP_BAD_INSERTION_SLOT (both assert (false) first). Only the tombstone path sets record_type = REC_DELETED_WILL_REUSE and returns SP_SUCCESS.
The savings mechanism. On an is_saving page freed bytes are not immediately spendable — a rollback may reinsert the larger old record. spage_save_space reserves them in the lock-free spage_Saving_hashmap (keyed by VPID; per-transaction SPAGE_SAVE_ENTRY under a SPAGE_SAVE_HEAD). It early-returns NO_ERROR for space == 0, crash recovery, vacuum workers, or space < 0 / inactive transaction; only positive savings by an active transaction allocate an entry, bumping head->total_saved. spage_free_saved_spaces walks the tran_next_save chain at commit/abort, erasing the head when first becomes NULL — hence §2.4 subtracts total_saved.
2.8 Compaction and reclamation: identity fixed, offset moves
Compaction slides records together without changing any slot id. spage_compact builds an array of live slots (skipping SPAGE_EMPTY_OFFSET holes), sorts by offset_to_record, then walks in offset order memmove-ing each record down to the next aligned to_offset and rewriting that slot’s offset:

```c
// spage_compact -- src/storage/slotted_page.c
memmove ((char *) page_p + to_offset, (char *) page_p + slot_array[i]->offset_to_record,
         slot_array[i]->record_length);
slot_array[i]->offset_to_record = to_offset;    /* <- offset moves, slot id stays */
to_offset += slot_array[i]->record_length;
...
page_header_p->total_free = SPAGE_DB_PAGESIZE - to_offset - (num_slots * sizeof (SPAGE_SLOT));
page_header_p->cont_free = page_header_p->total_free;   /* <- all free now contiguous */
```

Branches: num_records == 0 skips the array and resets to_offset to header size; calloc failure → ER_FAILED; a record_type > REC_4BIT_USED_TYPE_MAX or slot-count mismatch (num_records != j) is fatal, and the latter raises ER_SP_WRONG_NUM_SLOTS and calls logpb_fatal_error_exit_immediately_wo_flush.
INVARIANT (slot identity under compaction). Compaction may rewrite
offset_to_record for any slot but must never change a slot’s index. A reader holding OID (…, slotid) re-reads through spage_find_slot trusting the index; reordering indices would point an OID at a different row.
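The invariant can be illustrated with a toy model. This is a minimal sketch, not CUBRID's code: toy_slot, toy_compact and TOY_EMPTY_OFFSET are hypothetical names, alignment is ignored, and the slot array is kept outside the page buffer for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_EMPTY_OFFSET (-1)   /* stand-in for SPAGE_EMPTY_OFFSET */

typedef struct
{
  int offset;                   /* where the record bytes start in the page */
  int length;                   /* record length in bytes */
} toy_slot;

/* Order slot pointers by their current record offset. */
static int
toy_cmp_offset (const void *a, const void *b)
{
  const toy_slot *sa = *(toy_slot * const *) a;
  const toy_slot *sb = *(toy_slot * const *) b;
  return sa->offset - sb->offset;
}

/* Slide live records toward 'start'. Returns the new end of the record
 * area, or -1 if the temporary array cannot be allocated. */
static int
toy_compact (char *page, int start, toy_slot *slots, int num_slots)
{
  toy_slot **live;
  int i, n = 0, to_offset = start;

  if (num_slots <= 0)
    {
      return start;
    }
  live = calloc ((size_t) num_slots, sizeof (*live));
  if (live == NULL)
    {
      return -1;
    }
  for (i = 0; i < num_slots; i++)
    {
      if (slots[i].offset != TOY_EMPTY_OFFSET)
        {
          live[n++] = &slots[i];        /* holes are skipped */
        }
    }
  qsort (live, (size_t) n, sizeof (*live), toy_cmp_offset);

  for (i = 0; i < n; i++)
    {
      memmove (page + to_offset, page + live[i]->offset, (size_t) live[i]->length);
      live[i]->offset = to_offset;      /* offset changes; index in slots[] does not */
      to_offset += live[i]->length;
    }
  free (live);
  return to_offset;
}
```

Because the walk is in ascending offset order, each destination is at or below its source, so a record is never overwritten before it is moved.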
spage_need_compact is the policy gate: true only when fragmented free space is at least 5% of the page (total_free - cont_free >= SPAGE_DB_PAGESIZE / 20).
spage_reclaim shrinks the slot array on ANCHORED_DONT_REUSE_SLOTS pages. It iterates slot_id backward from num_slots - 1 (so trailing empties collapse cleanly); for each empty tombstone (offset_to_record == SPAGE_EMPTY_OFFSET with a REC_MARKDELETED/REC_DELETED_WILL_REUSE type) it branches — a current-last slot (slot_id + 1 == num_slots) is dropped via spage_reduce_a_slot, else downgraded to REC_DELETED_WILL_REUSE — setting is_reclaim = true. After the loop, if anything was reclaimed and num_slots == 0 it re-runs spage_initialize. Returns true iff something was reclaimed.
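The backward walk described above can be sketched as follows. This is an illustrative model under stated assumptions, not the real function: toy_state and toy_reclaim are hypothetical names, a slot is reduced to a single state value, and the re-initialization of an emptied page is left out.

```c
#include <assert.h>

typedef enum
{
  TOY_LIVE,                     /* slot holds a record */
  TOY_MARKDELETED,              /* tombstone, id not reusable yet */
  TOY_WILL_REUSE                /* tombstone, id may be handed out again */
} toy_state;

/* Walk from the last slot to the first. A tombstone that is currently the
 * last slot is dropped; any other tombstone is downgraded to reusable.
 * Returns 1 if anything was reclaimed, 0 otherwise. */
static int
toy_reclaim (toy_state *slots, int *num_slots)
{
  int slot_id, reclaimed = 0;

  for (slot_id = *num_slots - 1; slot_id >= 0; slot_id--)
    {
      if (slots[slot_id] == TOY_LIVE)
        {
          continue;
        }
      if (slot_id + 1 == *num_slots)
        {
          (*num_slots)--;       /* trailing tombstone: shrink the array */
        }
      else
        {
          slots[slot_id] = TOY_WILL_REUSE;
        }
      reclaimed = 1;
    }
  return reclaimed;
}
```

Iterating backward is what lets a run of trailing tombstones collapse in one pass: each drop makes the next-lower slot the new last slot.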
2.9 Chapter summary — key takeaways
- A slot is a 4-byte indirection. SPAGE_SLOT packs offset_to_record / record_length / record_type into 14/14/4 bits, addressed backward from the page end so record area and slot array grow toward each other.
- Three counters encode free-space state. total_free (any), cont_free (without compacting, <= total_free), offset_to_free_area (write cursor); spage_verify_header enforces their invariant at every boundary.
- anchor_type is the slot-identity knob. Heap pages use ANCHORED_DONT_REUSE_SLOTS, so a delete becomes a REC_MARKDELETED tombstone whose id is never reassigned until vacuum.
- Return codes are not interchangeable. SP_DOESNT_FIT = try another page, SP_ERROR = hard failure, SP_SUCCESS = done; spage_has_enough_contiguous_space attempts compaction before reporting failure.
- Compaction moves bytes, never slot ids. spage_compact rewrites offset_to_record until cont_free == total_free, but slot indices stay immutable, which is the basis of stable OIDs.
- Deletes and shrinking updates may owe undo space. On is_saving pages freed bytes are reserved via spage_save_space, subtracted by the total-space check, released by spage_free_saved_spaces at transaction end.
- Reclamation is anchored-page housekeeping. spage_reclaim collapses trailing tombstones and downgrades the rest to reusable; spage_need_compact gates compaction at 5% fragmentation.
Chapter 3: Record Types and the MVCC Record Header
Chapter 2 gave a slot that holds an opaque blob. What is a heap record
physically? Two vocabularies do the work. The 4-bit record_type in
the slot (Chapter 2’s spage_slot) says how to read the bytes — record,
forwarding pointer, overflow pointer, or tombstone. The variable-size
MVCC record header inside the body carries versioning metadata
(insert/delete ids, prev-version pointer) consumed by the read path
(Chapter 5) and vacuum (Chapter 8). For why MVCC needs per-record stamps,
see the companion cubrid-heap-manager.md, “MVCC and the heap”.
3.1 The eight record_type values
The slot’s record_type:4 bit-field (Chapter 2) is an enum in
storage_common.h. Eight values (0 through 7) are meaningful; the rest of the 4-bit
space (REC_RESERVED_TYPE_8.._15) is reserved.
// enum record_type -- src/storage/storage_common.h
enum
{
/* Unknown record type */
  REC_UNKNOWN = 0,
/* Record without content, just the address */
  REC_ASSIGN_ADDRESS = 1,
/* Home of record */
  REC_HOME = 2,
/* No the original home of record. part of relocation process */
  REC_NEWHOME = 3,
/* Record describe new home of record */
  REC_RELOCATION = 4,
/* Record describe location of big record */
  REC_BIGONE = 5,
  // ... condensed: REC_MARKDELETED = 6, REC_DELETED_WILL_REUSE = 7 ...
  // ... condensed: REC_RESERVED_TYPE_8 .. 15 ...
  REC_4BIT_USED_TYPE_MAX = REC_DELETED_WILL_REUSE,    // highest live value is 7
  REC_4BIT_TYPE_MAX = REC_RESERVED_TYPE_15
};
(Enum comments are verbatim source text, original grammar and all.) Classified by what they encode (column 2 = does the slot hold a body):
| Type | Body? | Encodes |
|---|---|---|
| REC_HOME | yes | record lives entirely in this slot (the common case) |
| REC_NEWHOME | yes | relocated body pointed to by a REC_RELOCATION; not OID-addressable |
| REC_RELOCATION | forwarding | body outgrew its home page; slot holds a forward OID to a REC_NEWHOME |
| REC_BIGONE | overflow | record exceeds one page; slot holds an overflow OID into the overflow file |
| REC_ASSIGN_ADDRESS | no | OID reserved, content not yet written; bypasses MVCC stamping (3.4) |
| REC_MARKDELETED | no | tombstone whose slot cannot be reused |
| REC_DELETED_WILL_REUSE | no | tombstone whose slot will be reused by a future insert |
| REC_UNKNOWN | no | sentinel / uninitialized (RECDES_INITIALIZER); never a live slot type |
Four types are “live” to a public OID lookup: REC_HOME,
REC_RELOCATION, REC_BIGONE, REC_ASSIGN_ADDRESS. REC_NEWHOME is
live data reached only by dereferencing a REC_RELOCATION forward OID;
a direct OID hit on it is a bug. The REC_*DELETED* types are tombstones
(Chapter 7); REC_UNKNOWN is a sentinel never written to a live slot.
flowchart TB
OID["public OID lands on slot"] --> T{record_type}
T -->|REC_HOME| H["body here, header inline"]
T -->|REC_RELOCATION| R["forward OID"] --> NH["REC_NEWHOME"]
T -->|REC_BIGONE| B["overflow OID"]
T -->|REC_ASSIGN_ADDRESS| A["address only"]
T -->|MARKDELETED / WILL_REUSE| D["tombstone"]
Figure 3-1 — Record-type dispatch after fetching a slot; the traversal
is Chapter 5. Forwarding (REC_RELOCATION/REC_NEWHOME) and overflow
(REC_BIGONE) differ in header storage: a relocated REC_NEWHOME
carries an inline variable-size header like any home record; a
REC_BIGONE carries a fixed maximum header (3.5).
3.2 mvcc_rec_header — every field
The in-memory decoded form is a struct in mvcc.h. On disk it is packed
by or_mvcc_set_header / unpacked by or_mvcc_get_header (both in
src/base/object_representation_sr.c) per the flag byte (3.3); the
on-disk bytes are not laid out as this struct.
// struct mvcc_rec_header -- src/transaction/mvcc.h
struct mvcc_rec_header
{
  INT32 mvcc_flag:8;            /* MVCC flags */
  INT32 repid:24;               /* representation id */
  int chn;                      /* cache coherency number */
  MVCCID mvcc_ins_id;           /* MVCC insert id */
  MVCCID mvcc_del_id;           /* MVCC delete id */
  LOG_LSA prev_version_lsa;     /* log address of previous version */
};
// MVCC_REC_HEADER_INITIALIZER zeroes flag/repid and sets NULL_CHN, MVCCID_NULL x2, LSA_INITIALIZER
| Field | Role | Why it exists |
|---|---|---|
| mvcc_flag:8 | Bit set of OR_MVCC_FLAG_VALID_INSID (0x01), _VALID_DELID (0x02), _VALID_PREV_VERSION (0x04); low 5 bits usable (OR_MVCC_FLAG_MASK = 0x1f). | Self-describing: decides which optional fields are present, hence total size. Shares word 0 with repid. |
| repid:24 | Representation (schema version) id. | An old-schema row keeps its repid so the engine reads the right layout. Packed via OR_MVCC_REPID_MASK = 0x00FFFFFF. |
| chn | Cache coherency number. | For non-MVCC classes the only versioning info, bumped each update so a client cache detects staleness; present but not the visibility key for MVCC classes (3.6). Always present in the decoded struct. |
| mvcc_ins_id | MVCCID of the inserter. | Visibility lower bound (Chapter 5). On disk present only if _VALID_INSID set; else MVCCID_ALL_VISIBLE. |
| mvcc_del_id | MVCCID of the deleter. | Visibility upper bound / tombstone marker (Chapter 7). On disk present only if _VALID_DELID set; else MVCCID_NULL. |
| prev_version_lsa | Log LSA of the previous version. | Back-pointer the read path follows on TOO_NEW_FOR_SNAPSHOT, and vacuum prunes. On disk present only if _VALID_PREV_VERSION set. |
On disk, the word after chn holds either the delete id or nothing,
discriminated by OR_MVCC_FLAG_VALID_DELID — per the
object_representation_constants.h:168 comment on that flag: “The
record contains MVCC delete id. If not set, the record contains chn”.
This chn/del_id overlap is purely the on-disk encoding (CUBRID’s
MVCC_* macros speak of a delid_chn view, e.g. MVCC_IS_REC_DELETED_BY
in mvcc.h); it is distinct from the decoded in-memory struct above,
which carries chn and mvcc_del_id as two separate always-present
fields.
Invariant: the flag byte alone determines header size.
or_header_size is mvcc_header_size_lookup[OR_GET_MVCC_FLAG(ptr)]: there is no length field or terminator, so reader and writer must agree on the flag→size mapping. A writer that sets mvcc_ins_id but forgets OR_MVCC_FLAG_VALID_INSID makes the next reader compute a header 8 bytes too short and read attribute data as the insert id.
3.3 Flag-driven variable-size encoding: mvcc_header_size_lookup
With optional ids present only when flagged, the header has eight flag combinations; the table makes the flag→size map O(1):
// mvcc_header_size_lookup -- src/object/object_representation.c
int mvcc_header_size_lookup[8] = {
  OR_MVCC_REP_SIZE + OR_CHN_SIZE,    // index 0
  OR_MVCC_REP_SIZE + OR_CHN_SIZE + OR_MVCCID_SIZE,    // index 1 (+INSID)
  // ... condensed: indices 2..6 sum REP+CHN with the flagged MVCCID/LSA terms ...
  OR_MVCC_REP_SIZE + OR_CHN_SIZE + OR_MVCCID_SIZE + OR_MVCCID_SIZE + OR_MVCC_PREV_VERSION_LSA_SIZE
};
The mandatory rep+chn prefix is 8 bytes; each MVCCID and the
prev-version LSA add 8. So the eight entries, by flag index 0..7,
evaluate to 8, 16, 16, 24, 16, 24, 24, 32 bytes. The table indexes a
bitmask, not a count: indices 1, 2 and 4 are all 16 (one optional 8-byte
field each), and 3, 5 and 6 are all 24 (two each). Offsets are positional:
OR_MVCC_DELETE_ID_OFFSET(flags) adds OR_MVCC_INSERT_ID_SIZE only when
VALID_INSID is set — bracketed by OR_MVCC_MIN_HEADER_SIZE = 8 /
OR_MVCC_MAX_HEADER_SIZE = 32. On disk the 8-byte prefix
(repid+flags, then chn) is followed by mvcc_ins_id,
mvcc_del_id, prev_version_lsa (8B each) in that order, each only when
flagged; attribute data begins at or_header_size(ptr).
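The flag-to-size arithmetic above can be checked with a few lines of C. This is a sketch under the sizes quoted in this section (8-byte prefix, 8 bytes per optional field); the TOY_ names and both functions are illustrative and do not exist in CUBRID.

```c
#include <assert.h>

#define TOY_FLAG_VALID_INSID        0x01
#define TOY_FLAG_VALID_DELID        0x02
#define TOY_FLAG_VALID_PREV_VERSION 0x04
#define TOY_PREFIX_SIZE 8       /* repid+flags word, then chn */
#define TOY_FIELD_SIZE  8       /* each MVCCID and the prev-version LSA */

/* Header size as a pure function of the flag bits. */
static int
toy_header_size (int flags)
{
  int size = TOY_PREFIX_SIZE;

  if (flags & TOY_FLAG_VALID_INSID)
    {
      size += TOY_FIELD_SIZE;
    }
  if (flags & TOY_FLAG_VALID_DELID)
    {
      size += TOY_FIELD_SIZE;
    }
  if (flags & TOY_FLAG_VALID_PREV_VERSION)
    {
      size += TOY_FIELD_SIZE;
    }
  return size;
}

/* Offset of the delete id: it follows the insert id only when one is present. */
static int
toy_delete_id_offset (int flags)
{
  return TOY_PREFIX_SIZE + ((flags & TOY_FLAG_VALID_INSID) ? TOY_FIELD_SIZE : 0);
}
```

Evaluating toy_header_size for flags 0 through 7 reproduces the eight table entries listed above, which is why a precomputed lookup is a safe replacement for the three tests.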
3.4 heap_insert_adjust_recdes_header — stamping the header before placement
heap_insert_adjust_recdes_header turns the client-supplied record (a
bare header) into a stamped one: insert-id added for MVCC classes,
flags stripped for non-MVCC classes, prev-version cleared, length
adjusted. Two paths. The fast path is gated on use_optimization —
is_mvcc_class && update_in_place == UPDATE_INPLACE_NONE && !VALID_PREV_VERSION && !heap_is_big_length(record_size + OR_MVCCID_SIZE) && !is_bulk_op (SERVER_MODE only).
Branch A — use_optimization. A plain MVCC insert that skips the
unpack/repack round-trip and writes the INSID directly into the body.
// heap_insert_adjust_recdes_header (Branch A) -- src/storage/heap_file.c
assert (!(mvcc_flags & OR_MVCC_FLAG_VALID_DELID));    /* <- a fresh insert is never pre-deleted */
mvcc_id = logtb_get_current_mvccid (thread_p);
new_ins_mvccid_pos_p = start_p + OR_MVCC_INSERT_ID_OFFSET;
if (!(mvcc_flags & OR_MVCC_FLAG_VALID_INSID))
  {
    repid_and_flag_bits |= (OR_MVCC_FLAG_VALID_INSID << OR_MVCC_FLAG_SHIFT_BITS);
    OR_PUT_INT (start_p, repid_and_flag_bits);    /* <- set flag in word 0 */
    memmove (new_ins_mvccid_pos_p + OR_MVCCID_SIZE, new_ins_mvccid_pos_p,
             insert_context->recdes_p->length - OR_MVCC_INSERT_ID_OFFSET);    /* <- open 8-byte gap */
    insert_context->recdes_p->length += OR_MVCCID_SIZE;
  }
OR_PUT_BIGINT (new_ins_mvccid_pos_p, &mvcc_id);    /* <- write INSID into the gap */
return NO_ERROR;
Sub-branch: if the INSID flag is already set the gap exists, so the
memmove and length bump are skipped and only the id is overwritten
(Chapter 4 pre-sizes the buffer for the extra 8 bytes).
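The gap-opening step can be tried on a plain byte buffer. This is a minimal sketch, not the CUBRID routine: toy_stamp_insid and the TOY_ constants are hypothetical, the flag position (bits above a 24-bit repid) is assumed from the mask quoted in 3.2, and byte order is ignored by using native-endian memcpy instead of OR_PUT_INT / OR_PUT_BIGINT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_INSID_FLAG   0x01u
#define TOY_FLAG_SHIFT   24     /* assumed: flags sit above the 24-bit repid */
#define TOY_INSID_OFFSET 8      /* after the repid+flags word and chn */
#define TOY_MVCCID_SIZE  8

/* Stamp 'mvccid' into the record held in buf[0 .. *length).
 * Returns 0 on success, -1 if the record or buffer is too small. */
static int
toy_stamp_insid (unsigned char *buf, int *length, int area_size, uint64_t mvccid)
{
  uint32_t word0;

  if (*length < TOY_INSID_OFFSET)
    {
      return -1;
    }
  memcpy (&word0, buf, sizeof word0);
  if (!((word0 >> TOY_FLAG_SHIFT) & TOY_INSID_FLAG))
    {
      if (area_size < *length + TOY_MVCCID_SIZE)
        {
          return -1;            /* caller did not pre-size the buffer */
        }
      word0 |= (TOY_INSID_FLAG << TOY_FLAG_SHIFT);
      memcpy (buf, &word0, sizeof word0);       /* set the flag in word 0 */
      memmove (buf + TOY_INSID_OFFSET + TOY_MVCCID_SIZE, buf + TOY_INSID_OFFSET,
               (size_t) (*length - TOY_INSID_OFFSET));  /* open the 8-byte gap */
      *length += TOY_MVCCID_SIZE;
    }
  memcpy (buf + TOY_INSID_OFFSET, &mvccid, sizeof mvccid);      /* write the id */
  return 0;
}
```

Calling it twice on the same record shows the sub-branch: the first call grows the record by 8 bytes, the second only overwrites the id.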
Branch B — general path. Reached when any optimization precondition fails (non-MVCC class, in-place update, prev-version present, would go big, bulk op, client mode); it decodes the header fully first.
// heap_insert_adjust_recdes_header (Branch B) -- src/storage/heap_file.c
or_mvcc_get_header (insert_context->recdes_p, &mvcc_rec_header);    // ... err check condensed ...
if (insert_context->update_in_place != UPDATE_INPLACE_OLD_MVCCID)
  {
    if (is_mvcc_class && !insert_context->is_bulk_op)    /* B1: MVCC class */
      {
        mvcc_id = logtb_get_current_mvccid (thread_p);
        if (!MVCC_IS_FLAG_SET (&mvcc_rec_header, OR_MVCC_FLAG_VALID_INSID))
          {
            MVCC_SET_FLAG (&mvcc_rec_header, OR_MVCC_FLAG_VALID_INSID);
            record_size += OR_MVCCID_SIZE;
          }
        MVCC_SET_INSID (&mvcc_rec_header, mvcc_id);
      }
    else    /* B2: non-MVCC or bulk */
      {
        curr_header_size = mvcc_header_size_lookup[mvcc_rec_header.mvcc_flag];
        MVCC_CLEAR_ALL_FLAG_BITS (&mvcc_rec_header);    /* <- strip all optional ids */
        new_header_size = mvcc_header_size_lookup[mvcc_rec_header.mvcc_flag];
        record_size -= (curr_header_size - new_header_size);    /* <- shrink length to match */
      }
  }
else if (MVCC_IS_HEADER_DELID_VALID (&mvcc_rec_header))    /* B3: redistribute keeps DELID */
  {
    insert_context->is_redistribute_insert_with_delid = true;
  }
MVCC_CLEAR_FLAG_BITS (&mvcc_rec_header, OR_MVCC_FLAG_VALID_PREV_VERSION);    /* always: new row has no prev */
if (is_mvcc_class && heap_is_big_length (record_size))
  {
    HEAP_MVCC_SET_HEADER_MAXIMUM_SIZE (&mvcc_rec_header);    /* <- big record -> full 32B header */
  }
or_mvcc_set_header (insert_context->recdes_p, &mvcc_rec_header);    // ... err check condensed ...
Four mutually exclusive outcomes: B1 stamps INSID and grows by one
MVCCID; B2 strips all optional ids and shrinks by the exact
lookup-table delta; B3 (partition redistribute) only records that an
existing DELID must be preserved; UPDATE_INPLACE_OLD_MVCCID without a
valid DELID matches none of the three and leaves the header untouched.
Then, branch-independent, prev-version is cleared and a big record is
promoted to max size (3.5).
SM_CLASS root-class vs ATTRINFO normal-table path. The function runs
only for normal table rows. The insert driver calls it under the gate
!OID_ISNULL(class_oid) && !OID_IS_ROOTOID(class_oid) && recdes_p->type != REC_ASSIGN_ADDRESS, so two kinds skip adjustment: a root-class
row (OID_IS_ROOTOID — a raw serialized SM_CLASS record with its own
object header, not MVCC-versioned, so an insert-id would corrupt it) and
a REC_ASSIGN_ADDRESS placeholder. The ATTRINFO path (table rows from
heap_attrinfo-built descriptors) is the non-root, non-placeholder
branch that does call the adjuster.
flowchart TB
IN["insert driver"] --> Q{"root class?\nor REC_ASSIGN_ADDRESS?"}
Q -->|yes: SM_CLASS raw record| SKIP["skip adjust"]
Q -->|no: ATTRINFO table row| OPT{use_optimization?}
OPT -->|yes| A["Branch A: fast INSID stamp"]
OPT -->|no| B["Branch B: full get/set header"]
Figure 3-2 — Caller gate plus internal branching of header adjustment.
3.5 Overflow records keep a fixed-size header
A REC_BIGONE body lives in the overflow file, its MVCC header on the
first overflow page, always written at maximum size so it updates in
place. Two helpers enforce this. heap_get_mvcc_rec_header_from_overflow
reads it back as a recdes of OR_MVCC_MAX_HEADER_SIZE:
// heap_get_mvcc_rec_header_from_overflow -- src/storage/heap_file.c
peek_recdes->data = overflow_get_first_page_data (ovf_page);
peek_recdes->length = OR_MVCC_MAX_HEADER_SIZE;    /* <- always read 32B */
return or_mvcc_get_header (peek_recdes, mvcc_header);
The getter’s only branch: peek_recdes == NULL uses a local
ovf_recdes; otherwise the caller’s recdes is populated so it can reach
the overflow body too. heap_set_mvcc_rec_header_on_overflow writes it
back, forcing maximum size so the overwritten slot never changes length:
// heap_set_mvcc_rec_header_on_overflow -- src/storage/heap_file.c
ovf_recdes.area_size = ovf_recdes.length = OR_HEADER_SIZE (ovf_recdes.data);
assert (ovf_recdes.length == OR_MVCC_MAX_HEADER_SIZE);    /* <- existing header must already be 32B */
if (!MVCC_IS_FLAG_SET (mvcc_header, OR_MVCC_FLAG_VALID_INSID))
  {
    MVCC_SET_FLAG_BITS (mvcc_header, OR_MVCC_FLAG_VALID_INSID);
    MVCC_SET_INSID (mvcc_header, MVCCID_ALL_VISIBLE);
  }    /* <- force INSID present */
if (!MVCC_IS_FLAG_SET (mvcc_header, OR_MVCC_FLAG_VALID_DELID))
  {
    MVCC_SET_FLAG_BITS (mvcc_header, OR_MVCC_FLAG_VALID_DELID);
    MVCC_SET_DELID (mvcc_header, MVCCID_NULL);
  }    /* <- force DELID present */
assert (mvcc_header_size_lookup[MVCC_GET_FLAG (mvcc_header)] == OR_MVCC_MAX_HEADER_SIZE);
return or_mvcc_set_header (&ovf_recdes, mvcc_header);
The two if blocks are the only branches: a set flag is left alone; a
missing INSID/DELID flag is added with a neutral value
(MVCCID_ALL_VISIBLE, MVCCID_NULL). HEAP_MVCC_SET_HEADER_MAXIMUM_SIZE
(3.4) does the same at insert time.
Invariant: an overflow header is exactly
OR_MVCC_MAX_HEADER_SIZE (32 bytes), always. Both helpers assert it. An in-place header update (e.g. stamping DELID at delete time) must not relocate the body; the fixed 32-byte header keeps record length constant. A variable-size header would shift every byte of a possibly multi-megabyte record on each DELID stamp.
3.6 chn vs mvcc_ins_id
Already traced above (the chn row in 3.2, Branch B2 in 3.4): non-MVCC
classes strip all MVCC flags to the 8-byte rep+chn form, where chn
alone carries coherency (checked by MVCC_IS_CHN_UPTODATE); MVCC classes
stamp mvcc_ins_id as the visibility key while chn stays present but
inert. One decoder (or_mvcc_get_header) serves both.
3.7 Chapter summary — key takeaways
- The 4-bit record_type is the first dispatch: four types (REC_HOME, REC_RELOCATION, REC_BIGONE, REC_ASSIGN_ADDRESS) are what a public OID may land on; REC_NEWHOME is reachable only via a REC_RELOCATION forward; REC_*DELETED* are tombstones; REC_UNKNOWN is a sentinel.
- Forwarding (REC_NEWHOME) carries an inline variable-size header; overflow (REC_BIGONE) carries a fixed 32-byte header.
- mvcc_rec_header packs flags+repid into one word, an always-present chn, and three optional on-disk fields (mvcc_ins_id, mvcc_del_id, prev_version_lsa) gated by the 8-bit mvcc_flag.
- Header size is a pure function of the flag byte: or_header_size indexes mvcc_header_size_lookup[flag]; there is no length field, so writers and readers must agree on flags or mis-parse.
- heap_insert_adjust_recdes_header has a fast path (Branch A: stamp INSID via memmove) and a general path (Branch B: B1 MVCC-stamp, B2 strip-for-non-MVCC, B3 redistribute-keep-DELID), always clearing prev-version and promoting big records to max size. Root-class SM_CLASS records and REC_ASSIGN_ADDRESS placeholders skip adjustment; only non-root ATTRINFO rows are stamped.
- chn is the coherency key for non-MVCC classes (8-byte header); mvcc_ins_id is the visibility key for MVCC classes.
Chapter 4: Insert Flow and OID Assignment
This chapter follows the birth of a single record, from the caller’s
RECDES to a minted, locked OID and bytes resident in a slotted page:
how is the OID minted, how is a home page chosen, and how does an
oversized record spill to overflow at insert time? The high-level
companion’s ### Insert flow section gives the placement algorithm; we
trace every branch instead. Record types are in Chapter 3;
HEAP_OPERATION_CONTEXT in Chapter 1; best-space selection in Chapter 9;
logging in Chapter 10.
4.1 Context build-up — heap_create_insert_context
Every logical heap operation funnels through a HEAP_OPERATION_CONTEXT.
The insert constructor clears it, then stamps in the inputs:
// heap_create_insert_context -- src/storage/heap_file.c
heap_clear_operation_context (context, hfid_p);    /* <- reset ALL fields; flag defaults below */
if (class_oid_p != NULL)
  {
    COPY_OID (&context->class_oid, class_oid_p);
  }    /* <- may stay NULL */
context->recdes_p = recdes_p;    /* <- caller's record bytes + type */
context->scan_cache_p = scancache_p;    /* <- optional page-caching hint */
context->type = HEAP_OPERATION_INSERT;
heap_clear_operation_context nulls res_oid, ovf_oid, map_recdes
and the three behavior flags below (each = false):
| Flag | Set where | Effect on insert |
|---|---|---|
| is_logical_old | false for insert; true only in heap_update_logical for the relocated source | false = genuinely new logical object; REC_NEWHOME relocation uses heap_insert_newhome, not this path, so it never flips. |
| is_redistribute_insert_with_delid | heap_insert_adjust_recdes_header, when the incoming header already has a valid DELID | Routes logging to heap_mvcc_log_redistribute (§4.7). |
| is_bulk_op | bulk-load caller, before heap_insert_logical | Suppresses INSID stamping, takes NULL_LOCK not X_LOCK, asserts a pre-held BU_LOCK. |
Invariant — a context is single-use and fully reset. Because
heap_create_insert_context always clears first, no field survives a
prior operation; a reused-without-reset context would leak a stale
ovf_oid/res_oid and corrupt the forwarding map.
4.2 The whole flow — heap_insert_logical
heap_insert_logical is the single entry point (Figure 4-1, every branch
to error:). One decision is not in the figure: MVCC-op
classification — is_mvcc_op is true only under SERVER_MODE with an
MVCC-enabled class, a non-REC_ASSIGN_ADDRESS record, and not bulk; it is
consumed only at logging (§4.7), never at placement. The header-adjust
gate (§4.3) is skipped for the root class, a NULL class OID, and
REC_ASSIGN_ADDRESS.
flowchart TB
A["heap_insert_logical"] --> B{"scancache_check OK?"}
B -- "no" --> RF1["return ER_FAILED (no page fixed)"]
B -- "yes" --> G{"class not NULL, not ROOT,\ntype != REC_ASSIGN_ADDRESS?"}
G -- "yes, fail" --> RF1
G -- "yes, ok / no" --> J["heap_insert_handle_multipage_record"]
J -- "ER_FAILED" --> ERR["goto error"]
J -- "ok" --> M["class lock: bulk asserts BU_LOCK,\nelse IX_LOCK UNCOND"]
M -- "not granted" --> RF1
M -- "granted" --> Q["heap_get_insert_location_with_lock"]
Q -- "fail" --> RF1
Q -- "ok, res_oid set + locked" --> O["heap_insert_physical"]
O -- "fail" --> ERR
O -- "ok" --> U["log unless bulk; set_dirty;\ncache or pgbuf_ordered_unfix; perfmon"]
U --> ERR
ERR["error:"] --> RET["return rc"]
Figure 4-1 — Branch-complete control flow. Early failures return
directly (no page fixed); post-fix failures goto error (the shared
SystemTap exit). error: is also the normal (success) exit.
4.3 Stamping the MVCC INSID — heap_insert_adjust_recdes_header
For an MVCC class the row gets its insert MVCCID here, via a fast path (guard predicate below) and a general path:
// heap_insert_adjust_recdes_header -- src/storage/heap_file.c
use_optimization = (is_mvcc_class && update_in_place == UPDATE_INPLACE_NONE
                    && !(mvcc_flags & OR_MVCC_FLAG_VALID_PREV_VERSION)
                    && !heap_is_big_length (record_size + OR_MVCCID_SIZE) && !is_bulk_op);
if (use_optimization)
  {    /* <- in-place: OR_PUT_INT INSID flag; memmove 8-byte gap; */
    return NO_ERROR;    /* length += OR_MVCCID_SIZE; OR_PUT_BIGINT INSID; one memmove, no pack/unpack */
  }
The fast path fires for the common case (a fresh, non-big, no-prev-version
insert). The general path (or_mvcc_get_header →
mutate → or_mvcc_set_header) has three sub-branches: MVCC, not bulk —
set OR_MVCC_FLAG_VALID_INSID if absent, grow by OR_MVCCID_SIZE,
MVCC_SET_INSID; non-MVCC or bulk — strip all flags
(MVCC_CLEAR_ALL_FLAG_BITS) and shrink; UPDATE_INPLACE_OLD_MVCCID —
keep the existing MVCCID, setting is_redistribute_insert_with_delid if a
DELID is present (a partition redistribute). A now-big MVCC record then has
its header forced to maximum size (HEAP_MVCC_SET_HEADER_MAXIMUM_SIZE) so
a later in-place delete/update of the overflow map never grows the home
slot.
Invariant — INSID width is reserved before placement. The §4.5 home
page is sized against recdes_p->length after OR_MVCCID_SIZE is added;
the fast path’s area_size >= length + OR_MVCCID_SIZE assert keeps the
memmove in bounds.
4.4 Oversized spill — heap_insert_handle_multipage_record
This is the gate between “fits in a page” and “goes to overflow”:
// heap_insert_handle_multipage_record -- src/storage/heap_file.c
if (!heap_is_big_length (context->recdes_p->length))
  {
    return NO_ERROR;    /* <- normal: untouched */
  }
if (heap_ovf_insert (thread_p, &context->hfid, &context->ovf_oid, context->recdes_p) == NULL)
  {
    return ER_FAILED;    /* <- overflow insert failed */
  }
heap_build_forwarding_recdes (&context->map_recdes, REC_BIGONE, &context->ovf_oid);
context->recdes_p = &context->map_recdes;    /* <- home page now receives the 8-byte map */
heap_is_big_length (length > heap_Maxslotted_reclength) is the single
source of truth for “oversized”; when not big, the function is a no-op.
When big, heap_build_forwarding_recdes builds the 8-byte REC_BIGONE
map and repoints context->recdes_p at it, so the rest of
heap_insert_logical places that tiny record like a normal one.
Invariant — the MVCC header lives in the overflow record, not the
map. The source comment is explicit (“MVCC information is held in
overflow record”) — which is why §4.3 forced the overflow record’s
header to maximum size. The home-page REC_BIGONE map carries no MVCCID;
visibility (Chapter 5) follows the OID into overflow.
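The repointing trick is easy to miss, so here is a toy model of it. It is a sketch under stated assumptions: every toy_ name is hypothetical, the overflow writer is a stub that only hands back a fixed page id, and the size threshold is passed in rather than read from heap_Maxslotted_reclength.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int pageid; short slotid; short volid; } toy_oid;
typedef enum { TOY_REC_HOME, TOY_REC_BIGONE } toy_type;
typedef struct { const void *data; int length; toy_type type; } toy_recdes;

typedef struct
{
  toy_recdes *recdes_p;         /* what the rest of the insert will place */
  toy_recdes map_recdes;        /* storage for the small forwarding record */
  toy_oid ovf_oid;              /* first overflow page, slot unused */
} toy_context;

/* Stub overflow writer: pretends the body went to page 100. */
static int
toy_ovf_insert_stub (const toy_recdes *rec, toy_oid *ovf_oid)
{
  (void) rec;
  ovf_oid->pageid = 100;
  ovf_oid->volid = 0;
  ovf_oid->slotid = -1;         /* overflow pages are not slotted */
  return 0;
}

/* No-op for a normal record; for a big one, spill the body and swap the
 * context's record for a map that holds only the overflow OID. */
static int
toy_handle_multipage (toy_context *ctx, int max_slotted_len)
{
  if (ctx->recdes_p->length <= max_slotted_len)
    {
      return 0;
    }
  if (toy_ovf_insert_stub (ctx->recdes_p, &ctx->ovf_oid) != 0)
    {
      return -1;
    }
  ctx->map_recdes.data = &ctx->ovf_oid;
  ctx->map_recdes.length = (int) sizeof (toy_oid);
  ctx->map_recdes.type = TOY_REC_BIGONE;
  ctx->recdes_p = &ctx->map_recdes;     /* later steps now see the map */
  return 0;
}
```

After the call, code that only looks at ctx->recdes_p cannot tell a spilled record from an ordinary small one, which is the point of the design.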
4.4.1 heap_ovf_insert
// heap_ovf_insert -- src/storage/heap_file.c
if (heap_ovf_find_vfid (thread_p, hfid, &ovf_vfid, true, PGBUF_UNCONDITIONAL_LATCH) == NULL
    || overflow_insert (thread_p, &ovf_vfid, &ovf_vpid, recdes, FILE_MULTIPAGE_OBJECT_HEAP) != NO_ERROR)
  {
    return NULL;    /* <- either branch fails -> NULL */
  }
ovf_oid->pageid = ovf_vpid.pageid;
ovf_oid->volid = ovf_vpid.volid;
ovf_oid->slotid = NULL_SLOTID;    /* <- overflow has no slot */
The true argument creates the overflow file if absent; ovf_oid
identifies the first overflow page (its slot field is meaningless —
overflow pages are page-chained, not slotted).
4.5 Choosing the home page and minting the OID
With recdes_p ready and the class IX lock held, heap_insert_logical
calls heap_get_insert_location_with_lock — where the OID is born:
1. Page select. home_hint == NULL → heap_stats_find_best_page (isnew_rec = true; Chapter 9’s black box, which yields a fixed page or an error via the §4.6 fallback, never “no space”; NULL → return error_code). A hint → ER_SP_NOSPACE_IN_PAGE if too small, else adopt it. Either way res_oid.volid/pageid are set.
2. Lock mode: SCH_M_LOCK (root), NULL_LOCK (bulk), else X_LOCK.
3. Slot loop (slot 0..slot_count): spage_find_free_slot returns a slot_id (also reclaims REC_DELETED_WILL_REUSE slots, Chapter 7; == slot_count means “append”) or SP_ERROR → break. Set res_oid.slotid; NULL_LOCK returns immediately, else lock_object conditionally: LK_GRANTED → return; LK_NOTGRANTED_DUE_TIMEOUT → next slot; any other error → break.
4. Break path: null res_oid, unfix the page, assert(false), return ER_FAILED.
Invariant — the OID is fully determined and X-locked before any byte is
written. res_oid is complete only after a successful lock_object (or
immediately for NULL_LOCK bulk), with the lock on the not-yet-written
slot — so the row lock is taken inside INSERT, and step 3’s conditional
retry means two inserters never deadlock on a tentative slot.
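The conditional-lock retry in the slot loop can be modelled in a few lines. This is a rough sketch, not the real loop: the toy_ names are hypothetical, the page is reduced to an array of free/occupied markers, the lock manager is a stub that refuses exactly one slot, and how the real code advances past a contended slot is assumed here to be "search again from the next index".

```c
#include <assert.h>

enum { TOY_LK_GRANTED, TOY_LK_TIMEOUT, TOY_LK_ERROR };
#define TOY_NULL_SLOT (-1)

/* Stub lock manager: a conditional (non-blocking) request that times out
 * only on toy_contended_slot. */
static int toy_contended_slot = TOY_NULL_SLOT;

static int
toy_try_lock_stub (int slot_id)
{
  return (slot_id == toy_contended_slot) ? TOY_LK_TIMEOUT : TOY_LK_GRANTED;
}

/* slot_free[i] != 0 marks a reusable slot; index slot_count means "append".
 * Returns the locked slot id, or TOY_NULL_SLOT if none could be locked. */
static int
toy_pick_and_lock (const int *slot_free, int slot_count)
{
  int start, slot_id, rc;

  for (start = 0; start <= slot_count; start++)
    {
      slot_id = start;
      while (slot_id < slot_count && !slot_free[slot_id])
        {
          slot_id++;            /* skip occupied slots */
        }
      rc = toy_try_lock_stub (slot_id);
      if (rc == TOY_LK_GRANTED)
        {
          return slot_id;       /* OID is complete and locked */
        }
      if (rc != TOY_LK_TIMEOUT)
        {
          break;                /* hard error */
        }
      start = slot_id;          /* contended: resume after this slot */
    }
  return TOY_NULL_SLOT;
}
```

Since the request never blocks, an inserter that loses the race simply moves on to another tentative slot instead of waiting on one that holds no committed data.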
4.6 Physical placement — heap_insert_physical and heap_alloc_new_page
heap_insert_physical writes at the exact reserved slot; it does not
choose one:
// heap_insert_physical -- src/storage/heap_file.c
assert (context->res_oid.slotid != NULL_SLOTID);    /* <- slot chosen in §4.5 */
if (spage_insert_at (thread_p, context->home_page_watcher_p->pgptr, context->res_oid.slotid,
                     context->recdes_p) != SP_SUCCESS)
  {
    er_set (ER_FATAL_ERROR_SEVERITY, ...);
    OID_SET_NULL (&context->res_oid);
    return ER_FAILED;
  }
spage_insert_at is Chapter 2’s primitive; failure is fatal (the slot was
just proven free and lockable, so SP_DOESNT_FIT means page corruption).
heap_alloc_new_page is the new-page fallback behind best-space —
called not by heap_insert_logical but by heap_stats_find_best_page
(and the bulk loader) when no page has room:
// heap_alloc_new_page -- src/storage/heap_file.c
HEAP_PAGE_SET_VACUUM_STATUS (&new_page_chain, HEAP_PAGE_VACUUM_NONE);    /* <- clean page, links nulled */
error_code = file_alloc (thread_p, &hfid->vfid, heap_vpid_init_new, &new_page_chain, new_page_vpid, &page_ptr);
if (error_code != NO_ERROR)
  {
    ASSERT_ERROR ();
    return error_code;    /* <- no watcher attached yet */
  }
pgbuf_attach_watcher (thread_p, page_ptr, PGBUF_LATCH_WRITE, hfid, home_hint_p);
It initializes a fresh HEAP_CHAIN header (Chapter 8) then attaches a
write-latched watcher, fixing the page before return. The error path
mirrors §4.2’s asymmetry: when file_alloc fails it returns the error via
ASSERT_ERROR() before any watcher is attached, leaving no page fixed.
(HEAP_CHAIN fields: Chapter 1.)
4.7 Logging hook points
After a successful insert (unless use_bulk_logging),
heap_log_insert_physical dispatches by record type and op kind (using
is_mvcc_op and is_redistribute_insert_with_delid):
- MVCC + redistribute → heap_mvcc_log_redistribute.
- MVCC, normal → heap_mvcc_log_insert.
- non-MVCC, REC_ASSIGN_ADDRESS → RVHF_INSERT undoredo, 2-byte reserved-length payload (no body yet).
- non-MVCC, REC_NEWHOME → RVHF_INSERT_NEWHOME.
- non-MVCC, else → plain RVHF_INSERT.
RVHF_* semantics are Chapter 10’s. Perfmon counters
(PSTAT_HEAP_HOME_INSERTS / BIG_INSERTS / ASSIGN_INSERTS) are keyed by
the final recdes_p->type (REC_BIGONE if §4.4 spilled).
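The five-way dispatch above reads naturally as a small decision function. The sketch below only mirrors the list; the enum values and toy_pick_log are hypothetical labels, not CUBRID identifiers, and no log record is actually built.

```c
#include <assert.h>

typedef enum
{
  TOY_REC_HOME,
  TOY_REC_NEWHOME,
  TOY_REC_BIGONE,
  TOY_REC_ASSIGN_ADDRESS
} toy_rec_type;

typedef enum
{
  TOY_LOG_MVCC_REDISTRIBUTE,    /* heap_mvcc_log_redistribute */
  TOY_LOG_MVCC_INSERT,          /* heap_mvcc_log_insert */
  TOY_LOG_INSERT_ASSIGN,        /* RVHF_INSERT with the reserved-length payload */
  TOY_LOG_INSERT_NEWHOME,       /* RVHF_INSERT_NEWHOME */
  TOY_LOG_INSERT                /* plain RVHF_INSERT */
} toy_log_kind;

/* The MVCC question is asked first; record type matters only for non-MVCC. */
static toy_log_kind
toy_pick_log (int is_mvcc_op, int is_redistribute_with_delid, toy_rec_type type)
{
  if (is_mvcc_op)
    {
      return is_redistribute_with_delid ? TOY_LOG_MVCC_REDISTRIBUTE : TOY_LOG_MVCC_INSERT;
    }
  switch (type)
    {
    case TOY_REC_ASSIGN_ADDRESS:
      return TOY_LOG_INSERT_ASSIGN;
    case TOY_REC_NEWHOME:
      return TOY_LOG_INSERT_NEWHOME;
    default:
      return TOY_LOG_INSERT;
    }
}
```

Note that a REC_ASSIGN_ADDRESS insert always lands in a non-MVCC branch, because is_mvcc_op is defined (§4.2) to be false for that record type.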
4.8 The pre-mint path — heap_assign_address
heap_assign_address reserves an OID before the row’s bytes exist
(the transformer uses it to break a circular reference).
// heap_assign_address -- src/storage/heap_file.c
if (expected_length <= 0)
  {
    /* ... heap_estimate_avg_length ... */
  }
recdes.length =    /* <- clamp to [OID_SIZE, non-big] */
  ((expected_length > SSIZEOF (OID) && !heap_is_big_length (expected_length)) ? expected_length : SSIZEOF (OID));
recdes.data = NULL;
recdes.type = REC_ASSIGN_ADDRESS;    /* <- placeholder type */
heap_create_insert_context (&insert_context, (HFID *) hfid, class_oid, &recdes, NULL);
rc = heap_insert_logical (thread_p, &insert_context, NULL);
COPY_OID (oid, &insert_context.res_oid);    /* <- hand back the minted OID */
The clamp falls back to the heap’s average object length when the length
is unknown (a big reservation reserves just OID_SIZE, the content going
to overflow later). The REC_ASSIGN_ADDRESS type makes the insert skip
header adjustment (§4.2) and log non-MVCC (§4.7); a later UPDATE
(Chapter 6) fills the slot.
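The reservation clamp is a one-line ternary that is worth checking at its edges. A minimal sketch, with toy_reserved_length as a hypothetical helper: the OID size and the big-record threshold are passed in as parameters, and the average-length estimate used for an unknown length is left out.

```c
#include <assert.h>

/* Length to reserve for a placeholder slot: the expected length when it is
 * larger than an OID and still fits a slotted page, otherwise just room
 * for an OID (a big record will live in overflow anyway). */
static int
toy_reserved_length (int expected_length, int oid_size, int max_slotted_len)
{
  int is_big = expected_length > max_slotted_len;

  return (expected_length > oid_size && !is_big) ? expected_length : oid_size;
}
```

Both extremes collapse to the OID size: a tiny estimate because the slot must later be able to hold at least a forwarding OID, a huge one because only the map stays on the home page.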
4.9 The shared relocation helper — heap_insert_newhome
heap_insert_newhome places a REC_NEWHOME body, the relocated copy of
a record that outgrew its home page. It is never reached from a logical
insert; update (Chapter 6) and delete relocation reuse it.
// heap_insert_newhome -- src/storage/heap_file.c
assert (parent_context->type == HEAP_OPERATION_DELETE || parent_context->type == HEAP_OPERATION_UPDATE);
heap_create_insert_context (&ins_context, &parent_context->hfid, &parent_context->class_oid, recdes_p, NULL);
error_code = heap_find_location_and_insert_rec_newhome (thread_p, &ins_context);    /* <- find page + spage_insert */
heap_log_insert_physical (thread_p, ..., &ins_context.res_oid, ins_context.recdes_p, false, false);    /* <- always non-MVCC */
if (out_oid_p != NULL)
  {
    COPY_OID (out_oid_p, &ins_context.res_oid);    /* <- give caller the OID */
  }
if (newhome_pg_watcher != NULL)    /* <- optional: hand page back */
  {
    pgbuf_replace_watcher (thread_p, ins_context.home_page_watcher_p, newhome_pg_watcher);
  }
Differences that matter to a modifier: it builds its own child
ins_context (§4.1’s single-use invariant); it takes no row lock
(REC_NEWHOME is reached only via its REC_RELOCATION pointer, so the
visible OID and X-lock live there, and placement uses
heap_stats_find_best_page with isnew_rec = false then spage_insert —
a page-chosen, not pre-locked, slot); logging is always non-MVCC
(RVHF_INSERT_NEWHOME) since vacuum never inspects REC_NEWHOME; and a
non-NULL newhome_pg_watcher keeps the page fixed via
pgbuf_replace_watcher so the caller sets the prev-version LSA (Chapter 6)
without re-fixing.
4.10 Chapter summary — key takeaways
- Insert is one funnel. `heap_insert_logical` drives a single-use context through header-adjust → spill → page-select+lock → physical-write → log; early failures return directly, post-fix failures `goto error`.
- The MVCCID is stamped before placement. `heap_insert_adjust_recdes_header` adds an 8-byte INSID (fast in-place path) so the page is sized against the final length; non-MVCC/bulk strip all flags.
- Oversized means a tiny home record. `heap_insert_handle_multipage_record` writes the bytes to overflow via `heap_ovf_insert`, then leaves an 8-byte `REC_BIGONE` map whose MVCC header lives in the overflow record.
- The OID is minted and X-locked before any byte is written. `heap_get_insert_location_with_lock` fixes a page, completes `res_oid`, and conditionally X-locks the tentative slot, advancing on contention.
- Physical write trusts the reservation. `heap_insert_physical` calls `spage_insert_at` at the pre-locked slot (failure is fatal); `heap_alloc_new_page` is the fallback, leaving no page fixed when `file_alloc` fails.
- Two specialized births. `heap_assign_address` mints a body-less OID (`REC_ASSIGN_ADDRESS`, filled later by UPDATE); `heap_insert_newhome` is the relocation-only twin: own context, no row lock, always non-MVCC `RVHF_INSERT_NEWHOME`, optional page-watcher hand-back.
Chapter 5: Read Path Visibility and Following the Forwarding Chain
The read path answers: given an OID (or sequential cursor), what bytes may this transaction see? The record at the OID may not hold its own data (it can be a REC_RELOCATION pointer to a REC_NEWHOME body or a REC_BIGONE pointer to an overflow page), and even once located, the MVCC snapshot may rule the current version invisible, forcing a walk back into the log. This chapter traces both, branch by branch.
The companion’s ### Read flow and ### MVCC integration — record header give the concepts; the snapshot predicate (mvcc_satisfies_snapshot) is theory we do not re-derive — we assume SNAPSHOT_SATISFIED / TOO_OLD_FOR_SNAPSHOT / TOO_NEW_FOR_SNAPSHOT are understood.
5.1 The HEAP_GET_CONTEXT workspace
Every read funnels through a HEAP_GET_CONTEXT (struct heap_get_context in heap_file.h), the scratchpad carrying the OID, latched pages, and record type between helpers. Every field:
| Field | Role | Why it exists |
|---|---|---|
| `record_type` | Dispatcher output; `REC_HOME`/`REC_RELOCATION`/`REC_BIGONE`/… | Drives per-type dispatch; slot read once. |
| `oid_p` | Home OID (input, const). | The home page+slot; never `REC_NEWHOME`. |
| `forward_oid` | Filled for `REC_RELOCATION`/`REC_BIGONE`. | The `REC_NEWHOME` body slot, or first overflow page. |
| `class_oid_p` | Class OID; may be NULL, filled from page chain. | Decides whether the class is MVCC-disabled. |
| `recdes_p` | Output bytes; NULL for header-only. | Visibility runs without copying data. |
| `scan_cache` | Owning `HEAP_SCANCACHE`, or NULL. | Copy area + `cache_last_fix_page` latch retention. |
| `home_page_watcher` | Ordered watcher on the home page. | Held across the get; handed back on cleanup. |
| `fwd_page_watcher` | Ordered watcher on forward / overflow page. | Fixed only for relocation/bigone. |
| `ispeeking` | PEEK (alias) vs COPY (into an area). | Whether data outlives the latch (see 5.7). |
| `old_chn` | Caller's cached CHN, or `NULL_CHN`. | The "client already has this version" short-circuit. |
| `latch_mode` | READ, or WRITE when scan cache demands X. | Serial increment reads under X to avoid a re-fix. |
Invariant (forward consistency). On heap_prepare_get_context returning S_SUCCESS, every caller asserts record_type == REC_HOME or (!OID_ISNULL(&forward_oid) and fwd_page_watcher.pgptr != NULL) — enforced by the assert pair atop heap_get_visible_version_internal/heap_get_last_version. If violated, heap_get_mvcc_header would dereference a NULL forward page for a relocation/bigone record — a crash.
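The invariant reduces to a single predicate over three fields. The sketch below models it on a stripped-down context; the `fc_` types and the boolean standing in for `OID_ISNULL (&forward_oid)` are hypothetical simplifications, not the real `HEAP_GET_CONTEXT` layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of the fields the invariant talks about. */
typedef enum { FC_REC_HOME, FC_REC_RELOCATION, FC_REC_BIGONE } fc_rec_type;

typedef struct
{
  fc_rec_type record_type;
  bool forward_oid_is_null;	/* models OID_ISNULL (&forward_oid) */
  const void *fwd_page;		/* models fwd_page_watcher.pgptr */
} fc_get_context;

/* True when the context is safe to hand to the header reader:
   either the data is in the home slot, or the forward OID and the
   forward page are both present. */
static bool
fc_forward_consistent (const fc_get_context *ctx)
{
  return ctx->record_type == FC_REC_HOME
    || (!ctx->forward_oid_is_null && ctx->fwd_page != NULL);
}
```

A relocation context with a forward OID but no fixed forward page fails the predicate, which is exactly the state the asserts are there to catch before a NULL page is dereferenced.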
5.2 Fixing the home page — heap_prepare_object_page
On success the watcher holds the page the OID lives on.
```c
// heap_prepare_object_page -- src/storage/heap_file.c
VPID_GET_FROM_OID (&object_vpid, oid);
if (page_watcher_p->pgptr != NULL && !VPID_EQ (pgbuf_get_vpid_ptr (page_watcher_p->pgptr), &object_vpid))
  pgbuf_ordered_unfix (thread_p, page_watcher_p);	/* <- wrong page latched; drop it */
if (page_watcher_p->pgptr == NULL)
  {
    ret = pgbuf_ordered_fix (thread_p, &object_vpid, OLD_PAGE, latch_mode, page_watcher_p);
    if (ret == ER_PB_BAD_PAGEID)
      ret = ER_HEAP_UNKNOWN_OBJECT;	/* <- bad page = "no object" */
    if (ret == ER_LK_PAGE_TIMEOUT && er_errid () == NO_ERROR)
      ret = ER_PAGE_LATCH_ABORTED;
  }
return ret;
```

Three branches: right page held → no fix; wrong page → unfix first; no page → fix. ER_PB_BAD_PAGEID → ER_HEAP_UNKNOWN_OBJECT (the caller maps it to S_DOESNT_EXIST; a dangling OID is normal). ER_LK_PAGE_TIMEOUT with no error already set → ER_PAGE_LATCH_ABORTED, the ordered-fix retry signal.
5.3 The dispatcher — heap_prepare_get_context
This fixes the home page (5.2), reads the record type, and for indirect types fixes the forward page too. It is the branching heart of the read path:
flowchart TD
A[heap_prepare_get_context] --> B[heap_prepare_object_page home]
B -->|ER_HEAP_UNKNOWN_OBJECT| Z1[S_DOESNT_EXIST]
B -->|other error| ERR[goto error: clean + S_ERROR]
B -->|ok| E{slot record_type}
E -->|NULL slot / ASSIGN_ADDRESS / MARKDELETED / DELETED_WILL_REUSE| Z2[S_DOESNT_EXIST or err]
E -->|REC_HOME| H[S_SUCCESS, home only]
E -->|REC_RELOCATION| R[peek forward_oid, fix fwd page]
E -->|REC_BIGONE| G[peek forward_oid, fix overflow page]
E -->|REC_NEWHOME direct read| ERR2[ER_HEAP_BAD_OBJECT_TYPE]
R -->|home unfixed| RT[retry once: try_again]
R -->|stable| H
G -->|fix ok| H
G -->|page_was_unfixed| AS[assert false]
Figure 5-2. Every branch of heap_prepare_get_context.
Non-obvious branches:
- `REC_RELOCATION` retry. After the forward fix, if `home_page_watcher.page_was_unfixed` shows that the ordered fix released and re-fixed the home page, the relocation link may have changed, so it loops to `try_again` once (`try_max == 1`); a second unfix raises `ER_PAGE_LATCH_ABORTED`. For `REC_BIGONE` the forward watcher re-ranks to `PGBUF_ORDERED_HEAP_OVERFLOW` and a home `page_was_unfixed` is not expected (overflow is immutable) → `assert(false)`.
- Edge cases. A non-NULL `class_oid_p` that holds a null OID is filled from the chain record; a direct read of `REC_NEWHOME` is illegal → `ER_HEAP_BAD_OBJECT_TYPE`. The `error:` label runs `heap_clean_get_context` and returns `S_ERROR`; the `S_DOESNT_EXIST`/`S_SUCCESS` paths do not clean up, so the caller owns the latches.
5.4 Reading the header — heap_get_mvcc_header
With pages latched and type known, heap_get_mvcc_header is a pure 3-way switch (context->record_type): REC_HOME peeks the home slot and parses it with or_mvcc_get_header; REC_RELOCATION does the same on forward_oid; REC_BIGONE calls heap_get_mvcc_rec_header_from_overflow; default is assert(false) → S_ERROR. The pre-condition asserts (home matches the OID, forward page matches forward_oid) are the teeth of the forward-consistency invariant of 5.1.
5.5 The visibility decision — heap_get_visible_version_internal
heap_get_visible_version is a thin wrapper. The internal function prepares the context (5.3; not S_SUCCESS → goto exit), reads the header (5.4) when a snapshot or old_chn is present, then maps the verdict:
```c
// heap_get_visible_version_internal -- src/storage/heap_file.c
snapshot_res = mvcc_snapshot->snapshot_fnc (thread_p, &mvcc_header, mvcc_snapshot);
if (snapshot_res == TOO_NEW_FOR_SNAPSHOT)	/* wanted version is older, in the log */
  {
    scan = heap_get_visible_version_from_log (..., &MVCC_GET_PREV_VERSION_LSA (&mvcc_header), ...);
    goto exit;
  }
else if (snapshot_res == TOO_OLD_FOR_SNAPSHOT)	/* dead to us; a miss */
  {
    scan = S_SNAPSHOT_NOT_SATISFIED;
    goto exit;
  }
/* else SNAPSHOT_SATISFIED falls through to the CHN check, else copy (5.7) */
if (MVCC_IS_CHN_UPTODATE (&mvcc_header, context->old_chn))	/* <- runs even with no snapshot */
  {
    scan = S_SUCCESS_CHN_UPTODATE;
    goto exit;
  }
```

The asymmetry is load-bearing: TOO_OLD is a miss, TOO_NEW is not. The wanted version is older and lives in the undo log, so we follow MVCC_GET_PREV_VERSION_LSA (5.6). The CHN short-circuit runs after the snapshot block unconditionally, firing even with no snapshot when an old_chn was passed (### CHN). heap_get_last_version is this skeleton minus the snapshot block, the snapshot-free sibling for updaters/lockers.
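The verdict mapping is small enough to express as a pure function. This is a sketch under simplifying assumptions: the `vd_` enums are hypothetical, and `VD_FOLLOW_LOG` is an invented marker for "hand off to the log walk", whereas the real function returns whatever that walk produces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three snapshot verdicts. */
typedef enum { VD_TOO_OLD, VD_SATISFIED, VD_TOO_NEW } vd_verdict;

/* Hypothetical outcomes; VD_FOLLOW_LOG is not a real scan code. */
typedef enum
{
  VD_COPY_RECORD,		/* proceed to the data copy (5.7) */
  VD_CHN_UPTODATE,		/* models S_SUCCESS_CHN_UPTODATE */
  VD_NOT_SATISFIED,		/* models S_SNAPSHOT_NOT_SATISFIED */
  VD_FOLLOW_LOG			/* walk prev_version_lsa (5.6) */
} vd_outcome;

static vd_outcome
vd_decide (bool has_snapshot, vd_verdict verdict, bool chn_uptodate)
{
  if (has_snapshot)
    {
      if (verdict == VD_TOO_NEW)
	{
	  return VD_FOLLOW_LOG;
	}
      if (verdict == VD_TOO_OLD)
	{
	  return VD_NOT_SATISFIED;
	}
    }
  /* Reached with a satisfied snapshot or with no snapshot at all. */
  if (chn_uptodate)
    {
      return VD_CHN_UPTODATE;
    }
  return VD_COPY_RECORD;
}
```

Passing `has_snapshot = false` models the snapshot-free arm: the verdict argument is ignored and only the CHN check can short-circuit the copy.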
5.6 Walking into the log — heap_get_visible_version_from_log
On TOO_NEW, older versions are reconstructed from undo records chained by prev_version_lsa:
```c
// heap_get_visible_version_from_log -- src/storage/heap_file.c
for (LSA_COPY (&process_lsa, previous_version_lsa); !LSA_ISNULL (&process_lsa);)
  {
    /* fetch page + log_get_undo_record elided */
    if (scan_code == S_DOESNT_FIT && scan_cache->is_recdes_assigned_to_area (*recdes))
      {
        scan_cache->assign_recdes_to_area (*recdes, (size_t) (-recdes->length));
        continue;
      }				/* grow + retry */
    or_mvcc_get_header (recdes, &mvcc_header);
    snapshot_res = scan_cache->mvcc_snapshot->snapshot_fnc (...);
    if (snapshot_res == SNAPSHOT_SATISFIED)
      return MVCC_IS_CHN_UPTODATE (&mvcc_header, has_chn) ? S_SUCCESS_CHN_UPTODATE : S_SUCCESS;
    else if (snapshot_res == TOO_OLD_FOR_SNAPSHOT)
      {
        assert (false);
        return S_ERROR;
      }				/* <- impossible: only older here */
    else			/* TOO_NEW */
      LSA_COPY (&process_lsa, &MVCC_GET_PREV_VERSION_LSA (&mvcc_header));	/* step back */
  }
return S_DOESNT_EXIST;		/* chain exhausted, nothing visible */
```

TOO_OLD is impossible here (only older versions live in the log) and asserts; an exhausted chain (LSA_ISNULL) → S_DOESNT_EXIST.
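The shape of the walk is easier to see on a toy chain. The sketch below is an assumption-heavy model: versions live in an array, `prev` is an array index instead of an LSA, and visibility is a single "inserted before the snapshot id" comparison rather than the real snapshot function. All `lw_` names are hypothetical.

```c
#include <assert.h>

typedef struct
{
  unsigned long insert_id;	/* toy stand-in for the version's INSID */
  int prev;			/* index of the older version; -1 models a NULL LSA */
} lw_version;

/* Toy snapshot rule: a version is visible when it was inserted before
   snapshot_id. Anything else counts as "too new". */
static int
lw_is_visible (const lw_version *v, unsigned long snapshot_id)
{
  return v->insert_id < snapshot_id;
}

/* Walk from `start` toward older versions and return the index of the first
   visible one, or -1 when the chain is exhausted (the S_DOESNT_EXIST case). */
static int
lw_find_visible (const lw_version *log, int start, unsigned long snapshot_id)
{
  for (int cur = start; cur != -1; cur = log[cur].prev)
    {
      if (lw_is_visible (&log[cur], snapshot_id))
	{
	  return cur;
	}
    }
  return -1;
}
```

With three versions inserted at ids 10, 20, and 30, a snapshot at 25 skips the newest version and stops at the middle one; a snapshot at 5 runs off the end of the chain.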
5.7 Copying the bytes — heap_get_record_data_when_all_ready and PEEK vs COPY
This helper maps the record type to a spage_get_record call and is the only place that honors ispeeking:
```c
// heap_get_record_data_when_all_ready -- src/storage/heap_file.c
switch (context->record_type)
  {
  case REC_RELOCATION:		/* never aliased -- forced COPY */
    return spage_get_record (..., fwd_page_watcher.pgptr, forward_oid.slotid, recdes_p, COPY);
  case REC_BIGONE:
    return heap_get_bigone_content (..., ispeeking, &forward_oid, recdes_p);
  case REC_HOME:		/* honors context->ispeeking */
    return spage_get_record (..., home_page_watcher.pgptr, oid_p->slotid, recdes_p, context->ispeeking);
  default:
    break;
  }
return S_ERROR;
```

PEEK vs COPY (the latch lifetime contract). PEEK points recdes->data into the latched page (zero copy, valid only while latched); COPY memcpys into a preallocated area. Entry assertion: PEEK, or COPY with a scan_cache or caller-provided recdes->data.
Invariant (REC_RELOCATION is never peeked). The relocation case hard-codes COPY; using context->ispeeking would let a PEEK scan alias forward-page memory and read garbage after heap_clean_get_context unfixes the forward watcher.
5.8 The overflow fetch — heap_get_bigone_content
REC_BIGONE data lives in an overflow file (### Overflow file); the fetch grows its area on S_DOESNT_FIT:
```c
// heap_get_bigone_content -- src/storage/heap_file.c
if (scan_cache != NULL
    && (ispeeking == PEEK || recdes->data == NULL || scan_cache->is_recdes_assigned_to_area (*recdes)))
  {
    scan_cache->assign_recdes_to_area (*recdes);
    while ((scan = heap_ovf_get (thread_p, forward_oid, recdes, NULL_CHN, NULL)) == S_DOESNT_FIT)
      {
        assert (recdes->length < 0);
        scan_cache->assign_recdes_to_area (*recdes, (size_t) (-recdes->length));
      }				/* grow + retry */
    if (scan != S_SUCCESS)
      recdes->data = NULL;	/* <- no dangling pointer */
  }
else
  scan = heap_ovf_get (thread_p, forward_oid, recdes, NULL_CHN, NULL);
```

The snapshot was already validated by the caller, so there is no re-check. The retry reallocates to -recdes->length (the negative size heap_ovf_get returns); non-success nulls recdes->data.
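The grow-and-retry protocol (a negative length reports the size needed) can be sketched with plain `realloc`. This is a toy model, not the scan-cache area: the `gr_` types and functions are hypothetical, the source is an in-memory buffer rather than an overflow file, and unlike the real code the sketch owns and frees its buffer on failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { GR_SUCCESS, GR_DOESNT_FIT, GR_ERROR } gr_scan;

typedef struct
{
  char *data;
  int area_size;		/* capacity of data */
  int length;			/* bytes filled, or -(needed size) on GR_DOESNT_FIT */
} gr_recdes;

/* Toy fetch: copy src if it fits, otherwise report the needed size as a
   negative length, the convention the text describes for heap_ovf_get. */
static gr_scan
gr_fetch (const char *src, int src_len, gr_recdes *rec)
{
  if (rec->data == NULL || rec->area_size < src_len)
    {
      rec->length = -src_len;
      return GR_DOESNT_FIT;
    }
  memcpy (rec->data, src, (size_t) src_len);
  rec->length = src_len;
  return GR_SUCCESS;
}

/* Grow the area to the reported size and retry until the fetch fits. */
static gr_scan
gr_fetch_growing (const char *src, int src_len, gr_recdes *rec)
{
  gr_scan scan;

  assert (src_len > 0);		/* sketch assumption: no empty records */
  while ((scan = gr_fetch (src, src_len, rec)) == GR_DOESNT_FIT)
    {
      int needed = -rec->length;
      char *bigger = realloc (rec->data, (size_t) needed);
      if (bigger == NULL)
	{
	  scan = GR_ERROR;
	  break;
	}
      rec->data = bigger;
      rec->area_size = needed;
    }
  if (scan != GR_SUCCESS)
    {
      free (rec->data);		/* the sketch owns the buffer */
      rec->data = NULL;		/* no dangling pointer */
      rec->area_size = 0;
    }
  return scan;
}
```

Starting from an empty `gr_recdes`, the first fetch fails with the needed size, the area grows once, and the second fetch succeeds.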
5.9 The legacy CHN fast path — heap_get_if_diff_chn
heap_get_if_diff_chn is the pre-context-refactor primitive, guarded by #if defined(ENABLE_UNUSED_FUNCTION) and not compiled into production builds. Its logic (peek only the header, skip the data COPY when scan == S_SUCCESS_CHN_UPTODATE) now lives as the CHN short-circuit in heap_get_visible_version_internal (5.5), which is the source of truth.
5.10 The scan variant — heap_next and heap_next_internal
A scan walks the page chain, iterates slots, filters non-object slots, and qualifies survivors through the same machinery (heap_scan_get_visible_version). heap_next, heap_prev, heap_next_sampling, and the *_record_info variants are one-line wrappers over heap_next_internal.
flowchart TD
A[heap_next_internal: start at hpgid/slot0, heap_get_last_vpid, or resume next_oid] --> P[outer loop: per page]
P --> PG{cache page right VPID?}
PG -->|stale| SW[stash to old_page_watcher then fetch]
PG -->|missing| FX[heap_scan_pb_lock_and_fetch]
PG -->|hit| IT
SW --> IT
FX --> IT[inner loop: spage_next_record PEEK]
IT --> T{slot type}
T -->|slot0 / REC_NEWHOME / REC_ASSIGN_ADDRESS / REC_UNKNOWN| IT
T -->|S_END| NX[heap_vpid_next]
T -->|object| QV[heap_scan_get_visible_version]
NX -->|NULL_PAGEID| END[return S_END]
NX -->|more| P
QV -->|S_SUCCESS, right class| RET[set next_oid, return S_SUCCESS]
QV -->|S_SUCCESS wrong class / NOT_SATISFIED / DOESNT_EXIST| IT
QV -->|S_ERROR| ERR[return error]
Figure 5-4. heap_next_internal page-chain + slot iteration.
Beyond the flowchart: the latch-retention stash keeps the previous page fixed in old_page_watcher until the next fix succeeds (avoiding thrash); the right page already cached → no fix. Filtering skips REC_NEWHOME (reached only via its relocation), REC_ASSIGN_ADDRESS/REC_UNKNOWN, and slot HEAP_HEADER_AND_CHAIN_SLOTID — except the get_rec_info variant (spage_next_record_dont_skip_empty) wanting every slot. cache_last_fix_page is forced true around heap_scan_get_visible_version so the home page returns for the next call; that qualifier has a REC_HOME+PEEK shortcut (MVCC_IS_HEADER_ALL_VISIBLE or snapshot satisfied → peeked recdes, no full context).
5.11 Stepping the page chain — heap_vpid_next
heap_vpid_next reads slot HEAP_HEADER_AND_CHAIN_SLOTID and returns the successor VPID:
```c
// heap_vpid_next -- src/storage/heap_file.c
if (spage_get_record (thread_p, pgptr, HEAP_HEADER_AND_CHAIN_SLOTID, &recdes, PEEK) != S_SUCCESS)
  {
    VPID_SET_NULL (next_vpid);
    ret = ER_FAILED;
  }
else
  {
    pgbuf_get_vpid (pgptr, next_vpid);
    if (next_vpid->pageid == hfid->hpgid && next_vpid->volid == hfid->vfid.volid)
      *next_vpid = ((HEAP_HDR_STATS *) recdes.data)->next_vpid;	/* <- header page */
    else
      *next_vpid = ((HEAP_CHAIN *) recdes.data)->next_vpid;	/* <- normal page */
  }
```

The single non-obvious branch: the first page stores its link inside HEAP_HDR_STATS (which embeds a chain), every other page in a bare HEAP_CHAIN. The struct is picked by comparing the current VPID against the header page's (hfid->hpgid on hfid->vfid.volid); both structs are in Chapter 1. A NULL_PAGEID terminates the walk.
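A toy version of the chain walk shows why the header-page comparison is needed. In this sketch pages sit in an array indexed by page id and volume ids are omitted; the `cw_` structs are hypothetical and carry only the fields the walk touches.

```c
#include <assert.h>

enum { CW_NULL_PAGEID = -1 };

/* Reduced, hypothetical models of the two slot-0 layouts. */
typedef struct { int next_pageid; } cw_chain;			/* models HEAP_CHAIN */
typedef struct { int num_recs; int next_pageid; } cw_hdr_stats;	/* models HEAP_HDR_STATS */

typedef struct
{
  int pageid;
  union { cw_hdr_stats hdr; cw_chain chain; } slot0;
} cw_page;

/* Pick the slot-0 layout by comparing against the header page id. */
static int
cw_next (const cw_page *page, int header_pageid)
{
  if (page->pageid == header_pageid)
    {
      return page->slot0.hdr.next_pageid;
    }
  return page->slot0.chain.next_pageid;
}

/* Count the pages reachable from the header page. */
static int
cw_count_pages (const cw_page *pages, int header_pageid)
{
  int count = 0;
  for (int id = header_pageid; id != CW_NULL_PAGEID; id = cw_next (&pages[id], header_pageid))
    {
      count++;
    }
  return count;
}
```

Reading the header page's slot 0 as a `cw_chain` would return `num_recs` as the next page id, which is the misread the branch prevents.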
5.12 Chapter summary — key takeaways
- One context, three data-bearing types. Only `REC_HOME`/`REC_RELOCATION`/`REC_BIGONE` hold data; every other type → `S_DOESNT_EXIST` (scan) or an error (point read).
- `heap_prepare_get_context` is the branch hub: it fixes home, reads the type, and for relocation/bigone fixes the forward page, with a single `try_again` retry on a mid-flight home unfix and `assert(false)` for the immutable BIGONE case.
- The forward-consistency invariant (`REC_HOME` xor non-null forward OID + forward page) is asserted at every consumer, letting `heap_get_mvcc_header` and the copier dereference the forward page blindly.
- Snapshot verdicts are asymmetric. `TOO_OLD` → `S_SNAPSHOT_NOT_SATISFIED`; `SNAPSHOT_SATISFIED` → copy; `TOO_NEW` → walk `prev_version_lsa` into the undo log until satisfied or exhausted (`S_DOESNT_EXIST`).
- PEEK aliases the page, COPY duplicates into an area: `REC_RELOCATION` is always COPY; the CHN short-circuit runs unconditionally after the snapshot block.
- The scan path reuses the point-read machinery plus `heap_vpid_next` chain walking, slot filtering, and `cache_last_fix_page` latch retention.
- `heap_get_if_diff_chn` is legacy; `heap_get_last_version` is the snapshot-free sibling for updaters/lockers.
Chapter 6: Update Flow and Record Type Transitions
A heap update is the most branch-heavy operation in the manager: the new image
may shrink, stay, or grow, and the source slot may be REC_HOME, a
REC_RELOCATION + REC_NEWHOME pair, or a REC_BIGONE overflow. CUBRID’s
contract is the OID never moves (res_oid is reset to oid), so growth is
absorbed by changing the physical representation behind the stable home slot.
This chapter traces how heap_update_logical dispatches on the current type
and how each worker chooses among in-place / relocate / overflow. For the
record-type taxonomy and MVCC header layout see Chapter 3; for insert-side
placement see Chapter 4; for the rationale see cubrid-heap-manager.md.
6.1 Entry: context creation and the MVCC/in-place fork
heap_create_update_context is a pure initializer (no I/O), recording the
OID, class OID, the new image (recdes_p), and the update_in_place style:
```c
// heap_create_update_context -- src/storage/heap_file.c
COPY_OID (&context->oid, oid_p);
// ... condensed: COPY_OID (&context->class_oid, ...); context->scan_cache_p = scancache_p; ...
context->recdes_p = recdes_p;		/* the new image */
context->type = HEAP_OPERATION_UPDATE;
context->update_in_place = in_place;	/* <- in-place vs MVCC switch */
```

UPDATE_INPLACE_STYLE (in heap_file.h) has three values: UPDATE_INPLACE_NONE
(default MVCC path — new version with fresh INSID + a prev_version_lsa chain
entry); UPDATE_INPLACE_CURRENT_MVCCID (destructive in-place rewrite stamping
the current MVCCID, no new version); UPDATE_INPLACE_OLD_MVCCID (destructive
rewrite preserving old MVCC IDs — replication / redistribution). The
whole-operation toggle is computed once in heap_update_logical via
is_mvcc_op = HEAP_UPDATE_IS_MVCC_OP (is_mvcc_class, update_in_place), where
is_mvcc_class = !mvcc_is_mvcc_disabled_class (&class_oid) and the macro is
is_mvcc_class && !HEAP_IS_UPDATE_INPLACE(style).
INVARIANT: MVCC and in-place are mutually exclusive. `is_mvcc_op` is true iff the class is MVCC-enabled and the style is `UPDATE_INPLACE_NONE`. On non-`SERVER_MODE` builds `HEAP_UPDATE_IS_MVCC_OP` is hard-coded `false`, so standalone tools always take the in-place arm.
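The toggle is a two-input truth table. A minimal sketch of the server-mode definition follows; the `mo_` enum and helpers are hypothetical stand-ins for `UPDATE_INPLACE_STYLE`, `HEAP_IS_UPDATE_INPLACE`, and `HEAP_UPDATE_IS_MVCC_OP`, and the standalone-build case (always false) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three UPDATE_INPLACE_STYLE values. */
typedef enum
{
  MO_INPLACE_NONE,
  MO_INPLACE_CURRENT_MVCCID,
  MO_INPLACE_OLD_MVCCID
} mo_style;

/* Models HEAP_IS_UPDATE_INPLACE: any style other than NONE is in-place. */
static bool
mo_is_inplace (mo_style style)
{
  return style != MO_INPLACE_NONE;
}

/* Models HEAP_UPDATE_IS_MVCC_OP as described for server-mode builds. */
static bool
mo_is_mvcc_op (bool is_mvcc_class, mo_style style)
{
  return is_mvcc_class && !mo_is_inplace (style);
}
```

Only one of the six input combinations yields an MVCC operation: an MVCC-enabled class updated with the default style.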
6.2 heap_update_logical: locate, fetch, adjust, dispatch
After the scancache, file-type (FILE_HEAP / FILE_HEAP_REUSE_SLOTS only),
and heap_is_valid_oid validations, the function fixes the home page
(heap_get_record_location), reads record_type = spage_get_record_type
(a REC_UNKNOWN slot is ER_HEAP_UNKNOWN_OBJECT), copies the home record into
home_recdes (undo source — COPY, not PEEK), calls
heap_update_adjust_recdes_header (Section 6.7) on the new image for a real
user class (!OID_IS_ROOTOID), decides do_supplemental_log (CDC), and
dispatches:
```c
// heap_update_logical -- src/storage/heap_file.c
switch (context->record_type)
  {
  case REC_RELOCATION:
    rc = heap_update_relocation (thread_p, context, is_mvcc_op);
    break;
  case REC_BIGONE:
    rc = heap_update_bigone (thread_p, context, is_mvcc_op);
    break;
  case REC_ASSIGN_ADDRESS:
    context->is_logical_old = false;	/* <- inserted this tran, not an old version */
    [[fallthrough]];
  case REC_HOME:
    rc = heap_update_home (thread_p, context, is_mvcc_op);
    break;
  default:
    rc = ER_HEAP_BAD_OBJECT_TYPE;
    goto exit;
  }
```

The REC_ASSIGN_ADDRESS fallthrough routes a just-reserved address slot
through heap_update_home flagged not logically-old. On exit the home
watcher is handed to the scancache (if cache_last_fix_page) or unfixed, and
heap_unfix_watchers releases the rest.
6.3 heap_update_home: REC_HOME as the source type
heap_update_home handles REC_HOME (and REC_ASSIGN_ADDRESS), picking one
of three destinations in strict priority — overflow, in-place, else relocation:
```c
// heap_update_home -- src/storage/heap_file.c
if (heap_is_big_length (context->recdes_p->length))
  {				/* 1. overflow */
    heap_ovf_insert (..., &forward_oid, context->recdes_p);
    heap_build_forwarding_recdes (&forwarding_recdes, REC_BIGONE, &forward_oid);
    home_page_updated_recdes_p = &forwarding_recdes;
  }
else if (!spage_is_updatable (..., context->recdes_p->length))
  {				/* 3. relocate */
    context->recdes_p->type = REC_NEWHOME;
    heap_insert_newhome (..., context->recdes_p, &forward_oid, newhome_pg_watcher_p);
    heap_build_forwarding_recdes (&forwarding_recdes, REC_RELOCATION, &forward_oid);
    home_page_updated_recdes_p = &forwarding_recdes;
  }
else
  {				/* 2. in place: stays REC_HOME */
    context->recdes_p->type = REC_HOME;
    home_page_updated_recdes_p = context->recdes_p;
  }
```

These are rows 1–3 of Section 6.8. After choosing, a re-peek guard fires
when the destination is a forwarder and home_page_watcher_p->page_was_unfixed
(the page was released during the ordered second-page fix, possibly
vacuumed/compacted): it re-reads home_recdes so the logged undo image matches
current bytes. It then logs (heap_log_update_physical, RVHF_UPDATE_NOTIFY_VACUUM
for MVCC else RVHF_UPDATE; a live REC_ASSIGN_ADDRESS also uses
NOTIFY_VACUUM), captures prev_version_lsa, calls heap_update_physical, and
for MVCC ops calls heap_update_set_prev_version.
INVARIANT: REC_ASSIGN_ADDRESS is never MVCC-updated. An early guard (`!HEAP_IS_UPDATE_INPLACE && home_recdes.type == REC_ASSIGN_ADDRESS`) returns `ER_FAILED` with `assert(false)`: a reservation has no MVCC header to version, so only the non-MVCC in-place arm fills it.
flowchart TD
A["heap_update_home"] --> Z{REC_ASSIGN_ADDRESS\n and MVCC op?}
Z -->|yes| ZF["assert false; ER_FAILED"]
Z -->|no| B{heap_is_big_length?}
B -->|yes| C["REC_BIGONE forwarder"]
B -->|no| D{spage_is_updatable home?}
D -->|no| E["REC_RELOCATION + REC_NEWHOME"]
D -->|yes| F["REC_HOME in place"]
C --> H["re-peek if unfixed; log; physical; set_prev_version if mvcc"]
E --> H
F --> H
Figure 6-1: heap_update_home branch tree.
6.4 heap_update_relocation: the REC_RELOCATION + REC_NEWHOME source
The densest worker. The home slot holds a REC_RELOCATION OID pointing at a
REC_NEWHOME on the forward page. It reads forward_oid, fixes the forward
page (heap_fix_forward_page), and computes two predicates:
```c
// heap_update_relocation -- src/storage/heap_file.c
fits_in_home = spage_is_updatable (... home slot ..., context->recdes_p->length);
fits_in_forward = spage_is_updatable (... forward slot ..., context->recdes_p->length);
if (heap_is_big_length (context->recdes_p->length) || (!fits_in_forward && !fits_in_home))
  heap_fix_header_page (thread_p, context);	/* header page needed for overflow or a new newhome */
```

A four-way decision then sets three booleans (update_old_home,
update_old_forward, remove_old_forward) and the new home image — rows 4–7
of Section 6.8, plus an impossible else (assert(false); ER_FAILED).
fits_in_home beats fits_in_forward, but only after the fits-nowhere case
has spawned a new relocation. Before any I/O, two asserts encode the bookkeeping
invariant: assert (remove_old_forward != update_old_forward) and
assert (remove_old_forward == update_old_home).
INVARIANT: the stale REC_NEWHOME is reconciled exactly once. The old `REC_NEWHOME` is either deleted (`remove_old_forward`, when the home slot changes meaning) or overwritten (`update_old_forward`, stay-relocated), never both, never neither. The two asserts above crash on violation in debug builds: removed XOR updated, and removed iff the home is rewritten.
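The four-way decision and its two asserts can be modeled as a pure function of three inputs. This is a sketch reconstructed from the description and the Section 6.8 matrix, not the source: the `rl_` names and the `outcome` field are hypothetical, and the branch order follows the text (big, fits nowhere, fits home, fits forward).

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
  RL_TO_BIGONE,			/* home becomes a REC_BIGONE forwarder */
  RL_NEW_RELOCATION,		/* home points at a fresh REC_NEWHOME */
  RL_TO_HOME,			/* image moves back into the home slot */
  RL_STAY_RELOCATED		/* old REC_NEWHOME overwritten in place */
} rl_outcome;

typedef struct
{
  rl_outcome outcome;
  bool update_old_home;
  bool update_old_forward;
  bool remove_old_forward;
} rl_plan;

static rl_plan
rl_decide (bool is_big, bool fits_in_home, bool fits_in_forward)
{
  rl_plan plan = { RL_STAY_RELOCATED, false, false, false };

  if (is_big)
    {
      plan.outcome = RL_TO_BIGONE;
      plan.update_old_home = true;
      plan.remove_old_forward = true;
    }
  else if (!fits_in_forward && !fits_in_home)
    {
      plan.outcome = RL_NEW_RELOCATION;
      plan.update_old_home = true;
      plan.remove_old_forward = true;
    }
  else if (fits_in_home)
    {
      plan.outcome = RL_TO_HOME;
      plan.update_old_home = true;
      plan.remove_old_forward = true;
    }
  else
    {
      /* Only fits_in_forward can be true here. */
      plan.update_old_forward = true;
    }

  /* The bookkeeping invariant: removed XOR updated, removed iff home rewritten. */
  assert (plan.remove_old_forward != plan.update_old_forward);
  assert (plan.remove_old_forward == plan.update_old_home);
  return plan;
}
```

Every input combination passes both asserts in this model, which is the point of the invariant: no branch can leave the old `REC_NEWHOME` both kept and orphaned.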
Up to three conditional blocks then run (home rewrite via
heap_log_update_physical; old-newhome free via heap_log_delete_physical +
heap_delete_physical; or in-place forward rewrite logging
heap_mvcc_log_home_no_change). The stay-relocated branch sets
prev_version_lsa to the forward-update LSA; the others use the
heap_log_delete_physical undo LSA of the deleted old newhome — that record
is the prior version. MVCC ops then call heap_update_set_prev_version with
newhome_pg_watcher_p (fresh newhome) or the existing forward watcher.
6.5 heap_update_bigone: the REC_BIGONE source
The home slot holds a REC_BIGONE forwarder; the body lives in the overflow
file. ovf_oid is read from home_recdes.data, the header page fixed, and —
for MVCC ops — the old overflow content is read and logged under
RVHF_MVCC_UPDATE_OVERFLOW before anything changes:
```c
// heap_update_bigone -- src/storage/heap_file.c
context->ovf_oid = *((OID *) context->home_recdes.data);
heap_fix_header_page (thread_p, context);
if (is_mvcc_op)
  {
    heap_get_bigone_content (... &context->ovf_oid, &ovf_recdes);	/* old version image */
    log_append_undo_recdes2 (thread_p, RVHF_MVCC_UPDATE_OVERFLOW, &ovf_vfid, first_pgptr, -1, &ovf_recdes);
    or_mvcc_set_log_lsa_to_record (context->recdes_p, logtb_find_current_tran_lsa (thread_p));	/* prev_version_lsa */
  }
```

INVARIANT: the overflow path stamps prev_version before the body changes. For `REC_BIGONE` the prev-version LSA is the `RVHF_MVCC_UPDATE_OVERFLOW` undo-record LSA, wired into the new image's header here because `heap_update_set_prev_version` (6.7) is not called by this worker.
The body update then forks three ways — rows 8–10 of Section 6.8:
```c
// heap_update_bigone -- src/storage/heap_file.c
if (heap_is_big_length (context->recdes_p->length))
  {				/* overflow -> overflow */
    is_old_home_updated = false;
    heap_ovf_update (..., &context->ovf_oid, context->recdes_p);
    if (is_mvcc_op)
      heap_mvcc_log_home_no_change (...);	/* vacuum must still reach new overflow */
  }
else if (spage_update (..., context->recdes_p) == SP_SUCCESS)
  {				/* overflow -> home */
    is_old_home_updated = true;
    context->record_type = context->recdes_p->type = REC_HOME;
    spage_update_record_type (..., REC_HOME);
    new_home_recdes = *context->recdes_p;
  }
else
  {				/* overflow -> relocation */
    context->recdes_p->type = REC_NEWHOME;
    heap_insert_newhome (..., context->recdes_p, &newhome_oid, NULL);	/* + REC_RELOCATION forwarder */
    is_old_home_updated = true;
  }
```

Cleanup is keyed by is_old_home_updated: false in the stay-overflow branch
(only heap_mvcc_log_home_no_change under MVCC); true in both shrinking
branches (overflow → home, overflow → relocation), where a trailing block logs
the home change (RVHF_UPDATE_NOTIFY_VACUUM / RVHF_UPDATE) and heap_ovf_delete
reclaims the orphan overflow chain. A NULL from heap_ovf_update propagates as
ASSERT_ERROR_AND_SET; goto exit.
6.6 heap_insert_newhome and heap_ovf_update — the placement helpers
heap_insert_newhome is shared by all three workers (and the delete path) to
materialize a relocated REC_NEWHOME: it builds a fresh insert context on
the parent’s HFID/class, places the record via best-space search (Chapter 9),
logs non-MVCC under RVHF_INSERT_NEWHOME (§4.9; vacuum is not notified because
it never scans REC_NEWHOME), and copies the OID out:

```c
// heap_insert_newhome -- src/storage/heap_file.c
heap_create_insert_context (&ins_context, &parent_context->hfid, &parent_context->class_oid, recdes_p, NULL);
heap_find_location_and_insert_rec_newhome (thread_p, &ins_context);
heap_log_insert_physical (... ins_context.recdes_p, false, false);	/* non-MVCC: RVHF_INSERT_NEWHOME (see 4.9) */
if (out_oid_p != NULL)
  COPY_OID (out_oid_p, &ins_context.res_oid);
if (newhome_pg_watcher != NULL)	/* hand fixed page back */
  pgbuf_replace_watcher (thread_p, ins_context.home_page_watcher_p, newhome_pg_watcher);
```

When newhome_pg_watcher is non-NULL (the MVCC relocate branches), the
newly-fixed page is handed back via pgbuf_replace_watcher rather than
unfixed, so heap_update_set_prev_version can later patch the prev-version
field in place. heap_ovf_update is a thin wrapper resolving the overflow VFID
(heap_ovf_find_vfid) and delegating to overflow_update, returning ovf_oid
or NULL.
6.7 Header stamping: heap_update_adjust_recdes_header and heap_update_set_prev_version
These two split the MVCC version-chain work across the operation’s timeline.
heap_update_adjust_recdes_header runs up front (from heap_update_logical),
rewriting the new image’s header with a NULL prev-version LSA (the undo LSA is
unknown until logging). Its optimized path (MVCC op, image not big, source
header has no DELID) reserves room for INSID + prev-version LSA in one
memmove, stamps the fresh MVCCID, and writes a placeholder NULL LSA:
```c
// heap_update_adjust_recdes_header -- src/storage/heap_file.c
update_mvcc_flags = OR_MVCC_FLAG_VALID_INSID | OR_MVCC_FLAG_VALID_PREV_VERSION;
mvcc_id = logtb_get_current_mvccid (thread_p);
if ((mvcc_flags & update_mvcc_flags) != update_mvcc_flags)
  {
    repid_and_flag_bits |= (update_mvcc_flags << OR_MVCC_FLAG_SHIFT_BITS);
    memmove (new_data_p, existing_data_p, ...);	/* room for INSID + LSA; ... condensed ... */
  }
OR_PUT_BIGINT (new_ins_mvccid_pos_p, &mvcc_id);	/* fresh INSID */
memcpy (new_ins_mvccid_pos_p + OR_MVCCID_SIZE, &null_lsa, ...);	/* placeholder LSA */
```

The slow path (or_mvcc_get_header + per-flag editing) handles the rest:
UPDATE_INPLACE_OLD_MVCCID keeps the old INSID; the non-MVCC arm strips all
MVCC flags (MVCC_CLEAR_ALL_FLAG_BITS); big records force
OR_MVCC_MAX_HEADER_SIZE. Every MVCC arm leaves prev_version_lsa NULL.
heap_update_set_prev_version runs after the physical update and logging,
patching the now-known undo LSA into the record via PEEK (no spage_update),
dispatching on the home slot’s current type:
```c
// heap_update_set_prev_version -- src/storage/heap_file.c
spage_get_record (... home_pg_watcher->pgptr, oid->slotid, &recdes, PEEK);
if (recdes.type == REC_HOME)
  {				/* patch home record */
    or_mvcc_set_log_lsa_to_record (&recdes, prev_version_lsa);
  }
else if (recdes.type == REC_RELOCATION)
  {				/* follow to REC_NEWHOME, patch it */
    forward_oid = *((OID *) recdes.data);
    spage_get_record (... fwd_pg_watcher->pgptr, forward_oid.slotid, &forward_recdes, PEEK);
    or_mvcc_set_log_lsa_to_record (&forward_recdes, prev_version_lsa);
  }
else if (recdes.type == REC_BIGONE)
  {				/* patch overflow OR header */
    forward_recdes.data = overflow_get_first_page_data (overflow_pg_watcher.pgptr);
    or_mvcc_set_log_lsa_to_record (&forward_recdes, prev_version_lsa);
  }
else
  {
    assert (false);
    error_code = ER_FAILED;
  }
/* each arm pgbuf_set_dirty's its page */
```

INVARIANT: prev_version_lsa points at the undo image of the immediately prior version. `heap_update_adjust_recdes_header` reserves the slot; `heap_update_set_prev_version` (home/relocation) or the inline stamp in `heap_update_bigone` fills it with the logging-time LSA. The `REC_RELOCATION` arm needs `fwd_pg_watcher` on the forward page, hence the `pgbuf_replace_watcher` handoff in 6.6.
6.8 The full old-type × new-size transition matrix
Combining the three workers, every reachable transition is:
| Source type | Size condition | Resulting home type | Body action |
|---|---|---|---|
| `REC_HOME` | big | `REC_BIGONE` | new overflow inserted |
| `REC_HOME` | not big, no home fit | `REC_RELOCATION` | new `REC_NEWHOME` |
| `REC_HOME` | not big, fits home | `REC_HOME` | in place |
| `REC_RELOCATION` | big | `REC_BIGONE` | new overflow; old newhome deleted |
| `REC_RELOCATION` | not big, fits neither | `REC_RELOCATION` | new newhome; old newhome deleted |
| `REC_RELOCATION` | not big, fits home | `REC_HOME` | image into home; old newhome deleted |
| `REC_RELOCATION` | not big, fits fwd | `REC_RELOCATION` (unchanged) | old newhome updated in place |
| `REC_BIGONE` | big | `REC_BIGONE` (unchanged) | `heap_ovf_update` in place |
| `REC_BIGONE` | not big, `spage_update` ok | `REC_HOME` | overflow deleted |
| `REC_BIGONE` | not big, `spage_update` fails | `REC_RELOCATION` | new newhome; overflow deleted |
The unifying priority across all three sources is overflow > home >
relocate. heap_update_relocation adds one specialization: reusing the
already-fixed forward slot (stay-relocated) beats allocating a new newhome.
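The matrix and its priority rule can be condensed into one decision function. The following is a minimal sketch, not CUBRID code: the `toy_` types and the three boolean inputs are hypothetical stand-ins for the size and fit checks the workers perform, and it assumes the home slot is tried before the forward slot, as the priority statement above says.

```c
#include <stdbool.h>

/* Hypothetical toy types; they mirror the table, not CUBRID's definitions. */
typedef enum { TOY_HOME, TOY_RELOCATION, TOY_BIGONE } toy_home_type;

typedef struct
{
  toy_home_type new_type;   /* resulting home-slot type */
  bool allocates_newhome;   /* true when a fresh REC_NEWHOME is inserted */
} toy_transition;

/* fits_fwd is only meaningful when old_type is TOY_RELOCATION. */
static toy_transition
toy_update_transition (toy_home_type old_type, bool is_big,
                       bool fits_home, bool fits_fwd)
{
  toy_transition t = { TOY_HOME, false };

  if (is_big)
    t.new_type = TOY_BIGONE;          /* overflow wins first */
  else if (fits_home)
    t.new_type = TOY_HOME;            /* then the home slot */
  else if (old_type == TOY_RELOCATION && fits_fwd)
    t.new_type = TOY_RELOCATION;      /* stay relocated: reuse the forward slot */
  else
    {
      t.new_type = TOY_RELOCATION;    /* last resort: allocate a new newhome */
      t.allocates_newhome = true;
    }
  return t;
}
```

Each table row corresponds to one combination of inputs; for example, a relocated record whose new image fits only the forward slot stays `TOY_RELOCATION` without allocating anything.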
6.9 Chapter summary — key takeaways
- heap_update_logical dispatches on the current home-slot type into one of three workers; the OID never moves (res_oid = oid), only the physical representation behind it changes.
- The MVCC-vs-in-place fork is one boolean (is_mvcc_op = is_mvcc_class && update_in_place == UPDATE_INPLACE_NONE), asserted mutually exclusive; the in-place styles stamp the current or preserve the old MVCCID.
- Each worker chooses its destination in the priority overflow > home > relocate; heap_update_relocation adds a stay-in-forward specialization preferring the already-fixed forward slot.
- heap_update_relocation’s three booleans are guarded by two asserts that reconcile the stale REC_NEWHOME exactly once: deleted when the home meaning changes, updated when it stays relocated, never both or neither.
- Header stamping is split in time: heap_update_adjust_recdes_header reserves INSID + a NULL prev-version LSA up front; the LSA is patched later (by heap_update_set_prev_version, or inline for bigone under RVHF_MVCC_UPDATE_OVERFLOW), keeping the read-path version chain (Chapter 5) intact.
Chapter 7: Delete Flow and Tombstoning
A DELETE in CUBRID is not “remove the bytes.” Under MVCC the record
body must survive so a snapshot that began before the delete committed
can still see the old row. An MVCC delete is therefore almost
identical to the in-place update of
Chapter 6 — it
stamps a delete MVCCID on the header and leaves the payload intact — and
only physical deletes (non-MVCC tables, plus the eventual freeing of
forwarded bodies) call spage_delete. This chapter answers: how does
DELETE leave the record in place for readers, where does it diverge from
UPDATE, and when is a slot physically torn out instead? (Reading the
stamped mvcc_del_id back is
Chapter 5.)
7.1 Entry point: heap_create_delete_context and the delete dispatcher
DELETE enters through a HEAP_OPERATION_CONTEXT from
heap_create_delete_context. Unlike the update context-builder
(Chapter 6) it is deliberately bare — there is no recdes_p: the
new record is synthesized during the delete from the record already
on the page, so the caller supplies nothing but the OID.
```c
// heap_create_delete_context -- src/storage/heap_file.c
heap_clear_operation_context (context, hfid_p);
COPY_OID (&context->oid, oid_p);
COPY_OID (&context->class_oid, class_oid_p);
context->scan_cache_p = scancache_p;
context->type = HEAP_OPERATION_DELETE;  /* <- no recdes_p */
```
heap_delete_logical drives the branches mapped in Figure 7-1: input
validation (heap_is_valid_oid / heap_scancache_check_with_hfid,
failing to ER_FAILED before any page is touched); a file-type guard
(anything but FILE_HEAP/FILE_HEAP_REUSE_SLOTS is fatal); the
root-class case (heap_mark_class_as_modified); the MVCC decision
(invariant below); locate under X_LOCK via heap_get_record_location and
spage_get_record_type (a REC_UNKNOWN slot raises ER_HEAP_UNKNOWN_OBJECT); a COPY snapshot into context->home_recdes; and the record_type dispatch:
```c
// heap_delete_logical -- src/storage/heap_file.c
#if defined (SERVER_MODE)
  if (mvcc_is_mvcc_disabled_class (&context->class_oid))
    is_mvcc_op = false;   /* <- catalog/system class */
  else
    is_mvcc_op = true;    /* <- ordinary user table */
#else
  is_mvcc_op = false;     /* <- standalone (SA) mode */
#endif
  // ... condensed ...
  switch (context->record_type)
    {
    case REC_BIGONE:
      rc = heap_delete_bigone (thread_p, context, is_mvcc_op);
      break;
    case REC_RELOCATION:
      rc = heap_delete_relocation (thread_p, context, is_mvcc_op);
      break;
    case REC_HOME:
    case REC_ASSIGN_ADDRESS:
      rc = heap_delete_home (thread_p, context, is_mvcc_op);
      break;
    default:   /* REC_NEWHOME, REC_MARKDELETED, ... reached directly => bug */
      er_set (..., ER_HEAP_BAD_OBJECT_TYPE, ...);
      rc = ER_FAILED;
      goto error;
    }
```
flowchart TD
A["heap_delete_logical"] --> B{"valid oid &\nFILE_HEAP?"}
B -- no --> ERR["ER_FAILED / fatal"]
B -- yes --> D{"mvcc disabled\nclass / SA?"}
D -- yes --> E["is_mvcc_op=false"]
D -- no --> F["is_mvcc_op=true"]
E --> H{"record_type"}
F --> H
H -- REC_HOME / REC_ASSIGN_ADDRESS --> I["heap_delete_home"]
H -- REC_RELOCATION --> J["heap_delete_relocation"]
H -- REC_BIGONE --> K["heap_delete_bigone"]
H -- other --> L["ER_HEAP_BAD_OBJECT_TYPE"]
Figure 7-1: heap_delete_logical branch map (rootclass + X_LOCK
locate elided). Delete uses the same three-way worker split (home / relocation / bigone) as update.
Invariant 7-A: the MVCC/physical decision is made once, from the class, and threaded down unchanged.
is_mvcc_op is computed once (!mvcc_is_mvcc_disabled_class on the class OID, false in standalone mode) and passed to every worker; they never re-derive it. If it were computed wrong, a user row would be physically deleted and vanish from snapshots that should still see it.
7.2 heap_delete_home — the REC_HOME / REC_ASSIGN_ADDRESS path
heap_delete_home shares the spine of the update worker
heap_update_home (Chapter 6) — read flags, build via a fast/slow path,
classify, relocate if it no longer fits — but diverges precisely:
| Aspect | UPDATE worker (Ch 6) | DELETE worker (this chapter) |
|---|---|---|
| New payload | caller’s context->recdes_p | synthesized from the existing record |
| Header stamp | new repid/data, prev_version_lsa rewritten | mvcc_del_id set; prev_version_lsa NOT written |
| Size delta | arbitrary (shrink or grow a lot) | grows by at most OR_MVCCID_SIZE (8B) if DELID was absent |
| Body bytes | replaced | copied through verbatim |
| Forwarder/overflow | may be created or freed | touched only when it must be (REC_BIGONE: edit in place) |
Invariant 7-B: an MVCC delete never writes prev_version_lsa.
A delete creates no new version, so heap_delete_adjust_header sets only OR_MVCC_FLAG_VALID_DELID + the DELID, and the fast path preserves existing prev-version bytes by copying from delid_offset onward. Break this and a reader following the chain from a later version loops back into a dead record.
After re-fetching the record if the page was unfixed (the
vacuum-might-have-shrunk-it guard), heap_delete_home splits on
is_mvcc_op. The MVCC branch builds the death-stamped record by one
of two paths gated by use_optimization:
```c
// heap_delete_home -- src/storage/heap_file.c
repid_and_flag_bits = OR_GET_MVCC_REPID_AND_FLAG (context->home_recdes.data);
mvcc_flags = (repid_and_flag_bits >> OR_MVCC_FLAG_SHIFT_BITS) & OR_MVCC_FLAG_MASK;
adjusted_size = context->home_recdes.length;
use_optimization = true;
if (!(mvcc_flags & OR_MVCC_FLAG_VALID_DELID))
  {
    adjusted_size += OR_MVCCID_SIZE;   /* <- 8 more bytes for DELID */
    is_adjusted_size_big = heap_is_big_length (adjusted_size);
    if (is_adjusted_size_big)
      use_optimization = false;        /* rare: spills to overflow */
  }
else
  { /* DELID already set: re-delete of vacuum-pending row */
    is_adjusted_size_big = false;
    use_optimization = false;
  }
```
The fast path (use_optimization; DELID absent, result small) is
pure byte surgery — copy [0, delid_offset), OR
OR_MVCC_FLAG_VALID_DELID into the leading word, splice in the 8-byte
MVCCID, copy the remainder, no header parse. The slow path (DELID
present, or the record would become big) calls or_mvcc_get_header,
heap_delete_adjust_header, re-serializes with or_mvcc_add_header,
then memcpys the body.
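The fast path's byte surgery can be illustrated on a toy record. This is a sketch under loud assumptions: the one-byte flag field at offset 0, the `TOY_FLAG_VALID_DELID` bit value, the host-byte-order MVCCID, and the function name are all hypothetical; CUBRID's real header layout and macros differ. Only the copy / OR / splice / copy sequence is taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_FLAG_VALID_DELID 0x02u   /* hypothetical bit, not CUBRID's value */
#define TOY_MVCCID_SIZE 8            /* matches the 8-byte DELID in the text */

/* Builds the death-stamped record in dst without parsing the header.
 * Byte 0 of the toy record is the flag byte; delid_offset is where the
 * DELID slot must appear. Returns the new length, or 0 on bad input. */
static size_t
toy_stamp_delid (const uint8_t *src, size_t src_len, size_t delid_offset,
                 uint64_t delid, uint8_t *dst, size_t dst_cap)
{
  size_t new_len = src_len + TOY_MVCCID_SIZE;

  if (delid_offset == 0 || delid_offset > src_len || new_len > dst_cap)
    return 0;

  memcpy (dst, src, delid_offset);                       /* copy [0, delid_offset) */
  dst[0] |= TOY_FLAG_VALID_DELID;                        /* OR the flag into the leading field */
  memcpy (dst + delid_offset, &delid, TOY_MVCCID_SIZE);  /* splice in the 8-byte id */
  memcpy (dst + delid_offset + TOY_MVCCID_SIZE,          /* copy the remainder verbatim */
          src + delid_offset, src_len - delid_offset);
  return new_len;
}
```

Everything after `delid_offset` is copied through untouched, which is the property Invariant 7-B relies on for any prev-version bytes that follow.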
```c
// heap_delete_adjust_header -- src/storage/heap_file.c
MVCC_SET_FLAG_BITS (header_p, OR_MVCC_FLAG_VALID_DELID);
MVCC_SET_DELID (header_p, mvcc_id);   /* <- the death stamp, nothing else */
if (need_mvcc_header_max_size)
  HEAP_MVCC_SET_HEADER_MAXIMUM_SIZE (header_p);   /* <- only when spilling to REC_BIGONE */
```
The built record is classified (REC_BIGONE if
is_adjusted_size_big, else REC_NEWHOME if !spage_is_updatable (... built_recdes.length), else REC_HOME) and acted on (Figure 7-2).
REC_HOME (common) does no relocation: heap_mvcc_log_delete (..., RVHF_MVCC_DELETE_REC_HOME), then heap_update_physical overwrites the
slot in place, OID and body unchanged. REC_NEWHOME (no longer fits)
calls heap_insert_newhome, builds a REC_RELOCATION forwarder via
heap_build_forwarding_recdes, logs heap_mvcc_log_home_change_on_delete,
then heap_update_physical writes the home. REC_BIGONE (crossed the
big-length threshold) is identical but with heap_ovf_insert to overflow.
Non-MVCC branch (is_mvcc_op == false): no header games — call
heap_log_delete_physical (..., is_reusable, ...) with
is_reusable = heap_is_reusable_oid (context->file_type), then
heap_delete_physical. That flag, not the page, drives slot
recyclability (§7.5).
flowchart TD
A["heap_delete_home"] --> B{"is_mvcc_op?"}
B -- no --> P["log + heap_delete_physical"]
B -- yes --> C{"DELID present\nor size big?"}
C -- no, small --> D["fast: set DELID flag + MVCCID"]
C -- yes/rare --> E["slow: get/adjust/add header"]
D --> F{"classify"}
E --> F
F -- REC_HOME --> G["RVHF_MVCC_DELETE_REC_HOME\nupdate in place"]
F -- REC_NEWHOME --> H["heap_insert_newhome\nREC_RELOCATION in home"]
F -- REC_BIGONE --> I["heap_ovf_insert\nREC_BIGONE in home"]
Figure 7-2 — heap_delete_home. The MVCC branch can promote a home
record to relocated/big when the 8-byte DELID overflows its slot.
7.3 heap_delete_relocation — the already-forwarded path
When the home slot is a REC_RELOCATION forwarder, the body lives in a
REC_NEWHOME slot on the forward page. heap_delete_relocation fixes
the forward page (heap_fix_forward_page), peeks the forward record,
and (MVCC branch) builds the death-stamped record with the §7.2
fast/slow split. Three booleans decide where it lands:
| Branch | remove_old_forward | update_old_forward | update_old_home | New home record |
|---|---|---|---|---|
| is_adjusted_size_big | yes | no | yes | REC_BIGONE forwarder |
| fits_in_home | yes | no | yes | REC_HOME (body folded back into home) |
| fits_in_forward | no | yes | no | unchanged (forward updated in place) |
| else | yes | no | yes | REC_RELOCATION to a fresh NEWHOME |
Epilogues: when update_old_home is set, the home slot is rewritten
(heap_mvcc_log_home_change_on_delete + heap_update_physical);
otherwise (the fits_in_forward case) heap_mvcc_log_home_no_change is
logged to advance vacuum status. update_old_forward rewrites the forward
slot in place (heap_mvcc_log_delete (..., RVHF_MVCC_DELETE_REC_NEWHOME));
remove_old_forward frees the old forward slot
(log_append_undoredo_recdes (RVHF_DELETE, ...) + heap_delete_physical),
appending a RVHF_MARK_REUSABLE_SLOT postpone for reusable-OID heaps
since the relocated body is unreferenced.
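The branch table and its epilogues reduce to a small planning function. The sketch below is illustrative only: `toy_reloc_plan` and `toy_plan_relocation_delete` are hypothetical names, and the branch order (big, then home, then forward, then fresh newhome) is read off the table above.

```c
#include <stdbool.h>

/* Hypothetical bundle of the three booleans from the table. */
typedef struct
{
  bool remove_old_forward;
  bool update_old_forward;
  bool update_old_home;
} toy_reloc_plan;

static toy_reloc_plan
toy_plan_relocation_delete (bool is_adjusted_size_big, bool fits_in_home,
                            bool fits_in_forward)
{
  toy_reloc_plan p = { false, false, false };

  if (!is_adjusted_size_big && !fits_in_home && fits_in_forward)
    {
      p.update_old_forward = true;   /* stamp the forward slot in place */
    }
  else
    {
      /* big, fits-in-home, and the final else all replace the old body */
      p.remove_old_forward = true;
      p.update_old_home = true;
    }
  return p;
}

/* The old forward slot is reconciled exactly once: removed or updated. */
static bool
toy_plan_is_consistent (toy_reloc_plan p)
{
  return p.remove_old_forward != p.update_old_forward;
}

/* Invariant 7-C: the no-change record is logged whenever home is untouched. */
static bool
toy_logs_home_no_change (toy_reloc_plan p)
{
  return !p.update_old_home;
}
```

Only the fits_in_forward row leaves the home slot alone, so it is the only row for which `toy_logs_home_no_change` returns true.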
Non-MVCC branch: physically delete both slots (home then forward,
each heap_log_delete_physical + heap_delete_physical); the forward
slot is always logged mark_reusable = true regardless of heap type,
since a relocated record is never referenced by an index.
Invariant 7-C: heap_mvcc_log_home_no_change runs on every MVCC delete that does not rewrite the home slot.
Vacuum walks the page’s vacuum chain (Chapter 8), keyed off its max MVCCID; a delete touching only the forward/overflow page would leave the home chain unaware and vacuum would never revisit. The call updates the chain via heap_page_update_chain_after_mvcc_op and, on a status flip, ORs HEAP_RV_FLAG_VACUUM_STATUS_CHANGE into the log offset so recovery rebuilds it.
7.4 heap_delete_bigone — editing the fixed-size overflow header
A REC_BIGONE home slot holds only an overflow OID; the real record
(with its full-size MVCC header) sits on the overflow page. Rather than
move the body, the MVCC branch edits the header in place on the
overflow page:
```c
// heap_delete_bigone -- src/storage/heap_file.c
overflow_oid = *((OID *) context->home_recdes.data);   /* home holds only the OID */
// ... fix overflow page WRITE, check PAGE_OVERFLOW ptype ...
heap_get_mvcc_rec_header_from_overflow (... &overflow_header ...);
heap_mvcc_log_delete (thread_p, &log_addr, RVHF_MVCC_DELETE_OVERFLOW);
heap_delete_adjust_header (&overflow_header, mvcc_id, false);   /* <- false: already max size */
rc = heap_set_mvcc_rec_header_on_overflow (context->overflow_page_watcher_p->pgptr, &overflow_header);
```
heap_set_mvcc_rec_header_on_overflow is the dedicated mutator. Because
overflow headers are always stored at maximum size (every flag present,
slots pre-reserved), the DELID is written without moving a payload byte:
```c
// heap_set_mvcc_rec_header_on_overflow -- src/storage/heap_file.c
ovf_recdes.data = overflow_get_first_page_data (ovf_page);
ovf_recdes.area_size = ovf_recdes.length = OR_HEADER_SIZE (ovf_recdes.data);
// force INSID slot present (MVCCID_ALL_VISIBLE) if absent ...
if (!MVCC_IS_FLAG_SET (mvcc_header, OR_MVCC_FLAG_VALID_DELID))
  { /* force DELID slot present */
    MVCC_SET_FLAG_BITS (mvcc_header, OR_MVCC_FLAG_VALID_DELID);
    MVCC_SET_DELID (mvcc_header, MVCCID_NULL);
  }
return or_mvcc_set_header (&ovf_recdes, mvcc_header);   /* <- overwrites only the header region */
```
After stamping overflow the home slot is not rewritten, but
heap_mvcc_log_home_no_change is still logged against it so the home
page’s vacuum chain advances (Invariant 7-C); the forwarder keeps
pointing at the overflow record and is freed only on the non-MVCC branch.
Non-MVCC branch: free both — heap_log_delete_physical +
heap_delete_physical removes the home forwarder slot, then
heap_ovf_delete releases the overflow chain (the only delete path that
calls it).
Invariant 7-D: the overflow header is fixed at OR_MVCC_MAX_HEADER_SIZE, which is why REC_BIGONE deletes never relocate.
A compact home/relocation header can grow past a slot/big-length boundary when the 8-byte DELID is added (§7.2, §7.3); an overflow header always carries every slot (the mutator forces INSID and DELID present), so the DELID writes into pre-reserved space, hence need_mvcc_header_max_size is false. If the header were stored compact, the in-place stamp would overrun into the body.
7.5 heap_delete_physical, spage_delete, and how OID reuse is decided
Every physical removal funnels through heap_delete_physical, a thin
wrapper that snapshots spage_get_free_space_without_saving, calls
spage_delete (ER_FAILED on NULL_SLOTID), then heap_stats_update
(best-space cache, Ch 9) and pgbuf_set_dirty. spage_delete chooses
the tombstone shape from the page header’s anchor_type:
```c
// spage_delete -- src/storage/slotted_page.c
switch (page_header_p->anchor_type)   /* ANCHORED arms set offset_to_record = SPAGE_EMPTY_OFFSET */
  {
  case ANCHORED:                      /* non-heap callers only */
    slot_p->record_type = REC_DELETED_WILL_REUSE;
    break;
  case ANCHORED_DONT_REUSE_SLOTS:     /* <- ALWAYS taken by heap deletes */
    slot_p->record_type = REC_MARKDELETED;
    break;
  // UNANCHORED_* (not used by heap files) remove the slot entry; default asserts
  }
```
The anchor cannot by itself tell heaps apart: every heap page is
ANCHORED_DONT_REUSE_SLOTS, because heap_get_spage_type returns that
type unconditionally (for both FILE_HEAP and FILE_HEAP_REUSE_SLOTS)
and every heap spage_initialize passes it. So a heap delete always
takes that arm and stamps REC_MARKDELETED, reclaiming the body bytes
but keeping the SPAGE_SLOT entry; the ANCHORED arm is non-heap only.
OID reuse is therefore decided by the caller: for a
FILE_HEAP_REUSE_SLOTS table heap_is_reusable_oid (file_type) is
true, so the workers append a RVHF_MARK_REUSABLE_SLOT postpone
after the physical delete (in heap_log_delete_physical when
mark_reusable, and on the relocation forward-slot path), whose redo
handler heap_rv_redo_mark_reusable_slot runs the upgrader at commit and
on recovery replay:
```c
// spage_mark_deleted_slot_as_reusable -- src/storage/slotted_page.c
// ... asserts slot is empty and REC_MARKDELETED/REC_DELETED_WILL_REUSE ...
slot_p->record_type = REC_DELETED_WILL_REUSE;   /* <- REC_MARKDELETED -> reusable */
```
Invariant 7-E: recyclability is decided by the caller (heap_is_reusable_oid + a RVHF_MARK_REUSABLE_SLOT postpone), not by the page anchor type.
All heap pages are ANCHORED_DONT_REUSE_SLOTS, so spage_delete always tombstones to REC_MARKDELETED; only a FILE_HEAP_REUSE_SLOTS delete appends the postpone whose handler runs spage_mark_deleted_slot_as_reusable to upgrade it to REC_DELETED_WILL_REUSE. If a non-reusable heap appended that postpone, its OIDs would be recycled under stale index keys. Corroboration: spage_delete_for_recovery only special-cases ANCHORED_DONT_REUSE_SLOTS (converting a fresh REC_MARKDELETED to REC_DELETED_WILL_REUSE while undoing an insert, since a never-committed record has no external OID reference), which would be pointless unless heap pages are that type.
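The two-step shape of Invariant 7-E (always tombstone, then optionally upgrade) can be sketched on a toy slot. All `toy_` names are hypothetical, and the upgrade is called inline here for brevity; in the text it is deferred through a RVHF_MARK_REUSABLE_SLOT postpone rather than run immediately.

```c
#include <stdbool.h>

typedef enum { TOY_REC_HOME, TOY_REC_MARKDELETED, TOY_REC_DELETED_WILL_REUSE } toy_rec_type;
typedef enum { TOY_FILE_HEAP, TOY_FILE_HEAP_REUSE_SLOTS } toy_file_type;

typedef struct
{
  toy_rec_type type;
  bool has_body;    /* false once offset_to_record would be the empty marker */
} toy_slot;

/* Caller-side policy: depends on the file type, never on the page. */
static bool
toy_is_reusable_oid (toy_file_type ft)
{
  return ft == TOY_FILE_HEAP_REUSE_SLOTS;
}

/* Page-side delete on a heap page: always tombstone. */
static void
toy_spage_delete (toy_slot *s)
{
  s->has_body = false;
  s->type = TOY_REC_MARKDELETED;
}

/* Stand-in for the postpone handler: upgrade an empty tombstone. */
static void
toy_mark_reusable (toy_slot *s)
{
  if (!s->has_body && s->type == TOY_REC_MARKDELETED)
    s->type = TOY_REC_DELETED_WILL_REUSE;
}

static void
toy_heap_delete (toy_slot *s, toy_file_type ft)
{
  toy_spage_delete (s);
  if (toy_is_reusable_oid (ft))
    toy_mark_reusable (s);   /* deferred to commit in the real flow */
}
```

The page-level function never looks at the file type, which is the point: the same tombstone is produced for both heap kinds, and only the caller's policy changes the final slot type.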
7.6 Chapter summary — key takeaways
- DELETE mirrors the update worker split: heap_delete_logical dispatches to heap_delete_home / heap_delete_relocation / heap_delete_bigone on the same record types update uses (Chapter 6); the difference is per-worker, not structural.
- MVCC delete = stamp, not remove: the worker sets OR_MVCC_FLAG_VALID_DELID + the MVCCID via heap_delete_adjust_header and writes in place; the body survives and no prev_version_lsa is written (Invariant 7-B). The 8-byte stamp can push a compact REC_HOME to REC_NEWHOME or REC_BIGONE, so the worker re-classifies after building.
- REC_BIGONE is the cheap case: overflow headers are stored at OR_MVCC_MAX_HEADER_SIZE, so heap_set_mvcc_rec_header_on_overflow stamps the DELID into pre-reserved space without moving the body (Invariant 7-D).
- Vacuum is always notified: even when the home slot is unchanged, heap_mvcc_log_home_no_change advances its vacuum chain (Invariant 7-C); physical delete happens only on non-MVCC paths (catalog, standalone) and when freeing forwarded/overflow bodies.
- The page anchor does NOT decide OID reuse (Invariant 7-E): all heap pages are ANCHORED_DONT_REUSE_SLOTS (heap_get_spage_type), so spage_delete always stamps REC_MARKDELETED; a FILE_HEAP_REUSE_SLOTS delete (gated by heap_is_reusable_oid) appends a RVHF_MARK_REUSABLE_SLOT postpone whose handler runs spage_mark_deleted_slot_as_reusable to upgrade REC_MARKDELETED -> REC_DELETED_WILL_REUSE.
Chapter 8: Vacuum Reclamation and Page Vacuum Status
Chapters 6 and 7 left dead tuples on the page (a deleted record’s
mvcc_del_id, a superseded version at the tail of a prev_version_lsa
chain); neither was freed then, because an older snapshot might still
need the old version. The deferred reclaimer is vacuum. This chapter
answers: who physically reclaims the slot, and how does a page learn it
is safe to deallocate? For MVCC visibility theory and the
active-vs-vacuum split, see cubrid-heap-manager.md (“Vacuum and the
dead-version problem”). Here we trace the code.
8.1 The threshold contract and the page-walk driver
Vacuum never reasons about individual snapshots. It is handed a single
threshold_mvccid — the oldest MVCCID still possibly visible to any
active transaction. Any version whose mvcc_del_id (delete) or insert
MVCCID (superseded version) precedes that threshold is provably invisible
to everyone and may be destroyed. heap_vacuum_all_objects is the bulk
entry point (compactdb, class purge); it walks the file page-by-page.
```c
// heap_vacuum_all_objects -- src/storage/heap_file.c
  next_vpid.pageid = upd_scancache->node.hfid.hpgid;            /* <- start at header page */
  reusable = heap_is_reusable_oid (upd_scancache->file_type);   /* <- slot reuse policy */
  while (!VPID_ISNULL (&next_vpid))
    {
      vpid = next_vpid;
      error_code = pgbuf_ordered_fix (thread_p, &vpid, OLD_PAGE, PGBUF_LATCH_WRITE, &pg_watcher);
      if (error_code != NO_ERROR) { goto exit; }   /* <- fix failed */
      // ... unfix old_pg_watcher (previous page in the rolling double-watcher) ...
      error_code = heap_vpid_next (thread_p, &upd_scancache->node.hfid, pg_watcher.pgptr, &next_vpid);
      if (error_code != NO_ERROR) { assert (false); goto exit; }   /* <- corrupt next_vpid chain */
      worker.n_heap_objects = spage_number_of_slots (pg_watcher.pgptr) - 1;   /* <- minus header slot */
      if (worker.n_heap_objects > 0
          && heap_page_get_vacuum_status (thread_p, pg_watcher.pgptr) != HEAP_PAGE_VACUUM_NONE)
        {
          // ... fill worker.heap_objects[i].oid.slotid = 1..n; skip already-clean pages ...
          error_code = vacuum_heap_page (thread_p, worker.heap_objects, worker.n_heap_objects,
                                         threshold_mvccid, &upd_scancache->node.hfid, &reusable, false);
          if (error_code != NO_ERROR) { goto exit; }
        }
      pgbuf_replace_watcher (thread_p, &pg_watcher, &old_pg_watcher);
    }
exit:
  // ... unfix both watchers, free worker.heap_objects ...
```
Error branches are annotated inline (Figure 8-1 covers all of them). The
design-carrying branch is the HEAP_PAGE_VACUUM_NONE skip — a page owing
no vacuum is never scanned. Otherwise the driver builds the OID array
[1..n] and delegates to vacuum_heap_page (src/query/vacuum.c), the only
route to spage_vacuum_slot (through vacuum_heap_record),
heap_page_set_vacuum_status_none, and heap_remove_page_on_vacuum:
heap_vacuum_all_objects selects candidate pages, vacuum_heap_page
selects candidate slots.
Invariant: vacuum only ever frees provably-dead versions. A slot reaches
spage_vacuum_slot only after vacuum_heap_record confirms its MVCCID precedes threshold_mvccid (via vacuum_is_mvccid_vacuumed). The != HEAP_PAGE_VACUUM_NONE gate is a coarse pre-filter; the per-record test is the real guard. Violating it strands an active snapshot on a freed slot.
flowchart TD
A["heap_vacuum_all_objects\nnext_vpid = header page"] --> B{"VPID_ISNULL(next_vpid)?"}
B -- yes --> Z["unfix, free, return"]
B -- no --> C["pgbuf_ordered_fix(WRITE)"]
C --> D{"fix ok?"}
D -- no --> Z
D -- yes --> E["heap_vpid_next -> next_vpid"]
E --> F{"n_objects>0 AND status != NONE?"}
F -- no --> G["replace watcher, loop"]
F -- yes --> H["build OID[1..n], vacuum_heap_page"]
H --> I{"error?"}
I -- yes --> Z
I -- no --> G
G --> B
Figure 8-1. heap_vacuum_all_objects page-walk, every branch.
8.2 spage_vacuum_slot: turning a live slot into reclaimable space
vacuum_heap_record calls spage_vacuum_slot once per dead version. It
credits the freed bytes back to total_free; it does not compact.
```c
// spage_vacuum_slot -- src/storage/slotted_page.c
  SPAGE_SLOT *slot_p = spage_find_slot (page_p, page_header_p, slotid, false);
  if (slot_p->record_type == REC_MARKDELETED || slot_p->record_type == REC_DELETED_WILL_REUSE)
    vacuum_er_log_error (..., "... was already vacuumed", ...);   /* <- double-vacuum: log, continue */
  page_header_p->num_records--;                                   /* <- one fewer live record */
  waste = DB_WASTED_ALIGN (slot_p->record_length, page_header_p->alignment);
  page_header_p->total_free += slot_p->record_length + waste;     /* <- bytes returned to the page */
  slot_p->offset_to_record = SPAGE_EMPTY_OFFSET;                  /* <- slot now points at nothing */
  if (reusable)
    slot_p->record_type = REC_DELETED_WILL_REUSE;                 /* <- no refs: OID can be recycled */
  else
    slot_p->record_type = REC_MARKDELETED;                        /* <- refs may exist: keep slot id reserved */
```
When reusable == false the slot becomes REC_MARKDELETED and its slot
id stays reserved — an external reference (an old version’s
prev_version_lsa chain, or a relocation) must still resolve to a
tombstone, not a recycled record. The already-vacuumed branch is
defensive, tolerating the interrupted-and-replayed scenario (8.4). What
it leaves behind: offset_to_record is SPAGE_EMPTY_OFFSET, but the
slot-array entry is not removed and the record bytes are still present
— only total_free was bumped. Shrinking the slot array is
spage_reclaim’s job (8.5); recovering data-area free space is
spage_compact’s (Chapter 9).
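The accounting done by the vacuum step is small enough to model directly. In this sketch the structs and `toy_` names are hypothetical, and `toy_wasted_align` assumes that DB_WASTED_ALIGN means "padding needed to round the length up to the alignment"; that reading is an assumption, not something the excerpt above confirms.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_EMPTY_OFFSET ((size_t) -1)   /* stand-in for SPAGE_EMPTY_OFFSET */

typedef enum { TOY_LIVE, TOY_MARKDELETED, TOY_DELETED_WILL_REUSE } toy_slot_type;

typedef struct
{
  size_t offset;
  size_t length;
  toy_slot_type type;
} toy_slot;

typedef struct
{
  int num_records;
  size_t total_free;
  size_t alignment;
} toy_page_header;

/* Assumed semantics: bytes of padding after a record of this length. */
static size_t
toy_wasted_align (size_t length, size_t alignment)
{
  return (alignment - length % alignment) % alignment;
}

static void
toy_vacuum_slot (toy_page_header *h, toy_slot *s, bool reusable)
{
  h->num_records--;                                             /* one fewer live record */
  h->total_free += s->length + toy_wasted_align (s->length, h->alignment);
  s->offset = TOY_EMPTY_OFFSET;                                 /* slot points at nothing */
  s->type = reusable ? TOY_DELETED_WILL_REUSE : TOY_MARKDELETED;
  /* No compaction and no slot-array shrink happen here. */
}
```

With an 8-byte alignment, vacuuming a 13-byte record credits 16 bytes to `total_free` while leaving the slot entry itself in the array.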
8.3 The HEAP_PAGE_VACUUM_STATUS state machine
A heap page must not be deallocated while a vacuum worker could still
visit it. CUBRID predicts that with a three-state flag in the top two
bits of HEAP_CHAIN.flags (the chain struct is fully tabulated in
Chapter 1).
```c
// HEAP_PAGE_VACUUM_STATUS enum -- src/storage/heap_file.h
HEAP_PAGE_VACUUM_NONE,      /* Heap page is completely vacuumed. */
HEAP_PAGE_VACUUM_ONCE,      /* Heap page requires one vacuum action. */
HEAP_PAGE_VACUUM_UNKNOWN    /* Heap page requires an unknown number of vacuum actions. */
```
```c
// flag bits -- src/storage/heap_file.c
#define HEAP_PAGE_FLAG_VACUUM_STATUS_MASK 0xC0000000
#define HEAP_PAGE_FLAG_VACUUM_ONCE        0x80000000
#define HEAP_PAGE_FLAG_VACUUM_UNKNOWN     0x40000000
```
HEAP_PAGE_GET_VACUUM_STATUS decodes those bits (both clear ⇒ NONE);
HEAP_PAGE_SET_VACUUM_STATUS clears the mask then ORs the bit in. The
low 30 bits of flags carry other attributes. Two HEAP_CHAIN fields
drive the machine:
| Field | Role | Why it exists |
|---|---|---|
| flags (top 2 bits) | Current vacuum status (NONE/ONCE/UNKNOWN) | Advertises owed vacuum visits without scanning records |
| max_mvccid | Largest MVCCID of any MVCC op ever applied to the page | The only escape from UNKNOWN: older than vacuum’s horizon ⇒ all owed vacuum has run |
max_mvccid reads differently per state. In NONE it is stale (the next op
asserts it precedes the new id, then bumps it). In ONCE the owed vacuum
targets a version <= max_mvccid. In UNKNOWN it feeds the escape test
(vacuum_is_mvccid_vacuumed(max_mvccid) true ⇒ all owed vacuum ran).
MVCC ops push the machine forward; vacuum pulls it back:
```c
// heap_page_update_chain_after_mvcc_op -- src/storage/heap_file.c
switch (vacuum_status)
  {
  case HEAP_PAGE_VACUUM_NONE:
    assert (MVCC_ID_PRECEDES (chain->max_mvccid, mvccid));
    HEAP_PAGE_SET_VACUUM_STATUS (chain, HEAP_PAGE_VACUUM_ONCE);
    break;   /* <- first op since clean */
  case HEAP_PAGE_VACUUM_ONCE:
    HEAP_PAGE_SET_VACUUM_STATUS (chain, HEAP_PAGE_VACUUM_UNKNOWN);
    break;   /* <- future unpredictable */
  case HEAP_PAGE_VACUUM_UNKNOWN:
    if (vacuum_is_mvccid_vacuumed (chain->max_mvccid))   /* <- all prior owed vacuum ran */
      HEAP_PAGE_SET_VACUUM_STATUS (chain, HEAP_PAGE_VACUUM_ONCE);
    break;   /* <- else: stays UNKNOWN */
  }
if (MVCC_ID_PRECEDES (chain->max_mvccid, mvccid))
  chain->max_mvccid = mvccid;   /* <- track the max */
```
```c
// heap_page_set_vacuum_status_none -- src/storage/heap_file.c
assert (HEAP_PAGE_GET_VACUUM_STATUS (chain) == HEAP_PAGE_VACUUM_ONCE);   /* <- only ONCE -> NONE */
HEAP_PAGE_SET_VACUUM_STATUS (chain, HEAP_PAGE_VACUUM_NONE);
```
stateDiagram-v2
[*] --> NONE
NONE --> ONCE: mvcc op \n assert max_mvccid precedes new id
ONCE --> UNKNOWN: second mvcc op \n future unpredictable
ONCE --> NONE: vacuum visit \n set_vacuum_status_none
UNKNOWN --> ONCE: mvcc op AND max_mvccid already vacuumed
UNKNOWN --> UNKNOWN: mvcc op AND max_mvccid not yet vacuumed
NONE --> [*]: page may be deallocated
Figure 8-2. HEAP_PAGE_VACUUM_STATUS transitions. The only exit edge to deallocation is from NONE.
UNKNOWN never collapses straight to NONE: once two MVCC ops stack up
without an intervening vacuum, CUBRID stops counting owed visits. Only a
new op observing that max_mvccid is already behind vacuum’s horizon
(vacuum_is_mvccid_vacuumed returns true) drops it to ONCE, and from ONCE
one vacuum visit returns it to NONE.
Invariant: a page is deallocatable only in state NONE.
Vacuum drives a page to NONE only from ONCE, never directly from UNKNOWN, and heap_page_set_vacuum_status_none asserts the prior state is ONCE. NONE therefore means no owed vacuum visit remains, which is the precondition heap_remove_page_on_vacuum relies on. Reaching it from any other state is a use-after-free of a slot a future worker still intends to touch.
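The whole state machine, including the flag-bit encoding, fits in a short self-contained model. This is a sketch, not the CUBRID implementation: the `toy_` names are hypothetical, MVCCIDs are compared with plain `<` instead of MVCC_ID_PRECEDES, and `oldest_unvacuumed` is an invented stand-in for whatever vacuum_is_mvccid_vacuumed consults (an id counts as vacuumed when it is below that value).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_STATUS_MASK  0xC0000000u   /* top two bits, as in the excerpt */
#define TOY_FLAG_ONCE    0x80000000u
#define TOY_FLAG_UNKNOWN 0x40000000u

typedef enum { TOY_VAC_NONE, TOY_VAC_ONCE, TOY_VAC_UNKNOWN } toy_vac_status;
typedef struct { uint32_t flags; uint64_t max_mvccid; } toy_chain;

static toy_vac_status
toy_get_status (const toy_chain *c)
{
  uint32_t bits = c->flags & TOY_STATUS_MASK;
  if (bits == 0)
    return TOY_VAC_NONE;               /* both bits clear */
  return bits == TOY_FLAG_ONCE ? TOY_VAC_ONCE : TOY_VAC_UNKNOWN;
}

static void
toy_set_status (toy_chain *c, toy_vac_status s)
{
  c->flags &= ~TOY_STATUS_MASK;        /* clear the mask, keep the low 30 bits */
  if (s == TOY_VAC_ONCE)
    c->flags |= TOY_FLAG_ONCE;
  else if (s == TOY_VAC_UNKNOWN)
    c->flags |= TOY_FLAG_UNKNOWN;
}

/* MVCC ops push the machine forward. */
static void
toy_after_mvcc_op (toy_chain *c, uint64_t mvccid, uint64_t oldest_unvacuumed)
{
  switch (toy_get_status (c))
    {
    case TOY_VAC_NONE:
      assert (c->max_mvccid < mvccid);
      toy_set_status (c, TOY_VAC_ONCE);
      break;
    case TOY_VAC_ONCE:
      toy_set_status (c, TOY_VAC_UNKNOWN);
      break;
    case TOY_VAC_UNKNOWN:
      if (c->max_mvccid < oldest_unvacuumed)   /* all prior owed vacuum ran */
        toy_set_status (c, TOY_VAC_ONCE);
      break;
    }
  if (c->max_mvccid < mvccid)
    c->max_mvccid = mvccid;
}

/* Vacuum pulls it back; only ONCE -> NONE is allowed. */
static bool
toy_set_status_none (toy_chain *c)
{
  if (toy_get_status (c) != TOY_VAC_ONCE)
    return false;
  toy_set_status (c, TOY_VAC_NONE);
  return true;
}
```

Where the real function asserts on a non-ONCE prior state, the sketch returns `false`, which makes the forbidden UNKNOWN -> NONE edge observable without aborting.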
8.4 Tying status to reclamation: the vacuum_heap_page decision
vacuum_heap_page reads the status after per-record work, then decides
whether to flip it to NONE and whether to attempt removal:
```c
// vacuum_heap_page (decision tail) -- src/query/vacuum.c
page_vacuum_status = heap_page_get_vacuum_status (thread_p, helper.home_page);
assert (page_vacuum_status != HEAP_PAGE_VACUUM_NONE || (was_interrupted && helper.n_vacuumed == 0));
if ((page_vacuum_status == HEAP_PAGE_VACUUM_ONCE && !was_interrupted)
    || (page_vacuum_status == HEAP_PAGE_VACUUM_NONE && was_interrupted))
  {
    if (page_vacuum_status == HEAP_PAGE_VACUUM_ONCE)
      heap_page_set_vacuum_status_none (thread_p, helper.home_page);   /* <- ONCE -> NONE, logged */
    // ... pgbuf_set_dirty (home_page) ...
    if (spage_number_of_records (helper.home_page) <= 1 && helper.reusable)
      { /* <- only header slot, reusable heap */
        if (pgbuf_has_prevent_dealloc (helper.home_page) == false
            && heap_remove_page_on_vacuum (thread_p, &helper.home_page, &helper.hfid))
          { /* page gone */
            goto end;
          }
      }
  }
```
Two branches match: ONCE && !was_interrupted (normal: flip to NONE,
then consider removal) and NONE && was_interrupted (a replayed task
whose vacuum ran in a prior life — already NONE, no re-flip).
ONCE && was_interrupted is deliberately excluded: a replayed worker
cannot tell whether the ONCE is its own task or a new delete logged
after the crash, so flipping could strand owed vacuum. UNKNOWN never
matches. Removal is attempted only for a page holding just the header
slot, on a reusable heap, with no scanner pinning it.
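The decision tail is two predicates, which the following sketch isolates. The function and enum names are hypothetical; the conditions are copied from the excerpt above, with the actual removal call left out.

```c
#include <stdbool.h>

typedef enum { TOY_VAC_NONE, TOY_VAC_ONCE, TOY_VAC_UNKNOWN } toy_vac_status;

/* True when the page may be treated as fully vacuumed after this visit. */
static bool
toy_should_finalize (toy_vac_status status, bool was_interrupted)
{
  return (status == TOY_VAC_ONCE && !was_interrupted)
         || (status == TOY_VAC_NONE && was_interrupted);
}

/* True when an attempt to remove the page is even allowed. */
static bool
toy_may_attempt_removal (toy_vac_status status, bool was_interrupted,
                         int n_records, bool reusable, bool prevent_dealloc)
{
  return toy_should_finalize (status, was_interrupted)
         && n_records <= 1          /* only the header slot remains */
         && reusable                /* reusable heap */
         && !prevent_dealloc;       /* no scanner is holding the page */
}
```

Enumerating the six status/interruption combinations shows that exactly two finalize, and that UNKNOWN never does regardless of the interruption flag.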
8.5 spage_reclaim: compacting the slot array of a don’t-reuse page
spage_vacuum_slot leaves dead slot-array entries; spage_reclaim
shrinks the slot array for ANCHORED_DONT_REUSE_SLOTS pages, invoked
from the reclaim-addresses path (xheap_reclaim_addresses, reached by
compactdb), not the steady-state vacuum loop.
```c
// spage_reclaim -- src/storage/slotted_page.c
  if (page_header_p->num_slots > 0)
    {
      first_slot_p = spage_find_slot (page_p, page_header_p, 0, false);
      for (slot_id = page_header_p->num_slots - 1; slot_id >= 0; slot_id--)
        { /* <- backwards */
          slot_p = first_slot_p - slot_id;
          if (slot_p->offset_to_record == SPAGE_EMPTY_OFFSET
              && (slot_p->record_type == REC_MARKDELETED || slot_p->record_type == REC_DELETED_WILL_REUSE))
            {
              assert (page_header_p->anchor_type == ANCHORED_DONT_REUSE_SLOTS);
              if ((slot_id + 1) == page_header_p->num_slots)
                spage_reduce_a_slot (page_p);                    /* <- trailing dead slot: drop array entry */
              else
                slot_p->record_type = REC_DELETED_WILL_REUSE;    /* <- interior: mark recyclable */
              is_reclaim = true;
            }
        }
    }
  if (is_reclaim == true)
    {
      if (page_header_p->num_slots == 0)
        spage_initialize (thread_p, page_p, ...);   /* <- fully empty: reset page to pristine layout */
      pgbuf_set_dirty (thread_p, page_p, DONT_FREE);
    }
  return is_reclaim;
```
The load-bearing branch is backwards iteration. Only a trailing
dead slot can be removed with spage_reduce_a_slot (which shortens the
array), so walking from the last slot toward 0 lets each removal expose
the next trailing dead slot. An interior dead slot cannot be removed
without renumbering the live slots after it (invalidating their OIDs), so
it is re-typed REC_DELETED_WILL_REUSE for a future insert. Exit
branches: num_slots == 0 at entry returns false; if anything was
reclaimed and the page is now fully empty, spage_initialize resets it;
if nothing was reclaimed the page is untouched.
The asymmetry with spage_vacuum_slot is the takeaway: vacuum frees
space continuously via spage_vacuum_slot and logs it through
vacuum_log_vacuum_heap_page. spage_reclaim reclaims slot-array
entries only on the heavier reclaim-addresses path and only for the
don’t-reuse anchor; there xheap_reclaim_addresses deliberately skips
logging the reclaim (log_skip_logging, in the
if (spage_reclaim (...) == true) block) because reusing unreferenced
dead OIDs leaves the database logically unmodified, so neither REDO nor
UNDO is required.
8.6 heap_remove_page_on_vacuum: unlinking the empty page
When a page reaches NONE and holds only its header slot, vacuum tries
to give it back to the file manager. The function is minimally
intrusive: every neighbor is fixed conditionally through ordered
watchers, and any failure abandons the attempt (return false).
```c
// heap_remove_page_on_vacuum -- src/storage/heap_file.c
assert (spage_number_of_records (*page_ptr) <= 1);  /* <- precondition: page is empty */
pgbuf_get_vpid (*page_ptr, &page_vpid);
if (page_vpid.pageid == hfid->hpgid && page_vpid.volid == hfid->vfid.volid)
  return false;  /* <- never remove the header page */
if (pgbuf_ordered_fix (thread_p, &header_vpid, OLD_PAGE, PGBUF_LATCH_WRITE, &header_watcher) != NO_ERROR)
  goto error;  /* <- give up if header busy */
// ... heap_vpid_prev/next, conditional fix of prev (unless == header) and next ...
if (crt_watcher.page_was_unfixed)
  {
    *page_ptr = crt_watcher.pgptr;  /* <- ordered fix may have refixed home */
    if (spage_number_of_records (crt_watcher.pgptr) > 1)
      goto error;  /* <- re-filled while we waited */
  }
if (pgbuf_has_prevent_dealloc (crt_watcher.pgptr))
  goto error;  /* <- a scanner reached us */
if (pgbuf_has_any_waiters (crt_watcher.pgptr))
  {
    assert (false);
    goto error;
  }
log_sysop_start (thread_p);
is_system_op_started = true;  /* <- atomic unlink begins */
// ... scrub vpid from estimates.best[]/second_best[]/last_vpid/full_search_vpid,
// splice prev.next_vpid=next and next.prev_vpid=prev (RVHF_STATS / RVHF_CHAIN logs) ...
pgbuf_ordered_unfix_and_init (thread_p, *page_ptr, &crt_watcher);
if (file_dealloc (thread_p, &hfid->vfid, &page_vpid, FILE_HEAP) != NO_ERROR)
  goto error;
(void) heap_stats_del_bestspace_by_vpid (thread_p, &page_vpid);  /* <- evict from cache */
log_sysop_commit (thread_p);
is_system_op_started = false;
return true;

error:
if (is_system_op_started)
  log_sysop_abort (thread_p);  /* <- roll back half-done unlink */
// ... unfix any fixed watchers; return false ...
```

Every branch is a “give up safely” path (Figure 8-3); the function never
partially unlinks a page. A busy neighbor merely defers removal;
page_was_unfixed with > 1 records means a concurrent insert re-filled
the page while watchers were acquired in VPID order; has_any_waiters
should be impossible (assert(false)). The happy path scrubs the page
from the header’s best-space estimates, splices the chain
(prev.next_vpid = next, next.prev_vpid = prev, plus
heap_hdr.next_vpid = next when this was the file’s first data page),
and file_deallocs it — all under one system op.
Invariant — chain splice and dealloc are atomic. The header-stats, prev-chain, and next-chain
spage_updates plus file_dealloc run inside one log_sysop_start/log_sysop_commit op (RVHF_STATS/RVHF_CHAIN). A crash mid-splice recovers all-or-nothing; a half-spliced chain would corrupt traversal, which is exactly what the sysop prevents.
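The splice itself is an ordinary doubly linked list unlink. The sketch below models it with in-memory pointers; `toy_chain_page` and `toy_chain_unlink` are hypothetical names, real pages link by VPID through their chain records, and the logging and system-op bracketing are deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a heap page's chain record; real pages link by VPID. */
struct toy_chain_page
{
  struct toy_chain_page *prev;
  struct toy_chain_page *next;
};

/* Unlink `victim` from a doubly linked chain. Both neighbor pointers are
 * rewritten together. In the real code the equivalent page updates and the
 * file_dealloc sit inside one system operation, so recovery sees all of
 * them or none of them. */
void
toy_chain_unlink (struct toy_chain_page *victim)
{
  if (victim->prev != NULL)
    victim->prev->next = victim->next;
  if (victim->next != NULL)
    victim->next->prev = victim->prev;
  victim->prev = victim->next = NULL;
}
```

If only the first of the two neighbor writes survived a crash, a forward walk would skip the victim while a backward walk would still reach it; that mismatch is the half-spliced state the invariant rules out.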
```mermaid
flowchart TD
A["remove_page_on_vacuum\nassert <=1 record"] --> B{"is header page?"}
B -- yes --> R0["return false"]
B -- no --> C["ordered_fix header"]
C --> D{"fix ok?"}
D -- no --> E["goto error -> false"]
D -- yes --> F["vpid_prev / vpid_next"]
F --> G["fix prev, fix next conditionally"]
G --> H{"home refilled, >1 record?"}
H -- yes --> E
H -- no --> I{"prevent_dealloc or waiters?"}
I -- yes --> E
I -- no --> J["sysop_start"]
J --> K["scrub bestspace, splice prev/next, file_dealloc"]
K --> L{"all ok?"}
L -- no --> M["sysop_abort -> false"]
L -- yes --> N["sysop_commit, del bestspace cache, return true"]
```
Figure 8-3. heap_remove_page_on_vacuum — every path is fail-safe.
8.7 Best-space cache and the HEAP_DROP_FREE_SPACE threshold
The best-space machinery is dissected in Chapter 9; vacuum touches two
seams. HEAP_DROP_FREE_SPACE (heap_file.h, (int)(DB_PAGESIZE * 0.3))
is the free-space level at which a page is worth advertising as an insert
target; when spage_vacuum_slot returns enough bytes to cross it, the
page can re-enter the best-space hints. Conversely, when
heap_remove_page_on_vacuum deallocates a page it removes it from the
cache (nulling the matching estimates.best[]/second_best[] slots and
calling heap_stats_del_bestspace_by_vpid) so a future insert is never
handed a freed VPID.
8.8 REC_MVCC_NEXT_VERSION (legacy) versus prev_version_lsa
Older CUBRID stored the forward pointer to a record’s successor version
inside the heap as a REC_MVCC_NEXT_VERSION slot that vacuum had to
chase. Current code supersedes that with prev_version_lsa: an update
writes the log LSA of the previous version into the new record’s MVCC
header (see heap_update_set_prev_version and Chapters 5-6), so version
chaining lives in the log, not in extra heap slots. Consequently vacuum
no longer walks a forwarding link to reclaim a successor — it tombstones
the dead version’s own slot via spage_vacuum_slot, reclaiming its bytes
in place, while readers needing the prior image follow prev_version_lsa
into the log. This is also why spage_vacuum_slot on a reusable heap can
immediately pick REC_DELETED_WILL_REUSE: with no in-heap next-version
link to honor, there are no references to the slot.
8.9 Chapter summary — key takeaways
1. heap_vacuum_all_objects selects pages; vacuum_heap_page selects slots: the driver walks the next_vpid chain, skips HEAP_PAGE_VACUUM_NONE pages, delegating the per-record threshold_mvccid death test downward.
2. spage_vacuum_slot frees bytes, not slot entries: it bumps total_free, sets offset_to_record = SPAGE_EMPTY_OFFSET, and stamps REC_DELETED_WILL_REUSE (reusable) or REC_MARKDELETED (referable); it never compacts.
3. The NONE → ONCE → UNKNOWN machine predicts owed vacuum visits: MVCC ops push it forward, only a vacuum visit pulls it back (only ONCE → NONE), and UNKNOWN escapes only when max_mvccid falls below vacuum’s horizon.
4. A page is deallocatable only in state NONE: the precondition heap_remove_page_on_vacuum relies on to avoid a future worker touching freed slots.
5. spage_reclaim shrinks the slot array backwards, for ANCHORED_DONT_REUSE_SLOTS only, on the reclaim path where xheap_reclaim_addresses skips logging the reclaim; trailing tombstones drop via spage_reduce_a_slot, interior ones become reusable.
6. heap_remove_page_on_vacuum is fail-safe and atomic: every contention case aborts (return false); the chain splice plus file_dealloc run in one logged system op (all-or-nothing recovery).
7. prev_version_lsa replaced the in-heap REC_MVCC_NEXT_VERSION link, so vacuum reclaims a dead version’s slot in place; freed space crossing HEAP_DROP_FREE_SPACE (30%) re-feeds the best-space cache.
Chapter 9: Best Space and Free Space Management
Chapters 4 (Insert) and 6 (Update) deferred one function: “pick a page with
enough room” — heap_stats_find_best_page. This chapter dissects the machinery
behind it: the on-disk hints (HEAP_HDR_STATS.estimates), the in-memory cache
(heap_Bestspace), and the lazy reconciliation (heap_stats_sync_bestspace)
that keeps the picture current without scanning the whole heap on the hot path.
The governing idea: free-space accounting is deliberately approximate.
Nothing here is logged for correctness (the log_skip_logging calls), entries
may be stale, and a wrong hint costs at most a wasted page fix, never a data
error. For why a heap keeps free-space hints, see cubrid-heap-manager.md,
“Free-space management”; this is the implementation.
9.1 The two-tier hint model
Two stores, consulted in order: (1) the in-memory cache heap_Bestspace
(HEAP_STATS_BESTSPACE_CACHE), a process-global hash keyed by both HFID and
VPID, bounded by PRM_ID_HF_MAX_BESTSPACE_ENTRIES (disabled when <= 0); (2)
the on-disk hints in the heap header HEAP_HDR_STATS.estimates — a best[10]
ring, a second_best[10] ring, and aggregate counters that persist across
restarts. heap_stats_find_best_page reads both tiers; heap_stats_sync_bestspace
rebuilds both.
9.2 heap_bestspace and the estimates block — every field
heap_bestspace (in heap_file.h) is the atom of hinting: one candidate page
plus its estimated room.
```c
// heap_bestspace -- src/storage/heap_file.h
struct heap_bestspace
{
  VPID vpid;      /* Vpid of one of the best pages */
  int freespace;  /* Estimated free space in this page */
};
```

| Field | Role | Why it exists |
|---|---|---|
| vpid | Candidate page id | Fix the page directly, no directory walk |
| freespace | Estimated bytes free | Pre-filter: skip if < needed; lazy, may lie |
This struct appears three ways: the best[] ring element on disk,
HEAP_STATS_ENTRY.best in memory, and a selector stack local. The on-disk ring
lives in the anonymous estimates sub-struct of heap_hdr_stats (heap_file.c),
whose every field follows:
| Field | Role | Why it exists |
|---|---|---|
| num_pages | Approx page count | Sizes the bounded scan min(20%, 100) |
| num_recs | Approx record count | Avg record size for heap_stats_get_min_freespace |
| recs_sumlen | Approx sum of record lengths | Numerator of the avg-record-size estimate |
| num_other_high_best | Good pages not in best[] | Re-sync gate: scan only if /num_pages >= 0.1 |
| num_high_best | best[] slots still good | 0 means exhausted; sync worthwhile |
| num_substitutions | Best-slot eviction count | % 1000 == 0 admits one to second_best[] |
| num_second_best | Live second_best[] entries | Tells empty from full ring (both head==tail) |
| head_second_best | second_best[] consume index | heap_stats_get_second_best pops here |
| tail_second_best | second_best[] produce index | heap_stats_put_second_best pushes here |
| head | best[] consume/insert index | Where the next substitution lands |
| last_vpid | The heap’s tail page | heap_vpid_alloc links the new page after it |
| full_search_vpid | Bounded-scan resume cursor | Spreads scanning over calls |
| second_best[10] | Ring of evicted-but-good pages | Reservoir; not all reused at once |
| best[10] | Ring of primary candidates | The hot-path source, read first |
Invariant: ring consistency. head stays in [0,10), advanced only via
HEAP_STATS_NEXT_BEST_INDEX(i)=(i+1)%10; a bad head reads past the array. For
second_best, the head-to-tail distance must agree with the entry count modulo 10:
(tail - head + 10) % 10 == num_second_best % 10 (the put/get helpers assert this
relation). num_second_best is the only way to tell a full ring (10 entries) from an
empty one (0 entries), since both make head==tail. None of these fields are logged,
so they carry no recovery guarantee and may be stale after a crash.
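The head/tail/count bookkeeping can be sketched as a small ring buffer. This is an illustration under assumptions, not the CUBRID code: `toy_ring` and its helpers are hypothetical, plain ints stand in for VPIDs, and the overwrite-when-full behavior follows the description of heap_stats_put_second_best given later in this chapter.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 10            /* mirrors the 10-entry rings described above */

/* Toy ring: ints stand in for VPIDs. */
struct toy_ring
{
  int slots[RING_SIZE];
  int head;                     /* consume index */
  int tail;                     /* produce index */
  int count;                    /* live entries; disambiguates head == tail */
};

/* Check the consistency rule: indexes in range, and the head-to-tail
 * distance equal to the entry count (a full ring has head == tail). */
void
toy_ring_check (const struct toy_ring *r)
{
  assert (r->head >= 0 && r->head < RING_SIZE);
  assert (r->tail >= 0 && r->tail < RING_SIZE);
  assert (r->count >= 0 && r->count <= RING_SIZE);
  assert ((r->tail - r->head + RING_SIZE) % RING_SIZE == r->count % RING_SIZE);
}

/* Push at tail. When the ring is already full the oldest entry is
 * overwritten and head advances with tail, so count stays at RING_SIZE. */
void
toy_ring_put (struct toy_ring *r, int vpid)
{
  r->slots[r->tail] = vpid;
  r->tail = (r->tail + 1) % RING_SIZE;
  if (r->count == RING_SIZE)
    r->head = r->tail;
  else
    r->count++;
  toy_ring_check (r);
}

/* Pop at head; returns false when the ring is empty. */
bool
toy_ring_get (struct toy_ring *r, int *vpid)
{
  if (r->count == 0)
    return false;
  *vpid = r->slots[r->head];
  r->head = (r->head + 1) % RING_SIZE;
  r->count--;
  toy_ring_check (r);
  return true;
}
```

Without `count`, the state after ten puts and the state of a fresh ring would be indistinguishable, since both have `head == tail`.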
9.3 The thresholds: HEAP_DROP_FREE_SPACE, unfill_space, min-freespace
HEAP_DROP_FREE_SPACE (heap_file.h) is (int)(DB_PAGESIZE * 0.3), the 30%
admission floor. A page is cached only if more than ~30% of it is free; below
that it is dropped (hence “drop free space”). unfill_space is reserved per-page headroom
(DB_PAGESIZE * PRM_ID_HF_UNFILL_FACTOR, set at heap creation); the selector
adds it to the request so a page must fit this record plus slack for future
in-place update growth (Chapter 6: a growing update prefers to stay home).
heap_stats_get_min_freespace combines both: it takes the average record size
(recs_sumlen/num_recs, or header_size + 20 when num_recs == 0), adds
unfill_space, then clamps with MIN(..., HEAP_DROP_FREE_SPACE).
Invariant — min-freespace never exceeds the drop floor. That final
MIN(..., HEAP_DROP_FREE_SPACE) keeps any page clearing 30% eligible. Without
it, large records could push min-freespace above 30%, making the updater refuse
pages the sync (which only checks 30%) records — a permanent disagreement.
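The computation described above fits in a few lines. The following is a sketch that assumes the inputs arrive as plain parameters; `toy_estimates` and `toy_min_freespace` are hypothetical names, and the real function reads these values from the heap header.

```c
#include <assert.h>

/* Hypothetical inputs; the real function reads them from the heap header. */
struct toy_estimates
{
  int num_recs;                 /* approximate record count */
  long recs_sumlen;             /* approximate sum of record lengths */
  int unfill_space;             /* reserved per-page headroom */
};

/* Average record size (or a fallback guess for an empty heap), plus the
 * unfill reserve, clamped so it never exceeds the drop floor. */
int
toy_min_freespace (const struct toy_estimates *e, int header_size, int drop_free_space)
{
  int min_freespace;

  if (e->num_recs > 0)
    min_freespace = (int) (e->recs_sumlen / e->num_recs);
  else
    min_freespace = header_size + 20;

  min_freespace += e->unfill_space;
  if (min_freespace > drop_free_space)
    min_freespace = drop_free_space;    /* the final MIN(..., floor) clamp */
  return min_freespace;
}
```

For a 16 KiB page the floor is `(int)(16384 * 0.3) = 4915`. With an average record of 200 bytes and 1638 bytes of reserve the result is 1838; with an average record of 10000 bytes the sum would be 11638, and the clamp brings it back to 4915.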
9.4 heap_stats_find_page_in_bestspace — scanning the hints
The inner loop: given best[] (from the caller) and the cache, return a fixed,
X-latched page with >= needed_space, or HEAP_FINDSPACE_NOTFOUND/_ERROR,
under LK_FORCE_ZERO_WAIT so a busy page is skipped, never waited on:
```c
// heap_stats_find_page_in_bestspace -- src/storage/heap_file.c
while (notfound_cnt < BEST_PAGE_SEARCH_MAX_COUNT  /* cap self-heal at 100 misses */
       && (ent = mht_get2 (heap_Bestspace->hfid_ht, hfid, NULL)) != NULL)
  {
    if (ent->best.freespace >= needed_space)
      {
        best = ent->best;
        break;  /* hit */
      }
    mht_rem2 (...);
    mht_rem (...);  /* stale: evict from both ht */
    heap_stats_entry_free (thread_p, ent, NULL);
    heap_Bestspace->num_stats_entries--;
    notfound_cnt++;
  }
```

The cache pull self-heals: a hint short on room is evicted from both hash tables.
On a cache miss (or disabled cache) best.freespace stays -1 and the code scans
the on-disk best[] from best_array_index, setting best_hint_is_used so a
refresh writes the corrected freespace back. Each candidate runs one
fix-and-recheck cycle. Branches a modifier must respect:
- Recheck, repair even on a miss. spage_max_space_for_new_record is the truth; if short, unfix and continue, but still refresh the hint/cache.
- Returned freespace excludes unfill: only record_length + heap_Slotted_overhead is subtracted; the reserve was already in needed_space.
- Error vs. timeout. A NULL fix with er_errid()==NO_ERROR is the zero-wait timeout (try next); ER_INTERRUPTED aborts; any other error drops the hint and returns _ERROR with assert(false).
- The idx_badspace out-param returns the least-room slot; the caller makes it the new head, so the next substitution overwrites the worst slot.
9.5 heap_stats_find_best_page — the orchestrator
This is the function Chapters 4 and 6 call. It bumps record estimates (num_recs += 1 on
insert; recs_sumlen += needed_space always) then loops over hints, sync, and
allocation. Two subtleties live in the code:
```c
// heap_stats_find_best_page -- src/storage/heap_file.c
total_space = needed_space + heap_Slotted_overhead + heap_hdr->unfill_space;
if (heap_is_big_length (total_space))  /* unfill would overflow page */
  total_space = needed_space + heap_Slotted_overhead;  /* -> drop the reserve */

if (try_find >= 2 || other_high_best_ratio < HEAP_BESTSPACE_SYNC_THRESHOLD)  /*0.1f*/
  break;  /* sync-admission gate */
```

Every branch: (1) header fix fails → goto error, return NULL. (2) Else loop
(try_find++). (3) heap_stats_find_page_in_bestspace → _ERROR unfixes and
goto error; a page → finish. (4) No page → the gate (num_pages<=0 or
num_other_high_best<=0 force ratio 0, assert(num_pages>0); try_find>=2;
ratio<0.1) breaks to allocation. (5) Gate passes → syncloop runs
heap_stats_sync_bestspace(false,true) up to 3 times while found is 0, then
re-scans; num_pages_found<0 is a hard error, <=0 after retries → alloc.
(6) Still NULL → heap_vpid_alloc (failure → goto error); else exit:
log_skip_logging header, set dirty + free, return pgptr.
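The sync-admission gate from step (4) can be isolated as a predicate. This is a sketch under assumptions: `toy_should_sync` and `TOY_SYNC_THRESHOLD` are hypothetical names, the 0.1 threshold is the value annotated in the excerpt, and the ratio is taken to be num_other_high_best divided by num_pages, forced to 0 when either input is not positive.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SYNC_THRESHOLD 0.1f /* value annotated in the excerpt above */

/* Returns true when a bounded sync looks worthwhile, false when the caller
 * should stop retrying and fall through to allocating a new page. */
bool
toy_should_sync (int try_find, int num_pages, int num_other_high_best)
{
  float ratio = 0.0f;

  if (num_pages > 0 && num_other_high_best > 0)
    ratio = (float) num_other_high_best / (float) num_pages;

  if (try_find >= 2 || ratio < TOY_SYNC_THRESHOLD)
    return false;               /* gate closed: break to allocation */
  return true;                  /* gate open: run the sync loop */
}
```

The gate trades storage for CPU: a low ratio means few good pages are believed to exist outside best[], so scanning for them is unlikely to pay off and a fresh page is cheaper.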
Invariant — header held WRITE for the whole selection. Fixed
PGBUF_LATCH_WRITE before any estimate is touched, released only at the single
dirty-and-free exit (or each goto error). Estimate mutation, sync, and
allocation all happen under that one latch, so two inserters can’t corrupt
head/num_high_best. The log_skip_logging at exit leaves the mutations
unlogged; they self-heal.
9.6 heap_stats_update — marking pages good after a delete/update
When a delete or in-place shrink frees space, the page may become a good reuse
target. heap_stats_update is the lightweight notifier (delete/update paths and
recovery redo, Chapter 10).
```c
// heap_stats_update -- src/storage/heap_file.c
freespace = spage_get_free_space_without_saving (thread_p, pgptr, &need_update);
if (PRM_ID_HF_MAX_BESTSPACE_ENTRIES > 0 && prev_freespace < freespace)
  heap_stats_add_bestspace (thread_p, hfid, vpid, freespace);  /* room grew: cache */
if (need_update || prev_freespace <= HEAP_DROP_FREE_SPACE)
  {
    if (freespace > HEAP_DROP_FREE_SPACE)  /* now genuinely good */
      {
        error = heap_stats_update_internal (...);  /* try to write best[] */
        if (error != NO_ERROR)
          spage_set_need_update_best_hint (..., true);  /* defer */
        else if (need_update)
          spage_set_need_update_best_hint (..., false);  /* clear */
      }
    else if (need_update)
      spage_set_need_update_best_hint (..., false);  /* obsolete */
  }
```

Branches: cache tried first (if enabled and room grew; no latch); on-disk
update only if need_update or prev_freespace <= 30%; heap_stats_update_internal
takes a CONDITIONAL header latch — on failure it flags need_update_best_hint = true (“good page, couldn’t tell the header”), invisible until a future delete or
full sync rediscovers it (the approximation tax). Success with need_update
clears the flag; a now-sub-30% flagged page clears it too (obsolete note).
Inside heap_stats_update_internal, an evicted slot above 30% is offered to the
reservoir via heap_stats_put_second_best, where the cadence lives: the guard
if (heap_hdr->estimates.num_substitutions++ % 1000 == 0) admits one in 1000
evictions — pushing *vpid at tail_second_best, advancing the tail (and head
if full), bumping num_second_best, and resetting the counter to 1.
Why 1-in-1000 (write-spreading). Caching every page of a bulk-emptied run would make the next inserts pile back in, re-dirtying just-emptied pages. Sampling every 1000th spreads reuse across the file so emptied extents can stay empty (and be returned to the OS).
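The sampling cadence can be reproduced in isolation. The sketch below follows the guard and the reset-to-1 behavior described above; `toy_admit_substitution` and `TOY_SAMPLE_PERIOD` are hypothetical names, and the reservoir push itself is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SAMPLE_PERIOD 1000

/* Post-increment the counter and admit only when the old value is a
 * multiple of the period; on admission the counter restarts at 1. */
bool
toy_admit_substitution (int *num_substitutions)
{
  if ((*num_substitutions)++ % TOY_SAMPLE_PERIOD == 0)
    {
      *num_substitutions = 1;   /* restart the count after an admission */
      return true;
    }
  return false;
}
```

Starting from a counter of 0, the first eviction is admitted and then every 1000th one after it, so 3000 consecutive evictions yield three reservoir entries.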
9.7 heap_stats_sync_bestspace — the bounded rebuild
When the hints run dry, this rebuilds best[] and refreshes the counters by
walking the chain — but only a slice, the answer to “without scanning the whole
heap.” The cap is max_iterations = MAX(MIN((int)(num_pages*0.2), heap_Find_best_page_limit /*100*/), HEAP_NUM_BEST_SPACESTATS /*10*/): at most
min(20% of pages, 100), never fewer than 10. The seed cascade: (1) cache
enabled → search_all=true from full_search_vpid, the resume cursor written
forward each step so syncs stride the file 20%-per-call (scan-level
write-spreading); (2) else num_high_best > 0 → just behind head; (3) else
num_second_best > 0 → pop a reservoir page (heap_stats_get_second_best); (4)
else search_all=true from full_search_vpid, or if NULL the first page
(hfid->hpgid) with can_cycle=false.
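The iteration cap quoted above is easy to check with concrete numbers. This sketch passes the two limits as parameters; `toy_sync_max_iterations` and the `TOY_MIN`/`TOY_MAX` macros are hypothetical names, and the values 100 and 10 used in the note below are the ones annotated in the formula.

```c
#include <assert.h>

#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))
#define TOY_MAX(a, b) ((a) > (b) ? (a) : (b))

/* MAX (MIN (20% of pages, page limit), number of best[] slots). */
int
toy_sync_max_iterations (int num_pages, int find_best_page_limit, int num_best_spacestats)
{
  int twenty_percent = (int) (num_pages * 0.2);

  return TOY_MAX (TOY_MIN (twenty_percent, find_best_page_limit), num_best_spacestats);
}
```

With limits of 100 and 10, a 30-page heap scans 10 pages (the floor wins over 6), a 200-page heap scans 40, and a 10000-page heap scans 100 (the ceiling wins over 2000).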
```mermaid
stateDiagram-v2
    [*] --> Seed
    Seed --> Walk: start_vpid by cascade
    Walk --> Inspect: fix page READ, prevent dealloc
    Inspect --> Walk: free_space <= 30 pct, skip
    Inspect --> Record: free_space > 30 pct
    Record --> Walk: best[] not full, store and num_high_best++
    Record --> Walk: best[] full, num_other_best++
    Walk --> Commit: iterations > max OR num_high_best == 10 OR at stopat_vpid
    Commit --> [*]: rebuild head, num_high_best, counters
```
Figure 9-3: heap_stats_sync_bestspace, bounded (scan_all=false) sync. On the
cap it sets iterate_all and breaks; a seeding slot that yielded nothing
(start_pos != -1 && num_high_best == 0) is NULLed so the next call won’t re-seed
from a dead spot. Per page, spage_collect_statistics accumulates the counters;
search_all persists full_search_vpid; can_cycle wraps to the heap head.
The commit phase: early goto end if a bounded scan found nothing and
second_best is empty; else NULL the unused best[] tail, set
head/num_high_best; if scan_all or num_pages >= stored overwrite all
counters wholesale; otherwise a conservative merge (num_other_high_best -= num_high_best, bump to num_other_best if larger, overwrite record counts only
if the partial scan saw more).
Invariant — sync never logs. Its header comment says so; the caller wraps it
in log_skip_logging. A crash mid-sync leaves stale estimates that the next
insert re-syncs — hence no undo data, and readers tolerate the chain walk
(OLD_PAGE_PREVENT_DEALLOC, PGBUF_LATCH_READ, no record locks).
9.8 Allocation fallback: heap_vpid_alloc vs heap_alloc_new_page
When reuse fails, the orchestrator calls heap_vpid_alloc, the stats-aware
allocator. It file_allocs a page, links it after last_vpid (chain logged
RVHF_CHAIN), bumps last_vpid/num_pages, then installs it at head (which it
advances). The three install branches:
```c
// heap_vpid_alloc -- src/storage/heap_file.c
best = heap_hdr->estimates.head;
heap_hdr->estimates.head = HEAP_STATS_NEXT_BEST_INDEX (best);
if (VPID_ISNULL (&heap_hdr->estimates.best[best].vpid))
  heap_hdr->estimates.num_high_best++;  /* slot was empty */
else if (heap_hdr->estimates.best[best].freespace > HEAP_DROP_FREE_SPACE)
  {
    heap_hdr->estimates.num_other_high_best++;  /* evict good page to reservoir */
    heap_stats_put_second_best (heap_hdr, &heap_hdr->estimates.best[best].vpid);
  }
heap_hdr->estimates.best[best].vpid = vpid;
heap_hdr->estimates.best[best].freespace = DB_PAGESIZE;  /* fresh page is all free */
```

Unlike the estimate mutations, the chain link and RVHF_STATS here are
logged inside a log_sysop_start/commit — losing a fresh page from the chain
would leak storage (Chapter 10). heap_alloc_new_page is the bare allocator
(takeaway 7): file_alloc with NULL prev/next links, attach a watcher, touch no
estimates.
9.9 Chapter summary — key takeaways
1. Two tiers, in order: heap_stats_find_page_in_bestspace probes the in-memory heap_Bestspace cache, then the on-disk best[10] ring; both self-heal stale entries on touch.
2. heap_stats_find_best_page orchestrates under a held header WRITE latch: bump estimates, scan hints, conditionally sync, else heap_vpid_alloc. The gate (try_find >= 2 OR ratio < 0.1) is the storage-vs-CPU knob.
3. Nothing here is logged for correctness: estimate mutations run under log_skip_logging, sync logs nothing; only heap_vpid_alloc logs the chain link and RVHF_STATS.
4. HEAP_DROP_FREE_SPACE (30%) is the admission floor; unfill_space is update headroom, folded into needed_space then excluded from the returned freespace.
5. Bounded, resumable scanning: a sync inspects min(20% of pages, 100), seeded from full_search_vpid and advancing that cursor, so syncs stride the file without blocking an insert.
6. Second-best reservoir spreads writes: only every 1000th evicted-but-good page is admitted (num_substitutions % 1000), so a bulk-emptied region is reused sparsely.
7. heap_vpid_alloc (stats-aware) vs heap_alloc_new_page (bare): the former installs the page as best[head] and maintains last_vpid/num_pages; the latter touches no estimates and serves callers that chain pages themselves.
Chapter 10: Crash Recovery and the Redo Undo Log Paths
This chapter answers the question every prior flow deferred: what
happens after a crash? It traces the redo/undo handler each operation
pins, shows how the undo image of an UPDATE is the exact bytes
prev_version_lsa points at (tying recovery back to Chapter 6), and
dissects the recovery-only slotted-page primitives. This is the edge
path: it never invents a new record state, only reconstructs states
the forward path produced. For WAL / ARIES theory see
cubrid-heap-manager.md §“Durability and the Write-Ahead Log”.
10.1 The recovery dispatch table
RV_fun[] in recovery.c maps each LOG_RCVINDEX to an
{undofun, redofun, ...} tuple. The driver calls the matching function
with a LOG_RCV carrying rcv->pgptr (fixed page), rcv->offset
(slotid, possibly OR-ed with a vacuum flag), rcv->length / rcv->data
(payload), and rcv->mvcc_id. Two facts shape every handler: one redo
handler serves physically identical operations (heap_rv_redo_insert
redoes both RVHF_INSERT and RVHF_INSERT_NEWHOME), and undo is the
redo of the inverse — the heap logs no separate undo image for
insert/delete, so slotid plus the redo payload reverses it.
```mermaid
flowchart TB
LOG["log record\nrcvindex + LOG_RCV{pgptr, offset, data, mvcc_id}"]
DISP["recovery driver\nlooks up RV_fun[rcvindex]"]
LOG --> DISP
DISP -->|redo pass| REDO["redofun\nheap_rv_redo_* / heap_rv_mvcc_redo_*"]
DISP -->|undo pass| UNDO["undofun\nheap_rv_undo_* / heap_rv_mvcc_undo_*"]
REDO --> SPI["spage_insert_for_recovery /\nspage_update / spage_delete"]
UNDO --> SPD["spage_delete_for_recovery /\nspage_update"]
SPI --> DIRTY["pgbuf_set_dirty"]
SPD --> DIRTY
```
Figure 10-1 — How a log record reaches a slotted-page primitive.
10.2 The shared vacuum-flag preamble
The forward path (heap_mvcc_log_insert, heap_mvcc_log_delete,
heap_mvcc_log_home_change_on_delete) OR-s
HEAP_RV_FLAG_VACUUM_STATUS_CHANGE (0x8000) into p_addr->offset when
the operation also flipped the page’s vacuum status. Every MVCC handler
therefore opens with the same masking lines, elided later as
// ... mask vacuum flag, see 10.2 ...:
```c
// shared preamble in every MVCC handler -- src/storage/heap_file.c
if (slotid & HEAP_RV_FLAG_VACUUM_STATUS_CHANGE)
  {
    vacuum_status_change = true;
  }
slotid = slotid & (~HEAP_RV_FLAG_VACUUM_STATUS_CHANGE);  /* recover real slotid */
```

The bit decides whether to propagate the status change to the page chain (Chapter 8); the mask recovers the real slotid.
Invariant — the recovered slotid never carries the vacuum flag into a slotted-page call. A forgotten mask gives a slotid ≥ 32768 that fails
spage_find_slot. The record-rebuilding handlers (heap_rv_mvcc_redo_insert, heap_rv_undoredo_update, heap_rv_redo_update_and_update_chain, heap_rv_mvcc_undo_delete) make assert (slotid > 0) the tripwire; heap_rv_undo_insert masks without asserting because it only deletes, and spage_delete_for_recovery rejects a bad slotid itself.
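A round trip through the flag shows why the mask must always run. This is a sketch only: `toy_encode_offset` and `toy_decode_offset` are hypothetical helpers, `uint16_t` is an assumed stand-in for the real offset and slot-id types, and 0x8000 is the flag value quoted in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RV_FLAG_VACUUM_STATUS_CHANGE 0x8000 /* value quoted in the text */

/* Forward path: fold the flag into the logged offset. */
uint16_t
toy_encode_offset (uint16_t slotid, bool vacuum_status_change)
{
  if (vacuum_status_change)
    return (uint16_t) (slotid | TOY_RV_FLAG_VACUUM_STATUS_CHANGE);
  return slotid;
}

/* Recovery path: report the flag, then strip it to recover the real slotid. */
uint16_t
toy_decode_offset (uint16_t offset, bool *vacuum_status_change)
{
  *vacuum_status_change = (offset & TOY_RV_FLAG_VACUUM_STATUS_CHANGE) != 0;
  return (uint16_t) (offset & ~TOY_RV_FLAG_VACUUM_STATUS_CHANGE);
}
```

Slot 7 logged with the flag becomes 0x8007; decoding yields slot 7 plus `true`. Skipping the decode would hand 0x8007 to the slotted-page layer as if it were a slot number.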
10.3 INSERT recovery — heap_rv_redo_insert and heap_rv_mvcc_redo_insert
Non-MVCC redo (heap_rv_redo_insert). Payload is
[INT16 record_type][record bytes]:
```c
// heap_rv_redo_insert -- src/storage/heap_file.c
recdes.type = *(INT16 *) (rcv->data);  /* <- type prefix; recdes points past it */
if (recdes.type == REC_ASSIGN_ADDRESS)  /* <- reserved-only slot: data IS byte count */
  {
    recdes.area_size = recdes.length = *(INT16 *) recdes.data;
    recdes.data = NULL;
  }
sp_success = spage_insert_for_recovery (thread_p, rcv->pgptr, slotid, &recdes);  /* fail: fatal er_set */
```

Ordinary record copies bytes; REC_ASSIGN_ADDRESS (Chapter 4) reserves
without copying. Redo for RVHF_INSERT / RVHF_INSERT_NEWHOME.
MVCC redo (heap_rv_mvcc_redo_insert). The insert MVCCID is not in
the payload — it lives in rcv->mvcc_id, so the handler rebuilds the
header to keep visibility data (Chapter 3) correct:
```c
// heap_rv_mvcc_redo_insert -- src/storage/heap_file.c
// ... mask vacuum flag, see 10.2 ...
if (record_type == REC_BIGONE)  /* <- overflow: no inline header rebuild */
  {
    HEAP_SET_RECORD (&recdes, ..., REC_BIGONE, rcv->data + sizeof (record_type));
  }
else  /* inline record: rebuild the header */
  {
    MVCC_SET_INSID (&mvcc_rec_header, rcv->mvcc_id);  /* <- INSID from log header, not page */
    or_mvcc_add_header (&recdes, &mvcc_rec_header, ...);
  }
spage_insert_for_recovery (thread_p, rcv->pgptr, slotid, &recdes);
heap_page_rv_chain_update (thread_p, rcv->pgptr, rcv->mvcc_id, vacuum_status_change);  /* re-apply chain */
```

Both branches insert, then heap_page_rv_chain_update re-applies the
page-chain MVCCID and saved vacuum-status change.
Invariant — a redone MVCC insert carries the same INSID it had before the crash. INSID comes from
rcv->mvcc_id, not page bytes, so redo is idempotent and snapshot-correct; stale bytes would diverge from committed history.
10.4 INSERT undo — heap_rv_undo_insert is a delete
An uncommitted insert must vanish on rollback: delete the slot and, only when the system is fully up, repair free-space stats:
```c
// heap_rv_undo_insert -- src/storage/heap_file.c
if (LOG_ISRESTARTED ())  /* <- measure freed space only after restart */
  {
    free_space = spage_get_free_space_without_saving (thread_p, rcv->pgptr, NULL);
  }
slotid = rcv->offset & (~HEAP_RV_FLAG_VACUUM_STATUS_CHANGE);
(void) spage_delete_for_recovery (thread_p, rcv->pgptr, slotid);  /* <- reuse the slot */
pgbuf_set_dirty (thread_p, rcv->pgptr, DONT_FREE);
if (LOG_ISRESTARTED ())  /* <- look up HFID (best-effort), fix stats */
  {
    if (heap_get_class_oid_from_page (...) != NO_ERROR || heap_get_class_info (...) != NO_ERROR)
      goto end;
    heap_stats_update (thread_p, rcv->pgptr, &hfid, free_space);
  }
end:
  ;  /* falls through to return NO_ERROR */
```

During crash recovery only the slot delete runs (best-space stats,
Chapter 9, not yet trustworthy); after restart both heap_get_class_*
failures goto end and swallow the error (stats are a hint). The OID is
reused — an uncommitted insert was never permanent. Undo for RVHF_INSERT,
RVHF_MVCC_INSERT, RVHF_INSERT_NEWHOME, RVHF_MVCC_REDISTRIBUTE.
10.5 UPDATE recovery — heap_rv_undoredo_update and the prev_version tie-in
UPDATE recovery is unusual: one function serves both undo and redo,
since either direction overwrites slot slotid with the payload bytes.
heap_rv_redo_update is a one-line wrapper; heap_rv_undo_update calls
the same core then adds a vacuum check.
```c
// heap_rv_undoredo_update -- src/storage/heap_file.c
// ... mask vacuum flag (10.2); assert (slotid > 0); point recdes past the type prefix ...
if (recdes.area_size <= 0)
  {
    sp_success = SP_SUCCESS;  /* <- empty image: header-only change */
  }
else if (heap_update_physical (thread_p, rcv->pgptr, slotid, &recdes) != NO_ERROR)
  {
    assert_release (false);
    return ER_FAILED;
  }  /* heap_update_physical = spage_update + type fix-up */
```

Why the same payload works both ways: the prev_version_lsa link. On the forward UPDATE path, before logging, CUBRID stamps the new record’s header with the LSA of the old record’s undo log record:
```c
// heap_update_set_prev_version -- src/storage/heap_file.c
if (recdes.type == REC_HOME)
  {
    or_mvcc_set_log_lsa_to_record (&recdes, prev_version_lsa);  /* <- LSA of the old record's undo log */
  }
```

So the UPDATE’s undo image (the old bytes) lives at exactly the LSA the
new record’s prev_version_lsa records. The version-chain reader
(Chapter 5) fetches the undo record there — precisely what
heap_rv_undoredo_update would replay on rollback: recovery and MVCC
time-travel read the same physical undo record, no second copy.
(heap_update_bigone wires the same link for overflow.)
Invariant: prev_version_lsa equals the LSA of the undo log record holding the predecessor’s bytes. Enforced by heap_update_set_prev_version and heap_update_bigone. Drift means rollback restores wrong bytes or a snapshot read walks to garbage: silent corruption, not a crash.
10.6 The MVCC-delete redo — heap_rv_redo_update_and_update_chain
An MVCC delete stamps a delete-MVCCID into the header (Chapter 7) rather
than erasing the record — physically an update. So after masking the
vacuum flag (§10.2), heap_rv_redo_update_and_update_chain is literally
heap_rv_redo_update (thread_p, rcv) (the §10.5 core) then
heap_page_rv_chain_update (..., rcv->mvcc_id, vacuum_status_change); an
inner error propagates via ASSERT_ERROR (). It is the redo for both
RVHF_MVCC_DELETE_MODIFY_HOME and RVHF_UPDATE_NOTIFY_VACUUM. The undo
side, heap_rv_undo_update, restores the old header bytes and runs
vacuum_rv_check_at_undo (REC_HOME / REC_NEWHOME) so a rolled-back
delete is not left falsely visible to vacuum.
10.7 DELETE recovery — non-MVCC and MVCC undo
Non-MVCC delete. Redo (heap_rv_redo_delete) is a bare
spage_delete; undo (heap_rv_undo_delete) is the mirror of an insert:
```c
// heap_rv_undo_delete -- src/storage/heap_file.c
error_code = heap_rv_redo_insert (thread_p, rcv);  /* <- re-insert the deleted record */
if (error_code != NO_ERROR)
  {
    return error_code;
  }
recdes_type = *(INT16 *) (rcv->data);
if (recdes_type == REC_NEWHOME)  /* <- only REC_NEWHOME needs the guard */
  {
    vacuum_rv_check_at_undo (thread_p, rcv->pgptr, slotid, recdes_type);
  }  /* fail: assert_release+ER_FAILED */
```

Re-inserting works because the redo payload still carries type and bytes;
only REC_NEWHOME also runs the vacuum atomicity check.
MVCC delete undo (heap_rv_mvcc_undo_delete) clears the
delete-MVCCID flag rather than deleting anything:

```c
// heap_rv_mvcc_undo_delete -- src/storage/heap_file.c
slotid = rcv->offset & (~HEAP_RV_FLAG_VACUUM_STATUS_CHANGE);
spage_get_record (..., slotid, &rebuild_record, COPY);	/* read current page bytes */
or_mvcc_get_header (&rebuild_record, &mvcc_rec_header);
assert (MVCC_IS_FLAG_SET (&mvcc_rec_header, OR_MVCC_FLAG_VALID_DELID));	/* must have been deleted */
MVCC_CLEAR_FLAG_BITS (&mvcc_rec_header, OR_MVCC_FLAG_VALID_DELID);	/* <- un-delete, then or_mvcc_set_header */
spage_update (..., slotid, &rebuild_record);	/* each step: assert_release(false)+ER_FAILED on fail */
```

There is no redo payload: the undo image is “the page minus the DELID
flag.” This handler is the undo for RVHF_MVCC_DELETE_REC_HOME / ..._REC_NEWHOME.
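The "un-delete by clearing a flag" idea can be shown on a toy header. This is a sketch under stated assumptions: the struct layout and the bit values are illustrative only (the real constants are the OR_MVCC_FLAG_* definitions), and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values, not taken from the CUBRID headers. */
#define TOY_FLAG_VALID_INSID 0x01
#define TOY_FLAG_VALID_DELID 0x02

typedef struct toy_mvcc_header
{
  uint8_t flags;
  uint64_t insert_id;
  uint64_t delete_id;		/* meaningful only while TOY_FLAG_VALID_DELID is set */
} TOY_MVCC_HEADER;

/* Undo an MVCC delete by clearing the DELID flag in place.
 * Returns false if the record was not marked deleted, which the
 * real handler treats as a failure. */
static bool
toy_mvcc_undo_delete (TOY_MVCC_HEADER * header)
{
  assert (header != NULL);
  if ((header->flags & TOY_FLAG_VALID_DELID) == 0)
    {
      return false;
    }
  header->flags &= (uint8_t) ~TOY_FLAG_VALID_DELID;
  return true;
}
```

Only the flag changes. The insert id and every other header bit stay as they are on the page, which is why this toy needs no logged record image.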
10.8 Page-level recovery — heap_rv_redo_newpage and heap_rv_redo_reuse_page
New page (heap_rv_redo_newpage) redoes a page’s first state: it sets the page
type, calls spage_initialize with recovery-space saving on, and inserts the
header/chain record at the reserved slot:

```c
// heap_rv_redo_newpage -- src/storage/heap_file.c
spage_initialize (thread_p, rcv->pgptr, heap_get_spage_type (), HEAP_MAX_ALIGN, SAFEGUARD_RVSPACE);
sp_success = spage_insert (thread_p, rcv->pgptr, &recdes, &slotid);	/* recdes.type = REC_HOME */
if (sp_success != SP_SUCCESS || slotid != HEAP_HEADER_AND_CHAIN_SLOTID)
  {
    er_set (ER_FATAL_ERROR_SEVERITY, ...);
    return er_errid ();
  }				/* header/chain must be slot 0, else fatal */
```

Reuse page (heap_rv_redo_reuse_page) bulk-deletes all records when
a page is recycled for a different class:

```c
// heap_rv_redo_reuse_page -- src/storage/heap_file.c
const bool is_header_page = ((rcv->offset != 0) ? true : false);
(void) heap_delete_all_page_records (thread_p, &vpid, rcv->pgptr);	/* idempotent: redo may run twice */
if (!is_header_page)		/* header page skips, fixed later via RVHF_STATS */
  {
    COPY_OID (&(chain->class_oid), (OID *) (rcv->data));
    ...
  }				/* <- re-stamp class, reset max_mvccid, vacuum=NONE */
```

heap_delete_all_page_records must be idempotent because redo may run twice;
header pages skip the chain rewrite (fixed later via RVHF_STATS).
10.9 Slotted-page recovery primitives
These exist because of recovery: where forward spage_insert picks a
slot, recovery must place a record at a specific, previously-assigned
slotid so OIDs stay stable across a crash.

```c
// spage_insert_for_recovery -- src/storage/slotted_page.c
if (anchor_type != ANCHORED && anchor_type != ANCHORED_DONT_REUSE_SLOTS)
  {
    return spage_insert_at (thread_p, page_p, slot_id, record_descriptor_p);
  }				/* unanchored: shift */
if (slot_id < page_header_p->num_slots)	/* slot exists: assert empty, then refill */
  {
    slot_p->record_type = REC_DELETED_WILL_REUSE;
  }				/* <- keeps the OID at this slotid */
spage_find_empty_slot_at (thread_p, page_p, slot_id, ...);
if (record_descriptor_p->type != REC_ASSIGN_ADDRESS)	/* <- ASSIGN_ADDRESS reserves, copies nothing */
  {
    memcpy ((char *) page_p + slot_p->offset_to_record, record_descriptor_p->data, ...);
  }

// spage_delete_for_recovery -- src/storage/slotted_page.c
if (spage_delete (thread_p, page_p, slot_id) != slot_id)
  {
    return NULL_SLOTID;
  }
if (page_header_p->anchor_type == ANCHORED_DONT_REUSE_SLOTS)	/* normal delete left REC_MARKDELETED */
  {
    slot_p = spage_find_slot (page_p, page_header_p, slot_id, false);
    if (slot_p->offset_to_record == SPAGE_EMPTY_OFFSET && slot_p->record_type == REC_MARKDELETED)
      {
        slot_p->record_type = REC_DELETED_WILL_REUSE;
        pgbuf_set_dirty (...);
      }				/* <- override no-reuse */
  }
```

Invariant: recovery never burns a slot for an uncommitted OID. spage_delete_for_recovery forces REC_DELETED_WILL_REUSE even on ANCHORED_DONT_REUSE_SLOTS pages; leaving REC_MARKDELETED would leak a slot per rolled-back insert and inflate OIDs after crashes.
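The slot-leak argument can be made concrete with a toy page. The model below is a sketch, not the slotted-page implementation: the toy_ types are hypothetical, and the forward-delete behaviour on plain anchored pages is an assumption made for the illustration.

```c
#include <assert.h>

/* Toy stand-ins for the anchor and record types named in this section. */
typedef enum
{ TOY_ANCHORED, TOY_ANCHORED_DONT_REUSE_SLOTS } TOY_ANCHOR_TYPE;

typedef enum
{ TOY_REC_HOME, TOY_REC_MARKDELETED, TOY_REC_DELETED_WILL_REUSE } TOY_RECORD_TYPE;

#define TOY_NUM_SLOTS 4

typedef struct toy_page
{
  TOY_ANCHOR_TYPE anchor_type;
  TOY_RECORD_TYPE slots[TOY_NUM_SLOTS];	/* slot 0 is reserved metadata */
} TOY_PAGE;

/* Forward delete: a no-reuse page retires the slot id (assumed model). */
static void
toy_delete (TOY_PAGE * page, int slot_id)
{
  assert (slot_id > 0 && slot_id < TOY_NUM_SLOTS);
  if (page->anchor_type == TOY_ANCHORED_DONT_REUSE_SLOTS)
    {
      page->slots[slot_id] = TOY_REC_MARKDELETED;
    }
  else
    {
      page->slots[slot_id] = TOY_REC_DELETED_WILL_REUSE;
    }
}

/* Recovery delete: forward delete, then override the no-reuse marking. */
static void
toy_delete_for_recovery (TOY_PAGE * page, int slot_id)
{
  toy_delete (page, slot_id);
  if (page->slots[slot_id] == TOY_REC_MARKDELETED)
    {
      page->slots[slot_id] = TOY_REC_DELETED_WILL_REUSE;
    }
}

/* Count the record slots that a later insert could reclaim. */
static int
toy_count_reusable_slots (const TOY_PAGE * page)
{
  int i;
  int count = 0;

  for (i = 1; i < TOY_NUM_SLOTS; i++)
    {
      if (page->slots[i] == TOY_REC_DELETED_WILL_REUSE)
	{
	  count++;
	}
    }
  return count;
}
```

On a no-reuse toy page, toy_delete leaves the reusable count unchanged while toy_delete_for_recovery raises it by one, which is the difference between leaking and returning the slot of a rolled-back insert.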
10.10 The is_saving / spage_save_space undo-space reservation
SPAGE_HEADER::is_saving (Chapter 2; set at spage_initialize via
SAFEGUARD_RVSPACE = true) exists purely for recovery: when a
transaction frees space it must reserve it so a later rollback can
re-grow the record. spage_delete calls spage_save_space only when
is_saving, which short-circuits before recording an entry in three cases:

```c
// spage_save_space -- src/storage/slotted_page.c
if (space == 0 || log_is_in_crash_recovery ())
  {
    return NO_ERROR;
  }
if (VACUUM_IS_THREAD_VACUUM_WORKER (thread_p))
  {
    return NO_ERROR;
  }				/* vacuum never rolls back */
if (space < 0 || !logtb_is_active (thread_p, tranid))
  {
    return NO_ERROR;
  }
// ... otherwise: find_or_insert SPAGE_SAVE_HEAD for VPID, extend SPAGE_SAVE_ENTRY for tranid ...
```

Only an active forward transaction freeing positive space records an
entry, keyed by VPID in spage_Saving_hashmap and threaded onto the TDES
for release at transaction end (§10.4’s heap_rv_undo_insert reads
through these reservations).
Invariant: freed space is reserved for the freeing transaction until it commits or aborts, but never during recovery. Enforced by the is_saving gate plus the log_is_in_crash_recovery short-circuit; otherwise the hashmap fills with phantom entries for dead transactions.
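The gate and its short-circuits reduce to a single predicate. The sketch below restates them over plain booleans. TOY_SAVE_ENV and toy_should_record_saved_space are hypothetical names, and the real checks query the log and thread state rather than a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the conditions the real code queries. */
typedef struct toy_save_env
{
  bool is_saving;		/* page header flag, checked by the caller */
  bool in_crash_recovery;
  bool is_vacuum_worker;
  bool tran_is_active;
} TOY_SAVE_ENV;

/* True only when a reservation entry would be recorded. */
static bool
toy_should_record_saved_space (const TOY_SAVE_ENV * env, int space)
{
  assert (env != NULL);
  if (!env->is_saving)
    {
      return false;		/* caller-side gate */
    }
  if (space == 0 || env->in_crash_recovery)
    {
      return false;
    }
  if (env->is_vacuum_worker)
    {
      return false;		/* vacuum never rolls back */
    }
  if (space < 0 || !env->tran_is_active)
    {
      return false;
    }
  return true;
}
```

With all four conditions favourable and a positive space the predicate is true. Flipping any one condition, or passing zero or negative space, makes it false.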
10.11 Chapter summary — key takeaways
- Recovery is table-driven and symmetric. RV_fun[] pins each RVHF_* to an undo/redo pair; undo-insert deletes, undo-delete inserts, and UPDATE shares one heap_rv_undoredo_update core both ways.
- The vacuum bit rides in rcv->offset. Handlers mask HEAP_RV_FLAG_VACUUM_STATUS_CHANGE (0x8000); record-rebuilders also assert (slotid > 0), while heap_rv_undo_insert only masks.
- MVCC redo rebuilds the header from the log. INSID comes from rcv->mvcc_id, not page bytes, making redo idempotent and snapshot-correct.
- The UPDATE undo image is the version-chain predecessor. heap_update_set_prev_version / heap_update_bigone stamp the new record’s prev_version_lsa with the old record’s undo-record LSA, so MVCC reads (Chapter 5) and rollback replay the same physical record.
- Recovery has its own slotted-page primitives. spage_insert_for_recovery keeps OIDs stable at a specific slotid; spage_delete_for_recovery forces REC_DELETED_WILL_REUSE so a rolled-back insert never leaks a slot.
- is_saving reserves freed space for rollback, but stands down during recovery via the log_is_in_crash_recovery short-circuit.
- This path reconstructs states; it never creates new ones. Every handler is the mechanical inverse or replay of a Chapter 4–9 operation.
Position hints as of this revision
The following are line numbers as observed on 2026-06-08; symbols are the canonical anchor and line numbers are hints that decay.
| Symbol | File | Line |
|---|---|---|
OR_MVCC_DELETE_ID_OFFSET | src/base/object_representation.h | 486 |
OR_MVCC_MAX_HEADER_SIZE | src/base/object_representation_constants.h | 142 |
OR_MVCC_MIN_HEADER_SIZE | src/base/object_representation_constants.h | 145 |
OR_MVCC_FLAG_MASK | src/base/object_representation_constants.h | 160 |
OR_MVCC_FLAG_VALID_INSID | src/base/object_representation_constants.h | 165 |
OR_MVCC_FLAG_VALID_DELID | src/base/object_representation_constants.h | 168 |
OR_MVCC_FLAG_VALID_PREV_VERSION | src/base/object_representation_constants.h | 171 |
OR_MVCC_REPID_MASK | src/base/object_representation_constants.h | 173 |
or_mvcc_get_header | src/base/object_representation_sr.c | 4237 |
or_mvcc_set_header | src/base/object_representation_sr.c | 4296 |
mvcc_header_size_lookup | src/object/object_representation.c | 70 |
or_header_size | src/object/object_representation.c | 5757 |
vacuum_heap_page | src/query/vacuum.c | 1577 |
vacuum_is_mvccid_vacuumed | src/query/vacuum.c | 7463 |
HEAP_BESTSPACE_SYNC_THRESHOLD | src/storage/heap_file.c | 90 |
HEAP_MVCC_SET_HEADER_MAXIMUM_SIZE | src/storage/heap_file.c | 129 |
HEAP_UPDATE_IS_MVCC_OP | src/storage/heap_file.c | 151 |
HEAP_NUM_BEST_SPACESTATS | src/storage/heap_file.c | 182 |
HEAP_STATS_NEXT_BEST_INDEX | src/storage/heap_file.c | 185 |
HEAP_STATS_PREV_BEST_INDEX | src/storage/heap_file.c | 187 |
heap_hdr_stats | src/storage/heap_file.c | 191 |
HEAP_PAGE_FLAG_VACUUM_STATUS_MASK | src/storage/heap_file.c | 240 |
HEAP_PAGE_SET_VACUUM_STATUS | src/storage/heap_file.c | 244 |
HEAP_PAGE_GET_VACUUM_STATUS | src/storage/heap_file.c | 262 |
heap_chain | src/storage/heap_file.c | 270 |
struct heap_chain | src/storage/heap_file.c | 270 |
heap_stats_bestspace_cache | src/storage/heap_file.c | 469 |
heap_Find_best_page_limit | src/storage/heap_file.c | 488 |
heap_Bestspace | src/storage/heap_file.c | 499 |
HEAP_RV_FLAG_VACUUM_STATUS_CHANGE | src/storage/heap_file.c | 514 |
heap_stats_add_bestspace | src/storage/heap_file.c | 1024 |
heap_is_big_length | src/storage/heap_file.c | 1330 |
heap_get_spage_type | src/storage/heap_file.c | 1353 |
heap_is_reusable_oid | src/storage/heap_file.c | 1364 |
heap_stats_get_min_freespace | src/storage/heap_file.c | 2917 |
heap_stats_update | src/storage/heap_file.c | 2966 |
heap_stats_update_internal | src/storage/heap_file.c | 3020 |
heap_stats_put_second_best | src/storage/heap_file.c | 3142 |
heap_stats_get_second_best | src/storage/heap_file.c | 3184 |
heap_stats_find_page_in_bestspace | src/storage/heap_file.c | 3272 |
heap_stats_find_best_page | src/storage/heap_file.c | 3519 |
heap_stats_sync_bestspace | src/storage/heap_file.c | 3728 |
heap_vpid_alloc | src/storage/heap_file.c | 4284 |
heap_remove_page_on_vacuum | src/storage/heap_file.c | 4698 |
heap_vpid_next | src/storage/heap_file.c | 5038 |
heap_assign_address | src/storage/heap_file.c | 6015 |
xheap_reclaim_addresses | src/storage/heap_file.c | 6227 |
heap_ovf_insert | src/storage/heap_file.c | 6569 |
heap_ovf_update | src/storage/heap_file.c | 6597 |
heap_get_if_diff_chn | src/storage/heap_file.c | 7400 |
heap_prepare_get_context | src/storage/heap_file.c | 7512 |
heap_get_mvcc_header | src/storage/heap_file.c | 7747 |
heap_get_record_data_when_all_ready | src/storage/heap_file.c | 7834 |
heap_next_internal | src/storage/heap_file.c | 7902 |
heap_rv_redo_newpage | src/storage/heap_file.c | 16203 |
heap_rv_redo_insert | src/storage/heap_file.c | 16321 |
heap_mvcc_log_insert | src/storage/heap_file.c | 16371 |
heap_rv_mvcc_redo_insert | src/storage/heap_file.c | 16442 |
heap_rv_undo_insert | src/storage/heap_file.c | 16536 |
heap_rv_redo_delete | src/storage/heap_file.c | 16589 |
heap_mvcc_log_delete | src/storage/heap_file.c | 16610 |
heap_rv_mvcc_undo_delete | src/storage/heap_file.c | 16663 |
heap_rv_redo_mark_reusable_slot | src/storage/heap_file.c | 16929 |
heap_rv_undo_delete | src/storage/heap_file.c | 16946 |
heap_rv_undo_update | src/storage/heap_file.c | 16981 |
heap_rv_redo_update | src/storage/heap_file.c | 17018 |
heap_rv_undoredo_update | src/storage/heap_file.c | 17029 |
heap_rv_redo_reuse_page | src/storage/heap_file.c | 17065 |
heap_next | src/storage/heap_file.c | 19427 |
heap_get_mvcc_rec_header_from_overflow | src/storage/heap_file.c | 19540 |
heap_set_mvcc_rec_header_on_overflow | src/storage/heap_file.c | 19566 |
heap_set_mvcc_rec_header_on_overflow | src/storage/heap_file.c | 19567 |
heap_get_bigone_content | src/storage/heap_file.c | 19610 |
heap_mvcc_log_home_change_on_delete | src/storage/heap_file.c | 19689 |
heap_mvcc_log_home_no_change | src/storage/heap_file.c | 19724 |
heap_rv_redo_update_and_update_chain | src/storage/heap_file.c | 19745 |
heap_clear_operation_context | src/storage/heap_file.c | 20231 |
heap_build_forwarding_recdes | src/storage/heap_file.c | 20516 |
heap_insert_adjust_recdes_header | src/storage/heap_file.c | 20539 |
heap_insert_adjust_recdes_header | src/storage/heap_file.c | 20540 |
heap_update_adjust_recdes_header | src/storage/heap_file.c | 20671 |
heap_insert_handle_multipage_record | src/storage/heap_file.c | 20834 |
heap_get_insert_location_with_lock | src/storage/heap_file.c | 20885 |
heap_find_location_and_insert_rec_newhome | src/storage/heap_file.c | 21022 |
heap_insert_newhome | src/storage/heap_file.c | 21105 |
heap_insert_physical | src/storage/heap_file.c | 21169 |
heap_log_insert_physical | src/storage/heap_file.c | 21229 |
heap_delete_adjust_header | src/storage/heap_file.c | 21290 |
heap_delete_bigone | src/storage/heap_file.c | 21389 |
heap_delete_relocation | src/storage/heap_file.c | 21570 |
heap_delete_home | src/storage/heap_file.c | 22067 |
heap_delete_physical | src/storage/heap_file.c | 22388 |
heap_log_delete_physical | src/storage/heap_file.c | 22428 |
heap_update_bigone | src/storage/heap_file.c | 22484 |
heap_update_relocation | src/storage/heap_file.c | 22700 |
heap_update_home | src/storage/heap_file.c | 23026 |
heap_update_physical | src/storage/heap_file.c | 23257 |
heap_create_insert_context | src/storage/heap_file.c | 23358 |
heap_create_delete_context | src/storage/heap_file.c | 23385 |
heap_create_update_context | src/storage/heap_file.c | 23412 |
heap_insert_logical | src/storage/heap_file.c | 23460 |
heap_delete_logical | src/storage/heap_file.c | 23676 |
heap_update_logical | src/storage/heap_file.c | 23867 |
heap_vacuum_all_objects | src/storage/heap_file.c | 24408 |
heap_page_update_chain_after_mvcc_op | src/storage/heap_file.c | 24785 |
heap_page_set_vacuum_status_none | src/storage/heap_file.c | 24939 |
heap_page_get_vacuum_status | src/storage/heap_file.c | 25014 |
heap_get_visible_version_from_log | src/storage/heap_file.c | 25329 |
heap_get_visible_version | src/storage/heap_file.c | 25456 |
heap_scan_get_visible_version | src/storage/heap_file.c | 25494 |
heap_get_visible_version_internal | src/storage/heap_file.c | 25577 |
heap_update_set_prev_version | src/storage/heap_file.c | 25689 |
heap_get_last_version | src/storage/heap_file.c | 25793 |
heap_prepare_object_page | src/storage/heap_file.c | 25856 |
heap_clean_get_context | src/storage/heap_file.c | 25904 |
heap_init_get_context | src/storage/heap_file.c | 25944 |
heap_alloc_new_page | src/storage/heap_file.c | 26241 |
HEAP_HEADER_AND_CHAIN_SLOTID | src/storage/heap_file.h | 62 |
HEAP_ISJUNK_OID | src/storage/heap_file.h | 66 |
HEAP_SCANCACHE_SET_NODE | src/storage/heap_file.h | 83 |
HEAP_DROP_FREE_SPACE | src/storage/heap_file.h | 103 |
heap_bestspace | src/storage/heap_file.h | 120 |
heap_scancache_node | src/storage/heap_file.h | 127 |
heap_scancache | src/storage/heap_file.h | 143 |
HEAP_OPERATION_TYPE | src/storage/heap_file.h | 251 |
update_inplace_style | src/storage/heap_file.h | 253 |
HEAP_IS_UPDATE_INPLACE | src/storage/heap_file.h | 262 |
heap_operation_context | src/storage/heap_file.h | 267 |
HEAP_PAGE_VACUUM_STATUS | src/storage/heap_file.h | 354 |
heap_get_context | src/storage/heap_file.h | 362 |
spage_verify_header | src/storage/slotted_page.c | 346 |
spage_is_valid_anchor_type | src/storage/slotted_page.c | 375 |
spage_free_saved_spaces | src/storage/slotted_page.c | 393 |
spage_save_space | src/storage/slotted_page.c | 488 |
spage_initialize | src/storage/slotted_page.c | 1094 |
spage_compact | src/storage/slotted_page.c | 1174 |
spage_find_free_slot | src/storage/slotted_page.c | 1294 |
spage_check_space | src/storage/slotted_page.c | 1347 |
spage_find_empty_slot | src/storage/slotted_page.c | 1396 |
spage_add_new_slot | src/storage/slotted_page.c | 1568 |
spage_take_slot_in_use | src/storage/slotted_page.c | 1608 |
spage_find_empty_slot_at | src/storage/slotted_page.c | 1674 |
spage_check_record_for_insert | src/storage/slotted_page.c | 1745 |
spage_insert | src/storage/slotted_page.c | 1769 |
spage_find_slot_for_insert | src/storage/slotted_page.c | 1801 |
spage_insert_data | src/storage/slotted_page.c | 1841 |
spage_insert_at | src/storage/slotted_page.c | 1902 |
spage_insert_for_recovery | src/storage/slotted_page.c | 1962 |
spage_is_record_located_at_end | src/storage/slotted_page.c | 2039 |
spage_reduce_a_slot | src/storage/slotted_page.c | 2057 |
spage_delete | src/storage/slotted_page.c | 2084 |
spage_delete_for_recovery | src/storage/slotted_page.c | 2177 |
spage_check_updatable | src/storage/slotted_page.c | 2223 |
spage_update_record_in_place | src/storage/slotted_page.c | 2409 |
spage_update_record_after_compact | src/storage/slotted_page.c | 2465 |
spage_update | src/storage/slotted_page.c | 2556 |
spage_reclaim | src/storage/slotted_page.c | 2719 |
spage_mark_deleted_slot_as_reusable | src/storage/slotted_page.c | 4022 |
spage_find_slot | src/storage/slotted_page.c | 4609 |
spage_has_enough_total_space | src/storage/slotted_page.c | 4639 |
spage_has_enough_contiguous_space | src/storage/slotted_page.c | 4679 |
spage_vacuum_slot | src/storage/slotted_page.c | 4857 |
spage_need_compact | src/storage/slotted_page.c | 5275 |
ANCHORED | src/storage/slotted_page.h | 38 |
SP_ERROR | src/storage/slotted_page.h | 49 |
SP_SUCCESS | src/storage/slotted_page.h | 50 |
SP_DOESNT_FIT | src/storage/slotted_page.h | 51 |
SAFEGUARD_RVSPACE | src/storage/slotted_page.h | 53 |
SPAGE_HEADER_FLAG_NONE | src/storage/slotted_page.h | 57 |
spage_header | src/storage/slotted_page.h | 64 |
spage_slot | src/storage/slotted_page.h | 88 |
hfid | src/storage/storage_common.h | 193 |
record_type | src/storage/storage_common.h | 1145 |
REC_UNKNOWN | src/storage/storage_common.h | 1148 |
REC_ASSIGN_ADDRESS | src/storage/storage_common.h | 1151 |
REC_HOME | src/storage/storage_common.h | 1154 |
REC_NEWHOME | src/storage/storage_common.h | 1157 |
REC_RELOCATION | src/storage/storage_common.h | 1160 |
REC_BIGONE | src/storage/storage_common.h | 1163 |
REC_MARKDELETED | src/storage/storage_common.h | 1168 |
REC_DELETED_WILL_REUSE | src/storage/storage_common.h | 1173 |
REC_4BIT_USED_TYPE_MAX | src/storage/storage_common.h | 1185 |
mvcc_rec_header | src/transaction/mvcc.h | 38 |
MVCC_REC_HEADER_INITIALIZER | src/transaction/mvcc.h | 47 |
MVCC_IS_REC_DELETED_BY | src/transaction/mvcc.h | 130 |
MVCC_IS_CHN_UPTODATE | src/transaction/mvcc.h | 137 |
RVHF_INSERT | src/transaction/recovery.c | 279 |
Sources
- cubrid-heap-manager.md: the high-level companion (design intent, theory).
- Raw analyses under raw/code-analysis/cubrid/storage/heap_manager/.
- Code: src/storage/heap_file.{c,h}, src/storage/slotted_page.{c,h}.
- Methodology: knowledge/methodology/code-analysis-detail-doc.md.