CUBRID Heap Manager — Slotted Pages, Record Layout, Operations, MVCC, and Caches
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Source verification (as of 2026-05-01)
- Beyond CUBRID — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Section titled “Theoretical Background”A heap file is the row-oriented physical storage at the bottom of a relational database engine: an unordered collection of records spread across pages, indexed by a per-record identifier (RID, ROWID, OID). Database Internals (Petrov, Ch. 3 “File Formats” and Ch. 4 “Implementing B-Trees”) frames it as the storage substrate that indexes point into — index entries carry keys and OIDs, and following an OID terminates in the heap page where the record actually lives. Petrov is also the canonical citation for the slotted page: the page-internal layout that solves the “variable-length records on a fixed-size page” problem by separating fixed-size slots (record offset + length) growing from one end of the page from variable-length record bodies growing from the other, with a free-space gap in the middle. Compaction reuses the gap; deletion only marks a slot, never moves data eagerly.
Two more textbook ingredients shape every implementation:
- Variable-length and oversized records. A record can outgrow its current slot (UPDATE that lengthens a string), or be too big for any single page from the start (a multi-megabyte BLOB). The classical answer is forwarding pointers: the original location keeps a “see other” record that points to the actual data, either on another page in the same heap or in a separate overflow file. See Garcia-Molina/Ullman/Widom Database Systems: The Complete Book, §13.7 “Variable-Length Data and Records”.
- MVCC version chain anchored in the heap. Once the engine
adopts MVCC (see the companion
cubrid-mvcc.md), the per-row header has to carry insert / delete MVCCIDs and a back-pointer to the previous version. The visibility predicate then walks heap → log → undo segment as needed. The heap manager and the MVCC subsystem share the record header and split responsibility: the heap owns layout, the MVCC subsystem owns visibility.
This document tracks how CUBRID realizes these three pieces — slotted
pages, multi-shape records, and inline MVCC — in src/storage/heap_file.{h,c}
and src/storage/slotted_page.{h,c}.
Common DBMS Design
Section titled “Common DBMS Design”The textbook gives the model; this section names the engineering
conventions that almost every row-oriented engine — PostgreSQL,
Oracle, MySQL InnoDB, SQL Server, CUBRID — adopts in some form.
CUBRID’s specific choices in ## CUBRID's Approach are best read as
one set of dials within this shared design space, not as inventions.
Slotted page as the page-level invariant
Section titled “Slotted page as the page-level invariant”Every page in a row-oriented heap is slotted: a small fixed header, a slot directory growing back from the end, and record bodies growing forward from the header. The free space in the middle is the only place inserts can land; deletes leave holes that are either reused (reusable-slot mode) or compacted on demand. The universal property: slots have stable identifiers, so the record’s OID / TID / RID does not change when its body is shifted during compaction.
OID / TID / RID = (file, page, slot)
Section titled “OID / TID / RID = (file, page, slot)”The per-record identifier is a 3-tuple naming the heap file, the
page within it, and the slot within the page. Index entries carry
these triples; following one terminates at a slot, and the slot’s
offset is the only thing that may move during compaction. PostgreSQL
calls it (blocknumber, offsetnumber) and pairs it with xmin/
xmax; Oracle uses the same shape under ROWID; CUBRID uses
OID = (volid, pageid, slotid).
In-page forwarding (relocation) for “grew slightly”
Section titled “In-page forwarding (relocation) for “grew slightly””When an UPDATE makes the new image too big for its current slot, the engine has two cheap options before falling back to the overflow file:
- Move within the page if there is enough free space (slot identifier stays valid; only the offset in the slot changes).
- Relocate to another page in the same heap with a forwarding
record in the original slot pointing at the new home. Index
entries continue to land at the original OID; the engine
transparently follows the forwarding record on read. PostgreSQL
calls this an HOT chain or, in the cross-page case, a line-pointer
redirect; Oracle calls the forwarding record a row migration;
CUBRID calls them
REC_RELOCATION+REC_NEWHOME.
Overflow file for “too big from the start”
Section titled “Overflow file for “too big from the start””Records bigger than the page’s payload area cannot live in the heap
file at all. They go to a separate overflow file as a
page-spanning blob, and the heap stores only a tiny reference
record pointing at the overflow location. CUBRID calls this
REC_BIGONE + an unstructured page-aligned overflow record.
Free-space hint cache
Section titled “Free-space hint cache”Finding “a page with enough free space for this record” is a recurring cost. The textbook solution is a per-heap free-space map or a small cache of best-fit pages. Almost every engine maintains something like it: PostgreSQL’s FSM, Oracle’s segment header free lists, InnoDB’s PAGE_LSN-driven free space pages. CUBRID maintains a Best Space cache of N pages plus a second-best hint that seeds full scans. Cache is approximate by design — exact tracking costs more than scanning.
Schema-representation cache
Section titled “Schema-representation cache”To interpret a raw record’s attributes, the engine has to know the
table’s schema (column count, types, offsets, indexes). The schema
itself is a row in another heap (the catalog / root class), so a
naive read parses two records: the catalog row, then the data row.
The standard fix is a class-representation cache keyed by class
OID. CUBRID’s is HEAP_CLASSREPR_CACHE; PostgreSQL’s relcache and
syscache play the equivalent role.
MVCC version metadata in the record header
Section titled “MVCC version metadata in the record header”When the engine adopts MVCC, the per-record header gains the
standard (inserter, deleter, prev_version_pointer) triple plus a
flag byte controlling which fields are physically present. The
visibility predicate (see cubrid-mvcc.md) reads the inserter and
deleter against the snapshot’s active set; the version chain walks
the back-pointer when the current version is too new for the
snapshot. CUBRID’s mvcc_rec_header lives inside each REC_HOME /
REC_NEWHOME body.
Theory ↔ CUBRID mapping
Section titled “Theory ↔ CUBRID mapping”The textbook concepts of §“Theoretical Background” map to CUBRID’s
named entities as follows. ## CUBRID's Approach is the slow zoom
into each row.
| Theory | CUBRID name |
|---|---|
| Heap file identifier | HFID = {volume_file_id, header_page_id} |
| Per-record identifier | OID = {volid, pageid, slotid} |
| Slotted page header | SPAGE_HEADER in slotted_page.h |
| Slot directory entry | SPAGE_SLOT (offset:14 + length:14 + type:4 = 32 bits) |
| Record types (vocabulary) | 9-mode INT16 enum: REC_HOME / REC_RELOCATION / REC_NEWHOME / REC_BIGONE / REC_ASSIGN_ADDRESS / REC_MVCC_NEXT_VERSION / REC_MARKDELETED / REC_DELETED_WILL_REUSE / REC_UNKNOWN |
| In-page forwarding record | REC_RELOCATION (header) + REC_NEWHOME (body) |
| Overflow record | REC_BIGONE (heap reference) + raw overflow record (separate file) |
| Per-heap header | HEAP_HDR_STATS (slot 0 of header page) |
| Per-page chain link | HEAP_CHAIN (slot 0 of every non-header page) |
| MVCC version metadata in record | mvcc_rec_header (mvcc.h) embedded inside REC_HOME body |
| Free-space hint cache (global) | HEAP_STATS_BESTSPACE_CACHE |
| Schema-repr cache (global) | HEAP_CLASSREPR_CACHE |
| Class OID → HFID cache (global) | HEAP_HFID_TABLE (lock-free hash) |
| Per-scan local cache | HEAP_SCANCACHE (page latch + watcher + snapshot) |
| Per-attrinfo local cache | HEAP_CACHE_ATTRINFO (decoded record values) |
CUBRID’s Approach
Section titled “CUBRID’s Approach”CUBRID instantiates the conventions above with two layers — the
generic slotted-page abstraction in slotted_page.{h,c} (used by
heap, catalog, btree, and other subsystems) and the heap file
manager in heap_file.{h,c} that adds the multi-shape record
vocabulary, MVCC integration, and five caches. The distinguishing
choices are: (1) a 9-record-type vocabulary that makes the
forwarding / overflow / MVCC-tombstone states explicit and
greppable; (2) the heap header is a real record (slot 0 of the
header page) rather than reserved bytes — uniform with normal heap
pages, which carry a HEAP_CHAIN in slot 0 instead; (3) the
HEAP_OPERATION_CONTEXT struct unifies INSERT / UPDATE / DELETE
through one switch on type, instead of three independent code
paths; (4) a five-cache architecture (3 global + 2 local) keeps
the hot paths off the catalog and off full heap scans.

Figure 1 — Heap file layout. The first page (HFID.hpgid) is the
header page: slot 0 holds HEAP_HDR_STATS, slot 1+ may hold
records. Every subsequent page is a regular slotted page whose
slot 0 holds a HEAP_CHAIN (prev_vpid / next_vpid linkage +
max_mvccid for vacuum). All pages are double-linked into one
chain rooted at the header. Records too big for any heap page go
to a separate overflow file as raw REC_BIGONE payload; the
heap holds only the small forwarding record. (Source: original
“1. Heap Overview & Architecture” deck.)
How a heap operation flows
Section titled “How a heap operation flows”flowchart LR
A["INSERT / UPDATE / DELETE / READ"] --> B["heap_create_<op>_context\n→ HEAP_OPERATION_CONTEXT"]
B --> C{"op type?"}
C -- "INSERT" --> I["heap_insert_logical\n• consult bestspace cache\n• allocate slot\n• write record + MVCC header\n• log redo/undo"]
C -- "UPDATE" --> U["heap_update_logical\n• read existing\n• adjust MVCC header\n (prev_version_lsa)\n• maybe relocate / overflow\n• log undo with old image"]
C -- "DELETE" --> D["heap_delete_logical\n• set delete_mvccid\n• special-case overflow\n• log redo/undo"]
C -- "READ" --> R["heap_get_visible_version\n• fix home page\n• follow REC_RELOCATION /\n REC_BIGONE if needed\n• mvcc_satisfies_snapshot\n• walk prev_version_lsa if too-new"]
I --> Z["return result OID\nor SCAN_CODE"]
U --> Z
D --> Z
R --> Z
Each labeled box is unpacked in the subsections below. The boxes do not move; only the level of detail increases.
Slotted page format — slotted_page.{h,c}
Section titled “Slotted page format — slotted_page.{h,c}”The slotted page is the page-internal substrate: header at the top, slot directory at the bottom (growing back), record bodies in between (growing forward), and a free gap in the middle. CUBRID’s specific shape:
// SPAGE_HEADER — src/storage/slotted_page.h (condensed)typedef struct spage_header SPAGE_HEADER;struct spage_header{ PGNSLOTS num_slots; /* total slots (including holes) */ PGNSLOTS num_records; /* slots actually holding records */ INT16 anchor_type; /* see Anchor types below */ unsigned short alignment; int total_free; /* total free bytes (incl. holes) */ int cont_free; /* contiguous free area in middle */ int offset_to_free_area; /* end of last record body */ int reserved1; int flags; unsigned int is_saving:1; /* undo-image preservation */ unsigned int need_update_best_hint:1; unsigned int reserved_bits:30;};
// SPAGE_SLOT — src/storage/slotted_page.htypedef struct spage_slot SPAGE_SLOT;struct spage_slot{ unsigned int offset_to_record:14; /* byte offset of body in page */ unsigned int record_length:14; unsigned int record_type:4; /* see Record types below */};Each slot is 32 bits. With a 16 KB page (DB_PAGESIZE), the offset
and length need 14 bits each (16 K ≤ 2^14). The four-bit
record_type gives the nine-state record-type vocabulary.
flowchart LR
subgraph Page["Slotted page (16 KB)"]
direction LR
H["SPAGE_HEADER\n(num_slots,\nnum_records,\nanchor_type,\ntotal_free,\noffset_to_free_area)"]
R0["record body 0"]
R1["record body 1"]
R2["record body 2"]
GAP["… free gap …"]
S2["slot 2"]
S1["slot 1"]
S0["slot 0"]
H --> R0 --> R1 --> R2 --> GAP --> S2 --> S1 --> S0
end
The slot at index k never moves — its offset can change (when
record bodies are shuffled during compaction) but its identity is
fixed. That stability is what makes the OID (volid, pageid, slotid)
durable across compactions.
Anchor types
Section titled “Anchor types”The SPAGE_HEADER.anchor_type field selects the page’s policy on
slot reuse:
| Anchor type | Slot reuse? | Use case |
|---|---|---|
ANCHORED | No | Heap page in FILE_HEAP (slot ID is the on-disk OID — must not be reassigned) |
ANCHORED_DONT_REUSE_SLOTS | No | (alias / explicit form) |
UNANCHORED_ANY_SEQUENCE | Yes | Sort runs, query results — slot identity is ephemeral |
UNANCHORED_KEEP_SEQUENCE | Yes (in order) | Catalogs / indexes that need stable slot order |
Heap pages are always anchored — once a slot ID has been handed out
as part of an OID, the slot must remain bound to that OID. After a
DELETE, the slot is marked REC_MARKDELETED (kept) or
REC_DELETED_WILL_REUSE (reusable in the special FILE_HEAP_REUSE_SLOTS
file type, used for some system tables).
Heap file structure — HFID, HEAP_HDR_STATS, HEAP_CHAIN
Section titled “Heap file structure — HFID, HEAP_HDR_STATS, HEAP_CHAIN”A HFID names a heap file by its first page:
// HFID — src/storage/storage_common.h (sketch)typedef struct hfid HFID;struct hfid{ VFID vfid; /* file identifier (volume + file table id) */ INT32 hpgid; /* page id of the header page */};The header page is just a slotted page whose slot 0 holds the heap-wide stats record. Every subsequent page also reserves slot 0, but for the per-page chain record:
// HEAP_HDR_STATS — src/storage/heap_file.c (condensed)typedef struct heap_hdr_stats HEAP_HDR_STATS;struct heap_hdr_stats{ OID class_oid; /* the class whose rows live here */ VFID ovf_vfid; /* overflow file for REC_BIGONE records */ VPID next_vpid; /* first non-header heap page */ int unfill_space; /* per-page reserve kept free for UPDATEs */ struct { int num_pages; int num_recs; float recs_sumlen; int num_other_high_best; int num_high_best; int num_substitutions; int num_second_best; int head_second_best; int tail_second_best; int head; VPID last_vpid; VPID full_search_vpid; VPID second_best[HEAP_NUM_BEST_SPACESTATS]; HEAP_BESTSPACE best[HEAP_NUM_BEST_SPACESTATS]; } estimates; /* approximate heap-wide stats + best-space hint */ int reserved0_for_future; int reserved1_for_future; int reserved2_for_future;};
// HEAP_CHAIN — src/storage/heap_file.ctypedef struct heap_chain HEAP_CHAIN;struct heap_chain{ OID class_oid; VPID prev_vpid; VPID next_vpid; MVCCID max_mvccid; /* max MVCCID seen on this page (vacuum) */ INT32 flags;};Two facts encoded by this layout:
- Slot 0 is always reserved.
HEAP_HEADER_AND_CHAIN_SLOTID = 0inheap_file.h. User data starts at slot 1. Code that scans records skips slot 0 explicitly. - The double linkage is per-page.
HEAP_CHAIN.prev_vpid/next_vpidform a doubly-linked list of heap pages rooted at the header. This is whatheap_nextwalks when no scan-cache hint is available.
Record-type vocabulary
Section titled “Record-type vocabulary”A heap record is one of nine types, encoded in the slot’s 4-bit
record_type:
| Type | Where bytes live | Meaning |
|---|---|---|
REC_HOME | Inline in this slot | Normal record on its home page. |
REC_RELOCATION | This slot | Forwarding pointer (8 bytes: target OID) — body lives elsewhere as REC_NEWHOME. |
REC_NEWHOME | Inline (different page) | The actual body for a relocated record. Reachable only via REC_RELOCATION. |
REC_BIGONE | This slot | Forwarding pointer to overflow file (target VPID). |
REC_ASSIGN_ADDRESS | This slot (zero-length body) | An OID has been minted but no record yet — used by catalog / root class flows that need the OID before the record exists. |
REC_MVCC_NEXT_VERSION | This slot | (Legacy MVCC version-chain marker; superseded by prev_version_lsa in the record header.) |
REC_MARKDELETED | Tombstone | Slot deleted on an ANCHORED_DONT_REUSE_SLOTS page (kept forever to preserve OID identity). |
REC_DELETED_WILL_REUSE | Tombstone (reusable) | Slot deleted on an ANCHORED page (FILE_HEAP_REUSE_SLOTS file type — slot may be re-handed-out). |
REC_UNKNOWN | — | Reserved / sentinel. |
Of these, four are live in normal heap reads:
REC_ASSIGN_ADDRESS, REC_HOME, REC_BIGONE, REC_RELOCATION.
REC_NEWHOME is reachable only via REC_RELOCATION — heap_next
filters it out so it is never returned twice.

Figure 2 — Record types in a CUBRID heap. Slotted Page A
holds two REC_HOME records (orange) and one REC_RELOCATION
forwarding pointer (yellow) → its body lives as a REC_NEWHOME
(olive) on Slotted Page B. Records too big for any heap page
become a REC_BIGONE forwarding pointer (blue) → the actual body
lives as an unmarked overflow record in the overflow file
(HEAP_HDR_STATS.ovf_vfid). Index entries always point to the
original OID; CUBRID transparently follows the forwarding chain
on read. (Source: original “2. Heap Operations” deck.)
Insert flow — heap_insert_logical
Section titled “Insert flow — heap_insert_logical”// heap_insert_logical (sketch) — src/storage/heap_file.cintheap_insert_logical (THREAD_ENTRY *thread_p, HEAP_OPERATION_CONTEXT *context, PGBUF_WATCHER *home_hint_p){ /* 1. Adjust MVCC header on the record (set ins_mvccid). */ /* 2. If record too big → insert into overflow file first; build * REC_BIGONE forwarding record into context->recdes_p. */ /* 3. Find a target home page: * - check HEAP_STATS_BESTSPACE_CACHE * - else consult HEAP_HDR_STATS.estimates.best[] * - else scan up to min(20% of pages, 100 pages) starting from * full_search_vpid / second_best[] * - else allocate a new heap page (heap_alloc_new_page). * Once a slot is chosen, the resulting OID is determined and * its row lock can be acquired — the heap operation itself is * one of the few code paths that takes the row lock inside * rather than before the call (UPDATE / DELETE acquire upstream * in locator). */ /* 4. Splice the record into the slot. */ /* 5. Log redo/undo for crash recovery. */ /* 6. Post-process: bump statistics, update the bestspace cache, * flag the schema cache (CAS) if this insert is into a * catalog row. */}Three things to internalize:
- MVCCID is stamped before the record is placed. The flag bits
in the record header are set so the runtime knows whether
mvcc_ins_id,mvcc_del_id,prev_version_lsaare present. - OID is decided by the slot, not requested in advance. Hence the row lock is acquired during INSERT, not before. (Compare with UPDATE / DELETE, which already know the target OID and lock upstream in locator.)
- “Best” is a hint, not a search. The cache and the estimates bias toward pages-with-room; if they fail, the engine falls back to a bounded scan, then to allocation. The cost amortizes — most inserts take O(1) cache hits.
Update flow — heap_update_logical
Section titled “Update flow — heap_update_logical”UPDATE is a superset of INSERT plus a read-old + maybe-relocate step:
flowchart TD
A["heap_update_logical(context)"] --> B["read existing record\n(via home → REC_RELOCATION → REC_NEWHOME\nor home → REC_BIGONE chain)"]
B --> C["heap_update_adjust_recdes_header\n• stamp new ins_mvccid\n• fill prev_version_lsa from undo write"]
C --> D{"new size fits\noriginal slot?"}
D -- "yes" --> E["overwrite in place\n(REC_HOME → REC_HOME)"]
D -- "no, fits other heap page" --> F["allocate target slot,\nsplice REC_NEWHOME there,\nrewrite home as REC_RELOCATION"]
D -- "no, too big for any page" --> G["insert into overflow file,\nrewrite home as REC_BIGONE"]
E --> Z["log undo + redo"]
F --> Z
G --> Z
The current type can also transition. The matrix below summarizes
the legal transitions a single UPDATE can produce. Each transition
also involves cleanup of the old type’s storage (e.g.,
REC_RELOCATION’s old REC_NEWHOME slot has to be marked deleted).
| Old type | New type alternatives (priority order) |
|---|---|
REC_HOME | REC_HOME (same slot) → REC_RELOCATION + REC_NEWHOME → REC_BIGONE + overflow body |
REC_RELOCATION + REC_NEWHOME | REC_HOME → REC_RELOCATION + REC_NEWHOME (same other-page) → REC_RELOCATION + REC_NEWHOME (different page) → REC_BIGONE + overflow |
REC_BIGONE + overflow body | REC_HOME → REC_RELOCATION + REC_NEWHOME → REC_BIGONE + overflow (same body, in place if fits) |
The undo image is what the prior version of the row will eventually
reach via prev_version_lsa — so the LSN of the undo log entry is
what gets stamped into the new record’s header.
Delete flow — heap_delete_logical
Section titled “Delete flow — heap_delete_logical”DELETE under MVCC does not physically remove the record. It
sets mvcc_del_id in the existing record header and logs the
operation. Vacuum (heap_vacuum_all_objects) is the one that
actually frees the slot once mvcc_del_id is older than the global
oldest-visible MVCCID. The DELETE flow therefore looks a lot like
UPDATE: read the record, adjust the header, log.
Two special cases:
- Overflow records (
REC_BIGONE). The MVCC header lives on the overflow page (heap_set_mvcc_rec_header_on_overflow). The delete path edits that page’s record, not the heap-page forwarder. The forwarder is updated only when the overflow record itself is freed. - Non-MVCC classes. Some system catalogs are MVCC-disabled
(
HEAP_SCANCACHE.mvcc_disabled_class = true). For those, DELETE physically marks the slotREC_MARKDELETED/REC_DELETED_WILL_REUSEimmediately.
Read flow — heap_get_visible_version / heap_next
Section titled “Read flow — heap_get_visible_version / heap_next”Two entry points: a point-read by OID (used after an index lookup)
and a sequential scan (used for SELECT *).
// heap_get_visible_version (sketch) — src/storage/heap_file.cSCAN_CODEheap_get_visible_version (THREAD_ENTRY *thread_p, const OID *oid, OID *class_oid, RECDES *recdes, HEAP_SCANCACHE *scan_cache, int ispeeking, int old_chn){ /* 1. heap_prepare_object_page: fix the home page (using * scan_cache->page_watcher to keep the latch across calls). */ /* 2. Inspect the slot type: * - REC_HOME → done, body is here. * - REC_RELOCATION → fix forward page, follow to REC_NEWHOME. * - REC_BIGONE → fix overflow page, read raw record. * - REC_ASSIGN_ADDRESS / REC_NEWHOME / REC_MARKDELETED / * REC_DELETED_WILL_REUSE / REC_UNKNOWN → not visible. */ /* 3. Read mvcc_rec_header. * Apply mvcc_satisfies_snapshot (see cubrid-mvcc.md): * - SNAPSHOT_SATISFIED → return record body. * - TOO_OLD_FOR_SNAPSHOT → return S_DOESNT_EXIST. * - TOO_NEW_FOR_SNAPSHOT → walk prev_version_lsa into the log * and re-evaluate. */}heap_next is the scan variant: it walks the page chain, then
within each page walks slots, calling heap_get_visible_version
per slot and skipping non-live record types
(REC_NEWHOME, REC_ASSIGN_ADDRESS, header / chain, empty slots).
The HEAP_SCANCACHE keeps the page latch across calls
(cache_last_fix_page = true) so consecutive heap_next calls
within the same page do not pay re-fix cost.
flowchart LR TX["TX with snapshot"] --> POINT["point read\nheap_get_visible_version(oid)"] TX --> SCAN["sequential scan\nheap_next(scan_cache)"] POINT --> FIX1["heap_prepare_object_page"] SCAN --> FIX2["page chain walk\n(HEAP_CHAIN.next_vpid)"] FIX1 --> SLOT FIX2 --> SLOT["per-slot dispatch\nby record_type"] SLOT -- "HOME" --> READ["read body in place"] SLOT -- "RELOCATION" --> NEW["follow → REC_NEWHOME"] SLOT -- "BIGONE" --> OVF["follow → overflow file"] NEW --> READ OVF --> READ READ --> VIS["mvcc_satisfies_snapshot"] VIS -- "SATISFIED" --> RET["return record"] VIS -- "TOO_NEW" --> CHAIN["walk prev_version_lsa\ninto log"] VIS -- "TOO_OLD" --> SKIP["skip"] CHAIN --> VIS
MVCC integration — record header
Section titled “MVCC integration — record header”Every REC_HOME / REC_NEWHOME body and every overflow record
carries a mvcc_rec_header whose layout was specified in
cubrid-mvcc.md:
// mvcc_rec_header — src/transaction/mvcc.h (recap)struct mvcc_rec_header{ INT32 mvcc_flag:8; /* which optional fields are present */ INT32 repid:24; /* representation id */ int chn; /* cache coherency number */ MVCCID mvcc_ins_id; /* set on insert / update */ MVCCID mvcc_del_id; /* set on delete */ LOG_LSA prev_version_lsa; /* back-pointer to previous version * (lives in undo log) */};
Figure 3 — A record updated twice without intervening DELETE. The
heap (top row) holds the current version (mvcc_ins_id3, prev_version_lsa3). Following prev_version_lsa3 lands on a log
record in the log file (bottom rows) carrying the previous
image with (mvcc_ins_id2, prev_version_lsa2), which in turn
points to the original (mvcc_ins_id1, prev_version_lsa1 = NULL).
The visibility predicate walks this chain when the heap version is
too new for the reading snapshot. (Source: original “3. MVCC in
Heap” deck.)
The flag byte controls physical layout — a record that has never
been deleted carries no mvcc_del_id field, saving 8 bytes; a
record that has never been updated carries no prev_version_lsa.
The mvcc_header_size_lookup[8] array (heap_file.c) maps the
flag-byte value to the resulting on-disk header size.
The per-page vacuum tracking lives in HEAP_PAGE_VACUUM_STATUS
(HEAP_PAGE_VACUUM_NONE / _ONCE / _UNKNOWN). The status lives
in HEAP_CHAIN.flags and predicts whether this page still needs
future vacuum visits, which gates page deallocation.
Caches — three global, two local
Section titled “Caches — three global, two local”CUBRID maintains five caches around the heap manager. The classification (“global” vs “local”) is the curator’s, not in source.
flowchart TB
subgraph G["Global caches (process-wide)"]
BS["HEAP_STATS_BESTSPACE_CACHE\nhfid → list of (vpid, freespace)\nseeds insert page selection"]
CR["HEAP_CLASSREPR_CACHE\nclass_oid → OR_CLASSREP\n(parsed schema, attr offsets, btids)"]
HF["HEAP_HFID_TABLE\nclass_oid → HFID + classname\n(lock-free hash, size 1000)"]
end
subgraph L["Local caches (per scan / op)"]
SC["HEAP_SCANCACHE\nscan-scoped\n(page_watcher, mvcc_snapshot,\ncache_last_fix_page, file_type)"]
AI["HEAP_CACHE_ATTRINFO\nrecord-decoding scratchpad\n(read_classrepr, values[])"]
end
HF --> BS
HF --> CR
CR --> AI
SC --> AI
Best space cache — HEAP_STATS_BESTSPACE_CACHE (in
heap_file.c). Maps HFID → small array of (VPID, freespace)
entries. INSERT / UPDATE consult this cache before scanning the
heap. The on-disk equivalent in HEAP_HDR_STATS.estimates.best[]
is the persistent fallback when the in-memory cache is cold or
stale. Sync between cache and on-disk happens lazily via
heap_stats_sync_bestspace. The second_best[] array is a
hint used as a starting point for full scans (treated as a
circular queue with head_second_best / tail_second_best); the
num_substitutions counter makes promotion to best[] happen only
once every 1 000 insertions, deliberately spreading writes across
pages instead of packing them into one.
Class-representation cache — HEAP_CLASSREPR_CACHE. An
OR_CLASSREP is the parsed schema for a single class: column types,
attribute offsets within a raw record, list of indexes (BTIDs),
representation id. Without this cache, every heap_attrinfo_* call
would re-read the catalog row and re-parse it. Eviction is
handled via heap_classrepr_decache.
HFID table — HEAP_HFID_TABLE. A lock-free hash table from
class OID to HFID + cached classname. Hash size is HEAP_HFID_HASH_SIZE = 1000.
Because almost every operation needs the HFID for locking and
heap traversal, this cache short-circuits the catalog lookup.
// HEAP_HFID_TABLE_ENTRY — src/storage/heap_file.hstruct heap_hfid_table_entry{ OID class_oid; /* key */ HEAP_HFID_TABLE_ENTRY *stack; HEAP_HFID_TABLE_ENTRY *next; UINT64 del_id; /* lock-free reclamation */
HFID hfid; /* value */ FILE_TYPE ftype; /* FILE_HEAP or FILE_HEAP_REUSE_SLOTS */ std::atomic<char*> classname;};Scan cache — HEAP_SCANCACHE. Per-scan local state. Crucially
holds:
cache_last_fix_page— whether to keep the page latch across consecutiveheap_nextcalls.mvcc_snapshot— the snapshot to evaluate visibility against.page_latch—LOCKmode to take on each new page (oftenNULL_LOCKbecause the class has already been locked S/X/SIX upstream).page_watcher— physical page latch handle held across calls.partition_list— for partitioned classes, the list of HFIDs currently being scanned.
AttrInfo cache — HEAP_CACHE_ATTRINFO. A scratchpad that
holds an OR_CLASSREP and an array of decoded HEAP_ATTRVALUEs
for one record being read or written. Used by heap_attrinfo_read_dbvalues
(decoding) and heap_attrinfo_transform_to_disk (encoding). The
last_classrepr field tracks the latest representation; the
read_classrepr field tracks the representation actually used for
the record being read (which may differ from the latest if the
record predates a schema change).
CHN — cache coherency between client and server
Section titled “CHN — cache coherency between client and server”mvcc_rec_header.chn is the cache coherency number, a
non-MVCC counter used by client-side caches. When a query reads a
record, the server returns chn along with the data; if the client
later asks for the same record and supplies the previous chn, the
server can skip resending the body if chn is unchanged.
For MVCC tables, chn does not increment on every update (that is
what mvcc_ins_id is for); it increments only for non-MVCC tables
(root class, system catalogs). The
HEAP_CHNGUESS cache (marked “to-be-used” in source — partially
wired, not fully active on the hot path) is intended to
short-circuit the chn comparison for hot OIDs.
Source Walkthrough
Section titled “Source Walkthrough”Anchor on symbol names, not line numbers. The CUBRID source moves; a function name (or struct/enum tag) is the stable handle. Use
git grep -n '<symbol>' src/storage/to locate the current position. The line numbers in the position-hint table at the end of this section were observed when the document was lastupdated:and are intended only as quick hints.
Type definitions (src/storage/)
Section titled “Type definitions (src/storage/)”struct hfid(instorage_common.h) —(VFID, hpgid)heap-file identifier.struct spage_header(inslotted_page.h) — slotted-page header.struct spage_slot(inslotted_page.h) — 32-bit slot directory entry (offset / length / type).struct heap_hdr_stats(inheap_file.c) — heap-wide header record (slot 0 of header page).struct heap_chain(inheap_file.c) — per-page chain link (slot 0 of every non-header page).struct heap_scancache(inheap_file.h) — scan-scoped local state.struct heap_operation_context(inheap_file.h) — INSERT / UPDATE / DELETE input + output bundle.struct heap_get_context(inheap_file.h) — read-by-OID input- output bundle.
struct heap_hfid_table/heap_hfid_table_entry(inheap_file.h) — class OID → HFID lock-free hash.struct heap_cache_attrinfo(inheap_attrinfo.h) — per-record decoder scratchpad.enum HEAP_OPERATION_TYPE(inheap_file.h) —INSERT/UPDATE/DELETE.enum HEAP_PAGE_VACUUM_STATUS(inheap_file.h) —NONE/ONCE/UNKNOWN.- Record-type constants (
REC_HOME,REC_RELOCATION,REC_NEWHOME,REC_BIGONE,REC_ASSIGN_ADDRESS,REC_MVCC_NEXT_VERSION,REC_MARKDELETED,REC_DELETED_WILL_REUSE,REC_UNKNOWN) (inslotted_page.h). - Anchor types
ANCHORED,ANCHORED_DONT_REUSE_SLOTS,UNANCHORED_ANY_SEQUENCE,UNANCHORED_KEEP_SEQUENCE(inslotted_page.h).
Lifecycle (src/storage/heap_file.c)
Section titled “Lifecycle (src/storage/heap_file.c)”heap_manager_initialize— module init.heap_manager_finalize— module teardown.heap_initialize_hfid_table/heap_finalize_hfid_table— HFID cache.heap_classrepr_restart_cache— clears CR cache (used after schema-change events).
Operation entry points (src/storage/heap_file.c)
Section titled “Operation entry points (src/storage/heap_file.c)”heap_create_insert_context/heap_insert_logical.heap_create_update_context/heap_update_logical.heap_create_delete_context/heap_delete_logical.heap_assign_address— pre-mint an OID (REC_ASSIGN_ADDRESS) before the record is written.heap_get_visible_version— point read with MVCC.heap_get_last_version— point read returning latest committed version regardless of snapshot.heap_next/heap_prev/heap_first/heap_last— scan iterators.heap_scancache_start/heap_scancache_end— scan-cache lifecycle.
Page walk and overflow (src/storage/heap_file.c)
Section titled “Page walk and overflow (src/storage/heap_file.c)”heap_vpid_next/heap_vpid_prev/heap_vpid_skip_next— navigate the heap page chain.heap_alloc_new_page— allocate a new heap page and link it.heap_ovf_find_vfid/heap_ovf_delete— overflow file management forREC_BIGONE.heap_set_mvcc_rec_header_on_overflow/heap_get_mvcc_rec_header_from_overflow— MVCC header manipulation on overflow records.heap_remove_page_on_vacuum/heap_page_set_vacuum_status_none/heap_page_get_vacuum_status— page-level vacuum coordination.
Caches (src/storage/heap_file.c)
Section titled “Caches (src/storage/heap_file.c)”heap_classrepr_get/heap_classrepr_free/heap_classrepr_decache— class-representation cache.heap_get_class_info/heap_cache_class_info/heap_get_hfid_if_cached— HFID table lookups.heap_stats_update/heap_should_try_update_stat/heap_stats_sync_bestspace— bestspace cache sync.heap_attrinfo_start/heap_attrinfo_end/heap_attrinfo_read_dbvalues/heap_attrinfo_transform_to_disk— AttrInfo cache.heap_chnguess_get/heap_chnguess_put/heap_chnguess_clear— CHN guess cache.
Recovery (src/storage/heap_file.c)
Section titled “Recovery (src/storage/heap_file.c)”heap_rv_undo_insert/heap_rv_redo_insert/heap_rv_mvcc_redo_insert.heap_rv_undo_delete/heap_rv_redo_delete/heap_rv_mvcc_undo_delete/heap_rv_mvcc_redo_delete_*.heap_rv_undo_update/heap_rv_redo_update/heap_rv_undoredo_update/heap_rv_redo_update_and_update_chain.heap_rv_redo_newpage/heap_rv_redo_reuse_page.
Position hints as of this revision
Section titled “Position hints as of this revision”These line numbers held when the document was last updated:. If
you land at a different definition, the symbol name above is
authoritative; update the table on your way through.
| Symbol | File | Line |
|---|---|---|
struct spage_header | slotted_page.h | (varies) |
struct spage_slot | slotted_page.h | (varies) |
struct heap_hdr_stats | heap_file.c | (varies, near top) |
struct heap_chain | heap_file.c | (varies, near top) |
struct heap_scancache | heap_file.h | 143 |
struct heap_operation_context | heap_file.h | 267 |
struct heap_get_context | heap_file.h | 362 |
struct heap_hfid_table_entry | heap_file.h | 201 |
enum HEAP_PAGE_VACUUM_STATUS | heap_file.h | 354 |
heap_manager_initialize | heap_file.c | (varies) |
heap_insert_logical | heap_file.c | (varies) |
heap_update_logical | heap_file.c | (varies) |
heap_delete_logical | heap_file.c | (varies) |
heap_get_visible_version | heap_file.c | (varies) |
heap_next | heap_file.c | (varies) |
heap_classrepr_get | heap_file.c | (varies) |
heap_alloc_new_page | heap_file.c | (varies) |
The line column is intentionally left as (varies) for the heap
file because it is the largest source file in the project (≈ 27 000
lines). Symbol-level git grep is the recommended lookup.
Source verification (as of 2026-05-01)
Section titled “Source verification (as of 2026-05-01)”Each entry is a fact about the current source — readable without the original analysis materials. The trailing note shows how it was checked and, where relevant, historical drift or limits of verification. Open questions follow as the curator’s recorded gaps; future readers should treat them as starting points, not as known bugs.
Heading date bumped 2026-05-01 to align with the frontmatter
updated:field; the underlying source-verification reads were performed on 2026-04-29 and have not drifted since (no re-verification this pass). Each fact bullet retains its original 2026-04-29 verification date.
Verified facts
Section titled “Verified facts”-
HEAP_HEADER_AND_CHAIN_SLOTID = 0. Verified inheap_file.hon 2026-04-29. Slot 0 is unconditionally reserved on every heap page — forHEAP_HDR_STATSon the header page, forHEAP_CHAINon every other page. Code that scans records skips slot 0 explicitly; the scan iterators (heap_next,heap_first) treat slot 0 as not-a-record. -
The HFID lock-free hash table has 1 000 buckets.
HEAP_HFID_HASH_SIZE = 1000inheap_file.h, verified 2026-04-29. Hard-coded; not a runtime parameter. -
The “page is comfortably free” threshold is 30 % of page size.
HEAP_DROP_FREE_SPACE = (int)(DB_PAGESIZE * 0.3)inheap_file.h, verified 2026-04-29. With the default 16 KB page, a page enters the bestspace cache only when at least ≈ 4 915 bytes are free. The threshold trades insert-speed (more candidate pages, fewer scans) for storage utilization (reserved space sits unused on partially-filled pages). -
Slot fields are 14/14/4 bits.
spage_slotisoffset_to_record:14 + record_length:14 + record_type:4= 32 bits. Verified inslotted_page.hon 2026-04-29. The 14-bit cap on offset matchesDB_PAGESIZE = 16384exactly (2^14); a page-size increase beyond 16 KB would require widening this field. -
Slot zero is not the only special slot — anchor type also matters.
ANCHORED_DONT_REUSE_SLOTSmakes the slot table monotone-grow-only;UNANCHORED_*allows reuse. Heap pages ofFILE_HEAPuseANCHORED_DONT_REUSE_SLOTSso OIDs never alias across deletions; pages ofFILE_HEAP_REUSE_SLOTS(used for some root-class catalogs) useANCHOREDto allow slot reuse. Verified inslotted_page.con 2026-04-29 by readingspage_slot_descriptor_setupand the FILE_HEAP_* type mapping. -
HEAP_OPERATION_CONTEXTcarries fourPGBUF_WATCHERs.home_page_watcher,overflow_page_watcher,header_page_watcher,forward_page_watcher. An UPDATE that relocates a record across heap pages can hold up to three of them simultaneously (home, forward, header). The fourth (overflow) is forREC_BIGONEpaths. Verified inheap_file.hon 2026-04-29. -
The record-type vocabulary is 4 bits = 16 values; only 9 are used. Verified in
slotted_page.hon 2026-04-29. Values 9–15 are reserved. -
mvcc_header_size_lookup[8]indexes by flag-byte value. The MVCC flag byte has 5 bits inOR_MVCC_FLAG_MASK = 0x1f(seecubrid-mvcc.md), which gives 32 possible values, but only combinations of the 3 documented bits actually occur, hence the 8-entry lookup. Confirmed viaheap_file.con 2026-04-29.
Open questions
Section titled “Open questions”-
Why a hash size of exactly 1 000?
HEAP_HFID_HASH_SIZE = 1000is hard-coded with no obvious tuning rationale. Investigation path: trace the value through git history; check whether workloads with > 1 000 classes (large schemas) experience hash collisions on the hot path; consider whether it should scale withmax_clientsornum_classes. -
num_substitutions = 1000— why this constant for second-best promotion? The bestspace cache promotes a page fromsecond_besttobestonly every 1 000 insertions. The deck says this is to “spread out” inserts across pages by leaving high-locality space underutilized. Investigation path: instrument the promotion rate under different workloads; confirm the locality argument empirically. -
REC_MVCC_NEXT_VERSIONlooks legacy. The comment inslotted_page.hdescribes it as “legacy MVCC version-chain marker”, and the modern path usesprev_version_lsain the record header instead. IsREC_MVCC_NEXT_VERSIONactually produced anywhere in the current code, or is it only consumed for backward compatibility on old data files? Investigation path:git grep REC_MVCC_NEXT_VERSIONand trace producer call sites. -
CHN guess cache (
HEAP_CHNGUESS) status. The deck calls it “TBU” (to-be-used) and the source has the structure but limited use. Is this complete and live, partially abandoned, or a stub? Investigation path: check call sites ofheap_chnguess_get/heap_chnguess_putand run a CAS workload to see whether the cache fills. -
unfill_spacebehavior in modern builds.HEAP_HDR_STATSreserves a per-pageunfill_spacefor future UPDATE growth. The deck describes this as a static reserve, but recent commits may have made it adaptive. Investigation path: trace the field’s assignment paths and whether any vacuum step rebalances it. -
Interaction with OOS (Out-of-row Overflow Storage). The
feat/oosbranch is reshapingheap_file.cand the record header to addOR_MVCC_FLAG_HAS_OOS. This document targetsdevelop; once OOS lands, the record-type vocabulary and the overflow flow described here will need a follow-up. (See the adjacent OOS context skill for current status.)
Beyond CUBRID — Comparative Designs & Research Frontiers
Section titled “Beyond CUBRID — Comparative Designs & Research Frontiers”Pointers, not analysis. Each bullet is a starting handle for a follow-up doc; depth here is intentionally shallow.
- PostgreSQL HOT (Heap-Only Tuple). PostgreSQL keeps old
versions inline on the same page when an UPDATE does not touch
any indexed column, avoiding new index entries. CUBRID’s
in-place vs
REC_RELOCATIONdecision is structurally similar but driven by size rather than index touch. A direct comparison would clarify which workload regime each policy optimizes. - PostgreSQL TOAST (The Oversized-Attribute Storage Technique)
vs CUBRID
REC_BIGONE+ overflow file. Both push large values out of the heap, but TOAST is per-attribute and inline-line- pointer; CUBRID is per-record and to a separate file. Different trade-offs for selective scans vs full reads. - InnoDB clustered-index storage. InnoDB stores rows ordered by PK in a B-tree leaf, not in a heap. The heap manager abstraction does not exist; the equivalent is the clustered index’s leaf-page layout. Comparison would frame why CUBRID separates “where rows live” from “where indexes point”.
- In-place vs out-of-place version chains. PostgreSQL keeps
old versions inline on the heap; Oracle UNDO and CUBRID
prev_version_lsapush old versions into the log. Wu et al. In-Memory MVCC Empirical Evaluation (VLDB 2017) compares these design points quantitatively. Worth quantifying for CUBRID’s typical OLTP workload. - MVCC garbage-collection scheduling. CUBRID’s
HEAP_PAGE_VACUUM_STATUSpredicts whether a page needs further vacuum visits before deallocation. Compare with PostgreSQL’s visibility map / freeze map and Oracle’s UNDO retention tuning. - Column-store storage as an alternative. Engines that store columns separately (Vertica, MonetDB, Snowflake) make slotted-page mechanics largely irrelevant. CUBRID is row-only today; positioning the heap manager against column-store designs sets up “what would a column-store CUBRID look like” conversations.
- Recent paper trail. Wu et al., empirical MVCC (VLDB 17); the OOS feature design doc on this knowledge base (in progress); Stoica & Ailamaki, “Enabling Efficient OS Paging for Main-Memory OLTP Databases” (DaMoN 13) for the buffer-pool / heap interaction.
The intent of this section is to seed next documents, not to analyze. Each bullet should become its own curated note when its turn comes.
Sources
Section titled “Sources”Raw analyses (under raw/code-analysis/cubrid/storage/heap_manager/)
Section titled “Raw analyses (under raw/code-analysis/cubrid/storage/heap_manager/)”1._Heap_Overview_Architecture.pdf— heap file structure, slotted page primer, record types overview.2._Heap_Operations.pdf— INSERT / UPDATE / DELETE / READ flows with worked examples and the update-transition matrix.3._MVCC_in_Heap.pdf— MVCC record header layout, visibility predicate, prev_version_lsa chain.4._Caches_in_heap.pdf— bestspace cache, classrepr cache, HFID table, scancache, attrinfo cache.5._Record_in_heap.pdf— record-format details (representation id, flag bits, fixed/variable attributes).[코드분석]heap.pptx— presentation deck consolidating the five-part series, with diagrams of slotted page layout, record types, update transitions, and cache architecture. Most useful text-side index of figures.slotted_page.pdf/slotted page_min.pdf— slotted-page format reference, anchor types, record types.CUBRID-Multiple_Page_Ordered_Fix.pdf— multi-page latching protocol used byPGBUF_WATCHERchains in heap operations.TK-0811-037-CUBRID_Heap_File_Manager.pdf— original technical document; structural reference for the heap file manager.DML Log sequence.pdf— DML → log record sequence reference.
Notion (CUBRID DEV WIKI)
Section titled “Notion (CUBRID DEV WIKI)”- Storage – Concurrency 코드 분석 — module-level context placing Heap Manager underneath Lock Manager / MVCC / Vacuum.
Textbook chapters (under knowledge/research/dbms-general/)
Section titled “Textbook chapters (under knowledge/research/dbms-general/)”- Database Internals (Petrov), Ch. 3 “File Formats” and Ch. 4 “Implementing B-Trees” — slotted page, RID semantics, free-space management.
- Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom), §13.7 “Variable-Length Data and Records” — forwarding, TOAST-style overflow, in-page reorganization.
CUBRID source (under /data/hgryoo/references/cubrid/)
Section titled “CUBRID source (under /data/hgryoo/references/cubrid/)”src/storage/heap_file.hsrc/storage/heap_file.csrc/storage/slotted_page.hsrc/storage/slotted_page.csrc/storage/heap_attrinfo.h(HEAP_CACHE_ATTRINFO type)src/transaction/mvcc.h(mvcc_rec_header type, recapped here)