Skip to content

CUBRID Heap Manager — Slotted Pages, Record Layout, Operations, MVCC, and Caches

Contents:

A heap file is the row-oriented physical storage at the bottom of a relational database engine: an unordered collection of records spread across pages, indexed by a per-record identifier (RID, ROWID, OID). Database Internals (Petrov, Ch. 3 “File Formats” and Ch. 4 “Implementing B-Trees”) frames it as the storage substrate that indexes point into — index entries carry keys and OIDs, and following an OID terminates in the heap page where the record actually lives. Petrov is also the canonical citation for the slotted page: the page-internal layout that solves the “variable-length records on a fixed-size page” problem by separating fixed-size slots (record offset + length) growing from one end of the page from variable-length record bodies growing from the other, with a free-space gap in the middle. Compaction reuses the gap; deletion only marks a slot, never moves data eagerly.

Two more textbook ingredients shape every implementation:

  1. Variable-length and oversized records. A record can outgrow its current slot (UPDATE that lengthens a string), or be too big for any single page from the start (a multi-megabyte BLOB). The classical answer is forwarding pointers: the original location keeps a “see other” record that points to the actual data, either on another page in the same heap or in a separate overflow file. See Garcia-Molina/Ullman/Widom Database Systems: The Complete Book, §13.7 “Variable-Length Data and Records”.
  2. MVCC version chain anchored in the heap. Once the engine adopts MVCC (see the companion cubrid-mvcc.md), the per-row header has to carry insert / delete MVCCIDs and a back-pointer to the previous version. The visibility predicate then walks heap → log → undo segment as needed. The heap manager and the MVCC subsystem share the record header and split responsibility: the heap owns layout, the MVCC subsystem owns visibility.

This document tracks how CUBRID realizes these three pieces — slotted pages, multi-shape records, and inline MVCC — in src/storage/heap_file.{h,c} and src/storage/slotted_page.{h,c}.

The textbook gives the model; this section names the engineering conventions that almost every row-oriented engine — PostgreSQL, Oracle, MySQL InnoDB, SQL Server, CUBRID — adopts in some form. CUBRID’s specific choices in ## CUBRID's Approach are best read as one set of dials within this shared design space, not as inventions.

Every page in a row-oriented heap is slotted: a small fixed header, a slot directory growing back from the end, and record bodies growing forward from the header. The free space in the middle is the only place inserts can land; deletes leave holes that are either reused (reusable-slot mode) or compacted on demand. The universal property: slots have stable identifiers, so the record’s OID / TID / RID does not change when its body is shifted during compaction.

The per-record identifier is a 3-tuple naming the heap file, the page within it, and the slot within the page. Index entries carry these triples; following one terminates at a slot, and the slot’s offset is the only thing that may move during compaction. PostgreSQL calls it (blocknumber, offsetnumber) and pairs it with xmin/ xmax; Oracle uses the same shape under ROWID; CUBRID uses OID = (volid, pageid, slotid).

In-page forwarding (relocation) for “grew slightly”

Section titled “In-page forwarding (relocation) for “grew slightly””

When an UPDATE makes the new image too big for its current slot, the engine has two cheap options before falling back to the overflow file:

  • Move within the page if there is enough free space (slot identifier stays valid; only the offset in the slot changes).
  • Relocate to another page in the same heap with a forwarding record in the original slot pointing at the new home. Index entries continue to land at the original OID; the engine transparently follows the forwarding record on read. PostgreSQL calls this an HOT chain or, in the cross-page case, a line-pointer redirect; Oracle calls the forwarding record a row migration; CUBRID calls them REC_RELOCATION + REC_NEWHOME.

Overflow file for “too big from the start”

Section titled “Overflow file for “too big from the start””

Records bigger than the page’s payload area cannot live in the heap file at all. They go to a separate overflow file as a page-spanning blob, and the heap stores only a tiny reference record pointing at the overflow location. CUBRID calls this REC_BIGONE + an unstructured page-aligned overflow record.

Finding “a page with enough free space for this record” is a recurring cost. The textbook solution is a per-heap free-space map or a small cache of best-fit pages. Almost every engine maintains something like it: PostgreSQL’s FSM, Oracle’s segment header free lists, InnoDB’s PAGE_LSN-driven free space pages. CUBRID maintains a Best Space cache of N pages plus a second-best hint that seeds full scans. Cache is approximate by design — exact tracking costs more than scanning.

To interpret a raw record’s attributes, the engine has to know the table’s schema (column count, types, offsets, indexes). The schema itself is a row in another heap (the catalog / root class), so a naive read parses two records: the catalog row, then the data row. The standard fix is a class-representation cache keyed by class OID. CUBRID’s is HEAP_CLASSREPR_CACHE; PostgreSQL’s relcache and syscache play the equivalent role.

MVCC version metadata in the record header

Section titled “MVCC version metadata in the record header”

When the engine adopts MVCC, the per-record header gains the standard (inserter, deleter, prev_version_pointer) triple plus a flag byte controlling which fields are physically present. The visibility predicate (see cubrid-mvcc.md) reads the inserter and deleter against the snapshot’s active set; the version chain walks the back-pointer when the current version is too new for the snapshot. CUBRID’s mvcc_rec_header lives inside each REC_HOME / REC_NEWHOME body.

The textbook concepts of §“Theoretical Background” map to CUBRID’s named entities as follows. ## CUBRID's Approach is the slow zoom into each row.

TheoryCUBRID name
Heap file identifierHFID = {volume_file_id, header_page_id}
Per-record identifierOID = {volid, pageid, slotid}
Slotted page headerSPAGE_HEADER in slotted_page.h
Slot directory entrySPAGE_SLOT (offset:14 + length:14 + type:4 = 32 bits)
Record types (vocabulary)9-mode INT16 enum: REC_HOME / REC_RELOCATION / REC_NEWHOME / REC_BIGONE / REC_ASSIGN_ADDRESS / REC_MVCC_NEXT_VERSION / REC_MARKDELETED / REC_DELETED_WILL_REUSE / REC_UNKNOWN
In-page forwarding recordREC_RELOCATION (header) + REC_NEWHOME (body)
Overflow recordREC_BIGONE (heap reference) + raw overflow record (separate file)
Per-heap headerHEAP_HDR_STATS (slot 0 of header page)
Per-page chain linkHEAP_CHAIN (slot 0 of every non-header page)
MVCC version metadata in recordmvcc_rec_header (mvcc.h) embedded inside REC_HOME body
Free-space hint cache (global)HEAP_STATS_BESTSPACE_CACHE
Schema-repr cache (global)HEAP_CLASSREPR_CACHE
Class OID → HFID cache (global)HEAP_HFID_TABLE (lock-free hash)
Per-scan local cacheHEAP_SCANCACHE (page latch + watcher + snapshot)
Per-attrinfo local cacheHEAP_CACHE_ATTRINFO (decoded record values)

CUBRID instantiates the conventions above with two layers — the generic slotted-page abstraction in slotted_page.{h,c} (used by heap, catalog, btree, and other subsystems) and the heap file manager in heap_file.{h,c} that adds the multi-shape record vocabulary, MVCC integration, and five caches. The distinguishing choices are: (1) a 9-record-type vocabulary that makes the forwarding / overflow / MVCC-tombstone states explicit and greppable; (2) the heap header is a real record (slot 0 of the header page) rather than reserved bytes — uniform with normal heap pages, which carry a HEAP_CHAIN in slot 0 instead; (3) the HEAP_OPERATION_CONTEXT struct unifies INSERT / UPDATE / DELETE through one switch on type, instead of three independent code paths; (4) a five-cache architecture (3 global + 2 local) keeps the hot paths off the catalog and off full heap scans.

Figure 1 — Heap file architecture

Figure 1 — Heap file layout. The first page (HFID.hpgid) is the header page: slot 0 holds HEAP_HDR_STATS, slot 1+ may hold records. Every subsequent page is a regular slotted page whose slot 0 holds a HEAP_CHAIN (prev_vpid / next_vpid linkage + max_mvccid for vacuum). All pages are double-linked into one chain rooted at the header. Records too big for any heap page go to a separate overflow file as raw REC_BIGONE payload; the heap holds only the small forwarding record. (Source: original “1. Heap Overview & Architecture” deck.)

flowchart LR
  A["INSERT / UPDATE / DELETE / READ"] --> B["heap_create_<op>_context\n→ HEAP_OPERATION_CONTEXT"]
  B --> C{"op type?"}
  C -- "INSERT" --> I["heap_insert_logical\n• consult bestspace cache\n• allocate slot\n• write record + MVCC header\n• log redo/undo"]
  C -- "UPDATE" --> U["heap_update_logical\n• read existing\n• adjust MVCC header\n  (prev_version_lsa)\n• maybe relocate / overflow\n• log undo with old image"]
  C -- "DELETE" --> D["heap_delete_logical\n• set delete_mvccid\n• special-case overflow\n• log redo/undo"]
  C -- "READ" --> R["heap_get_visible_version\n• fix home page\n• follow REC_RELOCATION /\n  REC_BIGONE if needed\n• mvcc_satisfies_snapshot\n• walk prev_version_lsa if too-new"]
  I --> Z["return result OID\nor SCAN_CODE"]
  U --> Z
  D --> Z
  R --> Z

Each labeled box is unpacked in the subsections below. The boxes do not move; only the level of detail increases.

Slotted page format — slotted_page.{h,c}

Section titled “Slotted page format — slotted_page.{h,c}”

The slotted page is the page-internal substrate: header at the top, slot directory at the bottom (growing back), record bodies in between (growing forward), and a free gap in the middle. CUBRID’s specific shape:

// SPAGE_HEADER — src/storage/slotted_page.h (condensed)
typedef struct spage_header SPAGE_HEADER;
struct spage_header
{
PGNSLOTS num_slots; /* total slots (including holes) */
PGNSLOTS num_records; /* slots actually holding records */
INT16 anchor_type; /* see Anchor types below */
unsigned short alignment;
int total_free; /* total free bytes (incl. holes) */
int cont_free; /* contiguous free area in middle */
int offset_to_free_area; /* end of last record body */
int reserved1;
int flags;
unsigned int is_saving:1; /* undo-image preservation */
unsigned int need_update_best_hint:1;
unsigned int reserved_bits:30;
};
// SPAGE_SLOT — src/storage/slotted_page.h
typedef struct spage_slot SPAGE_SLOT;
struct spage_slot
{
unsigned int offset_to_record:14; /* byte offset of body in page */
unsigned int record_length:14;
unsigned int record_type:4; /* see Record types below */
};

Each slot is 32 bits. With a 16 KB page (DB_PAGESIZE), the offset and length need 14 bits each (16 K ≤ 2^14). The four-bit record_type gives the nine-state record-type vocabulary.

flowchart LR
  subgraph Page["Slotted page (16 KB)"]
    direction LR
    H["SPAGE_HEADER\n(num_slots,\nnum_records,\nanchor_type,\ntotal_free,\noffset_to_free_area)"]
    R0["record body 0"]
    R1["record body 1"]
    R2["record body 2"]
    GAP["… free gap …"]
    S2["slot 2"]
    S1["slot 1"]
    S0["slot 0"]
    H --> R0 --> R1 --> R2 --> GAP --> S2 --> S1 --> S0
  end

The slot at index k never moves — its offset can change (when record bodies are shuffled during compaction) but its identity is fixed. That stability is what makes the OID (volid, pageid, slotid) durable across compactions.

The SPAGE_HEADER.anchor_type field selects the page’s policy on slot reuse:

Anchor typeSlot reuse?Use case
ANCHOREDNoHeap page in FILE_HEAP (slot ID is the on-disk OID — must not be reassigned)
ANCHORED_DONT_REUSE_SLOTSNo(alias / explicit form)
UNANCHORED_ANY_SEQUENCEYesSort runs, query results — slot identity is ephemeral
UNANCHORED_KEEP_SEQUENCEYes (in order)Catalogs / indexes that need stable slot order

Heap pages are always anchored — once a slot ID has been handed out as part of an OID, the slot must remain bound to that OID. After a DELETE, the slot is marked REC_MARKDELETED (kept) or REC_DELETED_WILL_REUSE (reusable in the special FILE_HEAP_REUSE_SLOTS file type, used for some system tables).

Heap file structure — HFID, HEAP_HDR_STATS, HEAP_CHAIN

Section titled “Heap file structure — HFID, HEAP_HDR_STATS, HEAP_CHAIN”

A HFID names a heap file by its first page:

// HFID — src/storage/storage_common.h (sketch)
typedef struct hfid HFID;
struct hfid
{
VFID vfid; /* file identifier (volume + file table id) */
INT32 hpgid; /* page id of the header page */
};

The header page is just a slotted page whose slot 0 holds the heap-wide stats record. Every subsequent page also reserves slot 0, but for the per-page chain record:

// HEAP_HDR_STATS — src/storage/heap_file.c (condensed)
typedef struct heap_hdr_stats HEAP_HDR_STATS;
struct heap_hdr_stats
{
OID class_oid; /* the class whose rows live here */
VFID ovf_vfid; /* overflow file for REC_BIGONE records */
VPID next_vpid; /* first non-header heap page */
int unfill_space; /* per-page reserve kept free for UPDATEs */
struct
{
int num_pages;
int num_recs;
float recs_sumlen;
int num_other_high_best;
int num_high_best;
int num_substitutions;
int num_second_best;
int head_second_best;
int tail_second_best;
int head;
VPID last_vpid;
VPID full_search_vpid;
VPID second_best[HEAP_NUM_BEST_SPACESTATS];
HEAP_BESTSPACE best[HEAP_NUM_BEST_SPACESTATS];
} estimates; /* approximate heap-wide stats + best-space hint */
int reserved0_for_future;
int reserved1_for_future;
int reserved2_for_future;
};
// HEAP_CHAIN — src/storage/heap_file.c
typedef struct heap_chain HEAP_CHAIN;
struct heap_chain
{
OID class_oid;
VPID prev_vpid;
VPID next_vpid;
MVCCID max_mvccid; /* max MVCCID seen on this page (vacuum) */
INT32 flags;
};

Two facts encoded by this layout:

  1. Slot 0 is always reserved. HEAP_HEADER_AND_CHAIN_SLOTID = 0 in heap_file.h. User data starts at slot 1. Code that scans records skips slot 0 explicitly.
  2. The double linkage is per-page. HEAP_CHAIN.prev_vpid / next_vpid form a doubly-linked list of heap pages rooted at the header. This is what heap_next walks when no scan-cache hint is available.

A heap record is one of nine types, encoded in the slot’s 4-bit record_type:

TypeWhere bytes liveMeaning
REC_HOMEInline in this slotNormal record on its home page.
REC_RELOCATIONThis slotForwarding pointer (8 bytes: target OID) — body lives elsewhere as REC_NEWHOME.
REC_NEWHOMEInline (different page)The actual body for a relocated record. Reachable only via REC_RELOCATION.
REC_BIGONEThis slotForwarding pointer to overflow file (target VPID).
REC_ASSIGN_ADDRESSThis slot (zero-length body)An OID has been minted but no record yet — used by catalog / root class flows that need the OID before the record exists.
REC_MVCC_NEXT_VERSIONThis slot(Legacy MVCC version-chain marker; superseded by prev_version_lsa in the record header.)
REC_MARKDELETEDTombstoneSlot deleted on an ANCHORED_DONT_REUSE_SLOTS page (kept forever to preserve OID identity).
REC_DELETED_WILL_REUSETombstone (reusable)Slot deleted on an ANCHORED page (FILE_HEAP_REUSE_SLOTS file type — slot may be re-handed-out).
REC_UNKNOWNReserved / sentinel.

Of these, four are live in normal heap reads: REC_ASSIGN_ADDRESS, REC_HOME, REC_BIGONE, REC_RELOCATION. REC_NEWHOME is reachable only via REC_RELOCATIONheap_next filters it out so it is never returned twice.

Figure 2 — Record types

Figure 2 — Record types in a CUBRID heap. Slotted Page A holds two REC_HOME records (orange) and one REC_RELOCATION forwarding pointer (yellow) → its body lives as a REC_NEWHOME (olive) on Slotted Page B. Records too big for any heap page become a REC_BIGONE forwarding pointer (blue) → the actual body lives as an unmarked overflow record in the overflow file (HEAP_HDR_STATS.ovf_vfid). Index entries always point to the original OID; CUBRID transparently follows the forwarding chain on read. (Source: original “2. Heap Operations” deck.)

// heap_insert_logical (sketch) — src/storage/heap_file.c
int
heap_insert_logical (THREAD_ENTRY *thread_p,
HEAP_OPERATION_CONTEXT *context,
PGBUF_WATCHER *home_hint_p)
{
/* 1. Adjust MVCC header on the record (set ins_mvccid). */
/* 2. If record too big → insert into overflow file first; build
* REC_BIGONE forwarding record into context->recdes_p. */
/* 3. Find a target home page:
* - check HEAP_STATS_BESTSPACE_CACHE
* - else consult HEAP_HDR_STATS.estimates.best[]
* - else scan up to min(20% of pages, 100 pages) starting from
* full_search_vpid / second_best[]
* - else allocate a new heap page (heap_alloc_new_page).
* Once a slot is chosen, the resulting OID is determined and
* its row lock can be acquired — the heap operation itself is
* one of the few code paths that takes the row lock inside
* rather than before the call (UPDATE / DELETE acquire upstream
* in locator). */
/* 4. Splice the record into the slot. */
/* 5. Log redo/undo for crash recovery. */
/* 6. Post-process: bump statistics, update the bestspace cache,
* flag the schema cache (CAS) if this insert is into a
* catalog row. */
}

Three things to internalize:

  1. MVCCID is stamped before the record is placed. The flag bits in the record header are set so the runtime knows whether mvcc_ins_id, mvcc_del_id, prev_version_lsa are present.
  2. OID is decided by the slot, not requested in advance. Hence the row lock is acquired during INSERT, not before. (Compare with UPDATE / DELETE, which already know the target OID and lock upstream in locator.)
  3. “Best” is a hint, not a search. The cache and the estimates bias toward pages-with-room; if they fail, the engine falls back to a bounded scan, then to allocation. The cost amortizes — most inserts take O(1) cache hits.

UPDATE is a superset of INSERT plus a read-old + maybe-relocate step:

flowchart TD
  A["heap_update_logical(context)"] --> B["read existing record\n(via home → REC_RELOCATION → REC_NEWHOME\nor home → REC_BIGONE chain)"]
  B --> C["heap_update_adjust_recdes_header\n• stamp new ins_mvccid\n• fill prev_version_lsa from undo write"]
  C --> D{"new size fits\noriginal slot?"}
  D -- "yes" --> E["overwrite in place\n(REC_HOME → REC_HOME)"]
  D -- "no, fits other heap page" --> F["allocate target slot,\nsplice REC_NEWHOME there,\nrewrite home as REC_RELOCATION"]
  D -- "no, too big for any page" --> G["insert into overflow file,\nrewrite home as REC_BIGONE"]
  E --> Z["log undo + redo"]
  F --> Z
  G --> Z

The current type can also transition. The matrix below summarizes the legal transitions a single UPDATE can produce. Each transition also involves cleanup of the old type’s storage (e.g., REC_RELOCATION’s old REC_NEWHOME slot has to be marked deleted).

Old typeNew type alternatives (priority order)
REC_HOMEREC_HOME (same slot) → REC_RELOCATION + REC_NEWHOMEREC_BIGONE + overflow body
REC_RELOCATION + REC_NEWHOMEREC_HOMEREC_RELOCATION + REC_NEWHOME (same other-page) → REC_RELOCATION + REC_NEWHOME (different page) → REC_BIGONE + overflow
REC_BIGONE + overflow bodyREC_HOMEREC_RELOCATION + REC_NEWHOMEREC_BIGONE + overflow (same body, in place if fits)

The undo image is what the prior version of the row will eventually reach via prev_version_lsa — so the LSN of the undo log entry is what gets stamped into the new record’s header.

DELETE under MVCC does not physically remove the record. It sets mvcc_del_id in the existing record header and logs the operation. Vacuum (heap_vacuum_all_objects) is the one that actually frees the slot once mvcc_del_id is older than the global oldest-visible MVCCID. The DELETE flow therefore looks a lot like UPDATE: read the record, adjust the header, log.

Two special cases:

  1. Overflow records (REC_BIGONE). The MVCC header lives on the overflow page (heap_set_mvcc_rec_header_on_overflow). The delete path edits that page’s record, not the heap-page forwarder. The forwarder is updated only when the overflow record itself is freed.
  2. Non-MVCC classes. Some system catalogs are MVCC-disabled (HEAP_SCANCACHE.mvcc_disabled_class = true). For those, DELETE physically marks the slot REC_MARKDELETED / REC_DELETED_WILL_REUSE immediately.

Read flow — heap_get_visible_version / heap_next

Section titled “Read flow — heap_get_visible_version / heap_next”

Two entry points: a point-read by OID (used after an index lookup) and a sequential scan (used for SELECT *).

// heap_get_visible_version (sketch) — src/storage/heap_file.c
SCAN_CODE
heap_get_visible_version (THREAD_ENTRY *thread_p,
const OID *oid, OID *class_oid,
RECDES *recdes,
HEAP_SCANCACHE *scan_cache,
int ispeeking, int old_chn)
{
/* 1. heap_prepare_object_page: fix the home page (using
* scan_cache->page_watcher to keep the latch across calls). */
/* 2. Inspect the slot type:
* - REC_HOME → done, body is here.
* - REC_RELOCATION → fix forward page, follow to REC_NEWHOME.
* - REC_BIGONE → fix overflow page, read raw record.
* - REC_ASSIGN_ADDRESS / REC_NEWHOME / REC_MARKDELETED /
* REC_DELETED_WILL_REUSE / REC_UNKNOWN → not visible. */
/* 3. Read mvcc_rec_header.
* Apply mvcc_satisfies_snapshot (see cubrid-mvcc.md):
* - SNAPSHOT_SATISFIED → return record body.
* - TOO_OLD_FOR_SNAPSHOT → return S_DOESNT_EXIST.
* - TOO_NEW_FOR_SNAPSHOT → walk prev_version_lsa into the log
* and re-evaluate. */
}

heap_next is the scan variant: it walks the page chain, then within each page walks slots, calling heap_get_visible_version per slot and skipping non-live record types (REC_NEWHOME, REC_ASSIGN_ADDRESS, header / chain, empty slots). The HEAP_SCANCACHE keeps the page latch across calls (cache_last_fix_page = true) so consecutive heap_next calls within the same page do not pay re-fix cost.

flowchart LR
  TX["TX with snapshot"] --> POINT["point read\nheap_get_visible_version(oid)"]
  TX --> SCAN["sequential scan\nheap_next(scan_cache)"]
  POINT --> FIX1["heap_prepare_object_page"]
  SCAN --> FIX2["page chain walk\n(HEAP_CHAIN.next_vpid)"]
  FIX1 --> SLOT
  FIX2 --> SLOT["per-slot dispatch\nby record_type"]
  SLOT -- "HOME" --> READ["read body in place"]
  SLOT -- "RELOCATION" --> NEW["follow → REC_NEWHOME"]
  SLOT -- "BIGONE" --> OVF["follow → overflow file"]
  NEW --> READ
  OVF --> READ
  READ --> VIS["mvcc_satisfies_snapshot"]
  VIS -- "SATISFIED" --> RET["return record"]
  VIS -- "TOO_NEW" --> CHAIN["walk prev_version_lsa\ninto log"]
  VIS -- "TOO_OLD" --> SKIP["skip"]
  CHAIN --> VIS

Every REC_HOME / REC_NEWHOME body and every overflow record carries a mvcc_rec_header whose layout was specified in cubrid-mvcc.md:

// mvcc_rec_header — src/transaction/mvcc.h (recap)
struct mvcc_rec_header
{
INT32 mvcc_flag:8; /* which optional fields are present */
INT32 repid:24; /* representation id */
int chn; /* cache coherency number */
MVCCID mvcc_ins_id; /* set on insert / update */
MVCCID mvcc_del_id; /* set on delete */
LOG_LSA prev_version_lsa; /* back-pointer to previous version
* (lives in undo log) */
};

Figure 3 — MVCC version chain across heap and log

Figure 3 — A record updated twice without intervening DELETE. The heap (top row) holds the current version (mvcc_ins_id3, prev_version_lsa3). Following prev_version_lsa3 lands on a log record in the log file (bottom rows) carrying the previous image with (mvcc_ins_id2, prev_version_lsa2), which in turn points to the original (mvcc_ins_id1, prev_version_lsa1 = NULL). The visibility predicate walks this chain when the heap version is too new for the reading snapshot. (Source: original “3. MVCC in Heap” deck.)

The flag byte controls physical layout — a record that has never been deleted carries no mvcc_del_id field, saving 8 bytes; a record that has never been updated carries no prev_version_lsa. The mvcc_header_size_lookup[8] array (heap_file.c) maps the flag-byte value to the resulting on-disk header size.

The per-page vacuum tracking lives in HEAP_PAGE_VACUUM_STATUS (HEAP_PAGE_VACUUM_NONE / _ONCE / _UNKNOWN). The status lives in HEAP_CHAIN.flags and predicts whether this page still needs future vacuum visits, which gates page deallocation.

CUBRID maintains five caches around the heap manager. The classification (“global” vs “local”) is the curator’s, not in source.

flowchart TB
  subgraph G["Global caches (process-wide)"]
    BS["HEAP_STATS_BESTSPACE_CACHE\nhfid → list of (vpid, freespace)\nseeds insert page selection"]
    CR["HEAP_CLASSREPR_CACHE\nclass_oid → OR_CLASSREP\n(parsed schema, attr offsets, btids)"]
    HF["HEAP_HFID_TABLE\nclass_oid → HFID + classname\n(lock-free hash, size 1000)"]
  end
  subgraph L["Local caches (per scan / op)"]
    SC["HEAP_SCANCACHE\nscan-scoped\n(page_watcher, mvcc_snapshot,\ncache_last_fix_page, file_type)"]
    AI["HEAP_CACHE_ATTRINFO\nrecord-decoding scratchpad\n(read_classrepr, values[])"]
  end

  HF --> BS
  HF --> CR
  CR --> AI
  SC --> AI

Best space cacheHEAP_STATS_BESTSPACE_CACHE (in heap_file.c). Maps HFID → small array of (VPID, freespace) entries. INSERT / UPDATE consult this cache before scanning the heap. The on-disk equivalent in HEAP_HDR_STATS.estimates.best[] is the persistent fallback when the in-memory cache is cold or stale. Sync between cache and on-disk happens lazily via heap_stats_sync_bestspace. The second_best[] array is a hint used as a starting point for full scans (treated as a circular queue with head_second_best / tail_second_best); the num_substitutions counter makes promotion to best[] happen only once every 1 000 insertions, deliberately spreading writes across pages instead of packing them into one.

Class-representation cacheHEAP_CLASSREPR_CACHE. An OR_CLASSREP is the parsed schema for a single class: column types, attribute offsets within a raw record, list of indexes (BTIDs), representation id. Without this cache, every heap_attrinfo_* call would re-read the catalog row and re-parse it. Eviction is handled via heap_classrepr_decache.

HFID tableHEAP_HFID_TABLE. A lock-free hash table from class OID to HFID + cached classname. Hash size is HEAP_HFID_HASH_SIZE = 1000. Because almost every operation needs the HFID for locking and heap traversal, this cache short-circuits the catalog lookup.

// HEAP_HFID_TABLE_ENTRY — src/storage/heap_file.h
struct heap_hfid_table_entry
{
OID class_oid; /* key */
HEAP_HFID_TABLE_ENTRY *stack;
HEAP_HFID_TABLE_ENTRY *next;
UINT64 del_id; /* lock-free reclamation */
HFID hfid; /* value */
FILE_TYPE ftype; /* FILE_HEAP or FILE_HEAP_REUSE_SLOTS */
std::atomic<char*> classname;
};

Scan cacheHEAP_SCANCACHE. Per-scan local state. Crucially holds:

  • cache_last_fix_page — whether to keep the page latch across consecutive heap_next calls.
  • mvcc_snapshot — the snapshot to evaluate visibility against.
  • page_latchLOCK mode to take on each new page (often NULL_LOCK because the class has already been locked S/X/SIX upstream).
  • page_watcher — physical page latch handle held across calls.
  • partition_list — for partitioned classes, the list of HFIDs currently being scanned.

AttrInfo cacheHEAP_CACHE_ATTRINFO. A scratchpad that holds an OR_CLASSREP and an array of decoded HEAP_ATTRVALUEs for one record being read or written. Used by heap_attrinfo_read_dbvalues (decoding) and heap_attrinfo_transform_to_disk (encoding). The last_classrepr field tracks the latest representation; the read_classrepr field tracks the representation actually used for the record being read (which may differ from the latest if the record predates a schema change).

CHN — cache coherency between client and server

Section titled “CHN — cache coherency between client and server”

mvcc_rec_header.chn is the cache coherency number, a non-MVCC counter used by client-side caches. When a query reads a record, the server returns chn along with the data; if the client later asks for the same record and supplies the previous chn, the server can skip resending the body if chn is unchanged.

For MVCC tables, chn does not increment on every update (that is what mvcc_ins_id is for); it increments only for non-MVCC tables (root class, system catalogs). The HEAP_CHNGUESS cache (marked “to-be-used” in source — partially wired, not fully active on the hot path) is intended to short-circuit the chn comparison for hot OIDs.

Anchor on symbol names, not line numbers. The CUBRID source moves; a function name (or struct/enum tag) is the stable handle. Use git grep -n '<symbol>' src/storage/ to locate the current position. The line numbers in the position-hint table at the end of this section were observed when the document was last updated: and are intended only as quick hints.

  • struct hfid (in storage_common.h) — (VFID, hpgid) heap-file identifier.
  • struct spage_header (in slotted_page.h) — slotted-page header.
  • struct spage_slot (in slotted_page.h) — 32-bit slot directory entry (offset / length / type).
  • struct heap_hdr_stats (in heap_file.c) — heap-wide header record (slot 0 of header page).
  • struct heap_chain (in heap_file.c) — per-page chain link (slot 0 of every non-header page).
  • struct heap_scancache (in heap_file.h) — scan-scoped local state.
  • struct heap_operation_context (in heap_file.h) — INSERT / UPDATE / DELETE input + output bundle.
  • struct heap_get_context (in heap_file.h) — read-by-OID input
    • output bundle.
  • struct heap_hfid_table / heap_hfid_table_entry (in heap_file.h) — class OID → HFID lock-free hash.
  • struct heap_cache_attrinfo (in heap_attrinfo.h) — per-record decoder scratchpad.
  • enum HEAP_OPERATION_TYPE (in heap_file.h) — INSERT / UPDATE / DELETE.
  • enum HEAP_PAGE_VACUUM_STATUS (in heap_file.h) — NONE / ONCE / UNKNOWN.
  • Record-type constants (REC_HOME, REC_RELOCATION, REC_NEWHOME, REC_BIGONE, REC_ASSIGN_ADDRESS, REC_MVCC_NEXT_VERSION, REC_MARKDELETED, REC_DELETED_WILL_REUSE, REC_UNKNOWN) (in slotted_page.h).
  • Anchor types ANCHORED, ANCHORED_DONT_REUSE_SLOTS, UNANCHORED_ANY_SEQUENCE, UNANCHORED_KEEP_SEQUENCE (in slotted_page.h).
  • heap_manager_initialize — module init.
  • heap_manager_finalize — module teardown.
  • heap_initialize_hfid_table / heap_finalize_hfid_table — HFID cache.
  • heap_classrepr_restart_cache — clears CR cache (used after schema-change events).

Operation entry points (src/storage/heap_file.c)

Section titled “Operation entry points (src/storage/heap_file.c)”
  • heap_create_insert_context / heap_insert_logical.
  • heap_create_update_context / heap_update_logical.
  • heap_create_delete_context / heap_delete_logical.
  • heap_assign_address — pre-mint an OID (REC_ASSIGN_ADDRESS) before the record is written.
  • heap_get_visible_version — point read with MVCC.
  • heap_get_last_version — point read returning latest committed version regardless of snapshot.
  • heap_next / heap_prev / heap_first / heap_last — scan iterators.
  • heap_scancache_start / heap_scancache_end — scan-cache lifecycle.

Page walk and overflow (src/storage/heap_file.c)

Section titled “Page walk and overflow (src/storage/heap_file.c)”
  • heap_vpid_next / heap_vpid_prev / heap_vpid_skip_next — navigate the heap page chain.
  • heap_alloc_new_page — allocate a new heap page and link it.
  • heap_ovf_find_vfid / heap_ovf_delete — overflow file management for REC_BIGONE.
  • heap_set_mvcc_rec_header_on_overflow / heap_get_mvcc_rec_header_from_overflow — MVCC header manipulation on overflow records.
  • heap_remove_page_on_vacuum / heap_page_set_vacuum_status_none / heap_page_get_vacuum_status — page-level vacuum coordination.
  • heap_classrepr_get / heap_classrepr_free / heap_classrepr_decache — class-representation cache.
  • heap_get_class_info / heap_cache_class_info / heap_get_hfid_if_cached — HFID table lookups.
  • heap_stats_update / heap_should_try_update_stat / heap_stats_sync_bestspace — bestspace cache sync.
  • heap_attrinfo_start / heap_attrinfo_end / heap_attrinfo_read_dbvalues / heap_attrinfo_transform_to_disk — AttrInfo cache.
  • heap_chnguess_get / heap_chnguess_put / heap_chnguess_clear — CHN guess cache.
  • heap_rv_undo_insert / heap_rv_redo_insert / heap_rv_mvcc_redo_insert.
  • heap_rv_undo_delete / heap_rv_redo_delete / heap_rv_mvcc_undo_delete / heap_rv_mvcc_redo_delete_*.
  • heap_rv_undo_update / heap_rv_redo_update / heap_rv_undoredo_update / heap_rv_redo_update_and_update_chain.
  • heap_rv_redo_newpage / heap_rv_redo_reuse_page.

These line numbers held when the document was last updated:. If you land at a different definition, the symbol name above is authoritative; update the table on your way through.

SymbolFileLine
struct spage_headerslotted_page.h(varies)
struct spage_slotslotted_page.h(varies)
struct heap_hdr_statsheap_file.c(varies, near top)
struct heap_chainheap_file.c(varies, near top)
struct heap_scancacheheap_file.h143
struct heap_operation_contextheap_file.h267
struct heap_get_contextheap_file.h362
struct heap_hfid_table_entryheap_file.h201
enum HEAP_PAGE_VACUUM_STATUSheap_file.h354
heap_manager_initializeheap_file.c(varies)
heap_insert_logicalheap_file.c(varies)
heap_update_logicalheap_file.c(varies)
heap_delete_logicalheap_file.c(varies)
heap_get_visible_versionheap_file.c(varies)
heap_nextheap_file.c(varies)
heap_classrepr_getheap_file.c(varies)
heap_alloc_new_pageheap_file.c(varies)

The line column is intentionally left as (varies) for the heap file because it is the largest source file in the project (≈ 27 000 lines). Symbol-level git grep is the recommended lookup.

Each entry is a fact about the current source — readable without the original analysis materials. The trailing note shows how it was checked and, where relevant, historical drift or limits of verification. Open questions follow as the curator’s recorded gaps; future readers should treat them as starting points, not as known bugs.

Heading date bumped 2026-05-01 to align with the frontmatter updated: field; the underlying source-verification reads were performed on 2026-04-29 and have not drifted since (no re-verification this pass). Each fact bullet retains its original 2026-04-29 verification date.

  • HEAP_HEADER_AND_CHAIN_SLOTID = 0. Verified in heap_file.h on 2026-04-29. Slot 0 is unconditionally reserved on every heap page — for HEAP_HDR_STATS on the header page, for HEAP_CHAIN on every other page. Code that scans records skips slot 0 explicitly; the scan iterators (heap_next, heap_first) treat slot 0 as not-a-record.

  • The HFID lock-free hash table has 1 000 buckets. HEAP_HFID_HASH_SIZE = 1000 in heap_file.h, verified 2026-04-29. Hard-coded; not a runtime parameter.

  • The “page is comfortably free” threshold is 30 % of page size. HEAP_DROP_FREE_SPACE = (int)(DB_PAGESIZE * 0.3) in heap_file.h, verified 2026-04-29. With the default 16 KB page, a page enters the bestspace cache only when at least ≈ 4 915 bytes are free. The threshold trades insert-speed (more candidate pages, fewer scans) for storage utilization (reserved space sits unused on partially-filled pages).

  • Slot fields are 14/14/4 bits. spage_slot is offset_to_record:14 + record_length:14 + record_type:4 = 32 bits. Verified in slotted_page.h on 2026-04-29. The 14-bit cap on offset matches DB_PAGESIZE = 16384 exactly (2^14); a page-size increase beyond 16 KB would require widening this field.

  • Slot zero is not the only special slot — anchor type also matters. ANCHORED_DONT_REUSE_SLOTS makes the slot table monotone-grow-only; UNANCHORED_* allows reuse. Heap pages of FILE_HEAP use ANCHORED_DONT_REUSE_SLOTS so OIDs never alias across deletions; pages of FILE_HEAP_REUSE_SLOTS (used for some root-class catalogs) use ANCHORED to allow slot reuse. Verified in slotted_page.c on 2026-04-29 by reading spage_slot_descriptor_setup and the FILE_HEAP_* type mapping.

  • HEAP_OPERATION_CONTEXT carries four PGBUF_WATCHERs. home_page_watcher, overflow_page_watcher, header_page_watcher, forward_page_watcher. An UPDATE that relocates a record across heap pages can hold up to three of them simultaneously (home, forward, header). The fourth (overflow) is for REC_BIGONE paths. Verified in heap_file.h on 2026-04-29.

  • The record-type vocabulary is 4 bits = 16 values; only 9 are used. Verified in slotted_page.h on 2026-04-29. Values 9–15 are reserved.

  • mvcc_header_size_lookup[8] indexes by flag-byte value. The MVCC flag byte has 5 bits in OR_MVCC_FLAG_MASK = 0x1f (see cubrid-mvcc.md), which gives 32 possible values, but only combinations of the 3 documented bits actually occur, hence the 8-entry lookup. Confirmed via heap_file.c on 2026-04-29.

  1. Why a hash size of exactly 1 000? HEAP_HFID_HASH_SIZE = 1000 is hard-coded with no obvious tuning rationale. Investigation path: trace the value through git history; check whether workloads with > 1 000 classes (large schemas) experience hash collisions on the hot path; consider whether it should scale with max_clients or num_classes.

  2. num_substitutions = 1000 — why this constant for second-best promotion? The bestspace cache promotes a page from second_best to best only every 1 000 insertions. The deck says this is to “spread out” inserts across pages by leaving high-locality space underutilized. Investigation path: instrument the promotion rate under different workloads; confirm the locality argument empirically.

  3. REC_MVCC_NEXT_VERSION looks legacy. The comment in slotted_page.h describes it as “legacy MVCC version-chain marker”, and the modern path uses prev_version_lsa in the record header instead. Is REC_MVCC_NEXT_VERSION actually produced anywhere in the current code, or is it only consumed for backward compatibility on old data files? Investigation path: git grep REC_MVCC_NEXT_VERSION and trace producer call sites.

  4. CHN guess cache (HEAP_CHNGUESS) status. The deck calls it “TBU” (to-be-used) and the source has the structure but limited use. Is this complete and live, partially abandoned, or a stub? Investigation path: check call sites of heap_chnguess_get / heap_chnguess_put and run a CAS workload to see whether the cache fills.

  5. unfill_space behavior in modern builds. HEAP_HDR_STATS reserves a per-page unfill_space for future UPDATE growth. The deck describes this as a static reserve, but recent commits may have made it adaptive. Investigation path: trace the field’s assignment paths and whether any vacuum step rebalances it.

  6. Interaction with OOS (Out-of-row Overflow Storage). The feat/oos branch is reshaping heap_file.c and the record header to add OR_MVCC_FLAG_HAS_OOS. This document targets develop; once OOS lands, the record-type vocabulary and the overflow flow described here will need a follow-up. (See the adjacent OOS context skill for current status.)

Beyond CUBRID — Comparative Designs & Research Frontiers

Section titled “Beyond CUBRID — Comparative Designs & Research Frontiers”

Pointers, not analysis. Each bullet is a starting handle for a follow-up doc; depth here is intentionally shallow.

  • PostgreSQL HOT (Heap-Only Tuple). PostgreSQL keeps old versions inline on the same page when an UPDATE does not touch any indexed column, avoiding new index entries. CUBRID’s in-place vs REC_RELOCATION decision is structurally similar but driven by size rather than index touch. A direct comparison would clarify which workload regime each policy optimizes.
  • PostgreSQL TOAST (The Oversized-Attribute Storage Technique) vs CUBRID REC_BIGONE + overflow file. Both push large values out of the heap, but TOAST is per-attribute and inline-line- pointer; CUBRID is per-record and to a separate file. Different trade-offs for selective scans vs full reads.
  • InnoDB clustered-index storage. InnoDB stores rows ordered by PK in a B-tree leaf, not in a heap. The heap manager abstraction does not exist; the equivalent is the clustered index’s leaf-page layout. Comparison would frame why CUBRID separates “where rows live” from “where indexes point”.
  • In-place vs out-of-place version chains. PostgreSQL keeps old versions inline on the heap; Oracle UNDO and CUBRID prev_version_lsa push old versions into the log. Wu et al. In-Memory MVCC Empirical Evaluation (VLDB 2017) compares these design points quantitatively. Worth quantifying for CUBRID’s typical OLTP workload.
  • MVCC garbage-collection scheduling. CUBRID’s HEAP_PAGE_VACUUM_STATUS predicts whether a page needs further vacuum visits before deallocation. Compare with PostgreSQL’s visibility map / freeze map and Oracle’s UNDO retention tuning.
  • Column-store storage as an alternative. Engines that store columns separately (Vertica, MonetDB, Snowflake) make slotted-page mechanics largely irrelevant. CUBRID is row-only today; positioning the heap manager against column-store designs sets up “what would a column-store CUBRID look like” conversations.
  • Recent paper trail. Wu et al., empirical MVCC (VLDB 17); the OOS feature design doc on this knowledge base (in progress); Stoica & Ailamaki, “Enabling Efficient OS Paging for Main-Memory OLTP Databases” (DaMoN 13) for the buffer-pool / heap interaction.

The intent of this section is to seed next documents, not to analyze. Each bullet should become its own curated note when its turn comes.

Raw analyses (under raw/code-analysis/cubrid/storage/heap_manager/)

Section titled “Raw analyses (under raw/code-analysis/cubrid/storage/heap_manager/)”
  • 1._Heap_Overview_Architecture.pdf — heap file structure, slotted page primer, record types overview.
  • 2._Heap_Operations.pdf — INSERT / UPDATE / DELETE / READ flows with worked examples and the update-transition matrix.
  • 3._MVCC_in_Heap.pdf — MVCC record header layout, visibility predicate, prev_version_lsa chain.
  • 4._Caches_in_heap.pdf — bestspace cache, classrepr cache, HFID table, scancache, attrinfo cache.
  • 5._Record_in_heap.pdf — record-format details (representation id, flag bits, fixed/variable attributes).
  • [코드분석]heap.pptx — presentation deck consolidating the five-part series, with diagrams of slotted page layout, record types, update transitions, and cache architecture. Most useful text-side index of figures.
  • slotted_page.pdf / slotted page_min.pdf — slotted-page format reference, anchor types, record types.
  • CUBRID-Multiple_Page_Ordered_Fix.pdf — multi-page latching protocol used by PGBUF_WATCHER chains in heap operations.
  • TK-0811-037-CUBRID_Heap_File_Manager.pdf — original technical document; structural reference for the heap file manager.
  • DML Log sequence.pdf — DML → log record sequence reference.
  • Storage – Concurrency 코드 분석 — module-level context placing Heap Manager underneath Lock Manager / MVCC / Vacuum.

Textbook chapters (under knowledge/research/dbms-general/)

Section titled “Textbook chapters (under knowledge/research/dbms-general/)”
  • Database Internals (Petrov), Ch. 3 “File Formats” and Ch. 4 “Implementing B-Trees” — slotted page, RID semantics, free-space management.
  • Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom), §13.7 “Variable-Length Data and Records” — forwarding, TOAST-style overflow, in-page reorganization.

CUBRID source (under /data/hgryoo/references/cubrid/)

Section titled “CUBRID source (under /data/hgryoo/references/cubrid/)”
  • src/storage/heap_file.h
  • src/storage/heap_file.c
  • src/storage/slotted_page.h
  • src/storage/slotted_page.c
  • src/storage/heap_attrinfo.h (HEAP_CACHE_ATTRINFO type)
  • src/transaction/mvcc.h (mvcc_rec_header type, recapped here)