CUBRID Overflow File — Heap Big-Record and B+Tree Overflow-OID Page Chains
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
Section titled “Theoretical Background”The slotted page (cubrid-heap-manager.md §“Slotted page”) solves the “variable-length records on a fixed-size page” problem only within the page. Once a record itself outgrows what a page can hold, or once the per-key satellite payload of a non-unique index outgrows what a single leaf record can hold, slotted layout has nothing left to say. Database Internals (Petrov, ch. 3 §“File Formats”, §“Variable-Size Records”) frames the three classical strategies:
- In-line growth with tombstones. Keep the record on its home page; if it grows beyond the slot, move it within the page and leave a tombstone. Works only for small growth. Garcia-Molina / Ullman / Widom Database Systems: The Complete Book, §13.7 “Variable-Length Data and Records”.
- In-line forwarding. Move the new image to another page in the
same heap; leave a forwarding pointer at the original slot.
Indexes still point to the original OID; reads pay one extra
page-fix to follow the pointer. CUBRID’s
REC_RELOCATION+REC_NEWHOME(cubrid-heap-manager.md §“Record-type vocabulary”). - Out-of-line overflow. When the record is too big for any page in the heap, push the bulk to a separate file (the overflow file) and store only a small reference at the home slot. The reference is the original OID’s permanent identity; the overflow chain is private to it. Comer (1979) and the IBM System R notes both describe this for “long fields”; PostgreSQL formalized it as TOAST (The Oversized-Attribute Storage Technique).
The same problem appears one level up in B+Tree non-unique
indexes. A leaf record carries key || OID_list; when too many
duplicates accumulate for one key, the OID list outgrows the leaf
page. The textbook fix (Bayer, Lehman / Yao, and successors) is
identical in shape: spill the surplus into a per-key overflow
chain — distinct from the heap’s overflow file but built on the
same disk-file substrate.
CUBRID has one symbol-level overflow-file module
(src/storage/overflow_file.{h,c}) that serves both spillover
needs, plus two callers that wrap it differently:
- Heap “big record” path —
heap_ovf_*inheap_file.c. - B+Tree overflow-OID-list path —
btree_*overflow*inbtree.c(usesoverflow_file.conly for overflow keys; the OID-list overflow chain is a different on-page layout built directly on slotted B+Tree pages, but reuses the same “linked list of pages owned by one logical record” model).
This document closes the gap that cubrid-heap-manager.md and
cubrid-btree.md both leave open by pointing at “see
overflow_file.c”.
Common DBMS Design
Section titled “Common DBMS Design”Every row-store with variable-length records and non-unique indexes reaches for some form of out-of-line spillover. The shared ingredients:
One overflow file per containing object
Section titled “One overflow file per containing object”The unit of allocation is the file (CUBRID’s VFID,
PostgreSQL’s toast relation, InnoDB’s off-page page chain).
One overflow file is bound to one heap, or one B+Tree, or one
column. Pages allocated from that file are reused only by the
same owner — no cross-tenant overflow.
Single-linked page chain per logical record
Section titled “Single-linked page chain per logical record”The bulk of the data lives in a chain of pages, each carrying a “next-page” pointer in a small header. Random access into the middle of a long record is supported by walking from the head, optionally skipping pages whose offsets the caller already knows. Single-linked is sufficient because all callers know the head page (it is what the home record’s reference points at).
Reference record at the original location
Section titled “Reference record at the original location”The original location keeps a small fixed-size record carrying just enough to find the overflow chain: a page identifier and sometimes a length. Indexes and other readers continue to point at the original; the overflow lookup is transparent to them.
Length stored on the first page
Section titled “Length stored on the first page”The total byte length of the logical record lives on the first page only, so the head-page fix returns it without walking. The rest pages have just the next-page link.
Pages not individually locked
Section titled “Pages not individually locked”Because each chain is private to one logical record, and that record’s identity is already protected (by row lock for heap, by key range lock for index OID lists), the overflow pages themselves do not require independent lock-manager entries. The WAL discipline plus the head-page latch are sufficient.
Reference designs
Section titled “Reference designs”- PostgreSQL TOAST — separate
pg_toast_<oid>relation per user table, automatically created when a column’s row exceedsTOAST_TUPLE_THRESHOLD(≈ 2 KB on 8 KB pages). Each oversized attribute is compressed (LZ-class), then chunked into 1996-byte pieces stored as ordinary tuples in the toast relation, keyed by(chunk_id, chunk_seq). The user row holds an 18-byte toast pointer (varatt_external) carrying the toast OID, original size, and compressed size. Readersde-toastlazily — selective reads that don’t touch the column never fetch the chunks. - InnoDB DYNAMIC / COMPRESSED row format — when a row exceeds half the page, BLOB / TEXT / VARCHAR columns may be stored off-page in a chain of blob pages allocated from the same tablespace as the clustered index. The clustered-index record holds a 20-byte pointer per off-page column (space, page no, offset, length). DYNAMIC stores only the pointer in the index page; COMPRESSED additionally compresses the leaf page.
- Oracle row chaining / migration — Oracle uses a different
word for two cases: chained rows (record larger than one block
→ split across multiple blocks linked by ROWID) and migrated
rows (UPDATE made the record too big for the original block →
moved to another block, leaving a forwarding ROWID stub).
Visible to administrators as
V$SYSSTAT.table fetch continued rowandchained_rowsinDBA_TABLES; analyzed viaANALYZE TABLE ... LIST CHAINED ROWS. - MySQL MyISAM dynamic format — segments the record into variable-length chunks linked by (offset, length) headers inline within the data file; no separate overflow file.
CUBRID sits in the PostgreSQL family: a separate file per heap for big records, plus a per-tree file for B+Tree overflow keys. But unlike TOAST it does not chunk per attribute — the whole record spills as one blob — and it does not compress. B+Tree-OID-list overflow is its own design point (an OID-sorted slotted page in a per-tree file), unrelated to the heap path beyond the file-substrate.
Theory ↔ CUBRID mapping
Section titled “Theory ↔ CUBRID mapping”| Theoretical concept | CUBRID name |
|---|---|
| Heap big-record overflow file | FILE_MULTIPAGE_OBJECT_HEAP (file_manager.h), VFID stored as HEAP_HDR_STATS.ovf_vfid |
| Heap reference record (forwarding) | REC_BIGONE (cubrid-heap-manager.md §“Record types”) |
| B+Tree overflow-key file | FILE_BTREE_OVERFLOW_KEY, VFID stored as BTID_INT::ovfid |
| B+Tree overflow-key reference | LEAF_REC.ovfl for leaf, NON_LEAF_REC + ovfl for non-leaf |
| B+Tree overflow-OID-list page chain | Slotted B+Tree pages allocated from BTID_INT::ovfid, headed by BTREE_OVERFLOW_HEADER |
| First-page header (heap big-rec / ovfkey) | OVERFLOW_FIRST_PART { next_vpid; length; data[] } (overflow_file.h) |
| Rest-page header (heap big-rec / ovfkey) | OVERFLOW_REST_PART { next_vpid; data[] } (overflow_file.h) |
| Overflow OID page header (B+Tree) | BTREE_OVERFLOW_HEADER { next_vpid } (btree_load.h:244, slot 0/HEADER) |
| Insert / update / delete API | overflow_insert / overflow_update / overflow_delete (overflow_file.c) |
| Heap wrappers | heap_ovf_insert / heap_ovf_update / heap_ovf_delete |
| B+Tree key wrappers | btree_store_overflow_key / btree_load_overflow_key / btree_delete_overflow_key |
| B+Tree OID-list operations | btree_start_overflow_page / btree_modify_overflow_link / btree_find_oid_from_ovfl |
| Find-or-create overflow file | heap_ovf_find_vfid (heap), btree_create_overflow_key_file (btree) |
| WAL records | RVOVF_NEWPAGE_INSERT, RVOVF_NEWPAGE_LINK, RVOVF_PAGE_UPDATE, RVOVF_CHANGE_LINK (recovery.h:100-103) |
| First-overflow-page marker (heap MVCC) | LOG_DUMMY_OVF_RECORD (used to tell vacuum / replication which is the head) |
| MVCC header on overflow | heap_get_mvcc_rec_header_from_overflow / heap_set_mvcc_rec_header_on_overflow (heap_file.c) |
CUBRID’s Approach
Section titled “CUBRID’s Approach”The module sits as a substrate for two distinct callers — heap big-records and B+Tree overflow keys — and as a template for the third (B+Tree overflow-OID lists, which reuse the linked-pages discipline but build their own on-page slotted format). Five properties shape every code path:
- One generic API, three file types.
overflow_insert/overflow_update/overflow_deleteaccept aFILE_TYPEdiscriminator. Only three values are legal at the assert atoverflow_file.c:102:FILE_TEMP(sort runs),FILE_BTREE_OVERFLOW_KEY, andFILE_MULTIPAGE_OBJECT_HEAP. Anything else trips theassert. OVERFLOW_FIRST_PART≠OVERFLOW_REST_PART. The first page of any chain has an extralengthfield carrying total record bytes; rest pages do not. Header size differs between the two, so the per-page payload capacity differs bysizeof (int).- No per-page lock. The header comment at
overflow_file.c:77is explicit: “Overflow pages are not locked in any mode since they are not shared by other pieces of data and its address is only know by accessing the relocation overflow record data which has been appropriately locked.” The home reference’s lock is the access-control gate for the whole chain. - System operation per
_insert/_update/_delete. Multi-page allocations are bracketed inlog_sysop_start/log_sysop_attach_to_outer(commit) orlog_sysop_abort(failure), so the caller’s outer transaction sees the spill as an atomic step — partial chains never escape an aborted sysop. - MVCC header lives on the head page only (heap path). The
record’s first 24 bytes (
OR_MVCC_MAX_HEADER_SIZE) on the first overflow page carry themvcc_rec_headerthat the visibility predicate consults; the heap home page’sREC_BIGONEslot does not. This is whyheap_deleteof a big record edits the overflow page rather than the heap page (heap_file.c:21459–21498).
Heap-overflow path — overall layout
Section titled “Heap-overflow path — overall layout”flowchart LR
subgraph HEAP["Heap file (FILE_HEAP)"]
HP["Heap page\nslot k = REC_BIGONE\n→ {ovf_vpid: VPID, ovf_oid.slotid = NULL_SLOTID}"]
end
subgraph OVF["Overflow file (FILE_MULTIPAGE_OBJECT_HEAP)\nVFID = HEAP_HDR_STATS.ovf_vfid"]
P0["First page\nOVERFLOW_FIRST_PART {\n next_vpid → P1,\n length = total bytes,\n data[ MVCC hdr | row body 0..N ]\n}"]
P1["Rest page\nOVERFLOW_REST_PART {\n next_vpid → P2,\n data[ row body N+1..M ]\n}"]
P2["Rest page\nOVERFLOW_REST_PART {\n next_vpid = NULL,\n data[ row body M+1..end ]\n}"]
P0 --> P1 --> P2
end
HP -- "ovf_vpid points at P0" --> P0
The slot in the heap page is type REC_BIGONE and its body is an
OID with pageid / volid set to the first overflow page and
slotid = NULL_SLOTID (heap_file.c:6582 —
“slotid = NULL_SLOTID; / Irrelevant /”). The overflow record has
no slot directory of its own; pages are addressed by VPID, and the
record body is the contiguous byte stream stitched together from
each page’s data[] area.
Walk: heap insert hits the overflow path
Section titled “Walk: heap insert hits the overflow path”sequenceDiagram
participant CTX as HEAP_OPERATION_CONTEXT
participant HI as heap_insert_logical
participant OVF as heap_ovf_insert
participant OFI as overflow_insert
participant FM as file_manager
participant LM as log_manager
CTX->>HI: record too big for any heap page
HI->>OVF: heap_ovf_insert(hfid, &ovf_oid, recdes)
OVF->>OVF: heap_ovf_find_vfid(... docreate=true ...)
alt overflow file does not exist yet
OVF->>FM: file_create_with_npages(FILE_MULTIPAGE_OBJECT_HEAP, 1, ...)
OVF->>LM: log_append_undo/redo RVHF_STATS (heap-header change)
OVF->>OVF: heap_hdr->ovf_vfid := new VFID
end
OVF->>OFI: overflow_insert(ovf_vfid, &ovf_vpid, recdes, FILE_MULTIPAGE_OBJECT_HEAP)
OFI->>OFI: npages = ceil((len - first_payload) / rest_payload) + 1
OFI->>LM: log_sysop_start
OFI->>FM: file_alloc_multiple(npages, vpids[])
loop for each page i in 0..npages-1
OFI->>OFI: pgbuf_fix(vpids[i], OLD_PAGE, LATCH_WRITE)
OFI->>OFI: write OVERFLOW_FIRST_PART/REST_PART header + payload
alt i == 0 (first page)
OFI->>LM: log_append_empty_record(LOG_DUMMY_OVF_RECORD)
end
OFI->>LM: log_append_redo_data(RVOVF_NEWPAGE_INSERT, page bytes)
OFI->>OFI: pgbuf_set_dirty_and_free
end
OFI->>LM: log_sysop_attach_to_outer
OFI-->>OVF: NO_ERROR, ovf_vpid = vpids[0]
OVF-->>HI: ovf_oid = (vpids[0].volid, vpids[0].pageid, NULL_SLOTID)
HI->>HI: write REC_BIGONE record carrying ovf_oid into home heap page
Two design points to internalize:
-
Page count is computed up-front (
overflow_file.c:112-121). The estimate is exact, not an over-allocation; if the math is wrong, anassert (length == 0)at the end of the loop fires in debug builds. The formula:// overflow_insert — overflow_file.c (condensed)length = recdes->length - (DB_PAGESIZE - (int) offsetof (OVERFLOW_FIRST_PART, data));if (length > 0){i = DB_PAGESIZE - offsetof (OVERFLOW_REST_PART, data);npages = 1 + CEIL_PTVDIV (length, i);}else{npages = 1;} -
file_alloc_multiple+ per-page fix loop, notfile_allocinside the loop. All pages are allocated under onelog_sysop_start; failure on any single page aborts the sysop and rolls back every allocation already done. This is why the caller never sees half-built chains.
Walk: heap update of a big record
Section titled “Walk: heap update of a big record”heap_ovf_update (heap_file.c:6597) is the special-case path
that calls into overflow_update. The interesting move is what
happens when the new image is shorter or longer than the
old:
// overflow_update — overflow_file.c (sketch)log_sysop_start (thread_p);while (length > 0) { addr.pgptr = pgbuf_fix (next_vpid, OLD_PAGE, LATCH_WRITE); /* compute new copy_length for this page; log undo image */ log_append_undo_data (RVOVF_PAGE_UPDATE, before-image); memcpy (page->data, source, copy_length); log_append_redo_data (RVOVF_PAGE_UPDATE, after-image); if (length > 0 && VPID_ISNULL (&next_vpid)) { /* extend chain: allocate a new rest-page */ file_alloc (ovf_vfid, ..., &next_vpid); log_append_undoredo_data (RVOVF_NEWPAGE_LINK, ...); first_part->next_vpid = next_vpid; /* or rest_parts->next_vpid */ } }/* truncate any tail pages no longer needed */while (!VPID_ISNULL (&next_vpid)) { /* unfix tmp_vpid, file_dealloc it */ }log_sysop_attach_to_outer (thread_p);Three nuances:
- Update is restricted to
FILE_MULTIPAGE_OBJECT_HEAPat the top of the function (overflow_file.c:398). B+Tree overflow keys are replaced (delete + insert), not updated — so this path runs only for heap big-records. - Logging is split: undo for the before-image, redo for the
after-image, plus a
RVOVF_NEWPAGE_LINKundo/redo whenever the chain extends. TheRVOVF_CHANGE_LINKundo/redo is emitted when the chain shrinks — the head’snext_vpidis set to NULL and the tail pages are deallocated one by one. LOG_DUMMY_OVF_RECORDis re-emitted on the first page (overflow_file.c:461) so HA replication and vacuum can re- identify the chain head after the update; the dummy log record carries no payload but anchors the LSN that replicas use.
Walk: heap delete of a big record (MVCC vs non-MVCC)
Section titled “Walk: heap delete of a big record (MVCC vs non-MVCC)”heap_delete_logical for a REC_BIGONE slot does not call
overflow_delete directly under MVCC. Instead it edits the MVCC
header on the first overflow page (the mvcc_del_id lives there,
not on the heap home page):
// excerpt — heap_file.c around 21459-21498heap_get_mvcc_rec_header_from_overflow (overflow_page, &overflow_header, NULL);heap_mvcc_log_delete (thread_p, &log_addr, RVHF_MVCC_DELETE_OVERFLOW);heap_delete_adjust_header (&overflow_header, mvcc_id, /*is_insid=*/false);heap_set_mvcc_rec_header_on_overflow (overflow_page, &overflow_header);pgbuf_set_dirty (overflow_page, DONT_FREE);heap_mvcc_log_home_no_change (...); /* heap home: no change, but vacuum needs to see it */The home heap page’s REC_BIGONE slot is not modified during
an MVCC delete. Only when vacuum (or a non-MVCC class delete)
finally reclaims the row does heap_ovf_delete →
overflow_delete → overflow_traverse(OVERFLOW_DO_DELETE) walk
the chain and file_dealloc each page. At that point the heap
home slot is also marked REC_MARKDELETED /
REC_DELETED_WILL_REUSE.
Walk: heap read of a big record
Section titled “Walk: heap read of a big record”The home heap page’s REC_BIGONE slot is a small forwarding
pointer (an OID-shaped 12-byte payload). To read the actual row,
heap_get_visible_version follows the pointer:
// heap_get_visible_version — REC_BIGONE branch (sketch)case REC_BIGONE: /* Read forward_oid from the REC_BIGONE slot. */ /* Apply MVCC visibility against the overflow page's MVCC header. */ pgptr_ovf = pgbuf_fix (forward_oid, OLD_PAGE, LATCH_READ, ...); heap_get_mvcc_rec_header_from_overflow (pgptr_ovf, &mvcc_header, &peek_recdes); if (mvcc_satisfies_snapshot(...) != SNAPSHOT_SATISFIED) return S_DOESNT_EXIST_OR_WALK_PREV_VERSION_LSA; scan = heap_ovf_get (...); /* materializes the chain into the caller's recdes */heap_ovf_get calls overflow_get →
overflow_get_nbytes (start_offset=0, max_nbytes=-1) with the
caller’s MVCC snapshot. The visibility check happens inside
overflow_get_nbytes (overflow_file.c:769-780), not at the
call site:
// overflow_get_nbytes — visibility (overflow_file.c:769-780)if (mvcc_snapshot != NULL) { MVCC_REC_HEADER mvcc_header; heap_get_mvcc_rec_header_from_overflow (pgptr, &mvcc_header, NULL); if (mvcc_snapshot->snapshot_fnc (thread_p, &mvcc_header, mvcc_snapshot) == TOO_OLD_FOR_SNAPSHOT) { pgbuf_unfix_and_init (thread_p, pgptr); return S_SNAPSHOT_NOT_SATISFIED; } }Note the asymmetry: only TOO_OLD_FOR_SNAPSHOT aborts the read;
TOO_NEW_FOR_SNAPSHOT records are still returned. The comment in
the source explains: “TOO_NEW_FOR_SNAPSHOT records should be
accepted, e.g. a recently updated record, locked at select”. This
matters because a SELECT ... FOR UPDATE against a big record
that was just updated by an uncommitted transaction must still see
the candidate row to acquire the lock.
B+Tree overflow path — two distinct shapes
Section titled “B+Tree overflow path — two distinct shapes”The B+Tree module uses overflow files for two unrelated things:
- Overflow keys — when a single key is too large to fit
inline in a leaf or non-leaf record (e.g., a long string PK).
This uses
overflow_file.cwith file typeFILE_BTREE_OVERFLOW_KEY. The chain stores the serializedDB_VALUEof the key, just like heap big-records store the row body. - Overflow OID lists — when a non-unique key has too many
OIDs to fit inline in its leaf record. This does not use
overflow_file.c; it builds its own page format on top of slotted B+Tree pages allocated from the same per-tree fileBTID_INT::ovfid.
Both share BTID_INT::ovfid as the file VFID, created once per
B+Tree by btree_create_overflow_key_file (btree.c:1975):
// btree_create_overflow_key_file — btree.c:1975 (condensed)des.btree_key_overflow.btid = *btid->sys_btid;des.btree_key_overflow.class_oid = btid->topclass_oid;file_create_with_npages (thread_p, FILE_BTREE_OVERFLOW_KEY, 3, &des, &btid->ovfid);heap_get_class_tde_algorithm (&btid->topclass_oid, &tde_algo);file_apply_tde_algorithm (&btid->ovfid, tde_algo); /* TDE for at-rest encryption */The file is created with at least 3 pages preallocated and inherits the index’s class-level TDE setting. Both overflow-key chains and overflow-OID-list chains live in the same file.
Walk: B+Tree overflow OID list
Section titled “Walk: B+Tree overflow OID list”flowchart LR
subgraph LEAF["Leaf page"]
LR0["Leaf record:\n[ first_oid (8B) | (class_oid 8B if cross-class unique) |\n ins_mvccid | del_mvccid | KEY bytes |\n oid_2 | oid_3 | ... ] |\nLEAF_REC.ovfl → V0"]
end
subgraph OVF["Per-key overflow chain (in BTID_INT::ovfid)"]
V0["Overflow page V0 (PAGE_BTREE)\nslot 0 / HEADER = BTREE_OVERFLOW_HEADER {\n next_vpid → V1\n}\nslot 1 = sorted-by-OID record:\n[ oid_k | ins_mvccid | del_mvccid | oid_{k+1} | ... ]"]
V1["Overflow page V1\nslot 0 / HEADER = next_vpid → V2\nslot 1 = more sorted OIDs"]
V2["Overflow page V2\nslot 0 / HEADER = next_vpid = NULL\nslot 1 = remaining sorted OIDs"]
V0 --> V1 --> V2
end
LR0 -- "leaf_rec_info.ovfl points at V0" --> V0
Properties of the OID-list overflow chain:
- Each overflow page is a slotted page of type
PAGE_BTREE— notPAGE_OVERFLOW.pgbuf_check_page_ptypeatbtree.c:11606confirms this. The check distinguishes OID-list overflow pages (still B+Tree pages, just at the overflow level) from heap big-record pages (separate page type). - Slot 0 holds the
BTREE_OVERFLOW_HEADER(defined atbtree_load.h:244as a single-field struct:VPID next_vpid). Slot 1 holds the OID payload — sorted by OID for binary search. - MVCC info is
OR_MVCC_MAX_HEADER_SIZEper OID, always. Inline OIDs in the leaf record can omitmvcc_ins_id/mvcc_del_idwhen not relevant; in the overflow page every object has both fields. The assertion on the call site (btree.c:3994— “BTREE_MVCC_INFO_HAS_INSID && HAS_DELID”) enforces this. Reason: binary search by OID needs fixed-size records, and predictability beats space. spage_max_space_for_new_recorddecides where a new OID goes (btree.c:11609). The walkerbtree_find_free_overflow_oids_pagetraverses the chain and stops at the first page withBTREE_OBJECT_FIXED_SIZEbytes free. If none has room,btree_start_overflow_pageallocates a new page and links it in at the head of the chain (the new page’snext_vpidbecomes the old first overflow page;LEAF_REC.ovflis rewritten to the new page). This is the opposite of append — new pages go to the front because that saves walking to the tail.
Walk: insert OID into the overflow chain
Section titled “Walk: insert OID into the overflow chain”// excerpt — btree.c around 11340-11423log_sysop_start (...);btree_start_overflow_page (btid_int, object_info, first_ovfl_vpid /* old first */, pgbuf_get_vpid_ptr (leaf_page), &ovfl_vpid /* new */, &ovfl_page);/* btree_start_overflow_page: * - btree_get_new_page → pgbuf_fix(NEW_PAGE) * - ovf_header_info.next_vpid := first_ovfl_vpid (link new page to old chain) * - btree_init_overflow_header → spage_insert_at(HEADER, &ovf_header_info) * - btree_record_append_object → write [oid|ins_mvccid|del_mvccid] into rec * - spage_insert_at(slot 1, &rec) * - log_append_redo_data(RVBT_RECORD_MODIFY_NO_UNDO, full page replay) */
LOG_RV_RECORD_SET_MODIFY_MODE (&insert_helper->leaf_addr, LOG_RV_RECORD_UPDATE_PARTIAL);btree_leaf_record_change_overflow_link (..., &ovfl_vpid, &rv_undo_data_ptr, &insert_helper->rv_redo_data_ptr);spage_update (leaf_page, search_key->slotid, leaf_rec);log_append_undoredo_data (RVBT_RECORD_MODIFY_UNDOREDO, ...); /* leaf record undo + redo */log_sysop_end (...);The system-op bracket is critical: if the server crashes between
“new overflow page allocated” and “leaf record points at it”, the
sysop log replay deallocates the orphan page (or, if the bracket
is log_sysop_end_logical_undo, re-merges; the OID-list path uses
plain commit-on-success / abort-on-failure). The header comment in
btree.c notes this risk: “Note that this page may be leaked if
server crashes before changing the link in leaf page” — which is
why the new-page allocation and the leaf link rewrite must be in
the same sysop.
Walk: delete from the overflow chain
Section titled “Walk: delete from the overflow chain”btree_overflow_remove_object (btree.c:1622) deletes the OID
from its page; if that empties the page, the chain has to be
re-linked. There are two cases:
- First overflow page — the leaf record’s
LEAF_REC.ovflmust be rewritten to point at the next overflow page (or NULL_VPID if the chain becomes empty).btree_modify_overflow_linkdoes this for the leaf side;btree_overflow_remove_objectfor the overflow side. - Non-first overflow page — the previous overflow page’s
BTREE_OVERFLOW_HEADER.next_vpidis rewritten. This requires the previous page to be fixed simultaneously with the page being removed;btree_find_oid_and_its_page(btree.c:11646-11772) returns both via*found_pageand*prev_pageto support this.
// btree_modify_overflow_link — btree.c:10054 (condensed)spage_get_record (ovfl_page, HEADER, &overflow_header_record, COPY); /* undo image */ovf_header_info.next_vpid := next_ovfl_vpid;spage_update (ovfl_page, HEADER, &overflow_header_record); /* new image */log_append_undoredo_data (RVBT_RECORD_MODIFY_UNDOREDO, ovf_addr, undo (old header), redo (new header));pgbuf_set_dirty (ovfl_page, DONT_FREE);RVBT_RECORD_MODIFY_UNDOREDO is the unified physical
undo/redo record for B+Tree node modifications — the same one
used for inline leaf inserts and deletes — keyed off the slot
ID and the modify-mode flag (LOG_RV_RECORD_UPDATE_ALL vs
LOG_RV_RECORD_UPDATE_PARTIAL vs LOG_RV_RECORD_INSERT vs
LOG_RV_RECORD_DELETE). The B+Tree overflow chain reuses this
same record family rather than minting a per-overflow record
type.
Walk: B+Tree overflow key
Section titled “Walk: B+Tree overflow key”When the key itself is too long for inline storage, a wholly
different path applies: btree_store_overflow_key
(btree.c:2020). The key is serialized via
pr_type->index_writeval into a RECDES, and that RECDES
becomes the input to overflow_insert with file type
FILE_BTREE_OVERFLOW_KEY:
// btree_store_overflow_key — btree.c:2020 (condensed)overflow_file_vfid = btid->ovfid; /* shared with OID-list overflow */rec.area_size = size;rec.data = db_private_alloc (size);or_init (&buf, rec.data, rec.area_size);pr_type->index_writeval (&buf, key_ptr);rec.length = (int) (buf.ptr - buf.buffer);
overflow_insert (&overflow_file_vfid, first_overflow_page_vpid, &rec, FILE_BTREE_OVERFLOW_KEY);The leaf record then carries the VPID of the first overflow
page (instead of the inlined key bytes), with a flag
(BTREE_LEAF_RECORD_OVERFLOW_KEY) telling readers to consult the
overflow chain. btree_load_overflow_key (btree.c:2131) is the
read path:
// btree_load_overflow_key — btree.c:2131 (condensed)rec.area_size = overflow_get_length (thread_p, first_overflow_page_vpid);rec.data = db_private_alloc (rec.area_size);overflow_get (thread_p, first_overflow_page_vpid, &rec, NULL); /* mvcc_snapshot=NULL */or_init (&buf, rec.data, rec.length);pr_type->index_readval (&buf, key, btid->key_type, -1, /*copy=*/true, NULL, 0);Two design points:
mvcc_snapshot=NULLin theoverflow_getcall — overflow keys do not carry MVCC info. The visibility of the OID attached to that key is what mediates whether the key is visible at all; the key bytes themselves are MVCC-invariant for as long as the leaf entry exists.- The key is always copied out of the overflow record
(
copy=true). Unlike inline keys, which can be peeked at zero-copy under the leaf latch, an overflow key requires materializing into aDB_VALUEbecause the page is unfixed beforeindex_readvalreturns.
btree_delete_overflow_key (btree.c:2202) reads the leaf
record, extracts the first overflow VPID, and calls
overflow_delete to deallocate the chain. It does not delete
the leaf record’s slot — that is the caller’s job.
Crash recovery — four log record types
Section titled “Crash recovery — four log record types”The overflow file emits four record types, all defined in
recovery.h:100-103:
| Code | When emitted | Redo handler |
|---|---|---|
RVOVF_NEWPAGE_INSERT | Per-page, during overflow_insert — full-page redo image | overflow_rv_newpage_insert_redo → log_rv_copy_char |
RVOVF_NEWPAGE_LINK | When overflow_update extends the chain | Undo: overflow_rv_newpage_link_undo (sets next_vpid := NULL); Redo: overflow_rv_link (sets next_vpid := ) |
RVOVF_PAGE_UPDATE | Per-page, during overflow_update — undo + redo images | Redo: overflow_rv_page_update_redo (sets ptype, copies bytes); undo is a per-page byte image |
RVOVF_CHANGE_LINK | When overflow_update shrinks the chain | Same handler as RVOVF_NEWPAGE_LINK |
LOG_DUMMY_OVF_RECORD is not in the RV* family — it is a
zero-length log record whose only purpose is to anchor an LSN that
HA replication and vacuum_master can use as a “this is a chain
head” marker. It is emitted on the first page only
(overflow_file.c:202 for insert, :461 for update) and has no
redo / undo function; replicas observing this LSN look up the
page-LSN to identify the head.
flowchart TB
subgraph INSERT["overflow_insert"]
II1["log_sysop_start"]
II2["file_alloc_multiple\n(npages)"]
II3["per page i:\nLOG_DUMMY_OVF_RECORD on i==0\nRVOVF_NEWPAGE_INSERT on each"]
II4["log_sysop_attach_to_outer\n(commit / fail → abort)"]
II1 --> II2 --> II3 --> II4
end
subgraph UPDATE["overflow_update"]
UU1["log_sysop_start"]
UU2["per page touched:\nRVOVF_PAGE_UPDATE undo+redo\nRVOVF_NEWPAGE_LINK if extending\nLOG_DUMMY_OVF_RECORD on first"]
UU3["if shrinking:\nRVOVF_CHANGE_LINK undoredo\nthen file_dealloc tail pages"]
UU4["log_sysop_attach_to_outer\n(commit / fail → abort)"]
UU1 --> UU2 --> UU3 --> UU4
end
subgraph DELETE["overflow_delete"]
DD1["overflow_traverse(OVERFLOW_DO_DELETE)"]
DD2["per page:\npgbuf_unfix\nfile_dealloc(vpid) — no per-page log record\n(file manager logs the dealloc itself)"]
DD1 --> DD2
end
The asymmetry — insert and update emit RVOVF_* records,
delete does not — is because deletion is implemented as
file_dealloc on each page, and the file manager logs page
deallocations through its own RV*_FILE_* record family.
overflow_file.c does not need to log anything beyond what
file_dealloc already does.
Vacuum interaction
Section titled “Vacuum interaction”For heap big-records, vacuum visits the home heap page
chain in vacuum_heap_page and, when it encounters a
REC_BIGONE slot whose visibility is “delete-visible to all”
(i.e., mvcc_del_id on the overflow page is older than the
oldest active snapshot), calls heap_ovf_delete to free the
overflow chain. The home slot transitions to REC_MARKDELETED /
REC_DELETED_WILL_REUSE.
For B+Tree overflow OID lists, vacuum visits each B+Tree
leaf and, for each OID whose mvcc_del_id is too old, calls
btree_overflow_remove_object (or its in-leaf counterpart). If
that leaves an overflow page empty, the page is unlinked and
file_dealloc’d via btree_modify_overflow_link plus the
file-manager call. The overflow file’s pages are reused by
later inserts to the same B+Tree — never across trees.
For B+Tree overflow keys, vacuum has no special role — the
key chain is freed only when the leaf record carrying it is
deleted, via btree_delete_overflow_key →
overflow_delete. Since the leaf record is deleted by either
DDL (key-type change, drop-index) or row-delete plus
collapse-merge, there is no MVCC delay.
Why OVERFLOW_FIRST_PART::length matters
Section titled “Why OVERFLOW_FIRST_PART::length matters”Two callers depend on the head-only length field:
overflow_get_length(overflow_file.c:692) is an O(1) query — fix the head, read the field, unfix. Used bybtree_load_overflow_keyto size the destination buffer before the first read.overflow_get_capacity(overflow_file.c:935) reports total bytes, page count, header overhead, and free space in the last page — useful for diagnostic dumps and capacity planning. The walker useslengthto decide how many pages to traverse, instead of walking until a NULLnext_vpid, which means a corrupt chain (NULL link beforelengthbytes consumed) trips an explicit error rather than a silent truncation.
The trade-off is that overflow_update has to maintain
length correctly across resize, including the case where the
new length is smaller — captured by the post-update truncation
loop in overflow_update (overflow_file.c:563-588). The first
OVERFLOW_FIRST_PART page is always retained even if the new
length fits in less than one full page (i.e., the chain never
shrinks below 1 page) because removing the head would invalidate
the REC_BIGONE reference in the heap home record.
Page-type discipline
Section titled “Page-type discipline”The overflow file uses two distinct page types depending on the caller:
- Heap big-records and B+Tree overflow keys allocate pages
of type
PAGE_OVERFLOW.overflow_insertinitializes new pages withfile_init_page_type(orfile_init_temp_page_typeforFILE_TEMP) andptype = PAGE_OVERFLOW. Reads validate viapgbuf_check_page_ptype (..., PAGE_OVERFLOW). This page type signals to the buffer-pool layer that the page is not slotted in the usual sense — the body is rawOVERFLOW_FIRST_PART/OVERFLOW_REST_PARTheaders plus contiguous data, not slot directory + records. - B+Tree overflow OID lists allocate pages of type
PAGE_BTREE— the same as leaf and non-leaf pages. They are slotted (slot 0 =BTREE_OVERFLOW_HEADER, slot 1 = OID record), so the standard slotted-page invariants apply.
This is the cleanest tell that the two B+Tree overflow paths are unrelated: the OID-list overflow chain is just “more B+Tree pages that happen to be at depth N+1”, while the overflow-key chain shares its on-disk layout with heap big-records.
Source Walkthrough
Section titled “Source Walkthrough”Anchor on symbol names, not line numbers. Use
git grep -n '<symbol>' src/storage/to locate current positions. The trailing position-hint table records the lines observed at this revision.
Type definitions
Section titled “Type definitions”struct overflow_first_part(overflow_file.h) — head-page format:next_vpid + length + data[1].struct overflow_rest_part(overflow_file.h) — rest-page format:next_vpid + data[1].enum OVERFLOW_DO_FUNC { OVERFLOW_DO_DELETE, OVERFLOW_DO_FLUSH }(overflow_file.c) — discriminator for the shared traversal worker.BTREE_OVERFLOW_HEADER(btree_load.h:244) — singlenext_vpidfield; lives in slot 0 of B+Tree OID-overflow pages.- File-type constants
FILE_TEMP,FILE_BTREE_OVERFLOW_KEY,FILE_MULTIPAGE_OBJECT_HEAP(file_manager.h) — the three file typesoverflow_insertaccepts. - Page-type constants
PAGE_OVERFLOW,PAGE_BTREE(page_buffer.h) — distinguish heap-big-record / overflow-key pages from B+Tree OID-list pages. - WAL codes
RVOVF_NEWPAGE_INSERT,RVOVF_NEWPAGE_LINK,RVOVF_PAGE_UPDATE,RVOVF_CHANGE_LINK(recovery.h:100-103).
Public API (src/storage/overflow_file.c)
Section titled “Public API (src/storage/overflow_file.c)”overflow_insert— multi-page allocation + per-page write + per-page redo log under one sysop. Pre-computes page count.overflow_update— extend / shrink / overwrite the chain; assertsFILE_MULTIPAGE_OBJECT_HEAP(heap-only path).overflow_delete— wrapsoverflow_traverse(OVERFLOW_DO_DELETE).overflow_flush— wrapsoverflow_traverse(OVERFLOW_DO_FLUSH).overflow_get— convenience wrapper foroverflow_get_nbytes(0, -1, ...).overflow_get_nbytes— partial-record read with start offset and max length; applies MVCC visibility on the head page.overflow_get_length— O(1) total-bytes query (head page only).overflow_get_capacity— full-chain walk; reports size, pages, overhead, free space.overflow_get_first_page_data— pointer to the first page’sdata[]area (used byheap_get_mvcc_rec_header_from_overflow).
Internal helpers (src/storage/overflow_file.c)
Section titled “Internal helpers (src/storage/overflow_file.c)”overflow_next_vpid— read the next-page link, type-aware (head vs rest).overflow_traverse— walk-and-call-callback worker; shared by delete and flush.overflow_delete_internal— per-page deallocation (file_dealloc).overflow_flush_internal— per-page WAL flush (pgbuf_flush_with_wal).
Recovery (src/storage/overflow_file.c)
Section titled “Recovery (src/storage/overflow_file.c)”overflow_rv_newpage_insert_redo— full-page redo vialog_rv_copy_char.overflow_rv_newpage_link_undo— setnext_vpid := NULLon the page that was linked-to; undoes a chain extension.overflow_rv_link— setnext_vpid := <data>; serves as redo for bothRVOVF_NEWPAGE_LINK(extend) andRVOVF_CHANGE_LINK(shrink).overflow_rv_link_dump— diagnostic dump of anext_vpidlog record.overflow_rv_page_update_redo— set page type (PAGE_OVERFLOW) and copy bytes vialog_rv_copy_char.overflow_rv_page_dump— diagnostic dump forRVOVF_PAGE_UPDATErecords.
Heap-side wrappers (src/storage/heap_file.c)
Section titled “Heap-side wrappers (src/storage/heap_file.c)”heap_ovf_find_vfid— get / create the heap’s overflow file VFID fromHEAP_HDR_STATS.ovf_vfid. Withdocreate=true, createsFILE_MULTIPAGE_OBJECT_HEAPunder a sysop, applies TDE algorithm from class metadata, logsRVHF_STATSbefore / after for the heap-header change.heap_ovf_insert— wrapoverflow_insert(..., FILE_MULTIPAGE_OBJECT_HEAP); convert returnedVPIDinto anOIDwithslotid = NULL_SLOTIDfor the home page’sREC_BIGONEbody.heap_ovf_update— wrapoverflow_update(..., FILE_MULTIPAGE_OBJECT_HEAP).heap_ovf_delete— wrapoverflow_delete. Ifovf_vfid_pis NULL, looks it up viaheap_ovf_find_vfid.heap_ovf_flush/heap_ovf_get_length/heap_ovf_get/heap_ovf_get_capacity— straight wrappers over the correspondingoverflow_*functions.heap_get_mvcc_rec_header_from_overflow— peek the MVCC header from the head page’sdata[](usesoverflow_get_first_page_data).heap_set_mvcc_rec_header_on_overflow— write the MVCC header back; forces fullOR_MVCC_MAX_HEADER_SIZEeven for records that don’t yet have all flags set, by addingMVCCID_ALL_VISIBLEfor missingINSIDandMVCCID_NULLfor missingDELID. The forced-max-size invariant is what letsmvcc_header_size_lookuppredict the in-place update cost.heap_get_bigone_content— read a big record into aRECDES, with optionalscan_cache-backed buffer reuse.
B+Tree-side overflow-key wrappers (src/storage/btree.c)
Section titled “B+Tree-side overflow-key wrappers (src/storage/btree.c)”btree_create_overflow_key_file— createFILE_BTREE_OVERFLOW_KEYand store the VFID inBTID_INT::ovfid. Allocates ≥ 3 pages up front; applies TDE.btree_store_overflow_key— serialize aDB_VALUEkey into aRECDES; calloverflow_insert(... FILE_BTREE_OVERFLOW_KEY); output the head VPID.btree_load_overflow_key— read the head’slength, allocate, calloverflow_get(... NULL); deserialize into aDB_VALUE.btree_delete_overflow_key— find the head VPID embedded in the leaf / non-leaf record; calloverflow_delete.
B+Tree-side overflow-OID-list helpers (src/storage/btree.c)
Section titled “B+Tree-side overflow-OID-list helpers (src/storage/btree.c)”btree_start_overflow_page— allocate a new B+Tree page (PAGE_BTREE), initializeBTREE_OVERFLOW_HEADER, append the new object to slot 1, and link the new page in at the head of the chain (new → old-first).btree_modify_overflow_link— rewrite an overflow page’s headernext_vpid; emitsRVBT_RECORD_MODIFY_UNDOREDO.btree_find_free_overflow_oids_page— walk chain, return first page with enough room, elseNULL.btree_find_oid_and_its_page— locate the page (leaf or overflow) carrying a given(key, OID)pair, and optionally return the previous page so the caller can rewrite the predecessor’s link.btree_find_oid_from_ovfl— binary search OIDs in one overflow page’s slot-1 record (sorted by OID).btree_overflow_remove_object— physically remove an OID from an overflow page; if the page becomes empty, deallocate and re-link.btree_overflow_record_replace_object— replace one OID with another (used by unique-constraint maintenance and FK).btree_get_next_overflow_vpid(inbtree_load.c) — readBTREE_OVERFLOW_HEADER.next_vpidfrom a fixed page.btree_get_overflow_header(inbtree_load.c) — peek the overflow page’s HEADER slot.btree_init_overflow_header(inbtree_load.c) — write the overflow page’s HEADER slot duringbtree_start_overflow_page.
Position hints as of this revision
Section titled “Position hints as of this revision”These line numbers were observed when the document was last
updated:. Symbols are authoritative; lines may drift.
| Symbol | File | Line |
|---|---|---|
struct overflow_first_part | overflow_file.h | 38 |
struct overflow_rest_part | overflow_file.h | 46 |
enum OVERFLOW_DO_FUNC | overflow_file.c | 45 |
overflow_insert | overflow_file.c | 81 |
overflow_next_vpid | overflow_file.c | 275 |
overflow_traverse | overflow_file.c | 296 |
overflow_update | overflow_file.c | 373 |
overflow_delete_internal | overflow_file.c | 613 |
overflow_delete | overflow_file.c | 644 |
overflow_flush_internal | overflow_file.c | 655 |
overflow_flush | overflow_file.c | 677 |
overflow_get_length | overflow_file.c | 691 |
overflow_get_nbytes | overflow_file.c | 738 |
overflow_get | overflow_file.c | 916 |
overflow_get_capacity | overflow_file.c | 934 |
overflow_rv_newpage_insert_redo | overflow_file.c | 1112 |
overflow_rv_newpage_link_undo | overflow_file.c | 1124 |
overflow_rv_link | overflow_file.c | 1144 |
overflow_rv_page_update_redo | overflow_file.c | 1178 |
overflow_get_first_page_data | overflow_file.c | 1217 |
RVOVF_NEWPAGE_INSERT / etc. | recovery.h | 100 |
BTREE_OVERFLOW_HEADER | btree_load.h | 244 |
btree_get_overflow_header | btree_load.c | 339 |
btree_init_overflow_header | btree_load.c | 575 |
btree_get_next_overflow_vpid | btree_load.c | 674 |
btree_create_overflow_key_file | btree.c | 1975 |
btree_store_overflow_key | btree.c | 2020 |
btree_load_overflow_key | btree.c | 2131 |
btree_delete_overflow_key | btree.c | 2202 |
btree_start_overflow_page | btree.c | 3973 |
btree_modify_overflow_link | btree.c | 10054 |
btree_find_free_overflow_oids_page | btree.c | 11579 |
btree_find_oid_and_its_page | btree.c | 11646 |
btree_find_oid_from_ovfl | btree.c | 12037 |
heap_ovf_find_vfid | heap_file.c | 6462 |
heap_ovf_insert | heap_file.c | 6568 |
heap_ovf_update | heap_file.c | 6597 |
heap_ovf_delete | heap_file.c | 6632 |
heap_get_mvcc_rec_header_from_overflow | heap_file.c | 19541 |
heap_set_mvcc_rec_header_on_overflow | heap_file.c | 19567 |
heap_get_bigone_content | heap_file.c | 19610 |
Cross-check Notes
Section titled “Cross-check Notes”-
overflow_updateis heap-only. Theassert (file_type == FILE_MULTIPAGE_OBJECT_HEAP)atoverflow_file.c:398rejects any other file type — includingFILE_BTREE_OVERFLOW_KEY. B+Tree overflow keys cannot be updated in place; the caller mustoverflow_deletethe old chain andoverflow_inserta new one, even for tiny edits. The heap path is privileged because the row body’s MVCC update flow already implies “edit the existing chain in place, extending or shrinking”, and duplicating the data twice (delete + insert) would burn double the WAL. -
B+Tree OID-list overflow pages are
PAGE_BTREE, notPAGE_OVERFLOW. Thepgbuf_check_page_ptypeatbtree.c:11606is the easiest way to tell that the OID-list path does not useoverflow_file.c. The overflow-key path does use it and getsPAGE_OVERFLOWpages fromfile_init_page_typeinsideoverflow_insert. -
OVERFLOW_FIRST_PART::lengthis computed and maintained by the overflow-file layer, not by callers. Callers see onlyrecdes->lengthon the way in and on the way out; the internal field is private. -
LOG_DUMMY_OVF_RECORDcarries no payload. It is purely an LSN anchor for HA replication and vacuum to identify the head page of a chain. Removing it would not break recovery (the chain itself has full redo) but would break the replication side that consumes the LSN to position itself in the log stream. -
Heap big-record MVCC delete edits the overflow page, not the heap home page. The home page’s
REC_BIGONEslot is only modified at vacuum / non-MVCC-class-delete time. This is the structural difference vsREC_HOMErecords, where the MVCC header lives inline in the heap slot itself. -
Per-page lock absence is a deliberate optimization. The comment block at
overflow_file.c:77-79documents the invariant: each chain is private to one logical record, and that record’s higher-level lock (row lock or key-range lock) protects the whole chain. This is what allowspgbuf_fix(PGBUF_LATCH_WRITE)on overflow pages without lock manager involvement.
Open Questions
Section titled “Open Questions”-
Does
overflow_updateever trigger a chain split? The chain is single-linked, so an update that grows the record only ever appends pages at the tail or extends in place. But the code atoverflow_file.c:511-541has a branch that appears to allocate a new page mid-chain whennext_vpidis NULL but more bytes remain. Investigation path: trace whether the resulting chain is always tail-extended or whether mid-chain insertion is possible. -
Why does
overflow_delete_internalpassFILE_UNKNOWN_TYPEtofile_dealloc? The TODO comment atoverflow_file.c:621flags this as something to clarify. The file manager presumably looks up the actual type from the file descriptor on disk; verifying this and removing the TODO would be a small cleanup. Investigation path:file_deallocsignature and behavior onFILE_UNKNOWN_TYPE. -
FILE_TEMPuse case inoverflow_insert. The valid-types assert atoverflow_file.c:102acceptsFILE_TEMP(described as “sort files”). Where in CUBRID does the external sort write multi-page records viaoverflow_insert? The path does not appear inbtree_load.c’s sort phase or inquery_executor.c. Investigation path: grep foroverflow_insertcallers with aFILE_TEMPargument. -
TDE coverage on B+Tree OID-list overflow pages. The TDE algorithm is applied to
BTID_INT::ovfidonce atbtree_create_overflow_key_filetime. Both overflow-key and OID-list pages live in this file. Are OID-list pages actually encrypted at rest, or does TDE skip them because they arePAGE_BTREEand the encryption hook is keyed offPAGE_OVERFLOW? Investigation path:pgbuf_dealloc_page/disk_page_writeTDE path and thePAGE_*discriminator. -
Concurrency between
heap_ovf_updateand a concurrent reader. The reader fixes the head page first, readslength, then walks the chain — but each rest-page fix is independent. Ifheap_ovf_updateshrinks the chain between the reader’s fix of page N and page N+1, the reader might fix a deallocated page. Is this prevented by the row-level lock the reader holds, or by the page-buffer’s pin-count blocking dealloc? Investigation path: tracepgbuf_fixof a page that is concurrently beingfile_dealloc’d under an X row lock vs SELECT FOR UPDATE. -
B+Tree overflow-key MVCC. The code passes
mvcc_snapshot=NULLwhen reading an overflow key, on the assumption that the key is MVCC-invariant for the lifetime of the leaf entry. But the leaf entry has its own MVCC info per OID; if all OIDs for a key are vacuumed, the leaf entry is removed and the overflow-key chain is freed. Is there a window where a snapshot reader could observe a leaf entry whose overflow-key chain has been concurrently freed? Investigation path: vacuum’s leaf-purge path and its ordering vsbtree_delete_overflow_key.
Sources
Section titled “Sources”This document is Code-only (sources: []). All material was
distilled from the CUBRID source tree referenced below; no
slide deck, raw analysis PDF, or external write-up was used.
CUBRID source (under /data/hgryoo/references/cubrid/)
Section titled “CUBRID source (under /data/hgryoo/references/cubrid/)”src/storage/overflow_file.hsrc/storage/overflow_file.csrc/storage/heap_file.c(heap_ovf_*,heap_get_mvcc_rec_header_from_overflow,heap_set_mvcc_rec_header_on_overflow, REC_BIGONE handling around lines 7527–7873 and 21420–21504)src/storage/btree.c(btree_create_overflow_key_file,btree_store_overflow_key,btree_load_overflow_key,btree_delete_overflow_key,btree_start_overflow_page,btree_modify_overflow_link,btree_find_free_overflow_oids_page,btree_find_oid_and_its_page,btree_find_oid_from_ovfl)src/storage/btree_load.h(BTREE_OVERFLOW_HEADER)src/storage/btree_load.c(btree_get_overflow_header,btree_init_overflow_header,btree_get_next_overflow_vpid)src/storage/file_manager.h(FILE_MULTIPAGE_OBJECT_HEAP,FILE_BTREE_OVERFLOW_KEY,FILE_TEMP)src/transaction/recovery.h(RVOVF_NEWPAGE_INSERT,RVOVF_NEWPAGE_LINK,RVOVF_PAGE_UPDATE,RVOVF_CHANGE_LINK)
Sibling docs in this knowledge base
Section titled “Sibling docs in this knowledge base”knowledge/code-analysis/cubrid/cubrid-heap-manager.md— the slotted-page substrate,REC_BIGONErecord type, andHEAP_HDR_STATS::ovf_vfidlocation.knowledge/code-analysis/cubrid/cubrid-btree.md— the three-record-kinds layout (NON_LEAF_REC,LEAF_REC, overflow),BTID_INT::ovfid, and thebtree_find_oid_and_its_pageunique-enforcement workhorse.knowledge/code-analysis/cubrid/cubrid-mvcc.md—mvcc_rec_headerlayout and the visibility predicate thatoverflow_get_nbytescalls.knowledge/code-analysis/cubrid/cubrid-page-buffer-manager.md—pgbuf_fixdiscipline andPAGE_OVERFLOW/PAGE_BTREEpage-type checks.knowledge/code-analysis/cubrid/cubrid-log-manager.md—log_sysop_start/log_sysop_attach_to_outer/log_sysop_abortsystem-op semantics,RVOVF_*record family interaction with WAL, andLOG_DUMMY_OVF_RECORD’s role for HA replication.
Textbook chapters (under knowledge/research/dbms-general/)
Section titled “Textbook chapters (under knowledge/research/dbms-general/)”- Database Internals (Petrov), Ch. 3 §“File Formats”, §“Variable-Size Records” — slotted page, forwarding, and out-of-line overflow as the three classical strategies.
- Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom), §13.7 “Variable-Length Data and Records” — in-line forwarding vs out-of-line overflow trade-offs.
- Comer (1979), The Ubiquitous B-Tree (ACM CSUR) — the per-key OID list and overflow-chain idea predates B+Trees.