CUBRID Overflow File — Heap Big-Record and B+Tree Overflow-OID Page Chains

The slotted page (cubrid-heap-manager.md §“Slotted page”) solves the “variable-length records on a fixed-size page” problem only within the page. Once a record itself outgrows what a page can hold, or once the per-key satellite payload of a non-unique index outgrows what a single leaf record can hold, slotted layout has nothing left to say. Database Internals (Petrov, ch. 3 §“File Formats”, §“Variable-Size Records”) frames the three classical strategies:

  1. In-line growth with tombstones. Keep the record on its home page; if it grows beyond the slot, move it within the page and leave a tombstone. Works only for small growth. Garcia-Molina / Ullman / Widom Database Systems: The Complete Book, §13.7 “Variable-Length Data and Records”.
  2. In-line forwarding. Move the new image to another page in the same heap; leave a forwarding pointer at the original slot. Indexes still point to the original OID; reads pay one extra page-fix to follow the pointer. CUBRID’s REC_RELOCATION + REC_NEWHOME (cubrid-heap-manager.md §“Record-type vocabulary”).
  3. Out-of-line overflow. When the record is too big for any page in the heap, push the bulk to a separate file (the overflow file) and store only a small reference at the home slot. The reference is the original OID’s permanent identity; the overflow chain is private to it. Comer (1979) and the IBM System R notes both describe this for “long fields”; PostgreSQL formalized it as TOAST (The Oversized-Attribute Storage Technique).

The same problem appears one level up in B+Tree non-unique indexes. A leaf record carries key || OID_list; when too many duplicates accumulate for one key, the OID list outgrows the leaf page. The textbook fix (Bayer, Lehman / Yao, and successors) is identical in shape: spill the surplus into a per-key overflow chain — distinct from the heap’s overflow file but built on the same disk-file substrate.

CUBRID has one symbol-level overflow-file module (src/storage/overflow_file.{h,c}) that serves both spillover needs, plus two callers that wrap it differently:

  • Heap “big record” path — heap_ovf_* in heap_file.c.
  • B+Tree overflow-OID-list path — btree_*overflow* in btree.c (uses overflow_file.c only for overflow keys; the OID-list overflow chain is a different on-page layout built directly on slotted B+Tree pages, but reuses the same “linked list of pages owned by one logical record” model).

This document closes the gap that cubrid-heap-manager.md and cubrid-btree.md both leave open by pointing at “see overflow_file.c”.

Every row-store with variable-length records and non-unique indexes reaches for some form of out-of-line spillover. The shared ingredients:

The unit of allocation is the file (CUBRID’s VFID, PostgreSQL’s toast relation, InnoDB’s off-page page chain). One overflow file is bound to one heap, or one B+Tree, or one column. Pages allocated from that file are reused only by the same owner — no cross-tenant overflow.

Single-linked page chain per logical record


The bulk of the data lives in a chain of pages, each carrying a “next-page” pointer in a small header. Random access into the middle of a long record is supported by walking from the head, optionally skipping pages whose offsets the caller already knows. Single-linked is sufficient because all callers know the head page (it is what the home record’s reference points at).

The original location keeps a small fixed-size record carrying just enough to find the overflow chain: a page identifier and sometimes a length. Indexes and other readers continue to point at the original; the overflow lookup is transparent to them.

The total byte length of the logical record lives on the first page only, so a head-page fix returns it without walking the chain. Rest pages carry only the next-page link.

Because each chain is private to one logical record, and that record’s identity is already protected (by row lock for heap, by key range lock for index OID lists), the overflow pages themselves do not require independent lock-manager entries. The WAL discipline plus the head-page latch are sufficient.

  • PostgreSQL TOAST — separate pg_toast_<oid> relation per user table, automatically created when a column’s row exceeds TOAST_TUPLE_THRESHOLD (≈ 2 KB on 8 KB pages). Each oversized attribute is compressed (LZ-class), then chunked into 1996-byte pieces stored as ordinary tuples in the toast relation, keyed by (chunk_id, chunk_seq). The user row holds an 18-byte toast pointer (varatt_external) carrying the toast OID, original size, and compressed size. Readers de-toast lazily — selective reads that don’t touch the column never fetch the chunks.
  • InnoDB DYNAMIC / COMPRESSED row format — when a row exceeds half the page, BLOB / TEXT / VARCHAR columns may be stored off-page in a chain of blob pages allocated from the same tablespace as the clustered index. The clustered-index record holds a 20-byte pointer per off-page column (space, page no, offset, length). DYNAMIC stores only the pointer in the index page; COMPRESSED additionally compresses the leaf page.
  • Oracle row chaining / migration — Oracle uses a different word for two cases: chained rows (record larger than one block → split across multiple blocks linked by ROWID) and migrated rows (UPDATE made the record too big for the original block → moved to another block, leaving a forwarding ROWID stub). Visible to administrators as V$SYSSTAT.table fetch continued row and chained_rows in DBA_TABLES; analyzed via ANALYZE TABLE ... LIST CHAINED ROWS.
  • MySQL MyISAM dynamic format — segments the record into variable-length chunks linked by (offset, length) headers inline within the data file; no separate overflow file.

CUBRID sits in the PostgreSQL family: a separate file per heap for big records, plus a per-tree file for B+Tree overflow keys. But unlike TOAST it does not chunk per attribute — the whole record spills as one blob — and it does not compress. B+Tree-OID-list overflow is its own design point (an OID-sorted slotted page in a per-tree file), unrelated to the heap path beyond the file-substrate.

| Theoretical concept | CUBRID name |
| --- | --- |
| Heap big-record overflow file | FILE_MULTIPAGE_OBJECT_HEAP (file_manager.h), VFID stored as HEAP_HDR_STATS.ovf_vfid |
| Heap reference record (forwarding) | REC_BIGONE (cubrid-heap-manager.md §“Record types”) |
| B+Tree overflow-key file | FILE_BTREE_OVERFLOW_KEY, VFID stored as BTID_INT::ovfid |
| B+Tree overflow-key reference | LEAF_REC.ovfl for leaf, NON_LEAF_REC + ovfl for non-leaf |
| B+Tree overflow-OID-list page chain | Slotted B+Tree pages allocated from BTID_INT::ovfid, headed by BTREE_OVERFLOW_HEADER |
| First-page header (heap big-rec / ovfkey) | OVERFLOW_FIRST_PART { next_vpid; length; data[] } (overflow_file.h) |
| Rest-page header (heap big-rec / ovfkey) | OVERFLOW_REST_PART { next_vpid; data[] } (overflow_file.h) |
| Overflow OID page header (B+Tree) | BTREE_OVERFLOW_HEADER { next_vpid } (btree_load.h:244, slot 0/HEADER) |
| Insert / update / delete API | overflow_insert / overflow_update / overflow_delete (overflow_file.c) |
| Heap wrappers | heap_ovf_insert / heap_ovf_update / heap_ovf_delete |
| B+Tree key wrappers | btree_store_overflow_key / btree_load_overflow_key / btree_delete_overflow_key |
| B+Tree OID-list operations | btree_start_overflow_page / btree_modify_overflow_link / btree_find_oid_from_ovfl |
| Find-or-create overflow file | heap_ovf_find_vfid (heap), btree_create_overflow_key_file (btree) |
| WAL records | RVOVF_NEWPAGE_INSERT, RVOVF_NEWPAGE_LINK, RVOVF_PAGE_UPDATE, RVOVF_CHANGE_LINK (recovery.h:100-103) |
| First-overflow-page marker (heap MVCC) | LOG_DUMMY_OVF_RECORD (used to tell vacuum / replication which is the head) |
| MVCC header on overflow | heap_get_mvcc_rec_header_from_overflow / heap_set_mvcc_rec_header_on_overflow (heap_file.c) |

The module sits as a substrate for two distinct callers — heap big-records and B+Tree overflow keys — and as a template for the third (B+Tree overflow-OID lists, which reuse the linked-pages discipline but build their own on-page slotted format). Five properties shape every code path:

  1. One generic API, three file types. overflow_insert / overflow_update / overflow_delete accept a FILE_TYPE discriminator. Only three values are legal at the assert at overflow_file.c:102: FILE_TEMP (sort runs), FILE_BTREE_OVERFLOW_KEY, and FILE_MULTIPAGE_OBJECT_HEAP. Anything else trips the assert.
  2. OVERFLOW_FIRST_PART vs. OVERFLOW_REST_PART. The first page of any chain has an extra length field carrying total record bytes; rest pages do not. Header size differs between the two, so the per-page payload capacity differs by sizeof (int).
  3. No per-page lock. The header comment at overflow_file.c:77 is explicit: “Overflow pages are not locked in any mode since they are not shared by other pieces of data and its address is only know by accessing the relocation overflow record data which has been appropriately locked.” The home reference’s lock is the access-control gate for the whole chain.
  4. System operation per _insert / _update / _delete. Multi-page allocations are bracketed in log_sysop_start / log_sysop_attach_to_outer (commit) or log_sysop_abort (failure), so the caller’s outer transaction sees the spill as an atomic step — partial chains never escape an aborted sysop.
  5. MVCC header lives on the head page only (heap path). The record’s first 24 bytes (OR_MVCC_MAX_HEADER_SIZE) on the first overflow page carry the mvcc_rec_header that the visibility predicate consults; the heap home page’s REC_BIGONE slot does not. This is why heap_delete of a big record edits the overflow page rather than the heap page (heap_file.c:21459-21498).
flowchart LR
  subgraph HEAP["Heap file (FILE_HEAP)"]
    HP["Heap page\nslot k = REC_BIGONE\n→ {ovf_vpid: VPID, ovf_oid.slotid = NULL_SLOTID}"]
  end
  subgraph OVF["Overflow file (FILE_MULTIPAGE_OBJECT_HEAP)\nVFID = HEAP_HDR_STATS.ovf_vfid"]
    P0["First page\nOVERFLOW_FIRST_PART {\n  next_vpid → P1,\n  length = total bytes,\n  data[ MVCC hdr | row body 0..N ]\n}"]
    P1["Rest page\nOVERFLOW_REST_PART {\n  next_vpid → P2,\n  data[ row body N+1..M ]\n}"]
    P2["Rest page\nOVERFLOW_REST_PART {\n  next_vpid = NULL,\n  data[ row body M+1..end ]\n}"]
    P0 --> P1 --> P2
  end
  HP -- "ovf_vpid points at P0" --> P0

The slot in the heap page is type REC_BIGONE and its body is an OID with pageid / volid set to the first overflow page and slotid = NULL_SLOTID (heap_file.c:6582: “slotid = NULL_SLOTID; /* Irrelevant */”). The overflow record has no slot directory of its own; pages are addressed by VPID, and the record body is the contiguous byte stream stitched together from each page’s data[] area.

sequenceDiagram
  participant CTX as HEAP_OPERATION_CONTEXT
  participant HI as heap_insert_logical
  participant OVF as heap_ovf_insert
  participant OFI as overflow_insert
  participant FM as file_manager
  participant LM as log_manager

  CTX->>HI: record too big for any heap page
  HI->>OVF: heap_ovf_insert(hfid, &ovf_oid, recdes)
  OVF->>OVF: heap_ovf_find_vfid(... docreate=true ...)
  alt overflow file does not exist yet
    OVF->>FM: file_create_with_npages(FILE_MULTIPAGE_OBJECT_HEAP, 1, ...)
    OVF->>LM: log_append_undo/redo RVHF_STATS (heap-header change)
    OVF->>OVF: heap_hdr->ovf_vfid := new VFID
  end
  OVF->>OFI: overflow_insert(ovf_vfid, &ovf_vpid, recdes, FILE_MULTIPAGE_OBJECT_HEAP)
  OFI->>OFI: npages = ceil((len - first_payload) / rest_payload) + 1
  OFI->>LM: log_sysop_start
  OFI->>FM: file_alloc_multiple(npages, vpids[])
  loop for each page i in 0..npages-1
    OFI->>OFI: pgbuf_fix(vpids[i], OLD_PAGE, LATCH_WRITE)
    OFI->>OFI: write OVERFLOW_FIRST_PART/REST_PART header + payload
    alt i == 0 (first page)
      OFI->>LM: log_append_empty_record(LOG_DUMMY_OVF_RECORD)
    end
    OFI->>LM: log_append_redo_data(RVOVF_NEWPAGE_INSERT, page bytes)
    OFI->>OFI: pgbuf_set_dirty_and_free
  end
  OFI->>LM: log_sysop_attach_to_outer
  OFI-->>OVF: NO_ERROR, ovf_vpid = vpids[0]
  OVF-->>HI: ovf_oid = (vpids[0].volid, vpids[0].pageid, NULL_SLOTID)
  HI->>HI: write REC_BIGONE record carrying ovf_oid into home heap page

Two design points to internalize:

  • Page count is computed up-front (overflow_file.c:112-121). The estimate is exact, not an over-allocation; if the math is wrong, an assert (length == 0) at the end of the loop fires in debug builds. The formula:

    // overflow_insert — overflow_file.c (condensed)
    length = recdes->length - (DB_PAGESIZE - (int) offsetof (OVERFLOW_FIRST_PART, data));
    if (length > 0)
      {
        i = DB_PAGESIZE - offsetof (OVERFLOW_REST_PART, data);
        npages = 1 + CEIL_PTVDIV (length, i);
      }
    else
      {
        npages = 1;
      }
  • file_alloc_multiple + per-page fix loop, not file_alloc inside the loop. All pages are allocated under one log_sysop_start; failure on any single page aborts the sysop and rolls back every allocation already done. This is why the caller never sees half-built chains.

heap_ovf_update (heap_file.c:6597) is the special-case path that calls into overflow_update. The interesting move is what happens when the new image is shorter or longer than the old:

// overflow_update — overflow_file.c (sketch)
log_sysop_start (thread_p);
while (length > 0)
  {
    addr.pgptr = pgbuf_fix (next_vpid, OLD_PAGE, LATCH_WRITE);
    /* compute new copy_length for this page; log undo image */
    log_append_undo_data (RVOVF_PAGE_UPDATE, before-image);
    memcpy (page->data, source, copy_length);
    log_append_redo_data (RVOVF_PAGE_UPDATE, after-image);
    if (length > 0 && VPID_ISNULL (&next_vpid))
      {
        /* extend chain: allocate a new rest-page */
        file_alloc (ovf_vfid, ..., &next_vpid);
        log_append_undoredo_data (RVOVF_NEWPAGE_LINK, ...);
        first_part->next_vpid = next_vpid; /* or rest_parts->next_vpid */
      }
  }
/* truncate any tail pages no longer needed */
while (!VPID_ISNULL (&next_vpid))
  {
    /* unfix tmp_vpid, file_dealloc it */
  }
log_sysop_attach_to_outer (thread_p);

Three nuances:

  1. Update is restricted to FILE_MULTIPAGE_OBJECT_HEAP at the top of the function (overflow_file.c:398). B+Tree overflow keys are replaced (delete + insert), not updated — so this path runs only for heap big-records.
  2. Logging is split: undo for the before-image, redo for the after-image, plus a RVOVF_NEWPAGE_LINK undo/redo whenever the chain extends. The RVOVF_CHANGE_LINK undo/redo is emitted when the chain shrinks — the head’s next_vpid is set to NULL and the tail pages are deallocated one by one.
  3. LOG_DUMMY_OVF_RECORD is re-emitted on the first page (overflow_file.c:461) so HA replication and vacuum can re-identify the chain head after the update; the dummy log record carries no payload but anchors the LSN that replicas use.

Walk: heap delete of a big record (MVCC vs non-MVCC)


heap_delete_logical for a REC_BIGONE slot does not call overflow_delete directly under MVCC. Instead it edits the MVCC header on the first overflow page (the mvcc_del_id lives there, not on the heap home page):

// excerpt — heap_file.c around 21459-21498
heap_get_mvcc_rec_header_from_overflow (overflow_page, &overflow_header, NULL);
heap_mvcc_log_delete (thread_p, &log_addr, RVHF_MVCC_DELETE_OVERFLOW);
heap_delete_adjust_header (&overflow_header, mvcc_id, /*is_insid=*/false);
heap_set_mvcc_rec_header_on_overflow (overflow_page, &overflow_header);
pgbuf_set_dirty (overflow_page, DONT_FREE);
heap_mvcc_log_home_no_change (...); /* heap home: no change, but vacuum needs to see it */

The home heap page’s REC_BIGONE slot is not modified during an MVCC delete. Only when vacuum (or a non-MVCC class delete) finally reclaims the row does heap_ovf_delete → overflow_delete → overflow_traverse(OVERFLOW_DO_DELETE) walk the chain and file_dealloc each page. At that point the heap home slot is also marked REC_MARKDELETED / REC_DELETED_WILL_REUSE.

The home heap page’s REC_BIGONE slot is a small forwarding pointer (an OID-shaped 12-byte payload). To read the actual row, heap_get_visible_version follows the pointer:

// heap_get_visible_version — REC_BIGONE branch (sketch)
case REC_BIGONE:
  /* Read forward_oid from the REC_BIGONE slot. */
  /* Apply MVCC visibility against the overflow page's MVCC header. */
  pgptr_ovf = pgbuf_fix (forward_oid, OLD_PAGE, LATCH_READ, ...);
  heap_get_mvcc_rec_header_from_overflow (pgptr_ovf, &mvcc_header, &peek_recdes);
  if (mvcc_satisfies_snapshot (...) != SNAPSHOT_SATISFIED)
    {
      return S_DOESNT_EXIST_OR_WALK_PREV_VERSION_LSA;
    }
  scan = heap_ovf_get (...); /* materializes the chain into the caller's recdes */

heap_ovf_get calls overflow_get → overflow_get_nbytes (start_offset=0, max_nbytes=-1) with the caller’s MVCC snapshot. The visibility check happens inside overflow_get_nbytes (overflow_file.c:769-780), not at the call site:

// overflow_get_nbytes — visibility (overflow_file.c:769-780)
if (mvcc_snapshot != NULL)
  {
    MVCC_REC_HEADER mvcc_header;

    heap_get_mvcc_rec_header_from_overflow (pgptr, &mvcc_header, NULL);
    if (mvcc_snapshot->snapshot_fnc (thread_p, &mvcc_header, mvcc_snapshot) == TOO_OLD_FOR_SNAPSHOT)
      {
        pgbuf_unfix_and_init (thread_p, pgptr);
        return S_SNAPSHOT_NOT_SATISFIED;
      }
  }

Note the asymmetry: only TOO_OLD_FOR_SNAPSHOT aborts the read; TOO_NEW_FOR_SNAPSHOT records are still returned. The comment in the source explains: “TOO_NEW_FOR_SNAPSHOT records should be accepted, e.g. a recently updated record, locked at select”. This matters because a SELECT ... FOR UPDATE against a big record that was just updated by an uncommitted transaction must still see the candidate row to acquire the lock.

B+Tree overflow path — two distinct shapes


The B+Tree module uses overflow files for two unrelated things:

  1. Overflow keys — when a single key is too large to fit inline in a leaf or non-leaf record (e.g., a long string PK). This uses overflow_file.c with file type FILE_BTREE_OVERFLOW_KEY. The chain stores the serialized DB_VALUE of the key, just like heap big-records store the row body.
  2. Overflow OID lists — when a non-unique key has too many OIDs to fit inline in its leaf record. This does not use overflow_file.c; it builds its own page format on top of slotted B+Tree pages allocated from the same per-tree file BTID_INT::ovfid.

Both share BTID_INT::ovfid as the file VFID, created once per B+Tree by btree_create_overflow_key_file (btree.c:1975):

// btree_create_overflow_key_file — btree.c:1975 (condensed)
des.btree_key_overflow.btid = *btid->sys_btid;
des.btree_key_overflow.class_oid = btid->topclass_oid;
file_create_with_npages (thread_p, FILE_BTREE_OVERFLOW_KEY, 3, &des, &btid->ovfid);
heap_get_class_tde_algorithm (&btid->topclass_oid, &tde_algo);
file_apply_tde_algorithm (&btid->ovfid, tde_algo); /* TDE for at-rest encryption */

The file is created with at least 3 pages preallocated and inherits the index’s class-level TDE setting. Both overflow-key chains and overflow-OID-list chains live in the same file.

flowchart LR
  subgraph LEAF["Leaf page"]
    LR0["Leaf record:\n[ first_oid (8B) | (class_oid 8B if cross-class unique) |\n  ins_mvccid | del_mvccid | KEY bytes |\n  oid_2 | oid_3 | ... ] |\nLEAF_REC.ovfl → V0"]
  end
  subgraph OVF["Per-key overflow chain (in BTID_INT::ovfid)"]
    V0["Overflow page V0 (PAGE_BTREE)\nslot 0 / HEADER = BTREE_OVERFLOW_HEADER {\n  next_vpid → V1\n}\nslot 1 = sorted-by-OID record:\n[ oid_k | ins_mvccid | del_mvccid | oid_{k+1} | ... ]"]
    V1["Overflow page V1\nslot 0 / HEADER = next_vpid → V2\nslot 1 = more sorted OIDs"]
    V2["Overflow page V2\nslot 0 / HEADER = next_vpid = NULL\nslot 1 = remaining sorted OIDs"]
    V0 --> V1 --> V2
  end
  LR0 -- "leaf_rec_info.ovfl points at V0" --> V0

Properties of the OID-list overflow chain:

  • Each overflow page is a slotted page of type PAGE_BTREE — not PAGE_OVERFLOW. pgbuf_check_page_ptype at btree.c:11606 confirms this. The check distinguishes OID-list overflow pages (still B+Tree pages, just at the overflow level) from heap big-record pages (separate page type).
  • Slot 0 holds the BTREE_OVERFLOW_HEADER (defined at btree_load.h:244 as a single-field struct: VPID next_vpid). Slot 1 holds the OID payload — sorted by OID for binary search.
  • MVCC info is OR_MVCC_MAX_HEADER_SIZE per OID, always. Inline OIDs in the leaf record can omit mvcc_ins_id / mvcc_del_id when not relevant; in the overflow page every object has both fields. The assertion at the call site (btree.c:3994: “BTREE_MVCC_INFO_HAS_INSID && HAS_DELID”) enforces this. Reason: binary search by OID needs fixed-size records, and predictability beats space.
  • spage_max_space_for_new_record decides where a new OID goes (btree.c:11609). The walker btree_find_free_overflow_oids_page traverses the chain and stops at the first page with BTREE_OBJECT_FIXED_SIZE bytes free. If none has room, btree_start_overflow_page allocates a new page and links it in at the head of the chain (the new page’s next_vpid becomes the old first overflow page; LEAF_REC.ovfl is rewritten to the new page). This is the opposite of append — new pages go to the front because that saves walking to the tail.
// excerpt — btree.c around 11340-11423
log_sysop_start (...);
btree_start_overflow_page (btid_int, object_info, first_ovfl_vpid /* old first */,
                           pgbuf_get_vpid_ptr (leaf_page), &ovfl_vpid /* new */, &ovfl_page);
/* btree_start_overflow_page:
 * - btree_get_new_page → pgbuf_fix(NEW_PAGE)
 * - ovf_header_info.next_vpid := first_ovfl_vpid (link new page to old chain)
 * - btree_init_overflow_header → spage_insert_at(HEADER, &ovf_header_info)
 * - btree_record_append_object → write [oid|ins_mvccid|del_mvccid] into rec
 * - spage_insert_at(slot 1, &rec)
 * - log_append_redo_data(RVBT_RECORD_MODIFY_NO_UNDO, full page replay)
 */
LOG_RV_RECORD_SET_MODIFY_MODE (&insert_helper->leaf_addr, LOG_RV_RECORD_UPDATE_PARTIAL);
btree_leaf_record_change_overflow_link (..., &ovfl_vpid, &rv_undo_data_ptr,
                                        &insert_helper->rv_redo_data_ptr);
spage_update (leaf_page, search_key->slotid, leaf_rec);
log_append_undoredo_data (RVBT_RECORD_MODIFY_UNDOREDO, ...); /* leaf record undo + redo */
log_sysop_end (...);

The system-op bracket is critical: if the server crashes between “new overflow page allocated” and “leaf record points at it”, the sysop log replay deallocates the orphan page (or, if the bracket is log_sysop_end_logical_undo, re-merges; the OID-list path uses plain commit-on-success / abort-on-failure). The header comment in btree.c notes this risk: “Note that this page may be leaked if server crashes before changing the link in leaf page” — which is why the new-page allocation and the leaf link rewrite must be in the same sysop.

btree_overflow_remove_object (btree.c:1622) deletes the OID from its page; if that empties the page, the chain has to be re-linked. There are two cases:

  • First overflow page — the leaf record’s LEAF_REC.ovfl must be rewritten to point at the next overflow page (or NULL_VPID if the chain becomes empty). btree_modify_overflow_link does this for the leaf side; btree_overflow_remove_object for the overflow side.
  • Non-first overflow page — the previous overflow page’s BTREE_OVERFLOW_HEADER.next_vpid is rewritten. This requires the previous page to be fixed simultaneously with the page being removed; btree_find_oid_and_its_page (btree.c:11646-11772) returns both via *found_page and *prev_page to support this.
// btree_modify_overflow_link — btree.c:10054 (condensed)
spage_get_record (ovfl_page, HEADER, &overflow_header_record, COPY); /* undo image */
ovf_header_info.next_vpid := next_ovfl_vpid;
spage_update (ovfl_page, HEADER, &overflow_header_record); /* new image */
log_append_undoredo_data (RVBT_RECORD_MODIFY_UNDOREDO, ovf_addr,
                          undo (old header), redo (new header));
pgbuf_set_dirty (ovfl_page, DONT_FREE);

RVBT_RECORD_MODIFY_UNDOREDO is the unified physical undo/redo record for B+Tree node modifications — the same one used for inline leaf inserts and deletes — keyed off the slot ID and the modify-mode flag (LOG_RV_RECORD_UPDATE_ALL vs LOG_RV_RECORD_UPDATE_PARTIAL vs LOG_RV_RECORD_INSERT vs LOG_RV_RECORD_DELETE). The B+Tree overflow chain reuses this same record family rather than minting a per-overflow record type.

When the key itself is too long for inline storage, a wholly different path applies: btree_store_overflow_key (btree.c:2020). The key is serialized via pr_type->index_writeval into a RECDES, and that RECDES becomes the input to overflow_insert with file type FILE_BTREE_OVERFLOW_KEY:

// btree_store_overflow_key — btree.c:2020 (condensed)
overflow_file_vfid = btid->ovfid; /* shared with OID-list overflow */
rec.area_size = size;
rec.data = db_private_alloc (size);
or_init (&buf, rec.data, rec.area_size);
pr_type->index_writeval (&buf, key_ptr);
rec.length = (int) (buf.ptr - buf.buffer);
overflow_insert (&overflow_file_vfid, first_overflow_page_vpid, &rec, FILE_BTREE_OVERFLOW_KEY);

The leaf record then carries the VPID of the first overflow page (instead of the inlined key bytes), with a flag (BTREE_LEAF_RECORD_OVERFLOW_KEY) telling readers to consult the overflow chain. btree_load_overflow_key (btree.c:2131) is the read path:

// btree_load_overflow_key — btree.c:2131 (condensed)
rec.area_size = overflow_get_length (thread_p, first_overflow_page_vpid);
rec.data = db_private_alloc (rec.area_size);
overflow_get (thread_p, first_overflow_page_vpid, &rec, NULL); /* mvcc_snapshot=NULL */
or_init (&buf, rec.data, rec.length);
pr_type->index_readval (&buf, key, btid->key_type, -1, /*copy=*/true, NULL, 0);

Two design points:

  • mvcc_snapshot=NULL in the overflow_get call — overflow keys do not carry MVCC info. The visibility of the OID attached to that key is what mediates whether the key is visible at all; the key bytes themselves are MVCC-invariant for as long as the leaf entry exists.
  • The key is always copied out of the overflow record (copy=true). Unlike inline keys, which can be peeked at zero-copy under the leaf latch, an overflow key requires materializing into a DB_VALUE because the page is unfixed before index_readval returns.

btree_delete_overflow_key (btree.c:2202) reads the leaf record, extracts the first overflow VPID, and calls overflow_delete to deallocate the chain. It does not delete the leaf record’s slot — that is the caller’s job.

The overflow file emits four record types, all defined in recovery.h:100-103:

| Code | When emitted | Undo / redo handling |
| --- | --- | --- |
| RVOVF_NEWPAGE_INSERT | Per page, during overflow_insert | Redo: overflow_rv_newpage_insert_redo → log_rv_copy_char (full-page redo image) |
| RVOVF_NEWPAGE_LINK | When overflow_update extends the chain | Undo: overflow_rv_newpage_link_undo (sets next_vpid := NULL); Redo: overflow_rv_link (restores the logged next_vpid) |
| RVOVF_PAGE_UPDATE | Per page, during overflow_update (undo + redo images) | Redo: overflow_rv_page_update_redo (sets ptype, copies bytes); undo is a per-page byte image |
| RVOVF_CHANGE_LINK | When overflow_update shrinks the chain | Same handlers as RVOVF_NEWPAGE_LINK |

LOG_DUMMY_OVF_RECORD is not in the RV* family — it is a zero-length log record whose only purpose is to anchor an LSN that HA replication and vacuum_master can use as a “this is a chain head” marker. It is emitted on the first page only (overflow_file.c:202 for insert, :461 for update) and has no redo / undo function; replicas observing this LSN look up the page-LSN to identify the head.

flowchart TB
  subgraph INSERT["overflow_insert"]
    II1["log_sysop_start"]
    II2["file_alloc_multiple\n(npages)"]
    II3["per page i:\nLOG_DUMMY_OVF_RECORD on i==0\nRVOVF_NEWPAGE_INSERT on each"]
    II4["log_sysop_attach_to_outer\n(commit / fail → abort)"]
    II1 --> II2 --> II3 --> II4
  end
  subgraph UPDATE["overflow_update"]
    UU1["log_sysop_start"]
    UU2["per page touched:\nRVOVF_PAGE_UPDATE undo+redo\nRVOVF_NEWPAGE_LINK if extending\nLOG_DUMMY_OVF_RECORD on first"]
    UU3["if shrinking:\nRVOVF_CHANGE_LINK undoredo\nthen file_dealloc tail pages"]
    UU4["log_sysop_attach_to_outer\n(commit / fail → abort)"]
    UU1 --> UU2 --> UU3 --> UU4
  end
  subgraph DELETE["overflow_delete"]
    DD1["overflow_traverse(OVERFLOW_DO_DELETE)"]
    DD2["per page:\npgbuf_unfix\nfile_dealloc(vpid) — no per-page log record\n(file manager logs the dealloc itself)"]
    DD1 --> DD2
  end

The asymmetry — insert and update emit RVOVF_* records, delete does not — is because deletion is implemented as file_dealloc on each page, and the file manager logs page deallocations through its own RV*_FILE_* record family. overflow_file.c does not need to log anything beyond what file_dealloc already does.

For heap big-records, vacuum visits the home heap page chain in vacuum_heap_page and, when it encounters a REC_BIGONE slot whose visibility is “delete-visible to all” (i.e., mvcc_del_id on the overflow page is older than the oldest active snapshot), calls heap_ovf_delete to free the overflow chain. The home slot transitions to REC_MARKDELETED / REC_DELETED_WILL_REUSE.

For B+Tree overflow OID lists, vacuum visits each B+Tree leaf and, for each OID whose mvcc_del_id is too old, calls btree_overflow_remove_object (or its in-leaf counterpart). If that leaves an overflow page empty, the page is unlinked and file_dealloc’d via btree_modify_overflow_link plus the file-manager call. The overflow file’s pages are reused by later inserts to the same B+Tree — never across trees.

For B+Tree overflow keys, vacuum has no special role — the key chain is freed only when the leaf record carrying it is deleted, via btree_delete_overflow_key → overflow_delete. Since the leaf record is deleted by either DDL (key-type change, drop-index) or row-delete plus collapse-merge, there is no MVCC delay.

Two callers depend on the head-only length field:

  1. overflow_get_length (overflow_file.c:692) is an O(1) query — fix the head, read the field, unfix. Used by btree_load_overflow_key to size the destination buffer before the first read.
  2. overflow_get_capacity (overflow_file.c:935) reports total bytes, page count, header overhead, and free space in the last page — useful for diagnostic dumps and capacity planning. The walker uses length to decide how many pages to traverse, instead of walking until a NULL next_vpid, which means a corrupt chain (NULL link before length bytes consumed) trips an explicit error rather than a silent truncation.

The trade-off is that overflow_update has to maintain length correctly across resize, including the case where the new length is smaller — captured by the post-update truncation loop in overflow_update (overflow_file.c:563-588). The first OVERFLOW_FIRST_PART page is always retained even if the new length fits in less than one full page (i.e., the chain never shrinks below 1 page) because removing the head would invalidate the REC_BIGONE reference in the heap home record.

The overflow file uses two distinct page types depending on the caller:

  • Heap big-records and B+Tree overflow keys allocate pages of type PAGE_OVERFLOW. overflow_insert initializes new pages with file_init_page_type (or file_init_temp_page_type for FILE_TEMP) and ptype = PAGE_OVERFLOW. Reads validate via pgbuf_check_page_ptype (..., PAGE_OVERFLOW). This page type signals to the buffer-pool layer that the page is not slotted in the usual sense — the body is raw OVERFLOW_FIRST_PART / OVERFLOW_REST_PART headers plus contiguous data, not slot directory + records.
  • B+Tree overflow OID lists allocate pages of type PAGE_BTREE — the same as leaf and non-leaf pages. They are slotted (slot 0 = BTREE_OVERFLOW_HEADER, slot 1 = OID record), so the standard slotted-page invariants apply.

This is the cleanest tell that the two B+Tree overflow paths are unrelated: the OID-list overflow chain is just “more B+Tree pages that happen to be at depth N+1”, while the overflow-key chain shares its on-disk layout with heap big-records.

Anchor on symbol names, not line numbers. Use git grep -n '<symbol>' src/storage/ to locate current positions. The trailing position-hint table records the lines observed at this revision.

  • struct overflow_first_part (overflow_file.h) — head-page format: next_vpid + length + data[1].
  • struct overflow_rest_part (overflow_file.h) — rest-page format: next_vpid + data[1].
  • enum OVERFLOW_DO_FUNC { OVERFLOW_DO_DELETE, OVERFLOW_DO_FLUSH } (overflow_file.c) — discriminator for the shared traversal worker.
  • BTREE_OVERFLOW_HEADER (btree_load.h:244) — single next_vpid field; lives in slot 0 of B+Tree OID-overflow pages.
  • File-type constants FILE_TEMP, FILE_BTREE_OVERFLOW_KEY, FILE_MULTIPAGE_OBJECT_HEAP (file_manager.h) — the three file types overflow_insert accepts.
  • Page-type constants PAGE_OVERFLOW, PAGE_BTREE (page_buffer.h) — distinguish heap-big-record / overflow-key pages from B+Tree OID-list pages.
  • WAL codes RVOVF_NEWPAGE_INSERT, RVOVF_NEWPAGE_LINK, RVOVF_PAGE_UPDATE, RVOVF_CHANGE_LINK (recovery.h:100-103).
  • overflow_insert — multi-page allocation + per-page write + per-page redo log under one sysop. Pre-computes page count.
  • overflow_update — extend / shrink / overwrite the chain; asserts FILE_MULTIPAGE_OBJECT_HEAP (heap-only path).
  • overflow_delete — wraps overflow_traverse(OVERFLOW_DO_DELETE).
  • overflow_flush — wraps overflow_traverse(OVERFLOW_DO_FLUSH).
  • overflow_get — convenience wrapper for overflow_get_nbytes(0, -1, ...).
  • overflow_get_nbytes — partial-record read with start offset and max length; applies MVCC visibility on the head page.
  • overflow_get_length — O(1) total-bytes query (head page only).
  • overflow_get_capacity — full-chain walk; reports size, pages, overhead, free space.
  • overflow_get_first_page_data — pointer to the first page’s data[] area (used by heap_get_mvcc_rec_header_from_overflow).
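The two header shapes and the page-count precomputation overflow_insert performs can be sketched like this. The struct layouts mirror overflow_file.h's data[1] idiom, but PAGESIZE and the model names are illustrative assumptions, not CUBRID's actual constants.

```c
#include <assert.h>
#include <stddef.h>

enum
{ PAGESIZE = 16384 };		/* illustrative, not CUBRID's DB_PAGESIZE */

typedef struct
{
  short volid;
  int pageid;
} vpid;

/* Head page: link + total chain length, then data to end of page. */
typedef struct
{
  vpid next_vpid;		/* first OVERFLOW_REST_PART page, or NULL */
  int length;			/* total user bytes in the whole chain */
  char data[1];			/* data begins here, runs to end of page */
} overflow_first_part_model;

/* Rest page: link only, then data to end of page. */
typedef struct
{
  vpid next_vpid;
  char data[1];
} overflow_rest_part_model;

/* How many pages a record of `length` bytes needs: the head page has
 * less room than the rest pages because it also carries `length`. */
static int
overflow_npages (int length)
{
  int first_cap = PAGESIZE - (int) offsetof (overflow_first_part_model, data);
  int rest_cap = PAGESIZE - (int) offsetof (overflow_rest_part_model, data);

  if (length <= first_cap)
    {
      return 1;
    }
  return 1 + (length - first_cap + rest_cap - 1) / rest_cap;
}
```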

Internal helpers (src/storage/overflow_file.c)

  • overflow_next_vpid — read the next-page link, type-aware (head vs rest).
  • overflow_traverse — walk-and-call-callback worker; shared by delete and flush.
  • overflow_delete_internal — per-page deallocation (file_dealloc).
  • overflow_flush_internal — per-page WAL flush (pgbuf_flush_with_wal).
  • overflow_rv_newpage_insert_redo — full-page redo via log_rv_copy_char.
  • overflow_rv_newpage_link_undo — set next_vpid := NULL on the page that was linked-to; undoes a chain extension.
  • overflow_rv_link — set next_vpid := <data>; serves as redo for both RVOVF_NEWPAGE_LINK (extend) and RVOVF_CHANGE_LINK (shrink).
  • overflow_rv_link_dump — diagnostic dump of a next_vpid log record.
  • overflow_rv_page_update_redo — set page type (PAGE_OVERFLOW) and copy bytes via log_rv_copy_char.
  • overflow_rv_page_dump — diagnostic dump for RVOVF_PAGE_UPDATE records.
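The walk-and-callback shape shared by delete and flush can be modeled in a few lines — a sketch of the pattern, with illustrative names; the real worker fixes pages via pgbuf and dispatches to overflow_delete_internal or overflow_flush_internal.

```c
#include <assert.h>

/* Discriminator for the per-page action, mirroring OVERFLOW_DO_FUNC. */
typedef enum
{ DO_DELETE, DO_FLUSH } ovf_do_func;

/* Hypothetical page model with flags recording what happened to it. */
typedef struct
{
  int next;			/* index of next page, -1 for none */
  int flushed;
  int deleted;
} traverse_page;

static void
ovf_traverse (traverse_page * pages, int head, ovf_do_func func)
{
  int cur = head;
  while (cur >= 0)
    {
      /* Read the link before acting: a DO_DELETE destroys the page. */
      int next = pages[cur].next;
      if (func == DO_DELETE)
	{
	  pages[cur].deleted = 1;	/* file_dealloc analogue */
	}
      else
	{
	  pages[cur].flushed = 1;	/* pgbuf_flush_with_wal analogue */
	}
      cur = next;
    }
}
```

Reading next before invoking the per-page action is the essential ordering: for the delete case the link is gone once the page is deallocated.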

Heap-side wrappers (src/storage/heap_file.c)

  • heap_ovf_find_vfid — get / create the heap’s overflow file VFID from HEAP_HDR_STATS.ovf_vfid. With docreate=true, creates FILE_MULTIPAGE_OBJECT_HEAP under a sysop, applies TDE algorithm from class metadata, logs RVHF_STATS before / after for the heap-header change.
  • heap_ovf_insert — wrap overflow_insert(..., FILE_MULTIPAGE_OBJECT_HEAP); convert returned VPID into an OID with slotid = NULL_SLOTID for the home page’s REC_BIGONE body.
  • heap_ovf_update — wrap overflow_update(..., FILE_MULTIPAGE_OBJECT_HEAP).
  • heap_ovf_delete — wrap overflow_delete. If ovf_vfid_p is NULL, looks it up via heap_ovf_find_vfid.
  • heap_ovf_flush / heap_ovf_get_length / heap_ovf_get / heap_ovf_get_capacity — straight wrappers over the corresponding overflow_* functions.
  • heap_get_mvcc_rec_header_from_overflow — peek the MVCC header from the head page’s data[] (uses overflow_get_first_page_data).
  • heap_set_mvcc_rec_header_on_overflow — write the MVCC header back; forces full OR_MVCC_MAX_HEADER_SIZE even for records that don’t yet have all flags set, by adding MVCCID_ALL_VISIBLE for missing INSID and MVCCID_NULL for missing DELID. The forced-max-size invariant is what lets mvcc_header_size_lookup predict the in-place update cost.
  • heap_get_bigone_content — read a big record into a RECDES, with optional scan_cache-backed buffer reuse.
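heap_ovf_insert's return convention — the head VPID smuggled into an OID with slotid = NULL_SLOTID — can be sketched as a round-trip. The types and the NULL_SLOTID value here are illustrative models of the CUBRID definitions, not the real ones.

```c
#include <assert.h>

#define MODEL_NULL_SLOTID (-1)	/* stand-in for CUBRID's NULL_SLOTID */

typedef struct
{
  short volid;
  int pageid;
} vpid_t;

typedef struct
{
  short volid;
  int pageid;
  short slotid;
} oid_t;

/* Pack the overflow chain's head VPID into the OID stored as the
 * REC_BIGONE body on the home page. slotid carries no slot meaning;
 * NULL_SLOTID marks the OID as a page reference. */
static oid_t
vpid_to_bigone_oid (vpid_t head)
{
  oid_t oid = { head.volid, head.pageid, MODEL_NULL_SLOTID };
  return oid;
}

/* Recover the head VPID when following a REC_BIGONE reference. */
static vpid_t
bigone_oid_to_vpid (oid_t oid)
{
  vpid_t head = { oid.volid, oid.pageid };
  return head;
}
```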

B+Tree-side overflow-key wrappers (src/storage/btree.c)

  • btree_create_overflow_key_file — create FILE_BTREE_OVERFLOW_KEY and store the VFID in BTID_INT::ovfid. Allocates ≥ 3 pages up front; applies TDE.
  • btree_store_overflow_key — serialize a DB_VALUE key into a RECDES; call overflow_insert(... FILE_BTREE_OVERFLOW_KEY); output the head VPID.
  • btree_load_overflow_key — read the head’s length, allocate, call overflow_get(... NULL); deserialize into a DB_VALUE.
  • btree_delete_overflow_key — find the head VPID embedded in the leaf / non-leaf record; call overflow_delete.
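The load path's length-then-fetch pairing can be sketched as below. The chain model and the ovf_* functions are illustrative stand-ins; the real code calls overflow_get_length on the head page, sizes a buffer, then overflow_get, and finally deserializes into a DB_VALUE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an overflow chain holding a serialized key. */
typedef struct
{
  const char *data;
  int len;
} chain_model;

static int
ovf_get_length (const chain_model * c)
{
  return c->len;		/* O(1): head page carries the length */
}

static void
ovf_get (const chain_model * c, char *buf)
{
  memcpy (buf, c->data, c->len);	/* full-chain read */
}

/* Sketch of btree_load_overflow_key's shape: size, fetch, hand back
 * the raw bytes for deserialization. Caller frees the buffer. */
static char *
load_overflow_key (const chain_model * c, int *len_out)
{
  int len = ovf_get_length (c);
  char *buf = malloc (len + 1);
  ovf_get (c, buf);
  buf[len] = '\0';
  *len_out = len;
  return buf;
}
```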

B+Tree-side overflow-OID-list helpers (src/storage/btree.c)

  • btree_start_overflow_page — allocate a new B+Tree page (PAGE_BTREE), initialize BTREE_OVERFLOW_HEADER, append the new object to slot 1, and link the new page in at the head of the chain (new → old-first).
  • btree_modify_overflow_link — rewrite an overflow page’s header next_vpid; emits RVBT_RECORD_MODIFY_UNDOREDO.
  • btree_find_free_overflow_oids_page — walk chain, return first page with enough room, else NULL.
  • btree_find_oid_and_its_page — locate the page (leaf or overflow) carrying a given (key, OID) pair, and optionally return the previous page so the caller can rewrite the predecessor’s link.
  • btree_find_oid_from_ovfl — binary search OIDs in one overflow page’s slot-1 record (sorted by OID).
  • btree_overflow_remove_object — physically remove an OID from an overflow page; if the page becomes empty, deallocate and re-link.
  • btree_overflow_record_replace_object — replace one OID with another (used by unique-constraint maintenance and FK).
  • btree_get_next_overflow_vpid (in btree_load.c) — read BTREE_OVERFLOW_HEADER.next_vpid from a fixed page.
  • btree_get_overflow_header (in btree_load.c) — peek the overflow page’s HEADER slot.
  • btree_init_overflow_header (in btree_load.c) — write the overflow page’s HEADER slot during btree_start_overflow_page.
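The head-of-chain linking order in btree_start_overflow_page (new → old-first, then leaf → new) is the classic prepend, sketched here with an illustrative in-memory model rather than real page operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an OID-overflow page: a count plus the
 * BTREE_OVERFLOW_HEADER-style next link. */
typedef struct ovf_node
{
  int oid_count;
  struct ovf_node *next;
} ovf_node;

/* Model of the leaf record's pointer to its first overflow page. */
typedef struct
{
  ovf_node *first_overflow;
} leaf_model;

/* Link a freshly allocated overflow page in at the head of the chain:
 * the new page points at the old first page, then the leaf is
 * repointed at the new page. */
static void
start_overflow_page (leaf_model * leaf, ovf_node * fresh)
{
  fresh->next = leaf->first_overflow;
  leaf->first_overflow = fresh;
}
```

Prepending keeps the operation O(1) regardless of chain length: no walk to the tail is needed to absorb a new burst of duplicates.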

These line numbers were observed when the document was last updated. Symbols are authoritative; lines may drift.

| Symbol | File | Line |
| --- | --- | --- |
| struct overflow_first_part | overflow_file.h | 38 |
| struct overflow_rest_part | overflow_file.h | 46 |
| enum OVERFLOW_DO_FUNC | overflow_file.c | 45 |
| overflow_insert | overflow_file.c | 81 |
| overflow_next_vpid | overflow_file.c | 275 |
| overflow_traverse | overflow_file.c | 296 |
| overflow_update | overflow_file.c | 373 |
| overflow_delete_internal | overflow_file.c | 613 |
| overflow_delete | overflow_file.c | 644 |
| overflow_flush_internal | overflow_file.c | 655 |
| overflow_flush | overflow_file.c | 677 |
| overflow_get_length | overflow_file.c | 691 |
| overflow_get_nbytes | overflow_file.c | 738 |
| overflow_get | overflow_file.c | 916 |
| overflow_get_capacity | overflow_file.c | 934 |
| overflow_rv_newpage_insert_redo | overflow_file.c | 1112 |
| overflow_rv_newpage_link_undo | overflow_file.c | 1124 |
| overflow_rv_link | overflow_file.c | 1144 |
| overflow_rv_page_update_redo | overflow_file.c | 1178 |
| overflow_get_first_page_data | overflow_file.c | 1217 |
| RVOVF_NEWPAGE_INSERT / etc. | recovery.h | 100 |
| BTREE_OVERFLOW_HEADER | btree_load.h | 244 |
| btree_get_overflow_header | btree_load.c | 339 |
| btree_init_overflow_header | btree_load.c | 575 |
| btree_get_next_overflow_vpid | btree_load.c | 674 |
| btree_create_overflow_key_file | btree.c | 1975 |
| btree_store_overflow_key | btree.c | 2020 |
| btree_load_overflow_key | btree.c | 2131 |
| btree_delete_overflow_key | btree.c | 2202 |
| btree_start_overflow_page | btree.c | 3973 |
| btree_modify_overflow_link | btree.c | 10054 |
| btree_find_free_overflow_oids_page | btree.c | 11579 |
| btree_find_oid_and_its_page | btree.c | 11646 |
| btree_find_oid_from_ovfl | btree.c | 12037 |
| heap_ovf_find_vfid | heap_file.c | 6462 |
| heap_ovf_insert | heap_file.c | 6568 |
| heap_ovf_update | heap_file.c | 6597 |
| heap_ovf_delete | heap_file.c | 6632 |
| heap_get_mvcc_rec_header_from_overflow | heap_file.c | 19541 |
| heap_set_mvcc_rec_header_on_overflow | heap_file.c | 19567 |
| heap_get_bigone_content | heap_file.c | 19610 |
  • overflow_update is heap-only. The assert (file_type == FILE_MULTIPAGE_OBJECT_HEAP) at overflow_file.c:398 rejects any other file type — including FILE_BTREE_OVERFLOW_KEY. B+Tree overflow keys cannot be updated in place; the caller must overflow_delete the old chain and overflow_insert a new one, even for tiny edits. The heap path is privileged because the row body’s MVCC update flow already implies “edit the existing chain in place, extending or shrinking”, and writing the data twice (delete + insert) would roughly double the WAL volume.

  • B+Tree OID-list overflow pages are PAGE_BTREE, not PAGE_OVERFLOW. The pgbuf_check_page_ptype at btree.c:11606 is the easiest way to tell that the OID-list path does not use overflow_file.c. The overflow-key path does use it and gets PAGE_OVERFLOW pages from file_init_page_type inside overflow_insert.

  • OVERFLOW_FIRST_PART::length is computed and maintained by the overflow-file layer, not by callers. Callers see only recdes->length on the way in and on the way out; the internal field is private.

  • LOG_DUMMY_OVF_RECORD carries no payload. It is purely an LSN anchor for HA replication and vacuum to identify the head page of a chain. Removing it would not break recovery (the chain itself has full redo) but would break the replication side that consumes the LSN to position itself in the log stream.

  • Heap big-record MVCC delete edits the overflow page, not the heap home page. The home page’s REC_BIGONE slot is only modified at vacuum / non-MVCC-class-delete time. This is the structural difference vs REC_HOME records, where the MVCC header lives inline in the heap slot itself.

  • Per-page lock absence is a deliberate optimization. The comment block at overflow_file.c:77-79 documents the invariant: each chain is private to one logical record, and that record’s higher-level lock (row lock or key-range lock) protects the whole chain. This is what allows pgbuf_fix(PGBUF_LATCH_WRITE) on overflow pages without lock manager involvement.
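The update asymmetry in the first observation above can be summarized as a dispatch sketch. The file-type names echo file_manager.h, but the functions and counters are illustrative stand-ins, not CUBRID calls.

```c
#include <assert.h>

/* Illustrative stand-ins for the two file types overflow callers use. */
typedef enum
{ F_MULTIPAGE_OBJECT_HEAP, F_BTREE_OVERFLOW_KEY } ftype;

/* Counters standing in for the chain operations actually issued. */
typedef struct
{
  int inserts;
  int deletes;
  int in_place_updates;
} chain_stats;

/* Only the heap file type may edit its chain in place; an overflow-key
 * "update" is spelled delete + insert, paying for the data twice. */
static void
update_chain (chain_stats * s, ftype type)
{
  if (type == F_MULTIPAGE_OBJECT_HEAP)
    {
      s->in_place_updates++;	/* overflow_update: one pass, less WAL */
    }
  else
    {
      s->deletes++;		/* overflow_delete the old chain */
      s->inserts++;		/* overflow_insert the new chain */
    }
}
```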

  1. Does overflow_update ever trigger a chain split? The chain is singly linked, so an update that grows the record should only ever extend pages in place or append new pages at the tail. But the code at overflow_file.c:511-541 has a branch that allocates a new page when next_vpid is NULL and more bytes remain to write. Investigation path: trace whether the resulting chain is always tail-extended or whether mid-chain insertion is possible.

  2. Why does overflow_delete_internal pass FILE_UNKNOWN_TYPE to file_dealloc? The TODO comment at overflow_file.c:621 flags this as something to clarify. The file manager presumably looks up the actual type from the file descriptor on disk; verifying this and removing the TODO would be a small cleanup. Investigation path: file_dealloc signature and behavior on FILE_UNKNOWN_TYPE.

  3. FILE_TEMP use case in overflow_insert. The valid-types assert at overflow_file.c:102 accepts FILE_TEMP (described as “sort files”). Where in CUBRID does the external sort write multi-page records via overflow_insert? The path does not appear in btree_load.c’s sort phase or in query_executor.c. Investigation path: grep for overflow_insert callers with a FILE_TEMP argument.

  4. TDE coverage on B+Tree OID-list overflow pages. The TDE algorithm is applied to BTID_INT::ovfid once at btree_create_overflow_key_file time. Both overflow-key and OID-list pages live in this file. Are OID-list pages actually encrypted at rest, or does TDE skip them because they are PAGE_BTREE and the encryption hook is keyed off PAGE_OVERFLOW? Investigation path: pgbuf_dealloc_page / disk_page_write TDE path and the PAGE_* discriminator.

  5. Concurrency between heap_ovf_update and a concurrent reader. The reader fixes the head page first, reads length, then walks the chain — but each rest-page fix is independent. If heap_ovf_update shrinks the chain between the reader’s fix of page N and page N+1, the reader might fix a deallocated page. Is this prevented by the row-level lock the reader holds, or by the page-buffer’s pin-count blocking dealloc? Investigation path: trace pgbuf_fix of a page that is concurrently being file_dealloc’d under an X row lock vs SELECT FOR UPDATE.

  6. B+Tree overflow-key MVCC. The code passes mvcc_snapshot=NULL when reading an overflow key, on the assumption that the key is MVCC-invariant for the lifetime of the leaf entry. But the leaf entry has its own MVCC info per OID; if all OIDs for a key are vacuumed, the leaf entry is removed and the overflow-key chain is freed. Is there a window where a snapshot reader could observe a leaf entry whose overflow-key chain has been concurrently freed? Investigation path: vacuum’s leaf-purge path and its ordering vs btree_delete_overflow_key.

This document is Code-only (sources: []). All material was distilled from the CUBRID source tree referenced below; no slide deck, raw analysis PDF, or external write-up was used.

CUBRID source (under /data/hgryoo/references/cubrid/)

  • src/storage/overflow_file.h
  • src/storage/overflow_file.c
  • src/storage/heap_file.c (heap_ovf_*, heap_get_mvcc_rec_header_from_overflow, heap_set_mvcc_rec_header_on_overflow, REC_BIGONE handling around lines 7527–7873 and 21420–21504)
  • src/storage/btree.c (btree_create_overflow_key_file, btree_store_overflow_key, btree_load_overflow_key, btree_delete_overflow_key, btree_start_overflow_page, btree_modify_overflow_link, btree_find_free_overflow_oids_page, btree_find_oid_and_its_page, btree_find_oid_from_ovfl)
  • src/storage/btree_load.h (BTREE_OVERFLOW_HEADER)
  • src/storage/btree_load.c (btree_get_overflow_header, btree_init_overflow_header, btree_get_next_overflow_vpid)
  • src/storage/file_manager.h (FILE_MULTIPAGE_OBJECT_HEAP, FILE_BTREE_OVERFLOW_KEY, FILE_TEMP)
  • src/transaction/recovery.h (RVOVF_NEWPAGE_INSERT, RVOVF_NEWPAGE_LINK, RVOVF_PAGE_UPDATE, RVOVF_CHANGE_LINK)
  • knowledge/code-analysis/cubrid/cubrid-heap-manager.md — the slotted-page substrate, REC_BIGONE record type, and HEAP_HDR_STATS::ovf_vfid location.
  • knowledge/code-analysis/cubrid/cubrid-btree.md — the three-record-kinds layout (NON_LEAF_REC, LEAF_REC, overflow), BTID_INT::ovfid, and the btree_find_oid_and_its_page unique-enforcement workhorse.
  • knowledge/code-analysis/cubrid/cubrid-mvcc.mdmvcc_rec_header layout and the visibility predicate that overflow_get_nbytes calls.
  • knowledge/code-analysis/cubrid/cubrid-page-buffer-manager.mdpgbuf_fix discipline and PAGE_OVERFLOW / PAGE_BTREE page-type checks.
  • knowledge/code-analysis/cubrid/cubrid-log-manager.mdlog_sysop_start / log_sysop_attach_to_outer / log_sysop_abort system-op semantics, RVOVF_* record family interaction with WAL, and LOG_DUMMY_OVF_RECORD’s role for HA replication.

Textbook chapters (under knowledge/research/dbms-general/)

  • Database Internals (Petrov), Ch. 3 §“File Formats”, §“Variable-Size Records” — slotted page, forwarding, and out-of-line overflow as the three classical strategies.
  • Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom), §13.7 “Variable-Length Data and Records” — in-line forwarding vs out-of-line overflow trade-offs.
  • Comer (1979), The Ubiquitous B-Tree (ACM CSUR) — the per-key record list and overflow-chain idea is already present in the early B-tree literature Comer surveys.