CUBRID Vacuum — Reclaiming Dead MVCC Versions Through Log Replay
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Source verification (as of 2026-04-30)
- Beyond CUBRID — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A multi-version concurrency control (MVCC) system creates dead versions: every UPDATE writes a new version of a row and leaves the previous one in place; every DELETE marks a row deleted but does not physically reclaim it; every aborted INSERT leaves a tombstone. Left unchecked, the heap and the indexes accumulate these tombstones and old versions until storage and scans are dominated by garbage. The vacuum subsystem is the engine’s garbage collector for MVCC.
Database Internals (Petrov, ch. 5 §“MVCC”) frames the reclamation problem in a single sentence: a version is reclaimable when no in-flight or future transaction’s snapshot can see it. The mechanics depend on what the engine does next:
- The engine knows the oldest visible MVCCID — every transaction’s snapshot bounds visibility, and the smallest of those bounds across all live transactions is the threshold.
- A version with xmax < oldest_visible_mvccid, or an aborted-insertion tombstone with xmin < oldest_visible_mvccid, is dead and reclaimable.
- The reclamation work happens on data pages (heap rows, B+Tree leaf entries, OID list compaction) but is driven by the log: every MVCC operation has emitted a LOG_MVCC_* record, and the vacuum subsystem walks those records to learn what to clean.
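The reclaimability rule in the bullets above can be written as a single predicate. This is a minimal sketch under stated assumptions: the version_header struct, its field names, and the MVCCID_NONE sentinel are hypothetical and do not reflect CUBRID's record header layout.

```cpp
#include <cassert>
#include <cstdint>

using MVCCID = std::uint64_t;
constexpr MVCCID MVCCID_NONE = 0;   // hypothetical "no id" sentinel

// Hypothetical, simplified version header; not CUBRID's record layout.
struct version_header
{
  MVCCID insert_id;      // xmin: transaction that created the version
  MVCCID delete_id;      // xmax: transaction that deleted or replaced it, or MVCCID_NONE
  bool insert_aborted;   // true when the inserting transaction rolled back
};

// A version is reclaimable once no current or future snapshot can see it.
bool
is_reclaimable (const version_header &v, MVCCID oldest_visible)
{
  assert (v.insert_id != MVCCID_NONE);
  if (v.insert_aborted)
    {
      // Aborted-insert tombstone: compare xmin against the watermark.
      return v.insert_id < oldest_visible;
    }
  // Deleted or superseded version: compare xmax against the watermark.
  return v.delete_id != MVCCID_NONE && v.delete_id < oldest_visible;
}
```

A live version (no delete id) is never reclaimable, whatever the watermark is.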
Two implementation choices the model leaves open shape every real engine and frame the rest of this document:
- What drives reclamation: data-side scan or log-side replay? PostgreSQL’s autovacuum scans heap files; InnoDB’s purge thread walks the undo log. CUBRID is in the second camp — vacuum walks the WAL itself, fixing target pages on demand. The trade-off: data-side scan visits each tuple once per pass; log-side replay visits each modification once. Log-side wins on workloads where most tuples are never updated, loses on long-running scans without modifications.
- What granularity of work? Per-tuple visit, per-page sweep, or per-block log replay? CUBRID picks per-block — the log is chunked into fixed-size vacuum blocks (default 31 log pages), and one block is one work unit dispatched to a worker.
After the choices are named, every CUBRID-specific structure in this document either implements one of them or makes the implementation faster.
Common DBMS Design
Every MVCC engine ships some form of garbage collector, and the shapes converge on a small handful of patterns.
Oldest-visible-MVCCID watermark
Reclamation cannot proceed past the smallest active-snapshot
MVCCID. Every engine maintains this watermark, recomputed when
transactions begin or end. PostgreSQL calls it OldestXmin (older
releases could hold it back further with the
vacuum_defer_cleanup_age setting); InnoDB calls it purge_view; CUBRID
calls it oldest_visible_mvccid on the log header and inside each
TDES’s mvccinfo.
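As a rough illustration of how such a watermark is derived, the sketch below takes the minimum over the lower bounds published by live snapshots. The function name, the vector of bounds, and the next_mvccid fallback are assumptions made for the example; CUBRID's actual bookkeeping is not shown in this document.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using MVCCID = std::uint64_t;

// Hypothetical: each live transaction publishes the lowest MVCCID its
// snapshot may still need to see.
MVCCID
compute_oldest_visible (const std::vector<MVCCID> &snapshot_lower_bounds, MVCCID next_mvccid)
{
  // With no live snapshots, anything below the next id to be assigned is
  // invisible to every future transaction.
  MVCCID oldest = next_mvccid;
  for (MVCCID bound : snapshot_lower_bounds)
    {
      assert (bound <= next_mvccid);
      oldest = std::min (oldest, bound);
    }
  return oldest;
}
```

A single long-running transaction pins the result at its own bound, which is why old snapshots hold back reclamation in every engine of this family.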
Master / worker pool
A single master picks work, multiple workers do it. The master needs to be cheap so it can scan candidate work continuously; the workers need their own state because vacuuming pages requires page fixes, log reads, and undo-data buffers. PostgreSQL’s autovacuum launcher + workers, InnoDB’s coordinator + workers, CUBRID’s master + workers are all the same architecture.
Dropped-file table
When a table or index is dropped while old MVCC versions still
reference it, vacuum must know not to follow the file ID into a
freed extent. Every engine keeps a separate map from dropped-file
ID to the MVCCID at which it was dropped; a vacuum job consults
the map before chasing a record into a missing file. CUBRID’s
vacuum_dropped_files_page is the structure.
Block / batch unit of work
A “block” or “batch” or “page” of log records is the unit a worker
consumes. The size is a tuning knob: larger blocks amortise
per-job overhead; smaller blocks parallelise better. PostgreSQL
uses heap pages as the unit; InnoDB uses undo-log batches; CUBRID
uses 31 log pages by default
(VACUUM_LOG_BLOCK_PAGES_DEFAULT).
Per-page sequential, across-page parallel
On a single target page, vacuum operations must be applied in LSN order to keep the page state consistent. Across pages, vacuum is embarrassingly parallel. The buffer manager’s page-fix is the natural synchronisation primitive: a single page can only have one writer at a time, so a worker that touches it serialises against any other worker that targets the same page.
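The ordering constraint can be made concrete with a small sketch that groups pending operations by target page and orders each group by LSN. It illustrates only the constraint, not CUBRID's mechanism (which relies on page fixes); the vacuum_op struct and the function name are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical pending cleanup operation: which page it touches and the LSN
// of the log record that produced it.
struct vacuum_op
{
  std::int64_t page_id;
  std::int64_t lsn;
};

// Group operations by target page and sort each group by LSN. Groups are
// independent of each other; within a group the order is fixed.
std::map<std::int64_t, std::vector<vacuum_op>>
group_by_page_in_lsn_order (const std::vector<vacuum_op> &ops)
{
  std::map<std::int64_t, std::vector<vacuum_op>> per_page;
  for (const vacuum_op &op : ops)
    {
      per_page[op.page_id].push_back (op);
    }
  for (auto &entry : per_page)
    {
      std::sort (entry.second.begin (), entry.second.end (),
                 [] (const vacuum_op &a, const vacuum_op &b) { return a.lsn < b.lsn; });
    }
  return per_page;
}
```

Each map entry could be handed to a different worker, while the operations inside one entry must run in the order given.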
Theory ↔ CUBRID mapping
| Theoretical concept | CUBRID name |
|---|---|
| Oldest-visible-MVCCID watermark | log_Gl.hdr.oldest_visible_mvccid (log_storage.hpp); per-TDES mvccinfo |
| Vacuum master | vacuum_master_task : public cubthread::entry_task (vacuum.c:813) |
| Vacuum worker | VACUUM_WORKER struct, max VACUUM_MAX_WORKER_COUNT = 50 |
| Worker state | VACUUM_WORKER_STATE { INACTIVE, PROCESS_LOG, EXECUTE } (vacuum.h) |
| Block of work | VACUUM_DATA_ENTRY { blockid, start_lsa, oldest_visible_mvccid, newest_mvccid } |
| Block size in log pages | VACUUM_LOG_BLOCK_PAGES_DEFAULT = 31 |
| Vacuum data file | vacuum_Data global with first_page/last_page cached |
| Per-page block list | VACUUM_DATA_PAGE { next_page, index_unvacuumed, index_free, data[] } |
| Block-status bit-pack | Top 3 bits of blockid: two status bits (STATUS_VACUUMED, IN_PROGRESS, AVAILABLE) plus one INTERRUPTED flag bit |
| Job cursor | vacuum_job_cursor class — tracks progress across blockid relocations |
| Heap-side target list | VACUUM_HEAP_OBJECT { vfid, oid } array per worker |
| Dropped-file table | vacuum_dropped_files_page + vacuum_dropped_file records (vacuum.c:580) |
| Per-block log link in WAL | LOG_VACUUM_INFO::prev_mvcc_op_log_lsa (log_record.hpp) |
| Log block boundary on the log header | log_header::vacuum_last_blockid and does_block_need_vacuum |
| Per-record dispatch | Reuses RV_fun[] with vacuum-side undo / mvcc-undo paths |
CUBRID’s Approach
The vacuum subsystem has four moving parts: vacuum data (the on-disk catalogue of work to do), the master task (the selector that picks the next block), the worker pool (the parallel executors that consume blocks), and the dropped-file table (the catalogue of files to skip). We walk them in that order.
Overall structure
flowchart LR
subgraph LOG["WAL log (cubrid-log-manager.md)"]
LR1["MVCC op record\nblock B-1"]
LR2["MVCC op record\nblock B"]
LR3["MVCC op record\nblock B+1"]
end
subgraph VD["vacuum_Data (vacuum data file)"]
VP1["VACUUM_DATA_PAGE 1\nentries..."]
VP2["VACUUM_DATA_PAGE 2\nentries..."]
VPn["..."]
end
subgraph M["Master (vacuum_master_task)"]
CUR["vacuum_job_cursor"]
SEL["select next block\nbelow watermark"]
end
subgraph W["Worker pool (≤ 50 VACUUM_WORKER)"]
W1["worker 1\nstate=PROCESS_LOG"]
W2["worker 2\nstate=EXECUTE"]
Wn["..."]
end
subgraph DF["Dropped files (vacuum_dropped_files_page)"]
DFP["vfid → mvccid map"]
end
subgraph TGT["Target heap and B+Tree pages"]
HP["heap page"]
BT["btree leaf"]
end
LOG -->|consume_buffer_log_blocks| VD
VD -->|cursor visit| M
M -->|dispatch block| W
W -->|read MVCC ops in block| LOG
W -->|consult before chase| DF
W -->|fix + clean| TGT
The figure encodes three loops:
- Producer loop: the WAL emits MVCC operation records; vacuum_consume_buffer_log_blocks periodically translates them into vacuum-data entries.
- Master loop: the master walks vacuum_Data looking for blocks below the oldest-visible watermark and dispatches them.
- Worker loop: workers fetch a block, walk its log records, and clean target pages.
Vacuum data — the catalogue of pending work
Each entry is one block of log to vacuum.
// VACUUM_DATA_ENTRY (src/query/vacuum.c)
struct vacuum_data_entry
{
  VACUUM_LOG_BLOCKID blockid;     /* blockid + flags packed in top bits */
  LOG_LSA start_lsa;              /* LSA of last MVCC op log record in block */
  MVCCID oldest_visible_mvccid;   /* threshold at the time the block was logged */
  MVCCID newest_mvccid;           /* newest MVCCID in this block */

  vacuum_data_entry () = default;
  vacuum_data_entry (const log_lsa &lsa, MVCCID oldest, MVCCID newest);
  vacuum_data_entry (const log_header &hdr);

  VACUUM_LOG_BLOCKID get_blockid () const;
  bool is_available () const;
  bool is_vacuumed () const;
  bool is_job_in_progress () const;
  bool was_interrupted () const;
  void set_vacuumed ();
  void set_job_in_progress ();
  void set_interrupted ();
};
The packing is worth spelling out. blockid is 64-bit. The top
two bits carry the status (AVAILABLE, IN_PROGRESS, VACUUMED,
leaving one combination unused) and the third bit from the top
carries the INTERRUPTED flag, so the flag mask covers the top
three bits. The remaining 61 bits carry the actual block id.
The macros encode this:
// Block status macros (src/query/vacuum.c)
#define VACUUM_DATA_ENTRY_FLAG_MASK 0xE000000000000000
#define VACUUM_DATA_ENTRY_BLOCKID_MASK 0x1FFFFFFFFFFFFFFF

#define VACUUM_BLOCK_STATUS_VACUUMED 0x8000000000000000
#define VACUUM_BLOCK_STATUS_IN_PROGRESS_VACUUM 0x4000000000000000
#define VACUUM_BLOCK_STATUS_AVAILABLE 0x0000000000000000

#define VACUUM_BLOCK_FLAG_INTERRUPTED 0x2000000000000000
The bit-pack saves the per-entry int status; bool interrupted;
that would otherwise pad to 8 bytes. Across millions of entries
the saving is real.
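To make the layout concrete, here is a small sketch of pack and unpack helpers built on the mask values quoted above. The helper names and STATUS_MASK are assumptions for illustration (the mask is derived here from the two status values); they are not the accessors defined in vacuum.c.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the macros quoted above.
constexpr std::uint64_t FLAG_MASK = 0xE000000000000000ULL;
constexpr std::uint64_t BLOCKID_MASK = 0x1FFFFFFFFFFFFFFFULL;
constexpr std::uint64_t STATUS_VACUUMED = 0x8000000000000000ULL;
constexpr std::uint64_t STATUS_IN_PROGRESS = 0x4000000000000000ULL;
constexpr std::uint64_t STATUS_AVAILABLE = 0x0000000000000000ULL;
constexpr std::uint64_t FLAG_INTERRUPTED = 0x2000000000000000ULL;
// Assumption: the status occupies exactly the two bits used by the status values.
constexpr std::uint64_t STATUS_MASK = STATUS_VACUUMED | STATUS_IN_PROGRESS;

static_assert ((FLAG_MASK & BLOCKID_MASK) == 0, "flag and id bits must not overlap");
static_assert ((FLAG_MASK | BLOCKID_MASK) == ~std::uint64_t {0}, "masks must cover all 64 bits");
static_assert ((STATUS_MASK | FLAG_INTERRUPTED) == FLAG_MASK, "two status bits plus one flag bit");

constexpr std::uint64_t get_blockid (std::uint64_t packed) { return packed & BLOCKID_MASK; }
constexpr std::uint64_t get_status (std::uint64_t packed) { return packed & STATUS_MASK; }
constexpr bool is_interrupted (std::uint64_t packed) { return (packed & FLAG_INTERRUPTED) != 0; }
constexpr std::uint64_t set_interrupted (std::uint64_t packed) { return packed | FLAG_INTERRUPTED; }

// Replace the status bits while leaving the id and the INTERRUPTED flag untouched.
inline std::uint64_t
set_status (std::uint64_t packed, std::uint64_t status)
{
  assert ((status & ~STATUS_MASK) == 0);
  return (packed & ~STATUS_MASK) | status;
}
```

The static assertions check the arithmetic directly: the two masks partition the 64 bits, and the flag mask is exactly two status bits plus the INTERRUPTED bit.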
The blocks live in pages of the vacuum data file:
// VACUUM_DATA_PAGE (src/query/vacuum.c)
struct vacuum_data_page
{
  VPID next_page;               /* Linked list of pages */
  INT16 index_unvacuumed;       /* First not-yet-vacuumed index in data[] */
  INT16 index_free;             /* First free index in data[] */
  VACUUM_DATA_ENTRY data[1];    /* Variable-size array */

  bool is_empty () const;
  bool is_index_valid (INT16 index) const;
  INT16 get_index_of_blockid (VACUUM_LOG_BLOCKID blockid) const;
  VACUUM_LOG_BLOCKID get_first_blockid () const;
};
The index_unvacuumed / index_free cursors mean that vacuumed
entries at the head of a page can be cleared without reshuffling
the live entries; growth happens at the tail. When index_unvacuumed
catches up with index_free, the page is empty and unlinked.
The vacuum_Data global keeps the first and last pages of
this list permanently fixed in the buffer pool — vacuum reads them
on every cycle, so the per-fix overhead would be prohibitive. The
vacuum_fix_data_page macro short-circuits to the cached pages
when the requested VPID matches.
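The two-cursor scheme on a vacuum data page can be sketched as follows. This is a simplified model with a hypothetical fixed capacity and a per-entry boolean instead of packed status bits; it is meant to show how the head cursor advances, not to mirror the real page layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::int16_t PAGE_CAPACITY = 4;   // hypothetical; real pages hold as many entries as fit

struct entry
{
  std::int64_t blockid;
  bool vacuumed;
};

struct data_page
{
  std::int16_t index_unvacuumed = 0;   // first entry that still needs vacuuming
  std::int16_t index_free = 0;         // first unused slot; new entries land here
  std::array<entry, PAGE_CAPACITY> data {};

  bool is_empty () const { return index_unvacuumed == index_free; }

  // Growth happens at the tail only.
  bool append (std::int64_t blockid)
  {
    if (index_free == PAGE_CAPACITY)
      {
        return false;
      }
    data[index_free++] = entry { blockid, false };
    return true;
  }

  // Mark one entry done, then slide the head cursor over any finished prefix.
  void mark_vacuumed (std::int16_t index)
  {
    assert (index >= index_unvacuumed && index < index_free);
    data[index].vacuumed = true;
    while (index_unvacuumed < index_free && data[index_unvacuumed].vacuumed)
      {
        ++index_unvacuumed;
      }
  }
};
```

Entries finished out of order simply wait until the entries before them finish; no live entry is ever moved.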
Vacuum data construction — log → blocks
vacuum_consume_buffer_log_blocks (vacuum.c:5096) is the
bridge from log to vacuum data. It runs whenever the log has
accumulated unprocessed MVCC operations:
- Read the log forward from the last consumed LSA. (Note the direction contrast: this construction phase scans the WAL forward; the per-block worker walk in §“Worker” walks the block’s MVCC chain backward via prev_mvcc_op_log_lsa.)
- For each LOG_MVCC_* record encountered, find or create the block this record falls into (block id = pageid / vacuum_Data.log_block_npages, the runtime field initialised from VACUUM_LOG_BLOCK_PAGES_DEFAULT but overridable via PRM_ID_VACUUM_LOG_BLOCK_PAGES, so the divisor is the live value, not the macro).
- Update the block’s start_lsa (last MVCC op seen), newest_mvccid, and oldest_visible_mvccid (the watermark captured at the time the record was logged, not now).
- When a block is filled (no more MVCC ops will fall into it because the log has moved past it), write its entry to vacuum data with status AVAILABLE.
The captured oldest_visible_mvccid is the watermark from the
time of logging, not the time of consumption. This matters: a
block can be dispatched as soon as the current watermark exceeds
the block’s newest_mvccid, regardless of how the watermark has
moved since the block was logged.
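The construction steps can be sketched as a fold over MVCC log records in log order. All types here are simplified stand-ins, and keeping the smallest watermark seen per block is an assumption of the sketch, not a statement about vacuum_consume_buffer_log_blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using MVCCID = std::uint64_t;

struct log_lsa { std::int64_t pageid; std::int16_t offset; };

// Hypothetical view of one MVCC log record.
struct mvcc_log_record
{
  log_lsa lsa;
  MVCCID mvccid;
  MVCCID oldest_visible_at_log_time;
};

struct block_entry
{
  std::int64_t blockid;
  log_lsa start_lsa;              // last MVCC op seen in the block
  MVCCID oldest_visible_mvccid;   // watermark captured at logging time
  MVCCID newest_mvccid;
};

// Records must arrive in log order; the result is ordered by block id.
std::vector<block_entry>
build_block_entries (const std::vector<mvcc_log_record> &records, std::int64_t log_block_npages)
{
  assert (log_block_npages > 0);
  std::map<std::int64_t, block_entry> blocks;
  for (const mvcc_log_record &rec : records)
    {
      const std::int64_t blockid = rec.lsa.pageid / log_block_npages;
      auto it = blocks.find (blockid);
      if (it == blocks.end ())
        {
          blocks.emplace (blockid, block_entry { blockid, rec.lsa, rec.oldest_visible_at_log_time, rec.mvccid });
          continue;
        }
      block_entry &e = it->second;
      e.start_lsa = rec.lsa;
      e.oldest_visible_mvccid = std::min (e.oldest_visible_mvccid, rec.oldest_visible_at_log_time);
      e.newest_mvccid = std::max (e.newest_mvccid, rec.mvccid);
    }
  std::vector<block_entry> result;
  for (const auto &kv : blocks)
    {
      result.push_back (kv.second);
    }
  return result;
}
```

With a block size of 31 pages, records on log pages 0 and 30 fall into block 0 and a record on page 31 opens block 1.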
Master task — picking blocks, dispatching jobs
The master is a cubthread::entry_task subclass:
// vacuum_master_task (src/query/vacuum.c:813)
class vacuum_master_task : public cubthread::entry_task
{
public:
  void execute (cubthread::entry &thread_ref) override;

private:
  bool check_shutdown () const;
  bool is_task_queue_full () const;
  bool should_interrupt_iteration () const;
  bool is_cursor_entry_ready_to_vacuum () const;
  bool is_cursor_entry_available () const;
  void start_job_on_cursor_entry ();
  bool should_force_data_update () const;
  void increase_outstanding_job ();
  void decrease_outstanding_job (int count);

  vacuum_job_cursor m_cursor;   /* Where in vacuum data we are */
  // ... condensed ...
};
vacuum_master_task::execute is the master loop. Each tick:
- Check shutdown / queue-full / interrupt conditions.
- Advance the cursor to the next AVAILABLE entry whose newest_mvccid is less than the current oldest-visible watermark.
- Atomically transition the entry to IN_PROGRESS.
- Increment the outstanding-job counter.
- Dispatch the entry to a worker via the thread pool.
The cursor is a separate class (vacuum_job_cursor,
vacuum.c:277) because vacuum data pages can be added or removed
between ticks (a fully-vacuumed page is freed; new blocks always
land on the last page). The cursor’s readjust_to_vacuum_data_changes
relocates the cursor’s blockid → page mapping after such changes.
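One master tick can be sketched as a scan that claims ready entries with a compare-and-swap. The job_entry type, the deque, and the max_jobs limit are hypothetical; the sketch also skips entries that are not yet ready rather than stopping at the first one, which is a choice made here and not confirmed against the CUBRID source.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

using MVCCID = std::uint64_t;

enum class block_status : std::uint8_t { AVAILABLE, IN_PROGRESS, VACUUMED };

struct job_entry
{
  std::int64_t blockid;
  MVCCID newest_mvccid;
  std::atomic<block_status> status { block_status::AVAILABLE };

  job_entry (std::int64_t id, MVCCID newest) : blockid (id), newest_mvccid (newest) {}
};

// Returns the block ids claimed in this tick, in visiting order.
std::vector<std::int64_t>
master_tick (std::deque<job_entry> &entries, MVCCID oldest_visible, std::size_t max_jobs)
{
  std::vector<std::int64_t> dispatched;
  for (job_entry &e : entries)
    {
      if (dispatched.size () == max_jobs)
        {
          break;   // task queue is full
        }
      if (e.newest_mvccid >= oldest_visible)
        {
          continue;   // some snapshot may still see versions from this block
        }
      block_status expected = block_status::AVAILABLE;
      if (e.status.compare_exchange_strong (expected, block_status::IN_PROGRESS))
        {
          dispatched.push_back (e.blockid);
        }
    }
  return dispatched;
}
```

Because the claim is a compare-and-swap from AVAILABLE, running the tick twice never dispatches the same block twice.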
Worker — per-block log replay and target cleanup
vacuum_process_log_block (vacuum.c:3251) is the worker entry
point. Given a block, it:
- Sets the worker’s state to PROCESS_LOG.
- Walks the block’s log records backward via LOG_VACUUM_INFO::prev_mvcc_op_log_lsa chains. The chain exists because the log manager cooperates with the vacuum subsystem: every MVCC record carries a back-pointer to the previous MVCC record (cubrid-log-manager.md §“MVCC-flavoured records”).
- For each record, decompresses the undo image (using the worker’s per-thread log_zip_p).
- Builds a VACUUM_HEAP_OBJECT (vfid + oid) for each candidate to clean.
- Switches to state EXECUTE. Fixes target pages, removes dead versions, compacts B+Tree OID lists.
- On success, sets the block’s status to VACUUMED. On failure (interrupted, page latch contention, error), sets INTERRUPTED so the master will re-dispatch.
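The backward walk in the second step can be sketched with a toy log keyed by LSA. The single-integer LSA, the mvcc_record struct, and the rule for detecting the block boundary are simplifications assumed for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using lsa_t = std::int64_t;        // simplification: one integer per log address
constexpr lsa_t NULL_LSA = -1;

// Hypothetical MVCC log record with the back-pointer described above.
struct mvcc_record
{
  lsa_t prev_mvcc_op_log_lsa;   // previous MVCC record in the log, or NULL_LSA
  std::int64_t pageid;          // log page the record lives on
  int oid;                      // stand-in for the object to clean
};

// Start at the block's last MVCC record and follow back-pointers until the
// chain leaves the block or ends.
std::vector<int>
collect_block_targets (const std::map<lsa_t, mvcc_record> &log, lsa_t start_lsa,
                       std::int64_t blockid, std::int64_t log_block_npages)
{
  std::vector<int> targets;
  for (lsa_t lsa = start_lsa; lsa != NULL_LSA;)
    {
      auto it = log.find (lsa);
      assert (it != log.end ());
      const mvcc_record &rec = it->second;
      if (rec.pageid / log_block_npages != blockid)
        {
          break;   // the previous record belongs to an earlier block
        }
      targets.push_back (rec.oid);
      lsa = rec.prev_mvcc_op_log_lsa;
    }
  return targets;
}
```

Only MVCC records are visited; every other record type in the block is skipped for free because the chain never points at it.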
The worker maintains buffers reused across jobs to avoid allocation:
// VACUUM_WORKER (src/query/vacuum.h)
struct vacuum_worker
{
  VACUUM_WORKER_STATE state;          /* INACTIVE / PROCESS_LOG / EXECUTE */
  INT32 drop_files_version;           /* Last seen dropped-files version */

  struct log_zip *log_zip_p;          /* Decompression context */

  VACUUM_HEAP_OBJECT *heap_objects;   /* Targets to clean this job */
  int heap_objects_capacity;
  int n_heap_objects;

  char *undo_data_buffer;
  int undo_data_buffer_capacity;

  int private_lru_index;              /* Per-worker LRU list in page buffer */

  char *prefetch_log_buffer;          /* Prefetched log pages */
  LOG_PAGEID prefetch_first_pageid;
  LOG_PAGEID prefetch_last_pageid;

  bool allocated_resources;
  int idx;                            /* -1 for master; sequence for workers */
};
The private LRU index is worth noting. CUBRID’s buffer manager supports per-thread LRU lists (cubrid-page-buffer-manager.md §“Quota and private lists”); vacuum workers each get their own list so a vacuum scan doesn’t pollute the global hot list. The prefetch buffer is a stash of upcoming log pages so the worker can chain through MVCC records without per-page fault latency.
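The prefetch fields suggest a simple window lookup: a page id inside [prefetch_first_pageid, prefetch_last_pageid] is served from the buffer, anything else needs a regular log read. The sketch below models that idea with a hypothetical prefetch_window type, a tiny page size, and a stand-in loader; it does not reproduce the worker's real prefetch code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using LOG_PAGEID = std::int64_t;
constexpr std::size_t PAGE_BYTES = 16;   // hypothetical, tiny on purpose

struct prefetch_window
{
  std::vector<char> buffer;
  LOG_PAGEID first_pageid = -1;
  LOG_PAGEID last_pageid = -1;

  // Stand-in for reading pages [first, last] from the log: each page is
  // filled with a marker byte derived from its page id.
  void load (LOG_PAGEID first, LOG_PAGEID last)
  {
    assert (first >= 0 && first <= last);
    buffer.clear ();
    for (LOG_PAGEID p = first; p <= last; ++p)
      {
        buffer.insert (buffer.end (), PAGE_BYTES, static_cast<char> (p % 100));
      }
    first_pageid = first;
    last_pageid = last;
  }

  // Returns the cached page, or nullptr when the page is outside the window.
  const char *find (LOG_PAGEID pageid) const
  {
    if (first_pageid < 0 || pageid < first_pageid || pageid > last_pageid)
      {
        return nullptr;
      }
    return buffer.data () + static_cast<std::size_t> (pageid - first_pageid) * PAGE_BYTES;
  }
};
```

A nullptr result is the signal to fall back to an ordinary log page read.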
Dropped files — the skip list
When a class is dropped while old MVCC versions still reference its
file, the vacuum worker must not follow the file id into freed
storage. The vacuum_dropped_file table maps vfid to the MVCCID
at which the file was dropped:
// vacuum_dropped_file (src/query/vacuum.c:580)
struct vacuum_dropped_file
{
  VFID vfid;
  MVCCID mvccid;
};

struct vacuum_dropped_files_page
{
  VPID next_page;
  INT16 n_dropped_files;
  vacuum_dropped_file dropped_files[1];   /* variable-size */
};
A worker calls vacuum_is_file_dropped (vacuum.c:6587) before
chasing a vfid into a heap; if the answer is yes and the version
predates the drop MVCCID, the version is implicitly dead and the
worker skips it.
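The check described above reduces to a map lookup plus an MVCCID comparison. This sketch uses a hypothetical dropped_files class and a (volid, fileid) pair as the key; it models the rule as stated in the text, not the paged on-disk structure.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using MVCCID = std::uint64_t;
using vfid_key = std::pair<std::int16_t, std::int32_t>;   // hypothetical (volid, fileid) key

class dropped_files
{
public:
  void add (vfid_key vfid, MVCCID drop_mvccid)
  {
    m_dropped[vfid] = drop_mvccid;
  }

  // True when the file was dropped and the record predates the drop, so the
  // worker must skip the record instead of fixing pages of the file.
  bool is_file_dropped (vfid_key vfid, MVCCID record_mvccid) const
  {
    auto it = m_dropped.find (vfid);
    return it != m_dropped.end () && record_mvccid < it->second;
  }

private:
  std::map<vfid_key, MVCCID> m_dropped;
};
```

Comparing against the drop MVCCID, rather than testing membership alone, keeps records for a later file that reuses the same id from being skipped.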
The dropped-files page list is updated by
vacuum_log_add_dropped_file (vacuum.c:6121). The selector
VACUUM_LOG_ADD_DROPPED_FILE_POSTPONE vs.
VACUUM_LOG_ADD_DROPPED_FILE_UNDO (these are plain bool
values passed as the pospone_or_undo argument, not OR-able
flag bits) distinguishes between “this file was dropped at
commit; vacuum it on commit-side postpone replay” and “this
file was created and then aborted; vacuum it on undo replay”.
Recovery integration
vacuum_data_load_and_recover (vacuum.c:4183) is the post-restart
entry point. After log_recovery finishes the three ARIES passes,
this function:
- Reloads vacuum_Data from its on-disk pages.
- Walks any blocks that were IN_PROGRESS at crash and resets them to AVAILABLE (with INTERRUPTED flag set), so the master picks them up again.
- Calls vacuum_recover_lost_block_data (vacuum.c:5465) to patch any blocks that were in flight in the WAL but not yet recorded in vacuum data. This can happen if the crash occurred between log emission of an MVCC record and the next vacuum_consume_buffer_log_blocks tick.
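The second recovery step can be sketched directly on packed block ids, using the mask values quoted earlier in this document. The function name and the vector of packed ids are assumptions for illustration, as is STATUS_MASK, which is derived here from the two status values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values copied from the block status macros quoted earlier.
constexpr std::uint64_t STATUS_VACUUMED = 0x8000000000000000ULL;
constexpr std::uint64_t STATUS_IN_PROGRESS = 0x4000000000000000ULL;
constexpr std::uint64_t STATUS_AVAILABLE = 0x0000000000000000ULL;
constexpr std::uint64_t FLAG_INTERRUPTED = 0x2000000000000000ULL;
// Assumption: the status occupies exactly these two bits.
constexpr std::uint64_t STATUS_MASK = STATUS_VACUUMED | STATUS_IN_PROGRESS;

// After a restart no worker is running, so every IN_PROGRESS block is stale:
// make it AVAILABLE again and remember that a previous attempt was cut short.
std::size_t
reset_in_progress_blocks (std::vector<std::uint64_t> &packed_blockids)
{
  std::size_t reset_count = 0;
  for (std::uint64_t &packed : packed_blockids)
    {
      if ((packed & STATUS_MASK) == STATUS_IN_PROGRESS)
        {
          packed = (packed & ~STATUS_MASK) | STATUS_AVAILABLE | FLAG_INTERRUPTED;
          ++reset_count;
        }
    }
  return reset_count;
}
```

Blocks that were already VACUUMED or still AVAILABLE pass through unchanged.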
One block, end to end
sequenceDiagram
participant LM as log_manager
participant CB as vacuum_consume_buffer_log_blocks
participant VD as vacuum_Data file
participant M as vacuum_master_task
participant W as vacuum_worker
participant PG as page buffer / heap / btree
LM->>LM: append LOG_MVCC_* record
Note over LM: every record's prev_mvcc_op_log_lsa\nlinks to previous MVCC record
CB->>LM: read since last consumed LSA
CB->>VD: append AVAILABLE entry for filled block
loop master tick
M->>VD: cursor next AVAILABLE entry below watermark
M->>VD: CAS status → IN_PROGRESS
M->>W: dispatch (block)
W->>LM: walk MVCC chain in block
W->>W: build VACUUM_HEAP_OBJECT list
W->>PG: fix + remove dead versions / compact OID lists
alt success
W->>VD: CAS status → VACUUMED
else interrupt / error
W->>VD: CAS flag INTERRUPTED, status → AVAILABLE
end
end
Source Walkthrough
Anchor on symbol names, not line numbers.
Headers and types
- vacuum_worker (vacuum.h): per-worker bookkeeping.
- vacuum_worker_state enum (vacuum.h): INACTIVE / PROCESS_LOG / EXECUTE.
- VACUUM_HEAP_OBJECT (vacuum.h): heap-side target.
- VACUUM_LOG_BLOCK_PAGES_DEFAULT (vacuum.h): block size.
- vacuum_data_entry (vacuum.c): one block of pending work.
- vacuum_data_page (vacuum.c): page of entries.
- vacuum_data (vacuum.c): global state.
- vacuum_dropped_file / vacuum_dropped_files_page (vacuum.c): skip list.
- vacuum_master_task (vacuum.c): master loop class.
- vacuum_job_cursor (vacuum.c): relocation-tolerant cursor.
Lifecycle
- vacuum_initialize (vacuum.c): boot-time init.
- vacuum_finalize (vacuum.c): shutdown.
- vacuum_data_load_and_recover (vacuum.c): post-recovery reload.
- vacuum_recover_lost_block_data (vacuum.c): patch in-flight blocks the consumer didn’t see before crash.
Block lifecycle
- vacuum_consume_buffer_log_blocks (vacuum.c): log → vacuum data.
- vacuum_master_task::execute (vacuum.c): master tick.
- vacuum_master_task::start_job_on_cursor_entry (vacuum.c): CAS to IN_PROGRESS + dispatch.
- vacuum_process_log_block (vacuum.c): worker entry.
- vacuum_worker_allocate_resources (vacuum.c): first-touch allocation of log_zip_p, buffers.
Dropped files
- vacuum_log_add_dropped_file (vacuum.c): register.
- vacuum_is_file_dropped (vacuum.c): query.
Worker-state inline accessors (in vacuum.h)
- vacuum_get_vacuum_worker, vacuum_is_thread_vacuum, vacuum_is_thread_vacuum_worker, vacuum_is_thread_vacuum_master, vacuum_get_worker_state, vacuum_set_worker_state, vacuum_worker_state_is_*. All are __attribute__ ((ALWAYS_INLINE)) because they sit on the worker hot path.
Position hints as of 2026-04-30
| Symbol | File | Line |
|---|---|---|
| VACUUM_WORKER (struct) | vacuum.h | 106 |
| VACUUM_WORKER_STATE enum | vacuum.h | 85 |
| VACUUM_LOG_BLOCK_PAGES_DEFAULT | vacuum.h | 82 |
| VACUUM_MAX_WORKER_COUNT | vacuum.h | 132 |
| vacuum_data_entry (struct) | vacuum.c | 104 |
| vacuum_data_page (struct) | vacuum.c | 194 |
| vacuum_data (struct) | vacuum.c | 350 |
| vacuum_dropped_file (struct) | vacuum.c | 580 |
| vacuum_dropped_files_page (struct) | vacuum.c | 588 |
| vacuum_job_cursor (class) | vacuum.c | 277 |
| vacuum_master_task (class) | vacuum.c | 813 |
| vacuum_master_task::execute | vacuum.c | 3002 |
| vacuum_initialize | vacuum.c | 1180 |
| vacuum_finalize | vacuum.c | 1416 |
| vacuum_process_log_block | vacuum.c | 3251 |
| vacuum_worker_allocate_resources | vacuum.c | 3620 |
| vacuum_finalize_worker | vacuum.c | 3689 |
| vacuum_data_load_and_recover | vacuum.c | 4183 |
| vacuum_consume_buffer_log_blocks | vacuum.c | 5096 |
| vacuum_recover_lost_block_data | vacuum.c | 5465 |
| vacuum_log_add_dropped_file | vacuum.c | 6121 |
| vacuum_is_file_dropped | vacuum.c | 6587 |
Source verification (as of 2026-04-30)
Verified facts
- Vacuum sources live under src/query/, not src/transaction/. Verified by find: src/query/vacuum.{c,h} exist, no src/transaction/vacuum.*. Implication: references: in the meta and frontmatter of this doc were corrected at draft time (the original skeleton placed them under src/transaction/ by analogy with cubrid-mvcc).
- Block size is 31 log pages by default, encoded as VACUUM_LOG_BLOCK_PAGES_DEFAULT. Verified at vacuum.h:82. The corresponding runtime parameter is prm_get_integer_value (PRM_ID_VACUUM_LOG_BLOCK_PAGES) (covered in the vacuum_initialize body); 31 is the default but it can be overridden at boot.
- Worker pool is capped at 50. Verified by VACUUM_MAX_WORKER_COUNT = 50 (vacuum.h:132). The actual count is configurable via a server parameter; the macro is the hard upper bound.
- Block status and the INTERRUPTED flag are bit-packed into the top 3 bits of the 64-bit blockid: two status bits plus one flag bit. Verified at vacuum.c:135-186. Available status is 0x0000000000000000 (top bits zero); vacuumed is 0x8000000000000000; in-progress is 0x4000000000000000; the INTERRUPTED flag is 0x2000000000000000. The BLOCKID_MASK 0x1FFFFFFFFFFFFFFF extracts the actual id: 61 bits, about 2.3 × 10^18 blocks before exhaustion.
- vacuum_Data.first_page and vacuum_Data.last_page are permanently fixed in the buffer pool. Verified at vacuum.c:223 (vacuum_fix_data_page macro short-circuits to the cached pages). Implication: page-buffer eviction never touches them, so master ticks pay no fix-overhead.
- Workers maintain a private LRU list in the page buffer. Verified at vacuum.h:122 (VACUUM_WORKER::private_lru_index). This prevents vacuum scans from polluting the global hot list. (Cross-doc: cubrid-page-buffer-manager.md describes the per-thread LRU mechanism.)
- Workers prefetch upcoming log pages. Verified at vacuum.h:124-126 (prefetch_log_buffer, prefetch_first_pageid, prefetch_last_pageid). The buffer is per-worker, sized at first allocation in vacuum_worker_allocate_resources.
- The MVCC log chain is what drives backward walking inside a block. Verified by reading vacuum_process_log_block and the LOG_VACUUM_INFO::prev_mvcc_op_log_lsa field (cubrid-log-manager.md §“MVCC-flavoured records”). The chain is the only reason vacuum is faster than a full forward log walk.
- The dropped-files table is paged like vacuum data, not inline in vacuum data. Verified at vacuum.c:588 (vacuum_dropped_files_page). Implication: dropped files survive vacuum data page churn, and vacuum cleanup can run even after vacuum data has been compacted.
- Worker recovery on crash: in-progress blocks are reset to AVAILABLE with INTERRUPTED. Verified by reading vacuum_data_load_and_recover and the was_interrupted / set_interrupted accessors on vacuum_data_entry. The flag signals to the master that this block was already partially done, so the worker can skip records the previous attempt marked as cleaned (target pages already have advanced LSA).
Open questions
- Master tick interval and adaptive throttling. The master loop’s wake interval and any backpressure mechanism (slow down when the system is busy) were not located. Investigation path: read the cubthread daemon registration of vacuum_master_task plus the should_interrupt_iteration / is_task_queue_full methods.
- Watermark advancement triggers. When does log_Gl.hdr.oldest_visible_mvccid get recomputed? Per transaction commit / abort? Periodically? Both? Investigation path: grep for writes to oldest_visible_mvccid; cross-ref with logtb_complete_mvcc (cubrid-transaction.md).
- Heap-vs-btree dispatch. A VACUUM_HEAP_OBJECT is just (vfid, oid). How does the worker decide that a vfid points to a heap rather than a B+Tree, and does it call the right per-subsystem cleanup function? Investigation path: read the body of the vacuum-execute path (around the worker’s state transition to EXECUTE).
- Interaction with online schema changes. If a B+Tree is dropped while vacuum is mid-job on a block that touches it, the dropped-files table prevents chasing the file, but what about the in-flight VACUUM_HEAP_OBJECT list? Are entries filtered against dropped-files, or does the worker have to handle ER_FILE_DROPPED at the page-fix layer? Investigation path: look for vacuum_is_file_dropped callers inside the execute path.
- Page-buffer private LRU semantics under contention. If a worker’s private LRU is full and another worker needs the same page, what happens? Hand off, or share via global LRU? Investigation path: cubrid-page-buffer-manager.md and pgbuf_*_private_lru_* paths.
- Recovery of vacuum_recover_lost_block_data. What exactly are “lost blocks”: blocks where the vacuum consumer was running at crash, or blocks where the log emitted MVCC records but the consumer never ran? The function name suggests both. Investigation path: read its body and look for the ranges it patches over.
Beyond CUBRID — Comparative Designs & Research Frontiers
Pointers, not analysis. Each bullet is a starting handle for a follow-up doc.
- PostgreSQL VACUUM: heap and index passes scan data pages, not the WAL. The dead-tuple bitmap is computed per-relation and consulted during cleanup. Cost: full scan of the relation; benefit: no log-side dependency. CUBRID’s log-driven design is closer to InnoDB’s.
- InnoDB purge thread: walks the undo log (rollback segments) backward, removing dead versions when the MVCCID watermark permits. CUBRID’s log-driven walk is structurally similar but uses redo log, not undo log, because CUBRID logs MVCC undo inside the same LOG_MVCC_* records.
- Hekaton garbage collection (Larson et al., VLDB 2011): epoch-based, lock-free, runs on a per-thread basis after the oldest active transaction’s epoch has retired. CUBRID’s master/worker model is the disk-resident analogue of the same idea.
- Aurora’s MVCC at storage layer: versions are reclaimed by the storage engine, not the compute node, eliminating vacuum-on-compute. CUBRID is process-local; this is more a structural contrast than a feature gap.
- Self-tuning autovacuum (PostgreSQL 16+): dynamic block size and worker count based on dirty-page rate. CUBRID’s VACUUM_LOG_BLOCK_PAGES_DEFAULT is static; an adaptive variant would be a useful CBRD ticket follow-up.
- VACUUM-as-replication-source (Debezium-style): vacuuming emits a log stream of “this row went away”. CUBRID’s supplemental log records (cubrid-log-manager.md §“Supplemental records”) could be repurposed for this; the cubrid-cdc.md doc is the natural follow-up.
Sources
Raw analyses (raw/code-analysis/cubrid/storage/vacuum/)
- vacuum.pdf
- vacuum.pptx
Sibling docs
- knowledge/code-analysis/cubrid/cubrid-mvcc.md: visibility model and oldest_visible_mvccid watermark.
- knowledge/code-analysis/cubrid/cubrid-log-manager.md: LOG_MVCC_* records and LOG_VACUUM_INFO::prev_mvcc_op_log_lsa chain.
- knowledge/code-analysis/cubrid/cubrid-heap-manager.md: per-record vacuum on the heap side.
- knowledge/code-analysis/cubrid/cubrid-page-buffer-manager.md: per-thread LRU lists vacuum workers use.
- knowledge/code-analysis/cubrid/cubrid-recovery-manager.md: vacuum_data_load_and_recover runs after the three-pass restart.
Textbook chapters (under knowledge/research/dbms-general/)
- Database Internals (Petrov), Ch. 5 §“MVCC”, §“Garbage collection in MVCC”.
CUBRID source (/data/hgryoo/references/cubrid/)
- src/query/vacuum.{c,h}
- src/transaction/mvcc.{c,h}