
CUBRID Vacuum — Code-Level Deep Dive

Where this document fits: The high-level analysis cubrid-vacuum.md covers design intent and theoretical background. This document traces every branch and field at the code level. Each chapter is self-contained, but reading in order follows the full lifecycle of a single vacuum log block — from append-time tracking to heap and index cleanup — inside the kernel.

Contents:

| Ch | Title |
|---|---|
| 1 | Data Structure Map |
| 2 | Initialization and Memory Management |
| 3 | Block Birth in the Log Append Path |
| 4 | Block Registration into Vacuum Data |
| 5 | Eligibility and the Oldest Visible Watermark |
| 6 | Master Dispatch and the Job Cursor |
| 7 | Worker Log Pass and Per Record Dispatch |
| 8 | Heap Execution |
| 9 | Block Completion and Log Reclamation |
| 10 | The Dropped Files Ledger |
| 11 | Crash Recovery and Standalone Paths |

Chapter 1: Data Structure Map

The high-level companion (cubrid-vacuum.md, “Overall structure” and “Vacuum data — the catalogue of pending work”) names the four moving parts: log producers, vacuum data, the master, the worker pool. This chapter is the field dictionary underneath: which structs represent a vacuum block at each stage of its life, and how they point at each other. Later chapters trace operations over these structures without re-explaining fields.

A block has four representations, in chronological order:

  1. an accumulator inside log_header (log_Gl.hdr), updated while transactions append MVCC log records (Ch. 3);
  2. a vacuum_data_entry in flight inside the lock-free vacuum_Block_data_buffer (Ch. 3 → Ch. 4);
  3. a persisted vacuum_data_entry inside a vacuum_data_page, tracked AVAILABLE → IN_PROGRESS → VACUUMED (Ch. 4–6, 9);
  4. a bare VACUUM_LOG_BLOCKID with status flags inside vacuum_Finished_job_queue (Ch. 9).

```mermaid
flowchart LR
  subgraph logside["log side"]
    HDR["log_Gl.hdr accumulator"]
    VI["log_vacuum_info per MVCC record"]
  end
  subgraph queues["lock-free queues"]
    BB["vacuum_Block_data_buffer<br/>1024 x vacuum_data_entry"]
    FQ["vacuum_Finished_job_queue<br/>2048 x VACUUM_LOG_BLOCKID"]
  end
  subgraph vdata["vacuum data file"]
    VD["vacuum_Data global"]
    VDP["vacuum_data_page chain<br/>entry window per page"]
  end
  subgraph exec["execution side"]
    W["vacuum_Workers[]"]
    DF["dropped-files page chain"]
  end
  HDR -- "block boundary" --> BB
  VI -. "back-chain read by worker" .- W
  BB -- "master consumes" --> VDP
  VD --- VDP
  VDP -- "job dispatch" --> W
  W -- "is_file_dropped" --> DF
  W -- "done / interrupted" --> FQ
  FQ -- "master marks finished" --> VDP
```

Figure 1-1 — The four representations of a block. The two queues are the only contact points between transaction threads, the master, and workers.

While transactions append log records, the current (still filling) block exists only as five fields of log_header (log_storage.hpp):

| Field | Role | Why it exists |
|---|---|---|
| does_block_need_vacuum | True once an MVCC undo/undoredo record landed in the current block | The “block is dirty” bit; a block with no MVCC ops produces no queue entry |
| mvcc_op_log_lsa | LSA of the last MVCC op record so far | Becomes vacuum_data_entry::start_lsa, the worker’s chain-walk entry point (Ch. 7) |
| oldest_visible_mvccid | Global oldest-visible snapshot taken at the block’s first MVCC op | Becomes the entry’s eligibility key (Ch. 5) |
| newest_block_mvccid | Max MVCCID among the block’s ops | Block becomes eligible only once the oldest-visible watermark has advanced past this value, checked by vacuum_master_task::is_cursor_entry_ready_to_vacuum |
| vacuum_last_blockid | Persisted high-water mark of consumed blocks | Written by SA-mode vacuum_sa_reflect_last_blockid before archive purge; recovery takes MAX with vacuum data (Ch. 11) |

The first four are maintained by one function on the prior-LSA path, under prior_lsa_mutex:

```cpp
// prior_update_header_mvcc_info -- src/transaction/log_append.cpp
static void
prior_update_header_mvcc_info (const LOG_LSA &record_lsa, MVCCID mvccid)
{
  if (!log_Gl.hdr.does_block_need_vacuum)
    {
      // first mvcc record for this block
      log_Gl.hdr.oldest_visible_mvccid = log_Gl.mvcc_table.get_global_oldest_visible ();
      log_Gl.hdr.newest_block_mvccid = mvccid;
    }
  else
    {
      // ... condensed: sanity asserts ...
      assert (vacuum_get_log_blockid (log_Gl.hdr.mvcc_op_log_lsa.pageid)
              == vacuum_get_log_blockid (record_lsa.pageid)); /* <- both records in same block */
      if (log_Gl.hdr.newest_block_mvccid < mvccid)
        {
          log_Gl.hdr.newest_block_mvccid = mvccid;
        }
    }
  log_Gl.hdr.mvcc_op_log_lsa = record_lsa;
  log_Gl.hdr.does_block_need_vacuum = true;
}
```

Invariant 1-A — single-block accumulator. All four MVCC accumulator fields describe exactly one block: the one containing mvcc_op_log_lsa. Enforcement: post-restart (LOG_ISRESTARTED), while the accumulator is dirty (does_block_need_vacuum), prior_lsa_next_record_internal checks on every record append — any type, before the MVCC-type branch — whether the newly reserved LSA falls in a different block than mvcc_op_log_lsa; if so it flushes the accumulator via vacuum_produce_log_block_data first, all under prior_lsa_mutex. What breaks: one entry would aggregate two blocks’ MVCCIDs — oldest_visible_mvccid could be too new (vacuum removes a still-visible version) or start_lsa could point outside the block (worker scans the wrong range).
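
The boundary flush behind Invariant 1-A can be sketched with simplified stand-in types. This is an illustration under stated assumptions, not the CUBRID code: `Accumulator`, `Entry`, `flush` and `append_record` are hypothetical names, page ids replace full LSAs, a plain vector replaces the lock-free buffer, and the mutex and restart check are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the log_header accumulator and the queued entry.
constexpr std::int64_t kBlockPages = 31;   // assumed default block size in log pages

struct Accumulator
{
  bool dirty = false;                 // plays the role of does_block_need_vacuum
  std::int64_t last_op_pageid = -1;   // page part of mvcc_op_log_lsa
  std::uint64_t oldest_visible = 0;
  std::uint64_t newest = 0;
};

struct Entry
{
  std::int64_t blockid;
  std::int64_t start_pageid;
  std::uint64_t oldest_visible;
  std::uint64_t newest;
};

inline std::int64_t block_of (std::int64_t pageid) { return pageid / kBlockPages; }

// Emit one entry for the finished block and reset the accumulator.
// last_op_pageid is left untouched so the back-chain stays unbroken.
inline void flush (Accumulator &acc, std::vector<Entry> &queue)
{
  queue.push_back ({block_of (acc.last_op_pageid), acc.last_op_pageid, acc.oldest_visible, acc.newest});
  acc.dirty = false;
  acc.newest = 0;
}

// Append a record located at `pageid`; mvccid == 0 stands for a non-MVCC record.
inline void append_record (Accumulator &acc, std::vector<Entry> &queue, std::int64_t pageid,
                           std::uint64_t mvccid, std::uint64_t global_oldest_visible)
{
  // The boundary check runs for every record type, before the MVCC branch.
  if (acc.dirty && block_of (pageid) != block_of (acc.last_op_pageid))
    {
      flush (acc, queue);
    }
  if (mvccid == 0)
    {
      return;
    }
  if (!acc.dirty)
    {
      acc.oldest_visible = global_oldest_visible;   // snapshot at the block's first MVCC op
      acc.newest = mvccid;
    }
  else if (acc.newest < mvccid)
    {
      acc.newest = mvccid;
    }
  acc.last_op_pageid = pageid;
  acc.dirty = true;
}
```

In this sketch a non-MVCC record landing in the next block is enough to flush the previous block's entry, which is the point of running the check for every record type.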

Each MVCC undo record additionally embeds a log_vacuum_info (log_record.hpp):

| Field | Role | Why it exists |
|---|---|---|
| prev_mvcc_op_log_lsa | LSA of the previous MVCC op record; forms a backward singly linked list, copied from mvcc_op_log_lsa under the same mutex | Worker hops the chain from start_lsa, touching only MVCC records (Ch. 7) |
| vfid | File the operation touched | Worker checks the dropped-files ledger and groups heap objects per file without fetching pages |

Note the asymmetric reset: the boundary flush in vacuum_produce_log_block_data clears does_block_need_vacuum and newest_block_mvccid but not mvcc_op_log_lsa, so the back-chain runs unbroken across block boundaries.

1.2 vacuum_data_entry — the bit-packed block record

```cpp
// vacuum_data_entry -- src/query/vacuum.c
struct vacuum_data_entry
{
  VACUUM_LOG_BLOCKID blockid;     // blockid and flags
  LOG_LSA start_lsa;              // lsa of last mvcc op log record in block
  MVCCID oldest_visible_mvccid;   // oldest visible MVCCID while block was logged
  MVCCID newest_mvccid;           // newest MVCCID in log block
  // ... condensed: constructors, mask-test accessors, setters ...
};
```

| Field | Role | Why it exists |
|---|---|---|
| blockid | 61-bit id plus 3 flag bits (2-bit status + interrupted) in the top bits | Status must survive crashes with the entry; stealing high bits avoids widening the record (VACUUM_LOG_BLOCKID is std::int64_t) |
| start_lsa | Copied from log_Gl.hdr.mvcc_op_log_lsa | Entry point of the worker’s backward chain walk |
| oldest_visible_mvccid | Copied from the accumulator | Eligibility key; feeds vacuum_Data.oldest_unvacuumed_mvccid (Ch. 5) |
| newest_mvccid | Copied from the accumulator | Block runs only once the oldest-visible watermark has advanced past this value |

The bit layout of blockid:

```cpp
// VACUUM_DATA_ENTRY_FLAG_MASK -- src/query/vacuum.c
#define VACUUM_DATA_ENTRY_FLAG_MASK            0xE000000000000000 /* <- top 3 bits */
#define VACUUM_DATA_ENTRY_BLOCKID_MASK         0x1FFFFFFFFFFFFFFF /* <- low 61 bits */
#define VACUUM_BLOCK_STATUS_MASK               0xC000000000000000 /* <- top 2 bits = status */
#define VACUUM_BLOCK_STATUS_VACUUMED           0x8000000000000000
#define VACUUM_BLOCK_STATUS_IN_PROGRESS_VACUUM 0x4000000000000000
#define VACUUM_BLOCK_STATUS_AVAILABLE          0x0000000000000000
#define VACUUM_BLOCK_FLAG_INTERRUPTED          0x2000000000000000 /* <- bit 61 */
#define VACUUM_BLOCKID_WITHOUT_FLAGS(blockid) \
  ((blockid) & VACUUM_DATA_ENTRY_BLOCKID_MASK)
```

Bits 63–62: status (10 VACUUMED, 01 IN_PROGRESS, 00 AVAILABLE, 11 unused); bit 61: the orthogonal INTERRUPTED flag; bits 60–0: the id. AVAILABLE being all-zero, a freshly computed blockid is born AVAILABLE for free. The accessors (is_available etc.) are one-line mask tests; the two compound setters are the subtle part:

```cpp
// vacuum_data_entry::set_vacuumed -- src/query/vacuum.c
void vacuum_data_entry::set_vacuumed ()
{
  VACUUM_BLOCK_STATUS_SET_VACUUMED (blockid);
  VACUUM_BLOCK_CLEAR_INTERRUPTED (blockid);   /* <- success wipes the interrupted history */
}
void vacuum_data_entry::set_interrupted ()
{
  VACUUM_BLOCK_STATUS_SET_AVAILABLE (blockid);   /* <- back to AVAILABLE for re-dispatch */
  VACUUM_BLOCK_SET_INTERRUPTED (blockid);        /* <- but remember the scar */
}
```

set_interrupted does not add a fourth status; it returns to AVAILABLE and raises the flag — which is why 11 never appears.
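
The packing and the two compound setters can be exercised in isolation. The sketch below reuses the mask values listed above, but `PackedBlockId` and its method names are illustrative, and it stores the bits in an unsigned 64-bit integer (the source uses a signed VACUUM_LOG_BLOCKID) to keep the mask arithmetic simple.

```cpp
#include <cassert>
#include <cstdint>

// Mask values copied from the macros above; everything else is a hypothetical stand-in.
constexpr std::uint64_t kIdMask           = 0x1FFFFFFFFFFFFFFFULL;
constexpr std::uint64_t kStatusMask       = 0xC000000000000000ULL;
constexpr std::uint64_t kStatusVacuumed   = 0x8000000000000000ULL;
constexpr std::uint64_t kStatusInProgress = 0x4000000000000000ULL;
constexpr std::uint64_t kStatusAvailable  = 0x0000000000000000ULL;
constexpr std::uint64_t kFlagInterrupted  = 0x2000000000000000ULL;

struct PackedBlockId
{
  std::uint64_t bits;

  // A freshly computed id has zero top bits, so it is born AVAILABLE.
  explicit PackedBlockId (std::uint64_t id) : bits (id & kIdMask) {}

  std::uint64_t id () const { return bits & kIdMask; }
  bool is_available () const { return (bits & kStatusMask) == kStatusAvailable; }
  bool is_in_progress () const { return (bits & kStatusMask) == kStatusInProgress; }
  bool is_vacuumed () const { return (bits & kStatusMask) == kStatusVacuumed; }
  bool was_interrupted () const { return (bits & kFlagInterrupted) != 0; }

  // Replace only the two status bits; bit 61 and the id are preserved.
  void set_status (std::uint64_t status) { bits = (bits & ~kStatusMask) | status; }

  void set_in_progress () { set_status (kStatusInProgress); }
  void set_vacuumed () { set_status (kStatusVacuumed); bits &= ~kFlagInterrupted; }
  void set_interrupted () { set_status (kStatusAvailable); bits |= kFlagInterrupted; }
};
```

Walking one id through in-progress, interrupted, in-progress again and finally vacuumed never produces the status pattern 11, and the id survives every transition.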

Role matrix for the status bits — the same two bits mean different things to different actors:

| Status | To the master | To a worker | To recovery |
|---|---|---|---|
| AVAILABLE | candidate for the job cursor (Ch. 6) | never sees it | redo of job start re-marks IN_PROGRESS |
| IN_PROGRESS | skip; a worker owns it | “this is my job” | job died with the crash → interrupted (Ch. 11) |
| VACUUMED | remove entry, advance index_unvacuumed (Ch. 9) | terminal; set by set_vacuumed | entry may be dropped from the page |
| + INTERRUPTED | re-dispatch, flag the prior death | cautious mode: half-cleaned pages tolerated (Ch. 8) | preserved across restart |

```mermaid
stateDiagram-v2
    [*] --> AVAILABLE : appended by master\nvacuum_consume_buffer_log_blocks
    AVAILABLE --> IN_PROGRESS : set_job_in_progress\nmaster dispatches job
    IN_PROGRESS --> VACUUMED : set_vacuumed\nworker success, clears INTERRUPTED
    IN_PROGRESS --> AVAILABLE_INTERRUPTED : set_interrupted\nshutdown or error
    AVAILABLE_INTERRUPTED --> IN_PROGRESS : set_job_in_progress\nredispatch keeps INTERRUPTED flag
    VACUUMED --> [*] : entry removed\nvacuum_data_mark_finished
```

Figure 1-2 — Status lifecycle in the top bits of blockid. AVAILABLE_INTERRUPTED is AVAILABLE plus the bit-61 flag, not a distinct status.

The id is pure arithmetic over log page ids:

```cpp
// vacuum_get_log_blockid -- src/query/vacuum.c
VACUUM_LOG_BLOCKID
vacuum_get_log_blockid (LOG_PAGEID pageid)
{
  if (prm_get_bool_value (PRM_ID_DISABLE_VACUUM) || pageid == NULL_PAGEID)
    {
      return VACUUM_NULL_LOG_BLOCKID;   /* <- -1; the only escape hatch */
    }
  // ... condensed ...
  return pageid / vacuum_Data.log_block_npages;
}
```

log_block_npages defaults to VACUUM_LOG_BLOCK_PAGES_DEFAULT (31, vacuum.h); the inverse VACUUM_FIRST_LOG_PAGEID_IN_BLOCK(blockid) is blockid * log_block_npages. The const log_header & constructor is representation 1 → 2: it delegates to the three-argument constructor, which asserts oldest <= newest and computes blockid = vacuum_get_log_blockid (start_lsa.pageid).

Invariant 1-B — blockid/start_lsa coherence. VACUUM_BLOCKID_WITHOUT_FLAGS (blockid) == vacuum_get_log_blockid (start_lsa.pageid) for every entry. Enforcement: by construction (the three-argument constructor). What breaks: the prefetch window and the chain walk would target different log ranges.
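
The block arithmetic is small enough to check by hand. A minimal sketch follows, assuming NULL_PAGEID is -1 and passing the block size as a parameter instead of reading vacuum_Data; the function names are illustrative and the PRM_ID_DISABLE_VACUUM branch is left out.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kNullPageId = -1;    // assumption: mirrors NULL_PAGEID
constexpr std::int64_t kNullBlockId = -1;   // mirrors VACUUM_NULL_LOG_BLOCKID

// blockid = pageid / npages, with the null page mapped to the null block.
inline std::int64_t block_of_page (std::int64_t pageid, std::int64_t npages)
{
  if (pageid == kNullPageId)
    {
      return kNullBlockId;
    }
  return pageid / npages;
}

// Inverse direction: the first log page that belongs to a block.
inline std::int64_t first_page_of_block (std::int64_t blockid, std::int64_t npages)
{
  return blockid * npages;
}
```

With the default of 31 pages, log pages 0 through 30 map to block 0 and page 31 opens block 1; mapping a block's first page back through `block_of_page` returns the same block, which is the coherence that Invariant 1-B relies on.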

1.3 vacuum_data_page — the persisted array with a sliding window

```cpp
// vacuum_data_page -- src/query/vacuum.c
struct vacuum_data_page
{
  VPID next_page;
  INT16 index_unvacuumed;
  INT16 index_free;
  VACUUM_DATA_ENTRY data[1];   /* <- flexible array; capacity computed at runtime */
  static const INT16 INDEX_NOT_FOUND = -1;
  // ... condensed: is_empty, is_index_valid, get_index_of_blockid, get_first_blockid ...
};
```

| Field | Role | Why it exists |
|---|---|---|
| next_page | VPID link to the next page; NULL VPID on the last | Vacuum data is a queue of pages: consume at head, append at tail |
| index_unvacuumed | First entry not yet VACUUMED | Finished entries are skipped by sliding this forward, not compacted (Ch. 9) |
| index_free | First unused slot; append position | [index_unvacuumed, index_free) is the live window |
| data[1] | Entry array, page_data_max_count slots | VACUUM_DATA_PAGE_HEADER_SIZE = offsetof (VACUUM_DATA_PAGE, data); capacity = (DB_PAGESIZE - header) / sizeof (entry) |
| INDEX_NOT_FOUND | Sentinel (-1) from get_index_of_blockid | Distinguishes “not on this page” from a valid slot |

Invariant 1-C — dense, consecutive window. 0 <= index_unvacuumed <= index_free <= page_data_max_count, and the window holds strictly consecutive blockids. Enforcement: the master appends in blockid order at index_free (vacuum_consume_buffer_log_blocks, Ch. 4), synthesizing a placeholder entry born VACUUMED (page_free_data->set_vacuumed ()) for every gap block that had no MVCC ops; it never deletes from the middle (a finished entry keeps its slot, only its status changes) and slides index_unvacuumed only over VACUUMED prefixes. vacuum_data_mark_finished asserts page_free_blockid == data_page->data[data_page->index_free - 1].get_blockid () + 1. What breaks: the O(1) lookup below returns the wrong slot — the master would mark the wrong block finished.

```cpp
// vacuum_data_page::get_index_of_blockid -- src/query/vacuum.c
INT16
vacuum_data_page::get_index_of_blockid (VACUUM_LOG_BLOCKID blockid) const
{
  if (is_empty ())
    {
      return INDEX_NOT_FOUND;
    }
  VACUUM_LOG_BLOCKID first_blockid = data[index_unvacuumed].get_blockid ();
  // ... condensed: return INDEX_NOT_FOUND if blockid before or after the window ...
  INT16 index_of_blockid = (INT16) (blockid - first_blockid) + index_unvacuumed;
  assert (data[index_of_blockid].get_blockid () == blockid);   /* <- relies on Invariant 1-C */
  return index_of_blockid;   /* <- O(1), no loop */
}
```

is_empty () is index_unvacuumed == index_free; both indexes are reset to 0 by vacuum_data_initialize_new_page.
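
To see why the dense window makes the lookup a subtraction, here is a small self-contained model. `WindowPage` and `index_of` are hypothetical names, the entry array is reduced to a vector of flag-free blockids, and the bounds test that the excerpt above condenses is written out under the assumption that it compares against the first and last window entries.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::int16_t kIndexNotFound = -1;

// Toy model of a vacuum data page: only the window indexes and the blockids.
struct WindowPage
{
  std::int16_t index_unvacuumed = 0;
  std::int16_t index_free = 0;
  std::vector<std::int64_t> blockids;   // consecutive ids inside the window (Invariant 1-C)

  bool is_empty () const { return index_unvacuumed == index_free; }

  std::int16_t index_of (std::int64_t blockid) const
  {
    if (is_empty ())
      {
        return kIndexNotFound;
      }
    const std::int64_t first = blockids[index_unvacuumed];
    const std::int64_t last = blockids[index_free - 1];
    if (blockid < first || blockid > last)
      {
        return kIndexNotFound;   // before or after the live window
      }
    // Consecutive ids mean the slot is a fixed offset from the window start.
    return static_cast<std::int16_t> ((blockid - first) + index_unvacuumed);
  }
};
```

If a gap block were missing from the window, the offset would land on a neighbouring entry, which is the failure the invariant describes.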

Invariant 1-D — an empty page still carries last_blockid. vacuum_init_data_page_with_last_blockid writes a blockid into slot 0 (data->blockid) of a freshly initialized page even though the window is empty; when vacuum data is empty, recovery reads vacuum_Data.last_page->data->blockid and takes MAX with log_Gl.hdr.vacuum_last_blockid (Ch. 11). Slot 0 plays two roles: a live entry when the window covers it, the persisted high-water mark when the page is empty. What breaks: zeroing slot 0 on emptying would let a restart re-consume already-vacuumed blocks.

1.4 The vacuum_data global and the job cursor


One static instance, vacuum_Data, glues everything together:

```cpp
// vacuum_data -- src/query/vacuum.c
struct vacuum_data
{
  VFID vacuum_data_file;
  LOG_PAGEID keep_from_log_pageid;     /* Smallest LOG_PAGEID that vacuum may still need for its jobs. */
  MVCCID oldest_unvacuumed_mvccid;     /* Global oldest MVCCID not vacuumed (yet). */
  VACUUM_DATA_PAGE *first_page;        /* Cached first vacuum data page. */
  VACUUM_DATA_PAGE *last_page;         /* Cached last vacuum data page. */
  // ... condensed ...
private:
  VACUUM_LOG_BLOCKID m_last_blockid;   /* ... the id of last added block
                                        * which may not even be in vacuum data (being already vacuumed). */
};
```

| Field | Role | Why it exists |
|---|---|---|
| vacuum_data_file | VFID of the disk file holding the page chain | Finds the head VPID via the file descriptor at boot |
| keep_from_log_pageid | First log page of the first unvacuumed block | Archive purger’s fence, via vacuum_min_log_pageid_to_keep (Ch. 9) |
| oldest_unvacuumed_mvccid | “Everything below is fully cleaned” watermark | Sanity checks, vacuum_is_mvccid_vacuumed; upgrade_oldest_unvacuumed asserts ascent (Ch. 5) |
| first_page / last_page | Permanently fixed head/tail pages | The master reads the head constantly and appends at the tail; latching once avoids per-op pgbuf fixes |
| page_data_max_count | Entries per page | Computed once in vacuum_initialize from DB_PAGESIZE |
| log_block_npages | Block granularity in log pages | Divisor of vacuum_get_log_blockid; fixed at db creation, default 31 |
| is_loaded | Pages fixed and ready | Guards vacuum_data_load_first_and_last_page re-entry |
| shutdown_sequence | vacuum_shutdown_sequence object | Orderly stop; workers turn it into set_interrupted paths |
| is_archive_removal_safe | False until keep_from_log_pageid first computed | vacuum_is_safe_to_remove_archives: no purge before vacuum knows its needs |
| recovery_lsa | LSA where recovery started | Backward-scan anchor for vacuum_recover_lost_block_data (Ch. 11) |
| is_restoredb_session | Booted by restoredb | Recovery-path behavior switch |
| is_vacuum_complete (SA_MODE) | Standalone “all caught up” flag | SA-mode runs vacuum to completion inside xvacuum (Ch. 11) |
| m_last_blockid (private) | Id of the last block ever consumed | Not necessarily present as an entry; see role matrix |

Role matrix for m_last_blockid — meaning depends on whether vacuum data is empty:

| State | get_last_blockid () means | get_first_blockid () returns |
|---|---|---|
| non-empty | blockid of the last appended entry | first_page->get_first_blockid (), the first window entry of the head page |
| empty | last block that was consumed, possibly long since removed | m_last_blockid too; both accessors collapse to the same value |

set_last_blockid strips flags (VACUUM_BLOCKID_WITHOUT_FLAGS) and, in debug builds, asserts the value is strictly below the block of log_Gl.prior_info.prior_lsa — the last block must never overtake the log.

Invariant 1-E — head and tail pages stay fixed. first_page and last_page are latched once at load and never unfixed between operations. Enforcement: the pgbuf wrappers — vacuum_fix_data_page returns the cached pointer when the VPID matches either end, vacuum_unfix_data_page silently skips them, and only vacuum_unfix_first_and_last_data_page (shutdown) really releases them. What breaks: unfixing first_page directly via pgbuf_unfix leaves the next vacuum_fix_data_page returning a stale pointer to a page the buffer manager may have victimized.

Two small companions live beside vacuum_Data. The vacuum_data_load struct (global vacuum_Data_load, fields vpid_first / vpid_last) records the chain ends so vacuum_data_load_first_and_last_page can fix both without walking the chain; in the special case vpid_first == vpid_last it sets last_page = first_page, because one page must not be fixed twice. And vacuum_job_cursor is the master’s persistent iteration state over the window:

```cpp
// vacuum_job_cursor -- src/query/vacuum.c
class vacuum_job_cursor
{
  // ... condensed: increment_blockid, readjust_to_vacuum_data_changes, load/unload ...
private:
  VACUUM_LOG_BLOCKID m_blockid;   // current cursor blockid
  VACUUM_DATA_PAGE *m_page;       // loaded page of blockid or null
  INT16 m_index;                  // loaded index of blockid or INDEX_NOT_FOUND
};
```

| Field | Role | Why it exists |
|---|---|---|
| m_blockid | The cursor’s canonical position | Blockids are stable; after blocks are removed/appended an entry’s location moves but its id does not |
| m_page / m_index | Cached physical location of m_blockid | Avoids re-searching per step; recomputed via get_index_of_blockid by readjust_to_vacuum_data_changes after vacuum data shifts |

The cursor’s traversal logic is Ch. 6’s subject; only its layout is fixed here.

```cpp
// vacuum_Block_data_buffer -- src/query/vacuum.c
lockfree::circular_queue<vacuum_data_entry> *vacuum_Block_data_buffer = NULL;
#define VACUUM_BLOCK_DATA_BUFFER_CAPACITY 1024
lockfree::circular_queue<VACUUM_LOG_BLOCKID> *vacuum_Finished_job_queue = NULL;
#define VACUUM_FINISHED_JOB_QUEUE_CAPACITY 2048
```

| Queue | Element | Producer → Consumer | Why a queue at all |
|---|---|---|---|
| vacuum_Block_data_buffer | full vacuum_data_entry (1024 cap) | any transaction thread on the prior-LSA path → master (vacuum_consume_buffer_log_blocks, Ch. 4) | transactions must never latch vacuum data pages; the source comment reads: “It is advisable to avoid synchronizing running transactions with vacuum threads” |
| vacuum_Finished_job_queue | bare VACUUM_LOG_BLOCKID with status flags attached (2048 cap) | workers (vacuum_finished_block_vacuum, Ch. 9) → master (vacuum_data_mark_finished) | workers must not write vacuum data pages either; the master is the only writer of representation 3 |

The producer side of the first queue:

```cpp
// vacuum_produce_log_block_data -- src/query/vacuum.c
void
vacuum_produce_log_block_data (THREAD_ENTRY * thread_p)
{
  // ... condensed: PRM_ID_DISABLE_VACUUM early return ...
  VACUUM_DATA_ENTRY block_data { log_Gl.hdr };    /* <- representation 1 -> 2 */
  log_Gl.hdr.does_block_need_vacuum = false;      /* <- reset accumulator for next block */
  log_Gl.hdr.newest_block_mvccid = MVCCID_NULL;
  // ... condensed: NULL-buffer guard, vacuum_er_log ...
  if (!vacuum_Block_data_buffer->produce (block_data))
    {
      /* Push failed, the buffer must be full */
      vacuum_er_log_error (VACUUM_ER_LOG_ERROR, "%s", "Cannot produce new log block data! The buffer is already full.");
      assert (false);
      return;   /* <- block metadata LOST in release builds */
    }
}
```

The full-queue branch is the known soft spot (the TODO above it admits it): a full buffer silently drops a block’s metadata in release builds. Ch. 11 covers the safety net — vacuum_recover_lost_block_data rebuilds entries by scanning the mvcc_op_log_lsa back-chain.

On the finished side, the element format is the trick: the worker pushes data->blockid after calling set_vacuumed () or set_interrupted () on its private copy, so the flags ride along in the top bits. The master strips them with VACUUM_BLOCKID_WITHOUT_FLAGS to locate the entry (via get_index_of_blockid) and reads the flags to decide VACUUMED-remove versus AVAILABLE+INTERRUPTED-requeue.
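
The master's side of that exchange amounts to one mask for the id and one comparison for the outcome. The sketch below is an assumption-laden illustration: `FinishedAction`, `FinishedJob` and `decode_finished` are hypothetical names, the queue element is modelled as an unsigned 64-bit value, and the `Unexpected` branch is added only to make the sketch total.

```cpp
#include <cassert>
#include <cstdint>

// Mask values follow the blockid layout described in section 1.2.
constexpr std::uint64_t kIdMask          = 0x1FFFFFFFFFFFFFFFULL;
constexpr std::uint64_t kStatusMask      = 0xC000000000000000ULL;
constexpr std::uint64_t kStatusVacuumed  = 0x8000000000000000ULL;
constexpr std::uint64_t kStatusAvailable = 0x0000000000000000ULL;
constexpr std::uint64_t kFlagInterrupted = 0x2000000000000000ULL;

enum class FinishedAction { RemoveEntry, Requeue, Unexpected };

struct FinishedJob
{
  std::uint64_t blockid;   // id with flags stripped, used to locate the entry
  FinishedAction action;
};

// Split a queued element into the plain id and the action its flags request.
inline FinishedJob decode_finished (std::uint64_t queued)
{
  FinishedJob job;
  job.blockid = queued & kIdMask;
  const std::uint64_t status = queued & kStatusMask;
  if (status == kStatusVacuumed)
    {
      job.action = FinishedAction::RemoveEntry;
    }
  else if (status == kStatusAvailable && (queued & kFlagInterrupted) != 0)
    {
      job.action = FinishedAction::Requeue;
    }
  else
    {
      job.action = FinishedAction::Unexpected;   // a worker should never report this
    }
  return job;
}
```

Because the flags travel inside the same 64-bit word as the id, the queue needs no separate status field and each element stays a single integer.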

```cpp
// vacuum_worker -- src/query/vacuum.h
struct vacuum_worker
{
  VACUUM_WORKER_STATE state;
  INT32 drop_files_version;
  struct log_zip *log_zip_p;
  VACUUM_HEAP_OBJECT *heap_objects;
  int heap_objects_capacity;
  int n_heap_objects;
  char *undo_data_buffer;
  int undo_data_buffer_capacity;
  int private_lru_index;   // page buffer private lru list
  char *prefetch_log_buffer;
  LOG_PAGEID prefetch_first_pageid;
  LOG_PAGEID prefetch_last_pageid;
  bool allocated_resources;
  int idx;   // -1 for vacuum_master; Otherwise, the sequence number of vacuum_worker
};
```

| Field | Role | Why it exists |
|---|---|---|
| state | VACUUM_WORKER_STATE_INACTIVE / PROCESS_LOG / EXECUTE | Log-reading code behaves differently for vacuum threads (vacuum_is_process_log_for_vacuum gates LOG_CS handling) |
| drop_files_version | Last vacuum_Dropped_files_version observed | The master’s min across workers decides when old ledger entries can be cleaned (Ch. 10) |
| log_zip_p | Persistent unzip scratch | Log undo data may be compressed; per-record reallocation would thrash |
| heap_objects / heap_objects_capacity / n_heap_objects | Growable target array from the log pass (initial VACUUM_DEFAULT_HEAP_OBJECT_BUFFER_SIZE = 4000) | Log pass only collects; execution sorts by VFID, batches per heap page (Ch. 7→8) |
| undo_data_buffer / undo_data_buffer_capacity | Copy buffer for undo data spanning log pages (initial IO_PAGESIZE) | B-tree vacuum needs the data contiguous |
| private_lru_index | Private LRU list id in the page buffer | Vacuum touches huge page counts once; quarantine protects the shared LRU |
| prefetch_log_buffer | VACUUM_PREFETCH_LOG_BLOCK_BUFFER_PAGES = 1 + log_block_npages (32) pages | One bulk read beats 31 single fetches; the +1 is one extra page beyond the block’s start_lsa page (logically the block’s last page), since the last record may spill into the next page; vacuum_log_prefetch_vacuum_block’s comment warns it handles at most that one extra page |
| prefetch_first_pageid / prefetch_last_pageid | Buffer range: first = VACUUM_FIRST_LOG_PAGEID_IN_BLOCK (entry->get_blockid ()), last = first + VACUUM_PREFETCH_LOG_BLOCK_BUFFER_PAGES - 1 (first + 31 at default) | vacuum_fetch_log_page serves from the buffer iff the pageid is inside the range (Ch. 7) |
| allocated_resources | Lazy-allocation flag | Buffers malloc’d on first job (vacuum_worker_allocate_resources), not at boot |
| idx | -1 for vacuum_Master, else slot in vacuum_Workers[] (max VACUUM_MAX_WORKER_COUNT = 50) | Identifies the thread in logs and the min-version scan |
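
The prefetch range is plain arithmetic over the block geometry. A minimal sketch, with `PrefetchRange` and `prefetch_range` as illustrative names and the block size passed in as a parameter:

```cpp
#include <cassert>
#include <cstdint>

// Inclusive range of log page ids covered by a worker's prefetch buffer.
struct PrefetchRange
{
  std::int64_t first;
  std::int64_t last;

  // A page is served from the buffer only when it falls inside the range.
  bool contains (std::int64_t pageid) const { return pageid >= first && pageid <= last; }
};

inline PrefetchRange prefetch_range (std::int64_t blockid, std::int64_t npages)
{
  const std::int64_t buffer_pages = 1 + npages;   // the block plus one spill-over page
  const std::int64_t first = blockid * npages;    // first log page of the block
  return PrefetchRange {first, first + buffer_pages - 1};
}
```

For block 3 at the default of 31 pages the range is 93 through 124: pages 93 to 123 are the block itself and page 124 is the single extra page for a record that spills past the block's end.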

VACUUM_HEAP_OBJECT is the unit the collection buffer holds — deliberately minimal:

| Field | Role | Why it exists |
|---|---|---|
| vfid | Heap file of the object | Primary sort key: the dropped-ledger is checked once per file |
| oid | Object id; its pageid is the secondary grouping key | Batches per heap page → one fix, one log record per page (Ch. 8) |
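
The sort-then-batch idea can be sketched as follows. The field layout of `Vfid` and `Oid` and the exact comparison order are assumptions for illustration (file first, then page, then slot); the helper names are hypothetical and this is not the comparator used in the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Simplified identifiers; the real VFID/OID types carry the same kind of fields.
struct Vfid { int volid; int fileid; };
struct Oid { int volid; int pageid; int slotid; };
struct HeapObject { Vfid vfid; Oid oid; };

// Order by file, then by page within the file, then by slot within the page.
inline bool object_less (const HeapObject &a, const HeapObject &b)
{
  return std::tie (a.vfid.volid, a.vfid.fileid, a.oid.volid, a.oid.pageid, a.oid.slotid)
         < std::tie (b.vfid.volid, b.vfid.fileid, b.oid.volid, b.oid.pageid, b.oid.slotid);
}

inline bool same_page (const HeapObject &a, const HeapObject &b)
{
  return std::tie (a.vfid.volid, a.vfid.fileid, a.oid.volid, a.oid.pageid)
         == std::tie (b.vfid.volid, b.vfid.fileid, b.oid.volid, b.oid.pageid);
}

// Sort the collected objects and count how many per-page batches result.
inline std::size_t sort_and_count_page_batches (std::vector<HeapObject> &objects)
{
  std::sort (objects.begin (), objects.end (), object_less);
  std::size_t batches = 0;
  for (std::size_t i = 0; i < objects.size (); i++)
    {
      if (i == 0 || !same_page (objects[i - 1], objects[i]))
        {
          batches++;   // a new heap page starts here
        }
    }
  return batches;
}
```

After sorting, all objects of one heap page are adjacent, so each page needs to be fixed only once while its slots are processed.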

The comment above vacuum_Master’s definition explains why the master is also a VACUUM_WORKER: it needed system operations and a transaction descriptor for page allocation, so it reuses the struct. vacuum_heap_helper, the worker’s per-page scratch (home/forward pages, slot and result arrays, MVCC header), is dissected in Ch. 8.

| Struct / Field | Role | Why it exists |
|---|---|---|
| vacuum_dropped_file.vfid | The dropped file | Workers match log_vacuum_info.vfid against it |
| vacuum_dropped_file.mvccid | MVCCID recorded at drop time | Only records with mvccid <= this are skipped, because the file id may be reused by a newer file whose records must still be vacuumed (Ch. 10) |
| vacuum_dropped_files_page.next_page | VPID chain link | Ledger can outgrow one page |
| vacuum_dropped_files_page.n_dropped_files | Live entry count | Entries kept VFID-sorted per page (vacuum_add_dropped_file binary-searches via util_bsearch, then memmove-inserts in position); capacity VACUUM_DROPPED_FILES_PAGE_CAPACITY |
| vacuum_dropped_files_page.dropped_files[1] | Flexible entry array | Same flexible-array idiom as vacuum_data_page |

Supporting globals: vacuum_Dropped_files_vfid / vacuum_Dropped_files_vpid (file and head page), vacuum_Dropped_files_loaded, vacuum_Dropped_files_count, vacuum_Last_dropped_vfid, and vacuum_Dropped_files_version — the INT32 generation counter paired with each worker’s drop_files_version. vacuum_Dropped_files_mutex is held only by vacuum_notify_all_workers_dropped_file: it serializes the notify-workers step of each drop (guarding vacuum_Last_dropped_vfid and the ++vacuum_Dropped_files_version bump, one file at a time) — page edits rely on page latches, and reading workers never take it. Debug builds mirror the page chain in memory via vacuum_Track_dropped_files. (VACUUM_DROPPED_FILE_FLAG_DUPLICATE is defined but unused in the current source — a re-dropped vfid is instead handled in place: the binary search finds the existing entry and only its mvccid is updated.) Behavior is Ch. 10’s subject; the layout is fixed here.
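
A single ledger page can be modelled as a sorted vector. The sketch below assumes the behavior described above (sorted insert, in-place mvccid update on a re-drop, skip when the record's mvccid is at or below the recorded one); `DroppedFile`, `add_dropped_file` and this `is_file_dropped` are simplified stand-ins that use `std::lower_bound` in place of util_bsearch and ignore page chaining, latching and the version counter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

struct Vfid { int volid; int fileid; };
struct DroppedFile { Vfid vfid; std::uint64_t mvccid; };

inline bool vfid_less (const Vfid &a, const Vfid &b)
{
  return std::tie (a.volid, a.fileid) < std::tie (b.volid, b.fileid);
}

inline bool vfid_equal (const Vfid &a, const Vfid &b)
{
  return a.volid == b.volid && a.fileid == b.fileid;
}

// Insert in sorted position, or update the mvccid if the vfid is already present.
inline void add_dropped_file (std::vector<DroppedFile> &page, const Vfid &vfid, std::uint64_t mvccid)
{
  auto it = std::lower_bound (page.begin (), page.end (), vfid,
                              [] (const DroppedFile &e, const Vfid &v) { return vfid_less (e.vfid, v); });
  if (it != page.end () && vfid_equal (it->vfid, vfid))
    {
      it->mvccid = mvccid;   // re-dropped file: handled in place
      return;
    }
  page.insert (it, DroppedFile {vfid, mvccid});
}

// A log record is skipped only if its file is listed and its mvccid is not newer than the drop.
inline bool is_file_dropped (const std::vector<DroppedFile> &page, const Vfid &vfid,
                             std::uint64_t record_mvccid)
{
  auto it = std::lower_bound (page.begin (), page.end (), vfid,
                              [] (const DroppedFile &e, const Vfid &v) { return vfid_less (e.vfid, v); });
  return it != page.end () && vfid_equal (it->vfid, vfid) && record_mvccid <= it->mvccid;
}
```

The mvccid comparison is what lets a reused file id coexist with its dropped predecessor: records newer than the drop are still vacuumed.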

Figure 1-3: One block's life across every structure

Figure 1-3 — Panorama of vacuum’s data structures. Blue arrows: block birth (Ch. 3–4). Green: master’s append and dispatch (Ch. 4, 6). Red: completion report (Ch. 9). Dashed purple: worker reads log pages back from the volume (Ch. 7).

  1. A vacuum block has four representations: the log_Gl.hdr accumulator, an in-flight vacuum_data_entry in vacuum_Block_data_buffer, a persisted entry in a vacuum_data_page, and a flags-carrying VACUUM_LOG_BLOCKID in vacuum_Finished_job_queue. The same four values — start LSA, oldest/newest MVCCID, blockid — flow through all of them.
  2. vacuum_data_entry::blockid packs a 2-bit status (AVAILABLE 00, IN_PROGRESS 01, VACUUMED 10) plus the orthogonal bit-61 INTERRUPTED flag above a 61-bit id; set_interrupted returns status to AVAILABLE while raising the flag, set_vacuumed clears it — so 11 never appears.
  3. vacuum_data_page keeps a dense window [index_unvacuumed, index_free) of strictly consecutive blockids (no-MVCC gap blocks get placeholder entries born VACUUMED); get_index_of_blockid exploits this for an O(1) lookup, and slot 0 of an empty page doubles as the persisted last_blockid (Invariant 1-D).
  4. vacuum_Data.first_page / last_page are permanently fixed; the vacuum_fix_data_page / vacuum_unfix_data_page wrappers route around pgbuf for those two VPIDs. The job cursor anchors on m_blockid (stable) and recomputes its cached m_page/m_index after the window shifts.
  5. The two lock-free queues exist so transaction threads never touch vacuum data pages and workers never write them — the master is the sole writer of the persisted representation. A full vacuum_Block_data_buffer drops block metadata in release builds; recovery’s backward chain scan is the safety net (Ch. 11).
  6. VACUUM_WORKER is a bag of persistent per-thread buffers (prefetch buffer of 1 + log_block_npages pages — the +1 for the last record spilling past the start_lsa page — heap-object array, undo buffer, zip scratch) plus drop_files_version, whose per-worker minimum gates dropped-files cleanup; the master reuses the struct with idx == -1.
  7. Block geometry is pure arithmetic: blockid = pageid / log_block_npages and first pageid = blockid * log_block_npages — every fence (keep_from_log_pageid, prefetch ranges) derives from these two formulas.

Chapter 2: Initialization and Memory Management


How the Chapter 1 structures — vacuum_Data, the two lock-free queues, vacuum_Master, vacuum_Workers, the on-disk file pair — come into existence, how threads acquire vacuum identity, and in what order everything is torn down. Design rationale for the master/worker split lives in the high-level companion (cubrid-vacuum.md); none of it is re-derived here.

Startup is two-phase. vacuum_initialize runs in boot_restart_server (boot_sr.c) before log_initialize, because crash recovery already needs vacuum_Data.log_block_npages, the dropped-files VPID, and worker contexts. vacuum_boot runs after recovery, because the master must not consume vacuum data while redo is still rewriting it.

2.1 vacuum_initialize — parameter capture and static arrays


Inputs come from boot_Db_parm: block size in log pages plus the two VFIDs from createdb (section 2.2). The is_restore flag is r_args != NULL && r_args->is_restore_from_backup in boot_restart_server; the only other caller, xboot_emergency_patch, hard-codes false. It lands in vacuum_Data.is_restoredb_session, consumed exactly once — at SA shutdown (section 2.7).

```cpp
// vacuum_initialize -- src/query/vacuum.c
  if (prm_get_bool_value (PRM_ID_DISABLE_VACUUM))
    return NO_ERROR;   /* <- branch 1: vacuum disabled, nothing is built */
  vacuum_Data.is_restoredb_session = is_restore;
  vacuum_Data.log_block_npages = vacuum_log_block_npages;
  vacuum_Data.page_data_max_count = (DB_PAGESIZE - VACUUM_DATA_PAGE_HEADER_SIZE) / sizeof (VACUUM_DATA_ENTRY);
  // ... condensed: VFID copies; SA_MODE is_vacuum_complete = false; dropped-files globals, mutex init ...
  if (vacuum_get_first_page_dropped_files (thread_p, &vacuum_Dropped_files_vpid) != NO_ERROR)
    { assert (false); goto error; }   /* <- branch 2: sticky first page lookup failed */
  vacuum_Block_data_buffer = new lockfree::circular_queue<vacuum_data_entry> (VACUUM_BLOCK_DATA_BUFFER_CAPACITY);
  if (vacuum_Block_data_buffer == NULL) goto error;   /* <- branches 3, 4: queue NULL checks
                                                         (vacuum_Finished_job_queue is identical) */
  vacuum_Master.state = VACUUM_WORKER_STATE_EXECUTE;   /* <- master is *always* in execute state */
  vacuum_Master.idx = -1;                              /* private_lru_index = -1, buffers NULL */
  for (i = 0; i < VACUUM_MAX_WORKER_COUNT; i++)
    {
      vacuum_Workers[i].state = VACUUM_WORKER_STATE_INACTIVE;
      vacuum_Workers[i].private_lru_index = pgbuf_assign_private_lru (thread_p);   /* <- eager */
      vacuum_Workers[i].allocated_resources = false;
      vacuum_Workers[i].idx = i;   /* buffer pointers NULL, capacities 0 */
    }
  return NO_ERROR;
error:
  vacuum_finalize (thread_p);   /* <- error path reuses the full teardown */
  return (error_code == NO_ERROR) ? ER_FAILED : error_code;
```

Branch (2) is a thin wrapper over file_get_sticky_first_page, so failure means the createdb-time file is missing — an unbootable database. Branches (3)/(4) are vestigial under throwing new, but route to the shared error: label so a half-built state is torn down by the same vacuum_finalize as a fully-built one. Note the worker-loop asymmetry: all VACUUM_MAX_WORKER_COUNT (50) static slots get a private page-buffer LRU list immediately, even though PRM_ID_VACUUM_WORKER_COUNT may cap actual threads far lower — LRU indices must be claimed during page-buffer bootstrap; every other per-worker buffer waits for the first job (section 2.5). The slots are VACUUM_WORKER structs whose per-field Field | Role | Why table lives in Chapter 1; vacuum_initialize only NULLs the pointers and zeroes the capacities, leaving state = INACTIVE, allocated_resources = false.
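
The page_data_max_count formula from the excerpt can be tried on stand-in types. The struct layouts below are simplified assumptions (the real LOG_LSA and VPID are laid out differently), so the resulting number is illustrative only; what carries over is the shape of the computation: header size via offsetof, then integer division by the entry size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified layouts; only the arithmetic mirrors the source.
struct Vpid { std::int32_t pageid; std::int16_t volid; };
struct Lsa { std::int64_t pageid; std::int16_t offset; };
struct Entry
{
  std::int64_t blockid;
  Lsa start_lsa;
  std::uint64_t oldest_visible_mvccid;
  std::uint64_t newest_mvccid;
};
struct DataPage
{
  Vpid next_page;
  std::int16_t index_unvacuumed;
  std::int16_t index_free;
  Entry data[1];   // stand-in for the flexible entry array
};

// Everything before the entry array counts as page header.
constexpr std::size_t kHeaderSize = offsetof (DataPage, data);

// Number of whole entries that fit after the header in a page of `page_size` bytes.
constexpr std::size_t entries_per_page (std::size_t page_size)
{
  return (page_size - kHeaderSize) / sizeof (Entry);
}
```

Integer division rounds down, so any tail bytes that cannot hold a whole entry are simply left unused.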

2.2 The createdb pair — where the files come from


boot_create_all_volumes (createdb) calls both creation functions back to back and stores the VFIDs in boot_Db_parm. vacuum_create_file_for_vacuum_data: file_create_with_npages (FILE_VACUUM_DATA, one page), file_alloc, then the first-page VPID is written into the file descriptor via file_descriptor_update — that descriptor is how restart finds the page — and the page is formatted by vacuum_init_data_page_with_last_blockid (..., 0): empty data, last_blockid 0. vacuum_create_file_for_dropped_files is the same skeleton with two differences: it allocates via file_alloc_sticky_first_page instead of writing a descriptor (hence the boot-time file_get_sticky_first_page lookup), and it formats the page inline — NULL next_page, n_dropped_files = 0 — before vacuum_set_dirty_dropped_entries_page (..., FREE). Exits per function: create error, alloc error, NULL-page guard, success — plus a fifth for vacuum data (descriptor-update error). Page format details: Chapter 10.

2.3 vacuum_boot — load state, then start threads

flowchart TD
    A["boot_restart_server"] --> B["vacuum_initialize<br/>(before log_initialize)"]
    B --> C["log_initialize -- crash recovery runs here"]
    C --> D["vacuum_boot"]
    D --> E["vacuum_data_load_and_recover<br/>recover entries, stash VPIDs in vacuum_Data_load, unload"]
    E --> F["vacuum_load_dropped_files_from_disk"]
    F --> G["new vacuum_master_entry_manager<br/>new vacuum_worker_entry_manager"]
    G --> H{"SERVER_MODE?"}
    H -- yes --> I["thread_create_stats_worker_pool<br/>create_daemon vacuum-master"]
    H -- no --> J["no threads -- xvacuum drives jobs"]
    I --> K["vacuum_Is_booted = true"]
    J --> K
    K --> L["first master iteration / first SA pass<br/>vacuum_data_load_first_and_last_page re-fixes the pair"]

Figure 2-1: boot sequence; vacuum_initialize and vacuum_boot bracket crash recovery.

vacuum_boot (assert (!vacuum_Is_booted) — boot once) has four runtime branches plus a compile-time split: the disable-vacuum branch still calls log_Gl.mvcc_table.update_global_oldest_visible () (“for debug only” — the Chapter 5 watermark must advance even with vacuum off); a thread_p == NULL fallback to thread_get_thread_entry_info; and two error returns from the load steps — vacuum_data_load_and_recover (Chapter 11) and vacuum_load_dropped_files_from_disk (Chapter 10). Note what the first one does not do: it fixes the page chain only while walking it; its end: label stashes the first/last VPIDs into vacuum_Data_load and unloads both pages — they “must be fixed by vacuum master” (in-code comment), not by the boot thread (section 2.6). Only then are the two entry managers allocated — in both modes, because the SA path claims workers through vacuum_Worker_entry_manager too. Under SERVER_MODE the worker pool gets PRM_ID_VACUUM_WORKER_COUNT threads and one task queue (thread_create_stats_worker_pool), and the master becomes a cubthread::daemon running vacuum_master_task every PRM_ID_VACUUM_MASTER_WAKEUP_INTERVAL ms; SA builds create no thread. Cross-check: log_vacuum_worker_pool is computed from logging flags but its one use is commented out (// m_log = log_vacuum_worker_pool) — dead configuration in this revision.

2.4 Thread identity — entry managers, system tdes, pool handoff

Vacuum threads are generic cubthread workers until an entry manager hook brands them. Both managers funnel into one helper:

// vacuum_init_thread_context -- src/query/vacuum.c
context.type = type; /* <- TT_VACUUM_MASTER or TT_VACUUM_WORKER */
context.vacuum_worker = worker;
context.check_interrupt = false; /* <- vacuum is immune to client interrupt checks */
assert (context.get_system_tdes () == NULL);
context.claim_system_worker (); /* <- new log_system_tdes; tran_index = LOG_SYSTEM_TRAN_INDEX */

After restart, claim_system_worker draws its tdes from the shared allocator in log_system_tran.cpp (systdes_claim_tdes: systb_Next_tranid seeded with LOG_SYSTEM_WORKER_FIRST_TRANID = NULL_TRANID - 1, stepped by -1, free-list reuse). The macros VACUUM_WORKER_INDEX_TO_TRANID / ..._TRANID_TO_INDEX still sit at the top of vacuum.c but have no remaining callers — leftovers of the old fixed-TRANID-per-slot design; binding is now dynamic, and recovery rebuilds system transactions via log_system_tdes::rv_get_or_alloc_tdes from TRANIDs in the log.
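
The id policy in the paragraph above (seed, step by -1, reuse from a free list) can be sketched as follows. This is a minimal model under stated assumptions, not the CUBRID allocator: `SysTranidAllocator` and its methods are hypothetical names, NULL_TRANID is assumed to be -1, and the real code hands out tdes objects rather than bare ids.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the id policy only; names are not CUBRID APIs.
constexpr int kNullTranid = -1;                      // assumed value of NULL_TRANID
constexpr int kFirstSystemTranid = kNullTranid - 1;  // mirrors LOG_SYSTEM_WORKER_FIRST_TRANID

class SysTranidAllocator
{
public:
  // Reuse a retired id if one exists; otherwise mint the next one, stepping by -1.
  int claim ()
  {
    if (!m_free.empty ())
      {
        int id = m_free.back ();
        m_free.pop_back ();
        return id;
      }
    return m_next--;
  }

  // Return an id to the free list so that a later claim can reuse it.
  void retire (int id)
  {
    assert (id <= kFirstSystemTranid);  // only system-range ids belong here
    m_free.push_back (id);
  }

private:
  int m_next = kFirstSystemTranid;
  std::vector<int> m_free;
};
```

Because ids come from a free list rather than from a slot index, nothing ties a given worker slot to a given TRANID, which is the "dynamic binding" the text contrasts with the old fixed-TRANID-per-slot macros.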

vacuum_master_entry_manager (extends cubthread::daemon_entry_manager) — no data members, two final overrides:

| Member | Role | Why it exists |
| --- | --- | --- |
| on_daemon_create | Assert vacuum_Master.state == VACUUM_WORKER_STATE_EXECUTE; vacuum_init_thread_context (..., TT_VACUUM_MASTER, &vacuum_Master) | Master’s VACUUM_WORKER is the static singleton; no pool claim |
| on_daemon_retire | vacuum_finalize (&context) (tagged // todo: is this the rightful place?); retire_system_worker; null vacuum_worker (asserting it was &vacuum_Master) | Piggybacks subsystem teardown on the daemon’s death (section 2.7) |

vacuum_worker_entry_manager (extends cubthread::entry_manager):

| Member | Role | Why it exists |
| --- | --- | --- |
| m_pool | resource_shared_pool<VACUUM_WORKER>* over the static 50-slot array; deleted in destructor | Non-owning: claim () pops a slot pointer off a mutex-guarded free stack, retire () pushes it back; no fixed thread-slot binding |
| claim_worker / retire_worker | Public pass-throughs to m_pool | Lets vacuum_sa_run_job claim a worker without pool hooks |
| on_create | tran_index = 0; vacuum_init_thread_context (..., TT_VACUUM_WORKER, m_pool->claim ()); vacuum_worker_allocate_resources (assert on failure); copy private_lru_index into the entry | Per pooled thread, before its first task: identity, tdes, buffers, private LRU handoff |
| on_retire | retire_system_worker; worker state = INACTIVE; m_pool->retire; null vacuum_worker; private_lru_index = -1 | Mirror of on_create; entry returns to the global manager clean |
| on_recycle | tran_index = LOG_SYSTEM_TRAN_INDEX | Recycling resets entries to NULL_TRAN_INDEX; vacuum must keep looking like the system transaction |

Invariant — vacuum thread identity is atomic. An entry is a vacuum thread iff type is TT_VACUUM_MASTER/TT_VACUUM_WORKER, vacuum_worker is non-NULL, a system tdes is claimed, and tran_index == LOG_SYSTEM_TRAN_INDEX. The hooks set and clear all four together (vacuum_init_thread_context asserts no tdes pre-exists; on_retire asserts vacuum_worker != NULL). Violated, vacuum_get_vacuum_worker asserts and vacuum log records could carry a client TRANID.
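
The pooled (rather than bound) slot handling that on_create and on_retire rely on can be sketched as a mutex-guarded free stack over a caller-owned array. This is an illustrative model only: `SharedSlotPool` and `Worker` are hypothetical stand-ins for resource_shared_pool and VACUUM_WORKER, and returning nullptr on exhaustion is a choice made for the sketch, not a documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

struct Worker { int id = 0; };  // hypothetical stand-in for VACUUM_WORKER

// Non-owning pool: it stores pointers into an array owned by the caller.
template <typename T>
class SharedSlotPool
{
public:
  SharedSlotPool (T *slots, std::size_t count) : m_capacity (count)
  {
    // Push in reverse so that slot 0 is the first one claimed.
    for (std::size_t i = count; i > 0; --i)
      m_free.push_back (&slots[i - 1]);
  }

  // Mirrors the destructor check described in the text: every slot must be back.
  ~SharedSlotPool () { assert (m_free.size () == m_capacity); }

  T *claim ()
  {
    std::lock_guard<std::mutex> guard (m_mutex);
    if (m_free.empty ())
      return nullptr;  // sketch-only policy for an exhausted pool
    T *slot = m_free.back ();
    m_free.pop_back ();
    return slot;
  }

  void retire (T *slot)
  {
    std::lock_guard<std::mutex> guard (m_mutex);
    m_free.push_back (slot);
  }

  std::size_t free_count () const
  {
    std::lock_guard<std::mutex> guard (m_mutex);
    return m_free.size ();
  }

private:
  mutable std::mutex m_mutex;
  std::vector<T *> m_free;
  std::size_t m_capacity;
};
```

Any thread may claim any slot, so the identity bundle has to be set at claim time and cleared at retire time; a slot pointer left behind in a thread entry after retire would alias a slot that another thread can now claim.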

SA mode drapes the same identity over the main thread temporarily: vacuum_convert_thread_to_master (save thread_p->type, set master identity, claim tdes only if none exists), vacuum_convert_thread_to_worker (same, plus vacuum_worker_allocate_resources with assert_release on failure), vacuum_restore_thread (restore type, null vacuum_worker, retire_system_worker, tran_index = LOG_SYSTEM_TRAN_INDEX); each tolerates thread_p == NULL by self-lookup. xvacuum brackets the SA pass with convert-to-master/restore; vacuum_sa_run_job nests convert-to-worker/back-to-master per block, asserting the saved types match, then retires the slot via retire_worker.

2.5 Lazy buffers — vacuum_worker_allocate_resources

// vacuum_worker_allocate_resources -- src/query/vacuum.c
assert (worker->state == VACUUM_WORKER_STATE::VACUUM_WORKER_STATE_INACTIVE);
if (worker->allocated_resources)
return NO_ERROR; /* <- idempotent: SA mode re-converts the same thread repeatedly */
worker->log_zip_p = log_zip_alloc (IO_PAGESIZE);
// ... condensed: NULL -> logpb_fatal_error + return ER_FAILED ...
worker->heap_objects = (VACUUM_HEAP_OBJECT *) malloc (worker->heap_objects_capacity * sizeof (VACUUM_HEAP_OBJECT));
// ... condensed: NULL -> goto error; same for undo_data_buffer (IO_PAGESIZE) ...
worker->prefetch_log_buffer = (char *) malloc (VACUUM_PREFETCH_LOG_BLOCK_BUFFER_PAGES * LOG_PAGESIZE);
// ... condensed: NULL -> goto error ... /* <- (1 + log_block_npages) log pages */
assert (logtb_get_system_tdes (thread_p) != NULL); /* <- tdes must already be claimed */
worker->allocated_resources = true;

Six branches: the short-circuit, four allocation failures (each calls logpb_fatal_error: a worker without buffers is server-fatal, since vacuum falling behind is unbounded debt), and success. The four allocations fill exactly the four VACUUM_WORKER buffer fields left NULL by vacuum_initialize: log_zip_p, heap_objects (VACUUM_DEFAULT_HEAP_OBJECT_BUFFER_SIZE), undo_data_buffer (IO_PAGESIZE), and prefetch_log_buffer. Only then does allocated_resources flip to true. The first failure returns directly; the rest goto error, where vacuum_finalize_worker frees whatever subset exists through four independent idempotent frees (log_zip_free plus three free_and_init). That idempotence also makes it the universal teardown that vacuum_finalize runs on all 50 slots plus vacuum_Master, allocated or not. The prefetch sizing ties memory to the Chapter 3 block geometry: VACUUM_PREFETCH_LOG_BLOCK_BUFFER_PAGES = 1 + log_block_npages, one whole log block plus one page.
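
The "idempotent allocate, one teardown for any partial state" shape can be sketched as below. It is a simplified model under assumptions: `WorkerBuffers`, `allocate_buffers`, `finalize_buffers`, the `fail_at` test hook and the heap-object buffer size are all hypothetical, and plain malloc/free stand in for log_zip_alloc and free_and_init.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the four VACUUM_WORKER buffer fields.
struct WorkerBuffers
{
  char *zip = nullptr;
  char *heap_objects = nullptr;
  char *undo = nullptr;
  char *prefetch = nullptr;
  bool allocated = false;
};

// Idempotent teardown: frees whatever subset exists and nulls each pointer,
// so it is safe on a never-allocated, half-allocated or fully allocated worker.
inline void finalize_buffers (WorkerBuffers &w)
{
  std::free (w.zip);          w.zip = nullptr;
  std::free (w.heap_objects); w.heap_objects = nullptr;
  std::free (w.undo);         w.undo = nullptr;
  std::free (w.prefetch);     w.prefetch = nullptr;
  w.allocated = false;
}

// fail_at (0 = never) simulates the n-th allocation returning NULL.
inline bool allocate_buffers (WorkerBuffers &w, std::size_t page_size,
                              std::size_t block_npages, int fail_at = 0)
{
  if (w.allocated)
    return true;  // short-circuit: repeated calls are no-ops

  int step = 0;
  auto alloc = [&] (std::size_t n) -> char * {
    return (++step == fail_at) ? nullptr : static_cast<char *> (std::malloc (n));
  };

  if ((w.zip = alloc (page_size)) == nullptr)
    return false;  // first failure: nothing to free yet

  bool ok = (w.heap_objects = alloc (64 * sizeof (long))) != nullptr  // size is illustrative
    && (w.undo = alloc (page_size)) != nullptr
    && (w.prefetch = alloc ((1 + block_npages) * page_size)) != nullptr;
  if (!ok)
    {
      finalize_buffers (w);  // shared error path frees the partial state
      return false;
    }
  w.allocated = true;
  return true;
}
```

Calling `finalize_buffers` twice in a row is harmless because `free (nullptr)` is a no-op, which is the property that lets one function serve both the error path and the shutdown sweep.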

2.6 Why first_page and last_page stay permanently fixed

vacuum_Data.first_page and last_page are hot on every master iteration and block append, so vacuum holds write latches on them for the whole of the master’s runtime. The page-buffer discipline is bent in exactly three wrappers:

// vacuum_fix_data_page -- src/query/vacuum.c
#define vacuum_fix_data_page(thread_p, vpidp) \
(vacuum_Data.first_page != NULL && VPID_EQ (pgbuf_get_vpid_ptr ((PAGE_PTR) vacuum_Data.first_page), vpidp) ? \
vacuum_Data.first_page : /* <- short-circuit: reuse held latch */ \
/* ... same test against vacuum_Data.last_page ... */ \
(VACUUM_DATA_PAGE *) pgbuf_fix (thread_p, vpidp, OLD_PAGE, PGBUF_LATCH_WRITE, PGBUF_UNCONDITIONAL_LATCH))

vacuum_unfix_data_page and vacuum_set_dirty_data_page apply the same identity test: a page aliasing the held pair is never unfixed and is dirtied with DONT_FREE. Cross-check note: vacuum_set_dirty_data_page is now an inline function taking the pointer by value, so its trailing data_page = NULL only clears the local copy — unlike the still-macro vacuum_unfix_data_page, it cannot null the caller’s pointer.

Invariant — exactly one extra fix per cached page, held by vacuum. Every fix of a first/last VPID must route through these wrappers; a raw pgbuf_fix/pgbuf_unfix on those VPIDs skews the fix count, which debug builds verify via vacuum_verify_vacuum_data_page_fix_count. Violated, either the latch leaks or the cached pointer dangles.
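
A toy model of the identity test makes the fix-count invariant concrete. Everything here is hypothetical (`ToyPageBuffer`, `Page`, `VacuumDataCache`, integer VPIDs); it only illustrates that requests for a cached VPID reuse the held pointer and never change the fix count.

```cpp
#include <cassert>
#include <map>

struct Page { int vpid = -1; int fix_count = 0; };

// Hypothetical stand-in for the page buffer: counts fixes per page.
class ToyPageBuffer
{
public:
  Page *fix (int vpid)
  {
    Page &page = m_pages[vpid];  // std::map keeps node addresses stable
    page.vpid = vpid;
    ++page.fix_count;
    return &page;
  }
  void unfix (Page *page)
  {
    assert (page->fix_count > 0);
    --page->fix_count;
  }
private:
  std::map<int, Page> m_pages;
};

struct VacuumDataCache { Page *first_page = nullptr; Page *last_page = nullptr; };

// Reuse the permanently held pointer when the VPID matches the cached pair.
inline Page *fix_data_page (ToyPageBuffer &pb, const VacuumDataCache &c, int vpid)
{
  if (c.first_page != nullptr && c.first_page->vpid == vpid) return c.first_page;
  if (c.last_page != nullptr && c.last_page->vpid == vpid) return c.last_page;
  return pb.fix (vpid);
}

// Never unfix a page that aliases the cached pair; always null the caller's pointer.
inline void unfix_data_page (ToyPageBuffer &pb, const VacuumDataCache &c, Page *&page)
{
  if (page != c.first_page && page != c.last_page)
    pb.unfix (page);
  page = nullptr;
}
```

In this model a raw `pb.fix (10)` on a cached VPID would push the count to 2, which is the skew the debug fix-count check is described as catching.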

Establishment always funnels through vacuum_data_load_first_and_last_page, not through boot. vacuum_data_load_and_recover deliberately unloads the pair on exit: the in-code comment on vacuum_master_task::execute explains that the load “was initially in boot_restart_server”, but boot’s commit complains about (and unfixes) any page its thread left fixed, “so we have to load the data here (vacuum master never commits)”. Hence the master’s first iteration re-fixes the pair in SERVER_MODE; xvacuum and vacuum_sa_reflect_last_blockid do it in SA mode. Load branches: is_loaded early return; first-page fix failure (assert_release, return); the single-page case where vpid_first == vpid_last makes last_page alias first_page (why vacuum_unfix_first_and_last_data_page only unfixes last_page when it differs); last-page fix failure (unfix both, assert_release). The inverse, vacuum_data_unload_first_and_last_page, early-returns when not loaded, stashes both VPIDs into the file-scope vacuum_Data_load struct, unfixes the pair, clears is_loaded. VACUUM_DATA_LOAD, in full:

| Field | Role | Why it exists |
| --- | --- | --- |
| vpid_first | First-page VPID, saved at unload | Reload without re-reading the file descriptor; NULL means never loaded (checked by vacuum_sa_reflect_last_blockid) |
| vpid_last | Last-page VPID, saved at unload | Same; reload re-fixes the tail directly instead of walking the page chain |

2.7 Shutdown — workers first, master last, finalize inside

xboot_shutdown_server (boot_sr.c) encodes the ordering contract in its comments:

// xboot_shutdown_server -- src/transaction/boot_sr.c
log_abort_all_active_transaction (thread_p);
vacuum_stop_workers (thread_p); /* <- 1: no new jobs, drain pool */
// ... condensed: stats reflection, caches, boot_remove_all_temp_volumes ...
// only after all logging is finished can this vacuum master be stopped; boot_remove_all_temp_volumes
// may add a final log entry
vacuum_stop_master (thread_p); /* <- 2: daemon dies, vacuum_finalize runs */

vacuum_stop_workers early-returns when !vacuum_Is_booted, calls vacuum_notify_server_shutdown → vacuum_Data.shutdown_sequence.request_shutdown (), then (if the pool exists) logs pool stats, stop_execution (), destroy_worker_pool, and deletes vacuum_Worker_entry_manager, which destroys the resource pool, whose destructor asserts every worker was retired. vacuum_stop_master (same guard) destroys the daemon if one exists (triggering on_daemon_retire → vacuum_finalize on the master’s own thread), deletes the master entry manager, and clears vacuum_Is_booted. The error label of boot_restart_server calls the same two functions back to back, so a failed boot reuses the ordered teardown.

vacuum_shutdown_sequence — all fields:

| Field | Role | Why it exists |
| --- | --- | --- |
| m_state | NO_SHUTDOWN to SHUTDOWN_REQUESTED (SERVER_MODE only) to SHUTDOWN_REGISTERED | Separates “shutdown asked” from “master acknowledged”; polled via is_shutdown_requested / check_shutdown_request |
| m_state_mutex | SERVER_MODE only; guards transitions and the condvar | Request and acknowledgement happen on different threads |
| m_condvar | SERVER_MODE only; requester waits inside request_shutdown | Makes request_shutdown synchronous: it returns only after registration |

The handshake: request_shutdown returns immediately if already SHUTDOWN_REGISTERED (re-requests are no-ops), else sets SHUTDOWN_REQUESTED and blocks until m_state == SHUTDOWN_REGISTERED || vacuum_Master_daemon == NULL. On its next wakeup the master’s vacuum_master_task::check_shutdown calls check_shutdown_request: NO_SHUTDOWN → false; SHUTDOWN_REGISTERED → true; SHUTDOWN_REQUESTED → take the mutex, set SHUTDOWN_REGISTERED, notify_one, true (in SA builds this branch is assert (false) — no requester thread exists). If no daemon was ever created, the requester self-registers; in SA mode request_shutdown jumps straight to SHUTDOWN_REGISTERED.

stateDiagram-v2
    [*] --> NO_SHUTDOWN
    NO_SHUTDOWN --> SHUTDOWN_REQUESTED : request_shutdown\nSERVER_MODE, requester blocks on condvar
    SHUTDOWN_REQUESTED --> SHUTDOWN_REGISTERED : master check_shutdown_request\nor requester self-registers when daemon is NULL
    NO_SHUTDOWN --> SHUTDOWN_REGISTERED : request_shutdown in SA_MODE
    SHUTDOWN_REGISTERED --> [*]

Figure 2-2: vacuum_shutdown_sequence states; the REQUESTED state exists only in SERVER_MODE.

Invariant — request before destroy. request_shutdown must run while the master daemon still exists (or before it was ever created); only the master’s acknowledgement or the daemon-NULL escape unblocks the requester. Workers-before-master ordering guarantees this; reversed, the shutdown thread could park on m_condvar with nobody left to notify it.
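
The three-state handshake can be modeled with a mutex and a condition variable. This is a hedged sketch of the SERVER_MODE flavor, not the CUBRID class: `ShutdownSequence` is a hypothetical name, the `daemon_alive` argument stands in for the `vacuum_Master_daemon == NULL` test, and the model takes the mutex on every poll for simplicity.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model of the request/acknowledge handshake.
class ShutdownSequence
{
public:
  enum State { NO_SHUTDOWN, SHUTDOWN_REQUESTED, SHUTDOWN_REGISTERED };

  // Requester side. Blocks until the master registers, unless no daemon exists.
  void request_shutdown (bool daemon_alive)
  {
    std::unique_lock<std::mutex> lock (m_mutex);
    if (m_state == SHUTDOWN_REGISTERED)
      return;  // re-requests are no-ops
    m_state = SHUTDOWN_REQUESTED;
    m_condvar.wait (lock, [&] { return m_state == SHUTDOWN_REGISTERED || !daemon_alive; });
    m_state = SHUTDOWN_REGISTERED;  // self-register when nobody can acknowledge
  }

  // Master side, polled on each wakeup. Returns true once shutdown is in effect.
  bool check_shutdown_request ()
  {
    std::lock_guard<std::mutex> guard (m_mutex);
    if (m_state == NO_SHUTDOWN)
      return false;
    if (m_state == SHUTDOWN_REQUESTED)
      {
        m_state = SHUTDOWN_REGISTERED;  // acknowledge and wake the requester
        m_condvar.notify_one ();
      }
    return true;
  }

  State state () const
  {
    std::lock_guard<std::mutex> guard (m_mutex);
    return m_state;
  }

private:
  mutable std::mutex m_mutex;
  std::condition_variable m_condvar;
  State m_state = NO_SHUTDOWN;
};
```

With a live daemon, `request_shutdown (true)` only returns after another thread has called `check_shutdown_request ()`. If that thread has already been destroyed, the wait never ends, which is the hazard the workers-before-master ordering avoids.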

vacuum_finalize (reached from the init error path, the master’s retirement, or end of SA xvacuum) walks six guarded steps: disable-vacuum return; assert (!vacuum_is_work_in_progress ...); drain vacuum_Finished_job_queue via vacuum_data_mark_finished (Chapter 9), assert it emptied, delete it; loop-consume vacuum_Block_data_buffer with vacuum_consume_buffer_log_blocks — a loop because consuming appends to vacuum data, which itself logs and can complete yet another block — with a safe-guard assert/break if vacuum_Data.is_loaded is false; in SA builds, vacuum_data_empty_update_last_blockid; then vacuum_data_unload_first_and_last_page, a belt-and-braces pgbuf_unfix_all, vacuum_finalize_worker over all 50 slots plus vacuum_Master, and pthread_mutex_destroy on the dropped-files mutex.

SA mode inverts who calls finalize. (The SERVER_MODE build of xvacuum is a stub returning ER_VACUUM_CS_NOT_AVAILABLE — a client-issued VACUUM statement is rejected; the rest is the SA compile branch.) No daemon exists, so vacuum_stop_master never triggers it — vacuum_finalize runs at the end of xvacuum, inside the convert-to-master/restore bracket, after which vacuum_Data.is_vacuum_complete = true makes further xvacuum calls no-ops (the flag vacuum_initialize reset at boot). After vacuum_stop_master, the SA branch of xboot_shutdown_server adds one extra step: vacuum_sa_reflect_last_blockid, which early-returns if vacuum_Data_load.vpid_first is NULL (fresh createdb or aborted boot) or if vacuum_Data.is_restoredb_session is set (“restoredb doesn’t vacuum” — the lone consumer of the section 2.1 flag); otherwise it reloads the page pair, takes logpb_last_complete_blockid () (early-out on VACUUM_NULL_LOG_BLOCKID), persists it through vacuum_data_empty_update_last_blockid, and unloads again.

  1. Startup is two-phase around crash recovery: vacuum_initialize (parameter capture, queues, static vacuum_Master/vacuum_Workers[50]) runs before log_initialize; vacuum_boot (data load + recovery, dropped-files load, daemon and pool creation) runs after, gated by vacuum_Is_booted.
  2. The on-disk pair is born at createdb: vacuum_create_file_for_vacuum_data persists its first-page VPID in the file descriptor; vacuum_create_file_for_dropped_files uses a sticky first page — the two boot-time lookup paths mirror this.
  3. Thread identity is granted only through entry-manager hooks or the SA convert functions, always as a bundle: thread type, vacuum_worker pointer, a dynamically claimed system tdes with negative TRANID (VACUUM_WORKER_INDEX_TO_TRANID is a callerless leftover), LOG_SYSTEM_TRAN_INDEX, and the private LRU handoff.
  4. Worker slots are pooled, not bound: a resource_shared_pool over the static array lets any thread claim any slot; only private_lru_index is eager, everything else arrives lazily and idempotently in vacuum_worker_allocate_resources, with failure escalated to logpb_fatal_error.
  5. first_page/last_page stay write-latched for the master’s whole runtime — established by the master’s first iteration (or the SA pass), not by boot, because boot’s commit would unfix its thread’s latches; vacuum_fix_data_page, vacuum_unfix_data_page, and the DONT_FREE branch of vacuum_set_dirty_data_page are the only legal access paths, verified by debug fix-count checks.
  6. Shutdown is workers-then-master: vacuum_stop_workers runs the synchronous vacuum_shutdown_sequence handshake while the daemon can still acknowledge; vacuum_stop_master then destroys it, and on_daemon_retire runs vacuum_finalize. The boot error path reuses the same two calls.
  7. SA mode has no daemon: vacuum_finalize runs inside xvacuum (then is_vacuum_complete blocks re-entry), and shutdown appends vacuum_sa_reflect_last_blockid — skipped for restoredb sessions via the is_restoredb_session flag captured at vacuum_initialize.

Chapter 3: Block Birth in the Log Append Path

The subsystem is booted (Ch 2) and idle. Where does a block come from? Nowhere inside vacuum.c: blocks are born as a side effect of ordinary transactions appending log records, inside prior_lsa_next_record_internal (log_append.cpp) — the tree’s sole caller of vacuum_produce_log_block_data — under the prior-LSA mutex. Vacuum only receives them through vacuum_Block_data_buffer. This chapter traces every branch of the producer side.

Every log record is first materialized as a log_prior_node and appended to the prior list — an in-memory queue the log flusher later copies into log pages. If the record is MVCC-flavored, two extra things happen under the same mutex hold: its vacuum_info.prev_mvcc_op_log_lsa is patched to point at the previous MVCC record, and the per-block accumulator in log_Gl.hdr is folded forward. When a record’s start LSA lands in a different log block than the accumulator, the pending block is closed and pushed to vacuum as a vacuum_data_entry.

flowchart LR
  NODE["log_prior_node"] -->|"prior_lsa_next_record\nunder prior_lsa_mutex"| LIST["prior list"]
  NODE -->|"MVCC type: patch vacuum_info,\nfold mvccid into header"| HDR["log_Gl.hdr accumulator\nmvcc_op_log_lsa\noldest_visible_mvccid\nnewest_block_mvccid\ndoes_block_need_vacuum"]
  HDR -->|"block boundary crossed:\nvacuum_produce_log_block_data"| ENTRY["vacuum_data_entry"]
  ENTRY --> QUEUE["vacuum_Block_data_buffer\nlockfree circular_queue, cap 1024"]
  QUEUE -->|"drained by Ch 4"| VD["vacuum data"]

Figure 3-1: the producer side. Everything left of the queue runs in transaction threads inside log_append.cpp; vacuum only consumes.

3.2 log_prior_node and the log_Gl.hdr accumulator

// log_prior_node -- src/transaction/log_append.hpp
struct log_prior_node
{
LOG_RECORD_HEADER log_header;
LOG_LSA start_lsa; /* for assertion */
// ... condensed: tde_encrypted ...
int data_header_length;
char *data_header;
// ... condensed: ulength/udata, rlength/rdata payload pointers ...
LOG_PRIOR_NODE *next;
};

Three fields matter. start_lsa is assigned by prior_lsa_start_append and drives the boundary test. log_header.type selects the dispatch branch. data_header holds the type-specific fixed header (e.g., LOG_REC_MVCC_UNDOREDO) still in memory — the append path casts it and patches vacuum_info in place before the bytes reach a log page, which is what makes the backward MVCC chain possible.

The accumulator lives in log_header — the file header log_Gl.hdr (log_storage.hpp), not the per-record header. Its fields change meaning with does_block_need_vacuum:

| log_Gl.hdr field | When does_block_need_vacuum == false | When does_block_need_vacuum == true |
| --- | --- | --- |
| mvcc_op_log_lsa | Last MVCC op of an already produced block (kept; see Invariant 3-E) | Last MVCC op of the pending block |
| oldest_visible_mvccid | Stale; re-sampled at next block birth | Horizon frozen at the block’s first MVCC record |
| newest_block_mvccid | MVCCID_NULL (reset on produce) | Running max of the block’s MVCCIDs |
| does_block_need_vacuum | Boundary branch disarmed | Pending block exists; branch armed |

3.3 prior_lsa_next_record_internal — branch-complete walkthrough

Both public entry points funnel here — prior_lsa_next_record takes the mutex itself (LOG_PRIOR_LSA_WITHOUT_LOCK); prior_lsa_next_record_with_lock is for callers already holding it. First, prior_lsa_start_append stamps node->start_lsa from log_Gl.prior_info.prior_lsa and threads the per-transaction undo chain — except for system-worker transactions outside a sysop (vacuum workers themselves log this way), whose chain LSAs are nulled. Then three zones run: (A) block-boundary check, (B) record-type dispatch, (C) list insertion and overflow.

flowchart TD
  A1["prior_lsa_start_append\nstart_lsa assigned"] --> B0{"LOG_ISRESTARTED and\ndoes_block_need_vacuum?"}
  B0 -->|no| C0
  B0 -->|"yes, and blockid of\nhdr.mvcc_op_log_lsa\n!= blockid of start_lsa"| B2["vacuum_produce_log_block_data\ncloses pending block"]
  B2 --> C0{"record type?"}
  C0 -->|"4 MVCC shapes"| D1["extract vacuum_info + mvccid\npatch prev_mvcc_op_log_lsa\nprior_update_header_mvcc_info"]
  C0 -->|"5 recovery-bookkeeping\ntype groups"| E1["save LSAs into\ntdes rcv state"]
  C0 -->|"anything else"| F0
  D1 --> F0["append payloads\nprior_lsa_end_append\ninsert at prior list tail"]
  E1 --> F0
  F0 --> I0{"WITHOUT_LOCK and\nlist_size >= log buffer size?"}
  I0 -->|no| Z["return start_lsa"]
  I0 -->|"yes, server, not crash recovery"| I3["wake log flush daemon\nsleep 1ms"]
  I0 -->|"yes, crash recovery or SA"| I4["flush prior list inline\nunder LOG_CS"]
  I3 --> Z
  I4 --> Z

Figure 3-2: prior_lsa_next_record_internal. The vacuum spine is B0 to D1; E1 covers the non-vacuum dispatch arms.

Zone A runs before the type dispatch, on every record type:

// prior_lsa_next_record_internal -- src/transaction/log_append.cpp
if (LOG_ISRESTARTED () && log_Gl.hdr.does_block_need_vacuum)
{
assert (!LSA_ISNULL (&log_Gl.hdr.mvcc_op_log_lsa));
if (vacuum_get_log_blockid (log_Gl.hdr.mvcc_op_log_lsa.pageid) != vacuum_get_log_blockid (start_lsa.pageid))
{
assert (vacuum_get_log_blockid (log_Gl.hdr.mvcc_op_log_lsa.pageid)
<= (vacuum_get_log_blockid (start_lsa.pageid) - 1)); /* <- pending block strictly older */
vacuum_produce_log_block_data (thread_p);
}
}

Two consequences. First, a pending block is closed by the next record past the boundary, of any type — a plain LOG_COMMIT suffices; on a quiet system the block lingers open until traffic resumes, which the consumer side must tolerate (Ch 4, Ch 9). Second, LOG_ISRESTARTED () (log_Gl.rcv_phase == LOG_RESTARTED, log_impl.h) disarms production during recovery — Invariant 3-C.

Zone B recognizes exactly four MVCC shapes, with a three-way extraction because they carry the payload in different places:

// prior_lsa_next_record_internal -- src/transaction/log_append.cpp
if (node->log_header.type == LOG_MVCC_UNDO_DATA || node->log_header.type == LOG_MVCC_UNDOREDO_DATA
|| node->log_header.type == LOG_MVCC_DIFF_UNDOREDO_DATA
|| (node->log_header.type == LOG_SYSOP_END
&& ((LOG_REC_SYSOP_END *) node->data_header)->type == LOG_SYSOP_END_LOGICAL_MVCC_UNDO))
{
// ... condensed: vacuum_info / mvccid from LOG_REC_MVCC_UNDO (undo and
// sysop_end.mvcc_undo cases) or LOG_REC_MVCC_UNDOREDO (undoredo
// and diff-undoredo, which share the struct) ...
/* Save previous mvcc operation log lsa to vacuum info */
LSA_COPY (&vacuum_info->prev_mvcc_op_log_lsa, &log_Gl.hdr.mvcc_op_log_lsa);
prior_update_header_mvcc_info (start_lsa, mvccid);
}

The LSA_COPY welds the backward MVCC chain: log_vacuum_info::prev_mvcc_op_log_lsa (log_record.hpp) of the new record gets the previous MVCC record’s LSA, written directly into the node’s data_header bytes. The worker (Ch 7) and crash recovery (Ch 11) walk this chain backwards.

The five non-vacuum else if arms: LOG_SYSOP_START_POSTPONE, non-MVCC LOG_SYSOP_END, LOG_COMMIT_WITH_POSTPONE (+_OBSOLETE), LOG_SYSOP_ATOMIC_START, and LOG_COMMIT/LOG_ABORT — all save recovery-bookkeeping LSAs into tdes->rcv (or flip tdes state) under the same mutex hold; none touch the vacuum accumulator.

Zone C inserts the node at the prior-list tail, grows list_size (in bytes), and — only in WITHOUT_LOCK mode, after releasing the mutex — checks overflow against logpb_get_memsize (). On overflow, server mode outside crash recovery wakes the flush daemon and naps 1 ms; crash-recovery and standalone modes flush the list inline under LOG_CS. With-lock callers skip this: they cannot flush while holding the mutex.

3.4 prior_update_header_mvcc_info — folding the block forward

// prior_update_header_mvcc_info -- src/transaction/log_append.cpp
static void
prior_update_header_mvcc_info (const LOG_LSA &record_lsa, MVCCID mvccid)
{
if (!log_Gl.hdr.does_block_need_vacuum)
{
// first mvcc record for this block
log_Gl.hdr.oldest_visible_mvccid = log_Gl.mvcc_table.get_global_oldest_visible (); /* <- sampled ONCE */
log_Gl.hdr.newest_block_mvccid = mvccid;
}
else
{
// ... condensed: sanity asserts ...
assert (vacuum_get_log_blockid (log_Gl.hdr.mvcc_op_log_lsa.pageid) == vacuum_get_log_blockid (record_lsa.pageid));
if (log_Gl.hdr.newest_block_mvccid < mvccid)
{
log_Gl.hdr.newest_block_mvccid = mvccid; /* <- running max; ids may arrive out of order */
}
}
log_Gl.hdr.mvcc_op_log_lsa = record_lsa;
log_Gl.hdr.does_block_need_vacuum = true;
}

A block is literally born on the false branch — that is the only place oldest_visible_mvccid is (re)sampled.

Invariant 3-A — the watermark is frozen at logging time. oldest_visible_mvccid is sampled exactly once, when the block’s first MVCC record is logged — never at consumption. Re-sampling later would let the advanced horizon exclude transactions still live when the block’s undo was created, breaking monotonic oldest_unvacuumed tracking (Ch 5); the recovery path even refuses to reset it between rebuilt blocks (“we don’t reset data.oldest_visible_mvccid between blocks” in vacuum_recover_lost_block_data). Recovery is not identical to the live path here — see Cross-check Notes (3.7).

Invariant 3-B — one accumulator, one block. Every record folded into the accumulator lies in the same log block. Enforced by ordering: zone A runs before prior_update_header_mvcc_info under one hold of prior_lsa_mutex, and the else branch asserts vacuum_get_log_blockid equality. Otherwise the entry’s blockid would not cover all its records.
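
The interplay of the boundary check and the fold can be replayed in a small model. It is a simplification under stated assumptions: `ProducerModel` and `BlockEntry` are hypothetical, records are identified by page id instead of a full LSA, MVCCID_NULL is assumed to be 0, and the global horizon is passed in by the caller instead of being read from the MVCC table.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct BlockEntry
{
  int64_t blockid;
  int64_t last_mvcc_pageid;  // page of the block's LAST MVCC record
  uint64_t oldest_visible;
  uint64_t newest;
};

// Hypothetical single-threaded model of the accumulator in log_Gl.hdr.
class ProducerModel
{
public:
  explicit ProducerModel (int64_t block_npages) : m_npages (block_npages) {}

  void append (int64_t pageid, bool is_mvcc, uint64_t mvccid, uint64_t global_oldest_visible)
  {
    // Zone A: any record starting in another block closes the pending one.
    if (m_need_vacuum && blockid (m_mvcc_op_pageid) != blockid (pageid))
      {
        assert (blockid (m_mvcc_op_pageid) < blockid (pageid));
        m_produced.push_back (BlockEntry { blockid (m_mvcc_op_pageid), m_mvcc_op_pageid,
                                           m_oldest_visible, m_newest });
        m_need_vacuum = false;
        m_newest = 0;  // assumed MVCCID_NULL; m_mvcc_op_pageid is deliberately kept
      }
    if (!is_mvcc)
      return;
    // Fold: sample the horizon once at block birth, then keep a running max.
    if (!m_need_vacuum)
      {
        m_oldest_visible = global_oldest_visible;
        m_newest = mvccid;
      }
    else if (m_newest < mvccid)
      {
        m_newest = mvccid;
      }
    m_mvcc_op_pageid = pageid;
    m_need_vacuum = true;
  }

  bool pending () const { return m_need_vacuum; }
  const std::vector<BlockEntry> &produced () const { return m_produced; }

private:
  int64_t blockid (int64_t pageid) const { return pageid / m_npages; }

  int64_t m_npages;
  int64_t m_mvcc_op_pageid = -1;
  uint64_t m_oldest_visible = 0;
  uint64_t m_newest = 0;
  bool m_need_vacuum = false;
  std::vector<BlockEntry> m_produced;
};
```

Because the boundary check runs before the fold inside one call, the fold never sees a record from a different block than the pending one, which is what the same-block assert in the else branch relies on.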

Invariant 3-C — recovery never produces. The LOG_ISRESTARTED guard keeps redo-time appends from pushing entries; blocks pending at crash time are rebuilt by vacuum_recover_lost_block_data instead (Ch 11). Without it, replayed appends would mint duplicates of blocks already registered in vacuum data before the crash.

3.5 vacuum_get_log_blockid — fixed block geometry

// vacuum_get_log_blockid -- src/query/vacuum.c
VACUUM_LOG_BLOCKID
vacuum_get_log_blockid (LOG_PAGEID pageid)
{
if (prm_get_bool_value (PRM_ID_DISABLE_VACUUM) || pageid == NULL_PAGEID)
{
return VACUUM_NULL_LOG_BLOCKID;
}
assert (vacuum_Data.log_block_npages != 0);
return pageid / vacuum_Data.log_block_npages;
}
// VACUUM_FIRST_LOG_PAGEID_IN_BLOCK -- src/query/vacuum.c
#define VACUUM_FIRST_LOG_PAGEID_IN_BLOCK(blockid) \
((blockid) * vacuum_Data.log_block_npages)
#define VACUUM_LAST_LOG_PAGEID_IN_BLOCK(blockid) \
(VACUUM_FIRST_LOG_PAGEID_IN_BLOCK (blockid + 1) - 1)

A block is pure arithmetic; no block object exists until the boundary branch decides one has ended. The disabled/NULL_PAGEID early return yields VACUUM_NULL_LOG_BLOCKID (-1, log_common_impl.h), making the boundary comparison inert when vacuum is off.

Invariant 3-D — block boundaries align to a fixed log-page count. Block b covers exactly pages [b * log_block_npages, (b+1) * log_block_npages - 1]; division and multiplication above are exact inverses. log_block_npages is set once in vacuum_initialize (Ch 2) and never changes — every blockid persisted in vacuum data encodes the geometry. The worker’s page-range termination (Ch 7) and recovery’s per-block stop page (Ch 11) rely on it.
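
The geometry can be checked with a few constexpr helpers. This is a standalone sketch, not the CUBRID functions: the helper names are hypothetical, NULL_PAGEID is assumed to be -1, the disable-vacuum parameter check is omitted, and 31 pages per block is only an illustrative value.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kNullBlockid = -1;  // VACUUM_NULL_LOG_BLOCKID, per the text
constexpr int64_t kNullPageid = -1;   // assumed value of NULL_PAGEID

// Page to block: integer division by the fixed block size.
constexpr int64_t block_of (int64_t pageid, int64_t npages)
{
  return pageid == kNullPageid ? kNullBlockid : pageid / npages;
}

// Block to its first and last page, mirroring the two macros above.
constexpr int64_t first_page_of (int64_t blockid, int64_t npages)
{
  return blockid * npages;
}

constexpr int64_t last_page_of (int64_t blockid, int64_t npages)
{
  return first_page_of (blockid + 1, npages) - 1;
}

// Every page in [first, last] maps back to its block; the next page does not.
static_assert (block_of (first_page_of (5, 31), 31) == 5, "first page round-trips");
static_assert (block_of (last_page_of (5, 31), 31) == 5, "last page round-trips");
static_assert (block_of (last_page_of (5, 31) + 1, 31) == 6, "next page starts block 6");
```

The round trip holds from block to page and back; going from page to block and then to `first_page_of` rounds down to the block start, which is why a blockid alone is enough to recover the page range.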

3.6 vacuum_produce_log_block_data — minting the entry

// vacuum_produce_log_block_data -- src/query/vacuum.c
void
vacuum_produce_log_block_data (THREAD_ENTRY * thread_p)
{
if (prm_get_bool_value (PRM_ID_DISABLE_VACUUM))
{
return; /* branch 1: vacuum disabled */
}
assert (log_Gl.hdr.does_block_need_vacuum == true);
VACUUM_DATA_ENTRY block_data { log_Gl.hdr }; /* <- snapshot the accumulator */
// reset info for next block
log_Gl.hdr.does_block_need_vacuum = false;
log_Gl.hdr.newest_block_mvccid = MVCCID_NULL; /* <- mvcc_op_log_lsa deliberately NOT reset */
if (vacuum_Block_data_buffer == NULL)
{
assert (false);
return; /* branch 2: not booted, debug-only trap */
}
// ... condensed: vacuum_er_log of the new entry ...
if (!vacuum_Block_data_buffer->produce (block_data))
{
/* TODO: ... Make sure that we do not lose vacuum data ... */
vacuum_er_log_error (VACUUM_ER_LOG_ERROR, "%s", "Cannot produce new log block data! The buffer is already full.");
assert (false);
return; /* branch 3: buffer full -- entry DROPPED in release */
}
perfmon_add_stat (thread_p, PSTAT_VAC_NUM_TO_VACUUM_LOG_PAGES, vacuum_Data.log_block_npages);
}

The entry constructor (a log_header overload delegates to this one) derives blockid from the accumulator’s last-record LSA:

// vacuum_data_entry::vacuum_data_entry -- src/query/vacuum.c
vacuum_data_entry::vacuum_data_entry (const log_lsa &lsa, MVCCID oldest, MVCCID newest)
: blockid (VACUUM_NULL_LOG_BLOCKID)
, start_lsa (lsa) /* <- lsa of LAST mvcc op in the block, despite the name */
, oldest_visible_mvccid (oldest)
, newest_mvccid (newest)
{
// ... condensed: asserts, incl. oldest <= newest ...
blockid = vacuum_get_log_blockid (start_lsa.pageid);
}

Note start_lsa is where the worker starts its backward walk — the last MVCC op — not the block’s first page.

blockid is a VACUUM_LOG_BLOCKID — an int64_t alias from log_storage.hpp — but only the low 61 bits carry the block number; the top three are lifecycle flags, set later by the master and workers (Ch 5-7), never by the producer:

// VACUUM_DATA_ENTRY_FLAG_MASK -- src/query/vacuum.c
#define VACUUM_DATA_ENTRY_FLAG_MASK 0xE000000000000000
#define VACUUM_DATA_ENTRY_BLOCKID_MASK 0x1FFFFFFFFFFFFFFF
#define VACUUM_BLOCK_STATUS_MASK 0xC000000000000000
#define VACUUM_BLOCK_STATUS_VACUUMED 0x8000000000000000
#define VACUUM_BLOCK_STATUS_IN_PROGRESS_VACUUM 0x4000000000000000
#define VACUUM_BLOCK_STATUS_AVAILABLE 0x0000000000000000
#define VACUUM_BLOCK_FLAG_INTERRUPTED 0x2000000000000000

| Bits | Meaning |
| --- | --- |
| 63-62 | Status: 10 vacuumed, 01 in-progress, 00 available |
| 61 | VACUUM_BLOCK_FLAG_INTERRUPTED: job was cut short (Ch 6) |
| 60-0 | Block number, extracted by VACUUM_BLOCKID_WITHOUT_FLAGS |

(The in-source comment above the masks — “first bit will be used for this flag” — predates the three-bit reality.) A newborn entry’s flags are all zero — VACUUM_BLOCK_STATUS_AVAILABLE — because vacuum_get_log_blockid returns a pure quotient, far below bit 61.
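
The bit layout can be exercised in isolation. The following is a minimal sketch, not code from vacuum.c: the mask values are copied from the excerpt, while the helper names (`block_number`, `block_status`, `is_interrupted`, `with_status`) are hypothetical, and an unsigned 64-bit type is used instead of the int64_t alias purely to keep the bit arithmetic free of sign concerns.

```cpp
#include <cassert>
#include <cstdint>

// Mask values copied from the excerpt; the names are shortened for the sketch.
constexpr std::uint64_t FLAG_MASK          = 0xE000000000000000ULL;
constexpr std::uint64_t BLOCKID_MASK       = 0x1FFFFFFFFFFFFFFFULL;
constexpr std::uint64_t STATUS_MASK        = 0xC000000000000000ULL;
constexpr std::uint64_t STATUS_VACUUMED    = 0x8000000000000000ULL;
constexpr std::uint64_t STATUS_IN_PROGRESS = 0x4000000000000000ULL;
constexpr std::uint64_t STATUS_AVAILABLE   = 0x0000000000000000ULL;
constexpr std::uint64_t FLAG_INTERRUPTED   = 0x2000000000000000ULL;

// The flag bits and the block number partition the 64-bit word exactly.
static_assert ((FLAG_MASK | BLOCKID_MASK) == ~0ULL, "masks must cover the word");
static_assert ((FLAG_MASK & BLOCKID_MASK) == 0, "masks must not overlap");

// Hypothetical helpers (not CUBRID functions).
constexpr std::uint64_t block_number (std::uint64_t raw) { return raw & BLOCKID_MASK; }
constexpr std::uint64_t block_status (std::uint64_t raw) { return raw & STATUS_MASK; }
constexpr bool is_interrupted (std::uint64_t raw) { return (raw & FLAG_INTERRUPTED) != 0; }

// Replace the two status bits, leaving the interrupted flag and number intact.
constexpr std::uint64_t with_status (std::uint64_t raw, std::uint64_t status)
{
  return (raw & ~STATUS_MASK) | status;
}
```

A newborn entry such as block 42 reports `STATUS_AVAILABLE` because its high bits are zero; moving it to in-progress and then to vacuumed changes only bits 63-62, so `block_number` keeps returning 42.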

Branch 3 deserves emphasis: vacuum_Block_data_buffer is a lock-free circular queue of VACUUM_BLOCK_DATA_BUFFER_CAPACITY (1024) entries, allocated in vacuum_initialize. If full, the entry is logged, asserted on, and lost in a release build — the accumulator was already reset two lines earlier, so nothing retries. The in-source TODOs acknowledge this; in practice the consumer (vacuum_consume_buffer_log_blocks, Ch 4) drains far faster than 1024 blocks of log can be written.

Invariant 3-E — the MVCC chain is continuous across blocks. vacuum_produce_log_block_data resets does_block_need_vacuum and newest_block_mvccid but leaves log_Gl.hdr.mvcc_op_log_lsa pointing at the previous block’s last MVCC record. The next block’s first MVCC record therefore links back across the boundary — the LSA_COPY in zone B reads the stale value before prior_update_header_mvcc_info overwrites it. This lets vacuum_recover_lost_block_data rebuild several lost blocks in one backward walk (Ch 11); resetting the field to NULL would cap recovery at a single block.

3.7 Cross-check notes — producer vs. recovery rebuild


The live producer and the crash-recovery rebuilder (vacuum_recover_lost_block_data, Ch 11) mint entries by two different rules, easy to conflate:

  • oldest_visible_mvccid source. The live path samples the global horizon once per block via get_global_oldest_visible (Invariant 3-A). Recovery cannot — that horizon is gone — so it carries the minimum MVCCID seen while replaying the block’s records (MVCC_ID_PRECEDES); a later, smaller MVCCID implies a transaction active during the block, so the minimum is the safe substitute. The two values need not be equal.
  • Direction. The live path closes the current block going forward; recovery walks prev_mvcc_op_log_lsa backward across several blocks and produces them oldest-first off a std::stack.
  • Last block re-armed, not produced. If the rebuilt block is the one the live header still owns (blockid == vacuum_get_log_blockid (prior_lsa.pageid)), recovery restores it into log_Gl.hdr instead of pushing it — the seam where the two paths rejoin.
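
The recovery-side rules above can be sketched as a toy backward walk. This is an illustration under stated assumptions, not vacuum_recover_lost_block_data itself: `MvccRecord`, `RebuiltBlock`, and `rebuild_lost_blocks` are hypothetical names, the back link is modeled as a vector index instead of an LSA, the block number is taken to be `pageid / pages_per_block`, MVCC_ID_PRECEDES is assumed to be a plain less-than, and the "re-arm the last block into the header" case is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stack>
#include <vector>

struct MvccRecord
{
  std::int64_t pageid;    // log page holding the record
  std::uint64_t mvccid;   // MVCCID carried by the record
  int prev;               // index of the previous MVCC record, -1 if none
};

struct RebuiltBlock
{
  std::int64_t blockid;
  std::int64_t last_pageid;   // page of the block's LAST MVCC record
  std::uint64_t oldest;       // minimum MVCCID seen in the block
  std::uint64_t newest;       // maximum MVCCID seen in the block
};

// Walk the back links from the newest record, then emit blocks oldest-first.
std::vector<RebuiltBlock> rebuild_lost_blocks (const std::vector<MvccRecord> &log, int last_record,
                                               std::int64_t pages_per_block)
{
  assert (pages_per_block > 0);
  std::stack<RebuiltBlock> found;
  for (int i = last_record; i >= 0; i = log[static_cast<std::size_t> (i)].prev)
    {
      const MvccRecord &rec = log[static_cast<std::size_t> (i)];
      const std::int64_t blockid = rec.pageid / pages_per_block;
      if (found.empty () || found.top ().blockid != blockid)
        {
          // First record met in this block is its last one in log order.
          found.push (RebuiltBlock {blockid, rec.pageid, rec.mvccid, rec.mvccid});
        }
      else
        {
          found.top ().oldest = std::min (found.top ().oldest, rec.mvccid);
          found.top ().newest = std::max (found.top ().newest, rec.mvccid);
        }
    }
  std::vector<RebuiltBlock> oldest_first;
  while (!found.empty ())
    {
      oldest_first.push_back (found.top ());
      found.pop ();
    }
  return oldest_first;
}
```

Because the walk meets the newest block first, the stack reverses the order for free: popping yields the blocks in ascending blockid, which is the order the consumer of Chapter 4 asserts on.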
  1. Blocks are born in log_append.cpp, not vacuum.c: prior_lsa_next_record_internal folds MVCC info into log_Gl.hdr and mints a vacuum_data_entry when a record’s start_lsa crosses a block boundary — all under prior_lsa_mutex, the accumulator’s only (and sufficient) serialization.
  2. Four record shapes feed the accumulator — LOG_MVCC_UNDO_DATA, LOG_MVCC_UNDOREDO_DATA, LOG_MVCC_DIFF_UNDOREDO_DATA, and LOG_SYSOP_END carrying LOG_SYSOP_END_LOGICAL_MVCC_UNDO — each contributing an MVCCID and receiving a prev_mvcc_op_log_lsa back link patched into its still-in-memory data_header.
  3. The boundary test fires on every record type; a quiet system leaves the last block open in log_Gl.hdr until traffic resumes (handled at consumption, Ch 4/9).
  4. A block is a fixed arithmetic page range (Invariant 3-D); the entry’s 64-bit blockid keeps the number in bits 60-0 and reserves bits 63-61 for status/interrupted flags, all zero at birth.
  5. oldest_visible_mvccid is frozen at the block’s first MVCC record (Invariant 3-A); newest_block_mvccid is a running max; mvcc_op_log_lsa survives block production, keeping the backward chain unbroken (Invariant 3-E).
  6. LOG_ISRESTARTED disarms production during recovery (Invariant 3-C); the 1024-entry handoff queue drops entries on overflow with only an assert and an error log — a known, tolerated weak point.

Chapter 4: Block Registration into Vacuum Data


Chapter 3 ended with a vacuum_data_entry sitting in vacuum_Block_data_buffer, a lock-free circular queue in volatile memory; a crash at that instant loses it. This chapter walks the function that fixes that — vacuum_consume_buffer_log_blocks — branch by branch: draining the buffer, filling blockid gaps, appending at last_page->index_free, growing the file when a page fills, and the redo-only WAL protocol behind it.

vacuum_consume_buffer_log_blocks has exactly three direct call sites:

| Caller | Mode | Trigger |
| --- | --- | --- |
| vacuum_data::update | both | Canonical wrapper: mark finished jobs (Ch 9), then consume. Reached via vacuum_job_cursor::force_data_update. |
| vacuum_recover_lost_block_data | boot | Replays the log tail into the buffer, then consumes it immediately (Ch 11). |
| vacuum_finalize | shutdown | Drains the buffer in a loop, because consuming appends log, which can complete another block. |

In SERVER_MODE, vacuum_master_task::execute calls m_cursor.force_data_update () once per wakeup, and again whenever should_force_data_update reports is_half_full () on vacuum_Finished_job_queue or vacuum_Block_data_buffer (“don’t wait until it’s full”; log appenders must never find the queue full). The queues are sized independently — VACUUM_BLOCK_DATA_BUFFER_CAPACITY = 1024, VACUUM_FINISHED_JOB_QUEUE_CAPACITY = 2048. In SA mode, xvacuum runs the cursor loop inline, forcing an update when the buffer is non-empty, when vacuum_Finished_job_queue->is_full (), or when the cursor runs off the end. At SA shutdown, xboot_shutdown_server (boot_sr.c) calls vacuum_sa_reflect_last_blockid: last_blockid jumps to logpb_last_complete_blockid (), is mirrored into log_Gl.hdr.vacuum_last_blockid, and re-stamped into the empty page via vacuum_data_empty_update_last_blockid, so the next boot does not re-scan log that SA mode already vacuumed; vacuum_finalize repeats the re-stamp after its drain loop.

vacuum_job_cursor::force_data_update brackets vacuum_data::update with unload () / readjust_to_vacuum_data_changes () + load (), because consuming can swap or free the very page the cursor has fixed (Ch 6). update runs mark-finished, then consume, and then, only if !vacuum_Data.is_empty (), upgrade_oldest_unvacuumed (get_first_entry ().oldest_visible_mvccid). That last step is valid only because oldest_visible_mvccid is non-decreasing across entries (§4.3).

4.2 Entry guards and the empty-buffer fast path


The function opens with two guards and a subtle fast path:

  1. PRM_ID_DISABLE_VACUUM set → return NO_ERROR.
  2. vacuum_Block_data_buffer == NULL → assert (false), return NO_ERROR (never happens live).
  3. Buffer empty. If vacuum data is also empty (vacuum_is_empty () — single page, index_unvacuumed == index_free), the function advances m_last_blockid anyway, so an idle system does not pin ever-older log archives:
// vacuum_consume_buffer_log_blocks -- src/query/vacuum.c
if (vacuum_Block_data_buffer->is_empty ())
  {
    if (vacuum_is_empty ())
      {
        if (log_Gl.hdr.does_block_need_vacuum)
          {
            return NO_ERROR; /* <- current block has MVCC ops; cannot skip it */
          }
        std::unique_lock<std::mutex> ulock { log_Gl.prior_info.prior_lsa_mutex };
        // ... condensed: recheck does_block_need_vacuum under the mutex -> return;
        // recheck buffer: non-empty -> fall through to consume ...
        LOG_LSA log_lsa = log_Gl.prior_info.prior_lsa;
        ulock.unlock (); // unlock after reading prior_lsa
        const VACUUM_LOG_BLOCKID LOG_BLOCK_TRAILING_DIFF = 2;
        VACUUM_LOG_BLOCKID log_blockid = vacuum_get_log_blockid (log_lsa.pageid);
        if (log_blockid > vacuum_Data.get_last_blockid () + LOG_BLOCK_TRAILING_DIFF)
          {
            vacuum_Data.set_last_blockid (log_blockid - LOG_BLOCK_TRAILING_DIFF);
            vacuum_data_empty_update_last_blockid (thread_p);
            vacuum_update_keep_from_log_pageid (thread_p);
          }
        return NO_ERROR;
      }
    else
      {
        return NO_ERROR; /* <- data non-empty: last entry already defines last_blockid */
      }
  }

Three details matter. The check–lock–recheck on log_Gl.hdr.does_block_need_vacuum: skipping past an in-flight block with MVCC ops would orphan them; the recheck under prior_lsa_mutex closes the race with the Chapter 3 producer, which flips the flag under the same mutex. LOG_BLOCK_TRAILING_DIFF = 2: log_blockid is the block containing prior_lsa, still being written; staying two behind means the catch-up never claims a block that could still produce MVCC ops. And vacuum_data_empty_update_last_blockid re-initializes the single empty page through vacuum_init_data_page_with_last_blockid — even the “no work” path is WAL-logged (§4.5).
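
The catch-up arithmetic on its own is small enough to check by hand. A minimal sketch, assuming that the block number is `pageid / pages_per_block` (the text only says vacuum_get_log_blockid returns a quotient); `catch_up_last_blockid` is a hypothetical name, and the mutex and WAL steps are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors LOG_BLOCK_TRAILING_DIFF from the excerpt.
constexpr std::int64_t TRAILING_DIFF = 2;

// Returns the last_blockid an idle, empty vacuum data would hold after the
// fast path; unchanged when the log has not moved far enough ahead.
std::int64_t catch_up_last_blockid (std::int64_t last_blockid, std::int64_t prior_lsa_pageid,
                                    std::int64_t pages_per_block)
{
  assert (pages_per_block > 0);
  const std::int64_t log_blockid = prior_lsa_pageid / pages_per_block;
  if (log_blockid > last_blockid + TRAILING_DIFF)
    {
      return log_blockid - TRAILING_DIFF;   // stay two blocks behind the append point
    }
  return last_blockid;
}
```

With 10 pages per block and prior_lsa on page 100 (block 10), a last_blockid of 3 jumps to 8, while a last_blockid of 8 or 9 is left alone because the gap is not strictly greater than two.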

Past the fast path, vacuum_Data.last_page == NULL is an assert_release + ER_FAILED (data not loaded — caller bug).

4.3 The drain loop and the dense-monotonic invariant


The consume loop walks the buffer and appends at the cached last page:

// vacuum_consume_buffer_log_blocks -- src/query/vacuum.c
data_page = vacuum_Data.last_page;
page_free_data = data_page->data + data_page->index_free;
save_page_free_data = page_free_data; /* <- start of the not-yet-logged run */
was_vacuum_data_empty = vacuum_is_empty ();
while (vacuum_Block_data_buffer->consume (consumed_data))
  {
    assert (vacuum_Data.get_last_blockid () < consumed_data.blockid);
    for (next_blockid = vacuum_Data.get_last_blockid () + 1; next_blockid <= consumed_data.blockid; next_blockid++)
      {
        // ... page-full branch, see 4.4 ...
        if (data_page->index_unvacuumed == data_page->index_free && next_blockid < consumed_data.blockid)
          {
            next_blockid = consumed_data.blockid - 1; // empty page: skip gaps; for will increment to target
            continue;
          }
        page_free_data->blockid = next_blockid;
        if (next_blockid == consumed_data.blockid)
          {
            LSA_COPY (&page_free_data->start_lsa, &consumed_data.start_lsa);
            page_free_data->newest_mvccid = consumed_data.newest_mvccid;
            page_free_data->oldest_visible_mvccid = consumed_data.oldest_visible_mvccid;
            // ... condensed: NDEBUG asserts, er_log ...
          }
        else
          {
            page_free_data->set_vacuumed (); /* <- gap block: no MVCC ops */
            LSA_SET_NULL (&page_free_data->start_lsa);
            // ... condensed: oldest_visible_mvccid = newest_mvccid = MVCCID_NULL ...
          }
        vacuum_Data.set_last_blockid (next_blockid);
        page_free_data++;
        data_page->index_free++;
      }
  }

The buffer only carries blocks that had MVCC ops (Ch 3). Yet the inner for runs over every blockid between m_last_blockid + 1 and consumed_data.blockid, materializing the missing ones as pre-VACUUMED gap entries with NULL start_lsa and NULL MVCCIDs.

Key invariant: vacuum data blockids are dense and monotonic. Every blockid in [get_first_blockid (), get_last_blockid ()] appears exactly once, in consecutive order across the page chain, and oldest_visible_mvccid is non-decreasing along that order. The gap-fill loop enforces density; the debug asserts (page_free_data - 1)->get_blockid () + 1 == page_free_data->get_blockid () and the oldest_visible_mvccid comparison check both. If violated: the job cursor’s blockid arithmetic (vacuum_job_cursor::change_blockid, Ch 6) lands on the wrong entry, vacuum_data_mark_finished (Ch 9) marks the wrong block, and the “first entry has the oldest MVCCID” shortcut corrupts oldest_unvacuumed_mvccid (Ch 5).

One refinement keeps the invariant cheap: if the current page is empty (index_unvacuumed == index_free) and we are still below consumed_data.blockid, the loop teleports next_blockid to consumed_data.blockid - 1 and continues — gap entries on an empty page would be dead weight; the page’s range legally restarts at the real block. A new real entry starts life AVAILABLE (flags live in the blockid’s high bits, Ch 1); gap entries are born VACUUMED and are reclaimed by the next mark-finished pass without any worker seeing them.
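
The gap-fill rule and its empty-page shortcut can be simulated without pages or logging. This is a simplified model, not the real loop: `Entry` and `register_block` are hypothetical, a `std::vector` stands in for the page, "empty page" is modeled as an empty vector instead of index_unvacuumed == index_free, and page growth is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Entry
{
  std::int64_t blockid;
  bool vacuumed;   // true for gap entries, false for the real block
};

// Append consumed_blockid to the page, materializing the missing blockids
// as pre-vacuumed gap entries unless the page is still empty.
void register_block (std::vector<Entry> &page, std::int64_t &last_blockid, std::int64_t consumed_blockid)
{
  assert (last_blockid < consumed_blockid);
  for (std::int64_t next = last_blockid + 1; next <= consumed_blockid; ++next)
    {
      if (page.empty () && next < consumed_blockid)
        {
          next = consumed_blockid - 1;   // skip the gaps; the loop increment lands on the target
          continue;
        }
      page.push_back (Entry {next, next != consumed_blockid});
      last_blockid = next;
    }
}
```

Starting from an empty page with last_blockid 4, registering block 7 stores only entry 7; registering block 10 afterwards stores 8 and 9 as vacuumed gaps and 10 as the real, available entry, so the page stays dense from its first blockid onward.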

After the loop: if was_vacuum_data_empty, vacuum_update_keep_from_log_pageid recomputes the log-removal watermark (Ch 5). The function works in whole blocks, not LSAs: empty data keeps log from the first page of the block after last_blockid; non-empty data keeps from the first log page of get_first_blockid ()’s block. Appending the first entry to empty data therefore pulls the watermark back to that entry’s block boundary.
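
The whole-block watermark rule reduces to one multiplication. A sketch under the assumption that the first log page of block b is `b * pages_per_block`; `keep_from_log_pageid` here is a hypothetical free function, not the CUBRID routine of similar name.

```cpp
#include <cassert>
#include <cstdint>

// First log page that must be kept, following the two cases in the text.
std::int64_t keep_from_log_pageid (bool data_is_empty, std::int64_t first_blockid, std::int64_t last_blockid,
                                   std::int64_t pages_per_block)
{
  assert (pages_per_block > 0);
  // Empty data: keep from the block after last_blockid.
  // Non-empty data: keep from the first registered block.
  const std::int64_t keep_block = data_is_empty ? last_blockid + 1 : first_blockid;
  return keep_block * pages_per_block;
}
```

With 10 pages per block, empty data at last_blockid 7 keeps log from page 80; once block 5 is the first registered entry, the watermark sits at page 50.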

When data_page->index_free == vacuum_Data.page_data_max_count, the chain grows:

// vacuum_consume_buffer_log_blocks -- src/query/vacuum.c (page-full branch)
if (page_free_data > save_page_free_data)
  {
    log_append_redo_data2 (thread_p, RVVAC_DATA_APPEND_BLOCKS, NULL, (PAGE_PTR) data_page,
                           (PGLENGTH) (save_page_free_data - data_page->data),
                           /* ... condensed ... */); /* <- log the run appended so far */
    vacuum_set_dirty_data_page (thread_p, data_page, DONT_FREE);
  }
if (is_sysop)
  {
    log_sysop_commit (thread_p); /* <- second new page in one call: commit previous sysop */
  }
log_sysop_start (thread_p);
is_sysop = true;
error_code = file_alloc (thread_p, &vacuum_Data.vacuum_data_file, file_init_page_type, &ptype,
                         &next_vpid, (PAGE_PTR *) (&data_page));
// ... condensed: on error or NULL page -> log_sysop_abort + return ...
vacuum_init_data_page_with_last_blockid (thread_p, data_page, vacuum_Data.get_last_blockid ());
VPID_COPY (&vacuum_Data.last_page->next_page, &next_vpid);
log_append_undoredo_data2 (thread_p, RVVAC_DATA_SET_LINK, NULL, (PAGE_PTR) vacuum_Data.last_page, 0,
                           0, sizeof (VPID), NULL, &next_vpid); /* <- undo data is NULL */
save_last_page = vacuum_Data.last_page;
vacuum_Data.last_page = data_page; /* <- swap the cached fixed page */
vacuum_set_dirty_data_page (thread_p, save_last_page, FREE);
// we cannot commit here. we should append some data blocks first.
page_free_data = data_page->data + data_page->index_free;
save_page_free_data = page_free_data;

Branches, in order: (a) the old page’s pending run is logged only if non-empty; (b) an already-open sysop from a previous page-full iteration is committed first; (c) file_alloc failure or a NULL page aborts the sysop and returns — the abort undoes the allocation itself; (d) the new page is initialized and stamped with the current last_blockid, the old last page’s next_page is linked, the cached last_page pointer is swapped, the old page unfixed (FREE).

The system operation is the consistency boundary: allocation, page init, and link commit as one unit. The deliberate oddity is “we cannot commit here” — the sysop stays open until at least one entry run is logged into the new page (§4.5), so recovery never surfaces an allocated-but-empty page at the end of the chain whose data[0].blockid stamp the loop has already moved past.

flowchart TD
    A["page full?"] -->|yes| B["pending run on old page?<br/>log RVVAC_DATA_APPEND_BLOCKS"]
    B --> C["previous sysop open?<br/>commit it"]
    C --> D["log_sysop_start + file_alloc"]
    D -->|fail| X["log_sysop_abort<br/>return error"]
    D -->|ok| E["RVVAC_DATA_INIT_NEW_PAGE<br/>RVVAC_DATA_SET_LINK<br/>swap cached last_page"]
    A -->|no| G["gap blockid?"]
    E --> G
    G -->|yes| H["set_vacuumed<br/>NULL lsa and mvccids"]
    G -->|no| I["copy start_lsa, oldest, newest"]
    H --> J["fill slot at index_free<br/>set_last_blockid; index_free++"]
    I --> J
    J -->|next blockid| A
    J -->|buffer drained| K["log final run<br/>commit sysop if open"]

Figure 4-1. The consume loop with the page-full branch. Error exits abort the sysop; every other path converges on the closing append record.

The closing branch mirrors the page-full prologue: if save_page_free_data < page_free_data, the final run is logged with RVVAC_DATA_APPEND_BLOCKS, then — only now that the new page has data — an open sysop is committed and the page marked dirty (DONT_FREE). The else arm covers the impossible leftover: an open sysop with no appended run is assert (false) but still committed, “don’t leak the sysop”. Three recovery indexes cover the whole path (the RV_fun table in recovery.c confirms which sides exist):

| rcvindex | undo | redo |
| --- | --- | --- |
| RVVAC_DATA_APPEND_BLOCKS | (none) | vacuum_rv_redo_append_data: bulk-copy run to data + rcv->offset, advance index_free |
| RVVAC_DATA_INIT_NEW_PAGE | (none) | vacuum_rv_redo_initialize_data_page: re-run vacuum_data_initialize_new_page, restore data->blockid |
| RVVAC_DATA_SET_LINK | vacuum_rv_undoredo_data_set_link | same function: set or NULL next_page |

// vacuum_rv_redo_append_data -- src/query/vacuum.c
int n_blocks = rcv->length / sizeof (VACUUM_DATA_ENTRY);
// ... condensed: length sanity asserts ...
assert (rcv->offset == data_page->index_free); /* <- append-only: offset must equal index_free */
memcpy (data_page->data + rcv->offset, rcv->data, n_blocks * sizeof (VACUUM_DATA_ENTRY));
data_page->index_free += n_blocks;

Key invariant: vacuum data appends are idempotent-by-position, so redo suffices and undo is never needed. Entries are only written at index_free, the redo record carries the absolute slot offset, and the page LSA decides whether the redo applies. Nothing here needs rollback: registration runs under the vacuum master’s system thread, not a client transaction. If undo existed, a rollback would erase pending work and the corresponding log interval would never be vacuumed — the one failure vacuum cannot afford.
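
Why position plus page LSA makes the redo safe to replay can be shown with a toy page. This is a sketch of the general redo rule, not the CUBRID recovery manager: `Page`, `AppendRecord`, and `redo_append` are hypothetical, LSAs are reduced to integers, and the page-LSA comparison that the real recovery framework performs before calling the redo function is folded into the function itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Entry
{
  std::int64_t blockid;
};

struct Page
{
  std::int64_t page_lsa = 0;   // LSA of the last change reflected on the page
  std::vector<Entry> data;     // data.size () plays the role of index_free
};

struct AppendRecord
{
  std::int64_t lsa;            // LSA of this redo record
  std::size_t offset;          // absolute slot where the run starts
  std::vector<Entry> run;      // entries appended by the original operation
};

// Apply the record unless the page already reflects it. Returns true if applied.
bool redo_append (Page &page, const AppendRecord &rec)
{
  if (page.page_lsa >= rec.lsa)
    {
      return false;            // change already on the page: skip
    }
  assert (rec.offset == page.data.size ());   // append-only: offset must equal index_free
  page.data.insert (page.data.end (), rec.run.begin (), rec.run.end ());
  page.page_lsa = rec.lsa;
  return true;
}
```

Replaying the same record twice leaves the page unchanged the second time, which is the property that lets the append path go without an undo side.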

The single undoredo record, RVVAC_DATA_SET_LINK, exists for the sysop: on abort (allocation failure) its undo runs with rcv->data == NULL, which vacuum_rv_undoredo_data_set_link maps to VPID_SET_NULL (&data_page->next_page) — detaching the half-born page the aborted sysop simultaneously deallocates. Both directions share the function; the NULL-data branch is the undo, the copy branch the redo.
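
The shared undo/redo shape is compact enough to sketch. The `Vpid` struct, the `-1` null encoding, and `apply_set_link` below are illustrative assumptions, not the CUBRID definitions; only the "NULL data means NULL link" rule comes from the text.

```cpp
#include <cassert>

struct Vpid
{
  int volid;
  int pageid;
};

// Assumed null encoding for the sketch.
inline bool is_null (const Vpid &v) { return v.pageid == -1; }

// One function for both directions: logged data present -> redo (copy the link),
// logged data absent -> undo (detach the link).
void apply_set_link (Vpid &next_page, const Vpid *logged)
{
  if (logged == nullptr)
    {
      next_page = Vpid {-1, -1};
    }
  else
    {
      next_page = *logged;
    }
}
```

Calling it with a pointer to the new page's identifier links the chain; calling it with `nullptr`, as the aborting sysop's undo does, restores the old last page to an unlinked state.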

Page initialization is its own redo record because it does more than zero the page:

// vacuum_init_data_page_with_last_blockid -- src/query/vacuum.c
vacuum_data_initialize_new_page (thread_p, data_page); /* memset, NULL next_page, indexes = 0, ptype */
data_page->data->blockid = blockid; /* <- ghost slot: last_blockid survives in data[0] */
log_append_redo_data2 (thread_p, RVVAC_DATA_INIT_NEW_PAGE, NULL, (PAGE_PTR) data_page, 0, sizeof (blockid), &blockid);
vacuum_set_dirty_data_page (thread_p, data_page, DONT_FREE);

Key invariant: an empty vacuum data page still remembers last_blockid in data[0].blockid. With index_free == 0 the slot is logically dead, yet it carries the high-water mark; vacuum_data::set_last_blockid can therefore stay a plain in-memory setter — it only strips flag bits (VACUUM_BLOCKID_WITHOUT_FLAGS) and debug-asserts the value stays below prior_lsa’s block. At boot, vacuum_data_load_and_recover rebuilds m_last_blockid: non-empty data → the last real entry, last_page->data[index_free - 1].blockid (defensive fallback to slot 0 — the ghost slot again — should index_free be 0); empty data → MAX (log_Gl.hdr.vacuum_last_blockid, vacuum_Data.last_page->data->blockid), the ghost slot possibly overridden by the header value SA mode advanced before deleting archives. (Side branches: a fresh log with a still-negative logpb_last_complete_blockid () leaves the value untouched; a 10.1-era database with NULL recovery_lsa/mvcc_op_log_lsa takes it directly.) Break the ghost-slot rule and an idle-then-crashed server resumes with a stale last_blockid, re-fills “gaps” for blocks already consumed, and double-registers work.
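
The two main boot-time cases can be written as one small function. A minimal sketch, assuming a page is just an index_free counter plus a vector of blockids; `DataPage` and `recover_last_blockid` are hypothetical names, flag stripping is omitted, and the side branches mentioned in parentheses above (fresh log, 10.1-era database) are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DataPage
{
  std::size_t index_free;               // number of live entries on the page
  std::vector<std::int64_t> blockids;   // blockids[0] doubles as the ghost slot
};

// Rebuild the volatile last_blockid from what survived on disk.
std::int64_t recover_last_blockid (bool data_is_empty, const DataPage &last_page,
                                   std::int64_t header_last_blockid)
{
  assert (!last_page.blockids.empty ());
  if (!data_is_empty)
    {
      // Last real entry; falls back to slot 0 should index_free be 0.
      const std::size_t slot = last_page.index_free > 0 ? last_page.index_free - 1 : 0;
      assert (slot < last_page.blockids.size ());
      return last_page.blockids[slot];
    }
  // Empty data: ghost slot, possibly overridden by the log header value.
  return std::max (header_last_blockid, last_page.blockids[0]);
}
```

For an empty page whose ghost slot holds 40, a header value of 55 wins and a header value of 10 loses; for a page with entries 41, 42, 43 the result is 43 regardless of the header.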

The function ends with VACUUM_VERIFY_VACUUM_DATA plus a page-fix-count check in debug builds, then NO_ERROR.

  1. vacuum_consume_buffer_log_blocks is the only writer of new vacuum data entries, reached through vacuum_data::update (master tick, SA-mode xvacuum), vacuum_recover_lost_block_data (boot), or the vacuum_finalize drain loop; the master forces it whenever buffer or finished-job queue passes half-full.
  2. The drain loop enforces the dense-monotonic blockid invariant by materializing gap entries — born VACUUMED, NULL LSA — for every blockid with no MVCC ops, except across an empty page where the range may legally restart.
  3. Page growth is wrapped in a system operation that stays open until real data lands on the new page; allocation failure aborts the sysop, and RVVAC_DATA_SET_LINK’s undo (NULL data → NULL link) detaches the half-born page.
  4. Everything else is redo-only (RVVAC_DATA_APPEND_BLOCKS, RVVAC_DATA_INIT_NEW_PAGE): appends are positional at index_free, registration belongs to no client transaction, and undo would mean losing vacuum work forever.
  5. m_last_blockid is volatile but recoverable: from the last entry when data is non-empty, else from the ghost slot data[0].blockid maxed with log_Gl.hdr.vacuum_last_blockid — kept fresh at SA-mode shutdown by vacuum_sa_reflect_last_blockid.
  6. Even the empty-buffer fast path is durable: catching last_blockid up to prior_lsa’s block minus LOG_BLOCK_TRAILING_DIFF (2) re-logs the empty page, after a check–lock–recheck on log_Gl.hdr.does_block_need_vacuum so an in-flight block with MVCC ops is never skipped.

Chapter 5: Eligibility and the Oldest Visible Watermark


Chapter 4 left a block registered as AVAILABLE — which does not mean safe: its records may still be visible to a running transaction. The decision of when it becomes vacuumable is split. The MVCC table (mvcctable, mvcc_table.hpp/.cpp) owns the oldest visible watermark — the MVCCID below which every snapshot agrees version history is settled — and the vacuum master merely consults it, once per iteration, against each entry’s newest_mvccid. The watermark machinery is the MVCC detail document’s territory: cubrid-mvcc-detail.md Chapter 9 (Vacuum Coordination and the Oldest-Visible Watermark, §9.1–9.5) derives every field, the cross-snapshot sweep, and the pin API line by line; the high-level companion (cubrid-vacuum.md, Common DBMS Design → Oldest-visible-MVCCID watermark) explains why one global watermark suffices. Here, 5.1–5.4 keep just enough of the producer side to read the refresh call correctly; 5.5–5.6 fully trace the vacuum-side consumption, which the MVCC doc only sketches.

5.1 The producer side in brief — mvcctable’s watermark fields


mvcctable is the server-wide MVCC bookkeeping object at log_Gl.mvcc_table. Most fields serve snapshot construction (field tables in cubrid-mvcc-detail.md §1.7, §2.5); two exist purely for vacuum: m_oldest_visible, an std::atomic<MVCCID> caching the published watermark so readers pay one atomic load (get_global_oldest_visible), and m_ov_lock_count, an atomic freeze counter that, while nonzero, forbids publishing (5.4). The scan reads two inputs: the global lower bound m_current_status_lowest_active_mvccid (advanced opportunistically from complete_mvcc via advance_oldest_active) and the per-transaction slot array m_transaction_lowest_visible_mvccids. Each slot cell holds one of three kinds of value:

| Slot value | State | Who sets it | Scan action |
| --- | --- | --- | --- |
| MVCCID_NULL (0) | No live snapshot | reset_transaction_lowest_active after LOG_COMMIT; complete_mvcc on rollback | Skip. |
| MVCCID_ALL_VISIBLE (3, storage_common.h) | Handshake sentinel: real value imminent | build_mvcc_info, momentarily | Wait: the value may be lower than anything seen. |
| Normal MVCCID | Snapshot’s lowest_active_mvccid; post-commit, the tran’s own MVCCID | build_mvcc_info; raised by complete_mvcc on commit | Min-merge. |

flowchart LR
    subgraph producers["producers (per transaction)"]
        BMI["build_mvcc_info<br/>publish snapshot lowest"]
        CM["complete_mvcc<br/>commit: raise to own mvccid<br/>rollback: NULL"]
        RST["reset_transaction_lowest_active<br/>after LOG_COMMIT: NULL"]
    end
    SLOTS["m_transaction_lowest_visible_mvccids[ ]"]
    GLOW["m_current_status_lowest_active_mvccid"]
    COMP["compute_oldest_visible_mvccid"]
    UPD["update_global_oldest_visible"]
    OV["m_oldest_visible"]
    LCK["m_ov_lock_count"]
    BMI --> SLOTS
    CM --> SLOTS
    RST --> SLOTS
    CM -- "advance_oldest_active" --> GLOW
    SLOTS --> COMP
    GLOW --> COMP
    COMP --> UPD
    LCK -- "gates store" --> UPD
    UPD --> OV
    OV -- "get_global_oldest_visible" --> READERS["vacuum master / workers / locator DDL"]

Figure 5-1: producers and consumers around the watermark fields of mvcctable.

Invariant (slot durability across commit). A committing transaction’s slot is not cleared at complete_mvcc; it is raised to the transaction’s own MVCCID and stays until the LOG_COMMIT record is written, when log_complete calls reset_transaction_lowest_active. The in-code comment gives the failure otherwise: slot goes NULL early → vacuum cleans the transaction’s modifications → crash before LOG_COMMIT → recovery rolls back a transaction whose garbage is already gone. The watermark can never pass an MVCCID whose commit record is not yet durable.

// mvcctable::complete_mvcc -- src/transaction/mvcc_table.cpp
if (committed)
  {
    /* be sure that transaction modifications can't be vacuumed up to LOG_COMMIT. ... */
    if (tran_lowest_active == MVCCID_NULL || MVCC_ID_PRECEDES (tran_lowest_active, mvccid))
      {
        oldest_active_set (..., mvccid, ...); /* <- raise, do not clear */
      }
  }
else
  {
    oldest_active_set (..., MVCCID_NULL, ...); /* <- rollback clears immediately */
  }

5.2 The ALL_VISIBLE handshake, in one paragraph


build_mvcc_info must read the global m_current_status_lowest_active_mvccid and store it into the transaction’s slot — two steps that are not atomic together; if the thread is descheduled between them while the global advances and vacuum refreshes, the stale lower value would land in the slot and force the watermark backwards. The fix is pre-announcement: the slot is first set to MVCCID_ALL_VISIBLE, then the global is read and the real value stored “between next two code lines … no delays” (the in-code comment). The sentinel therefore lives only between two adjacent statements — no I/O, no locks — which is why the scan may simply busy-wait it out (5.3); an edit inserting work between those lines stalls the master’s refresh (latency, not correctness). The full interleaving derivation, with the excerpt, is cubrid-mvcc-detail.md §5.3 (the build_mvcc_info walkthrough); §9.2 of the same doc classifies the sentinel from the scanner’s side. The oldest_active_set/get wrappers also feed Oldest_active_tracker, an 8K debug-build ring with source tags — the first stop when a watermark-regression assert fires.
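
The pre-announcement order can be written down with two atomics. A minimal sketch only: `publish_snapshot_lowest` is a hypothetical stand-in for the relevant three statements of build_mvcc_info, the sentinel value 3 is taken from the slot table in 5.1, and default sequentially consistent atomics are used here without claiming that the real code uses the same memory ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using MVCCID = std::uint64_t;
constexpr MVCCID MVCCID_ALL_VISIBLE = 3;   // sentinel value, per the slot table in 5.1

// Publish the snapshot's lowest active MVCCID into the transaction's slot.
MVCCID publish_snapshot_lowest (std::atomic<MVCCID> &slot, const std::atomic<MVCCID> &global_lowest_active)
{
  slot.store (MVCCID_ALL_VISIBLE);                       // 1. pre-announce: a scanner must now wait
  const MVCCID lowest = global_lowest_active.load ();   // 2. read the global lower bound
  slot.store (lowest);                                   // 3. replace the sentinel with the real value
  assert (lowest != MVCCID_ALL_VISIBLE);                 // a real value never equals the sentinel
  return lowest;
}
```

A scanner that observes the slot between steps 1 and 3 sees the sentinel and defers, so it can never publish a watermark above a value that is about to land in the slot.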

5.3 compute_oldest_visible_mvccid — the two-phase scan


This private const method (reachable only through update_global_oldest_visible) takes the minimum over the global lower bound and every slot:

// mvcctable::compute_oldest_visible_mvccid -- src/transaction/mvcc_table.cpp
MVCCID lowest_active_mvccid = oldest_active_get (m_current_status_lowest_active_mvccid, ...);
for (size_t idx = 0; idx < m_transaction_lowest_visible_mvccids_size; idx++) /* phase 1 */
  {
    loaded_tran_mvccid = oldest_active_get (m_transaction_lowest_visible_mvccids[idx], ...);
    if (loaded_tran_mvccid == MVCCID_ALL_VISIBLE)
      {
        waiting_mvccids_pos.append (idx); /* <- defer: real value imminent */
      }
    else if (loaded_tran_mvccid != MVCCID_NULL && MVCC_ID_PRECEDES (loaded_tran_mvccid, lowest_active_mvccid))
      {
        lowest_active_mvccid = loaded_tran_mvccid; /* <- new minimum */
      }
  }
size_t retry_count = 0;
while (waiting_mvccids_pos.get_size () > 0) /* phase 2: drain stragglers */
  {
    ++retry_count;
    if (retry_count % 20 == 0)
      {
        thread_sleep (10); /* <- back off 10 ms every 20 spins */
      }
    // ... condensed: re-read each waiting slot; still ALL_VISIBLE -> keep waiting;
    // resolved -> min-merge like phase 1, erase from set ...
  }
assert (MVCCID_IS_NORMAL (lowest_active_mvccid));
return lowest_active_mvccid;

Phase 1 routes each slot down exactly one of three paths: sentinel → deferred, valid-and-lower → new minimum, MVCCID_NULL or not-lower → ignored. Phase 2 spins on the deferred set — short, by sentinel transience (5.2) — sleeping 10 ms every 20th retry so a descheduled snapshotter doesn’t burn a core; a slot resolving to MVCCID_NULL mid-wait is a transaction that finished and stops constraining. The final assert holds because the starting candidate is always a normal MVCCID. (cubrid-mvcc-detail.md §9.2.1 walks the reverse-iterated erase and the perf counters.)
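
The condensed phase 2 can be filled in as a standalone model. This is a sketch under assumptions, not the mvcctable method: `compute_oldest_visible` is a free function over a vector of atomics, MVCC_ID_PRECEDES is taken to be a plain less-than, the final assert approximates MVCCID_IS_NORMAL as "greater than the sentinel", and the debug tracker and perf counters are omitted.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

using MVCCID = std::uint64_t;
constexpr MVCCID MVCCID_NULL = 0;
constexpr MVCCID MVCCID_ALL_VISIBLE = 3;

MVCCID compute_oldest_visible (const std::atomic<MVCCID> &global_lowest_active,
                               const std::vector<std::atomic<MVCCID>> &slots)
{
  MVCCID lowest = global_lowest_active.load ();
  std::vector<std::size_t> waiting;

  // Phase 1: one pass over every slot.
  for (std::size_t idx = 0; idx < slots.size (); ++idx)
    {
      const MVCCID loaded = slots[idx].load ();
      if (loaded == MVCCID_ALL_VISIBLE)
        {
          waiting.push_back (idx);   // real value imminent: defer
        }
      else if (loaded != MVCCID_NULL && loaded < lowest)
        {
          lowest = loaded;
        }
    }

  // Phase 2: spin on deferred slots, backing off 10 ms every 20th retry.
  std::size_t retry_count = 0;
  while (!waiting.empty ())
    {
      if (++retry_count % 20 == 0)
        {
          std::this_thread::sleep_for (std::chrono::milliseconds (10));
        }
      for (std::size_t k = waiting.size (); k-- > 0;)   // reverse order keeps erase cheap and indexes valid
        {
          const MVCCID loaded = slots[waiting[k]].load ();
          if (loaded == MVCCID_ALL_VISIBLE)
            {
              continue;   // still mid-handshake
            }
          if (loaded != MVCCID_NULL && loaded < lowest)
            {
              lowest = loaded;
            }
          waiting.erase (waiting.begin () + static_cast<std::ptrdiff_t> (k));
        }
    }
  assert (lowest > MVCCID_ALL_VISIBLE);   // stand-in for MVCCID_IS_NORMAL
  return lowest;
}
```

With a global lower bound of 12 and slots holding NULL, 10, and 7, the scan returns 7; with every slot NULL it returns the global bound unchanged.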

5.4 update_global_oldest_visible and the freeze counter


The scan result is published only when nobody has frozen the watermark:

// mvcctable::update_global_oldest_visible -- src/transaction/mvcc_table.cpp
if (m_ov_lock_count == 0) /* <- cheap pre-check: skip scan if frozen */
  {
    MVCCID oldest_visible = compute_oldest_visible_mvccid ();
    if (m_ov_lock_count == 0) /* <- re-check: a locker may have arrived mid-scan */
      {
        assert (m_oldest_visible.load () <= oldest_visible); /* <- monotonicity */
        m_oldest_visible.store (oldest_visible);
      }
  }
return m_oldest_visible.load (); /* <- always returns the published value */

Three outcomes: frozen on entry → return cached; freeze acquired mid-scan → discard the computed value, return cached; unfrozen throughout → assert monotonic, publish, return. get_global_oldest_visible is the read-only twin; lock_global_oldest_visible / unlock_global_oldest_visible just move m_ov_lock_count (full pin API in cubrid-mvcc-detail.md §9.4).

Who locks, and why. Both lockers go through log_tdes::lock_global_oldest_visible_mvccid (log_tran_table.c), idempotent per transaction via the TDES flag block_global_oldest_active_until_commit. The matching unlock is not in the MVCC completion path but in the log one: log_complete and log_complete_for_2pc (log_manager.c) call unlock_global_oldest_visible_mvccid after the completion record — on commit and abort alike, immediately before the commit path’s reset_transaction_lowest_active — so no count leaks. The two lockers in the tree, xlocator_upgrade_instances_domain and redistribute_partition_data (locator_sr.c), share a pattern: lock, read get_global_oldest_visible () as threshold_mvccid, then run inline cleanup via heap_vacuum_all_objects on pages they are about to rewrite. The freeze keeps the system-wide watermark from advancing past that captured threshold mid-operation — otherwise real vacuum could clean the same heap with a newer threshold and disagree about which versions exist. (§9.7 of the MVCC doc discusses the cost: one small DDL pins vacuum globally.)

Invariant (watermark monotonicity). m_oldest_visible never decreases — enforced by the commit-slot rule (5.1), the handshake (5.2), and the assert above. If it decreased, an already-dispatched job, which judged eligibility against the higher value, would be cleaning versions some snapshot still needs.

One residual subtlety: check–compute–recheck is not atomic with the store; a locker arriving between the second check and the store still sees one watermark move, by a value computed entirely before its lock. In-tree callers tolerate this (they read their threshold only after locking), but a caller assuming a strict “no store after lock returns” fence would be wrong.
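
The three outcomes of the check, compute, recheck pattern can be reproduced with a small class. A sketch only: `Watermark` is a hypothetical type, the scan is injected as a callable so the "locker arrives mid-scan" case can be forced deterministically, and the single-threaded usage says nothing about the residual window between the second check and the store.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using MVCCID = std::uint64_t;

class Watermark
{
public:
  explicit Watermark (MVCCID initial) : m_oldest_visible (initial), m_lock_count (0) {}

  void lock () { ++m_lock_count; }
  void unlock ()
  {
    assert (m_lock_count.load () > 0);
    --m_lock_count;
  }
  MVCCID get () const { return m_oldest_visible.load (); }

  // scan () stands in for compute_oldest_visible_mvccid ().
  template <typename Scan>
  MVCCID update (Scan scan)
  {
    if (m_lock_count.load () == 0)                        // pre-check: skip the scan if frozen
      {
        const MVCCID computed = scan ();
        if (m_lock_count.load () == 0)                    // re-check: a locker may have arrived
          {
            assert (m_oldest_visible.load () <= computed);   // monotonicity
            m_oldest_visible.store (computed);
          }
      }
    return m_oldest_visible.load ();                      // always the published value
  }

private:
  std::atomic<MVCCID> m_oldest_visible;
  std::atomic<int> m_lock_count;
};
```

An unfrozen update publishes the scan result, a frozen one returns the cached value without scanning, and a scan during which a lock is taken has its result discarded.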

5.5 The consumer: master snapshot and the eligibility gate


The master refreshes and snapshots the watermark exactly once per iteration into the member m_oldest_visible_mvccid (declared in vacuum.c as “saved oldest visible mvccid (recomputed on each iteration)”):

// vacuum_master_task::execute -- src/query/vacuum.c
m_oldest_visible_mvccid = log_Gl.mvcc_table.update_global_oldest_visible ();
// ... condensed: first-run data load, page flushes, force_data_update ...
for (; m_cursor.is_valid () && !should_interrupt_iteration (); m_cursor.increment_blockid ())
  {
    if (!is_cursor_entry_ready_to_vacuum ())
      {
        // next entries cannot be ready if current entry is not ready; stop this iteration
        break; /* <- break, not continue */
      }
    if (!is_cursor_entry_available ())
      {
        continue; /* <- vacuumed or in-progress: try next */
      }
    start_job_on_cursor_entry ();
    // ... condensed ...
  }

The master is the only steady-state caller of update_global_oldest_visible: vacuum_data_load_and_recover calls it once at boot, and the only other call, in vacuum_boot, sits in the vacuum-disabled early return and is marked “for debug only” — everyone else reads. One snapshot per iteration gives all entries of a pass a consistent yardstick. The gate has exactly two rejection branches:

// vacuum_master_task::is_cursor_entry_ready_to_vacuum -- src/query/vacuum.c
assert (m_cursor.is_valid ());
if (m_cursor.get_current_entry ().newest_mvccid >= m_oldest_visible_mvccid)
  {
    // if entry newest MVCCID is still visible, it cannot be vacuumed
    // ... condensed: vacuum_er_log ...
    return false;               /* <- visibility gate */
  }
if (m_cursor.get_current_entry ().start_lsa.pageid + 1 >= log_Gl.append.prev_lsa.pageid)
  {
    // too close to end of log; let more log be appended before trying to vacuum the block
    // ... condensed: vacuum_er_log ...
    return false;               /* <- log-tail proximity gate */
  }
return true;

The first branch is the heart of eligibility: at logging time (Chapter 3) the block recorded the highest MVCCID among its operations as newest_mvccid; once even that falls strictly below the watermark, every operation in the block is settled for all current and future snapshots. The >= also holds back a block whose newest_mvccid equals the watermark, because that MVCCID may itself still be active. The second branch is not about MVCC at all: it refuses blocks overlapping the log append head, whose pages are still being written (Chapter 7). The caller’s break-not-continue is justified by registration order (Chapter 4): blocks enter vacuum data in log order, so if this one is not ready, none after it can be. The third decision, is_cursor_entry_available, is state bookkeeping rather than eligibility: it skips entries already vacuumed or with a job in flight (Chapter 6).
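
The gate and the caller's break-versus-continue choice can be sketched in isolation. The model below is a simplification under assumptions: `EntryModel`, `is_ready_to_vacuum`, and `dispatch_pass` are hypothetical names, entries are reduced to the three inputs the text discusses, and dispatch is just recording an index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using MVCCID = std::uint64_t;
using LOG_PAGEID = std::int64_t;

// Simplified stand-in for a vacuum data entry; only the gate inputs are modeled.
struct EntryModel
{
  MVCCID newest_mvccid;
  LOG_PAGEID start_pageid;
  bool available;
};

// Two rejection branches: visibility, then log-tail proximity.
bool is_ready_to_vacuum (const EntryModel &e, MVCCID oldest_visible, LOG_PAGEID append_pageid)
{
  if (e.newest_mvccid >= oldest_visible)
    {
      return false;             // newest MVCCID may still be visible or active
    }
  if (e.start_pageid + 1 >= append_pageid)
    {
      return false;             // too close to the end of the log
    }
  return true;
}

// One pass over entries in registration order; returns the dispatched indexes.
std::vector<std::size_t> dispatch_pass (const std::vector<EntryModel> &entries,
                                        MVCCID oldest_visible, LOG_PAGEID append_pageid)
{
  std::vector<std::size_t> dispatched;
  for (std::size_t i = 0; i < entries.size (); i++)
    {
      if (!is_ready_to_vacuum (entries[i], oldest_visible, append_pageid))
        {
          break;                // later entries cannot be ready either
        }
      if (!entries[i].available)
        {
          continue;             // vacuumed or in progress: try the next one
        }
      dispatched.push_back (i);
    }
  return dispatched;
}
```

With a watermark of 10, an entry whose newest_mvccid is exactly 10 is rejected, and the pass stops there even if later entries are flagged available.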

Invariant (settled-deleter guarantee). Because newest_mvccid bounds every MVCCID logged in the block, and the watermark lower-bounds every live snapshot’s lowest_active_mvccid, vacuum never processes a record whose inserter or deleter could still be invisible to any snapshot — current or, by monotonicity (5.4), future. Every MVCCID the worker meets from an admitted block is either committed-and-globally-visible or rolled back; mvcc_satisfies_vacuum (Chapter 8) never has to ask “is this still in doubt?” for an admitted block’s own operations.

Two different “oldest visible” values now coexist, and confusing them is the classic reader error. The entry’s stored oldest_visible_mvccid is the watermark as of logging time: vacuum_data_entry’s header constructor copies it from log_Gl.hdr.oldest_visible_mvccid, and the delegated constructor asserts oldest <= newest. It plays no role in eligibility, which tests newest_mvccid against the current watermark; the recorded value serves bookkeeping (vacuum_consume_buffer_log_blocks asserts it never exceeds get_global_oldest_visible (), vacuum_verify_vacuum_data_debug asserts entries ascend across neighbors) and drives the trailing edge (5.6). Workers do not reuse the master’s snapshot either: vacuum_process_log_block re-reads get_global_oldest_visible () as its threshold_mvccid at job start, which by monotonicity can only be equal to or newer than the value that qualified the block.

5.6 The trailing edge: vacuum_Data.oldest_unvacuumed_mvccid

The watermark is the leading edge — nothing newer may be touched. vacuum_data::oldest_unvacuumed_mvccid (“Global oldest MVCCID not vacuumed (yet)”) is the trailing edge — everything strictly older is guaranteed clean. It is maintained, not computed:

// vacuum_data::set_oldest_unvacuumed_on_boot -- src/query/vacuum.c
if (!log_Gl.hdr.does_block_need_vacuum)
  {
    // log_Gl.hdr.oldest_visible_mvccid may not remain uninitialized
    log_Gl.hdr.oldest_visible_mvccid = log_Gl.hdr.mvcc_next_id;   /* <- no pending block: seed */
  }
if (vacuum_Data.is_empty ())
  {
    oldest_unvacuumed_mvccid = log_Gl.hdr.oldest_visible_mvccid;
  }
else
  {
    oldest_unvacuumed_mvccid = first_page->data[0].oldest_visible_mvccid;   /* <- first = oldest */
    assert (oldest_unvacuumed_mvccid <= log_Gl.hdr.oldest_visible_mvccid);
  }

Called once from vacuum_data_load_and_recover at boot (Chapter 11), this covers all three boot shapes: seed an uninitialized header watermark, then either vacuum data is empty (trailing edge = header watermark) or the first — oldest — entry’s recorded oldest_visible_mvccid bounds everything undone. Thereafter vacuum_data::update (the master’s housekeeping pass, Chapter 9) advances it after marking finished jobs and consuming new blocks, again from the first remaining entry (skipped while vacuum data is empty), through a deliberately one-directional setter:

// vacuum_data::upgrade_oldest_unvacuumed -- src/query/vacuum.c
assert (oldest_unvacuumed_mvccid <= mvccid); /* <- "upgrade": may only move forward */
oldest_unvacuumed_mvccid = mvccid;

Invariant (ascending entry oldest). Entries in vacuum data carry non-decreasing oldest_visible_mvccid (vacuum_verify_vacuum_data_debug asserts this across neighbors, plus oldest_unvacuumed_mvccid <= entry->oldest_visible_mvccid for every live entry). Blocks are consumed in log order and the watermark is monotonic, so the first entry is always the global minimum — “first entry’s oldest” is a correct trailing edge without scanning. Break the ordering and upgrade_oldest_unvacuumed’s assert fires — or the trailing edge silently overtakes unvacuumed work.
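
A minimal model of the trailing edge follows. It is a sketch, not the vacuum_data code: `TrailingEdgeModel` and its members are hypothetical, entries are reduced to their recorded oldest_visible_mvccid, and the forward-only rule returns false where the real setter asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

using MVCCID = std::uint64_t;

// Hypothetical model: entries are kept in log order, oldest first.
struct TrailingEdgeModel
{
  MVCCID oldest_unvacuumed = 0;
  std::deque<MVCCID> entry_oldest;      // per-entry oldest_visible_mvccid

  // Forward-only setter; reports a violation instead of asserting.
  bool upgrade (MVCCID mvccid)
  {
    if (mvccid < oldest_unvacuumed)
      {
        return false;
      }
    oldest_unvacuumed = mvccid;
    return true;
  }

  // Drop n finished head entries, then advance the edge from the first remaining entry.
  bool trim_head_and_advance (std::size_t n)
  {
    for (std::size_t i = 0; i < n && !entry_oldest.empty (); i++)
      {
        entry_oldest.pop_front ();
      }
    if (entry_oldest.empty ())
      {
        return true;                    // skipped while vacuum data is empty
      }
    return upgrade (entry_oldest.front ());
  }

  // The inverse query: strictly older than the trailing edge.
  bool is_mvccid_vacuumed (MVCCID id) const
  {
    return id < oldest_unvacuumed;
  }
};
```

Because the entries ascend, reading only the front of the deque is enough; no scan over the remaining entries is needed to find the minimum.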

One writer exists outside the steady path: SA-mode full vacuum (xvacuum, Chapter 11) assigns log_Gl.hdr.mvcc_next_id directly after running every job and logs RVVAC_COMPLETE, whose redo handler vacuum_rv_redo_vacuum_complete replays the assignment. The inverse query rounds out the picture:

// vacuum_is_mvccid_vacuumed -- src/query/vacuum.c
if (id < vacuum_Data.oldest_unvacuumed_mvccid)   /* <- strictly older than trailing edge */
  {
    return true;
  }
return false;

Its consumers sit on the storage side, and one is fully operational: xheap_reclaim_addresses (heap_file.c) deallocates a heap page only if the page’s max MVCCID is already vacuumed, and heap_page_update_chain_after_mvcc_op uses the same test to resolve a page’s HEAP_PAGE_VACUUM_UNKNOWN status. The rest are diagnostics: mvcc_satisfies_snapshot and mvcc_satisfies_vacuum (mvcc.c) classify perfmon counters as PERF_SNAPSHOT_..._LOST when a record that should have been vacuumed is still encountered, and btree_prepare_bts (btree.c) disables its check_not_vacuumed checker while the index’s creator MVCCID is not yet vacuumed. The trailing edge is also consumed directly, without the wrapper: vacuum_cleanup_dropped_files drops ledger entries via MVCC_ID_PRECEDES (entry mvccid, oldest_unvacuumed_mvccid) — no future job can ask about them (Chapter 10) — and the debug checker is_not_vacuumed_and_lost runs mvcc_satisfies_vacuum against oldest_unvacuumed_mvccid to flag versions that should be gone but still exist. The two edges bracket the system: oldest_unvacuumed_mvccid <= entry oldest <= m_oldest_visible — clean below the first, untouchable above the second, and vacuum’s whole job is the band between.

  1. Eligibility is decided in two places: mvcctable computes and publishes the watermark (m_oldest_visible); the master compares each entry’s newest_mvccid against a once-per-iteration snapshot (m_oldest_visible_mvccid). The mvcctable internals are fully derived in cubrid-mvcc-detail.md Chapter 9; this chapter owns the vacuum-side consumption.
  2. The scan’s inputs are per-transaction slots encoding three states — MVCCID_NULL (ignore), MVCCID_ALL_VISIBLE (value imminent — wait), real lowest-active MVCCID (min-merge); compute_oldest_visible_mvccid min-merges resolved slots, then busy-waits the sentinels out (10 ms back-off per 20 retries).
  3. Monotonicity is load-bearing: committed transactions keep their slot until LOG_COMMIT (cleared by log_complete via reset_transaction_lowest_active), the handshake prevents stale-low publishes, and update_global_oldest_visible asserts never-lower.
  4. m_ov_lock_count freezes the watermark until completion: locked through log_tdes::lock_global_oldest_visible_mvccid by xlocator_upgrade_instances_domain and redistribute_partition_data (so their inline heap_vacuum_all_objects threshold stays valid), unlocked in log_complete / log_complete_for_2pc on commit and abort alike; the double-check honors the freeze but leaves a narrow, currently-benign store-after-lock window.
  5. The gate is_cursor_entry_ready_to_vacuum has exactly two rejections — newest_mvccid >= watermark and log-tail proximity — and a failure breaks the iteration because blocks register in log order; the settled-deleter invariant follows: no admitted block contains an MVCCID any snapshot still holds in doubt.
  6. An entry’s stored oldest_visible_mvccid (captured at logging time) never gates eligibility — newest_mvccid vs the live watermark does; the stored value drives the trailing edge (set_oldest_unvacuumed_on_boot, upgrade_oldest_unvacuumed) via the ascending-entry invariant, and vacuum_is_mvccid_vacuumed (id < oldest_unvacuumed_mvccid) serves heap-page reclamation and consistency checkers — completing the bracket around vacuum’s working band.

Chapter 6: Master Dispatch and the Job Cursor

A block is registered in vacuum data (Chapter 4) and eligible under the watermark (Chapter 5). This chapter traces how the master finds it and hands it to a worker without losing its place. vacuum_boot wires the two halves: a worker pool (vacuum_Worker_threads, sized by PRM_ID_VACUUM_WORKER_COUNT) and vacuum_Master_daemon, whose looper runs one vacuum_master_task::execute pass every PRM_ID_VACUUM_MASTER_WAKEUP_INTERVAL milliseconds. Between passes, vacuum_job_cursor preserves the iteration position across vacuum-data mutation. All code is from src/query/vacuum.c, SERVER_MODE only (standalone is Chapter 11).

vacuum_master_task is the cubthread::entry_task run on each wakeup. vacuum_boot constructs it exactly once (new vacuum_master_task ()) and hands it to create_daemon, which re-runs the same instance every interval — so its members carry state across wakeups.

| Field | Role |
|---|---|
| m_cursor (vacuum_job_cursor) | Iteration position; resumes where the previous wakeup stopped |
| m_oldest_visible_mvccid (MVCCID) | Watermark snapshot, recomputed once per execute, held stable for the pass (Chapter 5) |
| m_outstanding_job_count (std::size_t) | Jobs pushed but not yet seen finished; master-thread-only, lock-free; an estimate reconciled at finished-queue drains (6.5) |

vacuum_job_cursor is the struct of this chapter. Its header comment states the contract: the blockid is the real progress indicator; the page/index pair is a refixable cache, since data maintenance can relocate an entry to a different page.

| Field | Role |
|---|---|
| m_blockid (VACUUM_LOG_BLOCKID) | Logical, flag-free position: dense, monotonic, stable across page unlink/append; a bare (page, index) would dangle when vacuum_Data.update () removes a head page |
| m_page (VACUUM_DATA_PAGE *) | Page containing m_blockid, or NULL (the “unloaded” state); caches the page fix for the hot loop |
| m_index (INT16) | Slot of m_blockid in m_page->data[], or INDEX_NOT_FOUND (-1); doubles as the WAL record offset in 6.4 |

Unloaded (m_page == NULL) means only the blockid is meaningful: is_valid () is false and the dispatch loop does not run. Loaded means page fixed, slot valid, get_current_entry () returns m_page->data[m_index]. The destructor asserts m_page == NULL: whoever loads must unload.

vacuum_worker_task is the job descriptor handed to the pool. Its single field m_data is a by-value copy of the entry — the worker must not touch the live data page, which can be relocated or unlinked mid-job.

// vacuum_worker_task -- src/query/vacuum.c
class vacuum_worker_task : public cubthread::entry_task
{
  public:
    vacuum_worker_task (const VACUUM_DATA_ENTRY & entry_ref)
      : m_data (entry_ref)      /* <- copy, not reference */
    { }
    void execute (cubthread::entry & thread_ref) final
    { vacuum_process_log_block (&thread_ref, &m_data, false); }   /* <- Chapter 7; assert elided */
  private:
    vacuum_worker_task ();      /* <- private: no entry, no task */
    VACUUM_DATA_ENTRY m_data;
};

6.2 vacuum_master_task::execute — the per-wakeup pass

// vacuum_master_task::execute -- src/query/vacuum.c
void
vacuum_master_task::execute (cubthread::entry &thread_ref)
{
  if (prm_get_bool_value (PRM_ID_DISABLE_VACUUM)) { return; }
  if (check_shutdown ()) { return; }
  if (!BO_IS_SERVER_RESTARTED ()) { return; }   /* <- boot not finished or aborted */
  // ... condensed: perf tracker, pgbuf_thread_variables_init ...
  m_oldest_visible_mvccid = log_Gl.mvcc_table.update_global_oldest_visible ();
  if (!vacuum_Data.is_loaded)
    {
      vacuum_data_load_first_and_last_page (&thread_ref);   /* <- lazy: master never commits */
      m_cursor.set_on_vacuum_data_start ();
    }
  // ... condensed: pgbuf_flush_if_requested on first_page and last_page ...
  decrease_outstanding_job (m_cursor.force_data_update ());   /* <- the unconditional tick */
  for (; m_cursor.is_valid () && !should_interrupt_iteration (); m_cursor.increment_blockid ())
    {
      if (!is_cursor_entry_ready_to_vacuum ())
        { break; }              /* <- entries are blockid-ordered; later ones cannot be ready either */
      if (!is_cursor_entry_available ())
        { continue; }           /* <- already vacuumed or in progress; skip */
      start_job_on_cursor_entry ();
      if (should_force_data_update ())
        { decrease_outstanding_job (m_cursor.force_data_update ()); }
    }
  m_cursor.unload ();
  // ... condensed: debug-build (!NDEBUG) fix-count verification, perf timer ...
}

Walkthrough, branch by branch:

  1. Early-outs. Vacuum disabled; shutdown; boot not finished (dispatching against half-restored data would be unsound). check_shutdown () delegates to vacuum_shutdown_sequence::check_shutdown_request (), a three-state handshake (NO_SHUTDOWN to SHUTDOWN_REQUESTED to SHUTDOWN_REGISTERED): the master registers under m_state_mutex and notify_ones the requester, which blocks in request_shutdown until registration — a shutdown request is never missed by a sleeping or mid-pass master.
  2. Watermark refresh, once per wakeup.
  3. Lazy data load on the first pass: boot’s commit would unfix the boundary pages and the master never commits, so the master loads them itself. set_on_vacuum_data_start parks the cursor blockid on the first blockid without loading a page.
  4. The unconditional update tick. m_cursor.force_data_update () (6.3) drains the finished-job queue and consumes the block buffer (vacuum_Data.update ()), then re-positions and re-loads the cursor; its return value — jobs marked finished — feeds decrease_outstanding_job (6.5).
  5. The cursor loop, three exits and one skip:
    • Not ready → break: newest_mvccid >= m_oldest_visible_mvccid (still visible) or start_lsa.pageid + 1 >= log_Gl.append.prev_lsa.pageid (too close to the log tail). Both are monotone over blockid order, so no later entry can pass; break bypasses the for-increment, and the next wakeup retries the same blockid. This is the nothing-eligible idle path: a quiet pass costs one watermark refresh, one tick, one entry probe.
    • Not available → continue, which does increment (6.4).
    • Dispatch (6.4), then, if should_force_data_update () holds (vacuum_Finished_job_queue or vacuum_Block_data_buffer reports is_half_full ()), a mid-loop tick, which the cursor survives via readjust_to_vacuum_data_changes (6.3).
    • Loop condition false → exit: cursor invalid (all data consumed, or search found nothing), or should_interrupt_iteration () — shutdown, or the pool-full backoff when m_outstanding_job_count reaches VACUUM_MAX_TASKS_IN_WORKER_POOL = 3 * PRM_ID_VACUUM_WORKER_COUNT; three queued tasks per worker bounds the backlog until a later tick reconciles the count.
  6. m_cursor.unload () unfixes the cursor page but keeps the blockid, so the next wakeup’s tick resumes exactly there — cursor persistence across sleeps.

Invariant — the cursor page is never held across a sleep or a data update. execute ends with unload (); force_data_update unloads before vacuum_Data.update (). Enforced by assert (m_page == NULL) in the destructor and in search, plus debug-build vacuum_verify_vacuum_data_page_fix_count after every pass. Otherwise update () could unlink or deallocate a page the cursor still has fixed.

6.3 The cursor: keeping a place in moving data

All position changes funnel through change_blockid: it asserts the forward-only rule (assert (m_blockid <= blockid)), then either detects exhaustion (m_blockid > vacuum_Data.get_last_blockid (), asserted to be exactly last + 1) and unload ()s, or reloads. increment_blockid is change_blockid (m_blockid + 1). reload is the cheap path: if a page is fixed and get_index_of_blockid finds the new blockid in it, only m_index moves — the common case, since one page holds many consecutive entries. Otherwise it unloads and falls through to search; load (used after updates) asserts !is_loaded () and calls search directly:

// vacuum_job_cursor::search -- src/query/vacuum.c
void
vacuum_job_cursor::search ()
{
  assert (m_page == NULL);
  vacuum_data_page *data_page = vacuum_Data.first_page;
  while (true)
    {
      m_index = data_page->get_index_of_blockid (m_blockid);
      if (m_index != vacuum_data_page::INDEX_NOT_FOUND)
        { m_page = data_page; return; }   /* <- found: keep the fix */
      VPID next_vpid = data_page->next_page;
      vacuum_unfix_data_page (&cubthread::get_entry (), data_page);
      if (VPID_ISNULL (&next_vpid))
        { return; }             /* <- not found: cursor stays unloaded */
      data_page = vacuum_fix_data_page (&cubthread::get_entry (), &next_vpid);
    }
}

The per-page probe is O(1): entries within a page are consecutive blockids, so vacuum_data_page::get_index_of_blockid computes (blockid - first_blockid) + index_unvacuumed after an emptiness check and two range checks. The not-found exit leaves m_page == NULL, hence is_valid () false — also how empty vacuum data terminates dispatch without a special case. The fix/unfix calls are vacuum-specific macros: first_page and last_page stay fixed for the master’s entire uptime, so vacuum_fix_data_page returns the cached pointer for a boundary page and vacuum_unfix_data_page skips the real pgbuf_unfix for them. Hence search may start from vacuum_Data.first_page without a fix call, and the debug fix-count check expects exactly those two permanent fixes after unload ().
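
The arithmetic probe can be sketched as follows. This is a model under assumptions rather than the real vacuum_data_page: `DataPageModel` is hypothetical, `index_free` (one past the last used slot) is assumed from the emptiness and range checks the text mentions, and slots hold bare blockids instead of full entries.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using BlockId = std::int64_t;

constexpr int INDEX_NOT_FOUND = -1;

// Hypothetical page model: used slots are [index_unvacuumed, index_free)
// and hold consecutive blockids.
struct DataPageModel
{
  int index_unvacuumed = 0;
  int index_free = 0;
  std::vector<BlockId> blockids;

  int get_index_of_blockid (BlockId blockid) const
  {
    if (index_unvacuumed == index_free)
      {
        return INDEX_NOT_FOUND;         // emptiness check
      }
    const BlockId first = blockids[static_cast<std::size_t> (index_unvacuumed)];
    if (blockid < first)
      {
        return INDEX_NOT_FOUND;         // range check: before this page
      }
    const BlockId last = blockids[static_cast<std::size_t> (index_free - 1)];
    if (blockid > last)
      {
        return INDEX_NOT_FOUND;         // range check: after this page
      }
    // Consecutive blockids make the slot a pure offset from the first entry.
    return static_cast<int> (blockid - first) + index_unvacuumed;
  }
};
```

The probe never iterates over slots, so the cost of search is proportional to the number of pages walked, not the number of entries.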

force_data_update brackets the mutation: unload () (can’t be loaded while updating), mark_finished = vacuum_Data.update (), readjust_to_vacuum_data_changes (), load (), return mark_finished. The readjustment handles the update removing the cursor’s blockid: vacuum_data_mark_finished (Chapter 9) trims vacuumed head entries and may unlink head pages, so the first blockid can leap past m_blockid; the cursor “was left behind” and jumps to the new first blockid (on empty data it does nothing — load () finds nothing). Relocation of the same blockid to a different page needs no code: load () simply re-searches from the new first page.
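
The bracket and the "left behind" jump can be modeled in a few lines. This is a sketch with hypothetical types (`VacuumDataModel`, `CursorModel`): vacuum data is reduced to a deque of consecutive blockids, the update only trims finished head entries, and page fixing is reduced to a `loaded` flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

using BlockId = std::int64_t;

// Hypothetical stand-in for vacuum data: consecutive blockids, oldest first.
struct VacuumDataModel
{
  std::deque<BlockId> blockids;

  // Removes up to n head entries and returns how many were removed.
  std::size_t trim_head (std::size_t n)
  {
    std::size_t removed = 0;
    while (removed < n && !blockids.empty ())
      {
        blockids.pop_front ();
        removed++;
      }
    return removed;
  }
};

struct CursorModel
{
  BlockId blockid = 0;
  bool loaded = false;

  void readjust (const VacuumDataModel &data)
  {
    if (data.blockids.empty ())
      {
        return;                         // nothing to do; load will find nothing
      }
    if (data.blockids.front () > blockid)
      {
        blockid = data.blockids.front ();   // left behind: jump forward only
      }
  }

  void load (const VacuumDataModel &data)
  {
    loaded = !data.blockids.empty ()
             && blockid >= data.blockids.front () && blockid <= data.blockids.back ();
  }

  // unload -> update -> readjust -> load; returns the finished count.
  std::size_t force_data_update (VacuumDataModel &data, std::size_t finished_head_entries)
  {
    loaded = false;
    std::size_t mark_finished = data.trim_head (finished_head_entries);
    readjust (data);
    load (data);
    return mark_finished;
  }
};
```

The blockid only ever stays or moves forward here, which is the same forward-only rule the cursor enforces on every position change.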

Invariant — the cursor blockid is monotonically non-decreasing. Enforced by the change_blockid assert; readjust only moves forward. A backward move would re-visit dispatched entries and break the arithmetic assumptions in get_index_of_blockid for finished-and-removed blockids.

flowchart TD
  A["change_blockid(b)"] --> B{"b > last_blockid?"}
  B -- yes --> C["unload: all data consumed"]
  B -- no --> D{"m_page fixed and<br/>b still in this page?"}
  D -- yes --> G["move m_index only (hot path)"]
  D -- no --> I["unload + search"]
  I --> J{"probe page: get_index_of_blockid"}
  J -- found --> K["m_page = page, keep fix"]
  J -- "not found, next_page null" --> L["stay unloaded -> is_valid false"]
  J -- "not found, has next" --> M["unfix, fix next page"] --> J

Figure 6-1: cursor relocation — change_blockid, reload, and search branches.

is_cursor_entry_available gates on the status bits packed into the top of the 64-bit entry blockid (Chapter 1): is_available () passes; otherwise the entry is asserted to be is_vacuumed () (done, awaiting removal) or is_job_in_progress () (a worker owns it, or a pre-crash run left it flagged) and is skipped. For an available entry, start_job_on_cursor_entry marks the entry in the cursor’s page, pushes a vacuum_worker_task built from m_cursor.get_current_entry (), and calls increase_outstanding_job (). The marking step:

// vacuum_job_cursor::start_job_on_current_entry -- src/query/vacuum.c
void
vacuum_job_cursor::start_job_on_current_entry () const
{
  assert (is_valid ());
  cubthread::entry *thread_p = &cubthread::get_entry ();
  vacuum_data_entry &entry = m_page->data[m_index];
  entry.set_job_in_progress ();   /* <- status bits -> IN_PROGRESS, in the page itself */
  if (!entry.was_interrupted ())
    {
      /* Log that a new job is starting. After recovery, the system will then know this job was
       * partially executed. */
      LOG_DATA_ADDR addr { NULL, (PAGE_PTR) m_page, (PGLENGTH) m_index };
      log_append_redo_data (thread_p, RVVAC_START_JOB, &addr, 0, NULL);
    }
  vacuum_set_dirty_data_page_dont_free (thread_p, m_page);
}

The entry is flagged IN_PROGRESS in the data page and the RVVAC_START_JOB redo record is appended before the task is pushed. The record carries zero data bytes — page pointer plus entry index (the record’s offset) are the payload — and its redo function, vacuum_rv_redo_start_job, replays set_job_in_progress () on data[rcv->offset] and dirties the page.

Invariant — a job is WAL-marked started before any worker can act on it, and a block is in the worker pool at most once at any time. Enforced by the flag-then-log-then-push ordering in one dispatch call (the source comment notes logging happens here to avoid re-latching vacuum data later) plus the is_job_in_progress () skip above. On a mid-job crash, recovery replays RVVAC_START_JOB, and Chapter 11’s restore pass converts IN_PROGRESS to AVAILABLE plus INTERRUPTED, so the block is re-vacuumed in safe mode; the !was_interrupted () guard skips re-logging such re-dispatched blocks. The worker task’s by-value copy carries the just-set IN_PROGRESS and any INTERRUPTED bit, which vacuum_process_log_block consults for safe-mode redo (Chapter 7).
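
The status transitions around dispatch and crash recovery can be sketched with flag bits in the top of a 64-bit blockid. Everything concrete here is an assumption for illustration: the mask values, `EntryModel`, `start_job`, and `restore_after_crash` are hypothetical, and logging is reduced to a boolean saying whether a start-job record would be written.

```cpp
#include <cassert>
#include <cstdint>

using BlockId = std::uint64_t;

// Assumed bit layout, for illustration only.
constexpr BlockId STATUS_MASK        = 0xC000000000000000ULL;
constexpr BlockId STATUS_VACUUMED    = 0x8000000000000000ULL;
constexpr BlockId STATUS_IN_PROGRESS = 0x4000000000000000ULL;
constexpr BlockId STATUS_AVAILABLE   = 0x0000000000000000ULL;
constexpr BlockId FLAG_INTERRUPTED   = 0x2000000000000000ULL;
constexpr BlockId FLAGS_MASK         = STATUS_MASK | FLAG_INTERRUPTED;

struct EntryModel
{
  BlockId blockid;

  BlockId get_blockid () const { return blockid & ~FLAGS_MASK; }
  bool is_available () const { return (blockid & STATUS_MASK) == STATUS_AVAILABLE; }
  bool is_job_in_progress () const { return (blockid & STATUS_MASK) == STATUS_IN_PROGRESS; }
  bool is_vacuumed () const { return (blockid & STATUS_MASK) == STATUS_VACUUMED; }
  bool was_interrupted () const { return (blockid & FLAG_INTERRUPTED) != 0; }

  void set_job_in_progress () { blockid = (blockid & ~STATUS_MASK) | STATUS_IN_PROGRESS; }
  void set_interrupted () { blockid = (blockid & ~STATUS_MASK) | STATUS_AVAILABLE | FLAG_INTERRUPTED; }
};

// Dispatch side: flag first, then decide whether a start-job record is logged.
bool start_job (EntryModel &entry)
{
  entry.set_job_in_progress ();
  return !entry.was_interrupted ();     // re-dispatched blocks are not re-logged
}

// Boot side: an entry left in progress becomes available again, marked interrupted.
void restore_after_crash (EntryModel &entry)
{
  if (entry.is_job_in_progress ())
    {
      entry.set_interrupted ();
    }
}
```

In this model the logical blockid survives every transition, which is what lets a by-value copy of the entry carry both the position and the interrupted bit to the worker.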

increase_outstanding_job is a bare ++m_outstanding_job_count, master-thread-only. The decrease side never observes completion directly: workers report into the lock-free vacuum_Finished_job_queue (Chapter 9), and the master learns of completions only when vacuum_Data.update () drains that queue inside force_data_update, whose return value (the count vacuum_data_mark_finished consumed) flows into decrease_outstanding_job. Both of its defensive branches — negative count, and count exceeding the current total — assert (false), log, and clamp the counter to zero rather than wrap: an unsigned underflow would read as “pool full forever” and silently stop all vacuuming. Because decreases happen only at update ticks, the count over-approximates in-flight jobs between ticks; the worst case is a conservative early is_task_queue_full exit, fixed by the next tick. This is also why should_force_data_update fires at half-full (VACUUM_FINISHED_JOB_QUEUE_CAPACITY = 2048): waiting for full would risk workers blocking on a full finished queue while the master is still mid-loop.
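
A small model of the counter shows why clamping matters for an unsigned type. It follows the description above rather than the source: `OutstandingJobsModel` and its method names are hypothetical, the defensive branches return false where the real code asserts and logs, and the pool limit uses the three-tasks-per-worker rule from 6.2.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the master's outstanding-job estimate.
struct OutstandingJobsModel
{
  std::size_t outstanding = 0;
  std::size_t worker_count;

  explicit OutstandingJobsModel (std::size_t workers) : worker_count (workers) {}

  std::size_t max_tasks () const { return 3 * worker_count; }
  bool is_pool_full () const { return outstanding >= max_tasks (); }

  void increase () { ++outstanding; }

  // Returns false when a defensive branch clamped the counter.
  bool decrease (int mark_finished)
  {
    if (mark_finished < 0)
      {
        outstanding = 0;                // negative count: accounting error
        return false;
      }
    const std::size_t finished = static_cast<std::size_t> (mark_finished);
    if (finished > outstanding)
      {
        outstanding = 0;                // would wrap below zero: clamp instead
        return false;
      }
    outstanding -= finished;
    return true;
  }
};
```

Without the second branch, subtracting 5 from 2 on a `std::size_t` would produce a huge value and `is_pool_full ()` would stay true indefinitely.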

  1. vacuum_master_task::execute runs one bounded pass per PRM_ID_VACUUM_MASTER_WAKEUP_INTERVAL wakeup: early-out guards (disable parameter, shutdown handshake, boot incomplete), watermark refresh, an unconditional force_data_update tick, then the cursor loop — exited by entry-not-ready (break; same blockid retried next wakeup, the idle path), cursor exhaustion, shutdown, or pool saturation at 3 * PRM_ID_VACUUM_WORKER_COUNT.
  2. vacuum_job_cursor’s source of truth is the logical m_blockid; (m_page, m_index) is a refixable cache: reload moves only the index in the common case, search re-walks the page chain with an O(1) arithmetic probe per page, and readjust_to_vacuum_data_changes jumps forward when head trimming removed the cursor’s blockid.
  3. The cursor is forward-only (assert (m_blockid <= blockid)) and must be unloaded across every vacuum_Data.update () and at the end of every pass — the update may unlink the very page the cursor has fixed; the fix/unfix macros special-case the permanently fixed first/last pages.
  4. Dispatch is flag → WAL → push: RVVAC_START_JOB (zero payload, addressed by page + entry index) is logged before the worker task exists, so recovery knows which jobs may have partially run; the IN_PROGRESS skip guarantees a block is dispatched at most once concurrently.
  5. vacuum_worker_task carries a by-value copy of the data entry, fully decoupling workers from vacuum data pages; its default constructor is private.
  6. m_outstanding_job_count is master-only and lock-free, reconciled solely through the finished-job queue at update ticks; it over-counts between ticks (occasional early backoff, zero synchronization) and clamps to zero on accounting errors instead of wrapping.

Chapter 7: Worker Log Pass and Per Record Dispatch

Chapter 6 ended with vacuum_worker_task::execute handing a copy of a VACUUM_DATA_ENTRY to vacuum_process_log_block. This chapter traces its PROCESS_LOG phase: prefetch the block’s log pages once, walk the MVCC-op chain backward from start_lsa, and turn each record into one of three actions — collect a heap OID, vacuum a b-tree entry inline, or delete an external-storage file. Replaying the collected heap array is Chapter 8; completion reporting is Chapter 9.

7.1 Worker-local state and the one-shot prefetch

Everything here lives in VACUUM_WORKER, allocated once by vacuum_worker_allocate_resources (Chapter 2) and reused across jobs:

// vacuum_worker -- src/query/vacuum.h
struct vacuum_worker
{
  VACUUM_WORKER_STATE state;            /* INACTIVE / PROCESS_LOG / EXECUTE */
  INT32 drop_files_version;             /* last seen dropped-files version (Ch 10) */
  struct log_zip *log_zip_p;            /* unzip scratch */
  VACUUM_HEAP_OBJECT *heap_objects;     /* collected per-job heap targets */
  int heap_objects_capacity;            /* starts at 4000, doubles on demand */
  int n_heap_objects;                   /* reset to 0 at job start */
  char *undo_data_buffer;               /* page-straddling undo reassembly */
  int undo_data_buffer_capacity;
  // ... condensed (private_lru_index) ...
  char *prefetch_log_buffer;            /* the block's log pages, fetched once */
  LOG_PAGEID prefetch_first_pageid;
  LOG_PAGEID prefetch_last_pageid;
  // ... condensed (allocated_resources, idx) ...
};
// vacuum_heap_object -- src/query/vacuum.h
struct vacuum_heap_object
{
  VFID vfid;                            /* File ID of heap file. */
  OID oid;                              /* Object OID. */
};

The prefetch buffer holds VACUUM_PREFETCH_LOG_BLOCK_BUFFER_PAGES = 1 + vacuum_Data.log_block_npages pages — one extra beyond the block, because a record starting in the block’s final page may spill into the next, and vacuum_log_prefetch_vacuum_block’s header comment is explicit that only one spill page is handled. The function sets prefetch_first_pageid/prefetch_last_pageid and loops logpb_fetch_page (.., LOG_CS_SAFE_READER, ..) across the range; its only branch is fetch failure (assert (false) + ER_FAILED). Prefetch is skipped when sa_mode_partial_block is true — the SA_MODE tail block of Chapter 11 is not fully logged.

Two early-outs in vacuum_process_log_block — the PRM_ID_DISABLE_VACUUM guard at entry and a prefetch failure — return before the end: label, so vacuum_finished_block_vacuum (Chapter 9) never runs and the block’s entry stays in-progress in vacuum data; nothing retries it in this server’s lifetime. Only the next restart reclaims it, when vacuum_data_load_and_recover sweeps in-progress entries with set_interrupted ().

Every later page access goes through vacuum_fetch_log_page:

// vacuum_fetch_log_page -- src/query/vacuum.c
if (vacuum_is_thread_vacuum (thread_p))
  {
    perfmon_inc_stat (thread_p, PSTAT_VAC_NUM_PREFETCH_REQUESTS_LOG_PAGES);
    if (worker->prefetch_first_pageid <= log_pageid && log_pageid <= worker->prefetch_last_pageid)
      {
        size_t page_index = log_pageid - worker->prefetch_first_pageid;
        memcpy (log_page_p, worker->prefetch_log_buffer + page_index * LOG_PAGESIZE, LOG_PAGESIZE);
        perfmon_inc_stat (thread_p, PSTAT_VAC_NUM_PREFETCH_HITS_LOG_PAGES);
        return NO_ERROR;
      }
    // else: warning log, fall through
  }
// need to fetch from log
error = logpb_fetch_page (thread_p, &req_lsa, LOG_CS_SAFE_READER, log_page_p);
if (error != NO_ERROR)
  { assert (false); logpb_fatal_error (thread_p, true, ARG_FILE_LINE, "vacuum_fetch_log_page"); }

A worker can legitimately miss only forward — a tail record extending past the single spill page. It never misses backward: the loop bound in 7.2 stops before any LSA in the previous block is dereferenced. The non-worker arm serves vacuum_recover_lost_block_data (Chapter 11), where the boot thread has no prefetch buffer. In every path, fetch failure is logpb_fatal_error: vacuum cannot progress without the page, and skipping it would silently leak dead versions forever.
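
The hit test and the single spill page can be sketched with a tiny in-memory window. The model is an illustration under assumptions: `PrefetchModel`, `make_prefetch`, and the 16-byte `PAGE_SIZE_MODEL` are hypothetical, the last page id is assumed to be first + block_npages (block pages plus one spill page), and each page is filled with its own index so a copy can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using LOG_PAGEID = std::int64_t;

constexpr std::size_t PAGE_SIZE_MODEL = 16;     // stand-in for LOG_PAGESIZE

// Hypothetical model of the worker's prefetch window.
struct PrefetchModel
{
  LOG_PAGEID first_pageid;
  LOG_PAGEID last_pageid;                       // inclusive
  std::vector<char> buffer;                     // contiguous pages

  // Copies the page into 'out' on a hit; a miss means "fetch from the log".
  bool fetch (LOG_PAGEID pageid, char *out) const
  {
    if (first_pageid <= pageid && pageid <= last_pageid)
      {
        const std::size_t page_index = static_cast<std::size_t> (pageid - first_pageid);
        std::memcpy (out, buffer.data () + page_index * PAGE_SIZE_MODEL, PAGE_SIZE_MODEL);
        return true;
      }
    return false;
  }
};

// Builds a window of block_npages + 1 pages; page i is filled with the byte value i.
PrefetchModel make_prefetch (LOG_PAGEID first_pageid, int block_npages)
{
  PrefetchModel p;
  const std::size_t npages = static_cast<std::size_t> (block_npages) + 1;
  p.first_pageid = first_pageid;
  p.last_pageid = first_pageid + block_npages;
  p.buffer.resize (npages * PAGE_SIZE_MODEL);
  for (std::size_t i = 0; i < npages; i++)
    {
      std::memset (p.buffer.data () + i * PAGE_SIZE_MODEL, static_cast<int> (i), PAGE_SIZE_MODEL);
    }
  return p;
}
```

For a 4-page block starting at page 100, pages 100 through 104 hit, and page 105 is the forward miss that falls back to the log.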

7.2 The backward walk and its per-iteration gates

Chapter 3 showed how every MVCC op log record embeds a LOG_VACUUM_INFO whose prev_mvcc_op_log_lsa points at the previous MVCC op record. The block’s start_lsa is the chain’s newest end; the worker walks backward, the next position coming out of the record just parsed — not a scan:

// vacuum_process_log_block -- src/query/vacuum.c
for (LSA_COPY (&log_lsa, &data->start_lsa); !LSA_ISNULL (&log_lsa) && log_lsa.pageid >= first_block_pageid;
     LSA_COPY (&log_lsa, &log_vacuum.prev_mvcc_op_log_lsa))

Invariant — chain-complete, block-bounded walk. Every MVCC op record in the block is reachable from start_lsa through prev_mvcc_op_log_lsa, in strictly decreasing LSA order. The bound log_lsa.pageid >= first_block_pageid (VACUUM_FIRST_LOG_PAGEID_IN_BLOCK of the blockid) partitions the single global chain into per-block jobs: this worker stops where the previous block’s job takes over. A producer that appended an MVCC op record without linking it would make it permanently invisible — there is no fallback scan.
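
The block-bounded chain walk can be sketched over an in-memory log. The model is an illustration, not the worker loop: `Lsa`, `RecordModel`, `LogModel`, and `walk_block` are hypothetical, a null LSA is represented by pageid -1, and the first page of a block is assumed to be blockid * block_npages.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Lsa
{
  std::int64_t pageid;
  int offset;
};

const Lsa NULL_LSA_MODEL = {-1, -1};            // stand-in for a null LSA

struct RecordModel
{
  Lsa prev_mvcc_op_log_lsa;                     // link to the previous MVCC op record
};

using LogKey = std::pair<std::int64_t, int>;
using LogModel = std::map<LogKey, RecordModel>;

LogKey key (const Lsa &lsa) { return LogKey (lsa.pageid, lsa.offset); }

void add_record (LogModel &log, const Lsa &at, const Lsa &prev)
{
  log[key (at)] = RecordModel {prev};
}

// Walks backward from start_lsa and stops at a null link or at the block boundary.
std::vector<Lsa> walk_block (const LogModel &log, Lsa start_lsa,
                             std::int64_t blockid, std::int64_t block_npages)
{
  const std::int64_t first_block_pageid = blockid * block_npages;   // assumed mapping
  std::vector<Lsa> visited;
  for (Lsa lsa = start_lsa; lsa.pageid != -1 && lsa.pageid >= first_block_pageid;
       lsa = log.at (key (lsa)).prev_mvcc_op_log_lsa)
    {
      visited.push_back (lsa);
    }
  return visited;
}
```

A record that was never linked into the chain simply does not appear in `visited`, which is the model's version of the "no fallback scan" warning.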

Before the loop, the job snapshots threshold_mvccid from get_global_oldest_visible (), zeroes n_heap_objects, and computes was_interrupted = data->was_interrupted () || sa_mode_partial_block — Chapter 8 relaxes its safe-guards when a previous run may have already vacuumed some targets. Each iteration then runs four gates before dispatch:

  1. Shutdown / interrupt. Under SERVER_MODE, thread_p->shutdown causes goto end with error_code still NO_ERROR — the job is abandoned and Chapter 9’s vacuum_finished_block_vacuum marks the block interrupted for re-execution. Under SA_MODE the equivalent (logtb_get_check_interrupt plus logtb_is_interrupted) does set error_code = ER_INTERRUPTED: standalone vacuum runs inside a user-visible operation, so the interrupt must surface as an error.
  2. State flip to PROCESS_LOG, paired with PERF_UTIME_TRACKER_TIME_AND_RESTART (..., PSTAT_VAC_WORKER_EXECUTE) — the time until the flip was execute time.
  3. Page cache check. if (log_page_p->hdr.logical_pageid != log_lsa.pageid) refetches via vacuum_fetch_log_page; failure is assert_release + logpb_fatal_error + goto end.
  4. Record parse. vacuum_process_log_record (7.3). On error, vacuum_check_shutdown_interruption asserts the failure is shutdown-legitimate (!vacuum_is_thread_vacuum_worker (thread_p) || (thread_p->shutdown && error_code == ER_INTERRUPTED)), then goto end.

After parsing, the state flips to VACUUM_WORKER_STATE_EXECUTE (mirror perf restart against PSTAT_VAC_WORKER_PROCESS_LOG), then two more gates: the dropped-file continue (the record’s whole file is gone — 7.3 and Chapter 10), and a !NDEBUG-only envelope check — assert (0) + logpb_fatal_error + goto end on violation.

Invariant — MVCCID envelope. In debug builds, every MVCCID met on the walk must lie inside [data->oldest_visible_mvccid, data->newest_mvccid] recorded at block birth (Chapter 3) and strictly below the job’s threshold_mvccid snapshot. Violation means vacuum data or the watermark is corrupt, and proceeding would delete versions some snapshot can still see — hence fatal, not skip.
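
The envelope test reduces to three comparisons. The sketch below uses plain integer comparison and hypothetical names (`BlockEnvelope`, `is_inside_envelope`); the real check uses the MVCCID comparison macros and ends in a fatal error rather than a boolean.

```cpp
#include <cassert>
#include <cstdint>

using MVCCID = std::uint64_t;

// The two bounds recorded for a block at logging time.
struct BlockEnvelope
{
  MVCCID oldest_visible_mvccid;
  MVCCID newest_mvccid;
};

// True when an MVCCID met on the walk is consistent with the block and the job threshold.
bool is_inside_envelope (const BlockEnvelope &block, MVCCID threshold_mvccid, MVCCID mvccid)
{
  if (mvccid >= threshold_mvccid)
    {
      return false;             // not strictly below the job's watermark snapshot
    }
  if (mvccid < block.oldest_visible_mvccid)
    {
      return false;             // older than anything the block could have logged
    }
  if (mvccid > block.newest_mvccid)
    {
      return false;             // newer than the block's recorded maximum
    }
  return true;
}
```

For an admitted block the threshold is already above newest_mvccid, so the first comparison only fires when the watermark or the entry itself is corrupt.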

flowchart TD
    A["log_lsa = start_lsa"] --> B{"LSA null or pageid<br/>below first_block_pageid?"}
    B -- yes --> Z["loop done -> vacuum_heap<br/>Chapter 8, then vacuum_complete=true"]
    B -- no --> C{"shutdown SERVER_MODE /<br/>interrupt SA_MODE?"}
    C -- yes --> END["goto end: state INACTIVE,<br/>vacuum_finished_block_vacuum Ch 9"]
    C -- no --> D["state = PROCESS_LOG"] --> E{"page cached?"}
    E -- no --> F["vacuum_fetch_log_page"]
    F -- error --> END
    E -- yes --> G
    F -- ok --> G["vacuum_process_log_record"]
    G -- error --> END
    G -- ok --> H["state = EXECUTE"] --> I{"is_file_dropped?"}
    I -- yes --> N["continue"]
    I -- no --> V{"debug build: MVCCID<br/>outside envelope?"}
    V -- "yes: fatal" --> END
    V -- no --> J{"rcvindex?"}
    J -- "heap op" --> K["collect OID, 7.4"]
    J -- "btree op" --> L["decode + vacuum inline, 7.5"]
    J -- "RVES_NOTIFY_VACUUM" --> M["delete lob file, 7.6"]
    J -- other --> Q["assert safeguard"]
    K --> N
    L --> N
    M --> N
    Q --> N
    N --> R["log_lsa = prev_mvcc_op_log_lsa"] --> B

Figure 7-1: branch-complete iteration of the PROCESS_LOG loop in vacuum_process_log_block.

stateDiagram-v2
    [*] --> INACTIVE
    INACTIVE --> PROCESS_LOG : iteration starts \n parse log record
    PROCESS_LOG --> EXECUTE : record parsed \n dispatch arm runs
    EXECUTE --> PROCESS_LOG : next chain hop
    EXECUTE --> INACTIVE : loop ends or goto end

Figure 7-2: worker state flips per iteration. The split keys the PSTAT_VAC_WORKER_PROCESS_LOG / PSTAT_VAC_WORKER_EXECUTE accounting, and a non-INACTIVE state makes the worker visible to vacuum_get_worker_min_dropped_files_version (Chapter 10) — how file droppers know they must wait for this worker.

7.3 vacuum_process_log_record, fully dissected


The parser leaves log_lsa_p at the record’s undo data, having extracted the MVCCID, the recovery target (LOG_DATA: rcvindex, volid, pageid, offset), the vacuum chain info, and — when needed — a usable undo-data pointer.

Header parse. After LOG_GET_LOG_RECORD_HEADER and an aligned hop over it, four record types are legal in three parse arms, each extracting the same five things from a differently shaped body:

// vacuum_process_log_record -- src/query/vacuum.c
if (log_rec_type == LOG_MVCC_UNDO_DATA)
  {
    vacuum_read_advance_when_doesnt_fit (thread_p, sizeof (*mvcc_undo), log_lsa_p, log_page_p);
    mvcc_undo = (LOG_REC_MVCC_UNDO *) (log_page_p->area + log_lsa_p->offset);
    *mvccid = mvcc_undo->mvccid;
    *log_record_data = mvcc_undo->undo.data;
    ulength = mvcc_undo->undo.length;
    LSA_COPY (&vacuum_info->prev_mvcc_op_log_lsa, &mvcc_undo->vacuum_info.prev_mvcc_op_log_lsa);
    VFID_COPY (&vacuum_info->vfid, &mvcc_undo->vacuum_info.vfid);
  }
else if (log_rec_type == LOG_MVCC_UNDOREDO_DATA || log_rec_type == LOG_MVCC_DIFF_UNDOREDO_DATA)
  { /* same shape via LOG_REC_MVCC_UNDOREDO; ulength = undoredo.ulength */ }
else if (log_rec_type == LOG_SYSOP_END)
  {
    if (sysop_end->type != LOG_SYSOP_END_LOGICAL_MVCC_UNDO)
      { assert (false); return ER_FAILED; } /* <- only this flavor carries vacuum info */
    mvcc_undo = &sysop_end->mvcc_undo; /* <- embedded LOG_REC_MVCC_UNDO, same extraction */
  }
else
  { assert (false); /* ER_GENERIC_ERROR */ return ER_FAILED; } /* <- any other type = corrupt chain */

(Struct shapes — LOG_REC_MVCC_UNDO/LOG_REC_MVCC_UNDOREDO wrapping LOG_VACUUM_INFO — are Chapter 3 material in log_record.hpp; diff and non-diff undoredo parse identically because vacuum reads only the undo side.)

The aligned-read clones. Four helpers exist because the stock LOG_READ_* macros fetch through the log page buffer, while vacuum must route through vacuum_fetch_log_page to hit its prefetch buffer. vacuum_read_log_aligned aligns offset to DOUBLE_ALIGNMENT and, while offset >= LOGAREA_SIZE, advances pageid and refetches (fetch failure: logpb_fatal_error); vacuum_read_log_add_aligned is add-then-align; vacuum_read_advance_when_doesnt_fit forces a next-page fetch when the requested struct would straddle the boundary — so the casts above always see a contiguous struct; vacuum_copy_data_from_log is one memcpy when the data fits the page, else a chunked copy across fetches.
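The arithmetic behind the first and third helpers can be sketched without any page fetching. Everything below is illustrative: the `sketch_` names, the 4096-byte area size and the alignment of 8 are assumptions, and each page hop only marks the point where the real helper would call vacuum_fetch_log_page.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ALIGNMENT 8u		/* assumed value of DOUBLE_ALIGNMENT */
#define SKETCH_LOGAREA_SIZE 4096u	/* hypothetical payload bytes per log page */

typedef struct
{
  long pageid;
  size_t offset;
} sketch_lsa;

/* Round offset up to the next multiple of SKETCH_ALIGNMENT. */
static size_t
sketch_align_up (size_t offset)
{
  return (offset + (SKETCH_ALIGNMENT - 1)) & ~(size_t) (SKETCH_ALIGNMENT - 1);
}

/* Align, then hop forward while the offset lies past the page area.
 * Returns the number of page hops; each hop is where a refetch would happen.
 * Carrying the remainder into the next page is an assumption of this sketch. */
static int
sketch_read_aligned (sketch_lsa * lsa)
{
  int hops = 0;

  lsa->offset = sketch_align_up (lsa->offset);
  while (lsa->offset >= SKETCH_LOGAREA_SIZE)
    {
      lsa->pageid++;
      lsa->offset -= SKETCH_LOGAREA_SIZE;
      hops++;
    }
  return hops;
}

/* If a struct of `size` bytes would reach the page boundary, move to the
 * start of the next page so a cast sees one contiguous struct. */
static bool
sketch_advance_when_doesnt_fit (sketch_lsa * lsa, size_t size)
{
  if (lsa->offset + size >= SKETCH_LOGAREA_SIZE)
    {
      lsa->pageid++;
      lsa->offset = 0;
      return true;
    }
  return false;
}
```

With these assumed sizes, an LSA at page 7, offset 4091 aligns to 4096 and therefore lands on page 8, offset 0 after one hop.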

Recovery early-out. stop_after_vacuum_info == true returns NO_ERROR right after the header arms — the caller only wanted prev_mvcc_op_log_lsa. The entry asserts show the contract: worker, undo_data_ptr, undo_data_size, is_file_dropped may all be NULL in this mode. The only such caller is vacuum_recover_lost_block_data (Chapter 11).

Dropped-file short-circuit. When the record carries a non-NULL vacuum_info->vfid:

// vacuum_process_log_record -- src/query/vacuum.c
if (worker->drop_files_version != vacuum_Dropped_files_version)
  {
    /* But first, cleanup collected heap objects. */
    VFID_COPY (&vfid, &vacuum_Last_dropped_vfid);
    vacuum_cleanup_collected_by_vfid (worker, &vfid);
    worker->drop_files_version = vacuum_Dropped_files_version;
  }
error_code = vacuum_is_file_dropped (thread_p, is_file_dropped, &vacuum_info->vfid, *mvccid);
if (error_code != NO_ERROR) { vacuum_check_shutdown_interruption (...); return error_code; }
if (*is_file_dropped == true) { return NO_ERROR; }

The handshake and ledger lookup are Chapter 10’s subject; what matters here is ordering — the worker must scrub its own heap_objects array before publishing the new version, because publishing releases the dropper, after which the file’s pages may be reused. vacuum_cleanup_collected_by_vfid qsorts the array with vacuum_compare_heap_object and excises the contiguous run matching the VFID. A dropped verdict returns NO_ERROR with *is_file_dropped = true; the block loop continues.
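The sort-then-excise step can be sketched as follows. The struct shapes and `sketch_` names are simplified stand-ins rather than the real VFID/OID definitions; the comparator follows the key order given in 7.4 (VFID fileid, volid, then OID pageid, volid, slotid).

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for VFID, OID and VACUUM_HEAP_OBJECT. */
typedef struct { int fileid; short volid; } sketch_vfid;
typedef struct { int pageid; short slotid; short volid; } sketch_oid;
typedef struct { sketch_vfid vfid; sketch_oid oid; } sketch_heap_object;

static int
sketch_cmp (long a, long b)
{
  return (a > b) - (a < b);
}

/* Orders by file first, then page, then slot. */
static int
sketch_compare_heap_object (const void *a, const void *b)
{
  const sketch_heap_object *x = (const sketch_heap_object *) a;
  const sketch_heap_object *y = (const sketch_heap_object *) b;
  int c;

  if ((c = sketch_cmp (x->vfid.fileid, y->vfid.fileid)) != 0) return c;
  if ((c = sketch_cmp (x->vfid.volid, y->vfid.volid)) != 0) return c;
  if ((c = sketch_cmp (x->oid.pageid, y->oid.pageid)) != 0) return c;
  if ((c = sketch_cmp (x->oid.volid, y->oid.volid)) != 0) return c;
  return sketch_cmp (x->oid.slotid, y->oid.slotid);
}

static int
sketch_same_file (const sketch_heap_object * o, sketch_vfid f)
{
  return o->vfid.fileid == f.fileid && o->vfid.volid == f.volid;
}

/* Sort, find the contiguous run belonging to `dropped`, and close the gap.
 * Returns the new element count. objs must be non-NULL. */
static size_t
sketch_cleanup_by_vfid (sketch_heap_object * objs, size_t n, sketch_vfid dropped)
{
  size_t start = 0, end;

  qsort (objs, n, sizeof (*objs), sketch_compare_heap_object);
  while (start < n && !sketch_same_file (&objs[start], dropped))
    start++;
  end = start;
  while (end < n && sketch_same_file (&objs[end], dropped))
    end++;
  memmove (&objs[start], &objs[end], (n - end) * sizeof (*objs));
  return n - (end - start);
}
```

Sorting first is what makes the dropped file's entries contiguous, so a single memmove removes them all.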

Undo-data extraction. Heap records return early — if (!LOG_IS_MVCC_BTREE_OPERATION (rcvindex) && rcvindex != RVES_NOTIFY_VACUUM) return NO_ERROR; — because the heap pass (Chapter 8) reads the current heap page, not logged images. For the rest, ZIP_CHECK (ulength) decides the real size (GET_ZIP_LEN), then:

// vacuum_process_log_record -- src/query/vacuum.c
if (log_lsa_p->offset + *undo_data_size < (int) LOGAREA_SIZE)
  { *undo_data_ptr = (char *) log_page_p->area + log_lsa_p->offset; } /* <- zero-copy into the page */
else
  {
    if (worker->undo_data_buffer_capacity < *undo_data_size)
      { /* realloc; NULL -> fatal-logged ER_FAILED; capacity grows monotonically */ }
    *undo_data_ptr = worker->undo_data_buffer;
    vacuum_copy_data_from_log (thread_p, *undo_data_ptr, *undo_data_size, log_lsa_p, log_page_p);
  }
if (is_zipped)
  {
    if (log_unzip (worker->log_zip_p, *undo_data_size, *undo_data_ptr))
      { *undo_data_size = (int) worker->log_zip_p->data_length;
        *undo_data_ptr = (char *) worker->log_zip_p->log_data; } /* <- now into log_zip's buffer */
    else { /* fatal-logged */ return ER_FAILED; }
  }

So undo_data may alias three owners — the log page, worker->undo_data_buffer, or worker->log_zip_p — all stable until the next record is parsed, exactly as long as the dispatch arms need.
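The zero-copy versus chunked-copy choice can be illustrated over an in-memory array of pages. This is a toy model under stated assumptions: `sketch_page`, the 16-byte area and `sketch_undo_data` are hypothetical, the caller supplies a scratch buffer that is already large enough, and decompression is left out.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_AREA 16		/* hypothetical bytes per log page area */

typedef struct
{
  unsigned char area[SKETCH_AREA];
} sketch_page;

/* Return a pointer to `size` contiguous bytes starting at
 * pages[pageid].area[offset]. If the data fits strictly inside the page the
 * pointer aliases the page (zero-copy); otherwise the bytes are gathered into
 * `scratch` page by page. The caller guarantees scratch has room for `size`
 * bytes and that every page touched exists. */
static const unsigned char *
sketch_undo_data (const sketch_page * pages, size_t pageid, size_t offset,
		  size_t size, unsigned char *scratch)
{
  size_t copied = 0;

  if (offset + size < SKETCH_AREA)
    {
      return pages[pageid].area + offset;
    }
  while (copied < size)
    {
      size_t chunk = SKETCH_AREA - offset;

      if (chunk > size - copied)
	{
	  chunk = size - copied;
	}
      memcpy (scratch + copied, pages[pageid].area + offset, chunk);
      copied += chunk;
      pageid++;			/* the real helper would fetch the next page here */
      offset = 0;
    }
  return scratch;
}
```

Either way the returned pointer is borrowed: it stays valid only as long as the page and the scratch buffer are not reused, which mirrors the "until the next record is parsed" lifetime described above.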

7.4 Arm 1 — heap: collect now, execute later


Heap rcvindexes (LOG_IS_MVCC_HEAP_OPERATION in mvcc.h: RVHF_MVCC_DELETE_REC_HOME, RVHF_MVCC_INSERT, RVHF_UPDATE_NOTIFY_VACUUM, RVHF_MVCC_DELETE_MODIFY_HOME, RVHF_MVCC_NO_MODIFY_HOME, RVHF_MVCC_REDISTRIBUTE) are not executed per record. The OID is reassembled from LOG_DATA — with one subtlety:

// vacuum_process_log_block -- src/query/vacuum.c
heap_object_oid.slotid = heap_rv_remove_flags_from_offset (log_record_data.offset);
/* <- offset & ~HEAP_RV_FLAG_VACUUM_STATUS_CHANGE (0x8000): the producer smuggled a recovery
 *    flag into the slotid field (Chapter 3); strip it or vacuum targets a garbage slot */
error_code = vacuum_collect_heap_objects (thread_p, worker, &heap_object_oid, &log_vacuum.vfid);
if (error_code != NO_ERROR)
  {
    assert_release (false);
    er_clear ();
    error_code = NO_ERROR;
    continue; /* <- one lost OID must not sink the block: release keeps going */
  }

vacuum_collect_heap_objects appends a VACUUM_HEAP_OBJECT (7.1) to worker->heap_objects, doubling capacity by realloc when full (initial VACUUM_DEFAULT_HEAP_OBJECT_BUFFER_SIZE = 4000; only failure mode is ER_OUT_OF_VIRTUAL_MEMORY). The VFID rides along because Chapter 8 must ask the file whether slots are reusable. vacuum_compare_heap_object — VFID fileid, volid, then OID pageid, volid, slotid — groups the array by file then page, both for Chapter 8’s vacuum_heap qsort and for 7.3’s excision. Deferring heap work batches all of a page’s records into one fix/log cycle instead of one per record.
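The flag strip and the capacity-doubling append can be sketched together. The `sketch_` names are hypothetical, the collector stores bare slot ids instead of full VACUUM_HEAP_OBJECT entries, and the starting capacity of 4 is chosen only to keep the example small (the text gives 4000 for the real buffer).

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_FLAG_VACUUM_STATUS_CHANGE 0x8000u	/* value taken from the text */

typedef struct
{
  uint16_t *slots;
  size_t count;
  size_t capacity;
} sketch_collector;

/* Clear the recovery flag bit that rides in the logged offset. */
static uint16_t
sketch_remove_flags_from_offset (uint16_t raw_offset)
{
  return (uint16_t) (raw_offset & ~SKETCH_FLAG_VACUUM_STATUS_CHANGE);
}

/* Append one slot id, doubling capacity when full.
 * Returns 0 on success, -1 if realloc fails (the only failure mode). */
static int
sketch_collect (sketch_collector * c, uint16_t raw_offset)
{
  if (c->count == c->capacity)
    {
      size_t new_capacity = c->capacity ? c->capacity * 2 : 4;
      uint16_t *grown = (uint16_t *) realloc (c->slots, new_capacity * sizeof (*grown));

      if (grown == NULL)
	{
	  return -1;
	}
      c->slots = grown;
      c->capacity = new_capacity;
    }
  c->slots[c->count++] = sketch_remove_flags_from_offset (raw_offset);
  return 0;
}
```

Doubling keeps the amortized cost of an append constant, which suits a collector that sees one entry per heap log record in the block.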

7.5 Arm 2 — b-tree: decode and execute inline


B-tree records (LOG_IS_MVCC_BTREE_OPERATION: RVBT_MVCC_DELETE_OBJECT, RVBT_MVCC_INSERT_OBJECT, RVBT_MVCC_INSERT_OBJECT_UNQ, RVBT_MVCC_NOTIFY_VACUUM) cannot be batched by page — the key must be searched top-down each time — so they run inline. The undo payload is decoded two ways: RVBT_MVCC_INSERT_OBJECT_UNQ goes through btree_rv_read_keybuf_two_objects, which unpacks the BTID and two BTREE_OBJECT_INFOs — a unique-index MVCC insert moves the incumbent out of the leaf’s first slot, so the log carries both versions and vacuum’s target OID/class-OID come from the old one. Everything else goes through btree_rv_read_keybuf_nocopy, the one-object flavor that also fills mvcc_info from flag bits packed into the OID. Either way key_buf wraps the still-packed key, and assert (!OID_ISNULL (&oid)) seals the decode. Then the purpose dispatch, four-way:

// vacuum_process_log_block -- src/query/vacuum.c
if (log_record_data.rcvindex == RVBT_MVCC_NOTIFY_VACUUM)
  {
    if (MVCCID_IS_VALID (mvcc_info.delete_mvccid))
      { error_code = btree_vacuum_object (..., mvcc_info.delete_mvccid); }
    else if (MVCCID_IS_VALID (mvcc_info.insert_mvccid) && mvcc_info.insert_mvccid != MVCCID_ALL_VISIBLE)
      { error_code = btree_vacuum_insert_mvccid (..., mvcc_info.insert_mvccid); }
    else
      { /* impossible case */ assert_release (false); continue; }
  }
else if (log_record_data.rcvindex == RVBT_MVCC_DELETE_OBJECT)
  { error_code = btree_vacuum_object (..., mvccid); } /* <- record's own MVCCID is the delid */
else if (log_record_data.rcvindex == RVBT_MVCC_INSERT_OBJECT
         || log_record_data.rcvindex == RVBT_MVCC_INSERT_OBJECT_UNQ)
  { error_code = btree_vacuum_insert_mvccid (..., mvccid); }
else
  { /* Unexpected. */ assert_release (false); }

RVBT_MVCC_NOTIFY_VACUUM is the either-or case: an index load logs one notification per object without knowing whether vacuum will see a dead deleted object (valid delete_mvccid — remove it) or a settled insert (valid, non-MVCCID_ALL_VISIBLE insert_mvccid — strip the insid only); both invalid is corrupt, skipped with continue in release. The executors are thin wrappers over one engine:

// btree_vacuum_object -- src/storage/btree.c
BTREE_MVCC_INFO_SET_DELID (&match_mvccinfo, delete_mvccid);
return btree_delete_internal (thread_p, btid, oid, class_oid, &mvcc_info, NULL, buffered_key, NULL,
SINGLE_ROW_MODIFY, NULL, &match_mvccinfo, NULL, NULL, BTREE_OP_DELETE_VACUUM_OBJECT);

btree_vacuum_insert_mvccid is symmetric with BTREE_MVCC_INFO_SET_INSID and BTREE_OP_DELETE_VACUUM_INSID. The match_mvccinfo makes the operation idempotent: if a previous interrupted run already cleaned the entry, the traversal finds no match and succeeds — crucial because interrupted blocks re-execute from start_lsa (Chapter 9). The arm’s epilogue: a SERVER_MODE-only overflow-page accounting block (thread_p->read_ovfl_pages_count, zeroed before the dispatch, checked against g_ovfp_threshold_mgr), then the error branch — thread_p->shutdown makes a b-tree error an acceptable interruption (goto end); otherwise it is asserted, logged, er_clear ()ed, and neutralized (error_code = NO_ERROR) so the block continues. Same robustness policy as the heap arm.

7.6 Arm 3 — external storage, and the nop that explains it


RVES_NOTIFY_VACUUM carries a packed URI string in its undo data:

// vacuum_process_log_block -- src/query/vacuum.c
(void) or_unpack_string (undo_data, &es_uri);
if (es_delete_file (es_uri) != NO_ERROR)
{ er_clear (); } /* <- file may already be gone; swallow */
else
{ ASSERT_NO_ERROR (); }
db_private_free_and_init (thread_p, es_uri);

The producer side explains the trick. A LOB file cannot be unlinked at delete-commit time — older snapshots may still read it — so vacuum_notify_es_deleted appends an undo-only record addressed to nobody:

// vacuum_notify_es_deleted -- src/query/vacuum.c
/* This is not actually ever undone, but vacuum will process undo data of log entry. */
log_append_undo_data (thread_p, RVES_NOTIFY_VACUUM, &addr, length, data);

Because rollback would execute the record’s undo function, vacuum_rv_es_nop exists as the registered handler that does nothing: the record is a message in a bottle for vacuum, not a recoverable change. The undo-data channel is reused purely because every MVCC-undo record automatically joins the chain this chapter walks. The final else after all three arms is the safeguard assert_release (false) — an rcvindex that is neither heap, b-tree, nor ES should never have entered the chain.

Each iteration closes by asserting worker->state == VACUUM_WORKER_STATE_EXECUTE and that no system op leaked (!...is_under_sysop ()); the same pair guards the loop exit before vacuum_heap runs (Chapter 8) and vacuum_complete = true is set. The end: epilogue flips the state to INACTIVE and — except for sa_mode_partial_block jobs, which have no vacuum-data entry to report — calls vacuum_finished_block_vacuum (Chapter 9) with that flag; under SERVER_MODE it also runs pgbuf_unfix_all as a leak backstop.

  1. A worker reads each block’s log pages exactly once into its private prefetch_log_buffer (block size + 1 spill page); later accesses in vacuum_fetch_log_page are memcpy hits. Legitimate misses are forward-only — a tail record spilling past the one extra page — never backward.
  2. The PROCESS_LOG loop is a backward walk of the prev_mvcc_op_log_lsa chain from data->start_lsa, partitioned per block by the pageid >= first_block_pageid bound — there is no scan, so an unlinked MVCC record is permanently invisible to vacuum.
  3. vacuum_process_log_record accepts four record types in three parse arms (LOG_MVCC_UNDO_DATA, the two MVCC undoredo types, LOG_SYSOP_END of type LOG_SYSOP_END_LOGICAL_MVCC_UNDO); its stop_after_vacuum_info mode serves Chapter 11’s recovery walk, and its dropped-file short-circuit (Chapter 10) scrubs already-collected OIDs before acknowledging a new dropped-files version.
  4. Undo data is materialized lazily and may alias the log page, the growable undo_data_buffer, or the log_zip_p buffer after log_unzip — valid only until the next record parse; heap records skip extraction entirely.
  5. The three arms execute asymmetrically: heap OIDs are collected (flag-stripped slotid, capacity-doubling array) for Chapter 8’s batched pass; b-tree entries are vacuumed inline through btree_delete_internal with match-MVCCID idempotence; ES records delete a LOB file whose log record exists only as a message — its recovery handler vacuum_rv_es_nop does nothing.
  6. Per-record failures in release builds are logged, cleared, and skipped; shutdown/interrupt and parse corruption abandon the block for re-execution via Chapter 9, while the PRM_ID_DISABLE_VACUUM guard and a prefetch failure return before the completion path — those entries stay in-progress until vacuum_data_load_and_recover flips them to interrupted at the next restart.
  7. The PROCESS_LOG/EXECUTE state flips bracket the perf accounting and keep the worker visible to the dropped-files version handshake; both loop tail and exit assert the state discipline and that no system operation leaked.

Chapter 7 left the worker with a flat array of (VFID, OID) pairs in worker->heap_objects. This chapter traces the batched second pass: vacuum_heap groups the array by heap page, vacuum_heap_page works each batch through the VACUUM_HEAP_HELPER workbench, mvcc_satisfies_vacuum issues per-record verdicts, and every change is logged for Chapter 11. Heap page anatomy, REC_* slot types, and heap_remove_page_on_vacuum live in cubrid-heap-manager-detail.md; the MVCC header layout and visibility family in cubrid-mvcc-detail.md — used here, not re-derived.

8.1 Batching: vacuum_heap sorts, groups, and survives errors


Each VACUUM_HEAP_OBJECT (in vacuum.h) holds just vfid and oid. vacuum_heap runs once per job from vacuum_process_log_block, with the block’s threshold_mvccid and the was_interrupted flag (true when the job re-executes after crash or shutdown; see Chapter 6). It qsorts with vacuum_compare_heap_object (vfid.fileid, vfid.volid, then oid.pageid, oid.volid, oid.slotid), so the array becomes file groups of page runs of slot-sorted duplicates, then calls vacuum_heap_page once per page run. At each file boundary it does HFID_SET_NULL on the cached HFID; the first vacuum_heap_page of the new group lazily refills it (8.3). Error handling is build-asymmetric: a debug build (or a shutdown) stops the job, but a release build (#if defined (NDEBUG)) clears the error and abandons the failed page. The block is still marked vacuumed (Chapter 9), so a skipped record leaks until a later delete touches it. Forward progress wins.

Dropped files never reach this loop; the filter runs at collection time. vacuum_process_log_record (Chapter 7) checks vacuum_is_file_dropped (Chapter 10) before vacuum_collect_heap_objects, and when vacuum_Dropped_files_version advanced mid-block it calls vacuum_cleanup_collected_by_vfid to purge collected entries of vacuum_Last_dropped_vfid (sort, find the VFID’s range, memmove the tail down).

vacuum_heap_page keeps all working state in one stack struct so latch-dropping retries can rebuild context cheaply:

| Field | Role |
| --- | --- |
| home_page, home_vpid | Fixed PAGE_HEAP pointer under write latch + its VPID (survives unfix/re-fix); NULL page = “latch dropped” |
| forward_page, forward_oid | Second fixed page and the OID from the home link record; meaning depends on record_type (matrix below) |
| crt_slotid | Slot being vacuumed; doubles as the duplicate filter |
| record_type | Slot type, re-read fresh on every retry, because the record may change while unlatched |
| record over rec_buf[IO_MAX_PAGE_SIZE + MAX_ALIGNMENT] | COPY (not PEEK) of the current record; rewritten in place by the insid strip, and for REC_RELOCATION it doubles as the NEWHOME removal’s undo image |
| mvcc_header | Decoded header; input to mvcc_satisfies_vacuum |
| hfid, overflow_vfid, reusable | File identity: per-group HFID cache, lazily-resolved overflow file (heap_ovf_delete needs it), FILE_HEAP_REUSE_SLOTS flag (slot afterlife, 8.6) |
| can_vacuum | Verdict for the current record; drives the execute split |
| slots[MAX_SLOTS_IN_PAGE], results[MAX_SLOTS_IN_PAGE], n_bulk_vacuumed | Pending bulk-logging batch (8.7); single-page REC_HOME changes only |
| forward_recdes over forward_link | COPY of the home link slot; doubles as undo image of the home removal |
| n_vacuumed, initial_home_free_space, time_track | Status-assert input, heap_stats_update delta base, prepare/execute/log perf split |

| Field | REC_HOME | REC_RELOCATION | REC_BIGONE |
| --- | --- | --- | --- |
| forward_page | unused (asserted NULL) | page holding REC_NEWHOME | first overflow page |
| forward_oid | unused | OID of REC_NEWHOME slot | OID naming first overflow VPID |
| record | copy of home record | copy of REC_NEWHOME record | unused |
| mvcc_header | from record | from record | via heap_get_mvcc_rec_header_from_overflow |

  1. Fix, interrupted flavor. If was_interrupted: pgbuf_fix_if_not_deallocated — error → return; home_page == NULL (deallocated by the earlier partial run) → warn, return NO_ERROR; page type PAGE_FTAB (deallocated and reused as a file-table page) → unfix, return NO_ERROR. Both tolerated cases assert n_heap_objects == 1. A normal run uses plain pgbuf_fix and treats failure as a hard error. Invariant — a re-executed job must tolerate a vanished page; a first execution must not. The relaxed path used unconditionally would silently skip genuine fix failures on live data.
  2. File identity. If the cached hfid is NULL, vacuum_heap_get_hfid_and_file_type runs heap_get_class_oid_from_page, file_descriptor_get, file_get_type — any failure asserts and returns; NULL HFID or type outside {FILE_HEAP, FILE_HEAP_REUSE_SLOTS} → ER_FAILED. On success helper.reusable = (ftype == FILE_HEAP_REUSE_SLOTS); both values copy out to the caller’s per-group cache.
  3. Per-slot loop. Duplicates skipped via crt_slotid; vacuum_heap_prepare_record (8.4) on error unfixes forward and jumps to end. Then REC_RELOCATION/REC_HOME/REC_BIGONE call mvcc_satisfies_vacuum (8.5): REMOVE → vacuum_heap_record; DELETE_INSID_PREV_VER → vacuum_heap_record_insid_and_prev_version; CANNOT_VACUUM → no-op. Any other slot type hits default: (per the in-code comment, already vacuumed by another worker or rolled back and reused) and is ignored. An execution error does assert_release (false), then goto end if the home latch was lost, else continue.
  4. Status downgrade. After each object — vacuum-worker threads only — re-read heap_page_get_vacuum_status. The terminal branch fires on (ONCE && !was_interrupted) || (NONE && was_interrupted), asserting n_heap_objects == 1. First half: the normal “single expected vacuum” — heap_page_set_vacuum_status_none, then a bulk log record with all_vacuumed = true so redo downgrades too. Second half: the paranoid re-execution case (the in-code comment walks an insert/vacuum/delete/crash interleaving where an old job re-runs while a newer vacuum is still owed; downgrading could let the page deallocate under a pending job), so the re-run only resets counters. Either way the page is dirtied, and if spage_number_of_records <= 1 && helper.reusable the worker tries heap_remove_page_on_vacuum (guarded by pgbuf_has_prevent_dealloc) — success deallocates the emptied page; failure unfixes. Always exits to end.
  5. Courtesy yield. If pgbuf_has_any_non_vacuum_waiters and objects remain, the batch is flushed with vacuum_heap_page_log_and_reset (which unfixes) and the page re-fixed (re-fix failure → end) — fairness for foreground threads at the cost of an extra log record.
  6. end: the remaining batch is flushed with update_best_space_stat = true.
flowchart TD
    A["fix home page<br/>(interrupted: tolerate dealloc/FTAB)"] --> B{"hfid cached?"}
    B -- no --> C["vacuum_heap_get_hfid_and_file_type"]
    B -- yes --> D
    C --> D["next object; skip dup slotid"]
    D --> E["vacuum_heap_prepare_record"]
    E -- error --> Z["end: flush batch, unfix"]
    E --> F{"record_type"}
    F -- "HOME / REL / BIG" --> G["mvcc_satisfies_vacuum"]
    F -- other --> J
    G -- REMOVE --> H["vacuum_heap_record"]
    G -- DELETE_INSID_PREV_VER --> I["vacuum_heap_record_insid_and_prev_version"]
    G -- CANNOT_VACUUM --> J["check page vacuum status"]
    H --> J
    I --> J
    J -- "ONCE and not interrupted<br/>or NONE and interrupted" --> K["set status none, log all_vacuumed<br/>maybe heap_remove_page_on_vacuum"] --> Z
    J -- "waiters and more objects" --> L["log_and_reset, re-fix"] --> D
    J -- otherwise --> D

Figure 8-1: vacuum_heap_page control flow; every exit funnels through end.
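The terminal condition of the status-downgrade step is a two-input predicate, and only one of its halves actually downgrades the page. A minimal sketch, assuming a three-valued page status; the enum and function names are hypothetical, not the in-tree identifiers:

```c
#include <stdbool.h>

/* Hypothetical mirror of the heap page vacuum status values. */
typedef enum
{
  SKETCH_VACUUM_NONE,		/* no vacuum expected on this page */
  SKETCH_VACUUM_ONCE,		/* exactly one vacuum expected */
  SKETCH_VACUUM_UNKNOWN		/* more than one may be pending */
} sketch_vacuum_status;

/* True when the page walk ends after this object. */
static bool
sketch_is_terminal (sketch_vacuum_status status, bool was_interrupted)
{
  return (status == SKETCH_VACUUM_ONCE && !was_interrupted)
    || (status == SKETCH_VACUUM_NONE && was_interrupted);
}

/* True only for the normal "single expected vacuum" half; the re-execution
 * half ends the walk without downgrading the status. */
static bool
sketch_should_downgrade (sketch_vacuum_status status, bool was_interrupted)
{
  return status == SKETCH_VACUUM_ONCE && !was_interrupted;
}
```

Keeping the two questions separate makes the asymmetry explicit: a re-executed job on an already-NONE page still exits, but it must not emit a second downgrade.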

8.4 vacuum_heap_prepare_record: every record-type branch


Prepare gathers what each record shape needs, under a retry_prepare: label that re-reads the slot after any latch drop. It is entered with forward_page == NULL (asserted); a non-NULL forward_page inside the switch can only be left over from an earlier retry of the same call, and every non-matching branch unfixes it.

  • Slot gone (spage_get_slot NULL): type forced to REC_MARKDELETED, return NO_ERROR; the caller’s default: ignores it.
  • REC_RELOCATION: COPY the home link into forward_recdes/forward_link (failure → ER_FAILED); unfix a retry-leftover forward page with the wrong VPID. The forward fix obeys pgbuf_get_condition_for_ordered_fix: home-before-forward order → unconditional latch; forward-before-home → conditional try. If the try fails: flush and unfix home (vacuum_heap_page_log_and_reset), fix forward then home unconditionally — the deadlock-safe order — and goto retry_prepare because home was unlatched. Invariant — two heap pages are never latched contrary to pgbuf’s ordered-fix rule; violating it risks an undetected latch deadlock with foreground writers. With both pages held, COPY-read the REC_NEWHOME record into rec_buf (the future undo image) and decode it with or_mvcc_get_header.
  • REC_BIGONE: if overflow_vfid is unresolved, try heap_ovf_find_vfid with a conditional latch; on failure flush/unfix home, retry unconditionally (failure → ER_FAILED), re-fix home, goto retry_prepare. Then COPY the link record, fix the first overflow page unconditionally (overflow pages are always fixed after home pages — no ordering dance), and read the header with heap_get_mvcc_rec_header_from_overflow.
  • REC_HOME: COPY the record into rec_buf (the in-code comment says “Peek” but the call passes COPY) — not for undo this time (REC_HOME changes are logged redo-only, 8.7) but because the insid strip mutates the buffer before spage_update, which a PEEK pointer into the page would not allow; decode the header with or_mvcc_get_header.
  • default: (direct REC_NEWHOME, REC_MARKDELETED, REC_DELETED_WILL_REUSE, …) only the type is reported.
// vacuum_heap_prepare_record -- src/query/vacuum.c
/* Assert forward page is fixed if and only if record type is either REC_RELOCATION or REC_BIGONE. */
assert ((helper->record_type == REC_RELOCATION
|| helper->record_type == REC_BIGONE) == (helper->forward_page != NULL));

Invariant — forward_page != NULL iff the record type has a forward component. If violated, a later iteration would write through a latch belonging to the wrong page.

The heap pass hinges on one pure function of the MVCC header and the block’s threshold_mvccid (the oldest-visible watermark snapshotted at dispatch — Chapter 5) — the vacuum-side sibling of the visibility family in cubrid-mvcc-detail.md:

// mvcc_satisfies_vacuum -- src/transaction/mvcc.c
if (!MVCC_IS_HEADER_DELID_VALID (rec_header) || MVCC_IS_REC_DELETED_SINCE_MVCCID (rec_header, oldest_mvccid))
  {
    /* The record was not deleted or was recently deleted and cannot be vacuumed completely. */
    if (!MVCC_IS_HEADER_INSID_NOT_ALL_VISIBLE (rec_header)
        || MVCC_IS_REC_INSERTED_SINCE_MVCCID (rec_header, oldest_mvccid))
      {
        // ... condensed (perfmon) ...
        return VACUUM_RECORD_CANNOT_VACUUM;
      }
    else
      {
        // ... condensed (perfmon) ...
        return VACUUM_RECORD_DELETE_INSID_PREV_VER;
      }
  }
else
  {
    return VACUUM_RECORD_REMOVE; /* <- delete committed before every live snapshot */
  }

The macros are exact bit tests: MVCC_IS_HEADER_DELID_VALID = DELID flag set and MVCCID_IS_VALID (delid); MVCC_IS_HEADER_INSID_NOT_ALL_VISIBLE = INSID flag set and value != MVCCID_ALL_VISIBLE (the constant 3); the *_SINCE_MVCCID macros are !MVCC_ID_PRECEDES (id, T), i.e. id >= T (the version was touched at or after the threshold, so it is too recent to finish). Note the polarity: the outer test is !DELID_VALID || DELETED_SINCE_MVCCID — so the REMOVE verdict is the negation of DELETED_SINCE_MVCCID, the else branch where the record both has a valid delid and delid < T, meaning the delete committed before every live snapshot. When that fails, an old-enough, not-yet-stripped insert id (insid < T, flag set, not ALL_VISIBLE) yields DELETE_INSID_PREV_VER; otherwise — a fresh insert/delete, or an already-stripped insid — nothing happens.

flowchart TD
    S["MVCC header + threshold T"] --> D{"delid valid<br/>and delid < T ?"}
    D -- yes --> R["VACUUM_RECORD_REMOVE<br/>no live snapshot can see it"]
    D -- no --> I{"insid flag set, not ALL_VISIBLE,<br/>and insid < T ?"}
    I -- yes --> P["VACUUM_RECORD_DELETE_INSID_PREV_VER<br/>keep row, strip insid + prev_version_lsa"]
    I -- no --> C["VACUUM_RECORD_CANNOT_VACUUM<br/>already stripped, or too fresh"]

Figure 8-2: the three verdicts.

Two consequences. First, a verdict short of REMOVE (CANNOT_VACUUM included) is normal, not an error: the same OID is collected once per log record touching it, so a record deleted shortly after this block’s threshold gets DELETE_INSID_PREV_VER now and REMOVE from the later job whose threshold passes the delete. Second, MVCCID_ALL_VISIBLE does double duty (“no insid flag” and “insid == 3” both mean already stripped), which is what makes the REC_BIGONE path idempotent (8.6).
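The decision table can be restated as a small pure function over a simplified header. This is a sketch, not the in-tree code: the struct, enum and `sketch_` names are hypothetical, MVCCIDs are compared as plain integers, and "valid" is assumed to mean "not the null id 0".

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MVCCID_NULL 0u		/* assumed invalid id */
#define SKETCH_MVCCID_ALL_VISIBLE 3u	/* constant given in the text */

typedef enum
{
  SKETCH_REMOVE,
  SKETCH_DELETE_INSID_PREV_VER,
  SKETCH_CANNOT_VACUUM
} sketch_verdict;

/* Simplified header: presence flags plus the two ids. */
typedef struct
{
  bool has_insid;
  bool has_delid;
  uint64_t insid;
  uint64_t delid;
} sketch_header;

static sketch_verdict
sketch_satisfies_vacuum (const sketch_header * h, uint64_t threshold)
{
  bool delid_valid = h->has_delid && h->delid != SKETCH_MVCCID_NULL;
  bool insid_pending = h->has_insid && h->insid != SKETCH_MVCCID_ALL_VISIBLE;

  if (delid_valid && h->delid < threshold)
    {
      return SKETCH_REMOVE;	/* deleted before every live snapshot */
    }
  if (insid_pending && h->insid < threshold)
    {
      return SKETCH_DELETE_INSID_PREV_VER;	/* keep the row, strip the insid */
    }
  return SKETCH_CANNOT_VACUUM;	/* too fresh, or already stripped */
}
```

For a threshold of 200, a header with insid 100 and delid 150 yields REMOVE, the same insid with delid 250 yields DELETE_INSID_PREV_VER, and an insid of 250 with no delid yields CANNOT_VACUUM.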

vacuum_heap_record_insid_and_prev_version (verdict DELETE_INSID_PREV_VER) edits the header; the row survives. REC_RELOCATION and REC_HOME share the same byte surgery on the copied record: look up the current header size from mvcc_header_size_lookup[mvcc_flags]; if both DELID and INSID are present, memcpy the DELID over the INSID slot so it survives; clear OR_MVCC_FLAG_VALID_INSID | OR_MVCC_FLAG_VALID_PREV_VERSION; memmove closes the gap. The shrunken record goes back via spage_update — to forward_page/forward_oid.slotid for the NEWHOME, to home_page/crt_slotid for HOME. REC_RELOCATION then logs immediately (a one-slot vacuum_log_vacuum_heap_page on the forward page) and unfixes; REC_HOME just appends (crt_slotid, DELETE_INSID_PREV_VER) to the bulk batch. REC_BIGONE cannot resize — the overflow header area is fixed-width — so the insid is overwritten, not stripped:

// vacuum_heap_record_insid_and_prev_version -- src/query/vacuum.c
/* Replace current insert MVCCID with MVCCID_ALL_VISIBLE. Header must remain the same size. */
MVCC_SET_INSID (&helper->mvcc_header, MVCCID_ALL_VISIBLE);
LSA_SET_NULL (&helper->mvcc_header.prev_version_lsa);
error_code = heap_set_mvcc_rec_header_on_overflow (helper->forward_page, &helper->mvcc_header);
// ... condensed ...
vacuum_log_remove_ovf_insid (thread_p, helper->forward_page); /* <- redo-only, zero payload */

Invariant — an overflow MVCC header never changes size. Enforced by substitution instead of removal; violating it would shift the large overflow payload that follows the header.
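By contrast, the resizing strip used for REC_HOME and REC_NEWHOME records can be sketched on a toy record layout. The layout here is an assumption made for illustration only: one flag byte, then optional 8-byte INSID, DELID and PREV_VERSION fields in that order, then the payload. The real header has more fields and different flag values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits and field width for the toy layout. */
#define SKETCH_FLAG_INSID 0x01u
#define SKETCH_FLAG_DELID 0x02u
#define SKETCH_FLAG_PREV  0x04u
#define SKETCH_FIELD 8u

/* Header size implied by the flags: one flag byte plus each present field. */
static size_t
sketch_header_size (unsigned char flags)
{
  size_t size = 1;

  if (flags & SKETCH_FLAG_INSID) size += SKETCH_FIELD;
  if (flags & SKETCH_FLAG_DELID) size += SKETCH_FIELD;
  if (flags & SKETCH_FLAG_PREV) size += SKETCH_FIELD;
  return size;
}

/* Strip INSID and PREV_VERSION in place; returns the new record length. */
static size_t
sketch_strip_insid (unsigned char *rec, size_t len)
{
  unsigned char flags = rec[0];
  size_t old_size = sketch_header_size (flags);
  size_t new_size;

  if ((flags & SKETCH_FLAG_INSID) && (flags & SKETCH_FLAG_DELID))
    {
      /* DELID must survive: move it into the slot INSID occupied. */
      memcpy (rec + 1, rec + 1 + SKETCH_FIELD, SKETCH_FIELD);
    }
  flags = (unsigned char) (flags & ~(SKETCH_FLAG_INSID | SKETCH_FLAG_PREV));
  rec[0] = flags;
  new_size = sketch_header_size (flags);
  /* Close the gap: slide the payload down to follow the smaller header. */
  memmove (rec + new_size, rec + old_size, len - old_size);
  return len - (old_size - new_size);
}
```

In this toy layout a 27-byte record carrying all three fields and a 2-byte payload shrinks to 11 bytes, which is exactly the kind of size change an overflow header cannot tolerate.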

vacuum_heap_record (verdict REMOVE) deletes the version. REC_HOME removals join the bulk batch; REC_RELOCATION/REC_BIGONE are two-page operations, so the batch is flushed first (vacuum_heap_page_log_and_reset with unlatch_page = false) and the pair wrapped in a system operation (log_sysop_start … log_sysop_commit) so recovery never sees a dangling half. All three then run spage_vacuum_slot (… helper->reusable) on the home slot, whose afterlife is the OID-stability contract:

// spage_vacuum_slot -- src/storage/slotted_page.c
slot_p->offset_to_record = SPAGE_EMPTY_OFFSET;
if (reusable)
{
slot_p->record_type = REC_DELETED_WILL_REUSE; /* <- nothing references this OID; recycle slotid */
}
else
{
slot_p->record_type = REC_MARKDELETED; /* <- referable file: indexes may hold the OID; tombstone */
}

A referable heap (FILE_HEAP) may still have b-tree entries or OID references pointing at the slot, so the slotid is never handed out again; a reusable heap (FILE_HEAP_REUSE_SLOTS) guarantees no such references. The REC_NEWHOME forward slot is always vacuumed with reusable = true: nothing references a NEWHOME directly except its REC_RELOCATION home, dying in the same sysop. Within the sysop: REC_RELOCATION logs the home-slot removal (vacuum_log_redoundo_vacuum_record, undo = the copied link forward_recdes), vacuums the forward slot, conditionally feeds the forward page into best-space statistics (PRM_ID_HF_MAX_BESTSPACE_ENTRIES > 0 && freespace > HEAP_DROP_FREE_SPACE → heap_stats_update), logs the forward-slot removal (undo = the copied NEWHOME record), dirties/unfixes forward, commits. REC_BIGONE logs the home-slot removal, unfixes the overflow page, then heap_ovf_delete deallocates the whole overflow chain (failure → log_sysop_abort + ER_FAILED; success → commit).

vacuum_heap_page_log_and_reset is the batch flusher: n_bulk_vacuumed == 0 → just unfix (if asked); else compact if spage_need_compact, fold freed space into best-space stats when update_best_space_stat && initial_home_free_space != -1, emit one vacuum_log_vacuum_heap_page, dirty, optionally unfix, zero the counter. Three log families result:

  • Bulk RVVAC_HEAP_PAGE_VACUUM (redo-only): addr.offset packs the slot count plus two flag bits (VACUUM_LOG_VACUUM_HEAP_REUSABLE = 0x8000, VACUUM_LOG_VACUUM_HEAP_ALL_VACUUMED = 0x4000), and each slotid’s sign encodes the verdict — negated for REMOVE, positive for the insid strip. Redo vacuum_rv_redo_vacuum_heap_page unpacks rcv->offset & ~VACUUM_LOG_VACUUM_HEAP_MASK: n_slots == 0 (asserts all_vacuumed) → heap_page_set_vacuum_status_none; negative slotid → negate, spage_vacuum_slot; positive → peek (must be REC_HOME/REC_NEWHOME, else ER_FAILED), rebuild with the smaller flag-cleared header. Then compact, downgrade if all_vacuumed, dirty.
  • Per-record RVVAC_HEAP_RECORD_VACUUM for two-page removals, via vacuum_log_redoundo_vacuum_record: an undoredo whose offset packs slotid + reusable bit, undo crumbs are record type + pre-image, redo payload empty (“only the object’s address to re-vacuum”). Redo vacuum_rv_redo_vacuum_heap_record re-derives slotid/reusable and re-runs spage_vacuum_slot + compaction; undo vacuum_rv_undo_vacuum_heap_record strips the flag bits and delegates to heap_rv_redo_insert to re-insert the pre-image. The undo half exists because these removals live in a sysop: if it aborts (the heap_ovf_delete failure branch), the home slot must come back.
  • Overflow RVVAC_REMOVE_OVF_INSID, via vacuum_log_remove_ovf_insid — zero bytes (log_append_redo_data2 (… 0, 0, NULL)). Redo vacuum_rv_redo_remove_ovf_insid rebuilds from the page: read header, MVCC_SET_INSID (… MVCCID_ALL_VISIBLE), LSA_SET_NULL (… prev_version_lsa), write back, dirty — idempotent because of the substitution trick (8.6).
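
The bulk record's two encodings (slot count plus flag bits in the offset, verdict as the sign of each slotid) can be sketched in isolation. This is a minimal illustration, not the CUBRID code: the two flag values come from the text above, while the mask being their union, the helper names, and the restriction to strictly positive slotids are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Flag values quoted in the text; HEAP_MASK is assumed to be their union.
constexpr std::uint16_t HEAP_REUSABLE = 0x8000;
constexpr std::uint16_t HEAP_ALL_VACUUMED = 0x4000;
constexpr std::uint16_t HEAP_MASK = HEAP_REUSABLE | HEAP_ALL_VACUUMED;

enum class Verdict { Remove, StripInsid };

struct DecodedOffset {
  int n_slots;
  bool reusable;
  bool all_vacuumed;
};

struct DecodedSlot {
  std::int16_t slotid;
  Verdict verdict;
};

// Pack the slot count into the low 14 bits and the two flags into the top bits.
std::uint16_t pack_offset(int n_slots, bool reusable, bool all_vacuumed) {
  assert(n_slots >= 0 && n_slots < HEAP_ALL_VACUUMED);  // count must stay below the flag bits
  unsigned packed = static_cast<unsigned>(n_slots);
  if (reusable) packed |= HEAP_REUSABLE;
  if (all_vacuumed) packed |= HEAP_ALL_VACUUMED;
  return static_cast<std::uint16_t>(packed);
}

// Redo side: mask the flags off to recover the count, test each flag bit.
DecodedOffset unpack_offset(std::uint16_t offset) {
  return DecodedOffset{offset & ~HEAP_MASK, (offset & HEAP_REUSABLE) != 0,
                       (offset & HEAP_ALL_VACUUMED) != 0};
}

// Verdict travels as the sign: negated for REMOVE, positive for the insid strip.
// Assumption: slotid 0 is never logged, since zero cannot carry a sign.
std::int16_t encode_slot(std::int16_t slotid, Verdict verdict) {
  assert(slotid > 0);
  return verdict == Verdict::Remove ? static_cast<std::int16_t>(-slotid) : slotid;
}

DecodedSlot decode_slot(std::int16_t logged) {
  if (logged < 0) return DecodedSlot{static_cast<std::int16_t>(-logged), Verdict::Remove};
  return DecodedSlot{logged, Verdict::StripInsid};
}
```

Packing three slots on a reusable page yields 0x8003, and a logged slotid of -5 decodes to slot 5 with verdict REMOVE. The sign trick costs no extra bytes per slot, which is presumably why the bulk record can stay redo-only and compact.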

was_interrupted ties execution to recovery: a re-executed job replays log records whose heap effects may already be on disk, so its relaxations — pgbuf_fix_if_not_deallocated, tolerated PAGE_FTAB reuse, the softened assert (page_vacuum_status != HEAP_PAGE_VACUUM_NONE || (was_interrupted && helper.n_vacuumed == 0)), the refusal to downgrade an already-NONE page — are the “this already happened” symptoms a crash can manufacture. Chapter 11 covers how the job regains the flag after restart.

  1. vacuum_heap sorts the collected (VFID, OID) array file-major/page-minor and calls vacuum_heap_page per page run, caching HFID/reusability per file group; dropped files were filtered at collection time, and release builds abandon a failed page rather than fail the job.
  2. VACUUM_HEAP_HELPER is the whole working set: forward_page means “NEWHOME page” for REC_RELOCATION but “first overflow page” for REC_BIGONE; the COPY-read record in rec_buf is rewritten in place by the strip; the bulk batch holds only single-page REC_HOME changes.
  3. mvcc_satisfies_vacuum reduces to two threshold comparisons: delid < TREMOVE; else insid present, not MVCCID_ALL_VISIBLE, and < TDELETE_INSID_PREV_VER; else CANNOT_VACUUM — a normal outcome, finished by whichever later job’s threshold passes the pending delete.
  4. vacuum_heap_prepare_record re-reads the slot under retry_prepare: after any latch drop and preserves pgbuf’s ordered-fix rule with a conditional forward latch plus reverse-order refix — ending in the invariant that forward_page is fixed iff the type is REC_RELOCATION or REC_BIGONE.
  5. Removal follows the OID-stability contract: referable heaps tombstone (REC_MARKDELETED), reusable heaps recycle (REC_DELETED_WILL_REUSE), NEWHOME forward slots always recycle, and an emptied reusable page is deallocated via heap_remove_page_on_vacuum.
  6. Logging is three-tiered: bulk redo-only RVVAC_HEAP_PAGE_VACUUM with verdicts as slotid signs and reusable/all_vacuumed bits in the offset; per-record undoredo RVVAC_HEAP_RECORD_VACUUM inside a sysop for two-page removals; zero-payload RVVAC_REMOVE_OVF_INSID whose redo is idempotent because overflow insids are substituted, never stripped.
  7. was_interrupted converts “impossible” states — deallocated page, FTAB reuse, status already NONE — into tolerated no-ops, because a re-executed job expects to find its own past work already applied.

Chapter 9: Block Completion and Log Reclamation

A vacuum job ends one of two ways: it processed every MVCC operation in its block, or it died midway (shutdown in SERVER_MODE, interrupt in SA_MODE). This chapter traces how either outcome travels back into vacuum data, how fully vacuumed entries are physically removed from the vacuum data file, and how that removal advances the log floor — the oldest log page the system must keep — so archive purging can proceed. The garbage-collector rationale is in the companion doc (cubrid-vacuum.md); here we trace every branch.

flowchart LR
    W["worker<br/>vacuum_process_log_block"] -->|"vacuum_finished_block_vacuum<br/>blockid + status flags"| Q["vacuum_Finished_job_queue"]
    Q -->|"consume in batches"| M["master<br/>vacuum_data_mark_finished"]
    M -->|"set_vacuumed / set_interrupted"| VD["vacuum data pages"]
    VD -->|"page fully vacuumed"| EP["vacuum_data_empty_page"]
    M --> K["vacuum_update_keep_from_log_pageid"]
    K -->|"vacuum_min_log_pageid_to_keep"| AP["logpb_remove_archive_logs*"]

Figure 9-1. The completion pipeline: worker outcome flows through the finished-job queue into vacuum data, and the log floor follows.

9.1 vacuum_finished_block_vacuum — encoding the outcome into the blockid

When vacuum_process_log_block reaches its end: label (Chapter 7), it calls vacuum_finished_block_vacuum with vacuum_complete set to true only after vacuum_heap returned NO_ERROR; every goto end error path leaves it false. The data argument is the worker’s copy of the VACUUM_DATA_ENTRY; the header comment is explicit that the real table entry may have moved while the job ran, so the outcome travels by blockid value, not by pointer.

// vacuum_finished_block_vacuum -- src/query/vacuum.c
if (is_vacuum_complete)
  {
    data->set_vacuumed (); /* VACUUMED status, INTERRUPTED flag cleared */
  }
else
  {
    /* We expect that worker job is abandoned during shutdown. But all other cases are error cases. */
    // ... condensed: SERVER_MODE warns iff thread_p->shutdown (asserted); SA_MODE iff ER_INTERRUPTED ...
    data->set_interrupted (); /* AVAILABLE status + INTERRUPTED flag */
  }
blockid = data->blockid; /* raw field: blockid WITH the flag bits just set */
if (!vacuum_Finished_job_queue->produce (blockid))
  {
    assert_release (false);
    vacuum_er_log_error (..., "%s", "Finished job queue is full!!!");
  }
// ... condensed, SERVER_MODE: is_half_full () -> vacuum_Master_daemon->wakeup ();

Branch accounting:

  1. Success: set_vacuumed () writes VACUUM_BLOCK_STATUS_VACUUMED (0x8000...) into the 2-bit status mask (VACUUM_BLOCK_STATUS_MASK = 0xC000000000000000) and clears VACUUM_BLOCK_FLAG_INTERRUPTED (0x2000...), because a block interrupted once and finished on retry must not carry a stale flag.
  2. Failure: set_interrupted () sets status VACUUM_BLOCK_STATUS_AVAILABLE (all-zero) and raises INTERRUPTED: re-dispatchable, with history. Severity is graded: only thread_p->shutdown (SERVER_MODE) or ER_INTERRUPTED (SA_MODE) is legitimate; anything else logs ERROR but proceeds identically.
  3. Queue produce failure: a release-mode “should never happen”, since the master caps outstanding jobs at VACUUM_MAX_TASKS_IN_WORKER_POOL (3 × PRM_ID_VACUUM_WORKER_COUNT), far below VACUUM_FINISHED_JOB_QUEUE_CAPACITY (2048). The handling is deliberate data loss with a loud log: the entry stays IN_PROGRESS, rescued only by boot-time recovery (Chapter 11).
  4. Half-full wakeup: the producer nudges the master when vacuum_Finished_job_queue->is_half_full (). The master’s own vacuum_master_task::should_force_data_update polls the same queue and vacuum_Block_data_buffer. Producer nudges, consumer polls; neither alone is load-bearing.

Invariant 9-A — one queue entry per dispatched job, flags already resolved. Every dispatched job calls vacuum_finished_block_vacuum exactly once — the end: label guards it with if (!sa_mode_partial_block), since partial-block jobs never existed in vacuum data. The pushed blockid carries its final status in its high bits; the queue element is the entire worker-to-master report. If a job could exit without producing, its entry would stay IN_PROGRESS, index_unvacuumed would never pass it, and keep_from_log_pageid would freeze — unbounded archive growth.
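
The flag arithmetic behind "the blockid carries its final status in its high bits" can be written out as a small sketch. The status mask, the VACUUMED and AVAILABLE values, and the INTERRUPTED bit are the ones quoted above; the IN_PROGRESS value (taken here as the remaining bit of the status mask) and the assumption that exactly these three high bits are flags are not stated in this section and are illustrative only, as are the function names.

```cpp
#include <cassert>
#include <cstdint>

using RawBlockId = std::uint64_t;

constexpr RawBlockId STATUS_MASK = 0xC000000000000000ULL;
constexpr RawBlockId STATUS_VACUUMED = 0x8000000000000000ULL;
constexpr RawBlockId STATUS_IN_PROGRESS = 0x4000000000000000ULL;  // assumed value
constexpr RawBlockId STATUS_AVAILABLE = 0;
constexpr RawBlockId FLAG_INTERRUPTED = 0x2000000000000000ULL;
constexpr RawBlockId ALL_FLAGS = STATUS_MASK | FLAG_INTERRUPTED;  // assumed: no other flag bits

// Addressing uses the blockid with every flag bit stripped.
constexpr RawBlockId without_flags(RawBlockId raw) { return raw & ~ALL_FLAGS; }

constexpr bool is_vacuumed(RawBlockId raw) { return (raw & STATUS_MASK) == STATUS_VACUUMED; }
constexpr bool is_available(RawBlockId raw) { return (raw & STATUS_MASK) == STATUS_AVAILABLE; }
constexpr bool was_interrupted(RawBlockId raw) { return (raw & FLAG_INTERRUPTED) != 0; }

// Dispatch: status becomes IN_PROGRESS, any INTERRUPTED history is kept.
constexpr RawBlockId set_in_progress(RawBlockId raw) {
  return (raw & ~STATUS_MASK) | STATUS_IN_PROGRESS;
}

// Success: VACUUMED status, stale INTERRUPTED flag cleared.
constexpr RawBlockId set_vacuumed(RawBlockId raw) {
  return (raw & ~ALL_FLAGS) | STATUS_VACUUMED;
}

// Failure: back to AVAILABLE, with the INTERRUPTED flag raised.
constexpr RawBlockId set_interrupted(RawBlockId raw) {
  return (raw & ~STATUS_MASK) | STATUS_AVAILABLE | FLAG_INTERRUPTED;
}
```

A block that is dispatched, interrupted, redispatched, and then completed ends up with VACUUMED status and no INTERRUPTED bit, while `without_flags` returns the same blockid at every step. That stability is what lets the master address the entry by value after the worker's copy has diverged from the table.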

9.2 What INTERRUPTED changes on the next attempt

The bit survives in the vacuum data entry (section 9.3) and is consumed twice on redispatch: vacuum_job_cursor::start_job_on_current_entry skips appending RVVAC_START_JOB when entry.was_interrupted () — the entry is already known to be partially executed (Chapter 6) — and vacuum_process_log_block passes it into the heap pass:

// vacuum_process_log_block -- src/query/vacuum.c
was_interrupted = data->was_interrupted () || sa_mode_partial_block;
// ... condensed ...
error_code = vacuum_heap (thread_p, worker, threshold_mvccid, was_interrupted);

A re-run replays the whole block’s log, revisiting heap pages the first attempt already cleaned. With was_interrupted == true, vacuum_heap_page tolerates pages whose vacuum status says “nothing to do”; without the flag those states would be treated as corruption (Chapter 8).

9.3 vacuum_data_mark_finished — the master consumes the queue

Runs inside vacuum_data::update on the master (and once more from vacuum_finalize to drain stragglers); it is the only runtime writer that moves entries out of IN_PROGRESS — recovery replays the same transitions (section 9.5). Shape: drain the queue into a stack buffer (at most VACUUM_FINISHED_JOB_QUEUE_CAPACITY elements; zero consumed → return 0 without fixing a page), qsort with vacuum_compare_blockids — which strips the flag bits — so array order matches physical entry order, then walk vacuum data pages and the sorted array in lockstep.

flowchart TD
    A["consume queue, qsort,<br/>start at first_page"] --> D["inner loop: mark each blockid<br/>falling inside current page"]
    D --> E{"any change<br/>in this page?"}
    E -->|no| H
    E -->|yes| F["advance index_unvacuumed<br/>past leading vacuumed entries"]
    F --> G{"index_unvacuumed ==<br/>index_free?"}
    G -->|"yes: page empty"| EP["vacuum_data_empty_page<br/>no FINISHED_BLOCKS log"]
    EP --> EPN{"data_page == NULL?"}
    EPN -->|"yes, blocks left unmatched"| ERR1["assert(false)<br/>log + return"]
    EPN -->|"yes, all matched"| DONE
    EPN -->|"no: moved to next page"| D
    G -->|"no: page has data"| CMP["memmove-compact if last page;<br/>log RVVAC_DATA_FINISHED_BLOCKS"]
    CMP --> H{"index ==<br/>n_finished_blocks?"}
    H -->|yes| DONE["unfix pages"]
    H -->|"no, next_page is NULL"| ERR2["assert(false)<br/>log + return"]
    H -->|"no, follow next_page"| NXT["fix next page<br/>(fix failure: third early return)"]
    NXT --> D
    DONE --> K["vacuum_update_keep_from_log_pageid"]

Figure 9-2. vacuum_data_mark_finished, branch-complete. All three early-return exits skip the log-floor update.

The marking loop relies on blockids inside one page being contiguous (asserted as page_free_blockid == data[index_free - 1].get_blockid () + 1):

// vacuum_data_mark_finished -- src/query/vacuum.c
while ((index < n_finished_blocks)
       && ((blockid = VACUUM_BLOCKID_WITHOUT_FLAGS (finished_blocks[index])) < page_free_blockid))
  {
    data = page_unvacuumed_data + (blockid - page_unvacuumed_blockid); /* direct index, no search */
    assert (data->get_blockid () == blockid);
    assert (data->is_job_in_progress ()); /* only dispatched jobs may report back */
    if (VACUUM_BLOCK_STATUS_IS_VACUUMED (finished_blocks[index]))
      data->set_vacuumed ();
    else
      data->set_interrupted (); /* AVAILABLE + INTERRUPTED, redispatchable */
    index++;
  }

VACUUM_BLOCKID_WITHOUT_FLAGS strips flags for addressing; the flagged value decides VACUUMED vs INTERRUPTED. A page with no matched reports falls through untouched and unlogged. After marking, the index_unvacuumed advance walks only over leading is_vacuumed () entries — an INTERRUPTED entry stops the walk, staying visible for redispatch. Two outcomes follow:

  • Page emptied (index_unvacuumed == index_free): the page is handed to vacuum_data_empty_page (section 9.4). No RVVAC_DATA_FINISHED_BLOCKS record is written here — the page’s marking is recovered indirectly through the reset / splice records that vacuum_data_empty_page emits, since after recovery the page is reborn empty or dropped.
  • Page retains data: only the last page is compacted in place by memmove (new blocks append there, Chapter 4); interior pages are never compacted. Then one RVVAC_DATA_FINISHED_BLOCKS redo record is appended, payload exactly the slice finished_blocks[page_start_index .. index).

Both “report with no matching entry” exits and a failed page fix return early without updating the log floor — better a stale floor than a wrong one.

Invariant 9-B — index_unvacuumed never points past an unvacuumed entry, and everything before it is VACUUMED. Enforced by the strictly local advance loop (and its twin in redo). The log-floor derivation and the job cursor’s restart position both read the first unvacuumed entry; a vacuumed entry below the watermark would pin the log floor forever, and a skipped INTERRUPTED entry would never be re-vacuumed.
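
To make the sort, direct-index marking, and watermark advance concrete, here is a minimal single-page model. It is a sketch under stated assumptions rather than the actual routine: the page is a plain vector of raw flagged blockids, `index_free` is the vector size, the IN_PROGRESS bit value is assumed, and the names `Page`, `sort_reports`, and `mark_finished_in_page` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t STATUS_MASK = 0xC000000000000000ULL;
constexpr std::uint64_t STATUS_VACUUMED = 0x8000000000000000ULL;
constexpr std::uint64_t STATUS_IN_PROGRESS = 0x4000000000000000ULL;  // assumed value
constexpr std::uint64_t FLAG_INTERRUPTED = 0x2000000000000000ULL;
constexpr std::uint64_t ALL_FLAGS = STATUS_MASK | FLAG_INTERRUPTED;

inline std::uint64_t blockid_of(std::uint64_t raw) { return raw & ~ALL_FLAGS; }
inline bool is_vacuumed(std::uint64_t raw) { return (raw & STATUS_MASK) == STATUS_VACUUMED; }

struct Page {
  std::vector<std::uint64_t> data;   // raw entries with contiguous blockids
  std::size_t index_unvacuumed = 0;  // watermark; index_free is data.size()
};

// Order reports by blockid with flags stripped, so they match entry order.
void sort_reports(std::vector<std::uint64_t> &reports) {
  std::sort(reports.begin(), reports.end(),
            [](std::uint64_t a, std::uint64_t b) { return blockid_of(a) < blockid_of(b); });
}

// Marks every report that falls inside this page, then advances the watermark.
// Returns the index of the first report that belongs to a later page.
std::size_t mark_finished_in_page(Page &page, const std::vector<std::uint64_t> &reports,
                                  std::size_t index) {
  if (page.index_unvacuumed == page.data.size()) return index;  // nothing pending here
  const std::uint64_t first = blockid_of(page.data[page.index_unvacuumed]);
  const std::uint64_t free_blockid = first + (page.data.size() - page.index_unvacuumed);
  while (index < reports.size() && blockid_of(reports[index]) < free_blockid) {
    const std::uint64_t blockid = blockid_of(reports[index]);
    assert(blockid >= first);
    // Contiguity makes this a direct index, no search.
    std::uint64_t &entry =
        page.data[page.index_unvacuumed + static_cast<std::size_t>(blockid - first)];
    assert(blockid_of(entry) == blockid);
    entry = is_vacuumed(reports[index]) ? (blockid | STATUS_VACUUMED)
                                        : (blockid | FLAG_INTERRUPTED);
    ++index;
  }
  // Only leading VACUUMED entries are passed; an INTERRUPTED entry stops the walk.
  while (page.index_unvacuumed < page.data.size() &&
         is_vacuumed(page.data[page.index_unvacuumed])) {
    ++page.index_unvacuumed;
  }
  return index;
}
```

With blocks 10 to 13 in progress and reports "10 vacuumed, 11 interrupted, 12 vacuumed", the watermark stops at index 1: block 12 is vacuumed but sits behind the interrupted block 11, so it cannot be passed yet. That is Invariant 9-B in miniature.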

9.4 Physical shrink — vacuum_data_empty_page

The function receives a fully-vacuumed page (asserts index_unvacuumed == index_free) and distinguishes three cases, numbered the same way in the source comment:

Case 1 (last page — also covers first == last) never deallocates; the file always keeps at least one page. The page is reset via vacuum_init_data_page_with_last_blockid, which reinitializes the header and stores vacuum_Data.get_last_blockid () into slot 0 (data_page->data->blockid = blockid), logging RVVAC_DATA_INIT_NEW_PAGE; *data_page = NULL tells the caller there is no next page. An empty table thus remembers the last block it ever held — how vacuum_data::get_last_blockid stays correct across restarts and how Chapter 4’s append path detects already-consumed blocks.

Case 2 (first page, more pages exist) must repoint the file descriptor’s vpid_first before deallocating; a crash between the two would leave the boot loader (Chapter 2) pointing at a deallocated page. Order under a sysop: fix next page, sysop start, file_descriptor_update, swap vacuum_Data.first_page, file_dealloc old first, sysop commit. Errors are graded — a failed next-page fix aborts before any mutation; a failed file_descriptor_update aborts the sysop with nothing visible; a failed file_dealloc aborts and manually re-swaps vacuum_Data.first_page / vacuum_Data_load.vpid_first back (release-mode damage control for a path that asserts in debug).

Case 3 (interior page) requires prev_data_page (NULL → assert, unfix, early return); inside a sysop it deallocates the page (dealloc failure → sysop abort, return) and splices the list with an undoredo RVVAC_DATA_SET_LINK on the previous page — undo restores the old link on abort. It then re-fixes prev_data_page->next_page so the caller’s scan continues on the page that followed.

The companion vacuum_data_empty_update_last_blockid (asserts vacuum_is_empty () and first_page == last_page) re-runs the slot-0 init to persist the freshest last_blockid. It is called from vacuum_finalize (non-SERVER builds) and from vacuum_sa_reflect_last_blockid, which first copies logpb_last_complete_blockid () into vacuum_Data and log_Gl.hdr.vacuum_last_blockid — keeping an offline-restarted server from re-consuming blocks it already covered.

9.5 Recovery of completion — vacuum_rv_redo_data_finished and friends

RVVAC_DATA_FINISHED_BLOCKS redo replays section 9.3’s page mutation from the logged blockid slice — including the watermark advance and last-page compaction, which were not logged separately because they are deterministic functions of page content:

// vacuum_rv_redo_data_finished -- src/query/vacuum.c
if (rcv_data_ptr != NULL)
  {
    // ... condensed: per logged blockid_with_flags, locate entry by
    // (blockid - page_unvacuumed_blockid) and set_vacuumed () / set_interrupted () ...
  }
while (data_page->index_unvacuumed < data_page->index_free
       && data_page->data[data_page->index_unvacuumed].is_vacuumed ())
  {
    data_page->index_unvacuumed++; /* same watermark advance as runtime */
  }
if (VPID_ISNULL (&data_page->next_page) && data_page->index_unvacuumed > 0)
  {
    /* Remove all vacuumed blocks. */
    // ... condensed: memmove compaction, index_free -= index_unvacuumed, index_unvacuumed = 0 ...
  }

The rcv_data_ptr != NULL guard is defensive — the record’s only appender (section 9.3) always logs a non-empty slice; what matters is that the advance and last-page compaction run unconditionally after marking, recomputed from page content rather than from the payload. vacuum_rv_redo_data_finished_dump pretty-prints the payload; its two strings (“vacuumed” vs “available and interrupted”) are the codebase’s most concise statement of the two flag combinations.
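
Why the advance and compaction need no log payload of their own can be seen in a small model: both are pure functions of the page content after marking, so running them again changes nothing. The sketch below is illustrative only; the vector-based `Page`, the `is_last_page` flag standing in for a NULL `next_page`, and the function name are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t STATUS_MASK = 0xC000000000000000ULL;
constexpr std::uint64_t STATUS_VACUUMED = 0x8000000000000000ULL;

inline bool is_vacuumed(std::uint64_t raw) { return (raw & STATUS_MASK) == STATUS_VACUUMED; }

struct Page {
  std::vector<std::uint64_t> data;  // raw entries; index_free is data.size()
  std::size_t index_unvacuumed;     // watermark
  bool is_last_page;                // stands in for VPID_ISNULL (&next_page)
};

// Recompute the watermark and, on the last page only, drop the vacuumed prefix.
void recompute_after_marking(Page &page) {
  while (page.index_unvacuumed < page.data.size() &&
         is_vacuumed(page.data[page.index_unvacuumed])) {
    ++page.index_unvacuumed;
  }
  if (page.is_last_page && page.index_unvacuumed > 0) {
    // erase() plays the role of the memmove plus the index_free adjustment.
    page.data.erase(page.data.begin(),
                    page.data.begin() + static_cast<std::ptrdiff_t>(page.index_unvacuumed));
    page.index_unvacuumed = 0;
  }
}
```

Calling the function a second time on the same page is a no-op, which is the property redo relies on when a record is replayed against a page that already reflects it. Interior pages only move the watermark and keep their vacuumed prefix in place.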

vacuum_rv_redo_vacuum_complete recovers the SA_MODE-only RVVAC_COMPLETE record (appended by xvacuum after an offline full vacuum, payload log_Gl.hdr.mvcc_next_id): it installs the logged MVCCID into vacuum_Data.oldest_unvacuumed_mvccid and calls logpb_vacuum_reset_log_header_cache to wipe the header’s partial-block bookkeeping (Chapter 3) — after a complete offline vacuum there is nothing left for vacuum to remember.

9.6 Closing the loop — the log floor and the archive gate

vacuum_data::update is the master’s once-per-iteration consolidation, invoked through vacuum_job_cursor::force_data_update (which unloads the cursor first — Chapter 6):

// vacuum_data::update -- src/query/vacuum.c
// first remove vacuumed blocks
mark_finished = vacuum_data_mark_finished (thread_p);
// then consume new generated blocks
vacuum_consume_buffer_log_blocks (thread_p); // Chapter 4
if (!vacuum_Data.is_empty ())
  {
    upgrade_oldest_unvacuumed (get_first_entry ().oldest_visible_mvccid);
  }

upgrade_oldest_unvacuumed asserts monotonicity — valid because entries are appended in MVCCID order and retired from the front; on an empty table the watermark is left alone rather than guessed. The log floor itself is recomputed at the tail of every successful vacuum_data_mark_finished, and once at boot at the end of vacuum_data_load_and_recover:

// vacuum_update_keep_from_log_pageid -- src/query/vacuum.c
if (vacuum_is_empty ())
  {
    // keep starting with next after last_blockid ()
    vacuum_Data.keep_from_log_pageid = VACUUM_FIRST_LOG_PAGEID_IN_BLOCK (vacuum_Data.get_last_blockid () + 1);
  }
else
  {
    vacuum_Data.keep_from_log_pageid = VACUUM_FIRST_LOG_PAGEID_IN_BLOCK (vacuum_Data.get_first_blockid ());
  }
// ... condensed: er_log; if (!is_archive_removal_safe) is_archive_removal_safe = true; /* set once */

VACUUM_FIRST_LOG_PAGEID_IN_BLOCK (b) is just b * vacuum_Data.log_block_npages: the floor is always a block boundary — first unvacuumed block when the table has entries, the block after the remembered last_blockid when empty.
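
As a worked instance of the floor arithmetic, the sketch below computes `keep_from_log_pageid` for both branches. The struct, its field names, and the block size of 31 pages used in the usage note are hypothetical values chosen for illustration, not taken from the source.

```cpp
#include <cassert>
#include <cstdint>

using LogPageId = std::int64_t;
using BlockId = std::int64_t;

// Mirrors the macro described above: first log page of block b is b * npages.
constexpr LogPageId first_log_pageid_in_block(BlockId blockid, std::int64_t log_block_npages) {
  return blockid * log_block_npages;
}

// Hypothetical read-only view of the fields the floor depends on.
struct VacuumDataView {
  bool empty;
  BlockId first_blockid;          // first unvacuumed block; meaningful when !empty
  BlockId last_blockid;           // last block ever held; used when empty
  std::int64_t log_block_npages;  // pages per vacuum block
};

// Empty table: keep from the block after the remembered last block.
// Otherwise: keep from the first block still present in vacuum data.
constexpr LogPageId keep_from_log_pageid(const VacuumDataView &v) {
  return v.empty ? first_log_pageid_in_block(v.last_blockid + 1, v.log_block_npages)
                 : first_log_pageid_in_block(v.first_blockid, v.log_block_npages);
}
```

With 31 pages per block, a table whose first entry is block 7 yields a floor of page 217, and an empty table that last held block 9 yields page 310. Either way the result is a multiple of the block size, which is the "always a block boundary" property.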

Invariant 9-C — log pages below keep_from_log_pageid are never needed by vacuum again. Every entry removed (or page dropped) held a VACUUMED block, and INTERRUPTED entries — whose log is still needed — hold the floor down by staying at or above index_unvacuumed (Invariant 9-B). This is the contract that makes archive deletion safe.

Two read-side gates expose the floor to the log layer:

  • vacuum_min_log_pageid_to_keep returns the floor, with two overrides: PRM_ID_DISABLE_VACUUM returns 0 (keep everything — a debug aid), and SA_MODE after xvacuum sets is_vacuum_complete returns NULL_PAGEID (keep nothing for vacuum). Consumers: logpb_remove_archive_logs_exceed_limit and logpb_remove_archive_logs (in log_page_buffer.c) bound which archive numbers may go; logpb_backup uses it to decide which archives a backup must include.
  • vacuum_is_safe_to_remove_archives returns is_archive_removal_safe. Both purge functions check it first and refuse to delete anything while false. Flag and floor boot as false / NULL_PAGEID; the flag flips only inside vacuum_update_keep_from_log_pageid — normally first at the end of vacuum_data_load_and_recover — so purge stays blocked until vacuum data is loaded and a real floor exists.
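
The two gates compose roughly as follows. This is a hedged sketch: `FloorState` and `vacuum_allows_purge` are hypothetical, the NULL_PAGEID sentinel is assumed to be -1, and the consumer-side comparison (purge only archives whose last page lies below the floor) is an illustration of how a purger might use the gates, not a transcription of the logpb_* functions.

```cpp
#include <cassert>
#include <cstdint>

using LogPageId = std::int64_t;
constexpr LogPageId NULL_PAGEID = -1;  // assumed sentinel value

// Hypothetical snapshot of the state the two gates read.
struct FloorState {
  bool vacuum_disabled;       // PRM_ID_DISABLE_VACUUM
  bool sa_vacuum_complete;    // SA_MODE after a complete offline vacuum
  bool archive_removal_safe;  // flips once, when a real floor is first computed
  LogPageId keep_from_log_pageid;
};

// Floor with its two overrides, as described for vacuum_min_log_pageid_to_keep.
LogPageId min_log_pageid_to_keep(const FloorState &s) {
  if (s.vacuum_disabled) return 0;               // keep everything
  if (s.sa_vacuum_complete) return NULL_PAGEID;  // vacuum needs nothing
  return s.keep_from_log_pageid;
}

// Hypothetical consumer: may vacuum's bookkeeping allow removing an archive
// whose last log page is `last_pageid_in_archive`?
bool vacuum_allows_purge(const FloorState &s, LogPageId last_pageid_in_archive) {
  if (!s.archive_removal_safe) return false;  // first gate: no floor computed yet
  const LogPageId floor = min_log_pageid_to_keep(s);
  if (floor == NULL_PAGEID) return true;      // vacuum imposes no bound
  return last_pageid_in_archive < floor;      // only pages strictly below the floor
}
```

Before the flag flips, nothing is purgeable regardless of the floor. Afterwards a floor of 100 releases an archive ending at page 50 but not one ending at page 100, and the disable-vacuum override (floor 0) releases nothing.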

The periodic driver lives in log_manager.c: the “log-rm-archive” daemon runs log_remove_log_archive_daemon_task::execute, calling logpb_remove_archive_logs_exceed_limit either on the configured PRM_ID_REMOVE_LOG_ARCHIVES_INTERVAL (one archive per tick, max_count = 1) or unthrottled (max_count = 0) when no interval is set.

Finally, vacuum_is_work_in_progress (any vacuum_Workers[i].state not VACUUM_WORKER_STATE_INACTIVE; SA_MODE trivially false) is the shutdown barrier: vacuum_finalize asserts it before draining — only when no worker can still produce is the final vacuum_data_mark_finished plus queue-empty assert meaningful, and only then may the queue be deleted.

  1. A job’s outcome is encoded in the blockid’s own high bits, by set_vacuumed (VACUUMED, INTERRUPTED cleared) or set_interrupted (AVAILABLE + INTERRUPTED), and pushed through vacuum_Finished_job_queue; the queue element is the entire worker-to-master report (Invariant 9-A).
  2. INTERRUPTED means AVAILABLE-with-history, not dead: the next dispatch skips RVVAC_START_JOB logging and passes was_interrupted into vacuum_heap, relaxing safe-guards that would otherwise treat re-vacuumed pages as corruption.
  3. vacuum_data_mark_finished drains the queue in sorted batches, marks entries by direct contiguous-blockid arithmetic, advances index_unvacuumed past leading VACUUMED entries only (Invariant 9-B), logs one RVVAC_DATA_FINISHED_BLOCKS per surviving touched page (an emptied page is recovered via its reset/splice records instead), and treats unmatched reports as a loud early return that leaves the log floor untouched.
  4. Physical shrink has three cases: last page reset in place (preserving last_blockid in slot 0), first page swapped out under a sysop that updates the file descriptor before deallocating, interior pages spliced out with an undoredo link update.
  5. Redo replays marking from the logged flagged-blockid slice and recomputes the watermark and last-page compaction — deterministic from page content, so never logged separately.
  6. The log floor keep_from_log_pageid is always a block boundary, and pages below it are never needed by vacuum again (Invariant 9-C). vacuum_min_log_pageid_to_keep serves it to the archive purgers (and backup); vacuum_is_safe_to_remove_archives keeps purge blocked until the floor is first computed — normally at boot in vacuum_data_load_and_recover; the periodic purge is driven by log_remove_log_archive_daemon_task in log_manager.c.
  7. vacuum_is_work_in_progress makes shutdown draining sound: no producer may be alive when vacuum_finalize performs the last vacuum_data_mark_finished and asserts the queue empty.

Chapter 10: The Dropped Files Ledger

Chapters 7 and 8 showed workers calling vacuum_is_file_dropped for every log record before touching a heap page or b-tree. This chapter traces the other side: who writes that ledger, why a DROP TABLE committing mid-job cannot crash a worker holding collected OIDs for the file, and how the ledger is trimmed. Design rationale: companion, “Dropped-file table” under Common DBMS Design.

10.1 On-disk shape, globals, and the support structs

The ledger is a chain of PAGE_DROPPED_FILES pages rooted at vacuum_Dropped_files_vpid, each a VFID-sorted entry array:

// vacuum_dropped_file -- src/query/vacuum.c
struct vacuum_dropped_file
{
  VFID vfid;
  MVCCID mvccid;
};
// vacuum_dropped_files_page -- src/query/vacuum.c
struct vacuum_dropped_files_page
{
  VPID next_page; /* VPID of next dropped files page. */
  INT16 n_dropped_files; /* Number of entries on page */
  VACUUM_DROPPED_FILE dropped_files[1]; /* Dropped files. */
};

| Field | Role | Why it exists |
| --- | --- | --- |
| entry vfid | Dropped file (heap or b-tree). | Sort/search key; one entry per VFID across reuse (10.3). |
| entry mvccid | Borderline sampled from log_Gl.hdr.mvcc_next_id at notify time. | Strictly-older records belong to the dead file; >= to a reusing successor (Invariant 10-B). |
| page next_page | Link to next ledger page. | Singly linked chain; link changes logged separately (10.3). |
| page n_dropped_files | Entry count on this page. | Bounds bsearch/memmove; full at VACUUM_DROPPED_FILES_PAGE_CAPACITY. |
| page dropped_files[1] | Flexible entry array filling DB_PAGESIZE. | Sorted by vacuum_compare_dropped_files so lookup is bsearch (Invariant 10-A). |
| rcv vfid, class_oid | VACUUM_DROPPED_FILES_RCV_DATA, payload of RVVAC_NOTIFY_DROPPED_FILE. | MVCCID deliberately absent: sampled at apply time (10.5); non-NULL class triggers heap_delete_hfid_from_cache. |
| track next_tracked_page, dropped_data_page | Debug-only (!NDEBUG) VACUUM_TRACK_DROPPED_FILES: one malloced copy per disk page. | memcpy-refreshed on every mutation so a debugger can walk the ledger without fixing pages; never read by production logic. |

Static globals: vacuum_Dropped_files_vfid / _vpid, vacuum_Dropped_files_count (fast-path filter, 10.6), vacuum_Dropped_files_loaded. The handshake trio vacuum_Dropped_files_version / _mutex / vacuum_Last_dropped_vfid (10.5) is non-static, though nothing else links to it. VACUUM_DROPPED_FILE_FLAG_DUPLICATE (0x8000) is defined but unused — duplicates are handled by the replace record (10.3), not a flag.

flowchart LR
  subgraph DISK["disk chain (PAGE_DROPPED_FILES)"]
    P1["page 1<br/>n_dropped_files, sorted entries"] -->|next_page| P2["page 2"]
  end
  G["vacuum_Dropped_files_vpid"] --> P1
  C["vacuum_Dropped_files_count"] -.sum of n_dropped_files.- DISK
  subgraph HS["handshake globals"]
    V["vacuum_Dropped_files_version"]
    LV["vacuum_Last_dropped_vfid"]
    M["vacuum_Dropped_files_mutex"]
  end

Figure 10-1 — Ledger pages and the globals that root them.

vacuum_load_dropped_files_from_disk fills the in-memory side at boot or lazily during recovery (10.3): already loaded → assert + NO_ERROR; stale nonzero count → assert (false) + reset; else a read-latched walk sums n_dropped_files into the global count (debug builds also build the track list; a failed malloc frees the partial list, ER_OUT_OF_VIRTUAL_MEMORY), then sets vacuum_Dropped_files_loaded.

10.2 Registration — vacuum_log_add_dropped_file and the POSTPONE/UNDO selector

Droppers never touch ledger pages directly. vacuum_log_add_dropped_file returns immediately under PRM_ID_DISABLE_VACUUM (ledger never written, never consulted); otherwise it packs VFID + class OID (NULL OID when no class) and appends RVVAC_NOTIFY_DROPPED_FILE as one of two flavors:

  • a postpone record (VACUUM_LOG_ADD_DROPPED_FILE_POSTPONE = true) when an existing file is dropped — the file dies only if the transaction commits;
  • an undo record (VACUUM_LOG_ADD_DROPPED_FILE_UNDO) when a file is created — on abort the new file is garbage.

Callers: heap destroy/create in heap_file.c (xheap_destroy appends the postpone before file_postpone_destroy, so at commit the notify handshake completes before the file is destroyed), b-tree drop/create in btree.c / btree_load.c, and the index-load sort path in external_sort.c (the b-tree file under construction, undo flavor — not its temporary sort files). The actual insert happens later, inside vacuum_rv_notify_dropped_file (10.5).

10.3 The page walk — vacuum_add_dropped_file branch by branch

flowchart TD
  A["enter vacuum_add_dropped_file"] --> B{"vacuum_Dropped_files_loaded?"}
  B -- no --> C["assert !LOG_ISRESTARTED<br/>load from disk; fail -> ER_FAILED"]
  B -- yes --> D
  C --> D["fix page write-latched, walk chain"]
  D --> E{"util_bsearch found vfid?"}
  E -- yes --> F["overwrite mvccid<br/>log RVVAC_DROPPED_FILE_REPLACE undoredo<br/>set dirty FREE; return NO_ERROR"]
  E -- no --> G{"page full?"}
  G -- yes --> H["advance to next_page<br/>advance debug track node"] --> D
  G -- no --> I["memmove tail right at position<br/>n_dropped_files++; ATOMIC_INC count<br/>log RVVAC_DROPPED_FILE_ADD undoredo<br/>set dirty FREE; return NO_ERROR"]
  D --> J{"chain exhausted?"}
  J -- yes --> K["file_alloc new PAGE_DROPPED_FILES<br/>fail -> unfix, ER_FAILED"]
  K --> L["init: next NULL, count 1, entry[0]<br/>log RVPGBUF_NEW_PAGE redo; set dirty FREE"]
  L --> M["vacuum_dropped_files_set_next_page on old last page<br/>logs RVVAC_DROPPED_FILE_NEXT_PAGE undoredo"]
  M --> N["unfix; return NO_ERROR"]

Figure 10-2 — vacuum_add_dropped_file. Every exit is replace, in-page insert, new-page append, or an assert-backed ER_FAILED.

The replace branch is VFID reuse — dropped before, recycled, and the reincarnation now dropped too; one entry per VFID, newest borderline wins:

// vacuum_add_dropped_file -- src/query/vacuum.c
undo_data = page->dropped_files[position];
save_mvccid = page->dropped_files[position].mvccid;
page->dropped_files[position].mvccid = mvccid;
assert_release (MVCC_ID_FOLLOW_OR_EQUAL (mvccid, save_mvccid));
log_append_undoredo_data (thread_p, RVVAC_DROPPED_FILE_REPLACE, &addr, /* ... before/after entry ... */);

The insert branch logs RVVAC_DROPPED_FILE_ADD with undo length 0 — undoing an add needs only the position (addr.offset); redo carries the full entry. The new-page branch chains via vacuum_dropped_files_set_next_page, which logs old/new VPID as an undoredo pair before assigning page_p->next_page (and on the new-page branch a failed track-node malloc unfixes both pages, ER_FAILED).

Invariant 10-A — per-page VFID sort order. Entries are sorted by vacuum_compare_dropped_files (fileid, then volid); inserts land at the util_bsearch position, and recovery replays the same positional insert. If violated, the bsearch in vacuum_find_dropped_file misses entries and workers vacuum pages of a dropped file — dereferencing freed extents.
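
The per-page decision (replace on match, otherwise insert at the binary-search position unless the page is full) can be modelled on a sorted vector. This is a simplified sketch, not the page-walking code: `Vfid`, `DroppedFile`, `AddResult`, and `add_to_page` are illustrative names, `std::lower_bound` stands in for util_bsearch, and logging, latching, and the chain walk are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vfid {
  std::int32_t fileid;
  std::int16_t volid;
};

struct DroppedFile {
  Vfid vfid;
  std::uint64_t mvccid;
};

// Ordering described for vacuum_compare_dropped_files: fileid first, then volid.
inline bool vfid_less(const Vfid &a, const Vfid &b) {
  if (a.fileid != b.fileid) return a.fileid < b.fileid;
  return a.volid < b.volid;
}

enum class AddResult { Replaced, Inserted, PageFull };

// One page of the ledger, kept sorted by VFID.
AddResult add_to_page(std::vector<DroppedFile> &page, std::size_t capacity,
                      const Vfid &vfid, std::uint64_t mvccid) {
  auto pos = std::lower_bound(
      page.begin(), page.end(), vfid,
      [](const DroppedFile &entry, const Vfid &key) { return vfid_less(entry.vfid, key); });
  if (pos != page.end() && !vfid_less(vfid, pos->vfid)) {
    // VFID reuse: one entry per VFID, newest borderline wins.
    assert(mvccid >= pos->mvccid);
    pos->mvccid = mvccid;
    return AddResult::Replaced;
  }
  if (page.size() >= capacity) {
    return AddResult::PageFull;  // caller would advance to the next page
  }
  page.insert(pos, DroppedFile{vfid, mvccid});  // shifts the tail right, like the memmove
  return AddResult::Inserted;
}
```

Inserting (20,0), (10,1), (10,0) in that order leaves the page sorted as (10,0), (10,1), (20,0); adding (20,0) again only overwrites its mvccid. Because every insert lands at the search position, the sort order that lookups depend on is preserved without any separate sorting pass.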

10.4 The recovery records and the MVCCID borderline

| rcvindex | undofun | redofun | payload |
| --- | --- | --- | --- |
| RVVAC_NOTIFY_DROPPED_FILE | vacuum_rv_notify_dropped_file | same (as run-postpone) | vfid + class_oid |
| RVVAC_DROPPED_FILE_ADD | vacuum_rv_undo_add_dropped_file | vacuum_rv_redo_add_dropped_file | redo: entry; undo: none (position in rcv->offset) |
| RVVAC_DROPPED_FILE_REPLACE | vacuum_rv_replace_dropped_file | same | before/after entry |
| RVVAC_DROPPED_FILE_NEXT_PAGE | vacuum_rv_set_next_page_dropped_files | same | old/new VPID |
| RVVAC_DROPPED_FILE_CLEANUP | none (redo-only) | vacuum_rv_redo_cleanup_dropped_files | n_indexes + descending index array (10.8) |

(New ledger pages ride the generic RVPGBUF_NEW_PAGE redo, 10.3.) vacuum_rv_redo_add_dropped_file opens the slot (position < n → memmove), copies the entry, and increments n_dropped_files; position > n is a logged assert + ER_FAILED. vacuum_rv_undo_add_dropped_file mirrors it (position >= n → ER_FAILED; else memmove the tail down, n_dropped_files--). The replace function is symmetric, one function for both phases (before-image on undo, after-image on redo), and validates position < n and VFID_EQ with the on-page entry. Both redo-side functions end with the same guard:

// vacuum_rv_redo_add_dropped_file -- src/query/vacuum.c
if (!MVCC_ID_PRECEDES (dropped_file->mvccid, log_Gl.hdr.mvcc_next_id))
  {
    log_Gl.hdr.mvcc_next_id = dropped_file->mvccid;
    MVCCID_FORWARD (log_Gl.hdr.mvcc_next_id); /* <- keep the borderline ahead of every ledger MVCCID */
  }

Invariant 10-B — the borderline rule. Every ledger mvccid strictly precedes log_Gl.hdr.mvcc_next_id: committed changes to the old file carry smaller MVCCIDs; any VFID-reusing transaction starts later and carries >= ones. Redo re-asserts it by forwarding mvcc_next_id. Violated, the test in 10.6 misclassifies new-file records as dead (lost vacuum) or dead-file records as live (use-after-free).
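
The borderline rule reduces to two comparisons, sketched below. Both helpers are assumptions made for illustration: MVCCID ordering is modelled as plain integer comparison, MVCCID_FORWARD as a simple increment (any reserved-value handling is ignored), and `belongs_to_dropped_file` is a hypothetical name for the strictly-older test described in 10.1.

```cpp
#include <cassert>
#include <cstdint>

using Mvccid = std::uint64_t;

// Assumption: MVCCIDs are ordered as plain integers.
inline bool mvccid_precedes(Mvccid a, Mvccid b) { return a < b; }

// Redo-side guard: keep mvcc_next_id strictly ahead of a ledger MVCCID.
inline void forward_next_id_past(Mvccid &mvcc_next_id, Mvccid ledger_mvccid) {
  if (!mvccid_precedes(ledger_mvccid, mvcc_next_id)) {
    mvcc_next_id = ledger_mvccid + 1;  // stands in for MVCCID_FORWARD
  }
}

// Strictly-older records belong to the dead file; equal or newer ones belong
// to a file that reuses the same VFID.
inline bool belongs_to_dropped_file(Mvccid record_mvccid, Mvccid ledger_mvccid) {
  return mvccid_precedes(record_mvccid, ledger_mvccid);
}
```

If recovery replays a ledger entry with mvccid 80 while the header's next id is still 50, the guard moves it to 81; a header already at 100 is left alone. On the read side, a log record with MVCCID 79 is attributed to the dropped file while one with 80 is not, since 80 could only have been handed out to a transaction that started after the drop was applied.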

10.5 The version handshake — publishing a drop without stalling workers

vacuum_rv_notify_dropped_file executes RVVAC_NOTIFY_DROPPED_FILE both as run-postpone and as undo (same function in the recovery.c table). It samples the borderline, inserts, then notifies workers:

// vacuum_rv_notify_dropped_file -- src/query/vacuum.c
mvccid = ATOMIC_LOAD_64 (&log_Gl.hdr.mvcc_next_id); /* <- borderline sampled NOW, not at drop statement */
error = vacuum_add_dropped_file (thread_p, &rcv_data->vfid, mvccid);
// ... condensed: error -> return; then ...
vacuum_notify_all_workers_dropped_file (rcv_data->vfid, mvccid);
if (!OID_ISNULL (class_oid)) { (void) heap_delete_hfid_from_cache (thread_p, class_oid); }

The notify step is where “without stalling workers” is bought: the dropper pays, never the workers. Its body is SERVER_MODE-only; in SA mode there are no concurrent workers and it compiles to a no-op:

// vacuum_notify_all_workers_dropped_file -- src/query/vacuum.c
if (!LOG_ISRESTARTED ()) { return; } /* <- workers are not running during recovery */
pthread_mutex_lock (&vacuum_Dropped_files_mutex);
VFID_COPY (&vacuum_Last_dropped_vfid, &vfid_dropped); /* <- one drop published at a time */
my_version = ++vacuum_Dropped_files_version;
for (workers_min_version = vacuum_get_worker_min_dropped_files_version ();
workers_min_version != -1 && workers_min_version < my_version;
workers_min_version = vacuum_get_worker_min_dropped_files_version ())
{
thread_sleep (1); /* <- dropper spins; workers never block */
}
VFID_SET_NULL (&vacuum_Last_dropped_vfid);
pthread_mutex_unlock (&vacuum_Dropped_files_mutex);

vacuum_get_worker_min_dropped_files_version scans vacuum_Workers[], considering only state != VACUUM_WORKER_STATE_INACTIVE; -1 (none active) exits the loop immediately. The worker side is the version gate at the top of vacuum_process_log_record (Chapter 7):

// vacuum_process_log_record -- src/query/vacuum.c
if (worker->drop_files_version != vacuum_Dropped_files_version)
{
VFID_COPY (&vfid, &vacuum_Last_dropped_vfid);
vacuum_cleanup_collected_by_vfid (worker, &vfid); /* <- purge BEFORE acknowledging */
worker->drop_files_version = vacuum_Dropped_files_version;
}
error_code = vacuum_is_file_dropped (thread_p, is_file_dropped, &vacuum_info->vfid, *mvccid);

The ordering is the point: a worker first discards heap objects already collected for vacuum_Last_dropped_vfid, then advances its own drop_files_version so the dropper’s min-scan sees it; only after every in-flight worker has done so does the dropper’s commit proceed to destroy the file. The counter is a free-running INT32, so vacuum_compare_dropped_files_version makes the min-scan wraparound-safe (same-sign values compare by plain a - b; for mixed signs the value in the extreme positive quarter >= 0x3FFFFFFF is treated as the older, pre-wraparound side).
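The wraparound rule in the last sentence can be sketched as follows. This is an illustration of the rule as described, not the body of vacuum_compare_dropped_files_version: `compare_versions` is a hypothetical name, same-sign values are compared without subtraction, and the range assert is a sketch-only sanity check on the assumption that the two values are never more than a quarter of the domain apart.

```cpp
#include <cassert>
#include <cstdint>

// Returns <0 if a is older than b, 0 if equal, >0 if a is newer.
inline int compare_versions(int32_t a, int32_t b) {
  constexpr int32_t kHalfMax = 0x3FFFFFFF;
  if ((a >= 0) == (b >= 0)) {
    return (a < b) ? -1 : ((a > b) ? 1 : 0);   // same sign: plain order
  }
  const int32_t pos = (a >= 0) ? a : b;
  const int32_t neg = (a >= 0) ? b : a;
  // A positive value in the top quarter means the counter wrapped:
  // the positive side is the older, pre-wraparound value.
  const bool wrapped = pos >= kHalfMax;
  // Sketch-only sanity check: the pair must sit in the ranges the rule assumes.
  assert(wrapped ? neg <= -kHalfMax : neg >= -kHalfMax);
  const bool a_is_older = wrapped ? (a >= 0) : (a < 0);
  return a_is_older ? -1 : 1;
}
```

With this ordering, a worker still at `INT32_MAX` compares as older than a dropper that has wrapped to `INT32_MIN`, so the min-scan keeps the dropper waiting instead of treating the worker as far ahead.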

Invariant 10-C — no worker outruns the ledger. When vacuum_notify_all_workers_dropped_file returns, every worker active at publication has purged its collected objects for the dropped VFID and will see the entry on its next check; later workers read it from disk. Enforced by the mutex (one drop in flight), vacuum_Last_dropped_vfid as the purge target, and the min-version spin. If broken, Chapter 8’s heap executor fixes pages of a deallocated file.

10.6 The worker-side query — vacuum_is_file_dropped / vacuum_find_dropped_file

vacuum_is_file_dropped: PRM_ID_DISABLE_VACUUM → false; else delegate to vacuum_find_dropped_file, which opens with the cheap check that makes the common case free: vacuum_Dropped_files_count == 0 → false with no page fix, no latch. Otherwise it walks the chain read-latched. A failed fix is tolerated only as interrupt/shutdown (assert (error == ER_INTERRUPTED); workers also assert (thread_p->shutdown)); *is_file_dropped is set false (“actually unknown but unimportant”) and the error aborts the job. Each fixed page gets pgbuf_notify_vacuum_follows (dropped-files pages are never LRU-boosted; this delays victimization). Then bsearch (Invariant 10-A) gives three outcomes:

  1. found and MVCC_ID_PRECEDES (mvccid, dropped_file->mvccid) → true (the record predates the borderline, so it belongs to the dead file);
  2. found but the record MVCCID is >= the entry’s → false (VFID reuse: the record belongs to the reincarnation);
  3. not found → follow next_page; chain exhausted → false.

The first outcome is the safety property the whole ledger exists for: a worker calls a record dropped exactly when its MVCCID precedes the entry’s borderline, so it never vacuums, and never fixes a page of, a file dropped after that record’s transaction.
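The lookup outcomes can be sketched over an in-memory model. `Page`, `Chain`, `key_less`, and `is_file_dropped` are hypothetical names for this example; latching, page fixing, and error handling are left out, and MVCC_ID_PRECEDES is modeled as plain `<`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DroppedFile { int32_t fileid; int16_t volid; uint64_t mvccid; };  // hypothetical entry
using Page = std::vector<DroppedFile>;   // one sorted ledger page
using Chain = std::vector<Page>;         // pages in next_page order

inline bool key_less(const DroppedFile &a, const DroppedFile &b) {
  if (a.fileid != b.fileid) return a.fileid < b.fileid;
  return a.volid < b.volid;
}

inline bool is_file_dropped(const Chain &chain, std::size_t global_count,
                            int32_t fileid, int16_t volid, uint64_t record_mvccid) {
  if (global_count == 0) return false;             // cheap check: nothing to search
  const DroppedFile key{fileid, volid, 0};
  for (const Page &page : chain) {
    assert(std::is_sorted(page.begin(), page.end(), key_less));  // Invariant 10-A
    auto it = std::lower_bound(page.begin(), page.end(), key, key_less);
    if (it != page.end() && !key_less(key, *it)) {
      // Found: dropped only if the record predates the borderline;
      // otherwise the VFID was reused and the record belongs to the new file.
      return record_mvccid < it->mvccid;
    }
  }
  return false;                                    // chain exhausted
}
```

A record whose MVCCID equals the borderline is classified as not dropped, matching the strict comparison in the first outcome.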

10.7 Purging already-collected work — vacuum_cleanup_collected_by_vfid

Called only from the version gate. It qsorts the worker’s heap_objects with vacuum_compare_heap_object (VFID-major, OID-minor), scans for the first matching entry — not found → return; scans for the end of the run; run reaches the array end → truncate (n_heap_objects = start); else memmove the tail over the run and subtract end - start. The reorder is harmless: Chapter 8’s executor re-sorts before batching.
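A minimal sketch of the purge, assuming a hypothetical `HeapObject` with a flat integer OID. It uses `std::sort`, `std::find_if`, and `vector::erase` in place of the qsort, linear scans, and memmove described above; the sort key (VFID-major, OID-minor) follows the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

struct HeapObject { int32_t fileid; int16_t volid; int64_t oid; };  // hypothetical layout

// VFID-major, OID-minor ordering.
inline bool object_less(const HeapObject &a, const HeapObject &b) {
  return std::tie(a.fileid, a.volid, a.oid) < std::tie(b.fileid, b.volid, b.oid);
}

// Removes every collected object of the given file; returns how many were dropped.
inline std::size_t purge_by_vfid(std::vector<HeapObject> &objs, int32_t fileid, int16_t volid) {
  std::sort(objs.begin(), objs.end(), object_less);
  auto same_file = [&](const HeapObject &o) { return o.fileid == fileid && o.volid == volid; };
  auto start = std::find_if(objs.begin(), objs.end(), same_file);   // first match
  if (start == objs.end()) return 0;                                // not found: nothing to do
  auto end = std::find_if_not(start, objs.end(), same_file);        // end of the run
  const std::size_t removed = static_cast<std::size_t>(end - start);
  objs.erase(start, end);             // truncates, or shifts the tail over the run
  assert(std::none_of(objs.begin(), objs.end(), same_file));
  return removed;
}
```

Because the array is sorted first, all objects of one file form a single contiguous run, so one erase covers both the truncate case and the shift-the-tail case.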

10.8 Trimming — vacuum_cleanup_dropped_files and its logging pair

The trim runs after a full pass has set vacuum_Data.oldest_unvacuumed_mvccid = log_Gl.hdr.mvcc_next_id; in this revision the only call site is xvacuum (SA-mode cubrid vacuumdb, Chapter 11) — no server-mode caller remains, so on a live server the ledger only grows until the next offline vacuum. Branches: recovery → skip; vacuum_Dropped_files_count == 0 → skip; per page (write-latched): empty page → unfix, continue; else scan entries from the end down, and for each with MVCC_ID_PRECEDES (mvccid, oldest_unvacuumed_mvccid) record its index in removed_entries[] and immediately memmove the tail down. If anything was removed: ATOMIC_INC_32 (&count, -n), decrement the page counter, log via vacuum_log_cleanup_dropped_files (redo-only RVVAC_DROPPED_FILE_CLEANUP crumbs: n_indexes + the end-first index array), refresh the track mirror, set dirty FREE; else plain unfix. vacuum_rv_redo_cleanup_dropped_files unpacks the crumbs (assert (offset == rcv->length)) and replays each removal end-first, so each memmove compacts a tail untouched by later (smaller) indexes. No undo function: re-trimming after a crash removes only provably unneeded entries.

Three code smells to know before modifying this path: (1) the in-code todo notes emptied pages are never deallocated (“it looks like they are leaked”); (2) the trailing cut-off-link step is doubly broken — it calls vacuum_dropped_files_set_next_page (thread_p, page, &page->next_page), re-assigning the page’s own current link (a no-op that never writes the intended NULL), and last_non_empty_page_vpid records the successor of each non-empty page (it copies vpid after the advance to page->next_page), so even the fixed page is wrong; and (3) the cleanup redo computes mem_size = (page->n_dropped_files - indexes[i]) * sizeof (VACUUM_DROPPED_FILE) — one entry more than the runtime path’s (page_count - i - 1) — so each replayed removal copies one entry past the live tail (benign garbage beyond the decremented count, except that on a full page the source read crosses the page boundary).

Invariant 10-D — trim safety. An entry may be removed only when mvccid < vacuum_Data.oldest_unvacuumed_mvccid: every log record older than the watermark has been vacuumed or skipped-as-dropped (Chapter 9), so no future ledger lookup can need the entry. Violating it resurrects exactly the lost-file problem the ledger exists to prevent.
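The end-first trim and its redo replay can be sketched like this. `DroppedFile`, `trim_page`, and `replay_trim` are hypothetical names, a `std::vector` replaces the page array, and the sketch removes exactly one entry per index, so it does not reproduce the over-copy noted in the third code smell.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DroppedFile { int32_t fileid; uint64_t mvccid; };  // hypothetical entry

// Runtime trim: scan from the end, drop entries older than the watermark,
// and record each removed index. The result is in descending order.
inline std::vector<std::size_t> trim_page(std::vector<DroppedFile> &page,
                                          uint64_t oldest_unvacuumed) {
  std::vector<std::size_t> removed;
  for (std::size_t i = page.size(); i-- > 0;) {
    if (page[i].mvccid < oldest_unvacuumed) {
      removed.push_back(i);
      page.erase(page.begin() + static_cast<std::ptrdiff_t>(i));
    }
  }
  return removed;
}

// Redo replay: apply the logged indexes in the same end-first order.
// Removing a larger index first never shifts a smaller index still to come.
inline void replay_trim(std::vector<DroppedFile> &page,
                        const std::vector<std::size_t> &removed) {
  for (std::size_t idx : removed) {
    assert(idx < page.size());
    page.erase(page.begin() + static_cast<std::ptrdiff_t>(idx));
  }
}
```

Replaying the index list against a pre-trim copy of the page yields the same survivors as the runtime pass, which is why the record needs only positions and no entry images.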

  1. The ledger is a VFID-sorted, page-chained table keyed by a borderline MVCCID sampled from log_Gl.hdr.mvcc_next_id at notify time; records strictly older belong to the dead file (Invariants 10-A, 10-B).
  2. Registration is transactional by construction: vacuum_log_add_dropped_file appends a postpone for drops (applies on commit, ahead of the file-destroy postpone) and an undo for creations (applies on abort); both funnel into vacuum_rv_notify_dropped_file.
  3. vacuum_add_dropped_file has four success exits — replace (VFID reuse), sorted in-page insert (zero-length undo), next-page retry, and new-page append chained via vacuum_dropped_files_set_next_page. VACUUM_DROPPED_FILE_FLAG_DUPLICATE is vestigial.
  4. The handshake inverts the usual cost: the dropper spins under vacuum_Dropped_files_mutex until every active worker’s drop_files_version catches up (wraparound-safe compare); workers pay one comparison per record and purge collected objects via vacuum_cleanup_collected_by_vfid before acknowledging (Invariant 10-C).
  5. vacuum_find_dropped_file short-circuits on a zero global count; the found-and-older outcome is the safety test, while found-but-newer correctly classifies VFID reuse as not-dropped.
  6. Trimming removes entries older than oldest_unvacuumed_mvccid (Invariant 10-D) with a redo-only positional record; it runs only in SA-mode xvacuum, emptied pages leak, the empty-page unlink is a no-op, and the redo over-copies by one entry — four things a modifier should not inherit silently.

Chapter 11: Crash Recovery and Standalone Paths

Every previous chapter assumed a live SERVER_MODE process: a master producing jobs (Ch 6), workers consuming them (Ch 7-8), an append path feeding vacuum_Block_data_buffer (Ch 3-4). This chapter answers what happens to blocks in each state — AVAILABLE, IN_PROGRESS, VACUUMED, or not yet registered — across a crash, a copydb, or an SA-mode session with no daemon. The high-level story is the companion’s “Recovery integration” section (cubrid-vacuum.md); here we trace every branch.

11.1 The recovery handshake: recovery_lsa and the notify hooks

Vacuum does not participate in the ARIES passes. Its entire recovery contract is one field, vacuum_data::recovery_lsa, set by log_recovery immediately before the analysis pass:

// log_recovery -- src/transaction/log_recovery.c
/* Notify vacuum it may need to recover the lost block data.
* ... 1. recovery finds MVCC op log records after last checkpoint ...
* ... 2. no MVCC op log record is found, so vacuum has to start recovery from checkpoint LSA ... */
vacuum_notify_server_crashed (&rcv_lsa); /* <- rcv_lsa = checkpoint LSA (older for media crash) */

vacuum_notify_server_crashed is one line: LSA_COPY (&vacuum_Data.recovery_lsa, recovery_lsa). Its clean-shutdown counterpart vacuum_notify_server_shutdown calls vacuum_Data.shutdown_sequence.request_shutdown () from vacuum_stop_workers, telling the master (Ch 6) to stop generating jobs before the worker pool is destroyed.

Invariant 11-A — recovery_lsa is non-null iff crash recovery ran in this boot. Set only by vacuum_notify_server_crashed, consumed by vacuum_recover_lost_block_data, cleared by LSA_SET_NULL in vacuum_data_load_and_recover. A clean boot thus skips lost-block recovery (first early-return in §11.4); a leaked value would force a useless log-tail re-scan.

11.2 The RVVAC_* catalogue — what the redo pass replays for vacuum

Before any vacuum-specific boot logic runs, the ordinary ARIES redo pass has already replayed every page-level vacuum mutation through the RV_fun table in recovery.c. How a record reaches these functions — RV_fun[rcvindex] lookup, page fix, page-LSA gating, undo-vs-redo selection — is the recovery manager’s dispatch machinery; see cubrid-recovery-manager-detail.md for that and take it as given here. The rows are {index, name, undofun, redofun, undo_dump, redo_dump}:

// RV_fun -- src/transaction/recovery.c
{RVVAC_COMPLETE,
"RVVAC_COMPLETE",
NULL,
vacuum_rv_redo_vacuum_complete,
NULL, NULL},
// ... condensed: RVVAC_START_JOB (105) ... RVVAC_DROPPED_FILE_REPLACE (114) ...
{RVVAC_HEAP_RECORD_VACUUM,
"RVVAC_HEAP_RECORD_VACUUM",
vacuum_rv_undo_vacuum_heap_record,
vacuum_rv_redo_vacuum_heap_record,
NULL,
log_rv_dump_hexa},
// ... condensed: RVVAC_HEAP_PAGE_VACUUM (116), RVVAC_REMOVE_OVF_INSID (117) ...

All fourteen indices (104-117 in recovery.h), their handlers, and what each body does:

| Index | Undo fn | Redo fn | Effect (branches) |
|---|---|---|---|
| RVVAC_COMPLETE (104) | | vacuum_rv_redo_vacuum_complete | Restore oldest_unvacuumed_mvccid from the logged mvcc_next_id; reset the log-header vacuum cache. Written only by xvacuum (§11.6); doubles as the backward-search terminator in §11.4. |
| RVVAC_START_JOB (105) | | vacuum_rv_redo_start_job | set_job_in_progress () on the entry at rcv->offset, recreating the IN_PROGRESS state that boot then rewrites (§11.3). Logged by vacuum_job_cursor::start_job_on_current_entry, and only when the entry was not already interrupted (re-logging an interrupted job adds nothing). |
| RVVAC_DATA_APPEND_BLOCKS (106) | | vacuum_rv_redo_append_data | memcpy of N entries at index_free (asserts rcv->offset == index_free), bump index_free. The Ch 4 producer’s persistence. |
| RVVAC_DATA_INIT_NEW_PAGE (107) | | vacuum_rv_redo_initialize_data_page | Reformat page, seed data->blockid with the logged watermark. Logged by vacuum_init_data_page_with_last_blockid for page growth (Ch 4), watermark persistence and copydb reset (§11.7). |
| RVVAC_DATA_SET_LINK (108) | vacuum_rv_undoredo_data_set_link | same fn | Two branches: rcv->data == NULL → VPID_SET_NULL (next_page), else copy the VPID. Logged undo+redo when chaining a new page (Ch 4) or unlinking a consumed one (Ch 9). |
| RVVAC_DATA_FINISHED_BLOCKS (109) | | vacuum_rv_redo_data_finished | Replays Ch 9’s mark-finished; excerpt below. |
| RVVAC_NOTIFY_DROPPED_FILE (110) | vacuum_rv_notify_dropped_file | same fn | Logical record (in RCV_IS_LOGICAL_LOG): re-add the file with mvccid = log_Gl.hdr.mvcc_next_id as drop boundary, notify all workers, and (if class_oid non-null) evict the HFID cache. Appended as postpone on file destroy (drop-at-commit) and as undo on file create (drop-at-abort) (Ch 10). |
| RVVAC_DROPPED_FILE_CLEANUP (111) | | vacuum_rv_redo_cleanup_dropped_files | memmove-delete each logged position from a dropped-files page, decrement n_dropped_files. |
| RVVAC_DROPPED_FILE_NEXT_PAGE (112) | vacuum_rv_set_next_page_dropped_files | same fn | Unconditional next_page link write from the before/after image. |
| RVVAC_DROPPED_FILE_ADD (113) | vacuum_rv_undo_add_dropped_file | vacuum_rv_redo_add_dropped_file | Redo: insert at rcv->offset (memmove to make room unless appending; position > n_dropped_files is ER_FAILED), then forward log_Gl.hdr.mvcc_next_id if the entry’s MVCCID is not behind it. Undo: memmove-delete at the position (position >= n_dropped_files is ER_FAILED). |
| RVVAC_DROPPED_FILE_REPLACE (114) | vacuum_rv_replace_dropped_file | same fn | Overwrite the entry’s MVCCID at rcv->offset; position out of range or VFID mismatch is ER_FAILED. |
| RVVAC_HEAP_RECORD_VACUUM (115) | vacuum_rv_undo_vacuum_heap_record | vacuum_rv_redo_vacuum_heap_record | Redo: spage_vacuum_slot (slotid and reusable packed in rcv->offset under VACUUM_LOG_VACUUM_HEAP_MASK), then spage_compact if needed. Undo: strip the mask bits and delegate to heap_rv_redo_insert, re-inserting the record the worker removed (Ch 8). |
| RVVAC_HEAP_PAGE_VACUUM (116) | | vacuum_rv_redo_vacuum_heap_page | Whole-page replay: n_slots / reusable / all_vacuumed packed in rcv->offset. n_slots == 0 → header-only “once to none” status transition. Else per slot: negative slotid → record fully removed, spage_vacuum_slot; positive → only the insert MVCCID was vacuumed, strip OR_MVCC_FLAG_VALID_INSID \| PREV_VERSION and rebuild the record in place. Then spage_compact if needed; all_vacuumed → status to none (Ch 8). |
| RVVAC_REMOVE_OVF_INSID (117) | | vacuum_rv_redo_remove_ovf_insid | Overflow-page MVCC header: insid := MVCCID_ALL_VISIBLE, prev_version_lsa := null. The fixed-size-header asymmetry of Ch 8. |

(RVES_NOTIFY_VACUUM, the external-storage neighbor row, maps undo and redo to vacuum_rv_es_nop — an explicit no-op.)

Invariant 11-B — vacuum data is redo-only state. Every RVVAC row targeting a vacuum data page (104-107, 109) has a NULL undo function; only RVVAC_DATA_SET_LINK is undoable, and its undo is the same idempotent link-write applied to the before-image. Vacuum data is mutated exclusively by master/system threads, never rolled back record-by-record: a torn mutation is healed going forward — redo replays it, then §11.3-§11.4 reconcile. The undoable rows all target other structures (dropped-files pages, heap pages) touched inside abortable operations.

The structurally richest body is RVVAC_DATA_FINISHED_BLOCKS, which reproduces Ch 9’s two-phase mark-and-compact on one page:

// vacuum_rv_redo_data_finished -- src/query/vacuum.c
if (rcv_data_ptr != NULL)
{
while (rcv_data_ptr < (char *) rcv->data + rcv->length)
{
blockid_with_flags = *((VACUUM_LOG_BLOCKID *) rcv_data_ptr);
blockid = VACUUM_BLOCKID_WITHOUT_FLAGS (blockid_with_flags);
// ... condensed: data_index = blockid - page_unvacuumed_blockid + index_unvacuumed ...
if (VACUUM_BLOCK_STATUS_IS_VACUUMED (blockid_with_flags))
data_page->data[data_index].set_vacuumed ();
else
data_page->data[data_index].set_interrupted ();
rcv_data_ptr += sizeof (VACUUM_LOG_BLOCKID);
}
}
while (data_page->index_unvacuumed < data_page->index_free
&& data_page->data[data_page->index_unvacuumed].is_vacuumed ())
data_page->index_unvacuumed++;
if (VPID_ISNULL (&data_page->next_page) && data_page->index_unvacuumed > 0)
{
/* Remove all vacuumed blocks. */
// ... condensed: memmove survivors to front; index_free -= index_unvacuumed; index_unvacuumed = 0 ...
}

Three branches: the flag loop (VACUUMED vs interrupted, exactly the per-block outcome Ch 9 logged), the index_unvacuumed advance, and the last-page-only compaction (next_page null). The loop tolerates rcv->data == NULL — a record can carry no block list and still trigger advance + compaction.
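The three branches can be modeled on a simplified page. Everything here is an assumption made for illustration: `Entry`, `DataPage`, `redo_data_finished`, and the flag bit `kVacuumedFlag` are hypothetical, a `std::vector` replaces the fixed on-page array, and the index arithmetic follows the condensed comment in the excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int64_t kVacuumedFlag = int64_t{1} << 62;   // hypothetical status bit

struct Entry { int64_t blockid; bool vacuumed; bool interrupted; };
struct DataPage {
  std::vector<Entry> data;
  std::size_t index_unvacuumed = 0;
  std::size_t index_free = 0;
  bool has_next_page = false;
};

inline void redo_data_finished(DataPage &page, const std::vector<int64_t> &logged) {
  // Branch 1: apply the logged per-block outcome (may be an empty list).
  if (!logged.empty()) {
    const int64_t first_blockid = page.data[page.index_unvacuumed].blockid;
    for (int64_t with_flags : logged) {
      const int64_t blockid = with_flags & ~kVacuumedFlag;
      const std::size_t idx =
          static_cast<std::size_t>(blockid - first_blockid) + page.index_unvacuumed;
      assert(idx < page.index_free);
      if (with_flags & kVacuumedFlag) page.data[idx].vacuumed = true;
      else page.data[idx].interrupted = true;
    }
  }
  // Branch 2: advance past the leading run of vacuumed entries.
  while (page.index_unvacuumed < page.index_free
         && page.data[page.index_unvacuumed].vacuumed) {
    ++page.index_unvacuumed;
  }
  // Branch 3: only the last page compacts; survivors move to the front.
  if (!page.has_next_page && page.index_unvacuumed > 0) {
    page.data.erase(page.data.begin(),
                    page.data.begin() + static_cast<std::ptrdiff_t>(page.index_unvacuumed));
    page.index_free -= page.index_unvacuumed;
    page.index_unvacuumed = 0;
  }
}
```

Calling it with an empty list still runs the advance and the compaction, which matches the observation that a record without a block list is valid.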

11.3 vacuum_data_load_and_recover — reload, reset, rewind

Called from vacuum_boot before any worker exists, in both modes — after the redo pass has put every vacuum data page back into its logged state via §11.2’s functions. Figure 11-1 accounts for every branch:

flowchart TD
    A["file_descriptor_get +<br/>fix vpid_first"] -->|error| Z["goto end:<br/>unfix + unload, return error"]
    A --> B["page-walk loop:<br/>for each entry in page"]
    B --> C{"entry->is_job_in_progress?"}
    C -->|yes| D["entry->set_interrupted()<br/>page dirty"]
    C -->|no| E[next entry]
    D --> E
    E --> F{"next_page VPID null?"}
    F -->|no, fix fails| Z
    F -->|no| B
    F -->|yes| G["last_page = data_page"]
    G --> H{"vacuum_is_empty()?"}
    H -->|yes| I{"logpb_last_complete_blockid < 0?"}
    I -->|yes: fresh copydb| J["do not touch last_blockid"]
    I -->|no| K{"recovery_lsa null AND<br/>hdr.mvcc_op_log_lsa null?"}
    K -->|yes: 10.1 compat| L["set_last_blockid(log_blockid)"]
    K -->|no| M["set_last_blockid(MAX(hdr.vacuum_last_blockid,<br/>last_page->data->blockid))"]
    H -->|no| N["set_last_blockid(last entry<br/>of last_page)"]
    J --> O["is_loaded = true;<br/>update_global_oldest_visible"]
    L --> O
    M --> O
    N --> O
    O --> P["vacuum_recover_lost_block_data"]
    P -->|error| Z
    P --> Q["LSA_SET_NULL(recovery_lsa);<br/>set_oldest_unvacuumed_on_boot;<br/>update_keep_from_log_pageid"]
    Q --> R["save vpid_first/vpid_last into<br/>vacuum_Data_load; unfix pages"]

Figure 11-1: vacuum_data_load_and_recover, branch-complete.

The IN_PROGRESS reset touches Ch 1’s status flags directly:

// vacuum_data_load_and_recover -- src/query/vacuum.c
if (entry->is_job_in_progress ())
{
/* Reset in progress flag, mark the job as interrupted and update last_blockid. */
entry->set_interrupted (); /* <- STATUS_SET_AVAILABLE + SET_INTERRUPTED */
is_page_dirty = true;
}

Invariant 11-C — IN_PROGRESS never survives a restart. A job claimed by a worker that died with the server (whether the claim came from runtime state or from a replayed RVVAC_START_JOB) must be re-runnable, so the page walk rewrites it to AVAILABLE + INTERRUPTED. INTERRUPTED is what relaxes vacuum_heap_page’s safe-guards (Ch 8): the block may have been partially executed, so “record already vacuumed” is no longer an anomaly. Resetting to plain AVAILABLE instead would re-arm assertions on the second pass over half-cleaned pages.

When vacuum data is empty, last_blockid is reconstructed three ways. log_blockid < 0 means the log has not filled its first block (“one case may be soon after copydb” — pairs with §11.7). Both recovery_lsa and log_Gl.hdr.mvcc_op_log_lsa null is an explicit 10.1-compat path trusting logpb_last_complete_blockid (). The default takes MAX (log_Gl.hdr.vacuum_last_blockid, vacuum_Data.last_page->data->blockid) because after a long SA session the on-page blockid “will be outdated. Instead, SA_MODE updates log_Gl.hdr.vacuum_last_blockid before removing old archives” (§11.7). Non-empty data needs no guessing: the last registered entry is the watermark.
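The choice of last_blockid reduces to a small decision function. `BootInputs` and `choose_last_blockid` are hypothetical names; the struct simply flattens the values the paragraph mentions so that each branch can be exercised in isolation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the inputs the boot path consults.
struct BootInputs {
  bool data_is_empty;               // vacuum_is_empty ()
  int64_t log_blockid;              // logpb_last_complete_blockid ()
  bool recovery_lsa_null;
  bool mvcc_op_log_lsa_null;
  int64_t hdr_vacuum_last_blockid;  // log_Gl.hdr.vacuum_last_blockid
  int64_t last_page_blockid;        // last_page->data->blockid
  int64_t last_entry_blockid;       // last registered entry (non-empty data)
  int64_t current_last_blockid;     // value left untouched after a fresh copydb
};

inline int64_t choose_last_blockid(const BootInputs &in) {
  if (!in.data_is_empty) return in.last_entry_blockid;     // no guessing needed
  if (in.log_blockid < 0) return in.current_last_blockid;  // log has no complete block yet
  if (in.recovery_lsa_null && in.mvcc_op_log_lsa_null) {
    return in.log_blockid;                                 // 10.1-compat path
  }
  return std::max(in.hdr_vacuum_last_blockid, in.last_page_blockid);  // default
}
```

The `std::max` in the default branch is what keeps a stale on-page blockid from regressing the watermark after a long SA session.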

After lost-block recovery, vacuum_data::set_oldest_unvacuumed_on_boot seeds Ch 5’s watermark: empty data → log_Gl.hdr.oldest_visible_mvccid (first re-initialized from mvcc_next_id if no block needs vacuum); otherwise the first entry’s oldest_visible_mvccid, which lower-bounds all others. Finally — Ch 2’s ownership model — the pages cannot stay fixed by the boot thread: both VPIDs are stashed into vacuum_Data_load and unfixed via vacuum_data_unload_first_and_last_page; the master (or xvacuum) re-fixes them with vacuum_data_load_first_and_last_page.

11.4 vacuum_recover_lost_block_data — rebuilding blocks the crash swallowed

Ch 3 showed a filled block lives only in log_Gl.hdr and vacuum_Block_data_buffer until consumed (Ch 4). A crash loses both; this function reconstructs them from the WAL. Entry branches:

  1. recovery_lsa null → clean boot, return NO_ERROR.
  2. log_Gl.hdr.mvcc_op_log_lsa null → the recovered header forgot the last MVCC op; search backward from recovery_lsa for one.
  3. vacuum_get_log_blockid (mvcc_op_log_lsa.pageid) <= get_last_blockid () → already inside a registered block; logpb_vacuum_reset_log_header_cache and return.
  4. Otherwise → start from log_Gl.hdr.mvcc_op_log_lsa directly.

The branch-2 search walks record headers backward via back_lsa, bounded by stop_at_pageid = VACUUM_LAST_LOG_PAGEID_IN_BLOCK (get_last_blockid ()), with three terminators:

// vacuum_recover_lost_block_data -- src/query/vacuum.c
if (log_rec_header.type == LOG_MVCC_UNDO_DATA || log_rec_header.type == LOG_MVCC_UNDOREDO_DATA
|| log_rec_header.type == LOG_MVCC_DIFF_UNDOREDO_DATA)
{
LSA_COPY (&mvcc_op_log_lsa, &log_lsa); /* <- found the chain tail */
break;
}
else if (log_rec_header.type == LOG_SYSOP_END)
{
// ... condensed: hit if sysop_end->type == LOG_SYSOP_END_LOGICAL_MVCC_UNDO ...
}
else if (log_rec_header.type == LOG_REDO_DATA)
{
// ... condensed: break WITHOUT a hit if redo->data.rcvindex == RVVAC_COMPLETE ...
}
LSA_COPY (&log_lsa, &log_rec_header.back_lsa);

RVVAC_COMPLETE is the SA-mode “all clean” marker written by xvacuum (§11.6): anything older is vacuumed, so the search ends with mvcc_op_log_lsa still null → “nothing to recovery” → NO_ERROR. The same null check covers reaching stop_at_pageid. Any logpb_fetch_page failure here (and in the main loop) is logpb_fatal_error — recovery cannot proceed on an unreadable log.

The main loop rebuilds one VACUUM_DATA_ENTRY per block, newest to oldest, following each record’s prev_mvcc_op_log_lsa chain (the same chain workers walk in Ch 7) through vacuum_process_log_record:

// vacuum_recover_lost_block_data -- src/query/vacuum.c
std::stack<VACUUM_DATA_ENTRY> vacuum_block_data_buffer_stack;
/* we don't reset data.oldest_visible_mvccid between blocks. we need to maintain ordered
* oldest_visible_mvccid's ... */
data.oldest_visible_mvccid = MVCCID_NULL;
while (crt_blockid > vacuum_Data.get_last_blockid ())
{
// ... condensed: inner loop folds each record's mvccid into oldest/newest ...
if (data.blockid == vacuum_get_log_blockid (log_Gl.prior_info.prior_lsa.pageid))
{
/* <- the still-open block: restore header cache instead of registering */
log_Gl.hdr.oldest_visible_mvccid = data.oldest_visible_mvccid;
log_Gl.hdr.newest_block_mvccid = data.newest_mvccid;
log_Gl.hdr.does_block_need_vacuum = true;
log_Gl.hdr.mvcc_op_log_lsa = mvcc_op_log_lsa;
}
else
{
vacuum_block_data_buffer_stack.push (data);
}
crt_blockid = vacuum_get_log_blockid (log_lsa.pageid);
}

Two subtleties:

  • The still-open block is not registered. If the newest block contains the append point (log_Gl.prior_info.prior_lsa), registering it would violate Ch 4’s complete-blocks-only rule. Instead the four header fields Ch 3 maintains are restored — the block is produced when it fills, or consumed by xvacuum’s partial pass (§11.6). The header cache was reset just before the loop (“info will be restored if last block is not consumed”).
  • Blockid-sorted replay via the stack. The scan visits newest→oldest, but vacuum_consume_buffer_log_blocks (Ch 4) assumes ascending blockids. Popping the std::stack produces oldest→newest into vacuum_Block_data_buffer; the guard crt_blockid > get_last_blockid () guarantees no overlap with already-registered blocks.

Invariant 11-D — recovered blocks enter vacuum data in ascending blockid order, gap-free, strictly above last_blockid. Enforced by the stack reversal plus the loop guard. Violation corrupts Ch 1’s dense-array addressing (the verifier asserts entry->get_blockid () == (entry - 1)->get_blockid () + 1, §11.8).
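The stack reversal can be shown in isolation. `BlockEntry` and `replay_ascending` are hypothetical; the input vector stands for the blocks in the order the backward scan discovers them, and the output vector stands for vacuum_Block_data_buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <stack>
#include <vector>

struct BlockEntry { int64_t blockid; uint64_t oldest_visible_mvccid; };  // hypothetical

// newest_first: blocks as the backward log scan finds them.
// Returns them in ascending blockid order, keeping only blocks above last_blockid.
inline std::vector<BlockEntry> replay_ascending(const std::vector<BlockEntry> &newest_first,
                                                int64_t last_blockid) {
  std::stack<BlockEntry> pending;
  for (const BlockEntry &b : newest_first) {
    if (b.blockid <= last_blockid) break;   // loop guard: already registered
    pending.push(b);
  }
  std::vector<BlockEntry> buffer;
  while (!pending.empty()) {
    // Popping reverses the scan order, so blockids must come out ascending.
    assert(buffer.empty() || pending.top().blockid > buffer.back().blockid);
    buffer.push_back(pending.top());
    pending.pop();
  }
  return buffer;
}
```

For a scan that finds blocks 9, 8, 7, 6 with last_blockid 6, the buffer receives 7, 8, 9: ascending and strictly above the registered watermark.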

Not resetting data.oldest_visible_mvccid between blocks (quoted comment) re-establishes Ch 5’s monotonicity: an MVCCID active while a newer block was logged must also lower-bound older blocks. Consumption needs master identity, so the function wraps vacuum_consume_buffer_log_blocks in vacuum_convert_thread_to_master / vacuum_restore_thread — the only thread conversion outside the daemons and xvacuum.

11.5 vacuum_rv_check_at_undo — the one vacuum hook inside rollback

Heap undo recovery (heap_rv_undo_delete, heap_rv_undo_update, heap_rv_undo_ovf_update in heap_file.c) restores a record’s before-image — including an MVCC header that vacuum may since have been entitled to clean. vacuum_rv_check_at_undo rewrites the restored record to be “valid in terms of vacuuming”. Branches in order: read the header (heap_get_mvcc_rec_header_from_overflow for REC_BIGONE, else spage_get_record COPY + or_mvcc_get_header; each failure is assert_release + ER_FAILED), then decide:

// vacuum_rv_check_at_undo -- src/query/vacuum.c
if (log_is_in_crash_recovery ())
{
/* always clear flags when recovering from crash - all the objects are visible anyway */
if (MVCC_IS_FLAG_SET (&rec_header, OR_MVCC_FLAG_VALID_INSID))
can_vacuum = VACUUM_RECORD_DELETE_INSID_PREV_VER;
else
can_vacuum = VACUUM_RECORD_CANNOT_VACUUM;
}
else
{
/* <- runtime rollback: ask the real oracle, the Ch 5 watermark */
can_vacuum = mvcc_satisfies_vacuum (thread_p, &rec_header, log_Gl.mvcc_table.get_global_oldest_visible ());
}
/* it is impossible to restore a record that should be removed by vacuum */
assert (can_vacuum != VACUUM_RECORD_REMOVE);

During crash recovery every undone transaction is doomed, so all surviving versions are visible and any valid insid is flattened unconditionally; at runtime the decision defers to mvcc_satisfies_vacuum. The REMOVE assert is the safety claim: a record vacuum would delete outright requires a committed-and-old deleter, which cannot simultaneously be the uncommitted transaction being undone.

On VACUUM_RECORD_DELETE_INSID_PREV_VER: for REC_BIGONE, set insid to MVCCID_ALL_VISIBLE, null prev_version_lsa, write back via heap_set_mvcc_rec_header_on_overflow; otherwise clear OR_MVCC_FLAG_VALID_INSID | OR_MVCC_FLAG_VALID_PREV_VERSION, or_mvcc_set_header, spage_update. Both paths end with pgbuf_set_dirty; the asymmetry mirrors Ch 8 — overflow headers are fixed-size (values neutralized), heap headers shrink (flags dropped).
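A minimal sketch of the crash-recovery branch and the two rewrite paths. The flag bit values, `kAllVisible`, `kNullLsa`, `MvccHeader`, and both functions are hypothetical stand-ins; the runtime branch that consults mvcc_satisfies_vacuum is not modeled.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t kValidInsid = 0x01;        // hypothetical bit for OR_MVCC_FLAG_VALID_INSID
constexpr uint8_t kValidPrevVersion = 0x08;  // hypothetical bit for OR_MVCC_FLAG_VALID_PREV_VERSION
constexpr uint64_t kAllVisible = 3;          // hypothetical value for MVCCID_ALL_VISIBLE
constexpr int64_t kNullLsa = -1;             // hypothetical encoding of a null LSA

enum class CanVacuum { CannotVacuum, DeleteInsidPrevVer, Remove };

struct MvccHeader { uint8_t flags; uint64_t insid; int64_t prev_version_lsa; };

// Crash-recovery branch: a valid insid is always flattened.
inline CanVacuum decide_in_crash_recovery(const MvccHeader &h) {
  return (h.flags & kValidInsid) ? CanVacuum::DeleteInsidPrevVer : CanVacuum::CannotVacuum;
}

inline void apply_at_undo(MvccHeader &h, bool is_overflow) {
  const CanVacuum can = decide_in_crash_recovery(h);
  assert(can != CanVacuum::Remove);          // a restored record is never removable
  if (can != CanVacuum::DeleteInsidPrevVer) return;
  if (is_overflow) {
    // Fixed-size overflow header: neutralize the values, keep the flags.
    h.insid = kAllVisible;
    h.prev_version_lsa = kNullLsa;
  } else {
    // Heap header shrinks: drop the flags so the fields are no longer present.
    h.flags = static_cast<uint8_t>(h.flags & ~(kValidInsid | kValidPrevVersion));
  }
}
```

The two branches leave different traces: the overflow header keeps its flags with neutral values, while the heap header loses the flags and therefore the fields.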

Invariant 11-E — after undo, a record header never carries MVCC metadata that vacuum has already passed by. Enforced by this hook in all three heap undo paths. If skipped, a snapshot could chase prev_version_lsa into log pages already reclaimed (Ch 9), or mvcc_satisfies_vacuum would re-classify an already-cleaned record. (tde.c deliberately formats keyinfo records to dodge this rewrite — see its “HACK” comments.)

11.6 SA mode: xvacuum, vacuum_sa_run_job, and the partial block

In SA mode there is no daemon; xvacuum compresses the whole lifecycle into one synchronous call (under SERVER_MODE it returns ER_VACUUM_CS_NOT_AVAILABLE). Figure 11-2:

flowchart TD
    A{"PRM_ID_DISABLE_VACUUM or<br/>is_vacuum_complete?"} -->|yes| B[return NO_ERROR]
    A -->|no| C["convert thread to master;<br/>load first/last pages;<br/>cursor.set_on_vacuum_data_start + load"]
    C --> D{"Block_data_buffer<br/>not empty?"}
    D -->|yes| E[cursor.force_data_update]
    D -->|no| F{cursor.is_valid?}
    E --> F
    F -->|yes| G{"logtb_is_interrupted?"}
    G -->|yes| H["cursor.unload; vacuum_Data.update;<br/>return NO_ERROR"]
    G -->|no| I{"entry.is_available?"}
    I -->|yes| J["start_job_on_current_entry;<br/>vacuum_sa_run_job(entry, false)"]
    I -->|no: vacuumed| K[skip]
    J --> L["increment_blockid"]
    K --> L
    L --> M{"new block, finished queue full,<br/>or cursor exhausted?"}
    M -->|yes| N[cursor.force_data_update]
    M -->|no| F
    N --> F
    F -->|no: data empty| O{"hdr.does_block_need_vacuum?"}
    O -->|yes| P["build partial_entry from log_Gl.hdr;<br/>disable interrupt;<br/>vacuum_sa_run_job(entry, true)"]
    O -->|no| Q["oldest_unvacuumed = mvcc_next_id;<br/>log RVVAC_COMPLETE; flush;<br/>cleanup dropped files; reset hdr cache;<br/>vacuum_finalize; is_vacuum_complete = true"]
    P --> Q

Figure 11-2: xvacuum, branch-complete. The interrupt exit leaves vacuum data consistent for the next invocation.

The loop is Ch 6’s master loop re-implemented inline with the same vacuum_job_cursor, except jobs run immediately in this thread. vacuum_sa_run_job performs the double conversion:

// vacuum_sa_run_job -- src/query/vacuum.c
VACUUM_WORKER *worker_p = vacuum_Worker_entry_manager->claim_worker ();
thread_type save_type = thread_type::TT_NONE;
vacuum_convert_thread_to_worker (thread_p, worker_p, save_type);
assert (save_type == thread_type::TT_VACUUM_MASTER); /* <- caller must be master */
VACUUM_DATA_ENTRY copy_data_entry = data_entry; /* <- worker mutates its copy */
vacuum_process_log_block (thread_p, &copy_data_entry, is_partial);
vacuum_convert_thread_to_master (thread_p, save_type);
// ... condensed: retire_worker, perf tracking ...

The is_partial == true call is unique to SA mode: when the cursor is exhausted but log_Gl.hdr.does_block_need_vacuum is set (the still-open block of Ch 3, possibly restored by §11.4), xvacuum builds an entry straight from the header (vacuum_data_entry::vacuum_data_entry (const log_header &)) and runs it with interrupts off (logtb_set_check_interrupt (thread_p, false) — the header flag was already cleared, so an abort would lose the block). Inside vacuum_process_log_block, sa_mode_partial_block flips three things:

  1. No prefetch: vacuum_log_prefetch_vacuum_block is skipped: “block is not entirely logged and we cannot prefetch it”.
  2. Forced interrupted semantics: was_interrupted = data->was_interrupted () || sa_mode_partial_block; because interruptions are usually marked in the blockid, but a partial block carries no flag.
  3. No completion bookkeeping: at end:, vacuum_finished_block_vacuum (Ch 9) runs only if (!sa_mode_partial_block); a partial block was never in vacuum data, so there is nothing to mark.
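The three switches above reduce to a small truth table. The sketch below is a simplified model under stated assumptions: `block_plan` and `plan_block` are hypothetical names, and a regular block is assumed to always prefetch, which ignores any other conditions the real code may apply.

```cpp
#include <cassert>

// Hypothetical summary of the decisions made for one block.
struct block_plan
{
  bool prefetch;         // read the block's log pages up front
  bool was_interrupted;  // tolerate pages that are already partly vacuumed
  bool mark_finished;    // run completion bookkeeping at the end
};

inline block_plan plan_block (bool entry_was_interrupted, bool sa_mode_partial_block)
{
  block_plan p;
  p.prefetch = !sa_mode_partial_block;                                 // item 1
  p.was_interrupted = entry_was_interrupted || sa_mode_partial_block;  // item 2
  p.mark_finished = !sa_mode_partial_block;                            // item 3
  return p;
}
```

For a partial block every field is forced regardless of the entry's own interrupted flag; for a regular block only the entry's flag matters.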

After the partial pass, xvacuum declares total victory: vacuum_Data.oldest_unvacuumed_mvccid = log_Gl.hdr.mvcc_next_id, then logs RVVAC_COMPLETE (carrying mvcc_next_id as redo data) against the first vacuum data page and force-flushes. That record is exactly §11.4’s backward-search terminator — it certifies no unvacuumed MVCC op exists at or before this LSA. vacuum_cleanup_dropped_files (Ch 10), logpb_vacuum_reset_log_header_cache, vacuum_finalize, and is_vacuum_complete = true (making a second xvacuum a no-op) close the pass.

11.7 SA bookkeeping: vacuum_sa_reflect_last_blockid and vacuum_reset_data_after_copydb

A long SA session consumes log without registering blocks; on shutdown (xboot_shutdown_server, SA-only block in boot_sr.c), vacuum_sa_reflect_last_blockid persists the watermark so the next boot’s empty-data branch (§11.3) does not regress it. Early returns: VPID_ISNULL (&vacuum_Data_load.vpid_first) — fresh or aborted boot; vacuum_Data.is_restoredb_session — “restoredb doesn’t vacuum; we cannot do this here” (the flag is vacuum_initialize’s is_restore parameter); logpb_last_complete_blockid () == VACUUM_NULL_LOG_BLOCKID — unload and return. Otherwise:

```cpp
// vacuum_sa_reflect_last_blockid -- src/query/vacuum.c
vacuum_Data.set_last_blockid (last_blockid);
log_Gl.hdr.vacuum_last_blockid = last_blockid; /* <- the MAX() source in section 11.3 */
vacuum_data_empty_update_last_blockid (thread_p); /* <- persists into the empty first page */
```

vacuum_data_empty_update_last_blockid asserts vacuum_is_empty () (single page, index_unvacuumed == index_free, both zero) and rewrites the page via vacuum_init_data_page_with_last_blockid, which logs RVVAC_DATA_INIT_NEW_PAGE redo data (§11.2) — so the persisted watermark itself survives a crash.
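The early returns of vacuum_sa_reflect_last_blockid form a guard chain that can be written as a pure decision function. This is a sketch only: `reflect_action`, `decide_reflect`, and `NULL_BLOCKID_MODEL` are hypothetical names, the null blockid is assumed to be a negative sentinel, and the guards are evaluated in the order the prose lists them, which may differ from the source.

```cpp
#include <cassert>

// Assumption: VACUUM_NULL_LOG_BLOCKID is modelled as a negative sentinel.
constexpr long long NULL_BLOCKID_MODEL = -1;

enum class reflect_action
{
  skip_data_not_loaded,    // VPID_ISNULL (&vacuum_Data_load.vpid_first)
  skip_restoredb,          // vacuum_Data.is_restoredb_session
  skip_no_complete_block,  // logpb_last_complete_blockid () is null
  persist_last_blockid     // fall through to the three assignments
};

inline reflect_action decide_reflect (bool first_vpid_is_null, bool is_restoredb_session,
                                      long long last_complete_blockid)
{
  if (first_vpid_is_null)
    {
      return reflect_action::skip_data_not_loaded;
    }
  if (is_restoredb_session)
    {
      return reflect_action::skip_restoredb;
    }
  if (last_complete_blockid == NULL_BLOCKID_MODEL)
    {
      return reflect_action::skip_no_complete_block;
    }
  return reflect_action::persist_last_blockid;
}
```

Only the last outcome reaches the persisting path; the other three leave the stored watermark untouched.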

vacuum_reset_data_after_copydb handles the other identity discontinuity: a copied database carries vacuum data whose blockids reference the source log. On first boot after copydb (boot_after_copydb, gated on log_Gl.hdr.was_copied), it fixes the first page, asserts emptiness (VPID_ISNULL (next_page), index_free == 0 — copydb requires a fully vacuumed source), and reinitializes with vacuum_init_data_page_with_last_blockid (..., VACUUM_NULL_LOG_BLOCKID). That null watermark later trips the log_blockid < 0 “soon after copydb” branch in vacuum_data_load_and_recover.

11.8 The debug verifiers — the document’s invariant catalogue

vacuum_verify_vacuum_data_debug (under !NDEBUG, reached through the VACUUM_VERIFY_VACUUM_DATA macro at the end of vacuum_data_mark_finished and of vacuum_consume_buffer_log_blocks) walks every page and asserts, in effect, every structural claim this document has made:

| Assert (condensed) | Invariant restated | Chapter |
|---|---|---|
| (first_page == last_page) == VPID_ISNULL (first_page->next_page) | page chain has no dangling tail | Ch 1 |
| 0 <= index_unvacuumed <= index_free <= page_data_max_count | per-page cursor sanity | Ch 1 |
| is_vacuumed () ==> i != index_unvacuumed | index_unvacuumed always names a live entry | Ch 9 |
| entry->oldest_visible_mvccid <= get_global_oldest_visible () | no block claims an oldest above the live watermark | Ch 5 |
| oldest_unvacuumed_mvccid <= entry->oldest_visible_mvccid | boot/update watermark lower-bounds all entries | Ch 5, §11.3 |
| entry->get_blockid () <= get_last_blockid () | m_last_blockid upper-bounds registered blocks | Ch 4 |
| vacuum_get_log_blockid (start_lsa.pageid) == get_blockid () | start_lsa lies inside its own block | Ch 3 |
| ascending oldest_visible_mvccid across unvacuumed entries | monotone oldest chain (why §11.4 never resets it) | Ch 5 |
| entry->get_blockid () == (entry - 1)->get_blockid () + 1 | dense, gap-free blockid array | Ch 1, 11-D |
| in_progress_distance > 500 → warning only | job-leak heuristic, “vacuum is behind or blocked” | Ch 6 |

The last row was once an assertion, demoted to vacuum_er_log_warning (“It was an assertion but we have not seen a case that vacuum is blocked”) — far-behind IN_PROGRESS entries indicate leaked jobs, not corruption.
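A few of the per-page rows can be exercised on a toy model. The sketch below is an illustration under assumptions, not the verifier itself: `entry_model`, `page_model`, and `page_invariants_hold` are hypothetical, only four of the ten rows are covered (cursor sanity, live first unvacuumed slot, dense blockids, ascending oldest-visible across unvacuumed entries), and "ascending" is taken as non-strict.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct entry_model
{
  long long blockid;
  unsigned long long oldest_visible_mvccid;
  bool vacuumed;
};

struct page_model
{
  std::vector<entry_model> data;  // slots [0, index_free) are populated
  std::size_t index_unvacuumed;
  std::size_t index_free;
  std::size_t max_count;
};

inline bool page_invariants_hold (const page_model &p)
{
  // Cursor sanity: index_unvacuumed <= index_free <= max_count.
  if (p.index_unvacuumed > p.index_free || p.index_free > p.max_count
      || p.data.size () < p.index_free)
    {
      return false;
    }
  bool have_prev_unvacuumed = false;
  unsigned long long prev_oldest = 0;
  for (std::size_t i = p.index_unvacuumed; i < p.index_free; i++)
    {
      const entry_model &e = p.data[i];
      if (e.vacuumed && i == p.index_unvacuumed)
        {
          return false;  // index_unvacuumed must name a live entry
        }
      if (i > p.index_unvacuumed && e.blockid != p.data[i - 1].blockid + 1)
        {
          return false;  // blockids must be dense
        }
      if (!e.vacuumed)
        {
          if (have_prev_unvacuumed && e.oldest_visible_mvccid < prev_oldest)
            {
              return false;  // oldest-visible must not decrease
            }
          have_prev_unvacuumed = true;
          prev_oldest = e.oldest_visible_mvccid;
        }
    }
  return true;
}
```

A page holding blocks 10, 11, 12 passes; changing the last blockid to 14 or lowering its oldest-visible value below an earlier unvacuumed entry makes the check fail.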

Its companion vacuum_verify_vacuum_data_page_fix_count checks Ch 2’s fix discipline at five quiescent points (end of vacuum_data_load_and_recover, end of the master task loop, mid-xvacuum, and right after each VACUUM_VERIFY_VACUUM_DATA site above): first and last page each at fix count exactly 1, and pgbuf_get_hold_count exactly 1 (single-page data) or 2 — anything else means a leaked fix in Ch 4/6/9’s page hand-offs.
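The fix-count rule is compact enough to state as a predicate. This is a sketch with hypothetical names (`fix_snapshot`, `fix_discipline_holds`); it models the counts as plain integers and does not call CUBRID's page buffer API.

```cpp
#include <cassert>

// Hypothetical snapshot of what the check inspects at a quiescent point.
struct fix_snapshot
{
  bool single_page;       // first page and last page are the same page
  int first_fix_count;
  int last_fix_count;
  int thread_hold_count;  // pages currently held by the calling thread
};

inline bool fix_discipline_holds (const fix_snapshot &s)
{
  // One held page when the data is a single page, otherwise first plus last.
  const int expected_hold = s.single_page ? 1 : 2;
  return s.first_fix_count == 1 && s.last_fix_count == 1
         && s.thread_hold_count == expected_hold;
}
```

A hold count of 3 with multi-page data, for example, fails the predicate and corresponds to the leaked-fix case described above.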

Finally xvacuum_dump (the utility entry point) is the observability twin: it prints vacuum_min_log_pageid_to_keep (Ch 9’s reclamation floor) and resolves whether that page lives in the active log or which archive (logpb_is_page_in_archive / logpb_get_archive_number). The degenerate branches do not fail — vacuum_Is_booted false or a NULL_PAGEID floor print “vacuum did not boot properly”, a negative archive number prints a bare newline — since it runs against possibly-broken servers.

  1. Vacuum’s crash-recovery contract with ARIES is a single LSA: vacuum_notify_server_crashed records the recovery start before analysis; vacuum_data_load_and_recover consumes and clears it at boot (Invariant 11-A).
  2. Fourteen RVVAC_* indices (104-117) in RV_fun replay every page-level vacuum mutation during the ordinary redo pass; the rows targeting vacuum data pages are redo-only — vacuum data is never rolled back, only reconciled forward at boot (Invariant 11-B). Dispatch itself is the recovery manager’s job (cubrid-recovery-manager-detail.md).
  3. IN_PROGRESS is volatile: the boot page-walk rewrites it (including any state recreated by RVVAC_START_JOB redo) to AVAILABLE + INTERRUPTED, which tells re-execution to tolerate half-vacuumed pages (Invariant 11-C).
  4. vacuum_recover_lost_block_data rebuilds blocks that existed only in the WAL: backward search to the chain tail (terminated by RVVAC_COMPLETE or registered data), per-block chain walks, a std::stack restoring ascending-blockid order (Invariant 11-D), and header-cache restoration — not registration — for the still-open block.
  5. vacuum_rv_check_at_undo is the only vacuum logic inside rollback: it flattens insid/prev-version metadata on restored records — unconditionally in crash recovery, watermark-driven at runtime — so undo never resurrects vacuum-skipped headers (Invariant 11-E).
  6. SA mode replays the lifecycle synchronously in xvacuum (per-job worker conversion in vacuum_sa_run_job; an uninterruptible partial-block pass that skips prefetch, forces interrupted semantics, and skips completion bookkeeping; RVVAC_COMPLETE certifies the log fully vacuumed), while vacuum_sa_reflect_last_blockid and vacuum_reset_data_after_copydb protect the watermark across shutdown and copydb — and the §11.8 verifiers are the executable regression checklist for all of it.
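Item 4's reordering step is a plain use of std::stack. The sketch below assumes that the backward walk meets lost blocks newest-first, as the item implies, and that registration must see them oldest-first; `restore_ascending` is a hypothetical helper and blocks are reduced to their blockids.

```cpp
#include <cassert>
#include <stack>
#include <vector>

// Blocks are pushed in discovery order (newest first) and popped in
// registration order (oldest first).
inline std::vector<long long> restore_ascending (const std::vector<long long> &found_newest_first)
{
  std::stack<long long> pending;
  for (long long blockid : found_newest_first)
    {
      pending.push (blockid);
    }
  std::vector<long long> registration_order;
  while (!pending.empty ())
    {
      registration_order.push_back (pending.top ());
      pending.pop ();
    }
  return registration_order;
}
```

Feeding the helper blockids 9, 8, 7 yields 7, 8, 9, the order in which a dense ascending array (Invariant 11-D) can be appended.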

The following are line numbers as observed on 2026-06-17; symbols are the canonical anchor and line numbers are hints that decay.

| Symbol | File | Line |
|---|---|---|
| resource_shared_pool | src/base/resource_shared_pool.hpp | 29 |
| VACUUM_FIRST_LOG_PAGEID_IN_BLOCK | src/query/vacuum.c | 81 |
| VACUUM_LAST_LOG_PAGEID_IN_BLOCK | src/query/vacuum.c | 84 |
| vacuum_data_entry | src/query/vacuum.c | 104 |
| VACUUM_DATA_ENTRY_FLAG_MASK | src/query/vacuum.c | 135 |
| VACUUM_DATA_ENTRY_BLOCKID_MASK | src/query/vacuum.c | 137 |
| VACUUM_BLOCK_STATUS_MASK | src/query/vacuum.c | 141 |
| VACUUM_BLOCK_FLAG_INTERRUPTED | src/query/vacuum.c | 146 |
| VACUUM_BLOCKID_WITHOUT_FLAGS | src/query/vacuum.c | 150 |
| vacuum_data_page | src/query/vacuum.c | 194 |
| VACUUM_DATA_PAGE_HEADER_SIZE | src/query/vacuum.c | 212 |
| vacuum_fix_data_page | src/query/vacuum.c | 223 |
| vacuum_unfix_data_page | src/query/vacuum.c | 236 |
| vacuum_unfix_first_and_last_data_page | src/query/vacuum.c | 255 |
| vacuum_job_cursor | src/query/vacuum.c | 277 |
| vacuum_shutdown_sequence | src/query/vacuum.c | 319 |
| vacuum_data | src/query/vacuum.c | 350 |
| oldest_unvacuumed_mvccid | src/query/vacuum.c | 356 |
| vacuum_set_dirty_data_page | src/query/vacuum.c | 423 |
| vacuum_data_load | src/query/vacuum.c | 442 |
| vacuum_Data_load | src/query/vacuum.c | 447 |
| vacuum_Master | src/query/vacuum.c | 456 |
| vacuum_Block_data_buffer | src/query/vacuum.c | 467 |
| VACUUM_BLOCK_DATA_BUFFER_CAPACITY | src/query/vacuum.c | 469 |
| vacuum_Finished_job_queue | src/query/vacuum.c | 475 |
| VACUUM_PREFETCH_LOG_BLOCK_BUFFER_PAGES | src/query/vacuum.c | 479 |
| VACUUM_MAX_TASKS_IN_WORKER_POOL | src/query/vacuum.c | 482 |
| VACUUM_FINISHED_JOB_QUEUE_CAPACITY | src/query/vacuum.c | 485 |
| VACUUM_WORKER_INDEX_TO_TRANID | src/query/vacuum.c | 490 |
| vacuum_Workers | src/query/vacuum.c | 498 |
| vacuum_heap_helper | src/query/vacuum.c | 504 |
| VACUUM_DEFAULT_HEAP_OBJECT_BUFFER_SIZE | src/query/vacuum.c | 561 |
| vacuum_Dropped_files_loaded | src/query/vacuum.c | 567 |
| vacuum_Dropped_files_count | src/query/vacuum.c | 576 |
| vacuum_dropped_file | src/query/vacuum.c | 580 |
| vacuum_dropped_files_page | src/query/vacuum.c | 588 |
| VACUUM_DROPPED_FILES_PAGE_CAPACITY | src/query/vacuum.c | 602 |
| VACUUM_DROPPED_FILE_FLAG_DUPLICATE | src/query/vacuum.c | 610 |
| vacuum_track_dropped_files | src/query/vacuum.c | 640 |
| vacuum_Track_dropped_files | src/query/vacuum.c | 645 |
| vacuum_Dropped_files_version | src/query/vacuum.c | 650 |
| vacuum_Last_dropped_vfid | src/query/vacuum.c | 652 |
| vacuum_dropped_files_rcv_data | src/query/vacuum.c | 655 |
| vacuum_Is_booted | src/query/vacuum.c | 661 |
| vacuum_init_thread_context | src/query/vacuum.c | 766 |
| vacuum_master_entry_manager | src/query/vacuum.c | 783 |
| vacuum_master_task | src/query/vacuum.c | 813 |
| m_oldest_visible_mvccid | src/query/vacuum.c | 834 |
| vacuum_worker_entry_manager | src/query/vacuum.c | 843 |
| vacuum_worker_task | src/query/vacuum.c | 916 |
| vacuum_sa_run_job | src/query/vacuum.c | 949 |
| xvacuum | src/query/vacuum.c | 979 |
| xvacuum_dump | src/query/vacuum.c | 1121 |
| vacuum_initialize | src/query/vacuum.c | 1180 |
| vacuum_boot | src/query/vacuum.c | 1291 |
| vacuum_stop_workers | src/query/vacuum.c | 1363 |
| vacuum_stop_master | src/query/vacuum.c | 1390 |
| vacuum_finalize | src/query/vacuum.c | 1416 |
| vacuum_heap | src/query/vacuum.c | 1494 |
| vacuum_heap_page | src/query/vacuum.c | 1577 |
| vacuum_heap_prepare_record | src/query/vacuum.c | 1925 |
| vacuum_heap_record_insid_and_prev_version | src/query/vacuum.c | 2195 |
| vacuum_heap_record | src/query/vacuum.c | 2361 |
| vacuum_heap_get_hfid_and_file_type | src/query/vacuum.c | 2513 |
| vacuum_heap_page_log_and_reset | src/query/vacuum.c | 2587 |
| vacuum_log_vacuum_heap_page | src/query/vacuum.c | 2651 |
| vacuum_rv_redo_vacuum_heap_page | src/query/vacuum.c | 2720 |
| vacuum_log_remove_ovf_insid | src/query/vacuum.c | 2856 |
| vacuum_rv_redo_remove_ovf_insid | src/query/vacuum.c | 2869 |
| vacuum_produce_log_block_data | src/query/vacuum.c | 2905 |
| vacuum_data_load_first_and_last_page | src/query/vacuum.c | 2948 |
| vacuum_data_unload_first_and_last_page | src/query/vacuum.c | 2979 |
| vacuum_master_task::execute | src/query/vacuum.c | 3002 |
| vacuum_master_task::check_shutdown | src/query/vacuum.c | 3077 |
| vacuum_master_task::is_task_queue_full | src/query/vacuum.c | 3089 |
| vacuum_master_task::should_interrupt_iteration | src/query/vacuum.c | 3100 |
| vacuum_master_task::is_cursor_entry_ready_to_vacuum | src/query/vacuum.c | 3106 |
| vacuum_master_task::is_cursor_entry_available | src/query/vacuum.c | 3136 |
| vacuum_master_task::start_job_on_cursor_entry | src/query/vacuum.c | 3155 |
| vacuum_master_task::should_force_data_update | src/query/vacuum.c | 3165 |
| vacuum_master_task::decrease_outstanding_job | src/query/vacuum.c | 3188 |
| vacuum_rv_redo_vacuum_complete | src/query/vacuum.c | 3221 |
| vacuum_process_log_block | src/query/vacuum.c | 3251 |
| vacuum_worker_allocate_resources | src/query/vacuum.c | 3620 |
| vacuum_finalize_worker | src/query/vacuum.c | 3689 |
| vacuum_finished_block_vacuum | src/query/vacuum.c | 3724 |
| vacuum_read_log_aligned | src/query/vacuum.c | 3797 |
| vacuum_read_log_add_aligned | src/query/vacuum.c | 3823 |
| vacuum_read_advance_when_doesnt_fit | src/query/vacuum.c | 3838 |
| vacuum_copy_data_from_log | src/query/vacuum.c | 3859 |
| vacuum_process_log_record | src/query/vacuum.c | 3906 |
| vacuum_get_worker_min_dropped_files_version | src/query/vacuum.c | 4135 |
| vacuum_compare_blockids | src/query/vacuum.c | 4166 |
| vacuum_data_load_and_recover | src/query/vacuum.c | 4183 |
| vacuum_load_dropped_files_from_disk | src/query/vacuum.c | 4349 |
| vacuum_create_file_for_vacuum_data | src/query/vacuum.c | 4445 |
| vacuum_data_initialize_new_page | src/query/vacuum.c | 4498 |
| vacuum_rv_redo_initialize_data_page | src/query/vacuum.c | 4520 |
| vacuum_create_file_for_dropped_files | src/query/vacuum.c | 4544 |
| vacuum_is_work_in_progress | src/query/vacuum.c | 4594 |
| vacuum_data_mark_finished | src/query/vacuum.c | 4621 |
| vacuum_data_empty_page | src/query/vacuum.c | 4832 |
| vacuum_rv_redo_data_finished | src/query/vacuum.c | 4986 |
| vacuum_rv_redo_data_finished_dump | src/query/vacuum.c | 5055 |
| vacuum_consume_buffer_log_blocks | src/query/vacuum.c | 5096 |
| vacuum_rv_undoredo_data_set_link | src/query/vacuum.c | 5361 |
| vacuum_rv_redo_append_data | src/query/vacuum.c | 5411 |
| vacuum_recover_lost_block_data | src/query/vacuum.c | 5465 |
| vacuum_get_log_blockid | src/query/vacuum.c | 5702 |
| vacuum_min_log_pageid_to_keep | src/query/vacuum.c | 5722 |
| vacuum_is_safe_to_remove_archives | src/query/vacuum.c | 5747 |
| vacuum_rv_redo_start_job | src/query/vacuum.c | 5760 |
| vacuum_update_keep_from_log_pageid | src/query/vacuum.c | 5782 |
| vacuum_compare_dropped_files | src/query/vacuum.c | 5820 |
| vacuum_add_dropped_file | src/query/vacuum.c | 5846 |
| vacuum_log_add_dropped_file | src/query/vacuum.c | 6121 |
| vacuum_rv_redo_add_dropped_file | src/query/vacuum.c | 6167 |
| vacuum_rv_undo_add_dropped_file | src/query/vacuum.c | 6235 |
| vacuum_rv_replace_dropped_file | src/query/vacuum.c | 6269 |
| vacuum_notify_all_workers_dropped_file | src/query/vacuum.c | 6335 |
| vacuum_rv_notify_dropped_file | src/query/vacuum.c | 6391 |
| vacuum_cleanup_dropped_files | src/query/vacuum.c | 6438 |
| vacuum_is_file_dropped | src/query/vacuum.c | 6587 |
| vacuum_find_dropped_file | src/query/vacuum.c | 6609 |
| vacuum_log_cleanup_dropped_files | src/query/vacuum.c | 6719 |
| vacuum_rv_redo_cleanup_dropped_files | src/query/vacuum.c | 6754 |
| vacuum_dropped_files_set_next_page | src/query/vacuum.c | 6809 |
| vacuum_rv_set_next_page_dropped_files | src/query/vacuum.c | 6834 |
| vacuum_compare_heap_object | src/query/vacuum.c | 6862 |
| vacuum_collect_heap_objects | src/query/vacuum.c | 6912 |
| vacuum_cleanup_collected_by_vfid | src/query/vacuum.c | 6955 |
| vacuum_compare_dropped_files_version | src/query/vacuum.c | 6999 |
| vacuum_verify_vacuum_data_debug | src/query/vacuum.c | 7060 |
| vacuum_log_prefetch_vacuum_block | src/query/vacuum.c | 7165 |
| vacuum_fetch_log_page | src/query/vacuum.c | 7215 |
| is_not_vacuumed_and_lost | src/query/vacuum.c | 7379 |
| vacuum_get_first_page_dropped_files | src/query/vacuum.c | 7449 |
| vacuum_is_mvccid_vacuumed | src/query/vacuum.c | 7463 |
| vacuum_log_redoundo_vacuum_record | src/query/vacuum.c | 7486 |
| vacuum_rv_undo_vacuum_heap_record | src/query/vacuum.c | 7524 |
| vacuum_rv_redo_vacuum_heap_record | src/query/vacuum.c | 7539 |
| vacuum_notify_server_crashed | src/query/vacuum.c | 7570 |
| vacuum_notify_server_shutdown | src/query/vacuum.c | 7582 |
| vacuum_verify_vacuum_data_page_fix_count | src/query/vacuum.c | 7595 |
| vacuum_rv_check_at_undo | src/query/vacuum.c | 7627 |
| vacuum_is_empty | src/query/vacuum.c | 7731 |
| vacuum_sa_reflect_last_blockid | src/query/vacuum.c | 7749 |
| vacuum_data_empty_update_last_blockid | src/query/vacuum.c | 7783 |
| vacuum_convert_thread_to_master | src/query/vacuum.c | 7807 |
| vacuum_convert_thread_to_worker | src/query/vacuum.c | 7830 |
| vacuum_restore_thread | src/query/vacuum.c | 7856 |
| vacuum_rv_es_nop | src/query/vacuum.c | 7876 |
| vacuum_notify_es_deleted | src/query/vacuum.c | 7895 |
| vacuum_check_shutdown_interruption | src/query/vacuum.c | 7936 |
| vacuum_reset_data_after_copydb | src/query/vacuum.c | 7951 |
| vacuum_init_data_page_with_last_blockid | src/query/vacuum.c | 7989 |
| vacuum_data::get_last_blockid | src/query/vacuum.c | 8019 |
| vacuum_data::get_first_blockid | src/query/vacuum.c | 8025 |
| vacuum_data::set_last_blockid | src/query/vacuum.c | 8042 |
| vacuum_data::update | src/query/vacuum.c | 8058 |
| vacuum_data::set_oldest_unvacuumed_on_boot | src/query/vacuum.c | 8089 |
| vacuum_data::upgrade_oldest_unvacuumed | src/query/vacuum.c | 8110 |
| vacuum_data_entry::vacuum_data_entry | src/query/vacuum.c | 8119 |
| vacuum_data_entry::was_interrupted | src/query/vacuum.c | 8162 |
| vacuum_data_entry::set_vacuumed | src/query/vacuum.c | 8168 |
| vacuum_data_entry::set_job_in_progress | src/query/vacuum.c | 8175 |
| vacuum_data_entry::set_interrupted | src/query/vacuum.c | 8181 |
| vacuum_data_page::get_index_of_blockid | src/query/vacuum.c | 8203 |
| vacuum_data_page::get_first_blockid | src/query/vacuum.c | 8226 |
| vacuum_job_cursor::start_job_on_current_entry | src/query/vacuum.c | 8295 |
| vacuum_job_cursor::force_data_update | src/query/vacuum.c | 8314 |
| vacuum_job_cursor::change_blockid | src/query/vacuum.c | 8327 |
| vacuum_job_cursor::readjust_to_vacuum_data_changes | src/query/vacuum.c | 8363 |
| vacuum_job_cursor::search | src/query/vacuum.c | 8424 |
| vacuum_shutdown_sequence::request_shutdown | src/query/vacuum.c | 8465 |
| vacuum_shutdown_sequence::check_shutdown_request | src/query/vacuum.c | 8498 |
| VACUUM_LOG_ADD_DROPPED_FILE_POSTPONE | src/query/vacuum.h | 78 |
| VACUUM_LOG_BLOCK_PAGES_DEFAULT | src/query/vacuum.h | 82 |
| vacuum_worker_state | src/query/vacuum.h | 85 |
| vacuum_heap_object | src/query/vacuum.h | 98 |
| vacuum_worker | src/query/vacuum.h | 106 |
| VACUUM_WORKER | src/query/vacuum.h | 106 |
| drop_files_version | src/query/vacuum.h | 109 |
| VACUUM_MAX_WORKER_COUNT | src/query/vacuum.h | 132 |
| btree_prepare_bts | src/storage/btree.c | 15753 |
| btree_rv_read_keybuf_nocopy | src/storage/btree.c | 18391 |
| btree_rv_read_keybuf_two_objects | src/storage/btree.c | 18455 |
| btree_vacuum_insert_mvccid | src/storage/btree.c | 30304 |
| btree_vacuum_object | src/storage/btree.c | 30336 |
| HEAP_RV_FLAG_VACUUM_STATUS_CHANGE | src/storage/heap_file.c | 514 |
| xheap_reclaim_addresses | src/storage/heap_file.c | 6227 |
| heap_rv_undo_delete | src/storage/heap_file.c | 16946 |
| heap_rv_undo_update | src/storage/heap_file.c | 16981 |
| heap_page_update_chain_after_mvcc_op | src/storage/heap_file.c | 24785 |
| heap_rv_remove_flags_from_offset | src/storage/heap_file.c | 25085 |
| heap_rv_undo_ovf_update | src/storage/heap_file.c | 26059 |
| spage_vacuum_slot | src/storage/slotted_page.c | 4857 |
| MVCCID_ALL_VISIBLE | src/storage/storage_common.h | 329 |
| entry::claim_system_worker | src/thread/thread_entry.cpp | 425 |
| entry::retire_system_worker | src/thread/thread_entry.cpp | 433 |
| xboot_shutdown_server | src/transaction/boot_sr.c | 3044 |
| xboot_emergency_patch | src/transaction/boot_sr.c | 5292 |
| boot_after_copydb | src/transaction/boot_sr.c | 6154 |
| xlocator_upgrade_instances_domain | src/transaction/locator_sr.c | 12126 |
| redistribute_partition_data | src/transaction/locator_sr.c | 12692 |
| prior_update_header_mvcc_info | src/transaction/log_append.cpp | 1320 |
| prior_lsa_next_record_internal | src/transaction/log_append.cpp | 1357 |
| prior_lsa_next_record | src/transaction/log_append.cpp | 1553 |
| prior_lsa_next_record_with_lock | src/transaction/log_append.cpp | 1559 |
| prior_lsa_start_append | src/transaction/log_append.cpp | 1593 |
| prior_lsa_end_append | src/transaction/log_append.cpp | 1652 |
| log_prior_node | src/transaction/log_append.hpp | 91 |
| VACUUM_NULL_LOG_BLOCKID | src/transaction/log_common_impl.h | 54 |
| LOG_SYSTEM_WORKER_FIRST_TRANID | src/transaction/log_impl.h | 185 |
| LOG_ISRESTARTED | src/transaction/log_impl.h | 193 |
| block_global_oldest_active_until_commit | src/transaction/log_impl.h | 555 |
| LOG_RESTARTED | src/transaction/log_impl.h | 627 |
| log_complete | src/transaction/log_manager.c | 5653 |
| log_complete_for_2pc | src/transaction/log_manager.c | 5758 |
| log_remove_log_archive_daemon_task::execute | src/transaction/log_manager.c | 10243 |
| logpb_remove_archive_logs_exceed_limit | src/transaction/log_page_buffer.c | 5991 |
| logpb_remove_archive_logs | src/transaction/log_page_buffer.c | 6213 |
| logpb_backup | src/transaction/log_page_buffer.c | 7593 |
| log_vacuum_info | src/transaction/log_record.hpp | 192 |
| log_rec_mvcc_undoredo | src/transaction/log_record.hpp | 202 |
| log_rec_mvcc_undo | src/transaction/log_record.hpp | 211 |
| log_rec_sysop_end | src/transaction/log_record.hpp | 305 |
| log_recovery | src/transaction/log_recovery.c | 736 |
| VACUUM_LOG_BLOCKID | src/transaction/log_storage.hpp | 91 |
| log_header::vacuum_last_blockid | src/transaction/log_storage.hpp | 153 |
| log_header::mvcc_op_log_lsa | src/transaction/log_storage.hpp | 166 |
| log_header::oldest_visible_mvccid | src/transaction/log_storage.hpp | 167 |
| log_header::newest_block_mvccid | src/transaction/log_storage.hpp | 168 |
| log_header::does_block_need_vacuum | src/transaction/log_storage.hpp | 173 |
| systdes_claim_tdes | src/transaction/log_system_tran.cpp | 78 |
| log_system_tdes::log_system_tdes | src/transaction/log_system_tran.cpp | 104 |
| log_tdes::lock_global_oldest_visible_mvccid | src/transaction/log_tran_table.c | 6220 |
| MVCC_IS_REC_INSERTED_SINCE_MVCCID | src/transaction/mvcc.c | 58 |
| MVCC_IS_REC_DELETED_SINCE_MVCCID | src/transaction/mvcc.c | 61 |
| mvcc_satisfies_vacuum | src/transaction/mvcc.c | 321 |
| MVCC_IS_HEADER_DELID_VALID | src/transaction/mvcc.h | 87 |
| MVCC_IS_HEADER_INSID_NOT_ALL_VISIBLE | src/transaction/mvcc.h | 91 |
| mvcc_satisfies_vacuum_result | src/transaction/mvcc.h | 232 |
| LOG_IS_MVCC_HEAP_OPERATION | src/transaction/mvcc.h | 245 |
| LOG_IS_MVCC_BTREE_OPERATION | src/transaction/mvcc.h | 254 |
| Oldest_active_tracker | src/transaction/mvcc_table.cpp | 77 |
| mvcctable::advance_oldest_active | src/transaction/mvcc_table.cpp | 142 |
| mvcctable::build_mvcc_info | src/transaction/mvcc_table.cpp | 226 |
| mvcctable::compute_oldest_visible_mvccid | src/transaction/mvcc_table.cpp | 355 |
| mvcctable::complete_mvcc | src/transaction/mvcc_table.cpp | 465 |
| mvcctable::reset_transaction_lowest_active | src/transaction/mvcc_table.cpp | 593 |
| mvcctable::get_global_oldest_visible | src/transaction/mvcc_table.cpp | 611 |
| mvcctable::update_global_oldest_visible | src/transaction/mvcc_table.cpp | 617 |
| mvcctable::lock_global_oldest_visible | src/transaction/mvcc_table.cpp | 632 |
| mvcctable::unlock_global_oldest_visible | src/transaction/mvcc_table.cpp | 638 |
| mvcctable | src/transaction/mvcc_table.hpp | 64 |
| m_oldest_visible | src/transaction/mvcc_table.hpp | 118 |
| m_ov_lock_count | src/transaction/mvcc_table.hpp | 119 |
| RV_fun (RVVAC_* rows) | src/transaction/recovery.c | 687 |
| RVVAC_* enum values | src/transaction/recovery.h | 156 |
| RVVAC_START_JOB | src/transaction/recovery.h | 157 |

  • cubrid-vacuum.md — the high-level companion. See also cubrid-mvcc-detail.md (the oldest-visible watermark vacuum consumes).
  • Raw analyses under raw/code-analysis/cubrid/storage/vacuum/.
  • Code: src/query/vacuum.{c,h}; watermark coordination in src/transaction/mvcc_table.cpp.
  • Methodology: knowledge/methodology/code-analysis-detail-doc.md.