CUBRID compactdb — Offline Database Compaction and Page Defragmentation Utility

Contents:

Theoretical Background
Common DBMS Design
CUBRID’s Approach
Source Walkthrough
Source verification (as of 2026-05-01)
Beyond CUBRID — Comparative Designs & Research Frontiers
Sources

Theoretical Background

CUBRID has two garbage collectors. The online vacuum subsystem (cubrid-vacuum.md) walks the WAL forward and clears dead MVCC versions in place. Vacuum reclaims slots inside pages but does not reclaim pages from heaps, does not delete obsolete class representations, does not defragment slotted-page free space, and never NULLs out OID columns that have come to point at non-existent objects. Those four problems are the offline compactor’s job.

The accumulated waste compactdb addresses:

Dangling OID references. A column or set element of type DB_TYPE_OID may reference an OID whose target has been deleted and vacuumed. The reference stays in the referrer’s row until somebody rewrites it.
Empty heap pages. A heap page whose every slot has been freed is still a page in the file. The disk manager does not reclaim it; the buffer pool still tracks it; sequential scans still touch it.
Internal slotted-page fragmentation. Even non-empty pages may have free space scattered across the slot array, blocking inserts that need a contiguous run of bytes.
Obsolete class representations. Every ALTER TABLE that changes columns produces a new class representation; the catalog keeps every representation that any heap row still references.

The textbook frame is physical reorganization — every engine that supports row-level deletes ships some form of it. Two design choices shape every concrete implementation:

Online vs offline. Online runs against a live database with fine-grained locks but cannot move objects to lower OIDs; offline can rewrite addresses and rebuild indexes but blocks the workload.
What about the OIDs? Moving a row invalidates every index pointing at it and every column holding its OID. PostgreSQL VACUUM FULL rewrites the heap with new TIDs and rebuilds every index; Oracle ALTER TABLE MOVE rewrites ROWIDs and marks indexes UNUSABLE; InnoDB OPTIMIZE TABLE rebuilds the clustered index. CUBRID picks a third path: it does not relocate user rows, only sweeps empty pages and rewrites references. Every existing OID stays valid, at the cost of giving up physical clustering.

Common DBMS Design

Every relational engine ships some form of offline (or near-offline) compactor; the shapes converge on a small set of recipes.

Postgres VACUUM FULL / CLUSTER rewrites the entire heap into a new file and rebuilds every index from scratch. New heap, no dead tuples, no fragmentation. Cost: AccessExclusiveLock for the duration plus disk space for both copies. CLUSTER adds index-order sorting. Both assign new TIDs to every live tuple.

MySQL InnoDB OPTIMIZE TABLE rebuilds the clustered index (which is the table), and therefore every secondary index, into a fresh tablespace. Online DDL replays a row log to keep readers and concurrent writes alive during most of the operation; the final swap still needs a brief metadata lock.

Oracle ALTER TABLE … MOVE moves a segment to a new (or the same) tablespace. New ROWIDs invalidate every index, which must be rebuilt with ALTER INDEX … REBUILD. Oracle’s online redefinition package DBMS_REDEFINITION (which predates 12c) keeps the table available, and 12c added online variants of MOVE; the offline form is still common in maintenance windows.

Where CUBRID sits. Compactdb explicitly preserves OIDs of surviving objects. It does not relocate rows into a new file; it walks heap rows to NULL out dangling OID references (Pass 1), reclaims empty heap pages without moving surviving rows (Pass 2), and defragments slotted-page free space (Pass 3). Because OIDs stay stable, no index has to be rebuilt and no foreign-OID column that points at a live object has to be rewritten, a sharp departure from the Postgres / Oracle / InnoDB pattern. The flip side is that compactdb cannot bring related rows physically together; that requires a true table rewrite (CUBRID leaves it to user-driven CREATE TABLE AS SELECT plus rename). Locking is per class and per pass: Pass 1 takes an IX_LOCK on the class and an X_LOCK on each instance it rewrites, upgrading to an X_LOCK on the class only to drop old representations, while Pass 2 takes an IX_LOCK on the root class and a SCH-M lock on the class being reclaimed. Locks are released at iteration boundaries, so the utility is “near-offline”: other connections may keep working against other classes, but access to the class under compaction is blocked whenever an exclusive or schema lock is held.

CUBRID’s Approach

The compactor is split between a client-side driver (compactdb_cl.c, compactdb.c, compactdb_common.c under src/executables/) and a server-side worker (compactdb_sr.c under src/storage/). The client opens a normal DB session as user DBA, runs three numbered passes against the chosen classes, and shuts the session down. The server-side boot_compact_* functions enforce a single-instance guard so two compactdb runs cannot interleave.

flowchart TD
  A[compactdb CLI<br/>parse args] --> B[db_login DBA<br/>db_restart database]
  B --> C[compactdb_start<br/>resolve class list]
  C --> D[compact_db_start<br/>server guard via CSECT_COMPACTDB_ONE_INSTANCE]
  D --> E[Pass 1<br/>boot_compact_classes loop]
  E -->|per iteration| E1[server: boot_compact_db<br/>walk class heaps]
  E1 -->|process_object<br/>X_LOCK each instance| E2[NULL dangling OID refs<br/>locator_attribute_info_force]
  E1 -->|delete_old_repr| E3[catalog_drop_old_representations]
  E -->|next class| E
  E --> F[Pass 2<br/>do_reclaim_addresses]
  F --> F1[per class: SCH-M lock<br/>check no other class points to it]
  F1 --> F2[heap_reclaim_addresses HFID<br/>free empty pages]
  F --> G[Pass 3<br/>boot_heap_compact loop]
  G --> G1[heap_compact_pages HFID<br/>defragment slotted pages]
  G --> H[catalog_reclaim_space<br/>file_tracker_reclaim_marked_deleted<br/>only in standalone-only branch]
  H --> I[compact_db_stop<br/>db_shutdown]

The numbering on the passes is from the source itself — the messages printed at each phase are COMPACTDB_MSG_PASS1, _PASS2, _PASS3.

Pass 1 — fix dangling OID references and drop old representations

The driver in compactdb_cl.c::compactdb_start calls boot_compact_classes in a loop, each call processing up to max_processed_space bytes (pages * DB_PAGESIZE, pages ∈ [1, 20]). When the budget is exhausted the call returns; the driver commits and calls again. This bounded-work loop matches the online vacuum’s pattern: keep each transaction short so the server can release locks at iteration boundaries.

The pair (last_processed_class_oid, last_processed_oid) is the resumable cursor: those two OIDs name the exact class and instance the next call must start from. Commit and re-enter to advance; abort (e.g. ER_LK_UNILATERALLY_ABORTED from the lock manager) to retry the same window.
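The bounded-work loop and its resumable cursor can be sketched with a toy model. Everything below is hypothetical (the toy_ names, the array-of-row-sizes stand-in for a heap, the counter standing in for a commit); it only illustrates how a byte budget and a (class, row) cursor let each call stop and the next one resume at the same spot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: a "class" is just an array of row sizes in bytes. */
typedef struct
{
  const int *row_sizes;
  int n_rows;
} toy_class;

/* Stand-in for (last_processed_class_oid, last_processed_oid). */
typedef struct
{
  int class_idx;   /* class to resume in */
  int row_idx;     /* next row inside that class; 0 means a class boundary */
} toy_cursor;

/* Process rows until the byte budget is spent or every class is done.
 * Returns 1 when work remains, 0 when finished.  The cursor is advanced
 * in place so the next call resumes exactly where this one stopped. */
int
toy_compact_step (const toy_class *classes, int n_classes, int budget_bytes,
                  toy_cursor *cur, int *rows_done)
{
  while (cur->class_idx < n_classes)
    {
      const toy_class *c = &classes[cur->class_idx];
      while (cur->row_idx < c->n_rows)
        {
          if (budget_bytes <= 0)
            return 1;   /* budget exhausted: caller commits and re-enters */
          budget_bytes -= c->row_sizes[cur->row_idx];
          cur->row_idx++;
          (*rows_done)++;
        }
      cur->class_idx++;
      cur->row_idx = 0;
    }
  return 0;
}

/* Driver: one "transaction" per call.  Returns the number of commits. */
int
toy_compact_all (const toy_class *classes, int n_classes, int budget_bytes,
                 int *rows_done)
{
  toy_cursor cur = { 0, 0 };
  int commits = 0;
  int more;

  assert (budget_bytes > 0);   /* a non-positive budget would never progress */
  do
    {
      more = toy_compact_step (classes, n_classes, budget_bytes, &cur, rows_done);
      commits++;   /* stand-in for the commit at the iteration boundary */
    }
  while (more);
  return commits;
}
```

With two classes of three and two 100-byte rows and a 200-byte budget, this driver needs three calls: the cursor crosses the class boundary in the middle of the second one. A retry after an abort would simply re-enter with the cursor left unchanged.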

On the server side, boot_compact_db finds the start class via OID_EQ (class_oids + start_index, last_processed_class_oid) and iterates:

// boot_compact_db — src/storage/compactdb_sr.c (condensed)
for (i = start_index; i < n_classes; i++)
  {
    lock_ret =
      lock_object_wait_msecs (thread_p, class_oids + i, oid_Root_class_oid, IX_LOCK, LK_UNCOND_LOCK,
                              class_lock_timeout);
    if (lock_ret != LK_GRANTED)
      {
        total_objects[i] = COMPACTDB_LOCKED_CLASS;
        OID_SET_NULL (last_processed_oid);
        continue;
      }
    if (OID_ISNULL (last_processed_oid))
      initial_last_repr_id[i] = heap_get_class_repr_id (thread_p, class_oids + i);

    if (process_class (thread_p, class_oids + i, &hfid, max_space_to_process, &instance_lock_timeout,
                       &space_to_process, last_processed_oid, total_objects + i, failed_objects + i,
                       modified_objects + i, big_objects + i) != NO_ERROR) { /* rollback */ }

    if (delete_old_repr && OID_ISNULL (last_processed_oid) && failed_objects[i] == 0
        && heap_get_class_repr_id (thread_p, class_oids + i) == initial_last_repr_id[i])
      {
        /* upgrade IX_LOCK -> X_LOCK; catalog_drop_old_representations; mark COMPACTDB_REPR_DELETED */
      }
    if (space_to_process == 0) break;
  }

Three invariants govern the class-level loop:

Initial representation snapshot. Before processing class i, the server records initial_last_repr_id[i]. After processing, and before dropping old representations, it re-reads the representation ID. If a concurrent ALTER TABLE slipped in and changed it (across the IX_LOCK boundary), the drop is skipped, because the heap may then still hold rows stored under a representation that is no longer the latest one.
Repr drop needs an X_LOCK. The IX_LOCK is upgraded to X_LOCK on the class root before catalog_drop_old_representations. On failure the class is left untouched.
Lock-acquire failure is non-fatal. A class that cannot be IX-locked within class_lock_timeout is marked COMPACTDB_LOCKED_CLASS and skipped, to be retried on the next iteration.
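The gate in front of the representation drop can be restated as a single predicate. The sketch below mirrors the condition in the condensed boot_compact_db excerpt; the struct and the toy_ names are hypothetical and exist only to make the four conjuncts explicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the loop knows about one class. */
typedef struct
{
  int initial_repr_id;   /* recorded before the first batch of the class */
  int current_repr_id;   /* re-read after the last batch */
  int failed_objects;    /* rows that could not be processed */
  bool cursor_is_null;   /* true once the whole class has been walked */
} toy_class_state;

/* Old representations may be dropped only when the option is on, the class
 * was walked to the end, no row failed, and no ALTER changed the latest
 * representation while the class was being processed. */
bool
toy_may_drop_old_reprs (bool delete_old_repr, const toy_class_state *s)
{
  assert (s != NULL);
  return delete_old_repr
         && s->cursor_is_null
         && s->failed_objects == 0
         && s->current_repr_id == s->initial_repr_id;
}
```

Any single failing conjunct leaves the old representations in place, which is the conservative outcome: keeping an unused representation costs catalog space, while dropping one that a row still needs would make that row undecodable.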

process_class then fetches instances via xlocator_lock_and_fetch_all and calls process_object per row:

// process_object — src/storage/compactdb_sr.c (condensed)
scan_code = locator_lock_and_get_object (thread_p, oid, &upd_scancache->node.class_oid, &copy_recdes, upd_scancache,
                                         X_LOCK, COPY, NULL_CHN, LOG_WARNING_IF_DELETED);
for (i = 0, value = attr_info->values; i < attr_info->num_values; i++, value++)
  {
    error_code = process_value (thread_p, &value->dbvalue);
    if (error_code > 0)
      {
        value->state = HEAP_WRITTEN_ATTRVALUE;
        atts_id[updated_n_attrs_id++] = value->attrid;
      }
  }
if (updated_n_attrs_id > 0 || /* representation drift */)
  locator_attribute_info_force (thread_p, &upd_scancache->node.hfid, oid, attr_info, atts_id, updated_n_attrs_id,
                                LC_FLUSH_UPDATE, SINGLE_ROW_UPDATE, upd_scancache, &force_count, false,
                                REPL_INFO_TYPE_RBR_NORMAL, DB_NOT_PARTITIONED_CLASS, NULL, NULL, NULL,
                                UPDATE_INPLACE_NONE, &copy_recdes, false);

process_value is the heart of Pass 1:

// process_value — src/storage/compactdb_sr.c (condensed)
case DB_TYPE_OID:
  {
    OID *ref_oid = db_get_oid (value);
    if (OID_ISNULL (ref_oid)) break;
    heap_scancache_quick_start (&scan_cache);
    scan_cache.mvcc_snapshot = logtb_get_mvcc_snapshot (thread_p);
    scan_code = heap_get_visible_version (thread_p, ref_oid, &ref_class_oid, NULL, &scan_cache, PEEK, NULL_CHN);
    heap_scancache_end (thread_p, &scan_cache);
    if (scan_code != S_SUCCESS)
      {
        OID_SET_NULL (ref_oid);
        return_value = 1;          /* mark "this attribute changed" */
      }
    break;
  }
case DB_TYPE_SET: case DB_TYPE_MULTISET: case DB_TYPE_SEQUENCE:
  return_value = process_set (thread_p, db_get_set (value));

Visibility is checked via heap_get_visible_version against a fresh MVCC snapshot. If the target has no visible version, the attribute is NULL-ed and marked written; for set-types the server recurses through process_set. Any positive return from process_value marks the attribute as written, and a row with at least one such attribute is written back through locator_attribute_info_force, with B+Tree maintenance and replication hooks.

UPDATE_INPLACE_NONE is significant: Pass 1 generates a normal MVCC update (new version, old version visible-until-vacuumed), not an in-place rewrite. So Pass 1 leaves dead versions for the online vacuum to clean later — and a mid-pass crash recovers like any other WAL-logged update.
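The value walk of Pass 1 can be illustrated with a small self-contained model. The types and the visibility callback below are hypothetical stand-ins for DB_VALUE and heap_get_visible_version. NULL-ing a dead set element in place (rather than removing it) is an assumption of this sketch, not a statement about process_set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NULL_OID (-1)

typedef enum { TOY_INT, TOY_OID, TOY_SET } toy_type;

typedef struct toy_value
{
  toy_type type;
  int oid;                   /* TOY_OID: target id, TOY_NULL_OID when NULL */
  struct toy_value *elems;   /* TOY_SET: element array */
  size_t n_elems;
} toy_value;

/* Hypothetical visibility oracle: true if the target still has a visible version. */
typedef bool (*toy_visible_fn) (int oid);

/* Returns 1 if the value (or any nested element) was changed, else 0. */
int
toy_process_value (toy_value *v, toy_visible_fn is_visible)
{
  int changed = 0;
  size_t i;

  assert (v != NULL && is_visible != NULL);
  switch (v->type)
    {
    case TOY_OID:
      if (v->oid != TOY_NULL_OID && !is_visible (v->oid))
        {
          v->oid = TOY_NULL_OID;   /* dangling reference: NULL it out */
          changed = 1;
        }
      break;
    case TOY_SET:
      for (i = 0; i < v->n_elems; i++)
        {
          if (toy_process_value (&v->elems[i], is_visible) > 0)
            changed = 1;
        }
      break;
    default:
      break;   /* non-reference types are never touched */
    }
  return changed;
}

/* Example oracle for experiments: only even ids are "alive". */
bool
toy_even_is_visible (int oid)
{
  return oid % 2 == 0;
}
```

Running the function twice over the same value is a cheap sanity check: the second call should report no change, because every dangling reference was already NULL-ed by the first.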

Per-iteration error policy: NO_ERROR → commit and continue; ER_LK_UNILATERALLY_ABORTED → tran_abort_only_client, treat cursor as valid, continue; ER_FAILED → abort and exit. Per-class totals (total_objects, failed_objects, modified_objects, big_objects) are accumulated and surfaced via show_statistics.
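The per-iteration policy is a three-way dispatch, sketched below. The enum values are hypothetical stand-ins for NO_ERROR, ER_LK_UNILATERALLY_ABORTED and ER_FAILED (their numeric values here are not CUBRID's), and treating any other status as fatal is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three statuses named in the text. */
typedef enum { TOY_NO_ERROR, TOY_LK_UNILATERALLY_ABORTED, TOY_FAILED } toy_status;
typedef enum { TOY_COMMIT_CONTINUE, TOY_ABORT_CONTINUE, TOY_ABORT_EXIT } toy_action;

toy_action
toy_policy (toy_status st)
{
  switch (st)
    {
    case TOY_NO_ERROR:
      return TOY_COMMIT_CONTINUE;
    case TOY_LK_UNILATERALLY_ABORTED:
      return TOY_ABORT_CONTINUE;   /* cursor stays valid, same window is retried */
    case TOY_FAILED:
      return TOY_ABORT_EXIT;
    }
  return TOY_ABORT_EXIT;   /* assumption: anything unexpected is fatal */
}

/* Replays a sequence of iteration results, counting commits and aborts.
 * Returns 1 if every iteration was consumed, 0 if the loop exited early. */
int
toy_run (const toy_status *results, int n, int *commits, int *aborts)
{
  int i;

  assert (results != NULL && commits != NULL && aborts != NULL);
  for (i = 0; i < n; i++)
    {
      switch (toy_policy (results[i]))
        {
        case TOY_COMMIT_CONTINUE:
          (*commits)++;
          break;
        case TOY_ABORT_CONTINUE:
          (*aborts)++;
          break;
        case TOY_ABORT_EXIT:
          (*aborts)++;
          return 0;
        }
    }
  return 1;
}
```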

Pass 2 — reclaim empty heap pages

Once Pass 1 has fixed every dangling OID, the driver runs do_reclaim_addresses. Per class:

// do_reclaim_class_addresses — src/executables/compactdb_cl.c (condensed)
db_set_isolation (TRAN_READ_COMMITTED);
locator_fetch_class (sm_Root_class_mop, DB_FETCH_QUERY_WRITE);     /* IX_LOCK on root */
class_ = locator_fetch_class (class_mop, DB_FETCH_WRITE);          /* SCH-M lock */
locator_flush_all_instances (class_mop, DECACHE);

/* reachability analysis */
if (class_->flags & SM_CLASSFLAG_SYSTEM)         can_reclaim_addresses = false;
else if (class_->flags & SM_CLASSFLAG_REUSE_OID) can_reclaim_addresses = true;
else
  {
    lmops = locator_get_all_class_mops (DB_FETCH_CLREAD_INSTREAD, is_not_system_class);
    class_instances_can_be_referenced (class_mop, parent_mop, &class_can_be_referenced,
                                       any_class_can_be_referenced, lmops->mops, lmops->num);
    can_reclaim_addresses = !class_can_be_referenced && !*any_class_can_be_referenced;
  }

if (can_reclaim_addresses)
  heap_reclaim_addresses (hfid);

Reachability is the load-bearing safety check. class_referenced_by_class walks every other class and class_referenced_by_domain inspects each attribute’s domain:

// class_referenced_by_domain — src/executables/compactdb_cl.c (condensed)
if (type == DB_TYPE_OBJECT)
  {
    DB_OBJECT *class_ = db_domain_class (crt_domain);
    if (class_ == NULL)
      *any_class_can_be_referenced = true;     /* "object" wildcard */
    else if (referenced_class == class_ || db_is_subclass (referenced_class, class_) > 0)
      *class_can_be_referenced = true;
  }
else if (pr_is_set_type (type))
  class_referenced_by_domain (referenced_class, db_domain_set (crt_domain),
                              class_can_be_referenced, any_class_can_be_referenced);

The two booleans encode three states. any_class_can_be_referenced is set when any schema attribute has unconstrained OBJECT domain (“any object” wildcard); once true it is sticky for the rest of the run, disabling Pass 2 globally. class_can_be_referenced flags a domain that specifically includes the current class, one of its superclasses, or a partition parent. Both false → safe → call heap_reclaim_addresses. Two flag-driven shortcuts: SM_CLASSFLAG_REUSE_OID classes can always be reclaimed (nothing is allowed to hold OID references to them); SM_CLASSFLAG_SYSTEM classes are always skipped (the reachability check does not include them).
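A compact model of the decision, with the sticky wildcard made explicit, might look like the following. All names are hypothetical. Domains are reduced to integer class ids, and the subclass and partition-parent cases are deliberately left out to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ANY_CLASS (-1)         /* unconstrained OBJECT domain */
#define TOY_NOT_A_REFERENCE (-2)   /* any non-object domain */

enum { TOY_FLAG_SYSTEM = 1, TOY_FLAG_REUSE_OID = 2 };

typedef struct
{
  bool class_can_be_referenced;       /* recomputed for every class */
  bool any_class_can_be_referenced;   /* sticky: never cleared once set */
} toy_reach;

/* Scan every attribute domain in the schema on behalf of one target class. */
void
toy_scan_domains (int target_class, const int *domains, size_t n, toy_reach *r)
{
  size_t i;

  assert (r != NULL && (domains != NULL || n == 0));
  r->class_can_be_referenced = false;
  for (i = 0; i < n; i++)
    {
      if (domains[i] == TOY_ANY_CLASS)
        r->any_class_can_be_referenced = true;
      else if (domains[i] == target_class)
        r->class_can_be_referenced = true;
    }
}

/* Same ordering as the condensed excerpt: system flag, then reuse-OID flag,
 * then the reachability result. */
bool
toy_can_reclaim (unsigned flags, const toy_reach *r)
{
  assert (r != NULL);
  if (flags & TOY_FLAG_SYSTEM)
    return false;
  if (flags & TOY_FLAG_REUSE_OID)
    return true;
  return !r->class_can_be_referenced && !r->any_class_can_be_referenced;
}
```

Because the caller reuses one toy_reach across classes, a wildcard domain seen while analysing one class keeps blocking every class analysed afterwards, which is the global cliff the text describes.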

xheap_reclaim_addresses (server-side, in heap_file.c) walks the heap file and frees pages whose every slot is empty. It does not move surviving rows. Its precondition is that every OID anyone might dereference is still valid — Pass 1 makes that true.
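The "free empty pages, move nothing" rule can be shown on a toy heap. The layout below (a fixed number of boolean slots per page, an allocated flag standing in for returning the page to the file) is entirely hypothetical and ignores details such as a heap header page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS_PER_PAGE 4

typedef struct
{
  bool allocated;                       /* false once the page is given back */
  bool slot_used[TOY_SLOTS_PER_PAGE];   /* true where a live row sits */
} toy_page;

/* Frees every allocated page whose slots are all empty and returns how many
 * pages were freed.  Live rows keep their (page, slot) address. */
size_t
toy_reclaim_empty_pages (toy_page *pages, size_t n_pages)
{
  size_t freed = 0;
  size_t p, s;

  assert (pages != NULL || n_pages == 0);
  for (p = 0; p < n_pages; p++)
    {
      bool any_used = false;

      if (!pages[p].allocated)
        continue;
      for (s = 0; s < TOY_SLOTS_PER_PAGE; s++)
        {
          if (pages[p].slot_used[s])
            {
              any_used = true;
              break;
            }
        }
      if (!any_used)
        {
          pages[p].allocated = false;
          freed++;
        }
    }
  return freed;
}
```

A page with even one live row is left alone, so a heap with one survivor per page reclaims nothing; that is the price of keeping every surviving OID stable.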

Pass 3 — defragment slotted pages

The third pass is boot_heap_compact per class, which calls boot_heap_compact_pages on the server, which calls heap_compact_pages (thread_p, class_oid) in heap_file.c. The work per page is the standard slotted-page compaction: re-pack live records to the start of the free space region, update the slot table, and bring the per-page free-space tracker into sync. No row is moved between pages; only the free-space layout inside each page changes.

// boot_heap_compact_pages — src/storage/compactdb_sr.c
int
boot_heap_compact_pages (THREAD_ENTRY * thread_p, OID * class_oid)
{
  if (boot_can_compact (thread_p) == false)
    {
      return ER_COMPACTDB_ALREADY_STARTED;
    }
  return heap_compact_pages (thread_p, class_oid);
}

Each per-class call commits its own transaction. As with Pass 1 and Pass 2, the loop tolerates ER_LK_UNILATERALLY_ABORTED by calling tran_abort_only_client and continuing.
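The per-page work of Pass 3 is ordinary slotted-page compaction. A minimal sketch, assuming a hypothetical 64-byte page with a fixed slot table (not CUBRID's page layout), packs live records to the front through a scratch buffer and rewrites only the slot offsets; slot numbers, and therefore OIDs, do not change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64
#define TOY_MAX_SLOTS 8
#define TOY_FREE_SLOT (-1)

typedef struct
{
  int offset;   /* start of the record in data[], TOY_FREE_SLOT if empty */
  int length;   /* record length in bytes */
} toy_slot;

typedef struct
{
  unsigned char data[TOY_PAGE_SIZE];
  toy_slot slots[TOY_MAX_SLOTS];
} toy_slotted_page;

/* Packs every live record to the start of the page, in slot order, and
 * returns the size of the single contiguous free run left at the end. */
int
toy_compact_page (toy_slotted_page *pg)
{
  unsigned char packed[TOY_PAGE_SIZE];
  int write_pos = 0;
  int i;

  assert (pg != NULL);
  for (i = 0; i < TOY_MAX_SLOTS; i++)
    {
      toy_slot *s = &pg->slots[i];

      if (s->offset == TOY_FREE_SLOT)
        continue;
      assert (s->length >= 0 && write_pos + s->length <= TOY_PAGE_SIZE);
      memcpy (packed + write_pos, pg->data + s->offset, (size_t) s->length);
      s->offset = write_pos;   /* the slot id stays, only the offset moves */
      write_pos += s->length;
    }
  memcpy (pg->data, packed, (size_t) write_pos);
  return TOY_PAGE_SIZE - write_pos;
}
```

Before compaction the free bytes may be split into runs too small for a new record; afterwards the whole free total is available as one run, which is what unblocks inserts that need contiguous space.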

Single-instance guard and restart safety

boot_compact_start and boot_compact_stop (in compactdb_sr.c) guard the whole utility under a critical section CSECT_COMPACTDB_ONE_INSTANCE. The state is two file-scope variables:

// boot_compact_start / boot_compact_stop — src/storage/compactdb_sr.c
static bool compact_started = false;
static int last_tran_index = -1;

int
boot_compact_start (THREAD_ENTRY * thread_p)
{
  int current_tran_index;

  if (csect_enter (thread_p, CSECT_COMPACTDB_ONE_INSTANCE, INF_WAIT) != NO_ERROR)
    return ER_FAILED;

  current_tran_index = LOG_FIND_THREAD_TRAN_INDEX (thread_p);
  if (current_tran_index != last_tran_index && compact_started == true)
    {
      csect_exit (thread_p, CSECT_COMPACTDB_ONE_INSTANCE);
      return ER_COMPACTDB_ALREADY_STARTED;
    }
  last_tran_index = current_tran_index;
  compact_started = true;
  csect_exit (thread_p, CSECT_COMPACTDB_ONE_INSTANCE);
  return NO_ERROR;
}

The semantics: only one transaction at a time may be the active compactdb session. If a second compactdb tries to start while the first is still alive, it gets ER_COMPACTDB_ALREADY_STARTED. If the first compactdb’s process dies mid-pass, compact_started stays true and last_tran_index keeps the dead session’s index, so a later start succeeds only if it arrives on that same transaction index (for example, after the index has been reused for a new client); a start from any other index is refused (see “Cross-check Notes” for the subtlety this introduces).
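The guard's behaviour, including the stale-owner case, can be replayed with a single-threaded model. The start function follows the excerpt; the stop function is an assumption (boot_compact_stop is not shown here), modelled as resetting the state only for the owning transaction index. The critical section is omitted because the toy has no concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical copy of the two file-scope variables. */
typedef struct
{
  bool compact_started;
  int last_tran_index;
} toy_guard;

#define TOY_GUARD_INIT { false, -1 }

/* Returns true if this transaction index may act as the compactdb session. */
bool
toy_guard_start (toy_guard *g, int tran_index)
{
  assert (g != NULL);
  if (g->compact_started && tran_index != g->last_tran_index)
    return false;   /* plays the role of ER_COMPACTDB_ALREADY_STARTED */
  g->last_tran_index = tran_index;
  g->compact_started = true;
  return true;
}

/* Assumed behaviour: only the owner may stop; stopping resets both fields. */
bool
toy_guard_stop (toy_guard *g, int tran_index)
{
  assert (g != NULL);
  if (g->compact_started && tran_index != g->last_tran_index)
    return false;
  g->last_tran_index = -1;
  g->compact_started = false;
  return true;
}
```

If the owner never reaches the stop call, the state stays at (true, owner index) and every start from a different index keeps failing, which is the fragility discussed in the cross-check notes.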

Because every iteration of every pass commits its own transaction, a crash mid-run is a clean restart point. After recovery, the operator can simply start compactdb again; Pass 1’s cursor will restart from a class-OID boundary, Pass 2 / Pass 3 are class-by-class and will repeat any class whose work was not committed. There is no compactdb-specific recovery path, only the ordinary WAL recovery the engine runs at boot.

Two compactdb implementations

The repo has two compactdb translation units, both defining int compactdb (UTIL_FUNCTION_ARG * arg):

src/executables/compactdb_cl.c: the client/server form described above, built for CS_MODE and invoked by the cubrid compactdb CLI. Connects to a running server, holds class locks, drives the three passes via boot_compact_classes / boot_heap_compact / do_reclaim_addresses.
src/executables/compactdb.c: the standalone form (SA_MODE), which links the engine in-process and walks the heap directly via locator_fetch_all. Gated by PRM_ID_COMPACTDB_PAGE_RECLAIM_ONLY. Its Pass 3 additionally calls catalog_reclaim_space and file_tracker_reclaim_marked_deleted, which require that no other process holds the database open.

The two forms share compactdb_common.c for class-list resolution but implement the passes independently with slightly different invariants.

Tunables

The CLI knobs (parsed in compactdb_cl.c::compactdb): --pages-commited-once (clamped [1, 20], multiplied by DB_PAGESIZE for the Pass 1 byte budget), --instance-lock-timeout and --class-lock-timeout (each clamped [1, 10] seconds, multiplied by 1000 for lock_object_wait_msecs), --delete-old-repr (enables Pass 1’s representation drop), --input-class-file (file of class names, mutually exclusive with command-line names), --standby-cs-mode (switches client type to DB_CLIENT_TYPE_ADMIN_COMPACTDB_WOS for HA standby nodes).
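The arithmetic behind the numeric knobs is small enough to spell out. The helper names below are hypothetical; the ranges [1, 20] and [1, 10] and the multipliers come from the text, and the page size is passed in as a parameter rather than read from DB_PAGESIZE.

```c
#include <assert.h>

/* Clamp v into [lo, hi]. */
int
toy_clamp (int v, int lo, int hi)
{
  assert (lo <= hi);
  if (v < lo)
    return lo;
  if (v > hi)
    return hi;
  return v;
}

/* Pass 1 byte budget: pages clamped to [1, 20], times the page size. */
long
toy_pass1_budget_bytes (int pages, int page_size)
{
  assert (page_size > 0);
  return (long) toy_clamp (pages, 1, 20) * page_size;
}

/* Lock timeouts: seconds clamped to [1, 10], converted to milliseconds. */
int
toy_lock_timeout_msecs (int seconds)
{
  return toy_clamp (seconds, 1, 10) * 1000;
}
```

With a 16384-byte page, the budget therefore ranges from 16384 bytes (1 page) to 327680 bytes (20 pages) per iteration, and either timeout ranges from 1000 to 10000 milliseconds.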

The standalone form additionally consults PRM_ID_COMPACTDB_PAGE_RECLAIM_ONLY: 0 = all three passes, 1 = skip Pass 1 (page reclaim only), 2 = skip Pass 1 and Pass 2 (catalog and tracker-deleted-file reclaim only).

Source Walkthrough

The compactor’s symbol surface, grouped by responsibility. Anchor on symbol names; the position table in the next section pins each one to a (file, line) pair valid as of that section’s verification date.

CLI driver. compactdb is the entry point in both compactdb.c (standalone) and compactdb_cl.c (client/server); each parses CLI args, calls db_login / db_restart, then compactdb_start. The client/server compactdb_start runs the three-pass loop between compact_db_start and compact_db_stop; the standalone compactdb_start follows a goto-driven phase1 / phase2 / phase3 flow gated by PRM_ID_COMPACTDB_PAGE_RECLAIM_ONLY. Helpers: compactdb_usage / compact_usage (print help via the message catalog), show_statistics (per-class summary), get_name_from_class_oid, find_oid.

Class list resolution. Shared between both CLIs in compactdb_common.c: get_class_mops resolves names through locator_find_class (handling owner qualifiers and case-folding), get_class_mops_from_file reads names line-by-line from a file, get_num_requested_class sizes the array.

Pass 2 — reachability and page reclaim (client). In compactdb_cl.c: do_reclaim_addresses is the per-class loop; do_reclaim_class_addresses switches to TRAN_READ_COMMITTED, takes SCH-M lock, runs reachability, calls heap_reclaim_addresses. The reachability walk is class_instances_can_be_referenced → class_referenced_by_class → class_referenced_by_attributes → class_referenced_by_domain, with is_not_system_class as the filter for locator_get_all_class_mops.

Server-side worker (compactdb_sr.c). boot_compact_db is the Pass 1 entry. It calls process_class (heap-instance fetch loop), which calls process_object (lock the row, walk attributes, force-write on change), which calls process_value (examines each DB_VALUE; on DB_TYPE_OID, checks visibility and NULLs the reference if dead; on set-types, recurses through process_set). desc_disk_to_attr_info converts a RECDES to a HEAP_CACHE_ATTRINFO; is_class is the predicate that excludes class objects from rewrite. boot_heap_compact_pages is the Pass 3 server entry; boot_compact_start / boot_compact_stop / boot_can_compact form the single-instance guard around CSECT_COMPACTDB_ONE_INSTANCE.

Cross-module entry points the compactor calls. xlocator_lock_and_fetch_all (bulk fetch with per-instance lock), locator_lock_and_get_object (single-instance fetch + X-lock), locator_attribute_info_force (force-write a row through the locator with B+Tree and replication hooks), heap_get_visible_version (MVCC visibility check used by Pass 1), heap_get_class_repr_id (current representation ID, used to detect concurrent ALTER), catalog_drop_old_representations (drop obsolete reprs), xheap_reclaim_addresses and heap_compact_pages in heap_file.c (Pass 2 and Pass 3 work), catalog_reclaim_space and file_tracker_reclaim_marked_deleted (called only from standalone Pass 3).

Counter sentinels and CLI options. COMPACTDB_LOCKED_CLASS, COMPACTDB_INVALID_CLASS, COMPACTDB_UNPROCESSED_CLASS, COMPACTDB_REPR_DELETED are sentinel values stored in total_objects[i] to encode per-class outcomes. COMPACT_MIN_PAGES / COMPACT_MAX_PAGES clamp the page budget to [1, 20]; COMPACT_INSTANCE_MIN/MAX_LOCK_TIMEOUT and COMPACT_CLASS_MIN/MAX_LOCK_TIMEOUT clamp the timeout knobs to [1, 10] seconds. Option strings: COMPACT_VERBOSE_S, COMPACT_PAGES_COMMITED_ONCE_S, COMPACT_INSTANCE_LOCK_TIMEOUT_S, COMPACT_CLASS_LOCK_TIMEOUT_S, COMPACT_INPUT_CLASS_FILE_S, COMPACT_DELETE_OLD_REPR_S, COMPACT_STANDBY_CS_MODE_S.

Source verification (as of 2026-05-01)

Symbol	File	Line
`boot_compact_db`	`src/storage/compactdb_sr.c`	517
`process_class` (server)	`src/storage/compactdb_sr.c`	333
`process_object` (server)	`src/storage/compactdb_sr.c`	194
`process_value` (server)	`src/storage/compactdb_sr.c`	85
`process_set` (server)	`src/storage/compactdb_sr.c`	157
`desc_disk_to_attr_info`	`src/storage/compactdb_sr.c`	297
`is_class` (server)	`src/storage/compactdb_sr.c`	68
`boot_heap_compact_pages`	`src/storage/compactdb_sr.c`	680
`boot_compact_start`	`src/storage/compactdb_sr.c`	695
`boot_compact_stop`	`src/storage/compactdb_sr.c`	725
`boot_can_compact`	`src/storage/compactdb_sr.c`	754
`compactdb` (standalone)	`src/executables/compactdb.c`	97
`compactdb_start` (standalone)	`src/executables/compactdb.c`	172
`process_class` (standalone)	`src/executables/compactdb.c`	361
`process_object` (standalone)	`src/executables/compactdb.c`	492
`process_value` (standalone)	`src/executables/compactdb.c`	534
`disk_update_instance`	`src/executables/compactdb.c`	652
`update_indexes`	`src/executables/compactdb.c`	770
`compactdb` (client/server)	`src/executables/compactdb_cl.c`	783
`compactdb_start` (client/server)	`src/executables/compactdb_cl.c`	252
`do_reclaim_addresses`	`src/executables/compactdb_cl.c`	927
`do_reclaim_class_addresses`	`src/executables/compactdb_cl.c`	1012
`class_instances_can_be_referenced`	`src/executables/compactdb_cl.c`	1286
`class_referenced_by_class`	`src/executables/compactdb_cl.c`	1321
`class_referenced_by_attributes`	`src/executables/compactdb_cl.c`	1406
`class_referenced_by_domain`	`src/executables/compactdb_cl.c`	1435
`show_statistics`	`src/executables/compactdb_cl.c`	136
`get_class_mops`	`src/executables/compactdb_common.c`	92
`get_class_mops_from_file`	`src/executables/compactdb_common.c`	186
`xheap_reclaim_addresses`	`src/storage/heap_file.c`	6227
`heap_compact_pages`	`src/storage/heap_file.c`	17562
`catalog_reclaim_space`	`src/storage/system_catalog.c`	2725
`file_tracker_reclaim_marked_deleted`	`src/storage/file_manager.c`	10687

Cross-check Notes

Single-instance guard is process-fragile. The pair compact_started / last_tran_index is file-scope server state. If a compactdb client crashes without calling compact_db_stop, the next attempt sees compact_started == true with a stale last_tran_index and is refused with ER_COMPACTDB_ALREADY_STARTED until the server is restarted or the original transaction index gets reused. The standalone form sidesteps this — its server lives only inside its own process.

Pass 1 vs vacuum overlap. process_value checks visibility against an MVCC snapshot taken at this moment. The class-level IX_LOCK does not exclude vacuum on other classes’ rows that this class’s columns reference; if vacuum runs concurrently, a soon-dead row may be flagged as dangling and the column NULL-ed slightly early. In practice benign — the result is a NULL where a soft pointer to a doomed row used to be — but worth noting as an MVCC interaction.

Reachability is conservative. The analysis inspects domain types, not runtime values. A single OBJECT-domained column anywhere in the schema flips any_class_can_be_referenced and disables Pass 2 globally. Schemas that use OBJECT widely will rarely see Pass 2 do anything.

Standalone vs C/S Pass 2 differ. The standalone form (compactdb.c) skips reachability entirely: it runs Pass 1 and then Pass 2 without any per-class reachability check. The C/S form (compactdb_cl.c::do_reclaim_class_addresses) runs the analysis per class. Standalone is for DBA-only maintenance windows; C/S is for shared-server settings.

disk_update_instance retry path. In compactdb.c the code runs desc_obj_to_disk, and on size overflow reallocates a larger record buffer (rounded to DB_PAGESIZE) and tries once more. A second overflow returns 0 — there is no third try.

update_indexes reads the last on-disk version. It uses heap_get_last_version, not visible-version, because standalone runs without an active MVCC scope and needs the physically latest row to compute the index-key delta.

is_class in compactdb_sr.c. The static helper guards against accidentally rewriting a class-MOP reference (OID_EQ (class_oid, oid_Root_class_oid)); the same gate is implicit in the catalog’s separate handling of class objects, so the practical effect is small.

Open Questions

Pass 1 takes per-instance X_LOCK via locator_lock_and_get_object and lets locator_attribute_info_force lock again. The process_object header says “oid already locked at locator_lock_and_get_object” yet the force path acquires the lock once more — presumably re-entrant short-circuit, but the rationale is not stated.
Is there a way to do Pass 2 without the SCH-M class lock? The current path quiesces all access to the class for the duration of heap_reclaim_addresses; a per-page-latch protocol with a “this page being reclaimed” bit would be nicer in HA setups but not obviously safe with concurrent inserts.
The standalone Pass 1’s TODO acknowledges that Pass 2’s precondition (no dangling references) is not actually guaranteed when Pass 1 had failures. Real-world workflows depend on the operator checking failed_objects manually.

Beyond CUBRID — Comparative Designs & Research Frontiers

CUBRID’s no-relocation compactor is unusual. Mainstream patterns assume compaction implies row-ID rewriting, which implies index rebuild. Stable OIDs let indexes and foreign-OID columns survive compaction untouched, but mean CUBRID never recovers the physical-clustering gains of CLUSTER or InnoDB’s clustered-index rebuild. For an OID-graph schema (CUBRID’s OO heritage), stable OIDs are a feature; for a star-schema OLAP workload they are not.

Modern research has moved toward in-place reorganization with buffered redirection: SAP HANA’s delta-merge, Hyper/Umbra, and SingleStore all relocate rows in the background while a transient forwarding layer keeps live references valid. None maps directly to CUBRID’s disk-resident, object-heritage model, but the forwarding-layer idea is the right shape for any offline compactor that does want to relocate.

Two simpler extensions inside CUBRID’s existing model: (a) softening Pass 2’s reachability cliff by sampling runtime attribute values (today a single OBJECT column disables Pass 2 globally), and (b) an online Pass 2 / Pass 3 that tracks per-page liveness in the buffer manager. The hard part of (b) is reachability — online with OBJECT columns would need either pessimistic locking of every potentially-referencing class or optimistic conflict detection on every foreign-OID write. Neither has been taken; the offline compactor remains the operator’s tool.

Sources

Code paths are listed in the frontmatter references: field and pinned with line numbers in the position table above. Companion docs: cubrid-vacuum.md (online MVCC reclamation that compactdb complements), cubrid-heap-manager.md (the heap-file layer compactdb reclaims and defragments), cubrid-disk-manager.md (the volume / sector layer beneath). Textbook frame: Silberschatz, Korth, Sudarshan, Database System Concepts, chapter on storage and file structure.