CUBRID Catalog Manager — Disk Representation, System Classes, and Statistics

Contents:

Theoretical Background
Common DBMS Design
CUBRID’s Approach
Source Walkthrough
Source verification (as of 2026-04-30)
Beyond CUBRID — Comparative Designs and Research Frontiers
Sources

Theoretical Background

The catalog is the database’s self-description: every other subsystem — parser, optimizer, MVCC, lock manager, vacuum, CDC — asks the catalog questions like “what attributes does class C have?”, “where is its heap file?”, “what indexes target it?”, “how many rows does it have?”. Database Internals (Petrov, ch. 1 §“Database storage” and ch. 7 §“Storage Engines”) frames this as one of the two universal database invariants: schema and storage layout must be reconstructible from the bytes on disk without any out-of-band knowledge.

Two implementation choices the model leaves open shape every real engine and frame the rest of this document:

Where the bootstrap “root” lives. The catalog needs a well-known starting point (an OID, a fixed page id, or a file id) that is the same in every database the engine opens. Without it, opening the catalog would itself need a catalog. Engines pick one of: a fixed OID for a root class (CUBRID, OODB-style), a fixed page in the system tablespace (Oracle’s bootstrap segment), or a fixed table OID with hard-coded schema (PostgreSQL’s pg_class).
Whether storage layout and user-visible schema are the same structure. Some engines unify them: PostgreSQL’s pg_class is the table catalog, accessed via the same heap+index machinery as user tables. Others split them: an internal “catalog manager” stores compact disk-representation records for engine use, and a parallel set of user-visible system classes (_db_class, _db_attribute, …) lets SQL queries inspect the schema. CUBRID picks the split design: the internal system_catalog records and the user-visible _db_* system classes coexist, with the user-side classes kept in step through the catcls_* functions.

Once those choices are named, every CUBRID-specific structure in this document either implements one of them or makes the access faster.

Common DBMS Design

Every relational engine reaches for a similar set of patterns around catalog storage and bootstrap.

Self-describing storage on top of itself

The catalog is stored in heap files like any other class, but the heap manager needs catalog records to interpret pages. The chicken-and-egg is resolved by bootstrap classes with hard-coded schemas the engine can interpret without consulting the catalog. CUBRID’s root class is the seed; from the root class the engine learns about _db_class, from _db_class it learns about _db_attribute, and so on.

Disk representation vs. logical schema

Engines distinguish between the physical disk representation (the byte order of attributes, fixed vs. variable, pad bytes) and the logical schema (column names, types, constraints). The disk representation can change without invalidating existing rows by versioning: each class has a list of representations indexed by REPR_ID, and every heap row carries the REPR_ID it was written under. ALTER TABLE bumps the representation; old rows decode with the old representation, new rows with the new.
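
The versioning idea above can be illustrated with a minimal sketch. The types and the function name below (repr_sketch, row_header_sketch, find_repr_for_row) are hypothetical simplifications for illustration, not CUBRID's definitions: each row remembers the representation id it was written under, and the decoder picks the matching layout from the class's list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; not CUBRID's actual definitions. */
typedef struct
{
  int id;       /* representation identifier */
  int n_attrs;  /* number of attributes in this layout */
} repr_sketch;

typedef struct
{
  int repr_id;  /* representation the row was written under */
} row_header_sketch;

/* Return the representation matching the row's stored id, or NULL if the
   class has no representation with that id. */
const repr_sketch *
find_repr_for_row (const repr_sketch *reprs, size_t n_reprs,
                   const row_header_sketch *row)
{
  assert (reprs != NULL && row != NULL);
  for (size_t i = 0; i < n_reprs; i++)
    {
      if (reprs[i].id == row->repr_id)
        {
          return &reprs[i];
        }
    }
  return NULL;
}
```

A row written before an ALTER TABLE resolves to the older entry and a row written afterwards resolves to the newer one, so neither needs to be rewritten at DDL time.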

Statistics as a separate, mutable record

Statistics are the optimizer’s input, not part of the schema — they change continuously. Engines store them adjacent to the catalog but with a different update cadence. PostgreSQL has pg_statistic, InnoDB has mysql.innodb_index_stats, CUBRID has CLS_INFO carrying per-class stats and BTREE_STATS per index.

Cardinality estimation hierarchy

The optimizer asks at three granularities: per-class (heap_num_objects, heap_num_pages), per-attribute (n_distinct_values), per-index (B+Tree key count, leaf count, height, partial-key cardinality for compound indexes). CUBRID’s CLASS_STATS, ATTR_STATS, BTREE_STATS map one-to-one to these levels.

Two access flavours: server-side and client-side

The server reads the catalog during query execution; the client reads it during DDL parsing and schema introspection. CUBRID ships parallel *_sr.c (server) and *_cl.c (client) sources for statistics, with the _sr side authoritative.

Theory ↔ CUBRID mapping

Theoretical concept	CUBRID name
Catalog identifier (boot anchor)	`CTID { vfid, xhid, hpgid }` (`system_catalog.h`); global `catalog_Id`
Disk representation of one class	`DISK_REPR { id, n_fixed, fixed[], n_variable, variable[] }` (`system_catalog.h`)
Per-attribute disk info	`DISK_ATTR { id, location, type, val_length, value, position, classoid, n_btstats, bt_stats, ndv }`
Per-class catalog info	`CLS_INFO { ci_hfid, ci_tot_pages, ci_tot_objects, ci_time_stamp, ci_rep_dir }`
Per-access transient state	`CATALOG_ACCESS_INFO { class_oid, dir_oid, class_name, is_update, … }`
User-visible system classes	`_db_class`, `_db_attribute`, `_db_index`, `_db_domain`, `_db_method`, …
Catalog-class entry-point family	`catcls_*` functions (`catalog_class.c`)
Catalog primary heap	`catalog_Id.vfid` — file holding catalog records
Catalog directory hash	`catalog_Id.xhid` — extendible-hash index `class_oid → dir_oid`
Catalog header page	`catalog_Id.hpgid` — fixed page id holding catalog metadata
Class statistics	`CLASS_STATS { time_stamp, heap_num_objects, heap_num_pages, n_attrs, attr_stats[] }` (`statistics.h`)
Per-attribute statistics	`ATTR_STATS { id, type, n_btstats, bt_stats[], ndv }`
Per-index statistics	`BTREE_STATS { btid, leafs, pages, height, keys, has_function, key_type, pkeys[] }`
Recovery functions for catalog	`catalog_rv_new_page_redo`, `catalog_rv_insert_{redo,undo}`, `catalog_rv_delete_{redo,undo}`, `catalog_rv_update`, `catalog_rv_ovf_page_logical_insert_undo`
Server boot	`boot_restart_server` (`boot_sr.c`)
Catalog start at access	`catalog_start_access_with_dir_oid`
Catalog end at access	`catalog_end_access_with_dir_oid`

CUBRID’s Approach

The catalog has four moving parts: the catalog identifier and its three-part layout, the disk-representation records that capture per-class storage, the parallel user-visible system classes that surface the same data through SQL, and the statistics records whose update cadence is different from the schema. We walk them in that order.

Overall structure

flowchart LR
  subgraph BOOT["Bootstrap"]
    ROOT["Root class (fixed OID)"]
    BOOTFN["boot_restart_server"]
  end
  subgraph CTID["catalog_Id (CTID)"]
    VFID["vfid: catalog heap file"]
    XHID["xhid: extendible hash class_oid → dir_oid"]
    HPGID["hpgid: catalog header page"]
  end
  subgraph CR["Catalog records (system_catalog.c)"]
    DR1["DISK_REPR for class A repr 1"]
    DR2["DISK_REPR for class A repr 2"]
    CLI["CLS_INFO for class A"]
    CR1["..."]
  end
  subgraph SC["User-visible system classes (catcls)"]
    DBC["_db_class"]
    DBA["_db_attribute"]
    DBI["_db_index"]
    DBD["_db_domain"]
    DBT["_db_data_type"]
  end
  subgraph ST["Statistics"]
    CS["CLASS_STATS"]
    AS["ATTR_STATS"]
    BS["BTREE_STATS"]
  end
  BOOTFN --> ROOT
  BOOTFN --> CTID
  CTID --> CR
  ROOT --> SC
  SC -.synchronised with.-> CR
  CR -.cardinality from.-> ST
  ST --> CS --> AS --> BS

Figure 1 — Catalog subsystem overview. boot_restart_server roots the three-part CTID (heap file, extendible-hash directory, header page); internal DISK_REPR / CLS_INFO records are synchronised with the user-visible _db_class / _db_attribute system classes; CLASS_STATS → ATTR_STATS → BTREE_STATS form the statistics chain fed by stats_update_statistics.

The figure encodes three boundaries. (boot / runtime) the root class is the only OID the engine knows ahead of time; everything else is reachable from it. (internal / user-visible) catalog records under catalog_Id are compact disk representations the engine reads on hot paths; system classes under _db_* are the schema-introspection face SQL queries hit. (schema / stats) representations describe shape and do not change often; statistics describe size and change on every statistics update.

CTID — the catalog’s three-part identity

The catalog itself is a small data structure with three pointers:

// CTID — src/storage/system_catalog.h
struct ctid
{
  VFID vfid;                /* catalog volume identifier — heap file
                               holding catalog records */
  EHID xhid;                /* extendible hash index identifier —
                               class_oid → dir_oid map */
  PAGEID hpgid;             /* catalog header page identifier */
};
typedef struct ctid CTID;

extern CTID catalog_Id;     /* global catalog identifier */

The three components correspond to three on-disk objects:

vfid — a heap file that stores the catalog’s DISK_REPR and CLS_INFO records. It is treated like any other heap by the heap manager (cubrid-heap-manager.md), with one extra invariant: catalog records must never be vacuumed away while any transaction can still see the class they describe.
xhid — an extendible-hash index keyed on class_oid, pointing to the directory record OID for that class. The directory record in turn lists the OIDs of all DISK_REPR records for the class (one per representation, plus one for CLS_INFO).
hpgid — the catalog header page, holding global catalog metadata (version, last-allocated representation id, etc.). Its page id is fixed at boot time and never changes.

catalog_initialize (system_catalog.c) populates the global catalog_Id from these three values during boot. From that moment, every catalog access starts with catalog_Id and descends into one of the three components.
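
As a rough illustration of the xhid step (class_oid to dir_oid), here is a minimal sketch. Everything in it is an assumption made for illustration: oid_sketch, dir_entry_sketch and lookup_dir_oid are hypothetical names, and a linear scan stands in for the extendible-hash probe that the real index performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object identifier, for illustration only. */
typedef struct
{
  int pageid;
  short slotid;
  short volid;
} oid_sketch;

/* One directory mapping: class OID -> OID of that class's directory record. */
typedef struct
{
  oid_sketch class_oid;
  oid_sketch dir_oid;
} dir_entry_sketch;

int
oid_sketch_equal (const oid_sketch *a, const oid_sketch *b)
{
  return a->pageid == b->pageid && a->slotid == b->slotid
    && a->volid == b->volid;
}

/* Linear scan standing in for the hash probe; returns NULL when the class
   is not registered in the directory. */
const oid_sketch *
lookup_dir_oid (const dir_entry_sketch *entries, size_t n,
                const oid_sketch *class_oid)
{
  assert (entries != NULL && class_oid != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (oid_sketch_equal (&entries[i].class_oid, class_oid))
        {
          return &entries[i].dir_oid;
        }
    }
  return NULL;
}
```

Once the directory OID is known, the caller would fetch the directory record from the catalog heap (the vfid component) and pick the representation it needs from there.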

Disk representation — one record per attribute layout

Per class, per representation:

// DISK_REPR — src/storage/system_catalog.h
struct disk_representation
{
  REPR_ID id;                          /* representation identifier */
  int n_fixed;                         /* number of fixed-length attributes */
  struct disk_attribute *fixed;        /* fixed attribute structures */
  int fixed_length;                    /* total length of fixed attrs */
  int n_variable;                      /* number of variable-length attrs */
  struct disk_attribute *variable;     /* variable attribute structures */
};

The split between fixed and variable is the on-disk layout choice: fixed-length attributes pack tightly with no per-attribute offset overhead, variable-length attributes carry an offset table at the front of the row. Iterating over disk_representation::fixed[] and ::variable[] is exactly the order the heap manager uses to interpret a row.
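
To make the fixed/variable distinction concrete, the sketch below decodes a toy row. The byte layout is an assumption chosen for illustration (an offset table of n_variable + 1 entries, then the fixed region, then the variable region) and is not CUBRID's actual record format; layout_sketch, read_fixed_int and read_variable are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy layout, assumed for illustration:
   [ (n_variable + 1) x uint32 offsets ][ fixed region ][ variable region ]
   Offsets are measured from the start of the row. */
typedef struct
{
  int n_variable;    /* number of variable-length attributes */
  int fixed_length;  /* total bytes of the fixed region */
} layout_sketch;

size_t
fixed_region_start (const layout_sketch *l)
{
  return (size_t) (l->n_variable + 1) * sizeof (uint32_t);
}

/* Fixed attribute: read directly at a known offset inside the fixed region. */
int32_t
read_fixed_int (const unsigned char *row, const layout_sketch *l, int location)
{
  int32_t v;
  assert (location >= 0 && location + (int) sizeof v <= l->fixed_length);
  memcpy (&v, row + fixed_region_start (l) + location, sizeof v);
  return v;
}

/* Variable attribute: consult the offset table; the length is the distance
   to the next offset. */
const unsigned char *
read_variable (const unsigned char *row, const layout_sketch *l, int idx,
               size_t *len)
{
  uint32_t start, end;
  assert (idx >= 0 && idx < l->n_variable && len != NULL);
  memcpy (&start, row + (size_t) idx * sizeof start, sizeof start);
  memcpy (&end, row + (size_t) (idx + 1) * sizeof end, sizeof end);
  assert (end >= start);
  *len = (size_t) (end - start);
  return row + start;
}
```

The fixed path needs only the attribute's offset from the representation, while the variable path costs one extra indirection through the per-row table.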

Each attribute carries:

// DISK_ATTR — src/storage/system_catalog.h
struct disk_attribute
{
  ATTR_ID id;                  /* attribute identifier */
  int location;                /* fixed: exact offset; variable: index into
                                  the offset table */
  DB_TYPE type;                /* int / varchar / float / … */
  int val_length;              /* default value length, ≥ 0 */
  void *value;                 /* default value (no default expression) */
  int position;                /* storage position (fixed only) */
  OID classoid;                /* source class — for inherited attrs */
  int n_btstats;               /* number of B+tree statistics */
  BTREE_STATS *bt_stats;       /* per-index stats array */
  INT64 ndv;                   /* Number of Distinct Values */
};

Two fields are worth noting. (classoid) For inherited attributes, classoid identifies which class along the inheritance chain originally defined the attribute. The optimizer uses this to avoid double-counting inherited attributes when computing class cardinality. (bt_stats[] and ndv) Statistics live inside the attribute record, not in a separate file. This is the trade-off: UPDATE STATISTICS rewrites the attribute record, which is heavier than updating a separate stats table would be, but reading the disk representation gives the optimizer everything in one fetch.

Per-class info — the heap pointer and rough counts

CLS_INFO is the per-class summary record:

// CLS_INFO — src/storage/system_catalog.h
struct cls_info
{
  HFID ci_hfid;                /* heap file identifier for the class */
  int ci_tot_pages;            /* total pages in the heap file */
  int ci_tot_objects;          /* total live objects */
  unsigned int ci_time_stamp;  /* timestamp of last update */
  OID ci_rep_dir;              /* representation directory record OID */
};

ci_hfid is the most-read field. Every query that scans a class starts by fetching the class’s CLS_INFO from the catalog, reading ci_hfid, and handing the heap file to the scan manager.

ci_rep_dir is the back-pointer from CLS_INFO to the directory record listing all representations of this class. Following the chain class_oid → xhid → dir_oid → DISK_REPR is the standard lookup; following CLS_INFO::ci_rep_dir → DISK_REPR is the inverse for traversal during ALTER.

ci_time_stamp is the cache-validation token: the optimizer caches CLS_INFO in process memory and invalidates the cache when ci_time_stamp advances.
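
A timestamp-validated cache of this kind can be sketched as follows. The structure and function names (cls_info_sketch, cls_cache_sketch, cls_cache_is_fresh, cls_cache_store) are hypothetical, and the sketch assumes the caller can cheaply learn the current timestamp; how the real optimizer obtains it is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced per-class summary. */
typedef struct
{
  int tot_pages;
  int tot_objects;
  unsigned int time_stamp;
} cls_info_sketch;

typedef struct
{
  bool valid;            /* false until the first store */
  cls_info_sketch info;  /* cached copy */
} cls_cache_sketch;

/* The cached copy is usable only if it exists and carries the same
   timestamp as the catalog's current record. */
bool
cls_cache_is_fresh (const cls_cache_sketch *c, unsigned int current_time_stamp)
{
  assert (c != NULL);
  return c->valid && c->info.time_stamp == current_time_stamp;
}

/* Replace the cached copy after re-reading the record from the catalog. */
void
cls_cache_store (cls_cache_sketch *c, const cls_info_sketch *fresh)
{
  assert (c != NULL && fresh != NULL);
  c->info = *fresh;
  c->valid = true;
}
```

Comparing for equality rather than "greater than" keeps the check correct even if the unsigned timestamp ever wraps around.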

Catalog access — the per-access state

Every catalog read or write goes through a CATALOG_ACCESS_INFO session:

// CATALOG_ACCESS_INFO — src/storage/system_catalog.h
struct catalog_access_info
{
  OID *class_oid;
  OID *dir_oid;             /* cached after first xhid lookup */
  char *class_name;
  bool is_update;           /* update access — needs X locks */
  bool need_unlock;         /* unlock at end-access time */
  bool access_started;      /* guard against double-start */
  bool need_free_class_name;
#if !defined (NDEBUG)
  bool is_systemop_started;
#endif
};

The session is opened with catalog_start_access_with_dir_oid and closed with catalog_end_access_with_dir_oid. Between the two, the caller has the directory OID cached, the class lock acquired (S for read, X for update), and (in update sessions) a system-op bracket open so partial catalog updates can be rolled back as a logical unit. The debug-only is_systemop_started field assert-checks this discipline.
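
The bracket discipline can be sketched in a few lines. This is a reduced, hypothetical model (access_sketch, access_start, access_end are illustrative names, not the real catalog_start_access_with_dir_oid / catalog_end_access_with_dir_oid signatures): it shows only the double-start guard and the choice of lock mode, and takes no real locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced version of the per-access state. */
typedef struct
{
  bool is_update;       /* update access: would request an X lock */
  bool access_started;  /* guard against double-start */
} access_sketch;

enum { ACCESS_OK = 0, ACCESS_ERROR = -1 };

/* Open the bracket and report the lock mode a real caller would request. */
int
access_start (access_sketch *a, bool is_update, char *lock_mode)
{
  assert (a != NULL && lock_mode != NULL);
  if (a->access_started)
    {
      return ACCESS_ERROR;  /* double-start is a caller bug */
    }
  a->is_update = is_update;
  a->access_started = true;
  *lock_mode = is_update ? 'X' : 'S';
  return ACCESS_OK;
}

/* Close the bracket; closing a bracket that was never opened is an error. */
int
access_end (access_sketch *a)
{
  assert (a != NULL);
  if (!a->access_started)
    {
      return ACCESS_ERROR;
    }
  a->access_started = false;
  return ACCESS_OK;
}
```

Keeping the started flag inside the session object lets both ends of the bracket detect misuse without any global state.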

Catalog-class machinery — user-visible schema

Parallel to catalog_Id-rooted records, CUBRID maintains a set of user-visible classes that mirror the same data through SQL. The classes are conventionally named _db_*:

_db_class — one row per class, with name, OID, owner, type (table / view / partition), creation time.
_db_attribute — one row per attribute of a class.
_db_index — one row per index.
_db_domain — one row per type domain (used for compound domains).
_db_data_type — system data-type catalogue.
_db_method, _db_meth_arg, _db_meth_file, _db_method_sig — for OODB methods.
_db_partition — partitioning info.
_db_trigger — triggers.
_db_serial — sequences.
_db_collation — collation catalogue.
_db_charset, _db_servers, _db_user, _db_auth, _db_password, _db_synonym, … — auxiliary.

The catcls_* family in catalog_class.c is the bridge:

// catalog_class.h — src/storage/catalog_class.h
extern bool catcls_Enable;

int catcls_compile_catalog_classes (THREAD_ENTRY *thread_p);
int catcls_insert_catalog_classes (THREAD_ENTRY *thread_p, RECDES *record);
int catcls_delete_catalog_classes (THREAD_ENTRY *thread_p, const char *name, OID *class_oid);
int catcls_update_catalog_classes (THREAD_ENTRY *thread_p, const char *name, RECDES *record,
                                    OID *class_oid_p, UPDATE_INPLACE_STYLE force_in_place);
int catcls_finalize_class_oid_to_oid_hash_table (THREAD_ENTRY *thread_p);
int catcls_remove_entry (THREAD_ENTRY *thread_p, OID *class_oid);
int catcls_get_server_compat_info (THREAD_ENTRY *thread_p, INTL_CODESET *charset_id_p,
                                    char *lang_buf, const int lang_buf_size, char *timezone_checksum);
int catcls_get_db_collation (THREAD_ENTRY *thread_p, LANG_COLL_COMPAT **db_collations, int *coll_cnt);
int catcls_update_class_stats (THREAD_ENTRY *thread_p, const char *class_name,
                                unsigned int ci_time_stamp, bool with_fullscan);

When DDL creates or alters a class, the engine writes an internal DISK_REPR to the catalog and inserts a row into _db_class (and related rows into _db_attribute, _db_index, …). The two writes are bracketed in a single transaction so they commit together; if they diverge (e.g., crash between writes), the recovery pass rolls the partial work back as one unit. Whether concurrent readers can ever observe the two faces out of step under lock-mode escalation is open (see Open Q3).

catcls_compile_catalog_classes is the function that originally builds the system classes from a hard-coded schema; it runs at install time. (The neighbouring symbol catcls_insert_catalog_classes is the row-insert entry point used at DDL time, not the install-time compiler.) The schema source lives in src/object/schema_system_catalog_install.cpp (cubrid AGENTS.md mentions strict formatting rules there).

Bootstrap — boot_restart_server and the root class

boot_restart_server (boot_sr.c) is the restart entry point that recovers the database and brings the catalog online:

1. Initialize log and recover. log_initialize runs the three-pass restart (cubrid-recovery-manager.md).
2. Initialize disk and file managers.
3. Read catalog_Id. From the database parameter file (the boot_DB_parm record) plus the on-disk header; this gives (vfid, xhid, hpgid).
4. catalog_initialize (catalog_Id). Sets up in-memory structures.
5. Bind the root class. The root class has a fixed OID stored in the boot parameters; the engine reads its DISK_REPR from the catalog and primes the metaclass cache.
6. Bind the system classes. Walking from the root class, _db_class, _db_attribute, etc. are loaded with the class-name ↔ OID mapping cached in catcls_class_oid_to_oid_hash_table.
7. Initialize statistics caches.
8. Start vacuum master + workers (cubrid-vacuum.md).

After step 8 the server is ready to accept queries. boot_DB_parm is the on-disk parameter record; updates to it flow through catcls_update_catalog_classes so they’re durable.

Statistics — separate cadence, separate access

Statistics are part of the same DISK_ATTR record but updated on a different cadence — every UPDATE STATISTICS or fullscan. The statistics structures:

// statistics.h — src/storage/statistics.h
struct btree_stats
{
  BTID btid;
  int leafs;                 /* leaf pages including overflow */
  int pages;                 /* total pages */
  int height;                /* tree depth */
  int keys;                  /* distinct keys */
  int has_function;          /* function index? */
  TP_DOMAIN *key_type;
  int pkeys_size;            /* compound-key partial-cardinality array */
  int *pkeys;                /* pkeys[k] = NDV of first k+1 components */
  int dedup_idx;             /* SUPPORT_DEDUPLICATE_KEY_MODE */
};

struct attr_stats
{
  int id;
  DB_TYPE type;
  int n_btstats;
  BTREE_STATS *bt_stats;
  INT64 ndv;                 /* Number of Distinct Values */
};

struct class_stats
{
  unsigned int time_stamp;
  int heap_num_objects;
  int heap_num_pages;
  int n_attrs;
  ATTR_STATS *attr_stats;
};

The pkeys[] array deserves a closer look. For a compound index (a, b, ..., x) with pkeys_size = k, pkeys[i] is the number of distinct values of the first i+1 columns taken together. The optimizer uses this to estimate selectivity for queries that filter on a prefix of the index. Without it, every prefix query would have to assume independent column distributions.
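
One way such partial-key counts could feed a row estimate is sketched below. This is an assumption-laden illustration, not CUBRID's optimizer code: estimate_prefix_rows is a hypothetical function, and it assumes equality predicates on the leading columns and a uniform distribution of rows across distinct prefixes.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated number of rows matching equality predicates on the first
   n_prefix columns of a compound index, assuming rows are spread uniformly
   over the distinct prefixes.  pkeys[i] holds the distinct-value count of
   the first i + 1 columns. */
double
estimate_prefix_rows (const int *pkeys, int pkeys_size, int n_prefix,
                      double total_rows)
{
  int ndv;
  assert (pkeys != NULL && pkeys_size > 0 && n_prefix > 0);
  if (n_prefix > pkeys_size)
    {
      n_prefix = pkeys_size;  /* no data past the tracked prefixes: clamp */
    }
  ndv = pkeys[n_prefix - 1];
  if (ndv <= 0)
    {
      return total_rows;      /* no usable statistic: assume no filtering */
    }
  return total_rows / (double) ndv;
}
```

With 10000 rows and pkeys = {10, 100, 1000}, a predicate on the first column alone is estimated at 1000 rows, while a predicate on the first two columns is estimated at 100 rows. A longer prefix than the array tracks falls back to the last tracked entry.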

STATS_SAMPLING_THRESHOLD = 5000 and NUMBER_OF_SAMPLING_PAGES = 5000 (declared in statistics.h) are the sampling defaults; full-scan mode (STATS_WITH_FULLSCAN) is the alternative. The stats_adjust_sampling_weight inline (in statistics.h) applies a differential weight when sampling NDV is below 1% of expected; the assumption is that “if the sample data is a lot of duplicated, there will also be duplicate in the overall data”.

Server-side statistics access goes through statistics_sr.c; client-side (the SQL interface) through statistics_cl.c plus stats_get_statistics (declared in statistics.h, SERVER_MODE-disabled).

One ALTER, end to end

sequenceDiagram
  participant CL as DDL parser
  participant CAT as catalog (system_catalog)
  participant CC as catalog_class (catcls_*)
  participant LM as log_manager (sysop)
  participant LCK as lock_manager

  CL->>LCK: X-lock class_oid
  CL->>LM: log_sysop_start (DDL atomic)
  CL->>CAT: catalog_start_access_with_dir_oid (X)
  CL->>CAT: catalog_add_representation (new DISK_REPR with new REPR_ID)
  CL->>CAT: catalog_update_class_info (CLS_INFO ci_time_stamp bumped)
  CL->>CC: catcls_update_catalog_classes (_db_class row)
  CL->>CC: catcls_update_catalog_classes (_db_attribute rows)
  CL->>CAT: catalog_end_access_with_dir_oid
  CL->>LM: log_sysop_commit
  CL->>LCK: X-lock release at commit

Figure 2 — One ALTER end-to-end. catalog_add_representation and catcls_update_catalog_classes are invoked inside the same log_sysop_start / log_sysop_commit bracket, so the internal DISK_REPR record and the _db_class / _db_attribute rows are always updated atomically.

The two catalog faces — internal records and user-visible system classes — are updated in the same system op, so a crash mid-update either rolls them both back or keeps them both. The DDL transaction commits as a single atomic unit even though it touched ~10 different files.
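
The all-or-nothing behaviour of the bracket can be modelled with a toy. The sketch below is only an analogy: faces_sketch, sysop_sketch and the three functions are hypothetical, and an in-memory snapshot stands in for the undo logging that a real system operation relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two catalog faces, reduced to one integer each for illustration. */
typedef struct
{
  int internal_repr_id;  /* stands in for the internal DISK_REPR side */
  int sysclass_repr_id;  /* stands in for the _db_class / _db_attribute side */
} faces_sketch;

/* Toy system operation: remembers the state at start so abort can restore
   it.  A real engine would log undo records instead of taking a snapshot. */
typedef struct
{
  faces_sketch *target;
  faces_sketch saved;
  bool active;
} sysop_sketch;

void
sysop_start (sysop_sketch *op, faces_sketch *target)
{
  assert (op != NULL && target != NULL);
  op->target = target;
  op->saved = *target;
  op->active = true;
}

/* Commit keeps whatever was written inside the bracket. */
void
sysop_commit (sysop_sketch *op)
{
  assert (op != NULL && op->active);
  op->active = false;
}

/* Abort restores both faces together, so they can never end up out of step. */
void
sysop_abort (sysop_sketch *op)
{
  assert (op != NULL && op->active);
  *op->target = op->saved;
  op->active = false;
}
```

If a failure occurs after the internal record is updated but before the system-class rows are, aborting the bracket returns both faces to their previous values together.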

Source Walkthrough

Anchor on symbol names, not line numbers.

Headers and types

CTID (system_catalog.h) — catalog identifier.
DISK_REPR / DISK_ATTR (system_catalog.h) — disk representation records.
CLS_INFO (system_catalog.h) — per-class summary.
CATALOG_ACCESS_INFO (system_catalog.h) — per-access session.
CATALOG_DIR_REPR_KEY = -2 macro (system_catalog.h) — directory key sentinel.
BTREE_STATS / ATTR_STATS / CLASS_STATS (statistics.h).
BTREE_STATS_PKEYS_NUM = 8 macro (statistics.h) — compound-key array bound.
STATS_SAMPLING_THRESHOLD / NUMBER_OF_SAMPLING_PAGES (statistics.h).

Lifecycle

catalog_initialize (system_catalog.c).
catalog_finalize (system_catalog.c).
catalog_create (system_catalog.c) — first-time setup; called only by root.
catalog_destroy (system_catalog.c) — drop catalog.
catalog_reclaim_space (system_catalog.c) — compact fragmented catalog records.

Read access

catalog_get_class_info (system_catalog.c).
catalog_get_representation (system_catalog.c).
catalog_get_representation_directory (system_catalog.c).
catalog_get_last_representation_id (system_catalog.c).
catalog_get_class_info_from_record (system_catalog.c) — decode CLS_INFO from a heap record.
catalog_get_dir_oid_from_cache (system_catalog.c) — cache-aware lookup.

Write access

catalog_add_representation (system_catalog.c).
catalog_add_class_info (system_catalog.c).
catalog_update_class_info (system_catalog.c).
catalog_drop_old_representations (system_catalog.c).
catalog_insert / catalog_update / catalog_delete (system_catalog.c) — generic record-level ops.

Session bracket

catalog_start_access_with_dir_oid (system_catalog.c).
catalog_end_access_with_dir_oid (system_catalog.c).

Recovery functions

catalog_rv_new_page_redo, catalog_rv_insert_redo / _undo, catalog_rv_delete_redo / _undo, catalog_rv_update, catalog_rv_ovf_page_logical_insert_undo (declared in system_catalog.h, defined in system_catalog.c).

Cardinality

catalog_get_cardinality (system_catalog.c).
catalog_get_cardinality_by_name (system_catalog.c).

User-visible system classes

catcls_compile_catalog_classes (catalog_class.c) — install-time schema build.
catcls_insert_catalog_classes (catalog_class.c).
catcls_update_catalog_classes (catalog_class.c).
catcls_delete_catalog_classes (catalog_class.c).
catcls_remove_entry (catalog_class.c).
catcls_get_server_compat_info (catalog_class.c) — charset / locale / timezone compatibility check at boot.
catcls_get_db_collation (catalog_class.c).
catcls_update_class_stats (catalog_class.c).
catcls_finalize_class_oid_to_oid_hash_table (catalog_class.c).
catcls_find_and_set_cached_class_oid (catalog_class.c).

Boot

boot_restart_server (boot_sr.c) — main boot entry.
The boot parameter record (boot_DB_parm) — disk-resident database parameters including catalog OIDs.

Statistics

stats_get_statistics (statistics.h, defined in statistics_cl.c) — client-side fetch.
stats_dump / stats_ndv_dump (statistics.h) — debugging.
stats_make_select_list_for_ndv (statistics.h).
stats_get_ndv_by_query (statistics.h).
stats_adjust_sampling_weight (statistics.h, inline).

Position hints as of 2026-04-30

Symbol	File	Line
`CTID` (struct)	`system_catalog.h`	45
`DISK_REPR` (struct)	`system_catalog.h`	63
`DISK_ATTR` (struct)	`system_catalog.h`	80
`CLS_INFO` (struct)	`system_catalog.h`	96
`CATALOG_ACCESS_INFO` (struct)	`system_catalog.h`	106
`catalog_Id` (extern global)	`system_catalog.h`	153
`BTREE_STATS` (struct)	`statistics.h`	61
`ATTR_STATS` (struct)	`statistics.h`	82
`CLASS_STATS` (struct)	`statistics.h`	93
`stats_adjust_sampling_weight` (inline)	`statistics.h`	135
`catalog_initialize`	`system_catalog.c`	2577
`catalog_finalize`	`system_catalog.c`	2607
`catalog_get_class_info_from_record`	`system_catalog.c`	504
`catalog_initialize_max_space`	`system_catalog.c`	549
`catalog_initialize_new_page`	`system_catalog.c`	598
`catalog_add_representation`	`system_catalog.c`	2815
`catalog_add_class_info`	`system_catalog.c`	3029
`catalog_update_class_info`	`system_catalog.c`	3172
`catalog_get_class_info`	`system_catalog.c`	4113
`catcls_insert_catalog_classes`	`catalog_class.c`	4310
`boot_restart_server`	`boot_sr.c`	1969

Source verification (as of 2026-04-30)

Verified facts

Catalog source files are system_catalog.{c,h} and catalog_class.{c,h}, not catalog_manager.{c,h}. Verified by find -name 'catalog*'. The skeleton’s references: list named catalog_manager.{c,h} (which doesn’t exist); corrected at draft time. The misnomer comes from the vendor decks calling the subsystem “catalog manager” in prose.
The catalog identifier CTID is a triple of (vfid, xhid, hpgid). Verified by reading the CTID struct in system_catalog.h. The three correspond to a heap file (catalog records), an extendible-hash index (class_oid → dir_oid), and a fixed header page.
catalog_Id is a global, not per-thread state. Verified by the extern CTID catalog_Id declaration in system_catalog.h. The identifier is set once at boot (catalog_initialize) and never changes for the lifetime of the server.
DISK_REPR splits attributes into fixed and variable arrays, not a single ordered list. Verified by the disk_representation struct in system_catalog.h (separate n_fixed / fixed[] and n_variable / variable[] fields). Implication: row decoding reads fixed attributes by exact offset, variable attributes via the per-row offset table.
Statistics live inline on the attribute record, not in a separate file. Verified by the n_btstats, bt_stats, and ndv fields on DISK_ATTR in system_catalog.h. Cost: stats updates rewrite the attribute record. Benefit: optimizer reads schema and stats in one fetch.
CATALOG_ACCESS_INFO carries a debug-only is_systemop_started field. Verified by the NDEBUG-conditional field on the catalog_access_info struct in system_catalog.h. Its purpose is to assert that update-mode catalog accesses are properly bracketed in a system op.
The user-visible system classes are populated through the catcls_* family in catalog_class.c, separate from the internal catalog_* in system_catalog.c. Verified by reading catalog_class.h (the entire header is small — under ~50 lines) and grep-finding catcls_insert_catalog_classes in catalog_class.c. The two surfaces share the same transaction so DDL is atomic across both.
catcls_Enable is a global toggle for catalog-class maintenance. Verified by the extern bool catcls_Enable declaration in catalog_class.h. When false, the system classes aren’t kept in sync — used during installation and migration.
CLS_INFO::ci_time_stamp is the cache-validation token. Verified by the ci_time_stamp field on the cls_info struct in system_catalog.h. Optimizer caches CLS_INFO in process memory; cache invalidation compares stored and current timestamps.
Seven recovery functions handle catalog log records. Verified by the recovery-function declarations in system_catalog.h: catalog_rv_new_page_redo, catalog_rv_insert_redo, catalog_rv_insert_undo, catalog_rv_delete_redo, catalog_rv_delete_undo, catalog_rv_update, catalog_rv_ovf_page_logical_insert_undo. The last one is notable — overflow-page insertion has a logical undo (the redo would replay page allocation; logical undo de-allocates the page through the file manager).
Default sampling page count is 5000 with a sampling threshold of 5000. Verified by the sampling constants in statistics.h. STATS_SAMPLING_THRESHOLD = 5000 is the trial count; NUMBER_OF_SAMPLING_PAGES = 5000 is the page budget; EXPECTED_ROWS_PER_PAGE = 20 is the fan-out assumption.
The compound-key partial-cardinality array pkeys[] is sized at 8 by default. Verified by the BTREE_STATS_PKEYS_NUM = 8 macro in statistics.h. Compound indexes deeper than 8 columns lose per-prefix selectivity tracking past the 8th.

Open questions

Where exactly does the root class’s OID live on disk? The boot_DB_parm record holds boot parameters, but the root class’s OID is one specific field there. Investigation path: read boot_sr.c around line 1969 (boot_restart_server) and trace where it loads the root-class OID.
Catalog overflow-page logical-undo discipline. catalog_rv_ovf_page_logical_insert_undo is logical, but what’s the sequence of file-manager calls during the undo path that ensures the overflow page returns cleanly? Investigation path: read its body and chase file_dealloc_page calls.
Synchronisation between internal catalog and user system classes. A DDL must update both in lockstep. What prevents another reader from observing the internal catalog updated but _db_class not yet? Investigation path: trace lock acquisition order in DDL paths; check whether the catalog X-lock covers both faces.
catcls_update_class_stats cadence. Stats updates flow through this function. Is it synchronous with the SQL UPDATE STATISTICS command, or is there a background sweep? Investigation path: grep callers; check for daemon registration.
Catalog-class cache invalidation across servers in HA. On a slave server, when a master DDL is replayed, does the slave invalidate its catalog-class hash table? Investigation path: cubrid-cdc.md plus catcls_finalize_class_oid_to_oid_hash_table.
catalog_reclaim_space cadence and triggers. Catalog compaction is presumably rare, but the trigger isn’t named in the header. Investigation path: grep for callers; check for use in boot_restart_server or a background daemon.

Beyond CUBRID — Comparative Designs and Research Frontiers

Pointers, not analysis.

PostgreSQL pg_class — single catalog table, accessed through normal heap+index machinery. Bootstrap via genbki.pl script + pg_*_d.h macros. CUBRID’s split design trades unification for a more compact internal record format the optimizer reads directly.
MySQL data dictionary (8.0+) — InnoDB tables since 8.0, before that was the FRM file. CUBRID’s split predates and is closer to pre-8.0 MySQL conceptually (a binary structure separate from the SQL face).
Oracle’s bootstrap segment — single-row obj$ seed read at instance start. CUBRID’s boot_DB_parm plus root-class OID is the same idea with two anchors.
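Schema versioning by REPR_ID is loosely comparable to PostgreSQL’s handling of ALTER TABLE ... ADD COLUMN without a table rewrite: both engines decode existing rows against the layout they were written under, allowing online ALTER TABLE without rewriting all rows immediately. The difference is in where the version lives: CUBRID stores it per row (REPR_ID in the row header), whereas PostgreSQL has no per-row representation id; its tuple header records how many attributes the row physically holds, and missing trailing columns are filled in from catalog defaults.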
InnoDB’s mysql.innodb_index_stats — separate table for per-index stats. Compared to CUBRID’s inline bt_stats[], this is heavier to query but lighter to update.
HyPer / Vectorwise compressed catalogs — research engines that compress the catalog structure for in-memory caching. CUBRID’s DISK_REPR is already compact; in-memory variants could collapse it further.

Sources

Raw analyses (`raw/code-analysis/cubrid/storage/catalog_manager/`)

1._Catalog_Overview.pdf
2._Root_Class.pdf
3._System_Catalog_n_Statistics.pdf
4._Catalog_Classes_n_boot_DB_parm.pdf
cls_info_rec.pptx
CUBRID Catalog Access.pptx

Sibling docs

knowledge/code-analysis/cubrid/cubrid-heap-manager.md — heap files the catalog records live on.
knowledge/code-analysis/cubrid/cubrid-btree.md — BTREE_STATS consumers; index-stats source.
knowledge/code-analysis/cubrid/cubrid-recovery-manager.md — the catalog_rv_* functions registered in RV_fun[].
knowledge/code-analysis/cubrid/cubrid-log-manager.md — system-op bracket discipline DDL uses.
knowledge/code-analysis/cubrid/cubrid-cdc.md — DDL events surfaced from catalog mutations; in-progress in the same batch.

Textbook chapters (under `knowledge/research/dbms-general/`)

Database Internals (Petrov), Ch. 1 §“Database storage” (boot anchors), Ch. 7 §“Storage Engines” (catalog as metadata).

CUBRID source (`/data/hgryoo/references/cubrid/`)

src/storage/system_catalog.{c,h}
src/storage/catalog_class.{c,h}
src/storage/statistics.h, statistics_{cl,sr}.{c,h}
src/transaction/boot_sr.{c,h}
src/object/schema_system_catalog_install.cpp — install-time hard-coded schema for the system classes (CUBRID AGENTS.md §“Add info schema view”).