CUBRID Catalog Manager — Disk Representation, System Classes, and Statistics
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Source verification (as of 2026-04-30)
- Beyond CUBRID — Comparative Designs and Research Frontiers
- Sources
Theoretical Background
The catalog is the database’s self-description: every other subsystem (parser, optimizer, MVCC, lock manager, vacuum, CDC) asks the catalog questions like “what attributes does class C have?”, “where is its heap file?”, “what indexes target it?”, “how many rows does it have?”. Database Internals (Petrov, ch. 1 §“Database storage” and ch. 7 §“Storage Engines”) frames this as one of the two universal database invariants: schema and storage layout must be reconstructible from the bytes on disk without any out-of-band knowledge.
Two implementation choices the model leaves open shape every real engine and frame the rest of this document:
- Where the bootstrap “root” lives. The catalog needs a well-known starting point (an OID, a fixed page id, or a file id) that is the same in every database the engine opens. Without it, opening the catalog would itself need a catalog. Engines pick: a fixed OID for a root class (CUBRID, OODB-style), a fixed page in the system tablespace (Oracle’s bootstrap segment), or a fixed table OID with hard-coded schema (PostgreSQL’s pg_class).
- Whether storage layout and user-visible schema are the same structure. Some engines unify them: PostgreSQL’s pg_class is the table catalog, accessed via the same heap+index machinery as user tables. Others split them: an internal “catalog manager” stores compact disk-representation records for engine use, and a parallel set of user-visible system classes (_db_class, _db_attribute, …) lets SQL queries inspect the schema. CUBRID picks the split design: the internal system_catalog records and the user-visible _db_* classes (maintained through the catcls_* functions) coexist, with the user-side tables driven from the internal records.
Once those choices are named, every CUBRID-specific structure in this document either implements one of them or makes the access faster.
Common DBMS Design
Every relational engine reaches for a similar set of patterns around catalog storage and bootstrap.
Self-describing storage on top of itself
The catalog is stored in heap files like any other class, but
the heap manager needs catalog records to interpret pages. The
chicken-and-egg problem is resolved by bootstrap classes with
hard-coded schemas the engine can interpret without consulting the
catalog. CUBRID’s root class is the seed; from the root class
the engine learns about _db_class, from _db_class it learns
about _db_attribute, and so on.
Disk representation vs. logical schema
Engines distinguish between the physical disk representation
(the byte order of attributes, fixed vs. variable, pad bytes)
and the logical schema (column names, types, constraints).
The disk representation can change without invalidating
existing rows by versioning: each class has a list of
representations indexed by REPR_ID, and every heap row carries
the REPR_ID it was written under. ALTER TABLE bumps the
representation; old rows decode with the old representation,
new rows with the new.
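
The versioning rule above can be sketched in a few lines of C. This is a minimal illustration, not CUBRID code: `toy_repr`, `toy_class`, `toy_find_repr`, and `toy_latest_repr` are hypothetical names, and the representation list is assumed to be a plain array ordered oldest first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the CUBRID definitions. */
typedef int toy_repr_id;

struct toy_repr
{
  toy_repr_id id;  /* representation identifier */
  int n_attrs;     /* number of attributes in this layout */
};

struct toy_class
{
  const struct toy_repr *reprs;  /* all representations, oldest first */
  int n_reprs;
};

/* Return the representation a row was written under, or NULL if unknown. */
const struct toy_repr *
toy_find_repr (const struct toy_class *cls, toy_repr_id row_repr_id)
{
  assert (cls != NULL);
  for (int i = 0; i < cls->n_reprs; i++)
    {
      if (cls->reprs[i].id == row_repr_id)
        {
          return &cls->reprs[i];
        }
    }
  return NULL;
}

/* New rows are written under the newest representation. */
const struct toy_repr *
toy_latest_repr (const struct toy_class *cls)
{
  assert (cls != NULL && cls->n_reprs > 0);
  return &cls->reprs[cls->n_reprs - 1];
}
```

A reader would call `toy_find_repr` with the id stored in the row, while a writer would call `toy_latest_repr`; an ALTER TABLE analogue simply appends one more entry to the array.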
Statistics as a separate, mutable record
Statistics are the optimizer’s input, not part of the schema:
they change continuously. Engines store them adjacent to the
catalog but with a different update cadence. PostgreSQL has
pg_statistic, InnoDB has mysql.innodb_index_stats, CUBRID
has CLS_INFO carrying per-class stats and BTREE_STATS per
index.
Cardinality estimation hierarchy
The optimizer asks at three granularities: per-class
(heap_num_objects, heap_num_pages), per-attribute
(n_distinct_values), per-index (B+Tree key count, leaf count,
height, partial-key cardinality for compound indexes). CUBRID’s
CLASS_STATS, ATTR_STATS, BTREE_STATS map one-to-one to
these levels.
Two access flavours: server-side and client-side
The server reads the catalog during query execution; the client
reads it during DDL parsing and schema introspection. CUBRID
ships parallel *_sr.c (server) and *_cl.c (client) sources
for statistics, with the _sr side authoritative.
Theory ↔ CUBRID mapping
| Theoretical concept | CUBRID name |
|---|---|
| Catalog identifier (boot anchor) | CTID { vfid, xhid, hpgid } (system_catalog.h); global catalog_Id |
| Disk representation of one class | DISK_REPR { id, n_fixed, fixed[], n_variable, variable[] } (system_catalog.h) |
| Per-attribute disk info | DISK_ATTR { id, location, type, val_length, value, position, classoid, n_btstats, bt_stats, ndv } |
| Per-class catalog info | CLS_INFO { ci_hfid, ci_tot_pages, ci_tot_objects, ci_time_stamp, ci_rep_dir } |
| Per-access transient state | CATALOG_ACCESS_INFO { class_oid, dir_oid, class_name, is_update, … } |
| User-visible system classes | _db_class, _db_attribute, _db_index, _db_domain, _db_method, … |
| Catalog-class entry-point family | catcls_* functions (catalog_class.c) |
| Catalog primary heap | catalog_Id.vfid — file holding catalog records |
| Catalog directory hash | catalog_Id.xhid — extendible-hash index class_oid → dir_oid |
| Catalog header page | catalog_Id.hpgid — fixed page id holding catalog metadata |
| Class statistics | CLASS_STATS { time_stamp, heap_num_objects, heap_num_pages, n_attrs, attr_stats[] } (statistics.h) |
| Per-attribute statistics | ATTR_STATS { id, type, n_btstats, bt_stats[], ndv } |
| Per-index statistics | BTREE_STATS { btid, leafs, pages, height, keys, has_function, key_type, pkeys[] } |
| Recovery functions for catalog | catalog_rv_new_page_redo, catalog_rv_insert_{redo,undo}, catalog_rv_delete_{redo,undo}, catalog_rv_update, catalog_rv_ovf_page_logical_insert_undo |
| Server boot | boot_restart_server (boot_sr.c) |
| Catalog start at access | catalog_start_access_with_dir_oid |
| Catalog end at access | catalog_end_access_with_dir_oid |
CUBRID’s Approach
The catalog has four moving parts: the catalog identifier and its three-part layout, the disk-representation records that capture per-class storage, the parallel user-visible system classes that surface the same data through SQL, and the statistics records, which are updated on a different cadence from the schema. We walk them in that order.
Overall structure
flowchart LR
subgraph BOOT["Bootstrap"]
ROOT["Root class (fixed OID)"]
BOOTFN["boot_restart_server"]
end
subgraph CTID["catalog_Id (CTID)"]
VFID["vfid: catalog heap file"]
XHID["xhid: extendible hash class_oid → dir_oid"]
HPGID["hpgid: catalog header page"]
end
subgraph CR["Catalog records (system_catalog.c)"]
DR1["DISK_REPR for class A repr 1"]
DR2["DISK_REPR for class A repr 2"]
CLI["CLS_INFO for class A"]
CR1["..."]
end
subgraph SC["User-visible system classes (catcls)"]
DBC["_db_class"]
DBA["_db_attribute"]
DBI["_db_index"]
DBD["_db_domain"]
DBT["_db_data_type"]
end
subgraph ST["Statistics"]
CS["CLASS_STATS"]
AS["ATTR_STATS"]
BS["BTREE_STATS"]
end
BOOTFN --> ROOT
BOOTFN --> CTID
CTID --> CR
ROOT --> SC
SC -.synchronised with.-> CR
CR -.cardinality from.-> ST
ST --> CS --> AS --> BS
The figure encodes three boundaries. (boot / runtime) the
root class is the only OID the engine knows ahead of time;
everything else is reachable from it. (internal / user-visible)
catalog records under catalog_Id are compact disk
representations the engine reads on hot paths; system classes
under _db_* are the schema-introspection face SQL queries
hit. (schema / stats) representations describe shape and do
not change often; statistics describe size and change every
analyze.
CTID — the catalog’s three-part identity
The catalog itself is a small data structure with three pointers:

    // CTID, from src/storage/system_catalog.h
    struct ctid
    {
      VFID vfid;     /* catalog volume identifier: heap file holding catalog records */
      EHID xhid;     /* extendible hash index identifier: class_oid → dir_oid map */
      PAGEID hpgid;  /* catalog header page identifier */
    };
    typedef struct ctid CTID;

    extern CTID catalog_Id;  /* global catalog identifier */

The three components correspond to three on-disk objects:
- vfid: a heap file that stores the catalog’s DISK_REPR and CLS_INFO records. It is treated like any other heap by the heap manager (cubrid-heap-manager.md), with one extra invariant: catalog records must never be vacuumed away while any transaction can still see the class they describe.
- xhid: an extendible-hash index keyed on class_oid, pointing to the directory record OID for that class. The directory record in turn lists the OIDs of all DISK_REPR records for the class (one per representation, plus one for CLS_INFO).
- hpgid: the catalog header page, holding global catalog metadata (version, last-allocated representation id, etc.). Its page id is fixed at boot time and never changes.

catalog_initialize (system_catalog.c) populates the
global catalog_Id from these three values during boot. From
that moment, every catalog access starts with catalog_Id and
descends into one of the three components.
Disk representation — one record per attribute layout
Per class, per representation:

    // DISK_REPR, from src/storage/system_catalog.h
    struct disk_representation
    {
      REPR_ID id;                       /* representation identifier */
      int n_fixed;                      /* number of fixed-length attributes */
      struct disk_attribute *fixed;     /* fixed attribute structures */
      int fixed_length;                 /* total length of fixed attrs */
      int n_variable;                   /* number of variable-length attrs */
      struct disk_attribute *variable;  /* variable attribute structures */
    };

The split between fixed and variable is the on-disk layout
choice: fixed-length attributes pack tightly with no
per-attribute offset overhead, variable-length attributes carry
an offset table at the front of the row. Iterating over
disk_representation::fixed[] and ::variable[] is exactly the
order the heap manager uses to interpret a row.
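
To make the fixed/variable distinction concrete, here is a minimal sketch of decoding one attribute of each kind. The row layout is an assumption made for illustration only (offset table first, then the fixed area, then the variable area, with `uint16_t` offsets measured from the start of the row); the real CUBRID record format is not reproduced here, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy row: [offset table][fixed area][variable area].
   The offset table has n_variable + 1 uint16_t entries; entry i is where
   variable value i begins and entry i + 1 is where it ends. */
struct toy_layout
{
  int n_variable;  /* number of variable-length attributes */
};

/* Byte position where the fixed area starts in the toy row. */
static size_t
toy_fixed_base (const struct toy_layout *l)
{
  return (size_t) (l->n_variable + 1) * sizeof (uint16_t);
}

/* Fixed attribute: `location` is an exact byte offset in the fixed area. */
int32_t
toy_read_fixed_int (const struct toy_layout *l, const uint8_t *row, int location)
{
  int32_t v;
  assert (l != NULL && row != NULL && location >= 0);
  memcpy (&v, row + toy_fixed_base (l) + (size_t) location, sizeof v);
  return v;
}

/* Variable attribute: `location` is an index into the offset table. */
const uint8_t *
toy_read_variable (const struct toy_layout *l, const uint8_t *row,
                   int location, size_t *len)
{
  uint16_t begin, end;
  assert (l != NULL && row != NULL && len != NULL);
  assert (location >= 0 && location < l->n_variable);
  memcpy (&begin, row + (size_t) location * sizeof (uint16_t), sizeof begin);
  memcpy (&end, row + (size_t) (location + 1) * sizeof (uint16_t), sizeof end);
  *len = (size_t) (end - begin);
  return row + begin;
}
```

The fixed read costs one address computation, while the variable read needs two table lookups to find the value's bounds, which is the trade-off the paragraph above describes.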
Each attribute carries:

    // DISK_ATTR, from src/storage/system_catalog.h
    struct disk_attribute
    {
      ATTR_ID id;             /* attribute identifier */
      int location;           /* fixed: exact offset; variable: index into the offset table */
      DB_TYPE type;           /* int / varchar / float / … */
      int val_length;         /* default value length, ≥ 0 */
      void *value;            /* default value (no default expression) */
      int position;           /* storage position (fixed only) */
      OID classoid;           /* source class, for inherited attrs */
      int n_btstats;          /* number of B+tree statistics */
      BTREE_STATS *bt_stats;  /* per-index stats array */
      INT64 ndv;              /* Number of Distinct Values */
    };

Two fields worth marking up. (classoid) for inherited
attributes, the classoid distinguishes which class along the
inheritance chain originally defined the attribute. The
optimizer uses this to avoid double-counting inherited
attributes when computing class cardinality. (bt_stats[] and
ndv) statistics live inside the attribute record, not in a
separate file. This is the trade-off: UPDATE STATISTICS rewrites the
attribute record, which is heavier than a separate stats table
would be, but reading the disk representation gives the
optimizer everything in one fetch.
Per-class info — the heap pointer and rough counts
CLS_INFO is the per-class summary record:

    // CLS_INFO, from src/storage/system_catalog.h
    struct cls_info
    {
      HFID ci_hfid;                /* heap file identifier for the class */
      int ci_tot_pages;            /* total pages in the heap file */
      int ci_tot_objects;          /* total live objects */
      unsigned int ci_time_stamp;  /* timestamp of last update */
      OID ci_rep_dir;              /* representation directory record OID */
    };

ci_hfid is the most-read field. Every query that scans a
class starts by fetching the class’s CLS_INFO from the catalog,
reading ci_hfid, and handing the heap file to the scan
manager.
ci_rep_dir is the back-pointer from CLS_INFO to the directory
record listing all representations of this class. Following the
chain class_oid → xhid → dir_oid → DISK_REPR is the standard
lookup; following CLS_INFO::ci_rep_dir → DISK_REPR is the
inverse for traversal during ALTER.
ci_time_stamp is the cache-validation token: the optimizer
caches CLS_INFO in process memory and invalidates the cache
when ci_time_stamp advances.
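
The timestamp check can be sketched as follows. This is an illustrative model under stated assumptions, not the optimizer's actual cache: the `toy_*` types are hypothetical, the heap file id is reduced to an `int`, and any timestamp mismatch (not only an advance) is treated as stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified copy of the per-class summary. */
struct toy_cls_info
{
  int hfid;                 /* stand-in for the heap file identifier */
  int tot_pages;
  int tot_objects;
  unsigned int time_stamp;  /* cache-validation token */
};

struct toy_cache_entry
{
  bool valid;               /* false until the first load */
  struct toy_cls_info info;
};

/* True when the cached copy carries the catalog's current timestamp. */
bool
toy_cache_is_fresh (const struct toy_cache_entry *e, unsigned int current_time_stamp)
{
  assert (e != NULL);
  return e->valid && e->info.time_stamp == current_time_stamp;
}

/* Reload the entry from the catalog copy if it is stale.
   Returns true when a reload actually happened. */
bool
toy_cache_refresh (struct toy_cache_entry *e, const struct toy_cls_info *catalog_copy)
{
  assert (e != NULL && catalog_copy != NULL);
  if (toy_cache_is_fresh (e, catalog_copy->time_stamp))
    {
      return false;
    }
  e->info = *catalog_copy;
  e->valid = true;
  return true;
}
```

The point of the token is that a reader only has to compare one integer to decide whether the whole cached record can be reused.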
Catalog access — the per-access state
Every catalog read or write goes through a CATALOG_ACCESS_INFO
session:

    // CATALOG_ACCESS_INFO, from src/storage/system_catalog.h
    struct catalog_access_info
    {
      OID *class_oid;
      OID *dir_oid;               /* cached after first xhid lookup */
      char *class_name;
      bool is_update;             /* update access, needs X locks */
      bool need_unlock;           /* unlock at end-access time */
      bool access_started;        /* guard against double-start */
      bool need_free_class_name;
    #if !defined (NDEBUG)
      bool is_systemop_started;
    #endif
    };

The session is opened with catalog_start_access_with_dir_oid
and closed with catalog_end_access_with_dir_oid. Between the
two, the caller has the directory OID cached, the class lock
acquired (S for read, X for update), and (in update sessions) a
system-op bracket open so partial catalog updates can be rolled
back as a logical unit. The debug-only is_systemop_started
field assert-checks this discipline.
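
The bracket discipline can be modelled with a small state machine. This is a sketch only: the `toy_*` names are hypothetical, lock acquisition and the system op are reduced to flags, and releasing the lock at end-access is an assumption of the model rather than a statement about when CUBRID releases class locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_lock { TOY_NO_LOCK, TOY_S_LOCK, TOY_X_LOCK };

/* Hypothetical per-access state, loosely mirroring the fields above. */
struct toy_access
{
  bool is_update;
  bool access_started;  /* guard against double-start */
  bool sysop_started;   /* only set for update sessions */
  enum toy_lock held;
};

/* Open a session. Returns 0 on success, -1 if one is already open. */
int
toy_start_access (struct toy_access *a, bool is_update)
{
  assert (a != NULL);
  if (a->access_started)
    {
      return -1;
    }
  a->is_update = is_update;
  a->held = is_update ? TOY_X_LOCK : TOY_S_LOCK;
  a->sysop_started = is_update;  /* updates run inside a system-op bracket */
  a->access_started = true;
  return 0;
}

/* Close a session. Returns 0 on success, -1 if none is open. */
int
toy_end_access (struct toy_access *a)
{
  assert (a != NULL);
  if (!a->access_started)
    {
      return -1;
    }
  a->sysop_started = false;
  a->held = TOY_NO_LOCK;
  a->access_started = false;
  return 0;
}
```

Keeping the guard flag inside the session struct means a mismatched start or end is detected locally, without consulting the lock manager.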
Catalog-class machinery — user-visible schema
Parallel to catalog_Id-rooted records, CUBRID maintains a set
of user-visible classes that mirror the same data through SQL.
The classes are conventionally named _db_*:
- _db_class: one row per class, with name, OID, owner, type (table / view / partition), creation time.
- _db_attribute: one row per attribute of a class.
- _db_index: one row per index.
- _db_domain: one row per type domain (used for compound domains).
- _db_data_type: system data-type catalogue.
- _db_method, _db_meth_arg, _db_meth_file, _db_method_sig: for OODB methods.
- _db_partition: partitioning info.
- _db_trigger: triggers.
- _db_serial: sequences.
- _db_collation: collation catalogue.
- _db_charset, _db_servers, _db_user, _db_auth, _db_password, _db_synonym, …: auxiliary.

The catcls_* family in catalog_class.c is the bridge:

    // from src/storage/catalog_class.h
    extern bool catcls_Enable;

    int catcls_compile_catalog_classes (THREAD_ENTRY *thread_p);
    int catcls_insert_catalog_classes (THREAD_ENTRY *thread_p, RECDES *record);
    int catcls_delete_catalog_classes (THREAD_ENTRY *thread_p, const char *name, OID *class_oid);
    int catcls_update_catalog_classes (THREAD_ENTRY *thread_p, const char *name, RECDES *record,
                                       OID *class_oid_p, UPDATE_INPLACE_STYLE force_in_place);
    int catcls_finalize_class_oid_to_oid_hash_table (THREAD_ENTRY *thread_p);
    int catcls_remove_entry (THREAD_ENTRY *thread_p, OID *class_oid);
    int catcls_get_server_compat_info (THREAD_ENTRY *thread_p, INTL_CODESET *charset_id_p, char *lang_buf,
                                       const int lang_buf_size, char *timezone_checksum);
    int catcls_get_db_collation (THREAD_ENTRY *thread_p, LANG_COLL_COMPAT **db_collations, int *coll_cnt);
    int catcls_update_class_stats (THREAD_ENTRY *thread_p, const char *class_name,
                                   unsigned int ci_time_stamp, bool with_fullscan);

When DDL creates or alters a class, the engine writes an internal
DISK_REPR to the catalog and inserts a row into _db_class (and
related rows into _db_attribute, _db_index, …). The two writes
are bracketed in a single transaction so they commit together; if
they diverge (e.g., crash between writes), the recovery pass rolls
the partial work back as one unit. Whether concurrent readers can
ever observe the two faces out of step under lock-mode escalation
is open (see Open Q3).
catcls_compile_catalog_classes is the function that originally
builds the system classes from a hard-coded schema; it runs at
install time. (The neighbouring symbol
catcls_insert_catalog_classes is the row-insert entry point used
at DDL time, not the install-time compiler.) The schema source lives in
src/object/schema_system_catalog_install.cpp (cubrid AGENTS.md
mentions strict formatting rules there).
Bootstrap — boot_restart_server and the root class
boot_restart_server (boot_sr.c) is the post-recovery
entry point that brings the catalog online:
1. Initialize log and recover. log_initialize runs the three-pass restart (cubrid-recovery-manager.md).
2. Initialize disk and file managers.
3. Read catalog_Id from the database parameter file (the boot_DB_parm record) plus the on-disk header; this gives (vfid, xhid, hpgid).
4. catalog_initialize (catalog_Id). Sets up in-memory structures.
5. Bind the root class. The root class has a fixed OID stored in the boot parameters; the engine reads its DISK_REPR from the catalog and primes the metaclass cache.
6. Bind the system classes. Walking from the root class, _db_class, _db_attribute, etc. are loaded with the class-name ↔ OID mapping cached in catcls_class_oid_to_oid_hash_table.
7. Initialize statistics caches.
8. Start vacuum master + workers (cubrid-vacuum.md).

After step 8 the server is ready to accept queries.
boot_DB_parm is the on-disk parameter record; updates to it
flow through catcls_update_catalog_classes so they’re durable.
Statistics — separate cadence, separate access
Statistics are part of the same DISK_ATTR record but updated on
a different cadence: every UPDATE STATISTICS or fullscan. The
statistics structures:

    // from src/storage/statistics.h
    struct btree_stats
    {
      BTID btid;
      int leafs;            /* leaf pages including overflow */
      int pages;            /* total pages */
      int height;           /* tree depth */
      int keys;             /* distinct keys */
      int has_function;     /* function index? */
      TP_DOMAIN *key_type;
      int pkeys_size;       /* compound-key partial-cardinality array */
      int *pkeys;           /* pkeys[k] = NDV of first k+1 components */
      int dedup_idx;        /* SUPPORT_DEDUPLICATE_KEY_MODE */
    };

    struct attr_stats
    {
      int id;
      DB_TYPE type;
      int n_btstats;
      BTREE_STATS *bt_stats;
      INT64 ndv;            /* Number of Distinct Values */
    };

    struct class_stats
    {
      unsigned int time_stamp;
      int heap_num_objects;
      int heap_num_pages;
      int n_attrs;
      ATTR_STATS *attr_stats;
    };

The pkeys[] array is worth marking up. For a compound
index (a, b, ..., x) of size pkeys_size = k, pkeys[i] is
the cardinality of the first i+1 columns. The optimizer uses
this to estimate selectivity for queries that filter on a prefix
of the index — without it, every prefix query would have to
assume independent column distributions.
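
As a worked illustration of how a prefix cardinality might feed an estimate, the sketch below divides the row count by the number of distinct prefix values. This is the textbook uniform-distribution estimate for an equality filter, chosen here for illustration; it is not taken from the CUBRID optimizer, and `toy_prefix_rows` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated rows matching an equality filter on the first n_prefix columns
   of a compound index, assuming values are spread uniformly.
   pkeys[i] is the number of distinct values of the first i + 1 columns. */
double
toy_prefix_rows (const int *pkeys, int pkeys_size, int n_prefix, double num_objects)
{
  int ndv;
  assert (pkeys != NULL && pkeys_size > 0 && n_prefix > 0);
  if (n_prefix > pkeys_size)
    {
      /* No statistic is tracked past the end of the array;
         fall back to the longest tracked prefix. */
      n_prefix = pkeys_size;
    }
  ndv = pkeys[n_prefix - 1];
  if (ndv <= 0)
    {
      return num_objects;  /* no usable statistic: assume no filtering */
    }
  return num_objects / (double) ndv;
}
```

With 10,000 rows and `pkeys = {10, 200, 1000}`, a filter on the first column alone is estimated at 1,000 rows, on the first two columns at 50 rows, and on all three at 10 rows.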
STATS_SAMPLING_THRESHOLD = 5000 and NUMBER_OF_SAMPLING_PAGES = 5000 (declared in statistics.h) are the sampling defaults;
full-scan mode (STATS_WITH_FULLSCAN) is the alternative. The
stats_adjust_sampling_weight inline (in statistics.h)
applies a differential weight when sampling NDV is below 1% of
expected; the assumption is that “if the sample data is a lot of
duplicated, there will also be duplicate in the overall data”.
Server-side statistics access goes through statistics_sr.c;
client-side (the SQL interface) through statistics_cl.c plus
stats_get_statistics (declared in statistics.h,
SERVER_MODE-disabled).
One ALTER, end to end
sequenceDiagram
    participant CL as DDL parser
    participant CAT as catalog (system_catalog)
    participant CC as catalog_class (catcls_*)
    participant LM as log_manager (sysop)
    participant LCK as lock_manager
    CL->>LCK: X-lock class_oid
    CL->>LM: log_sysop_start (DDL atomic)
    CL->>CAT: catalog_start_access_with_dir_oid (X)
    CL->>CAT: catalog_add_representation (new DISK_REPR with new REPR_ID)
    CL->>CAT: catalog_update_class_info (CLS_INFO ci_time_stamp bumped)
    CL->>CC: catcls_update_catalog_classes (_db_class row)
    CL->>CC: catcls_update_catalog_classes (_db_attribute rows)
    CL->>CAT: catalog_end_access_with_dir_oid
    CL->>LM: log_sysop_commit
    CL->>LCK: X-lock release at commit
The two catalog faces — internal records and user-visible system classes — are updated in the same system op, so a crash mid-update either rolls them both back or keeps them both. The DDL transaction commits as a single atomic unit even though it touched ~10 different files.
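
The all-or-nothing property can be illustrated with a toy model in which the system op is reduced to "save an undo image, restore it on failure". This is a sketch under that simplifying assumption; real system ops are log-based, and `toy_catalog` and `toy_alter` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two catalog faces. */
struct toy_catalog
{
  int internal_repr_id;      /* internal face: latest representation id */
  int sysclass_row_version;  /* user-visible face: stand-in for the system-class row */
};

/* Apply both writes or neither. `fail_second` simulates an error
   between the two writes. Returns true on commit, false on abort. */
bool
toy_alter (struct toy_catalog *c, int new_repr_id, bool fail_second)
{
  struct toy_catalog saved;
  assert (c != NULL);
  saved = *c;                            /* system-op start: keep an undo image */
  c->internal_repr_id = new_repr_id;     /* write 1: internal record */
  if (fail_second)
    {
      *c = saved;                        /* system-op abort: undo write 1 */
      return false;
    }
  c->sysclass_row_version = new_repr_id; /* write 2: user-visible row */
  return true;                           /* system-op commit */
}
```

After a failed call both fields keep their old values, and after a successful call both carry the new one, so an observer of the committed state never sees the two faces disagree.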
Source Walkthrough
Anchor on symbol names, not line numbers.
Headers and types
- CTID (system_catalog.h): catalog identifier.
- DISK_REPR / DISK_ATTR (system_catalog.h): disk representation records.
- CLS_INFO (system_catalog.h): per-class summary.
- CATALOG_ACCESS_INFO (system_catalog.h): per-access session.
- CATALOG_DIR_REPR_KEY = -2 macro (system_catalog.h): directory key sentinel.
- BTREE_STATS / ATTR_STATS / CLASS_STATS (statistics.h).
- BTREE_STATS_PKEYS_NUM = 8 macro (statistics.h): compound-key array bound.
- STATS_SAMPLING_THRESHOLD / NUMBER_OF_SAMPLING_PAGES (statistics.h).

Lifecycle
- catalog_initialize (system_catalog.c).
- catalog_finalize (system_catalog.c).
- catalog_create (system_catalog.c): first-time setup; called only by root.
- catalog_destroy (system_catalog.c): drop catalog.
- catalog_reclaim_space (system_catalog.c): compact fragmented catalog records.

Read access
- catalog_get_class_info (system_catalog.c).
- catalog_get_representation (system_catalog.c).
- catalog_get_representation_directory (system_catalog.c).
- catalog_get_last_representation_id (system_catalog.c).
- catalog_get_class_info_from_record (system_catalog.c): decode CLS_INFO from a heap record.
- catalog_get_dir_oid_from_cache (system_catalog.c): cache-aware lookup.

Write access
- catalog_add_representation (system_catalog.c).
- catalog_add_class_info (system_catalog.c).
- catalog_update_class_info (system_catalog.c).
- catalog_drop_old_representations (system_catalog.c).
- catalog_insert / catalog_update / catalog_delete (system_catalog.c): generic record-level ops.

Session bracket
- catalog_start_access_with_dir_oid (system_catalog.c).
- catalog_end_access_with_dir_oid (system_catalog.c).

Recovery functions
- catalog_rv_new_page_redo, catalog_rv_insert_redo / _undo, catalog_rv_delete_redo / _undo, catalog_rv_update, catalog_rv_ovf_page_logical_insert_undo (declared in system_catalog.h, defined in system_catalog.c).

Cardinality
- catalog_get_cardinality (system_catalog.c).
- catalog_get_cardinality_by_name (system_catalog.c).

User-visible system classes
- catcls_compile_catalog_classes (catalog_class.c): install-time schema build.
- catcls_insert_catalog_classes (catalog_class.c).
- catcls_update_catalog_classes (catalog_class.c).
- catcls_delete_catalog_classes (catalog_class.c).
- catcls_remove_entry (catalog_class.c).
- catcls_get_server_compat_info (catalog_class.c): charset / locale / timezone compatibility check at boot.
- catcls_get_db_collation (catalog_class.c).
- catcls_update_class_stats (catalog_class.c).
- catcls_finalize_class_oid_to_oid_hash_table (catalog_class.c).
- catcls_find_and_set_cached_class_oid (catalog_class.c).

Boot
- boot_restart_server (boot_sr.c): main boot entry.
- The boot parameter record (boot_DB_parm): disk-resident database parameters including catalog OIDs.

Statistics
- stats_get_statistics (statistics.h, defined in statistics_cl.c): client-side fetch.
- stats_dump / stats_ndv_dump (statistics.h): debugging.
- stats_make_select_list_for_ndv (statistics.h).
- stats_get_ndv_by_query (statistics.h).
- stats_adjust_sampling_weight (statistics.h, inline).

Position hints as of 2026-04-30
| Symbol | File | Line |
|---|---|---|
| CTID (struct) | system_catalog.h | 45 |
| DISK_REPR (struct) | system_catalog.h | 63 |
| DISK_ATTR (struct) | system_catalog.h | 80 |
| CLS_INFO (struct) | system_catalog.h | 96 |
| CATALOG_ACCESS_INFO (struct) | system_catalog.h | 106 |
| catalog_Id (extern global) | system_catalog.h | 153 |
| BTREE_STATS (struct) | statistics.h | 61 |
| ATTR_STATS (struct) | statistics.h | 82 |
| CLASS_STATS (struct) | statistics.h | 93 |
| stats_adjust_sampling_weight (inline) | statistics.h | 135 |
| catalog_initialize | system_catalog.c | 2577 |
| catalog_finalize | system_catalog.c | 2607 |
| catalog_get_class_info_from_record | system_catalog.c | 504 |
| catalog_initialize_max_space | system_catalog.c | 549 |
| catalog_initialize_new_page | system_catalog.c | 598 |
| catalog_add_representation | system_catalog.c | 2815 |
| catalog_add_class_info | system_catalog.c | 3029 |
| catalog_update_class_info | system_catalog.c | 3172 |
| catalog_get_class_info | system_catalog.c | 4113 |
| catcls_insert_catalog_classes | catalog_class.c | 4310 |
| boot_restart_server | boot_sr.c | 1969 |
Source verification (as of 2026-04-30)
Verified facts
- Catalog source files are system_catalog.{c,h} and catalog_class.{c,h}, not catalog_manager.{c,h}. Verified by find -name 'catalog*'. The skeleton’s references: list named catalog_manager.{c,h} (which doesn’t exist); corrected at draft time. The misnomer comes from the vendor decks calling the subsystem “catalog manager” in prose.
- The catalog identifier CTID is a triple of (vfid, xhid, hpgid). Verified by reading the CTID struct in system_catalog.h. The three correspond to a heap file (catalog records), an extendible-hash index (class_oid → dir_oid), and a fixed header page.
- catalog_Id is a global, not per-thread state. Verified by the extern CTID catalog_Id declaration in system_catalog.h. The identifier is set once at boot (catalog_initialize) and never changes for the lifetime of the server.
- DISK_REPR splits attributes into fixed and variable arrays, not a single ordered list. Verified by the disk_representation struct in system_catalog.h (separate n_fixed / fixed[] and n_variable / variable[] fields). Implication: row decoding reads fixed attributes by exact offset, variable attributes via the per-row offset table.
- Statistics live inline on the attribute record, not in a separate file. Verified by the n_btstats, bt_stats, and ndv fields on DISK_ATTR in system_catalog.h. Cost: stats updates rewrite the attribute record. Benefit: optimizer reads schema and stats in one fetch.
- CATALOG_ACCESS_INFO carries a debug-only is_systemop_started field. Verified by the NDEBUG-conditional field on the catalog_access_info struct in system_catalog.h. Its purpose is to assert that update-mode catalog accesses are properly bracketed in a system op.
- The user-visible system classes are populated through the catcls_* family in catalog_class.c, separate from the internal catalog_* in system_catalog.c. Verified by reading catalog_class.h (the entire header is small, under ~50 lines) and grep-finding catcls_insert_catalog_classes in catalog_class.c. The two surfaces share the same transaction so DDL is atomic across both.
- catcls_Enable is a global toggle for catalog-class maintenance. Verified by the extern bool catcls_Enable declaration in catalog_class.h. When false, the system classes aren’t kept in sync; this is used during installation and migration.
- CLS_INFO::ci_time_stamp is the cache-validation token. Verified by the ci_time_stamp field on the cls_info struct in system_catalog.h. The optimizer caches CLS_INFO in process memory; cache invalidation compares stored and current timestamps.
- Seven recovery functions handle catalog log records. Verified by the recovery-function declarations in system_catalog.h: catalog_rv_new_page_redo, catalog_rv_insert_redo, catalog_rv_insert_undo, catalog_rv_delete_redo, catalog_rv_delete_undo, catalog_rv_update, catalog_rv_ovf_page_logical_insert_undo. The last one is notable: overflow-page insertion has a logical undo (the redo would replay page allocation; logical undo de-allocates the page through the file manager).
- Default sampling page count is 5000 with a sampling threshold of 5000. Verified by the sampling constants in statistics.h. STATS_SAMPLING_THRESHOLD = 5000 is the trial count; NUMBER_OF_SAMPLING_PAGES = 5000 is the page budget; EXPECTED_ROWS_PER_PAGE = 20 is the fan-out assumption.
- The compound-key partial-cardinality array pkeys[] is sized at 8 by default. Verified by the BTREE_STATS_PKEYS_NUM = 8 macro in statistics.h. Compound indexes deeper than 8 columns lose per-prefix selectivity tracking past the 8th.

Open questions
1. Where exactly does the root class’s OID live on disk? The boot_DB_parm record holds boot parameters, but the root class’s OID is one specific field there. Investigation path: read boot_sr.c around line 1969 (boot_restart_server) and trace where it loads the root-class OID.
2. Catalog overflow-page logical-undo discipline. catalog_rv_ovf_page_logical_insert_undo is logical, but what’s the sequence of file-manager calls during the undo path that ensures the overflow page returns cleanly? Investigation path: read its body and chase file_dealloc_page calls.
3. Synchronisation between internal catalog and user system classes. A DDL must update both in lockstep. What prevents another reader from observing the internal catalog updated but _db_class not yet? Investigation path: trace lock acquisition order in DDL paths; check whether the catalog X-lock covers both faces.
4. catcls_update_class_stats cadence. Stats updates flow through this function. Is it synchronous with the SQL UPDATE STATISTICS command, or is there a background sweep? Investigation path: grep callers; check for daemon registration.
5. Catalog-class cache invalidation across servers in HA. On a slave server, when a master DDL is replayed, does the slave invalidate its catalog-class hash table? Investigation path: cubrid-cdc.md plus catcls_finalize_class_oid_to_oid_hash_table.
6. catalog_reclaim_space cadence and triggers. Catalog compaction is presumably rare, but the trigger isn’t named in the header. Investigation path: grep for callers; check for use in boot_restart_server or a background daemon.

Beyond CUBRID — Comparative Designs and Research Frontiers
Pointers, not analysis.
- PostgreSQL pg_class: single catalog table, accessed through normal heap+index machinery. Bootstrap via genbki.pl script + pg_*_d.h macros. CUBRID’s split design trades unification for a more compact internal record format the optimizer reads directly.
- MySQL data dictionary (8.0+): stored in InnoDB tables since 8.0; before that, table definitions lived in .frm files. CUBRID’s split predates this and is conceptually closer to pre-8.0 MySQL (a binary structure separate from the SQL face).
- Oracle’s bootstrap segment: single-row obj$ seed read at instance start. CUBRID’s boot_DB_parm plus root-class OID is the same idea with two anchors.
- Schema versioning by REPR_ID pursues the same goal as PostgreSQL’s handling of ALTER TABLE ... ADD COLUMN: old rows stay readable without rewriting the whole table immediately. The mechanism differs. CUBRID stamps each row with the REPR_ID it was written under and keeps one DISK_REPR per version. PostgreSQL keeps no per-row schema version: each tuple header records how many attributes the tuple physically has, and trailing columns missing from an old tuple read as NULL or, since PostgreSQL 11, as the default saved in pg_attribute.attmissingval. (pg_attribute.atttypid is only the column’s type OID, not a version.)
- InnoDB’s mysql.innodb_index_stats: separate table for per-index stats. Compared to CUBRID’s inline bt_stats[], this is heavier to query but lighter to update.
- HyPer / Vectorwise compressed catalogs: research engines that compress the catalog structure for in-memory caching. CUBRID’s DISK_REPR is already compact; in-memory variants could collapse it further.

Sources
Raw analyses (raw/code-analysis/cubrid/storage/catalog_manager/)
- 1._Catalog_Overview.pdf
- 2._Root_Class.pdf
- 3._System_Catalog_n_Statistics.pdf
- 4._Catalog_Classes_n_boot_DB_parm.pdf
- cls_info_rec.pptx
- CUBRID Catalog Access.pptx

Sibling docs
- knowledge/code-analysis/cubrid/cubrid-heap-manager.md: heap files the catalog records live on.
- knowledge/code-analysis/cubrid/cubrid-btree.md: BTREE_STATS consumers; index-stats source.
- knowledge/code-analysis/cubrid/cubrid-recovery-manager.md: catalog catalog_rv_* functions in RV_fun[].
- knowledge/code-analysis/cubrid/cubrid-log-manager.md: system-op bracket discipline DDL uses.
- knowledge/code-analysis/cubrid/cubrid-cdc.md: DDL events surfaced from catalog mutations; in-progress in the same batch.

Textbook chapters (under knowledge/research/dbms-general/)
- Database Internals (Petrov), Ch. 1 §“Database storage” (boot anchors), Ch. 7 §“Storage Engines” (catalog as metadata).

CUBRID source (/data/hgryoo/references/cubrid/)
- src/storage/system_catalog.{c,h}
- src/storage/catalog_class.{c,h}
- src/storage/statistics.h, statistics_{cl,sr}.{c,h}
- src/transaction/boot_sr.{c,h}
- src/object/schema_system_catalog_install.cpp: install-time hard-coded schema for the system classes (CUBRID AGENTS.md §“Add info schema view”).