CUBRID DDL & Schema — Section Overview
Contents:
- What this section covers
- The schema graph
- Reading order
- Cross-cutting concerns
- Detail-doc summaries
- Adjacent sections
What this section covers
This section documents CUBRID’s schema layer — the band that sits above the storage engine (heap, B-tree, MVCC, lock manager) and below the executor (XASL, scan procedures, query optimizer). The schema layer is where the database’s self-description is created, mutated, read, and policed: every CREATE/ALTER/DROP statement passes through it, every DML statement reads from it to learn the row shape, every plan compile consults it to learn the access paths and the distributions of the columns, and every privilege check terminates inside it. When the schema layer changes its mind — when a column is added, a trigger fires, an owner is rotated, or statistics are refreshed — that decision propagates outward through the XASL cache, the workspace MOP table, the locator’s grant cache, and ultimately into the next compiled plan. The six detail documents under this subcategory describe how that propagation works end-to-end. This overview does not repeat their content; it links and orients.
The schema layer is responsible for three things that are kept strictly separate in CUBRID’s source tree, even though they describe the same logical “class”:
- A pipeline that turns a parse tree into an applied schema change without exposing partial state to concurrent readers (cubrid-ddl-execution.md).
- An on-disk projection that lets SELECT * FROM _db_class work just like any other table — and that survives crash recovery through the WAL because catalog rows are stored in heap pages like user data (cubrid-catalog-manager.md).
- An in-memory graph that the executor and the locator can traverse cheaply, without hitting disk on every column lookup (cubrid-class-object.md).
Three further documents describe the cross-cutting features that hang
off the schema graph: triggers fire on DML against any class
(cubrid-trigger.md); authorization checks gate every fetch
(cubrid-authentication.md); and statistics, which the catalog
stores as a side-table on each disk representation, feed the cost
optimizer (cubrid-statistics.md).
The schema graph
Three primary layers move every schema change through the engine.
- DDL execution (cubrid-ddl-execution.md) — the pipeline from a PT_CREATE_ENTITY / PT_ALTER / PT_DROP parse-tree node to a committed change. It is realised by do_create_entity / do_alter / do_drop in execute_schema.c, which build an SM_TEMPLATE via dbt_create_class / smt_def_class, finish it through sm_finish_class → update_class → install_new_representation, and then call sm_bump_local_schema_version so caches downstream see the new shape. A failed transaction rolls back through the same WAL the storage engine uses for user data.
- On-disk catalog (cubrid-catalog-manager.md) — the representation that survives SHUTDOWN. CUBRID has a split catalog: a compact, engine-internal system_catalog keyed by (class OID, repr id) and rooted at a fixed root-class OID; and a parallel set of user-visible system classes (_db_class, _db_attribute, _db_index, _db_method, _db_partition, _db_trigger, …) accessed through the same heap+B-tree machinery as user data. Every committed DDL writes both views.
- In-memory schema (cubrid-class-object.md) — the live graph the executor walks at every reference resolution. An SM_CLASS object lives in client-side workspace memory, hung off a MOP, and carries the SM_ATTRIBUTE chain, the SM_REPRESENTATION history needed to read old rows after ALTER, the partition graph, the constraint list, and the TR_TRIGGER cache. catcls_* mediates between this graph and the on-disk catalog.
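The template-then-install pattern in the first bullet can be sketched in a few lines of C. Everything below — toy_class, toy_template, the function names — is an invented toy model of the behaviour described above, not the real SM_TEMPLATE API: edits land on a detached copy, a single swap installs them, and a version counter tells downstream caches to re-validate.

```c
/* Toy model of the DDL staging pattern: edits go to a detached
 * template, then one swap installs them, so concurrent readers
 * never see a half-applied class.  All names are illustrative. */
#include <assert.h>
#include <string.h>

typedef struct {
    int  n_attrs;
    char attrs[8][32];
    int  repr_id;             /* bumps on every installed change */
} toy_class;

typedef struct {
    toy_class staged;         /* private copy; readers never see it */
} toy_template;

static int schema_version = 0;  /* stands in for the local schema version */

/* dbt_create_class-style: start a template from the live class. */
toy_template template_begin(const toy_class *live) {
    toy_template t;
    t.staged = *live;         /* struct copy = detached staging area */
    return t;
}

/* smt_def_class-style edit: add an attribute to the template only. */
void template_add_attr(toy_template *t, const char *name) {
    strncpy(t->staged.attrs[t->staged.n_attrs++], name, 31);
}

/* sm_finish_class-style: install atomically, then bump the version. */
void template_finish(toy_template *t, toy_class *live) {
    t->staged.repr_id = live->repr_id + 1;  /* new representation */
    *live = t->staged;        /* single swap: no partial state */
    schema_version++;         /* downstream caches must re-validate */
}
```

The key property the sketch preserves is that between template_begin and template_finish the live class is untouched; a failed DDL simply discards the template.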
Three cross-cutting concerns hook into all three primary layers.
- Triggers (cubrid-trigger.md) — SQL-99 ECA active rules attached to a class. They are stored as _db_trigger rows, cached on the SM_CLASS, and fired from the locator at DML time via tr_prepare_class / tr_before_object / tr_after_object.
- Authorization (cubrid-authentication.md) — users, passwords, and per-class privilege bitmaps stored as MOP-keyed rows in db_user, db_password, _db_auth, db_authorization. The per-class AU_CLASS_CACHE collapses each fetch’s grant lookup to one bitmask test, and au_check_authorization is called inline by the locator before every class fetch.
- Statistics (cubrid-statistics.md) — cardinality, NDV, leaf page counts, and partial-key fanouts produced by a server-side xstats_update_statistics heap+B+Tree walk. They live on the latest disk representation in the catalog, ride the wire as a packed buffer, and feed qo_iscan_cost / qo_sscan_cost / qo_equal_selectivity / qo_range_selectivity in the optimizer.
```mermaid
flowchart TD
    PT["PT_CREATE_ENTITY /<br/>PT_ALTER / PT_DROP<br/>(parse tree)"] --> DO["do_create_entity /<br/>do_alter / do_drop<br/>(execute_schema.c)"]
    DO --> TMPL["SM_TEMPLATE<br/>(staging copy of class)"]
    TMPL --> FIN["sm_finish_class →<br/>update_class →<br/>install_new_representation"]
    FIN --> CATCLS["catcls_insert_catalog_classes /<br/>catcls_update_subset<br/>(_db_class, _db_attribute, ...)"]
    FIN --> SMCLASS["SM_CLASS rebuild<br/>(workspace MOP)"]
    CATCLS --> SYSCAT["system_catalog<br/>(disk repr, repr-id chain)"]
    SMCLASS --> BUMP["sm_bump_local_schema_version"]
    BUMP --> XASL["XASL cache invalidate<br/>(per-class chn bump)"]
    BUMP --> MOPCHN["Workspace MOP cache headers<br/>updated"]
    SMCLASS -.attaches.-> TR["TR_TRIGGER cache<br/>(cubrid-trigger.md)"]
    SMCLASS -.attaches.-> AU["AU_CLASS_CACHE<br/>(cubrid-authentication.md)"]
    SYSCAT -.side-table.-> ST["statistics buffer<br/>(cubrid-statistics.md)"]
    TR -.fires from locator.-> DML["DML path<br/>(insert / update / delete)"]
    AU -.au_check_authorization.-> DML
    ST -.feeds.-> QO["qo_*_cost,<br/>qo_*_selectivity"]
```
The diagram shows the spine of the section: a schema mutation flows top-to-bottom through staging, on-disk write, in-memory rebuild, and cache invalidation. The dotted edges are the points where the three cross-cutting concerns attach to the spine.
Reading order
The recommended order is graph-first, because the in-memory graph is the structure every other doc projects from or projects into:
- cubrid-class-object.md — start here. It defines the SM_CLASS graph, the SM_ATTRIBUTE and SM_METHOD chains, the SM_REPRESENTATION history, the partition links, and the workspace-MOP placement. Every later doc refers to fields of this graph.
- cubrid-catalog-manager.md — read next. It is the on-disk projection of the same graph: how _db_class rows mirror SM_CLASS headers, how _db_attribute rows mirror the attribute chain, and how the engine-internal system_catalog sits beside the user-visible system classes. Once you have the in-memory structure in mind, the disk layout reads as “the same thing, serialised”.
- cubrid-ddl-execution.md — read third. It is the pipeline that connects (1) and (2): how a parse tree builds an SM_TEMPLATE, how sm_finish_class writes the catalog and rebuilds the SM_CLASS, how the schema-version bump invalidates downstream caches, and how all of this is wrapped in the same WAL/MVCC/lock framework as user-data DML.
- Cross-cutting docs as needed. Read these on demand:
  - cubrid-authentication.md when investigating a permission denial or grant/revoke path.
  - cubrid-trigger.md when investigating a DML statement that fires extra work, or asking how _db_trigger rows get from CREATE TRIGGER to runtime invocation.
  - cubrid-statistics.md when an optimizer plan looks wrong, or when ANALYZE / UPDATE STATISTICS is being added to a workload.
The order is intentional: trying to read DDL execution first leaves
you flipping back to look up SM_TEMPLATE and catcls_*. Trying to
read the catalog manager first leaves you wondering what the disk
rows are projections of. Reading the in-memory graph first lets the
other two docs land cleanly.
Cross-cutting concerns
Three concerns cut across this section and across the rest of the engine.
Statistics consumed by Query Processing. cubrid-statistics.md
is logically part of the DDL & Schema section because the per-class
statistics buffer is stored in the catalog and is created /
refreshed alongside the class’s disk representation. But every cost
estimate the optimizer makes — every join order, every access-path
choice, every selectivity multiplier — reads back through this
buffer. The detail doc traces both ends: the producer
(xstats_update_statistics walks the heap and B+Trees) and the
consumer (qo_get_attr_info deserialises the buffer on the client,
then qo_iscan_cost / qo_sscan_cost and the selectivity
estimators score plans). When the Query Processing section’s
cubrid-query-optimizer.md says “the optimizer reads NDV from the
catalog,” this is what it means.
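As a rough illustration of the consumer side, the classic no-histogram estimate for an equality predicate selects 1/NDV of the rows; the sketch below models that in C. The struct and the formula are illustrative assumptions about how NDV feeds a selectivity estimator, not CUBRID’s actual qo_equal_selectivity code.

```c
/* Minimal sketch of turning catalog statistics into a selectivity:
 * with no histogram, "col = ?" is assumed to select 1/NDV of the
 * rows.  Types and names here are invented for illustration. */
#include <assert.h>

typedef struct {
    long cardinality;     /* rows in the class */
    long ndv;             /* distinct values in the column */
} toy_attr_stats;

double equal_selectivity(const toy_attr_stats *s) {
    if (s->ndv <= 0) return 1.0;   /* no stats: assume everything matches */
    return 1.0 / (double) s->ndv;
}

/* Expected output rows for "WHERE col = ?". */
double estimated_rows(const toy_attr_stats *s) {
    return equal_selectivity(s) * (double) s->cardinality;
}
```

Stale statistics distort exactly this arithmetic — an NDV frozen at 50 after the real column has grown to 50,000 distinct values inflates the row estimate by three orders of magnitude, which is why the doc treats UPDATE STATISTICS as part of the schema lifecycle.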
Authentication consumed by every DML path. au_check_authorization
is invoked inline by the locator’s class-fetch path, so the
authorization cache is on the hot path of every SELECT, INSERT,
UPDATE, and DELETE — not just at session start. The per-class
AU_CLASS_CACHE exists precisely to make this check a single
bitmask test against the cached grant bitmap, so the read path does
not pay for a _db_auth lookup on every fetch. The locator
documentation in the Server Architecture / Storage Engine section
takes this cache for granted; cubrid-authentication.md is where it
is built.
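The cached-bitmask idea can be sketched as follows. The flag values, struct, and function names are invented for illustration (the real AU_* flags and AU_CLASS_CACHE layout live in the CUBRID source); what the sketch preserves is the shape of the hot path: one slow _db_auth-style lookup on a cache miss, then a single AND per fetch.

```c
/* Toy model of a per-(user, class) grant cache: the expensive
 * lookup runs once, the result is cached as a bitmap, and every
 * later check is one bitmask test.  All names are illustrative. */
#include <assert.h>

enum { AUTH_SELECT = 1 << 0, AUTH_INSERT = 1 << 1,
       AUTH_UPDATE = 1 << 2, AUTH_DELETE = 1 << 3 };

typedef struct {
    unsigned grants;      /* cached bitmap for one (user, class) */
    int      valid;       /* cleared on GRANT/REVOKE */
} toy_auth_cache;

/* Stand-in for the slow path that reads _db_auth-style rows. */
unsigned slow_grant_lookup(void) { return AUTH_SELECT | AUTH_UPDATE; }

/* Hot path: fill the cache on a miss, then one AND per check. */
int check_auth(toy_auth_cache *c, unsigned wanted) {
    if (!c->valid) {
        c->grants = slow_grant_lookup();
        c->valid  = 1;
    }
    return (c->grants & wanted) == wanted;
}
```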
Triggers fire from the locator at DML time. Trigger
registration is a DDL operation (CREATE TRIGGER) that writes a
_db_trigger row and attaches a TR_TRIGGER to the affected
SM_CLASS. Trigger execution is a DML operation: when the locator
or object_template applies an INSERT/UPDATE/DELETE, it calls
tr_prepare_class once and then tr_before_object /
tr_after_object per row or per statement. The two halves are
documented together in cubrid-trigger.md because the data-flow
ties them — the cached TR_TRIGGER is what makes the firing path
cheap, and the OID-stack recursion guard lives between the locator
and the trigger manager.
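A toy model of that firing sequence, including the OID-stack recursion guard, might look like the following. All names, the stack layout, and the integer OIDs are illustrative, not the trigger manager’s real API; the point is that the before-hook refuses to re-fire on an object already on the stack.

```c
/* Toy model of per-row trigger firing with an OID-stack recursion
 * guard: a trigger's own DML cannot re-fire on an object that is
 * already being processed.  All names are illustrative. */
#include <assert.h>

#define STACK_MAX 16
static int oid_stack[STACK_MAX];   /* objects currently firing */
static int oid_top = 0;

static int oid_on_stack(int oid) {
    for (int i = 0; i < oid_top; i++)
        if (oid_stack[i] == oid) return 1;
    return 0;
}

/* tr_before_object-style: push the OID and run the BEFORE action,
 * unless this object is already firing (recursion cut off). */
int fire_before(int oid, int *fired_count) {
    if (oid_on_stack(oid) || oid_top == STACK_MAX) return 0;
    oid_stack[oid_top++] = oid;
    (*fired_count)++;              /* BEFORE action would run here */
    return 1;
}

/* tr_after_object-style: run the AFTER action, then pop the guard. */
void fire_after(int oid, int *fired_count) {
    (*fired_count)++;              /* AFTER action would run here */
    if (oid_top > 0 && oid_stack[oid_top - 1] == oid) oid_top--;
}
```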
Detail-doc summaries
| Doc | Module | Role | Key entry points |
|---|---|---|---|
| cubrid-ddl-execution.md | ddl-execution | Pipeline turning PT_CREATE_ENTITY / PT_ALTER / PT_DROP into a committed catalog write + in-memory rebuild + cache invalidation | do_create_entity, do_alter, do_drop, dbt_create_class, smt_def_class, sm_finish_class, update_class, install_new_representation, sm_bump_local_schema_version, locator_add_class, catcls_insert_catalog_classes |
| cubrid-catalog-manager.md | catalog-manager | Engine-internal system_catalog (CTID-keyed disk representations + statistics) plus the parallel user-visible _db_class/_db_attribute/_db_index/… system classes; bootstrapped from a fixed root-class OID | catalog_add_representation, catalog_get_representation, catalog_add_statistics, catalog_get_statistics, catcls_insert_*, boot_create_db_parm, root-class fixed OID |
| cubrid-class-object.md | class-object | In-memory schema graph: SM_CLASS with attribute chain, method chain, representation history, partition graph, constraint list, and trigger cache, allocated on the client-side workspace heap | SM_CLASS, SM_ATTRIBUTE, SM_METHOD, SM_REPRESENTATION, SM_CLASS_CONSTRAINT, classobj_*, sm_get_class_with_statistics, catcls_get_class_* (mediator) |
| cubrid-trigger.md | trigger | SQL-99 ECA rules: _db_trigger row + TR_TRIGGER in-memory cache hung off SM_CLASS, fired from locator-adjacent path; statement-level recursion controlled by an OID stack rather than a per-statement guard | tr_create_trigger, tr_drop_trigger, tr_prepare_class, tr_before_object, tr_after_object, TR_TRIGGER, TR_RECURSION_DECISION |
| cubrid-authentication.md | authentication | Users/passwords/grants modelled as MOPs in db_user, db_password, _db_auth, db_authorization; per-class AU_CLASS_CACHE reduces the read-path check to one bitmask test | au_login, au_fetch_class, au_check_authorization, AU_CLASS_CACHE, au_grant, au_revoke, db_user / _db_auth system classes |
| cubrid-statistics.md | statistics | Server-side heap+B+Tree walk produces per-class NDV / cardinality / leaf-pages / partial-key fanouts, persisted on the latest disk representation, shipped as a packed buffer, fed to the cost optimizer | xstats_update_statistics, stats_compile_disk_representation, qo_get_attr_info, qo_iscan_cost, qo_sscan_cost, qo_equal_selectivity, qo_range_selectivity |
Adjacent sections
- Storage Engine. Catalog rows live in heap pages and are indexed by B-trees just like user data, so cubrid-heap-manager.md, cubrid-btree.md, cubrid-page-buffer-manager.md, cubrid-disk-manager.md, cubrid-mvcc.md, and cubrid-lock-manager.md are all on the read/write path of every catalog access. The “split catalog” choice in cubrid-catalog-manager.md only works because the engine-internal system_catalog and the user-visible catcls_* tables both ultimately ride the heap+B-tree+page-buffer stack — they are the same I/O mechanics underneath.
- Query Processing. Statistics from cubrid-statistics.md feed every cost estimate in cubrid-query-optimizer.md. Authorization from cubrid-authentication.md is the gate every DML and every access-path resolution passes through, via the locator. Trigger firing from cubrid-trigger.md is invoked from the same DML paths the optimizer’s plans terminate in. In short, the schema layer is the data-dictionary plus access-policy plus active-rule half of the query system, and Query Processing is the other half.
- Server Architecture. A DDL fence — the schema-version bump inside sm_bump_local_schema_version — is the trigger that invalidates the XASL cache and forces re-prepare on the next plan compile. This is the architectural seam where the schema layer meets the cross-session plan cache. See the server-architecture doc for the XASL cache shape; see cubrid-ddl-execution.md for the bump itself.
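The fence reduces to a version-stamp comparison, sketched here with invented names. The real XASL cache keys and invalidation are richer (per-class chn bumps, as the diagram shows), but the staleness test works the same way: a plan remembers the schema version it was compiled against and is rejected once the version moves.

```c
/* Toy model of the DDL fence: a compiled plan carries the schema
 * version it was built against; a stale stamp forces re-prepare.
 * All names here are illustrative, not the real XASL cache API. */
#include <assert.h>

static unsigned local_schema_version = 0;

/* Called on every committed DDL (sm_bump_local_schema_version-style). */
void bump_schema_version(void) { local_schema_version++; }

typedef struct {
    unsigned built_at;    /* schema version at compile time */
    int      compiled;
} toy_plan;

toy_plan compile_plan(void) {
    toy_plan p = { local_schema_version, 1 };
    return p;
}

/* Cache lookup: stale plans are rejected, forcing a re-prepare. */
int plan_is_valid(const toy_plan *p) {
    return p->compiled && p->built_at == local_schema_version;
}
```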
The schema layer is not a leaf module: every other section in
this knowledge base either reads from it (Query Processing, Storage
Engine on catalog pages, the locator’s authorization gate) or
writes through it (DDL itself). Treat the six detail docs as a
single subsystem; do not try to read any one of them in isolation
without at least skimming cubrid-class-object.md first.