CUBRID DDL & Schema — Section Overview

This section documents CUBRID’s schema layer — the band that sits above the storage engine (heap, B-tree, MVCC, lock manager) and below the executor (XASL, scan procedures, query optimizer). The schema layer is where the database’s self-description is created, mutated, read, and policed: every CREATE/ALTER/DROP statement passes through it, every DML statement reads from it to learn the row shape, every plan compile consults it to learn the access paths and column distributions, and every privilege check terminates inside it. When the schema layer changes its mind — when a column is added, a trigger fires, an owner is rotated, or statistics are refreshed — that decision propagates outward through the XASL cache, the workspace MOP table, the locator’s grant cache, and ultimately into the next compiled plan. The six detail documents under this subcategory describe how that propagation works end-to-end. This overview does not repeat their content; it links and orients.

The schema layer is responsible for three things that are kept strictly separate in CUBRID’s source tree, even though they describe the same logical “class”:

  1. A pipeline that turns a parse tree into an applied schema change without exposing partial state to concurrent readers (cubrid-ddl-execution.md).
  2. An on-disk projection that lets SELECT * FROM _db_class work just like any other table — and that survives crash recovery through the WAL because catalog rows are stored in heap pages like user data (cubrid-catalog-manager.md).
  3. An in-memory graph that the executor and the locator can traverse cheaply, without hitting disk on every column lookup (cubrid-class-object.md).

Three further documents describe the cross-cutting features that hang off the schema graph: triggers fire on DML against any class (cubrid-trigger.md); authorization checks gate every fetch (cubrid-authentication.md); and statistics, which the catalog stores as a side-table on each disk representation, feed the cost optimizer (cubrid-statistics.md).

Three primary layers move every schema change through the engine.

  • DDL execution (cubrid-ddl-execution.md) — the pipeline from a PT_CREATE_ENTITY / PT_ALTER / PT_DROP parse-tree node to a committed change. It is realised by do_create_entity / do_alter / do_drop in execute_schema.c, which build an SM_TEMPLATE via dbt_create_class / smt_def_class, finish it through sm_finish_class → update_class → install_new_representation, and then call sm_bump_local_schema_version so downstream caches see the new shape. A failed transaction rolls back through the same WAL the storage engine uses for user data.
  • On-disk catalog (cubrid-catalog-manager.md) — the representation that survives SHUTDOWN. CUBRID has a split catalog: a compact, engine-internal system_catalog keyed by (class OID, repr id) and rooted at a fixed root-class OID; and a parallel set of user-visible system classes (_db_class, _db_attribute, _db_index, _db_method, _db_partition, _db_trigger, …) accessed through the same heap+B-tree machinery as user data. Every committed DDL writes both views.
  • In-memory schema (cubrid-class-object.md) — the live graph the executor walks at every reference resolution. An SM_CLASS object lives in client-side workspace memory, hung off a MOP, and carries the SM_ATTRIBUTE chain, the SM_REPRESENTATION history needed to read old rows after ALTER, the partition graph, the constraint list, and the TR_TRIGGER cache. catcls_* mediates between this graph and the on-disk catalog.
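
The staging discipline shared by all three layers — mutate a private template, then install it atomically and bump the schema version — can be sketched as a toy model. All struct and function names below are illustrative, patterned on SM_TEMPLATE / sm_finish_class / install_new_representation / sm_bump_local_schema_version, not CUBRID’s real definitions.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Toy model of the SM_TEMPLATE pattern: a DDL statement edits a private
 * staging copy; readers keep seeing the installed class until "finish"
 * swaps the template in and bumps the schema version. */

#define MAX_ATTRS 8

typedef struct {
  int repr_id;                  /* bumped on every installed change */
  int n_attrs;
  char attrs[MAX_ATTRS][32];
} toy_class;

typedef struct {
  toy_class staged;             /* private copy; invisible to readers */
} toy_template;

static int schema_version = 0;  /* models sm_bump_local_schema_version */

/* dbt_create_class-like: clone the live class into a staging template */
toy_template *toy_edit_class (const toy_class *live)
{
  toy_template *t = malloc (sizeof (toy_template));
  t->staged = *live;
  return t;
}

/* smt_def_class-like: mutate only the staging copy */
void toy_add_attribute (toy_template *t, const char *name)
{
  strncpy (t->staged.attrs[t->staged.n_attrs++], name, 31);
}

/* sm_finish_class-like: install the new representation, bump version */
void toy_finish_class (toy_template *t, toy_class *live)
{
  t->staged.repr_id++;          /* install_new_representation */
  *live = t->staged;
  schema_version++;             /* downstream caches now see the change */
  free (t);
}
```

The point of the pattern is the middle state: between toy_edit_class and toy_finish_class, concurrent readers of the live class never observe a half-applied change.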

Three cross-cutting concerns hook into all three primary layers.

  • Triggers (cubrid-trigger.md) — SQL-99 ECA active rules attached to a class. They are stored as _db_trigger rows, cached on the SM_CLASS, and fired from the locator at DML time via tr_prepare_class / tr_before_object / tr_after_object.
  • Authorization (cubrid-authentication.md) — users, passwords, and per-class privilege bitmaps stored as MOP-keyed rows in db_user, db_password, _db_auth, db_authorization. The per-class AU_CLASS_CACHE collapses each fetch’s grant lookup to one bitmask test, and au_check_authorization is called inline by the locator before every class fetch.
  • Statistics (cubrid-statistics.md) — cardinality, NDV, leaf page counts, and partial-key fanouts produced by a server-side xstats_update_statistics heap+B+Tree walk. They live on the latest disk representation in the catalog, ride the wire as a packed buffer, and feed qo_iscan_cost / qo_sscan_cost / qo_equal_selectivity / qo_range_selectivity in the optimizer.

```mermaid
flowchart TD
    PT["PT_CREATE_ENTITY /<br/>PT_ALTER / PT_DROP<br/>(parse tree)"] --> DO["do_create_entity /<br/>do_alter / do_drop<br/>(execute_schema.c)"]
    DO --> TMPL["SM_TEMPLATE<br/>(staging copy of class)"]
    TMPL --> FIN["sm_finish_class →<br/>update_class →<br/>install_new_representation"]
    FIN --> CATCLS["catcls_insert_catalog_classes /<br/>catcls_update_subset<br/>(_db_class, _db_attribute, ...)"]
    FIN --> SMCLASS["SM_CLASS rebuild<br/>(workspace MOP)"]
    CATCLS --> SYSCAT["system_catalog<br/>(disk repr, repr-id chain)"]
    SMCLASS --> BUMP["sm_bump_local_schema_version"]
    BUMP --> XASL["XASL cache invalidate<br/>(per-class chn bump)"]
    BUMP --> MOPCHN["Workspace MOP cache headers<br/>updated"]

    SMCLASS -.attaches.-> TR["TR_TRIGGER cache<br/>(cubrid-trigger.md)"]
    SMCLASS -.attaches.-> AU["AU_CLASS_CACHE<br/>(cubrid-authentication.md)"]
    SYSCAT -.side-table.-> ST["statistics buffer<br/>(cubrid-statistics.md)"]

    TR -.fires from locator.-> DML["DML path<br/>(insert / update / delete)"]
    AU -.au_check_authorization.-> DML
    ST -.feeds.-> QO["qo_*_cost,<br/>qo_*_selectivity"]
```

The diagram shows the spine of the section: a schema mutation flows top-to-bottom through staging, on-disk write, in-memory rebuild, and cache invalidation. The dotted edges are the points where the three cross-cutting concerns attach to the spine.

The recommended order is graph-first, because the in-memory graph is the structure every other doc projects from or projects into:

  1. cubrid-class-object.md — start here. It defines the SM_CLASS graph, the SM_ATTRIBUTE and SM_METHOD chains, the SM_REPRESENTATION history, the partition links, and the workspace-MOP placement. Every later doc refers to fields of this graph.
  2. cubrid-catalog-manager.md — read next. It is the on-disk projection of the same graph: how _db_class rows mirror SM_CLASS headers, how _db_attribute rows mirror the attribute chain, and how the engine-internal system_catalog sits beside the user-visible system classes. Once you have the in-memory structure in mind, the disk layout reads as “the same thing, serialised”.
  3. cubrid-ddl-execution.md — read third. It is the pipeline that connects (1) and (2): how a parse tree builds an SM_TEMPLATE, how sm_finish_class writes the catalog and rebuilds the SM_CLASS, how the schema-version bump invalidates downstream caches, and how all of this is wrapped in the same WAL/MVCC/lock framework as user-data DML.
  4. Cross-cutting docs as needed. Read these on demand:
    • cubrid-authentication.md when investigating a permission denial or grant/revoke path.
    • cubrid-trigger.md when investigating a DML statement that fires extra work, or asking how _db_trigger rows get from CREATE TRIGGER to runtime invocation.
    • cubrid-statistics.md when an optimizer plan looks wrong, or when ANALYZE / UPDATE STATISTICS is being added to a workload.

The order is intentional: trying to read DDL execution first leaves you flipping back to look up SM_TEMPLATE and catcls_*. Trying to read the catalog manager first leaves you wondering what the disk rows are projections of. Reading the in-memory graph first lets the other two docs land cleanly.

Three concerns cut across this section and across the rest of the engine.

Statistics consumed by Query Processing. cubrid-statistics.md is logically part of the DDL & Schema section because the per-class statistics buffer is stored in the catalog and is created / refreshed alongside the class’s disk representation. But every cost estimate the optimizer makes — every join order, every access-path choice, every selectivity multiplier — reads back through this buffer. The detail doc traces both ends: the producer (xstats_update_statistics walks the heap and B+Trees) and the consumer (qo_get_attr_info deserialises the buffer on the client, then qo_iscan_cost / qo_sscan_cost and the selectivity estimators score plans). When the Query Processing section’s cubrid-query-optimizer.md says “the optimizer reads NDV from the catalog,” this is what it means.
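
The shape of that consumption can be sketched with the textbook formulas. This is a simplified model, not CUBRID’s actual qo_equal_selectivity / qo_range_selectivity code, which additionally handles NULLs, partial-key fanouts, and estimate clamping; the names below are illustrative.

```c
#include <assert.h>

/* Simplified model of how NDV and min/max from the statistics buffer
 * turn into selectivities for the cost model. */

typedef struct {
  double ndv;       /* number of distinct values, from the stats walk */
  double min_val;   /* column minimum */
  double max_val;   /* column maximum */
} toy_attr_stats;

/* col = const : assume a uniform distribution over NDV distinct values */
double toy_equal_selectivity (const toy_attr_stats *st)
{
  return st->ndv > 0 ? 1.0 / st->ndv : 1.0;
}

/* lo <= col <= hi : fraction of the value range the predicate covers */
double toy_range_selectivity (const toy_attr_stats *st, double lo, double hi)
{
  double span = st->max_val - st->min_val;
  if (span <= 0)
    return 1.0;
  double covered = (hi < lo) ? 0.0 : (hi - lo) / span;
  return covered > 1.0 ? 1.0 : covered;
}
```

Stale statistics distort exactly these two numbers, which is why a wrong-looking plan is often an UPDATE STATISTICS problem rather than an optimizer bug.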

Authentication consumed by every DML path. au_check_authorization is invoked inline by the locator’s class-fetch path, so the authorization cache is on the hot path of every SELECT, INSERT, UPDATE, and DELETE — not just at session start. The per-class AU_CLASS_CACHE exists precisely to make this check a single bitmask test against the cached grant bitmap, so the read path does not pay for a _db_auth lookup on every fetch. The locator documentation in the Server Architecture / Storage Engine section takes this cache for granted; cubrid-authentication.md is where it is built.
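
A toy model makes the cache’s payoff concrete. The flag names mirror CUBRID’s AU_* style but are illustrative here, as is the structure; the real AU_CLASS_CACHE is keyed per user and hangs off the class.

```c
#include <assert.h>

/* Toy model of the AU_CLASS_CACHE idea: grants are collapsed into one
 * bitmap per (user, class), so the per-fetch check is a single AND. */

enum {
  TOY_AU_SELECT = 0x01,
  TOY_AU_INSERT = 0x02,
  TOY_AU_UPDATE = 0x04,
  TOY_AU_DELETE = 0x08,
  TOY_AU_ALTER  = 0x10
};

typedef struct {
  unsigned int cache;   /* grant bitmap cached on the class */
  int valid;            /* recomputed from the grant rows when 0 */
} toy_class_cache;

/* Slow path: would scan the _db_auth grant rows; stubbed here */
static unsigned int toy_compute_grants (void)
{
  return TOY_AU_SELECT | TOY_AU_INSERT;   /* pretend _db_auth says so */
}

/* au_check_authorization-like hot path: one bitmask test after warmup */
int toy_check_authorization (toy_class_cache *c, unsigned int needed)
{
  if (!c->valid)
    {
      c->cache = toy_compute_grants ();   /* grant lookup, once */
      c->valid = 1;
    }
  return (c->cache & needed) == needed;   /* 1 = authorized */
}
```

After the first fetch warms the cache, every subsequent check is the single AND-and-compare on the last line — which is what keeps authorization off the critical path of the read side.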

Triggers fire from the locator at DML time. Trigger registration is a DDL operation (CREATE TRIGGER) that writes a _db_trigger row and attaches a TR_TRIGGER to the affected SM_CLASS. Trigger execution is a DML operation: when the locator or object_template applies an INSERT/UPDATE/DELETE, it calls tr_prepare_class once and then tr_before_object / tr_after_object per row or per statement. The two halves are documented together in cubrid-trigger.md because the data-flow ties them — the cached TR_TRIGGER is what makes the firing path cheap, and the OID-stack recursion guard lives between the locator and the trigger manager.
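
The firing sequence and the recursion guard can be sketched together. All names are illustrative, patterned on tr_before_object / tr_after_object and the OID-stack guard described above; the real trigger manager also evaluates conditions, priorities, and deferred queues.

```c
#include <assert.h>

/* Toy model of the per-row firing path: an object-id stack stops a
 * trigger from re-firing on an object already being processed. */

#define STACK_MAX 16

static int oid_stack[STACK_MAX];
static int oid_top = 0;
static int fire_count = 0;      /* how many trigger bodies ran */

static int toy_oid_on_stack (int oid)
{
  for (int i = 0; i < oid_top; i++)
    if (oid_stack[i] == oid)
      return 1;
  return 0;
}

/* before-phase: push the target OID, refuse recursive re-entry */
int toy_before_object (int oid)
{
  if (toy_oid_on_stack (oid))
    return 0;                   /* recursion cut off */
  oid_stack[oid_top++] = oid;
  fire_count++;                 /* the BEFORE trigger body runs here */
  return 1;
}

/* after-phase: run the AFTER body, then pop the guard entry */
void toy_after_object (int oid)
{
  (void) oid;
  fire_count++;                 /* the AFTER trigger body runs here */
  oid_top--;
}
```

Because the guard is a stack of OIDs rather than a per-statement flag, a trigger cascade can legitimately touch other objects; only a cycle back to an object already in flight is cut off.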

| Doc | Module | Role | Key entry points |
|---|---|---|---|
| cubrid-ddl-execution.md | ddl-execution | Pipeline turning PT_CREATE_ENTITY / PT_ALTER / PT_DROP into a committed catalog write + in-memory rebuild + cache invalidation | do_create_entity, do_alter, do_drop, dbt_create_class, smt_def_class, sm_finish_class, update_class, install_new_representation, sm_bump_local_schema_version, locator_add_class, catcls_insert_catalog_classes |
| cubrid-catalog-manager.md | catalog-manager | Engine-internal system_catalog (CTID-keyed disk representations + statistics) plus the parallel user-visible _db_class/_db_attribute/_db_index/… system classes; bootstrapped from a fixed root-class OID | catalog_add_representation, catalog_get_representation, catalog_add_statistics, catalog_get_statistics, catcls_insert_*, boot_create_db_parm, root-class fixed OID |
| cubrid-class-object.md | class-object | In-memory schema graph: SM_CLASS with attribute chain, method chain, representation history, partition graph, constraint list, and trigger cache, allocated on the client-side workspace heap | SM_CLASS, SM_ATTRIBUTE, SM_METHOD, SM_REPRESENTATION, SM_CLASS_CONSTRAINT, classobj_*, sm_get_class_with_statistics, catcls_get_class_* (mediator) |
| cubrid-trigger.md | trigger | SQL-99 ECA rules: _db_trigger row + TR_TRIGGER in-memory cache hung off SM_CLASS, fired from locator-adjacent path; statement-level recursion controlled by an OID stack rather than a per-statement guard | tr_create_trigger, tr_drop_trigger, tr_prepare_class, tr_before_object, tr_after_object, TR_TRIGGER, TR_RECURSION_DECISION |
| cubrid-authentication.md | authentication | Users/passwords/grants modelled as MOPs in db_user, db_password, _db_auth, db_authorization; per-class AU_CLASS_CACHE reduces the read-path check to one bitmask test | au_login, au_fetch_class, au_check_authorization, AU_CLASS_CACHE, au_grant, au_revoke, db_user / _db_auth system classes |
| cubrid-statistics.md | statistics | Server-side heap+B+Tree walk produces per-class NDV / cardinality / leaf-pages / partial-key fanouts, persisted on the latest disk representation, shipped as a packed buffer, fed to the cost optimizer | xstats_update_statistics, stats_compile_disk_representation, qo_get_attr_info, qo_iscan_cost, qo_sscan_cost, qo_equal_selectivity, qo_range_selectivity |

Three neighbouring sections of this knowledge base share machinery with the schema layer:
  • Storage Engine. Catalog rows live in heap pages and are indexed by B-trees just like user data, so cubrid-heap-manager.md, cubrid-btree.md, cubrid-page-buffer-manager.md, cubrid-disk-manager.md, cubrid-mvcc.md, and cubrid-lock-manager.md are all on the read/write path of every catalog access. The “split catalog” choice in cubrid-catalog-manager.md only works because the engine-internal system_catalog and the user-visible catcls_* tables both ultimately ride the heap+B-tree+page-buffer stack — they are the same I/O mechanics underneath.
  • Query Processing. Statistics from cubrid-statistics.md feed every cost estimate in cubrid-query-optimizer.md. Authorization from cubrid-authentication.md is the gate every DML and every access-path resolution passes through, via the locator. Trigger firing from cubrid-trigger.md is invoked from the same DML paths the optimizer’s plans terminate in. In short, the schema layer is the data-dictionary plus access-policy plus active-rule half of the query system, and Query Processing is the other half.
  • Server Architecture. A DDL fence — the schema-version bump inside sm_bump_local_schema_version — is the trigger that invalidates the XASL cache and forces re-prepare on the next plan compile. This is the architectural seam where the schema layer meets the cross-session plan cache. See the server-architecture doc for the XASL cache shape; see cubrid-ddl-execution.md for the bump itself.
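
That DDL fence amounts to stamping each compiled plan with the schema version current at compile time, and recompiling on mismatch. The sketch below is a toy model of that check, with illustrative names patterned on sm_bump_local_schema_version and the XASL cache’s validity test; the real cache also tracks per-class chn values.

```c
#include <assert.h>

/* Toy model of the DDL fence: a compiled plan carries the schema
 * version it was compiled against; any DDL bumps the counter, and the
 * stamp mismatch forces a re-prepare on the next use. */

static unsigned int toy_schema_version = 0;

typedef struct {
  unsigned int compiled_at;   /* schema version stamped at compile */
  int compile_count;          /* how many times we (re)compiled */
} toy_plan;

void toy_bump_schema_version (void)   /* runs at DDL commit */
{
  toy_schema_version++;
}

/* prepare-or-reuse: recompile only when the stamp is stale */
void toy_get_plan (toy_plan *p)
{
  if (p->compile_count == 0 || p->compiled_at != toy_schema_version)
    {
      p->compiled_at = toy_schema_version;   /* "compile" the plan */
      p->compile_count++;
    }
}
```

The seam is deliberately coarse: one counter invalidates every cached plan, trading some unnecessary recompiles for a fence that cannot miss a schema change.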

The schema layer is not a leaf module: every other section in this knowledge base either reads from it (Query Processing, Storage Engine on catalog pages, the locator’s authorization gate) or writes through it (DDL itself). Treat the six detail docs as a single subsystem; do not try to read any one of them in isolation without at least skimming cubrid-class-object.md first.