PostgreSQL DDL Execution — ProcessUtility Dispatch, the Simple/Slow Split, and the CREATE TABLE Path
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Section titled “Theoretical Background”Every relational engine splits its statement set into two families. The
first family — SELECT, INSERT, UPDATE, DELETE, MERGE — can be
optimized by a cost-based planner: the planner examines statistics,
generates alternative plans, picks the cheapest, and produces an executor
tree. The second family — CREATE TABLE, ALTER TABLE, CREATE INDEX,
DROP, GRANT, and all the rest — cannot be cost-estimated because their
work is structural: they alter the shape of the catalog, not the content
of user rows. Database System Concepts (Silberschatz et al., 7th ed.,
§5.2) names this second family Data Definition Language (DDL) and
characterizes its job as maintaining the schema: the set of relation
definitions, type annotations, integrity constraints, and access-control
rules that every subsequent DML statement is compiled against.
The canonical DDL pipeline has five movements, regardless of which engine executes it:
- Parse the DDL statement into a definition tree that names the object being defined and all its attributes.
- Validate the definition against the existing catalog — namespace uniqueness, type resolution, permission checks.
- Materialise on-disk artefacts — allocate the heap file for a new table, the index file for a new index.
- Write catalog rows — insert or update
pg_class,pg_attribute,pg_index,pg_constraint, and related catalog tables so every other backend can discover the new object. - Invalidate caches — broadcast to all backends that any cached plan or relcache entry referencing the changed object must be discarded before the next compile.
Two design axes cut across all five movements and shape the implementation significantly.
Transactional vs. implicit-commit DDL. PostgreSQL, SQL Server, and DB2
execute DDL inside the same transaction and WAL-recovery framework as DML.
A CREATE TABLE followed by ROLLBACK is indistinguishable from no
CREATE TABLE at all — the catalog rows are simply not committed. Oracle
and pre-8.0 MySQL treat DDL as an implicit commit: the engine commits the
current transaction before executing the DDL, which removes the need to
journal catalog mutations through the undo log but makes DDL irreversible
within a session. PostgreSQL’s choice — fully transactional DDL — forces
every catalog mutation to be WAL-logged and undoable, which in turn
requires the catalog-write path to participate in the same lock-and-MVCC
machinery as the user-data path.
Schema-change vs. data-movement DDL. Pure-metadata DDL (CREATE TABLE,
DROP INDEX) takes a short exclusive lock on the new object (which no
concurrent session can yet see), writes a few catalog rows, and exits in
milliseconds. Data-touching DDL (ALTER TABLE ... ADD COLUMN c NOT NULL DEFAULT expr, partition reorganisation) must scan and rewrite existing rows
— it can run for hours while holding an AccessExclusiveLock on the
table, blocking all concurrent reads. The code reflects this split starkly:
PostgreSQL separates standard_ProcessUtility (which handles the simple,
no-data-movement cases inline) from ProcessUtilitySlow (which wraps every
command in event-trigger fences and dispatches into the
commands/tablecmds.c-style per-command modules).
Cache invalidation via CommandCounterIncrement. Within a single
transaction, catalog changes written by one statement must become visible
to subsequent statements in the same transaction before that transaction
commits. PostgreSQL solves this with a per-transaction command counter
(CommandId). Each CommandCounterIncrement() call advances the counter,
making rows written by the previous command visible to the next command’s
snapshot. The DDL path calls it at precise points — immediately after
writing pg_class to make the new table tuple visible, again after adding
default expressions so that subsequent constraint processing can reference
generated columns. The broader cache-invalidation story (sinval messages
propagating relcache invalidations to other backends) is covered in
postgres-cache-invalidation.md; this document focuses on the within-
transaction command-counter discipline.
Common DBMS Design
Section titled “Common DBMS Design”The textbook gives the model; this section names the engineering conventions that nearly every DBMS adopts when implementing DDL. PostgreSQL’s specific choices in §“PostgreSQL’s Approach” are one set of dials within this shared space.
A single dispatch gate for non-optimizable statements
Section titled “A single dispatch gate for non-optimizable statements”Every engine that distinguishes DML from DDL needs a single checkpoint that routes statements to the right handler. The planner does not touch DDL, so a separate dispatch function acts as the front door for the entire non- optimizable statement set. This function is also the natural place to:
- enforce read-only transaction rules (a
CREATE TABLEmust not be executed inside a read-only transaction or while a backup is taking a consistent snapshot); - check for parallel-mode restrictions (many DDL statements cannot run inside a parallel worker context);
- fire extension hooks so loadable modules can intercept or replace the built-in behaviour.
The pattern of dispatch_gate → extension_hook_or_default → per_command_ module is universal. PostgreSQL’s ProcessUtility → ProcessUtility_hook
→ standard_ProcessUtility is the canonical instance.
A simple/slow split on event-trigger support
Section titled “A simple/slow split on event-trigger support”Event triggers (PostgreSQL’s term for command-level callbacks — “fire before
this DDL command”, “fire after this DDL command finishes”) require the
engine to maintain a per-command context object throughout the statement’s
execution, populate it with the objects affected, and fire the trigger at
two checkpoints (start and end). This bookkeeping is non-trivial and cannot
be retrofitted around every statement without adding overhead to the
lightweight path (e.g., BEGIN, SET, CHECKPOINT).
The standard pattern is to have two dispatch levels:
- A fast path handles statements that either do not support event triggers at all or for which event-trigger support is conditional on the object type. These run inline, no trigger context allocated.
- A slow path wraps every command in trigger fences
(
EventTriggerBeginCompleteQuery/EventTriggerDDLCommandStart/EventTriggerEndCompleteQuery) and delegates to per-command handlers.
PostgreSQL’s standard_ProcessUtility is the fast path; ProcessUtilitySlow
is the slow path.
Staged construction with command-counter visibility
Section titled “Staged construction with command-counter visibility”When DDL creates an object in multiple steps (create the heap, write
pg_attribute rows, process default expressions, add constraints), the
steps must see each other’s outputs within the same transaction. The
standard mechanism is a command counter or statement counter that
advances the effective snapshot between steps, making previously written
catalog rows visible. Oracle’s Library Cache Lock and PostgreSQL’s
CommandCounterIncrement serve this role.
Transactional catalog writes with rollback support
Section titled “Transactional catalog writes with rollback support”In a transactional DDL engine, catalog rows are regular heap tuples subject
to the same MVCC rules as user rows. Writing a pg_class row during
CREATE TABLE is identical in mechanism to INSERT INTO pg_class (...):
the row is inserted under the current transaction’s xmin, is not visible
to other transactions until commit, and is removed (by vacuum) if the
transaction rolls back. This means the catalog-write path is simply the
heap-write path applied to system catalogs, and DDL rollback requires no
special undo machinery beyond the standard MVCC + WAL combination.
Theory ↔ PostgreSQL mapping
Section titled “Theory ↔ PostgreSQL mapping”| Concept | PostgreSQL name |
|---|---|
| DDL dispatch gate | ProcessUtility — tcop/utility.c:499 |
| Extension hook | ProcessUtility_hook (ProcessUtility_hook_type) |
| Default handler | standard_ProcessUtility — tcop/utility.c:543 |
| Simple (no event-trigger overhead) path | inline cases in standard_ProcessUtility |
| Full event-trigger path | ProcessUtilitySlow — tcop/utility.c:1092 |
| Event-trigger fence open | EventTriggerBeginCompleteQuery + EventTriggerDDLCommandStart |
| Event-trigger fence close | EventTriggerEndCompleteQuery |
| Definition tree (parse output) | CreateStmt, AlterTableStmt, … (nodes/parsenodes.h) |
| Per-command semantic analysis | transformCreateStmt (parser/parse_utilcmd.c:164) |
| Table creation entry | DefineRelation — commands/tablecmds.c:764 |
| Catalog row writer | heap_create_with_catalog — catalog/heap.c:1139 |
| Physical file allocator | heap_create — catalog/heap.c:285 |
| Within-transaction visibility bump | CommandCounterIncrement |
| Raw-default expression processor | AddRelationNewConstraints — catalog/heap.c:2402 |
| Inheritance chain writer | StoreCatalogInheritance — commands/tablecmds.c:3521 |
PostgreSQL’s Approach
Section titled “PostgreSQL’s Approach”The dispatch spine: ProcessUtility → standard_ProcessUtility → ProcessUtilitySlow
Section titled “The dispatch spine: ProcessUtility → standard_ProcessUtility → ProcessUtilitySlow”Every statement that is not a plain DML enters through ProcessUtility.
Its only job is to check whether a hook is registered; if so it hands
control to the hook, otherwise to standard_ProcessUtility:
// ProcessUtility — src/backend/tcop/utility.cProcessUtility(PlannedStmt *pstmt, const char *queryString, bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, DestReceiver *dest, QueryCompletion *qc){ // ... if (ProcessUtility_hook) (*ProcessUtility_hook) (pstmt, queryString, readOnlyTree, context, params, queryEnv, dest, qc); else standard_ProcessUtility(pstmt, queryString, readOnlyTree, context, params, queryEnv, dest, qc);}The hook is declared as ProcessUtility_hook_type ProcessUtility_hook = NULL
in tcop/utility.c and exposed as PGDLLIMPORT via tcop/utility.h. An
extension that wants to intercept CREATE TABLE (for example, to add audit
logging or enforce naming conventions) installs its own function at this
hook and calls standard_ProcessUtility inside it to preserve the default
path.
standard_ProcessUtility begins with safety checks — read-only transaction
enforcement, parallel-mode restrictions, stack depth — then switches on the
parse node tag. Statements that either do not support event triggers or need
conditional support (based on the object type) run inline; catalog-heavy
statements are forwarded to ProcessUtilitySlow. For example:
// standard_ProcessUtility — src/backend/tcop/utility.ccase T_DropStmt: { DropStmt *stmt = (DropStmt *) parsetree; if (EventTriggerSupportsObjectType(stmt->removeType)) ProcessUtilitySlow(pstate, pstmt, queryString, context, params, queryEnv, dest, qc); else ExecDropStmt(stmt, isTopLevel); } break;For T_CreateStmt (ordinary CREATE TABLE) there is no conditional — it
always goes through ProcessUtilitySlow. The comment at line 529 captures
the design intent:
“standard_ProcessUtility itself deals only with utility commands for which we do not provide event trigger support. Commands that do have such support are passed down to ProcessUtilitySlow, which contains the necessary infrastructure for such triggers. This division is not just for performance: it’s critical that the event trigger code not be invoked when doing START TRANSACTION for example, because we might need to refresh the event trigger cache, which requires being in a valid transaction.”
ProcessUtilitySlow opens the event-trigger fence, dispatches on the
node tag, and closes the fence in a PG_TRY/PG_CATCH block to ensure
cleanup even when the command errors:
// ProcessUtilitySlow — src/backend/tcop/utility.cProcessUtilitySlow(ParseState *pstate, PlannedStmt *pstmt, ...){ // ... bool isCompleteQuery = (context != PROCESS_UTILITY_SUBCOMMAND); needCleanup = isCompleteQuery && EventTriggerBeginCompleteQuery();
PG_TRY(); { if (isCompleteQuery) EventTriggerDDLCommandStart(parsetree);
switch (nodeTag(parsetree)) { case T_CreateStmt: case T_CreateForeignTableStmt: { List *stmts; // ... transform, then loop over stmts: stmts = transformCreateStmt((CreateStmt *) parsetree, queryString); while (stmts != NIL) { Node *stmt = (Node *) linitial(stmts); stmts = list_delete_first(stmts); if (IsA(stmt, CreateStmt)) { address = DefineRelation(cstmt, RELKIND_RELATION, InvalidOid, NULL, queryString); EventTriggerCollectSimpleCommand(address, ...); CommandCounterIncrement(); NewRelationCreateToastTable(address.objectId, ...); } // ... LIKE clauses and foreign tables handled separately } } break; // ... other T_* cases }
if (isCompleteQuery) { EventTriggerSQLDropAddTarget(/* ... */); EventTriggerDDLCommandEnd(parsetree); } } PG_FINALLY(); { if (needCleanup) EventTriggerEndCompleteQuery(); } PG_END_TRY();}Figure 1 — DDL dispatch flow from portal to catalog writer
flowchart TD PORTAL["PortalRunUtility\n(query-processing)"] --> PU["ProcessUtility\ntcop/utility.c"] PU -->|hook installed| HOOK["ProcessUtility_hook\nextension code"] PU -->|no hook| SPU["standard_ProcessUtility"] HOOK -.->|calls back| SPU SPU -->|no event-trigger support| INLINE["inline handler\ne.g. BeginTransactionBlock\nRequestCheckpoint"] SPU -->|conditional, wrong type| INLINE2["inline ExecDropStmt\nExecRenameStmt, ..."] SPU -->|catalog-heavy or conditional+match| SLOW["ProcessUtilitySlow"] SLOW --> ETSTART["EventTriggerDDLCommandStart"] ETSTART --> DISPATCH["switch nodeTag\nT_CreateStmt\nT_AlterTableStmt\nT_DropStmt\n..."] DISPATCH -->|CREATE TABLE| TRANSFORM["transformCreateStmt\nparse_utilcmd.c"] TRANSFORM --> DR["DefineRelation\ntablecmds.c"] DR --> HCC["heap_create_with_catalog\ncatalog/heap.c"] HCC --> HC["heap_create\ncatalog/heap.c"] HC -->|physical| SMGR["smgr_create + storage alloc"] HCC -->|catalog rows| PGCLASS["INSERT pg_class\npg_attribute\npg_type"] DR --> CCI["CommandCounterIncrement"] CCI --> ARNC["AddRelationNewConstraints\nraw DEFAULTs + CHECKs"] ARNC --> SCI["StoreCatalogInheritance"] SLOW --> ETEND["EventTriggerDDLCommandEnd\nEventTriggerEndCompleteQuery"]
Figure 1 — Simplified DDL dispatch from portal entry to catalog writer. The fast path (inline, left branch) and the slow path (right branch) share the same ProcessUtility gate. Error paths are not shown; ProcessUtilitySlow uses PG_TRY/PG_FINALLY to ensure EventTriggerEndCompleteQuery always fires.
Figure 1 shows the spine end-to-end. Figure 2 zooms into the
standard_ProcessUtility body itself — the part where the nodeTag switch
decides, per statement, whether the statement runs inline or is forwarded to
ProcessUtilitySlow. The switch has three kinds of arms. (1) Pure-runtime
statements (T_TransactionStmt, T_VariableSetStmt, T_CheckPointStmt)
have hand-written inline handlers and never reach the slow path — this is
exactly the START TRANSACTION case the line-529 comment warns must stay out
of event-trigger code. (2) Object-type-conditional statements (T_DropStmt,
T_RenameStmt, T_GrantStmt, T_CommentStmt, T_SecLabelStmt, …) call
EventTriggerSupportsObjectType and forward to ProcessUtilitySlow only when
the affected object type has event-trigger support — otherwise they run their
own Exec*Stmt inline. (3) Everything else falls through to the default:
arm, whose one-line comment is “All other statement types have event trigger
support” — this is how T_CreateStmt and T_AlterTableStmt reach
ProcessUtilitySlow with no per-statement conditional.
Figure 2 — standard_ProcessUtility dispatch and the ProcessUtility_hook seam
flowchart TD
PU["ProcessUtility<br/>tcop/utility.c"] -->|ProcessUtility_hook != NULL| HOOK["ProcessUtility_hook<br/>extension fn pointer"]
PU -->|hook is NULL| SPU["standard_ProcessUtility"]
HOOK -.->|"convention: call through"| SPU
SPU --> SAFE["check_stack_depth<br/>ClassifyUtilityCommandAsReadOnly<br/>PreventCommandIfReadOnly / ParallelMode"]
SAFE --> SW["switch nodeTag parsetree"]
SW -->|"T_TransactionStmt<br/>T_VariableSetStmt<br/>T_CheckPointStmt"| RUNTIME["inline handler<br/>BeginTransactionBlock<br/>SetPGVariable<br/>RequestCheckpoint"]
SW -->|"T_DropStmt / T_RenameStmt<br/>T_GrantStmt / T_CommentStmt<br/>T_SecLabelStmt ..."| COND{"EventTriggerSupportsObjectType ?"}
COND -->|no| INLINE_EXEC["inline ExecDropStmt<br/>ExecRenameStmt<br/>ExecuteGrantStmt ..."]
COND -->|yes| SLOW["ProcessUtilitySlow"]
SW -->|"default:<br/>T_CreateStmt, T_AlterTableStmt,<br/>T_IndexStmt, T_ViewStmt ..."| SLOW
SLOW --> FENCE["EventTriggerBeginCompleteQuery<br/>EventTriggerDDLCommandStart<br/>... dispatch ...<br/>EventTriggerDDLCommandEnd"]
Figure 2 — Inside standard_ProcessUtility: safety checks precede the nodeTag switch. Runtime statements run inline and never reach the slow path; object-type-conditional statements consult EventTriggerSupportsObjectType; the default: arm forwards the entire catalog-heavy remainder to ProcessUtilitySlow. The ProcessUtility_hook seam sits one level up, before any of this — an extension that installs a hook is expected (by convention, not by enforcement) to call standard_ProcessUtility to preserve the default path.
The default: arm is the literal mechanism by which CREATE TABLE reaches
the slow path. There is no case T_CreateStmt: in standard_ProcessUtility
at all — it is absorbed by the catch-all:
// standard_ProcessUtility (default arm) — src/backend/tcop/utility.c default: /* All other statement types have event trigger support */ ProcessUtilitySlow(pstate, pstmt, queryString, context, params, queryEnv, dest, qc); break; }The object-type-conditional arms, by contrast, branch explicitly. The
T_RenameStmt case is representative of the whole family:
// standard_ProcessUtility (T_RenameStmt arm) — src/backend/tcop/utility.c case T_RenameStmt: { RenameStmt *stmt = (RenameStmt *) parsetree;
if (EventTriggerSupportsObjectType(stmt->renameType)) ProcessUtilitySlow(pstate, pstmt, queryString, context, params, queryEnv, dest, qc); else ExecRenameStmt(stmt); } break;transformCreateStmt: semantic analysis before DefineRelation
Section titled “transformCreateStmt: semantic analysis before DefineRelation”Before ProcessUtilitySlow calls DefineRelation, it passes the raw
CreateStmt through transformCreateStmt (parser/parse_utilcmd.c:164).
This function does the semantic work the parser cannot: it expands LIKE
clauses (which copy column definitions from an existing table), resolves
INHERITS parents, converts SERIAL/GENERATED shorthand into explicit
DEFAULT and constraint nodes, and normalises column-level constraints into
table-level constraint nodes. The return value is a List * because a
single CREATE TABLE ... (LIKE t INCLUDING ALL) can expand into multiple
CreateStmt nodes (the original table plus any index-creation statements
needed to reproduce the source table’s indexes). ProcessUtilitySlow loops
over that list and handles each item.
DefineRelation: from CreateStmt to relationId
Section titled “DefineRelation: from CreateStmt to relationId”DefineRelation (commands/tablecmds.c:764) is the coordinator for the
create-table path. It does not write any catalog rows itself; instead it
assembles all the inputs and calls the lower-level writers:
// DefineRelation — src/backend/commands/tablecmds.cDefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, ObjectAddress *typaddress, const char *queryString){ // 1. Resolve namespace, tablespace, access method namespaceId = RangeVarGetAndCheckCreationNamespace(stmt->relation, ...);
// 2. Compute lockmode for parent scan parentLockmode = (stmt->partbound != NULL ? AccessExclusiveLock : ShareUpdateExclusiveLock);
// 3. Merge inherited attributes into tableElts stmt->tableElts = MergeAttributes(stmt->tableElts, inheritOids, ...);
// 4. Build TupleDesc from merged column list descriptor = BuildDescForRelation(stmt->tableElts);
// 5. Separate raw DEFAULTs from cooked (inherited) ones // raw -> rawDefaults; cooked -> cookedDefaults
// 6. Write catalog rows + allocate physical file relationId = heap_create_with_catalog(relname, namespaceId, tablespaceId, InvalidOid, InvalidOid, ofTypeId, ownerId, accessMethodId, descriptor, list_concat(cookedDefaults, old_constraints), relkind, relpersistence, false, false, stmt->oncommit, reloptions, true, false, false, InvalidOid, typaddress);
// 7. Make new pg_class tuple visible within this transaction CommandCounterIncrement();
// 8. Open the new relation (exclusive lock, for lock-manager accounting) rel = relation_open(relationId, AccessExclusiveLock);
// 9. Transform raw DEFAULT/CHECK expressions now that the relcache // entry exists if (rawDefaults) AddRelationNewConstraints(rel, rawDefaults, NIL, true, true, false, queryString);
// 10. Bump again so generated-column expressions are visible CommandCounterIncrement();
// 11. Process partitioning (if partbound != NULL) // 12. Set partition strategy / partition key // 13. Store inheritance linkage StoreCatalogInheritance(relationId, inheritOids, partitioned);
// 14. Add table-level CHECK constraints // ...
return address;}The block above is a structural sketch; the rest of this section quotes the
real REL_18 code at the load-bearing points. First, namespace resolution and
the parent lock mode. RangeVarGetAndCheckCreationNamespace both resolves the
schema and locks it against concurrent drop; the parent lock mode is
self-exclusive for ordinary inheritance but AccessExclusiveLock when the new
relation is a partition (its parent’s partition descriptor is about to change):
// DefineRelation (namespace + parent lockmode) — src/backend/commands/tablecmds.c namespaceId = RangeVarGetAndCheckCreationNamespace(stmt->relation, NoLock, NULL); // ... temp-table security check ... parentLockmode = (stmt->partbound != NULL ? AccessExclusiveLock : ShareUpdateExclusiveLock);Next, the schema is assembled. MergeAttributes folds inherited columns into
stmt->tableElts (and is destructive — it rewrites the list in place), then
BuildDescForRelation turns the merged column list into the TupleDesc that
heap_create_with_catalog will persist as pg_attribute rows:
// DefineRelation (merge + build TupleDesc) — src/backend/commands/tablecmds.c stmt->tableElts = MergeAttributes(stmt->tableElts, inheritOids, stmt->relation->relpersistence, stmt->partbound != NULL, &old_constraints, &old_notnulls);
descriptor = BuildDescForRelation(stmt->tableElts);The default-value split happens column by column. A column with a
raw_default (an unparsed parse tree from this CREATE TABLE) is appended
to rawDefaults for later transformation; a column with a cooked_default
(an already-transformed Expr *, inherited from a parent) is wrapped in a
CookedConstraint and will be handed straight to heap_create_with_catalog:
// DefineRelation (raw vs. cooked default split) — src/backend/commands/tablecmds.c if (colDef->raw_default != NULL) { RawColumnDefault *rawEnt;
Assert(colDef->cooked_default == NULL); rawEnt = (RawColumnDefault *) palloc(sizeof(RawColumnDefault)); rawEnt->attnum = attnum; rawEnt->raw_default = colDef->raw_default; rawEnt->generated = colDef->generated; rawDefaults = lappend(rawDefaults, rawEnt); } else if (colDef->cooked_default != NULL) { CookedConstraint *cooked; // ... fill cooked->contype = CONSTR_DEFAULT, attnum, expr ... cookedDefaults = lappend(cookedDefaults, cooked); }Finally the real catalog-write call and the two-phase default handling around
it. Note that heap_create_with_catalog receives only the cooked defaults
(concatenated with old_constraints); the raw ones are applied after the
first CommandCounterIncrement makes the new pg_class row visible, because
transformExpr (inside AddRelationNewConstraints) needs a real relation to
resolve column references against:
// DefineRelation (heap_create_with_catalog + two-phase defaults) — src/backend/commands/tablecmds.c relationId = heap_create_with_catalog(relname, namespaceId, tablespaceId, InvalidOid, InvalidOid, ofTypeId, ownerId, accessMethodId, descriptor, list_concat(cookedDefaults, old_constraints), relkind, stmt->relation->relpersistence, false, false, stmt->oncommit, reloptions, true, allowSystemTableMods, false, InvalidOid, typaddress);
/* make the new pg_class tuple visible for opening */ CommandCounterIncrement(); rel = relation_open(relationId, AccessExclusiveLock);
if (rawDefaults) AddRelationNewConstraints(rel, rawDefaults, NIL, true, true, false, queryString);
/* make column generation expressions visible for partitioning */ CommandCounterIncrement();Step 7 (the first CommandCounterIncrement) is worth pausing on. The comment
in the source says:
“We must bump the command counter to make the newly-created relation tuple visible for opening.”
This is the within-transaction visibility mechanic. heap_create_with_catalog
has inserted a pg_class row under the current transaction’s xmin. That
row is not yet visible to the current transaction’s own snapshot because
the snapshot was taken before the insert. CommandCounterIncrement
advances the command counter; the next snapshot acquisition will have a
higher curcid (current command ID) that makes the row visible. Only then
can relation_open find and lock the new relcache entry.
heap_create_with_catalog: the catalog writer
Section titled “heap_create_with_catalog: the catalog writer”heap_create_with_catalog (catalog/heap.c:1139) is the function that
turns a TupleDesc and a set of options into persistent catalog rows. Its
steps:
- Open
pg_classwithRowExclusiveLock. - Validate the TupleDesc — column name/type sanity checks, rejecting
ANYARRAYoutside bootstrap/system-table modes. - Check for name collision in both
pg_classandpg_type(every table gets a composite row type; the type name must not conflict with an existing type). - Allocate an OID / relfilenumber via
GetNewRelFileNumber, then lock the OID (LockRelationOid(relid, AccessExclusiveLock)). - Create the physical file by calling
heap_create(which calls intosmgr_createand writes the initial storage). - Insert
pg_classrow withInsertPgClassTuple. - Insert
pg_attributerows viaInsertPgAttributeTuples. - Create the composite row type in
pg_type(unless it is a sequence, toast table, or index — those do not get a row type). - Store
pg_attrdeffor pre-cooked default expressions passed incooked_constraints. - Close
pg_class.
// heap_create_with_catalog (condensed) — src/backend/catalog/heap.cheap_create_with_catalog(const char *relname, Oid relnamespace, Oid reltablespace, Oid relid, Oid reltypeid, Oid reloftypeid, Oid ownerid, Oid accessmtd, TupleDesc tupdesc, List *cooked_constraints, char relkind, char relpersistence, bool shared_relation, bool mapped_relation, OnCommitAction oncommit, Datum reloptions, bool use_user_acl, bool allow_system_table_mods, bool is_internal, Oid relrewrite, ObjectAddress *typaddress){ pg_class_desc = table_open(RelationRelationId, RowExclusiveLock);
// ... name collision checks ...
// Allocate OID (or use binary-upgrade override) relid = GetNewRelFileNumber(reltablespace, pg_class_desc, relpersistence); LockRelationOid(relid, AccessExclusiveLock);
// Create relcache entry + physical storage new_rel_desc = heap_create(relname, relnamespace, reltablespace, relid, relfilenumber, accessmtd, tupdesc, relkind, relpersistence, shared_relation, mapped_relation, allow_system_table_mods, &relfrozenxid, &relminmxid, true /* create_storage */);
// Write pg_class row, pg_attribute rows, pg_type row, pg_attrdef rows // ... (AddNewRelationTuple, InsertPgAttributeTuples, etc.) ...
table_close(pg_class_desc, RowExclusiveLock); return relid;}heap_create (catalog/heap.c:285) is the lower half: it creates the
relcache entry (RelationBuildDesc-style) and calls smgr_create to
allocate the physical fork files on disk. It does not write any catalog
rows; that is entirely heap_create_with_catalog’s job. The split exists
so that callers who already have a valid relcache entry (e.g., REINDEX)
can call heap_create without going through the full catalog-write path.
CommandCounterIncrement and the two-phase DEFAULT problem
Section titled “CommandCounterIncrement and the two-phase DEFAULT problem”Raw DEFAULT expressions in CREATE TABLE (col INT DEFAULT expr) cannot
be transformed when the parser sees them, because transformExpr requires
a pre-existing relcache entry to resolve column references. DefineRelation
therefore separates “cooked” defaults (already-transformed Expr * nodes,
coming from inherited columns) from “raw” defaults (unparsed Node * trees
from the current CREATE TABLE). The cooked defaults are passed directly
into heap_create_with_catalog; the raw ones are stashed in rawDefaults.
After CommandCounterIncrement makes the new pg_class row visible,
DefineRelation opens the relation and calls AddRelationNewConstraints
(catalog/heap.c:2402), passing the raw defaults. AddRelationNewConstraints
calls transformExpr against the now-open relation’s parse-state, converting
each raw DEFAULT node into an Expr * and writing it to pg_attrdef.
For generated columns (GENERATED ALWAYS AS expr STORED), a second
CommandCounterIncrement follows so the generated expressions are visible
before the optional partbound processing reads them.
This two-phase pattern — write catalog row, bump counter, then process
expressions that reference the new row — recurs throughout the DDL path.
ALTER TABLE exercises it at every sub-command boundary, which is why
tablecmds.c contains dozens of CommandCounterIncrement() calls.
ProcessUtilityForAlterTable: recursion back through ProcessUtility
Section titled “ProcessUtilityForAlterTable: recursion back through ProcessUtility”ALTER TABLE sub-commands (e.g., ADD CONSTRAINT FOREIGN KEY ... USING INDEX) can themselves trigger additional DDL. The function
ProcessUtilityForAlterTable (tcop/utility.c:1959) handles this by
wrapping the sub-statement in a PlannedStmt and calling ProcessUtility
recursively with PROCESS_UTILITY_SUBCOMMAND context. The subcommand
context suppresses the event-trigger fences (the outer ALTER TABLE has
already opened them) and prevents top-level-only checks:
// ProcessUtilityForAlterTable — src/backend/tcop/utility.cProcessUtilityForAlterTable(Node *stmt, AlterTableUtilityContext *context){ EventTriggerAlterTableEnd(); // close current AT sub-command set // ... build wrapper PlannedStmt ... ProcessUtility(wrapper, context->queryString, false, PROCESS_UTILITY_SUBCOMMAND, /* suppress fences */ context->params, context->queryEnv, None_Receiver, NULL); EventTriggerAlterTableStart(context->pstmt->utilityStmt); EventTriggerAlterTableRelid(context->relid);}This recursive path is how a single ALTER TABLE statement can internally
drive multiple CREATE INDEX, ADD CONSTRAINT, and CREATE TRIGGER
commands without replicating their dispatch logic.
Source Walkthrough
Section titled “Source Walkthrough”Entry and dispatch
Section titled “Entry and dispatch”ProcessUtility(tcop/utility.c) — the single entry point for all utility statements fromPortalRunUtility. ChecksProcessUtility_hook; callsstandard_ProcessUtilityif no hook.ProcessUtility_hook—ProcessUtility_hook_typefunction pointer, declaredPGDLLIMPORT; the extension interception point.standard_ProcessUtility(tcop/utility.c) — safety checks (read- only mode, parallel mode, stack depth) followed by anodeTagswitch. Simple statements run inline; catalog-heavy ones callProcessUtilitySlow.ProcessUtilitySlow(tcop/utility.c) — event-trigger fence wrapper (EventTriggerBeginCompleteQuery/EventTriggerDDLCommandStart/…End) around the full catalog-heavy dispatch switch.
CREATE TABLE path
Section titled “CREATE TABLE path”transformCreateStmt(parser/parse_utilcmd.c) — semantic analysis ofCreateStmt: expandsLIKE, resolvesINHERITS, convertsSERIAL, splits constraints. Returns aList *of statements.DefineRelation(commands/tablecmds.c) — coordinator: namespace lookup,MergeAttributesfor inherited columns,BuildDescForRelation, call toheap_create_with_catalog, twoCommandCounterIncrementcalls,AddRelationNewConstraints,StoreCatalogInheritance.heap_create_with_catalog(catalog/heap.c) — catalog writer: openspg_class, allocates OID, callsheap_create, writespg_class/pg_attribute/pg_typerows.heap_create(catalog/heap.c) — physical layer: builds relcache entry, callssmgr_createto allocate segment files.CommandCounterIncrement— advances the per-transaction command counter; called at least twice perCREATE TABLE(post-catalog-write, post-default-expression processing).AddRelationNewConstraints(catalog/heap.c) — transforms rawDEFAULTandCHECKexpression trees against the now-open relation, writespg_attrdef/pg_constraintrows.StoreCatalogInheritance(commands/tablecmds.c) — writespg_inheritsrows forINHERITSand declarative partition parents.NewRelationCreateToastTable(commands/tablecmds.c) — creates the TOAST relation if the new table’s row type may exceedTOAST_TUPLE_THRESHOLD.
ALTER TABLE recursion
Section titled “ALTER TABLE recursion”ProcessUtilityForAlterTable(tcop/utility.c) — called fromATExecCmdfor sub-commands that require their own DDL dispatch; wraps the sub-statement and callsProcessUtilitywithPROCESS_UTILITY_SUBCOMMAND.
Position hints (as of 2026-06-05, commit 273fe94)
Section titled “Position hints (as of 2026-06-05, commit 273fe94)”| Symbol | File | Line |
|---|---|---|
ProcessUtility | src/backend/tcop/utility.c | 499 |
ProcessUtility_hook | src/backend/tcop/utility.c | 70 |
standard_ProcessUtility | src/backend/tcop/utility.c | 543 |
ProcessUtilitySlow | src/backend/tcop/utility.c | 1092 |
ProcessUtilityForAlterTable | src/backend/tcop/utility.c | 1959 |
transformCreateStmt | src/backend/parser/parse_utilcmd.c | 164 |
DefineRelation | src/backend/commands/tablecmds.c | 764 |
RangeVarGetAndCheckCreationNamespace | src/backend/catalog/namespace.c | 739 |
MergeAttributes | src/backend/commands/tablecmds.c | 2546 |
BuildDescForRelation | src/backend/commands/tablecmds.c | 1380 |
heap_create_with_catalog | src/backend/catalog/heap.c | 1139 |
heap_create | src/backend/catalog/heap.c | 285 |
InsertPgAttributeTuples | src/backend/catalog/heap.c | 731 |
AddRelationNewConstraints | src/backend/catalog/heap.c | 2402 |
StoreConstraints | src/backend/catalog/heap.c | 2327 |
StoreCatalogInheritance | src/backend/commands/tablecmds.c | 3521 |
heap_drop_with_catalog | src/backend/catalog/heap.c | 1801 |
Source verification (as of 2026-06-05)
Section titled “Source verification (as of 2026-06-05)”Verified facts
Section titled “Verified facts”-
ProcessUtilitychecks exactly one hook (ProcessUtility_hook) before delegating. Verified attcop/utility.c:518. The hook is typedProcessUtility_hook_typeand declaredPGDLLIMPORT; the default value isNULL. No secondary hooks exist at this level. -
The simple/slow split is conditional on
EventTriggerSupportsObjectTypeforT_DropStmt,T_RenameStmt,T_GrantStmt, and a handful of others. Verified attcop/utility.c:977–1070.T_CreateStmtandT_AlterTableStmtalways go toProcessUtilitySlowwith no conditional — everyCREATE TABLEgets event-trigger wrapping regardless of whether any event triggers are defined. -
ProcessUtilitySlowwraps the entire command body inPG_TRYto guaranteeEventTriggerEndCompleteQueryfires even on error. Verified attcop/utility.c:1113. The cleanup is in thePG_FINALLYblock, notPG_CATCH, so it fires on both normal completion and error. -
transformCreateStmtis called beforeDefineRelation, not after. Verified attcop/utility.c:1143. The transformed list is looped over; eachCreateStmtnode in the list callsDefineRelationindependently, meaning a single user statement can create multiple heap relations (e.g., whenLIKE … INCLUDING INDEXESis used on a table with multiple indexes). -
CommandCounterIncrementis called at least twice inDefineRelation. Verified attablecmds.c:1082andtablecmds.c:1111. The first bump (afterheap_create_with_catalog) makes thepg_classrow visible forrelation_open. The second bump (afterAddRelationNewConstraints) makes generated-column expressions visible before partition-bound processing. -
heap_create_with_catalogtakesAccessExclusiveLockon the new OID immediately after allocating it. Verified atcatalog/heap.c:1292. The comment explains: “Other sessions’ catalog scans can’t find this until we commit. Hence, it doesn’t hurt to hold AccessExclusiveLock.” The lock is purely for lock-manager accounting — preventing false deadlock reports if the same OID is referenced again before commit. -
heap_createis responsible for both the relcache entry and the physical file;heap_create_with_catalogdoes not callsmgr_createdirectly. Verified by readingheap_create_with_catalogatcatalog/heap.c:1330: it callsheap_create(... true /* create_storage */), andheap_createatcatalog/heap.c:336callssmgr_createwhenRELKIND_HAS_STORAGE.
Open questions
Section titled “Open questions”-
NewRelationCreateToastTableheuristic.DefineRelationcallsNewRelationCreateToastTableunconditionally afterCommandCounterIncrement. The function presumably inspects theTupleDescto decide whether a TOAST table is needed. The exact decision threshold (based onMaxHeapTupleSizeand per-column storage mode) is documented inaccess/heap/but not verified against the REL_18_STABLE source in this pass. -
pg_typerow for partitioned tables.heap_create_with_catalogcreates a composite row type forRELKIND_PARTITIONED_TABLE. Whether the row type is accessible viapg_typethe same way a regular table’s type is (e.g., viaROW(...)construction) is not verified in this pass; the condition atcatalog/heap.c:1355gates the type creation but the exact logic was not traced in full. -
ProcessUtility_hookreentrancy. If an extension’s hook implementation calls back intoProcessUtility(notstandard_ProcessUtility), the hook fires again for the inner call. Whether any shipped extension exploits or guards against this is not verified.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”-
CUBRID’s
do_statement/SM_TEMPLATEDDL path. CUBRID dispatches DDL throughdo_statement(execute_statement.c) →do_create_entity→dbt_create_class→sm_finish_class, using a staging template (SM_TEMPLATE) that is mutated and then committed atomically viaupdate_class. The key difference from PostgreSQL: CUBRID’s DDL staging is an explicit mutable object, while PostgreSQL’s staging is implicit in the transaction’s visibility rules —pg_classrows under the currentxminare simply not visible to other transactions until commit. Seecubrid-ddl-execution.md. -
Oracle’s implicit-commit DDL and Library Cache Lock. Oracle commits the current transaction before executing DDL (implicit commit), making DDL non-transactional. Cache coherence is maintained by the Library Cache Lock rather than MVCC-style visibility. A PostgreSQL engineer who has worked with Oracle will find the explicit
CommandCounterIncrementpattern surprising — it exists because DDL is transactional and rows must be made visible within the same transaction’s scope. -
MySQL 8.0 transactional DDL via the data dictionary. MySQL 8.0 replaced its non-transactional
FRMfiles with a transactional InnoDB-backed data dictionary. DDL now participates in two-phase commit for atomic DDL across the dictionary and the SE, closely analogous to PostgreSQL’s fully transactional catalog. The implementation paper comparing the two approaches has not been curated; a side-by-side cost analysis ofCommandCounterIncrement- style vs. 2PC-style within-transaction DDL staging would be a useful follow-up. -
DDL and the event-trigger extensibility surface. PostgreSQL’s event triggers (
ddl_command_start,ddl_command_end,sql_drop,table_rewrite) are the extensibility hook that makes tools likepg_audit,ddl_historizer, and schema-migration frameworks possible without patching the core. The design — aProcessUtilitySlowwrapper that collects affected objects into a per-command context and fires PL/pgSQL or C functions at two checkpoints — is documented inpostgres-event-triggers.md. -
DDL-logging and schema-change capture. Logical decoding in PostgreSQL captures DML changes by replaying WAL. DDL changes are not natively decoded by the replication protocol (the catalog writes are WAL-logged, but the logical decoding output plugin receives a
COMMITrecord, not a structured DDL event). Tools likepglogicalandpg_auto_failoverextend this by capturing event-trigger output and serialising it separately. A formal treatment of “DDL as part of the replication stream” remains an active research and engineering area.
Sources
Section titled “Sources”Source files (REL_18_STABLE, commit 273fe94):
src/backend/tcop/utility.c—ProcessUtility,standard_ProcessUtility,ProcessUtilitySlow,ProcessUtilityForAlterTablesrc/backend/commands/tablecmds.c—DefineRelation,StoreCatalogInheritance,MergeAttributessrc/backend/catalog/heap.c—heap_create,heap_create_with_catalog,InsertPgAttributeTuples,AddRelationNewConstraints,StoreConstraintssrc/backend/parser/parse_utilcmd.c—transformCreateStmtsrc/include/tcop/utility.h—ProcessUtility_hook_type,ProcessUtility_hook
Textbooks:
- Database System Concepts, Silberschatz et al., 7th ed., §5.2 “Data Definition Language”, §15.5 “DDL in SQL”
Cross-references:
postgres-overview-ddl-schema.md— section router; the ProcessUtility dispatch map and the two-escapee (RLS, partition routing) framingpostgres-alter-table.md— the multi-phaseAlterTableengine intablecmds.c; the AT-subcommand walk that usesProcessUtilityForAlterTablepostgres-cache-invalidation.md— the sinval-message broadcast that propagates relcache invalidation to other backends after DDL commitpostgres-system-catalogs.md—pg_class,pg_attribute,pg_constraintrow layouts; whatheap_create_with_catalogwritespostgres-xact.md— the transaction lifecycle under which transactional DDL runs;CommandCounterIncrementin the xact contextcubrid-ddl-execution.md— CUBRID’sSM_TEMPLATE-based DDL staging path; the foil for PostgreSQL’s implicit-visibility approach