Skip to content

PostgreSQL DDL Execution — ProcessUtility Dispatch, the Simple/Slow Split, and the CREATE TABLE Path

Contents:

Every relational engine splits its statement set into two families. The first family — SELECT, INSERT, UPDATE, DELETE, MERGE — can be optimized by a cost-based planner: the planner examines statistics, generates alternative plans, picks the cheapest, and produces an executor tree. The second family — CREATE TABLE, ALTER TABLE, CREATE INDEX, DROP, GRANT, and all the rest — cannot be cost-estimated because their work is structural: they alter the shape of the catalog, not the content of user rows. Database System Concepts (Silberschatz et al., 7th ed., §5.2) names this second family Data Definition Language (DDL) and characterizes its job as maintaining the schema: the set of relation definitions, type annotations, integrity constraints, and access-control rules that every subsequent DML statement is compiled against.

The canonical DDL pipeline has five movements, regardless of which engine executes it:

  1. Parse the DDL statement into a definition tree that names the object being defined and all its attributes.
  2. Validate the definition against the existing catalog — namespace uniqueness, type resolution, permission checks.
  3. Materialise on-disk artefacts — allocate the heap file for a new table, the index file for a new index.
  4. Write catalog rows — insert or update pg_class, pg_attribute, pg_index, pg_constraint, and related catalog tables so every other backend can discover the new object.
  5. Invalidate caches — broadcast to all backends that any cached plan or relcache entry referencing the changed object must be discarded before the next compile.

Two design axes cut across all five movements and shape the implementation significantly.

Transactional vs. implicit-commit DDL. PostgreSQL, SQL Server, and DB2 execute DDL inside the same transaction and WAL-recovery framework as DML. A CREATE TABLE followed by ROLLBACK is indistinguishable from no CREATE TABLE at all — the catalog rows are simply not committed. Oracle and pre-8.0 MySQL treat DDL as an implicit commit: the engine commits the current transaction before executing the DDL, which removes the need to journal catalog mutations through the undo log but makes DDL irreversible within a session. PostgreSQL’s choice — fully transactional DDL — forces every catalog mutation to be WAL-logged and undoable, which in turn requires the catalog-write path to participate in the same lock-and-MVCC machinery as the user-data path.

Schema-change vs. data-movement DDL. Pure-metadata DDL (CREATE TABLE, DROP INDEX) takes a short exclusive lock on the new object (which no concurrent session can yet see), writes a few catalog rows, and exits in milliseconds. Data-touching DDL (ALTER TABLE ... ADD COLUMN c NOT NULL DEFAULT expr, partition reorganisation) must scan and rewrite existing rows — it can run for hours while holding an AccessExclusiveLock on the table, blocking all concurrent reads. The code reflects this split starkly: PostgreSQL separates standard_ProcessUtility (which handles the simple, no-data-movement cases inline) from ProcessUtilitySlow (which wraps every command in event-trigger fences and dispatches into the commands/tablecmds.c-style per-command modules).

Cache invalidation via CommandCounterIncrement. Within a single transaction, catalog changes written by one statement must become visible to subsequent statements in the same transaction before that transaction commits. PostgreSQL solves this with a per-transaction command counter (CommandId). Each CommandCounterIncrement() call advances the counter, making rows written by the previous command visible to the next command’s snapshot. The DDL path calls it at precise points — immediately after writing pg_class to make the new table tuple visible, again after adding default expressions so that subsequent constraint processing can reference generated columns. The broader cache-invalidation story (sinval messages propagating relcache invalidations to other backends) is covered in postgres-cache-invalidation.md; this document focuses on the within- transaction command-counter discipline.

The textbook gives the model; this section names the engineering conventions that nearly every DBMS adopts when implementing DDL. PostgreSQL’s specific choices in §“PostgreSQL’s Approach” are one set of dials within this shared space.

A single dispatch gate for non-optimizable statements

Section titled “A single dispatch gate for non-optimizable statements”

Every engine that distinguishes DML from DDL needs a single checkpoint that routes statements to the right handler. The planner does not touch DDL, so a separate dispatch function acts as the front door for the entire non- optimizable statement set. This function is also the natural place to:

  • enforce read-only transaction rules (a CREATE TABLE must not be executed inside a read-only transaction or while a backup is taking a consistent snapshot);
  • check for parallel-mode restrictions (many DDL statements cannot run inside a parallel worker context);
  • fire extension hooks so loadable modules can intercept or replace the built-in behaviour.

The pattern of dispatch_gate → extension_hook_or_default → per_command_ module is universal. PostgreSQL’s ProcessUtilityProcessUtility_hookstandard_ProcessUtility is the canonical instance.

A simple/slow split on event-trigger support

Section titled “A simple/slow split on event-trigger support”

Event triggers (PostgreSQL’s term for command-level callbacks — “fire before this DDL command”, “fire after this DDL command finishes”) require the engine to maintain a per-command context object throughout the statement’s execution, populate it with the objects affected, and fire the trigger at two checkpoints (start and end). This bookkeeping is non-trivial and cannot be retrofitted around every statement without adding overhead to the lightweight path (e.g., BEGIN, SET, CHECKPOINT).

The standard pattern is to have two dispatch levels:

  • A fast path handles statements that either do not support event triggers at all or for which event-trigger support is conditional on the object type. These run inline, no trigger context allocated.
  • A slow path wraps every command in trigger fences (EventTriggerBeginCompleteQuery / EventTriggerDDLCommandStart / EventTriggerEndCompleteQuery) and delegates to per-command handlers.

PostgreSQL’s standard_ProcessUtility is the fast path; ProcessUtilitySlow is the slow path.

Staged construction with command-counter visibility

Section titled “Staged construction with command-counter visibility”

When DDL creates an object in multiple steps (create the heap, write pg_attribute rows, process default expressions, add constraints), the steps must see each other’s outputs within the same transaction. The standard mechanism is a command counter or statement counter that advances the effective snapshot between steps, making previously written catalog rows visible. Oracle’s Library Cache Lock and PostgreSQL’s CommandCounterIncrement serve this role.

Transactional catalog writes with rollback support

Section titled “Transactional catalog writes with rollback support”

In a transactional DDL engine, catalog rows are regular heap tuples subject to the same MVCC rules as user rows. Writing a pg_class row during CREATE TABLE is identical in mechanism to INSERT INTO pg_class (...): the row is inserted under the current transaction’s xmin, is not visible to other transactions until commit, and is removed (by vacuum) if the transaction rolls back. This means the catalog-write path is simply the heap-write path applied to system catalogs, and DDL rollback requires no special undo machinery beyond the standard MVCC + WAL combination.

ConceptPostgreSQL name
DDL dispatch gateProcessUtilitytcop/utility.c:499
Extension hookProcessUtility_hook (ProcessUtility_hook_type)
Default handlerstandard_ProcessUtilitytcop/utility.c:543
Simple (no event-trigger overhead) pathinline cases in standard_ProcessUtility
Full event-trigger pathProcessUtilitySlowtcop/utility.c:1092
Event-trigger fence openEventTriggerBeginCompleteQuery + EventTriggerDDLCommandStart
Event-trigger fence closeEventTriggerEndCompleteQuery
Definition tree (parse output)CreateStmt, AlterTableStmt, … (nodes/parsenodes.h)
Per-command semantic analysistransformCreateStmt (parser/parse_utilcmd.c:164)
Table creation entryDefineRelationcommands/tablecmds.c:764
Catalog row writerheap_create_with_catalogcatalog/heap.c:1139
Physical file allocatorheap_createcatalog/heap.c:285
Within-transaction visibility bumpCommandCounterIncrement
Raw-default expression processorAddRelationNewConstraintscatalog/heap.c:2402
Inheritance chain writerStoreCatalogInheritancecommands/tablecmds.c:3521

The dispatch spine: ProcessUtility → standard_ProcessUtility → ProcessUtilitySlow

Section titled “The dispatch spine: ProcessUtility → standard_ProcessUtility → ProcessUtilitySlow”

Every statement that is not a plain DML enters through ProcessUtility. Its only job is to check whether a hook is registered; if so it hands control to the hook, otherwise to standard_ProcessUtility:

// ProcessUtility — src/backend/tcop/utility.c
ProcessUtility(PlannedStmt *pstmt,
const char *queryString,
bool readOnlyTree,
ProcessUtilityContext context,
ParamListInfo params,
QueryEnvironment *queryEnv,
DestReceiver *dest,
QueryCompletion *qc)
{
// ...
if (ProcessUtility_hook)
(*ProcessUtility_hook) (pstmt, queryString, readOnlyTree,
context, params, queryEnv, dest, qc);
else
standard_ProcessUtility(pstmt, queryString, readOnlyTree,
context, params, queryEnv, dest, qc);
}

The hook is declared as ProcessUtility_hook_type ProcessUtility_hook = NULL in tcop/utility.c and exposed as PGDLLIMPORT via tcop/utility.h. An extension that wants to intercept CREATE TABLE (for example, to add audit logging or enforce naming conventions) installs its own function at this hook and calls standard_ProcessUtility inside it to preserve the default path.

standard_ProcessUtility begins with safety checks — read-only transaction enforcement, parallel-mode restrictions, stack depth — then switches on the parse node tag. Statements that either do not support event triggers or need conditional support (based on the object type) run inline; catalog-heavy statements are forwarded to ProcessUtilitySlow. For example:

// standard_ProcessUtility — src/backend/tcop/utility.c
case T_DropStmt:
{
DropStmt *stmt = (DropStmt *) parsetree;
if (EventTriggerSupportsObjectType(stmt->removeType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv, dest, qc);
else
ExecDropStmt(stmt, isTopLevel);
}
break;

For T_CreateStmt (ordinary CREATE TABLE) there is no conditional — it always goes through ProcessUtilitySlow. The comment at line 529 captures the design intent:

“standard_ProcessUtility itself deals only with utility commands for which we do not provide event trigger support. Commands that do have such support are passed down to ProcessUtilitySlow, which contains the necessary infrastructure for such triggers. This division is not just for performance: it’s critical that the event trigger code not be invoked when doing START TRANSACTION for example, because we might need to refresh the event trigger cache, which requires being in a valid transaction.”

ProcessUtilitySlow opens the event-trigger fence, dispatches on the node tag, and closes the fence in a PG_TRY/PG_CATCH block to ensure cleanup even when the command errors:

// ProcessUtilitySlow — src/backend/tcop/utility.c
ProcessUtilitySlow(ParseState *pstate, PlannedStmt *pstmt, ...)
{
// ...
bool isCompleteQuery = (context != PROCESS_UTILITY_SUBCOMMAND);
needCleanup = isCompleteQuery && EventTriggerBeginCompleteQuery();
PG_TRY();
{
if (isCompleteQuery)
EventTriggerDDLCommandStart(parsetree);
switch (nodeTag(parsetree))
{
case T_CreateStmt:
case T_CreateForeignTableStmt:
{
List *stmts;
// ... transform, then loop over stmts:
stmts = transformCreateStmt((CreateStmt *) parsetree,
queryString);
while (stmts != NIL)
{
Node *stmt = (Node *) linitial(stmts);
stmts = list_delete_first(stmts);
if (IsA(stmt, CreateStmt))
{
address = DefineRelation(cstmt, RELKIND_RELATION,
InvalidOid, NULL, queryString);
EventTriggerCollectSimpleCommand(address, ...);
CommandCounterIncrement();
NewRelationCreateToastTable(address.objectId, ...);
}
// ... LIKE clauses and foreign tables handled separately
}
}
break;
// ... other T_* cases
}
if (isCompleteQuery)
{
EventTriggerSQLDropAddTarget(/* ... */);
EventTriggerDDLCommandEnd(parsetree);
}
}
PG_FINALLY();
{
if (needCleanup)
EventTriggerEndCompleteQuery();
}
PG_END_TRY();
}

Figure 1 — DDL dispatch flow from portal to catalog writer

flowchart TD
  PORTAL["PortalRunUtility\n(query-processing)"] --> PU["ProcessUtility\ntcop/utility.c"]

  PU -->|hook installed| HOOK["ProcessUtility_hook\nextension code"]
  PU -->|no hook| SPU["standard_ProcessUtility"]
  HOOK -.->|calls back| SPU

  SPU -->|no event-trigger support| INLINE["inline handler\ne.g. BeginTransactionBlock\nRequestCheckpoint"]
  SPU -->|conditional, wrong type| INLINE2["inline ExecDropStmt\nExecRenameStmt, ..."]
  SPU -->|catalog-heavy or conditional+match| SLOW["ProcessUtilitySlow"]

  SLOW --> ETSTART["EventTriggerDDLCommandStart"]
  ETSTART --> DISPATCH["switch nodeTag\nT_CreateStmt\nT_AlterTableStmt\nT_DropStmt\n..."]

  DISPATCH -->|CREATE TABLE| TRANSFORM["transformCreateStmt\nparse_utilcmd.c"]
  TRANSFORM --> DR["DefineRelation\ntablecmds.c"]
  DR --> HCC["heap_create_with_catalog\ncatalog/heap.c"]
  HCC --> HC["heap_create\ncatalog/heap.c"]
  HC -->|physical| SMGR["smgr_create + storage alloc"]
  HCC -->|catalog rows| PGCLASS["INSERT pg_class\npg_attribute\npg_type"]
  DR --> CCI["CommandCounterIncrement"]
  CCI --> ARNC["AddRelationNewConstraints\nraw DEFAULTs + CHECKs"]
  ARNC --> SCI["StoreCatalogInheritance"]

  SLOW --> ETEND["EventTriggerDDLCommandEnd\nEventTriggerEndCompleteQuery"]

Figure 1 — Simplified DDL dispatch from portal entry to catalog writer. The fast path (inline, left branch) and the slow path (right branch) share the same ProcessUtility gate. Error paths are not shown; ProcessUtilitySlow uses PG_TRY/PG_FINALLY to ensure EventTriggerEndCompleteQuery always fires.

Figure 1 shows the spine end-to-end. Figure 2 zooms into the standard_ProcessUtility body itself — the part where the nodeTag switch decides, per statement, whether the statement runs inline or is forwarded to ProcessUtilitySlow. The switch has three kinds of arms. (1) Pure-runtime statements (T_TransactionStmt, T_VariableSetStmt, T_CheckPointStmt) have hand-written inline handlers and never reach the slow path — this is exactly the START TRANSACTION case the line-529 comment warns must stay out of event-trigger code. (2) Object-type-conditional statements (T_DropStmt, T_RenameStmt, T_GrantStmt, T_CommentStmt, T_SecLabelStmt, …) call EventTriggerSupportsObjectType and forward to ProcessUtilitySlow only when the affected object type has event-trigger support — otherwise they run their own Exec*Stmt inline. (3) Everything else falls through to the default: arm, whose one-line comment is “All other statement types have event trigger support” — this is how T_CreateStmt and T_AlterTableStmt reach ProcessUtilitySlow with no per-statement conditional.

Figure 2 — standard_ProcessUtility dispatch and the ProcessUtility_hook seam

flowchart TD
  PU["ProcessUtility<br/>tcop/utility.c"] -->|ProcessUtility_hook != NULL| HOOK["ProcessUtility_hook<br/>extension fn pointer"]
  PU -->|hook is NULL| SPU["standard_ProcessUtility"]
  HOOK -.->|"convention: call through"| SPU

  SPU --> SAFE["check_stack_depth<br/>ClassifyUtilityCommandAsReadOnly<br/>PreventCommandIfReadOnly / ParallelMode"]
  SAFE --> SW["switch nodeTag parsetree"]

  SW -->|"T_TransactionStmt<br/>T_VariableSetStmt<br/>T_CheckPointStmt"| RUNTIME["inline handler<br/>BeginTransactionBlock<br/>SetPGVariable<br/>RequestCheckpoint"]

  SW -->|"T_DropStmt / T_RenameStmt<br/>T_GrantStmt / T_CommentStmt<br/>T_SecLabelStmt ..."| COND{"EventTriggerSupportsObjectType ?"}
  COND -->|no| INLINE_EXEC["inline ExecDropStmt<br/>ExecRenameStmt<br/>ExecuteGrantStmt ..."]
  COND -->|yes| SLOW["ProcessUtilitySlow"]

  SW -->|"default:<br/>T_CreateStmt, T_AlterTableStmt,<br/>T_IndexStmt, T_ViewStmt ..."| SLOW

  SLOW --> FENCE["EventTriggerBeginCompleteQuery<br/>EventTriggerDDLCommandStart<br/>... dispatch ...<br/>EventTriggerDDLCommandEnd"]

Figure 2 — Inside standard_ProcessUtility: safety checks precede the nodeTag switch. Runtime statements run inline and never reach the slow path; object-type-conditional statements consult EventTriggerSupportsObjectType; the default: arm forwards the entire catalog-heavy remainder to ProcessUtilitySlow. The ProcessUtility_hook seam sits one level up, before any of this — an extension that installs a hook is expected (by convention, not by enforcement) to call standard_ProcessUtility to preserve the default path.

The default: arm is the literal mechanism by which CREATE TABLE reaches the slow path. There is no case T_CreateStmt: in standard_ProcessUtility at all — it is absorbed by the catch-all:

// standard_ProcessUtility (default arm) — src/backend/tcop/utility.c
default:
/* All other statement types have event trigger support */
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
dest, qc);
break;
}

The object-type-conditional arms, by contrast, branch explicitly. The T_RenameStmt case is representative of the whole family:

// standard_ProcessUtility (T_RenameStmt arm) — src/backend/tcop/utility.c
case T_RenameStmt:
{
RenameStmt *stmt = (RenameStmt *) parsetree;
if (EventTriggerSupportsObjectType(stmt->renameType))
ProcessUtilitySlow(pstate, pstmt, queryString,
context, params, queryEnv,
dest, qc);
else
ExecRenameStmt(stmt);
}
break;

transformCreateStmt: semantic analysis before DefineRelation

Section titled “transformCreateStmt: semantic analysis before DefineRelation”

Before ProcessUtilitySlow calls DefineRelation, it passes the raw CreateStmt through transformCreateStmt (parser/parse_utilcmd.c:164). This function does the semantic work the parser cannot: it expands LIKE clauses (which copy column definitions from an existing table), resolves INHERITS parents, converts SERIAL/GENERATED shorthand into explicit DEFAULT and constraint nodes, and normalises column-level constraints into table-level constraint nodes. The return value is a List * because a single CREATE TABLE ... (LIKE t INCLUDING ALL) can expand into multiple CreateStmt nodes (the original table plus any index-creation statements needed to reproduce the source table’s indexes). ProcessUtilitySlow loops over that list and handles each item.

DefineRelation: from CreateStmt to relationId

Section titled “DefineRelation: from CreateStmt to relationId”

DefineRelation (commands/tablecmds.c:764) is the coordinator for the create-table path. It does not write any catalog rows itself; instead it assembles all the inputs and calls the lower-level writers:

// DefineRelation — src/backend/commands/tablecmds.c
DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
ObjectAddress *typaddress, const char *queryString)
{
// 1. Resolve namespace, tablespace, access method
namespaceId = RangeVarGetAndCheckCreationNamespace(stmt->relation, ...);
// 2. Compute lockmode for parent scan
parentLockmode = (stmt->partbound != NULL ?
AccessExclusiveLock : ShareUpdateExclusiveLock);
// 3. Merge inherited attributes into tableElts
stmt->tableElts = MergeAttributes(stmt->tableElts, inheritOids, ...);
// 4. Build TupleDesc from merged column list
descriptor = BuildDescForRelation(stmt->tableElts);
// 5. Separate raw DEFAULTs from cooked (inherited) ones
// raw -> rawDefaults; cooked -> cookedDefaults
// 6. Write catalog rows + allocate physical file
relationId = heap_create_with_catalog(relname, namespaceId,
tablespaceId, InvalidOid,
InvalidOid, ofTypeId, ownerId,
accessMethodId, descriptor,
list_concat(cookedDefaults,
old_constraints),
relkind, relpersistence,
false, false, stmt->oncommit,
reloptions, true, false, false,
InvalidOid, typaddress);
// 7. Make new pg_class tuple visible within this transaction
CommandCounterIncrement();
// 8. Open the new relation (exclusive lock, for lock-manager accounting)
rel = relation_open(relationId, AccessExclusiveLock);
// 9. Transform raw DEFAULT/CHECK expressions now that the relcache
// entry exists
if (rawDefaults)
AddRelationNewConstraints(rel, rawDefaults, NIL, true, true, false,
queryString);
// 10. Bump again so generated-column expressions are visible
CommandCounterIncrement();
// 11. Process partitioning (if partbound != NULL)
// 12. Set partition strategy / partition key
// 13. Store inheritance linkage
StoreCatalogInheritance(relationId, inheritOids, partitioned);
// 14. Add table-level CHECK constraints
// ...
return address;
}

The block above is a structural sketch; the rest of this section quotes the real REL_18 code at the load-bearing points. First, namespace resolution and the parent lock mode. RangeVarGetAndCheckCreationNamespace both resolves the schema and locks it against concurrent drop; the parent lock mode is self-exclusive for ordinary inheritance but AccessExclusiveLock when the new relation is a partition (its parent’s partition descriptor is about to change):

// DefineRelation (namespace + parent lockmode) — src/backend/commands/tablecmds.c
namespaceId =
RangeVarGetAndCheckCreationNamespace(stmt->relation, NoLock, NULL);
// ... temp-table security check ...
parentLockmode = (stmt->partbound != NULL ? AccessExclusiveLock :
ShareUpdateExclusiveLock);

Next, the schema is assembled. MergeAttributes folds inherited columns into stmt->tableElts (and is destructive — it rewrites the list in place), then BuildDescForRelation turns the merged column list into the TupleDesc that heap_create_with_catalog will persist as pg_attribute rows:

// DefineRelation (merge + build TupleDesc) — src/backend/commands/tablecmds.c
stmt->tableElts =
MergeAttributes(stmt->tableElts, inheritOids,
stmt->relation->relpersistence,
stmt->partbound != NULL,
&old_constraints, &old_notnulls);
descriptor = BuildDescForRelation(stmt->tableElts);

The default-value split happens column by column. A column with a raw_default (an unparsed parse tree from this CREATE TABLE) is appended to rawDefaults for later transformation; a column with a cooked_default (an already-transformed Expr *, inherited from a parent) is wrapped in a CookedConstraint and will be handed straight to heap_create_with_catalog:

// DefineRelation (raw vs. cooked default split) — src/backend/commands/tablecmds.c
if (colDef->raw_default != NULL)
{
RawColumnDefault *rawEnt;
Assert(colDef->cooked_default == NULL);
rawEnt = (RawColumnDefault *) palloc(sizeof(RawColumnDefault));
rawEnt->attnum = attnum;
rawEnt->raw_default = colDef->raw_default;
rawEnt->generated = colDef->generated;
rawDefaults = lappend(rawDefaults, rawEnt);
}
else if (colDef->cooked_default != NULL)
{
CookedConstraint *cooked;
// ... fill cooked->contype = CONSTR_DEFAULT, attnum, expr ...
cookedDefaults = lappend(cookedDefaults, cooked);
}

Finally the real catalog-write call and the two-phase default handling around it. Note that heap_create_with_catalog receives only the cooked defaults (concatenated with old_constraints); the raw ones are applied after the first CommandCounterIncrement makes the new pg_class row visible, because transformExpr (inside AddRelationNewConstraints) needs a real relation to resolve column references against:

// DefineRelation (heap_create_with_catalog + two-phase defaults) — src/backend/commands/tablecmds.c
relationId = heap_create_with_catalog(relname, namespaceId, tablespaceId,
InvalidOid, InvalidOid, ofTypeId,
ownerId, accessMethodId, descriptor,
list_concat(cookedDefaults,
old_constraints),
relkind, stmt->relation->relpersistence,
false, false, stmt->oncommit, reloptions,
true, allowSystemTableMods, false,
InvalidOid, typaddress);
/* make the new pg_class tuple visible for opening */
CommandCounterIncrement();
rel = relation_open(relationId, AccessExclusiveLock);
if (rawDefaults)
AddRelationNewConstraints(rel, rawDefaults, NIL,
true, true, false, queryString);
/* make column generation expressions visible for partitioning */
CommandCounterIncrement();

Step 7 (the first CommandCounterIncrement) is worth pausing on. The comment in the source says:

“We must bump the command counter to make the newly-created relation tuple visible for opening.”

This is the within-transaction visibility mechanic. heap_create_with_catalog has inserted a pg_class row under the current transaction’s xmin. That row is not yet visible to the current transaction’s own snapshot because the snapshot was taken before the insert. CommandCounterIncrement advances the command counter; the next snapshot acquisition will have a higher curcid (current command ID) that makes the row visible. Only then can relation_open find and lock the new relcache entry.

heap_create_with_catalog: the catalog writer

Section titled “heap_create_with_catalog: the catalog writer”

heap_create_with_catalog (catalog/heap.c:1139) is the function that turns a TupleDesc and a set of options into persistent catalog rows. Its steps:

  1. Open pg_class with RowExclusiveLock.
  2. Validate the TupleDesc — column name/type sanity checks, rejecting ANYARRAY outside bootstrap/system-table modes.
  3. Check for name collision in both pg_class and pg_type (every table gets a composite row type; the type name must not conflict with an existing type).
  4. Allocate an OID / relfilenumber via GetNewRelFileNumber, then lock the OID (LockRelationOid(relid, AccessExclusiveLock)).
  5. Create the physical file by calling heap_create (which calls into smgr_create and writes the initial storage).
  6. Insert pg_class row with InsertPgClassTuple.
  7. Insert pg_attribute rows via InsertPgAttributeTuples.
  8. Create the composite row type in pg_type (unless it is a sequence, toast table, or index — those do not get a row type).
  9. Store pg_attrdef for pre-cooked default expressions passed in cooked_constraints.
  10. Close pg_class.
// heap_create_with_catalog (condensed) — src/backend/catalog/heap.c
heap_create_with_catalog(const char *relname,
Oid relnamespace, Oid reltablespace,
Oid relid, Oid reltypeid, Oid reloftypeid,
Oid ownerid, Oid accessmtd,
TupleDesc tupdesc,
List *cooked_constraints,
char relkind, char relpersistence,
bool shared_relation, bool mapped_relation,
OnCommitAction oncommit,
Datum reloptions,
bool use_user_acl,
bool allow_system_table_mods,
bool is_internal,
Oid relrewrite,
ObjectAddress *typaddress)
{
pg_class_desc = table_open(RelationRelationId, RowExclusiveLock);
// ... name collision checks ...
// Allocate OID (or use binary-upgrade override)
relid = GetNewRelFileNumber(reltablespace, pg_class_desc, relpersistence);
LockRelationOid(relid, AccessExclusiveLock);
// Create relcache entry + physical storage
new_rel_desc = heap_create(relname, relnamespace, reltablespace,
relid, relfilenumber, accessmtd,
tupdesc, relkind, relpersistence,
shared_relation, mapped_relation,
allow_system_table_mods,
&relfrozenxid, &relminmxid,
true /* create_storage */);
// Write pg_class row, pg_attribute rows, pg_type row, pg_attrdef rows
// ... (AddNewRelationTuple, InsertPgAttributeTuples, etc.) ...
table_close(pg_class_desc, RowExclusiveLock);
return relid;
}

heap_create (catalog/heap.c:285) is the lower half: it creates the relcache entry (RelationBuildDesc-style) and calls smgr_create to allocate the physical fork files on disk. It does not write any catalog rows; that is entirely heap_create_with_catalog’s job. The split exists so that callers who already have a valid relcache entry (e.g., REINDEX) can call heap_create without going through the full catalog-write path.

CommandCounterIncrement and the two-phase DEFAULT problem

Section titled “CommandCounterIncrement and the two-phase DEFAULT problem”

Raw DEFAULT expressions in CREATE TABLE (col INT DEFAULT expr) cannot be transformed when the parser sees them, because transformExpr requires a pre-existing relcache entry to resolve column references. DefineRelation therefore separates “cooked” defaults (already-transformed Expr * nodes, coming from inherited columns) from “raw” defaults (unparsed Node * trees from the current CREATE TABLE). The cooked defaults are passed directly into heap_create_with_catalog; the raw ones are stashed in rawDefaults.

After CommandCounterIncrement makes the new pg_class row visible, DefineRelation opens the relation and calls AddRelationNewConstraints (catalog/heap.c:2402), passing the raw defaults. AddRelationNewConstraints calls transformExpr against the now-open relation’s parse-state, converting each raw DEFAULT node into an Expr * and writing it to pg_attrdef. For generated columns (GENERATED ALWAYS AS expr STORED), a second CommandCounterIncrement follows so the generated expressions are visible before the optional partbound processing reads them.

This two-phase pattern — write catalog row, bump counter, then process expressions that reference the new row — recurs throughout the DDL path. ALTER TABLE exercises it at every sub-command boundary, which is why tablecmds.c contains dozens of CommandCounterIncrement() calls.

ProcessUtilityForAlterTable: recursion back through ProcessUtility

Section titled “ProcessUtilityForAlterTable: recursion back through ProcessUtility”

ALTER TABLE sub-commands (e.g., ADD CONSTRAINT FOREIGN KEY ... USING INDEX) can themselves trigger additional DDL. The function ProcessUtilityForAlterTable (tcop/utility.c:1959) handles this by wrapping the sub-statement in a PlannedStmt and calling ProcessUtility recursively with PROCESS_UTILITY_SUBCOMMAND context. The subcommand context suppresses the event-trigger fences (the outer ALTER TABLE has already opened them) and prevents top-level-only checks:

// ProcessUtilityForAlterTable — src/backend/tcop/utility.c
ProcessUtilityForAlterTable(Node *stmt, AlterTableUtilityContext *context)
{
EventTriggerAlterTableEnd(); // close current AT sub-command set
// ... build wrapper PlannedStmt ...
ProcessUtility(wrapper,
context->queryString,
false,
PROCESS_UTILITY_SUBCOMMAND, /* suppress fences */
context->params,
context->queryEnv,
None_Receiver, NULL);
EventTriggerAlterTableStart(context->pstmt->utilityStmt);
EventTriggerAlterTableRelid(context->relid);
}

This recursive path is how a single ALTER TABLE statement can internally drive multiple CREATE INDEX, ADD CONSTRAINT, and CREATE TRIGGER commands without replicating their dispatch logic.

  • ProcessUtility (tcop/utility.c) — the single entry point for all utility statements from PortalRunUtility. Checks ProcessUtility_hook; calls standard_ProcessUtility if no hook.
  • ProcessUtility_hookProcessUtility_hook_type function pointer, declared PGDLLIMPORT; the extension interception point.
  • standard_ProcessUtility (tcop/utility.c) — safety checks (read- only mode, parallel mode, stack depth) followed by a nodeTag switch. Simple statements run inline; catalog-heavy ones call ProcessUtilitySlow.
  • ProcessUtilitySlow (tcop/utility.c) — event-trigger fence wrapper (EventTriggerBeginCompleteQuery / EventTriggerDDLCommandStart / …End) around the full catalog-heavy dispatch switch.
  • transformCreateStmt (parser/parse_utilcmd.c) — semantic analysis of CreateStmt: expands LIKE, resolves INHERITS, converts SERIAL, splits constraints. Returns a List * of statements.
  • DefineRelation (commands/tablecmds.c) — coordinator: namespace lookup, MergeAttributes for inherited columns, BuildDescForRelation, call to heap_create_with_catalog, two CommandCounterIncrement calls, AddRelationNewConstraints, StoreCatalogInheritance.
  • heap_create_with_catalog (catalog/heap.c) — catalog writer: opens pg_class, allocates OID, calls heap_create, writes pg_class / pg_attribute / pg_type rows.
  • heap_create (catalog/heap.c) — physical layer: builds relcache entry, calls smgr_create to allocate segment files.
  • CommandCounterIncrement — advances the per-transaction command counter; called at least twice per CREATE TABLE (post-catalog-write, post-default-expression processing).
  • AddRelationNewConstraints (catalog/heap.c) — transforms raw DEFAULT and CHECK expression trees against the now-open relation, writes pg_attrdef / pg_constraint rows.
  • StoreCatalogInheritance (commands/tablecmds.c) — writes pg_inherits rows for INHERITS and declarative partition parents.
  • NewRelationCreateToastTable (commands/tablecmds.c) — creates the TOAST relation if the new table’s row type may exceed TOAST_TUPLE_THRESHOLD.
  • ProcessUtilityForAlterTable (tcop/utility.c) — called from ATExecCmd for sub-commands that require their own DDL dispatch; wraps the sub-statement and calls ProcessUtility with PROCESS_UTILITY_SUBCOMMAND.

Position hints (as of 2026-06-05, commit 273fe94)

Section titled “Position hints (as of 2026-06-05, commit 273fe94)”
SymbolFileLine
ProcessUtilitysrc/backend/tcop/utility.c499
ProcessUtility_hooksrc/backend/tcop/utility.c70
standard_ProcessUtilitysrc/backend/tcop/utility.c543
ProcessUtilitySlowsrc/backend/tcop/utility.c1092
ProcessUtilityForAlterTablesrc/backend/tcop/utility.c1959
transformCreateStmtsrc/backend/parser/parse_utilcmd.c164
DefineRelationsrc/backend/commands/tablecmds.c764
RangeVarGetAndCheckCreationNamespacesrc/backend/catalog/namespace.c739
MergeAttributessrc/backend/commands/tablecmds.c2546
BuildDescForRelationsrc/backend/commands/tablecmds.c1380
heap_create_with_catalogsrc/backend/catalog/heap.c1139
heap_createsrc/backend/catalog/heap.c285
InsertPgAttributeTuplessrc/backend/catalog/heap.c731
AddRelationNewConstraintssrc/backend/catalog/heap.c2402
StoreConstraintssrc/backend/catalog/heap.c2327
StoreCatalogInheritancesrc/backend/commands/tablecmds.c3521
heap_drop_with_catalogsrc/backend/catalog/heap.c1801
  • ProcessUtility checks exactly one hook (ProcessUtility_hook) before delegating. Verified at tcop/utility.c:518. The hook is typed ProcessUtility_hook_type and declared PGDLLIMPORT; the default value is NULL. No secondary hooks exist at this level.

  • The simple/slow split is conditional on EventTriggerSupportsObjectType for T_DropStmt, T_RenameStmt, T_GrantStmt, and a handful of others. Verified at tcop/utility.c:977–1070. T_CreateStmt and T_AlterTableStmt always go to ProcessUtilitySlow with no conditional — every CREATE TABLE gets event-trigger wrapping regardless of whether any event triggers are defined.

  • ProcessUtilitySlow wraps the entire command body in PG_TRY to guarantee EventTriggerEndCompleteQuery fires even on error. Verified at tcop/utility.c:1113. The cleanup is in the PG_FINALLY block, not PG_CATCH, so it fires on both normal completion and error.

  • transformCreateStmt is called before DefineRelation, not after. Verified at tcop/utility.c:1143. The transformed list is looped over; each CreateStmt node in the list calls DefineRelation independently, meaning a single user statement can create multiple heap relations (e.g., when LIKE … INCLUDING INDEXES is used on a table with multiple indexes).

  • CommandCounterIncrement is called at least twice in DefineRelation. Verified at tablecmds.c:1082 and tablecmds.c:1111. The first bump (after heap_create_with_catalog) makes the pg_class row visible for relation_open. The second bump (after AddRelationNewConstraints) makes generated-column expressions visible before partition-bound processing.

  • heap_create_with_catalog takes AccessExclusiveLock on the new OID immediately after allocating it. Verified at catalog/heap.c:1292. The comment explains: “Other sessions’ catalog scans can’t find this until we commit. Hence, it doesn’t hurt to hold AccessExclusiveLock.” The lock is purely for lock-manager accounting — preventing false deadlock reports if the same OID is referenced again before commit.

  • heap_create is responsible for both the relcache entry and the physical file; heap_create_with_catalog does not call smgr_create directly. Verified by reading heap_create_with_catalog at catalog/heap.c:1330: it calls heap_create(... true /* create_storage */), and heap_create at catalog/heap.c:336 calls smgr_create when RELKIND_HAS_STORAGE.

  1. NewRelationCreateToastTable heuristic. DefineRelation calls NewRelationCreateToastTable unconditionally after CommandCounterIncrement. The function presumably inspects the TupleDesc to decide whether a TOAST table is needed. The exact decision threshold (based on MaxHeapTupleSize and per-column storage mode) is documented in access/heap/ but not verified against the REL_18_STABLE source in this pass.

  2. pg_type row for partitioned tables. heap_create_with_catalog creates a composite row type for RELKIND_PARTITIONED_TABLE. Whether the row type is accessible via pg_type the same way a regular table’s type is (e.g., via ROW(...) construction) is not verified in this pass; the condition at catalog/heap.c:1355 gates the type creation but the exact logic was not traced in full.

  3. ProcessUtility_hook reentrancy. If an extension’s hook implementation calls back into ProcessUtility (not standard_ProcessUtility), the hook fires again for the inner call. Whether any shipped extension exploits or guards against this is not verified.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”
  • CUBRID’s do_statement / SM_TEMPLATE DDL path. CUBRID dispatches DDL through do_statement (execute_statement.c) → do_create_entitydbt_create_classsm_finish_class, using a staging template (SM_TEMPLATE) that is mutated and then committed atomically via update_class. The key difference from PostgreSQL: CUBRID’s DDL staging is an explicit mutable object, while PostgreSQL’s staging is implicit in the transaction’s visibility rules — pg_class rows under the current xmin are simply not visible to other transactions until commit. See cubrid-ddl-execution.md.

  • Oracle’s implicit-commit DDL and Library Cache Lock. Oracle commits the current transaction before executing DDL (implicit commit), making DDL non-transactional. Cache coherence is maintained by the Library Cache Lock rather than MVCC-style visibility. A PostgreSQL engineer who has worked with Oracle will find the explicit CommandCounterIncrement pattern surprising — it exists because DDL is transactional and rows must be made visible within the same transaction’s scope.

  • MySQL 8.0 transactional DDL via the data dictionary. MySQL 8.0 replaced its non-transactional FRM files with a transactional InnoDB-backed data dictionary. DDL now participates in two-phase commit for atomic DDL across the dictionary and the SE, closely analogous to PostgreSQL’s fully transactional catalog. The implementation paper comparing the two approaches has not been curated; a side-by-side cost analysis of CommandCounterIncrement- style vs. 2PC-style within-transaction DDL staging would be a useful follow-up.

  • DDL and the event-trigger extensibility surface. PostgreSQL’s event triggers (ddl_command_start, ddl_command_end, sql_drop, table_rewrite) are the extensibility hook that makes tools like pg_audit, ddl_historizer, and schema-migration frameworks possible without patching the core. The design — a ProcessUtilitySlow wrapper that collects affected objects into a per-command context and fires PL/pgSQL or C functions at two checkpoints — is documented in postgres-event-triggers.md.

  • DDL-logging and schema-change capture. Logical decoding in PostgreSQL captures DML changes by replaying WAL. DDL changes are not natively decoded by the replication protocol (the catalog writes are WAL-logged, but the logical decoding output plugin receives a COMMIT record, not a structured DDL event). Tools like pglogical and pg_auto_failover extend this by capturing event-trigger output and serialising it separately. A formal treatment of “DDL as part of the replication stream” remains an active research and engineering area.

Source files (REL_18_STABLE, commit 273fe94):

  • src/backend/tcop/utility.cProcessUtility, standard_ProcessUtility, ProcessUtilitySlow, ProcessUtilityForAlterTable
  • src/backend/commands/tablecmds.cDefineRelation, StoreCatalogInheritance, MergeAttributes
  • src/backend/catalog/heap.cheap_create, heap_create_with_catalog, InsertPgAttributeTuples, AddRelationNewConstraints, StoreConstraints
  • src/backend/parser/parse_utilcmd.ctransformCreateStmt
  • src/include/tcop/utility.hProcessUtility_hook_type, ProcessUtility_hook

Textbooks:

  • Database System Concepts, Silberschatz et al., 7th ed., §5.2 “Data Definition Language”, §15.5 “DDL in SQL”

Cross-references:

  • postgres-overview-ddl-schema.md — section router; the ProcessUtility dispatch map and the two-escapee (RLS, partition routing) framing
  • postgres-alter-table.md — the multi-phase AlterTable engine in tablecmds.c; the AT-subcommand walk that uses ProcessUtilityForAlterTable
  • postgres-cache-invalidation.md — the sinval-message broadcast that propagates relcache invalidation to other backends after DDL commit
  • postgres-system-catalogs.mdpg_class, pg_attribute, pg_constraint row layouts; what heap_create_with_catalog writes
  • postgres-xact.md — the transaction lifecycle under which transactional DDL runs; CommandCounterIncrement in the xact context
  • cubrid-ddl-execution.md — CUBRID’s SM_TEMPLATE-based DDL staging path; the foil for PostgreSQL’s implicit-visibility approach