
PostgreSQL Triggers — Definition, Firing Points, and the After-Trigger Queue


A trigger is a database object that ties a piece of procedural code to a data-modification event so that the code fires automatically and as part of the same transaction as the event. Database System Concepts (Silberschatz, 7e, ch. 5 “Advanced SQL”, §5.3 “Triggers”) defines it as “a statement that the system executes automatically as a side effect of a modification to the database,” and names the two design decisions that every trigger facility must make explicit:

  1. The event and condition that cause the trigger to be executed. SQL triggers specify the triggering event (INSERT, UPDATE, DELETE), optionally a column list for UPDATE, and an optional WHEN condition that is checked before the body runs (§5.3.1). Only when the event occurs and the condition holds does the action fire — the textbook’s event-condition-action (ECA) model.
  2. The actions to be taken when the trigger executes, together with two orthogonal axes that decide how often and when the action runs:
    • Granularity. A FOR EACH ROW trigger fires once per affected row and can see the row’s OLD/NEW images; a FOR EACH STATEMENT trigger fires once for the whole statement regardless of how many rows it touched (§5.3.1, “row-level” vs. “statement-level”).
    • Timing. A BEFORE trigger runs before the change is applied — and can therefore inspect or alter the proposed row, or cancel the operation; an AFTER trigger runs after the change, when the new state is visible and final (§5.3.1).

The textbook is also blunt about the hazards, and these hazards are exactly what the implementation must defend against. “Triggers can be used to implement certain integrity constraints” and to maintain derived data such as materialized aggregates, but “triggers need to be written with great care, since a trigger error detected at runtime causes the failure of the … statement that set off the trigger” (§5.3.2). The two named pitfalls are cascading / non-termination — a trigger whose action fires further triggers, possibly without bound — and unintended ordering, where the result depends on the order in which multiple triggers on the same event run. SQL standardizes the latter only weakly (the standard leaves order largely to the implementation), so each engine must pick and document a firing order.

There is a third concept the standard adds and that the textbook treats more briefly: transition tables (REFERENCING OLD TABLE AS / NEW TABLE AS). A statement-level AFTER trigger can ask to see the entire set of rows the statement changed, as two read-only relations, rather than firing per row. This turns a per-row visitor into a set-oriented one — the difference between an O(rows) cascade of function calls and a single call that can issue one set-based SQL statement over the delta. It is the construct that makes statement-level triggers useful for bulk integrity maintenance.

Finally, the textbook situates triggers against declarative integrity constraints: “In many cases it is preferable to use … features [foreign keys, check constraints] rather than triggers” because the system can reason about declarative constraints, whereas a trigger is opaque procedural code (§5.3.3). The deep implementation consequence — which PostgreSQL makes literal — is that constraints are themselves implemented as triggers: a foreign key is a pair of internal AFTER triggers, and DEFERRABLE constraints are exactly deferrable AFTER triggers. So the trigger machinery is not a peripheral feature; it is the substrate on which referential integrity rides, which raises the engineering bar for its queue discipline, ordering, and transaction integration.

The ECA model gives the semantics; production engines converge on a small set of engineering conventions to make those semantics fast, ordered, and transaction-safe. PostgreSQL’s specific choices in the next section are best read as one point in this shared design space.

1. Catalog the trigger; compile a per-relation dispatch summary. A trigger definition is metadata — a catalog row naming a function, an event mask, a timing/level flag, and an optional condition. But consulting the catalog on every row would be ruinous, so engines build a cached, per-relation descriptor listing the relation’s triggers, attached to the in-memory table descriptor and invalidated when the catalog changes. Crucially, the descriptor carries summary booleans (“does this table have any BEFORE-INSERT-ROW trigger at all?”) so the hot path can branch out in one test when a table has no triggers of the relevant kind — the common case.

2. A fixed set of firing points wired into the DML executor. The engine does not “scan for triggers” at arbitrary times; it calls a fixed family of hooks at hard-coded points in its insert/update/delete code — one hook per (timing × event × level) combination. Each hook is a no-op (one boolean test) when the summary flag is clear. This keeps the trigger subsystem pluggable into the executor rather than entangled with it.

3. BEFORE-row triggers are synchronous and may transform the tuple. Because a BEFORE FOR EACH ROW trigger can rewrite NEW or veto the operation, its hook must run inline, take the candidate tuple, and return a (possibly different, possibly null) tuple that the executor then proceeds to store. This is a pipeline transform, not a queued event.

4. AFTER triggers are deferred onto a queue. An AFTER trigger must see the final post-modification state and, for DEFERRABLE constraints, must possibly wait until commit. So the AFTER hook does not run the function; it records an event — minimally, which trigger and which row — onto a queue, and a later drain phase fires the queued events in order. The two hard problems this creates are (a) compactly identifying the row so the queue does not balloon (engines store a row identifier / tuple pointer, not a tuple copy), and (b) re-locating the row at fire time under the right visibility, since the heap may have changed.

5. Deterministic firing order. With the standard silent, engines pick a rule and document it. The dominant convention — alphabetical by trigger name within a (timing, event) class — is arbitrary but stable and inspectable, which is what users actually need.

6. Transaction and subtransaction integration. The AFTER queue is transaction-scoped: it must survive across the statements of a transaction (for deferred constraints), be drained at commit, be discarded at abort, and roll back partially on subtransaction abort — events queued by a savepoint that rolls back must vanish, while earlier events survive. This demands that the queue’s structure make “truncate back to a saved position” cheap.

7. Re-entrancy and recursion control. A trigger function can issue DML that fires more triggers. The queue drain must therefore be a loop (“fire; new events may appear; fire again until empty”), and the engine tracks nesting depth to bound runaway recursion and to integrate with statement timeouts.

PostgreSQL implements every one of these conventions, and the rest of this document is essentially a tour of how: pg_trigger + TriggerDesc for (1) and (5); the ExecBR/AR/IR/BS/AS family for (2); ExecBRInsertTriggers returning a tuple for (3); AfterTriggerSaveEvent + the chunked AfterTriggerEventList for (4) and (6); and MyTriggerDepth plus the firing loop for (7).

A trigger is a pg_trigger row, compiled into a relcache TriggerDesc


The persistent form of a trigger is one row in the pg_trigger system catalog. CREATE TRIGGER (CreateTriggerFiringOn in trigger.c) parses the statement, validates it, looks up the function, and inserts that row. The single most important field is tgtype — a packed int16 bitmask that encodes timing, level, and events together, using the bits defined in pg_trigger.h:

// tgtype bit layout — src/include/catalog/pg_trigger.h
#define TRIGGER_TYPE_ROW (1 << 0) /* else STATEMENT */
#define TRIGGER_TYPE_BEFORE (1 << 1) /* else AFTER (=0) */
#define TRIGGER_TYPE_INSERT (1 << 2)
#define TRIGGER_TYPE_DELETE (1 << 3)
#define TRIGGER_TYPE_UPDATE (1 << 4)
#define TRIGGER_TYPE_TRUNCATE (1 << 5)
#define TRIGGER_TYPE_INSTEAD (1 << 6) /* INSTEAD OF, views only */

Note two encodings that trip readers up. First, AFTER and STATEMENT are not bits — they are the absence of TRIGGER_TYPE_BEFORE/INSTEAD and TRIGGER_TYPE_ROW respectively. Second, a pg_trigger row may carry several event bits at once (INSERT OR UPDATE), unlike a runtime TriggerEvent, which always names exactly one operation (see the comment in trigger.h warning that the two representations differ).
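The two encoding rules above are easy to check with a small decoder. The following is a minimal sketch, assuming only the bit values quoted in the layout above; the DEMO_ macros and the functions demo_append and demo_describe_tgtype are hypothetical names for illustration, not PostgreSQL symbols.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit values copied from the layout quoted above. The DEMO_ prefix marks
 * them as local stand-ins, not the real pg_trigger.h macros. */
#define DEMO_TYPE_ROW      (1 << 0)
#define DEMO_TYPE_BEFORE   (1 << 1)
#define DEMO_TYPE_INSERT   (1 << 2)
#define DEMO_TYPE_DELETE   (1 << 3)
#define DEMO_TYPE_UPDATE   (1 << 4)
#define DEMO_TYPE_TRUNCATE (1 << 5)
#define DEMO_TYPE_INSTEAD  (1 << 6)

/* Append s to the NUL-terminated string in buf without overflowing it. */
void
demo_append(char *buf, size_t buflen, const char *s)
{
    size_t used = strlen(buf);

    if (used + 1 < buflen)
        snprintf(buf + used, buflen - used, "%s", s);
}

/* Render a tgtype mask as text, e.g. "BEFORE INSERT OR UPDATE ROW". */
const char *
demo_describe_tgtype(int16_t tgtype, char *buf, size_t buflen)
{
    static const struct { int bit; const char *name; } events[] = {
        { DEMO_TYPE_INSERT, "INSERT" },
        { DEMO_TYPE_DELETE, "DELETE" },
        { DEMO_TYPE_UPDATE, "UPDATE" },
        { DEMO_TYPE_TRUNCATE, "TRUNCATE" },
    };
    int nevents = 0;

    assert(buflen > 0);
    buf[0] = '\0';

    /* AFTER has no bit of its own: it is what remains when neither
     * BEFORE nor INSTEAD is set. */
    if (tgtype & DEMO_TYPE_INSTEAD)
        demo_append(buf, buflen, "INSTEAD OF");
    else if (tgtype & DEMO_TYPE_BEFORE)
        demo_append(buf, buflen, "BEFORE");
    else
        demo_append(buf, buflen, "AFTER");

    /* A catalog row may carry several event bits at once. */
    for (size_t i = 0; i < sizeof(events) / sizeof(events[0]); i++)
    {
        if (tgtype & events[i].bit)
        {
            demo_append(buf, buflen, nevents++ ? " OR " : " ");
            demo_append(buf, buflen, events[i].name);
        }
    }

    /* STATEMENT is likewise the absence of the ROW bit. */
    demo_append(buf, buflen, (tgtype & DEMO_TYPE_ROW) ? " ROW" : " STATEMENT");
    return buf;
}
```

A mask with no timing or level bit set at all, such as DEMO_TYPE_DELETE alone, therefore decodes as an AFTER ... STATEMENT trigger, which is exactly the case that surprises readers of raw tgtype values.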

When a backend first touches a relation, the relcache builds a TriggerDesc by scanning pg_trigger for that table. RelationBuildTriggers reads the rows in name order (it scans via TriggerRelidNameIndexId), which is how PostgreSQL realizes the “alphabetical firing order” convention — the array order is the firing order:

// RelationBuildTriggers — src/backend/commands/trigger.c
/*
 * Note: since we scan the triggers using TriggerRelidNameIndexId, we will
 * be reading the triggers in name order ... This in turn
 * ensures that triggers will be fired in name order.
 */
ScanKeyInit(&skey, Anum_pg_trigger_tgrelid,
            BTEqualStrategyNumber, F_OIDEQ,
            ObjectIdGetDatum(RelationGetRelid(relation)));
tgrel = table_open(TriggerRelationId, AccessShareLock);
tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true, NULL, 1, &skey);
while (HeapTupleIsValid(htup = systable_getnext(tgscan)))
{
    Form_pg_trigger pg_trigger = (Form_pg_trigger) GETSTRUCT(htup);
    /* ... copy tgname, tgfoid, tgtype, tgenabled, tgdeferrable, ... into build ... */
}
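The "array order is firing order" idea can be shown outside the server. PostgreSQL gets the ordering for free from the index scan; the sketch below instead sorts explicitly, and assumes plain byte-wise name comparison. DemoTrigger, demo_cmp_tgname, and demo_order_triggers are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the one Trigger field that matters here. */
typedef struct DemoTrigger
{
    const char *tgname;
} DemoTrigger;

/* qsort comparator: byte-wise name comparison (an assumption of this sketch). */
static int
demo_cmp_tgname(const void *a, const void *b)
{
    const DemoTrigger *ta = a;
    const DemoTrigger *tb = b;

    return strcmp(ta->tgname, tb->tgname);
}

/* Put the array into name order so that a simple front-to-back walk
 * fires triggers in a stable, inspectable sequence. */
void
demo_order_triggers(DemoTrigger *triggers, size_t n)
{
    assert(triggers != NULL);
    qsort(triggers, n, sizeof(DemoTrigger), demo_cmp_tgname);
}
```

With triggers named c_summary, a_stamp, and b_audit, the walk visits a_stamp first, which is why naming conventions with sortable prefixes are a common way to control firing order.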

The resulting in-memory shape is two structs in reltrigger.h. Trigger is one trigger (mostly a copy of the pg_trigger row plus the resolved OID); TriggerDesc is the per-relation array plus a wall of summary booleans:

// TriggerDesc — src/include/utils/reltrigger.h
typedef struct TriggerDesc
{
    Trigger    *triggers;                   /* array, in name order */
    int         numtriggers;
    bool        trig_insert_before_row;     /* one flag per (event,timing,level) */
    bool        trig_insert_after_row;
    bool        trig_insert_instead_row;
    bool        trig_insert_before_statement;
    bool        trig_insert_after_statement;
    /* ... update_*, delete_*, truncate_* ... */
    bool        trig_insert_new_table;      /* any NEW TABLE transition table? */
    bool        trig_update_old_table;
    bool        trig_update_new_table;
    bool        trig_delete_old_table;
} TriggerDesc;

Those flags are filled by SetTriggerFlags, which ORs each trigger’s classification into the descriptor. The whole point is the negative test: the executor’s firing hooks begin with if (!trigdesc->trig_insert_before_row) return;, so a table with no triggers of a given class pays a single boolean load, never an array walk.

// SetTriggerFlags — src/backend/commands/trigger.c
trigdesc->trig_insert_before_row |=
    TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
                         TRIGGER_TYPE_BEFORE, TRIGGER_TYPE_INSERT);
trigdesc->trig_insert_after_row |=
    TRIGGER_TYPE_MATCHES(tgtype, TRIGGER_TYPE_ROW,
                         TRIGGER_TYPE_AFTER, TRIGGER_TYPE_INSERT);
/* ... and so on for every (event, timing, level) combination ... */
trigdesc->trig_insert_new_table |=
    (TRIGGER_FOR_INSERT(tgtype) &&
     TRIGGER_USES_TRANSITION_TABLE(trigger->tgnewtable));
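To make the "negative test" concrete, here is a minimal sketch of summary flags guarding a firing hook. It assumes a simplified matching rule and a two-flag descriptor; DemoTriggerDesc, demo_type_matches, demo_set_flags, and demo_before_row_insert are hypothetical, and the array_walks counter exists only to make the fast path observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ROW    (1 << 0)   /* else STATEMENT */
#define DEMO_BEFORE (1 << 1)   /* else AFTER */
#define DEMO_INSERT (1 << 2)
#define DEMO_UPDATE (1 << 4)

typedef struct DemoTriggerDesc
{
    const int16_t *tgtypes;           /* one packed type word per trigger */
    int            numtriggers;
    bool           insert_before_row; /* summary flags */
    bool           insert_after_row;
    int            array_walks;       /* instrumentation for this sketch only */
} DemoTriggerDesc;

/* Simplified classification test: level and timing must match exactly,
 * and the requested event bit must be present. */
bool
demo_type_matches(int16_t tgtype, int level, int timing, int event)
{
    return (tgtype & (DEMO_ROW | DEMO_BEFORE)) == (level | timing) &&
           (tgtype & event) != 0;
}

/* OR every trigger's classification into the summary flags. */
void
demo_set_flags(DemoTriggerDesc *desc)
{
    for (int i = 0; i < desc->numtriggers; i++)
    {
        desc->insert_before_row |=
            demo_type_matches(desc->tgtypes[i], DEMO_ROW, DEMO_BEFORE, DEMO_INSERT);
        desc->insert_after_row |=
            demo_type_matches(desc->tgtypes[i], DEMO_ROW, 0, DEMO_INSERT);
    }
}

/* Firing hook: returns how many triggers would run for one inserted row. */
int
demo_before_row_insert(DemoTriggerDesc *desc)
{
    int fired = 0;

    if (!desc->insert_before_row)
        return 0;               /* fast path: one boolean test, no walk */

    desc->array_walks++;
    for (int i = 0; i < desc->numtriggers; i++)
        if (demo_type_matches(desc->tgtypes[i], DEMO_ROW, DEMO_BEFORE, DEMO_INSERT))
            fired++;
    return fired;
}
```

The flags are computed once when the descriptor is built, so the per-row cost for a table without matching triggers stays constant no matter how many unrelated triggers the table carries.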

The firing-point family: one hook per (timing, level, event)


At runtime, a trigger does not “watch” for events. The executor — overwhelmingly nodeModifyTable.c, plus COPY, ExecuteTruncate, and the RI code — calls a hook at the exact moment a tuple operation happens. The hooks form a regular grid, named Exec + timing + level + event + Triggers:

| timing \ event | INSERT | UPDATE | DELETE |
| --- | --- | --- | --- |
| BEFORE statement | ExecBSInsertTriggers | ExecBSUpdateTriggers | ExecBSDeleteTriggers |
| BEFORE row | ExecBRInsertTriggers | ExecBRUpdateTriggers | ExecBRDeleteTriggers |
| INSTEAD OF row | ExecIRInsertTriggers | ExecIRUpdateTriggers | ExecIRDeleteTriggers |
| AFTER row | ExecARInsertTriggers | ExecARUpdateTriggers | ExecARDeleteTriggers |
| AFTER statement | ExecASInsertTriggers | ExecASUpdateTriggers | ExecASDeleteTriggers |

(B=before, A=after, S=statement, R=row, I=instead.) TRUNCATE adds only ExecBSTruncateTriggers / ExecASTruncateTriggers — there is no row-level truncate trigger, as the TriggerDesc comment notes. The contract differs sharply by timing, and that difference is the heart of the design:

flowchart TD
    subgraph row["per affected row, inside nodeModifyTable"]
        BR["ExecBRInsertTriggers<br/>runs function NOW<br/>returns tuple or NULL"]
        STORE["heap_insert / table_tuple_update<br/>apply the change"]
        AR["ExecARInsertTriggers<br/>does NOT run function<br/>queues an event"]
    end
    BR -->|"NULL = skip this row"| SKIP["row discarded"]
    BR -->|"tuple"| STORE
    STORE --> AR
    AR --> SAVE["AfterTriggerSaveEvent<br/>append AfterTriggerEventData"]
    SAVE --> Q[("per-query<br/>AfterTriggerEventList")]
    Q -.->|"AfterTriggerEndQuery"| FIRE["afterTriggerInvokeEvents<br/>fire immediate-mode events"]
    Q -.->|"deferrable events<br/>moved to xact list"| DEF["AfterTriggerFireDeferred<br/>at commit"]

BEFORE-row hooks run the function synchronously and transform the tuple. ExecBRInsertTriggers walks the trigger array, and for each matching+enabled trigger calls ExecCallTriggerFunc; the returned tuple becomes the input to the next trigger, so triggers chain. A NULL return means “skip this row”; a different (non-NULL) tuple is stored back into the slot:

// ExecBRInsertTriggers — src/backend/commands/trigger.c
newtuple = ExecCallTriggerFunc(&LocTriggerData, i,
                               relinfo->ri_TrigFunctions,
                               relinfo->ri_TrigInstrument,
                               GetPerTupleMemoryContext(estate));
if (newtuple == NULL)
{
    if (should_free)
        heap_freetuple(oldtuple);
    return false;       /* "do nothing" — skip this row */
}
else if (newtuple != oldtuple)
{
    newtuple = check_modified_virtual_generated(RelationGetDescr(...), newtuple);
    ExecForceStoreHeapTuple(newtuple, slot, false);     /* trigger rewrote NEW */
    /* ... partition-fit recheck for cloned triggers ... */
}
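The chaining contract (each trigger sees the previous trigger's output, NULL vetoes the row, a different tuple is stored back) can be sketched in isolation. This is a toy model under stated assumptions: tuples are a single int, and DemoTuple, DemoTriggerFn, demo_before_row_chain, demo_double_x, and demo_veto_negative are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoTuple
{
    int x;
} DemoTuple;

/* A BEFORE-row trigger: returns the tuple to store, or NULL to veto. */
typedef DemoTuple *(*DemoTriggerFn)(DemoTuple *candidate);

/* Run the chain in array order. Returns false if any trigger vetoed the
 * row; otherwise *slot holds the final, possibly rewritten, tuple. */
bool
demo_before_row_chain(const DemoTriggerFn *fns, int nfns, DemoTuple *slot)
{
    assert(slot != NULL);
    for (int i = 0; i < nfns; i++)
    {
        DemoTuple *result = fns[i](slot);

        if (result == NULL)
            return false;       /* veto: caller skips this row */
        if (result != slot)
            *slot = *result;    /* trigger rewrote NEW: store it back */
    }
    return true;
}

/* Example trigger that returns a different tuple. */
DemoTuple *
demo_double_x(DemoTuple *candidate)
{
    static DemoTuple rewritten;

    rewritten.x = candidate->x * 2;
    return &rewritten;
}

/* Example trigger that passes the tuple through or vetoes it. */
DemoTuple *
demo_veto_negative(DemoTuple *candidate)
{
    return candidate->x < 0 ? NULL : candidate;
}
```

Because the second trigger receives the first trigger's output, reordering the chain can change the stored row, which is why a deterministic firing order matters for BEFORE-row triggers in particular.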

AFTER-row hooks do almost nothing at fire time. ExecARInsertTriggers is a thin guard that, if any after-row trigger or transition table is in play, calls AfterTriggerSaveEvent — and returns. No user code runs here:

// ExecARInsertTriggers — src/backend/commands/trigger.c
if ((trigdesc && trigdesc->trig_insert_after_row) ||
    (transition_capture && transition_capture->tcs_insert_new_table))
    AfterTriggerSaveEvent(estate, relinfo, NULL, NULL,
                          TRIGGER_EVENT_INSERT,
                          true /* row_trigger */, NULL, slot,
                          recheckIndexes, NULL,
                          transition_capture, false);

Statement-level AFTER hooks are similar — ExecASInsertTriggers calls AfterTriggerSaveEvent once with row_trigger = false — while statement-level BEFORE hooks (ExecBSInsertTriggers) run synchronously like BEFORE-row but cannot return a value (a BEFORE STATEMENT trigger returning non-NULL is an error). INSTEAD-OF hooks (ExecIR*, views only) run synchronously and replace the operation entirely.

The single chokepoint through which every trigger function is actually invoked is ExecCallTriggerFunc, which switches into the per-tuple memory context, sets up the fcinfo with the TriggerData as fmgr context, bumps MyTriggerDepth (recursion accounting), and calls through fmgr:

// ExecCallTriggerFunc — src/backend/commands/trigger.c
oldContext = MemoryContextSwitchTo(per_tuple_context);
InitFunctionCallInfoData(*fcinfo, finfo, 0, InvalidOid, (Node *) trigdata, NULL);
pgstat_init_function_usage(fcinfo, &fcusage);
MyTriggerDepth++;
PG_TRY();
{
    result = FunctionCallInvoke(fcinfo);
}
PG_FINALLY();
{
    MyTriggerDepth--;
}
PG_END_TRY();

The function receives no SQL arguments; everything — event type, OLD/NEW slots, the Trigger struct, transition tuplestores — arrives through the TriggerData node in fcinfo->context, which a PL trigger reads via the CALLED_AS_TRIGGER macro and per-language glue.

Before any hook actually calls ExecCallTriggerFunc, it filters each candidate trigger through TriggerEnabled, which folds together the three “should this trigger fire for this specific row/event” tests that the catalog flags alone cannot answer:

// TriggerEnabled — src/backend/commands/trigger.c
/* 1. session_replication_role vs. tgenabled */
if (SessionReplicationRole == SESSION_REPLICATION_ROLE_REPLICA)
{
    if (trigger->tgenabled == TRIGGER_FIRES_ON_ORIGIN ||
        trigger->tgenabled == TRIGGER_DISABLED)
        return false;
}
/* 2. column-specific UPDATE trigger: skip if no listed column changed */
if (trigger->tgnattr > 0 && TRIGGER_FIRED_BY_UPDATE(event))
{
    /* ... return false unless some tgattr[] member is in modifiedCols ... */
}
/* 3. WHEN (...) qualifier */
if (trigger->tgqual)
{
    econtext->ecxt_innertuple = oldslot;    /* OLD -> INNER_VAR */
    econtext->ecxt_outertuple = newslot;    /* NEW -> OUTER_VAR */
    if (!ExecQual(*predicate, econtext))
        return false;
}

Three things are worth pulling out. First, tgenabled is not a boolean: it is a four-way state (Origin / Replica / Always / Disabled) that interacts with session_replication_role, which is how logical replication apply workers suppress origin-side triggers (replication tools such as pglogical rely on the same mechanism). Second, the column list (UPDATE OF col1, col2) is checked here against the statement’s modifiedCols bitmap, not at fire time, so a column-specific UPDATE trigger whose listed columns were not modified is filtered out before it ever reaches the queue. Third, the WHEN condition is compiled lazily: the first firing per query parses tgqual with stringToNode, rewrites OLD/NEW Var references to INNER_VAR/OUTER_VAR, and caches the ExprState in ri_TrigWhenExprs[]. It is then evaluated against the OLD/NEW slots, so a WHEN that fails also keeps the event off the queue entirely. For AFTER triggers this matters: the WHEN is evaluated at save time against the in-flight tuples, not at fire time, because save time is the only point where the OLD and NEW images are both still at hand.
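The three filters compose as a short-circuiting predicate. The sketch below is a simplified model, not the server code: columns are a bitmask, rows are a single int, and the non-replica branch reflects the documented meaning of ENABLE REPLICA TRIGGER rather than anything quoted above. DemoRole, DemoTrigger, demo_trigger_enabled, and demo_when_changed are hypothetical names; the letters 'O', 'R', 'A', 'D' follow the tgenabled values documented for pg_trigger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_ROLE_ORIGIN, DEMO_ROLE_REPLICA } DemoRole;

typedef struct DemoTrigger
{
    char     enabled;          /* 'O' origin, 'R' replica, 'A' always, 'D' disabled */
    uint32_t update_of_cols;   /* bitmask of UPDATE OF columns; 0 = no column list */
    bool   (*when)(int old_x, int new_x);   /* NULL = no WHEN clause */
} DemoTrigger;

/* Decide whether one trigger fires for one row event. */
bool
demo_trigger_enabled(const DemoTrigger *t, DemoRole role, bool is_update,
                     uint32_t modified_cols, int old_x, int new_x)
{
    /* 1. four-way enable state against the session's replication role */
    if (role == DEMO_ROLE_REPLICA)
    {
        if (t->enabled == 'O' || t->enabled == 'D')
            return false;
    }
    else
    {
        if (t->enabled == 'R' || t->enabled == 'D')
            return false;
    }

    /* 2. column list: skip unless a listed column was modified */
    if (t->update_of_cols != 0 && is_update &&
        (t->update_of_cols & modified_cols) == 0)
        return false;

    /* 3. WHEN condition, evaluated while OLD and NEW are both available */
    if (t->when != NULL && !t->when(old_x, new_x))
        return false;

    return true;
}

/* Example WHEN predicate: fire only if the value actually changed. */
bool
demo_when_changed(int old_x, int new_x)
{
    return old_x != new_x;
}
```

Ordering the tests from cheapest to most expensive means a disabled trigger never pays for expression evaluation, and for AFTER triggers every early false also saves a queue entry.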

Worked example: ordering, recursion, and the firing loop


Put the pieces together with one statement. Suppose table t has a BEFORE-row trigger a_stamp, an AFTER-row trigger b_audit, and an AFTER-statement trigger c_summary, and we run UPDATE t SET x = x + 1 WHERE x < 100 touching 40 rows.

flowchart TD
    START["ExecutorStart -> AfterTriggerBeginQuery<br/>query_depth++"]
    BS["ExecBSUpdateTriggers<br/>(no BS trigger here: flag clear, return)"]
    LOOP["for each of the 40 matching rows"]
    BR["ExecBRUpdateTriggers<br/>run a_stamp NOW, may rewrite NEW"]
    UPD["table_tuple_update applies the row"]
    AR["ExecARUpdateTriggers<br/>queue b_audit event (ctid1=old, ctid2=new)"]
    AS["ExecASUpdateTriggers<br/>queue ONE c_summary statement event"]
    END["ExecutorFinish -> AfterTriggerEndQuery"]
    MARK["afterTriggerMarkEvents: stamp firable events<br/>with firing_id = counter++"]
    INV["afterTriggerInvokeEvents: fire in queue order<br/>40x b_audit, then c_summary"]
    START --> BS --> LOOP --> BR --> UPD --> AR --> LOOP
    LOOP -->|"all rows done"| AS --> END --> MARK --> INV
    INV -.->|"a fired trigger queued more?"| MARK

Several invariants from the source surface in this trace. The 40 b_audit events are queued in row-processing order and share a single AfterTriggerSharedData record (same tgoid, relid, rolid), so the queue holds 40 two-CTID event records (an UPDATE event carries both the old and the new row version) plus one descriptor. The single c_summary event is queued after all row events because ExecASUpdateTriggers runs once at end-of-statement, and cancel_prior_stmt_triggers (called from AfterTriggerSaveEvent for the statement event) guarantees the statement trigger fires exactly once even across writable-CTE re-entry. At AfterTriggerEndQuery the events are marked with a fresh firing_id and fired in queue order, row triggers before the statement trigger, and the surrounding for (;;) loop re-runs afterTriggerMarkEvents if b_audit or c_summary itself issued DML that queued more events. Runaway recursion is not caught by static analysis: MyTriggerDepth (incremented in ExecCallTriggerFunc) only tracks the nesting level, which pg_trigger_depth() exposes to trigger code, and the actual bound comes from max_stack_depth and statement timeouts.

The reason AFTER triggers are not run inline is that they must observe the final state of the statement (after all rows are modified, after BEFORE triggers, after constraint application) and, for deferrable constraints, may have to wait until commit. PostgreSQL therefore records a tiny event per firing and drains the queue later. The record is deliberately minimal — a flags word and one or two item pointers (CTIDs), not a tuple copy:

// AfterTriggerEventData — src/backend/commands/trigger.c
typedef struct AfterTriggerEventData
{
    TriggerFlags    ate_flags;      /* status bits + offset to shared data */
    ItemPointerData ate_ctid1;      /* inserted/deleted/old-updated tuple */
    ItemPointerData ate_ctid2;      /* new updated tuple */
    Oid             ate_src_part;   /* cross-partition update only */
    Oid             ate_dst_part;
} AfterTriggerEventData;

The clever part is that the per-trigger metadata (which trigger, which relation, which role, modified-column set) is factored out into a separate AfterTriggerSharedData record, and many events can point at one shared record. The low 27 bits of ate_flags hold the byte offset from the event to its shared record (GetTriggerSharedData), and the high bits encode size class and status (AFTER_TRIGGER_1CTID, _2CTID, _CP_UPDATE, plus AFTER_TRIGGER_IN_PROGRESS / _DONE). Because most queued events for a statement share the same trigger and relation, the per-event cost of a one-CTID event collapses to roughly the size of AfterTriggerEventDataOneCtid: a 4-byte flags word plus a single 6-byte CTID. For a million-row INSERT that fires one FK check trigger, the queue is a million 12-ish-byte records sharing one descriptor, not a million tuple copies.
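The offset-in-flags trick can be sketched in a few lines. The 27-bit mask follows the description above; the status bit value, the struct layouts, and the names DemoCtid, DemoEventOneCtid, DemoEventTwoCtid, demo_set_shared, and demo_get_shared are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OFFSET_MASK 0x07FFFFFFu   /* low 27 bits: byte offset to shared record */
#define DEMO_IN_PROGRESS 0x40000000u   /* illustrative status bit; value assumed */

/* A 6-byte row locator, modelled as three 16-bit fields. */
typedef struct DemoCtid
{
    uint16_t blk_hi;
    uint16_t blk_lo;
    uint16_t posid;
} DemoCtid;

typedef struct DemoEventOneCtid
{
    uint32_t flags;
    DemoCtid ctid1;
} DemoEventOneCtid;

typedef struct DemoEventTwoCtid
{
    uint32_t flags;
    DemoCtid ctid1;
    DemoCtid ctid2;
} DemoEventTwoCtid;

/* Record where the shared descriptor lives, as a forward byte offset
 * from the event itself, leaving the status bits untouched. */
void
demo_set_shared(DemoEventOneCtid *ev, const void *shared)
{
    ptrdiff_t off = (const char *) shared - (const char *) ev;

    assert(off > 0 && (uint64_t) off <= DEMO_OFFSET_MASK);
    ev->flags = (ev->flags & ~DEMO_OFFSET_MASK) | (uint32_t) off;
}

/* Follow the offset link back to the shared descriptor. */
const void *
demo_get_shared(const DemoEventOneCtid *ev)
{
    return (const char *) ev + (ev->flags & DEMO_OFFSET_MASK);
}
```

Storing a relative offset rather than a pointer keeps the link at 27 bits inside a word the event needs anyway, so the descriptor reference costs no extra space per event. On common ABIs the one-CTID struct above occupies 12 bytes once alignment padding is included.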

Events live in an AfterTriggerEventList — a linked list of geometrically growing chunks (1 KB doubling up to 1 MB). Each chunk is a double-ended arena: AfterTriggerEventData records grow upward from freeptr, AfterTriggerSharedData records grow downward from endfree, and the offset link bridges them:

flowchart LR
    subgraph chunk["AfterTriggerEventChunk (arena)"]
        direction TB
        E1["event[0]<br/>flags+ctid1"]
        E2["event[1]<br/>flags+ctid1"]
        EDOTS["..."]
        FREE["free space"]
        SDOTS["..."]
        S1["shared[1]"]
        S0["shared[0]<br/>tgoid, relid, firing_id"]
    end
    E1 -.->|"ate_flags & OFFSET"| S0
    E2 -.->|"ate_flags & OFFSET"| S0
    L["AfterTriggerEventList<br/>head / tail / tailfree"] --> chunk

afterTriggerAddEvent is the allocator: it finds room in the tail chunk (or allocates a bigger one from the AfterTriggerEvents memory context), scans the chunk’s existing shared records for a match, reuses it if found or copies a new one in, then memcpys the event and patches the offset link:

// afterTriggerAddEvent — src/backend/commands/trigger.c
/* try to locate a matching shared-data record already in the chunk */
for (newshared = (AfterTriggerShared) chunk->endfree;
     (char *) newshared < chunk->endptr; newshared++)
{
    if (newshared->ats_tgoid == evtshared->ats_tgoid &&
        newshared->ats_event == evtshared->ats_event &&
        newshared->ats_firing_id == 0 &&
        /* ... relid, rolid, modifiedcols all equal ... */ )
        break;
}
/* ... allocate a new shared record if none matched ... */
newevent = (AfterTriggerEvent) chunk->freeptr;
memcpy(newevent, event, eventsize);
newevent->ate_flags &= ~AFTER_TRIGGER_OFFSET;
newevent->ate_flags |= (char *) newshared - (char *) newevent;    /* link */
chunk->freeptr += eventsize;
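A self-contained model of the double-ended arena may help when reading the excerpt. It is a sketch under simplifying assumptions: a single fixed-size chunk from malloc instead of a memory context, fixed-size events, and a three-field shared record. DemoShared, DemoEvent, DemoChunk, and the demo_ functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_OFFSET_MASK 0x07FFFFFFu

typedef struct DemoShared { uint32_t tgoid; uint32_t relid; uint32_t firing_id; } DemoShared;
typedef struct DemoEvent  { uint32_t flags; uint16_t ctid[3]; } DemoEvent;

typedef struct DemoChunk
{
    char *freeptr;   /* events grow upward from here */
    char *endfree;   /* shared records grow downward to here */
    char *endptr;    /* one past the end of the chunk */
    char  data[];    /* arena storage */
} DemoChunk;

DemoChunk *
demo_chunk_new(size_t size)
{
    DemoChunk *c = malloc(sizeof(DemoChunk) + size);

    assert(c != NULL && size % sizeof(uint32_t) == 0);
    c->freeptr = c->data;
    c->endptr = c->endfree = c->data + size;
    return c;
}

/* Add one event. Returns NULL when the chunk is full; a fuller version
 * would start a new, larger chunk at that point. */
DemoEvent *
demo_add_event(DemoChunk *c, uint32_t tgoid, uint32_t relid, uint16_t posid)
{
    DemoShared *shared;
    DemoEvent  *ev;
    size_t      needed = sizeof(DemoEvent);

    /* Look for a reusable shared record not yet stamped for firing. */
    for (shared = (DemoShared *) c->endfree; (char *) shared < c->endptr; shared++)
        if (shared->tgoid == tgoid && shared->relid == relid && shared->firing_id == 0)
            break;

    if ((char *) shared >= c->endptr)
        needed += sizeof(DemoShared);       /* no match: need room for a new one */
    if ((size_t) (c->endfree - c->freeptr) < needed)
        return NULL;

    if ((char *) shared >= c->endptr)
    {
        c->endfree -= sizeof(DemoShared);   /* carve from the top, downward */
        shared = (DemoShared *) c->endfree;
        *shared = (DemoShared) { tgoid, relid, 0 };
    }

    ev = (DemoEvent *) c->freeptr;          /* carve from the bottom, upward */
    ev->ctid[0] = ev->ctid[1] = 0;
    ev->ctid[2] = posid;
    ev->flags = (uint32_t) ((char *) shared - (char *) ev) & DEMO_OFFSET_MASK;
    c->freeptr += sizeof(DemoEvent);
    return ev;
}

const DemoShared *
demo_shared_of(const DemoEvent *ev)
{
    return (const DemoShared *) ((const char *) ev + (ev->flags & DEMO_OFFSET_MASK));
}

size_t
demo_shared_count(const DemoChunk *c)
{
    return (size_t) (c->endptr - c->endfree) / sizeof(DemoShared);
}
```

Queuing 40 events for one trigger and one for another leaves 41 event records but only two shared records in the chunk. The firing_id == 0 condition in the match test mirrors the excerpt above: a descriptor already stamped for a firing cycle is not reused for new events.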

The queue is two-level: afterTriggers.query_stack[query_depth].events holds events from the currently running query, while afterTriggers.events is the transaction-global deferred list. The split is what makes immediate-vs-deferred and subtransaction rollback tractable, and it is described in the big comment on AfterTriggersData:

// AfterTriggersData — src/backend/commands/trigger.c
typedef struct AfterTriggersData
{
    CommandId   firing_counter;             /* next firing-cycle ID to assign */
    SetConstraintState state;               /* active SET CONSTRAINTS state */
    AfterTriggerEventList events;           /* transaction-global deferred list */
    MemoryContext event_cxt;                /* memory context for events */
    AfterTriggersQueryData *query_stack;    /* per-query-level events */
    int         query_depth;                /* current index; -1 when empty */
    int         maxquerydepth;
    AfterTriggersTransData *trans_stack;    /* per-subxact saved pointers */
    int         maxtransdepth;
} AfterTriggersData;

The lifecycle hooks (all called from xact.c / the executor, not from user code) are:

  • AfterTriggerBeginXact — zero the state at transaction start; firing_counter = 1, query_depth = -1.
  • AfterTriggerBeginQuery — query_depth++; called from standard_ExecutorStart/ExecutorStart. Cheap: real allocation is lazy.
  • AfterTriggerEndQuery — the drain for immediate-mode events; called from ExecutorFinish. It calls afterTriggerMarkEvents to tag firable events with the next firing-cycle ID (deferred ones are migrated to the global list here), then loops afterTriggerInvokeEvents until none remain, because a fired trigger may queue more at the same level.
// AfterTriggerEndQuery — src/backend/commands/trigger.c
qs = &afterTriggers.query_stack[afterTriggers.query_depth];
for (;;)
{
    if (afterTriggerMarkEvents(&qs->events, &afterTriggers.events, true))
    {
        CommandId   firing_id = afterTriggers.firing_counter++;
        AfterTriggerEventChunk *oldtail = qs->events.tail;

        if (afterTriggerInvokeEvents(&qs->events, firing_id, estate, false))
            break;      /* all fired */
        qs = &afterTriggers.query_stack[afterTriggers.query_depth];  /* may have moved */
        /* drop fully-fired leading chunks to speed the rescan */
        while (qs->events.head != oldtail)
            afterTriggerDeleteHeadEventChunk(qs);
    }
    else
        break;
}
  • AfterTriggerFireDeferred — the drain for the transaction-global deferred list; called from CommitTransaction just before commit. It pushes a snapshot, then loops mark+invoke until empty, since deferred triggers may queue more.

The firing_counter/firing_id scheme is what keeps SET CONSTRAINTS ... IMMEDIATE sane: each drain pass stamps the events it intends to fire with a unique cycle ID, and afterTriggerInvokeEvents only fires events whose ats_firing_id matches the current cycle and whose AFTER_TRIGGER_IN_PROGRESS bit is set — so a nested SET CONSTRAINTS issued by a trigger fires only events that were not already scheduled.
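The mark-then-invoke cycle can be simulated with a flat array. This sketch makes several simplifying assumptions: the firing ID lives on each event rather than on a shared record, there is no deferred list, and the "trigger body" is hard-coded to queue one follow-up event while its payload is positive. DemoEvent, DemoQueue, and the demo_ functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX 64

typedef struct DemoEvent
{
    int      payload;
    unsigned firing_id;     /* 0 = not yet marked */
    bool     in_progress;
    bool     done;
} DemoEvent;

typedef struct DemoQueue
{
    DemoEvent ev[DEMO_MAX];
    int       n;
    unsigned  firing_counter;       /* next cycle ID; starts at 1 */
    int       fired_log[DEMO_MAX];  /* payloads in the order they fired */
    int       nfired;
} DemoQueue;

void
demo_enqueue(DemoQueue *q, int payload)
{
    assert(q->n < DEMO_MAX);
    q->ev[q->n++] = (DemoEvent) { payload, 0, false, false };
}

/* Stamp every unfired, unmarked event with the upcoming cycle ID. */
bool
demo_mark(DemoQueue *q)
{
    bool found = false;

    for (int i = 0; i < q->n; i++)
    {
        if (!q->ev[i].done && !q->ev[i].in_progress)
        {
            q->ev[i].firing_id = q->firing_counter;
            q->ev[i].in_progress = true;
            found = true;
        }
    }
    return found;
}

/* Fire only the events that belong to this cycle. Events queued while
 * firing are unmarked, so they wait for the next pass. */
void
demo_invoke(DemoQueue *q, unsigned firing_id)
{
    for (int i = 0; i < q->n; i++)
    {
        DemoEvent *e = &q->ev[i];

        if (e->in_progress && e->firing_id == firing_id)
        {
            e->in_progress = false;
            e->done = true;
            q->fired_log[q->nfired++] = e->payload;
            if (e->payload > 0)
                demo_enqueue(q, e->payload - 1);    /* trigger "issues DML" */
        }
    }
}

/* Loop until a mark pass finds nothing left. Returns the pass count. */
int
demo_drain(DemoQueue *q)
{
    int passes = 0;

    while (demo_mark(q))
    {
        unsigned firing_id = q->firing_counter++;

        demo_invoke(q, firing_id);
        passes++;
    }
    return passes;
}
```

Starting with payloads 2 and 0, the drain takes three passes: the first fires both original events, and each later pass fires the single follow-up queued by the pass before it. No event ever fires in the same pass that created it, which is the property the cycle ID exists to guarantee.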

At fire time the event carries only CTIDs, so afterTriggerInvokeEvents (via AfterTriggerExecute) re-fetches the tuple by CTID under SnapshotAny — the row must be found regardless of MVCC visibility, because the trigger acts on behalf of the modifying transaction:

// AfterTriggerExecute — src/backend/commands/trigger.c (default, heap case)
if (!table_tuple_fetch_row_version(src_rel, &(event->ate_ctid1),
                                   SnapshotAny, src_slot))
    elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
LocTriggerData.tg_trigtuple =
    ExecFetchSlotHeapTuple(LocTriggerData.tg_trigslot, false, &should_free_trig);

(Foreign-table events take a different branch: their tuples cannot be re-fetched by CTID, so they are spooled into an FDW tuplestore at save time and read back here, flagged AFTER_TRIGGER_FDW_FETCH/_REUSE.)

Subtransaction rollback is handled by trans_stack: at subxact start the current events head/tail pointers are saved, and on abort afterTriggerRestoreEventList truncates the list back to the saved position, discarding exactly the chunks added by the aborted subxact — O(chunks-added), not O(total events).
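Truncating a chunked list back to a saved position is cheap because only the tail chunk and its fill level need remembering. The following sketch assumes tiny fixed-capacity chunks of ints; DemoChunk, DemoList, DemoSavepoint, and the demo_ functions are hypothetical names rather than the server's structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_CHUNK_CAP 4

typedef struct DemoChunk
{
    struct DemoChunk *next;
    int               used;
    int               items[DEMO_CHUNK_CAP];
} DemoChunk;

typedef struct DemoList { DemoChunk *head; DemoChunk *tail; } DemoList;

/* What a subtransaction start remembers: tail chunk and its fill level. */
typedef struct DemoSavepoint { DemoChunk *tail; int used; } DemoSavepoint;

void
demo_list_append(DemoList *list, int item)
{
    if (list->tail == NULL || list->tail->used == DEMO_CHUNK_CAP)
    {
        DemoChunk *c = calloc(1, sizeof(DemoChunk));

        assert(c != NULL);
        if (list->tail != NULL)
            list->tail->next = c;
        else
            list->head = c;
        list->tail = c;
    }
    list->tail->items[list->tail->used++] = item;
}

DemoSavepoint
demo_save(const DemoList *list)
{
    DemoSavepoint sp = { list->tail, list->tail ? list->tail->used : 0 };

    return sp;
}

/* Discard everything appended after the savepoint. Cost is proportional
 * to the number of chunks added since, not to the list's total size. */
void
demo_restore(DemoList *list, DemoSavepoint sp)
{
    DemoChunk *doomed = sp.tail ? sp.tail->next : list->head;

    while (doomed != NULL)
    {
        DemoChunk *next = doomed->next;

        free(doomed);
        doomed = next;
    }

    if (sp.tail == NULL)
        list->head = list->tail = NULL;
    else
    {
        sp.tail->next = NULL;
        sp.tail->used = sp.used;    /* roll the tail's fill level back */
        list->tail = sp.tail;
    }
}

int
demo_count(const DemoList *list)
{
    int total = 0;

    for (const DemoChunk *c = list->head; c != NULL; c = c->next)
        total += c->used;
    return total;
}
```

Events that shared the tail chunk with the aborted subtransaction are dropped simply by resetting the fill level, so no per-event work is needed there either.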

Transition tables are captured separately, into tuplestores


REFERENCING OLD TABLE / NEW TABLE is a different mechanism that shares the AFTER machinery’s lifecycle but not its CTID-event representation. When a relation has any transition-table trigger, the executor (e.g., nodeModifyTable’s setup) builds a TransitionCaptureState via MakeTransitionCaptureState, which allocates tuplestore objects in the (sub)transaction’s CurTransactionContext:

// MakeTransitionCaptureState — src/backend/commands/trigger.c
if (need_old_upd && upd_table->old_tuplestore == NULL)
    upd_table->old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
if (need_new_upd && upd_table->new_tuplestore == NULL)
    upd_table->new_tuplestore = tuplestore_begin_heap(false, false, work_mem);
/* ... old_del, new_ins similarly; keyed by (relid, cmdType) ... */
state = (TransitionCaptureState *) palloc0(sizeof(TransitionCaptureState));
state->tcs_update_old_table = need_old_upd;
state->tcs_update_new_table = need_new_upd;
state->tcs_update_private = upd_table;

As each row flows through ExecAR* → AfterTriggerSaveEvent, the OLD/NEW slot is also appended to the matching tuplestore (TransitionTableAddTuple) before, or instead of, queuing an event. The decisive design constraint, stated in both MakeTransitionCaptureState and the AfterTriggersData comment, is that transition tables are never deferrable: they live only until AfterTriggerEndQuery, so a deferrable trigger cannot reference one. This is why the tuplestores can sit in CurTransactionContext and be freed when the query level pops, rather than surviving to commit like the deferred event list. The trigger function ultimately sees these as tg_oldtable/tg_newtable in its TriggerData, exposed to SQL as the named OLD/NEW relations.
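The capture-then-call-once shape can be illustrated with a growable array standing in for a tuplestore. This is only a sketch: rows are single ints, the statement-level "trigger" just sums them, and DemoTransitionTable, demo_capture_row, demo_after_statement, and demo_end_query are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a NEW TABLE tuplestore: a growable array of row values. */
typedef struct DemoTransitionTable
{
    int    *rows;
    size_t  n;
    size_t  cap;
} DemoTransitionTable;

/* Called once per affected row while the statement runs. */
void
demo_capture_row(DemoTransitionTable *t, int new_x)
{
    if (t->n == t->cap)
    {
        size_t newcap = t->cap ? t->cap * 2 : 8;
        int   *grown = realloc(t->rows, newcap * sizeof(int));

        assert(grown != NULL);
        t->rows = grown;
        t->cap = newcap;
    }
    t->rows[t->n++] = new_x;
}

/* Statement-level "trigger": one call that sees the entire delta. */
long
demo_after_statement(const DemoTransitionTable *new_table, int *calls)
{
    long sum = 0;

    (*calls)++;
    for (size_t i = 0; i < new_table->n; i++)
        sum += new_table->rows[i];
    return sum;
}

/* The captured rows do not outlive the query level. */
void
demo_end_query(DemoTransitionTable *t)
{
    free(t->rows);
    t->rows = NULL;
    t->n = t->cap = 0;
}
```

Forty captured rows still produce exactly one trigger call, and the store is released as soon as the query level ends, which matches the rule that transition tables cannot be carried to commit.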

This section names the stable symbols, grouped by the path a trigger takes from catalog to execution. Adjacent mechanisms — the DML node that calls these hooks (nodeModifyTable.c), the generic CREATE TRIGGER utility plumbing, and the fmgr call convention — are covered in postgres-executor.md, postgres-ddl-execution.md, and postgres-fmgr.md respectively; here we stay inside the trigger subsystem proper.

Definition and catalog → relcache descriptor.

  • CreateTrigger / CreateTriggerFiringOn — implement CREATE TRIGGER; validate, look up the function, build and CatalogTupleInsert the pg_trigger row. Returns the new trigger’s ObjectAddress.
  • RemoveTriggerById, renametrig, EnableDisableTrigger — the rest of the DDL surface (drop, rename, ENABLE/DISABLE).
  • Form_pg_trigger / the tgtype bit macros (TRIGGER_TYPE_ROW, _BEFORE, _INSERT, …, _INSTEAD) and TRIGGER_TYPE_MATCHES — the packed on-disk classification and the test macro.
  • RelationBuildTriggers — relcache hook; scans pg_trigger by TriggerRelidNameIndexId (name order = firing order) and fills a TriggerDesc.
  • SetTriggerFlags — ORs each trigger into the TriggerDesc summary booleans (trig_insert_before_row, …, trig_*_old_table).
  • CopyTriggerDesc, FreeTriggerDesc, equalTriggerDescs — descriptor lifecycle used by the relcache.
  • Trigger, TriggerDesc (in reltrigger.h); TriggerData, TransitionCaptureState (in trigger.h) — the in-memory structs.

Row/statement firing points (called by the executor).

  • ExecBSInsertTriggers / ExecBSUpdateTriggers / ExecBSDeleteTriggers / ExecBSTruncateTriggers — BEFORE STATEMENT; run synchronously, must not return a value. Guard against double-firing via before_stmt_triggers_fired.
  • ExecBRInsertTriggers / ExecBRUpdateTriggers / ExecBRDeleteTriggers — BEFORE ROW; run synchronously, return the (possibly rewritten / NULL) tuple. ExecBRUpdateTriggers/ExecBRDeleteTriggers first fetch the old row via GetTupleForTrigger.
  • ExecIRInsertTriggers / ExecIRUpdateTriggers / ExecIRDeleteTriggers — INSTEAD OF ROW (views); replace the operation.
  • ExecARInsertTriggers / ExecARUpdateTriggers / ExecARDeleteTriggers — AFTER ROW; thin guards that call AfterTriggerSaveEvent.
  • ExecASInsertTriggers / ExecASUpdateTriggers / ExecASDeleteTriggers / ExecASTruncateTriggers — AFTER STATEMENT; call AfterTriggerSaveEvent with row_trigger = false.
  • ExecCallTriggerFunc — the single fmgr chokepoint; sets up TriggerData, bumps MyTriggerDepth, invokes the function in the per-tuple context.
  • TriggerEnabled — evaluates tgenabled (session replication role), the UPDATE OF column list, and the WHEN qualifier; returns whether this trigger fires for this row/event.

After-trigger queue (event records, chunks, drain).

  • AfterTriggerEventData (+ …NoOids, …OneCtid, …ZeroCtids size variants), AfterTriggerSharedData, AfterTriggerEventChunk, AfterTriggerEventList — the on-queue representation.
  • SizeofTriggerEvent, GetTriggerSharedData, the for_each_event / for_each_chunk iterator macros, AFTER_TRIGGER_OFFSET / _IN_PROGRESS / _DONE / _1CTID / _2CTID / _CP_UPDATE flag bits.
  • AfterTriggersData, AfterTriggersQueryData, AfterTriggersTransData, AfterTriggersTableData, and the file-static afterTriggers — global state.
  • AfterTriggerSaveEvent — the entry point from every ExecAR*/ExecAS*; validates the event, captures transition tuples, computes flags, and calls afterTriggerAddEvent.
  • afterTriggerAddEvent — the chunked-arena allocator + shared-record dedup.
  • afterTriggerMarkEvents — tag firable events with the current firing_id; migrate deferred events to the move-list.
  • afterTriggerInvokeEvents → AfterTriggerExecute — re-fetch tuples by CTID under SnapshotAny (or from the FDW tuplestore) and call ExecCallTriggerFunc.
  • afterTriggerCheckState, SetConstraintsCommand, SetConstraintStateCreate — SET CONSTRAINTS (deferral) state.
  • afterTriggerFreeEventList, afterTriggerRestoreEventList, afterTriggerDeleteHeadEventChunk — teardown and subxact-abort truncation.
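
The link from an event record to its shared record can be pictured as a byte offset packed into the low 27 bits of the flags word, with status bits above it. The sketch below is a simplified model of that idea only: the Demo* types, the function names, and the positions of the two status bits are assumptions for illustration, and it assumes the shared record sits at a higher address than the event inside the same chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OFFSET_MASK 0x07FFFFFFu   /* low 27 bits, like AFTER_TRIGGER_OFFSET */
#define DEMO_DONE        0x80000000u   /* assumed position of a status bit */
#define DEMO_IN_PROGRESS 0x40000000u   /* assumed position of a status bit */

/* Hypothetical, heavily trimmed stand-ins for the queue records. */
typedef struct DemoShared { uint32_t trigger_oid; uint32_t firing_id; } DemoShared;
typedef struct DemoEvent  { uint32_t flags; } DemoEvent;

/* Store the distance from the event to its shared record in the flags word. */
void
demo_link(DemoEvent *evt, const DemoShared *shared, uint32_t status)
{
    /* Both pointers must lie inside the same chunk for this to be valid. */
    ptrdiff_t off = (const char *) shared - (const char *) evt;

    assert(off >= 0 && (uint32_t) off <= DEMO_OFFSET_MASK);
    assert((status & DEMO_OFFSET_MASK) == 0);
    evt->flags = status | (uint32_t) off;
}

/* Recover the shared record by masking off the status bits. */
DemoShared *
demo_get_shared(DemoEvent *evt)
{
    return (DemoShared *) ((char *) evt + (evt->flags & DEMO_OFFSET_MASK));
}
```

Because many events can point at one shared record, the per-event cost of the link is zero extra bytes: it reuses a word that has to exist anyway for the status flags.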

Lifecycle hooks (called from xact.c / executor).

  • AfterTriggerBeginXact, AfterTriggerBeginQuery, AfterTriggerEndQuery, AfterTriggerFireDeferred, AfterTriggerEndXact, AfterTriggerBeginSubXact, AfterTriggerEndSubXact — the queue’s transaction integration.
  • AfterTriggerEnlargeQueryState — grow query_stack on demand.
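
The division of labour between the end-of-query hook and the pre-commit hook can be modelled in a few lines. This is a toy model under stated assumptions, not the real data structures: DemoQueue and the demo_* functions are hypothetical, the lists are fixed-size arrays, and the real code's re-scan for events queued by the triggers it fires, along with all subtransaction handling, is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_EVENTS 16

typedef struct DemoEvent { int id; bool deferred; } DemoEvent;

/* Hypothetical queue: one per-query list, one transaction-wide deferred list. */
typedef struct DemoQueue {
    DemoEvent query[DEMO_MAX_EVENTS];
    int       nquery;
    DemoEvent deferred[DEMO_MAX_EVENTS];
    int       ndeferred;
    int       fired[2 * DEMO_MAX_EVENTS];   /* ids in the order they fired */
    int       nfired;
} DemoQueue;

/* Analogue of saving an event while the statement runs. */
void
demo_save_event(DemoQueue *q, int id, bool deferred)
{
    assert(q->nquery < DEMO_MAX_EVENTS);
    q->query[q->nquery++] = (DemoEvent) { id, deferred };
}

/* End of query: fire immediate events, migrate deferred ones. */
void
demo_end_query(DemoQueue *q)
{
    for (int i = 0; i < q->nquery; i++) {
        if (q->query[i].deferred) {
            assert(q->ndeferred < DEMO_MAX_EVENTS);
            q->deferred[q->ndeferred++] = q->query[i];
        } else {
            assert(q->nfired < 2 * DEMO_MAX_EVENTS);
            q->fired[q->nfired++] = q->query[i].id;
        }
    }
    q->nquery = 0;
}

/* Pre-commit: drain whatever was deferred during the transaction. */
void
demo_fire_deferred(DemoQueue *q)
{
    for (int i = 0; i < q->ndeferred; i++) {
        assert(q->nfired < 2 * DEMO_MAX_EVENTS);
        q->fired[q->nfired++] = q->deferred[i].id;
    }
    q->ndeferred = 0;
}
```

Saving events 1 (immediate), 2 (deferred), 3 (immediate) and then ending the query fires 1 and 3; event 2 fires only when the deferred list is drained before commit.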

Transition tables.

  • MakeTransitionCaptureState — allocate the OLD/NEW tuplestores keyed by (relid, cmdType); returns NULL when no transition table is needed.
  • GetAfterTriggersTableData, GetAfterTriggersTransitionTable, GetAfterTriggersStoreSlot, TransitionTableAddTuple — find/create the per-table data and append rows.

Position hints (as of 2026-06-05, REL_18 273fe94)

| Symbol | File | Line |
| --- | --- | --- |
| Trigger (struct) | src/include/utils/reltrigger.h | 23 |
| TriggerDesc (struct) | src/include/utils/reltrigger.h | 47 |
| TriggerData (struct) | src/include/commands/trigger.h | 31 |
| TransitionCaptureState (struct) | src/include/commands/trigger.h | 56 |
| TRIGGER_EVENT_* flags | src/include/commands/trigger.h | 94 |
| TRIGGER_TYPE_ROW … _INSTEAD | src/include/catalog/pg_trigger.h | 93 |
| TRIGGER_TYPE_MATCHES | src/include/catalog/pg_trigger.h | 141 |
| CreateTrigger | src/backend/commands/trigger.c | 161 |
| CreateTriggerFiringOn | src/backend/commands/trigger.c | 178 |
| RelationBuildTriggers | src/backend/commands/trigger.c | 1862 |
| SetTriggerFlags | src/backend/commands/trigger.c | 2014 |
| CopyTriggerDesc | src/backend/commands/trigger.c | 2091 |
| ExecCallTriggerFunc | src/backend/commands/trigger.c | 2310 |
| ExecBSInsertTriggers | src/backend/commands/trigger.c | 2402 |
| ExecASInsertTriggers | src/backend/commands/trigger.c | 2453 |
| ExecBRInsertTriggers | src/backend/commands/trigger.c | 2466 |
| ExecARInsertTriggers | src/backend/commands/trigger.c | 2544 |
| ExecIRInsertTriggers | src/backend/commands/trigger.c | 2570 |
| ExecBRDeleteTriggers | src/backend/commands/trigger.c | 2702 |
| ExecARDeleteTriggers | src/backend/commands/trigger.c | 2802 |
| ExecBRUpdateTriggers | src/backend/commands/trigger.c | 2972 |
| ExecARUpdateTriggers | src/backend/commands/trigger.c | 3145 |
| ExecBSTruncateTriggers | src/backend/commands/trigger.c | 3281 |
| TriggerEnabled | src/backend/commands/trigger.c | 3483 |
| AFTER_TRIGGER_* flag bits | src/backend/commands/trigger.c | 3682 |
| AfterTriggerSharedData | src/backend/commands/trigger.c | 3694 |
| AfterTriggerEventData | src/backend/commands/trigger.c | 3707 |
| SizeofTriggerEvent / GetTriggerSharedData | src/backend/commands/trigger.c | 3743 |
| AfterTriggerEventChunk | src/backend/commands/trigger.c | 3762 |
| AfterTriggerEventList | src/backend/commands/trigger.c | 3774 |
| AfterTriggersData | src/backend/commands/trigger.c | 3880 |
| afterTriggerCheckState | src/backend/commands/trigger.c | 4008 |
| afterTriggerAddEvent | src/backend/commands/trigger.c | 4078 |
| afterTriggerRestoreEventList | src/backend/commands/trigger.c | 4226 |
| AfterTriggerExecute | src/backend/commands/trigger.c | 4328 |
| afterTriggerMarkEvents | src/backend/commands/trigger.c | 4614 |
| afterTriggerInvokeEvents | src/backend/commands/trigger.c | 4698 |
| GetAfterTriggersTableData | src/backend/commands/trigger.c | 4867 |
| MakeTransitionCaptureState | src/backend/commands/trigger.c | 4958 |
| AfterTriggerBeginXact | src/backend/commands/trigger.c | 5084 |
| AfterTriggerBeginQuery | src/backend/commands/trigger.c | 5116 |
| AfterTriggerEndQuery | src/backend/commands/trigger.c | 5136 |
| AfterTriggerFireDeferred | src/backend/commands/trigger.c | 5287 |
| AfterTriggerEndXact | src/backend/commands/trigger.c | 5343 |
| GetAfterTriggersTransitionTable | src/backend/commands/trigger.c | 5536 |
| AfterTriggerSaveEvent | src/backend/commands/trigger.c | 6169 |
| before_stmt_triggers_fired | src/backend/commands/trigger.c | 6584 |

Checked against /data/hgryoo/references/postgres at REL_18_STABLE, commit 273fe94. Confirmed facts:

  • No PG18-incompatible symbols asserted. The doc references no removed rmgr (XLOG2) or B_DATACHECKSUMSWORKER_* worker states. All trigger symbols cited exist in the REL_18 tree.
  • tgtype bit layout (TRIGGER_TYPE_ROW=1<<0 … TRIGGER_TYPE_INSTEAD=1<<6, with STATEMENT/AFTER being the zero-bit cases) verified in src/include/catalog/pg_trigger.h lines 93–98; TRIGGER_TYPE_MATCHES at line 141.
  • Name-order firing verified by the TriggerRelidNameIndexId scan and the explanatory comment in RelationBuildTriggers (src/backend/commands/trigger.c).
  • The 15+2 firing-point functions (ExecBR/AR/IR × INSERT/UPDATE/DELETE, ExecBS/AS × INSERT/UPDATE/DELETE/TRUNCATE; no row-level TRUNCATE) all present with the signatures quoted; ExecBRInsertTriggers returns a tuple and ExecARInsertTriggers returns void after calling AfterTriggerSaveEvent, as quoted.
  • Event record sizing. AfterTriggerEventData carries ate_flags, ate_ctid1, ate_ctid2, ate_src_part, ate_dst_part; the four size variants and SizeofTriggerEvent are as quoted; the offset link via the low 27 bits (AFTER_TRIGGER_OFFSET = 0x07FFFFFF) and GetTriggerSharedData verified.
  • Chunk growth 1 KB → 1 MB (MIN_CHUNK_SIZE 1024, MAX_CHUNK_SIZE 1024*1024) with the doubling/halving heuristic in afterTriggerAddEvent verified.
  • SnapshotAny re-fetch at fire time via table_tuple_fetch_row_version in AfterTriggerExecute verified; FDW path uses AFTER_TRIGGER_FDW_FETCH/_REUSE and a tuplestore, as stated.
  • Lifecycle ordering. AfterTriggerEndQuery fires immediate events and migrates deferred ones; AfterTriggerFireDeferred (pre-commit) drains the global list under a pushed transaction snapshot. Verified against the quoted bodies and their header comments.
  • Transition tables never deferrable — asserted in the MakeTransitionCaptureState and AfterTriggersData comments; tuplestores allocated in CurTransactionContext and freed in AfterTriggerFreeQuery.
  • Caveat on line numbers. The position-hint table lists line numbers as observed at 273fe94; AfterTriggerExecute (4328) and CreateTriggerFiringOn (178) are function-definition lines. Symbols are the durable anchor; line numbers decay on any reformat.
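
The growth side of the chunk-sizing rule can be sketched as a capped doubling from 1 KB to 1 MB. This is a deliberately reduced model: demo_next_chunk_size and the DEMO_* constants are hypothetical names, and the halving branch of the real heuristic is omitted because its trigger condition is not described in this document.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MIN_CHUNK_SIZE ((size_t) 1024)          /* 1 KB, as MIN_CHUNK_SIZE */
#define DEMO_MAX_CHUNK_SIZE ((size_t) 1024 * 1024)   /* 1 MB, as MAX_CHUNK_SIZE */

/*
 * Size for the next chunk given the size of the previous one.
 * Pass 0 when the list has no chunk yet.  Growth only: the real
 * allocator can also shrink the next chunk, which is not modelled.
 */
size_t
demo_next_chunk_size(size_t prev)
{
    size_t next;

    if (prev == 0)
        return DEMO_MIN_CHUNK_SIZE;

    assert(prev >= DEMO_MIN_CHUNK_SIZE && prev <= DEMO_MAX_CHUNK_SIZE);
    next = prev * 2;
    return next > DEMO_MAX_CHUNK_SIZE ? DEMO_MAX_CHUNK_SIZE : next;
}
```

Starting from the minimum, ten doublings reach the cap, so a statement that queues only a handful of events pays for 1 KB while a bulk load quickly settles on 1 MB chunks.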

Beyond PostgreSQL — Comparative Designs & Research Frontiers


PostgreSQL’s trigger architecture is one well-tested point in a space the SQL standard only partly fixes. Comparing it to other engines and to the research literature sharpens why its choices land where they do.

Firing order and the standard’s silence. The SQL standard says little about the order in which multiple triggers on the same event fire, and engines diverge. PostgreSQL fires in trigger-name alphabetical order (a consequence of the TriggerRelidNameIndexId scan in RelationBuildTriggers). The order is arbitrary but stable and inspectable, which is the property Database System Concepts (§5.3.2) implicitly calls for when it warns that “unintended order of firing” is a hazard. Oracle historically left same-timing order undefined until 11g added the FOLLOWS/PRECEDES clause for explicit ordering; SQL Server fires AFTER triggers in an order that is undefined except for a configurable first/last via sp_settriggerorder. PostgreSQL’s “name order” is the most predictable of the three but the least expressive: there is no per-trigger ordering clause, so users encode order in names (01_audit, 02_fk). This is a deliberate trade of expressiveness for simplicity.
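
To make the naming convention concrete, the sketch below sorts a set of trigger names the way a name-ordered index scan would return them, assuming plain byte-wise comparison. DemoTrigger and demo_sort_firing_order are hypothetical; the real order comes from the index scan, not from an explicit sort.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a trigger entry; only the name matters here. */
typedef struct DemoTrigger { const char *tgname; } DemoTrigger;

/* Byte-wise name comparison (an assumption about the index ordering). */
static int
demo_cmp_name(const void *a, const void *b)
{
    const DemoTrigger *ta = (const DemoTrigger *) a;
    const DemoTrigger *tb = (const DemoTrigger *) b;

    return strcmp(ta->tgname, tb->tgname);
}

/* Arrange triggers in the order they would fire for a given event. */
void
demo_sort_firing_order(DemoTrigger *trigs, size_t n)
{
    assert(trigs != NULL || n == 0);
    if (n > 1)
        qsort(trigs, n, sizeof(DemoTrigger), demo_cmp_name);
}
```

Given 02_fk, 10_notify, 01_audit, and 2_late, the resulting order is 01_audit, 02_fk, 10_notify, 2_late. The last two show why numeric prefixes should be zero-padded to a fixed width: the comparison is textual, so 10 sorts before 2.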

BEFORE-row tuple rewrite vs. the standard’s SET model. PostgreSQL lets a BEFORE FOR EACH ROW trigger return a modified NEW tuple, and the returned tuple becomes what is stored (ExecBRInsertTriggers). The SQL standard instead models row modification through assignment to NEW.col in the trigger body. The two are observationally similar, but PostgreSQL’s “return a tuple” convention is what makes BEFORE triggers compose as a pipeline (each trigger’s output feeds the next) and is the same convention that lets a BEFORE trigger veto a row by returning NULL — a capability the assignment model expresses only awkwardly.
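
The pipeline reading of the “return a tuple” convention can be sketched as a loop in which each trigger receives the previous trigger's output and a NULL result stops everything. DemoRow, DemoBeforeRowFn, and the two example triggers are hypothetical; the real functions work on tuple slots and handle concurrency cases this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row image and BEFORE ROW trigger signature. */
typedef struct DemoRow { int qty; int price; } DemoRow;
typedef DemoRow *(*DemoBeforeRowFn)(DemoRow *newrow);

/*
 * Run BEFORE ROW triggers in firing order.  Each trigger sees the row
 * returned by the one before it; returning NULL vetoes the operation
 * and skips the remaining triggers.
 */
DemoRow *
demo_run_before_row(DemoBeforeRowFn *trigs, size_t n, DemoRow *row)
{
    assert(row != NULL);
    for (size_t i = 0; i < n; i++) {
        row = trigs[i](row);
        if (row == NULL)
            return NULL;    /* "do nothing" for this row */
    }
    return row;             /* this is what gets stored */
}

/* Example trigger: rewrite the proposed row in place. */
DemoRow *
demo_clamp_qty(DemoRow *r)
{
    if (r->qty < 0)
        r->qty = 0;
    return r;
}

/* Example trigger: veto rows with a zero price. */
DemoRow *
demo_reject_free(DemoRow *r)
{
    return r->price == 0 ? NULL : r;
}
```

With the chain {demo_clamp_qty, demo_reject_free}, a row (qty = -5, price = 10) is stored as (0, 10), while a row with price = 0 is silently dropped because the second trigger returns NULL.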

Constraints as triggers — the deep design bet. PostgreSQL implements foreign keys and deferrable uniqueness as internal AFTER triggers riding the same queue (RI_FKey_* trigger functions, F_UNIQUE_KEY_RECHECK). This unifies two facilities the textbook treats separately (§5.3.3 contrasts declarative constraints against triggers), and it means the after-trigger queue’s correctness is referential-integrity correctness. The cost is visible in AfterTriggerSaveEvent, which is shot through with RI-specific skip logic (RI_FKey_trigger_type, the cross-partition-update special cases). An engine that kept constraints in a separate enforcement path would have a leaner trigger queue but two integrity mechanisms to keep coherent. The research lineage here is assertion / integrity-constraint maintenance (Ceri & Widom’s work on deriving production rules for constraint maintenance, early 1990s), which framed triggers as the operational form of declarative rules — exactly PostgreSQL’s stance.

Active databases and the ECA heritage. The trigger is the surviving commercial fragment of the active database research program (HiPAC, Ariel, Starburst, late 1980s–early 1990s), which studied event-condition-action rules as a general reactive mechanism: composite events, coupling modes (immediate / deferred / detached), and rule-execution semantics (termination, confluence). PostgreSQL implements a pragmatic subset: immediate and deferred coupling map to its immediate-mode and deferrable events, while composite events and detached (separate-transaction) coupling are absent. The termination problem the active-DB literature studied formally shows up here only as runtime backstops, not as static termination or confluence analysis: MyTriggerDepth tracks nesting depth (exposed to SQL as pg_trigger_depth(), so trigger code can bail out on its own), and max_stack_depth or a statement timeout stops runaway recursion.

Set-oriented vs. row-oriented reactions. Transition tables (and their tuplestore capture in MakeTransitionCaptureState) are PostgreSQL’s answer to a long-standing critique of row-level triggers: firing a function per row is O(rows) procedure-call overhead, whereas a statement-level trigger over a transition table can run one set-based SQL statement over the whole delta. This mirrors the delta-relation approach in incremental view maintenance (the DRed and counting algorithms; Gupta & Mumick), where a change is represented as insert/delete sets rather than per-tuple events. A frontier question — relevant to PostgreSQL’s incremental-matview efforts — is whether the transition-table capture path could feed an IVM engine directly rather than a user trigger.
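
The cost difference between the two styles can be shown with a toy summary-maintenance task: keep a running total in step with newly inserted amounts. Everything here is hypothetical (DemoDelta stands in for a NEW transition table, DemoSummary for the derived data); the point is only the number of trigger invocations per statement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a NEW transition table: the statement's delta. */
typedef struct DemoDelta { const int *new_amounts; size_t nrows; } DemoDelta;

/* Derived data being maintained, plus a counter of trigger invocations. */
typedef struct DemoSummary { long total; int calls; } DemoSummary;

/* Row-level style: invoked once for every affected row. */
void
demo_row_trigger(DemoSummary *s, int new_amount)
{
    assert(s != NULL);
    s->total += new_amount;
    s->calls++;
}

/* Statement-level style: invoked once, aggregates the whole delta. */
void
demo_stmt_trigger(DemoSummary *s, const DemoDelta *d)
{
    long sum = 0;

    assert(s != NULL && d != NULL);
    for (size_t i = 0; i < d->nrows; i++)
        sum += d->new_amounts[i];
    s->total += sum;
    s->calls++;
}
```

For a three-row insert of 5, 7, and 11, both styles end with a total of 23, but the row-level path makes three calls and the statement-level path makes one.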

Push-down and streaming frontiers. Modern systems push reactive logic out of the trigger queue: change-data-capture and logical replication (PostgreSQL’s own logical decoding, covered in postgres-logical-decoding.md) reconstruct row deltas from WAL after commit, decoupling reaction from the writing transaction entirely. This is the “detached coupling mode” the active-DB literature anticipated. For high-fan-out audit/replication workloads this is typically cheaper than an AFTER trigger per row, because the writing transaction runs no per-row function calls; its main write-path cost is the extra WAL detail that wal_level = logical requires. The trigger queue remains the right tool only when the reaction must be synchronous with and transactional with the change, which is precisely the FK-enforcement case the queue was built around.

  • Source tree. /data/hgryoo/references/postgres at REL_18_STABLE, commit 273fe94 (PG 18.x). Primary file: src/backend/commands/trigger.c. Headers: src/include/commands/trigger.h, src/include/utils/reltrigger.h, src/include/catalog/pg_trigger.h. Callers (cross-referenced, not re-analyzed here): src/backend/executor/nodeModifyTable.c, src/backend/utils/cache/relcache.c (RelationBuildTriggers hook), src/backend/access/transam/xact.c (lifecycle calls).
  • Textbook anchor. Silberschatz, Korth & Sudarshan, Database System Concepts, 7th ed., ch. 5 “Advanced SQL”, §5.3 “Triggers” (ECA model, granularity, timing, transition tables, the constraints-vs-triggers guidance and the cascading/ordering hazards). Captured under knowledge/research/dbms-general/.
  • Comparative / historical. Active-database ECA lineage (HiPAC, Ariel, Starburst); Ceri & Widom on production rules for integrity-constraint maintenance; Gupta & Mumick on incremental view maintenance and delta relations — cited for context, not consumed as PG source.
  • Cross-references within this KB. knowledge/code-analysis/postgres/postgres-executor.md (the demand-pull node tree and ExecutorFinish that drives AfterTriggerEndQuery), postgres-ddl-execution.md (the CREATE TRIGGER utility path), postgres-fmgr.md (the function-call convention behind ExecCallTriggerFunc), postgres-mvcc-snapshots.md (why fire-time re-fetch uses SnapshotAny), postgres-logical-decoding.md (the post-commit, detached alternative to AFTER triggers).