PostgreSQL Cache Invalidation — sinval Message Queue, inval.c Dispatcher, and Transactional Deferral

A database engine that caches metadata — catalog tuples, relation descriptors, parsed plan trees — gains orders of magnitude in lookup speed relative to going to disk on every access. The cost is coherence: a backend with a stale cached entry will see the old schema for a table whose columns have just been renamed by another session. Every multi-process engine must solve the problem of propagating metadata changes to all caches that may hold a copy of the affected entry.

Database System Concepts (Silberschatz, 7e, ch. 25 §“Buffer Management”) identifies cache invalidation as a classical distributed consistency problem reduced by two architectural choices that nearly every engine makes: (1) caches are per-process rather than shared, so the number of parties that must be notified is bounded by the session count; (2) invalidation is lazy rather than eager — the committing session broadcasts what changed and each receiving session discards its stale copy at the next safe moment rather than synchronously. This avoids the cost of blocking DDL on all live sessions.

Database Internals (Petrov, ch. 6 §“System Catalogs”) notes that the transition from eager to lazy invalidation introduces a correctness constraint: a receiving backend must not return a stale cached entry to its own query even before it has processed messages from other sessions. This is the command-boundary deferral problem: within a transaction, a backend’s own catalog changes must also flow through a local invalidation path so the backend sees its own DDL immediately.

Two design axes determine the shape of every lazy invalidation system:

  1. Broadcast medium. Is the channel a per-backend signal (push), a shared ring buffer that all backends poll (pull), or a combination? A pure push system delivers instantly but requires O(N) signals per commit where N is the session count. A pure pull system needs no signaling but a slow backend drifts arbitrarily far behind, potentially blocking garbage collection of old messages. Real systems use a ring buffer with a catchup signal sent to slow backends.

  2. Granularity of the invalidation message. Does the message name a specific tuple (by hash key) in a specific cache, or does it name the whole cache? Fine-grained messages allow selective eviction; coarse messages (reset the whole cache) are cheaper to generate and simpler to process when the buffer overflows. Real systems support both: per-tuple messages for normal DDL, whole-cache resets for overflow recovery.

PostgreSQL uses a fixed-size shared ring buffer (4096 slots, sinval) combined with a catchup interrupt mechanism and a whole-cache reset fallback for overflow. The invalidation dispatcher (inval.c) collects pending messages during a transaction and defers broadcasting them until commit, while processing them locally at command boundaries.

The three-tier flow: generate → defer → broadcast

Nearly every lazy-invalidation system in a multi-process RDBMS follows the same three-tier architecture:

Tier 1 — generation. The code that mutates catalog tuples (heap insert, update, delete) calls an invalidation registration function immediately. The message is not sent yet; it is accumulated in a per-transaction buffer. This keeps invalidation generation collocated with the mutation site without coupling mutation latency to cross-process signaling.

Tier 2 — command-local application. At each command boundary (after CommandCounterIncrement()), pending messages from the current command are applied to the local caches only. The reason: the mutating backend’s own next command must see the new catalog state. The messages remain in the per-transaction buffer for later broadcast; they are not sent to other backends yet.

Tier 3 — broadcast at commit. At transaction commit, all accumulated messages are pushed into the shared medium (ring buffer, log, or broadcast channel). Other backends drain this medium at their next transaction start. Abort discards Tier 1 messages (never sent) and reverses the Tier 2 local applications.
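The three tiers can be sketched with a single flat array and two cursors. This is a minimal illustration under stated assumptions, not PostgreSQL's code: `ToyXactInval` and every `toy_` function are hypothetical names, messages are plain integers, and the caller passes the processing function (broadcast on commit, local re-flush on abort).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MSGS 64

/* Hypothetical per-transaction buffer: one flat array and two cursors. */
typedef struct ToyXactInval {
    int    msgs[TOY_MAX_MSGS]; /* invalidation keys registered so far */
    size_t count;              /* msgs[0..count) are registered */
    size_t applied;            /* msgs[0..applied) were applied locally */
} ToyXactInval;

typedef void (*ToyMsgFn)(int key, void *arg);

/* Tier 1: record the message next to the mutation; nothing is sent. */
static void toy_register(ToyXactInval *x, int key)
{
    assert(x->count < TOY_MAX_MSGS);
    x->msgs[x->count++] = key;
}

/* Tier 2: at a command boundary, apply the not-yet-applied messages locally. */
static void toy_command_end(ToyXactInval *x, ToyMsgFn local_apply, void *arg)
{
    while (x->applied < x->count)
        local_apply(x->msgs[x->applied++], arg);
}

/*
 * Tier 3: on commit, fn is the broadcast function and sees every message.
 * On abort, fn is the local flush function and sees only the messages that
 * were already applied locally; the rest are dropped unsent.
 */
static void toy_end_xact(ToyXactInval *x, int is_commit, ToyMsgFn fn, void *arg)
{
    size_t n = is_commit ? x->count : x->applied;

    for (size_t i = 0; i < n; i++)
        fn(x->msgs[i], arg);
    x->count = x->applied = 0;
}

/* Example processor: counts how many messages it was handed. */
static void toy_count(int key, void *arg)
{
    (void) key;
    ++*(int *) arg;
}
```

Registering two messages, ending a command, registering a third, and committing hands all three to the broadcast function. Aborting instead hands the local flush function only the messages that a command boundary had already applied.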

Catcache vs. relcache invalidation ordering

Most engines that have both a tuple-level catalog cache (catcache) and an assembled relation descriptor cache (relcache) must process catcache invalidations before relcache invalidations. The reason: building or rebuilding a relcache entry requires reading catalog tuples; if the catcache still holds the old tuple at the moment the relcache is being rebuilt, the rebuilt relcache entry is born stale. The ordering rule is therefore: flush catcache first, then flush relcache.

A fixed-size ring buffer in shared memory is the standard choice for the broadcast medium in process-local-cache engines. The invariants:

  • maxMsgNum is the producer cursor (next write slot).
  • minMsgNum is the global consumer floor (no reader needs anything before this).
  • Each backend has its own nextMsgNum cursor.
  • If a backend’s nextMsgNum falls more than MAXNUMMESSAGES slots behind maxMsgNum, messages it has not yet read have been overwritten. The engine sets a reset flag for that backend; when the backend next checks for messages it discards all its cached state rather than processing individual messages.

The reset is the correctness safety net: it is always safe to discard the entire cache, because the next cache miss will reload from the source of truth (system catalogs). The cost is a storm of reload activity after the reset. Catchup interrupts are therefore sent to slow backends to prevent reset from becoming the common case.
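The cursor and reset invariants above can be sketched as a toy single-reader ring. This is a simplified illustration, not the sinvaladt.c implementation: `ToyRing`, `ToyReader`, and the `toy_ring_` functions are hypothetical, the ring has 8 slots instead of 4096, there is no locking, and message numbers are assumed never to overflow an `int`.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SIZE 8          /* stands in for MAXNUMMESSAGES */

typedef struct ToyRing {
    int buffer[TOY_RING_SIZE];
    int max_msg_num;             /* producer cursor: next message number */
} ToyRing;

typedef struct ToyReader {
    int  next_msg_num;           /* next message this reader will consume */
    bool reset_state;            /* true once unread messages were lost */
} ToyReader;

/* Append one message; flag the reader if an unread slot gets overwritten. */
static void toy_ring_put(ToyRing *ring, ToyReader *reader, int msg)
{
    if (ring->max_msg_num - reader->next_msg_num >= TOY_RING_SIZE)
        reader->reset_state = true;
    ring->buffer[ring->max_msg_num % TOY_RING_SIZE] = msg;
    ring->max_msg_num++;
}

/*
 * Copy up to cap unread messages into out and return how many were copied.
 * Returns -1 if the reader missed messages and must discard all cached state.
 */
static int toy_ring_get(const ToyRing *ring, ToyReader *reader, int *out, int cap)
{
    int n = 0;

    assert(cap > 0);
    if (reader->reset_state) {
        reader->next_msg_num = ring->max_msg_num;   /* skip what was lost */
        reader->reset_state = false;
        return -1;
    }
    while (n < cap && reader->next_msg_num < ring->max_msg_num)
        out[n++] = ring->buffer[reader->next_msg_num++ % TOY_RING_SIZE];
    return n;
}
```

A reader that keeps up receives individual messages. One that falls a full ring behind gets -1 once, jumps its cursor to the producer cursor, and resumes normal reads afterwards.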

Every subsystem that maintains a higher-level cache derived from catcache or relcache (plan cache, partition descriptor cache, event trigger cache, etc.) needs to know when its inputs are invalidated. The standard pattern is a per-cache callback registry: the subsystem registers a function pointer at startup; the invalidation dispatcher calls it when the relevant catcache or relcache entry is flushed. This decouples the dispatcher from knowledge of higher-level caches.

| Theory concept | PostgreSQL entity |
| --- | --- |
| Invalidation message (per-tuple, catcache) | SharedInvalidationMessage with id ≥ 0 (catcache ID) |
| Invalidation message (whole-catalog) | SharedInvalidationMessage with id == SHAREDINVALCATALOG_ID (-1) |
| Invalidation message (relcache) | SharedInvalidationMessage with id == SHAREDINVALRELCACHE_ID (-2) |
| Invalidation message (smgr / relmap / snapshot / relsync) | IDs -3 to -6 |
| Shared ring buffer | SISeg.buffer[MAXNUMMESSAGES] in sinvaladt.c |
| Producer cursor | SISeg.maxMsgNum |
| Consumer floor | SISeg.minMsgNum |
| Per-backend cursor | ProcState.nextMsgNum |
| Reset flag | ProcState.resetState |
| Catchup signal | PROCSIG_CATCHUP_INTERRUPT via ProcState.signaled |
| Per-transaction accumulation buffer | TransInvalidationInfo chain in inval.c |
| Command-local application | CommandEndInvalidationMessages() |
| Commit broadcast | AtEOXact_Inval(isCommit=true) → SendSharedInvalidMessages |
| Abort rollback | AtEOXact_Inval(isCommit=false) → LocalExecuteInvalidationMessage |
| Subsystem callback | syscache_callback_list[] / relcache_callback_list[] |

sinval.h defines six negative IDs for special message types and reserves non-negative values for catcache IDs:

// SharedInvalidationMessage — src/include/storage/sinval.h
#define SHAREDINVALCATALOG_ID (-1) /* whole catalog flush */
#define SHAREDINVALRELCACHE_ID (-2) /* relcache entry */
#define SHAREDINVALSMGR_ID (-3) /* smgr file reference */
#define SHAREDINVALRELMAP_ID (-4) /* relation mapper */
#define SHAREDINVALSNAPSHOT_ID (-5) /* catalog snapshot */
#define SHAREDINVALRELSYNC_ID (-6) /* logical decoding relsync */

A SharedInvalidationMessage is a union with one member per type. For catcache messages (id ≥ 0) the relevant fields are the cache ID, the hash value of the invalidated key, and the database OID. For relcache messages the relevant fields are the database OID and the relation OID (InvalidOid means flush the entire relcache).
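The discriminated-union idea can be sketched as follows. The layout is a simplified assumption for illustration: `ToyInvalMsg`, `ToyOid`, and the field names are hypothetical, only two of the message types are modeled, and the real definitions in sinval.h should be consulted for exact field types.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ToyOid;                 /* stand-in for PostgreSQL's Oid */
#define TOY_INVALID_OID ((ToyOid) 0)
#define TOY_RELCACHE_ID (-2)             /* mirrors SHAREDINVALRELCACHE_ID */

typedef struct { int8_t id; ToyOid db_id; uint32_t hash_value; } ToyCatcacheMsg;
typedef struct { int8_t id; ToyOid db_id; ToyOid rel_id; } ToyRelcacheMsg;

/* Every member starts with the same one-byte id, so it acts as the tag. */
typedef union ToyInvalMsg {
    int8_t         id;
    ToyCatcacheMsg cc;
    ToyRelcacheMsg rc;
} ToyInvalMsg;

typedef enum {
    TOY_CATCACHE_ENTRY,   /* id >= 0: one catcache entry, found by hash */
    TOY_RELCACHE_ENTRY,   /* one relation descriptor */
    TOY_RELCACHE_ALL,     /* relation OID is invalid: flush whole relcache */
    TOY_OTHER
} ToyKind;

static ToyInvalMsg toy_catcache_msg(int8_t cache_id, ToyOid db_id, uint32_t hash)
{
    ToyInvalMsg m;

    assert(cache_id >= 0);               /* non-negative ids name a catcache */
    m.cc.id = cache_id;
    m.cc.db_id = db_id;
    m.cc.hash_value = hash;
    return m;
}

static ToyInvalMsg toy_relcache_msg(ToyOid db_id, ToyOid rel_id)
{
    ToyInvalMsg m;

    m.rc.id = TOY_RELCACHE_ID;
    m.rc.db_id = db_id;
    m.rc.rel_id = rel_id;
    return m;
}

static ToyKind toy_classify(const ToyInvalMsg *m)
{
    if (m->id >= 0)
        return TOY_CATCACHE_ENTRY;
    if (m->id == TOY_RELCACHE_ID)
        return m->rc.rel_id == TOY_INVALID_OID ? TOY_RELCACHE_ALL
                                               : TOY_RELCACHE_ENTRY;
    return TOY_OTHER;
}
```

Because the tag occupies the same first byte in every member, a receiver can read `id` before knowing which member was written, which is what lets one fixed-size slot type carry every message kind.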

inval.c maintains two parallel accumulation arrays allocated in TopTransactionContext — one for catcache messages and one for relcache messages — plus a linked list of TransInvalidationInfo structs for subtransaction nesting:

// TransInvalidationInfo — src/backend/utils/cache/inval.c
typedef struct TransInvalidationInfo
{
    struct InvalidationInfo ii; /* base: CurrentCmdInvalidMsgs +
                                   RelcacheInitFileInval */
    InvalidationMsgsGroup PriorCmdInvalidMsgs; /* cmds already processed */
    struct TransInvalidationInfo *parent;
    int my_level; /* subtransaction nesting depth */
} TransInvalidationInfo;

// InvalidationMsgsGroup — indexes into the two flat arrays
typedef struct InvalidationMsgsGroup
{
    int firstmsg[2]; /* [0]=CatCacheMsgs, [1]=RelCacheMsgs */
    int nextmsg[2];
} InvalidationMsgsGroup;

The flat arrays are never copied; the InvalidationMsgsGroup structs hold indexes into them. Appending a subtransaction’s messages to its parent is therefore an O(1) index-range adjustment, not a data copy.
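The index-range trick can be sketched for a single subgroup. This is an illustrative simplification: `ToyMsgsGroup` and the `toy_group_` functions are hypothetical, and the real struct tracks two subgroups (catcache and relcache) rather than one.

```c
#include <assert.h>

/* A group is a half-open index range [firstmsg, nextmsg) in a flat array. */
typedef struct ToyMsgsGroup {
    int firstmsg;   /* index of the group's first message */
    int nextmsg;    /* one past the group's last message */
} ToyMsgsGroup;

static int toy_group_count(const ToyMsgsGroup *g)
{
    return g->nextmsg - g->firstmsg;
}

/*
 * Move src's messages to the end of dest by adjusting indexes only.
 * No array element is copied, so the cost does not depend on the number
 * of messages moved.
 */
static void toy_group_append(ToyMsgsGroup *dest, ToyMsgsGroup *src)
{
    if (src->firstmsg == src->nextmsg)
        return;                                    /* nothing to move */
    if (dest->firstmsg == dest->nextmsg)
        dest->firstmsg = src->firstmsg;            /* dest was empty */
    else
        assert(dest->nextmsg == src->firstmsg);    /* ranges must be adjacent */
    dest->nextmsg = src->nextmsg;
    src->firstmsg = src->nextmsg;                  /* src is empty again */
}
```

The adjacency assertion captures why this works: the source group always sits directly after the destination group in the shared array, so extending one range and emptying the other is enough.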

For inplace updates (non-transactional catalog mutations, such as the pg_class.reltuples write-back), a separate inplaceInvalInfo path exists. It uses the same message arrays but bypasses the command/transaction boundary logic, sending messages immediately during the WAL insertion critical section.

When heap_update or heap_delete touches a tuple in a system catalog, it calls CacheInvalidateHeapTuple:

// CacheInvalidateHeapTuple — src/backend/utils/cache/inval.c
void
CacheInvalidateHeapTuple(Relation relation,
                         HeapTuple tuple,
                         HeapTuple newtuple)
{
    CacheInvalidateHeapTupleCommon(relation, tuple, newtuple,
                                   PrepareInvalidationState);
}

The common path:

  1. Short-circuits on non-catalog relations (IsCatalogRelation check).
  2. Calls PrepareToInvalidateCacheTuple in catcache.c to determine which catcache IDs are affected and registers a SharedInvalidationMessage per affected cache via RegisterCatcacheInvalidation.
  3. For tuples in pg_class, pg_attribute, pg_index, or pg_constraint (foreign keys), also calls RegisterRelcacheInvalidation to enqueue a relcache flush for the owning relation.
  4. If the relation is in the relcache init file, sets RelcacheInitFileInval = true so the file is deleted at commit.

// CacheInvalidateHeapTupleCommon (condensed) — src/backend/utils/cache/inval.c
static void
CacheInvalidateHeapTupleCommon(Relation relation,
                               HeapTuple tuple,
                               HeapTuple newtuple,
                               InvalidationInfo *(*prepare_callback)(void))
{
    if (!IsCatalogRelation(relation)) return;
    if (IsToastRelation(relation)) return;
    info = prepare_callback(); /* PrepareInvalidationState or
                                  PrepareInplaceInvalidationState */
    tupleRelId = RelationGetRelid(relation);
    if (RelationInvalidatesSnapshotsOnly(tupleRelId))
        RegisterSnapshotInvalidation(info, databaseId, tupleRelId);
    else
        PrepareToInvalidateCacheTuple(relation, tuple, newtuple,
                                      RegisterCatcacheInvalidation,
                                      (void *) info);
    /* relcache flush for pg_class / pg_attribute / pg_index / pg_constraint */
    if (tupleRelId == RelationRelationId) { relationId = ...; }
    else if (tupleRelId == AttributeRelationId) { relationId = ...; }
    else if (tupleRelId == IndexRelationId) { relationId = ...; }
    else if (tupleRelId == ConstraintRelationId) { /* FK only */ }
    else return;
    RegisterRelcacheInvalidation(info, databaseId, relationId);
}

At each CommandCounterIncrement(), PostgreSQL calls CommandEndInvalidationMessages(). This processes all messages from the current command against the local caches only — no cross-process signaling yet:

// CommandEndInvalidationMessages — src/backend/utils/cache/inval.c
void
CommandEndInvalidationMessages(void)
{
    if (transInvalInfo == NULL)
        return;
    ProcessInvalidationMessages(&transInvalInfo->ii.CurrentCmdInvalidMsgs,
                                LocalExecuteInvalidationMessage);
    /* WAL-log per-command invalidations for logical decoding */
    if (XLogLogicalInfoActive())
        LogLogicalInvalidations();
    AppendInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
                               &transInvalInfo->ii.CurrentCmdInvalidMsgs);
}

After this call, the messages move from CurrentCmdInvalidMsgs to PriorCmdInvalidMsgs. The local cache is up to date for the next command in the same transaction.

At transaction commit, AtEOXact_Inval(isCommit=true):

  1. If RelcacheInitFileInval is set, calls RelationCacheInitFilePreInvalidate() (deletes the init file).
  2. Appends CurrentCmdInvalidMsgs to PriorCmdInvalidMsgs.
  3. Calls ProcessInvalidationMessagesMulti with SendSharedInvalidMessages as the processor, which pushes all accumulated messages into the sinval buffer via SIInsertDataEntries.
  4. If RelcacheInitFileInval, calls RelationCacheInitFilePostInvalidate().

// AtEOXact_Inval (isCommit path, condensed) — src/backend/utils/cache/inval.c
void
AtEOXact_Inval(bool isCommit)
{
    // ... NULL check ...
    if (isCommit)
    {
        if (transInvalInfo->ii.RelcacheInitFileInval)
            RelationCacheInitFilePreInvalidate();
        AppendInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
                                   &transInvalInfo->ii.CurrentCmdInvalidMsgs);
        ProcessInvalidationMessagesMulti(&transInvalInfo->PriorCmdInvalidMsgs,
                                         SendSharedInvalidMessages);
        if (transInvalInfo->ii.RelcacheInitFileInval)
            RelationCacheInitFilePostInvalidate();
    }
    else /* abort */
    {
        ProcessInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
                                    LocalExecuteInvalidationMessage);
    }
    transInvalInfo = NULL;
}

On abort, the local caches must be rolled back (the local changes from PriorCmdInvalidMsgs — already applied at command boundaries — must be undone by re-flushing). The CurrentCmdInvalidMsgs (not yet applied locally) are simply discarded.

SISeg is the shared-memory segment housing the ring buffer and all per-backend state:

// SISeg — src/backend/storage/ipc/sinvaladt.c
typedef struct SISeg
{
    int minMsgNum; /* oldest unread message */
    int maxMsgNum; /* next slot to write */
    int nextThreshold; /* fullness trigger for SICleanupQueue */
    slock_t msgnumLock; /* spinlock protecting maxMsgNum */
    SharedInvalidationMessage buffer[MAXNUMMESSAGES]; /* 4096 slots */
    int numProcs;
    int *pgprocnos;
    ProcState procState[FLEXIBLE_ARRAY_MEMBER]; /* one per backend slot */
} SISeg;

// ProcState — per-backend cursor and flags
typedef struct ProcState
{
    pid_t procPid; /* 0 = inactive */
    int nextMsgNum; /* next message to read */
    bool resetState; /* missed messages; must reset entire cache */
    bool signaled; /* catchup interrupt already sent */
    bool hasMessages; /* unread messages present */
    bool sendOnly; /* Startup process: send only, never receive */
    // ... nextLXID
} ProcState;

MAXNUMMESSAGES = 4096. When a backend’s nextMsgNum would need to read a slot that has already been overwritten (maxMsgNum - nextMsgNum > MAXNUMMESSAGES), resetState is set to true.

Writing (SIInsertDataEntries):

// SIInsertDataEntries (condensed) — src/backend/storage/ipc/sinvaladt.c
void
SIInsertDataEntries(const SharedInvalidationMessage *data, int n)
{
    while (n > 0) {
        int nthistime = Min(n, WRITE_QUANTUM); /* 64 */
        n -= nthistime;
        LWLockAcquire(SInvalWriteLock, LW_EXCLUSIVE);
        /* Clean/reset if full */
        for (;;) {
            numMsgs = segP->maxMsgNum - segP->minMsgNum;
            if (numMsgs + nthistime > MAXNUMMESSAGES || numMsgs >= segP->nextThreshold)
                SICleanupQueue(true, nthistime);
            else break;
        }
        max = segP->maxMsgNum;
        while (nthistime-- > 0)
            segP->buffer[max++ % MAXNUMMESSAGES] = *data++;
        SpinLockAcquire(&segP->msgnumLock);
        segP->maxMsgNum = max; /* memory barrier via spinlock */
        SpinLockRelease(&segP->msgnumLock);
        for (i = 0; i < segP->numProcs; i++)
            segP->procState[segP->pgprocnos[i]].hasMessages = true;
        LWLockRelease(SInvalWriteLock);
    }
}

SInvalWriteLock serializes producers. The spinlock on maxMsgNum provides a memory barrier: messages are guaranteed to be visible in buffer[] before maxMsgNum is advanced.
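The batching and modulo indexing of the write path can be sketched in isolation. This toy version is an assumption-laden simplification: `ToyRing` and `toy_insert` are hypothetical, there are no locks or spinlocks, and the cleanup step is omitted, so it presumes readers always keep up.

```c
#include <assert.h>

#define TOY_MAXNUMMESSAGES 4096   /* ring size, as in the text */
#define TOY_WRITE_QUANTUM  64     /* messages written per lock hold */

typedef struct ToyRing {
    int buffer[TOY_MAXNUMMESSAGES];
    int max_msg_num;              /* producer cursor: next message number */
} ToyRing;

/*
 * Insert n messages in batches of at most TOY_WRITE_QUANTUM.
 * Returns the number of batches, which is the number of times a real
 * writer would have acquired the write lock.
 */
static int toy_insert(ToyRing *ring, const int *data, int n)
{
    int batches = 0;

    assert(n >= 0);
    while (n > 0) {
        int nthistime = n < TOY_WRITE_QUANTUM ? n : TOY_WRITE_QUANTUM;
        int max = ring->max_msg_num;

        n -= nthistime;
        /* A real writer would take the write lock and make room here. */
        while (nthistime-- > 0)
            ring->buffer[max++ % TOY_MAXNUMMESSAGES] = *data++;
        /* Publish the new cursor only after the slots are filled. */
        ring->max_msg_num = max;
        batches++;
    }
    return batches;
}
```

With these constants, inserting 130 messages takes three batches (64, 64, and 2). A batch that starts at message number 4090 wraps around, so its seventh message lands in slot 0.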

Reading (SIGetDataEntries):

// SIGetDataEntries (condensed) — src/backend/storage/ipc/sinvaladt.c
int
SIGetDataEntries(SharedInvalidationMessage *data, int datasize)
{
    if (!stateP->hasMessages) return 0; /* fast path: nothing pending */
    LWLockAcquire(SInvalReadLock, LW_SHARED);
    stateP->hasMessages = false;
    SpinLockAcquire(&segP->msgnumLock);
    max = segP->maxMsgNum;
    SpinLockRelease(&segP->msgnumLock);
    if (stateP->resetState) {
        stateP->nextMsgNum = max;
        stateP->resetState = false;
        LWLockRelease(SInvalReadLock);
        return -1; /* -1 = reset signal */
    }
    n = 0;
    while (n < datasize && stateP->nextMsgNum < max)
        data[n++] = segP->buffer[stateP->nextMsgNum++ % MAXNUMMESSAGES];
    // ... reset hasMessages if partial read ...
    LWLockRelease(SInvalReadLock);
    return n;
}

Multiple backends can call SIGetDataEntries in parallel under a shared SInvalReadLock, because each backend modifies only its own ProcState fields. The lock is not held in the conventional read-only sense; it is held to authorize mutation of self-only state, providing the memory barrier needed to see messages written under SInvalWriteLock.

Receiving messages (AcceptInvalidationMessages)

Each backend calls AcceptInvalidationMessages() at the start of each transaction (in StartTransaction) and at other points where stale metadata would be harmful (e.g., after acquiring a lock). It calls ReceiveSharedInvalidMessages, which loops calling SIGetDataEntries until the queue is drained:

// AcceptInvalidationMessages — src/backend/utils/cache/inval.c
void
AcceptInvalidationMessages(void)
{
    ReceiveSharedInvalidMessages(LocalExecuteInvalidationMessage,
                                 InvalidateSystemCaches);
    // ... optional debug_discard_caches path
}

LocalExecuteInvalidationMessage dispatches each message by type:

// LocalExecuteInvalidationMessage (condensed) — src/backend/utils/cache/inval.c
void
LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
{
    if (msg->id >= 0) /* catcache tuple */
    {
        InvalidateCatalogSnapshot();
        SysCacheInvalidate(msg->cc.id, msg->cc.hashValue);
        CallSyscacheCallbacks(msg->cc.id, msg->cc.hashValue);
    }
    else if (msg->id == SHAREDINVALCATALOG_ID) /* whole catalog */
    {
        InvalidateCatalogSnapshot();
        CatalogCacheFlushCatalog(msg->cat.catId);
    }
    else if (msg->id == SHAREDINVALRELCACHE_ID) /* relcache entry */
    {
        RelationCacheInvalidateEntry(msg->rc.relId); /* or full flush */
        /* call relcache_callback_list entries */
    }
    else if (msg->id == SHAREDINVALSMGR_ID) { smgrreleaserellocator(...); }
    else if (msg->id == SHAREDINVALRELMAP_ID) { RelationMapInvalidate(...); }
    else if (msg->id == SHAREDINVALSNAPSHOT_ID) { InvalidateCatalogSnapshot(); }
    else if (msg->id == SHAREDINVALRELSYNC_ID) { CallRelSyncCallbacks(...); }
}

If SIGetDataEntries returns -1 (reset), InvalidateSystemCaches() is called instead: it wipes all catcache and relcache entries, then fires all registered syscache and relcache callbacks.
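The drain-until-empty loop with its two callbacks can be sketched as below. This is a hedged illustration of the control flow only: `toy_receive`, `ToySource`, `ToyStats`, and the batch size of 4 are hypothetical stand-ins, and the source here is a plain array rather than a shared-memory queue.

```c
#include <assert.h>

#define TOY_BATCH 4   /* messages fetched per call; illustrative value */

typedef int  (*ToyGetFn)(int *out, int cap, void *src);
typedef void (*ToyMsgFn)(int msg, void *ctx);
typedef void (*ToyResetFn)(void *ctx);

/* Fetch batches until one comes back short; a negative result means reset. */
static void toy_receive(ToyGetFn get, void *src,
                        ToyMsgFn on_msg, ToyResetFn on_reset, void *ctx)
{
    int batch[TOY_BATCH];
    int n;

    do {
        n = get(batch, TOY_BATCH, src);
        if (n < 0) {
            on_reset(ctx);        /* messages were missed: wipe everything */
            break;
        }
        for (int i = 0; i < n; i++)
            on_msg(batch[i], ctx);
    } while (n == TOY_BATCH);     /* a full batch may mean more is pending */
}

/* Hypothetical message source backed by an array. */
typedef struct ToySource { const int *msgs; int remaining; int reset; } ToySource;

static int toy_source_get(int *out, int cap, void *p)
{
    ToySource *s = p;
    int n = 0;

    if (s->reset) {
        s->reset = 0;
        s->remaining = 0;         /* pending messages are skipped */
        return -1;
    }
    while (n < cap && s->remaining > 0) {
        out[n++] = *s->msgs++;
        s->remaining--;
    }
    return n;
}

/* Hypothetical receiver state: counts per-message and reset invocations. */
typedef struct ToyStats { int applied; int resets; } ToyStats;

static void toy_on_msg(int msg, void *ctx) { (void) msg; ((ToyStats *) ctx)->applied++; }
static void toy_on_reset(void *ctx) { ((ToyStats *) ctx)->resets++; }
```

Passing the per-message handler and the reset handler as function pointers keeps the queue-draining code unaware of what a cache is, which matches the division of labor between the queue reader and the dispatcher described above.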

Subsystems that cache derived state register callbacks to be notified on invalidation events:

// CacheRegisterSyscacheCallback — src/backend/utils/cache/inval.c
void
CacheRegisterSyscacheCallback(int cacheid,
                              SyscacheCallbackFunction func,
                              Datum arg)
{
    // adds to syscache_callback_list[], linked by syscache_callback_links[id]
}

// CacheRegisterRelcacheCallback — src/backend/utils/cache/inval.c
void
CacheRegisterRelcacheCallback(RelcacheCallbackFunction func,
                              Datum arg)
{
    // adds to relcache_callback_list[]
}

Up to MAX_SYSCACHE_CALLBACKS = 64 syscache callbacks and MAX_RELCACHE_CALLBACKS = 10 relcache callbacks are supported. Callbacks are chained per-cache-ID via syscache_callback_links[] for O(1) dispatch. Callers include the plan cache, the partition descriptor cache, the event trigger cache, and logical decoding subsystems.
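One way to chain callbacks per cache ID inside a fixed array is sketched below. It is an illustration under assumptions, not the inval.c code: `ToyRegistry` and the `toy_` functions are hypothetical, the limits are small toy values, and a full registry returns -1 here where the real code raises an error.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_CACHES    8
#define TOY_MAX_CALLBACKS 16

typedef void (*ToyCallback)(int cache_id, unsigned hash, void *arg);

typedef struct ToyRegistry {
    struct {
        int         link;   /* 1-based index of next entry for same cache; 0 = end */
        ToyCallback fn;
        void       *arg;
    } list[TOY_MAX_CALLBACKS];
    int links[TOY_NUM_CACHES];  /* 1-based index of first entry per cache; 0 = none */
    int count;
} ToyRegistry;

/* Append a callback and hook it onto the end of its cache's chain. */
static int toy_register_callback(ToyRegistry *r, int cache_id,
                                 ToyCallback fn, void *arg)
{
    assert(cache_id >= 0 && cache_id < TOY_NUM_CACHES);
    if (r->count >= TOY_MAX_CALLBACKS)
        return -1;                           /* fixed array is full */
    if (r->links[cache_id] == 0) {
        r->links[cache_id] = r->count + 1;   /* first callback for this cache */
    } else {
        int i = r->links[cache_id] - 1;

        while (r->list[i].link > 0)          /* walk to the chain's tail */
            i = r->list[i].link - 1;
        r->list[i].link = r->count + 1;
    }
    r->list[r->count].link = 0;
    r->list[r->count].fn = fn;
    r->list[r->count].arg = arg;
    r->count++;
    return 0;
}

/* Fire only the callbacks chained to cache_id; returns how many ran. */
static int toy_call_callbacks(const ToyRegistry *r, int cache_id, unsigned hash)
{
    int fired = 0;
    int i;

    assert(cache_id >= 0 && cache_id < TOY_NUM_CACHES);
    i = r->links[cache_id] - 1;
    while (i >= 0) {
        r->list[i].fn(cache_id, hash, r->list[i].arg);
        fired++;
        i = r->list[i].link - 1;
    }
    return fired;
}

/* Example callback: increments the int that arg points to. */
static void toy_bump(int cache_id, unsigned hash, void *arg)
{
    (void) cache_id;
    (void) hash;
    ++*(int *) arg;
}
```

Finding the head of a chain is a single array lookup, so an invalidation for a cache with no registered callbacks costs almost nothing, and one with callbacks visits only its own entries instead of scanning the whole list.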

For catalog changes that are not transactional — such as updating pg_class.reltuples during ANALYZE — an inplace update path exists that bypasses the transaction/subtransaction stack. CacheInvalidateHeapTupleInplace queues messages into a separate inplaceInvalInfo structure; AtInplace_Inval() sends them directly to the sinval buffer during the WAL insertion critical section, so the update is immediately visible to other backends. PreInplace_Inval() handles relcache init file deletion before the critical section.

flowchart TD
    A["heap_update / heap_delete\n(catalog relation)"] -->|CacheInvalidateHeapTuple| B["CacheInvalidateHeapTupleCommon\ninval.c"]
    B -->|catcache msg| C["CurrentCmdInvalidMsgs\n(catcache array)"]
    B -->|relcache msg| D["CurrentCmdInvalidMsgs\n(relcache array)"]
    C --> E["CommandEndInvalidationMessages\n(CommandCounterIncrement)"]
    D --> E
    E -->|LocalExecuteInvalidationMessage| F["SysCacheInvalidate +<br/>RelationCacheInvalidateEntry\n(local caches)"]
    E -->|move to| G["PriorCmdInvalidMsgs"]
    G -->|commit: AtEOXact_Inval| H["SendSharedInvalidMessages\n→ SIInsertDataEntries"]
    G -->|abort: AtEOXact_Inval| I["LocalExecuteInvalidationMessage\n(undo prior-cmd changes)"]
    H --> J["SISeg.buffer\n4096-slot ring, shared memory"]
    J -->|AcceptInvalidationMessages\nnext transaction start| K["SIGetDataEntries"]
    K -->|n > 0| L["LocalExecuteInvalidationMessage\nper message"]
    K -->|returns -1 reset| M["InvalidateSystemCaches\nfull wipe + all callbacks"]
    L --> N["SysCacheInvalidate\nRelationCacheInvalidateEntry\nCallbacks..."]

Figure 1 — Cache invalidation flow from catalog mutation to cross-backend delivery. Upper part: the generating backend. Lower part: the receiving backend. The sinval ring buffer is the only shared structure.

End-to-end fan-out: one mutation, every backend

Figure 1 traces the data structures inside a single backend. Figure 2 takes the complementary view: it follows one catalog mutation as it fans out to every live backend through the shared sinval queue. The key asymmetry is that the producer runs the path once (registration → accumulate → broadcast at commit), while the consumer path runs N times, independently, in each backend that drains the ring buffer at its own next AcceptInvalidationMessages call. Each consumer ends at the same terminal action: dropping the stale catcache and relcache entries so the next lookup reloads from the catalog.

Note the per-message database guard inside LocalExecuteInvalidationMessage: catcache, relcache, relmap, and snapshot messages all short-circuit unless msg->*.dbId == MyDatabaseId || dbId == InvalidOid, so a backend connected to a different database ignores another database’s DDL even though the message physically passed through the shared ring it polls.
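The guard reduces to a two-way comparison, sketched here with hypothetical names (`ToyOid`, `toy_msg_applies`, `toy_count_applicable`). The invalid OID is modeled as 0 and is assumed to mark messages about shared catalogs, which every backend must honor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyOid;                 /* stand-in for PostgreSQL's Oid */
#define TOY_INVALID_OID ((ToyOid) 0)

/* A message applies if it targets this backend's database or all databases. */
static bool toy_msg_applies(ToyOid msg_db_id, ToyOid my_db_id)
{
    return msg_db_id == my_db_id || msg_db_id == TOY_INVALID_OID;
}

/* Count how many of n queued messages a given backend would act on. */
static int toy_count_applicable(const ToyOid *msg_db_ids, int n, ToyOid my_db_id)
{
    int hits = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (toy_msg_applies(msg_db_ids[i], my_db_id))
            hits++;
    return hits;
}
```

Every backend still reads every message from the ring; the guard only decides whether the message touches the local caches, so the saving is in avoided cache flushes rather than in queue traffic.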

flowchart TD
    subgraph PROD["Producer backend (runs once)"]
        A["heap_update / heap_delete\non a catalog relation"] -->|CacheInvalidateHeapTuple| B["CacheInvalidateHeapTupleCommon"]
        B -->|PrepareToInvalidateCacheTuple| C["RegisterCatcacheInvalidation\n(catcache msg: id, hashValue, dbId)"]
        B -->|pg_class / pg_attribute /<br/>pg_index / pg_constraint| D["RegisterRelcacheInvalidation\n(relcache msg: dbId, relId)"]
        C --> E["CurrentCmdInvalidMsgs\n(per-transaction accumulation)"]
        D --> E
        E -->|CommandEndInvalidationMessages\nmove to PriorCmdInvalidMsgs| F["AtEOXact_Inval\nisCommit = true"]
        F -->|SendSharedInvalidMessages| G["SIInsertDataEntries\nunder SInvalWriteLock"]
    end
    G --> Q["SISeg.buffer\n4096-slot ring in shared memory\nmaxMsgNum advanced; hasMessages=true for all backends"]
    Q --> H1["Backend 1\nAcceptInvalidationMessages"]
    Q --> H2["Backend 2\nAcceptInvalidationMessages"]
    Q --> H3["Backend N\nAcceptInvalidationMessages"]
    subgraph CONS["Consumer backend (runs in each of N backends)"]
        H1 -->|ReceiveSharedInvalidMessages| I["SIGetDataEntries"]
        I -->|n > 0: per message| J["LocalExecuteInvalidationMessage\ndbId guard: MyDatabaseId or InvalidOid"]
        I -->|returns -1: overflow| K["InvalidateSystemCaches\nfull catcache + relcache wipe"]
        J -->|id >= 0| L["SysCacheInvalidate +\nCallSyscacheCallbacks\n(catcache entry drop)"]
        J -->|SHAREDINVALRELCACHE_ID| M["RelationCacheInvalidateEntry +\nrelcache_callback_list\n(relcache entry drop)"]
    end

Figure 2 — End-to-end fan-out of a single catalog mutation. The producer path executes once and terminates by writing the ring buffer under SInvalWriteLock. Every live backend independently drains the same ring at its next AcceptInvalidationMessages call; each ends by dropping the affected catcache and relcache entries (or, on overflow, wiping all caches). The dbId guard in LocalExecuteInvalidationMessage lets backends on other databases skip irrelevant messages.

When wal_level = logical, CommandEndInvalidationMessages also calls LogLogicalInvalidations() to write per-command invalidation messages into the WAL stream. This allows logical decoding backends (WAL subscribers) to replay catalog changes and maintain their own catcache/relcache without access to the live sinval queue.

The ProcessCommittedInvalidationMessages function is the redo-time analog of AtEOXact_Inval: it is called by xact_redo_commit() on the standby to broadcast the invalidation messages embedded in the commit WAL record to the standby’s sinval queue.

  • CacheInvalidateHeapTuple — entry point called by heap DML on catalog relations; routes to CacheInvalidateHeapTupleCommon with transactional state prepare.
  • CacheInvalidateHeapTupleInplace — inplace-update variant; bypasses transaction stack.
  • CacheInvalidateCatalog — registers a whole-catalog flush (used by VACUUM FULL on a catalog).
  • CacheInvalidateRelcache — registers a relcache flush for a specific relation when no tuple-level trigger fires (e.g., DROP INDEX).
  • CacheInvalidateRelcacheAll — broadcasts InvalidOid to flush all relcache entries cluster-wide.
  • PrepareInvalidationState — allocates / reuses a TransInvalidationInfo for the current (sub)transaction nesting level.
  • CommandEndInvalidationMessages — applies CurrentCmdInvalidMsgs locally, WAL-logs if wal_level=logical, moves to PriorCmdInvalidMsgs.
  • AtEOXact_Inval — commit: broadcasts PriorCmdInvalidMsgs via sinval; abort: applies them locally (undo).
  • AtEOSubXact_Inval — subtransaction commit: bubbles messages to parent; subtransaction abort: applies locally.
  • AtInplace_Inval / PreInplace_Inval — inplace-update broadcast during WAL critical section.
  • PostPrepare_Inval — PREPARE path: behaves like abort (undo local changes; the transaction’s broadcast will arrive via ProcessCommittedInvalidationMessages when it ultimately commits).
  • AcceptInvalidationMessages (inval.c) — outer entry; calls ReceiveSharedInvalidMessages with LocalExecuteInvalidationMessage and InvalidateSystemCaches as the two dispatch callbacks.
  • LocalExecuteInvalidationMessage (inval.c) — per-message dispatch: routes to SysCacheInvalidate, RelationCacheInvalidateEntry, smgr, relmap, snapshot, or relsync handlers.
  • InvalidateSystemCaches / InvalidateSystemCachesExtended (inval.c) — full reset: wipes catcache + relcache + all callbacks.
  • ReceiveSharedInvalidMessages (sinval.c) — thin wrapper: loops SIGetDataEntries until the queue is drained; if a catchup interrupt was pending, it then calls SICleanupQueue, which passes the catchup signal on to the next-slowest backend.
  • SIGetDataEntries (sinvaladt.c) — reads backend’s pending messages from the ring buffer; returns -1 on reset.
  • SIInsertDataEntries (sinvaladt.c) — writes up to WRITE_QUANTUM=64 messages per lock hold; calls SICleanupQueue if buffer is filling.
  • CacheRegisterSyscacheCallback (inval.c) — registers a hook for a specific syscache ID; linked list per cache via syscache_callback_links[].
  • CacheRegisterRelcacheCallback (inval.c) — registers a hook for any relcache flush.
  • CacheRegisterRelSyncCallback (inval.c) — logical decoding relsync hook.

Position hints (as of 2026-06-05, commit 273fe94)

| Symbol | File | Line |
| --- | --- | --- |
| CacheInvalidateHeapTuple | src/backend/utils/cache/inval.c | 1571 |
| CacheInvalidateHeapTupleCommon | src/backend/utils/cache/inval.c | 1436 |
| CacheInvalidateHeapTupleInplace | src/backend/utils/cache/inval.c | 1593 |
| CacheInvalidateCatalog | src/backend/utils/cache/inval.c | 1612 |
| CacheInvalidateRelcache | src/backend/utils/cache/inval.c | 1635 |
| CommandEndInvalidationMessages | src/backend/utils/cache/inval.c | 1409 |
| AtEOXact_Inval | src/backend/utils/cache/inval.c | 1199 |
| AtEOSubXact_Inval | src/backend/utils/cache/inval.c | 1310 |
| AtInplace_Inval | src/backend/utils/cache/inval.c | 1263 |
| AcceptInvalidationMessages | src/backend/utils/cache/inval.c | 930 |
| LocalExecuteInvalidationMessage | src/backend/utils/cache/inval.c | 823 |
| InvalidateSystemCaches | src/backend/utils/cache/inval.c | 916 |
| PrepareInvalidationState | src/backend/utils/cache/inval.c | 682 |
| RegisterCatcacheInvalidation | src/backend/utils/cache/inval.c | 604 |
| RegisterRelcacheInvalidation | src/backend/utils/cache/inval.c | 632 |
| RegisterSnapshotInvalidation | src/backend/utils/cache/inval.c | 672 |
| CacheRegisterSyscacheCallback | src/backend/utils/cache/inval.c | 1816 |
| CacheRegisterRelcacheCallback | src/backend/utils/cache/inval.c | 1858 |
| xactGetCommittedInvalidationMessages | src/backend/utils/cache/inval.c | 1012 |
| ProcessCommittedInvalidationMessages | src/backend/utils/cache/inval.c | 1135 |
| TransInvalidationInfo (struct) | src/backend/utils/cache/inval.c | 241 |
| SISeg (struct) | src/backend/storage/ipc/sinvaladt.c | 165 |
| ProcState (struct) | src/backend/storage/ipc/sinvaladt.c | 136 |
| SIInsertDataEntries | src/backend/storage/ipc/sinvaladt.c | 370 |
| SIGetDataEntries | src/backend/storage/ipc/sinvaladt.c | 473 |
| SharedInvalBackendInit | src/backend/storage/ipc/sinvaladt.c | 272 |
| SICleanupQueue | src/backend/storage/ipc/sinvaladt.c | ~560 |
| SendSharedInvalidMessages | src/backend/storage/ipc/sinval.c | 47 |
| ReceiveSharedInvalidMessages | src/backend/storage/ipc/sinval.c | 69 |

  • MAXNUMMESSAGES = 4096 is a compile-time constant. Verified in sinvaladt.c line 129 at commit 273fe94. Not a GUC. There is no runtime mechanism to resize the ring buffer without recompiling.

  • Catcache messages are ordered before relcache messages within each subgroup. Verified by reading AtEOXact_Inval: it calls ProcessInvalidationMessagesMulti which processes CatCacheMsgs subgroup first, then RelCacheMsgs, matching the design principle that catcache must be clean before relcache is rebuilt.

  • SIGetDataEntries runs under a shared SInvalReadLock. Confirmed in sinvaladt.c. Multiple backends can drain their own ProcState.nextMsgNum concurrently without contention on the read path, because each modifies only its own per-backend state. The shared lock is used unconventionally to authorize self-modification, not to serialize reads.

  • hasMessages flag provides a fast-path check before acquiring any lock. Verified: SIGetDataEntries returns 0 immediately if stateP->hasMessages is false. The flag is set by the producer inside SInvalWriteLock after advancing maxMsgNum, giving a memory-barrier ordering guarantee.

  • Inplace-update path (AtInplace_Inval) calls SendSharedInvalidMessages inside a critical section (CritSectionCount > 0). Confirmed by the Assert(CritSectionCount > 0) in AtInplace_Inval. This means the sinval buffer write is atomic with the WAL record for the inplace heap change.

  • The debug_discard_caches GUC exists (default 0) for stress testing. Verified: AcceptInvalidationMessages contains a DISCARD_CACHES_ENABLED block that calls InvalidateSystemCachesExtended(true) recursively up to debug_discard_caches levels. Available only if compiled with DISCARD_CACHES_ENABLED (defined automatically in --enable-cassert builds). The debug_discard_caches = 1 mode was formerly CLOBBER_CACHE_ALWAYS.

  • RelcacheInitFileInval flag is per-transaction, not per-message. It is a single boolean at TransInvalidationInfo.ii.RelcacheInitFileInval. Any RegisterRelcacheInvalidation call for a relation for which RelationIdIsInInitFile returns true sets it; the init file is deleted only once at commit, not per-message.

  • MAX_SYSCACHE_CALLBACKS = 64, MAX_RELCACHE_CALLBACKS = 10. Both are fixed arrays (no dynamic growth). If a subsystem attempts to register more, elog(FATAL) fires. As of REL_18_STABLE, no regression test exercises the limit, but the counts are well below the cap in a standard build.

  1. SICleanupQueue reset heuristic. SIG_THRESHOLD = MAXNUMMESSAGES / 2 (2048) determines when a catchup interrupt is sent to a lagging backend. The comment says “the furthest-back backend might be stuck,” but there is no timeout-based reset: a backend that cannot reach AcceptInvalidationMessages will eventually have resetState set, and once it runs again it logs a message and invalidates its entire cache. The impact on a production system of a single stuck backend filling a 4096-slot buffer is worth measuring; investigation path: instrument SICleanupQueue calls in pg_stat_activity or a custom extension.

  2. sendOnly semantics for the Startup process. ProcState.sendOnly = true is set for the Startup process during recovery. The code comment says it “fires inval messages to allow query backends to see schema changes” but “doesn’t maintain a relcache.” How catalog changes during recovery (e.g., from committed transactions being replayed) flow through the sinval path and whether any receiving backend actually rebuilds relcache correctly during hot-standby reads is not fully traced in this doc.

  3. WAL-logged invalidations and logical decoding. LogLogicalInvalidations, called from CommandEndInvalidationMessages, writes per-command invalidation messages into WAL. The record format and the exact consumer path through xlogreader and decode.c on the decoding (publisher) side are not traced here. The interaction with the reorderbuffer snapshots that logical decoding maintains deserves a dedicated analysis.
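The thresholds in open question 1 can be made concrete with a small classifier. It is a simplified model under stated assumptions: the `demo_` names are hypothetical, the two bounds follow one reading of SICleanupQueue (a signal bound at maxMsgNum - SIG_THRESHOLD and a reset bound at maxMsgNum - MAXNUMMESSAGES + minFree), and the exact boundary comparisons should be checked against sinvaladt.c before relying on them.

```c
#include <assert.h>

#define DEMO_MAXNUMMESSAGES 4096
#define DEMO_SIG_THRESHOLD  (DEMO_MAXNUMMESSAGES / 2)   /* 2048 */

enum demo_action { DEMO_OK, DEMO_SIGNAL, DEMO_RESET };

/*
 * Classify one backend by how far its read position trails the head.
 *   max_msg_num  - head of the queue (next message number to be written)
 *   next_msg_num - next message number this backend will read
 *   min_free     - slots the writer needs to have free after cleanup
 */
enum demo_action demo_classify(int max_msg_num, int next_msg_num, int min_free)
{
    int minsig   = max_msg_num - DEMO_SIG_THRESHOLD;
    int lowbound = max_msg_num - DEMO_MAXNUMMESSAGES + min_free;

    assert(next_msg_num <= max_msg_num);

    if (next_msg_num < lowbound)
        return DEMO_RESET;      /* too far behind: must discard all caches */
    if (next_msg_num < minsig)
        return DEMO_SIGNAL;     /* lagging: candidate for a catchup interrupt */
    return DEMO_OK;
}
```

With the head at 5000 and `min_free` of 0, a backend at 4000 is fine, one at 2000 is a signal candidate, and one at 500 would be reset. The model classifies each backend independently, whereas the text above describes the interrupt as going to a lagging backend chosen by the cleanup pass, so the sketch overstates how many signals are sent.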

Beyond PostgreSQL — Comparative Designs & Research Frontiers

  • Oracle’s global shared cache invalidation. Oracle keeps its dictionary and library caches in the shared SGA, so metadata invalidation does not require per-process propagation. When a DDL commits, Oracle invalidates child cursors in the shared cursor cache by marking them invalid in the library cache; the next execution reparses the statement. The per-process sinval ring buffer pattern is an artifact of PostgreSQL’s per-process architecture. A shared-cache engine has a different problem: cursor invalidation rather than metadata propagation.

  • MySQL/InnoDB table definition cache (TDC) and MDL. MySQL uses a centralized table definition cache with metadata locks (MDL) that block DDL until all sessions using a table have released their references. This is an eager rather than lazy model: DDL waits for the cache to be clean rather than sending invalidation after commit. The tradeoff is no stale-read risk but higher DDL latency under concurrency.

  • Lock-based cache coherence (distributed DBMS). In distributed engines (CockroachDB, Spanner, YugabyteDB), schema changes propagate via a lease or version mechanism: each backend caches the schema at a given lease version; DDL bumps the version and waits for all backends to observe the new version before proceeding. The PostgreSQL sinval model is a single-node analog — “version bumped at commit, all backends eventually drain the queue” — without the distributed coordination.

  • Lock-free sinval. The current sinval design uses SInvalReadLock (LWLock shared) and SInvalWriteLock (LWLock exclusive) plus a spinlock on maxMsgNum. With atomic read/write and hardware memory barriers it may be possible to eliminate the LWLock for the reader path entirely, reducing contention at very high session counts. This is an open optimization direction noted in source comments.
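The "version bumped at commit" idea from the lease/version bullet above can be reduced to a generation check. The sketch below is a deliberately minimal single-node illustration with hypothetical `demo_` names. It omits lease expiry and the wait-for-all-backends step that the distributed engines need, and it is not how PostgreSQL tracks staleness, since sinval delivers explicit messages rather than a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global schema version, bumped once per committed DDL. */
static atomic_uint demo_schema_version = 1;

struct demo_cached_schema {
    unsigned int version;   /* global version at which this copy was loaded */
    int          ncolumns;  /* stand-in for real metadata */
    bool         valid;     /* false until the first load */
};

/* DDL commit: publish a new version so every older copy becomes stale. */
void demo_commit_ddl(void)
{
    atomic_fetch_add(&demo_schema_version, 1);
}

/* A cached copy is usable only if it was loaded at the current version. */
bool demo_cache_is_fresh(const struct demo_cached_schema *c)
{
    return c->valid && c->version == atomic_load(&demo_schema_version);
}

/* Reload the copy and stamp it with the version observed at load time. */
void demo_cache_reload(struct demo_cached_schema *c, int ncolumns)
{
    c->version  = atomic_load(&demo_schema_version);
    c->ncolumns = ncolumns;
    c->valid    = true;
}
```

A reader calls `demo_cache_is_fresh` before each use and reloads on a mismatch. The cost of this design is that one counter invalidates every cached object at once, which is the coarse end of the spectrum; per-object messages, as in sinval, let unaffected entries survive a DDL.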

None — synthesized from source tree directly.

  • Database System Concepts (Silberschatz et al., 7e) — ch. 11 §“System Catalog”, ch. 25 §“Buffer Management”.
  • Database Internals (Alex Petrov) — ch. 6 §“System Catalogs and Metadata”.

Source code paths (REL_18_STABLE, commit 273fe94)

  • src/backend/utils/cache/inval.c — invalidation dispatcher and accumulator.
  • src/backend/storage/ipc/sinvaladt.c — shared ring buffer implementation.
  • src/backend/storage/ipc/sinval.c — thin wrappers over sinvaladt.
  • src/include/storage/sinval.h — SharedInvalidationMessage union and ID constants.
  • src/include/utils/inval.h — public API declarations.
  • src/backend/utils/cache/catcache.c — PrepareToInvalidateCacheTuple, SysCacheInvalidate.
  • src/backend/utils/cache/relcache.c — RelationCacheInvalidateEntry, RelationCacheInitFilePreInvalidate.
  • postgres-catcache-syscache.md — how catcache entries are invalidated when SysCacheInvalidate is called; negative entries; callback chain.
  • postgres-relcache.md — how RelationCacheInvalidateEntry rebuilds or flushes a relcache entry; the init file; bootstrap nailing.
  • postgres-xlog-wal.md — WAL record format; LogLogicalInvalidations and how inval messages embed in commit records.
  • postgres-xact.md — transaction lifecycle; where AtEOXact_Inval and CommandEndInvalidationMessages are called in StartTransaction / CommitTransaction / CommandCounterIncrement.
  • postgres-shared-memory-ipc.md — SISeg allocation; LWLock and spinlock primitives used by sinvaladt.