PostgreSQL CatCache & SysCache — Per-Backend Catalog Tuple Cache, Negative Entries, and the sinval Invalidation Loop
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Section titled “Theoretical Background”A database engine spends a surprising fraction of its CPU in its own
data dictionary — the set of tables that describe tables, columns,
types, functions, operators, and every other schema object. In
PostgreSQL these are the system catalogs (pg_class, pg_attribute,
pg_proc, pg_type, and dozens of others). Almost every node of a
query plan requires at least one catalog lookup: the parser checks that
a relation name exists; the type resolver fetches type OIDs and input
functions; the planner queries pg_statistic for column statistics; the
executor resolves operator OIDs to function pointers. Without a cache,
every one of these queries would go to disk through the buffer manager —
a prohibitive overhead for a multi-join plan that touches hundreds of
catalog rows.
The general concept is the data dictionary cache (or catalog cache): a main-memory structure that retains recently accessed catalog tuples so that the same lookup can be answered in microseconds rather than milliseconds. Database System Concepts (Silberschatz, ch. 25 §“Buffer Management”) names catalog caching as a standard technique alongside the buffer pool; Database Internals (Petrov, ch. 6) ties it to the broader observation that metadata lookups are on the critical path of every query and must be made as cheap as possible.
Two axes define the design space for a catalog cache:
-
Scope: per-process vs. shared. Should the cache live in the backend’s private heap (one copy per session), in shared memory (shared across all backends), or in both? Per-process caches are simpler and require no locking for reads, but duplicate work across many concurrent backends. Shared caches are memory-efficient but require concurrency control on every access.
-
Invalidation: eager vs. lazy. When a DDL statement changes a catalog row, cached copies elsewhere are stale. Eager invalidation (broadcast-and-wait) keeps caches consistent immediately but couples all backends to the DDL transaction. Lazy invalidation (mark-and- sweep on next use) decouples DDL from readers but requires each backend to detect staleness before returning a cached value.
PostgreSQL chooses per-process private caches with lazy invalidation
mediated by the shared invalidation message queue (sinval). The
choice reflects the same “shared-memory machine, processes as
inhabitants” philosophy that governs the rest of the architecture (see
postgres-architecture-overview.md axis 6): no per-cache locking is
needed for normal reads, and the invalidation mechanism integrates
cleanly into the existing sinval infrastructure rather than
introducing a second coherence path.
An important theoretical nuance is the negative cache entry: an explicit record that a lookup for a given key returned no tuple. In a naive cache, a “miss” simply returns to the catalog. In a catalog with hundreds of concurrent backends doing the same repeated lookup for a non-existent row (common when checking whether an optional catalog entry exists), the cache can never help. A negative entry converts the miss into a cache hit on subsequent calls — the entry represents the absence of a tuple as positively as a real entry represents its presence. Invalidation handles negative entries exactly like positive ones: if a new tuple is inserted, the negative entry is marked dead.
Common DBMS Design
Section titled “Common DBMS Design”Nearly every production database engine with a system catalog — Oracle, SQL Server, MySQL InnoDB, DB2 — converges on a set of patterns that sit between the textbook model and the specific implementation.
Hash table keyed on catalog identity
Section titled “Hash table keyed on catalog identity”The cache is a hash table. The key is the catalog identity: which relation is being queried plus the specific key values used to look up a row (an OID, a name, or a composite). The value is the cached tuple (or, for a negative entry, a sentinel). The hash gives O(1) average lookup time without loading the entire catalog into memory.
Refcount-based tuple lifetime
Section titled “Refcount-based tuple lifetime”Cached tuples are not freed at end-of-query. They live until no code holds a reference to them and they have been invalidated. The pattern is: caller increments a reference count on retrieval; caller decrements it on release; the cache frees the tuple only when the refcount drops to zero and the entry is marked dead. This lets the caller hold a raw pointer into the cache without copying and without worrying about the tuple disappearing underneath it.
Positive and negative entries managed identically
Section titled “Positive and negative entries managed identically”Most engines that implement negative caching represent both positive and
negative entries with the same struct, differing only in a negative
flag. This uniformity means that invalidation, eviction, and refcount
logic is identical for both kinds. Only the return value to the caller
differs: positive returns the tuple pointer, negative returns NULL.
Broadcast invalidation via a shared queue
Section titled “Broadcast invalidation via a shared queue”When a transaction commits a change to a catalog row, it must notify all other backends that their cached copies of that row are stale. The common mechanism is a shared message queue from which every backend reads on its next “safe point” (typically at the start of a command or at a wait point). The message carries enough information — usually just a hash value of the affected key — for each backend to mark the corresponding cached entry dead. The entry is not freed immediately if another caller holds a refcount; it is freed when the refcount drops to zero.
Separate layers: generic cache engine vs. named-cache facade
Section titled “Separate layers: generic cache engine vs. named-cache facade”In most production engines the catalog cache has two levels: a generic
tuple-cache layer that knows about hash buckets, refcounts, and
invalidation, but not about any specific catalog; and a facade layer
that maps human-readable catalog names to integer IDs and exposes
convenience functions (get_type_by_oid(oid) rather than lookup(cache_id, datum)). The facade layer is what the rest of the engine calls; the
generic layer is what the facade layer uses. This split lets the generic
layer be tested and reasoned about independently, and lets the facade
evolve its API without touching the cache internals.
Theory ↔ PostgreSQL mapping
Section titled “Theory ↔ PostgreSQL mapping”| Theory concept | PostgreSQL name |
|---|---|
| Data dictionary cache | CatCache (generic) + SysCache[] array (named) |
| Hash bucket array | CatCache.cc_bucket[] — array of dlist_head |
| Cached tuple (positive) | CatCTup with negative = false |
| Negative cache entry | CatCTup with negative = true |
| Multi-row partial-key result | CatCList |
| Reference count | CatCTup.refcount / CatCList.refcount |
| Invalidation message | sinval message carrying (cache_id, hash_value) |
| Named-cache facade | syscache.c + SysCache[] indexed by enum from MAKE_SYSCACHE |
| Convenience wrappers | lsyscache.c — get_attname, get_func_name, … |
PostgreSQL’s Approach
Section titled “PostgreSQL’s Approach”Layer architecture
Section titled “Layer architecture”PostgreSQL’s catalog cache has three layers, each in its own source file:
lsyscache.c ← convenience API (get_attname, get_func_name, …) │syscache.c ← named-ID facade (SearchSysCache1/2/3/4, cacheinfo[]) │catcache.c ← generic hash-table engine (CatCache, CatCTup, CatCList)catcache.c knows nothing about which catalog it is caching. It
receives a relation OID, an index OID, a key-count, and key-column
attribute numbers, and builds a hash table. syscache.c populates the
SysCache[] array using the cacheinfo[] table (generated from
MAKE_SYSCACHE declarations in pg_*.h headers), and exposes the
SearchSysCache1/2/3/4 family. lsyscache.c wraps syscache.c with
one function per common query pattern, so callers write
get_func_name(funcid) rather than manipulating HeapTuple directly.
How caches are declared: MAKE_SYSCACHE
Section titled “How caches are declared: MAKE_SYSCACHE”Each catalog header declares its caches with MAKE_SYSCACHE:
// MAKE_SYSCACHE — src/include/catalog/pg_proc.h (line 143)MAKE_SYSCACHE(PROCOID, pg_proc_oid_index, 128);
// MAKE_SYSCACHE — src/include/catalog/pg_type.h (line 268)MAKE_SYSCACHE(TYPEOID, pg_type_oid_index, 64);
// MAKE_SYSCACHE — src/include/catalog/pg_class.h (line 162)MAKE_SYSCACHE(RELOID, pg_class_oid_index, 128);MAKE_SYSCACHE(RELNAMENSP, pg_class_relname_nsp_index, 128);The macro registers: a symbolic cache ID (e.g. PROCOID), the
underlying unique index OID expression, and the initial bucket count
(always a power of two). The cacheinfo[] array in syscache.c is
populated by #include "catalog/syscache_info.h", which expands those
declarations into struct cachedesc entries holding (reloid, indoid, nkeys, key[], nbuckets). InitCatalogCache iterates that array and
calls InitCatCache for each entry, building the SysCache[] array.
The CatCache struct
Section titled “The CatCache struct”// struct catcache — src/include/utils/catcache.htypedef struct catcache{ int id; /* cache identifier (e.g. PROCOID) */ int cc_nbuckets; /* # of hash buckets (power of 2) */ TupleDesc cc_tupdesc; /* copied from relcache at first use */ dlist_head *cc_bucket; /* per-bucket doubly linked lists */ CCHashFN cc_hashfunc[CATCACHE_MAXKEYS]; /* per-key hash fns */ CCFastEqualFN cc_fastequal[CATCACHE_MAXKEYS]; /* per-key eq fns */ int cc_keyno[CATCACHE_MAXKEYS]; /* attr numbers */ int cc_nkeys; /* 1..CATCACHE_MAXKEYS */ int cc_ntup; /* current tuple count */ int cc_nlist; /* current list count */ dlist_head *cc_lbucket; /* list hash buckets (allocated on demand) */ Oid cc_reloid; /* relation being cached */ Oid cc_indexoid; /* supporting unique index */ bool cc_relisshared; /* shared across databases? */ ScanKeyData cc_skey[CATCACHE_MAXKEYS]; /* precomputed scan key */ /* ... CATCACHE_STATS fields omitted ... */} CatCache;cc_tupdesc is NULL until first use; CatalogCacheInitializeCache
fills it by opening the relation. This lazy init avoids opening catalog
relations before the relcache is ready, which matters during bootstrap.
CatCTup: the cached tuple
Section titled “CatCTup: the cached tuple”// struct catctup — src/include/utils/catcache.htypedef struct catctup{ int ct_magic; /* CT_MAGIC = 0x57261502 */ uint32 hash_value; /* precomputed hash of keys */ Datum keys[CATCACHE_MAXKEYS]; /* extracted key values */ dlist_node cache_elem; /* link in per-bucket list (LRU order) */ int refcount; /* active references */ bool dead; /* marked for removal */ bool negative; /* negative cache entry? */ HeapTupleData tuple; /* tuple management header */ struct catclist *c_list; /* owning CatCList, or NULL */ CatCache *my_cache; /* owning CatCache */ /* aligned tuple data follows (unless negative) */} CatCTup;The CatCTup and its tuple data are allocated in a single palloc in
CacheMemoryContext, so the tuple bytes immediately follow the struct.
For negative entries, no tuple data is stored; keys[] hold the
searched-for values (separately palloc’d, freed when the entry is
removed).
A key design choice: per-bucket dlist_head lists are maintained in
LRU order. SearchCatCacheInternal moves a hit to the front of its
bucket list with dlist_move_head, so the most recently used entries
are always at the head. This is cheap (two pointer swaps) and
effectively makes the cache approximate-LRU within each bucket.
CatCList: partial-key multi-row results
Section titled “CatCList: partial-key multi-row results”When a caller needs all tuples sharing a partial key — for example, all
pg_attribute rows for a given relid — SearchCatCacheList builds a
CatCList:
// struct catclist — src/include/utils/catcache.htypedef struct catclist{ int cl_magic; /* CL_MAGIC = 0x52765103 */ uint32 hash_value; /* hash of the partial key */ dlist_node cache_elem; /* link in per-catcache list bucket */ Datum keys[CATCACHE_MAXKEYS]; /* partial-key values */ int refcount; bool dead; bool ordered; /* members in index order? */ short nkeys; /* # of key columns specified */ int n_members; CatCache *my_cache; CatCTup *members[FLEXIBLE_ARRAY_MEMBER]; /* member tuples */} CatCList;A CatCList entry holds pointers to the individual CatCTup entries
it contains; those entries also exist in the per-bucket hash lists for
point lookups. A CatCTup can belong to at most one CatCList. The
list hash buckets (cc_lbucket) are allocated lazily on the first list
search, and grow by doubling when fill factor exceeds 2.
The fast path: SearchCatCacheInternal
Section titled “The fast path: SearchCatCacheInternal”The hot path through the cache avoids any catalog access:
// SearchCatCacheInternal — src/backend/utils/cache/catcache.c (line 1402)static inline HeapTupleSearchCatCacheInternal(CatCache *cache, int nkeys, Datum v1, Datum v2, Datum v3, Datum v4){ uint32 hashValue; Index hashIndex; dlist_iter iter; CatCTup *ct;
ConditionalCatalogCacheInitializeCache(cache); /* one-time init */
hashValue = CatalogCacheComputeHashValue(cache, nkeys, v1, v2, v3, v4); hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets); /* bitmask */
dlist_foreach(iter, &cache->cc_bucket[hashIndex]) { ct = dlist_container(CatCTup, cache_elem, iter.cur); if (ct->dead) continue; if (ct->hash_value != hashValue) continue; /* early exit */ if (!CatalogCacheCompareTuple(...)) continue;
dlist_move_head(...); /* promote to LRU front */
if (!ct->negative) { ct->refcount++; ResourceOwnerRememberCatCacheRef(...); return &ct->tuple; /* ← cache HIT, positive */ } return NULL; /* ← cache HIT, negative */ } return SearchCatCacheMiss(...); /* ← cache MISS */}The function is pg_attribute_always_inline and is specialized into
SearchCatCache1/2/3/4 with a fixed nkeys, allowing the compiler to
unroll the hash computation loop. The hash is built by XOR-ing per-key
hashes with fixed bit-rotations (8, 16, 24 bits for keys 2, 3, 4).
Hash and equality functions are hard-coded fast-path variants for the
common key types (OID, int4, int2, name, text) in GetCCHashEqFuncs,
bypassing fmgr overhead on the critical path.
Figure 1 — Cache lookup decision tree
flowchart TD
A[SearchSysCache1/2/3/4] --> B[ConditionalCatalogCacheInitializeCache]
B --> C{cc_tupdesc == NULL?}
C -- yes --> D[CatalogCacheInitializeCache<br/>open rel, copy TupleDesc]
C -- no --> E[ComputeHashValue + HASH_INDEX]
D --> E
E --> F[Scan cc_bucket dlist]
F --> G{dead or hash mismatch?}
G -- skip --> F
G -- match keys? --> H{negative?}
H -- no --> I[bump refcount\nreturn &ct->tuple]
H -- yes --> J[return NULL\nneg-cache HIT]
F --> K{exhausted?}
K -- yes --> L[SearchCatCacheMiss]
L --> M[systable_beginscan on catalog index]
M --> N{found?}
N -- yes --> O[CatalogCacheCreateEntry\npositive entry]
N -- no --> P[CatalogCacheCreateEntry\nnegative entry]
O --> I
P --> J
Figure 1 — The two-branch decision tree: hit on a positive entry returns the tuple directly; hit on a negative entry returns NULL without a catalog scan; miss calls SearchCatCacheMiss which opens the catalog via systable_beginscan.
The miss path: SearchCatCacheMiss
Section titled “The miss path: SearchCatCacheMiss”On a cache miss, SearchCatCacheMiss opens the catalog relation with
table_open(AccessShareLock) and performs an index scan via
systable_beginscan. IndexScanOK gates whether an index scan is
actually safe (it returns false for pg_am always, and for pg_index
until criticalRelcachesBuilt, to avoid bootstrapping recursion):
// SearchCatCacheMiss — src/backend/utils/cache/catcache.c (line 1510) relation = table_open(cache->cc_reloid, AccessShareLock); do { scandesc = systable_beginscan(relation, cache->cc_indexoid, IndexScanOK(cache), NULL, nkeys, cur_skey); ct = NULL; stale = false; while (HeapTupleIsValid(ntp = systable_getnext(scandesc))) { ct = CatalogCacheCreateEntry(cache, ntp, NULL, hashValue, hashIndex); if (ct == NULL) { stale = true; break; } /* toast retry */ ct->refcount++; break; /* unique index: at most one match */ } systable_endscan(scandesc); } while (stale); table_close(relation, AccessShareLock);
if (ct == NULL) /* not found → negative entry */ ct = CatalogCacheCreateEntry(cache, NULL, arguments, hashValue, hashIndex);The stale retry loop handles a subtle race: if the tuple has out-of-
line TOAST fields, CatalogCacheCreateEntry calls toast_flatten_tuple
which can process AcceptInvalidationMessages, potentially marking the
entry-in-progress dead. In that case CatalogCacheCreateEntry returns
NULL and the scan restarts. This is the CatCInProgress stack
mechanism: during entry creation a CatCInProgress node is pushed onto
catcache_in_progress_stack; if an invalidation arrives while that node
is live, it sets node->dead = true, which CatalogCacheCreateEntry
checks before inserting the new entry.
CatalogCacheCreateEntry: tuple storage
Section titled “CatalogCacheCreateEntry: tuple storage”// CatalogCacheCreateEntry — src/backend/utils/cache/catcache.c (line 2144) ct = (CatCTup *) palloc(sizeof(CatCTup) + MAXIMUM_ALIGNOF + dtp->t_len); ct->tuple.t_data = (HeapTupleHeader) MAXALIGN(((char *) ct) + sizeof(CatCTup)); memcpy((char *) ct->tuple.t_data, (const char *) dtp->t_data, dtp->t_len); /* extract keys — they point into the tuple for positive entries */ for (i = 0; i < cache->cc_nkeys; i++) ct->keys[i] = heap_getattr(&ct->tuple, cache->cc_keyno[i], ...);The key insight: CatCTup + its tuple bytes are one allocation in
CacheMemoryContext. The key Datum values point directly into the tuple
data (for fixed-length by-value types they are copied inline; for
variable-length they point into the heap tuple). For negative entries no
tuple is stored, and the keys are separately palloc’d copies.
syscache.c: the named-ID facade
Section titled “syscache.c: the named-ID facade”syscache.c wraps catcache.c with integer cache IDs:
// InitCatalogCache — src/backend/utils/cache/syscache.c (line 110)void InitCatalogCache(void){ for (cacheId = 0; cacheId < SysCacheSize; cacheId++) { SysCache[cacheId] = InitCatCache(cacheId, cacheinfo[cacheId].reloid, cacheinfo[cacheId].indoid, cacheinfo[cacheId].nkeys, cacheinfo[cacheId].key, cacheinfo[cacheId].nbuckets); } /* build sorted OID arrays for RelationHasSysCache / RelationSupportsSysCache */ qsort(SysCacheRelationOid, SysCacheRelationOidSize, sizeof(Oid), oid_compare); /* ... de-dup ... */ CacheInitialized = true;}The public API is the SearchSysCache1/2/3/4 family, which delegates
directly to SearchCatCache1/2/3/4:
// SearchSysCache1 — src/backend/utils/cache/syscache.c (line 221)HeapTuple SearchSysCache1(int cacheId, Datum key1){ Assert(cacheId >= 0 && cacheId < SysCacheSize && PointerIsValid(SysCache[cacheId])); Assert(SysCache[cacheId]->cc_nkeys == 1); return SearchCatCache1(SysCache[cacheId], key1);}Convenience macros SearchSysCacheCopy1, SearchSysCacheExists1,
GetSysCacheOid1, and GetSysCacheHashValue1 are defined in
syscache.h as inline wrappers that pad unused key arguments to zero.
SearchSysCacheLocked1 (line 287) is a PG18-era addition for
inplace-updated tables (pg_class, pg_database, pg_authid,
etc.) that must be locked before modification. It loops SearchSysCache1 → LockTuple → re-fetch until the TID is stable, so callers get both
the tuple and the tuple lock in one atomic step.
SysCacheGetAttr and SysCacheGetAttrNotNull
Section titled “SysCacheGetAttr and SysCacheGetAttrNotNull”// SysCacheGetAttr — src/backend/utils/cache/syscache.c (line 600)Datum SysCacheGetAttr(int cacheId, HeapTuple tup, AttrNumber attributeNumber, bool *isNull){ /* Assert cache's TupleDesc matches the tuple */ Assert(...); return heap_getattr(tup, attributeNumber, SysCache[cacheId]->cc_tupdesc, isNull);}Because cached tuples live in CacheMemoryContext (not the transaction
context), the returned Datum pointer is valid for the duration of the
backend — unless the entry is invalidated. Callers that need to hold the
value past the next catalog access must copy it with datumCopy.
lsyscache.c: the convenience facade
Section titled “lsyscache.c: the convenience facade”lsyscache.c provides ~150 typed wrapper functions so callers do not
manipulate HeapTuple directly. The pattern is uniform:
// get_func_name — src/backend/utils/cache/lsyscache.c (line 1786)char *get_func_name(Oid funcid){ HeapTuple tp; char *result;
tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid)); if (!HeapTupleIsValid(tp)) return NULL; result = pstrdup(NameStr(((Form_pg_proc) GETSTRUCT(tp))->proname)); ReleaseSysCache(tp); return result;}The function: look up by OID, extract the typed struct with GETSTRUCT,
copy the field out of the cache tuple into the caller’s context, release
the cache reference, return. The pstrdup is critical: the cached
NameStr pointer is into CacheMemoryContext — copying it into the
current memory context ensures the value survives potential subsequent
catalog access that might trigger invalidation.
Figure 2 — Layer call flow for a typical catalog lookup
sequenceDiagram
participant E as Executor / Planner
participant L as lsyscache.c
participant S as syscache.c
participant C as catcache.c
participant K as Catalog (heap)
E->>L: get_func_name(funcid)
L->>S: SearchSysCache1(PROCOID, funcid)
S->>C: SearchCatCache1(SysCache[PROCOID], funcid)
C->>C: ComputeHashValue + bucket scan
alt cache HIT
C-->>S: &ct->tuple (refcount++)
S-->>L: HeapTuple
L->>S: ReleaseSysCache(tp)
S->>C: ReleaseCatCache (refcount--)
L-->>E: pstrdup(proname)
else cache MISS
C->>K: table_open + systable_beginscan
K-->>C: HeapTuple from index scan
C->>C: CatalogCacheCreateEntry (palloc in CacheMemoryContext)
C-->>S: &ct->tuple (refcount++)
S-->>L: HeapTuple
L->>S: ReleaseSysCache(tp)
S->>C: ReleaseCatCache (refcount--)
L-->>E: pstrdup(proname)
end
Figure 2 — Both hit and miss paths end with a ReleaseSysCache call that decrements the refcount. The caller receives a pstrdup copy of the name, not a pointer into the cache.
Invalidation: CatCacheInvalidate and the sinval loop
Section titled “Invalidation: CatCacheInvalidate and the sinval loop”When a transaction commits a catalog change, inval.c queues a sinval
message containing (cacheId, hashValue) for each affected cache entry.
At the next safe point — AcceptInvalidationMessages, called from
transaction start, CHECK_FOR_INTERRUPTS, and similar sites — each
backend processes its queue and calls SysCacheInvalidate →
CatCacheInvalidate:
// CatCacheInvalidate — src/backend/utils/cache/catcache.c (line 636)void CatCacheInvalidate(CatCache *cache, uint32 hashValue){ /* 1. Invalidate ALL CatCLists in this cache (too hard to be selective) */ for (int i = 0; i < cache->cc_nlbuckets; i++) dlist_foreach_modify(iter, &cache->cc_lbucket[i]) { CatCList *cl = ...; if (cl->refcount > 0) cl->dead = true; else CatCacheRemoveCList(cache, cl); }
/* 2. Invalidate matching tuples in the specific hash bucket */ hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets); dlist_foreach_modify(iter, &cache->cc_bucket[hashIndex]) { CatCTup *ct = ...; if (hashValue == ct->hash_value) { if (ct->refcount > 0 || (ct->c_list && ct->c_list->refcount > 0)) ct->dead = true; /* in use: mark, free later */ else CatCacheRemoveCTup(cache, ct); /* not in use: free now */ } }
/* 3. Mark any in-progress build entries dead */ for (CatCInProgress *e = catcache_in_progress_stack; e; e = e->next) if (e->cache == cache && (e->list || e->hash_value == hashValue)) e->dead = true;}Three points to observe: first, all CatCList entries in the affected
cache are invalidated regardless of key, because it is too expensive to
determine which partial-key lists are affected. Second, a tuple with
refcount > 0 is merely marked dead — it is not freed until
ReleaseCatCache decrements the refcount to zero. Third, the match is
on hash_value, not on the tuple TID; this accepts a small risk of
false-positive invalidations (hash collisions) in exchange for
correctness after VACUUM FULL (which changes TIDs).
Figure 3 — Invalidation state machine for a CatCTup entry
stateDiagram-v2
[*] --> Live : CatalogCacheCreateEntry
Live --> Live : SearchCatCacheInternal hit\nrefcount++
Live --> Live : ReleaseCatCache\nrefcount--
Live --> Dead_Referenced : CatCacheInvalidate\nrefcount > 0
Live --> Freed : CatCacheInvalidate\nrefcount == 0
Dead_Referenced --> Freed : ReleaseCatCache\nrefcount drops to 0
Freed --> [*] : CatCacheRemoveCTup pfree
Figure 3 — An entry transitions from Live to Dead_Referenced when invalidated while in use, and is freed only when the last caller releases it.
Callbacks: RegisterSyscacheCallback
Section titled “Callbacks: RegisterSyscacheCallback”Subsystems that maintain their own derived state — like the relcache and
the type cache — register callbacks via RegisterSyscacheCallback. When
CatCacheInvalidate marks entries dead, CallSyscacheCallbacks fires
each registered callback for the affected cache ID. The relcache uses
this to invalidate RelationData entries; the type cache uses it to
discard derived type information. This is the mechanism by which a
single sinval event propagates through all layers of the in-memory
catalog hierarchy.
Hash bucket growth: RehashCatCache
Section titled “Hash bucket growth: RehashCatCache”The bucket count starts at the value from cacheinfo[] and doubles
when the load factor exceeds a heuristic threshold. RehashCatCache
allocates a new bucket array, iterates all existing entries, re-places
each by HASH_INDEX(ct->hash_value, newnbuckets), and frees the old
array. This never requires blocking: it happens synchronously inside the
backend that triggered the rehash. Similarly, RehashCatCacheLists
doubles the list bucket array.
Source Walkthrough
Section titled “Source Walkthrough”catcache.c — generic engine
Section titled “catcache.c — generic engine”InitCatCache— allocatesCatCachestruct inCacheMemoryContext, setscc_reloid,cc_indexoid,cc_nkeys,cc_keyno,cc_nbuckets; does not open the relation.CatalogCacheInitializeCache— opens the catalog relation (once, on first use), copiesTupleDesc, resolves per-key hash/eq functions viaGetCCHashEqFuncs, fillscc_skey[].SearchCatCacheInternal— inline hot path: hash → bucket scan → LRU promote → refcount. Dispatched fromSearchCatCache1/2/3/4.SearchCatCacheMiss—pg_noinlinecold path:systable_beginscanon the index,CatalogCacheCreateEntryfor positive entry or negative entry on not-found.CatalogCacheCreateEntry— single-palloc forCatCTup+ tuple bytes; handles TOAST flattening withCatCInProgressstale-detection.SearchCatCacheList— partial-key multi-row fetch intoCatCList; retries if an invalidation arrives mid-build viain_progress_ent.CatCacheInvalidate— marks/frees entries by hash value; invalidates all lists; marks in-progress entries.ReleaseCatCache/ReleaseCatCacheList— decrement refcount; free if dead and refcount==0.RehashCatCache/RehashCatCacheLists— double bucket arrays, redistribute entries.PrepareToInvalidateCacheTuple— called byinval.con tuple insert/update/delete to compute the hash value to queue.
syscache.c — named-ID facade
Section titled “syscache.c — named-ID facade”InitCatalogCache— buildsSysCache[]fromcacheinfo[]; populates sortedSysCacheRelationOid[]andSysCacheSupportingRelOid[]arrays.SearchSysCache1/2/3/4— thin wrappers callingSearchCatCache1/2/3/4.SearchSysCacheLocked1— acquireLOCKTAG_TUPLEatInplaceUpdateTupleLockthen fetch; loop until TID is stable.SysCacheGetAttr/SysCacheGetAttrNotNull—heap_getattrwith the cache’scc_tupdesc.SysCacheInvalidate— index intoSysCache[], callCatCacheInvalidate; also firesCallSyscacheCallbacks.RelationHasSysCache/RelationSupportsSysCache— binary search ofSysCacheRelationOid[]/SysCacheSupportingRelOid[].
lsyscache.c — convenience facade
Section titled “lsyscache.c — convenience facade”Representative wrappers:
get_attname(relid, attnum)—ATTNUMcache, returnspstrdupofattname.get_opname(opno)—OPEROIDcache, returnspstrdupofoprname.get_func_name(funcid)—PROCOIDcache, returnspstrdupofproname.get_rel_name(relid)—RELOIDcache, returnspstrdupofrelname.get_namespace_name(nspid)—NAMESPACEOIDcache, returnspstrdupofnspname.
Position hints (commit 273fe94, 2026-06-05)
Section titled “Position hints (commit 273fe94, 2026-06-05)”| Symbol | File | Line |
|---|---|---|
struct catcache | src/include/utils/catcache.h | 44 |
struct catctup | src/include/utils/catcache.h | 88 |
struct catclist | src/include/utils/catcache.h | 159 |
struct catcacheheader | src/include/utils/catcache.h | 184 |
InitCatCache | src/backend/utils/cache/catcache.c | 889 |
CatalogCacheInitializeCache | src/backend/utils/cache/catcache.c | 1126 |
SearchCatCacheInternal | src/backend/utils/cache/catcache.c | 1402 |
SearchCatCacheMiss | src/backend/utils/cache/catcache.c | 1510 |
CatalogCacheCreateEntry | src/backend/utils/cache/catcache.c | 2144 |
SearchCatCacheList | src/backend/utils/cache/catcache.c | 1730 |
CatCacheInvalidate | src/backend/utils/cache/catcache.c | 636 |
ReleaseCatCache | src/backend/utils/cache/catcache.c | 1658 |
ReleaseCatCacheList | src/backend/utils/cache/catcache.c | 2104 |
PrepareToInvalidateCacheTuple | src/backend/utils/cache/catcache.c | 2387 |
InitCatalogCache | src/backend/utils/cache/syscache.c | 110 |
SearchSysCache1 | src/backend/utils/cache/syscache.c | 221 |
SearchSysCacheLocked1 | src/backend/utils/cache/syscache.c | 287 |
SysCacheGetAttr | src/backend/utils/cache/syscache.c | 600 |
SysCacheInvalidate | src/backend/utils/cache/syscache.c | 698 |
RelationHasSysCache | src/backend/utils/cache/syscache.c | 745 |
get_func_name | src/backend/utils/cache/lsyscache.c | 1786 |
get_attname | src/backend/utils/cache/lsyscache.c | 957 |
get_rel_name | src/backend/utils/cache/lsyscache.c | 2106 |
get_namespace_name | src/backend/utils/cache/lsyscache.c | 3544 |
Source verification (as of 2026-06-05)
Section titled “Source verification (as of 2026-06-05)”Verified facts
Section titled “Verified facts”-
CatCTup and its tuple bytes are a single allocation in
CacheMemoryContext. Verified inCatalogCacheCreateEntry(catcache.c:2216):palloc(sizeof(CatCTup) + MAXIMUM_ALIGNOF + dtp->t_len), withct->tuple.t_datapointing toMAXALIGN(((char *) ct) + sizeof(CatCTup)). This means the tuple bytes are directly after the struct header at a cache-line-aligned address. -
Negative entries store separately-palloc’d keys; positive entries point into the tuple. Verified: for positive entries (ntp != NULL),
ct->keys[i] = heap_getattr(...)points into the inlined tuple. For negative entries,CatCacheCopyKeysallocates separate datums.CatCacheRemoveCTupcallsCatCacheFreeKeysonly for negative entries. -
CatCListinvalidation is total — all lists in the cache are invalidated on any invalidation event. Verified inCatCacheInvalidate(catcache.c:636): the function iterates allcc_nlbucketslist buckets unconditionally. The comment says “it’s too hard to tell which searches might still be correct, so just zap ‘em all.” -
cc_lbucket(list hash buckets) is allocated lazily on firstSearchCatCacheListcall. Verified:InitCatCachesetscp->cc_lbucket = NULL(catcache.c:949).SearchCatCacheListchecksif (cache->cc_lbucket == NULL)and allocates an initial 16-bucket array on the first call. -
The
CatCInProgressstack is process-local. Verified: declared asstatic CatCInProgress *catcache_in_progress_stack = NULL(catcache.c:61). There is no shared-memory component; the race it protects against is intra-process (invalidation arriving duringtoast_flatten_tuple). -
SearchSysCacheLocked1loops until TID stability. Verified in syscache.c:287–376: the function performsSearchSysCache1 → LockTuple → re-fetch → compare TIDin a loop. This is the mechanism described inREADME.tuplock§“Locking to write inplace-updated tables.” Added as part of PG16/17 inplace-update hardening. -
MAKE_SYSCACHEdeclarations live in individualpg_*.hcatalog headers. Verified by grep:MAKE_SYSCACHE(PROCOID, ...)inpg_proc.h:143,MAKE_SYSCACHE(TYPEOID, ...)inpg_type.h:268,MAKE_SYSCACHE(RELOID, ...)inpg_class.h:162. Thesyscache_info.hfile is generated/included to populatecacheinfo[].
Open questions
Section titled “Open questions”-
Total count of
SysCacheSizeat commit 273fe94. The exact count of cache IDs is in the generatedcatalog/syscache_ids.h; the file was not found at the expected path undersrc/include/catalog/in the working tree. Investigation path:grep -r SysCacheSize src/include/to locate the generated header or enum definition. -
RegisterSyscacheCallbackcapacity limit.inval.cstores registered callbacks in a fixed array. The capacity limit and what happens on overflow (assert vs. error vs. expand) is not verified. Investigation path: readCallSyscacheCallbacksininval.cand the array declaration. -
debug_discard_cachesGUC behavior.ResetCatalogCachesExt(true)is called whendebug_discard_cachesis set. The exact GUC name and whether it is accessible in PG18 release builds (it requiresUSE_ASSERT_CHECKING?) is not verified.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”-
Oracle row cache (shared). Oracle’s data dictionary cache lives in the shared pool (SGA), shared across all sessions. This avoids per- session redundancy but requires a latch on every cache access. The tradeoff: Oracle’s heavy concurrent workloads make the shared approach worth the latch cost; PostgreSQL’s per-process model avoids any lock for reads at the cost of duplicating entries across sessions. A side- by-side measurement under high-concurrency DDL-read workloads would quantify the tradeoff.
-
MySQL InnoDB table definition cache (
table_definition_cache). A shared LRU cache ofTABLE_SHAREobjects, also shared across threads. The sharing granularity is the whole table definition rather than individual catalog tuples. -
CockroachDB lease-based descriptor cache. CockroachDB uses range-leased schema descriptors: a node holds a time-bounded lease on a schema version and serves all requests from its local copy until the lease expires. This is closer to the eager-invalidation end of the spectrum, with a lease protocol taking the place of the sinval queue. Contrast with PostgreSQL’s demand-driven lazy invalidation.
-
Research: fine-grained list invalidation. PostgreSQL’s
CatCacheInvalidateinvalidates allCatCListentries on any catalog change, regardless of which partial key was affected. This is conservative and sometimes excessive. A finer-grained approach — hash the partial key and invalidate only matching lists — would reduce unnecessary list rebuilds under DDL workloads on large catalogs. No known published study quantifies this for PostgreSQL specifically. -
Research: shared catalog cache. PostgreSQL has long had proposals for a shared catalog cache to reduce per-backend memory usage and catalog-scan startup overhead (especially relevant for connection poolers and short-lived sessions). The implementation difficulty is managing refcounts and invalidation across a shared structure without introducing a new global latch. Some work in this area appeared in pgsql-hackers discussions (~2019–2022); the topic remains open.
Sources
Section titled “Sources”Source files (commit 273fe94, REL_18_STABLE)
Section titled “Source files (commit 273fe94, REL_18_STABLE)”src/backend/utils/cache/catcache.c— generic catalog cache enginesrc/backend/utils/cache/syscache.c— named-ID facade andSysCache[]arraysrc/backend/utils/cache/lsyscache.c— typed convenience wrapperssrc/include/utils/catcache.h—CatCache,CatCTup,CatCListstruct definitionssrc/include/utils/syscache.h—SearchSysCache*prototypes and macro wrapperssrc/include/utils/lsyscache.h—get_attname,get_func_name, … prototypessrc/include/catalog/pg_class.h,pg_proc.h,pg_type.h,pg_attribute.h,pg_namespace.h—MAKE_SYSCACHEdeclarations
Textbook references
Section titled “Textbook references”- Database Internals (Petrov, 2019) — ch. 6 §“Metadata Management”: catalog cache as critical-path optimization.
- Database System Concepts (Silberschatz, 7e) — ch. 25 §“Buffer Management”: data dictionary cache alongside buffer pool.
Related docs in this KB
Section titled “Related docs in this KB”knowledge/code-analysis/postgres/postgres-cache-invalidation.md— the sinval message queue andinval.cin full detail.knowledge/code-analysis/postgres/postgres-relcache.md— relcache as the higher-level consumer of catcache;RegisterSyscacheCallbackusage.knowledge/code-analysis/postgres/postgres-system-catalogs.md— catalog layout,MAKE_SYSCACHEmacro,.bkibootstrap.knowledge/code-analysis/postgres/postgres-memory-contexts.md—CacheMemoryContextlifetime and allocation patterns.