PostgreSQL MVCC Snapshots — Visibility, the procarray Census, and Isolation Levels

Multiversion concurrency control (MVCC) keeps several timestamped versions of each row so that a reader never has to block a writer and a writer never has to block a reader of the same row. Database Internals (Petrov, ch. 5, §“Multiversion Concurrency Control”, line ≈4004) places MVCC alongside optimistic (OCC) and pessimistic schemes, and singles out its defining property: “reads can continue accessing older values until the new ones are committed.” Coordination is pushed down to visibility — deciding which version a given reader is allowed to see — rather than to mutual exclusion.

The isolation level built on top of MVCC is snapshot isolation (SI). A transaction conceptually photographs the committed state of the database at a chosen instant, then runs every read against that photograph. Database Internals (§“Isolation Levels”, line ≈4138; §“Snapshot Isolation”, line ≈4179) describes SI as the level where “a transaction can see a consistent snapshot of database state”: it sees its own writes plus everything committed before the snapshot instant, and nothing committed after. SI rules out dirty reads, non-repeatable reads, and (for the snapshot) phantoms.

The deeper reason a database needs a precise definition here is that the SQL standard’s isolation levels are themselves imprecise. Berenson et al. 1995, A Critique of ANSI SQL Isolation Levels (Microsoft TR-95-51), shows that the ANSI phenomena (P1 dirty read, P2 non-repeatable read, P3 phantom) are ambiguous, that SI does not fit the ANSI lattice cleanly, and that SI admits an anomaly the ANSI text never names: write skew — two transactions each read an overlapping set, each update a disjoint part, and each preserve a local invariant that their combined effect violates. PostgreSQL adopts the Berenson vocabulary directly; the README-SSI in the source tree cites this paper as its phenomenon dictionary. Write skew is precisely what a plain SI engine cannot prevent, and it is the reason a separate SERIALIZABLE mechanism (SSI, a different document) exists.

Two implementation choices follow from the SI model and shape every MVCC engine. PostgreSQL’s specific answers occupy the rest of this document:

  1. How to represent a snapshot and decide visibility. The snapshot must capture which transactions were still in progress at the snapshot instant, because a version written by such a transaction must stay invisible even if that transaction commits a microsecond later. The set of in-progress transaction identifiers is therefore the central data structure, and the per-version stamp must record who inserted and who deleted each version.
  2. When to take the snapshot. Acquiring a fresh snapshot per statement yields READ COMMITTED; acquiring one snapshot at transaction start and reusing it yields REPEATABLE READ / SI. The MVCC machinery is identical; only the timing differs. This is the isolation knob.

A third concern is structural rather than a design choice: because old versions accumulate, the engine needs a reclamation horizon — the oldest snapshot instant any live transaction still depends on. Versions wholly below that horizon are unreachable and may be garbage-collected. The contrast family here is OCC (Kung & Robinson 1981, dbms-papers/occ.md), which validates at commit instead of stamping versions; MVCC trades that validation cost for the storage and reclamation cost of keeping multiple versions alive.

The textbook gives the model; this section names the engineering conventions that almost every SI/MVCC engine — PostgreSQL, Oracle, InnoDB, SQL Server, CUBRID — adopts in some form. PostgreSQL’s specific choices in the next section are best read as one set of dials within this shared space.

Per-version metadata: inserter and deleter stamps

Every row version carries enough information to answer “is this version visible to me?” without a central lookup on the common path. The minimum stamp is (inserted_by, deleted_by). An engine either stores old versions in place next to the live row (PostgreSQL: the heap holds every version, xmin/xmax inline) or out of place in an undo area with a back-pointer on the live row (Oracle undo segments, InnoDB rollback segments, CUBRID’s log-resident prev_version_lsa). The choice cascades into garbage-collection cost: in-place storage makes reads cheap but needs a scan-everything vacuum; out-of-place storage keeps the live heap compact but pays an indirection to read an old version.
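
The in-place variant of the (inserted_by, deleted_by) stamp can be sketched in a few lines. This is a toy model, not PostgreSQL's tuple layout: `ToyVersion`, `ToyHeap`, `toy_insert`, and `toy_update` are hypothetical names, and XIDs are plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;          /* toy transaction id */
#define NO_XID 0u              /* "no transaction" marker (assumption of this sketch) */

/* One stored version of a row; hypothetical layout, not HeapTupleHeaderData. */
typedef struct {
    Xid inserted_by;           /* transaction that created this version */
    Xid deleted_by;            /* transaction that deleted or superseded it, or NO_XID */
    int value;
} ToyVersion;

#define MAX_VERSIONS 8
typedef struct {
    ToyVersion v[MAX_VERSIONS];
    size_t     n;
} ToyHeap;

/* INSERT: append a version stamped with the inserting transaction. */
static size_t toy_insert(ToyHeap *h, Xid xid, int value)
{
    assert(h->n < MAX_VERSIONS);
    h->v[h->n] = (ToyVersion){ .inserted_by = xid, .deleted_by = NO_XID, .value = value };
    return h->n++;
}

/* UPDATE: stamp the old version's deleter, then append the new version.
 * Both versions stay in the heap, so older snapshots can still read the old one. */
static size_t toy_update(ToyHeap *h, size_t old, Xid xid, int value)
{
    assert(old < h->n && h->v[old].deleted_by == NO_XID);
    h->v[old].deleted_by = xid;
    return toy_insert(h, xid, value);
}
```

After `toy_insert(&h, 100, 1)` followed by `toy_update(&h, 0, 105, 2)`, the heap holds two versions: the first stamped (100, 105) and the second stamped (105, NO_XID).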

A snapshot is a census of in-progress transactions

At snapshot acquisition the engine records which transactions are still in flight. The two common encodings are:

  • An explicit list of in-progress IDs bounded by a low and high watermark (the snapshot’s xmin and xmax, not to be confused with a tuple’s inserter and deleter stamps). A stamp below the low watermark is certainly committed-or-aborted (decided long ago); a stamp at or above the high watermark is certainly invisible (not yet completed at snapshot time); only stamps between the watermarks need a membership test against the list. PostgreSQL takes this route.
  • A bit array over a sliding window of recent IDs with an overflow list for long-running outliers (CUBRID’s mvcc_active_tran). The probe is O(1) but the window is a fixed-size structure.

Both encodings exist to make the common-case visibility test a couple of integer comparisons, falling back to a membership scan only for the narrow band of genuinely concurrent transactions.
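
A minimal sketch of the watermark-plus-list encoding, assuming XIDs never wrap so that plain integer comparison is enough. `ToySnapshot` and `toy_xid_in_snapshot` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

typedef struct {
    Xid        xmin;   /* low watermark: every XID below it is already decided */
    Xid        xmax;   /* high watermark: every XID at or above it counts as running */
    const Xid *xip;    /* XIDs in [xmin, xmax) that were in progress */
    size_t     xcnt;
} ToySnapshot;

/* Returns true when xid must be treated as "still in progress" by this snapshot. */
static bool toy_xid_in_snapshot(Xid xid, const ToySnapshot *s)
{
    if (xid < s->xmin)
        return false;               /* decided long ago: one comparison */
    if (xid >= s->xmax)
        return true;                /* too new for this snapshot: two comparisons */
    for (size_t i = 0; i < s->xcnt; i++)
        if (s->xip[i] == xid)
            return true;            /* concurrent at snapshot time */
    return false;                   /* inside the band, but already finished */
}
```

With xmin = 102, xmax = 108 and xip = {102, 105}, XID 100 is decided by the first comparison, XID 200 by the second, and only XIDs such as 103 or 105 reach the list scan.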

Caching the visibility verdict on the tuple

A pure MVCC check consults the commit log (PostgreSQL’s clog/pg_xact) to learn whether a stamp’s transaction committed or aborted. That lookup is expensive to repeat for every scan of the same row. The near-universal optimization is a hint bit: once visibility is decided, the engine records “this xmin committed” / “this xmin aborted” in spare bits of the tuple header, so the next reader skips the commit-log probe. The subtlety is durability: “committed” may be recorded only once the commit record is known to be safely on disk, or a crash could leave the bit on disk without a durable commit behind it.
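
The caching effect can be illustrated with a toy commit log that counts its probes. Everything here is hypothetical (`toy_clog`, `ToyTuple`, the `HINT_*` bits), and the durability condition is deliberately left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;
typedef enum { TOY_IN_PROGRESS, TOY_COMMITTED, TOY_ABORTED } ToyStatus;

#define HINT_COMMITTED 0x1u
#define HINT_ABORTED   0x2u

typedef struct {
    Xid      inserted_by;
    unsigned hints;             /* cached verdict about inserted_by */
} ToyTuple;

#define TOY_CLOG_SIZE 16
static ToyStatus toy_clog[TOY_CLOG_SIZE];   /* toy commit log, indexed by XID */
static int       toy_clog_probes;           /* how often the log was consulted */

static ToyStatus toy_clog_lookup(Xid xid)
{
    assert(xid < TOY_CLOG_SIZE);
    toy_clog_probes++;
    return toy_clog[xid];
}

/* Did the inserter commit?  Consult the hint first, the commit log second. */
static bool toy_inserter_committed(ToyTuple *t)
{
    if (t->hints & HINT_COMMITTED)
        return true;
    if (t->hints & HINT_ABORTED)
        return false;
    ToyStatus st = toy_clog_lookup(t->inserted_by);
    if (st == TOY_COMMITTED)
        t->hints |= HINT_COMMITTED;     /* cache the verdict on the tuple */
    else if (st == TOY_ABORTED)
        t->hints |= HINT_ABORTED;
    return st == TOY_COMMITTED;         /* in progress: nothing is cached */
}
```

Calling `toy_inserter_committed` twice on the same tuple probes the toy log once; the second call is answered from the hint alone.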

Versions older than the lowest live snapshot’s lower bound are unreachable. The single global “oldest in-use xmin” is the horizon; reclamation (PostgreSQL VACUUM, Oracle UNDO trim, CUBRID vacuum_master) is gated by it. Every SI engine carries the same structural cost: one long-running transaction pins the horizon and stalls reclamation regardless of how many shorter transactions completed.
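
A small sketch of how one old snapshot pins the horizon. It assumes each session publishes at most one snapshot lower bound, that XIDs do not wrap, and that `deleted_by` always names a committed deleter; `toy_horizon` and `toy_reclaimable` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;
#define NO_XID 0u

/* xmins[i] is the snapshot lower bound held by session i, or NO_XID if it holds none.
 * next_xid is the XID the next transaction would receive. */
static Xid toy_horizon(const Xid *xmins, size_t n, Xid next_xid)
{
    Xid horizon = next_xid;             /* nobody holds a snapshot: nothing is pinned */
    for (size_t i = 0; i < n; i++)
        if (xmins[i] != NO_XID && xmins[i] < horizon)
            horizon = xmins[i];
    return horizon;
}

/* A dead version is reclaimable once its (committed) deleter is below the horizon. */
static bool toy_reclaimable(Xid deleted_by, Xid horizon)
{
    return deleted_by != NO_XID && deleted_by < horizon;
}
```

With published lower bounds {none, 950, 120, 990} the horizon is 120, so a version deleted by XID 500 must be kept; once the session holding 120 finishes, the horizon jumps to 950 and the same version becomes reclaimable.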

Snapshot acquisition timing is the isolation knob

Most SI engines reuse the same machinery across isolation levels and vary only when the snapshot is taken:

  • Read Committed — a fresh snapshot per statement, so each statement sees the latest committed state.
  • Repeatable Read / SI — one snapshot at first use, reused for the whole transaction.
  • Serializable — SI alone admits write skew. Two production answers: predicate locking with dependency-cycle detection (PostgreSQL SSI), or a fallback to lock-based serialization (CUBRID). Either way, this is a separate mechanism layered on top of the snapshot machinery, not a different snapshot.
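
The timing knob can be reduced to a few lines. In this toy model the whole database is one committed integer and a "snapshot" is a copy of it; `ToyDb`, `ToyTxn`, and `toy_read` are hypothetical names, and no real engine stores snapshots this way.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_READ_COMMITTED, TOY_REPEATABLE_READ } ToyIsolation;

typedef struct { int committed_value; } ToyDb;      /* latest committed state */

typedef struct {
    ToyIsolation iso;
    bool         have_snapshot;
    int          snapshot_value;                    /* state captured by the snapshot */
} ToyTxn;

/* Each call models one statement.  The read logic is identical for both
 * levels; only the moment at which the snapshot is (re)taken differs. */
static int toy_read(ToyTxn *t, const ToyDb *db)
{
    if (t->iso == TOY_READ_COMMITTED || !t->have_snapshot) {
        t->snapshot_value = db->committed_value;    /* take a fresh snapshot */
        t->have_snapshot = true;
    }
    return t->snapshot_value;
}
```

If another session commits a new value between two statements, the READ COMMITTED transaction's second `toy_read` returns the new value, while the REPEATABLE READ transaction keeps returning the value from its first read.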

| Theory | PostgreSQL name |
| --- | --- |
| Per-version timestamp | TransactionId (XID): 32-bit, wraps; FullTransactionId for horizons |
| Inserter / deleter stamps in row | tuple header t_xmin / t_xmax (in HeapTupleHeaderData) |
| In-progress set captured at snapshot | SnapshotData.xip[] (top-level) + subxip[] (subxacts) |
| Snapshot low / high watermark | SnapshotData.xmin / SnapshotData.xmax |
| The census source | the procarray (ProcGlobal->xids[], one slot per backend) |
| Census routine | GetSnapshotData (in procarray.c) |
| Membership test | XidInMVCCSnapshot (in snapmgr.c) |
| Visibility predicate | HeapTupleSatisfiesMVCC (in heapam_visibility.c) |
| Cached verdict | infomask hint bits HEAP_XMIN_COMMITTED / _INVALID / _XMAX_* |
| Reclamation horizon | RecentXmin, MyProc->xmin, and GlobalVisState bounds |
| Snapshot timing policy | GetTransactionSnapshot + IsolationUsesXactSnapshot() |

The procarray internals — how PGPROC slots are laid out, how XIDs are published and cleared, the group-clear optimization — are owned by postgres-procarray.md. This document treats the procarray as the table GetSnapshotData reads, and stops at the read.

PostgreSQL is the textbook in-place MVCC engine: an UPDATE does not overwrite the row, it inserts a new tuple version and marks the old one’s t_xmax with the updating XID; the old version stays on the heap page for snapshot readers until vacuum reclaims it. Three moving parts implement visibility:

  1. A SnapshotData struct (snapshot.h) — the photograph: xmin, xmax, and the xip[] array of XIDs that were in progress at snapshot time.
  2. GetSnapshotData (procarray.c) — the camera: one pass over the procarray under a shared ProcArrayLock, filling in the struct.
  3. HeapTupleSatisfiesMVCC (heapam_visibility.c) — the verdict: given a tuple and a snapshot, decide visible/invisible, consulting the snapshot, the commit log, and the tuple’s own hint bits, and caching the result back into the tuple’s infomask.

The snapshot manager (snapmgr.c) wraps all of this with lifetime and timing policy: which snapshot a statement gets, how long it lives, and how the oldest xmin is published so vacuum knows what it may reclaim.

// SnapshotData — src/include/utils/snapshot.h
typedef struct SnapshotData
{
SnapshotType snapshot_type; /* SNAPSHOT_MVCC for regular snapshots */
TransactionId xmin; /* all XID < xmin are visible to me */
TransactionId xmax; /* all XID >= xmax are invisible to me */
TransactionId *xip; /* in-progress top-level XIDs, [xmin,xmax) */
uint32 xcnt; /* # of XIDs in xip[] */
TransactionId *subxip; /* in-progress subxact XIDs */
int32 subxcnt;
bool suboverflowed; /* subxip[] incomplete -> use pg_subtrans */
bool takenDuringRecovery;
bool copied;
CommandId curcid; /* in my xact, CID < curcid are visible */
/* ... speculativeToken, vistest, refcounts, ph_node ... */
uint64 snapXactCompletionCount; /* lets static snapshots be reused */
} SnapshotData;

The shape is the whole idea. xmin/xmax are the two watermarks; xip[] is the explicit in-progress set living strictly inside [xmin, xmax). The invariant from the header comment: “An MVCC snapshot can never see the effects of XIDs >= xmax. It can see the effects of all older XIDs except those listed in the snapshot.” That single sentence is the contract XidInMVCCSnapshot enforces.

SnapshotType distinguishes the regular SNAPSHOT_MVCC from special snapshots (SNAPSHOT_SELF, SNAPSHOT_ANY, SNAPSHOT_DIRTY, SNAPSHOT_TOAST, SNAPSHOT_HISTORIC_MVCC for logical decoding, SNAPSHOT_NON_VACUUMABLE for pruning). HeapTupleSatisfiesVisibility dispatches on this field; this document follows the SNAPSHOT_MVCC path. curcid handles intra-transaction visibility: a tuple my own transaction wrote in an earlier command is visible, one written by the current command (CID >= curcid) is not.

How a snapshot is built — GetSnapshotData

GetSnapshotData is the census. It walks the procarray — the shared array where every backend publishes its current XID — and records who is in flight.

flowchart TD
  A["GetSnapshotData(snapshot)"] --> RU{"GetSnapshotDataReuse:\nxactCompletionCount\nunchanged?"}
  RU -- "yes" --> REUSE["reuse cached xip[]\nrelease lock, return"]
  RU -- "no" --> X["xmax = latestCompletedXid + 1\nxmin = xmax (seed)"]
  X --> LOOP["for each procarray slot:\nfetch other_xids[off]"]
  LOOP --> C1{"xid invalid\nor my own?"}
  C1 -- "skip" --> LOOP
  C1 -- "keep" --> C2{"xid >= xmax?"}
  C2 -- "skip (already running)" --> LOOP
  C2 -- "no" --> C3{"PROC_IN_VACUUM or\nPROC_IN_LOGICAL_DECODING?"}
  C3 -- "skip" --> LOOP
  C3 -- "no" --> ADD["xip[count++] = xid\nxmin = min(xmin, xid)\ncopy its subxids"]
  ADD --> LOOP
  LOOP --> FIN["set MyProc->xmin = xmin (if unset)\nrelease ProcArrayLock"]
  FIN --> GV["advance GlobalVis* bounds\nRecentXmin = xmin"]
  GV --> DONE["fill snapshot: xmin/xmax/xip/xcnt/..."]

Figure 1 — GetSnapshotData control flow. The reuse fast-path short-circuits when no transaction has completed since the last snapshot; otherwise a single pass over the procarray collects every in-progress XID below xmax, skipping the backend’s own XID, vacuum, and logical-decoding backends.

The watermarks come first. xmax is set to one past the latest completed XID; xmin is seeded to xmax and lowered as the scan finds older in-progress XIDs:

// GetSnapshotData — src/backend/storage/ipc/procarray.c (condensed)
LWLockAcquire(ProcArrayLock, LW_SHARED);
if (GetSnapshotDataReuse(snapshot)) /* nothing completed since last time */
{
LWLockRelease(ProcArrayLock);
return snapshot;
}
latest_completed = TransamVariables->latestCompletedXid;
/* xmax is always latestCompletedXid + 1 */
xmax = XidFromFullTransactionId(latest_completed);
TransactionIdAdvance(xmax);
xmin = xmax; /* seed; lowered in the loop */
/* take own xid into account, saves a check inside the loop */
if (TransactionIdIsNormal(myxid) && NormalTransactionIdPrecedes(myxid, xmin))
xmin = myxid;

The loop is one pass over every backend’s published XID. Four filters decide whether a slot contributes to the snapshot:

// GetSnapshotData — the collection loop (condensed)
for (int pgxactoff = 0; pgxactoff < numProcs; pgxactoff++)
{
TransactionId xid = UINT32_ACCESS_ONCE(other_xids[pgxactoff]);
if (likely(xid == InvalidTransactionId)) /* no XID assigned: skip */
continue;
if (pgxactoff == mypgxactoff) /* never include my own XID */
continue;
if (!NormalTransactionIdPrecedes(xid, xmax)) /* >= xmax: treated running anyway */
continue;
statusFlags = allStatusFlags[pgxactoff];
if (statusFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
continue; /* manages its own xmin */
if (NormalTransactionIdPrecedes(xid, xmin))
xmin = xid;
xip[count++] = xid; /* this XID was in progress */
/* ... copy the backend's cached subxids into subxip[], or set
* suboverflowed if its subxid cache spilled ... */
}

Three facts a reader must carry forward:

  • The backend never records its own XID in the snapshot. Visibility of the transaction’s own writes is handled separately by TransactionIdIsCurrentTransactionId and curcid inside the visibility predicate, not by the xip[] membership test.
  • PROC_IN_VACUUM backends are excluded. A lazy vacuum holds no snapshot that pins the horizon, so excluding its (large, long-lived) XID keeps the horizon from being dragged down by maintenance.
  • Subxids may overflow. Each PGPROC caches only a fixed number of subtransaction XIDs (PGPROC_MAX_CACHED_SUBXIDS). If a backend ran more subtransactions than fit, its cache overflowed; the snapshot sets suboverflowed = true, and XidInMVCCSnapshot must then consult pg_subtrans to map a subxact to its parent.
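
The census loop can be imitated on a plain array of published XIDs. This is a sketch under simplifying assumptions: no locking, no subtransactions, no status flags, and no XID wraparound. `ToySnapshot`, `toy_take_snapshot`, and the slot array are hypothetical stand-ins for the procarray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID 0u          /* slot has no XID assigned */
#define MAX_SLOTS   8

typedef struct {
    Xid    xmin, xmax;
    Xid    xip[MAX_SLOTS];
    size_t xcnt;
} ToySnapshot;

/* slots[i] is the XID published by backend i (INVALID_XID if none). */
static void toy_take_snapshot(const Xid *slots, size_t nslots, size_t my_slot,
                              Xid latest_completed, ToySnapshot *snap)
{
    assert(nslots <= MAX_SLOTS && my_slot < nslots);

    snap->xmax = latest_completed + 1;      /* one past the latest completed XID */
    snap->xmin = snap->xmax;                /* seed; lowered below */
    snap->xcnt = 0;

    /* Own XID may lower xmin but is never listed in xip[]. */
    if (slots[my_slot] != INVALID_XID && slots[my_slot] < snap->xmin)
        snap->xmin = slots[my_slot];

    for (size_t i = 0; i < nslots; i++) {
        Xid xid = slots[i];
        if (xid == INVALID_XID) continue;   /* no XID assigned */
        if (i == my_slot)       continue;   /* never include my own XID */
        if (xid >= snap->xmax)  continue;   /* treated as running anyway */
        if (xid < snap->xmin)
            snap->xmin = xid;
        snap->xip[snap->xcnt++] = xid;      /* in progress at snapshot time */
    }
}
```

For slots {none, 104, 101, 110, 107} with the caller in the last slot and latest completed XID 108, the result is xmax = 109, xmin = 101, and xip = {104, 101}: XID 110 is covered by xmax and the caller's own 107 is not listed.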

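Once the loop finishes, GetSnapshotData publishes the bookkeeping that the rest of the engine reads. MyProc->xmin is set while ProcArrayLock is still held; the GlobalVis* bounds and RecentXmin are updated after the lock is released: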

// GetSnapshotData — after the loop (condensed)
if (!TransactionIdIsValid(MyProc->xmin))
MyProc->xmin = TransactionXmin = xmin; /* pin the horizon for this backend */
LWLockRelease(ProcArrayLock);
/* ... advance GlobalVisSharedRels/CatalogRels/DataRels/TempRels bounds ... */
RecentXmin = xmin;
snapshot->xmin = xmin;
snapshot->xmax = xmax;
snapshot->xcnt = count;
snapshot->subxcnt = subcount;
snapshot->suboverflowed = suboverflowed;
snapshot->snapXactCompletionCount = curXactCompletionCount;
snapshot->curcid = GetCurrentCommandId(false);

MyProc->xmin is the value the vacuum horizon is computed from: as long as this backend holds a snapshot, no row version deleted by a transaction at or after its xmin may be reclaimed. RecentXmin is a backend-local cache of the most recent snapshot’s lower bound. The GlobalVis* bounds are the cheap approximate horizons used by pruning and vacuum (detailed below).

A subtle but load-bearing optimization: GetSnapshotData records snapXactCompletionCount into the snapshot, copied from TransamVariables->xactCompletionCount, a shared counter bumped on every commit/abort. On the next call, GetSnapshotDataReuse compares the counter; if no transaction has completed since, the previously computed xip[] is provably identical and the expensive procarray scan is skipped entirely.

// GetSnapshotDataReuse — src/backend/storage/ipc/procarray.c (condensed)
static bool
GetSnapshotDataReuse(Snapshot snapshot)
{
Assert(LWLockHeldByMe(ProcArrayLock));
if (unlikely(snapshot->snapXactCompletionCount == 0))
return false;
curXactCompletionCount = TransamVariables->xactCompletionCount;
if (curXactCompletionCount != snapshot->snapXactCompletionCount)
return false;
/* unchanged: the recomputed snapshot would be identical -> reuse */
/* ... still must refresh MyProc->xmin and TransactionXmin ... */
return true;
}

This is the payoff of the snapshot scalability work that landed in PG 14: a read-heavy workload that takes many snapshots between commits pays the full procarray scan only once per actual transaction completion.
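
The reuse test boils down to comparing two counters. A toy version, with a hypothetical global standing in for the shared completion counter and a scan counter standing in for the procarray walk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared counter, bumped whenever a transaction completes.  It starts at 1
 * so that 0 can mean "this snapshot was never built" (assumption of the sketch). */
static uint64_t toy_completion_count = 1;

typedef struct {
    uint64_t completion_count_seen;     /* counter value when last built; 0 = never */
    int      scans;                     /* how many full scans this snapshot has cost */
} ToyCachedSnapshot;

static void toy_xact_completed(void)
{
    toy_completion_count++;
}

/* Returns true if a full scan was needed, false if the cached contents were reused. */
static bool toy_get_snapshot(ToyCachedSnapshot *s)
{
    if (s->completion_count_seen != 0 &&
        s->completion_count_seen == toy_completion_count)
        return false;                   /* nothing completed: contents unchanged */

    s->scans++;                         /* stand-in for the procarray scan */
    s->completion_count_seen = toy_completion_count;
    return true;
}
```

Three consecutive calls with no completion in between cost one scan; a single `toy_xact_completed()` forces exactly one more.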

Deciding visibility — HeapTupleSatisfiesMVCC

Given a snapshot, the per-tuple verdict is HeapTupleSatisfiesMVCC. The whole function is a decision tree on the tuple’s t_xmin (inserter) and t_xmax (deleter), short-circuited at every step by hint bits and the snapshot watermarks. The skeleton:

flowchart TD
  S["HeapTupleSatisfiesMVCC(tuple, snapshot)"] --> XC{"XMIN_COMMITTED\nhint set?"}
  XC -- "no" --> INV{"XMIN_INVALID\nhint set?"}
  INV -- "yes" --> NV["return false\n(inserter aborted)"]
  INV -- "no" --> ME{"inserter is\ncurrent xact?"}
  ME -- "yes" --> MEC{"cmin >= curcid?"}
  MEC -- "yes" --> NV2["false\n(my later command)"]
  MEC -- "no" --> XMAXC["check xmax path"]
  ME -- "no" --> INSNAP{"XidInMVCCSnapshot\n(xmin)?"}
  INSNAP -- "yes" --> NV3["false\n(inserter still in progress)"]
  INSNAP -- "no" --> DIDC{"TransactionIdDidCommit\n(xmin)?"}
  DIDC -- "yes" --> SH1["SetHintBits(XMIN_COMMITTED)"]
  DIDC -- "no" --> SH2["SetHintBits(XMIN_INVALID)\nreturn false"]
  SH1 --> XMAXC
  XC -- "yes" --> FRZ{"frozen or not in\nsnapshot?"}
  FRZ -- "yes" --> XMAXC
  FRZ -- "no" --> NV3
  XMAXC --> XMAXI{"XMAX_INVALID\nor locked-only?"}
  XMAXI -- "yes" --> VIS["return true\n(visible)"]
  XMAXI -- "no" --> XMAXSNAP{"deleter committed\nand in snapshot?"}
  XMAXSNAP -- "still in progress" --> VIS
  XMAXSNAP -- "committed before snap" --> NV4["return false\n(deleted)"]

Figure 2 — HeapTupleSatisfiesMVCC decision tree (skeleton). The inserter (t_xmin) must be committed and visible to the snapshot for the tuple to be a candidate; then the deleter (t_xmax) must be either absent, aborted, or still invisible to the snapshot for the tuple to actually be visible. Hint bits short-circuit the committed/aborted branches; SetHintBits caches each newly learned verdict.

The inserter check, condensed to its core arms:

// HeapTupleSatisfiesMVCC — src/backend/access/heap/heapam_visibility.c (condensed)
if (!HeapTupleHeaderXminCommitted(tuple))
{
if (HeapTupleHeaderXminInvalid(tuple))
return false; /* inserter known aborted */
/* ... MOVED_OFF/MOVED_IN pre-9.0 upgrade arms elided ... */
else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple)))
{
if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
return false; /* inserted by a later command of my own xact */
/* ... then fall through to the xmax (delete) checks ... */
}
else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
return false; /* inserter still in progress */
else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
HeapTupleHeaderGetRawXmin(tuple)); /* cache: committed */
else
{
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID, InvalidTransactionId);
return false; /* aborted/crashed; cache it */
}
}
else
{
/* xmin already hinted committed, but maybe not per our snapshot */
if (!HeapTupleHeaderXminFrozen(tuple) &&
XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
return false; /* treat as still in progress */
}
/* by here, the inserting transaction is committed and visible to us */

The order of tests is a performance ladder, cheapest first: a hint bit (HeapTupleHeaderXminCommitted) avoids everything; a curcid compare handles own-transaction tuples; XidInMVCCSnapshot is the snapshot membership test; TransactionIdDidCommit is the expensive commit-log probe, reached only when no hint bit exists yet — and its result is immediately cached with SetHintBits. A frozen xmin (HeapTupleHeaderXminFrozen, both committed and invalid bits set) is unconditionally visible and skips even the snapshot test, which is why freezing matters for very old tuples.

Once the inserter is established as committed-and-visible, the deleter (t_xmax) is examined with the symmetric logic: an invalid/aborted xmax or a lock-only xmax means the row is live (visible); a committed xmax that is not in the snapshot means the row was deleted before our snapshot (invisible); a committed xmax that is in the snapshot means the delete happened concurrently and we still see the row.

The snapshot membership test — XidInMVCCSnapshot

This is where the xmin/xmax watermarks earn their keep. The vast majority of XIDs are decided by two comparisons without touching xip[]:

// XidInMVCCSnapshot — src/backend/utils/time/snapmgr.c (condensed)
bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
/* Any xid < xmin is not in-progress (committed/aborted long ago) */
if (TransactionIdPrecedes(xid, snapshot->xmin))
return false;
/* Any xid >= xmax is in-progress (not started at snapshot time) */
if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
return true;
if (!snapshot->takenDuringRecovery)
{
if (!snapshot->suboverflowed)
{
if (pg_lfind32(xid, snapshot->subxip, snapshot->subxcnt))
return true; /* matched a known in-progress subxact */
/* fall through to search xip[] */
}
else
{
/* subxid cache overflowed: map to top-level via pg_subtrans */
xid = SubTransGetTopmostTransaction(xid);
if (TransactionIdPrecedes(xid, snapshot->xmin))
return false;
}
if (pg_lfind32(xid, snapshot->xip, snapshot->xcnt))
return true; /* matched a known in-progress top-level XID */
}
/* ... recovery branch: all XIDs live in subxip[] ... */
return false;
}

Only XIDs in the narrow [xmin, xmax) band reach the array scan (pg_lfind32, a SIMD-accelerated linear search). The suboverflowed branch is the cost of the fixed-size subxid cache: when a snapshot could not capture all subtransactions, a subxact XID must be resolved to its parent through pg_subtrans (an SLRU) before the membership test, because only top-level XIDs are guaranteed present in xip[].

SetHintBits writes the just-computed commit/abort verdict into the tuple’s t_infomask, so the next scan of this tuple skips the commit-log probe entirely. The durability rule is the whole subtlety:

// SetHintBits — src/backend/access/heap/heapam_visibility.c
static inline void
SetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid)
{
if (TransactionIdIsValid(xid))
{
/* NB: xid must be known committed here! */
XLogRecPtr commitLSN = TransactionIdGetCommitLSN(xid);
if (BufferIsPermanent(buffer) && XLogNeedsFlush(commitLSN) &&
BufferGetLSNAtomic(buffer) < commitLSN)
{
/* commit not yet flushed and no LSN interlock: don't set hint */
return;
}
}
tuple->t_infomask |= infomask;
MarkBufferDirtyHint(buffer, true);
}

The guard implements the WAL-before-hint rule. A “committed” hint bit may only be set if the transaction’s commit record is already flushed to disk (!XLogNeedsFlush), or if the page’s own LSN already sits past the commit LSN (so the page cannot be written before the commit is durable), or if the buffer is not permanent (temp/unlogged, gone after a crash anyway). Otherwise the bit is not set this time — it will be set on a later visit once the commit is durable. This is why setting a hint bit dirties the page only as a hint (MarkBufferDirtyHint, not MarkBufferDirty): the bit is reconstructible from the commit log, so its loss across a crash is harmless. Aborted hints carry no such rule — an abort is safe to record immediately — which the caller exploits by passing InvalidTransactionId for the HEAP_XMIN_INVALID case.
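
The guard is a pure predicate over three LSNs and one flag, which makes it easy to restate in isolation. In this sketch `Lsn`, `flushed_upto`, and `toy_may_set_committed_hint` are hypothetical names; the real code obtains these values from the buffer manager and WAL subsystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* toy WAL position */

/* May a "committed" hint bit be set right now?
 *   buffer_is_permanent : page survives a crash (not temp/unlogged)
 *   commit_lsn          : WAL position of the transaction's commit record
 *   flushed_upto        : WAL is durable up to and including this position
 *   page_lsn            : LSN currently stamped on the page */
static bool toy_may_set_committed_hint(bool buffer_is_permanent, Lsn commit_lsn,
                                       Lsn flushed_upto, Lsn page_lsn)
{
    bool commit_needs_flush = commit_lsn > flushed_upto;

    if (buffer_is_permanent && commit_needs_flush && page_lsn < commit_lsn)
        return false;   /* commit not durable and no LSN interlock: skip for now */
    return true;
}
```

A permanent page with commit LSN 500, flushed position 400, and page LSN 300 is refused; raising the page LSN to 550, flushing WAL to 600, or using a non-permanent buffer each makes the hint permissible.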

Snapshot lifetimes and the isolation knob — snapmgr.c

Who calls GetSnapshotData, and how long the result lives, is the snapshot manager’s job. The entry point for a query is GetTransactionSnapshot, and the isolation level decides everything:

// GetTransactionSnapshot — src/backend/utils/time/snapmgr.c (condensed)
Snapshot
GetTransactionSnapshot(void)
{
/* ... historic-snapshot (logical decoding) arm elided ... */
if (!FirstSnapshotSet) /* first snapshot of this transaction */
{
InvalidateCatalogSnapshot();
if (IsolationUsesXactSnapshot()) /* REPEATABLE READ or SERIALIZABLE */
{
if (IsolationIsSerializable())
CurrentSnapshot = GetSerializableTransactionSnapshot(&CurrentSnapshotData);
else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
CurrentSnapshot = CopySnapshot(CurrentSnapshot); /* must outlive statement */
FirstXactSnapshot = CurrentSnapshot;
FirstXactSnapshot->regd_count++;
pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
}
else
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
FirstSnapshotSet = true;
return CurrentSnapshot;
}
if (IsolationUsesXactSnapshot()) /* RR/SR: reuse the transaction snapshot */
return CurrentSnapshot;
/* READ COMMITTED: a fresh snapshot for this statement */
InvalidateCatalogSnapshot();
CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
return CurrentSnapshot;
}

The macros decode to the isolation levels:

// src/include/access/xact.h
#define XACT_READ_COMMITTED 1
#define XACT_REPEATABLE_READ 2
#define XACT_SERIALIZABLE 3
#define IsolationUsesXactSnapshot() (XactIsoLevel >= XACT_REPEATABLE_READ)
#define IsolationIsSerializable() (XactIsoLevel == XACT_SERIALIZABLE)

So:

  • READ COMMITTED (and READ UNCOMMITTED, which PostgreSQL treats as RC) takes a fresh snapshot at the start of each statement. Every new command sees everything committed up to that moment.
  • REPEATABLE READ takes one snapshot at first use, copies it so it survives statement boundaries, registers it, and returns that same snapshot for the rest of the transaction. PostgreSQL’s REPEATABLE READ is full snapshot isolation — stronger than the ANSI minimum, so it also prevents phantoms.
  • SERIALIZABLE takes the same transaction-lifetime snapshot but routes it through GetSerializableTransactionSnapshot, which registers the SSI bookkeeping (predicate locks, dependency tracking). The snapshot is the same SI snapshot; the serialization guarantee is layered on top — owned by postgres-ssi-predicate-locking.md.
flowchart LR
  subgraph RC["READ COMMITTED"]
    direction TB
    S1["stmt 1"] --> G1["GetSnapshotData\n(fresh)"]
    S2["stmt 2"] --> G2["GetSnapshotData\n(fresh)"]
    S3["stmt 3"] --> G3["GetSnapshotData\n(fresh)"]
  end
  subgraph RR["REPEATABLE READ / SERIALIZABLE"]
    direction TB
    T1["stmt 1"] --> GG["GetSnapshotData\n(once, copied + registered)"]
    T2["stmt 2"] --> GG
    T3["stmt 3"] --> GG
  end

Figure 3 — Snapshot lifetimes by isolation level. READ COMMITTED rebuilds the snapshot for every statement (so it sees concurrent commits between statements); REPEATABLE READ and SERIALIZABLE acquire one snapshot at first use and reuse it for the whole transaction.

A separate CatalogSnapshot (also MVCC) is used for system-catalog scans and is invalidated whenever a catalog change occurs, so a backend’s cached catalog view stays current even within a long REPEATABLE READ transaction. GetTransactionSnapshot also calls InvalidateCatalogSnapshot whenever it takes a new snapshot, which keeps the catalog snapshot from being older than the transaction snapshot. GetLatestSnapshot exists for the rare code (e.g. referential-integrity checks) that needs an up-to-the-instant snapshot even in RR/SR mode; it builds into SecondarySnapshotData.

Every registered snapshot pins MyProc->xmin. When snapshots are released, SnapshotResetXmin recomputes the backend’s floor as the minimum xmin of any still-registered snapshot, dropping it to InvalidTransactionId when none remain:

// SnapshotResetXmin — src/backend/utils/time/snapmgr.c
static void
SnapshotResetXmin(void)
{
Snapshot minSnapshot;
if (ActiveSnapshot != NULL)
return; /* something still active: keep xmin */
if (pairingheap_is_empty(&RegisteredSnapshots))
{
MyProc->xmin = TransactionXmin = InvalidTransactionId; /* release horizon */
return;
}
/* otherwise advance MyProc->xmin to the oldest registered snapshot's xmin */
minSnapshot = pairingheap_container(SnapshotData, ph_node,
pairingheap_first(&RegisteredSnapshots));
if (TransactionIdPrecedes(MyProc->xmin, minSnapshot->xmin))
MyProc->xmin = TransactionXmin = minSnapshot->xmin;
}

RegisteredSnapshots is a pairing heap ordered by xmin, so the oldest is the heap root — found in O(1). The global vacuum horizon is then the minimum of every backend’s MyProc->xmin, computed by ComputeXidHorizons and exposed through the GlobalVisState bounds described next. Vacuum mechanism itself is postgres-vacuum.md; here we stop at “what the horizon is and who pins it.”
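
The reset logic can be sketched with a flat array in place of the pairing heap. `ToyBackend`, `toy_reset_xmin`, and `toy_unregister` are hypothetical; the linear scan stands in for reading the heap root, and XIDs are compared as plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID    0u
#define MAX_REGISTERED 8

typedef struct {
    Xid    registered[MAX_REGISTERED];  /* xmin of each registered snapshot */
    size_t n;
    Xid    my_xmin;                     /* the floor this backend publishes */
} ToyBackend;

/* Recompute the published floor after a snapshot went away. */
static void toy_reset_xmin(ToyBackend *b)
{
    if (b->n == 0) {
        b->my_xmin = INVALID_XID;       /* nothing registered: release the horizon */
        return;
    }
    Xid oldest = b->registered[0];
    for (size_t i = 1; i < b->n; i++)
        if (b->registered[i] < oldest)
            oldest = b->registered[i];
    if (b->my_xmin < oldest)
        b->my_xmin = oldest;            /* the floor only ever advances */
}

/* Drop the snapshot at position idx (order is not preserved), then reset. */
static void toy_unregister(ToyBackend *b, size_t idx)
{
    assert(idx < b->n);
    b->registered[idx] = b->registered[--b->n];
    toy_reset_xmin(b);
}
```

Starting from registered lower bounds {100, 140, 120} with a published floor of 100, releasing the snapshot with xmin 100 advances the floor to 120, releasing the one with 120 advances it to 140, and releasing the last one clears it.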

The approximate horizon — GlobalVisState

Computing an exact “oldest XID anyone can see” on every pruning decision would mean re-scanning the procarray constantly. PostgreSQL instead keeps a pair of approximate bounds per relation class, refreshed lazily:

// GlobalVisState — src/backend/storage/ipc/procarray.c
struct GlobalVisState
{
/* XIDs >= are considered running by some backend */
FullTransactionId definitely_needed;
/* XIDs < are not considered to be running by any backend */
FullTransactionId maybe_needed;
};

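A tuple’s deleting XID below maybe_needed is definitely removable; one at or above definitely_needed is definitely still needed; one between the two is ambiguous. In the ambiguous band the bounds are recomputed precisely (GlobalVisUpdate, built on ComputeXidHorizons) when GlobalVisTestShouldUpdate judges it worthwhile, and the test is repeated; otherwise the tuple is conservatively kept: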

// GlobalVisTestIsRemovableFullXid — src/backend/storage/ipc/procarray.c (condensed)
if (FullTransactionIdPrecedes(fxid, state->maybe_needed))
return true; /* below floor: removable */
if (FullTransactionIdFollowsOrEquals(fxid, state->definitely_needed))
return false; /* above ceiling: keep */
/* ambiguous band: recompute exact horizon, then recheck */
if (GlobalVisTestShouldUpdate(state))
{
GlobalVisUpdate();
return FullTransactionIdPrecedes(fxid, state->maybe_needed);
}
else
return false;

Four instances exist — GlobalVisSharedRels, GlobalVisCatalogRels, GlobalVisDataRels, GlobalVisTempRels — because a normal user table can use a more aggressive horizon than a shared catalog (a temp table’s old versions are invisible to other sessions entirely, so its horizon is just the current backend’s XID). GlobalVisTestFor(rel) picks the right one. These bounds are FullTransactionId (64-bit, wrap-free) precisely so the horizon arithmetic cannot be confused by 32-bit XID wraparound — the same reason vacuum freezing exists. This is the seam to postgres-vacuum.md and postgres-heap-am.md (HOT pruning); both consume GlobalVisTestIsRemovableXid.
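
Why 32-bit XIDs need care can be shown with the usual circular comparison: interpret the modulo-2^32 difference as a signed number. The sketch below covers ordinary XIDs only (PostgreSQL additionally special-cases a few reserved XIDs, which is omitted here), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Circular "a is older than b" for wrapping 32-bit XIDs.  Only meaningful
 * while the two values are less than 2^31 apart.  The conversion to int32_t
 * relies on two's-complement behaviour, which mainstream compilers provide. */
static bool toy_xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(a - b) < 0;
}

/* 64-bit full XIDs never wrap in practice, so ordinary comparison is enough. */
static bool toy_full_xid_precedes(uint64_t a, uint64_t b)
{
    return a < b;
}
```

An XID of 4294967290 issued just before the counter wrapped is older than XID 10 issued just after; plain `<` gets this wrong, while `toy_xid_precedes(4294967290u, 10u)` returns true. With 64-bit values the same two transactions are 4294967290 and 4294967306 (2^32 + 10), and plain comparison orders them correctly.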

The deleter half of the visibility verdict

Once the inserter is established as committed-and-visible, HeapTupleSatisfiesMVCC examines t_xmax to decide whether the row was deleted before our snapshot. The structure mirrors the inserter ladder but inverts the verdict — a committed-and-visible deleter makes the row invisible:

// HeapTupleSatisfiesMVCC — deleter (t_xmax) path, src/backend/access/heap/heapam_visibility.c (condensed)
if (tuple->t_infomask & HEAP_XMAX_INVALID)  /* no deleter, or deleter aborted */
    return true;                            /* visible: the row is live */
if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
    return true;                            /* xmax is a lock, not a delete */
/* ... HEAP_XMAX_IS_MULTI (multixact) arm: resolve update xid, then apply the same tests ... */
if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
{
    /* ... own-transaction arm: compare cmax against snapshot->curcid ... */
    if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmax(tuple), snapshot))
        return true;    /* deleter in progress per our snapshot -> row visible */
    if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
    {
        SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
        return true;    /* deleter aborted -> row visible, cache it */
    }
    SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
                HeapTupleHeaderGetRawXmax(tuple));  /* deleter committed -> cache */
}
else
{
    if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmax(tuple), snapshot))
        return true;    /* deleter committed, but after our snapshot -> row visible */
}
return false;           /* deleter committed and visible -> row deleted */
The symmetry is the whole point: an absent/aborted/lock-only t_xmax leaves the row live; a deleter that is in our snapshot (still in progress at snapshot time) means the delete is invisible to us, so we still see the row; only a deleter that committed before our snapshot makes the row invisible. The HEAP_XMAX_IS_MULTI arm handles the case where t_xmax is a MultiXactId packing several lockers plus possibly one updater — HeapTupleGetUpdateXid extracts the real deleting XID before the same committed/in-snapshot test runs. SetHintBits caches the deleter verdict (HEAP_XMAX_COMMITTED / HEAP_XMAX_INVALID) under the same WAL-before-hint rule as the inserter side.
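
The inverted verdict is easy to check on a toy model. The sketch below is an illustration under stated assumptions, not the heapam_visibility.c logic: XIDs are compared with plain `<` (no wraparound handling), the clog lookup is reduced to a boolean field, and the own-transaction, multixact, and hint-bit arms are left out. All Toy* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

typedef struct ToySnapshot
{
    ToyXid        xmin;  /* every XID below this had completed */
    ToyXid        xmax;  /* every XID at or above this had not */
    const ToyXid *xip;   /* XIDs in [xmin, xmax) still running at snapshot time */
    size_t        xcnt;
} ToySnapshot;

typedef enum ToyXmaxKind
{
    TOY_XMAX_NONE,       /* no deleter, or deleter known to have aborted */
    TOY_XMAX_LOCK_ONLY,  /* xmax is a row lock, not a delete */
    TOY_XMAX_DELETER     /* xmax names a real deleting transaction */
} ToyXmaxKind;

typedef struct ToyTuple
{
    ToyXmaxKind kind;
    ToyXid      xmax;
    bool        deleter_committed;  /* assumption: stands in for the clog lookup */
} ToyTuple;

/* true means "this snapshot treats xid as still in progress". */
static bool
toy_xid_in_snapshot(ToyXid xid, const ToySnapshot *snap)
{
    assert(snap->xmin <= snap->xmax);
    if (xid < snap->xmin)
        return false;
    if (xid >= snap->xmax)
        return true;
    for (size_t i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xid)
            return true;
    return false;
}

/* Deleter half only: assumes the inserter was already found visible. */
static bool
toy_row_visible(const ToyTuple *tup, const ToySnapshot *snap)
{
    if (tup->kind != TOY_XMAX_DELETER)
        return true;    /* absent, aborted, or lock-only xmax: row is live */
    if (toy_xid_in_snapshot(tup->xmax, snap))
        return true;    /* delete not yet visible to this snapshot */
    if (!tup->deleter_committed)
        return true;    /* deleter aborted */
    return false;       /* deleter committed before the snapshot: row is gone */
}
```

For a snapshot with xmin 100, xmax 110, and XID 105 in progress, a row deleted by committed XID 90 is invisible, while rows whose deleter is 105 or 120 remain visible even if those transactions have since committed.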

The visibility machinery spans four files. The call flow runs GetTransactionSnapshot → GetSnapshotData → (per scan) HeapTupleSatisfiesVisibility → HeapTupleSatisfiesMVCC → XidInMVCCSnapshot / SetHintBits, with snapmgr.c owning lifetime and procarray.c owning the horizon. The procarray internals (PGPROC slot layout, XID publication, group clear) are postgres-procarray.md; this list stops where GetSnapshotData reads the published arrays.

The snapshot struct and its stamps (snapshot.h, htup_details.h)

  • SnapshotData (struct tag in snapshot.h) — the photograph. xmin/xmax watermarks, xip[]/xcnt top-level in-progress set, subxip[]/subxcnt + suboverflowed for subxacts, curcid for intra-xact visibility, takenDuringRecovery selecting the recovery-shaped layout, vistest pointing at the relevant GlobalVisState, and snapXactCompletionCount enabling reuse.
  • SnapshotType (enum in snapshot.h) — SNAPSHOT_MVCC is the common path; HeapTupleSatisfiesVisibility dispatches on it to HeapTupleSatisfiesMVCC, HeapTupleSatisfiesSelf, HeapTupleSatisfiesDirty, etc.
  • HEAP_XMIN_COMMITTED / HEAP_XMIN_INVALID / HEAP_XMIN_FROZEN / HEAP_XMAX_COMMITTED / HEAP_XMAX_INVALID (macros in htup_details.h) — the infomask hint bits. HEAP_XMIN_FROZEN is (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID) — both bits set means “frozen”, read by HeapTupleHeaderXminFrozen.
  • GetSnapshotData — the procarray census. Sets xmax = latestCompletedXid + 1, seeds xmin = xmax, then one pass over ProcGlobal->xids[] collecting every in-progress XID below xmax, skipping the invalid, the backend’s own, and PROC_IN_VACUUM/PROC_IN_LOGICAL_DECODING slots. Publishes MyProc->xmin, TransactionXmin, RecentXmin, and advances the GlobalVis* bounds.
  • GetSnapshotDataReuse — the fast path. Compares the snapshot’s stored snapXactCompletionCount against TransamVariables->xactCompletionCount; equal means no transaction completed since the last snapshot, so the cached xip[] is provably identical and the scan is skipped. Still re-pins MyProc->xmin and refreshes curcid.
  • ComputeXidHorizons — the precise horizon recomputation. Walks the procarray to produce the exact oldest-needed XIDs per relation class; called lazily by GlobalVisUpdate when an ambiguous XID lands between the approximate bounds.
  • GlobalVisState (struct tag) and GlobalVisTestFor / GlobalVisTestIsRemovableXid / GlobalVisTestIsRemovableFullXid / GlobalVisUpdate — the approximate per-class horizon (definitely_needed / maybe_needed) consumed by pruning and vacuum. Four instances: GlobalVisSharedRels, GlobalVisCatalogRels, GlobalVisDataRels, GlobalVisTempRels.
  • HeapTupleSatisfiesVisibility — the dispatcher; switches on snapshot->snapshot_type and forwards to the right predicate.
  • HeapTupleSatisfiesMVCC — the SNAPSHOT_MVCC predicate. Inserter ladder (t_xmin): hint bit → curcid for own-xact → XidInMVCCSnapshot → TransactionIdDidCommit → cache. Then deleter ladder (t_xmax) with inverted verdict. Multixact arms via HeapTupleGetUpdateXid.
  • SetHintBits (and exported HeapTupleSetHintBits) — caches a commit/abort verdict into t_infomask under the WAL-before-hint rule (XLogNeedsFlush + BufferGetLSNAtomic interlock), then MarkBufferDirtyHint.
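
The census and its reuse fast path, as described in the GetSnapshotData and GetSnapshotDataReuse entries above, can be sketched as one small function. This is a toy under stated assumptions, not procarray.c: it takes no locks, uses plain integer comparison instead of wraparound-aware XID comparison, ignores subtransactions, and ignores the caller's own XID altogether. ToyProcArray, ToyCensus, TOY_FLAG_SKIP, and toy_get_snapshot are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PROCS   8
#define TOY_INVALID_XID 0u
#define TOY_FLAG_SKIP   0x01u  /* stand-in for PROC_IN_VACUUM / PROC_IN_LOGICAL_DECODING */

typedef struct ToyProcArray
{
    uint32_t xids[TOY_MAX_PROCS];   /* dense per-backend XIDs, 0 = none assigned */
    uint8_t  flags[TOY_MAX_PROCS];
    size_t   nprocs;
    uint32_t latest_completed_xid;
    uint64_t xact_completion_count; /* bumped whenever a transaction completes */
} ToyProcArray;

typedef struct ToyCensus
{
    uint32_t xmin;
    uint32_t xmax;
    uint32_t xip[TOY_MAX_PROCS];
    size_t   xcnt;
    uint64_t completion_count;      /* 0 = never built */
} ToyCensus;

/* Returns true when the scan was skipped (reuse), false when it ran. */
static bool
toy_get_snapshot(const ToyProcArray *procs, size_t my_slot, ToyCensus *snap)
{
    assert(procs->nprocs <= TOY_MAX_PROCS);
    assert(my_slot < procs->nprocs);

    /* Fast path: nothing completed since this snapshot was built. */
    if (snap->completion_count != 0 &&
        snap->completion_count == procs->xact_completion_count)
        return true;

    snap->xmax = procs->latest_completed_xid + 1;
    snap->xmin = snap->xmax;        /* seed, lowered as XIDs are collected */
    snap->xcnt = 0;

    for (size_t i = 0; i < procs->nprocs; i++)
    {
        uint32_t xid = procs->xids[i];

        if (i == my_slot)
            continue;               /* own slot, skipped by position */
        if (xid == TOY_INVALID_XID)
            continue;               /* backend has no XID */
        if (xid >= snap->xmax)
            continue;               /* already treated as running via xmax */
        if (procs->flags[i] & TOY_FLAG_SKIP)
            continue;               /* vacuum / logical decoding slot */

        if (xid < snap->xmin)
            snap->xmin = xid;
        snap->xip[snap->xcnt++] = xid;
    }

    snap->completion_count = procs->xact_completion_count;
    return false;
}
```

A second call with an unchanged completion counter returns without touching the arrays, which is the whole benefit: the cost of a snapshot becomes a single counter comparison whenever no transaction has finished in between.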

Membership, lifetime, and horizon (snapmgr.c)

  • XidInMVCCSnapshot — the membership test. Two watermark comparisons decide most XIDs; only [xmin, xmax) reaches the pg_lfind32 scan of subxip[] then xip[]. suboverflowed forces a SubTransGetTopmostTransaction lookup through pg_subtrans.
  • GetTransactionSnapshot — the per-query entry point. First call: InvalidateCatalogSnapshot, then either a registered transaction-lifetime snapshot (RR/SR, copied + added to RegisteredSnapshots) or a fresh CurrentSnapshotData (RC). Subsequent calls: reuse for RR/SR, fresh for RC.
  • GetLatestSnapshot — an up-to-the-instant snapshot into SecondarySnapshotData, for code (RI checks) needing the latest state even in RR/SR.
  • GetCatalogSnapshot / InvalidateCatalogSnapshot — the separate MVCC snapshot for system-catalog scans, kept no older than the xact snapshot and invalidated on catalog change.
  • CopySnapshot — deep-copies CurrentSnapshotData so a transaction-lifetime snapshot survives statement boundaries.
  • SnapshotResetXmin — recomputes MyProc->xmin as the oldest still-registered snapshot’s xmin (the RegisteredSnapshots pairing-heap root), or InvalidTransactionId when none remain — releasing the backend’s hold on the vacuum horizon.
  • XACT_READ_COMMITTED / XACT_REPEATABLE_READ / XACT_SERIALIZABLE, IsolationUsesXactSnapshot() (>= XACT_REPEATABLE_READ), and IsolationIsSerializable() (== XACT_SERIALIZABLE) — the dials GetTransactionSnapshot reads to choose snapshot lifetime.
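
How the isolation level chooses snapshot lifetime can be shown with a deliberately tiny model. In this sketch a "snapshot" is reduced to the logical instant at which it was taken, so the only thing modelled is when a new one is built. The Toy* names are hypothetical, and the enum values are illustrative, chosen only so that the `>=` comparison has the same shape as IsolationUsesXactSnapshot().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum ToyIsoLevel
{
    TOY_READ_COMMITTED  = 1,
    TOY_REPEATABLE_READ = 2,
    TOY_SERIALIZABLE    = 3
} ToyIsoLevel;

typedef struct ToySession
{
    ToyIsoLevel level;
    bool        first_snapshot_set;
    uint64_t    xact_snapshot;  /* instant of the transaction-lifetime snapshot */
} ToySession;

/* Same shape as IsolationUsesXactSnapshot(): REPEATABLE READ and above. */
static bool
toy_uses_xact_snapshot(ToyIsoLevel level)
{
    return level >= TOY_REPEATABLE_READ;
}

/* `now` is the logical time of the call; it must be nonzero. */
static uint64_t
toy_get_transaction_snapshot(ToySession *s, uint64_t now)
{
    assert(now > 0);

    if (!toy_uses_xact_snapshot(s->level))
        return now;                 /* RC: fresh snapshot on every call */

    if (!s->first_snapshot_set)
    {
        s->xact_snapshot = now;     /* RR/SR: build once... */
        s->first_snapshot_set = true;
    }
    return s->xact_snapshot;        /* ...and return the same one thereafter */
}
```

Calling the function at instants 10 and 20 yields 10 then 20 for a READ COMMITTED session, and 10 both times for a REPEATABLE READ session. That difference is why a non-repeatable read is possible under RC and not under RR.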

Position hints (as of 2026-06-05, REL_18 273fe94)

| Symbol | File | Line |
| --- | --- | --- |
| SnapshotData (struct) | src/include/utils/snapshot.h | 138 |
| SnapshotData.xmin / xmax | src/include/utils/snapshot.h | 153 / 154 |
| SnapshotData.xip / xcnt | src/include/utils/snapshot.h | 164 / 165 |
| SnapshotData.snapXactCompletionCount | src/include/utils/snapshot.h | 209 |
| HEAP_XMIN_COMMITTED / _INVALID / _FROZEN | src/include/access/htup_details.h | 204 / 205 / 206 |
| HEAP_XMAX_COMMITTED / _INVALID | src/include/access/htup_details.h | 207 / 208 |
| XACT_READ_COMMITTED … XACT_SERIALIZABLE | src/include/access/xact.h | 37–39 |
| IsolationUsesXactSnapshot / IsolationIsSerializable | src/include/access/xact.h | 51 / 52 |
| GetSnapshotData | src/backend/storage/ipc/procarray.c | 2175 |
| GetSnapshotDataReuse | src/backend/storage/ipc/procarray.c | 2095 |
| ComputeXidHorizons | src/backend/storage/ipc/procarray.c | 1735 |
| GlobalVisState (struct) | src/backend/storage/ipc/procarray.c | 167 |
| GlobalVisTestFor | src/backend/storage/ipc/procarray.c | 4107 |
| GlobalVisUpdate | src/backend/storage/ipc/procarray.c | 4205 |
| GlobalVisTestIsRemovableFullXid | src/backend/storage/ipc/procarray.c | 4222 |
| GlobalVisTestIsRemovableXid | src/backend/storage/ipc/procarray.c | 4263 |
| HeapTupleSatisfiesMVCC | src/backend/access/heap/heapam_visibility.c | 960 |
| HeapTupleSatisfiesVisibility | src/backend/access/heap/heapam_visibility.c | 1776 |
| SetHintBits | src/backend/access/heap/heapam_visibility.c | 114 |
| GetTransactionSnapshot | src/backend/utils/time/snapmgr.c | 271 |
| GetLatestSnapshot | src/backend/utils/time/snapmgr.c | 353 |
| GetCatalogSnapshot | src/backend/utils/time/snapmgr.c | 384 |
| InvalidateCatalogSnapshot | src/backend/utils/time/snapmgr.c | 454 |
| CopySnapshot | src/backend/utils/time/snapmgr.c | 606 |
| SnapshotResetXmin | src/backend/utils/time/snapmgr.c | 935 |
| XidInMVCCSnapshot | src/backend/utils/time/snapmgr.c | 1870 |

Checked against commit 273fe94 on branch REL_18_STABLE. Every code excerpt in this document was condensed from the lines cited in the position-hint table above; ellipses (// ...) mark elided lines, and comments were preserved verbatim where quoted.

  • xmax is always latestCompletedXid + 1, computed before the scan. Verified in GetSnapshotData (procarray.c, lines 2247–2249): xmax is read from TransamVariables->latestCompletedXid then TransactionIdAdvance(xmax). xmin is seeded to xmax and only lowered inside the loop, so a snapshot with no in-progress transactions has xmin == xmax.

  • The backend’s own XID is excluded from the snapshot, by pgxactoff not by value. Verified in the collection loop (procarray.c, line 2293): if (pgxactoff == mypgxactoff) continue;. Own-write visibility is handled entirely by TransactionIdIsCurrentTransactionId + curcid inside HeapTupleSatisfiesMVCC, never by the xip[] membership test.

  • PROC_IN_VACUUM and PROC_IN_LOGICAL_DECODING backends are skipped. Verified in the loop’s statusFlags filter (procarray.c, around line 2300): a lazy vacuum holds no snapshot that should pin the horizon, so its XID is not collected. This is what keeps a long-running VACUUM from dragging the global xmin down.

  • The reuse fast path is gated solely on xactCompletionCount. Verified in GetSnapshotDataReuse (procarray.c, lines 2101–2106): if the stored snapXactCompletionCount is nonzero and equals the current global counter, the snapshot is reused without a procarray scan. The in-tree comment (lines 2108–2127) cites transam/README for the invariant that the running-xid set cannot change while ProcArrayLock is held and the counter is bumped only on completion under that lock.

  • XidInMVCCSnapshot decides most XIDs with two comparisons. Verified (snapmgr.c, lines 1880–1885): xid < xmin → false, xid >= xmax → true; only the [xmin, xmax) band reaches pg_lfind32 over subxip[] then xip[]. The suboverflowed branch (lines 1908–1923) maps a subxact to its parent via SubTransGetTopmostTransaction before the array scan.

  • The hint-bit write obeys a WAL-before-hint interlock. Verified in SetHintBits (heapam_visibility.c, lines 117–131): a “committed” hint with a valid xid is suppressed when BufferIsPermanent && XLogNeedsFlush(commitLSN) && BufferGetLSNAtomic(buffer) < commitLSN. The dirty mark is MarkBufferDirtyHint (not MarkBufferDirty), because the bit is reconstructible from clog and its loss across a crash is harmless. Aborted / invalid hints pass InvalidTransactionId and skip the interlock entirely.

  • The isolation level alone decides snapshot lifetime. Verified in GetTransactionSnapshot (snapmgr.c, lines 315–344): under RR/SR (IsolationUsesXactSnapshot()) the first call builds the snapshot, copies it with CopySnapshot, and registers the copy in RegisteredSnapshots, and every later call returns that same snapshot; under RC every call builds a fresh CurrentSnapshotData. SERIALIZABLE diverges only by routing the first build through GetSerializableTransactionSnapshot.

  • GlobalVisState bounds are FullTransactionId. Verified (procarray.c, lines 167–173): both definitely_needed and maybe_needed are 64-bit FullTransactionId, so the horizon arithmetic is immune to 32-bit XID wraparound. GlobalVisTestIsRemovableFullXid (lines 4222–4254) returns removable below maybe_needed, not-removable at/above definitely_needed, and recomputes via GlobalVisUpdate for the ambiguous band.
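
The WAL-before-hint interlock from the SetHintBits item above reduces to one three-part condition. The sketch below is a toy under stated assumptions, not the heapam_visibility.c function: LSNs are plain integers, "needs flush" is modelled as commit_lsn being greater than the flushed position, and the buffer is a small struct. ToyBuffer, toy_set_hint, and the TOY_HINT_* bit values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;

#define TOY_HINT_COMMITTED 0x0001u  /* illustrative bit values only */
#define TOY_HINT_INVALID   0x0002u

typedef struct ToyBuffer
{
    bool     permanent;   /* stand-in for BufferIsPermanent */
    ToyLsn   page_lsn;    /* stand-in for BufferGetLSNAtomic */
    uint16_t infomask;
    bool     dirty_hint;  /* stand-in for MarkBufferDirtyHint having run */
} ToyBuffer;

/*
 * Returns true if the hint was set, false if it was suppressed.
 * xid_valid = false models the aborted/invalid case, which skips the
 * interlock because there is no commit record to wait for.
 */
static bool
toy_set_hint(ToyBuffer *buf, uint16_t hint, bool xid_valid,
             ToyLsn commit_lsn, ToyLsn wal_flushed_upto)
{
    assert(hint != 0);

    if (xid_valid)
    {
        bool needs_flush = commit_lsn > wal_flushed_upto;

        /* Commit record not yet durable and the page LSN does not cover it. */
        if (buf->permanent && needs_flush && buf->page_lsn < commit_lsn)
            return false;
    }

    buf->infomask |= hint;
    buf->dirty_hint = true;  /* hint-only dirtying: safe to lose in a crash */
    return true;
}
```

Suppression costs nothing but a repeated clog lookup: the next reader retries, and once the commit record has been flushed the hint is set. A page whose LSN is already at or past the commit LSN passes immediately, because writing that page out would force the WAL flush first.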

  1. Exact statusFlags mask constant in the REL_18 collection loop. The prose names PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING; this document quoted the filter condensed rather than the literal mask expression at the cited line. A reader extending the loop should re-read procarray.c around line 2300 and confirm whether any additional PROC_* flag now participates in snapshot exclusion. Investigation path: grep statusFlags & in GetSnapshotData and cross-check the PROC_* definitions in proc.h.

  2. Interaction of the reuse fast path with curcid advancement. GetSnapshotDataReuse refreshes snapshot->curcid = GetCurrentCommandId(false) even on reuse (line 2134), so a reused snapshot still sees later commands of the same transaction. Whether any caller depends on the non-refreshed curcid of a reused snapshot is unverified; the path appears safe but was not exhaustively traced across all GetSnapshotData callers.

Beyond PostgreSQL — Comparative Designs & Research Frontiers


Pointers, not analysis. Each bullet is a starting handle for a follow-up doc or a cross-reference to a sibling; depth here is intentionally shallow.

  • The Berenson critique is the phenomenon dictionary. Berenson, Bernstein, Gray, Melton, O’Neil & O’Neil, A Critique of ANSI SQL Isolation Levels (SIGMOD 1995 / MSR TR-95-51) supplies the vocabulary for why PostgreSQL’s snapshot machinery prevents dirty / non-repeatable / phantom reads at RR but cannot, alone, prevent write skew. The next step from this doc is postgres-ssi-predicate-locking.md, which layers Cahill et al.’s Serializable Snapshot Isolation (SIGMOD 2008; Ports & Grittner, VLDB 2012) on top of the same SI snapshot to catch the rw-antidependency cycles SI misses.

  • OCC as the contrast family. Kung & Robinson, On Optimistic Methods for Concurrency Control (TODS 1981; dbms-papers/occ.md) validates at commit instead of stamping versions. MVCC trades that validation cost for the storage and reclamation cost this document’s vacuum-horizon machinery exists to manage. A measured comparison would quantify when each wins.

  • In-place vs out-of-place version storage. PostgreSQL keeps every version on the heap (cheap reads, scan-everything VACUUM); Oracle/InnoDB/CUBRID keep old versions in undo with a back-pointer (compact heap, indirected old-version reads). cubrid-mvcc.md is the sibling for the out-of-place side; the symmetric costs (PostgreSQL bloat/HOT vs CUBRID vacuum read-amplification) are the same design choice seen from two ends.

  • Snapshot scalability at high core counts. The PG 14 snapshot-scalability rework (the snapXactCompletionCount reuse path and the separation of XIDs into a dense ProcGlobal->xids[] array) targets the procarray scan cost that Yu et al., Staring into the Abyss (VLDB 2015) measures across CC protocols at 1000 cores. The procarray-side detail is postgres-procarray.md.

  • In-memory MVCC redesigns. Hekaton, HyPer, and Cicada drop the central procarray scan for cache-aware in-memory version chains and timestamp allocation; Wu et al., An Empirical Evaluation of In-Memory MVCC (VLDB 2017) surveys the space. Useful for separating costs intrinsic to MVCC from those intrinsic to disk-resident MVCC like PostgreSQL’s.

  • The 64-bit XID horizon and wraparound. GlobalVisState’s FullTransactionId bounds are a stepping stone toward eliminating 32-bit XID wraparound entirely; the freezing / wraparound mechanics are owned by postgres-vacuum.md, the natural next read on what the horizon gates.

  • None. This document is synthesized directly from the REL_18 source tree (sources: [] in frontmatter).

Source code (commit 273fe94, REL_18_STABLE, as of 2026-06-05)

  • src/include/utils/snapshot.h — SnapshotData, SnapshotType.
  • src/include/access/htup_details.h — HEAP_XMIN_* / HEAP_XMAX_* hint-bit macros, HeapTupleHeaderXmin* accessors.
  • src/include/access/xact.h — isolation-level constants and IsolationUsesXactSnapshot / IsolationIsSerializable macros.
  • src/backend/storage/ipc/procarray.c — GetSnapshotData, GetSnapshotDataReuse, ComputeXidHorizons, GlobalVisState and the GlobalVisTest* family.
  • src/backend/access/heap/heapam_visibility.c — HeapTupleSatisfiesMVCC, HeapTupleSatisfiesVisibility, SetHintBits.
  • src/backend/utils/time/snapmgr.c — GetTransactionSnapshot, GetLatestSnapshot, GetCatalogSnapshot, InvalidateCatalogSnapshot, CopySnapshot, SnapshotResetXmin, XidInMVCCSnapshot.
  • Petrov, Database Internals (O’Reilly 2019), ch. 5 — MVCC, isolation levels, snapshot isolation. Captured at knowledge/research/dbms-general/database-internals.md.
  • Berenson, Bernstein, Gray, Melton, O’Neil & O’Neil, A Critique of ANSI SQL Isolation Levels (SIGMOD 1995 / Microsoft TR-95-51).
  • Kung & Robinson, On Optimistic Methods for Concurrency Control (ACM TODS 1981). See knowledge/research/dbms-papers/occ.md.
  • Cahill, Röhm & Fekete, Serializable Isolation for Snapshot Databases (SIGMOD 2008); Ports & Grittner, Serializable Snapshot Isolation in PostgreSQL (VLDB 2012). Picked up by postgres-ssi-predicate-locking.md.
  • postgres-procarray.md — PGPROC slot layout, XID publication, group clear: the internals GetSnapshotData reads.
  • postgres-heap-am.md — heap tuple header, HOT, the consumer of GlobalVisTestIsRemovableXid for pruning.
  • postgres-vacuum.md — what the vacuum horizon gates; freezing and wraparound.
  • postgres-ssi-predicate-locking.md — SERIALIZABLE / write-skew prevention on top of the SI snapshot.