PostgreSQL Visibility Map — The Two-Bit Per-Page Bitmap That Drives Vacuum Skipping and Index-Only Scans

MVCC-based engines never modify tuples in place: a delete sets the xmax field on the old version; an update inserts a new version and sets xmax on the old one. The old bytes remain on the page until a background cleaner determines that no running transaction can see them any more, at which point it reclaims the space. Database System Concepts (Silberschatz, 7e, §15.6 “Multiversion Concurrency Control”) characterises this as the fundamental MVCC trade-off: reads never block writes because each sees its own snapshot, but dead versions accumulate until something sweeps them away.

Two questions arise for the sweeper:

  1. Which pages need attention? If a page has no dead tuples and every live tuple is visible to all current and future transactions, there is nothing to do. Re-reading it wastes I/O.
  2. Which pages can an index-only scan trust? An index-only scan returns attribute values straight from the index leaf page, skipping the heap entirely — but only when it can be certain the indexed row is visible to the query’s snapshot without consulting heap visibility information.

Both questions reduce to the same property: is every tuple on this heap page visible to every transaction that could ever run? Maintaining a compact, per-page record of that property — one that is cheaper to consult than reading the page itself — is the problem the visibility map solves.

The theoretical framing is a conservative approximation: when the bit is set, the property is guaranteed; when the bit is clear, the property may or may not hold. False negatives (cleared bit when the page is actually clean) cause unnecessary work but never wrong answers. False positives (set bit when the page has dead tuples) would be a correctness error, so the design always errs toward clearing bits when in doubt.

The second bit, all-frozen, addresses a distinct but related requirement: XID wraparound prevention. PostgreSQL uses 32-bit transaction identifiers and compares them circularly, so at any moment roughly 2 billion XIDs count as “in the past” and roughly 2 billion as “in the future.” A tuple whose xmin fell more than about 2 billion transactions behind the current XID would flip from past to future and become invisible, so VACUUM must freeze it first. Freezing marks the tuple as always “in the past”: historically by overwriting xmin with the special FrozenTransactionId (XID 2), and in current releases by setting the HEAP_XMIN_FROZEN infomask bits while leaving xmin in place. A page where every tuple is already frozen needs no freeze work even during an aggressive anti-wraparound VACUUM. The all-frozen bit captures exactly that property.
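The wraparound hazard follows from how 32-bit XIDs are ordered. The sketch below is a simplified model of a circular comparison, not the server's own routine: the name `xid_precedes` is hypothetical, and it assumes both inputs are normal XIDs (3 or above), so the reserved values are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Circular "a is older than b" test for 32-bit transaction IDs.
 * Illustrative only: reserved XIDs below 3 are rejected, not modelled.
 */
static bool
xid_precedes(uint32_t a, uint32_t b)
{
    int32_t diff;

    assert(a >= 3 && b >= 3);

    /*
     * Unsigned subtraction wraps modulo 2^32. Reinterpreting the result
     * as signed (two's complement, as on common compilers) gives a
     * negative value when a is less than 2^31 behind b.
     */
    diff = (int32_t) (a - b);
    return diff < 0;
}
```

With this ordering, an XID just before the 32-bit wrap point is still older than a small XID allocated just after it, but an XID more than 2^31 ahead no longer compares as newer. That second case is the one freezing is meant to prevent.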

Most MVCC engines attach hint bits directly to the tuple or row header to cache visibility decisions. Once a transaction that inserted a row is known to be committed, a backend can set an XMIN_COMMITTED hint bit on the tuple so future backends skip the commit-log lookup. Hint bits are opportunistic and not WAL-logged: they are set silently by any backend that evaluates visibility, and re-derived if lost. PostgreSQL uses t_infomask bits (HEAP_XMIN_COMMITTED, HEAP_XMAX_COMMITTED, etc.) for this. The visibility map is a coarser but more accessible layer above hint bits: instead of scanning individual tuples, a reader checks one bit for the entire page.

Separate-file bitmaps for coarse-grained state

Rather than storing per-page metadata inside the heap pages themselves (which would require reading and dirtying the heap to update the metadata), several engines keep auxiliary bitmaps in separate files that can be read without touching the data. PostgreSQL’s free-space map (FSM) uses this same pattern: a separate fork stores approximate free space per page so the inserter can find a candidate page without scanning the heap. The visibility map follows the same structural principle — a dedicated storage fork, one record per heap page, consulted before any heap I/O is attempted.

The two-phase pin protocol for crash safety

When a cleaner marks a page as having a property (all-visible, all-frozen), it must ensure that the backing on-disk state cannot be lost on a crash. The naive sequence — update the page in place, then update the auxiliary bitmap — is vulnerable: if the disk writes happen in the wrong order, the bitmap bit may be set while the heap page on disk still contains dirty state. The standard solution is to log both updates together in a single WAL record (or coordinate their LSNs), so crash recovery can restore both atomically. PostgreSQL’s XLOG_HEAP2_VISIBLE record encodes exactly this coordination.

The same asymmetry appears in nearly every such system. Clearing the bit (saying “this page may be dirty”) is always safe with respect to query results, because the outcome is a false negative: we do more work than necessary but never return wrong data. Setting the bit (asserting “this page is clean”) requires a durability guarantee, because a false positive could cause a reader to skip a heap fetch and return stale data. The logging asymmetry flows directly from this: setting a bit is WAL-logged, while clearing a bit needs no WAL record of its own but must accompany the heap modification whose WAL record covers it.

| Concept | PostgreSQL name |
|---|---|
| Per-page “all dead-tuples reclaimed” bit | VISIBILITYMAP_ALL_VISIBLE (0x01) in VM fork |
| Per-page “all tuples frozen” bit | VISIBILITYMAP_ALL_FROZEN (0x02) in VM fork |
| Page-level hint mirroring the VM bit | PD_ALL_VISIBLE flag in PageHeaderData.pd_flags |
| Auxiliary bitmap file | VISIBILITYMAP_FORKNUM (fork 2) of the relation |
| WAL record coordinating VM + heap LSN | XLOG_HEAP2_VISIBLE / log_heap_visible |
| Consumer: vacuum skip | visibilitymap_get_status in vacuumlazy.c |
| Consumer: index-only scan skip | VM_ALL_VISIBLE macro in nodeIndexonlyscan.c |

Storage layout: a two-bit bitmap in fork 2

Each relation has up to four on-disk storage forks:

fork 0 MAIN_FORKNUM — heap data pages
fork 1 FSM_FORKNUM — free-space map
fork 2 VISIBILITYMAP_FORKNUM — visibility map ← this doc
fork 3 INIT_FORKNUM — unlogged-table init fork

The visibility map fork stores a compact bitmap: two bits per heap page, packed into standard 8 KB buffer-manager pages with no special header beyond the standard PageHeaderData. The two bits for heap block N live at a well-defined position computed by three macros:

// HEAPBLK_TO_MAPBLOCK / HEAPBLK_TO_MAPBYTE / HEAPBLK_TO_OFFSET — visibilitymap.c
#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))
#define HEAPBLOCKS_PER_BYTE (BITS_PER_BYTE / BITS_PER_HEAPBLOCK) /* 4 */
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

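BITS_PER_HEAPBLOCK is 2 (from visibilitymapdefs.h), so each byte covers four heap blocks. With the default 8 KB block size and a 24-byte page header, MAPSIZE is 8,168 bytes, so a single VM page covers 8,168 × 4 = 32,672 heap pages (about 255 MB of heap). The bit layout within each two-bit slot is: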

bit 0 (mask 0x01): VISIBILITYMAP_ALL_VISIBLE
bit 1 (mask 0x02): VISIBILITYMAP_ALL_FROZEN

The two flag constants are:

// src/include/access/visibilitymapdefs.h
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03

The all-frozen bit must never be set without the all-visible bit also being set (enforced by an assertion in visibilitymap_set). A page that is frozen is necessarily also all-visible.
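The addressing arithmetic can be checked in isolation. The following is a minimal sketch under stated assumptions: the default 8 KB block size and a MAXALIGN'd page header of 24 bytes are hard-coded, and `VmLocation` and `vm_locate` are hypothetical names that do not exist in the source tree.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed build constants: default 8 KB block, 24-byte page header. */
#define BLCKSZ               8192
#define PAGE_HEADER_SIZE     24   /* stands in for MAXALIGN(SizeOfPageHeaderData) */
#define BITS_PER_BYTE        8
#define BITS_PER_HEAPBLOCK   2

#define MAPSIZE              (BLCKSZ - PAGE_HEADER_SIZE)
#define HEAPBLOCKS_PER_BYTE  (BITS_PER_BYTE / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Hypothetical helper type: where one heap block's two bits live. */
typedef struct VmLocation
{
    uint32_t map_block;   /* which VM page */
    uint32_t map_byte;    /* byte within that page's bitmap area */
    uint8_t  bit_offset;  /* 0, 2, 4 or 6 */
} VmLocation;

/* Apply the same three divisions/modulos as the macros above. */
static VmLocation
vm_locate(uint32_t heap_blk)
{
    VmLocation loc;

    loc.map_block = heap_blk / HEAPBLOCKS_PER_PAGE;
    loc.map_byte = (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    loc.bit_offset = (uint8_t) ((heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);

    assert(loc.map_byte < MAPSIZE);
    return loc;
}
```

Under these assumptions heap block 5 lands in VM page 0, byte 1, bit offset 2, and heap block 32,672 is the first block tracked by VM page 1.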

Figure 1 — VM bit layout per heap block pair

flowchart LR
    subgraph "one byte in VM page"
        B7["bit 7\nblock N+3\nfrozen"]
        B6["bit 6\nblock N+3\nvisible"]
        B5["bit 5\nblock N+2\nfrozen"]
        B4["bit 4\nblock N+2\nvisible"]
        B3["bit 3\nblock N+1\nfrozen"]
        B2["bit 2\nblock N+1\nvisible"]
        B1["bit 1\nblock N\nfrozen"]
        B0["bit 0\nblock N\nvisible"]
    end

Figure 1 — Each byte in the VM page stores the all-visible (low bit) and all-frozen (high bit) flags for four consecutive heap blocks. Block N uses bits 0–1, block N+1 uses bits 2–3, and so on.
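To make the packing concrete, here is a small sketch that unpacks one VM byte into the status of each of its four heap blocks. The helper name `vm_byte_status` is hypothetical; the three flag values are the ones quoted from visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE  0x01
#define VISIBILITYMAP_ALL_FROZEN   0x02
#define VISIBILITYMAP_VALID_BITS   0x03
#define BITS_PER_HEAPBLOCK         2

/* Hypothetical helper: status of the slot-th heap block (0..3) in a VM byte. */
static uint8_t
vm_byte_status(uint8_t map_byte, unsigned slot)
{
    assert(slot < 4);

    /* Shift the slot's two bits down to positions 0-1, then mask them off. */
    return (uint8_t) ((map_byte >> (slot * BITS_PER_HEAPBLOCK)) &
                      VISIBILITYMAP_VALID_BITS);
}
```

For the byte 0x1D (binary 00011101), slots 0 and 2 decode to all-visible only, slot 1 decodes to all-visible plus all-frozen, and slot 3 decodes to neither.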

The dual representation: VM bit and PD_ALL_VISIBLE

The visibility map bit is not the only place the all-visible property is recorded. The heap page itself carries a PD_ALL_VISIBLE flag in PageHeaderData.pd_flags:

// PageIsAllVisible / PageSetAllVisible / PageClearAllVisible — src/include/storage/bufpage.h
#define PD_ALL_VISIBLE 0x0004 /* all tuples on page are visible to everyone */

The two representations must be kept in sync. When a heap-modifying operation clears the VM bit, it must also clear PD_ALL_VISIBLE on the heap page, and it does so in the same critical section and the same WAL record as the heap change itself. When VACUUM sets the VM bit, it sets PD_ALL_VISIBLE on the heap page first, then calls visibilitymap_set. The invariant the system maintains is:

VM bit set ⟹ PD_ALL_VISIBLE set.
VM bit clear does not imply PD_ALL_VISIBLE clear.

If a crash occurs after the VM bit reaches disk but before PD_ALL_VISIBLE is written to the heap page, WAL replay of the XLOG_HEAP2_VISIBLE record restores PD_ALL_VISIBLE on the heap page, maintaining the invariant.

Setting a VM bit is a two-step operation:

  1. Pin the VM page (visibilitymap_pin) — acquire a buffer-manager pin on the VM page that covers the target heap block. This may require I/O to read the VM page from disk. Critically, this step happens before locking the heap page, because holding a buffer lock during I/O is forbidden.
  2. Set the bit (visibilitymap_set) — with the heap page buffer-locked, set the bit and emit the WAL record.
// visibilitymap_pin — src/backend/access/heap/visibilitymap.c
void
visibilitymap_pin(Relation rel, BlockNumber heapBlk, Buffer *vmbuf)
{
    BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);

    /* Reuse the caller's pin if it already covers the right VM page. */
    if (BufferIsValid(*vmbuf))
    {
        if (BufferGetBlockNumber(*vmbuf) == mapBlock)
            return;             /* already pinned: reuse */
        ReleaseBuffer(*vmbuf);
    }
    *vmbuf = vm_readbuf(rel, mapBlock, true);   /* extend if needed */
}

// visibilitymap_set — src/backend/access/heap/visibilitymap.c
uint8
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
                  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
                  uint8 flags)
{
    uint32      mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8       mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    Page        page;
    uint8      *map;
    uint8       status;

    /* ... assertions omitted ... */

    page = BufferGetPage(vmBuf);
    map = (uint8 *) PageGetContents(page);
    LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);

    /* Current two-bit status of this heap block. */
    status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
    if (flags != status)
    {
        START_CRIT_SECTION();

        map[mapByte] |= (flags << mapOffset);
        MarkBufferDirty(vmBuf);

        if (RelationNeedsWAL(rel))
        {
            if (XLogRecPtrIsInvalid(recptr))
                recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);

            /* The heap page LSN only needs updating when hint bits are logged. */
            if (XLogHintBitIsNeeded())
                PageSetLSN(BufferGetPage(heapBuf), recptr);
            PageSetLSN(page, recptr);
        }

        END_CRIT_SECTION();
    }

    LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
    return status;              /* previous status, for caller's benefit */
}

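The function returns the previous status of the bits. VACUUM uses that value to tell whether the call newly set all-visible or all-frozen for the page, which feeds its per-run counters; pg_class.relallvisible itself is recomputed with visibilitymap_count.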
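The bit manipulation at the core of the set step can be tried on a plain byte array. This is a toy sketch only: `toy_vm_set` is a hypothetical name, and buffers, locks, critical sections and WAL are all left out.

```c
#include <assert.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE  0x01
#define VISIBILITYMAP_ALL_FROZEN   0x02
#define VISIBILITYMAP_VALID_BITS   0x03

/*
 * Toy version of the bit-setting step on an in-memory bitmap.
 * Returns the status the block had before the call.
 */
static uint8_t
toy_vm_set(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
    uint32_t byte_no = heap_blk / 4;                     /* four blocks per byte */
    uint8_t  offset = (uint8_t) ((heap_blk % 4) * 2);    /* two bits per block */
    uint8_t  old_status;

    /* Only valid bits, and never all-frozen on its own. */
    assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
    assert(flags != VISIBILITYMAP_ALL_FROZEN);

    old_status = (map[byte_no] >> offset) & VISIBILITYMAP_VALID_BITS;
    if (flags != old_status)
        map[byte_no] |= (uint8_t) (flags << offset);
    return old_status;
}
```

A caller can compare the returned value with the flags it passed in to see which bits were newly set, for example treating `(old & VISIBILITYMAP_ALL_VISIBLE) == 0` as "this page just became all-visible".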

Clearing a bit: no WAL needed, must accompany heap modification

Clearing a VM bit is simpler and emits no WAL record of its own:

// visibilitymap_clear — src/backend/access/heap/visibilitymap.c
bool
visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags)
{
    int         mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    int         mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    uint8       mask = flags << mapOffset;
    char       *map;
    bool        cleared = false;

    /* Assert: never clear all_visible while leaving all_frozen set */
    Assert(flags != VISIBILITYMAP_ALL_VISIBLE);

    /* ... buffer validation ... */

    LockBuffer(vmbuf, BUFFER_LOCK_EXCLUSIVE);
    map = PageGetContents(BufferGetPage(vmbuf));

    /* Only dirty the VM page if a bit actually changes. */
    if (map[mapByte] & mask)
    {
        map[mapByte] &= ~mask;
        MarkBufferDirty(vmbuf);
        cleared = true;
    }

    LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK);
    return cleared;
}

The callers in heapam.c (heap_insert, heap_update, heap_delete, and heap_lock_tuple) invoke visibilitymap_clear inside the same critical section as the heap page modification, holding the heap buffer lock throughout. This ensures that WAL replay of the heap operation also clears the VM bit, even if a crash occurred before the VM page’s dirty buffer was flushed.
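The calling pattern described above can be sketched with a toy heap page. Everything here is a simplified model under stated assumptions: `ToyHeapPage`, `toy_vm_clear` and `toy_heap_insert` are hypothetical names, the bitmap is a plain byte array, and locking, critical sections and WAL are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE  0x01
#define VISIBILITYMAP_ALL_FROZEN   0x02
#define VISIBILITYMAP_VALID_BITS   0x03

/* Hypothetical stand-in for a heap page: just the flag and a tuple count. */
typedef struct ToyHeapPage
{
    bool     pd_all_visible;
    uint32_t ntuples;
} ToyHeapPage;

/* Clear the given bits for heap_blk; report whether anything changed. */
static bool
toy_vm_clear(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
    uint32_t byte_no = heap_blk / 4;
    uint8_t  mask = (uint8_t) (flags << ((heap_blk % 4) * 2));

    /* Never clear all-visible while leaving all-frozen set. */
    assert(flags != VISIBILITYMAP_ALL_VISIBLE);

    if ((map[byte_no] & mask) == 0)
        return false;
    map[byte_no] &= (uint8_t) ~mask;
    return true;
}

/* Toy insert: the heap change and both flag clears happen together. */
static void
toy_heap_insert(ToyHeapPage *page, uint8_t *map, uint32_t heap_blk)
{
    page->ntuples++;
    if (page->pd_all_visible)
    {
        page->pd_all_visible = false;
        toy_vm_clear(map, heap_blk, VISIBILITYMAP_VALID_BITS);
    }
}
```

The page-level flag acts as the cheap test: the VM is only touched when the heap page itself says it was all-visible, so repeated inserts into an already-dirty page never revisit the bitmap.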

Reading a bit: lock-free, caller bears responsibility

// visibilitymap_get_status — src/backend/access/heap/visibilitymap.c
uint8
visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf)
{
    uint32      mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8       mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    char       *map;
    uint8       result;

    /* ... pin the VM page if needed ... */

    /* Unlocked read of the byte holding this block's two bits. */
    map = PageGetContents(BufferGetPage(*vmbuf));
    result = ((map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS);
    return result;
}

The read is performed without any lock on the VM page. The comment in the source is explicit: “somebody else could change the bit just after we look at it.” The result can therefore be stale in either direction. A stale “not all-visible” answer only costs extra work. A stale “all-visible” answer is the dangerous case, and the function leaves it to each caller either to show why it is harmless or to re-check under a lock; VACUUM, for example, uses a subsequent locked re-check before acting on the status.

The convenience macros VM_ALL_VISIBLE and VM_ALL_FROZEN wrap this function:

// VM_ALL_VISIBLE / VM_ALL_FROZEN — src/include/access/visibilitymap.h
#define VM_ALL_VISIBLE(r, b, v) \
((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_VISIBLE) != 0)
#define VM_ALL_FROZEN(r, b, v) \
((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)

VACUUM’s inner loop in vacuumlazy.c calls visibilitymap_get_status before reading each heap page. A page with VISIBILITYMAP_ALL_VISIBLE set can be skipped for dead-tuple removal; a page with VISIBILITYMAP_ALL_FROZEN set can be skipped even during aggressive (anti-wraparound) VACUUM. The skip logic is:

// heap_vac_scan_next_block (inner loop) — src/backend/access/heap/vacuumlazy.c
uint8       mapbits = visibilitymap_get_status(vacrel->rel,
                                               next_unskippable_block,
                                               &next_unskippable_vmbuffer);

next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
if (!next_unskippable_allvis)
    break;                      /* must process this block */
/* frozen check follows for aggressive mode ... */

When VACUUM determines that all tuples on a page are visible and optionally frozen, it calls visibilitymap_set with the appropriate flags. It also calls visibilitymap_count at the start and end of the pass; the final count is stored in pg_class.relallvisible, which the planner uses to estimate index-only scan benefits.

Figure 2 — VACUUM page-skip decision flow

flowchart TD
    A["next heap block"] --> B{"VM: all-visible?"}
    B -->|"no"| C["read heap page\nprocess dead tuples"]
    B -->|"yes"| D{"aggressive vacuum?"}
    D -->|"no"| E["skip block entirely"]
    D -->|"yes"| F{"VM: all-frozen?"}
    F -->|"yes"| E
    F -->|"no"| G["read heap page\nfreeze old XIDs\nset all-frozen if done"]
    C --> H{"page now all-visible?"}
    H -->|"yes"| I["visibilitymap_set\nALL_VISIBLE\nor ALL_FROZEN"]
    H -->|"no"| A
    G --> A
    E --> A
    I --> A

Figure 2 — VACUUM’s inner loop checks the VM bit before issuing any heap I/O. All-visible pages are skipped entirely in normal mode; all-frozen pages are also skipped in aggressive mode.
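The decision in Figure 2 reduces to a small pure function. The sketch below is illustrative rather than a copy of vacuumlazy.c: `toy_can_skip_block` and `toy_count_blocks_to_scan` are hypothetical names, and the additional conditions the real loop applies (for example, only skipping sufficiently long runs of skippable blocks) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE  0x01
#define VISIBILITYMAP_ALL_FROZEN   0x02

/* May a block with this VM status be skipped in this kind of VACUUM? */
static bool
toy_can_skip_block(uint8_t vm_status, bool aggressive)
{
    if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
        return false;           /* may contain dead tuples */
    if (aggressive && (vm_status & VISIBILITYMAP_ALL_FROZEN) == 0)
        return false;           /* may still need freezing */
    return true;
}

/* Count how many of nblocks per-block statuses force a heap read. */
static uint32_t
toy_count_blocks_to_scan(const uint8_t *statuses, uint32_t nblocks,
                         bool aggressive)
{
    uint32_t to_scan = 0;

    assert(statuses != NULL || nblocks == 0);
    for (uint32_t i = 0; i < nblocks; i++)
    {
        if (!toy_can_skip_block(statuses[i], aggressive))
            to_scan++;
    }
    return to_scan;
}
```

For a four-block table with statuses {none, visible, visible+frozen, visible}, a normal pass has to read one block while an aggressive pass has to read three.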

An index-only scan (IOS) returns attribute values from the index leaf without a heap fetch — but only when the heap tuple is known visible to the scan’s snapshot. Without the VM, the executor would need to fetch the heap page for every index entry just to verify visibility. The VM eliminates most of those fetches:

// IndexOnlyNext — src/backend/executor/nodeIndexonlyscan.c
if (!VM_ALL_VISIBLE(scandesc->heapRelation,
                    ItemPointerGetBlockNumber(tid),
                    &node->ioss_VMBuffer))
{
    /* must visit heap to check visibility */
    InstrCountTuples2(node, 1);
    if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
        continue;               /* no visible tuple, try next index entry */
    /* ... */
}

When VM_ALL_VISIBLE returns true, the executor trusts that the tuple is visible to all transactions and skips the heap fetch entirely. On a large, mostly-static table where VACUUM has set VM bits on most pages, an IOS can answer a query entirely from the index with zero heap reads.

Figure 3 — Index-only scan visibility decision

flowchart TD
    IDX["index leaf entry\n(tid, attributes)"] --> VM{"VM_ALL_VISIBLE\n(tid.block)?"}
    VM -->|"true"| RET["return attributes\nfrom index\n(no heap fetch)"]
    VM -->|"false"| HEAP["fetch heap tuple\ncheck MVCC visibility"]
    HEAP --> VIS{"tuple visible\nto snapshot?"}
    VIS -->|"yes"| RET2["return attributes\nfrom heap"]
    VIS -->|"no"| SKIP["skip — try\nnext index entry"]

Figure 3 — When the VM bit is set for a heap block, the index-only scan returns data directly from the index without touching the heap. When the bit is clear, it falls back to heap visibility checking.
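The effect on heap I/O can be illustrated by counting how many index entries would still need a heap visit. This is a toy model under stated assumptions: `ToyTid` and `toy_count_heap_fetches` are hypothetical names, the bitmap is a plain byte array covering a handful of blocks, and no snapshot logic is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE  0x01
#define VISIBILITYMAP_VALID_BITS   0x03

/* Hypothetical tuple identifier: heap block number plus line pointer. */
typedef struct ToyTid
{
    uint32_t block;
    uint16_t offset;
} ToyTid;

/* Count index entries whose heap block is not marked all-visible. */
static uint32_t
toy_count_heap_fetches(const ToyTid *tids, size_t ntids, const uint8_t *map)
{
    uint32_t fetches = 0;

    assert(tids != NULL && map != NULL);
    for (size_t i = 0; i < ntids; i++)
    {
        uint32_t blk = tids[i].block;
        uint8_t  status = (map[blk / 4] >> ((blk % 4) * 2)) &
                          VISIBILITYMAP_VALID_BITS;

        if ((status & VISIBILITYMAP_ALL_VISIBLE) == 0)
            fetches++;          /* would have to visit the heap */
    }
    return fetches;
}
```

The count depends only on which blocks the index entries point into, not on how many tuples each block holds, which is why a table with most pages marked all-visible needs few heap visits.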

When visibilitymap_set emits a WAL record, it calls log_heap_visible in heapam.c:

// log_heap_visible — src/backend/access/heap/heapam.c
XLogRecPtr
log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
                 TransactionId snapshotConflictHorizon, uint8 vmflags)
{
    xl_heap_visible xlrec;
    XLogRecPtr  recptr;
    uint8       flags;

    xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
    xlrec.flags = vmflags;
    if (RelationIsAccessibleInLogicalDecoding(rel))
        xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;

    XLogBeginInsert();
    XLogRegisterData(&xlrec, SizeOfHeapVisible);

    /* Block 0 is the VM page, block 1 the heap page. */
    XLogRegisterBuffer(0, vm_buffer, 0);

    flags = REGBUF_STANDARD;
    if (!XLogHintBitIsNeeded())
        flags |= REGBUF_NO_IMAGE;
    XLogRegisterBuffer(1, heap_buffer, flags);

    recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
    return recptr;
}

The record registers both the VM buffer and the heap buffer. During replay, heap_xlog_visible in heapam_xlog.c sets PD_ALL_VISIBLE on the heap page and sets the VM bit. The snapshotConflictHorizon field is used on Hot Standby to resolve recovery conflicts: if an index-only scan on the standby has an xmin horizon older than snapshotConflictHorizon, the scan is cancelled before it can return incorrect results based on the newly-set all-visible bit.

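The additional VISIBILITYMAP_XLOG_CATALOG_REL flag is set when the relation is a catalog or user catalog table that logical decoding may read (the RelationIsAccessibleInLogicalDecoding test above). It exists to support logical decoding on standbys: replay needs to know whether the conflict horizon applies to catalog rows that a logical replication slot might still depend on.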

When a heap relation grows, the VM fork may also need to grow. vm_readbuf (the internal reader) calls vm_extend if the requested VM block number exceeds the cached size:

// vm_extend — src/backend/access/heap/visibilitymap.c
static Buffer
vm_extend(Relation rel, BlockNumber vm_nblocks)
{
    Buffer      buf;

    buf = ExtendBufferedRelTo(BMR_REL(rel), VISIBILITYMAP_FORKNUM, NULL,
                              EB_CREATE_FORK_IF_NEEDED | EB_CLEAR_SIZE_CACHE,
                              vm_nblocks, RBM_ZERO_ON_ERROR);

    /* Tell other backends to drop their cached view of the fork's size. */
    CacheInvalidateSmgr(RelationGetSmgr(rel)->smgr_rlocator);
    return buf;
}

The EB_CREATE_FORK_IF_NEEDED flag creates the VM fork on first use. CacheInvalidateSmgr broadcasts a shared-invalidation message so other backends close their cached smgr references to the relation, avoiding stale file-size state.

When a relation is truncated, visibilitymap_prepare_truncate clears the trailing bits in the last surviving VM page and returns the new VM size in blocks. If the new heap size falls exactly on a VM page boundary, no bit clearing is needed. The actual smgrtruncate call is the caller’s responsibility; in the source tree that caller is RelationTruncate, which is reached from paths such as VACUUM’s truncation of empty trailing heap pages.
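Clearing the trailing bits amounts to zeroing whole bytes past the boundary and masking the boundary byte itself. The sketch below models a single VM page as a plain byte array; `toy_vm_clear_tail` is a hypothetical name, and the pinning, locking and WAL handling of the real function are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Clear the bits of every heap block >= nheapblocks in a bitmap of
 * map_size bytes (two bits per block, four blocks per byte).
 * Assumes the first removed block falls inside this bitmap.
 */
static void
toy_vm_clear_tail(uint8_t *map, size_t map_size, uint32_t nheapblocks)
{
    size_t  trunc_byte = nheapblocks / 4;
    uint8_t trunc_offset = (uint8_t) ((nheapblocks % 4) * 2);

    assert(trunc_byte < map_size);

    /* Zero every byte after the one containing the first removed block. */
    memset(&map[trunc_byte + 1], 0, map_size - (trunc_byte + 1));

    /* In the boundary byte, keep only the bits of surviving blocks. */
    map[trunc_byte] &= (uint8_t) ((1u << trunc_offset) - 1);
}
```

Truncating a fully set 16-block bitmap to 6 blocks leaves byte 0 untouched, keeps the low four bits of byte 1 (blocks 4 and 5), and zeroes the rest.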

// visibilitymap_count — src/backend/access/heap/visibilitymap.c
void
visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen)
{
    /* ... local declarations omitted ... */

    for (mapBlock = 0;; mapBlock++)
    {
        /* Read without extending; an invalid buffer means end of fork. */
        mapBuffer = vm_readbuf(rel, mapBlock, false);
        if (!BufferIsValid(mapBuffer))
            break;

        map = (uint64 *) PageGetContents(BufferGetPage(mapBuffer));

        nvisible += pg_popcount_masked((const char *) map, MAPSIZE, VISIBLE_MASK8);
        if (all_frozen)
            nfrozen += pg_popcount_masked((const char *) map, MAPSIZE, FROZEN_MASK8);

        ReleaseBuffer(mapBuffer);
    }

    *all_visible = nvisible;
    if (all_frozen)
        *all_frozen = nfrozen;
}

VISIBLE_MASK8 (0x55) selects the all-visible bits (the low bit of each two-bit pair) and FROZEN_MASK8 (0xAA) selects the all-frozen bits. pg_popcount_masked counts set bits in the byte range after ANDing with the mask. VACUUM calls this function before and after the pass to update pg_class.relallvisible, which the query planner uses to estimate the fraction of pages that an index-only scan can skip.
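The masked population count can be reproduced on a small byte array. This is a minimal sketch, not the server's implementation: `popcount8` and `toy_vm_count` are hypothetical names, and a simple bit-clearing loop stands in for pg_popcount_masked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VISIBLE_MASK8  0x55   /* low bit of each two-bit pair */
#define FROZEN_MASK8   0xAA   /* high bit of each two-bit pair */

/* Count set bits in one byte by clearing the lowest set bit each round. */
static unsigned
popcount8(uint8_t v)
{
    unsigned n = 0;

    while (v != 0)
    {
        v &= (uint8_t) (v - 1);
        n++;
    }
    return n;
}

/* Count all-visible and (optionally) all-frozen blocks in a bitmap. */
static void
toy_vm_count(const uint8_t *map, size_t len,
             uint32_t *all_visible, uint32_t *all_frozen)
{
    uint32_t nvisible = 0;
    uint32_t nfrozen = 0;

    assert(map != NULL && all_visible != NULL);
    for (size_t i = 0; i < len; i++)
    {
        nvisible += popcount8(map[i] & VISIBLE_MASK8);
        if (all_frozen != NULL)
            nfrozen += popcount8(map[i] & FROZEN_MASK8);
    }

    *all_visible = nvisible;
    if (all_frozen != NULL)
        *all_frozen = nfrozen;
}
```

Because each mask selects one bit per block, the population count of the masked byte is exactly the number of blocks in that byte with the corresponding flag set.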

  • visibilitymap_clear — clear specified bits for one page; returns true if any bits were actually cleared. Caller must hold a buffer pin on the correct VM page (obtained via visibilitymap_pin) and the heap page must be buffer-locked. No WAL emitted here; correctness depends on the caller’s heap-modification WAL record also clearing the bit at redo time.

  • visibilitymap_pin — acquire (or reuse) a buffer-manager pin on the VM page covering heapBlk. Must be called before visibilitymap_set and before locking the heap page. Extends the VM fork if the page does not yet exist.

  • visibilitymap_pin_ok — test whether a previously pinned buffer still covers heapBlk. Used to avoid re-pinning when multiple operations on the same VM page are batched.

  • visibilitymap_set — set one or both bits on a previously pinned VM page. Caller must have set PD_ALL_VISIBLE on the heap page and hold the heap buffer locked. Emits XLOG_HEAP2_VISIBLE when WAL is required. Returns the previous bit status.

  • visibilitymap_get_status — return the two-bit status for heapBlk. No lock taken; lock-free read of a single byte (atomic on all supported architectures). Caller must handle the possibility of a stale read.

  • visibilitymap_count — scan the entire VM fork and count all-visible and all-frozen pages. No lock; approximate. Used by VACUUM to update pg_class.relallvisible.

  • visibilitymap_prepare_truncate — prepare for a relation truncation: clear trailing bits in the last surviving VM page, return the new VM block count. Returns InvalidBlockNumber if no truncation of the VM fork is required.

  • visibilitymap_truncation_length — pure computation: given a proposed heap truncation length, return the correct VM truncation length. No side effects.

  • vm_readbuf — read (or extend) a VM page using ReadBufferExtended with RBM_ZERO_ON_ERROR. Initialises new pages via PageInit. Caches the VM fork block count in smgr_cached_nblocks[VISIBILITYMAP_FORKNUM] to avoid repeated smgrnblocks calls.

  • vm_extend — extend the VM fork to at least vm_nblocks blocks using ExtendBufferedRelTo. Sends a CacheInvalidateSmgr message after extension.

  • vacuumlazy.c — calls visibilitymap_pin before the main heap scan, visibilitymap_get_status per block in the skip-decision loop, visibilitymap_set after determining a page is all-visible/all-frozen, and visibilitymap_count at the start and end of the VACUUM pass.

  • heapam.c: heap_insert, heap_update, heap_delete, heap_lock_tuple, and related functions call visibilitymap_clear (with a pre-pinned buffer) when they modify a page that had its VM bit set.

  • nodeIndexonlyscan.c: IndexOnlyNext calls VM_ALL_VISIBLE for every candidate index entry to decide whether to skip the heap fetch.

Position-hint table (commit 273fe94, 2026-06-05)

| Symbol | File | Line |
|---|---|---|
| BITS_PER_HEAPBLOCK | src/include/access/visibilitymapdefs.h | 17 |
| VISIBILITYMAP_ALL_VISIBLE | src/include/access/visibilitymapdefs.h | 20 |
| VISIBILITYMAP_ALL_FROZEN | src/include/access/visibilitymapdefs.h | 21 |
| VISIBILITYMAP_VALID_BITS | src/include/access/visibilitymapdefs.h | 22 |
| MAPSIZE | src/backend/access/heap/visibilitymap.c | 113 |
| HEAPBLOCKS_PER_PAGE | src/backend/access/heap/visibilitymap.c | 117 |
| HEAPBLK_TO_MAPBLOCK | src/backend/access/heap/visibilitymap.c | 120 |
| HEAPBLK_TO_MAPBYTE | src/backend/access/heap/visibilitymap.c | 122 |
| HEAPBLK_TO_OFFSET | src/backend/access/heap/visibilitymap.c | 123 |
| VISIBLE_MASK8 | src/backend/access/heap/visibilitymap.c | 126 |
| FROZEN_MASK8 | src/backend/access/heap/visibilitymap.c | 127 |
| visibilitymap_clear | src/backend/access/heap/visibilitymap.c | 145 |
| visibilitymap_pin | src/backend/access/heap/visibilitymap.c | 193 |
| visibilitymap_pin_ok | src/backend/access/heap/visibilitymap.c | 217 |
| visibilitymap_set | src/backend/access/heap/visibilitymap.c | 242 |
| visibilitymap_get_status | src/backend/access/heap/visibilitymap.c | 329 |
| visibilitymap_count | src/backend/access/heap/visibilitymap.c | 382 |
| visibilitymap_prepare_truncate | src/backend/access/heap/visibilitymap.c | 428 |
| visibilitymap_truncation_length | src/backend/access/heap/visibilitymap.c | 506 |
| vm_readbuf | src/backend/access/heap/visibilitymap.c | 524 |
| vm_extend | src/backend/access/heap/visibilitymap.c | 619 |
| PD_ALL_VISIBLE | src/include/storage/bufpage.h | 190 |
| PageIsAllVisible | src/include/storage/bufpage.h | 431 |
| VM_ALL_VISIBLE | src/include/access/visibilitymap.h | 24 |
| VM_ALL_FROZEN | src/include/access/visibilitymap.h | 27 |
| log_heap_visible | src/backend/access/heap/heapam.c | 8813 |
| heap_xlog_visible | src/backend/access/heap/heapam_xlog.c | 182 |
| VISIBILITYMAP_FORKNUM | src/include/common/relpath.h | 61 |

  • The VM stores exactly two bits per heap page, with no page-level header beyond the standard PageHeaderData. Verified by reading MAPSIZE, HEAPBLOCKS_PER_BYTE, and HEAPBLOCKS_PER_PAGE in visibilitymap.c. MAPSIZE = BLCKSZ - MAXALIGN(SizeOfPageHeaderData) and HEAPBLOCKS_PER_BYTE = BITS_PER_BYTE / BITS_PER_HEAPBLOCK = 4.

  • Setting the all-visible bit always requires WAL when the relation needs WAL; clearing never emits its own WAL record. Verified in visibilitymap_set (emits XLOG_HEAP2_VISIBLE) and visibilitymap_clear (only MarkBufferDirty, no XLogInsert). The correctness argument for clearing is in the source comment: callers ensure WAL replay of the heap change also clears the bit.

  • All-frozen must not be set without all-visible. Enforced by Assert(flags != VISIBILITYMAP_ALL_FROZEN) in visibilitymap_set; the companion Assert(flags != VISIBILITYMAP_ALL_VISIBLE) in visibilitymap_clear prevents clearing all-visible while leaving all-frozen set. Verified by reading both assertion sites.

  • visibilitymap_get_status takes no lock. Verified by reading the function body: no LockBuffer call. The source comment explicitly warns the caller about the race. On architectures where a byte read is atomic (all supported PostgreSQL platforms), the race is safe — the worst outcome is a stale conservative read.

  • The snapshotConflictHorizon in xl_heap_visible is used to cancel index-only scans on Hot Standby. Verified in heap_xlog_visible (heapam_xlog.c:182): ResolveRecoveryConflictWithSnapshot is called when InHotStandby is true.

  • VISIBILITYMAP_XLOG_CATALOG_REL is set only in the WAL record, never stored in the VM bitmap. Verified: the flag is or’d into xlrec.flags in log_heap_visible but not passed to the on-disk bit write. The constant is defined in visibilitymapdefs.h with a comment explicitly noting VISIBILITYMAP_XLOG_* constants must not be passed to visibilitymap_set.

  • visibilitymap_count uses pg_popcount_masked with VISIBLE_MASK8=0x55 and FROZEN_MASK8=0xAA. Verified at lines 373–375 in visibilitymap.c. The masks correctly isolate the low and high bits of each two-bit pair.

  1. Lazy initialization of the VM fork. vm_readbuf passes RBM_ZERO_ON_ERROR rather than failing if the VM page is corrupt or missing. This means a corrupt VM page silently becomes all-zeros (all bits clear), which is conservative and safe. Whether there is any mechanism to detect or report such silent zeroing is not verified. Investigation path: trace the RBM_ZERO_ON_ERROR path through ReadBufferExtended and ReadBuffer_common.

  2. pg_class.relallvisible accuracy after crash recovery. VACUUM updates relallvisible by calling visibilitymap_count after the pass. If the server crashes between a VACUUM completing and the catalog update being committed, relallvisible may undercount on recovery. Whether the planner re-checks or simply trusts the catalog value is not verified in this pass. Investigation path: trace how the planner uses relallvisible in costsize.c and whether it guards against stale counts.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

  • InnoDB’s undo logs and purge threads. InnoDB keeps old row versions in undo logs and removes them with background purge threads (its page cleaner, by contrast, flushes dirty pages); there is no explicit visibility bitmap. The contrast with PostgreSQL’s explicit two-bit per-page record is instructive: PostgreSQL pays the cost of maintaining the VM on writes in exchange for O(1) per-page cleanliness queries during VACUUM and IOS. A direct cost comparison across workload types (write-heavy vs. read-heavy vs. vacuum-sensitive) would sharpen the trade-off.

  • MySQL 8 InnoDB persistent statistics and innodb_stats_persistent. InnoDB stores page-cleanliness information differently but also exposes it to the planner for index range scans. Cross-referencing how each system updates planner-visible statistics after background cleanup passes would clarify the VM → relallvisible → IOS cost model in PostgreSQL.

  • VACUUM’s eager scanning mode (PG18). PostgreSQL 18 added an eager scanning sub-mode in which a normal (non-aggressive) VACUUM also scans some pages that are all-visible but not yet all-frozen, freezing them when it can and setting the all-frozen bit. The aim is to spread freeze work over many passes so that a later aggressive VACUUM has less to do. The interaction with visibilitymap_get_status and the skip logic in heap_vac_scan_next_block is an interesting extension of the core VM mechanism described here. See postgres-vacuum.md for the full VACUUM pass description.

  • Predicate locking and SSI. The visibility map guarantees tuple visibility to all transactions, but SSI requires tracking finer-grained read predicates to detect rw-antidependency cycles. The VM cannot be used as a shortcut for SSI predicate checks. See postgres-ssi-predicate-locking.md (planned) for the interaction.

  • BRIN indexes and the visibility map. Block Range INdexes use per-range min/max statistics stored similarly to the VM (compact per-page metadata). Both the VM and BRIN metadata are updated by VACUUM. Whether BRIN’s brin_summarize_range can leverage VM bits to skip summarisation of clean pages is not verified in this pass.

(none — synthesised directly from the source tree)

  • Silberschatz, Korth, Sudarshan. Database System Concepts, 7th ed., §15.6 “Multiversion Concurrency Control”; §18.3 “Multiple Granularity”.
  • Petrov, Alex. Database Internals, ch. 5 §“MVCC Versions and Cleanup”.

Source code paths (REL_18_STABLE, commit 273fe94)

  • src/backend/access/heap/visibilitymap.c — full implementation
  • src/include/access/visibilitymap.h — public API and VM_ALL_* macros
  • src/include/access/visibilitymapdefs.h — bit flag constants
  • src/include/storage/bufpage.h: PD_ALL_VISIBLE, PageIsAllVisible
  • src/include/common/relpath.h: VISIBILITYMAP_FORKNUM enum
  • src/backend/access/heap/heapam.c: log_heap_visible
  • src/backend/access/heap/heapam_xlog.c: heap_xlog_visible redo
  • src/backend/access/heap/vacuumlazy.c — VM-driven page skipping
  • src/backend/executor/nodeIndexonlyscan.c: VM_ALL_VISIBLE in IOS
  • postgres-heap-am.md — heap tuple layout, PD_ALL_VISIBLE lifecycle, HOT
  • postgres-vacuum.md — full VACUUM pass, LVRelState, wraparound failsafe
  • postgres-mvcc-snapshots.md — snapshot acquisition, xmin/xmax visibility
  • postgres-xid-wraparound-freeze.md — freeze mechanism, FrozenTransactionId
  • postgres-buffer-manager.md: ReadBufferExtended, RBM_ZERO_ON_ERROR
  • postgres-xlog-wal.md — WAL insertion, XLogRegisterBuffer
  • postgres-smgr-md.md: smgrexists, smgrnblocks, VISIBILITYMAP_FORKNUM