PostgreSQL Visibility Map — The Two-Bit Per-Page Bitmap That Drives Vacuum Skipping and Index-Only Scans
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
MVCC-based engines never modify tuples in place: a delete sets the xmax field
on the old version; an update inserts a new version and sets xmax on the old
one. The old bytes remain on the page until a background cleaner determines that
no running transaction can see them any more, at which point it reclaims the
space. Database System Concepts (Silberschatz, 7e, §15.6 “Multiversion
Concurrency Control”) characterises this as the fundamental MVCC trade-off:
reads never block writes because each sees its own snapshot, but dead versions
accumulate until something sweeps them away.
Two questions arise for the sweeper:
- Which pages need attention? If a page has no dead tuples and every live tuple is visible to all current and future transactions, there is nothing to do. Re-reading it wastes I/O.
- Which pages can an index-only scan trust? An index-only scan returns attribute values straight from the index leaf page, skipping the heap entirely — but only when it can be certain the indexed row is visible to the query’s snapshot without consulting heap visibility information.
Both questions reduce to the same property: is every tuple on this heap page visible to every transaction that could ever run? Maintaining a compact, per-page record of that property — one that is cheaper to consult than reading the page itself — is the problem the visibility map solves.
The theoretical framing is a conservative approximation: when the bit is set, the property is guaranteed; when the bit is clear, the property may or may not hold. False negatives (cleared bit when the page is actually clean) cause unnecessary work but never wrong answers. False positives (set bit when the page has dead tuples) would be a correctness error, so the design always errs toward clearing bits when in doubt.
The second bit, all-frozen, addresses a distinct but related requirement:
XID wraparound prevention. PostgreSQL uses 32-bit transaction identifiers
compared in modular (circular) arithmetic, so at any moment only about 2 billion
XIDs count as “in the past.” A tuple whose xmin falls more than ~2 billion
transactions behind would suddenly appear to be in the future, and so become
invisible, unless it is frozen first. Freezing marks the tuple as older than
every possible snapshot: historically by overwriting xmin with the special
FrozenTransactionId (XID 2), and since PostgreSQL 9.4 by setting the
HEAP_XMIN_FROZEN infomask bits while leaving the original xmin in place.
A page where every tuple is already frozen
needs no freeze work even during an aggressive anti-wraparound VACUUM. The
all-frozen bit captures exactly that property.
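The circular comparison behind the wraparound problem can be illustrated with a minimal sketch. The helper name `xid_precedes` is hypothetical, and the sketch deliberately ignores PostgreSQL's special XIDs (0, 1, 2), which the real comparison treats separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if XID a is logically older than XID b.
 * Assumption: both are "normal" XIDs; special XIDs are not handled.
 */
bool
xid_precedes(uint32_t a, uint32_t b)
{
    /*
     * Interpret the 32-bit difference as signed: a negative value means
     * a lies behind b on the circle. (The unsigned-to-signed conversion is
     * implementation-defined in C, but behaves as two's complement on
     * mainstream compilers.)
     */
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}
```

Under this rule an XID just before the 32-bit limit precedes a small XID allocated after the counter wrapped, while an XID more than 2^31 transactions behind stops comparing as older. That second case is what freezing prevents.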
Common DBMS Design
Hint bits: cheap per-tuple approximations
Most MVCC engines attach hint bits directly to the tuple or row header to
cache visibility decisions. Once a transaction that inserted a row is known to
be committed, a backend can set an XMIN_COMMITTED hint bit on the tuple so
future backends skip the commit-log lookup. Hint bits are opportunistic and not
WAL-logged: they are set silently by any backend that evaluates visibility, and
re-derived if lost. PostgreSQL uses t_infomask bits (HEAP_XMIN_COMMITTED,
HEAP_XMAX_COMMITTED, etc.) for this. The visibility map is a coarser but more
accessible layer above hint bits: instead of scanning individual tuples, a
reader checks one bit for the entire page.
Separate-file bitmaps for coarse-grained state
Rather than storing per-page metadata inside the heap pages themselves (which would require reading and dirtying the heap to update the metadata), several engines keep auxiliary bitmaps in separate files that can be read without touching the data. PostgreSQL’s free-space map (FSM) uses this same pattern: a separate fork stores approximate free space per page so the inserter can find a candidate page without scanning the heap. The visibility map follows the same structural principle: a dedicated storage fork, one record per heap page, consulted before any heap I/O is attempted.
The two-phase pin protocol for crash safety
When a cleaner marks a page as having a property (all-visible, all-frozen), it
must ensure that the backing on-disk state cannot be lost on a crash. The naive
sequence — update the page in place, then update the auxiliary bitmap — is
vulnerable: if the disk writes happen in the wrong order, the bitmap bit may be
set while the heap page on disk still contains dirty state. The standard
solution is to log both updates together in a single WAL record (or coordinate
their LSNs), so crash recovery can restore both atomically. PostgreSQL’s
XLOG_HEAP2_VISIBLE record encodes exactly this coordination.
Conservative clearing, logged setting
An asymmetry appears in nearly every such system: clearing the bit (saying “this page may be dirty”) never endangers correctness, because the result is at worst a false negative: we do more work than necessary, but never return wrong data. Setting the bit (asserting “this page is clean”) requires a durability guarantee, because a false positive could cause a reader to skip a heap fetch and return stale data. The logging asymmetry (set = WAL-logged, clear = no WAL record of its own, but it must accompany the heap modification) flows directly from this.
Theory ↔ PostgreSQL mapping
| Concept | PostgreSQL name |
|---|---|
| Per-page “all dead-tuples reclaimed” bit | VISIBILITYMAP_ALL_VISIBLE (0x01) in VM fork |
| Per-page “all tuples frozen” bit | VISIBILITYMAP_ALL_FROZEN (0x02) in VM fork |
| Page-level hint mirroring the VM bit | PD_ALL_VISIBLE flag in PageHeaderData.pd_flags |
| Auxiliary bitmap file | VISIBILITYMAP_FORKNUM (fork 2) of the relation |
| WAL record coordinating VM + heap LSN | XLOG_HEAP2_VISIBLE / log_heap_visible |
| Consumer: vacuum skip | visibilitymap_get_status in vacuumlazy.c |
| Consumer: index-only scan skip | VM_ALL_VISIBLE macro in nodeIndexonlyscan.c |
PostgreSQL’s Approach
Storage layout: a two-bit bitmap in fork 2
Each relation has up to four on-disk storage forks:

```
fork 0  MAIN_FORKNUM            heap data pages
fork 1  FSM_FORKNUM             free-space map
fork 2  VISIBILITYMAP_FORKNUM   visibility map  ← this doc
fork 3  INIT_FORKNUM            unlogged-table init fork
```

The visibility map fork stores a compact bitmap: two bits per heap page,
packed into standard 8 KB buffer-manager pages with no special header beyond
the standard PageHeaderData. The two bits for heap block N live at a
well-defined position computed by three macros:

```c
// HEAPBLK_TO_MAPBLOCK / HEAPBLK_TO_MAPBYTE / HEAPBLK_TO_OFFSET (visibilitymap.c)
#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))
#define HEAPBLOCKS_PER_BYTE (BITS_PER_BYTE / BITS_PER_HEAPBLOCK)    /* 4 */
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)
```

BITS_PER_HEAPBLOCK is 2 (from visibilitymapdefs.h), so each byte covers
four heap blocks. With the default 8 KB block size and the 24-byte page header,
MAPSIZE is 8,168 bytes, so a single VM page covers 32,672 heap
pages (roughly 255 MB of heap). The bit layout within each two-bit group is:

```
bit 0 (mask 0x01): VISIBILITYMAP_ALL_VISIBLE
bit 1 (mask 0x02): VISIBILITYMAP_ALL_FROZEN
```

The flag constants are:

```c
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03
```

The all-frozen bit must never be set without the all-visible bit also being set
(enforced by an assertion in visibilitymap_set). A page that is frozen is
necessarily also all-visible.
Figure 1 — VM bit layout per heap block pair
```mermaid
flowchart LR
    subgraph "one byte in VM page"
        B7["bit 7\nblock N+3\nfrozen"]
        B6["bit 6\nblock N+3\nvisible"]
        B5["bit 5\nblock N+2\nfrozen"]
        B4["bit 4\nblock N+2\nvisible"]
        B3["bit 3\nblock N+1\nfrozen"]
        B2["bit 2\nblock N+1\nvisible"]
        B1["bit 1\nblock N\nfrozen"]
        B0["bit 0\nblock N\nvisible"]
    end
```
Figure 1 — Each byte in the VM page stores the all-visible (low bit) and all-frozen (high bit) flags for four consecutive heap blocks. Block N uses bits 0–1, block N+1 uses bits 2–3, and so on.
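The address arithmetic can be worked through with a small standalone sketch. It assumes the default 8 KB block size and a 24-byte page header; the struct and function names (`VmAddress`, `vm_locate`) are illustrative and do not appear in the PostgreSQL source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed build defaults: 8 KB blocks, 24-byte page header. */
enum {
    BLOCK_SIZE = 8192,
    PAGE_HEADER_SIZE = 24,
    BITS_PER_HEAP_BLOCK = 2,
    HEAP_BLOCKS_PER_BYTE = 8 / BITS_PER_HEAP_BLOCK,                       /* 4 */
    MAP_BYTES_PER_PAGE = BLOCK_SIZE - PAGE_HEADER_SIZE,                   /* 8168 */
    HEAP_BLOCKS_PER_VM_PAGE = MAP_BYTES_PER_PAGE * HEAP_BLOCKS_PER_BYTE   /* 32672 */
};

/* Hypothetical result type: where a heap block's two bits live. */
typedef struct VmAddress {
    uint32_t map_block;   /* which VM page */
    uint32_t map_byte;    /* which byte within that page's bitmap */
    uint8_t  bit_offset;  /* shift of the two-bit group within the byte */
} VmAddress;

VmAddress
vm_locate(uint32_t heap_block)
{
    VmAddress addr;

    addr.map_block = heap_block / HEAP_BLOCKS_PER_VM_PAGE;
    addr.map_byte = (heap_block % HEAP_BLOCKS_PER_VM_PAGE) / HEAP_BLOCKS_PER_BYTE;
    addr.bit_offset = (uint8_t) ((heap_block % HEAP_BLOCKS_PER_BYTE) * BITS_PER_HEAP_BLOCK);
    return addr;
}
```

For example, heap block 5 maps to VM page 0, byte 1, bit offset 2, and heap block 32,672 is the first block covered by VM page 1.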
The dual representation: VM bit and PD_ALL_VISIBLE
The visibility map bit is not the only place the all-visible property is
recorded. The heap page itself carries a PD_ALL_VISIBLE flag in
PageHeaderData.pd_flags:

```c
// PageIsAllVisible / PageSetAllVisible / PageClearAllVisible (src/include/storage/bufpage.h)
#define PD_ALL_VISIBLE 0x0004   /* all tuples on page are visible to everyone */
```

The two representations must be kept in sync. When a heap-modifying operation
clears the VM bit, it must also clear PD_ALL_VISIBLE on the heap page, and
it does so in the same critical section and the same WAL record as the heap
change itself. When VACUUM sets the VM bit, it sets PD_ALL_VISIBLE on the
heap page first, then calls visibilitymap_set. The invariant the system
maintains is:
VM bit set ⟹ PD_ALL_VISIBLE set.
VM bit clear does not imply PD_ALL_VISIBLE clear.
If a crash occurs after the VM bit reaches disk but before PD_ALL_VISIBLE is
written to the heap page, WAL replay of the XLOG_HEAP2_VISIBLE record
restores PD_ALL_VISIBLE on the heap page, maintaining the invariant.
Setting a bit: the two-phase pin protocol
Setting a VM bit is a two-step operation:
- Pin the VM page (visibilitymap_pin): acquire a buffer-manager pin on the VM page that covers the target heap block. This may require I/O to read the VM page from disk. Critically, this step happens before locking the heap page, because doing I/O while holding the heap page's buffer lock would stall every other backend that needs that page.
- Set the bit (visibilitymap_set): with the heap page buffer-locked, set the bit and emit the WAL record.

```c
// visibilitymap_pin (src/backend/access/heap/visibilitymap.c)
void
visibilitymap_pin(Relation rel, BlockNumber heapBlk, Buffer *vmbuf)
{
    BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);

    if (BufferIsValid(*vmbuf))
    {
        if (BufferGetBlockNumber(*vmbuf) == mapBlock)
            return;             /* already pinned: reuse */
        ReleaseBuffer(*vmbuf);
    }
    *vmbuf = vm_readbuf(rel, mapBlock, true);   /* extend if needed */
}

// visibilitymap_set (src/backend/access/heap/visibilitymap.c), abridged
uint8
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
                  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
                  uint8 flags)
{
    uint32      mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8       mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    Page        page;
    uint8      *map;
    uint8       status;

    /* ... assertions omitted ... */
    page = BufferGetPage(vmBuf);
    map = (uint8 *) PageGetContents(page);
    LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);

    status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
    if (flags != status)
    {
        START_CRIT_SECTION();
        map[mapByte] |= (flags << mapOffset);
        MarkBufferDirty(vmBuf);
        if (RelationNeedsWAL(rel))
        {
            if (XLogRecPtrIsInvalid(recptr))
            {
                /* not in recovery: emit the WAL record ourselves */
                recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
                /* with checksums or wal_log_hints, the heap page LSN must advance too */
                if (XLogHintBitIsNeeded())
                    PageSetLSN(BufferGetPage(heapBuf), recptr);
            }
            PageSetLSN(page, recptr);
        }
        END_CRIT_SECTION();
    }
    LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
    return status;      /* previous status, for caller's benefit */
}
```

The function returns the previous status of the bits, which lets VACUUM tell
whether it newly set a bit on the page; pg_class.relallvisible itself is derived from visibilitymap_count.
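The pin-reuse behaviour of the first step can be sketched without a buffer manager. The toy below only tracks which VM page is "pinned" and counts how often a new page would have to be fetched; `VmPin`, `vm_pin_init` and `vm_pin` are hypothetical names, and the 32,672 blocks-per-page constant assumes the default 8 KB block size.

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_BLOCKS_PER_VM_PAGE 32672u  /* assumption: 8 KB blocks, 24-byte header */
#define NO_PAGE UINT32_MAX              /* sentinel: nothing pinned yet */

/* Hypothetical stand-in for the caller's "Buffer *vmbuf" state. */
typedef struct VmPin {
    uint32_t pinned_map_block;  /* VM page currently pinned, or NO_PAGE */
    unsigned page_reads;        /* how many times a VM page had to be fetched */
} VmPin;

void
vm_pin_init(VmPin *pin)
{
    pin->pinned_map_block = NO_PAGE;
    pin->page_reads = 0;
}

/* Ensure the VM page covering heap_block is pinned, reusing the pin if possible. */
void
vm_pin(VmPin *pin, uint32_t heap_block)
{
    uint32_t map_block = heap_block / HEAP_BLOCKS_PER_VM_PAGE;

    if (pin->pinned_map_block == map_block)
        return;                 /* same VM page: no I/O, keep the pin */

    /* A real implementation would release the old pin and read the new page here. */
    pin->pinned_map_block = map_block;
    pin->page_reads++;
}
```

Because one VM page covers tens of thousands of heap blocks, a sequential pass over the heap re-pins only rarely: pinning blocks 0, 1 and 32,671 costs a single fetch in this model, and block 32,672 triggers the second.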
Clearing a bit: no WAL needed, must accompany heap modification
Clearing a VM bit is simpler and requires no WAL record of its own:

```c
// visibilitymap_clear (src/backend/access/heap/visibilitymap.c)
bool
visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags)
{
    int         mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    int         mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    uint8       mask = flags << mapOffset;
    char       *map;
    bool        cleared = false;

    /* Assert: never clear all_visible while leaving all_frozen set */
    Assert(flags != VISIBILITYMAP_ALL_VISIBLE);
    /* ... buffer validation ... */
    LockBuffer(vmbuf, BUFFER_LOCK_EXCLUSIVE);
    map = PageGetContents(BufferGetPage(vmbuf));
    if (map[mapByte] & mask)
    {
        map[mapByte] &= ~mask;
        MarkBufferDirty(vmbuf);
        cleared = true;
    }
    LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK);
    return cleared;
}
```

The callers in heapam.c (heap_insert, heap_update, heap_delete, and
heap_lock_tuple) invoke visibilitymap_clear inside the same critical
section as the heap page modification, holding the heap buffer lock throughout.
This ensures that WAL replay of the heap operation also clears the VM bit, even
if a crash occurred before the VM page’s dirty buffer was flushed.
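The bit manipulation in the set and clear paths can be tried in isolation on a plain byte array. This is a toy model only: there are no buffers, locks or WAL, and the `toy_*` names and `TOY_*` constants are illustrative rather than taken from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ALL_VISIBLE 0x01
#define TOY_ALL_FROZEN  0x02
#define TOY_VALID_BITS  0x03

/* Read the two-bit status of one heap block; each byte covers four blocks. */
uint8_t
toy_vm_status(const uint8_t *map, uint32_t heap_block)
{
    unsigned shift = (heap_block % 4) * 2;

    return (uint8_t) ((map[heap_block / 4] >> shift) & TOY_VALID_BITS);
}

/* OR the requested flags in and return the previous status. */
uint8_t
toy_vm_set(uint8_t *map, uint32_t heap_block, uint8_t flags)
{
    uint8_t  old = toy_vm_status(map, heap_block);
    unsigned shift = (heap_block % 4) * 2;

    assert(flags != TOY_ALL_FROZEN);    /* frozen is never set without visible */
    map[heap_block / 4] |= (uint8_t) (flags << shift);
    return old;
}

/* Clear the requested flags; report whether anything actually changed. */
bool
toy_vm_clear(uint8_t *map, uint32_t heap_block, uint8_t flags)
{
    unsigned shift = (heap_block % 4) * 2;
    uint8_t  mask = (uint8_t) (flags << shift);

    assert(flags != TOY_ALL_VISIBLE);   /* never leave frozen set on its own */
    if ((map[heap_block / 4] & mask) == 0)
        return false;
    map[heap_block / 4] &= (uint8_t) ~mask;
    return true;
}
```

Setting all-visible on block 5 turns byte 1 of the map into 0x04, adding all-frozen makes it 0x0C, and a second clear of the same bits reports that nothing changed.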
Reading a bit: lock-free, caller bears responsibility
```c
// visibilitymap_get_status (src/backend/access/heap/visibilitymap.c)
uint8
visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf)
{
    uint32      mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8       mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    char       *map;
    uint8       result;

    /* ... pin the VM page if needed ... */
    map = PageGetContents(BufferGetPage(*vmbuf));
    result = ((map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS);
    return result;
}
```

The read is performed without any lock on the VM page. The comment in the source is explicit: “somebody else could change the bit just after we look at it.” This is intentional. A stale read can err in either direction: reporting “not all-visible” for a page whose bit was just set merely costs extra work, while reporting “all-visible” for a page whose bit was just cleared is the case each caller must reason about. The caller is responsible for handling such races; VACUUM, for example, re-checks under the heap buffer lock before acting on the status.
The convenience macros VM_ALL_VISIBLE and VM_ALL_FROZEN wrap this function:

```c
// VM_ALL_VISIBLE / VM_ALL_FROZEN (src/include/access/visibilitymap.h)
#define VM_ALL_VISIBLE(r, b, v) \
    ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_VISIBLE) != 0)
#define VM_ALL_FROZEN(r, b, v) \
    ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)
```

How VACUUM uses the VM
VACUUM’s inner loop in vacuumlazy.c calls visibilitymap_get_status before
reading each heap page. A page with VISIBILITYMAP_ALL_VISIBLE set can be
skipped for dead-tuple removal; a page with VISIBILITYMAP_ALL_FROZEN set can
be skipped even during aggressive (anti-wraparound) VACUUM. The skip logic is:

```c
// heap_vac_scan_next_block (inner loop), src/backend/access/heap/vacuumlazy.c
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
                                         next_unskippable_block,
                                         &next_unskippable_vmbuffer);

next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
if (!next_unskippable_allvis)
    break;      /* must process this block */
/* frozen check follows for aggressive mode ... */
```

When VACUUM determines that all tuples on a page are visible and optionally
frozen, it calls visibilitymap_set with the appropriate flags. It also calls
visibilitymap_count at the start and end of the pass to update
pg_class.relallvisible, which the planner uses to estimate index-only scan
benefits.
Figure 2 — VACUUM page-skip decision flow
```mermaid
flowchart TD
    A["next heap block"] --> B{"VM: all-visible?"}
    B -->|"no"| C["read heap page\nprocess dead tuples"]
    B -->|"yes"| D{"aggressive vacuum?"}
    D -->|"no"| E["skip block entirely"]
    D -->|"yes"| F{"VM: all-frozen?"}
    F -->|"yes"| E
    F -->|"no"| G["read heap page\nfreeze old XIDs\nset all-frozen if done"]
    C --> H{"page now all-visible?"}
    H -->|"yes"| I["visibilitymap_set\nALL_VISIBLE\nor ALL_FROZEN"]
    H -->|"no"| A
    G --> A
    E --> A
    I --> A
```
Figure 2 — VACUUM’s inner loop checks the VM bit before issuing any heap I/O. All-visible pages are skipped entirely in normal mode; all-frozen pages are also skipped in aggressive mode.
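The decision in Figure 2 reduces to a small predicate. The sketch below is a simplification under stated assumptions: `vacuum_can_skip` is a hypothetical name, and the real skip logic applies further conditions (for instance around runs of skippable pages) that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ALL_VISIBLE 0x01
#define TOY_ALL_FROZEN  0x02

/*
 * Hypothetical predicate mirroring Figure 2: may VACUUM skip a heap page,
 * given its two-bit VM status and whether the pass is aggressive?
 */
bool
vacuum_can_skip(uint8_t vm_status, bool aggressive)
{
    if ((vm_status & TOY_ALL_VISIBLE) == 0)
        return false;           /* page may hold dead tuples: must read it */
    if (!aggressive)
        return true;            /* all-visible is enough for a normal pass */
    return (vm_status & TOY_ALL_FROZEN) != 0;   /* aggressive: need all-frozen */
}
```

A page that is all-visible but not all-frozen is therefore skipped by a normal pass and read by an aggressive one.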
How index-only scans use the VM
An index-only scan (IOS) returns attribute values from the index leaf without a heap fetch, but only when the heap tuple is known visible to the scan’s snapshot. Without the VM, the executor would need to fetch the heap page for every index entry just to verify visibility. The VM eliminates most of those fetches:

```c
// IndexOnlyNext (src/backend/executor/nodeIndexonlyscan.c)
if (!VM_ALL_VISIBLE(scandesc->heapRelation,
                    ItemPointerGetBlockNumber(tid),
                    &node->ioss_VMBuffer))
{
    /* must visit heap to check visibility */
    InstrCountTuples2(node, 1);
    if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
        continue;       /* no visible tuple, try next index entry */
    /* ... */
}
```

When VM_ALL_VISIBLE returns true, the executor trusts that the tuple is
visible to all transactions and skips the heap fetch entirely. On a large,
mostly-static table where VACUUM has set VM bits on most pages, an IOS can
answer a query entirely from the index with zero heap reads.
Figure 3 — Index-only scan visibility decision
```mermaid
flowchart TD
    IDX["index leaf entry\n(tid, attributes)"] --> VM{"VM_ALL_VISIBLE\n(tid.block)?"}
    VM -->|"true"| RET["return attributes\nfrom index\n(no heap fetch)"]
    VM -->|"false"| HEAP["fetch heap tuple\ncheck MVCC visibility"]
    HEAP --> VIS{"tuple visible\nto snapshot?"}
    VIS -->|"yes"| RET2["return attributes\nfrom heap"]
    VIS -->|"no"| SKIP["skip — try\nnext index entry"]
```
Figure 3 — When the VM bit is set for a heap block, the index-only scan returns data directly from the index without touching the heap. When the bit is clear, it falls back to heap visibility checking.
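The effect of the VM on an index-only scan can be approximated by counting how many index entries would still need a heap visit. The function below is a toy model with a hypothetical name (`count_heap_fetches`); it takes a raw two-bits-per-block map and the heap block numbers of the index entries, and loosely corresponds to the per-entry counter incremented in the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: given a two-bits-per-block visibility map and the heap
 * block number of each index entry, count the entries whose page is not
 * marked all-visible and would therefore require a heap fetch.
 */
size_t
count_heap_fetches(const uint8_t *vm_map, const uint32_t *tid_blocks, size_t n)
{
    size_t fetches = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint32_t blk = tid_blocks[i];
        uint8_t  status = (uint8_t) ((vm_map[blk / 4] >> ((blk % 4) * 2)) & 0x03);

        if ((status & 0x01) == 0)   /* all-visible bit clear */
            fetches++;
    }
    return fetches;
}
```

With a map byte of 0x05 (blocks 0 and 1 all-visible, blocks 2 and 3 not), six index entries pointing at blocks 0, 0, 1, 2, 3, 3 need three heap fetches in this model.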
WAL logging: XLOG_HEAP2_VISIBLE
When visibilitymap_set emits a WAL record, it calls log_heap_visible in
heapam.c:

```c
// log_heap_visible (src/backend/access/heap/heapam.c)
XLogRecPtr
log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
                 TransactionId snapshotConflictHorizon, uint8 vmflags)
{
    xl_heap_visible xlrec;
    XLogRecPtr  recptr;
    uint8       flags;

    xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
    xlrec.flags = vmflags;
    if (RelationIsAccessibleInLogicalDecoding(rel))
        xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;

    XLogBeginInsert();
    XLogRegisterData(&xlrec, SizeOfHeapVisible);
    XLogRegisterBuffer(0, vm_buffer, 0);

    /* the heap page needs a full-page image only if hint-bit logging is on */
    flags = REGBUF_STANDARD;
    if (!XLogHintBitIsNeeded())
        flags |= REGBUF_NO_IMAGE;
    XLogRegisterBuffer(1, heap_buffer, flags);

    recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
    return recptr;
}
```

The record registers both the VM buffer and the heap buffer. During replay,
heap_xlog_visible in heapam_xlog.c sets PD_ALL_VISIBLE on the heap page
and sets the VM bit. The snapshotConflictHorizon field is used on Hot Standby
to resolve recovery conflicts: if a query on the standby holds a snapshot whose
xmin is not newer than snapshotConflictHorizon, it is cancelled before it
can return incorrect results based on the newly-set all-visible bit.
The additional VISIBILITYMAP_XLOG_CATALOG_REL flag is set for relations that
logical decoding may read as catalogs (including user catalog tables). It supports
logical decoding on standbys: replay can tell that the record touches a catalog
relation and resolve conflicts with logical replication slots that still need the old row versions.
VM file extension and truncation
When a heap relation grows, the VM fork may also need to grow. vm_readbuf
(the internal reader) calls vm_extend if the requested VM block number
exceeds the cached size:

```c
// vm_extend (src/backend/access/heap/visibilitymap.c)
static Buffer
vm_extend(Relation rel, BlockNumber vm_nblocks)
{
    Buffer      buf;

    buf = ExtendBufferedRelTo(BMR_REL(rel), VISIBILITYMAP_FORKNUM, NULL,
                              EB_CREATE_FORK_IF_NEEDED |
                              EB_CLEAR_SIZE_CACHE,
                              vm_nblocks,
                              RBM_ZERO_ON_ERROR);
    CacheInvalidateSmgr(RelationGetSmgr(rel)->smgr_rlocator);
    return buf;
}
```

The EB_CREATE_FORK_IF_NEEDED flag creates the VM fork on first use.
CacheInvalidateSmgr broadcasts a shared-invalidation message so other
backends close their cached smgr references to the relation, avoiding stale
file-size state.
When a relation is truncated, visibilitymap_prepare_truncate clears the
trailing bits in the last surviving VM page and returns the new VM size in
blocks. If the new heap size falls exactly on a VM page boundary, no bit
clearing is needed. The actual smgrtruncate call is the caller’s
responsibility (RelationTruncate in src/backend/catalog/storage.c, reached for
example when VACUUM trims empty pages from the end of the heap).
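The trailing-bit clearing can be sketched on a single in-memory map page. This is a toy under stated assumptions: `toy_vm_truncate` is a hypothetical name, the map is a plain byte array with four heap blocks per byte, and buffer locking, WAL and the multi-page case are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical single-page model of truncation: after the heap shrinks to
 * new_heap_blocks blocks, wipe the VM bits of every block at or beyond that
 * count. Each map byte holds the two-bit status of four heap blocks.
 */
void
toy_vm_truncate(uint8_t *map, size_t map_len, uint32_t new_heap_blocks)
{
    size_t   trunc_byte = new_heap_blocks / 4;
    unsigned trunc_offset = (new_heap_blocks % 4) * 2;

    if (trunc_byte >= map_len)
        return;                 /* nothing on this page lies beyond the new end */

    /* Zero every byte after the one holding the first removed block. */
    memset(&map[trunc_byte + 1], 0, map_len - (trunc_byte + 1));

    /* In the boundary byte, keep only the bits of surviving blocks. */
    map[trunc_byte] &= (uint8_t) ((1u << trunc_offset) - 1);
}
```

Truncating an all-ones eight-byte map to five heap blocks leaves byte 0 untouched, reduces byte 1 to 0x03 (only block 4 survives there), and zeroes the rest.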
Counting VM bits for the planner
```c
// visibilitymap_count (src/backend/access/heap/visibilitymap.c), abridged
void
visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen)
{
    BlockNumber mapBlock;
    BlockNumber nvisible = 0;
    BlockNumber nfrozen = 0;

    for (mapBlock = 0;; mapBlock++)
    {
        Buffer      mapBuffer;
        uint64     *map;

        mapBuffer = vm_readbuf(rel, mapBlock, false);
        if (!BufferIsValid(mapBuffer))
            break;      /* ran off the end of the VM fork */
        map = (uint64 *) PageGetContents(BufferGetPage(mapBuffer));
        nvisible += pg_popcount_masked((const char *) map, MAPSIZE, VISIBLE_MASK8);
        if (all_frozen)
            nfrozen += pg_popcount_masked((const char *) map, MAPSIZE, FROZEN_MASK8);
        ReleaseBuffer(mapBuffer);
    }
    *all_visible = nvisible;
    if (all_frozen)
        *all_frozen = nfrozen;
}
```

VISIBLE_MASK8 (0x55) selects the all-visible bits (the low bit of each
two-bit pair) and FROZEN_MASK8 (0xAA) selects the all-frozen bits.
pg_popcount_masked counts set bits in the byte range after ANDing with the
mask. VACUUM calls this function before and after the pass to update
pg_class.relallvisible, which the query planner uses to estimate the fraction
of pages that an index-only scan can skip.
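The masked counting can be reproduced with a portable stand-in. The function below is not PostgreSQL's pg_popcount_masked; it is a hypothetical byte-at-a-time equivalent (`popcount_masked`) that ANDs each byte with the mask and counts the remaining set bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical portable stand-in for a masked popcount: count the bits that
 * remain set in buf[0..len) after ANDing every byte with mask.
 */
uint64_t
popcount_masked(const uint8_t *buf, size_t len, uint8_t mask)
{
    uint64_t count = 0;

    for (size_t i = 0; i < len; i++)
    {
        uint8_t v = buf[i] & mask;

        /* Kernighan's trick: each iteration clears the lowest set bit. */
        while (v != 0)
        {
            v &= (uint8_t) (v - 1);
            count++;
        }
    }
    return count;
}
```

For the four map bytes 0x03, 0x01, 0x55, 0xFF, the 0x55 mask counts 10 all-visible pages and the 0xAA mask counts 5 all-frozen pages.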
Source Walkthrough
API surface (public functions)
- visibilitymap_clear: clear specified bits for one page; returns true if any bits were actually cleared. Caller must hold a buffer pin on the correct VM page (obtained via visibilitymap_pin) and the heap page must be buffer-locked. No WAL emitted here; correctness depends on the caller’s heap-modification WAL record also clearing the bit at redo time.
- visibilitymap_pin: acquire (or reuse) a buffer-manager pin on the VM page covering heapBlk. Must be called before visibilitymap_set and before locking the heap page. Extends the VM fork if the page does not yet exist.
- visibilitymap_pin_ok: test whether a previously pinned buffer still covers heapBlk. Used to avoid re-pinning when multiple operations on the same VM page are batched.
- visibilitymap_set: set one or both bits on a previously pinned VM page. Caller must have set PD_ALL_VISIBLE on the heap page and hold the heap buffer locked. Emits XLOG_HEAP2_VISIBLE when WAL is required. Returns the previous bit status.
- visibilitymap_get_status: return the two-bit status for heapBlk. No lock taken; lock-free read of a single byte (atomic on all supported architectures). Caller must handle the possibility of a stale read.
- visibilitymap_count: scan the entire VM fork and count all-visible and all-frozen pages. No lock; approximate. Used by VACUUM to update pg_class.relallvisible.
- visibilitymap_prepare_truncate: prepare for a relation truncation: clear trailing bits in the last surviving VM page, return the new VM block count. Returns InvalidBlockNumber if no truncation of the VM fork is required.
- visibilitymap_truncation_length: pure computation: given a proposed heap truncation length, return the correct VM truncation length. No side effects.
Internal helpers
- vm_readbuf: read (or extend) a VM page using ReadBufferExtended with RBM_ZERO_ON_ERROR. Initialises new pages via PageInit. Caches the VM fork block count in smgr_cached_nblocks[VISIBILITYMAP_FORKNUM] to avoid repeated smgrnblocks calls.
- vm_extend: extend the VM fork to at least vm_nblocks blocks using ExtendBufferedRelTo. Sends a CacheInvalidateSmgr message after extension.
Key call sites in callers
- vacuumlazy.c: calls visibilitymap_pin before the main heap scan, visibilitymap_get_status per block in the skip-decision loop, visibilitymap_set after determining a page is all-visible/all-frozen, and visibilitymap_count at the start and end of the VACUUM pass.
- heapam.c: heap_insert, heap_update, heap_delete, heap_lock_tuple, and related functions call visibilitymap_clear (with a pre-pinned buffer) when they modify a page that had its VM bit set.
- nodeIndexonlyscan.c: IndexOnlyNext calls VM_ALL_VISIBLE for every candidate index entry to decide whether to skip the heap fetch.
Position-hint table (commit 273fe94, 2026-06-05)
| Symbol | File | Line |
|---|---|---|
| BITS_PER_HEAPBLOCK | src/include/access/visibilitymapdefs.h | 17 |
| VISIBILITYMAP_ALL_VISIBLE | src/include/access/visibilitymapdefs.h | 20 |
| VISIBILITYMAP_ALL_FROZEN | src/include/access/visibilitymapdefs.h | 21 |
| VISIBILITYMAP_VALID_BITS | src/include/access/visibilitymapdefs.h | 22 |
| MAPSIZE | src/backend/access/heap/visibilitymap.c | 113 |
| HEAPBLOCKS_PER_PAGE | src/backend/access/heap/visibilitymap.c | 117 |
| HEAPBLK_TO_MAPBLOCK | src/backend/access/heap/visibilitymap.c | 120 |
| HEAPBLK_TO_MAPBYTE | src/backend/access/heap/visibilitymap.c | 122 |
| HEAPBLK_TO_OFFSET | src/backend/access/heap/visibilitymap.c | 123 |
| VISIBLE_MASK8 | src/backend/access/heap/visibilitymap.c | 126 |
| FROZEN_MASK8 | src/backend/access/heap/visibilitymap.c | 127 |
| visibilitymap_clear | src/backend/access/heap/visibilitymap.c | 145 |
| visibilitymap_pin | src/backend/access/heap/visibilitymap.c | 193 |
| visibilitymap_pin_ok | src/backend/access/heap/visibilitymap.c | 217 |
| visibilitymap_set | src/backend/access/heap/visibilitymap.c | 242 |
| visibilitymap_get_status | src/backend/access/heap/visibilitymap.c | 329 |
| visibilitymap_count | src/backend/access/heap/visibilitymap.c | 382 |
| visibilitymap_prepare_truncate | src/backend/access/heap/visibilitymap.c | 428 |
| visibilitymap_truncation_length | src/backend/access/heap/visibilitymap.c | 506 |
| vm_readbuf | src/backend/access/heap/visibilitymap.c | 524 |
| vm_extend | src/backend/access/heap/visibilitymap.c | 619 |
| PD_ALL_VISIBLE | src/include/storage/bufpage.h | 190 |
| PageIsAllVisible | src/include/storage/bufpage.h | 431 |
| VM_ALL_VISIBLE | src/include/access/visibilitymap.h | 24 |
| VM_ALL_FROZEN | src/include/access/visibilitymap.h | 27 |
| log_heap_visible | src/backend/access/heap/heapam.c | 8813 |
| heap_xlog_visible | src/backend/access/heap/heapam_xlog.c | 182 |
| VISIBILITYMAP_FORKNUM | src/include/common/relpath.h | 61 |
Source verification (as of 2026-06-05)
Verified facts
- The VM stores exactly two bits per heap page, with no page-level header beyond the standard PageHeaderData. Verified by reading MAPSIZE, HEAPBLOCKS_PER_BYTE, and HEAPBLOCKS_PER_PAGE in visibilitymap.c. MAPSIZE = BLCKSZ - MAXALIGN(SizeOfPageHeaderData) and HEAPBLOCKS_PER_BYTE = BITS_PER_BYTE / BITS_PER_HEAPBLOCK = 4.
- Setting the all-visible bit always requires WAL when the relation needs WAL; clearing never emits its own WAL record. Verified in visibilitymap_set (emits XLOG_HEAP2_VISIBLE) and visibilitymap_clear (only MarkBufferDirty, no XLogInsert). The correctness argument for clearing is in the source comment: callers ensure WAL replay of the heap change also clears the bit.
- All-frozen must not be set without all-visible. Enforced by Assert(flags != VISIBILITYMAP_ALL_FROZEN) in visibilitymap_set and, symmetrically, by Assert(flags != VISIBILITYMAP_ALL_VISIBLE) in visibilitymap_clear, which forbids clearing all-visible while leaving VISIBILITYMAP_ALL_FROZEN set. Verified by reading both assertion sites.
- visibilitymap_get_status takes no lock. Verified by reading the function body: no LockBuffer call. The source comment explicitly warns the caller about the race. On architectures where a byte read is atomic (all supported PostgreSQL platforms), the read itself cannot be torn; the remaining risk is staleness, which the caller must handle.
- The snapshotConflictHorizon in xl_heap_visible is used to cancel conflicting queries, such as index-only scans, on Hot Standby. Verified in heap_xlog_visible (heapam_xlog.c:182): ResolveRecoveryConflictWithSnapshot is called when InHotStandby is true.
- VISIBILITYMAP_XLOG_CATALOG_REL is set only in the WAL record, never stored in the VM bitmap. Verified: the flag is or’d into xlrec.flags in log_heap_visible but not passed to the on-disk bit write. The constant is defined in visibilitymapdefs.h with a comment explicitly noting VISIBILITYMAP_XLOG_* constants must not be passed to visibilitymap_set.
- visibilitymap_count uses pg_popcount_masked with VISIBLE_MASK8 = 0x55 and FROZEN_MASK8 = 0xAA. Verified in the body of visibilitymap_count in visibilitymap.c. The masks correctly isolate the low and high bits of each two-bit pair.
Open questions
- Lazy initialization of the VM fork. vm_readbuf passes RBM_ZERO_ON_ERROR rather than failing if the VM page is corrupt or missing. This means a corrupt VM page silently becomes all-zeros (all bits clear), which is conservative and safe. Whether there is any mechanism to detect or report such silent zeroing is not verified. Investigation path: trace the RBM_ZERO_ON_ERROR path through ReadBufferExtended and ReadBuffer_common.
- pg_class.relallvisible accuracy after crash recovery. VACUUM updates relallvisible by calling visibilitymap_count after the pass. If the server crashes between a VACUUM completing and the catalog update being committed, relallvisible may undercount on recovery. Whether the planner re-checks or simply trusts the catalog value is not verified in this pass. Investigation path: trace how the planner uses relallvisible in costsize.c and whether it guards against stale counts.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
- InnoDB’s change buffer and page cleaner. InnoDB keeps old row versions in undo logs and reclaims them with background purge threads; there is no explicit per-page visibility bitmap. The contrast with PostgreSQL’s explicit two-bit per-page record is instructive: PostgreSQL pays the cost of maintaining the VM whenever a write touches a page whose bits are set, in exchange for O(1) per-page cleanliness queries during VACUUM and IOS. A direct cost comparison across workload types (write-heavy vs. read-heavy vs. vacuum-sensitive) would sharpen the trade-off.
- MySQL 8 InnoDB persistent statistics and innodb_stats_persistent. InnoDB’s persistent optimizer statistics are likewise refreshed in the background and consumed by the planner, although they describe index cardinality and size rather than page cleanliness. Cross-referencing how each system updates planner-visible statistics after background cleanup passes would clarify the VM → relallvisible → IOS cost model in PostgreSQL.
- VACUUM’s eager scanning mode (PG18). PostgreSQL 18 added eager scanning to normal (non-aggressive) VACUUMs: instead of skipping every all-visible page, VACUUM reads a bounded number of pages that are all-visible but not yet all-frozen and tries to freeze them, which spreads out work that would otherwise fall on a later aggressive VACUUM. The vacuum_max_eager_freeze_failure_rate setting limits how many unsuccessful attempts are tolerated. The interaction with visibilitymap_get_status and the skip logic in heap_vac_scan_next_block is an interesting extension of the core VM mechanism described here. See postgres-vacuum.md for the full VACUUM pass description.
- Predicate locking and SSI. The visibility map guarantees tuple visibility to all transactions, but SSI requires tracking finer-grained read predicates to detect rw-antidependency cycles. The VM cannot be used as a shortcut for SSI predicate checks. See postgres-ssi-predicate-locking.md (planned) for the interaction.
- BRIN indexes and the visibility map. Block Range INdexes keep per-range min/max summaries, which resemble the VM in being compact metadata about groups of heap pages. Both the VM and BRIN metadata are updated by VACUUM. Whether BRIN’s brin_summarize_range can leverage VM bits to skip summarisation of clean pages is not verified in this pass.
Sources
Raw files consumed
(none; synthesised directly from the source tree)
Textbook and paper references
- Silberschatz, Korth, Sudarshan. Database System Concepts, 7th ed., §15.6 “Multiversion Concurrency Control”; §18.3 “Multiple Granularity”.
- Petrov, Alex. Database Internals, ch. 5 §“MVCC Versions and Cleanup”.
Source code paths (REL_18_STABLE, commit 273fe94)
- src/backend/access/heap/visibilitymap.c: full implementation
- src/include/access/visibilitymap.h: public API and VM_ALL_* macros
- src/include/access/visibilitymapdefs.h: bit flag constants
- src/include/storage/bufpage.h: PD_ALL_VISIBLE, PageIsAllVisible
- src/include/common/relpath.h: VISIBILITYMAP_FORKNUM enum
- src/backend/access/heap/heapam.c: log_heap_visible
- src/backend/access/heap/heapam_xlog.c: heap_xlog_visible redo
- src/backend/access/heap/vacuumlazy.c: VM-driven page skipping
- src/backend/executor/nodeIndexonlyscan.c: VM_ALL_VISIBLE in IOS
Cross-references within this KB
- postgres-heap-am.md: heap tuple layout, PD_ALL_VISIBLE lifecycle, HOT
- postgres-vacuum.md: full VACUUM pass, LVRelState, wraparound failsafe
- postgres-mvcc-snapshots.md: snapshot acquisition, xmin/xmax visibility
- postgres-xid-wraparound-freeze.md: freeze mechanism, FrozenTransactionId
- postgres-buffer-manager.md: ReadBufferExtended, RBM_ZERO_ON_ERROR
- postgres-xlog-wal.md: WAL insertion, XLogRegisterBuffer
- postgres-smgr-md.md: smgrexists, smgrnblocks, VISIBILITYMAP_FORKNUM