PostgreSQL Vacuum & Visibility — From Plain Lazy Vacuum to HOT, Freeze Maps, and Parallelism
Contents:
- Why this subsystem had to evolve (the original limitation)
- Timeline
- Era 0 — Plain lazy vacuum (the 8.x baseline)
- Era 1 — HOT: heap-only tuples (8.3, 2007)
- Era 2 — The visibility map: skip clean pages (8.4, 2009)
- Era 3 — The all-frozen bit / freeze map: skip frozen pages (9.6, 2016)
- Era 4 — Parallel index vacuum (13, 2020)
- Era 5 — The wraparound failsafe (14, 2021)
- Era 6 — TidStore: the dead-TID radix store (17, 2024)
- Where it stands at REL_18
- Sources
Why this subsystem had to evolve (the original limitation)
PostgreSQL is a no-overwrite MVCC engine. A DELETE does not erase a row and
an UPDATE does not rewrite one in place; instead the storage layer stamps the
old version’s xmax and (for an update) writes a brand-new version elsewhere.
The mechanism is covered in the current-state module doc
postgres-heap-am.md: every tuple lives at an
ItemId slot on a heap page, visibility is decided per tuple by comparing
xmin/xmax against a snapshot, and the dead bytes simply sit there until
something sweeps them. That something is vacuum.
This design buys the headline MVCC property — readers never block writers and writers never block readers, because each transaction reads its own snapshot — but it hands the implementation two open-ended bills that grow with write volume:
- Dead-tuple reclamation. Updates and deletes leak space. A heavily updated table bloats without bound, and index/heap scans walk ever-longer runs of dead versions, unless a background process reclaims them. The hard part is that “this version is dead” is a global predicate: vacuum must know that no snapshot anywhere in the cluster can still see the row. That horizon (OldestXmin) is computed across every active backend.
- Transaction-ID wraparound. XIDs are 32-bit. The visibility comparator treats the XID space as a circle with a 2^31 horizon, so a tuple stamped with an xmin more than ~2.1 billion transactions in the past would suddenly look like it came from the future and become invisible: silent data loss. The defence is to freeze old tuples (mark them so the comparator always treats their xmin as “in the past”) before they age out. Freezing is only correct if it touches every old tuple, so it is fundamentally a full-table obligation. The mechanism lives in postgres-xid-wraparound-freeze.md.
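The circular comparison behind the wraparound hazard can be sketched in a few lines of Python. This is a minimal model that follows the shape of TransactionIdPrecedes in src/backend/access/transam/transam.c: XIDs below FirstNormalTransactionId (3) compare as plain integers, and normal XIDs compare by the sign of their 32-bit difference. FROZEN_XID stands in for the frozen state (older releases stored FrozenTransactionId, value 2, directly in xmin), and the helper name xid_precedes is hypothetical.

```python
FROZEN_XID = 2          # mirrors FrozenTransactionId
FIRST_NORMAL_XID = 3    # mirrors FirstNormalTransactionId
XID_MASK = 0xFFFFFFFF   # XIDs are 32-bit counters that wrap around

def xid_precedes(id1: int, id2: int) -> bool:
    """True if id1 is logically older than id2 on the circular XID space."""
    if id1 < FIRST_NORMAL_XID or id2 < FIRST_NORMAL_XID:
        # Special XIDs (invalid, bootstrap, frozen) compare as plain integers,
        # so FROZEN_XID precedes every normal XID.
        return id1 < id2
    diff = (id1 - id2) & XID_MASK
    if diff >= 2**31:       # reinterpret the 32-bit difference as a signed int32
        diff -= 2**32
    return diff < 0

old_xmin = 100
now = (old_xmin + 2**31 + 10) & XID_MASK   # a little over 2^31 transactions later

print(xid_precedes(old_xmin, 200))          # True: the ordinary case
print(xid_precedes(old_xmin, now))          # False: the old row now looks "in the future"
print(xid_precedes(FROZEN_XID, now))        # True: a frozen row is always in the past
```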
The original implementation met both bills with the bluntest possible tool: a single-threaded process that read every page of the table, collected the dead line pointers into a flat array, walked every index to delete the matching entries, then went back and reaped the dead heap line pointers. Two sub-problems, one full scan, no shortcuts. That worked at the table sizes of 2005 and fell apart as tables grew, as update rates climbed, and as the anti-wraparound obligation forced repeated full scans over data that had not changed in months.
Every era below is a response to a specific way that brute-force loop did not scale. Read as a whole, the arc moves through four distinct strategies:
- Don’t make garbage (HOT) — avoid creating index bloat and let a cheap single-page prune reclaim space without vacuum at all.
- Don’t read what’s clean (visibility map) — record per page whether vacuum can skip it, so the scan is proportional to churn, not to table size.
- Don’t re-read what’s static (all-frozen bit) — extend that skip to the anti-wraparound scan, the one obligation the visibility map originally could not satisfy.
- Parallelise and survive (parallel index vacuum, failsafe, TidStore) — when the scan is unavoidable, spread index cleanup across workers, refuse to let wraparound win under emergency pressure, and stop wasting memory on the dead-TID bookkeeping.
Timeline
```mermaid
timeline
title PostgreSQL Vacuum and Visibility Evolution
section Brute force
8.x baseline : Plain lazy vacuum : Full heap scan, full index scan, flat dead-TID array
section Avoid and skip
8.3 (2007) : HOT heap-only tuples : Same-page update chains, no new index entry, single-page prune
8.4 (2009) : Visibility map : One all-visible bit per page, vacuum skips clean pages, index-only scans
section Skip the static
9.6 (2016) : All-frozen bit / freeze map : Second VM bit, anti-wraparound scan skips frozen pages
section Scale and survive
13 (2020) : Parallel index vacuum : Per-index parallel workers via DSM
14 (2021) : Wraparound failsafe : Bypass index vacuuming when relfrozenxid is dangerously old
17 (2024) : TidStore radix store : Compressed adaptive-radix dead-TID store replaces flat array
```
Era 0 — Plain lazy vacuum (the 8.x baseline)
What it was. Before any of the optimisations below, VACUUM (the
non-FULL “lazy” variant introduced in 7.2; the old-style vacuum that locked the
table and shrank it on disk survived as VACUUM FULL) ran a fixed three-phase loop over the entire relation:
- Phase I: heap scan. Read every heap page from block 0 to the end. On each page, decide tuple by tuple which tuples are dead (their deleting transaction committed and is older than OldestXmin, so no snapshot can see them). Append each dead item’s (block, offset) TID to an in-memory array sized by maintenance_work_mem. While here, freeze any tuple older than the freeze cutoff.
- Phase II: index vacuum. For each index on the table, call its ambulkdelete entry point, which scans the entire index and removes any entry pointing at a TID in the dead array. This is the expensive phase: it is O(index size) per index, and it runs once per fill of the dead-TID array.
- Phase III: heap reap. Walk the heap pages that held dead items again and mark their dead line pointers unused (in today’s terms, LP_DEAD becomes LP_UNUSED), making the space reusable, and update the free space map.
If the dead-TID array filled before the heap scan finished, vacuum flushed it
by running Phases II and III early, then resumed the heap scan — meaning a big
table with a small maintenance_work_mem could scan each index several times
in one vacuum.
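The overflow behaviour can be illustrated with a toy simulation of the three-phase loop. Everything here is hypothetical scaffolding: simulate_baseline_vacuum is not a PostgreSQL function, and it flushes exactly when the array is full, whereas the real loop is understood to reserve a page's worth of headroom before each heap page. It shows the control flow only, not the I/O.

```python
def simulate_baseline_vacuum(dead_per_block, n_indexes, capacity):
    """Toy model of the pre-HOT three-phase loop; returns a log of the phases run.

    dead_per_block[i] is the number of dead tuples found on heap block i;
    capacity is how many TIDs the flat dead-TID array can hold.
    """
    log = []
    dead_tids = []                        # flat array of (block, offset) TIDs

    def flush():
        for idx in range(n_indexes):     # Phase II: one full bulk-delete scan per index
            log.append(f"II:index{idx}")
        log.append(f"III:reap {len(dead_tids)}")   # Phase III: mark line pointers unused
        dead_tids.clear()

    for block, n_dead in enumerate(dead_per_block):    # Phase I visits every block
        for offset in range(1, n_dead + 1):
            if len(dead_tids) == capacity:            # array full: flush early, then resume
                flush()
            dead_tids.append((block, offset))
    if dead_tids:                                     # final flush after the heap scan
        flush()
    return log

log = simulate_baseline_vacuum([4, 4, 4], n_indexes=2, capacity=5)
print(log)
```

With 12 dead TIDs and room for 5, both indexes are scanned three times: two early flushes during the heap scan and a final one after it.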
Why it did not scale. Three structural costs were baked in:
- Phase I is O(table size), always. Even a 500 GB table that received ten updates last night was read in full. There was no per-page record of “nothing changed here,” so there was nothing to skip.
- Every ordinary UPDATE created index bloat. Changing one non-indexed column still wrote a new heap tuple and a new entry in every index, because the index pointed at the physical TID and the new version had a new TID. The indexes grew as fast as the heap.
- The dead-TID array was a flat, fixed-width array. Each dead tuple cost a full ItemPointerData (6 bytes), capped by maintenance_work_mem. A vacuum that found more dead tuples than the array could hold paid for extra full index scans.
And on top of all that, the anti-wraparound obligation (Era 3’s subject) forced periodic aggressive vacuums that could not skip anything, because freezing must visit every unfrozen tuple. The baseline had no way to remember that a page was already entirely frozen, so an anti-wraparound vacuum re-read even cold, static, decade-old data.
A concrete worked example makes the pain tangible. Suppose a 200 GB table with
six indexes, maintenance_work_mem set so the dead-TID array holds ~11 million
TIDs, and an overnight batch that deletes 50 million rows. The baseline vacuum
reads all 200 GB of heap (Phase I), and because 50 million dead TIDs take five
fills of the 11-million-entry array (four full batches plus a partial one), it
runs Phase II (six full index scans) five separate times, i.e. thirty full index scans in one vacuum,
interleaved with five Phase III reaping passes. Nothing about this is
proportional to anything useful; it is proportional to (table size) + (dead
tuples / array capacity) × (number of indexes). Each era below knocks out one
of these multipliers.
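The worked example's arithmetic can be reproduced directly. The 64 MB maintenance_work_mem is an assumption, chosen because 64 MB divided by 6 bytes per TID gives roughly the 11-million-TID capacity quoted above; the real array's small header is ignored.

```python
import math

BYTES_PER_TID = 6                       # sizeof(ItemPointerData)
mwm_bytes = 64 * 1024 * 1024            # assumed maintenance_work_mem = 64MB
capacity = mwm_bytes // BYTES_PER_TID   # TIDs the flat array can hold

dead_tuples = 50_000_000
n_indexes = 6

passes = math.ceil(dead_tuples / capacity)   # Phase II/III rounds in one vacuum
index_scans = passes * n_indexes             # full index scans in total

print(capacity, passes, index_scans)         # 11184810 5 30
```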
The structural shape (three serial phases driven by lazy_scan_heap, a dead-TID
store, per-index bulk delete) is still the skeleton described in
postgres-vacuum.md today. Everything below is a layer
bolted onto that skeleton to make one of its three costs disappear.
Era 1 — HOT: heap-only tuples (8.3, 2007)
The change. PostgreSQL 8.3 introduced HOT — Heap-Only Tuples — the
first big lever, and the only one in this whole arc that attacks the problem
before vacuum runs at all. The premise: most updates do not change indexed
columns. If an UPDATE changes no column that any index covers, there is no
reason to add new index entries, because every index’s keys are unchanged. HOT
makes that update place the new version on the same heap page and chain it
off the old version’s t_ctid, with no new index tuple created.
The index entry that already exists keeps pointing at the original line
pointer (the “HOT chain root”). An index scan that lands on the root follows
the t_ctid chain forward through the same-page versions until it finds the one
visible to its snapshot. The chain is invisible to indexes; it lives entirely
inside one heap page. The mechanism — HEAP_HOT_UPDATED and
HEAP_ONLY_TUPLE infomask bits, the root-offset redirect, and the chain walk —
is documented in postgres-heap-am.md and the
in-tree src/backend/access/heap/README.HOT.
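A toy model of the index-to-heap lookup makes the chain walk concrete. The classes and helpers (HeapTuple, Redirect, visible, fetch_via_index) are hypothetical, and the snapshot is reduced to a set of XIDs treated as committed. The real walk, heap_hot_search_buffer in heapam.c, also checks that each member's xmin matches its predecessor's xmax, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeapTuple:
    xmin: int
    xmax: Optional[int]          # None: never deleted or updated
    next_offset: Optional[int]   # same-page successor via t_ctid; None ends the chain
    value: str

@dataclass
class Redirect:
    target: int                  # an LP_REDIRECT line pointer left behind by pruning

def visible(tup: HeapTuple, committed: set) -> bool:
    """Toy snapshot: 'committed' is the set of XIDs this snapshot treats as committed."""
    return tup.xmin in committed and (tup.xmax is None or tup.xmax not in committed)

def fetch_via_index(page: dict, root: int, committed: set) -> Optional[str]:
    """Start at the root line pointer an index entry points at and walk the HOT chain."""
    item = page[root]
    offset = item.target if isinstance(item, Redirect) else root
    while offset is not None:
        tup = page[offset]
        if visible(tup, committed):
            return tup.value
        offset = tup.next_offset
    return None

# One index entry points at offset 1; the update to v2 stayed on the same page.
page = {
    1: HeapTuple(xmin=100, xmax=101, next_offset=2, value="v1"),
    2: HeapTuple(xmin=101, xmax=None, next_offset=None, value="v2"),
}
print(fetch_via_index(page, 1, committed={100}))        # v1: snapshot predates XID 101
print(fetch_via_index(page, 1, committed={100, 101}))   # v2: reached by following the chain
```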
Why it mattered, and the second half of HOT. HOT did two things. First, it
stopped indexes from bloating on the common update pattern. Second — and this is
the part that reshaped vacuum — it introduced single-page pruning: a
mechanism that can reclaim dead heap-only tuples without involving indexes at
all. Because a dead HOT tuple has no index entry pointing at it, it can be
removed by a purely local page operation. PostgreSQL prunes such chains
opportunistically, during ordinary reads and writes that pin the page, not just
during vacuum. This is heap_page_prune_opt, with the current pruning code in
src/backend/access/heap/pruneheap.c (see postgres-heap-am.md). Dead heap-only
versions have their line pointers set to LP_UNUSED and their space reclaimed;
if the chain’s root version is dead, the root line pointer becomes an LP_REDIRECT
to the first surviving member, so existing index entries still land on the chain.
All of this happens under a single buffer cleanup lock on the page.
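The pruning outcome for a single chain can be sketched in the same toy style. prune_chain, the dead_to_all predicate, and the string line-pointer states are hypothetical stand-ins; the sketch assumes the chain has never been pruned and that dead versions form its oldest prefix, and it leaves out the page compaction that actually frees the tuple bytes. It encodes the rule described above: dead heap-only versions become LP_UNUSED at once, while a dead root becomes LP_REDIRECT (or LP_DEAD if nothing survives) because index entries still point at it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeapTuple:
    xmin: int
    xmax: Optional[int]
    next_offset: Optional[int]   # same-page successor via t_ctid
    heap_only: bool              # HEAP_ONLY_TUPLE: no index entry points here

@dataclass
class Redirect:
    target: int                  # LP_REDIRECT

LP_UNUSED = "LP_UNUSED"
LP_DEAD = "LP_DEAD"

def prune_chain(page: dict, root: int, is_dead) -> None:
    """Prune one not-yet-pruned HOT chain in place.
    is_dead(tup) says the version is invisible to every snapshot."""
    if not isinstance(page[root], HeapTuple):
        return                               # already pruned; not modelled here
    chain, offset = [], root
    while offset is not None:                # collect the chain's line pointers
        chain.append(offset)
        offset = page[offset].next_offset
    root_dead = is_dead(page[root])
    survivors = [o for o in chain if not is_dead(page[o])]
    for o in chain:
        tup = page[o]
        if tup.heap_only and is_dead(tup):
            page[o] = LP_UNUSED              # no index entry refers to it: free at once
    if root_dead:
        # Index entries still point at the root, so it cannot become unused here.
        page[root] = Redirect(survivors[0]) if survivors else LP_DEAD

horizon = 102   # toy rule: a version is dead once its deleting XID is <= horizon
dead_to_all = lambda t: t.xmax is not None and t.xmax <= horizon
page = {
    1: HeapTuple(100, 101, 2, heap_only=False),   # chain root, indexed
    2: HeapTuple(101, 102, 3, heap_only=True),
    3: HeapTuple(102, None, None, heap_only=True),
}
prune_chain(page, 1, dead_to_all)
print(page[1], page[2])   # Redirect(target=3) LP_UNUSED
```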
The structural shift is best seen side by side.
```mermaid
flowchart LR
subgraph before["Before HOT (pre-8.3) — UPDATE of a non-indexed column"]
direction TB
bidx["Index on col_a"] --> bv1["Heap v1<br/>TID (5,1)"]
bidx2["Index on col_a<br/>(new entry)"] --> bv2["Heap v2<br/>TID (5,2)<br/>same key, new TID"]
bnote["Every UPDATE adds<br/>one index entry per index;<br/>only vacuum + index bulk-delete<br/>can reclaim v1"]
end
subgraph after["After HOT (8.3+) — same UPDATE, no indexed column changed"]
direction TB
aidx["Index on col_a<br/>(one entry, unchanged)"] --> aroot["Root line ptr (5,1)"]
aroot -->|"t_ctid chain"| av2["Heap v2 (5,2)<br/>HEAP_ONLY_TUPLE<br/>no index entry"]
anote["No new index entry;<br/>single-page prune reclaims<br/>dead chain members<br/>without touching indexes"]
end
```
The payoff for vacuum: on a HOT-friendly workload, a large fraction of dead tuples never reach vacuum’s dead-TID array at all; they are reaped opportunistically by page pruning between vacuums. The dead-TID array fills more slowly, the expensive per-index Phase II runs less often, and index size tracks the number of rows rather than the number of updates. HOT did not change vacuum’s three-phase shape; it changed how much work flows into it.
There is one important constraint that shaped the design and that vacuum still
respects today: a HOT update can only stay on the same page if the page has room
for the new version. When the page fills, the chain must break — the next update
spills to a different page and does create a new index entry (a non-HOT
update). This is why fillfactor tuning matters for update-heavy tables: leaving
free space on each page raises the fraction of updates that can stay HOT. Vacuum
also participates in keeping chains prunable: when it processes a page it
collapses dead HOT chain members into LP_REDIRECT/LP_UNUSED line pointers so
the root pointer always reaches a live tuple, and it is careful never to remove
a line pointer that an index entry still references. Cross-link:
postgres-heap-am.md (HOT chains, pruning, line-pointer
states) and postgres-vacuum.md (how pruning integrates
with the heap scan).
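A rough way to see why fillfactor matters is to estimate how many same-page updates the space it reserves can absorb before the next update must go non-HOT. This is back-of-the-envelope arithmetic under stated assumptions (8 KB pages, 8-byte MAXALIGN, a 4-byte line pointer per new version, page header ignored), and it ignores pruning, which can free space again between updates. The setting itself is per table, e.g. ALTER TABLE t SET (fillfactor = 70) for a hypothetical table t; hot_updates_before_spill is a hypothetical helper.

```python
BLCKSZ = 8192            # default PostgreSQL page size
MAXALIGN = 8             # typical alignment on 64-bit platforms (assumption)
LINE_POINTER_BYTES = 4   # sizeof(ItemIdData)

def maxalign(n: int) -> int:
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN

def hot_updates_before_spill(fillfactor: int, tuple_bytes: int) -> int:
    """Rough count of new versions the fillfactor reserve can hold on one page,
    assuming the page was filled exactly to fillfactor and nothing is pruned."""
    reserved = BLCKSZ * (100 - fillfactor) // 100
    per_version = maxalign(tuple_bytes) + LINE_POINTER_BYTES
    return reserved // per_version

print(hot_updates_before_spill(100, 100))   # 0: no reserve, next update may spill
print(hot_updates_before_spill(70, 100))    # 22 updates of a 100-byte tuple
```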
Era 2 — The visibility map: skip clean pages (8.4, 2009)
The change. PostgreSQL 8.4 added the visibility map (VM): a tiny
separate fork of the relation, one bit per heap page, set when every tuple on
that page is visible to all transactions (no dead tuples, nothing in flight).
The point is brutally simple — if a page is marked all-visible, vacuum’s Phase I
can skip reading it entirely. The full mechanism (the _vm fork, the
visibilitymap_set / visibilitymap_get_status API, the crash-safe
coordination with the heap page’s PD_ALL_VISIBLE flag) is the subject of
postgres-visibility-map.md.
This is the single most important scaling change in the whole arc, because it
broke the O(table size) floor on Phase I. With the VM in place, a vacuum’s
heap-scan cost becomes proportional to the number of modified pages since the
last vacuum, not to the size of the table. A 1 TB append-mostly table whose old
pages are all-visible is vacuumed almost as cheaply as a 1 GB one.
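The page-skipping decision of a one-bit-VM vacuum can be sketched as follows. blocks_to_scan is hypothetical. SKIP_PAGES_THRESHOLD mirrors a constant in current vacuumlazy.c (vacuum only skips runs of at least 32 clean pages, reportedly so that OS readahead is not defeated by isolated skips); applying it to 8.4 is an assumption.

```python
SKIP_PAGES_THRESHOLD = 32   # value in current vacuumlazy.c; assumed for older releases

def blocks_to_scan(all_visible, aggressive=False):
    """Blocks a one-bit-VM vacuum reads. all_visible[i] is block i's VM bit."""
    n = len(all_visible)
    if aggressive:
        return list(range(n))          # 8.4-9.5: aggressive vacuum reads every page
    to_read, blk = [], 0
    while blk < n:
        if not all_visible[blk]:
            to_read.append(blk)        # modified since the last vacuum: must read
            blk += 1
            continue
        run_end = blk
        while run_end < n and all_visible[run_end]:
            run_end += 1               # measure the run of clean pages
        if run_end - blk < SKIP_PAGES_THRESHOLD:
            to_read.extend(range(blk, run_end))   # short run: read through it
        # a long run is skipped without reading
        blk = run_end
    return to_read

vm = [True] * 100_000
for blk in (7, 50_000, 50_001):
    vm[blk] = False
print(len(blocks_to_scan(vm)))   # 10: the short leading run 0-7, plus 50,000 and 50,001
```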
The second payoff: index-only scans. The VM bit also tells the executor
something: if a page is all-visible, an index scan that found a matching entry
does not need to visit the heap to check visibility; the index entry alone is
trustworthy. This is the index-only scan, which arrived later, in 9.2 (the
release that also made the VM crash-safe), but rides on the 8.4 VM
infrastructure; it is why the VM lives in the storage engine rather than inside
vacuum. See
postgres-visibility-map.md for the
nodeIndexonlyscan consumer side.
Why it could not yet solve wraparound. Here is the limitation that Era 3
exists to fix. The 8.4 VM had one bit: all-visible. All-visible says
“nothing here is dead and nothing is in flight.” It does not say “everything
here is frozen.” So when an anti-wraparound (aggressive) vacuum ran — the
one that must freeze old tuples to advance relfrozenxid — it could not trust
the all-visible bit to mean “no freezing needed here.” An all-visible page might
still hold tuples whose xmin is old but not yet frozen. To be safe, the
aggressive vacuum ignored the VM and read every page anyway. The skip
optimisation helped ordinary vacuums but evaporated exactly when the table was
under the most pressure: an anti-wraparound scan of a huge, cold, static table
still re-read every page, every time, forever.
The structural shift for vacuum:
```mermaid
flowchart LR
subgraph e2before["Pre-8.4 lazy_scan_heap"]
direction TB
b0["For block 0..N"] --> b1["Read page (always)"]
b1 --> b2["Prune, collect dead TIDs,<br/>freeze old tuples"]
b2 --> b3["Cost: O(table size)<br/>every vacuum"]
end
subgraph e2after["8.4+ lazy_scan_heap with VM"]
direction TB
a0["For block 0..N"] --> a1{"VM all-visible<br/>bit set?"}
a1 -->|yes, normal vacuum| a2["SKIP page"]
a1 -->|"no, or aggressive"| a3["Read page,<br/>prune, collect, freeze"]
a3 --> a4["Cost: O(modified pages)<br/>EXCEPT aggressive<br/>still reads all"]
end
```
Cross-link: postgres-visibility-map.md (the
fork and bit semantics) and postgres-vacuum.md (how
lazy_scan_heap consults the VM and the normal-vs-aggressive distinction).
Era 3 — The all-frozen bit / freeze map: skip frozen pages (9.6, 2016)
The change. PostgreSQL 9.6 widened the visibility map from one bit per page
to two: the existing all-visible bit, and a new all-frozen bit. The
all-frozen bit is set when every tuple on the page is not merely visible but
fully frozen — no tuple has an unfrozen xmin that an anti-wraparound scan
would need to act on. The map with this second bit is often called the freeze
map. The bit definitions (VISIBILITYMAP_ALL_VISIBLE,
VISIBILITYMAP_ALL_FROZEN) and the two-bits-per-page packing are detailed in
postgres-visibility-map.md.
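The two-bits-per-page packing reduces to a little index arithmetic. The constants follow the definitions in visibilitymap.c and visibilitymapdefs.h as commonly described (two bits per heap block, usable bytes after a 24-byte page header); vm_location, vm_set and vm_get_status are hypothetical Python helpers that model a single VM page, not the C API.

```python
BLCKSZ = 8192
PAGE_HEADER_BYTES = 24                         # MAXALIGN(SizeOfPageHeaderData) on typical builds
MAPSIZE = BLCKSZ - PAGE_HEADER_BYTES           # usable bytes on one VM page
BITS_PER_HEAPBLOCK = 2
HEAPBLOCKS_PER_BYTE = 8 // BITS_PER_HEAPBLOCK  # 4 heap pages per VM byte
HEAPBLOCKS_PER_PAGE = MAPSIZE * HEAPBLOCKS_PER_BYTE

VISIBILITYMAP_ALL_VISIBLE = 0x01
VISIBILITYMAP_ALL_FROZEN = 0x02
VM_STATUS_MASK = VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN

def vm_location(heap_blk):
    """(VM page, byte in page, bit shift in byte) holding heap_blk's two bits."""
    map_block = heap_blk // HEAPBLOCKS_PER_PAGE
    map_byte = (heap_blk % HEAPBLOCKS_PER_PAGE) // HEAPBLOCKS_PER_BYTE
    map_shift = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK
    return map_block, map_byte, map_shift

def vm_set(vm_page, heap_blk, flags):
    """OR flags into heap_blk's bits; models one VM page (heap_blk < HEAPBLOCKS_PER_PAGE)."""
    _, byte, shift = vm_location(heap_blk)
    vm_page[byte] |= (flags & VM_STATUS_MASK) << shift

def vm_get_status(vm_page, heap_blk):
    _, byte, shift = vm_location(heap_blk)
    return (vm_page[byte] >> shift) & VM_STATUS_MASK

print(HEAPBLOCKS_PER_PAGE)   # 32672 heap pages (about 255 MiB of heap) per 8 KB VM page
```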
Why it was the missing half of Era 2. Recall the Era 2 limitation: the aggressive, anti-wraparound vacuum could not trust the single all-visible bit, so it re-read the whole table to be sure nothing needed freezing. The all-frozen bit closes exactly that gap. Now an aggressive vacuum can ask, per page, “is this page already all-frozen?” and, if so, skip it, because a frozen page by definition has nothing left to freeze. For the first time, the anti-wraparound obligation became proportional to the amount of unfrozen data rather than to the whole table.
This is enormous for the exact pathology that broke the baseline: a giant, mostly-cold table. Once its old pages are frozen and marked all-frozen, every subsequent anti-wraparound vacuum skips them. A 5 TB historical table that used to force a multi-hour full read every few hundred million transactions now reads only the pages that received recent writes.
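With both bits available, the per-page skip rule for normal and aggressive vacuums fits in a few lines. can_skip and pages_read are hypothetical and ignore the skip-run threshold; the table below is an invented example of a cold table with a small, recently written tail.

```python
ALL_VISIBLE = 0x01   # VISIBILITYMAP_ALL_VISIBLE
ALL_FROZEN = 0x02    # VISIBILITYMAP_ALL_FROZEN

def can_skip(status, aggressive, two_bit_vm=True):
    """Simplified per-page skip rule (ignores the skip-run threshold)."""
    if not aggressive:
        return bool(status & ALL_VISIBLE)   # normal vacuum: nothing dead here
    if not two_bit_vm:
        return False                        # 8.4-9.5: must read to check freezing
    return bool(status & ALL_FROZEN)        # 9.6+: nothing left to freeze

def pages_read(statuses, aggressive, two_bit_vm=True):
    return sum(1 for s in statuses if not can_skip(s, aggressive, two_bit_vm))

# A cold table: 999,000 frozen pages plus 1,000 pages written since the last vacuum.
table = [ALL_VISIBLE | ALL_FROZEN] * 999_000 + [0] * 1_000
print(pages_read(table, aggressive=True, two_bit_vm=False))  # 1000000
print(pages_read(table, aggressive=True))                    # 1000
```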
There was a one-time cost and a transition wrinkle worth noting historically:
existing pages had no all-frozen bit set, so the first aggressive vacuum after
upgrading to 9.6 still had to read and freeze everything to populate the bit.
After that, the skips compound. Later releases refined this (through
vacuum_freeze_min_age tuning and opportunistic freezing), pushing vacuum to
set the all-frozen bit eagerly when it was already visiting an all-visible page,
so cold pages reach the frozen-and-skippable state sooner.
The crash-safety detail is what makes the all-frozen bit trustworthy enough to
skip on, and it is worth being precise about because a wrong bit here would mean
silent corruption (a page skipped that actually needed freezing). The VM bits
are not authoritative on their own; they cache facts also recorded on the heap
page itself (the PD_ALL_VISIBLE page flag and the frozen state of each tuple header), and the two are updated under
coordinated WAL logging so that a crash can never leave the map claiming a page
is frozen when the heap disagrees. The skip is therefore only ever as aggressive
as the durably-recorded truth. The full coordination protocol is in
postgres-visibility-map.md; the point for this
arc is that the all-frozen bit is a correctness contract, not a hint — the
aggressive vacuum is allowed to skip a frozen page precisely because the contract
guarantees there is nothing to freeze there.
```mermaid
flowchart LR
subgraph f1["8.4–9.5 VM: one bit"]
direction TB
v1["Per page:<br/>ALL_VISIBLE bit"] --> v2{"Vacuum mode?"}
v2 -->|normal| v3["Skip all-visible pages"]
v2 -->|"aggressive<br/>(anti-wraparound)"| v4["Read EVERY page<br/>cannot trust all-visible<br/>for freeze decisions"]
end
subgraph f2["9.6+ VM: two bits"]
direction TB
w1["Per page:<br/>ALL_VISIBLE + ALL_FROZEN"] --> w2{"Vacuum mode?"}
w2 -->|normal| w3["Skip all-visible pages"]
w2 -->|"aggressive<br/>(anti-wraparound)"| w4["Skip ALL_FROZEN pages;<br/>read only unfrozen ones"]
end
```
Cross-link: postgres-visibility-map.md (two-bit
layout) and postgres-xid-wraparound-freeze.md
(how freezing advances relfrozenxid and why the all-frozen bit is the contract
that lets the aggressive scan skip).
Era 4 — Parallel index vacuum (13, 2020)
The change. Eras 1–3 all attack the heap side — making garbage rarer,
making the heap scan skippable. But the baseline’s most expensive single phase
was often Phase II: scanning every index to delete entries that point at
dead heap TIDs. On a table with a dozen large indexes, that is a dozen serial
full-index scans. PostgreSQL 13 made them parallel: a manual VACUUM (never
autovacuum) can launch background workers that each claim indexes to process,
so the indexes are bulk-deleted concurrently.
The implementation is a dedicated module,
src/backend/commands/vacuumparallel.c, which sets up a ParallelVacuumState
in dynamic shared memory (DSM): the leader copies the parameters and the shared
dead-TID storage into the DSM segment, launches workers, and each worker claims
indexes to process. Indexes that are too small to be worth a worker, or whose
AM declares it cannot run bulk-delete in parallel, are still processed by the
leader. The parallelism is across indexes, not within one — each index is still
cleaned by a single process — so the speedup is bounded by the number of
(sufficiently large) indexes. This is described in
postgres-vacuum.md’s parallel-vacuum section.
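How many workers a parallel VACUUM asks for can be modelled roughly. planned_workers is hypothetical and only loosely follows the worker computation in vacuumparallel.c: the two-index minimum, the "leader keeps one index" adjustment, and the 64-block eligibility floor (min_parallel_index_scan_size at its 512kB default on 8 KB pages) reflect a reading of the source that should be checked against REL_18. The value 2 for max_parallel_maintenance_workers is the stock default.

```python
from dataclasses import dataclass

@dataclass
class IndexInfo:
    name: str
    size_blocks: int
    parallel_safe: bool         # whether the index AM supports parallel bulk-delete

MIN_PARALLEL_INDEX_BLOCKS = 64  # assumed: 512kB default / 8 KB pages

def planned_workers(indexes, requested=0, max_parallel_maintenance_workers=2):
    """Toy worker-count decision; requested=0 means no PARALLEL degree was given."""
    if len(indexes) < 2 or max_parallel_maintenance_workers == 0:
        return 0                                  # nothing to overlap
    eligible = sum(1 for ix in indexes
                   if ix.parallel_safe and ix.size_blocks >= MIN_PARALLEL_INDEX_BLOCKS)
    eligible -= 1                                 # the leader processes one index itself
    if eligible <= 0:
        return 0
    workers = min(requested, eligible) if requested > 0 else eligible
    return min(workers, max_parallel_maintenance_workers)

six_big = [IndexInfo(f"idx{i}", 10_000, True) for i in range(6)]
print(planned_workers(six_big))                   # 2: capped by the GUC default
print(planned_workers(six_big, requested=4, max_parallel_maintenance_workers=8))  # 4
```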
The structural shift is in who runs Phase II, not in the phases themselves:
```mermaid
flowchart TB
subgraph s1["Pre-13: serial index vacuum"]
direction TB
l1["Leader (single process)"] --> i1["Bulk-delete index 1"]
i1 --> i2["Bulk-delete index 2"]
i2 --> i3["... index N"]
i3 --> i4["Wall time ~ sum of all indexes"]
end
subgraph s2["13+: parallel index vacuum via DSM"]
direction TB
l2["Leader sets up<br/>ParallelVacuumState in DSM"] --> p1["Worker A: index 1"]
l2 --> p2["Worker B: index 2"]
l2 --> p3["Leader: small / unsafe indexes"]
p1 --> p4["Wall time ~ slowest index"]
p2 --> p4
p3 --> p4
end
```
Why it mattered. This was the first time vacuum broke out of being strictly
single-threaded. For wide tables with many big indexes — common in OLAP-leaning
and append-heavy schemas — the dominant cost moved from sum(index scans)
toward max(index scan). It does not reduce total I/O, but it cuts wall-clock
time, which matters when a maintenance window is finite.
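The sum-versus-max claim can be checked with a small scheduling model in which each process claims the next unprocessed index as soon as it is free. The per-index times are invented, parallel_wall_time is hypothetical, and the model ignores the leader's special handling of small or parallel-unsafe indexes.

```python
import heapq

def parallel_wall_time(index_times, processes):
    """Each process takes the next index in order as soon as it becomes free.
    Returns the time at which the last index finishes."""
    free_at = [0.0] * processes            # min-heap of times each process is free
    heapq.heapify(free_at)
    for t in index_times:
        start = heapq.heappop(free_at)     # earliest-free process claims this index
        heapq.heappush(free_at, start + t)
    return max(free_at)

times = [10, 8, 3, 2, 1]                        # hypothetical bulk-delete times
print(sum(times))                               # 24: serial, pre-13
print(parallel_wall_time(times, processes=3))   # 10.0: bounded below by the slowest index
```

Total work is still 24 units; only the elapsed time shrinks, which matches the point that parallelism cuts wall-clock time rather than I/O.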
Two design choices keep it safe and bounded. The leader only requests workers up
to the number of eligible indexes (and the PARALLEL degree the user asked
for), so a table with one index gains nothing and pays no DSM setup it does not
need. And because the dead-TID storage must be shared, it lives in DSM (since
17, as a TidStore in a DSA area), which is exactly why this era and the TidStore era
(below) had to be compatible: the shared store the workers iterate is the same
abstraction the leader uses serially. Note also the scope: parallel vacuum is a
property of explicit VACUUM invocations, not of autovacuum, which deliberately
stays single-process per table so the launcher’s cost-balancing across the whole
cluster remains predictable. Cross-link:
postgres-vacuum.md (parallel state, worker
assignment) and src/backend/commands/vacuumparallel.c.
Era 5 — The wraparound failsafe (14, 2021)
The change. PostgreSQL 14 added a wraparound failsafe: an emergency mode
that, when relfrozenxid (or relminmxid) has aged past a dangerous threshold,
makes vacuum stop doing index vacuuming and heap cleanup work and race to
freeze instead. Controlled by the vacuum_failsafe_age /
vacuum_multixact_failsafe_age GUCs, it is checked periodically during the heap
scan; once it trips, the global flag VacuumFailsafeActive is set and the rest
of the vacuum skips the optional, time-consuming phases — index bulk-delete,
index cleanup, cost-delay throttling — so it can finish advancing
relfrozenxid as fast as physically possible. In the REL_18 tree this is
lazy_check_wraparound_failsafe in src/backend/access/heap/vacuumlazy.c,
called both early (before allocating the dead-TID store) and repeatedly during
the scan.
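The trigger condition can be sketched as an age comparison. failsafe_should_trip is hypothetical; it models only the XID half of the check, uses the stock defaults (vacuum_failsafe_age = 1.6 billion, autovacuum_freeze_max_age = 200 million), and applies the documented rule that the effective failsafe age is never below 105% of autovacuum_freeze_max_age.

```python
XID_MASK = 0xFFFFFFFF

def xid_age(next_xid, relfrozenxid):
    """Transactions consumed since relfrozenxid, modulo the 32-bit counter."""
    return (next_xid - relfrozenxid) & XID_MASK

def failsafe_should_trip(next_xid, relfrozenxid,
                         vacuum_failsafe_age=1_600_000_000,
                         autovacuum_freeze_max_age=200_000_000):
    """XID half of the check only; the MultiXact half is analogous and omitted."""
    # The effective threshold is at least 105% of autovacuum_freeze_max_age,
    # whatever vacuum_failsafe_age is set to.
    threshold = max(vacuum_failsafe_age, autovacuum_freeze_max_age * 105 // 100)
    return xid_age(next_xid, relfrozenxid) > threshold

print(failsafe_should_trip(next_xid=1_000_000_000, relfrozenxid=3))   # False
print(failsafe_should_trip(next_xid=1_700_000_000, relfrozenxid=3))   # True
```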
Why it had to exist. Every prior era made the common case faster, but none
addressed the failure case: what happens when, despite autovacuum, a table’s
oldest XID creeps toward the 2^31 wraparound wall. Historically the database
would, at the very last moment, refuse new XIDs and shut down to avoid
corruption — a hard outage. The failsafe is a pressure-release valve: rather
than spend hours dutifully cleaning indexes and obeying the cost delay while the
clock runs out, vacuum throws everything non-essential overboard and does the
one thing that actually prevents data loss — freeze and advance
relfrozenxid. Index bloat and a stalled cost-delay budget are recoverable
later; a wraparound shutdown is not. The mechanism and its interplay with
anti-wraparound autovacuum are covered in
postgres-vacuum.md and
postgres-autovacuum.md; the XID-limit ladder it is
racing against is in
postgres-xid-wraparound-freeze.md.
Structurally this is not a new phase but a short-circuit through the existing
ones: a check repeated after every 4 GB of heap scanned (FAILSAFE_EVERY_PAGES) that, once it trips,
causes lazy_vacuum to bypass the index phases (see the
!VacuumFailsafeActive guards in vacuumlazy.c). It is the smallest code
change of any era here and arguably the highest-leverage for operational
safety.
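The short-circuit structure can be modelled as a scan loop with a periodic check. run_vacuum and its callback are hypothetical; FAILSAFE_EVERY_PAGES is computed the way the constant of that name in vacuumlazy.c is understood to be (4 GB divided by the block size), and the real code also re-checks between indexes, which this sketch omits.

```python
BLCKSZ = 8192
FAILSAFE_EVERY_PAGES = (4 * 1024**3) // BLCKSZ    # 524288 pages = 4 GB of heap

def run_vacuum(n_pages, n_indexes, failsafe_check):
    """Toy control flow. failsafe_check(scanned) stands in for the real check and
    returns True once the table is dangerously old. Returns (indexes vacuumed, active)."""
    failsafe_active = failsafe_check(0)            # precheck before the scan starts
    for scanned in range(1, n_pages + 1):
        # ... per-page pruning, dead-TID collection and freezing would go here ...
        if not failsafe_active and scanned % FAILSAFE_EVERY_PAGES == 0:
            failsafe_active = failsafe_check(scanned)
    # Once the flag is set, index vacuuming is bypassed entirely.
    indexes_vacuumed = 0 if failsafe_active else n_indexes
    return indexes_vacuumed, failsafe_active

# Hypothetical: the table crosses the danger threshold partway through the scan.
print(run_vacuum(1_200_000, n_indexes=6, failsafe_check=lambda s: s >= 1_000_000))  # (0, True)
```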
Era 6 — TidStore: the dead-TID radix store (17, 2024)
The change. PostgreSQL 17 replaced the flat, fixed-width dead-TID array —
the one piece of the baseline that had survived essentially unchanged since
Era 0 — with TidStore, a compact, adaptive-radix-tree-backed store for dead
tuple IDs. The relevant code is src/backend/access/common/tidstore.c and
src/include/access/tidstore.h; in vacuumlazy.c, vacrel->dead_items is now
a TidStore * rather than a VacDeadItems array, with a VacDeadItemsInfo
companion tracking counts and the memory budget.
Why the flat array had to go. The old array stored one full
ItemPointerData per dead tuple and was hard-capped by maintenance_work_mem
and, because it was a single allocation bounded by MaxAllocSize, could never
grow past ~1 GB (about 179 million TIDs) regardless of how much memory you gave it. Two
consequences:
- Memory waste. Dead TIDs cluster by page: many dead offsets share the same block number. A flat array of (block, offset) pairs re-stored the block number for every offset. The radix store keys on the block number and packs the offsets within a page into a bitmap, so a page with 100 dead tuples costs roughly one key plus a small bitmap instead of 100 six-byte entries. On realistic bloated tables the memory footprint drops several-fold.
- The 1 GB ceiling and extra index scans. Because the array could not grow past its hard cap no matter how much maintenance_work_mem was available, a vacuum that found more dead tuples than the cap had to flush early (run Phase II on every index and Phase III, empty the array, and resume), incurring additional full index scans. TidStore’s compression means far more dead TIDs fit in the same memory, so the store fills less often and those extra index passes largely disappear. PostgreSQL 17 also let vacuum actually use memory beyond the old 1 GB practical limit.
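The memory argument can be illustrated with a deliberately simplified store that keys on the block number and keeps each page's dead offsets as a bitmap. ToyDeadTidStore is hypothetical: a Python dict stands in for the adaptive radix tree, the method names only echo the ideas behind tidstore.h, and approx_bytes ignores tree-node overhead, so it overstates the saving relative to the several-fold figure quoted above.

```python
class ToyDeadTidStore:
    """Block number -> integer bitmap of dead offsets. A dict stands in for the
    adaptive radix tree; only the 'key on block, bitmap the offsets' idea is modelled."""

    def __init__(self):
        self.blocks = {}

    def set_block_offsets(self, block, offsets):
        bitmap = 0
        for off in offsets:
            bitmap |= 1 << off           # one bit per dead line-pointer offset
        self.blocks[block] = bitmap

    def is_member(self, block, offset):
        return bool((self.blocks.get(block, 0) >> offset) & 1)

    def approx_bytes(self, key_bytes=8):
        """Idealised size: one key per block plus a bitmap up to the highest offset."""
        return sum(key_bytes + (bm.bit_length() + 7) // 8 for bm in self.blocks.values())

def flat_array_bytes(n_dead, tid_bytes=6):
    return n_dead * tid_bytes                 # one ItemPointerData per dead tuple

store = ToyDeadTidStore()
for blk in range(1_000):                      # 1,000 pages with 100 dead tuples each
    store.set_block_offsets(blk, range(1, 101))
print(flat_array_bytes(100_000), store.approx_bytes())   # 600000 21000
```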
The store is the same structure used in DSM for parallel index vacuum (Era 4), so the two improvements compose: the parallel workers iterate a shared TidStore rather than a shared flat array.
```mermaid
flowchart LR
subgraph t1["Pre-17: flat dead-TID array"]
direction TB
a1["VacDeadItems[]<br/>one ItemPointerData<br/>per dead tuple"] --> a2["Block repeated<br/>per offset"]
a2 --> a3["Hard-capped element count<br/>(~1 GB practical limit)"]
a3 --> a4["Fills early on bloated tables<br/>=> extra full index scans"]
end
subgraph t2["17+: TidStore radix store"]
direction TB
b1["TidStore<br/>adaptive radix tree<br/>keyed on block number"] --> b2["Offsets packed<br/>as per-page bitmap"]
b2 --> b3["Several-fold less memory<br/>can exceed old 1 GB limit"]
b3 --> b4["Fills far less often<br/>=> fewer index passes"]
end
```
Cross-link: postgres-vacuum.md (dead_items_alloc,
the TidStore/VacDeadItemsInfo pairing, and how the store drives Phase II/III)
and src/backend/access/common/tidstore.c.
Where it stands at REL_18
At REL_18 (commit 273fe94, PG 18.x) the vacuum-and-visibility stack is the sum
of every era above, layered onto the same three-phase lazy_scan_heap skeleton
from Era 0:
- Don’t make garbage: HOT updates and opportunistic single-page pruning (pruneheap.c) keep dead tuples and index entries from accumulating in the first place. → postgres-heap-am.md.
- Don’t read what’s clean / static: the two-bit visibility map (visibilitymap.c) lets normal vacuum skip all-visible pages and aggressive vacuum skip all-frozen ones, so both the dead-tuple scan and the anti-wraparound scan are proportional to churn, not to table size. → postgres-visibility-map.md.
- Scale the unavoidable: when indexes must be cleaned, vacuumparallel.c fans bulk-delete out across DSM workers, and the dead TIDs they consume live in a memory-efficient TidStore (tidstore.c) rather than a flat array. → postgres-vacuum.md.
- Survive the worst case: lazy_check_wraparound_failsafe in vacuumlazy.c trips VacuumFailsafeActive to bypass index work and race to freeze when relfrozenxid is dangerously old, turning a would-be wraparound shutdown into recoverable bloat. → postgres-vacuum.md, postgres-xid-wraparound-freeze.md, postgres-autovacuum.md.
The orchestration around all of this — the launcher/worker model that decides
when a table is vacuumed and forces anti-wraparound vacuums even when
autovacuum is disabled — is in
postgres-autovacuum.md.
The PG19 next step. The direction of travel continues toward decoupling the
freeze obligation from the table-size scan: ongoing PG19-cycle development pushes
on eager and more incremental freezing strategies (visiting fewer pages to keep
relfrozenxid advancing) so that aggressive anti-wraparound vacuums become rarer
still. Treat that only as a forward note; the design described above is the
current REL_18 behaviour.
Sources
- Release notes: PostgreSQL release notes for 8.3 (HOT), 8.4 (visibility map), 9.6 (freeze map / all-frozen bit), 13 (parallel VACUUM), 14 (vacuum emergency / wraparound failsafe, vacuum_failsafe_age), 17 (vacuum dead-TID memory / TidStore).
- Current-state module docs (mechanism, not re-derived here):
  - postgres-vacuum.md: three-phase loop, LVRelState, cutoffs, parallel state, failsafe, dead-items store.
  - postgres-heap-am.md: no-overwrite tuples, HOT chains, single-page pruning.
  - postgres-visibility-map.md: _vm fork, all-visible + all-frozen bits, crash-safe coordination, index-only scans.
  - postgres-autovacuum.md: launcher/worker scheduling, anti-wraparound forcing.
  - postgres-xid-wraparound-freeze.md: XID limit ladder, freeze plans, relfrozenxid advancement.
- Key source files (observable on REL_18, commit 273fe94):
  - src/backend/access/heap/vacuumlazy.c: lazy_scan_heap, lazy_check_wraparound_failsafe, VacuumFailsafeActive, dead_items.
  - src/backend/access/heap/visibilitymap.c: VM fork get/set.
  - src/backend/access/heap/pruneheap.c: HOT pruning / heap_page_prune_opt.
  - src/backend/commands/vacuumparallel.c: ParallelVacuumState, DSM setup.
  - src/backend/access/common/tidstore.c, src/include/access/tidstore.h: the radix dead-TID store.
  - src/backend/access/heap/heapam.c, src/backend/access/heap/README.HOT: HOT mechanics.