PostgreSQL Vacuum & Visibility — From Plain Lazy Vacuum to HOT, Freeze Maps, and Parallelism

Why this subsystem had to evolve (the original limitation)

PostgreSQL is a no-overwrite MVCC engine. A DELETE does not erase a row and an UPDATE does not rewrite one in place; instead the storage layer stamps the old version’s xmax and (for an update) writes a brand-new version elsewhere. The mechanism is covered in the current-state module doc postgres-heap-am.md: every tuple lives at an ItemId slot on a heap page, visibility is decided per tuple by comparing xmin/xmax against a snapshot, and the dead bytes simply sit there until something sweeps them. That something is vacuum.

This design buys the headline MVCC property — readers never block writers and writers never block readers, because each transaction reads its own snapshot — but it hands the implementation two open-ended bills that grow with write volume:

  1. Dead-tuple reclamation. Updates and deletes leak space. A heavily updated table bloats without bound, and index/heap scans walk ever-longer runs of dead versions, unless a background process reclaims them. The hard part is that “this version is dead” is a global predicate: vacuum must know that no snapshot anywhere in the cluster can still see the row. That horizon (OldestXmin) is computed across every active backend.

  2. Transaction-ID wraparound. XIDs are 32-bit. The visibility comparator treats the XID space as a circle with a 2^31 horizon, so a tuple stamped with an xmin more than ~2.1 billion transactions in the past would suddenly look like it came from the future and become invisible — silent data loss. The defence is to freeze old tuples (rewrite their xmin to a value the comparator always treats as “in the past”) before they age out. Freezing is only correct if it touches every old tuple, so it is fundamentally a full-table obligation. The mechanism lives in postgres-xid-wraparound-freeze.md.
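The circular comparison behind the wraparound problem can be sketched in a few lines. This is a minimal illustration, assuming plain 32-bit modular arithmetic; the function name `xid_precedes` and the constants are hypothetical, and real PostgreSQL additionally special-cases a few reserved XIDs.

```python
# Sketch of circular 32-bit XID ordering (names are illustrative, not PostgreSQL's).
XID_MODULUS = 1 << 32   # XIDs wrap around at 2^32
HALF_SPACE = 1 << 31    # the 2^31 horizon described above


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a is 'in the past' relative to XID b on the circle."""
    diff = (a - b) % XID_MODULUS
    # Read as a signed 32-bit number, diff is negative when its top bit is set.
    return diff >= HALF_SPACE


current = 3_000_000_000
recent_xmin = current - 1_000                               # written shortly before
ancient_xmin = (current - (HALF_SPACE + 1)) % XID_MODULUS   # just past the horizon

print(xid_precedes(recent_xmin, current))    # recent row: still in the past
print(xid_precedes(ancient_xmin, current))   # ancient row: no longer in the past
print(xid_precedes(current, ancient_xmin))   # ...it now looks like the future
```

The last two calls show the failure mode: once a tuple's xmin falls more than 2^31 transactions behind, the comparator flips, which is why freezing has to reach every old tuple before that point.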

The original implementation met both bills with the bluntest possible tool: a single-threaded process that read every page of the table, collected the dead line pointers into a flat array, walked every index to delete the matching entries, then went back and reaped the dead heap line pointers. Two sub-problems, one full scan, no shortcuts. That worked at the table sizes of 2005 and fell apart as tables grew, as update rates climbed, and as the anti-wraparound obligation forced repeated full scans over data that had not changed in months.

Every era below is a response to a specific way that brute-force loop did not scale. Read as a whole, the arc moves through four distinct strategies:

  • Don’t make garbage (HOT) — avoid creating index bloat and let a cheap single-page prune reclaim space without vacuum at all.
  • Don’t read what’s clean (visibility map) — record per page whether vacuum can skip it, so the scan is proportional to churn, not to table size.
  • Don’t re-read what’s static (all-frozen bit) — extend that skip to the anti-wraparound scan, the one obligation the visibility map originally could not satisfy.
  • Parallelise and survive (parallel index vacuum, failsafe, TidStore) — when the scan is unavoidable, spread index cleanup across workers, refuse to let wraparound win under emergency pressure, and stop wasting memory on the dead-TID bookkeeping.
timeline
    title PostgreSQL Vacuum and Visibility Evolution
    section Brute force
        8.x baseline : Plain lazy vacuum : Full heap scan, full index scan, flat dead-TID array
    section Avoid and skip
        8.3 (2007) : HOT heap-only tuples : Same-page update chains, no new index entry, single-page prune
        8.4 (2009) : Visibility map : One all-visible bit per page, vacuum skips clean pages, index-only scans
    section Skip the static
        9.6 (2016) : All-frozen bit / freeze map : Second VM bit, anti-wraparound scan skips frozen pages
    section Scale and survive
        13 (2020) : Parallel index vacuum : Per-index parallel workers via DSM
        14 (2021) : Wraparound failsafe : Bypass index vacuuming when relfrozenxid is dangerously old
        17 (2024) : TidStore radix store : Compressed adaptive-radix dead-TID store replaces flat array

Era 0 — Plain lazy vacuum (the 8.x baseline)

What it was. Before any of the optimisations below, VACUUM (the non-FULL “lazy” variant introduced in 7.2 to replace the old rewrite-the-whole-table vacuum) ran a fixed three-phase loop over the entire relation:

  1. Phase I — heap scan. Read every heap page from block 0 to the end. On each page, run HOT-less pruning logic and decide, tuple by tuple, which line pointers are dead (deleted before OldestXmin and invisible to every snapshot). Append each dead item’s (block, offset) TID to an in-memory array sized by maintenance_work_mem. While here, freeze any tuple older than the freeze cutoff.
  2. Phase II — index vacuum. For each index on the table, call its ambulkdelete entry point, which scans the entire index and removes any entry pointing at a TID in the dead array. This is the expensive phase: it is O(index size) per index, and it runs once per fill of the dead-TID array.
  3. Phase III — heap reap. Walk the heap pages that held dead items again and turn their LP_DEAD line pointers into LP_UNUSED, making the space reusable, and update the free space map.

If the dead-TID array filled before the heap scan finished, vacuum flushed it by running Phases II and III early, then resumed the heap scan — meaning a big table with a small maintenance_work_mem could scan each index several times in one vacuum.
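The control flow of that loop, including the early flush, can be sketched as a toy simulation. This is a simplified model under stated assumptions: `simulate_lazy_vacuum` and its parameters are hypothetical names, the array is flushed only when completely full, and no real I/O happens; it only counts how many index scans and reap passes the loop would perform.

```python
def simulate_lazy_vacuum(dead_per_page, array_capacity, n_indexes):
    """Count index scans and reap passes for a toy three-phase vacuum.

    dead_per_page: number of dead tuples found on each heap page, in block order.
    """
    dead_tids = []      # the flat (block, offset) array
    index_scans = 0
    reap_passes = 0

    def flush():
        nonlocal index_scans, reap_passes
        index_scans += n_indexes   # Phase II: one full scan per index
        reap_passes += 1           # Phase III: revisit heap pages holding dead items
        dead_tids.clear()

    for block, n_dead in enumerate(dead_per_page):   # Phase I: read every page
        for offset in range(1, n_dead + 1):
            if len(dead_tids) == array_capacity:
                flush()                              # array full: run II and III early
            dead_tids.append((block, offset))

    if dead_tids:
        flush()                                      # final Phases II and III
    return index_scans, reap_passes


# Ten dead tuples, room for four TIDs, two indexes: three flushes in total.
print(simulate_lazy_vacuum([3, 0, 4, 3], array_capacity=4, n_indexes=2))
```

Raising `array_capacity` above the total number of dead tuples collapses the run to a single Phase II and Phase III, which is the behaviour a generous maintenance_work_mem buys.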

Why it did not scale. Three structural costs were baked in:

  • Phase I is O(table size), always. Even a 500 GB table that received ten updates last night was read in full. There was no per-page record of “nothing changed here,” so there was nothing to skip.
  • Every ordinary UPDATE created index bloat. Changing one non-indexed column still wrote a new heap tuple and a new entry in every index, because the index pointed at the physical TID and the new version had a new TID. The indexes grew as fast as the heap.
  • The dead-TID array was a flat, fixed-width array. Each dead tuple cost a full ItemPointerData (6 bytes, later padded), capped by maintenance_work_mem. A vacuum that found more dead tuples than the array could hold paid for extra full index scans.

And on top of all that, the anti-wraparound obligation (Era 3’s subject) forced periodic aggressive vacuums that could not skip anything, because freezing must visit every unfrozen tuple. The baseline had no way to remember that a page was already entirely frozen, so an anti-wraparound vacuum re-read even cold, static, decade-old data.

A concrete worked example makes the pain tangible. Suppose a 200 GB table with six indexes, maintenance_work_mem set so the dead-TID array holds ~11 million TIDs, and an overnight batch that deletes 50 million rows. The baseline vacuum reads all 200 GB of heap (Phase I). Because 50 million dead TIDs fill the 11-million-entry array five times (four full flushes plus a final partial one), it runs Phase II, six full index scans, five separate times: thirty full index scans in one vacuum, interleaved with five Phase III reaping passes. None of this is proportional to the amount of change; the cost is (table size) + ⌈dead tuples / array capacity⌉ × (number of indexes) × (index size). Each era below knocks out one of these multipliers.
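The arithmetic in the worked example can be checked directly. The helper name `baseline_index_scans` is hypothetical; it simply applies the ceiling division described in the text.

```python
def baseline_index_scans(dead_tuples: int, array_capacity: int, n_indexes: int):
    """Return (flush passes, total full index scans) for the baseline loop."""
    # Integer ceiling division: a final partly filled array still needs a pass.
    passes = -(-dead_tuples // array_capacity)
    return passes, passes * n_indexes


passes, scans = baseline_index_scans(50_000_000, 11_000_000, 6)
print(passes, scans)   # 5 30
```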

The structural shape (three serial phases driven by lazy_scan_heap, an in-memory store of dead TIDs, per-index bulk delete) is still the skeleton of postgres-vacuum.md today. Everything below is a layer bolted onto that skeleton to make one of its three costs disappear.

Era 1 — HOT: heap-only tuples (8.3, 2007)

The change. PostgreSQL 8.3 introduced HOT — Heap-Only Tuples — the first big lever, and the only one in this whole arc that attacks the problem before vacuum runs at all. The premise: most updates do not change indexed columns. If an UPDATE touches no column that any index covers, there is no reason to add new index entries, because every index’s keys are unchanged. HOT makes that update place the new version on the same heap page and chain it off the old version’s t_ctid, with no new index tuple created.

The index entry that already exists keeps pointing at the original line pointer (the “HOT chain root”). An index scan that lands on the root follows the t_ctid chain forward through the same-page versions until it finds the one visible to its snapshot. The chain is invisible to indexes; it lives entirely inside one heap page. The mechanism — HEAP_HOT_UPDATED and HEAP_ONLY_TUPLE infomask bits, the root-offset redirect, and the chain walk — is documented in postgres-heap-am.md and the in-tree src/backend/access/heap/README.HOT.
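The chain walk can be sketched with a toy page model. Everything here is an illustrative assumption: the `Version` class, the dictionary standing in for a heap page, and a deliberately crude visibility rule that compares XIDs to a single snapshot number and ignores in-progress or aborted transactions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Version:
    xmin: int                    # XID that created this version
    xmax: Optional[int]          # XID that replaced it, or None if still current
    next_offset: Optional[int]   # same-page t_ctid link, None at the chain end
    value: str


def visible(v: Version, snapshot_xid: int) -> bool:
    """Crude visibility rule: created at or before the snapshot and not yet replaced."""
    created = v.xmin <= snapshot_xid
    replaced = v.xmax is not None and v.xmax <= snapshot_xid
    return created and not replaced


def fetch_via_root(page: Dict[int, Version], root_offset: int, snapshot_xid: int):
    """Follow the HOT chain from the root line pointer to the visible version."""
    offset = root_offset
    while offset is not None:
        version = page[offset]
        if visible(version, snapshot_xid):
            return version.value
        offset = version.next_offset
    return None


# One index entry points at offset 1; two HOT updates chained v2 and v3 behind it.
page = {
    1: Version(xmin=100, xmax=110, next_offset=2, value="v1"),
    2: Version(xmin=110, xmax=120, next_offset=3, value="v2"),
    3: Version(xmin=120, xmax=None, next_offset=None, value="v3"),
}
print(fetch_via_root(page, 1, snapshot_xid=115))   # v2
```

The index never learns about offsets 2 and 3; every lookup enters through the root and the heap page resolves the rest.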

Why it mattered, and the second half of HOT. HOT did two things. First, it stopped indexes from bloating on the common update pattern. Second, and this is the part that reshaped vacuum, it introduced single-page pruning: a mechanism that can reclaim dead heap-only tuples without involving indexes at all. Because a dead heap-only tuple has no index entry pointing at it, it can be removed by a purely local page operation. PostgreSQL prunes such chains opportunistically, during ordinary reads and writes that pin the page, not just during vacuum. The entry point is heap_page_prune_opt; the mechanism is documented alongside postgres-heap-am.md, and the current code is in src/backend/access/heap/pruneheap.c. When the leading versions of a chain are dead, their tuple storage is freed, the root line pointer becomes an LP_REDIRECT that points at the first surviving chain member, and the line pointers of dead heap-only members become LP_UNUSED, all under a single buffer lock.

The structural shift is best seen side by side.

flowchart LR
    subgraph before["Before HOT (pre-8.3) — UPDATE of a non-indexed column"]
        direction TB
        bidx["Index on col_a"] --> bv1["Heap v1<br/>TID (5,1)"]
        bidx2["Index on col_a<br/>(new entry)"] --> bv2["Heap v2<br/>TID (5,2)<br/>same key, new TID"]
        bnote["Every UPDATE adds<br/>one index entry per index;<br/>only vacuum + index bulk-delete<br/>can reclaim v1"]
    end
    subgraph after["After HOT (8.3+) — same UPDATE, no indexed column changed"]
        direction TB
        aidx["Index on col_a<br/>(one entry, unchanged)"] --> aroot["Root line ptr (5,1)"]
        aroot -->|"t_ctid chain"| av2["Heap v2 (5,2)<br/>HEAP_ONLY_TUPLE<br/>no index entry"]
        anote["No new index entry;<br/>single-page prune reclaims<br/>dead chain members<br/>without touching indexes"]
    end

The payoff for vacuum: on a HOT-friendly workload, a large fraction of dead tuples never reach Phase I at all — they are reaped opportunistically by page pruning between vacuums. The dead-TID array fills more slowly, the expensive per-index Phase II runs less often, and index size tracks the number of distinct key values rather than the number of updates. HOT did not change vacuum’s three-phase shape; it changed how much work flows into it.

There is one important constraint that shaped the design and that vacuum still respects today: a HOT update can only stay on the same page if the page has room for the new version. When the page fills, the chain must break — the next update spills to a different page and does create a new index entry (a non-HOT update). This is why fillfactor tuning matters for update-heavy tables: leaving free space on each page raises the fraction of updates that can stay HOT. Vacuum also participates in keeping chains prunable: when it processes a page it collapses dead HOT chain members into LP_REDIRECT/LP_UNUSED line pointers so the root pointer always reaches a live tuple, and it is careful never to remove a line pointer that an index entry still references. Cross-link: postgres-heap-am.md (HOT chains, pruning, line-pointer states) and postgres-vacuum.md (how pruning integrates with the heap scan).
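The two conditions for an update to stay HOT, and the way fillfactor feeds the second one, can be sketched as follows. The function names are hypothetical, and the free-space figure is a rough approximation that ignores page header and line-pointer overhead.

```python
PAGE_SIZE = 8192   # default PostgreSQL block size in bytes


def reserved_free_bytes(fillfactor: int, page_size: int = PAGE_SIZE) -> int:
    """Approximate space left free on a freshly loaded page (header overhead ignored)."""
    return page_size * (100 - fillfactor) // 100


def can_stay_hot(changed_columns, indexed_columns, page_free_bytes, new_tuple_bytes) -> bool:
    """An update stays HOT only if no indexed column changed AND the page has room."""
    touches_index = bool(set(changed_columns) & set(indexed_columns))
    fits_on_page = new_tuple_bytes <= page_free_bytes
    return (not touches_index) and fits_on_page


indexed = {"id", "email"}
roomy = reserved_free_bytes(90)    # fillfactor 90 leaves roughly 10% of the page free
packed = reserved_free_bytes(100)  # fillfactor 100 leaves nothing reserved

print(can_stay_hot({"last_seen"}, indexed, roomy, new_tuple_bytes=120))   # True
print(can_stay_hot({"last_seen"}, indexed, packed, new_tuple_bytes=120))  # False
print(can_stay_hot({"email"}, indexed, roomy, new_tuple_bytes=120))       # False
```

The second call is the fillfactor argument in miniature: the update touches no indexed column, yet it still has to leave the page and create new index entries because no space was reserved.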

Era 2 — The visibility map: skip clean pages (8.4, 2009)

The change. PostgreSQL 8.4 added the visibility map (VM): a tiny separate fork of the relation, one bit per heap page, set when every tuple on that page is visible to all transactions (no dead tuples, nothing in flight). The point is brutally simple — if a page is marked all-visible, vacuum’s Phase I can skip reading it entirely. The full mechanism (the _vm fork, the visibilitymap_set / visibilitymap_get_status API, the crash-safe coordination with the heap page’s PD_ALL_VISIBLE flag) is the subject of postgres-visibility-map.md.

This is the single most important scaling change in the whole arc, because it broke the O(table size) floor on Phase I. With the VM in place, a vacuum’s heap-scan cost becomes proportional to the number of modified pages since the last vacuum, not to the size of the table. A 1 TB append-mostly table whose old pages are all-visible is vacuumed almost as cheaply as a 1 GB one.
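A one-bit-per-page map and the skip it enables can be sketched like this. The class and function names are hypothetical, the bitmap lives in an ordinary `bytearray` rather than a relation fork, and real vacuum applies further heuristics before deciding to skip.

```python
class OneBitVisibilityMap:
    """Toy visibility map: one all-visible bit per heap page."""

    def __init__(self, n_pages: int):
        self.n_pages = n_pages
        self.bits = bytearray((n_pages + 7) // 8)

    def set_all_visible(self, blk: int) -> None:
        self.bits[blk // 8] |= 1 << (blk % 8)

    def clear(self, blk: int) -> None:
        # Any INSERT/UPDATE/DELETE on the page must clear its bit.
        self.bits[blk // 8] &= ~(1 << (blk % 8)) & 0xFF

    def is_all_visible(self, blk: int) -> bool:
        return bool(self.bits[blk // 8] & (1 << (blk % 8)))


def pages_read_by_normal_vacuum(vm: OneBitVisibilityMap):
    """Phase I only needs the pages whose bit is not set."""
    return [blk for blk in range(vm.n_pages) if not vm.is_all_visible(blk)]


vm = OneBitVisibilityMap(1000)
for blk in range(1000):
    vm.set_all_visible(blk)    # a previous vacuum found every page clean
vm.clear(3)                    # two pages were modified since then
vm.clear(500)
print(pages_read_by_normal_vacuum(vm))   # [3, 500]
```

The map for 1000 pages occupies 125 bytes in this model, which is why consulting it is so much cheaper than reading the heap.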

The second payoff: index-only scans. The VM bit also tells the executor something: if a page is all-visible, an index scan that found a matching entry does not need to visit the heap to check visibility, because the index entry alone is trustworthy. This is the index-only scan, which landed later with 9.2’s executor work but rides on the 8.4 VM infrastructure, and it is why the VM lives in the storage engine rather than inside vacuum. See postgres-visibility-map.md for the nodeIndexonlyscan consumer side.

Why it could not yet solve wraparound. Here is the limitation that Era 3 exists to fix. The 8.4 VM had one bit: all-visible. All-visible says “nothing here is dead and nothing is in flight.” It does not say “everything here is frozen.” So when an anti-wraparound (aggressive) vacuum ran — the one that must freeze old tuples to advance relfrozenxid — it could not trust the all-visible bit to mean “no freezing needed here.” An all-visible page might still hold tuples whose xmin is old but not yet frozen. To be safe, the aggressive vacuum ignored the VM and read every page anyway. The skip optimisation helped ordinary vacuums but evaporated exactly when the table was under the most pressure: an anti-wraparound scan of a huge, cold, static table still re-read every page, every time, forever.

The structural shift for vacuum:

flowchart LR
    subgraph e2before["Pre-8.4 lazy_scan_heap"]
        direction TB
        b0["For block 0..N"] --> b1["Read page (always)"]
        b1 --> b2["Prune, collect dead TIDs,<br/>freeze old tuples"]
        b2 --> b3["Cost: O(table size)<br/>every vacuum"]
    end
    subgraph e2after["8.4+ lazy_scan_heap with VM"]
        direction TB
        a0["For block 0..N"] --> a1{"VM all-visible<br/>bit set?"}
        a1 -->|yes, normal vacuum| a2["SKIP page"]
        a1 -->|"no, or aggressive"| a3["Read page,<br/>prune, collect, freeze"]
        a3 --> a4["Cost: O(modified pages)<br/>EXCEPT aggressive<br/>still reads all"]
    end

Cross-link: postgres-visibility-map.md (the fork and bit semantics) and postgres-vacuum.md (how lazy_scan_heap consults the VM and the normal-vs-aggressive distinction).

Era 3 — The all-frozen bit / freeze map: skip frozen pages (9.6, 2016)

The change. PostgreSQL 9.6 widened the visibility map from one bit per page to two: the existing all-visible bit, and a new all-frozen bit. The all-frozen bit is set when every tuple on the page is not merely visible but fully frozen — no tuple has an unfrozen xmin that an anti-wraparound scan would need to act on. The map with this second bit is often called the freeze map. The bit definitions (VISIBILITYMAP_ALL_VISIBLE, VISIBILITYMAP_ALL_FROZEN) and the two-bits-per-page packing are detailed in postgres-visibility-map.md.

Why it was the missing half of Era 2. Recall the Era 2 limitation: the aggressive, anti-wraparound vacuum could not trust the single all-visible bit, so it re-read the whole table to be sure nothing needed freezing. The all-frozen bit closes exactly that gap. Now an aggressive vacuum can ask, per page, “is this page already all-frozen?” and, if so, skip it, because a frozen page by definition has nothing left to freeze. For the first time, the anti-wraparound obligation became proportional to the amount of unfrozen data rather than to the whole table.
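The per-page skip decision with two bits can be sketched as below. The bit values mirror the VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN flags named above; the function names and the list standing in for the map are hypothetical, and the real scan applies additional heuristics.

```python
ALL_VISIBLE = 0x01   # mirrors VISIBILITYMAP_ALL_VISIBLE
ALL_FROZEN = 0x02    # mirrors VISIBILITYMAP_ALL_FROZEN


def should_skip(vm_status: int, aggressive: bool) -> bool:
    """Decide whether the heap scan may skip a page, given its two VM bits."""
    if aggressive:
        # Anti-wraparound scan: only a fully frozen page has nothing left to freeze.
        return bool(vm_status & ALL_FROZEN)
    # Normal scan: an all-visible page holds no dead tuples to collect.
    return bool(vm_status & ALL_VISIBLE)


def pages_to_read(vm_statuses, aggressive: bool):
    return [blk for blk, status in enumerate(vm_statuses)
            if not should_skip(status, aggressive)]


# Block 0: frozen and visible. Block 1: visible only. Block 2: recently modified.
statuses = [ALL_VISIBLE | ALL_FROZEN, ALL_VISIBLE, 0]
print(pages_to_read(statuses, aggressive=False))   # [2]
print(pages_to_read(statuses, aggressive=True))    # [1, 2]
```

Block 1 is the case the single bit could not distinguish: clean enough for a normal vacuum to skip, but still holding unfrozen tuples that an aggressive vacuum must visit.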

This is enormous for the exact pathology that broke the baseline: a giant, mostly-cold table. Once its old pages are frozen and marked all-frozen, every subsequent anti-wraparound vacuum skips them. A 5 TB historical table that used to force a multi-hour full read every few hundred million transactions now reads only the pages that received recent writes.

There was a one-time cost and a transition wrinkle worth noting historically: existing pages had no all-frozen bit set, so the first aggressive vacuum after upgrading to 9.6 still had to read and freeze everything to populate the bit. After that, the skips compound. A later refinement (9.6 and tuning in following releases via vacuum_freeze_min_age / opportunistic freezing) pushed vacuum to set the all-frozen bit eagerly when it was already visiting an all-visible page, so cold pages reach the frozen-and-skippable state sooner.

The crash-safety detail is what makes the all-frozen bit trustworthy enough to skip on, and it is worth being precise about because a wrong bit here would mean silent corruption (a page skipped that actually needed freezing). The VM bits are not authoritative on their own; they are a cache of a fact also recorded on the heap page itself (PD_ALL_VISIBLE), and the two are updated under coordinated WAL logging so that a crash can never leave the map claiming a page is frozen when the heap disagrees. The skip is therefore only ever as aggressive as the durably-recorded truth. The full coordination protocol is in postgres-visibility-map.md; the point for this arc is that the all-frozen bit is a correctness contract, not a hint — the aggressive vacuum is allowed to skip a frozen page precisely because the contract guarantees there is nothing to freeze there.

flowchart LR
    subgraph f1["8.4–9.5 VM: one bit"]
        direction TB
        v1["Per page:<br/>ALL_VISIBLE bit"] --> v2{"Vacuum mode?"}
        v2 -->|normal| v3["Skip all-visible pages"]
        v2 -->|"aggressive<br/>(anti-wraparound)"| v4["Read EVERY page<br/>cannot trust all-visible<br/>for freeze decisions"]
    end
    subgraph f2["9.6+ VM: two bits"]
        direction TB
        w1["Per page:<br/>ALL_VISIBLE + ALL_FROZEN"] --> w2{"Vacuum mode?"}
        w2 -->|normal| w3["Skip all-visible pages"]
        w2 -->|"aggressive<br/>(anti-wraparound)"| w4["Skip ALL_FROZEN pages;<br/>read only unfrozen ones"]
    end

Cross-link: postgres-visibility-map.md (two-bit layout) and postgres-xid-wraparound-freeze.md (how freezing advances relfrozenxid and why the all-frozen bit is the contract that lets the aggressive scan skip).

Era 4 — Parallel index vacuum (13, 2020)

The change. Eras 1–3 all attack the heap side: making garbage rarer, making the heap scan skippable. But the baseline’s most expensive single phase was often Phase II: scanning every index to delete entries that point at dead heap TIDs. On a table with a dozen large indexes, that is a dozen serial full-index scans. PostgreSQL 13 made them parallel: an explicit VACUUM (not autovacuum) can launch background workers, each taking one index at a time, so the indexes are bulk-deleted concurrently.

The implementation is a dedicated module, src/backend/commands/vacuumparallel.c, which sets up a ParallelVacuumState in dynamic shared memory (DSM): the leader copies the parameters and the shared dead-TID storage into the DSM segment, launches workers, and each worker claims indexes to process. Indexes that are too small to be worth a worker, or whose AM declares it cannot run bulk-delete in parallel, are still processed by the leader. The parallelism is across indexes, not within one — each index is still cleaned by a single process — so the speedup is bounded by the number of (sufficiently large) indexes. This is described in postgres-vacuum.md’s parallel-vacuum section.

The structural shift is in who runs Phase II, not in the phases themselves:

flowchart TB
    subgraph s1["Pre-13: serial index vacuum"]
        direction TB
        l1["Leader (single process)"] --> i1["Bulk-delete index 1"]
        i1 --> i2["Bulk-delete index 2"]
        i2 --> i3["... index N"]
        i3 --> i4["Wall time ~ sum of all indexes"]
    end
    subgraph s2["13+: parallel index vacuum via DSM"]
        direction TB
        l2["Leader sets up<br/>ParallelVacuumState in DSM"] --> p1["Worker A: index 1"]
        l2 --> p2["Worker B: index 2"]
        l2 --> p3["Leader: small / unsafe indexes"]
        p1 --> p4["Wall time ~ slowest index"]
        p2 --> p4
        p3 --> p4
    end

Why it mattered. This was the first time vacuum broke out of being strictly single-threaded. For wide tables with many big indexes — common in OLAP-leaning and append-heavy schemas — the dominant cost moved from sum(index scans) toward max(index scan). It does not reduce total I/O, but it cuts wall-clock time, which matters when a maintenance window is finite.

Two design choices keep it safe and bounded. The leader only requests workers up to the number of eligible indexes (and the PARALLEL degree the user asked for), so a table with one index gains nothing and pays no DSM setup it does not need. And because the dead-TID storage must be shared, it lives in DSM with the parallel-aware allocator — which is exactly why this era and the TidStore era (below) had to be compatible: the shared store the workers iterate is the same abstraction the leader uses serially. Note also the scope: parallel vacuum is a property of explicit VACUUM invocations, not of autovacuum, which deliberately stays single-process per table so the launcher’s cost-balancing across the whole cluster remains predictable. Cross-link: postgres-vacuum.md (parallel state, worker assignment) and src/backend/commands/vacuumparallel.c.
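How the leader might bound its worker request can be sketched as follows. This is an illustration under assumptions, not the code in vacuumparallel.c: the function and parameter names are hypothetical, and it assumes the leader itself processes one eligible index, so it asks for one fewer worker than there are eligible indexes.

```python
def plan_parallel_workers(index_sizes, parallel_safe, requested,
                          min_index_size, max_workers):
    """Return how many background workers to request for index vacuuming.

    index_sizes:    size of each index, in any consistent unit
    parallel_safe:  per-index flag, True if the AM allows parallel bulk-delete
    requested:      PARALLEL degree from the user, 0 meaning "let vacuum choose"
    """
    eligible = sum(
        1 for size, safe in zip(index_sizes, parallel_safe)
        if safe and size >= min_index_size
    )
    # Assumption: the leader takes one eligible index itself.
    candidates = max(eligible - 1, 0)
    workers = min(requested, candidates) if requested > 0 else candidates
    return min(workers, max_workers)


# Four indexes: one is too small and one is not parallel-safe, leaving two eligible.
print(plan_parallel_workers([100, 200, 1, 300], [True, True, True, False],
                            requested=4, min_index_size=10, max_workers=8))
```

Under these assumptions a table with a single eligible index gets zero workers, matching the point above that such a table gains nothing and pays for no DSM setup.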

Era 5 — The wraparound failsafe (14, 2021)

The change. PostgreSQL 14 added a wraparound failsafe: an emergency mode that, when relfrozenxid (or relminmxid) has aged past a dangerous threshold, makes vacuum stop doing index vacuuming and heap cleanup work and race to freeze instead. Controlled by the vacuum_failsafe_age / vacuum_multixact_failsafe_age GUCs, it is checked periodically during the heap scan; once it trips, the global flag VacuumFailsafeActive is set and the rest of the vacuum skips the optional, time-consuming phases — index bulk-delete, index cleanup, cost-delay throttling — so it can finish advancing relfrozenxid as fast as physically possible. In the REL_18 tree this is lazy_check_wraparound_failsafe in src/backend/access/heap/vacuumlazy.c, called both early (before allocating the dead-TID store) and repeatedly during the scan.

Why it had to exist. Every prior era made the common case faster, but none addressed the failure case: what happens when, despite autovacuum, a table’s oldest XID creeps toward the 2^31 wraparound wall. Historically the database would, at the very last moment, refuse new XIDs and shut down to avoid corruption — a hard outage. The failsafe is a pressure-release valve: rather than spend hours dutifully cleaning indexes and obeying the cost delay while the clock runs out, vacuum throws everything non-essential overboard and does the one thing that actually prevents data loss — freeze and advance relfrozenxid. Index bloat and a stalled cost-delay budget are recoverable later; a wraparound shutdown is not. The mechanism and its interplay with anti-wraparound autovacuum are covered in postgres-vacuum.md and postgres-autovacuum.md; the XID-limit ladder it is racing against is in postgres-xid-wraparound-freeze.md.

Structurally this is not a new phase but a short-circuit through the existing ones: a boolean checked at the top of each 4 GB scan segment that, once set, causes lazy_vacuum to bypass the index phases (see the !VacuumFailsafeActive guards in vacuumlazy.c). It is the smallest code change of any era here and arguably the highest-leverage for operational safety.
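The shape of that short-circuit can be sketched as below. This is a simplified model: the helper names are hypothetical, the check is expressed in terms of block numbers rather than pages actually scanned, the age test is a plain comparison against the vacuum_failsafe_age default of 1.6 billion, and the real check involves more than this.

```python
BLCKSZ = 8192                                       # default block size in bytes
FAILSAFE_EVERY_PAGES = (4 * 1024 ** 3) // BLCKSZ    # pages in one 4 GB segment


def failsafe_check_blocks(n_pages: int):
    """Blocks at which the scan re-checks the failsafe (block 0 models the early check)."""
    return list(range(0, n_pages, FAILSAFE_EVERY_PAGES))


def should_trip_failsafe(relfrozenxid_age: int, failsafe_age: int = 1_600_000_000) -> bool:
    """Trip once the table's oldest unfrozen XID is older than the threshold."""
    return relfrozenxid_age > failsafe_age


def remaining_work(failsafe_active: bool):
    """Once the failsafe is active, only the freezing heap scan is kept."""
    work = ["heap scan: prune and freeze"]
    if not failsafe_active:
        work += ["index bulk-delete", "heap reap", "index cleanup"]
    return work


ten_gb_pages = (10 * 1024 ** 3) // BLCKSZ
print(failsafe_check_blocks(ten_gb_pages))          # [0, 524288, 1048576]
print(remaining_work(should_trip_failsafe(1_700_000_000)))
```

Checking once per 4 GB keeps the overhead negligible while still bounding how long a vacuum can keep doing optional work after the table crosses the threshold.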

Era 6 — TidStore: the dead-TID radix store (17, 2024)

The change. PostgreSQL 17 replaced the flat, fixed-width dead-TID array — the one piece of the baseline that had survived essentially unchanged since Era 0 — with TidStore, a compact, adaptive-radix-tree-backed store for dead tuple IDs. The relevant code is src/backend/access/common/tidstore.c and src/include/access/tidstore.h; in vacuumlazy.c, vacrel->dead_items is now a TidStore * rather than a VacDeadItems array, with a VacDeadItemsInfo companion tracking counts and the memory budget.

Why the flat array had to go. The old array stored one full ItemPointerData per dead tuple and was hard-capped by maintenance_work_mem (historically at most ~1 GB worth, i.e. the array could not exceed a fixed INT_MAX-bounded element count regardless of how much memory you gave it). Two consequences:

  • Memory waste. Dead TIDs cluster by page — many dead offsets share the same block number. A flat array of (block, offset) pairs re-stored the block number for every offset. The radix store keys on the block number and packs the offsets within a page into a bitmap, so a page with 100 dead tuples costs roughly one key plus a small bitmap instead of 100 six-byte entries. On realistic bloated tables the memory footprint drops several-fold.
  • The 1 GB ceiling and extra index scans. Because the array could not grow past its hard cap no matter how much maintenance_work_mem was available, a vacuum that found more dead tuples than the cap had to flush early — run Phase II (every index) and Phase III, empty the array, and resume — incurring additional full index scans. TidStore’s compression means far more dead TIDs fit in the same memory, so the array fills less often and those extra index passes largely disappear. PostgreSQL 17 also let vacuum actually use memory beyond the old 1 GB practical limit.
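The difference in layout between the two bullets above can be sketched with a toy store. This is not TidStore: a plain dictionary stands in for the adaptive radix tree, a Python integer stands in for the per-page offset bitmap, and the class and function names are hypothetical. It only shows why keying on the block number stops the block from being stored once per offset.

```python
TID_BYTES = 6   # size of one ItemPointerData entry in the flat array


def flat_array_bytes(dead_tids) -> int:
    """Memory for the old layout: one full (block, offset) entry per dead tuple."""
    return TID_BYTES * len(dead_tids)


class BlockBitmapStore:
    """Toy dead-TID store: one key per block, offsets packed into a bitmap."""

    def __init__(self):
        self._bitmaps = {}   # block number -> integer used as an offset bitmap

    def add(self, block: int, offset: int) -> None:
        self._bitmaps[block] = self._bitmaps.get(block, 0) | (1 << offset)

    def contains(self, block: int, offset: int) -> bool:
        # This is the lookup an index bulk-delete performs for every index entry.
        return bool((self._bitmaps.get(block, 0) >> offset) & 1)

    def block_count(self) -> int:
        return len(self._bitmaps)


# One page with 100 dead tuples at offsets 1..100.
dead = [(42, offset) for offset in range(1, 101)]
store = BlockBitmapStore()
for block, offset in dead:
    store.add(block, offset)

print(flat_array_bytes(dead))    # 600 bytes in the flat layout
print(store.block_count())       # a single key in the block-keyed layout
print(store.contains(42, 57), store.contains(42, 101))
```

A 100-bit bitmap fits in 13 bytes, so in this model the page costs one key plus a small bitmap instead of 600 bytes of repeated block numbers and offsets.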

The store is the same structure used in DSM for parallel index vacuum (Era 4), so the two improvements compose: the parallel workers iterate a shared TidStore rather than a shared flat array.

flowchart LR
    subgraph t1["Pre-17: flat dead-TID array"]
        direction TB
        a1["VacDeadItems[]<br/>one ItemPointerData<br/>per dead tuple"] --> a2["Block repeated<br/>per offset"]
        a2 --> a3["Hard-capped element count<br/>(~1 GB practical limit)"]
        a3 --> a4["Fills early on bloated tables<br/>=> extra full index scans"]
    end
    subgraph t2["17+: TidStore radix store"]
        direction TB
        b1["TidStore<br/>adaptive radix tree<br/>keyed on block number"] --> b2["Offsets packed<br/>as per-page bitmap"]
        b2 --> b3["Several-fold less memory<br/>can exceed old 1 GB limit"]
        b3 --> b4["Fills far less often<br/>=> fewer index passes"]
    end

Cross-link: postgres-vacuum.md (dead_items_alloc, the TidStore/VacDeadItemsInfo pairing, and how the store drives Phase II/III) and src/backend/access/common/tidstore.c.

At REL_18 (commit 273fe94, PG 18.x) the vacuum-and-visibility stack is the sum of every era above, layered onto the same three-phase lazy_scan_heap skeleton from Era 0:

  • Don’t make garbage: HOT updates and opportunistic single-page pruning (pruneheap.c) keep dead tuples and index entries from accumulating in the first place. → postgres-heap-am.md.
  • Don’t read what’s clean / static: the two-bit visibility map (visibilitymap.c) lets normal vacuum skip all-visible pages and aggressive vacuum skip all-frozen ones, so both the dead-tuple scan and the anti-wraparound scan are proportional to churn, not to table size. → postgres-visibility-map.md.
  • Scale the unavoidable: when indexes must be cleaned, vacuumparallel.c fans bulk-delete out across DSM workers, and the dead TIDs they consume live in a memory-efficient TidStore (tidstore.c) rather than a flat array. → postgres-vacuum.md.
  • Survive the worst case: lazy_check_wraparound_failsafe in vacuumlazy.c trips VacuumFailsafeActive to bypass index work and race to freeze when relfrozenxid is dangerously old, turning a would-be wraparound shutdown into recoverable bloat. → postgres-vacuum.md, postgres-xid-wraparound-freeze.md, postgres-autovacuum.md.

The orchestration around all of this — the launcher/worker model that decides when a table is vacuumed and forces anti-wraparound vacuums even when autovacuum is disabled — is in postgres-autovacuum.md.

The PG19 next step. The direction of travel continues toward decoupling the freeze obligation from the table-size scan: PG19-era development work pushes on eager and more incremental freezing strategies (visiting fewer pages to keep relfrozenxid advancing) so that aggressive anti-wraparound vacuums become rarer still. Treat that only as a forward note; the design described above is the current REL_18 behaviour.

  • Release notes — PostgreSQL release notes for 8.3 (HOT), 8.4 (visibility map), 9.6 (freeze map / all-frozen bit), 13 (parallel VACUUM), 14 (vacuum emergency / wraparound failsafe, vacuum_failsafe_age), 17 (vacuum dead-TID memory / TidStore).
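  • Current-state module docs (mechanism, not re-derived here): postgres-heap-am.md, postgres-visibility-map.md, postgres-xid-wraparound-freeze.md, postgres-vacuum.md, postgres-autovacuum.md.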
  • Key source files (observable on REL_18, commit 273fe94)
    • src/backend/access/heap/vacuumlazy.c: lazy_scan_heap, lazy_check_wraparound_failsafe, VacuumFailsafeActive, dead_items.
    • src/backend/access/heap/visibilitymap.c: VM fork get/set.
    • src/backend/access/heap/pruneheap.c: HOT pruning / heap_page_prune_opt.
    • src/backend/commands/vacuumparallel.c: ParallelVacuumState, DSM setup.
    • src/backend/access/common/tidstore.c, src/include/access/tidstore.h: the radix dead-TID store.
    • src/backend/access/heap/heapam.c, src/backend/access/heap/README.HOT: HOT mechanics.