PostgreSQL Incremental Backup — WAL Summaries, the Manifest, and pg_combinebackup

Contents:

Theoretical Background
Common DBMS Design
PostgreSQL’s Approach
Source Walkthrough
Source verification (as of 2026-06-05)
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Sources

Theoretical Background

A physical backup of a database is a byte-for-byte copy of its on-disk state: the data files, plus enough write-ahead log (WAL) to bring that copy to a consistent point during recovery. A full backup copies every file. The cost of a full backup grows with the size of the cluster, not with the rate of change — a 10 TB warehouse that mutates 1 % of its pages per day still pays 10 TB of read, transfer, and storage for each backup. That mismatch between data size and change size is the entire motivation for incremental backup.

An incremental backup copies only the data that changed since some earlier reference backup, plus the metadata needed to reconstruct a full image later. The design question every system must answer is: how do we know which bytes changed? There are three classical answers, and they trade accuracy for bookkeeping cost:

1. Timestamp / mtime comparison. Copy any file whose modification time is newer than the reference backup. Cheap, but coarse (a one-byte change copies the whole file) and unsafe (clock skew, in-place writes that do not bump mtime, files removed and recreated with the same name). File-level granularity is far too coarse for a database where a 1 GB relation segment changes a handful of 8 KB pages.
2. Block-checksum diffing. Read every block of both the live file and the reference, compare checksums, copy the blocks that differ. Accurate at page granularity, but it must read the entire database to find the changes, so it eliminates the transfer cost but not the read cost. This is what rsync --checksum style tools effectively do.
3. Change tracking from the log. The database already records every page modification in its WAL for durability. If we summarize the WAL, distilling from it the set of (relation, fork, block) tuples that were touched between two LSNs, we can know exactly which blocks changed without reading the data files at all. The cost is proportional to the volume of WAL, i.e. to the change rate, which is exactly the quantity we wanted to be proportional to.

PostgreSQL implements option 3, starting with version 17. This is the same idea that underlies ARIES-style redo (Mohan et al. 1992): the WAL is an authoritative, ordered record of every change, so any question of the form “what changed between LSN a and LSN b?” can in principle be answered by replaying the log between those points. Incremental backup replays the log not to apply changes but to enumerate them. Database Internals (Petrov, ch. 3 “File Formats” and the recovery discussion) frames the WAL as the system’s single source of truth for durability; incremental backup is a second consumer of that same truth, parallel to crash recovery and to streaming replication.

Two derived concepts make this concrete:

The block reference table. The summarized form of “what changed” is a map from each (RelFileLocator, ForkNumber) to the set of block numbers modified in the LSN window, plus a limit block marking truncations (blocks at or above the limit no longer exist). PostgreSQL calls this a BlockRefTable. Its in-memory representation converges to roughly one bit per modified block for densely-modified relations, so even a large change set stays compact.
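
As a rough illustration of that idea, the sketch below tracks modified blocks for a single relation fork with a fixed-size bitmap and a limit block. It is a toy under stated assumptions, not PostgreSQL's BlockRefTable: ToyForkEntry, toy_init, toy_mark_modified, toy_set_limit, toy_is_modified, and TOY_MAX_BLOCKS are all hypothetical names, and the real structure is keyed by (RelFileLocator, ForkNumber) and is not capped at a fixed size.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_BLOCKS 1024u          /* hypothetical cap for the sketch */
#define TOY_NO_LIMIT   UINT32_MAX     /* marker: no truncation seen */

/* Hypothetical per-fork entry: one bit per block plus a limit block. */
typedef struct ToyForkEntry
{
    uint8_t  bits[TOY_MAX_BLOCKS / 8];
    uint32_t limit_block;
} ToyForkEntry;

static void
toy_init(ToyForkEntry *e)
{
    memset(e->bits, 0, sizeof(e->bits));
    e->limit_block = TOY_NO_LIMIT;
}

/* Record that a WAL record touched block blkno. */
static void
toy_mark_modified(ToyForkEntry *e, uint32_t blkno)
{
    if (blkno < TOY_MAX_BLOCKS)
        e->bits[blkno / 8] |= (uint8_t) (1u << (blkno % 8));
}

/*
 * Record a truncation to "limit" blocks: keep the smallest limit seen and
 * forget modifications at or above it, since those blocks were removed.
 */
static void
toy_set_limit(ToyForkEntry *e, uint32_t limit)
{
    uint32_t b;

    if (limit < e->limit_block)
        e->limit_block = limit;
    for (b = limit; b < TOY_MAX_BLOCKS; ++b)
        e->bits[b / 8] &= (uint8_t) ~(1u << (b % 8));
}

static bool
toy_is_modified(const ToyForkEntry *e, uint32_t blkno)
{
    return blkno < TOY_MAX_BLOCKS &&
        (e->bits[blkno / 8] & (1u << (blkno % 8))) != 0;
}
```

Marking a block again after a truncation simply sets its bit again, which models a relation that shrinks and is later re-extended within the same LSN window.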

The reconstruction problem. An incremental backup is, by construction, not restorable on its own — it is a delta. To restore, you need the incremental backup plus the entire chain of backups it depends on, back to a full backup. Reconstruction walks that chain and, for every block of every file, decides which backup in the chain holds the authoritative copy. This is a classic most-recent-wins merge over a layered set of deltas, the same shape as a log-structured merge or a stack of filesystem overlay layers.

The design space PostgreSQL chose within:

Summary granularity — file, segment, or block? PostgreSQL summarizes at block granularity but stores summaries per WAL LSN range, decoupled from any particular backup.
Who does the diffing — the backup client, or the server? PostgreSQL does it on the server, because only the server has the WAL summaries and the timeline history.
Where reconstruction happens — at backup time (synthetic full backups) or at restore time? PostgreSQL defers it to restore time via a separate frontend tool, pg_combinebackup, keeping the server side purely about producing deltas.

Common DBMS Design

Incremental and differential backup is old technology; nearly every serious DBMS and storage system has a version of it, and they converge on a small set of structural choices. Naming them makes PostgreSQL’s specific symbols read as one set of points in a shared design space.

Incremental vs. differential vs. cumulative

The vocabulary is shared across the industry:

A differential (or cumulative) backup records all changes since the last full backup. Restoration needs exactly two backups: the full plus the latest differential. The differential grows over time.
An incremental backup records changes since the last backup of any kind. Restoration needs the full chain. Each increment is small, but the chain can be long.

PostgreSQL’s mechanism is genuinely incremental: a BASE_BACKUP ... INCREMENTAL is taken relative to whatever prior backup’s manifest you upload, and that prior backup may itself have been incremental. The chain length is bounded only by how often you take a full backup.

A change-tracking side structure

Systems that diff from the log rather than from the data files all maintain some persistent side structure that records “blocks dirtied in this log window”:

SQL Server keeps a Differential Changed Map (DCM), one bit per extent, updated as pages are written; differential backups read the DCM.
Oracle RMAN maintains a block change tracking file recording changed blocks since the last backup, so an incremental backup need not scan every datafile.
Db2 tracks modified pages when its TRACKMOD database configuration parameter is enabled, and its incremental backups are driven by that tracking.

PostgreSQL’s analog is the set of WAL summary files under pg_wal/summaries/, produced by a dedicated WAL summarizer background worker (covered in the sibling note; here we only consume its output). The crucial PostgreSQL-specific twist is that the summaries are keyed by LSN range, not by backup — a summary covering 0/10000000 to 0/20000000 is reusable for any incremental backup whose chain crosses that window. The side structure is decoupled from the backup schedule.

A self-describing manifest

Every modern backup format ships a manifest: a list of files with sizes, checksums, and the WAL range needed for consistency. The manifest serves three roles — verification (pg_verifybackup), incremental referencing (it tells the next incremental backup what the prior state was), and reconstruction (it lets the combiner reuse stored checksums). PostgreSQL’s backup_manifest is a JSON document; for incremental backups the client uploads the prior manifest to the server before requesting the backup.

Reconstruction as a layered merge

Whether reconstruction happens eagerly (synthesizing a full backup at backup time, as some enterprise tools do) or lazily (at restore time), the algorithm is the same most-recent-wins overlay: for each block, scan the backup chain from newest to oldest and take the first copy you find; if no backup in the chain has the block but the block is below the file’s truncation point, it must be a zero-filled hole.
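
The overlay rule can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical in-memory description of each layer (ToyLayer) rather than real backup files; toy_layer_has_block and toy_resolve_block are invented names, and index 0 is taken to be the newest backup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layer: which blocks one backup stores for one file. */
typedef struct ToyLayer
{
    bool            is_full;  /* full copy of the file? */
    unsigned        nblocks;  /* full: file length in blocks;
                               * incremental: number of stored blocks */
    const unsigned *blocks;   /* incremental only: stored block numbers */
} ToyLayer;

static bool
toy_layer_has_block(const ToyLayer *l, unsigned blkno)
{
    unsigned i;

    if (l->is_full)
        return blkno < l->nblocks;
    for (i = 0; i < l->nblocks; ++i)
        if (l->blocks[i] == blkno)
            return true;
    return false;
}

/*
 * Return the index of the newest layer holding blkno, scanning from
 * newest (index 0) to oldest. Return -1 when no layer has it; the caller
 * treats that as a zero-filled block if blkno is below the truncation
 * point. The scan stops at the first full layer, because nothing older
 * can contribute.
 */
static int
toy_resolve_block(const ToyLayer *chain, size_t nlayers, unsigned blkno)
{
    size_t i;

    for (i = 0; i < nlayers; ++i)
    {
        if (toy_layer_has_block(&chain[i], blkno))
            return (int) i;
        if (chain[i].is_full)
            break;
    }
    return -1;
}
```

With a chain of two incrementals over a four-block full backup, a block stored in both incrementals resolves to the newer one, and a block beyond the full file that no incremental stores resolves to -1.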

flowchart TD
  subgraph Theory["Change-tracking backup, abstractly"]
    A["Authoritative change log<br/>(WAL)"] --> B["Summarizer:<br/>distill changed (rel,fork,block)<br/>per LSN range"]
    B --> C["Persistent side structure<br/>(WAL summary files)"]
    C --> D["Backup time:<br/>diff = blocks changed since<br/>reference backup"]
    D --> E["Incremental backup<br/>(deltas + manifest)"]
    E --> F["Restore time:<br/>layered most-recent-wins merge<br/>over the backup chain"]
    F --> G["Full synthetic data directory"]
  end

PostgreSQL’s Approach

PostgreSQL splits incremental backup across four cooperating pieces. The boundaries matter because this note covers only the core backup mechanism: the first three pieces live in the server, and the fourth is a standalone frontend tool.

1. WAL summarizer (a background worker, walsummarizer.c) continuously reads WAL as it is generated and writes block-level summary files into pg_wal/summaries/. This runs whether or not anyone ever takes an incremental backup, gated by summarize_wal = on. It is out of scope here; see the sibling note on archiving/WAL summarization.
2. Manifest ingestion (basebackup_incremental.c + UploadManifest in walsender.c). Before requesting an incremental backup, the client sends UPLOAD_MANIFEST and streams the prior backup’s backup_manifest. The server parses it incrementally into an IncrementalBackupInfo object.
3. The incremental BASE_BACKUP (basebackup.c + PrepareForIncrementalBackup / GetFileBackupMethod). The server loads the WAL summaries spanning the prior backup’s start through the current backup’s start, merges them into one BlockRefTable, then for each relation file chooses between a full copy and an incremental file (possibly a header-only stub) and streams the chosen bytes.
4. Reconstruction (pg_combinebackup, a frontend tool under src/bin/). Given a chain of backup directories, it produces a single full data directory by merging blocks newest-to-oldest. The server is never involved.

Wiring the protocol: UPLOAD_MANIFEST then BASE_BACKUP

The client (pg_basebackup --incremental=PRIOR/backup_manifest) first runs UPLOAD_MANIFEST. The walsender allocates an IncrementalBackupInfo in a dedicated memory context, then loops over CopyData packets feeding manifest bytes in:

// UploadManifest — src/backend/replication/walsender.c
mcxt = AllocSetContextCreate(CurrentMemoryContext,
                             "incremental backup information",
                             ALLOCSET_DEFAULT_SIZES);
ib = CreateIncrementalBackupInfo(mcxt);
/* ... send CopyInResponse ... */
while (HandleUploadManifestPacket(&buf, &offset, ib))
    ;
FinalizeIncrementalManifest(ib);
/* preserve ib across the later BASE_BACKUP in CacheMemoryContext */
MemoryContextSetParent(mcxt, CacheMemoryContext);
uploaded_manifest = ib;

The parsed-and-retained uploaded_manifest is handed to perform_base_backup when the subsequent BASE_BACKUP ... INCREMENTAL arrives. The incremental option itself is validated at parse time and refuses to proceed unless WAL summarization is enabled — there can be no summaries to diff against otherwise:

// parse_basebackup_options — src/backend/backup/basebackup.c
else if (strcmp(defel->defname, "incremental") == 0)
{
    opt->incremental = defGetBoolean(defel);
    if (opt->incremental && !summarize_wal)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("incremental backups cannot be taken unless WAL summarization is enabled")));
}

Parsing the manifest incrementally

A manifest for a large cluster can be many megabytes (one entry per file). CreateIncrementalBackupInfo wires up a streaming JSON parser with callbacks, and seeds a simplehash of file entries sized for a realistic cluster:

// CreateIncrementalBackupInfo — src/backend/backup/basebackup_incremental.c
ib->manifest_files = backup_file_create(mcxt, 10000, NULL);
context = palloc0(sizeof(JsonManifestParseContext));
context->private_data = ib;
context->version_cb = manifest_process_version;
context->system_identifier_cb = manifest_process_system_identifier;
context->per_file_cb = manifest_process_file;
context->per_wal_range_cb = manifest_process_wal_range;
context->error_cb = manifest_report_error;
ib->inc_state = json_parse_manifest_incremental_init(context);

The per-WAL-range callback is the load-bearing one for incremental logic: it records the (tli, start_lsn, end_lsn) of each WAL range the prior backup needs. The per-file callback retains only path and size — used for sanity checks, not for deciding what changed (that comes from the WAL summaries):

// manifest_process_wal_range — basebackup_incremental.c
range->tli = tli;
range->start_lsn = start_lsn;
range->end_lsn = end_lsn;
ib->manifest_wal_ranges = lappend(ib->manifest_wal_ranges, range);

The streaming design exists precisely so that the whole manifest never has to be buffered at once. AppendIncrementalManifestData triggers a parse step whenever the buffer is about to exceed MAX_CHUNK (128 KiB) and leaves the last MIN_CHUNK (1 KiB) bytes unparsed, so the trailing checksum line stays intact.

Deciding which blocks are needed

PrepareForIncrementalBackup is the heart of the server side. It does five things in order: (1) validate the manifest’s WAL ranges against this server’s timeline history; (2) sanity-check the LSN boundaries; (3) wait for the WAL summarizer to catch up; (4) gather the WAL summary files covering the range of interest; (5) merge them into one BlockRefTable.

The timeline matching guards against taking an incremental backup relative to a backup that does not actually represent a prior state of this server:

// PrepareForIncrementalBackup — basebackup_incremental.c
expectedTLEs = readTimeLineHistory(backup_state->starttli);
/* ... match each manifest WAL range's TLI into expectedTLEs ... */
if (tlep[i] == NULL)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("timeline %u found in manifest, but not in this server's history",
                    range->tli)));
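
The timeline part of that check can be pictured with a small stand-alone sketch. It assumes a simplified history represented as a plain array of timeline IDs and a hypothetical ToyWalRange struct; toy_first_unknown_timeline is an invented name, and the LSN boundary checks that the real function also performs are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified WAL range as recorded in a manifest. */
typedef struct ToyWalRange
{
    uint32_t tli;
    uint64_t start_lsn;
    uint64_t end_lsn;
} ToyWalRange;

/*
 * Return the index of the first manifest range whose timeline does not
 * appear in the server's history, or -1 if every range matches.
 */
static int
toy_first_unknown_timeline(const ToyWalRange *ranges, size_t nranges,
                           const uint32_t *history, size_t nhistory)
{
    size_t i;
    size_t j;

    for (i = 0; i < nranges; ++i)
    {
        bool found = false;

        for (j = 0; j < nhistory; ++j)
            if (history[j] == ranges[i].tli)
                found = true;
        if (!found)
            return (int) i;     /* caller would raise an error here */
    }
    return -1;
}
```

A manifest taken from a server that later diverged onto a different timeline would fail this test, which is the situation the error message above describes.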

Once the LSN window is known, it blocks until summarization reaches the backup’s start point, then collects and filters the summary files, and finally streams each summary’s block references into the in-memory table:

// PrepareForIncrementalBackup — basebackup_incremental.c (summary merge)
WaitForWalSummarization(backup_state->startpoint);
all_wslist = GetWalSummaries(0, earliest_wal_range_start_lsn,
                             backup_state->startpoint);
/* ... per-timeline FilterWalSummaries + WalSummariesAreComplete check ... */
ib->brtab = CreateEmptyBlockRefTable();
foreach(lc, required_wslist)
{
    /* open summary, read each relation fork ... */
    while (BlockRefTableReaderNextRelation(reader, &rlocator, &forknum,
                                           &limit_block))
    {
        BlockRefTableSetLimitBlock(ib->brtab, &rlocator, forknum, limit_block);
        while ((nblocks = BlockRefTableReaderGetBlocks(reader, blocks,
                                                       BLOCKS_PER_READ)) != 0)
            for (i = 0; i < nblocks; ++i)
                BlockRefTableMarkBlockModified(ib->brtab, &rlocator,
                                               forknum, blocks[i]);
    }
}

If any summary is missing for a required LSN range, the backup fails loudly rather than silently producing an unsafe delta — the whole point is that the change set must be complete.
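
A gap check of this kind can be sketched as a single pass over summaries sorted by start LSN. The following is an illustration only, assuming a hypothetical ToySummary struct with half-open LSN ranges; toy_summaries_are_complete is an invented name and is not the signature of the server's own completeness check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary descriptor covering [start_lsn, end_lsn). */
typedef struct ToySummary
{
    uint64_t start_lsn;
    uint64_t end_lsn;
} ToySummary;

/*
 * Given summaries sorted by start_lsn, report whether they cover
 * [start, end) with no gap. On failure, *missing_lsn receives the first
 * LSN that no summary covers.
 */
static bool
toy_summaries_are_complete(const ToySummary *s, size_t n,
                           uint64_t start, uint64_t end,
                           uint64_t *missing_lsn)
{
    uint64_t covered = start;   /* everything below this is covered */
    size_t   i;

    for (i = 0; i < n && covered < end; ++i)
    {
        if (s[i].start_lsn > covered)
            break;              /* gap before this summary */
        if (s[i].end_lsn > covered)
            covered = s[i].end_lsn;
    }
    if (covered >= end)
        return true;
    *missing_lsn = covered;
    return false;
}
```

Reporting the first uncovered LSN, rather than a bare failure, is what lets an error message tell the operator which stretch of WAL has no summary.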

flowchart TD
  C1["pg_basebackup --incremental=PRIOR/backup_manifest"] --> C2["UPLOAD_MANIFEST<br/>(stream prior manifest)"]
  C2 --> S1["CreateIncrementalBackupInfo<br/>+ streaming JSON parse"]
  S1 --> C3["BASE_BACKUP ... INCREMENTAL"]
  C3 --> S2["PrepareForIncrementalBackup"]
  S2 --> S2a["match WAL ranges vs<br/>timeline history"]
  S2a --> S2b["WaitForWalSummarization<br/>(start LSN)"]
  S2b --> S2c["GetWalSummaries +<br/>FilterWalSummaries +<br/>WalSummariesAreComplete"]
  S2c --> S2d["merge into one<br/>BlockRefTable (ib->brtab)"]
  S2d --> S3["sendDir: for each relation file<br/>GetFileBackupMethod"]
  S3 --> M1["BACK_UP_FILE_FULLY"]
  S3 --> M2["BACK_UP_FILE_INCREMENTALLY<br/>(header + changed blocks)"]
  M1 --> S4["sendFile streams bytes"]
  M2 --> S4
  S4 --> OUT["base.tar + backup_manifest<br/>(this backup)"]

Source Walkthrough

This section walks the server-side code top-down — manifest ingestion, the per-file decision, the on-the-wire incremental format — and then crosses into the pg_combinebackup frontend for reconstruction. Symbols are the durable anchors; the position-hint table at the end pins each to a (file, line) as observed on 2026-06-05.

Manifest ingestion: `CreateIncrementalBackupInfo` and the streaming parser

UploadManifest (in walsender.c) is the protocol entry point. It creates an IncrementalBackupInfo, then drives the streaming JSON parser one CopyData packet at a time and finalizes. The object is reparented into CacheMemoryContext so it survives until the matching BASE_BACKUP:

// UploadManifest — src/backend/replication/walsender.c
mcxt = AllocSetContextCreate(CurrentMemoryContext,
                             "incremental backup information",
                             ALLOCSET_DEFAULT_SIZES);
ib = CreateIncrementalBackupInfo(mcxt);
/* ... CopyInResponse, then loop feeding bytes ... */
while (HandleUploadManifestPacket(&buf, &offset, ib))
    ;
FinalizeIncrementalManifest(ib);
MemoryContextSetParent(mcxt, CacheMemoryContext);
uploaded_manifest = ib;

The buffering discipline in AppendIncrementalManifestData is what keeps an arbitrarily large manifest from being held in memory all at once. It parses incrementally whenever the accumulated buffer would cross MAX_CHUNK, always retaining the last MIN_CHUNK bytes so a checksum line straddling a chunk boundary is never split mid-token:

// AppendIncrementalManifestData — src/backend/backup/basebackup_incremental.c
if (ib->buf.len > MIN_CHUNK && ib->buf.len + len > MAX_CHUNK)
{
    /* Parse all but the last MIN_CHUNK bytes of data we have so far. */
    json_parse_manifest_incremental_chunk(
        ib->inc_state, ib->buf.data, ib->buf.len - MIN_CHUNK, false);
    /* Now shift the data that hasn't yet been parsed to the start of
     * the buffer. */
    memmove(ib->buf.data, ib->buf.data + (ib->buf.len - MIN_CHUNK),
            MIN_CHUNK + 1);
    ib->buf.len = MIN_CHUNK;
}
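
To see the buffering discipline in isolation, here is a stand-alone sketch with the chunk sizes shrunk to 8 and 64 bytes and the JSON parser replaced by a byte counter. ToyManifestBuf, toy_parse_chunk, and toy_append are hypothetical names, and the fixed-size array stands in for the growable string buffer the server uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MIN_CHUNK 8     /* shrunk for illustration */
#define TOY_MAX_CHUNK 64    /* shrunk for illustration */

typedef struct ToyManifestBuf
{
    char   data[TOY_MIN_CHUNK + TOY_MAX_CHUNK];
    size_t len;             /* bytes buffered but not yet parsed */
    size_t parsed;          /* bytes already handed to the parser */
} ToyManifestBuf;

/* Stand-in for the incremental JSON parser: it only counts bytes. */
static void
toy_parse_chunk(ToyManifestBuf *b, const char *p, size_t n)
{
    (void) p;
    b->parsed += n;
}

/* Append one piece (at most TOY_MAX_CHUNK bytes), parsing first if needed. */
static void
toy_append(ToyManifestBuf *b, const char *p, size_t n)
{
    assert(n <= TOY_MAX_CHUNK);
    if (b->len > TOY_MIN_CHUNK && b->len + n > TOY_MAX_CHUNK)
    {
        size_t consume = b->len - TOY_MIN_CHUNK;

        /* Parse everything except the trailing TOY_MIN_CHUNK bytes ... */
        toy_parse_chunk(b, b->data, consume);
        /* ... and slide that unparsed tail to the front of the buffer. */
        memmove(b->data, b->data + consume, TOY_MIN_CHUNK);
        b->len = TOY_MIN_CHUNK;
    }
    memcpy(b->data + b->len, p, n);
    b->len += n;
}
```

Because a parse step always leaves TOY_MIN_CHUNK bytes behind, the buffer never holds more than TOY_MIN_CHUNK + TOY_MAX_CHUNK bytes, which is why the array can be sized statically in this sketch.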

The only callback whose output drives the diff is manifest_process_wal_range — the per-file callback merely records path and size for the existence checks in GetFileBackupMethod. The WAL ranges tell the server which timeline/LSN windows the prior backup spans, which is exactly what PrepareForIncrementalBackup later validates against this server’s own timeline history.

The per-file decision: `GetFileBackupMethod`

This is the function that turns “what changed” into “how do I send this file.” It is called once per relation segment from sendDir in basebackup.c. The early-out ladder is worth reading verbatim, because the order of the bail-outs encodes correctness constraints, not just optimizations.

First come two unconditional full-backup cases. One is a malformed size; the other is the free-space map fork, which is not reliably WAL-logged, so WAL summaries cannot be trusted to describe its changes:

// GetFileBackupMethod — src/backend/backup/basebackup_incremental.c
if ((size % BLCKSZ) != 0 || size / BLCKSZ > RELSEG_SIZE)
    return BACK_UP_FILE_FULLY;

/*
 * The free-space map fork is not properly WAL-logged, so we need to
 * backup the entire file every time.
 */
if (forknum == FSM_FORKNUM)
    return BACK_UP_FILE_FULLY;

Second, the “did this file exist in the prior backup?” guard. A file the prior manifest never mentioned cannot be sent as a delta against it — and critically, a file created after the current backup started will have no WAL summary coverage, so it too must be sent fully. The code probes both the plain path and the INCREMENTAL.* path (the prior backup may itself have stored this segment incrementally):

// GetFileBackupMethod — basebackup_incremental.c
if (backup_file_lookup(ib->manifest_files, path) == NULL)
{
    char       *ipath;

    ipath = GetIncrementalFilePath(dboid, spcoid, relfilenumber,
                                   forknum, segno);
    if (backup_file_lookup(ib->manifest_files, ipath) == NULL)
        return BACK_UP_FILE_FULLY;
}

Third, the actual BlockRefTable lookup. A missing entry means the WAL recorded no changes to this relation fork, so a non-empty file can be sent as a zero-block incremental stub (header only). An empty file is sent fully instead, because a zero-length full copy is smaller than any incremental header. A present entry yields the set of changed blocks plus a limit_block marking truncation:

// GetFileBackupMethod — basebackup_incremental.c
brtentry = BlockRefTableGetEntry(ib->brtab, &rlocator, forknum,
                                 &limit_block);
if (brtentry == NULL)
{
    if (size == 0)
        return BACK_UP_FILE_FULLY;
    *num_blocks_required = 0;
    *truncation_block_length = size / BLCKSZ;
    return BACK_UP_FILE_INCREMENTALLY;     /* a header-only stub */
}

Finally, the 90 % heuristic. BlockRefTableEntryGetBlocks fills the caller’s array with the absolute block numbers in this segment’s range; if that count would be more than 90 % of the file, the incremental encoding saves nothing and the file is sent fully. Otherwise the block numbers are sorted and rebased to segment-relative form, and truncation_block_length is clamped against the limit block and the segment size:

// GetFileBackupMethod — basebackup_incremental.c
nblocks = BlockRefTableEntryGetBlocks(brtentry, start_blkno, stop_blkno,
                                      relative_block_numbers, RELSEG_SIZE);
/* If we'd need to send 90% of the blocks anyway, send the whole file. */
if (nblocks * BLCKSZ > size * 0.9)
    return BACK_UP_FILE_FULLY;

qsort(relative_block_numbers, nblocks, sizeof(BlockNumber),
      compare_block_numbers);
if (start_blkno != 0)
    for (i = 0; i < nblocks; ++i)
        relative_block_numbers[i] -= start_blkno;
*num_blocks_required = nblocks;
*truncation_block_length = size / BLCKSZ;
/* ... clamp truncation_block_length to [relative_limit, RELSEG_SIZE] ... */
return BACK_UP_FILE_INCREMENTALLY;
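
The whole ladder can be condensed into one pure function, which makes the order of the bail-outs easy to test. This is a simplified sketch: ToyMethod and toy_choose_method are hypothetical, the manifest and BlockRefTable lookups are reduced to boolean inputs, and the block size and segment size are assumed to be the common defaults of 8192 bytes and 131072 blocks.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLCKSZ      8192u      /* assumed default block size */
#define TOY_RELSEG_SIZE 131072u    /* assumed default blocks per segment */

typedef enum
{
    TOY_FULL,
    TOY_INCREMENTAL
} ToyMethod;

/*
 * Simplified decision for one segment file. has_entry says whether the
 * merged change set mentions this relation fork; nchanged is the number
 * of changed blocks that fall inside this segment.
 */
static ToyMethod
toy_choose_method(size_t size, bool is_fsm, bool in_prior_manifest,
                  bool has_entry, size_t nchanged)
{
    if (size % TOY_BLCKSZ != 0 || size / TOY_BLCKSZ > TOY_RELSEG_SIZE)
        return TOY_FULL;        /* odd size: do not trust block math */
    if (is_fsm)
        return TOY_FULL;        /* fork is not reliably WAL-logged */
    if (!in_prior_manifest)
        return TOY_FULL;        /* nothing to be a delta against */
    if (!has_entry)             /* no changes: stub, unless file is empty */
        return size == 0 ? TOY_FULL : TOY_INCREMENTAL;
    if ((double) nchanged * TOY_BLCKSZ > (double) size * 0.9)
        return TOY_FULL;        /* delta would save almost nothing */
    return TOY_INCREMENTAL;
}
```

For a 100-block file, 10 changed blocks yield an incremental file while 91 changed blocks cross the 90 % line and fall back to a full copy.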

sendDir consumes the return value: on BACK_UP_FILE_INCREMENTALLY it rewrites the tar member name to INCREMENTAL.<name> and shrinks statbuf.st_size to the incremental file’s size before calling sendFile:

// sendDir (incremental branch) — src/backend/backup/basebackup.c
method = GetFileBackupMethod(ib, lookup_path, dboid, relspcoid,
                             relfilenumber, relForkNum,
                             segno, statbuf.st_size,
                             &num_blocks_required,
                             relative_block_numbers,
                             &truncation_block_length);
if (method == BACK_UP_FILE_INCREMENTALLY)
{
    statbuf.st_size = GetIncrementalFileSize(num_blocks_required);
    snprintf(tarfilenamebuf, sizeof(tarfilenamebuf), "%s/INCREMENTAL.%s",
             path + basepathlen + 1, de->d_name);
    tarfilename = tarfilenamebuf;
}
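
The renaming step is plain string formatting. A minimal sketch, assuming a hypothetical helper toy_incremental_name that is not part of PostgreSQL and that reports truncation instead of silently cutting the name short:

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/INCREMENTAL.<name>" into out. Returns 0 on success, or -1
 * if formatting failed or the result would not fit in outlen bytes.
 */
static int
toy_incremental_name(const char *dir, const char *name,
                     char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s/INCREMENTAL.%s", dir, name);

    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

The prefix is what lets a later reader of the backup tell a delta from a full copy of the same segment by name alone, before opening the file.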

flowchart TD
  G0["GetFileBackupMethod(path, segno, size)"] --> G1{"size not a BLCKSZ multiple<br/>or &gt; RELSEG_SIZE?"}
  G1 -- yes --> FULL["BACK_UP_FILE_FULLY"]
  G1 -- no --> G2{"forknum == FSM_FORKNUM?"}
  G2 -- yes --> FULL
  G2 -- no --> G3{"path in prior manifest<br/>(plain or INCREMENTAL.*)?"}
  G3 -- no --> FULL
  G3 -- yes --> G4{"BlockRefTable entry<br/>for this relfilenode?"}
  G4 -- "none (no WAL changes)" --> STUB["INCREMENTALLY<br/>num_blocks = 0 (stub)"]
  G4 -- present --> G5["GetBlocks → nblocks"]
  G5 --> G6{"nblocks*BLCKSZ &gt; 0.9*size?"}
  G6 -- yes --> FULL
  G6 -- no --> INC["INCREMENTALLY<br/>header + changed blocks"]

The on-the-wire incremental format: `sendFile`

When incremental_blocks != NULL, sendFile writes a header before any block data. The header is three unsigned 32-bit integers, written in the server’s native byte order because the code pushes the raw in-memory values: INCREMENTAL_MAGIC (0xd3ae1f0d), the block count, and the truncation block length. The array of relative block numbers follows, padded when the block count is nonzero so that block data starts on a BLCKSZ boundary:

// sendFile (incremental header) — src/backend/backup/basebackup.c
if (incremental_blocks != NULL)
{
    unsigned    magic = INCREMENTAL_MAGIC;
    size_t      header_bytes_done = 0;

    push_to_sink(sink, &checksum_ctx, &header_bytes_done,
                 &magic, sizeof(magic));
    push_to_sink(sink, &checksum_ctx, &header_bytes_done,
                 &num_incremental_blocks, sizeof(num_incremental_blocks));
    push_to_sink(sink, &checksum_ctx, &header_bytes_done,
                 &truncation_block_length, sizeof(truncation_block_length));
    push_to_sink(sink, &checksum_ctx, &header_bytes_done,
                 incremental_blocks,
                 sizeof(BlockNumber) * num_incremental_blocks);
    /* ... pad to BLCKSZ if num_incremental_blocks > 0 ... */
}
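
The size arithmetic implied by that layout can be worked through in a short sketch. It assumes an 8192-byte block size and 4-byte header fields and block numbers; toy_incremental_header_size and toy_incremental_file_size are hypothetical stand-ins that mirror the described layout, not copies of the server's functions.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BLCKSZ 8192u    /* assumed default block size */

/* Header: magic, block count, truncation length, then block numbers. */
static size_t
toy_incremental_header_size(uint32_t num_blocks)
{
    size_t result = 3 * sizeof(uint32_t) +
        sizeof(uint32_t) * (size_t) num_blocks;

    /* Pad so block data starts on a block boundary, only if data follows. */
    if (num_blocks > 0 && result % TOY_BLCKSZ != 0)
        result += TOY_BLCKSZ - (result % TOY_BLCKSZ);
    return result;
}

/* Whole file: header plus one full block per listed block number. */
static size_t
toy_incremental_file_size(uint32_t num_blocks)
{
    return toy_incremental_header_size(num_blocks) +
        (size_t) num_blocks * TOY_BLCKSZ;
}
```

Under these assumptions a header-only stub is 12 bytes, while a file carrying even one block occupies two full blocks: one for the padded header and one for the data.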

The data loop then reads exactly the listed blocks, one at a time, seeking to relative_blkno * BLCKSZ for each. A short read mid-block is treated as a concurrent truncation and ends the loop — WAL replay during restore will fix up the tail:

// sendFile (incremental data loop) — src/backend/backup/basebackup.c
relative_blkno = incremental_blocks[ibindex++];
cnt = read_file_data_into_buffer(sink, readfilename, fd,
                                 relative_blkno * BLCKSZ,   /* seek */
                                 BLCKSZ,                    /* one block */
                                 relative_blkno + segno * RELSEG_SIZE,
                                 verify_checksum,
                                 &checksum_failures);
if (cnt < BLCKSZ)
    break;     /* transient truncation; WAL replay will fix it */

Inspecting summaries from SQL: `walsummaryfuncs.c`

The same BlockRefTable reader the backup path uses is exposed to SQL for diagnostics. pg_available_wal_summaries lists the files; pg_wal_summary_contents opens one file and emits one row per (relfilenode, fork, block) — plus a synthetic limit_block row marking a truncation point:

// pg_wal_summary_contents — src/backend/backup/walsummaryfuncs.c
io.file = OpenWalSummaryFile(&ws, false);
reader = CreateBlockRefTableReader(ReadWalSummary, &io,
                                   FilePathName(io.file),
                                   ReportWalSummaryError, NULL);
while (BlockRefTableReaderNextRelation(reader, &rlocator, &forknum,
                                       &limit_block))
{
    /* emit limit_block row if BlockNumberIsValid(limit_block) ... */
    /* then loop over blocks, MAX_BLOCKS_PER_CALL at a time ... */
}

This is the user-visible window onto exactly the data PrepareForIncrementalBackup merges into ib->brtab, which makes it the go-to tool for answering “why was this file sent fully?”

Reconstruction: `pg_combinebackup` and `reconstruct_from_incremental_file`

The frontend tool is where an incremental backup becomes restorable. For each output file, reconstruct_from_incremental_file reads the newest incremental file’s header to learn the reconstructed length, then builds a sourcemap (which file holds each block) and an offsetmap (at what byte offset). Blocks present in the newest file win outright:

// reconstruct_from_incremental_file — src/bin/pg_combinebackup/reconstruct.c
latest_source = make_incremental_rfile(input_filename);
source[n_prior_backups] = latest_source;
block_length = find_reconstructed_block_length(latest_source);
sourcemap = pg_malloc0(sizeof(rfile *) * block_length);
offsetmap = pg_malloc0(sizeof(off_t) * block_length);

for (i = 0; i < latest_source->num_blocks; ++i)
{
    BlockNumber b = latest_source->relative_block_numbers[i];
    sourcemap[b] = latest_source;
    offsetmap[b] = latest_source->header_length + (i * BLCKSZ);
}

make_incremental_rfile is the reader for the format sendFile wrote — it validates INCREMENTAL_MAGIC and reconstructs header_length identically to the server’s GetIncrementalHeaderSize, which is how the two halves of the feature stay byte-compatible:

// make_incremental_rfile — src/bin/pg_combinebackup/reconstruct.c
read_bytes(rf, &magic, sizeof(magic));
if (magic != INCREMENTAL_MAGIC)
    pg_fatal("file \"%s\" has bad incremental magic number (0x%x, expected 0x%x)",
             filename, magic, INCREMENTAL_MAGIC);
read_bytes(rf, &rf->num_blocks, sizeof(rf->num_blocks));
read_bytes(rf, &rf->truncation_block_length,
           sizeof(rf->truncation_block_length));
/* ... read relative_block_numbers[] ... */
rf->header_length = sizeof(magic) + sizeof(rf->num_blocks) +
    sizeof(rf->truncation_block_length) +
    sizeof(BlockNumber) * rf->num_blocks;
/* ... then round header_length up to a BLCKSZ multiple if num_blocks > 0 ... */
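
The reader side can also be exercised without any files. The sketch below parses the fixed header fields from a memory buffer; it assumes 4-byte fields read in the host's byte order, matching the raw writes shown for sendFile. ToyHeader, toy_parse_header, and toy_block_number are hypothetical names, and any padding after the block-number array is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_INCREMENTAL_MAGIC 0xd3ae1f0du   /* value quoted in the text */

typedef struct ToyHeader
{
    uint32_t num_blocks;
    uint32_t truncation_block_length;
    const unsigned char *block_numbers;  /* points into the input buffer */
} ToyHeader;

/*
 * Parse the fixed part of an incremental file header from buf. Returns
 * false on a short buffer, a bad magic number, or a block-number array
 * that does not fit in the remaining bytes.
 */
static bool
toy_parse_header(const unsigned char *buf, size_t len, ToyHeader *h)
{
    uint32_t magic;
    size_t   fixed = 3 * sizeof(uint32_t);

    if (len < fixed)
        return false;
    memcpy(&magic, buf, sizeof(magic));
    if (magic != TOY_INCREMENTAL_MAGIC)
        return false;
    memcpy(&h->num_blocks, buf + 4, sizeof(uint32_t));
    memcpy(&h->truncation_block_length, buf + 8, sizeof(uint32_t));
    if ((len - fixed) / sizeof(uint32_t) < h->num_blocks)
        return false;
    h->block_numbers = buf + fixed;
    return true;
}

/* Fetch the i-th relative block number without assuming alignment. */
static uint32_t
toy_block_number(const ToyHeader *h, uint32_t i)
{
    uint32_t v;

    memcpy(&v, h->block_numbers + (size_t) i * sizeof(uint32_t), sizeof(v));
    return v;
}
```

Checking the magic number before trusting the block count is the important ordering: a file that is not an incremental file at all should be rejected before its bytes are interpreted as lengths.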

The chain walk descends from newest to oldest prior backup. The moment it hits a full copy of the file, it fills every still-unassigned block below truncation_block_length from that full file and stops — older backups cannot override a block already claimed by a newer layer (most-recent-wins):

// reconstruct_from_incremental_file (full-file source) — reconstruct.c
blocklength = sb.st_size / BLCKSZ;
for (b = 0; b < latest_source->truncation_block_length; ++b)
{
    if (sourcemap[b] == NULL && b < blocklength)
    {
        sourcemap[b] = s;        /* fill from the full file */
        offsetmap[b] = b * BLCKSZ;
    }
}
/* ... then break: no older source can override these ... */

Any block still NULL after the walk but below the truncation length is a zero-filled hole — a block the server extended into existence but never WAL-logged, which write_reconstructed_file materializes as zeroes.

flowchart TD
  R0["reconstruct_from_incremental_file(output file)"] --> R1["make_incremental_rfile(newest):<br/>magic, num_blocks,<br/>truncation_block_length"]
  R1 --> R2["block_length =<br/>find_reconstructed_block_length"]
  R2 --> R3["claim blocks present in<br/>newest incremental → sourcemap"]
  R3 --> R4{"walk prior backups<br/>newest → oldest"}
  R4 -- "incremental layer" --> R5["claim still-unfilled blocks<br/>below truncation length"]
  R5 --> R4
  R4 -- "full file found" --> R6["fill remaining blocks<br/>from full file, then stop"]
  R6 --> R7["any still-NULL block<br/>below truncation = zero hole"]
  R7 --> R8["write_reconstructed_file:<br/>copy each block from its source"]

Position hints (as of 2026-06-05, REL_18 273fe94)

Symbol	File	Line
`INCREMENTAL_MAGIC`	`src/include/backup/basebackup_incremental.h`	20
`MIN_CHUNK` / `MAX_CHUNK`	`basebackup_incremental.c`	39
`manifest_process_wal_range`	`basebackup_incremental.c`	138
`CreateIncrementalBackupInfo`	`basebackup_incremental.c`	152
`AppendIncrementalManifestData`	`basebackup_incremental.c`	194
`FinalizeIncrementalManifest`	`basebackup_incremental.c`	227
`PrepareForIncrementalBackup`	`basebackup_incremental.c`	263
`GetIncrementalFilePath`	`basebackup_incremental.c`	625
`GetFileBackupMethod`	`basebackup_incremental.c`	663
`GetIncrementalHeaderSize`	`basebackup_incremental.c`	881
`GetIncrementalFileSize`	`basebackup_incremental.c`	909
`parse_basebackup_options`	`basebackup.c`	698
`perform_base_backup`	`basebackup.c`	234
`sendFile`	`basebackup.c`	1573
`read_file_data_into_buffer`	`basebackup.c`	96
`push_to_sink`	`basebackup.c`	102
`pg_available_wal_summaries`	`walsummaryfuncs.c`	32
`pg_wal_summary_contents`	`walsummaryfuncs.c`	69
`UploadManifest`	`src/backend/replication/walsender.c`	667
`HandleUploadManifestPacket`	`walsender.c`	733
`reconstruct_from_incremental_file`	`src/bin/pg_combinebackup/reconstruct.c`	88
`make_incremental_rfile`	`reconstruct.c`	456
`find_reconstructed_block_length`	`reconstruct.c`	439

Source verification (as of 2026-06-05)

Verified facts

The incremental file header is INCREMENTAL_MAGIC (0xd3ae1f0d), a block count, a truncation block length, and the relative block-number array. Verified in sendFile (server writer, basebackup.c) and make_incremental_rfile (frontend reader, reconstruct.c) — both compute header_length the same way, and the magic constant is defined once in src/include/backup/basebackup_incremental.h.
The free-space map fork is always backed up fully. Verified in GetFileBackupMethod: if (forknum == FSM_FORKNUM) return BACK_UP_FILE_FULLY;, with the in-source comment “The free-space map fork is not properly WAL-logged.” Every relation’s FSM thus ships in full on each backup, whether or not it changed, even when the main fork ships incrementally.
A relation with no WAL-logged changes is sent as a header-only stub, not skipped. Verified in GetFileBackupMethod: when BlockRefTableGetEntry returns NULL and the file is non-empty, it returns BACK_UP_FILE_INCREMENTALLY with *num_blocks_required = 0. The stub is what tells pg_combinebackup “this file is unchanged — take it whole from the prior backup.”
The 90 % threshold downgrades an incremental file to a full one. Verified in GetFileBackupMethod: if (nblocks * BLCKSZ > size * 0.9) return BACK_UP_FILE_FULLY;. The in-source comment notes the threshold is not configurable and is a deliberate “don’t bother” guard, and that a file where every block changed is always sent fully.
Incremental backups are refused unless summarize_wal is on. Verified in parse_basebackup_options: the incremental option errors with ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE (“incremental backups cannot be taken unless WAL summarization is enabled”) when summarize_wal is off.
The manifest is parsed incrementally, retaining the trailing MIN_CHUNK bytes. Verified in AppendIncrementalManifestData: it triggers json_parse_manifest_incremental_chunk once the buffer would exceed MAX_CHUNK = 128 KiB, then memmoves the last MIN_CHUNK = 1 KiB to the buffer head. This keeps the closing checksum line intact across chunk boundaries.
PrepareForIncrementalBackup waits for summarization and fails if any required summary is missing. Verified: WaitForWalSummarization(backup_state->startpoint) blocks until the summarizer reaches the backup start LSN, then GetWalSummaries + per-timeline FilterWalSummaries collect the files, and a completeness check errors out on a gap rather than producing an unsafe partial delta.
Reconstruction is most-recent-wins over the backup chain, with zero-fill for holes. Verified in reconstruct_from_incremental_file: the newest incremental claims its blocks first; the walk fills still-unassigned blocks from older layers and stops at the first full file; blocks left NULL below truncation_block_length are treated as zero-filled extensions.
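
The per-file rules verified above for GetFileBackupMethod (FSM always full, header-only stub for an unchanged file, 90 % downgrade) can be condensed into a small decision function. What follows is a minimal sketch under stated assumptions, not the server code: every `sketch_`/`SKETCH_` name is hypothetical, the block size is assumed to be the default 8192 bytes, the lookup that BlockRefTableGetEntry performs is reduced to a boolean plus a changed-block count, and the other checks the real function makes are left out. The empty-file branch is also an assumption; the facts above only cover the non-empty case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BLCKSZ 8192u     /* assumption: default PostgreSQL block size */

typedef enum { SKETCH_FULL, SKETCH_INCREMENTAL } sketch_method;
typedef enum { SKETCH_FORK_MAIN, SKETCH_FORK_FSM, SKETCH_FORK_VM } sketch_fork;

/*
 * Hypothetical condensation of the decision described above.
 *
 * has_summary_entry: do the WAL summaries mention this relation fork at all?
 * changed_blocks:    how many blocks of this file the summaries list.
 * On SKETCH_INCREMENTAL, *num_blocks_required is the number of blocks that
 * would follow the header (0 means a header-only stub).
 */
sketch_method
sketch_file_backup_method(sketch_fork fork, size_t size_bytes,
                          bool has_summary_entry, unsigned changed_blocks,
                          unsigned *num_blocks_required)
{
    assert(num_blocks_required != NULL);
    *num_blocks_required = 0;

    /* The FSM is not properly WAL-logged, so the summaries cannot be trusted. */
    if (fork == SKETCH_FORK_FSM)
        return SKETCH_FULL;

    /* No WAL-logged changes: send a stub, not nothing. */
    if (!has_summary_entry)
    {
        if (size_bytes == 0)
            return SKETCH_FULL;     /* assumption: empty files go whole */
        return SKETCH_INCREMENTAL;  /* header-only, zero blocks */
    }

    /* Nearly everything changed: a delta would save almost nothing. */
    if ((double) changed_blocks * SKETCH_BLCKSZ > (double) size_bytes * 0.9)
        return SKETCH_FULL;

    *num_blocks_required = changed_blocks;
    return SKETCH_INCREMENTAL;
}
```

For a 100-block file, 10 changed blocks yield an incremental file carrying 10 blocks, while 95 changed blocks cross the 90 % line and fall back to a full copy. The ordering mirrors the list above: soundness checks (the FSM) come before the size heuristic.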

Cross-references / deferred

WAL summarizer internals (the walsummarizer.c background worker, the pg_wal/summaries/ file format, BlockRefTable on-disk encoding) are out of scope here and live in postgres-archiving-walsummary.md. This doc treats the summaries purely as an input consumed by PrepareForIncrementalBackup.
The full (non-incremental) BASE_BACKUP flow — the bbsink pipeline, perform_base_backup, tar framing, compression sinks, and the backup manifest writer — is covered in postgres-backup-basebackup.md. Here we only touch the incremental-specific branches of sendDir/sendFile.
WAL itself as the source of truth (rmgr dispatch, record format, redo) is in postgres-wal-records-rmgr.md and postgres-recovery-redo.md.
Not verified beyond reading the code: the end-to-end restore of a multi-link chain (pg_combinebackup producing a startable cluster) was traced through the source but not executed for this revision.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

PostgreSQL’s design lands at a specific, defensible point in the incremental-backup design space, and the contrasts are instructive.

Block change tracking: bitmap-at-write vs. summarize-from-log. Oracle RMAN’s block change tracking file and SQL Server’s Differential Changed Map are updated eagerly, in the write path, as pages are dirtied. That makes the differential read cheap but imposes a small steady tax on every write and couples the tracking structure to the live datafiles. PostgreSQL instead derives the change set lazily, after the fact, by summarizing WAL that it was already writing for durability. The cost moves off the hot write path and onto a background worker, and the tracking artifact (WAL summary files) is fully decoupled from both the datafiles and the backup schedule — a summary for an LSN window is reusable by any backup whose chain crosses it. The trade is latency: you cannot take an incremental backup until the summarizer has caught up to the start LSN, which is exactly why PrepareForIncrementalBackup must WaitForWalSummarization.
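
To make "summarize from the log" concrete, here is a toy illustration of deriving a changed-block set for an LSN window after the fact. It is a sketch under heavy assumptions and not how walsummarizer.c is written: the record layout, the single-relation scope, the fixed-size bitmap, and every `sketch_`/`SKETCH_` name are hypothetical. The point it shows is only that the work is one pass over the log records in the window, with no data file read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_BLOCKS 1024u     /* toy limit on tracked block numbers */

/* Hypothetical, drastically simplified log record: one block per record. */
typedef struct
{
    uint64_t lsn;       /* position of the record in the log */
    uint32_t block;     /* block number the record modified */
} sketch_wal_record;

/*
 * Scan records with start_lsn <= lsn < end_lsn and set one bit per modified
 * block in `changed` (which must hold SKETCH_MAX_BLOCKS / 8 bytes).
 * Returns the number of distinct blocks touched in the window.
 */
unsigned
sketch_summarize(const sketch_wal_record *recs, size_t nrecs,
                 uint64_t start_lsn, uint64_t end_lsn, uint8_t *changed)
{
    unsigned distinct = 0;

    assert(start_lsn <= end_lsn);
    memset(changed, 0, SKETCH_MAX_BLOCKS / 8);

    for (size_t i = 0; i < nrecs; i++)
    {
        uint32_t b = recs[i].block;

        if (recs[i].lsn < start_lsn || recs[i].lsn >= end_lsn)
            continue;               /* outside the requested window */
        if (b >= SKETCH_MAX_BLOCKS)
            continue;               /* beyond what this toy bitmap can hold */

        if (!(changed[b / 8] & (1u << (b % 8))))
        {
            changed[b / 8] |= (uint8_t) (1u << (b % 8));
            distinct++;
        }
    }
    return distinct;
}
```

Given records at LSNs 10, 20, 30, 40 touching blocks 5, 5, 7, 9, the window [10, 40) reports two distinct blocks (5 and 7). The same records can be summarized again for a different window, which is the reuse property the paragraph above attributes to WAL summary files.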

Eager vs. lazy reconstruction. Many enterprise tools (and Oracle’s incrementally-updated backups) roll forward a synthetic full image at backup time, so a restore is always a single full copy. PostgreSQL defers reconstruction entirely to restore time in pg_combinebackup. This keeps the server side purely a delta producer — no read amplification on the primary to maintain a synthetic full — at the cost of a restore-time merge whose work is proportional to chain length. The most-recent-wins overlay in reconstruct_from_incremental_file is structurally identical to reading a key through the levels of an LSM tree, or resolving a file through a stack of overlay-filesystem layers: newest layer that has the block wins, and a “tombstone” (here, the truncation_block_length clamp) bounds what older layers may contribute.
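
The most-recent-wins overlay can be sketched as a source-map construction: for each block of the output file, record which layer of the chain supplies it. This is a simplified illustration and not the code in reconstruct.c. The `sketch_layer` type and all `sketch_`/`SKETCH_` names are hypothetical, only the newest layer's truncation length is honored (an older layer's own truncation length is ignored here), and no I/O is performed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NO_SOURCE (-1)   /* block has no source: would be zero-filled */

/* Hypothetical description of one file version in the backup chain. */
typedef struct
{
    int             is_full;  /* nonzero: a full copy of the file */
    unsigned        nblocks;  /* full: length in blocks; else: stored blocks */
    const unsigned *blocks;   /* incremental only: stored block numbers */
} sketch_layer;

/*
 * layers[0] is the newest incremental; higher indexes are older backups.
 * length is the newest layer's truncation block length, i.e. the size of the
 * file being rebuilt. On return, source_of[b] is the index of the layer that
 * supplies block b, or SKETCH_NO_SOURCE.
 */
void
sketch_resolve_sources(const sketch_layer *layers, size_t nlayers,
                       unsigned length, int *source_of)
{
    assert(layers != NULL && source_of != NULL);

    for (unsigned b = 0; b < length; b++)
        source_of[b] = SKETCH_NO_SOURCE;

    for (size_t i = 0; i < nlayers; i++)
    {
        const sketch_layer *layer = &layers[i];

        if (layer->is_full)
        {
            /* A full file fills every remaining gap it covers, then stop. */
            for (unsigned b = 0; b < length && b < layer->nblocks; b++)
                if (source_of[b] == SKETCH_NO_SOURCE)
                    source_of[b] = (int) i;
            break;
        }

        /* An incremental claims only blocks no newer layer already claimed. */
        for (unsigned k = 0; k < layer->nblocks; k++)
        {
            unsigned b = layer->blocks[k];

            if (b < length && source_of[b] == SKETCH_NO_SOURCE)
                source_of[b] = (int) i;
        }
    }
}
```

With a newest incremental holding blocks {1, 3}, an older incremental holding {1, 2}, a three-block full file, and a target length of 5, the map resolves to full, newest, older, newest, and no source for block 4. That last block lies below the truncation length but is claimed by no layer, which is the zero-fill case.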

Granularity and the WAL-logging assumption. Block-granular tracking is only sound for data that is fully WAL-logged; the FSM exception (forknum == FSM_FORKNUM → full backup) is the visible seam where that assumption breaks. A research-frontier question is whether unlogged or minimally-logged objects could be incrementally captured via a different channel; today PostgreSQL simply sends them whole. Relatedly, the file-level “created after the prior backup” guard shows how change-tracking from the log must be defended against the namespace changing underneath it (files deleted and recreated under the same name), not just block contents.

Where the literature sits. The conceptual backbone is ARIES (Mohan et al., 1992): the WAL is an authoritative, ordered, replayable record of every page change, which is precisely what makes “enumerate what changed between two LSNs” a well-posed question. Database Internals (Petrov, 2019) frames the WAL as the single source of truth that recovery, replication, and now backup-diffing all consume. Incremental backup is best understood as a third consumer of redo information, parallel to crash recovery (which applies the changes) and physical replication (which streams them) — here the log is replayed not to apply or ship changes but to enumerate them. The active frontiers are largely operational: bounding chain length automatically, parallelizing pg_combinebackup, and integrating incremental physical backup with delta-aware object storage so that the on-disk delta and the archived delta are the same bytes.

Sources

PostgreSQL source (REL_18_STABLE, commit 273fe94, 2026-06-05):
- src/backend/backup/basebackup_incremental.c — manifest ingestion (CreateIncrementalBackupInfo, AppendIncrementalManifestData), PrepareForIncrementalBackup, and the per-file decision GetFileBackupMethod.
- src/backend/backup/basebackup.c — parse_basebackup_options (the summarize_wal precondition), sendDir’s incremental branch, and sendFile’s incremental header + block-seek loop.
- src/backend/backup/walsummaryfuncs.c — the SQL-callable pg_available_wal_summaries / pg_wal_summary_contents diagnostics.
- src/backend/replication/walsender.c — UploadManifest / HandleUploadManifestPacket (the UPLOAD_MANIFEST protocol).
- src/bin/pg_combinebackup/reconstruct.c — reconstruct_from_incremental_file, make_incremental_rfile, the sourcemap/offsetmap construction and most-recent-wins chain walk.
- src/include/backup/basebackup_incremental.h — INCREMENTAL_MAGIC, FileBackupMethod, and the server-side declarations.
Theory / textbook anchors:
- C. Mohan et al., ARIES: A Transaction Recovery Method… (1992) — WAL as the authoritative, replayable record of change. See knowledge/research/dbms-general/ and the bibliography plan dbms-papers/aries.md.
- A. Petrov, Database Internals (2019), ch. 3 and the recovery discussion — WAL as the single source of truth across recovery, replication, and backup. Captured in knowledge/research/dbms-general/database-internals.md.
Sibling code-analysis docs (this tree): postgres-archiving-walsummary.md (the WAL summarizer that produces the summaries), postgres-backup-basebackup.md (the full backup pipeline and bbsink chain), postgres-wal-records-rmgr.md and postgres-recovery-redo.md (WAL as the underlying change record).