pg_basebackup — Driving BASE_BACKUP, Parallel WAL Streaming, Output Formats, and the pg_receivewal / pg_recvlogical Family

A physical base backup is a byte-for-byte copy of a running database cluster’s on-disk state, taken without stopping the server, suitable as the starting point for either a fresh replica or a point-in-time recovery (PITR) restore. The classical problem it solves is the fuzzy snapshot problem: the files of a live cluster are mutating while you copy them, so a naive cp -r produces a torn, internally inconsistent image. The textbook resolution (Database System Concepts, Silberschatz et al., recovery chapter; Database Internals, Petrov, Part II on log-structured recovery) is to bracket the file copy with a backup-start and backup-stop marker recorded in the write-ahead log (WAL), and then to require that all WAL generated between those two markers be replayed during restore. The copy itself may be torn; WAL replay heals it, because every page modification during the backup window is also in the log. This is the same redo-from-a-checkpoint argument that underlies crash recovery, generalised to “recover from an arbitrarily torn file image plus its concurrent log.”

Three properties define the design space for the tool that performs such a backup:

  1. Where does the data come from? A backup tool can read the data directory directly from a shared filesystem (filesystem-level backup), or it can ask the server to stream the bytes to it over a network protocol (streaming backup). The streaming model decouples the backup client from the server’s filesystem entirely — the client needs only a TCP connection and replication privileges, never local disk access to $PGDATA.

  2. How is the backup window’s WAL captured? The torn image is only useful if you also have every WAL record between start and stop. There are two strategies: fetch the needed segments at the end (a single batch after the file copy), or stream WAL continuously in parallel with the file copy. Parallel streaming bounds the amount of WAL the primary must retain and lets a slow, large backup finish without the primary recycling the segments the backup will need.

  3. What is the on-disk shape and durability contract of the output? A backup may be written as an extracted directory tree (ready to start), as one or more tar archives (compact, archival), and may be compressed. The tool must also decide where compression happens — on the server (saving network bandwidth at the cost of server CPU) or on the client (saving client disk at the cost of full network transfer).

PostgreSQL’s pg_basebackup is a streaming backup client: it speaks the replication sub-protocol of the PostgreSQL wire protocol (the same type-byte + length-prefix framing used by ordinary queries, but entered via a replication=true / replication=database startup option — see postgres-wire-protocol.md). The server side of the conversation is the WAL sender (postgres-wal-sender-receiver.md), and the actual snapshot construction lives in the backend’s basebackup.c (postgres-backup-basebackup.md). This document is about the client — how pg_basebackup builds the BASE_BACKUP command, drives the resulting multi-result / COPY conversation, forks a second connection to stream WAL, and shapes the bytes into output files.

Every production RDBMS ships a hot-backup utility, and they converge on the same skeleton even when the mechanism differs:

  • Oracle RMAN brackets datafile copies with ALTER DATABASE BEGIN BACKUP / END BACKUP (or, more commonly now, talks to the server process directly so the fuzzy-copy window is invisible to the user). RMAN tracks backups in a catalog and can do incremental block-level backups by consulting a block-change-tracking file — the conceptual ancestor of PostgreSQL 17’s incremental backup (postgres-incremental-backup.md).

  • MySQL / Percona XtraBackup copies InnoDB datafiles while concurrently tailing the InnoDB redo log into a separate file, then applies that redo at prepare time. This is structurally identical to PostgreSQL’s “copy files + stream WAL in parallel + replay on restore” model: the redo tail is MySQL’s analogue of the parallel WAL stream.

  • SQL Server uses VDI/VSS snapshots and a backup-log chain; the BACKUP DATABASE ... WITH DIFFERENTIAL family mirrors the full/incremental distinction.

The shared design vocabulary, then, is: (a) a consistent start/stop marker recorded in the log; (b) a possibly-torn file image; (c) a log tail that spans the backup window; (d) an output container (directory vs. archive, compressed vs. not); and (e) a resumable / continuous WAL archiver that exists independently of the base backup, so that PITR can roll forward past the backup’s stop point. PostgreSQL factors (e) into a separate program, pg_receivewal, rather than folding it into the backup tool — a clean separation that lets an archive of WAL accumulate continuously while base backups are taken occasionally.

A second cross-cutting concern is compression placement. Network-bound deployments want the server to compress before transmission; CPU-bound servers (or backups taken across a fast LAN to a cheap client) want the client to compress. A well-designed tool exposes both and negotiates which the server supports. PostgreSQL 15 introduced server-side compression negotiation, so pg_basebackup must decide at runtime whether a requested compression can run server-side or must fall back to client-side, and must adapt the archive file extension accordingly.

pg_basebackup is a libpq client. Its entire interaction with the server is a sequence of replication commands over one (or two) replication connections. The orchestration lives in BaseBackup(); the connection plumbing and the small replication-command helpers are shared with the sibling tools through streamutil.c.

One backup = one BASE_BACKUP command, built option by option

BaseBackup() first checks the server version, then assembles the BASE_BACKUP command string by appending options to a PQExpBuffer. PG15+ servers accept a parenthesised option syntax (BASE_BACKUP (LABEL 'x', PROGRESS, ...)); older servers use a positional space-separated syntax. A single boolean, use_new_option_syntax, selects which, and the three Append*CommandOption helpers in streamutil.c emit the right separators:

// BaseBackup — src/bin/pg_basebackup/pg_basebackup.c
AppendStringCommandOption(&buf, use_new_option_syntax, "LABEL", label);
if (estimatesize)
    AppendPlainCommandOption(&buf, use_new_option_syntax, "PROGRESS");
if (includewal == FETCH_WAL)
    AppendPlainCommandOption(&buf, use_new_option_syntax, "WAL");
...
if (compressloc == COMPRESS_LOCATION_SERVER)
{
    if (!use_new_option_syntax)
        pg_fatal("server does not support server-side compression");
    AppendStringCommandOption(&buf, use_new_option_syntax,
                              "COMPRESSION", compression_algorithm);
    ...
}
...
if (use_new_option_syntax && buf.len > 0)
    basebkp = psprintf("BASE_BACKUP (%s)", buf.data);
else
    basebkp = psprintf("BASE_BACKUP %s", buf.data);

The Append*CommandOption helpers guarantee correct punctuation regardless of ordering — AppendPlainCommandOption looks at the last byte already in the buffer to decide whether a leading , (new syntax) or space (old syntax) is needed, and string values are escaped through PQescapeStringConn:

// AppendStringCommandOption — src/bin/pg_basebackup/streamutil.c
AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
if (option_value != NULL)
{
    size_t length = strlen(option_value);
    char *escaped_value = palloc(1 + 2 * length);
    PQescapeStringConn(conn, escaped_value, option_value, length, NULL);
    appendPQExpBuffer(buf, " '%s'", escaped_value);
    pfree(escaped_value);
}
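
The punctuation rule described above can be sketched without libpq. The following is a minimal illustration, not the streamutil.c code: `OptBuf`, `optbuf_append`, `append_plain_option`, `append_string_option`, and `finish_command` are hypothetical names, the fixed-size buffer stands in for PQExpBuffer, and doubling single quotes is a simplified stand-in for what PQescapeStringConn does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for PQExpBuffer. */
typedef struct {
    char   data[256];
    size_t len;
} OptBuf;

/* Append raw text, asserting that it fits (a real buffer would grow). */
static void optbuf_append(OptBuf *b, const char *s)
{
    size_t n = strlen(s);

    assert(b->len + n < sizeof(b->data));
    memcpy(b->data + b->len, s, n + 1);
    b->len += n;
}

/* Emit a separator only when an option is already in the buffer. */
static void append_plain_option(OptBuf *b, bool new_syntax, const char *name)
{
    if (b->len > 0)
        optbuf_append(b, new_syntax ? ", " : " ");
    optbuf_append(b, name);
}

/* Option name followed by a quoted value; embedded quotes are doubled. */
static void append_string_option(OptBuf *b, bool new_syntax,
                                 const char *name, const char *value)
{
    append_plain_option(b, new_syntax, name);
    optbuf_append(b, " '");
    for (const char *p = value; *p; p++) {
        char tmp[3] = { *p, '\0', '\0' };

        if (*p == '\'')
            tmp[1] = '\'';          /* simplified escaping: ' becomes '' */
        optbuf_append(b, tmp);
    }
    optbuf_append(b, "'");
}

/* Wrap the option list: parenthesised for new syntax, bare otherwise. */
static void finish_command(const OptBuf *opts, bool new_syntax,
                           char *out, size_t outsz)
{
    if (new_syntax && opts->len > 0)
        snprintf(out, outsz, "BASE_BACKUP (%s)", opts->data);
    else
        snprintf(out, outsz, "BASE_BACKUP %s", opts->data);
}
```

Starting from a zeroed `OptBuf`, appending a LABEL string option and then a PROGRESS plain option yields a comma-separated list under the new syntax and a space-separated one under the old syntax, regardless of the order in which the options are added.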

Once PQsendQuery(conn, basebkp) fires, the server answers with a sequence of result sets, and BaseBackup() consumes them in a fixed order:

  1. A one-row result giving the WAL start LSN and starting timeline. This is the backup-start marker. The checkpoint the server runs before answering is why a non-fast backup can pause here for a long time.
  2. A header result with one row per tablespace (OID, location, size). The sizes feed the progress estimate; the locations drive verify_dir_is_empty_or_create.
  3. The archive payload, delivered as COPY data.
  4. A one-row result giving the WAL stop LSN (the backup-stop marker).
  5. A final CommandComplete that may carry ERRCODE_DATA_CORRUPTED if a checksum verification failed mid-backup.

// BaseBackup — src/bin/pg_basebackup/pg_basebackup.c
if (PQsendQuery(conn, basebkp) == 0)
    pg_fatal("could not send replication command \"%s\": %s",
             "BASE_BACKUP", PQerrorMessage(conn));
/* Get the starting WAL location */
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
    pg_fatal("could not initiate base backup: %s", PQerrorMessage(conn));
...
strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart));
...
if (PQnfields(res) >= 2)
    starttli = atoi(PQgetvalue(res, 0, 1));

The PG15 protocol change is visible in the dispatch after the header: a v15+ server sends one COPY stream containing every archive (and the manifest) back-to-back, decoded by ReceiveArchiveStream; an older server sends a separate tar COPY per tablespace, decoded by a ReceiveTarFile loop:

// BaseBackup — src/bin/pg_basebackup/pg_basebackup.c
if (serverMajor >= 1500)
{
    /* Receive a single tar stream with everything. */
    ReceiveArchiveStream(conn, client_compress);
}
else
{
    /* Receive a tar file for each tablespace in turn */
    for (i = 0; i < PQntuples(res); i++)
        ReceiveTarFile(conn, archive_name, spclocation, i, client_compress);
    if (!writing_to_stdout && manifest)
        ReceiveBackupManifest(conn);
}
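
In the single-stream case, each CopyData message is distinguished by its leading type byte ('n' new archive, 'd' data, 'm' manifest, 'p' progress). The sketch below shows only that dispatch shape. `ChunkKind`, `StreamState`, and `consume_chunk` are hypothetical names, and attributing 'd' payload to the manifest once an 'm' message has been seen is an assumption of this sketch rather than a transcription of ReceiveArchiveStreamChunk.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    CHUNK_NEW_ARCHIVE,
    CHUNK_DATA,
    CHUNK_MANIFEST,
    CHUNK_PROGRESS,
    CHUNK_UNKNOWN
} ChunkKind;

/* Hypothetical per-stream bookkeeping. */
typedef struct {
    int    archives_started;  /* number of 'n' messages seen */
    int    in_manifest;       /* nonzero once an 'm' message arrives */
    size_t archive_bytes;     /* 'd' payload received while in an archive */
    size_t manifest_bytes;    /* 'd' payload received after 'm' */
} StreamState;

/* Classify one CopyData message by its first byte and update counters. */
static ChunkKind consume_chunk(StreamState *st, const char *buf, size_t len)
{
    assert(st != NULL && buf != NULL);
    if (len == 0)
        return CHUNK_UNKNOWN;

    switch (buf[0]) {
    case 'n':                       /* a new archive begins */
        st->archives_started++;
        st->in_manifest = 0;
        return CHUNK_NEW_ARCHIVE;
    case 'm':                       /* the manifest begins */
        st->in_manifest = 1;
        return CHUNK_MANIFEST;
    case 'd':                       /* payload for whatever is open */
        if (st->in_manifest)
            st->manifest_bytes += len - 1;
        else
            st->archive_bytes += len - 1;
        return CHUNK_DATA;
    case 'p':                       /* progress report, no payload kept */
        return CHUNK_PROGRESS;
    default:
        return CHUNK_UNKNOWN;
    }
}
```

A real client would treat an unknown type byte as a protocol error; the sketch merely reports it so the caller can decide.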

Parallel WAL streaming via a forked second connection

The defining architectural choice of pg_basebackup -X stream (the default) is that WAL is captured concurrently with the file copy, on a second replication connection driven by a forked child process (a thread on Windows). The fork happens after the start LSN is known but before the archive payload is read, so that WAL accumulates on the client while the potentially-huge file copy proceeds:

// BaseBackup — src/bin/pg_basebackup/pg_basebackup.c
if (includewal == STREAM_WAL)
{
    ...
    StartLogStreamer(xlogstart, starttli, sysidentifier,
                     wal_compress_algorithm, wal_compress_level);
}

StartLogStreamer opens a fresh connection with GetConnection(), optionally creates a (temporary) replication slot, rounds the start position down to a segment boundary, creates a Unix pipe for shutdown signalling, and forks:

// StartLogStreamer — src/bin/pg_basebackup/pg_basebackup.c
param->startptr -= XLogSegmentOffset(param->startptr, WalSegSz);
...
if (pipe(bgpipe) < 0)
    pg_fatal("could not create pipe for background process: %m");
param->bgconn = GetConnection();
...
if (temp_replication_slot && !replication_slot)
    replication_slot = psprintf("pg_basebackup_%u",
                                (unsigned int) PQbackendPID(param->bgconn));
...
bgchild = fork();
if (bgchild == 0)
    exit(LogStreamerMain(param)); /* in child process */
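
The segment-boundary rounding in the first line of that excerpt is plain mask arithmetic. A minimal sketch, assuming a power-of-two segment size (16 MiB in the example) and using the hypothetical names `segment_offset` and `round_down_to_segment` in place of the XLogSegmentOffset macro:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of an LSN within its WAL segment (seg_size is a power of two). */
static uint64_t segment_offset(uint64_t lsn, uint64_t seg_size)
{
    assert(seg_size != 0 && (seg_size & (seg_size - 1)) == 0);
    return lsn & (seg_size - 1);
}

/* First LSN of the segment that contains lsn. */
static uint64_t round_down_to_segment(uint64_t lsn, uint64_t seg_size)
{
    return lsn - segment_offset(lsn, seg_size);
}
```

With 16 MiB segments, a start LSN of 16/B374D848 rounds down to 16/B3000000, so the streamer begins at the start of the segment file that contains the backup's first WAL record.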

The child runs LogStreamerMain, which packages a StreamCtl and calls the shared ReceiveXlogStream engine (in receivelog.c, covered by postgres-wal-sender-receiver.md). The crucial wiring is the stop predicate and the stop socket: the parent, once it learns the backup’s stop LSN, writes it down the pipe, and the child’s reached_end_position callback reads it and tells ReceiveXlogStream to stop exactly there:

// LogStreamerMain — src/bin/pg_basebackup/pg_basebackup.c
stream.stream_stop = reached_end_position;
#ifndef WIN32
stream.stop_socket = bgpipe[0];
#endif
stream.mark_done = true;
stream.do_sync = false; /* fsync happens at the end of pg_basebackup */
if (format == 'p')
    stream.walmethod = CreateWalDirectoryMethod(param->xlog,
                                                PG_COMPRESSION_NONE, 0,
                                                stream.do_sync);
else
    stream.walmethod = CreateWalTarMethod(param->xlog,
                                          param->wal_compress_algorithm,
                                          param->wal_compress_level,
                                          stream.do_sync);
if (!ReceiveXlogStream(param->bgconn, &stream))
    return 1;

reached_end_position is a non-blocking select() on the pipe: until the parent sends the end LSN, the callback returns “keep going”; once it has the end pointer, it returns true as soon as a streamed segment reaches it:

// reached_end_position — src/bin/pg_basebackup/pg_basebackup.c
r = select(bgpipe[0] + 1, &fds, NULL, NULL, &tv);
if (r == 1)
{
    r = read(bgpipe[0], xlogend, sizeof(xlogend) - 1);
    ...
    if (sscanf(xlogend, "%X/%X", &hi, &lo) != 2)
        pg_fatal("could not parse write-ahead log location \"%s\"", xlogend);
    xlogendptr = ((uint64) hi) << 32 | lo;
    has_xlogendptr = 1;
}
else
    return false; /* don't know the end yet */
...
if (segendpos >= xlogendptr)
    return true;
return false;
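
The two halves of that predicate, turning the textual "hi/lo" LSN into a 64-bit position and comparing it against the streamed position, can be exercised in isolation. This is a sketch with hypothetical names (`parse_lsn`, `StopState`, `reached_end`); unlike the excerpt it returns false on a malformed LSN instead of calling pg_fatal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "XXXXXXXX/XXXXXXXX" into a 64-bit LSN; leaves *out untouched on failure. */
static bool parse_lsn(const char *text, uint64_t *out)
{
    unsigned int hi, lo;

    assert(text != NULL && out != NULL);
    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return false;
    *out = ((uint64_t) hi << 32) | lo;
    return true;
}

/* Hypothetical stand-in for the has_xlogendptr / xlogendptr pair. */
typedef struct {
    bool     has_end;
    uint64_t end;
} StopState;

/* Keep streaming until an end LSN is known and the stream has reached it. */
static bool reached_end(const StopState *st, uint64_t segendpos)
{
    if (!st->has_end)
        return false;           /* end not known yet */
    return segendpos >= st->end;
}
```

The high word is shifted by 32 bits because the text form splits the LSN into two 32-bit hexadecimal halves.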

The parent, after consuming the stop-LSN result, hands the value to the child and waitpid()s for it:

// BaseBackup — src/bin/pg_basebackup/pg_basebackup.c (background reap)
if (write(bgpipe[1], xlogend, strlen(xlogend)) != strlen(xlogend))
    pg_fatal("could not send command to background pipe: %m");
r = waitpid(bgchild, &status, 0);
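
The pipe hand-off can be tried in a single process on a POSIX system. The sketch below is an illustration under that assumption: `poll_pipe` and `send_stop_lsn` are hypothetical helpers, with `poll_pipe` mirroring the zero-timeout select() followed by read() and `send_stop_lsn` mirroring the parent's write().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Non-blocking check of the read end of a pipe.
 * Returns 1 and fills buf when a message is waiting, 0 when nothing has
 * been written yet, and -1 on error or end-of-file.
 */
static int poll_pipe(int fd, char *buf, size_t bufsz)
{
    fd_set         fds;
    struct timeval tv = {0, 0};     /* zero timeout: poll, do not wait */
    ssize_t        n;
    int            r;

    assert(bufsz > 1);
    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    r = select(fd + 1, &fds, NULL, NULL, &tv);
    if (r < 0)
        return -1;
    if (r == 0)
        return 0;
    n = read(fd, buf, bufsz - 1);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    return 1;
}

/* Write the textual stop LSN to the write end of the pipe. */
static bool send_stop_lsn(int fd, const char *lsn_text)
{
    size_t len = strlen(lsn_text);

    return write(fd, lsn_text, len) == (ssize_t) len;
}
```

Before anything is written, `poll_pipe` reports 0 and the streamer would keep going; after `send_stop_lsn` runs on the other end, the next poll returns the LSN text for parsing.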

Deferring fsync in the child (stream.do_sync = false) is deliberate: rather than fsync each WAL segment as it arrives, the parent flushes the entire basedir once at the very end, via sync_pgdata (plain) or sync_dir_recurse (tar), which is far cheaper.

flowchart TD
    A["main()"] --> B["BaseBackup()"]
    B --> C["build BASE_BACKUP cmd<br/>(Append*CommandOption)"]
    C --> D["PQsendQuery(BASE_BACKUP)"]
    D --> E["result 1: start LSN + TLI"]
    E --> F["result 2: per-tablespace header"]
    F --> G{"includewal == STREAM_WAL?"}
    G -->|yes| H["StartLogStreamer()<br/>fork 2nd connection"]
    H --> H2["child: LogStreamerMain<br/>ReceiveXlogStream()"]
    G -->|no| I
    H --> I{"serverMajor >= 1500?"}
    I -->|yes| J["ReceiveArchiveStream<br/>(single COPY stream)"]
    I -->|no| K["ReceiveTarFile loop<br/>+ ReceiveBackupManifest"]
    J --> L["result 4: stop LSN"]
    K --> L
    L --> M["write stop LSN to bgpipe<br/>waitpid(bgchild)"]
    M --> N["sync_pgdata / sync_dir_recurse<br/>durable_rename manifest"]

The COPY-streamed archive bytes are not written directly. Instead CreateBackupStreamer() builds a chain of astreamer objects — each a filter with a content / finalize / free vtable — and pushes bytes through them. The chain is assembled bottom-up (the final writer first, then wrappers), so the order of astreamer_*_new calls is the reverse of the data flow. Format and compression decisions are encoded purely in how the chain is built:

// CreateBackupStreamer — src/bin/pg_basebackup/pg_basebackup.c
if (compress->algorithm == PG_COMPRESSION_NONE)
    streamer = astreamer_plain_writer_new(archive_filename, archive_file);
else if (compress->algorithm == PG_COMPRESSION_GZIP)
{
    strlcat(archive_filename, ".gz", sizeof(archive_filename));
    streamer = astreamer_gzip_writer_new(archive_filename, archive_file, compress);
}
else if (compress->algorithm == PG_COMPRESSION_LZ4)
{
    strlcat(archive_filename, ".lz4", sizeof(archive_filename));
    streamer = astreamer_plain_writer_new(archive_filename, archive_file);
    streamer = astreamer_lz4_compressor_new(streamer, compress);
}
else if (compress->algorithm == PG_COMPRESSION_ZSTD)
{
    strlcat(archive_filename, ".zst", sizeof(archive_filename));
    streamer = astreamer_plain_writer_new(archive_filename, archive_file);
    streamer = astreamer_zstd_compressor_new(streamer, compress);
}

For plain format (-Fp) the chain instead starts with an extractor that unpacks the tar stream into a directory tree, and — if the server compressed the stream — prepends a decompressor so the client can re-expand before extraction:

// CreateBackupStreamer — src/bin/pg_basebackup/pg_basebackup.c
if (format == 'p')
{
    ...
    streamer = astreamer_extractor_new(directory,
                                       get_tablespace_mapping,
                                       progress_update_filename);
}
...
/* server-compressed archive, but client wants plain: decompress */
if (format == 'p')
{
    if (is_tar_gz)
        streamer = astreamer_gzip_decompressor_new(streamer);
    else if (is_tar_lz4)
        streamer = astreamer_lz4_decompressor_new(streamer);
    else if (is_tar_zstd)
        streamer = astreamer_zstd_decompressor_new(streamer);
}

Two more wrappers are conditionally inserted when the client must understand the tar contents rather than pass them through opaquely: an astreamer_recovery_injector_new (to write standby.signal / primary_conninfo for -R) and an astreamer_tar_parser_new (so the injector can splice files into the tar structure). The boolean must_parse_archive gates all of this — if the client is just writing an opaque compressed tar to disk, none of the parsing machinery is built, and the bytes flow straight to the writer.

The chain is built bottom-up in code (final writer first, outermost filter last), but data flows the other way: a CopyData chunk enters at the head and is handed down through each content callback until it reaches the writer at the tail. The diagram below shows two representative chains: client-side zstd to a tar file, and a server-gzip stream extracted to a plain directory.

flowchart LR
    subgraph TAR["-Ft --compress=client-zstd"]
        T0["CopyData chunk"] --> T1["astreamer_tar_archiver<br/>(if must_parse)"]
        T1 --> T2["astreamer_zstd_compressor"]
        T2 --> T3["astreamer_plain_writer<br/>base.tar.zst"]
    end
    subgraph PLAIN["-Fp from server-gzip stream"]
        P0["CopyData chunk"] --> P1["astreamer_gzip_decompressor"]
        P1 --> P2["astreamer_tar_parser"]
        P2 --> P3["astreamer_recovery_injector<br/>(if -R)"]
        P3 --> P4["astreamer_extractor<br/>directory tree"]
    end
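
The build-order versus data-flow distinction is easy to see in a toy filter chain. The sketch below is not the astreamer API: `Stage`, `sink_new`, and `upper_new` are hypothetical, the uppercase filter merely stands in for a compressor, and the sink collects bytes in memory instead of writing a file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct Stage Stage;

/* One filter in the chain: callbacks plus a pointer to the next stage. */
struct Stage {
    void  (*content)(Stage *self, const char *data, size_t len);
    void  (*finalize)(Stage *self);
    Stage  *next;        /* stage closer to the final writer, or NULL */
    char    out[128];    /* collected output (used by the sink only) */
    size_t  out_len;
    int     finalized;
};

/* Sink: the tail of the chain, playing the role of the plain writer. */
static void sink_content(Stage *self, const char *data, size_t len)
{
    assert(self->out_len + len < sizeof(self->out));
    memcpy(self->out + self->out_len, data, len);
    self->out_len += len;
    self->out[self->out_len] = '\0';
}

static void sink_finalize(Stage *self)
{
    self->finalized = 1;
}

/* Filter: transforms each chunk, then hands it to the next stage. */
static void upper_content(Stage *self, const char *data, size_t len)
{
    char tmp[64];

    assert(len <= sizeof(tmp));
    for (size_t i = 0; i < len; i++)
        tmp[i] = (char) toupper((unsigned char) data[i]);
    self->next->content(self->next, tmp, len);
}

static void upper_finalize(Stage *self)
{
    self->finalized = 1;
    self->next->finalize(self->next);   /* propagate toward the tail */
}

/* Constructors: the sink is built first, each wrapper takes its successor. */
static Stage sink_new(void)
{
    Stage s = { sink_content, sink_finalize, NULL, "", 0, 0 };
    return s;
}

static Stage upper_new(Stage *next)
{
    Stage s = { upper_content, upper_finalize, next, "", 0, 0 };
    return s;
}
```

The sink is constructed first and the wrapper second, yet chunks are pushed into the wrapper and arrive at the sink last, which is the same inversion the constructor calls in CreateBackupStreamer exhibit.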

When -R is passed, GenerateRecoveryConfig() (in src/fe_utils/recovery_gen.c) reconstructs a primary_conninfo line from the live connection’s parameters, deliberately stripping replication, dbname, and fallback_application_name (which libpqwalreceiver overrides), so the generated standby config reconnects as a normal replica:

// GenerateRecoveryConfig — src/fe_utils/recovery_gen.c
for (PQconninfoOption *opt = connOptions; opt && opt->keyword; opt++)
{
    if (strcmp(opt->keyword, "replication") == 0 ||
        strcmp(opt->keyword, "dbname") == 0 ||
        strcmp(opt->keyword, "fallback_application_name") == 0 ||
        (opt->val == NULL) ||
        (opt->val != NULL && opt->val[0] == '\0'))
        continue;
    ...
    appendPQExpBuffer(&conninfo_buf, "%s=", opt->keyword);
    appendConnStrVal(&conninfo_buf, opt->val);
}
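
The filtering rule in that loop can be reproduced without libpq. In the sketch below, `ConnOpt`, `keep_option`, and `build_conninfo` are hypothetical; `ConnOpt` stands in for PQconninfoOption, and values are assumed to need no quoting, whereas the real code passes each value through appendConnStrVal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the keyword/val fields of PQconninfoOption. */
typedef struct {
    const char *keyword;
    const char *val;
} ConnOpt;

/* Drop replication-only keywords and options that are unset or empty. */
static bool keep_option(const ConnOpt *opt)
{
    if (strcmp(opt->keyword, "replication") == 0 ||
        strcmp(opt->keyword, "dbname") == 0 ||
        strcmp(opt->keyword, "fallback_application_name") == 0)
        return false;
    return opt->val != NULL && opt->val[0] != '\0';
}

/*
 * Join the surviving options as "key=value key=value".
 * The array is terminated by an entry whose keyword is NULL.
 */
static void build_conninfo(const ConnOpt *opts, char *out, size_t outsz)
{
    size_t used = 0;

    assert(outsz > 0);
    out[0] = '\0';
    for (const ConnOpt *o = opts; o->keyword != NULL; o++) {
        int n;

        if (!keep_option(o))
            continue;
        n = snprintf(out + used, outsz - used, "%s%s=%s",
                     used ? " " : "", o->keyword, o->val);
        assert(n >= 0 && (size_t) n < outsz - used);   /* no truncation */
        used += (size_t) n;
    }
}
```

Given host, port, replication, dbname, and an empty sslmode, only host and port survive, which is the shape of connection string a standby needs to reconnect as a normal replica.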

The shared replication-connection layer (streamutil.c)

All three tools open their replication connection through one function, GetConnection(), which merges a connection string, command-line host/user/port options, and replication-mode defaults. The single most important line is the dbname / replication defaulting that distinguishes a physical replication connection (pg_basebackup, pg_receivewal: dbname=replication, replication=true) from a logical one (pg_recvlogical: replication=database):

// GetConnection — src/bin/pg_basebackup/streamutil.c
keywords[i] = "replication";
values[i] = (dbname == NULL) ? "true" : "database";

RunIdentifySystem() issues IDENTIFY_SYSTEM and parses the system identifier, current timeline, and (for logical) the current flush LSN — the fallback start position when nothing else is known:

// RunIdentifySystem — src/bin/pg_basebackup/streamutil.c
res = PQexec(conn, "IDENTIFY_SYSTEM");
...
if (starttli != NULL)
    *starttli = atoi(PQgetvalue(res, 0, 1));
if (startpos != NULL)
{
    if (sscanf(PQgetvalue(res, 0, 2), "%X/%X", &hi, &lo) != 2)
        ...
    *startpos = ((uint64) hi) << 32 | lo;
}

CreateReplicationSlot() builds a CREATE_REPLICATION_SLOT command that serves all three tools: PHYSICAL (with optional RESERVE_WAL) for backup / receivewal, LOGICAL <plugin> (with optional TWO_PHASE, FAILOVER) for recvlogical. The same new-vs-old syntax switch as BASE_BACKUP applies.

pg_receivewal — continuous physical WAL archiving with resume

pg_receivewal is, in effect, the WAL-streaming half of pg_basebackup run as a standalone, long-lived program: no file copy, just ReceiveXlogStream into a local directory, forever. Its distinctive logic is resume-from-disk: FindStreamingStart() scans the destination directory for the highest-numbered complete WAL segment and resumes from the end of it, so a restarted pg_receivewal does not re-fetch or gap:

// FindStreamingStart — src/bin/pg_basebackup/pg_receivewal.c
while (errno = 0, (dirent = readdir(dir)) != NULL)
{
    ...
    if (!is_xlogfilename(dirent->d_name, &ispartial, &wal_compression_algorithm))
        continue;
    XLogFromFileName(dirent->d_name, &tli, &segno, WalSegSz);
    ...
}
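
What the scan extracts from each directory entry can be sketched as follows. WAL segment file names are 24 hexadecimal digits: 8 for the timeline, 8 for the high part of the segment number, 8 for the low part. `parse_wal_filename` and `resume_lsn` are hypothetical names; the sketch ignores compressed suffixes and file-size checks, and restarting a trailing `.partial` file from its beginning is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Split a WAL file name into timeline and segment number.
 * Accepts exactly 24 hex digits, optionally followed by ".partial".
 */
static bool parse_wal_filename(const char *name, uint64_t seg_size,
                               unsigned int *tli, uint64_t *segno,
                               bool *partial)
{
    unsigned int log, seg;

    assert(seg_size != 0);
    for (size_t i = 0; i < 24; i++)
        if (!isxdigit((unsigned char) name[i]))  /* also rejects short names */
            return false;
    if (name[24] == '\0')
        *partial = false;
    else if (strcmp(name + 24, ".partial") == 0)
        *partial = true;
    else
        return false;

    if (sscanf(name, "%8X%8X%8X", tli, &log, &seg) != 3)
        return false;
    /* Each "log" unit covers 4 GiB of WAL, i.e. 2^32 / seg_size segments. */
    *segno = (uint64_t) log * (UINT64_C(0x100000000) / seg_size) + seg;
    return true;
}

/* Resume after a complete segment, or at the start of a partial one. */
static uint64_t resume_lsn(uint64_t segno, bool partial, uint64_t seg_size)
{
    return (partial ? segno : segno + 1) * seg_size;
}
```

With 16 MiB segments, the file 000000010000000200000003 is timeline 1, segment 515, and streaming would resume at LSN 2/04000000.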

StreamLog() ties it together: identify system, then choose a start position in priority order — (1) the local directory’s last complete segment, (2) the replication slot’s restart_lsn (PG15+, via GetSlotInformation), (3) the server’s current flush position as a last resort — round it down to a segment boundary, and stream with mark_done = false (these segments are not handed to an archiver; they are the archive) and a .partial suffix for the in-progress segment:

// StreamLog — src/bin/pg_basebackup/pg_receivewal.c
stream.startpos = FindStreamingStart(&stream.timeline);
if (stream.startpos == InvalidXLogRecPtr)
{
    if (replication_slot != NULL && PQserverVersion(conn) >= 150000)
        GetSlotInformation(conn, replication_slot, &stream.startpos,
                           &stream.timeline);
    if (stream.startpos == InvalidXLogRecPtr)
    {
        stream.startpos = serverpos;
        stream.timeline = servertli;
    }
}
stream.startpos -= XLogSegmentOffset(stream.startpos, WalSegSz);
...
stream.stream_stop = stop_streaming;
stream.mark_done = false;
stream.partial_suffix = ".partial";
ReceiveXlogStream(conn, &stream);
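
The priority order reduces to a short fallback cascade. In this sketch `StartPoint` and `choose_start` are hypothetical, `INVALID_LSN` mirrors InvalidXLogRecPtr (zero), and the three candidates are assumed to have been looked up already, with the slot candidate left invalid when no slot is configured or the server is too old.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_LSN UINT64_C(0)

/* A candidate start position together with its timeline. */
typedef struct {
    uint64_t     lsn;
    unsigned int tli;
} StartPoint;

/*
 * Pick the first valid candidate in priority order (local directory,
 * then replication slot, then server position) and round it down to
 * a segment boundary.
 */
static StartPoint choose_start(StartPoint from_disk, StartPoint from_slot,
                               StartPoint from_server, uint64_t seg_size)
{
    StartPoint pick = from_disk;

    assert(seg_size != 0 && (seg_size & (seg_size - 1)) == 0);
    if (pick.lsn == INVALID_LSN)
        pick = from_slot;
    if (pick.lsn == INVALID_LSN)
        pick = from_server;
    pick.lsn -= pick.lsn & (seg_size - 1);
    return pick;
}
```

Keeping the timeline in the same struct as the LSN reflects the excerpt, where each source supplies both values together.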

The stop_streaming callback honours --endpos and a SIGINT-set time_to_stop flag, logging timeline switches as it goes — the program is meant to run until killed or until the requested end LSN is reached.

pg_recvlogical — a logical decoding consumer

pg_recvlogical connects in logical replication mode and runs a START_REPLICATION SLOT ... LOGICAL command, emitting plugin output (e.g. test_decoding, pgoutput) to a file. It does not use the ReceiveXlogStream file-oriented engine; it has its own COPY-both loop in StreamLogicalLog() that decodes the 'w' (WAL data) and 'k' (keepalive) CopyData messages and periodically reports output_written_lsn / output_fsync_lsn back to the server as feedback (so the slot’s confirmed-flush advances and WAL can be recycled):

// StreamLogicalLog — src/bin/pg_basebackup/pg_recvlogical.c
appendPQExpBuffer(query, "START_REPLICATION SLOT \"%s\" LOGICAL %X/%X",
                  replication_slot, LSN_FORMAT_ARGS(startpos));
...
res = PQexec(conn, query->data);
if (PQresultStatus(res) != PGRES_COPY_BOTH)
{
    pg_log_error("could not send replication command \"%s\": %s",
                 query->data, PQresultErrorMessage(res));
    ...
}
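
Once the connection is in COPY-both mode, each CopyData payload starts with a type byte followed by big-endian 64-bit fields. The sketch below decodes the two message kinds named above, assuming the layout given in the streaming replication protocol documentation: 'w' carries WAL start, WAL end, and send time before the payload; 'k' carries WAL end, send time, and a reply-requested flag. `CopyMsg`, `read_be64`, and `decode_copy_msg` are hypothetical names, not functions from pg_recvlogical.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian (network order) 64-bit integer. */
static uint64_t read_be64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decoded header fields of one CopyData message. */
typedef struct {
    char     type;             /* 'w' or 'k' */
    uint64_t wal_start;        /* 'w' only: LSN of the first payload byte */
    uint64_t wal_end;          /* server's current end of WAL */
    bool     reply_requested;  /* 'k' only */
    size_t   payload_off;      /* 'w' only: where plugin output begins */
} CopyMsg;

/* Returns false for unknown types or messages shorter than their header. */
static bool decode_copy_msg(const unsigned char *buf, size_t len, CopyMsg *m)
{
    assert(buf != NULL && m != NULL);
    if (len < 1)
        return false;
    m->type = (char) buf[0];

    if (buf[0] == 'w') {
        if (len < 1 + 8 + 8 + 8)        /* type, start, end, send time */
            return false;
        m->wal_start = read_be64(buf + 1);
        m->wal_end = read_be64(buf + 9);
        m->reply_requested = false;
        m->payload_off = 25;
        return true;
    }
    if (buf[0] == 'k') {
        if (len < 1 + 8 + 8 + 1)        /* type, end, send time, flag */
            return false;
        m->wal_start = 0;
        m->wal_end = read_be64(buf + 1);
        m->reply_requested = buf[17] != 0;
        m->payload_off = len;
        return true;
    }
    return false;
}
```

A consumer would write the bytes from `payload_off` onward to its output file for 'w' messages and send a feedback message promptly when a 'k' message has the reply flag set.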

The three tools thus form a family layered over a common base: streamutil.c provides the connection, IDENTIFY_SYSTEM, and slot helpers; receivelog.c provides the physical ReceiveXlogStream engine shared by pg_basebackup’s WAL child and by pg_receivewal; and pg_recvlogical reuses only the connection/slot layer, carrying its own decoding loop because logical output is a byte stream, not WAL segments.

The client lives entirely under src/bin/pg_basebackup/, with one shared helper in src/fe_utils/. Group the symbols by role:

Backup orchestration (pg_basebackup.c):

  • main — option parsing, compression-location resolution, dispatch to BaseBackup.
  • BaseBackup — the whole conversation: build command, send, consume the start/header/payload/stop result sequence, fork WAL streamer, final sync, durable manifest rename.
  • backup_parse_compress_options — split a --compress argument into algorithm + detail.
  • verify_dir_is_empty_or_create — pre-flight tablespace-directory checks (plain format only).

Parallel WAL streaming (pg_basebackup.c):

  • StartLogStreamer — second GetConnection, optional slot creation, segment-boundary rounding, pipe(), fork().
  • LogStreamerMain — child entry; fills StreamCtl, picks a walmethod (directory for -Fp, tar for -Ft), calls ReceiveXlogStream.
  • reached_end_position — child stop predicate; non-blocking select on the shutdown pipe, compares streamed LSN to the parent-supplied end LSN.
  • logstreamer_param — struct passed across the fork boundary.

Output shaping (pg_basebackup.c):

  • CreateBackupStreamer — assembles the astreamer filter chain from format + compression + parse-needs.
  • ReceiveArchiveStream / ReceiveArchiveStreamChunk — PG15+ single-COPY-stream decoder ('n' new archive, 'd' data, 'm' manifest, 'p' progress).
  • ReceiveTarFile / ReceiveTarCopyChunk — pre-PG15 per-tablespace tar decoder.
  • ReceiveCopyData — generic COPY-out pump invoking a callback per chunk.
  • progress_report / progress_update_filename — --progress accounting.

Shared replication layer (streamutil.c):

  • GetConnection — connection-string merge + replication-mode defaulting.
  • RunIdentifySystem — IDENTIFY_SYSTEM parse (sysid, TLI, start LSN, db).
  • GetSlotInformation — READ_REPLICATION_SLOT (PG15+) for resume LSN/TLI.
  • CreateReplicationSlot / DropReplicationSlot — slot lifecycle for all three tools.
  • AppendPlainCommandOption / AppendStringCommandOption / AppendIntegerCommandOption — new-vs-old option syntax emitters with escaping.
  • CheckServerVersionForStreaming — minimum-version gate.

Recovery config (recovery_gen.c):

  • GenerateRecoveryConfig — build primary_conninfo + primary_slot_name from the live connection, stripping replication-only keywords.

WAL archiving sibling (pg_receivewal.c):

  • main / StreamLog — long-lived physical WAL receiver loop.
  • FindStreamingStart — directory scan for resume LSN.
  • stop_streaming — --endpos / signal-driven stop predicate.
  • get_destination_dir / close_destination_dir — output directory handles.

Logical decoding sibling (pg_recvlogical.c):

  • main / StreamLogicalLog — START_REPLICATION ... LOGICAL COPY-both loop.
  • prepareToTerminate — graceful shutdown / feedback flush.

Physical streaming engine (receivelog.c, detailed in postgres-wal-sender-receiver.md):

  • ReceiveXlogStream — the shared engine pg_basebackup’s child and pg_receivewal both call.

Position hints (as of 2026-06-05, REL_18 273fe94)

| Symbol | File | Line |
|---|---|---|
| BaseBackup | src/bin/pg_basebackup/pg_basebackup.c | 1754 |
| main | src/bin/pg_basebackup/pg_basebackup.c | 2356 |
| StartLogStreamer | src/bin/pg_basebackup/pg_basebackup.c | 616 |
| LogStreamerMain | src/bin/pg_basebackup/pg_basebackup.c | 545 |
| reached_end_position | src/bin/pg_basebackup/pg_basebackup.c | 462 |
| logstreamer_param | src/bin/pg_basebackup/pg_basebackup.c | 533 |
| CreateBackupStreamer | src/bin/pg_basebackup/pg_basebackup.c | 1062 |
| ReceiveArchiveStream | src/bin/pg_basebackup/pg_basebackup.c | 1285 |
| ReceiveArchiveStreamChunk | src/bin/pg_basebackup/pg_basebackup.c | 1333 |
| ReceiveTarFile | src/bin/pg_basebackup/pg_basebackup.c | 1600 |
| ReceiveCopyData | src/bin/pg_basebackup/pg_basebackup.c | 1015 |
| progress_report | src/bin/pg_basebackup/pg_basebackup.c | 817 |
| backup_parse_compress_options | src/bin/pg_basebackup/pg_basebackup.c | 987 |
| verify_dir_is_empty_or_create | src/bin/pg_basebackup/pg_basebackup.c | 748 |
| GetConnection | src/bin/pg_basebackup/streamutil.c | 60 |
| RunIdentifySystem | src/bin/pg_basebackup/streamutil.c | 409 |
| GetSlotInformation | src/bin/pg_basebackup/streamutil.c | 490 |
| CreateReplicationSlot | src/bin/pg_basebackup/streamutil.c | 584 |
| AppendPlainCommandOption | src/bin/pg_basebackup/streamutil.c | 746 |
| AppendStringCommandOption | src/bin/pg_basebackup/streamutil.c | 767 |
| AppendIntegerCommandOption | src/bin/pg_basebackup/streamutil.c | 790 |
| StreamLog | src/bin/pg_basebackup/pg_receivewal.c | 500 |
| FindStreamingStart | src/bin/pg_basebackup/pg_receivewal.c | 268 |
| stop_streaming | src/bin/pg_basebackup/pg_receivewal.c | 184 |
| get_destination_dir | src/bin/pg_basebackup/pg_receivewal.c | 235 |
| StreamLogicalLog | src/bin/pg_basebackup/pg_recvlogical.c | 215 |
| prepareToTerminate | src/bin/pg_basebackup/pg_recvlogical.c | 1064 |
| ReceiveXlogStream | src/bin/pg_basebackup/receivelog.c | 452 |
| GenerateRecoveryConfig | src/fe_utils/recovery_gen.c | 28 |

Facts about the source at commit 273fe94, readable without external materials. Open questions follow.

  • WAL streaming starts before the archive payload is read, on a separate forked connection. Verified in BaseBackup: the if (includewal == STREAM_WAL) StartLogStreamer(...) block sits after the start-LSN and header results are consumed but before ReceiveArchiveStream / ReceiveTarFile. StartLogStreamer calls fork() (Unix) / _beginthreadex (Windows) and gets its own connection via GetConnection. This is the parallel-capture design.

  • The WAL child defers fsync to the parent. Verified in LogStreamerMain: stream.do_sync = false, with the comment “fsync happens at the end of pg_basebackup for all data.” The parent fsyncs once at the end via sync_pgdata (plain) or sync_dir_recurse (tar). The WAL start position is rounded down to a segment boundary in StartLogStreamer (param->startptr -= XLogSegmentOffset(...)).

  • The parent communicates the stop LSN to the child over a pipe, not shared memory, on Unix. Verified: StartLogStreamer creates bgpipe via pipe(); BaseBackup writes xlogend into bgpipe[1]; reached_end_position reads it from bgpipe[0] with a non-blocking select. On Windows the same program uses a shared xlogendptr + InterlockedIncrement(&has_xlogendptr) because the streamer is a thread, not a process.

  • PG15 changed the wire shape from per-tablespace tar COPYs to a single COPY stream. Verified in BaseBackup: if (serverMajor >= 1500) ReceiveArchiveStream(...) else a ReceiveTarFile loop. The single stream is multiplexed by a leading type byte per CopyData message ('n' new archive, 'd' data, etc.), decoded in ReceiveArchiveStreamChunk.

  • Compression placement is resolved at runtime and changes the archive file extension. Verified in CreateBackupStreamer: client-side gzip appends .gz and uses astreamer_gzip_writer_new; lz4/zstd append .lz4/.zst and layer a compressor over a plain writer. Server-side compression is requested via the COMPRESSION option in BaseBackup and, for -Fp output, undone by an astreamer_*_decompressor_new before extraction. BaseBackup rejects server-side compression against a pre-PG15 server (pg_fatal("server does not support server-side compression")).

  • The same CreateReplicationSlot serves physical and logical slots. Verified in streamutil.c: a single function with is_physical, two_phase, and failover flags. FAILOVER is gated on PG17+ (PQserverVersion(conn) >= 170000), TWO_PHASE on PG15+, and the parenthesised option syntax on PG15+ — all runtime-version checks, so one binary talks to a range of servers.

  • pg_receivewal resumes from disk first, then slot, then server position. Verified in StreamLog: FindStreamingStart (directory scan) is tried first; only if it returns InvalidXLogRecPtr does it consult GetSlotInformation (PG15+), then fall back to the IDENTIFY_SYSTEM flush position. Output segments use mark_done = false and partial_suffix = ".partial".

  • pg_recvlogical uses replication=database, the others replication=true. Verified in GetConnection: values[i] = (dbname == NULL) ? "true" : "database";. Only pg_recvlogical sets dbname; the Assert(dbname == NULL || connection_string == NULL) enforces the mutual exclusion. Logical decoding requires a database-attached walsender, hence the different mode.

  1. Backpressure between the WAL child and the file-copy parent. The child streams WAL into the same basedir the parent is extracting into, but they are separate processes with separate connections. How disk-full or slow-disk conditions in one affect the other (and whether the parent’s final waitpid can deadlock against a stuck child) is not traced here. Investigation path: kill_bgchild_atexit and the Windows bgchild_exited flag handling.

  2. astreamer chain teardown ordering on error. ReceiveArchiveStreamChunk calls astreamer_finalize / astreamer_free when a new 'n' archive begins, but the error-path cleanup of a partially built chain (e.g. a pg_fatal mid-stream) relies on process exit rather than explicit unwinding. Whether any partial output file is left in a confusing state is not examined. Investigation path: the astreamer_* files, which lie outside the three source files read for this document.

  3. Manifest streaming vs. separate-file path interaction. In the PG15+ single-stream path, the manifest arrives as 'm'-tagged CopyData buffered in memory or to a file, then durably renamed; in the writing-to-stdout case it is injected into the tar. The exact size threshold (memory vs. spill file) for the manifest buffer is not quantified here. Investigation path: ReceiveArchiveStreamChunk 'm' / 'd' branches and ReceiveBackupManifest.

Beyond PostgreSQL — Comparative Designs & Research Frontiers


Pointers, not analysis. Each bullet is a starting handle for a follow-up document.

  • pgBackRest / Barman / WAL-G. The dominant third-party PostgreSQL backup managers add what pg_basebackup deliberately omits: a backup catalog, retention policies, parallel multi-stream file transfer, delta/incremental block backup, and object-storage targets (S3/GCS/Azure). They typically use pg_basebackup or the same BASE_BACKUP protocol underneath, or read the data directory directly. A comparison of their parallelism model against pg_basebackup’s single-stream COPY would quantify the throughput ceiling of the in-tree tool.

  • Server-side compression algorithms and offload. PG15 added server-side gzip/lz4/zstd; the frontier is hardware-offloaded compression (for example QAT), long-range zstd modes, and choosing the placement automatically based on measured network vs. CPU headroom. The astreamer chain abstraction is exactly the seam where a new codec would plug in.

  • Incremental and block-level backup. PG17’s incremental backup (UPLOAD_MANIFEST + WAL summaries, see postgres-incremental-backup.md and postgres-archiving-walsummary.md) is the in-tree answer to RMAN-style block-change tracking. BaseBackup already uploads the prior manifest (UPLOAD_MANIFEST COPY-in) and appends an INCREMENTAL option — the client side is small; the interesting machinery is server-side summarization.

  • Logical vs. physical backup convergence. pg_recvlogical consumes a logical stream; pg_dump produces a logical dump (postgres-pg-dump-restore.md). Neither is a substitute for a physical base backup for PITR, but logical replication slots + pg_recvlogical enable continuous logical archiving. The research question is whether a unified “change data capture + physical baseline” tool could replace the current split; CockroachDB’s CDC + full backup model is one existence proof.

  • Backup verification and pg_verifybackup. The backup manifest (backup_manifest) that pg_basebackup durably renames at the end is consumed by pg_verifybackup to check file presence and checksums, and by the incremental machinery as the prior-backup reference. The manifest format and its verification semantics are a companion topic.

PostgreSQL documentation

  • “pg_basebackup”, “pg_receivewal”, “pg_recvlogical” reference pages — option semantics, format/compression flags, slot interaction.
  • “Streaming Replication Protocol” chapter — IDENTIFY_SYSTEM, BASE_BACKUP, START_REPLICATION, CREATE_REPLICATION_SLOT, READ_REPLICATION_SLOT, UPLOAD_MANIFEST command grammar.
  • “Continuous Archiving and Point-in-Time Recovery (PITR)” chapter — the backup-start/stop marker + WAL-replay model.

PostgreSQL source (under /data/hgryoo/references/postgres, REL_18 273fe94)

  • src/bin/pg_basebackup/pg_basebackup.c: BaseBackup, main, StartLogStreamer, LogStreamerMain, reached_end_position, CreateBackupStreamer, ReceiveArchiveStream, ReceiveArchiveStreamChunk, ReceiveTarFile, ReceiveCopyData, progress_report, verify_dir_is_empty_or_create.
  • src/bin/pg_basebackup/streamutil.c: GetConnection, RunIdentifySystem, GetSlotInformation, CreateReplicationSlot, Append{Plain,String,Integer}CommandOption, CheckServerVersionForStreaming.
  • src/bin/pg_basebackup/pg_receivewal.c: StreamLog, FindStreamingStart, stop_streaming, get_destination_dir.
  • src/bin/pg_basebackup/pg_recvlogical.c: StreamLogicalLog, prepareToTerminate.
  • src/bin/pg_basebackup/receivelog.c: ReceiveXlogStream (the shared physical-streaming engine; detailed in postgres-wal-sender-receiver.md).
  • src/fe_utils/recovery_gen.c: GenerateRecoveryConfig.

Textbook chapters (under knowledge/research/dbms-general/)

  • Database System Concepts (Silberschatz et al.), recovery chapter — fuzzy snapshots, redo-from-checkpoint, backup-and-restore model.
  • Database Internals (Petrov), Part II — log-structured recovery and the log-tail-spans-the-backup-window argument.

Related documents in this series

  • postgres-backup-basebackup.md — the server side: how basebackup.c runs the start checkpoint, constructs the tar/COPY stream, and records the start/stop markers. Defer all server-side mechanics there.
  • postgres-wal-sender-receiver.md — the WAL sender protocol and the ReceiveXlogStream / receivelog.c client engine reused by the WAL child and by pg_receivewal.
  • postgres-wire-protocol.md — the FE/BE framing and the replication startup option that all three tools enter through.
  • postgres-replication-slots.md — slot lifecycle behind CreateReplicationSlot / GetSlotInformation.
  • postgres-incremental-backup.md, postgres-archiving-walsummary.md — the UPLOAD_MANIFEST / INCREMENTAL path and WAL summarization.
  • postgres-logical-decoding.md, postgres-pgoutput.md — what pg_recvlogical consumes.
  • postgres-pg-dump-restore.md — the logical-dump alternative to physical base backup.