pg_basebackup — Driving BASE_BACKUP, Parallel WAL Streaming, Output Formats, and the pg_receivewal / pg_recvlogical Family
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A physical base backup is a byte-for-byte copy of a running database
cluster’s on-disk state, taken without stopping the server, suitable as the
starting point for either a fresh replica or a point-in-time recovery (PITR)
restore. The classical problem it solves is the fuzzy snapshot problem: the
files of a live cluster are mutating while you copy them, so a naive cp -r
produces a torn, internally inconsistent image. The textbook resolution
(Database System Concepts, Silberschatz et al., recovery chapter; Database
Internals, Petrov, Part II on log-structured recovery) is to bracket the file
copy with a backup-start and backup-stop marker recorded in the
write-ahead log (WAL), and then to require that all WAL generated between
those two markers be replayed during restore. The copy itself may be torn;
WAL replay heals it, because every page modification during the backup window
is also in the log. This is the same redo-from-a-checkpoint argument that
underlies crash recovery, generalised to “recover from an arbitrarily torn
file image plus its concurrent log.”
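The healing argument can be made concrete with a toy model. The sketch below is an illustration under simplifying assumptions, not PostgreSQL code: toy_page, toy_record, and toy_redo are hypothetical names, a page holds one integer, and every record overwrites its page outright. It shows that comparing each page's last-applied LSN against the record's LSN makes replay idempotent, so any mix of early-copied and late-copied pages converges to the same state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types: one integer of payload per page. */
typedef struct { uint64_t lsn; int value; } toy_page;
typedef struct { uint64_t lsn; size_t page_no; int new_value; } toy_record;

/*
 * Replay records in log order. A record is applied only if the page has
 * not already seen it (page LSN older than record LSN), so pages copied
 * late in the backup window are left alone and pages copied early are
 * brought forward.
 */
void
toy_redo(toy_page *pages, size_t npages,
         const toy_record *wal, size_t nrecords)
{
    for (size_t i = 0; i < nrecords; i++)
    {
        assert(wal[i].page_no < npages);
        toy_page *p = &pages[wal[i].page_no];

        if (p->lsn < wal[i].lsn)
        {
            p->value = wal[i].new_value;
            p->lsn = wal[i].lsn;
        }
    }
}
```

Replaying the same log over two copies, one whose pages were all captured before the first record and one whose first page was captured after the last record, leaves both copies with identical contents.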
Three properties define the design space for the tool that performs such a backup:
- Where does the data come from? A backup tool can read the data directory directly from a shared filesystem (filesystem-level backup), or it can ask the server to stream the bytes to it over a network protocol (streaming backup). The streaming model decouples the backup client from the server’s filesystem entirely: the client needs only a TCP connection and replication privileges, never local disk access to $PGDATA.
- How is the backup window’s WAL captured? The torn image is only useful if you also have every WAL record between start and stop. There are two strategies: fetch the needed segments at the end (a single batch after the file copy), or stream WAL continuously in parallel with the file copy. Parallel streaming bounds the amount of WAL the primary must retain and lets a slow, large backup finish without the primary recycling the segments the backup will need.
- What is the on-disk shape and durability contract of the output? A backup may be written as an extracted directory tree (ready to start), as one or more tar archives (compact, archival), and may be compressed. The tool must also decide where compression happens: on the server (saving network bandwidth at the cost of server CPU) or on the client (saving client disk at the cost of full network transfer).
PostgreSQL’s pg_basebackup is a streaming backup client: it speaks the
replication sub-protocol of the PostgreSQL wire protocol (the same
type-byte + length-prefix framing used by ordinary queries, but entered via a
replication=true / replication=database startup option — see
postgres-wire-protocol.md). The server side of the conversation is the WAL
sender (postgres-wal-sender-receiver.md), and the actual snapshot
construction lives in the backend’s basebackup.c
(postgres-backup-basebackup.md). This document is about the client — how
pg_basebackup builds the BASE_BACKUP command, drives the resulting
multi-result / COPY conversation, forks a second connection to stream WAL,
and shapes the bytes into output files.
Common DBMS Design
Every production RDBMS ships a hot-backup utility, and they converge on the same skeleton even when the mechanism differs:
- Oracle RMAN brackets datafile copies with ALTER DATABASE BEGIN BACKUP / END BACKUP (or, more commonly now, talks to the server process directly so the fuzzy-copy window is invisible to the user). RMAN tracks backups in a catalog and can do incremental block-level backups by consulting a block-change-tracking file, the conceptual ancestor of PostgreSQL 17’s incremental backup (postgres-incremental-backup.md).
- MySQL / Percona XtraBackup copies InnoDB datafiles while concurrently tailing the InnoDB redo log into a separate file, then applies that redo at prepare time. This is structurally identical to PostgreSQL’s “copy files + stream WAL in parallel + replay on restore” model: the redo tail is MySQL’s analogue of the parallel WAL stream.
- SQL Server uses VDI/VSS snapshots and a backup-log chain; the BACKUP DATABASE…WITH DIFFERENTIAL family mirrors the full/incremental distinction.
The shared design vocabulary, then, is: (a) a consistent start/stop marker
recorded in the log; (b) a possibly-torn file image; (c) a log tail
that spans the backup window; (d) an output container (directory vs.
archive, compressed vs. not); and (e) a resumable / continuous WAL archiver
that exists independently of the base backup, so that PITR can roll forward
past the backup’s stop point. PostgreSQL factors (e) into a separate program,
pg_receivewal, rather than folding it into the backup tool — a clean
separation that lets an archive of WAL accumulate continuously while base
backups are taken occasionally.
A second cross-cutting concern is compression placement. Network-bound
deployments want the server to compress before transmission; CPU-bound servers
(or backups taken across a fast LAN to a cheap client) want the client to
compress. A well-designed tool exposes both and negotiates which the server
supports. PostgreSQL 15 introduced server-side compression negotiation, so
pg_basebackup must decide at runtime whether a requested compression can run
server-side or must fall back to client-side, and must adapt the archive file
extension accordingly.
PostgreSQL’s Approach
pg_basebackup is a libpq client. Its entire interaction with the server is a
sequence of replication commands over one (or two) replication connections.
The orchestration lives in BaseBackup(); the connection plumbing and the
small replication-command helpers are shared with the sibling tools through
streamutil.c.
One backup = one BASE_BACKUP command, built option by option
BaseBackup() first checks the server version, then assembles the
BASE_BACKUP command string by appending options to a PQExpBuffer. PG15+
servers accept a parenthesised option syntax (BASE_BACKUP (LABEL 'x', PROGRESS, ...)); older servers use a positional space-separated syntax. A
single boolean, use_new_option_syntax, selects which, and the three
Append*CommandOption helpers in streamutil.c emit the right separators:
```c
// BaseBackup (src/bin/pg_basebackup/pg_basebackup.c)
AppendStringCommandOption(&buf, use_new_option_syntax, "LABEL", label);
if (estimatesize)
    AppendPlainCommandOption(&buf, use_new_option_syntax, "PROGRESS");
if (includewal == FETCH_WAL)
    AppendPlainCommandOption(&buf, use_new_option_syntax, "WAL");
...
if (compressloc == COMPRESS_LOCATION_SERVER)
{
    if (!use_new_option_syntax)
        pg_fatal("server does not support server-side compression");
    AppendStringCommandOption(&buf, use_new_option_syntax,
                              "COMPRESSION", compression_algorithm);
    ...
}
...
if (use_new_option_syntax && buf.len > 0)
    basebkp = psprintf("BASE_BACKUP (%s)", buf.data);
else
    basebkp = psprintf("BASE_BACKUP %s", buf.data);
```
The Append*CommandOption helpers guarantee correct punctuation regardless of
ordering — AppendPlainCommandOption looks at the last byte already in the
buffer to decide whether a leading , (new syntax) or space (old syntax) is
needed, and string values are escaped through PQescapeStringConn:
```c
// AppendStringCommandOption (src/bin/pg_basebackup/streamutil.c)
AppendPlainCommandOption(buf, use_new_option_syntax, option_name);
if (option_value != NULL)
{
    size_t      length = strlen(option_value);
    char       *escaped_value = palloc(1 + 2 * length);

    PQescapeStringConn(conn, escaped_value, option_value, length, NULL);
    appendPQExpBuffer(buf, " '%s'", escaped_value);
    pfree(escaped_value);
}
```
The multi-result conversation
Once PQsendQuery(conn, basebkp) fires, the server answers with a sequence of
result sets, and BaseBackup() consumes them in a fixed order:
- A one-row result giving the WAL start LSN and starting timeline. This is the backup-start marker. The checkpoint the server runs before answering is why a non-fast backup can pause here for a long time.
- A header result with one row per tablespace (OID, location, size). The sizes feed the progress estimate; the locations drive verify_dir_is_empty_or_create.
- The archive payload, delivered as COPY data.
- A one-row result giving the WAL stop LSN (the backup-stop marker).
- A final CommandComplete that may carry ERRCODE_DATA_CORRUPTED if a checksum verification failed mid-backup.
```c
// BaseBackup (src/bin/pg_basebackup/pg_basebackup.c)
if (PQsendQuery(conn, basebkp) == 0)
    pg_fatal("could not send replication command \"%s\": %s",
             "BASE_BACKUP", PQerrorMessage(conn));

/* Get the starting WAL location */
res = PQgetResult(conn);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
    pg_fatal("could not initiate base backup: %s",
             PQerrorMessage(conn));
...
strlcpy(xlogstart, PQgetvalue(res, 0, 0), sizeof(xlogstart));
...
if (PQnfields(res) >= 2)
    starttli = atoi(PQgetvalue(res, 0, 1));
```
The PG15 protocol change is visible in the dispatch after the header: a v15+
server sends one COPY stream containing every archive (and the manifest)
back-to-back, decoded by ReceiveArchiveStream; an older server sends a
separate tar COPY per tablespace, decoded by a ReceiveTarFile loop:
```c
// BaseBackup (src/bin/pg_basebackup/pg_basebackup.c)
if (serverMajor >= 1500)
{
    /* Receive a single tar stream with everything. */
    ReceiveArchiveStream(conn, client_compress);
}
else
{
    /* Receive a tar file for each tablespace in turn */
    for (i = 0; i < PQntuples(res); i++)
        ReceiveTarFile(conn, archive_name, spclocation, i, client_compress);
    if (!writing_to_stdout && manifest)
        ReceiveBackupManifest(conn);
}
```
Parallel WAL streaming via a forked second connection
The defining architectural choice of pg_basebackup -X stream (the default)
is that WAL is captured concurrently with the file copy, on a second
replication connection driven by a forked child process (a thread on
Windows). The fork happens after the start LSN is known but before the
archive payload is read, so that WAL accumulates on the client while the
potentially-huge file copy proceeds:
```c
// BaseBackup (src/bin/pg_basebackup/pg_basebackup.c)
if (includewal == STREAM_WAL)
{
    ...
    StartLogStreamer(xlogstart, starttli, sysidentifier,
                     wal_compress_algorithm, wal_compress_level);
}
```
StartLogStreamer opens a fresh connection with GetConnection(), optionally
creates a (temporary) replication slot, rounds the start position down to a
segment boundary, creates a Unix pipe for shutdown signalling, and forks:
```c
// StartLogStreamer (src/bin/pg_basebackup/pg_basebackup.c)
param->startptr -= XLogSegmentOffset(param->startptr, WalSegSz);
...
if (pipe(bgpipe) < 0)
    pg_fatal("could not create pipe for background process: %m");
param->bgconn = GetConnection();
...
if (temp_replication_slot && !replication_slot)
    replication_slot = psprintf("pg_basebackup_%u",
                                (unsigned int) PQbackendPID(param->bgconn));
...
bgchild = fork();
if (bgchild == 0)
    exit(LogStreamerMain(param));   /* in child process */
```
The child runs LogStreamerMain, which packages a StreamCtl and calls the
shared ReceiveXlogStream engine (in receivelog.c, covered by
postgres-wal-sender-receiver.md). The crucial wiring is the stop
predicate and the stop socket: the parent, once it learns the backup’s
stop LSN, writes it down the pipe, and the child’s reached_end_position
callback reads it and tells ReceiveXlogStream to stop exactly there:
```c
// LogStreamerMain (src/bin/pg_basebackup/pg_basebackup.c)
stream.stream_stop = reached_end_position;
#ifndef WIN32
stream.stop_socket = bgpipe[0];
#endif
stream.mark_done = true;
stream.do_sync = false;     /* fsync happens at the end of pg_basebackup */
if (format == 'p')
    stream.walmethod = CreateWalDirectoryMethod(param->xlog,
                                                PG_COMPRESSION_NONE, 0,
                                                stream.do_sync);
else
    stream.walmethod = CreateWalTarMethod(param->xlog,
                                          param->wal_compress_algorithm,
                                          param->wal_compress_level,
                                          stream.do_sync);
if (!ReceiveXlogStream(param->bgconn, &stream))
    return 1;
```
reached_end_position is a non-blocking select() on the pipe: until the
parent sends the end LSN, the callback returns “keep going”; once it has the
end pointer, it returns true as soon as a streamed segment reaches it:
```c
// reached_end_position (src/bin/pg_basebackup/pg_basebackup.c)
r = select(bgpipe[0] + 1, &fds, NULL, NULL, &tv);
if (r == 1)
{
    r = read(bgpipe[0], xlogend, sizeof(xlogend) - 1);
    ...
    if (sscanf(xlogend, "%X/%X", &hi, &lo) != 2)
        pg_fatal("could not parse write-ahead log location \"%s\"", xlogend);
    xlogendptr = ((uint64) hi) << 32 | lo;
    has_xlogendptr = 1;
}
else
    return false;       /* don't know the end yet */
...
if (segendpos >= xlogendptr)
    return true;
return false;
```
The parent, after consuming the stop-LSN result, hands the value to the child
and waitpid()s for it:
```c
// BaseBackup, background reap (src/bin/pg_basebackup/pg_basebackup.c)
if (write(bgpipe[1], xlogend, strlen(xlogend)) != strlen(xlogend))
    pg_fatal("could not send command to background pipe: %m");
r = waitpid(bgchild, &status, 0);
```
This whole-cluster fsync deferral (the child sets do_sync = false) is
deliberate: rather than fsync each WAL segment as it arrives, the parent
flushes the entire basedir once at the very end, via sync_pgdata (plain) or
sync_dir_recurse (tar), which is far cheaper.
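The pipe handshake between parent and child can be sketched in isolation. The following is a minimal POSIX illustration, not the pg_basebackup source: poll_end_lsn is a hypothetical helper that combines the zero-timeout select() and the %X/%X parse described above, using standard fixed-width types in place of PostgreSQL's own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Poll a pipe's read end without blocking. Returns 1 and stores the parsed
 * LSN in *end_lsn if the writer has sent one, 0 if nothing is available
 * yet, and -1 on a select, read, or parse error.
 */
int
poll_end_lsn(int read_fd, uint64_t *end_lsn)
{
    fd_set fds;
    struct timeval tv = {0, 0};     /* zero timeout: never block */
    char buf[64] = {0};
    unsigned int hi, lo;
    int r;

    assert(end_lsn != NULL);

    FD_ZERO(&fds);
    FD_SET(read_fd, &fds);
    r = select(read_fd + 1, &fds, NULL, NULL, &tv);
    if (r < 0)
        return -1;
    if (r == 0)
        return 0;                   /* end position not known yet */

    if (read(read_fd, buf, sizeof(buf) - 1) <= 0)
        return -1;
    if (sscanf(buf, "%X/%X", &hi, &lo) != 2)
        return -1;

    *end_lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}
```

A caller would invoke this from its stop predicate on every pass and keep streaming while it returns 0; once it returns 1, the stored value is the position to compare against.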
```mermaid
flowchart TD
A["main()"] --> B["BaseBackup()"]
B --> C["build BASE_BACKUP cmd<br/>(Append*CommandOption)"]
C --> D["PQsendQuery(BASE_BACKUP)"]
D --> E["result 1: start LSN + TLI"]
E --> F["result 2: per-tablespace header"]
F --> G{"includewal == STREAM_WAL?"}
G -->|yes| H["StartLogStreamer()<br/>fork 2nd connection"]
H --> H2["child: LogStreamerMain<br/>ReceiveXlogStream()"]
G -->|no| I
H --> I{"serverMajor >= 1500?"}
I -->|yes| J["ReceiveArchiveStream<br/>(single COPY stream)"]
I -->|no| K["ReceiveTarFile loop<br/>+ ReceiveBackupManifest"]
J --> L["result 4: stop LSN"]
K --> L
L --> M["write stop LSN to bgpipe<br/>waitpid(bgchild)"]
M --> N["sync_pgdata / sync_dir_recurse<br/>durable_rename manifest"]
```
Output shaping: the astreamer pipeline
The COPY-streamed archive bytes are not written directly. Instead
CreateBackupStreamer() builds a chain of astreamer objects — each a
filter with a content / finalize / free vtable — and pushes bytes
through them. The chain is assembled bottom-up (the final writer first, then
wrappers), so the order of astreamer_*_new calls is the reverse of the data
flow. Format and compression decisions are encoded purely in how the chain is
built:
```c
// CreateBackupStreamer (src/bin/pg_basebackup/pg_basebackup.c)
if (compress->algorithm == PG_COMPRESSION_NONE)
    streamer = astreamer_plain_writer_new(archive_filename, archive_file);
else if (compress->algorithm == PG_COMPRESSION_GZIP)
{
    strlcat(archive_filename, ".gz", sizeof(archive_filename));
    streamer = astreamer_gzip_writer_new(archive_filename, archive_file,
                                         compress);
}
else if (compress->algorithm == PG_COMPRESSION_LZ4)
{
    strlcat(archive_filename, ".lz4", sizeof(archive_filename));
    streamer = astreamer_plain_writer_new(archive_filename, archive_file);
    streamer = astreamer_lz4_compressor_new(streamer, compress);
}
else if (compress->algorithm == PG_COMPRESSION_ZSTD)
{
    strlcat(archive_filename, ".zst", sizeof(archive_filename));
    streamer = astreamer_plain_writer_new(archive_filename, archive_file);
    streamer = astreamer_zstd_compressor_new(streamer, compress);
}
```
For plain format (-Fp) the chain instead starts with an extractor that
unpacks the tar stream into a directory tree, and — if the server compressed
the stream — prepends a decompressor so the client can re-expand before
extraction:
```c
// CreateBackupStreamer (src/bin/pg_basebackup/pg_basebackup.c)
if (format == 'p')
{
    ...
    streamer = astreamer_extractor_new(directory, get_tablespace_mapping,
                                       progress_update_filename);
}
...
/* server-compressed archive, but client wants plain: decompress */
if (format == 'p')
{
    if (is_tar_gz)
        streamer = astreamer_gzip_decompressor_new(streamer);
    else if (is_tar_lz4)
        streamer = astreamer_lz4_decompressor_new(streamer);
    else if (is_tar_zstd)
        streamer = astreamer_zstd_decompressor_new(streamer);
}
```
Two more wrappers are conditionally inserted when the client must understand
the tar contents rather than pass them through opaquely: an
astreamer_recovery_injector_new (to write standby.signal /
primary_conninfo for -R) and an astreamer_tar_parser_new (so the
injector can splice files into the tar structure). The boolean
must_parse_archive gates all of this — if the client is just writing an
opaque compressed tar to disk, none of the parsing machinery is built, and the
bytes flow straight to the writer.
In code the chain is built tail-first (the writer first, the outermost filter
last), but data flows the other way: a CopyData chunk enters at the head and is
handed down through each content callback until it reaches the writer at the
tail. The diagram below shows two representative chains: client-side zstd to a
tar file, and a server-gzip stream extracted to a plain directory:
```mermaid
flowchart LR
subgraph TAR["-Ft --compress=client-zstd"]
T0["CopyData chunk"] --> T1["astreamer_tar_archiver<br/>(if must_parse)"]
T1 --> T2["astreamer_zstd_compressor"]
T2 --> T3["astreamer_plain_writer<br/>base.tar.zst"]
end
subgraph PLAIN["-Fp from server-gzip stream"]
P0["CopyData chunk"] --> P1["astreamer_gzip_decompressor"]
P1 --> P2["astreamer_tar_parser"]
P2 --> P3["astreamer_recovery_injector<br/>(if -R)"]
P3 --> P4["astreamer_extractor<br/>directory tree"]
end
```
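A miniature of the same pattern may help. The sketch below is a hypothetical stand-in, not the astreamer API: toy_streamer, toy_sink_init, and toy_upper_init are invented names, the sink is an in-memory buffer rather than a file, and the filter uppercases bytes where a real one would compress or parse. It shows the two properties that matter here: a wrapper is constructed after the stage it feeds, and each content call pushes a chunk one step toward the sink.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a filter chain; not the astreamer API. */
typedef struct toy_streamer toy_streamer;
struct toy_streamer
{
    toy_streamer *next;             /* downstream stage, NULL for the sink */
    void (*content)(toy_streamer *self, const char *data, size_t len);
    char out[64];                   /* used by the sink only */
    size_t out_len;
};

/* Sink: append bytes to an in-memory buffer (stands in for a file writer). */
static void
sink_content(toy_streamer *self, const char *data, size_t len)
{
    assert(self->out_len + len <= sizeof(self->out));
    memcpy(self->out + self->out_len, data, len);
    self->out_len += len;
}

/* Filter: transform each chunk, then hand it to the next stage. */
static void
upper_content(toy_streamer *self, const char *data, size_t len)
{
    char tmp[64];

    assert(len <= sizeof(tmp));
    for (size_t i = 0; i < len; i++)
        tmp[i] = (char) toupper((unsigned char) data[i]);
    self->next->content(self->next, tmp, len);
}

/* The sink has no downstream stage, so it is initialised first. */
void
toy_sink_init(toy_streamer *s)
{
    memset(s, 0, sizeof(*s));
    s->content = sink_content;
}

/* A wrapper takes the stage it feeds, which must already exist. */
void
toy_upper_init(toy_streamer *s, toy_streamer *next)
{
    memset(s, 0, sizeof(*s));
    s->next = next;
    s->content = upper_content;
}
```

Initialising the sink, wrapping it with toy_upper_init, and then calling the head's content callback twice with "base" and ".tar" leaves BASE.TAR in the sink's buffer: construction order is sink then filter, data order is filter then sink.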
Recovery configuration generation
When -R is passed, GenerateRecoveryConfig() (in src/fe_utils/recovery_gen.c)
reconstructs a primary_conninfo line from the live connection’s parameters,
deliberately stripping replication, dbname, and fallback_application_name
(which libpqwalreceiver overrides), so the generated standby config reconnects
as a normal replica:
```c
// GenerateRecoveryConfig (src/fe_utils/recovery_gen.c)
for (PQconninfoOption *opt = connOptions; opt && opt->keyword; opt++)
{
    if (strcmp(opt->keyword, "replication") == 0 ||
        strcmp(opt->keyword, "dbname") == 0 ||
        strcmp(opt->keyword, "fallback_application_name") == 0 ||
        (opt->val == NULL) ||
        (opt->val != NULL && opt->val[0] == '\0'))
        continue;
    ...
    appendPQExpBuffer(&conninfo_buf, "%s=", opt->keyword);
    appendConnStrVal(&conninfo_buf, opt->val);
}
```
The shared replication-connection layer (streamutil.c)
All three tools open their replication connection through one function,
GetConnection(), which merges a connection string, command-line host/user/
port options, and replication-mode defaults. The single most important line is
the dbname / replication defaulting that distinguishes a physical
replication connection (pg_basebackup, pg_receivewal: dbname=replication,
replication=true) from a logical one (pg_recvlogical:
replication=database):
```c
// GetConnection (src/bin/pg_basebackup/streamutil.c)
keywords[i] = "replication";
values[i] = (dbname == NULL) ? "true" : "database";
```
RunIdentifySystem() issues IDENTIFY_SYSTEM and parses the system
identifier, the current timeline, the current flush LSN (the fallback start
position when nothing else is known) and, on a logical connection, the database name:
```c
// RunIdentifySystem (src/bin/pg_basebackup/streamutil.c)
res = PQexec(conn, "IDENTIFY_SYSTEM");
...
if (starttli != NULL)
    *starttli = atoi(PQgetvalue(res, 0, 1));
if (startpos != NULL)
{
    if (sscanf(PQgetvalue(res, 0, 2), "%X/%X", &hi, &lo) != 2)
        ...
    *startpos = ((uint64) hi) << 32 | lo;
}
```
CreateReplicationSlot() builds a CREATE_REPLICATION_SLOT command that
serves all three tools: PHYSICAL (with optional RESERVE_WAL) for backup /
receivewal, LOGICAL <plugin> (with optional TWO_PHASE, FAILOVER) for
recvlogical. The same new-vs-old syntax switch as BASE_BACKUP applies.
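The separator rule that both commands rely on can be sketched without libpq. This is an illustration only: append_plain_option is a hypothetical helper working on a fixed-size char buffer, whereas the real helpers operate on a PQExpBuffer and also handle string escaping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Append one bare option to a command buffer. A separator is emitted only
 * when the buffer already holds something: ", " for the parenthesised
 * syntax, a single space for the positional one.
 */
void
append_plain_option(char *buf, size_t bufsize, bool new_syntax,
                    const char *option)
{
    size_t used = strlen(buf);

    /* room for the option, a two-byte separator, and the terminator */
    assert(used + strlen(option) + 3 <= bufsize);
    if (used > 0)
        used += (size_t) snprintf(buf + used, bufsize - used, "%s",
                                  new_syntax ? ", " : " ");
    snprintf(buf + used, bufsize - used, "%s", option);
}
```

Appending PROGRESS and then WAL to an empty buffer yields "PROGRESS, WAL" under the new syntax and "PROGRESS WAL" under the old one; the caller then wraps the first form in parentheses after the command name.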
pg_receivewal — continuous physical WAL archiving with resume
pg_receivewal is, in effect, the WAL-streaming half of pg_basebackup run as
a standalone, long-lived program: no file copy, just ReceiveXlogStream into a
local directory, forever. Its distinctive logic is resume-from-disk:
FindStreamingStart() scans the destination directory for the
highest-numbered complete WAL segment and resumes from the end of it, so a
restarted pg_receivewal does not re-fetch or gap:
```c
// FindStreamingStart (src/bin/pg_basebackup/pg_receivewal.c)
while (errno = 0, (dirent = readdir(dir)) != NULL)
{
    ...
    if (!is_xlogfilename(dirent->d_name, &ispartial,
                         &wal_compression_algorithm))
        continue;
    XLogFromFileName(dirent->d_name, &tli, &segno, WalSegSz);
    ...
}
```
StreamLog() ties it together: identify system, then choose a start position
in priority order — (1) the local directory’s last complete segment, (2) the
replication slot’s restart_lsn (PG15+, via GetSlotInformation), (3) the
server’s current flush position as a last resort — round it down to a segment
boundary, and stream with mark_done = false (these segments are not handed to
an archiver; they are the archive) and a .partial suffix for the
in-progress segment:
```c
// StreamLog (src/bin/pg_basebackup/pg_receivewal.c)
stream.startpos = FindStreamingStart(&stream.timeline);
if (stream.startpos == InvalidXLogRecPtr)
{
    if (replication_slot != NULL && PQserverVersion(conn) >= 150000)
        GetSlotInformation(conn, replication_slot,
                           &stream.startpos, &stream.timeline);
    if (stream.startpos == InvalidXLogRecPtr)
    {
        stream.startpos = serverpos;
        stream.timeline = servertli;
    }
}
stream.startpos -= XLogSegmentOffset(stream.startpos, WalSegSz);
...
stream.stream_stop = stop_streaming;
stream.mark_done = false;
stream.partial_suffix = ".partial";
ReceiveXlogStream(conn, &stream);
```
The stop_streaming callback honours --endpos and a SIGINT-set
time_to_stop flag, logging timeline switches as it goes — the program is
meant to run until killed or until the requested end LSN is reached.
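The rounding step and the resulting segment file name can be worked through with plain integer arithmetic. The helpers below, segment_start and segment_file_name, are hypothetical names for this sketch; they assume a power-of-two segment size (16 MB is the PostgreSQL default) and the conventional 24-hex-digit name made of the timeline, the high part, and the low part of the segment number.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Round an LSN down to the start of its WAL segment. */
uint64_t
segment_start(uint64_t lsn, uint64_t seg_size)
{
    /* segment sizes are powers of two, so masking gives the offset */
    assert(seg_size != 0 && (seg_size & (seg_size - 1)) == 0);
    return lsn - (lsn & (seg_size - 1));
}

/* Format the 24-hex-digit segment file name for (timeline, LSN). */
void
segment_file_name(char *out, size_t outsize, uint32_t tli,
                  uint64_t lsn, uint64_t seg_size)
{
    uint64_t segno = lsn / seg_size;
    /* number of segments that fit in one 4 GB "xlog id" */
    uint64_t segs_per_id = UINT64_C(0x100000000) / seg_size;

    assert(outsize >= 25);
    snprintf(out, outsize, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / segs_per_id),
             (unsigned int) (segno % segs_per_id));
    assert(strlen(out) == 24);
}
```

With 16 MB segments, LSN 16/B374D848 rounds down to 16/B3000000, and on timeline 1 it lives in the file 0000000100000016000000B3; while that segment is still being written, the receiver keeps it under the same name plus the .partial suffix.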
pg_recvlogical — a logical decoding consumer
pg_recvlogical connects in logical replication mode and runs a
START_REPLICATION SLOT ... LOGICAL command, emitting plugin output (e.g.
test_decoding, pgoutput) to a file. It does not use the ReceiveXlogStream
file-oriented engine; it has its own COPY-both loop in StreamLogicalLog()
that decodes the 'w' (WAL data) and 'k' (keepalive) CopyData messages and
periodically reports output_written_lsn / output_fsync_lsn back to the
server as feedback (so the slot’s confirmed-flush advances and WAL can be
recycled):
```c
// StreamLogicalLog (src/bin/pg_basebackup/pg_recvlogical.c)
appendPQExpBuffer(query, "START_REPLICATION SLOT \"%s\" LOGICAL %X/%X",
                  replication_slot, LSN_FORMAT_ARGS(startpos));
...
res = PQexec(conn, query->data);
if (PQresultStatus(res) != PGRES_COPY_BOTH)
{
    pg_log_error("could not send replication command \"%s\": %s",
                 query->data, PQresultErrorMessage(res));
    ...
}
```
The three tools thus form a family layered over a common base: streamutil.c
provides the connection, IDENTIFY_SYSTEM, and slot helpers; receivelog.c
provides the physical ReceiveXlogStream engine shared by pg_basebackup’s
WAL child and by pg_receivewal; and pg_recvlogical reuses only the
connection/slot layer, carrying its own decoding loop because logical output is
a byte stream, not WAL segments.
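For orientation, the headers of the two CopyData messages that the logical loop handles can be decoded with a few lines of byte arithmetic. The sketch below is not taken from pg_recvlogical.c: copy_msg and parse_copy_msg are hypothetical names, and the field layout (big-endian 64-bit integers following the type byte) is assumed from the streaming replication protocol documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian (network order) 64-bit integer. */
static uint64_t
read_be64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

typedef struct
{
    char     kind;              /* 'w' or 'k' */
    uint64_t wal_start;         /* 'w' only: LSN of the first payload byte */
    uint64_t wal_end;           /* server's current end of WAL */
    int      reply_requested;   /* 'k' only */
    size_t   payload_off;       /* 'w' only: offset of the plugin output */
} copy_msg;

/* Returns 1 on success, 0 if the message is unknown or too short. */
int
parse_copy_msg(const unsigned char *buf, size_t len, copy_msg *out)
{
    assert(buf != NULL && out != NULL);
    if (len >= 25 && buf[0] == 'w')
    {
        out->kind = 'w';
        out->wal_start = read_be64(buf + 1);
        out->wal_end = read_be64(buf + 9);
        out->reply_requested = 0;
        out->payload_off = 25;  /* 1 + 8 + 8 + 8; the send time is skipped */
        return 1;
    }
    if (len >= 18 && buf[0] == 'k')
    {
        out->kind = 'k';
        out->wal_start = 0;
        out->wal_end = read_be64(buf + 1);
        out->reply_requested = buf[17];
        out->payload_off = 0;
        return 1;
    }
    return 0;
}
```

A consumer would write the bytes from payload_off onward to its output for a 'w' message, and send a feedback message promptly when a 'k' message arrives with reply_requested set.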
Source Walkthrough
The client lives entirely under src/bin/pg_basebackup/, with one shared
helper in src/fe_utils/. Group the symbols by role:
Backup orchestration (pg_basebackup.c):
- main: option parsing, compression-location resolution, dispatch to BaseBackup.
- BaseBackup: the whole conversation (build command, send, consume the start/header/payload/stop result sequence, fork WAL streamer, final sync, durable manifest rename).
- backup_parse_compress_options: split a --compress argument into algorithm + detail.
- verify_dir_is_empty_or_create: pre-flight tablespace-directory checks (plain format only).
Parallel WAL streaming (pg_basebackup.c):
- StartLogStreamer: second GetConnection, optional slot creation, segment-boundary rounding, pipe(), fork().
- LogStreamerMain: child entry; fills StreamCtl, picks a walmethod (directory for -Fp, tar for -Ft), calls ReceiveXlogStream.
- reached_end_position: child stop predicate; non-blocking select on the shutdown pipe, compares streamed LSN to the parent-supplied end LSN.
- logstreamer_param: struct passed across the fork boundary.
Output shaping (pg_basebackup.c):
- CreateBackupStreamer: assembles the astreamer filter chain from format + compression + parse-needs.
- ReceiveArchiveStream / ReceiveArchiveStreamChunk: PG15+ single-COPY-stream decoder ('n' new archive, 'd' data, 'm' manifest, 'p' progress).
- ReceiveTarFile / ReceiveTarCopyChunk: pre-PG15 per-tablespace tar decoder.
- ReceiveCopyData: generic COPY-out pump invoking a callback per chunk.
- progress_report / progress_update_filename: --progress accounting.
Shared replication layer (streamutil.c):
- GetConnection: connection-string merge + replication-mode defaulting.
- RunIdentifySystem: IDENTIFY_SYSTEM parse (sysid, TLI, start LSN, db).
- GetSlotInformation: READ_REPLICATION_SLOT (PG15+) for resume LSN/TLI.
- CreateReplicationSlot / DropReplicationSlot: slot lifecycle for all three tools.
- AppendPlainCommandOption / AppendStringCommandOption / AppendIntegerCommandOption: new-vs-old option syntax emitters with escaping.
- CheckServerVersionForStreaming: minimum-version gate.
Recovery config (recovery_gen.c):
- GenerateRecoveryConfig: build primary_conninfo + primary_slot_name from the live connection, stripping replication-only keywords.
WAL archiving sibling (pg_receivewal.c):
- main / StreamLog: long-lived physical WAL receiver loop.
- FindStreamingStart: directory scan for resume LSN.
- stop_streaming: --endpos / signal-driven stop predicate.
- get_destination_dir / close_destination_dir: output directory handles.
Logical decoding sibling (pg_recvlogical.c):
- main / StreamLogicalLog: START_REPLICATION ... LOGICAL COPY-both loop.
- prepareToTerminate: graceful shutdown / feedback flush.
Physical streaming engine (receivelog.c, detailed in postgres-wal-sender-receiver.md):
- ReceiveXlogStream: the shared engine pg_basebackup’s child and pg_receivewal both call.
Position hints (as of 2026-06-05, REL_18 273fe94)
| Symbol | File | Line |
|---|---|---|
| BaseBackup | src/bin/pg_basebackup/pg_basebackup.c | 1754 |
| main | src/bin/pg_basebackup/pg_basebackup.c | 2356 |
| StartLogStreamer | src/bin/pg_basebackup/pg_basebackup.c | 616 |
| LogStreamerMain | src/bin/pg_basebackup/pg_basebackup.c | 545 |
| reached_end_position | src/bin/pg_basebackup/pg_basebackup.c | 462 |
| logstreamer_param | src/bin/pg_basebackup/pg_basebackup.c | 533 |
| CreateBackupStreamer | src/bin/pg_basebackup/pg_basebackup.c | 1062 |
| ReceiveArchiveStream | src/bin/pg_basebackup/pg_basebackup.c | 1285 |
| ReceiveArchiveStreamChunk | src/bin/pg_basebackup/pg_basebackup.c | 1333 |
| ReceiveTarFile | src/bin/pg_basebackup/pg_basebackup.c | 1600 |
| ReceiveCopyData | src/bin/pg_basebackup/pg_basebackup.c | 1015 |
| progress_report | src/bin/pg_basebackup/pg_basebackup.c | 817 |
| backup_parse_compress_options | src/bin/pg_basebackup/pg_basebackup.c | 987 |
| verify_dir_is_empty_or_create | src/bin/pg_basebackup/pg_basebackup.c | 748 |
| GetConnection | src/bin/pg_basebackup/streamutil.c | 60 |
| RunIdentifySystem | src/bin/pg_basebackup/streamutil.c | 409 |
| GetSlotInformation | src/bin/pg_basebackup/streamutil.c | 490 |
| CreateReplicationSlot | src/bin/pg_basebackup/streamutil.c | 584 |
| AppendPlainCommandOption | src/bin/pg_basebackup/streamutil.c | 746 |
| AppendStringCommandOption | src/bin/pg_basebackup/streamutil.c | 767 |
| AppendIntegerCommandOption | src/bin/pg_basebackup/streamutil.c | 790 |
| StreamLog | src/bin/pg_basebackup/pg_receivewal.c | 500 |
| FindStreamingStart | src/bin/pg_basebackup/pg_receivewal.c | 268 |
| stop_streaming | src/bin/pg_basebackup/pg_receivewal.c | 184 |
| get_destination_dir | src/bin/pg_basebackup/pg_receivewal.c | 235 |
| StreamLogicalLog | src/bin/pg_basebackup/pg_recvlogical.c | 215 |
| prepareToTerminate | src/bin/pg_basebackup/pg_recvlogical.c | 1064 |
| ReceiveXlogStream | src/bin/pg_basebackup/receivelog.c | 452 |
| GenerateRecoveryConfig | src/fe_utils/recovery_gen.c | 28 |
Source verification (as of 2026-06-05)
Facts about the source at commit 273fe94, readable without external materials. Open questions follow.
Verified facts
- WAL streaming starts before the archive payload is read, on a separate forked connection. Verified in BaseBackup: the if (includewal == STREAM_WAL) StartLogStreamer(...) block sits after the start-LSN and header results are consumed but before ReceiveArchiveStream / ReceiveTarFile. StartLogStreamer calls fork() (Unix) / _beginthreadex (Windows) and gets its own connection via GetConnection. This is the parallel-capture design.
- The WAL child defers fsync to the parent. Verified in LogStreamerMain: stream.do_sync = false, with the comment “fsync happens at the end of pg_basebackup for all data.” The parent fsyncs once at the end via sync_pgdata (plain) or sync_dir_recurse (tar). The WAL start position is rounded down to a segment boundary in StartLogStreamer (param->startptr -= XLogSegmentOffset(...)).
- The parent communicates the stop LSN to the child over a pipe, not shared memory, on Unix. Verified: StartLogStreamer creates bgpipe via pipe(); BaseBackup writes xlogend into bgpipe[1]; reached_end_position reads it from bgpipe[0] with a non-blocking select. On Windows the same program uses a shared xlogendptr + InterlockedIncrement(&has_xlogendptr) because the streamer is a thread, not a process.
- PG15 changed the wire shape from per-tablespace tar COPYs to a single COPY stream. Verified in BaseBackup: if (serverMajor >= 1500) ReceiveArchiveStream(...), else a ReceiveTarFile loop. The single stream is multiplexed by a leading type byte per CopyData message ('n' new archive, 'd' data, etc.), decoded in ReceiveArchiveStreamChunk.
- Compression placement is resolved at runtime and changes the archive file extension. Verified in CreateBackupStreamer: client-side gzip appends .gz and uses astreamer_gzip_writer_new; lz4/zstd append .lz4 / .zst and layer a compressor over a plain writer. Server-side compression is requested via the COMPRESSION option in BaseBackup and, for -Fp output, undone by an astreamer_*_decompressor_new before extraction. BaseBackup rejects server-side compression against a pre-PG15 server (pg_fatal("server does not support server-side compression")).
- The same CreateReplicationSlot serves physical and logical slots. Verified in streamutil.c: a single function with is_physical, two_phase, and failover flags. FAILOVER is gated on PG17+ (PQserverVersion(conn) >= 170000), TWO_PHASE on PG15+, and the parenthesised option syntax on PG15+. All are runtime-version checks, so one binary talks to a range of servers.
- pg_receivewal resumes from disk first, then slot, then server position. Verified in StreamLog: FindStreamingStart (directory scan) is tried first; only if it returns InvalidXLogRecPtr does it consult GetSlotInformation (PG15+), then fall back to the IDENTIFY_SYSTEM flush position. Output segments use mark_done = false and partial_suffix = ".partial".
- pg_recvlogical uses replication=database, the others replication=true. Verified in GetConnection: values[i] = (dbname == NULL) ? "true" : "database";. Only pg_recvlogical sets dbname; the Assert(dbname == NULL || connection_string == NULL) enforces the mutual exclusion. Logical decoding requires a database-attached walsender, hence the different mode.
Open questions
- Backpressure between the WAL child and the file-copy parent. The child streams WAL into the same basedir the parent is extracting into, but they are separate processes with separate connections. How disk-full or slow-disk conditions in one affect the other (and whether the parent’s final waitpid can deadlock against a stuck child) is not traced here. Investigation path: kill_bgchild_atexit and the Windows bgchild_exited flag handling.
- astreamer chain teardown ordering on error. ReceiveArchiveStreamChunk calls astreamer_finalize / astreamer_free when a new 'n' archive begins, but the error-path cleanup of a partially-built chain (e.g. a pg_fatal mid-stream) relies on process exit rather than explicit unwinding. Whether any partial output file is left in a confusing state is not examined. Investigation path: the astreamer_* files (out of the three-file READ scope of this doc).
- Manifest streaming vs. separate-file path interaction. In the PG15+ single-stream path, the manifest arrives as 'm'-tagged CopyData buffered in memory or to a file, then durably renamed; in the writing-to-stdout case it is injected into the tar. The exact size threshold (memory vs. spill file) for the manifest buffer is not quantified here. Investigation path: ReceiveArchiveStreamChunk 'm' / 'd' branches and ReceiveBackupManifest.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Pointers, not analysis. Each bullet is a starting handle for a follow-up document.
- pgBackRest / Barman / WAL-G. The dominant third-party PostgreSQL backup managers add what `pg_basebackup` deliberately omits: a backup catalog, retention policies, parallel multi-stream file transfer, delta/incremental block backup, and object-storage targets (S3/GCS/Azure). They typically use `pg_basebackup` or the same `BASE_BACKUP` protocol underneath, or read the data directory directly. A comparison of their parallelism model against `pg_basebackup`’s single-stream COPY would quantify the throughput ceiling of the in-tree tool.
- Server-side compression algorithms and offload. PG15 added server-side gzip/lz4/zstd; the frontier is hardware-offloaded compression (QAT/zstd long-range) and choosing the placement automatically based on measured network vs. CPU headroom. The `astreamer` chain abstraction is exactly the seam where a new codec would plug in.
- Incremental and block-level backup. PG17’s incremental backup (`UPLOAD_MANIFEST` + WAL summaries, see `postgres-incremental-backup.md` and `postgres-archiving-walsummary.md`) is the in-tree answer to RMAN-style block-change tracking. `BaseBackup` already uploads the prior manifest (`UPLOAD_MANIFEST` COPY-in) and appends an `INCREMENTAL` option. The client side is small; the interesting machinery is server-side summarization.
- Logical vs. physical backup convergence. `pg_recvlogical` consumes a logical stream; `pg_dump` produces a logical dump (`postgres-pg-dump-restore.md`). Neither is a substitute for a physical base backup for PITR, but logical replication slots + `pg_recvlogical` enable continuous logical archiving. The research question is whether a unified “change data capture + physical baseline” tool could replace the current split; CockroachDB’s CDC + full backup model is one existence proof.
- Backup verification and `pg_verifybackup`. The backup manifest (`backup_manifest`) that `pg_basebackup` durably renames at the end is consumed by `pg_verifybackup` to check file presence and checksums, and by the incremental machinery as the prior-backup reference. The manifest format and its verification semantics are a companion topic.
Sources
PostgreSQL documentation
- “pg_basebackup”, “pg_receivewal”, “pg_recvlogical” reference pages: option semantics, format/compression flags, slot interaction.
- “Streaming Replication Protocol” chapter: `IDENTIFY_SYSTEM`, `BASE_BACKUP`, `START_REPLICATION`, `CREATE_REPLICATION_SLOT`, `READ_REPLICATION_SLOT`, `UPLOAD_MANIFEST` command grammar.
- “Continuous Archiving and Point-in-Time Recovery (PITR)” chapter: the backup-start/stop marker + WAL-replay model.
PostgreSQL source (under /data/hgryoo/references/postgres, REL_18 273fe94)
- `src/bin/pg_basebackup/pg_basebackup.c`: `BaseBackup`, `main`, `StartLogStreamer`, `LogStreamerMain`, `reached_end_position`, `CreateBackupStreamer`, `ReceiveArchiveStream`, `ReceiveArchiveStreamChunk`, `ReceiveTarFile`, `ReceiveCopyData`, `progress_report`, `verify_dir_is_empty_or_create`.
- `src/bin/pg_basebackup/streamutil.c`: `GetConnection`, `RunIdentifySystem`, `GetSlotInformation`, `CreateReplicationSlot`, `Append{Plain,String,Integer}CommandOption`, `CheckServerVersionForStreaming`.
- `src/bin/pg_basebackup/pg_receivewal.c`: `StreamLog`, `FindStreamingStart`, `stop_streaming`, `get_destination_dir`.
- `src/bin/pg_basebackup/pg_recvlogical.c`: `StreamLogicalLog`, `prepareToTerminate`.
- `src/bin/pg_basebackup/receivelog.c`: `ReceiveXlogStream` (the shared physical-streaming engine; detailed in `postgres-wal-sender-receiver.md`).
- `src/fe_utils/recovery_gen.c`: `GenerateRecoveryConfig`.
Textbook chapters (under knowledge/research/dbms-general/)
- Database System Concepts (Silberschatz et al.), recovery chapter: fuzzy snapshots, redo-from-checkpoint, backup-and-restore model.
- Database Internals (Petrov), Part II: log-structured recovery and the log-tail-spans-the-backup-window argument.
Cross-references (sibling module docs)
- `postgres-backup-basebackup.md`: the server side, covering how `basebackup.c` runs the start checkpoint, constructs the tar/COPY stream, and records the start/stop markers. Defer all server-side mechanics there.
- `postgres-wal-sender-receiver.md`: the WAL sender protocol and the `ReceiveXlogStream`/`receivelog.c` client engine reused by the WAL child and by `pg_receivewal`.
- `postgres-wire-protocol.md`: the FE/BE framing and the replication startup option that all three tools enter through.
- `postgres-replication-slots.md`: slot lifecycle behind `CreateReplicationSlot`/`GetSlotInformation`.
- `postgres-incremental-backup.md`, `postgres-archiving-walsummary.md`: the `UPLOAD_MANIFEST`/`INCREMENTAL` path and WAL summarization.
- `postgres-logical-decoding.md`, `postgres-pgoutput.md`: what `pg_recvlogical` consumes.
- `postgres-pg-dump-restore.md`: the logical-dump alternative to physical base backup.