PostgreSQL Utilities — Section Overview
Contents:
- What this section covers
- The layering: tools sit outside the server, grouped by the contract they touch
- Reading order
- Detail-doc summaries
- Adjacent sections
What this section covers
This subcategory is the tooling tree: the standalone programs under
src/bin/, plus the one build-time code generator (catalog/genbki.pl) and
the one backend mode (bootstrap/bootstrap.c) that the genesis tool depends
on. Everything here shares a defining property that separates it from the
other twelve subcategories: none of it runs inside the postmaster’s
shared-memory machine. These are separate executables. They act on a
cluster from the outside — by talking the FE/BE wire protocol to a running
server, by reading and writing the on-disk files of a stopped (or
backup-snapshotted) cluster, or — for initdb — by creating the cluster
before any server exists.
Concretely, the tools covered here are: initdb and the bootstrap/genbki
catalog codegen behind it; pg_dump / pg_restore / pg_dumpall;
pg_upgrade; pg_basebackup; pg_combinebackup; pg_rewind; pg_waldump;
psql; and pg_ctl / pg_controldata.
The sharp boundaries — what this section does not own:
- The mechanisms the tools invoke live in other subcategories. A tool is
a client of an on-disk or on-wire contract; the contract itself is owned
elsewhere. pg_waldump decodes WAL records, but the WAL format and the resource-manager desc callbacks belong to txn-recovery (postgres-wal-records-rmgr.md). pg_basebackup streams a base backup, but the server-side BASE_BACKUP replication command, WAL summarization, and the backup manifest belong to replication-ha. pg_controldata prints pg_control, but the ControlFileData struct and its update path belong to txn-recovery (postgres-xlog-wal.md). The rule: this section owns the executables; it hands off the format and protocol to the subcategory that defines them.
- The catalog content belongs to system-catalog.
genbki.pl is covered here as a code-generation pipeline (header + .dat → .bki), but what the catalogs are (pg_class, pg_proc, the relcache that reads them) is system-catalog (postgres-system-catalogs.md).
- The replication-only frontend tools that happen to live in
src/bin/pg_basebackup/ (pg_receivewal, pg_recvlogical, pg_createsubscriber) are replication-ha scope, not utilities. They are named here only to explain why the directory is larger than the one tool this section claims from it.
- contrib/ is out of scope for the whole tree. Tools such as pg_amcheck, pgbench, pg_resetwal, pg_checksums, and pg_verifybackup do live in src/bin/, but they are not in the plan's module catalog, so they fall outside this section's scope and may be named only as examples.
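
The split between pg_waldump and txn-recovery is the clearest instance of this hand-off: the tool owns the loop that walks records, while each resource manager owns how its records are described. The sketch below models that split as a dispatch table keyed by resource-manager id. The record layout, the ids, the two rmgr entries, and their describe functions are all hypothetical simplifications, not the backend's real table or desc routines.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ToyRecord:
    """Hypothetical, stripped-down WAL record: just enough for dispatch."""
    rmid: int                  # resource-manager id; selects the desc callback
    info: int                  # rmgr-specific opcode bits
    payload: Dict[str, int] = field(default_factory=dict)


def xact_desc(rec: ToyRecord) -> str:
    # Stand-in for a transaction rmgr's describe routine.
    if rec.info == 0x00:
        return f"commit xid {rec.payload['xid']}"
    return f"unknown info {rec.info:#04x}"


def heap_desc(rec: ToyRecord) -> str:
    # Stand-in for a heap rmgr's describe routine.
    if rec.info == 0x00:
        return f"insert off {rec.payload['offnum']}"
    return f"unknown info {rec.info:#04x}"


# "Server-owned" side of the model: rmgr id -> (name, desc callback).
# The ids and names here are illustrative only.
RMGR_TABLE: Dict[int, Tuple[str, Callable[[ToyRecord], str]]] = {
    1: ("Transaction", xact_desc),
    10: ("Heap", heap_desc),
}


def describe(rec: ToyRecord) -> str:
    """Tool-owned side: look up the rmgr and delegate all formatting to it."""
    entry = RMGR_TABLE.get(rec.rmid)
    if entry is None:
        return f"rmgr {rec.rmid}: (no desc callback)"
    name, desc = entry
    return f"rmgr: {name} desc: {desc(rec)}"


for rec in (ToyRecord(10, 0x00, {"offnum": 3}), ToyRecord(1, 0x00, {"xid": 742})):
    print(describe(rec))
```

In this model the tool never interprets a payload itself, so adding a record type changes only the table side; that is the sense in which a decoder can reuse, rather than own, the record format.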
The layering: tools sit outside the server, grouped by the contract they touch
The honest picture is not a layered stack but a rim of external
executables around the cluster’s three on-disk/on-wire contracts: the
catalog/.bki genesis path, the FE/BE protocol, and the WAL + data-file
on-disk format. Each tool is positioned by which contract it touches.
flowchart TB
subgraph BUILD["build time (no cluster yet)"]
GENBKI["genbki.pl<br/>pg_*.h + pg_*.dat -> postgres.bki<br/>(postgres-initdb-bootstrap-genbki.md)"]
end
subgraph GENESIS["cluster genesis"]
INITDB["initdb<br/>drives 'postgres --boot' on postgres.bki,<br/>writes system views, stamps pg_control<br/>(postgres-initdb-bootstrap-genbki.md)"]
end
GENBKI -- "postgres.bki" --> INITDB
subgraph CLUSTER["a cluster on disk (PGDATA)"]
direction LR
CTRL["pg_control"]
DATA["base/ data files"]
WAL["pg_wal/ + summaries"]
end
INITDB -- "creates" --> CLUSTER
subgraph FEBE["over the FE/BE protocol (server running)"]
DUMP["pg_dump / pg_restore / pg_dumpall<br/>schema+data as SQL or archive<br/>(postgres-pg-dump-restore.md)"]
PSQL["psql<br/>interactive client + meta-commands<br/>(postgres-psql.md)"]
BASE["pg_basebackup<br/>BASE_BACKUP + WAL stream<br/>(postgres-pg-basebackup.md)"]
end
DUMP <--> CLUSTER
PSQL <--> CLUSTER
BASE <-- "streams" --> CLUSTER
subgraph OFFLINE["direct on-disk file access (server stopped / snapshot)"]
UPGRADE["pg_upgrade<br/>schema dump + relfilenode swap<br/>(postgres-pg-upgrade.md)"]
COMBINE["pg_combinebackup<br/>full + incremental -> synthetic full<br/>(postgres-pg-basebackup.md companion)"]
REWIND["pg_rewind<br/>diverged data dir -> source timeline<br/>(postgres-pg-rewind.md)"]
WALDUMP["pg_waldump<br/>decode WAL via rmgr desc callbacks<br/>(postgres-pg-waldump.md)"]
CTLDATA["pg_ctl / pg_controldata<br/>postmaster lifecycle + read pg_control<br/>(postgres-pg-ctl-controldata.md)"]
end
UPGRADE --> CLUSTER
COMBINE --> CLUSTER
REWIND --> CLUSTER
WALDUMP --> WAL
CTLDATA --> CTRL
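
The flowchart's grouping can also be read as a lookup from tool to contract. The sketch below transcribes it into a small table; the enum names are hypothetical labels for the three contracts named in the paragraph before the diagram, and the mapping is an illustration of this section's organization, not an official classification.

```python
from enum import Enum
from typing import Dict, List


class Contract(Enum):
    BKI_GENESIS = "catalog/.bki genesis path"
    FEBE_PROTOCOL = "FE/BE protocol (server running)"
    DIRECT_FILES = "WAL + data-file on-disk format (direct file access)"


# Transcribed from the flowchart; an illustration, not an official taxonomy.
TOOL_CONTRACT: Dict[str, Contract] = {
    "genbki.pl": Contract.BKI_GENESIS,
    "initdb": Contract.BKI_GENESIS,
    "pg_dump": Contract.FEBE_PROTOCOL,
    "pg_restore": Contract.FEBE_PROTOCOL,
    "pg_dumpall": Contract.FEBE_PROTOCOL,
    "psql": Contract.FEBE_PROTOCOL,
    "pg_basebackup": Contract.FEBE_PROTOCOL,
    "pg_upgrade": Contract.DIRECT_FILES,
    "pg_combinebackup": Contract.DIRECT_FILES,
    "pg_rewind": Contract.DIRECT_FILES,
    "pg_waldump": Contract.DIRECT_FILES,
    "pg_ctl": Contract.DIRECT_FILES,
    "pg_controldata": Contract.DIRECT_FILES,
}


def tools_for(contract: Contract) -> List[str]:
    """All tools positioned on one contract, in alphabetical order."""
    return sorted(tool for tool, c in TOOL_CONTRACT.items() if c is contract)


for contract in Contract:
    print(f"{contract.value}: {', '.join(tools_for(contract))}")
```

Keying on the contract rather than on the source directory is the point: src/bin/pg_basebackup/ also holds replication-ha tools, so directory layout alone would misplace them.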
Two structural facts to carry forward:
- genbki.pl runs once at build time, not at runtime. It is a Perl script, not a C program, and it emits postgres.bki (plus symbol headers such as pg_*_d.h) from the catalog header files and their .dat companions. initdb reads the installed postgres.bki and replays it through a backend launched in bootstrap mode (postgres --boot), whose interpreter is bootstrap.c. This is the only place in the whole tree where a build-time generator and a backend mode are paired into one tool's story, which is why they collapse into a single detail doc.
- The “offline” tools assume the server is down (or the files are a
consistent snapshot).
pg_upgrade and pg_rewind rewrite a target PGDATA in place and would corrupt a live cluster, so the target's server must be stopped (pg_rewind may still fetch from a running source server). pg_combinebackup reads existing backup directories and writes a new one, so it never touches a running data directory at all. What unites them is direct file access instead of a server connection, and that is what makes them a family distinct from the protocol-talking tools.
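
The "server must be down" precondition usually starts with a cheap guard before any file is touched. The sketch below is a minimal version under one stated assumption: that the presence of postmaster.pid in the data directory is read as "a postmaster may be running". The function name is hypothetical. Real tools combine guards like this with others; pg_rewind, for instance, also insists that the target's pg_control records a clean shutdown, which this sketch does not check.

```python
import os


def refuse_if_server_may_be_running(pgdata: str) -> None:
    """Raise RuntimeError if PGDATA holds a postmaster.pid file.

    Assumption: the pid file's presence means a postmaster may be running.
    A stale file left by a crash also triggers the refusal, which errs on
    the side of not touching the files.
    """
    pidfile = os.path.join(pgdata, "postmaster.pid")
    if not os.path.exists(pidfile):
        return
    with open(pidfile, encoding="utf-8") as f:
        pid = f.readline().strip() or "?"  # first line holds the postmaster PID
    raise RuntimeError(
        f"{pgdata}: postmaster.pid present (pid {pid}); stop the server first"
    )
```

Refusing on a stale pid file is deliberate in this design: a false refusal costs the operator a manual check, while a false "all clear" on a live cluster costs the data.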
Reading order
Read in dependency order: the genesis tool before anything that assumes a cluster exists, then psql as the observation client, then the export/migration pair, then the physical tools.
- postgres-initdb-bootstrap-genbki.md: start here. It explains where a cluster comes from and how the catalogs get onto disk, which every other tool presupposes. Read alongside postgres-system-catalogs.md for what the .bki is populating.
- postgres-psql.md: the client you will use to observe every other tool's effects; light reading, and a good orientation to the FE/BE protocol from the frontend side.
- postgres-pg-dump-restore.md: the logical export model (archive formats, dependency-sorted restore). Read before pg_upgrade, which builds on it.
- postgres-pg-upgrade.md: composes a schema-only dump with a physical relfilenode swap; it only makes sense once both dump/restore and the storage layout are familiar.
- postgres-pg-basebackup.md (with the pg_combinebackup companion): the physical backup path; cross-reference replication-ha for the server-side BASE_BACKUP command and incremental-backup WAL summaries.
- postgres-pg-rewind.md: resynchronizes a diverged data directory; read after base backup and after txn-recovery's timeline concepts.
- postgres-pg-waldump.md: WAL inspection; read after postgres-wal-records-rmgr.md so the rmgr desc callbacks it reuses are already familiar.
- postgres-pg-ctl-controldata.md: the postmaster lifecycle wrapper and the pg_control reader; a fitting capstone, since pg_control is the file every other tool ultimately trusts.
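
The dependency-sorted restore named in the pg_dump entry is, at its core, a topological sort. The sketch below shows one under stated assumptions: it uses Kahn's algorithm with a hypothetical object-type priority as the tie-break, and the object names are made up. pg_dump's real sort (pg_dump_sort.c) additionally detects and repairs dependency loops; this sketch only reports them as an error.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical type priorities: lower sorts earlier when dependencies allow.
TYPE_PRIORITY = {"schema": 0, "type": 1, "function": 2, "table": 3, "index": 4}


def dependency_order(objects: Dict[str, str], deps: Dict[str, Set[str]]) -> List[str]:
    """Kahn's algorithm: emit an object only after everything it depends on.

    objects maps name -> object type; deps maps name -> names it depends on
    (every dependency must itself be a key of objects). Ties are broken by
    (type priority, name). Raises ValueError if a dependency loop remains.
    """
    remaining = {name: set(deps.get(name, ())) for name in objects}
    dependents: Dict[str, List[str]] = {name: [] for name in objects}
    for name, needs in remaining.items():
        for dep in needs:
            dependents[dep].append(name)

    ready: List[Tuple[int, str]] = [
        (TYPE_PRIORITY[objects[n]], n) for n, needs in remaining.items() if not needs
    ]
    heapq.heapify(ready)
    order: List[str] = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for child in dependents[name]:
            remaining[child].discard(name)
            if not remaining[child]:  # last prerequisite just got emitted
                heapq.heappush(ready, (TYPE_PRIORITY[objects[child]], child))

    if len(order) != len(objects):
        stuck = sorted(set(objects) - set(order))
        raise ValueError("dependency loop among: " + ", ".join(stuck))
    return order


objs = {"public": "schema", "t": "table", "t_pkey": "index", "f": "function", "mood": "type"}
deps = {"t": {"public", "mood"}, "t_pkey": {"t"}, "f": {"public"}, "mood": {"public"}}
print(dependency_order(objs, deps))
```

The priority tie-break keeps output stable and readable (types before functions before tables) without ever overriding a real dependency.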
Detail-doc summaries
These are forward references; the module docs may not exist yet. Each row is a predictive one-line scope statement.
| Module doc | What it will cover |
|---|---|
| postgres-initdb-bootstrap-genbki.md | The cluster-genesis pipeline: how genbki.pl turns pg_*.h + pg_*.dat catalog sources into postgres.bki at build time, and how initdb replays that script through a standalone backend in bootstrap mode (bootstrap.c) to populate template1, run the system-view SQL, and stamp pg_control. |
| postgres-pg-dump-restore.md | The logical export/restore model: pg_dump walks the catalog over the FE/BE protocol into a TOC-based archive, pg_dump_sort.c orders objects by dependency, the pg_backup_archiver.c engine emits plain/custom/directory/tar formats, and pg_restore (including parallel restore) replays them; plus pg_dumpall for globals. |
| postgres-pg-upgrade.md | In-place major-version upgrade: how it dumps only the schema from the old cluster, restores it into a new cluster already created with initdb, and transfers the relation files (by copying, linking, or cloning) instead of reloading data, with the consistency checks (check.c), relfilenumber mapping, and parallel per-database orchestration that make the file transfer safe. |
| postgres-pg-basebackup.md | The physical base-backup client: how it issues the BASE_BACKUP replication command, streams the data directory and the concurrent WAL (the walmethods/receivelog paths), applies compression, and writes out the backup manifest the server generates; with the pg_combinebackup companion that merges a full backup plus incrementals into a synthetic full backup. |
| postgres-pg-rewind.md | Resynchronizing a diverged data directory to a source timeline: building the file map by reading the target's WAL from the last checkpoint before the timelines diverged (parsexlog.c, filemap.c), then copying only the changed blocks from a local or libpq source instead of taking a fresh base backup. |
| postgres-pg-waldump.md | The WAL decoder: how pg_waldump reads raw WAL segments with the frontend XLogReader and renders each record in human-readable form by linking the same per-rmgr desc callbacks the backend uses (rmgrdesc.c), plus its filtering, statistics, and follow mode. |
| postgres-psql.md | The interactive terminal client: the read/eval loop (mainloop.c), backslash meta-command dispatch, variable and \set handling, the describe.c catalog-introspection queries behind \d, query buffering, and COPY/pager integration, all as a frontend over libpq. |
| postgres-pg-ctl-controldata.md | Cluster lifecycle and control-file inspection: how pg_ctl starts/stops/reloads/promotes by spawning and signalling the postmaster and polling for readiness, and how pg_controldata reads and prints the ControlFileData struct (state, checkpoint LSN, timeline, settings) using the same control-file layout the backend writes in xlog.c. |
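
pg_controldata only trusts pg_control after checking its checksum: the backend computes a CRC-32C over every byte of ControlFileData that precedes the crc field, and pg_controldata recomputes it and warns on a mismatch. The sketch below shows that read-and-verify step on a hypothetical toy layout (four fields, little-endian, no padding) that is deliberately not the real ControlFileData; the function names are illustrative too.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Hypothetical toy layout, NOT the real ControlFileData: system identifier,
# state, checkpoint LSN, timeline, packed little-endian with no padding.
TOY_BODY = struct.Struct("<QIQI")
TOY_CRC = struct.Struct("<I")


def pack_control(sysid: int, state: int, checkpoint_lsn: int, timeline: int) -> bytes:
    body = TOY_BODY.pack(sysid, state, checkpoint_lsn, timeline)
    # The checksum covers every byte before the crc field.
    return body + TOY_CRC.pack(crc32c(body))


def read_control(buf: bytes) -> dict:
    body = buf[:TOY_BODY.size]
    (stored_crc,) = TOY_CRC.unpack_from(buf, TOY_BODY.size)
    sysid, state, lsn, tli = TOY_BODY.unpack(body)
    return {
        "system_identifier": sysid,
        "state": state,
        # LSNs print as two 32-bit hex halves, "high/low".
        "checkpoint_lsn": f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}",
        "timeline": tli,
        "crc_ok": crc32c(body) == stored_crc,
    }


info = read_control(pack_control(7000000000000000001, 1, 0x16B3748, 1))
print(info["checkpoint_lsn"], info["crc_ok"])  # prints: 0/16B3748 True
```

Because the checksum excludes only the crc field itself, a reader can verify the file without knowing what any individual field means, which is why a version-mismatched reader can still detect corruption before misinterpreting the layout.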
Adjacent sections
- txn-recovery (postgres-overview-txn-recovery.md): the closest neighbor. It owns the WAL record format and the rmgr desc callbacks that pg_waldump reuses, the ControlFileData/pg_control contract that pg_controldata prints and pg_rewind reads, and the timeline and checkpoint concepts behind pg_rewind and base backup. Utilities are the external readers of everything txn-recovery defines on disk.
- replication-ha (postgres-overview-replication-ha.md): owns the server-side BASE_BACKUP command, WAL summarization, the backup manifest, and the incremental-backup machinery that pg_basebackup and pg_combinebackup are clients of; it also owns the replication-only frontend tools (pg_receivewal, pg_recvlogical, pg_createsubscriber) that share the src/bin/pg_basebackup/ directory but not this section.
- system-catalog (postgres-overview-system-catalog.md): owns the catalog tables whose initial contents genbki.pl generates and that pg_dump and psql's \d introspect; the genesis doc covers how the .bki is generated, while system-catalog owns what the .bki means.
- client-protocol (postgres-overview-client-protocol.md): owns the FE/BE wire protocol and authentication that psql, pg_dump, and pg_basebackup speak from the frontend; utilities are protocol clients, and client-protocol owns the protocol itself.
- server-architecture (postgres-overview-server-architecture.md): owns the postmaster that pg_ctl starts and stops, and the bootstrap-mode backend that initdb drives; utilities wrap the lifecycle that server-architecture defines.
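
One on-disk contract from txn-recovery that every WAL-reading utility must reproduce is the mapping from an LSN to a segment file name under pg_wal/: eight hex digits of timeline, then the segment number split into a high part (segments per 4 GiB of WAL) and a low part. The sketch below assumes the default 16 MiB segment size (the size is fixed per cluster at initdb time); the helper names are hypothetical.

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default wal_segment_size; fixed at initdb time


def parse_lsn(text: str) -> int:
    """Parse an LSN printed as 'high/low' (two hex halves) into a 64-bit int."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: int, timeline: int, seg_size: int = WAL_SEG_SIZE) -> str:
    """File name, under pg_wal/, of the segment that contains lsn."""
    segno = lsn // seg_size
    segs_per_4gib = 0x100000000 // seg_size  # segments per 4 GiB of WAL
    return f"{timeline:08X}{segno // segs_per_4gib:08X}{segno % segs_per_4gib:08X}"


print(wal_segment_name(parse_lsn("0/16B3748"), timeline=1))  # prints: 000000010000000000000001
```

The server exposes the same mapping as the SQL function pg_walfile_name(), which uses the current timeline, so comparing the two on a test cluster is a quick sanity check of the sketch.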