PostgreSQL Utilities — Section Overview

Contents:

What this section covers
The layering: tools sit outside the server, grouped by the contract they touch
Reading order
Detail-doc summaries
Adjacent sections

What this section covers

This subcategory is the tooling tree — the standalone programs under src/bin/, plus the one build-time code generator (catalog/genbki.pl) and the one backend mode (bootstrap/bootstrap.c) that the genesis tool depends on. Everything here shares a defining property that separates it from the other twelve subcategories: none of it runs inside the postmaster’s shared-memory machine. These are separate executables. They act on a cluster from the outside — by talking the FE/BE wire protocol to a running server, by reading and writing the on-disk files of a stopped (or backup-snapshotted) cluster, or — for initdb — by creating the cluster before any server exists.

Concretely, the tools covered here are: initdb and the bootstrap/genbki catalog codegen behind it; pg_dump / pg_restore / pg_dumpall; pg_upgrade; pg_basebackup; pg_combinebackup; pg_rewind; pg_waldump; psql; and pg_ctl / pg_controldata.

The sharp boundaries — what this section does not own:

The mechanisms the tools invoke live in other subcategories. A tool is a client of an on-disk or on-wire contract; the contract itself is owned elsewhere. pg_waldump decodes WAL records, but the WAL format and the resource-manager desc callbacks belong to txn-recovery (postgres-wal-records-rmgr.md). pg_basebackup streams a base backup, but the server-side BASE_BACKUP replication command, WAL summarization, and the backup manifest belong to replication-ha. pg_controldata prints pg_control, but the ControlFileData struct and its update path belong to txn-recovery (postgres-xlog-wal.md). The rule: this section owns the executables; it hands off the format and protocol to the subcategory that defines them.
The catalog content belongs to system-catalog. genbki.pl is covered here as a code-generation pipeline (header + .dat → .bki), but what the catalogs are (pg_class, pg_proc, the relcache that reads them) is system-catalog (postgres-system-catalogs.md).
The replication-only frontend tools that happen to live in src/bin/pg_basebackup/ — pg_receivewal, pg_recvlogical, pg_createsubscriber — are replication-ha scope, not utilities. They are named here only to explain why the directory is larger than the one tool this section claims from it.
contrib/ is out of scope for the whole tree. Tools such as pg_amcheck, pgbench, pg_resetwal, pg_checksums, and pg_verifybackup do live in src/bin/, but they are not in the plan’s module catalog, so they fall outside this section’s scope and may be named only as examples.

The layering: tools sit outside the server, grouped by the contract they touch

The honest picture is not a layered stack but a rim of external executables around the cluster’s three on-disk/on-wire contracts: the catalog/.bki genesis path, the FE/BE protocol, and the WAL + data-file on-disk format. Each tool is positioned by which contract it touches.

```mermaid
flowchart TB
  subgraph BUILD["build time (no cluster yet)"]
    GENBKI["genbki.pl<br/>pg_*.h + pg_*.dat -> postgres.bki<br/>(postgres-initdb-bootstrap-genbki.md)"]
  end

  subgraph GENESIS["cluster genesis"]
    INITDB["initdb<br/>drives 'postgres --boot' on postgres.bki,<br/>writes system views, stamps pg_control<br/>(postgres-initdb-bootstrap-genbki.md)"]
  end
  GENBKI -- "postgres.bki" --> INITDB

  subgraph CLUSTER["a cluster on disk (PGDATA)"]
    direction LR
    CTRL["pg_control"]
    DATA["base/ data files"]
    WAL["pg_wal/ + summaries"]
  end
  INITDB -- "creates" --> CLUSTER

  subgraph FEBE["over the FE/BE protocol (server running)"]
    DUMP["pg_dump / pg_restore / pg_dumpall<br/>schema+data as SQL or archive<br/>(postgres-pg-dump-restore.md)"]
    PSQL["psql<br/>interactive client + meta-commands<br/>(postgres-psql.md)"]
    BASE["pg_basebackup<br/>BASE_BACKUP + WAL stream<br/>(postgres-pg-basebackup.md)"]
  end
  DUMP <--> CLUSTER
  PSQL <--> CLUSTER
  BASE <-- "streams" --> CLUSTER

  subgraph OFFLINE["direct on-disk file access (server stopped / snapshot)"]
    UPGRADE["pg_upgrade<br/>schema dump + relfilenode swap<br/>(postgres-pg-upgrade.md)"]
    COMBINE["pg_combinebackup<br/>full + incremental -> synthetic full<br/>(postgres-pg-basebackup.md companion)"]
    REWIND["pg_rewind<br/>diverged data dir -> source timeline<br/>(postgres-pg-rewind.md)"]
    WALDUMP["pg_waldump<br/>decode WAL via rmgr desc callbacks<br/>(postgres-pg-waldump.md)"]
    CTLDATA["pg_ctl / pg_controldata<br/>postmaster lifecycle + read pg_control<br/>(postgres-pg-ctl-controldata.md)"]
  end
  UPGRADE --> CLUSTER
  COMBINE --> CLUSTER
  REWIND --> CLUSTER
  WALDUMP --> WAL
  CTLDATA --> CTRL
```
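
The grouping in the diagram can also be written down as data. The sketch below is only an illustration: the Contract enum, the TOOL_CONTRACT table, and the tools_for helper are hypothetical names introduced here, and the assignments simply restate the diagram's subgraphs.

```python
from enum import Enum


class Contract(Enum):
    """How a tool reaches the cluster, one member per diagram subgraph."""
    BUILD = "build time (no cluster yet)"
    GENESIS = "cluster genesis"
    FEBE = "over the FE/BE protocol (server running)"
    ON_DISK = "direct on-disk file access (server stopped / snapshot)"


# Assignments copied from the diagram; the table itself is illustrative only.
TOOL_CONTRACT = {
    "genbki.pl": Contract.BUILD,
    "initdb": Contract.GENESIS,
    "pg_dump": Contract.FEBE,
    "pg_restore": Contract.FEBE,
    "pg_dumpall": Contract.FEBE,
    "psql": Contract.FEBE,
    "pg_basebackup": Contract.FEBE,
    "pg_upgrade": Contract.ON_DISK,
    "pg_combinebackup": Contract.ON_DISK,
    "pg_rewind": Contract.ON_DISK,
    "pg_waldump": Contract.ON_DISK,
    "pg_ctl": Contract.ON_DISK,
    "pg_controldata": Contract.ON_DISK,
}


def tools_for(contract):
    """Return the tool names in one group, sorted alphabetically."""
    return sorted(name for name, c in TOOL_CONTRACT.items() if c is contract)


if __name__ == "__main__":
    for contract in Contract:
        print(f"{contract.value}: {', '.join(tools_for(contract))}")
```

A lookup table like this is handy mostly as a checklist: every tool named in this section should land in exactly one group.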

Two structural facts to carry forward:

genbki.pl runs once at build, not at runtime. It is a Perl script, not a C program, and it emits postgres.bki (plus symbol headers like pg_*_d.h) from the catalog header files and their .dat companions. The installation ships that .bki; initdb reads it and replays it through a backend launched in bootstrap mode (postgres --boot), whose interpreter is bootstrap.c. This is the only place in the whole tree where a build-time generator and a backend mode are paired into one tool’s story, which is why they collapse into a single detail doc.
The “offline” tools assume the server is down (or the files are a consistent snapshot). pg_upgrade, pg_rewind, and pg_combinebackup rewrite or read PGDATA directly and would corrupt a live cluster; that shared precondition is what makes them a family distinct from the protocol-talking tools.
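
To make the build-time half of that pairing more concrete, here is a toy generator in Python. It is not genbki.pl, which is Perl and handles far more; emit_bki, the demo_catalog name, and OID 9999 are invented for illustration, and the create/open/insert/close command shapes are loosely modeled on the BKI file format and heavily simplified.

```python
def emit_bki(catalog_name, catalog_oid, columns, rows):
    """Render one catalog as simplified BKI-style commands.

    columns: list of (column_name, type_name) pairs, in column order
    rows:    list of dicts keyed by column name (the ".dat"-like input)
    """
    col_defs = ", ".join(f"{name} = {typ}" for name, typ in columns)
    lines = [
        f"create {catalog_name} {catalog_oid} ({col_defs})",
        f"open {catalog_name}",
    ]
    for row in rows:
        # Values are emitted in declared column order, space-separated.
        values = " ".join(str(row[name]) for name, _ in columns)
        lines.append(f"insert ( {values} )")
    lines.append(f"close {catalog_name}")
    return lines


if __name__ == "__main__":
    # Hypothetical catalog definition and initial contents.
    columns = [("oid", "oid"), ("label", "name")]
    rows = [{"oid": 1, "label": "first"}, {"oid": 2, "label": "second"}]
    print("\n".join(emit_bki("demo_catalog", 9999, columns, rows)))
```

The point of the sketch is the division of labor: the generator turns declarative catalog data into a flat command script, and a separate interpreter (bootstrap mode, in the real system) executes that script later.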

Reading order

Cross-referenced-first: read the genesis tool before anything that assumes a cluster exists, then the psql client, then the export/migration pair, then the physical tools, and finally the lifecycle and control-file tools.

postgres-initdb-bootstrap-genbki.md — start here. It explains where a cluster comes from and how the catalogs get onto disk, which every other tool presupposes. Read alongside postgres-system-catalogs.md for what the .bki is populating.
postgres-psql.md — the client you will use to observe every other tool’s effects; light, and a good orientation to the FE/BE protocol from the frontend side.
postgres-pg-dump-restore.md — the logical export model (archive formats, dependency-sorted restore). Read before pg_upgrade, which builds on it.
postgres-pg-upgrade.md — composes a schema-only dump with a physical relfilenode swap; only makes sense after both dump/restore and the storage layout are familiar.
postgres-pg-basebackup.md (with the pg_combinebackup companion) — the physical backup path; cross-reference replication-ha for the server-side BASE_BACKUP command and incremental-backup WAL summaries.
postgres-pg-rewind.md — resynchronizes a diverged data directory; read after base backup and after txn-recovery’s timeline concepts.
postgres-pg-waldump.md — WAL inspection; read after postgres-wal-records-rmgr.md so the rmgr desc callbacks it reuses are already familiar.
postgres-pg-ctl-controldata.md — postmaster lifecycle wrapper and the pg_control reader; a fitting capstone since pg_control is the file every other tool ultimately trusts.
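
Dependency-sorted restore, mentioned for postgres-pg-dump-restore.md above, is at heart a topological ordering. The following is a generic sketch using Python's standard graphlib module (Python 3.9+), not the logic in pg_dump_sort; the object names and the DEPENDS_ON table are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dump objects; each key maps to the objects it depends on.
DEPENDS_ON = {
    "schema app": [],
    "type app.status": ["schema app"],
    "table app.orders": ["schema app", "type app.status"],
    "index app.orders_idx": ["table app.orders"],
    "view app.open_orders": ["table app.orders"],
}


def restore_order(depends_on):
    """Return the objects so that every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for position, obj in enumerate(restore_order(DEPENDS_ON), start=1):
        print(position, obj)
```

Objects with no ordering constraint between them (the index and the view here) may come out in either order, so only the relative positions of dependent pairs should be relied on.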

Detail-doc summaries

These are forward references; the module docs may not exist yet. Each row is a predictive one-line scope statement.

Module doc	What it will cover
`postgres-initdb-bootstrap-genbki.md`	The cluster-genesis pipeline: how `genbki.pl` turns `pg_*.h` + `pg_*.dat` catalog sources into `postgres.bki` at build time, and how `initdb` replays that script through a standalone backend in bootstrap mode (`bootstrap.c`) to populate `template1`, run the system-view SQL, and stamp `pg_control`.
`postgres-pg-dump-restore.md`	The logical export/restore model: `pg_dump` walks the catalog over the FE/BE protocol into a TOC-based archive, `pg_dump_sort` orders objects by dependency, the `pg_backup_archiver` engine emits plain/custom/directory/tar formats, and `pg_restore` (and parallel restore) replays them; plus `pg_dumpall` for globals.
`postgres-pg-upgrade.md`	In-place major-version upgrade: how it dumps schema only from the old cluster, restores it into a new cluster that was already created with `initdb`, and transfers the relfilenode files (by copy, hard link, or clone) instead of re-loading data, with the consistency checks (`check.c`), relfilenumber mapping, and parallel per-database orchestration that make the file swap safe.
`postgres-pg-basebackup.md`	The physical base-backup client: how it issues the `BASE_BACKUP` replication command, streams the data directory and the concurrent WAL (the `walmethods`/`receivelog` paths), applies compression, and writes the backup manifest; with the `pg_combinebackup` companion that merges a full backup plus incrementals into a synthetic full.
`postgres-pg-rewind.md`	Resynchronizing a diverged data directory to a source timeline: building the file map by scanning the target's WAL from the last common checkpoint (`parsexlog.c`, `filemap.c`), then copying only the changed blocks from a local or libpq source instead of taking a fresh base backup.
`postgres-pg-waldump.md`	The WAL decoder: how `pg_waldump` reads raw WAL segments with the frontend `XLogReader` and renders each record human-readably by linking the same per-rmgr `desc` callbacks the backend uses (`rmgrdesc.c`), plus its filtering, stats, and follow modes.
`postgres-psql.md`	The interactive terminal client: the read/eval loop (`mainloop.c`), backslash meta-command dispatch, variable and `\set` handling, the `describe.c` catalog-introspection queries behind `\d`, query buffering, and `COPY`/pager integration — all as a frontend over libpq.
`postgres-pg-ctl-controldata.md`	Cluster lifecycle and control-file inspection: how `pg_ctl` starts/stops/reloads/promotes by spawning and signalling the postmaster and polling for readiness, and how `pg_controldata` reads and prints the `ControlFileData` struct (state, checkpoint LSN, timeline, settings) using the same control-file layout the backend writes in `xlog.c`.
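
As a small illustration of what the control-file row looks like from the outside, the sketch below parses a few lines of pg_controldata-style output and converts the checkpoint LSN to an integer. The sample text is illustrative rather than captured from a real cluster, and parse_controldata and lsn_to_int are hypothetical helper names; the "label: value" layout and the high/low hexadecimal LSN notation are assumed to match what the tool prints.

```python
# Illustrative sample only; real output has many more fields.
SAMPLE_OUTPUT = """\
Database cluster state:               in production
Latest checkpoint location:           0/15D6B48
Latest checkpoint's TimeLineID:       1
"""


def parse_controldata(text):
    """Split 'label: value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def lsn_to_int(lsn):
    """Convert an LSN written as 'HIGH/LOW' (both hex) to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


if __name__ == "__main__":
    fields = parse_controldata(SAMPLE_OUTPUT)
    print(fields["Database cluster state"])
    print(hex(lsn_to_int(fields["Latest checkpoint location"])))
```

Parsing the printed text is a convenience for scripts; the authoritative layout remains the ControlFileData struct owned by txn-recovery.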

Adjacent sections

txn-recovery (postgres-overview-txn-recovery.md) — the closest neighbor. It owns the WAL record format and rmgr desc callbacks that pg_waldump reuses, the ControlFileData/pg_control contract that pg_controldata prints and pg_rewind reads, and the timeline and checkpoint concepts behind pg_rewind and base backup. Utilities are the external readers of everything txn-recovery defines on disk.
replication-ha (postgres-overview-replication-ha.md) — owns the server-side BASE_BACKUP command, WAL summarization, the backup manifest, and the incremental-backup machinery that pg_basebackup / pg_combinebackup are the clients of; also owns the replication-only frontend tools (pg_receivewal, pg_recvlogical, pg_createsubscriber) that share the src/bin/pg_basebackup/ directory but not this section.
system-catalog (postgres-overview-system-catalog.md) owns the catalog tables that the .bki generated by genbki.pl populates and that pg_dump and psql’s \d introspect. The genesis doc covers how the .bki is generated; system-catalog owns what the .bki means.
client-protocol (postgres-overview-client-protocol.md) — owns the FE/BE wire protocol and authentication that psql, pg_dump, and pg_basebackup speak from the frontend; utilities are protocol clients, client-protocol owns the protocol itself.
server-architecture (postgres-overview-server-architecture.md) — owns the postmaster that pg_ctl starts and stops, and the bootstrap-mode backend that initdb drives; utilities wrap the lifecycle that server-architecture defines.