
PostgreSQL Utilities — Section Overview


This subcategory is the tooling tree — the standalone programs under src/bin/, plus the one build-time code generator (catalog/genbki.pl) and the one backend mode (bootstrap/bootstrap.c) that the genesis tool depends on. Everything here shares a defining property that separates it from the other twelve subcategories: none of it runs inside the postmaster’s shared-memory machinery. These are separate executables. They act on a cluster from the outside — by talking the FE/BE wire protocol to a running server, by reading and writing the on-disk files of a stopped (or backup-snapshotted) cluster, or — for initdb — by creating the cluster before any server exists.

Concretely, the tools covered here are: initdb and the bootstrap/genbki catalog codegen behind it; pg_dump / pg_restore / pg_dumpall; pg_upgrade; pg_basebackup; pg_combinebackup; pg_rewind; pg_waldump; psql; and pg_ctl / pg_controldata.

The sharp boundaries — what this section does not own:

  • The mechanisms the tools invoke live in other subcategories. A tool is a client of an on-disk or on-wire contract; the contract itself is owned elsewhere. pg_waldump decodes WAL records, but the WAL format and the resource-manager desc callbacks belong to txn-recovery (postgres-wal-records-rmgr.md). pg_basebackup streams a base backup, but the server-side BASE_BACKUP replication command, WAL summarization, and the backup manifest belong to replication-ha. pg_controldata prints pg_control, but the ControlFileData struct and its update path belong to txn-recovery (postgres-xlog-wal.md). The rule: this section owns the executables; it hands off the format and protocol to the subcategory that defines them.
  • The catalog content belongs to system-catalog. genbki.pl is covered here as a code-generation pipeline (pg_*.h header + pg_*.dat -> postgres.bki), but what the catalogs are (pg_class, pg_proc, the relcache that reads them) belongs to system-catalog (postgres-system-catalogs.md).
  • The replication-only frontend tools that happen to live in src/bin/pg_basebackup/ (pg_receivewal, pg_recvlogical, pg_createsubscriber) are replication-ha scope, not utilities. They are named here only to explain why that directory holds more than the one tool this section claims from it.
  • contrib/ is out of scope for the whole tree. Tools that do live in src/bin/ but are not in the plan’s module catalog, such as pg_amcheck, pgbench, pg_resetwal, pg_checksums, and pg_verifybackup, are likewise outside this section’s scope and are named only as examples.
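
The first boundary above (the tool owns the loop, another subcategory owns the format) can be pictured as a dispatch table keyed by resource manager: pg_waldump reads, filters and prints, while the text for each record comes from a per-rmgr desc callback. The Python sketch below models only that division of labor. `WalRecord`, `RMGR_DESC`, `heap_desc`, `xact_desc` and `dump` are hypothetical stand-ins (the real callbacks are C functions compiled from the backend's rmgr desc sources); only the rmgr names Heap and Transaction are borrowed from PostgreSQL, and the output line is loosely patterned on pg_waldump's, not identical to it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class WalRecord:
    # Hypothetical, heavily simplified record; real WAL records carry an
    # XLogRecord header, block references, and rmgr-specific main data.
    lsn: int
    rmgr: str
    info: str
    payload: dict


def heap_desc(rec: WalRecord) -> str:
    # Stand-in for a heap desc callback owned by the WAL subsystem.
    return f"{rec.info.upper()} off: {rec.payload['offnum']}"


def xact_desc(rec: WalRecord) -> str:
    # Stand-in for a transaction desc callback.
    return f"{rec.info.upper()} xid: {rec.payload['xid']}"


# The contract side: one formatter per resource manager.
RMGR_DESC: Dict[str, Callable[[WalRecord], str]] = {
    "Heap": heap_desc,
    "Transaction": xact_desc,
}


def format_lsn(lsn: int) -> str:
    # High and low 32 bits in hex, separated by a slash.
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def dump(records: List[WalRecord], rmgr_filter: Optional[str] = None) -> List[str]:
    # The tool side: iterate, filter, and delegate formatting to the table.
    lines = []
    for rec in records:
        if rmgr_filter is not None and rec.rmgr != rmgr_filter:
            continue
        desc = RMGR_DESC.get(rec.rmgr)
        text = desc(rec) if desc is not None else "(no desc callback)"
        lines.append(f"rmgr: {rec.rmgr} lsn: {format_lsn(rec.lsn)} desc: {text}")
    return lines
```

In this model, supporting another resource manager means adding one table entry without touching `dump`, which is the sense in which the tool is only a client of a format defined elsewhere.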

The layering: tools sit outside the server, grouped by the contract they touch


The honest picture is not a layered stack but a rim of external executables around the cluster’s three on-disk/on-wire contracts: the catalog/.bki genesis path, the FE/BE protocol, and the WAL + data-file on-disk format. Each tool is positioned by which contract it touches.

```mermaid
flowchart TB
  subgraph BUILD["build time (no cluster yet)"]
    GENBKI["genbki.pl<br/>pg_*.h + pg_*.dat -> postgres.bki<br/>(postgres-initdb-bootstrap-genbki.md)"]
  end

  subgraph GENESIS["cluster genesis"]
    INITDB["initdb<br/>drives 'postgres --boot' on postgres.bki,<br/>writes system views, stamps pg_control<br/>(postgres-initdb-bootstrap-genbki.md)"]
  end
  GENBKI -- "postgres.bki" --> INITDB

  subgraph CLUSTER["a cluster on disk (PGDATA)"]
    direction LR
    CTRL["pg_control"]
    DATA["base/ data files"]
    WAL["pg_wal/ + summaries"]
  end
  INITDB -- "creates" --> CLUSTER

  subgraph FEBE["over the FE/BE protocol (server running)"]
    DUMP["pg_dump / pg_restore / pg_dumpall<br/>schema+data as SQL or archive<br/>(postgres-pg-dump-restore.md)"]
    PSQL["psql<br/>interactive client + meta-commands<br/>(postgres-psql.md)"]
    BASE["pg_basebackup<br/>BASE_BACKUP + WAL stream<br/>(postgres-pg-basebackup.md)"]
  end
  DUMP <--> CLUSTER
  PSQL <--> CLUSTER
  BASE <-- "streams" --> CLUSTER

  subgraph OFFLINE["direct on-disk file access (server stopped / snapshot)"]
    UPGRADE["pg_upgrade<br/>schema dump + relfilenode swap<br/>(postgres-pg-upgrade.md)"]
    COMBINE["pg_combinebackup<br/>full + incremental -> synthetic full<br/>(postgres-pg-basebackup.md companion)"]
    REWIND["pg_rewind<br/>diverged data dir -> source timeline<br/>(postgres-pg-rewind.md)"]
    WALDUMP["pg_waldump<br/>decode WAL via rmgr desc callbacks<br/>(postgres-pg-waldump.md)"]
    CTLDATA["pg_ctl / pg_controldata<br/>postmaster lifecycle + read pg_control<br/>(postgres-pg-ctl-controldata.md)"]
  end
  UPGRADE --> CLUSTER
  COMBINE --> CLUSTER
  REWIND --> CLUSTER
  WALDUMP --> WAL
  CTLDATA --> CTRL
```
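
The COMBINE node's "full + incremental -> synthetic full" can be illustrated on a single relation file. This is a toy model under stated assumptions, not pg_combinebackup's algorithm or on-disk format: each incremental is a hypothetical dict holding the relation's length in blocks at that backup plus the blocks changed since the previous backup in the chain, and the chain is applied oldest to newest. `combine` and `Block` are illustrative names.

```python
from typing import List, Optional

# Hypothetical representation: one relation file as a list of block images.
Block = bytes


def combine(full: List[Block], incrementals: List[dict]) -> List[Block]:
    """Synthesize a full copy of one relation file from a backup chain.

    Each incremental is a hypothetical dict
    {"nblocks": int, "changed": {blkno: bytes}}, listed oldest first.
    """
    result: List[Optional[Block]] = list(full)
    for inc in incrementals:
        nblocks = inc["nblocks"]
        # The relation may have been truncated or extended since the prior backup.
        if nblocks < len(result):
            del result[nblocks:]
        else:
            result.extend([None] * (nblocks - len(result)))
        # Changed blocks replace whatever the older backups contributed.
        for blkno, data in inc["changed"].items():
            result[blkno] = data
    # A block that no backup in the chain supplied means the chain is broken.
    if any(block is None for block in result):
        raise ValueError("incremental chain does not supply every block")
    return [block for block in result if block is not None]
```

The missing-block check is the interesting part of the design: an extension recorded in an incremental is only valid if that same incremental also carries the new blocks, so a gap signals an inconsistent chain rather than something to paper over.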

Two structural facts to carry forward:

  • genbki.pl runs once at build, not at runtime. It is a Perl script, not a C program, and it emits postgres.bki (plus symbol headers like pg_*_d.h) from the catalog header files and their .dat companions. initdb reads the installed postgres.bki and replays it through a backend launched in bootstrap mode (postgres --boot), whose interpreter is bootstrap.c. This is the only place in the whole tree where a build-time generator and a backend mode are paired into one tool’s story, which is why they collapse into a single detail doc.
  • The “offline” tools assume the server is down (or the files are a consistent snapshot). pg_upgrade and pg_rewind rewrite a data directory in place and refuse to run against a live server, because writing under a running postmaster would corrupt the cluster; pg_combinebackup never touches a live data directory at all, reading only backup directories and writing a new output directory. That shared precondition, working on files no running server is modifying, is what makes them a family distinct from the protocol-talking tools.
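
The genbki.pl-to-bootstrap pipeline in the first bullet can be sketched end to end in a few lines. This is a toy under stated assumptions: the command words `create`, `insert` and `close` echo the real .bki vocabulary, but the grammar is simplified (no table OIDs, no bootstrap or shared-relation flags, no index declarations), the quoting convention is the sketch's own, and `gen_bki` and `boot_replay` are hypothetical names. The real generator is Perl and the real interpreter is C running inside a `postgres --boot` backend.

```python
import shlex
from typing import Dict, List, Optional


def gen_bki(catalogs: Dict[str, dict]) -> str:
    """Build-time half: emit simplified .bki-style text (hypothetical grammar).

    Each value is a dict with "columns": [(name, type), ...] (playing the
    role of a pg_*.h header) and "rows": [dict, ...] (a pg_*.dat file).
    """
    out: List[str] = []
    for table, spec in catalogs.items():
        cols = " , ".join(f"{name} = {typ}" for name, typ in spec["columns"])
        out.append(f"create {table} ( {cols} )")
        for row in spec["rows"]:
            # Values are emitted positionally, in column order.
            vals = " ".join(shlex.quote(str(row[name])) for name, _ in spec["columns"])
            out.append(f"insert ( {vals} )")
        out.append(f"close {table}")
    return "\n".join(out) + "\n"


def boot_replay(bki: str) -> Dict[str, List[Dict[str, str]]]:
    """Bootstrap half: replay the script into in-memory tables."""
    tables: Dict[str, List[Dict[str, str]]] = {}
    columns: List[str] = []
    current: Optional[str] = None
    for line in bki.splitlines():
        tokens = shlex.split(line)
        if not tokens:
            continue
        cmd = tokens[0]
        if cmd == "create":
            current = tokens[1]
            # tokens[3:-1] is "name = type , name = type ..."; names are every 4th token.
            columns = tokens[3:-1][0::4]
            tables[current] = []
        elif cmd == "insert":
            if current is None:
                raise ValueError("insert outside an open table")
            values = tokens[2:-1]  # strip the surrounding parentheses
            tables[current].append(dict(zip(columns, values)))
        elif cmd == "close":
            current = None
        else:
            raise ValueError(f"unknown command: {cmd}")
    return tables
```

In this model the replay step needs only the generated text, not the catalog sources, which matches initdb reading an installed postgres.bki rather than the headers and .dat files.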

Reading order

The order puts the docs that others cross-reference first: read the genesis tool before anything that assumes a cluster exists, then psql as the client for observing the rest, then the export/migration pair, then the physical tools.

  1. postgres-initdb-bootstrap-genbki.md — start here. It explains where a cluster comes from and how the catalogs get onto disk, which every other tool presupposes. Read alongside postgres-system-catalogs.md for what the .bki is populating.
  2. postgres-psql.md — the client you will use to observe every other tool’s effects; a light read, and a good orientation to the FE/BE protocol from the frontend side.
  3. postgres-pg-dump-restore.md — the logical export model (archive formats, dependency-sorted restore). Read before pg_upgrade, which builds on it.
  4. postgres-pg-upgrade.md — composes a schema-only dump with a physical relfilenode swap; only makes sense after both dump/restore and the storage layout are familiar.
  5. postgres-pg-basebackup.md (with the pg_combinebackup companion) — the physical backup path; cross-reference replication-ha for the server-side BASE_BACKUP command and incremental-backup WAL summaries.
  6. postgres-pg-rewind.md — resynchronizes a diverged data directory; read after base backup and after txn-recovery’s timeline concepts.
  7. postgres-pg-waldump.md — WAL inspection; read after postgres-wal-records-rmgr.md so the rmgr desc callbacks it reuses are already familiar.
  8. postgres-pg-ctl-controldata.md — postmaster lifecycle wrapper and the pg_control reader; a fitting capstone since pg_control is the file every other tool ultimately trusts.
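
Item 3's "dependency-sorted restore" is at heart a topological sort. The sketch below uses Kahn's algorithm with a priority heap so that, among objects whose dependencies are already satisfied, lower-priority object types come first and names break ties. That tie-breaking idea is in the spirit of pg_dump_sort, but `TYPE_PRIORITY` is invented for illustration, `dump_order` is a hypothetical name, and unlike pg_dump this sketch only reports a dependency loop instead of repairing it.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Invented priorities: among ready objects, lower numbers are emitted first.
TYPE_PRIORITY = {
    "SCHEMA": 1,
    "TYPE": 2,
    "FUNCTION": 3,
    "TABLE": 4,
    "INDEX": 5,
    "FK CONSTRAINT": 6,
}


def dump_order(objects: Dict[str, str], deps: Dict[str, Set[str]]) -> List[str]:
    """objects maps name -> object type; deps maps name -> names it depends on."""
    indegree = {name: 0 for name in objects}
    dependents: Dict[str, List[str]] = {name: [] for name in objects}
    for name, needs in deps.items():
        for needed in needs:
            indegree[name] += 1
            dependents[needed].append(name)

    # Objects with no unmet dependencies, ordered by (type priority, name).
    ready: List[Tuple[int, str]] = [
        (TYPE_PRIORITY[objects[n]], n) for n, k in indegree.items() if k == 0
    ]
    heapq.heapify(ready)

    order: List[str] = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for dependent in dependents[name]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                heapq.heappush(ready, (TYPE_PRIORITY[objects[dependent]], dependent))

    if len(order) != len(objects):
        # pg_dump repairs such loops; this sketch only reports them.
        stuck = sorted(set(objects) - set(order))
        raise ValueError("dependency loop among: " + ", ".join(stuck))
    return order
```

For a schema, two tables, an index and a foreign key spanning both tables, the schema comes out first, then the tables, and the foreign key last, because it cannot become ready until both tables it references have been emitted.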

Module docs

These are forward references; the module docs may not exist yet. Each row is a predictive one-line scope statement.

| Module doc | What it will cover |
|---|---|
| postgres-initdb-bootstrap-genbki.md | The cluster-genesis pipeline: how genbki.pl turns pg_*.h + pg_*.dat catalog sources into postgres.bki at build time, and how initdb replays that script through a standalone backend in bootstrap mode (bootstrap.c) to populate template1, run the system-view SQL, and stamp pg_control. |
| postgres-pg-dump-restore.md | The logical export/restore model: pg_dump walks the catalog over the FE/BE protocol into a TOC-based archive, pg_dump_sort orders objects by dependency, the pg_backup_archiver engine emits plain/custom/directory/tar formats, and pg_restore (and parallel restore) replays them; plus pg_dumpall for globals. |
| postgres-pg-upgrade.md | In-place major-version upgrade: how it dumps only the schema from the old cluster, restores it into a new cluster already created with initdb, and transfers the relation files (copying, hard-linking, or cloning them) instead of reloading data, with the consistency checks (check.c), relfilenumber mapping, and parallel per-database orchestration that make the file transfer safe. |
| postgres-pg-basebackup.md | The physical base-backup client: how it issues the BASE_BACKUP replication command, streams the data directory and the concurrent WAL (the walmethods/receivelog paths), applies compression, and stores the backup manifest the server generates; with the pg_combinebackup companion that merges a full backup plus incrementals into a synthetic full. |
| postgres-pg-rewind.md | Resynchronizing a diverged data directory to a source timeline: building the file map by scanning the target's WAL from the last common checkpoint before the divergence (parsexlog.c, filemap.c), then copying only the changed blocks from a local or libpq source instead of taking a fresh base backup. |
| postgres-pg-waldump.md | The WAL decoder: how pg_waldump reads raw WAL segments with the frontend XLogReader and renders each record in human-readable form by linking the same per-rmgr desc callbacks the backend uses (rmgrdesc.c), plus its filtering, stats, and follow modes. |
| postgres-psql.md | The interactive terminal client: the read/eval loop (mainloop.c), backslash meta-command dispatch, variable and \set handling, the describe.c catalog-introspection queries behind \d, query buffering, and COPY/pager integration, all as a frontend over libpq. |
| postgres-pg-ctl-controldata.md | Cluster lifecycle and control-file inspection: how pg_ctl starts/stops/reloads/promotes by spawning and signalling the postmaster and polling for readiness, and how pg_controldata reads and prints the ControlFileData struct (state, checkpoint LSN, timeline, settings) using the same control-file layout the backend writes in xlog.c. |
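
The pg_rewind row's file map can be pictured as a per-file decision over three inputs: the source's files, the target's files, and the set of blocks the target changed after the divergence point (which pg_rewind learns by scanning the target's WAL). The sketch below is a simplified model with hypothetical names (`FileAction`, `build_filemap`, `is_relation_file`): sizes are counted in blocks, any path under base/ is treated as a relation file, and the special cases the real filemap.c handles are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class FileAction:
    path: str
    action: str                        # "copy", "remove", "copy_blocks", or "keep"
    blocks: List[int] = field(default_factory=list)
    truncate_to: Optional[int] = None  # new length in blocks, if the target must shrink


def is_relation_file(path: str) -> bool:
    # Simplification: treat anything under base/ as a relation data file.
    return path.startswith("base/")


def build_filemap(source: Dict[str, int], target: Dict[str, int],
                  wal_touched: Dict[str, Set[int]]) -> List[FileAction]:
    """source/target map path -> size in blocks; wal_touched maps path -> blocks
    the target modified after the divergence point."""
    actions: List[FileAction] = []
    for path in sorted(set(source) | set(target)):
        if path not in target:
            actions.append(FileAction(path, "copy"))      # exists only on the source
        elif path not in source:
            actions.append(FileAction(path, "remove"))    # gone on the source
        elif not is_relation_file(path):
            actions.append(FileAction(path, "copy"))      # non-relation file: copy whole
        else:
            src_len, tgt_len = source[path], target[path]
            # Blocks the target changed must be refetched, unless the source no
            # longer has them (they disappear with the truncation below).
            blocks = {b for b in wal_touched.get(path, set()) if b < src_len}
            # Blocks the source has past the target's end must be fetched too.
            blocks |= set(range(tgt_len, src_len))
            act = FileAction(path, "copy_blocks" if blocks else "keep", sorted(blocks))
            if tgt_len > src_len:
                act.truncate_to = src_len
            actions.append(act)
    return actions
```

The payoff the row describes shows up in the "keep" and "copy_blocks" cases: a large relation file that the target barely touched after the divergence costs only a few block fetches instead of a full copy.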

Neighboring subcategories

  • txn-recovery (postgres-overview-txn-recovery.md) — the closest neighbor. It owns the WAL record format and rmgr desc callbacks that pg_waldump reuses, the ControlFileData/pg_control contract that pg_controldata prints and pg_rewind reads, and the timeline and checkpoint concepts behind pg_rewind and base backup. Utilities are the external readers of everything txn-recovery defines on disk.
  • replication-ha (postgres-overview-replication-ha.md) — owns the server-side BASE_BACKUP command, WAL summarization, the backup manifest, and the incremental-backup machinery that pg_basebackup / pg_combinebackup are the clients of; also owns the replication-only frontend tools (pg_receivewal, pg_recvlogical, pg_createsubscriber) that share the src/bin/pg_basebackup/ directory but not this section.
  • system-catalog (postgres-overview-system-catalog.md) — owns the catalog tables whose initial contents genbki.pl generates and that pg_dump and psql’s \d introspect; the genesis doc covers how the .bki is generated, while system-catalog owns what the .bki means.
  • client-protocol (postgres-overview-client-protocol.md) — owns the FE/BE wire protocol and authentication that psql, pg_dump, and pg_basebackup speak from the frontend; utilities are protocol clients, client-protocol owns the protocol itself.
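  • server-architecture (postgres-overview-server-architecture.md) — owns the postmaster that pg_ctl starts and stops, and the backend startup path through which initdb launches its bootstrap-mode backend; utilities wrap the lifecycle that server-architecture defines, while the bootstrap.c interpreter that replays postgres.bki is covered here alongside initdb.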
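
The ControlFileData contract in the first bullet of this list is concrete enough to decode from Python. The sketch below reads only the leading fields and rests on assumptions worth stating plainly: a little-endian 64-bit build, the field order `system_identifier`, `pg_control_version`, `catalog_version_no`, `state`, `time`, `checkPoint` as we recall it from pg_control.h, and four bytes of alignment padding before the 64-bit `time`. The state labels follow pg_controldata's wording as we remember it; verify both against the headers of your version. The real tool also validates the CRC stored in the file, which this sketch skips. `target_is_rewindable` is a hypothetical helper illustrating the clean-shutdown check pg_rewind applies to its target.

```python
import os
import struct
from typing import Dict, NamedTuple

# Assumed layout of the leading ControlFileData fields on a little-endian
# 64-bit build: uint64, uint32, uint32, enum (int32), 4 padding bytes,
# int64 timestamp, uint64 checkpoint LSN. Verify against pg_control.h.
CONTROL_HEAD = struct.Struct("<QIIi4xqQ")

# DBState values with pg_controldata-style labels (verify for your version).
DB_STATE_NAMES = {
    0: "starting up",
    1: "shut down",
    2: "shut down in recovery",
    3: "shutting down",
    4: "in crash recovery",
    5: "in archive recovery",
    6: "in production",
}


class ControlHead(NamedTuple):
    system_identifier: int
    pg_control_version: int
    catalog_version_no: int
    state: int
    time: int
    checkpoint_lsn: int


def read_control_head(buf: bytes) -> ControlHead:
    return ControlHead(*CONTROL_HEAD.unpack_from(buf, 0))


def read_pg_control(pgdata: str) -> ControlHead:
    # pg_control lives under global/ in the data directory.
    with open(os.path.join(pgdata, "global", "pg_control"), "rb") as f:
        return read_control_head(f.read())


def format_lsn(lsn: int) -> str:
    # High and low 32 bits in hex, separated by a slash.
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def describe(head: ControlHead) -> Dict[str, str]:
    return {
        "Database system identifier": str(head.system_identifier),
        "Database cluster state": DB_STATE_NAMES.get(head.state, "unrecognized status code"),
        "Latest checkpoint location": format_lsn(head.checkpoint_lsn),
    }


def target_is_rewindable(head: ControlHead) -> bool:
    # Hypothetical helper: accept only a cleanly shut-down cluster.
    return head.state in (1, 2)
```

Against a real cluster, `read_pg_control("/path/to/pgdata")` would feed the same functions, provided the layout assumptions hold for that build.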