PostgreSQL pg_ctl / pg_controldata — Server Lifecycle Control and Cluster State Inspection

Contents:

Theoretical Background
Common DBMS Design
PostgreSQL’s Approach
Source Walkthrough
Source Verification (as of 2026-06-05)
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Sources

Theoretical Background

Every production database system faces a bootstrapping problem: the tool that starts and stops the server cannot use the server itself. Before the postmaster is running there is no SQL connection to open, no catalog to query, no session to receive commands. The control interface must be out-of-band — operating purely on the filesystem and OS signals.

Two complementary needs drive the design of such an interface.

Lifecycle control. An operator must be able to start, stop, restart, and reconfigure the server without writing a shell script that duplicates knowledge embedded in the server itself (where the binary lives, what options it accepts, what PID it chose). The control tool must discover that state autonomously and issue the right OS signal at the right time. PostgreSQL, like several comparable servers, distinguishes three stop modes:

Smart: refuse new connections, wait for all existing sessions to terminate naturally, then shut down.
Fast: disconnect active sessions immediately, flush WAL, write a shutdown checkpoint.
Immediate: kill immediately without checkpoint — analogous to pulling the power; leaves crash recovery as the next startup’s job.

State inspection without a live connection. A crash, a failed upgrade, or a misconfiguration can leave the server in a state where it cannot accept connections. Administrators need a way to read the server’s last known state (the DBState, the checkpoint LSN, the timeline) from a file on disk. This requires a stable, checksummed on-disk structure that survives the crash and can be decoded into human-readable form by a standalone tool.

The control file, present in many server-class databases under names like pg_control, CURRENT, or control01.ctl, is the standard answer to the second need. It is the first thing a crash-recovery path reads (for the general recovery background see Petrov, Database Internals, ch. 5 “Transaction Processing and Recovery”): it gives the REDO start point, the timeline, and the confirmation that the previous shutdown was clean. An atomic write (always within one disk sector, protected by a CRC) is the correctness invariant: a torn write must be detectable so recovery does not proceed from a corrupt baseline.

Common DBMS Design

The PID file as the server’s business card

A running server must leave a file — commonly named postmaster.pid or mysql.pid — that records its PID, start time, and status. This file plays three roles simultaneously:

Mutual exclusion. A second server startup attempt can read the file, check that the PID is still alive, and refuse to start, preventing two servers from sharing the same data directory.
Signal routing. A control tool sends the stop or reload signal to the PID it reads from the file, not to a hardcoded value. The PID is the only stable cross-process handle on Unix.
Readiness advertisement. A structured pidfile — with a status field in a known line position — lets the control tool poll for startup completion without implementing a separate health-check protocol.
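The signal-routing and readiness roles both reduce to reading a known line of the pidfile. The following is a minimal sketch of that lookup, assuming the layout this article describes (PID on line 1, status on line 8). The names `pidfile_get_line`, `pidfile_get_pid`, and the `PIDFILE_LINE_*` constants are hypothetical and are not taken from the PostgreSQL source.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Line positions as described in this article (assumed, 1-based). */
#define PIDFILE_LINE_PID     1
#define PIDFILE_LINE_STATUS  8

/*
 * Copy the 1-based line `line_no` of `contents` into `out`, without the
 * trailing newline. Returns 0 on success, -1 if the line does not exist
 * or does not fit into `out`.
 */
static int
pidfile_get_line(const char *contents, int line_no, char *out, size_t outsz)
{
    const char *p = contents;

    for (int i = 1; i < line_no; i++)
    {
        p = strchr(p, '\n');
        if (p == NULL)
            return -1;          /* fewer lines than requested */
        p++;
    }
    if (*p == '\0')
        return -1;              /* requested line lies past the end */

    size_t len = strcspn(p, "\n");
    if (len + 1 > outsz)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}

/* Parse the PID on line 1; returns 0 if it is missing or malformed. */
static long
pidfile_get_pid(const char *contents)
{
    char buf[32];
    char *end;

    if (pidfile_get_line(contents, PIDFILE_LINE_PID, buf, sizeof buf) != 0)
        return 0;
    long pid = strtol(buf, &end, 10);
    return (end != buf && *end == '\0') ? pid : 0;
}
```

A control tool would pass the PID to `kill()` and compare the status line against the readiness string it expects.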

The control file: an atomic-write state record

Every DBMS with crash recovery needs a small record that captures the recovery entry point; PostgreSQL keeps its active payload within one 512-byte sector. The canonical fields are:

Field group	Purpose
Version identifiers	Detect cross-version or cross-architecture incompatibility before any catalog read
System identifier	Uniquely identify this cluster instance; reject WAL from a different cluster
DBState / lifecycle flags	Distinguish clean shutdown from in-recovery from crash
Checkpoint LSN + copy	Give WAL replay its REDO start point
Timeline IDs	Detect and track timeline switches (standby promotion, PITR)
WAL-level parameters	Confirm that archiving/replication settings match the WAL that was written
Compile-time constants	Detect block-size or alignment mismatches before first page read

The atomic-write constraint (the active payload must fit in one disk sector) is explicit in PostgreSQL as PG_CONTROL_MAX_SAFE_SIZE = 512. Other engines guard their recovery metadata by different means, such as Oracle's multiplexed control files or the checksummed header blocks InnoDB uses for checkpoint information. Whatever the mechanism, a partial write must never leave an inconsistent recovery baseline undetected.
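The two invariants discussed here (the record fits in one sector, and a checksum over everything before the checksum field detects a torn write) can be sketched in a few lines. This is an illustrative miniature, not PostgreSQL's layout: `MiniControl`, `mini_control_seal`, and `mini_control_is_valid` are hypothetical names, and the bitwise CRC-32C below is a simple stand-in for an optimized implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/*
 * Hypothetical miniature control record. Fields are ordered so that no
 * padding precedes `crc`, and the checksum stays last.
 */
typedef struct MiniControl
{
    uint64_t system_identifier;
    uint64_t checkpoint_lsn;
    uint32_t state;
    uint32_t timeline_id;
    uint32_t crc;               /* covers every byte before this field */
} MiniControl;

/* Refuse to compile if the record no longer fits in one sector. */
_Static_assert(sizeof(MiniControl) <= SECTOR_SIZE, "record exceeds one sector");

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
crc32c(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Stamp the record with its checksum just before writing it out. */
static void
mini_control_seal(MiniControl *c)
{
    c->crc = crc32c(c, offsetof(MiniControl, crc));
}

/* Returns nonzero if the stored checksum matches the record contents. */
static int
mini_control_is_valid(const MiniControl *c)
{
    return c->crc == crc32c(c, offsetof(MiniControl, crc));
}
```

A reader that finds `mini_control_is_valid()` returning zero knows the record was torn or corrupted and must not start recovery from it.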

Signal semantics for stop modes

The three stop modes map to three Unix signals in PostgreSQL (and similar mappings appear in other servers):

Mode	Signal	Postmaster behavior
Smart	SIGTERM	No new connections; wait for all sessions to disconnect
Fast	SIGINT	Terminate backends; checkpoint; exit
Immediate	SIGQUIT	Kill all children; exit without checkpoint

A control tool that understands this mapping does not need the server to expose a management API; the OS signal mechanism is the API.

Promote: from standby to primary

Standby promotion is a state transition that cannot safely be signaled purely by sending a Unix signal to the postmaster. The postmaster must know the signal is a promote request, not a generic wakeup. The design pattern is a sentinel file in $PGDATA: the control tool creates the file, sends SIGUSR1, and the postmaster checks for the file on receipt. The file-based approach is atomic on POSIX filesystems (the open(O_CREAT) syscall is the serialization point) and survives a brief race between file creation and signal delivery.
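The sentinel-file handshake can be sketched from both sides. In this sketch only the file name `promote` comes from the text above; the function names `request_promote` and `consume_promote_request` are hypothetical, and the server side is assumed to call the consumer after it has been woken by the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Control-tool side: create the empty sentinel. Returns true on success. */
static bool
request_promote(const char *datadir)
{
    char path[1024];
    FILE *f;

    snprintf(path, sizeof path, "%s/promote", datadir);
    if ((f = fopen(path, "w")) == NULL)
        return false;
    return fclose(f) == 0;
}

/*
 * Server side, run after the wakeup signal arrives. The unlink both tests
 * for the sentinel and consumes it, so each request is honored at most once.
 * A false result means the wakeup had some other cause.
 */
static bool
consume_promote_request(const char *datadir)
{
    char path[1024];

    snprintf(path, sizeof path, "%s/promote", datadir);
    return unlink(path) == 0;
}
```

Because the file is created before the signal is sent, a signal that arrives late or is coalesced with another wakeup still finds the request on disk.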

PostgreSQL’s Approach

pg_ctl: a thin orchestration shell

pg_ctl is a standalone C binary (src/bin/pg_ctl/pg_ctl.c). It does not link against any PostgreSQL backend library; its only backend dependencies are two headers: src/include/catalog/pg_control.h (for ControlFileData and DBState) and src/include/utils/pidfile.h (for the postmaster.pid line positions). Everything else (process launch, signal delivery, wait loops) is POSIX syscalls.

The command set is encoded in the CtlCommand enum:

// CtlCommand — src/bin/pg_ctl/pg_ctl.c
typedef enum
{
    NO_COMMAND = 0,
    INIT_COMMAND,
    START_COMMAND,
    STOP_COMMAND,
    RESTART_COMMAND,
    RELOAD_COMMAND,
    STATUS_COMMAND,
    PROMOTE_COMMAND,
    LOGROTATE_COMMAND,
    KILL_COMMAND,
    REGISTER_COMMAND,    /* Windows service only */
    UNREGISTER_COMMAND,  /* Windows service only */
    RUN_AS_SERVICE_COMMAND,
} CtlCommand;

Starting the server. do_start() calls start_postmaster(), which forks a child, runs /bin/sh -c "exec postgres -D … < /dev/null" in it, and returns the child's PID to the parent. Because the shell execs postgres rather than spawning it, that PID is also the postmaster's PID. The parent then calls wait_for_postmaster_start(), which polls postmaster.pid at 10 Hz for up to wait_seconds (default 60 s):

// start_postmaster — src/bin/pg_ctl/pg_ctl.c
pm_pid = fork();
if (pm_pid < 0) { ... exit(1); }   /* fork failed: report and give up */
if (pm_pid == 0) {
    /* child: detach session, exec postgres via shell */
    setsid();
    cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1",
                   exec_path, pgdata_opt, post_opts, DEVNULL);
    execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
    exit(1); /* exec failed */
}
return pm_pid; /* parent: shell PID, which exec turns into the postmaster PID */
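The fork-then-exec-through-a-shell pattern can be tried in isolation. The sketch below is a self-contained approximation, not the pg_ctl code: `launch_via_shell` is a hypothetical name, and error reporting is reduced to return values.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run `cmd` through /bin/sh in a detached child and return the child's PID.
 * If `cmd` begins with "exec", the shell replaces itself with the program,
 * so the returned PID is also the PID of that program.
 * Returns -1 if fork() fails.
 */
static pid_t
launch_via_shell(const char *cmd)
{
    pid_t pid = fork();

    if (pid != 0)
        return pid;             /* parent (pid > 0) or fork failure (pid < 0) */

    /* child: leave the caller's session so terminal signals do not reach it */
    (void) setsid();
    execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
    _exit(127);                 /* reached only if exec itself failed */
}
```

The caller can later pass the returned PID to `waitpid()` with `WNOHANG` to notice that the launched program died early, which is the same information a start-wait loop needs.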

The wait loop reads line 8 of postmaster.pid (LOCK_FILE_LINE_PM_STATUS = 8) and checks for PM_STATUS_READY or PM_STATUS_STANDBY:

// wait_for_postmaster_start — src/bin/pg_ctl/pg_ctl.c
char *pmstatus = optlines[LOCK_FILE_LINE_PM_STATUS - 1];
if (strcmp(pmstatus, PM_STATUS_READY) == 0 ||
    strcmp(pmstatus, PM_STATUS_STANDBY) == 0)
    return POSTMASTER_READY;

If the postmaster dies before setting the ready status, pg_ctl calls get_control_dbstate() — which reads pg_control directly — to distinguish DB_SHUTDOWNED_IN_RECOVERY from a genuine startup failure.
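The wait loops share one shape: test a condition, sleep a tenth of a second, give up after a deadline. A generic sketch of that shape follows. `poll_until` and `countdown_done` are hypothetical helpers written for illustration; only the 10 Hz cadence is taken from the text.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

#define WAITS_PER_SEC 10        /* polling cadence described in the article */

/*
 * Call ready(arg) up to wait_seconds * WAITS_PER_SEC times, sleeping
 * 1/WAITS_PER_SEC seconds after each failed attempt. Returns true as soon
 * as the predicate succeeds, false on timeout.
 */
static bool
poll_until(bool (*ready)(void *), void *arg, int wait_seconds)
{
    const struct timespec pause = {0, 1000000000L / WAITS_PER_SEC};

    for (int cnt = 0; cnt < wait_seconds * WAITS_PER_SEC; cnt++)
    {
        if (ready(arg))
            return true;
        nanosleep(&pause, NULL);
    }
    return false;
}

/* Example predicate: succeeds once the counter it points to reaches zero. */
static bool
countdown_done(void *arg)
{
    int *remaining = arg;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

In a real control tool the predicate would re-read the pidfile or the control file on every call; the countdown predicate only stands in for that check.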

Stopping the server. do_stop() reads the PID from postmaster.pid, sends the mode-appropriate signal (SIGTERM / SIGINT / SIGQUIT), and polls for the PID file to disappear:

Shutdown mode (do_stop, src/bin/pg_ctl/pg_ctl.c)	Signal sent
`SMART_MODE`	SIGTERM
`FAST_MODE` (default)	SIGINT
`IMMEDIATE_MODE`	SIGQUIT

The shutdown_mode defaults to FAST_MODE. The signal is the OS handle; pg_ctl never calls any in-process function of the server. The mapping from the textual --mode argument to the global sig variable is centralized in set_mode(), which is the single source of truth for the stop-mode → signal correspondence the table above describes:

// set_mode — src/bin/pg_ctl/pg_ctl.c
if (strcmp(modeopt, "s") == 0 || strcmp(modeopt, "smart") == 0)
{
    shutdown_mode = SMART_MODE;
    sig = SIGTERM;
}
else if (strcmp(modeopt, "f") == 0 || strcmp(modeopt, "fast") == 0)
{
    shutdown_mode = FAST_MODE;
    sig = SIGINT;
}
else if (strcmp(modeopt, "i") == 0 || strcmp(modeopt, "immediate") == 0)
{
    shutdown_mode = IMMEDIATE_MODE;
    sig = SIGQUIT;
}

sig is a file-scope global initialized to SIGINT (the fast-mode default), so a pg_ctl stop with no --mode flag never enters set_mode() and falls through with FAST_MODE semantics already in place. The kill command (pg_ctl kill signal_name pid) reuses the same sig global through a parallel set_sig() parser, which is why do_kill() can forward any supported signal without any mode logic of its own.
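A signal-name parser of the kind described above can be sketched as a small lookup table. The function name `parse_signal_name` is hypothetical, and the set of accepted names is an assumption chosen for illustration rather than a transcription of the real parser.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Returns the signal number for a short name such as "HUP", or -1 if unknown. */
static int
parse_signal_name(const char *name)
{
    static const struct
    {
        const char *name;
        int         signo;
    } known[] = {
        {"HUP", SIGHUP}, {"INT", SIGINT}, {"QUIT", SIGQUIT},
        {"ABRT", SIGABRT}, {"KILL", SIGKILL}, {"TERM", SIGTERM},
        {"USR1", SIGUSR1}, {"USR2", SIGUSR2},
    };

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(name, known[i].name) == 0)
            return known[i].signo;
    return -1;                  /* caller should report an unrecognized name */
}
```

Keeping the mapping in one table means the command that sends the signal needs no knowledge of signal names, only the resulting number.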

Promoting a standby. do_promote() first guards against promoting a non-standby by calling get_control_dbstate() and exiting with an error unless the state is DB_IN_ARCHIVE_RECOVERY. It then writes an empty $PGDATA/promote file and sends SIGUSR1:

// do_promote — src/bin/pg_ctl/pg_ctl.c
if (get_control_dbstate() != DB_IN_ARCHIVE_RECOVERY)
{
    write_stderr(_("%s: cannot promote server; "
                   "server is not in standby mode\n"), progname);
    exit(1);
}
snprintf(promote_file, MAXPGPATH, "%s/promote", pg_data);
if ((prmfile = fopen(promote_file, "w")) == NULL) { ... exit(1); }
if (fclose(prmfile)) { ... exit(1); }
sig = SIGUSR1;
if (kill(pid, sig) != 0)
{
    write_stderr(_("%s: could not send promote signal (PID: %d): %m\n"),
                 progname, (int) pid);
    if (unlink(promote_file) != 0)        /* best-effort cleanup on failure */
        write_stderr(...);
    exit(1);
}

Two correctness details are worth noting. First, the fopen/fclose pair — not a single creat — is used so that a failure to flush the (empty) file is caught as a distinct error from a failure to create it. Second, if kill() fails after the file already exists, do_promote() unlinks the sentinel so a later restart does not silently auto-promote on a stale promote file. After the signal lands, wait_for_postmaster_promote() polls get_control_dbstate() until it observes DB_IN_PRODUCTION, bailing out early if the PID file vanishes or the postmaster dies mid-promotion:

// wait_for_postmaster_promote — src/bin/pg_ctl/pg_ctl.c
for (cnt = 0; cnt < wait_seconds * WAITS_PER_SEC; cnt++)
{
    if ((pid = get_pgpid(false)) == 0)
        return false;           /* pid file is gone */
    if (kill(pid, 0) != 0)
        return false;           /* postmaster died */

    state = get_control_dbstate();
    if (state == DB_IN_PRODUCTION)
        return true;            /* successful promotion */

    if (cnt % WAITS_PER_SEC == 0)
        print_msg(".");
    pg_usleep(USEC_PER_SEC / WAITS_PER_SEC);
}
return false;                   /* timeout reached */

This is the same 10 Hz poll cadence (WAITS_PER_SEC) used by the start and stop wait loops, but the readiness signal is the DBState read from pg_control rather than a line in postmaster.pid — promotion has no dedicated pidfile status line, so the control file is the only authoritative witness that the transition to primary has completed.

Reload. do_reload() sends SIGHUP, which the postmaster relays to all backends, causing them to reread postgresql.conf. No wait loop is needed because SIGHUP handling is asynchronous and non-disruptive.

get_control_dbstate: the bridge between pg_ctl and pg_control. This static helper, called from the wait loops and the promote guard, reads pg_control through the shared get_controlfile() utility:

// get_control_dbstate — src/bin/pg_ctl/pg_ctl.c
static DBState
get_control_dbstate(void)
{
    bool crc_ok;
    ControlFileData *ctl = get_controlfile(pg_data, &crc_ok);
    if (!crc_ok) { write_stderr("control file appears to be corrupt\n"); exit(1); }
    DBState ret = ctl->state;
    pfree(ctl);
    return ret;
}

ControlFileData and the pg_control format

ControlFileData (src/include/catalog/pg_control.h) is the on-disk layout of $PGDATA/global/pg_control. At REL_18_STABLE, PG_CONTROL_VERSION = 1800. The struct is deliberately kept under PG_CONTROL_MAX_SAFE_SIZE = 512 bytes — one disk sector — so that every write is atomic:

// ControlFileData — src/include/catalog/pg_control.h
typedef struct ControlFileData
{
    uint64      system_identifier;   /* unique cluster ID (set at initdb) */
    uint32      pg_control_version;  /* PG_CONTROL_VERSION = 1800 in PG18 */
    uint32      catalog_version_no;  /* catversion.h; changes on catalog changes */
    DBState     state;               /* current lifecycle state */
    pg_time_t   time;                /* timestamp of last pg_control update */
    XLogRecPtr  checkPoint;          /* LSN of last checkpoint record */
    CheckPoint  checkPointCopy;      /* full body of that checkpoint record */
    XLogRecPtr  unloggedLSN;         /* fake LSN counter for unlogged relations */
    XLogRecPtr  minRecoveryPoint;    /* must replay at least to here */
    TimeLineID  minRecoveryPointTLI;
    XLogRecPtr  backupStartPoint;    /* set during online backup */
    XLogRecPtr  backupEndPoint;
    bool        backupEndRequired;
    int         wal_level;
    bool        wal_log_hints;
    int         MaxConnections;
    int         max_worker_processes;
    int         max_wal_senders;
    int         max_prepared_xacts;
    int         max_locks_per_xact;
    bool        track_commit_timestamp;
    uint32      maxAlign;
    double      floatFormat;         /* = 1234567.0; architecture check */
    uint32      blcksz;              /* data block size */
    uint32      relseg_size;         /* blocks per large-relation segment */
    uint32      xlog_blcksz;         /* WAL block size */
    uint32      xlog_seg_size;       /* WAL segment size */
    uint32      nameDataLen;         /* NAMEDATALEN */
    uint32      indexMaxKeys;
    uint32      toast_max_chunk_size;
    uint32      loblksize;
    bool        float8ByVal;
    uint32      data_checksum_version;
    bool        default_char_signedness; /* new in PG18 */
    char        mock_authentication_nonce[MOCK_AUTH_NONCE_LEN];
    pg_crc32c   crc;                 /* MUST BE LAST */
} ControlFileData;

DBState is the lifecycle state machine for the cluster:

// DBState — src/include/catalog/pg_control.h
typedef enum DBState
{
    DB_STARTUP = 0,
    DB_SHUTDOWNED,
    DB_SHUTDOWNED_IN_RECOVERY,
    DB_SHUTDOWNING,
    DB_IN_CRASH_RECOVERY,
    DB_IN_ARCHIVE_RECOVERY,
    DB_IN_PRODUCTION,
} DBState;

DB_SHUTDOWNED is the only state from which a clean, non-recovery startup proceeds. Any other state triggers WAL replay on next startup. pg_ctl relies on DBState in two places: the promote guard (must see DB_IN_ARCHIVE_RECOVERY) and the startup-wait fallback (treats DB_SHUTDOWNED_IN_RECOVERY as a non-error exit).
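The two rules in the paragraph above translate into two one-line predicates. The sketch repeats the enum locally so that it compiles on its own; `startup_needs_wal_replay` and `can_promote` are hypothetical helper names, and they encode only what this article states.

```c
#include <stdbool.h>

/* Local mirror of the DBState enum, repeated so the sketch stands alone. */
typedef enum DBState
{
    DB_STARTUP = 0,
    DB_SHUTDOWNED,
    DB_SHUTDOWNED_IN_RECOVERY,
    DB_SHUTDOWNING,
    DB_IN_CRASH_RECOVERY,
    DB_IN_ARCHIVE_RECOVERY,
    DB_IN_PRODUCTION,
} DBState;

/* Only a clean shutdown lets the next startup skip WAL replay. */
static bool
startup_needs_wal_replay(DBState last_state)
{
    return last_state != DB_SHUTDOWNED;
}

/* The promote guard: only a server in archive recovery can be promoted. */
static bool
can_promote(DBState current_state)
{
    return current_state == DB_IN_ARCHIVE_RECOVERY;
}
```

Both predicates read nothing but the state value, which is why a tool can evaluate them from the control file alone.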

The CheckPoint struct embedded in checkPointCopy carries:

// CheckPoint — src/include/catalog/pg_control.h
typedef struct CheckPoint
{
    XLogRecPtr  redo;              /* REDO start LSN */
    TimeLineID  ThisTimeLineID;
    TimeLineID  PrevTimeLineID;    /* non-zero if this record begins a new TL */
    bool        fullPageWrites;
    int         wal_level;
    FullTransactionId nextXid;
    Oid         nextOid;
    MultiXactId nextMulti;
    MultiXactOffset nextMultiOffset;
    TransactionId oldestXid;
    Oid         oldestXidDB;
    MultiXactId oldestMulti;
    Oid         oldestMultiDB;
    pg_time_t   time;
    TransactionId oldestCommitTsXid;
    TransactionId newestCommitTsXid;
    TransactionId oldestActiveXid;
} CheckPoint;

The read / write path through controldata_utils.c

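Both backend and frontend code share src/common/controldata_utils.c. get_controlfile() builds the path $PGDATA/global/pg_control, opens it with O_RDONLY, reads exactly sizeof(ControlFileData) bytes, and verifies the CRC32c. In frontend (tool) mode a CRC mismatch triggers up to 10 retries with 10 ms sleeps, a guard against reading a partially updated file while a running server is writing it. The excerpt is simplified; the real loop also stops retrying once it computes the same bad CRC twice in a row, since an unchanging mismatch indicates corruption rather than a concurrent write: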

// get_controlfile_by_exact_path — src/common/controldata_utils.c
retry:
    fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0);
    r  = read(fd, ControlFile, sizeof(ControlFileData));
    close(fd);
    INIT_CRC32C(crc);
    COMP_CRC32C(crc, ControlFile, offsetof(ControlFileData, crc));
    FIN_CRC32C(crc);
    *crc_ok_p = EQ_CRC32C(crc, ControlFile->crc);
    if (!*crc_ok_p && retries < 10) { retries++; pg_usleep(10000); goto retry; }

update_controlfile() recalculates the CRC, copies the struct into a zero-padded buffer of PG_CONTROL_FILE_SIZE bytes, and writes the buffer with O_WRONLY, syncing it to disk when do_sync is set. PG_CONTROL_FILE_SIZE (8192) is the physical file size; it is kept constant across format changes so that a binary reading an incompatible file reports a wrong-version error rather than a short read. In backend mode the caller holds ControlFileLock before calling this function.
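The padding step is small enough to show in full. This sketch assumes only the 8192-byte physical size named above; `pad_control_record` and `CONTROL_FILE_SIZE` are hypothetical names, and the actual file write and sync are left out.

```c
#include <stddef.h>
#include <string.h>

#define CONTROL_FILE_SIZE 8192  /* physical file size named in the article */

/*
 * Copy a record of `len` bytes into a zero-filled buffer of the full
 * physical file size. Returns 0 on success, -1 if the record is too large.
 */
static int
pad_control_record(const void *record, size_t len,
                   unsigned char out[CONTROL_FILE_SIZE])
{
    if (len > CONTROL_FILE_SIZE)
        return -1;
    memset(out, 0, CONTROL_FILE_SIZE);  /* bytes past the record are zeros */
    memcpy(out, record, len);
    return 0;
}
```

Writing the whole buffer every time keeps the file length independent of the struct size, so a reader built against a different layout still gets a full-length read and can check the version field.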

pg_controldata: read-only field printer

pg_controldata (src/bin/pg_controldata/pg_controldata.c) is a minimal program that calls get_controlfile() and prints every field with printf(). It carries no logic beyond field formatting. Notable points:

It defines FRONTEND as 1 yet includes postgres.h (not postgres_fe.h), because it needs the WAL-internal types (xlog_internal.h, transam.h) that only the backend header exposes.
The default_char_signedness field (new in PG18) is printed as signed / unsigned and encodes the platform’s default char signedness at initdb time — relevant when cross-compiling or migrating between ARM (unsigned) and x86 (signed) systems.
data_checksum_version is 0 when page checksums are disabled; any nonzero value indicates the checksum algorithm version in use.

// main (pg_controldata) — src/bin/pg_controldata/pg_controldata.c
ControlFile = get_controlfile(DataDir, &crc_ok);
if (!crc_ok)
    pg_log_warning("calculated CRC checksum does not match value stored in control file");

printf("pg_control version number:  %u\n",  ControlFile->pg_control_version);
printf("Database cluster state:     %s\n",  dbState(ControlFile->state));
printf("Latest checkpoint location: %X/%X\n", LSN_FORMAT_ARGS(ControlFile->checkPoint));
// ... (all ~40 fields)
printf("Default char data signedness: %s\n",
       ControlFile->default_char_signedness ? "signed" : "unsigned");
printf("Mock authentication nonce:  %s\n",  mock_auth_nonce_str);
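The `%X/%X` format in the excerpt prints a 64-bit LSN as two 32-bit hexadecimal halves. The sketch below shows that split and its inverse in standalone form; `format_lsn` and `parse_lsn` are hypothetical helper names, not functions from the PostgreSQL tree.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit LSN as "hi/lo" in hex. Returns the snprintf result. */
static int
format_lsn(uint64_t lsn, char *buf, size_t bufsz)
{
    return snprintf(buf, bufsz, "%X/%X",
                    (unsigned int) (lsn >> 32),
                    (unsigned int) (lsn & 0xFFFFFFFFu));
}

/* Parse "hi/lo" back into a 64-bit LSN. Returns 0 on success, -1 on error. */
static int
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi, lo;

    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}
```

For example, the value 0x16B374D848 is rendered as `16/B374D848`, and parsing that text yields the original value again.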

Lifecycle flow diagram

flowchart TD
    A["pg_ctl start<br/>do_start()"] --> B["start_postmaster()<br/>fork + exec postgres"]
    B --> C["poll postmaster.pid<br/>wait_for_postmaster_start()"]
    C --> D{"LOCK_FILE_LINE_PM_STATUS<br/>== PM_STATUS_READY<br/>or PM_STATUS_STANDBY?"}
    D -- yes --> E["exit 0: server started"]
    D -- postmaster died --> F["get_control_dbstate()<br/>read pg_control"]
    F --> G{"DBState?"}
    G -- DB_SHUTDOWNED_IN_RECOVERY --> H["exit 0: shutdown in recovery"]
    G -- other --> I["exit 1: startup failed"]
    D -- timeout --> J["exit 1: server did not start in time"]

    K["pg_ctl stop<br/>do_stop()"] --> L["get_pgpid()<br/>read postmaster.pid"]
    L --> M["kill pid sig<br/>SIGTERM/SIGINT/SIGQUIT"]
    M --> N["poll: postmaster.pid gone?<br/>wait_for_postmaster_stop()"]
    N -- gone --> O["exit 0: server stopped"]
    N -- timeout --> P["exit 1: server does not shut down"]

    Q["pg_ctl promote<br/>do_promote()"] --> R["get_control_dbstate()"]
    R --> S{"DB_IN_ARCHIVE_RECOVERY?"}
    S -- no --> T["exit 1: not a standby"]
    S -- yes --> U["create promote file<br/>send SIGUSR1"]
    U --> V["poll get_control_dbstate()<br/>wait_for_postmaster_promote()"]
    V --> W{"DB_IN_PRODUCTION?"}
    W -- yes --> X["exit 0: server promoted"]
    W -- timeout --> Y["exit 1: promote timed out"]

DBState transition diagram

flowchart LR
    S0["DB_STARTUP"] --> S6["DB_IN_PRODUCTION"]
    S0 --> S4["DB_IN_CRASH_RECOVERY"]
    S0 --> S5["DB_IN_ARCHIVE_RECOVERY"]
    S6 --> S3["DB_SHUTDOWNING"]
    S3 --> S1["DB_SHUTDOWNED"]
    S5 --> S6
    S4 --> S6
    S5 --> S2["DB_SHUTDOWNED_IN_RECOVERY"]

Source Walkthrough

pg_ctl.c — function inventory

Symbol	Role
`CtlCommand` (enum)	Command discriminator
`ShutdownMode` (enum)	`SMART_MODE`, `FAST_MODE`, `IMMEDIATE_MODE`
`WaitPMResult` (enum)	Start-wait outcome
`main`	Parse options, build file paths, dispatch `ctl_command`
`do_init`	Run `initdb` via `system()`
`do_start`	Fork `postgres`; wait via `wait_for_postmaster_start`
`do_stop`	Send SIGTERM/INT/QUIT; wait via `wait_for_postmaster_stop`
`do_restart`	`do_stop` then `do_start`
`do_reload`	Send SIGHUP
`do_promote`	Create `promote` sentinel file; send SIGUSR1; wait
`do_logrotate`	Create `logrotate` sentinel file; send SIGUSR1
`do_status`	Print PID and opts from `postmaster.pid`
`do_kill`	Send arbitrary signal to a given PID
`start_postmaster`	`fork` + `exec /bin/sh -c "exec postgres …"`
`wait_for_postmaster_start`	Poll `postmaster.pid` line 8 at 10 Hz
`wait_for_postmaster_stop`	Poll `postmaster.pid` absence at 10 Hz
`wait_for_postmaster_promote`	Poll `get_control_dbstate()` at 10 Hz
`get_pgpid`	Read PID from `postmaster.pid` line 1
`get_control_dbstate`	Read `pg_control` via `get_controlfile()`; return `state`
`read_post_opts`	Read saved options from `postmaster.opts` (used in restart)
`postmaster_is_alive`	`kill(pid, 0)` liveness check
`trap_sigint_during_startup`	Forward SIGINT to postmaster during start wait
`set_mode`	Parse `--mode` to `ShutdownMode`; set global `sig` (SIGTERM/SIGINT/SIGQUIT)
`set_sig`	Parse the signal name given to `pg_ctl kill` into global `sig`
`adjust_data_dir`	Handle `-D` pointing at config-only directory

pg_controldata.c — function inventory

Symbol	Role
`main`	Parse `-D`; call `get_controlfile()`; print all fields
`dbState`	Map `DBState` enum to human-readable string
`wal_level_str`	Map `WalLevel` enum to string

controldata_utils.c — shared read/write path

Symbol	Role
`get_controlfile`	Build path `$PGDATA/global/pg_control`; delegate to `get_controlfile_by_exact_path`
`get_controlfile_by_exact_path`	Open, read, CRC-verify; retry up to 10× in frontend mode
`update_controlfile`	Recompute CRC, zero-pad to 8192 B, write; `do_sync` controls `fsync`

Key structs and constants

Symbol	Header	Note
`ControlFileData`	`catalog/pg_control.h`	On-disk pg_control layout; ≤ 512 B active payload
`CheckPoint`	`catalog/pg_control.h`	Embedded in `ControlFileData.checkPointCopy`
`DBState`	`catalog/pg_control.h`	7-value lifecycle enum
`PG_CONTROL_VERSION`	`catalog/pg_control.h`	`1800` at REL_18_STABLE
`PG_CONTROL_MAX_SAFE_SIZE`	`catalog/pg_control.h`	`512` — one-sector atomic-write limit
`PG_CONTROL_FILE_SIZE`	`catalog/pg_control.h`	`8192` — physical file size, version-mismatch probe
`LOCK_FILE_LINE_PM_STATUS`	`utils/pidfile.h`	Line 8 in `postmaster.pid`
`PM_STATUS_READY`	`utils/pidfile.h`	`"ready   "` (space-padded to 8 characters): ready for connections
`PM_STATUS_STANDBY`	`utils/pidfile.h`	`"standby "`: up as a standby, not accepting connections

Source Verification (as of 2026-06-05)

Position hints for REL_18_STABLE commit 273fe94. Symbols are the stable anchor; line numbers decay as the tree evolves.

Symbol	File	Approx. line
`CtlCommand` enum	`src/bin/pg_ctl/pg_ctl.c`	53
`ShutdownMode` enum	`src/bin/pg_ctl/pg_ctl.c`	37
`WaitPMResult` enum	`src/bin/pg_ctl/pg_ctl.c`	44
`main`	`src/bin/pg_ctl/pg_ctl.c`	2202
`do_start`	`src/bin/pg_ctl/pg_ctl.c`	931
`do_stop`	`src/bin/pg_ctl/pg_ctl.c`	1027
`do_restart`	`src/bin/pg_ctl/pg_ctl.c`	1085
`do_reload`	`src/bin/pg_ctl/pg_ctl.c`	1149
`do_promote`	`src/bin/pg_ctl/pg_ctl.c`	1186
`do_logrotate`	`src/bin/pg_ctl/pg_ctl.c`	1267
`do_status`	`src/bin/pg_ctl/pg_ctl.c`	1348
`do_kill`	`src/bin/pg_ctl/pg_ctl.c`	1405
`start_postmaster`	`src/bin/pg_ctl/pg_ctl.c`	439
`wait_for_postmaster_start`	`src/bin/pg_ctl/pg_ctl.c`	593
`wait_for_postmaster_stop`	`src/bin/pg_ctl/pg_ctl.c`	717
`wait_for_postmaster_promote`	`src/bin/pg_ctl/pg_ctl.c`	754
`get_pgpid`	`src/bin/pg_ctl/pg_ctl.c`	246
`get_control_dbstate`	`src/bin/pg_ctl/pg_ctl.c`	2183
`postmaster_is_alive`	`src/bin/pg_ctl/pg_ctl.c`	1324
`trap_sigint_during_startup`	`src/bin/pg_ctl/pg_ctl.c`	857
`read_post_opts`	`src/bin/pg_ctl/pg_ctl.c`	802
`set_mode`	`src/bin/pg_ctl/pg_ctl.c`	2047
`set_sig`	`src/bin/pg_ctl/pg_ctl.c`	2075
`dbState`	`src/bin/pg_controldata/pg_controldata.c`	49
`wal_level_str`	`src/bin/pg_controldata/pg_controldata.c`	73
`main` (pg_controldata)	`src/bin/pg_controldata/pg_controldata.c`	88
`get_controlfile`	`src/common/controldata_utils.c`	52
`get_controlfile_by_exact_path`	`src/common/controldata_utils.c`	68
`update_controlfile`	`src/common/controldata_utils.c`	189
`ControlFileData` struct	`src/include/catalog/pg_control.h`	104
`CheckPoint` struct	`src/include/catalog/pg_control.h`	35
`DBState` enum	`src/include/catalog/pg_control.h`	89
`PG_CONTROL_VERSION`	`src/include/catalog/pg_control.h`	25
`PG_CONTROL_MAX_SAFE_SIZE`	`src/include/catalog/pg_control.h`	247
`PG_CONTROL_FILE_SIZE`	`src/include/catalog/pg_control.h`	256
`LOCK_FILE_LINE_PM_STATUS`	`src/include/utils/pidfile.h`	44
`PM_STATUS_READY`	`src/include/utils/pidfile.h`	53
`PM_STATUS_STANDBY`	`src/include/utils/pidfile.h`	54

Beyond PostgreSQL — Comparative Designs & Research Frontiers

Other databases’ control utilities

MySQL / MariaDB splits the role between two tools: mysqld_safe is a wrapper script that launches mysqld and restarts it after a crash, while mysqladmin shutdown asks the running server to stop over a client connection rather than by signal. InnoDB has no separate control file. Its checkpoint LSN lives in checksummed header blocks of the redo log, and page 0 of the system tablespace (ibdata1) records the LSN up to which data had been flushed. Unlike PostgreSQL, this design couples the control record to the storage engine.

Oracle uses two or more control files mirrored to separate mount points. The Oracle control file is much larger (megabytes, not 512 bytes) because it also stores the RMAN backup catalog and archived-log history. The mirroring is a high-availability measure absent in PostgreSQL (which relies on the single pg_control surviving on the same filesystem).

SQLite stores its database state in a 100-byte header at offset 0 of the database file itself, about as minimal as a control record gets. The file change counter at offset 24 plays a different role from PostgreSQL’s CRC: it is not an integrity check but a version stamp, and a reader that sees it change knows its cached pages are stale and must be re-read. No separate control file exists; the database is the control file.
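Reading the SQLite change counter takes only a big-endian decode of four header bytes. The following sketch assumes the 100-byte header and the offset of 24 mentioned above; `sqlite_change_counter` and `cache_is_stale` are hypothetical names and not part of the SQLite API.

```c
#include <stddef.h>
#include <stdint.h>

#define SQLITE_HEADER_SIZE        100
#define SQLITE_CHANGE_COUNTER_OFF 24

/* Decode the 4-byte big-endian file change counter from a database header. */
static uint32_t
sqlite_change_counter(const unsigned char header[SQLITE_HEADER_SIZE])
{
    const unsigned char *p = header + SQLITE_CHANGE_COUNTER_OFF;

    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

/* A reader's cached pages are stale if the counter moved since it last looked. */
static int
cache_is_stale(uint32_t remembered,
               const unsigned char header[SQLITE_HEADER_SIZE])
{
    return sqlite_change_counter(header) != remembered;
}
```

The comparison detects that something changed, not that the header is intact, which is the difference from a CRC-protected control file.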

Research context

The design of pg_control reflects two classical crash-recovery insights. First, ARIES (Mohan et al., 1992) established the principle that the recovery manager must be able to find the REDO start point in a structure that survives crashes — the “master record” in ARIES terminology, which maps directly to checkPointCopy.redo in ControlFileData. Second, the atomicity of the control-file write is a special case of the write-ahead logging invariant: before any data change is considered durable, the metadata record naming its location must be durable. Writing pg_control with a CRC and ensuring it fits in one sector is how PostgreSQL guarantees this without a second WAL record.

The system_identifier field (a 64-bit value generated at initdb) guards against mixing files from unrelated clusters: WAL carries the identifier of the cluster that wrote it, and a standby refuses to replay or stream WAL whose identifier differs from the one in its own pg_control. It does not by itself resolve split-brain after a failover, because a promoted standby and its old primary share the same system_identifier; their diverging histories are told apart by timeline IDs instead.

The mock_authentication_nonce (32 random bytes) was added in PG10 alongside SCRAM authentication. When a client tries to authenticate as a role that does not exist, the server still runs a mock SASL exchange whose salt is derived deterministically from this cluster-unique value, so the response looks the same on every attempt and does not reveal whether the role exists. It is stored in pg_control because it must survive server restarts and be available before any catalog access, exactly the kind of stable, pre-catalog state that pg_control is designed to hold.

Sources

Primary source files (REL_18_STABLE, commit 273fe94):

src/bin/pg_ctl/pg_ctl.c
src/bin/pg_controldata/pg_controldata.c
src/include/catalog/pg_control.h
src/common/controldata_utils.c
src/include/utils/pidfile.h

Cross-references within this KB:

postgres-postmaster.md — the postmaster process that pg_ctl launches
postgres-xlog-wal.md — WAL mechanics; checkpoint LSN interpretation
postgres-checkpoint.md — checkpoint writes that update pg_control
postgres-recovery-redo.md — startup reads pg_control to begin REDO
postgres-backup-basebackup.md — backupStartPoint / backupEndRequired fields
postgres-pg-dump-restore.md — logical backup counterpart (no pg_control dependency)

Research and textbooks:

Mohan et al., “ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging,” ACM TODS 17(1), 1992 — origin of the REDO-start “master record”
Petrov, Database Internals (O’Reilly, 2019), ch. 5 “Transaction Processing and Recovery”: write-ahead log, checkpoints, and recovery background
Stonebraker & Rowe, “The Design of POSTGRES,” ACM SIGMOD 1986 — original process-model rationale