PostgreSQL Postmaster — Cluster Supervisor, Process Lifecycle, and Crash Recovery

Contents:

Theoretical Background
Common DBMS Design
PostgreSQL’s Approach
Source Walkthrough
Source verification (as of 2026-06-05)
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Sources

Theoretical Background

Every multi-user database server must answer a foundational question: how does the server program become more than one concurrent unit of execution, and who coordinates those units? The answer shapes fault isolation, memory visibility, scheduling behavior, and the entire crash-recovery story.

Two classical architectural answers dominate the literature:

One supervisor + per-client workers (process or thread). A long-lived coordinator accepts new connections and delegates work to children. The coordinator never touches user SQL; it manages membership, monitors health, and restarts failed workers. Examples: Apache httpd (prefork), PostgreSQL (process model), original Oracle dedicated-server mode.
Single-process multithreaded server. One process handles all connections, using one thread per connection or a thread pool. A single address space reduces communication cost but couples all sessions: a wild pointer in one thread can corrupt another session’s state. Examples: MySQL, SQL Server, Oracle on Windows.

Architecture of a Database System (Hellerstein et al., 2007, §2) surveys both models and notes that the process model offers stronger fault isolation at the cost of higher per-connection overhead, while the thread model is cheaper to create sessions but harder to make crash-safe. PostgreSQL’s founding paper (“The Design of POSTGRES”, Stonebraker & Rowe, 1986) adopted the process model explicitly for robustness: a bug in one backend cannot corrupt another backend’s stack, and the operating system enforces address-space separation at no application cost.

The supervisor role adds a second design question: what is the supervisor’s relationship to shared state? In PostgreSQL the answer is sharp: the postmaster creates shared memory once, sizes it permanently, and then becomes a pure process manager. It never reads user data. Every child process attaches to the same shared segment. The shared structures — buffer pool, lock table, procarray, sinval ring — are the cluster’s runtime state; the postmaster’s only ownership is the file descriptor to the listen socket and the mapping of child PIDs to their roles.

This design has a critical implication for crash recovery: because the postmaster never participates in transactions or holds shared-memory state itself, it can detect a child crash (via SIGCHLD), signal all siblings (SIGTERM for graceful, SIGQUIT for immediate), wait for them to exit, and then re-create shared memory from scratch. The cluster’s durable state lives in WAL and in data files — not in the postmaster process — so “restart” means “re-attach to the same on-disk state after rebuilding the in-memory machine.”

Common DBMS Design

The postmaster pattern recurs across process-model database servers. Understanding the generic idioms makes PostgreSQL’s specific choices legible as one point in a well-mapped design space.

The supervisor never does query work

In every mature process-model server, the supervisor process is a thin loop: accept() a connection, fork() (or spawn) a worker, hand off the file descriptor, go back to sleep. The supervisor holds no transaction state, no cached plan, no open relation. This is the critical invariant that makes crash recovery deterministic: if the supervisor’s memory is always clean, any child crash is bounded.

Contrast with early implementations that let the coordinator service some requests itself — any corruption in one request then touches the coordinator’s heap, and the entire server must restart.

Fixed shared-memory segment sized at startup

Sharing data between processes requires an explicitly mapped shared-memory region. The region must be sized before any child runs, because most platforms cannot grow a SysV shared-memory segment while it is attached. The universal pattern:

1. Supervisor calculates the total required size from configuration (max_connections, shared_buffers, max_locks_per_transaction, …).
2. Supervisor allocates the segment (shmget / mmap).
3. Each child fork()s after the segment exists and inherits the mapping.
4. On crash restart, the supervisor detaches the old segment, creates a fresh one of the same size, and re-initializes.

Steps 1 to 3 are why max_connections and shared_buffers require a server restart: they determine the segment size, which is fixed for the lifetime of the postmaster. Step 4 reuses that size, so a crash restart does not pick up new values either.
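The allocate-then-fork ordering can be illustrated with a minimal sketch. It assumes a POSIX system with anonymous shared mappings; `SharedState` and `shared_region_demo` are hypothetical names for illustration, not PostgreSQL code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the shared region; not a PostgreSQL structure. */
typedef struct SharedState
{
    int counter;
} SharedState;

/*
 * Map a fixed-size shared region, fork one child that writes into it, and
 * return the value the parent observes afterwards (-1 on failure).
 */
int
shared_region_demo(void)
{
    /* The size is decided here, before any child exists. */
    size_t       size = sizeof(SharedState);
    SharedState *shared = mmap(NULL, size, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    pid_t        pid;
    int          status;
    int          seen;

    if (shared == MAP_FAILED)
        return -1;
    shared->counter = 0;

    pid = fork();
    if (pid < 0)
    {
        munmap(shared, size);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: the mapping was inherited across fork(). */
        shared->counter = 42;
        _exit(0);
    }

    /* Parent: wait for the child, then read what it wrote. */
    if (waitpid(pid, &status, 0) != pid)
    {
        munmap(shared, size);
        return -1;
    }
    seen = shared->counter;
    munmap(shared, size);
    return seen;
}
```

Because the mapping is created before fork(), the child needs no attach step of its own, which is the property step 3 relies on.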

PMChild pool: a fixed roster of live children

The supervisor must track which child has which role so that when a child exits, it knows whether to restart it, signal siblings, or transition shutdown state. The pattern is a fixed-size array (or pool) of child descriptors, one slot per permitted child, allocated from the shared-memory layout but managed only by the supervisor. Each slot records: PID, role (BackendType), and any role-specific state (pointer to background worker registration, notification flags, etc.).

Dead-end children — processes forked to send an error to a client before dying — are the exception: they are not counted against the pool because they consume no shared-memory resources.

State-machine-driven shutdown

A clean, deterministic shutdown of a multi-process server requires sequencing: stop client backends first, wait for WAL to flush, take a shutdown checkpoint, stop archiver and walsenders, stop remaining infrastructure. Ad-hoc code that handles each subprocess type independently tends to miss interactions (e.g., a walsender that keeps replication slots open while archiver tries to retire WAL). The clean design is an explicit finite-state machine in the supervisor, with well-named states and one central function that decides which signal to send and when.

Theory ↔ PostgreSQL mapping

Generic concept	PostgreSQL name
Supervisor process	postmaster (`postmaster.c`)
Supervisor main loop	`ServerLoop`
Per-connection accept	`BackendStartup` → `postmaster_child_launch`
Child role identifier	`BackendType` enum (`miscadmin.h`)
Fixed child-slot roster	`PMChild` pool (`pmchild.c`)
Shared-memory sizing + allocation	`CalculateShmemSize` → `CreateSharedMemoryAndSemaphores`
Supervisor state machine	`PMState` enum + `PostmasterStateMachine`
Child-crash handler	`HandleChildCrash` → `HandleFatalError`
Dead-end child (no slot)	`AllocDeadEndChild` / `B_DEAD_END_BACKEND`
Background process roster	`LaunchMissingBackgroundProcesses`

PostgreSQL’s Approach

The single binary, many roles

PostgreSQL ships as one binary (postgres). What a process does is determined by the BackendType value stored in the global MyBackendType, set before the process calls its role-specific main_fn. The full enum in REL_18_STABLE:

// BackendType — src/include/miscadmin.h
typedef enum BackendType
{
    B_INVALID = 0,

    /* Backend-like processes (call PostgresMain or a thin wrapper) */
    B_BACKEND,            /* regular client-serving backend */
    B_DEAD_END_BACKEND,   /* forked only to send an error to the client */
    B_AUTOVAC_LAUNCHER,
    B_AUTOVAC_WORKER,
    B_BG_WORKER,
    B_WAL_SENDER,
    B_SLOTSYNC_WORKER,
    B_STANDALONE_BACKEND, /* postgres --single / single-user mode */

    /* Auxiliary processes (no database binding, no heavyweight locks) */
    B_ARCHIVER,
    B_BG_WRITER,
    B_CHECKPOINTER,
    B_IO_WORKER,          /* PG18: async I/O worker */
    B_STARTUP,
    B_WAL_RECEIVER,
    B_WAL_SUMMARIZER,     /* PG17+: WAL summarization for incremental backup */
    B_WAL_WRITER,

    B_LOGGER,             /* syslogger — does not attach to shared memory */
} BackendType;

The distinction between “backend-like” and “auxiliary” is architectural: backend-like processes call InitPostgres and can hold heavyweight locks; auxiliary processes have simpler initialization paths and exist to support the cluster infrastructure regardless of which client databases are open. B_IO_WORKER is new in PG18 and must not be asserted for PG17; B_WAL_SUMMARIZER arrived in PG17 together with incremental backup.

PostmasterMain: cluster startup sequence

PostmasterMain is the entry point when the binary is invoked as a server (postgres -D $PGDATA). Its startup sequence, before ServerLoop:

// PostmasterMain — src/backend/postmaster/postmaster.c
void
PostmasterMain(int argc, char *argv[])
{
    InitProcessGlobals();       /* PID, latch, random seed */
    PostmasterPid = MyProcPid;
    IsPostmasterEnvironment = true;

    /* parse argv, read postgresql.conf, validate DataDir */
    InitializeGUCOptions();
    /* ... option parsing ... */
    SelectConfigFiles(userDoption, progname);
    checkDataDir();
    checkControlFile();
    ChangeToDataDir();

    /* install postmaster signal handlers */
    pqsignal(SIGHUP,  handle_pm_reload_request_signal);
    pqsignal(SIGINT,  handle_pm_shutdown_request_signal);  /* fast shutdown */
    pqsignal(SIGQUIT, handle_pm_shutdown_request_signal);  /* immediate shutdown */
    pqsignal(SIGTERM, handle_pm_shutdown_request_signal);  /* smart shutdown */
    pqsignal(SIGCHLD, handle_pm_child_exit_signal);
    pqsignal(SIGUSR1, handle_pm_pmsignal_signal);

    InitPostmasterChildSlots();  /* allocate PMChild pool (postmaster.c:952) */

    /* *** fixed-size shared memory allocated here (postmaster.c:1004) *** */
    CreateSharedMemoryAndSemaphores();

    /* open listen sockets, write postmaster.pid */
    /* ... condensed ... */

    /* launch syslogger, startup process */
    /* ... condensed: StartSysLogger(), StartChildProcess(B_STARTUP) ... */

    ServerLoop();  /* never returns */
}

CreateSharedMemoryAndSemaphores is the point of no return: it sizes and allocates the shared segment from which every later ShmemInitStruct call carves its slice. A max_connections change requires re-running this entire sequence because the lock table, procarray, and sinval ring sizes all derive from it; the buffer pool is sized from shared_buffers, which is fixed at the same point.
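As a rough illustration of why the size must be computed once from configuration, the sketch below adds up a toy segment size with overflow checks. The per-item costs and every name in it (`toy_shmem_size`, `BLOCK_SIZE`, `PER_BACKEND_BYTES`) are assumptions made for the example; the real calculation sums many more components.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-item costs, chosen only to make the arithmetic visible. */
#define BLOCK_SIZE         ((size_t) 8192)
#define PER_BACKEND_BYTES  ((size_t) 1024)

/* Add two sizes; report failure instead of wrapping around. */
static bool
add_size_checked(size_t a, size_t b, size_t *out)
{
    if (a > SIZE_MAX - b)
        return false;
    *out = a + b;
    return true;
}

/* Multiply two sizes; report failure instead of wrapping around. */
static bool
mul_size_checked(size_t a, size_t b, size_t *out)
{
    if (a != 0 && b > SIZE_MAX / a)
        return false;
    *out = a * b;
    return true;
}

/*
 * Total bytes for a toy segment: one block per buffer plus a fixed amount
 * of bookkeeping per permitted backend.  Returns false on overflow.
 */
bool
toy_shmem_size(size_t n_buffers, size_t max_backends, size_t *total)
{
    size_t buffers;
    size_t per_backend;

    if (!mul_size_checked(n_buffers, BLOCK_SIZE, &buffers))
        return false;
    if (!mul_size_checked(max_backends, PER_BACKEND_BYTES, &per_backend))
        return false;
    return add_size_checked(buffers, per_backend, total);
}
```

Changing either input changes the total, and the total is only consumed at allocation time, which is why such settings cannot take effect in a running cluster.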

The PMChild pool

InitPostmasterChildSlots (in pmchild.c) allocates a flat array of PMChild structs and partitions it into per-BackendType freelists:

// pmchild.c — pool structure
typedef struct PMChildPool
{
    int         size;         /* slots reserved for this BackendType */
    int         first_slotno; /* index into the flat array */
    dlist_head  freelist;     /* currently unused PMChild entries */
} PMChildPool;

static PMChildPool pmchild_pools[BACKEND_NUM_TYPES];
NON_EXEC_STATIC int num_pmchild_slots = 0;
dlist_head ActiveChildList;   /* all live children including dead-ends */

When the postmaster forks a new backend, AssignPostmasterChildSlot pops a slot from the appropriate freelist and links it onto ActiveChildList. When the child exits, ReleasePostmasterChildSlot returns the slot to its freelist. Dead-end children (B_DEAD_END_BACKEND) are the exception: AllocDeadEndChild heap-allocates a PMChild outside the pool — there is no cap on them because they consume no shared resources.
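The partition-into-freelists idea can be sketched with a small fixed array. This is a simplified model under assumed names (`ToyChild`, `toy_assign_slot`, and so on) that uses a singly linked freelist rather than PostgreSQL's dlist; it is not the pmchild.c implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child types and pool sizes for the sketch. */
enum { TOY_BACKEND, TOY_AUTOVAC_WORKER, TOY_NUM_TYPES };
#define TOY_TOTAL_SLOTS 5
static const int toy_pool_size[TOY_NUM_TYPES] = {3, 2};

typedef struct ToyChild
{
    int              type;       /* which pool the slot belongs to */
    int              slotno;     /* 1-based position in the flat array */
    int              in_use;
    struct ToyChild *next_free;  /* freelist link while unused */
} ToyChild;

static ToyChild  toy_slots[TOY_TOTAL_SLOTS];
static ToyChild *toy_freelist[TOY_NUM_TYPES];

/* Carve the flat array into one freelist per child type. */
void
toy_init_slots(void)
{
    int slotno = 0;

    for (int type = 0; type < TOY_NUM_TYPES; type++)
    {
        toy_freelist[type] = NULL;
        for (int i = 0; i < toy_pool_size[type]; i++)
        {
            ToyChild *c = &toy_slots[slotno];

            c->type = type;
            c->slotno = slotno + 1;
            c->in_use = 0;
            c->next_free = toy_freelist[type];
            toy_freelist[type] = c;
            slotno++;
        }
    }
    assert(slotno == TOY_TOTAL_SLOTS);
}

/* Pop a slot for the given type, or return NULL when that pool is empty. */
ToyChild *
toy_assign_slot(int type)
{
    ToyChild *c = toy_freelist[type];

    if (c == NULL)
        return NULL;
    toy_freelist[type] = c->next_free;
    c->next_free = NULL;
    c->in_use = 1;
    return c;
}

/* Return a slot to the freelist of its own type. */
void
toy_release_slot(ToyChild *c)
{
    assert(c->in_use);
    c->in_use = 0;
    c->next_free = toy_freelist[c->type];
    toy_freelist[c->type] = c;
}
```

Exhausting one pool leaves the others untouched, which is the point of partitioning: a flood of client connections cannot consume the slots reserved for another child type.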

ServerLoop: the event loop

After startup, the postmaster enters ServerLoop, a for(;;) around WaitEventSetWait:

// ServerLoop — src/backend/postmaster/postmaster.c
static int
ServerLoop(void)
{
    WaitEvent   events[MAXLISTEN];
    int         nevents;

    ConfigurePostmasterWaitSet(true);   /* latch + all listen sockets */
    for (;;)
    {
        nevents = WaitEventSetWait(pm_wait_set, DetermineSleepTime(),
                                   events, lengthof(events), 0);
        for (int i = 0; i < nevents; i++)
        {
            if (events[i].events & WL_LATCH_SET)
                ResetLatch(MyLatch);

            /* process deferred signals in priority order */
            if (pending_pm_shutdown_request)  process_pm_shutdown_request();
            if (pending_pm_reload_request)    process_pm_reload_request();
            if (pending_pm_child_exit)        process_pm_child_exit();
            if (pending_pm_pmsignal)          process_pm_pmsignal();

            if (events[i].events & WL_SOCKET_ACCEPT)
            {
                ClientSocket s;

                if (AcceptConnection(events[i].fd, &s) == STATUS_OK)
                    BackendStartup(&s);   /* fork a new backend */
                if (s.sock != PGINVALID_SOCKET)
                    closesocket(s.sock);  /* postmaster does not keep it */
            }
        }
        LaunchMissingBackgroundProcesses();
        /* ... periodic: recheck postmaster.pid, touch socket files ... */
    }
}

Signal handlers (SIGHUP, SIGINT, SIGQUIT, SIGTERM, SIGCHLD, SIGUSR1) do nothing more than set pending_pm_* boolean flags and set the latch. All actual work happens in the main loop. This deferred-signal discipline avoids async-signal-unsafe operations (malloc, file I/O) inside handlers.

Figure 1 — Postmaster event loop: signals set latches; the main loop dispatches

flowchart TD
    WES["WaitEventSetWait\n(blocks on latch + listen sockets)"]
    SIG["Signal arrives\nSIGHUP / SIGTERM / SIGCHLD / SIGUSR1"]
    LATCH["Set pending_pm_* flag\nSetLatch"]
    RESET["ResetLatch"]
    SDOWN["process_pm_shutdown_request"]
    RELOAD["process_pm_reload_request"]
    CEXIT["process_pm_child_exit"]
    PMSIG["process_pm_pmsignal"]
    ACCEPT["WL_SOCKET_ACCEPT:\nAcceptConnection → BackendStartup"]
    LAUNCH["LaunchMissingBackgroundProcesses"]

    WES -->|woken| RESET
    SIG --> LATCH --> WES
    RESET --> SDOWN
    SDOWN --> RELOAD --> CEXIT --> PMSIG --> ACCEPT --> LAUNCH --> WES

Figure 1 — The postmaster’s ServerLoop is a pure event dispatcher. Signal handlers only flip boolean flags; all logic runs in the foreground loop after WaitEventSetWait returns.
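The deferred-signal discipline can be reduced to a few lines of portable C. The sketch below is an illustration under assumed names (`pending_reload`, `process_pending_work`, and the counters); it uses plain flags with no latch, so it shows only the "handler sets a flag, loop does the work" half of the pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Flags written by the handler, read by the main loop. */
static volatile sig_atomic_t pending_reload = 0;
static volatile sig_atomic_t pending_pmsignal = 0;

/* Counters showing how often the foreground work actually ran. */
int reloads_done = 0;
int pmsignals_done = 0;

/* Handler: record that something happened, nothing more. */
static void
toy_handle_signal(int signo)
{
    if (signo == SIGHUP)
        pending_reload = 1;
    else if (signo == SIGUSR1)
        pending_pmsignal = 1;
}

/* Install the handler for both signals; returns 0 on success. */
int
toy_install_handlers(void)
{
    struct sigaction sa;

    sa.sa_handler = toy_handle_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGHUP, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Main-loop side: clear each flag first, then do the (possibly unsafe) work. */
void
toy_process_pending_work(void)
{
    if (pending_reload)
    {
        pending_reload = 0;
        reloads_done++;      /* stand-in for re-reading configuration */
    }
    if (pending_pmsignal)
    {
        pending_pmsignal = 0;
        pmsignals_done++;    /* stand-in for handling a child's request */
    }
}
```

Two SIGHUPs that arrive before the loop runs collapse into a single pass of work, which is acceptable here because the work is idempotent: the flag means "something changed", not "N things changed".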

BackendStartup: forking a client backend

When WL_SOCKET_ACCEPT fires, BackendStartup orchestrates the fork:

// BackendStartup — src/backend/postmaster/postmaster.c
static int
BackendStartup(ClientSocket *client_sock)
{
    PMChild    *bn = NULL;        /* must start NULL: tested below */
    pid_t       pid;
    BackendStartupData startup_data;
    CAC_state   cac;

    cac = canAcceptConnections(B_BACKEND);
    if (cac == CAC_OK)
    {
        bn = AssignPostmasterChildSlot(B_BACKEND);
        if (!bn)
            cac = CAC_TOOMANY;    /* pool exhausted → dead-end child */
    }
    if (!bn)
        bn = AllocDeadEndChild(); /* heap-allocated, no slot */
    /* ... out-of-memory check condensed ... */

    startup_data.canAcceptConnections = cac;
    pid = postmaster_child_launch(bn->bkend_type, bn->child_slot,
                                  &startup_data, sizeof(startup_data),
                                  client_sock);
    if (pid < 0)
    {
        int         save_errno = errno;  /* capture before cleanup clobbers it */

        ReleasePostmasterChildSlot(bn);
        report_fork_failure_to_client(client_sock, save_errno);
        return STATUS_ERROR;
    }
    bn->pid = pid;
    return STATUS_OK;
}

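canAcceptConnections checks pmState and the connsAllowed flag and returns a CAC_state. In the excerpt above the connection limit is detected separately: when AssignPostmasterChildSlot finds the B_BACKEND pool (whose size derives from max_connections) exhausted, cac becomes CAC_TOOMANY and the connection is handed to a dead-end backend that immediately sends the “too many connections” error and exits. Any result other than CAC_OK takes the same dead-end path, because no pool slot was assigned.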

postmaster_child_launch (in launch_backend.c) executes the actual fork(). On Unix the child runs the main_fn registered for its BackendType; on Windows EXEC_BACKEND is defined and the child re-enters via SubPostmasterMain after deserializing BackendParameters from shared memory.
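The "fork, then dispatch through a per-type entry point" shape can be sketched as follows. The table, the kinds, and the exit codes are all hypothetical; the sketch models only the Unix fork path, not the EXEC_BACKEND re-entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical child kinds; each maps to its own entry point. */
typedef enum ToyKind { TOY_CLIENT, TOY_WRITER, TOY_NUM_KINDS } ToyKind;

typedef struct ToyKindEntry
{
    const char *name;
    int       (*main_fn)(void);   /* returns the child's exit code */
} ToyKindEntry;

static int toy_client_main(void) { return 10; }
static int toy_writer_main(void) { return 20; }

static const ToyKindEntry toy_kinds[TOY_NUM_KINDS] = {
    [TOY_CLIENT] = {"client", toy_client_main},
    [TOY_WRITER] = {"writer", toy_writer_main},
};

/*
 * Fork a child of the given kind.  In the child, control never returns to
 * the caller: it runs the kind's main_fn and exits with its result.
 * Returns the child's PID in the parent, or -1 if fork() failed.
 */
pid_t
toy_child_launch(ToyKind kind)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(toy_kinds[kind].main_fn());
    return pid;
}

/* Launch one child and report its exit code (-1 on any failure). */
int
toy_launch_and_wait(ToyKind kind)
{
    int   status;
    pid_t pid = toy_child_launch(kind);

    if (pid < 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Keeping the role-specific code behind a function pointer means the launcher itself stays identical for every child type; only the table grows when a new role is added.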

The PMState machine: normal operation to shutdown

The postmaster tracks its overall state in a single PMState variable:

// PMState enum — src/backend/postmaster/postmaster.c
typedef enum PMState
{
    PM_INIT,              /* postmaster starting */
    PM_STARTUP,           /* waiting for startup subprocess */
    PM_RECOVERY,          /* in archive recovery mode */
    PM_HOT_STANDBY,       /* in hot standby mode */
    PM_RUN,               /* normal: accepting connections */
    PM_STOP_BACKENDS,     /* need to stop remaining backends (transient) */
    PM_WAIT_BACKENDS,     /* waiting for live backends to exit */
    PM_WAIT_XLOG_SHUTDOWN,/* waiting for checkpointer shutdown ckpt */
    PM_WAIT_XLOG_ARCHIVAL,/* waiting for archiver and walsenders to finish */
    PM_WAIT_IO_WORKERS,   /* waiting for io workers to exit */
    PM_WAIT_CHECKPOINTER, /* waiting for checkpointer to shut down */
    PM_WAIT_DEAD_END,     /* waiting for dead-end children to exit */
    PM_NO_CHILDREN,       /* all important children have exited */
} PMState;

PostmasterStateMachine is called after every significant event (child exit, signal receipt) and drives transitions. The normal-operation forward path:

PM_INIT → PM_STARTUP → PM_RECOVERY (archive recovery or standby only; plain crash recovery stays in PM_STARTUP)
       → PM_HOT_STANDBY (once read-only connections are allowed) → PM_RUN

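The shutdown path. A fast shutdown (SIGINT) moves to PM_STOP_BACKENDS at once; a smart shutdown (SIGTERM) first sets connsAllowed = false and stays in PM_RUN until the last client backend has disconnected on its own, then follows the same sequence:

PM_RUN → PM_STOP_BACKENDS (send SIGTERM to the remaining backends)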
       → PM_WAIT_BACKENDS  (wait for them to exit)
       → PM_WAIT_XLOG_SHUTDOWN (wait for checkpointer to write shutdown ckpt)
       → PM_WAIT_XLOG_ARCHIVAL (wait for archiver + walsenders)
       → PM_WAIT_IO_WORKERS
       → PM_WAIT_CHECKPOINTER
       → PM_WAIT_DEAD_END
       → PM_NO_CHILDREN → ExitPostmaster(0)

Figure 2 — PMState transitions for smart shutdown

stateDiagram-v2
    [*] --> PM_INIT
    PM_INIT --> PM_STARTUP : CreateSharedMemoryAndSemaphores\nlaunch startup process
    PM_STARTUP --> PM_RECOVERY : startup process reports\narchive recovery started
    PM_STARTUP --> PM_RUN : startup process exits 0\nno archive recovery
    PM_RECOVERY --> PM_HOT_STANDBY : consistent state reached\nhot_standby on
    PM_HOT_STANDBY --> PM_RUN : promoted to primary
    PM_RECOVERY --> PM_RUN : recovery complete\nprimary mode
    PM_RUN --> PM_STOP_BACKENDS : smart shutdown\nconnsAllowed=false
    PM_STOP_BACKENDS --> PM_WAIT_BACKENDS : SIGTERM sent to backends
    PM_WAIT_BACKENDS --> PM_WAIT_XLOG_SHUTDOWN : all backends exited
    PM_WAIT_XLOG_SHUTDOWN --> PM_WAIT_XLOG_ARCHIVAL : shutdown checkpoint written
    PM_WAIT_XLOG_ARCHIVAL --> PM_WAIT_IO_WORKERS : archiver and walsenders done
    PM_WAIT_IO_WORKERS --> PM_WAIT_CHECKPOINTER : io workers done
    PM_WAIT_CHECKPOINTER --> PM_WAIT_DEAD_END : checkpointer exits
    PM_WAIT_DEAD_END --> PM_NO_CHILDREN : dead-end children gone
    PM_NO_CHILDREN --> [*] : ExitPostmaster(0)

Figure 2: Normal (smart) shutdown PMState progression. Immediate shutdown and crash recovery still wait for children to exit (PM_WAIT_BACKENDS, PM_WAIT_DEAD_END) but skip the shutdown-checkpoint and archival waits in between.
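A central transition function of this kind can be sketched with a handful of states. The model below is deliberately smaller than PMState and all of its names (`ToySupervisor`, `toy_state_machine`) are hypothetical; it shows the structure, where each call advances as far as current conditions allow, not the real transition rules.

```c
#include <assert.h>

/* A reduced, hypothetical set of shutdown states. */
typedef enum ToyState
{
    TOY_RUN,
    TOY_STOP_BACKENDS,     /* transient: signal children, then wait */
    TOY_WAIT_BACKENDS,
    TOY_WAIT_CHECKPOINT,
    TOY_NO_CHILDREN
} ToyState;

typedef struct ToySupervisor
{
    ToyState state;
    int      live_backends;    /* children still running */
    int      checkpoint_done;  /* shutdown checkpoint finished? */
    int      sigterm_sent;     /* stand-in for actually signalling */
} ToySupervisor;

/*
 * Advance the state as far as the current facts permit.  Called after every
 * event; it is safe to call when nothing has changed.
 */
void
toy_state_machine(ToySupervisor *s)
{
    if (s->state == TOY_STOP_BACKENDS)
    {
        s->sigterm_sent = 1;
        s->state = TOY_WAIT_BACKENDS;
    }
    if (s->state == TOY_WAIT_BACKENDS && s->live_backends == 0)
        s->state = TOY_WAIT_CHECKPOINT;
    if (s->state == TOY_WAIT_CHECKPOINT && s->checkpoint_done)
        s->state = TOY_NO_CHILDREN;
}
```

Because the checks are ordered and not mutually exclusive, one call can pass through several states when their exit conditions already hold, while a call with nothing new to report leaves the state unchanged.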

Crash recovery: HandleChildCrash and FatalError

When process_pm_child_exit finds a non-zero exit status for a critical child, it calls HandleChildCrash:

// HandleChildCrash — src/backend/postmaster/postmaster.c
static void
HandleChildCrash(int pid, int exitstatus, const char *procname)
{
    if (FatalError || Shutdown == ImmediateShutdown)
        return;   /* already in crash-recovery path */

    LogChildExit(LOG, procname, pid, exitstatus);
    ereport(LOG, (errmsg("terminating any other active server processes")));

    /* Sets FatalError=true, sends SIGQUIT to siblings */
    HandleFatalError(PMQUIT_FOR_CRASH, true);
}

FatalError = true is the flag that makes the postmaster reinitialize, rather than exit, once every child is gone. Once set, PostmasterStateMachine drives the cluster through:

Send SIGQUIT to all children (bypasses graceful shutdown path in each child — they call quickdie → _exit(2)).
Wait for all children to exit (PM_WAIT_BACKENDS → PM_WAIT_DEAD_END).
PM_NO_CHILDREN: re-create shared memory (CreateSharedMemoryAndSemaphores again, at line 3202), re-launch the startup process (which will run WAL recovery), and transition back to PM_STARTUP.

The key insight is that because the postmaster itself holds no user data, this whole cycle — signal, wait, re-create, relaunch — is just a few hundred lines of deterministic C.
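The broadcast-and-wait step can be demonstrated in isolation. The following sketch assumes a POSIX system; `toy_quickdie` and `toy_crash_broadcast` are hypothetical names, and the handler imitates only the "exit immediately with status 2" behaviour described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TOY_MAX_CHILDREN 16

/* Child-side handler: no cleanup, just leave with status 2. */
static void
toy_quickdie(int signo)
{
    (void) signo;
    _exit(2);
}

/*
 * Fork nchildren idle children, send each SIGQUIT, wait for all of them,
 * and return how many exited with status 2 (-1 on setup failure).
 */
int
toy_crash_broadcast(int nchildren)
{
    pid_t            pids[TOY_MAX_CHILDREN];
    struct sigaction sa;
    struct sigaction old;
    int              started = 0;
    int              quick_exits = 0;

    if (nchildren < 0 || nchildren > TOY_MAX_CHILDREN)
        return -1;

    /* Install the handler before fork so children inherit it race-free. */
    sa.sa_handler = toy_quickdie;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGQUIT, &sa, &old) != 0)
        return -1;

    for (; started < nchildren; started++)
    {
        pids[started] = fork();
        if (pids[started] < 0)
            break;
        if (pids[started] == 0)
        {
            for (;;)
                pause();          /* idle until signalled */
        }
    }

    /* The supervisor itself keeps its previous SIGQUIT disposition. */
    sigaction(SIGQUIT, &old, NULL);

    for (int i = 0; i < started; i++)
        kill(pids[i], SIGQUIT);

    for (int i = 0; i < started; i++)
    {
        int status;

        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 2)
            quick_exits++;
    }
    return (started == nchildren) ? quick_exits : -1;
}
```

The parent learns the outcome purely from wait statuses; it never inspects anything the children held in memory, which mirrors why the supervisor can safely discard and rebuild shared state afterwards.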

The reaping itself happens in process_pm_child_exit, the deferred-signal handler that drains every exited child with a non-blocking waitpid loop:

// process_pm_child_exit — src/backend/postmaster/postmaster.c
static void
process_pm_child_exit(void)
{
    int     pid;
    int     exitstatus;

    pending_pm_child_exit = false;

    while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
    {
        PMChild    *pmchild;

        /* Check if this child was a startup process. */
        if (StartupPMChild && pid == StartupPMChild->pid)
        {
            ReleasePostmasterChildSlot(StartupPMChild);
            StartupPMChild = NULL;

            if (Shutdown > NoShutdown &&
                (EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
            {
                StartupStatus = STARTUP_NOT_RUNNING;
                UpdatePMState(PM_WAIT_BACKENDS);
                continue;       /* PostmasterStateMachine does the rest */
            }
            /* ... unexpected startup-process exit → HandleChildCrash ... */
            continue;
        }
        /* ... checkpointer, bgwriter, walwriter, autovac, archiver, ... */

        /* otherwise it was a backend or bgworker: look it up by PID */
        pmchild = FindPostmasterChildByPid(pid);
        if (pmchild)
            CleanupBackend(pmchild, exitstatus);
    }

    /* After processing all exits, recompute the postmaster's state. */
    PostmasterStateMachine();
}

waitpid(-1, …, WNOHANG) reaps all currently-zombie children in one handler invocation, not one per SIGCHLD: Unix may coalesce multiple SIGCHLD deliveries into a single pending bit, so the loop must keep calling waitpid until it stops returning a positive PID (0 when no further child has exited yet, -1 when no children remain). Each named auxiliary (startup, checkpointer, bgwriter, …) is matched by its saved PMChild pointer; everything else falls through to CleanupBackend, which releases the pool slot and applies bgworker restart policy. The single PostmasterStateMachine() call at the bottom is what actually advances shutdown or fires the crash-restart path described below.
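The drain loop is easy to exercise on its own. In the sketch below, `toy_reap_exited` is the non-blocking loop and `toy_spawn_and_reap` is a hypothetical driver that polls with a short sleep; the polling stands in for the latch wake-up that a real event loop would use, and the driver assumes the calling process has no other children.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Reap every child that has already exited.  The loop ends when waitpid
 * returns 0 (children exist, none ready) or -1 (no children at all).
 */
int
toy_reap_exited(void)
{
    int   reaped = 0;
    int   status;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
        reaped++;
    return reaped;
}

/*
 * Fork nchildren short-lived children, then drain them.  Returns the number
 * reaped, or -1 if a fork failed (children started before the failure are
 * left unreaped in this sketch).
 */
int
toy_spawn_and_reap(int nchildren)
{
    struct timespec nap = {0, 1000000L};    /* 1 ms */
    int             total = 0;

    for (int i = 0; i < nchildren; i++)
    {
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);
    }

    /* Bounded polling: at most about five seconds in total. */
    for (int tries = 0; total < nchildren && tries < 5000; tries++)
    {
        total += toy_reap_exited();
        if (total < nchildren)
            nanosleep(&nap, NULL);
    }
    return total;
}
```

A single call to `toy_reap_exited` may collect several children or none, which is exactly why the count of signals received is never used to decide how many times to call waitpid.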

The crash-restart re-initialization is the tail of PostmasterStateMachine: once FatalError is set and every non-syslogger child has been reaped (pmState == PM_NO_CHILDREN), the postmaster rebuilds the entire in-memory cluster from scratch:

// PostmasterStateMachine (reinit tail) — src/backend/postmaster/postmaster.c
if (FatalError && pmState == PM_NO_CHILDREN)
{
    ereport(LOG,
            (errmsg("all server processes terminated; reinitializing")));

    if (remove_temp_files_after_crash)
        RemovePgTempFiles();

    ResetBackgroundWorkerCrashTimes();  /* allow bgworkers to restart now */

    shmem_exit(1);                      /* detach old shared segment */
    LocalProcessControlFile(true);      /* re-read control file */
    CreateSharedMemoryAndSemaphores();  /* fresh shared memory */

    UpdatePMState(PM_STARTUP);
    maybe_adjust_io_workers();           /* need I/O workers for recovery */
    StartupPMChild = StartChildProcess(B_STARTUP);  /* runs WAL recovery */
    StartupStatus = STARTUP_RUNNING;
    AbortStartTime = 0;

    ConfigurePostmasterWaitSet(true);    /* accept connections again */
}

Two guards sit just above this block and turn a crash into a permanent exit instead of a restart: if StartupStatus == STARTUP_CRASHED the postmaster calls ExitPostmaster(1) (“more than likely it will just fail again”), and if the restart_after_crash GUC is off it exits after logging ‘shutting down because "restart_after_crash" is off’. Only when neither guard fires does control reach the reinit block. Note that shmem_exit(1) detaches the old segment before CreateSharedMemoryAndSemaphores() maps a fresh one of identical size. The size is identical because the postmaster's in-memory settings are unchanged: a crash restart does not re-read postgresql.conf, so shared_buffers and max_connections keep their startup values.

Figure 3 — Crash detection to reinitialization

flowchart TD
    CHILD["A critical child exits<br/>non-zero (SIGCHLD)"]
    REAP["process_pm_child_exit<br/>waitpid(-1, WNOHANG) loop"]
    CRASH["HandleChildCrash<br/>FatalError = true<br/>SIGQUIT to all siblings"]
    SM1["PostmasterStateMachine"]
    WAIT["PM_WAIT_BACKENDS ... PM_WAIT_DEAD_END<br/>siblings quickdie / _exit(2)"]
    NOCHILD["PM_NO_CHILDREN<br/>all non-syslogger children reaped"]
    GUARD{"StartupStatus==CRASHED<br/>or restart_after_crash off?"}
    EXIT["ExitPostmaster(1)"]
    REINIT["shmem_exit(1) detach old segment<br/>CreateSharedMemoryAndSemaphores<br/>StartChildProcess(B_STARTUP)"]
    STARTUP["PM_STARTUP<br/>WAL recovery re-runs"]

    CHILD --> REAP --> CRASH --> SM1 --> WAIT --> NOCHILD --> GUARD
    GUARD -->|yes| EXIT
    GUARD -->|no| REINIT --> STARTUP

Figure 3 — The crash-restart cycle. SIGQUIT forces every sibling through the immediate quickdie path; only after the pool is fully drained (PM_NO_CHILDREN) does the postmaster detach shared memory, rebuild it at the same size, and relaunch the startup process to replay WAL.
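The decision at the end of the crash path is small enough to express as a pure function. The sketch below is a simplified model with hypothetical names (`ToyAction`, `toy_after_crash`); it assumes the crash flag is already set and covers only the two guards and the reinit case described above.

```c
#include <assert.h>
#include <stdbool.h>

/* What the supervisor does next on the crash path. */
typedef enum ToyAction
{
    TOY_KEEP_WAITING,   /* some children have not been reaped yet */
    TOY_EXIT,           /* give up: exit with a failure status */
    TOY_REINIT          /* rebuild shared memory and relaunch startup */
} ToyAction;

/*
 * Decide the next step after a crash has been detected.
 *   no_children          every child has been reaped
 *   startup_crashed      the process that crashed was the startup process
 *   restart_after_crash  the administrator allows automatic restart
 */
ToyAction
toy_after_crash(bool no_children, bool startup_crashed,
                bool restart_after_crash)
{
    if (!no_children)
        return TOY_KEEP_WAITING;
    if (startup_crashed)
        return TOY_EXIT;        /* recovery itself failed; retrying is futile */
    if (!restart_after_crash)
        return TOY_EXIT;        /* restart disabled by configuration */
    return TOY_REINIT;
}
```

Ordering the guards before the reinit case keeps the expensive path (detach, re-create, relaunch) unreachable unless both conditions for a useful restart hold.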

LaunchMissingBackgroundProcesses

After every event-loop iteration, LaunchMissingBackgroundProcesses inspects pmState and fills in any background process that should be running but isn’t:

B_CHECKPOINTER and B_BG_WRITER are wanted in PM_STARTUP, PM_RECOVERY, PM_HOT_STANDBY, and PM_RUN.
B_WAL_WRITER and B_AUTOVAC_LAUNCHER are wanted only in PM_RUN.
B_ARCHIVER is wanted in PM_RUN when archiving is active, and also in PM_RECOVERY and PM_HOT_STANDBY when archive_mode = always.
B_IO_WORKER count is managed dynamically by maybe_adjust_io_workers, driven by the io_workers GUC when io_method = worker (PG18).
B_WAL_SUMMARIZER is wanted in PM_RUN and PM_HOT_STANDBY when summarize_wal = on (PG17 and later).

This “lazy launch on every iteration” pattern means the postmaster never needs explicit “restart background process X” code paths — if a background process exits normally, the next loop iteration will relaunch it.
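This reconcile-to-desired-state idea can be sketched with a table of "wanted in which states" bitmasks. The states, process kinds, and names below (`toy_launch_missing`, `toy_running`) are hypothetical and much reduced; launching is modelled by setting a flag rather than forking.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, hypothetical supervisor states and background process kinds. */
typedef enum ToyPMState
{
    TOY_PM_STARTUP,
    TOY_PM_RECOVERY,
    TOY_PM_RUN
} ToyPMState;

typedef enum ToyProc { TOY_CHECKPOINTER, TOY_WAL_WRITER, TOY_NUM_PROCS } ToyProc;

#define STATE_BIT(s) (1u << (s))

/* For each process kind, the set of states in which it should be running. */
static const unsigned toy_wanted_in[TOY_NUM_PROCS] = {
    [TOY_CHECKPOINTER] = STATE_BIT(TOY_PM_STARTUP) |
                         STATE_BIT(TOY_PM_RECOVERY) |
                         STATE_BIT(TOY_PM_RUN),
    [TOY_WAL_WRITER]   = STATE_BIT(TOY_PM_RUN),
};

/* Which processes are currently alive (cleared when one exits). */
bool toy_running[TOY_NUM_PROCS];

/*
 * Start every process that is wanted in the given state but not running.
 * Returns how many were launched; calling it again immediately returns 0.
 */
int
toy_launch_missing(ToyPMState state)
{
    int launched = 0;

    for (int p = 0; p < TOY_NUM_PROCS; p++)
    {
        if (!toy_running[p] && (toy_wanted_in[p] & STATE_BIT(state)))
        {
            toy_running[p] = true;   /* stand-in for forking the process */
            launched++;
        }
    }
    return launched;
}
```

The exit handler only has to clear the running flag; it needs no knowledge of whether or when the process should come back, because the next reconcile pass decides that from the current state alone.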

Source Walkthrough

PostmasterMain and startup

PostmasterMain (postmaster.c:494) — the cluster entry point; GUC init, config load, shared-memory creation, child-slot init, listen-socket binding, syslogger + startup-process launch, then ServerLoop.
InitProcessGlobals (postmaster.c:1933) — sets MyProcPid, MyStartTimestamp, initializes MyLatch and the random seed before any child is forked.
CreateSharedMemoryAndSemaphores — called at line 1004 on first startup and again at line 3202 after a crash restart. Sizes and allocates the shared segment; every ShmemInitStruct call later carves from this slab.

PMChild pool

InitPostmasterChildSlots (pmchild.c:86) — called once at startup; allocates the PMChild array and partitions it into per-type freelists.
AssignPostmasterChildSlot (pmchild.c:162) — pops a slot from the appropriate freelist before forking; links the slot onto ActiveChildList.
AllocDeadEndChild (pmchild.c:208) — heap-allocates a PMChild for dead-end backends; not drawn from the pool.
ReleasePostmasterChildSlot (pmchild.c:236) — returns a slot to its freelist after a child exits; called from CleanupBackend and process_pm_child_exit for named children.
FindPostmasterChildByPid (pmchild.c:274) — O(n) scan of ActiveChildList; used in process_pm_child_exit for the “was it a backend or bgworker?” check.

ServerLoop and event dispatch

ServerLoop (postmaster.c:1653) — the for(;;) around WaitEventSetWait; dispatches deferred-signal work and accepts.
ConfigurePostmasterWaitSet (postmaster.c:1630) — builds the WaitEventSet with the latch plus all listen sockets; called again with accept_connections=false during shutdown to stop accepting new connections.
BackendStartup (postmaster.c:3518) — acquires a child slot, calls postmaster_child_launch, records the PID; handles the CAC_TOOMANY → dead-end path.
canAcceptConnections (postmaster.c:1812) — checks pmState, connsAllowed, connection count vs limits; returns a CAC_state enum.
LaunchMissingBackgroundProcesses (postmaster.c:3267) — iterates all background types and launches any that should be running; called at the bottom of every ServerLoop iteration.

Child launch (launch_backend.c)

postmaster_child_launch (launch_backend.c:229) — fork() on Unix; internal_forkexec on Windows. Child calls InitPostmasterChild(), closes postmaster ports, then invokes child_process_kinds[type].main_fn.
PostmasterChildName (launch_backend.c:211) — maps BackendType to a human-readable string for log messages and ps display.

State machine and crash recovery

PostmasterStateMachine (postmaster.c:2865) — called after every signal-driven work function; the single function that decides PMState transitions and which signals to send.
HandleChildCrash (postmaster.c:2772) — logs the crash, calls HandleFatalError(PMQUIT_FOR_CRASH, true) to set FatalError and broadcast SIGQUIT.
CleanupBackend (postmaster.c:2550) — releases the PMChild slot for a backend or bgworker after exit; updates bgworker restart logic; calls PostmasterStateMachine.
process_pm_child_exit (postmaster.c:2233) — the SIGCHLD-driven reaper; waitpid(-1, WNOHANG) loop; dispatches to per-type handlers or to CleanupBackend; calls PostmasterStateMachine at the end.

Position hints (as of 2026-06-05, commit 273fe94)

Symbol	File	Line
`BackendType` enum	`src/include/miscadmin.h`	337
`PMState` enum	`src/backend/postmaster/postmaster.c`	336
`PostmasterMain`	`src/backend/postmaster/postmaster.c`	494
`CreateSharedMemoryAndSemaphores` (first call)	`src/backend/postmaster/postmaster.c`	1004
`InitPostmasterChildSlots` call	`src/backend/postmaster/postmaster.c`	952
`ServerLoop`	`src/backend/postmaster/postmaster.c`	1653
`ConfigurePostmasterWaitSet`	`src/backend/postmaster/postmaster.c`	1630
`canAcceptConnections`	`src/backend/postmaster/postmaster.c`	1812
`InitProcessGlobals`	`src/backend/postmaster/postmaster.c`	1933
`CleanupBackend`	`src/backend/postmaster/postmaster.c`	2550
`process_pm_child_exit`	`src/backend/postmaster/postmaster.c`	2233
`HandleChildCrash`	`src/backend/postmaster/postmaster.c`	2772
`PostmasterStateMachine`	`src/backend/postmaster/postmaster.c`	2865
`LaunchMissingBackgroundProcesses`	`src/backend/postmaster/postmaster.c`	3267
`BackendStartup`	`src/backend/postmaster/postmaster.c`	3518
`CreateSharedMemoryAndSemaphores` (crash-restart call)	`src/backend/postmaster/postmaster.c`	3202
`postmaster_child_launch`	`src/backend/postmaster/launch_backend.c`	229
`PostmasterChildName`	`src/backend/postmaster/launch_backend.c`	211
`MaxLivePostmasterChildren`	`src/backend/postmaster/pmchild.c`	70
`InitPostmasterChildSlots`	`src/backend/postmaster/pmchild.c`	86
`AssignPostmasterChildSlot`	`src/backend/postmaster/pmchild.c`	162
`AllocDeadEndChild`	`src/backend/postmaster/pmchild.c`	208
`ReleasePostmasterChildSlot`	`src/backend/postmaster/pmchild.c`	236
`FindPostmasterChildByPid`	`src/backend/postmaster/pmchild.c`	274

Source verification (as of 2026-06-05)

Verified facts

BackendType has 18 members in REL_18_STABLE, including B_IO_WORKER, which is new in PG18, and B_WAL_SUMMARIZER, which arrived with incremental backup in PG17. Verified at src/include/miscadmin.h:337–375, commit 273fe94. B_IO_WORKER is the async-I/O worker added with storage/aio/; it does not call InitPostgres and does not hold heavyweight locks. B_WAL_SUMMARIZER supports incremental backup. B_IO_WORKER does not exist in PG17; do not assert it when describing earlier releases.
PMState has 13 values; PM_HOT_STANDBY is distinct from PM_RECOVERY. Verified at postmaster.c:336–352. The distinction matters for canAcceptConnections: client connections are accepted in PM_HOT_STANDBY (and PM_RUN) but rejected in PM_RECOVERY. It does not affect the archiver, which with archive_mode=always is wanted in both states.
Signal handlers set boolean flags only; no async-signal-unsafe work in handlers. Verified: handle_pm_child_exit_signal sets pending_pm_child_exit = true and SetLatch(MyLatch) only (postmaster.c:2223–2231). All reaping happens in process_pm_child_exit in the main loop.
ServerLoop closes the accepted socket in the postmaster after BackendStartup returns. Verified at postmaster.c:1704–1712: closesocket(s.sock) is called in the parent after BackendStartup returns, whether or not the fork succeeded. The child inherits the open file descriptor through fork(); the postmaster does not need it.
Dead-end children are heap-allocated, not pool-allocated. Verified in pmchild.c:208–234 (AllocDeadEndChild): uses palloc from TopMemoryContext, not pmchild_pools. The comment confirms “There is no limit on the number of dead-end backends.”
CreateSharedMemoryAndSemaphores is called twice: at startup and on crash restart. Verified at postmaster.c:1004 (startup) and postmaster.c:3202 (crash restart, inside the PM_NO_CHILDREN branch of PostmasterStateMachine). On crash restart the old shared segment is detached before the new one is allocated.
LaunchMissingBackgroundProcesses is called on every ServerLoop iteration, not only on child-exit events. Verified at postmaster.c:1718: it is the last statement in the event dispatch block, outside the for (int i = 0; i < nevents; i++) loop. This ensures background processes are relaunched even if no events fired (e.g., after a config reload that enables WAL archiving).

Open questions

B_IO_WORKER lifecycle details. maybe_adjust_io_workers (called from LaunchMissingBackgroundProcesses) dynamically starts and stops B_IO_WORKER processes based on GUC values. The exact algorithm for deciding how many workers to maintain — and whether excess workers are sent SIGTERM or allowed to exit naturally — is not traced in this document. Investigation path: read maybe_adjust_io_workers and the io_worker_* functions in storage/aio/.
PM_WAIT_XLOG_ARCHIVAL transition trigger. The condition that moves pmState from PM_WAIT_XLOG_SHUTDOWN to PM_WAIT_XLOG_ARCHIVAL (the shutdown checkpoint is “written enough”) is driven by a pmsignal from the checkpointer. The exact PMSIGNAL_* value and the handshake are not detailed here. Investigation path: review the PMSIGNAL_* reasons declared in src/include/storage/pmsignal.h and find the one that checkpointer.c sends after the shutdown checkpoint.
EXEC_BACKEND (Windows) re-entry path. On Windows, postmaster_child_launch calls internal_forkexec instead of fork(). The child re-enters via SubPostmasterMain, deserializes BackendParameters, and then calls the appropriate main_fn. The exact parameters serialized and the interaction with MyClientSocket on the Windows path are not analyzed here. Investigation path: read save_backend_variables / restore_backend_variables in launch_backend.c.
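
The reason a fork-exec style launch needs an explicit handoff is that an exec'd child starts with a fresh address space, so anything the parent wants it to know must be written somewhere the child can read. The sketch below shows only that general save-then-restore pattern. It is an assumption-laden illustration: ChildParams and its fields are hypothetical and do not mirror the real BackendParameters, and a temporary file is just one possible transport.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parameter block; the real BackendParameters differs. */
typedef struct ChildParams
{
    int  child_slot;        /* which supervisor slot the child occupies */
    int  client_fd;         /* only meaningful if the fd is also inherited */
    char data_dir[64];      /* fixed-size so the struct is flat */
} ChildParams;

/* Parent side: write the flat struct to a stream the child can open. */
int save_child_params(FILE *fp, const ChildParams *p)
{
    return fwrite(p, sizeof(*p), 1, fp) == 1 ? 0 : -1;
}

/* Child side: read the struct back before doing anything else. */
int restore_child_params(FILE *fp, ChildParams *p)
{
    return fread(p, sizeof(*p), 1, fp) == 1 ? 0 : -1;
}

/* Demonstration helper: save to a temp file, rewind, restore. */
int roundtrip_child_params(const ChildParams *in, ChildParams *out)
{
    FILE *fp;
    int   rc = -1;

    assert(in != NULL && out != NULL);
    fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (save_child_params(fp, in) == 0)
    {
        rewind(fp);
        rc = restore_child_params(fp, out);
    }
    fclose(fp);
    return rc;
}
```

The struct is kept free of pointers on purpose: a pointer value written by the parent would be meaningless in the child's new address space, which is why a design of this kind has to flatten or re-derive such state.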

Beyond PostgreSQL — Comparative Designs & Research Frontiers

Supervisor-per-instance vs. thread-pool models. Architecture of a Database System (Hellerstein et al., 2007, §2.2–2.3) compares the process-per-session, thread-per-session, and process-pool models. PostgreSQL’s postmaster-forks-on-demand is the “process-per-session” variant; the cost is a fork() plus backend initialization for every new connection. External connection poolers (PgBouncer, pgpool-II) address this by multiplexing many client connections onto fewer backend processes; built-in pooling has been proposed on the PostgreSQL mailing lists but is not part of REL_18. Understanding the per-connection cost of BackendStartup illuminates what a pool amortizes.
“The Design of POSTGRES” (Stonebraker & Rowe, 1986). The original design paper explicitly chose the process model for simplicity and isolation. Comparing the 1986 description of the “postmaster” with the current PostmasterMain (now ~900 lines vs. a few pages of prose) makes a concrete evolution-of-complexity exercise: what did the original design defer that REL_18 must handle (Windows EXEC_BACKEND, PG18 async I/O workers, WAL summarizer, incremental backup, hot standby)?
Oracle’s process model and PMON. Oracle uses a similar supervisor/worker split, but its coordinator (PMON — Process MONitor) is itself a background process, not the parent of all other processes. PMON detects failed sessions and cleans up their resources in the shared pool. PostgreSQL’s postmaster plays both roles: parent-process supervisor (via SIGCHLD) and crash-recovery coordinator (via HandleChildCrash). A comparison would highlight the trade-off between polling (PMON) and signal-driven reaping (SIGCHLD).
MySQL’s thread-per-connection model. MySQL’s “one-thread-per-connection” server skips the fork() cost entirely: a new connection is a pthread_create. The penalty is that a bug in one thread’s stack can corrupt another session’s allocations. The absence of a “postmaster” equivalent means MySQL’s crash recovery operates at the engine level (InnoDB recovery on restart) rather than the process-supervisor level.
Greenplum / Citus: coordinating a postmaster fleet. Distributed PostgreSQL variants run one postmaster per node (a Greenplum segment or a Citus worker). Backends on the coordinator dispatch query fragments to backends on the other nodes over libpq-based connections; each node’s postmaster only accepts those connections and forks for them, as it does for any other client. The postmaster’s crash-isolation property becomes essential in this setting: a segment crash can be recovered in isolation without restarting the coordinator.
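
The isolation trade-off that runs through these comparisons can be seen in a few lines of C. The following is a minimal sketch, not code from PostgreSQL or MySQL: session_state is a hypothetical stand-in for one session's private data, and the two helper functions are invented for the illustration. A forked child writes to its own copy of the variable, while a thread writes to the one copy the whole process shares.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for data that belongs to one session. */
static int session_state = 0;

static void *thread_body(void *arg)
{
    (void) arg;
    session_state = 1;          /* writes the process-wide variable */
    return NULL;
}

/* What the parent sees after a forked child overwrites the variable. */
int value_after_child_process_writes(void)
{
    pid_t pid;

    session_state = 0;
    pid = fork();
    assert(pid >= 0);
    if (pid == 0)
    {
        session_state = 1;      /* changes the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return session_state;       /* parent's copy is untouched */
}

/* What the caller sees after a thread overwrites the variable. */
int value_after_thread_writes(void)
{
    pthread_t tid;
    int       rc;

    session_state = 0;
    rc = pthread_create(&tid, NULL, thread_body, NULL);
    assert(rc == 0);
    pthread_join(tid, NULL);
    return session_state;       /* the thread's write is visible here */
}
```

The first function returns 0 and the second returns 1. On toolchains where the pthread functions are not part of the C library itself, the example needs to be built with -pthread. Shared memory is the deliberate exception to the process-side result: it is the one region every PostgreSQL child can write, which is why a crashed backend triggers a cluster-wide reset rather than a single-process cleanup.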

Sources

Raw source files consumed

None (synthesized directly from source tree at REL_18_STABLE / commit 273fe94).

Source code paths (REL_18_STABLE / commit 273fe94)

src/backend/postmaster/postmaster.c — PostmasterMain, ServerLoop, BackendStartup, canAcceptConnections, process_pm_child_exit, HandleChildCrash, PostmasterStateMachine, LaunchMissingBackgroundProcesses, CleanupBackend, PMState enum
src/backend/postmaster/launch_backend.c — postmaster_child_launch, PostmasterChildName, SubPostmasterMain (Windows re-entry), save_backend_variables / restore_backend_variables
src/backend/postmaster/pmchild.c — PMChild pool, PMChildPool, InitPostmasterChildSlots, AssignPostmasterChildSlot, AllocDeadEndChild, ReleasePostmasterChildSlot, FindPostmasterChildByPid
src/include/miscadmin.h — BackendType enum, AmRegularBackendProcess

Textbook and paper references

Hellerstein, J. M., Stonebraker, M., and Hamilton, J. “Architecture of a Database System.” Foundations and Trends in Databases 1(2), 2007. §2 (process models).
Stonebraker, M., and Rowe, L. A. “The Design of POSTGRES.” SIGMOD 1986. (Process model rationale and original postmaster concept.)