PostgreSQL Postmaster — Cluster Supervisor, Process Lifecycle, and Crash Recovery
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Every multi-user database server must answer a foundational question: how does the server program become more than one concurrent unit of execution, and who coordinates those units? The answer shapes fault isolation, memory visibility, scheduling behavior, and the entire crash-recovery story.
Two classical architectural answers dominate the literature:
- One supervisor + per-client workers (process or thread). A long-lived coordinator accepts new connections and delegates work to children. The coordinator never touches user SQL; it manages membership, monitors health, and restarts failed workers. Examples: Apache httpd (prefork), PostgreSQL (process model), original Oracle dedicated-server mode.
- Single-process multithreaded server. One process handles all connections on a thread pool. A single address space reduces communication cost but couples all sessions: a wild pointer in one thread can corrupt another session’s state. Examples: MySQL, SQL Server, Oracle when run in its threaded execution mode.
Architecture of a Database System (Hellerstein et al., 2007, §2) surveys both models and notes that the process model offers stronger fault isolation at the cost of higher per-connection overhead, while the thread model is cheaper to create sessions but harder to make crash-safe. PostgreSQL’s founding paper (“The Design of POSTGRES”, Stonebraker & Rowe, 1986) adopted the process model explicitly for robustness: a bug in one backend cannot corrupt another backend’s stack, and the operating system enforces address-space separation at no application cost.
The supervisor role adds a second design question: what is the supervisor’s relationship to shared state? In PostgreSQL the answer is sharp: the postmaster creates shared memory once, sizes it permanently, and then becomes a pure process manager. It never reads user data. Every child process attaches to the same shared segment. The shared structures — buffer pool, lock table, procarray, sinval ring — are the cluster’s runtime state; the postmaster’s only ownership is the file descriptor to the listen socket and the mapping of child PIDs to their roles.
This design has a critical implication for crash recovery: because the postmaster never participates in transactions or holds shared-memory state itself, it can detect a child crash (via SIGCHLD), signal all siblings (SIGTERM for graceful, SIGQUIT for immediate), wait for them to exit, and then re-create shared memory from scratch. The cluster’s durable state lives in WAL and in data files — not in the postmaster process — so “restart” means “re-attach to the same on-disk state after rebuilding the in-memory machine.”
Common DBMS Design
The postmaster pattern recurs across process-model database servers. Understanding the generic idioms makes PostgreSQL’s specific choices legible as one point in a well-mapped design space.
The supervisor never does query work
In every mature process-model server, the supervisor process is a thin
loop: accept() a connection, fork() (or spawn) a worker, hand off
the file descriptor, go back to sleep. The supervisor holds no
transaction state, no cached plan, no open relation. This is the critical
invariant that makes crash recovery deterministic: if the supervisor’s
memory is always clean, any child crash is bounded.
Contrast with early implementations that let the coordinator service some requests itself — any corruption in one request then touches the coordinator’s heap, and the entire server must restart.
Fixed shared-memory segment sized at startup
Sharing data between processes requires an explicitly mapped shared-memory region. The region must be sized before any child runs, because most platforms cannot grow a SysV shared-memory segment while it is attached. The universal pattern:
1. Supervisor calculates the total required size from configuration (max_connections, shared_buffers, max_locks_per_transaction, …).
2. Supervisor allocates the segment (shmget/mmap).
3. Each child is fork()ed after the segment exists and inherits the mapping.
4. On crash restart, the supervisor detaches the old segment, creates a fresh one of the same size, and re-initializes.
Steps 1 and 2 are why max_connections and shared_buffers require a server
restart: they determine the segment size, which is fixed for the
lifetime of the postmaster.
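The size-then-map-then-fork sequence can be sketched in a few lines of POSIX C. This is a minimal illustration rather than PostgreSQL's code: DemoConfig, the 256-byte per-connection figure, and every demo_* function are hypothetical names invented for the example, and an anonymous shared mmap stands in for whichever mechanism a real server uses.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical configuration; the field names only echo PostgreSQL GUCs. */
typedef struct DemoConfig
{
    size_t max_connections;
    size_t shared_buffers;      /* counted in 8 kB pages */
} DemoConfig;

/* Step 1: derive the total segment size from configuration alone. */
static size_t
demo_segment_size(const DemoConfig *cfg)
{
    const size_t per_conn = 256;    /* assumed bookkeeping bytes per connection */
    const size_t page = 8192;

    assert(cfg != NULL);
    return cfg->max_connections * per_conn + cfg->shared_buffers * page;
}

/* Step 2: map one shared region; returns NULL on failure. */
static void *
demo_create_segment(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/*
 * Step 3: fork a child after the segment exists. The child inherits the
 * mapping and writes into it; the parent reads the value back after the
 * child has exited. Returns the observed value, or -1 on error.
 */
static int
demo_child_writes(int *slot, int value)
{
    int   status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        *slot = value;
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return *slot;
}
```

Because the size is computed once and the mapping is created before any fork(), changing either configuration value in this sketch would require unmapping and starting over, which is the same constraint that forces a restart in the real server.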
PMChild pool: a fixed roster of live children
The supervisor must track which child has which role so that when a child exits, it knows whether to restart it, signal siblings, or transition shutdown state. The pattern is a fixed-size array (or pool) of child descriptors, one slot per permitted child, allocated from the shared-memory layout but managed only by the supervisor. Each slot records: PID, role (BackendType), and any role-specific state (pointer to background worker registration, notification flags, etc.).
Dead-end children — processes forked to send an error to a client before dying — are the exception: they are not counted against the pool because they consume no shared-memory resources.
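A fixed roster with one freelist per role can be sketched as follows. The names (ChildSlot, ChildRoster, roster_*), the two roles, and the MAX_SLOTS cap are all hypothetical; the sketch uses a plain singly linked freelist and is not a copy of any PostgreSQL data structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical roles and capacity, chosen only for illustration. */
enum { ROLE_BACKEND, ROLE_WORKER, NUM_ROLES };
#define MAX_SLOTS 8

typedef struct ChildSlot
{
    int               pid;        /* 0 while the slot is free */
    int               role;
    struct ChildSlot *next_free;  /* freelist link */
} ChildSlot;

typedef struct ChildRoster
{
    ChildSlot  slots[MAX_SLOTS];
    ChildSlot *freelist[NUM_ROLES];
} ChildRoster;

/* Partition the flat array into one freelist per role. */
static void
roster_init(ChildRoster *r, const int sizes[NUM_ROLES])
{
    int next = 0;

    for (int role = 0; role < NUM_ROLES; role++)
    {
        r->freelist[role] = NULL;
        for (int i = 0; i < sizes[role]; i++)
        {
            ChildSlot *s;

            assert(next < MAX_SLOTS);
            s = &r->slots[next++];
            s->pid = 0;
            s->role = role;
            s->next_free = r->freelist[role];
            r->freelist[role] = s;
        }
    }
}

/* Pop a free slot for the role; NULL means the role is at its cap. */
static ChildSlot *
roster_assign(ChildRoster *r, int role, int pid)
{
    ChildSlot *s = r->freelist[role];

    if (s == NULL)
        return NULL;
    r->freelist[role] = s->next_free;
    s->pid = pid;
    return s;
}

/* Return a slot to the freelist of its own role after the child exits. */
static void
roster_release(ChildRoster *r, ChildSlot *s)
{
    assert(s->pid != 0);
    s->pid = 0;
    s->next_free = r->freelist[s->role];
    r->freelist[s->role] = s;
}
```

A NULL return from roster_assign is the point where a supervisor would fall back to an unpooled dead-end child: the connection still gets an error message, but no pooled slot is consumed.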
State-machine-driven shutdown
A clean, deterministic shutdown of a multi-process server requires sequencing: stop client backends first, wait for WAL to flush, take a shutdown checkpoint, stop archiver and walsenders, stop remaining infrastructure. Ad-hoc code that handles each subprocess type independently tends to miss interactions (e.g., a walsender that keeps replication slots open while archiver tries to retire WAL). The clean design is an explicit finite-state machine in the supervisor, with well-named states and one central function that decides which signal to send and when.
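One way to picture the "one central function" idea is a transition function that is re-run after every event and advances as far as the current facts allow. The states, the DemoCluster fields, and demo_state_machine below are hypothetical and far simpler than a real shutdown sequence; the sketch only shows the shape of the technique.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum DemoState
{
    DS_RUN,
    DS_STOP_CLIENTS,        /* transient: signal clients, then wait */
    DS_WAIT_CLIENTS,
    DS_WAIT_CHECKPOINT,
    DS_WAIT_ARCHIVAL,
    DS_NO_CHILDREN
} DemoState;

/* Facts the supervisor updates as events arrive (all hypothetical). */
typedef struct DemoCluster
{
    DemoState state;
    bool      shutdown_requested;
    int       live_clients;
    bool      checkpoint_done;
    int       live_archivers;
} DemoCluster;

/* Advance until no further transition is possible; return the new state. */
static DemoState
demo_state_machine(DemoCluster *c)
{
    assert(c != NULL);

    for (;;)
    {
        DemoState before = c->state;

        switch (c->state)
        {
            case DS_RUN:
                if (c->shutdown_requested)
                    c->state = DS_STOP_CLIENTS;
                break;
            case DS_STOP_CLIENTS:
                /* a real supervisor would signal its clients here */
                c->state = DS_WAIT_CLIENTS;
                break;
            case DS_WAIT_CLIENTS:
                if (c->live_clients == 0)
                    c->state = DS_WAIT_CHECKPOINT;
                break;
            case DS_WAIT_CHECKPOINT:
                if (c->checkpoint_done)
                    c->state = DS_WAIT_ARCHIVAL;
                break;
            case DS_WAIT_ARCHIVAL:
                if (c->live_archivers == 0)
                    c->state = DS_NO_CHILDREN;
                break;
            case DS_NO_CHILDREN:
                break;
        }

        if (c->state == before)
            return c->state;
    }
}
```

Keeping every transition in one function means the ordering constraints (clients before checkpoint, checkpoint before archival) are visible in one place instead of being spread across per-process exit handlers.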
Theory ↔ PostgreSQL mapping
| Generic concept | PostgreSQL name |
|---|---|
| Supervisor process | postmaster (postmaster.c) |
| Supervisor main loop | ServerLoop |
| Per-connection accept | BackendStartup → postmaster_child_launch |
| Child role identifier | BackendType enum (miscadmin.h) |
| Fixed child-slot roster | PMChild pool (pmchild.c) |
| Shared-memory sizing + allocation | CalculateShmemSize → CreateSharedMemoryAndSemaphores |
| Supervisor state machine | PMState enum + PostmasterStateMachine |
| Child-crash handler | HandleChildCrash → HandleFatalError |
| Dead-end child (no slot) | AllocDeadEndChild / B_DEAD_END_BACKEND |
| Background process roster | LaunchMissingBackgroundProcesses |
PostgreSQL’s Approach
The single binary, many roles
PostgreSQL ships as one binary (postgres). What a process does is
determined by the BackendType value stored in the global MyBackendType,
set before the process calls its role-specific main_fn. The full enum in
REL_18_STABLE:

```c
/* BackendType (src/include/miscadmin.h) */
typedef enum BackendType
{
    B_INVALID = 0,

    /* Backend-like processes (call PostgresMain or a thin wrapper) */
    B_BACKEND,              /* regular client-serving backend */
    B_DEAD_END_BACKEND,     /* forked only to send an error to the client */
    B_AUTOVAC_LAUNCHER,
    B_AUTOVAC_WORKER,
    B_BG_WORKER,
    B_WAL_SENDER,
    B_SLOTSYNC_WORKER,
    B_STANDALONE_BACKEND,   /* postgres --single (single-user mode) */

    /* Auxiliary processes (no database binding, no heavyweight locks) */
    B_ARCHIVER,
    B_BG_WRITER,
    B_CHECKPOINTER,
    B_IO_WORKER,            /* PG18: async I/O worker */
    B_STARTUP,
    B_WAL_RECEIVER,
    B_WAL_SUMMARIZER,       /* PG17+: WAL summarization for incremental backup */
    B_WAL_WRITER,

    B_LOGGER,               /* syslogger: does not attach to shared memory */
} BackendType;
```

The distinction between “backend-like” and “auxiliary” is architectural:
backend-like processes call InitPostgres and can hold heavyweight locks;
auxiliary processes have simpler initialization paths and exist to support
the cluster infrastructure regardless of which client databases are open.
B_IO_WORKER is new in PG18 and must not be asserted for PG17.
B_WAL_SUMMARIZER arrived one release earlier, with incremental backup in
PG17, so it is absent only from PG16 and older.
PostmasterMain: cluster startup sequence
PostmasterMain is the entry point when the binary is invoked as a server
(postgres -D $PGDATA). Its startup sequence, before ServerLoop:

```c
/* PostmasterMain (src/backend/postmaster/postmaster.c), condensed */
void
PostmasterMain(int argc, char *argv[])
{
    InitProcessGlobals();           /* PID, latch, random seed */
    PostmasterPid = MyProcPid;
    IsPostmasterEnvironment = true;

    /* parse argv, read postgresql.conf, validate DataDir */
    InitializeGUCOptions();
    /* ... option parsing ... */
    SelectConfigFiles(userDoption, progname);
    checkDataDir();
    checkControlFile();
    ChangeToDataDir();

    /* install postmaster signal handlers */
    pqsignal(SIGHUP, handle_pm_reload_request_signal);
    pqsignal(SIGTERM, handle_pm_shutdown_request_signal);
    pqsignal(SIGQUIT, handle_pm_shutdown_request_signal);
    pqsignal(SIGCHLD, handle_pm_child_exit_signal);
    pqsignal(SIGUSR1, handle_pm_pmsignal_signal);

    /* allocate PMChild pool (call site at line 952, before shared memory) */
    InitPostmasterChildSlots();

    /* *** fixed-size shared memory allocated here (line 1004) *** */
    CreateSharedMemoryAndSemaphores();

    /* open listen sockets, write postmaster.pid */
    /* ... condensed ... */

    /* launch syslogger, startup process */
    /* ... condensed: StartSysLogger(), StartChildProcess(B_STARTUP) ... */

    ServerLoop();                   /* never returns */
}
```

CreateSharedMemoryAndSemaphores is the point of no return: it sizes and
allocates the shared segment from which every later ShmemInitStruct call
carves its slice. A max_connections change requires re-running this
entire sequence because the lock table, procarray, and sinval ring are
all sized from it; the buffer pool is sized from shared_buffers, which is
fixed at the same moment.
The PMChild pool
InitPostmasterChildSlots (in pmchild.c) allocates a flat array of
PMChild structs and partitions it into per-BackendType freelists:

```c
/* pmchild.c: pool structure */
typedef struct PMChildPool
{
    int         size;           /* slots reserved for this BackendType */
    int         first_slotno;   /* index into the flat array */
    dlist_head  freelist;       /* currently unused PMChild entries */
} PMChildPool;

static PMChildPool pmchild_pools[BACKEND_NUM_TYPES];
NON_EXEC_STATIC int num_pmchild_slots = 0;
dlist_head  ActiveChildList;    /* all live children including dead-ends */
```

When the postmaster forks a new backend, AssignPostmasterChildSlot pops
a slot from the appropriate freelist and links it onto ActiveChildList.
When the child exits, ReleasePostmasterChildSlot returns the slot to its
freelist. Dead-end children (B_DEAD_END_BACKEND) are the exception:
AllocDeadEndChild heap-allocates a PMChild outside the pool; there
is no cap on them because they consume no shared resources.
ServerLoop: the event loop
After startup, the postmaster enters ServerLoop, a for(;;) around
WaitEventSetWait:

```c
/* ServerLoop (src/backend/postmaster/postmaster.c), condensed */
static int
ServerLoop(void)
{
    /* ... declarations omitted ... */
    ConfigurePostmasterWaitSet(true);   /* latch + all listen sockets */

    for (;;)
    {
        nevents = WaitEventSetWait(pm_wait_set, DetermineSleepTime(),
                                   events, lengthof(events), 0);

        for (int i = 0; i < nevents; i++)
        {
            if (events[i].events & WL_LATCH_SET)
                ResetLatch(MyLatch);

            /* process deferred signals in priority order */
            if (pending_pm_shutdown_request)
                process_pm_shutdown_request();
            if (pending_pm_reload_request)
                process_pm_reload_request();
            if (pending_pm_child_exit)
                process_pm_child_exit();
            if (pending_pm_pmsignal)
                process_pm_pmsignal();

            if (events[i].events & WL_SOCKET_ACCEPT)
            {
                AcceptConnection(events[i].fd, &s);
                BackendStartup(&s);     /* fork a new backend */
                closesocket(s.sock);    /* postmaster does not keep it */
            }
        }

        LaunchMissingBackgroundProcesses();
        /* ... periodic: recheck postmaster.pid, touch socket files ... */
    }
}
```

Signal handlers (SIGHUP, SIGTERM, SIGCHLD, SIGUSR1) do nothing more than
set pending_pm_* boolean flags and set the latch. All actual work
happens in the main loop. This deferred-signal discipline keeps
async-signal-unsafe operations (such as malloc or stdio) out of the handlers.
Figure 1 — Postmaster event loop: signals set latches; the main loop dispatches
flowchart TD
WES["WaitEventSetWait\n(blocks on latch + listen sockets)"]
SIG["Signal arrives\nSIGHUP / SIGTERM / SIGCHLD / SIGUSR1"]
LATCH["Set pending_pm_* flag\nSetLatch"]
RESET["ResetLatch"]
SDOWN["process_pm_shutdown_request"]
RELOAD["process_pm_reload_request"]
CEXIT["process_pm_child_exit"]
PMSIG["process_pm_pmsignal"]
ACCEPT["WL_SOCKET_ACCEPT:\nAcceptConnection → BackendStartup"]
LAUNCH["LaunchMissingBackgroundProcesses"]
WES -->|woken| RESET
SIG --> LATCH --> WES
RESET --> SDOWN
SDOWN --> RELOAD --> CEXIT --> PMSIG --> ACCEPT --> LAUNCH --> WES
Figure 1 — The postmaster’s ServerLoop is a pure event dispatcher. Signal handlers only flip boolean flags; all logic runs in the foreground loop after WaitEventSetWait returns.
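The flag-only handler discipline can be reproduced in a small standalone C sketch. The names pending_reload, pending_child_exit, and the demo_* functions are hypothetical, and the sketch leaves out the wakeup mechanism (the latch and wait set) entirely: it only shows handlers recording an event and the foreground code doing the work later.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Flags written by handlers, read and cleared by the main loop. */
static volatile sig_atomic_t pending_reload = 0;
static volatile sig_atomic_t pending_child_exit = 0;

/* Handlers only record that something happened. */
static void
on_sighup(int signo)
{
    (void) signo;
    pending_reload = 1;
}

static void
on_sigchld(int signo)
{
    (void) signo;
    pending_child_exit = 1;
}

/* Install both handlers; returns 0 on success, -1 on failure. */
static int
demo_install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;

    sa.sa_handler = on_sighup;
    if (sigaction(SIGHUP, &sa, NULL) != 0)
        return -1;
    sa.sa_handler = on_sigchld;
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;
    return 0;
}

/*
 * Called from the main loop after each wakeup. Does the real work for
 * every pending flag and returns how many kinds of work were processed.
 */
static int
demo_process_pending(int *reloads, int *reaps)
{
    int handled = 0;

    assert(reloads != NULL && reaps != NULL);

    if (pending_reload)
    {
        pending_reload = 0;
        (*reloads)++;           /* stand-in for re-reading configuration */
        handled++;
    }
    if (pending_child_exit)
    {
        pending_child_exit = 0;
        (*reaps)++;             /* stand-in for the waitpid loop */
        handled++;
    }
    return handled;
}
```

Two SIGHUPs that arrive before the loop runs collapse into a single reload in this sketch, which is harmless because the deferred work is idempotent. The same property is what makes a boolean flag sufficient.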
BackendStartup: forking a client backend
When WL_SOCKET_ACCEPT fires, BackendStartup orchestrates the fork:

```c
/* BackendStartup (src/backend/postmaster/postmaster.c), condensed */
static int
BackendStartup(ClientSocket *client_sock)
{
    PMChild    *bn = NULL;
    /* ... other declarations omitted ... */

    cac = canAcceptConnections(B_BACKEND);
    if (cac == CAC_OK)
    {
        bn = AssignPostmasterChildSlot(B_BACKEND);
        if (!bn)
            cac = CAC_TOOMANY;      /* pool exhausted → dead-end child */
    }
    if (!bn)
        bn = AllocDeadEndChild();   /* heap-allocated, no slot */

    startup_data.canAcceptConnections = cac;
    pid = postmaster_child_launch(bn->bkend_type, bn->child_slot,
                                  &startup_data, sizeof(startup_data),
                                  client_sock);
    if (pid < 0)
    {
        int         save_errno = errno;

        ReleasePostmasterChildSlot(bn);
        report_fork_failure_to_client(client_sock, save_errno);
        return STATUS_ERROR;
    }
    bn->pid = pid;
    return STATUS_OK;
}
```

canAcceptConnections checks pmState, max_connections,
superuser_reserved_connections, and the connsAllowed flag. A backend
forked when the connection limit is already reached gets
cac = CAC_TOOMANY; it is a dead-end backend that will immediately send
the “too many connections” error and exit.
postmaster_child_launch (in launch_backend.c) executes the actual
fork(). On Unix the child runs the main_fn registered for its
BackendType; on Windows EXEC_BACKEND is defined and the child
re-enters via SubPostmasterMain after deserializing BackendParameters
from shared memory.
The PMState machine: normal operation to shutdown
The postmaster tracks its overall state in a single PMState variable:

```c
/* PMState enum (src/backend/postmaster/postmaster.c) */
typedef enum PMState
{
    PM_INIT,                /* postmaster starting */
    PM_STARTUP,             /* waiting for startup subprocess */
    PM_RECOVERY,            /* in archive recovery mode */
    PM_HOT_STANDBY,         /* in hot standby mode */
    PM_RUN,                 /* normal: accepting connections */
    PM_STOP_BACKENDS,       /* need to stop remaining backends (transient) */
    PM_WAIT_BACKENDS,       /* waiting for live backends to exit */
    PM_WAIT_XLOG_SHUTDOWN,  /* waiting for checkpointer shutdown ckpt */
    PM_WAIT_XLOG_ARCHIVAL,  /* waiting for archiver and walsenders to finish */
    PM_WAIT_IO_WORKERS,     /* waiting for io workers to exit */
    PM_WAIT_CHECKPOINTER,   /* waiting for checkpointer to shut down */
    PM_WAIT_DEAD_END,       /* waiting for dead-end children to exit */
    PM_NO_CHILDREN,         /* all important children have exited */
} PMState;
```

PostmasterStateMachine is called after every significant event (child
exit, signal receipt) and drives transitions. The normal-operation
forward path:

```text
PM_INIT
  → PM_STARTUP
  → PM_RECOVERY      (if WAL recovery needed)
  → PM_HOT_STANDBY   (if standby)
  → PM_RUN
```

The shutdown path (smart shutdown as the common case):

```text
PM_RUN
  → PM_STOP_BACKENDS        (send SIGTERM to the backends still running)
  → PM_WAIT_BACKENDS        (wait for them to exit)
  → PM_WAIT_XLOG_SHUTDOWN   (wait for checkpointer to write shutdown ckpt)
  → PM_WAIT_XLOG_ARCHIVAL   (wait for archiver + walsenders)
  → PM_WAIT_IO_WORKERS
  → PM_WAIT_CHECKPOINTER
  → PM_WAIT_DEAD_END
  → PM_NO_CHILDREN
  → ExitPostmaster(0)
```

Figure 2 — PMState transitions for smart shutdown
stateDiagram-v2
[*] --> PM_INIT
PM_INIT --> PM_STARTUP : CreateSharedMemoryAndSemaphores\nlaunch startup process
PM_STARTUP --> PM_RECOVERY : startup process running\nrecovery needed
PM_STARTUP --> PM_RUN : startup process exits 0\nno recovery
PM_RECOVERY --> PM_HOT_STANDBY : recovery complete\nstandby mode
PM_HOT_STANDBY --> PM_RUN : promoted to primary
PM_RECOVERY --> PM_RUN : recovery complete\nprimary mode
PM_RUN --> PM_STOP_BACKENDS : smart shutdown\nconnsAllowed=false
PM_STOP_BACKENDS --> PM_WAIT_BACKENDS : SIGTERM sent to backends
PM_WAIT_BACKENDS --> PM_WAIT_XLOG_SHUTDOWN : all backends exited
PM_WAIT_XLOG_SHUTDOWN --> PM_WAIT_XLOG_ARCHIVAL : shutdown checkpoint written
PM_WAIT_XLOG_ARCHIVAL --> PM_WAIT_IO_WORKERS : archiver and walsenders done
PM_WAIT_IO_WORKERS --> PM_WAIT_CHECKPOINTER : io workers done
PM_WAIT_CHECKPOINTER --> PM_WAIT_DEAD_END : checkpointer exits
PM_WAIT_DEAD_END --> PM_NO_CHILDREN : dead-end children gone
PM_NO_CHILDREN --> [*] : ExitPostmaster(0)
Figure 2 — Normal (smart) shutdown PMState progression. Immediate shutdown and crash recovery collapse several intermediate states by bypassing the wait-for-backends steps.
Crash recovery: HandleChildCrash and FatalError
When process_pm_child_exit finds a non-zero exit status for a critical
child, it calls HandleChildCrash:

```c
/* HandleChildCrash (src/backend/postmaster/postmaster.c), condensed */
static void
HandleChildCrash(int pid, int exitstatus, const char *procname)
{
    if (FatalError || Shutdown == ImmediateShutdown)
        return;                 /* already in crash-recovery path */

    LogChildExit(LOG, procname, pid, exitstatus);
    ereport(LOG,
            (errmsg("terminating any other active server processes")));

    /* Sets FatalError=true, sends SIGQUIT to siblings */
    HandleFatalError(PMQUIT_FOR_CRASH, true);
}
```

FatalError = true is the flag that turns a normal shutdown into a crash
restart. Once set, PostmasterStateMachine drives the cluster through:
1. Send SIGQUIT to all children (bypasses the graceful shutdown path in each child; they call quickdie → _exit(2)).
2. Wait for all children to exit (PM_WAIT_BACKENDS → PM_WAIT_DEAD_END).
3. PM_NO_CHILDREN: re-create shared memory (CreateSharedMemoryAndSemaphores again, at line 3202), re-launch the startup process (which will run WAL recovery), and transition back to PM_STARTUP.
The key insight is that because the postmaster itself holds no user data, this whole cycle — signal, wait, re-create, relaunch — is just a few hundred lines of deterministic C.
The reaping itself happens in process_pm_child_exit, the deferred work
function for SIGCHLD, which drains every exited child with a non-blocking
waitpid loop:

```c
/* process_pm_child_exit (src/backend/postmaster/postmaster.c), condensed */
static void
process_pm_child_exit(void)
{
    int         pid;
    int         exitstatus;

    pending_pm_child_exit = false;

    while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
    {
        PMChild    *pmchild;

        /* Check if this child was a startup process. */
        if (StartupPMChild && pid == StartupPMChild->pid)
        {
            ReleasePostmasterChildSlot(StartupPMChild);
            StartupPMChild = NULL;

            if (Shutdown > NoShutdown &&
                (EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
            {
                StartupStatus = STARTUP_NOT_RUNNING;
                UpdatePMState(PM_WAIT_BACKENDS);
                continue;       /* PostmasterStateMachine does the rest */
            }
            /* ... unexpected startup-process exit → HandleChildCrash ... */
        }
        /* ... checkpointer, bgwriter, walwriter, autovac, archiver, ... */

        /* otherwise it was a backend or bgworker: */
        pmchild = FindPostmasterChildByPid(pid);
        CleanupBackend(pmchild, exitstatus);
    }

    /* After processing all exits, recompute the postmaster's state. */
    PostmasterStateMachine();
}
```

waitpid(-1, …, WNOHANG) reaps all currently-zombie children in one
invocation, not one per SIGCHLD: Unix may coalesce multiple
SIGCHLD deliveries into a single pending bit, so the loop must keep
calling waitpid until it stops returning a PID (0 when no further child
has exited, -1 when no children remain). Each named auxiliary (startup,
checkpointer, bgwriter, …) is matched by its saved PMChild pointer;
everything else falls through to CleanupBackend, which releases the
pool slot and applies bgworker restart policy. The single
PostmasterStateMachine() call at the bottom is what actually advances
shutdown or fires the crash-restart path described below.
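The drain-until-empty reaping pattern is easy to try in isolation. In the sketch below, demo_spawn and demo_reap_all are hypothetical helpers: the first forks children that exit immediately with distinct codes, the second is the non-blocking waitpid loop, counting how many children exited abnormally or with a non-zero status.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n children that exit at once with status 0, 1, ..., n-1. */
static int
demo_spawn(int n)
{
    for (int i = 0; i < n; i++)
    {
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(i);           /* child: exit immediately */
    }
    return 0;
}

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number reaped in this call and adds to *nonzero_exits
 * for each child that was signalled or exited with a non-zero code.
 */
static int
demo_reap_all(int *nonzero_exits)
{
    int   reaped = 0;
    int   status;
    pid_t pid;

    assert(nonzero_exits != NULL);

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
    {
        reaped++;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            (*nonzero_exits)++;
    }
    return reaped;
}
```

A single call may reap zero, one, or several children depending on timing, which is why a supervisor treats each wakeup as "reap whatever is ready" rather than "reap one child".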
The crash-restart re-initialization is the tail of
PostmasterStateMachine: once FatalError is set and every
non-syslogger child has been reaped (pmState == PM_NO_CHILDREN), the
postmaster rebuilds the entire in-memory cluster from scratch:

```c
/* PostmasterStateMachine, reinit tail (src/backend/postmaster/postmaster.c) */
if (FatalError && pmState == PM_NO_CHILDREN)
{
    ereport(LOG,
            (errmsg("all server processes terminated; reinitializing")));

    if (remove_temp_files_after_crash)
        RemovePgTempFiles();

    ResetBackgroundWorkerCrashTimes();  /* allow bgworkers to restart now */

    shmem_exit(1);                      /* detach old shared segment */
    LocalProcessControlFile(true);      /* re-read control file */
    CreateSharedMemoryAndSemaphores();  /* fresh shared memory */

    UpdatePMState(PM_STARTUP);
    maybe_adjust_io_workers();          /* need I/O workers for recovery */
    StartupPMChild = StartChildProcess(B_STARTUP);  /* runs WAL recovery */
    StartupStatus = STARTUP_RUNNING;
    AbortStartTime = 0;

    ConfigurePostmasterWaitSet(true);   /* accept connections again */
}
```

Two guards sit just above this block and turn a crash into a permanent
exit instead of a restart: if StartupStatus == STARTUP_CRASHED the
postmaster calls ExitPostmaster(1) (“more than likely it will just
fail again”), and if the restart_after_crash GUC is off it exits with
the log line: shutting down because "restart_after_crash" is off.
Only when neither guard fires does control reach the reinit block. Note
that shmem_exit(1) detaches the old segment before
CreateSharedMemoryAndSemaphores() maps a fresh one of identical size —
this is why a crash restart preserves shared_buffers / max_connections
sizing without re-reading postgresql.conf for those values.
Figure 3 — Crash detection to reinitialization
flowchart TD
CHILD["A critical child exits<br/>non-zero (SIGCHLD)"]
REAP["process_pm_child_exit<br/>waitpid(-1, WNOHANG) loop"]
CRASH["HandleChildCrash<br/>FatalError = true<br/>SIGQUIT to all siblings"]
SM1["PostmasterStateMachine"]
WAIT["PM_WAIT_BACKENDS ... PM_WAIT_DEAD_END<br/>siblings quickdie / _exit(2)"]
NOCHILD["PM_NO_CHILDREN<br/>all non-syslogger children reaped"]
GUARD{"StartupStatus==CRASHED<br/>or restart_after_crash off?"}
EXIT["ExitPostmaster(1)"]
REINIT["shmem_exit(1) detach old segment<br/>CreateSharedMemoryAndSemaphores<br/>StartChildProcess(B_STARTUP)"]
STARTUP["PM_STARTUP<br/>WAL recovery re-runs"]
CHILD --> REAP --> CRASH --> SM1 --> WAIT --> NOCHILD --> GUARD
GUARD -->|yes| EXIT
GUARD -->|no| REINIT --> STARTUP
Figure 3 — The crash-restart cycle. SIGQUIT forces every sibling through the immediate quickdie path; only after the pool is fully drained (PM_NO_CHILDREN) does the postmaster detach shared memory, rebuild it at the same size, and relaunch the startup process to replay WAL.
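The decision at the bottom of Figure 3 reduces to a small pure function. The sketch below is a hypothetical restatement, not the postmaster's code: DemoCrashFacts, DemoAction, and demo_after_crash are invented names, and the real logic is spread across PostmasterStateMachine rather than isolated like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum DemoAction
{
    DEMO_NO_ACTION,     /* no crash pending, or children still alive */
    DEMO_EXIT,          /* give up and exit the supervisor */
    DEMO_REINIT         /* rebuild shared state and rerun recovery */
} DemoAction;

/* Hypothetical snapshot of what the supervisor knows. */
typedef struct DemoCrashFacts
{
    bool fatal_error;           /* a critical child crashed */
    int  live_children;         /* children not yet reaped */
    bool startup_crashed;       /* the crash was in the startup process */
    bool restart_after_crash;   /* configuration switch */
} DemoCrashFacts;

static DemoAction
demo_after_crash(const DemoCrashFacts *f)
{
    assert(f != NULL);
    assert(f->live_children >= 0);

    /* Nothing to decide until every sibling has been reaped. */
    if (!f->fatal_error || f->live_children > 0)
        return DEMO_NO_ACTION;

    /* Guard 1: a crashing startup process would most likely crash again. */
    if (f->startup_crashed)
        return DEMO_EXIT;

    /* Guard 2: the administrator disabled automatic restart. */
    if (!f->restart_after_crash)
        return DEMO_EXIT;

    return DEMO_REINIT;
}
```

Ordering the checks this way mirrors the figure: draining comes first, the two exit guards come next, and reinitialization is the fall-through case.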
LaunchMissingBackgroundProcesses
After every event-loop iteration, LaunchMissingBackgroundProcesses
inspects pmState and fills in any background process that should be
running but isn’t:
- B_CHECKPOINTER and B_BG_WRITER are wanted in PM_STARTUP, PM_RECOVERY, PM_HOT_STANDBY, and PM_RUN.
- B_WAL_WRITER and B_AUTOVAC_LAUNCHER are wanted only in PM_RUN.
- B_ARCHIVER is wanted in PM_RUN when archiving is active, and also during recovery and hot standby if archive_mode = always.
- The B_IO_WORKER count is managed dynamically by maybe_adjust_io_workers, driven by the io_method and io_workers GUCs (PG18).
- B_WAL_SUMMARIZER is wanted in PM_RUN when summarize_wal = on (PG17 and later).
This “lazy launch on every iteration” pattern means the postmaster never needs explicit “restart background process X” code paths — if a background process exits normally, the next loop iteration will relaunch it.
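The idempotent "launch whatever is wanted but missing" idea can be sketched with two hypothetical process kinds and a simplified phase enum. DemoPhase, demo_wanted, and demo_launch_missing are invented for illustration, and "launching" is reduced to flipping a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum DemoPhase { ST_STARTUP, ST_RECOVERY, ST_RUN, ST_SHUTDOWN } DemoPhase;

enum { BG_CHECKPOINTER, BG_WAL_WRITER, BG_COUNT };

/* Policy table: which kind of process is wanted in which phase. */
static bool
demo_wanted(int kind, DemoPhase phase)
{
    switch (kind)
    {
        case BG_CHECKPOINTER:
            return phase == ST_STARTUP || phase == ST_RECOVERY || phase == ST_RUN;
        case BG_WAL_WRITER:
            return phase == ST_RUN;
        default:
            return false;
    }
}

/*
 * Start every process that is wanted but not running. Safe to call on
 * every loop iteration; returns how many processes were started.
 */
static int
demo_launch_missing(bool running[BG_COUNT], DemoPhase phase)
{
    int launched = 0;

    assert(running != NULL);

    for (int kind = 0; kind < BG_COUNT; kind++)
    {
        if (!running[kind] && demo_wanted(kind, phase))
        {
            running[kind] = true;   /* stand-in for fork + bookkeeping */
            launched++;
        }
    }
    return launched;
}
```

Because the function compares desired state with actual state, a process that exits is restarted simply by clearing its running flag; no dedicated restart path is needed.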
Source Walkthrough
PostmasterMain and startup
- PostmasterMain (postmaster.c:494): the cluster entry point; GUC init, config load, child-slot init, shared-memory creation, listen-socket binding, syslogger + startup-process launch, then ServerLoop.
- InitProcessGlobals (postmaster.c:1933): sets MyProcPid, MyStartTimestamp, initializes MyLatch and the random seed before any child is forked.
- CreateSharedMemoryAndSemaphores: called at line 1004 on first startup and again at line 3202 after a crash restart. Sizes and allocates the shared segment; every ShmemInitStruct call later carves from this slab.
PMChild pool
- InitPostmasterChildSlots (pmchild.c:86): called once at startup; allocates the PMChild array and partitions it into per-type freelists.
- AssignPostmasterChildSlot (pmchild.c:162): pops a slot from the appropriate freelist before forking; links the slot onto ActiveChildList.
- AllocDeadEndChild (pmchild.c:208): heap-allocates a PMChild for dead-end backends; not drawn from the pool.
- ReleasePostmasterChildSlot (pmchild.c:236): returns a slot to its freelist after a child exits; called from CleanupBackend and process_pm_child_exit for named children.
- FindPostmasterChildByPid (pmchild.c:274): O(n) scan of ActiveChildList; used in process_pm_child_exit for the “was it a backend or bgworker?” check.
ServerLoop and event dispatch
- ServerLoop (postmaster.c:1653): the for(;;) around WaitEventSetWait; dispatches deferred-signal work and accepts.
- ConfigurePostmasterWaitSet (postmaster.c:1630): builds the WaitEventSet with the latch plus all listen sockets; called again with accept_connections=false during shutdown to stop accepting new connections.
- BackendStartup (postmaster.c:3518): acquires a child slot, calls postmaster_child_launch, records the PID; handles the CAC_TOOMANY → dead-end path.
- canAcceptConnections (postmaster.c:1812): checks pmState, connsAllowed, connection count vs limits; returns a CAC_state enum.
- LaunchMissingBackgroundProcesses (postmaster.c:3267): iterates all background types and launches any that should be running; called at the bottom of every ServerLoop iteration.
Child launch (launch_backend.c)
- postmaster_child_launch (launch_backend.c:229): fork() on Unix; internal_forkexec on Windows. Child calls InitPostmasterChild(), closes postmaster ports, then invokes child_process_kinds[type].main_fn.
- PostmasterChildName (launch_backend.c:211): maps BackendType to a human-readable string for log messages and ps display.
State machine and crash recovery
- PostmasterStateMachine (postmaster.c:2865): called after every signal-driven work function; the single function that decides PMState transitions and which signals to send.
- HandleChildCrash (postmaster.c:2772): logs the crash, calls HandleFatalError(PMQUIT_FOR_CRASH, true) to set FatalError and broadcast SIGQUIT.
- CleanupBackend (postmaster.c:2550): releases the PMChild slot for a backend or bgworker after exit; updates bgworker restart logic; calls PostmasterStateMachine.
- process_pm_child_exit (postmaster.c:2233): the SIGCHLD-driven reaper; waitpid(-1, WNOHANG) loop; dispatches to per-type handlers or to CleanupBackend; calls PostmasterStateMachine at the end.
Position hints (as of 2026-06-05, commit 273fe94)
| Symbol | File | Line |
|---|---|---|
| BackendType enum | src/include/miscadmin.h | 337 |
| PMState enum | src/backend/postmaster/postmaster.c | 336 |
| PostmasterMain | src/backend/postmaster/postmaster.c | 494 |
| CreateSharedMemoryAndSemaphores (first call) | src/backend/postmaster/postmaster.c | 1004 |
| InitPostmasterChildSlots call | src/backend/postmaster/postmaster.c | 952 |
| ServerLoop | src/backend/postmaster/postmaster.c | 1653 |
| ConfigurePostmasterWaitSet | src/backend/postmaster/postmaster.c | 1630 |
| canAcceptConnections | src/backend/postmaster/postmaster.c | 1812 |
| InitProcessGlobals | src/backend/postmaster/postmaster.c | 1933 |
| CleanupBackend | src/backend/postmaster/postmaster.c | 2550 |
| process_pm_child_exit | src/backend/postmaster/postmaster.c | 2233 |
| HandleChildCrash | src/backend/postmaster/postmaster.c | 2772 |
| PostmasterStateMachine | src/backend/postmaster/postmaster.c | 2865 |
| LaunchMissingBackgroundProcesses | src/backend/postmaster/postmaster.c | 3267 |
| BackendStartup | src/backend/postmaster/postmaster.c | 3518 |
| CreateSharedMemoryAndSemaphores (crash-restart call) | src/backend/postmaster/postmaster.c | 3202 |
| postmaster_child_launch | src/backend/postmaster/launch_backend.c | 229 |
| PostmasterChildName | src/backend/postmaster/launch_backend.c | 211 |
| MaxLivePostmasterChildren | src/backend/postmaster/pmchild.c | 70 |
| InitPostmasterChildSlots | src/backend/postmaster/pmchild.c | 86 |
| AssignPostmasterChildSlot | src/backend/postmaster/pmchild.c | 162 |
| AllocDeadEndChild | src/backend/postmaster/pmchild.c | 208 |
| ReleasePostmasterChildSlot | src/backend/postmaster/pmchild.c | 236 |
| FindPostmasterChildByPid | src/backend/postmaster/pmchild.c | 274 |
Source verification (as of 2026-06-05)
Verified facts
- BackendType has 18 members in REL_18_STABLE, including B_IO_WORKER, which is new in PG18, and B_WAL_SUMMARIZER. Verified at src/include/miscadmin.h:337–375, commit 273fe94. B_IO_WORKER is the async-I/O worker added with storage/aio/; it does not call InitPostgres and does not hold heavyweight locks. B_WAL_SUMMARIZER supports incremental backup and first shipped in PG17, so only B_IO_WORKER must be left out when describing PG17; neither exists in PG16 or earlier.
- PMState has 13 values; PM_HOT_STANDBY is distinct from PM_RECOVERY. Verified at postmaster.c:336–352. The distinction matters for connection handling: client connections are accepted in PM_HOT_STANDBY but not in PM_RECOVERY. For the archiver the two states behave alike: with archive_mode=always it may be started in either.
- Signal handlers set boolean flags only; no async-signal-unsafe work in handlers. Verified: handle_pm_child_exit_signal sets pending_pm_child_exit = true and SetLatch(MyLatch) only (postmaster.c:2223–2231). All reaping happens in process_pm_child_exit in the main loop.
- The postmaster closes the accepted socket after BackendStartup returns. Verified at postmaster.c:1704–1712: closesocket(s.sock) is called in the parent, in ServerLoop, once BackendStartup has returned. The child inherits the open file descriptor through fork(); the postmaster does not need it.
- Dead-end children are heap-allocated, not pool-allocated. Verified in pmchild.c:208–234 (AllocDeadEndChild): uses palloc from TopMemoryContext, not pmchild_pools. The comment confirms “There is no limit on the number of dead-end backends.”
- CreateSharedMemoryAndSemaphores is called twice: at startup and on crash restart. Verified at postmaster.c:1004 (startup) and postmaster.c:3202 (crash restart, inside the PM_NO_CHILDREN branch of PostmasterStateMachine). On crash restart the old shared segment is detached before the new one is allocated.
- LaunchMissingBackgroundProcesses is called on every ServerLoop iteration, not only on child-exit events. Verified at postmaster.c:1718: it is the last statement in the event dispatch block, outside the for (int i = 0; i < nevents; i++) loop. This ensures background processes are relaunched even if no events fired (e.g., after a config reload that enables WAL archiving).
Open questions
- B_IO_WORKER lifecycle details. maybe_adjust_io_workers (called from LaunchMissingBackgroundProcesses) dynamically starts and stops B_IO_WORKER processes based on GUC values. The exact algorithm for deciding how many workers to maintain, and whether excess workers are sent SIGTERM or allowed to exit naturally, is not traced in this document. Investigation path: read maybe_adjust_io_workers and the io_worker_* functions in storage/aio/.
- PM_WAIT_XLOG_ARCHIVAL transition trigger. The condition that moves pmState from PM_WAIT_XLOG_SHUTDOWN to PM_WAIT_XLOG_ARCHIVAL (the shutdown checkpoint is “written enough”) is driven by a pmsignal from the checkpointer. The exact PMSIGNAL_* value and the handshake are not detailed here. Investigation path: list the PMSIGNAL_* reasons declared in pmsignal.h and trace the matching SendPostmasterSignal call in checkpointer.c.
- EXEC_BACKEND (Windows) re-entry path. On Windows, postmaster_child_launch calls internal_forkexec instead of fork(). The child re-enters via SubPostmasterMain, deserializes BackendParameters, and then calls the appropriate main_fn. The exact parameters serialized and the interaction with MyClientSocket on the Windows path are not analyzed here. Investigation path: read save_backend_variables / restore_backend_variables in launch_backend.c.
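For orientation on the last question, the general shape of a re-entry path (save state in the parent, restore it in a freshly executed child, then dispatch by role) might look like the sketch below. Everything in it is an assumption made for illustration: ChildParams, its fields, the file-based transport, and the function names are hypothetical and do not reflect the real BackendParameters layout or how PostgreSQL transfers it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical roles; the real set lives in the BackendType enum. */
typedef enum { CHILD_BACKEND = 0, CHILD_IO_WORKER = 1, CHILD_KIND_COUNT } ChildKind;

/* Hypothetical stand-in for the state a re-executed child needs. */
typedef struct ChildParams
{
    ChildKind kind;
    int       client_fd;        /* -1 when the child has no client socket */
    char      data_dir[64];
} ChildParams;

/* Parent side: write the parameters where the new process can find them. */
int
save_child_params(const char *path, const ChildParams *p)
{
    FILE  *f = fopen(path, "wb");
    size_t n;

    if (f == NULL)
        return -1;
    n = fwrite(p, sizeof(*p), 1, f);
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

/* Child side: read the parameters back and validate them before use. */
int
restore_child_params(const char *path, ChildParams *p)
{
    FILE  *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(p, sizeof(*p), 1, f);
    fclose(f);
    if (n != 1)
        return -1;
    if ((int) p->kind < 0 || (int) p->kind >= CHILD_KIND_COUNT)
        return -1;
    p->data_dir[sizeof(p->data_dir) - 1] = '\0';
    return 0;
}

/* Per-role entry points, selected through a table indexed by kind. */
typedef const char *(*ChildMain)(const ChildParams *);

static const char *backend_main(const ChildParams *p)   { (void) p; return "backend"; }
static const char *io_worker_main(const ChildParams *p) { (void) p; return "io worker"; }

static const ChildMain child_main_table[CHILD_KIND_COUNT] = {
    backend_main,
    io_worker_main
};

const char *
dispatch_child(const ChildParams *p)
{
    return child_main_table[p->kind](p);
}
```

The point of the sketch is only the ordering constraint: a process started by exec shares no memory with its parent, so everything fork() would have copied implicitly has to be written out explicitly and validated before the role-specific entry point runs.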
Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Supervisor-per-instance vs. thread-pool models. Architecture of a Database System (Hellerstein et al., 2007, §2.2–2.3) compares the process-per-session, thread-per-session, and process-pool models. PostgreSQL’s fork-on-demand postmaster is the “process-per-session” variant; its cost is fork() latency plus backend initialization on every new connection. External connection poolers (PgBouncer, pgpool-II) address this by multiplexing many client connections onto fewer backend processes; built-in pooling has been proposed as patches but is not part of PostgreSQL as of REL_18. Understanding the per-connection work in BackendStartup illuminates what a pool amortizes.
- “The Design of POSTGRES” (Stonebraker & Rowe, 1986). The original design paper explicitly chose the process model for simplicity and isolation. Comparing the 1986 description of the “postmaster” with the current PostmasterMain (now ~900 lines vs. a few pages of prose) makes for a concrete evolution-of-complexity exercise: what did the original design defer that REL_18 must handle (Windows EXEC_BACKEND, PG18 async I/O workers, WAL summarizer, incremental backup, hot standby)?
- Oracle’s process model and PMON. Oracle uses a similar supervisor/worker split, but its coordinator (PMON, the Process MONitor) is itself a background process, not the parent of all other processes. PMON detects failed sessions and cleans up the resources they held in shared memory. PostgreSQL’s postmaster plays both roles: parent-process supervisor (via SIGCHLD) and crash-recovery coordinator (via HandleChildCrash). A comparison would highlight the trade-off between polling (PMON) and signal-driven reaping (SIGCHLD).
- MySQL’s thread-per-connection model. MySQL’s “one-thread-per-connection” server skips the fork() cost entirely: a new connection costs one pthread_create. The penalty is that all sessions share one address space, so a stray write in one thread can corrupt another session’s memory. With no “postmaster” equivalent, MySQL’s crash recovery operates at the engine level (InnoDB recovery on restart) rather than at the process-supervisor level.
- Greenplum / Citus: coordinating a postmaster fleet. Distributed PostgreSQL variants run one postmaster per segment (Greenplum) or worker node (Citus). Backends on the coordinator dispatch query fragments over libpq connections, which each remote postmaster accepts like any other client connection. The postmaster’s crash-isolation property becomes essential in this setting: a segment crash can be recovered in isolation without restarting the coordinator.
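The signal-driven side of the polling vs. SIGCHLD trade-off mentioned in the Oracle item can be sketched in a few lines of POSIX C. This is a generic illustration of the pattern, not the code in process_pm_child_exit: ReapTally, install_sigchld_handler, reap_exited_children, and spawn_child are hypothetical names, and "crashed" here simply means any exit other than a clean exit code 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct ReapTally { int clean; int crashed; } ReapTally;

/* Set by the handler; the main loop does the real work outside signal context. */
static volatile sig_atomic_t child_exit_pending = 0;

static void
on_sigchld(int signo)
{
    (void) signo;
    child_exit_pending = 1;
}

int
install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/*
 * Drain every child that has already exited, without blocking. One SIGCHLD
 * may stand for several exits, so loop until waitpid() has nothing left.
 * The flag is cleared before the loop so a signal arriving mid-loop is kept.
 */
int
reap_exited_children(ReapTally *t)
{
    int   reaped = 0;
    int   status;
    pid_t pid;

    child_exit_pending = 0;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
    {
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            t->clean++;
        else
            t->crashed++;       /* nonzero exit code or killed by a signal */
        reaped++;
    }
    return reaped;
}

/* Test helper: start a child that exits cleanly or dies from SIGKILL. */
pid_t
spawn_child(int crash)
{
    pid_t pid = fork();

    if (pid == 0)
    {
        if (crash)
            raise(SIGKILL);
        _exit(0);
    }
    return pid;
}
```

A supervisor loop would check child_exit_pending after each wakeup and call reap_exited_children when it is set. A polling monitor discovers the same exits only on its next scan, so detection latency is bounded by the scan interval rather than by signal delivery; in exchange, it does not have to be the parent of the processes it watches.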
Sources
Raw source files consumed
- None (synthesized directly from source tree at REL_18_STABLE / commit 273fe94).
Source code paths (REL_18_STABLE / commit 273fe94)
- src/backend/postmaster/postmaster.c: PostmasterMain, ServerLoop, BackendStartup, canAcceptConnections, process_pm_child_exit, HandleChildCrash, PostmasterStateMachine, LaunchMissingBackgroundProcesses, CleanupBackend, PMState enum
- src/backend/postmaster/launch_backend.c: postmaster_child_launch, PostmasterChildName, SubPostmasterMain (Windows re-entry), save_backend_variables / restore_backend_variables
- src/backend/postmaster/pmchild.c: PMChild pool, PMChildPool, InitPostmasterChildSlots, AssignPostmasterChildSlot, AllocDeadEndChild, ReleasePostmasterChildSlot, FindPostmasterChildByPid
- src/include/miscadmin.h: BackendType enum, AmRegularBackendProcess
Textbook and paper references
- Hellerstein, J. M., Stonebraker, M., and Hamilton, J. “Architecture of a Database System.” Foundations and Trends in Databases, 2007. §2 (process models).
- Stonebraker, M., and Rowe, L. A. “The Design of POSTGRES.” SIGMOD 1986. (Process model rationale and original postmaster concept.)