PostgreSQL Background Workers — The Dynamic Worker Framework
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A database server is, at bottom, a task-dispatch system: a long-lived controlling entity that spawns, supervises, and reaps subordinate units of execution that do the actual work (running queries, flushing buffers, vacuuming dead tuples, shipping WAL). The architectural question every engine must answer is: what is the unit of execution, and who owns its lifecycle? Three properties define the design space.
- Process vs. thread. Is each subordinate unit an OS process with its own address space, or a thread inside a shared one? Processes give fault isolation (a crashing worker cannot scribble over another’s stack) at the cost of expensive context switches and the need for an explicit shared memory region for anything that must be communicated. Threads give cheap sharing at the cost of fate-sharing: one corrupt pointer takes down the server. Architecture of a Database System (Hellerstein, Stonebraker & Hamilton 2007, §2 “Process Models”) catalogs the three canonical models (process-per-connection, thread-per-connection, and process/thread pool) and observes that the choice is “one of the most fundamental” an engine makes, because it pervades every later decision about caching, locking, and admission control.
- Static vs. dynamic worker population. Are the subordinate processes a fixed cohort decided at startup (a thread pool sized once), or can the population grow and shrink in response to load? A parallel query that fans out across eight cores for two seconds and then collapses back to one needs workers spun up and torn down on a sub-second cadence; a background vacuum launcher needs exactly one long-lived helper. A general framework must serve both: register-once-at-boot and register-on-demand-at-runtime.
- Supervision and restart policy. When a worker dies, who notices, and what happens? A supervisor that restarts crashed helpers gives self-healing; one that forgets them on clean exit avoids zombie accumulation. The policy must be encoded per-worker, because a parallel worker that finished its slice should never be restarted, whereas a logical-replication apply worker whose connection dropped should be.
Database System Concepts (Silberschatz, Korth & Sudarshan, 7e, §20 “Database-System Architectures”) frames the server as a collection of cooperating processes communicating through shared memory, and notes that the “process monitor” — the entity that detects failures and triggers recovery — is the linchpin of availability. PostgreSQL’s postmaster is exactly this process monitor, and the background worker framework is the generalized, pluggable extension of its supervision logic: it lets the postmaster fork and reap processes whose code it does not itself contain (extensions, parallel workers, apply workers) under the same restart and crash-recovery umbrella that governs the built-in auxiliary processes.
The crucial PostgreSQL-specific constraint that shapes everything below is a
reliability invariant on the postmaster: the postmaster must never take a
lock — not even a spinlock — on shared memory it shares with backends,
because a backend that corrupts shared memory (a wild write, a bug) could
otherwise wedge or crash the postmaster, and a dead postmaster means no
crash recovery for anybody. The entire background-worker shared-memory
protocol is therefore designed to be lockless from the postmaster’s side:
backends coordinate among themselves with a conventional LWLock, but they
hand off slots to the postmaster through a single carefully-ordered flag
(in_use) plus memory barriers. This is the architectural signature of the
whole module.
Common DBMS Design
This section names the engineering patterns that recur across engines when they build a supervised-worker framework, so PostgreSQL’s choices read as selections within a shared space.
The supervisor / worker split
Almost every engine, whether process- or thread-based, separates a supervisor (Oracle’s
PMON/SMON-era monitors, SQL Server’s SQLOS scheduler, PostgreSQL’s
postmaster) from workers that do the user-visible work. The supervisor
owns fork/exec (or thread spawn), holds the master list of who is alive,
catches SIGCHLD/exit events, and decides on restart. Workers report their
status back through a channel the supervisor can read cheaply and safely.
A shared registry of worker slots
To communicate “please start a worker like this” from a worker to the
supervisor, engines use a fixed-size array of descriptor slots in shared
memory. Fixed-size (not a linked list) because shared memory is allocated
once at startup and the supervisor must be able to scan it without following
pointers it cannot trust. Each slot carries the worker’s configuration plus a
small state machine (free → claimed → running → exited). The slot count is a
hard cap; in PostgreSQL it is the max_worker_processes GUC.
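A minimal single-threaded sketch of such a registry, assuming a compile-time MAX_SLOTS in place of max_worker_processes. The names (WorkerSlot, SlotState, claim_free_slot, release_slot) are hypothetical and only mirror the shape described above; a real engine would guard claims with a lock or atomics. Because slots are addressed by index and the configuration is stored inline, the supervisor can scan the whole array without following any pointer a worker wrote.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for max_worker_processes: fixed once, at "boot". */
#define MAX_SLOTS 4

typedef enum { SLOT_FREE, SLOT_CLAIMED, SLOT_RUNNING, SLOT_EXITED } SlotState;

typedef struct {
    SlotState state;
    char      name[32];     /* configuration lives inline, never behind a pointer */
} WorkerSlot;

/* One fixed-size array; the supervisor scans it by index only. */
static WorkerSlot registry[MAX_SLOTS];

/* Claim the first free slot; returns its index, or -1 once the hard cap is hit. */
static int claim_free_slot(const char *name)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (registry[i].state == SLOT_FREE) {
            snprintf(registry[i].name, sizeof registry[i].name, "%s", name);
            registry[i].state = SLOT_CLAIMED;
            return i;
        }
    }
    return -1;
}

/* After the supervisor has reaped an exited worker, the slot is reusable. */
static void release_slot(int i)
{
    assert(i >= 0 && i < MAX_SLOTS);
    registry[i].state = SLOT_FREE;
}
```

Claiming a fifth slot fails instead of growing the array, which is the hard-cap behavior the text attributes to max_worker_processes.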
Generation counters to defeat the ABA problem
A subtle hazard: backend A registers a worker, gets a handle pointing at slot
5, the worker finishes, slot 5 is recycled for backend B’s brand-new worker,
and now A’s stale handle appears to point at a live worker that is not its
own. The standard defense is a generation counter (a.k.a. a tag or epoch)
stored alongside the slot and copied into the handle. Every reuse bumps the
counter; a handle is only valid if its generation matches the slot’s. This is
the classic ABA-problem fix from lock-free programming, applied to slot
reuse. PostgreSQL’s BackgroundWorkerHandle is {slot, generation} for
exactly this reason.
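The ABA guard fits in a few lines. The sketch below is illustrative only: Slot, Handle, register_worker, forget_worker, and handle_is_current are hypothetical names, and the slot array is single-threaded. The key detail is that freeing a slot leaves its generation untouched, so the next claimant bumps it and any handle minted before the reuse stops matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 8   /* hypothetical slot count */

typedef struct {
    bool     in_use;
    uint64_t generation;   /* bumped every time the slot is reused */
} Slot;

typedef struct {
    int      slot;
    uint64_t generation;   /* copied from the slot at registration time */
} Handle;

static Slot slots[NSLOTS];

/* Claim a free slot and return a generation-tagged handle (slot = -1 if full). */
static Handle register_worker(void)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (!slots[i].in_use) {
            slots[i].generation++;
            slots[i].in_use = true;
            return (Handle){ .slot = i, .generation = slots[i].generation };
        }
    }
    return (Handle){ .slot = -1, .generation = 0 };
}

/* Free the slot; the generation is deliberately left as it is. */
static void forget_worker(int i)
{
    slots[i].in_use = false;
}

/* A handle still names "its" worker only while the generations match. */
static bool handle_is_current(Handle h)
{
    assert(h.slot >= 0 && h.slot < NSLOTS);
    return slots[h.slot].in_use && slots[h.slot].generation == h.generation;
}
```

PostgreSQL’s GetBackgroundWorkerPid applies the same comparison, reporting a worker as stopped when the generation no longer matches or the slot is no longer in use.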
Passing addresses by name, not by pointer
When the supervisor forks a worker and the worker must call a function the
supervisor was told about, you cannot pass a raw function pointer if there is
any chance the address differs between processes — and under EXEC_BACKEND
(Windows, or non-Windows builds that define it for testing) the child is a fresh
exec, so ASLR can place the same function at a different address.
The portable pattern is to pass (library_name, function_name) strings
and have the worker resolve them locally via the dynamic loader. PostgreSQL’s
bgw_library_name / bgw_function_name pair and LookupBackgroundWorkerFunction
implement precisely this.
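A sketch of the in-core half of that lookup, modeled loosely on the InternalBGWorkers[] table. The entry functions (demo_parallel_main, demo_apply_main) and lookup_worker_function are hypothetical; the external-library branch, which PostgreSQL routes through load_external_function, is stubbed out to return NULL. Only strings cross the process boundary, and each process turns them into its own local address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*worker_main_fn)(unsigned long arg);

/* Hypothetical entry points standing in for ParallelWorkerMain and friends. */
static unsigned long last_arg;
static void demo_parallel_main(unsigned long arg) { last_arg = arg; }
static void demo_apply_main(unsigned long arg)    { last_arg = arg + 1000; }

/* Name-to-address table compiled into the binary, so every process agrees on it. */
static const struct {
    const char     *fn_name;
    worker_main_fn  fn_addr;
} internal_workers[] = {
    { "DemoParallelMain", demo_parallel_main },
    { "DemoApplyMain",    demo_apply_main },
};

/* Resolve (library, function) strings into a function pointer valid in this process. */
static worker_main_fn lookup_worker_function(const char *libname, const char *funcname)
{
    if (strcmp(libname, "postgres") == 0) {
        for (size_t i = 0; i < sizeof internal_workers / sizeof internal_workers[0]; i++)
            if (strcmp(funcname, internal_workers[i].fn_name) == 0)
                return internal_workers[i].fn_addr;
        return NULL;   /* unknown internal name (PostgreSQL raises an error instead) */
    }
    /* External libraries would go through the dynamic loader; omitted here. */
    return NULL;
}
```

The function pointer is computed after the fork or exec, inside the child, so it does not matter where ASLR placed the code.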
Dynamic shared memory for variable-size shared state
The fixed shared-memory region is sized at boot, but a parallel query needs a
chunk of shared memory sized to this query’s tuple-queue and instrumentation
needs — unknowable at boot. Engines answer with a dynamic shared memory
facility: per-operation segments created on demand, named by a handle that
can be shipped to another process, reference-counted so the last detacher
frees them. PostgreSQL builds this from dsm.c (raw segments), with two
layers on top: dsa.c (a shared heap inside segments, with dsa_allocate/dsa_free)
and shm_toc / shm_mq (a table of contents and message queues laid out inside a
segment). The background
worker framework and DSM are siblings that almost always travel together:
you register a worker, then hand it a DSM handle as its bgw_main_arg.
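The reference-counting convention can be sketched without any real shared memory. This is a hypothetical single-process model (SegControl, seg_create, seg_attach, seg_detach), assuming the convention dsm.c uses: 0 marks an unused control slot, 1 marks a segment that is being destroyed, and a live segment starts at 2. Reserving 1 for the moribund state lets an attacher tell a segment that is going away from a live one; the real code performs these updates under an LWLock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical control entry: 0 = unused, 1 = moribund, >= 2 = live. */
typedef struct {
    unsigned handle;
    unsigned refcnt;
    bool     destroyed;
} SegControl;

static void seg_create(SegControl *c, unsigned handle)
{
    c->handle = handle;
    c->refcnt = 2;          /* creator's reference, offset by one so 1 can mean moribund */
    c->destroyed = false;
}

/* Attaching fails once the segment is moribund or already gone. */
static bool seg_attach(SegControl *c)
{
    if (c->refcnt <= 1)
        return false;
    c->refcnt++;
    return true;
}

/* Returns true if this was the last detach, which destroys the segment. */
static bool seg_detach(SegControl *c)
{
    assert(c->refcnt >= 2);
    if (--c->refcnt == 1) {
        /* real code would unmap and remove the OS-level segment here */
        c->destroyed = true;
        c->refcnt = 0;      /* control slot is free for reuse */
        return true;
    }
    return false;
}
```

A leader creates the segment, a worker attaches and later detaches, and only the leader’s final detach tears it down.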
```mermaid
flowchart TB
    subgraph PM["Postmaster (no locks, ever)"]
        LIST["BackgroundWorkerList<br/>private dlist of RegisteredBgWorker"]
        SC["BackgroundWorkerStateChange()<br/>scans slots, copies new regs in"]
        START["maybe_start_bgworkers()<br/>fork eligible workers"]
    end
    subgraph SHM["Shared memory"]
        ARR["BackgroundWorkerArray<br/>slot[max_worker_processes]<br/>in_use / terminate / pid / generation"]
        DSM["DSM segments + DSA areas<br/>per-operation shared state"]
    end
    subgraph BE["Regular backend (may take BackgroundWorkerLock)"]
        REG["RegisterDynamicBackgroundWorker()<br/>claim a free slot"]
        H["BackgroundWorkerHandle<br/>{slot, generation}"]
    end
    REG -->|"LWLock + write barrier,<br/>set in_use=true"| ARR
    REG -->|"PMSIGNAL_BACKGROUND_WORKER_CHANGE"| SC
    SC -->|"read barrier, copy slot"| LIST
    LIST --> START
    START -->|"postmaster_child_launch()<br/>fork"| WORKER["BackgroundWorkerMain()"]
    REG --> H
    H -.->|"GetBackgroundWorkerPid /<br/>WaitFor* / Terminate"| ARR
    WORKER -.->|"dsm_attach(bgw_main_arg)"| DSM
    BE -.->|"dsa_create / dsm_create"| DSM
```
PostgreSQL’s Approach
PostgreSQL exposes a deliberately small public API (a handful of registration and control functions plus one config struct) and hides a careful lockless protocol behind it. We trace the struct, the two registration paths, the slot array, the lifecycle, and the DSM/DSA companions.
The registration struct: BackgroundWorker
Everything a caller wants the postmaster to know about a prospective worker is
packed into one fixed-layout struct, because it must be memcpy-able through
shared memory and across a fork/exec boundary:
```c
// BackgroundWorker (src/include/postmaster/bgworker.h)
typedef struct BackgroundWorker
{
    char        bgw_name[BGW_MAXLEN];
    char        bgw_type[BGW_MAXLEN];
    int         bgw_flags;
    BgWorkerStartTime bgw_start_time;
    int         bgw_restart_time;       /* in seconds, or BGW_NEVER_RESTART */
    char        bgw_library_name[MAXPGPATH];
    char        bgw_function_name[BGW_MAXLEN];
    Datum       bgw_main_arg;
    char        bgw_extra[BGW_EXTRALEN];
    pid_t       bgw_notify_pid;         /* SIGUSR1 this backend on start/stop */
} BackgroundWorker;
```
Note there are no pointers here except Datum bgw_main_arg, which is by
convention a small scalar (often a DSM handle, an OID, or a slot index) — never
a heap address, because the worker runs in a different address space. The
entry point is named, not pointed at: bgw_library_name + bgw_function_name.
bgw_flags is a bitmask of BGWORKER_SHMEM_ACCESS (mandatory),
BGWORKER_BACKEND_DATABASE_CONNECTION (worker will call
BackgroundWorkerInitializeConnection), and the internal-only
BGWORKER_CLASS_PARALLEL. bgw_start_time (one of
BgWorkerStart_PostmasterStart / _ConsistentState / _RecoveryFinished)
tells the postmaster how far into startup it may launch this worker; a
database-connected worker may not start at postmaster start, since the
catalogs are not yet available.
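The start-time and flag rules can be expressed as a small validator. This is a hedged sketch, not SanityCheckBackgroundWorker itself: WorkerSpec and sanity_check_worker are hypothetical, the flag values are the ones quoted in the source-verification notes, BGW_NEVER_RESTART is assumed to be -1, and the upper bound PostgreSQL places on bgw_restart_time is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as quoted from bgworker.h; BGW_NEVER_RESTART assumed to be -1. */
#define BGWORKER_SHMEM_ACCESS                0x0001
#define BGWORKER_BACKEND_DATABASE_CONNECTION 0x0002
#define BGWORKER_CLASS_PARALLEL              0x0010
#define BGW_NEVER_RESTART                    (-1)

typedef enum {
    BgWorkerStart_PostmasterStart,
    BgWorkerStart_ConsistentState,
    BgWorkerStart_RecoveryFinished
} BgWorkerStartTime;

/* Hypothetical, trimmed-down registration record (not the real struct). */
typedef struct {
    int               flags;
    BgWorkerStartTime start_time;
    int               restart_time;   /* seconds, or BGW_NEVER_RESTART */
} WorkerSpec;

/* Apply the rules described in the text; returns false on rejection. */
static bool sanity_check_worker(const WorkerSpec *w)
{
    if (!(w->flags & BGWORKER_SHMEM_ACCESS))
        return false;                 /* shared-memory access is mandatory */
    if ((w->flags & BGWORKER_BACKEND_DATABASE_CONNECTION) &&
        w->start_time == BgWorkerStart_PostmasterStart)
        return false;                 /* catalogs are not readable that early */
    if (w->restart_time < 0 && w->restart_time != BGW_NEVER_RESTART)
        return false;                 /* negative intervals are invalid */
    if ((w->flags & BGWORKER_CLASS_PARALLEL) &&
        w->restart_time != BGW_NEVER_RESTART)
        return false;                 /* parallel workers are never restarted */
    return true;
}
```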
Two registration paths, one slot array
Static registration is for workers known at boot. It is callable only
from the postmaster itself or from an extension’s _PG_init while
shared_preload_libraries is being processed, and it appends to a
postmaster-private dlist, not to shared memory — because shared memory
does not exist yet:
```c
// RegisterBackgroundWorker (src/backend/postmaster/bgworker.c)
void
RegisterBackgroundWorker(BackgroundWorker *worker)
{
    RegisteredBgWorker *rw;
    static int  numworkers = 0;

    if (IsUnderPostmaster || !IsPostmasterEnvironment)
    {
        if (process_shared_preload_libraries_in_progress)
            return;
        ereport(LOG, ( /* ... must be registered in shared_preload_libraries */ ));
        return;
    }
    if (BackgroundWorkerData != NULL)
        elog(ERROR, "cannot register background worker \"%s\" after shmem init", ...);
    /* ... SanityCheckBackgroundWorker, numworkers cap ... */
    rw->rw_worker = *worker;
    rw->rw_pid = 0;
    rw->rw_crashed_at = 0;
    rw->rw_terminate = false;
    dlist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
}
```
At BackgroundWorkerShmemInit time these private entries are copied 1-to-1
into the shared slot[] array, establishing the correspondence between the
postmaster’s private BackgroundWorkerList and the shared BackgroundWorkerArray.
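The static-registration correspondence can be modeled as a one-time walk of the private list. In this sketch, PrivWorker, SharedSlot, shmem_init, and find_by_slot are hypothetical stand-ins for RegisteredBgWorker, BackgroundWorkerSlot, BackgroundWorkerShmemInit, and FindRegisteredWorkerBySlotNumber: each private entry records the index of the shared slot it was copied into, and the reverse mapping is a linear scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 4   /* hypothetical max_worker_processes */

/* Postmaster-private record (hypothetical mirror of RegisteredBgWorker). */
typedef struct PrivWorker {
    const char        *name;
    int                shmem_slot;   /* index into the shared array, -1 = unset */
    struct PrivWorker *next;
} PrivWorker;

typedef struct {
    bool in_use;
} SharedSlot;

static SharedSlot shared[MAX_WORKERS];

/* Copy the private list 1-to-1 into the first N slots; the rest start free. */
static void shmem_init(PrivWorker *list)
{
    int slotno = 0;

    for (PrivWorker *rw = list; rw != NULL; rw = rw->next) {
        assert(slotno < MAX_WORKERS);   /* registration already capped the count */
        shared[slotno].in_use = true;
        rw->shmem_slot = slotno++;
    }
    for (; slotno < MAX_WORKERS; slotno++)
        shared[slotno].in_use = false;
}

/* Map a slot number back to its private record by linear scan. */
static PrivWorker *find_by_slot(PrivWorker *list, int slotno)
{
    for (PrivWorker *rw = list; rw != NULL; rw = rw->next)
        if (rw->shmem_slot == slotno)
            return rw;
    return NULL;
}
```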
Dynamic registration is for workers requested at runtime by an ordinary
backend (a parallel-query leader, the logical-replication launcher, an
extension’s SQL function). It must be called from a postmaster child process (it returns false otherwise), takes
BackgroundWorkerLock to coordinate with other backends, scans for a free
slot, fills it, and — critically — issues a write barrier before setting
in_use so the postmaster can never observe a half-initialized slot:
```c
// RegisterDynamicBackgroundWorker (src/backend/postmaster/bgworker.c)
LWLockAcquire(BackgroundWorkerLock, LW_EXCLUSIVE);
/* ... parallel-class admission check against max_parallel_workers ... */
for (slotno = 0; slotno < BackgroundWorkerData->total_slots; ++slotno)
{
    BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];

    if (!slot->in_use)
    {
        memcpy(&slot->worker, worker, sizeof(BackgroundWorker));
        slot->pid = InvalidPid;     /* indicates not started yet */
        slot->generation++;
        slot->terminate = false;
        generation = slot->generation;
        if (parallel)
            BackgroundWorkerData->parallel_register_count++;
        pg_write_barrier();         /* postmaster must see contents before in_use */
        slot->in_use = true;
        success = true;
        break;
    }
}
LWLockRelease(BackgroundWorkerLock);
if (success)
    SendPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE);
```
The backend then signals the postmaster (PMSIGNAL_BACKGROUND_WORKER_CHANGE)
and, if it asked for a handle, receives {slot, generation} to query or kill
the worker later.
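The same publish pattern can be sketched with C11 atomics, assuming a release store is an acceptable stand-in for pg_write_barrier() followed by the plain store, and an acquire load for the plain load followed by pg_read_barrier(). Slot, register_dynamic, postmaster_scan, and the spinlock standing in for BackgroundWorkerLock are all hypothetical. Backends serialize among themselves on the lock; the scanning side never touches it and relies only on the acquire/release pairing to see a fully written slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 4   /* hypothetical slot count */

typedef struct {
    atomic_bool in_use;      /* publication flag: a backend writes it last */
    uint64_t    generation;  /* bumped on every reuse */
    int         pid;         /* -1 = not started yet (InvalidPid analogue) */
    char        func[32];    /* payload the scanner must see complete */
} Slot;

static Slot slots[NSLOTS];

/* Hypothetical stand-in for BackgroundWorkerLock: serializes backends only. */
static atomic_flag backend_lock = ATOMIC_FLAG_INIT;

/* Backend side: claim a free slot under the lock, publish it with a release store. */
static int register_dynamic(const char *func)
{
    int found = -1;

    while (atomic_flag_test_and_set_explicit(&backend_lock, memory_order_acquire))
        ;                                   /* spin until the lock is ours */
    for (int i = 0; i < NSLOTS; i++) {
        if (!atomic_load_explicit(&slots[i].in_use, memory_order_acquire)) {
            strncpy(slots[i].func, func, sizeof slots[i].func - 1);
            slots[i].func[sizeof slots[i].func - 1] = '\0';
            slots[i].pid = -1;
            slots[i].generation++;
            /* release store: every write above is visible before in_use */
            atomic_store_explicit(&slots[i].in_use, true, memory_order_release);
            found = i;
            break;
        }
    }
    atomic_flag_clear_explicit(&backend_lock, memory_order_release);
    return found;                           /* real code would now signal the supervisor */
}

/* Scanner side: never takes backend_lock; the acquire load pairs with the release. */
static int postmaster_scan(char seen[][32])
{
    int n = 0;

    for (int i = 0; i < NSLOTS; i++) {
        if (!atomic_load_explicit(&slots[i].in_use, memory_order_acquire))
            continue;
        /* after the acquire load, func, pid, and generation are fully visible */
        memcpy(seen[n++], slots[i].func, sizeof slots[i].func);
    }
    return n;
}
```

If the scanner observes in_use as true, the pairing guarantees it also observes the name, pid, and generation written before the store, which is exactly the property the write/read barrier pair provides in bgworker.c.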
The lockless hand-off protocol
The shared slot carries five fields, and a comment block in bgworker.c
spells out the ownership contract: when in_use is false the postmaster
ignores the slot and backends own it; once a backend flips in_use to true
(after a write barrier) the slot becomes the postmaster’s, and backends may
thereafter only set the terminate flag.
```c
// BackgroundWorkerSlot (src/backend/postmaster/bgworker.c)
typedef struct BackgroundWorkerSlot
{
    bool        in_use;
    bool        terminate;
    pid_t       pid;            /* InvalidPid = not started yet; 0 = dead */
    uint64      generation;     /* incremented when slot is recycled */
    BackgroundWorker worker;
} BackgroundWorkerSlot;
```
The handle a backend holds is the generation-tagged coordinate:
```c
// BackgroundWorkerHandle (src/backend/postmaster/bgworker.c)
struct BackgroundWorkerHandle
{
    int         slot;
    uint64      generation;
};
```
The postmaster’s side of the hand-off is BackgroundWorkerStateChange, run
in response to the signal. It is written defensively — it assumes shared
memory may be corrupt and must still not crash the postmaster — and uses a
read barrier so it never sees in_use before the slot contents that a
backend wrote under its write barrier:
```c
// BackgroundWorkerStateChange (src/backend/postmaster/bgworker.c)
for (slotno = 0; slotno < max_worker_processes; ++slotno)
{
    BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];

    if (!slot->in_use)
        continue;
    pg_read_barrier();          /* pair with the registrant's write barrier */

    rw = FindRegisteredWorkerBySlotNumber(slotno);
    if (rw != NULL)
    {
        if (slot->terminate && !rw->rw_terminate)   /* backend asked to kill */
        {
            rw->rw_terminate = true;
            if (rw->rw_pid != 0)
                kill(rw->rw_pid, SIGTERM);
            else
                ReportBackgroundWorkerPID(rw);
        }
        continue;
    }
    /* ... newly-registered: copy strings paranoidly (ascii_safe_strlcpy),
     * copy fixed fields, push onto BackgroundWorkerList ... */
}
```
Because the postmaster cannot take a lock, the parallel-worker accounting is
split into two counters that never need to be read together atomically:
parallel_register_count (bumped by backends under the lock) and
parallel_terminate_count (bumped only by the lock-free postmaster). The live
count is their difference, and the subtraction is correct even across uint32
wraparound.
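A small demonstration of why the difference survives wraparound, using a hypothetical ParallelCounters struct with uint32_t fields. Unsigned subtraction in C is defined modulo 2^32, so as long as the true number of live workers stays below 2^32, registered minus terminated is correct even after either counter wraps past zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of counters, each with a single kind of writer. */
typedef struct {
    uint32_t register_count;    /* only backends increment (under their lock) */
    uint32_t terminate_count;   /* only the postmaster increments (lock-free) */
} ParallelCounters;

/* Live workers = registered - terminated, computed modulo 2^32. */
static uint32_t live_parallel_workers(const ParallelCounters *c)
{
    return c->register_count - c->terminate_count;
}
```

Because each counter has exactly one kind of writer, neither update needs the other side’s lock. A backend that reads a slightly stale terminate count can only overestimate the live count, which errs toward refusing a new parallel worker rather than admitting one too many.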
```mermaid
flowchart LR
    FREE["slot free<br/>in_use=false"] -->|"backend: memcpy worker,<br/>generation++, write barrier,<br/>in_use=true"| CLAIMED["claimed<br/>in_use=true, pid=InvalidPid"]
    CLAIMED -->|"postmaster: StateChange<br/>copies to private list"| KNOWN["registered<br/>in postmaster list"]
    KNOWN -->|"maybe_start_bgworkers:<br/>fork, ReportBackgroundWorkerPID"| RUN["running<br/>pid > 0"]
    RUN -->|"exit code 0 OR never-restart<br/>OR terminate flag"| FORGET["ForgetBackgroundWorker<br/>in_use=false again"]
    RUN -->|"exit code 1,<br/>restart interval elapsed"| KNOWN
    CLAIMED -->|"backend sets terminate<br/>before start"| FORGET
    FORGET --> FREE
```
Lifecycle: launch, crash, restart, reap
The postmaster launches eligible workers from its main loop via
maybe_start_bgworkers → StartBackgroundWorker, which forks through
postmaster_child_launch(B_BG_WORKER, …, &rw->rw_worker, …). The forked
child runs BackgroundWorkerMain, which copies the registration out of the
inherited postmaster memory, frees PostmasterContext, installs signal
handlers, calls InitProcess + BaseInit, resolves the entry point, and
finally invokes the user function. The restart contract is encoded in the
exit code, interpreted in ReportBackgroundWorkerExit: exit 0 or
BGW_NEVER_RESTART ⇒ forget the worker (free the slot); exit 1 ⇒ leave it
registered so maybe_start_bgworkers relaunches it after bgw_restart_time
seconds have elapsed since rw_crashed_at.
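The exit-code contract reduces to two small decisions: what to do when the worker exits, and, for a worker kept for restart, whether its interval has elapsed. The sketch below is hypothetical (exit_action, restart_due, and the ExitAction enum are not PostgreSQL names) and measures time in whole seconds rather than TimestampTz. It also assumes, beyond what is described above, that an exit status other than 0 or 1 is handled as a crash that triggers server-wide recovery rather than a per-worker restart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BGW_NEVER_RESTART (-1)   /* assumed value of the real macro */

typedef enum { FORGET_WORKER, KEEP_FOR_RESTART, SERVER_CRASH_RESTART } ExitAction;

/* Decide what happens when a worker exits with the given status. */
static ExitAction exit_action(int exit_code, int restart_time)
{
    if (exit_code != 0 && exit_code != 1)
        return SERVER_CRASH_RESTART;   /* assumption: abnormal exit means crash recovery */
    if (exit_code == 0 || restart_time == BGW_NEVER_RESTART)
        return FORGET_WORKER;          /* free the slot, never relaunch */
    return KEEP_FOR_RESTART;           /* exit 1: stay registered */
}

/* For a kept worker: has the restart interval elapsed since it crashed? */
static bool restart_due(int64_t crashed_at, int restart_time, int64_t now)
{
    if (crashed_at == 0)
        return true;                   /* never crashed (or reset): start at once */
    return now - crashed_at >= restart_time;
}
```

Splitting the decision this way mirrors the division of labor described above: the exit path decides whether the registration survives, and the launch loop later decides when to fork it again.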
DSM and DSA: the shared-state companions
A worker that needs to share variable-size state with its launcher attaches a
dynamic shared memory segment whose handle was passed as bgw_main_arg.
dsm_create carves a segment (preferring the preallocated main-region slots,
falling back to OS-level segments), reference-counts it in a control array,
and returns a dsm_segment * whose handle is a process-portable name:
```c
// dsm_create, excerpt (src/backend/storage/ipc/dsm.c)
seg = dsm_create_descriptor();
/* try main shared-memory region first, else create an OS segment: */
seg->handle = pg_prng_uint32(&pg_global_prng_state) << 1;  /* even handles */
/* ... dsm_impl_op(DSM_OP_CREATE, ...) ... */
dsm_control->item[i].handle = seg->handle;
dsm_control->item[i].refcnt = 2;    /* refcnt 1 == moribund, so start at 2 */
```
For many small allocations in shared memory, raw segments are too coarse, so
dsa.c builds a shared heap on top: dsa_create makes a DSM segment, pins
it, and lays a control object plus a free-page manager over it; allocations
return a dsa_pointer (a {segment-index, offset} pseudo-pointer) that any
attached backend translates to a local address with dsa_get_address:
```c
// dsa_get_address, excerpt (src/backend/utils/mmgr/dsa.c)
index = DSA_EXTRACT_SEGMENT_NUMBER(dp);
offset = DSA_EXTRACT_OFFSET(dp);
if (unlikely(area->segment_maps[index].mapped_address == NULL))
    get_segment_by_index(area, index);      /* map it in on demand */
return area->segment_maps[index].mapped_address + offset;
```
The pseudo-pointer indirection is what makes DSA pointers shareable: a
dsa_pointer means the same logical location in every backend even though the
underlying segment is mapped at a different virtual address in each. (For the
broader IPC story — shm_mq, shm_toc, the main shared-memory region — see
the cross-referenced postgres-shared-memory-ipc.md.)
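The pseudo-pointer idea is easy to model with shifts and masks. This sketch is hypothetical: dsa_ptr_t, OFFSET_WIDTH, make_ptr, and translate are not dsa.c names, and the 40-bit offset field is an assumption chosen to resemble a 64-bit build. Each process keeps its own table of mapped addresses, so the same 64-bit value resolves to a different local address in each process; where dsa_get_address would map a missing segment on demand, this sketch just asserts that it is already mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pseudo-pointer: high bits = segment index, low bits = offset. */
typedef uint64_t dsa_ptr_t;
#define OFFSET_WIDTH 40                      /* assumed, 64-bit-style layout */
#define OFFSET_MASK  ((((dsa_ptr_t) 1) << OFFSET_WIDTH) - 1)
#define MAX_SEGMENTS 4

static dsa_ptr_t make_ptr(uint64_t segment, uint64_t offset)
{
    assert(offset <= OFFSET_MASK);
    return (segment << OFFSET_WIDTH) | offset;
}

static uint64_t ptr_segment(dsa_ptr_t dp) { return dp >> OFFSET_WIDTH; }
static uint64_t ptr_offset(dsa_ptr_t dp)  { return dp & OFFSET_MASK; }

/* Per-process view: where this process happened to map each segment. */
typedef struct {
    char *mapped_address[MAX_SEGMENTS];      /* NULL = not mapped here yet */
} LocalMaps;

/* Translate a shared pseudo-pointer into this process's local address. */
static char *translate(const LocalMaps *maps, dsa_ptr_t dp)
{
    uint64_t seg = ptr_segment(dp);

    assert(seg < MAX_SEGMENTS && maps->mapped_address[seg] != NULL);
    return maps->mapped_address[seg] + ptr_offset(dp);
}
```

Two processes holding the same dsa_ptr_t value (segment 1, offset 16) get region-relative addresses in their own mappings, which is why the value can be stored in shared memory while a raw pointer could not.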
Source Walkthrough
Symbols below are the canonical anchors; grep for them. They are grouped by call-flow. Line numbers live only in the position-hint table at the end.
Shared-memory layout and initialization
- BackgroundWorkerSlot: the per-worker shared slot (in_use, terminate, pid, generation, embedded worker). The whole lockless protocol is a contract over the ordering of writes to these five fields.
- BackgroundWorkerArray: the shared header (total_slots, parallel_register_count, parallel_terminate_count, and the slot[FLEXIBLE_ARRAY_MEMBER]). One instance, named "Background Worker Data" in the shmem hash, pointed at by the static BackgroundWorkerData.
- BackgroundWorkerShmemSize: sizes the array as offsetof(BackgroundWorkerArray, slot) + max_worker_processes * sizeof(BackgroundWorkerSlot). This is why max_worker_processes is a hard cap requiring a restart to change.
- BackgroundWorkerShmemInit: at boot, copies the postmaster-private BackgroundWorkerList into the first N slots (marking them in_use, recording rw->rw_shmem_slot), and marks the rest free. Guarded by !IsUnderPostmaster so only the postmaster initializes.
- RegisteredBgWorker (in bgworker_internals.h): the postmaster-private mirror of a slot, with fields rw_worker, rw_pid, rw_crashed_at, rw_shmem_slot, rw_terminate, and the rw_lnode dlist link.
Registration (the two entry doors)
- RegisterBackgroundWorker: static path. Rejects calls outside the postmaster (tolerating the process_shared_preload_libraries_in_progress case for sloppy _PG_init functions), rejects calls after shmem init, enforces the numworkers > max_worker_processes cap, and pushes onto BackgroundWorkerList. Rejects a non-zero bgw_notify_pid, since only dynamic workers may request notification.
- RegisterDynamicBackgroundWorker: dynamic path. Returns false if not under the postmaster. Takes BackgroundWorkerLock exclusively, applies the parallel-class admission check, scans for !slot->in_use, fills the slot, then does generation++, pg_write_barrier(), and slot->in_use = true, releases the lock, sends PMSIGNAL_BACKGROUND_WORKER_CHANGE, and optionally returns a BackgroundWorkerHandle.
- SanityCheckBackgroundWorker: shared validator. Requires BGWORKER_SHMEM_ACCESS; forbids a DB-connected worker from starting at BgWorkerStart_PostmasterStart; bounds bgw_restart_time; forbids a parallel worker from being restartable (the counter accounting cannot survive a parallel worker through crash-restart); defaults bgw_type to bgw_name.
Postmaster-side state machine
- BackgroundWorkerStateChange: the postmaster’s reaction to the change signal. Validates total_slots, scans slots with a pg_read_barrier(), propagates terminate to running workers via SIGTERM, and copies brand-new registrations into the private list using ascii_safe_strlcpy (paranoid against non-NUL-terminated corrupt strings). Passing allow_new_workers = false (during shutdown) forces every pending slot to terminate.
- FindRegisteredWorkerBySlotNumber: linear scan of BackgroundWorkerList mapping a slot number back to its private RegisteredBgWorker.
- maybe_start_bgworkers (in postmaster.c): the launch loop. Skips running workers, forgets terminated ones, honors bgw_restart_time by comparing rw_crashed_at against now, gates on bgworker_should_start_now, and caps each pass at MAX_BGWORKERS_TO_LAUNCH (100) so a flood of registrations cannot starve the postmaster’s other duties.
- bgworker_should_start_now (in postmaster.c): maps the current pmState to the bgw_start_time values that are eligible, implementing the “PostmasterStart < ConsistentState < RecoveryFinished” ordering via switch fall-through.
- StartBackgroundWorker (in postmaster.c): assigns a child slot, forks via postmaster_child_launch(B_BG_WORKER, …), records rw_pid, and calls ReportBackgroundWorkerPID. A fork failure is treated like a crash (rw_crashed_at = now) so the postmaster backs off before retrying.
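The fall-through ordering in bgworker_should_start_now and the per-pass launch cap can be sketched together. The server states here (ST_INIT, ST_RECOVERY, ST_HOT_STANDBY, ST_RUN, ST_SHUTDOWN) are simplified, hypothetical stand-ins for the real pmState values, and launch_pass only counts the workers it would fork. A later state falls through to the checks of every earlier state, so it admits every earlier start time as well as its own.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { START_AT_POSTMASTER, START_AT_CONSISTENT, START_AT_RECOVERY_DONE } StartTime;

/* Simplified, hypothetical stand-ins for the postmaster's pmState values. */
typedef enum { ST_INIT, ST_RECOVERY, ST_HOT_STANDBY, ST_RUN, ST_SHUTDOWN } ServerState;

/* Later states admit every earlier start time, via deliberate fall-through. */
static bool should_start_now(ServerState s, StartTime t)
{
    switch (s) {
    case ST_SHUTDOWN:
        break;                          /* nothing new starts while stopping */
    case ST_RUN:
        if (t == START_AT_RECOVERY_DONE)
            return true;
        /* fall through */
    case ST_HOT_STANDBY:
        if (t == START_AT_CONSISTENT)
            return true;
        /* fall through */
    case ST_RECOVERY:
    case ST_INIT:
        if (t == START_AT_POSTMASTER)
            return true;
        break;
    }
    return false;
}

#define MAX_TO_LAUNCH_PER_PASS 100      /* mirrors MAX_BGWORKERS_TO_LAUNCH as quoted */

/* One launch pass: count eligible workers, never more than the per-pass cap. */
static int launch_pass(ServerState s, const StartTime *pending, int npending)
{
    int launched = 0;

    for (int i = 0; i < npending && launched < MAX_TO_LAUNCH_PER_PASS; i++)
        if (should_start_now(s, pending[i]))
            launched++;                 /* real code would fork here */
    return launched;
}
```

With 150 eligible registrations, one pass launches 100 and leaves the rest for a later iteration of the main loop.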
Worker-side entry and connection
- BackgroundWorkerMain: the child’s main. Copies the startup BackgroundWorker into TopMemoryContext, deletes PostmasterContext, sets MyBgworkerEntry / MyBackendType = B_BG_WORKER, installs the signal handlers (note bgworker_die for SIGTERM), sets up the sigsetjmp error recovery, calls InitProcess + BaseInit, resolves the entry point with LookupBackgroundWorkerFunction, and calls it. Return ⇒ proc_exit(0) (no restart).
- LookupBackgroundWorkerFunction: resolves (bgw_library_name, bgw_function_name) to an address. For "postgres" it searches the InternalBGWorkers[] table (ParallelWorkerMain, ApplyLauncherMain, ApplyWorkerMain, ParallelApplyWorkerMain, TablesyncWorkerMain); otherwise it calls load_external_function. This is the name-not-pointer mechanism that survives EXEC_BACKEND.
- BackgroundWorkerInitializeConnection / …ByOid: a DB-connected worker calls one of these to run InitPostgres and flip from InitProcessing to NormalProcessing. The BGWORKER_BYPASS_ALLOWCONN / …_ROLELOGINCHECK flags map to INIT_PG_OVERRIDE_*.
- bgworker_die: the SIGTERM handler; it raises ereport(FATAL, …) naming the bgw_type.
- BackgroundWorkerBlockSignals / …Unblock: wrap sigprocmask for critical sections.
Handle-based control (the requesting backend’s API)
- GetBackgroundWorkerPid: takes BackgroundWorkerLock shared, compares handle->generation against slot->generation (the ABA guard), and returns BGWH_STARTED / BGWH_NOT_YET_STARTED / BGWH_STOPPED based on slot->pid (> 0 / InvalidPid / 0).
- WaitForBackgroundWorkerStartup / …Shutdown: latch-wait loops around GetBackgroundWorkerPid that also watch WL_POSTMASTER_DEATH and return BGWH_POSTMASTER_DIED. They require the caller to have set itself as bgw_notify_pid so the postmaster’s SIGUSR1 wakes the latch.
- TerminateBackgroundWorker: sets slot->terminate = true under the lock (only if the generation still matches) and signals the postmaster. Safe to call whether or not the worker is still alive.
Reaping and crash recovery
- ReportBackgroundWorkerPID: the postmaster writes slot->pid and sends SIGUSR1 to the bgw_notify_pid.
- ReportBackgroundWorkerExit: on worker exit, writes slot->pid, and if rw_terminate is set or bgw_restart_time == BGW_NEVER_RESTART, calls ForgetBackgroundWorker before notifying the waiter (narrowing the slot-reuse race).
- ForgetBackgroundWorker: bumps parallel_terminate_count for parallel workers, issues pg_memory_barrier(), frees the slot (in_use = false), then unlinks and pfrees the private entry.
- ResetBackgroundWorkerCrashTimes: after a server-wide crash-restart cycle, forgets BGW_NEVER_RESTART workers and zeroes rw_crashed_at, rw_pid, and rw_worker.bgw_notify_pid on the rest so they relaunch immediately. Asserts that no parallel worker reaches the restart branch.
- ForgetUnstartedBackgroundWorkers: during normal shutdown, zaps not-yet-started workers that have a waiter and notifies the waiter.
- BackgroundWorkerStopNotifications: clears bgw_notify_pid for a dying backend so the postmaster stops trying to signal it.
DSM / DSA companions
- dsm_create / dsm_attach / dsm_detach: create, attach, or detach a raw segment; reference-counted in dsm_control_item.refcnt (refcnt 1 == moribund, so live segments start at 2). dsm_segment_handle yields the process-portable name used as a bgw_main_arg.
- dsm_pin_segment / dsm_pin_mapping: extend lifetimes beyond the default. dsm_pin_mapping keeps a backend’s mapping past its resource owner; dsm_pin_segment keeps the segment itself alive even when no backend is attached. DSA pins its backing segments so an area outlives any single attached backend.
- dsa_create_ext / dsa_attach / dsa_attach_in_place: build or join a shared heap. dsa_get_handle yields a dsa_handle to ship to a worker. dsa_allocate_extended / dsa_free are the heap ops; dsa_get_address translates a dsa_pointer to a local address, mapping segments in on demand.
Position hints (as of 2026-06-05, REL_18 273fe94)
| Symbol | File | Line |
|---|---|---|
BackgroundWorkerSlot | src/backend/postmaster/bgworker.c | 74 |
BackgroundWorkerArray | src/backend/postmaster/bgworker.c | 94 |
BackgroundWorkerHandle (struct) | src/backend/postmaster/bgworker.c | 102 |
InternalBGWorkers[] | src/backend/postmaster/bgworker.c | 114 |
BackgroundWorkerShmemSize | src/backend/postmaster/bgworker.c | 145 |
BackgroundWorkerShmemInit | src/backend/postmaster/bgworker.c | 161 |
FindRegisteredWorkerBySlotNumber | src/backend/postmaster/bgworker.c | 220 |
BackgroundWorkerStateChange | src/backend/postmaster/bgworker.c | 245 |
ForgetBackgroundWorker | src/backend/postmaster/bgworker.c | 428 |
ReportBackgroundWorkerPID | src/backend/postmaster/bgworker.c | 460 |
ReportBackgroundWorkerExit | src/backend/postmaster/bgworker.c | 482 |
BackgroundWorkerStopNotifications | src/backend/postmaster/bgworker.c | 513 |
ForgetUnstartedBackgroundWorkers | src/backend/postmaster/bgworker.c | 540 |
ResetBackgroundWorkerCrashTimes | src/backend/postmaster/bgworker.c | 578 |
SanityCheckBackgroundWorker | src/backend/postmaster/bgworker.c | 631 |
bgworker_die | src/backend/postmaster/bgworker.c | 703 |
BackgroundWorkerMain | src/backend/postmaster/bgworker.c | 717 |
BackgroundWorkerInitializeConnection | src/backend/postmaster/bgworker.c | 852 |
BackgroundWorkerInitializeConnectionByOid | src/backend/postmaster/bgworker.c | 886 |
RegisterBackgroundWorker | src/backend/postmaster/bgworker.c | 939 |
RegisterDynamicBackgroundWorker | src/backend/postmaster/bgworker.c | 1045 |
GetBackgroundWorkerPid | src/backend/postmaster/bgworker.c | 1157 |
WaitForBackgroundWorkerStartup | src/backend/postmaster/bgworker.c | 1212 |
WaitForBackgroundWorkerShutdown | src/backend/postmaster/bgworker.c | 1257 |
TerminateBackgroundWorker | src/backend/postmaster/bgworker.c | 1296 |
LookupBackgroundWorkerFunction | src/backend/postmaster/bgworker.c | 1337 |
GetBackgroundWorkerTypeByPid | src/backend/postmaster/bgworker.c | 1371 |
BackgroundWorker (struct) | src/include/postmaster/bgworker.h | 89 |
RegisteredBgWorker | src/include/postmaster/bgworker_internals.h | 32 |
StartBackgroundWorker | src/backend/postmaster/postmaster.c | 4105 |
bgworker_should_start_now | src/backend/postmaster/postmaster.c | 4166 |
maybe_start_bgworkers | src/backend/postmaster/postmaster.c | 4213 |
dsm_create | src/backend/storage/ipc/dsm.c | 516 |
dsm_attach | src/backend/storage/ipc/dsm.c | 665 |
dsm_detach | src/backend/storage/ipc/dsm.c | 803 |
dsm_pin_segment | src/backend/storage/ipc/dsm.c | 955 |
dsm_segment_handle | src/backend/storage/ipc/dsm.c | 1123 |
dsa_create_ext | src/backend/utils/mmgr/dsa.c | 421 |
dsa_get_handle | src/backend/utils/mmgr/dsa.c | 498 |
dsa_attach | src/backend/utils/mmgr/dsa.c | 510 |
dsa_allocate_extended | src/backend/utils/mmgr/dsa.c | 671 |
dsa_get_address | src/backend/utils/mmgr/dsa.c | 942 |
LaunchParallelWorkers (registration block) | src/backend/access/transam/parallel.c | 601 |
logicalrep_worker_launch (registration block) | src/backend/replication/logical/launcher.c | 469 |
Source verification (as of 2026-06-05)
Built on /data/hgryoo/references/postgres at REL_18_STABLE, commit
273fe94 (PG 18.x). Spot-checks performed:
- Public API surface. grep -n in bgworker.h confirms exactly the documented exports: RegisterBackgroundWorker, RegisterDynamicBackgroundWorker, GetBackgroundWorkerPid, WaitForBackgroundWorkerStartup, WaitForBackgroundWorkerShutdown, GetBackgroundWorkerTypeByPid, TerminateBackgroundWorker, BackgroundWorkerInitializeConnection[ByOid], BackgroundWorkerBlockSignals / …Unblock, plus the PGDLLIMPORT global MyBgworkerEntry. The flag macros BGWORKER_SHMEM_ACCESS (0x0001), BGWORKER_BACKEND_DATABASE_CONNECTION (0x0002), BGWORKER_CLASS_PARALLEL (0x0010), and the connection-bypass flags (0x0001 / 0x0002) are present as quoted.
- Lockless protocol invariants. The pg_write_barrier() immediately before slot->in_use = true in RegisterDynamicBackgroundWorker, and the matching pg_read_barrier() after the !slot->in_use skip in BackgroundWorkerStateChange, are both present at the quoted sites. The write/read barrier pair is the load-bearing detail and was verified directly in the source, not inferred.
- Counter accounting. parallel_register_count is incremented only under BackgroundWorkerLock (in RegisterDynamicBackgroundWorker); parallel_terminate_count is incremented only in postmaster-context functions (BackgroundWorkerStateChange, ForgetBackgroundWorker) with no lock, matching the struct comment that the postmaster must stay lockless.
- InternalBGWorkers[] membership. The five internal entry points (ParallelWorkerMain, ApplyLauncherMain, ApplyWorkerMain, ParallelApplyWorkerMain, TablesyncWorkerMain) are exactly those in the array, confirming the parallel-query and logical-replication consumers named in §“PostgreSQL’s Approach”.
- Consumer registration blocks. The parallel-query block in parallel.c (bgw_function_name = "ParallelWorkerMain", BGWORKER_CLASS_PARALLEL, BGW_NEVER_RESTART, bgw_main_arg = dsm_segment_handle(pcxt->seg)) and the logical-launcher block in launcher.c (ApplyWorkerMain / ParallelApplyWorkerMain / TablesyncWorkerMain, BGW_NEVER_RESTART, bgw_main_arg = Int32GetDatum(slot)) were read in full and quoted faithfully in §“Beyond” below.
- Scope guard. No contrib/ code is asserted as core behavior; worker_spi is named only as an illustrative example. No PG19-only symbols appear; every symbol in the position-hint table was located in the REL_18 tree at the listed line.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
The three in-tree consumers, side by side
The framework’s value is best seen through its callers, who all reduce to
“fill a BackgroundWorker, register it, optionally hold a handle.” Their
differences are entirely in the flags and policy fields.
Parallel query (parallel.c, LaunchParallelWorkers) is the most
demanding consumer: short-lived, fan-out, never restarted, capped against
max_parallel_workers. It passes the handle of the parallel context’s DSM segment (pcxt->seg) as the
worker’s argument so the worker can attach to the leader’s tuple queues and
plan:
```c
// LaunchParallelWorkers (registration): src/backend/access/transam/parallel.c
worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION |
                   BGWORKER_CLASS_PARALLEL;
worker.bgw_start_time = BgWorkerStart_ConsistentState;
worker.bgw_restart_time = BGW_NEVER_RESTART;
sprintf(worker.bgw_library_name, "postgres");
sprintf(worker.bgw_function_name, "ParallelWorkerMain");
worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(pcxt->seg));
worker.bgw_notify_pid = MyProcPid;
/* ... loop: RegisterDynamicBackgroundWorker(&worker, &pcxt->worker[i].bgwhandle) ... */
```

The leader tolerates partial success: if registration fails (slots
exhausted) it simply runs with fewer workers. The BGWORKER_CLASS_PARALLEL
flag routes the worker through the dedicated register/terminate counters so a
parallel query can never push the system past max_parallel_workers even
though those workers also count against max_worker_processes. See
postgres-parallel-query.md for the leader/worker tuple-queue and shm_toc
machinery built on top of this registration.
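The counter rule in the paragraph above can be sketched as follows. This is a simplified, hypothetical model. DemoBgwState, demo_register, demo_terminate and demo_launch_parallel are illustrative names. The model assumes, as we read RegisterDynamicBackgroundWorker, that the parallel cap is checked as the unsigned difference parallel_register_count - parallel_terminate_count before a free slot is sought. It treats the leader's partial success as stopping at the first failed registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the shared bgworker bookkeeping. */
typedef struct DemoBgwState
{
    int      total_slots;               /* stands in for max_worker_processes */
    int      used_slots;
    int      max_parallel;              /* stands in for max_parallel_workers */
    uint32_t parallel_register_count;   /* bumped by registering backends */
    uint32_t parallel_terminate_count;  /* bumped by the postmaster */
} DemoBgwState;

static bool demo_register(DemoBgwState *s, bool parallel)
{
    /* In-flight parallel workers = register - terminate, computed mod 2^32. */
    if (parallel &&
        (uint32_t) (s->parallel_register_count - s->parallel_terminate_count) >=
            (uint32_t) s->max_parallel)
        return false;                   /* parallel cap reached */
    if (s->used_slots >= s->total_slots)
        return false;                   /* no free slot at all */
    s->used_slots++;
    if (parallel)
        s->parallel_register_count++;
    return true;
}

static void demo_terminate(DemoBgwState *s, bool parallel)
{
    assert(s->used_slots > 0);
    s->used_slots--;
    if (parallel)
        s->parallel_terminate_count++;
}

/* Leader side: ask for `requested` workers and keep whatever it gets. */
static int demo_launch_parallel(DemoBgwState *s, int requested)
{
    int launched = 0;
    for (int i = 0; i < requested; i++)
    {
        if (!demo_register(s, true))
            break;                      /* partial success: stop asking */
        launched++;
    }
    return launched;
}
```

Each counter has one kind of writer: registrants update one while holding the lock, and the postmaster updates the other without one. Both counters are unsigned and only ever increase, so their difference stays correct even after either counter wraps around.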
Logical replication apply (launcher.c, logicalrep_worker_launch) is
the long-lived, DB-connected consumer. Each apply / parallel-apply / tablesync
worker is registered dynamically, never auto-restarted by the framework
(BGW_NEVER_RESTART — the launcher itself owns restart policy via its own
worker slots), and carries its logical-rep slot index as the argument:
```c
// logicalrep_worker_launch (registration): src/backend/replication/logical/launcher.c
bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
snprintf(bgw.bgw_library_name, MAXPGPATH, "postgres");
/* function name is ApplyWorkerMain / ParallelApplyWorkerMain / TablesyncWorkerMain */
bgw.bgw_restart_time = BGW_NEVER_RESTART;
bgw.bgw_notify_pid = MyProcPid;
bgw.bgw_main_arg = Int32GetDatum(slot);
if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
{
    /* clean up slot */
}
```

The apply launcher itself, by contrast, is registered statically with
RegisterBackgroundWorker and a 5-second bgw_restart_time, so the framework
does resurrect it if it dies — the one place the two registration paths and
two restart policies appear together. See postgres-logical-replication-apply.md.
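The interplay of the two restart policies can be sketched as a small decision function. It encodes the rules as documented for background workers. A worker is unregistered if its bgw_restart_time is BGW_NEVER_RESTART, if it exits with code 0, or if it was stopped via TerminateBackgroundWorker. Otherwise it is restarted once bgw_restart_time seconds have passed. The function and enum names are hypothetical. The sketch also deliberately ignores the crash-and-reinitialize path, in which the postmaster restarts workers as part of cluster recovery.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NEVER_RESTART (-1)   /* same value as BGW_NEVER_RESTART */

typedef enum
{
    DEMO_FORGET,        /* unregister the worker; nothing will be restarted */
    DEMO_WAIT,          /* restart later, once the interval has elapsed */
    DEMO_RESTART_NOW    /* interval elapsed: fork a replacement */
} DemoExitAction;

/* Simplified postmaster decision for a worker that has exited. */
static DemoExitAction
demo_after_exit(int restart_time_s, int exit_code, bool terminate_requested,
                long ms_since_exit)
{
    assert(restart_time_s >= DEMO_NEVER_RESTART);
    if (restart_time_s == DEMO_NEVER_RESTART || exit_code == 0 ||
        terminate_requested)
        return DEMO_FORGET;
    if (ms_since_exit < (long) restart_time_s * 1000L)
        return DEMO_WAIT;
    return DEMO_RESTART_NOW;
}
```

Under this model, an apply worker (BGW_NEVER_RESTART) that exits is simply forgotten by the framework, and re-launching it is left to the launcher. The statically registered launcher (bgw_restart_time of 5) is re-forked by the postmaster once five seconds have passed since a non-zero exit.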
Third-party extensions are the original motivation: an extension lists
itself in shared_preload_libraries, and its _PG_init calls
RegisterBackgroundWorker with bgw_library_name set to its own .so and
bgw_function_name to an exported entry point — which
LookupBackgroundWorkerFunction resolves via load_external_function. The
in-tree worker_spi test module (src/test/modules/worker_spi) is the canonical worked example (named
here only as illustration; it is not core server behavior). An
extension can equally register dynamically from a SQL-callable function, the
pattern used by job-scheduler extensions that spin up a worker per scheduled
task and hold its handle to monitor completion.
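The name-based resolution described above can be sketched as a two-way dispatch. In this hypothetical model, demo_internal_workers plays the role of InternalBGWorkers[] and demo_fake_loader stands in for load_external_function(). The library my_ext and the symbol my_ext_main are invented for illustration. As we understand it, storing names instead of function pointers is what keeps a registration meaningful in a freshly started process whose code may sit at a different address, notably in EXEC_BACKEND builds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*demo_main_fn)(long main_arg);

/* Hypothetical internal entry points; they just record that they ran. */
static long demo_last_arg = -1;
static void demo_parallel_main(long arg) { demo_last_arg = arg; }
static void demo_apply_main(long arg)    { demo_last_arg = arg + 1000; }

/* Stand-in for InternalBGWorkers[]: name -> entry point. */
static const struct
{
    const char  *fn_name;
    demo_main_fn fn_addr;
} demo_internal_workers[] = {
    {"ParallelWorkerMain", demo_parallel_main},
    {"ApplyWorkerMain", demo_apply_main},
};

/* Hypothetical extension entry point and a fake loader standing in for
 * load_external_function(library, symbol, ...). */
static void demo_ext_main(long arg) { demo_last_arg = -arg; }

static demo_main_fn demo_fake_loader(const char *lib, const char *sym)
{
    if (strcmp(lib, "my_ext") == 0 && strcmp(sym, "my_ext_main") == 0)
        return demo_ext_main;
    return NULL;
}

/* "postgres" means a core entry point; anything else is a shared library. */
static demo_main_fn demo_lookup(const char *lib, const char *fn)
{
    if (strcmp(lib, "postgres") == 0)
    {
        size_t n = sizeof demo_internal_workers / sizeof demo_internal_workers[0];
        for (size_t i = 0; i < n; i++)
            if (strcmp(demo_internal_workers[i].fn_name, fn) == 0)
                return demo_internal_workers[i].fn_addr;
        return NULL;   /* the real code raises an error: a programming bug */
    }
    return demo_fake_loader(lib, fn);
}
```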
```mermaid
flowchart TB
  subgraph STATIC["Static: shared_preload_libraries only"]
    EXT["extension _PG_init"] --> RBW["RegisterBackgroundWorker"]
    LAUNCH["apply launcher (5s restart)"] --> RBW
    RBW --> PLIST["BackgroundWorkerList (private)"]
  end
  subgraph DYNAMIC["Dynamic: from any backend at runtime"]
    PQ["parallel leader<br/>BGWORKER_CLASS_PARALLEL"] --> RDBW["RegisterDynamicBackgroundWorker"]
    LR["logicalrep_worker_launch<br/>apply / tablesync"] --> RDBW
    SQLFN["extension SQL function"] --> RDBW
    RDBW --> SLOT["claim BackgroundWorkerSlot"]
  end
  PLIST --> PM["postmaster: maybe_start_bgworkers → fork"]
  SLOT -->|"signal"| PM
```
How other engines solve the same problem
- Oracle uses a fixed pantheon of named background processes (PMON, SMON, DBWn, LGWR, CKPT, ARCn, plus job-queue Jnnn and parallel Pnnn slaves). The parallel slaves and job-queue processes are the closest analogues to PostgreSQL dynamic bgworkers, since they are pooled, assigned to transient work, and returned. However, the population is managed by Oracle-internal resource managers rather than exposed as a pluggable C API for arbitrary user code. There is no “register your own supervised process” extension surface comparable to RegisterBackgroundWorker.
- SQL Server runs a user-mode cooperative scheduler (SQLOS) with worker threads bound to schedulers, which are in turn bound to logical CPUs; parallelism is thread-based within one process. The fault-isolation tradeoff is the inverse of PostgreSQL’s: context switches are cheaper and memory is shared by default, but there is no process-level blast-radius containment and no notion of an externally supplied supervised process.
- MySQL/InnoDB has a small fixed set of background threads (purge, page-cleaner, master, I/O) inside the server process; extensibility is via plugins/components that run on the server’s own threads, not via spawning new supervised processes. There is no DSM-handle-passing convention because threads share the heap.
PostgreSQL is unusual in exposing its process supervisor as a documented, stable C ABI that third parties target — a direct consequence of the process-per-backend model (a new unit of work is naturally a new process) and the extensibility ethos traced to the original POSTGRES design papers (Stonebraker & Rowe 1986). The cost is the DSM/DSA apparatus: because workers do not share a heap, every byte of shared state must be explicitly placed in a segment and addressed by handle or pseudo-pointer.
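A pseudo-pointer can be made concrete with a sketch modeled loosely on dsa.c. To our reading, dsa.c packs a segment number into the high bits of a dsa_pointer and a byte offset into the low bits, with 40 offset bits in 64-bit builds. All demo_* names are hypothetical. Each process resolves the same 64-bit value against its own table of segment base addresses. That is why the value can be stored in shared memory even though the segments may be mapped at different addresses in different processes.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative split modeled on dsa.c's 64-bit layout. */
#define DEMO_OFFSET_WIDTH 40
#define DEMO_OFFSET_MASK  ((((uint64_t) 1) << DEMO_OFFSET_WIDTH) - 1)

typedef uint64_t demo_dsa_pointer;   /* hypothetical stand-in for dsa_pointer */

static demo_dsa_pointer demo_make_pointer(uint64_t segno, uint64_t offset)
{
    assert(offset <= DEMO_OFFSET_MASK);
    assert(segno < (((uint64_t) 1) << (64 - DEMO_OFFSET_WIDTH)));
    return (segno << DEMO_OFFSET_WIDTH) | offset;
}

static uint64_t demo_segno(demo_dsa_pointer p)  { return p >> DEMO_OFFSET_WIDTH; }
static uint64_t demo_offset(demo_dsa_pointer p) { return p & DEMO_OFFSET_MASK; }

/*
 * Resolve against the calling process's own mapping table:
 * segment_bases[i] is wherever this process happened to map segment i.
 */
static void *demo_get_address(char *const *segment_bases, demo_dsa_pointer p)
{
    return segment_bases[demo_segno(p)] + demo_offset(p);
}
```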
Research frontiers
- Per-core allocator concurrency. dsa.c’s own header comment flags the current single-lock-per-size-class design as a bottleneck: “Per-core pools to increase concurrency and strategies for reducing the resulting fragmentation are areas for future research.” As core counts climb, the DSA allocator that backs parallel-query shared state becomes a candidate for the same NUMA-aware, sharded-freelist treatment that modern systems-research allocators (and the Scalable Lock Manager line of PG work, 2013) apply to contended shared structures.
- Worker pooling vs. fork-per-operation. Each parallel query currently forks fresh workers and tears them down afterwards. For short queries on many-core machines, the cost of process creation (fork on Unix, full process creation in EXEC_BACKEND builds such as Windows) plus InitProcess is non-trivial. A persistent worker pool that reuses processes across operations, the model SQL Server’s SQLOS embodies with threads, is a recurring design discussion. The framework’s slot-and-handle protocol is largely pool-ready, but connection/database affinity and catalog-cache reset semantics make safe reuse subtle.
- Admission control across worker classes. Today parallel workers have a dedicated counter, but apply, tablesync, and extension workers all draw from the same max_worker_processes pool with no global prioritization. Autovacuum workers are not bgworkers and are budgeted separately by autovacuum_max_workers. Unifying these under a single resource governor, the direction Architecture of a Database System sketches for admission control, would let an operator reason about worker budgets holistically rather than per subsystem.
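One simple shape for such a governor is a single budget with per-class reservations and ceilings. The sketch below is entirely hypothetical: nothing like it exists in PostgreSQL today, and the class names, fields and demo_* functions are illustrative. A class is always admitted while it is below its reserved minimum and a slot is free. Beyond that minimum, it may grow only if enough free slots remain to honor every other class's unclaimed reservation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical worker classes sharing one budget. */
typedef enum { DEMO_PARALLEL, DEMO_APPLY, DEMO_EXTENSION, DEMO_NCLASSES } DemoClass;

typedef struct DemoGovernor
{
    int total;                    /* one server-wide worker budget */
    int reserved[DEMO_NCLASSES];  /* guaranteed minimum per class; sum <= total */
    int cap[DEMO_NCLASSES];       /* per-class ceiling */
    int in_use[DEMO_NCLASSES];
} DemoGovernor;

static int demo_total_in_use(const DemoGovernor *g)
{
    int n = 0;
    for (int c = 0; c < DEMO_NCLASSES; c++)
        n += g->in_use[c];
    return n;
}

/* Slots still promised to classes other than `self`. */
static int demo_held_for_others(const DemoGovernor *g, DemoClass self)
{
    int n = 0;
    for (int c = 0; c < DEMO_NCLASSES; c++)
        if (c != (int) self && g->in_use[c] < g->reserved[c])
            n += g->reserved[c] - g->in_use[c];
    return n;
}

static bool demo_admit(DemoGovernor *g, DemoClass c)
{
    int free_slots = g->total - demo_total_in_use(g);

    if (g->in_use[c] >= g->cap[c] || free_slots <= 0)
        return false;
    /* Beyond its own reservation, a class must leave others' reservations intact. */
    if (g->in_use[c] >= g->reserved[c] &&
        free_slots - demo_held_for_others(g, c) <= 0)
        return false;
    g->in_use[c]++;
    return true;
}

static void demo_release(DemoGovernor *g, DemoClass c)
{
    assert(g->in_use[c] > 0);
    g->in_use[c]--;
}
```

Consider a budget of 8 slots with reservations of 2 (parallel), 2 (apply) and 1 (extension), and a parallel ceiling of 8. Under that configuration, a burst of parallel requests is admitted 5 times and then refused, because the remaining 3 slots are held for the other classes' reservations. A shared max_worker_processes pool cannot express that kind of guarantee on its own.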
Sources
- Source tree. /data/hgryoo/references/postgres@REL_18_STABLE (commit 273fe94, PG 18.x):
  - src/backend/postmaster/bgworker.c: the framework itself.
  - src/include/postmaster/bgworker.h, src/include/postmaster/bgworker_internals.h: public and internal API.
  - src/backend/postmaster/postmaster.c: maybe_start_bgworkers, StartBackgroundWorker, bgworker_should_start_now.
  - src/backend/storage/ipc/dsm.c: dynamic shared memory segments.
  - src/backend/utils/mmgr/dsa.c: shared-heap allocator over DSM.
  - src/backend/access/transam/parallel.c: parallel-query consumer.
  - src/backend/replication/logical/launcher.c: logical-apply consumer.
- Cross-referenced KB docs (defer adjacent mechanisms to these; do not duplicate):
  - postgres-postmaster.md: the supervisor, pmState, child-slot management, server-wide crash-restart.
  - postgres-shared-memory-ipc.md: the main shared-memory region, shm_mq, shm_toc, and the broader DSM/DSA usage story.
  - postgres-parallel-query.md: how parallel workers use the registration above plus tuple queues and shm_toc.
  - postgres-logical-replication-apply.md: apply/tablesync worker lifecycle and the statically registered launcher.
  - postgres-aux-processes.md, postgres-backend-lifecycle.md: sibling process-model docs.
- Theory anchors (knowledge/research/dbms-general/, .omc/plans/postgres-paper-bibliography.md):
  - Hellerstein, Stonebraker & Hamilton, Architecture of a Database System (2007), §2 “Process Models” and §“Admission Control”.
  - Silberschatz, Korth & Sudarshan, Database System Concepts (7e), §20 “Database-System Architectures” (process monitor, shared-memory model).
  - Stonebraker & Rowe, “The Design of POSTGRES” (1986): the extensibility ethos behind a pluggable worker ABI.
  - Scalable Lock Manager (2013): contended-shared-structure scalability, relevant to the DSA per-core-pool frontier.