PostgreSQL Background Workers — The Dynamic Worker Framework

A database server is, at bottom, a task-dispatch system: a long-lived controlling entity that spawns, supervises, and reaps subordinate units of execution which do the actual work — running queries, flushing buffers, vacuuming dead tuples, shipping WAL. The architectural question every engine must answer is what is the unit of execution, and who owns its lifecycle? Three properties define the design space.

  1. Process vs. thread. Is each subordinate unit an OS process with its own address space, or a thread inside a shared one? Processes give fault isolation (a crashing worker cannot scribble over another’s stack) at the cost of expensive context switches and the need for an explicit shared memory region for anything that must be communicated. Threads give cheap sharing at the cost of fate-sharing: one corrupt pointer takes down the server. Architecture of a Database System (Hellerstein, Stonebraker & Hamilton 2007, §2 “Process Models”) catalogs the three canonical models — process-per-connection, thread-per-connection, and process/thread pool — and observes that the choice is “one of the most fundamental” an engine makes, because it pervades every later decision about caching, locking, and admission control.

  2. Static vs. dynamic worker population. Are the subordinate processes a fixed cohort decided at startup (a thread pool sized once), or can the population grow and shrink in response to load? A parallel query that fans out across eight cores for two seconds and then collapses back to one needs workers spun up and torn down on a sub-second cadence; a background vacuum launcher needs exactly one long-lived helper. A general framework must serve both: register-once-at-boot and register-on-demand-at-runtime.

  3. Supervision and restart policy. When a worker dies, who notices, and what happens? A supervisor that restarts crashed helpers gives self-healing; one that forgets them on clean exit avoids zombie accumulation. The policy must be encoded per-worker, because a parallel worker that finished its slice should never be restarted, whereas a logical-replication apply worker whose connection dropped should be.

Database System Concepts (Silberschatz, Korth & Sudarshan, 7e, §20 “Database-System Architectures”) frames the server as a collection of cooperating processes communicating through shared memory, and notes that the “process monitor” — the entity that detects failures and triggers recovery — is the linchpin of availability. PostgreSQL’s postmaster is exactly this process monitor, and the background worker framework is the generalized, pluggable extension of its supervision logic: it lets the postmaster fork and reap processes whose code it does not itself contain (extensions, parallel workers, apply workers) under the same restart and crash-recovery umbrella that governs the built-in auxiliary processes.

The crucial PostgreSQL-specific constraint that shapes everything below is a reliability invariant on the postmaster: the postmaster must never take a lock — not even a spinlock — on shared memory it shares with backends, because a backend that corrupts shared memory (a wild write, a bug) could otherwise wedge or crash the postmaster, and a dead postmaster means no crash recovery for anybody. The entire background-worker shared-memory protocol is therefore designed to be lockless from the postmaster’s side: backends coordinate among themselves with a conventional LWLock, but they hand off slots to the postmaster through a single carefully-ordered flag (in_use) plus memory barriers. This is the architectural signature of the whole module.

This section names the engineering patterns that recur across engines when they build a supervised-worker framework, so PostgreSQL’s choices read as selections within a shared space.

Almost every process-model engine separates a supervisor (Oracle’s PMON/SMON-era monitors, SQL Server’s SQLOS scheduler, PostgreSQL’s postmaster) from workers that do the user-visible work. The supervisor owns fork/exec (or thread spawn), holds the master list of who is alive, catches SIGCHLD/exit events, and decides on restart. Workers report their status back through a channel the supervisor can read cheaply and safely.

To communicate “please start a worker like this” from a requesting process to the supervisor, engines use a fixed-size array of descriptor slots in shared memory. The array is fixed-size (not a linked list) because shared memory is allocated once at startup and the supervisor must be able to scan it without following pointers it cannot trust. Each slot carries the worker’s configuration plus a small state machine (free → claimed → running → exited). The slot count is a hard cap; in PostgreSQL it is the max_worker_processes GUC.

Generation counters to defeat the ABA problem

A subtle hazard: backend A registers a worker, gets a handle pointing at slot 5, the worker finishes, slot 5 is recycled for backend B’s brand-new worker, and now A’s stale handle appears to point at a live worker that is not its own. The standard defense is a generation counter (a.k.a. a tag or epoch) stored alongside the slot and copied into the handle. Every reuse bumps the counter; a handle is only valid if its generation matches the slot’s. This is the classic ABA-problem fix from lock-free programming, applied to slot reuse. PostgreSQL’s BackgroundWorkerHandle is {slot, generation} for exactly this reason.
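The generation check can be illustrated with a minimal, self-contained sketch. It is not PostgreSQL code: the `Slot` and `Handle` types, the `NSLOTS` cap, and the three helper functions are hypothetical names chosen for illustration, and no locking or barriers are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 4                /* hypothetical cap, standing in for a slot limit */

typedef struct { bool in_use; uint64_t generation; int owner; } Slot;
typedef struct { int slot; uint64_t generation; } Handle;

static Slot slots[NSLOTS];      /* zero-initialized: every slot starts free */

/* Claim the first free slot, bumping its generation on every reuse. */
static bool slot_claim(int owner, Handle *h)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (!slots[i].in_use) {
            slots[i].generation++;
            slots[i].owner = owner;
            slots[i].in_use = true;
            h->slot = i;
            h->generation = slots[i].generation;
            return true;
        }
    }
    return false;               /* all slots taken */
}

/* Mark a slot free again; its generation is left untouched until reuse. */
static void slot_release(int slotno)
{
    assert(slotno >= 0 && slotno < NSLOTS);
    slots[slotno].in_use = false;
}

/* A handle is valid only while its generation matches the slot's. */
static bool handle_is_current(const Handle *h)
{
    assert(h->slot >= 0 && h->slot < NSLOTS);
    return slots[h->slot].in_use &&
           slots[h->slot].generation == h->generation;
}
```

If owner A claims a slot, releases it, and owner B then claims the same slot, A's handle still names the same slot index but carries the old generation, so `handle_is_current` rejects it while B's handle passes.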

When the supervisor forks a worker and the worker must call a function the supervisor was told about, you cannot pass a raw function pointer if there is any chance the address differs between processes. Under EXEC_BACKEND (Windows, or debug builds compiled with -DEXEC_BACKEND) the child is a fresh exec, so ASLR can place the same function at a different address. The portable pattern is to pass (library_name, function_name) strings and have the worker resolve them locally via the dynamic loader. PostgreSQL’s bgw_library_name / bgw_function_name pair and LookupBackgroundWorkerFunction implement precisely this.
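A rough sketch of the name-not-pointer idea follows. Everything in it is hypothetical (the `"builtin"` library name, the `builtin_workers` table, `demo_worker_main`); it only models the table lookup for built-in entry points and leaves the dynamic-loader fallback as a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*worker_main_fn)(long arg);

static long last_arg_seen;      /* lets a caller observe that the entry point ran */

static void demo_worker_main(long arg) { last_arg_seen = arg; }

/* Hypothetical table of built-in entry points, keyed by function name. */
static const struct { const char *name; worker_main_fn fn; } builtin_workers[] = {
    { "demo_worker_main", demo_worker_main },
};

/* Resolve a (library, function) name pair to an address in THIS process. */
static worker_main_fn lookup_worker_function(const char *library,
                                             const char *function)
{
    assert(library != NULL && function != NULL);
    if (strcmp(library, "builtin") == 0) {
        size_t n = sizeof(builtin_workers) / sizeof(builtin_workers[0]);
        for (size_t i = 0; i < n; i++)
            if (strcmp(builtin_workers[i].name, function) == 0)
                return builtin_workers[i].fn;
    }
    /* A real system would hand other library names to the dynamic loader. */
    return NULL;
}
```

Because only strings cross the process boundary, the lookup yields whatever address the function has in the resolving process, which is the property a raw pointer cannot guarantee.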

Dynamic shared memory for variable-size shared state

The fixed shared-memory region is sized at boot, but a parallel query needs a chunk of shared memory sized to this query’s tuple-queue and instrumentation needs — unknowable at boot. Engines answer with a dynamic shared memory facility: per-operation segments created on demand, named by a handle that can be shipped to another process, reference-counted so the last detacher frees them. PostgreSQL layers this as dsm.c (raw segments) → dsa.c (a shared heap inside segments, with dsa_allocate/dsa_free) → shm_mq, shm_toc (message queues and tables-of-contents on top). The background worker framework and DSM are siblings that almost always travel together: you register a worker, then hand it a DSM handle as its bgw_main_arg.

flowchart TB
  subgraph PM["Postmaster (no locks, ever)"]
    LIST["BackgroundWorkerList<br/>private dlist of RegisteredBgWorker"]
    SC["BackgroundWorkerStateChange()<br/>scans slots, copies new regs in"]
    START["maybe_start_bgworkers()<br/>fork eligible workers"]
  end
  subgraph SHM["Shared memory"]
    ARR["BackgroundWorkerArray<br/>slot[max_worker_processes]<br/>in_use / terminate / pid / generation"]
    DSM["DSM segments + DSA areas<br/>per-operation shared state"]
  end
  subgraph BE["Regular backend (may take BackgroundWorkerLock)"]
    REG["RegisterDynamicBackgroundWorker()<br/>claim a free slot"]
    H["BackgroundWorkerHandle<br/>{slot, generation}"]
  end
  REG -->|"LWLock + write barrier,<br/>set in_use=true"| ARR
  REG -->|"PMSIGNAL_BACKGROUND_WORKER_CHANGE"| SC
  SC -->|"read barrier, copy slot"| LIST
  LIST --> START
  START -->|"postmaster_child_launch()<br/>fork"| WORKER["BackgroundWorkerMain()"]
  REG --> H
  H -.->|"GetBackgroundWorkerPid /<br/>WaitFor* / Terminate"| ARR
  WORKER -.->|"dsm_attach(bgw_main_arg)"| DSM
  BE -.->|"dsa_create / dsm_create"| DSM

PostgreSQL exposes a deliberately small public API — five registration/control functions plus one config struct — and hides a careful lockless protocol behind it. We trace the struct, the two registration paths, the slot array, the lifecycle, and the DSM/DSA companions.

Everything a caller wants the postmaster to know about a prospective worker is packed into one fixed-layout struct, because it must be memcpy-able through shared memory and across a fork/exec boundary:

// BackgroundWorker — src/include/postmaster/bgworker.h
typedef struct BackgroundWorker
{
    char        bgw_name[BGW_MAXLEN];
    char        bgw_type[BGW_MAXLEN];
    int         bgw_flags;
    BgWorkerStartTime bgw_start_time;
    int         bgw_restart_time;   /* in seconds, or BGW_NEVER_RESTART */
    char        bgw_library_name[MAXPGPATH];
    char        bgw_function_name[BGW_MAXLEN];
    Datum       bgw_main_arg;
    char        bgw_extra[BGW_EXTRALEN];
    pid_t       bgw_notify_pid;     /* SIGUSR1 this backend on start/stop */
} BackgroundWorker;

Note there are no pointers here except Datum bgw_main_arg, which is by convention a small scalar (often a DSM handle, an OID, or a slot index) — never a heap address, because the worker runs in a different address space. The entry point is named, not pointed at: bgw_library_name + bgw_function_name. bgw_flags is a bitmask of BGWORKER_SHMEM_ACCESS (mandatory), BGWORKER_BACKEND_DATABASE_CONNECTION (worker will call BackgroundWorkerInitializeConnection), and the internal-only BGWORKER_CLASS_PARALLEL. bgw_start_time (one of BgWorkerStart_PostmasterStart / _ConsistentState / _RecoveryFinished) tells the postmaster how far into startup it may launch this worker; a database-connected worker may not start at postmaster start, since the catalogs are not yet available.

Static registration is for workers known at boot. It is callable only from the postmaster itself or from an extension’s _PG_init while shared_preload_libraries is being processed, and it appends to a postmaster-private dlist, not to shared memory — because shared memory does not exist yet:

// RegisterBackgroundWorker — src/backend/postmaster/bgworker.c
void
RegisterBackgroundWorker(BackgroundWorker *worker)
{
    RegisteredBgWorker *rw;
    static int  numworkers = 0;

    if (IsUnderPostmaster || !IsPostmasterEnvironment)
    {
        /* tolerated: some _PG_init functions also run in backends */
        if (process_shared_preload_libraries_in_progress)
            return;
        ereport(LOG, ( /* ... must be registered in shared_preload_libraries */ ));
        return;
    }

    if (BackgroundWorkerData != NULL)
        elog(ERROR, "cannot register background worker \"%s\" after shmem init", ...);

    /* ... SanityCheckBackgroundWorker, numworkers cap, allocation of rw ... */
    rw->rw_worker = *worker;
    rw->rw_pid = 0;
    rw->rw_crashed_at = 0;
    rw->rw_terminate = false;

    dlist_push_head(&BackgroundWorkerList, &rw->rw_lnode);
}

At BackgroundWorkerShmemInit time these private entries are copied 1-to-1 into the shared slot[] array, establishing the correspondence between the postmaster’s private BackgroundWorkerList and the shared BackgroundWorkerArray.

Dynamic registration is for workers requested at runtime by an ordinary backend (a parallel-query leader, the logical-replication launcher, an extension’s SQL function). It must run under the postmaster, takes BackgroundWorkerLock to coordinate with other backends, scans for a free slot, fills it, and — critically — issues a write barrier before setting in_use so the postmaster can never observe a half-initialized slot:

// RegisterDynamicBackgroundWorker — src/backend/postmaster/bgworker.c
LWLockAcquire(BackgroundWorkerLock, LW_EXCLUSIVE);

/* ... parallel-class admission check against max_parallel_workers ... */

for (slotno = 0; slotno < BackgroundWorkerData->total_slots; ++slotno)
{
    BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];

    if (!slot->in_use)
    {
        memcpy(&slot->worker, worker, sizeof(BackgroundWorker));
        slot->pid = InvalidPid;     /* indicates not started yet */
        slot->generation++;
        slot->terminate = false;
        generation = slot->generation;
        if (parallel)
            BackgroundWorkerData->parallel_register_count++;
        pg_write_barrier();         /* postmaster must see contents before in_use */
        slot->in_use = true;
        success = true;
        break;
    }
}

LWLockRelease(BackgroundWorkerLock);

if (success)
    SendPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE);

The backend then signals the postmaster (PMSIGNAL_BACKGROUND_WORKER_CHANGE) and, if it asked for a handle, receives {slot, generation} to query or kill the worker later.

The shared slot carries five fields, and a comment block in bgworker.c spells out the ownership contract: when in_use is false the postmaster ignores the slot and backends own it; once a backend flips in_use to true (after a write barrier) the slot becomes the postmaster’s, and backends may thereafter only set the terminate flag.

// BackgroundWorkerSlot — src/backend/postmaster/bgworker.c
typedef struct BackgroundWorkerSlot
{
    bool        in_use;
    bool        terminate;
    pid_t       pid;            /* InvalidPid = not started yet; 0 = dead */
    uint64      generation;     /* incremented when slot is recycled */
    BackgroundWorker worker;
} BackgroundWorkerSlot;

The handle a backend holds is the generation-tagged coordinate:

// BackgroundWorkerHandle — src/backend/postmaster/bgworker.c
struct BackgroundWorkerHandle
{
    int         slot;
    uint64      generation;
};

The postmaster’s side of the hand-off is BackgroundWorkerStateChange, run in response to the signal. It is written defensively — it assumes shared memory may be corrupt and must still not crash the postmaster — and uses a read barrier so it never sees in_use before the slot contents that a backend wrote under its write barrier:

// BackgroundWorkerStateChange — src/backend/postmaster/bgworker.c
for (slotno = 0; slotno < max_worker_processes; ++slotno)
{
    BackgroundWorkerSlot *slot = &BackgroundWorkerData->slot[slotno];

    if (!slot->in_use)
        continue;
    pg_read_barrier();      /* pair with the registrant's write barrier */
    rw = FindRegisteredWorkerBySlotNumber(slotno);
    if (rw != NULL)
    {
        if (slot->terminate && !rw->rw_terminate)   /* backend asked to kill */
        {
            rw->rw_terminate = true;
            if (rw->rw_pid != 0)
                kill(rw->rw_pid, SIGTERM);
            else
                ReportBackgroundWorkerPID(rw);
        }
        continue;
    }
    /* ... newly-registered: copy strings paranoidly (ascii_safe_strlcpy),
       copy fixed fields, push onto BackgroundWorkerList ... */
}

Because the postmaster cannot take a lock, the parallel-worker accounting is split into two counters that never need to be read together atomically: parallel_register_count (bumped by backends under the lock) and parallel_terminate_count (bumped only by the lock-free postmaster). The live count is their difference, and the subtraction is correct even across uint32 wraparound.
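The wraparound claim is ordinary unsigned arithmetic and can be checked with a small sketch. The function names are hypothetical; the admission rule (refuse once the live count reaches the cap) is an assumption modeled on the max_parallel_workers check mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Live parallel workers: registrations minus terminations, modulo 2^32. */
static uint32_t live_parallel_workers(uint32_t register_count,
                                      uint32_t terminate_count)
{
    return (uint32_t) (register_count - terminate_count);
}

/* Admission check: refuse a new parallel worker once the cap is reached. */
static bool may_register_parallel(uint32_t register_count,
                                  uint32_t terminate_count,
                                  uint32_t cap)
{
    return live_parallel_workers(register_count, terminate_count) < cap;
}
```

For example, with terminate_count at 0xFFFFFFFE and register_count already wrapped around to 2, the difference is still 4, because the register counter has been incremented four more times than the terminate counter.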

flowchart LR
  FREE["slot free<br/>in_use=false"] -->|"backend: memcpy worker,<br/>generation++, write barrier,<br/>in_use=true"| CLAIMED["claimed<br/>in_use=true, pid=InvalidPid"]
  CLAIMED -->|"postmaster: StateChange<br/>copies to private list"| KNOWN["registered<br/>in postmaster list"]
  KNOWN -->|"maybe_start_bgworkers:<br/>fork, ReportBackgroundWorkerPID"| RUN["running<br/>pid > 0"]
  RUN -->|"exit code 0 OR never-restart<br/>OR terminate flag"| FORGET["ForgetBackgroundWorker<br/>in_use=false again"]
  RUN -->|"exit code 1,<br/>restart interval elapsed"| KNOWN
  CLAIMED -->|"backend sets terminate<br/>before start"| FORGET
  FORGET --> FREE

The postmaster launches eligible workers from its main loop via maybe_start_bgworkers → StartBackgroundWorker, which forks through postmaster_child_launch(B_BG_WORKER, …, &rw->rw_worker, …). The forked child runs BackgroundWorkerMain, which copies the registration out of the inherited postmaster memory, frees PostmasterContext, installs signal handlers, calls InitProcess + BaseInit, resolves the entry point, and finally invokes the user function. The restart contract is encoded in the exit code, interpreted in ReportBackgroundWorkerExit: exit 0 or BGW_NEVER_RESTART ⇒ forget the worker (free the slot); exit 1 ⇒ leave it registered so maybe_start_bgworkers relaunches it after bgw_restart_time seconds have elapsed since rw_crashed_at.
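The restart contract reduces to a small decision function. The sketch below is a simplification under stated assumptions: it models only exit codes 0 and 1 as described here, uses plain seconds instead of timestamps, and all names (`DemoAction`, `demo_after_exit`, `DEMO_NEVER_RESTART` as -1) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NEVER_RESTART (-1)     /* assumed sentinel value */

typedef enum { DEMO_FORGET, DEMO_WAIT, DEMO_RELAUNCH } DemoAction;

/* Decide what the supervisor does with a worker that has exited.
 * Times are in seconds; only exit codes 0 and 1 are modeled. */
static DemoAction demo_after_exit(int exit_code, int restart_time,
                                  bool terminate_requested,
                                  long crashed_at, long now)
{
    assert(exit_code == 0 || exit_code == 1);
    /* Clean exit, never-restart policy, or a terminate request: free the slot. */
    if (exit_code == 0 || restart_time == DEMO_NEVER_RESTART ||
        terminate_requested)
        return DEMO_FORGET;
    /* Otherwise stay registered and relaunch once the interval has elapsed. */
    if (now - crashed_at < restart_time)
        return DEMO_WAIT;
    return DEMO_RELAUNCH;
}
```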

A worker that needs to share variable-size state with its launcher attaches a dynamic shared memory segment whose handle was passed as bgw_main_arg. dsm_create carves a segment (preferring the preallocated main-region slots, falling back to OS-level segments), reference-counts it in a control array, and returns a dsm_segment * whose handle is a process-portable name:

// dsm_create (excerpt) — src/backend/storage/ipc/dsm.c
seg = dsm_create_descriptor();
/* try main shared-memory region first, else create an OS segment: */
seg->handle = pg_prng_uint32(&pg_global_prng_state) << 1; /* even handles */
/* ... dsm_impl_op(DSM_OP_CREATE, ...) ... */
dsm_control->item[i].handle = seg->handle;
dsm_control->item[i].refcnt = 2; /* refcnt 1 == moribund, so start at 2 */
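The "refcnt 1 == moribund" convention can be sketched as a tiny state machine. This is an illustrative model, not the dsm.c implementation: `DemoItem` and the three functions are hypothetical, locking is omitted, and the teardown work that the moribund state protects is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* refcnt: 0 = unused entry, 1 = moribund (being destroyed), >= 2 = live. */
typedef struct { uint32_t handle; uint32_t refcnt; } DemoItem;

/* Creating a segment starts the count at 2: one reference plus the offset. */
static void demo_create(DemoItem *it, uint32_t handle)
{
    it->handle = handle;
    it->refcnt = 2;
}

/* Attach succeeds only on a live entry with a matching handle. */
static bool demo_attach(DemoItem *it, uint32_t handle)
{
    if (it->refcnt <= 1 || it->handle != handle)
        return false;
    it->refcnt++;
    return true;
}

/* Detach; returns true if this caller dropped the last reference. */
static bool demo_detach(DemoItem *it)
{
    assert(it->refcnt >= 2);
    if (--it->refcnt == 1) {
        /* Moribund: new attaches are refused while teardown would run here. */
        it->refcnt = 0;     /* teardown done, entry is free for reuse */
        return true;
    }
    return false;
}
```

Reserving the value 1 gives the last detacher a window in which the entry is neither reusable nor attachable, so a concurrent attach cannot grab a segment that is halfway through destruction.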

For many small allocations in shared memory, raw segments are too coarse, so dsa.c builds a shared heap on top: dsa_create makes a DSM segment, pins it, and lays a control object plus a free-page manager over it; allocations return a dsa_pointer (a {segment-index, offset} pseudo-pointer) that any attached backend translates to a local address with dsa_get_address:

// dsa_get_address — src/backend/utils/mmgr/dsa.c
index = DSA_EXTRACT_SEGMENT_NUMBER(dp);
offset = DSA_EXTRACT_OFFSET(dp);
if (unlikely(area->segment_maps[index].mapped_address == NULL))
    get_segment_by_index(area, index);  /* map it in on demand */
return area->segment_maps[index].mapped_address + offset;

The pseudo-pointer indirection is what makes DSA pointers shareable: a dsa_pointer means the same logical location in every backend even though the underlying segment is mapped at a different virtual address in each. (For the broader IPC story — shm_mq, shm_toc, the main shared-memory region — see the cross-referenced postgres-shared-memory-ipc.md.)
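A minimal sketch of the pseudo-pointer idea, under loud assumptions: the 40-bit offset width, the `demo_` names, and the fixed eight-segment table are all hypothetical choices for illustration and are not taken from dsa.c. Two `DemoArea` values stand in for two processes that have mapped the same segment at different addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OFFSET_BITS  40    /* assumed split between segment and offset */
#define DEMO_OFFSET_MASK  ((UINT64_C(1) << DEMO_OFFSET_BITS) - 1)
#define DEMO_MAX_SEGMENTS 8

typedef uint64_t demo_pointer;

/* Per-process view: where each segment happens to be mapped locally. */
typedef struct { char *mapped_address[DEMO_MAX_SEGMENTS]; } DemoArea;

/* Pack a segment index and a byte offset into one shareable value. */
static demo_pointer demo_make_pointer(uint64_t segment, uint64_t offset)
{
    assert(segment < DEMO_MAX_SEGMENTS && offset <= DEMO_OFFSET_MASK);
    return (segment << DEMO_OFFSET_BITS) | offset;
}

/* Translate a pseudo-pointer into an address valid in this process only. */
static void *demo_get_address(const DemoArea *area, demo_pointer dp)
{
    uint64_t index = dp >> DEMO_OFFSET_BITS;
    uint64_t offset = dp & DEMO_OFFSET_MASK;

    assert(index < DEMO_MAX_SEGMENTS && area->mapped_address[index] != NULL);
    return area->mapped_address[index] + offset;
}
```

The same `demo_pointer` value resolves to different local addresses in the two views but to the same offset within the same segment, which is what makes it safe to store inside shared memory.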

Symbols below are the canonical anchors; grep for them. They are grouped by call-flow. Line numbers live only in the position-hint table at the end.

  • BackgroundWorkerSlot — the per-worker shared slot: in_use, terminate, pid, generation, embedded worker. The whole lockless protocol is a contract over the ordering of writes to these five fields.
  • BackgroundWorkerArray — the shared header: total_slots, parallel_register_count, parallel_terminate_count, and the slot[FLEXIBLE_ARRAY_MEMBER]. One instance, named "Background Worker Data" in the shmem hash, pointed at by the static BackgroundWorkerData.
  • BackgroundWorkerShmemSize — sizes the array as offsetof(BackgroundWorkerArray, slot) + max_worker_processes * sizeof(BackgroundWorkerSlot). This is why max_worker_processes is a hard cap requiring a restart to change.
  • BackgroundWorkerShmemInit — at boot, copies the postmaster-private BackgroundWorkerList into the first N slots (marking them in_use, recording rw->rw_shmem_slot), and marks the rest free. Guarded by !IsUnderPostmaster so only the postmaster initializes.
  • RegisteredBgWorker (in bgworker_internals.h) — the postmaster-private mirror of a slot: rw_worker, rw_pid, rw_crashed_at, rw_shmem_slot, rw_terminate, and the rw_lnode dlist link.
  • RegisterBackgroundWorker — static path. Rejects calls outside the postmaster (tolerating the process_shared_preload_libraries_in_progress case for sloppy _PG_inits), rejects calls after shmem init, enforces the numworkers > max_worker_processes cap, and pushes onto BackgroundWorkerList. Rejects a non-zero bgw_notify_pid — only dynamic workers may request notification.
  • RegisterDynamicBackgroundWorker — dynamic path. Returns false if not under postmaster. Takes BackgroundWorkerLock exclusively, applies the parallel-class admission check, scans for !slot->in_use, fills the slot, generation++, pg_write_barrier(), slot->in_use = true, releases the lock, sends PMSIGNAL_BACKGROUND_WORKER_CHANGE, and optionally returns a BackgroundWorkerHandle.
  • SanityCheckBackgroundWorker — shared validator: requires BGWORKER_SHMEM_ACCESS; forbids a DB-connected worker from starting at BgWorkerStart_PostmasterStart; bounds bgw_restart_time; forbids a parallel worker from being restartable (the counter accounting cannot survive a parallel worker through crash-restart); defaults bgw_type to bgw_name.
  • BackgroundWorkerStateChange — the postmaster’s reaction to the change signal. Validates total_slots, scans slots with a pg_read_barrier(), propagates terminate to running workers via SIGTERM, and copies brand-new registrations into the private list using ascii_safe_strlcpy (paranoid against non-NUL-terminated corrupt strings). allow_new_workers=false (during shutdown) forces every pending slot to terminate.
  • FindRegisteredWorkerBySlotNumber — linear scan of BackgroundWorkerList mapping a slot number back to its private RegisteredBgWorker.
  • maybe_start_bgworkers (in postmaster.c) — the launch loop. Skips running workers, forgets terminated ones, honors bgw_restart_time by comparing rw_crashed_at against now, gates on bgworker_should_start_now, and caps each pass at MAX_BGWORKERS_TO_LAUNCH (100) so a flood of registrations cannot starve the postmaster’s other duties.
  • bgworker_should_start_now (in postmaster.c) — maps the current pmState to which bgw_start_time values are eligible, implementing the “PostmasterStart < ConsistentState < RecoveryFinished” ordering via fall-through.
  • StartBackgroundWorker (in postmaster.c) — assigns a child slot, forks via postmaster_child_launch(B_BG_WORKER, …), records rw_pid, calls ReportBackgroundWorkerPID. A fork failure is treated like a crash (rw_crashed_at = now) so the postmaster backs off before retrying.
  • BackgroundWorkerMain — the child’s main. Copies the startup BackgroundWorker into TopMemoryContext, deletes PostmasterContext, sets MyBgworkerEntry / MyBackendType = B_BG_WORKER, installs the signal handlers (note bgworker_die for SIGTERM), sets up the sigsetjmp error recovery, calls InitProcess + BaseInit, resolves the entry point with LookupBackgroundWorkerFunction, and calls it. Return ⇒ proc_exit(0) (no restart).
  • LookupBackgroundWorkerFunction — resolves (bgw_library_name, bgw_function_name) to an address. For "postgres" it searches the InternalBGWorkers[] table (ParallelWorkerMain, ApplyLauncherMain, ApplyWorkerMain, ParallelApplyWorkerMain, TablesyncWorkerMain); otherwise load_external_function. This is the name-not-pointer mechanism that survives EXEC_BACKEND.
  • BackgroundWorkerInitializeConnection / …ByOid — a DB-connected worker calls one of these to run InitPostgres and flip from InitProcessing to NormalProcessing. The BGWORKER_BYPASS_ALLOWCONN / …_ROLELOGINCHECK flags map to INIT_PG_OVERRIDE_*.
  • bgworker_die — the SIGTERM handler: ereport(FATAL, …) naming the bgw_type. BackgroundWorkerBlockSignals / …Unblock wrap sigprocmask for critical sections.
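The fall-through ordering described for bgworker_should_start_now can be sketched with a simplified state enum. The states and names below are hypothetical and far coarser than the real pmState values; the point is only how falling through the cases makes a later server state accept every earlier start time.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    DEMO_PM_STARTUP,        /* before a consistent state is reached */
    DEMO_PM_HOT_STANDBY,    /* consistent, still in recovery */
    DEMO_PM_RUN,            /* recovery finished */
    DEMO_PM_SHUTDOWN        /* no new workers at all */
} DemoPmState;

typedef enum {
    DEMO_AT_POSTMASTER_START,
    DEMO_AT_CONSISTENT_STATE,
    DEMO_AT_RECOVERY_FINISHED
} DemoStart;

/* Each case accepts its own start time, then falls through to the cases
 * for earlier states, so later states accept everything earlier ones do. */
static bool demo_should_start_now(DemoPmState state, DemoStart start_time)
{
    switch (state) {
        case DEMO_PM_SHUTDOWN:
            break;
        case DEMO_PM_RUN:
            if (start_time == DEMO_AT_RECOVERY_FINISHED)
                return true;
            /* fall through */
        case DEMO_PM_HOT_STANDBY:
            if (start_time == DEMO_AT_CONSISTENT_STATE)
                return true;
            /* fall through */
        case DEMO_PM_STARTUP:
            if (start_time == DEMO_AT_POSTMASTER_START)
                return true;
            break;
    }
    return false;
}
```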

Handle-based control (the requesting backend’s API)

  • GetBackgroundWorkerPid — takes BackgroundWorkerLock shared, compares handle->generation against slot->generation (the ABA guard), and returns BGWH_STARTED / _NOT_YET_STARTED / _STOPPED based on slot->pid (>0 / InvalidPid / 0).
  • WaitForBackgroundWorkerStartup / …Shutdown — latch-wait loops around GetBackgroundWorkerPid that also watch WL_POSTMASTER_DEATH and return BGWH_POSTMASTER_DIED. They require the caller to have set itself as bgw_notify_pid so the postmaster’s SIGUSR1 wakes the latch.
  • TerminateBackgroundWorker — sets slot->terminate = true under the lock (only if generation still matches) and signals the postmaster. Safe to call whether or not the worker is still alive.
  • ReportBackgroundWorkerPID — postmaster writes slot->pid and SIGUSR1s the bgw_notify_pid.
  • ReportBackgroundWorkerExit — on worker exit, writes slot->pid, and if rw_terminate or bgw_restart_time == BGW_NEVER_RESTART, calls ForgetBackgroundWorker before notifying the waiter (narrowing the slot-reuse race).
  • ForgetBackgroundWorker — bumps parallel_terminate_count for parallel workers, pg_memory_barrier(), frees the slot (in_use=false), unlinks and pfrees the private entry.
  • ResetBackgroundWorkerCrashTimes — after a server-wide crash-restart cycle: forgets BGW_NEVER_RESTART workers, and zeroes rw_crashed_at, rw_pid, and rw_worker.bgw_notify_pid on the rest so they relaunch immediately. Asserts no parallel worker reaches the restart branch.
  • ForgetUnstartedBackgroundWorkers — during normal shutdown, zaps not-yet-started workers that have a waiter and notifies the waiter.
  • BackgroundWorkerStopNotifications — clears bgw_notify_pid for a dying backend so the postmaster stops trying to signal it.
  • dsm_create / dsm_attach / dsm_detach — create/attach/detach a raw segment; reference-counted in dsm_control_item.refcnt (refcnt 1 == moribund, so live segments start at 2). dsm_segment_handle yields the process-portable name used as a bgw_main_arg.
  • dsm_pin_segment / dsm_pin_mapping — extend a segment’s lifetime past the creating resource owner / backend; DSA pins its backing segments so an area outlives any single attached backend.
  • dsa_create_ext / dsa_attach / dsa_attach_in_place — build or join a shared heap. dsa_get_handle yields a dsa_handle to ship to a worker. dsa_allocate_extended / dsa_free are the heap ops; dsa_get_address translates a dsa_pointer to a local address, mapping segments in on demand.

Position hints (as of 2026-06-05, REL_18 273fe94)

| Symbol | File | Line |
| --- | --- | --- |
| BackgroundWorkerSlot | src/backend/postmaster/bgworker.c | 74 |
| BackgroundWorkerArray | src/backend/postmaster/bgworker.c | 94 |
| BackgroundWorkerHandle (struct) | src/backend/postmaster/bgworker.c | 102 |
| InternalBGWorkers[] | src/backend/postmaster/bgworker.c | 114 |
| BackgroundWorkerShmemSize | src/backend/postmaster/bgworker.c | 145 |
| BackgroundWorkerShmemInit | src/backend/postmaster/bgworker.c | 161 |
| FindRegisteredWorkerBySlotNumber | src/backend/postmaster/bgworker.c | 220 |
| BackgroundWorkerStateChange | src/backend/postmaster/bgworker.c | 245 |
| ForgetBackgroundWorker | src/backend/postmaster/bgworker.c | 428 |
| ReportBackgroundWorkerPID | src/backend/postmaster/bgworker.c | 460 |
| ReportBackgroundWorkerExit | src/backend/postmaster/bgworker.c | 482 |
| BackgroundWorkerStopNotifications | src/backend/postmaster/bgworker.c | 513 |
| ForgetUnstartedBackgroundWorkers | src/backend/postmaster/bgworker.c | 540 |
| ResetBackgroundWorkerCrashTimes | src/backend/postmaster/bgworker.c | 578 |
| SanityCheckBackgroundWorker | src/backend/postmaster/bgworker.c | 631 |
| bgworker_die | src/backend/postmaster/bgworker.c | 703 |
| BackgroundWorkerMain | src/backend/postmaster/bgworker.c | 717 |
| BackgroundWorkerInitializeConnection | src/backend/postmaster/bgworker.c | 852 |
| BackgroundWorkerInitializeConnectionByOid | src/backend/postmaster/bgworker.c | 886 |
| RegisterBackgroundWorker | src/backend/postmaster/bgworker.c | 939 |
| RegisterDynamicBackgroundWorker | src/backend/postmaster/bgworker.c | 1045 |
| GetBackgroundWorkerPid | src/backend/postmaster/bgworker.c | 1157 |
| WaitForBackgroundWorkerStartup | src/backend/postmaster/bgworker.c | 1212 |
| WaitForBackgroundWorkerShutdown | src/backend/postmaster/bgworker.c | 1257 |
| TerminateBackgroundWorker | src/backend/postmaster/bgworker.c | 1296 |
| LookupBackgroundWorkerFunction | src/backend/postmaster/bgworker.c | 1337 |
| GetBackgroundWorkerTypeByPid | src/backend/postmaster/bgworker.c | 1371 |
| BackgroundWorker (struct) | src/include/postmaster/bgworker.h | 89 |
| RegisteredBgWorker | src/include/postmaster/bgworker_internals.h | 32 |
| StartBackgroundWorker | src/backend/postmaster/postmaster.c | 4105 |
| bgworker_should_start_now | src/backend/postmaster/postmaster.c | 4166 |
| maybe_start_bgworkers | src/backend/postmaster/postmaster.c | 4213 |
| dsm_create | src/backend/storage/ipc/dsm.c | 516 |
| dsm_attach | src/backend/storage/ipc/dsm.c | 665 |
| dsm_detach | src/backend/storage/ipc/dsm.c | 803 |
| dsm_pin_segment | src/backend/storage/ipc/dsm.c | 955 |
| dsm_segment_handle | src/backend/storage/ipc/dsm.c | 1123 |
| dsa_create_ext | src/backend/utils/mmgr/dsa.c | 421 |
| dsa_get_handle | src/backend/utils/mmgr/dsa.c | 498 |
| dsa_attach | src/backend/utils/mmgr/dsa.c | 510 |
| dsa_allocate_extended | src/backend/utils/mmgr/dsa.c | 671 |
| dsa_get_address | src/backend/utils/mmgr/dsa.c | 942 |
| LaunchParallelWorkers (registration block) | src/backend/access/transam/parallel.c | 601 |
| logicalrep_worker_launch (registration block) | src/backend/replication/logical/launcher.c | 469 |

Built on /data/hgryoo/references/postgres at REL_18_STABLE, commit 273fe94 (PG 18.x). Spot-checks performed:

  • Public API surface. grep -n in bgworker.h confirms exactly the documented exports: RegisterBackgroundWorker, RegisterDynamicBackgroundWorker, GetBackgroundWorkerPid, WaitForBackgroundWorkerStartup, WaitForBackgroundWorkerShutdown, GetBackgroundWorkerTypeByPid, TerminateBackgroundWorker, BackgroundWorkerInitializeConnection[ByOid], BackgroundWorkerBlockSignals / …Unblock, plus the MyBgworkerEntry global declared with PGDLLIMPORT. The flag macros BGWORKER_SHMEM_ACCESS (0x0001), BGWORKER_BACKEND_DATABASE_CONNECTION (0x0002), BGWORKER_CLASS_PARALLEL (0x0010), and the connection-bypass flags (0x0001/0x0002) are present as quoted.
  • Lockless protocol invariants. The pg_write_barrier() immediately before slot->in_use = true in RegisterDynamicBackgroundWorker, and the matching pg_read_barrier() after the !slot->in_use skip in BackgroundWorkerStateChange, are both present at the quoted sites — the write/read barrier pair is the load-bearing detail and was verified directly in the source, not inferred.
  • Counter accounting. parallel_register_count is incremented only under BackgroundWorkerLock (in RegisterDynamicBackgroundWorker); parallel_terminate_count is incremented only in postmaster-context functions (BackgroundWorkerStateChange, ForgetBackgroundWorker) with no lock — matching the struct comment that the postmaster must stay lockless.
  • InternalBGWorkers[] membership. The five internal entry points (ParallelWorkerMain, ApplyLauncherMain, ApplyWorkerMain, ParallelApplyWorkerMain, TablesyncWorkerMain) are exactly those in the array, confirming the parallel-query and logical-replication consumers named in §“PostgreSQL’s Approach”.
  • Consumer registration blocks. The parallel-query block in parallel.c (bgw_function_name = "ParallelWorkerMain", BGWORKER_CLASS_PARALLEL, BGW_NEVER_RESTART, bgw_main_arg = dsm_segment_handle(pcxt->seg)) and the logical-launcher block in launcher.c (ApplyWorkerMain / ParallelApplyWorkerMain / TablesyncWorkerMain, BGW_NEVER_RESTART, bgw_main_arg = Int32GetDatum(slot)) were read in full and quoted faithfully in §“Beyond” below.
  • Scope guard. No contrib/ code is asserted as core behavior; worker_spi is named only as an illustrative example. No PG19-only symbols appear — every symbol in the position-hint table was located in the REL_18 tree at the listed line.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

The framework’s value is best seen through its callers, who all reduce to “fill a BackgroundWorker, register it, optionally hold a handle.” Their differences are entirely in the flags and policy fields.

Parallel query (parallel.c, LaunchParallelWorkers) is the most demanding consumer: short-lived, fan-out, never restarted, capped against max_parallel_workers. It passes the gather DSM segment’s handle as the worker’s argument so the worker can attach to the leader’s tuple queues and plan:

// LaunchParallelWorkers (registration) — src/backend/access/transam/parallel.c
worker.bgw_flags =
    BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION
    | BGWORKER_CLASS_PARALLEL;
worker.bgw_start_time = BgWorkerStart_ConsistentState;
worker.bgw_restart_time = BGW_NEVER_RESTART;
sprintf(worker.bgw_library_name, "postgres");
sprintf(worker.bgw_function_name, "ParallelWorkerMain");
worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(pcxt->seg));
worker.bgw_notify_pid = MyProcPid;
/* ... loop: RegisterDynamicBackgroundWorker(&worker, &pcxt->worker[i].bgwhandle) ... */

The leader tolerates partial success: if registration fails (slots exhausted) it simply runs with fewer workers. The BGWORKER_CLASS_PARALLEL flag routes the worker through the dedicated register/terminate counters so a parallel query can never push the system past max_parallel_workers even though those workers also count against max_worker_processes. See postgres-parallel-query.md for the leader/worker tuple-queue and shm_toc machinery built on top of this registration.
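The partial-success behavior can be sketched as a launch loop that stops at the first failed registration and proceeds with whatever it got. This is a toy model: `demo_register_worker`, `demo_launch_workers`, and the four-slot pool are hypothetical, and stopping at the first failure is an assumption about how a leader might react rather than a quotation of the source.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_TOTAL_SLOTS 4      /* hypothetical size of the worker slot pool */

static int demo_slots_used;     /* slots already taken by other workers */

/* Hypothetical stand-in for dynamic registration: fails when slots run out. */
static bool demo_register_worker(void)
{
    if (demo_slots_used >= DEMO_TOTAL_SLOTS)
        return false;
    demo_slots_used++;
    return true;
}

/* Ask for `requested` workers; stop at the first failure and report how
 * many were launched so the caller can run with fewer helpers. */
static int demo_launch_workers(int requested)
{
    int launched = 0;

    assert(requested >= 0);
    for (int i = 0; i < requested; i++) {
        if (!demo_register_worker())
            break;
        launched++;
    }
    return launched;
}
```

With one slot already in use, a request for eight workers yields three, and the caller is expected to treat that as a smaller degree of parallelism rather than an error.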

Logical replication apply (launcher.c, logicalrep_worker_launch) is the long-lived, DB-connected consumer. Each apply / parallel-apply / tablesync worker is registered dynamically, never auto-restarted by the framework (BGW_NEVER_RESTART — the launcher itself owns restart policy via its own worker slots), and carries its logical-rep slot index as the argument:

// logicalrep_worker_launch (registration) — src/backend/replication/logical/launcher.c
bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
snprintf(bgw.bgw_library_name, MAXPGPATH, "postgres");
/* function name is ApplyWorkerMain / ParallelApplyWorkerMain / TablesyncWorkerMain */
bgw.bgw_restart_time = BGW_NEVER_RESTART;
bgw.bgw_notify_pid = MyProcPid;
bgw.bgw_main_arg = Int32GetDatum(slot);
if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle)) { /* clean up slot */ }

The apply launcher itself, by contrast, is registered statically with RegisterBackgroundWorker and a 5-second bgw_restart_time, so the framework does resurrect it if it dies — the one place the two registration paths and two restart policies appear together. See postgres-logical-replication-apply.md.
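The difference between the two restart policies comes down to one decision the supervisor makes for a worker that has crashed. The sketch below is a simplified, self-contained model of that decision and not the postmaster's code: the enum, the function name, and the millisecond clock are assumptions made for illustration. Only the sentinel value (-1 for BGW_NEVER_RESTART) and the seconds unit of bgw_restart_time are taken from the public header.

```c
#include <assert.h>
#include <stdint.h>

#define NEVER_RESTART (-1)      /* same value as BGW_NEVER_RESTART */

typedef enum RestartDecision
{
    START_NOW,                  /* never crashed, or the interval has elapsed */
    WAIT_LONGER,                /* crashed recently; try again later */
    FORGET_WORKER               /* crashed and the policy forbids restart */
} RestartDecision;

/*
 * Decide what to do with a registered worker that is not running.
 * crashed_at_ms == 0 means "has not crashed"; restart_time_s is the
 * per-worker policy in seconds, or NEVER_RESTART.
 */
static RestartDecision
decide_restart(int64_t crashed_at_ms, int64_t now_ms, int restart_time_s)
{
    assert(now_ms >= crashed_at_ms);

    if (crashed_at_ms == 0)
        return START_NOW;
    if (restart_time_s == NEVER_RESTART)
        return FORGET_WORKER;
    if (now_ms - crashed_at_ms < (int64_t) restart_time_s * 1000)
        return WAIT_LONGER;
    return START_NOW;
}
```

Under this model the apply launcher (restart time 5) that crashed at t = 1000 ms is left alone at t = 3000 ms and started again at t = 6000 ms. An apply or parallel worker registered with the never-restart policy is dropped at the first check after its crash, leaving any relaunch to its owner.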

Third-party extensions are the original motivation: an extension lists itself in shared_preload_libraries, and its _PG_init calls RegisterBackgroundWorker with bgw_library_name set to its own .so and bgw_function_name to an exported entry point — which LookupBackgroundWorkerFunction resolves via load_external_function. The in-tree worker_spi contrib module is the canonical worked example (named here only as illustration; contrib/ is out of scope as core behavior). An extension can equally register dynamically from a SQL-callable function, the pattern used by job-scheduler extensions that spin up a worker per scheduled task and hold its handle to monitor completion.
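The entry point is named by two strings rather than by a function pointer, which lets the new process resolve it in its own address space. A rough, self-contained model of that two-way lookup follows. Everything in it is hypothetical scaffolding: the table contents, the demo functions, and load_external_stub (which stands in for the dynamic loader and always fails here) are illustrative, and returning NULL for an unknown name is a simplification of whatever error the real lookup raises.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*worker_main_fn) (long main_arg);

/* Demo entry points standing in for core worker main functions. */
static void demo_parallel_main(long arg) { (void) arg; }
static void demo_apply_main(long arg) { (void) arg; }

/* Hypothetical table of entry points that live in the server binary. */
static const struct
{
    const char     *name;
    worker_main_fn  fn;
} internal_workers[] = {
    {"ParallelWorkerMain", demo_parallel_main},
    {"ApplyWorkerMain", demo_apply_main},
};

/* Stand-in for loading a symbol from an extension's shared library. */
static worker_main_fn
load_external_stub(const char *library, const char *function)
{
    (void) library;
    (void) function;
    return NULL;                /* the toy has no loader */
}

/*
 * Resolve (library, function) to a callable entry point: names in the
 * "postgres" pseudo-library come from the built-in table, anything else
 * goes to the loader.
 */
static worker_main_fn
lookup_worker_function(const char *library, const char *function)
{
    assert(library != NULL && function != NULL);

    if (strcmp(library, "postgres") == 0)
    {
        size_t n = sizeof(internal_workers) / sizeof(internal_workers[0]);

        for (size_t i = 0; i < n; i++)
            if (strcmp(internal_workers[i].name, function) == 0)
                return internal_workers[i].fn;
        return NULL;            /* unknown core entry point */
    }
    return load_external_stub(library, function);
}
```

This is why both core consumers above write "postgres" into bgw_library_name, while an extension writes the name of its own shared library.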

flowchart TB
  subgraph STATIC["Static — shared_preload_libraries only"]
    EXT["extension _PG_init"] --> RBW["RegisterBackgroundWorker"]
    LAUNCH["apply launcher (5s restart)"] --> RBW
    RBW --> PLIST["BackgroundWorkerList (private)"]
  end
  subgraph DYNAMIC["Dynamic — from any backend at runtime"]
    PQ["parallel leader<br/>BGWORKER_CLASS_PARALLEL"] --> RDBW["RegisterDynamicBackgroundWorker"]
    LR["logicalrep_worker_launch<br/>apply / tablesync"] --> RDBW
    SQLFN["extension SQL function"] --> RDBW
    RDBW --> SLOT["claim BackgroundWorkerSlot"]
  end
  PLIST --> PM["postmaster: maybe_start_bgworkers → fork"]
  SLOT -->|"signal"| PM

Other engines answer the same process-model question differently:

  • Oracle uses a fixed pantheon of named background processes (PMON, SMON, DBWn, LGWR, CKPT, ARCn, plus job-queue Jnnn and parallel Pnnn slaves). The parallel slaves and job-queue processes are the closest analogues to PostgreSQL dynamic bgworkers (pooled, assigned to transient work, then returned), but the population is managed by Oracle-internal resource managers, not exposed as a pluggable C API for arbitrary user code. There is no “register your own supervised process” extension surface comparable to RegisterBackgroundWorker.
  • SQL Server runs a user-mode cooperative scheduler (SQLOS) with worker threads bound to schedulers bound to logical CPUs; parallelism is thread-based within one process. The fault-isolation tradeoff is the inverse of PostgreSQL’s: cheaper context switches and shared memory by default, but no process-level blast-radius containment, and no notion of an externally-supplied supervised process.
  • MySQL/InnoDB has a small fixed set of background threads (purge, page-cleaner, master, I/O) inside the server process; extensibility is via plugins/components that run on the server’s own threads, not via spawning new supervised processes. There is no DSM-handle-passing convention because threads share the heap.

PostgreSQL is unusual in exposing its process supervisor as a documented, stable C ABI that third parties target — a direct consequence of the process-per-backend model (a new unit of work is naturally a new process) and the extensibility ethos traced to the original POSTGRES design papers (Stonebraker & Rowe 1986). The cost is the DSM/DSA apparatus: because workers do not share a heap, every byte of shared state must be explicitly placed in a segment and addressed by handle or pseudo-pointer.
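The pseudo-pointer idea can be shown without any PostgreSQL code. In the sketch below a linked list lives inside a flat segment and links its nodes by byte offset instead of by address, so the same bytes remain valid when they appear at a different base address, which is the situation of two processes mapping one segment. All names here (rel_ptr, SegHeader, seg_create, seg_clone and so on) are hypothetical, the bump allocator is deliberately naive, and the layout is not that of dsa_pointer. A heap copy stands in for the second mapping.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t rel_ptr;       /* byte offset from the segment base */
#define REL_NULL 0u             /* offset 0 is the header, so it can mean "null" */

typedef struct SegHeader
{
    uint32_t size;              /* total bytes in the segment */
    uint32_t used;              /* bump-allocator high-water mark */
    rel_ptr  head;              /* first list node, or REL_NULL */
} SegHeader;

typedef struct Node
{
    int     value;
    rel_ptr next;               /* offset of the next node, never an address */
} Node;

static uint32_t
align8(uint32_t n)
{
    return (n + 7u) & ~7u;
}

/* Allocate and initialize an empty segment; returns NULL if out of memory. */
static void *
seg_create(uint32_t size)
{
    SegHeader *h;

    assert(size >= align8((uint32_t) sizeof(SegHeader)));
    h = calloc(1, size);
    if (h == NULL)
        return NULL;
    h->size = size;
    h->used = align8((uint32_t) sizeof(SegHeader));
    h->head = REL_NULL;
    return h;
}

/* Copy the raw bytes to a new address, imitating a second mapping. */
static void *
seg_clone(const void *base)
{
    const SegHeader *h = base;
    void *copy = malloc(h->size);

    if (copy != NULL)
        memcpy(copy, base, h->size);
    return copy;
}

/* Translate an offset into an address valid for this particular base. */
static Node *
node_at(void *base, rel_ptr p)
{
    return (p == REL_NULL) ? NULL : (Node *) ((char *) base + p);
}

/* Push a value onto the list; returns 0 when the segment is full. */
static int
list_push(void *base, int value)
{
    SegHeader *h = base;
    uint32_t need = align8((uint32_t) sizeof(Node));
    Node *n;

    if (need > h->size - h->used)
        return 0;
    n = node_at(base, h->used);
    n->value = value;
    n->next = h->head;
    h->head = h->used;
    h->used += need;
    return 1;
}

/* Walk the list through offsets only and add up the values. */
static int
list_sum(void *base)
{
    SegHeader *h = base;
    int sum = 0;

    for (Node *n = node_at(base, h->head); n != NULL; n = node_at(base, n->next))
        sum += n->value;
    return sum;
}
```

Had Node.next been a raw Node pointer, the clone would still point into the original allocation and break once that memory went away. With offsets, every reader adds its own base at the moment of dereference, which is the discipline the DSM/DSA apparatus imposes on shared state.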

  • Per-core allocator concurrency. dsa.c’s own header comment flags the current single-lock-per-size-class design as a bottleneck: “Per-core pools to increase concurrency and strategies for reducing the resulting fragmentation are areas for future research.” As core counts climb, the DSA allocator that backs parallel-query shared state is a candidate for the same NUMA-aware, sharded-freelist treatment that modern systems-research allocators (and the Scalable Lock Manager line of PG work, 2013) apply to contended shared structures.
  • Worker pooling vs. fork-per-operation. Each parallel query currently gets freshly forked workers that are torn down when it finishes; for short queries on many-core machines the process-creation and InitProcess cost is non-trivial. A persistent worker pool (reusing processes across operations, the model SQL Server’s SQLOS embodies) is a recurring design discussion; the framework’s slot-and-handle protocol is largely pool-ready, but connection/database affinity and catalog-cache reset semantics make safe reuse subtle.
  • Admission control across worker classes. Today parallel workers have a dedicated counter, but apply workers and extension workers draw from the same max_worker_processes pool with no global prioritization, while autovacuum workers are budgeted separately under autovacuum_max_workers. Unifying these under a single resource governor, the direction Architecture of a Database System sketches for admission control, would let an operator reason about worker budgets holistically rather than per subsystem.
  • Source tree. /data/hgryoo/references/postgres @ REL_18_STABLE (commit 273fe94, PG 18.x):
    • src/backend/postmaster/bgworker.c — the framework itself.
    • src/include/postmaster/bgworker.h, src/include/postmaster/bgworker_internals.h — public + internal API.
    • src/backend/postmaster/postmaster.c: maybe_start_bgworkers, StartBackgroundWorker, bgworker_should_start_now.
    • src/backend/storage/ipc/dsm.c — dynamic shared memory segments.
    • src/backend/utils/mmgr/dsa.c — shared-heap allocator over DSM.
    • src/backend/access/transam/parallel.c — parallel-query consumer.
    • src/backend/replication/logical/launcher.c — logical-apply consumer.
  • Cross-referenced KB docs (defer adjacent mechanism to these, do not duplicate):
    • postgres-postmaster.md — the supervisor, pmState, child-slot management, server-wide crash-restart.
    • postgres-shared-memory-ipc.md — the main shared-memory region, shm_mq, shm_toc, and the broader DSM/DSA usage story.
    • postgres-parallel-query.md — how parallel workers use the registration above plus tuple queues and shm_toc.
    • postgres-logical-replication-apply.md — apply/tablesync worker lifecycle and the statically-registered launcher.
    • postgres-aux-processes.md, postgres-backend-lifecycle.md — sibling process-model docs.
  • Theory anchors (knowledge/research/dbms-general/, .omc/plans/postgres-paper-bibliography.md):
    • Hellerstein, Stonebraker & Hamilton, Architecture of a Database System (2007), §2 “Process Models” and §“Admission Control”.
    • Silberschatz, Korth & Sudarshan, Database System Concepts (7e), §20 “Database-System Architectures” (process monitor, shared-memory model).
    • Stonebraker & Rowe, “The Design of POSTGRES” (1986) — the extensibility ethos behind a pluggable worker ABI.
    • Scalable Lock Manager (2013) — contended-shared-structure scalability, relevant to the DSA per-core-pool frontier.