PostgreSQL Monitoring & Statistics — Section Overview
What this section covers
This subcategory is PostgreSQL’s introspection plane: the machinery that
lets a backend report what it is doing right now and lets the engine
accumulate what has happened so that pg_stat_* views, the autovacuum
scheduler, and the planner’s “did this table change enough to re-analyze?”
heuristics have numbers to read. Almost everything here lives under one
source root — src/backend/utils/activity/ — and that physical grouping is
the subcategory boundary.
Three pillars share that home, and the sharp line between them is live state versus accumulated state:
- The cumulative statistics system (pgstat*.c): counters that add up over time, such as tuples inserted/updated/dead, blocks hit/read, vacuum and analyze timestamps, WAL bytes, checkpointer and bgwriter activity, and per-IO timings. Since PostgreSQL 15 this is a shared-memory subsystem; it replaced the old dedicated stats-collector auxiliary process that received counter deltas over a UDP socket and owned a private file. That redesign is the defining fact of this subcategory and the reason it exists as its own section rather than a footnote in server-architecture.
- Wait events and backend status (backend_status.c, wait_event.c): a point-in-time report. Each backend publishes “I am running this query, in this state, currently waiting on this event” into its own slot, cheap enough to update on every lock wait. This is the live view behind pg_stat_activity. The wait-event taxonomy is code-generated from wait_event_names.txt.
- Progress reporting (backend_progress.c): a small per-command progress array (VACUUM, CREATE INDEX, COPY, base backup, …) that a long-running command updates and pg_stat_progress_* views read, with a parallel-leader/worker aggregation path.
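To make the progress-reporting pillar concrete, here is a minimal toy model in Python; it is not PostgreSQL's C code. The class and method names (ProgressSlot, ParallelWorker, report), the array length, and the parameter indexes are all hypothetical, chosen only for illustration. The sketch assumes the simplest aggregation scheme: a parallel worker forwards its delta to the leader, and only the leader's slot is visible to readers.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUM_PARAMS = 4  # hypothetical array length, chosen for the example


@dataclass
class ProgressSlot:
    """Toy per-backend progress report: a command tag plus a fixed counter array."""
    command: Optional[str] = None
    params: List[int] = field(default_factory=lambda: [0] * NUM_PARAMS)

    def start(self, command: str) -> None:
        # Starting a command resets every counter so stale values never leak.
        self.command = command
        self.params = [0] * NUM_PARAMS

    def update(self, index: int, value: int) -> None:
        # Absolute write: the command overwrites its own slot in place.
        self.params[index] = value

    def incr(self, index: int, delta: int) -> None:
        # Relative write, used when progress arrives in increments.
        self.params[index] += delta

    def end(self) -> None:
        # Clearing the command marks the slot as idle; a reader would skip it.
        self.command = None


class ParallelWorker:
    """A worker forwards deltas to the leader instead of exposing its own slot."""

    def __init__(self, leader_slot: ProgressSlot) -> None:
        self._leader_slot = leader_slot

    def report(self, index: int, delta: int) -> None:
        self._leader_slot.incr(index, delta)


# Hypothetical parameter indexes for an index build.
BLOCKS_TOTAL, BLOCKS_DONE = 0, 1

leader = ProgressSlot()
leader.start("CREATE INDEX")
leader.update(BLOCKS_TOTAL, 100)

workers = [ParallelWorker(leader), ParallelWorker(leader)]
workers[0].report(BLOCKS_DONE, 30)
workers[1].report(BLOCKS_DONE, 20)
leader.incr(BLOCKS_DONE, 10)  # the leader also does part of the work

# prints: CREATE INDEX 100 60
print(leader.command, leader.params[BLOCKS_TOTAL], leader.params[BLOCKS_DONE])
```

A reader of this toy slot sees one row per running command, with worker contributions already folded into the leader's counters. That matches the leader/worker aggregation the bullet above describes.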
Boundaries — what this section hands off:
- The processes that produce the numbers belong elsewhere. The checkpointer (which serializes the stats file at shutdown), the bgwriter, the startup process (which reloads it), and the backend lifecycle are owned by server-architecture. This subcategory owns the counters those processes emit, not the processes.
- The substrate the counters describe. SLRU stats counters live here, but the SLRU caches themselves are txn-recovery (postgres-slru.md); IO stats live here, but the async-IO and buffer machinery are storage-engine (postgres-aio.md, postgres-buffer-manager.md).
- The shared-memory and DSA machinery the cumulative system is built on (the fixed segment, DSM/DSA, dshash, LWLocks) is server-architecture (postgres-shared-memory-ipc.md). This section explains how the stats system uses that substrate, not how the substrate works.
- The pg_stat_* views and SQL functions that surface these numbers are catalog/SQL surface, not analyzed as their own module here; this section stops at the C reporting layer the views read.
The historical arc — stats-collector → shared-memory cumulative stats — is
captured separately in postgres-evolution-statistics.md; this router names
the current (REL_18) shape and defers the version-by-version story there.
The layering
The two module docs split cleanly along the live-vs-accumulated seam. The
cumulative system has a per-backend pending → shared store → disk flow;
the wait/status/progress system is a direct per-PGPROC publish with no
accumulation. Both ride the same utils/activity/ home and the same
shared-memory substrate owned by server-architecture.
flowchart TB
subgraph BACKEND["any backend / aux process"]
ACT["running command<br/>(query, vacuum, copy, ...)"]
PEND["pending stats (process-local)<br/>PgStat_EntryRef cache + have_*_stats"]
LIVE["live self-report<br/>MyBEEntry + MyProc wait slot"]
end
ACT --> PEND
ACT --> LIVE
subgraph CUMUL["postgres-cumulative-stats.md (accumulated view)"]
FLUSH["pgstat_report_stat()<br/>flush at xact end / timeout"]
FIXED["fixed-numbered kinds<br/>plain shmem block<br/>(checkpointer, bgwriter, WAL, IO, SLRU, archiver)"]
VAR["variable-numbered kinds<br/>DSA + dshash, keyed by PgStat_HashKey<br/>(per-relation, per-function, per-db, replslot, subscription, backend)"]
FILE["pgstat.stat on disk<br/>(checkpointer writes at shutdown,<br/>startup reads / discards after crash)"]
end
PEND --> FLUSH
FLUSH --> FIXED
FLUSH --> VAR
FIXED --> FILE
VAR --> FILE
subgraph LIVEDOC["postgres-wait-events-progress.md (live view)"]
STATUS["backend status<br/>backend_status.c -> PgBackendStatus"]
WAIT["wait events<br/>wait_event.c, taxonomy codegen'd<br/>from wait_event_names.txt"]
PROG["command progress<br/>backend_progress.c -> st_progress_param[]"]
end
LIVE --> STATUS
LIVE --> WAIT
LIVE --> PROG
subgraph READERS["readers (out of scope here)"]
VIEWS["pg_stat_* / pg_stat_activity /<br/>pg_stat_progress_* views"]
AV["autovacuum scheduler"]
PLAN["planner re-analyze heuristics"]
end
FIXED --> VIEWS
VAR --> VIEWS
VAR --> AV
VAR --> PLAN
STATUS --> VIEWS
WAIT --> VIEWS
PROG --> VIEWS
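The LIVE branch of the flowchart (a direct publish with no accumulation) can be illustrated with a small Python toy. The class LiveSlot, its fields, and the wait-event string are hypothetical. The change-counter scheme is an assumption: it is one standard way for a single writer to publish without taking a lock, and the sketch does not claim to transcribe backend_status.c. Python cannot demonstrate real memory-ordering behavior, so this shows only the protocol's bookkeeping.

```python
class LiveSlot:
    """Toy per-backend slot: one writer (the owning backend), many readers."""

    def __init__(self):
        self.changecount = 0  # even = stable, odd = write in progress
        self.state = "idle"
        self.query = ""
        self.wait_event = None

    def publish(self, **fields):
        # Writer side: bump to odd, overwrite fields in place, bump back to even.
        # Nothing is added up; the previous values are simply replaced.
        self.changecount += 1
        for name, value in fields.items():
            setattr(self, name, value)
        self.changecount += 1

    def try_read(self):
        # Reader side: copy the fields, then accept the copy only if the
        # counter was even and unchanged across the copy.
        before = self.changecount
        snapshot = (self.state, self.query, self.wait_event)
        after = self.changecount
        if before != after or before % 2 == 1:
            return None  # possibly torn read; the caller would retry
        return snapshot


slot = LiveSlot()
slot.publish(state="active", query="SELECT 1")
slot.publish(wait_event="Lock:relation")  # illustrative wait-event label
waiting = slot.try_read()
slot.publish(wait_event=None)             # the wait is over
running = slot.try_read()
```

Here `waiting` holds the state captured during the wait and `running` the state after it. No history survives in the slot, which is the contrast with the cumulative path: a reader only ever sees what the backend is doing at the moment it looks.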
The structural takeaways a reader should carry into the module docs:
- Two storage classes inside one cumulative system. Fixed-numbered kinds (one or a handful of objects: checkpointer, bgwriter, WAL, IO, SLRU, archiver) live in plain shared memory carved at startup. Variable-numbered kinds (per-relation, per-function, per-database, replication slot, subscription, per-backend) live in dynamic shared memory reached through a dshash hash table keyed by PgStat_HashKey (kind + dboid + objid). The counters for variable kinds are allocated separately from the hash entry (the entry holds a pointer to a body), so different kinds share one table without bloating it.
- Backends never write the shared store on the hot path. A backend bumps process-local pending counters and flushes them with pgstat_report_stat() at transaction end (or on a timeout), so the expensive shared-memory write is batched. This is the architectural payoff of the PG15 redesign over the old UDP-packet collector.
- Live reporting is even cheaper and never accumulates. Wait-event and status updates write directly into the backend’s own PgBackendStatus/MyProc slot with no locking on the common path, so they can be lit up on every lock wait without measurable cost.
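The first two takeaways above can be sketched as a toy model in Python. Every name here (HashKey, SharedStats, Backend, count, report_stat, the counter names, and the OIDs) is hypothetical, and plain dictionaries stand in for shared memory and the dshash table. The sketch assumes only what the text states: fixed kinds exist up front, variable kinds are created on demand under a (kind, dboid, objid) key, and a backend touches the shared store only when it flushes.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class HashKey(NamedTuple):
    """Stand-in for the (kind, dboid, objid) key described in the text."""
    kind: str
    dboid: int
    objid: int


class SharedStats:
    """Toy shared store with the two storage classes."""

    FIXED_KINDS = ("checkpointer", "bgwriter", "wal", "io", "slru", "archiver")

    def __init__(self):
        # Fixed-numbered kinds: one slot each, created up front.
        self.fixed = {kind: Counter() for kind in self.FIXED_KINDS}
        # Variable-numbered kinds: the table maps key -> separately
        # allocated body, and entries appear on demand.
        self.variable = {}
        self.writes = 0  # how many times the shared store was touched

    def merge(self, key, deltas):
        self.writes += 1
        if isinstance(key, HashKey):
            body = self.variable.setdefault(key, Counter())
        else:
            body = self.fixed[key]
        body.update(deltas)  # Counter.update adds to existing counts


class Backend:
    def __init__(self, shared):
        self.shared = shared
        self.pending = defaultdict(Counter)  # process-local only

    def count(self, key, name, delta=1):
        # Hot path: no access to the shared store at all.
        self.pending[key][name] += delta

    def report_stat(self):
        # Flush point (transaction end or timeout in the text): one shared
        # write per dirty entry, however many bumps it absorbed.
        for key, deltas in self.pending.items():
            self.shared.merge(key, deltas)
        self.pending.clear()


shared = SharedStats()
backend = Backend(shared)
rel = HashKey("relation", dboid=5, objid=16384)

for _ in range(1000):
    backend.count(rel, "tuples_inserted")
backend.count("wal", "bytes", 8192)

writes_before_flush = shared.writes  # still 0: nothing has been flushed
backend.report_stat()
```

In this run, 1001 counter bumps turn into two writes to the toy shared store, one per dirty entry. That ratio is the point of batching: the cost of the shared write scales with the number of objects touched in a transaction, not with the number of events.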
Reading order
Cross-referenced-first: read the cumulative system before the live view,
because the live view’s status reporting (backend_status.c) is physically
co-located with and partly initialized alongside the cumulative subsystem,
and because the “fixed vs variable kind / DSA dshash” model is the harder
idea that the rest builds on.
1. postgres-cumulative-stats.md: the PG15 shared-memory subsystem, covering the PgStat_Kind taxonomy, fixed-vs-variable storage, the per-backend pending → flush → dshash path, and the checkpointer/startup file lifecycle. Read postgres-shared-memory-ipc.md (server-architecture) first if the DSA/dshash/LWLock substrate is unfamiliar.
2. postgres-wait-events-progress.md: the live self-report, covering backend status, the code-generated wait-event taxonomy, and the progress array. Lighter and more self-contained; safe to read second.
Then fan out to the readers: postgres-autovacuum.md and the planner docs
(both consume variable-numbered relation stats), and
postgres-evolution-statistics.md for the historical arc.
Detail-doc summaries
Forward references: these module docs may not exist yet. One-line scope each:
| Module doc | What it will cover |
|---|---|
| postgres-cumulative-stats.md | The PG15 shared-memory cumulative statistics system: the PgStat_Kind taxonomy and PgStat_KindInfo dispatch, fixed-numbered (plain shmem) vs variable-numbered (DSA + dshash, keyed by PgStat_HashKey) storage, the process-local pending-counter → pgstat_report_stat() flush model, and the file lifecycle (checkpointer serializes pgstat.stat at shutdown, startup reloads or discards after a crash). In short, the replacement for the old stats-collector process. |
| postgres-wait-events-progress.md | The live self-report plane: PgBackendStatus and pg_stat_activity backing (backend_status.c), the wait-event class taxonomy and its code generation from wait_event_names.txt via generate-wait_event_types.pl (wait_event.c), and per-command progress reporting with parallel leader/worker aggregation (backend_progress.c). |
Adjacent sections
- server-architecture (postgres-overview-server-architecture.md): the closest neighbor and the biggest handoff. It owns the processes that feed and persist these stats (checkpointer, bgwriter, startup, the backend lifecycle) and the shared-memory / DSA / dshash / LWLock substrate the cumulative system is built on. This section owns the counters; that one owns the machinery that emits and stores them.
- txn-recovery (postgres-overview-txn-recovery.md): supplies the subjects of several fixed-numbered stat kinds: WAL activity, SLRU caches, and the checkpointer’s work. SLRU and WAL stats counters live here; the SLRU caches and the WAL machinery themselves live there.
- storage-engine (postgres-overview-storage-engine.md): the IO stat kind reports on buffer-manager and async-IO activity; the relation stat kind (n_tup_ins/upd/del, n_dead_tup) describes heap mutation. The mechanisms behind those numbers (postgres-buffer-manager.md, postgres-aio.md, postgres-heap-am.md) are owned there.
- query-processing (postgres-overview-query-processing.md): a consumer. The planner reads variable-numbered relation stats to decide when cached plans and stale ANALYZE data need refreshing, and progress reporting surfaces CREATE INDEX / parallel-query progress.