PostgreSQL Statistics — From a Collector Process to Extended Stats and Shared-Memory Cumulative Stats
Contents:
- Why this subsystem had to evolve (the original limitation)
- Timeline
- Era 0 — Per-column pg_statistic + the UDP stats-collector process
- PG10 (2017) — CREATE STATISTICS: functional dependencies + multivariate n-distinct
- PG12 (2019) — Multivariate MCV lists in extended statistics
- PG14 (2021) — Statistics on expressions, and inheritance-aware ext stats
- PG15 (2022) — The collector dies: shared-memory cumulative statistics
- PG16 (2023) — pg_stat_io: I/O as a first-class cumulative stat kind
- PG18 (2025) — Per-backend I/O, WAL and byte-level columns in pg_stat_io
- Where it stands at REL_18
- Sources
Why this subsystem had to evolve (the original limitation)
Section titled “Why this subsystem had to evolve (the original limitation)”“Statistics” in PostgreSQL is not one subsystem but two, and the distinction is the key to reading this whole timeline. They are different in their data, their consumers, their update cadence, and their failure modes — and they evolved independently. Conflating them is the classic mistake that the module docs (postgres-cumulative-stats.md and postgres-extended-statistics.md) both open by warning against.
Planner statistics are sampled estimates of data shape. ANALYZE
draws a fixed-size reservoir sample of a table’s rows and condenses each
column into a pg_statistic row: the fraction of nulls, the average
width, the number of distinct values (stadistinct), a most-common-value
(MCV) list, and an equi-depth histogram of the remaining values. The
planner reads these back to turn a WHERE clause into a selectivity, and
selectivity into a row-count estimate — the single most consequential
number in query planning. These statistics are refreshed occasionally,
are approximate by design, and feed the cost-based optimizer.
The original limitation here is structural: per-column statistics assume
column independence. When a query filters on two correlated columns —
the textbook example is WHERE city = 'New York' AND state = 'NY', where
the city already determines the state — the planner multiplies the two
individual selectivities as if they were independent events. The product
is far smaller than reality, the row estimate collapses toward one row,
and the planner picks a nested-loop plan that is catastrophic when the
real cardinality is millions. No amount of sampling more rows per column
fixes this; the information that the columns are correlated simply is not
present in a collection of single-column summaries. That gap is what the
PG10–PG14 extended-statistics work set out to close.
Cumulative activity statistics are exact event tallies of what has
happened over time: how many sequential scans this table has served, how
many tuples were inserted or updated, how many dead tuples have piled up
since the last vacuum, how many WAL bytes were written, how many blocks
were read from disk versus found in the buffer cache. These are not
sampled — they are incremented on every relevant operation — and they
drive the entire pg_stat_* view family plus, critically, autovacuum’s
decision of which table to vacuum next.
The original limitation here is operational. From the earliest versions
through PG14, these counters lived in a single dedicated stats collector
process. Every backend serialized its counter deltas into UDP datagrams
and fired them at the collector over a loopback socket. The collector
owned the only authoritative in-memory copy of every counter, periodically
wrote the whole thing to a file, and answered every pg_stat query by
having the asking backend read and deserialize that file. This design had
three structural costs that grew worse with scale:
- Lossy. UDP datagrams could be silently dropped under load. Counters simply undercounted; there was no retransmission and no error.
- A throughput ceiling. One process received and merged the entire cluster’s stats traffic. On a busy many-core machine that single consumer became the bottleneck.
- Expensive to read. Every
pg_statquery slurped and parsed an entire serialized stats file even to read one counter for one table.
Neither limitation was a bug — both designs were reasonable for their era. But as PostgreSQL pushed into larger schemas (where column correlation is the norm, not the exception) and larger, busier clusters (where the collector’s single-process model could not keep up), both subsystems had to be rebuilt. The two rebuilds happened on parallel tracks, summarized in the timeline below.
Timeline
Section titled “Timeline”timeline
title PostgreSQL Statistics Evolution
section Planner statistics (data shape)
pre-10 : Per-column pg_statistic only : nullfrac, stadistinct, MCV, histogram : assumes column independence
PG10 2017 : CREATE STATISTICS : functional dependencies : multivariate n-distinct
PG12 2019 : Multivariate MCV lists : correlated multi-clause selectivity
PG14 2021 : Statistics on expressions : inheritance-aware ext stats (stxdinherit)
section Cumulative statistics (activity counters)
pre-15 : Single UDP stats-collector process : lossy, single bottleneck, whole-file reads
PG15 2022 : Collector removed : DSA + dshash shared memory : per-backend pending buffers
PG16 2023 : pg_stat_io : I/O as a fixed stat kind
PG18 2025 : Per-backend I/O stats : WAL + byte-level columns in pg_stat_io
Era 0 — Per-column pg_statistic + the UDP stats-collector process
Section titled “Era 0 — Per-column pg_statistic + the UDP stats-collector process”This is the baseline both tracks start from, and it is worth being precise about because every later release is a delta against it.
Planner side — pg_statistic, one row per column. ANALYZE (in
analyze.c) drew a sample sized at 300 * statistics_target rows using a
two-stage block sample plus the Vitter reservoir algorithm, then ran a
type-specific compute_stats routine per column. The result was a single
pg_statistic row holding stanullfrac, stawidth, stadistinct, and up
to a handful of slots, each tagged with a STATISTIC_KIND_* code: an MCV
list (STATISTIC_KIND_MCV), an equi-depth histogram of the non-MCV residue
(STATISTIC_KIND_HISTOGRAM), and a physical/logical correlation
(STATISTIC_KIND_CORRELATION). The slot design was deliberately
open-ended — a type’s typanalyze could fill arbitrary statistic shapes —
which is why later extended statistics could bolt new kinds on without a
catalog redesign of the per-column path.
The planner consumed these in selfuncs.c: var_eq_const probed the MCV
list for an exact frequency or fell back to
(1 - sumcommon - nullfrac) / otherdistinct; scalarineqsel
binary-searched the histogram and interpolated within the bracketing bin.
All of this is per column. Multi-clause selectivity was the product of
per-clause selectivities — the independence assumption baked into the very
shape of the catalog. Mechanism detail lives in
postgres-extended-statistics.md.
Cumulative side — the stats collector. A dedicated auxiliary process,
the stats collector, was forked by the postmaster. Backends did not write
shared counters directly; they accumulated deltas locally and periodically
serialized them into UDP messages sent over a loopback datagram socket to
the collector. The collector merged every message into its private
hashtables — the only in-memory copy — and on a timer wrote the tables
out to a stats file under pg_stat_tmp/. A backend running
SELECT * FROM pg_stat_user_tables triggered a request that caused a fresh
file to be written and then read and deserialized by the asking backend.
flowchart LR
subgraph backends["Backends (pre-15)"]
B1["backend 1<br/>local deltas"]
B2["backend 2<br/>local deltas"]
B3["backend N<br/>local deltas"]
end
COLL["stats collector process<br/>owns the ONLY copy<br/>of every counter"]
FILE[("pgstat.stat file<br/>(whole-file serialize)")]
READER["any backend reading<br/>a pg_stat_* view"]
B1 -. "UDP datagram<br/>(may be dropped)" .-> COLL
B2 -. "UDP datagram<br/>(may be dropped)" .-> COLL
B3 -. "UDP datagram<br/>(may be dropped)" .-> COLL
COLL -- "timer write" --> FILE
FILE -- "read + deserialize<br/>entire file per query" --> READER
The three weaknesses named in the previous section — lossiness, the single-consumer ceiling, and whole-file reads — are all visible in this diagram. The collector is the only writer (so the totals are internally consistent) but it is also the only place the numbers exist, the only process draining the UDP firehose, and the source of a file that must be re-read in full on every query. PG15 attacked all three at once.
PG10 (2017) — CREATE STATISTICS: functional dependencies + multivariate n-distinct
Section titled “PG10 (2017) — CREATE STATISTICS: functional dependencies + multivariate n-distinct”PG10 introduced the extended statistics feature and the
CREATE STATISTICS command, with a new catalog pg_statistic_ext holding
the definition of a declared statistics object (which columns, which
kinds) and statistics data stored alongside. The first release shipped two
kinds, both aimed squarely at the column-independence problem.
Functional dependencies (STATS_EXT_DEPENDENCIES, 'f'). A functional
dependency a -> b says that knowing the value of column a determines
the value of column b (the city/state example). PostgreSQL does not store
a boolean “yes/no”; it stores a degree — a number in [0,1] measuring how
often the dependency holds across the sample, computed in dependencies.c
by sorting on a and checking, within each group of equal a values,
whether b is constant. At planning time the dependency lets the estimator
stop multiplying correlated clauses: if a functionally determines b,
the clause on b adds little independent selectivity, so the combined
estimate is pulled back up toward the selectivity of a alone.
Multivariate n-distinct (STATS_EXT_NDISTINCT, 'd'). A single
stadistinct answers “how many distinct values in column a?”. The
multivariate version answers “how many distinct combinations of
(a, b, c)?” — computed with the same Haas–Stokes Duj1 estimator applied
to column groups (mvdistinct.c). This directly fixes GROUP BY a, b
cardinality and the denominator of grouped-equality selectivity, which the
independence product gets badly wrong whenever the columns are correlated.
The structural shift is that the definition and the data are now
separate catalog objects from pg_statistic, and they are opt-in: a
DBA must run CREATE STATISTICS naming the column group. The per-column
path was untouched; extended stats are a correction layer the planner
consults in addition. ANALYZE learned to walk the pg_statistic_ext
rows for a table and, for each, call into the extended-stats builder to fill
the requested kinds. The estimator-side combination rule and the build
entry point are documented in
postgres-extended-statistics.md.
PG12 (2019) — Multivariate MCV lists in extended statistics
Section titled “PG12 (2019) — Multivariate MCV lists in extended statistics”The PG10 kinds handle aggregate correlation — how many distinct
combinations exist, and whether one column determines another — but they
cannot estimate the selectivity of a specific combination of values.
Functional dependencies assume the dependency holds uniformly; they say
nothing about which (city, state) pairs are common. For skewed,
correlated data that uniformity assumption itself fails.
PG12 added the third extended-stats kind: the multivariate MCV list
(STATS_EXT_MCV, 'm', built in mcv.c). It is the multi-column analogue
of the per-column MCV list — a list of the most common value combinations
across the declared columns, each with its observed frequency, plus a
base-frequency (the independence-assumption frequency) so the planner can
tell how far each combination departs from independence. ANALYZE builds it
by sampling combinations and keeping those whose frequency clears the same
kind of statistical cutoff used for per-column MCVs.
The estimator change is the payoff. Instead of either fully multiplying clause selectivities (independence) or fully collapsing them (a strict functional dependency), the multivariate MCV path lets the planner read the actual joint frequency for matched combinations and apply a blended rule for the residue:
P(a, b) = f * Min(P(a), P(b)) + (1 - f) * P(a) * P(b)where f reflects how much of the distribution the MCV list explains. The
MCV list covers the common, correlated head of the distribution exactly;
the independence product handles the long tail. This is strictly more
expressive than the PG10 kinds, which is why a typical
CREATE STATISTICS s (ndistinct, dependencies, mcv) ON a, b FROM t now
requests all three — they are complementary, not redundant. The combination
logic (statext_clauselist_selectivity, mcv_clauselist_selectivity) is
detailed in
postgres-extended-statistics.md.
The structural shape did not change in PG12 — no new catalog, no new
top-level path. A third 'm' kind was simply added to the same
pg_statistic_ext enabled-kinds set and the same ANALYZE build loop, which
is exactly what the open-ended PG10 design was built to allow.
PG14 (2021) — Statistics on expressions, and inheritance-aware ext stats
Section titled “PG14 (2021) — Statistics on expressions, and inheritance-aware ext stats”By PG13 the extended-statistics catalog could describe correlation among
columns, but only columns. A predicate over an expression —
WHERE lower(email) = '...', WHERE (a + b) > 100,
WHERE date_trunc('day', ts) = ... — had no statistics at all; the planner
fell back to hard-coded default selectivities, which are routinely wrong by
orders of magnitude.
PG14 closed this with two changes.
Statistics on expressions (STATS_EXT_EXPRESSIONS, 'e').
CREATE STATISTICS learned to accept arbitrary expressions, not just bare
column names. ANALYZE evaluates each declared expression over the sample
and then runs the ordinary per-column statistics machinery on the
resulting values — producing nullfrac, stadistinct, an MCV list, and a
histogram for the expression, stored in the extended-stats data. In
extended_stats.c this shows up as a dedicated expression path (the
STATS_EXT_EXPRESSIONS branch in the build and serialize loops): the
expressions carried by a statistics object are evaluated, folded through
eval_const_expressions, and analyzed one at a time. The effect is that a
single-expression CREATE STATISTICS object behaves like “give this
expression the same statistics a real column would have,” which is the
common case users actually want. A single-column functional-index
workaround existed before, but it required an actual index and the
maintenance cost that comes with it; expression stats decouple the
statistics from the index.
Inheritance-aware extended statistics (stxdinherit). The
pg_statistic_ext_data catalog gained a boolean stxdinherit column and a
two-column primary key (stxoid, stxdinherit). This mirrors the
long-standing stainherit flag on per-column pg_statistic: a partitioned
or inherited table now stores two sets of extended statistics — one
describing only the parent’s own rows (stxdinherit = false) and one
describing the parent plus all inheritance/partition children
(stxdinherit = true). The planner picks whichever matches the scope of
the scan it is costing. Before this, extended statistics on a partitioned
root were effectively unusable because there was no place to record the
all-children distribution. The shape change is visible directly in the
catalog header:
// pg_statistic_ext_data — src/include/catalog/pg_statistic_ext_data.hOid stxoid; /* statistics object this data is for */bool stxdinherit; /* true if inheritance children are included *//* ... pg_ndistinct, pg_dependencies, pg_mcv_list payload columns ... */With PG14 the planner-statistics track reaches essentially its current
shape: per-column pg_statistic for the independence baseline, plus
opt-in extended statistics over column groups and expressions, each in
own-rows and with-children variants. Everything from here is refinement
inside that frame — the mechanism as it stands on REL_18 is in
postgres-extended-statistics.md.
PG15 (2022) — The collector dies: shared-memory cumulative statistics
Section titled “PG15 (2022) — The collector dies: shared-memory cumulative statistics”This is the largest single change on either track, and it switched tracks:
nothing about planner statistics changed, but the cumulative statistics
subsystem was completely rebuilt. PG15 (commit 5891c7a8e, Andres Freund)
removed the stats collector process entirely and replaced the
UDP-message / single-owner / whole-file model with a shared-memory
subsystem under src/backend/utils/activity/.
The redesign attacks each of Era 0’s three weaknesses directly:
- Lossy → exact. There is no more UDP. Backends update counters in shared memory (via buffered local pending entries), so a delta is never silently dropped.
- Single bottleneck → no central consumer. No process drains a firehose; every backend writes its own slice of shared memory under a fine-grained lock.
- Whole-file reads → targeted lookups. A
pg_statread finds the one entry it needs in a hashtable instead of deserializing the whole file.
The data structure split. Stats are partitioned into two populations.
Variable-numbered kinds — per-relation, per-function, per-replication-slot,
per-subscription, per-backend — live in a DSA-backed dshash keyed by
PgStat_HashKey {kind, dboid, objid}. The dshash entry holds only a
refcount, a generation counter, a “dropped” flag, and a dsa_pointer to the
separately allocated, variable-size stats body. Fixed-numbered kinds —
checkpointer, bgwriter, archiver, WAL, SLRU, and (from PG16) IO — live in a
plain shared-memory control block, one slot apiece, because there is exactly
one of each per cluster.
The hot path stays local. A backend does not touch shared memory on
every tuple. It buffers updates in process-local pending entries reached
through a local hashtable (pgStatEntryRefHash) of reference-counted
PgStat_EntryRef handles. pgstat_report_stat() flushes the pending list
into shared memory under each entry’s dedicated LWLock at most once per
second (forced at commit). Lifetime is reference-counted, drops are
transactional (deferred to commit and written into the commit/abort WAL
record so replicas and crash recovery stay consistent), and the
checkpointer serializes the whole thing to pg_stat/pgstat.stat at
shutdown for the startup process to reload — or discard after a crash. All
of this mechanism (the dshash entry layout, the refcount/generation race
handling, the flush callbacks) is documented in
postgres-cumulative-stats.md; it is not
re-derived here.
The before/after structural shift:
flowchart TB
subgraph before["Before — PG14 and earlier"]
bB1["backend"] -. "UDP delta" .-> bC["stats collector<br/>(sole owner)"]
bC --> bF[("pgstat.stat<br/>whole file")]
bF -- "full deserialize" --> bR["reader backend"]
end
subgraph after["After — PG15+"]
aB["backend<br/>local pending entries<br/>(PgStat_EntryRef)"]
aB -- "pgstat_report_stat()<br/>flush <= 1/sec under per-entry LWLock" --> aSHM
subgraph aSHM["shared memory"]
aDSH["dshash: variable kinds<br/>key {kind, dboid, objid}<br/>-> dsa_pointer to body"]
aFIX["fixed control block<br/>checkpointer, bgwriter,<br/>archiver, WAL, SLRU, IO"]
end
aSHM -- "targeted lookup" --> aR["reader backend"]
aSHM -. "checkpointer writes at shutdown" .-> aFILE[("pg_stat/pgstat.stat")]
aFILE -. "startup reads<br/>discard after crash" .-> aSHM
end
before --> after
The lesson PostgreSQL drew is the classic message-passing-versus-shared-memory trade-off: the collector’s UDP model gave perfect write isolation (only one process ever mutated the totals) but paid for it in loss, a serialization bottleneck, and read cost. Shared memory with per-entry locking gives up the “single writer” simplicity in exchange for exactness, scalability, and cheap reads — and recovers consistency through reference counting and transactional drops rather than through a single owner.
PG16 (2023) — pg_stat_io: I/O as a first-class cumulative stat kind
Section titled “PG16 (2023) — pg_stat_io: I/O as a first-class cumulative stat kind”Before PG16, I/O accounting was scattered: block-read and block-hit counters lived on per-relation stats, and there was no cluster-wide, broken-down view of who was doing I/O for what reason. You could not easily answer “how much of my read traffic is the bulk-read strategy used by sequential scans versus normal buffer accesses, and how much is being read by vacuum versus the checkpointer?”
PG16 added a brand-new fixed cumulative stat kind, PGSTAT_KIND_IO, and
the pg_stat_io view over it. Because the cumulative subsystem had already
been rebuilt in PG15 around pluggable stat kinds, adding I/O was a matter of
registering a new fixed kind with its own flush callback — not a redesign.
This is the PG15 architecture paying off: a new kind slots into the same
shared-memory control block and the same pgstat_report_stat() flush
discipline.
pg_stat_io reports counts dimensioned by backend type (client backend,
autovacuum worker, checkpointer, background writer, …), I/O object
(relation, temp relation), and I/O context (normal, vacuum, bulkread,
bulkwrite). Each cell tallies reads, writes, extends, hits, evictions, reuses,
fsyncs, and so on. Suddenly the operator can attribute physical I/O to a
specific (backend type, context) pair — the single most requested I/O
observability feature for years. The new kind is visible in the stat-kind
table that survives on REL_18:
// pgstat_kind.h — fixed-numbered kinds, REL_18#define PGSTAT_KIND_ARCHIVER 7#define PGSTAT_KIND_BGWRITER 8#define PGSTAT_KIND_CHECKPOINTER 9#define PGSTAT_KIND_IO 10 /* added PG16 */#define PGSTAT_KIND_SLRU 11#define PGSTAT_KIND_WAL 12PG18 (2025) — Per-backend I/O, WAL and byte-level columns in pg_stat_io
Section titled “PG18 (2025) — Per-backend I/O, WAL and byte-level columns in pg_stat_io”PG18 refined the I/O accounting along two axes, again without touching the PG15 architecture.
Per-backend I/O statistics. PG18 added PGSTAT_KIND_BACKEND (kind 6) —
a variable-numbered kind keyed per backend — and the
pg_stat_get_backend_io() function, so I/O can now be attributed to an
individual session, not only to a backend type in aggregate. A long-running
session hammering the disk can be identified directly rather than inferred
from the type-level totals. As a variable-numbered kind it lives in the same
dshash machinery that holds per-relation and per-function stats; the entry is
created and dropped over the session’s lifetime through the same
reference-counted PgStat_EntryRef path.
WAL and byte-level columns folded into pg_stat_io. The PG18 pg_stat_io
view gained byte-granular columns (read_bytes, write_bytes, and an
op_bytes describing the unit size of an operation) and an object column
that distinguishes the WAL I/O object from relation I/O — so WAL reads and
writes are now visible in the same view and the same byte units as relation
I/O. The view definition on REL_18 reflects this directly:
-- pg_stat_io — src/backend/catalog/system_views.sql (REL_18)CREATE VIEW pg_stat_io ASSELECT b.backend_type, b.object, -- relation vs WAL (PG18) b.context, ... b.read_bytes, -- byte-level accounting (PG18) ...Counting in bytes rather than only in fixed-size blocks matters because WAL
and some I/O paths operate in units other than the 8 KB page; op_bytes
records the unit so a count can be turned into an accurate byte figure.
These PG16/PG18 changes are deliberately additive: each is a new stat kind or new columns on an existing kind, riding the PG15 shared-memory infrastructure. None of them reintroduces a central process or a whole-file read.
Where it stands at REL_18
Section titled “Where it stands at REL_18”On REL_18 the two tracks have each reached a stable shape that the module docs describe in mechanism-level detail:
Planner statistics. ANALYZE (src/backend/commands/analyze.c) still
produces per-column pg_statistic rows as the independence baseline:
nullfrac, width, stadistinct, MCV, histogram, correlation. Layered on top,
extended statistics (src/backend/statistics/extended_stats.c and the
dependencies.c / mvdistinct.c / mcv.c builders) provide opt-in
correlation-aware statistics over declared column groups and expressions,
with four kinds — functional dependencies ('f'), multivariate n-distinct
('d'), multivariate MCV ('m'), and expression stats ('e') — each stored
in own-rows and with-children (stxdinherit) variants in
pg_statistic_ext_data. The estimator combines correlated clauses through
the MCV-plus-residue blend rather than the independence product. The full
mechanism is in
postgres-extended-statistics.md.
Cumulative statistics. There is no stats collector process. The
shared-memory subsystem in src/backend/utils/activity/pgstat.c (and its
pgstat_*.c siblings) holds variable-numbered kinds in a DSA/dshash and
fixed-numbered kinds in a control block. On REL_18 the built-in kinds run
PGSTAT_KIND_DATABASE (1) through PGSTAT_KIND_WAL (12) — including
PGSTAT_KIND_BACKEND (6, per-backend, PG18) and PGSTAT_KIND_IO (10,
PG16) — with the IDs at and above 24 reserved for custom (extension-defined)
stat kinds. Backends buffer pending deltas locally and flush under per-entry
LWLocks at most once per second; the checkpointer serializes to
pg_stat/pgstat.stat at shutdown and the startup process reloads or
discards after a crash. The full mechanism is in
postgres-cumulative-stats.md. The
live-state counterpart — wait events and progress reporting, which is the
other half of monitoring and is snapshot-not-cumulative — is covered in
postgres-wait-events-progress.md.
Next step. PostgreSQL 19 (the next major release, in development) is expected to continue the additive pattern — new and extension-defined cumulative stat kinds slotting onto the PG15 shared-memory frame, and incremental planner-statistics refinement — rather than another structural rebuild of either track. The PG15 collector removal and the PG10–PG14 extended-statistics frame are the two architectural decisions everything since has built on.
Sources
Section titled “Sources”Module docs (current-state mechanism — do not re-derive):
- postgres-cumulative-stats.md — the PG15 shared-memory cumulative stats subsystem (dshash, DSA, pending buffers, refcount lifetime, transactional drops, fixed vs variable kinds).
- postgres-extended-statistics.md —
ANALYZE,
pg_statistic, and the extended-statistics builders/estimators (functional dependencies, n-distinct, MCV, expressions). - postgres-wait-events-progress.md — the live-state monitoring counterpart (wait events, progress views).
Release notes (release attributions):
- PostgreSQL 10 release notes —
CREATE STATISTICS(functional dependencies, multivariate n-distinct). - PostgreSQL 12 release notes — multivariate MCV lists in extended statistics.
- PostgreSQL 14 release notes — statistics on expressions; inheritance-aware extended statistics.
- PostgreSQL 15 release notes — stats collector removed; cumulative
statistics moved to shared memory (commit
5891c7a8e). - PostgreSQL 16 release notes —
pg_stat_io. - PostgreSQL 18 release notes — per-backend I/O statistics; WAL and
byte-level columns in
pg_stat_io.
Key source files (observable on REL_18, commit 273fe94):
src/backend/utils/activity/pgstat.c— cumulative stats core.src/include/utils/pgstat_kind.h— the stat-kind ID registry (PGSTAT_KIND_*).src/backend/statistics/extended_stats.c— extended-statistics build and estimate orchestration.src/backend/commands/analyze.c—ANALYZEsampling and per-columncompute_stats.src/include/catalog/pg_statistic_ext.h,src/include/catalog/pg_statistic_ext_data.h— extended-stats catalogs (definition and data, includingstxdinherit).src/backend/catalog/system_views.sql—pg_stat_ioand thepg_stat_*view definitions.