A query optimizer answers one question: given a declarative SQL statement
that says what result is wanted, which of the exponentially many physical
execution strategies that all compute that result should we run? The
relational model guarantees that many physically distinct plans are
logically equivalent — a join can be a nested loop, a sort-merge, or a
hash join; a table can be reached by sequential scan or by any of its
indexes; a three-way join can be associated and commuted into many shapes.
Database System Concepts (Silberschatz, 7e, ch. 16 “Query Optimization”)
frames this as two coupled sub-problems: plan-space enumeration (which
equivalent plans to consider) and cost estimation (a numeric model that
ranks them so we can pick one without running them). Database Internals
(Petrov, ch. 7) draws the same boundary between a rule-based rewrite stage
that applies always-good transformations and a cost-based stage that
searches a space of alternatives under a cost model.
The canonical realization of cost-based optimization is the System R
optimizer, described in Selinger et al., “Access Path Selection in a
Relational Database Management System” (SIGMOD 1979, captured at
research/dbms-papers/systemr-optimizer.md). System R established the
template that PostgreSQL still follows almost line for line:
Enumerate access paths per table. For each relation, consider the
sequential scan and every applicable index, and estimate the cost of
each using catalog statistics.
Estimate selectivity. Each predicate has a selectivity — the
fraction of rows it passes — and the product of selectivities estimates
how many rows survive. Cardinality estimates drive every downstream cost.
Enumerate join orders by dynamic programming. Rather than evaluate
all n! orderings, System R builds the best plan for every subset of
relations bottom-up: best 1-table plans, then best 2-table plans built
from them, then 3-table, up to the full set. The key pruning rule is the
principle of optimality — the cheapest plan for a set of relations is
built from cheapest plans for its subsets — which collapses a factorial
search to roughly 2^n.
Carry “interesting orders.” A plan that is more expensive but emits
rows already sorted on a column needed later (a merge-join key, an
ORDER BY, a GROUP BY) may still win overall by avoiding a later sort.
System R therefore does not keep only the single cheapest plan per
relation set; it keeps the cheapest plan for each interesting sort
order plus the cheapest unordered plan. This is the seed of PostgreSQL’s
pathkeys.
Selinger’s design space leaves several knobs that every cost-based
optimizer must set, and these are the choices to watch as the rest of this
document unfolds:
What is the unit of the search? System R searches over plans
directly. PostgreSQL interposes an intermediate abstraction — the
Path — a lightweight plan sketch carrying just cost and sort order,
cheaper to generate and discard than a full executable plan.
How exhaustive is the join search? Pure DP is exponential in the
number of relations. Every engine needs a fallback for large joins
(PostgreSQL’s is the genetic optimizer, GEQO).
How are equivalent plans pruned during the search? The pruning rule
decides which “almost as good” plans survive to compete one level up.
The Berkeley POSTGRES project re-implemented this template with its own
extensions; Fong’s MS thesis “The Design and Implementation of the
POSTGRES Query Optimizer” (UCB) documents how the System-R cost model was
adapted into the codebase whose direct descendant is the modern
src/backend/optimizer/ tree. The vocabulary — RelOptInfo, Path,
pathkeys, add_path — traces straight back to that work.
Scope. This document is the spine and the map of that optimizer:
the three-phase control flow, the RelOptInfo/Path/Plan distinction,
bottom-up path generation, cheapest-path selection, and Path→Plan
conversion. The mechanism of each major piece is deferred to a sibling
doc: the per-AM path builders to postgres-path-generation.md, the
join-order DP and join_search internals to postgres-join-ordering.md,
the selectivity and cost formulas to postgres-cost-model.md, the
per-node Path→Plan translation detail to postgres-plan-creation.md, and
what the executor does with the finished Plan to postgres-executor.md.
Between “the System R template” and “the actual PostgreSQL symbols” sits a
layer of engineering conventions that nearly every serious cost-based
optimizer adopts. Naming them first means the PostgreSQL symbols in the
next section read as instances of known patterns rather than inventions.
Three logical stages: rewrite, plan-space search, physical plan emission
Optimizers are almost universally staged. A rewrite / normalization
stage applies transformations that are (almost) always beneficial and need
no cost comparison: flattening nested AND/OR, folding constant
expressions, pulling simple subqueries up into the parent so they join in
the same search, pushing predicates down toward scans. A cost-based
search stage then enumerates genuinely alternative physical strategies
and ranks them. Finally an emission stage materializes the single
winner into whatever the execution engine consumes. Conflating these stages
is a classic source of bugs: a “rewrite” that is only sometimes beneficial
belongs in the cost-based stage instead.
A lightweight plan abstraction distinct from the executable plan
Generating a full executable operator tree for every candidate is wasteful
when most candidates will be discarded. The standard fix is a two-level
representation: a cheap, planning-only node that carries exactly the
information the search needs (estimated cost, output cardinality, output
sort order, required parameters) and omits execution detail, plus a
heavier executable node built only for the chosen plan. SQL Server’s
optimizer has its Group/GroupExpression memo; Calcite has RelNode with
RelTraits; PostgreSQL has Path versus Plan. The discipline is the
same: search over the cheap thing, build the expensive thing once.
One canonical container per relation, holding many candidate plans
The DP search needs a place to accumulate “all the ways to produce this set
of relations” and to record the winner(s) so higher levels can reuse them.
The convention is a per-relation-set structure (System R’s “best plans”
table entry; the memo group in Cascades-style optimizers) that owns a list
of candidate plans and a cached pointer to the cheapest. Crucially there is
exactly one such container per logically distinct relation set, so that
plans arriving from different sub-derivations of the same set compete
against each other in one place.
Pruning by cost and by interesting physical properties
If the search kept only the single cheapest plan per relation set, it would
throw away plans whose sort order saves a sort upstream. The universal
refinement is to prune along all dimensions that matter to a consumer:
keep the cheapest plan for each distinct physical property combination
(sort order, and in PostgreSQL also parameterization). A plan is discarded
only if some surviving plan is at least as cheap and at least as good on
every physical property. This is “interesting orders” generalized.
Bottom-up dynamic programming with a large-query escape hatch
The DP join search is O(2^n) in time and space — fine for a dozen
relations, ruinous for fifty. Every DP optimizer therefore pairs it with a
heuristic search (greedy, randomized, or genetic) that trades optimality
for tractability above a threshold. The optimizer picks the strategy by
relation count.
The single most useful mental model of the PostgreSQL optimizer is a stack
of three functions, each handling one logical layer and delegating the rest
inward. The file comment in planmain.c states the division of labor
exactly: planner.c deals with “subselects, inheritance, aggregates,
grouping, and so on,” while planmain.c is “the main code for planning a
basic join operation, shorn of” those features.
flowchart TB
P["planner / standard_planner<br/>(per-statement entry; sets up PlannerGlobal)"]
SQP["subquery_planner<br/>(one per Query level: pull-up, canonicalize,<br/>const-fold, outer-join reduction)"]
GP["grouping_planner<br/>(upper ops: set-ops, GROUP/agg, window,<br/>DISTINCT, ORDER BY, LIMIT, ModifyTable)"]
QP["query_planner<br/>(scan+join core setup: base rels, ECs,<br/>qual distribution, pathkeys)"]
MOR["make_one_rel<br/>(generate all scan+join Paths bottom-up)"]
CP["create_plan<br/>(turn the one cheapest Path into a Plan)"]
P --> SQP
SQP --> GP
GP --> QP
QP --> MOR
MOR -.->|"returns final joinrel<br/>with cheapest paths"| GP
GP -.->|"layers upper paths onto<br/>UPPERREL_FINAL"| SQP
SQP -.->|"returns PlannerInfo root"| P
P --> CP
CP -.->|"PlannedStmt"| P
Figure 1 — The three nested planning phases plus emission. planner is
the external entry point; it calls subquery_planner once per Query level
(recursing for sub-Queries), which calls grouping_planner for the upper
operations, which calls query_planner/make_one_rel for the scan-and-join
core. Paths flow back up; only after the whole Path tree exists does
planner call create_plan once to emit the executable Plan.
The external entry point is thin. planner is just a hook wrapper; the real
work starts in standard_planner, which sets up the per-invocation
PlannerGlobal, decides whether parallelism is feasible, derives the
tuple_fraction (how much of the result the caller is likely to consume),
and then calls subquery_planner. After planning returns, it does the
Path→Plan conversion:
Three things to notice. First, planning and emission are cleanly separated:
the entire search produces a RelOptInfo (final_rel) full of competing
Paths; create_plan runs once, on the single winner. Second, the winner
is chosen by get_cheapest_fractional_path, which respects the
tuple_fraction — a cursor that will fetch only a few rows can prefer a
fast-start plan over a cheaper-overall one. Third, final_rel is fetched
from the upper relations registry (UPPERREL_FINAL), not from the join
search directly; the upper rels are where grouping_planner stacks the
post-join operations.
Phase 1 — subquery_planner: preprocess one Query level
subquery_planner is invoked once per Query node and recurses for
sub-Queries. It is the rewrite/normalize stage: everything here is a
transformation applied unconditionally or by a local heuristic, not a
cost-based choice. The optimizer README lists the work precisely:
-subquery_planner()
pull up sublinks and subqueries from rangetable, if possible
canonicalize qual ...
simplify constant expressions
process sublinks
convert Vars of outer query levels into Params
In source terms, subquery_planner allocates a fresh PlannerInfo (called
root throughout the optimizer — the per-Query planning context), pulls up
sublinks and simple subqueries into the parent jointree, runs
preprocess_expression over the target list and quals (flatten join alias
Vars, canonicalize the qual into CNF-ish form, const-fold), reduces outer
joins to inner joins where provably safe, and removes useless RTE_RESULT
relations. Then it hands off to the upper planner and finalizes:
/* Make sure we've identified the cheapest Path for the final rel. */
set_cheapest(final_rel);
return root;
The mechanism of pull-up, qual canonicalization, and constant folding is the
subject of the preprocessing doc; here the load-bearing fact is the
boundary: by the time grouping_planner runs, the Query for this level
has been normalized into a form where subqueries that can participate in
the parent’s join search already have, and quals are in a canonical shape
the cost-based search can place precisely.
grouping_planner plans everything that sits above the scan-and-join
core: set operations (UNION/INTERSECT/EXCEPT), grouping and
aggregation, window functions, DISTINCT, ORDER BY, LIMIT/OFFSET, and
for non-SELECT statements the ModifyTable node. Its structure is a
pipeline: get the best scan/join paths from query_planner, then for each
upper operation that the query needs, build a new upper RelOptInfo whose
paths are the lower paths with the operation’s path node stacked on top.
// ... DISTINCT, ORDER BY, LIMIT, ModifyTable each stack their own paths ...
The repeating shape — current_rel = create_<op>_paths(root, current_rel, ...) — is the upper-planner idiom. Each step consumes the previous rel’s
paths and produces a new rel whose paths add one logical operation. Because
each step keeps a set of paths (via add_path), the planner can, for
example, carry both a hash-aggregate and a sort-then-group implementation of
GROUP BY forward and let the eventual ORDER BY step decide which is
cheaper end-to-end (a sort-based group may avoid a later sort). The detail
of each create_*_paths builder belongs to the plan-creation and cost
docs; the spine fact is that grouping_planner is where logical SQL clauses
become stacked physical path alternatives on a chain of upper rels, ending
at UPPERREL_FINAL.
Phase 3 — query_planner + make_one_rel: the scan-and-join core
query_planner (in planmain.c) is the heart of the System-R core. It does
not select a single best path — it cannot, because grouping_planner above
it still has sort/group/limit decisions that affect which scan/join path is
best overall. Instead it returns the final join RelOptInfo with all its
surviving paths and lets the caller choose. Its own job is everything needed
to set up the cost-based search:
if (!final_rel ||!final_rel->cheapest_total_path || ...)
elog(ERROR, "failed to construct the join relation");
return final_rel;
The ordering here is load-bearing and reflects the README’s “Order of
processing for EquivalenceClasses and PathKeys”: deconstruct_jointree
scans the quals and builds EquivalenceClasses from mergejoinable equality
clauses (a = b); only after all ECs are known (no more merging possible)
does the qp_callback build canonicalPathKeys for the query’s
ORDER BY/GROUP BY. Pathkeys must be canonical before path generation so
that two paths with the same sort order share the same pathkey list and can
be compared by pointer. There is also a fast path at the top of
query_planner for the trivial SELECT expr case (a single RTE_RESULT),
which builds one GroupResultPath and returns without the full machinery.
make_one_rel (in allpaths.c) is the two-step generator:
Everything above flows through three node types whose distinction is the
conceptual core of the optimizer.
A RelOptInfo represents one relation the planner reasons about —
either a base relation (a table, a subquery-in-FROM, a function-in-FROM) or
a join relation (a combination of base rels). The README’s invariant is
that there is exactly oneRelOptInfo per distinct set of base rels: the
join {A B C} has one RelOptInfo no matter whether it was built as
(A⋈B)⋈C or A⋈(B⋈C). Those alternative derivations are different Paths
that compete inside the sameRelOptInfo. The struct carries the
candidate paths and the cached winners:
List *indexlist; /* IndexOptInfo list, for index paths */
BlockNumber pages;
Cardinality tuples; /* size estimates from pg_class */
PlannerInfo *subroot; /* if a subquery: its own planner context */
// ... condensed ...
} RelOptInfo;
A Path is one way to produce the rows of its parent RelOptInfo.
It is deliberately lightweight — the planning-only sketch from the Common
Design section. Its base struct carries only what the search needs:
// struct Path — src/include/nodes/pathnodes.h
typedefstruct Path
{
NodeTag type;
NodeTag pathtype; /* tag identifying scan/join method */
RelOptInfo *parent; /* the relation this path can build */
PathTarget *pathtarget; /* list of Vars/Exprs, cost, width */
ParamPathInfo *param_info; /* parameterization info, or NULL */
bool parallel_aware;
bool parallel_safe;
int parallel_workers;
Cardinality rows; /* estimated number of result tuples */
int disabled_nodes;/* count of disabled-by-GUC nodes */
Cost startup_cost; /* cost before fetching any tuples */
Cost total_cost; /* total cost, all tuples fetched */
List *pathkeys; /* sort ordering of path's output */
} Path;
The README states the relationship to Plan exactly: “There is pretty nearly
a one-to-one correspondence between the Path and Plan trees, but Path nodes
omit info that won’t be needed during planning, and include info needed for
planning that won’t be needed by the executor.” pathtype is a NodeTag
(T_SeqScan, T_NestLoop, T_HashJoin, …) that tells create_plan which
executable node to build. Join paths are trees: a HashPath for {A B C}
has left and right subpaths for the two inputs, mirroring the eventual Plan.
A Plan is the executable operator node the executor consumes. There is
one Plan tree per statement, built once from the winning Path. The
distinction is not cosmetic: the pathlist of a single joinrel may hold
dozens of competing Paths during the search, but only one becomes a
Plan. Generating Paths instead of Plans during the search is what makes
the O(2^n) enumeration affordable.
flowchart LR
subgraph ROI["RelOptInfo {A B}"]
direction TB
PL["pathlist:"]
P1["HashPath (cost 120)"]
P2["MergePath (cost 140, sorted on x)"]
P3["NestPath (cost 900)"]
CT["cheapest_total_path → P1"]
end
P1 -.->|"survives add_path"| keep1["kept"]
P2 -.->|"kept: better sort order"| keep2["kept"]
P3 -.->|"dominated → pruned"| drop["discarded"]
Figure 2 — One RelOptInfo per relation set, many competing Paths
inside it. add_path keeps each path that is cheapest for some
cost/sort-order/parameterization niche and prunes the dominated ones; the
MergePath survives despite costing more because its sort order on x may
save a sort upstream. set_cheapest then caches the pointers
(cheapest_total_path, etc.). Only the eventual winner becomes a Plan.
Path generation is strictly bottom-up, which is what makes the
principle-of-optimality pruning sound. First, set_base_rel_pathlists
iterates the base relations and dispatches each by kind:
set_rel_pathlist switches on rel->rtekind / relkind to the right
builder — set_plain_rel_pathlist for an ordinary table (seqscan + index
paths + bitmap paths), set_foreign_pathlist for a foreign table,
set_function_pathlist for a function-in-FROM, set_subquery_pathlist
(actually run during set_rel_size) for a subquery, and so on. The content
of those builders — how a seqscan path is costed, how index paths are
chosen — is the subject of postgres-path-generation.md and
postgres-cost-model.md. The spine fact is what set_rel_pathlist does at
the end of every base rel:
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
Every base rel is finalized — its cheapest paths cached — before any join
path is built. Then make_rel_from_joinlist drives the join search. It
turns the joinlist into a list of initial_rels, recursing into any
sub-lists (FULL JOINs and collapse-limit-blocked joins stay nested), and
then dispatches by relation count:
The default standard_join_search is the System-R dynamic program. It
allocates join_rel_level[], seeds level 1 with the base/sub-list rels, and
builds each level from the ones below:
flowchart BT
A["{A}"]
B["{B}"]
C["{C}"]
AB["{A B}"]
BC["{B C}"]
AC["{A C}"]
ABC["{A B C}"]
A --> AB
B --> AB
B --> BC
C --> BC
A --> AC
C --> AC
AB --> ABC
BC --> ABC
AC --> ABC
Figure 3 — Dynamic-programming join search, bottom-up. Level 1 is the base
rels; join_search_one_level builds every 2-item joinrel, then every
3-item joinrel from the level below, up to the single top rel. Each joinrel
is set_cheapest-finalized before any rel that contains it is built — the
property the README calls out: “we will finish constructing all paths for a
given relation before we construct any paths for relations containing that
rel.” The enumeration rules (which pairs, left/right/bushy, legality of
outer-join reordering) live in postgres-join-ordering.md.
The README explains why this ordering matters: because every contained
rel is fully finalized first, the planner “can reliably identify the
cheapest path for each rel before higher-level relations need to know that”
and “can safely discard a path when we find that another path for the same
rel is better, without worrying that maybe there is already a reference to
that path in some higher-level join path.” That is the principle of
optimality made into a memory-management guarantee.
Cheapest-path selection: add_path and set_cheapest
Two functions enforce the pruning discipline. add_path is called every
time a candidate path is generated; it inserts the new path into the rel’s
pathlist and removes any now-dominated paths (and may reject the new
path if it is itself dominated). Domination is multi-dimensional: a path
dominates another only if it is no more expensive on the relevant cost
metric and no worse on sort order (pathkeys) and no more
parameterized. That is why the MergePath in Figure 2 survives a cheaper
HashPath — its sort order is a dimension the hash path loses on.
Once all paths for a rel are added, set_cheapest walks the surviving
pathlist once and caches the winners into the RelOptInfo:
Note the tie-break: when two unparameterized paths cost the same,
set_cheapest keeps the better-sorted one, again preserving sort order as a
first-class property. The result is the cheapest_total_path,
cheapest_startup_path, and cheapest_parameterized_paths that higher
levels (and ultimately get_cheapest_fractional_path at the top) read. The
cost formulas behind compare_path_costs are postgres-cost-model.md’s
subject; the spine fact is that pruning happens continuously during the
search (add_path) and the winners are cached per niche (set_cheapest).
Phase 4 — create_plan: the winning Path becomes a Plan
After subquery_planner returns, standard_planner has the final
RelOptInfo and picks best_path. create_plan runs once to translate the
one winning Path tree into the executable Plan tree:
plan =create_plan_recurse(root, best_path, CP_EXACT_TLIST);
// ... condensed: label top tlist with original column names,
// attach initPlans, check NestLoopParams ...
return plan;
create_plan_recurse switches on best_path->pathtype and dispatches to
the matching builder, recursing into subpaths first (so children are built
before parents):
create_scan_plan shows the Path→Plan translation concretely: it pulls the
restriction clauses out of the parent RelOptInfo (baserestrictinfo, or
the index’s indrestrictinfo for an index scan), appends parameterized
join clauses if the path is parameterized, decides the output tlist, and
builds the executable scan node:
if (best_path->param_info) /* parameterized: add join clauses */
scan_clauses =list_concat_copy(scan_clauses,
best_path->param_info->ppi_clauses);
// ... choose tlist (physical vs. needed), build the scan Plan node ...
The per-pathtype builder bodies — how a NestPath becomes a NestLoop,
how parameterized scans wire up nestloop params, how gating Result nodes
get inserted for pseudoconstant quals — are the subject of
postgres-plan-creation.md. The spine fact is that create_plan is a
deterministic structural walk: no cost decisions remain, because the
search already chose the winner. The Plan tree it emits is what
postgres-executor.md picks up.
flowchart TB
BP["best_path (winning Path tree)"]
CPR["create_plan_recurse<br/>(switch on pathtype)"]
SCAN["create_scan_plan<br/>(SeqScan, IndexScan, ...)"]
JOIN["create_join_plan<br/>(NestLoop, MergeJoin, HashJoin)"]
OTHER["create_append/sort/agg/limit/...plan"]
PLAN["Plan tree (executable)"]
BP --> CPR
CPR --> SCAN
CPR --> JOIN
CPR --> OTHER
CPR -.->|"recurse into subpaths first"| CPR
SCAN --> PLAN
JOIN --> PLAN
OTHER --> PLAN
Figure 4 — create_plan as a top-down structural walk of the one winning
Path tree. create_plan_recurse dispatches on pathtype, recursing into
child subpaths before building each parent node, so the executable Plan
tree is assembled bottom-up from a single chosen Path tree. No costing
happens here — the search is over.
The optimizer is large (the six files cited here total ~29,000 lines), but
the spine — entry, the three phases, path generation, pruning, emission —
is a small set of stable functions. They are grouped below by the call-flow
the figures above trace. The per-AM path builders, the join enumeration
rules, the cost formulas, and the per-node Path→Plan bodies live in the
sibling docs; what follows is the skeleton that ties them together.
standard_planner — the real entry point behind the planner hook
wrapper. Builds the per-invocation PlannerGlobal, computes
tuple_fraction from cursorOptions, calls subquery_planner for the
top Query, then does the one-shot emission: fetch_upper_rel(root, UPPERREL_FINAL, NULL) → get_cheapest_fractional_path → create_plan.
Everything after that (set_plan_references, building the PlannedStmt)
is post-processing, not search.
get_cheapest_fractional_path — picks the winner from the final rel’s
pathlist, honoring tuple_fraction: a plan that starts fast can beat a
plan that is cheaper end-to-end when the caller will fetch only part of
the result (a DECLARE CURSOR, a LIMIT). This is the reason the search
keeps cheapest_startup_path alongside cheapest_total_path.
subquery_planner — invoked once per Query node, recursing for
sub-Queries. Allocates the PlannerInfo (root), pulls up sublinks and
simple subqueries (pull_up_sublinks, pull_up_subqueries), runs
preprocess_expression over targetlist and quals, reduces outer joins
(reduce_outer_joins), removes useless result RTEs, then calls
grouping_planner and finishes with SS_charge_for_initplans +
set_cheapest(final_rel). The mechanism of pull-up and qual
canonicalization is postgres preprocessing material; the boundary fact
is that the Query is normalized before any cost-based work begins.
grouping_planner — plans everything above the scan/join core. Calls
query_planner to get current_rel (the best scan/join paths), builds
the final PathTarget, then stacks upper operations as new upper rels:
create_grouping_paths (GROUP BY / aggregation, both hashed and
sorted variants), create_window_paths (window functions),
create_distinct_paths (DISTINCT), create_ordered_paths (the
final ORDER BY), plus LIMIT and, for DML, the ModifyTable path. The
current_rel = create_<op>_paths(root, current_rel, ...) idiom is the
through-line: each step consumes a rel’s pathlist and emits a new rel with
one more operation stacked on, ending at UPPERREL_FINAL.
Phase 3 — the scan-and-join core — planmain.c + allpaths.c
query_planner (planmain.c) — sets up the cost-based search but does
not choose a single path. Builds base-rel RelOptInfos
(add_base_rels_to_query), distributes quals and builds
EquivalenceClasses (deconstruct_jointree,
reconsider_outer_join_clauses, generate_base_implied_equalities), then
— only once ECs are final — invokes the qp_callback to build canonicalPathKeys, and finally calls make_one_rel. The EC-before-pathkeys
ordering is mandated by the README’s “Order of processing for
EquivalenceClasses and PathKeys”.
make_one_rel (allpaths.c) — the two-pass generator:
set_base_rel_sizes → set_base_rel_pathlists (all per-table access
paths) → make_rel_from_joinlist (all join paths). Returns the single top
joinrel covering all_query_rels.
set_base_rel_pathlists / set_rel_pathlist — iterate base rels and
dispatch each by rtekind/relkind to the right builder
(set_plain_rel_pathlist, set_foreign_pathlist,
set_function_pathlist, …). set_rel_pathlist ends every base rel with
the set_rel_pathlist_hook extension point, optional
generate_useful_gather_paths (parallel), and set_cheapest(rel) — so
every base rel is finalized before any join consumes it.
make_rel_from_joinlist — turns the joinlist into initial_rels and
dispatches by relation count: a single item returns directly; otherwise it
honors join_search_hook, falls to geqo when enable_geqo && levels_needed >= geqo_threshold, else standard_join_search.
standard_join_search — the System-R dynamic program. Allocates
join_rel_level[], seeds level 1 with initial_rels, and for each level
lev from 2 up calls join_search_one_level(root, lev) then
generate_partitionwise_join_paths, optional
generate_useful_gather_paths, and set_cheapest on each just-built
joinrel. The level-by-level loop is what guarantees every contained rel is
finalized before any rel containing it — the principle-of-optimality
memory guarantee. The enumeration rules inside join_search_one_level
are postgres-join-ordering.md.
add_path — called on every generated path. Inserts it into the rel’s
pathlist and removes paths it now dominates (or rejects the newcomer if
dominated). Domination is multi-dimensional: cheaper-or-equal on the cost
metric and no worse on pathkeysand no more parameterized. This
is why a costlier but better-sorted path survives.
add_path_precheck — the cheap pre-test a builder calls before
spending effort to construct a path it would only discard.
set_cheapest — after all paths are added, walks the surviving
pathlist once and caches cheapest_startup_path, cheapest_total_path,
and cheapest_parameterized_paths, with a sort-order tie-break among
equal-cost unparameterized paths.
create_plan — runs once on the chosen Path. Calls
create_plan_recurse(root, best_path, CP_EXACT_TLIST), then labels the top
tlist with the query’s output column names and attaches initPlans.
create_plan_recurse — switches on best_path->pathtype and dispatches
to create_scan_plan (all scan tags), create_join_plan (NestLoop /
MergeJoin / HashJoin), create_append_plan, and the upper-node builders
(Result, Sort, Agg, Limit, ModifyTable, …), recursing into subpaths first.
create_scan_plan — the concrete Path→Plan translation for a scan:
pulls the restriction clauses from the parent rel (baserestrictinfo, or
the index’s indrestrictinfo), appends ppi_clauses if parameterized,
chooses the output tlist, and builds the executable scan node. The
per-node bodies are postgres-plan-creation.md; the spine fact is that no
cost decision remains here — the walk is purely structural.
All checks below were made against /data/hgryoo/references/postgres at
branch REL_18_STABLE, commit 273fe94852b3a7e34fd171e8abdf1481beb302fa
(PostgreSQL 18.x, committed 2026-06-05).
standard_planner separates search from emission with three distinct
statements. Verified at planner.c lines 456–459:
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL),
best_path = get_cheapest_fractional_path(final_rel, tuple_fraction),
top_plan = create_plan(root, best_path). The Path search and the single
create_plan call are textually separate; emission consumes one winner.
subquery_planner calls grouping_planner and then set_cheapest on
the final upper rel. Verified by reading subquery_planner
(planner.c line 669): after grouping_planner, it fetches
UPPERREL_FINAL, charges init-plans, and calls set_cheapest(final_rel)
before returning root. Confirms that path selection at this level is
deferred to the caller via the cached cheapest pointers.
make_one_rel is genuinely two-pass. Verified at allpaths.c line
171: set_base_rel_sizes then set_base_rel_pathlists then
make_rel_from_joinlist, with Assert(bms_equal(rel->relids, root->all_query_rels)) on the result. All base-rel access paths exist
before any join path is built.
The DP loop finalizes each level before building the next. Verified by
reading standard_join_search (allpaths.c line 3457): the for (lev = 2; lev <= levels_needed; lev++) loop calls join_search_one_level, then
per joinrel runs generate_partitionwise_join_paths,
generate_useful_gather_paths (except for the topmost rel), and
set_cheapest. This is the principle-of-optimality memory guarantee the
README describes.
GEQO is gated by enable_geqo && levels_needed >= geqo_threshold.
Verified at allpaths.c (the make_rel_from_joinlist dispatch, line
3352). geqo_threshold is a real GUC declared at allpaths.c line 80 and
registered in guc_tables.c (line 2223); its shipped default is 12.
Below the threshold (and absent a join_search_hook),
standard_join_search runs.
The README states the Path↔Plan correspondence verbatim. Verified at
src/backend/optimizer/README line 23: “There is pretty nearly a
one-to-one correspondence between the [Path and Plan trees], but Path
nodes omit info that won’t be needed during planning, and include info
needed for planning that won’t be needed by the executor.” This is the
authority for the RelOptInfo/Path/Plan framing in §“PostgreSQL’s Approach”.
The EC-before-PathKey ordering is documented, not incidental. Verified
at README “Order of processing for EquivalenceClasses and PathKeys”
(line 1026): ECs are built during deconstruct_jointree /
reconsider_outer_join_clauses; only after no further merging is possible
are they “canonical” and PathKeys built. This is why query_planner
invokes the qp_callbackaftergenerate_base_implied_equalities.
struct Path carries only planning-relevant fields, not execution
detail. Verified at pathnodes.h line 1753: the base struct holds
pathtype, parent, pathtarget, param_info, parallel flags, rows,
disabled_nodes, startup_cost, total_cost, pathkeys — and no
executor state. RelOptInfo (line 883) holds the pathlist,
partial_pathlist, and the four cheapest_* slots quoted in the body.
disabled_nodes vs. legacy disable_cost in path comparison. The
Path struct (line 1753) carries an integer disabled_nodes count, and
compare_path_costs_fuzzily weighs it ahead of cost. The exact tie-break
ordering between disabled_nodes, startup_cost, and total_cost inside
add_path / set_cheapest was not fully traced here. Investigation path:
read compare_path_costs_fuzzily and compare_path_costs in
pathnode.c and confirm disabled_nodes is strictly dominant. (Belongs
to postgres-cost-model.md.)
Upper-rel parameterization. Whether any UPPERREL_* rel ever carries
parameterized paths (vs. only scan/join rels) was not verified. The body
treats parameterization as a scan/join-level property. Investigation
path: grep create_*_paths in planner.c for param_info usage on
upper paths.
Exact set of pathtype tags reaching create_plan_recurse. The body
lists the major cases; the full switch in create_plan_recurse
(createplan.c line 388) has more (e.g., T_TidScan,
T_TableFuncScan, T_NamedTuplestoreScan, T_CustomScan). The complete
enumeration is deferred to postgres-plan-creation.md.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
System R / Selinger (SIGMOD 1979).Access Path Selection in a
Relational Database Management System is the direct ancestor of this
optimizer — bottom-up DP over relation subsets, interesting orders, a
catalog-statistics cost model. Captured at
research/dbms-papers/systemr-optimizer.md. PostgreSQL’s Path,
pathkeys, add_path, and standard_join_search are the line-for-line
descendants; the one structural addition is the explicit Path
abstraction interposed between the search and the executable plan.
Fong, The Design and Implementation of the POSTGRES Query Optimizer
(UCB MS thesis). Documents how the System-R cost model was adapted into
the Berkeley POSTGRES codebase whose descendant is today’s
src/backend/optimizer/. The naming lineage (RelOptInfo, Path,
RestrictInfo) traces here. A focused read would pin down which modern
behaviors are original POSTGRES versus later additions.
Cascades / Volcano (Graefe). The dominant alternative to bottom-up DP:
a top-down, memoized, rule-driven search over a memo of equivalence
groups, used by SQL Server and Apache Calcite. PostgreSQL’s RelOptInfo
is the analogue of a memo group, but the search is bottom-up DP rather
than top-down transformation rules. A side-by-side note would clarify what
PostgreSQL trades by not having a transformation-rule engine (harder to
add new algebraic rewrites; simpler, more predictable search).
GEQO — the genetic large-join escape hatch. Above geqo_threshold
(default 12) PostgreSQL abandons exhaustive DP for a genetic algorithm
(geqo). The classic comparison target is the literature on randomized
join enumeration (Ioannidis & Kang, simulated annealing / iterative
improvement). A dedicated note in postgres-join-ordering.md should
measure plan quality vs. planning time at the crossover.
Adaptive / learned query optimization. Modern research (Bao,
Neo, Balsa; adaptive execution in Spark/Presto) replaces or augments the
static cost model with runtime feedback or learned cost estimators.
PostgreSQL’s optimizer is fully static — it commits to one plan before
execution. The Path/cost split is exactly the seam where a learned cost
estimator would be injected; this is the natural frontier note once
postgres-cost-model.md lands.
src/backend/optimizer/README — the optimizer’s own design document
(Path↔Plan correspondence; EC/PathKey processing order; the
“finish all paths for a rel before any containing rel” guarantee).
Theory and papers:
Selinger et al., Access Path Selection in a Relational Database
Management System (SIGMOD 1979) — research/dbms-papers/systemr-optimizer.md.
Fong, The Design and Implementation of the POSTGRES Query Optimizer
(UC Berkeley MS thesis).