Skip to content

PostgreSQL Planner — From Query Tree to Plan: Paths, Cost, and the System-R Lineage

Contents:

A query optimizer answers one question: given a declarative SQL statement that says what result is wanted, which of the exponentially many physical execution strategies that all compute that result should we run? The relational model guarantees that many physically distinct plans are logically equivalent — a join can be a nested loop, a sort-merge, or a hash join; a table can be reached by sequential scan or by any of its indexes; a three-way join can be associated and commuted into many shapes. Database System Concepts (Silberschatz, 7e, ch. 16 “Query Optimization”) frames this as two coupled sub-problems: plan-space enumeration (which equivalent plans to consider) and cost estimation (a numeric model that ranks them so we can pick one without running them). Database Internals (Petrov, ch. 7) draws the same boundary between a rule-based rewrite stage that applies always-good transformations and a cost-based stage that searches a space of alternatives under a cost model.

The canonical realization of cost-based optimization is the System R optimizer, described in Selinger et al., “Access Path Selection in a Relational Database Management System” (SIGMOD 1979, captured at research/dbms-papers/systemr-optimizer.md). System R established the template that PostgreSQL still follows almost line for line:

  1. Enumerate access paths per table. For each relation, consider the sequential scan and every applicable index, and estimate the cost of each using catalog statistics.
  2. Estimate selectivity. Each predicate has a selectivity — the fraction of rows it passes — and the product of selectivities estimates how many rows survive. Cardinality estimates drive every downstream cost.
  3. Enumerate join orders by dynamic programming. Rather than evaluate all n! orderings, System R builds the best plan for every subset of relations bottom-up: best 1-table plans, then best 2-table plans built from them, then 3-table, up to the full set. The key pruning rule is the principle of optimality — the cheapest plan for a set of relations is built from cheapest plans for its subsets — which collapses a factorial search to roughly 2^n.
  4. Carry “interesting orders.” A plan that is more expensive but emits rows already sorted on a column needed later (a merge-join key, an ORDER BY, a GROUP BY) may still win overall by avoiding a later sort. System R therefore does not keep only the single cheapest plan per relation set; it keeps the cheapest plan for each interesting sort order plus the cheapest unordered plan. This is the seed of PostgreSQL’s pathkeys.

Selinger’s design space leaves several knobs that every cost-based optimizer must set, and these are the choices to watch as the rest of this document unfolds:

  • What is the unit of the search? System R searches over plans directly. PostgreSQL interposes an intermediate abstraction — the Path — a lightweight plan sketch carrying just cost and sort order, cheaper to generate and discard than a full executable plan.
  • How exhaustive is the join search? Pure DP is exponential in the number of relations. Every engine needs a fallback for large joins (PostgreSQL’s is the genetic optimizer, GEQO).
  • How are equivalent plans pruned during the search? The pruning rule decides which “almost as good” plans survive to compete one level up.

The Berkeley POSTGRES project re-implemented this template with its own extensions; Fong’s MS thesis “The Design and Implementation of the POSTGRES Query Optimizer” (UCB) documents how the System-R cost model was adapted into the codebase whose direct descendant is the modern src/backend/optimizer/ tree. The vocabulary — RelOptInfo, Path, pathkeys, add_path — traces straight back to that work.

Scope. This document is the spine and the map of that optimizer: the three-phase control flow, the RelOptInfo/Path/Plan distinction, bottom-up path generation, cheapest-path selection, and Path→Plan conversion. The mechanism of each major piece is deferred to a sibling doc: the per-AM path builders to postgres-path-generation.md, the join-order DP and join_search internals to postgres-join-ordering.md, the selectivity and cost formulas to postgres-cost-model.md, the per-node Path→Plan translation detail to postgres-plan-creation.md, and what the executor does with the finished Plan to postgres-executor.md.

Between “the System R template” and “the actual PostgreSQL symbols” sits a layer of engineering conventions that nearly every serious cost-based optimizer adopts. Naming them first means the PostgreSQL symbols in the next section read as instances of known patterns rather than inventions.

Three logical stages: rewrite, plan-space search, physical plan emission

Section titled “Three logical stages: rewrite, plan-space search, physical plan emission”

Optimizers are almost universally staged. A rewrite / normalization stage applies transformations that are (almost) always beneficial and need no cost comparison: flattening nested AND/OR, folding constant expressions, pulling simple subqueries up into the parent so they join in the same search, pushing predicates down toward scans. A cost-based search stage then enumerates genuinely alternative physical strategies and ranks them. Finally an emission stage materializes the single winner into whatever the execution engine consumes. Conflating these stages is a classic source of bugs: a “rewrite” that is only sometimes beneficial belongs in the cost-based stage instead.

A lightweight plan abstraction distinct from the executable plan

Section titled “A lightweight plan abstraction distinct from the executable plan”

Generating a full executable operator tree for every candidate is wasteful when most candidates will be discarded. The standard fix is a two-level representation: a cheap, planning-only node that carries exactly the information the search needs (estimated cost, output cardinality, output sort order, required parameters) and omits execution detail, plus a heavier executable node built only for the chosen plan. SQL Server’s optimizer has its Group/GroupExpression memo; Calcite has RelNode with RelTraits; PostgreSQL has Path versus Plan. The discipline is the same: search over the cheap thing, build the expensive thing once.

One canonical container per relation, holding many candidate plans

Section titled “One canonical container per relation, holding many candidate plans”

The DP search needs a place to accumulate “all the ways to produce this set of relations” and to record the winner(s) so higher levels can reuse them. The convention is a per-relation-set structure (System R’s “best plans” table entry; the memo group in Cascades-style optimizers) that owns a list of candidate plans and a cached pointer to the cheapest. Crucially there is exactly one such container per logically distinct relation set, so that plans arriving from different sub-derivations of the same set compete against each other in one place.

Pruning by cost and by interesting physical properties

Section titled “Pruning by cost and by interesting physical properties”

If the search kept only the single cheapest plan per relation set, it would throw away plans whose sort order saves a sort upstream. The universal refinement is to prune along all dimensions that matter to a consumer: keep the cheapest plan for each distinct physical property combination (sort order, and in PostgreSQL also parameterization). A plan is discarded only if some surviving plan is at least as cheap and at least as good on every physical property. This is “interesting orders” generalized.

Bottom-up dynamic programming with a large-query escape hatch

Section titled “Bottom-up dynamic programming with a large-query escape hatch”

The DP join search is O(2^n) in time and space — fine for a dozen relations, ruinous for fifty. Every DP optimizer therefore pairs it with a heuristic search (greedy, randomized, or genetic) that trades optimality for tractability above a threshold. The optimizer picks the strategy by relation count.

Theory / conventionPostgreSQL name
Rewrite/normalize stagesubquery_planner (pull-up, canonicalize, const-fold)
Upper logical operations (group/window/sort/limit)grouping_planner
Scan-and-join cost-based corequery_plannermake_one_rel
Per-relation-set “best plans” containerRelOptInfo (one per base rel and per joinrel)
Lightweight planning-only plan sketchPath (and subtypes: IndexPath, NestPath, HashPath, …)
Executable operator nodePlan (and subtypes: SeqScan, HashJoin, …)
Estimated output cardinalityRelOptInfo.rows, Path.rows
Cost of a candidatePath.startup_cost, Path.total_cost
Interesting sort orderPath.pathkeys (list of PathKey)
Add-and-prune a candidateadd_path
Pick winner(s) per nicheset_cheapest (fills cheapest_total_path etc.)
DP join enumerationstandard_join_search / join_search_one_level
Large-query escape hatchgeqo (genetic optimizer, above geqo_threshold)
Path → Plan emissioncreate_plan / create_plan_recurse

The single most useful mental model of the PostgreSQL optimizer is a stack of three functions, each handling one logical layer and delegating the rest inward. The file comment in planmain.c states the division of labor exactly: planner.c deals with “subselects, inheritance, aggregates, grouping, and so on,” while planmain.c is “the main code for planning a basic join operation, shorn of” those features.

flowchart TB
  P["planner / standard_planner<br/>(per-statement entry; sets up PlannerGlobal)"]
  SQP["subquery_planner<br/>(one per Query level: pull-up, canonicalize,<br/>const-fold, outer-join reduction)"]
  GP["grouping_planner<br/>(upper ops: set-ops, GROUP/agg, window,<br/>DISTINCT, ORDER BY, LIMIT, ModifyTable)"]
  QP["query_planner<br/>(scan+join core setup: base rels, ECs,<br/>qual distribution, pathkeys)"]
  MOR["make_one_rel<br/>(generate all scan+join Paths bottom-up)"]
  CP["create_plan<br/>(turn the one cheapest Path into a Plan)"]

  P --> SQP
  SQP --> GP
  GP --> QP
  QP --> MOR
  MOR -.->|"returns final joinrel<br/>with cheapest paths"| GP
  GP -.->|"layers upper paths onto<br/>UPPERREL_FINAL"| SQP
  SQP -.->|"returns PlannerInfo root"| P
  P --> CP
  CP -.->|"PlannedStmt"| P

Figure 1 — The three nested planning phases plus emission. planner is the external entry point; it calls subquery_planner once per Query level (recursing for sub-Queries), which calls grouping_planner for the upper operations, which calls query_planner/make_one_rel for the scan-and-join core. Paths flow back up; only after the whole Path tree exists does planner call create_plan once to emit the executable Plan.

The external entry point is thin. planner is just a hook wrapper; the real work starts in standard_planner, which sets up the per-invocation PlannerGlobal, decides whether parallelism is feasible, derives the tuple_fraction (how much of the result the caller is likely to consume), and then calls subquery_planner. After planning returns, it does the Path→Plan conversion:

// standard_planner — src/backend/optimizer/plan/planner.c
/* primary planning entry point (may recurse for subqueries) */
root = subquery_planner(glob, parse, NULL, false, tuple_fraction, NULL);
/* Select best Path and turn it into a Plan */
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
top_plan = create_plan(root, best_path);

Three things to notice. First, planning and emission are cleanly separated: the entire search produces a RelOptInfo (final_rel) full of competing Paths; create_plan runs once, on the single winner. Second, the winner is chosen by get_cheapest_fractional_path, which respects the tuple_fraction — a cursor that will fetch only a few rows can prefer a fast-start plan over a cheaper-overall one. Third, final_rel is fetched from the upper relations registry (UPPERREL_FINAL), not from the join search directly; the upper rels are where grouping_planner stacks the post-join operations.

Phase 1 — subquery_planner: preprocess one Query level

Section titled “Phase 1 — subquery_planner: preprocess one Query level”

subquery_planner is invoked once per Query node and recurses for sub-Queries. It is the rewrite/normalize stage: everything here is a transformation applied unconditionally or by a local heuristic, not a cost-based choice. The optimizer README lists the work precisely:

-subquery_planner()
pull up sublinks and subqueries from rangetable, if possible
canonicalize qual ...
simplify constant expressions
process sublinks
convert Vars of outer query levels into Params

In source terms, subquery_planner allocates a fresh PlannerInfo (called root throughout the optimizer — the per-Query planning context), pulls up sublinks and simple subqueries into the parent jointree, runs preprocess_expression over the target list and quals (flatten join alias Vars, canonicalize the qual into CNF-ish form, const-fold), reduces outer joins to inner joins where provably safe, and removes useless RTE_RESULT relations. Then it hands off to the upper planner and finalizes:

// subquery_planner — src/backend/optimizer/plan/planner.c
/* Do the main planning. */
grouping_planner(root, tuple_fraction, setops);
// ... condensed: SS_identify_outer_params, init-plan charging ...
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
SS_charge_for_initplans(root, final_rel);
/* Make sure we've identified the cheapest Path for the final rel. */
set_cheapest(final_rel);
return root;

The mechanism of pull-up, qual canonicalization, and constant folding is the subject of the preprocessing doc; here the load-bearing fact is the boundary: by the time grouping_planner runs, the Query for this level has been normalized into a form where subqueries that can participate in the parent’s join search already have, and quals are in a canonical shape the cost-based search can place precisely.

Phase 2 — grouping_planner: the upper operations

Section titled “Phase 2 — grouping_planner: the upper operations”

grouping_planner plans everything that sits above the scan-and-join core: set operations (UNION/INTERSECT/EXCEPT), grouping and aggregation, window functions, DISTINCT, ORDER BY, LIMIT/OFFSET, and for non-SELECT statements the ModifyTable node. Its structure is a pipeline: get the best scan/join paths from query_planner, then for each upper operation that the query needs, build a new upper RelOptInfo whose paths are the lower paths with the operation’s path node stacked on top.

// grouping_planner — src/backend/optimizer/plan/planner.c
/*
* Generate the best unsorted and presorted paths for the scan/join
* portion of this Query, ie the processing represented by the
* FROM/WHERE clauses.
*/
current_rel = query_planner(root, standard_qp_callback, &qp_extra);
// ... condensed: build final_target (PathTarget), apply scan/join target ...
if (have_grouping)
current_rel = create_grouping_paths(root, current_rel, ...);
if (activeWindows)
current_rel = create_window_paths(root, current_rel, ...);
// ... DISTINCT, ORDER BY, LIMIT, ModifyTable each stack their own paths ...

The repeating shape — current_rel = create_<op>_paths(root, current_rel, ...) — is the upper-planner idiom. Each step consumes the previous rel’s paths and produces a new rel whose paths add one logical operation. Because each step keeps a set of paths (via add_path), the planner can, for example, carry both a hash-aggregate and a sort-then-group implementation of GROUP BY forward and let the eventual ORDER BY step decide which is cheaper end-to-end (a sort-based group may avoid a later sort). The detail of each create_*_paths builder belongs to the plan-creation and cost docs; the spine fact is that grouping_planner is where logical SQL clauses become stacked physical path alternatives on a chain of upper rels, ending at UPPERREL_FINAL.

Phase 3 — query_planner + make_one_rel: the scan-and-join core

Section titled “Phase 3 — query_planner + make_one_rel: the scan-and-join core”

query_planner (in planmain.c) is the heart of the System-R core. It does not select a single best path — it cannot, because grouping_planner above it still has sort/group/limit decisions that affect which scan/join path is best overall. Instead it returns the final join RelOptInfo with all its surviving paths and lets the caller choose. Its own job is everything needed to set up the cost-based search:

// query_planner — src/backend/optimizer/plan/planmain.c
/* Construct RelOptInfo nodes for all base relations used in the query. */
add_base_rels_to_query(root, (Node *) parse->jointree);
// ... condensed ...
build_base_rel_tlists(root, root->processed_tlist);
find_placeholders_in_jointree(root);
find_lateral_references(root);
joinlist = deconstruct_jointree(root); /* split quals; build ECs */
reconsider_outer_join_clauses(root);
generate_base_implied_equalities(root);
(*qp_callback) (root, qp_extra); /* now build canonical pathkeys */
// ... join removal, placeholder distribution, appendrel expansion ...
/* Ready to do the primary planning. */
final_rel = make_one_rel(root, joinlist);
if (!final_rel || !final_rel->cheapest_total_path || ...)
elog(ERROR, "failed to construct the join relation");
return final_rel;

The ordering here is load-bearing and reflects the README’s “Order of processing for EquivalenceClasses and PathKeys”: deconstruct_jointree scans the quals and builds EquivalenceClasses from mergejoinable equality clauses (a = b); only after all ECs are known (no more merging possible) does the qp_callback build canonical PathKeys for the query’s ORDER BY/GROUP BY. Pathkeys must be canonical before path generation so that two paths with the same sort order share the same pathkey list and can be compared by pointer. There is also a fast path at the top of query_planner for the trivial SELECT expr case (a single RTE_RESULT), which builds one GroupResultPath and returns without the full machinery.

make_one_rel (in allpaths.c) is the two-step generator:

// make_one_rel — src/backend/optimizer/path/allpaths.c
set_base_rel_sizes(root); /* size estimates per base rel */
// ... compute total_table_pages ...
/* Generate access paths for each base rel. */
set_base_rel_pathlists(root);
/* Generate access paths for the entire join tree. */
rel = make_rel_from_joinlist(root, joinlist);
Assert(bms_equal(rel->relids, root->all_query_rels));
return rel;

So the scan-and-join core is genuinely two passes: build all the per-table access paths first, then build all the join paths on top of them.

The data-structure trinity: RelOptInfo, Path, Plan

Section titled “The data-structure trinity: RelOptInfo, Path, Plan”

Everything above flows through three node types whose distinction is the conceptual core of the optimizer.

A RelOptInfo represents one relation the planner reasons about — either a base relation (a table, a subquery-in-FROM, a function-in-FROM) or a join relation (a combination of base rels). The README’s invariant is that there is exactly one RelOptInfo per distinct set of base rels: the join {A B C} has one RelOptInfo no matter whether it was built as (A⋈B)⋈C or A⋈(B⋈C). Those alternative derivations are different Paths that compete inside the same RelOptInfo. The struct carries the candidate paths and the cached winners:

// struct RelOptInfo — src/include/nodes/pathnodes.h
typedef struct RelOptInfo
{
// ... condensed ...
Relids relids; /* set of base + OJ relids in this rel */
Cardinality rows; /* estimated number of result tuples */
struct PathTarget *reltarget;/* columns this rel must output */
List *pathlist; /* Path structures */
List *partial_pathlist; /* partial (parallel) Paths */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
struct Path *cheapest_unique_path;
List *cheapest_parameterized_paths;
Index relid; /* (base rels only) rangetable index */
RTEKind rtekind; /* RELATION, SUBQUERY, FUNCTION, ... */
List *indexlist; /* IndexOptInfo list, for index paths */
BlockNumber pages;
Cardinality tuples; /* size estimates from pg_class */
PlannerInfo *subroot; /* if a subquery: its own planner context */
// ... condensed ...
} RelOptInfo;

A Path is one way to produce the rows of its parent RelOptInfo. It is deliberately lightweight — the planning-only sketch from the Common Design section. Its base struct carries only what the search needs:

// struct Path — src/include/nodes/pathnodes.h
typedef struct Path
{
NodeTag type;
NodeTag pathtype; /* tag identifying scan/join method */
RelOptInfo *parent; /* the relation this path can build */
PathTarget *pathtarget; /* list of Vars/Exprs, cost, width */
ParamPathInfo *param_info; /* parameterization info, or NULL */
bool parallel_aware;
bool parallel_safe;
int parallel_workers;
Cardinality rows; /* estimated number of result tuples */
int disabled_nodes;/* count of disabled-by-GUC nodes */
Cost startup_cost; /* cost before fetching any tuples */
Cost total_cost; /* total cost, all tuples fetched */
List *pathkeys; /* sort ordering of path's output */
} Path;

The README states the relationship to Plan exactly: “There is pretty nearly a one-to-one correspondence between the Path and Plan trees, but Path nodes omit info that won’t be needed during planning, and include info needed for planning that won’t be needed by the executor.” pathtype is a NodeTag (T_SeqScan, T_NestLoop, T_HashJoin, …) that tells create_plan which executable node to build. Join paths are trees: a HashPath for {A B C} has left and right subpaths for the two inputs, mirroring the eventual Plan.

A Plan is the executable operator node the executor consumes. There is one Plan tree per statement, built once from the winning Path. The distinction is not cosmetic: the pathlist of a single joinrel may hold dozens of competing Paths during the search, but only one becomes a Plan. Generating Paths instead of Plans during the search is what makes the O(2^n) enumeration affordable.

flowchart LR
  subgraph ROI["RelOptInfo {A B}"]
    direction TB
    PL["pathlist:"]
    P1["HashPath (cost 120)"]
    P2["MergePath (cost 140, sorted on x)"]
    P3["NestPath (cost 900)"]
    CT["cheapest_total_path → P1"]
  end
  P1 -.->|"survives add_path"| keep1["kept"]
  P2 -.->|"kept: better sort order"| keep2["kept"]
  P3 -.->|"dominated → pruned"| drop["discarded"]

Figure 2 — One RelOptInfo per relation set, many competing Paths inside it. add_path keeps each path that is cheapest for some cost/sort-order/parameterization niche and prunes the dominated ones; the MergePath survives despite costing more because its sort order on x may save a sort upstream. set_cheapest then caches the pointers (cheapest_total_path, etc.). Only the eventual winner becomes a Plan.

Path generation is strictly bottom-up, which is what makes the principle-of-optimality pruning sound. First, set_base_rel_pathlists iterates the base relations and dispatches each by kind:

// set_base_rel_pathlists — src/backend/optimizer/path/allpaths.c
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
RelOptInfo *rel = root->simple_rel_array[rti];
if (rel == NULL) continue; /* empty slot */
if (rel->reloptkind != RELOPT_BASEREL) /* skip "other rels" */
continue;
set_rel_pathlist(root, rel, rti, root->simple_rte_array[rti]);
}

set_rel_pathlist switches on rel->rtekind / relkind to the right builder — set_plain_rel_pathlist for an ordinary table (seqscan + index paths + bitmap paths), set_foreign_pathlist for a foreign table, set_function_pathlist for a function-in-FROM, set_subquery_pathlist (actually run during set_rel_size) for a subquery, and so on. The content of those builders — how a seqscan path is costed, how index paths are chosen — is the subject of postgres-path-generation.md and postgres-cost-model.md. The spine fact is what set_rel_pathlist does at the end of every base rel:

// set_rel_pathlist (tail) — src/backend/optimizer/path/allpaths.c
if (set_rel_pathlist_hook)
(*set_rel_pathlist_hook) (root, rel, rti, rte); /* extension point */
if (rel->reloptkind == RELOPT_BASEREL &&
!bms_equal(rel->relids, root->all_query_rels))
generate_useful_gather_paths(root, rel, false); /* parallel paths */
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);

Every base rel is finalized — its cheapest paths cached — before any join path is built. Then make_rel_from_joinlist drives the join search. It turns the joinlist into a list of initial_rels, recursing into any sub-lists (FULL JOINs and collapse-limit-blocked joins stay nested), and then dispatches by relation count:

// make_rel_from_joinlist — src/backend/optimizer/path/allpaths.c
if (levels_needed == 1)
return (RelOptInfo *) linitial(initial_rels); /* single item, done */
else
{
root->initial_rels = initial_rels;
if (join_search_hook) /* extension override */
return (*join_search_hook) (root, levels_needed, initial_rels);
else if (enable_geqo && levels_needed >= geqo_threshold)
return geqo(root, levels_needed, initial_rels); /* genetic search */
else
return standard_join_search(root, levels_needed, initial_rels);
}

The default standard_join_search is the System-R dynamic program. It allocates join_rel_level[], seeds level 1 with the base/sub-list rels, and builds each level from the ones below:

// standard_join_search — src/backend/optimizer/path/allpaths.c
root->join_rel_level = palloc0((levels_needed + 1) * sizeof(List *));
root->join_rel_level[1] = initial_rels;
for (lev = 2; lev <= levels_needed; lev++)
{
join_search_one_level(root, lev); /* build all lev-item joinrels */
foreach(lc, root->join_rel_level[lev])
{
rel = (RelOptInfo *) lfirst(lc);
generate_partitionwise_join_paths(root, rel);
if (!bms_equal(rel->relids, root->all_query_rels))
generate_useful_gather_paths(root, rel, false);
set_cheapest(rel); /* finalize this joinrel */
}
}
Assert(list_length(root->join_rel_level[levels_needed]) == 1);
return (RelOptInfo *) linitial(root->join_rel_level[levels_needed]);
flowchart BT
  A["{A}"]
  B["{B}"]
  C["{C}"]
  AB["{A B}"]
  BC["{B C}"]
  AC["{A C}"]
  ABC["{A B C}"]
  A --> AB
  B --> AB
  B --> BC
  C --> BC
  A --> AC
  C --> AC
  AB --> ABC
  BC --> ABC
  AC --> ABC

Figure 3 — Dynamic-programming join search, bottom-up. Level 1 is the base rels; join_search_one_level builds every 2-item joinrel, then every 3-item joinrel from the level below, up to the single top rel. Each joinrel is set_cheapest-finalized before any rel that contains it is built — the property the README calls out: “we will finish constructing all paths for a given relation before we construct any paths for relations containing that rel.” The enumeration rules (which pairs, left/right/bushy, legality of outer-join reordering) live in postgres-join-ordering.md.

The README explains why this ordering matters: because every contained rel is fully finalized first, the planner “can reliably identify the cheapest path for each rel before higher-level relations need to know that” and “can safely discard a path when we find that another path for the same rel is better, without worrying that maybe there is already a reference to that path in some higher-level join path.” That is the principle of optimality made into a memory-management guarantee.

Cheapest-path selection: add_path and set_cheapest

Section titled “Cheapest-path selection: add_path and set_cheapest”

Two functions enforce the pruning discipline. add_path is called every time a candidate path is generated; it inserts the new path into the rel’s pathlist and removes any now-dominated paths (and may reject the new path if it is itself dominated). Domination is multi-dimensional: a path dominates another only if it is no more expensive on the relevant cost metric and no worse on sort order (pathkeys) and no more parameterized. That is why the MergePath in Figure 2 survives a cheaper HashPath — its sort order is a dimension the hash path loses on.

Once all paths for a rel are added, set_cheapest walks the surviving pathlist once and caches the winners into the RelOptInfo:

// set_cheapest — src/backend/optimizer/util/pathnode.c
foreach(p, parent_rel->pathlist)
{
Path *path = (Path *) lfirst(p);
if (path->param_info)
{
/* Parameterized path: track best per minimum parameterization */
parameterized_paths = lappend(parameterized_paths, path);
// ... condensed: bms_subset_compare to keep least-parameterized ...
}
else
{
/* Unparameterized: consider for cheapest startup / total slots */
cmp = compare_path_costs(cheapest_total_path, path, TOTAL_COST);
if (cmp > 0 ||
(cmp == 0 && compare_pathkeys(...) == PATHKEYS_BETTER2))
cheapest_total_path = path;
// ... cheapest_startup_path similarly ...
}
}

Note the tie-break: when two unparameterized paths cost the same, set_cheapest keeps the better-sorted one, again preserving sort order as a first-class property. The result is the cheapest_total_path, cheapest_startup_path, and cheapest_parameterized_paths that higher levels (and ultimately get_cheapest_fractional_path at the top) read. The cost formulas behind compare_path_costs are postgres-cost-model.md’s subject; the spine fact is that pruning happens continuously during the search (add_path) and the winners are cached per niche (set_cheapest).

Phase 4 — create_plan: the winning Path becomes a Plan

Section titled “Phase 4 — create_plan: the winning Path becomes a Plan”

After subquery_planner returns, standard_planner has the final RelOptInfo and picks best_path. create_plan runs once to translate the one winning Path tree into the executable Plan tree:

// create_plan — src/backend/optimizer/plan/createplan.c
plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);
// ... condensed: label top tlist with original column names,
// attach initPlans, check NestLoopParams ...
return plan;

create_plan_recurse switches on best_path->pathtype and dispatches to the matching builder, recursing into subpaths first (so children are built before parents):

// create_plan_recurse — src/backend/optimizer/plan/createplan.c
switch (best_path->pathtype)
{
case T_SeqScan:
case T_IndexScan:
case T_BitmapHeapScan:
// ... all scan pathtypes ...
plan = create_scan_plan(root, best_path, flags);
break;
case T_HashJoin:
case T_MergeJoin:
case T_NestLoop:
plan = create_join_plan(root, (JoinPath *) best_path);
break;
case T_Append:
plan = create_append_plan(root, (AppendPath *) best_path, flags);
break;
// ... Result, ProjectSet, Sort, Agg, Limit, ModifyTable, ...
}

create_scan_plan shows the Path→Plan translation concretely: it pulls the restriction clauses out of the parent RelOptInfo (baserestrictinfo, or the index’s indrestrictinfo for an index scan), appends parameterized join clauses if the path is parameterized, decides the output tlist, and builds the executable scan node:

// create_scan_plan — src/backend/optimizer/plan/createplan.c
RelOptInfo *rel = best_path->parent;
switch (best_path->pathtype)
{
case T_IndexScan:
case T_IndexOnlyScan:
scan_clauses = castNode(IndexPath, best_path)->indexinfo->indrestrictinfo;
break;
default:
scan_clauses = rel->baserestrictinfo;
break;
}
if (best_path->param_info) /* parameterized: add join clauses */
scan_clauses = list_concat_copy(scan_clauses,
best_path->param_info->ppi_clauses);
// ... choose tlist (physical vs. needed), build the scan Plan node ...

The per-pathtype builder bodies — how a NestPath becomes a NestLoop, how parameterized scans wire up nestloop params, how gating Result nodes get inserted for pseudoconstant quals — are the subject of postgres-plan-creation.md. The spine fact is that create_plan is a deterministic structural walk: no cost decisions remain, because the search already chose the winner. The Plan tree it emits is what postgres-executor.md picks up.

flowchart TB
  BP["best_path (winning Path tree)"]
  CPR["create_plan_recurse<br/>(switch on pathtype)"]
  SCAN["create_scan_plan<br/>(SeqScan, IndexScan, ...)"]
  JOIN["create_join_plan<br/>(NestLoop, MergeJoin, HashJoin)"]
  OTHER["create_append/sort/agg/limit/...plan"]
  PLAN["Plan tree (executable)"]
  BP --> CPR
  CPR --> SCAN
  CPR --> JOIN
  CPR --> OTHER
  CPR -.->|"recurse into subpaths first"| CPR
  SCAN --> PLAN
  JOIN --> PLAN
  OTHER --> PLAN

Figure 4 — create_plan as a top-down structural walk of the one winning Path tree. create_plan_recurse dispatches on pathtype, recursing into child subpaths before building each parent node, so the executable Plan tree is assembled bottom-up from a single chosen Path tree. No costing happens here — the search is over.

The optimizer is large (the six files cited here total ~29,000 lines), but the spine — entry, the three phases, path generation, pruning, emission — is a small set of stable functions. They are grouped below by the call-flow the figures above trace. The per-AM path builders, the join enumeration rules, the cost formulas, and the per-node Path→Plan bodies live in the sibling docs; what follows is the skeleton that ties them together.

Entry and the planning/emission split — planner.c

Section titled “Entry and the planning/emission split — planner.c”
  • standard_planner — the real entry point behind the planner hook wrapper. Builds the per-invocation PlannerGlobal, computes tuple_fraction from cursorOptions, calls subquery_planner for the top Query, then does the one-shot emission: fetch_upper_rel(root, UPPERREL_FINAL, NULL)get_cheapest_fractional_pathcreate_plan. Everything after that (set_plan_references, building the PlannedStmt) is post-processing, not search.
  • get_cheapest_fractional_path — picks the winner from the final rel’s pathlist, honoring tuple_fraction: a plan that starts fast can beat a plan that is cheaper end-to-end when the caller will fetch only part of the result (a DECLARE CURSOR, a LIMIT). This is the reason the search keeps cheapest_startup_path alongside cheapest_total_path.

Phase 1 — preprocess one Query level — planner.c

Section titled “Phase 1 — preprocess one Query level — planner.c”
  • subquery_planner — invoked once per Query node, recursing for sub-Queries. Allocates the PlannerInfo (root), pulls up sublinks and simple subqueries (pull_up_sublinks, pull_up_subqueries), runs preprocess_expression over targetlist and quals, reduces outer joins (reduce_outer_joins), removes useless result RTEs, then calls grouping_planner and finishes with SS_charge_for_initplans + set_cheapest(final_rel). The mechanism of pull-up and qual canonicalization is postgres preprocessing material; the boundary fact is that the Query is normalized before any cost-based work begins.

Phase 2 — the upper operations — planner.c

Section titled “Phase 2 — the upper operations — planner.c”
  • grouping_planner — plans everything above the scan/join core. Calls query_planner to get current_rel (the best scan/join paths), builds the final PathTarget, then stacks upper operations as new upper rels: create_grouping_paths (GROUP BY / aggregation, both hashed and sorted variants), create_window_paths (window functions), create_distinct_paths (DISTINCT), create_ordered_paths (the final ORDER BY), plus LIMIT and, for DML, the ModifyTable path. The current_rel = create_<op>_paths(root, current_rel, ...) idiom is the through-line: each step consumes a rel’s pathlist and emits a new rel with one more operation stacked on, ending at UPPERREL_FINAL.

Phase 3 — the scan-and-join core — planmain.c + allpaths.c

Section titled “Phase 3 — the scan-and-join core — planmain.c + allpaths.c”
  • query_planner (planmain.c) — sets up the cost-based search but does not choose a single path. Builds base-rel RelOptInfos (add_base_rels_to_query), distributes quals and builds EquivalenceClasses (deconstruct_jointree, reconsider_outer_join_clauses, generate_base_implied_equalities), then — only once ECs are final — invokes the qp_callback to build canonical PathKeys, and finally calls make_one_rel. The EC-before-pathkeys ordering is mandated by the README’s “Order of processing for EquivalenceClasses and PathKeys”.
  • make_one_rel (allpaths.c) — the two-pass generator: set_base_rel_sizesset_base_rel_pathlists (all per-table access paths) → make_rel_from_joinlist (all join paths). Returns the single top joinrel covering all_query_rels.
  • set_base_rel_pathlists / set_rel_pathlist — iterate base rels and dispatch each by rtekind/relkind to the right builder (set_plain_rel_pathlist, set_foreign_pathlist, set_function_pathlist, …). set_rel_pathlist ends every base rel with the set_rel_pathlist_hook extension point, optional generate_useful_gather_paths (parallel), and set_cheapest(rel) — so every base rel is finalized before any join consumes it.
  • make_rel_from_joinlist — turns the joinlist into initial_rels and dispatches by relation count: a single item returns directly; otherwise it honors join_search_hook, falls to geqo when enable_geqo && levels_needed >= geqo_threshold, else standard_join_search.
  • standard_join_search — the System-R dynamic program. Allocates join_rel_level[], seeds level 1 with initial_rels, and for each level lev from 2 up calls join_search_one_level(root, lev) then generate_partitionwise_join_paths, optional generate_useful_gather_paths, and set_cheapest on each just-built joinrel. The level-by-level loop is what guarantees every contained rel is finalized before any rel containing it — the principle-of-optimality memory guarantee. The enumeration rules inside join_search_one_level are postgres-join-ordering.md.
  • add_path — called on every generated path. Inserts it into the rel’s pathlist and removes paths it now dominates (or rejects the newcomer if dominated). Domination is multi-dimensional: cheaper-or-equal on the cost metric and no worse on pathkeys and no more parameterized. This is why a costlier but better-sorted path survives.
  • add_path_precheck — the cheap pre-test a builder calls before spending effort to construct a path it would only discard.
  • set_cheapest — after all paths are added, walks the surviving pathlist once and caches cheapest_startup_path, cheapest_total_path, and cheapest_parameterized_paths, with a sort-order tie-break among equal-cost unparameterized paths.

Phase 4 — Path → Plan emission — createplan.c

Section titled “Phase 4 — Path → Plan emission — createplan.c”
  • create_plan — runs once on the chosen Path. Calls create_plan_recurse(root, best_path, CP_EXACT_TLIST), then labels the top tlist with the query’s output column names and attaches initPlans.
  • create_plan_recurse — switches on best_path->pathtype and dispatches to create_scan_plan (all scan tags), create_join_plan (NestLoop / MergeJoin / HashJoin), create_append_plan, and the upper-node builders (Result, Sort, Agg, Limit, ModifyTable, …), recursing into subpaths first.
  • create_scan_plan — the concrete Path→Plan translation for a scan: pulls the restriction clauses from the parent rel (baserestrictinfo, or the index’s indrestrictinfo), appends ppi_clauses if parameterized, chooses the output tlist, and builds the executable scan node. The per-node bodies are postgres-plan-creation.md; the spine fact is that no cost decision remains here — the walk is purely structural.

Position hints (as of 2026-06-05, REL_18 273fe94)

Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”
SymbolFileLine
standard_plannersrc/backend/optimizer/plan/planner.c321
subquery_plannersrc/backend/optimizer/plan/planner.c669
grouping_plannersrc/backend/optimizer/plan/planner.c1676
create_grouping_pathssrc/backend/optimizer/plan/planner.c4022
create_window_pathssrc/backend/optimizer/plan/planner.c4775
create_distinct_pathssrc/backend/optimizer/plan/planner.c5032
create_ordered_pathssrc/backend/optimizer/plan/planner.c5550
get_cheapest_fractional_pathsrc/backend/optimizer/plan/planner.c6859
query_plannersrc/backend/optimizer/plan/planmain.c54
make_one_relsrc/backend/optimizer/path/allpaths.c171
set_base_rel_sizessrc/backend/optimizer/path/allpaths.c290
set_base_rel_pathlistssrc/backend/optimizer/path/allpaths.c333
set_rel_pathlistsrc/backend/optimizer/path/allpaths.c469
make_rel_from_joinlistsrc/backend/optimizer/path/allpaths.c3352
standard_join_searchsrc/backend/optimizer/path/allpaths.c3457
set_cheapestsrc/backend/optimizer/util/pathnode.c272
add_pathsrc/backend/optimizer/util/pathnode.c464
add_path_prechecksrc/backend/optimizer/util/pathnode.c691
create_plansrc/backend/optimizer/plan/createplan.c337
create_plan_recursesrc/backend/optimizer/plan/createplan.c388
create_scan_plansrc/backend/optimizer/plan/createplan.c559
create_join_plansrc/backend/optimizer/plan/createplan.c1081
struct PlannerInfo (fwd typedef)src/include/nodes/pathnodes.h212
struct RelOptInfosrc/include/nodes/pathnodes.h883
struct Pathsrc/include/nodes/pathnodes.h1753

All checks below were made against /data/hgryoo/references/postgres at branch REL_18_STABLE, commit 273fe94852b3a7e34fd171e8abdf1481beb302fa (PostgreSQL 18.x, committed 2026-06-05).

  • standard_planner separates search from emission with three distinct statements. Verified at planner.c lines 456–459: final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL), best_path = get_cheapest_fractional_path(final_rel, tuple_fraction), top_plan = create_plan(root, best_path). The Path search and the single create_plan call are textually separate; emission consumes one winner.

  • subquery_planner calls grouping_planner and then set_cheapest on the final upper rel. Verified by reading subquery_planner (planner.c line 669): after grouping_planner, it fetches UPPERREL_FINAL, charges init-plans, and calls set_cheapest(final_rel) before returning root. Confirms that path selection at this level is deferred to the caller via the cached cheapest pointers.

  • make_one_rel is genuinely two-pass. Verified at allpaths.c line 171: set_base_rel_sizes then set_base_rel_pathlists then make_rel_from_joinlist, with Assert(bms_equal(rel->relids, root->all_query_rels)) on the result. All base-rel access paths exist before any join path is built.

  • The DP loop finalizes each level before building the next. Verified by reading standard_join_search (allpaths.c line 3457): the for (lev = 2; lev <= levels_needed; lev++) loop calls join_search_one_level, then per joinrel runs generate_partitionwise_join_paths, generate_useful_gather_paths (except for the topmost rel), and set_cheapest. This is the principle-of-optimality memory guarantee the README describes.

  • GEQO is gated by enable_geqo && levels_needed >= geqo_threshold. Verified at allpaths.c (the make_rel_from_joinlist dispatch, line 3352). geqo_threshold is a real GUC declared at allpaths.c line 80 and registered in guc_tables.c (line 2223); its shipped default is 12. Below the threshold (and absent a join_search_hook), standard_join_search runs.

  • The README states the Path↔Plan correspondence verbatim. Verified at src/backend/optimizer/README line 23: “There is pretty nearly a one-to-one correspondence between the [Path and Plan trees], but Path nodes omit info that won’t be needed during planning, and include info needed for planning that won’t be needed by the executor.” This is the authority for the RelOptInfo/Path/Plan framing in §“PostgreSQL’s Approach”.

  • The EC-before-PathKey ordering is documented, not incidental. Verified at README “Order of processing for EquivalenceClasses and PathKeys” (line 1026): ECs are built during deconstruct_jointree / reconsider_outer_join_clauses; only after no further merging is possible are they “canonical” and PathKeys built. This is why query_planner invokes the qp_callback after generate_base_implied_equalities.

  • struct Path carries only planning-relevant fields, not execution detail. Verified at pathnodes.h line 1753: the base struct holds pathtype, parent, pathtarget, param_info, parallel flags, rows, disabled_nodes, startup_cost, total_cost, pathkeys — and no executor state. RelOptInfo (line 883) holds the pathlist, partial_pathlist, and the four cheapest_* slots quoted in the body.

  1. disabled_nodes vs. legacy disable_cost in path comparison. The Path struct (line 1753) carries an integer disabled_nodes count, and compare_path_costs_fuzzily weighs it ahead of cost. The exact tie-break ordering between disabled_nodes, startup_cost, and total_cost inside add_path / set_cheapest was not fully traced here. Investigation path: read compare_path_costs_fuzzily and compare_path_costs in pathnode.c and confirm disabled_nodes is strictly dominant. (Belongs to postgres-cost-model.md.)

  2. Upper-rel parameterization. Whether any UPPERREL_* rel ever carries parameterized paths (vs. only scan/join rels) was not verified. The body treats parameterization as a scan/join-level property. Investigation path: grep create_*_paths in planner.c for param_info usage on upper paths.

  3. Exact set of pathtype tags reaching create_plan_recurse. The body lists the major cases; the full switch in create_plan_recurse (createplan.c line 388) has more (e.g., T_TidScan, T_TableFuncScan, T_NamedTuplestoreScan, T_CustomScan). The complete enumeration is deferred to postgres-plan-creation.md.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”
  • System R / Selinger (SIGMOD 1979). Access Path Selection in a Relational Database Management System is the direct ancestor of this optimizer — bottom-up DP over relation subsets, interesting orders, a catalog-statistics cost model. Captured at research/dbms-papers/systemr-optimizer.md. PostgreSQL’s Path, pathkeys, add_path, and standard_join_search are the line-for-line descendants; the one structural addition is the explicit Path abstraction interposed between the search and the executable plan.

  • Fong, The Design and Implementation of the POSTGRES Query Optimizer (UCB MS thesis). Documents how the System-R cost model was adapted into the Berkeley POSTGRES codebase whose descendant is today’s src/backend/optimizer/. The naming lineage (RelOptInfo, Path, RestrictInfo) traces here. A focused read would pin down which modern behaviors are original POSTGRES versus later additions.

  • Cascades / Volcano (Graefe). The dominant alternative to bottom-up DP: a top-down, memoized, rule-driven search over a memo of equivalence groups, used by SQL Server and Apache Calcite. PostgreSQL’s RelOptInfo is the analogue of a memo group, but the search is bottom-up DP rather than top-down transformation rules. A side-by-side note would clarify what PostgreSQL trades by not having a transformation-rule engine (harder to add new algebraic rewrites; simpler, more predictable search).

  • GEQO — the genetic large-join escape hatch. Above geqo_threshold (default 12) PostgreSQL abandons exhaustive DP for a genetic algorithm (geqo). The classic comparison target is the literature on randomized join enumeration (Ioannidis & Kang, simulated annealing / iterative improvement). A dedicated note in postgres-join-ordering.md should measure plan quality vs. planning time at the crossover.

  • Adaptive / learned query optimization. Modern research (Bao, Neo, Balsa; adaptive execution in Spark/Presto) replaces or augments the static cost model with runtime feedback or learned cost estimators. PostgreSQL’s optimizer is fully static — it commits to one plan before execution. The Path/cost split is exactly the seam where a learned cost estimator would be injected; this is the natural frontier note once postgres-cost-model.md lands.

Source code (/data/hgryoo/references/postgres, REL_18_STABLE, commit 273fe94):

  • src/backend/optimizer/plan/planner.cstandard_planner, subquery_planner, grouping_planner, the upper-path builders, get_cheapest_fractional_path.
  • src/backend/optimizer/plan/planmain.cquery_planner.
  • src/backend/optimizer/path/allpaths.cmake_one_rel, set_base_rel_pathlists, set_rel_pathlist, make_rel_from_joinlist, standard_join_search.
  • src/backend/optimizer/plan/createplan.ccreate_plan, create_plan_recurse, create_scan_plan, create_join_plan.
  • src/backend/optimizer/util/pathnode.cadd_path, add_path_precheck, set_cheapest.
  • src/include/nodes/pathnodes.hPlannerInfo, RelOptInfo, Path.
  • src/backend/optimizer/README — the optimizer’s own design document (Path↔Plan correspondence; EC/PathKey processing order; the “finish all paths for a rel before any containing rel” guarantee).

Theory and papers:

  • Selinger et al., Access Path Selection in a Relational Database Management System (SIGMOD 1979) — research/dbms-papers/systemr-optimizer.md.
  • Fong, The Design and Implementation of the POSTGRES Query Optimizer (UC Berkeley MS thesis).
  • Silberschatz, Korth, Sudarshan, Database System Concepts, 7e, ch. 16 “Query Optimization”.
  • Petrov, Database Internals, ch. 7 (query planning and execution).

Sibling documents (scope boundaries):

  • postgres-architecture-overview.md — where the planner sits in the parse → rewrite → plan → execute pipeline.
  • postgres-path-generation.md — the per-AM path builders called by set_rel_pathlist.
  • postgres-join-ordering.mdjoin_search_one_level, GEQO, legality of join reordering.
  • postgres-cost-model.md — selectivity, the cost formulas behind compare_path_costs.
  • postgres-plan-creation.md — the per-pathtype bodies inside create_plan_recurse.
  • postgres-executor.md — what the executor does with the finished Plan.