PostgreSQL Path Generation — Scan and Join Path Builders, add_path Pruning, and Pathkeys
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Section titled “Theoretical Background”A cost-based optimizer must enumerate physical access strategies for each
relation before it can rank them. The System R optimizer (Selinger et al.,
“Access Path Selection in a Relational Database Management System,” SIGMOD
1979, captured at research/dbms-papers/systemr-optimizer.md) named the
enumeration unit the access path: for a base relation, a path is a
particular combination of scan method, applicable predicates, and output sort
order; for a join, a path is a particular pairing of outer and inner paths with
a specific join algorithm. The key insight in System R is that paths are
cheap to generate and discard — you evaluate thousands of candidates but
build only the one winner into an executable plan.
Database Internals (Petrov, ch. 7) frames this as plan-space enumeration: the optimizer builds a lattice of candidate strategies bottom-up, keeping only the Pareto-optimal frontier at each level (cheapest total cost, cheapest startup cost, cheapest for each useful sort order). The Pareto-optimality criterion — keep a candidate only if no other candidate dominates it on every relevant dimension — is the pruning rule that makes the search tractable.
Two dimensions that every access-path system must manage are:
-
Physical access strategies for a single table. For a heap relation, the choices are sequential scan, index scan (one per applicable index), index-only scan, bitmap index scan (possibly merging multiple indexes via AND/OR), and TID scan. Each has a different startup cost, total cost, and output sort order.
-
Sort order as a first-class property. A plan that is more expensive than a competitor but emits rows pre-sorted on a join key or an
ORDER BYcolumn may still win overall by eliminating a downstream sort. System R calls these interesting sort orders and explicitly keeps a cheapest path for each one alongside the cheapest unordered path. This is the conceptual seed of PostgreSQL’spathkeys.
The access-path layer sits between two sibling concerns: cost estimation
(which assigns numeric cost to each candidate — costsize.c,
postgres-cost-model.md) and join-order enumeration (which pairs and
sequences the base-rel paths into join paths — joinpath.c, joinrels.c,
postgres-join-ordering.md). This document covers only the access-path
generation and pruning machinery — the scan-level builders, add_path, and
the pathkeys system.
Common DBMS Design
Section titled “Common DBMS Design”Per-relation candidate list with dominance pruning
Section titled “Per-relation candidate list with dominance pruning”Every cost-based optimizer maintains a candidate list per relation that accumulates competing strategies as they are discovered and prunes dominated ones on the fly. The canonical shape is: a list (or priority queue) ordered by total cost; when a new candidate arrives, scan the list and discard any existing entry that the new candidate dominates on every relevant dimension. If any existing entry dominates the new candidate, discard the new one instead. The surviving frontier is then available to higher join levels.
The dimensions on which dominance is assessed vary by engine: cost (startup and total), output sort order, required outer relations (parameterization), row count, and in modern engines also parallel-safety. The wider the set of dimensions, the more candidates survive (because it is harder to dominate on all axes simultaneously), but the more thorough the search.
Separation of scan-path generation from join-path generation
Section titled “Separation of scan-path generation from join-path generation”Practically all serious optimizers treat base-relation scanning as a distinct sub-problem from join ordering. Base-rel paths are fully generated and pruned first; join paths then compose from the surviving base-rel winners. This separation is clean because base-rel paths carry no dependency on the join order — a sequential scan of table A costs the same regardless of which table it will later be joined to, with one exception: parameterized paths (index scans whose index clauses reference values from another relation) do depend on which outer relation is available. Most optimizers therefore treat unparameterized base-rel paths as globally reusable and parameterized ones as join-level artifacts.
Index path generation: matching clauses to index columns
Section titled “Index path generation: matching clauses to index columns”Index path generation is the most complex part of access-path enumeration. The standard engineering pattern has three steps:
-
Match WHERE/ON predicates to index columns. A predicate of the form
col OP constant(orcol OP col2for parameterized paths) can use index columncolif the operatorOPis in the index’s operator class. Only predicates that match the index’s key-column prefix can actually prune tuples via the index. -
Decide between regular scan, index-only scan, and bitmap scan. A regular index scan follows the index leaf entries and fetches heap pages one at a time in index order. An index-only scan reads the index without visiting the heap at all (possible when the index covers all needed columns). A bitmap scan first collects all matching TIDs into a bitmap, then fetches heap pages in physical order — a better strategy when many rows match and the selectivity is moderate. Engines typically generate all three variants and let the cost model pick.
-
Handle partial indexes and EquivalenceClass-derived predicates. A partial index (one with a
WHEREpredicate in its definition) is only useful if the query’s predicates imply the index predicate. An EquivalenceClass-derived predicate (e.g., ifa = bandb = 5, we can generatea = 5for an index ona) can open additional index paths not explicitly named in the original query.
Pathkeys: sort order as a canonical algebraic structure
Section titled “Pathkeys: sort order as a canonical algebraic structure”The classical treatment of interesting sort orders (System R) is ad hoc: the
planner tests a small set of pre-identified orderings. A more principled
approach canonicalizes sort order into a data structure that represents
equivalence classes of expressions known to be equal: if a.x = b.y (a
join predicate or filter), then any sort order on a.x is the same as a sort
order on b.y. An engine that tracks this can recognize that an index scan
on a.x satisfies an ORDER BY b.y without an extra sort, because
a.x ≡ b.y in the relevant join context.
Theory ↔ PostgreSQL mapping
Section titled “Theory ↔ PostgreSQL mapping”| Theory / pattern | PostgreSQL name |
|---|---|
| Per-relation candidate list | RelOptInfo.pathlist |
| Dominance pruning on add | add_path |
| Cheapest-path cache per dimension | cheapest_startup_path, cheapest_total_path, cheapest_parameterized_paths (in RelOptInfo) |
| Scan path for base rel (sequential) | Path with pathtype = T_SeqScan; built by create_seqscan_path |
| Scan path for base rel (index) | IndexPath; built by build_index_paths inside create_index_paths |
| Bitmap multi-index scan | BitmapHeapPath wrapping BitmapAndPath/BitmapOrPath of IndexPaths |
| Parameterized path (index join clause) | IndexPath with non-NULL param_info |
| TID scan | TidPath / TidRangePath; built by create_tidscan_paths |
| Parallel scan fragment | partial path in RelOptInfo.partial_pathlist; gathered by create_gather_path |
| Interesting sort order | Path.pathkeys — list of PathKey |
| Canonical sort-order expression | PathKey referencing an EquivalenceClass |
| Equivalence class of equal expressions | EquivalenceClass in PlannerInfo.eq_classes |
PostgreSQL’s Approach
Section titled “PostgreSQL’s Approach”Overview: the two-pass structure of make_one_rel
Section titled “Overview: the two-pass structure of make_one_rel”make_one_rel (allpaths.c:171) is the function query_planner calls to
produce the final join RelOptInfo. It has a strictly ordered two-pass
structure:
// make_one_rel — src/backend/optimizer/path/allpaths.cRelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist){ /* Pass 1: size estimates (needed by parameterized path cost) */ set_base_rel_sizes(root);
/* Accumulate total_table_pages for cost model */ total_pages = 0; for (rti = 1; rti < root->simple_rel_array_size; rti++) { ... } root->total_table_pages = total_pages;
/* Pass 2: generate all scan paths for base rels */ set_base_rel_pathlists(root);
/* Join search: produce the final joinrel */ rel = make_rel_from_joinlist(root, joinlist); return rel;}set_base_rel_sizes must run before set_base_rel_pathlists because
parameterized index paths need the cardinality estimate of the outer relation
to compute their loop-count-adjusted cost. make_rel_from_joinlist is the
entry point for join ordering (covered by postgres-join-ordering.md).
set_base_rel_pathlists: the per-RTE dispatch
Section titled “set_base_rel_pathlists: the per-RTE dispatch”set_base_rel_pathlists (allpaths.c:333) iterates root->simple_rel_array
and calls set_rel_pathlist for each base rel. set_rel_pathlist
(allpaths.c:469) dispatches by RTE kind:
// set_rel_pathlist — src/backend/optimizer/path/allpaths.cstatic voidset_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte){ if (IS_DUMMY_REL(rel)) /* already proved empty, nothing to do */; else if (rte->inh) set_append_rel_pathlist(root, rel, rti, rte); /* partitioned/inherited */ else { switch (rel->rtekind) { case RTE_RELATION: if (rte->relkind == RELKIND_FOREIGN_TABLE) set_foreign_pathlist(root, rel, rte); else if (rte->tablesample != NULL) set_tablesample_rel_pathlist(root, rel, rte); else set_plain_rel_pathlist(root, rel, rte); /* ordinary heap */ break; case RTE_FUNCTION: set_function_pathlist(...); break; case RTE_VALUES: set_values_pathlist(...); break; /* RTE_SUBQUERY, RTE_CTE, etc: handled in set_rel_size */ /* ... */ } } /* Plugin hook: extensions can add/remove paths here */ if (set_rel_pathlist_hook) (*set_rel_pathlist_hook)(root, rel, rti, rte);
/* Collect partial paths into Gather paths (parallelism) */ if (rel->reloptkind == RELOPT_BASEREL && !bms_equal(rel->relids, root->all_query_rels)) generate_useful_gather_paths(root, rel, false);
/* Pin cheapest paths into RelOptInfo fields */ set_cheapest(rel);}Three things to notice. First, set_rel_pathlist_hook is the extensibility
point for custom scan providers (CustomPath). Second, parallel partial paths
are collected here via generate_useful_gather_paths — this wraps any
partial_pathlist entries in GatherPath or GatherMergePath nodes and adds
them to the main pathlist. Third, set_cheapest runs at the end of every
base-rel dispatch, pinning cheapest_total_path before the join search
begins.
set_plain_rel_pathlist: the three scan families
Section titled “set_plain_rel_pathlist: the three scan families”For an ordinary heap table, set_plain_rel_pathlist (allpaths.c:768) adds
exactly three families of paths:
// set_plain_rel_pathlist — src/backend/optimizer/path/allpaths.cstatic voidset_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte){ Relids required_outer = rel->lateral_relids;
/* TID scan (forced if CurrentOfExpr is present; returns if so) */ if (create_tidscan_paths(root, rel)) return;
/* Sequential scan — always added */ add_path(rel, create_seqscan_path(root, rel, required_outer, 0));
/* Parallel sequential scan fragments */ if (rel->consider_parallel && required_outer == NULL) create_plain_partial_paths(root, rel);
/* All applicable index paths */ create_index_paths(root, rel);}The sequential scan is unconditional (it is always a valid, if expensive,
choice). create_tidscan_paths returns true only when a CurrentOfExpr is
present (cursor-positioned update/delete), in which case no other path is
valid and the function returns immediately. create_index_paths is where most
of the work happens.
create_index_paths: the index path generator
Section titled “create_index_paths: the index path generator”create_index_paths (indxpath.c:240) iterates rel->indexlist (the list of
IndexOptInfo structs describing every index on the relation) and for each
index runs a three-phase clause-matching pipeline:
// create_index_paths — src/backend/optimizer/path/indxpath.cvoidcreate_index_paths(PlannerInfo *root, RelOptInfo *rel){ /* For each index on this relation: */ foreach(lc, rel->indexlist) { IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
/* Skip partial indexes whose predicate the query doesn't satisfy */ if (index->indpred != NIL && !index->predOK) continue;
/* Phase 1: match WHERE/baserestrictinfo to index columns */ match_restriction_clauses_to_index(root, index, &rclauseset); get_index_paths(root, rel, index, &rclauseset, &bitindexpaths);
/* Phase 2: match join clauses (loose, not yet in an EC) */ match_join_clauses_to_index(root, rel, index, &jclauseset, &joinorclauses);
/* Phase 3: match EquivalenceClass-derived join predicates */ match_eclass_clauses_to_index(root, index, &eclauseset);
/* Combine phases 2+3 into parameterized index paths */ if (jclauseset.nonempty || eclauseset.nonempty) consider_index_join_clauses(root, rel, index, &rclauseset, &jclauseset, &eclauseset, &bitjoinpaths); }
/* Merge restriction bitmaps into one BitmapHeapPath */ if (bitindexpaths != NIL) { bitmapqual = choose_bitmap_and(root, rel, bitindexpaths); bpath = create_bitmap_heap_path(root, rel, bitmapqual, rel->lateral_relids, 1.0, 0); add_path(rel, (Path *) bpath); if (rel->consider_parallel && rel->lateral_relids == NULL) create_partial_bitmap_paths(root, rel, bitmapqual); }
/* Merge join bitmaps, one BitmapHeapPath per distinct parameterization */ if (bitjoinpaths != NIL) { /* ... for each distinct outer-rel set ... */ }}Phase 1 (match_restriction_clauses_to_index) walks rel->baserestrictinfo
and checks each RestrictInfo against each index key column. A match
requires the operator to be in the index’s operator family and the indexable
column to be on the appropriate side. get_index_paths then calls
build_index_paths which emits IndexPath and IndexPath (index-only)
objects via add_path for regular scans, and accumulates IndexPath
objects into bitindexpaths for later bitmap aggregation.
Phases 2 and 3 find parameterized index paths — index scans where the
index clause references a value from another relation (e.g., WHERE a.x = b.y can use an index on a.x if b.y is available from an outer rel in a
nested-loop join). Phase 2 matches explicit join clauses; phase 3 matches
predicates that can be derived from the query’s EquivalenceClass graph.
The bitmap path assembly at the end (choose_bitmap_and) is important: when
multiple indexes each cover a different subset of the qualifying rows,
combining their TID bitmaps with AND reduces the set of heap pages to fetch.
The cost model in costsize.c evaluates whether the combined bitmap scan is
cheaper than any individual index scan.
Figure 1 — create_index_paths clause pipeline per index
flowchart TD A["create_index_paths\n(rel->indexlist loop)"] B["match_restriction_clauses_to_index\n(WHERE / baserestrictinfo)"] C["get_index_paths → build_index_paths\n(IndexPath, IndexOnlyScan, bitmap)"] D["match_join_clauses_to_index\n(explicit join clauses)"] E["match_eclass_clauses_to_index\n(EC-derived join predicates)"] F["consider_index_join_clauses\n(parameterized IndexPath + bitmap)"] G["choose_bitmap_and\n→ BitmapHeapPath (restriction)"] H["choose_bitmap_and × parameterization\n→ BitmapHeapPath (join)"] A --> B --> C A --> D A --> E D & E --> F C -->|"bitindexpaths"| G F -->|"bitjoinpaths"| H
Figure 1 — The three-phase clause-matching pipeline inside create_index_paths. Each pass through the outer foreach(lc, rel->indexlist) loop runs phases 1–3 for one index. Bitmap paths from all indexes accumulate across the loop and are assembled into a single BitmapHeapPath (restriction) or one per parameterization (join) at the end.
Path structs: the type hierarchy
Section titled “Path structs: the type hierarchy”Every path is a Path node with a pathtype tag that identifies the concrete
scan or join algorithm. Specialized paths embed Path as their first member
(the standard C struct-inheritance pattern):
// Path (base) — src/include/nodes/pathnodes.htypedef struct Path { NodeTag type; NodeTag pathtype; /* T_SeqScan, T_IndexScan, T_NestLoop, … */ RelOptInfo *parent; /* the relation this path produces */ PathTarget *pathtarget; /* output column list (default = rel->reltarget) */ ParamPathInfo *param_info; /* NULL = unparameterized */ bool parallel_aware; bool parallel_safe; int parallel_workers; Cardinality rows; int disabled_nodes; /* count of GUC-disabled plan nodes used */ Cost startup_cost; Cost total_cost; List *pathkeys; /* sort order: list of PathKey */} Path;
// IndexPath — src/include/nodes/pathnodes.htypedef struct IndexPath { Path path; /* pathtype = T_IndexScan or T_IndexOnlyScan */ IndexOptInfo *indexinfo; List *indexclauses; /* list of IndexClause */ List *indexorderbys; /* ORDER BY expressions usable by index AM */ ScanDirection indexscandir; Cost indextotalcost; Selectivity indexselectivity;} IndexPath;
// BitmapHeapPath — src/include/nodes/pathnodes.htypedef struct BitmapHeapPath { Path path; /* pathtype = T_BitmapHeapScan; pathkeys = NIL */ Path *bitmapqual; /* IndexPath, BitmapAndPath, or BitmapOrPath */} BitmapHeapPath;The disabled_nodes field (new in recent PG releases) counts how many nodes
in the path tree use a GUC-disabled plan type (e.g., enable_seqscan = off).
add_path treats disabled_nodes as a higher-order cost component: a path
with fewer disabled nodes always beats one with more, regardless of estimated
cost. This is how enable_seqscan = off actually works — it does not remove
seqscan paths from the pathlist; it adds a large disabled_nodes penalty so
they lose the add_path comparison.
Join paths embed a JoinPath struct (which extends Path with outer/inner
subpaths and join type), and specializations for each algorithm:
// NestPath / MergePath / HashPath — src/include/nodes/pathnodes.htypedef struct NestPath { JoinPath jpath; } NestPath;
typedef struct MergePath { JoinPath jpath; List *path_mergeclauses; List *outersortkeys; /* NIL = outer is already sorted */ List *innersortkeys; /* NIL = inner is already sorted */ int outer_presorted_keys; /* for incremental sort eligibility */ bool skip_mark_restore; bool materialize_inner;} MergePath;
typedef struct HashPath { JoinPath jpath; List *path_hashclauses; int num_batches; Cardinality inner_rows_total;} HashPath;add_path: the dominance pruning rule
Section titled “add_path: the dominance pruning rule”add_path (pathnode.c:464) is called every time a new path candidate is
constructed. It implements the Pareto-dominance test against every existing
path in rel->pathlist. A new path dominates an old path if the new one
is at least as good on all of cost, pathkeys, parameterization, row count,
and parallel-safety — and strictly better on at least one:
// add_path — src/backend/optimizer/util/pathnode.cvoidadd_path(RelOptInfo *parent_rel, Path *new_path){ bool accept_new = true; /* Parameterized paths are treated as having NIL pathkeys — they cannot win on sort order, only on cost */ new_path_pathkeys = new_path->param_info ? NIL : new_path->pathkeys;
foreach(p1, parent_rel->pathlist) { Path *old_path = (Path *) lfirst(p1); bool remove_old = false;
costcmp = compare_path_costs_fuzzily(new_path, old_path, STD_FUZZ_FACTOR); /* If costs differ in both directions, keep both (no dominance) */ if (costcmp != COSTS_DIFFERENT) { keyscmp = compare_pathkeys(new_path_pathkeys, old_path_pathkeys); if (keyscmp != PATHKEYS_DIFFERENT) { outercmp = bms_subset_compare(PATH_REQ_OUTER(new_path), PATH_REQ_OUTER(old_path)); /* check all combinations of costcmp, keyscmp, outercmp, rows, parallel_safe to set remove_old or accept_new=false */ /* ... */ } } if (remove_old) { /* delete old from pathlist, pfree unless IndexPath */ } else if (new_path->total_cost >= old_path->total_cost) insert_at++; if (!accept_new) break; } if (accept_new) parent_rel->pathlist = list_insert_nth(parent_rel->pathlist, insert_at, new_path); else if (!IsA(new_path, IndexPath)) pfree(new_path);}Three design choices embedded in add_path are worth noting:
-
Parameterized paths get NIL pathkeys. A parameterized path cannot win on sort order because doing so would allow a less-parameterized path to be discarded just because its parameterized competitor happens to be sorted — which would leave the planner without an unparameterized option when the join topology changes. Treating parameterized paths as unsorted limits how many of them survive, keeping the pathlist tractable.
-
Fuzzy cost comparison.
compare_path_costs_fuzzilyuses a small fuzz factor (STD_FUZZ_FACTOR = 1.01) so that two paths whose costs differ by less than 1% are treated as equal. This prevents platform-specific floating-point rounding from producing spuriously different plans. -
IndexPath nodes are never pfree’d.
IndexPathobjects may be referenced both as standalone paths and as children ofBitmapHeapPathnodes.add_paththerefore skipspfreeforIndexPathnodes when rejecting them (to avoid a dangling pointer in anyBitmapHeapPaththat already holds a pointer to it).
set_cheapest: pinning the winners
Section titled “set_cheapest: pinning the winners”After add_path has been called for all paths of a relation, set_cheapest
(pathnode.c:272) scans the surviving pathlist and pins four pointers into
RelOptInfo:
// set_cheapest — src/backend/optimizer/util/pathnode.cvoidset_cheapest(RelOptInfo *parent_rel){ /* Scan pathlist (sorted by disabled_nodes then total_cost) */ foreach(p, parent_rel->pathlist) { Path *path = (Path *) lfirst(p); if (path->param_info) { parameterized_paths = lappend(parameterized_paths, path); /* track best_param_path by (least parameterization, then cost) */ } else { /* update cheapest_startup_path and cheapest_total_path */ } } /* If no unparameterized path exists, use the best parameterized one */ parent_rel->cheapest_startup_path = cheapest_startup_path; parent_rel->cheapest_total_path = cheapest_total_path; parent_rel->cheapest_unique_path = NULL; /* computed on demand */ parent_rel->cheapest_parameterized_paths = parameterized_paths;}cheapest_parameterized_paths is a list (not a single pointer) because
different outer-rel parameterizations are incomparable — a path parameterized
on {A} and a path parameterized on {B} are both kept because neither is a
subset of the other. The join-path builders will pick from this list based on
which outer rel they are currently considering.
Pathkeys: canonical sort order
Section titled “Pathkeys: canonical sort order”A PathKey (pathnodes.h:1595) represents one sort key in a path’s output
ordering:
// PathKey — src/include/nodes/pathnodes.htypedef struct PathKey { NodeTag type; EquivalenceClass *pk_eclass; /* EC containing the sort expression */ Oid pk_opfamily; /* btree opfamily defining the ordering */ CompareType pk_cmptype; /* ascending or descending */ bool pk_nulls_first;} PathKey;The core insight is that pk_eclass is an EquivalenceClass — a set of
expressions that the optimizer has proved are mutually equal in the query
context. If a.x = b.y is a join condition, both a.x and b.y belong to
the same EC, so a PathKey referencing that EC represents a sort order on
either expression. An index scan on a.x and an index scan on b.y will
produce PathKey nodes that are pointer-equal (because make_canonical_pathkey
returns a canonical shared instance per EC), so compare_pathkeys can
determine they match by pointer comparison:
// make_canonical_pathkey — src/backend/optimizer/path/pathkeys.cPathKey *make_canonical_pathkey(PlannerInfo *root, EquivalenceClass *eclass, Oid opfamily, CompareType cmptype, bool nulls_first){ /* Chase ec_merged links to the canonical EC */ while (eclass->ec_merged) eclass = ecase->ec_merged;
/* Return pre-existing canonical PathKey if one matches */ foreach(lc, root->canon_pathkeys) { pk = (PathKey *) lfirst(lc); if (eclass == pk->pk_eclass && opfamily == pk->pk_opfamily && cmptype == pk->pk_cmptype && nulls_first == pk->pk_nulls_first) return pk; } /* Allocate a new canonical PathKey in planner_cxt */ pk = makeNode(PathKey); /* ... fill fields ... */ root->canon_pathkeys = lappend(root->canon_pathkeys, pk); return pk;}This canonicalization means pathkeys_contained_in (pathkeys.c:343) can
determine whether one path’s sort order satisfies another’s requirements in
O(n) list comparisons with pointer equality, rather than deep expression tree
comparisons.
Figure 2 — PathKey canonicalization and EC-based sort-order matching
flowchart LR
subgraph EC["EquivalenceClass {a.x, b.y}"]
AX["a.x"]
BY["b.y"]
end
PK["PathKey\npk_eclass → EC\npk_opfamily = int4_ops\npk_cmptype = ASC"]
IX["IndexPath on a.x\npathkeys = [PK]"]
IY["IndexPath on b.y\npathkeys = [PK]"]
MJ["MergePath\noutersortkeys = [PK]\ninnersortkeys = [PK]"]
OB["ORDER BY a.x\nrequired_pathkeys = [PK]"]
EC --> PK
PK --> IX
PK --> IY
PK --> MJ
PK --> OB
Figure 2 — A single canonical PathKey is shared by every Path node whose sort output is a.x or b.y (equivalent expressions). The merge join and the ORDER BY reference the same PathKey pointer, so pathkeys_contained_in matches them by pointer equality in O(1) per key.
Append and partitioned-table paths
Section titled “Append and partitioned-table paths”For partitioned tables and inheritance hierarchies, set_append_rel_pathlist
(allpaths.c:1251) recurses into each child relation (calling set_rel_pathlist
per child), then assembles the results into AppendPath or MergeAppendPath.
A MergeAppendPath is used when all children produce output in the same sort
order (e.g., each child has a matching index) and the merge can avoid a top-level
sort. This is a key optimization for partition-wise operations: if the query has
an ORDER BY that matches the partition key, the planner can avoid sorting
by using MergeAppend over per-partition index scans.
Partition pruning happens before path generation, during set_rel_size —
partitions whose ranges cannot overlap the query’s filter are marked as dummy
rels. When set_append_rel_pathlist iterates the children, it skips dummy
rels, so no paths are ever generated for pruned partitions.
Foreign and custom scan paths
Section titled “Foreign and custom scan paths”set_foreign_pathlist (allpaths.c:938) calls the FDW’s GetForeignPaths
callback, which creates ForeignPath nodes using create_foreignscan_path
(pathnode.c). The FDW decides how many paths to submit; it must call
add_path for each. The FDW hook pattern is the same add_path API as core
scan builders — custom paths compete directly against heap-scan and index-scan
paths in the same pathlist.
Custom scan providers (set_rel_pathlist_hook) use the same mechanism.
CustomPath embeds a methods pointer to a CustomPathMethods struct whose
PlanCustomPath callback is invoked by create_plan to emit the executable
CustomScan node.
Source Walkthrough
Section titled “Source Walkthrough”allpaths.c — make_one_rel and per-RTE dispatch
Section titled “allpaths.c — make_one_rel and per-RTE dispatch”make_one_rel— top-level entry for base+join path generation; drives size estimation pass then path-generation pass; callsmake_rel_from_joinlistfor join ordering.set_base_rel_sizes— first pass over base rels: callsset_rel_sizeper rel to populaterows,pages,consider_parallel.set_base_rel_pathlists— second pass: callsset_rel_pathlistper base rel.set_rel_pathlist— dispatches byrtekind/relkindto the appropriate path builder; callsset_rel_pathlist_hook,generate_useful_gather_paths, andset_cheapestat the end.set_plain_rel_pathlist— for ordinary heap tables: TID, seqscan, parallel seqscan, index paths.create_plain_partial_paths— emits parallel-worker-count partial seqscan paths intorel->partial_pathlist.set_append_rel_pathlist— for partitioned/inherited tables: per-child recursion, thenAppendPath/MergeAppendPathassembly.set_foreign_pathlist— calls FDWGetForeignPathscallback.set_subquery_pathlist— for subquery RTEs: selects a pushdown-safe plan from the recursively planned subquery, applies qual pushdown.
indxpath.c — index path generation
Section titled “indxpath.c — index path generation”create_index_paths— outer loop overrel->indexlist; coordinates three clause-matching phases; assembles bitmap paths.get_index_paths— for one index and one clause set, decides scan direction and callsbuild_index_pathsfor both forward and backward scan.build_index_paths— emitsIndexPath(IndexScan or IndexOnlyScan) viaadd_path; accumulatesIndexPathto bitmap candidates list.match_restriction_clauses_to_index— matchesrel->baserestrictinfoagainst index key columns; fillsIndexClauseSet.match_join_clauses_to_index— matchesrel->joininfo(loose join clauses) against index key columns; fills a separateIndexClauseSet.match_eclass_clauses_to_index— for eachEquivalenceClassthat mentions this rel’s columns, generates indexable join predicates; fillseclauseset.consider_index_join_clauses— for each distinct parameterization derived from phases 2+3, callsbuild_index_pathsto produce parameterizedIndexPathnodes.
pathnode.c — path constructors and pruning
Section titled “pathnode.c — path constructors and pruning”set_cheapest— scanspathlistafter all paths are added; setscheapest_startup_path,cheapest_total_path,cheapest_parameterized_paths.add_path— dominance pruning; inserts or discards a new path; the sorted insertion order (bydisabled_nodesthentotal_cost) makes the scan fast.add_partial_path— same pruning forpartial_pathlist(parallel fragments).create_seqscan_path— builds a plainPathnode for a sequential scan withpathtype = T_SeqScan; cost fromcost_seqscan.create_nestloop_path— builds aNestPath; requires outer and inner path as inputs; cost fromcost_nestloop.create_mergejoin_path— builds aMergePath; computes sort requirements for both inputs; cost fromcost_mergejoin.create_hashjoin_path— builds aHashPath; cost fromcost_hashjoin.create_gather_path— wraps a partial path in aGatherPathfor parallel execution; cost fromcost_gather.
pathkeys.c — canonical sort-order tracking
Section titled “pathkeys.c — canonical sort-order tracking”make_canonical_pathkey— returns (or creates) the uniquePathKeyfor a given(EC, opfamily, cmptype, nulls_first)tuple; stored inroot->canon_pathkeys.pathkeys_contained_in— tests whether pathkeys listkeys1is a prefix of (or equal to)keys2; used to decide whether a sort is redundant.build_join_pathkeys— derives the pathkeys for a join path from the outer path’s pathkeys and the available ECs after the join.make_pathkeys_for_sortclauses— converts a SQLORDER BY/GROUP BYclause list into canonicalPathKeynodes for use as a required ordering.
Position hints (as of 2026-06-05, commit 273fe94)
Section titled “Position hints (as of 2026-06-05, commit 273fe94)”| Symbol | File | Line |
|---|---|---|
make_one_rel | src/backend/optimizer/path/allpaths.c | 171 |
set_base_rel_sizes | src/backend/optimizer/path/allpaths.c | 290 |
set_base_rel_pathlists | src/backend/optimizer/path/allpaths.c | 333 |
set_rel_pathlist | src/backend/optimizer/path/allpaths.c | 469 |
set_plain_rel_pathlist | src/backend/optimizer/path/allpaths.c | 768 |
create_plain_partial_paths | src/backend/optimizer/path/allpaths.c | 806 |
set_foreign_pathlist | src/backend/optimizer/path/allpaths.c | 938 |
set_append_rel_pathlist | src/backend/optimizer/path/allpaths.c | 1251 |
set_subquery_pathlist | src/backend/optimizer/path/allpaths.c | 2529 |
create_index_paths | src/backend/optimizer/path/indxpath.c | 240 |
get_index_paths | src/backend/optimizer/path/indxpath.c | 716 |
build_index_paths | src/backend/optimizer/path/indxpath.c | 810 |
match_restriction_clauses_to_index | src/backend/optimizer/path/indxpath.c | 2466 |
match_join_clauses_to_index | src/backend/optimizer/path/indxpath.c | 2482 |
match_eclass_clauses_to_index | src/backend/optimizer/path/indxpath.c | 2516 |
set_cheapest | src/backend/optimizer/util/pathnode.c | 272 |
add_path | src/backend/optimizer/util/pathnode.c | 464 |
add_partial_path | src/backend/optimizer/util/pathnode.c | 798 |
create_seqscan_path | src/backend/optimizer/util/pathnode.c | 986 |
create_nestloop_path | src/backend/optimizer/util/pathnode.c | 2671 |
create_mergejoin_path | src/backend/optimizer/util/pathnode.c | 2768 |
create_hashjoin_path | src/backend/optimizer/util/pathnode.c | 2836 |
create_gather_path | src/backend/optimizer/util/pathnode.c | 2180 |
make_canonical_pathkey | src/backend/optimizer/path/pathkeys.c | 56 |
pathkeys_contained_in | src/backend/optimizer/path/pathkeys.c | 343 |
build_join_pathkeys | src/backend/optimizer/path/pathkeys.c | 1295 |
make_pathkeys_for_sortclauses | src/backend/optimizer/path/pathkeys.c | 1336 |
RelOptInfo (struct) | src/include/nodes/pathnodes.h | 883 |
Path (struct) | src/include/nodes/pathnodes.h | 1753 |
IndexPath (struct) | src/include/nodes/pathnodes.h | 1842 |
BitmapHeapPath (struct) | src/include/nodes/pathnodes.h | 1917 |
EquivalenceClass (struct) | src/include/nodes/pathnodes.h | 1442 |
PathKey (struct) | src/include/nodes/pathnodes.h | 1595 |
NestPath (struct) | src/include/nodes/pathnodes.h | 2225 |
MergePath (struct) | src/include/nodes/pathnodes.h | 2271 |
HashPath (struct) | src/include/nodes/pathnodes.h | 2292 |
Source verification (as of 2026-06-05)
Section titled “Source verification (as of 2026-06-05)”Verified facts
Section titled “Verified facts”-
make_one_relperforms a strict two-pass structure: sizes before paths. Confirmed by readingallpaths.c:171–232:set_base_rel_sizesis called unconditionally beforeset_base_rel_pathlists, and thetotal_table_pagesaccumulation loop runs between them. This ordering is required because parameterized index path costs depend on the cardinality of outer rels. -
set_rel_pathlist_hookfires after the built-in builders but beforeset_cheapest. Confirmed at allpaths.c:540–562: the hook call and thegenerate_useful_gather_pathscall both precedeset_cheapest(rel). This means custom scan providers see all built-in paths already inpathlistand can add to or remove from it before the cheapest pointers are set. -
add_pathnever pfreesIndexPathnodes. Confirmed at pathnode.c (~line 650): the free call isif (!IsA(old_path, IndexPath)) pfree(old_path). The comment explains the reason:IndexPathobjects are shared withBitmapHeapPathchildren. -
disabled_nodesis a higher-order cost component. Confirmed by readingadd_path: theinsert_atlogic usesnew_path->disabled_nodes > old_path->disabled_nodesas the primary sort key before comparingtotal_cost. This meansenable_seqscan = offdoes not eliminate seqscan paths from the search — it just ensures they lose to any enabled alternative. -
Parameterized paths get NIL pathkeys in
add_path. Confirmed at pathnode.c:506:new_path_pathkeys = new_path->param_info ? NIL : new_path->pathkeys. The comment explains the design rationale. -
make_canonical_pathkeychasesec_mergedlinks to find the canonical EC before lookup. Confirmed at pathkeys.c:56–104. This ensures that after two ECs are merged (because a transitive equality was discovered), all PathKey nodes built before the merge still resolve to the same canonical EC. -
create_index_pathsskips partial indexes whosepredOKflag is false. Confirmed at indxpath.c:268:if (index->indpred != NIL && !index->predOK) continue.predOKis set bycheck_index_predicatesinsideset_plain_rel_size(beforeset_plain_rel_pathlistis called), which tests whether the query’sbaserestrictinfoimplies the partial index’s predicate.
Open questions
Section titled “Open questions”-
Partial bitmap path generation threshold.
create_partial_bitmap_pathsis called only whenrel->consider_parallel && rel->lateral_relids == NULL. The condition forlateral_relids == NULLis clear (LATERAL references cannot be parallelized at the base-rel level), but it is not immediately clear from the code why a LATERAL reference on a bitmap heap path specifically prevents partial execution — investigation path: trace therequired_outerhandling increate_partial_bitmap_pathsandcreate_bitmap_heap_path. -
choose_bitmap_andcost model accuracy. The function selects the “best AND combination” of bitmap index paths using a greedy heuristic (not exhaustive search). The cost model for combined bitmap scans (incost_bitmap_heap_scan) uses an approximation for correlated pages. Whether this approximation is accurate for highly correlated columns (e.g., a composite index scenario) is an open question. Investigation path:cost_bitmap_and_nodeincostsize.c. -
outer_presorted_keysinMergePathand incremental sort interaction. Theouter_presorted_keysfield enables incremental sort when the outer path is partially sorted. The planner sets this when building merge paths, but the interaction withIncrementalSortPathgeneration (which happens ingrouping_plannerfor the upper rels) is not verified — it is unclear whether base-rel merge paths can also generateIncrementalSortPathon their outer input or whether that only applies to upper-rel paths.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”-
Cascades / Columbia / Orca memo architecture. PostgreSQL uses a per-relation pathlist (System R style), where candidates are pruned as they arrive and the structure is destroyed as the search moves up. Cascades-family optimizers (Graefe 1995; implemented in SQL Server, Orca/Greenplum) use a memo that preserves all logical and physical expressions in a hash table of groups, enabling top-down search with memoization and allowing the search to revisit lower groups when a higher-level cost context changes. This is strictly more powerful for queries with many equivalent logical transformations but has higher memory overhead. A side-by-side comparison of add_path’s bottom-up pruning versus Cascades’ top-down rule-fire model would illustrate the tradeoff.
-
Adaptive query processing and re-optimization. PostgreSQL’s path generation is entirely static — all costs are estimated from statistics before execution begins. Systems like Eddies (Avnur & Hellerstein, SIGMOD 2000) and more recent work on runtime re-optimization (e.g., mid-query plan switches in SQL Server) replace or augment the static search with feedback from actual runtime row counts. PG18’s async I/O (
postgres-aio.md) improves I/O throughput for plans the static optimizer already chose, but does not feed back actual I/O costs into re-planning. -
ML-based cardinality estimation and learned cost models. The accuracy of path selection in
create_index_pathsdepends entirely on the cardinality estimates from the statistics subsystem. Recent research (e.g., Kipf et al., “Learned Cardinalities,” CIDR 2019; Marcus et al., “Bao,” SIGMOD 2021) replaces or augments histogram-based estimates with ML models trained on query execution feedback. PostgreSQL’s extended statistics (postgres-extended- statistics.md) is the production step in this direction within the current architecture. -
Partial index exploitation depth. PostgreSQL’s
predOKcheck (check_index_predicates) tests whether the query’s filter implies the index predicate. A deeper exploitation would also generate filters to make more partial indexes applicable — this is related to the “semantic query optimization” literature (e.g., using integrity constraints to generate new predicates). No current RDBMS does this fully. -
Partition-wise join and aggregate path generation.
set_append_rel_pathlistcurrently generatesAppendPath/MergeAppendPath. Whenconsider_partitionwise_joinis set on aRelOptInfo, the join path builders injoinpath.cgenerate partition-wise join paths (introduced in PG11). The interaction between pathkey canonicalization and partition key expressions (which must be in the same EC for partition-wise join to be legal) is an active area of development.
Sources
Section titled “Sources”In-tree documentation
Section titled “In-tree documentation”src/backend/optimizer/README— the authoritative design document for the entire optimizer; covers path/join-tree construction, EquivalenceClasses, outer-join ordering, Vars/PlaceHolderVars, and the optimizer function outline.
Papers
Section titled “Papers”- Selinger et al., “Access Path Selection in a Relational Database Management
System,” SIGMOD 1979 — the System R template; captured at
knowledge/research/dbms-papers/systemr-optimizer.md.
Textbooks
Section titled “Textbooks”- Database Internals (Petrov, ch. 7) — cost-based search, plan-space enumeration, Pareto-optimal frontier.
- Database System Concepts (Silberschatz, 7e, ch. 16) — query optimization overview; cost estimation; dynamic programming join enumeration.
Source files
Section titled “Source files”src/backend/optimizer/path/allpaths.c—make_one_rel, all per-RTE path builderssrc/backend/optimizer/path/indxpath.c— index path generation pipelinesrc/backend/optimizer/path/pathkeys.c— canonical pathkey construction and comparisonsrc/backend/optimizer/util/pathnode.c—add_path,set_cheapest, all path constructorssrc/include/nodes/pathnodes.h—RelOptInfo,Path,IndexPath,BitmapHeapPath,EquivalenceClass,PathKey, join path structs