PostgreSQL Join Ordering — Dynamic-Programming Search and the GEQO Fallback
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Section titled “Theoretical Background”For any query that joins more than two relations, the optimizer faces two
nested decisions. The first is which order to join the relations in —
((A ⋈ B) ⋈ C) versus (A ⋈ (B ⋈ C)) versus ((A ⋈ C) ⋈ B) and so on.
The second, for a fixed order, is which physical algorithm to run at
each join (nested loop, sort-merge, hash) and which access path to read
each base relation through (sequential scan, index scan, bitmap scan). This
document is about the first decision — join enumeration — and treats the
second as a delegated sub-problem (covered in postgres-path-generation.md
and postgres-cost-model.md).
The reason join ordering dominates is that the number of orderings explodes combinatorially while the per-order work is comparatively bounded. The number of distinct join trees over n relations, counting both left-deep and bushy shapes and both orderings of every join, is
(2(n-1))! / (n-1)!
which is 2 for n=2, 12 for n=3, 120 for n=4, 1680 for n=5, and grows
super-exponentially thereafter. Even restricting to left-deep trees there are
n! orderings. No engine can cost every tree for large n, so every
optimizer must either prune the space or search it heuristically. The choice
of which order matters enormously: a bad order builds a huge intermediate
result that a good order would have avoided entirely, and intermediate size
multiplies through the rest of the plan.
The foundational treatment is Selinger et al., Access Path Selection in a
Relational Database Management System (SIGMOD 1979 — the System R optimizer
paper, captured in knowledge/research/dbms-papers/systemr-optimizer.md).
Two of its ideas are load-bearing for everything PostgreSQL does:
-
Cost-based, not rule-based. System R assigns every candidate plan a numeric cost — a weighted sum of estimated page fetches and CPU instructions, computed from catalog statistics (cardinalities, selectivities) — and picks the minimum. The paper introduces the selectivity factors and the cost formulae that PostgreSQL’s cost model still echoes.
-
Dynamic programming over relation subsets. Rather than enumerate whole trees, System R builds the answer bottom-up by subset size. It first finds the best way to access each single relation; then, for every pair, the best two-way join; then, for every triple, the best three-way join, reusing the already-computed best plans for the constituent subsets. The key insight is the principle of optimality: the cheapest plan for joining a set S of relations must be built from a cheapest plan for some subset of S joined to the rest. So the optimizer need only keep, for each subset of relations, the cheapest plan(s) producing that subset — a memo. This collapses the
(2(n-1))!/(n-1)!tree space to roughly2^nsubsets, each costed a bounded number of times.
To make the explosion concrete, consider a five-way join A⋈B⋈C⋈D⋈E.
There are 1680 distinct join trees (bushy, both child orders counted) but
only 2^5 - 1 = 31 non-empty subsets of {A,B,C,D,E}, of which only those
with 2+ members (26 of them) are join relations. DP costs each of those 26
relations once and, for each, considers the handful of ways to split it into
two already-costed sub-relations. The asymptotic difference is the whole
game: tree count is Θ(4^n / n^{1.5}), subset count is 2^n. At
geqo_threshold = 12 relations, 2^12 = 4096 subsets is still tractable;
the 1680-style tree count for 12 relations is astronomically larger, which
is exactly why the cutoff sits where it does.
Selinger’s System R kept only left-deep trees and pruned away
non-interesting orders, trading completeness for tractability. The paper also
introduced interesting orders: a sub-plan that produces its output sorted
in an order useful to a later operator (a merge join, an ORDER BY, a
GROUP BY) is retained even if it is not the cheapest, because the sort it
provides may save a later sort. That nuance — that “cheapest” is not a single
scalar but is qualified by output ordering and by parameterization — is why
PostgreSQL keeps a set of non-dominated paths per relation rather than one
winner, and is detailed in postgres-path-generation.md.
The interesting-orders idea deserves a second look because it is the reason
“the cheapest plan” is not a scalar. Selinger observed that a sub-plan
producing relation {A,B} sorted on the column that a later merge join
needs is worth keeping even if a cheaper unsorted plan for {A,B} exists —
the sort it already provides can save the later merge join from sorting, and
that downstream saving may exceed the upfront premium. So the optimizer must
retain, per relation set, not the single cheapest plan but the Pareto
frontier over (cost, output-order) — and, in modern PostgreSQL, over (cost,
output-order, parameterization). This is why the join-search code repeatedly
calls set_cheapest (to pick scalar winners for the next level’s
building) yet keeps a whole pathlist (to preserve the frontier). The
distinction — scalar winner for construction, frontier for retention —
recurs throughout the implementation.
The DP approach has one well-known failure mode: 2^n itself becomes
intractable past a few dozen relations, and the constant factors (each subset
spawns many candidate joins and paths) bite well before that. For large
join problems the literature turns to randomized / heuristic search —
iterative improvement, simulated annealing, and genetic algorithms that
treat a join order as a “tour” (by analogy to the Travelling Salesman
Problem) and evolve a population of tours toward lower cost. PostgreSQL
implements exactly this two-regime design: exhaustive DP for small problems,
a genetic algorithm (GEQO) past a threshold.
Common DBMS Design
Section titled “Common DBMS Design”The theory fixes the skeleton — cost-based DP over relation subsets, with a heuristic escape hatch for large problems. This section names the engineering conventions that production optimizers share when they implement it, so that PostgreSQL’s specific code reads as one point in a known design space.
1. A memo keyed by relation-set. Every DP optimizer needs a table that
maps a set of base relations to the best plan(s) producing that set. The
set is the key precisely because join order is associative and commutative
at the logical level: (A ⋈ B) ⋈ C and (B ⋈ C) ⋈ A produce the same
relation {A,B,C} and must collapse to one memo entry, against which both
construction routes compete. Whether the engine calls this a “memo” (Cascades
/ SQL Server / Calcite), a “MEMO structure” of “groups”, or — in PostgreSQL —
the join_rel_list / join_rel_hash of RelOptInfo nodes keyed by a
Relids bitmapset, the role is identical: find-or-create by relid set, then
add candidate plans to the existing node.
A subtlety the textbook glosses but every implementation must confront: the
memo entry for a relation set does not hold one plan, it holds a small
set of non-dominated plans. A plan that is more expensive overall may still
be worth keeping if it delivers its output in a sort order a later operator
needs (Selinger’s “interesting orders”), or if it is parameterized so it can
be the inner side of a parameterized nested loop. So “find-or-create by relid
set” is followed by “merge this candidate into the kept set, discarding any it
dominates and discarding itself if dominated” — the Pareto frontier
maintained by what PostgreSQL calls add_path. The relid set is the memo
key; the kept path set is the memo value.
2. Level-by-level construction (the DP order). The bottom-up variant
(“Selinger-style DP”) iterates by subset size: build all size-2 joins from
size-1 rels, all size-3 from {size-1 ⋈ size-2}, and so on. This guarantees
that when you build a size-k join, every smaller subset it could be built
from already has its best plan computed. PostgreSQL materializes this as an
explicit array indexed by level, join_rel_level[k].
3. Tree shape: left-deep vs. bushy. A left-deep tree always joins an accumulated result to a single base relation; a bushy tree may join two composite intermediates. Left-deep is smaller to search and pipelines well for nested loops; bushy can be dramatically cheaper for star/snowflake schemas and is sometimes the only legal shape under outer-join ordering constraints. Many early optimizers (System R) searched left-deep only; modern ones, PostgreSQL included, search bushy trees too — but prune bushy joins to pairs that actually share a join clause, to keep the blow-up bounded.
4. Connected-subgraph pruning. Costing a Cartesian product (a join with
no join clause linking the two sides) is almost always wasteful, so the
enumerator prefers pairs (R, S) that are linked by a join predicate — i.e.
adjacent in the join graph whose nodes are relations and whose edges are
join clauses. Cartesian joins are generated only as a last resort when no
clause-connected plan exists.
5. Join-order constraints from semantics. Inner joins commute and
associate freely, but outer joins, semi/anti joins (from IN/EXISTS),
and LATERAL references impose ordering restrictions: some relations must
be joined before others, or cannot be reordered past an outer join without
changing results. A correctness check (join_is_legal in PostgreSQL) must
gate every proposed pair, and the enumerator must be willing to produce bushy
plans when constraints make left-deep illegal.
6. The large-N escape hatch. Because 2^n and its constants eventually
dominate, optimizers cap the exhaustive search and switch to a heuristic for
big join problems: greedy left-deep (“minimum selectivity first”),
randomized restarts, simulated annealing, or a genetic algorithm. The cap is
usually a tunable threshold on the number of FROM-list items.
7. Symmetry exploitation. Because logical join is commutative, the
enumerator would waste half its work if it considered both (R, S) and
(S, R) as distinct order-search events. The standard trick is to make the
join-rel constructor symmetric — given an unordered pair it internally tries
both orientations as physical alternatives — and to have the enumerator
visit each unordered pair once. PostgreSQL does both: make_join_rel(x, y)
“handles both x,y and y,x cases”, and join_search_one_level uses a
first_rel index at the symmetric levels (level == 2, and the bushy
k == other_level case) to skip pairs already considered. The result is that
order search enumerates unordered pairs while path search still gets to
choose which side is outer.
8. Cost-model coupling. The enumerator is only as good as the cardinality
and cost estimates that drive add_path’s domination test. A poor row
estimate for an intermediate join propagates: it makes the wrong sub-plan
look cheapest, which then becomes the building block for the next level. This
is the Achilles heel every Selinger-style optimizer shares, and the reason
the “Beyond” section below devotes space to adaptive and learned approaches
that revisit ordering at execution time. The DP machinery itself is correct;
its inputs are the soft spot.
flowchart TD
J["make_rel_from_joinlist<br/>levels_needed = #jointree items"] --> Q{"enable_geqo AND<br/>levels_needed >= geqo_threshold?"}
Q -- "no (small)" --> DP["standard_join_search<br/>exhaustive DP"]
Q -- "yes (large)" --> GEQO["geqo<br/>genetic search"]
DP --> L["join_rel_level[1..n]<br/>filled level by level"]
GEQO --> P["pool of join-order tours<br/>evolved over generations"]
L --> TOP["top-level RelOptInfo<br/>cheapest_total_path"]
P --> TOP
PostgreSQL’s Approach
Section titled “PostgreSQL’s Approach”PostgreSQL is a faithful, modern implementation of Selinger-style DP, with a
genetic-algorithm fallback bolted on at a configurable threshold. The whole
join-search subsystem is reached from one function, make_rel_from_joinlist
(in allpaths.c), which the planner calls after it has flattened the FROM
clause into a joinlist (a possibly-nested list of RangeTblRef leaves
and sub-lists; see postgres-planner-overview.md for how the joinlist is
produced by deconstruct_jointree).
The entry point counts the items and dispatches:
// make_rel_from_joinlist — src/backend/optimizer/path/allpaths.clevels_needed = list_length(joinlist);// ... build initial_rels (one RelOptInfo per joinlist child, recursing// into sub-lists) ...if (levels_needed == 1) return (RelOptInfo *) linitial(initial_rels);else{ root->initial_rels = initial_rels; if (join_search_hook) return (*join_search_hook) (root, levels_needed, initial_rels); else if (enable_geqo && levels_needed >= geqo_threshold) return geqo(root, levels_needed, initial_rels); else return standard_join_search(root, levels_needed, initial_rels);}levels_needed is the number of independent jointree items — not the raw
table count, because explicit JOIN syntax and flattened sub-selects can
group items. The three-way branch is the design in miniature: a loadable
extension (join_search_hook, e.g. pg_hint_plan) can replace the search
entirely; otherwise the geqo_threshold GUC (default 12) selects the
genetic search for large problems and the exhaustive DP for everything else.
The DP memo. The thing that makes this dynamic programming and not brute
force is root->join_rel_level, an array of List * indexed by level. The
field is documented on PlannerInfo:
// PlannerInfo — src/include/nodes/pathnodes.h/* * When doing a dynamic-programming-style join search, join_rel_level[k] * is a list of all the join-relations of k items ... * join_cur_level is the current level. New join-relation RelOptInfos are * automatically added to the join_rel_level[join_cur_level] list. * join_rel_level is NULL if not in use. */List **join_rel_level pg_node_attr(read_write_ignore);int join_cur_level;Crucially, the uniqueness of a join relation is keyed by its set of relids,
not by how it was built. build_join_rel (called from make_join_rel) does
a find-or-create against join_rel_hash: the first construction route that
produces {A,B,C} allocates the RelOptInfo; every later route that
produces the same set finds the existing node and merely adds more paths
to it. That is the memo. join_search_one_level adds each freshly-built
joinrel to join_rel_level[join_cur_level] exactly once, regardless of how
many pairs build it.
standard_join_search — the level loop. Level 1 is seeded with the
initial rels; each subsequent level is built by join_search_one_level, and
each new joinrel is finalized with set_cheapest:
// standard_join_search — src/backend/optimizer/path/allpaths.croot->join_rel_level = (List **) palloc0((levels_needed + 1) * sizeof(List *));root->join_rel_level[1] = initial_rels;
for (lev = 2; lev <= levels_needed; lev++){ join_search_one_level(root, lev); foreach(lc, root->join_rel_level[lev]) { rel = (RelOptInfo *) lfirst(lc); generate_partitionwise_join_paths(root, rel); if (!bms_equal(rel->relids, root->all_query_rels)) generate_useful_gather_paths(root, rel, false); set_cheapest(rel); }}// final level must contain exactly one rel — the whole joinAssert(list_length(root->join_rel_level[levels_needed]) == 1);rel = (RelOptInfo *) linitial(root->join_rel_level[levels_needed]);After the last level there must be exactly one rel (the join of everything),
and its cheapest_total_path is the chosen plan. set_cheapest is what
distills the per-rel pathlist (a set of non-dominated paths kept by
add_path) down to the cheapest-startup and cheapest-total winners that the
next level will build on.
join_search_one_level — the enumerator. This is the heart of the
strategy. For a target level, it considers three families of constructions:
- Left/right-sided joins — each
(level-1)-rel joined to an initial (level-1, i.e. single) rel it shares a clause with. This is the left-deep backbone. - Bushy joins — each
k-rel joined to a(level-k)-rel for2 <= k <= level-2, only when a relevant join clause or order restriction links them. - Last-ditch Cartesian — if neither produced any joinrel at this level, force clauseless joins so the search cannot dead-end.
// join_search_one_level — src/backend/optimizer/path/joinrels.cforeach(r, joinrels[level - 1]){ RelOptInfo *old_rel = (RelOptInfo *) lfirst(r);
if (old_rel->joininfo != NIL || old_rel->has_eclass_joins || has_join_restriction(root, old_rel)) { int first_rel; if (level == 2) /* avoid considering the same pair twice */ first_rel = foreach_current_index(r) + 1; else first_rel = 0; make_rels_by_clause_joins(root, old_rel, joinrels[1], first_rel); } else make_rels_by_clauseless_joins(root, old_rel, joinrels[1]);}The level == 2 first_rel trick exploits symmetry — at level 2 joining
(a) to (b) is the same problem as (b) to (a), so it only looks at
rels after the current one. The bushy loop walks k from 2 to the halfway
point (since make_join_rel(x,y) already handles both orderings) and only
joins pairs with !bms_overlap (disjoint relid sets) that have a relevant
clause or restriction:
// join_search_one_level (bushy section) — joinrels.cfor (k = 2;; k++){ int other_level = level - k; if (k > other_level) break; /* only go to the halfway point */ foreach(r, joinrels[k]) { /* ... skip rels with no join clause and no restriction ... */ for_each_from(r2, joinrels[other_level], first_rel) { RelOptInfo *new_rel = (RelOptInfo *) lfirst(r2); if (!bms_overlap(old_rel->relids, new_rel->relids)) { if (have_relevant_joinclause(root, old_rel, new_rel) || have_join_order_restriction(root, old_rel, new_rel)) (void) make_join_rel(root, old_rel, new_rel); } } }}make_join_rel — legality, memo, paths. Every candidate pair flows
through make_join_rel. It computes the union of relids, checks
join_is_legal (rejecting orderings forbidden by outer/semi-join or LATERAL
constraints — returns NULL for an illegal pair), finds-or-builds the memo
RelOptInfo, and then calls populate_joinrel_with_paths:
// make_join_rel — src/backend/optimizer/path/joinrels.cjoinrelids = bms_union(rel1->relids, rel2->relids);if (!join_is_legal(root, rel1, rel2, joinrelids, &sjinfo, &reversed)) return NULL; /* invalid join order */joinrelids = add_outer_joins_to_relids(root, joinrelids, sjinfo, &pushed_down_joins);if (reversed) { /* swap rel1/rel2 to match sjinfo */ }if (sjinfo == NULL) { sjinfo = &sjinfo_data; init_dummy_sjinfo(...); }joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo, pushed_down_joins, &restrictlist);if (is_dummy_rel(joinrel)) return joinrel; /* provably empty */populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo, restrictlist);return joinrel;populate_joinrel_with_paths switches on the join type and calls
add_paths_to_joinrel once for each legal (outer, inner) orientation —
e.g. for a plain inner join it tries both (rel1 outer, rel2 inner) and
(rel2 outer, rel1 inner), because nested-loop and hash joins are not
symmetric in cost (the inner side is rescanned / hashed). This is where the
join-order search hands off to the join-method and path search, which
add_paths_to_joinrel performs by calling sort_inner_and_outer,
match_unsorted_outer, and hash_inner_and_outer in turn — covered in the
next section and, in depth, in postgres-path-generation.md and
postgres-cost-model.md.
flowchart TD
S1["join_search_one_level(level)"] --> A["left/right-sided:<br/>make_rels_by_clause_joins<br/>(level-1) x initial rels"]
S1 --> B["bushy: k x (level-k)<br/>with relevant clause only"]
S1 --> C{"joinrels[level] empty?"}
C -- yes --> D["last-ditch:<br/>make_rels_by_clauseless_joins"]
A --> MJ["make_join_rel(rel1, rel2)"]
B --> MJ
D --> MJ
MJ --> LEG{"join_is_legal?"}
LEG -- no --> X["return NULL"]
LEG -- yes --> BJR["build_join_rel<br/>find-or-create by relids = memo"]
BJR --> PP["populate_joinrel_with_paths"]
PP --> APJ["add_paths_to_joinrel<br/>sort-merge / nestloop / hash"]
A worked level-by-level trace. Take a four-item query
SELECT ... FROM a, b, c, d WHERE a.x=b.x AND b.y=c.y AND c.z=d.z — a chain
join graph a—b—c—d. levels_needed = 4, below the threshold, so
standard_join_search runs. The DP array fills like this:
| Level | join_rel_level[k] contents | How each was built |
|---|---|---|
| 1 | {a}, {b}, {c}, {d} | the initial_rels seed |
| 2 | {a,b}, {b,c}, {c,d} | clause-joins; {a,c},{a,d},{b,d} skipped — no join clause |
| 3 | {a,b,c}, {b,c,d} | {a,b}⋈c, c⋈{a,b}; {b,c}⋈a, {b,c}⋈d; {c,d}⋈b |
| 4 | {a,b,c,d} | {a,b,c}⋈d (left-deep) and {a,b}⋈{c,d} (bushy) |
The instructive row is level 4: the final relation {a,b,c,d} is reached by
two construction routes — the left-deep {a,b,c}⋈d from
make_rels_by_clause_joins, and the bushy {a,b}⋈{c,d} from the
k=2, other_level=2 branch of the bushy loop. Both call make_join_rel,
both resolve via build_join_rel to the same RelOptInfo (keyed on the
relid set {a,b,c,d}), and both contribute paths to its single pathlist;
set_cheapest then picks the global winner across all routes. That collapse
is the dynamic-programming memo doing its job: the relation is costed as one
node no matter how many ways the enumerator found to assemble it. Note also
the level-2 omissions — {a,c}, {a,d}, {b,d} are never built because no
join clause links them, so the connected-subgraph pruning keeps even this
small problem from costing Cartesian intermediates.
Legality gating and outer joins. Inner joins commute and associate
freely, but the moment outer joins, semi/anti joins, or LATERAL references
enter, most orderings become illegal — reordering a relation past an outer
join changes which rows get null-extended, and a LATERAL subquery must be
joined after the relations it references. make_join_rel therefore gates
every pair on join_is_legal before doing any work, and bails to NULL for
an illegal pair so no RelOptInfo and no paths are ever created for it:
// make_join_rel — src/backend/optimizer/path/joinrels.cAssert(!bms_overlap(rel1->relids, rel2->relids));joinrelids = bms_union(rel1->relids, rel2->relids);if (!join_is_legal(root, rel1, rel2, joinrelids, &sjinfo, &reversed)){ bms_free(joinrelids); return NULL; /* invalid join order — drop it */}join_is_legal consults root->join_info_list (one SpecialJoinInfo per
non-inner join, carrying min_lefthand / min_righthand relid sets that
encode the legal operand ranges) and the LATERAL bookkeeping. Its two outputs
steer the rest of make_join_rel: *sjinfo_p (the matched join’s
SpecialJoinInfo, or NULL for a plain inner join — in which case
init_dummy_sjinfo fabricates a minimal one for the selectivity code) and
*reversed_p (whether the two inputs must be swapped to match the join’s
syntactic LHS/RHS). The presence of these constraints is also why the
enumerator must build bushy trees: there are queries — nested semijoins are
the canonical example — where no left-deep order is legal at some
intermediate level, and only a bushy assembly at a higher level succeeds. The
elog(ERROR, "failed to build any %d-way joins", level) in
join_search_one_level is deliberately suppressed when
root->join_info_list != NIL for exactly this reason: an empty level can be
a legitimate, recoverable state when special joins are present.
Where order search ends and path search begins. Once make_join_rel has
fixed an (outer, inner) orientation, add_paths_to_joinrel runs a fixed,
numbered sequence of path builders. The numbering in the source is worth
quoting because it is the contract between the two searches — the join-order
layer above never sees this; it only sees the resulting pathlist:
// add_paths_to_joinrel — src/backend/optimizer/path/joinpath.c/* 1. mergejoin with both relations explicitly sorted */if (mergejoin_allowed) sort_inner_and_outer(root, joinrel, outerrel, innerrel, jointype, &extra);/* 2. nestloops + mergejoins where the outer is already ordered */if (mergejoin_allowed) match_unsorted_outer(root, joinrel, outerrel, innerrel, jointype, &extra);/* 4. hash join (both sides hashed) */if (enable_hashjoin || jointype == JOIN_FULL) hash_inner_and_outer(root, joinrel, outerrel, innerrel, jointype, &extra);/* 5/6. FDW join pushdown and the set_join_pathlist_hook extension seam */Each builder ends in a try_*_path helper that screens the candidate with an
initial_cost_* estimate and, if it survives, calls add_path to insert it
into joinrel->pathlist — add_path being the Pareto filter that keeps the
set of non-dominated paths (cheapest-total, plus any with a useful sort
order or cheaper parameterization). The full content of these builders is the
subject of postgres-path-generation.md; here the point is only that join
ordering treats them as a black box returning costed paths.
The GEQO regime. When levels_needed >= geqo_threshold, geqo() runs
instead of the DP. It is a textbook generational genetic algorithm whose
“genome” is a tour — a permutation of relation indices — and whose fitness
is plan cost. The driver loop is compact:
// geqo — src/backend/optimizer/geqo/geqo_main.cpool_size = gimme_pool_size(number_of_rels);number_generations = gimme_number_generations(pool_size);pool = alloc_pool(root, pool_size, number_of_rels);random_init_pool(root, pool);sort_pool(root, pool); /* sort by fitness once; kids replace worst */
for (generation = 0; generation < number_generations; generation++){ geqo_selection(root, momma, daddy, pool, Geqo_selection_bias); /* default build: edge recombination crossover (ERX) */ gimme_edge_table(root, momma->string, daddy->string, pool->string_length, edge_table); kid = momma; edge_failures += gimme_tour(root, edge_table, kid->string, pool->string_length); kid->worth = geqo_eval(root, kid->string, pool->string_length); spread_chromo(root, kid, pool); /* reinsert kid by fitness rank */}best_tour = (Gene *) pool->data[0].string;best_rel = gimme_tree(root, best_tour, pool->string_length);Three design points matter. First, the pool is sorted once and thereafter
each child is slotted into rank order by spread_chromo, pushing out the
worst individual — a steady-state-ish scheme adapted from Whitley’s Genitor.
Second, fitness is real plan cost: geqo_eval actually builds the joinrels
for the tour and reads cheapest_total_path->total_cost, so GEQO is costing
the same paths the DP would, just over far fewer orders. Third, the search is
seeded by geqo_seed, so its output is nondeterministic unless the seed
is pinned — the operational tax for escaping the 2^n wall.
The clump-merging gimme_tree is what lets a random permutation still yield
a legal plan: rather than blindly join in tour order (which could violate
outer-join or LATERAL constraints), each relation joins the first existing
clump it desirable_joins with, and leftover clumps are force-joined at the
end. Because it routes through the same make_join_rel, every legality and
costing rule the exhaustive search obeys is obeyed here too.
flowchart TD
INIT["alloc_pool + random_init_pool<br/>sort_pool by fitness"] --> GEN{"generation < number_generations?"}
GEN -- yes --> SEL["geqo_selection<br/>pick momma + daddy by linear bias"]
SEL --> XOVER["recombination (ERX)<br/>crossover -> kid tour"]
XOVER --> EVAL["geqo_eval -> gimme_tree<br/>fitness = cheapest_total_path cost"]
EVAL --> SPREAD["spread_chromo<br/>reinsert kid, drop worst"]
SPREAD --> GEN
GEN -- no --> BEST["gimme_tree(best_tour)<br/>return cheapest join rel"]
The genetic loop and the DP loop are interchangeable from the caller’s view:
both consume initial_rels and return a single top-level RelOptInfo with a
cheapest_total_path. Only the coverage differs — DP guarantees the global
optimum within its cost model, GEQO samples the space and returns the best of
what it sampled. This substitutability is exactly why make_rel_from_joinlist
can pick between them (and a hook) with a single if.
Source Walkthrough
Section titled “Source Walkthrough”This section walks the call chain symbol by symbol, from the planner’s
hand-off down into the genetic fallback. Anchor on the symbol names; the
line numbers in the closing table are hints valid only as of the updated:
date.
Entry and dispatch (allpaths.c)
Section titled “Entry and dispatch (allpaths.c)”make_one_relis the planner’s top of access-path generation; it sets base-rel paths (set_base_rel_pathlists) and then callsmake_rel_from_joinlistto build the join tree. (Detailed inpostgres-path-generation.md.)make_rel_from_joinlistrecurses over the joinlist buildinginitial_rels, then dispatches onlevels_needed: ajoin_search_hookoverride,geqowhenenable_geqo && levels_needed >= geqo_threshold, orstandard_join_search. A single jointree item returns immediately with no search.join_search_hook(ajoin_search_hook_typeglobal) lets extensions replace the search wholesale; it is the seampg_hint_planuses.
Exhaustive DP (allpaths.c + joinrels.c)
Section titled “Exhaustive DP (allpaths.c + joinrels.c)”standard_join_searchallocatesroot->join_rel_level, seeds level 1 withinitial_rels, then loopslevfrom 2 tolevels_needed, callingjoin_search_one_leveland finalizing each new joinrel withgenerate_partitionwise_join_paths,generate_useful_gather_paths, andset_cheapest. It asserts a single rel at the top level and returns it.join_search_one_levelis the enumerator. It scansjoinrels[level-1]for left/right-sided joins, runs the bushy double-loop overjoinrels[k] x joinrels[level-k], and — only if the level came up empty — re-runs a clauseless pass as a last-ditch Cartesian fallback. It setsroot->join_cur_levelso newly built joinrels land in the right list.make_rels_by_clause_joinsjoinsold_relto each member ofother_rels(the initial rels) that is disjoint and linked by a relevant join clause or ahave_join_order_restriction.make_rels_by_clauseless_joinsjoinsold_relto every disjoint member unconditionally — used for relations with no usable join clause and for the last-ditch pass.has_join_restriction/have_relevant_joinclause/have_join_order_restrictionare the join-graph adjacency tests that decide which pairs are worth costing (defined injoininfo.c).
Join-rel construction and the memo (joinrels.c)
Section titled “Join-rel construction and the memo (joinrels.c)”make_join_relis the universal entry for “build the join of these two rels”:bms_unionthe relids, gate onjoin_is_legal, fold in outer-join relids viaadd_outer_joins_to_relids, find-or-build theRelOptInfowithbuild_join_rel, and populate paths. ReturnsNULLfor an illegal pair.join_is_legaldetermines whether the proposed join obeys outer-join, semi/anti-join, and LATERAL ordering constraints, and reports theSpecialJoinInfoand whether the inputs must be reversed.init_dummy_sjinfofabricates a minimalSpecialJoinInfofor a plain inner join (which normally has none) so selectivity code knows what is being joined.build_join_rel(inrelnode.c) does the find-or-create againstjoin_rel_hashkeyed by the relid set — this is the DP memo. The first caller for a relid set allocates; later callers reuse and add paths.populate_joinrel_with_pathsswitches onsjinfo->jointypeand callsadd_paths_to_joinrelfor each legal orientation, marking the joinrel dummy when a side is provably empty or a constant-false restriction applies.
Path enumeration for a fixed pair (joinpath.c)
Section titled “Path enumeration for a fixed pair (joinpath.c)”These are the join-method search, invoked once the order pair is fixed.
They belong to postgres-path-generation.md; named here to close the chain.
add_paths_to_joinrelcomputesJoinPathExtraData(mergeclause list, inner-unique flag,param_source_rels) and then runs the four path builders in order:sort_inner_and_outer(mergejoin with both sides sorted),match_unsorted_outer(nestloop + mergejoin with an already-ordered outer),hash_inner_and_outer(hash join), and finally the FDW andset_join_pathlist_hookextension hooks.sort_inner_and_outerbuilds explicit-sort mergejoin paths from the cheapest-total inputs, one per candidate leading pathkey.match_unsorted_outerbuilds nestloop paths (and ordered-outer mergejoins); it is where thenestjoinOK/useallclausesjointype dispatch lives.hash_inner_and_outerscans the restrictlist for hashable clauses and builds hash-join paths from cheapest-startup and cheapest-total outers against the cheapest-total inner.try_nestloop_path/try_mergejoin_path/try_hashjoin_pathare the cost-and-keep helpers: each does aninitial_cost_*screen, computesrequired_outer, and callsadd_pathto insert the path into the joinrel’s pathlist if it survives Pareto comparison.
Genetic fallback (geqo_main.c + geqo_eval.c)
Section titled “Genetic fallback (geqo_main.c + geqo_eval.c)”geqois the GA driver. It sizes the pool (gimme_pool_size) and generation count (gimme_number_generations), random-initializes and sorts the pool by fitness, then loops:geqo_selectionpicks two parents, a recombination operator (ERX by default) crosses them into akidtour,geqo_evalscores the kid, andspread_chromoreinserts it. After all generations it returnsgimme_treeon the best tour.gimme_pool_sizedefaults the population to2^(nr_rel+1), clamped to[10*Geqo_effort, 50*Geqo_effort](i.e. 10–100 .. 50–500).gimme_number_generationsdefaults the generation count to the pool size.geqo_evalis the fitness function. It opens a private memory context and a temporaryjoin_rel_hash, saves and later truncatesjoin_rel_list(so repeated evaluations do not leak memo entries), callsgimme_tree, and returns the resultingcheapest_total_path->total_cost(orDBL_MAXif the tour yields no legal plan).gimme_treeturns a gene tour into a join relation using a clump-merging heuristic: each relation in tour order is added to the first existing “clump” it candesirable_joinwith (viamerge_clump), postponing illegal/undesirable joins; leftover clumps are then force-joined. This lets GEQO produce correct bushy plans even when the raw tour order is illegal. It calls the very samemake_join_relas the exhaustive search.merge_clumprecursively absorbs a clump into the first joinable clump, re-runningset_cheapeston each formed joinrel, keeping larger clumps first; withforce=trueit joins anywhere legal (allowing Cartesian).desirable_joinis the clump heuristic: join only if there is a relevant join clause or a join-order restriction; otherwise postpone.
A point easy to miss on first reading: GEQO must not pollute the shared
optimizer state across its thousands of fitness evaluations. geqo_eval
brackets every gimme_tree call with state save/restore — it records
list_length(root->join_rel_list) and root->join_rel_hash, runs inside a
throwaway AllocSetContext, and on the way out list_truncates the join-rel
list back to its saved length, restores the hash pointer, and deletes the
context. Without this, each evaluated tour would leak its joinrels into the
global memo and the next tour would “find” stale entries. This is the
mechanism the standard_join_search header comment alludes to when it warns
plugin authors that “the functions invoked during standard_join_search()
modify root->join_rel_list and root->join_rel_hash … See geqo_eval() for an
example” of saving and restoring them — GEQO is the in-tree precedent for any
hook that wants to run multiple join searches.
Position hints (as of 2026-06-05, REL_18 273fe94)
Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”| Symbol | File | Line |
|---|---|---|
make_one_rel | src/backend/optimizer/path/allpaths.c | 171 |
make_rel_from_joinlist | src/backend/optimizer/path/allpaths.c | 3352 |
join_search_hook (global) | src/backend/optimizer/path/allpaths.c | 88 |
standard_join_search | src/backend/optimizer/path/allpaths.c | 3457 |
join_search_one_level | src/backend/optimizer/path/joinrels.c | 73 |
make_rels_by_clause_joins | src/backend/optimizer/path/joinrels.c | 280 |
make_rels_by_clauseless_joins | src/backend/optimizer/path/joinrels.c | 314 |
join_is_legal | src/backend/optimizer/path/joinrels.c | 350 |
init_dummy_sjinfo | src/backend/optimizer/path/joinrels.c | 661 |
make_join_rel | src/backend/optimizer/path/joinrels.c | 696 |
add_outer_joins_to_relids | src/backend/optimizer/path/joinrels.c | 793 |
populate_joinrel_with_paths | src/backend/optimizer/path/joinrels.c | 885 |
add_paths_to_joinrel | src/backend/optimizer/path/joinpath.c | 124 |
try_nestloop_path | src/backend/optimizer/path/joinpath.c | 831 |
try_mergejoin_path | src/backend/optimizer/path/joinpath.c | 1029 |
try_hashjoin_path | src/backend/optimizer/path/joinpath.c | 1222 |
sort_inner_and_outer | src/backend/optimizer/path/joinpath.c | 1357 |
match_unsorted_outer | src/backend/optimizer/path/joinpath.c | 1812 |
hash_inner_and_outer | src/backend/optimizer/path/joinpath.c | 2220 |
select_mergejoin_clauses | src/backend/optimizer/path/joinpath.c | 2501 |
geqo | src/backend/optimizer/geqo/geqo_main.c | 72 |
gimme_pool_size | src/backend/optimizer/geqo/geqo_main.c | 320 |
gimme_number_generations | src/backend/optimizer/geqo/geqo_main.c | 352 |
geqo_eval | src/backend/optimizer/geqo/geqo_eval.c | 57 |
gimme_tree | src/backend/optimizer/geqo/geqo_eval.c | 163 |
merge_clump | src/backend/optimizer/geqo/geqo_eval.c | 238 |
desirable_join | src/backend/optimizer/geqo/geqo_eval.c | 325 |
geqo_threshold (GUC) | src/backend/utils/misc/guc_tables.c | 2223 |
join_rel_level field | src/include/nodes/pathnodes.h | 315 |
Source verification (as of 2026-06-05)
Section titled “Source verification (as of 2026-06-05)”All claims below were checked against the REL_18_STABLE working tree at
commit 273fe94852b3a7e34fd171e8abdf1481beb302fa (PG 18.x).
- Dispatch threshold and branch.
make_rel_from_joinlist(allpaths.c) contains exactly the three-way branch quoted —join_search_hook, thenenable_geqo && levels_needed >= geqo_threshold→geqo, elsestandard_join_search. Verified by reading lines 3400–3424. geqo_thresholddefault is 12.guc_tables.cdeclares&geqo_threshold, 12, 2, INT_MAX(default 12, min 2). The companionDEFAULT_GEQO_EFFORT 5lives insrc/include/optimizer/geqo.h; pool size thus ranges 50–500 / floor 10–100, matching thegimme_pool_sizecomment50 to 500 individuals/10 to 100 individuals.join_rel_levelis the DP array. The field and its comment exist onPlannerInfoinpathnodes.h(read_write_ignore), andstandard_join_searchallocates it withpalloc0((levels_needed + 1) * sizeof(List *)), seeds[1] = initial_rels, and nulls it again before returning. Verified.- Memo via find-or-create.
make_join_relcallsbuild_join_rel(root, joinrelids, ...); the relid-set find-or-create semantics (and the consequent “same joinrel reached by many pairs” memoization) are stated in themake_rels_by_clause_joinsheader comment injoinrels.c(“at levels above 2 we will generate the same joined relation in multiple ways … the join_rel_level mechanism … automatically ensures that each new joinrel is only added to the list once”). Verified. - Bushy loop bound. The bushy
for (k = 2;; k++)loop breaks atk > other_level(halfway point) and only callsmake_join_relfor disjoint pairs withhave_relevant_joinclauseorhave_join_order_restriction. Verified at joinrels.c lines 148–198. - Last-ditch Cartesian. The
if (joinrels[level] == NIL)block re-runsmake_rels_by_clauseless_joins, and the subsequentelog(ERROR, "failed to build any %d-way joins", level)fires only when there are no special joins and no LATERAL RTEs. Verified. - GEQO reuses
make_join_rel.merge_clumpingeqo_eval.ccallsmake_join_rel(root, old_clump->joinrel, new_clump->joinrel)— i.e. the genetic path and the exhaustive path share the identical join-rel construction and path-generation machinery. Verified at geqo_eval.c lines 260–262. - GEQO memo hygiene.
geqo_evalsaveslist_length(root->join_rel_list)androot->join_rel_hash, nulls the hash, runsgimme_treein a privateAllocSetContextCreate("GEQO", ...), thenlist_truncatesjoin_rel_listback and restores the hash and deletes the context. Verified at geqo_eval.c lines 95–131. - Default recombination operator.
geqo.h’s#errorguard requires one of ERX/PMX/CX/PX/OX1/OX2; the default build definesERX(edge recombination crossover), which is the#if defined (ERX)branch taken ingeqo()’s generation loop. The CX-onlygeqo_mutationinclude is guarded by#if defined(CX). Verified. - Symmetry handling.
make_rels_by_clause_joinsis called with afirst_relofforeach_current_index(r) + 1only atlevel == 2, and the bushy loop uses the same offset only whenk == other_level; both reflect the “make_join_relhandles both x,y and y,x” symmetry stated in its header comment and thejoin_search_one_levelblock comments. Verified. add_paths_to_joinrelstep ordering. The numbered comments1.(sort_inner_and_outer),2.(match_unsorted_outer), the diked-out3.,4.(hash_inner_and_outer),5.(FDW),6.(set_join_pathlist_hook) appear verbatim in joinpath.c; step 3 (match_unsorted_inner) is inside#ifdef NOT_USED. Verified at joinpath.c lines 278–345.- REL_18 scope guard. No XLOG2 rmgr or
B_DATACHECKSUMSWORKER_*symbols appear; the join-search code is core (src/backend/optimizer/...), not contrib. The header comments still readCopyright (c) 1996-2025, which is the REL_18 source as shipped.
Taken together, these checks confirm the document’s central claim: a single
dispatch in make_rel_from_joinlist selects between an exhaustive DP that
fills join_rel_level bottom-up and a genetic search that samples join-order
tours, both funneling through the identical make_join_rel →
build_join_rel (memo) → add_paths_to_joinrel (path search) pipeline, with
the geqo_threshold GUC as the only knob that moves the boundary between the
two regimes.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”PostgreSQL’s join enumerator sits at one well-understood point in a large design space. Naming the alternatives clarifies why PostgreSQL made the choices it did and where the active research lies.
1. Top-down (Cascades) vs. bottom-up (Selinger). PostgreSQL is firmly
bottom-up DP: it builds size-2 joins, then size-3, and so on. The chief
modern alternative is the top-down, transformation-based framework of
Graefe’s Volcano/Cascades optimizer (used by SQL Server and Apache
Calcite). Cascades represents the search space as a Memo of equivalence
groups and applies transformation rules (join commutativity, associativity,
operator substitution) lazily, driven by a goal-oriented search with
branch-and-bound pruning and memoization, so it can stop exploring a
group once a cost bound is exceeded. The trade-off: Cascades is more
extensible (new operators = new rules) and prunes more aggressively, but is
heavier machinery; Selinger-style DP is simpler, exhaustive within its
regime, and easier to reason about — which suits PostgreSQL’s “no surprises”
philosophy. Both share the memo keyed by relation-set idea; PostgreSQL’s
join_rel_hash is a stripped-down memo without the rule engine.
2. The DPccp lineage — enumerating only connected subgraphs. The classic
academic refinement of Selinger DP is DPsize / DPsub / DPccp (Moerkotte &
Neumann, “Analysis of Two Existing and One New Dynamic Programming Algorithm
for the Generation of Optimal Bushy Join Trees”, VLDB 2006). DPccp enumerates
exactly the connected subgraphs and their complement pairs (csg-cmp-pairs)
of the join graph, in an order that touches each only once, and is provably
optimal in the number of join pairs it considers for bushy trees.
PostgreSQL’s join_search_one_level is a coarser approximation: it iterates
by level and uses have_relevant_joinclause to skip disconnected pairs, but
it does not implement the tight csg-cmp-pair enumeration, so it can
redundantly revisit work (the source comments openly accept “duplicated
effort” in some clauseless cases). This is a known gap; DPhyp (the
hypergraph extension handling complex join predicates) is the state of the
art that PostgreSQL has not adopted.
3. The threshold problem and GEQO’s age. PostgreSQL’s switch to a genetic
algorithm at geqo_threshold = 12 is pragmatic but blunt. A genetic
algorithm gives no optimality guarantee and its results are nondetermin-
istic across geqo_seed values, which surprises users who expect stable
plans. Modern systems often push exhaustive DP much further (DPhyp handles
20–25 relations comfortably) or use adaptive / cost-bounded schemes
rather than a hard cutoff. GEQO is essentially unchanged 1990s code (the
Utesch contribution, adapted from Whitley’s Genitor); raising
geqo_threshold or disabling GEQO entirely is common operational advice for
deterministic plans, accepting longer planning time. There is recurring
community discussion about replacing GEQO with a more modern heuristic or a
better DP algorithm, but the exhaustive path’s O(2^n) constants are the
real obstacle.
It is worth being precise about what GEQO does and does not guarantee. It is
a generational genetic algorithm with a steady-state reinsertion
(spread_chromo drops the worst, not a whole-generation replace), edge
recombination crossover by default (ERX, which tends to preserve adjacency
information from both parents — apt for a tour-like genome), and selection
pressure tuned by Geqo_selection_bias. None of this guarantees the global
optimum, and the pool size (2^(nr_rel+1) clamped to [10..50] * Geqo_effort)
and generation count (defaulting to the pool size) are heuristics chosen for
“good enough, fast enough”, not for convergence proofs. In practice GEQO’s
plans are usually within a small factor of optimal, but the variance across
seeds is real, and a single bad evaluation that returns DBL_MAX (an illegal
tour gimme_tree could not repair) simply drops out of the gene pool. The
honest summary is that GEQO trades the DP’s optimality-within-cost-model
guarantee for tractability, and the geqo_threshold GUC is the user’s dial
on where that trade kicks in.
4. Adaptive and learned query optimization. The deepest critique of the whole Selinger edifice is that it commits to a single plan before execution based on estimated cardinalities — and cardinality estimation errors compound multiplicatively through a join tree, so the “optimal” plan over bad estimates can be arbitrarily bad. Two research fronts respond:
- Adaptive query processing — e.g. Eddies (Avnur & Hellerstein,
captured in
knowledge/research/dbms-papers/eddies.md) abandons a fixed join order entirely, routing each tuple through operators dynamically so the effective order can change during execution in response to observed selectivities. PostgreSQL has nothing of this kind; its order is frozen at plan time. - Learned optimizers — recent systems (Neo, Bao, and the broader “ML for
query optimization” line) use reinforcement learning or learned cost models
to choose join orders, treating the enumerator’s output as a policy to be
improved from execution feedback. These remain largely research / extension
territory; PostgreSQL’s core stays with analytic cost models, though its
join_search_hookis precisely the seam such a system would plug into.
5. What PostgreSQL gets right. Despite its age, the design has durable
virtues: the find-or-create memo is correct and simple; bushy search with
clause-connectivity pruning covers the cases that matter (star schemas,
forced-bushy outer joins) without full DPccp complexity; and the clean
join_search_hook / set_join_pathlist_hook seams let pg_hint_plan and
experimental optimizers intervene without forking the planner. The two-regime
structure — exhaustive when feasible, heuristic when not — is exactly the
shape Selinger’s paper anticipated, four decades on.
Sources
Section titled “Sources”- Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., &
Price, T. G. (1979). Access Path Selection in a Relational Database
Management System. SIGMOD. The origin of cost-based, bottom-up DP join
ordering, interesting orders, and selectivity-based cost estimation.
Local capture:
knowledge/research/dbms-papers/systemr-optimizer.md. - Avnur, R. & Hellerstein, J. M. (2000). Eddies: Continuously Adaptive
Query Processing. SIGMOD. The adaptive-ordering counterpoint to fixed
Selinger ordering. Local capture:
knowledge/research/dbms-papers/eddies.md. - Moerkotte, G. & Neumann, T. (2006). Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees (DPccp), VLDB; and DPhyp (2008). The optimal connected-subgraph enumeration PostgreSQL approximates. (Reference, not locally captured.)
- PostgreSQL REL_18_STABLE source (commit
273fe94):src/backend/optimizer/path/allpaths.c—make_rel_from_joinlist,standard_join_search,make_one_rel.src/backend/optimizer/path/joinrels.c—join_search_one_level,make_join_rel,make_rels_by_clause_joins,populate_joinrel_with_paths.src/backend/optimizer/path/joinpath.c—add_paths_to_joinreland the sort/nestloop/hash path builders.src/backend/optimizer/geqo/geqo_main.c,src/backend/optimizer/geqo/geqo_eval.c— the genetic fallback.src/backend/optimizer/README— the planner’s own narrative of join search, parameterized paths, and the param_source_rels heuristic.src/include/nodes/pathnodes.h—PlannerInfo.join_rel_level,RelOptInfo.src/include/optimizer/geqo.h,src/backend/utils/misc/guc_tables.c— GEQO defaults and thegeqo_thresholdGUC.
- Cross-references within this KB:
postgres-planner-overview.md(how the joinlist is produced and how the optimizer fits the planner),postgres-path-generation.md(the per-pair path search this doc hands off to),postgres-cost-model.md(the cost functions that driveadd_path), andpostgres-executor.md(how the chosen join plan executes).