Skip to content

PostgreSQL Join Ordering — Dynamic-Programming Search and the GEQO Fallback

Contents:

For any query that joins more than two relations, the optimizer faces two nested decisions. The first is which order to join the relations in((A ⋈ B) ⋈ C) versus (A ⋈ (B ⋈ C)) versus ((A ⋈ C) ⋈ B) and so on. The second, for a fixed order, is which physical algorithm to run at each join (nested loop, sort-merge, hash) and which access path to read each base relation through (sequential scan, index scan, bitmap scan). This document is about the first decision — join enumeration — and treats the second as a delegated sub-problem (covered in postgres-path-generation.md and postgres-cost-model.md).

The reason join ordering dominates is that the number of orderings explodes combinatorially while the per-order work is comparatively bounded. The number of distinct join trees over n relations, counting both left-deep and bushy shapes and both orderings of every join, is

(2(n-1))! / (n-1)!

which is 2 for n=2, 12 for n=3, 120 for n=4, 1680 for n=5, and grows super-exponentially thereafter. Even restricting to left-deep trees there are n! orderings. No engine can cost every tree for large n, so every optimizer must either prune the space or search it heuristically. The choice of which order matters enormously: a bad order builds a huge intermediate result that a good order would have avoided entirely, and intermediate size multiplies through the rest of the plan.

The foundational treatment is Selinger et al., Access Path Selection in a Relational Database Management System (SIGMOD 1979 — the System R optimizer paper, captured in knowledge/research/dbms-papers/systemr-optimizer.md). Two of its ideas are load-bearing for everything PostgreSQL does:

  1. Cost-based, not rule-based. System R assigns every candidate plan a numeric cost — a weighted sum of estimated page fetches and CPU instructions, computed from catalog statistics (cardinalities, selectivities) — and picks the minimum. The paper introduces the selectivity factors and the cost formulae that PostgreSQL’s cost model still echoes.

  2. Dynamic programming over relation subsets. Rather than enumerate whole trees, System R builds the answer bottom-up by subset size. It first finds the best way to access each single relation; then, for every pair, the best two-way join; then, for every triple, the best three-way join, reusing the already-computed best plans for the constituent subsets. The key insight is the principle of optimality: the cheapest plan for joining a set S of relations must be built from a cheapest plan for some subset of S joined to the rest. So the optimizer need only keep, for each subset of relations, the cheapest plan(s) producing that subset — a memo. This collapses the (2(n-1))!/(n-1)! tree space to roughly 2^n subsets, each costed a bounded number of times.

To make the explosion concrete, consider a five-way join A⋈B⋈C⋈D⋈E. There are 1680 distinct join trees (bushy, both child orders counted) but only 2^5 - 1 = 31 non-empty subsets of {A,B,C,D,E}, of which only those with 2+ members (26 of them) are join relations. DP costs each of those 26 relations once and, for each, considers the handful of ways to split it into two already-costed sub-relations. The asymptotic difference is the whole game: tree count is Θ(4^n / n^{1.5}), subset count is 2^n. At geqo_threshold = 12 relations, 2^12 = 4096 subsets is still tractable; the 1680-style tree count for 12 relations is astronomically larger, which is exactly why the cutoff sits where it does.

Selinger’s System R kept only left-deep trees and pruned away non-interesting orders, trading completeness for tractability. The paper also introduced interesting orders: a sub-plan that produces its output sorted in an order useful to a later operator (a merge join, an ORDER BY, a GROUP BY) is retained even if it is not the cheapest, because the sort it provides may save a later sort. That nuance — that “cheapest” is not a single scalar but is qualified by output ordering and by parameterization — is why PostgreSQL keeps a set of non-dominated paths per relation rather than one winner, and is detailed in postgres-path-generation.md.

The interesting-orders idea deserves a second look because it is the reason “the cheapest plan” is not a scalar. Selinger observed that a sub-plan producing relation {A,B} sorted on the column that a later merge join needs is worth keeping even if a cheaper unsorted plan for {A,B} exists — the sort it already provides can save the later merge join from sorting, and that downstream saving may exceed the upfront premium. So the optimizer must retain, per relation set, not the single cheapest plan but the Pareto frontier over (cost, output-order) — and, in modern PostgreSQL, over (cost, output-order, parameterization). This is why the join-search code repeatedly calls set_cheapest (to pick scalar winners for the next level’s building) yet keeps a whole pathlist (to preserve the frontier). The distinction — scalar winner for construction, frontier for retention — recurs throughout the implementation.

The DP approach has one well-known failure mode: 2^n itself becomes intractable past a few dozen relations, and the constant factors (each subset spawns many candidate joins and paths) bite well before that. For large join problems the literature turns to randomized / heuristic search — iterative improvement, simulated annealing, and genetic algorithms that treat a join order as a “tour” (by analogy to the Travelling Salesman Problem) and evolve a population of tours toward lower cost. PostgreSQL implements exactly this two-regime design: exhaustive DP for small problems, a genetic algorithm (GEQO) past a threshold.

The theory fixes the skeleton — cost-based DP over relation subsets, with a heuristic escape hatch for large problems. This section names the engineering conventions that production optimizers share when they implement it, so that PostgreSQL’s specific code reads as one point in a known design space.

1. A memo keyed by relation-set. Every DP optimizer needs a table that maps a set of base relations to the best plan(s) producing that set. The set is the key precisely because join order is associative and commutative at the logical level: (A ⋈ B) ⋈ C and (B ⋈ C) ⋈ A produce the same relation {A,B,C} and must collapse to one memo entry, against which both construction routes compete. Whether the engine calls this a “memo” (Cascades / SQL Server / Calcite), a “MEMO structure” of “groups”, or — in PostgreSQL — the join_rel_list / join_rel_hash of RelOptInfo nodes keyed by a Relids bitmapset, the role is identical: find-or-create by relid set, then add candidate plans to the existing node.

A subtlety the textbook glosses but every implementation must confront: the memo entry for a relation set does not hold one plan, it holds a small set of non-dominated plans. A plan that is more expensive overall may still be worth keeping if it delivers its output in a sort order a later operator needs (Selinger’s “interesting orders”), or if it is parameterized so it can be the inner side of a parameterized nested loop. So “find-or-create by relid set” is followed by “merge this candidate into the kept set, discarding any it dominates and discarding itself if dominated” — the Pareto frontier maintained by what PostgreSQL calls add_path. The relid set is the memo key; the kept path set is the memo value.

2. Level-by-level construction (the DP order). The bottom-up variant (“Selinger-style DP”) iterates by subset size: build all size-2 joins from size-1 rels, all size-3 from {size-1 ⋈ size-2}, and so on. This guarantees that when you build a size-k join, every smaller subset it could be built from already has its best plan computed. PostgreSQL materializes this as an explicit array indexed by level, join_rel_level[k].

3. Tree shape: left-deep vs. bushy. A left-deep tree always joins an accumulated result to a single base relation; a bushy tree may join two composite intermediates. Left-deep is smaller to search and pipelines well for nested loops; bushy can be dramatically cheaper for star/snowflake schemas and is sometimes the only legal shape under outer-join ordering constraints. Many early optimizers (System R) searched left-deep only; modern ones, PostgreSQL included, search bushy trees too — but prune bushy joins to pairs that actually share a join clause, to keep the blow-up bounded.

4. Connected-subgraph pruning. Costing a Cartesian product (a join with no join clause linking the two sides) is almost always wasteful, so the enumerator prefers pairs (R, S) that are linked by a join predicate — i.e. adjacent in the join graph whose nodes are relations and whose edges are join clauses. Cartesian joins are generated only as a last resort when no clause-connected plan exists.

5. Join-order constraints from semantics. Inner joins commute and associate freely, but outer joins, semi/anti joins (from IN/EXISTS), and LATERAL references impose ordering restrictions: some relations must be joined before others, or cannot be reordered past an outer join without changing results. A correctness check (join_is_legal in PostgreSQL) must gate every proposed pair, and the enumerator must be willing to produce bushy plans when constraints make left-deep illegal.

6. The large-N escape hatch. Because 2^n and its constants eventually dominate, optimizers cap the exhaustive search and switch to a heuristic for big join problems: greedy left-deep (“minimum selectivity first”), randomized restarts, simulated annealing, or a genetic algorithm. The cap is usually a tunable threshold on the number of FROM-list items.

7. Symmetry exploitation. Because logical join is commutative, the enumerator would waste half its work if it considered both (R, S) and (S, R) as distinct order-search events. The standard trick is to make the join-rel constructor symmetric — given an unordered pair it internally tries both orientations as physical alternatives — and to have the enumerator visit each unordered pair once. PostgreSQL does both: make_join_rel(x, y) “handles both x,y and y,x cases”, and join_search_one_level uses a first_rel index at the symmetric levels (level == 2, and the bushy k == other_level case) to skip pairs already considered. The result is that order search enumerates unordered pairs while path search still gets to choose which side is outer.

8. Cost-model coupling. The enumerator is only as good as the cardinality and cost estimates that drive add_path’s domination test. A poor row estimate for an intermediate join propagates: it makes the wrong sub-plan look cheapest, which then becomes the building block for the next level. This is the Achilles heel every Selinger-style optimizer shares, and the reason the “Beyond” section below devotes space to adaptive and learned approaches that revisit ordering at execution time. The DP machinery itself is correct; its inputs are the soft spot.

flowchart TD
  J["make_rel_from_joinlist<br/>levels_needed = #jointree items"] --> Q{"enable_geqo AND<br/>levels_needed >= geqo_threshold?"}
  Q -- "no (small)" --> DP["standard_join_search<br/>exhaustive DP"]
  Q -- "yes (large)" --> GEQO["geqo<br/>genetic search"]
  DP --> L["join_rel_level[1..n]<br/>filled level by level"]
  GEQO --> P["pool of join-order tours<br/>evolved over generations"]
  L --> TOP["top-level RelOptInfo<br/>cheapest_total_path"]
  P --> TOP

PostgreSQL is a faithful, modern implementation of Selinger-style DP, with a genetic-algorithm fallback bolted on at a configurable threshold. The whole join-search subsystem is reached from one function, make_rel_from_joinlist (in allpaths.c), which the planner calls after it has flattened the FROM clause into a joinlist (a possibly-nested list of RangeTblRef leaves and sub-lists; see postgres-planner-overview.md for how the joinlist is produced by deconstruct_jointree).

The entry point counts the items and dispatches:

// make_rel_from_joinlist — src/backend/optimizer/path/allpaths.c
levels_needed = list_length(joinlist);
// ... build initial_rels (one RelOptInfo per joinlist child, recursing
// into sub-lists) ...
if (levels_needed == 1)
return (RelOptInfo *) linitial(initial_rels);
else
{
root->initial_rels = initial_rels;
if (join_search_hook)
return (*join_search_hook) (root, levels_needed, initial_rels);
else if (enable_geqo && levels_needed >= geqo_threshold)
return geqo(root, levels_needed, initial_rels);
else
return standard_join_search(root, levels_needed, initial_rels);
}

levels_needed is the number of independent jointree items — not the raw table count, because explicit JOIN syntax and flattened sub-selects can group items. The three-way branch is the design in miniature: a loadable extension (join_search_hook, e.g. pg_hint_plan) can replace the search entirely; otherwise the geqo_threshold GUC (default 12) selects the genetic search for large problems and the exhaustive DP for everything else.

The DP memo. The thing that makes this dynamic programming and not brute force is root->join_rel_level, an array of List * indexed by level. The field is documented on PlannerInfo:

// PlannerInfo — src/include/nodes/pathnodes.h
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
* is a list of all the join-relations of k items ...
* join_cur_level is the current level. New join-relation RelOptInfos are
* automatically added to the join_rel_level[join_cur_level] list.
* join_rel_level is NULL if not in use.
*/
List **join_rel_level pg_node_attr(read_write_ignore);
int join_cur_level;

Crucially, the uniqueness of a join relation is keyed by its set of relids, not by how it was built. build_join_rel (called from make_join_rel) does a find-or-create against join_rel_hash: the first construction route that produces {A,B,C} allocates the RelOptInfo; every later route that produces the same set finds the existing node and merely adds more paths to it. That is the memo. join_search_one_level adds each freshly-built joinrel to join_rel_level[join_cur_level] exactly once, regardless of how many pairs build it.

standard_join_search — the level loop. Level 1 is seeded with the initial rels; each subsequent level is built by join_search_one_level, and each new joinrel is finalized with set_cheapest:

// standard_join_search — src/backend/optimizer/path/allpaths.c
root->join_rel_level = (List **) palloc0((levels_needed + 1) * sizeof(List *));
root->join_rel_level[1] = initial_rels;
for (lev = 2; lev <= levels_needed; lev++)
{
join_search_one_level(root, lev);
foreach(lc, root->join_rel_level[lev])
{
rel = (RelOptInfo *) lfirst(lc);
generate_partitionwise_join_paths(root, rel);
if (!bms_equal(rel->relids, root->all_query_rels))
generate_useful_gather_paths(root, rel, false);
set_cheapest(rel);
}
}
// final level must contain exactly one rel — the whole join
Assert(list_length(root->join_rel_level[levels_needed]) == 1);
rel = (RelOptInfo *) linitial(root->join_rel_level[levels_needed]);

After the last level there must be exactly one rel (the join of everything), and its cheapest_total_path is the chosen plan. set_cheapest is what distills the per-rel pathlist (a set of non-dominated paths kept by add_path) down to the cheapest-startup and cheapest-total winners that the next level will build on.

join_search_one_level — the enumerator. This is the heart of the strategy. For a target level, it considers three families of constructions:

  1. Left/right-sided joins — each (level-1)-rel joined to an initial (level-1, i.e. single) rel it shares a clause with. This is the left-deep backbone.
  2. Bushy joins — each k-rel joined to a (level-k)-rel for 2 <= k <= level-2, only when a relevant join clause or order restriction links them.
  3. Last-ditch Cartesian — if neither produced any joinrel at this level, force clauseless joins so the search cannot dead-end.
// join_search_one_level — src/backend/optimizer/path/joinrels.c
foreach(r, joinrels[level - 1])
{
RelOptInfo *old_rel = (RelOptInfo *) lfirst(r);
if (old_rel->joininfo != NIL || old_rel->has_eclass_joins ||
has_join_restriction(root, old_rel))
{
int first_rel;
if (level == 2) /* avoid considering the same pair twice */
first_rel = foreach_current_index(r) + 1;
else
first_rel = 0;
make_rels_by_clause_joins(root, old_rel, joinrels[1], first_rel);
}
else
make_rels_by_clauseless_joins(root, old_rel, joinrels[1]);
}

The level == 2 first_rel trick exploits symmetry — at level 2 joining (a) to (b) is the same problem as (b) to (a), so it only looks at rels after the current one. The bushy loop walks k from 2 to the halfway point (since make_join_rel(x,y) already handles both orderings) and only joins pairs with !bms_overlap (disjoint relid sets) that have a relevant clause or restriction:

// join_search_one_level (bushy section) — joinrels.c
for (k = 2;; k++)
{
int other_level = level - k;
if (k > other_level)
break; /* only go to the halfway point */
foreach(r, joinrels[k])
{
/* ... skip rels with no join clause and no restriction ... */
for_each_from(r2, joinrels[other_level], first_rel)
{
RelOptInfo *new_rel = (RelOptInfo *) lfirst(r2);
if (!bms_overlap(old_rel->relids, new_rel->relids))
{
if (have_relevant_joinclause(root, old_rel, new_rel) ||
have_join_order_restriction(root, old_rel, new_rel))
(void) make_join_rel(root, old_rel, new_rel);
}
}
}
}

make_join_rel — legality, memo, paths. Every candidate pair flows through make_join_rel. It computes the union of relids, checks join_is_legal (rejecting orderings forbidden by outer/semi-join or LATERAL constraints — returns NULL for an illegal pair), finds-or-builds the memo RelOptInfo, and then calls populate_joinrel_with_paths:

// make_join_rel — src/backend/optimizer/path/joinrels.c
joinrelids = bms_union(rel1->relids, rel2->relids);
if (!join_is_legal(root, rel1, rel2, joinrelids, &sjinfo, &reversed))
return NULL; /* invalid join order */
joinrelids = add_outer_joins_to_relids(root, joinrelids, sjinfo,
&pushed_down_joins);
if (reversed) { /* swap rel1/rel2 to match sjinfo */ }
if (sjinfo == NULL) { sjinfo = &sjinfo_data; init_dummy_sjinfo(...); }
joinrel = build_join_rel(root, joinrelids, rel1, rel2,
sjinfo, pushed_down_joins, &restrictlist);
if (is_dummy_rel(joinrel)) return joinrel; /* provably empty */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo, restrictlist);
return joinrel;

populate_joinrel_with_paths switches on the join type and calls add_paths_to_joinrel once for each legal (outer, inner) orientation — e.g. for a plain inner join it tries both (rel1 outer, rel2 inner) and (rel2 outer, rel1 inner), because nested-loop and hash joins are not symmetric in cost (the inner side is rescanned / hashed). This is where the join-order search hands off to the join-method and path search, which add_paths_to_joinrel performs by calling sort_inner_and_outer, match_unsorted_outer, and hash_inner_and_outer in turn — covered in the next section and, in depth, in postgres-path-generation.md and postgres-cost-model.md.

flowchart TD
  S1["join_search_one_level(level)"] --> A["left/right-sided:<br/>make_rels_by_clause_joins<br/>(level-1) x initial rels"]
  S1 --> B["bushy: k x (level-k)<br/>with relevant clause only"]
  S1 --> C{"joinrels[level] empty?"}
  C -- yes --> D["last-ditch:<br/>make_rels_by_clauseless_joins"]
  A --> MJ["make_join_rel(rel1, rel2)"]
  B --> MJ
  D --> MJ
  MJ --> LEG{"join_is_legal?"}
  LEG -- no --> X["return NULL"]
  LEG -- yes --> BJR["build_join_rel<br/>find-or-create by relids = memo"]
  BJR --> PP["populate_joinrel_with_paths"]
  PP --> APJ["add_paths_to_joinrel<br/>sort-merge / nestloop / hash"]

A worked level-by-level trace. Take a four-item query SELECT ... FROM a, b, c, d WHERE a.x=b.x AND b.y=c.y AND c.z=d.z — a chain join graph a—b—c—d. levels_needed = 4, below the threshold, so standard_join_search runs. The DP array fills like this:

Leveljoin_rel_level[k] contentsHow each was built
1{a}, {b}, {c}, {d}the initial_rels seed
2{a,b}, {b,c}, {c,d}clause-joins; {a,c},{a,d},{b,d} skipped — no join clause
3{a,b,c}, {b,c,d}{a,b}⋈c, c⋈{a,b}; {b,c}⋈a, {b,c}⋈d; {c,d}⋈b
4{a,b,c,d}{a,b,c}⋈d (left-deep) and {a,b}⋈{c,d} (bushy)

The instructive row is level 4: the final relation {a,b,c,d} is reached by two construction routes — the left-deep {a,b,c}⋈d from make_rels_by_clause_joins, and the bushy {a,b}⋈{c,d} from the k=2, other_level=2 branch of the bushy loop. Both call make_join_rel, both resolve via build_join_rel to the same RelOptInfo (keyed on the relid set {a,b,c,d}), and both contribute paths to its single pathlist; set_cheapest then picks the global winner across all routes. That collapse is the dynamic-programming memo doing its job: the relation is costed as one node no matter how many ways the enumerator found to assemble it. Note also the level-2 omissions — {a,c}, {a,d}, {b,d} are never built because no join clause links them, so the connected-subgraph pruning keeps even this small problem from costing Cartesian intermediates.

Legality gating and outer joins. Inner joins commute and associate freely, but the moment outer joins, semi/anti joins, or LATERAL references enter, most orderings become illegal — reordering a relation past an outer join changes which rows get null-extended, and a LATERAL subquery must be joined after the relations it references. make_join_rel therefore gates every pair on join_is_legal before doing any work, and bails to NULL for an illegal pair so no RelOptInfo and no paths are ever created for it:

// make_join_rel — src/backend/optimizer/path/joinrels.c
Assert(!bms_overlap(rel1->relids, rel2->relids));
joinrelids = bms_union(rel1->relids, rel2->relids);
if (!join_is_legal(root, rel1, rel2, joinrelids, &sjinfo, &reversed))
{
bms_free(joinrelids);
return NULL; /* invalid join order — drop it */
}

join_is_legal consults root->join_info_list (one SpecialJoinInfo per non-inner join, carrying min_lefthand / min_righthand relid sets that encode the legal operand ranges) and the LATERAL bookkeeping. Its two outputs steer the rest of make_join_rel: *sjinfo_p (the matched join’s SpecialJoinInfo, or NULL for a plain inner join — in which case init_dummy_sjinfo fabricates a minimal one for the selectivity code) and *reversed_p (whether the two inputs must be swapped to match the join’s syntactic LHS/RHS). The presence of these constraints is also why the enumerator must build bushy trees: there are queries — nested semijoins are the canonical example — where no left-deep order is legal at some intermediate level, and only a bushy assembly at a higher level succeeds. The elog(ERROR, "failed to build any %d-way joins", level) in join_search_one_level is deliberately suppressed when root->join_info_list != NIL for exactly this reason: an empty level can be a legitimate, recoverable state when special joins are present.

Where order search ends and path search begins. Once make_join_rel has fixed an (outer, inner) orientation, add_paths_to_joinrel runs a fixed, numbered sequence of path builders. The numbering in the source is worth quoting because it is the contract between the two searches — the join-order layer above never sees this; it only sees the resulting pathlist:

// add_paths_to_joinrel — src/backend/optimizer/path/joinpath.c
/* 1. mergejoin with both relations explicitly sorted */
if (mergejoin_allowed)
sort_inner_and_outer(root, joinrel, outerrel, innerrel, jointype, &extra);
/* 2. nestloops + mergejoins where the outer is already ordered */
if (mergejoin_allowed)
match_unsorted_outer(root, joinrel, outerrel, innerrel, jointype, &extra);
/* 4. hash join (both sides hashed) */
if (enable_hashjoin || jointype == JOIN_FULL)
hash_inner_and_outer(root, joinrel, outerrel, innerrel, jointype, &extra);
/* 5/6. FDW join pushdown and the set_join_pathlist_hook extension seam */

Each builder ends in a try_*_path helper that screens the candidate with an initial_cost_* estimate and, if it survives, calls add_path to insert it into joinrel->pathlistadd_path being the Pareto filter that keeps the set of non-dominated paths (cheapest-total, plus any with a useful sort order or cheaper parameterization). The full content of these builders is the subject of postgres-path-generation.md; here the point is only that join ordering treats them as a black box returning costed paths.

The GEQO regime. When levels_needed >= geqo_threshold, geqo() runs instead of the DP. It is a textbook generational genetic algorithm whose “genome” is a tour — a permutation of relation indices — and whose fitness is plan cost. The driver loop is compact:

// geqo — src/backend/optimizer/geqo/geqo_main.c
pool_size = gimme_pool_size(number_of_rels);
number_generations = gimme_number_generations(pool_size);
pool = alloc_pool(root, pool_size, number_of_rels);
random_init_pool(root, pool);
sort_pool(root, pool); /* sort by fitness once; kids replace worst */
for (generation = 0; generation < number_generations; generation++)
{
geqo_selection(root, momma, daddy, pool, Geqo_selection_bias);
/* default build: edge recombination crossover (ERX) */
gimme_edge_table(root, momma->string, daddy->string, pool->string_length, edge_table);
kid = momma;
edge_failures += gimme_tour(root, edge_table, kid->string, pool->string_length);
kid->worth = geqo_eval(root, kid->string, pool->string_length);
spread_chromo(root, kid, pool); /* reinsert kid by fitness rank */
}
best_tour = (Gene *) pool->data[0].string;
best_rel = gimme_tree(root, best_tour, pool->string_length);

Three design points matter. First, the pool is sorted once and thereafter each child is slotted into rank order by spread_chromo, pushing out the worst individual — a steady-state-ish scheme adapted from Whitley’s Genitor. Second, fitness is real plan cost: geqo_eval actually builds the joinrels for the tour and reads cheapest_total_path->total_cost, so GEQO is costing the same paths the DP would, just over far fewer orders. Third, the search is seeded by geqo_seed, so its output is nondeterministic unless the seed is pinned — the operational tax for escaping the 2^n wall.

The clump-merging gimme_tree is what lets a random permutation still yield a legal plan: rather than blindly join in tour order (which could violate outer-join or LATERAL constraints), each relation joins the first existing clump it desirable_joins with, and leftover clumps are force-joined at the end. Because it routes through the same make_join_rel, every legality and costing rule the exhaustive search obeys is obeyed here too.

flowchart TD
  INIT["alloc_pool + random_init_pool<br/>sort_pool by fitness"] --> GEN{"generation < number_generations?"}
  GEN -- yes --> SEL["geqo_selection<br/>pick momma + daddy by linear bias"]
  SEL --> XOVER["recombination (ERX)<br/>crossover -> kid tour"]
  XOVER --> EVAL["geqo_eval -> gimme_tree<br/>fitness = cheapest_total_path cost"]
  EVAL --> SPREAD["spread_chromo<br/>reinsert kid, drop worst"]
  SPREAD --> GEN
  GEN -- no --> BEST["gimme_tree(best_tour)<br/>return cheapest join rel"]

The genetic loop and the DP loop are interchangeable from the caller’s view: both consume initial_rels and return a single top-level RelOptInfo with a cheapest_total_path. Only the coverage differs — DP guarantees the global optimum within its cost model, GEQO samples the space and returns the best of what it sampled. This substitutability is exactly why make_rel_from_joinlist can pick between them (and a hook) with a single if.

This section walks the call chain symbol by symbol, from the planner’s hand-off down into the genetic fallback. Anchor on the symbol names; the line numbers in the closing table are hints valid only as of the updated: date.

  • make_one_rel is the planner’s top of access-path generation; it sets base-rel paths (set_base_rel_pathlists) and then calls make_rel_from_joinlist to build the join tree. (Detailed in postgres-path-generation.md.)
  • make_rel_from_joinlist recurses over the joinlist building initial_rels, then dispatches on levels_needed: a join_search_hook override, geqo when enable_geqo && levels_needed >= geqo_threshold, or standard_join_search. A single jointree item returns immediately with no search.
  • join_search_hook (a join_search_hook_type global) lets extensions replace the search wholesale; it is the seam pg_hint_plan uses.
  • standard_join_search allocates root->join_rel_level, seeds level 1 with initial_rels, then loops lev from 2 to levels_needed, calling join_search_one_level and finalizing each new joinrel with generate_partitionwise_join_paths, generate_useful_gather_paths, and set_cheapest. It asserts a single rel at the top level and returns it.
  • join_search_one_level is the enumerator. It scans joinrels[level-1] for left/right-sided joins, runs the bushy double-loop over joinrels[k] x joinrels[level-k], and — only if the level came up empty — re-runs a clauseless pass as a last-ditch Cartesian fallback. It sets root->join_cur_level so newly built joinrels land in the right list.
  • make_rels_by_clause_joins joins old_rel to each member of other_rels (the initial rels) that is disjoint and linked by a relevant join clause or a have_join_order_restriction.
  • make_rels_by_clauseless_joins joins old_rel to every disjoint member unconditionally — used for relations with no usable join clause and for the last-ditch pass.
  • has_join_restriction / have_relevant_joinclause / have_join_order_restriction are the join-graph adjacency tests that decide which pairs are worth costing (defined in joininfo.c).

Join-rel construction and the memo (joinrels.c)

Section titled “Join-rel construction and the memo (joinrels.c)”
  • make_join_rel is the universal entry for “build the join of these two rels”: bms_union the relids, gate on join_is_legal, fold in outer-join relids via add_outer_joins_to_relids, find-or-build the RelOptInfo with build_join_rel, and populate paths. Returns NULL for an illegal pair.
  • join_is_legal determines whether the proposed join obeys outer-join, semi/anti-join, and LATERAL ordering constraints, and reports the SpecialJoinInfo and whether the inputs must be reversed.
  • init_dummy_sjinfo fabricates a minimal SpecialJoinInfo for a plain inner join (which normally has none) so selectivity code knows what is being joined.
  • build_join_rel (in relnode.c) does the find-or-create against join_rel_hash keyed by the relid set — this is the DP memo. The first caller for a relid set allocates; later callers reuse and add paths.
  • populate_joinrel_with_paths switches on sjinfo->jointype and calls add_paths_to_joinrel for each legal orientation, marking the joinrel dummy when a side is provably empty or a constant-false restriction applies.

Path enumeration for a fixed pair (joinpath.c)

Section titled “Path enumeration for a fixed pair (joinpath.c)”

These are the join-method search, invoked once the order pair is fixed. They belong to postgres-path-generation.md; named here to close the chain.

  • add_paths_to_joinrel computes JoinPathExtraData (mergeclause list, inner-unique flag, param_source_rels) and then runs the four path builders in order: sort_inner_and_outer (mergejoin with both sides sorted), match_unsorted_outer (nestloop + mergejoin with an already-ordered outer), hash_inner_and_outer (hash join), and finally the FDW and set_join_pathlist_hook extension hooks.
  • sort_inner_and_outer builds explicit-sort mergejoin paths from the cheapest-total inputs, one per candidate leading pathkey.
  • match_unsorted_outer builds nestloop paths (and ordered-outer mergejoins); it is where the nestjoinOK / useallclauses jointype dispatch lives.
  • hash_inner_and_outer scans the restrictlist for hashable clauses and builds hash-join paths from cheapest-startup and cheapest-total outers against the cheapest-total inner.
  • try_nestloop_path / try_mergejoin_path / try_hashjoin_path are the cost-and-keep helpers: each does an initial_cost_* screen, computes required_outer, and calls add_path to insert the path into the joinrel’s pathlist if it survives Pareto comparison.

Genetic fallback (geqo_main.c + geqo_eval.c)

Section titled “Genetic fallback (geqo_main.c + geqo_eval.c)”
  • geqo is the GA driver. It sizes the pool (gimme_pool_size) and generation count (gimme_number_generations), random-initializes and sorts the pool by fitness, then loops: geqo_selection picks two parents, a recombination operator (ERX by default) crosses them into a kid tour, geqo_eval scores the kid, and spread_chromo reinserts it. After all generations it returns gimme_tree on the best tour.
  • gimme_pool_size defaults the population to 2^(nr_rel+1), clamped to [10*Geqo_effort, 50*Geqo_effort] (i.e. 10–100 .. 50–500).
  • gimme_number_generations defaults the generation count to the pool size.
  • geqo_eval is the fitness function. It opens a private memory context and a temporary join_rel_hash, saves and later truncates join_rel_list (so repeated evaluations do not leak memo entries), calls gimme_tree, and returns the resulting cheapest_total_path->total_cost (or DBL_MAX if the tour yields no legal plan).
  • gimme_tree turns a gene tour into a join relation using a clump-merging heuristic: each relation in tour order is added to the first existing “clump” it can desirable_join with (via merge_clump), postponing illegal/undesirable joins; leftover clumps are then force-joined. This lets GEQO produce correct bushy plans even when the raw tour order is illegal. It calls the very same make_join_rel as the exhaustive search.
  • merge_clump recursively absorbs a clump into the first joinable clump, re-running set_cheapest on each formed joinrel, keeping larger clumps first; with force=true it joins anywhere legal (allowing Cartesian).
  • desirable_join is the clump heuristic: join only if there is a relevant join clause or a join-order restriction; otherwise postpone.

A point easy to miss on first reading: GEQO must not pollute the shared optimizer state across its thousands of fitness evaluations. geqo_eval brackets every gimme_tree call with state save/restore — it records list_length(root->join_rel_list) and root->join_rel_hash, runs inside a throwaway AllocSetContext, and on the way out list_truncates the join-rel list back to its saved length, restores the hash pointer, and deletes the context. Without this, each evaluated tour would leak its joinrels into the global memo and the next tour would “find” stale entries. This is the mechanism the standard_join_search header comment alludes to when it warns plugin authors that “the functions invoked during standard_join_search() modify root->join_rel_list and root->join_rel_hash … See geqo_eval() for an example” of saving and restoring them — GEQO is the in-tree precedent for any hook that wants to run multiple join searches.

Position hints (as of 2026-06-05, REL_18 273fe94)

Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”
SymbolFileLine
make_one_relsrc/backend/optimizer/path/allpaths.c171
make_rel_from_joinlistsrc/backend/optimizer/path/allpaths.c3352
join_search_hook (global)src/backend/optimizer/path/allpaths.c88
standard_join_searchsrc/backend/optimizer/path/allpaths.c3457
join_search_one_levelsrc/backend/optimizer/path/joinrels.c73
make_rels_by_clause_joinssrc/backend/optimizer/path/joinrels.c280
make_rels_by_clauseless_joinssrc/backend/optimizer/path/joinrels.c314
join_is_legalsrc/backend/optimizer/path/joinrels.c350
init_dummy_sjinfosrc/backend/optimizer/path/joinrels.c661
make_join_relsrc/backend/optimizer/path/joinrels.c696
add_outer_joins_to_relidssrc/backend/optimizer/path/joinrels.c793
populate_joinrel_with_pathssrc/backend/optimizer/path/joinrels.c885
add_paths_to_joinrelsrc/backend/optimizer/path/joinpath.c124
try_nestloop_pathsrc/backend/optimizer/path/joinpath.c831
try_mergejoin_pathsrc/backend/optimizer/path/joinpath.c1029
try_hashjoin_pathsrc/backend/optimizer/path/joinpath.c1222
sort_inner_and_outersrc/backend/optimizer/path/joinpath.c1357
match_unsorted_outersrc/backend/optimizer/path/joinpath.c1812
hash_inner_and_outersrc/backend/optimizer/path/joinpath.c2220
select_mergejoin_clausessrc/backend/optimizer/path/joinpath.c2501
geqosrc/backend/optimizer/geqo/geqo_main.c72
gimme_pool_sizesrc/backend/optimizer/geqo/geqo_main.c320
gimme_number_generationssrc/backend/optimizer/geqo/geqo_main.c352
geqo_evalsrc/backend/optimizer/geqo/geqo_eval.c57
gimme_treesrc/backend/optimizer/geqo/geqo_eval.c163
merge_clumpsrc/backend/optimizer/geqo/geqo_eval.c238
desirable_joinsrc/backend/optimizer/geqo/geqo_eval.c325
geqo_threshold (GUC)src/backend/utils/misc/guc_tables.c2223
join_rel_level fieldsrc/include/nodes/pathnodes.h315

All claims below were checked against the REL_18_STABLE working tree at commit 273fe94852b3a7e34fd171e8abdf1481beb302fa (PG 18.x).

  • Dispatch threshold and branch. make_rel_from_joinlist (allpaths.c) contains exactly the three-way branch quoted — join_search_hook, then enable_geqo && levels_needed >= geqo_thresholdgeqo, else standard_join_search. Verified by reading lines 3400–3424.
  • geqo_threshold default is 12. guc_tables.c declares &geqo_threshold, 12, 2, INT_MAX (default 12, min 2). The companion DEFAULT_GEQO_EFFORT 5 lives in src/include/optimizer/geqo.h; pool size thus ranges 50–500 / floor 10–100, matching the gimme_pool_size comment 50 to 500 individuals / 10 to 100 individuals.
  • join_rel_level is the DP array. The field and its comment exist on PlannerInfo in pathnodes.h (read_write_ignore), and standard_join_search allocates it with palloc0((levels_needed + 1) * sizeof(List *)), seeds [1] = initial_rels, and nulls it again before returning. Verified.
  • Memo via find-or-create. make_join_rel calls build_join_rel(root, joinrelids, ...); the relid-set find-or-create semantics (and the consequent “same joinrel reached by many pairs” memoization) are stated in the make_rels_by_clause_joins header comment in joinrels.c (“at levels above 2 we will generate the same joined relation in multiple ways … the join_rel_level mechanism … automatically ensures that each new joinrel is only added to the list once”). Verified.
  • Bushy loop bound. The bushy for (k = 2;; k++) loop breaks at k > other_level (halfway point) and only calls make_join_rel for disjoint pairs with have_relevant_joinclause or have_join_order_restriction. Verified at joinrels.c lines 148–198.
  • Last-ditch Cartesian. The if (joinrels[level] == NIL) block re-runs make_rels_by_clauseless_joins, and the subsequent elog(ERROR, "failed to build any %d-way joins", level) fires only when there are no special joins and no LATERAL RTEs. Verified.
  • GEQO reuses make_join_rel. merge_clump in geqo_eval.c calls make_join_rel(root, old_clump->joinrel, new_clump->joinrel) — i.e. the genetic path and the exhaustive path share the identical join-rel construction and path-generation machinery. Verified at geqo_eval.c lines 260–262.
  • GEQO memo hygiene. geqo_eval saves list_length(root->join_rel_list) and root->join_rel_hash, nulls the hash, runs gimme_tree in a private AllocSetContextCreate("GEQO", ...), then list_truncates join_rel_list back and restores the hash and deletes the context. Verified at geqo_eval.c lines 95–131.
  • Default recombination operator. geqo.h’s #error guard requires one of ERX/PMX/CX/PX/OX1/OX2; the default build defines ERX (edge recombination crossover), which is the #if defined (ERX) branch taken in geqo()’s generation loop. The CX-only geqo_mutation include is guarded by #if defined(CX). Verified.
  • Symmetry handling. make_rels_by_clause_joins is called with a first_rel of foreach_current_index(r) + 1 only at level == 2, and the bushy loop uses the same offset only when k == other_level; both reflect the “make_join_rel handles both x,y and y,x” symmetry stated in its header comment and the join_search_one_level block comments. Verified.
  • add_paths_to_joinrel step ordering. The numbered comments 1. (sort_inner_and_outer), 2. (match_unsorted_outer), the diked-out 3., 4. (hash_inner_and_outer), 5. (FDW), 6. (set_join_pathlist_hook) appear verbatim in joinpath.c; step 3 (match_unsorted_inner) is inside #ifdef NOT_USED. Verified at joinpath.c lines 278–345.
  • REL_18 scope guard. No XLOG2 rmgr or B_DATACHECKSUMSWORKER_* symbols appear; the join-search code is core (src/backend/optimizer/...), not contrib. The header comments still read Copyright (c) 1996-2025, which is the REL_18 source as shipped.

Taken together, these checks confirm the document’s central claim: a single dispatch in make_rel_from_joinlist selects between an exhaustive DP that fills join_rel_level bottom-up and a genetic search that samples join-order tours, both funneling through the identical make_join_relbuild_join_rel (memo) → add_paths_to_joinrel (path search) pipeline, with the geqo_threshold GUC as the only knob that moves the boundary between the two regimes.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”

PostgreSQL’s join enumerator sits at one well-understood point in a large design space. Naming the alternatives clarifies why PostgreSQL made the choices it did and where the active research lies.

1. Top-down (Cascades) vs. bottom-up (Selinger). PostgreSQL is firmly bottom-up DP: it builds size-2 joins, then size-3, and so on. The chief modern alternative is the top-down, transformation-based framework of Graefe’s Volcano/Cascades optimizer (used by SQL Server and Apache Calcite). Cascades represents the search space as a Memo of equivalence groups and applies transformation rules (join commutativity, associativity, operator substitution) lazily, driven by a goal-oriented search with branch-and-bound pruning and memoization, so it can stop exploring a group once a cost bound is exceeded. The trade-off: Cascades is more extensible (new operators = new rules) and prunes more aggressively, but is heavier machinery; Selinger-style DP is simpler, exhaustive within its regime, and easier to reason about — which suits PostgreSQL’s “no surprises” philosophy. Both share the memo keyed by relation-set idea; PostgreSQL’s join_rel_hash is a stripped-down memo without the rule engine.

2. The DPccp lineage — enumerating only connected subgraphs. The classic academic refinement of Selinger DP is DPsize / DPsub / DPccp (Moerkotte & Neumann, “Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees”, VLDB 2006). DPccp enumerates exactly the connected subgraphs and their complement pairs (csg-cmp-pairs) of the join graph, in an order that touches each only once, and is provably optimal in the number of join pairs it considers for bushy trees. PostgreSQL’s join_search_one_level is a coarser approximation: it iterates by level and uses have_relevant_joinclause to skip disconnected pairs, but it does not implement the tight csg-cmp-pair enumeration, so it can redundantly revisit work (the source comments openly accept “duplicated effort” in some clauseless cases). This is a known gap; DPhyp (the hypergraph extension handling complex join predicates) is the state of the art that PostgreSQL has not adopted.

3. The threshold problem and GEQO’s age. PostgreSQL’s switch to a genetic algorithm at geqo_threshold = 12 is pragmatic but blunt. A genetic algorithm gives no optimality guarantee and its results are nondetermin- istic across geqo_seed values, which surprises users who expect stable plans. Modern systems often push exhaustive DP much further (DPhyp handles 20–25 relations comfortably) or use adaptive / cost-bounded schemes rather than a hard cutoff. GEQO is essentially unchanged 1990s code (the Utesch contribution, adapted from Whitley’s Genitor); raising geqo_threshold or disabling GEQO entirely is common operational advice for deterministic plans, accepting longer planning time. There is recurring community discussion about replacing GEQO with a more modern heuristic or a better DP algorithm, but the exhaustive path’s O(2^n) constants are the real obstacle.

It is worth being precise about what GEQO does and does not guarantee. It is a generational genetic algorithm with a steady-state reinsertion (spread_chromo drops the worst, not a whole-generation replace), edge recombination crossover by default (ERX, which tends to preserve adjacency information from both parents — apt for a tour-like genome), and selection pressure tuned by Geqo_selection_bias. None of this guarantees the global optimum, and the pool size (2^(nr_rel+1) clamped to [10..50] * Geqo_effort) and generation count (defaulting to the pool size) are heuristics chosen for “good enough, fast enough”, not for convergence proofs. In practice GEQO’s plans are usually within a small factor of optimal, but the variance across seeds is real, and a single bad evaluation that returns DBL_MAX (an illegal tour gimme_tree could not repair) simply drops out of the gene pool. The honest summary is that GEQO trades the DP’s optimality-within-cost-model guarantee for tractability, and the geqo_threshold GUC is the user’s dial on where that trade kicks in.

4. Adaptive and learned query optimization. The deepest critique of the whole Selinger edifice is that it commits to a single plan before execution based on estimated cardinalities — and cardinality estimation errors compound multiplicatively through a join tree, so the “optimal” plan over bad estimates can be arbitrarily bad. Two research fronts respond:

  • Adaptive query processing — e.g. Eddies (Avnur & Hellerstein, captured in knowledge/research/dbms-papers/eddies.md) abandons a fixed join order entirely, routing each tuple through operators dynamically so the effective order can change during execution in response to observed selectivities. PostgreSQL has nothing of this kind; its order is frozen at plan time.
  • Learned optimizers — recent systems (Neo, Bao, and the broader “ML for query optimization” line) use reinforcement learning or learned cost models to choose join orders, treating the enumerator’s output as a policy to be improved from execution feedback. These remain largely research / extension territory; PostgreSQL’s core stays with analytic cost models, though its join_search_hook is precisely the seam such a system would plug into.

5. What PostgreSQL gets right. Despite its age, the design has durable virtues: the find-or-create memo is correct and simple; bushy search with clause-connectivity pruning covers the cases that matter (star schemas, forced-bushy outer joins) without full DPccp complexity; and the clean join_search_hook / set_join_pathlist_hook seams let pg_hint_plan and experimental optimizers intervene without forking the planner. The two-regime structure — exhaustive when feasible, heuristic when not — is exactly the shape Selinger’s paper anticipated, four decades on.

  • Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., & Price, T. G. (1979). Access Path Selection in a Relational Database Management System. SIGMOD. The origin of cost-based, bottom-up DP join ordering, interesting orders, and selectivity-based cost estimation. Local capture: knowledge/research/dbms-papers/systemr-optimizer.md.
  • Avnur, R. & Hellerstein, J. M. (2000). Eddies: Continuously Adaptive Query Processing. SIGMOD. The adaptive-ordering counterpoint to fixed Selinger ordering. Local capture: knowledge/research/dbms-papers/eddies.md.
  • Moerkotte, G. & Neumann, T. (2006). Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees (DPccp), VLDB; and DPhyp (2008). The optimal connected-subgraph enumeration PostgreSQL approximates. (Reference, not locally captured.)
  • PostgreSQL REL_18_STABLE source (commit 273fe94):
    • src/backend/optimizer/path/allpaths.cmake_rel_from_joinlist, standard_join_search, make_one_rel.
    • src/backend/optimizer/path/joinrels.cjoin_search_one_level, make_join_rel, make_rels_by_clause_joins, populate_joinrel_with_paths.
    • src/backend/optimizer/path/joinpath.cadd_paths_to_joinrel and the sort/nestloop/hash path builders.
    • src/backend/optimizer/geqo/geqo_main.c, src/backend/optimizer/geqo/geqo_eval.c — the genetic fallback.
    • src/backend/optimizer/README — the planner’s own narrative of join search, parameterized paths, and the param_source_rels heuristic.
    • src/include/nodes/pathnodes.hPlannerInfo.join_rel_level, RelOptInfo.
    • src/include/optimizer/geqo.h, src/backend/utils/misc/guc_tables.c — GEQO defaults and the geqo_threshold GUC.
  • Cross-references within this KB: postgres-planner-overview.md (how the joinlist is produced and how the optimizer fits the planner), postgres-path-generation.md (the per-pair path search this doc hands off to), postgres-cost-model.md (the cost functions that drive add_path), and postgres-executor.md (how the chosen join plan executes).