PostgreSQL Plan Creation — From Winning Path to Executable Plan Tree
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Section titled “Theoretical Background”A cost-based optimizer is, in the textbook framing, a search over the space of equivalent evaluation plans for one query. Database System Concepts (Silberschatz, 7e, ch. 16 “Query Optimization”) describes the job as: enumerate logically equivalent expressions, annotate each with a physical operator and an access method, estimate the cost of each, and “choose an evaluation plan that has lowest estimated cost” (§16.1). The chapter is careful to separate two artifacts that beginners conflate:
- An evaluation-plan expression — the annotated operator tree the optimizer reasons about. Internally PostgreSQL calls these Paths. A Path is cheap to make and throw away: it records only what the optimizer needs to compare alternatives — an operator type, the inputs, the sort order it produces, the rows it will emit, and the cost to produce them. Thousands of Paths are generated and discarded during join-order search.
- An evaluation plan — the concrete recipe the execution engine runs. DSC §16.1 notes that the chosen expression must still be “annotated with the method to be used for evaluating each operation” and turned into something executable. This is the Plan in PostgreSQL. There is exactly one, and it is built only for the single Path that won the cost tournament.
The transformation from “the cheapest expression” to “the executable
plan” is its own phase, and it is the subject of this document. DSC
treats it almost in passing — the textbook’s interest is the search, not
the lowering — but the lowering is where a real engine does a surprising
amount of work. Three problems must be solved before the operator tree
can be handed to the iterator engine of postgres-executor.md:
-
Operator selection and node materialization. The winning Path says “hash join here, sequential scan there.” Each abstract operator must become a concrete node carrying everything the executor needs: the join condition stripped to bare expressions, the qualification clauses ordered for cheapest-first short-circuiting, the output column list, and the frozen cost estimates for
EXPLAIN. -
Variable resolution. DSC §15.3 and §16 reason about columns by name (
r.a,s.b). The parser numbers them as(range-table-index, attribute-number)pairs. But the executor does not have a range table per operator — at a join, the inputs are two anonymous tuple slots, and the only sensible way to name a column is “field k of the outer tuple” or “field k of the inner tuple.” Resolving everyVarfrom catalog identity to slot position is a second, independent tree walk. -
Nested sub-queries. A scalar sub-SELECT in a
WHEREclause is a whole second query tree. DSC §16.4.4 (“Optimizing Nested Subqueries”) observes that a correlated subquery — one that references the outer row — is logically a function evaluated once per outer tuple, whereas an uncorrelated subquery can be “evaluated once” and its result reused. That dichotomy maps directly onto PostgreSQL’sSubPlan(per-tuple) versusinitPlan(once-per-query) distinction.
The remaining sections trace how PostgreSQL’s createplan.c,
setrefs.c, and subselect.c solve these three problems on the REL_18
source. The upstream phases — join enumeration and Path costing — are
covered in postgres-planner-overview.md and
postgres-path-generation.md; the downstream consumer of the finished
Plan is postgres-executor.md.
Common DBMS Design
Section titled “Common DBMS Design”The textbook gives the what (lower the chosen expression to something runnable). This section names the engineering conventions almost every production optimizer adopts for the lowering, so PostgreSQL’s specific choices read as one point in a shared design space.
Two object models: a throwaway plan-space node and a durable plan node
Section titled “Two object models: a throwaway plan-space node and a durable plan node”Optimizers that do real search keep the search-space node deliberately
thin and a separate, fatter node for the survivor. The search node
(Path / Group expression in Cascades / AbstractRelNode in some
academic systems) carries only comparison-relevant fields so that
allocating millions of them is cheap. The survivor is expanded — once —
into a node that carries execution detail (resolved expressions,
serialization hints, parallel-worker counts). Conflating the two would
make search allocate execution metadata it will throw away 99.99% of the
time. PostgreSQL’s Path vs Plan split is the textbook instance.
Top-down structural recursion with a “what tlist do you owe me?” contract
Section titled “Top-down structural recursion with a “what tlist do you owe me?” contract”The lowering walks the chosen tree from the root down. A subtlety almost every engine hits: a parent operator sometimes needs its child to emit exactly a certain set of columns in a certain order (a sort needs its sort keys present; a set-operation needs matching column counts), and sometimes does not care because the parent will project anyway (a nested-loop join re-projects its output, so its children may emit whatever is convenient). Engines therefore thread a small “tlist requirement” flag down the recursion so a child can choose the cheapest target list it is allowed to. Emitting a wider physical tuple when allowed lets the scan skip a projection step entirely.
Quals: order them, strip them, and split one-time from per-tuple
Section titled “Quals: order them, strip them, and split one-time from per-tuple”A node’s qualification clauses are reorganized at lowering time, not run
time. Two transformations are universal: (a) order the clauses so
the cheapest and most selective run first, since AND short-circuits;
and (b) strip the optimizer’s bookkeeping wrapper off each clause
(PostgreSQL’s RestrictInfo) to leave the bare boolean expression the
evaluator wants. A third, more specialized split pulls out
pseudoconstant clauses — conditions that do not depend on the current
row (WHERE 1=0, WHERE $1 IS NULL) — so they can be evaluated once
as a gate rather than re-tested per tuple.
Positional variable binding
Section titled “Positional variable binding”This is the deepest convention and the one most invisible to users.
Inside the executor, an operator receives its inputs as opaque tuple
slots; it must address a column as “slot, offset k,” not as “table
T, column c.” So the lowering performs a global rewrite: at a scan
leaf, a Var keeps its identity but its range-table index is shifted
into a single flattened namespace; at every interior node, a Var that
referred to a base column is rewritten to point at the position of that
column in the child’s output tuple. After this pass, no operator needs
the catalog to find a column — it dereferences an array. The price is a
second full tree walk and a per-node index structure to make the
“which output position is this Var?” lookup fast.
Subqueries: once-per-query vs once-per-row
Section titled “Subqueries: once-per-query vs once-per-row”Every engine must decide, for each nested sub-SELECT, whether it can be hoisted to run a single time (its result cached in a parameter slot) or must be re-evaluated for each outer row because it depends on that row. The hoistable case is strictly cheaper and is taken whenever the subquery has no correlation to the current level. The per-row case is wired as a callable sub-plan that reads its correlation values from parameter slots the parent fills in before each invocation.
Freeze the estimates
Section titled “Freeze the estimates”The cost and row-count estimates computed during search are copied onto
the plan node and frozen. They are no longer used to make decisions —
the decisions are over — but EXPLAIN surfaces them, and a few late
rewrites (inserting a Sort or Material node not present as its own
Path) re-derive a local estimate to keep EXPLAIN self-consistent.
Theory ↔ PostgreSQL mapping
Section titled “Theory ↔ PostgreSQL mapping”| Convention | PostgreSQL realization |
|---|---|
| Throwaway search node | Path (and subtypes) in pathnodes.h |
| Durable execution node | Plan (and subtypes) in plannodes.h |
| Top-down lowering | create_plan → create_plan_recurse |
| Tlist requirement flag | CP_EXACT_TLIST / CP_SMALL_TLIST / CP_LABEL_TLIST / CP_IGNORE_TLIST |
| Wider physical tlist | use_physical_tlist → build_physical_tlist |
| Order + strip quals | order_qual_clauses + extract_actual_clauses |
| One-time qual gate | get_gating_quals → create_gating_plan (a Result) |
| Positional variable binding | set_plan_references → fix_scan_expr / fix_join_expr / fix_upper_expr |
| Per-row subquery | SubPlan (build_subplan, isInitPlan = false) |
| Once-per-query subquery | initPlan (isInitPlan = true, SS_attach_initplans) |
| Frozen estimates | copy_generic_path_info |
PostgreSQL’s Approach
Section titled “PostgreSQL’s Approach”PostgreSQL splits plan creation into two cleanly separated passes over the tree, with subquery wiring threaded through both.
Pass 1 — createplan.c: Path tree → Plan tree (still in parser Var
numbering). create_plan is the entry point. It is handed the single
best_path chosen by the join search and recurses down it,
materializing one Plan node per Path node. Crucially, the comment at
the head of the file is explicit that this pass leaves variable numbering
alone:
// createplan.c file header — src/backend/optimizer/plan/createplan.c * The tlists and quals in the plan tree are still in planner format, * ie, Vars still correspond to the parser's numbering. This will be * fixed later by setrefs.c.The recursion dispatches purely on best_path->pathtype, and the switch
is the table of contents for the whole module:
// create_plan_recurse — src/backend/optimizer/plan/createplan.cswitch (best_path->pathtype){ case T_SeqScan: case T_IndexScan: /* ... all scan types ... */ case T_CustomScan: plan = create_scan_plan(root, best_path, flags); break; case T_HashJoin: case T_MergeJoin: case T_NestLoop: plan = create_join_plan(root, (JoinPath *) best_path); break; /* ... Append, Sort, Agg, Material, Gather, ModifyTable, Limit ... */}The flags argument is the “tlist requirement” contract from the Common
Design section. Its values are defined at the top of the file:
// flag definitions — src/backend/optimizer/plan/createplan.c#define CP_EXACT_TLIST 0x0001 /* Plan must return specified tlist */#define CP_SMALL_TLIST 0x0002 /* Prefer narrower tlists */#define CP_LABEL_TLIST 0x0004 /* tlist must contain sortgrouprefs */#define CP_IGNORE_TLIST 0x0008 /* caller will replace tlist */create_plan seeds the recursion with CP_EXACT_TLIST because the
top-level plan must emit exactly the query’s output columns; interior
nodes relax this for their children when they themselves can project.
Pass 2 — setrefs.c: resolve every Var to a slot position. After
create_plan returns, the caller (standard_planner →
set_plan_references) walks the finished Plan tree a second time. This
pass flattens all per-subquery range tables into one global
finalrtable, shifts every scan node’s Vars by an rtoffset, and — the
heart of the pass — rewrites each join/upper node’s Vars so they address
the output position of the child tuple rather than a base column.
Subquery wiring — subselect.c. Sub-SELECTs were turned into
SubPlan/initPlan structures earlier (during expression preprocessing
and build_subplan), but plan creation is where they are stitched onto
the tree: SS_attach_initplans hangs the level’s initPlans on the
topmost node, and setrefs.c resolves the PARAM_EXEC references that
connect a sub-plan’s output to the place it is consumed.
Pass 1 in detail: building one scan node
Section titled “Pass 1 in detail: building one scan node”create_scan_plan is the worker for all leaf relations. It does four
things in order: pick the restriction clauses, decide on a target list,
build the typed node, and (if needed) wrap a gating Result. The
target-list decision is the interesting part — when the parent does not
demand an exact tlist, the scan prefers a physical tlist (all columns
in storage order) so the executor can skip projection:
// create_scan_plan — src/backend/optimizer/plan/createplan.cif (flags == CP_IGNORE_TLIST){ tlist = NULL;}else if (use_physical_tlist(root, best_path, flags)){ /* ... index-only scan uses the index's tlist; else: */ tlist = build_physical_tlist(root, rel); if (tlist == NIL) /* dropped cols -> fall back */ tlist = build_path_tlist(root, best_path);}else{ tlist = build_path_tlist(root, best_path);}The normal target list is built by build_path_tlist, which simply
wraps each expression of the Path’s pathtarget in a TargetEntry with
a sequential resno, and — for a parameterized path — replaces any
lateral references with nestloop Params on the way:
// build_path_tlist — src/backend/optimizer/plan/createplan.cforeach(v, path->pathtarget->exprs){ Node *node = (Node *) lfirst(v); TargetEntry *tle;
if (path->param_info) node = replace_nestloop_params(root, node);
tle = makeTargetEntry((Expr *) node, resno, NULL, false); if (sortgrouprefs) tle->ressortgroupref = sortgrouprefs[resno - 1]; tlist = lappend(tlist, tle); resno++;}use_physical_tlist is the gate: it refuses the wide-tuple shortcut
whenever the parent demanded an exact or small tlist, whenever the
relation is not a simple base/subquery/function/values/CTE rel, and
whenever system columns or whole-row Vars are needed (those cannot be
addressed positionally in setrefs.c):
// use_physical_tlist — src/backend/optimizer/plan/createplan.cif (flags & (CP_EXACT_TLIST | CP_SMALL_TLIST)) return false;if (rel->rtekind != RTE_RELATION && rel->rtekind != RTE_SUBQUERY && /* ... RTE_FUNCTION, RTE_TABLEFUNC, RTE_VALUES, RTE_CTE ... */ ) return false;/* Can't do it if any system columns or whole-row Vars are requested. */for (i = rel->min_attr; i <= 0; i++) if (!bms_is_empty(rel->attr_needed[i - rel->min_attr])) return false;Once the tlist is chosen, the per-type creator runs. The sequential-scan case is the cleanest illustration of the whole node-building idiom — order quals, strip them, replace outer Vars, allocate, copy estimates:
// create_seqscan_plan — src/backend/optimizer/plan/createplan.c/* Sort clauses into best execution order */scan_clauses = order_qual_clauses(root, scan_clauses);/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */scan_clauses = extract_actual_clauses(scan_clauses, false);/* Replace any outer-relation variables with nestloop params */if (best_path->param_info) scan_clauses = (List *) replace_nestloop_params(root, (Node *) scan_clauses);
scan_plan = make_seqscan(tlist, scan_clauses, scan_relid);copy_generic_path_info(&scan_plan->scan.plan, best_path);make_seqscan is a pure allocator — no logic, just field assignment —
which keeps the policy (above) separate from the node shape:
// make_seqscan — src/backend/optimizer/plan/createplan.cSeqScan *node = makeNode(SeqScan);Plan *plan = &node->scan.plan;
plan->targetlist = qptlist;plan->qual = qpqual;plan->lefttree = NULL;plan->righttree = NULL;node->scan.scanrelid = scanrelid;And copy_generic_path_info is where the search-phase estimates are
frozen onto the durable node:
// copy_generic_path_info — src/backend/optimizer/plan/createplan.cdest->disabled_nodes = src->disabled_nodes;dest->startup_cost = src->startup_cost;dest->total_cost = src->total_cost;dest->plan_rows = src->rows;dest->plan_width = src->pathtarget->width;dest->parallel_aware = src->parallel_aware;dest->parallel_safe = src->parallel_safe;The overall control flow of Pass 1 for a two-table join with a correlated nested loop:
flowchart TD
A["create_plan(root, best_path)<br/>seed flags = CP_EXACT_TLIST"] --> B["create_plan_recurse<br/>dispatch on pathtype"]
B -->|T_NestLoop| C["create_join_plan<br/>-> create_nestloop_plan"]
C --> D["recurse: outer child<br/>flags = 0 (NL can project)"]
C --> E["push outer relids into<br/>root->curOuterRels"]
E --> F["recurse: inner child<br/>flags = 0"]
D -->|T_SeqScan| G["create_scan_plan<br/>physical tlist allowed?"]
F -->|T_IndexScan| H["create_scan_plan<br/>fix indexquals, replace outer Vars"]
G --> I["make_seqscan +<br/>copy_generic_path_info"]
H --> J["make_indexscan +<br/>copy_generic_path_info"]
C --> K["identify_current_nestloop_params<br/>build nestParams list"]
K --> L["make_nestloop(tlist, joinclauses,<br/>nestParams, outer, inner)"]
B --> M["apply_tlist_labeling on top node"]
M --> N["SS_attach_initplans(root, plan)"]
N --> O["Plan tree in PARSER Var numbering"]
Pass 1: joins, projection, and gating
Section titled “Pass 1: joins, projection, and gating”create_join_plan dispatches the three join methods, then checks for
pseudoconstant gating quals exactly as the scan path does:
// create_join_plan — src/backend/optimizer/plan/createplan.cswitch (best_path->path.pathtype){ case T_MergeJoin: plan = (Plan *) create_mergejoin_plan(root, (MergePath *) best_path); break; case T_HashJoin: plan = (Plan *) create_hashjoin_plan(root, (HashPath *) best_path); break; case T_NestLoop: plan = (Plan *) create_nestloop_plan(root, (NestPath *) best_path); break;}gating_clauses = get_gating_quals(root, best_path->joinrestrictinfo);if (gating_clauses) plan = create_gating_plan(root, (Path *) best_path, plan, gating_clauses);create_nestloop_plan is where lateral correlation becomes machinery.
A nested-loop join can re-scan its inner side once per outer row, so the
inner subtree is allowed to reference outer columns. Those references
are carried as Vars during search; here they are converted to
NestLoopParams. The routine pushes the outer relids into
root->curOuterRels before recursing into the inner child, so that
replace_nestloop_params (called deep inside the inner scan’s clause
processing) knows which Vars belong to the outer side:
// create_nestloop_plan — src/backend/optimizer/plan/createplan.c/* NestLoop can project, so no need to be picky about child tlists */outer_plan = create_plan_recurse(root, best_path->jpath.outerjoinpath, 0);
/* For a nestloop, include outer relids in curOuterRels for inner side */outerrelids = best_path->jpath.outerjoinpath->parent->relids;root->curOuterRels = bms_union(root->curOuterRels, outerrelids);
inner_plan = create_plan_recurse(root, best_path->jpath.innerjoinpath, 0);
/* Restore curOuterRels */bms_free(root->curOuterRels);root->curOuterRels = saveOuterRels;After both children exist, the join clauses are ordered, stripped (with
an outer-join split so that nulled-side quals become otherclauses),
and the surviving nestloop params are identified and removed from
root->curOuterParams:
// create_nestloop_plan — src/backend/optimizer/plan/createplan.cif (best_path->jpath.path.param_info){ joinclauses = (List *) replace_nestloop_params(root, (Node *) joinclauses); otherclauses = (List *) replace_nestloop_params(root, (Node *) otherclauses);}nestParams = identify_current_nestloop_params(root, outerrelids, PATH_REQ_OUTER((Path *) best_path));join_plan = make_nestloop(tlist, joinclauses, otherclauses, nestParams, outer_plan, inner_plan, best_path->jpath.jointype, best_path->jpath.inner_unique);The replace_nestloop_params mutator itself is the rule: a Var is
turned into a PARAM_EXEC parameter only if its varno is a member of
the current outer relids; otherwise it passes through untouched. This is
why the curOuterRels book-keeping above is load-bearing:
// replace_nestloop_params_mutator — src/backend/optimizer/plan/createplan.cif (IsA(node, Var)){ Var *var = (Var *) node; /* If not to be replaced, we can just return the Var unmodified */ if (IS_SPECIAL_VARNO(var->varno) || !bms_is_member(var->varno, root->curOuterRels)) return node; /* Replace the Var with a nestloop Param */ return (Node *) replace_nestloop_param_var(root, var);}Projection. create_projection_plan shows the tlist-requirement
contract paying off: if the parent does not care about the exact tlist
(use_physical_tlist), or if the child is itself projection-capable, no
separate Result node is built — the projection is folded into the
child by simply reassigning its targetlist. A Result is materialized
only when a non-projecting child must have its output reshaped:
// create_projection_plan — src/backend/optimizer/plan/createplan.cif (!needs_result_node){ /* Don't need a separate Result, just assign tlist to subplan */ plan = subplan; plan->targetlist = tlist; /* Label plan with the estimated costs we actually used */ plan->startup_cost = best_path->path.startup_cost; /* ... */}else{ /* We need a Result node */ plan = (Plan *) make_result(tlist, NULL, subplan); copy_generic_path_info(plan, (Path *) best_path);}Pass 1: handing subqueries to a fresh planner context
Section titled “Pass 1: handing subqueries to a fresh planner context”When a scan is over a sub-SELECT (RTE_SUBQUERY),
create_subqueryscan_plan does not call create_plan_recurse; it
calls create_plan afresh on the subquery’s own PlannerInfo
(rel->subroot), because the subquery is a separate range-table
namespace that must be lowered independently before being spliced in:
// create_subqueryscan_plan — src/backend/optimizer/plan/createplan.c/* * Recursively create Plan from Path for subquery. Since we are entering * a different planner context (subroot), recurse to create_plan not * create_plan_recurse. */subplan = create_plan(rel->subroot, best_path->subpath);/* ... */if (best_path->path.param_info){ process_subquery_nestloop_params(root, rel->subplan_params); scan_clauses = (List *) replace_nestloop_params(root, (Node *) scan_clauses);}scan_plan = make_subqueryscan(tlist, scan_clauses, scan_relid, subplan);Scalar sub-SELECTs in expressions (not in FROM) are a different
mechanism entirely, handled in subselect.c. build_subplan classifies
each one. The key decision is initPlan vs SubPlan: if the sub-SELECT
has no parameters supplied by the current level (parParam == NIL) and
is of a hoistable type (EXISTS, EXPR, ARRAY, ROWCOMPARE,
MULTIEXPR), it becomes an initPlan that runs once and stashes its
result in a PARAM_EXEC slot, and a bare Param is returned to stand in
for it in the parent expression:
// build_subplan — src/backend/optimizer/plan/subselect.cif (splan->parParam == NIL && subLinkType == EXPR_SUBLINK){ TargetEntry *te = linitial(plan->targetlist); Param *prm;
prm = generate_new_exec_param(root, exprType((Node *) te->expr), exprTypmod((Node *) te->expr), exprCollation((Node *) te->expr)); splan->setParam = list_make1_int(prm->paramid); isInitPlan = true; result = (Node *) prm; /* parent sees a Param, not the SubPlan */}Otherwise — a correlated subquery, or an ANY/ALL test that must scan
the output per outer tuple — it stays a SubPlan that the executor
invokes per row, optionally hash-tabled or materialized:
// build_subplan — src/backend/optimizer/plan/subselect.cif (subLinkType == ANY_SUBLINK && splan->parParam == NIL && subplan_is_hashable(plan) && testexpr_is_hashable(splan->testexpr, splan->paramIds)) splan->useHashTable = true;else if (splan->parParam == NIL && enable_material && !ExecMaterializesOutput(nodeTag(plan))) plan = materialize_finished_plan(plan);result = (Node *) splan;isInitPlan = false;Either way the finished sub-plan is registered in the global
glob->subplans list (indexed by plan_id), and initPlans additionally
go on root->init_plans so SS_attach_initplans can later hang them on
the topmost node:
// build_subplan (tail) — src/backend/optimizer/plan/subselect.croot->glob->subplans = lappend(root->glob->subplans, plan);splan->plan_id = list_length(root->glob->subplans);if (isInitPlan) root->init_plans = lappend(root->init_plans, splan);splan->plan_name = psprintf("%s %d", isInitPlan ? "InitPlan" : "SubPlan", splan->plan_id);Pass 2: positional variable binding
Section titled “Pass 2: positional variable binding”The second pass, set_plan_references, is the conceptual climax. The
file header enumerates its nine jobs; the two that matter here are
flattening the range table and resolving Vars. It begins by recording
the current size of the global range table as rtoffset, then appends
this query level’s RTEs to it — so every scan node in this level will
need its varno shifted up by rtoffset to address the flattened table:
// set_plan_references — src/backend/optimizer/plan/setrefs.cint rtoffset = list_length(glob->finalrtable);
/* Add all the query's RTEs to the flattened rangetable. */add_rtes_to_flat_rtable(root, false);/* ... adjust rowmarks, appendrels, alt-subplans ... *//* Now fix the Plan tree */result = set_plan_refs(root, plan, rtoffset);set_plan_refs recurses the Plan tree, assigns each node a unique
plan_node_id, and dispatches on node type. For a scan leaf, the fix is
purely additive: shift scanrelid and every Var’s varno by rtoffset,
because a scan Var still names a base column — only its table index moves
into the global namespace:
// set_plan_refs (T_SeqScan case) — src/backend/optimizer/plan/setrefs.cSeqScan *splan = (SeqScan *) plan;
splan->scan.scanrelid += rtoffset;splan->scan.plan.targetlist = fix_scan_list(root, splan->scan.plan.targetlist, rtoffset, NUM_EXEC_TLIST(plan));splan->scan.plan.qual = fix_scan_list(root, splan->scan.plan.qual, rtoffset, NUM_EXEC_QUAL(plan));fix_scan_expr does the Var shift via its mutator. Note it asserts the
scan never sees an INNER_VAR/OUTER_VAR — those special varnos only
appear above a join, never at a leaf:
// fix_scan_expr_mutator — src/backend/optimizer/plan/setrefs.cif (IsA(node, Var)){ Var *var = copyVar((Var *) node); Assert(var->varno != INNER_VAR); Assert(var->varno != OUTER_VAR); Assert(var->varno != ROWID_VAR); if (!IS_SPECIAL_VARNO(var->varno)) var->varno += context->rtoffset; if (var->varnosyn > 0) var->varnosyn += context->rtoffset; return (Node *) var;}At a join, the rewrite is fundamentally different. A join’s output
Vars must no longer name base columns; they must name positions in the
inner or outer child’s output tuple. set_join_references builds an
indexed_tlist over each child’s target list, then rewrites the join’s
own clauses and tlist against those indexes. The special varnos
INNER_VAR (-1) and OUTER_VAR (-2) are the executor’s way of saying
“this column comes from the inner / outer slot”:
// INNER_VAR / OUTER_VAR / INDEX_VAR — src/include/nodes/primnodes.h#define INNER_VAR (-1) /* reference to inner subplan */#define OUTER_VAR (-2) /* reference to outer subplan */#define INDEX_VAR (-3) /* reference to index column */// set_join_references — src/backend/optimizer/plan/setrefs.couter_itlist = build_tlist_index(outer_plan->targetlist);inner_itlist = build_tlist_index(inner_plan->targetlist);
join->joinqual = fix_join_expr(root, join->joinqual, outer_itlist, inner_itlist, (Index) 0, rtoffset, NRM_EQUAL, ...);/* ... join-type-specific: mergeclauses / hashclauses / nestParams ... */join->plan.targetlist = fix_join_expr(root, join->plan.targetlist, outer_itlist, inner_itlist, (Index) 0, rtoffset, (join->jointype == JOIN_INNER ? NRM_EQUAL : NRM_SUPERSET), ...);build_tlist_index is the lookup accelerator: it flattens a child’s
tlist into an array of (varno, varattno) -> resno triples so that
matching a parent Var to a child output position is a linear scan over a
compact array rather than over TargetEntry nodes:
// build_tlist_index — src/backend/optimizer/plan/setrefs.cforeach(l, tlist){ TargetEntry *tle = (TargetEntry *) lfirst(l); if (tle->expr && IsA(tle->expr, Var)) { Var *var = (Var *) tle->expr; vinfo->varno = var->varno; vinfo->varattno = var->varattno; vinfo->resno = tle->resno; vinfo->varnullingrels = var->varnullingrels; vinfo++; } else if (tle->expr && IsA(tle->expr, PlaceHolderVar)) itlist->has_ph_vars = true; else itlist->has_non_vars = true;}fix_join_expr_mutator is the per-Var rewriter: it searches the outer
index first, then the inner, and on a hit returns a copy of the Var with
varno set to OUTER_VAR/INNER_VAR and varattno set to the child’s
output resno — the column’s position, not its catalog attribute:
// fix_join_expr_mutator — src/backend/optimizer/plan/setrefs.cif (context->outer_itlist){ newvar = search_indexed_tlist_for_var(var, context->outer_itlist, OUTER_VAR, context->rtoffset, context->nrm_match); if (newvar) return (Node *) newvar;}if (context->inner_itlist){ newvar = search_indexed_tlist_for_var(var, context->inner_itlist, INNER_VAR, context->rtoffset, context->nrm_match); if (newvar) return (Node *) newvar;}/* ... PlaceHolderVar / non-Var / acceptable_rel fallbacks ... */elog(ERROR, "variable not found in subplan target lists");And search_indexed_tlist_for_var is where the substitution is made
concrete — the matched Var’s varno becomes the special varno and its
varattno becomes the child output slot’s resno:
// search_indexed_tlist_for_var — src/backend/optimizer/plan/setrefs.cif (vinfo->varno == varno && vinfo->varattno == varattno){ Var *newvar = copyVar(var); /* ... varnullingrels cross-check ... */ newvar->varno = newvarno; /* OUTER_VAR or INNER_VAR */ newvar->varattno = vinfo->resno; /* position in child tuple */ if (newvar->varnosyn > 0) newvar->varnosyn += rtoffset; return newvar;}For single-input upper nodes (Agg, Group, Result), the analogous routine
set_upper_references builds one index over the lone child and rewrites
Vars to OUTER_VAR. The data-flow of Pass 2 for the same join:
flowchart TD
A["set_plan_references<br/>rtoffset = len(finalrtable)"] --> B["add_rtes_to_flat_rtable<br/>append this level's RTEs"]
B --> C["set_plan_refs<br/>recurse, assign plan_node_id"]
C -->|leaf scan| D["scanrelid += rtoffset"]
D --> E["fix_scan_expr<br/>Var.varno += rtoffset<br/>(still a base-column Var)"]
C -->|join node| F["build_tlist_index<br/>over outer + inner tlists"]
F --> G["fix_join_expr on<br/>joinqual, hashclauses, tlist"]
G --> H["search_indexed_tlist_for_var"]
H --> I["match outer?<br/>varno = OUTER_VAR<br/>varattno = child resno"]
H --> J["match inner?<br/>varno = INNER_VAR<br/>varattno = child resno"]
C -->|upper node| K["set_upper_references<br/>build_tlist_index over lefttree"]
K --> L["fix_upper_expr -> OUTER_VAR"]
I --> M["Plan tree: every Var is a<br/>(special varno, slot offset)"]
J --> M
L --> M
E --> M
Source Walkthrough
Section titled “Source Walkthrough”The plan-creation machinery lives in three files under
src/backend/optimizer/plan/. The symbols below are grouped by the phase
they belong to; each is a stable anchor (line numbers are hints in the
table at the end).
Pass 1 — createplan.c: Path → Plan
Section titled “Pass 1 — createplan.c: Path → Plan”create_plan— module entry point. Seeds the recursion withCP_EXACT_TLIST, callscreate_plan_recurse, then does three finishing touches on the root node:apply_tlist_labeling(restore original column names for the result descriptor),SS_attach_initplans(hang this level’s initPlans), and an assertion that allNestLoopParams were assigned.create_plan_recurse— the dispatchswitchonbest_path->pathtype. Guards against deep trees withcheck_stack_depth(). Returns onePlan *perPath.create_scan_plan— leaf-relation worker. Selectsscan_clauses(usingindrestrictinfofor index paths), folds inparam_info->ppi_clausesfor parameterized scans, decides the tlist viause_physical_tlist, dispatches to the typed creator, and wraps a gatingResultifget_gating_qualsreturned pseudoconstants.create_join_plan— dispatchescreate_mergejoin_plan,create_hashjoin_plan,create_nestloop_plan; appends a gatingResultfor pseudoconstantjoinrestrictinfo.build_path_tlist— wraps eachpathtarget->exprselement in aTargetEntry; appliesreplace_nestloop_paramsfor parameterized paths; copiessortgrouprefs.use_physical_tlist— gate for emitting a storage-order (all-columns) tlist so the executor can skip projection. Refuses whenCP_EXACT_TLIST/CP_SMALL_TLISTset, for non-simple rtekinds, for inheritance/Custom paths, and when system columns or whole-row Vars are needed.create_seqscan_plan/make_seqscan— the canonical node-build idiom (order → strip → replace outer Vars → allocate → copy estimates) and its pure allocator.create_indexscan_plan— additionally callsfix_indexqual_referencesto rewrite index-key Vars toINDEX_VARattribute numbers and to stripRestrictInfowrappers.create_nestloop_plan— managesroot->curOuterRelsaround the inner-child recursion; callsidentify_current_nestloop_params; builds the node withmake_nestloop.create_projection_plan/make_result/inject_projection_plan— fold projection into the child when possible, else materialize aResult.create_subqueryscan_plan— re-enterscreate_planonrel->subroot(a separate planner context), callsprocess_subquery_nestloop_paramsfor lateral refs.replace_nestloop_params/replace_nestloop_params_mutator— convert outer-relationVars andPlaceHolderVars intoPARAM_EXECparameters, keyed onroot->curOuterRelsmembership.copy_generic_path_info— freeze cost/rows/width/parallel flags from thePathonto thePlan.copy_plan_costsizedoes the same for nodes inserted without their own Path (e.g.Sort,Material).change_plan_targetlistswaps a child’s tlist when a nestloop must publish a PHV in the outer output.
Pass 2 — setrefs.c: positional variable binding
Section titled “Pass 2 — setrefs.c: positional variable binding”set_plan_references— pass entry. Computesrtoffset = list_length(glob->finalrtable), flattens RTEs (add_rtes_to_flat_rtable), adjusts rowmarks/appendrels, then callsset_plan_refs.set_plan_refs— recursive per-node dispatch. Assignsplan_node_id; for scans shiftsscanrelidand Vars byrtoffset; for joins delegates toset_join_references; for single-input upper nodes delegates toset_upper_references. Also deletes no-opSubqueryScan/Append/MergeAppendnodes (trivial_subqueryscan).fix_scan_expr/fix_scan_expr_mutator/fix_scan_expr_walker— scan-level Var shift. The walker (no-copy) variant is taken whenrtoffset == 0and there are no params/PHVs to rewrite, saving cycles on trivial queries. Callsfix_param_nodeforPARAM_MULTIEXPR→PARAM_EXECsubstitution.set_join_references— buildsindexed_tlists over both children, rewritesjoinqual, method-specific clauses (mergeclauses,hashclauses,hashkeys),nestParams, thentargetlist/qualviafix_join_expr/fix_upper_expr.set_upper_references— single-child analogue (Agg, Group, Result), rewriting Vars toOUTER_VAR.set_indexonlyscan_references— special-cases index-only scans so Vars referenceINDEX_VARcolumns.build_tlist_index/build_tlist_index_other_vars— flatten a child tlist into atlist_vinfoarray for O(n) Var matching; sethas_ph_vars/has_non_varsflags.search_indexed_tlist_for_var— the core substitution:(varno, varattno)→(newvarno, resno)with avarnullingrelscross-check governed byNullingRelsMatch. Companionssearch_indexed_tlist_for_phvandsearch_indexed_tlist_for_non_varhandle PHVs and whole expressions pushed into a child tlist.fix_join_expr/fix_join_expr_mutator— outer-then-inner Var search; errors with"variable not found in subplan target lists"on miss.fix_upper_expr/fix_upper_expr_mutator— the single-input variant.set_dummy_tlist_references— for nodes (Sort, Material, etc.) whose output is exactly their input: rewrite the tlist to a list ofOUTER_VARreferences by position.
Subquery wiring — subselect.c
Section titled “Subquery wiring — subselect.c”make_subplan— top-level conversion of aSubLinkto a plan; runs the subquery planner, then callsbuild_subplan.build_subplan— the initPlan-vs-SubPlan classifier. BuildsparParam/args, decidesisInitPlan, allocates the outputPARAM_EXECviagenerate_new_exec_param/generate_subquery_params, optionally setsuseHashTableor materializes, registers inglob->subplansand (for initPlans)root->init_plans, and labels forEXPLAIN.generate_subquery_params/convert_testexpr— build thePARAM_EXEClist standing for the subquery’s output columns and splice them into the test expression.SS_attach_initplans— assignplan->initPlan = root->init_planson the topmost node (called fromcreate_plan).SS_process_ctes— turnWITHCTEs into initplans / CTE scans.SS_finalize_plan/finalize_plan— final recursive pass computing each node’sextParam/allParamsets (run afterset_plan_references).
Position hints (as of 2026-06-05, REL_18 273fe94):
| Symbol | File | Line |
|---|---|---|
create_plan | src/backend/optimizer/plan/createplan.c | 337 |
create_plan_recurse | src/backend/optimizer/plan/createplan.c | 388 |
create_scan_plan | src/backend/optimizer/plan/createplan.c | 559 |
build_path_tlist | src/backend/optimizer/plan/createplan.c | 825 |
use_physical_tlist | src/backend/optimizer/plan/createplan.c | 865 |
get_gating_quals | src/backend/optimizer/plan/createplan.c | 1002 |
create_gating_plan | src/backend/optimizer/plan/createplan.c | 1022 |
create_join_plan | src/backend/optimizer/plan/createplan.c | 1081 |
create_projection_plan | src/backend/optimizer/plan/createplan.c | 2015 |
inject_projection_plan | src/backend/optimizer/plan/createplan.c | 2117 |
change_plan_targetlist | src/backend/optimizer/plan/createplan.c | 2149 |
create_seqscan_plan | src/backend/optimizer/plan/createplan.c | 2910 |
create_indexscan_plan | src/backend/optimizer/plan/createplan.c | 2999 |
create_nestloop_plan | src/backend/optimizer/plan/createplan.c | 4341 |
create_mergejoin_plan | src/backend/optimizer/plan/createplan.c | 4493 |
create_hashjoin_plan | src/backend/optimizer/plan/createplan.c | 4847 |
replace_nestloop_params | src/backend/optimizer/plan/createplan.c | 5036 |
replace_nestloop_params_mutator | src/backend/optimizer/plan/createplan.c | 5043 |
fix_indexqual_references | src/backend/optimizer/plan/createplan.c | 5121 |
copy_generic_path_info | src/backend/optimizer/plan/createplan.c | 5514 |
copy_plan_costsize | src/backend/optimizer/plan/createplan.c | 5530 |
make_seqscan | src/backend/optimizer/plan/createplan.c | 5643 |
make_nestloop | src/backend/optimizer/plan/createplan.c | 6083 |
make_result | src/backend/optimizer/plan/createplan.c | 7129 |
set_plan_references | src/backend/optimizer/plan/setrefs.c | 288 |
add_rtes_to_flat_rtable | src/backend/optimizer/plan/setrefs.c | 396 |
set_plan_refs | src/backend/optimizer/plan/setrefs.c | 619 |
set_indexonlyscan_references | src/backend/optimizer/plan/setrefs.c | 1333 |
trivial_subqueryscan | src/backend/optimizer/plan/setrefs.c | 1476 |
fix_param_node | src/backend/optimizer/plan/setrefs.c | 2125 |
fix_scan_expr | src/backend/optimizer/plan/setrefs.c | 2212 |
fix_scan_expr_mutator | src/backend/optimizer/plan/setrefs.c | 2247 |
set_join_references | src/backend/optimizer/plan/setrefs.c | 2332 |
set_upper_references | src/backend/optimizer/plan/setrefs.c | 2481 |
set_dummy_tlist_references | src/backend/optimizer/plan/setrefs.c | 2692 |
build_tlist_index | src/backend/optimizer/plan/setrefs.c | 2759 |
build_tlist_index_other_vars | src/backend/optimizer/plan/setrefs.c | 2810 |
search_indexed_tlist_for_var | src/backend/optimizer/plan/setrefs.c | 2868 |
search_indexed_tlist_for_phv | src/backend/optimizer/plan/setrefs.c | 2933 |
search_indexed_tlist_for_non_var | src/backend/optimizer/plan/setrefs.c | 2986 |
fix_join_expr | src/backend/optimizer/plan/setrefs.c | 3104 |
fix_join_expr_mutator | src/backend/optimizer/plan/setrefs.c | 3126 |
fix_upper_expr | src/backend/optimizer/plan/setrefs.c | 3278 |
fix_upper_expr_mutator | src/backend/optimizer/plan/setrefs.c | 3298 |
make_subplan | src/backend/optimizer/plan/subselect.c | 162 |
build_subplan | src/backend/optimizer/plan/subselect.c | 319 |
generate_subquery_params | src/backend/optimizer/plan/subselect.c | 582 |
convert_testexpr | src/backend/optimizer/plan/subselect.c | 644 |
SS_process_ctes | src/backend/optimizer/plan/subselect.c | 880 |
SS_attach_initplans | src/backend/optimizer/plan/subselect.c | 2353 |
SS_finalize_plan | src/backend/optimizer/plan/subselect.c | 2368 |
INNER_VAR / OUTER_VAR / INDEX_VAR | src/include/nodes/primnodes.h | 242 |
IS_SPECIAL_VARNO | src/include/nodes/primnodes.h | 247 |
Source verification (as of 2026-06-05)
Section titled “Source verification (as of 2026-06-05)”All claims below were checked against the REL_18 working tree at commit
273fe94 on 2026-06-05.
- Two passes, parser numbering preserved through Pass 1. Verified
via the
create_planheader comment (“Vars still correspond to the parser’s numbering. This will be fixed later by setrefs.c”) and by the fact that nocreateplan.croutine writesINNER_VAR/OUTER_VARinto a Var’svarno— those constants are written only insearch_indexed_tlist_for_var/set_dummy_tlist_referencesinsetrefs.c. CP_*flags. The four flags and their hex values are defined at the top ofcreateplan.c(lines 70–73) exactly as quoted;CP_IGNORE_TLISTis0x0008per the header comment block.- Physical-tlist gate.
use_physical_tlistreturnsfalseonCP_EXACT_TLIST | CP_SMALL_TLISTand on system-column / whole-row Var requirements, as quoted. The fallback tobuild_path_tlistonbuild_physical_tlistreturningNIL(dropped columns) is present increate_scan_plan. - NestLoop param machinery.
create_nestloop_planpushesouterrelidsintoroot->curOuterRelsbefore recursing into the inner child and restores it afterward;replace_nestloop_params_mutatorkeys replacement onbms_is_member(var->varno, root->curOuterRels). Both verified verbatim. - Special varnos.
INNER_VAR = -1,OUTER_VAR = -2,INDEX_VAR = -3andIS_SPECIAL_VARNO(varno) = ((int)(varno) < 0)confirmed inprimnodes.h(lines 242–247). - Var → slot rewrite.
search_indexed_tlist_for_varsetsnewvar->varno = newvarno(the passed-inOUTER_VAR/INNER_VAR) andnewvar->varattno = vinfo->resno(the child output position), verified at the match branch. - initPlan vs SubPlan.
build_subplansetsisInitPlan = trueforparParam == NILplusEXISTS/EXPR/ARRAY/ROWCOMPAREand theparParam == NILarm ofMULTIEXPR;ANY_SUBLINKuncorrelated may setuseHashTable; the finallappend(root->init_plans, splan)runs only whenisInitPlan. Verified. SS_attach_initplansis a one-liner assigningplan->initPlan = root->init_plans; confirmed at line 2353. The comment increate_plannotes initPlans are attached only to the topmost node of the query level, whichSS_finalize_plan’s header comment also relies on.- REL_18 specifics. The
varreturningtypecheck infix_join_expr_mutator(forRETURNING old/newVars, a PG 18 MERGE/ RETURNING feature) is present and quoted; no removed-feature symbols (e.g. noXLOG2rmgr, noB_DATACHECKSUMSWORKER_*) appear in these files, consistent with the REL_18 constraint.
One caveat worth flagging: line numbers in the position table are hints scoped to this revision. The symbol names are the durable anchors; a reformatting commit will move the numbers without touching the names.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”PostgreSQL’s “lower the winning Path, then resolve Vars positionally” pipeline is one well-trodden point in a larger design space. The contrasts below sharpen what is essential versus what is a PostgreSQL convention.
Volcano/Cascades: lowering is implementation rules, not a separate phase
Section titled “Volcano/Cascades: lowering is implementation rules, not a separate phase”In a Cascades-style optimizer (Microsoft SQL Server, Greenplum’s ORCA,
Apache Calcite’s Volcano planner), there is no clean “Path tree → Plan
tree” lowering step. The search memo already holds physical group
expressions, and the chosen physical plan is extracted directly from the
memo. Operator selection happens during search via implementation
rules, not afterward. PostgreSQL instead keeps a deliberately simple
System-R-style bottom-up join search (postgres-planner-overview.md)
that produces lightweight Paths, and concentrates all the “make it
runnable” work into createplan.c. The trade is clear: PostgreSQL’s
lowering is simpler to read and reason about, but it cannot make
operator-implementation decisions that depend on properties only known
after the global plan shape is fixed.
Positional binding vs. name-based execution
Section titled “Positional binding vs. name-based execution”The Var → (INNER_VAR/OUTER_VAR, resno) rewrite is the executor’s
performance contract: at run time, reading a column is an array index,
never a catalog lookup or a name comparison. Some interpreted or
research engines keep columns name-addressed for flexibility and pay the
lookup cost per access; PostgreSQL pays a whole second tree walk at plan
time (setrefs.c) to buy O(1) column access at execution time, every
tuple, forever. This is the same philosophy as JIT-compiled tuple
deforming in postgres-expression-eval.md: spend plan-time cycles to
make per-tuple cycles cheap.
Compiled vs. interpreted plans
Section titled “Compiled vs. interpreted plans”HyPer and Umbra (Neumann, Efficiently Compiling Efficient Query Plans
for Modern Hardware, VLDB 2011) push the idea further: instead of
producing an interpreted operator tree, they compile the plan to
machine code, fusing pipelines so that a tuple is pushed through a chain
of operators with no per-operator virtual call. In that world the
“plan” is a code-generation IR, and the analogue of setrefs.c’s
positional binding is register/stack-slot allocation in the generated
code. PostgreSQL’s REL_18 baseline remains an interpreted iterator tree
(its LLVM JIT compiles expressions, not the operator tree), so the
Plan node stays a durable interpretable structure — which is precisely
why positional Var resolution must be baked in at plan time.
Plan caching and the read-only plan tree
Section titled “Plan caching and the read-only plan tree”Because set_plan_references produces a Plan whose every reference is
resolved to offsets and whose range table is flattened and frozen, the
finished Plan is immutable and reusable. This is what lets
plancache.c cache a prepared statement’s plan and re-execute it across
transactions and parallel workers — each execution builds a fresh
mutable PlanState tree (postgres-executor.md) over the shared
read-only Plan. The dependency lists set_plan_references accumulates
(glob->relationOids, glob->invalItems) are exactly what plan-cache
invalidation keys on when a referenced relation or function changes.
Engines without a separate immutable-plan artifact (those that re-plan
on every execution) avoid this bookkeeping but forfeit the caching win.
Subquery decorrelation
Section titled “Subquery decorrelation”PostgreSQL’s initPlan/SubPlan split (subselect.c) is a local
decision: a subquery is hoisted to an initPlan only when it is already
uncorrelated. It does not aggressively decorrelate — rewrite a
correlated subquery into a join — beyond the pull-up transforms done
earlier in the rewriter/planner for the cases it recognizes
(ANY/EXISTS → semijoin). Research and commercial systems (the
classic work is Neumann & Kemper, Unnesting Arbitrary Queries, BTW
2015) decorrelate arbitrary nested queries into a single dependent
join that the cost optimizer can then plan freely, often a large win for
analytic workloads. PostgreSQL’s per-tuple SubPlan is the conservative
fallback when decorrelation does not fire — correct always, but
potentially O(outer rows) re-executions of the subquery.
Where PostgreSQL sits
Section titled “Where PostgreSQL sits”PostgreSQL deliberately favors a readable, debuggable, two-pass lowering
with an immutable cacheable output over the fused-search-and-codegen
designs of compilation-first engines. For the workloads it targets —
mixed OLTP with moderate analytics, heavy reliance on prepared-statement
plan caching — the separation of createplan.c (structure) from
setrefs.c (binding) from subselect.c (subquery wiring) keeps each
concern independently testable, which matches the engineering values of
the rest of the tree documented in this code-analysis/postgres/ series.
Sources
Section titled “Sources”- Source tree:
/data/hgryoo/references/postgres, REL_18_STABLE at commit273fe94852b3a7e34fd171e8abdf1481beb302fa, read 2026-06-05. Core only (no contrib).src/backend/optimizer/plan/createplan.c— Path → Plan lowering (Pass 1).src/backend/optimizer/plan/setrefs.c—set_plan_referencesand positional Var binding (Pass 2).src/backend/optimizer/plan/subselect.c— initPlan/SubPlan construction andSS_attach_initplans.src/include/nodes/primnodes.h—INNER_VAR/OUTER_VAR/INDEX_VAR,IS_SPECIAL_VARNO,Varstruct.src/include/nodes/plannodes.h,pathnodes.h—Plan/Pathnode definitions.
- Silberschatz, Korth, Sudarshan, Database System Concepts, 7e,
ch. 15 “Query Processing” and ch. 16 “Query Optimization” (esp.
§16.1 cost-based plan choice, §16.4.4 nested-subquery optimization).
Capture:
knowledge/research/dbms-general/database-system-concepts.md. - Selinger et al. 1979, Access Path Selection in a Relational Database
Management System (System R) — the bottom-up join search whose Paths
this module lowers. See
.omc/plans/postgres-paper-bibliography.md→dbms-papers/systemr-optimizer.md. - Neumann, Efficiently Compiling Efficient Query Plans for Modern Hardware, VLDB 2011 — compilation-first contrast.
- Neumann & Kemper, Unnesting Arbitrary Queries, BTW 2015 — subquery decorrelation contrast.
- Cross-references within this series:
postgres-planner-overview.md(join search that produces the Path),postgres-path-generation.md(how individual Paths are built and costed),postgres-executor.md(the consumer of the finished Plan),postgres-expression-eval.md(how the resolved expressions are evaluated at run time).