PostgreSQL Parse Analysis — Transforming the Raw Tree into a Query
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Section titled “Theoretical Background”A SQL statement enters the engine as a string and must leave the front end as a structure the optimizer can reason about. Database System Concepts (Silberschatz, 7e, ch. 15 “Query Processing”, §15.1 “Overview”) names this the very first of three processing steps:
“1. Parsing and translation. 2. Optimization. 3. Evaluation.” (DSC §15.1)
and is explicit that step 1 is not one thing but two — a syntactic phase and a semantic phase fused under the word “parser”:
“Thus, the first action the system must take in query processing is to translate a given query into its internal form. This translation process is similar to the work performed by the parser of a compiler. In generating the internal form of the query, the parser checks the syntax of the user’s query, verifies that the relation names appearing in the query are names of the relations in the database, and so on. The system constructs a parse-tree representation of the query, which it then translates into a relational-algebra expression. If the query was expressed in terms of a view, the translation phase also replaces all uses of the view by the relational-algebra expression that defines the view.” (DSC §15.1)
Two activities are bundled in that sentence, and the distinction is the central design axis of any front end:
- Syntactic analysis (parsing). Is this a well-formed SQL sentence? Tokenize, match the grammar, build a parse tree. This phase is a pure function of the input string — it consults nothing outside the grammar. A compiler-textbook LALR parser is the canonical realization.
- Semantic analysis (the “translation”/“verifies that…” half).
Do the names in this well-formed sentence mean anything? Does
table
texist? Does columncbelong to it, and what is its type? IsWHERE xa boolean? This phase is a function of the catalog — it cannot be done by the grammar because the grammar does not know what tables exist.
The textbook leaves the boundary between these two implicit, but it is the load-bearing seam. Syntactic analysis can be a fixed, catalog-free, reentrant routine; semantic analysis must read the system catalog, may acquire locks on the relations it resolves, and must produce typed, position-resolved references. Almost every production engine draws a hard line here, with a distinct intermediate representation on each side:
- the raw parse tree — syntactic, names unresolved, the literal shape of what the user wrote; and
- the resolved query tree (“internal form”, DSC’s relational-algebra expression) — every name bound to a catalog object, every expression typed.
A third property the textbook names in passing is view expansion: “the translation phase also replaces all uses of the view by the relational-algebra expression that defines the view.” Whether that substitution happens during semantic analysis or in a later dedicated phase is an engineering choice (PostgreSQL defers it to a separate rewriter, covered elsewhere), but the textbook files it under the same “translate to internal form” umbrella, so it belongs in the design space this document sits in.
PostgreSQL implements this split almost exactly. The grammar
(gram.y, owned by postgres-parser.md) does the syntactic phase and
emits a RawStmt; parse analysis — the subject of this document —
does the semantic phase, walking the RawStmt and producing a
fully-resolved Query. The rest of this document traces that walk in
the REL_18 source. The textbook’s “verifies that relation names are
names of relations” becomes catalog lookup plus range-table
construction; its “translates into a relational-algebra expression”
becomes the Query node with its jointree, target list, and
qualifier expressions.
Common DBMS Design
Section titled “Common DBMS Design”The textbook gives the model — a syntactic phase feeding a semantic phase that resolves names against the catalog and emits a typed internal form. This section names the engineering conventions that recur across production semantic analyzers, the patterns the textbook leaves implicit. PostgreSQL’s specific choices in §“PostgreSQL’s Approach” are best read as one set of dials within this shared space.
A separate IR for “resolved”, distinct from “parsed”
Section titled “A separate IR for “resolved”, distinct from “parsed””The raw parse tree mirrors surface syntax: a table is just a name, a
column reference is just a list of identifiers (a.b.c), SELECT *
is a literal star. The resolved tree must instead carry bound
references: a table becomes an index into a per-query list of resolved
relations, a column reference becomes a typed pointer at one such
relation’s nth attribute, the star is gone — expanded into the
actual columns. Keeping these two as distinct node types (rather
than mutating the raw tree in place) means the syntactic phase stays a
catalog-free pure function that can run in a frontend library, be
cached, or be re-analyzed against a different catalog snapshot.
The range table: one indexed list of “relations this query reads”
Section titled “The range table: one indexed list of “relations this query reads””The recurring representation for “which relations does this query
touch” is a flat, 1-based indexed list — every table, subquery,
function-in-FROM, join, or VALUES list the query references gets one
slot. A resolved column reference then encodes its source as a pair
(range-table index, attribute number) rather than carrying a table
name. This indirection is what makes the rest of the stack tractable:
the optimizer reorders joins by shuffling references to range-table
indexes, never by rewriting names; two columns from the same table
share one range-table slot; and a column’s identity survives every
later transformation because it is a pair of integers, not a string.
Scope as a namespace stack
Section titled “Scope as a namespace stack”Name resolution needs a notion of what is visible here. SQL scoping
is not trivial — a subquery in FROM sees the outer query’s columns
only through LATERAL; a join’s USING columns hide the inputs’
copies; an outer query level is searched only after the local one.
The convention is a namespace structure, layered or stacked, that
the analyzer grows as it walks into a FROM item and consults
whenever it must turn an identifier into a bound reference. Resolving
x means searching the current namespace for a relation exposing a
column named x, erroring on ambiguity, and recursing to the parent
scope only if the local search fails.
Clause ordering driven by data dependencies, not syntax order
Section titled “Clause ordering driven by data dependencies, not syntax order”The clauses of a SELECT are written SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY, but they cannot be analyzed in that order.
WHERE and the target list both reference columns, so FROM (which
populates the namespace) must be analyzed first. GROUP BY and
DISTINCT may reference target-list aliases or ORDER BY sort items,
so the target list and sort clause must be built before them. The
convention is to analyze clauses in dependency order, not the
order the user typed, and to thread the partially-built pieces (the
growing target list especially) through the steps that extend them.
Auto-naming and implicit type coercion at the boundary
Section titled “Auto-naming and implicit type coercion at the boundary”Two janitorial conventions almost every analyzer carries. First,
output-column naming: an unnamed target-list expression
(SELECT a+1) still needs a column label, so the analyzer derives one
from the expression’s shape (the function name, the column name, or a
fallback). Second, context-driven coercion: WHERE, HAVING, and
CHECK expressions must be boolean; LIMIT must be an integer; a
UNION’s two arms must share a type. The analyzer applies implicit
casts at these well-known boundaries and errors when no cast exists.
Theory ↔ PostgreSQL mapping
Section titled “Theory ↔ PostgreSQL mapping”By the time you reach a named PostgreSQL symbol in the next section, you should already know what kind of thing it is:
| Theory / convention | PostgreSQL name |
|---|---|
| Syntactic phase output (raw parse tree) | RawStmt wrapping SelectStmt / InsertStmt / … (from gram.y) |
| Semantic phase (this document) | parse analysis — transformStmt and the transform* tree |
| Resolved/internal form (relational algebra) | Query node (commandType, jointree, targetList, …) |
| Per-query analysis scratch state | ParseState (pstate) |
| Range table (indexed list of relations) | pstate->p_rtable — list of RangeTblEntry (RTE); index is varno |
| One range-table entry | RangeTblEntry built by addRangeTableEntryForRelation & siblings |
| Bound column reference | Var carrying (varno, varattno, vartype, varcollid) |
| Visible-names scope | pstate->p_namespace — list of ParseNamespaceItem (nsitem) |
| Per-column resolution data | ParseNamespaceColumn (in each nsitem’s p_nscolumns) |
| FROM/join structure of the query | pstate->p_joinlist → Query.jointree (a FromExpr) |
| Target-list column slot counter | pstate->p_next_resno → each TargetEntry.resno |
| Star expansion | ExpandColumnRefStar / ExpandAllTables |
| Output-column auto-naming | FigureColname |
| Boolean / integer coercion at clause edges | coerce_to_boolean, coerce_to_specific_type |
| Catalog name lookup for a table | transformTableEntry → addRangeTableEntry (opens & locks the relation) |
The syntactic phase — how gram.y builds the RawStmt and why it is
catalog-free — is owned by postgres-parser.md. View expansion and
rule application (DSC’s “replace uses of the view”) happen in a
separate phase owned by the rewriter (postgres-rewriter.md).
Expression-internal resolution — how transformExpr turns an
A_Expr into an OpExpr, resolves function calls, and folds
constants — is largely owned by postgres-parser.md’s expression
section; this document treats transformExpr as a black box that
returns a typed Node. What the planner does with the finished
Query is owned by postgres-planner-overview.md. This document
covers the transform framework: the ParseState, range-table and
RTE construction, FROM/JOIN, the target list, GROUP/HAVING/ORDER
resolution, and how the clauses thread together into a Query.
PostgreSQL’s Approach
Section titled “PostgreSQL’s Approach”PostgreSQL’s parse analysis is the semantic phase rendered as a family
of mutually-recursive transform* functions over the raw tree, with
four design decisions that give it its shape:
- One mutable
ParseStatethreads the whole pass. Rather than returning accumulators up the call stack, the analyzer carries apstatethat everytransform*function reads and mutates: the range table grows inp_rtable, the visible scope inp_namespace, the join structure inp_joinlist, the next target-list slot inp_next_resno. - The raw
…Stmtnode and the resolvedQuerynode are distinct types.transformSelectStmtconsumes aSelectStmt(syntactic) and produces aQuery(resolved). The raw tree is never mutated into the result; the two trees have different node tags. - Clauses are transformed in dependency order, not syntax order:
FROMbefore everything (it fills the namespace), target list andORDER BYbeforeGROUP BY/DISTINCT(which reuse them), collation assignment and aggregate-legality checks last (they need the whole tree). - A single dispatch on node tag.
transformStmtswitches on the raw statement’s node tag to picktransformSelectStmt,transformInsertStmt, etc.; anything not “optimizable” gets wrapped in aCMD_UTILITYQueryand passed through untouched.
Entry points and the dispatch
Section titled “Entry points and the dispatch”Parse analysis is entered through parse_analyze_fixedparams (and
_varparams / _withcb variants for different parameter-handling
needs). Each builds a fresh top-level ParseState, records the source
text (for error cursor positions), and calls transformTopLevelStmt:
// parse_analyze_fixedparams — src/backend/parser/analyze.c (condensed)ParseState *pstate = make_parsestate(NULL);Query *query;
pstate->p_sourcetext = sourceText;if (numParams > 0) setup_parse_fixed_parameters(pstate, paramTypes, numParams);pstate->p_queryEnv = queryEnv;
query = transformTopLevelStmt(pstate, parseTree);// ... JumbleQuery, post_parse_analyze_hook ...free_parsestate(pstate);return query;transformTopLevelStmt exists only to copy statement location data
and to allow the top-level-only SELECT … INTO → CREATE TABLE AS
rewrite; it then funnels into transformStmt, the dispatcher:
// transformStmt — src/backend/parser/analyze.c (condensed)switch (nodeTag(parseTree)){ case T_InsertStmt: result = transformInsertStmt(pstate, (InsertStmt *) parseTree); break; case T_DeleteStmt: result = transformDeleteStmt(pstate, (DeleteStmt *) parseTree); break; case T_UpdateStmt: result = transformUpdateStmt(pstate, (UpdateStmt *) parseTree); break; case T_SelectStmt: { SelectStmt *n = (SelectStmt *) parseTree; if (n->valuesLists) result = transformValuesClause(pstate, n); else if (n->op == SETOP_NONE) result = transformSelectStmt(pstate, n); else result = transformSetOperationStmt(pstate, n); } break; // ... DeclareCursor, Explain, CreateTableAs, Call ... default: /* not optimizable: wrap a CMD_UTILITY Query around it */ result = makeNode(Query); result->commandType = CMD_UTILITY; result->utilityStmt = (Node *) parseTree; break;}The default arm is the textbook’s “and so on” — CREATE TABLE,
GRANT, and most DDL need no semantic analysis here (their analysis,
if any, runs at execution time, per parse_utilcmd.c), so they are
wrapped untouched. The recursion into subqueries uses a sibling entry
point, parse_sub_analyze, which makes a child ParseState linked
to the parent via parentParseState — that link is what lets an inner
query resolve an outer column (a correlated reference).
flowchart TB RAW["RawStmt<br/>(from gram.y — syntactic only)"] PA["parse_analyze_fixedparams<br/>make_parsestate(NULL)"] TTLS["transformTopLevelStmt<br/>(SELECT INTO rewrite, location)"] TS["transformStmt<br/>(switch on nodeTag)"] RAW --> PA --> TTLS --> TS TS -->|T_SelectStmt| TSEL["transformSelectStmt"] TS -->|T_InsertStmt| TINS["transformInsertStmt"] TS -->|T_UpdateStmt| TUPD["transformUpdateStmt"] TS -->|T_DeleteStmt| TDEL["transformDeleteStmt"] TS -->|other| UTIL["CMD_UTILITY Query<br/>(passed through)"] TSEL --> Q["Query<br/>(resolved internal form)"] TINS --> Q TUPD --> Q TDEL --> Q
Figure 1 — The dispatch funnel. A catalog-free RawStmt from the
grammar enters parse_analyze_fixedparams, which allocates the
ParseState and dispatches by node tag in transformStmt. Optimizable
statements get a real semantic transform; everything else is wrapped in
a CMD_UTILITY Query and passed through. Every branch yields a
resolved Query. (Source: analyze.c.)
ParseState — the scratchpad threaded through the whole pass
Section titled “ParseState — the scratchpad threaded through the whole pass”Every transform* function takes a ParseState *pstate as its first
argument. It is the analyzer’s working memory: the accumulating range
table, the current name-visible scope, the join structure being built,
flags recording what constructs were seen, and the link to the parent
query level for correlated references.
// struct ParseState — src/include/parser/parse_node.h (condensed)struct ParseState{ ParseState *parentParseState; /* stack link to outer query */ const char *p_sourcetext; /* for error cursor positions */ List *p_rtable; /* range table so far (RangeTblEntry list) */ List *p_rteperminfos; /* permission info, one per RTE_RELATION */ List *p_joinlist; /* join items -> Query.jointree fromlist */ List *p_namespace; /* currently-referenceable RTEs (nsitems) */ bool p_lateral_active; /* are LATERAL-only names visible now? */ List *p_ctenamespace; /* visible WITH items */ Relation p_target_relation; /* INSERT/UPDATE/DELETE/MERGE target */ ParseExprKind p_expr_kind; /* what kind of expression we're in */ int p_next_resno; /* next TargetEntry.resno, from 1 */ /* flags recording what the query contains: */ bool p_hasAggs; bool p_hasWindowFuncs; bool p_hasTargetSRFs; bool p_hasSubLinks; /* ... hooks for parameter/columnref resolution ... */};Three fields carry the bulk of the work and deserve naming up front:
p_rtableis the range table — the indexed list of every relation the query reads. Its 1-based position is thevarnothat every resolvedVarcarries.p_namespaceis the visible scope — a list ofParseNamespaceItem(nsitem) describing which RTEs can currently be referenced, by qualified name (p_rel_visible) and/or by unqualified name (p_cols_visible). It is not the same asp_rtable: an RTE exists in the range table the moment it is added, but becomes visible only when its nsitem is pushed onto the namespace, and a laterFROMitem cannot see an earlier sibling except throughLATERAL.p_joinlistaccumulates the top-level FROM items (RangeTblRefs andJoinExprs) that become theQuery.jointree’s fromlist at the end.
The SELECT spine — transformSelectStmt
Section titled “The SELECT spine — transformSelectStmt”transformSelectStmt is the canonical walk, and its body reads as a
checklist of clauses in dependency order. The ordering comments in
the source are themselves the design documentation:
// transformSelectStmt — src/backend/parser/analyze.c (condensed)Query *qry = makeNode(Query);qry->commandType = CMD_SELECT;
/* process the WITH clause independently of all else */if (stmt->withClause) { ... qry->cteList = transformWithClause(pstate, ...); }
/* make FOR UPDATE/SHARE and WINDOW info available downstream */pstate->p_locking_clause = stmt->lockingClause;pstate->p_windowdefs = stmt->windowClause;
/* process the FROM clause --- fills p_rtable, p_namespace, p_joinlist */transformFromClause(pstate, stmt->fromClause);
/* transform targetlist (expands '*', auto-names columns) */qry->targetList = transformTargetList(pstate, stmt->targetList, EXPR_KIND_SELECT_TARGET);markTargetListOrigins(pstate, qry->targetList);
/* transform WHERE and HAVING (each coerced to boolean) */qual = transformWhereClause(pstate, stmt->whereClause, EXPR_KIND_WHERE, "WHERE");qry->havingQual = transformWhereClause(pstate, stmt->havingClause, EXPR_KIND_HAVING, "HAVING");
/* ORDER BY first because GROUP BY and DISTINCT both need its results */qry->sortClause = transformSortClause(pstate, stmt->sortClause, &qry->targetList, EXPR_KIND_ORDER_BY, false);qry->groupClause = transformGroupClause(pstate, stmt->groupClause, &qry->groupingSets, &qry->targetList, qry->sortClause, EXPR_KIND_GROUP_BY, false);/* ... DISTINCT / DISTINCT ON, LIMIT/OFFSET, window clauses ... */
qry->rtable = pstate->p_rtable;qry->rteperminfos = pstate->p_rteperminfos;qry->jointree = makeFromExpr(pstate->p_joinlist, qual);qry->hasSubLinks = pstate->p_hasSubLinks;qry->hasAggs = pstate->p_hasAggs;
assign_query_collations(pstate, qry);if (pstate->p_hasAggs || qry->groupClause || qry->groupingSets || qry->havingQual) parseCheckAggregates(pstate, qry);return qry;Two ordering decisions are worth dwelling on, because they are the non-obvious part:
- FROM before the target list and WHERE. Both reference columns,
and a column reference can only be resolved once its source RTE is
in
p_namespace. SotransformFromClauseruns first and fully populates the namespace before any expression is transformed. - ORDER BY before GROUP BY and DISTINCT. The source comment says
it plainly: “Do ORDER BY first because both transformGroupClause and
transformDistinctClause need the results.” All three may add resjunk
(hidden) entries to the target list, and they coordinate through the
shared, by-reference
&qry->targetList. The sort clause is built first so the grouping/distinct steps can match against it.
The final three lines are the whole-tree passes: the jointree is
assembled from the accumulated p_joinlist plus the WHERE qual;
assign_query_collations walks every expression assigning collation;
and parseCheckAggregates enforces the SQL rule that every
non-aggregated target-list column appears in GROUP BY.
flowchart TB WITH["WITH (CTEs)"] FROM["transformFromClause<br/>builds RTEs + namespace + joinlist"] TL["transformTargetList<br/>'*' expansion, FigureColname"] WH["transformWhereClause<br/>coerce_to_boolean"] HAV["HAVING<br/>coerce_to_boolean"] OB["transformSortClause<br/>(ORDER BY first)"] GB["transformGroupClause<br/>DISTINCT"] LIM["LIMIT / OFFSET / windows"] COLL["assign_query_collations"] AGG["parseCheckAggregates"] QRY["assemble Query<br/>rtable, jointree, targetList"] WITH --> FROM --> TL --> WH --> HAV --> OB --> GB --> LIM --> QRY --> COLL --> AGG
Figure 2 — Clause transform order inside transformSelectStmt. The
order is driven by data dependencies, not the SQL keyword order: FROM
populates the namespace before anything references columns; ORDER BY
precedes GROUP BY/DISTINCT because the latter reuse the sort target
list; collation assignment and the aggregate-legality check run last
over the assembled tree. (Source: analyze.c
transformSelectStmt.)
FROM and the range table
Section titled “FROM and the range table”transformFromClause iterates the raw FROM list left-to-right (order
matters for LATERAL visibility), transforming each item into a join
node and merging its names into the namespace:
// transformFromClause — src/backend/parser/parse_clause.c (condensed)foreach(fl, frmList){ Node *n = lfirst(fl); ParseNamespaceItem *nsitem; List *namespace;
n = transformFromClauseItem(pstate, n, &nsitem, &namespace); checkNameSpaceConflicts(pstate, pstate->p_namespace, namespace); /* new items are LATERAL-only until the whole FROM list is done */ setNamespaceLateralState(namespace, true, true); pstate->p_joinlist = lappend(pstate->p_joinlist, n); pstate->p_namespace = list_concat(pstate->p_namespace, namespace);}/* once FROM is fully parsed, make everything unconditionally visible */setNamespaceLateralState(pstate->p_namespace, false, true);transformFromClauseItem is a type switch over the kinds of FROM
item. A plain table (RangeVar), a sub-SELECT (RangeSubselect), a
set-returning function (RangeFunction), JSON_TABLE, TABLESAMPLE,
and explicit JoinExprs each route to a dedicated transform that adds
one or more RTEs and returns the nsitem(s) it exposes:
// transformFromClauseItem — src/backend/parser/parse_clause.c (condensed)if (IsA(n, RangeVar)){ /* Plain relation reference, or perhaps a CTE reference */ nsitem = getNSItemForSpecialRelationTypes(pstate, rv); if (!nsitem) nsitem = transformTableEntry(pstate, rv); /* opens + locks the table */ *top_nsitem = nsitem; *namespace = list_make1(nsitem); rtr = makeNode(RangeTblRef); rtr->rtindex = nsitem->p_rtindex; /* points at the new RTE */ return (Node *) rtr;}else if (IsA(n, RangeSubselect)) { ... transformRangeSubselect ... }else if (IsA(n, JoinExpr)) { ... recurse into larg / rarg ... }A plain table eventually reaches transformTableEntry →
addRangeTableEntry, which is where the textbook’s “verifies that the
relation names appearing in the query are names of relations in the
database” actually happens — the relation is opened from the catalog
(taking the appropriate lock), and a RangeTblEntry is built from its
TupleDesc:
// addRangeTableEntryForRelation — src/backend/parser/parse_relation.c (condensed)RangeTblEntry *rte = makeNode(RangeTblEntry);char *refname = alias ? alias->aliasname : RelationGetRelationName(rel);
rte->rtekind = RTE_RELATION;rte->relid = RelationGetRelid(rel); /* the catalog OID */rte->relkind = rel->rd_rel->relkind;rte->rellockmode = lockmode;rte->eref = makeAlias(refname, NIL);buildRelationAliases(rel->rd_att, alias, rte->eref); /* column names */
perminfo = addRTEPermissionInfo(&pstate->p_rteperminfos, rte);perminfo->requiredPerms = ACL_SELECT;
/* append to the range table; its 1-based position is the varno */pstate->p_rtable = lappend(pstate->p_rtable, rte);return buildNSItemFromTupleDesc(rte, list_length(pstate->p_rtable), perminfo, rel->rd_att);The returned nsitem is the resolution table for this relation’s
columns. buildNSItemFromTupleDesc walks the relation’s TupleDesc
and records, per column, exactly the data needed to later synthesize a
Var:
// buildNSItemFromTupleDesc — src/backend/parser/parse_relation.c (condensed)for (varattno = 0; varattno < maxattrs; varattno++){ Form_pg_attribute attr = TupleDescAttr(tupdesc, varattno); if (attr->attisdropped) continue; /* dropped column: leave entry all-zero */ nscolumns[varattno].p_varno = rtindex; /* range-table index */ nscolumns[varattno].p_varattno = varattno + 1; /* 1-based attno */ nscolumns[varattno].p_vartype = attr->atttypid; nscolumns[varattno].p_vartypmod = attr->atttypmod; nscolumns[varattno].p_varcollid = attr->attcollation; nscolumns[varattno].p_varnosyn = rtindex; nscolumns[varattno].p_varattnosyn = varattno + 1;}nsitem->p_rte = rte;nsitem->p_rtindex = rtindex;nsitem->p_nscolumns = nscolumns;nsitem->p_rel_visible = true; /* qualified refs OK */nsitem->p_cols_visible = true; /* unqualified refs OK */So the relationship is: the RTE goes in p_rtable (giving it a
varno); the nsitem goes in p_namespace (making it referenceable);
and each ParseNamespaceColumn is the recipe for the Var that a
column reference to that column will become. When transformExpr
later resolves t.c, it finds the nsitem for t, locates column c,
and copies (p_varno, p_varattno, p_vartype, p_varcollid) straight
into a fresh Var.
Joins, LATERAL, and the visibility dance
Section titled “Joins, LATERAL, and the visibility dance”For an explicit JoinExpr, transformFromClauseItem recurses into
the left arm, then temporarily exposes the left arm’s names so the
right arm can LATERAL-reference them, processes the right arm, then
removes them again:
// transformFromClauseItem (JoinExpr arm) — parse_clause.c (condensed)j->larg = transformFromClauseItem(pstate, j->larg, &l_nsitem, &l_namespace);
/* expose left side to the right side for LATERAL, then process RHS */lateral_ok = (j->jointype == JOIN_INNER || j->jointype == JOIN_LEFT);setNamespaceLateralState(l_namespace, true, lateral_ok);sv_namespace_length = list_length(pstate->p_namespace);pstate->p_namespace = list_concat(pstate->p_namespace, l_namespace);
j->rarg = transformFromClauseItem(pstate, j->rarg, &r_nsitem, &r_namespace);
/* remove the left-side RTEs from the namespace again */pstate->p_namespace = list_truncate(pstate->p_namespace, sv_namespace_length);checkNameSpaceConflicts(pstate, l_namespace, r_namespace);For NATURAL joins and USING, the code then walks the two sides’
column-name lists to compute the merged join columns; the join itself
becomes an RTE_JOIN entry whose nsitem exposes the merged column
set. The subtle visibility rule the ParseNamespaceItem comment
encodes — a join without an alias does not hide its member tables
for qualified references but does hide their columns for
unqualified ones — is exactly why p_rel_visible and
p_cols_visible are two separate booleans rather than one.
The target list — star expansion and auto-naming
Section titled “The target list — star expansion and auto-naming”transformTargetList turns the raw list of ResTarget nodes into a
list of TargetEntry nodes. Its two special jobs are expanding
something.* and auto-naming unlabeled columns:
// transformTargetList — src/backend/parser/parse_target.c (condensed)expand_star = (exprKind != EXPR_KIND_UPDATE_SOURCE);foreach(o_target, targetlist){ ResTarget *res = (ResTarget *) lfirst(o_target); if (expand_star && IsA(res->val, ColumnRef) && IsA(llast(((ColumnRef *) res->val)->fields), A_Star)) { /* "tbl.*" or "*": expand into one target per column */ p_target = list_concat(p_target, ExpandColumnRefStar(pstate, (ColumnRef *) res->val, true)); continue; } /* ordinary expression */ p_target = lappend(p_target, transformTargetEntry(pstate, res->val, NULL, exprKind, res->name, false));}transformTargetEntry transforms the expression (via transformExpr)
and, crucially, supplies a column name when the user didn’t write
AS …:
// transformTargetEntry — src/backend/parser/parse_target.c (condensed)if (expr == NULL) expr = transformExpr(pstate, node, exprKind);if (colname == NULL && !resjunk) colname = FigureColname(node); /* derive a label from the expr shape */return makeTargetEntry((Expr *) expr, (AttrNumber) pstate->p_next_resno++, /* assign resno */ colname, resjunk);pstate->p_next_resno++ is the target-list column counter: each
TargetEntry gets the next resno, starting at 1. FigureColname
descends the expression to pick a label — the last field name of a
ColumnRef, a function’s name for a FuncCall, and so on — falling
back to the literal "?column?" when nothing better is available:
// FigureColname — src/backend/parser/parse_target.c (condensed)char *FigureColname(Node *node){ char *name = NULL; (void) FigureColnameInternal(node, &name); if (name != NULL) return name; return "?column?"; /* the famous default */}After the list is built, markTargetListOrigins annotates each
TargetEntry with the underlying table/column it came from
(resorigtbl / resorigcol) when the expression is a simple Var of
a real relation — this is what lets psql \d-style tools and
information_schema report a result column’s source table. It drills
through subqueries and CTEs but deliberately not through joins or
views.
Resolving GROUP BY / ORDER BY against the target list
Section titled “Resolving GROUP BY / ORDER BY against the target list”ORDER BY and GROUP BY items can be written three ways — a full
expression, a target-list column alias, or an output-column
number (ORDER BY 1). findTargetlistEntrySQL92 implements the
SQL92 alias/number rules: a bare identifier is matched against
existing target-list names (erroring on a non-equal duplicate), and an
integer constant selects the nth output column:
// findTargetlistEntrySQL92 — src/backend/parser/parse_clause.c (condensed)if (IsA(node, ColumnRef) && list_length(cref->fields) == 1 && IsA(linitial(cref->fields), String)){ char *name = strVal(linitial(cref->fields)); if (exprKind == EXPR_KIND_GROUP_BY) { /* GROUP BY prefers a FROM-column match over a tlist alias */ if (colNameToVar(pstate, name, true, location) != NULL) name = NULL; } if (name != NULL) { foreach(tl, *tlist) /* match against existing tlist names */ { TargetEntry *tle = (TargetEntry *) lfirst(tl); if (!tle->resjunk && strcmp(tle->resname, name) == 0) { if (target_result != NULL && !equal(target_result->expr, tle->expr)) ereport(ERROR, (errcode(ERRCODE_AMBIGUOUS_COLUMN), ...)); target_result = tle; } } if (target_result != NULL) return target_result; }}if (IsA(node, A_Const)) { /* "ORDER BY 3" -> the 3rd output column */ }The EXPR_KIND_GROUP_BY branch encodes a genuine SQL subtlety: per
SQL92, an identifier in GROUP BY refers to a FROM column, not a
target-list alias, so the code first checks whether the name resolves
as a FROM column (colNameToVar) and only falls through to the
tlist-alias rules if it doesn’t. transformSortClause and
transformGroupClause both go through this resolver and then call
addTargetToSortList, which appends a resjunk target-list entry if
the sort/group expression isn’t already an output column — this is why
SELECT a FROM t ORDER BY b works even though b is not selected.
Where each clause’s expression resolution lands
Section titled “Where each clause’s expression resolution lands”The leaf of every clause transform is transformExpr (owned by
postgres-parser.md’s expression section), called with a
ParseExprKind tag that tells it the context. transformWhereClause
is representative — transform, then coerce:
// transformWhereClause — src/backend/parser/parse_clause.cNode *qual = transformExpr(pstate, clause, exprKind);qual = coerce_to_boolean(pstate, qual, constructName);return qual;The ParseExprKind enum (EXPR_KIND_WHERE, EXPR_KIND_HAVING,
EXPR_KIND_GROUP_BY, EXPR_KIND_LIMIT, …) is how one
transformExpr produces context-specific error messages — e.g.,
“aggregate functions are not allowed in WHERE” — without each clause
needing its own expression walker. It is the single most reused piece
of context in the whole pass.
Source Walkthrough
Section titled “Source Walkthrough”The symbols below are the stable anchors of the parse-analysis pass. Line numbers drift; grep the symbol name to relocate. The position table at the end records line numbers as observed at the documented revision.
Entry points and dispatch (analyze.c)
Section titled “Entry points and dispatch (analyze.c)”parse_analyze_fixedparams— top-level entry: builds a rootParseState, sets source text and fixed$nparameter types, callstransformTopLevelStmt, runsJumbleQueryandpost_parse_analyze_hook, frees the pstate. Siblingsparse_analyze_varparams(inferred param types) andparse_analyze_withcb(caller-supplied param hook).parse_sub_analyze— recursive entry for a sub-statement; makes a childParseStatewithparentParseStateset, so the subquery can resolve correlated references to the outer query.transformTopLevelStmt— copiesstmt_location/stmt_leninto the result and allows the top-onlySELECT … INTOrewrite viatransformOptionalSelectInto.transformStmt— the node-tag dispatcher. RoutesT_SelectStmt/T_InsertStmt/T_UpdateStmt/T_DeleteStmt/T_MergeStmtto their transforms; wraps everything else in aCMD_UTILITYQuery. Marks the resultQSRC_ORIGINAL,canSetTag = true.stmt_requires_parse_analysis— companion predicate (true exactly for the statement typestransformStmtdoes non-trivial work on); used by callers to decide whether re-analysis is ever needed.
The SELECT transform (analyze.c)
Section titled “The SELECT transform (analyze.c)”transformSelectStmt— the spine: WITH, FROM, target list, WHERE/HAVING, ORDER BY → GROUP BY → DISTINCT, LIMIT, windows, collations, aggregate check. Assembles theQuery.transformValuesClause— standaloneVALUES (…)as a SELECT; builds a column-organized intermediate then anRTE_VALUES.transformInsertStmt/transformUpdateStmt/transformDeleteStmt— the DML transforms; each callssetTargetTableto add and lock the target relation, then transform their respective clauses (RETURNINGviatransformReturningList).
ParseState and namespace (parse_node.h, parse_node.c)
Section titled “ParseState and namespace (parse_node.h, parse_node.c)”struct ParseState— the threaded scratch state;p_rtable,p_namespace,p_joinlist,p_next_resnoare the load-bearing fields.make_parsestate/free_parsestate— allocate/free the pstate;make_parsestateinherits a parent for subqueries.struct ParseNamespaceItem— one visible relation: its RTE, range-table index, per-column data, and thep_rel_visible/p_cols_visible/p_lateral_onlyvisibility flags.struct ParseNamespaceColumn— per-columnVar-construction recipe (p_varno,p_varattno,p_vartype,p_varcollid, syntactic variants).ParseExprKind(enum) — the per-context tag threaded intotransformExpr.
FROM / range table (parse_clause.c, parse_relation.c)
Section titled “FROM / range table (parse_clause.c, parse_relation.c)”transformFromClause— iterate FROM items left-to-right; merge namespaces; buildp_joinlist.transformFromClauseItem— type switch overRangeVar/RangeSubselect/RangeFunction/RangeTableFunc/JsonTable/RangeTableSample/JoinExpr; recurses for joins with the LATERAL-visibility dance.transformTableEntry→addRangeTableEntry/addRangeTableEntryForRelation— open & lock the relation, build theRTE_RELATION, append top_rtable.addRangeTableEntryForSubquery—RTE_SUBQUERYfrom an analyzed sub-Query.buildNSItemFromTupleDesc— fillParseNamespaceColumn[]from a relation’sTupleDesc.addNSItemToQuery— push an nsitem onto joinlist and/or namespace with the right visibility flags.setTargetTable— add + lock the INSERT/UPDATE/DELETE/MERGE target, setp_target_relation/p_target_nsitem.scanRTEForColumn/colNameToVar— resolve an (unqualified) column name to an attribute /Var, erroring on ambiguity.GetRTEByRangeTablePosn— fetch the RTE at a(varno, levelsup), walking up parent pstates.addRTEPermissionInfo— record theRTEPermissionInfofor anRTE_RELATION(permissions checked later, at execution).
Target list (parse_target.c)
Section titled “Target list (parse_target.c)”transformTargetList—ResTargetlist →TargetEntrylist, withsomething.*expansion.transformTargetEntry— transform one expression, assignresnoviap_next_resno++, auto-name viaFigureColname.transformExpressionList— same as the target list but for bare expressions (ROW(),VALUES).ExpandColumnRefStar/ExpandAllTables— expandtbl.*/ bare*into per-column targets.FigureColname/FigureColnameInternal— derive an output column name from an expression’s shape;"?column?"fallback.markTargetListOrigins/markTargetListOrigin— setresorigtbl/resorigcolfor simple-Var targets.resolveTargetListUnknowns— coerce still-unknown output columns totext.
Sorting / grouping / clause coercion (parse_clause.c)
Section titled “Sorting / grouping / clause coercion (parse_clause.c)”transformSortClause— ORDER BY →SortGroupClauselist (run before GROUP BY).transformGroupClause/transformGroupingSet— GROUP BY, includingCUBE/ROLLUP/GROUPING SETSflattening.transformDistinctClause/transformDistinctOnClause— DISTINCT and DISTINCT ON.findTargetlistEntrySQL92/findTargetlistEntrySQL99— resolve a sort/group item as a tlist alias, output-column number, or full expression.transformWhereClause— transform +coerce_to_boolean.transformLimitClause— transform +coerce_to_specific_type(INT8) + var-free check.transformWindowDefinitions—WindowDef→WindowClause.
Whole-tree finishing passes
Section titled “Whole-tree finishing passes”makeFromExpr(makefuncs.c) — wrapp_joinlist+ WHERE qual into theQuery.jointree.assign_query_collations(parse_collate.c) — assign collations across all expressions.parseCheckAggregates(parse_agg.c) — enforce the GROUP-BY/aggregate legality rule.transformWithClause(parse_cte.c) — analyze WITH items intocteList.transformExpr(parse_expr.c) — the per-expression resolver every clause bottoms out in (black box here; owned bypostgres-parser.md).
Position hints (as of 2026-06-05, REL_18 273fe94)
Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”| Symbol | File | Line |
|---|---|---|
parse_analyze_fixedparams | parser/analyze.c | 105 |
parse_sub_analyze | parser/analyze.c | 222 |
transformTopLevelStmt | parser/analyze.c | 249 |
transformOptionalSelectInto | parser/analyze.c | 273 |
transformStmt | parser/analyze.c | 312 |
stmt_requires_parse_analysis | parser/analyze.c | 447 |
transformDeleteStmt | parser/analyze.c | 553 |
transformInsertStmt | parser/analyze.c | 633 |
transformSelectStmt | parser/analyze.c | 1389 |
transformValuesClause | parser/analyze.c | 1532 |
transformUpdateStmt | parser/analyze.c | 2472 |
struct ParseState | include/parser/parse_node.h | 192 |
struct ParseNamespaceItem | include/parser/parse_node.h | 292 |
struct ParseNamespaceColumn | include/parser/parse_node.h | 328 |
ParseExprKind (enum) | include/parser/parse_node.h | 38 |
make_parsestate | parser/parse_node.c | 39 |
transformFromClause | parser/parse_clause.c | 112 |
transformTableEntry | parser/parse_clause.c | 395 |
transformFromClauseItem | parser/parse_clause.c | 1054 |
transformWhereClause | parser/parse_clause.c | 1854 |
transformLimitClause | parser/parse_clause.c | 1881 |
findTargetlistEntrySQL92 | parser/parse_clause.c | 2006 |
findTargetlistEntrySQL99 | parser/parse_clause.c | 2172 |
transformGroupClause | parser/parse_clause.c | 2632 |
transformSortClause | parser/parse_clause.c | 2732 |
transformWindowDefinitions | parser/parse_clause.c | 2765 |
GetRTEByRangeTablePosn | parser/parse_relation.c | 545 |
scanRTEForColumn | parser/parse_relation.c | 815 |
colNameToVar | parser/parse_relation.c | 898 |
buildNSItemFromTupleDesc | parser/parse_relation.c | 1309 |
addRangeTableEntry | parser/parse_relation.c | 1487 |
addRangeTableEntryForRelation | parser/parse_relation.c | 1584 |
addRangeTableEntryForSubquery | parser/parse_relation.c | 1655 |
addNSItemToQuery | parser/parse_relation.c | 2710 |
addRTEPermissionInfo | parser/parse_relation.c | 3980 |
transformTargetEntry | parser/parse_target.c | 75 |
transformTargetList | parser/parse_target.c | 121 |
transformExpressionList | parser/parse_target.c | 220 |
resolveTargetListUnknowns | parser/parse_target.c | 288 |
markTargetListOrigins | parser/parse_target.c | 318 |
ExpandColumnRefStar | parser/parse_target.c | 1123 |
ExpandAllTables | parser/parse_target.c | 1296 |
FigureColname | parser/parse_target.c | 1713 |
assign_query_collations | parser/parse_collate.c | 101 |
parseCheckAggregates | parser/parse_agg.c | 1138 |
transformWithClause | parser/parse_cte.c | 110 |
transformExpr | parser/parse_expr.c | 119 |
makeFromExpr | nodes/makefuncs.c | 336 |
Source verification (as of 2026-06-05)
Section titled “Source verification (as of 2026-06-05)”Verified facts
Section titled “Verified facts”- The grammar output (
RawStmt) names no catalog objects; parse analysis is where catalog lookup first happens. Verified by readingtransformTableEntry→addRangeTableEntry→addRangeTableEntryForRelationinparse_clause.c/parse_relation.con 2026-06-05: the relation is opened (RelationGetRelid, lock taken) inside parse analysis, not by the grammar. This is the concrete realization of DSC §15.1’s “verifies that the relation names … are names of the relations in the database.” transformStmtwraps every non-optimizable statement in aCMD_UTILITYQueryand does no further analysis. Verified in thedefault:arm oftransformStmt(analyze.c) on 2026-06-05. DDL semantic analysis (parse_utilcmd.c) is deferred to execution time — confirmed by the parser/README line “parse_utilcmd.c parse analysis for utility commands (done at execution time).”- Clause transform order is dependency-driven, not SQL-keyword
order. Verified by reading
transformSelectStmt(analyze.c):transformFromClauseruns before the target list and WHERE, and the source comment “Do ORDER BY first because both transformGroupClause and transformDistinctClause need the results” precedestransformSortClause, which is called beforetransformGroupClauseand the DISTINCT transforms. - A column’s resolved identity is
(varno, varattno)plus type and collation, copied from the nsitem’sParseNamespaceColumn. Verified inbuildNSItemFromTupleDesc(parse_relation.c): eachParseNamespaceColumnrecordsp_varno,p_varattno,p_vartype,p_vartypmod,p_varcollid— exactly the fields aVarneeds. - The default output-column name for an un-nameable expression is
the literal string
"?column?". Verified inFigureColname(parse_target.c) on 2026-06-05 — a hard-coded fallback, not a parameter. p_rel_visibleandp_cols_visibleare two independent booleans so that an unaliased JOIN can keep its member tables qualified-name-visible while hiding their columns from unqualified references. Verified in thestruct ParseNamespaceItemcomment and the join arm oftransformFromClauseItemon 2026-06-05.LATERALvisibility is implemented by temporarily concatenating the left arm’s namespace before processing the right arm, then truncating it back. Verified in theJoinExprarm oftransformFromClauseItem(list_concat…list_truncatearound the RHS recursion) on 2026-06-05.- WHERE/HAVING are coerced to boolean and LIMIT to int8 inside the
clause transforms. Verified:
transformWhereClausecallscoerce_to_boolean;transformLimitClausecallscoerce_to_specific_type(..., INT8OID, ...)andcheckExprIsVarFreeon 2026-06-05.
Open questions
Section titled “Open questions”- Lock-ordering guarantee for self-referential DML. The comment
on
setTargetTableclaims the write lock on the target must be grabbed before any read lock “in case the target is also mentioned as a source relation.” Does any later code path (e.g., a CTE that re-references the target, or a rule expansion) violate this intended ordering? Investigation path: tracesetTargetTablecallers intransformInsertStmt/transformUpdateStmt/transformMergeStmtand compare againsttransformFromClauselock acquisition order. - Cost of re-analysis under plan invalidation. Because
stmt_requires_parse_analysisreturning false lets callers skip the whole analyze/rewrite/plan pipeline, when exactly is parse analysis re-run for a cached plan whose catalog dependency changed? Investigation path: followRevalidateCachedQueryinplancache.cand where it re-entersparse_analyze_*— this crosses into territory owned by a futurepostgres-plan-cache.md. markTargetListOriginsand security-barrier views. The function deliberately does not drill through views (reports the view as owner). Does this interact with row-level security or security-barrier view leakproofness in any observable way for information_schema consumers? Investigation path: cross-check with the rewriter’s view-expansion handling (postgres-rewriter.md, not yet written).
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”- The raw-tree / query-tree split as a reuse point. PostgreSQL’s
hard line between the catalog-free
RawStmtand the resolvedQueryis the same seam that letslibpg_query(pganalyze) ship the grammar as a standalone library for SQL linters and ORMs — they get step 1 (parsing) without a running server. A comparison with how SQLite fuses parse + resolve + codegen into one pass (no separate resolved IR at all) would sharpen why PostgreSQL pays for two trees. - Range table as the join-reordering substrate. PostgreSQL encodes
every column reference as
(varno, varattno)so the optimizer can reorder joins by permuting range-table indexes. This is the classic System R / Selinger (SIGMOD 1979) representation; the next document to write ispostgres-planner-overview.md’s account of howRelOptInfokeys off the range table. Cross-ref: this doc’s RTE construction is the input to that. - Name resolution and binding in the compiler literature. The ParseState namespace stack is the database analog of a compiler’s symbol-table / scope chain (Aho, Lam, Sethi & Ullman, Compilers, ch. 2 & 6). The LATERAL “expose then truncate” dance is essentially nested-scope management; a note connecting it to lexical scoping rules in block-structured languages would orient compiler-trained readers.
- View expansion: analyze-time vs. rewrite-time. DSC §15.1 files
view substitution under “translation,” but PostgreSQL defers it to a
separate rule-based rewriter (Stonebraker’s POSTGRES rule system,
The Design of POSTGRES, SIGMOD 1986). The trade-off — a clean
analyze/rewrite boundary at the cost of a second tree-walk — is the
subject of the not-yet-written
postgres-rewriter.md; this doc’sQueryis exactly that rewriter’s input. - Holistic semantic checking vs. streaming. PostgreSQL does
several whole-tree passes at the end (
assign_query_collations,parseCheckAggregates). An engine that wanted lower latency to first error might interleave these into the single descent; a comparison with how DuckDB’s binder structures its passes would reveal what PostgreSQL trades for the simpler two-stage shape.
Sources
Section titled “Sources”Raw analysis materials consumed: none — this document was
synthesized directly from the REL_18 source tree and the textbook
references below (sources: [] in frontmatter).
Textbook chapters:
- Silberschatz, Korth & Sudarshan, Database System Concepts, 7th
ed., ch. 15 “Query Processing”, §15.1 “Overview” — the
parsing/translation/optimization/evaluation framing and the
“verifies that relation names are names of relations” /
“translates into a relational-algebra expression” account of
semantic analysis and view expansion. Captured at
knowledge/research/dbms-general/database-system-concepts.md. - Hellerstein, Stonebraker & Hamilton, Architecture of a Database
System (FnT 2007), §4 “Relational Query Processor” — the query
processor as a pipeline of parser → rewriter → optimizer →
executor. Captured under
dbms-papers/fntdb07-architecture.md.
Source code (REL_18, 273fe94, as of 2026-06-05):
src/backend/parser/README— directory map; the syntactic/semantic split and the “done at execution time” note for utility commands.src/backend/parser/analyze.c— entry points,transformStmtdispatch,transformSelectStmtand the DML transforms.src/backend/parser/parse_clause.c— FROM/JOIN, WHERE/HAVING/LIMIT, ORDER BY / GROUP BY resolution.src/backend/parser/parse_relation.c— range-table and RTE construction, namespace/nsitem building, column resolution.src/backend/parser/parse_target.c— target-list transform, star expansion,FigureColname, origin marking.src/backend/parser/parse_node.c,src/include/parser/parse_node.h—ParseState,ParseNamespaceItem,ParseNamespaceColumn,ParseExprKind.src/backend/parser/parse_collate.c,parse_agg.c,parse_cte.c,parse_expr.c,src/backend/nodes/makefuncs.c— the finishing passes and the expression-resolution leaf.
Related documents in this tree:
postgres-parser.md— the syntactic phase (gram.y, scanner) and expression-internal resolution (transformExprinternals).postgres-planner-overview.md— what the planner does with the finishedQuery.postgres-executor.md— the demand-pull execution of the resulting plan.postgres-overview-query-processing.md— the query-processing subsystem map this document sits within.