Skip to content

PostgreSQL Parse Analysis — Transforming the Raw Tree into a Query

Contents:

A SQL statement enters the engine as a string and must leave the front end as a structure the optimizer can reason about. Database System Concepts (Silberschatz, 7e, ch. 15 “Query Processing”, §15.1 “Overview”) names this the very first of three processing steps:

“1. Parsing and translation. 2. Optimization. 3. Evaluation.” (DSC §15.1)

and is explicit that step 1 is not one thing but two — a syntactic phase and a semantic phase fused under the word “parser”:

“Thus, the first action the system must take in query processing is to translate a given query into its internal form. This translation process is similar to the work performed by the parser of a compiler. In generating the internal form of the query, the parser checks the syntax of the user’s query, verifies that the relation names appearing in the query are names of the relations in the database, and so on. The system constructs a parse-tree representation of the query, which it then translates into a relational-algebra expression. If the query was expressed in terms of a view, the translation phase also replaces all uses of the view by the relational-algebra expression that defines the view.” (DSC §15.1)

Two activities are bundled in that sentence, and the distinction is the central design axis of any front end:

  1. Syntactic analysis (parsing). Is this a well-formed SQL sentence? Tokenize, match the grammar, build a parse tree. This phase is a pure function of the input string — it consults nothing outside the grammar. A compiler-textbook LALR parser is the canonical realization.
  2. Semantic analysis (the “translation”/“verifies that…” half). Do the names in this well-formed sentence mean anything? Does table t exist? Does column c belong to it, and what is its type? Is WHERE x a boolean? This phase is a function of the catalog — it cannot be done by the grammar because the grammar does not know what tables exist.

The textbook leaves the boundary between these two implicit, but it is the load-bearing seam. Syntactic analysis can be a fixed, catalog-free, reentrant routine; semantic analysis must read the system catalog, may acquire locks on the relations it resolves, and must produce typed, position-resolved references. Almost every production engine draws a hard line here, with a distinct intermediate representation on each side:

  • the raw parse tree — syntactic, names unresolved, the literal shape of what the user wrote; and
  • the resolved query tree (“internal form”, DSC’s relational-algebra expression) — every name bound to a catalog object, every expression typed.

A third property the textbook names in passing is view expansion: “the translation phase also replaces all uses of the view by the relational-algebra expression that defines the view.” Whether that substitution happens during semantic analysis or in a later dedicated phase is an engineering choice (PostgreSQL defers it to a separate rewriter, covered elsewhere), but the textbook files it under the same “translate to internal form” umbrella, so it belongs in the design space this document sits in.

PostgreSQL implements this split almost exactly. The grammar (gram.y, owned by postgres-parser.md) does the syntactic phase and emits a RawStmt; parse analysis — the subject of this document — does the semantic phase, walking the RawStmt and producing a fully-resolved Query. The rest of this document traces that walk in the REL_18 source. The textbook’s “verifies that relation names are names of relations” becomes catalog lookup plus range-table construction; its “translates into a relational-algebra expression” becomes the Query node with its jointree, target list, and qualifier expressions.

The textbook gives the model — a syntactic phase feeding a semantic phase that resolves names against the catalog and emits a typed internal form. This section names the engineering conventions that recur across production semantic analyzers, the patterns the textbook leaves implicit. PostgreSQL’s specific choices in §“PostgreSQL’s Approach” are best read as one set of dials within this shared space.

A separate IR for “resolved”, distinct from “parsed”

Section titled “A separate IR for “resolved”, distinct from “parsed””

The raw parse tree mirrors surface syntax: a table is just a name, a column reference is just a list of identifiers (a.b.c), SELECT * is a literal star. The resolved tree must instead carry bound references: a table becomes an index into a per-query list of resolved relations, a column reference becomes a typed pointer at one such relation’s nth attribute, the star is gone — expanded into the actual columns. Keeping these two as distinct node types (rather than mutating the raw tree in place) means the syntactic phase stays a catalog-free pure function that can run in a frontend library, be cached, or be re-analyzed against a different catalog snapshot.

The range table: one indexed list of “relations this query reads”

Section titled “The range table: one indexed list of “relations this query reads””

The recurring representation for “which relations does this query touch” is a flat, 1-based indexed list — every table, subquery, function-in-FROM, join, or VALUES list the query references gets one slot. A resolved column reference then encodes its source as a pair (range-table index, attribute number) rather than carrying a table name. This indirection is what makes the rest of the stack tractable: the optimizer reorders joins by shuffling references to range-table indexes, never by rewriting names; two columns from the same table share one range-table slot; and a column’s identity survives every later transformation because it is a pair of integers, not a string.

Name resolution needs a notion of what is visible here. SQL scoping is not trivial — a subquery in FROM sees the outer query’s columns only through LATERAL; a join’s USING columns hide the inputs’ copies; an outer query level is searched only after the local one. The convention is a namespace structure, layered or stacked, that the analyzer grows as it walks into a FROM item and consults whenever it must turn an identifier into a bound reference. Resolving x means searching the current namespace for a relation exposing a column named x, erroring on ambiguity, and recursing to the parent scope only if the local search fails.

Clause ordering driven by data dependencies, not syntax order

Section titled “Clause ordering driven by data dependencies, not syntax order”

The clauses of a SELECT are written SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY, but they cannot be analyzed in that order. WHERE and the target list both reference columns, so FROM (which populates the namespace) must be analyzed first. GROUP BY and DISTINCT may reference target-list aliases or ORDER BY sort items, so the target list and sort clause must be built before them. The convention is to analyze clauses in dependency order, not the order the user typed, and to thread the partially-built pieces (the growing target list especially) through the steps that extend them.

Auto-naming and implicit type coercion at the boundary

Section titled “Auto-naming and implicit type coercion at the boundary”

Two janitorial conventions almost every analyzer carries. First, output-column naming: an unnamed target-list expression (SELECT a+1) still needs a column label, so the analyzer derives one from the expression’s shape (the function name, the column name, or a fallback). Second, context-driven coercion: WHERE, HAVING, and CHECK expressions must be boolean; LIMIT must be an integer; a UNION’s two arms must share a type. The analyzer applies implicit casts at these well-known boundaries and errors when no cast exists.

By the time you reach a named PostgreSQL symbol in the next section, you should already know what kind of thing it is:

Theory / conventionPostgreSQL name
Syntactic phase output (raw parse tree)RawStmt wrapping SelectStmt / InsertStmt / … (from gram.y)
Semantic phase (this document)parse analysis — transformStmt and the transform* tree
Resolved/internal form (relational algebra)Query node (commandType, jointree, targetList, …)
Per-query analysis scratch stateParseState (pstate)
Range table (indexed list of relations)pstate->p_rtable — list of RangeTblEntry (RTE); index is varno
One range-table entryRangeTblEntry built by addRangeTableEntryForRelation & siblings
Bound column referenceVar carrying (varno, varattno, vartype, varcollid)
Visible-names scopepstate->p_namespace — list of ParseNamespaceItem (nsitem)
Per-column resolution dataParseNamespaceColumn (in each nsitem’s p_nscolumns)
FROM/join structure of the querypstate->p_joinlistQuery.jointree (a FromExpr)
Target-list column slot counterpstate->p_next_resno → each TargetEntry.resno
Star expansionExpandColumnRefStar / ExpandAllTables
Output-column auto-namingFigureColname
Boolean / integer coercion at clause edgescoerce_to_boolean, coerce_to_specific_type
Catalog name lookup for a tabletransformTableEntryaddRangeTableEntry (opens & locks the relation)

The syntactic phase — how gram.y builds the RawStmt and why it is catalog-free — is owned by postgres-parser.md. View expansion and rule application (DSC’s “replace uses of the view”) happen in a separate phase owned by the rewriter (postgres-rewriter.md). Expression-internal resolution — how transformExpr turns an A_Expr into an OpExpr, resolves function calls, and folds constants — is largely owned by postgres-parser.md’s expression section; this document treats transformExpr as a black box that returns a typed Node. What the planner does with the finished Query is owned by postgres-planner-overview.md. This document covers the transform framework: the ParseState, range-table and RTE construction, FROM/JOIN, the target list, GROUP/HAVING/ORDER resolution, and how the clauses thread together into a Query.

PostgreSQL’s parse analysis is the semantic phase rendered as a family of mutually-recursive transform* functions over the raw tree, with four design decisions that give it its shape:

  1. One mutable ParseState threads the whole pass. Rather than returning accumulators up the call stack, the analyzer carries a pstate that every transform* function reads and mutates: the range table grows in p_rtable, the visible scope in p_namespace, the join structure in p_joinlist, the next target-list slot in p_next_resno.
  2. The raw …Stmt node and the resolved Query node are distinct types. transformSelectStmt consumes a SelectStmt (syntactic) and produces a Query (resolved). The raw tree is never mutated into the result; the two trees have different node tags.
  3. Clauses are transformed in dependency order, not syntax order: FROM before everything (it fills the namespace), target list and ORDER BY before GROUP BY / DISTINCT (which reuse them), collation assignment and aggregate-legality checks last (they need the whole tree).
  4. A single dispatch on node tag. transformStmt switches on the raw statement’s node tag to pick transformSelectStmt, transformInsertStmt, etc.; anything not “optimizable” gets wrapped in a CMD_UTILITY Query and passed through untouched.

Parse analysis is entered through parse_analyze_fixedparams (and _varparams / _withcb variants for different parameter-handling needs). Each builds a fresh top-level ParseState, records the source text (for error cursor positions), and calls transformTopLevelStmt:

// parse_analyze_fixedparams — src/backend/parser/analyze.c (condensed)
ParseState *pstate = make_parsestate(NULL);
Query *query;
pstate->p_sourcetext = sourceText;
if (numParams > 0)
setup_parse_fixed_parameters(pstate, paramTypes, numParams);
pstate->p_queryEnv = queryEnv;
query = transformTopLevelStmt(pstate, parseTree);
// ... JumbleQuery, post_parse_analyze_hook ...
free_parsestate(pstate);
return query;

transformTopLevelStmt exists only to copy statement location data and to allow the top-level-only SELECT … INTOCREATE TABLE AS rewrite; it then funnels into transformStmt, the dispatcher:

// transformStmt — src/backend/parser/analyze.c (condensed)
switch (nodeTag(parseTree))
{
case T_InsertStmt:
result = transformInsertStmt(pstate, (InsertStmt *) parseTree);
break;
case T_DeleteStmt:
result = transformDeleteStmt(pstate, (DeleteStmt *) parseTree);
break;
case T_UpdateStmt:
result = transformUpdateStmt(pstate, (UpdateStmt *) parseTree);
break;
case T_SelectStmt:
{
SelectStmt *n = (SelectStmt *) parseTree;
if (n->valuesLists)
result = transformValuesClause(pstate, n);
else if (n->op == SETOP_NONE)
result = transformSelectStmt(pstate, n);
else
result = transformSetOperationStmt(pstate, n);
}
break;
// ... DeclareCursor, Explain, CreateTableAs, Call ...
default:
/* not optimizable: wrap a CMD_UTILITY Query around it */
result = makeNode(Query);
result->commandType = CMD_UTILITY;
result->utilityStmt = (Node *) parseTree;
break;
}

The default arm is the textbook’s “and so on” — CREATE TABLE, GRANT, and most DDL need no semantic analysis here (their analysis, if any, runs at execution time, per parse_utilcmd.c), so they are wrapped untouched. The recursion into subqueries uses a sibling entry point, parse_sub_analyze, which makes a child ParseState linked to the parent via parentParseState — that link is what lets an inner query resolve an outer column (a correlated reference).

flowchart TB
  RAW["RawStmt<br/>(from gram.y — syntactic only)"]
  PA["parse_analyze_fixedparams<br/>make_parsestate(NULL)"]
  TTLS["transformTopLevelStmt<br/>(SELECT INTO rewrite, location)"]
  TS["transformStmt<br/>(switch on nodeTag)"]
  RAW --> PA --> TTLS --> TS
  TS -->|T_SelectStmt| TSEL["transformSelectStmt"]
  TS -->|T_InsertStmt| TINS["transformInsertStmt"]
  TS -->|T_UpdateStmt| TUPD["transformUpdateStmt"]
  TS -->|T_DeleteStmt| TDEL["transformDeleteStmt"]
  TS -->|other| UTIL["CMD_UTILITY Query<br/>(passed through)"]
  TSEL --> Q["Query<br/>(resolved internal form)"]
  TINS --> Q
  TUPD --> Q
  TDEL --> Q

Figure 1 — The dispatch funnel. A catalog-free RawStmt from the grammar enters parse_analyze_fixedparams, which allocates the ParseState and dispatches by node tag in transformStmt. Optimizable statements get a real semantic transform; everything else is wrapped in a CMD_UTILITY Query and passed through. Every branch yields a resolved Query. (Source: analyze.c.)

ParseState — the scratchpad threaded through the whole pass

Section titled “ParseState — the scratchpad threaded through the whole pass”

Every transform* function takes a ParseState *pstate as its first argument. It is the analyzer’s working memory: the accumulating range table, the current name-visible scope, the join structure being built, flags recording what constructs were seen, and the link to the parent query level for correlated references.

// struct ParseState — src/include/parser/parse_node.h (condensed)
struct ParseState
{
ParseState *parentParseState; /* stack link to outer query */
const char *p_sourcetext; /* for error cursor positions */
List *p_rtable; /* range table so far (RangeTblEntry list) */
List *p_rteperminfos; /* permission info, one per RTE_RELATION */
List *p_joinlist; /* join items -> Query.jointree fromlist */
List *p_namespace; /* currently-referenceable RTEs (nsitems) */
bool p_lateral_active; /* are LATERAL-only names visible now? */
List *p_ctenamespace; /* visible WITH items */
Relation p_target_relation; /* INSERT/UPDATE/DELETE/MERGE target */
ParseExprKind p_expr_kind; /* what kind of expression we're in */
int p_next_resno; /* next TargetEntry.resno, from 1 */
/* flags recording what the query contains: */
bool p_hasAggs;
bool p_hasWindowFuncs;
bool p_hasTargetSRFs;
bool p_hasSubLinks;
/* ... hooks for parameter/columnref resolution ... */
};

Three fields carry the bulk of the work and deserve naming up front:

  • p_rtable is the range table — the indexed list of every relation the query reads. Its 1-based position is the varno that every resolved Var carries.
  • p_namespace is the visible scope — a list of ParseNamespaceItem (nsitem) describing which RTEs can currently be referenced, by qualified name (p_rel_visible) and/or by unqualified name (p_cols_visible). It is not the same as p_rtable: an RTE exists in the range table the moment it is added, but becomes visible only when its nsitem is pushed onto the namespace, and a later FROM item cannot see an earlier sibling except through LATERAL.
  • p_joinlist accumulates the top-level FROM items (RangeTblRefs and JoinExprs) that become the Query.jointree’s fromlist at the end.

transformSelectStmt is the canonical walk, and its body reads as a checklist of clauses in dependency order. The ordering comments in the source are themselves the design documentation:

// transformSelectStmt — src/backend/parser/analyze.c (condensed)
Query *qry = makeNode(Query);
qry->commandType = CMD_SELECT;
/* process the WITH clause independently of all else */
if (stmt->withClause) { ... qry->cteList = transformWithClause(pstate, ...); }
/* make FOR UPDATE/SHARE and WINDOW info available downstream */
pstate->p_locking_clause = stmt->lockingClause;
pstate->p_windowdefs = stmt->windowClause;
/* process the FROM clause --- fills p_rtable, p_namespace, p_joinlist */
transformFromClause(pstate, stmt->fromClause);
/* transform targetlist (expands '*', auto-names columns) */
qry->targetList = transformTargetList(pstate, stmt->targetList,
EXPR_KIND_SELECT_TARGET);
markTargetListOrigins(pstate, qry->targetList);
/* transform WHERE and HAVING (each coerced to boolean) */
qual = transformWhereClause(pstate, stmt->whereClause,
EXPR_KIND_WHERE, "WHERE");
qry->havingQual = transformWhereClause(pstate, stmt->havingClause,
EXPR_KIND_HAVING, "HAVING");
/* ORDER BY first because GROUP BY and DISTINCT both need its results */
qry->sortClause = transformSortClause(pstate, stmt->sortClause,
&qry->targetList, EXPR_KIND_ORDER_BY, false);
qry->groupClause = transformGroupClause(pstate, stmt->groupClause,
&qry->groupingSets, &qry->targetList,
qry->sortClause, EXPR_KIND_GROUP_BY, false);
/* ... DISTINCT / DISTINCT ON, LIMIT/OFFSET, window clauses ... */
qry->rtable = pstate->p_rtable;
qry->rteperminfos = pstate->p_rteperminfos;
qry->jointree = makeFromExpr(pstate->p_joinlist, qual);
qry->hasSubLinks = pstate->p_hasSubLinks;
qry->hasAggs = pstate->p_hasAggs;
assign_query_collations(pstate, qry);
if (pstate->p_hasAggs || qry->groupClause || qry->groupingSets || qry->havingQual)
parseCheckAggregates(pstate, qry);
return qry;

Two ordering decisions are worth dwelling on, because they are the non-obvious part:

  • FROM before the target list and WHERE. Both reference columns, and a column reference can only be resolved once its source RTE is in p_namespace. So transformFromClause runs first and fully populates the namespace before any expression is transformed.
  • ORDER BY before GROUP BY and DISTINCT. The source comment says it plainly: “Do ORDER BY first because both transformGroupClause and transformDistinctClause need the results.” All three may add resjunk (hidden) entries to the target list, and they coordinate through the shared, by-reference &qry->targetList. The sort clause is built first so the grouping/distinct steps can match against it.

The final three lines are the whole-tree passes: the jointree is assembled from the accumulated p_joinlist plus the WHERE qual; assign_query_collations walks every expression assigning collation; and parseCheckAggregates enforces the SQL rule that every non-aggregated target-list column appears in GROUP BY.

flowchart TB
  WITH["WITH (CTEs)"]
  FROM["transformFromClause<br/>builds RTEs + namespace + joinlist"]
  TL["transformTargetList<br/>'*' expansion, FigureColname"]
  WH["transformWhereClause<br/>coerce_to_boolean"]
  HAV["HAVING<br/>coerce_to_boolean"]
  OB["transformSortClause<br/>(ORDER BY first)"]
  GB["transformGroupClause<br/>DISTINCT"]
  LIM["LIMIT / OFFSET / windows"]
  COLL["assign_query_collations"]
  AGG["parseCheckAggregates"]
  QRY["assemble Query<br/>rtable, jointree, targetList"]
  WITH --> FROM --> TL --> WH --> HAV --> OB --> GB --> LIM --> QRY --> COLL --> AGG

Figure 2 — Clause transform order inside transformSelectStmt. The order is driven by data dependencies, not the SQL keyword order: FROM populates the namespace before anything references columns; ORDER BY precedes GROUP BY/DISTINCT because the latter reuse the sort target list; collation assignment and the aggregate-legality check run last over the assembled tree. (Source: analyze.c transformSelectStmt.)

transformFromClause iterates the raw FROM list left-to-right (order matters for LATERAL visibility), transforming each item into a join node and merging its names into the namespace:

// transformFromClause — src/backend/parser/parse_clause.c (condensed)
foreach(fl, frmList)
{
Node *n = lfirst(fl);
ParseNamespaceItem *nsitem;
List *namespace;
n = transformFromClauseItem(pstate, n, &nsitem, &namespace);
checkNameSpaceConflicts(pstate, pstate->p_namespace, namespace);
/* new items are LATERAL-only until the whole FROM list is done */
setNamespaceLateralState(namespace, true, true);
pstate->p_joinlist = lappend(pstate->p_joinlist, n);
pstate->p_namespace = list_concat(pstate->p_namespace, namespace);
}
/* once FROM is fully parsed, make everything unconditionally visible */
setNamespaceLateralState(pstate->p_namespace, false, true);

transformFromClauseItem is a type switch over the kinds of FROM item. A plain table (RangeVar), a sub-SELECT (RangeSubselect), a set-returning function (RangeFunction), JSON_TABLE, TABLESAMPLE, and explicit JoinExprs each route to a dedicated transform that adds one or more RTEs and returns the nsitem(s) it exposes:

// transformFromClauseItem — src/backend/parser/parse_clause.c (condensed)
if (IsA(n, RangeVar))
{
/* Plain relation reference, or perhaps a CTE reference */
nsitem = getNSItemForSpecialRelationTypes(pstate, rv);
if (!nsitem)
nsitem = transformTableEntry(pstate, rv); /* opens + locks the table */
*top_nsitem = nsitem;
*namespace = list_make1(nsitem);
rtr = makeNode(RangeTblRef);
rtr->rtindex = nsitem->p_rtindex; /* points at the new RTE */
return (Node *) rtr;
}
else if (IsA(n, RangeSubselect)) { ... transformRangeSubselect ... }
else if (IsA(n, JoinExpr)) { ... recurse into larg / rarg ... }

A plain table eventually reaches transformTableEntryaddRangeTableEntry, which is where the textbook’s “verifies that the relation names appearing in the query are names of relations in the database” actually happens — the relation is opened from the catalog (taking the appropriate lock), and a RangeTblEntry is built from its TupleDesc:

// addRangeTableEntryForRelation — src/backend/parser/parse_relation.c (condensed)
RangeTblEntry *rte = makeNode(RangeTblEntry);
char *refname = alias ? alias->aliasname : RelationGetRelationName(rel);
rte->rtekind = RTE_RELATION;
rte->relid = RelationGetRelid(rel); /* the catalog OID */
rte->relkind = rel->rd_rel->relkind;
rte->rellockmode = lockmode;
rte->eref = makeAlias(refname, NIL);
buildRelationAliases(rel->rd_att, alias, rte->eref); /* column names */
perminfo = addRTEPermissionInfo(&pstate->p_rteperminfos, rte);
perminfo->requiredPerms = ACL_SELECT;
/* append to the range table; its 1-based position is the varno */
pstate->p_rtable = lappend(pstate->p_rtable, rte);
return buildNSItemFromTupleDesc(rte, list_length(pstate->p_rtable),
perminfo, rel->rd_att);

The returned nsitem is the resolution table for this relation’s columns. buildNSItemFromTupleDesc walks the relation’s TupleDesc and records, per column, exactly the data needed to later synthesize a Var:

// buildNSItemFromTupleDesc — src/backend/parser/parse_relation.c (condensed)
for (varattno = 0; varattno < maxattrs; varattno++)
{
Form_pg_attribute attr = TupleDescAttr(tupdesc, varattno);
if (attr->attisdropped)
continue; /* dropped column: leave entry all-zero */
nscolumns[varattno].p_varno = rtindex; /* range-table index */
nscolumns[varattno].p_varattno = varattno + 1; /* 1-based attno */
nscolumns[varattno].p_vartype = attr->atttypid;
nscolumns[varattno].p_vartypmod = attr->atttypmod;
nscolumns[varattno].p_varcollid = attr->attcollation;
nscolumns[varattno].p_varnosyn = rtindex;
nscolumns[varattno].p_varattnosyn = varattno + 1;
}
nsitem->p_rte = rte;
nsitem->p_rtindex = rtindex;
nsitem->p_nscolumns = nscolumns;
nsitem->p_rel_visible = true; /* qualified refs OK */
nsitem->p_cols_visible = true; /* unqualified refs OK */

So the relationship is: the RTE goes in p_rtable (giving it a varno); the nsitem goes in p_namespace (making it referenceable); and each ParseNamespaceColumn is the recipe for the Var that a column reference to that column will become. When transformExpr later resolves t.c, it finds the nsitem for t, locates column c, and copies (p_varno, p_varattno, p_vartype, p_varcollid) straight into a fresh Var.

For an explicit JoinExpr, transformFromClauseItem recurses into the left arm, then temporarily exposes the left arm’s names so the right arm can LATERAL-reference them, processes the right arm, then removes them again:

// transformFromClauseItem (JoinExpr arm) — parse_clause.c (condensed)
j->larg = transformFromClauseItem(pstate, j->larg, &l_nsitem, &l_namespace);
/* expose left side to the right side for LATERAL, then process RHS */
lateral_ok = (j->jointype == JOIN_INNER || j->jointype == JOIN_LEFT);
setNamespaceLateralState(l_namespace, true, lateral_ok);
sv_namespace_length = list_length(pstate->p_namespace);
pstate->p_namespace = list_concat(pstate->p_namespace, l_namespace);
j->rarg = transformFromClauseItem(pstate, j->rarg, &r_nsitem, &r_namespace);
/* remove the left-side RTEs from the namespace again */
pstate->p_namespace = list_truncate(pstate->p_namespace, sv_namespace_length);
checkNameSpaceConflicts(pstate, l_namespace, r_namespace);

For NATURAL joins and USING, the code then walks the two sides’ column-name lists to compute the merged join columns; the join itself becomes an RTE_JOIN entry whose nsitem exposes the merged column set. The subtle visibility rule the ParseNamespaceItem comment encodes — a join without an alias does not hide its member tables for qualified references but does hide their columns for unqualified ones — is exactly why p_rel_visible and p_cols_visible are two separate booleans rather than one.

The target list — star expansion and auto-naming

Section titled “The target list — star expansion and auto-naming”

transformTargetList turns the raw list of ResTarget nodes into a list of TargetEntry nodes. Its two special jobs are expanding something.* and auto-naming unlabeled columns:

// transformTargetList — src/backend/parser/parse_target.c (condensed)
expand_star = (exprKind != EXPR_KIND_UPDATE_SOURCE);
foreach(o_target, targetlist)
{
ResTarget *res = (ResTarget *) lfirst(o_target);
if (expand_star && IsA(res->val, ColumnRef) &&
IsA(llast(((ColumnRef *) res->val)->fields), A_Star))
{
/* "tbl.*" or "*": expand into one target per column */
p_target = list_concat(p_target,
ExpandColumnRefStar(pstate,
(ColumnRef *) res->val, true));
continue;
}
/* ordinary expression */
p_target = lappend(p_target,
transformTargetEntry(pstate, res->val, NULL,
exprKind, res->name, false));
}

transformTargetEntry transforms the expression (via transformExpr) and, crucially, supplies a column name when the user didn’t write AS …:

// transformTargetEntry — src/backend/parser/parse_target.c (condensed)
if (expr == NULL)
expr = transformExpr(pstate, node, exprKind);
if (colname == NULL && !resjunk)
colname = FigureColname(node); /* derive a label from the expr shape */
return makeTargetEntry((Expr *) expr,
(AttrNumber) pstate->p_next_resno++, /* assign resno */
colname, resjunk);

pstate->p_next_resno++ is the target-list column counter: each TargetEntry gets the next resno, starting at 1. FigureColname descends the expression to pick a label — the last field name of a ColumnRef, a function’s name for a FuncCall, and so on — falling back to the literal "?column?" when nothing better is available:

// FigureColname — src/backend/parser/parse_target.c (condensed)
char *FigureColname(Node *node)
{
char *name = NULL;
(void) FigureColnameInternal(node, &name);
if (name != NULL)
return name;
return "?column?"; /* the famous default */
}

After the list is built, markTargetListOrigins annotates each TargetEntry with the underlying table/column it came from (resorigtbl / resorigcol) when the expression is a simple Var of a real relation — this is what lets psql \d-style tools and information_schema report a result column’s source table. It drills through subqueries and CTEs but deliberately not through joins or views.

Resolving GROUP BY / ORDER BY against the target list

Section titled “Resolving GROUP BY / ORDER BY against the target list”

ORDER BY and GROUP BY items can be written three ways — a full expression, a target-list column alias, or an output-column number (ORDER BY 1). findTargetlistEntrySQL92 implements the SQL92 alias/number rules: a bare identifier is matched against existing target-list names (erroring on a non-equal duplicate), and an integer constant selects the nth output column:

// findTargetlistEntrySQL92 — src/backend/parser/parse_clause.c (condensed)
if (IsA(node, ColumnRef) && list_length(cref->fields) == 1 &&
IsA(linitial(cref->fields), String))
{
char *name = strVal(linitial(cref->fields));
if (exprKind == EXPR_KIND_GROUP_BY)
{
/* GROUP BY prefers a FROM-column match over a tlist alias */
if (colNameToVar(pstate, name, true, location) != NULL)
name = NULL;
}
if (name != NULL)
{
foreach(tl, *tlist) /* match against existing tlist names */
{
TargetEntry *tle = (TargetEntry *) lfirst(tl);
if (!tle->resjunk && strcmp(tle->resname, name) == 0)
{
if (target_result != NULL &&
!equal(target_result->expr, tle->expr))
ereport(ERROR, (errcode(ERRCODE_AMBIGUOUS_COLUMN), ...));
target_result = tle;
}
}
if (target_result != NULL) return target_result;
}
}
if (IsA(node, A_Const)) { /* "ORDER BY 3" -> the 3rd output column */ }

The EXPR_KIND_GROUP_BY branch encodes a genuine SQL subtlety: per SQL92, an identifier in GROUP BY refers to a FROM column, not a target-list alias, so the code first checks whether the name resolves as a FROM column (colNameToVar) and only falls through to the tlist-alias rules if it doesn’t. transformSortClause and transformGroupClause both go through this resolver and then call addTargetToSortList, which appends a resjunk target-list entry if the sort/group expression isn’t already an output column — this is why SELECT a FROM t ORDER BY b works even though b is not selected.

Where each clause’s expression resolution lands

Section titled “Where each clause’s expression resolution lands”

The leaf of every clause transform is transformExpr (owned by postgres-parser.md’s expression section), called with a ParseExprKind tag that tells it the context. transformWhereClause is representative — transform, then coerce:

// transformWhereClause — src/backend/parser/parse_clause.c
Node *qual = transformExpr(pstate, clause, exprKind);
qual = coerce_to_boolean(pstate, qual, constructName);
return qual;

The ParseExprKind enum (EXPR_KIND_WHERE, EXPR_KIND_HAVING, EXPR_KIND_GROUP_BY, EXPR_KIND_LIMIT, …) is how one transformExpr produces context-specific error messages — e.g., “aggregate functions are not allowed in WHERE” — without each clause needing its own expression walker. It is the single most reused piece of context in the whole pass.

The symbols below are the stable anchors of the parse-analysis pass. Line numbers drift; grep the symbol name to relocate. The position table at the end records line numbers as observed at the documented revision.

  • parse_analyze_fixedparams — top-level entry: builds a root ParseState, sets source text and fixed $n parameter types, calls transformTopLevelStmt, runs JumbleQuery and post_parse_analyze_hook, frees the pstate. Siblings parse_analyze_varparams (inferred param types) and parse_analyze_withcb (caller-supplied param hook).
  • parse_sub_analyze — recursive entry for a sub-statement; makes a child ParseState with parentParseState set, so the subquery can resolve correlated references to the outer query.
  • transformTopLevelStmt — copies stmt_location / stmt_len into the result and allows the top-only SELECT … INTO rewrite via transformOptionalSelectInto.
  • transformStmt — the node-tag dispatcher. Routes T_SelectStmt/T_InsertStmt/T_UpdateStmt/T_DeleteStmt/ T_MergeStmt to their transforms; wraps everything else in a CMD_UTILITY Query. Marks the result QSRC_ORIGINAL, canSetTag = true.
  • stmt_requires_parse_analysis — companion predicate (true exactly for the statement types transformStmt does non-trivial work on); used by callers to decide whether re-analysis is ever needed.
  • transformSelectStmt — the spine: WITH, FROM, target list, WHERE/HAVING, ORDER BY → GROUP BY → DISTINCT, LIMIT, windows, collations, aggregate check. Assembles the Query.
  • transformValuesClause — standalone VALUES (…) as a SELECT; builds a column-organized intermediate then an RTE_VALUES.
  • transformInsertStmt / transformUpdateStmt / transformDeleteStmt — the DML transforms; each calls setTargetTable to add and lock the target relation, then transform their respective clauses (RETURNING via transformReturningList).

ParseState and namespace (parse_node.h, parse_node.c)

Section titled “ParseState and namespace (parse_node.h, parse_node.c)”
  • struct ParseState — the threaded scratch state; p_rtable, p_namespace, p_joinlist, p_next_resno are the load-bearing fields.
  • make_parsestate / free_parsestate — allocate/free the pstate; make_parsestate inherits a parent for subqueries.
  • struct ParseNamespaceItem — one visible relation: its RTE, range-table index, per-column data, and the p_rel_visible / p_cols_visible / p_lateral_only visibility flags.
  • struct ParseNamespaceColumn — per-column Var-construction recipe (p_varno, p_varattno, p_vartype, p_varcollid, syntactic variants).
  • ParseExprKind (enum) — the per-context tag threaded into transformExpr.

FROM / range table (parse_clause.c, parse_relation.c)

Section titled “FROM / range table (parse_clause.c, parse_relation.c)”
  • transformFromClause — iterate FROM items left-to-right; merge namespaces; build p_joinlist.
  • transformFromClauseItem — type switch over RangeVar / RangeSubselect / RangeFunction / RangeTableFunc / JsonTable / RangeTableSample / JoinExpr; recurses for joins with the LATERAL-visibility dance.
  • transformTableEntryaddRangeTableEntry / addRangeTableEntryForRelation — open & lock the relation, build the RTE_RELATION, append to p_rtable.
  • addRangeTableEntryForSubqueryRTE_SUBQUERY from an analyzed sub-Query.
  • buildNSItemFromTupleDesc — fill ParseNamespaceColumn[] from a relation’s TupleDesc.
  • addNSItemToQuery — push an nsitem onto joinlist and/or namespace with the right visibility flags.
  • setTargetTable — add + lock the INSERT/UPDATE/DELETE/MERGE target, set p_target_relation / p_target_nsitem.
  • scanRTEForColumn / colNameToVar — resolve an (unqualified) column name to an attribute / Var, erroring on ambiguity.
  • GetRTEByRangeTablePosn — fetch the RTE at a (varno, levelsup), walking up parent pstates.
  • addRTEPermissionInfo — record the RTEPermissionInfo for an RTE_RELATION (permissions checked later, at execution).
  • transformTargetListResTarget list → TargetEntry list, with something.* expansion.
  • transformTargetEntry — transform one expression, assign resno via p_next_resno++, auto-name via FigureColname.
  • transformExpressionList — same as the target list but for bare expressions (ROW(), VALUES).
  • ExpandColumnRefStar / ExpandAllTables — expand tbl.* / bare * into per-column targets.
  • FigureColname / FigureColnameInternal — derive an output column name from an expression’s shape; "?column?" fallback.
  • markTargetListOrigins / markTargetListOrigin — set resorigtbl / resorigcol for simple-Var targets.
  • resolveTargetListUnknowns — coerce still-unknown output columns to text.

Sorting / grouping / clause coercion (parse_clause.c)

Section titled “Sorting / grouping / clause coercion (parse_clause.c)”
  • transformSortClause — ORDER BY → SortGroupClause list (run before GROUP BY).
  • transformGroupClause / transformGroupingSet — GROUP BY, including CUBE/ROLLUP/GROUPING SETS flattening.
  • transformDistinctClause / transformDistinctOnClause — DISTINCT and DISTINCT ON.
  • findTargetlistEntrySQL92 / findTargetlistEntrySQL99 — resolve a sort/group item as a tlist alias, output-column number, or full expression.
  • transformWhereClause — transform + coerce_to_boolean.
  • transformLimitClause — transform + coerce_to_specific_type (INT8) + var-free check.
  • transformWindowDefinitionsWindowDefWindowClause.
  • makeFromExpr (makefuncs.c) — wrap p_joinlist + WHERE qual into the Query.jointree.
  • assign_query_collations (parse_collate.c) — assign collations across all expressions.
  • parseCheckAggregates (parse_agg.c) — enforce the GROUP-BY/aggregate legality rule.
  • transformWithClause (parse_cte.c) — analyze WITH items into cteList.
  • transformExpr (parse_expr.c) — the per-expression resolver every clause bottoms out in (black box here; owned by postgres-parser.md).

Position hints (as of 2026-06-05, REL_18 273fe94)

Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”
SymbolFileLine
parse_analyze_fixedparamsparser/analyze.c105
parse_sub_analyzeparser/analyze.c222
transformTopLevelStmtparser/analyze.c249
transformOptionalSelectIntoparser/analyze.c273
transformStmtparser/analyze.c312
stmt_requires_parse_analysisparser/analyze.c447
transformDeleteStmtparser/analyze.c553
transformInsertStmtparser/analyze.c633
transformSelectStmtparser/analyze.c1389
transformValuesClauseparser/analyze.c1532
transformUpdateStmtparser/analyze.c2472
struct ParseStateinclude/parser/parse_node.h192
struct ParseNamespaceIteminclude/parser/parse_node.h292
struct ParseNamespaceColumninclude/parser/parse_node.h328
ParseExprKind (enum)include/parser/parse_node.h38
make_parsestateparser/parse_node.c39
transformFromClauseparser/parse_clause.c112
transformTableEntryparser/parse_clause.c395
transformFromClauseItemparser/parse_clause.c1054
transformWhereClauseparser/parse_clause.c1854
transformLimitClauseparser/parse_clause.c1881
findTargetlistEntrySQL92parser/parse_clause.c2006
findTargetlistEntrySQL99parser/parse_clause.c2172
transformGroupClauseparser/parse_clause.c2632
transformSortClauseparser/parse_clause.c2732
transformWindowDefinitionsparser/parse_clause.c2765
GetRTEByRangeTablePosnparser/parse_relation.c545
scanRTEForColumnparser/parse_relation.c815
colNameToVarparser/parse_relation.c898
buildNSItemFromTupleDescparser/parse_relation.c1309
addRangeTableEntryparser/parse_relation.c1487
addRangeTableEntryForRelationparser/parse_relation.c1584
addRangeTableEntryForSubqueryparser/parse_relation.c1655
addNSItemToQueryparser/parse_relation.c2710
addRTEPermissionInfoparser/parse_relation.c3980
transformTargetEntryparser/parse_target.c75
transformTargetListparser/parse_target.c121
transformExpressionListparser/parse_target.c220
resolveTargetListUnknownsparser/parse_target.c288
markTargetListOriginsparser/parse_target.c318
ExpandColumnRefStarparser/parse_target.c1123
ExpandAllTablesparser/parse_target.c1296
FigureColnameparser/parse_target.c1713
assign_query_collationsparser/parse_collate.c101
parseCheckAggregatesparser/parse_agg.c1138
transformWithClauseparser/parse_cte.c110
transformExprparser/parse_expr.c119
makeFromExprnodes/makefuncs.c336
  • The grammar output (RawStmt) names no catalog objects; parse analysis is where catalog lookup first happens. Verified by reading transformTableEntryaddRangeTableEntryaddRangeTableEntryForRelation in parse_clause.c / parse_relation.c on 2026-06-05: the relation is opened (RelationGetRelid, lock taken) inside parse analysis, not by the grammar. This is the concrete realization of DSC §15.1’s “verifies that the relation names … are names of the relations in the database.”
  • transformStmt wraps every non-optimizable statement in a CMD_UTILITY Query and does no further analysis. Verified in the default: arm of transformStmt (analyze.c) on 2026-06-05. DDL semantic analysis (parse_utilcmd.c) is deferred to execution time — confirmed by the parser/README line “parse_utilcmd.c parse analysis for utility commands (done at execution time).”
  • Clause transform order is dependency-driven, not SQL-keyword order. Verified by reading transformSelectStmt (analyze.c): transformFromClause runs before the target list and WHERE, and the source comment “Do ORDER BY first because both transformGroupClause and transformDistinctClause need the results” precedes transformSortClause, which is called before transformGroupClause and the DISTINCT transforms.
  • A column’s resolved identity is (varno, varattno) plus type and collation, copied from the nsitem’s ParseNamespaceColumn. Verified in buildNSItemFromTupleDesc (parse_relation.c): each ParseNamespaceColumn records p_varno, p_varattno, p_vartype, p_vartypmod, p_varcollid — exactly the fields a Var needs.
  • The default output-column name for an un-nameable expression is the literal string "?column?". Verified in FigureColname (parse_target.c) on 2026-06-05 — a hard-coded fallback, not a parameter.
  • p_rel_visible and p_cols_visible are two independent booleans so that an unaliased JOIN can keep its member tables qualified-name-visible while hiding their columns from unqualified references. Verified in the struct ParseNamespaceItem comment and the join arm of transformFromClauseItem on 2026-06-05.
  • LATERAL visibility is implemented by temporarily concatenating the left arm’s namespace before processing the right arm, then truncating it back. Verified in the JoinExpr arm of transformFromClauseItem (list_concatlist_truncate around the RHS recursion) on 2026-06-05.
  • WHERE/HAVING are coerced to boolean and LIMIT to int8 inside the clause transforms. Verified: transformWhereClause calls coerce_to_boolean; transformLimitClause calls coerce_to_specific_type(..., INT8OID, ...) and checkExprIsVarFree on 2026-06-05.
  1. Lock-ordering guarantee for self-referential DML. The comment on setTargetTable claims the write lock on the target must be grabbed before any read lock “in case the target is also mentioned as a source relation.” Does any later code path (e.g., a CTE that re-references the target, or a rule expansion) violate this intended ordering? Investigation path: trace setTargetTable callers in transformInsertStmt / transformUpdateStmt / transformMergeStmt and compare against transformFromClause lock acquisition order.
  2. Cost of re-analysis under plan invalidation. Because stmt_requires_parse_analysis returning false lets callers skip the whole analyze/rewrite/plan pipeline, when exactly is parse analysis re-run for a cached plan whose catalog dependency changed? Investigation path: follow RevalidateCachedQuery in plancache.c and where it re-enters parse_analyze_* — this crosses into territory owned by a future postgres-plan-cache.md.
  3. markTargetListOrigins and security-barrier views. The function deliberately does not drill through views (reports the view as owner). Does this interact with row-level security or security-barrier view leakproofness in any observable way for information_schema consumers? Investigation path: cross-check with the rewriter’s view-expansion handling (postgres-rewriter.md, not yet written).

Beyond PostgreSQL — Comparative Designs & Research Frontiers

Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”
  • The raw-tree / query-tree split as a reuse point. PostgreSQL’s hard line between the catalog-free RawStmt and the resolved Query is the same seam that lets libpg_query (pganalyze) ship the grammar as a standalone library for SQL linters and ORMs — they get step 1 (parsing) without a running server. A comparison with how SQLite fuses parse + resolve + codegen into one pass (no separate resolved IR at all) would sharpen why PostgreSQL pays for two trees.
  • Range table as the join-reordering substrate. PostgreSQL encodes every column reference as (varno, varattno) so the optimizer can reorder joins by permuting range-table indexes. This is the classic System R / Selinger (SIGMOD 1979) representation; the next document to write is postgres-planner-overview.md’s account of how RelOptInfo keys off the range table. Cross-ref: this doc’s RTE construction is the input to that.
  • Name resolution and binding in the compiler literature. The ParseState namespace stack is the database analog of a compiler’s symbol-table / scope chain (Aho, Lam, Sethi & Ullman, Compilers, ch. 2 & 6). The LATERAL “expose then truncate” dance is essentially nested-scope management; a note connecting it to lexical scoping rules in block-structured languages would orient compiler-trained readers.
  • View expansion: analyze-time vs. rewrite-time. DSC §15.1 files view substitution under “translation,” but PostgreSQL defers it to a separate rule-based rewriter (Stonebraker’s POSTGRES rule system, The Design of POSTGRES, SIGMOD 1986). The trade-off — a clean analyze/rewrite boundary at the cost of a second tree-walk — is the subject of the not-yet-written postgres-rewriter.md; this doc’s Query is exactly that rewriter’s input.
  • Holistic semantic checking vs. streaming. PostgreSQL does several whole-tree passes at the end (assign_query_collations, parseCheckAggregates). An engine that wanted lower latency to first error might interleave these into the single descent; a comparison with how DuckDB’s binder structures its passes would reveal what PostgreSQL trades for the simpler two-stage shape.

Raw analysis materials consumed: none — this document was synthesized directly from the REL_18 source tree and the textbook references below (sources: [] in frontmatter).

Textbook chapters:

  • Silberschatz, Korth & Sudarshan, Database System Concepts, 7th ed., ch. 15 “Query Processing”, §15.1 “Overview” — the parsing/translation/optimization/evaluation framing and the “verifies that relation names are names of relations” / “translates into a relational-algebra expression” account of semantic analysis and view expansion. Captured at knowledge/research/dbms-general/database-system-concepts.md.
  • Hellerstein, Stonebraker & Hamilton, Architecture of a Database System (FnT 2007), §4 “Relational Query Processor” — the query processor as a pipeline of parser → rewriter → optimizer → executor. Captured under dbms-papers/fntdb07-architecture.md.

Source code (REL_18, 273fe94, as of 2026-06-05):

  • src/backend/parser/README — directory map; the syntactic/semantic split and the “done at execution time” note for utility commands.
  • src/backend/parser/analyze.c — entry points, transformStmt dispatch, transformSelectStmt and the DML transforms.
  • src/backend/parser/parse_clause.c — FROM/JOIN, WHERE/HAVING/LIMIT, ORDER BY / GROUP BY resolution.
  • src/backend/parser/parse_relation.c — range-table and RTE construction, namespace/nsitem building, column resolution.
  • src/backend/parser/parse_target.c — target-list transform, star expansion, FigureColname, origin marking.
  • src/backend/parser/parse_node.c, src/include/parser/parse_node.hParseState, ParseNamespaceItem, ParseNamespaceColumn, ParseExprKind.
  • src/backend/parser/parse_collate.c, parse_agg.c, parse_cte.c, parse_expr.c, src/backend/nodes/makefuncs.c — the finishing passes and the expression-resolution leaf.

Related documents in this tree:

  • postgres-parser.md — the syntactic phase (gram.y, scanner) and expression-internal resolution (transformExpr internals).
  • postgres-planner-overview.md — what the planner does with the finished Query.
  • postgres-executor.md — the demand-pull execution of the resulting plan.
  • postgres-overview-query-processing.md — the query-processing subsystem map this document sits within.