PostgreSQL Query Processing — Section Overview

Contents:

What this section covers
The pipeline
Reading order
Detail-doc summaries
Adjacent sections

What this section covers

This subcategory is the read/compute path: everything between a SQL string arriving in a backend and a Plan tree producing tuples. It is the classic five-stage pipeline — parse → analyze → rewrite → plan → execute — plus the four things that hang off that spine: parallel query, prepared statements and portals, planner statistics, and the Node infrastructure the whole engine builds its trees from.

The per-stage entry points are functions in tcop/postgres.c (pg_parse_query, pg_analyze_and_rewrite_fixedparams, pg_rewrite_query, pg_plan_queries), so that file is the cheapest place to see the pipeline laid out as straight-line code before diving into any one stage.

Scope, with the sharp boundaries this section does not cross:

Where the SQL came from is out of scope. The wire protocol that delivers the query string, the simple-vs-extended query messages, and authentication belong to client-protocol. This section starts once a backend has a query_string in hand.
What the executor’s scans actually touch is out of scope. When a SeqScan or IndexScan node pulls a tuple, it calls a table access method or index access method — and those, plus the buffer manager and on-page layout below them, are owned by storage-engine. This section describes the node that drives the scan; storage-engine owns the heap page it lands on.
Visibility and locking under the scan are out of scope. Whether a tuple a scan reads is visible to the current snapshot, and what locks the executor takes, belong to txn-recovery (MVCC, snapshots, procarray) and server-architecture (the lock manager). The executor asks; those sections answer.
DDL and utility statements are out of scope. pg_plan_queries only plans optimizable statements (SELECT/INSERT/UPDATE/DELETE/ MERGE); a CREATE TABLE is a utility statement routed through ProcessUtility, which lives in ddl-schema. This section is the optimizable-query path only.
The catalog the planner reads is out of scope as a subsystem. The optimizer’s cost model consumes per-column statistics and relation metadata, but the relcache/catcache machinery that serves them belongs to system-catalog. This section owns the consumer (selectivity estimation, extended statistics), not the cache.

Within scope, the load-bearing claims a reader should carry forward: plan generation has two representations — cheapest-first Path nodes during search, then a single executable Plan tree built from the winner; join ordering is dynamic programming with a GEQO genetic fallback once the join count crosses geqo_threshold; and the executor is a demand-pull tree where every node answers ExecProcNode by pulling from its children.

The pipeline

flowchart TB
  SQL["SQL text (query_string)"]

  subgraph FRONT["front of the pipeline — tcop/postgres.c drives it"]
    PARSE["postgres-parser<br/>pg_parse_query: scan.l + gram.y<br/>-> raw parse tree (RawStmt)"]
    ANALYZE["postgres-analyze-transform<br/>parse_analyze: name resolution,<br/>type coercion -> Query tree"]
    REWRITE["postgres-rewriter<br/>rule system: view expansion,<br/>RLS, DO INSTEAD -> Query list"]
  end
  SQL --> PARSE --> ANALYZE --> REWRITE

  subgraph PLAN["the planner / optimizer — optimizer/README"]
    PREP["postgres-prep-rewrites<br/>subquery pull-up, qual canon"]
    PATHS["postgres-path-generation<br/>candidate Paths, cheapest-first"]
    JOINS["postgres-join-ordering<br/>DP join search + GEQO fallback"]
    COST["postgres-cost-model<br/>costsize.c + selectivity"]
    CREATE["postgres-plan-creation<br/>winning Path -> executable Plan tree"]
    STATS["postgres-extended-statistics<br/>multi-column stats for selectivity"]
  end
  REWRITE --> PREP --> PATHS
  PATHS --> JOINS
  COST -. feeds .-> PATHS
  COST -. feeds .-> JOINS
  STATS -. feeds .-> COST
  JOINS --> CREATE

  subgraph EXEC["the executor — executor/README"]
    PORTAL["postgres-portals-prepared<br/>PortalStart/PortalRun, plan cache"]
    DRIVER["postgres-executor<br/>ExecProcNode demand-pull tree"]
    EXPR["postgres-expression-eval<br/>switch-threaded interpreter"]
    SCANS["postgres-scan-nodes"]
    JOINNODES["postgres-join-nodes"]
    AGGSORT["postgres-agg-sort-nodes"]
    PARALLEL["postgres-parallel-query<br/>Gather forks workers over shm_mq"]
    JIT["postgres-jit<br/>optional LLVM compile of exprs"]
    SORT["postgres-tuplesort<br/>external merge sort / spill"]
  end
  CREATE --> PORTAL --> DRIVER
  DRIVER --> SCANS
  DRIVER --> JOINNODES
  DRIVER --> AGGSORT
  DRIVER --> PARALLEL
  DRIVER --> EXPR
  EXPR -. optionally .-> JIT
  AGGSORT -. spills via .-> SORT

  NODES["postgres-node-trees<br/>the Node/List substrate every tree above is built from"]
  NODES -. underpins .-> FRONT
  NODES -. underpins .-> PLAN
  NODES -. underpins .-> EXEC

  SCANS --> AM["table / index access methods (storage-engine)"]

Three structural facts the diagram encodes:

Trees, not streams. Each stage hands the next a tree of Nodes: the raw parse tree, the Query tree, the Plan tree, the runtime PlanState tree. postgres-node-trees owns the Node/List machinery (the gen_node_support.pl-generated copy/equal/out/read functions) that every other doc in this section assumes.
The planner has two grammars. During search it builds and compares Path nodes (cheap to make, cheapest-first); only the winner is turned into a Plan by postgres-plan-creation. Cost (postgres-cost-model) and statistics (postgres-extended-statistics) are inputs to search, not stages — hence the dotted “feeds” edges.
Parallel query is a fork of the same tree. A Gather/GatherMerge node launches background workers that run a copy of the sub-plan and stream tuples back over shm_mq; it is not a separate execution model.

Reading order

Cross-referenced-first, so each doc leans on ones already read:

postgres-node-trees — read this first. Everything downstream is a tree of Nodes; the copy/equal/out/read and List conventions are assumed by every other doc here.
The front of the pipeline — postgres-parser → postgres-analyze-transform → postgres-rewriter. Short, linear, and they establish the Query tree the planner consumes.
The planner — postgres-planner-overview first (it frames the whole optimizer and is the home of optimizer/README), then postgres-path-generation and postgres-join-ordering for the search, postgres-cost-model and postgres-extended-statistics for the inputs, and postgres-plan-creation for Path→Plan. postgres-prep-rewrites can trail (it is the planner’s preprocessing).
The executor — postgres-executor (the demand-pull driver) before the per-node docs postgres-scan-nodes, postgres-join-nodes, postgres-agg-sort-nodes. postgres-expression-eval pairs with the executor; postgres-tuplesort is the spill machinery agg/sort nodes lean on.
The cross-cutting four — postgres-parallel-query, postgres-portals-prepared, and (optional, leaf) postgres-jit last, since they presuppose the plan and executor models.

Detail-doc summaries

Forward references — these module docs may not exist yet. One line each on what each will own.

Doc	One-line scope
`postgres-parser.md`	Lexer (`scan.l`) and bison grammar (`gram.y`) turning SQL text into a raw parse tree; what is and isn’t decided at parse time.
`postgres-analyze-transform.md`	Parse analysis (`analyze.c`, `parse_*`): name/relation resolution, type coercion, and the transformation of a raw parse tree into a `Query`.
`postgres-rewriter.md`	The rule system (`rewriteHandler.c`): view expansion, `DO INSTEAD` rules, and row-level-security qual injection on the `Query` tree.
`postgres-planner-overview.md`	`standard_planner`/`subquery_planner` top-level flow; the Path-vs-Plan model and how the optimizer’s pieces fit (home of `optimizer/README`).
`postgres-path-generation.md`	Building candidate access `Path`s per relation (`allpaths.c`, `indxpath.c`): seqscan/indexscan/bitmap paths, pathkeys, cheapest-first bookkeeping.
`postgres-join-ordering.md`	Join-order search: dynamic programming over join relations (`joinrels.c`, `joinpath.c`) and the GEQO genetic fallback past `geqo_threshold`.
`postgres-cost-model.md`	The cost functions (`costsize.c`) and selectivity estimation (`selfuncs.c`): how a Path’s `total_cost` is computed from row estimates and `*_cost` GUCs.
`postgres-plan-creation.md`	Turning the winning Path into an executable `Plan` tree (`createplan.c`, `setrefs.c`, `subselect.c`): var fixup, subplan handling.
`postgres-prep-rewrites.md`	Planner preprocessing (`optimizer/prep/*`): subquery pull-up/flattening, qual canonicalization, set-operation and target-list prep.
`postgres-extended-statistics.md`	Multi-column extended statistics (`statistics/*`): ndistinct, functional dependencies, MCV lists, and how `ANALYZE` populates them for better selectivity.
`postgres-executor.md`	The demand-pull engine: `ExecInitNode`/`ExecProcNode`/`ExecEndNode`, the `PlanState` tree, tuple slots, and the executor lifecycle.
`postgres-expression-eval.md`	Expression compilation to an opcode steps array (`execExpr.c`) and the switch-threaded interpreter (`execExprInterp.c`, `ExecInterpExpr`).
`postgres-scan-nodes.md`	Leaf scan executor nodes: seqscan, indexscan, index-only, bitmap-heap, tid and tid-range scans, and how they call the access methods.
`postgres-join-nodes.md`	Join executor nodes: nested loop, hash join (`nodeHash`/`nodeHashjoin`), and merge join — algorithms, batching, and memory behavior.
`postgres-agg-sort-nodes.md`	Grouping/ordering nodes: hash and sorted aggregation (`nodeAgg`), sort, incremental sort, group, and window aggregation.
`postgres-parallel-query.md`	Parallel execution: `Gather`/`GatherMerge`, `execParallel` worker setup, the `shm_mq` tuple queue, and parallel-aware nodes (per `README.parallel`).
`postgres-portals-prepared.md`	Portals (`pquery.c`) and prepared statements (`prepare.c`, `plancache.c`): plan caching, generic vs custom plans, cursor execution.
`postgres-jit.md`	Optional LLVM JIT (`jit/llvm/*`): when expressions and tuple deforming get compiled instead of interpreted, and the cost thresholds that trigger it.
`postgres-tuplesort.md`	General-purpose sorting (`tuplesort.c`) and `tuplestore`: in-memory quicksort, external merge with `logtape`, and the spill path agg/sort/merge-join use.
`postgres-node-trees.md`	The `Node`/`List` substrate (`nodes/*`, `gen_node_support.pl`): tagged nodes, copy/equal/out/read codegen, and `Bitmapset`/`tidbitmap` helpers.

Adjacent sections

storage-engine (postgres-overview-storage-engine.md) — the section below execution. The executor’s scan nodes stop at the table/index access-method boundary; storage-engine owns the AMs, the buffer manager, the page layout, and the heap they read. The handoff is the single most trafficked seam in the engine.
txn-recovery (postgres-overview-txn-recovery.md) — supplies the snapshot a scan reads against and the visibility rules that decide which tuple versions a query sees. The executor asks “is this tuple visible to my snapshot?”; txn-recovery answers.
system-catalog (postgres-overview-system-catalog.md) — the planner’s cost model and the parser’s name resolution both read catalog metadata (relations, types, operators, statistics) through the relcache/catcache. This section consumes those caches; system-catalog owns them.
client-protocol (postgres-overview-client-protocol.md) — the section before parse. It delivers the query_string and, via the extended query protocol, drives the Parse/Bind/Execute lifecycle that maps onto prepared statements and portals here.
server-architecture (postgres-overview-server-architecture.md) — the backend that runs this entire pipeline is a B_BACKEND, and parallel query’s workers are background workers attached to the shared-memory machine. Process model, locks, and the shm_mq/DSM substrate parallel query rides on all live there.