PostgreSQL Query Processing — Section Overview
Contents:
What this section covers
Section titled “What this section covers”This subcategory is the read/compute path: everything between a SQL
string arriving in a backend and a Plan tree producing tuples. It is the
classic five-stage pipeline — parse → analyze → rewrite → plan →
execute — plus the four things that hang off that spine: parallel
query, prepared statements and portals, planner statistics, and
the Node infrastructure the whole engine builds its trees from.
The per-stage entry points are functions in tcop/postgres.c
(pg_parse_query, pg_analyze_and_rewrite_fixedparams, pg_rewrite_query,
pg_plan_queries), so that file is the cheapest place to see the pipeline
laid out as straight-line code before diving into any one stage.
Scope, with the sharp boundaries this section does not cross:
- Where the SQL came from is out of scope. The wire protocol that
delivers the query string, the simple-vs-extended query messages, and
authentication belong to client-protocol. This section starts once a
backend has a
query_stringin hand. - What the executor’s scans actually touch is out of scope. When a
SeqScanorIndexScannode pulls a tuple, it calls a table access method or index access method — and those, plus the buffer manager and on-page layout below them, are owned by storage-engine. This section describes the node that drives the scan; storage-engine owns the heap page it lands on. - Visibility and locking under the scan are out of scope. Whether a tuple a scan reads is visible to the current snapshot, and what locks the executor takes, belong to txn-recovery (MVCC, snapshots, procarray) and server-architecture (the lock manager). The executor asks; those sections answer.
- DDL and utility statements are out of scope.
pg_plan_queriesonly plans optimizable statements (SELECT/INSERT/UPDATE/DELETE/MERGE); aCREATE TABLEis a utility statement routed throughProcessUtility, which lives in ddl-schema. This section is the optimizable-query path only. - The catalog the planner reads is out of scope as a subsystem. The optimizer’s cost model consumes per-column statistics and relation metadata, but the relcache/catcache machinery that serves them belongs to system-catalog. This section owns the consumer (selectivity estimation, extended statistics), not the cache.
Within scope, the load-bearing claims a reader should carry forward:
plan generation has two representations — cheapest-first Path nodes
during search, then a single executable Plan tree built from the winner;
join ordering is dynamic programming with a GEQO genetic fallback
once the join count crosses geqo_threshold; and the executor is a
demand-pull tree where every node answers ExecProcNode by pulling from
its children.
The pipeline
Section titled “The pipeline”flowchart TB
SQL["SQL text (query_string)"]
subgraph FRONT["front of the pipeline — tcop/postgres.c drives it"]
PARSE["postgres-parser<br/>pg_parse_query: scan.l + gram.y<br/>-> raw parse tree (RawStmt)"]
ANALYZE["postgres-analyze-transform<br/>parse_analyze: name resolution,<br/>type coercion -> Query tree"]
REWRITE["postgres-rewriter<br/>rule system: view expansion,<br/>RLS, DO INSTEAD -> Query list"]
end
SQL --> PARSE --> ANALYZE --> REWRITE
subgraph PLAN["the planner / optimizer — optimizer/README"]
PREP["postgres-prep-rewrites<br/>subquery pull-up, qual canon"]
PATHS["postgres-path-generation<br/>candidate Paths, cheapest-first"]
JOINS["postgres-join-ordering<br/>DP join search + GEQO fallback"]
COST["postgres-cost-model<br/>costsize.c + selectivity"]
CREATE["postgres-plan-creation<br/>winning Path -> executable Plan tree"]
STATS["postgres-extended-statistics<br/>multi-column stats for selectivity"]
end
REWRITE --> PREP --> PATHS
PATHS --> JOINS
COST -. feeds .-> PATHS
COST -. feeds .-> JOINS
STATS -. feeds .-> COST
JOINS --> CREATE
subgraph EXEC["the executor — executor/README"]
PORTAL["postgres-portals-prepared<br/>PortalStart/PortalRun, plan cache"]
DRIVER["postgres-executor<br/>ExecProcNode demand-pull tree"]
EXPR["postgres-expression-eval<br/>switch-threaded interpreter"]
SCANS["postgres-scan-nodes"]
JOINNODES["postgres-join-nodes"]
AGGSORT["postgres-agg-sort-nodes"]
PARALLEL["postgres-parallel-query<br/>Gather forks workers over shm_mq"]
JIT["postgres-jit<br/>optional LLVM compile of exprs"]
SORT["postgres-tuplesort<br/>external merge sort / spill"]
end
CREATE --> PORTAL --> DRIVER
DRIVER --> SCANS
DRIVER --> JOINNODES
DRIVER --> AGGSORT
DRIVER --> PARALLEL
DRIVER --> EXPR
EXPR -. optionally .-> JIT
AGGSORT -. spills via .-> SORT
NODES["postgres-node-trees<br/>the Node/List substrate every tree above is built from"]
NODES -. underpins .-> FRONT
NODES -. underpins .-> PLAN
NODES -. underpins .-> EXEC
SCANS --> AM["table / index access methods (storage-engine)"]
Three structural facts the diagram encodes:
- Trees, not streams. Each stage hands the next a tree of
Nodes: the raw parse tree, theQuerytree, thePlantree, the runtimePlanStatetree.postgres-node-treesowns theNode/Listmachinery (thegen_node_support.pl-generated copy/equal/out/read functions) that every other doc in this section assumes. - The planner has two grammars. During search it builds and compares
Pathnodes (cheap to make, cheapest-first); only the winner is turned into aPlanbypostgres-plan-creation. Cost (postgres-cost-model) and statistics (postgres-extended-statistics) are inputs to search, not stages — hence the dotted “feeds” edges. - Parallel query is a fork of the same tree. A
Gather/GatherMergenode launches background workers that run a copy of the sub-plan and stream tuples back overshm_mq; it is not a separate execution model.
Reading order
Section titled “Reading order”Cross-referenced-first, so each doc leans on ones already read:
postgres-node-trees— read this first. Everything downstream is a tree ofNodes; the copy/equal/out/read andListconventions are assumed by every other doc here.- The front of the pipeline —
postgres-parser→postgres-analyze-transform→postgres-rewriter. Short, linear, and they establish theQuerytree the planner consumes. - The planner —
postgres-planner-overviewfirst (it frames the whole optimizer and is the home ofoptimizer/README), thenpostgres-path-generationandpostgres-join-orderingfor the search,postgres-cost-modelandpostgres-extended-statisticsfor the inputs, andpostgres-plan-creationfor Path→Plan.postgres-prep-rewritescan trail (it is the planner’s preprocessing). - The executor —
postgres-executor(the demand-pull driver) before the per-node docspostgres-scan-nodes,postgres-join-nodes,postgres-agg-sort-nodes.postgres-expression-evalpairs with the executor;postgres-tuplesortis the spill machinery agg/sort nodes lean on. - The cross-cutting four —
postgres-parallel-query,postgres-portals-prepared, and (optional, leaf)postgres-jitlast, since they presuppose the plan and executor models.
Detail-doc summaries
Section titled “Detail-doc summaries”Forward references — these module docs may not exist yet. One line each on what each will own.
| Doc | One-line scope |
|---|---|
postgres-parser.md | Lexer (scan.l) and bison grammar (gram.y) turning SQL text into a raw parse tree; what is and isn’t decided at parse time. |
postgres-analyze-transform.md | Parse analysis (analyze.c, parse_*): name/relation resolution, type coercion, and the transformation of a raw parse tree into a Query. |
postgres-rewriter.md | The rule system (rewriteHandler.c): view expansion, DO INSTEAD rules, and row-level-security qual injection on the Query tree. |
postgres-planner-overview.md | standard_planner/subquery_planner top-level flow; the Path-vs-Plan model and how the optimizer’s pieces fit (home of optimizer/README). |
postgres-path-generation.md | Building candidate access Paths per relation (allpaths.c, indxpath.c): seqscan/indexscan/bitmap paths, pathkeys, cheapest-first bookkeeping. |
postgres-join-ordering.md | Join-order search: dynamic programming over join relations (joinrels.c, joinpath.c) and the GEQO genetic fallback past geqo_threshold. |
postgres-cost-model.md | The cost functions (costsize.c) and selectivity estimation (selfuncs.c): how a Path’s total_cost is computed from row estimates and *_cost GUCs. |
postgres-plan-creation.md | Turning the winning Path into an executable Plan tree (createplan.c, setrefs.c, subselect.c): var fixup, subplan handling. |
postgres-prep-rewrites.md | Planner preprocessing (optimizer/prep/*): subquery pull-up/flattening, qual canonicalization, set-operation and target-list prep. |
postgres-extended-statistics.md | Multi-column extended statistics (statistics/*): ndistinct, functional dependencies, MCV lists, and how ANALYZE populates them for better selectivity. |
postgres-executor.md | The demand-pull engine: ExecInitNode/ExecProcNode/ExecEndNode, the PlanState tree, tuple slots, and the executor lifecycle. |
postgres-expression-eval.md | Expression compilation to an opcode steps array (execExpr.c) and the switch-threaded interpreter (execExprInterp.c, ExecInterpExpr). |
postgres-scan-nodes.md | Leaf scan executor nodes: seqscan, indexscan, index-only, bitmap-heap, tid and tid-range scans, and how they call the access methods. |
postgres-join-nodes.md | Join executor nodes: nested loop, hash join (nodeHash/nodeHashjoin), and merge join — algorithms, batching, and memory behavior. |
postgres-agg-sort-nodes.md | Grouping/ordering nodes: hash and sorted aggregation (nodeAgg), sort, incremental sort, group, and window aggregation. |
postgres-parallel-query.md | Parallel execution: Gather/GatherMerge, execParallel worker setup, the shm_mq tuple queue, and parallel-aware nodes (per README.parallel). |
postgres-portals-prepared.md | Portals (pquery.c) and prepared statements (prepare.c, plancache.c): plan caching, generic vs custom plans, cursor execution. |
postgres-jit.md | Optional LLVM JIT (jit/llvm/*): when expressions and tuple deforming get compiled instead of interpreted, and the cost thresholds that trigger it. |
postgres-tuplesort.md | General-purpose sorting (tuplesort.c) and tuplestore: in-memory quicksort, external merge with logtape, and the spill path agg/sort/merge-join use. |
postgres-node-trees.md | The Node/List substrate (nodes/*, gen_node_support.pl): tagged nodes, copy/equal/out/read codegen, and Bitmapset/tidbitmap helpers. |
Adjacent sections
Section titled “Adjacent sections”- storage-engine (
postgres-overview-storage-engine.md) — the section below execution. The executor’s scan nodes stop at the table/index access-method boundary; storage-engine owns the AMs, the buffer manager, the page layout, and the heap they read. The handoff is the single most trafficked seam in the engine. - txn-recovery (
postgres-overview-txn-recovery.md) — supplies the snapshot a scan reads against and the visibility rules that decide which tuple versions a query sees. The executor asks “is this tuple visible to my snapshot?”; txn-recovery answers. - system-catalog (
postgres-overview-system-catalog.md) — the planner’s cost model and the parser’s name resolution both read catalog metadata (relations, types, operators, statistics) through the relcache/catcache. This section consumes those caches; system-catalog owns them. - client-protocol (
postgres-overview-client-protocol.md) — the section before parse. It delivers thequery_stringand, via the extended query protocol, drives the Parse/Bind/Execute lifecycle that maps onto prepared statements and portals here. - server-architecture (
postgres-overview-server-architecture.md) — the backend that runs this entire pipeline is aB_BACKEND, and parallel query’s workers are background workers attached to the shared-memory machine. Process model, locks, and theshm_mq/DSM substrate parallel query rides on all live there.