PostgreSQL SPI — The Server Programming Interface: How PLs and Extensions Run SQL From Inside the Backend
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Section titled “Theoretical Background”Every relational engine eventually faces the same architectural question:
how does code running inside the server submit SQL? The wire protocol
(PQexec, the extended-query Parse/Bind/Execute messages) is the answer
for an external client — a separate process that talks to the backend over a
socket. But a stored procedure, a trigger function, or a procedural-language
handler is not a separate process. It is C code linked into the backend,
running on the backend’s own stack, sharing the backend’s transaction, its
snapshot, its lock table, its memory contexts. For such code, going back out
to the socket would be absurd: it would mean serializing a query to bytes,
shipping it to itself, and re-parsing the result. What it needs instead is a
function-call interface to the very parser/planner/executor pipeline it is
already sitting inside.
This is the embedded query problem, and it is as old as stored procedures
themselves. The SQL standard’s answer is embedded SQL (the EXEC SQL
preprocessor of ESQL/C) and the module language; the practical answer that
every major engine ships is a C-level library with three irreducible
primitives:
- A session/connection handle — a way to say “I am now a query-issuing agent; allocate me whatever state I need,” paired with a teardown that frees it. Embedded code can nest (a function calls a function that issues SQL), so this handle is naturally a stack.
- Prepare/execute separation — the ability to parse-and-plan a statement
once and execute it many times with different parameter values, because the
archetypal consumer is a loop body inside a PL function that runs the same
INSERTten thousand times. - A result cursor — a way to walk a large result set one row (or one batch) at a time, rather than materializing millions of rows into the function’s address space at once.
The deeper theory here is about isolation of execution contexts. When
function f (issuing query Q1) calls function g (issuing query Q2), the
two queries must not corrupt each other’s notion of “the current result,” “the
current snapshot,” or “the memory in which my tuples live.” A textbook
treatment (see Database System Concepts, the chapters on application
development and on the storage/buffer interface captured in
dbms-general/database-system-concepts.md) frames this as the classic problem
of reentrancy with shared global state: the API exposes globals for
ergonomics (SPI_tuptable is a plain global so callers don’t thread a handle
through every line), but the implementation must save-and-restore those globals
across every nesting boundary, exactly as a CPU saves callee registers.
The second theoretical axis is plan caching. Preparing a statement produces
a generic or custom plan that can be reused, but reuse across executions
spanning DDL, search_path changes, or statistics churn requires
revalidation: the cached plan must be checked-and-possibly-replanned on each
use. PostgreSQL factors this into a separate plancache subsystem
(postgres-portals-prepared.md), and SPI is one of its principal clients — so
“how SPI runs SQL” is inseparable from “how the plan cache decides whether to
reuse or rebuild.”
The third axis — the one that makes SPI genuinely hard rather than merely
clerical — is snapshot and transaction management across the embedded
boundary. A read-only function (IMMUTABLE/STABLE) must see a stable
snapshot for the duration of the surrounding query, so it should not take a
fresh snapshot per statement. A VOLATILE function, by contrast, must see the
effects of its own prior modifications, so it advances the command counter and
re-snapshots. And a procedure called via CALL (or a DO block) may legally
COMMIT mid-body, which means SPI must be able to tear down and restart the
ambient transaction without losing the procedure’s own state. These three
snapshot regimes, plus the transaction-control case, are the substance of the
SPI executor loop.
Common DBMS Design
Section titled “Common DBMS Design”Before looking at PostgreSQL’s symbols, it is worth naming the conventions that every server-side SQL API converges on, so the PostgreSQL specifics read as choices within a shared design space.
The connection-as-stack convention
Section titled “The connection-as-stack convention”Server-side SQL APIs are reentrant: embedded code calls embedded code. The
universal solution is a stack of execution frames, one pushed per
connect, popped per finish, where each frame owns the resources allocated
on its behalf. Oracle’s OCI/OCIStmt handles, SQL Server’s SQLCLR
SqlContext, DB2’s SQL-routine runtime — all maintain per-invocation state
that nests. The reason it is a stack and not a flat handle is that the outer
frame’s results must survive intact while the inner frame runs and is torn
down.
Globals snapshotted at the boundary
Section titled “Globals snapshotted at the boundary”For ergonomics, these APIs expose the “most recent result” as ambient state rather than forcing the caller to thread a handle through every accessor. The implementation cost is that the connect path must snapshot the outer globals and the finish path must restore them, so a nested call is transparent to its caller. This is the embedded-SQL analog of caller/callee register save conventions.
One-shot vs. prepared plans
Section titled “One-shot vs. prepared plans”Two execution modes are universal:
- One-shot: hand over a query string, and the API parses, plans, and executes it, discarding the plan afterward. Cheap to call, expensive per call. The right choice for a statement run once.
- Prepared: parse-and-plan once into a reusable handle, then execute the handle repeatedly with bound parameters. Expensive to set up, cheap to repeat. The right choice for a loop body.
A mature API additionally distinguishes transient prepared plans (live only until the current operation ends) from saved plans (live across calls, parented in a long-lived cache context). The lifetime decision is explicit because it trades memory for replanning cost.
Result delivery: receiver vs. cursor
Section titled “Result delivery: receiver vs. cursor”Two delivery shapes recur. A materializing receiver accumulates the whole result set into a table-like structure in the caller’s memory — simple, but unbounded in size. A cursor/portal executes the plan lazily and lets the caller pull rows in batches — bounded memory, suitable for streaming a large result through a loop. Server-side APIs offer both and let the PL choose.
Snapshot inheritance
Section titled “Snapshot inheritance”The embedded query usually runs under the caller’s transaction and, for read-only callers, the caller’s snapshot — it does not start its own transaction. Whether to take a fresh snapshot (to see one’s own prior writes) or reuse the existing one (to stay stable) is the single most consequential correctness knob, and it is driven by the volatility classification of the function issuing the SQL.
flowchart TD PL["PL handler / extension C code<br/>(already inside a backend)"] -->|"SPI_connect"| STK["push _SPI_connection frame<br/>(stack, snapshots globals)"] STK -->|"SPI_execute(src)"| ONE["one-shot:<br/>parse + rewrite + plan + execute"] STK -->|"SPI_prepare(src)"| PREP["build CachedPlanSource<br/>(transient SPIPlan)"] PREP -->|"SPI_keepplan"| SAVE["reparent into CacheMemoryContext<br/>(saved plan, reusable)"] SAVE -->|"SPI_execute_plan(plan, params)"| RUN["revalidate plan + execute"] STK -->|"SPI_cursor_open(plan)"| CUR["hand plan to Portal<br/>fetch incrementally"] ONE --> TT["SPITupleTable<br/>(SPI_processed / SPI_tuptable)"] RUN --> TT CUR -->|"SPI_cursor_fetch"| TT TT -->|"SPI_finish"| POP["pop frame, free contexts,<br/>restore outer globals"]
PostgreSQL’s Approach
Section titled “PostgreSQL’s Approach”PostgreSQL implements all of the above conventions in a single file,
src/backend/executor/spi.c, atop a small private header
src/include/executor/spi_priv.h. The whole design hangs off four data
structures and a global stack.
The connection stack. SPI keeps a process-global, dynamically grown array
_SPI_stack of _SPI_connection frames, with _SPI_connected indexing the
top and _SPI_current pointing at it. Each SPI_connect pushes a frame; each
SPI_finish pops it. Crucially, the stack can move (it is repalloc’d when
it grows), so any code that re-enters SPI must re-fetch _SPI_current after a
nested call — a hazard the source calls out explicitly in
_SPI_cursor_operation.
Two memory contexts per frame. Every frame owns a procCxt (“SPI Proc”)
and an execCxt (“SPI Exec”). The split is the heart of SPI’s lifetime model:
the exec context is reset after every SPI call (it holds parse/plan/execute
scratch), while the proc context survives until SPI_finish (it holds
results the caller will read, including SPITupleTables and transient plans).
In atomic contexts these are children of TopTransactionContext; in
non-atomic contexts (procedures that may commit) they hang off
PortalContext so they outlive a transaction boundary.
Three API globals, snapshotted. SPI_processed (row count),
SPI_tuptable (the result table), and SPI_result (error code) are plain
globals. SPI_connect_ext saves the outer values into outer_processed,
outer_tuptable, outer_result and zeroes the live ones; SPI_finish
restores them. This is what makes nesting transparent.
The SPITupleTable. Results materialize into a SPITupleTable: a palloc0’d
struct living in its own child context (tuptabcxt, “SPI TupTable”), holding
a growable vals[] array of HeapTuple and a tupdesc. The DestSPI
receiver — spi_dest_startup allocates the table, spi_printtup appends each
tuple — is how the executor streams rows into it. Every live tuptable is linked
onto the frame’s tuptables slist so subtransaction abort can free orphans.
Plans are plancache entries. An SPIPlanPtr (_SPI_plan) is a thin wrapper
around a list of CachedPlanSources. A transient plan lives in execCxt and
dies at end-of-call; _SPI_make_plan_non_temp copies it into procCxt;
SPI_keepplan/_SPI_save_plan reparents it into CacheMemoryContext and marks
the plancache entries saved, making the plan reusable across calls and immune to
transaction end. Execution always goes through GetCachedPlan, which
revalidates and replans as needed.
The snapshot regimes. _SPI_execute_plan is where SPI earns its
complexity. Its header comment enumerates four behaviors keyed on
(snapshot supplied?) × (read_only?). For read-only callers it reuses the active
snapshot; for read-write it advances the command counter and takes a fresh
transaction snapshot per statement; for non-atomic execution it leans on the
Portal’s snapshot so a mid-procedure COMMIT works.
flowchart TD
EP["_SPI_execute_plan(plan, options,<br/>snapshot, crosscheck)"] --> Q{"snapshot != InvalidSnapshot?"}
Q -->|"yes, read_only"| PUSH1["PushActiveSnapshot(snapshot)<br/>one push for whole plan"]
Q -->|"yes, read-write"| PUSH2["PushCopiedSnapshot(snapshot)<br/>advance CID per querytree"]
Q -->|"no (InvalidSnapshot)"| LOOP["per CachedPlanSource:"]
LOOP --> NEED{"stmt requires snapshot?"}
NEED -->|"no"| SKIP["run with no snapshot"]
NEED -->|"yes"| ENS["EnsurePortalSnapshotExists()"]
ENS --> RW{"read-write & atomic?"}
RW -->|"yes"| FRESH["PushActiveSnapshot(GetTransactionSnapshot())<br/>fresh per statement-list"]
RW -->|"no / nonatomic"| PORTAL["use Portal snapshot unmodified"]
PUSH1 --> EXEC["foreach stmt: CreateQueryDesc + _SPI_pquery<br/>or ProcessUtility"]
PUSH2 --> EXEC
FRESH --> EXEC
PORTAL --> EXEC
SKIP --> EXEC
The remaining sections walk the actual code for each of these moving parts.
Source Walkthrough
Section titled “Source Walkthrough”All excerpts are condensed from REL_18 (commit 273fe94, captured
2026-06-05). Each block leads with a // symbol — path comment; exact line
numbers are collected in the position-hint table at the end of this section.
1. Pushing a frame: SPI_connect / SPI_connect_ext
Section titled “1. Pushing a frame: SPI_connect / SPI_connect_ext”SPI_connect is a thin wrapper; all the work is in SPI_connect_ext, which
grows the stack, initializes the frame, and snapshots the outer globals.
// SPI_connect_ext — src/backend/executor/spi.cintSPI_connect_ext(int options){ int newdepth;
/* Enlarge stack if necessary */ if (_SPI_stack == NULL) { if (_SPI_connected != -1 || _SPI_stack_depth != 0) elog(ERROR, "SPI stack corrupted"); newdepth = 16; _SPI_stack = (_SPI_connection *) MemoryContextAlloc(TopMemoryContext, newdepth * sizeof(_SPI_connection)); _SPI_stack_depth = newdepth; } else { if (_SPI_stack_depth == _SPI_connected + 1) { newdepth = _SPI_stack_depth * 2; _SPI_stack = (_SPI_connection *) repalloc(_SPI_stack, newdepth * sizeof(_SPI_connection)); _SPI_stack_depth = newdepth; } }
/* Enter new stack level */ _SPI_connected++; _SPI_current = &(_SPI_stack[_SPI_connected]); _SPI_current->processed = 0; _SPI_current->tuptable = NULL; slist_init(&_SPI_current->tuptables); _SPI_current->connectSubid = GetCurrentSubTransactionId(); _SPI_current->atomic = (options & SPI_OPT_NONATOMIC ? false : true); _SPI_current->internal_xact = false; _SPI_current->outer_processed = SPI_processed; _SPI_current->outer_tuptable = SPI_tuptable; _SPI_current->outer_result = SPI_result;Note the three outer_* fields: the caller’s results are stashed here so a
nested SPI user cannot see them change underneath it. The atomic flag, derived
from SPI_OPT_NONATOMIC, decides whether this frame may later commit a
transaction.
The same function then builds the two memory contexts — and the choice of parent is the crux of SPI’s transaction-spanning ability:
// SPI_connect_ext (continued) — src/backend/executor/spi.c /* * In atomic contexts (the normal case), we use TopTransactionContext, * otherwise PortalContext, so that it lives across transaction * boundaries. */ _SPI_current->procCxt = AllocSetContextCreate( _SPI_current->atomic ? TopTransactionContext : PortalContext, "SPI Proc", ALLOCSET_DEFAULT_SIZES); _SPI_current->execCxt = AllocSetContextCreate( _SPI_current->atomic ? TopTransactionContext : _SPI_current->procCxt, "SPI Exec", ALLOCSET_DEFAULT_SIZES); /* ... and switch to procedure's context */ _SPI_current->savedcxt = MemoryContextSwitchTo(_SPI_current->procCxt);
/* Reset API global variables ... */ SPI_processed = 0; SPI_tuptable = NULL; SPI_result = 0;
return SPI_OK_CONNECT;}A non-atomic frame parents both contexts on PortalContext (not
TopTransactionContext) precisely so that when the procedure’s body does
COMMIT, the proc/exec contexts survive the transaction reset.
2. Popping a frame: SPI_finish
Section titled “2. Popping a frame: SPI_finish”SPI_finish is the mirror image: switch back to the saved context, delete the
two contexts (which frees every tuptable and transient plan), restore the outer
globals, and pop the stack index.
// SPI_finish — src/backend/executor/spi.cintSPI_finish(void){ int res;
res = _SPI_begin_call(false); /* just check we're connected */ if (res < 0) return res;
/* Restore memory context as it was before procedure call */ MemoryContextSwitchTo(_SPI_current->savedcxt);
/* Release memory used in procedure call (including tuptables) */ MemoryContextDelete(_SPI_current->execCxt); _SPI_current->execCxt = NULL; MemoryContextDelete(_SPI_current->procCxt); _SPI_current->procCxt = NULL;
/* Restore outer API variables ... */ SPI_processed = _SPI_current->outer_processed; SPI_tuptable = _SPI_current->outer_tuptable; SPI_result = _SPI_current->outer_result;
/* Exit stack level */ _SPI_connected--; if (_SPI_connected < 0) _SPI_current = NULL; else _SPI_current = &(_SPI_stack[_SPI_connected]);
return SPI_OK_FINISH;}Because procCxt owns every SPITupleTable, deleting it reclaims all result
memory in one stroke — there is no per-tuple bookkeeping at teardown. The
comment “SPI_tuptable which is probably pointing at a just-deleted tuptable” is
why the restore happens after the deletes.
3. The per-call boundary: _SPI_begin_call / _SPI_end_call
Section titled “3. The per-call boundary: _SPI_begin_call / _SPI_end_call”Every public SPI entry point brackets its work with these two helpers. They switch into execCxt on entry and, on exit, switch back to procCxt and reset execCxt — this is what bounds the lifetime of parse/plan scratch to a single call.
// _SPI_begin_call / _SPI_end_call — src/backend/executor/spi.cstatic int_SPI_begin_call(bool use_exec){ if (_SPI_current == NULL) return SPI_ERROR_UNCONNECTED; if (use_exec) { _SPI_current->execSubid = GetCurrentSubTransactionId(); _SPI_execmem(); /* switch to executor context */ } return 0;}
static int_SPI_end_call(bool use_exec){ if (use_exec) { _SPI_procmem(); /* switch back to procedure context */ _SPI_current->execSubid = InvalidSubTransactionId; MemoryContextReset(_SPI_current->execCxt); /* free executor memory */ } return 0;}_SPI_execmem and _SPI_procmem are one-liners — MemoryContextSwitchTo on
execCxt and procCxt respectively — but they encode the entire “scratch dies
per call, results survive per frame” discipline.
4. One-shot execution: SPI_execute
Section titled “4. One-shot execution: SPI_execute”SPI_execute builds a stack-local _SPI_plan, parses it as a oneshot plan,
and hands it straight to _SPI_execute_plan with InvalidSnapshot (meaning
“SPI, manage the snapshot yourself per the read_only flag”).
// SPI_execute — src/backend/executor/spi.cintSPI_execute(const char *src, bool read_only, long tcount){ _SPI_plan plan; SPIExecuteOptions options; int res;
if (src == NULL || tcount < 0) return SPI_ERROR_ARGUMENT;
res = _SPI_begin_call(true); if (res < 0) return res;
memset(&plan, 0, sizeof(_SPI_plan)); plan.magic = _SPI_PLAN_MAGIC; plan.parse_mode = RAW_PARSE_DEFAULT; plan.cursor_options = CURSOR_OPT_PARALLEL_OK;
_SPI_prepare_oneshot_plan(src, &plan);
memset(&options, 0, sizeof(options)); options.read_only = read_only; options.tcount = tcount;
res = _SPI_execute_plan(&plan, &options, InvalidSnapshot, InvalidSnapshot, true);
_SPI_end_call(true); return res;}The plan lives on the C stack and its subsidiary data in execCxt, so it
evaporates at _SPI_end_call. CURSOR_OPT_PARALLEL_OK lets a one-shot
SELECT use parallel workers.
5. Preparing a reusable plan: SPI_prepare and _SPI_make_plan_non_temp
Section titled “5. Preparing a reusable plan: SPI_prepare and _SPI_make_plan_non_temp”SPI_prepare (via SPI_prepare_cursor) parses into a transient _SPI_plan,
then copies it into procCxt so it survives the call:
// SPI_prepare_cursor — src/backend/executor/spi.cSPIPlanPtrSPI_prepare_cursor(const char *src, int nargs, Oid *argtypes, int cursorOptions){ _SPI_plan plan; SPIPlanPtr result;
if (src == NULL || nargs < 0 || (nargs > 0 && argtypes == NULL)) { SPI_result = SPI_ERROR_ARGUMENT; return NULL; }
SPI_result = _SPI_begin_call(true); if (SPI_result < 0) return NULL;
memset(&plan, 0, sizeof(_SPI_plan)); plan.magic = _SPI_PLAN_MAGIC; plan.parse_mode = RAW_PARSE_DEFAULT; plan.cursor_options = cursorOptions; plan.nargs = nargs; plan.argtypes = argtypes;
_SPI_prepare_plan(src, &plan);
/* copy plan to procedure context */ result = _SPI_make_plan_non_temp(&plan);
_SPI_end_call(true); return result;}_SPI_make_plan_non_temp is the reparenting trick. Rather than deep-copying the
plan trees, it creates an “SPI Plan” context under procCxt and reparents the
existing CachedPlanSources into it, then unlinks them from the temporary
plan so _SPI_end_call’s execCxt reset cannot touch them:
// _SPI_make_plan_non_temp — src/backend/executor/spi.cstatic SPIPlanPtr_SPI_make_plan_non_temp(SPIPlanPtr plan){ SPIPlanPtr newplan; MemoryContext parentcxt = _SPI_current->procCxt; MemoryContext plancxt; ListCell *lc;
Assert(plan->magic == _SPI_PLAN_MAGIC); Assert(!plan->oneshot); /* one-shot plans can't be saved */
plancxt = AllocSetContextCreate(parentcxt, "SPI Plan", ALLOCSET_SMALL_SIZES); MemoryContextSwitchTo(plancxt);
newplan = (SPIPlanPtr) palloc0(sizeof(_SPI_plan)); newplan->magic = _SPI_PLAN_MAGIC; newplan->plancxt = plancxt; /* ... copy parse_mode, cursor_options, argtypes ... */
foreach(lc, plan->plancache_list) { CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
CachedPlanSetParentContext(plansource, parentcxt); /* reparent */ newplan->plancache_list = lappend(newplan->plancache_list, plansource); }
/* For safety, unlink the CachedPlanSources from the temporary plan */ plan->plancache_list = NIL; return newplan;}At this point the plan is unsaved: it lives in procCxt and dies at
SPI_finish. To make it reusable across calls (the PL/pgSQL plan cache does
this for every statement in a function body), the caller invokes
SPI_keepplan.
6. Promoting to a saved plan: SPI_keepplan
Section titled “6. Promoting to a saved plan: SPI_keepplan”SPI_keepplan reparents the plan context up to CacheMemoryContext — which
outlives transactions entirely — and marks each CachedPlanSource saved so the
plancache will track invalidations for it:
// SPI_keepplan — src/backend/executor/spi.cintSPI_keepplan(SPIPlanPtr plan){ ListCell *lc;
if (plan == NULL || plan->magic != _SPI_PLAN_MAGIC || plan->saved || plan->oneshot) return SPI_ERROR_ARGUMENT;
/* * Mark it saved, reparent it under CacheMemoryContext, and mark all the * component CachedPlanSources as saved. This sequence cannot fail * partway through, so there's no risk of long-term memory leakage. */ plan->saved = true; MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
foreach(lc, plan->plancache_list) { CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc); SaveCachedPlan(plansource); }
return 0;}The “cannot fail partway through” comment matters: a half-saved plan would leak
into CacheMemoryContext permanently, so the sequence is deliberately
allocation-free.
7. Executing a saved plan: SPI_execute_plan
Section titled “7. Executing a saved plan: SPI_execute_plan”SPI_execute_plan converts the caller’s positional Values/Nulls arrays into
a ParamListInfo and dispatches to _SPI_execute_plan — again with
InvalidSnapshot so SPI manages snapshots:
// SPI_execute_plan — src/backend/executor/spi.cintSPI_execute_plan(SPIPlanPtr plan, Datum *Values, const char *Nulls, bool read_only, long tcount){ SPIExecuteOptions options; int res;
if (plan == NULL || plan->magic != _SPI_PLAN_MAGIC || tcount < 0) return SPI_ERROR_ARGUMENT; if (plan->nargs > 0 && Values == NULL) return SPI_ERROR_PARAM;
res = _SPI_begin_call(true); if (res < 0) return res;
memset(&options, 0, sizeof(options)); options.params = _SPI_convert_params(plan->nargs, plan->argtypes, Values, Nulls); options.read_only = read_only; options.tcount = tcount;
res = _SPI_execute_plan(plan, &options, InvalidSnapshot, InvalidSnapshot, true);
_SPI_end_call(true); return res;}8. The engine room: _SPI_execute_plan snapshot management
Section titled “8. The engine room: _SPI_execute_plan snapshot management”This is the function that implements the four snapshot regimes. The leading
case handles a caller-supplied snapshot (used internally, e.g. by
SPI_execute_snapshot):
// _SPI_execute_plan — src/backend/executor/spi.cstatic int_SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options, Snapshot snapshot, Snapshot crosscheck_snapshot, bool fire_triggers){ bool allow_nonatomic; bool pushed_active_snap = false; ResourceOwner plan_owner = options->owner; CachedPlan *cplan = NULL; ListCell *lc1;
allow_nonatomic = options->allow_nonatomic && !_SPI_current->atomic && !IsSubTransaction();
if (snapshot != InvalidSnapshot) { Assert(!options->allow_nonatomic); if (options->read_only) { PushActiveSnapshot(snapshot); /* use exactly this snapshot */ pushed_active_snap = true; } else { PushCopiedSnapshot(snapshot); /* private copy to advance CID */ pushed_active_snap = true; } } /* ... ensure resource owner iff plan->saved ... */When the snapshot is InvalidSnapshot (the public-API case), the per-statement
loop decides whether each statement-list needs a snapshot and, for read-write
atomic execution, takes a fresh transaction snapshot per statement-list:
// _SPI_execute_plan (per-plansource loop) — src/backend/executor/spi.c foreach(lc1, plan->plancache_list) { CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1); List *stmt_list;
/* ... one-shot plans get parse-analyzed here via CompleteCachedPlan ... */
/* Replan if needed, and increment plan refcount. */ cplan = GetCachedPlan(plansource, options->params, plan_owner, _SPI_current->queryEnv); stmt_list = cplan->stmt_list;
if (snapshot == InvalidSnapshot && (list_length(stmt_list) > 1 || (list_length(stmt_list) == 1 && PlannedStmtRequiresSnapshot(linitial_node(PlannedStmt, stmt_list))))) { /* back-fill a Portal snapshot in case prior op was COMMIT/ROLLBACK */ EnsurePortalSnapshotExists();
if (!options->read_only && !allow_nonatomic) { if (pushed_active_snap) PopActiveSnapshot(); PushActiveSnapshot(GetTransactionSnapshot()); pushed_active_snap = true; } } /* ... inner loop over individual PlannedStmts ... */ }The inner statement loop is where read-write execution advances the command
counter so each statement sees the previous one’s effects, then dispatches to
either the executor (_SPI_pquery) or ProcessUtility:
// _SPI_execute_plan (inner stmt loop) — src/backend/executor/spi.c foreach(lc2, stmt_list) { PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2); DestReceiver *dest;
_SPI_current->processed = 0; _SPI_current->tuptable = NULL;
/* read-only callers may not run a volatile command */ if (options->read_only && !CommandIsReadOnly(stmt)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("%s is not allowed in a non-volatile function", CreateCommandName((Node *) stmt))));
/* read-write: advance command counter + snapshot CID per command */ if (!options->read_only && pushed_active_snap) { CommandCounterIncrement(); UpdateActiveSnapshotCommandId(); }
if (!stmt->canSetTag) dest = CreateDestReceiver(DestNone); else if (options->dest) dest = options->dest; else dest = CreateDestReceiver(DestSPI);
if (stmt->utilityStmt == NULL) { QueryDesc *qdesc; Snapshot snap = ActiveSnapshotSet() ? GetActiveSnapshot() : InvalidSnapshot;
qdesc = CreateQueryDesc(stmt, plansource->query_string, snap, crosscheck_snapshot, dest, options->params, _SPI_current->queryEnv, 0); res = _SPI_pquery(qdesc, fire_triggers, stmt->canSetTag ? options->tcount : 0); FreeQueryDesc(qdesc); } else /* ProcessUtility(...) for DDL/utility statements */ ; }The CommandIsReadOnly guard is SPI’s enforcement of function volatility: a
STABLE/IMMUTABLE function that tries to run an INSERT is rejected here
with “not allowed in a non-volatile function.”
9. Result delivery: spi_dest_startup and spi_printtup
Section titled “9. Result delivery: spi_dest_startup and spi_printtup”The DestSPI receiver materializes executor output into a SPITupleTable.
spi_dest_startup allocates the table in a fresh child of procCxt and links it
onto the frame’s tuptables list:
// spi_dest_startup — src/backend/executor/spi.cvoidspi_dest_startup(DestReceiver *self, int operation, TupleDesc typeinfo){ SPITupleTable *tuptable; MemoryContext oldcxt, tuptabcxt;
if (_SPI_current == NULL) elog(ERROR, "spi_dest_startup called while not connected to SPI"); if (_SPI_current->tuptable != NULL) elog(ERROR, "improper call to spi_dest_startup");
oldcxt = _SPI_procmem(); /* switch to procedure memory context */ tuptabcxt = AllocSetContextCreate(CurrentMemoryContext, "SPI TupTable", ALLOCSET_DEFAULT_SIZES); MemoryContextSwitchTo(tuptabcxt);
_SPI_current->tuptable = tuptable = (SPITupleTable *) palloc0(sizeof(SPITupleTable)); tuptable->tuptabcxt = tuptabcxt; tuptable->subid = GetCurrentSubTransactionId();
/* put it on the SPI context's tuptables list (so abort can free it) */ slist_push_head(&_SPI_current->tuptables, &tuptable->next);
tuptable->alloced = 128; tuptable->vals = (HeapTuple *) palloc(tuptable->alloced * sizeof(HeapTuple)); tuptable->numvals = 0; tuptable->tupdesc = CreateTupleDescCopy(typeinfo);
MemoryContextSwitchTo(oldcxt);}spi_printtup is called once per result row; it grows vals[] geometrically
and copies the slot’s tuple into the tuptable’s context:
// spi_printtup — src/backend/executor/spi.cboolspi_printtup(TupleTableSlot *slot, DestReceiver *self){ SPITupleTable *tuptable = _SPI_current->tuptable; MemoryContext oldcxt;
if (tuptable == NULL) elog(ERROR, "improper call to spi_printtup");
oldcxt = MemoryContextSwitchTo(tuptable->tuptabcxt);
if (tuptable->numvals >= tuptable->alloced) { uint64 newalloced = tuptable->alloced * 2; /* double the array */ tuptable->vals = (HeapTuple *) repalloc_huge(tuptable->vals, newalloced * sizeof(HeapTuple)); tuptable->alloced = newalloced; }
tuptable->vals[tuptable->numvals] = ExecCopySlotHeapTuple(slot); (tuptable->numvals)++;
MemoryContextSwitchTo(oldcxt); return true;}Because every tuple is copied into tuptabcxt, the result table is
self-contained: it survives executor teardown and is freed only when its
context is deleted (by SPI_freetuptable, SPI_finish, or abort).
10. Freeing results: SPI_freetuptable
Section titled “10. Freeing results: SPI_freetuptable”SPI_freetuptable lets a caller reclaim one result table early. It guards
against double-free by requiring the table to be on the current frame’s
tuptables list, then deletes its context:
// SPI_freetuptable — src/backend/executor/spi.cvoidSPI_freetuptable(SPITupleTable *tuptable){ bool found = false;
if (tuptable == NULL) return;
if (_SPI_current != NULL) { slist_mutable_iter siter; slist_foreach_modify(siter, &_SPI_current->tuptables) { SPITupleTable *tt = slist_container(SPITupleTable, next, siter.cur); if (tt == tuptable) { slist_delete_current(&siter); found = true; break; } } }
if (!found) { elog(WARNING, "attempt to delete invalid SPITupleTable %p", tuptable); return; }
/* reset globals that might point at it, then free its memory */ if (tuptable == _SPI_current->tuptable) _SPI_current->tuptable = NULL; if (tuptable == SPI_tuptable) SPI_tuptable = NULL; MemoryContextDelete(tuptable->tuptabcxt);}11. Cursors: SPI_cursor_open_internal
Section titled “11. Cursors: SPI_cursor_open_internal”Cursors give a PL incremental fetch by handing the plan to the Portal
machinery (postgres-portals-prepared.md) instead of materializing everything.
SPI_cursor_open_internal creates a portal, copies the plan in, sets up the
snapshot exactly as the executor would, and calls PortalStart:
// SPI_cursor_open_internal — src/backend/executor/spi.cPortalSPI_cursor_open_internal(const char *name, SPIPlanPtr plan, ParamListInfo paramLI, bool read_only){ CachedPlanSource *plansource; CachedPlan *cplan; List *stmt_list; Portal portal; Snapshot snapshot;
if (!SPI_is_cursor_plan(plan)) ereport(ERROR, (errcode(ERRCODE_INVALID_CURSOR_DEFINITION), errmsg("cannot open non-SELECT as cursor")));
plansource = (CachedPlanSource *) linitial(plan->plancache_list);
if (_SPI_begin_call(true) < 0) elog(ERROR, "SPI_cursor_open called while not connected");
SPI_processed = 0; SPI_tuptable = NULL;
/* create the portal (named or anonymous) */ if (name == NULL || name[0] == '\0') portal = CreateNewPortal(); else portal = CreatePortal(name, false, false);
/* revalidate plan, take a refcount for the portal */ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv); stmt_list = cplan->stmt_list; /* ... copy unsaved plans into portalContext; PortalDefineQuery(...) ... */
/* Set up the snapshot exactly like the executor would. */ if (read_only) snapshot = GetActiveSnapshot(); else { CommandCounterIncrement(); snapshot = GetTransactionSnapshot(); }
PortalStart(portal, paramLI, 0, snapshot); Assert(portal->strategy != PORTAL_MULTI_QUERY);
_SPI_end_call(true); return portal;}Fetching then goes through _SPI_cursor_operation, which contains the famous
re-fetch hazard: running the portal may run PL code that itself uses SPI,
moving _SPI_stack via repalloc, so _SPI_current must be re-read after
PortalRunFetch:
// _SPI_cursor_operation — src/backend/executor/spi.cstatic void_SPI_cursor_operation(Portal portal, FetchDirection direction, long count, DestReceiver *dest){ uint64 nfetched;
if (!PortalIsValid(portal)) elog(ERROR, "invalid portal in SPI cursor operation"); if (_SPI_begin_call(true) < 0) elog(ERROR, "SPI cursor operation called while not connected");
SPI_processed = 0; SPI_tuptable = NULL; _SPI_current->processed = 0; _SPI_current->tuptable = NULL;
nfetched = PortalRunFetch(portal, direction, count, dest);
/* * ... the portal may move _SPI_stack around; re-fetch _SPI_current * after the call before storing into it. */ _SPI_current->processed = nfetched;
if (dest->mydest == DestSPI && _SPI_checktuples()) elog(ERROR, "consistency check on SPI tuple count failed");
SPI_processed = _SPI_current->processed; SPI_tuptable = _SPI_current->tuptable; _SPI_current->tuptable = NULL; /* now caller's responsibility */
_SPI_end_call(true);}12. Transaction-boundary cleanup: AtEOXact_SPI
Section titled “12. Transaction-boundary cleanup: AtEOXact_SPI”At transaction end, any SPI frames left dangling (a PL that errored without
calling SPI_finish) are popped here. The loop stops at an internal_xact
frame — the frame belonging to the SPI_commit/SPI_rollback caller, which
manages its own lifetime:
// AtEOXact_SPI — src/backend/executor/spi.cvoidAtEOXact_SPI(bool isCommit){ bool found = false;
while (_SPI_connected >= 0) { _SPI_connection *connection = &(_SPI_stack[_SPI_connected]);
if (connection->internal_xact) break; /* belongs to SPI_commit/SPI_rollback caller */
found = true;
/* contexts go away with their parent; just restore globals + pop */ SPI_processed = connection->outer_processed; SPI_tuptable = connection->outer_tuptable; SPI_result = connection->outer_result;
_SPI_connected--; _SPI_current = (_SPI_connected < 0) ? NULL : &(_SPI_stack[_SPI_connected]); }
if (found && isCommit) ereport(WARNING, (errcode(ERRCODE_WARNING), errmsg("transaction left non-empty SPI stack"), errhint("Check for missing \"SPI_finish\" calls.")));}A warning on commit (but silent on abort) encodes the contract: leaving a frame
open across a successful transaction is a bug in the PL/extension; leaving one
open across an error is expected (the error short-circuited the SPI_finish).
The companion AtEOSubXact_SPI does the same at subtransaction boundaries but
explicitly deletes each frame’s exec/proc contexts (a subxact abort cannot
rely on the parent context disappearing) and uses the subid stamped on each
tuptable to free only those created in the aborting subtransaction.
Position hints (as of 2026-06-05, REL_18 273fe94)
Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”| Symbol | File | Line |
|---|---|---|
SPI_connect | src/backend/executor/spi.c | 94 |
SPI_connect_ext | src/backend/executor/spi.c | 100 |
SPI_finish | src/backend/executor/spi.c | 182 |
AtEOXact_SPI | src/backend/executor/spi.c | 428 |
AtEOSubXact_SPI | src/backend/executor/spi.c | 482 |
SPI_execute | src/backend/executor/spi.c | 596 |
SPI_execute_extended | src/backend/executor/spi.c | 637 |
SPI_execute_plan | src/backend/executor/spi.c | 672 |
SPI_prepare | src/backend/executor/spi.c | 860 |
SPI_prepare_cursor | src/backend/executor/spi.c | 866 |
SPI_keepplan | src/backend/executor/spi.c | 976 |
SPI_saveplan | src/backend/executor/spi.c | 1003 |
SPI_copytuple | src/backend/executor/spi.c | 1047 |
SPI_getvalue | src/backend/executor/spi.c | 1220 |
SPI_freetuptable | src/backend/executor/spi.c | 1386 |
SPI_cursor_open | src/backend/executor/spi.c | 1445 |
SPI_cursor_open_internal | src/backend/executor/spi.c | 1577 |
SPI_cursor_fetch | src/backend/executor/spi.c | 1806 |
spi_dest_startup | src/backend/executor/spi.c | 2123 |
spi_printtup | src/backend/executor/spi.c | 2171 |
_SPI_prepare_plan | src/backend/executor/spi.c | 2221 |
_SPI_execute_plan | src/backend/executor/spi.c | 2399 |
_SPI_cursor_operation | src/backend/executor/spi.c | 3007 |
_SPI_begin_call | src/backend/executor/spi.c | 3077 |
_SPI_end_call | src/backend/executor/spi.c | 3101 |
_SPI_make_plan_non_temp | src/backend/executor/spi.c | 3141 |
_SPI_save_plan | src/backend/executor/spi.c | 3209 |
SPI_register_relation | src/backend/executor/spi.c | 3297 |
_SPI_connection (struct) | src/include/executor/spi_priv.h | 22 |
_SPI_plan (struct) | src/include/executor/spi_priv.h | 90 |
Source verification (as of 2026-06-05)
Section titled “Source verification (as of 2026-06-05)”Claims in this document were checked against the REL_18 tree at
/data/hgryoo/references/postgres (commit 273fe94, captured 2026-06-05) as
follows:
- Connection stack is a growable global array. Confirmed:
_SPI_stack,_SPI_stack_depth,_SPI_connected,_SPI_currentare file-scope statics inspi.c;SPI_connect_extallocates 16 frames initially and doubles viarepalloc. The “stack can move” hazard is documented in_SPI_cursor_operation’s comment (“_SPI_stackis likely to move around … re-fetch the pointer”). - Two contexts per frame, parent chosen by atomicity. Confirmed in
SPI_connect_ext:procCxt/execCxtare created with parentTopTransactionContext(atomic) orPortalContext/procCxt(non-atomic). The comment block spelling out the rationale is present verbatim. - Globals snapshotted at the boundary. Confirmed:
outer_processed,outer_tuptable,outer_resultfields in_SPI_connection(spi_priv.h), saved inSPI_connect_extand restored inSPI_finish,AtEOXact_SPI, andAtEOSubXact_SPI. - One-shot path uses InvalidSnapshot. Confirmed:
SPI_executeandSPI_execute_extendedboth call_SPI_prepare_oneshot_planthen_SPI_execute_plan(..., InvalidSnapshot, InvalidSnapshot, true). - Four snapshot regimes. Confirmed by the header comment block inside
_SPI_execute_planenumerating the (snapshot supplied?) × (read_only?) matrix, and by the code branches:PushActiveSnapshot/PushCopiedSnapshotfor supplied snapshots,EnsurePortalSnapshotExists+PushActiveSnapshot(GetTransactionSnapshot())for the managed read-write case. - Volatility enforcement. Confirmed:
_SPI_execute_planraisesERRCODE_FEATURE_NOT_SUPPORTED(“%sis not allowed in a non-volatile function”) whenoptions->read_only && !CommandIsReadOnly(stmt). - SPITupleTable geometric growth + per-tuple copy. Confirmed:
spi_dest_startupsetsalloced = 128;spi_printtupdoubles viarepalloc_hugeand copies withExecCopySlotHeapTuple. - SPI_keepplan reparents to CacheMemoryContext. Confirmed:
MemoryContextSetParent(plan->plancxt, CacheMemoryContext)+SaveCachedPlanloop, with the “cannot fail partway through” comment. - Transaction control restricted to non-atomic, non-subxact. Confirmed:
_SPI_commitraisesERRCODE_INVALID_TRANSACTION_TERMINATIONwhen_SPI_current->atomicorIsSubTransaction().
Line numbers in the position-hint table were read directly from the file on 2026-06-05; they are approximate anchors and may drift with future commits — the symbol names are the stable references.
Scope caveat: this document covers src/backend/executor/spi.c and its private
header. It does not assert anything about PL/pgSQL’s internal state machine
(pl_exec.c), the executor’s plan-node machinery, or portal/prepared-statement
lifecycle beyond the SPI touch-points; those are deferred to
postgres-plpgsql.md, postgres-executor.md, and
postgres-portals-prepared.md respectively. contrib/ modules are out of
scope.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”SPI vs. the wire protocol — two front doors to one engine
Section titled “SPI vs. the wire protocol — two front doors to one engine”The cleanest way to situate SPI is to contrast it with PostgreSQL’s other
query-submission path, the extended-query protocol
(postgres-wire-protocol.md, postgres-portals-prepared.md). The wire path
runs exec_parse_message → exec_bind_message → exec_execute_message and is
driven by an external client; SPI runs SPI_prepare → SPI_execute_plan and is
driven by in-backend C code. Both ultimately call into the same plancache and
the same executor — SPI is best understood as a thin in-process facade that
reuses the heavy machinery without the socket. The notable asymmetry is
snapshot ownership: the wire path always starts its own command and snapshot,
whereas SPI usually inherits the caller’s transaction and (for read-only
callers) the caller’s snapshot. This inheritance is exactly what lets a
STABLE function see a consistent view of the database for the lifetime of the
outer query.
Comparison with other engines’ server-side SQL APIs
Section titled “Comparison with other engines’ server-side SQL APIs”- Oracle (OCI / PL/SQL’s
EXECUTE IMMEDIATE). Oracle’s PL/SQL runs in the same process as the SQL engine and shares cursors through a shared-cursor cache keyed by SQL text; PostgreSQL’s per-statementCachedPlanSource+SPI_keepplanis the analog, but PostgreSQL’s plan cache is per-backend (no cross-session sharing of plans), trading memory for simplicity and isolation. Oracle’s bind-by-name versus PostgreSQL’s positional$1..$n/Values[]is a surface difference; the deeper one is that Oracle shares compiled plans across sessions in the shared pool, a design PostgreSQL deliberately avoids. - SQL Server (SQLCLR
SqlContext/context connection). Managed code running inside SQL Server obtains a “context connection” that reuses the host’s transaction and session — conceptually identical to SPI inheriting the backend’s transaction. TheSPI_OPT_NONATOMICdistinction (can this code commit?) mirrors SQL Server’s rules about which CLR contexts may control transactions. - SQLite (
sqlite3_exec/ virtual tables). SQLite has no separate-process boundary at all, so its “embedded SQL” is the only API; its prepared-statement object (sqlite3_stmt) is the moral equivalent of anSPIPlanPtr, and itssqlite3_stepcursor maps toSPI_cursor_fetch. The absence of a connection stack reflects SQLite’s single-writer model versus PostgreSQL’s reentrant PL-calls-PL nesting.
Why the connection-stack-with-globals design persists
Section titled “Why the connection-stack-with-globals design persists”A recurring critique of SPI is its reliance on process globals (SPI_tuptable
et al.), which forces the save/restore dance and makes the API non-thread-safe
by construction. The justification is ergonomic: PL handler authors write
SPI_execute("..."); n = SPI_processed; without threading a handle through
every call, and PostgreSQL’s one-backend-per-connection process model means
there is never more than one SPI user per address space at a time anyway. The
stack-of-frames pattern is the minimal machinery that makes those globals safe
under nesting — a pragmatic choice that has survived essentially unchanged for
two decades precisely because the process model removes the thread-safety
pressure that would otherwise condemn it.
Research frontiers
Section titled “Research frontiers”- Compiling away the SPI boundary. Projects on SQL-to-native compilation
(HyPer/Umbra-style data-centric code generation, and PostgreSQL’s own JIT for
expression evaluation,
postgres-expression-eval.md) raise the question of whether the parse→plan→SPITupleTable→copy pipeline could be short-circuited for hot PL loops by fusing the PL body and the embedded query into one compiled unit, eliminating the per-rowExecCopySlotHeapTuple. No production PostgreSQL does this, but it is the obvious next efficiency frontier for PL-heavy workloads. - Plan-cache sharing. Whether per-backend plan caches should become
shared-memory-resident (as in Oracle’s shared pool) is a perennial design
debate; the SPI layer would be a principal beneficiary, since every PL
statement is a
SPI_keepplanconsumer. The counter-argument — invalidation complexity and cross-session contention — is why PostgreSQL has not moved. - Non-atomic execution and stored procedures. The
SPI_OPT_NONATOMICpath, added to support transaction-controlling procedures (CALL), is the youngest and least-settled part of SPI; the interaction of mid-procedureCOMMITwith pinned portals, held snapshots (HoldPinnedPortals,ForgetPortalSnapshots), and parallel workers remains an area of ongoing refinement.
Sources
Section titled “Sources”Primary source (REL_18, commit 273fe94, captured 2026-06-05):
src/backend/executor/spi.c— the entire SPI implementation: connection stack, prepare/execute/cursor paths, SPITupleTable receiver, snapshot and transaction management.src/include/executor/spi.h— public API surface,SPITupleTable,SPIExecuteOptions/SPIPrepareOptions, return-code constants.src/include/executor/spi_priv.h—_SPI_connectionand_SPI_plan(SPIPlanPtr) private structs,_SPI_PLAN_MAGIC.src/backend/utils/cache/plancache.c—GetCachedPlan,SaveCachedPlan,CachedPlanSetParentContext; SPI is a primary client (seepostgres-portals-prepared.md).src/backend/tcop/pquery.c—PortalStart,PortalRunFetchused by the SPI cursor path.
Theory and cross-references in this knowledge base:
knowledge/research/dbms-general/database-system-concepts.md— application development / embedded SQL / stored procedures framing.knowledge/research/dbms-general/database-internals.md— execution-context and buffer/storage interface background.knowledge/code-analysis/postgres/postgres-portals-prepared.md— portal and prepared-statement lifecycle (SPI cursors hand off here).knowledge/code-analysis/postgres/postgres-executor.md— the executor proper that SPI drives viaCreateQueryDesc/_SPI_pquery.knowledge/code-analysis/postgres/postgres-plpgsql.md— the principal SPI client; PL/pgSQL’s own state machine is deferred there.knowledge/code-analysis/postgres/postgres-mvcc-snapshots.md— the snapshot semantics SPI’s four regimes manipulate.knowledge/code-analysis/postgres/postgres-memory-contexts.md— the procCxt/execCxt/tuptabcxt lifetime model rests on this.knowledge/code-analysis/postgres/postgres-extensions.md— how extensions link against and call the SPI API.