PostgreSQL CustomScan — The Provider API for Pluggable Plan Nodes
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A relational query plan is a tree of physical operators — Seq Scan, Index
Scan, Hash Join, Sort, Aggregate — each of which is an instance of the
iterator (a.k.a. Volcano) model: every operator exposes the same
open() / next() / close() interface and pulls tuples from its children one
at a time, so operators compose into a tree without any operator knowing the
concrete type of its neighbours. Graefe’s Volcano — An Extensible and
Parallel Query Evaluation System (Graefe 1994) is the canonical statement of
why this uniformity matters: because the iterator interface is the only
contract between operators, you can add a new operator — or a new
implementation of an existing logical operation — without touching the rest
of the engine, and you can interpose parallelism (an exchange operator)
transparently between any two operators. The iterator interface is, in
object-oriented terms, an abstract base class; each operator is a subclass
that fills in the virtual methods.
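
The iterator contract described above can be sketched in a few lines of standalone C. This is an illustrative model only, not PostgreSQL code: the names Operator, array_scan_make, filter_make, and sum_all are invented here, and tuples are reduced to plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterator interface: every operator exposes the same three calls. */
typedef struct Operator Operator;
struct Operator
{
    void (*open) (Operator *self);
    bool (*next) (Operator *self, int *out);    /* false means "no more tuples" */
    void (*close) (Operator *self);
    Operator *child;        /* input operator, or NULL for a leaf */
    const int *rows;        /* leaf only: backing array */
    size_t nrows;           /* leaf only: number of rows */
    size_t pos;             /* leaf only: cursor into rows */
    bool (*pred) (int);     /* filter only: predicate to apply */
};

/* Leaf operator: scans an in-memory array. */
static void scan_open(Operator *self) { self->pos = 0; }
static bool scan_next(Operator *self, int *out)
{
    if (self->pos >= self->nrows)
        return false;
    *out = self->rows[self->pos++];
    return true;
}
static void scan_close(Operator *self) { (void) self; }

/* Filter operator: pulls from its child without knowing the child's type. */
static void filter_open(Operator *self) { self->child->open(self->child); }
static bool filter_next(Operator *self, int *out)
{
    int v;

    while (self->child->next(self->child, &v))
    {
        if (self->pred(v))
        {
            *out = v;
            return true;
        }
    }
    return false;
}
static void filter_close(Operator *self) { self->child->close(self->child); }

Operator array_scan_make(const int *rows, size_t nrows)
{
    Operator op = { scan_open, scan_next, scan_close, NULL, rows, nrows, 0, NULL };
    return op;
}

Operator filter_make(Operator *child, bool (*pred) (int))
{
    Operator op = { filter_open, filter_next, filter_close, child, NULL, 0, 0, pred };
    return op;
}

bool is_even(int v) { return v % 2 == 0; }

/* Drain any operator tree through the uniform interface and sum its output. */
int sum_all(Operator *root)
{
    int v, total = 0;

    root->open(root);
    while (root->next(root, &v))
        total += v;
    root->close(root);
    return total;
}
```

Note that sum_all works on a bare scan or on a filter stacked over a scan without any change, which is the composability property the paragraph attributes to the uniform interface.
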
The same paper makes a second, subtler point that the CustomScan API depends on: query optimization and query execution are two different phases over two different representations. The optimizer reasons over paths (or “plans” in Volcano’s vocabulary), which are cheap, cost-annotated descriptors that can be generated and discarded by the thousand, and only the single cheapest path is lowered into an executable plan tree that the executor walks. An extensible engine therefore needs an extension point in both representations: a way to propose a custom path during optimization (so the cost-based search can compare it against the built-in alternatives and reject it if it loses), and a way to materialize the winning custom path into a custom executor node. Graefe’s later Cascades framework (Graefe 1995) formalizes this as rules that map logical operators to physical implementation algorithms. PostgreSQL does not have a full Cascades rule engine, but its CustomPath → CustomScan → CustomScanState pipeline follows the same idea: a logical relation becomes a physical path, then an executable plan, then runtime state, with an extension seam cut into each of the three transitions.
The deeper architectural lesson comes from Hellerstein, Stonebraker &
Hamilton’s Architecture of a Database System (2007) and from the original
POSTGRES design papers (Stonebraker & Rowe 1986): a DBMS that expects to live
for decades must be extensible from outside the core. POSTGRES made the
type system and access methods extensible; modern PostgreSQL extends that
philosophy to the operators of the executor itself. The motivating use
cases are exactly the ones Graefe anticipated — a GPU-accelerated scan/join
(PG-Strom), a columnar or in-memory cache scan, a foreign-data push-down that
does not fit the FDW mould — each of which is a new physical implementation
of an existing logical operation (scan a relation, join two relations). The
design constraint is that none of this may require patching nodeSeqscan.c,
createplan.c, or the node-copy/serialize machinery: the extension must plug
into stable seams. The CustomScan API is precisely the set of seams that make
“add a physical operator from a loadable module” possible.
A final theoretical wrinkle is identity under serialization. A plan tree
in PostgreSQL is not a transient in-memory object: it is copied
(copyObject), it is serialized to text and read back (nodeToString /
stringToNode, used to ship plans to parallel workers and to cache them),
and it must survive these transformations with its behaviour intact. But the
“behaviour” of a custom node lives in C function pointers, which are
process-local addresses that cannot be copied or serialized meaningfully
across a fork() or a text round-trip. The classic solution — and the one
PostgreSQL uses — is a name-keyed method registry: the node stores a
stable string name, every process that loads the provider registers the same
name → vtable mapping, and the copy/serialize machinery re-resolves the vtable
by name on the far side. This is the same indirection a language runtime uses
for a vtable pointer, lifted to survive serialization.
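
A minimal sketch of that name-keyed indirection, assuming a toy node that carries nothing but its vtable pointer. Every identifier below (ScanMethods, register_methods, lookup_methods, ToyNode, node_out, node_read) is hypothetical; the real registry, shown later in the walkthrough, uses a dynahash table rather than a linear array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vtable: behaviour lives in process-local function pointers. */
typedef struct ScanMethods
{
    const char *name;           /* stable key, safe to serialize */
    int (*run) (int input);     /* stand-in for a provider callback */
} ScanMethods;

#define MAX_METHODS 16
static const ScanMethods *registry[MAX_METHODS];
static size_t registry_len = 0;

/* Returns 0 on success, -1 on a duplicate name or a full registry. */
int register_methods(const ScanMethods *m)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i]->name, m->name) == 0)
            return -1;
    if (registry_len == MAX_METHODS)
        return -1;
    registry[registry_len++] = m;
    return 0;
}

/* Re-resolve a vtable from its name; NULL if it was never registered. */
const ScanMethods *lookup_methods(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i]->name, name) == 0)
            return registry[i];
    return NULL;
}

/* A toy "plan node": it holds the pointer, but only the name is written out. */
typedef struct ToyNode
{
    const ScanMethods *methods;
} ToyNode;

void node_out(const ToyNode *node, char *buf, size_t buflen)
{
    strncpy(buf, node->methods->name, buflen - 1);
    buf[buflen - 1] = '\0';
}

/* Returns 0 on success, -1 if the name is unknown in this process. */
int node_read(ToyNode *node, const char *buf)
{
    node->methods = lookup_methods(buf);
    return node->methods ? 0 : -1;
}

static int double_it(int x) { return 2 * x; }
const ScanMethods demo_methods = { "demo_scan", double_it };
```

After register_methods(&demo_methods), a node written with node_out and rebuilt with node_read ends up pointing at the same vtable, while reading a name that was never registered fails instead of yielding a dangling pointer.
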
Common DBMS Design
Most extensible engines converge on a small number of recurring patterns for “let an extension add a physical operator,” and PostgreSQL’s choices are best understood against that backdrop.
1. The method-vtable (provider) struct. The universal shape of an
operator-extension API is a struct of function pointers — a hand-rolled
vtable — that the extension fills in and hands to the core. The core never
calls the extension’s functions by name; it calls them through the vtable, so
the set of operations is fixed by the core (the struct layout) while the
implementation is owned by the extension. PostgreSQL splits this into three
vtables aligned with the three plan representations, rather than one fat
struct, because the lifetimes differ: a path vtable is consulted once per
candidate path, a scan-methods vtable once per chosen plan, and an exec
vtable many times during execution.
2. Cost-based admission. A well-behaved extensible optimizer does not let
an extension force its operator into the plan; it lets the extension
propose the operator with a cost, and the core’s existing cost-comparison
machinery decides whether it wins. PostgreSQL’s add_path() is exactly this
gate: a CustomPath competes on total_cost against Seq Scan, Index Scan, etc.,
and is pruned if dominated. This keeps the extension honest — a badly
costed custom operator simply never gets chosen — and means the extension
author only has to estimate a cost, not re-implement path pruning.
3. The hook seam. To get control during planning, the extension needs a
callback the core invokes at the right moment. The lightweight industry
pattern is a global function-pointer “hook” that defaults to NULL and is
invoked if set. PostgreSQL exposes set_rel_pathlist_hook (called after the
core has generated the built-in paths for a base relation) and
set_join_pathlist_hook (the analogous point for a join), so a provider can
add CustomPaths exactly where the core is about to finalize the path list.
(The general hook mechanism is covered in postgres-hooks.md; here it is just
the entry door.)
4. The opaque private-data channel. A custom operator needs to carry
arbitrary state from planning into execution — predicates it pushed down,
chosen algorithm parameters, GPU kernel identifiers. Engines either force this
into a void * blob (opaque, but then it cannot be copied or serialized) or
into the engine’s own node framework (copyable and serializable, but the
engine must know the type). PostgreSQL offers both: custom_private is a
List of ordinary copyable/serializable nodes for the common case, and the
extensible-node framework (T_ExtensibleNode) lets a provider define a
genuinely new node type that still round-trips through copyObject and
stringToNode via provider-supplied callbacks.
5. Capability flags instead of fat interfaces. Not every custom operator
supports backward scan, mark/restore, or parallelism. Rather than demand every
provider implement every method, the API uses a bitmask of capability flags
(CUSTOMPATH_SUPPORT_BACKWARD_SCAN, _MARK_RESTORE, _PROJECTION) plus
optional method-pointer slots that may be NULL. The executor shim checks the
pointer (or the flag) before dispatching and raises a clean
ERRCODE_FEATURE_NOT_SUPPORTED error if an unsupported capability is invoked.
This is the classic “narrow required core + wide optional surface” interface
design, and it keeps a minimal provider tiny (four required exec callbacks)
while letting an ambitious provider opt into parallel-aware DSM coordination.
```mermaid
flowchart TD
  subgraph plan["Planning (paths)"]
    H["set_rel_pathlist_hook /<br/>set_join_pathlist_hook"] -->|add_path| CP["CustomPath<br/>flags + custom_private<br/>methods: CustomPathMethods"]
    CP -->|cheapest wins| WIN["chosen CustomPath"]
  end
  subgraph lower["Plan creation (lowering)"]
    WIN --> CCP["create_customscan_plan"]
    CCP -->|PlanCustomPath| CS["CustomScan plan node<br/>custom_exprs / custom_private<br/>methods: CustomScanMethods"]
  end
  subgraph exec["Execution (state)"]
    CS -->|ExecInitCustomScan| CSS["CustomScanState<br/>methods: CustomExecMethods"]
    CSS -->|ExecCustomScan loop| ROWS["TupleTableSlots"]
  end
  REG["name registry<br/>RegisterCustomScanMethods /<br/>RegisterExtensibleNodeMethods"] -.resolve by name.-> CS
```
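
The cost-based admission gate from pattern 2 can be modelled as follows. This is a deliberately reduced sketch: ToyPath, ToyRel, and toy_add_path are invented names, and the toy keeps only the single cheapest path by total cost, whereas the real add_path() also weighs other properties such as startup cost and sort order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate path: a label plus the cost the proposer estimated. */
typedef struct ToyPath
{
    const char *label;
    double total_cost;
} ToyPath;

/* Hypothetical relation: remembers only the cheapest path seen so far. */
typedef struct ToyRel
{
    const ToyPath *cheapest;    /* NULL until a path is added */
} ToyRel;

/*
 * Propose a path.  The extension cannot force its path in: it is kept only
 * if it is strictly cheaper than the incumbent.
 * Returns 1 if the new path was kept, 0 if it was rejected.
 */
int toy_add_path(ToyRel *rel, const ToyPath *path)
{
    if (rel->cheapest != NULL && rel->cheapest->total_cost <= path->total_cost)
        return 0;
    rel->cheapest = path;
    return 1;
}
```

A custom path costed at 250 against a sequential scan at 100 is simply dropped, while one costed at 40 displaces it; the proposer supplies an estimate and nothing else.
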
PostgreSQL’s Approach
PostgreSQL implements the operator-extension API as three method structs,
one per plan representation, declared together in src/include/nodes/extensible.h.
They are deliberately asymmetric in size, mirroring how often each is consulted.
The path-level vtable carries essentially one job — knowing how to turn a CustomPath into a plan — plus an optional reparameterization helper for partitionwise joins:
```c
// CustomPathMethods (src/include/nodes/extensible.h)
typedef struct CustomPathMethods
{
    const char *CustomName;

    /* Convert Path to a Plan */
    struct Plan *(*PlanCustomPath) (PlannerInfo *root,
                                    RelOptInfo *rel,
                                    struct CustomPath *best_path,
                                    List *tlist,
                                    List *clauses,
                                    List *custom_plans);
    struct List *(*ReparameterizeCustomPathByChild) (PlannerInfo *root,
                                                     List *custom_private,
                                                     RelOptInfo *child_rel);
} CustomPathMethods;
```
The scan-methods vtable is even thinner: one callback to manufacture the executor state object from the plan node:
```c
// CustomScanMethods (src/include/nodes/extensible.h)
typedef struct CustomScanMethods
{
    const char *CustomName;

    /* Create execution state (CustomScanState) from a CustomScan plan node */
    Node       *(*CreateCustomScanState) (CustomScan *cscan);
} CustomScanMethods;
```
The execution-time vtable is the heavy one. Its first four callbacks are
required (the Volcano open/next/close plus rescan); the rest are optional
and gated by capability flags or NULL checks — mark/restore for plans that sit
under a Merge Join, the DSM-coordination quartet for parallel-aware providers,
a shutdown hook, and an EXPLAIN hook:
```c
// CustomExecMethods (src/include/nodes/extensible.h)
typedef struct CustomExecMethods
{
    const char *CustomName;

    /* Required executor methods */
    void        (*BeginCustomScan) (CustomScanState *node,
                                    EState *estate,
                                    int eflags);
    TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
    void        (*EndCustomScan) (CustomScanState *node);
    void        (*ReScanCustomScan) (CustomScanState *node);

    /* Optional methods: needed if mark/restore is supported */
    void        (*MarkPosCustomScan) (CustomScanState *node);
    void        (*RestrPosCustomScan) (CustomScanState *node);

    /* Optional methods: needed if parallel execution is supported */
    Size        (*EstimateDSMCustomScan) (CustomScanState *node,
                                          ParallelContext *pcxt);
    void        (*InitializeDSMCustomScan) (CustomScanState *node,
                                            ParallelContext *pcxt,
                                            void *coordinate);
    void        (*ReInitializeDSMCustomScan) (CustomScanState *node,
                                              ParallelContext *pcxt,
                                              void *coordinate);
    void        (*InitializeWorkerCustomScan) (CustomScanState *node,
                                               shm_toc *toc,
                                               void *coordinate);
    void        (*ShutdownCustomScan) (CustomScanState *node);

    /* Optional: print additional information in EXPLAIN */
    void        (*ExplainCustomScan) (CustomScanState *node,
                                      List *ancestors,
                                      ExplainState *es);
} CustomExecMethods;
```
The three structs are pointed at from the three node types. The crucial design
decision — stated as a load-bearing comment in the headers — is that the
methods field is a pointer to a static table the core never copies. The
CustomScan plan node says so explicitly:
```c
// CustomScan (src/include/nodes/plannodes.h)
typedef struct CustomScan
{
    Scan        scan;
    uint32      flags;              /* mask of CUSTOMPATH_* flags */
    List       *custom_plans;       /* list of child Plan nodes, if any */
    List       *custom_exprs;       /* expressions that custom code may evaluate */
    List       *custom_private;     /* private data for custom code */
    List       *custom_scan_tlist;  /* optional tlist describing scan tuple */
    Bitmapset  *custom_relids;      /* RTIs generated by this scan */

    /*
     * NOTE: The method field of CustomScan is required to be a pointer to a
     * static table of callback functions.  So we don't copy the table itself,
     * just reference the original one.
     */
    const struct CustomScanMethods *methods;
} CustomScan;
```
Because the vtable is never copied and is just an address, it cannot survive
serialization to a parallel worker or a text round-trip. PostgreSQL closes
that gap with a process-local, name-keyed registry: the provider calls
RegisterCustomScanMethods(methods) at module load, the registry stores the
vtable under methods->CustomName, and the serialized node carries only the
name. On the far side, the node reader (readfuncs) calls GetCustomScanMethods
with that name to re-resolve the vtable; an in-process copyObject has no need
to, because it copies the methods pointer as-is. The capability flags that
thread through all three representations are a single bitmask:
```c
// capability flags (src/include/nodes/extensible.h)
#define CUSTOMPATH_SUPPORT_BACKWARD_SCAN    0x0001
#define CUSTOMPATH_SUPPORT_MARK_RESTORE     0x0002
#define CUSTOMPATH_SUPPORT_PROJECTION       0x0004
```
The flow a provider drives looks like this end to end:
```mermaid
flowchart TD
  A["_PG_init: RegisterCustomScanMethods(&scan_methods)<br/>install set_rel_pathlist_hook"] --> B["planner reaches base rel<br/>set_rel_pathlist() done with core paths"]
  B --> C["hook fires: provider builds CustomPath<br/>flags, custom_private, methods=CustomPathMethods<br/>add_path(rel, custompath)"]
  C --> D{"cheapest total_cost?"}
  D -->|no| X["CustomPath discarded"]
  D -->|yes| E["create_plan_recurse → create_customscan_plan"]
  E -->|methods->PlanCustomPath| F["CustomScan plan node<br/>methods=CustomScanMethods"]
  F --> G["ExecInitNode → ExecInitCustomScan<br/>methods->CreateCustomScanState"]
  G --> H["CustomScanState<br/>methods=CustomExecMethods"]
  H -->|BeginCustomScan| I["per-tuple: ExecCustomScan loop"]
  I -->|EndCustomScan| J["teardown"]
```
The provider author’s surface area is small and well-bounded: register the CustomScanMethods vtable by name (and, optionally, an ExtensibleNodeMethods vtable for private node types), set one planner hook, and fill in the required executor callbacks. The path and exec vtables need no registration, because the provider attaches them directly when it builds the CustomPath and the CustomScanState. Everything else (cost comparison, plan copying, parallel-worker plan shipping, EXPLAIN tree-walking) is handled by the core through the seams described next.
Source Walkthrough
The CustomScan machinery is spread thin across the three phases of query
processing. This walkthrough follows a tuple’s-eye journey: how a provider
registers its vtables, how a CustomPath is injected during planning, how it
is lowered to a CustomScan plan node, how that node is instantiated and
driven by the executor shim in nodeCustom.c, how parallel coordination
works, how EXPLAIN renders the node, and finally how the extensible-node
registry lets private node types round-trip through copy/serialize.
1. Registration — the name-keyed vtable registry (extensible.c)
Everything begins with a provider registering its vtables, normally from
_PG_init(). Both registration paths funnel through one internal helper that
lazily creates a string-keyed hash and rejects duplicate names:
```c
// RegisterExtensibleNodeEntry (src/backend/nodes/extensible.c)
static void
RegisterExtensibleNodeEntry(HTAB **p_htable,
                            const char *htable_label,
                            const char *extnodename,
                            const void *extnodemethods)
{
    ExtensibleNodeEntry *entry;
    bool        found;

    /* Create the hash table lazily, on first registration. */
    if (*p_htable == NULL)
    {
        HASHCTL     ctl;

        ctl.keysize = EXTNODENAME_MAX_LEN;
        ctl.entrysize = sizeof(ExtensibleNodeEntry);
        *p_htable = hash_create(htable_label, 100, &ctl,
                                HASH_ELEM | HASH_STRINGS);
    }

    if (strlen(extnodename) >= EXTNODENAME_MAX_LEN)
        elog(ERROR, "extensible node name is too long");

    entry = (ExtensibleNodeEntry *) hash_search(*p_htable,
                                                extnodename,
                                                HASH_ENTER, &found);
    if (found)
        ereport(ERROR,
                (errcode(ERRCODE_DUPLICATE_OBJECT),
                 errmsg("extensible node type \"%s\" already exists",
                        extnodename)));

    entry->extnodemethods = extnodemethods;
}
```
There are two independent hash tables, custom_scan_methods and
extensible_node_methods, both file-static HTAB * variables initialized to NULL.
RegisterCustomScanMethods keys on methods->CustomName; the lookup side,
GetCustomScanMethods, re-resolves it. Note the missing_ok argument that both
lookups pass through to GetExtensibleNodeEntry: a miss with missing_ok == false
raises ERRCODE_UNDEFINED_OBJECT rather than returning NULL, so a node that
serialized a name the far side never registered fails loudly:
```c
// GetExtensibleNodeEntry / GetCustomScanMethods (src/backend/nodes/extensible.c)
static const void *
GetExtensibleNodeEntry(HTAB *htable, const char *extnodename, bool missing_ok)
{
    ExtensibleNodeEntry *entry = NULL;

    if (htable != NULL)
        entry = (ExtensibleNodeEntry *) hash_search(htable,
                                                    extnodename,
                                                    HASH_FIND, NULL);
    if (!entry)
    {
        if (missing_ok)
            return NULL;
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("ExtensibleNodeMethods \"%s\" was not registered",
                        extnodename)));
    }

    return entry->extnodemethods;
}
```
The registry is process-local. In a parallel query each worker re-runs the
provider’s _PG_init (the library is either listed in shared_preload_libraries
or loaded on demand), so the same name → vtable mapping exists in every backend
that will deserialize the plan. This is what makes the “serialize only the
name” strategy in the following steps correct.
2. Injection — getting control during planning (allpaths.c)
A provider does not call into the planner; the planner calls out to the provider through a global hook, fired right after the core has generated all built-in paths for a base relation:
```c
// set_rel_pathlist, excerpt (src/backend/optimizer/path/allpaths.c)
/*
 * Allow a plugin to editorialize on the set of Paths for this base
 * relation.  It could add new paths (such as CustomPaths) by calling
 * add_path(), or add_partial_path() if parallel aware.
 */
if (set_rel_pathlist_hook)
    (*set_rel_pathlist_hook) (root, rel, rti, rte);
```
The hook signature hands the provider everything it needs to build and cost a
CustomPath — the PlannerInfo, the target RelOptInfo, the range-table index,
and the RTE:
```c
// set_rel_pathlist_hook_type (src/include/optimizer/paths.h)
typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
                                            RelOptInfo *rel,
                                            Index rti,
                                            RangeTblEntry *rte);
extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
```
The join analogue, set_join_pathlist_hook, fires inside
add_paths_to_joinrel() and receives the outer/inner rels plus the join type,
so a provider can replace a join implementation (e.g. a GPU hash join). The
CustomPath the provider builds is a lightweight path node carrying child paths,
restrict-info, and the provider’s private list:
```c
// CustomPath (src/include/nodes/pathnodes.h)
typedef struct CustomPath
{
    Path        path;
    uint32      flags;              /* mask of CUSTOMPATH_* flags */
    List       *custom_paths;       /* list of child Path nodes, if any */
    List       *custom_restrictinfo;
    List       *custom_private;
    const struct CustomPathMethods *methods;
} CustomPath;
```
The provider hands this to add_path(rel, (Path *) custompath). From here the
CustomPath is just another candidate: it competes on cost and is pruned if a
cheaper path dominates it. The provider never has to touch the path-pruning
logic; it only has to fill in the path’s rows, startup_cost, and total_cost honestly.
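
The hook seam itself is a small pattern, sketched here in standalone C. The names ToyRel, pathlist_hook, core_set_rel_pathlist, and provider_init are hypothetical stand-ins, and saving and calling the previously installed hook is shown as the usual extension convention rather than something the excerpt above requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical relation: we only count how many candidate paths it holds. */
typedef struct ToyRel { int npaths; } ToyRel;

/* Hypothetical hook type and global hook slot; NULL means "no plugin". */
typedef void (*pathlist_hook_type) (ToyRel *rel);
pathlist_hook_type pathlist_hook = NULL;

/* Core side: build the built-in paths, then give plugins a turn. */
void core_set_rel_pathlist(ToyRel *rel)
{
    rel->npaths = 2;                /* pretend: seq scan + index scan */
    if (pathlist_hook)
        (*pathlist_hook) (rel);
}

/* Provider side: remember whoever was installed before us. */
static pathlist_hook_type prev_hook = NULL;

static void provider_hook(ToyRel *rel)
{
    if (prev_hook)
        (*prev_hook) (rel);         /* keep earlier plugins working */
    rel->npaths += 1;               /* pretend: add one custom path */
}

/* Stand-in for _PG_init: chain onto the existing hook and install ours. */
void provider_init(void)
{
    prev_hook = pathlist_hook;
    pathlist_hook = provider_hook;
}
```

Before provider_init runs, the core produces its two built-in paths; afterwards the same core call yields three, without the core having been modified.
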
3. Lowering — CustomPath to CustomScan (createplan.c)
If the CustomPath wins, create_plan_recurse dispatches to
create_customscan_plan, which recursively lowers any child paths, orders the
scan clauses, and then calls the provider’s PlanCustomPath to produce the
actual plan node — the core does not construct the CustomScan itself:
```c
// create_customscan_plan (src/backend/optimizer/plan/createplan.c)
static CustomScan *
create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
                       List *tlist, List *scan_clauses)
{
    CustomScan *cplan;
    RelOptInfo *rel = best_path->path.parent;
    List       *custom_plans = NIL;
    ListCell   *lc;

    /* Recursively transform child paths. */
    foreach(lc, best_path->custom_paths)
    {
        Plan       *plan = create_plan_recurse(root, (Path *) lfirst(lc),
                                               CP_EXACT_TLIST);

        custom_plans = lappend(custom_plans, plan);
    }

    scan_clauses = order_qual_clauses(root, scan_clauses);

    /* Invoke custom plan provider to create the Plan node. */
    cplan = castNode(CustomScan,
                     best_path->methods->PlanCustomPath(root,
                                                        rel,
                                                        best_path,
                                                        tlist,
                                                        scan_clauses,
                                                        custom_plans));

    /* Copy cost data from Path to Plan ... */
    copy_generic_path_info(&cplan->scan.plan, &best_path->path);
    cplan->custom_relids = best_path->path.parent->relids;

    if (best_path->path.param_info)
    {
        cplan->scan.plan.qual = (List *)
            replace_nestloop_params(root, (Node *) cplan->scan.plan.qual);
        cplan->custom_exprs = (List *)
            replace_nestloop_params(root, (Node *) cplan->custom_exprs);
    }

    return cplan;
}
```
Two division-of-labour details are worth flagging. First, the core fills in
the generic cost fields (copy_generic_path_info) and the relids after the
provider returns, so the provider’s PlanCustomPath need only populate the
custom-specific fields. Second, replace_nestloop_params rewrites
outer-relation Vars into nestloop params in both the qual and custom_exprs,
so a custom scan placed on the inner side of a parameterized nestloop gets its
parameters wired up without the provider doing anything — but note the core
assumes custom_scan_tlist contains no such Vars.
4. Instantiation & the executor shim (nodeCustom.c)
At execution start, ExecInitNode dispatches T_CustomScan to
ExecInitCustomScan. This is the most substantial function in nodeCustom.c,
and it is where the provider-allocated state object is woven into the standard
ScanState framework. Critically, the provider allocates the state (so it
can embed CustomScanState as the first field of a larger struct), and the
shim then fills the standard fields:
```c
// ExecInitCustomScan, condensed (src/backend/executor/nodeCustom.c)
CustomScanState *
ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
{
    CustomScanState *css;
    const TupleTableSlotOps *slotOps;
    Relation    scan_rel = NULL;
    Index       scanrelid = cscan->scan.scanrelid;
    int         tlistvarno;

    /* Provider does the palloc and sets node tag + methods. */
    css = castNode(CustomScanState,
                   cscan->methods->CreateCustomScanState(cscan));
    css->flags = cscan->flags;

    css->ss.ps.plan = &cscan->scan.plan;
    css->ss.ps.state = estate;
    css->ss.ps.ExecProcNode = ExecCustomScan;
    ExecAssignExprContext(estate, &css->ss.ps);

    /* open the scan relation, if any */
    if (scanrelid > 0)
    {
        scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
        css->ss.ss_currentRelation = scan_rel;
    }

    /* Use a custom slot if specified, else a virtual slot. */
    slotOps = css->slotOps;
    if (!slotOps)
        slotOps = &TTSOpsVirtual;

    if (cscan->custom_scan_tlist != NIL || scan_rel == NULL)
    {
        TupleDesc   scan_tupdesc = ExecTypeFromTL(cscan->custom_scan_tlist);

        ExecInitScanTupleSlot(estate, &css->ss, scan_tupdesc, slotOps);
        tlistvarno = INDEX_VAR;     /* Vars carry varno = INDEX_VAR */
    }
    else
    {
        ExecInitScanTupleSlot(estate, &css->ss,
                              RelationGetDescr(scan_rel), slotOps);
        tlistvarno = scanrelid;
    }

    ExecInitResultTupleSlotTL(&css->ss.ps, &TTSOpsVirtual);
    ExecAssignScanProjectionInfoWithVarno(&css->ss, tlistvarno);
    css->ss.ps.qual = ExecInitQual(cscan->scan.plan.qual, (PlanState *) css);

    /* Provider finishes initialization. */
    css->methods->BeginCustomScan(css, estate, eflags);
    return css;
}
```
Three behaviours encoded here are part of the provider contract. (a) The
provider sets the node tag and methods inside CreateCustomScanState; the
shim only asserts via castNode. (b) The scan tuple type is taken from
custom_scan_tlist when present (a join-style custom scan with no single base
relation, scanrelid == 0), otherwise from the base relation’s rowtype — and
the targetlist’s varno is set accordingly (INDEX_VAR vs scanrelid). (c)
A provider may install its own TupleTableSlotOps via css->slotOps;
otherwise the shim defaults to a virtual slot.
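
The “provider allocates, core fills in” split relies on the first-member embedding idiom, which the following standalone sketch illustrates. BaseState, MyState, create_state, and shim_exec are invented names standing in for CustomScanState, a provider’s private state struct, CreateCustomScanState, and the executor shim; calloc replaces palloc so the example compiles outside PostgreSQL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for CustomScanState: the part the core knows about. */
typedef struct BaseState BaseState;

typedef struct ExecMethods
{
    int (*exec) (BaseState *node);
} ExecMethods;

struct BaseState
{
    int tag;                        /* node tag the core checks */
    const ExecMethods *methods;     /* provider vtable */
};

#define TAG_CUSTOM_STATE 42

/* Provider's larger struct: the base MUST be the first member. */
typedef struct MyState
{
    BaseState base;
    int counter;                    /* provider-private field */
} MyState;

static int my_exec(BaseState *node)
{
    MyState *my = (MyState *) node; /* valid because base is the first member */

    return ++my->counter;
}

static const ExecMethods my_methods = { my_exec };

/* Provider allocates the full struct but hands it back as the base type. */
BaseState *create_state(void)
{
    MyState *my = calloc(1, sizeof(MyState));

    if (my == NULL)
        return NULL;
    my->base.tag = TAG_CUSTOM_STATE;
    my->base.methods = &my_methods;
    return &my->base;
}

/* Core-side shim: checks the tag, then dispatches through the vtable. */
int shim_exec(BaseState *node)
{
    assert(node->tag == TAG_CUSTOM_STATE);
    return node->methods->exec(node);
}
```

The shim never learns that MyState exists; it sees only the base struct, yet the private counter survives between calls because the provider sized the allocation.
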
The per-tuple driver is a one-liner that forwards to the provider’s required
ExecCustomScan, guarding interrupts at the standard executor cadence:
```c
// ExecCustomScan (src/backend/executor/nodeCustom.c)
static TupleTableSlot *
ExecCustomScan(PlanState *pstate)
{
    CustomScanState *node = castNode(CustomScanState, pstate);

    CHECK_FOR_INTERRUPTS();

    Assert(node->methods->ExecCustomScan != NULL);
    return node->methods->ExecCustomScan(node);
}
```
ExecEndCustomScan and ExecReScanCustomScan are equally thin forwarders,
each Assert-ing that the required callback is non-NULL before dispatching.
The CustomScanState struct itself is where provider state hangs:
```c
// CustomScanState (src/include/nodes/execnodes.h)
typedef struct CustomScanState
{
    ScanState   ss;
    uint32      flags;          /* mask of CUSTOMPATH_* flags */
    List       *custom_ps;      /* list of child PlanState nodes, if any */
    Size        pscan_len;      /* size of parallel coordination information */
    const struct CustomExecMethods *methods;
    const struct TupleTableSlotOps *slotOps;
} CustomScanState;
```
5. Optional capabilities — mark/restore and the NULL-check guard
The optional methods are gated. Mark/restore is only meaningful under a Merge Join, and a provider that did not implement it must fail cleanly rather than crash on a NULL pointer. The shim turns the missing callback into a proper SQL error:
```c
// ExecCustomMarkPos (src/backend/executor/nodeCustom.c)
void
ExecCustomMarkPos(CustomScanState *node)
{
    if (!node->methods->MarkPosCustomScan)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("custom scan \"%s\" does not support MarkPos",
                        node->methods->CustomName)));
    node->methods->MarkPosCustomScan(node);
}
```
This is the runtime half of the CUSTOMPATH_SUPPORT_MARK_RESTORE flag: the
planner only places a custom scan under a Merge Join if that flag is set, and
the executor backstops it with this guard so a misbehaving provider that
advertised the flag but left the callback NULL still gets a comprehensible
error rather than a segfault.
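
The two halves of that contract, a planner-side flag test and an executor-side NULL guard, can be modelled together. The sketch below is standalone and hypothetical (ToyState, ToyExecMethods, supports_mark_restore, and toy_mark_pos are invented), and it returns an error code where the real shim raises an error with ereport.

```c
#include <assert.h>
#include <stddef.h>

#define SUPPORT_MARK_RESTORE 0x0002     /* mirrors CUSTOMPATH_SUPPORT_MARK_RESTORE */

typedef struct ToyState ToyState;

typedef struct ToyExecMethods
{
    const char *name;
    void (*mark_pos) (ToyState *node);  /* optional slot: may be NULL */
} ToyExecMethods;

struct ToyState
{
    unsigned flags;                     /* capability bitmask */
    const ToyExecMethods *methods;
    int pos;                            /* current scan position */
    int marked;                         /* last marked position */
};

enum { TOY_OK = 0, TOY_NOT_SUPPORTED = -1 };

/* Planner-side check: may this node sit where mark/restore is needed? */
int supports_mark_restore(unsigned flags)
{
    return (flags & SUPPORT_MARK_RESTORE) != 0;
}

/* Executor-side backstop: never call through a NULL slot. */
int toy_mark_pos(ToyState *node)
{
    if (node->methods->mark_pos == NULL)
        return TOY_NOT_SUPPORTED;
    node->methods->mark_pos(node);
    return TOY_OK;
}

static void remember_pos(ToyState *node) { node->marked = node->pos; }

const ToyExecMethods full_methods = { "full", remember_pos };
const ToyExecMethods minimal_methods = { "minimal", NULL };
```

A state that advertises the flag but uses minimal_methods reproduces the misbehaving-provider case: the flag test passes, and the guard is what converts the missing callback into a reportable failure.
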
6. Parallel coordination — the DSM quartet
A parallel-aware provider implements four optional callbacks that the shim invokes during parallel setup. The estimate/initialize pair is representative: each is a no-op if the provider left the callback NULL, and otherwise the shim manages the shared-memory TOC bookkeeping while the provider fills the chunk:
```c
// ExecCustomScanEstimate / ExecCustomScanInitializeDSM (src/backend/executor/nodeCustom.c)
void
ExecCustomScanEstimate(CustomScanState *node, ParallelContext *pcxt)
{
    const CustomExecMethods *methods = node->methods;

    if (methods->EstimateDSMCustomScan)
    {
        node->pscan_len = methods->EstimateDSMCustomScan(node, pcxt);
        shm_toc_estimate_chunk(&pcxt->estimator, node->pscan_len);
        shm_toc_estimate_keys(&pcxt->estimator, 1);
    }
}

void
ExecCustomScanInitializeDSM(CustomScanState *node, ParallelContext *pcxt)
{
    const CustomExecMethods *methods = node->methods;

    if (methods->InitializeDSMCustomScan)
    {
        int         plan_node_id = node->ss.ps.plan->plan_node_id;
        void       *coordinate = shm_toc_allocate(pcxt->toc, node->pscan_len);

        methods->InitializeDSMCustomScan(node, pcxt, coordinate);
        shm_toc_insert(pcxt->toc, plan_node_id, coordinate);
    }
}
```
The chunk is keyed by plan_node_id in the TOC, and the worker side
(ExecCustomScanInitializeWorker) looks it up by that same id — so the leader
and every worker share one coordination region per custom-scan node.
ExecShutdownCustomScan lets the provider drain results from workers before
the DSM segment is torn down. All four are NULL-tolerant, so a non-parallel
provider simply leaves them unset and the parallel machinery skips it.
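
To make the “publish under plan_node_id, look up by the same id” handshake concrete, here is a single-process toy model. It is only a sketch under loud assumptions: ToyToc, toc_insert, toc_lookup, SharedCursor, leader_setup, and worker_claim_block are invented, ordinary heap memory stands in for the DSM segment, and there is no locking, whereas a real provider sharing a counter between processes would need atomics or a lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_KEYS 8

/* Toy table of contents: maps a numeric key to the address of a chunk. */
typedef struct ToyToc
{
    uint64_t keys[MAX_KEYS];
    void    *addrs[MAX_KEYS];
    size_t   nentries;
} ToyToc;

/* Returns 0 on success, -1 if the table is full. */
int toc_insert(ToyToc *toc, uint64_t key, void *addr)
{
    if (toc->nentries == MAX_KEYS)
        return -1;
    toc->keys[toc->nentries] = key;
    toc->addrs[toc->nentries] = addr;
    toc->nentries++;
    return 0;
}

/* Returns the chunk published under key, or NULL if there is none. */
void *toc_lookup(const ToyToc *toc, uint64_t key)
{
    for (size_t i = 0; i < toc->nentries; i++)
        if (toc->keys[i] == key)
            return toc->addrs[i];
    return NULL;
}

/* Hypothetical per-node coordination struct a provider might share. */
typedef struct SharedCursor
{
    int next_block;             /* next unit of work to hand out */
} SharedCursor;

/* Leader: allocate the chunk, initialize it, publish it under the node id. */
int leader_setup(ToyToc *toc, int plan_node_id)
{
    SharedCursor *c = calloc(1, sizeof(SharedCursor));

    if (c == NULL)
        return -1;
    c->next_block = 0;
    if (toc_insert(toc, (uint64_t) plan_node_id, c) != 0)
    {
        free(c);
        return -1;
    }
    return 0;
}

/* Worker: find the same chunk by the same id and claim a unit of work. */
int worker_claim_block(ToyToc *toc, int plan_node_id)
{
    SharedCursor *c = toc_lookup(toc, (uint64_t) plan_node_id);

    if (c == NULL)
        return -1;
    return c->next_block++;
}
```

Because both sides derive the key from the plan node id, two custom-scan nodes in one plan get separate coordination chunks without the provider inventing its own key scheme.
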
7. EXPLAIN integration (explain.c)
When EXPLAIN walks a plan tree and reaches a CustomScan, it shows the standard scan qual and then defers to the optional provider hook for any extra detail:
```c
// ExplainNode, T_CustomScan case (src/backend/commands/explain.c)
case T_CustomScan:
    {
        CustomScanState *css = (CustomScanState *) planstate;

        show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
        if (plan->qual)
            show_instrumentation_count("Rows Removed by Filter", 1,
                                       planstate, es);
        if (css->methods->ExplainCustomScan)
            css->methods->ExplainCustomScan(css, ancestors, es);
    }
    break;
```
The node label itself comes from the vtable’s CustomName
(((CustomScan *) plan)->methods->CustomName in ExplainNode’s node-name
switch), and child plans stored in custom_ps are recursed via
ExplainCustomChildren, which labels them “child”/“children” and re-enters
ExplainNode. So a provider with sub-plans gets a properly nested EXPLAIN tree
for free.
8. Extensible nodes — round-tripping private types (copyfuncs/outfuncs/readfuncs)
The extensible-node framework is the second consumer of the name registry. A
provider that wants a genuinely new node type (beyond stuffing ordinary nodes
into custom_private) tags it T_ExtensibleNode with an extnodename, and
registers an ExtensibleNodeMethods vtable of four serialization callbacks:
```c
// ExtensibleNodeMethods (src/include/nodes/extensible.h)
typedef struct ExtensibleNodeMethods
{
    const char *extnodename;
    Size        node_size;
    void        (*nodeCopy) (struct ExtensibleNode *newnode,
                             const struct ExtensibleNode *oldnode);
    bool        (*nodeEqual) (const struct ExtensibleNode *a,
                              const struct ExtensibleNode *b);
    void        (*nodeOut) (struct StringInfoData *str,
                            const struct ExtensibleNode *node);
    void        (*nodeRead) (struct ExtensibleNode *node);
} ExtensibleNodeMethods;
```
The core’s copyObject, nodeToString, and stringToNode each special-case
T_ExtensibleNode by re-resolving the vtable by name and dispatching to the
provider’s callback. Copy allocates node_size bytes (the provider’s possibly
larger struct), copies the name field generically, then hands off the private
fields:
```c
// _copyExtensibleNode (src/backend/nodes/copyfuncs.c)
static ExtensibleNode *
_copyExtensibleNode(const ExtensibleNode *from)
{
    ExtensibleNode *newnode;
    const ExtensibleNodeMethods *methods;

    methods = GetExtensibleNodeMethods(from->extnodename, false);
    newnode = (ExtensibleNode *) newNode(methods->node_size,
                                         T_ExtensibleNode);
    COPY_STRING_FIELD(extnodename);
    methods->nodeCopy(newnode, from);   /* copy the private fields */
    return newnode;
}
```
Read is the symmetric operation: it pulls the :extnodename token, resolves
the vtable, allocates node_size, and lets the provider reconstruct its
private fields from the token stream:
```c
// _readExtensibleNode (src/backend/nodes/readfuncs.c)
static ExtensibleNode *
_readExtensibleNode(void)
{
    const ExtensibleNodeMethods *methods;
    ExtensibleNode *local_node;
    const char *extnodename;

    READ_TEMP_LOCALS();

    token = pg_strtok(&length);     /* skip :extnodename */
    token = pg_strtok(&length);     /* get extnodename */
    extnodename = nullable_string(token, length);
    if (!extnodename)
        elog(ERROR, "extnodename has to be supplied");
    methods = GetExtensibleNodeMethods(extnodename, false);

    local_node = (ExtensibleNode *) newNode(methods->node_size,
                                            T_ExtensibleNode);
    local_node->extnodename = extnodename;
    methods->nodeRead(local_node);  /* deserialize the private fields */
    READ_DONE();
}
```
_outExtensibleNode (writes EXTENSIBLENODE + the name + methods->nodeOut)
and _equalExtensibleNode (compares the name, then methods->nodeEqual)
complete the quartet. The header comment is emphatic that all four callbacks
are mandatory: there is no default serialization for a type the core knows
nothing about. This is the mechanism that lets a CustomScan ship to a parallel
worker with arbitrarily complex provider-private state intact. The state is
stored as extensible nodes inside custom_private, and the whole custom_private
list round-trips through the standard node serializer because each element
resolves its own vtable by name on the worker side.
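The round trip described above can be modelled outside PostgreSQL. The following is a minimal, self-contained sketch, not PostgreSQL source: ToyNode, ToyMethods, GpuScanInfo, and every toy_* and gpu_* function are hypothetical stand-ins, the "registry" holds a single entry instead of a hash table, and the wire format is a plain space-separated string rather than the real node text format. It only illustrates the shape of the contract: the generic code handles the name, resolves the vtable by that name, allocates node_size bytes, and leaves the private fields to the provider.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for ExtensibleNode: the name keys the vtable lookup. */
typedef struct ToyNode { const char *extnodename; } ToyNode;

/* Toy stand-in for ExtensibleNodeMethods (copy omitted for brevity). */
typedef struct ToyMethods {
    const char *extnodename;
    size_t      node_size;   /* size of the provider's larger struct */
    void (*nodeOut)(char *buf, size_t buflen, const ToyNode *node);
    void (*nodeRead)(ToyNode *node, const char *fields);
    int  (*nodeEqual)(const ToyNode *a, const ToyNode *b);
} ToyMethods;

/* Hypothetical provider node: generic header first, private fields after. */
typedef struct GpuScanInfo {
    ToyNode base;
    int     device_id;
    int     batch_rows;
} GpuScanInfo;

static void gpu_out(char *buf, size_t buflen, const ToyNode *node)
{
    const GpuScanInfo *g = (const GpuScanInfo *) node;
    snprintf(buf, buflen, "%d %d", g->device_id, g->batch_rows);
}

static void gpu_read(ToyNode *node, const char *fields)
{
    GpuScanInfo *g = (GpuScanInfo *) node;
    sscanf(fields, "%d %d", &g->device_id, &g->batch_rows);
}

static int gpu_equal(const ToyNode *a, const ToyNode *b)
{
    const GpuScanInfo *x = (const GpuScanInfo *) a;
    const GpuScanInfo *y = (const GpuScanInfo *) b;
    return x->device_id == y->device_id && x->batch_rows == y->batch_rows;
}

static const ToyMethods gpu_methods = {
    "GpuScanInfo", sizeof(GpuScanInfo), gpu_out, gpu_read, gpu_equal
};

/* One-entry "registry"; returns NULL for an unknown name. */
static const ToyMethods *toy_get_methods(const char *name)
{
    return strcmp(name, gpu_methods.extnodename) == 0 ? &gpu_methods : NULL;
}

/* Sending side: write the name, then let the provider write its fields. */
void toy_out(char *buf, size_t buflen, const ToyNode *node)
{
    const ToyMethods *m = toy_get_methods(node->extnodename);
    char fields[64];

    assert(m != NULL);
    m->nodeOut(fields, sizeof(fields), node);
    snprintf(buf, buflen, "%s %s", node->extnodename, fields);
}

/* Receiving side: read the name, resolve the vtable, allocate node_size. */
ToyNode *toy_read(const char *text)
{
    char name[64];
    int consumed = 0;
    const ToyMethods *m;
    ToyNode *node;

    if (sscanf(text, "%63s %n", name, &consumed) != 1)
        return NULL;
    if ((m = toy_get_methods(name)) == NULL)
        return NULL;
    if ((node = calloc(1, m->node_size)) == NULL)
        return NULL;
    node->extnodename = m->extnodename;
    m->nodeRead(node, text + consumed);   /* provider fills private fields */
    return node;
}

/* Serializes one node, reads it back, and reports whether they compare equal. */
int toy_round_trip_ok(int device_id, int batch_rows)
{
    GpuScanInfo src = { { "GpuScanInfo" }, device_id, batch_rows };
    char wire[128];
    ToyNode *copy;
    int ok;

    toy_out(wire, sizeof(wire), &src.base);
    copy = toy_read(wire);
    ok = copy != NULL && gpu_methods.nodeEqual(&src.base, copy);
    free(copy);
    return ok;
}
```

Calling toy_round_trip_ok(3, 4096) exercises both sides in one process. In the real system the two halves run in different processes, which is why the lookup has to be by name rather than by function pointer.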
Position hints (as of 2026-06-05, REL_18 273fe94)
| Symbol | File | Line |
|---|---|---|
| EXTNODENAME_MAX_LEN | src/include/nodes/extensible.h | 24 |
| ExtensibleNodeMethods | src/include/nodes/extensible.h | 62 |
| CUSTOMPATH_SUPPORT_BACKWARD_SCAN | src/include/nodes/extensible.h | 84 |
| CustomPathMethods | src/include/nodes/extensible.h | 92 |
| CustomScanMethods | src/include/nodes/extensible.h | 112 |
| CustomExecMethods | src/include/nodes/extensible.h | 124 |
| RegisterCustomScanMethods (decl) | src/include/nodes/extensible.h | 160 |
| RegisterExtensibleNodeEntry | src/backend/nodes/extensible.c | 39 |
| RegisterExtensibleNodeMethods | src/backend/nodes/extensible.c | 76 |
| RegisterCustomScanMethods | src/backend/nodes/extensible.c | 88 |
| GetExtensibleNodeEntry | src/backend/nodes/extensible.c | 100 |
| GetExtensibleNodeMethods | src/backend/nodes/extensible.c | 125 |
| GetCustomScanMethods | src/backend/nodes/extensible.c | 137 |
| ExecInitCustomScan | src/backend/executor/nodeCustom.c | 26 |
| ExecCustomScan (driver) | src/backend/executor/nodeCustom.c | 114 |
| ExecEndCustomScan | src/backend/executor/nodeCustom.c | 125 |
| ExecReScanCustomScan | src/backend/executor/nodeCustom.c | 132 |
| ExecCustomMarkPos | src/backend/executor/nodeCustom.c | 139 |
| ExecCustomScanEstimate | src/backend/executor/nodeCustom.c | 161 |
| ExecCustomScanInitializeDSM | src/backend/executor/nodeCustom.c | 174 |
| ExecCustomScanInitializeWorker | src/backend/executor/nodeCustom.c | 205 |
| ExecShutdownCustomScan | src/backend/executor/nodeCustom.c | 221 |
| CustomScan (struct) | src/include/nodes/plannodes.h | 864 |
| CustomPath (struct) | src/include/nodes/pathnodes.h | 2038 |
| CustomScanState (struct) | src/include/nodes/execnodes.h | 2125 |
| set_rel_pathlist_hook (invoke) | src/backend/optimizer/path/allpaths.c | 538 |
| set_join_pathlist_hook (invoke) | src/backend/optimizer/path/joinpath.c | 342 |
| create_customscan_plan | src/backend/optimizer/plan/createplan.c | 4269 |
| _copyExtensibleNode | src/backend/nodes/copyfuncs.c | 147 |
| _outExtensibleNode | src/backend/nodes/outfuncs.c | 490 |
| _readExtensibleNode | src/backend/nodes/readfuncs.c | 537 |
| _equalExtensibleNode | src/backend/nodes/equalfuncs.c | 117 |
| EXPLAIN T_CustomScan case | src/backend/commands/explain.c | 2146 |
Source verification (as of 2026-06-05)
All claims in this doc were checked against the REL_18_STABLE working tree at
/data/hgryoo/references/postgres, commit 273fe94852b3a7e34fd171e8abdf1481beb302fa
(PostgreSQL 18.x). Verification notes:
- Three method structs and their callbacks: CustomPathMethods, CustomScanMethods, and CustomExecMethods were read in full from src/include/nodes/extensible.h. CustomPathMethods has exactly two callbacks (PlanCustomPath, ReparameterizeCustomPathByChild); CustomScanMethods has exactly one (CreateCustomScanState); CustomExecMethods has four required (BeginCustomScan, ExecCustomScan, EndCustomScan, ReScanCustomScan) and eight optional callbacks (MarkPosCustomScan, RestrPosCustomScan, the four DSM callbacks, ShutdownCustomScan, ExplainCustomScan). The header comment “All callbacks are mandatory” applies to ExtensibleNodeMethods, not to the optional CustomExecMethods slots; confirmed by reading both struct comments.
- Capability flags: exactly three are defined (CUSTOMPATH_SUPPORT_BACKWARD_SCAN = 0x0001, _MARK_RESTORE = 0x0002, _PROJECTION = 0x0004). EXTNODENAME_MAX_LEN is 64. Verified verbatim.
- The “static vtable, never copied” contract: the comment is present verbatim in the CustomScan struct in src/include/nodes/plannodes.h and is the stated reason the registry exists. Confirmed.
- Two separate registry hashes: extensible.c declares two file-static HTAB * (extensible_node_methods, custom_scan_methods), both routed through RegisterExtensibleNodeEntry / GetExtensibleNodeEntry. The duplicate-name check raises ERRCODE_DUPLICATE_OBJECT; the missing-name lookup with missing_ok == false raises ERRCODE_UNDEFINED_OBJECT. Verified.
- Hook invocation points: set_rel_pathlist_hook is invoked in set_rel_pathlist() (allpaths.c) and set_join_pathlist_hook in add_paths_to_joinrel() (joinpath.c). Both are PGDLLIMPORT globals defaulting to NULL. Confirmed by reading both call sites and the declarations in src/include/optimizer/paths.h.
- Lowering: create_customscan_plan (createplan.c) calls best_path->methods->PlanCustomPath(...), then copy_generic_path_info and the replace_nestloop_params rewrite of qual and custom_exprs. The function is static and reached from create_scan_plan’s T_CustomScan dispatch. Verified.
- Executor shim: every function quoted from nodeCustom.c (ExecInitCustomScan, the static ExecCustomScan driver, ExecEndCustomScan, ExecReScanCustomScan, ExecCustomMarkPos, the DSM quartet, ExecShutdownCustomScan) was read line-for-line. The INDEX_VAR vs scanrelid targetlist-varno branch, the css->slotOps default to TTSOpsVirtual, and the provider-allocates-state contract are all literal.
- EXPLAIN: the T_CustomScan case in ExplainNode (explain.c) shows the Filter qual then dispatches to the optional ExplainCustomScan; the node name is methods->CustomName; children in custom_ps recurse through ExplainCustomChildren. Confirmed.
- Extensible-node serialization: _copyExtensibleNode (copyfuncs.c), _outExtensibleNode (outfuncs.c), _readExtensibleNode (readfuncs.c), and _equalExtensibleNode (equalfuncs.c) each resolve the vtable via GetExtensibleNodeMethods(name, false) and dispatch to the provider callback. Verified.
- Scope guard (REL_18, no PG19-only claims): this doc asserts only the CustomScan/extensible-node surface as it exists in REL_18. No PG19-only items (e.g. XLOG2 rmgr, online-checksum BackendTypes) are referenced. contrib/ is out of scope; PG-Strom and similar are named only as external examples of the API’s intended use, not analyzed.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
The FDW sibling, and why both exist. PostgreSQL has two “scan from
outside the core” mechanisms: Foreign Data Wrappers (postgres-fdw.md) and
CustomScan. They are deliberately parallel in shape — the CustomScan struct’s
own header comment says its custom_exprs / custom_private / custom_scan_tlist / custom_relids fields work “equally” to ForeignScan’s fdw_* fields. The
difference is scope. An FDW is bound to a foreign table through SQL DDL
(CREATE FOREIGN TABLE, a handler returning FdwRoutine) and the planner
invokes it only for that table’s RTE; it is the right tool for “this relation
lives elsewhere.” A CustomScan is bound to nothing in the catalog — it is
injected by a planner hook and can replace an arbitrary scan or join node, can
have scanrelid == 0 (representing a join over several base relations), and can
carry child plans. The rule of thumb: FDW for a relation with an external home;
CustomScan for a new implementation of an operation over ordinary local
relations (GPU execution, columnar cache, vectorized join). The canonical
external consumer, PG-Strom, uses CustomScan precisely because it
reimplements scans/joins/aggregates over normal heap tables on a GPU — there is
no “foreign” relation involved.
Cascades and the rule-driven alternative. SQL Server, Greenplum’s ORCA, and CockroachDB build their optimizers on Graefe’s Cascades model, where adding a physical operator means adding an implementation rule that the memo-driven search applies; the new operator competes inside a uniform rule framework rather than through an out-of-band hook. PostgreSQL’s optimizer is not Cascades (it is a bottom-up dynamic-programming join search), so it lacks a rule registry, and the CustomScan hook is the pragmatic substitute: instead of “register a rule,” you “set a hook and call add_path.” What carries over is cost discipline: the add_path gate prunes a dominated custom path just as a Cascades cost bound would. What is lost is the expressiveness of true logical-rewrite rules. A custom node can only be a physical alternative for a relation the core already identified, not a logical transformation of the query.
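The “set a hook and call add_path” pattern, and the way cost keeps a custom path honest, can be shown with a small stand-alone model. This is a sketch under loud assumptions, not PostgreSQL code: ToyPath, ToyRel, toy_add_path, toy_pathlist_hook, and provider_hook are all hypothetical names, and the toy gate compares only total cost, whereas the real add_path also weighs other properties of a path. The hook pointer defaults to NULL and is called after the built-in path is added, mirroring the shape of set_rel_pathlist_hook.

```c
#include <assert.h>
#include <stddef.h>

/* Toy path: a label plus the only property this model compares. */
typedef struct ToyPath {
    const char *label;
    double      total_cost;
} ToyPath;

/* Toy relation: keeps the single cheapest path seen so far. */
typedef struct ToyRel {
    const ToyPath *cheapest;
} ToyRel;

/* Cost gate: a new path survives only if it is strictly cheaper. */
void toy_add_path(ToyRel *rel, const ToyPath *path)
{
    if (rel->cheapest == NULL || path->total_cost < rel->cheapest->total_cost)
        rel->cheapest = path;
}

/* Hook pointer, NULL unless an extension installs something. */
typedef void (*toy_pathlist_hook_type)(ToyRel *rel);
toy_pathlist_hook_type toy_pathlist_hook = NULL;

static const ToyPath seq_scan = { "SeqScan", 100.0 };
static ToyPath custom_scan = { "CustomScan", 0.0 };

/* Hypothetical provider hook: propose the custom path, let cost decide. */
static void provider_hook(ToyRel *rel)
{
    toy_add_path(rel, &custom_scan);
}

/*
 * Core planning step: add the built-in path, then give the hook a chance.
 * Returns 1 if the custom path ended up as the cheapest, 0 otherwise.
 */
int toy_custom_wins(double custom_cost, int install_hook)
{
    ToyRel rel = { NULL };

    custom_scan.total_cost = custom_cost;
    toy_pathlist_hook = install_hook ? provider_hook : NULL;

    toy_add_path(&rel, &seq_scan);
    if (toy_pathlist_hook)
        toy_pathlist_hook(&rel);
    return rel.cheapest == &custom_scan;
}
```

With the hook installed, a custom cost of 40 beats the built-in cost of 100 and a custom cost of 250 is discarded; without the hook the custom path is never proposed at all. Note what the model cannot do: the provider only ever offers an alternative for the relation it was handed, which is the limitation described above.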
Codegen and vectorized execution as the modern frontier. The research
motivation for extensibility that Graefe articulated, and that the CustomPath
header comment echoes as “some other kind of logic we haven’t dreamed up yet,” has
in practice meant two things since 2014: (1) query compilation (Neumann’s
Efficiently Compiling Efficient Query Plans for Modern Hardware, VLDB 2011;
the HyPer/Umbra lineage), where operators are JIT-compiled into tight loops
rather than interpreted through the iterator dispatch, and (2) vectorized
execution (the MonetDB/X100 and Vectorwise line), where operators process
column batches instead of one tuple per next(). CustomScan is the seam through
which a PostgreSQL extension can smuggle either model into an otherwise
tuple-at-a-time interpreted executor: the provider’s ExecCustomScan can run a
compiled or vectorized kernel internally and hand back ordinary
TupleTableSlots at the boundary, so the surrounding plan tree never knows. The
optional slotOps field and the DSM-coordination quartet are exactly what such
a provider needs — a custom slot type for a columnar batch, and shared-memory
coordination for parallel kernels.
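The “batch inside, one tuple at the boundary” arrangement can be sketched without any PostgreSQL headers. Everything below is hypothetical and for illustration only: ToyVecScan, toy_fill_batch, toy_next, and toy_sum_over are invented names, the batch size of 4 is arbitrary, and a plain int stands in for a TupleTableSlot. The point is the control flow: the kernel filters a whole batch in one tight loop, while the parent still sees an ordinary one-row-per-call iterator.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH 4

/* Hypothetical provider state: the input column plus one filtered batch. */
typedef struct ToyVecScan {
    const int *column;            /* input column */
    size_t     nrows;             /* rows in the column */
    size_t     next_input;        /* next unread input row */
    int        batch[TOY_BATCH];  /* qualifying values from the last batch */
    size_t     batch_len;         /* number of values in batch */
    size_t     batch_pos;         /* next value to hand out */
} ToyVecScan;

/* "Vectorized kernel": filter up to TOY_BATCH input rows in one loop. */
static void toy_fill_batch(ToyVecScan *s, int threshold)
{
    size_t end = s->next_input + TOY_BATCH;

    if (end > s->nrows)
        end = s->nrows;
    s->batch_len = 0;
    s->batch_pos = 0;
    for (; s->next_input < end; s->next_input++)
        if (s->column[s->next_input] > threshold)
            s->batch[s->batch_len++] = s->column[s->next_input];
}

/* Iterator boundary: one value per call; returns 0 at end of scan. */
int toy_next(ToyVecScan *s, int threshold, int *out)
{
    /* Refill until a batch yields at least one row or input runs out. */
    while (s->batch_pos == s->batch_len) {
        if (s->next_input == s->nrows)
            return 0;
        toy_fill_batch(s, threshold);
    }
    *out = s->batch[s->batch_pos++];
    return 1;
}

/* Drains the scan the way a stock parent operator would; returns the sum. */
long toy_sum_over(const int *column, size_t nrows, int threshold)
{
    ToyVecScan s = { column, nrows, 0, {0}, 0, 0 };
    long sum = 0;
    int v;

    while (toy_next(&s, threshold, &v))
        sum += v;
    return sum;
}
```

The parent loop in toy_sum_over never learns that rows arrive in batches, which is the same opacity the surrounding plan tree has toward a real provider. The per-row hand-off in toy_next is also where the boundary cost is paid.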
Limits and frontier friction. The boundary cost of CustomScan is real: every
tuple still crosses the TupleTableSlot interface at the node’s edges, so a
vectorized provider pays a re-tuplification tax whenever its parent is a stock
operator. Projects that want end-to-end vectorization (e.g. column-store
extensions) end up wanting several adjacent custom nodes so batches stay in
columnar form across operator boundaries — which CustomScan permits (via
custom_plans children) but does not make ergonomic. The other friction point
is that the planner only offers a custom path where its hook fires; there is no
way for a provider to introduce a custom node at a plan position the core never
considered (e.g. a novel two-phase aggregate shape) without also influencing
upstream path generation. These are the open edges where PostgreSQL’s
hook-plus-cost extensibility model is visibly less general than a full Cascades
rule engine — the recurring theme of POSTGRES-lineage extensibility: maximal
reach into the type and access-method layers, more constrained reach into the
optimizer search itself.
Sources
- PostgreSQL source, REL_18_STABLE @ 273fe94 (/data/hgryoo/references/postgres):
  - src/backend/executor/nodeCustom.c: the executor shim (init/exec/end/rescan, mark-restore guard, DSM quartet, shutdown).
  - src/backend/nodes/extensible.c: the name-keyed vtable registry (register/get for both custom-scan and extensible-node methods).
  - src/include/nodes/extensible.h: the three method structs, ExtensibleNodeMethods, capability flags, EXTNODENAME_MAX_LEN.
  - src/include/nodes/plannodes.h: CustomScan plan node and the “static vtable, never copied” contract.
  - src/include/nodes/pathnodes.h: CustomPath.
  - src/include/nodes/execnodes.h: CustomScanState.
  - src/backend/optimizer/plan/createplan.c: create_customscan_plan (lowering).
  - src/backend/optimizer/path/allpaths.c, .../path/joinpath.c: set_rel_pathlist_hook / set_join_pathlist_hook invocation.
  - src/include/optimizer/paths.h: hook type declarations.
  - src/backend/commands/explain.c: the T_CustomScan EXPLAIN case and ExplainCustomChildren.
  - src/backend/nodes/copyfuncs.c, outfuncs.c, readfuncs.c, equalfuncs.c: extensible-node copy/out/read/equal.
- Theory anchors (see dbms-papers/ and research/dbms-general/):
  - Graefe, Volcano - An Extensible and Parallel Query Evaluation System (IEEE TKDE, 1994): iterator model, path-vs-plan separation, extensibility.
  - Graefe, The Cascades Framework for Query Optimization (1995): rule-driven physical-operator implementation, the comparative frame in §6.
  - Hellerstein, Stonebraker & Hamilton, Architecture of a Database System (2007): extensibility as a long-lived-DBMS requirement.
  - Stonebraker & Rowe, The Design of POSTGRES (1986): the extensibility philosophy CustomScan inherits.
  - Neumann, Efficiently Compiling Efficient Query Plans for Modern Hardware (VLDB 2011): codegen frontier referenced in §6.
- Sibling docs (cross-reference, not duplicated here): postgres-fdw.md (the FDW mechanism), postgres-executor.md (the surrounding executor/ScanState framework), postgres-planner-overview.md and postgres-path-generation.md (path generation and add_path), postgres-hooks.md (the general planner-hook mechanism), postgres-node-trees.md (copy/serialize infrastructure), postgres-parallel-query.md (the DSM/parallel-worker machinery the DSM quartet plugs into).