PostgreSQL CustomScan — The Provider API for Pluggable Plan Nodes

A relational query plan is a tree of physical operators — Seq Scan, Index Scan, Hash Join, Sort, Aggregate — each of which is an instance of the iterator (a.k.a. Volcano) model: every operator exposes the same open() / next() / close() interface and pulls tuples from its children one at a time, so operators compose into a tree without any operator knowing the concrete type of its neighbours. Graefe’s Volcano — An Extensible and Parallel Query Evaluation System (Graefe 1994) is the canonical statement of why this uniformity matters: because the iterator interface is the only contract between operators, you can add a new operator — or a new implementation of an existing logical operation — without touching the rest of the engine, and you can interpose parallelism (an exchange operator) transparently between any two operators. The iterator interface is, in object-oriented terms, an abstract base class; each operator is a subclass that fills in the virtual methods.
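
The iterator contract is easier to see in miniature. The following is a minimal standalone sketch in plain C, not PostgreSQL code: the `Operator`, `ArrayScan`, and `Filter` types and the `sum_even` driver are all hypothetical names chosen for illustration. It shows a parent operator pulling from a child purely through the open/next/close function pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator interface: the only contract between operators. */
typedef struct Operator Operator;
struct Operator {
    void       (*open)(Operator *self);
    const int *(*next)(Operator *self);    /* NULL signals end of stream */
    void       (*close)(Operator *self);
};

/* Leaf operator: scans an in-memory array of ints. */
typedef struct {
    Operator    base;      /* first member, so Operator* casts are valid */
    const int  *rows;
    size_t      nrows;
    size_t      pos;
} ArrayScan;

static void array_open(Operator *self) { ((ArrayScan *) self)->pos = 0; }
static const int *array_next(Operator *self)
{
    ArrayScan *s = (ArrayScan *) self;
    return s->pos < s->nrows ? &s->rows[s->pos++] : NULL;
}
static void array_close(Operator *self) { (void) self; }

/* Filter operator: knows its child only as an Operator. */
typedef struct {
    Operator    base;
    Operator   *child;
    int       (*pred)(int);
} Filter;

static void filter_open(Operator *self)
{
    Filter *f = (Filter *) self;
    f->child->open(f->child);
}
static const int *filter_next(Operator *self)
{
    Filter *f = (Filter *) self;
    const int *row;
    while ((row = f->child->next(f->child)) != NULL)
        if (f->pred(*row))
            return row;
    return NULL;
}
static void filter_close(Operator *self)
{
    Filter *f = (Filter *) self;
    f->child->close(f->child);
}

static int is_even(int v) { return v % 2 == 0; }

/* Build Filter(ArrayScan) and drain it, summing the surviving rows. */
int sum_even(const int *rows, size_t nrows)
{
    ArrayScan scan = { { array_open, array_next, array_close }, rows, nrows, 0 };
    Filter filter = { { filter_open, filter_next, filter_close }, &scan.base, is_even };
    Operator *root = &filter.base;
    const int *row;
    int sum = 0;

    assert(rows != NULL || nrows == 0);
    root->open(root);
    while ((row = root->next(root)) != NULL)
        sum += *row;
    root->close(root);
    return sum;
}
```

Swapping `ArrayScan` for any other leaf that fills in the same three pointers would leave `Filter` and the driver untouched, which is the property the paragraph above attributes to the Volcano interface.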

The same paper makes a second, subtler point that the CustomScan API depends on: query optimization and query execution are two different phases over two different representations. The optimizer reasons over paths (or “plans” in Volcano’s vocabulary), which are cheap, cost-annotated descriptors that can be generated and discarded by the thousand, and only the single cheapest path is lowered into an executable plan tree that the executor walks. An extensible engine therefore needs an extension point in both representations: a way to propose a custom path during optimization (so the cost-based search can compare it against the built-in alternatives and reject it if it loses), and a way to materialize the winning custom path into a custom executor node. Graefe’s later Cascades framework (Graefe 1995) formalizes this as rules that map logical operators to physical implementation algorithms; PostgreSQL does not have a full Cascades rule engine, but its CustomPath → CustomScan → CustomScanState pipeline follows the same staged idea (logical relation → physical path → executable plan → runtime state): the three provider-visible representations are the path, the plan, and the runtime state, and an extension seam is cut into each transition.

The deeper architectural lesson comes from Hellerstein, Stonebraker & Hamilton’s Architecture of a Database System (2007) and from the original POSTGRES design papers (Stonebraker & Rowe 1986): a DBMS that expects to live for decades must be extensible from outside the core. POSTGRES made the type system and access methods extensible; modern PostgreSQL extends that philosophy to the operators of the executor itself. The motivating use cases are exactly the ones Graefe anticipated — a GPU-accelerated scan/join (PG-Strom), a columnar or in-memory cache scan, a foreign-data push-down that does not fit the FDW mould — each of which is a new physical implementation of an existing logical operation (scan a relation, join two relations). The design constraint is that none of this may require patching nodeSeqscan.c, createplan.c, or the node-copy/serialize machinery: the extension must plug into stable seams. The CustomScan API is precisely the set of seams that make “add a physical operator from a loadable module” possible.

A final theoretical wrinkle is identity under serialization. A plan tree in PostgreSQL is not a transient in-memory object: it is copied (copyObject), it is serialized to text and read back (nodeToString / stringToNode, used to ship plans to parallel workers and to cache them), and it must survive these transformations with its behaviour intact. But the “behaviour” of a custom node lives in C function pointers, which are process-local addresses that cannot be copied or serialized meaningfully across a fork() or a text round-trip. The classic solution — and the one PostgreSQL uses — is a name-keyed method registry: the node stores a stable string name, every process that loads the provider registers the same name → vtable mapping, and the copy/serialize machinery re-resolves the vtable by name on the far side. This is the same indirection a language runtime uses for a vtable pointer, lifted to survive serialization.
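
A minimal standalone sketch of the name-keyed registry idea follows. It is plain C rather than PostgreSQL code, and every identifier in it (`Methods`, `register_methods`, `lookup_methods`, `node_to_string`, `string_to_node`, the `"demo_scan"` name) is a hypothetical stand-in. The point it illustrates is that only the name is written out, and the function-pointer table is looked up again when the text is read back.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical vtable standing in for a provider's method struct. */
typedef struct {
    const char *name;            /* stable key, safe to serialize */
    int       (*run)(int input); /* process-local address, not serializable */
} Methods;

#define MAX_METHODS 16
static const Methods *registry[MAX_METHODS];
static int registry_len = 0;

/* Register a static vtable by name; -1 on duplicate name or full table. */
int register_methods(const Methods *m)
{
    for (int i = 0; i < registry_len; i++)
        if (strcmp(registry[i]->name, m->name) == 0)
            return -1;
    if (registry_len == MAX_METHODS)
        return -1;
    registry[registry_len++] = m;
    return 0;
}

/* Resolve a name back to its vtable; NULL if it was never registered. */
const Methods *lookup_methods(const char *name)
{
    for (int i = 0; i < registry_len; i++)
        if (strcmp(registry[i]->name, name) == 0)
            return registry[i];
    return NULL;
}

/* A node holds a vtable pointer, but only the name goes on the wire. */
typedef struct {
    int            payload;
    const Methods *methods;
} Node;

int node_to_string(const Node *n, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%s %d", n->methods->name, n->payload);
}

/* Returns 0 on success, -1 if the text is malformed or the name is unknown. */
int string_to_node(const char *buf, Node *out)
{
    char name[64];
    int payload;

    if (sscanf(buf, "%63s %d", name, &payload) != 2)
        return -1;
    out->methods = lookup_methods(name);   /* re-resolve by name */
    if (out->methods == NULL)
        return -1;
    out->payload = payload;
    return 0;
}

int twice(int x) { return 2 * x; }
const Methods demo_methods = { "demo_scan", twice };
```

In this model a process that never called `register_methods(&demo_methods)` fails the read with -1 instead of jumping through a stale pointer, which is the failure mode the registry is meant to make explicit.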

Most extensible engines converge on a small number of recurring patterns for “let an extension add a physical operator,” and PostgreSQL’s choices are best understood against that backdrop.

1. The method-vtable (provider) struct. The universal shape of an operator-extension API is a struct of function pointers — a hand-rolled vtable — that the extension fills in and hands to the core. The core never calls the extension’s functions by name; it calls them through the vtable, so the set of operations is fixed by the core (the struct layout) while the implementation is owned by the extension. PostgreSQL splits this into three vtables aligned with the three plan representations, rather than one fat struct, because the lifetimes differ: a path vtable is consulted once per candidate path, a scan-methods vtable once per chosen plan, and an exec vtable many times during execution.

2. Cost-based admission. A well-behaved extensible optimizer does not let an extension force its operator into the plan; it lets the extension propose the operator with a cost, and the core’s existing cost-comparison machinery decides whether it wins. PostgreSQL’s add_path() is exactly this gate: a CustomPath competes on total_cost against Seq Scan, Index Scan, etc., and is pruned if dominated. This keeps the extension honest — a badly costed custom operator simply never gets chosen — and means the extension author only has to estimate a cost, not re-implement path pruning.

3. The hook seam. To get control during planning, the extension needs a callback the core invokes at the right moment. The lightweight industry pattern is a global function-pointer “hook” that defaults to NULL and is invoked if set. PostgreSQL exposes set_rel_pathlist_hook (called after the core has generated the built-in paths for a base relation) and set_join_pathlist_hook (the analogous point for a join), so a provider can add CustomPaths exactly where the core is about to finalize the path list. (The general hook mechanism is covered in postgres-hooks.md; here it is just the entry door.)

4. The opaque private-data channel. A custom operator needs to carry arbitrary state from planning into execution — predicates it pushed down, chosen algorithm parameters, GPU kernel identifiers. Engines either force this into a void * blob (opaque, but then it cannot be copied or serialized) or into the engine’s own node framework (copyable and serializable, but the engine must know the type). PostgreSQL offers both: custom_private is a List of ordinary copyable/serializable nodes for the common case, and the extensible-node framework (T_ExtensibleNode) lets a provider define a genuinely new node type that still round-trips through copyObject and stringToNode via provider-supplied callbacks.

5. Capability flags instead of fat interfaces. Not every custom operator supports backward scan, mark/restore, or parallelism. Rather than demand every provider implement every method, the API uses a bitmask of capability flags (CUSTOMPATH_SUPPORT_BACKWARD_SCAN, _MARK_RESTORE, _PROJECTION) plus optional method-pointer slots that may be NULL. The executor shim checks the pointer (or the flag) before dispatching and raises a clean ERRCODE_FEATURE_NOT_SUPPORTED error if an unsupported capability is invoked. This is the classic “narrow required core + wide optional surface” interface design, and it keeps a minimal provider tiny (four required exec callbacks) while letting an ambitious provider opt into parallel-aware DSM coordination.
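
The flag-plus-optional-pointer pattern from item 5 can be sketched in a few lines of standalone C. This is an illustrative model only: the `CAP_*` constants, `ScanMethods`, `Status`, and the `scan_mark_pos` / `scan_restore_pos` shims are hypothetical names, not the PostgreSQL definitions. The planner-side check consults the flag, and the executor-side shim checks the pointer before dispatching.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits, in the same bitmask style. */
#define CAP_BACKWARD_SCAN 0x0001u
#define CAP_MARK_RESTORE  0x0002u
#define CAP_PROJECTION    0x0004u

typedef struct Scan Scan;
typedef struct {
    const char *name;
    int   (*next)(Scan *s);         /* required */
    void  (*mark_pos)(Scan *s);     /* optional, may be NULL */
    void  (*restore_pos)(Scan *s);  /* optional, may be NULL */
} ScanMethods;

struct Scan {
    unsigned           flags;
    const ScanMethods *methods;
    int                pos;
    int                marked;
};

typedef enum { ST_OK = 0, ST_FEATURE_NOT_SUPPORTED = 1 } Status;

/* Planner-side question: does this scan advertise mark/restore? */
int supports_mark_restore(const Scan *s)
{
    return (s->flags & CAP_MARK_RESTORE) != 0;
}

/* Executor-side shim: turn a missing callback into a clean error code. */
Status scan_mark_pos(Scan *s)
{
    if (s->methods->mark_pos == NULL)
        return ST_FEATURE_NOT_SUPPORTED;
    s->methods->mark_pos(s);
    return ST_OK;
}

Status scan_restore_pos(Scan *s)
{
    if (s->methods->restore_pos == NULL)
        return ST_FEATURE_NOT_SUPPORTED;
    s->methods->restore_pos(s);
    return ST_OK;
}

/* A toy provider: counts upward, optionally remembers a position. */
int  counter_next(Scan *s)    { return s->pos++; }
void counter_mark(Scan *s)    { s->marked = s->pos; }
void counter_restore(Scan *s) { s->pos = s->marked; }

const ScanMethods minimal_methods = { "minimal", counter_next, NULL, NULL };
const ScanMethods full_methods    = { "full", counter_next, counter_mark, counter_restore };
```

A provider built on `minimal_methods` stays small and still fails predictably when asked for something it does not implement, while `full_methods` opts in by setting the flag and filling both optional slots.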

flowchart TD
    subgraph plan["Planning (paths)"]
        H["set_rel_pathlist_hook /<br/>set_join_pathlist_hook"] -->|add_path| CP["CustomPath<br/>flags + custom_private<br/>methods: CustomPathMethods"]
        CP -->|cheapest wins| WIN["chosen CustomPath"]
    end
    subgraph lower["Plan creation (lowering)"]
        WIN --> CCP["create_customscan_plan"]
        CCP -->|PlanCustomPath| CS["CustomScan plan node<br/>custom_exprs / custom_private<br/>methods: CustomScanMethods"]
    end
    subgraph exec["Execution (state)"]
        CS -->|ExecInitCustomScan| CSS["CustomScanState<br/>methods: CustomExecMethods"]
        CSS -->|ExecCustomScan loop| ROWS["TupleTableSlots"]
    end
    REG["name registry<br/>RegisterCustomScanMethods /<br/>RegisterExtensibleNodeMethods"] -.resolve by name.-> CS

PostgreSQL implements the operator-extension API as three method structs, one per plan representation, declared together in src/include/nodes/extensible.h. They are deliberately asymmetric in size, mirroring how often each is consulted.

The path-level vtable carries essentially one job — knowing how to turn a CustomPath into a plan — plus an optional reparameterization helper for partitionwise joins:

// CustomPathMethods — src/include/nodes/extensible.h
typedef struct CustomPathMethods
{
    const char *CustomName;
    /* Convert Path to a Plan */
    struct Plan *(*PlanCustomPath) (PlannerInfo *root,
                                    RelOptInfo *rel,
                                    struct CustomPath *best_path,
                                    List *tlist,
                                    List *clauses,
                                    List *custom_plans);
    struct List *(*ReparameterizeCustomPathByChild) (PlannerInfo *root,
                                                     List *custom_private,
                                                     RelOptInfo *child_rel);
} CustomPathMethods;

The scan-methods vtable is even thinner — one callback to manufacture the executor state object from the plan node:

// CustomScanMethods — src/include/nodes/extensible.h
typedef struct CustomScanMethods
{
    const char *CustomName;
    /* Create execution state (CustomScanState) from a CustomScan plan node */
    Node *(*CreateCustomScanState) (CustomScan *cscan);
} CustomScanMethods;

The execution-time vtable is the heavy one. Its first four callbacks are required (the Volcano open/next/close plus rescan); the rest are optional and gated by capability flags or NULL checks — mark/restore for plans that sit under a Merge Join, the DSM-coordination quartet for parallel-aware providers, a shutdown hook, and an EXPLAIN hook:

// CustomExecMethods — src/include/nodes/extensible.h
typedef struct CustomExecMethods
{
    const char *CustomName;
    /* Required executor methods */
    void (*BeginCustomScan) (CustomScanState *node, EState *estate, int eflags);
    TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
    void (*EndCustomScan) (CustomScanState *node);
    void (*ReScanCustomScan) (CustomScanState *node);
    /* Optional methods: needed if mark/restore is supported */
    void (*MarkPosCustomScan) (CustomScanState *node);
    void (*RestrPosCustomScan) (CustomScanState *node);
    /* Optional methods: needed if parallel execution is supported */
    Size (*EstimateDSMCustomScan) (CustomScanState *node, ParallelContext *pcxt);
    void (*InitializeDSMCustomScan) (CustomScanState *node, ParallelContext *pcxt, void *coordinate);
    void (*ReInitializeDSMCustomScan) (CustomScanState *node, ParallelContext *pcxt, void *coordinate);
    void (*InitializeWorkerCustomScan) (CustomScanState *node, shm_toc *toc, void *coordinate);
    void (*ShutdownCustomScan) (CustomScanState *node);
    /* Optional: print additional information in EXPLAIN */
    void (*ExplainCustomScan) (CustomScanState *node, List *ancestors, ExplainState *es);
} CustomExecMethods;

The three structs are pointed at from the three node types. The crucial design decision — stated as a load-bearing comment in the headers — is that the methods field is a pointer to a static table the core never copies. The CustomScan plan node says so explicitly:

// CustomScan — src/include/nodes/plannodes.h
typedef struct CustomScan
{
    Scan scan;
    uint32 flags;               /* mask of CUSTOMPATH_* flags */
    List *custom_plans;         /* list of child Plan nodes, if any */
    List *custom_exprs;         /* expressions that custom code may evaluate */
    List *custom_private;       /* private data for custom code */
    List *custom_scan_tlist;    /* optional tlist describing scan tuple */
    Bitmapset *custom_relids;   /* RTIs generated by this scan */
    /*
     * NOTE: The method field of CustomScan is required to be a pointer to a
     * static table of callback functions. So we don't copy the table itself,
     * just reference the original one.
     */
    const struct CustomScanMethods *methods;
} CustomScan;

Because the vtable is just an address and is never copied, it cannot survive serialization to a parallel worker or a text round-trip. PostgreSQL closes that gap with a process-local, name-keyed registry: the provider calls RegisterCustomScanMethods(methods) at module load, the registry stores the vtable under methods->CustomName, and the serialized node carries only the name. An in-process copyObject simply copies the pointer, since the static table it refers to is still valid; on the far side of a text round-trip, the read function calls GetCustomScanMethods(name) to re-resolve the vtable. The capability flags that thread through all three representations are a single bitmask:

// capability flags — src/include/nodes/extensible.h
#define CUSTOMPATH_SUPPORT_BACKWARD_SCAN 0x0001
#define CUSTOMPATH_SUPPORT_MARK_RESTORE 0x0002
#define CUSTOMPATH_SUPPORT_PROJECTION 0x0004

The flow a provider drives looks like this end to end:

flowchart TD
    A["_PG_init: RegisterCustomScanMethods(&scan_methods)<br/>install set_rel_pathlist_hook"] --> B["planner reaches base rel<br/>set_rel_pathlist() done with core paths"]
    B --> C["hook fires: provider builds CustomPath<br/>flags, custom_private, methods=CustomPathMethods<br/>add_path(rel, custompath)"]
    C --> D{"cheapest total_cost?"}
    D -->|no| X["CustomPath discarded"]
    D -->|yes| E["create_plan_recurse → create_customscan_plan"]
    E -->|methods->PlanCustomPath| F["CustomScan plan node<br/>methods=CustomScanMethods"]
    F --> G["ExecInitNode → ExecInitCustomScan<br/>methods->CreateCustomScanState"]
    G --> H["CustomScanState<br/>methods=CustomExecMethods"]
    H -->|BeginCustomScan| I["per-tuple: ExecCustomScan loop"]
    I -->|EndCustomScan| J["teardown"]

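The provider author’s surface area is small and well-bounded: register the CustomScanMethods vtable by name (the path and exec vtables are never serialized, so the provider simply points at them from the CustomPath it builds and from the state object it allocates in CreateCustomScanState), optionally register an ExtensibleNodeMethods vtable for private node types, set one planner hook, and fill in the required executor callbacks. Everything else (cost comparison, plan copying, parallel-worker plan shipping, EXPLAIN tree-walking) is handled by the core through the seams described next.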

The CustomScan machinery is spread thin across the three phases of query processing. This walkthrough follows a tuple’s-eye journey: how a provider registers its vtables, how a CustomPath is injected during planning, how it is lowered to a CustomScan plan node, how that node is instantiated and driven by the executor shim in nodeCustom.c, how parallel coordination works, how EXPLAIN renders the node, and finally how the extensible-node registry lets private node types round-trip through copy/serialize.

1. Registration — the name-keyed vtable registry (extensible.c)

Everything begins with a provider registering its vtables, normally from _PG_init(). Both registration paths funnel through one internal helper that lazily creates a string-keyed hash and rejects duplicate names:

// RegisterExtensibleNodeEntry — src/backend/nodes/extensible.c
static void
RegisterExtensibleNodeEntry(HTAB **p_htable, const char *htable_label,
                            const char *extnodename, const void *extnodemethods)
{
    ExtensibleNodeEntry *entry;
    bool found;
    if (*p_htable == NULL)
    {
        HASHCTL ctl;
        ctl.keysize = EXTNODENAME_MAX_LEN;
        ctl.entrysize = sizeof(ExtensibleNodeEntry);
        *p_htable = hash_create(htable_label, 100, &ctl, HASH_ELEM | HASH_STRINGS);
    }
    if (strlen(extnodename) >= EXTNODENAME_MAX_LEN)
        elog(ERROR, "extensible node name is too long");
    entry = (ExtensibleNodeEntry *) hash_search(*p_htable, extnodename,
                                                HASH_ENTER, &found);
    if (found)
        ereport(ERROR,
                (errcode(ERRCODE_DUPLICATE_OBJECT),
                 errmsg("extensible node type \"%s\" already exists", extnodename)));
    entry->extnodemethods = extnodemethods;
}

There are two independent hash tables, custom_scan_methods and extensible_node_methods, both file-static HTAB * variables initialized to NULL. RegisterCustomScanMethods keys on methods->CustomName; the lookup side, GetCustomScanMethods, resolves a name back to its vtable. Both lookups go through GetExtensibleNodeEntry, whose missing_ok argument controls the failure mode: a miss with missing_ok == false raises ERRCODE_UNDEFINED_OBJECT rather than returning NULL, so a node that serialized a name the far side never registered fails loudly:

// GetExtensibleNodeEntry / GetCustomScanMethods — src/backend/nodes/extensible.c
static const void *
GetExtensibleNodeEntry(HTAB *htable, const char *extnodename, bool missing_ok)
{
    ExtensibleNodeEntry *entry = NULL;
    if (htable != NULL)
        entry = (ExtensibleNodeEntry *) hash_search(htable, extnodename,
                                                    HASH_FIND, NULL);
    if (!entry)
    {
        if (missing_ok)
            return NULL;
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("ExtensibleNodeMethods \"%s\" was not registered", extnodename)));
    }
    return entry->extnodemethods;
}

The registry is process-local. In a parallel query each worker loads the provider’s library and runs its _PG_init (the library is either listed in shared_preload_libraries, or was loaded on demand in the leader and is loaded again in the worker), so the same name → vtable mapping exists in every backend that will deserialize the plan. This is what makes the “serialize only the name” strategy of the following steps correct.

2. Injection — getting control during planning (allpaths.c)

A provider does not call into the planner; the planner calls out to the provider through a global hook, fired right after the core has generated all built-in paths for a base relation:

// set_rel_pathlist (excerpt) — src/backend/optimizer/path/allpaths.c
/*
 * Allow a plugin to editorialize on the set of Paths for this base
 * relation. It could add new paths (such as CustomPaths) by calling
 * add_path(), or add_partial_path() if parallel aware.
 */
if (set_rel_pathlist_hook)
    (*set_rel_pathlist_hook) (root, rel, rti, rte);

The hook signature hands the provider everything it needs to build and cost a CustomPath — the PlannerInfo, the target RelOptInfo, the range-table index, and the RTE:

// set_rel_pathlist_hook_type — src/include/optimizer/paths.h
typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
                                            RelOptInfo *rel,
                                            Index rti,
                                            RangeTblEntry *rte);
extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
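
The hook pattern itself can be modelled without any planner types. The sketch below is standalone C with hypothetical names throughout (`Rel`, `pathlist_hook`, `core_set_rel_pathlist`, `extension_init`); it is not the PostgreSQL hook, only the same shape: a global function pointer that defaults to NULL, a core call site that invokes it if set, and an extension that saves and chains to whatever hook was installed before it, as extensions conventionally do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the planner's per-relation structure. */
typedef struct {
    int npaths;                  /* number of candidate paths so far */
} Rel;

typedef void (*pathlist_hook_type)(Rel *rel);

/* "Core" side: a global hook that defaults to NULL. */
pathlist_hook_type pathlist_hook = NULL;

void core_set_rel_pathlist(Rel *rel)
{
    rel->npaths = 2;             /* pretend the core added two built-in paths */
    if (pathlist_hook)
        (*pathlist_hook)(rel);   /* give a plugin the last word */
}

/* "Extension" side: remember the previous hook and chain to it. */
static pathlist_hook_type prev_hook = NULL;

static void my_pathlist_hook(Rel *rel)
{
    if (prev_hook)
        prev_hook(rel);          /* let earlier-loaded extensions run first */
    rel->npaths += 1;            /* propose one additional (custom) path */
}

/* Call exactly once at load time, analogous to a module init function. */
void extension_init(void)
{
    prev_hook = pathlist_hook;
    pathlist_hook = my_pathlist_hook;
}
```

Because the core only tests the pointer for NULL, an unloaded extension costs nothing at the call site, and chaining keeps several extensions on the same hook from clobbering one another.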

The join analogue, set_join_pathlist_hook, fires inside add_paths_to_joinrel() and receives the outer/inner rels plus the join type, so a provider can propose an alternative join implementation (e.g. a GPU hash join) that then competes on cost like any other path. The CustomPath the provider builds is a lightweight path node carrying child paths, restrict-info, and the provider’s private list:

// CustomPath — src/include/nodes/pathnodes.h
typedef struct CustomPath
{
    Path path;
    uint32 flags;               /* mask of CUSTOMPATH_* flags */
    List *custom_paths;         /* list of child Path nodes, if any */
    List *custom_restrictinfo;
    List *custom_private;
    const struct CustomPathMethods *methods;
} CustomPath;

The provider hands this to add_path(rel, (Path *) custompath). From here the CustomPath is just another candidate: it competes on cost and is pruned if a cheaper path dominates it. The provider never has to touch the path-pruning logic — it only has to set path.total_cost honestly.
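
As a rough illustration of cost-based admission, the standalone C sketch below keeps a candidate only if no existing path is at least as cheap on both startup and total cost, and evicts existing paths the candidate dominates. This is a deliberate simplification with hypothetical names (`Path`, `PathList`, `add_path_sketch`, `cheapest_total`); the real add_path() is not reproduced here, and the assumption that two cost fields suffice is made purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified path descriptor. */
typedef struct {
    const char *label;
    double      startup_cost;
    double      total_cost;
} Path;

#define MAX_PATHS 8
typedef struct {
    Path   paths[MAX_PATHS];
    size_t n;
} PathList;

/* a dominates b if it is no worse on both cost dimensions. */
static int dominates(const Path *a, const Path *b)
{
    return a->startup_cost <= b->startup_cost &&
           a->total_cost <= b->total_cost;
}

/* Returns 1 if the candidate was kept, 0 if it was rejected. */
int add_path_sketch(PathList *list, Path cand)
{
    size_t i, kept = 0;

    /* Reject the candidate if any existing path dominates it. */
    for (i = 0; i < list->n; i++)
        if (dominates(&list->paths[i], &cand))
            return 0;

    /* Evict existing paths that the candidate dominates. */
    for (i = 0; i < list->n; i++)
        if (!dominates(&cand, &list->paths[i]))
            list->paths[kept++] = list->paths[i];
    list->n = kept;

    if (list->n == MAX_PATHS)
        return 0;                /* sketch limitation: fixed capacity */
    list->paths[list->n++] = cand;
    return 1;
}

/* The path that would be lowered into a plan in this toy model. */
const Path *cheapest_total(const PathList *list)
{
    const Path *best = NULL;
    for (size_t i = 0; i < list->n; i++)
        if (best == NULL || list->paths[i].total_cost < best->total_cost)
            best = &list->paths[i];
    return best;
}
```

The extension never calls `dominates` itself in this model; it only supplies honest cost numbers and lets the shared gate decide, which mirrors the division of labour described above.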

3. Lowering — CustomPath to CustomScan (createplan.c)

If the CustomPath wins, create_plan_recurse dispatches to create_customscan_plan, which recursively lowers any child paths, orders the scan clauses, and then calls the provider’s PlanCustomPath to produce the actual plan node — the core does not construct the CustomScan itself:

// create_customscan_plan — src/backend/optimizer/plan/createplan.c
static CustomScan *
create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
                       List *tlist, List *scan_clauses)
{
    CustomScan *cplan;
    RelOptInfo *rel = best_path->path.parent;
    List *custom_plans = NIL;
    ListCell *lc;
    /* Recursively transform child paths. */
    foreach(lc, best_path->custom_paths)
    {
        Plan *plan = create_plan_recurse(root, (Path *) lfirst(lc), CP_EXACT_TLIST);
        custom_plans = lappend(custom_plans, plan);
    }
    scan_clauses = order_qual_clauses(root, scan_clauses);
    /* Invoke custom plan provider to create the Plan node. */
    cplan = castNode(CustomScan,
                     best_path->methods->PlanCustomPath(root, rel, best_path,
                                                        tlist, scan_clauses,
                                                        custom_plans));
    /* Copy cost data from Path to Plan ... */
    copy_generic_path_info(&cplan->scan.plan, &best_path->path);
    cplan->custom_relids = best_path->path.parent->relids;
    if (best_path->path.param_info)
    {
        cplan->scan.plan.qual = (List *)
            replace_nestloop_params(root, (Node *) cplan->scan.plan.qual);
        cplan->custom_exprs = (List *)
            replace_nestloop_params(root, (Node *) cplan->custom_exprs);
    }
    return cplan;
}

Two division-of-labour details are worth flagging. First, the core fills in the generic cost fields (copy_generic_path_info) and the relids after the provider returns, so the provider’s PlanCustomPath need only populate the custom-specific fields. Second, replace_nestloop_params rewrites outer-relation Vars into nestloop params in both the qual and custom_exprs, so a custom scan placed on the inner side of a parameterized nestloop gets its parameters wired up without the provider doing anything — but note the core assumes custom_scan_tlist contains no such Vars.

4. Instantiation & the executor shim (nodeCustom.c)

At execution start, ExecInitNode dispatches T_CustomScan to ExecInitCustomScan. This is the most substantial function in nodeCustom.c, and it is where the provider-allocated state object is woven into the standard ScanState framework. Critically, the provider allocates the state (so it can embed CustomScanState as the first field of a larger struct), and the shim then fills the standard fields:

// ExecInitCustomScan (condensed) — src/backend/executor/nodeCustom.c
CustomScanState *
ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
{
    CustomScanState *css;
    const TupleTableSlotOps *slotOps;
    Relation scan_rel = NULL;
    Index scanrelid = cscan->scan.scanrelid;
    int tlistvarno;
    /* Provider does the palloc and sets node tag + methods. */
    css = castNode(CustomScanState, cscan->methods->CreateCustomScanState(cscan));
    css->flags = cscan->flags;
    css->ss.ps.plan = &cscan->scan.plan;
    css->ss.ps.state = estate;
    css->ss.ps.ExecProcNode = ExecCustomScan;
    ExecAssignExprContext(estate, &css->ss.ps);
    /* open the scan relation, if any */
    if (scanrelid > 0)
    {
        scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
        css->ss.ss_currentRelation = scan_rel;
    }
    /* Use a custom slot if specified, else a virtual slot. */
    slotOps = css->slotOps;
    if (!slotOps)
        slotOps = &TTSOpsVirtual;
    if (cscan->custom_scan_tlist != NIL || scan_rel == NULL)
    {
        TupleDesc scan_tupdesc = ExecTypeFromTL(cscan->custom_scan_tlist);
        ExecInitScanTupleSlot(estate, &css->ss, scan_tupdesc, slotOps);
        tlistvarno = INDEX_VAR; /* Vars carry varno = INDEX_VAR */
    }
    else
    {
        ExecInitScanTupleSlot(estate, &css->ss, RelationGetDescr(scan_rel), slotOps);
        tlistvarno = scanrelid;
    }
    ExecInitResultTupleSlotTL(&css->ss.ps, &TTSOpsVirtual);
    ExecAssignScanProjectionInfoWithVarno(&css->ss, tlistvarno);
    css->ss.ps.qual = ExecInitQual(cscan->scan.plan.qual, (PlanState *) css);
    /* Provider finishes initialization. */
    css->methods->BeginCustomScan(css, estate, eflags);
    return css;
}

Three behaviours encoded here are part of the provider contract. (a) The provider sets the node tag and methods inside CreateCustomScanState; the shim only asserts via castNode. (b) The scan tuple type is built from custom_scan_tlist when that list is non-empty, and also whenever there is no base relation to fall back on (a join-style custom scan with scanrelid == 0); otherwise it is the base relation’s rowtype. The varno used for the targetlist is set accordingly (INDEX_VAR vs scanrelid). (c) A provider may install its own TupleTableSlotOps via css->slotOps; otherwise the shim defaults to a virtual slot.

The per-tuple driver is a one-liner that forwards to the provider’s required ExecCustomScan, guarding interrupts at the standard executor cadence:

// ExecCustomScan — src/backend/executor/nodeCustom.c
static TupleTableSlot *
ExecCustomScan(PlanState *pstate)
{
    CustomScanState *node = castNode(CustomScanState, pstate);
    CHECK_FOR_INTERRUPTS();
    Assert(node->methods->ExecCustomScan != NULL);
    return node->methods->ExecCustomScan(node);
}

ExecEndCustomScan and ExecReScanCustomScan are equally thin forwarders, each Assert-ing that the required callback is non-NULL before dispatching. The CustomScanState struct itself is where provider state hangs:

// CustomScanState — src/include/nodes/execnodes.h
typedef struct CustomScanState
{
    ScanState ss;
    uint32 flags;               /* mask of CUSTOMPATH_* flags */
    List *custom_ps;            /* list of child PlanState nodes, if any */
    Size pscan_len;             /* size of parallel coordination information */
    const struct CustomExecMethods *methods;
    const struct TupleTableSlotOps *slotOps;
} CustomScanState;
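
The "provider allocates, core fills in and drives" arrangement relies on the C rule that a pointer to a struct also points to its first member. A minimal standalone sketch of that embedding follows; `CoreState`, `ExecMethods`, `CounterState`, `create_counter_state`, and `run_and_count` are hypothetical names standing in for the core header, the exec vtable, the provider's larger struct, the create-state callback, and the executor shim respectively.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical core-owned header that every custom state must start with. */
typedef struct CoreState CoreState;
typedef struct {
    void (*begin)(CoreState *node);
    int  (*exec)(CoreState *node);   /* next value, or -1 at end of scan */
    void (*end)(CoreState *node);
} ExecMethods;

struct CoreState {
    int                tag;          /* node tag the core asserts on */
    const ExecMethods *methods;
};
#define TAG_CUSTOM_STATE 42

/* Provider's private state: the core header is the FIRST member. */
typedef struct {
    CoreState base;
    int       next_value;
    int       limit;
} CounterState;

static void counter_begin(CoreState *node)
{
    ((CounterState *) node)->next_value = 0;
}
static int counter_exec(CoreState *node)
{
    CounterState *cs = (CounterState *) node;    /* downcast to own type */
    return cs->next_value < cs->limit ? cs->next_value++ : -1;
}
static void counter_end(CoreState *node) { (void) node; }

static const ExecMethods counter_methods = {
    counter_begin, counter_exec, counter_end
};

/* Provider callback: allocate the larger struct, set tag and methods. */
CoreState *create_counter_state(int limit)
{
    CounterState *cs = calloc(1, sizeof(CounterState));
    if (cs == NULL)
        return NULL;
    cs->base.tag = TAG_CUSTOM_STATE;
    cs->base.methods = &counter_methods;
    cs->limit = limit;
    return &cs->base;
}

/* Core-side shim: verify the tag, then drive the node via the vtable only. */
int run_and_count(CoreState *node)
{
    int rows = 0;

    assert(node != NULL && node->tag == TAG_CUSTOM_STATE);
    node->methods->begin(node);
    while (node->methods->exec(node) >= 0)
        rows++;
    node->methods->end(node);
    return rows;
}
```

The shim never learns that `CounterState` exists; it sees only the header, which is why the allocation has to happen on the provider side where the full size is known.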

5. Optional capabilities — mark/restore and the NULL-check guard

The optional methods are gated. Mark/restore is only meaningful under a Merge Join, and a provider that did not implement it must fail cleanly rather than crash on a NULL pointer. The shim turns the missing callback into a proper SQL error:

// ExecCustomMarkPos — src/backend/executor/nodeCustom.c
void
ExecCustomMarkPos(CustomScanState *node)
{
    if (!node->methods->MarkPosCustomScan)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("custom scan \"%s\" does not support MarkPos",
                        node->methods->CustomName)));
    node->methods->MarkPosCustomScan(node);
}

This is the runtime half of the CUSTOMPATH_SUPPORT_MARK_RESTORE flag: the planner only relies on a custom scan for mark/restore under a Merge Join if that flag is set, and the executor backstops it with this guard so that a misbehaving provider that advertised the flag but left the callback NULL still gets a comprehensible error rather than a segfault.

6. Parallel coordination — the DSM quartet

A parallel-aware provider implements four optional callbacks that the shim invokes during parallel setup. The estimate/initialize pair is representative: each is a no-op if the provider left the callback NULL, and otherwise the shim manages the shared-memory TOC bookkeeping while the provider fills the chunk:

// ExecCustomScanEstimate / ExecCustomScanInitializeDSM — src/backend/executor/nodeCustom.c
void
ExecCustomScanEstimate(CustomScanState *node, ParallelContext *pcxt)
{
    const CustomExecMethods *methods = node->methods;
    if (methods->EstimateDSMCustomScan)
    {
        node->pscan_len = methods->EstimateDSMCustomScan(node, pcxt);
        shm_toc_estimate_chunk(&pcxt->estimator, node->pscan_len);
        shm_toc_estimate_keys(&pcxt->estimator, 1);
    }
}
void
ExecCustomScanInitializeDSM(CustomScanState *node, ParallelContext *pcxt)
{
    const CustomExecMethods *methods = node->methods;
    if (methods->InitializeDSMCustomScan)
    {
        int plan_node_id = node->ss.ps.plan->plan_node_id;
        void *coordinate = shm_toc_allocate(pcxt->toc, node->pscan_len);
        methods->InitializeDSMCustomScan(node, pcxt, coordinate);
        shm_toc_insert(pcxt->toc, plan_node_id, coordinate);
    }
}

The chunk is keyed by plan_node_id in the TOC, and the worker side (ExecCustomScanInitializeWorker) looks it up by that same id, so the leader and every worker share one coordination region per custom-scan node. ExecShutdownCustomScan gives the provider a chance to act before the DSM segment is torn down. All of these callbacks are NULL-tolerant, so a non-parallel provider simply leaves them unset and the parallel machinery skips it.
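
The keyed-chunk handshake can be modelled in a single process. The sketch below is standalone C under loud assumptions: the `Toc` type and its `toc_*` functions are a hypothetical miniature of a table of contents over ordinary heap memory (no real shared memory is involved), the estimate and initialize phases are collapsed into one `leader_setup` call, and `SharedScan`, `Node`, and `ParallelMethods` are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOC_MAX_KEYS 8

/* Hypothetical table of contents over one "shared" region (plain heap here). */
typedef struct {
    unsigned char *base;
    size_t size, used;
    struct { int key; void *addr; } entries[TOC_MAX_KEYS];
    int nentries;
} Toc;

int toc_init(Toc *toc, size_t size)
{
    toc->base = malloc(size);
    toc->size = size;
    toc->used = 0;
    toc->nentries = 0;
    return toc->base ? 0 : -1;
}

void *toc_allocate(Toc *toc, size_t len)
{
    len = (len + 7) & ~(size_t) 7;           /* keep chunks 8-byte aligned */
    if (toc->used + len > toc->size)
        return NULL;
    toc->used += len;
    return toc->base + toc->used - len;
}

int toc_insert(Toc *toc, int key, void *addr)
{
    if (toc->nentries == TOC_MAX_KEYS)
        return -1;
    toc->entries[toc->nentries].key = key;
    toc->entries[toc->nentries].addr = addr;
    toc->nentries++;
    return 0;
}

void *toc_lookup(const Toc *toc, int key)
{
    for (int i = 0; i < toc->nentries; i++)
        if (toc->entries[i].key == key)
            return toc->entries[i].addr;
    return NULL;
}

/* Provider's coordination state, placed inside the shared chunk. */
typedef struct { int next_block; int nblocks; } SharedScan;

typedef struct Node Node;
typedef struct {                              /* all optional: NULL = serial */
    size_t (*estimate)(Node *n);
    void   (*init_dsm)(Node *n, void *coordinate);
    void   (*init_worker)(Node *n, void *coordinate);
} ParallelMethods;
struct Node {
    int plan_node_id;
    size_t pscan_len;
    SharedScan *shared;
    const ParallelMethods *methods;
};

/* Leader-side shim: size, allocate, let the provider fill, publish by id. */
void leader_setup(Node *n, Toc *toc)
{
    if (n->methods->estimate == NULL || n->methods->init_dsm == NULL)
        return;                               /* not parallel-aware: skip */
    n->pscan_len = n->methods->estimate(n);
    void *coordinate = toc_allocate(toc, n->pscan_len);
    assert(coordinate != NULL);
    n->methods->init_dsm(n, coordinate);
    toc_insert(toc, n->plan_node_id, coordinate);
}

/* Worker-side shim: find the same chunk by the same plan node id. */
void worker_setup(Node *n, const Toc *toc)
{
    if (n->methods->init_worker != NULL)
        n->methods->init_worker(n, toc_lookup(toc, n->plan_node_id));
}

static size_t scan_estimate(Node *n) { (void) n; return sizeof(SharedScan); }
static void scan_init_dsm(Node *n, void *c)
{
    SharedScan *s = c;
    s->next_block = 0;
    s->nblocks = 10;
    n->shared = s;
}
static void scan_init_worker(Node *n, void *c) { n->shared = c; }

const ParallelMethods parallel_methods = { scan_estimate, scan_init_dsm, scan_init_worker };
const ParallelMethods serial_methods = { NULL, NULL, NULL };
```

In this model the "leader" and "worker" nodes are separate structs that end up pointing at the same chunk, so a block claimed through one is visible through the other; a node using `serial_methods` leaves no entry in the table at all.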

7. EXPLAIN integration (explain.c)

When EXPLAIN walks a plan tree and reaches a CustomScan, it shows the standard scan qual and then defers to the optional provider hook for any extra detail:

// ExplainNode (T_CustomScan case) — src/backend/commands/explain.c
case T_CustomScan:
    {
        CustomScanState *css = (CustomScanState *) planstate;
        show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
        if (plan->qual)
            show_instrumentation_count("Rows Removed by Filter", 1, planstate, es);
        if (css->methods->ExplainCustomScan)
            css->methods->ExplainCustomScan(css, ancestors, es);
    }
    break;

The node label itself comes from the vtable’s CustomName (((CustomScan *) plan)->methods->CustomName in ExplainNode’s node-name switch), and child plans stored in custom_ps are recursed via ExplainCustomChildren, which labels them “child”/“children” and re-enters ExplainNode. So a provider with sub-plans gets a properly nested EXPLAIN tree for free.

8. Extensible nodes — round-tripping private types (copyfuncs/outfuncs/readfuncs)

The extensible-node framework is the second consumer of the name registry. A provider that wants a genuinely new node type (beyond stuffing ordinary nodes into custom_private) tags it T_ExtensibleNode with an extnodename, and registers an ExtensibleNodeMethods vtable of four serialization callbacks:

// ExtensibleNodeMethods — src/include/nodes/extensible.h
typedef struct ExtensibleNodeMethods
{
    const char *extnodename;
    Size node_size;
    void (*nodeCopy) (struct ExtensibleNode *newnode,
                      const struct ExtensibleNode *oldnode);
    bool (*nodeEqual) (const struct ExtensibleNode *a,
                       const struct ExtensibleNode *b);
    void (*nodeOut) (struct StringInfoData *str,
                     const struct ExtensibleNode *node);
    void (*nodeRead) (struct ExtensibleNode *node);
} ExtensibleNodeMethods;

The core’s copyObject, nodeToString, and stringToNode each special-case T_ExtensibleNode by re-resolving the vtable by name and dispatching to the provider’s callback. Copy allocates node_size bytes (the provider’s possibly larger struct), copies the name field generically, then hands off the private fields:

// _copyExtensibleNode — src/backend/nodes/copyfuncs.c
static ExtensibleNode *
_copyExtensibleNode(const ExtensibleNode *from)
{
    ExtensibleNode *newnode;
    const ExtensibleNodeMethods *methods;
    methods = GetExtensibleNodeMethods(from->extnodename, false);
    newnode = (ExtensibleNode *) newNode(methods->node_size, T_ExtensibleNode);
    COPY_STRING_FIELD(extnodename);
    methods->nodeCopy(newnode, from); /* copy the private fields */
    return newnode;
}

Read is the symmetric operation: it pulls the :extnodename token, resolves the vtable, allocates node_size, and lets the provider reconstruct its private fields from the token stream:

// _readExtensibleNode — src/backend/nodes/readfuncs.c
static ExtensibleNode *
_readExtensibleNode(void)
{
    const ExtensibleNodeMethods *methods;
    ExtensibleNode *local_node;
    const char *extnodename;
    READ_TEMP_LOCALS();
    token = pg_strtok(&length); /* skip :extnodename */
    token = pg_strtok(&length); /* get extnodename */
    extnodename = nullable_string(token, length);
    if (!extnodename)
        elog(ERROR, "extnodename has to be supplied");
    methods = GetExtensibleNodeMethods(extnodename, false);
    local_node = (ExtensibleNode *) newNode(methods->node_size, T_ExtensibleNode);
    local_node->extnodename = extnodename;
    methods->nodeRead(local_node); /* deserialize the private fields */
    READ_DONE();
}

_outExtensibleNode (writes EXTENSIBLENODE + the name + methods->nodeOut) and _equalExtensibleNode (compares the name, then methods->nodeEqual) complete the quartet. The header comment is emphatic that all four callbacks are mandatory — there is no default serialization for a type the core knows nothing about. This is the mechanism that lets a CustomScan ship to a parallel worker with arbitrarily complex provider-private state intact: the state is extensible nodes inside custom_private, and the whole custom_private list round-trips through the standard node serializer because each element resolves its own vtable by name on the worker side.
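
The copy and equality halves of that quartet can be sketched in standalone C. Everything here is a hypothetical model rather than the PostgreSQL implementation: `ExtNode` plays the generic header, `ExtNodeMethods` the per-type vtable, `KernelInfo` an invented provider type, and the registry is a small array. One simplification is labelled in the code: the copy shares the registered static name instead of duplicating the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic header: the only part the "core" understands. */
typedef struct { const char *extnodename; } ExtNode;

typedef struct {
    const char *extnodename;
    size_t      node_size;       /* size of the provider's full struct */
    void (*node_copy)(ExtNode *newnode, const ExtNode *oldnode);
    int  (*node_equal)(const ExtNode *a, const ExtNode *b);
} ExtNodeMethods;

#define MAX_EXT 8
static const ExtNodeMethods *ext_registry[MAX_EXT];
static int ext_count = 0;

const ExtNodeMethods *get_ext_methods(const char *name)
{
    for (int i = 0; i < ext_count; i++)
        if (strcmp(ext_registry[i]->extnodename, name) == 0)
            return ext_registry[i];
    return NULL;
}

int register_ext_methods(const ExtNodeMethods *m)
{
    if (ext_count == MAX_EXT || get_ext_methods(m->extnodename) != NULL)
        return -1;
    ext_registry[ext_count++] = m;
    return 0;
}

/* Core-side copy: resolve by name, allocate node_size, delegate the rest. */
ExtNode *copy_ext_node(const ExtNode *from)
{
    const ExtNodeMethods *m = get_ext_methods(from->extnodename);
    ExtNode *newnode;

    if (m == NULL)
        return NULL;
    newnode = calloc(1, m->node_size);
    if (newnode == NULL)
        return NULL;
    newnode->extnodename = m->extnodename;  /* simplification: share static name */
    m->node_copy(newnode, from);            /* provider copies private fields */
    return newnode;
}

/* Core-side equality: names must match, then the provider decides. */
int equal_ext_node(const ExtNode *a, const ExtNode *b)
{
    const ExtNodeMethods *m;

    if (strcmp(a->extnodename, b->extnodename) != 0)
        return 0;
    m = get_ext_methods(a->extnodename);
    return m != NULL && m->node_equal(a, b);
}

/* Invented provider type carrying planning-time decisions. */
typedef struct {
    ExtNode base;
    int     kernel_id;
    double  selectivity;
} KernelInfo;

static void kernel_copy(ExtNode *newnode, const ExtNode *oldnode)
{
    KernelInfo *n = (KernelInfo *) newnode;
    const KernelInfo *o = (const KernelInfo *) oldnode;
    n->kernel_id = o->kernel_id;
    n->selectivity = o->selectivity;
}
static int kernel_equal(const ExtNode *a, const ExtNode *b)
{
    const KernelInfo *x = (const KernelInfo *) a;
    const KernelInfo *y = (const KernelInfo *) b;
    return x->kernel_id == y->kernel_id && x->selectivity == y->selectivity;
}

const ExtNodeMethods kernel_methods = {
    "KernelInfo", sizeof(KernelInfo), kernel_copy, kernel_equal
};
```

The `node_size` field is what lets generic code allocate the right number of bytes for a struct whose layout it cannot see, and it is the reason the generic header alone would not be enough.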

Position hints (as of 2026-06-05, REL_18 273fe94)

| Symbol | File | Line |
| --- | --- | --- |
| EXTNODENAME_MAX_LEN | src/include/nodes/extensible.h | 24 |
| ExtensibleNodeMethods | src/include/nodes/extensible.h | 62 |
| CUSTOMPATH_SUPPORT_BACKWARD_SCAN | src/include/nodes/extensible.h | 84 |
| CustomPathMethods | src/include/nodes/extensible.h | 92 |
| CustomScanMethods | src/include/nodes/extensible.h | 112 |
| CustomExecMethods | src/include/nodes/extensible.h | 124 |
| RegisterCustomScanMethods (decl) | src/include/nodes/extensible.h | 160 |
| RegisterExtensibleNodeEntry | src/backend/nodes/extensible.c | 39 |
| RegisterExtensibleNodeMethods | src/backend/nodes/extensible.c | 76 |
| RegisterCustomScanMethods | src/backend/nodes/extensible.c | 88 |
| GetExtensibleNodeEntry | src/backend/nodes/extensible.c | 100 |
| GetExtensibleNodeMethods | src/backend/nodes/extensible.c | 125 |
| GetCustomScanMethods | src/backend/nodes/extensible.c | 137 |
| ExecInitCustomScan | src/backend/executor/nodeCustom.c | 26 |
| ExecCustomScan (driver) | src/backend/executor/nodeCustom.c | 114 |
| ExecEndCustomScan | src/backend/executor/nodeCustom.c | 125 |
| ExecReScanCustomScan | src/backend/executor/nodeCustom.c | 132 |
| ExecCustomMarkPos | src/backend/executor/nodeCustom.c | 139 |
| ExecCustomScanEstimate | src/backend/executor/nodeCustom.c | 161 |
| ExecCustomScanInitializeDSM | src/backend/executor/nodeCustom.c | 174 |
| ExecCustomScanInitializeWorker | src/backend/executor/nodeCustom.c | 205 |
| ExecShutdownCustomScan | src/backend/executor/nodeCustom.c | 221 |
| CustomScan (struct) | src/include/nodes/plannodes.h | 864 |
| CustomPath (struct) | src/include/nodes/pathnodes.h | 2038 |
| CustomScanState (struct) | src/include/nodes/execnodes.h | 2125 |
| set_rel_pathlist_hook (invoke) | src/backend/optimizer/path/allpaths.c | 538 |
| set_join_pathlist_hook (invoke) | src/backend/optimizer/path/joinpath.c | 342 |
| create_customscan_plan | src/backend/optimizer/plan/createplan.c | 4269 |
| _copyExtensibleNode | src/backend/nodes/copyfuncs.c | 147 |
| _outExtensibleNode | src/backend/nodes/outfuncs.c | 490 |
| _readExtensibleNode | src/backend/nodes/readfuncs.c | 537 |
| _equalExtensibleNode | src/backend/nodes/equalfuncs.c | 117 |
| EXPLAIN T_CustomScan case | src/backend/commands/explain.c | 2146 |

All claims in this doc were checked against the REL_18_STABLE working tree at /data/hgryoo/references/postgres, commit 273fe94852b3a7e34fd171e8abdf1481beb302fa (PostgreSQL 18.x). Verification notes:

  • Three method structs and their callbacks: CustomPathMethods, CustomScanMethods, and CustomExecMethods were read in full from src/include/nodes/extensible.h. CustomPathMethods has exactly two callbacks (PlanCustomPath, ReparameterizeCustomPathByChild); CustomScanMethods has exactly one (CreateCustomScanState); CustomExecMethods has four required callbacks (BeginCustomScan, ExecCustomScan, EndCustomScan, ReScanCustomScan) and eight optional ones (MarkPosCustomScan, RestrPosCustomScan, the four DSM callbacks, ShutdownCustomScan, ExplainCustomScan). The header comment “All callbacks are mandatory” applies to ExtensibleNodeMethods, not to the optional CustomExecMethods slots, as confirmed by reading both struct comments.

  • Capability flags — exactly three are defined (CUSTOMPATH_SUPPORT_BACKWARD_SCAN = 0x0001, _MARK_RESTORE = 0x0002, _PROJECTION = 0x0004). EXTNODENAME_MAX_LEN is 64. Verified verbatim.

  • The “static vtable, never copied” contract — the comment is present verbatim in the CustomScan struct in src/include/nodes/plannodes.h and is the stated reason the registry exists. Confirmed.

  • Two separate registry hashes: extensible.c declares two file-static HTAB * pointers (extensible_node_methods, custom_scan_methods), both routed through RegisterExtensibleNodeEntry / GetExtensibleNodeEntry. The duplicate-name check raises ERRCODE_DUPLICATE_OBJECT; the missing-name lookup with missing_ok == false raises ERRCODE_UNDEFINED_OBJECT. Verified.

  • Hook invocation points: set_rel_pathlist_hook is invoked in set_rel_pathlist() (allpaths.c) and set_join_pathlist_hook in add_paths_to_joinrel() (joinpath.c). Both are PGDLLIMPORT globals defaulting to NULL. Confirmed by reading both call sites and the declarations in src/include/optimizer/paths.h.

  • Lowering: create_customscan_plan (createplan.c) calls best_path->methods->PlanCustomPath(...), then copy_generic_path_info and the replace_nestloop_params rewrite of qual and custom_exprs. The function is static and reached from create_scan_plan’s T_CustomScan dispatch. Verified.

  • Executor shim — every function quoted from nodeCustom.c (ExecInitCustomScan, the static ExecCustomScan driver, ExecEndCustomScan, ExecReScanCustomScan, ExecCustomMarkPos, the DSM quartet, ExecShutdownCustomScan) was read line-for-line. The INDEX_VAR vs scanrelid targetlist-varno branch, the css->slotOps default to TTSOpsVirtual, and the provider-allocates-state contract are all literal.

  • EXPLAIN — the T_CustomScan case in ExplainNode (explain.c) shows the Filter qual then dispatches to the optional ExplainCustomScan; the node name is methods->CustomName; children in custom_ps recurse through ExplainCustomChildren. Confirmed.

  • Extensible-node serialization: _copyExtensibleNode (copyfuncs.c), _outExtensibleNode (outfuncs.c), _readExtensibleNode (readfuncs.c), and _equalExtensibleNode (equalfuncs.c) each resolve the vtable via GetExtensibleNodeMethods(name, false) and dispatch to the provider callback. Verified.

  • Scope guard (REL_18, no PG19-only claims) — this doc asserts only the CustomScan/extensible-node surface as it exists in REL_18. No PG19-only items (e.g. XLOG2 rmgr, online-checksum BackendTypes) are referenced. contrib/ is out of scope; PG-Strom and similar are named only as external examples of the API’s intended use, not analyzed.
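As a small illustration of two items from the notes above, the sketch below shows how the three capability bits combine in a flags word and how an optional callback slot can be guarded before it is called. The flag names and values are the ones listed above; everything else (`custom_supports`, `ToyExecMethods`, `toy_mark_pos`, `remember_position`) is a hypothetical stand-in, and the guard returns `false` where the real executor shim would raise an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as listed in the verification notes. */
#define CUSTOMPATH_SUPPORT_BACKWARD_SCAN 0x0001
#define CUSTOMPATH_SUPPORT_MARK_RESTORE  0x0002
#define CUSTOMPATH_SUPPORT_PROJECTION    0x0004

/* True if every bit in `capability` is set in `flags`. */
static bool custom_supports(uint32_t flags, uint32_t capability)
{
    return (flags & capability) == capability;
}

/* Hypothetical, cut-down method table with one optional slot. */
typedef struct ToyExecMethods {
    void (*MarkPos)(int *saved, int current);   /* optional: may be NULL */
} ToyExecMethods;

/* Guarded dispatch: refuse to call through an empty optional slot. */
static bool toy_mark_pos(const ToyExecMethods *m, int *saved, int current)
{
    if (m->MarkPos == NULL)
        return false;           /* provider did not implement mark/restore */
    m->MarkPos(saved, current);
    return true;
}

/* Example provider callback: remember the current scan position. */
static void remember_position(int *saved, int current)
{
    *saved = current;
}
```

A provider that advertises `CUSTOMPATH_SUPPORT_MARK_RESTORE` is expected to fill the corresponding slots; the guard exists for the case where the flag and the method table disagree.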

Beyond PostgreSQL — Comparative Designs & Research Frontiers


The FDW sibling, and why both exist. PostgreSQL has two “scan from outside the core” mechanisms: Foreign Data Wrappers (postgres-fdw.md) and CustomScan. They are deliberately parallel in shape — the CustomScan struct’s own header comment says its custom_exprs / custom_private / custom_scan_tlist / custom_relids fields work “equally” to ForeignScan’s fdw_* fields. The difference is scope. An FDW is bound to a foreign table through SQL DDL (CREATE FOREIGN TABLE, a handler returning FdwRoutine) and the planner invokes it only for that table’s RTE; it is the right tool for “this relation lives elsewhere.” A CustomScan is bound to nothing in the catalog — it is injected by a planner hook and can replace an arbitrary scan or join node, can have scanrelid == 0 (representing a join over several base relations), and can carry child plans. The rule of thumb: FDW for a relation with an external home; CustomScan for a new implementation of an operation over ordinary local relations (GPU execution, columnar cache, vectorized join). The canonical external consumer, PG-Strom, uses CustomScan precisely because it reimplements scans/joins/aggregates over normal heap tables on a GPU — there is no “foreign” relation involved.

Cascades and the rule-driven alternative. SQL Server, Greenplum’s ORCA, and CockroachDB build their optimizers on Graefe’s Cascades model, where adding a physical operator means adding an implementation rule that the memo-driven search applies; the new operator competes inside a uniform rule framework rather than through an out-of-band hook. PostgreSQL’s optimizer is not Cascades — it is a bottom-up dynamic-programming join search — so it lacks a rule registry, and the CustomScan hook is the pragmatic substitute: instead of “register a rule,” you “set a hook and call add_path.” The trade-off is honesty by cost (the add_path gate prunes a dominated custom path just like a Cascades cost bound would) versus the expressiveness of true logical-rewrite rules, which CustomScan cannot express — a custom node can only be a physical alternative for a relation the core already identified, not a logical transformation of the query.
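The "set a hook and call add_path" pattern can be modelled in a few lines of standalone C. This is a deliberately reduced sketch, not the planner's code: `ToyPath`, `ToyRel`, `toy_add_path`, `toy_pathlist_hook`, and the provider functions are hypothetical names, and `toy_add_path` compares total cost only, whereas the real `add_path` weighs more properties of a path than a single number. The point it illustrates is that the provider merely proposes a candidate and the cost comparison decides whether it survives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyPath { const char *name; double total_cost; } ToyPath;
typedef struct ToyRel  { const ToyPath *cheapest; } ToyRel;

/* Simplified gate: keep only the candidate with the lowest total cost. */
static void toy_add_path(ToyRel *rel, const ToyPath *path)
{
    if (rel->cheapest == NULL || path->total_cost < rel->cheapest->total_cost)
        rel->cheapest = path;
}

/* Hook global, NULL by default, in the style of the planner hooks. */
typedef void (*toy_pathlist_hook_type)(ToyRel *rel);
static toy_pathlist_hook_type toy_pathlist_hook = NULL;

/* Core side: add the built-in path, then give extensions a turn. */
static void toy_set_rel_pathlist(ToyRel *rel)
{
    static const ToyPath seqscan = { "SeqScan", 100.0 };
    toy_add_path(rel, &seqscan);
    if (toy_pathlist_hook)
        (*toy_pathlist_hook)(rel);
}

/* Provider side: chain to any earlier hook, then propose a custom path. */
static toy_pathlist_hook_type prev_hook = NULL;
static ToyPath custom_path = { "CustomScan", 40.0 };

static void provider_hook(ToyRel *rel)
{
    if (prev_hook)
        prev_hook(rel);
    toy_add_path(rel, &custom_path);
}

/* Call once at load time: save the previous hook and install ours. */
static void provider_init(void)
{
    prev_hook = toy_pathlist_hook;
    toy_pathlist_hook = provider_hook;
}
```

With the custom path costed at 40 it displaces the sequential scan; raise `custom_path.total_cost` above 100 and the same hook runs but its proposal is discarded, which is the "honesty by cost" property described above.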

Codegen and vectorized execution as the modern frontier. The research motivation Graefe gave for extensibility — “some kind of logic we haven’t dreamed up yet,” quoted almost verbatim in the CustomPath header comment — has in practice meant two things since 2014: (1) query compilation (Neumann’s Efficiently Compiling Efficient Query Plans for Modern Hardware, VLDB 2011; the HyPer/Umbra lineage), where operators are JIT-compiled into tight loops rather than interpreted through the iterator dispatch, and (2) vectorized execution (the MonetDB/X100 and Vectorwise line), where operators process column batches instead of one tuple per next(). CustomScan is the seam through which a PostgreSQL extension can smuggle either model into an otherwise tuple-at-a-time interpreted executor: the provider’s ExecCustomScan can run a compiled or vectorized kernel internally and hand back ordinary TupleTableSlots at the boundary, so the surrounding plan tree never knows. The optional slotOps field and the DSM-coordination quartet are exactly what such a provider needs — a custom slot type for a columnar batch, and shared-memory coordination for parallel kernels.
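A minimal sketch of that boundary, assuming nothing about any real provider: the node below filters its input a batch at a time in a tight loop, yet exposes a one-value-per-call `next` function to its parent. `BatchScanState`, `fill_batch`, `batch_scan_init`, and `batch_scan_next` are hypothetical names, plain `int` values stand in for tuples, and a simple greater-than filter stands in for a compiled or vectorized kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 4

typedef struct BatchScanState {
    const int *input;           /* source values (stand-in for a relation) */
    size_t     input_len;
    size_t     input_pos;       /* next unread input position */
    int        batch[BATCH_SIZE];
    size_t     batch_len;       /* number of valid entries in batch[] */
    size_t     batch_pos;       /* next batch entry to hand to the parent */
    int        threshold;       /* filter: keep values > threshold */
} BatchScanState;

static void batch_scan_init(BatchScanState *s, const int *input,
                            size_t input_len, int threshold)
{
    s->input = input;
    s->input_len = input_len;
    s->input_pos = 0;
    s->batch_len = 0;
    s->batch_pos = 0;
    s->threshold = threshold;
}

/* "Kernel": consume input until the batch is full or the input runs out. */
static void fill_batch(BatchScanState *s)
{
    s->batch_len = 0;
    s->batch_pos = 0;
    while (s->input_pos < s->input_len && s->batch_len < BATCH_SIZE)
    {
        int v = s->input[s->input_pos++];
        if (v > s->threshold)
            s->batch[s->batch_len++] = v;
    }
}

/* Iterator boundary: one value per call, false once everything is drained. */
static bool batch_scan_next(BatchScanState *s, int *out)
{
    while (s->batch_pos >= s->batch_len)
    {
        if (s->input_pos >= s->input_len)
            return false;       /* no buffered values and no input left */
        fill_batch(s);          /* may yield an empty batch; loop again */
    }
    *out = s->batch[s->batch_pos++];
    return true;
}
```

The parent only ever calls `batch_scan_next`, so it cannot tell whether the values were produced one at a time or in batches; the cost of that opacity is the per-value handoff at the boundary.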

Limits and frontier friction. The boundary cost of CustomScan is real: every tuple still crosses the TupleTableSlot interface at the node’s edges, so a vectorized provider pays a re-tuplification tax whenever its parent is a stock operator. Projects that want end-to-end vectorization (e.g. column-store extensions) end up wanting several adjacent custom nodes so batches stay in columnar form across operator boundaries — which CustomScan permits (via custom_plans children) but does not make ergonomic. The other friction point is that the planner only offers a custom path where its hook fires; there is no way for a provider to introduce a custom node at a plan position the core never considered (e.g. a novel two-phase aggregate shape) without also influencing upstream path generation. These are the open edges where PostgreSQL’s hook-plus-cost extensibility model is visibly less general than a full Cascades rule engine — the recurring theme of POSTGRES-lineage extensibility: maximal reach into the type and access-method layers, more constrained reach into the optimizer search itself.

  • PostgreSQL source, REL_18_STABLE @ 273fe94 (/data/hgryoo/references/postgres):
    • src/backend/executor/nodeCustom.c: the executor shim (init/exec/end/rescan, mark-restore guard, DSM quartet, shutdown).
    • src/backend/nodes/extensible.c: the name-keyed vtable registry (register/get for both custom-scan and extensible-node methods).
    • src/include/nodes/extensible.h: the three method structs, ExtensibleNodeMethods, capability flags, EXTNODENAME_MAX_LEN.
    • src/include/nodes/plannodes.h: CustomScan plan node and the “static vtable, never copied” contract.
    • src/include/nodes/pathnodes.h: CustomPath.
    • src/include/nodes/execnodes.h: CustomScanState.
    • src/backend/optimizer/plan/createplan.c: create_customscan_plan (lowering).
    • src/backend/optimizer/path/allpaths.c, .../path/joinpath.c: set_rel_pathlist_hook / set_join_pathlist_hook invocation.
    • src/include/optimizer/paths.h: hook type declarations.
    • src/backend/commands/explain.c: the T_CustomScan EXPLAIN case and ExplainCustomChildren.
    • src/backend/nodes/copyfuncs.c, outfuncs.c, readfuncs.c, equalfuncs.c: extensible-node copy/out/read/equal.
  • Theory anchors (see dbms-papers/ and research/dbms-general/):
    • Graefe, Volcano — An Extensible and Parallel Query Evaluation System (IEEE TKDE, 1994) — iterator model, path-vs-plan separation, extensibility.
    • Graefe, The Cascades Framework for Query Optimization (1995) — rule-driven physical-operator implementation, the comparative frame in §6.
    • Hellerstein, Stonebraker & Hamilton, Architecture of a Database System (2007) — extensibility as a long-lived-DBMS requirement.
    • Stonebraker & Rowe, The Design of POSTGRES (1986) — the extensibility philosophy CustomScan inherits.
    • Neumann, Efficiently Compiling Efficient Query Plans for Modern Hardware (VLDB 2011) — codegen frontier referenced in §6.
  • Sibling docs (cross-reference, not duplicated here): postgres-fdw.md (the FDW mechanism), postgres-executor.md (the surrounding executor/ScanState framework), postgres-planner-overview.md and postgres-path-generation.md (path generation and add_path), postgres-hooks.md (the general planner-hook mechanism), postgres-node-trees.md (copy/serialize infrastructure), postgres-parallel-query.md (the DSM/parallel-worker machinery the DSM quartet plugs into).