PostgreSQL JIT — LLVM Compilation of Expressions and Tuple Deforming
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A relational engine spends a surprising fraction of its CPU budget not on
“doing the query” but on being general. Evaluating the predicate
WHERE a.col = 3 does not require many machine instructions in principle —
load a column, compare to a constant, branch — yet a generic interpreter
needs several hundred cycles to do it, because it must work for any
expression tree, over any table shape, with any extension’s operators
installed. That generality is paid for in indirect calls (dispatch to an
operator implementation looked up by OID), unpredictable branches
(per-tuple switch on the next interpreter step), and memory traffic
(carrying Datum/isnull pairs through generic structures). For
short-lived OLTP statements this overhead is irrelevant; for analytic
queries that grind millions of tuples through the same expression, it is
the dominant cost.
Just-in-Time (JIT) compilation attacks this overhead by generating, at
query execution time, a native function specialized to the exact
expression and table at hand. The PostgreSQL jit/README frames it as
“turning some form of interpreted program evaluation into a native program,
and doing so at runtime.” It is JIT rather than ahead-of-time (AOT)
compilation precisely because the specialization inputs — the parsed
expression, the concrete TupleDesc, the resolved operator OIDs — are not
known until run time. Three properties define the design space:
- What to specialize on. A JIT compiler removes overhead only when it can bake in runtime-known facts. For expression evaluation, the structure of the expression tree and the addresses of the operator functions are constants once the query plan exists, so indirect dispatch collapses into direct (and inlinable) calls. For tuple deforming, the number of columns, their fixed widths, their NOT-NULL-ness, and their alignment are properties of the TupleDesc, so the generic “loop over attributes consulting the descriptor” can be unrolled into a straight-line sequence with most branches resolved at compile time.
- When to pay for compilation. Compiling is not free: building IR, optimizing it, and emitting machine code costs milliseconds, an eternity next to a cheap query. JIT pays off only when the compiled code runs enough times to amortize that cost. The decision therefore needs a cost model: compile only when the estimated work is large enough.
- How interpreter and compiler stay in sync. A JIT for a hand-written interpreter is really a second implementation of the same semantics. If the interpreter gains an operator the compiler doesn’t know, results diverge. The maintainable designs keep the two implementations structurally parallel (one case per opcode in each) and share the underlying helper functions and type definitions rather than duplicating them.
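The first property can be made concrete with a toy model. The sketch below is not PostgreSQL code: the ToyOp opcodes, the ToyStep struct, and both functions are hypothetical names invented for illustration. It contrasts a generic step interpreter, which dispatches on every step, with the straight-line function a specializer could emit once it knows the expression is "column 0 equals 3".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy step machine; not PostgreSQL's ExprEvalOp. */
typedef enum { OP_VAR, OP_CONST, OP_EQ, OP_DONE } ToyOp;

typedef struct {
    ToyOp   op;
    int64_t arg;    /* column index for OP_VAR, literal for OP_CONST */
} ToyStep;

/* Generic path: one switch dispatch per step, works for any step array. */
static bool
toy_interp(const ToyStep *steps, const int64_t *row)
{
    int64_t stack[8];
    size_t  sp = 0;

    for (const ToyStep *s = steps;; s++)
    {
        switch (s->op)
        {
            case OP_VAR:
                assert(sp < 8);
                stack[sp++] = row[s->arg];
                break;
            case OP_CONST:
                assert(sp < 8);
                stack[sp++] = s->arg;
                break;
            case OP_EQ:
                assert(sp >= 2);
                sp--;
                stack[sp - 1] = (stack[sp - 1] == stack[sp]);
                break;
            case OP_DONE:
                assert(sp == 1);
                return stack[0] != 0;
            default:
                assert(0 && "unknown opcode");
                return false;
        }
    }
}

/* What a specializer could emit for "col0 = 3": no loop, no dispatch. */
static bool
toy_specialized_col0_eq_3(const int64_t *row)
{
    return row[0] == 3;
}
```

Running the step array {OP_VAR 0, OP_CONST 3, OP_EQ, OP_DONE} through toy_interp gives the same answer as toy_specialized_col0_eq_3 for any row. The second function is only valid for that one expression, which is why it cannot be written ahead of time.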
Database System Concepts (Silberschatz et al., query-processing chapters) treats expression and predicate evaluation as the inner loop of the iterator model, and notes that “compiled” evaluation has historically been an alternative to interpretation going back to System R’s access-module generation. The modern lineage — Krikellas et al.’s holistic query compilation and Neumann’s HyPer “produce/consume” model — pushes the idea further, compiling whole pipelines rather than single expressions. PostgreSQL occupies a deliberately conservative point in that space: it JIT-compiles two well-isolated hot spots (expression evaluation and tuple deforming) while leaving the surrounding executor an interpreter, because those two are “commonly major CPU bottlenecks in analytics queries” yet have clean, narrow interfaces to the rest of the engine.
Common DBMS Design
This section names the engineering patterns engines share when they bolt a JIT onto an existing interpreted executor, so PostgreSQL’s specific choices read as selections within a shared space.
Provider abstraction behind a thin wrapper
A JIT backend is a heavyweight dependency (LLVM is tens of megabytes of
C++) that not every deployment wants linked into the core server binary.
The common answer is an indirection layer: the executor calls a small,
dependency-free wrapper (jit_compile_expr), which forwards to a provider
implementing a fixed callback interface, loaded as a separate shared object
only when JIT is actually requested. This keeps the main binary free of the
compiler dependency, lets the OS package the compiler separately, and — as a
bonus — makes the provider swappable (one could write a non-LLVM provider
behind the same interface).
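A minimal sketch of that indirection layer, assuming a loader callback stands in for the real shared-object load. Every name here (ToyProviderCallbacks, ToyLoader, toy_provider_init, and the two sample loaders) is hypothetical. The point it shows is that the wrapper never touches compiler symbols directly and that a failed load is remembered, so a missing provider costs one attempt rather than one per expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback table, standing in for a provider interface. */
typedef struct {
    bool (*compile_expr)(void *expr_state);
} ToyProviderCallbacks;

/* A loader fills in the table and reports whether it succeeded. */
typedef bool (*ToyLoader)(ToyProviderCallbacks *cb);

static ToyProviderCallbacks toy_provider;
static bool toy_loaded = false;
static bool toy_failed = false;
static int  toy_load_attempts = 0;

/* Lazy, one-shot initialization: both outcomes are cached. */
static bool
toy_provider_init(ToyLoader loader)
{
    if (toy_failed)
        return false;           /* never retry a known failure */
    if (toy_loaded)
        return true;

    toy_load_attempts++;
    if (loader == NULL || !loader(&toy_provider))
    {
        toy_failed = true;
        return false;
    }
    assert(toy_provider.compile_expr != NULL);
    toy_loaded = true;
    return true;
}

/* Dependency-free wrapper: false means "keep using the interpreter". */
static bool
toy_compile_expr(ToyLoader loader, void *expr_state)
{
    if (toy_provider_init(loader))
        return toy_provider.compile_expr(expr_state);
    return false;
}

/* Sample loaders for the usage note below. */
static bool toy_always_compiles(void *expr_state) { (void) expr_state; return true; }
static bool toy_ok_loader(ToyProviderCallbacks *cb) { cb->compile_expr = toy_always_compiles; return true; }
static bool toy_missing_loader(ToyProviderCallbacks *cb) { (void) cb; return false; }
```

Calling toy_compile_expr(toy_missing_loader, NULL) returns false and records the failure. A later call with toy_ok_loader still returns false without a second load attempt, which is the caching behavior the sketch is meant to illustrate.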
Mirroring the interpreter, op for op
The compiler and the interpreter must agree on semantics. The least
error-prone way to guarantee that is to give the code generator the same
shape as the interpreter: a switch over the same opcode enum, with one
arm per step, each arm emitting IR that does exactly what the interpreter’s
corresponding arm does. Steps that are rare or complex are not re-emitted
inline at all — the generated code simply calls back into the existing
interpreter helper (ExecEval*), so only the hot, common steps get bespoke
IR while the long tail reuses the C implementation.
Specializing tuple access from the descriptor
“Deforming” (turning a packed on-disk tuple into an array of Datum plus
isnull flags) is branch-heavy in the generic case: every column consults
the descriptor for length, alignment, and null-bitmap position. Because the
descriptor is fixed for a given scan, a JIT can generate a
descriptor-specific deform routine in which fixed-width columns become
constant pointer increments, NOT-NULL columns skip the null-bitmap check,
and known alignment eliminates the per-column align computation. This is the
“program that isn’t a program” case: deforming is not an expression, but it
benefits enormously from compile-time knowledge.
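To illustrate, here is a simplified sketch restricted to fixed-width NOT NULL columns, so there is no null bitmap and no varlena handling. ToyAttr and both functions are hypothetical names, not PostgreSQL structures. The generic routine recomputes alignment and width from the descriptor for every column; the specialized routine for the layout (int32, int64, int32) has the offsets 0, 8 and 16 baked in as constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified column descriptor (fixed-width, NOT NULL only). */
typedef struct {
    size_t len;     /* width in bytes: 4 or 8 */
    size_t align;   /* required alignment, a power of two */
} ToyAttr;

/* Generic deform: consults the descriptor for every column. */
static void
toy_deform_generic(const ToyAttr *atts, size_t natts,
                   const unsigned char *data, int64_t *values)
{
    size_t off = 0;

    for (size_t i = 0; i < natts; i++)
    {
        /* round the offset up to this column's alignment */
        off = (off + atts[i].align - 1) & ~(atts[i].align - 1);

        if (atts[i].len == 4)
        {
            int32_t v;

            memcpy(&v, data + off, sizeof(v));
            values[i] = v;      /* sign-extends into the 64-bit slot */
        }
        else
        {
            assert(atts[i].len == 8);
            memcpy(&values[i], data + off, sizeof(int64_t));
        }
        off += atts[i].len;
    }
}

/* Specialized for (int32, int64, int32): offsets 0, 8, 16 are constants. */
static void
toy_deform_i32_i64_i32(const unsigned char *data, int64_t *values)
{
    int32_t a;
    int32_t c;

    memcpy(&a, data + 0, sizeof(a));
    memcpy(&values[1], data + 8, sizeof(int64_t));
    memcpy(&c, data + 16, sizeof(c));
    values[0] = a;
    values[2] = c;
}
```

Both routines produce the same values array for a buffer laid out with those three columns. The specialized one has no loop, no alignment arithmetic and no width test, which is the kind of saving a descriptor-specific deform function is after.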
Inlining the operator bodies
The single biggest expression-evaluation win is removing the call into the
SQL operator implementation (int4eq, float8pl, …) and instead inlining
its body, exposing it to the optimizer for constant folding and dead-branch
elimination. Maintaining a second copy of every operator just for inlining
would be untenable, so the standard trick is to compile the engine’s own C
operator sources to compiler IR bitcode at build time, ship that bitcode
alongside the binary, and have the JIT pull operator definitions out of it
on demand.
Cost-gated, lazily-emitted compilation
Because compilation is expensive, engines gate it on a cost estimate: the planner already computes a total plan cost, so a threshold on that cost is a cheap, already-available trigger. And because emitting one function at a time wastes per-emission overhead, the mature design defers emission: functions are defined into a module during plan initialization but machine code is generated lazily, the first time any function in the module is actually called, so a whole query’s worth of functions emit together.
flowchart TD
subgraph plan["Planner — standard_planner"]
cost["top_plan->total_cost"] --> gate{"cost ><br/>jit_above_cost?"}
gate -- no --> none["jitFlags = PGJIT_NONE<br/>(pure interpreter)"]
gate -- yes --> flags["set PGJIT_PERFORM<br/>+ EXPR / DEFORM<br/>+ OPT3 / INLINE<br/>by further thresholds"]
end
subgraph exec["Executor — ExecInitNode time"]
flags --> compile["jit_compile_expr(state)"]
compile --> provider["load llvmjit.so<br/>via provider_init()"]
provider --> emitIR["emit LLVM IR<br/>for each ExprState"]
end
subgraph run["First evaluation"]
emitIR --> lazy["ExecRunCompiledExpr<br/>-> llvm_get_function"]
lazy --> machine["LLVM emits machine code<br/>for the whole module"]
machine --> fast["subsequent calls run<br/>native function directly"]
end
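The "first evaluation" stage of the chart relies on a self-replacing thunk. The following sketch models only that control flow: ToyExpr, toy_emit and toy_native_add are hypothetical stand-ins, and "emission" is simulated by returning an ordinary C function. The thunk is installed as the evaluation function at initialization; on its first call it pays the emission cost, overwrites the function pointer with the native function, and forwards the call.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyExpr ToyExpr;
typedef int (*ToyEvalFn)(ToyExpr *expr, int input);

struct ToyExpr {
    ToyEvalFn evalfunc;     /* what callers invoke */
    int       addend;       /* stands in for the specialization input */
};

static int toy_emit_count = 0;  /* counts simulated code emissions */

/* Stand-in for the emitted native function. */
static int
toy_native_add(ToyExpr *expr, int input)
{
    return input + expr->addend;
}

/* Stand-in for the expensive step (optimize and emit machine code). */
static ToyEvalFn
toy_emit(ToyExpr *expr)
{
    (void) expr;
    toy_emit_count++;
    return toy_native_add;
}

/* Thunk installed at init time: emits on first call, then patches itself out. */
static int
toy_run_compiled(ToyExpr *expr, int input)
{
    ToyEvalFn fn = toy_emit(expr);

    assert(fn != NULL);
    expr->evalfunc = fn;    /* later calls bypass the thunk */
    return fn(expr, input);
}
```

With ToyExpr e = { toy_run_compiled, 10 }, the first e.evalfunc(&e, 1) triggers one emission and returns 11; every later call goes straight to toy_native_add. An expression that is initialized but never evaluated never pays the emission cost.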
PostgreSQL’s Approach
PostgreSQL’s JIT is structured as exactly the layered design above: a
provider-independent core in src/backend/jit/jit.c that is compiled into
every server, and an LLVM-specific provider in src/backend/jit/llvm/ that
lives in a separately loadable llvmjit shared library. The README is
explicit that “code intending to perform JIT … calls an LLVM independent
wrapper located in jit.c,” and that the wrapper “is allowed to fail in case
no JIT provider can be loaded.” That failure-tolerance is the whole point:
a build without LLVM, or a deployment that hasn’t installed the LLVM
package, simply runs the interpreter.
The provider interface and lazy library load
The provider contract is three function pointers, populated by the provider’s init entry point and stored in a single static struct:

    // _PG_jit_provider_init (src/backend/jit/llvm/llvmjit.c)
    void
    _PG_jit_provider_init(JitProviderCallbacks *cb)
    {
        cb->reset_after_error = llvm_reset_after_error;
        cb->release_context = llvm_release_context;
        cb->compile_expr = llvm_compile_expr;
    }

The core never references LLVM symbols directly; it only calls through these
pointers. Loading is lazy and one-shot: provider_init() probes for the
shared library on disk, and — crucially — caches both success and failure so
a missing provider is not retried on every expression:

    // provider_init (src/backend/jit/jit.c)
    if (!jit_enabled)
        return false;
    if (provider_failed_loading)        // never retry a known failure
        return false;
    if (provider_successfully_loaded)
        return true;

    snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path, jit_provider, DLSUFFIX);
    if (!pg_file_exists(path))          // probe before dlopen, which would ERROR
    {
        provider_failed_loading = true;
        return false;
    }

    provider_failed_loading = true;     // assume failure until init() returns
    init = (JitProviderInit)
        load_external_function(path, "_PG_jit_provider_init", true, NULL);
    init(&provider);

    provider_successfully_loaded = true;
    provider_failed_loading = false;

The jit_provider GUC (default "llvmjit") names the library, so the
abstraction is genuinely pluggable — a different provider shared object
behind the same _PG_jit_provider_init symbol would slot in unchanged.
The entry gate: jit_compile_expr
Every JIT compilation request funnels through jit_compile_expr in the
core. It is the single place that consults the per-query flags and decides
whether to hand off to the provider:

    // jit_compile_expr (src/backend/jit/jit.c)
    if (!state->parent)                 // need an EState lifetime
        return false;
    if (!(state->parent->state->es_jit_flags & PGJIT_PERFORM))
        return false;
    if (!(state->parent->state->es_jit_flags & PGJIT_EXPR))
        return false;
    if (provider_init())                // also checks !jit_enabled
        return provider.compile_expr(state);
    return false;

Note the first guard: an ExprState with no parent PlanState has no
EState to anchor a JIT context’s lifetime to, so it is never compiled —
those one-off expressions would otherwise leak compiled functions until
end-of-transaction. When all guards pass, provider.compile_expr (i.e.
llvm_compile_expr) takes over.
Per-opcode IR generation mirrors the interpreter
llvm_compile_expr is, structurally, a clone of execExprInterp.c’s
ExecInterpExpr: it walks the linearized ExprState->steps[] array and, in
a switch (opcode), emits IR for each step. It first creates an LLVM
function with the same C signature as the interpreter (pulled from
llvmjit_types.c so the types stay in sync), then pre-creates one basic
block per step so that steps can branch to each other:

    // llvm_compile_expr (src/backend/jit/llvm/llvmjit_expr.c)
    eval_fn = LLVMAddFunction(mod, funcname,
                              llvm_pg_var_func_type("ExecInterpExprStillValid"));
    LLVMSetLinkage(eval_fn, LLVMExternalLinkage);
    llvm_copy_attributes(AttributeTemplate, eval_fn);

    /* ... load v_state, v_econtext, slot value/null arrays ... */

    opblocks = palloc(sizeof(LLVMBasicBlockRef) * state->steps_len);
    for (int opno = 0; opno < state->steps_len; opno++)
        opblocks[opno] = l_bb_append_v(eval_fn, "b.op.%d.start", opno);

    LLVMBuildBr(b, opblocks[0]);        // jump into first step

    for (int opno = 0; opno < state->steps_len; opno++)
    {
        op = &state->steps[opno];
        opcode = ExecEvalStepOp(state, op);
        LLVMPositionBuilderAtEnd(b, opblocks[opno]);

        switch (opcode)
        {
            /* one arm per ExprEvalOp ... */
        }
    }

A hot step like reading an already-deformed column variable is emitted as
direct loads from the slot’s tts_values/tts_isnull arrays — no function
call at all:

    // EEOP_*_VAR (src/backend/jit/llvm/llvmjit_expr.c)
    v_attnum = l_int32_const(lc, op->d.var.attnum);
    value = l_load_gep1(b, TypeSizeT, v_values, v_attnum, "");
    isnull = l_load_gep1(b, TypeStorageBool, v_nulls, v_attnum, "");
    LLVMBuildStore(b, value, v_resvaluep);
    LLVMBuildStore(b, isnull, v_resnullp);
    LLVMBuildBr(b, opblocks[opno + 1]);

A function-call step (EEOP_FUNCEXPR_STRICT) emits an explicit null-check
chain over the arguments — one basic block per argument — falling through to
the real call only when all are non-null, exactly replicating the strict
semantics the interpreter implements with a loop:

    // EEOP_FUNCEXPR_STRICT (src/backend/jit/llvm/llvmjit_expr.c)
    for (int argno = 0; argno < op->d.func.nargs; argno++)
    {
        LLVMPositionBuilderAtEnd(b, b_checkargnulls[argno]);
        b_argnotnull = (argno + 1 == op->d.func.nargs)
            ? b_nonull
            : b_checkargnulls[argno + 1];
        v_argisnull = l_funcnull(b, v_fcinfo, argno);   // load fcinfo->args[i].isnull
        LLVMBuildCondBr(b,
                        LLVMBuildICmp(b, LLVMIntEQ, v_argisnull,
                                      l_sbool_const(1), ""),
                        opblocks[opno + 1],             // any null -> skip call
                        b_argnotnull);
    }
    LLVMPositionBuilderAtEnd(b, b_nonull);
    v_retval = BuildV1Call(context, b, mod, fcinfo, &v_fcinfo_isnull);

Steps that are rare or unwieldy are not reimplemented in IR. Instead the
generated code calls the interpreter’s own helper through the
build_EvalXFunc macro, which assembles a direct call to e.g.
ExecEvalParamExtern. This is the “mirror, but reuse for the long tail”
pattern made concrete:

    // EEOP_PARAM_EXTERN (src/backend/jit/llvm/llvmjit_expr.c)
    case EEOP_PARAM_EXTERN:
        build_EvalXFunc(b, mod, "ExecEvalParamExtern",
                        v_state, op, v_econtext);
        LLVMBuildBr(b, opblocks[opno + 1]);
        break;

Direct, inlinable operator calls via BuildV1Call
The mechanism that turns an indirect operator dispatch into a direct call
— the prerequisite for later inlining — is BuildV1Call together with
llvm_function_reference. For an operator whose symbol is known
(fmgr_symbol resolves it), the call is emitted as a direct reference to a
named function (int4eq, pgextern.<module>.<fn>, …); only for opaque
function pointers does it fall back to loading a constant pointer. The
result is IR in which the operator is a named callee that the inliner can
later replace with the operator’s body:

    // BuildV1Call (src/backend/jit/llvm/llvmjit_expr.c)
    v_fn = llvm_function_reference(context, b, mod, fcinfo);
    v_fcinfo = l_ptr_const(fcinfo, l_ptr(StructFunctionCallInfoData));
    v_fcinfo_isnullp = l_struct_gep(b, StructFunctionCallInfoData, v_fcinfo,
                                    FIELDNO_FUNCTIONCALLINFODATA_ISNULL, "");
    LLVMBuildStore(b, l_sbool_const(0), v_fcinfo_isnullp);
    v_retval = l_call(b, LLVMGetFunctionType(AttributeTemplate),
                      v_fn, &v_fcinfo, 1, "funccall");

The companion lifetime-end annotation it emits (on LLVM < 22) tells the optimizer the argument memory need not be preserved across the call, improving the odds the inliner can drop redundant stores.
A predicate step (EEOP_QUAL, the WHERE-clause short-circuit) shows how
control flow that the interpreter expresses with C if/goto becomes
explicit basic-block branching in IR. A null or false result jumps to the
qualfail block, which normalizes the result to a non-null false and jumps to
the qual’s jumpdone target; a true result simply falls through:

    // EEOP_QUAL (src/backend/jit/llvm/llvmjit_expr.c)
    v_resvalue = l_load(b, TypeSizeT, v_resvaluep, "");
    v_resnull = l_load(b, TypeStorageBool, v_resnullp, "");
    v_nullorfalse = LLVMBuildOr(b,
        LLVMBuildICmp(b, LLVMIntEQ, v_resnull, l_sbool_const(1), ""),
        LLVMBuildICmp(b, LLVMIntEQ, v_resvalue, l_sizet_const(0), ""),
        "");
    LLVMBuildCondBr(b, v_nullorfalse, b_qualfail, opblocks[opno + 1]);

    LLVMPositionBuilderAtEnd(b, b_qualfail);
    LLVMBuildStore(b, l_sbool_const(0), v_resnullp);    /* result not null */
    LLVMBuildStore(b, l_sizet_const(0), v_resvaluep);   /* result is false */
    LLVMBuildBr(b, opblocks[op->d.qualexpr.jumpdone]);  /* short-circuit out */

The simplest arm, EEOP_CONST, makes the “bake in runtime-known facts”
principle literal: the constant’s Datum and null flag — fixed once the
plan exists — are emitted as IR constants, so the optimizer can later
constant-fold any operator that consumes them:

    // EEOP_CONST (src/backend/jit/llvm/llvmjit_expr.c)
    v_constvalue = l_sizet_const(op->d.constval.value);
    v_constnull = l_sbool_const(op->d.constval.isnull);
    LLVMBuildStore(b, v_constvalue, v_resvaluep);
    LLVMBuildStore(b, v_constnull, v_resnullp);

Tuple deforming specialized to the TupleDesc
The deforming JIT is the clearest demonstration of “specialize on
runtime-known facts.” slot_compile_deform builds a function — taking just
a TupleTableSlot * — that deforms one specific tuple shape up to natts
columns. It declines to generate anything for slot kinds it can’t handle
(virtual slots never need deforming; only heap/buffer-heap/minimal slots are
supported):

    // slot_compile_deform (src/backend/jit/llvm/llvmjit_deform.c)
    if (ops == &TTSOpsVirtual)
        return NULL;
    if (ops != &TTSOpsHeapTuple && ops != &TTSOpsBufferHeapTuple &&
        ops != &TTSOpsMinimalTuple)
        return NULL;

Before emitting anything, it analyzes the descriptor to learn two
compile-time facts per column: the last guaranteed-present column (every
column up to and including the last NOT NULL, non-missing, non-dropped one
can be read without checking the tuple’s natts), and the known alignment so far. A
fixed-width NOT NULL column lets the next column’s offset be a compile-time
constant; the moment a variable-length or nullable column appears, alignment
becomes unknown again:

    // slot_compile_deform (src/backend/jit/llvm/llvmjit_deform.c)
    if (att->attnullability == ATTNULLABLE_VALID &&
        !att->atthasmissing &&
        !att->attisdropped)
        guaranteed_column_number = attnum;  // can skip natts check up to here

    /* ... */

    if (att->attlen < 0)
    {
        // varlena: alignment now unknown
        known_alignment = -1;
        attguaranteedalign = false;
    }
    else if (att->attnullability == ATTNULLABLE_VALID &&
             attguaranteedalign && known_alignment >= 0)
    {
        known_alignment += att->attlen;     // offset stays a constant
    }

The store loop then emits, per column, code whose shape depends on those
facts. For a by-value type the on-disk bytes are loaded at the exact
integer width and sign-extended into a Datum; for a by-reference type a
pointer-to-data is stored; the data pointer is advanced by a constant for
fixed-width columns and by a call to varsize_any/strlen only for varlena
and cstring:

    // slot_compile_deform store loop (src/backend/jit/llvm/llvmjit_deform.c)
    if (att->attbyval)
    {
        LLVMTypeRef vartype = LLVMIntTypeInContext(lc, att->attlen * 8);

        v_tmp_loaddata = l_load(b, vartype,
            LLVMBuildPointerCast(b, v_attdatap, LLVMPointerType(vartype, 0), ""),
            "");
        v_tmp_loaddata = LLVMBuildSExt(b, v_tmp_loaddata, TypeSizeT, "");
        LLVMBuildStore(b, v_tmp_loaddata, v_resultp);
    }
    else
    {
        LLVMBuildStore(b,
            LLVMBuildPtrToInt(b, v_attdatap, TypeSizeT, "attr_ptr"),
            v_resultp);
    }

    if (att->attlen > 0)
        v_incby = l_sizet_const(att->attlen);   // constant stride
    else if (att->attlen == -1)
    {
        v_incby = l_call(b, llvm_pg_var_func_type("varsize_any"),
                         llvm_pg_func(mod, "varsize_any"),
                         &v_attdatap, 1, "");
        l_callsite_alwaysinline(v_incby);       // mark varsize_any for inlining
    }

The deform function is not produced eagerly for every scan. It is built
on demand when llvm_compile_expr emits an EEOP_*_FETCHSOME step for the compiled
expression, and only when PGJIT_DEFORM is set. At run time, the fetch step first checks
whether the slot already has enough attributes deformed (tts_nvalid >= last_var) and branches to the deform call only if not:

    // EEOP_*_FETCHSOME (src/backend/jit/llvm/llvmjit_expr.c)
    v_nvalid = l_load_struct_gep(b, StructTupleTableSlot, v_slot,
                                 FIELDNO_TUPLETABLESLOT_NVALID, "");
    LLVMBuildCondBr(b,
                    LLVMBuildICmp(b, LLVMIntUGE, v_nvalid,
                                  l_int16_const(lc, op->d.fetch.last_var), ""),
                    opblocks[opno + 1],     // already deformed -> skip
                    b_fetch);

    /* ... in b_fetch: */
    if (tts_ops && desc && (context->base.flags & PGJIT_DEFORM))
        l_jit_deform = slot_compile_deform(context, desc, tts_ops,
                                           op->d.fetch.last_var);

    if (l_jit_deform)                       // call the specialized fn
        l_call(b, LLVMGetFunctionType(l_jit_deform), l_jit_deform,
               params, 1, "");
    else                                    // fall back to interpreter
        l_call(b, llvm_pg_var_func_type("slot_getsomeattrs_int"),
               llvm_pg_func(mod, "slot_getsomeattrs_int"),
               params, 2, "");

Inlining operator bodies from bitcode
When PGJIT_INLINE is set, llvm_compile_module runs llvm_inline over the
module before optimization. The inliner does not maintain a second copy of
each operator: at build time the engine’s C sources are compiled to LLVM
bitcode and installed under $pkglibdir/bitcode/postgres/, with a summary
index. llvm_inline builds an import plan — which external function
references in the module are small enough and available in bitcode — then
pulls those definitions in:

    // llvm_inline (src/backend/jit/llvm/llvmjit_inline.cpp)
    void
    llvm_inline(LLVMModuleRef M)
    {
        llvm::Module *mod = llvm::unwrap(M);

        /* ... lc is the module's LLVM context ... */
        std::unique_ptr<ImportMapTy> globalsToInline =
            llvm_build_inline_plan(lc, mod);
        if (!globalsToInline)
            return;
        llvm_execute_inline_plan(mod, globalsToInline.get());
    }

Because the direct calls emitted by BuildV1Call reference operators by
their real symbol names, the inliner can match them against the bitcode and
splice in the bodies — after which constant folding and dead-branch removal
can fire on the now-visible operator logic. The README notes this is the
“one big advantage” of JITing: collapsing PostgreSQL’s extensible
function/operator dispatch.
Optimization levels, lazy emission, and cost gating
llvm_optimize_module picks a pass pipeline from the context flags. Without
PGJIT_OPT3 it runs a cheap default<O0>,mem2reg (plus an inline pass if
PGJIT_INLINE is set); with PGJIT_OPT3 it runs a full default<O3> with
an inliner threshold of 512:

    // llvm_optimize_module (LLVM >= 17) (src/backend/jit/llvm/llvmjit.c)
    if (context->base.flags & PGJIT_OPT3)
        passes = "default<O3>";
    else if (context->base.flags & PGJIT_INLINE)
        passes = "default<O0>,mem2reg,inline";
    else
        passes = "default<O0>,mem2reg";

    LLVMPassBuilderOptionsSetInlinerThreshold(options, 512);
    err = LLVMRunPasses(module, passes, NULL, options);

Crucially, IR is defined during ExecInitNode but machine code is emitted
lazily. llvm_compile_expr installs ExecRunCompiledExpr as the
expression’s eval function; the first actual evaluation calls
llvm_get_function, which triggers llvm_compile_module (inline + optimize
+ hand the module to ORC) and then resolves the symbol. ORC itself only materializes code the first time a symbol is looked up:

    // ExecRunCompiledExpr (src/backend/jit/llvm/llvmjit_expr.c)
    CheckExprStillValid(state, econtext);

    llvm_enter_fatal_on_oom();
    func = (ExprStateEvalFunc) llvm_get_function(cstate->context,
                                                 cstate->funcname);
    llvm_leave_fatal_on_oom();

    state->evalfunc = func;     // remove the indirection for future calls
    return func(state, econtext, isNull);

Whether any of this happens at all is decided once, in the planner. After
producing the final plan, standard_planner compares the top plan’s cost to
the JIT thresholds and sets the per-query jitFlags:

    // standard_planner (jitFlags) (src/backend/optimizer/plan/planner.c)
    result->jitFlags = PGJIT_NONE;
    if (jit_enabled && jit_above_cost >= 0 &&
        top_plan->total_cost > jit_above_cost)
    {
        result->jitFlags |= PGJIT_PERFORM;

        if (jit_optimize_above_cost >= 0 &&
            top_plan->total_cost > jit_optimize_above_cost)
            result->jitFlags |= PGJIT_OPT3;
        if (jit_inline_above_cost >= 0 &&
            top_plan->total_cost > jit_inline_above_cost)
            result->jitFlags |= PGJIT_INLINE;

        if (jit_expressions)
            result->jitFlags |= PGJIT_EXPR;
        if (jit_tuple_deforming)
            result->jitFlags |= PGJIT_DEFORM;
    }

The defaults make the tiering legible: jit_above_cost = 100000 turns JIT
on; jit_inline_above_cost = jit_optimize_above_cost = 500000 add inlining
and full optimization only for substantially more expensive plans. A
negative threshold disables that tier. This is the cost model from the
Theoretical Background made concrete — reusing the planner’s existing cost
estimate rather than instrumenting evaluation counts.
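As a worked example of the tiering, the sketch below re-implements the threshold logic in standalone C and applies it to a few plan costs. It is an illustration, not the planner's code: the TOY_JIT_* bit values are arbitrary, the ToyJitSettings struct is hypothetical, and the example assumes JIT, expression compilation and tuple deforming are all switched on. Only the three threshold values come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits for this sketch; values are not copied from jit.h. */
enum {
    TOY_JIT_NONE    = 0,
    TOY_JIT_PERFORM = 1 << 0,
    TOY_JIT_OPT3    = 1 << 1,
    TOY_JIT_INLINE  = 1 << 2,
    TOY_JIT_EXPR    = 1 << 3,
    TOY_JIT_DEFORM  = 1 << 4
};

/* Hypothetical bundle of the settings the gate reads. */
typedef struct {
    bool   enabled;             /* assumed on for this example */
    double above_cost;
    double optimize_above_cost;
    double inline_above_cost;
    bool   expressions;         /* assumed on for this example */
    bool   tuple_deforming;     /* assumed on for this example */
} ToyJitSettings;

static const ToyJitSettings toy_defaults = {
    true, 100000.0, 500000.0, 500000.0, true, true
};

/* Same shape as the gate described above: a negative threshold disables a tier. */
static int
toy_jit_flags(const ToyJitSettings *s, double total_cost)
{
    int flags = TOY_JIT_NONE;

    assert(s != NULL);
    if (s->enabled && s->above_cost >= 0 && total_cost > s->above_cost)
    {
        flags |= TOY_JIT_PERFORM;
        if (s->optimize_above_cost >= 0 && total_cost > s->optimize_above_cost)
            flags |= TOY_JIT_OPT3;
        if (s->inline_above_cost >= 0 && total_cost > s->inline_above_cost)
            flags |= TOY_JIT_INLINE;
        if (s->expressions)
            flags |= TOY_JIT_EXPR;
        if (s->tuple_deforming)
            flags |= TOY_JIT_DEFORM;
    }
    return flags;
}
```

Under these assumptions a plan costing 50000 stays on the interpreter, a plan costing 150000 gets PERFORM, EXPR and DEFORM but neither OPT3 nor INLINE, and a plan costing 600000 gets all five bits. Setting inline_above_cost to -1 removes only the INLINE bit from the last case.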
flowchart TD
init["ExecInitExpr / ExecInitNode"] --> jce["jit_compile_expr(state)"]
jce --> lce["llvm_compile_expr"]
lce --> mut["llvm_mutable_module:<br/>get/create LLVM module"]
mut --> sw["switch over steps[]:<br/>emit IR per opcode"]
sw --> fetchq{"FETCHSOME step?"}
fetchq -- yes --> deform["slot_compile_deform:<br/>tupledesc-specialized<br/>deform fn into same module"]
fetchq -- no --> sw
sw --> install["install ExecRunCompiledExpr<br/>as evalfunc (no emit yet)"]
install --> firstcall["first evaluation"]
firstcall --> getfn["llvm_get_function"]
getfn --> cmod["llvm_compile_module:<br/>inline -> optimize -> ORC add"]
cmod --> orc["ORC materializes machine code<br/>on symbol lookup"]
orc --> native["native fn pointer cached in state->evalfunc"]
Source Walkthrough
The JIT subsystem splits cleanly across the provider boundary. Below, the symbols are grouped by file and call flow; line numbers are deferred to the position-hint table at the end of the section.
Provider-independent core (src/backend/jit/jit.c)
- GUC variables. jit_enabled, jit_provider, jit_expressions, jit_tuple_deforming, jit_above_cost, jit_inline_above_cost, jit_optimize_above_cost, jit_debugging_support, jit_profiling_support, jit_dump_bitcode: the knobs read by the planner and provider.
- provider_init: lazy, one-shot loader. Probes $pkglibdir/<jit_provider>$DLSUFFIX with pg_file_exists, then resolves the _PG_jit_provider_init symbol with load_external_function and caches success/failure in provider_successfully_loaded / provider_failed_loading.
- pg_jit_available: the SQL-callable wrapper that forces a load attempt and returns whether a provider is usable.
- jit_compile_expr: the single entry gate. Checks state->parent, PGJIT_PERFORM, PGJIT_EXPR, then provider_init() before delegating to provider.compile_expr.
- jit_release_context, jit_reset_after_error: forward to the provider’s release_context / reset_after_error callbacks.
- InstrJitAgg: folds per-context JitInstrumentation counters (created functions, generation/deform/inlining/optimization/emission time) into an aggregate for EXPLAIN.
- JitProviderCallbacks (in jit/jit.h) and the PGJIT_* flag macros (PGJIT_PERFORM, PGJIT_OPT3, PGJIT_INLINE, PGJIT_EXPR, PGJIT_DEFORM): the provider contract and the per-query flag bits.
LLVM provider core (src/backend/jit/llvm/llvmjit.c)
- _PG_jit_provider_init: populates the three callbacks (reset_after_error, release_context, compile_expr).
- llvm_create_context / llvm_release_context: allocate/free an LLVMJitContext, register it with the current ResourceOwner (via ResourceOwnerRememberJIT and the jit_resowner_desc) so it is cleaned up on error or transaction end; track llvm_jit_context_in_use_count.
- llvm_recreate_llvm_context: periodically disposes and recreates the shared LLVMContextRef (after LLVMJIT_LLVM_CONTEXT_REUSE_MAX uses) to bound the type leakage that inlining causes; calls llvm_inline_reset_caches first.
- llvm_mutable_module: returns the in-progress LLVMModuleRef, creating one (with the right triple/layout) if none is pending.
- llvm_expand_funcname: produces a unique externally-visible function name (<base>_<module_generation>_<counter>) and bumps created_functions.
- llvm_get_function: forces llvm_compile_module if the module is not yet compiled, then looks up the symbol with LLVMOrcLLJITLookup, accumulating emission_counter (ORC emits lazily on lookup).
- llvm_pg_var_type / llvm_pg_var_func_type / llvm_pg_func: pull type and function signatures out of the bitcode-loaded llvm_types_module, keeping JIT IR in sync with C structs.
- llvm_function_reference: resolves an fcinfo to a named callee (pgextern.<mod>.<fn>, internal name, or a constant pointer global), enabling direct/inlinable calls.
- llvm_optimize_module: selects the pass pipeline from the context flags (default<O0>,mem2reg vs. default<O3>), with an inliner threshold of 512.
- llvm_compile_module: runs llvm_inline (if PGJIT_INLINE), optimizes, optionally dumps bitcode (jit_dump_bitcode), then adds the module to the opt0/opt3 ORC LLJIT instance via LLVMOrcLLJITAddLLVMIRModuleWithRT.
- llvm_session_initialize / llvm_shutdown: one-time per-backend setup of the native target, host CPU/features, opt0/opt3 target machines, and the two LLVMOrcLLJITRef instances; shutdown disposes them on proc_exit.
- llvm_create_types: loads llvmjit_types.bc and binds the global LLVMTypeRefs (StructTupleTableSlot, StructExprState, StructFunctionCallInfoData, the AttributeTemplate, …).
- llvm_split_symbol_name / llvm_resolve_symbol / llvm_create_jit_instance: symbol resolution plumbing for ORC, including the custom definition generator that resolves SQL-callable functions and main-binary symbols.
Expression code generation (src/backend/jit/llvm/llvmjit_expr.c)
- llvm_compile_expr: the heart of the provider. Creates the evalexpr function, loads slot value/null arrays, pre-allocates one basic block per ExprState step, and switches over ExecEvalStepOp to emit IR per opcode. Mirrors ExecInterpExpr in execExprInterp.c.
- Opcode arms (representative cases): EEOP_DONE_RETURN (store isnull, return value), EEOP_*_FETCHSOME (deform trigger), EEOP_*_VAR (direct slot load), EEOP_CONST, EEOP_FUNCEXPR / EEOP_FUNCEXPR_STRICT (null-check chain + BuildV1Call), EEOP_QUAL (short-circuit on null/false), EEOP_PARAM_EXTERN (delegate to interpreter helper).
- BuildV1Call: emits the direct V1 function call, stores isnull, and (LLVM < 22) the llvm.lifetime.end annotation via create_LifetimeEnd.
- build_EvalXFuncInt (and the build_EvalXFunc macro): assembles a direct call into a named ExecEval* interpreter helper for the long-tail opcodes.
- ExecRunCompiledExpr: the thunk installed as state->evalfunc; validates the expression, triggers lazy emission via llvm_get_function, caches the native pointer, and tail-calls it.
Tuple deforming (src/backend/jit/llvm/llvmjit_deform.c)
- slot_compile_deform: builds a TupleDesc-specialized deform function; declines virtual/unknown slot kinds; precomputes guaranteed_column_number and known_alignment; emits per-column load / store / pointer-advance with constant strides where possible and varsize_any / strlen calls (marked always-inline) for varlena/cstring.
Inlining (src/backend/jit/llvm/llvmjit_inline.cpp)
- llvm_inline: entry point; builds an import plan with llvm_build_inline_plan (consulting function_inlinable) and applies it with llvm_execute_inline_plan, pulling operator bodies out of the installed $pkglibdir/bitcode/ summaries.
- llvm_inline_reset_caches: drops cached bitcode modules before the shared LLVMContextRef is recreated.
Cost gating (src/backend/optimizer/plan/planner.c)
- standard_planner (jitFlags block): compares top_plan->total_cost to jit_above_cost / jit_optimize_above_cost / jit_inline_above_cost and sets the PlannedStmt.jitFlags consumed at execution time.
Position hints (as of 2026-06-05, REL_18 273fe94)
| Symbol | File | Line |
|---|---|---|
| `provider_init` | `src/backend/jit/jit.c` | 67 |
| `pg_jit_available` | `src/backend/jit/jit.c` | 56 |
| `jit_compile_expr` | `src/backend/jit/jit.c` | 151 |
| `jit_release_context` | `src/backend/jit/jit.c` | 137 |
| `jit_reset_after_error` | `src/backend/jit/jit.c` | 127 |
| `InstrJitAgg` | `src/backend/jit/jit.c` | 182 |
| `jit_above_cost` (GUC default) | `src/backend/jit/jit.c` | 39 |
| `PGJIT_PERFORM` / `PGJIT_DEFORM` | `src/include/jit/jit.h` | 20 / 24 |
| `_PG_jit_provider_init` | `src/backend/jit/llvm/llvmjit.c` | 151 |
| `llvm_create_context` | `src/backend/jit/llvm/llvmjit.c` | 223 |
| `llvm_release_context` | `src/backend/jit/llvm/llvmjit.c` | 252 |
| `llvm_recreate_llvm_context` | `src/backend/jit/llvm/llvmjit.c` | 173 |
| `llvm_mutable_module` | `src/backend/jit/llvm/llvmjit.c` | 316 |
| `llvm_expand_funcname` | `src/backend/jit/llvm/llvmjit.c` | 341 |
| `llvm_get_function` | `src/backend/jit/llvm/llvmjit.c` | 362 |
| `llvm_function_reference` | `src/backend/jit/llvm/llvmjit.c` | 540 |
| `llvm_optimize_module` | `src/backend/jit/llvm/llvmjit.c` | 603 |
| `llvm_compile_module` | `src/backend/jit/llvm/llvmjit.c` | 709 |
| `llvm_session_initialize` | `src/backend/jit/llvm/llvmjit.c` | 825 |
| `llvm_create_types` | `src/backend/jit/llvm/llvmjit.c` | 995 |
| `llvm_resolve_symbol` | `src/backend/jit/llvm/llvmjit.c` | 1087 |
| `llvm_create_jit_instance` | `src/backend/jit/llvm/llvmjit.c` | 1220 |
| `llvm_compile_expr` | `src/backend/jit/llvm/llvmjit_expr.c` | 80 |
| `EEOP_*_FETCHSOME` arm | `src/backend/jit/llvm/llvmjit_expr.c` | 344 |
| `EEOP_*_VAR` arm | `src/backend/jit/llvm/llvmjit_expr.c` | 444 |
| `EEOP_FUNCEXPR_STRICT` arm | `src/backend/jit/llvm/llvmjit_expr.c` | 665 |
| `ExecRunCompiledExpr` | `src/backend/jit/llvm/llvmjit_expr.c` | 2988 |
| `BuildV1Call` | `src/backend/jit/llvm/llvmjit_expr.c` | 3008 |
| `build_EvalXFuncInt` | `src/backend/jit/llvm/llvmjit_expr.c` | 3060 |
| `create_LifetimeEnd` | `src/backend/jit/llvm/llvmjit_expr.c` | 3090 |
| `slot_compile_deform` | `src/backend/jit/llvm/llvmjit_deform.c` | 34 |
| `llvm_inline` | `src/backend/jit/llvm/llvmjit_inline.cpp` | 167 |
| `llvm_inline_reset_caches` | `src/backend/jit/llvm/llvmjit_inline.cpp` | 156 |
| `llvm_build_inline_plan` | `src/backend/jit/llvm/llvmjit_inline.cpp` | 183 |
| `function_inlinable` | `src/backend/jit/llvm/llvmjit_inline.cpp` | 125 |
| `standard_planner` jitFlags | `src/backend/optimizer/plan/planner.c` | 604 |
| `jit_above_cost` GUC entry | `src/backend/utils/misc/guc_tables.c` | 3960 |
Source verification (as of 2026-06-05)
All claims below were checked against the REL_18 tree at commit `273fe94852b3a7e34fd171e8abdf1481beb302fa` (2026-06-05).

- Provider abstraction is three callbacks. Confirmed: `_PG_jit_provider_init` sets exactly `reset_after_error`, `release_context`, and `compile_expr` (`llvmjit.c`). The core in `jit.c` references LLVM through `static JitProviderCallbacks provider` only.
- Lazy, cached library load. Confirmed: `provider_init` returns early on `provider_failed_loading` / `provider_successfully_loaded`, probes with `pg_file_exists` before `load_external_function`, and sets `provider_failed_loading = true` before calling `init` so a throwing `init` is not retried.
- `jit_compile_expr` guards. Confirmed: it returns false when `state->parent` is NULL, or `PGJIT_PERFORM` / `PGJIT_EXPR` are unset, before ever calling the provider.
- GUC defaults. Confirmed in `jit.c`: `jit_above_cost = 100000`, `jit_inline_above_cost = 500000`, `jit_optimize_above_cost = 500000`, `jit_enabled = true`, `jit_expressions = true`, `jit_tuple_deforming = true`. The GUC table entries are in `guc_tables.c`.
- Planner sets flags from cost. Confirmed: `standard_planner` sets `PGJIT_PERFORM` when `top_plan->total_cost > jit_above_cost`, and layers `PGJIT_OPT3` / `PGJIT_INLINE` / `PGJIT_EXPR` / `PGJIT_DEFORM` from the further thresholds and the `jit_expressions` / `jit_tuple_deforming` GUCs.
- Opcode `switch` mirrors the interpreter. Confirmed: `llvm_compile_expr` iterates `state->steps[0 .. steps_len-1]`, calls `ExecEvalStepOp`, and has arms for the full `ExprEvalOp` enum down to `EEOP_LAST` (an `Assert(false)`).
- Deform specialization facts. Confirmed: `slot_compile_deform` returns NULL for `TTSOpsVirtual` and any slot kind other than heap/buffer-heap/minimal; computes `guaranteed_column_number` from `ATTNULLABLE_VALID && !atthasmissing && !attisdropped`; advances the data pointer by `l_sizet_const(att->attlen)` for fixed-width columns and by `varsize_any` / `strlen` for `attlen == -1` / `-2`.
- Lazy emission. Confirmed: `llvm_compile_expr` installs `ExecRunCompiledExpr` (not a compiled pointer) as `state->evalfunc`; `llvm_compile_module` is invoked from `llvm_get_function`, and the comment in `llvm_compile_module` notes ORC “doesn’t actually emit code … happens lazily the first time a symbol … is requested.”
- Inlining pulls from bitcode. Confirmed by the `README` (operators compiled to `$pkglibdir/bitcode/postgres/` with an index) and by `llvm_inline` → `llvm_build_inline_plan` → `llvm_execute_inline_plan`.
- ResourceOwner cleanup. Confirmed: `jit_resowner_desc` with `RELEASE_PRIO_JIT_CONTEXTS` and `ResOwnerReleaseJitContext`; contexts are remembered in `llvm_create_context` and forgotten in `llvm_release_context`.
- Caveat: not yet cached across queries. The `README` “Caching” section states generated functions are not reused across executions because they embed pointers into per-execution memory; there is no IR/function cache in the REL_18 tree. Treat any claim of cross-query reuse as false.
- Out of scope here. The `ExprState` / `ExprEvalStep` linearization, the interpreter dispatch it mirrors, and plan cost computation are covered in `postgres-expression-eval.md`, `postgres-executor.md`, and `postgres-cost-model.md`; this doc does not re-derive them.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
PostgreSQL’s JIT sits at a deliberately conservative point in a rich design space. Placing it against the alternatives clarifies both what it buys and what it leaves on the table.
Expression JIT vs. whole-pipeline compilation (HyPer)
PostgreSQL compiles expressions and deforming but keeps the executor an
interpreter: each plan node still pulls tuples through the volcano-style
ExecProcNode iterator, and only the per-tuple expression and deform hot
spots become native. Thomas Neumann’s HyPer takes the opposite tack with the
produce/consume (push) model, compiling an entire pipeline of operators
into a single tight loop with no per-tuple function-call boundaries between
operators at all — data stays in CPU registers across operator boundaries
until a pipeline-breaker (hash build, sort) forces materialization. The
trade-off is stark: HyPer’s approach removes far more overhead but requires
the whole executor to be code-generated, a much larger engineering
commitment. PostgreSQL’s README explicitly lists “compiling larger parts
of queries” as future work and notes the obvious-seeming approach of JITing
individual expressions after N executions “turns out not to work too well”
because emitting many small functions has high per-function overhead — the
same observation that pushed HyPer toward whole-pipeline fusion.
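The difference between pulling tuples through iterators and running a fused pipeline can be sketched in a few lines. This is a toy illustration only: `Operator`, `scan_next`, `filter_next`, `sum_pull`, and `sum_fused` are hypothetical names, not PostgreSQL or HyPer code, and `sum_fused` is written by hand to stand in for what a pipeline compiler might generate for a scan, filter, and sum.

```c
#include <stdbool.h>
#include <stddef.h>

/* Iterator (pull) style: every operator exposes next(). */
typedef struct Operator Operator;
struct Operator
{
    bool      (*next)(Operator *self, long *out);
    Operator   *child;
    const long *rows;       /* scan state */
    size_t      nrows;
    size_t      pos;
    long        threshold;  /* filter state */
};

/* Leaf operator: hand out one row per call. */
bool
scan_next(Operator *self, long *out)
{
    if (self->pos >= self->nrows)
        return false;
    *out = self->rows[self->pos++];
    return true;
}

/* Filter: keep pulling from the child until a row qualifies. */
bool
filter_next(Operator *self, long *out)
{
    long v;

    while (self->child->next(self->child, &v))
    {
        if (v > self->threshold)
        {
            *out = v;
            return true;
        }
    }
    return false;
}

/* Consumer: one indirect call per tuple per operator boundary. */
long
sum_pull(Operator *root)
{
    long v;
    long total = 0;

    while (root->next(root, &v))
        total += v;
    return total;
}

/* Fused pipeline: scan, filter and sum collapse into one loop. */
long
sum_fused(const long *rows, size_t nrows, long threshold)
{
    long total = 0;

    for (size_t i = 0; i < nrows; i++)
        if (rows[i] > threshold)
            total += rows[i];
    return total;
}
```

Both paths compute the same sum. The fused loop has no function-pointer calls between operators, which is the overhead whole-pipeline compilation removes and expression-level JIT leaves in place.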
Holistic query compilation (Krikellas et al.)
Krikellas, Viglas, and Cintra’s holistic model (the “generate, compile, link, execute” pipeline) predates HyPer’s push model and is closer in spirit to what PostgreSQL does. It takes a query plan, emits C source specialized to it, and then invokes the system C compiler. PostgreSQL’s choice of LLVM over emitting and compiling C is pragmatic: it avoids a hard runtime dependency on a full C toolchain and an on-disk compile step, building IR directly through the LLVM C API and reusing the Clang-emitted bitcode for operators. The cost is the LLVM dependency itself, which is precisely why the provider is a separately loadable shared object.
Vectorized interpretation vs. compilation (MonetDB/X100, DuckDB)
A competing answer to interpreter overhead is vectorization rather than compilation: instead of generating native code per query, process tuples in batches (vectors) so the interpreter’s dispatch cost is amortized over a whole vector and the inner loops auto-vectorize. MonetDB/X100 and, more recently, DuckDB take this route and avoid compilation latency entirely. The 2018 “Everything you always wanted to know about compiled and vectorized queries but were afraid to ask” study (Kersten et al.) found the two approaches roughly competitive, with compilation favoring complex expression-heavy queries and vectorization favoring simpler, memory-bound ones. PostgreSQL is neither fully vectorized nor fully compiled: its executor remains a tuple-at-a-time interpreter, with JIT bolted onto the two spots where per-tuple compilation pays off, and the cost thresholds steer it toward exactly the expression-heavy analytic queries where compilation wins.
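How batching amortizes dispatch can be shown with a toy interpreter for the predicate `(a + b) > c`. This is a sketch under stated assumptions, not code from MonetDB/X100, DuckDB, or PostgreSQL: `OpCode`, `apply_scalar`, `apply_vector`, `dispatch_count`, and the two `filter_*` functions are hypothetical, and `VECTOR_SIZE` is kept tiny so the effect is visible on a handful of rows (real engines use far larger batches).

```c
#include <stddef.h>

typedef enum { OP_ADD, OP_GT } OpCode;

/* Counts interpreter dispatches so the two strategies can be compared. */
long dispatch_count = 0;

/* Tuple-at-a-time primitive: one dispatch per operator per tuple. */
static long
apply_scalar(OpCode op, long x, long y)
{
    dispatch_count++;
    switch (op)
    {
        case OP_ADD: return x + y;
        case OP_GT:  return x > y;
    }
    return 0;
}

size_t
filter_tuple_at_a_time(const long *a, const long *b, const long *c,
                       size_t n, int *out)
{
    size_t matches = 0;

    for (size_t i = 0; i < n; i++)
    {
        long sum = apply_scalar(OP_ADD, a[i], b[i]);

        out[i] = (int) apply_scalar(OP_GT, sum, c[i]);
        matches += (size_t) out[i];
    }
    return matches;
}

/* Vectorized primitive: one dispatch per operator per batch. */
static void
apply_vector(OpCode op, const long *x, const long *y, long *res, size_t n)
{
    dispatch_count++;
    switch (op)
    {
        case OP_ADD:
            for (size_t i = 0; i < n; i++) res[i] = x[i] + y[i];
            break;
        case OP_GT:
            for (size_t i = 0; i < n; i++) res[i] = x[i] > y[i];
            break;
    }
}

#define VECTOR_SIZE 4   /* deliberately tiny for the demonstration */

size_t
filter_vectorized(const long *a, const long *b, const long *c,
                  size_t n, int *out)
{
    long   tmp[VECTOR_SIZE];
    long   res[VECTOR_SIZE];
    size_t matches = 0;

    for (size_t start = 0; start < n; start += VECTOR_SIZE)
    {
        size_t len = (n - start < VECTOR_SIZE) ? n - start : VECTOR_SIZE;

        apply_vector(OP_ADD, a + start, b + start, tmp, len);
        apply_vector(OP_GT, tmp, c + start, res, len);
        for (size_t i = 0; i < len; i++)
        {
            out[start + i] = (int) res[i];
            matches += (size_t) res[i];
        }
    }
    return matches;
}
```

For six rows, the tuple-at-a-time path dispatches twelve times (two operators per row), while the vectorized path dispatches four times (two operators for each of two batches). A JIT-compiled expression would instead remove the dispatch altogether.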
Caching and adaptive compilation — the open frontiers
The largest gap the README itself flags is caching: generated
functions embed absolute pointers into per-execution memory, so they cannot
currently be reused across executions or tied to prepared statements. The
fix it sketches — make ExprState reference per-execution memory as offsets
from a single base block — is a prerequisite for an LRU cache keyed on the
generated IR, and for moving expression compilation into the planner so a
prepared statement carries its compiled form. Beyond that, an adaptive
(“tiered”) JIT — start interpreting or compile at -O0, then rebuild an
optimized version in a background thread once a query proves long-running —
is the standard technique in managed-language VMs (HotSpot, V8) and is named
as a “further off” possibility. PostgreSQL’s all-or-nothing, cost-gated,
single-shot model is simpler and avoids the bookkeeping of profiling
counters, at the price of mispredicting on queries whose true cost diverges
from the planner’s estimate.
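Why offsets from a base block would make generated code reusable can be seen in a small sketch. It is only an illustration of the idea the README sketches, not a proposed PostgreSQL design: `ExecBlock`, `StepAbs`, `StepRel`, `run_step_abs`, and `run_step_rel` are hypothetical, and the "step" simply doubles a value.

```c
#include <stddef.h>

/* Hypothetical per-execution memory block. */
typedef struct ExecBlock
{
    long input;
    long result;
} ExecBlock;

/* Absolute style: the step is tied to one execution's addresses. */
typedef struct StepAbs
{
    const long *input;
    long       *result;
} StepAbs;

/* Relative style: the step only knows offsets from a base pointer. */
typedef struct StepRel
{
    size_t input_off;
    size_t result_off;
} StepRel;

/* Works only for the execution whose addresses were baked in. */
void
run_step_abs(const StepAbs *step)
{
    *step->result = *step->input * 2;
}

/* Works for any execution: the base block is supplied at call time. */
void
run_step_rel(const StepRel *step, void *base)
{
    const long *in = (const long *) ((char *) base + step->input_off);
    long       *out = (long *) ((char *) base + step->result_off);

    *out = *in * 2;
}
```

A single `StepRel` built with `offsetof(ExecBlock, input)` and `offsetof(ExecBlock, result)` can be run against any number of `ExecBlock` instances, whereas a `StepAbs` has to be rebuilt for each one. That rebuild is the analogue of recompiling per execution.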
Where PostgreSQL’s choices land
The net picture: PostgreSQL chose maintainability and optionality over peak throughput. The op-for-op mirror of the interpreter keeps the two implementations in lockstep; the bitcode-from-C-sources trick avoids a second copy of every operator; the provider shared library keeps LLVM out of the base binary; and the planner-cost gate reuses an estimate the system already computes. The costs are real: no cross-query caching, no whole-pipeline fusion, and occasional mis-triggering when cost estimates are wrong (a frequent source of “JIT made my query slower” reports). Each, however, is a conscious trade in favor of a JIT that an existing, extensible, interpreter-based engine can actually ship and maintain.
Sources
- PostgreSQL REL_18 source (commit `273fe94852b3a7e34fd171e8abdf1481beb302fa`, 2026-06-05):
  - `src/backend/jit/jit.c`: provider-independent core, GUCs, entry gate.
  - `src/backend/jit/README`: design rationale (what/why/how/when to JIT, shared-library separation, JIT context, error handling, type sync, inlining, caching limitations).
  - `src/backend/jit/llvm/llvmjit.c`: LLVM provider core: context lifecycle, module/function management, optimization, ORC emission, session setup, type loading, symbol resolution.
  - `src/backend/jit/llvm/llvmjit_expr.c`: per-opcode IR generation; `BuildV1Call`, `build_EvalXFuncInt`, `ExecRunCompiledExpr`.
  - `src/backend/jit/llvm/llvmjit_deform.c`: `slot_compile_deform`, TupleDesc-specialized deforming.
  - `src/backend/jit/llvm/llvmjit_inline.cpp`: bitcode-based operator inlining.
  - `src/backend/jit/llvm/llvmjit_types.c`: type/function signature synchronization between C and JIT IR.
  - `src/include/jit/jit.h`, `src/include/jit/llvmjit.h`: `PGJIT_*` flags, `JitProviderCallbacks`, `LLVMJitContext`.
  - `src/backend/optimizer/plan/planner.c`: `standard_planner` cost-based `jitFlags` assignment.
  - `src/backend/utils/misc/guc_tables.c`: `jit_above_cost` / `jit_inline_above_cost` / `jit_optimize_above_cost` GUC definitions.
- Textbook background: `knowledge/research/dbms-general/` captures of Database System Concepts (Silberschatz, Korth, Sudarshan; query processing / the iterator model) and Database Internals (Petrov; query execution).
- Research lineage (named for orientation; see `knowledge/research/dbms-papers/` where captured):
  - T. Neumann, “Efficiently Compiling Efficient Query Plans for Modern Hardware” (VLDB 2011); HyPer produce/consume push model.
  - K. Krikellas, S. Viglas, M. Cintra, “Generating Code for Holistic Query Evaluation” (ICDE 2010).
  - P. Boncz, M. Zukowski, N. Nes, “MonetDB/X100: Hyper-Pipelining Query Execution” (CIDR 2005); vectorized execution.
  - T. Kersten et al., “Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask” (VLDB 2018).
- Cross-references (sibling docs in this folder): `postgres-expression-eval.md` (the `ExprState` / `ExprEvalStep` linearization and interpreter this JIT mirrors), `postgres-executor.md` (the surrounding node-iterator machinery and where `es_jit` / `es_jit_flags` live on `EState`), `postgres-cost-model.md` (how `total_cost`, the quantity the JIT thresholds compare against, is computed).