PostgreSQL JIT — LLVM Compilation of Expressions and Tuple Deforming

A relational engine spends a surprising fraction of its CPU budget not on “doing the query” but on being general. Evaluating the predicate WHERE a.col = 3 does not require many machine instructions in principle — load a column, compare to a constant, branch — yet a generic interpreter needs several hundred cycles to do it, because it must work for any expression tree, over any table shape, with any extension’s operators installed. That generality is paid for in indirect calls (dispatch to an operator implementation looked up by OID), unpredictable branches (per-tuple switch on the next interpreter step), and memory traffic (carrying Datum/isnull pairs through generic structures). For short-lived OLTP statements this overhead is irrelevant; for analytic queries that grind millions of tuples through the same expression, it is the dominant cost.

Just-in-Time (JIT) compilation attacks this overhead by generating, at query execution time, a native function specialized to the exact expression and table at hand. The PostgreSQL jit/README frames it as “turning some form of interpreted program evaluation into a native program, and doing so at runtime.” It is JIT rather than ahead-of-time (AOT) compilation precisely because the specialization inputs — the parsed expression, the concrete TupleDesc, the resolved operator OIDs — are not known until run time. Three properties define the design space:

  1. What to specialize on. A JIT compiler removes overhead only when it can bake in runtime-known facts. For expression evaluation, the structure of the expression tree and the addresses of the operator functions are constants once the query plan exists, so indirect dispatch collapses into direct (and inlinable) calls. For tuple deforming, the number of columns, their fixed widths, their NOT-NULL-ness, and their alignment are properties of the TupleDesc, so the generic “loop over attributes consulting the descriptor” can be unrolled into a straight-line sequence with most branches resolved at compile time.

  2. When to pay for compilation. Compiling is not free: building IR, optimizing it, and emitting machine code costs milliseconds — an eternity next to a cheap query. JIT pays off only when the compiled code runs enough times to amortize that cost. The decision therefore needs a cost model: compile only when the estimated work is large enough.

  3. How interpreter and compiler stay in sync. A JIT for a hand-written interpreter is really a second implementation of the same semantics. If the interpreter gains an operator the compiler doesn’t know, results diverge. The maintainable designs keep the two implementations structurally parallel (one case per opcode in each) and share the underlying helper functions and type definitions rather than duplicating them.
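
The first property can be sketched with a toy contrast between a generic step interpreter and the function that remains once the expression is known. This is not PostgreSQL code: ToyStep, toy_interp, toy_specialized and the other names are hypothetical, and the "program" encodes only the predicate col0 = 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a linearized expression program. */
typedef enum { TOY_LOAD_COL, TOY_LOAD_CONST, TOY_CALL_EQ, TOY_DONE } ToyOp;

typedef struct {
    ToyOp   op;
    int64_t arg;                        /* column index or constant */
    bool  (*fn)(int64_t, int64_t);      /* operator resolved at plan time */
} ToyStep;

static bool toy_int_eq(int64_t a, int64_t b) { return a == b; }

/* Generic path: one dispatch per step, one indirect call per operator. */
static bool toy_interp(const ToyStep *steps, const int64_t *row)
{
    int64_t stack[2];
    int     sp = 0;
    bool    result = false;

    for (const ToyStep *s = steps; ; s++) {
        switch (s->op) {
        case TOY_LOAD_COL:
            assert(sp < 2);
            stack[sp++] = row[s->arg];
            break;
        case TOY_LOAD_CONST:
            assert(sp < 2);
            stack[sp++] = s->arg;
            break;
        case TOY_CALL_EQ:
            assert(sp == 2);
            result = s->fn(stack[0], stack[1]);
            sp = 0;
            break;
        case TOY_DONE:
            return result;
        }
    }
}

/* Specialized path: steps, column index, constant and callee are all
 * known, so only the load and the compare remain. */
static bool toy_specialized(const int64_t *row)
{
    return row[0] == 3;
}

/* The step program for "col0 = 3". */
static const ToyStep toy_col0_eq_3[] = {
    { TOY_LOAD_COL,   0, NULL },
    { TOY_LOAD_CONST, 3, NULL },
    { TOY_CALL_EQ,    0, toy_int_eq },
    { TOY_DONE,       0, NULL },
};
```

Both functions compute the same answer for every row; the difference is only in how much dispatch work surrounds the single comparison.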

Database System Concepts (Silberschatz et al., query-processing chapters) treats expression and predicate evaluation as the inner loop of the iterator model, and notes that “compiled” evaluation has historically been an alternative to interpretation going back to System R’s access-module generation. The modern lineage — Krikellas et al.’s holistic query compilation and Neumann’s HyPer “produce/consume” model — pushes the idea further, compiling whole pipelines rather than single expressions. PostgreSQL occupies a deliberately conservative point in that space: it JIT-compiles two well-isolated hot spots (expression evaluation and tuple deforming) while leaving the surrounding executor an interpreter, because those two are “commonly major CPU bottlenecks in analytics queries” yet have clean, narrow interfaces to the rest of the engine.

This section names the engineering patterns engines share when they bolt a JIT onto an existing interpreted executor, so PostgreSQL’s specific choices read as selections within a shared space.

Provider abstraction behind a thin wrapper

A JIT backend is a heavyweight dependency — LLVM is tens of megabytes of C++ — that not every deployment wants linked into the core server binary. The common answer is an indirection layer: the executor calls a small, dependency-free wrapper (jit_compile_expr), which forwards to a provider implementing a fixed callback interface, loaded as a separate shared object only when JIT is actually requested. This keeps the main binary free of the compiler dependency, lets the OS package the compiler separately, and — as a bonus — makes the provider swappable (one could write a non-LLVM provider behind the same interface).

The compiler and the interpreter must agree on semantics. The least error-prone way to guarantee that is to give the code generator the same shape as the interpreter: a switch over the same opcode enum, with one arm per step, each arm emitting IR that does exactly what the interpreter’s corresponding arm does. Steps that are rare or complex are not re-emitted inline at all — the generated code simply calls back into the existing interpreter helper (ExecEval*), so only the hot, common steps get bespoke IR while the long tail reuses the C implementation.

Specializing tuple access from the descriptor

“Deforming” — turning a packed on-disk tuple into an array of Datum plus isnull flags — is branch-heavy in the generic case: every column consults the descriptor for length, alignment, and null-bitmap position. Because the descriptor is fixed for a given scan, a JIT can generate a descriptor-specific deform routine in which fixed-width columns become constant pointer increments, NOT-NULL columns skip the null-bitmap check, and known alignment eliminates the per-column align computation. This is the “program that isn’t a program” case: deforming is not an expression, but it benefits enormously from compile-time knowledge.
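
As a rough illustration of that idea, the sketch below deforms a packed two-column tuple twice: once with a generic loop that consults a descriptor, and once with a routine written for the single shape (int32, int64). The types and function names (ToyAttr, toy_deform_generic, toy_deform_int4_int8) are hypothetical, and the model is deliberately simplified to fixed-width NOT NULL columns with no tuple header or null bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified column descriptor: fixed-width, NOT NULL only. */
typedef struct {
    int len;      /* width in bytes: 4 or 8 */
    int align;    /* required alignment in bytes, a power of two */
} ToyAttr;

/* Round off up to the next multiple of align. */
static size_t toy_align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/* Generic deform: consults the descriptor for every column of every tuple. */
static void toy_deform_generic(const ToyAttr *desc, int natts,
                               const unsigned char *data, int64_t *values)
{
    size_t off = 0;

    for (int i = 0; i < natts; i++) {
        off = toy_align_up(off, (size_t) desc[i].align);
        if (desc[i].len == 4) {
            int32_t v;

            memcpy(&v, data + off, sizeof v);
            values[i] = v;              /* widen to the 8-byte slot */
        } else {
            assert(desc[i].len == 8);
            memcpy(&values[i], data + off, sizeof(int64_t));
        }
        off += (size_t) desc[i].len;
    }
}

/* Specialized deform for the one shape (int32, int64): the offsets 0 and 8
 * are constants, and no descriptor lookup or alignment math remains. */
static void toy_deform_int4_int8(const unsigned char *data, int64_t *values)
{
    int32_t c0;

    memcpy(&c0, data + 0, sizeof c0);
    values[0] = c0;
    memcpy(&values[1], data + 8, sizeof(int64_t));
}
```

The specialized routine is what a descriptor-driven code generator would effectively emit for this shape; a different descriptor would yield a different straight-line routine.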

The single biggest expression-evaluation win is removing the call into the SQL operator implementation (int4eq, float8pl, …) and instead inlining its body, exposing it to the optimizer for constant folding and dead-branch elimination. Maintaining a second copy of every operator just for inlining would be untenable, so the standard trick is to compile the engine’s own C operator sources to compiler IR bitcode at build time, ship that bitcode alongside the binary, and have the JIT pull operator definitions out of it on demand.

Because compilation is expensive, engines gate it on a cost estimate: the planner already computes a total plan cost, so a threshold on that cost is a cheap, already-available trigger. And because emitting one function at a time wastes per-emission overhead, the mature design defers emission — functions are defined into a module during plan initialization but machine code is generated lazily, the first time any function in the module is actually called, so a whole query’s worth of functions emit together.

flowchart TD
  subgraph plan["Planner — standard_planner"]
    cost["top_plan->total_cost"] --> gate{"cost ><br/>jit_above_cost?"}
    gate -- no --> none["jitFlags = PGJIT_NONE<br/>(pure interpreter)"]
    gate -- yes --> flags["set PGJIT_PERFORM<br/>+ EXPR / DEFORM<br/>+ OPT3 / INLINE<br/>by further thresholds"]
  end
  subgraph exec["Executor — ExecInitNode time"]
    flags --> compile["jit_compile_expr(state)"]
    compile --> provider["load llvmjit.so<br/>via provider_init()"]
    provider --> emitIR["emit LLVM IR<br/>for each ExprState"]
  end
  subgraph run["First evaluation"]
    emitIR --> lazy["ExecRunCompiledExpr<br/>-> llvm_get_function"]
    lazy --> machine["LLVM emits machine code<br/>for the whole module"]
    machine --> fast["subsequent calls run<br/>native function directly"]
  end

PostgreSQL’s JIT is structured as exactly the layered design above: a provider-independent core in src/backend/jit/jit.c that is compiled into every server, and an LLVM-specific provider in src/backend/jit/llvm/ that lives in a separately loadable llvmjit shared library. The README is explicit that “code intending to perform JIT … calls an LLVM independent wrapper located in jit.c,” and that the wrapper “is allowed to fail in case no JIT provider can be loaded.” That failure-tolerance is the whole point: a build without LLVM, or a deployment that hasn’t installed the LLVM package, simply runs the interpreter.

The provider interface and lazy library load

The provider contract is three function pointers, populated by the provider’s init entry point and stored in a single static struct:

// _PG_jit_provider_init — src/backend/jit/llvm/llvmjit.c
void
_PG_jit_provider_init(JitProviderCallbacks *cb)
{
    cb->reset_after_error = llvm_reset_after_error;
    cb->release_context = llvm_release_context;
    cb->compile_expr = llvm_compile_expr;
}

The core never references LLVM symbols directly; it only calls through these pointers. Loading is lazy and one-shot: provider_init() probes for the shared library on disk, and — crucially — caches both success and failure so a missing provider is not retried on every expression:

// provider_init — src/backend/jit/jit.c
if (!jit_enabled)
    return false;
if (provider_failed_loading)            // never retry a known failure
    return false;
if (provider_successfully_loaded)
    return true;
snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path, jit_provider, DLSUFFIX);
if (!pg_file_exists(path)) {            // probe before dlopen, which would ERROR
    provider_failed_loading = true;
    return false;
}
provider_failed_loading = true;         // assume failure until init() returns
init = (JitProviderInit)
    load_external_function(path, "_PG_jit_provider_init", true, NULL);
init(&provider);
provider_successfully_loaded = true;
provider_failed_loading = false;

The jit_provider GUC (default "llvmjit") names the library, so the abstraction is genuinely pluggable — a different provider shared object behind the same _PG_jit_provider_init symbol would slot in unchanged.
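
The load-once, remember-both-outcomes behaviour can be reproduced in a small self-contained model. Everything below is hypothetical (ToyProviderCallbacks, toy_provider_init, the toy_library_present flag standing in for the file probe); it is a sketch of the pattern, not the jit.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-callback provider contract. */
typedef struct {
    bool (*compile_expr)(int expr_id);
} ToyProviderCallbacks;

static ToyProviderCallbacks toy_provider;
static bool toy_loaded = false;
static bool toy_failed = false;
static int  toy_probe_count = 0;          /* how often the "disk" was probed */
static bool toy_library_present = false;  /* stand-in for the file probe */

static bool toy_llvm_compile(int expr_id) { return expr_id >= 0; }

/* Stand-in for the provider's init entry point. */
static void toy_provider_entry(ToyProviderCallbacks *cb)
{
    cb->compile_expr = toy_llvm_compile;
}

/* Lazy, one-shot load: success and failure are both remembered. */
static bool toy_provider_init(void)
{
    if (toy_failed)
        return false;
    if (toy_loaded)
        return true;

    toy_probe_count++;
    if (!toy_library_present) {
        toy_failed = true;
        return false;
    }
    toy_provider_entry(&toy_provider);
    toy_loaded = true;
    return true;
}

/* Wrapper the "executor" calls; false means "stay on the interpreter". */
static bool toy_jit_compile_expr(int expr_id)
{
    if (!toy_provider_init())
        return false;
    assert(toy_provider.compile_expr != NULL);
    return toy_provider.compile_expr(expr_id);
}
```

With the library absent, repeated calls to toy_jit_compile_expr return false while toy_probe_count stays at 1, which is the property the cached failure flag exists to guarantee.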

Every JIT compilation request funnels through jit_compile_expr in the core. It is the single place that consults the per-query flags and decides whether to hand off to the provider:

// jit_compile_expr — src/backend/jit/jit.c
if (!state->parent)                     // need an EState lifetime
    return false;
if (!(state->parent->state->es_jit_flags & PGJIT_PERFORM))
    return false;
if (!(state->parent->state->es_jit_flags & PGJIT_EXPR))
    return false;
if (provider_init())                    // also checks !jit_enabled
    return provider.compile_expr(state);
return false;

Note the first guard: an ExprState with no parent PlanState has no EState to anchor a JIT context’s lifetime to, so it is never compiled — those one-off expressions would otherwise leak compiled functions until end-of-transaction. When all guards pass, provider.compile_expr (i.e. llvm_compile_expr) takes over.

Per-opcode IR generation mirrors the interpreter

llvm_compile_expr is, structurally, a clone of execExprInterp.c’s ExecInterpExpr: it walks the linearized ExprState->steps[] array and, in a switch (opcode), emits IR for each step. It first creates an LLVM function with the same C signature as the interpreter (pulled from llvmjit_types.c so the types stay in sync), then pre-creates one basic block per step so that steps can branch to each other:

// llvm_compile_expr — src/backend/jit/llvm/llvmjit_expr.c
eval_fn = LLVMAddFunction(mod, funcname,
                          llvm_pg_var_func_type("ExecInterpExprStillValid"));
LLVMSetLinkage(eval_fn, LLVMExternalLinkage);
llvm_copy_attributes(AttributeTemplate, eval_fn);
/* ... load v_state, v_econtext, slot value/null arrays ... */
opblocks = palloc(sizeof(LLVMBasicBlockRef) * state->steps_len);
for (int opno = 0; opno < state->steps_len; opno++)
    opblocks[opno] = l_bb_append_v(eval_fn, "b.op.%d.start", opno);
LLVMBuildBr(b, opblocks[0]);            // jump into first step
for (int opno = 0; opno < state->steps_len; opno++) {
    op = &state->steps[opno];
    opcode = ExecEvalStepOp(state, op);
    LLVMPositionBuilderAtEnd(b, opblocks[opno]);
    switch (opcode) { /* one arm per ExprEvalOp ... */ }
}

A hot step like reading an already-deformed column variable is emitted as direct loads from the slot’s tts_values/tts_isnull arrays — no function call at all:

// EEOP_*_VAR — src/backend/jit/llvm/llvmjit_expr.c
v_attnum = l_int32_const(lc, op->d.var.attnum);
value = l_load_gep1(b, TypeSizeT, v_values, v_attnum, "");
isnull = l_load_gep1(b, TypeStorageBool, v_nulls, v_attnum, "");
LLVMBuildStore(b, value, v_resvaluep);
LLVMBuildStore(b, isnull, v_resnullp);
LLVMBuildBr(b, opblocks[opno + 1]);

A function-call step (EEOP_FUNCEXPR_STRICT) emits an explicit null-check chain over the arguments — one basic block per argument — falling through to the real call only when all are non-null, exactly replicating the strict semantics the interpreter implements with a loop:

// EEOP_FUNCEXPR_STRICT — src/backend/jit/llvm/llvmjit_expr.c
for (int argno = 0; argno < op->d.func.nargs; argno++) {
    LLVMPositionBuilderAtEnd(b, b_checkargnulls[argno]);
    b_argnotnull = (argno + 1 == op->d.func.nargs)
        ? b_nonull : b_checkargnulls[argno + 1];
    v_argisnull = l_funcnull(b, v_fcinfo, argno);   // load fcinfo->args[i].isnull
    LLVMBuildCondBr(b,
        LLVMBuildICmp(b, LLVMIntEQ, v_argisnull, l_sbool_const(1), ""),
        opblocks[opno + 1],                         // any null -> skip call
        b_argnotnull);
}
LLVMPositionBuilderAtEnd(b, b_nonull);
v_retval = BuildV1Call(context, b, mod, fcinfo, &v_fcinfo_isnull);
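
The strict-call rule that this chain of blocks encodes is easier to see in plain C. The sketch below is a hypothetical model (ToyArg, ToyFn and toy_call_strict are illustrative names, not fmgr types): if any argument is NULL the result is NULL and the function body is never entered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value/null pair, loosely modelled on a function argument. */
typedef struct { int64_t value; bool isnull; } ToyArg;

typedef int64_t (*ToyFn)(const ToyArg *args, int nargs);

static int64_t toy_int_add(const ToyArg *args, int nargs)
{
    assert(nargs == 2);
    return args[0].value + args[1].value;
}

/* Strict call: any NULL argument short-circuits to a NULL result
 * without invoking fn. */
static int64_t toy_call_strict(ToyFn fn, const ToyArg *args, int nargs,
                               bool *resnull)
{
    for (int i = 0; i < nargs; i++) {
        if (args[i].isnull) {
            *resnull = true;
            return 0;
        }
    }
    *resnull = false;
    return fn(args, nargs);
}
```

The interpreter runs this loop at evaluation time; the generated IR instead unrolls it, because the argument count is a constant once the expression is known.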

Steps that are rare or unwieldy are not reimplemented in IR. Instead the generated code calls the interpreter’s own helper through the build_EvalXFunc macro, which assembles a direct call to e.g. ExecEvalParamExtern. This is the “mirror, but reuse for the long tail” pattern made concrete:

// EEOP_PARAM_EXTERN — src/backend/jit/llvm/llvmjit_expr.c
case EEOP_PARAM_EXTERN:
    build_EvalXFunc(b, mod, "ExecEvalParamExtern",
                    v_state, op, v_econtext);
    LLVMBuildBr(b, opblocks[opno + 1]);
    break;

Direct, inlinable operator calls via BuildV1Call

The mechanism that turns an indirect operator dispatch into a direct call — the prerequisite for later inlining — is BuildV1Call together with llvm_function_reference. For an operator whose symbol is known (fmgr_symbol resolves it), the call is emitted as a direct reference to a named function (int4eq, pgextern.<module>.<fn>, …); only for opaque function pointers does it fall back to loading a constant pointer. The result is IR in which the operator is a named callee that the inliner can later replace with the operator’s body:

// BuildV1Call — src/backend/jit/llvm/llvmjit_expr.c
v_fn = llvm_function_reference(context, b, mod, fcinfo);
v_fcinfo = l_ptr_const(fcinfo, l_ptr(StructFunctionCallInfoData));
v_fcinfo_isnullp = l_struct_gep(b, StructFunctionCallInfoData, v_fcinfo,
                                FIELDNO_FUNCTIONCALLINFODATA_ISNULL, "");
LLVMBuildStore(b, l_sbool_const(0), v_fcinfo_isnullp);
v_retval = l_call(b, LLVMGetFunctionType(AttributeTemplate), v_fn,
                  &v_fcinfo, 1, "funccall");

The companion lifetime-end annotation it emits (on LLVM < 22) tells the optimizer the argument memory need not be preserved across the call — improving the odds the inliner can drop redundant stores.

A predicate step (EEOP_QUAL, the WHERE-clause short-circuit) shows how control flow that the interpreter expresses with C if/goto becomes explicit basic-block branching in IR. A null or false result jumps to the qualfail block, which normalizes the result to a non-null false and jumps to the qual’s jumpdone target; a true result simply falls through:

// EEOP_QUAL — src/backend/jit/llvm/llvmjit_expr.c
v_resvalue = l_load(b, TypeSizeT, v_resvaluep, "");
v_resnull = l_load(b, TypeStorageBool, v_resnullp, "");
v_nullorfalse = LLVMBuildOr(b,
    LLVMBuildICmp(b, LLVMIntEQ, v_resnull, l_sbool_const(1), ""),
    LLVMBuildICmp(b, LLVMIntEQ, v_resvalue, l_sizet_const(0), ""), "");
LLVMBuildCondBr(b, v_nullorfalse, b_qualfail, opblocks[opno + 1]);
LLVMPositionBuilderAtEnd(b, b_qualfail);
LLVMBuildStore(b, l_sbool_const(0), v_resnullp);    /* result not null */
LLVMBuildStore(b, l_sizet_const(0), v_resvaluep);   /* result is false */
LLVMBuildBr(b, opblocks[op->d.qualexpr.jumpdone]);  /* short-circuit out */
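
The same short-circuit rule can be written as an ordinary loop over an implicitly ANDed clause list. The sketch is a hypothetical model (ToyBool, ToyClause and toy_eval_qual are illustrative names): a NULL or false clause ends evaluation with a non-null false, and later clauses are never run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-valued boolean result. */
typedef struct { bool value; bool isnull; } ToyBool;

/* Hypothetical clause callback: fills *out for one row. */
typedef void (*ToyClause)(int row, ToyBool *out);

static void toy_row_is_positive(int row, ToyBool *out)
{
    out->value = row > 0;
    out->isnull = false;
}

static void toy_row_is_even(int row, ToyBool *out)
{
    out->value = (row % 2) == 0;
    out->isnull = false;
}

static void toy_always_null(int row, ToyBool *out)
{
    (void) row;
    out->value = false;
    out->isnull = true;
}

/* Evaluate clauses in order; report how many were actually evaluated. */
static bool toy_eval_qual(const ToyClause *clauses, int nclauses, int row,
                          int *nevaluated)
{
    ToyBool res;

    assert(nevaluated != NULL);
    *nevaluated = 0;
    for (int i = 0; i < nclauses; i++) {
        clauses[i](row, &res);
        (*nevaluated)++;
        if (res.isnull || !res.value)
            return false;   /* "qualfail": NULL and false both become false */
    }
    return true;
}
```

In the generated IR the early return is the branch to the jumpdone block, and the loop disappears because each clause is its own sequence of steps.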

The simplest arm, EEOP_CONST, makes the “bake in runtime-known facts” principle literal: the constant’s Datum and null flag — fixed once the plan exists — are emitted as IR constants, so the optimizer can later constant-fold any operator that consumes them:

// EEOP_CONST — src/backend/jit/llvm/llvmjit_expr.c
v_constvalue = l_sizet_const(op->d.constval.value);
v_constnull = l_sbool_const(op->d.constval.isnull);
LLVMBuildStore(b, v_constvalue, v_resvaluep);
LLVMBuildStore(b, v_constnull, v_resnullp);

Tuple deforming specialized to the TupleDesc

The deforming JIT is the clearest demonstration of “specialize on runtime-known facts.” slot_compile_deform builds a function — taking just a TupleTableSlot * — that deforms one specific tuple shape up to natts columns. It declines to generate anything for slot kinds it can’t handle (virtual slots never need deforming; only heap/buffer-heap/minimal slots are supported):

// slot_compile_deform — src/backend/jit/llvm/llvmjit_deform.c
if (ops == &TTSOpsVirtual)
    return NULL;
if (ops != &TTSOpsHeapTuple && ops != &TTSOpsBufferHeapTuple &&
    ops != &TTSOpsMinimalTuple)
    return NULL;

Before emitting anything, it analyzes the descriptor to learn two compile-time facts: the last guaranteed-present column and the known alignment so far. The guaranteed column is the highest-numbered column that is NOT NULL, has no missing-value default, and is not dropped; every stored tuple must contain at least that many attributes, so the tuple’s natts need not be checked up to that point. A fixed-width NOT NULL column lets the next column’s offset be a compile-time constant; the moment a variable-length or nullable column appears, alignment becomes unknown again:

// slot_compile_deform — src/backend/jit/llvm/llvmjit_deform.c
if (att->attnullability == ATTNULLABLE_VALID &&
    !att->atthasmissing && !att->attisdropped)
    guaranteed_column_number = attnum;  // can skip natts check up to here
/* ... */
if (att->attlen < 0) {                  // varlena: alignment now unknown
    known_alignment = -1;
    attguaranteedalign = false;
} else if (att->attnullability == ATTNULLABLE_VALID &&
           attguaranteedalign && known_alignment >= 0) {
    known_alignment += att->attlen;     // offset stays a constant
}
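
A simplified model of the "offset stays constant until something variable appears" analysis is sketched below. It is an assumption-laden toy, not the slot_compile_deform logic: ToyCol and toy_constant_offsets are hypothetical, there is no tuple header or null bitmap, and PostgreSQL's short-varlena header rule (which affects padding before variable-length values) is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical column description. */
typedef struct {
    int  len;       /* > 0: fixed width in bytes, -1: variable length */
    int  align;     /* required alignment, a power of two */
    bool notnull;
} ToyCol;

/* Fill offsets[i] with the compile-time offset of column i, or -1 when the
 * offset depends on the row. Returns how many offsets are constants. */
static int toy_constant_offsets(const ToyCol *cols, int ncols, int *offsets)
{
    int  off = 0;
    bool known = true;
    int  nconst = 0;

    for (int i = 0; i < ncols; i++) {
        assert(cols[i].align > 0);
        if (known) {
            off = (off + cols[i].align - 1) & ~(cols[i].align - 1);
            offsets[i] = off;
            nconst++;
        } else {
            offsets[i] = -1;
        }

        /* A nullable or variable-length column makes every later offset
         * depend on the actual row contents. */
        if (cols[i].len < 0 || !cols[i].notnull)
            known = false;
        else
            off += cols[i].len;
    }
    return nconst;
}
```

For the shape (int4 NOT NULL, int8 NOT NULL, varlena, int4) the first three offsets come out as 0, 8 and 16, and the fourth is only known at run time.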

The store loop then emits, per column, code whose shape depends on those facts. For a by-value type the on-disk bytes are loaded at the exact integer width and sign-extended into a Datum; for a by-reference type a pointer-to-data is stored; the data pointer is advanced by a constant for fixed-width columns and by a call to varsize_any/strlen only for varlena and cstring:

// slot_compile_deform store loop — src/backend/jit/llvm/llvmjit_deform.c
if (att->attbyval) {
    LLVMTypeRef vartype = LLVMIntTypeInContext(lc, att->attlen * 8);
    v_tmp_loaddata = l_load(b, vartype,
        LLVMBuildPointerCast(b, v_attdatap, LLVMPointerType(vartype, 0), ""), "");
    v_tmp_loaddata = LLVMBuildSExt(b, v_tmp_loaddata, TypeSizeT, "");
    LLVMBuildStore(b, v_tmp_loaddata, v_resultp);
} else {
    LLVMBuildStore(b,
        LLVMBuildPtrToInt(b, v_attdatap, TypeSizeT, "attr_ptr"), v_resultp);
}
if (att->attlen > 0)
    v_incby = l_sizet_const(att->attlen);       // constant stride
else if (att->attlen == -1) {
    v_incby = l_call(b, llvm_pg_var_func_type("varsize_any"),
                     llvm_pg_func(mod, "varsize_any"), &v_attdatap, 1, "");
    l_callsite_alwaysinline(v_incby);           // mark varsize_any for inlining
}

The deform function is not produced eagerly for every scan. It is built when llvm_compile_expr emits IR for an EEOP_*_FETCHSOME step, and only when PGJIT_DEFORM is set and the slot type and descriptor are known at that point. The emitted fetch step first checks at run time whether the slot already has enough attributes deformed (tts_nvalid >= last_var) and branches to the deform call only if not:

// EEOP_*_FETCHSOME — src/backend/jit/llvm/llvmjit_expr.c
v_nvalid = l_load_struct_gep(b, StructTupleTableSlot, v_slot,
                             FIELDNO_TUPLETABLESLOT_NVALID, "");
LLVMBuildCondBr(b,
    LLVMBuildICmp(b, LLVMIntUGE, v_nvalid,
                  l_int16_const(lc, op->d.fetch.last_var), ""),
    opblocks[opno + 1], b_fetch);       // already deformed -> skip
/* ... in b_fetch: */
if (tts_ops && desc && (context->base.flags & PGJIT_DEFORM))
    l_jit_deform = slot_compile_deform(context, desc, tts_ops,
                                       op->d.fetch.last_var);
if (l_jit_deform)                       // call the specialized fn
    l_call(b, LLVMGetFunctionType(l_jit_deform), l_jit_deform, params, 1, "");
else                                    // fall back to interpreter
    l_call(b, llvm_pg_var_func_type("slot_getsomeattrs_int"),
           llvm_pg_func(mod, "slot_getsomeattrs_int"), params, 2, "");

When PGJIT_INLINE is set, llvm_compile_module runs llvm_inline over the module before optimization. The inliner does not maintain a second copy of each operator: at build time the engine’s C sources are compiled to LLVM bitcode and installed under $pkglibdir/bitcode/postgres/, with a summary index. llvm_inline builds an import plan — which external function references in the module are small enough and available in bitcode — then pulls those definitions in:

// llvm_inline — src/backend/jit/llvm/llvmjit_inline.cpp
void
llvm_inline(LLVMModuleRef M)
{
    LLVMContextRef lc = LLVMGetModuleContext(M);
    llvm::Module *mod = llvm::unwrap(M);
    std::unique_ptr<ImportMapTy> globalsToInline =
        llvm_build_inline_plan(lc, mod);

    if (!globalsToInline)
        return;
    llvm_execute_inline_plan(mod, globalsToInline.get());
}

Because the direct calls emitted by BuildV1Call reference operators by their real symbol names, the inliner can match them against the bitcode and splice in the bodies — after which constant folding and dead-branch removal can fire on the now-visible operator logic. The README notes this is the “one big advantage” of JITing: collapsing PostgreSQL’s extensible function/operator dispatch.

Optimization levels, lazy emission, and cost gating

llvm_optimize_module picks a pass pipeline from the context flags. Without PGJIT_OPT3 it runs a cheap default<O0>,mem2reg (plus an inline pass if PGJIT_INLINE is set); with PGJIT_OPT3 it runs a full default<O3>. In the excerpt the inliner threshold of 512 is set on the pass-builder options regardless of which pipeline is chosen:

// llvm_optimize_module (LLVM >= 17) — src/backend/jit/llvm/llvmjit.c
if (context->base.flags & PGJIT_OPT3)
    passes = "default<O3>";
else if (context->base.flags & PGJIT_INLINE)
    passes = "default<O0>,mem2reg,inline";
else
    passes = "default<O0>,mem2reg";
LLVMPassBuilderOptionsSetInlinerThreshold(options, 512);
err = LLVMRunPasses(module, passes, NULL, options);

Crucially, IR is defined during ExecInitNode but machine code is emitted lazily. llvm_compile_expr installs ExecRunCompiledExpr as the expression’s eval function; the first actual evaluation calls llvm_get_function, which triggers llvm_compile_module (inline + optimize + hand the module to ORC) and then resolves the symbol. ORC itself only materializes code the first time a symbol is looked up:

// ExecRunCompiledExpr — src/backend/jit/llvm/llvmjit_expr.c
CheckExprStillValid(state, econtext);
llvm_enter_fatal_on_oom();
func = (ExprStateEvalFunc) llvm_get_function(cstate->context, cstate->funcname);
llvm_leave_fatal_on_oom();
state->evalfunc = func; // remove the indirection for future calls
return func(state, econtext, isNull);
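
The self-replacing thunk is a general pattern and can be modelled without LLVM. In the hypothetical sketch below (ToyExprState, toy_run_compiled, toy_get_function and toy_emit_count are illustrative names), the expensive step runs once on the first evaluation and every later call goes straight to the "native" function.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyExprState ToyExprState;
typedef int (*ToyEvalFunc)(ToyExprState *state, int input);

struct ToyExprState {
    ToyEvalFunc evalfunc;   /* what callers invoke */
    int         addend;     /* stand-in for expression-specific data */
};

static int toy_emit_count = 0;

/* Stand-in for the emitted native function. */
static int toy_native(ToyExprState *state, int input)
{
    return input + state->addend;
}

/* Stand-in for the expensive "compile module and look up symbol" step. */
static ToyEvalFunc toy_get_function(void)
{
    toy_emit_count++;
    return toy_native;
}

/* Thunk installed at init time: emits on first use, then swaps itself
 * out so later calls skip the indirection. */
static int toy_run_compiled(ToyExprState *state, int input)
{
    ToyEvalFunc func = toy_get_function();

    assert(func != NULL);
    state->evalfunc = func;
    return func(state, input);
}
```

An expression that is initialized but never evaluated therefore never pays the emission cost, which is the point of deferring it.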

Whether any of this happens at all is decided once, in the planner. After producing the final plan, standard_planner compares the top plan’s cost to the JIT thresholds and sets the per-query jitFlags:

// standard_planner (jitFlags) — src/backend/optimizer/plan/planner.c
result->jitFlags = PGJIT_NONE;
if (jit_enabled && jit_above_cost >= 0 &&
    top_plan->total_cost > jit_above_cost) {
    result->jitFlags |= PGJIT_PERFORM;
    if (jit_optimize_above_cost >= 0 &&
        top_plan->total_cost > jit_optimize_above_cost)
        result->jitFlags |= PGJIT_OPT3;
    if (jit_inline_above_cost >= 0 &&
        top_plan->total_cost > jit_inline_above_cost)
        result->jitFlags |= PGJIT_INLINE;
    if (jit_expressions)
        result->jitFlags |= PGJIT_EXPR;
    if (jit_tuple_deforming)
        result->jitFlags |= PGJIT_DEFORM;
}

The defaults make the tiering legible: jit_above_cost = 100000 turns JIT on; jit_inline_above_cost = jit_optimize_above_cost = 500000 add inlining and full optimization only for substantially more expensive plans. A negative threshold disables that tier. This is the cost model from the Theoretical Background made concrete — reusing the planner’s existing cost estimate rather than instrumenting evaluation counts.
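
To see which tier a given plan cost lands in, the gating logic can be replayed in a standalone sketch. The flag names and the ToyJitSettings struct are hypothetical stand-ins, the flag values are arbitrary, and the sketch assumes that jit, jit_expressions and jit_tuple_deforming are all enabled; only the three threshold defaults come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only. */
enum {
    TOY_JIT_NONE    = 0,
    TOY_JIT_PERFORM = 1 << 0,
    TOY_JIT_OPT3    = 1 << 1,
    TOY_JIT_INLINE  = 1 << 2,
    TOY_JIT_EXPR    = 1 << 3,
    TOY_JIT_DEFORM  = 1 << 4
};

typedef struct {
    double above_cost;            /* negative disables JIT entirely */
    double inline_above_cost;     /* negative disables the inline tier */
    double optimize_above_cost;   /* negative disables the O3 tier */
} ToyJitSettings;

/* Default thresholds as quoted in the text. */
static const ToyJitSettings toy_defaults = { 100000.0, 500000.0, 500000.0 };

/* Map a plan's total cost to a set of flags. */
static int toy_jit_flags(const ToyJitSettings *s, double total_cost)
{
    int flags = TOY_JIT_NONE;

    assert(s != NULL);
    if (s->above_cost >= 0 && total_cost > s->above_cost) {
        flags |= TOY_JIT_PERFORM | TOY_JIT_EXPR | TOY_JIT_DEFORM;
        if (s->optimize_above_cost >= 0 && total_cost > s->optimize_above_cost)
            flags |= TOY_JIT_OPT3;
        if (s->inline_above_cost >= 0 && total_cost > s->inline_above_cost)
            flags |= TOY_JIT_INLINE;
    }
    return flags;
}
```

With the defaults, a plan costed at 50000 stays on the interpreter, one at 200000 gets expression and deform compilation without inlining or O3, and one at 600000 gets every tier.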

flowchart TD
  init["ExecInitExpr / ExecInitNode"] --> jce["jit_compile_expr(state)"]
  jce --> lce["llvm_compile_expr"]
  lce --> mut["llvm_mutable_module:<br/>get/create LLVM module"]
  mut --> sw["switch over steps[]:<br/>emit IR per opcode"]
  sw --> fetchq{"FETCHSOME step?"}
  fetchq -- yes --> deform["slot_compile_deform:<br/>tupledesc-specialized<br/>deform fn into same module"]
  fetchq -- no --> sw
  sw --> install["install ExecRunCompiledExpr<br/>as evalfunc (no emit yet)"]
  install --> firstcall["first evaluation"]
  firstcall --> getfn["llvm_get_function"]
  getfn --> cmod["llvm_compile_module:<br/>inline -> optimize -> ORC add"]
  cmod --> orc["ORC materializes machine code<br/>on symbol lookup"]
  orc --> native["native fn pointer cached in state->evalfunc"]

The JIT subsystem splits cleanly across the provider boundary. Below, the symbols are grouped by file and call-flow; line numbers are deferred to the position-hint table at the end of the section.

Provider-independent core (src/backend/jit/jit.c)

  • GUC variables. jit_enabled, jit_provider, jit_expressions, jit_tuple_deforming, jit_above_cost, jit_inline_above_cost, jit_optimize_above_cost, jit_debugging_support, jit_profiling_support, jit_dump_bitcode — the knobs read by the planner and provider.
  • provider_init — lazy, one-shot loader. Probes $pkglibdir/<jit_provider>$DLSUFFIX with pg_file_exists, then load_external_functions the _PG_jit_provider_init symbol and caches success/failure in provider_successfully_loaded / provider_failed_loading.
  • pg_jit_available — the SQL-callable wrapper that forces a load attempt and returns whether a provider is usable.
  • jit_compile_expr — the single entry gate. Checks state->parent, PGJIT_PERFORM, PGJIT_EXPR, then provider_init() before delegating to provider.compile_expr.
  • jit_release_context, jit_reset_after_error — forward to the provider’s release_context / reset_after_error callbacks.
  • InstrJitAgg — folds per-context JitInstrumentation counters (created functions, generation/deform/inlining/optimization/emission time) into an aggregate for EXPLAIN.
  • JitProviderCallbacks (in jit/jit.h) and the PGJIT_* flag macros (PGJIT_PERFORM, PGJIT_OPT3, PGJIT_INLINE, PGJIT_EXPR, PGJIT_DEFORM) — the provider contract and the per-query flag bits.

LLVM provider core (src/backend/jit/llvm/llvmjit.c)

  • _PG_jit_provider_init — populates the three callbacks (reset_after_error, release_context, compile_expr).
  • llvm_create_context / llvm_release_context — allocate/free a LLVMJitContext, register it with the current ResourceOwner (via ResourceOwnerRememberJIT and the jit_resowner_desc) so it is cleaned up on error or transaction end; track llvm_jit_context_in_use_count.
  • llvm_recreate_llvm_context — periodically disposes and recreates the shared LLVMContextRef (after LLVMJIT_LLVM_CONTEXT_REUSE_MAX uses) to bound the type leakage that inlining causes; calls llvm_inline_reset_caches first.
  • llvm_mutable_module — returns the in-progress LLVMModuleRef, creating one (with the right triple/layout) if none is pending.
  • llvm_expand_funcname — produces a unique externally-visible function name (<base>_<module_generation>_<counter>) and bumps created_functions.
  • llvm_get_function — forces llvm_compile_module if the module is not yet compiled, then LLVMOrcLLJITLookups the symbol, accumulating emission_counter (ORC emits lazily on lookup).
  • llvm_pg_var_type / llvm_pg_var_func_type / llvm_pg_func — pull type and function signatures out of the bitcode-loaded llvm_types_module, keeping JIT IR in sync with C structs.
  • llvm_function_reference — resolves an fcinfo to a named callee (pgextern.<mod>.<fn>, internal name, or a constant pointer global), enabling direct/inlinable calls.
  • llvm_optimize_module — selects the pass pipeline from the context flags (default<O0>,mem2reg vs. default<O3>), with an inliner threshold of 512.
  • llvm_compile_module — runs llvm_inline (if PGJIT_INLINE), optimizes, optionally dumps bitcode (jit_dump_bitcode), then adds the module to the opt0/opt3 ORC LLJIT instance via LLVMOrcLLJITAddLLVMIRModuleWithRT.
  • llvm_session_initialize / llvm_shutdown — one-time per-backend setup of the native target, host CPU/features, opt0/opt3 target machines, and the two LLVMOrcLLJITRef instances; shutdown disposes them on proc_exit.
  • llvm_create_types — loads llvmjit_types.bc and binds the global LLVMTypeRefs (StructTupleTableSlot, StructExprState, StructFunctionCallInfoData, the AttributeTemplate, …).
  • llvm_split_symbol_name / llvm_resolve_symbol / llvm_create_jit_instance — symbol resolution plumbing for ORC, including the custom definition generator that resolves SQL-callable functions and main-binary symbols.

Expression code generation (src/backend/jit/llvm/llvmjit_expr.c)

  • llvm_compile_expr — the heart of the provider: creates the evalexpr function, loads slot value/null arrays, pre-allocates one basic block per ExprState step, and switches over ExecEvalStepOp to emit IR per opcode. Mirrors ExecInterpExpr in execExprInterp.c.
  • Opcode arms — representative cases: EEOP_DONE_RETURN (store isnull, return value), EEOP_*_FETCHSOME (deform trigger), EEOP_*_VAR (direct slot load), EEOP_CONST, EEOP_FUNCEXPR / EEOP_FUNCEXPR_STRICT (null-check chain + BuildV1Call), EEOP_QUAL (short-circuit on null/false), EEOP_PARAM_EXTERN (delegate to interpreter helper).
  • BuildV1Call — emits the direct V1 function call, stores isnull, and (LLVM < 22) the llvm.lifetime.end annotation via create_LifetimeEnd.
  • build_EvalXFuncInt (and the build_EvalXFunc macro) — assembles a direct call into a named ExecEval* interpreter helper for the long-tail opcodes.
  • ExecRunCompiledExpr — the thunk installed as state->evalfunc; validates the expression, triggers lazy emission via llvm_get_function, caches the native pointer, and tail-calls it.

Tuple deforming (src/backend/jit/llvm/llvmjit_deform.c)

  • slot_compile_deform — builds a TupleDesc-specialized deform function; declines virtual/unknown slot kinds; precomputes guaranteed_column_number and known_alignment; emits per-column load / store / pointer-advance with constant strides where possible and varsize_any/strlen calls (marked always-inline) for varlena/cstring.

Inlining (src/backend/jit/llvm/llvmjit_inline.cpp)

  • llvm_inline — entry point; builds an import plan with llvm_build_inline_plan (consulting function_inlinable) and applies it with llvm_execute_inline_plan, pulling operator bodies out of the installed $pkglibdir/bitcode/ summaries.
  • llvm_inline_reset_caches — drops cached bitcode modules before the shared LLVMContextRef is recreated.

Cost gating (src/backend/optimizer/plan/planner.c)

  • standard_planner (jitFlags block) — compares top_plan->total_cost to jit_above_cost / jit_optimize_above_cost / jit_inline_above_cost and sets the PlannedStmt.jitFlags consumed at execution time.

Position hints (as of 2026-06-05, REL_18 273fe94)

| Symbol | File | Line |
|---|---|---|
| provider_init | src/backend/jit/jit.c | 67 |
| pg_jit_available | src/backend/jit/jit.c | 56 |
| jit_compile_expr | src/backend/jit/jit.c | 151 |
| jit_release_context | src/backend/jit/jit.c | 137 |
| jit_reset_after_error | src/backend/jit/jit.c | 127 |
| InstrJitAgg | src/backend/jit/jit.c | 182 |
| jit_above_cost (GUC default) | src/backend/jit/jit.c | 39 |
| PGJIT_PERFORM / PGJIT_DEFORM | src/include/jit/jit.h | 20 / 24 |
| _PG_jit_provider_init | src/backend/jit/llvm/llvmjit.c | 151 |
| llvm_create_context | src/backend/jit/llvm/llvmjit.c | 223 |
| llvm_release_context | src/backend/jit/llvm/llvmjit.c | 252 |
| llvm_recreate_llvm_context | src/backend/jit/llvm/llvmjit.c | 173 |
| llvm_mutable_module | src/backend/jit/llvm/llvmjit.c | 316 |
| llvm_expand_funcname | src/backend/jit/llvm/llvmjit.c | 341 |
| llvm_get_function | src/backend/jit/llvm/llvmjit.c | 362 |
| llvm_function_reference | src/backend/jit/llvm/llvmjit.c | 540 |
| llvm_optimize_module | src/backend/jit/llvm/llvmjit.c | 603 |
| llvm_compile_module | src/backend/jit/llvm/llvmjit.c | 709 |
| llvm_session_initialize | src/backend/jit/llvm/llvmjit.c | 825 |
| llvm_create_types | src/backend/jit/llvm/llvmjit.c | 995 |
| llvm_resolve_symbol | src/backend/jit/llvm/llvmjit.c | 1087 |
| llvm_create_jit_instance | src/backend/jit/llvm/llvmjit.c | 1220 |
| llvm_compile_expr | src/backend/jit/llvm/llvmjit_expr.c | 80 |
| EEOP_*_FETCHSOME arm | src/backend/jit/llvm/llvmjit_expr.c | 344 |
| EEOP_*_VAR arm | src/backend/jit/llvm/llvmjit_expr.c | 444 |
| EEOP_FUNCEXPR_STRICT arm | src/backend/jit/llvm/llvmjit_expr.c | 665 |
| ExecRunCompiledExpr | src/backend/jit/llvm/llvmjit_expr.c | 2988 |
| BuildV1Call | src/backend/jit/llvm/llvmjit_expr.c | 3008 |
| build_EvalXFuncInt | src/backend/jit/llvm/llvmjit_expr.c | 3060 |
| create_LifetimeEnd | src/backend/jit/llvm/llvmjit_expr.c | 3090 |
| slot_compile_deform | src/backend/jit/llvm/llvmjit_deform.c | 34 |
| llvm_inline | src/backend/jit/llvm/llvmjit_inline.cpp | 167 |
| llvm_inline_reset_caches | src/backend/jit/llvm/llvmjit_inline.cpp | 156 |
| llvm_build_inline_plan | src/backend/jit/llvm/llvmjit_inline.cpp | 183 |
| function_inlinable | src/backend/jit/llvm/llvmjit_inline.cpp | 125 |
| standard_planner jitFlags | src/backend/optimizer/plan/planner.c | 604 |
| jit_above_cost GUC entry | src/backend/utils/misc/guc_tables.c | 3960 |

All claims below were checked against the REL_18 tree at commit 273fe94852b3a7e34fd171e8abdf1481beb302fa (2026-06-05).

  • Provider abstraction is three callbacks. Confirmed: _PG_jit_provider_init sets exactly reset_after_error, release_context, and compile_expr (llvmjit.c). The core in jit.c references LLVM through static JitProviderCallbacks provider only.
  • Lazy, cached library load. Confirmed: provider_init returns early on provider_failed_loading / provider_successfully_loaded, probes with pg_file_exists before load_external_function, and sets provider_failed_loading = true before calling init so a throwing init is not retried.
  • jit_compile_expr guards. Confirmed: it returns false when state->parent is NULL, or PGJIT_PERFORM/PGJIT_EXPR are unset, before ever calling the provider.
  • GUC defaults. Confirmed in jit.c: jit_above_cost = 100000, jit_inline_above_cost = 500000, jit_optimize_above_cost = 500000, jit_enabled = true, jit_expressions = true, jit_tuple_deforming = true. The GUC table entries are in guc_tables.c.
  • Planner sets flags from cost. Confirmed: standard_planner sets PGJIT_PERFORM when top_plan->total_cost > jit_above_cost, and layers PGJIT_OPT3 / PGJIT_INLINE / PGJIT_EXPR / PGJIT_DEFORM from the further thresholds and the jit_expressions / jit_tuple_deforming GUCs.
  • Opcode switch mirrors the interpreter. Confirmed: llvm_compile_expr iterates state->steps[0 .. steps_len-1], calls ExecEvalStepOp, and has arms for the full ExprEvalOp enum down to EEOP_LAST (an Assert(false)).
  • Deform specialization facts. Confirmed: slot_compile_deform returns NULL for TTSOpsVirtual and any slot kind other than heap/buffer-heap/minimal; computes guaranteed_column_number from ATTNULLABLE_VALID && !atthasmissing && !attisdropped; advances the data pointer by l_sizet_const(att->attlen) for fixed-width columns and by varsize_any / strlen for attlen == -1 / -2.
  • Lazy emission. Confirmed: llvm_compile_expr installs ExecRunCompiledExpr (not a compiled pointer) as state->evalfunc; llvm_compile_module is invoked from llvm_get_function, and the comment in llvm_compile_module notes ORC “doesn’t actually emit code … happens lazily the first time a symbol … is requested.”
  • Inlining pulls from bitcode. Confirmed by the README (operators compiled to $pkglibdir/bitcode/postgres/ with an index) and by the call chain llvm_inline → llvm_build_inline_plan → llvm_execute_inline_plan.
  • ResourceOwner cleanup. Confirmed: jit_resowner_desc with RELEASE_PRIO_JIT_CONTEXTS and ResOwnerReleaseJitContext; contexts are remembered in llvm_create_context and forgotten in llvm_release_context.
  • Caveat — not yet cached across queries. The README “Caching” section states generated functions are not reused across executions because they embed pointers into per-execution memory; there is no IR/function cache in the REL_18 tree. Treat any claim of cross-query reuse as false.
  • Out of scope here. The ExprState/ExprEvalStep linearization, the interpreter dispatch it mirrors, and plan cost computation are covered in postgres-expression-eval.md, postgres-executor.md, and postgres-cost-model.md; this doc does not re-derive them.
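
The "lazy, cached library load" item in the list above relies on a small ordering trick: the failure flag is set before the init routine runs and is cleared only after it returns. A minimal model of that ordering follows, using `setjmp`/`longjmp` as a stand-in for PostgreSQL's error unwinding. All `toy_*` names are hypothetical, and the file probe and library load are reduced to boolean parameters.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical session state mirroring the two flags named above. */
bool toy_failed_loading = false;
bool toy_successfully_loaded = false;
int  toy_init_calls = 0;

/* Stand-in for the error-recovery point an ERROR would unwind to. */
jmp_buf toy_error_context;

/* Stand-in for the provider's init routine; may "throw". */
void
toy_init(bool should_throw)
{
    toy_init_calls++;
    if (should_throw)
        longjmp(toy_error_context, 1);
}

/* Returns true if the provider is usable in this session. */
bool
toy_provider_init(bool file_exists, bool init_throws)
{
    /* Cached outcomes: never probe or load twice. */
    if (toy_failed_loading)
        return false;
    if (toy_successfully_loaded)
        return true;

    /* Cheap probe before the expensive load. */
    if (!file_exists)
    {
        toy_failed_loading = true;
        return false;
    }

    /* Pessimistic: if init unwinds, the flag stays set. */
    toy_failed_loading = true;
    toy_init(init_throws);

    toy_successfully_loaded = true;
    toy_failed_loading = false;
    return true;
}
```

If `toy_init` unwinds on the first attempt, every later call to `toy_provider_init` returns false immediately without invoking it again, which is the "throwing init is not retried" behaviour.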

Beyond PostgreSQL — Comparative Designs & Research Frontiers

PostgreSQL’s JIT sits at a deliberately conservative point in a rich design space. Placing it against the alternatives clarifies both what it buys and what it leaves on the table.

Expression JIT vs. whole-pipeline compilation (HyPer)

PostgreSQL compiles expressions and deforming but keeps the executor an interpreter: each plan node still pulls tuples through the volcano-style ExecProcNode iterator, and only the per-tuple expression and deform hot spots become native. Thomas Neumann’s HyPer takes the opposite tack with the produce/consume (push) model, compiling an entire pipeline of operators into a single tight loop with no per-tuple function-call boundaries between operators at all — data stays in CPU registers across operator boundaries until a pipeline-breaker (hash build, sort) forces materialization. The trade-off is stark: HyPer’s approach removes far more overhead but requires the whole executor to be code-generated, a much larger engineering commitment. PostgreSQL’s README explicitly lists “compiling larger parts of queries” as future work and notes the obvious-seeming approach of JITing individual expressions after N executions “turns out not to work too well” because emitting many small functions has high per-function overhead — the same observation that pushed HyPer toward whole-pipeline fusion.
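
The difference between the two models is easiest to see on a toy scan, filter and sum pipeline. The sketch below is purely illustrative: `ToyNode` is a hypothetical iterator interface, not PostgreSQL's PlanState, and the fused function is written by hand where a compiling engine would generate it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pull (Volcano-style) model: one indirect call per tuple per operator. */
typedef struct ToyNode ToyNode;
struct ToyNode
{
    bool       (*next) (ToyNode *self, int *out);  /* false when exhausted */
    ToyNode    *child;      /* input operator, NULL for a scan */
    const int  *rows;       /* scan only: the table */
    size_t      nrows;
    size_t      pos;
    int         threshold;  /* filter only: keep values > threshold */
};

bool
toy_scan_next(ToyNode *self, int *out)
{
    if (self->pos >= self->nrows)
        return false;
    *out = self->rows[self->pos++];
    return true;
}

bool
toy_filter_next(ToyNode *self, int *out)
{
    while (self->child->next(self->child, out))
    {
        if (*out > self->threshold)
            return true;
    }
    return false;
}

long
toy_sum_pull(ToyNode *top)
{
    long sum = 0;
    int  v;

    assert(top != NULL);
    while (top->next(top, &v))
        sum += v;
    return sum;
}

/* Fused pipeline: the same query as one loop, values stay in registers. */
long
toy_sum_fused(const int *rows, size_t nrows, int threshold)
{
    long sum = 0;

    for (size_t i = 0; i < nrows; i++)
    {
        if (rows[i] > threshold)
            sum += rows[i];
    }
    return sum;
}
```

Both entry points compute the same result; the pull version pays two indirect calls per input row, while the fused loop has none. PostgreSQL's JIT leaves the `next`-style calls in place and only makes the work inside each operator cheaper.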

Holistic query compilation (Krikellas et al.)

Krikellas, Viglas, and Cintra’s holistic model (the “generate, compile, link, execute” pipeline) predates HyPer’s push model. It takes a query plan, emits C source specialized to it, and then invokes the system C compiler, which makes it closer in spirit to what PostgreSQL does. PostgreSQL’s choice of LLVM over emitting and compiling C is pragmatic: it avoids a hard runtime dependency on a full C toolchain and an on-disk compile step, getting IR directly via the LLVM C API and the Clang-emitted bitcode for operators. The cost is the LLVM dependency itself, which is precisely why the provider is a separately loadable shared object.

Vectorized interpretation vs. compilation (MonetDB/X100, DuckDB)

A competing answer to interpreter overhead is vectorization rather than compilation: instead of generating native code per query, process tuples in batches (vectors) so the interpreter’s dispatch cost is amortized over a whole vector and the inner loops auto-vectorize. MonetDB/X100 and, more recently, DuckDB take this route and avoid compilation latency entirely. The 2018 “Everything you always wanted to know about compiled and vectorized queries but were afraid to ask” study (Kersten et al.) found the two approaches roughly competitive, with compilation favoring complex expression-heavy queries and vectorization favoring simpler, memory-bound ones. PostgreSQL is neither fully vectorized nor fully compiled: its executor remains a tuple-at-a-time interpreter, with JIT bolted onto the two spots where per-tuple compilation pays off, and the cost thresholds steer it toward exactly the expression-heavy analytic queries where compilation wins.
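
A vector-at-a-time interpreter for the same kind of filter-and-sum query might look like the sketch below. It is a generic illustration of the technique, not code from MonetDB/X100 or DuckDB: the primitive names, the selection-vector representation and the vector size of 1024 are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VECTOR_SIZE 1024

/*
 * Primitive 1: write the positions of qualifying values into sel[].
 * One call handles a whole vector; the loop body is branch-free.
 */
size_t
toy_select_gt(const int *col, size_t n, int threshold, size_t *sel)
{
    size_t k = 0;

    assert(n <= TOY_VECTOR_SIZE);
    for (size_t i = 0; i < n; i++)
    {
        sel[k] = i;                 /* overwritten unless the row qualifies */
        k += (col[i] > threshold);
    }
    return k;
}

/* Primitive 2: sum the values at the selected positions. */
long
toy_sum_selected(const int *col, const size_t *sel, size_t nsel)
{
    long sum = 0;

    for (size_t j = 0; j < nsel; j++)
        sum += col[sel[j]];
    return sum;
}

/* Driver: the interpreter dispatches once per vector, not once per row. */
long
toy_sum_vectorized(const int *col, size_t nrows, int threshold)
{
    size_t sel[TOY_VECTOR_SIZE];
    long   total = 0;

    for (size_t start = 0; start < nrows; start += TOY_VECTOR_SIZE)
    {
        size_t n = nrows - start;
        size_t nsel;

        if (n > TOY_VECTOR_SIZE)
            n = TOY_VECTOR_SIZE;
        nsel = toy_select_gt(col + start, n, threshold, sel);
        total += toy_sum_selected(col + start, sel, nsel);
    }
    return total;
}
```

The primitives are ordinary precompiled functions, so no code generation happens at query time; the per-call overhead is simply spread over up to 1024 rows, and the tight inner loops are the kind a C compiler can often vectorize.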

Caching and adaptive compilation — the open frontiers

The largest gap the README itself flags is caching: generated functions embed absolute pointers into per-execution memory, so they cannot currently be reused across executions or tied to prepared statements. The fix it sketches — make ExprState reference per-execution memory as offsets from a single base block — is a prerequisite for an LRU cache keyed on the generated IR, and for moving expression compilation into the planner so a prepared statement carries its compiled form. Beyond that, an adaptive (“tiered”) JIT — start interpreting or compile at -O0, then rebuild an optimized version in a background thread once a query proves long-running — is the standard technique in managed-language VMs (HotSpot, V8) and is named as a “further off” possibility. PostgreSQL’s all-or-nothing, cost-gated, single-shot model is simpler and avoids the bookkeeping of profiling counters, at the price of mispredicting on queries whose true cost diverges from the planner’s estimate.

The net picture: PostgreSQL chose maintainability and optionality over peak throughput. The op-for-op mirror of the interpreter keeps the two implementations in lockstep; the bitcode-from-C-sources trick avoids a second copy of every operator; the provider shared library keeps LLVM out of the base binary; and the planner-cost gate reuses an estimate the system already computes. The costs are real — no cross-query caching, no whole-pipeline fusion, occasional mis-triggering when cost estimates are wrong (a frequent source of “JIT made my query slower” reports) — but each is a conscious trade in favor of a JIT that an existing, extensible, interpreter-based engine can actually ship and maintain.

  • PostgreSQL REL_18 source (commit 273fe94852b3a7e34fd171e8abdf1481beb302fa, 2026-06-05):
    • src/backend/jit/jit.c: provider-independent core, GUCs, entry gate.
    • src/backend/jit/README: design rationale (what/why/how/when to JIT, shared-library separation, JIT context, error handling, type sync, inlining, caching limitations).
    • src/backend/jit/llvm/llvmjit.c: LLVM provider core: context lifecycle, module/function management, optimization, ORC emission, session setup, type loading, symbol resolution.
    • src/backend/jit/llvm/llvmjit_expr.c: per-opcode IR generation; BuildV1Call, build_EvalXFuncInt, ExecRunCompiledExpr.
    • src/backend/jit/llvm/llvmjit_deform.c: slot_compile_deform, TupleDesc-specialized deforming.
    • src/backend/jit/llvm/llvmjit_inline.cpp: bitcode-based operator inlining.
    • src/backend/jit/llvm/llvmjit_types.c: type/function signature synchronization between C and JIT IR.
    • src/include/jit/jit.h, src/include/jit/llvmjit.h: PGJIT_* flags, JitProviderCallbacks, LLVMJitContext.
    • src/backend/optimizer/plan/planner.c: standard_planner cost-based jitFlags assignment.
    • src/backend/utils/misc/guc_tables.c: jit_above_cost / jit_inline_above_cost / jit_optimize_above_cost GUC definitions.
  • Textbook background: knowledge/research/dbms-general/ captures of Database System Concepts (Silberschatz, Korth, Sudarshan; query processing / the iterator model) and Database Internals (Petrov; query execution).
  • Research lineage (named for orientation; see knowledge/research/dbms-papers/ where captured):
    • T. Neumann, “Efficiently Compiling Efficient Query Plans for Modern Hardware” (VLDB 2011) — HyPer produce/consume push model.
    • K. Krikellas, S. Viglas, M. Cintra, “Generating Code for Holistic Query Evaluation” (ICDE 2010).
    • P. Boncz, M. Zukowski, N. Nes, “MonetDB/X100: Hyper-Pipelining Query Execution” (CIDR 2005) — vectorized execution.
    • T. Kersten et al., “Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask” (VLDB 2018).
  • Cross-references (sibling docs in this folder): postgres-expression-eval.md (the ExprState/ExprEvalStep linearization and interpreter this JIT mirrors), postgres-executor.md (the surrounding node-iterator machinery and where es_jit / es_jit_flags live on EState), postgres-cost-model.md (how total_cost — the quantity the JIT thresholds compare against — is computed).