Skip to content

CUBRID XASL Generator — Compiling the Optimized Plan Tree to a Server-Side Execution Tree

Contents:

Once an optimizer has chosen a plan, the plan is still not the thing a database engine runs. Database Internals (Petrov, Ch. 12 “Query Processing”) names the gap explicitly: the optimizer produces a plan, the executor consumes an executable form, and somebody has to walk the gap. The plan is a declarative artifact — “use index X for table A, then nested-loop join into B, then sort by C” — built from cost estimates and access-method selections. The executable form is a procedural artifact — a tree of operators each of which knows how to open, fetch, and close itself, with concrete buffers, predicate evaluators, and tuple layouts wired in. Codd’s relational algebra and Selinger’s System R cost model both stop at the plan boundary; they say nothing about how σ(p)(R) becomes a function the CPU can call.

Three textbook problems live in that gap, and they shape every plan-to-IR layer in every relational engine:

  1. Choosing an executor model. Database Internals Ch. 12 describes the iterator (Volcano) model, where each operator exposes open() / next() / close() and parents pull from children. Newer engines bolt on push-based execution (vectorised, MonetDB-style) or push-down compilation (HyPer, LLVM-IR JITted code). The choice changes what the IR holds. A pure iterator IR encodes one open/next/close triple per operator; a vectorised IR encodes batch-at-a-time loops; a compiled IR encodes generated code fragments.
  2. Naming the data inside the plan. Predicates and projections in the plan still reference parser-level objects (column names, expression trees, host variables). The IR needs to dereference these to memory locations the executor can read at runtime — descriptors that say “attribute id 3 of class oid X, please fetch into this DB_VALUE slot.” A dedicated sub-IR for runtime variables is universal: PostgreSQL’s Var/Const/Param/Aggref nodes, MySQL’s Item tree, Oracle’s “row source” expressions, and CUBRID’s REGU_VARIABLE are all the same shape.
  3. Serialising the IR for client-server execution. Many engines run optimisation client-side and execution server-side. CUBRID is one of them — xasl_generation.c compiles in the client; query_executor.c runs in the server. The IR must therefore round-trip through a byte stream. Pointers cannot be sent over the wire, so the serializer has to flatten the cyclic, sharing-rich IR into a self-describing buffer with offset-based references and an address-already-seen table. Designing Data-Intensive Applications (Kleppmann) Ch. 4 calls this positional encoding and notes its requirement: the encoder must visit each shared sub-tree once and emit subsequent references as offsets, otherwise the wire size blows up exponentially.

After these are named, every CUBRID-specific structure in this document either implements one of them or makes the resulting data structure efficient on the wire.

Every relational engine that compiles SQL down to a procedural plan tree adopts the same handful of patterns on top of the algebra. They are not in Selinger’s original System R paper; they are the engineering vocabulary that lives between the optimizer’s chosen plan and the runtime that executes it.

PostgreSQL’s nodes/plannodes.h defines Plan as the abstract parent of SeqScan, IndexScan, BitmapHeapScan, NestLoop, MergeJoin, HashJoin, Agg, Sort, Limit, Append, SetOp, ModifyTable, and friends — one C struct per physical operator, all linked by lefttree and righttree pointers so the tree is a normal binary plan tree. The executor walks this tree top-down at startup (ExecInitNode) and pull-fetches tuples from it bottom-up at runtime (ExecProcNode). MySQL 8.x’s iterator executor has the same shape with RowIterator and AccessPath (sql/access_path.h) classes; SQL Server’s showplan XML is the textual rendering of an analogous tree. The pattern is universal: one tree node per physical operator, child links per join leg or input source, and a common header that says what kind of operator this is.

CUBRID picks a more compact form. A single XASL_NODE struct holds all operator types via a PROC_TYPE type discriminator and a tagged union proc covering the variants. Where PostgreSQL has 30+ struct types under Plan, CUBRID has one struct with 17 PROC_TYPE values. Each proc carries its own specialisation in the union arm; the common fields (predicates, output list, sort spec, limit, sub-pointers) live directly on xasl_node and are reused across procs. This trades type safety for memory locality and makes the serializer trivial (one process function per discriminator).

Every leaf operator in the IR has to refer to a physical access path: a heap scan, an index scan, a list-file scan, or a constructed set of values. The name varies — PostgreSQL’s Path, MySQL’s AccessPath, Spark’s SparkPlan — but the purpose is identical: encode “where does the tuple stream come from” separately from “what do we do with it”. CUBRID factors this out as the ACCESS_SPEC_TYPE chain hanging off xasl->spec_list, with a HYBRID_NODE union picking the flavour (class scan, list scan, set scan, JSON-table scan, showstmt, dblink, etc.). For an index scan the access spec points to an INDX_INFO describing range type (R_KEY, R_RANGE, R_KEYLIST, …), btree id, and key ranges; for a heap scan it carries a class oid and attribute descriptors.

Predicates and projections survive the optimizer as expression trees referencing parser-level names. The executor cannot evaluate t.col1 = 7 against parser names; it needs an expression IR whose leaves are slots in tuple buffers, whose constants are pre-bound DB_VALUEs, and whose nodes know their operation kind without a string lookup. PostgreSQL’s ExprState (the runtime evaluator) is generated from the parse-tree expression by ExecInitExpr; MySQL keeps the parsed Item tree but caches resolved field positions on each evaluation. CUBRID’s REGU_VARIABLE is the equivalent — a discriminated union of “constant DB_VALUE”, “attribute id with offset”, “result of an arithmetic node”, “function call”, “position in a tuple list”. The XASL generator’s pt_to_regu_* family is the translator; the executor’s fetch_peek_dbval is the consumer.

Subquery placement: where do nested plans live?

Section titled “Subquery placement: where do nested plans live?”

A subquery in SQL becomes a plan tree of its own, and the parent plan needs to know when to run it. There are three canonical strategies:

  • Hoist into the outer plan as a join — the subquery is rewritten into a join during query rewrite and the IR carries no subquery node at all (PostgreSQL’s PLANNER does this for most uncorrelated IN subqueries via convert_ANY_sublink).
  • Pre-compute once and feed in — the subquery is run before the outer scan and its result list-file is bound as an input (PostgreSQL’s InitPlan). This works only for uncorrelated subqueries; the IR carries a list of init-plans on the parent.
  • Re-evaluate per outer row — a correlated subquery has to rerun on every outer row because its predicates reference outer columns; the IR carries it as a sub-plan attached to the outer (PostgreSQL’s SubPlan).

CUBRID encodes all three on every XASL_NODE: aptr_list (ahead pointer) holds uncorrelated subqueries that run once before the parent; dptr_list (driving pointer) holds correlated subqueries that re-run per outer tuple; scan_ptr holds the inner side of a join chained off the current scan. A single SELECT with both kinds of subquery and a multi-way join compiles into a single XASL whose child slots cover all three roles in one struct.

When optimisation runs on the client and execution runs on the server, the IR has to travel as bytes. Every system that does this faces the same three problems: shared subtrees must be emitted once (or the encoding blows up), pointers must be rewritten as offsets, and the receiver has to allocate a contiguous arena before unpacking. PostgreSQL’s outfuncs.c / readfuncs.c use a textual encoding with (@1) back-references for shared trees. CUBRID’s xts_map_xasl_to_stream / stx_map_stream_to_xasl use a binary stream with a visited-pointer hash table (XTS_VISITED_PTR) and offsets into a single xts_Stream_buffer, plus a small header that names class oids and lock requirements before the body even starts — so the server can take the locks before allocating XASL.

Theoretical conceptCUBRID name
Plan tree nodeXASL_NODE (src/query/xasl.h)
Physical operator discriminatorPROC_TYPE { UNION_PROC, DIFFERENCE_PROC, INTERSECTION_PROC, OBJFETCH_PROC, BUILDLIST_PROC, BUILDVALUE_PROC, SCAN_PROC, MERGELIST_PROC, HASHJOIN_PROC, UPDATE_PROC, DELETE_PROC, INSERT_PROC, CONNECTBY_PROC, DO_PROC, MERGE_PROC, BUILD_SCHEMA_PROC, CTE_PROC } (xasl.h)
Operator-specific payloadxasl_node::proc tagged union (buildlist, buildvalue, union_, mergelist, hashjoin, update, insert, delete_, connect_by, merge, cte, fetch)
Access pathACCESS_SPEC_TYPE chain in xasl->spec_list; HYBRID_NODE union picks class / list / set / showstmt / json-table / dblink / method / regu-value
Runtime expression IRREGU_VARIABLE (regu_var.hpp) with TYPE_CONSTANT, TYPE_ATTR_ID, TYPE_POSITION, TYPE_INARITH, TYPE_FUNC etc.
Predicate IRPRED_EXPR (xasl/xasl_predicate.hpp); T_PRED / T_EVAL_TERM / T_NOT_TERM discriminator
Output projectionOUTPTR_LIST / REGU_VARIABLE_LIST chain hanging off xasl->outptr_list
Per-spec column bufferVAL_LIST of QPROC_DB_VALUE_LIST nodes; reused between sibling regus that name the same column
Pre-computed (uncorrelated) subqueryxasl->aptr_list
Correlated subqueryxasl->dptr_list
Inner side of join chained off current scanxasl->scan_ptr
Path-fetch (OBJFETCH) sub-plansxasl->bptr_list, xasl->fptr_list
Index access metadataINDX_INFO referenced by ACCESS_SPEC_TYPE::indexptr (R_KEY, R_RANGE, R_KEYLIST, R_RANGE_LIST, …)
Predicate site within a scan (vertical / horizontal / heap)where_range, where_key, where_pred on ACCESS_SPEC_TYPE
Top-of-walk entry from PT_NODE → XASL_NODEparser_generate_xasl (xasl_generation.c)
Per-SELECT plan compilerpt_plan_query (xasl_generation.c)
Multi-row vs single-row select dispatchpt_to_buildlist_proc / pt_to_buildvalue_proc (xasl_generation.c)
Plan walker that fills spec/scan/subquery slotsqo_to_xaslgen_outer / gen_inner (optimizer/plan_generation.c)
Parser-tree → expression IRpt_to_regu_variable, pt_to_pred_expr, pt_to_outlist, pt_to_val_list, pt_to_index_info, pt_to_spec_list (xasl_generation.c)
Aptr / dptr injectorspt_set_aptr, pt_set_dptr (xasl_generation.c)
XASL allocatorregu_xasl_node_alloc (xasl_regu_alloc.cpp)
Serialization driverxts_map_xasl_to_stream (query/xasl_to_stream.c)
Per-XASL pack functionxts_save_xasl_node, xts_process_xasl_node (query/xasl_to_stream.c)
Visited-pointer offset tablexts_get_offset_visited_ptr, xts_mark_ptr_visited (query/xasl_to_stream.c)
Server-side unpackerstx_init_xasl_unpack_info, stream_to_xasl (query/stream_to_xasl.c, xasl/xasl_stream.cpp)

The XASL generator has four moving parts: the entry-point walk that turns a query-shaped PT_NODE into a tree of XASL procs, the plan-driven inner walk (gen_outer / gen_inner) that fills in spec lists, scan pointers, and subquery slots, the sub-IR builders (pt_to_regu_variable, pt_to_pred_expr, pt_to_outlist, pt_to_val_list, pt_to_index_info, pt_to_spec_list) that compile expressions, predicates, projections, and access specs, and the serializer (xts_map_xasl_to_stream and friends) that turns the resulting tree of pointers into a flat byte stream the server can unpack. We walk them in that order.

flowchart LR
  subgraph CLIENT["Client (xasl_generation.c, plan_generation.c)"]
    PT["PT_NODE tree<br/>(name-resolved,<br/>type-checked,<br/>view-rewritten)"]
    QO["QO_PLAN tree<br/>(optimizer output:<br/>scan / join / sort / follow)"]
    PGX["parser_generate_xasl<br/>· parser_generate_xasl_post<br/>(parser_walk_tree)"]
    PPQ["pt_plan_query<br/>per PT_SELECT"]
    BLP["pt_to_buildlist_proc<br/>(multi-row)"]
    BVP["pt_to_buildvalue_proc<br/>(single-row)"]
    POL["pt_to_outlist<br/>pt_to_val_list"]
    PSA["pt_set_aptr"]
    PGOP["pt_gen_optimized_plan"]
    QTX["qo_to_xasl<br/>gen_outer / gen_inner"]
    AAS["add_access_spec<br/>add_scan_proc<br/>add_subqueries"]
    PSD["pt_set_dptr"]
    XASL["XASL_NODE tree<br/>(in-memory, on client)"]
    XTS["xts_map_xasl_to_stream<br/>xts_save_xasl_node<br/>xts_process_xasl_node"]
    STREAM["XASL stream<br/>(packed bytes)"]
  end
  subgraph SERVER["Server (stream_to_xasl.c, query_executor.c)"]
    STX["stx_init_xasl_unpack_info<br/>stream_to_xasl"]
    XASLS["XASL_NODE tree<br/>(rebuilt, server side)"]
    EXEC["query_executor.c<br/>qexec_execute_mainblock"]
  end
  PT --> PGX
  PGX --> PPQ
  PPQ --> BLP
  PPQ --> BVP
  BLP --> POL
  BLP --> PSA
  BLP --> PGOP
  PGOP --> QTX
  QTX --> AAS
  AAS --> XASL
  BLP --> PSD
  PSD --> XASL
  QO --> PGOP
  XASL --> XTS
  XTS --> STREAM
  STREAM --> STX
  STX --> XASLS
  XASLS --> EXEC

The figure encodes three boundaries. (parser ↔ plan) the PT_NODE side is owned by xasl_generation.c under src/parser/; the QO_PLAN side is owned by plan_generation.c under src/optimizer/. They meet inside pt_gen_optimized_plan, which calls qo_to_xasl after allocating the parent XASL and seeding its outptr_list and aptr_list. (client ↔ server) the entire XASL generator lives client-side; serialization is the only path through which its output reaches the server. (buildlist ↔ buildvalue) a multi-row select compiles to BUILDLIST_PROC (which materialises into a temp list-file); a guaranteed-single-row select compiles to BUILDVALUE_PROC (which writes its result into a single tuple slot, no temp).

The node is wide: its common fields cover all proc kinds, and the proc-specific extras hide in the proc union.

// xasl_node — src/query/xasl.h
struct xasl_node
{
XASL_NODE_HEADER header; /* xasl_flag + id, packed first on the wire */
XASL_NODE *next; /* sibling XASL block (UNION_PROC chain etc.) */
PROC_TYPE type; /* discriminator: BUILDLIST_PROC, SCAN_PROC, … */
int flag;
QFILE_LIST_ID *list_id; /* materialised result handle */
// ... condensed: sort, limit, instnum/ordbynum bookkeeping ...
OUTPTR_LIST *outptr_list; /* projection (select list) */
ACCESS_SPEC_TYPE *spec_list; /* leaf access specs (class / list / set / …) */
VAL_LIST *val_list; /* per-tuple column buffer for spec_list */
XASL_NODE *aptr_list; /* CTEs + uncorrelated subqueries (run once, ahead) */
XASL_NODE *bptr_list; /* OBJFETCH_PROC list (path expressions) */
XASL_NODE *dptr_list; /* correlated subqueries (run per outer tuple) */
XASL_NODE *fptr_list; /* OBJFETCH after dptr */
XASL_NODE *scan_ptr; /* inner side of join chained off current scan */
XASL_NODE *connect_by_ptr; /* CONNECT BY xasl */
PRED_EXPR *during_join_pred; /* predicate evaluated during outer join match */
PRED_EXPR *after_join_pred; /* predicate evaluated after the join */
PRED_EXPR *if_pred; /* WHERE clause residual */
PRED_EXPR *instnum_pred; /* INST_NUM() < N */
union {
UNION_PROC_NODE union_; /* UNION/DIFFERENCE/INTERSECTION */
FETCH_PROC_NODE fetch; /* OBJFETCH_PROC */
BUILDLIST_PROC_NODE buildlist; /* multi-row result */
BUILDVALUE_PROC_NODE buildvalue; /* single-row aggregate result */
MERGELIST_PROC_NODE mergelist;
HASHJOIN_PROC_NODE hashjoin;
UPDATE_PROC_NODE update;
INSERT_PROC_NODE insert;
DELETE_PROC_NODE delete_;
CONNECTBY_PROC_NODE connect_by;
MERGE_PROC_NODE merge;
CTE_PROC_NODE cte;
} proc;
/* XASL cache + serialization metadata */
OID creator_oid;
int n_oid_list;
OID *class_oid_list;
int *class_locks;
int *tcard_list;
// ...
};

The four pointer slots (aptr_list, bptr_list, dptr_list, scan_ptr) plus next are how a single XASL_NODE structure encodes the four kinds of nested plan a SELECT can produce: a chain of unions (next), an outer-then-inner join chain (scan_ptr), an uncorrelated subquery that materialises before the parent (aptr_list), and a correlated subquery that re-runs per outer tuple (dptr_list). The deck’s slogan XASL(aptr, dptr, scan_ptr) is the executor’s mental model: for each outer row, run the dptrs, then descend the scan_ptr chain.

The compiler is an ordered walk over the post-rewrite PT_NODE, not a one-shot translation. It runs as a parser_walk_tree with a pre-order hook (parser_generate_xasl_pre) that pushes a fresh symbol-info frame and a post-order hook (parser_generate_xasl_post) that builds XASL bottom-up — so inner subqueries become XASL before the outer query that references them.

// parser_generate_xasl — src/parser/xasl_generation.c (condensed)
XASL_NODE *
parser_generate_xasl (PARSER_CONTEXT * parser, PT_NODE * node)
{
XASL_NODE *xasl = NULL;
// ... condensed: check abort, save next, walk for is_system_generated_stmt ...
switch (node->node_type)
{
case PT_SELECT:
case PT_UNION:
case PT_DIFFERENCE:
case PT_INTERSECTION:
node->info.query.is_subquery = (PT_MISC_TYPE) 0;
if (node) node = meth_translate (parser, node); /* method calls */
if (node)
{
xasl_Supp_info.query_list = parser_new_node (parser, PT_SELECT);
xasl_Supp_info.query_list->info.query.xasl = NULL;
pt_init_xasl_supp_info ();
/* the actual XASL build, bottom-up over subqueries */
node = parser_walk_tree (parser, node,
parser_generate_xasl_pre, NULL,
parser_generate_xasl_post, &xasl_Supp_info);
}
if (node && !pt_has_error (parser))
{
xasl = (XASL_NODE *) node->info.query.xasl;
}
break;
}
/* fill in XASL cache info: creator oid, class_oid_list, locks, tcard */
if (xasl)
{
// ... condensed: oid + locks + tcard arrays from xasl_Supp_info ...
}
return xasl;
}

The pre-order hook parser_generate_xasl_pre clears any stale info.query.xasl (the parse tree is reused on prepared statement re-execution) and calls pt_push_symbol_info to open a new symbol scope for this query level — symbols carry the table_info that pt_to_val_list will consume to build the per-spec column buffers.

The post-order hook parser_generate_xasl_post is where the work actually happens: when control returns from the children, all subqueries already have their XASL hung off info.query.xasl, so the parent’s call to parser_generate_xasl_proc only has to wire them in.

// parser_generate_xasl_post — src/parser/xasl_generation.c (condensed)
static PT_NODE *
parser_generate_xasl_post (PARSER_CONTEXT * parser, PT_NODE * node, void *arg, int *continue_walk)
{
XASL_NODE *xasl;
XASL_SUPP_INFO *info = (XASL_SUPP_INFO *) arg;
switch (node->node_type)
{
case PT_SELECT:
case PT_UNION:
case PT_DIFFERENCE:
case PT_INTERSECTION:
assert (node->info.query.xasl == NULL);
xasl = parser_generate_xasl_proc (parser, node, info->query_list);
pt_pop_symbol_info (parser); /* matched with the pre-order push */
// ... condensed: walk select.from for class oid list ...
break;
case PT_CTE:
xasl = parser_generate_xasl_proc (parser, node, info->query_list);
break;
}
return node;
}

parser_generate_xasl_proc dispatches by query node type (PT_SELECTpt_plan_query; PT_UNION etc. → set-op builder), and the result goes back onto node->info.query.xasl so the parent’s post-hook can find it.

Each SELECT’s compile is a two-step: ask the optimizer for a plan, then translate the plan plus the parse tree into XASL. The choice of BUILDLIST_PROC vs BUILDVALUE_PROC depends on whether the executor will produce one or many rows.

// pt_plan_query — src/parser/xasl_generation.c (condensed)
static XASL_NODE *
pt_plan_query (PARSER_CONTEXT * parser, PT_NODE * select_node)
{
XASL_NODE *xasl;
QO_PLAN *plan = NULL;
if (select_node->node_type != PT_SELECT)
return NULL;
/* 1) cost-based optimization */
plan = qo_optimize_query (parser, select_node);
/* if hint-driven optimization fails, retry without hints */
if (!plan && select_node->info.query.q.select.hint != PT_HINT_NONE)
{
// ... condensed: clear all hint sublists, retry qo_optimize_query ...
plan = qo_optimize_query (parser, select_node);
}
/* 2) translate plan + select_node into XASL */
if (pt_is_single_tuple (parser, select_node))
{
xasl = pt_to_buildvalue_proc (parser, select_node, plan); /* aggregate w/o GROUP BY */
}
else
{
xasl = pt_to_buildlist_proc (parser, select_node, plan); /* normal SELECT */
}
// ... condensed: dump plan + cache plan text on xasl ...
return xasl;
}

pt_is_single_tuple is true when the select list is a single aggregate (SELECT SUM(x) FROM t) with no GROUP BY — exactly the case the deck identifies for BUILDVALUE_PROC. Everything else is BUILDLIST_PROC and produces a list-file.

Buildlist construction — pt_to_buildlist_proc

Section titled “Buildlist construction — pt_to_buildlist_proc”

The buildlist compiler is the longest function in the file (thousands of lines covering aggregates, group-by, having, order-by, limit, window functions, connect-by). Its skeleton is what matters here: allocate, set parent context, build outptr, set aptr, generate the optimized plan (which fills spec_list/scan_ptr), set dptr. The order matters: outptr before aptr because uncorrelated subqueries can be referenced from the select list; aptr before optimized-plan because the optimized plan’s per-spec subqueries need to know what’s already been hoisted; dptr last because correlated subqueries must hang off the leaf scan, which only exists once gen_outer has finished.

// pt_to_buildlist_proc — src/parser/xasl_generation.c (condensed)
static XASL_NODE *
pt_to_buildlist_proc (PARSER_CONTEXT * parser, PT_NODE * select_node, QO_PLAN * qo_plan)
{
XASL_NODE *xasl, *save_parent_proc_xasl;
SYMBOL_INFO *symbols = parser->symbols;
PT_NODE *from = select_node->info.query.q.select.from;
if (symbols == NULL || from == NULL || select_node->node_type != PT_SELECT)
return NULL;
/* 1) allocate the BUILDLIST_PROC shell */
xasl = regu_xasl_node_alloc (BUILDLIST_PROC);
if (xasl == NULL) return NULL;
/* 2) tell child compiles who their parent is — pt_set_aptr/pt_set_dptr will read this */
save_parent_proc_xasl = parser->parent_proc_xasl;
parser->parent_proc_xasl = xasl;
/* 3) limit / inst_num / ordby_num bookkeeping */
// ... condensed ...
/* 4) build select list as outptr */
if (pt_has_aggregate (parser, select_node))
{
/* group key + aggregate exprs become outptr; aggregates collected separately */
xasl->outptr_list = pt_to_outlist (parser, group_out_list, NULL, UNBOX_AS_VALUE);
// ... condensed: aggregate list, having, group-by sort spec ...
}
else
{
xasl->outptr_list = pt_to_outlist (parser, select_node->info.query.q.select.list,
&xasl->selected_upd_list, UNBOX_AS_VALUE);
}
/* 5) hoist uncorrelated subqueries into aptr_list before plan generation */
pt_set_aptr (parser, select_node, xasl);
/* 6) walk QO_PLAN to fill spec_list / scan_ptr / per-leaf subqueries */
xasl = pt_gen_optimized_plan (parser, select_node, qo_plan, xasl);
/* 7) attach correlated subqueries that reference outer columns */
/* pt_gen_optimized_plan already calls pt_set_dptr on the deepest leaf via qo_to_xasl */
parser->parent_proc_xasl = save_parent_proc_xasl;
return xasl;
}

The two helpers pt_set_aptr and pt_set_dptr walk the parse tree looking for PT_NODEs whose info.query.xasl was set by the post-order walk; if the subquery is uncorrelated they’re prepended to xasl->aptr_list, otherwise to xasl->dptr_list.

Plan walk — pt_gen_optimized_plan and qo_to_xasl

Section titled “Plan walk — pt_gen_optimized_plan and qo_to_xasl”

pt_gen_optimized_plan is a thin wrapper: it calls qo_to_xasl, then patches a few index-direction flags driven by query hints (USE_IDX_DESC, NO_IDX_DESC, NLJ_KEEP_HEAP_PAGE_PINNED). The recursive plan walk lives in qo_to_xasl and its mutual helpers gen_outer and gen_inner.

// qo_to_xasl — src/optimizer/plan_generation.c (condensed)
xasl_node *
qo_to_xasl (QO_PLAN * plan, xasl_node * xasl)
{
QO_ENV *env;
XASL_NODE *lastxasl;
if (plan && xasl && (env = plan->info->env))
{
xasl = gen_outer (env, plan, &EMPTY_SET, NULL, NULL, xasl);
/* find the deepest scan node — that's where correlated subqueries attach */
lastxasl = xasl;
while (lastxasl)
{
if (lastxasl->scan_ptr) lastxasl = lastxasl->scan_ptr;
else if (lastxasl->fptr_list) lastxasl = lastxasl->fptr_list;
else break;
}
pt_set_dptr (env->parser, env->pt_tree->info.query.q.select.list, lastxasl, MATCH_ALL);
xasl = preserve_info (env, plan, xasl);
}
return xasl;
}

pt_set_dptr is called with lastxasl deliberately: dptr subqueries fire per outer-row, so they must hang off the inner-most scan (the deepest scan_ptr reachable from the root) where each row materialisation happens. If you put them on the root, you get one execution per result row of the whole query — wrong; if you put them on the second-from-top, the join above would materialize without ever evaluating them — also wrong.

gen_outer is the recursion that translates plans by type:

// gen_outer — src/optimizer/plan_generation.c (condensed)
static XASL_NODE *
gen_outer (QO_ENV * env, QO_PLAN * plan, BITSET * subqueries,
XASL_NODE * inner_scans, XASL_NODE * fetches, XASL_NODE * xasl)
{
// ... condensed: bitset bookkeeping for predicates and subqueries ...
switch (plan->plan_type)
{
case QO_PLANTYPE_SCAN:
/* leaf — attach the access spec, then accumulated inner scans + fetches + correlated subqs */
xasl = add_access_spec (env, xasl, plan);
xasl = add_scan_proc (env, xasl, inner_scans);
xasl = add_fetch_proc (env, xasl, fetches);
xasl = add_subqueries (env, xasl, &new_subqueries);
break;
case QO_PLANTYPE_SORT:
/* if we're inside a join leg, materialise the sub-plan into a list-file
and rescan it; otherwise just recurse and add a sort spec */
if (inner_scans != NULL || plan->plan_un.sort.sort_type == SORT_LIMIT)
{
listfile = make_buildlist_proc (env, namelist);
listfile = gen_outer (env, plan->plan_un.sort.subplan,
&EMPTY_SET, NULL, NULL, listfile);
listfile = add_sort_spec (env, listfile, plan, xasl->ordbynum_val, false);
xasl = add_uncorrelated (env, xasl, listfile);
xasl = init_list_scan_proc (env, xasl, listfile, namelist,
&(plan->sarged_terms), NULL);
// ... condensed ...
}
else
{
xasl = gen_outer (env, plan->plan_un.sort.subplan, &new_subqueries,
inner_scans, fetches, xasl);
xasl = add_sort_spec (env, xasl, plan, NULL, true /* add instnum pred */);
}
break;
case QO_PLANTYPE_JOIN:
outer = plan->plan_un.join.outer;
inner = plan->plan_un.join.inner;
switch (plan->plan_un.join.join_method)
{
case QO_JOINMETHOD_NL_JOIN:
case QO_JOINMETHOD_IDX_JOIN:
/* build the inner side as a SCAN_PROC, then recurse on the outer
passing the inner as inner_scans */
inner_scans = gen_inner (env, inner, &predset, &new_subqueries, inner_scans);
xasl = gen_outer (env, outer, &EMPTY_SET, inner_scans, fetches, xasl);
break;
case QO_JOINMETHOD_MERGE_JOIN:
// ... condensed: build both sides as listfiles, attach via mergelist proc ...
break;
}
break;
// ... condensed: FOLLOW (path expr), WORST cases ...
}
// ... condensed: free bitsets ...
return xasl;
}

The recursion structure is what gives the deck its slogan “join order: A → B → C compiles to BUILDLIST(A) → SCAN(B) → SCAN(C)”. A QO_PLANTYPE_JOIN whose outer is itself a join recurses on the outer until it hits a QO_PLANTYPE_SCAN, which becomes the BUILDLIST_PROC with an access spec; on the way back up, each layer’s inner is wrapped as a SCAN_PROC (via gen_inner and make_buildlist_proc) and chained onto the outer’s scan_ptr. The result is a left-deep XASL chain whose root materialises into a list-file and whose scan_ptr descendants are pure pipelines.

add_access_spec is the leaf compiler — it converts the parse tree’s class spec into an ACCESS_SPEC_TYPE, picks predicates apart into key/access flavours, and binds the per-spec val_list:

// add_access_spec — src/optimizer/plan_generation.c (condensed)
static XASL_NODE *
add_access_spec (QO_ENV * env, XASL_NODE * xasl, QO_PLAN * plan)
{
PARSER_CONTEXT *parser = QO_ENV_PARSER (env);
PT_NODE *class_spec = QO_NODE_ENTITY_SPEC (plan->plan_un.scan.node);
PT_NODE *key_pred = NULL;
PT_NODE *access_pred = NULL;
PT_NODE *if_pred = NULL;
PT_NODE *instnum_pred = NULL;
QO_XASL_INDEX_INFO *info = qo_get_xasl_index_info (env, plan);
/* 1) split predicates: key (index range), access (heap-side residual), if/instnum */
make_pred_from_plan (env, plan, &key_pred, &access_pred, info, NULL);
/* 2) build the access spec — class oid, index info, range/key/pred regu lists */
xasl->spec_list = pt_to_spec_list (parser, class_spec, key_pred, access_pred,
plan, info, NULL, NULL);
/* 3) per-spec column buffer */
xasl->val_list = pt_to_val_list (parser, class_spec->info.spec.id);
/* 4) attach if-predicate and inst_num predicate to the parent xasl */
if_pred = make_if_pred_from_plan (env, plan);
instnum_pred = make_instnum_pred_from_plan (env, plan);
xasl = add_if_predicate (env, xasl, if_pred);
xasl = pt_to_instnum_pred (parser, xasl, instnum_pred);
// ... condensed: free pointer lists, free index info ...
return xasl;
}

The split into where_range / where_key / where_pred (the deck’s “vertical / horizontal / heap” axes) happens inside pt_to_index_info and pt_to_spec_list: predicates that match a leading prefix of the index drive where_range; predicates on remaining indexed columns become where_key (leaf-level filtering during index walk); predicates referencing non-indexed columns become where_pred (data-page evaluation after fetching the heap row).

Sub-IRs — REGU_VARIABLE, OUTPTR_LIST, VAL_LIST

Section titled “Sub-IRs — REGU_VARIABLE, OUTPTR_LIST, VAL_LIST”

A SELECT’s projection, predicates, and runtime constants all end up as REGU_VARIABLE trees. pt_to_regu_variable is the big switch that maps every PT_NODE shape into a regu — and where the executor reads it back via fetch_peek_dbval:

// pt_to_regu_variable — src/parser/xasl_generation.c (condensed signature + skeleton)
REGU_VARIABLE *
pt_to_regu_variable (PARSER_CONTEXT * parser, PT_NODE * node, UNBOX unbox)
{
REGU_VARIABLE *regu = NULL;
DB_VALUE *val = NULL;
if (node == NULL)
{
/* default: empty varchar constant — used as a placeholder */
regu_alloc (val);
db_value_domain_init (val, DB_TYPE_VARCHAR, DB_DEFAULT_PRECISION, DB_DEFAULT_SCALE);
regu = pt_make_regu_constant (parser, val, DB_TYPE_VARCHAR, NULL);
}
else if (PT_IS_POINTER_REF_NODE (node))
{
/* TYPE_CONSTANT pointing into another node's etc */
// ... condensed: domain resolution, regu->type = TYPE_CONSTANT ...
}
else
{
switch (node->node_type)
{
case PT_DOT_: /* path expr — resolves to TYPE_ATTR_ID via pt_attribute_to_regu */
case PT_NAME: /* column reference — TYPE_ATTR_ID or TYPE_POSITION */
case PT_VALUE: /* literal — TYPE_CONSTANT with a pre-bound DB_VALUE */
case PT_HOST_VAR: /* parameter — TYPE_CONSTANT with the host_var slot */
case PT_EXPR: /* arith/func — TYPE_INARITH with an ARITH_TYPE child */
case PT_FUNCTION: /* built-in or aggregate — TYPE_FUNC */
case PT_METHOD_CALL: /* method — special-cased */
// ... condensed: each shape allocates a regu and fills the union ...
}
}
return regu;
}

The deck’s example for a WHERE col1 IN (1,2) predicate is worth tracing through: the IN list lowers to two REGU_VARIABLEs of type TYPE_POS_VALUE (val_pos = 0 and val_pos = 1) pointing into a shared VAL_DESCR array that holds the constants 1 and 2; the IN itself becomes a PRED_EXPR of type T_PRED whose left and right hand sides are T_EVAL_TERM comparisons. The advantage of this layout is that the index range scan can re-bind the position values per key, and the heap-side residual can re-use the same PRED_EXPR without re-parsing.

pt_to_outlist walks the select list and produces an OUTPTR_LIST (a count plus a chain of REGU_VARIABLE_LIST nodes); pt_to_val_list walks the SYMBOL_INFO’s table_info for the spec id and produces a VAL_LIST whose entries are shared with the outptr regus when both reference the same column — that sharing is what makes “fetch a column once, project to many places” cheap at runtime.

Subquery placement — aptr / dptr / scan_ptr in one figure

Section titled “Subquery placement — aptr / dptr / scan_ptr in one figure”
flowchart TB
  Q["SELECT a.col1<br/>FROM tab a, tab b<br/>WHERE a.col1 = b.col1<br/>  AND a.col1 = (SELECT col1 FROM tab d)<br/>  /* uncorrelated */<br/>  AND EXISTS<br/>    (SELECT 1 FROM tab c WHERE c.k = a.k)<br/>  /* correlated */"]
  R["XASL_NODE root<br/>type=BUILDLIST_PROC<br/>(driver = tab a)"]
  AP["XASL_NODE<br/>type=BUILDLIST_PROC<br/>(uncorrelated subq: tab d)"]
  DP["XASL_NODE<br/>type=BUILDLIST_PROC<br/>(correlated subq: tab c, refs a.k)"]
  SP["XASL_NODE<br/>type=SCAN_PROC<br/>(inner: tab b)"]
  Q --> R
  R -. aptr_list .-> AP
  R -. dptr_list .-> DP
  R -. scan_ptr  .-> SP
  AP -. once before R .-> R
  DP -. per outer row .-> R
  SP -. nested per outer row .-> R

The executor’s loop reads the slots in this exact order:

run aptr_list once (materialises into a list-file, bound by LIST_SPEC_NODE)
for each outer tuple of R:
run dptr_list (correlated subq evaluated against the new outer)
descend scan_ptr chain (joined inner scan)
project R's outptr into list-file (BUILDLIST) or single value (BUILDVALUE)

The aptr_list member is also where CTEs land (the deck calls out CTE_PROC, and the comment in xasl.h says “CTEs are guaranteed always before the subqueries”). This is why a CTE reads like an uncorrelated subquery from the executor’s point of view — it is, modulo the recursive flag.

Predicate placement — RANGE / KEY / PRED

Section titled “Predicate placement — RANGE / KEY / PRED”

For INDEX(col1, col2, col3) and a query WHERE col1 = 1 AND col3 = 2 AND col4 = 2, the optimizer classifies each conjunct by its position relative to the index’s leading-key prefix:

flowchart LR
  WHERE["WHERE clause"]
  RANGE["where_range<br/>col1 = 1<br/>(B-tree vertical descend)"]
  KEY["where_key<br/>col3 = 2<br/>(B-tree leaf horizontal scan,<br/> non-leading indexed col)"]
  PRED["where_pred<br/>col4 = 2<br/>(heap-page evaluation,<br/> non-indexed col)"]
  IDX["INDEX(col1, col2, col3)"]
  WHERE --> RANGE
  WHERE --> KEY
  WHERE --> PRED
  IDX -.- RANGE
  IDX -.- KEY

pt_to_index_info builds the INDX_INFO (range type, btree id, key info chain) consumed by where_range; the residual goes into where_key/where_pred as PRED_EXPR trees built by pt_to_pred_expr. The deck observes that a PRED_EXPR is “a lighter struct than PT_NODE” — that’s because the parse tree carries source positions, type-derivation history, and rewrite state that the executor doesn’t need; the predicate IR only keeps the operation kind, the left/right regu pointers, and a short-circuit flag.

Once the XASL_NODE tree is built, it has to ship to the server. The sender is xts_map_xasl_to_stream:

// xts_map_xasl_to_stream — src/query/xasl_to_stream.c (condensed)
int
xts_map_xasl_to_stream (const XASL_NODE * xasl_tree, XASL_STREAM * stream)
{
int offset, header_size, body_size;
char *p;
if (!xasl_tree || !stream) return ER_QPROC_INVALID_XASLNODE;
/* Header: dbval_cnt + creator_oid + n_oid_list + class_oid_list + locks + tcard */
header_size = sizeof (int) + sizeof (OID) + sizeof (int)
+ sizeof (OID) * xasl_tree->n_oid_list
+ sizeof (int) * xasl_tree->n_oid_list
+ sizeof (int) * xasl_tree->n_oid_list;
/* layout: [size of header][header data][size of body][body data...] */
offset = sizeof (int) + header_size + sizeof (int);
offset = xasl_stream_make_align (offset);
xts_reserve_location_in_stream (offset);
xts_id_serial = 0;
/* recursive walk: each xts_save_* checks the visited table first,
emits the body once, returns the offset for subsequent references */
if (xts_save_xasl_node (xasl_tree) == ER_FAILED) goto end;
/* now backfill the header */
p = or_pack_int (xts_Stream_buffer, header_size);
p = or_pack_int (p, xasl_tree->dbval_cnt);
p = or_pack_oid (p, (OID *) (&xasl_tree->creator_oid));
p = or_pack_int (p, xasl_tree->n_oid_list);
for (i = 0; i < xasl_tree->n_oid_list; i++) p = or_pack_oid (p, &xasl_tree->class_oid_list[i]);
for (i = 0; i < xasl_tree->n_oid_list; i++) p = or_pack_int (p, xasl_tree->class_locks[i]);
for (i = 0; i < xasl_tree->n_oid_list; i++) p = or_pack_int (p, xasl_tree->tcard_list[i]);
body_size = xts_Free_offset_in_stream - offset;
p = or_pack_int (p, body_size);
stream->buffer = xts_Stream_buffer;
stream->buffer_size = xts_Free_offset_in_stream;
end:
xts_free_visited_ptrs ();
return xts_Xasl_errcode;
}

The header carries the class oid list and per-oid lock-mode list — the server reads these before unpacking the body so it can take the necessary locks first. If lock acquisition fails, the server can reject the query without ever materialising the XASL tree. The tcard_list is the per-class table-card hint the optimizer used; it lets the server detect plan staleness (if a class’s cardinality has changed materially since planning, the cached XASL is invalidated).

xts_save_xasl_node is the hot path for shared-subtree handling:

// xts_save_xasl_node — src/query/xasl_to_stream.c (condensed)
static int
xts_save_xasl_node (const XASL_NODE * xasl)
{
int offset, size;
if (xasl == NULL) return NO_ERROR;
/* deduplication: if we've packed this node before, return the prior offset */
offset = xts_get_offset_visited_ptr (xasl);
if (offset != ER_FAILED) return offset;
size = xts_sizeof_xasl_node (xasl);
offset = xts_reserve_location_in_stream (size);
xts_mark_ptr_visited (xasl, offset);
/* pack the body into a temp buffer, then memcpy into the stream */
buf = xts_process_xasl_node (buf_p, xasl);
// ... condensed: copy into xts_Stream_buffer at offset ...
return offset;
}

The visited-pointer check is what makes the encoding proportional to the unique node count rather than the path count — without it, a CTE referenced from two arms of a UNION would be packed twice. xts_process_xasl_node is the per-node field-by-field pack:

// xts_process_xasl_node — src/query/xasl_to_stream.c (condensed)
static char *
xts_process_xasl_node (char *ptr, const XASL_NODE * xasl)
{
int offset, cnt;
ACCESS_SPEC_TYPE *access_spec;
/* 1) header: assigned a fresh id_serial so the executor can identify nodes */
((XASL_NODE *) xasl)->header.id = xts_id_serial++;
ptr = xts_process_xasl_header (ptr, xasl->header);
ptr = or_pack_int (ptr, xasl->type);
ptr = or_pack_int (ptr, xasl->flag);
/* 2) every pointer becomes an offset (or 0 for NULL) */
offset = xts_save_list_id (xasl->list_id); ptr = or_pack_int (ptr, offset);
offset = xts_save_sort_list (xasl->after_iscan_list); ptr = or_pack_int (ptr, offset);
offset = xts_save_sort_list (xasl->orderby_list); ptr = or_pack_int (ptr, offset);
offset = xts_save_pred_expr (xasl->ordbynum_pred); ptr = or_pack_int (ptr, offset);
offset = xts_save_db_value (xasl->ordbynum_val); ptr = or_pack_int (ptr, offset);
offset = xts_save_regu_variable (xasl->orderby_limit); ptr = or_pack_int (ptr, offset);
// ... condensed: limit_offset, limit_row_count, single_tuple, outptr_list,
// selupd_list, val_list, merge_val_list ...
/* 3) access spec list: pack count then each spec */
for (cnt = 0, access_spec = xasl->spec_list; access_spec; access_spec = access_spec->next, cnt++) ;
ptr = or_pack_int (ptr, cnt);
for (access_spec = xasl->spec_list; access_spec; access_spec = access_spec->next)
ptr = xts_process_access_spec_type (ptr, access_spec);
/* 4) recurse into the four pointer slots */
offset = xts_save_xasl_node (xasl->aptr_list); ptr = or_pack_int (ptr, offset);
offset = xts_save_xasl_node (xasl->bptr_list); ptr = or_pack_int (ptr, offset);
offset = xts_save_xasl_node (xasl->dptr_list); ptr = or_pack_int (ptr, offset);
// ... condensed: fptr_list, scan_ptr, connect_by_ptr, predicates, instnum/orderby_num,
// proc-union (per type), creator_oid, class oid list, ... ...
return ptr;
}

Two encoding rules to note. (1) Pointers-as-offsets. Every field that’s a pointer in C becomes an or_pack_int of the offset returned by an xts_save_* helper; offset 0 means NULL. The receiver in stream_to_xasl.c pairs each pack with an or_unpack_int and an xts_get_addr_unpack_info-style lookup to rehydrate the pointer. (2) Pack-by-discriminator. The proc-union is dispatched on type — only the active arm is packed. The receiver does the same, so a BUILDVALUE_PROC costs the bytes of BUILDVALUE_PROC_NODE, not the bytes of the largest variant.

The unpack side is symmetric. stx_init_xasl_unpack_info allocates a server-side arena (unpack_info->packed_size * UNPACK_SCALE, where UNPACK_SCALE = 3 accounts for the deserialised tree being roughly three times the wire size due to alignment and pointer rehydration), then stream_to_xasl recursively reads each offset, looks it up in the already-unpacked table, and either returns the cached pointer or unpacks the body and inserts it.

Putting it together — one SELECT compile, end to end

Section titled “Putting it together — one SELECT compile, end to end”
sequenceDiagram
  participant CT as Client thread (PT walker)
  participant QO as qo_optimize_query
  participant PB as pt_to_buildlist_proc
  participant QX as qo_to_xasl / gen_outer
  participant XTS as xts_map_xasl_to_stream
  participant SVR as Server
  CT->>CT: parser_walk_tree pre-order:<br/>pt_push_symbol_info per query
  CT->>CT: post-order on inner subqueries first<br/>(parser_generate_xasl_proc)
  CT->>QO: qo_optimize_query(parser, select_node)
  QO-->>CT: QO_PLAN tree (scan/join/sort/follow)
  CT->>PB: pt_to_buildlist_proc(parser, select_node, plan)
  PB->>PB: regu_xasl_node_alloc(BUILDLIST_PROC)
  PB->>PB: pt_to_outlist  → xasl->outptr_list
  PB->>PB: pt_set_aptr   → xasl->aptr_list
  PB->>QX: pt_gen_optimized_plan
  QX->>QX: gen_outer recurses<br/>scan / join / sort / follow
  QX->>QX: add_access_spec → spec_list, val_list,<br/>          where_range / where_key / where_pred
  QX->>QX: add_scan_proc   → scan_ptr chain
  QX->>QX: add_subqueries  → aptr/dptr per leaf
  QX-->>PB: filled xasl
  PB->>PB: pt_set_dptr on lastxasl (deepest scan)
  PB-->>CT: xasl
  CT->>CT: parser_generate_xasl finishes:<br/>fill class_oid_list, locks, tcard, creator_oid
  CT->>XTS: xts_map_xasl_to_stream(xasl, stream)
  XTS->>XTS: pack header (dbval_cnt, oids, locks, tcard)
  XTS->>XTS: xts_save_xasl_node recursively;<br/>visited table dedups shared subtrees
  XTS-->>CT: stream->buffer + buffer_size
  CT->>SVR: prepare/execute RPC carrying the stream
  SVR->>SVR: stx_init_xasl_unpack_info (arena = size * UNPACK_SCALE)
  SVR->>SVR: stream_to_xasl rehydrates the tree
  SVR->>SVR: qexec_execute_mainblock runs it

The deck’s “XASL generator 진행 절차” slide gives the explicit order for the buildlist case; this matches the source exactly:

pt_to_buildlist_proc (PT_SELECT, QO_PLAN)
pt_to_outlist (select.list) — outlist (projection)
pt_set_aptr (select) — aptr (uncorrelated subqueries)
pt_gen_optimized_plan (select, plan, xasl)
qo_to_xasl (plan, xasl)
gen_outer (env, plan, ...)
case QO_PLANTYPE_SCAN:
add_access_spec — spec_list (range/key/pred, indexptr)
add_scan_proc — scan_ptr
add_subqueries — aptr + dptr per leaf
case QO_PLANTYPE_JOIN:
inner_scan = gen_inner (inner)
gen_outer (outer, ..., inner_scan)
pt_set_dptr (select.list, lastxasl) — dptr on deepest leaf

Two of these calls are subtle. pt_set_aptr must run before pt_gen_optimized_plan because add_subqueries looks up already-attached XASL on each subquery’s PT_NODE — the pre- attachment happens in pt_set_aptr’s walk over uncorrelated PT_SELECTs. pt_set_dptr must run after the recursive walk because correlated subqueries reference outer columns that only exist as val_list slots on the deepest scan; binding earlier would point at the wrong tuple buffer.

Anchor on symbol names, not line numbers. The CUBRID source moves; the position table at the end is scoped to the doc’s updated: date.

  • parser_generate_xasl (xasl_generation.c) — outermost entry called from the prepare/execute path; runs the two-pass walk and fills cache metadata.
  • parser_generate_xasl_pre / parser_generate_xasl_post (xasl_generation.c) — pre-order pushes symbol info, post-order builds XASL bottom-up.
  • parser_generate_xasl_proc (xasl_generation.c) — per-query dispatcher: SELECT → pt_plan_query, set ops → pt_to_union_proc family.
  • pt_plan_query (xasl_generation.c) — calls qo_optimize_query then pt_to_buildlist_proc / pt_to_buildvalue_proc.
  • pt_to_buildlist_proc (xasl_generation.c) — main multi-row compiler: aggregates, group-by, having, order-by, limit, window, connect-by.
  • pt_to_buildvalue_proc (xasl_generation.c) — single-row aggregate compiler (no GROUP BY).
  • pt_to_union_proc (xasl_generation.c) — UNION_PROC, DIFFERENCE_PROC, INTERSECTION_PROC builder.
  • pt_to_cte_proc (xasl_generation.c) — CTE_PROC builder.
  • pt_make_aptr_parent_node (xasl_generation.c) — common helper that allocates a parent XASL whose aptr holds a child query (used by INSERT/UPDATE/DELETE/MERGE).
  • pt_to_xasl_for_dblink (xasl_generation.c) — dblink-shaped XASL.
  • pt_gen_optimized_plan (xasl_generation.c) — wrapper that calls qo_to_xasl and patches index hints.
  • pt_gen_simple_plan (xasl_generation.c) — fallback when no QO_PLAN exists (eg. system-generated stmt without optimization).
  • qo_to_xasl (optimizer/plan_generation.c) — entry into the recursive walk.
  • gen_outer (optimizer/plan_generation.c) — outer-side recursion for SCAN / JOIN / SORT / FOLLOW.
  • gen_inner (optimizer/plan_generation.c) — inner-side builder; produces SCAN_PROC chained off outer’s scan_ptr.
  • make_buildlist_proc (optimizer/plan_generation.c) — sub-plan materialisation into a list-file.
  • make_sort_limit_proc (optimizer/plan_generation.c) — Top-N sort variant.
  • add_access_spec (optimizer/plan_generation.c) — leaf scan compiler.
  • add_scan_proc / add_fetch_proc / add_uncorrelated / add_subqueries (optimizer/plan_generation.c) — chain insertions for the four pointer slots on XASL_NODE.
  • add_sort_spec (optimizer/plan_generation.c) — orderby / groupby sort spec construction.
  • make_pred_from_plan (optimizer/plan_generation.c) — splits residual predicates into key / access classes.
  • qo_get_xasl_index_info (optimizer/plan_generation.c) — pulls out QO_XASL_INDEX_INFO for the chosen index.

Sub-IR builders (parse tree → XASL fragments)

Section titled “Sub-IR builders (parse tree → XASL fragments)”
  • pt_to_outlist (xasl_generation.c) — select list to OUTPTR_LIST.
  • pt_to_val_list (xasl_generation.c) — table_info to VAL_LIST.
  • pt_to_regu_variable (xasl_generation.c) — generic expression to REGU_VARIABLE (the big switch).
  • pt_to_pred_expr (xasl_generation.c) — predicate tree to PRED_EXPR.
  • pt_to_index_info (xasl_generation.c) — index access to INDX_INFO (range type, key info, key1/key2 regus).
  • pt_to_class_spec_list / pt_to_spec_list (xasl_generation.c) — class spec to ACCESS_SPEC_TYPE with key/pred/range regu lists.
  • pt_set_aptr / pt_set_dptr (xasl_generation.c) — uncorrelated / correlated subquery attachment.
  • pt_set_numbering_node_etc (xasl_generation.c) — wire INST_NUM / ORDERBY_NUM DB_VALUEs into parse-tree references.
  • pt_to_pos_descr (xasl_generation.c) — name to position descriptor inside an outptr.
  • pt_attribute_to_regu (xasl_generation.c) — column reference to TYPE_ATTR_ID regu.
  • pt_make_regu_constant (xasl_generation.c) — literal / host_var to TYPE_CONSTANT regu.
  • regu_xasl_node_alloc (xasl_regu_alloc.cpp) — XASL_NODE allocator (per-query parser arena, regu_alloc family).
  • regu_alloc (xasl_regu_alloc.hpp) — generic typed allocator.
  • pt_init_xasl_supp_info (xasl_generation.c) — clears the supplementary info accumulator between queries.
  • xts_map_xasl_to_stream (query/xasl_to_stream.c) — top pack entry; XASL_STREAM { buffer, buffer_size } out.
  • xts_save_xasl_node (query/xasl_to_stream.c) — visited check + body pack + memcpy into stream.
  • xts_process_xasl_node (query/xasl_to_stream.c) — per-node field-by-field pack.
  • xts_sizeof_xasl_node (query/xasl_to_stream.c) — pre-pack size estimate so the buffer can be reserved.
  • xts_save_outptr_list / xts_save_regu_variable / xts_save_pred_expr / xts_save_arith_type / xts_save_indx_info / xts_save_db_value / xts_save_val_list / xts_save_sort_list / xts_save_aggregate_type / xts_save_function_type / xts_save_analytic_type (query/xasl_to_stream.c) — per sub-IR pack functions, all using the visited table.
  • xts_get_offset_visited_ptr / xts_mark_ptr_visited / xts_free_visited_ptrs (query/xasl_to_stream.c) — visited-pointer table.
  • xts_reserve_location_in_stream (query/xasl_to_stream.c) — buffer growth.
  • xts_map_filter_pred_to_stream / xts_map_func_pred_to_stream (query/xasl_to_stream.c) — alternate entry for filter/function predicates (used by trigger and function indexes).
  • stx_init_xasl_unpack_info (xasl/xasl_stream.cpp) — server arena allocator (packed_size * UNPACK_SCALE).
  • stream_to_xasl (query/stream_to_xasl.c) — top unpack entry.
  • stx_get_xasl_node / stx_get_regu_variable / etc. (query/stream_to_xasl.c) — symmetric to the xts_save_* family.
  • xasl_stream_get_ptr_block / xasl_stream_make_align (xasl/xasl_stream.cpp) — alignment and block bookkeeping shared between sender and receiver.
  • xasl_stream_compare (xasl/xasl_stream.cpp) — equality on packed XASL fragments (used by the XASL cache lookup).
SymbolFileLine
xasl_node structsrc/query/xasl.h1075
PROC_TYPE enumsrc/query/xasl.h188
XASL_NODE_HEADER structsrc/query/xasl.h73
parser_generate_xaslsrc/parser/xasl_generation.c22460
parser_generate_xasl_presrc/parser/xasl_generation.c22322
parser_generate_xasl_postsrc/parser/xasl_generation.c22373
pt_plan_querysrc/parser/xasl_generation.c17680
pt_to_buildlist_procsrc/parser/xasl_generation.c16148
pt_to_buildvalue_procsrc/parser/xasl_generation.c17204
pt_gen_optimized_plansrc/parser/xasl_generation.c14714
pt_gen_simple_plansrc/parser/xasl_generation.c14812
pt_make_aptr_parent_nodesrc/parser/xasl_generation.c18519
pt_to_xasl_for_dblinksrc/parser/xasl_generation.c18807
pt_to_outlistsrc/parser/xasl_generation.c13830
pt_to_val_listsrc/parser/xasl_generation.c13237
pt_set_aptrsrc/parser/xasl_generation.c13388
pt_set_dptrsrc/parser/xasl_generation.c13370
pt_to_regu_variablesrc/parser/xasl_generation.c7577
pt_to_pred_exprsrc/parser/xasl_generation.c2194
pt_to_index_infosrc/parser/xasl_generation.c11852
pt_to_class_spec_listsrc/parser/xasl_generation.c12290
pt_to_spec_listsrc/parser/xasl_generation.c13167
pt_to_pos_descrsrc/parser/xasl_generation.c5377
pt_to_pos_descr_groupbysrc/parser/xasl_generation.c24103
qo_to_xaslsrc/optimizer/plan_generation.c2915
gen_outersrc/optimizer/plan_generation.c1955
gen_innersrc/optimizer/plan_generation.c2759
add_access_specsrc/optimizer/plan_generation.c895
add_scan_procsrc/optimizer/plan_generation.c978
add_subqueriessrc/optimizer/plan_generation.c1066
make_buildlist_procsrc/optimizer/plan_generation.c728
regu_xasl_node_allocsrc/parser/xasl_regu_alloc.cpp40
xts_map_xasl_to_streamsrc/query/xasl_to_stream.c276
xts_save_xasl_nodesrc/query/xasl_to_stream.c1682
xts_process_xasl_nodesrc/query/xasl_to_stream.c2810
xts_sizeof_xasl_nodesrc/query/xasl_to_stream.c5941
xts_save_aggregate_typesrc/query/xasl_to_stream.c498
xts_save_function_typesrc/query/xasl_to_stream.c634
xts_save_analytic_typesrc/query/xasl_to_stream.c702
xts_save_arith_typesrc/query/xasl_to_stream.c970
xts_save_indx_infosrc/query/xasl_to_stream.c1038
xts_save_outptr_listsrc/query/xasl_to_stream.c1106
xts_save_pred_exprsrc/query/xasl_to_stream.c1242
xts_save_regu_variablesrc/query/xasl_to_stream.c1310
xts_save_sort_listsrc/query/xasl_to_stream.c1394
xts_save_val_listsrc/query/xasl_to_stream.c1462
xts_save_db_valuesrc/query/xasl_to_stream.c1614
xts_save_filter_pred_nodesrc/query/xasl_to_stream.c1767
xts_save_func_predsrc/query/xasl_to_stream.c1835
xts_save_update_infosrc/query/xasl_to_stream.c2109
stx_init_xasl_unpack_infosrc/xasl/xasl_stream.cpp74
xasl_stream_make_alignsrc/xasl/xasl_stream.cpp231
  • Deck’s BUILDLIST_PROC definition is correct but incomplete. The deck says BUILDLIST is “ROW가 여러건이 될 수 있는 SQL”. The source agrees — pt_is_single_tuple (which selects BUILDVALUE) requires a single aggregate without GROUP BY. But there’s also a case the deck doesn’t mention: a SELECT that’s lifted into an UPDATE/DELETE/INSERT carrier via pt_make_aptr_parent_node is always BUILDLIST regardless of cardinality, because the outer modify proc is the consumer, not the user.

  • SCAN_PROC has no temp file — confirmed. The deck’s shorthand “SCAN_PROC: TEMP 영역을 사용하지 않음” matches the semantics: SCAN_PROC nodes that hang off scan_ptr are pipelined into the parent’s loop and write nothing to disk. BUILDLIST_PROC nodes do allocate a QFILE_LIST_ID and materialise; if they appear under scan_ptr (rare, only via make_buildlist_proc for join-leg materialisation) the materialisation happens regardless of position.

  • aptr_list is misnamed in the deck. The source comment on xasl->aptr_list reads “CTEs and uncorrelated subquery. CTEs are guaranteed always before the subqueries”. The deck treats aptr as “uncorrelated subquery” only. CTEs share the slot; the executor evaluates them in order, with CTEs first.

  • The deck’s “PRED_EXPR is lighter than PT_NODE” is true for a structural reason. PT_NODE carries source tracking, rewrite history, type-derivation state, and a 70-arm info union; PRED_EXPR’s union has three arms (pred, eval_term, not_term). The compile-time size of PT_NODE is hundreds of bytes; PRED_EXPR is ~32 bytes. The 5×–10× shrink is what makes per-row predicate evaluation cheap.

  • The deck’s “SCAN_PROC” cell shape on the multi-table example is a simplification. The deck shows the inner side of a join as just SCAN_PROC (tab b). The source actually produces a SCAN_PROC chained off the outer’s scan_ptr with its own spec_list (containing tab b’s access spec) and its own val_list; the SCAN_PROC inherits the parent’s outptr via projection forwarding. Visualising the inner as a full XASL_NODE makes the recursive structure explicit.

  • pt_set_dptr runs at the deepest scan, not on the buildlist root. The deck’s pseudocode shows pt_set_dptr (select.list) after qo_to_xasl returns. The source places the call inside qo_to_xasl itself, walking down scan_ptr and fptr_list until it hits a leaf. This is the correct placement: a correlated subquery referencing outer.col must execute after outer.col is in the val_list, which only happens at the leaf scan.

  • Visited-pointer table is per-pack, not global. The deck doesn’t address shared subtrees, but the source’s xts_free_visited_ptrs at the end of xts_map_xasl_to_stream shows that the table is reset per query — there is no inter-query sharing. Within a single query, however, a REGU_VARIABLE shared between an outptr and a where_pred is correctly emitted once.

  1. UNPACK_SCALE = 3 rationale. The server-side unpack arena is sized as packed_size * 3. Where does the factor come from? Inspection of a typical XASL packed-vs-rehydrated ratio would tell us whether 3 is conservative, empirically-derived, or arbitrary. Investigation path: instrument stx_init_xasl_unpack_info and db_private_alloc to log actual usage on a sample workload.

  2. Aggregate placement under hash GROUP BY. The deck stops at “BUILDLIST builds the temp file”; the source’s pt_to_buildlist_proc has a g_hash_eligible branch (PT_HINT_NO_HASH_AGGREGATE plus pt_is_hash_agg_eligible walks). When does the optimizer decide hash aggregation, and how does the hash table sit in the XASL_NODE proc-union? Investigation path: BUILDLIST_PROC_NODE::g_hash_eligible and BUILDLIST_PROC_NODE::g_output_first_tuple semantics in the executor.

  3. Connect-by connect_by_ptr vs proc-union connect_by. xasl_node carries both a connect_by_ptr (XASL_NODE) and a proc.connect_by (CONNECTBY_PROC_NODE). Why two? Probably because the connect-by sub-plan is itself an XASL_NODE (the recursive scan) while the proc-union holds the prior/level bookkeeping. Investigation path: trace pt_to_connect_by_proc (if it exists under that name) and qexec_execute_connect_by to see how the two interact.

  4. HASHJOIN_PROC reachability from xasl_generation. The PROC_TYPE enum lists HASHJOIN_PROC, but the deck does not mention it. Where does it get instantiated? The QO_JOINMETHOD_HASH_JOIN arm of gen_outer is the obvious suspect; but the body of that branch in the current source may have shifted to using a different proc type or an add_hash_pred extension. Investigation path: git grep for regu_xasl_node_alloc (HASHJOIN_PROC) and make_hashjoin_proc.

  5. Plan invalidation via tcard_list. The header carries per-class table-card hints. The server-side path that compares them against current cardinality (and decides whether to invalidate the cached XASL) is not traced in the deck. Investigation path: xasl_cache.c and the lookup that consumes tcard_list during xcache_find_xasl_id.

  6. REGU_VARIABLE sharing with vfetch_to. The deck shows REGU_VARIABLE carrying a vfetch_to pointer to a DB_VALUE_LIST entry, and notes “동일 컬럼의 regu_var일 경우 db_value의 값을 공유함”. The exact equality predicate the compiler uses to detect “same column” — is it attr_descr.id plus class oid, or PT_NODE pointer identity? Investigation path: pt_to_regu_variable for PT_NAME and the mq_regu_var_lookup cache (if any).

  7. Method-call regu interaction with parallelism. The xasl_node::px_executor and executed_parallelism fields make XASL parallel-aware. Method calls (JSP/SP) are noted as not parallel-safe in meth_translate. How does the generator decide a query is parallel-eligible vs not, and where is the gate? Investigation path: scan_check_parallel_heap_scan_possible and check_parallel_subquery_possible calls at the tail of parser_generate_xasl.

Raw analyses (raw/code-analysis/cubrid/query-processing/)

Section titled “Raw analyses (raw/code-analysis/cubrid/query-processing/)”
  • analysis_QP_XASL_generator_1.0.pptx — the deck this doc is anchored on.
  • _converted/analysisqpxaslgenerator1.0.pptx.md — markitdown extract.
  • (planned) cubrid-query-optimizer.md — produces the QO_PLAN consumed here.
  • (planned) cubrid-query-executor.md — consumes the XASL tree this generator produces.
  • (planned) cubrid-xasl-cache.md — caches packed XASL streams keyed on the SQL plus the class oid set; the header in xts_map_xasl_to_stream is the cache key payload.
  • Database Internals (Petrov), Ch. 12 “Query Processing” — plan vs executable form distinction, iterator (Volcano) model.
  • Designing Data-Intensive Applications (Kleppmann), Ch. 4 “Encoding and Evolution” — positional encoding with back-references, the shared-subtree problem in serialization.
  • Selinger et al., Access Path Selection in a Relational Database Management System, SIGMOD 1979 — the System R paper that draws the line between plan choice and plan execution.
  • Graefe, Volcano—An Extensible and Parallel Query Evaluation System, IEEE TKDE 1994 — the iterator model that XASL’s open/next/close inheritors implement.
  • Graefe, Query Evaluation Techniques for Large Databases, ACM Computing Surveys 1993 — the canonical survey of physical operators and how plan IRs encode them.

CUBRID source (/data/hgryoo/references/cubrid/)

Section titled “CUBRID source (/data/hgryoo/references/cubrid/)”
  • src/parser/xasl_generation.{c,h} — the bulk of the parser-side compiler; ~770 KB of C.
  • src/parser/xasl_regu_alloc.{cpp,hpp} — XASL_NODE / regu / predicate allocators tied to the parser arena.
  • src/optimizer/plan_generation.c — the plan-driven walk (gen_outer, gen_inner, add_access_spec, helpers).
  • src/optimizer/query_planner.hQO_PLAN shape consumed here.
  • src/query/xasl.hxasl_node, PROC_TYPE, all proc-union node definitions, ACCESS_SPEC / VAL_LIST / OUTPTR_LIST headers.
  • src/query/xasl_to_stream.{c,h} — sender; visited-pointer serializer.
  • src/query/stream_to_xasl.{c,h} — receiver.
  • src/xasl/xasl_stream.{cpp,hpp} — alignment / block helpers shared by sender and receiver, and the unpack-info arena.
  • src/xasl/xasl_predicate.hppPRED_EXPR definition.
  • src/xasl/access_spec.hppACCESS_SPEC_TYPE / hybrid node.
  • src/query/regu_var.hppREGU_VARIABLE definition.
  • src/query/query_executor.c — the consumer; entry qexec_execute_mainblock.