CUBRID Query Evaluator — PRED_EXPR Walking, regu_variable Fetch, and the Row-Level Filter Engine
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
Section titled “Theoretical Background”A query evaluator is the runtime sub-system that turns a row plus a predicate
into a Boolean verdict and a column reference into a value. The executor decides
what to read; the evaluator decides whether what was read should be kept and
how each referenced expression resolves to a DB_VALUE. Engineering it well
means handling four orthogonal concerns at once: predicate-tree walking,
expression interpretation, three-valued logic, and the per-row materialisation
discipline that decides when to copy a value and when to share a pointer.
Goetz Graefe’s Query Evaluation Techniques for Large Databases (ACM Computing
Surveys 25(2), 1993) lays out the textbook taxonomy. Predicate evaluation is
treated as a special case of expression interpretation, where the expression
returns one of three Boolean values; the canonical implementation is a small
tree-walking interpreter with one method per node type. Graefe also names the
three sub-problems that every engine must answer: how to short-circuit AND /
OR so unnecessary subtrees are not evaluated, how to propagate UNKNOWN
without paying the cost of bool-promotion at every leaf, and how to amortise
the per-tuple interpretation cost when the same predicate shape repeats over
millions of rows.
Three-valued logic is the foundation. ANSI SQL defines NULL propagation
through WHERE predicates by Codd’s well-known truth table:
A | B | A AND B | A OR B | NOT A |
|---|---|---|---|---|
| T | T | T | T | F |
| T | F | F | T | F |
| T | U | U | T | F |
| F | F | F | F | T |
| F | U | F | U | T |
| U | U | U | U | U |
The driver-level consequence is that the short-circuit rules differ between
two-valued and three-valued logic. In two-valued AND, the first false ends
evaluation; in three-valued AND, only false ends it — unknown does not.
A WHERE filter additionally collapses unknown to false (Codd’s
implication: a tuple is admitted only when the predicate evaluates to true).
Other contexts — CHECK constraints, joining conditions, WHEN clauses —
treat the three values differently; each context’s evaluator must be told
which collapse rule applies. CUBRID encodes the collapse policy in its
QPROC_QUALIFICATION enum (QPROC_QUALIFIED / QPROC_NOT_QUALIFIED /
QPROC_QUALIFIED_OR_NOT) and threads it through update_logical_result.
Expression interpretation is the second axis. Database Internals (Petrov,
ch. 12) frames it as a tagged-union walker: a node carries an opcode and a
small set of operand pointers, the walker reads the opcode, recurses into the
operands, and applies the operator to the recursively-computed sub-values.
The classical implementation cost is one virtual call (or one switch jump)
per opcode per row. The hot loop is therefore the unit on which two
optimisations compete: operator specialisation (compile a dedicated routine
for the common shape and call it directly, skipping the dispatch) and JIT
compilation (assemble machine code that does the whole walk inline).
PostgreSQL has migrated from the former (specialised fast_qual) to the
latter (LLVM-backed ExecCompileExpr). CUBRID does only the former, via
eval_fnc, which inspects a PRED_EXPR and returns a function pointer to a
predicate-shape-specific evaluator (eval_pred_comp0 for binary equality,
eval_pred_like6 for LIKE, etc.). The full recursive walker eval_pred
remains the fallback for any predicate too compound to specialise.
The third axis is the register abstraction for expression operands. The
abstract syntax tree the optimiser produces contains literals, attribute
references, sub-query references, and arithmetic expressions side-by-side, and
the runtime needs a uniform pointer type for them all so the walker can store
them in a single field. Volcano-class engines call this a regu variable
(“regulator variable”); each instance carries a small type tag plus a union
that stores the kind-specific payload. Resolving a regu variable to a
DB_VALUE is the work of a single dispatcher (CUBRID’s fetch_peek_dbval)
that switches on the tag and delegates to a kind-specific path. The
abstraction is what lets the predicate walker be ignorant of where its leaves
came from — the same compare routine works whether the lhs is a heap-resident
column, a host parameter, or a sub-query result.
The final axis is memory discipline at fetch time. Per-row work
multiplied by millions of rows means every avoidable copy is amortised
across the whole table; the canonical optimisation is peek semantics —
the fetcher returns a pointer into a persistent buffer rather than copying
into a fresh DB_VALUE. CUBRID uses peek by default (fetch_peek_dbval,
fetch_val_list (..., peek=PEEK)); the copying counterpart fetch_copy_dbval
exists for cases where the value needs to outlive the page latch (sort keys,
group-by accumulators, sub-query result bindings). Petrov frames this as the
classic ownership question that every database iterator must answer: who
clears the value, and when?
Common DBMS Design
Section titled “Common DBMS Design”Every engine that interprets predicates and expressions at runtime — Postgres, MySQL, SQL Server, CUBRID — solves the same four sub-problems with three recurring patterns.
A predicate type with a dispatch enum
Section titled “A predicate type with a dispatch enum”The plan tree carries the predicate as a tagged-union node, and the evaluator
switches on the tag. Postgres has Expr with NodeTag (T_OpExpr,
T_BoolExpr, T_NullTest, …); ExecQual walks the qual list, calls
ExecEvalExpr on each, and AND-s the results. MySQL builds Item trees with
virtual methods (val_int, val_str, val_real); each operator overrides
the appropriate val_*. SQL Server’s expression service compiles a stack
machine: each node becomes a small program, and the evaluator interprets the
program for each row.
CUBRID’s tag enum is TYPE_PRED_EXPR:
// TYPE_PRED_EXPR — src/xasl/xasl_predicate.hpptypedef enum{ T_PRED = 1, // Boolean operator (AND / OR / XOR / IS / IS NOT) T_EVAL_TERM, // Comparison or membership leaf T_NOT_TERM // Logical negation} TYPE_PRED_EXPR;T_PRED nodes carry a BOOL_OP (B_AND / B_OR / B_XOR / B_IS /
B_IS_NOT); T_EVAL_TERM nodes carry a TYPE_EVAL_TERM discriminator
(T_COMP_EVAL_TERM, T_ALSM_EVAL_TERM, T_LIKE_EVAL_TERM,
T_RLIKE_EVAL_TERM); T_NOT_TERM is a thin wrapper around a child
pred_expr *. The structure is exactly what a textbook recursive-descent
predicate evaluator reads.
A regu variable for expression leaves
Section titled “A regu variable for expression leaves”The shape-uniform expression node is everywhere. Postgres uses Expr with a
common header; the actual sub-types (Const, Var, OpExpr, FuncExpr)
live behind a NodeTag. MySQL’s Item is a class hierarchy where each leaf
overrides the value accessors. CUBRID’s choice is a fixed-size C struct with
an explicit REGU_DATATYPE tag plus a union of payload pointers — a flat
representation chosen because the XASL tree is serialised across the
client/server boundary, and a flat struct is straightforward to pack and
unpack via xasl_to_stream / stream_to_xasl. The CUBRID flavour does not
support virtual dispatch through the wire, so the union-tag form is the only
option.
The regu variable is what makes the same walker handle every kind of operand. Comparison terms, arithmetic expressions, function calls, and sub-query references all resolve through a single fetcher; the predicate evaluator itself never branches on the operand kind.
Three-valued return through every layer
Section titled “Three-valued return through every layer”Each layer must return one of {true, false, unknown, error}. Postgres uses
bool plus a separate *isNull out-parameter for unknown; MySQL uses
Item::null_value set as a side effect of val_*. CUBRID uses a single
4-valued enum, DB_LOGICAL, encoding V_TRUE, V_FALSE, V_UNKNOWN, and
V_ERROR as distinct values. The advantage is that every layer can express
all four outcomes through one return path; the cost is that the walker must
distinguish unknown-from-NULL from error-from-failure at every recursion. The
short helpers eval_negative (for NOT) and eval_logical_result (for
combining sub-results) embody the tri-state truth tables and are reused across
the file.
Specialise the common cases, recurse on the rest
Section titled “Specialise the common cases, recurse on the rest”The operator-specialisation pattern: at predicate-tree analysis time, the
engine looks at the predicate shape and picks a specialised evaluator that
skips the per-node dispatch. CUBRID’s implementation is eval_fnc, which
returns a PR_EVAL_FNC function pointer — eval_pred_comp0 for a binary
comparison with no NULL or LIST_ID complication, eval_pred_comp1 for
IS NULL, eval_pred_comp2 for EXISTS, eval_pred_comp3 for a comparison
where one side is a list-file reference, eval_pred_alsm4 / eval_pred_alsm5
for ANY / ALL over set or list, eval_pred_like6 for LIKE,
eval_pred_rlike7 for RLIKE. The full eval_pred is the fallback when the
predicate is a Boolean composition that cannot be reduced to a single
specialised shape; that is where the recursive AND/OR/XOR/NOT walker lives.
The SCAN_PRED::pr_eval_fnc field in query_evaluator.h is what scan_manager
caches per scan so the per-row cost is one indirect call and not a repeated
switch on the root.
Where CUBRID sits
Section titled “Where CUBRID sits”CUBRID is an interpreted, no-JIT engine. The walker is the C function
eval_pred; the leaf compute is the function pointer fetch_peek_dbval. The
specialisation pass is eval_fnc. There is no batched / vectorised path —
every leaf is one DB_VALUE, every comparison is one row at a time. Compared
to Postgres post-LLVM, CUBRID gives up the JIT speedup; compared to MySQL’s
virtual-method overhead, CUBRID gains the flat struct’s locality. The
engineering trade is straightforward: simple to understand, easy to serialise,
slow on long predicate chains over scan-heavy workloads.
| Theoretical concept | CUBRID symbol |
|---|---|
| Predicate tree node | cubxasl::pred_expr (PRED_EXPR) |
| Boolean operator opcode | BOOL_OP (B_AND / B_OR / B_XOR / B_IS / B_IS_NOT) |
| Eval-term sub-type | TYPE_EVAL_TERM (T_COMP_EVAL_TERM / T_ALSM_EVAL_TERM / T_LIKE_* / T_RLIKE_*) |
| Three-valued result | DB_LOGICAL (V_TRUE / V_FALSE / V_UNKNOWN / V_ERROR) |
| Recursive walker | eval_pred (query_evaluator.c) |
| Predicate negation | eval_negative (query_evaluator.c) |
| Tri-state combine | eval_logical_result (query_evaluator.c) |
| Specialised compiler | eval_fnc returning PR_EVAL_FNC |
| Specialised binary-eq path | eval_pred_comp0 |
| Specialised IS NULL path | eval_pred_comp1 |
| Specialised EXISTS path | eval_pred_comp2 |
| Specialised list-id compare path | eval_pred_comp3 |
| Specialised ANY/ALL paths | eval_pred_alsm4 (set), eval_pred_alsm5 (list) |
| Specialised LIKE / RLIKE paths | eval_pred_like6, eval_pred_rlike7 |
| Regu variable | regu_variable_node (REGU_VARIABLE) (regu_var.hpp) |
| Regu kind tag | REGU_DATATYPE (TYPE_DBVAL / TYPE_CONSTANT / TYPE_INARITH / TYPE_ATTR_ID / …) |
| Peek-fetcher dispatcher | fetch_peek_dbval (fetch.c) |
| Copy-fetcher | fetch_copy_dbval (fetch.c) |
| Per-row regu list materialisation | fetch_val_list (fetch.c) |
| Arithmetic-expression evaluator | fetch_peek_arith (fetch.c) |
| Function-call dispatcher | qdata_evaluate_function (query_opfunc.c) |
| Filter bridge from scan-manager | eval_data_filter / eval_key_filter (query_evaluator.c) |
| Tri-state qualification policy | QPROC_QUALIFICATION enum (query_evaluator.h) |
| Qualification update | update_logical_result (query_evaluator.c) |
CUBRID’s Approach
Section titled “CUBRID’s Approach”The evaluator does four things and everything else is a special case of one of
them: (1) walk a PRED_EXPR tree under three-valued logic, returning
V_TRUE / V_FALSE / V_UNKNOWN; (2) at every leaf, fetch the operand
DB_VALUEs through the polymorphic fetch_peek_dbval; (3) where the
predicate’s shape is amenable, skip the recursion and call a specialised
single-shape evaluator picked at scan-open time by eval_fnc; (4) interface
with the scan manager via the eval_data_filter / eval_key_filter
boundaries, which are the entry points the rest of the executor calls. The
rest of this section walks each piece in the order a real predicate
traverses them — top-down through eval_pred, downward through the regu
variable to fetch_peek_dbval, sideways into qdata_evaluate_function for
function-call leaves.
The PRED_EXPR tree shape
Section titled “The PRED_EXPR tree shape”A PRED_EXPR is a tagged union of three node kinds, sitting on top of a small
zoo of eval-term variants:
flowchart TB
subgraph PE["pred_expr (TYPE_PRED_EXPR tag)"]
direction LR
P_PRED["T_PRED<br/>{ lhs, rhs, bool_op }<br/>BOOL_OP ∈<br/>{B_AND, B_OR, B_XOR,<br/>B_IS, B_IS_NOT}"]
P_EVAL["T_EVAL_TERM<br/>{ et_type, et union }"]
P_NOT["T_NOT_TERM<br/>{ not_term: pred_expr* }"]
end
subgraph ET["eval_term variants (TYPE_EVAL_TERM tag)"]
direction LR
ETC["T_COMP_EVAL_TERM<br/>{ lhs, rhs : regu_variable*<br/> rel_op : REL_OP<br/> type : DB_TYPE }"]
ETA["T_ALSM_EVAL_TERM<br/>{ elem, elemset : regu_variable*<br/> eq_flag : F_ALL or F_SOME<br/> rel_op : REL_OP }"]
ETL["T_LIKE_EVAL_TERM<br/>{ src, pattern, esc_char :<br/> regu_variable* }"]
ETR["T_RLIKE_EVAL_TERM<br/>{ src, pattern, case_sensitive,<br/> compiled_regex }"]
end
P_EVAL --> ETC
P_EVAL --> ETA
P_EVAL --> ETL
P_EVAL --> ETR
P_PRED -. recurse .-> PE
P_NOT -. recurse .-> PE
This shape is the load-bearing diagram of the chapter — eval_pred is a
recursive walker over exactly this graph. T_PRED is a Boolean
combinator; the bool_op tells the walker whether to AND-fold or OR-fold its
children. T_NOT_TERM flips the sub-result via eval_negative. T_EVAL_TERM
is a leaf; the leaf’s et_type decides which compare/match function applies.
The four eval-term variants cover the four shapes of base predicates SQL
admits at the leaf — comparisons (=, <, …), all/some quantification (> ANY, = ALL), LIKE pattern match, regular-expression match (RLIKE /
REGEXP).
REL_OP is the comparison vocabulary at a T_COMP_EVAL_TERM. The full set
encodes both ordinal comparisons (R_EQ, R_NE, R_LT, R_LE, R_GT,
R_GE), set comparisons (R_SUBSET, R_SUBSETEQ, R_SUPERSET,
R_SUPERSETEQ), the special unary tests (R_NULL for IS NULL, R_EXISTS
for EXISTS), the quantified comparisons (R_EQ_SOME, R_LT_ALL, …),
total-order comparison (R_EQ_TORDER) and null-safe equality
(R_NULLSAFE_EQ). The driver uses the tag to pick a specialised path inside
eval_value_rel_cmp; the quantified *_SOME / *_ALL codes are translated
to the equivalent base op and dispatched to eval_some_eval /
eval_all_eval.
Three-valued result and the AND/OR walker
Section titled “Three-valued result and the AND/OR walker”eval_pred is a recursive descent over the PRED_EXPR tree, returning
V_TRUE, V_FALSE, V_UNKNOWN, or V_ERROR. The walker has two structural
optimisations: (a) it short-circuits AND/OR as soon as the result is
determined, and (b) it walks right-linear chains iteratively rather than
recursively, reflecting the bias the XASL generator produces (pt_to_pred_expr
emits right-linear AND/OR chains so the walker can iterate on rhs without
blowing the stack on long conjunctions).
flowchart TD
Start["eval_pred (pr)"] --> Type{"pr->type"}
Type -->|T_PRED| Bool{"bool_op"}
Type -->|T_EVAL_TERM| Et{"et_type"}
Type -->|T_NOT_TERM| Not["eval_pred (not_term)<br/>then eval_negative"]
Bool -->|B_AND| AndLoop["result = V_TRUE<br/>walk right-linear<br/>chain on .rhs<br/>break on V_FALSE / V_ERROR"]
Bool -->|B_OR| OrLoop["result = V_FALSE<br/>walk right-linear<br/>chain on .rhs<br/>break on V_TRUE / V_ERROR"]
Bool -->|B_XOR| Xor["eval lhs, eval rhs<br/>UNKNOWN if either UNKNOWN<br/>else lhs != rhs"]
Bool -->|B_IS / B_IS_NOT| Is["eval lhs, eval rhs<br/>compare 3-valued results"]
Et -->|T_COMP_EVAL_TERM| Cmp["fetch lhs, rhs<br/>NULL → V_UNKNOWN<br/>else eval_value_rel_cmp"]
Et -->|T_ALSM_EVAL_TERM| Alsm["fetch elem, elemset<br/>route to eval_some_∗<br/>or eval_all_∗ by F_ALL/F_SOME"]
Et -->|T_LIKE_EVAL_TERM| Like["fetch src, pattern, esc<br/>db_string_like"]
Et -->|T_RLIKE_EVAL_TERM| Rlike["delegate to<br/>eval_pred_rlike7"]
The B_AND arm is the prototypical example of the right-linear iteration:
// eval_pred — src/query/query_evaluator.c (B_AND, condensed)case B_AND: /* 'pt_to_pred_expr()' will generate right-linear tree */ result = V_TRUE; for (t_pr = pr; result == V_TRUE && t_pr->type == T_PRED && t_pr->pe.m_pred.bool_op == B_AND; t_pr = t_pr->pe.m_pred.rhs) { if (result == V_UNKNOWN) { result = eval_pred (thread_p, t_pr->pe.m_pred.lhs, vd, obj_oid); result = (result == V_TRUE) ? V_UNKNOWN : result; } else { result = eval_pred (thread_p, t_pr->pe.m_pred.lhs, vd, obj_oid); } if (result == V_FALSE || result == V_ERROR) goto exit; } /* tail evaluation for the final right child */ if (result == V_UNKNOWN) { result = eval_pred (thread_p, t_pr, vd, obj_oid); result = (result == V_TRUE) ? V_UNKNOWN : result; } else { result = eval_pred (thread_p, t_pr, vd, obj_oid); } break;Three things to read from this loop. First, the for advances along .rhs,
not via stack recursion — the caller pays one stack frame for the whole
conjunction. Second, the result == V_UNKNOWN branch is the three-valued
discipline made explicit: once UNKNOWN is sticky, a V_TRUE from a later
conjunct cannot promote the whole AND back to true (the result becomes
V_UNKNOWN, not V_TRUE), but a V_FALSE still wins. Third, the goto exit
on V_FALSE is the short-circuit; a strict left-to-right evaluator would have
walked the rest of the chain. The B_OR arm is the dual pattern with the
sentinel flipped; B_XOR and B_IS / B_IS_NOT are short and not
right-linear because they have semantic two-childness rather than associative
chainability.
The recursion-depth guard at the top of eval_pred (incremented on entry,
decremented on exit) protects against pathologically deep predicate trees;
when thread_get_recursion_depth (thread_p) > prm_get_integer_value (PRM_ID_MAX_RECURSION_SQL_DEPTH) the walker bails with ER_MAX_RECURSION_SQL_DEPTH.
REGU_VARIABLE polymorphism
Section titled “REGU_VARIABLE polymorphism”A regu variable is a fixed-size descriptor of an expression operand; it
unifies the kinds the predicate walker needs to read. The shape is one
REGU_DATATYPE tag, a flags word, two domain pointers (current and
original — relevant for XASL clones), an optional fetch-into slot
(vfetch_to), and a discriminated union of payloads:
flowchart TB
subgraph RV["regu_variable_node"]
T["type : REGU_DATATYPE"]
F["flags<br/>(REGU_VARIABLE_HIDDEN_COLUMN,<br/>FETCH_ALL_CONST, FETCH_NOT_CONST,<br/>APPLY_COLLATION, ANALYTIC_WINDOW, …)"]
D["domain : TP_DOMAIN*<br/>original_domain : TP_DOMAIN*"]
VF["vfetch_to : DB_VALUE*<br/>(slot to share-fetch into, when used as a list element)"]
XL["xasl : xasl_node*<br/>(linked sub-query, if any)"]
U["value : union (one of)"]
end
subgraph UN["union value (selected by type)"]
direction TB
DBV["DB_VALUE dbval<br/>(TYPE_DBVAL — embedded literal)"]
DPV["DB_VALUE *dbvalptr<br/>(TYPE_CONSTANT, TYPE_ORDERBY_NUM —<br/>pointer into the sub-query result slot)"]
AR["ARITH_TYPE *arithptr<br/>(TYPE_INARITH, TYPE_OUTARITH —<br/>arithmetic / general expression)"]
AT["ATTR_DESCR attr_descr<br/>(TYPE_ATTR_ID, TYPE_CLASS_ATTR_ID, TYPE_SHARED_ATTR_ID —<br/>attribute fetch through HEAP_CACHE_ATTRINFO)"]
PD["QFILE_TUPLE_VALUE_POSITION pos_descr<br/>(TYPE_POSITION — list-file column read)"]
SR["QFILE_SORTED_LIST_ID *srlist_id<br/>(TYPE_LIST_ID — sub-query list ref)"]
VP["int val_pos<br/>(TYPE_POS_VALUE — host-variable index)"]
FN["function_node *funcp<br/>(TYPE_FUNC — function call)"]
RVL["REGU_VALUE_LIST *reguval_list<br/>(TYPE_REGUVAL_LIST — VALUES query)"]
RVS["REGU_VARIABLE_LIST regu_var_list<br/>(TYPE_REGU_VAR_LIST — CUME_DIST / PERCENT_RANK)"]
SP["sp_node *sp_ptr<br/>(TYPE_SP — stored procedure)"]
end
U --- DBV
U --- DPV
U --- AR
U --- AT
U --- PD
U --- SR
U --- VP
U --- FN
U --- RVL
U --- RVS
U --- SP
The advantage is that every operand of every predicate or expression — a
column, a literal, the result of (SELECT MAX(x) FROM t), the result of
a + b * 3, the result of JSON_EXTRACT(j, '$.foo') — is a REGU_VARIABLE.
The walker never asks “what kind of thing is this?”; it asks the fetcher,
which knows.
TYPE_OID and TYPE_CLASSOID are tagless cases — they need no payload
because the dispatcher reads the current row’s obj_oid / class_oid
arguments (passed alongside the regu variable) and stuffs them into the
shared value.dbval slot.
The flags carry information that survives across calls to the fetcher: in
particular, REGU_VARIABLE_FETCH_ALL_CONST means “every recursive operand
of this regu was constant on first fetch, so the cached value.dbval
remains valid”, and REGU_VARIABLE_FETCH_NOT_CONST is its dual. They are
populated by fetch_peek_dbval on the first call — the fetcher is
self-classifying — and read on subsequent calls to short-circuit the work.
This is CUBRID’s per-row constant-folding: an arithmetic expression of
literals becomes a one-time evaluation, even though the predicate walker
sees it again on every row.
fetch_peek_dbval — the regu dispatcher
Section titled “fetch_peek_dbval — the regu dispatcher”fetch_peek_dbval is a single switch on regu_var->type that resolves a
regu variable to a DB_VALUE *. It is the peek form — it returns a pointer
to a buffer the caller must not free; the value is valid until the next
operation on the same scan invalidates it. The copying counterpart
fetch_copy_dbval calls fetch_peek_dbval and then qdata_copy_db_values
the result into a caller-owned slot.
flowchart LR
In["regu_var<br/>· vd, class_oid, obj_oid, tpl"] --> Disp{"regu_var->type"}
Disp -->|TYPE_DBVAL| RD1["FLAG ALL_CONST<br/>peek = &value.dbval"]
Disp -->|TYPE_CONSTANT| RD2["FLAG NOT_CONST<br/>EXECUTE_REGU_VARIABLE_XASL<br/>(materialise sub-query)<br/>peek = value.dbvalptr"]
Disp -->|TYPE_ATTR_ID<br/>(also CLASS_ATTR_ID, SHARED_ATTR_ID)| RD3["FLAG NOT_CONST<br/>peek = attr_descr.cache_dbvalp<br/>(else heap_attrinfo_access)"]
Disp -->|TYPE_POSITION| RD4["FLAG NOT_CONST<br/>qfile_locate_tuple_value tpl<br/>data_readval into vfetch_to"]
Disp -->|TYPE_POS_VALUE| RD5["FLAG ALL_CONST<br/>peek = vd->dbval_ptr + val_pos<br/>(host variable)"]
Disp -->|TYPE_OID / TYPE_CLASSOID| RD6["FLAG ALL_CONST<br/>db_make_oid into value.dbval"]
Disp -->|TYPE_INARITH<br/>TYPE_OUTARITH| RD7["fetch_peek_arith<br/>(recursive expression eval)"]
Disp -->|TYPE_FUNC| RD8["qdata_evaluate_function<br/>(F_SET / F_JSON_∗ / F_REGEXP_∗ / …)"]
Disp -->|TYPE_SP| RD9["pl::executor.execute<br/>(stored-procedure call)"]
Disp -->|TYPE_REGUVAL_LIST| RDB["recurse on reguval_list<br/>->current_value->value"]
Disp -->|TYPE_ORDERBY_NUM| RDA["FLAG NOT_CONST<br/>peek = value.dbvalptr"]
RD3 --> Cache["cache_dbvalp pointer cached<br/>on first read; subsequent rows<br/>read directly without HEAP lookup"]
RD2 --> SqCache["XASL_USES_SQ_CACHE:<br/>sq_get keyed by aptr correlation"]
The path-by-path summary mirrors the diagram. Each path contributes one specific responsibility:
- TYPE_DBVAL — embedded literal. Returns
®u_var->value.dbval. SetsFETCH_ALL_CONST. Cheapest case: zero work after the initial flag set. - TYPE_CONSTANT — pointer-into-aptr-result. The
dbvalptrwas bound by the parent’s pre-execution to point into the materialised aptr’s tuple. On the first read, the linked sub-query may need executing (EXECUTE_REGU_VARIABLE_XASL); subsequent rows read the cached pointer. WhenXASL_USES_SQ_CACHEis set,sq_getshort-circuits via the sub-query cache. - TYPE_ATTR_ID (and
TYPE_CLASS_ATTR_ID,TYPE_SHARED_ATTR_ID) — attribute fetch. Reads throughHEAP_CACHE_ATTRINFO(the per-class decoder cache the heap manager populates onheap_next/heap_get_visible_version). The first call resolvesregu_var->value.attr_descr.idagainst the cache viaheap_attrinfo_access; the resultingDB_VALUE *is stashed inattr_descr.cache_dbvalpso subsequent rows skip the lookup. SetsFETCH_NOT_CONST(the value changes per row by definition). - TYPE_POSITION — list-file tuple column. The current row is a
QFILE_TUPLE;qfile_locate_tuple_value (tpl, pos_descr.pos_no, ...)finds the column’s slot in the tuple, andpr_type->data_readvaldecodes it intoregu_var->vfetch_to. The decoded value is unsuitable for sharing across rows; CUBRID always allocates a per-reguvfetch_toslot in advance. - TYPE_POS_VALUE — host variable.
vd->dbval_ptr[val_pos]. Constant for the query. Pure pointer arithmetic. - TYPE_OID / TYPE_CLASSOID — current OID. The
obj_oid/class_oidarguments are written intoregu_var->value.dbval. These are not strictly constant (they change per row) butFETCH_ALL_CONSTis still set because they are not derived from any operand whose constancy matters. - TYPE_INARITH / TYPE_OUTARITH — arithmetic / general expression.
Delegates to
fetch_peek_arith, which is a 1500-line switch onarithptr->opcodecovering every SQL operator (T_ADD,T_SUBSTRING,T_CASE,T_DECODE,T_TRIM,T_RAND,T_TO_CHAR, …). The recursive structure is the same as the predicate walker: fetchleftptr,rightptr, optionallythirdptr/fourthptr, then apply the per-opcode logic intoarithptr->value. Constancy is computed bottom-up: an arithmetic node isALL_CONSTiff all of its sub-regus are. Side-effect-bearing opcodes (T_RAND,T_SLEEP,T_NEXT_VALUE,T_INCR,T_CASE/T_DECODE/T_IF/T_PREDICATE,T_SYS_GUID,T_ROW_COUNT,T_LAST_INSERT_ID,T_EVALUATE_VARIABLE) forceNOT_CONSTregardless of operand constancy — these never benefit from the per-row cache. - TYPE_FUNC — function call. If
FETCH_ALL_CONSTis already set, the pre-computedfuncp->valueis returned without re-evaluation. Otherwise,qdata_evaluate_function(inquery_opfunc.c) dispatches onfuncp->ftype(theFUNC_CODEenum), calling out to one of the F_-routines:qdata_convert_dbvals_to_setforF_SET/F_MULTISET/F_SEQUENCE;qdata_insert_substring_functionforF_INSERT_SUBSTRING;qdata_eltforF_ELT;qdata_regexp_functionfor the regex family;qdata_convert_operands_to_value_and_callfor the entire JSON family with adb_evaluate_json_*function pointer passed in. After the call the constancy flags are recomputed by walking thefuncp->operandchain. - TYPE_SP — stored-procedure call. Constructs a
cubpl::executor, fetches arguments throughexecutor.fetch_args_peek, then runsexecutor.execute (*regu_var->value.sp_ptr->value). AlwaysNOT_CONST(recursively forced byfetch_force_not_const_recursiveto ensure the stored procedure’s body is re-evaluated per call). - TYPE_REGUVAL_LIST — VALUES query (multi-row literal table).
Recurses on
reguval_list->current_value->value. The list is iterated by theS_VALUES_SCANdriver, which advancescurrent_valueper row. - TYPE_ORDERBY_NUM —
ORDER BY nreference. Behaves likeTYPE_CONSTANTfor fetch purposes but isNOT_CONSTbecause the number-binding changes per output row.
The shared post-amble reapplies collation when REGU_VARIABLE_APPLY_COLLATION
is set, runs domain refinement (resolving DB_TYPE_VARIABLE against the
fetched value’s actual type), and for TYPE_REGUVAL_LIST cross-checks that
the current row’s domain matches the head row’s (the same homogeneity rule
UNION enforces). Errors at this stage manifest as
ER_QPROC_INCOMPATIBLE_TYPES or ER_QSTR_INCOMPATIBLE_COLLATIONS.
Attribute fetch end-to-end
Section titled “Attribute fetch end-to-end”The TYPE_ATTR_ID path is where the evaluator meets the heap. The handshake
lives in three pieces. First, scan_open_heap_scan calls heap_attrinfo_start
to allocate a HEAP_CACHE_ATTRINFO for the columns referenced by the
predicate plus the output list. Second, on every fetched row, the heap
iterator (heap_next / heap_get_visible_version_internal) calls
heap_attrinfo_read_dbvalues on the new RECDES, decoding each referenced
column into a per-attribute DB_VALUE inside the cache. Third, when
fetch_peek_dbval hits a TYPE_ATTR_ID regu, it calls
heap_attrinfo_access (regu_var->value.attr_descr.id, attr_cache) and
caches the returned pointer in attr_descr.cache_dbvalp so the subsequent
peek (within the same row) is a direct dereference. The cache_dbvalp is
NOT cleared between rows — the underlying buffer is reused, the value
inside is overwritten by heap_attrinfo_read_dbvalues. The fetcher
trusts the heap manager’s per-row reset.
eval_data_filter in query_evaluator.c is the bridge that ensures the
attribute cache is populated before the predicate evaluation:
// eval_data_filter — src/query/query_evaluator.c (condensed)DB_LOGICALeval_data_filter (THREAD_ENTRY * thread_p, OID * oid, RECDES * recdesp, HEAP_SCANCACHE * scan_cache, FILTER_INFO * filterp){ SCAN_PRED *scan_predp = filterp->scan_pred; SCAN_ATTRS *scan_attrsp = filterp->scan_attrs; DB_LOGICAL ev_res;
if (scan_attrsp != NULL && scan_attrsp->attr_cache != NULL && scan_predp->regu_list != NULL) { /* read the predicate values from the heap into the attribute cache */ if (heap_attrinfo_read_dbvalues (thread_p, oid, recdesp, scan_attrsp->attr_cache) != NO_ERROR) return V_ERROR; /* ... class-attribute scan special case ... */ }
/* evaluate the predicates of the data filter via the compiled fast path */ ev_res = V_TRUE; if (scan_predp->pr_eval_fnc && scan_predp->pred_expr) ev_res = (*scan_predp->pr_eval_fnc) (thread_p, scan_predp->pred_expr, filterp->val_descr, oid);
if (ev_res == V_TRUE && scan_predp->regu_list && filterp->val_list) { /* row passed: fetch the output regu list into val_list */ if (fetch_val_list (thread_p, scan_predp->regu_list, filterp->val_descr, filterp->class_oid, oid, NULL, PEEK) != NO_ERROR) return V_ERROR; } return ev_res;}Three details to read off this. First, heap_attrinfo_read_dbvalues is
called before the predicate, so every TYPE_ATTR_ID regu in the predicate
finds its cached value already populated. Second, the predicate runs through
scan_predp->pr_eval_fnc — that pointer was set by eval_fnc at scan-open
time, so this is the specialised path, not the generic eval_pred. Third,
the output list (fetch_val_list (..., PEEK)) is materialised only when
the predicate returns V_TRUE — failed rows skip the projection cost.
eval_key_filter is the analogue for index scans: instead of reading from
a RECDES, it decodes the key columns out of a DB_MIDXKEY (multi-column
B-tree key), positions them into the attribute cache, and then runs the
same pr_eval_fnc — but with obj_oid = NULL, since the key filter cannot
reference columns that are not in the key.
eval_fnc — operator specialisation at scan-open time
Section titled “eval_fnc — operator specialisation at scan-open time”When scan_open_heap_scan initialises a SCAN_PRED, it calls eval_fnc to
pick the per-row evaluator. eval_fnc looks at the root PRED_EXPR and
returns a function pointer that bypasses the recursive walker for the common
single-shape cases:
// eval_fnc — src/query/query_evaluator.c (condensed)PR_EVAL_FNCeval_fnc (THREAD_ENTRY * thread_p, const PRED_EXPR * pr, DB_TYPE * single_node_type){ *single_node_type = DB_TYPE_NULL; if (pr == NULL) return NULL;
if (pr->type == T_EVAL_TERM) { switch (pr->pe.m_eval_term.et_type) { case T_COMP_EVAL_TERM: { const COMP_EVAL_TERM *et_comp = &pr->pe.m_eval_term.et.et_comp; *single_node_type = et_comp->type;
if (et_comp->rel_op == R_NULL) return (PR_EVAL_FNC) eval_pred_comp1; /* IS NULL */ if (et_comp->rel_op == R_EXISTS) return (PR_EVAL_FNC) eval_pred_comp2; /* EXISTS */ if (et_comp->lhs->type == TYPE_LIST_ID || et_comp->rhs->type == TYPE_LIST_ID) return (PR_EVAL_FNC) eval_pred_comp3; /* list-id cmp */ return (PR_EVAL_FNC) eval_pred_comp0; /* binary cmp */ } case T_ALSM_EVAL_TERM: { const ALSM_EVAL_TERM *et_alsm = &pr->pe.m_eval_term.et.et_alsm; *single_node_type = et_alsm->item_type; return (et_alsm->elemset->type != TYPE_LIST_ID) ? (PR_EVAL_FNC) eval_pred_alsm4 /* set-bound ALL/SOME */ : (PR_EVAL_FNC) eval_pred_alsm5; /* list-bound ALL/SOME */ } case T_LIKE_EVAL_TERM: return (PR_EVAL_FNC) eval_pred_like6; case T_RLIKE_EVAL_TERM: return (PR_EVAL_FNC) eval_pred_rlike7; } } /* general case: predicate is a Boolean composition; need full walker */ return (PR_EVAL_FNC) eval_pred;}The decision is purely structural — no value-flow analysis, no cost model.
eval_fnc returns eval_pred_comp0 if and only if the predicate is a
single binary comparison without IS NULL, EXISTS, or list-file
operands; if any structural twist is present, the more general specialised
path is picked, and only when no specialisation matches does the full
eval_pred fall out as the fallback. Each specialised path inlines its
known structure: eval_pred_comp0 does not switch on et_type because it
already knows it’s T_COMP_EVAL_TERM with the simple sub-shape;
eval_pred_comp1 does not check rel_op because it knows R_NULL. The
saving is small per call but multiplied by every row of every scan.
ANY/ALL — set comparisons via list-file walking
Section titled “ANY/ALL — set comparisons via list-file walking”eval_pred_alsm4 and eval_pred_alsm5 are the specialised paths for
quantified comparisons. The _alsm4 form fetches the rhs as a DB_SET
and dispatches to eval_some_eval or eval_all_eval (set walking via
set_get_element); _alsm5 materialises the rhs as a list-file via
EXECUTE_REGU_VARIABLE_XASL and then dispatches to eval_some_list_eval
or eval_all_list_eval (list-file scan via qfile_scan_list_next):
// eval_pred_alsm5 — src/query/query_evaluator.c (condensed; list-bound ANY/ALL)DB_LOGICALeval_pred_alsm5 (THREAD_ENTRY * thread_p, const PRED_EXPR * pr, val_descr * vd, OID * obj_oid){ const ALSM_EVAL_TERM *et_alsm = &pr->pe.m_eval_term.et.et_alsm;
/* materialise the inner sub-query */ EXECUTE_REGU_VARIABLE_XASL (thread_p, et_alsm->elemset, vd); if (CHECK_REGU_VARIABLE_XASL_STATUS (et_alsm->elemset) != XASL_SUCCESS) return V_ERROR;
QFILE_SORTED_LIST_ID *srlist_id = et_alsm->elemset->value.srlist_id; if (srlist_id->list_id->tuple_cnt == 0) return (et_alsm->eq_flag == F_ALL) ? V_TRUE : V_FALSE; /* empty-set ANSI rule */
DB_VALUE *peek_val1 = NULL; if (fetch_peek_dbval (thread_p, et_alsm->elem, vd, NULL, obj_oid, NULL, &peek_val1) != NO_ERROR) return V_ERROR; if (db_value_is_null (peek_val1)) return V_UNKNOWN;
return (et_alsm->eq_flag == F_ALL) ? eval_all_list_eval (thread_p, peek_val1, srlist_id->list_id, et_alsm->rel_op) : eval_some_list_eval (thread_p, peek_val1, srlist_id->list_id, et_alsm->rel_op);}The empty-set rule (F_ALL over empty is V_TRUE, F_SOME over empty is
V_FALSE) is ANSI-canonical and is the only place CUBRID short-circuits
without scanning the right side. The eval_all_list_eval / eval_some_list_eval
helpers walk the list file with qfile_scan_list_next, decoding column 0 of
each tuple into a DB_VALUE and feeding it to eval_value_rel_cmp against
the lhs. The walk short-circuits on the first decisive answer —
some_list_eval returns V_TRUE on first match; all_list_eval is
implemented as not some(neg-op) and returns V_FALSE on first
counter-example.
eval_value_rel_cmp — the bottom of the comparison stack
Section titled “eval_value_rel_cmp — the bottom of the comparison stack”Once both sides are DB_VALUE *, every comparison ultimately ends in
eval_value_rel_cmp. The sequence is:
- Constant coercion (one-time) — when the rhs is a
FETCH_ALL_CONSTregu and its domain differs from the lhs, the rhs is coerced once into the lhs’s domain. The intent is to avoid re-coercing the same constant per row insidetp_value_compare_with_error. The currently-active policies are: numeric ↔ char (coerce char to double), date/time ↔ char (coerce char to the date/time domain), narrow numeric ↔ wide numeric (coerce narrow to wide). The lhs-coercion mirror is present but#if 0-disabled in source — left as a TODO. tp_value_compare_with_error— the type-aware comparator returnsDB_LT/DB_EQ/DB_GT/DB_UNK(or setscomparable = false).DB_UNKtriggersV_UNKNOWNfor ordinaryREL_OPs;R_NULLSAFE_EQhas its own NULL-aware logic that admitsNULL = NULL → V_TRUE.- REL_OP mapping — the
intresult is mapped toV_TRUE/V_FALSEperrel_operator.R_EQchecksresult == DB_EQ;R_LEacceptsDB_LTorDB_EQ; the set comparisons map toDB_SUBSET/DB_SUPERSET/DB_EQ;R_NULLSAFE_EQis the only operator that treatsNULL = NULLasV_TRUE.
R_EQ_TORDER is a special-case total-order comparison used for sort and
group-by where NULLs are treated as a definite value (typically smallest).
Calling tp_value_compare_with_error with the total-order flag set yields
DB_EQ even for NULL = NULL.
End-to-end: scan emits row → predicate verdict → output projection
Section titled “End-to-end: scan emits row → predicate verdict → output projection”flowchart TD S1["scan_next_heap_scan or scan_next_index_scan"] --> S2["heap_get_visible_version_internal:<br/>fetch RECDES + heap_attrinfo_read_dbvalues"] S2 --> S3["eval_data_filter (or eval_key_filter):<br/>scan_pred->pr_eval_fnc()"] S3 -->|V_TRUE| S4["fetch_val_list PEEK on output regu_list:<br/>each regu resolved via fetch_peek_dbval"] S3 -->|V_FALSE / V_UNKNOWN| S5["skip row, advance scan"] S3 -->|V_ERROR| S6["bubble error up to qexec_intprt_fnc"] S4 --> S7["row passed to consumer:<br/>list-file write or upstream operator"] S5 --> S1 S7 --> S1
The point of inflection is eval_data_filter. Its responsibilities are
narrow and easy to enumerate:
- Make sure the attribute cache is populated.
- Run the predicate via the pre-compiled fast path.
- If admitted, populate the output regu list (peek shares pointers; copy only happens when the row is committed to a list file or copied across page-fix boundaries).
When pr_eval_fnc == eval_pred (the general case), this expands into the
full recursive walker. When it is one of the specialised paths, the body
is a single fetch_peek_dbval per operand plus the type-specific
comparator. Either way, the contract is the same: take a row (as RECDES +
OID + cached attrinfo), return a tri-state verdict.
The integration with scan-manager
Section titled “The integration with scan-manager”scan_manager.c builds a FILTER_INFO per scan and uses it to drive the
predicate. The FILTER_INFO carries the SCAN_PRED (the regu list, the
PRED_EXPR, the cached pr_eval_fnc), the SCAN_ATTRS (the attribute IDs
and the HEAP_CACHE_ATTRINFO), the VAL_DESCR (host vars, sys datetime),
the class OID, and — for index-side filters — the B-tree attribute IDs and
the function-index column. Three different FILTER_INFO instances coexist
for an index scan: the range filter (used to bound the B-tree walk), the
key filter (applied per key during the walk; no heap fetch), and the
data filter (applied after the heap fetch); each is built once at
scan_open_index_scan time and re-used per row. See
cubrid-scan-manager.md for the complete scan-side narrative.
The key contract scan_manager holds with the evaluator is:
- Input: a row in the form (
RECDES,OID, populated attribute cache). - Output: a
DB_LOGICAL∈ {V_TRUE,V_FALSE,V_UNKNOWN,V_ERROR}. - Side effects: on
V_TRUE, the output regu list is materialised into the per-XASLval_listviafetch_val_list. OnV_ERROR, the scan is aborted and the error code is in TLS viaer_set. OnV_FALSE/V_UNKNOWNthe row is dropped and the scan advances.
update_logical_result — the qualification policy
Section titled “update_logical_result — the qualification policy”Different consumers want different collapse rules for V_UNKNOWN. A WHERE
clause wants unknown → false; a WHERE NOT clause wants
unknown → unknown (the negation is also unknown); an outer-join
qualification predicate sits in a third position where both qualified and
not-qualified rows are interesting. CUBRID encodes the policy as a
QPROC_QUALIFICATION enum and routes the result through
update_logical_result:
// update_logical_result — src/query/query_evaluator.c (condensed)DB_LOGICALupdate_logical_result (THREAD_ENTRY * thread_p, DB_LOGICAL ev_res, int *qualification){ if (ev_res == V_ERROR) return ev_res;
if (qualification != NULL) { switch (*qualification) { case QPROC_QUALIFIED: if (ev_res != V_TRUE) return V_FALSE; /* unknown → false */ break; case QPROC_NOT_QUALIFIED: if (ev_res != V_FALSE) return V_FALSE; /* unknown → false */ break; case QPROC_QUALIFIED_OR_NOT: /* both qualified and not-qualified are interesting; bookkeep */ if (ev_res == V_TRUE) *qualification = QPROC_QUALIFIED; else if (ev_res == V_FALSE) *qualification = QPROC_NOT_QUALIFIED; /* V_UNKNOWN: leave qualification unchanged */ break; } } return (ev_res == V_TRUE) ? V_TRUE : V_FALSE;}This is the function scan_manager calls after the predicate to fold the
tri-state result into the per-scan qualification policy. The
QPROC_QUALIFIED_OR_NOT case is the one that survives unchanged through
unknown — it is used by anti-joins and outer-join filters where the row’s
qualification status is to be observed, not enforced.
Closing the loop — eval_pred recursion vs the per-row path
Section titled “Closing the loop — eval_pred recursion vs the per-row path”To anchor the picture: the typical WHERE x > 0 AND y < 10 predicate goes
through eval_pred because it is a Boolean composition of two
COMP_EVAL_TERM leaves; eval_fnc returned eval_pred at scan-open time.
On every row, eval_pred walks the right-linear AND chain, calling itself
on each conjunct, which then walks into T_EVAL_TERM and runs
fetch_peek_dbval twice (lhs, rhs) plus eval_value_rel_cmp once. A
predicate of the form WHERE x = ? (host parameter) goes through
eval_pred_comp0 directly — eval_fnc recognised the single binary
comparison and skipped the recursion.
The corollary is that predicate complexity cost is mostly tree-walk
overhead, while expression complexity cost is mostly fetch_peek_arith
recursion and qdata_evaluate_function work. The two concerns are
orthogonal: a single comparison with a complex rhs (a 10-deep arithmetic
expression involving 5 functions) has no walker overhead but enormous
fetch overhead. Optimising one without the other gives no speedup on the
mixed case — which is why both paths are kept.
Source Walkthrough
Section titled “Source Walkthrough”Anchor on symbol names. Line numbers are scoped to this document’s
updated:date; the symbols are stable across refactors.
Predicate types and the dispatch enums (src/xasl/xasl_predicate.hpp)
Section titled “Predicate types and the dispatch enums (src/xasl/xasl_predicate.hpp)”cubxasl::pred_expr— the predicate node; tagged union of three kinds.TYPE_PRED_EXPR—T_PRED/T_EVAL_TERM/T_NOT_TERM.BOOL_OP—B_AND/B_OR/B_XOR/B_IS/B_IS_NOT.TYPE_EVAL_TERM—T_COMP_EVAL_TERM/T_ALSM_EVAL_TERM/T_LIKE_EVAL_TERM/T_RLIKE_EVAL_TERM.REL_OP— full set of relational opcodes including ordinal, set, quantified, total-order, null-safe.QL_FLAG—F_ALL/F_SOMEfor ANY / ALL quantification.cubxasl::pred—lhs,rhs,bool_opforT_PRED.cubxasl::comp_eval_term—lhs,rhs,rel_op,typeforT_COMP_EVAL_TERM.cubxasl::alsm_eval_term—elem,elemset,eq_flag,rel_op,item_typeforT_ALSM_EVAL_TERM.cubxasl::like_eval_term—src,pattern,esc_charforT_LIKE_*.cubxasl::rlike_eval_term—src,pattern,case_sensitive,compiled_regexforT_RLIKE_*.
Driver and 3-valued logic (src/query/query_evaluator.c)
Section titled “Driver and 3-valued logic (src/query/query_evaluator.c)”eval_pred— the recursive walker; entry point for the general case.eval_negative— three-valued NOT (V_TRUE ↔ V_FALSE,V_UNKNOWN/V_ERRORpass through).eval_logical_result— three-valued AND of two sub-results (V_ERRORdominates,V_FALSEis sticky, otherwiseV_UNKNOWN).eval_value_rel_cmp— the type-aware comparator with one-time rhs-constant coercion and per-REL_OPresult mapping.eval_set_list_cmp—set ⨯ set/set ⨯ list/list ⨯ listcomparison; sorts list files when needed and dispatches to the multi-set / sort-list comparators.
Set / list helpers (src/query/query_evaluator.c)
Section titled “Set / list helpers (src/query/query_evaluator.c)”eval_some_eval/eval_all_eval— element-vs-set ANY / ALL.eval_some_list_eval/eval_all_list_eval— element-vs-list-file ANY / ALL viaqfile_scan_list_next.eval_item_card_set/eval_item_card_sort_list— cardinality of matching members (multi-set semantics for IN comparisons).eval_sub_multi_set_to_sort_list/eval_sub_sort_list_to_multi_set/eval_sub_sort_list_to_sort_list— subset relations.eval_eq_multi_set_to_sort_list,eval_ne_*,eval_le_*,eval_lt_*in their multi-set / sort-list combinations — the primitives foreval_set_list_cmp.eval_multi_set_to_sort_list/eval_sort_list_to_multi_set/eval_sort_list_to_sort_list— the dispatchers perrel_operator.
Specialised single-shape evaluators (src/query/query_evaluator.c)
Section titled “Specialised single-shape evaluators (src/query/query_evaluator.c)”eval_pred_comp0— binary ordinal / set comparison without LIST_ID.eval_pred_comp1—IS NULL(also handlesOIDnot-null special case).eval_pred_comp2—EXISTSover a list file or a set.eval_pred_comp3— comparison where lhs and/or rhs is aTYPE_LIST_ID.eval_pred_alsm4— set-boundANY/ALL.eval_pred_alsm5— list-file-boundANY/ALL.eval_pred_like6—LIKE(callsdb_string_like).eval_pred_rlike7—RLIKE(callsdb_string_rlikewith cachedcompiled_regex).eval_fnc— the dispatcher that picks one of the above (or falls back toeval_pred) at scan-open time.
Filter bridges (src/query/query_evaluator.c)
Section titled “Filter bridges (src/query/query_evaluator.c)”eval_data_filter— the bridge fromscan_managerfor heap-side predicates; populates the attribute cache, invokespr_eval_fnc, conditionally materialises the output regu list.eval_key_filter— analogue for index-key predicates; decodes the key columns out ofDB_MIDXKEYinto the attribute cache before invokingpr_eval_fnc.update_logical_result— appliesQPROC_QUALIFICATIONpolicy.
Regu variable types and the polymorphic struct (src/query/regu_var.hpp)
Section titled “Regu variable types and the polymorphic struct (src/query/regu_var.hpp)”regu_variable_node(aliasREGU_VARIABLE) — the polymorphic expression-operand node; tagged-union shape.REGU_DATATYPE—TYPE_DBVAL/TYPE_CONSTANT/TYPE_INARITH/TYPE_OUTARITH/TYPE_ATTR_ID/TYPE_CLASS_ATTR_ID/TYPE_SHARED_ATTR_ID/TYPE_POSITION/TYPE_LIST_ID/TYPE_POS_VALUE/TYPE_OID/TYPE_CLASSOID/TYPE_FUNC/TYPE_REGUVAL_LIST/TYPE_REGU_VAR_LIST/TYPE_ORDERBY_NUM/TYPE_SP.regu_variable_node::map_regu— recursive walker for the regu tree (used byclear_xasl, byfetch_force_not_const_recursive, and by any post-deserialisation patching that must touch every leaf).regu_variable_node::clear_xasl/clear_xasl_local— per-node clean-up; frees rand-seeds, regex objects, embeddedDB_VALUEs.attr_descr_node(ATTR_DESCR) — theTYPE_ATTR_IDpayload:id,type,cache_attrinfo,cache_dbvalp.arith_list_node(ARITH_TYPE) — theTYPE_INARITH/TYPE_OUTARITHpayload:domain,value,leftptr,rightptr,thirdptr,opcode,pred,rand_seed.function_node(FUNCTION_TYPE) — theTYPE_FUNCpayload:value,operand,ftype,tmp_obj(used to cache compiled regex forF_REGEXP_*).regu_value_list/regu_value_item— theTYPE_REGUVAL_LISTpayload (multi-row VALUES literal).valptr_list_node(aliasVALPTR_LIST,OUTPTR_LIST) — list of regu variables for output projection (xasl->outptr_list).regu_variable_list_node(REGU_VARIABLE_LIST) — singly-linked list of regu variables; the building block for predicate operand lists, output lists, and function arguments.REGU_VARIABLE_HIDDEN_COLUMN/..._FETCH_ALL_CONST/..._FETCH_NOT_CONST/..._APPLY_COLLATION/..._ANALYTIC_WINDOW/..._UPD_INS_LIST/..._STRICT_TYPE_CAST/..._CORRELATED— flags that tune fetch and projection behaviour.REGU_VARIABLE_IS_FLAGED/_SET_FLAG/_CLEAR_FLAG/_GET_TYPE— inline accessors.
Regu walker implementation (src/query/regu_var.cpp)
Section titled “Regu walker implementation (src/query/regu_var.cpp)”regu_variable_node::map_regu— generic visitor recursing througharithptr->leftptr/rightptr,funcp->operandchain,sp_ptr->argschain,reguval_list->regu_listchain,regu_var_listchain. The walker is intentionally limited — it skipsTYPE_LIST_ID,TYPE_DBVAL,TYPE_ATTR_ID(and similar leaves) because there are no children to visit.regu_variable_node::map_regu_and_xasl— same walker, also visitingregu.xaslwhenever set.regu_variable_node::clear_xasl_local/clear_xasl— per-leaf clean-up wired into the deeper XASL clean-up path.
Fetch dispatcher and helpers (src/query/fetch.c, src/query/fetch.h)
Section titled “Fetch dispatcher and helpers (src/query/fetch.c, src/query/fetch.h)”fetch_peek_dbval— the per-regu type-switch; returns a non-owning pointer.fetch_copy_dbval— peek-then-copy convenience.fetch_val_list— fetches every regu in a list into the correspondingvfetch_to;peek=PEEKshares pointers,peek=COPYcopies. Has a fast path: when the head of the list isTYPE_POSITION, it callsfetch_peek_dbval_posto walk the tuple in a single pass without re-decoding.fetch_peek_arith— recursive arithmetic / general-expression evaluator; ~3.7 KLOC switch onarithptr->opcode.fetch_peek_dbval_pos— fast tuple-positional fetcher; readsTYPE_POSITIONregus in one linear pass over the tuple.fetch_peek_min_max_value_of_width_bucket_func— specialised fetcher forT_WIDTH_BUCKETthat pre-walks the predicate-encoded bucket bounds.fetch_init_val_list— clearsattr_descr.cache_dbvalpfor every attribute regu in a list (called when the cache is invalidated).fetch_force_not_const_recursive— used byTYPE_SPand similar always-not-const cases; walks the regu tree viamap_reguand setsFETCH_NOT_CONSTon every relevant leaf.
Function-call dispatcher (src/query/query_opfunc.c)
Section titled “Function-call dispatcher (src/query/query_opfunc.c)”qdata_evaluate_function— the switch onfuncp->ftype; routes to the per-function evaluator. Sub-cases:F_SET/F_MULTISET/F_SEQUENCE/F_VID→qdata_convert_dbvals_to_set.F_TABLE_SET/F_TABLE_MULTISET/F_TABLE_SEQUENCE→qdata_convert_table_to_set.F_GENERIC→qdata_evaluate_generic_function.F_CLASS_OF→qdata_get_class_of_function.F_INSERT_SUBSTRING→qdata_insert_substring_function.F_ELT→qdata_elt.F_BENCHMARK→qdata_benchmark.- JSON family (
F_JSON_*) →qdata_convert_operands_to_value_and_call (..., db_evaluate_json_*). - Regex family (
F_REGEXP_COUNT/_INSTR/_LIKE/_REPLACE/_SUBSTR) →qdata_regexp_function.
EXECUTE_REGU_VARIABLE_XASL/CHECK_REGU_VARIABLE_XASL_STATUS(macros fromsrc/query/xasl.h) — the helpers that drive sub-query materialisation whenregu_var->xaslis set.
Position hints as of 2026-05-01
Section titled “Position hints as of 2026-05-01”| Symbol | File | Line |
|---|---|---|
eval_pred | src/query/query_evaluator.c | 1666 |
eval_pred_comp0 | src/query/query_evaluator.c | 2150 |
eval_pred_comp1 | src/query/query_evaluator.c | 2199 |
eval_pred_comp2 | src/query/query_evaluator.c | 2238 |
eval_pred_comp3 | src/query/query_evaluator.c | 2295 |
eval_pred_alsm4 | src/query/query_evaluator.c | 2353 |
eval_pred_alsm5 | src/query/query_evaluator.c | 2419 |
eval_pred_like6 | src/query/query_evaluator.c | 2477 |
eval_pred_rlike7 | src/query/query_evaluator.c | 2535 |
eval_fnc | src/query/query_evaluator.c | 2590 |
update_logical_result | src/query/query_evaluator.c | 2668 |
eval_data_filter | src/query/query_evaluator.c | 2741 |
eval_key_filter | src/query/query_evaluator.c | 2822 |
eval_negative | src/query/query_evaluator.c | 96 |
eval_logical_result | src/query/query_evaluator.c | 119 |
eval_value_rel_cmp | src/query/query_evaluator.c | 152 |
eval_some_eval | src/query/query_evaluator.c | 353 |
eval_all_eval | src/query/query_evaluator.c | 417 |
eval_some_list_eval | src/query/query_evaluator.c | 549 |
eval_all_list_eval | src/query/query_evaluator.c | 648 |
eval_set_list_cmp | src/query/query_evaluator.c | 1542 |
fetch_peek_dbval | src/query/fetch.c | 3901 |
fetch_peek_arith | src/query/fetch.c | 85 |
fetch_copy_dbval | src/query/fetch.c | 4695 |
fetch_val_list | src/query/fetch.c | 4755 |
fetch_init_val_list | src/query/fetch.c | 4818 |
fetch_peek_dbval_pos | src/query/fetch.c | 4520 |
fetch_peek_min_max_value_of_width_bucket_func | src/query/fetch.c | 4581 |
fetch_force_not_const_recursive | src/query/fetch.c | 5069 |
regu_variable_node::map_regu | src/query/regu_var.cpp | 41 |
regu_variable_node::map_regu_and_xasl | src/query/regu_var.cpp | 122 |
regu_variable_node::clear_xasl_local | src/query/regu_var.cpp | 142 |
regu_variable_node::clear_xasl | src/query/regu_var.cpp | 210 |
qdata_evaluate_function | src/query/query_opfunc.c | 6875 |
TYPE_PRED_EXPR (enum) | src/xasl/xasl_predicate.hpp | 32 |
BOOL_OP (enum) | src/xasl/xasl_predicate.hpp | 39 |
TYPE_EVAL_TERM (enum) | src/xasl/xasl_predicate.hpp | 48 |
REL_OP (enum) | src/xasl/xasl_predicate.hpp | 56 |
QL_FLAG (enum) | src/xasl/xasl_predicate.hpp | 88 |
pred_expr (struct) | src/xasl/xasl_predicate.hpp | 150 |
comp_eval_term | src/xasl/xasl_predicate.hpp | 106 |
alsm_eval_term | src/xasl/xasl_predicate.hpp | 114 |
like_eval_term | src/xasl/xasl_predicate.hpp | 123 |
rlike_eval_term | src/xasl/xasl_predicate.hpp | 130 |
regu_variable_node | src/query/regu_var.hpp | 173 |
REGU_DATATYPE (enum) | src/query/regu_var.hpp | 44 |
attr_descr_node | src/query/regu_var.hpp | 77 |
arith_list_node | src/query/regu_var.hpp | 126 |
function_node | src/query/regu_var.hpp | 143 |
regu_variable_list_node | src/query/regu_var.hpp | 224 |
valptr_list_node (OUTPTR_LIST) | src/query/regu_var.hpp | 117 |
EXECUTE_REGU_VARIABLE_XASL (macro) | src/query/xasl.h | 527 |
CHECK_REGU_VARIABLE_XASL_STATUS (macro) | src/query/xasl.h | 579 |
SCAN_PRED | src/query/query_evaluator.h | 94 |
SCAN_ATTRS | src/query/query_evaluator.h | 103 |
FILTER_INFO | src/query/query_evaluator.h | 112 |
QPROC_QUALIFICATION (enum) | src/query/query_evaluator.h | 62 |
Cross-check Notes
Section titled “Cross-check Notes”This document is purely code-derived (sources: []); there is no raw analysis
to align with. The cross-check is therefore against sibling docs in this
project’s code-analysis/cubrid/ folder, which describe the surrounding
modules. Each sibling claim is checked against the current source tree.
cubrid-scan-manager.md— describes the per-row driver (scan_next_*,scan_handle_single_scan) and theINDX_SCAN_ID::range_pred/key_pred/scan_predtriple that wraps the evaluator’s predicates. The scan-manager doc listseval_data_filterandeval_key_filteras the bridges to the evaluator; this document confirms them as the outward-facing entry points and adds the detail thatpr_eval_fncis pre-compiled byeval_fncat scan-open time. Scan-manager callseval_fncdirectly duringscan_init_scan_predto populate theSCAN_PRED::pr_eval_fncslot; this is the place specialisation happens.cubrid-query-executor.md— describes the operator-level driver (qexec_execute_mainblock_internal,qexec_intprt_fnc,qexec_open_scan), the XASL tree shape (aptr_list/dptr_list/scan_ptr), and the post-processing phases. The executor’s role for the evaluator is twofold: (a) fetch_val_list calls duringqexec_intprt_fncto materialise the projected output; (b) theEXECUTE_REGU_VARIABLE_XASLmacro the evaluator calls to materialise sub-query references is implemented inxasl.has a recursive call back into the executor. There is no cross-doc disagreement.cubrid-xasl-generator.md— describes howpt_to_pred_exprbuilds thePRED_EXPRtree, in particular emitting right-linear AND/OR chains. The evaluator’s iteration strategy (for (t_pr = pr; ...; t_pr = t_pr->pe.m_pred.rhs)) depends on this structural invariant — a left-linear tree would force a recursion call per conjunct and pay the recursion-depth check overhead. The XASL generator’s tree-shape assumption and the evaluator’s iteration contract are co-designed.cubrid-semantic-check.md— describes the four-pass semantic check including type checking and constant folding. The evaluator is the runtime counterpart of two of those passes: it relies on the type-checked tree (everyregu_variable->domainis set) and on the constant-folded form (literal sub-trees are folded toTYPE_DBVALwhere possible, so the evaluator’s per-row work is minimal). The evaluator’sREGU_VARIABLE_FETCH_ALL_CONSTflag is a per-scan form of the same idea — the semantic-check pass folds compile-time constants, the evaluator caches first-row constants for the rest of the scan.cubrid-mvcc.md— the snapshot-vs-version logic the heap manager applies before theRECDESreacheseval_data_filter. The evaluator itself does not see snapshot information; by the time theRECDESarrives, MVCC has already filtered to the visible version. The evaluator’s job is purely the SQL predicate, not visibility.
A note on operator naming: the eval-term variants are named with numeric
suffixes (comp0 / comp1 / comp2 / comp3 / alsm4 / alsm5 /
like6 / rlike7). These numbers carry no semantics — they are
historical, perhaps a hint that a long time ago the file was generated by
a code-generator that allocated names by index. The mapping is now
canonical and eval_fnc reads it as if the names were arbitrary.
Open Questions
Section titled “Open Questions”- JIT for predicate evaluation. PostgreSQL has migrated to LLVM-based
JIT (
ExecCompileExpr, since v11) and reports double-digit speedups on scan-heavy workloads. CUBRID’s interpreter design is amenable — the predicate tree is already compact and walker-friendly — but the wire serialisation (XASL stream) would need a runtime compile after deserialisation. Investigation path: prototype an LLVM walker that compileseval_predplus the most commoneval_pred_comp0paths; benchmark on TPC-H Q1, Q6, Q14. - Vectorised / batched evaluation. The fetch path is row-oriented;
fetch_peek_dbvalreturns oneDB_VALUEat a time. Vectorised alternatives (Postgres’sExecQualBatch, DuckDB’s column-at-a-time) would require restructuringeval_predto take a row vector. The biggest blocker is the regu-variable model: every leaf currently stores its result in a singleDB_VALUEslot, which would need to become a buffer. - SIMD-accelerated comparisons.
eval_value_rel_cmpis the bottom of the comparison stack; it spends most cycles ontp_value_compare_with_error. For numeric-heavy predicates over primitive types, a SIMD path that compares 4 / 8 lanes at a time would reduce per-tuple cost dramatically. The trade-off is the cost of NULL-aware comparison logic and domain coercion — both fight straight-line vector code. - Constant-folding policy boundaries. The
FETCH_ALL_CONST/FETCH_NOT_CONSTflags classify a regu after the first fetch. Could the optimiser instead pre-classify at compile-time and skip the runtime self-classification on subsequent rows? The current flag-on-first-fetch design is robust against serialisation drift, but the per-row cost of theREGU_VARIABLE_IS_FLAGEDchecks accumulates. - The disabled lhs-coercion block in
eval_value_rel_cmp. The lhs-side mirror of the rhs constant-coercion logic is#if 0with aTODOcomment “do not delete me for future”. When was it disabled, and what regression caused the disable? Investigation path: git log archaeology on the surrounding lines. - Sub-query cache invalidation.
XASL_USES_SQ_CACHEmakesTYPE_CONSTANTandR_EXISTSpaths short-circuit viasq_get. The cache key is built from the correlation operands. What happens on parameter rebinding (a prepared statement re-executed with new host vars)? Is the cache invalidated, or does the key include the host-var values? eval_pred_comp0onR_EQ_TORDER. Total-order equality is a compile-time fixed predicate shape (noIS NULL, noEXISTS, noLIST_ID), soeval_fncreturnseval_pred_comp0. Buteval_pred_comp0checksdb_value_is_null (peek_val1) && et_comp->rel_op != R_NULLSAFE_EQand returnsV_UNKNOWN— forR_EQ_TORDERwith a NULL operand, the spec says the comparison should be definite (NULL = NULL → TRUE under total order). Is the current behaviour intentional, or doesR_EQ_TORDERneed a special pathway?- Stored-procedure result caching.
TYPE_SPalways setsFETCH_NOT_CONSTand re-evaluates the procedure per row. Some stored procedures are pure functions of their arguments and could be memoised. Is there an opt-in mechanism, or is it always re-run? fetch_peek_arithforT_PREDICATEopcode. When an arith node has a nested predicate (arithptr->pred), constancy is forced toNOT_CONST. This is the path used byT_CASEand similar conditional evaluators. Is the predicate cached across rows, or re-walked? The walkereval_predis invoked fresh; the cost may matter on deeply-nested CASE expressions.HEAP_CACHE_ATTRINFO::cache_dbvalplifetime under MVCC re-fetch. When a row needs MVCC re-fetch after a lock acquisition (inscan_next_index_scanforSELECT ... FOR UPDATE), the underlying page may have changed. Doesattr_descr.cache_dbvalpcorrectly invalidate? The currentfetch_peek_dbvaldoes not appear to re-validate; the contract relies on the caller callingfetch_init_val_listat the right moment.
Sources
Section titled “Sources”This document is sources: [] — purely code-derived from the CUBRID source
tree at /data/hgryoo/references/cubrid/. No raw analysis files
(raw/code-analysis/cubrid/...) were consulted.
CUBRID source
Section titled “CUBRID source”src/query/query_evaluator.c— the predicate evaluator (~3 KLOC). Drivers:eval_pred; specialised paths:eval_pred_comp{0,1,2,3}/eval_pred_alsm{4,5}/eval_pred_like6/eval_pred_rlike7; filter bridges:eval_data_filter/eval_key_filter; specialiser:eval_fnc.src/query/query_evaluator.h—SCAN_PRED,SCAN_ATTRS,FILTER_INFO,QPROC_QUALIFICATION,PR_EVAL_FNC.src/query/fetch.c—fetch_peek_dbval,fetch_copy_dbval,fetch_val_list,fetch_peek_arith(~5 KLOC; bulk is per-opcode arithmetic).src/query/fetch.h— fetch entry points.src/query/regu_var.hpp—regu_variable_node,REGU_DATATYPE,attr_descr_node,arith_list_node,function_node,regu_variable_list_node,valptr_list_node,REGU_VARIABLE_*flag set.src/query/regu_var.cpp—map_regu,clear_xasl_local,clear_xasl.src/xasl/xasl_predicate.hpp—pred_expr,pred,comp_eval_term,alsm_eval_term,like_eval_term,rlike_eval_term,TYPE_PRED_EXPR,BOOL_OP,TYPE_EVAL_TERM,REL_OP,QL_FLAG.src/query/xasl.h—EXECUTE_REGU_VARIABLE_XASLandCHECK_REGU_VARIABLE_XASL_STATUSmacros; XASL node integration.src/query/query_opfunc.c—qdata_evaluate_functiondispatch onFUNC_CODE; per-function evaluators (qdata_convert_dbvals_to_set,qdata_insert_substring_function,qdata_elt,qdata_regexp_function,qdata_convert_operands_to_value_and_calldb_evaluate_json_*).
- Cross-references for context:
src/query/scan_manager.{c,h}(the caller side ofeval_data_filter/eval_key_filter),src/query/query_executor.c(the per-row driver that callsfetch_val_listfor output materialisation),src/storage/heap_attrinfo.h(the attribute-cache decoder used byTYPE_ATTR_IDfetches).
Sibling docs
Section titled “Sibling docs”knowledge/code-analysis/cubrid/cubrid-query-executor.md— the operator-level driver; predicate evaluation is invoked fromeval_data_filterinside the per-row pull loop.knowledge/code-analysis/cubrid/cubrid-scan-manager.md— the per-tuple scan dispatcher;SCAN_PRED::pr_eval_fncis set byeval_fncat scan-open time.knowledge/code-analysis/cubrid/cubrid-xasl-generator.md— the client-side XASL generator that buildsPRED_EXPRandREGU_VARIABLEtrees; produces right-linear AND/OR chains by convention.knowledge/code-analysis/cubrid/cubrid-semantic-check.md— the type-checking and constant-folding pass that runs before XASL generation; sets the domain on every regu and folds compile-time constants.knowledge/code-analysis/cubrid/cubrid-heap-manager.md—HEAP_CACHE_ATTRINFOandheap_attrinfo_read_dbvalues, the buffer theTYPE_ATTR_IDfetch path reads from.knowledge/code-analysis/cubrid/cubrid-mvcc.md— the visibility check that runs before the predicate evaluation; the evaluator never sees invisible rows.
Textbook chapters / papers
Section titled “Textbook chapters / papers”- Graefe, Goetz. Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25(2), 1993 — predicate evaluation, function dispatch, expression interpretation; the canonical predicate-walker / regu-variable framing.
- Hellerstein, Joseph; Stonebraker, Michael (eds.). Readings in Database Systems, 4th ed. — operator implementation chapters covering the iterator-vs-vector trade-off and three-valued logic.
- Petrov, Alex. Database Internals, O’Reilly 2019, Ch. 12 “Query Execution” — scan + filter pipeline; peek-vs-copy ownership.
- Codd, E. F. The Relational Model for Database Management: Version 2, Addison-Wesley 1990 — three-valued logic in SQL predicates.
- Aho, Alfred; Lam, Monica; Sethi, Ravi; Ullman, Jeffrey. Compilers: Principles, Techniques, and Tools (Dragon Book), 2nd ed. — expression trees and tree-walking interpreters.