Skip to content

CUBRID Query Evaluator — PRED_EXPR Walking, regu_variable Fetch, and the Row-Level Filter Engine

Contents:

A query evaluator is the runtime sub-system that turns a row plus a predicate into a Boolean verdict and a column reference into a value. The executor decides what to read; the evaluator decides whether what was read should be kept and how each referenced expression resolves to a DB_VALUE. Engineering it well means handling four orthogonal concerns at once: predicate-tree walking, expression interpretation, three-valued logic, and the per-row materialisation discipline that decides when to copy a value and when to share a pointer.

Goetz Graefe’s Query Evaluation Techniques for Large Databases (ACM Computing Surveys 25(2), 1993) lays out the textbook taxonomy. Predicate evaluation is treated as a special case of expression interpretation, where the expression returns one of three Boolean values; the canonical implementation is a small tree-walking interpreter with one method per node type. Graefe also names the three sub-problems that every engine must answer: how to short-circuit AND / OR so unnecessary subtrees are not evaluated, how to propagate UNKNOWN without paying the cost of bool-promotion at every leaf, and how to amortise the per-tuple interpretation cost when the same predicate shape repeats over millions of rows.

Three-valued logic is the foundation. ANSI SQL defines NULL propagation through WHERE predicates by Codd’s well-known truth table:

ABA AND BA OR BNOT A
TTTTF
TFFTF
TUUTF
FFFFT
FUFUT
UUUUU

The driver-level consequence is that the short-circuit rules differ between two-valued and three-valued logic. In two-valued AND, the first false ends evaluation; in three-valued AND, only false ends it — unknown does not. A WHERE filter additionally collapses unknown to false (Codd’s implication: a tuple is admitted only when the predicate evaluates to true). Other contexts — CHECK constraints, joining conditions, WHEN clauses — treat the three values differently; each context’s evaluator must be told which collapse rule applies. CUBRID encodes the collapse policy in its QPROC_QUALIFICATION enum (QPROC_QUALIFIED / QPROC_NOT_QUALIFIED / QPROC_QUALIFIED_OR_NOT) and threads it through update_logical_result.

Expression interpretation is the second axis. Database Internals (Petrov, ch. 12) frames it as a tagged-union walker: a node carries an opcode and a small set of operand pointers, the walker reads the opcode, recurses into the operands, and applies the operator to the recursively-computed sub-values. The classical implementation cost is one virtual call (or one switch jump) per opcode per row. The hot loop is therefore the unit on which two optimisations compete: operator specialisation (compile a dedicated routine for the common shape and call it directly, skipping the dispatch) and JIT compilation (assemble machine code that does the whole walk inline). PostgreSQL has migrated from the former (specialised fast_qual) to the latter (LLVM-backed ExecCompileExpr). CUBRID does only the former, via eval_fnc, which inspects a PRED_EXPR and returns a function pointer to a predicate-shape-specific evaluator (eval_pred_comp0 for binary equality, eval_pred_like6 for LIKE, etc.). The full recursive walker eval_pred remains the fallback for any predicate too compound to specialise.

The third axis is the register abstraction for expression operands. The abstract syntax tree the optimiser produces contains literals, attribute references, sub-query references, and arithmetic expressions side-by-side, and the runtime needs a uniform pointer type for them all so the walker can store them in a single field. Volcano-class engines call this a regu variable (“regulator variable”); each instance carries a small type tag plus a union that stores the kind-specific payload. Resolving a regu variable to a DB_VALUE is the work of a single dispatcher (CUBRID’s fetch_peek_dbval) that switches on the tag and delegates to a kind-specific path. The abstraction is what lets the predicate walker be ignorant of where its leaves came from — the same compare routine works whether the lhs is a heap-resident column, a host parameter, or a sub-query result.

The final axis is memory discipline at fetch time. Per-row work multiplied by millions of rows means every avoidable copy is amortised across the whole table; the canonical optimisation is peek semantics — the fetcher returns a pointer into a persistent buffer rather than copying into a fresh DB_VALUE. CUBRID uses peek by default (fetch_peek_dbval, fetch_val_list (..., peek=PEEK)); the copying counterpart fetch_copy_dbval exists for cases where the value needs to outlive the page latch (sort keys, group-by accumulators, sub-query result bindings). Petrov frames this as the classic ownership question that every database iterator must answer: who clears the value, and when?

Every engine that interprets predicates and expressions at runtime — Postgres, MySQL, SQL Server, CUBRID — solves the same four sub-problems with three recurring patterns.

The plan tree carries the predicate as a tagged-union node, and the evaluator switches on the tag. Postgres has Expr with NodeTag (T_OpExpr, T_BoolExpr, T_NullTest, …); ExecQual walks the qual list, calls ExecEvalExpr on each, and AND-s the results. MySQL builds Item trees with virtual methods (val_int, val_str, val_real); each operator overrides the appropriate val_*. SQL Server’s expression service compiles a stack machine: each node becomes a small program, and the evaluator interprets the program for each row.

CUBRID’s tag enum is TYPE_PRED_EXPR:

// TYPE_PRED_EXPR — src/xasl/xasl_predicate.hpp
typedef enum
{
T_PRED = 1, // Boolean operator (AND / OR / XOR / IS / IS NOT)
T_EVAL_TERM, // Comparison or membership leaf
T_NOT_TERM // Logical negation
} TYPE_PRED_EXPR;

T_PRED nodes carry a BOOL_OP (B_AND / B_OR / B_XOR / B_IS / B_IS_NOT); T_EVAL_TERM nodes carry a TYPE_EVAL_TERM discriminator (T_COMP_EVAL_TERM, T_ALSM_EVAL_TERM, T_LIKE_EVAL_TERM, T_RLIKE_EVAL_TERM); T_NOT_TERM is a thin wrapper around a child pred_expr *. The structure is exactly what a textbook recursive-descent predicate evaluator reads.

The shape-uniform expression node is everywhere. Postgres uses Expr with a common header; the actual sub-types (Const, Var, OpExpr, FuncExpr) live behind a NodeTag. MySQL’s Item is a class hierarchy where each leaf overrides the value accessors. CUBRID’s choice is a fixed-size C struct with an explicit REGU_DATATYPE tag plus a union of payload pointers — a flat representation chosen because the XASL tree is serialised across the client/server boundary, and a flat struct is straightforward to pack and unpack via xasl_to_stream / stream_to_xasl. The CUBRID flavour does not support virtual dispatch through the wire, so the union-tag form is the only option.

The regu variable is what makes the same walker handle every kind of operand. Comparison terms, arithmetic expressions, function calls, and sub-query references all resolve through a single fetcher; the predicate evaluator itself never branches on the operand kind.

Each layer must return one of {true, false, unknown, error}. Postgres uses bool plus a separate *isNull out-parameter for unknown; MySQL uses Item::null_value set as a side effect of val_*. CUBRID uses a single 4-valued enum, DB_LOGICAL, encoding V_TRUE, V_FALSE, V_UNKNOWN, and V_ERROR as distinct values. The advantage is that every layer can express all four outcomes through one return path; the cost is that the walker must distinguish unknown-from-NULL from error-from-failure at every recursion. The short helpers eval_negative (for NOT) and eval_logical_result (for combining sub-results) embody the tri-state truth tables and are reused across the file.

Specialise the common cases, recurse on the rest

Section titled “Specialise the common cases, recurse on the rest”

The operator-specialisation pattern: at predicate-tree analysis time, the engine looks at the predicate shape and picks a specialised evaluator that skips the per-node dispatch. CUBRID’s implementation is eval_fnc, which returns a PR_EVAL_FNC function pointer — eval_pred_comp0 for a binary comparison with no NULL or LIST_ID complication, eval_pred_comp1 for IS NULL, eval_pred_comp2 for EXISTS, eval_pred_comp3 for a comparison where one side is a list-file reference, eval_pred_alsm4 / eval_pred_alsm5 for ANY / ALL over set or list, eval_pred_like6 for LIKE, eval_pred_rlike7 for RLIKE. The full eval_pred is the fallback when the predicate is a Boolean composition that cannot be reduced to a single specialised shape; that is where the recursive AND/OR/XOR/NOT walker lives. The SCAN_PRED::pr_eval_fnc field in query_evaluator.h is what scan_manager caches per scan so the per-row cost is one indirect call and not a repeated switch on the root.

CUBRID is an interpreted, no-JIT engine. The walker is the C function eval_pred; the leaf compute is the function pointer fetch_peek_dbval. The specialisation pass is eval_fnc. There is no batched / vectorised path — every leaf is one DB_VALUE, every comparison is one row at a time. Compared to Postgres post-LLVM, CUBRID gives up the JIT speedup; compared to MySQL’s virtual-method overhead, CUBRID gains the flat struct’s locality. The engineering trade is straightforward: simple to understand, easy to serialise, slow on long predicate chains over scan-heavy workloads.

Theoretical conceptCUBRID symbol
Predicate tree nodecubxasl::pred_expr (PRED_EXPR)
Boolean operator opcodeBOOL_OP (B_AND / B_OR / B_XOR / B_IS / B_IS_NOT)
Eval-term sub-typeTYPE_EVAL_TERM (T_COMP_EVAL_TERM / T_ALSM_EVAL_TERM / T_LIKE_* / T_RLIKE_*)
Three-valued resultDB_LOGICAL (V_TRUE / V_FALSE / V_UNKNOWN / V_ERROR)
Recursive walkereval_pred (query_evaluator.c)
Predicate negationeval_negative (query_evaluator.c)
Tri-state combineeval_logical_result (query_evaluator.c)
Specialised compilereval_fnc returning PR_EVAL_FNC
Specialised binary-eq patheval_pred_comp0
Specialised IS NULL patheval_pred_comp1
Specialised EXISTS patheval_pred_comp2
Specialised list-id compare patheval_pred_comp3
Specialised ANY/ALL pathseval_pred_alsm4 (set), eval_pred_alsm5 (list)
Specialised LIKE / RLIKE pathseval_pred_like6, eval_pred_rlike7
Regu variableregu_variable_node (REGU_VARIABLE) (regu_var.hpp)
Regu kind tagREGU_DATATYPE (TYPE_DBVAL / TYPE_CONSTANT / TYPE_INARITH / TYPE_ATTR_ID / …)
Peek-fetcher dispatcherfetch_peek_dbval (fetch.c)
Copy-fetcherfetch_copy_dbval (fetch.c)
Per-row regu list materialisationfetch_val_list (fetch.c)
Arithmetic-expression evaluatorfetch_peek_arith (fetch.c)
Function-call dispatcherqdata_evaluate_function (query_opfunc.c)
Filter bridge from scan-managereval_data_filter / eval_key_filter (query_evaluator.c)
Tri-state qualification policyQPROC_QUALIFICATION enum (query_evaluator.h)
Qualification updateupdate_logical_result (query_evaluator.c)

The evaluator does four things and everything else is a special case of one of them: (1) walk a PRED_EXPR tree under three-valued logic, returning V_TRUE / V_FALSE / V_UNKNOWN; (2) at every leaf, fetch the operand DB_VALUEs through the polymorphic fetch_peek_dbval; (3) where the predicate’s shape is amenable, skip the recursion and call a specialised single-shape evaluator picked at scan-open time by eval_fnc; (4) interface with the scan manager via the eval_data_filter / eval_key_filter boundaries, which are the entry points the rest of the executor calls. The rest of this section walks each piece in the order a real predicate traverses them — top-down through eval_pred, downward through the regu variable to fetch_peek_dbval, sideways into qdata_evaluate_function for function-call leaves.

A PRED_EXPR is a tagged union of three node kinds, sitting on top of a small zoo of eval-term variants:

flowchart TB
  subgraph PE["pred_expr (TYPE_PRED_EXPR tag)"]
    direction LR
    P_PRED["T_PRED<br/>{ lhs, rhs, bool_op }<br/>BOOL_OP ∈<br/>{B_AND, B_OR, B_XOR,<br/>B_IS, B_IS_NOT}"]
    P_EVAL["T_EVAL_TERM<br/>{ et_type, et union }"]
    P_NOT["T_NOT_TERM<br/>{ not_term: pred_expr* }"]
  end

  subgraph ET["eval_term variants (TYPE_EVAL_TERM tag)"]
    direction LR
    ETC["T_COMP_EVAL_TERM<br/>{ lhs, rhs : regu_variable*<br/>  rel_op : REL_OP<br/>  type : DB_TYPE }"]
    ETA["T_ALSM_EVAL_TERM<br/>{ elem, elemset : regu_variable*<br/>  eq_flag : F_ALL or F_SOME<br/>  rel_op : REL_OP }"]
    ETL["T_LIKE_EVAL_TERM<br/>{ src, pattern, esc_char :<br/>  regu_variable* }"]
    ETR["T_RLIKE_EVAL_TERM<br/>{ src, pattern, case_sensitive,<br/>  compiled_regex }"]
  end

  P_EVAL --> ETC
  P_EVAL --> ETA
  P_EVAL --> ETL
  P_EVAL --> ETR
  P_PRED -. recurse .-> PE
  P_NOT -. recurse .-> PE

This shape is the load-bearing diagram of the chapter — eval_pred is a recursive walker over exactly this graph. T_PRED is a Boolean combinator; the bool_op tells the walker whether to AND-fold or OR-fold its children. T_NOT_TERM flips the sub-result via eval_negative. T_EVAL_TERM is a leaf; the leaf’s et_type decides which compare/match function applies. The four eval-term variants cover the four shapes of base predicates SQL admits at the leaf — comparisons (=, <, …), all/some quantification (> ANY, = ALL), LIKE pattern match, regular-expression match (RLIKE / REGEXP).

REL_OP is the comparison vocabulary at a T_COMP_EVAL_TERM. The full set encodes both ordinal comparisons (R_EQ, R_NE, R_LT, R_LE, R_GT, R_GE), set comparisons (R_SUBSET, R_SUBSETEQ, R_SUPERSET, R_SUPERSETEQ), the special unary tests (R_NULL for IS NULL, R_EXISTS for EXISTS), the quantified comparisons (R_EQ_SOME, R_LT_ALL, …), total-order comparison (R_EQ_TORDER) and null-safe equality (R_NULLSAFE_EQ). The driver uses the tag to pick a specialised path inside eval_value_rel_cmp; the quantified *_SOME / *_ALL codes are translated to the equivalent base op and dispatched to eval_some_eval / eval_all_eval.

eval_pred is a recursive descent over the PRED_EXPR tree, returning V_TRUE, V_FALSE, V_UNKNOWN, or V_ERROR. The walker has two structural optimisations: (a) it short-circuits AND/OR as soon as the result is determined, and (b) it walks right-linear chains iteratively rather than recursively, reflecting the bias the XASL generator produces (pt_to_pred_expr emits right-linear AND/OR chains so the walker can iterate on rhs without blowing the stack on long conjunctions).

flowchart TD
  Start["eval_pred (pr)"] --> Type{"pr->type"}
  Type -->|T_PRED| Bool{"bool_op"}
  Type -->|T_EVAL_TERM| Et{"et_type"}
  Type -->|T_NOT_TERM| Not["eval_pred (not_term)<br/>then eval_negative"]

  Bool -->|B_AND| AndLoop["result = V_TRUE<br/>walk right-linear<br/>chain on .rhs<br/>break on V_FALSE / V_ERROR"]
  Bool -->|B_OR|  OrLoop["result = V_FALSE<br/>walk right-linear<br/>chain on .rhs<br/>break on V_TRUE / V_ERROR"]
  Bool -->|B_XOR| Xor["eval lhs, eval rhs<br/>UNKNOWN if either UNKNOWN<br/>else lhs != rhs"]
  Bool -->|B_IS / B_IS_NOT| Is["eval lhs, eval rhs<br/>compare 3-valued results"]

  Et -->|T_COMP_EVAL_TERM| Cmp["fetch lhs, rhs<br/>NULL → V_UNKNOWN<br/>else eval_value_rel_cmp"]
  Et -->|T_ALSM_EVAL_TERM| Alsm["fetch elem, elemset<br/>route to eval_some_∗<br/>or eval_all_∗ by F_ALL/F_SOME"]
  Et -->|T_LIKE_EVAL_TERM| Like["fetch src, pattern, esc<br/>db_string_like"]
  Et -->|T_RLIKE_EVAL_TERM| Rlike["delegate to<br/>eval_pred_rlike7"]

The B_AND arm is the prototypical example of the right-linear iteration:

// eval_pred — src/query/query_evaluator.c (B_AND, condensed)
case B_AND:
/* 'pt_to_pred_expr()' will generate right-linear tree */
result = V_TRUE;
for (t_pr = pr;
result == V_TRUE && t_pr->type == T_PRED && t_pr->pe.m_pred.bool_op == B_AND;
t_pr = t_pr->pe.m_pred.rhs)
{
if (result == V_UNKNOWN)
{
result = eval_pred (thread_p, t_pr->pe.m_pred.lhs, vd, obj_oid);
result = (result == V_TRUE) ? V_UNKNOWN : result;
}
else
{
result = eval_pred (thread_p, t_pr->pe.m_pred.lhs, vd, obj_oid);
}
if (result == V_FALSE || result == V_ERROR)
goto exit;
}
/* tail evaluation for the final right child */
if (result == V_UNKNOWN)
{
result = eval_pred (thread_p, t_pr, vd, obj_oid);
result = (result == V_TRUE) ? V_UNKNOWN : result;
}
else
{
result = eval_pred (thread_p, t_pr, vd, obj_oid);
}
break;

Three things to read from this loop. First, the for advances along .rhs, not via stack recursion — the caller pays one stack frame for the whole conjunction. Second, the result == V_UNKNOWN branch is the three-valued discipline made explicit: once UNKNOWN is sticky, a V_TRUE from a later conjunct cannot promote the whole AND back to true (the result becomes V_UNKNOWN, not V_TRUE), but a V_FALSE still wins. Third, the goto exit on V_FALSE is the short-circuit; a strict left-to-right evaluator would have walked the rest of the chain. The B_OR arm is the dual pattern with the sentinel flipped; B_XOR and B_IS / B_IS_NOT are short and not right-linear because they have semantic two-childness rather than associative chainability.

The recursion-depth guard at the top of eval_pred (incremented on entry, decremented on exit) protects against pathologically deep predicate trees; when thread_get_recursion_depth (thread_p) > prm_get_integer_value (PRM_ID_MAX_RECURSION_SQL_DEPTH) the walker bails with ER_MAX_RECURSION_SQL_DEPTH.

A regu variable is a fixed-size descriptor of an expression operand; it unifies the kinds the predicate walker needs to read. The shape is one REGU_DATATYPE tag, a flags word, two domain pointers (current and original — relevant for XASL clones), an optional fetch-into slot (vfetch_to), and a discriminated union of payloads:

flowchart TB
  subgraph RV["regu_variable_node"]
    T["type : REGU_DATATYPE"]
    F["flags<br/>(REGU_VARIABLE_HIDDEN_COLUMN,<br/>FETCH_ALL_CONST, FETCH_NOT_CONST,<br/>APPLY_COLLATION, ANALYTIC_WINDOW, …)"]
    D["domain : TP_DOMAIN*<br/>original_domain : TP_DOMAIN*"]
    VF["vfetch_to : DB_VALUE*<br/>(slot to share-fetch into, when used as a list element)"]
    XL["xasl : xasl_node*<br/>(linked sub-query, if any)"]
    U["value : union (one of)"]
  end
  subgraph UN["union value (selected by type)"]
    direction TB
    DBV["DB_VALUE dbval<br/>(TYPE_DBVAL — embedded literal)"]
    DPV["DB_VALUE *dbvalptr<br/>(TYPE_CONSTANT, TYPE_ORDERBY_NUM —<br/>pointer into the sub-query result slot)"]
    AR["ARITH_TYPE *arithptr<br/>(TYPE_INARITH, TYPE_OUTARITH —<br/>arithmetic / general expression)"]
    AT["ATTR_DESCR attr_descr<br/>(TYPE_ATTR_ID, TYPE_CLASS_ATTR_ID, TYPE_SHARED_ATTR_ID —<br/>attribute fetch through HEAP_CACHE_ATTRINFO)"]
    PD["QFILE_TUPLE_VALUE_POSITION pos_descr<br/>(TYPE_POSITION — list-file column read)"]
    SR["QFILE_SORTED_LIST_ID *srlist_id<br/>(TYPE_LIST_ID — sub-query list ref)"]
    VP["int val_pos<br/>(TYPE_POS_VALUE — host-variable index)"]
    FN["function_node *funcp<br/>(TYPE_FUNC — function call)"]
    RVL["REGU_VALUE_LIST *reguval_list<br/>(TYPE_REGUVAL_LIST — VALUES query)"]
    RVS["REGU_VARIABLE_LIST regu_var_list<br/>(TYPE_REGU_VAR_LIST — CUME_DIST / PERCENT_RANK)"]
    SP["sp_node *sp_ptr<br/>(TYPE_SP — stored procedure)"]
  end
  U --- DBV
  U --- DPV
  U --- AR
  U --- AT
  U --- PD
  U --- SR
  U --- VP
  U --- FN
  U --- RVL
  U --- RVS
  U --- SP

The advantage is that every operand of every predicate or expression — a column, a literal, the result of (SELECT MAX(x) FROM t), the result of a + b * 3, the result of JSON_EXTRACT(j, '$.foo') — is a REGU_VARIABLE. The walker never asks “what kind of thing is this?”; it asks the fetcher, which knows.

TYPE_OID and TYPE_CLASSOID are tagless cases — they need no payload because the dispatcher reads the current row’s obj_oid / class_oid arguments (passed alongside the regu variable) and stuffs them into the shared value.dbval slot.

The flags carry information that survives across calls to the fetcher: in particular, REGU_VARIABLE_FETCH_ALL_CONST means “every recursive operand of this regu was constant on first fetch, so the cached value.dbval remains valid”, and REGU_VARIABLE_FETCH_NOT_CONST is its dual. They are populated by fetch_peek_dbval on the first call — the fetcher is self-classifying — and read on subsequent calls to short-circuit the work. This is CUBRID’s per-row constant-folding: an arithmetic expression of literals becomes a one-time evaluation, even though the predicate walker sees it again on every row.

fetch_peek_dbval is a single switch on regu_var->type that resolves a regu variable to a DB_VALUE *. It is the peek form — it returns a pointer to a buffer the caller must not free; the value is valid until the next operation on the same scan invalidates it. The copying counterpart fetch_copy_dbval calls fetch_peek_dbval and then qdata_copy_db_values the result into a caller-owned slot.

flowchart LR
  In["regu_var<br/>· vd, class_oid, obj_oid, tpl"] --> Disp{"regu_var->type"}
  Disp -->|TYPE_DBVAL| RD1["FLAG ALL_CONST<br/>peek = &value.dbval"]
  Disp -->|TYPE_CONSTANT| RD2["FLAG NOT_CONST<br/>EXECUTE_REGU_VARIABLE_XASL<br/>(materialise sub-query)<br/>peek = value.dbvalptr"]
  Disp -->|TYPE_ATTR_ID<br/>(also CLASS_ATTR_ID, SHARED_ATTR_ID)| RD3["FLAG NOT_CONST<br/>peek = attr_descr.cache_dbvalp<br/>(else heap_attrinfo_access)"]
  Disp -->|TYPE_POSITION| RD4["FLAG NOT_CONST<br/>qfile_locate_tuple_value tpl<br/>data_readval into vfetch_to"]
  Disp -->|TYPE_POS_VALUE| RD5["FLAG ALL_CONST<br/>peek = vd->dbval_ptr + val_pos<br/>(host variable)"]
  Disp -->|TYPE_OID / TYPE_CLASSOID| RD6["FLAG ALL_CONST<br/>db_make_oid into value.dbval"]
  Disp -->|TYPE_INARITH<br/>TYPE_OUTARITH| RD7["fetch_peek_arith<br/>(recursive expression eval)"]
  Disp -->|TYPE_FUNC| RD8["qdata_evaluate_function<br/>(F_SET / F_JSON_∗ / F_REGEXP_∗ / …)"]
  Disp -->|TYPE_SP| RD9["pl::executor.execute<br/>(stored-procedure call)"]
  Disp -->|TYPE_REGUVAL_LIST| RDB["recurse on reguval_list<br/>->current_value->value"]
  Disp -->|TYPE_ORDERBY_NUM| RDA["FLAG NOT_CONST<br/>peek = value.dbvalptr"]
  RD3 --> Cache["cache_dbvalp pointer cached<br/>on first read; subsequent rows<br/>read directly without HEAP lookup"]
  RD2 --> SqCache["XASL_USES_SQ_CACHE:<br/>sq_get keyed by aptr correlation"]

The path-by-path summary mirrors the diagram. Each path contributes one specific responsibility:

  1. TYPE_DBVAL — embedded literal. Returns &regu_var->value.dbval. Sets FETCH_ALL_CONST. Cheapest case: zero work after the initial flag set.
  2. TYPE_CONSTANT — pointer-into-aptr-result. The dbvalptr was bound by the parent’s pre-execution to point into the materialised aptr’s tuple. On the first read, the linked sub-query may need executing (EXECUTE_REGU_VARIABLE_XASL); subsequent rows read the cached pointer. When XASL_USES_SQ_CACHE is set, sq_get short-circuits via the sub-query cache.
  3. TYPE_ATTR_ID (and TYPE_CLASS_ATTR_ID, TYPE_SHARED_ATTR_ID) — attribute fetch. Reads through HEAP_CACHE_ATTRINFO (the per-class decoder cache the heap manager populates on heap_next / heap_get_visible_version). The first call resolves regu_var->value.attr_descr.id against the cache via heap_attrinfo_access; the resulting DB_VALUE * is stashed in attr_descr.cache_dbvalp so subsequent rows skip the lookup. Sets FETCH_NOT_CONST (the value changes per row by definition).
  4. TYPE_POSITION — list-file tuple column. The current row is a QFILE_TUPLE; qfile_locate_tuple_value (tpl, pos_descr.pos_no, ...) finds the column’s slot in the tuple, and pr_type->data_readval decodes it into regu_var->vfetch_to. The decoded value is unsuitable for sharing across rows; CUBRID always allocates a per-regu vfetch_to slot in advance.
  5. TYPE_POS_VALUE — host variable. vd->dbval_ptr[val_pos]. Constant for the query. Pure pointer arithmetic.
  6. TYPE_OID / TYPE_CLASSOID — current OID. The obj_oid / class_oid arguments are written into regu_var->value.dbval. These are not strictly constant (they change per row) but FETCH_ALL_CONST is still set because they are not derived from any operand whose constancy matters.
  7. TYPE_INARITH / TYPE_OUTARITH — arithmetic / general expression. Delegates to fetch_peek_arith, which is a 1500-line switch on arithptr->opcode covering every SQL operator (T_ADD, T_SUBSTRING, T_CASE, T_DECODE, T_TRIM, T_RAND, T_TO_CHAR, …). The recursive structure is the same as the predicate walker: fetch leftptr, rightptr, optionally thirdptr / fourthptr, then apply the per-opcode logic into arithptr->value. Constancy is computed bottom-up: an arithmetic node is ALL_CONST iff all of its sub-regus are. Side-effect-bearing opcodes (T_RAND, T_SLEEP, T_NEXT_VALUE, T_INCR, T_CASE/T_DECODE/T_IF/T_PREDICATE, T_SYS_GUID, T_ROW_COUNT, T_LAST_INSERT_ID, T_EVALUATE_VARIABLE) force NOT_CONST regardless of operand constancy — these never benefit from the per-row cache.
  8. TYPE_FUNC — function call. If FETCH_ALL_CONST is already set, the pre-computed funcp->value is returned without re-evaluation. Otherwise, qdata_evaluate_function (in query_opfunc.c) dispatches on funcp->ftype (the FUNC_CODE enum), calling out to one of the F_-routines: qdata_convert_dbvals_to_set for F_SET / F_MULTISET / F_SEQUENCE; qdata_insert_substring_function for F_INSERT_SUBSTRING; qdata_elt for F_ELT; qdata_regexp_function for the regex family; qdata_convert_operands_to_value_and_call for the entire JSON family with a db_evaluate_json_* function pointer passed in. After the call the constancy flags are recomputed by walking the funcp->operand chain.
  9. TYPE_SP — stored-procedure call. Constructs a cubpl::executor, fetches arguments through executor.fetch_args_peek, then runs executor.execute (*regu_var->value.sp_ptr->value). Always NOT_CONST (recursively forced by fetch_force_not_const_recursive to ensure the stored procedure’s body is re-evaluated per call).
  10. TYPE_REGUVAL_LIST — VALUES query (multi-row literal table). Recurses on reguval_list->current_value->value. The list is iterated by the S_VALUES_SCAN driver, which advances current_value per row.
  11. TYPE_ORDERBY_NUMORDER BY n reference. Behaves like TYPE_CONSTANT for fetch purposes but is NOT_CONST because the number-binding changes per output row.

The shared post-amble reapplies collation when REGU_VARIABLE_APPLY_COLLATION is set, runs domain refinement (resolving DB_TYPE_VARIABLE against the fetched value’s actual type), and for TYPE_REGUVAL_LIST cross-checks that the current row’s domain matches the head row’s (the same homogeneity rule UNION enforces). Errors at this stage manifest as ER_QPROC_INCOMPATIBLE_TYPES or ER_QSTR_INCOMPATIBLE_COLLATIONS.

The TYPE_ATTR_ID path is where the evaluator meets the heap. The handshake lives in three pieces. First, scan_open_heap_scan calls heap_attrinfo_start to allocate a HEAP_CACHE_ATTRINFO for the columns referenced by the predicate plus the output list. Second, on every fetched row, the heap iterator (heap_next / heap_get_visible_version_internal) calls heap_attrinfo_read_dbvalues on the new RECDES, decoding each referenced column into a per-attribute DB_VALUE inside the cache. Third, when fetch_peek_dbval hits a TYPE_ATTR_ID regu, it calls heap_attrinfo_access (regu_var->value.attr_descr.id, attr_cache) and caches the returned pointer in attr_descr.cache_dbvalp so the subsequent peek (within the same row) is a direct dereference. The cache_dbvalp is NOT cleared between rows — the underlying buffer is reused, the value inside is overwritten by heap_attrinfo_read_dbvalues. The fetcher trusts the heap manager’s per-row reset.

eval_data_filter in query_evaluator.c is the bridge that ensures the attribute cache is populated before the predicate evaluation:

// eval_data_filter — src/query/query_evaluator.c (condensed)
DB_LOGICAL
eval_data_filter (THREAD_ENTRY * thread_p, OID * oid, RECDES * recdesp,
HEAP_SCANCACHE * scan_cache, FILTER_INFO * filterp)
{
SCAN_PRED *scan_predp = filterp->scan_pred;
SCAN_ATTRS *scan_attrsp = filterp->scan_attrs;
DB_LOGICAL ev_res;
if (scan_attrsp != NULL && scan_attrsp->attr_cache != NULL
&& scan_predp->regu_list != NULL)
{
/* read the predicate values from the heap into the attribute cache */
if (heap_attrinfo_read_dbvalues (thread_p, oid, recdesp,
scan_attrsp->attr_cache) != NO_ERROR)
return V_ERROR;
/* ... class-attribute scan special case ... */
}
/* evaluate the predicates of the data filter via the compiled fast path */
ev_res = V_TRUE;
if (scan_predp->pr_eval_fnc && scan_predp->pred_expr)
ev_res = (*scan_predp->pr_eval_fnc) (thread_p, scan_predp->pred_expr,
filterp->val_descr, oid);
if (ev_res == V_TRUE && scan_predp->regu_list && filterp->val_list)
{
/* row passed: fetch the output regu list into val_list */
if (fetch_val_list (thread_p, scan_predp->regu_list, filterp->val_descr,
filterp->class_oid, oid, NULL, PEEK) != NO_ERROR)
return V_ERROR;
}
return ev_res;
}

Three details to read off this. First, heap_attrinfo_read_dbvalues is called before the predicate, so every TYPE_ATTR_ID regu in the predicate finds its cached value already populated. Second, the predicate runs through scan_predp->pr_eval_fnc — that pointer was set by eval_fnc at scan-open time, so this is the specialised path, not the generic eval_pred. Third, the output list (fetch_val_list (..., PEEK)) is materialised only when the predicate returns V_TRUE — failed rows skip the projection cost.

eval_key_filter is the analogue for index scans: instead of reading from a RECDES, it decodes the key columns out of a DB_MIDXKEY (multi-column B-tree key), positions them into the attribute cache, and then runs the same pr_eval_fnc — but with obj_oid = NULL, since the key filter cannot reference columns that are not in the key.

eval_fnc — operator specialisation at scan-open time

Section titled “eval_fnc — operator specialisation at scan-open time”

When scan_open_heap_scan initialises a SCAN_PRED, it calls eval_fnc to pick the per-row evaluator. eval_fnc looks at the root PRED_EXPR and returns a function pointer that bypasses the recursive walker for the common single-shape cases:

// eval_fnc — src/query/query_evaluator.c (condensed)
PR_EVAL_FNC
eval_fnc (THREAD_ENTRY * thread_p, const PRED_EXPR * pr, DB_TYPE * single_node_type)
{
*single_node_type = DB_TYPE_NULL;
if (pr == NULL) return NULL;
if (pr->type == T_EVAL_TERM)
{
switch (pr->pe.m_eval_term.et_type)
{
case T_COMP_EVAL_TERM:
{
const COMP_EVAL_TERM *et_comp = &pr->pe.m_eval_term.et.et_comp;
*single_node_type = et_comp->type;
if (et_comp->rel_op == R_NULL) return (PR_EVAL_FNC) eval_pred_comp1; /* IS NULL */
if (et_comp->rel_op == R_EXISTS) return (PR_EVAL_FNC) eval_pred_comp2; /* EXISTS */
if (et_comp->lhs->type == TYPE_LIST_ID
|| et_comp->rhs->type == TYPE_LIST_ID)
return (PR_EVAL_FNC) eval_pred_comp3; /* list-id cmp */
return (PR_EVAL_FNC) eval_pred_comp0; /* binary cmp */
}
case T_ALSM_EVAL_TERM:
{
const ALSM_EVAL_TERM *et_alsm = &pr->pe.m_eval_term.et.et_alsm;
*single_node_type = et_alsm->item_type;
return (et_alsm->elemset->type != TYPE_LIST_ID)
? (PR_EVAL_FNC) eval_pred_alsm4 /* set-bound ALL/SOME */
: (PR_EVAL_FNC) eval_pred_alsm5; /* list-bound ALL/SOME */
}
case T_LIKE_EVAL_TERM: return (PR_EVAL_FNC) eval_pred_like6;
case T_RLIKE_EVAL_TERM: return (PR_EVAL_FNC) eval_pred_rlike7;
}
}
/* general case: predicate is a Boolean composition; need full walker */
return (PR_EVAL_FNC) eval_pred;
}

The decision is purely structural — no value-flow analysis, no cost model. eval_fnc returns eval_pred_comp0 if and only if the predicate is a single binary comparison without IS NULL, EXISTS, or list-file operands; if any structural twist is present, the more general specialised path is picked, and only when no specialisation matches does the full eval_pred fall out as the fallback. Each specialised path inlines its known structure: eval_pred_comp0 does not switch on et_type because it already knows it’s T_COMP_EVAL_TERM with the simple sub-shape; eval_pred_comp1 does not check rel_op because it knows R_NULL. The saving is small per call but multiplied by every row of every scan.

ANY/ALL — set comparisons via list-file walking

Section titled “ANY/ALL — set comparisons via list-file walking”

eval_pred_alsm4 and eval_pred_alsm5 are the specialised paths for quantified comparisons. The _alsm4 form fetches the rhs as a DB_SET and dispatches to eval_some_eval or eval_all_eval (set walking via set_get_element); _alsm5 materialises the rhs as a list-file via EXECUTE_REGU_VARIABLE_XASL and then dispatches to eval_some_list_eval or eval_all_list_eval (list-file scan via qfile_scan_list_next):

// eval_pred_alsm5 — src/query/query_evaluator.c (condensed; list-bound ANY/ALL)
DB_LOGICAL
eval_pred_alsm5 (THREAD_ENTRY * thread_p, const PRED_EXPR * pr,
val_descr * vd, OID * obj_oid)
{
const ALSM_EVAL_TERM *et_alsm = &pr->pe.m_eval_term.et.et_alsm;
/* materialise the inner sub-query */
EXECUTE_REGU_VARIABLE_XASL (thread_p, et_alsm->elemset, vd);
if (CHECK_REGU_VARIABLE_XASL_STATUS (et_alsm->elemset) != XASL_SUCCESS)
return V_ERROR;
QFILE_SORTED_LIST_ID *srlist_id = et_alsm->elemset->value.srlist_id;
if (srlist_id->list_id->tuple_cnt == 0)
return (et_alsm->eq_flag == F_ALL) ? V_TRUE : V_FALSE; /* empty-set ANSI rule */
DB_VALUE *peek_val1 = NULL;
if (fetch_peek_dbval (thread_p, et_alsm->elem, vd, NULL, obj_oid, NULL, &peek_val1) != NO_ERROR)
return V_ERROR;
if (db_value_is_null (peek_val1))
return V_UNKNOWN;
return (et_alsm->eq_flag == F_ALL)
? eval_all_list_eval (thread_p, peek_val1, srlist_id->list_id, et_alsm->rel_op)
: eval_some_list_eval (thread_p, peek_val1, srlist_id->list_id, et_alsm->rel_op);
}

The empty-set rule (F_ALL over empty is V_TRUE, F_SOME over empty is V_FALSE) is ANSI-canonical and is the only place CUBRID short-circuits without scanning the right side. The eval_all_list_eval / eval_some_list_eval helpers walk the list file with qfile_scan_list_next, decoding column 0 of each tuple into a DB_VALUE and feeding it to eval_value_rel_cmp against the lhs. The walk short-circuits on the first decisive answer — some_list_eval returns V_TRUE on first match; all_list_eval is implemented as not some(neg-op) and returns V_FALSE on first counter-example.

eval_value_rel_cmp — the bottom of the comparison stack

Section titled “eval_value_rel_cmp — the bottom of the comparison stack”

Once both sides are DB_VALUE *, every comparison ultimately ends in eval_value_rel_cmp. The sequence is:

  1. Constant coercion (one-time) — when the rhs is a FETCH_ALL_CONST regu and its domain differs from the lhs, the rhs is coerced once into the lhs’s domain. The intent is to avoid re-coercing the same constant per row inside tp_value_compare_with_error. The currently-active policies are: numeric ↔ char (coerce char to double), date/time ↔ char (coerce char to the date/time domain), narrow numeric ↔ wide numeric (coerce narrow to wide). The lhs-coercion mirror is present but #if 0-disabled in source — left as a TODO.
  2. tp_value_compare_with_error — the type-aware comparator returns DB_LT / DB_EQ / DB_GT / DB_UNK (or sets comparable = false). DB_UNK triggers V_UNKNOWN for ordinary REL_OPs; R_NULLSAFE_EQ has its own NULL-aware logic that admits NULL = NULL → V_TRUE.
  3. REL_OP mapping — the int result is mapped to V_TRUE / V_FALSE per rel_operator. R_EQ checks result == DB_EQ; R_LE accepts DB_LT or DB_EQ; the set comparisons map to DB_SUBSET / DB_SUPERSET / DB_EQ; R_NULLSAFE_EQ is the only operator that treats NULL = NULL as V_TRUE.

R_EQ_TORDER is a special-case total-order comparison used for sort and group-by where NULLs are treated as a definite value (typically smallest). Calling tp_value_compare_with_error with the total-order flag set yields DB_EQ even for NULL = NULL.

End-to-end: scan emits row → predicate verdict → output projection

Section titled “End-to-end: scan emits row → predicate verdict → output projection”
flowchart TD
  S1["scan_next_heap_scan or scan_next_index_scan"] --> S2["heap_get_visible_version_internal:<br/>fetch RECDES + heap_attrinfo_read_dbvalues"]
  S2 --> S3["eval_data_filter (or eval_key_filter):<br/>scan_pred->pr_eval_fnc()"]
  S3 -->|V_TRUE| S4["fetch_val_list PEEK on output regu_list:<br/>each regu resolved via fetch_peek_dbval"]
  S3 -->|V_FALSE / V_UNKNOWN| S5["skip row, advance scan"]
  S3 -->|V_ERROR| S6["bubble error up to qexec_intprt_fnc"]
  S4 --> S7["row passed to consumer:<br/>list-file write or upstream operator"]
  S5 --> S1
  S7 --> S1

The point of inflection is eval_data_filter. Its responsibilities are narrow and easy to enumerate:

  1. Make sure the attribute cache is populated.
  2. Run the predicate via the pre-compiled fast path.
  3. If admitted, populate the output regu list (peek shares pointers; copy only happens when the row is committed to a list file or copied across page-fix boundaries).

When pr_eval_fnc == eval_pred (the general case), this expands into the full recursive walker. When it is one of the specialised paths, the body is a single fetch_peek_dbval per operand plus the type-specific comparator. Either way, the contract is the same: take a row (as RECDES + OID + cached attrinfo), return a tri-state verdict.

scan_manager.c builds a FILTER_INFO per scan and uses it to drive the predicate. The FILTER_INFO carries the SCAN_PRED (the regu list, the PRED_EXPR, the cached pr_eval_fnc), the SCAN_ATTRS (the attribute IDs and the HEAP_CACHE_ATTRINFO), the VAL_DESCR (host vars, sys datetime), the class OID, and — for index-side filters — the B-tree attribute IDs and the function-index column. Three different FILTER_INFO instances coexist for an index scan: the range filter (used to bound the B-tree walk), the key filter (applied per key during the walk; no heap fetch), and the data filter (applied after the heap fetch); each is built once at scan_open_index_scan time and re-used per row. See cubrid-scan-manager.md for the complete scan-side narrative.

The key contract scan_manager holds with the evaluator is:

  • Input: a row in the form (RECDES, OID, populated attribute cache).
  • Output: a DB_LOGICAL ∈ {V_TRUE, V_FALSE, V_UNKNOWN, V_ERROR}.
  • Side effects: on V_TRUE, the output regu list is materialised into the per-XASL val_list via fetch_val_list. On V_ERROR, the scan is aborted and the error code is in TLS via er_set. On V_FALSE / V_UNKNOWN the row is dropped and the scan advances.

update_logical_result — the qualification policy

Section titled “update_logical_result — the qualification policy”

Different consumers want different collapse rules for V_UNKNOWN. A WHERE clause wants unknown → false; a WHERE NOT clause wants unknown → unknown (the negation is also unknown); an outer-join qualification predicate sits in a third position where both qualified and not-qualified rows are interesting. CUBRID encodes the policy as a QPROC_QUALIFICATION enum and routes the result through update_logical_result:

// update_logical_result — src/query/query_evaluator.c (condensed)
DB_LOGICAL
update_logical_result (THREAD_ENTRY * thread_p, DB_LOGICAL ev_res, int *qualification)
{
if (ev_res == V_ERROR) return ev_res;
if (qualification != NULL)
{
switch (*qualification)
{
case QPROC_QUALIFIED:
if (ev_res != V_TRUE) return V_FALSE; /* unknown → false */
break;
case QPROC_NOT_QUALIFIED:
if (ev_res != V_FALSE) return V_FALSE; /* unknown → false */
break;
case QPROC_QUALIFIED_OR_NOT:
/* both qualified and not-qualified are interesting; bookkeep */
if (ev_res == V_TRUE) *qualification = QPROC_QUALIFIED;
else if (ev_res == V_FALSE) *qualification = QPROC_NOT_QUALIFIED;
/* V_UNKNOWN: leave qualification unchanged */
break;
}
}
return (ev_res == V_TRUE) ? V_TRUE : V_FALSE;
}

This is the function scan_manager calls after the predicate to fold the tri-state result into the per-scan qualification policy. The QPROC_QUALIFIED_OR_NOT case is the one that survives unchanged through unknown — it is used by anti-joins and outer-join filters where the row’s qualification status is to be observed, not enforced.

Closing the loop — eval_pred recursion vs the per-row path

Section titled “Closing the loop — eval_pred recursion vs the per-row path”

To anchor the picture: the typical WHERE x > 0 AND y < 10 predicate goes through eval_pred because it is a Boolean composition of two COMP_EVAL_TERM leaves; eval_fnc returned eval_pred at scan-open time. On every row, eval_pred walks the right-linear AND chain, calling itself on each conjunct, which then walks into T_EVAL_TERM and runs fetch_peek_dbval twice (lhs, rhs) plus eval_value_rel_cmp once. A predicate of the form WHERE x = ? (host parameter) goes through eval_pred_comp0 directly — eval_fnc recognised the single binary comparison and skipped the recursion.

The corollary is that predicate complexity cost is mostly tree-walk overhead, while expression complexity cost is mostly fetch_peek_arith recursion and qdata_evaluate_function work. The two concerns are orthogonal: a single comparison with a complex rhs (a 10-deep arithmetic expression involving 5 functions) has no walker overhead but enormous fetch overhead. Optimising one without the other gives no speedup on the mixed case — which is why both paths are kept.

Anchor on symbol names. Line numbers are scoped to this document’s updated: date; the symbols are stable across refactors.

Predicate types and the dispatch enums (src/xasl/xasl_predicate.hpp)

Section titled “Predicate types and the dispatch enums (src/xasl/xasl_predicate.hpp)”
  • cubxasl::pred_expr — the predicate node; tagged union of three kinds.
  • TYPE_PRED_EXPRT_PRED / T_EVAL_TERM / T_NOT_TERM.
  • BOOL_OPB_AND / B_OR / B_XOR / B_IS / B_IS_NOT.
  • TYPE_EVAL_TERMT_COMP_EVAL_TERM / T_ALSM_EVAL_TERM / T_LIKE_EVAL_TERM / T_RLIKE_EVAL_TERM.
  • REL_OP — full set of relational opcodes including ordinal, set, quantified, total-order, null-safe.
  • QL_FLAGF_ALL / F_SOME for ANY / ALL quantification.
  • cubxasl::predlhs, rhs, bool_op for T_PRED.
  • cubxasl::comp_eval_termlhs, rhs, rel_op, type for T_COMP_EVAL_TERM.
  • cubxasl::alsm_eval_termelem, elemset, eq_flag, rel_op, item_type for T_ALSM_EVAL_TERM.
  • cubxasl::like_eval_termsrc, pattern, esc_char for T_LIKE_*.
  • cubxasl::rlike_eval_termsrc, pattern, case_sensitive, compiled_regex for T_RLIKE_*.

Driver and 3-valued logic (src/query/query_evaluator.c)

Section titled “Driver and 3-valued logic (src/query/query_evaluator.c)”
  • eval_pred — the recursive walker; entry point for the general case.
  • eval_negative — three-valued NOT (V_TRUE ↔ V_FALSE, V_UNKNOWN / V_ERROR pass through).
  • eval_logical_result — three-valued AND of two sub-results (V_ERROR dominates, V_FALSE is sticky, otherwise V_UNKNOWN).
  • eval_value_rel_cmp — the type-aware comparator with one-time rhs-constant coercion and per-REL_OP result mapping.
  • eval_set_list_cmpset ⨯ set / set ⨯ list / list ⨯ list comparison; sorts list files when needed and dispatches to the multi-set / sort-list comparators.

Set / list helpers (src/query/query_evaluator.c)

Section titled “Set / list helpers (src/query/query_evaluator.c)”
  • eval_some_eval / eval_all_eval — element-vs-set ANY / ALL.
  • eval_some_list_eval / eval_all_list_eval — element-vs-list-file ANY / ALL via qfile_scan_list_next.
  • eval_item_card_set / eval_item_card_sort_list — cardinality of matching members (multi-set semantics for IN comparisons).
  • eval_sub_multi_set_to_sort_list / eval_sub_sort_list_to_multi_set / eval_sub_sort_list_to_sort_list — subset relations.
  • eval_eq_multi_set_to_sort_list, eval_ne_*, eval_le_*, eval_lt_* in their multi-set / sort-list combinations — the primitives for eval_set_list_cmp.
  • eval_multi_set_to_sort_list / eval_sort_list_to_multi_set / eval_sort_list_to_sort_list — the dispatchers per rel_operator.

Specialised single-shape evaluators (src/query/query_evaluator.c)

Section titled “Specialised single-shape evaluators (src/query/query_evaluator.c)”
  • eval_pred_comp0 — binary ordinal / set comparison without LIST_ID.
  • eval_pred_comp1IS NULL (also handles OID not-null special case).
  • eval_pred_comp2EXISTS over a list file or a set.
  • eval_pred_comp3 — comparison where lhs and/or rhs is a TYPE_LIST_ID.
  • eval_pred_alsm4 — set-bound ANY / ALL.
  • eval_pred_alsm5 — list-file-bound ANY / ALL.
  • eval_pred_like6LIKE (calls db_string_like).
  • eval_pred_rlike7RLIKE (calls db_string_rlike with cached compiled_regex).
  • eval_fnc — the dispatcher that picks one of the above (or falls back to eval_pred) at scan-open time.

Filter bridges (src/query/query_evaluator.c)

Section titled “Filter bridges (src/query/query_evaluator.c)”
  • eval_data_filter — the bridge from scan_manager for heap-side predicates; populates the attribute cache, invokes pr_eval_fnc, conditionally materialises the output regu list.
  • eval_key_filter — analogue for index-key predicates; decodes the key columns out of DB_MIDXKEY into the attribute cache before invoking pr_eval_fnc.
  • update_logical_result — applies QPROC_QUALIFICATION policy.

Regu variable types and the polymorphic struct (src/query/regu_var.hpp)

Section titled “Regu variable types and the polymorphic struct (src/query/regu_var.hpp)”
  • regu_variable_node (alias REGU_VARIABLE) — the polymorphic expression-operand node; tagged-union shape.
  • REGU_DATATYPETYPE_DBVAL / TYPE_CONSTANT / TYPE_INARITH / TYPE_OUTARITH / TYPE_ATTR_ID / TYPE_CLASS_ATTR_ID / TYPE_SHARED_ATTR_ID / TYPE_POSITION / TYPE_LIST_ID / TYPE_POS_VALUE / TYPE_OID / TYPE_CLASSOID / TYPE_FUNC / TYPE_REGUVAL_LIST / TYPE_REGU_VAR_LIST / TYPE_ORDERBY_NUM / TYPE_SP.
  • regu_variable_node::map_regu — recursive walker for the regu tree (used by clear_xasl, by fetch_force_not_const_recursive, and by any post-deserialisation patching that must touch every leaf).
  • regu_variable_node::clear_xasl / clear_xasl_local — per-node clean-up; frees rand-seeds, regex objects, embedded DB_VALUEs.
  • attr_descr_node (ATTR_DESCR) — the TYPE_ATTR_ID payload: id, type, cache_attrinfo, cache_dbvalp.
  • arith_list_node (ARITH_TYPE) — the TYPE_INARITH / TYPE_OUTARITH payload: domain, value, leftptr, rightptr, thirdptr, opcode, pred, rand_seed.
  • function_node (FUNCTION_TYPE) — the TYPE_FUNC payload: value, operand, ftype, tmp_obj (used to cache compiled regex for F_REGEXP_*).
  • regu_value_list / regu_value_item — the TYPE_REGUVAL_LIST payload (multi-row VALUES literal).
  • valptr_list_node (alias VALPTR_LIST, OUTPTR_LIST) — list of regu variables for output projection (xasl->outptr_list).
  • regu_variable_list_node (REGU_VARIABLE_LIST) — singly-linked list of regu variables; the building block for predicate operand lists, output lists, and function arguments.
  • REGU_VARIABLE_HIDDEN_COLUMN / ..._FETCH_ALL_CONST / ..._FETCH_NOT_CONST / ..._APPLY_COLLATION / ..._ANALYTIC_WINDOW / ..._UPD_INS_LIST / ..._STRICT_TYPE_CAST / ..._CORRELATED — flags that tune fetch and projection behaviour.
  • REGU_VARIABLE_IS_FLAGED / _SET_FLAG / _CLEAR_FLAG / _GET_TYPE — inline accessors.

Regu walker implementation (src/query/regu_var.cpp)

Section titled “Regu walker implementation (src/query/regu_var.cpp)”
  • regu_variable_node::map_regu — generic visitor recursing through arithptr->leftptr/rightptr, funcp->operand chain, sp_ptr->args chain, reguval_list->regu_list chain, regu_var_list chain. The walker is intentionally limited — it skips TYPE_LIST_ID, TYPE_DBVAL, TYPE_ATTR_ID (and similar leaves) because there are no children to visit.
  • regu_variable_node::map_regu_and_xasl — same walker, also visiting regu.xasl whenever set.
  • regu_variable_node::clear_xasl_local / clear_xasl — per-leaf clean-up wired into the deeper XASL clean-up path.

Fetch dispatcher and helpers (src/query/fetch.c, src/query/fetch.h)

Section titled “Fetch dispatcher and helpers (src/query/fetch.c, src/query/fetch.h)”
  • fetch_peek_dbval — the per-regu type-switch; returns a non-owning pointer.
  • fetch_copy_dbval — peek-then-copy convenience.
  • fetch_val_list — fetches every regu in a list into the corresponding vfetch_to; peek=PEEK shares pointers, peek=COPY copies. Has a fast path: when the head of the list is TYPE_POSITION, it calls fetch_peek_dbval_pos to walk the tuple in a single pass without re-decoding.
  • fetch_peek_arith — recursive arithmetic / general-expression evaluator; ~3.7 KLOC switch on arithptr->opcode.
  • fetch_peek_dbval_pos — fast tuple-positional fetcher; reads TYPE_POSITION regus in one linear pass over the tuple.
  • fetch_peek_min_max_value_of_width_bucket_func — specialised fetcher for T_WIDTH_BUCKET that pre-walks the predicate-encoded bucket bounds.
  • fetch_init_val_list — clears attr_descr.cache_dbvalp for every attribute regu in a list (called when the cache is invalidated).
  • fetch_force_not_const_recursive — used by TYPE_SP and similar always-not-const cases; walks the regu tree via map_regu and sets FETCH_NOT_CONST on every relevant leaf.

Function-call dispatcher (src/query/query_opfunc.c)

Section titled “Function-call dispatcher (src/query/query_opfunc.c)”
  • qdata_evaluate_function — the switch on funcp->ftype; routes to the per-function evaluator. Sub-cases:
    • F_SET / F_MULTISET / F_SEQUENCE / F_VIDqdata_convert_dbvals_to_set.
    • F_TABLE_SET / F_TABLE_MULTISET / F_TABLE_SEQUENCEqdata_convert_table_to_set.
    • F_GENERICqdata_evaluate_generic_function.
    • F_CLASS_OFqdata_get_class_of_function.
    • F_INSERT_SUBSTRINGqdata_insert_substring_function.
    • F_ELTqdata_elt.
    • F_BENCHMARKqdata_benchmark.
    • JSON family (F_JSON_*) → qdata_convert_operands_to_value_and_call (..., db_evaluate_json_*).
    • Regex family (F_REGEXP_COUNT / _INSTR / _LIKE / _REPLACE / _SUBSTR) → qdata_regexp_function.
  • EXECUTE_REGU_VARIABLE_XASL / CHECK_REGU_VARIABLE_XASL_STATUS (macros from src/query/xasl.h) — the helpers that drive sub-query materialisation when regu_var->xasl is set.
SymbolFileLine
eval_predsrc/query/query_evaluator.c1666
eval_pred_comp0src/query/query_evaluator.c2150
eval_pred_comp1src/query/query_evaluator.c2199
eval_pred_comp2src/query/query_evaluator.c2238
eval_pred_comp3src/query/query_evaluator.c2295
eval_pred_alsm4src/query/query_evaluator.c2353
eval_pred_alsm5src/query/query_evaluator.c2419
eval_pred_like6src/query/query_evaluator.c2477
eval_pred_rlike7src/query/query_evaluator.c2535
eval_fncsrc/query/query_evaluator.c2590
update_logical_resultsrc/query/query_evaluator.c2668
eval_data_filtersrc/query/query_evaluator.c2741
eval_key_filtersrc/query/query_evaluator.c2822
eval_negativesrc/query/query_evaluator.c96
eval_logical_resultsrc/query/query_evaluator.c119
eval_value_rel_cmpsrc/query/query_evaluator.c152
eval_some_evalsrc/query/query_evaluator.c353
eval_all_evalsrc/query/query_evaluator.c417
eval_some_list_evalsrc/query/query_evaluator.c549
eval_all_list_evalsrc/query/query_evaluator.c648
eval_set_list_cmpsrc/query/query_evaluator.c1542
fetch_peek_dbvalsrc/query/fetch.c3901
fetch_peek_arithsrc/query/fetch.c85
fetch_copy_dbvalsrc/query/fetch.c4695
fetch_val_listsrc/query/fetch.c4755
fetch_init_val_listsrc/query/fetch.c4818
fetch_peek_dbval_possrc/query/fetch.c4520
fetch_peek_min_max_value_of_width_bucket_funcsrc/query/fetch.c4581
fetch_force_not_const_recursivesrc/query/fetch.c5069
regu_variable_node::map_regusrc/query/regu_var.cpp41
regu_variable_node::map_regu_and_xaslsrc/query/regu_var.cpp122
regu_variable_node::clear_xasl_localsrc/query/regu_var.cpp142
regu_variable_node::clear_xaslsrc/query/regu_var.cpp210
qdata_evaluate_functionsrc/query/query_opfunc.c6875
TYPE_PRED_EXPR (enum)src/xasl/xasl_predicate.hpp32
BOOL_OP (enum)src/xasl/xasl_predicate.hpp39
TYPE_EVAL_TERM (enum)src/xasl/xasl_predicate.hpp48
REL_OP (enum)src/xasl/xasl_predicate.hpp56
QL_FLAG (enum)src/xasl/xasl_predicate.hpp88
pred_expr (struct)src/xasl/xasl_predicate.hpp150
comp_eval_termsrc/xasl/xasl_predicate.hpp106
alsm_eval_termsrc/xasl/xasl_predicate.hpp114
like_eval_termsrc/xasl/xasl_predicate.hpp123
rlike_eval_termsrc/xasl/xasl_predicate.hpp130
regu_variable_nodesrc/query/regu_var.hpp173
REGU_DATATYPE (enum)src/query/regu_var.hpp44
attr_descr_nodesrc/query/regu_var.hpp77
arith_list_nodesrc/query/regu_var.hpp126
function_nodesrc/query/regu_var.hpp143
regu_variable_list_nodesrc/query/regu_var.hpp224
valptr_list_node (OUTPTR_LIST)src/query/regu_var.hpp117
EXECUTE_REGU_VARIABLE_XASL (macro)src/query/xasl.h527
CHECK_REGU_VARIABLE_XASL_STATUS (macro)src/query/xasl.h579
SCAN_PREDsrc/query/query_evaluator.h94
SCAN_ATTRSsrc/query/query_evaluator.h103
FILTER_INFOsrc/query/query_evaluator.h112
QPROC_QUALIFICATION (enum)src/query/query_evaluator.h62

This document is purely code-derived (sources: []); there is no raw analysis to align with. The cross-check is therefore against sibling docs in this project’s code-analysis/cubrid/ folder, which describe the surrounding modules. Each sibling claim is checked against the current source tree.

  • cubrid-scan-manager.md — describes the per-row driver (scan_next_*, scan_handle_single_scan) and the INDX_SCAN_ID::range_pred / key_pred / scan_pred triple that wraps the evaluator’s predicates. The scan-manager doc lists eval_data_filter and eval_key_filter as the bridges to the evaluator; this document confirms them as the outward-facing entry points and adds the detail that pr_eval_fnc is pre-compiled by eval_fnc at scan-open time. Scan-manager calls eval_fnc directly during scan_init_scan_pred to populate the SCAN_PRED::pr_eval_fnc slot; this is the place specialisation happens.
  • cubrid-query-executor.md — describes the operator-level driver (qexec_execute_mainblock_internal, qexec_intprt_fnc, qexec_open_scan), the XASL tree shape (aptr_list / dptr_list / scan_ptr), and the post-processing phases. The executor’s role for the evaluator is twofold: (a) fetch_val_list calls during qexec_intprt_fnc to materialise the projected output; (b) the EXECUTE_REGU_VARIABLE_XASL macro the evaluator calls to materialise sub-query references is implemented in xasl.h as a recursive call back into the executor. There is no cross-doc disagreement.
  • cubrid-xasl-generator.md — describes how pt_to_pred_expr builds the PRED_EXPR tree, in particular emitting right-linear AND/OR chains. The evaluator’s iteration strategy (for (t_pr = pr; ...; t_pr = t_pr->pe.m_pred.rhs)) depends on this structural invariant — a left-linear tree would force a recursion call per conjunct and pay the recursion-depth check overhead. The XASL generator’s tree-shape assumption and the evaluator’s iteration contract are co-designed.
  • cubrid-semantic-check.md — describes the four-pass semantic check including type checking and constant folding. The evaluator is the runtime counterpart of two of those passes: it relies on the type-checked tree (every regu_variable->domain is set) and on the constant-folded form (literal sub-trees are folded to TYPE_DBVAL where possible, so the evaluator’s per-row work is minimal). The evaluator’s REGU_VARIABLE_FETCH_ALL_CONST flag is a per-scan form of the same idea — the semantic-check pass folds compile-time constants, the evaluator caches first-row constants for the rest of the scan.
  • cubrid-mvcc.md — the snapshot-vs-version logic the heap manager applies before the RECDES reaches eval_data_filter. The evaluator itself does not see snapshot information; by the time the RECDES arrives, MVCC has already filtered to the visible version. The evaluator’s job is purely the SQL predicate, not visibility.

A note on operator naming: the eval-term variants are named with numeric suffixes (comp0 / comp1 / comp2 / comp3 / alsm4 / alsm5 / like6 / rlike7). These numbers carry no semantics — they are historical, perhaps a hint that a long time ago the file was generated by a code-generator that allocated names by index. The mapping is now canonical and eval_fnc reads it as if the names were arbitrary.

  1. JIT for predicate evaluation. PostgreSQL has migrated to LLVM-based JIT (ExecCompileExpr, since v11) and reports double-digit speedups on scan-heavy workloads. CUBRID’s interpreter design is amenable — the predicate tree is already compact and walker-friendly — but the wire serialisation (XASL stream) would need a runtime compile after deserialisation. Investigation path: prototype an LLVM walker that compiles eval_pred plus the most common eval_pred_comp0 paths; benchmark on TPC-H Q1, Q6, Q14.
  2. Vectorised / batched evaluation. The fetch path is row-oriented; fetch_peek_dbval returns one DB_VALUE at a time. Vectorised alternatives (Postgres’s ExecQualBatch, DuckDB’s column-at-a-time) would require restructuring eval_pred to take a row vector. The biggest blocker is the regu-variable model: every leaf currently stores its result in a single DB_VALUE slot, which would need to become a buffer.
  3. SIMD-accelerated comparisons. eval_value_rel_cmp is the bottom of the comparison stack; it spends most cycles on tp_value_compare_with_error. For numeric-heavy predicates over primitive types, a SIMD path that compares 4 / 8 lanes at a time would reduce per-tuple cost dramatically. The trade-off is the cost of NULL-aware comparison logic and domain coercion — both fight straight-line vector code.
  4. Constant-folding policy boundaries. The FETCH_ALL_CONST / FETCH_NOT_CONST flags classify a regu after the first fetch. Could the optimiser instead pre-classify at compile-time and skip the runtime self-classification on subsequent rows? The current flag-on-first-fetch design is robust against serialisation drift, but the per-row cost of the REGU_VARIABLE_IS_FLAGED checks accumulates.
  5. The disabled lhs-coercion block in eval_value_rel_cmp. The lhs-side mirror of the rhs constant-coercion logic is #if 0 with a TODO comment “do not delete me for future”. When was it disabled, and what regression caused the disable? Investigation path: git log archaeology on the surrounding lines.
  6. Sub-query cache invalidation. XASL_USES_SQ_CACHE makes TYPE_CONSTANT and R_EXISTS paths short-circuit via sq_get. The cache key is built from the correlation operands. What happens on parameter rebinding (a prepared statement re-executed with new host vars)? Is the cache invalidated, or does the key include the host-var values?
  7. eval_pred_comp0 on R_EQ_TORDER. Total-order equality is a compile-time fixed predicate shape (no IS NULL, no EXISTS, no LIST_ID), so eval_fnc returns eval_pred_comp0. But eval_pred_comp0 checks db_value_is_null (peek_val1) && et_comp->rel_op != R_NULLSAFE_EQ and returns V_UNKNOWN — for R_EQ_TORDER with a NULL operand, the spec says the comparison should be definite (NULL = NULL → TRUE under total order). Is the current behaviour intentional, or does R_EQ_TORDER need a special pathway?
  8. Stored-procedure result caching. TYPE_SP always sets FETCH_NOT_CONST and re-evaluates the procedure per row. Some stored procedures are pure functions of their arguments and could be memoised. Is there an opt-in mechanism, or is it always re-run?
  9. fetch_peek_arith for T_PREDICATE opcode. When an arith node has a nested predicate (arithptr->pred), constancy is forced to NOT_CONST. This is the path used by T_CASE and similar conditional evaluators. Is the predicate cached across rows, or re-walked? The walker eval_pred is invoked fresh; the cost may matter on deeply-nested CASE expressions.
  10. HEAP_CACHE_ATTRINFO::cache_dbvalp lifetime under MVCC re-fetch. When a row needs MVCC re-fetch after a lock acquisition (in scan_next_index_scan for SELECT ... FOR UPDATE), the underlying page may have changed. Does attr_descr.cache_dbvalp correctly invalidate? The current fetch_peek_dbval does not appear to re-validate; the contract relies on the caller calling fetch_init_val_list at the right moment.

This document is sources: [] — purely code-derived from the CUBRID source tree at /data/hgryoo/references/cubrid/. No raw analysis files (raw/code-analysis/cubrid/...) were consulted.

  • src/query/query_evaluator.c — the predicate evaluator (~3 KLOC). Drivers: eval_pred; specialised paths: eval_pred_comp{0,1,2,3} / eval_pred_alsm{4,5} / eval_pred_like6 / eval_pred_rlike7; filter bridges: eval_data_filter / eval_key_filter; specialiser: eval_fnc.
  • src/query/query_evaluator.hSCAN_PRED, SCAN_ATTRS, FILTER_INFO, QPROC_QUALIFICATION, PR_EVAL_FNC.
  • src/query/fetch.cfetch_peek_dbval, fetch_copy_dbval, fetch_val_list, fetch_peek_arith (~5 KLOC; bulk is per-opcode arithmetic).
  • src/query/fetch.h — fetch entry points.
  • src/query/regu_var.hppregu_variable_node, REGU_DATATYPE, attr_descr_node, arith_list_node, function_node, regu_variable_list_node, valptr_list_node, REGU_VARIABLE_* flag set.
  • src/query/regu_var.cppmap_regu, clear_xasl_local, clear_xasl.
  • src/xasl/xasl_predicate.hpppred_expr, pred, comp_eval_term, alsm_eval_term, like_eval_term, rlike_eval_term, TYPE_PRED_EXPR, BOOL_OP, TYPE_EVAL_TERM, REL_OP, QL_FLAG.
  • src/query/xasl.hEXECUTE_REGU_VARIABLE_XASL and CHECK_REGU_VARIABLE_XASL_STATUS macros; XASL node integration.
  • src/query/query_opfunc.cqdata_evaluate_function dispatch on FUNC_CODE; per-function evaluators (qdata_convert_dbvals_to_set, qdata_insert_substring_function, qdata_elt, qdata_regexp_function, qdata_convert_operands_to_value_and_call
    • db_evaluate_json_*).
  • Cross-references for context: src/query/scan_manager.{c,h} (the caller side of eval_data_filter / eval_key_filter), src/query/query_executor.c (the per-row driver that calls fetch_val_list for output materialisation), src/storage/heap_attrinfo.h (the attribute-cache decoder used by TYPE_ATTR_ID fetches).
  • knowledge/code-analysis/cubrid/cubrid-query-executor.md — the operator-level driver; predicate evaluation is invoked from eval_data_filter inside the per-row pull loop.
  • knowledge/code-analysis/cubrid/cubrid-scan-manager.md — the per-tuple scan dispatcher; SCAN_PRED::pr_eval_fnc is set by eval_fnc at scan-open time.
  • knowledge/code-analysis/cubrid/cubrid-xasl-generator.md — the client-side XASL generator that builds PRED_EXPR and REGU_VARIABLE trees; produces right-linear AND/OR chains by convention.
  • knowledge/code-analysis/cubrid/cubrid-semantic-check.md — the type-checking and constant-folding pass that runs before XASL generation; sets the domain on every regu and folds compile-time constants.
  • knowledge/code-analysis/cubrid/cubrid-heap-manager.mdHEAP_CACHE_ATTRINFO and heap_attrinfo_read_dbvalues, the buffer the TYPE_ATTR_ID fetch path reads from.
  • knowledge/code-analysis/cubrid/cubrid-mvcc.md — the visibility check that runs before the predicate evaluation; the evaluator never sees invisible rows.
  • Graefe, Goetz. Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25(2), 1993 — predicate evaluation, function dispatch, expression interpretation; the canonical predicate-walker / regu-variable framing.
  • Hellerstein, Joseph; Stonebraker, Michael (eds.). Readings in Database Systems, 4th ed. — operator implementation chapters covering the iterator-vs-vector trade-off and three-valued logic.
  • Petrov, Alex. Database Internals, O’Reilly 2019, Ch. 12 “Query Execution” — scan + filter pipeline; peek-vs-copy ownership.
  • Codd, E. F. The Relational Model for Database Management: Version 2, Addison-Wesley 1990 — three-valued logic in SQL predicates.
  • Aho, Alfred; Lam, Monica; Sethi, Ravi; Ullman, Jeffrey. Compilers: Principles, Techniques, and Tools (Dragon Book), 2nd ed. — expression trees and tree-walking interpreters.