CUBRID Scalar Functions — Arithmetic, String, Numeric, JSON, Regex, and Cryptographic Operator Primitives

Contents:

Theoretical Background
Common DBMS Design
CUBRID’s Approach
Source Walkthrough
Cross-check Notes
Open Questions
Sources

Theoretical Background

The scalar-function library turns operator-and-operands into a DB_VALUE. Every column reference, host parameter, and sub-query result the optimizer materialises is plumbed into one of these primitives — qdata_add_dbval for +, db_string_substring for SUBSTR, numeric_db_value_mul for NUMERIC * NUMERIC, db_string_regexp_count for REGEXP_COUNT, crypt_sha_two for SHA2. The evaluator (cubrid-query-evaluator.md) is the dispatcher; the library is what the dispatcher dispatches to.

Five textbook concerns shape the design.

Built-in scalar function dispatch. Silberschatz et al. (Database System Concepts, ch. 5 §“Functions and Procedures”) frames every built-in as a name, a signature, and an executable body. The catalog binds name to signature; the executor binds signature to a C function pointer. The two design axes are granularity (one routine per type combination, or one polymorphic routine that branches on type) and binding time (function pointer fixed at parse time, or resolved per row). The classical trade-off is monomorphisation cost (more code, faster dispatch) versus interpreter cost (less code, a type branch on every call). CUBRID falls in the middle: per-row dispatch is on OPERATOR_TYPE for arithmetic-shaped operators and on FUNC_CODE for function-shaped operators, with the per-type fan-out below that done in a second switch on DB_TYPE.
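A minimal sketch of the two-level dispatch just described, written against hypothetical op_code, value_type, and value types that stand in for OPERATOR_TYPE, DB_TYPE, and DB_VALUE (none of these names are CUBRID's). The outer switch selects the operator once per row; the inner switch selects the per-type kernel. Operands are assumed to have been coerced to a common type already.

```c
/* Hypothetical stand-ins: op_code plays OPERATOR_TYPE, value_type plays DB_TYPE. */
typedef enum { OP_ADD, OP_MUL } op_code;
typedef enum { VT_INT, VT_DOUBLE } value_type;

typedef struct {
  value_type type;
  union { long long i; double d; } u;
} value;

/* Second-level switch: fan out on the (already common) operand type. */
static int
add_values (const value *a, const value *b, value *out)
{
  out->type = a->type;
  switch (a->type)
    {
    case VT_INT:    out->u.i = a->u.i + b->u.i; return 0;
    case VT_DOUBLE: out->u.d = a->u.d + b->u.d; return 0;
    }
  return -1;
}

static int
mul_values (const value *a, const value *b, value *out)
{
  out->type = a->type;
  switch (a->type)
    {
    case VT_INT:    out->u.i = a->u.i * b->u.i; return 0;
    case VT_DOUBLE: out->u.d = a->u.d * b->u.d; return 0;
    }
  return -1;
}

/* First-level switch: pick the operator, evaluated once per row. */
static int
eval_binary (op_code op, const value *a, const value *b, value *out)
{
  if (a->type != b->type)
    return -1;                 /* this sketch assumes coercion already ran */
  switch (op)
    {
    case OP_ADD: return add_values (a, b, out);
    case OP_MUL: return mul_values (a, b, out);
    }
  return -1;
}
```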

Type domain coercion. Operators like + take INT + INT, INT + DOUBLE, STRING + INT, DATE + INT; the engine must coerce to a common domain before computing. C. J. Date (An Introduction to Database Systems, ch. 4) formalises this as a partial order with explicit and implicit promotions. The implementation pattern is a promotion lattice: a binary join on types that returns the smallest type holding the result without precision loss. CUBRID encodes this twice: at semantic-check time (pt_apply_expressions_definition picks an EXPRESSION_DEFINITION overload and inserts PT_CAST nodes) and at evaluate time (qdata_add_dbval calls tp_value_auto_cast to repeat the coercion if runtime types still mismatch the dispatch table).

NULL semantics. Most primitives propagate NULL (x + NULL = NULL, SUBSTR(NULL,1,3) = NULL), but the rules are not uniform: COALESCE(NULL,1) = 1, IS NULL returns a boolean, and comparisons follow three-valued logic. Each propagating primitive checks DB_IS_NULL(arg) before touching its operands and produces db_make_null(result) on NULL input; exceptions are rare, such as db_string_instr (which accepts a NULL start position).
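A small illustration of the two NULL regimes, assuming a hypothetical nullable_int in place of DB_VALUE: a propagating primitive checks for NULL before reading its operands, while COALESCE deliberately does not propagate.

```c
#include <stdbool.h>

/* Hypothetical nullable integer standing in for a DB_VALUE. */
typedef struct { bool is_null; long long v; } nullable_int;

static nullable_int make_null (void) { nullable_int r = { true, 0 }; return r; }
static nullable_int make_int (long long v) { nullable_int r = { false, v }; return r; }

/* Propagating primitive: any NULL operand yields NULL, checked before the values are read. */
static nullable_int
add_propagating (nullable_int a, nullable_int b)
{
  if (a.is_null || b.is_null)
    return make_null ();
  return make_int (a.v + b.v);
}

/* Non-propagating exception: COALESCE returns its first non-NULL argument. */
static nullable_int
coalesce2 (nullable_int a, nullable_int b)
{
  return a.is_null ? b : a;
}
```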

Collation-aware string operations. Every character string carries a codeset (UTF-8, ISO-8859-1, …) and a collation. String operators that compare or transform characters must honour both: LOWER('Ç') differs across collations; 'a' = 'A' only under case-insensitive collation. CUBRID threads INTL_CODESET and LANG_COLLATION through every string primitive and delegates per-character work to intl_* (in intl_support.c); the per-collation alphabet is lang_user_alphabet_w_coll.

BCD arithmetic. SQL NUMERIC(p,s) is fixed-point decimal, not binary floating point; implementing it on a binary CPU requires a multi-byte representation plus carry-propagating addition, subtraction, long multiplication, and shift-and-subtract long division (Knuth, The Art of Computer Programming vol. 2 §4.3). CUBRID stores DB_NUMERIC as a 16-byte buffer of 8-bit “digits” (DB_NUMERIC_BUF_SIZE = 16), MSB at offset 0 and LSB at offset 15. Strictly speaking the buffer is not binary-coded decimal: it holds the unscaled value as a 128-bit two's-complement binary integer (base-256 digits), with the decimal precision and scale kept alongside it. numeric_add / numeric_sub / numeric_mul / numeric_long_div are the kernels.

Regex engine choice. Backtracking engines (std::regex, PCRE) are exponential worst-case; NFA-simulation engines (RE2, Hyperscan) are linear-time but lack backreferences (Cox, “Regular Expression Matching Can Be Simple And Fast”, swtch.com 2007). CUBRID exposes both (engine_type::LIB_RE2 and engine_type::LIB_CPPSTD) under one façade gated by PRM_ID_REGEXP_ENGINE.

Common DBMS Design

Every relational engine — Postgres, MySQL, SQLite, CUBRID — solves the same five sub-problems with the same recurring patterns.

Catalog of named functions plus a dispatch table. Postgres stores every built-in in pg_proc keyed by oid; the dispatch table in fmgrtab.c maps each oid to Datum (*) (FunctionCallInfo). MySQL uses an Item_func_* class hierarchy with virtual val_int / val_str / val_real. SQLite keeps its built-ins in an internal function table and lets applications add more via sqlite3_create_function. CUBRID has two dispatch tables: (1) OPERATOR_TYPE (T_ADD, T_SUBSTRING, T_LPAD, …, the runtime counterparts of the parser's PT_ADD, PT_SUBSTRING, …) for arithmetic-shaped operators carrying up to four operand pointers in an ARITH_TYPE node, and (2) FUNC_CODE (F_INSERT_SUBSTRING, F_ELT, F_BENCHMARK, the JSON family, the regex family, the set/multiset constructors) for variadic operators modelled as a REGU_VARIABLE_LIST. The split reflects historical growth: arithmetic-shape arrived first; once payloads outgrew the fixed operand slots, the function-code lane was added.

Per-type monomorphisation under a common umbrella. The umbrella qdata_add_dbval decides at runtime which monomorphic kernel to call (qdata_add_int_to_dbval, qdata_add_short_to_dbval, qdata_add_bigint_to_dbval, …). Postgres factors this at semantic-check time (each type has its own pg_proc entry, eliminating the per-row branch); MySQL's Item_func_plus picks int_op, real_op, or decimal_op according to the result type resolved before execution. CUBRID’s approach is an artefact of XASL serialisation across the client/server boundary: the XASL tree carries a single T_ADD regardless of operand types, so type discrimination must happen on the server.

Three-tier coercion. The promotion lattice is applied in three places: (1) pt_apply_expressions_definition at semantic check picks an overload from a signature table and inserts PT_CAST nodes; (2) qdata_add_dbval’s opening lines re-check DB_TYPE_ENUMERATION, string + number, date + string, calling tp_value_auto_cast as a safety net; (3) the per-pair kernel falls back to NO_ERROR with no assignment for unanticipated pairs.

Collation thread. Every string primitive accepts INTL_CODESET and a collation identifier from db_get_string_codeset(string) / db_get_string_collation(string). Per-character work goes through intl_lower_string, intl_char_count, intl_char_size, intl_nextchar_utf8. Comparison-driven ops (LIKE, <) dispatch through the LANG_COLLATION virtual table.

Library selection at parameter time. PRM_ID_REGEXP_ENGINE (values LIB_RE2, LIB_CPPSTD) is read on every regex compile; the compiled cub_compiled_regex carries the engine tag so search/replace dispatch is a switch on a tagged union (compiled_regex_object is union { cub_std_regex *std_obj; re2::RE2 *re2_obj; }).
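A sketch of the tagged-union pattern, assuming hypothetical backends: POSIX extended regular expressions (regcomp / regexec from regex.h) and plain substring search stand in for RE2 and std::regex, and engine_tag stands in for engine_type. The point is structural: the tag is stored next to the union and every operation switches on it.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical engine tags: POSIX ERE and plain substring search stand in for RE2 / std::regex. */
typedef enum { ENGINE_POSIX, ENGINE_LITERAL } engine_tag;

typedef struct {
  engine_tag type;        /* says which union member is live */
  union {
    regex_t *posix;       /* compiled POSIX extended regex */
    char *literal;        /* pattern kept verbatim for strstr */
  } obj;
} compiled_pattern;

static int
pattern_compile (compiled_pattern *cp, const char *pattern, engine_tag type)
{
  cp->type = type;
  switch (type)
    {
    case ENGINE_POSIX:
      cp->obj.posix = malloc (sizeof (regex_t));
      if (cp->obj.posix == NULL)
        return -1;
      if (regcomp (cp->obj.posix, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        {
          free (cp->obj.posix);
          return -1;
        }
      return 0;
    case ENGINE_LITERAL:
      {
        size_t n = strlen (pattern) + 1;
        cp->obj.literal = malloc (n);
        if (cp->obj.literal == NULL)
          return -1;
        memcpy (cp->obj.literal, pattern, n);
        return 0;
      }
    }
  return -1;
}

/* Every operation is a switch on the tag. */
static bool
pattern_search (const compiled_pattern *cp, const char *subject)
{
  switch (cp->type)
    {
    case ENGINE_POSIX:
      return regexec (cp->obj.posix, subject, 0, NULL, 0) == 0;
    case ENGINE_LITERAL:
      return strstr (subject, cp->obj.literal) != NULL;
    }
  return false;
}

static void
pattern_free (compiled_pattern *cp)
{
  switch (cp->type)
    {
    case ENGINE_POSIX:
      regfree (cp->obj.posix);
      free (cp->obj.posix);
      break;
    case ENGINE_LITERAL:
      free (cp->obj.literal);
      break;
    }
}
```

The same pattern string "a+b" matches "xxaaab" under the POSIX backend but only the literal text "a+b" under the substring backend, which is why the engine tag has to travel with the compiled object.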

Where CUBRID sits. Older, larger, and more conservative than Postgres’s or SQLite’s library. No JIT (Postgres can compile expressions through its LLVM-backed llvm_compile_expr); no SIMD vectorisation. The 693 KB string_opfunc.c is one translation unit by design; splitting it would impede inter-procedural inlining of small qstr_* helpers. The library reads as cumulative archaeology: arithmetic at the bottom, then string, then date, then crypt, then JSON, then regex.

CUBRID’s Approach

The library lives in src/query/. Six sources dominate by line count (counting the string_regex_* triple as one): arithmetic.c (numeric and date primitives), numeric_opfunc.c (BCD arithmetic), string_opfunc.c (string primitives, by far the largest), query_opfunc.c (dispatchers and arithmetic umbrellas), crypt_opfunc.c (hashing and encryption), and the string_regex_* triple. Cross-cutting glue lives in fetch.c (the per-row operator switch in fetch_peek_arith) and parser/func_type.cpp (the function-signature table).

The two dispatch lanes

The executor crosses into the library via two surfaces. The first is arithmetic-shaped operators — anything wrapped in an ARITH_TYPE node carrying up to four operand pointers (leftptr, rightptr, thirdptr, optional fourth) and an OPERATOR_TYPE opcode. These reach the library through fetch_peek_arith in fetch.c.

// fetch_peek_arith — src/query/fetch.c
//   driven by `arithptr->opcode` (an OPERATOR_TYPE).
static int
fetch_peek_arith (THREAD_ENTRY * thread_p, REGU_VARIABLE * regu_var, val_descr * vd, OID * obj_oid,
                  QFILE_TUPLE tpl, DB_VALUE ** peek_dbval)
{
  ARITH_TYPE *arithptr = regu_var->value.arithptr;
  // ... fast-path for REGU_VARIABLE_FETCH_ALL_CONST, recursion-depth check ...

  /* Step 1: per-opcode operand fetch. Each case decides which sub-regus
     to fetch and in what order; T_ADD short-circuits the right operand
     when peek_left is NULL under PRM_ID_ORACLE_STYLE_EMPTY_STRING. */
  switch (arithptr->opcode)
    {
    case T_SUBSTRING: case T_LPAD: case T_RPAD: case T_REPLACE: case T_TRANSLATE:
      /* Three-operand string ops: fetch left, then right, then third. */
      // ... condensed ...
      break;
    case T_ADD: case T_SUB: case T_MUL: case T_DIV: case T_MOD:
    case T_POSITION: case T_AES_ENCRYPT: case T_AES_DECRYPT: case T_SHA_TWO:
    case T_POWER: case T_ROUND: case T_LOG: case T_TRUNC: case T_STRCAT:
    case T_BIT_AND: case T_BIT_OR: case T_BIT_XOR: case T_INTDIV: case T_INTMOD:
      /* Two-operand: fetch left and right. */
      break;
    /* ... ~120 case labels for unary/ternary/quaternary shapes ... */
    }

  /* Step 2: per-opcode dispatch to the library. */
  switch (arithptr->opcode)
    {
    case T_ADD:
      if (qdata_add_dbval (peek_left, peek_right, arithptr->value, regu_var->domain) != NO_ERROR) goto error;
      break;
    case T_SUB:
      if (qdata_subtract_dbval (peek_left, peek_right, arithptr->value, regu_var->domain) != NO_ERROR) goto error;
      break;
    case T_MUL: /* qdata_multiply_dbval */ break;
    case T_DIV: /* qdata_divide_dbval */ break;
    case T_MOD: /* db_mod_dbval */ break;
    case T_FLOOR:
      if (DB_IS_NULL (peek_right)) PRIM_SET_NULL (arithptr->value);
      else db_floor_dbval (arithptr->value, peek_right);
      break;
    case T_SUBSTRING:
      if (DB_IS_NULL (peek_left) || DB_IS_NULL (peek_right))
        PRIM_SET_NULL (arithptr->value);
      else
        db_string_substring (arithptr->misc_operand, peek_left, peek_right, peek_third, arithptr->value);
      break;
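    /* ... ~160 case labels covering the OPERATOR_TYPE enum ... */
    }

  *peek_dbval = arithptr->value;
  thread_dec_recursion_depth (thread_p);
  return NO_ERROR;

error:
  thread_dec_recursion_depth (thread_p);
  return ER_FAILED;
}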

The two-pass shape (fetch operands, then dispatch) is deliberate: the fetch pass implements the per-opcode operand-evaluation strategy (e.g., T_ADD short-circuits the right operand when peek_left is NULL under PRM_ID_ORACLE_STYLE_EMPTY_STRING); the dispatch pass calls into the library. The recursion-depth check at the top (thread_inc_recursion_depth, capped at PRM_ID_MAX_RECURSION_SQL_DEPTH) turns pathologically deep expression nesting into a reported error instead of a C stack overflow.
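A minimal sketch of a depth-capped recursive evaluator, assuming a hypothetical expr node (leaf constant or unary negation) and an eval_ctx whose max_depth plays the role of PRM_ID_MAX_RECURSION_SQL_DEPTH. The counter is incremented on entry and decremented on every exit path, which is the invariant any early-exit path must preserve.

```c
#include <stddef.h>

/* Hypothetical expression node: a constant leaf or a unary negation of its child. */
typedef struct expr {
  int is_leaf;
  long long constant;
  const struct expr *child;
} expr;

typedef struct {
  int depth;      /* current recursion depth (per thread in a real server) */
  int max_depth;  /* cap, in the role of PRM_ID_MAX_RECURSION_SQL_DEPTH */
} eval_ctx;

/* Returns 0 on success, -1 when nesting exceeds ctx->max_depth. */
static int
eval_expr (eval_ctx *ctx, const expr *e, long long *out)
{
  int rc = 0;
  if (++ctx->depth > ctx->max_depth)
    {
      ctx->depth--;
      return -1;
    }
  if (e->is_leaf)
    *out = e->constant;
  else
    {
      long long v;
      rc = eval_expr (ctx, e->child, &v);
      if (rc == 0)
        *out = -v;
    }
  ctx->depth--;   /* rebalance on success and on failure alike */
  return rc;
}
```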

The second lane is function-shaped operators — anything modelled as a FUNCTION_TYPE node with a REGU_VARIABLE_LIST of operands and a FUNC_CODE discriminator. These are reached through qdata_evaluate_function:

// qdata_evaluate_function — src/query/query_opfunc.c
//   driven by `funcp->ftype` (a FUNC_CODE).
int
qdata_evaluate_function (THREAD_ENTRY * thread_p, regu_variable_node * function_p, ...)
{
  FUNCTION_TYPE *funcp = function_p->value.funcp;
  pr_clear_value (funcp->value);

  switch (funcp->ftype)
    {
    case F_SET: case F_MULTISET: case F_SEQUENCE: case F_VID:
      return qdata_convert_dbvals_to_set (thread_p, /* set type */, function_p, ...);
    case F_TABLE_SET: case F_TABLE_MULTISET: case F_TABLE_SEQUENCE:
      return qdata_convert_table_to_set (thread_p, /* set type */, function_p, val_desc_p);
    case F_GENERIC:          return qdata_evaluate_generic_function (...);
    case F_CLASS_OF:         return qdata_get_class_of_function (...);
    case F_INSERT_SUBSTRING: return qdata_insert_substring_function (...);
    case F_ELT:              return qdata_elt (...);
    case F_BENCHMARK:        return qdata_benchmark (...);

    /* JSON family: ~25 entries; each fetches operands into DB_VALUE *args[]
       via qdata_convert_operands_to_value_and_call and forwards to a
       db_evaluate_json_* function pointer. */
    case F_JSON_ARRAY:    return qdata_convert_operands_to_value_and_call (..., db_evaluate_json_array);
    case F_JSON_EXTRACT:  return qdata_convert_operands_to_value_and_call (..., db_evaluate_json_extract);
    /* ... F_JSON_OBJECT, F_JSON_MERGE, F_JSON_INSERT, F_JSON_REPLACE,
           F_JSON_SET, F_JSON_REMOVE, F_JSON_KEYS, F_JSON_LENGTH,
           F_JSON_DEPTH, F_JSON_TYPE, F_JSON_VALID, F_JSON_QUOTE,
           F_JSON_UNQUOTE, F_JSON_PRETTY, F_JSON_SEARCH,
           F_JSON_CONTAINS, F_JSON_CONTAINS_PATH,
           F_JSON_ARRAY_APPEND, F_JSON_ARRAY_INSERT,
           F_JSON_GET_ALL_PATHS, F_JSON_MERGE_PATCH ... */

    case F_REGEXP_COUNT: case F_REGEXP_INSTR: case F_REGEXP_LIKE:
    case F_REGEXP_REPLACE: case F_REGEXP_SUBSTR:
      return qdata_regexp_function (thread_p, funcp, val_desc_p, obj_oid_p, tuple);

    default:
      er_set (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_QPROC_INVALID_XASLNODE, 0);
      return ER_FAILED;
    }
}

The comment /* should sync with fetch_peek_dbval () */ at the top of qdata_evaluate_function (omitted from the excerpt) is a maintenance directive: every new FUNC_CODE must also be added to the function-evaluation arm of fetch_peek_dbval so the regu-variable peek path can reach the same dispatcher.
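A sketch of a FUNC_CODE-style lane over a variadic operand list, with hypothetical operand and func_code types in place of REGU_VARIABLE_LIST and FUNC_CODE. ELT(n, s1, s2, …) walks the list to the n-th string; an unknown code returns an error, like the default arm of qdata_evaluate_function.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a REGU_VARIABLE_LIST node and FUNC_CODE. */
typedef struct operand {
  int is_null;                /* SQL NULL flag */
  long long num;              /* numeric payload (ELT's index) */
  const char *str;            /* string payload */
  const struct operand *next;
} operand;

typedef enum { FN_ELT, FN_COALESCE } func_code;

/* Returns 0 and sets *result (NULL pointer = SQL NULL), or -1 for an unknown code. */
static int
eval_function (func_code code, const operand *args, const char **result)
{
  *result = NULL;
  switch (code)
    {
    case FN_ELT:
      {
        if (args == NULL || args->is_null)
          return 0;                        /* ELT(NULL, ...) is NULL */
        long long n = args->num;
        const operand *p = args->next;     /* first candidate string */
        for (long long i = 1; p != NULL && i < n; i++)
          p = p->next;
        if (n >= 1 && p != NULL && !p->is_null)
          *result = p->str;
        return 0;
      }
    case FN_COALESCE:
      for (const operand *p = args; p != NULL; p = p->next)
        if (!p->is_null)
          {
            *result = p->str;
            return 0;
          }
      return 0;
    }
  return -1;                               /* unknown code: invalid plan node */
}
```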

flowchart LR
    Regu[REGU_VARIABLE]
    Regu -->|TYPE_INARITH| Arith[ARITH_TYPE]
    Regu -->|TYPE_FUNC|    Func[FUNCTION_TYPE]
    Regu -->|TYPE_CONSTANT| Const[DB_VALUE]
    Regu -->|TYPE_ATTR_ID|  Attr[heap fetch]

    Arith -->|opcode T_ADD| Add[qdata_add_dbval]
    Arith -->|opcode T_SUB| Sub[qdata_subtract_dbval]
    Arith -->|opcode T_MUL| Mul[qdata_multiply_dbval]
    Arith -->|opcode T_DIV| Div[qdata_divide_dbval]
    Arith -->|opcode T_MOD| Mod[db_mod_dbval]
    Arith -->|opcode T_SUBSTRING| Sub2[db_string_substring]
    Arith -->|opcode T_LIKE| Like[db_string_like]
    Arith -->|opcode T_LOWER| Low[db_string_lower]
    Arith -->|opcode T_AES_ENCRYPT| AES[db_string_aes_encrypt]
    Arith -->|opcode T_RLIKE| Rl[db_string_rlike]
    Arith -->|opcode T_FLOOR/T_CEIL/T_ABS/...| Math[db_floor_dbval/db_ceil_dbval/db_abs_dbval/...]

    Func -->|F_SET/MULTISET/SEQUENCE| MkSet[qdata_convert_dbvals_to_set]
    Func -->|F_INSERT_SUBSTRING| Ins[qdata_insert_substring_function]
    Func -->|F_ELT| Elt[qdata_elt]
    Func -->|F_JSON_ARRAY/MERGE/EXTRACT/...| JFan[qdata_convert_operands_to_value_and_call]
    Func -->|F_REGEXP_COUNT/INSTR/LIKE/REPLACE/SUBSTR| RexF[qdata_regexp_function]
    JFan --> Jeval[db_evaluate_json_*]
    RexF --> Rexd[db_string_regexp_*]

Figure 1 — REGU_VARIABLE dispatch tree. TYPE_INARITH routes to ARITH_TYPE opcodes (T_ADD, T_SUBSTRING, T_AES_ENCRYPT, …) and TYPE_FUNC routes to FUNCTION_TYPE codes (F_SET, F_ELT, F_JSON_*, F_REGEXP_*), each fanning out to its monomorphic evaluation kernel.

Arithmetic — `qdata_add_dbval` and the per-pair kernels

qdata_add_dbval is the umbrella for +. Three phases run before dispatch: (1) ENUMERATION pre-coercion, which casts to VARCHAR (if the other operand is a string) or SHORT (otherwise) and recurses; (2) the PRM_ID_PLUS_AS_CONCAT short-circuit, which forwards to qdata_strcat_dbval when both operands are character or bit strings (MySQL-compatible 'a' + 'b' = 'ab'); (3) promotion-lattice coercion, where for STRING + NUMBER / NUMBER + DATE / STRING + DATE the operands may be swapped to canonical order, the string side cast to DOUBLE, and the number side (when added to a date) cast to BIGINT. Then the per-DB_TYPE switch hands off to the monomorphic kernel.

// qdata_add_dbval — src/query/query_opfunc.c
int
qdata_add_dbval (DB_VALUE * dbval1_p, DB_VALUE * dbval2_p, DB_VALUE * result_p, tp_domain * domain_p)
{
  DB_TYPE type1 = DB_VALUE_DOMAIN_TYPE (dbval1_p);
  int error = NO_ERROR;
  /* Phase 1: ENUMERATION pre-coercion (recursive). */
  /* Phase 2: PLUS_AS_CONCAT short-circuit -> qdata_strcat_dbval. */
  /* Phase 3: mixed-type coercion via tp_value_auto_cast. */
  /* Phase 4: per-DB_TYPE dispatch. */
  switch (type1)
    {
    case DB_TYPE_SHORT:    error = qdata_add_short_to_dbval  (dbval1_p, dbval2_p, result_p, domain_p);  break;
    case DB_TYPE_INTEGER:  error = qdata_add_int_to_dbval    (dbval1_p, dbval2_p, result_p, domain_p);  break;
    case DB_TYPE_BIGINT:   error = qdata_add_bigint_to_dbval (dbval1_p, dbval2_p, result_p, domain_p);  break;
    case DB_TYPE_FLOAT:    error = qdata_add_float_to_dbval  (dbval1_p, dbval2_p, result_p);            break;
    case DB_TYPE_DOUBLE:   error = qdata_add_double_to_dbval (dbval1_p, dbval2_p, result_p);            break;
    case DB_TYPE_NUMERIC:  error = qdata_add_numeric_to_dbval(dbval1_p, dbval2_p, result_p);            break;
    case DB_TYPE_MONETARY: error = qdata_add_monetary_to_dbval(dbval1_p, dbval2_p, result_p);           break;
    case DB_TYPE_DATE:     error = qdata_add_date_to_dbval   (dbval1_p, dbval2_p, result_p);            break;
    /* DATETIME, TIMESTAMP, TIME, TIMESTAMPTZ, DATETIMETZ, ... */
    }
  if (error != NO_ERROR)
    return error;
  return qdata_coerce_result_to_domain (result_p, domain_p);
}
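A toy version of the umbrella's pre-dispatch phases, under assumed types (kind, sval) and a plus_as_concat flag standing in for PRM_ID_PLUS_AS_CONCAT. It is not CUBRID's logic: enumerations and dates are left out, integer overflow is ignored, and every remaining mixed pair is simply coerced to DOUBLE.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tagged value, a tiny stand-in for DB_VALUE. */
typedef enum { K_INT, K_DOUBLE, K_STRING } kind;
typedef struct {
  kind k;
  long long i;
  double d;
  char s[64];
} sval;

/* Coerce an operand to DOUBLE; a string must parse completely. */
static bool
to_double (const sval *v, double *out)
{
  if (v->k == K_DOUBLE) { *out = v->d; return true; }
  if (v->k == K_INT) { *out = (double) v->i; return true; }
  char *end;
  *out = strtod (v->s, &end);
  return end != v->s && *end == '\0';
}

static bool
sql_plus (const sval *a, const sval *b, bool plus_as_concat, sval *out)
{
  memset (out, 0, sizeof *out);
  /* Concat short-circuit: string + string under the flag means concatenation. */
  if (plus_as_concat && a->k == K_STRING && b->k == K_STRING)
    {
      if (strlen (a->s) + strlen (b->s) >= sizeof out->s)
        return false;
      out->k = K_STRING;
      strcpy (out->s, a->s);
      strcat (out->s, b->s);
      return true;
    }
  /* Same-type integer pair: stays integral (overflow check omitted here). */
  if (a->k == K_INT && b->k == K_INT)
    {
      out->k = K_INT;
      out->i = a->i + b->i;
      return true;
    }
  /* Any other mix: coerce both sides to DOUBLE, then add. */
  double x, y;
  if (!to_double (a, &x) || !to_double (b, &y))
    return false;
  out->k = K_DOUBLE;
  out->d = x + y;
  return true;
}
```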

The per-pair kernel qdata_add_int_to_dbval (“LHS is INT, dispatch on RHS”) runs an inner switch on the second operand’s DB_TYPE and calls a fully-monomorphic helper:

// qdata_add_int_to_dbval — src/query/query_opfunc.c
static int
qdata_add_int_to_dbval (DB_VALUE * int_val_p, DB_VALUE * dbval_p, DB_VALUE * result_p, TP_DOMAIN * domain_p)
{
  int i = db_get_int (int_val_p);
  switch (DB_VALUE_DOMAIN_TYPE (dbval_p))
    {
    case DB_TYPE_SHORT:     return qdata_add_int    (i, db_get_short (dbval_p), result_p);
    case DB_TYPE_INTEGER:   return qdata_add_int    (i, db_get_int   (dbval_p), result_p);
    case DB_TYPE_BIGINT:    return qdata_add_bigint (i, db_get_bigint(dbval_p), result_p);
    case DB_TYPE_FLOAT:     return qdata_add_float  ((float) i, db_get_float (dbval_p), result_p);
    case DB_TYPE_DOUBLE:    return qdata_add_double (i, db_get_double(dbval_p), result_p);
    case DB_TYPE_NUMERIC:   return qdata_add_numeric (dbval_p, int_val_p, result_p);
    case DB_TYPE_TIMESTAMP: return qdata_add_int_to_utime    (dbval_p, i, result_p, domain_p);
    case DB_TYPE_DATE:      return qdata_add_int_to_date     (dbval_p, i, result_p, domain_p);
    /* MONETARY, TIME, TIMESTAMPLTZ, TIMESTAMPTZ, DATETIME, DATETIMELTZ, DATETIMETZ */
    default:
      break;
    }
  return NO_ERROR;   /* unanticipated pair: NO_ERROR, no result assigned */
}

The monomorphic qdata_add_int (and qdata_add_short, qdata_add_bigint, qdata_add_float, qdata_add_double) is where the actual arithmetic happens with overflow checks — integer addition raises ER_QPROC_OVERFLOW_ADDITION, bigint addition checks against DB_BIGINT_MAX, float/double arithmetic relies on IEEE infinity propagation.
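A sketch of overflow-checked addition for 32- and 64-bit integers. The check runs before the addition because signed overflow is undefined behaviour in C, so inspecting an already-overflowed sum is not portable; whether CUBRID's kernels use this exact formulation is not shown here.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns false instead of overflowing; a real kernel would raise an
   overflow error (such as ER_QPROC_OVERFLOW_ADDITION) at this point. */
static bool
checked_add_int (int a, int b, int *out)
{
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    return false;
  *out = a + b;
  return true;
}

static bool
checked_add_bigint (long long a, long long b, long long *out)
{
  if ((b > 0 && a > LLONG_MAX - b) || (b < 0 && a < LLONG_MIN - b))
    return false;
  *out = a + b;
  return true;
}
```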

The four umbrellas (qdata_add_dbval, qdata_subtract_dbval, qdata_multiply_dbval, qdata_divide_dbval) share the same shape. qdata_unary_minus_dbval is simpler (one operand) but follows the same per-DB_TYPE switch. db_mod_dbval (in arithmetic.c) stands apart: it predates the qdata_* umbrellas and fans out to db_mod_short, db_mod_int, db_mod_bigint, db_mod_float, db_mod_double, db_mod_string, db_mod_numeric, db_mod_monetary.

Promotion lattice as observed at runtime:

graph TD
    Short[SHORT] --> Int[INTEGER]
    Int --> BigInt[BIGINT]
    BigInt --> Numeric[NUMERIC<br/>BCD, exact]
    BigInt --> Float[FLOAT<br/>IEEE 32]
    Float --> Double[DOUBLE<br/>IEEE 64]
    Numeric --> Double
    Char[CHAR/VARCHAR] -->|tp_value_auto_cast| Double
    Date[DATE] -->|+INT/BIGINT| Date
    Date -->|+STRING| BigIntDate[BIGINT then DATE]
    Enum[ENUMERATION] -->|cast first| Short
    Enum -->|cast first| Char

Figure 2 — Numeric promotion lattice at runtime. SHORT widens through INTEGER → BIGINT → NUMERIC/FLOAT → DOUBLE; CHAR/VARCHAR casts to DOUBLE via tp_value_auto_cast; ENUMERATION pre-coerces to SHORT or CHAR before entering the lattice.
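The runtime lattice in Figure 2 can be modelled as a least-upper-bound (join) computation. A minimal sketch, restricted to the numeric types and following the figure's edges as drawn (the CHAR, DATE, and ENUMERATION arms are left out); the LT_* names are hypothetical. Each type stores the set of types at or above it, and the join is the common upper bound that sits below all the others.

```c
/* Hypothetical numeric types; LT_COUNT is only a sentinel. */
typedef enum { LT_SHORT, LT_INT, LT_BIGINT, LT_NUMERIC, LT_FLOAT, LT_DOUBLE, LT_COUNT } lat_type;

#define LAT_BIT(t) (1u << (t))

/* up_set[t]: every type reachable from t along Figure 2's arrows, t included. */
static const unsigned up_set[LT_COUNT] = {
  [LT_SHORT]   = LAT_BIT (LT_SHORT) | LAT_BIT (LT_INT) | LAT_BIT (LT_BIGINT)
                 | LAT_BIT (LT_NUMERIC) | LAT_BIT (LT_FLOAT) | LAT_BIT (LT_DOUBLE),
  [LT_INT]     = LAT_BIT (LT_INT) | LAT_BIT (LT_BIGINT) | LAT_BIT (LT_NUMERIC)
                 | LAT_BIT (LT_FLOAT) | LAT_BIT (LT_DOUBLE),
  [LT_BIGINT]  = LAT_BIT (LT_BIGINT) | LAT_BIT (LT_NUMERIC) | LAT_BIT (LT_FLOAT) | LAT_BIT (LT_DOUBLE),
  [LT_NUMERIC] = LAT_BIT (LT_NUMERIC) | LAT_BIT (LT_DOUBLE),
  [LT_FLOAT]   = LAT_BIT (LT_FLOAT) | LAT_BIT (LT_DOUBLE),
  [LT_DOUBLE]  = LAT_BIT (LT_DOUBLE),
};

/* Join: the common upper bound that lies below every other common upper bound. */
static lat_type
lattice_join (lat_type a, lat_type b)
{
  unsigned common = up_set[a] & up_set[b];
  for (int t = 0; t < LT_COUNT; t++)
    if ((common & LAT_BIT (t)) && (up_set[t] & common) == common)
      return (lat_type) t;
  return LT_DOUBLE;                     /* not reached: DOUBLE bounds every type */
}
```

The BIGINT to FLOAT edge can lose precision; the sketch encodes the figure's edges, not a precision-safe ordering.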

Numeric (BCD) — `numeric_db_value_*`

DB_NUMERIC is a 16-byte buffer of base-256 digits holding a two's-complement signed integer, most significant byte first. numeric_opfunc.c implements arithmetic on this representation. The kernels are byte-wise carry-propagating loops resembling grade-school arithmetic in base 256:

// numeric_add — src/query/numeric_opfunc.c
//   carry-propagating byte add, MSB at offset 0, LSB at offset size-1.
static void
numeric_add (DB_C_NUMERIC arg1, DB_C_NUMERIC arg2, DB_C_NUMERIC answer, int size)
{
  unsigned int answer_bit = 0;
  for (int digit = size - 1; digit >= 0; digit--)
    {
      answer_bit = (arg1[digit] + arg2[digit]) + CARRYOVER (answer_bit);
      answer[digit] = GET_LOWER_BYTE (answer_bit);
    }
}

/* numeric_sub negates arg2 (two's complement) then forwards to numeric_add.
   numeric_mul is long multiplication: outer loop shifts arg1 by `shift` bytes,
   inner loop computes one digit of the product, results accumulate via numeric_add.
   numeric_long_div is shift-and-subtract (Knuth Algorithm D variant). */
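A self-contained sketch of the byte-wise kernels on a 16-byte, most-significant-byte-first, two's-complement buffer (num_buf and the num_* names are hypothetical). Subtraction negates the right operand (invert every byte, add one) and reuses the adder, the strategy the comment above attributes to numeric_sub.

```c
#include <string.h>

#define NUM_SIZE 16  /* mirrors DB_NUMERIC_BUF_SIZE; byte 0 is most significant */

typedef unsigned char num_buf[NUM_SIZE];

/* Carry-propagating add, walking from the least significant byte upward. */
static void
num_add (const num_buf a, const num_buf b, num_buf out)
{
  unsigned int acc = 0;
  for (int i = NUM_SIZE - 1; i >= 0; i--)
    {
      acc = a[i] + b[i] + (acc >> 8);   /* the previous carry sits in bit 8 */
      out[i] = (unsigned char) (acc & 0xFF);
    }
}

/* Two's-complement negation: invert every byte, then add one. */
static void
num_negate (const num_buf a, num_buf out)
{
  num_buf inverted, one;
  for (int i = 0; i < NUM_SIZE; i++)
    inverted[i] = (unsigned char) ~a[i];
  memset (one, 0, NUM_SIZE);
  one[NUM_SIZE - 1] = 1;
  num_add (inverted, one, out);
}

/* a - b computed as a + (-b). */
static void
num_sub (const num_buf a, const num_buf b, num_buf out)
{
  num_buf neg_b;
  num_negate (b, neg_b);
  num_add (a, neg_b, out);
}

static int
num_is_negative (const num_buf a)
{
  return (a[0] & 0x80) != 0;           /* sign bit of the most significant byte */
}
```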

The public entry points are numeric_db_value_add, numeric_db_value_sub, numeric_db_value_mul, numeric_db_value_div. Each does:

1. Validate that the arguments are DB_TYPE_NUMERIC.
2. Coerce both operands to a common precision and scale via numeric_common_prec_scale: the operand with the smaller scale gets shifted left (multiplied by powers of ten) so both have the same scale; precision is widened to max(p1, p2) + 1 to leave room for the carry.
3. Call the kernel (numeric_add etc.).
4. Detect overflow with numeric_overflow (which checks whether the high byte is nonzero beyond the new precision); on overflow, either widen the precision by one digit (if it is still below DB_MAX_NUMERIC_PRECISION) or raise ER_IT_DATA_OVERFLOW.
5. Build the result DB_VALUE with db_make_numeric(answer, prec, scale).
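The scale-alignment step is easier to see with a 64-bit integer in place of the 16-byte buffer. A minimal sketch, assuming a hypothetical fixed struct (unscaled value plus decimal scale); where CUBRID would widen precision or raise ER_IT_DATA_OVERFLOW, this sketch just returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-point value: unscaled * 10^-scale (e.g. 1.50 is {150, 2}). */
typedef struct { int64_t unscaled; int scale; } fixed;

/* Multiply by 10^k, refusing to overflow. */
static bool
scale_up (int64_t v, int k, int64_t *out)
{
  while (k-- > 0)
    {
      if (v > INT64_MAX / 10 || v < INT64_MIN / 10)
        return false;
      v *= 10;
    }
  *out = v;
  return true;
}

/* Align both operands to the larger scale, then add the unscaled integers. */
static bool
fixed_add (fixed a, fixed b, fixed *out)
{
  int scale = a.scale > b.scale ? a.scale : b.scale;
  int64_t x, y;
  if (!scale_up (a.unscaled, scale - a.scale, &x) || !scale_up (b.unscaled, scale - b.scale, &y))
    return false;
  if ((y > 0 && x > INT64_MAX - y) || (y < 0 && x < INT64_MIN - y))
    return false;                      /* overflow of the aligned sum */
  out->unscaled = x + y;
  out->scale = scale;
  return true;
}
```

For example, 1.5 + 2.25 aligns {15, 1} to {150, 2} and returns {375, 2}, i.e. 3.75.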

Conversion into and out of the BCD representation is in the same file: numeric_coerce_int_to_num, numeric_coerce_double_to_num, numeric_coerce_string_to_num (built on mprec.h for double-precision parsing), and the inverses numeric_coerce_num_to_int, numeric_coerce_num_to_double, numeric_coerce_num_to_dec_str. Long-numeric (32-byte) variants exist for intermediate results that exceed 16 bytes (numeric_is_longnum_value, numeric_shortnum_to_longnum, numeric_longnum_to_shortnum).
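A sketch of the integer-to-buffer coercion and its inverse on a 16-byte big-endian two's-complement buffer (function names are hypothetical, and scale handling is omitted). Widening is sign extension; narrowing succeeds only when the upper eight bytes are a pure copy of the sign.

```c
#include <stdint.h>

#define NUMBUF 16  /* byte 0 is most significant */

/* Sign-extend a 64-bit integer into the 16-byte buffer. */
static void
int64_to_buf (int64_t v, unsigned char buf[NUMBUF])
{
  uint64_t u = (uint64_t) v;            /* modular conversion, well defined in C */
  unsigned char fill = v < 0 ? 0xFF : 0x00;
  for (int i = NUMBUF - 1; i >= NUMBUF - 8; i--)
    {
      buf[i] = (unsigned char) (u & 0xFF);
      u >>= 8;
    }
  for (int i = 0; i < NUMBUF - 8; i++)
    buf[i] = fill;                      /* upper half holds only sign bits */
}

/* Inverse; returns -1 when the value needs more than 64 bits. */
static int
buf_to_int64 (const unsigned char buf[NUMBUF], int64_t *out)
{
  unsigned char fill = (buf[NUMBUF - 8] & 0x80) ? 0xFF : 0x00;
  for (int i = 0; i < NUMBUF - 8; i++)
    if (buf[i] != fill)
      return -1;
  uint64_t u = 0;
  for (int i = NUMBUF - 8; i < NUMBUF; i++)
    u = (u << 8) | buf[i];
  /* Map back to a signed value without relying on implementation-defined casts. */
  *out = u <= (uint64_t) INT64_MAX ? (int64_t) u : -(int64_t) (~u) - 1;
  return 0;
}
```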

String — `db_string_*`

string_opfunc.c is the largest file in src/query/ (693 KB) with ~350 functions. Representative entries:

| Operator | Routine | Notes |
|---|---|---|
| `\|\|` / `CONCAT` | `db_string_concatenate` | Codeset check; `PRM_ID_ORACLE_STYLE_EMPTY_STRING` decides NULL handling |
| `SUBSTR` / `SUBSTRING` / `MID` | `db_string_substring` | Two flavours via `MISC_OPERAND` |
| `INSTR` / `POSITION` | `db_string_instr` | 1-based index |
| `LOWER` / `UPPER` | `db_string_lower` / `_upper` | Collation alphabet via `lang_user_alphabet_w_coll` |
| `LIKE` | `db_string_like` | Four-valued `DB_LOGICAL`; delegates to `qstr_eval_like` |
| `RLIKE` / `REGEXP` | `db_string_rlike` | Compiles via `cubregex::compile` |
| `LPAD` / `RPAD` | `db_string_lpad` / `_rpad` | Length in characters |
| `TRIM` / `LTRIM` / `RTRIM` | `db_string_trim` | Dispatched by `MISC_OPERAND` |
| `REPLACE` / `TRANSLATE` | `db_string_replace` / `_translate` | Find-and-replace and character-table substitution |
| `REPEAT` | `db_string_repeat` | Pre-checks `PRM_ID_STRING_MAX_SIZE_BYTES` |
| `INSERT` | `db_string_insert_substring` | Function-shaped (`F_INSERT_SUBSTRING`) |
| `LENGTH` / `CHAR_LENGTH` / `BIT_LENGTH` | `db_string_char_length` / `_bit_length` | Characters vs bits |
| `MD5` / `SHA1` / `SHA2` | `db_string_md5` / `_sha_one` / `_sha_two` | Wrap `crypt_*` |
| `AES_ENCRYPT` / `AES_DECRYPT` | `db_string_aes_encrypt` / `_decrypt` | AES-128/ECB/PKCS7 |
| `REVERSE` / `SPACE` / `QUOTE` | `db_string_reverse` / `_space` / `_quote` | UTF-8-aware reverse, padding, escaping |

db_string_substring shows the codeset-aware shape:

// db_string_substring — src/query/string_opfunc.c
int
db_string_substring (const MISC_OPERAND substr_operand, const DB_VALUE * src_string,
                     const DB_VALUE * start_position, const DB_VALUE * extraction_length,
                     DB_VALUE * sub_string)
{
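  if (DB_IS_NULL (src_string) || DB_IS_NULL (start_position))
    { db_make_null (sub_string); return NO_ERROR; }
  /* validate types: QSTR_IS_ANY_CHAR_OR_BIT, is_integer ... */
  DB_TYPE src_type = DB_VALUE_DOMAIN_TYPE (src_string);
  bool extraction_length_is_null = (extraction_length == NULL || DB_IS_NULL (extraction_length));

  if (QSTR_IS_CHAR (src_type))
    {
      const unsigned char *string = DB_GET_UCHAR (src_string);
      int string_len    = db_get_string_length (src_string);   /* in characters */
      int start_offset  = db_get_int (start_position);
      int extract_nchars = extraction_length_is_null ? string_len : db_get_int (extraction_length);
      unsigned char *sub;
      int sub_length, sub_size;

      /* For SUBSTR (Oracle), a negative start counts from the end: intl_char_size
         returns the byte size of the first (length + start) characters, the
         pointer skips them, and the remaining -start characters are read from
         position 1. */
      if (substr_operand == SUBSTR && start_offset < 0)
        {
          int byte_pos;
          if (-start_offset > string_len)
            { db_make_null (sub_string); return NO_ERROR; }   /* start precedes the first character */
          intl_char_size (string, string_len + start_offset,
                          db_get_string_codeset (src_string), &byte_pos);
          string += byte_pos; string_len = -start_offset;
          start_offset = 1;
        }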

      qstr_substring (string, string_len, start_offset, extract_nchars,
                      db_get_string_codeset (src_string), &sub, &sub_length, &sub_size);
      qstr_make_typed_string (result_type, sub_string, ..., (char *) sub, sub_size,
                              db_get_string_codeset (src_string),
                              db_get_string_collation (src_string));
      sub_string->need_clear = true;
    }
  else { /* qstr_bit_substring for bit-string variant */ }
  return NO_ERROR;
}

The flow is uniform: return early on NULL, validate types, delegate to a qstr_* byte-level kernel, and package the result with qstr_make_typed_string (which sets codeset and collation on the result DB_VALUE).
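A standalone sketch of the character-aware walk behind SUBSTR, assuming UTF-8 input and hypothetical helper names (utf8_prefix_bytes plays the role the text gives intl_char_size). Positions are counted in characters and converted to byte offsets by skipping continuation bytes (10xxxxxx); a negative start is rebased to a 1-based position from the front. This sketch returns an empty string when start lies past the end.

```c
#include <stddef.h>
#include <string.h>

/* Bytes occupied by the first nchars UTF-8 characters of s (stops at the terminator). */
static size_t
utf8_prefix_bytes (const char *s, int nchars)
{
  size_t pos = 0;
  while (nchars-- > 0 && s[pos] != '\0')
    {
      pos++;
      while (((unsigned char) s[pos] & 0xC0) == 0x80)
        pos++;                          /* skip continuation bytes 10xxxxxx */
    }
  return pos;
}

static int
utf8_char_count (const char *s)
{
  int n = 0;
  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;                              /* count lead bytes only */
  return n;
}

/* SUBSTR(s, start, len) with 1-based start; a negative start counts from the end.
   Returns 0 on success, -1 for a NULL result or when out (capacity cap) is too small. */
static int
utf8_substr (const char *s, int start, int len, char *out, size_t cap)
{
  int nchars = utf8_char_count (s);
  if (len < 0)
    return -1;
  if (start < 0)
    {
      if (-start > nchars)
        return -1;
      start = nchars + start + 1;       /* e.g. -3 on 5 characters -> position 3 */
    }
  else if (start == 0)
    start = 1;                          /* treat 0 like 1, as Oracle does */
  const char *begin = s + utf8_prefix_bytes (s, start - 1);
  size_t nbytes = utf8_prefix_bytes (begin, len);
  if (nbytes + 1 > cap)
    return -1;
  memcpy (out, begin, nbytes);
  out[nbytes] = '\0';
  return 0;
}
```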

db_string_lower shows the collation thread — the alphabet is collation-specific so a single codepoint may lowercase differently across collations (Turkish I -> ı under tr_TR):

// db_string_lower — src/query/string_opfunc.c
int
db_string_lower (const DB_VALUE * string, DB_VALUE * lower_string)
{
  if (DB_IS_NULL (string)) { db_make_null (lower_string); return NO_ERROR; }
  const ALPHABET_DATA *alphabet = lang_user_alphabet_w_coll (db_get_string_collation (string));
  int src_length = db_get_string_length (string);   /* length in characters */
  int lower_size = intl_lower_string_size (alphabet, DB_GET_UCHAR (string), ...);
  unsigned char *lower_str = (unsigned char *) db_private_alloc (NULL, lower_size + 1);
  if (lower_str == NULL)
    return ER_OUT_OF_VIRTUAL_MEMORY;
  intl_lower_string (alphabet, DB_GET_UCHAR (string), lower_str, src_length);
  lower_str[lower_size] = '\0';
  qstr_make_typed_string (..., db_get_string_codeset (string), db_get_string_collation (string));
  lower_string->need_clear = true;   /* the result owns lower_str */
  return NO_ERROR;
}
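A deliberately tiny illustration of why lowercasing needs the collation: one code point, two answers. Only ASCII and the Turkish I pair are handled, and the bool flag stands in for a collation-specific alphabet such as the one lang_user_alphabet_w_coll returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lowercase one code point; a real alphabet table covers far more than this. */
static uint32_t
lower_code_point (uint32_t cp, bool turkish)
{
  if (turkish)
    {
      if (cp == 0x0049)        /* 'I' -> U+0131 LATIN SMALL LETTER DOTLESS I */
        return 0x0131;
      if (cp == 0x0130)        /* U+0130 CAPITAL I WITH DOT ABOVE -> 'i' */
        return 0x0069;
    }
  if (cp >= 'A' && cp <= 'Z')
    return cp + ('a' - 'A');
  return cp;
}
```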

db_string_like is the predicate counterpart: it returns the four-valued DB_LOGICAL (V_TRUE / V_FALSE / V_UNKNOWN / V_ERROR) instead of a result string. Its body validates types and codesets, resolves a common collation via LANG_RT_COMMON_COLL, then calls qstr_eval_like, the byte-walking matcher that handles the % / _ wildcards plus the optional ESCAPE character. LIKE is therefore a truth-valued scalar function; the predicate evaluator wraps it into a T_LIKE_EVAL_TERM for filtering.
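A sketch of a byte-wise LIKE matcher with % and _ wildcards and an optional escape character, returning a three-state result so that NULL input yields UNKNOWN rather than FALSE. The like_result enum is hypothetical and omits V_ERROR; the recursive handling of % is simple but can backtrack heavily on adversarial patterns, and it ignores collations entirely.

```c
#include <stddef.h>

/* Hypothetical three of DB_LOGICAL's four values. */
typedef enum { LK_FALSE, LK_TRUE, LK_UNKNOWN } like_result;

/* '%' matches any run, '_' matches one byte, esc (if nonzero) quotes the next pattern byte. */
static int
like_match (const char *s, const char *p, char esc)
{
  while (*p != '\0')
    {
      if (esc != '\0' && *p == esc && p[1] != '\0')
        {
          if (*s != p[1])
            return 0;
          s++;
          p += 2;
        }
      else if (*p == '%')
        {
          while (*p == '%')
            p++;                        /* collapse consecutive '%' */
          if (*p == '\0')
            return 1;                   /* trailing '%' matches the rest */
          for (; *s != '\0'; s++)
            if (like_match (s, p, esc))
              return 1;
          return like_match (s, p, esc);   /* also try the empty suffix */
        }
      else if (*p == '_')
        {
          if (*s == '\0')
            return 0;
          s++;
          p++;
        }
      else
        {
          if (*s != *p)
            return 0;
          s++;
          p++;
        }
    }
  return *s == '\0';
}

static like_result
sql_like (const char *s, const char *pattern, char esc)
{
  if (s == NULL || pattern == NULL)     /* NULL operand -> UNKNOWN, not FALSE */
    return LK_UNKNOWN;
  return like_match (s, pattern, esc) ? LK_TRUE : LK_FALSE;
}
```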

Date/time

Date arithmetic is split between arithmetic.c (SQL-level entries such as db_round_dbval, db_trunc_dbval, db_add_months) and query_opfunc.c (the per-pair helpers qdata_add_int_to_date, qdata_add_int_to_datetime, qdata_add_int_to_utime, qdata_add_short_to_timestamptz, invoked when one operand is a date/time value and the other is numeric). Formatting and parsing go through db_string_to_date, db_string_to_datetime, db_string_to_timestamp and their inverses db_date_format and db_datetime_format. The full date library lives in src/base/db_date.c; the scalar-function layer is its SQL facade.
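DATE + INT reduces to arithmetic on a day count. A sketch using Howard Hinnant's widely used days_from_civil / civil_from_days conversions for the proleptic Gregorian calendar; the 1970-01-01 epoch and the function names are this sketch's choices, not CUBRID's date encoding.

```c
/* Days since 1970-01-01 for a proleptic Gregorian date. */
static long
days_from_civil (long y, unsigned m, unsigned d)
{
  y -= m <= 2;                                   /* years start in March */
  long era = (y >= 0 ? y : y - 399) / 400;
  unsigned yoe = (unsigned) (y - era * 400);     /* year of era, [0, 399] */
  unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + (long) doe - 719468;
}

/* Inverse of days_from_civil. */
static void
civil_from_days (long z, long *y, unsigned *m, unsigned *d)
{
  z += 719468;
  long era = (z >= 0 ? z : z - 146096) / 146097;
  unsigned doe = (unsigned) (z - era * 146097);
  unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
  unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
  unsigned mp = (5 * doy + 2) / 153;
  *d = doy - (153 * mp + 2) / 5 + 1;
  *m = mp < 10 ? mp + 3 : mp - 9;
  *y = (long) yoe + era * 400 + (*m <= 2);
}

/* DATE + n days: convert to a day number, add, convert back. */
static void
date_add_days (long y, unsigned m, unsigned d, long n, long *ry, unsigned *rm, unsigned *rd)
{
  civil_from_days (days_from_civil (y, m, d) + n, ry, rm, rd);
}
```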

JSON

arithmetic.c carries ~25 JSON evaluators (db_evaluate_json_extract, _array, _object, _merge_preserve, _merge_patch, _search, _contains, _contains_path, _keys, _length, _depth, _type_dbval, _valid, _quote, _unquote, _pretty, _array_append, _array_insert, _insert, _replace, _set, _remove, _get_all_paths, plus db_accumulate_json_arrayagg / _objectagg aggregate accumulators). All share the signature (DB_VALUE *result, DB_VALUE * const *args, int num_args) and delegate to src/compat/db_json.cpp (rapidjson wrapper); db_json_path.hpp defines the JSONPath grammar.

Dispatch from qdata_evaluate_function is uniform: every JSON function flows through qdata_convert_operands_to_value_and_call, which fetches the regu-variable list into a DB_VALUE *args[] array, calls the function pointer, and frees the array.
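A sketch of the operands-to-array-and-call pattern, with hypothetical val and arg_node types in place of DB_VALUE and REGU_VARIABLE_LIST. The function-pointer type mirrors the (result, args, num_args) shape described above, and the array is freed whatever the callee returns.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for DB_VALUE and REGU_VARIABLE_LIST. */
typedef struct val { double x; } val;
typedef struct arg_node { val v; struct arg_node *next; } arg_node;

/* Same (result, args, num_args) shape as the evaluator signature above. */
typedef int (*vararg_fn) (val *result, val *const *args, int num_args);

static int
convert_operands_and_call (arg_node *list, val *result, vararg_fn fn)
{
  int n = 0;
  for (arg_node *p = list; p != NULL; p = p->next)
    n++;                                  /* pass 1: count operands */
  val **args = malloc ((size_t) (n > 0 ? n : 1) * sizeof *args);
  if (args == NULL)
    return -1;
  int i = 0;
  for (arg_node *p = list; p != NULL; p = p->next)
    args[i++] = &p->v;                    /* pass 2: collect (a real evaluator fetches here) */
  int rc = fn (result, args, n);          /* pass 3: call through the pointer */
  free (args);                            /* released whatever fn returned */
  return rc;
}

/* Example evaluator: sums its arguments. */
static int
sum_fn (val *result, val *const *args, int num_args)
{
  result->x = 0.0;
  for (int i = 0; i < num_args; i++)
    result->x += args[i]->x;
  return 0;
}
```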

Regex

The regex façade in string_regex.cpp exposes six operations (compile, search, count, instr, replace, substr). Each one converts its std::string arguments to UTF-8 (regardless of the original codeset), switches on compiled_regex::type, and forwards to either the RE2 backend (re2_* functions in string_regex_re2.cpp) or the C++ std::regex backend (std_* in string_regex_std.cpp). Backend selection happens when the pattern is compiled at query run time (not when CUBRID itself is built), driven by the system parameter PRM_ID_REGEXP_ENGINE:

// cubregex::compile — src/query/string_regex.cpp
int compile (REFPTR (compiled_regex, cr), const std::string &pattern_string,
             const std::string &opt_str, const LANG_COLLATION *collation)
{
  opt_flag_type opt_flag = parse_match_type (opt_str);   /* 'c' / 'i' */
  engine_type type = static_cast<engine_type> (prm_get_integer_value (PRM_ID_REGEXP_ENGINE));
  std::string utf8_pattern;   /* pattern re-encoded as UTF-8 */
  cublocale::convert_string_to_utf8 (utf8_pattern, pattern_string, collation->codeset);
  if (should_compile_skip (cr, utf8_pattern, type, opt_flag, collation->codeset)) return NO_ERROR;
  return compile_regex_internal (cr, utf8_pattern, type, opt_flag, collation);
}

The “compile-skip” path is the per-row hot path: when the pattern is constant across rows, the first row pays compile cost and every subsequent row gets a cached cub_compiled_regex * held in function_p->tmp_obj (or arithptr->aux for the operator-shape RLIKE).
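A sketch of compile-skip caching, using POSIX regcomp as the compiler and a hypothetical regex_cache in the role of function_p->tmp_obj. The cached program is reused while the pattern and flags stay the same and rebuilt only when they change; compile_count exists only to make the behaviour observable.

```c
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-call-site cache. */
typedef struct {
  bool valid;
  char pattern[128];
  int cflags;
  regex_t re;
  int compile_count;   /* instrumentation for the example */
} regex_cache;

/* Reuse the cached program when pattern and flags match; recompile otherwise. */
static regex_t *
cache_get (regex_cache *c, const char *pattern, int cflags)
{
  if (c->valid && c->cflags == cflags && strcmp (c->pattern, pattern) == 0)
    return &c->re;                      /* compile-skip: the per-row hot path */
  if (strlen (pattern) >= sizeof c->pattern)
    return NULL;
  if (c->valid)
    {
      regfree (&c->re);                 /* pattern changed: drop the old program */
      c->valid = false;
    }
  if (regcomp (&c->re, pattern, cflags) != 0)
    return NULL;
  strcpy (c->pattern, pattern);
  c->cflags = cflags;
  c->valid = true;
  c->compile_count++;
  return &c->re;
}

static void
cache_release (regex_cache *c)
{
  if (c->valid)
    regfree (&c->re);
  c->valid = false;
}
```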

The two backends share little. RE2’s API is method-based (RE2::PartialMatch, RE2::FindAndConsume) and works directly on UTF-8 byte strings; std::regex is iterator-based (std::regex_iterator over std::wstring) and requires UTF-8 → wide → UTF-8 round-trips. RE2 supports neither lookarounds nor backreferences but runs in linear time; std::regex (ECMAScript grammar) supports backreferences and lookahead, though not lookbehind, at potentially exponential cost.
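The UTF-8 normalisation step is simple for single-byte source codesets. A sketch for ISO-8859-1 input (the function name is hypothetical): each Latin-1 byte equals its Unicode code point, so bytes below 0x80 are copied and the rest become two-byte sequences.

```c
#include <stddef.h>

/* Returns bytes written (excluding the terminator), or -1 if out is too small. */
static long
latin1_to_utf8 (const unsigned char *in, size_t in_len, char *out, size_t cap)
{
  size_t o = 0;
  for (size_t i = 0; i < in_len; i++)
    {
      unsigned char c = in[i];
      size_t need = c < 0x80 ? 1 : 2;
      if (o + need + 1 > cap)           /* keep room for the terminator */
        return -1;
      if (c < 0x80)
        out[o++] = (char) c;
      else
        {
          out[o++] = (char) (0xC0 | (c >> 6));    /* 110xxxxx */
          out[o++] = (char) (0x80 | (c & 0x3F));  /* 10xxxxxx */
        }
    }
  out[o] = '\0';
  return (long) o;
}
```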

stateDiagram-v2
    [*] --> NotCompiled
    NotCompiled --> Compiling: cubregex::compile()
    Compiling --> ChooseEngine: parse opt_str, read PRM_ID_REGEXP_ENGINE
    ChooseEngine --> RE2: type == LIB_RE2
    ChooseEngine --> Std: type == LIB_CPPSTD
    RE2 --> Compiled: re2_compile() ok
    Std --> Compiled: std_compile() ok
    Compiled --> Searching: search/count/instr/replace/substr
    Searching --> Compiled: still cached, pattern unchanged
    Compiled --> Recompile: pattern or flags changed
    Recompile --> Compiling
    Compiled --> [*]: ~compiled_regex

Figure 3 — cubregex compile-and-cache state machine. ChooseEngine selects RE2 or std::regex based on PRM_ID_REGEXP_ENGINE; once Compiled, a pattern stays cached in function_p->tmp_obj across rows and only re-enters Compiling when the pattern or flags change.

Crypto

crypt_opfunc.c wraps OpenSSL in three families.

- Hashing: crypt_sha_one (SHA-1), crypt_sha_two (SHA-224/256/384/512 selected by hash length), crypt_md5_buffer_hex / crypt_md5_buffer_binary (MD5). All call EVP_DigestInit/Update/Final; hex encoding is in-house via str_to_hex_prealloced.
- Encryption: crypt_default_encrypt / crypt_default_decrypt accept a CIPHER_ENCRYPTION_TYPE (AES_128_ECB / DES_ECB) with PKCS7 padding; aes_default_gen_key XOR-folds an arbitrary-length key into 16 bytes (matching MySQL’s AES_ENCRYPT); EVP_CIPHER_CTX is RAII-wrapped in a deleted_unique_ptr.
- Random / CRC: crypt_generate_random_bytes (OpenSSL RAND_bytes), crypt_crc32; the dblink helpers (crypt_dblink_encrypt, shake_dblink_password) obfuscate cross-server-link passwords with a fixed 16-byte transport key.
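A sketch of the two byte-level conventions mentioned above: XOR-folding an arbitrary-length key into 16 bytes (the folding rule the text attributes to aes_default_gen_key) and PKCS#7 padding to the 16-byte AES block. Names are hypothetical, and the cipher itself (OpenSSL's EVP layer) is left out.

```c
#include <stddef.h>
#include <string.h>

#define AES_KEY_BYTES 16
#define AES_BLOCK_BYTES 16

/* Fold an arbitrary-length key into 16 bytes by XOR, cycling through the output. */
static void
xor_fold_key (const unsigned char *key, size_t key_len, unsigned char out[AES_KEY_BYTES])
{
  memset (out, 0, AES_KEY_BYTES);
  for (size_t i = 0; i < key_len; i++)
    out[i % AES_KEY_BYTES] ^= key[i];
}

/* PKCS#7: append n copies of the byte n (1 <= n <= 16) so the length becomes a
   multiple of 16. Returns the padded length, or 0 if cap is too small. */
static size_t
pkcs7_pad (unsigned char *buf, size_t len, size_t cap)
{
  size_t n = AES_BLOCK_BYTES - (len % AES_BLOCK_BYTES);
  if (len + n > cap)
    return 0;
  memset (buf + len, (int) n, n);
  return len + n;
}
```

A block-aligned input still gains a full block of padding, so the decryptor can always read the last byte to learn how much to strip.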

The SQL surface (db_string_md5, _sha_one, _sha_two, _aes_encrypt, _aes_decrypt, _des_encrypt) lives in string_opfunc.c as thin wrappers that handle NULL, type-check QSTR_IS_ANY_CHAR, manage buffer ownership (db_private_alloc + need_clear=true), and forward bytes into the crypt_* kernels.

Type-system integration

The compile-time face is parser/type_checking.c and parser/func_type.cpp. Function-shaped operators (F_INSERT_SUBSTRING, JSON, regex, …) are typed by pt_eval_function_type, which (for codes that opt in via pt_is_function_new_type_checking) constructs a func_type::Node and calls its type_checking () method. func_type.cpp carries a per-FUNC_CODE signature table — sig_of_insert_substring, sig_of_elt, sig_of_benchmark, sig_ret_json_arg_jdoc_r_jpath, sig_of_regexp_count, … — expressing argument types via a pseudo-typed enum (PT_GENERIC_TYPE_JSON_DOC, PT_GENERIC_TYPE_STRING) and a fixed-args / repeating-args partition (so F_JSON_OBJECT’s (key, value) repeating pair is one signature). The matcher scores candidates by cast cost and either picks the best match or returns an error.
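A sketch of cast-cost overload resolution, assuming a hypothetical four-type system and an invented cost matrix (-1 means no implicit cast). It handles fixed arguments only, with no repeating-argument tail, and picks the applicable signature with the lowest total cost.

```c
#include <limits.h>

/* Hypothetical argument types and cost matrix: cast_cost[from][to]. */
typedef enum { ST_INT, ST_DOUBLE, ST_STRING, ST_JSON, ST_COUNT } sig_type;

static const int cast_cost[ST_COUNT][ST_COUNT] = {
  /*             INT DOUBLE STRING JSON */
  /* INT    */ {  0,   1,    3,    -1 },
  /* DOUBLE */ { -1,   0,    3,    -1 },
  /* STRING */ {  4,   2,    0,     2 },
  /* JSON   */ { -1,  -1,    1,     0 },
};

typedef struct { int nargs; sig_type args[4]; } signature;

/* Index of the cheapest applicable signature, or -1 if none applies. */
static int
best_signature (const signature *sigs, int nsigs, const sig_type *actual, int nactual)
{
  int best = -1, best_cost = INT_MAX;
  for (int s = 0; s < nsigs; s++)
    {
      if (sigs[s].nargs != nactual)
        continue;
      int total = 0;
      for (int i = 0; i < nactual && total >= 0; i++)
        {
          int c = cast_cost[actual[i]][sigs[s].args[i]];
          total = c < 0 ? -1 : total + c;   /* -1 marks the signature inapplicable */
        }
      if (total >= 0 && total < best_cost)
        {
          best = s;
          best_cost = total;
        }
    }
  return best;
}
```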

Arithmetic-shaped operators (PT_ADD, PT_SUBSTRING, PT_LIKE, …) are typed by pt_eval_expr_type, which calls pt_apply_expressions_definition, an older table-driven overload resolver that works in much the same way. Both lanes share the eventual coercion infrastructure (pt_coerce_expression_argument, tp_value_auto_cast). Once typed, xasl_generation.c packs the operator tree into ARITH_TYPE (operators) and FUNCTION_TYPE (functions) nodes wrapped in a REGU_VARIABLE of TYPE_INARITH or TYPE_FUNC — the runtime dispatch keys off those tags.

Domain coercion at evaluate time

Compile-time coercions are advisory: the runtime cannot trust the optimizer’s types because host variables may bind differently, sub-query results may have DB_TYPE_VARIABLE resolving only at execute time, and T_DEFAULT / NULL substitution can change operand kind. Every qdata_*_dbval umbrella runs its own coercion pass via tp_value_auto_cast. For string ops, the parallel is tp_value_str_auto_cast_to_number — called by db_floor_dbval / db_round_dbval so that FLOOR('3.7') produces 3.
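The "coerce again at evaluate time" pattern can be sketched with a hypothetical tagged value standing in for DB_VALUE and an auto_cast_to_number helper that plays the role the text gives tp_value_str_auto_cast_to_number. It is an illustration under those assumptions, not CUBRID's code. floor is hand-rolled only to keep the sketch free of a libm link dependency.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical tagged value standing in for DB_VALUE. */
typedef enum { V_DOUBLE, V_STRING } vtype;
typedef struct { vtype type; double d; const char *s; } value;

/* Runtime re-coercion: a string operand is parsed into a number here even
 * if the compile-time typer expected a numeric argument. */
static int
auto_cast_to_number (const value *in, value *out)
{
  if (in->type == V_DOUBLE)
    {
      *out = *in;
      return 0;
    }
  char *end;
  errno = 0;
  double d = strtod (in->s, &end);
  if (end == in->s || *end != '\0' || errno == ERANGE)
    return -1;                     /* not a clean numeric literal */
  out->type = V_DOUBLE;
  out->d = d;
  out->s = NULL;
  return 0;
}

/* floor() without libm; assumes |d| < 2^63 so the cast is defined. */
static double
floor_no_libm (double d)
{
  long long t = (long long) d;     /* truncates toward zero */
  if ((double) t > d)
    t -= 1;                        /* negative non-integers round down */
  return (double) t;
}

/* FLOOR sketch: coerce first, then apply the numeric kernel. */
static int
floor_value (const value *arg, value *result)
{
  value num;
  if (auto_cast_to_number (arg, &num) != 0)
    return -1;
  result->type = V_DOUBLE;
  result->d = floor_no_libm (num.d);
  result->s = NULL;
  return 0;
}
```

With this sketch, FLOOR('3.7') yields 3 and FLOOR('-3.7') yields -4. A non-numeric string returns an error here, and CUBRID's exact behaviour for that case is not implied.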

Source Walkthrough

Arithmetic dispatchers and umbrellas (`query_opfunc.c`)

Dispatcher: qdata_evaluate_function. Binary umbrellas: qdata_add_dbval, qdata_subtract_dbval, qdata_multiply_dbval, qdata_divide_dbval. Unary: qdata_unary_minus_dbval. Concat: qdata_strcat_dbval. Per-LHS-type fan-out: qdata_add_short_to_dbval / _int_to_dbval / _bigint_to_dbval / _float_to_dbval / _double_to_dbval / _numeric_to_dbval / _monetary_to_dbval / _date_to_dbval / _datetime_to_dbval and parallel families for qdata_subtract_*, qdata_multiply_*, qdata_divide_*. Monomorphic kernels with overflow checks: qdata_add_short, qdata_add_int, qdata_add_bigint, qdata_add_float, qdata_add_double, qdata_add_numeric, qdata_add_monetary. Date/time helpers: qdata_add_int_to_utime, qdata_add_int_to_date, qdata_add_int_to_datetime. Bitwise: qdata_bit_and_dbval, qdata_bit_or_dbval, qdata_bit_xor_dbval, qdata_bit_not_dbval, qdata_bit_shift_dbval. Integer divmod: qdata_divmod_dbval. Result coercion: qdata_coerce_result_to_domain. Function-shape fan-ins: qdata_regexp_function, qdata_convert_operands_to_value_and_call, qdata_convert_dbvals_to_set, qdata_convert_table_to_set, qdata_insert_substring_function, qdata_elt, qdata_benchmark, qdata_evaluate_generic_function, qdata_get_class_of_function.
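The umbrella → per-LHS fan-out → monomorphic kernel layering can be sketched in miniature with two integer types. The names add_dbval, add_int, and add_bigint are hypothetical analogues of qdata_add_dbval, qdata_add_int, and qdata_add_bigint. The sketch treats overflow as an error rather than promoting, which is an assumption of the sketch.

```c
#include <limits.h>

/* Hypothetical stand-in for DB_VALUE restricted to two integer types. */
typedef enum { DT_INT, DT_BIGINT } dtype;
typedef struct { dtype type; long long v; } dval;

enum { OK = 0, ERR_OVERFLOW = -1, ERR_TYPE = -2 };

/* Monomorphic kernel: 32-bit add with an explicit pre-check for overflow. */
static int
add_int (int a, int b, int *out)
{
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    return ERR_OVERFLOW;
  *out = a + b;
  return OK;
}

static int
add_bigint (long long a, long long b, long long *out)
{
  if ((b > 0 && a > LLONG_MAX - b) || (b < 0 && a < LLONG_MIN - b))
    return ERR_OVERFLOW;
  *out = a + b;
  return OK;
}

/* Umbrella: switch on the left type, then branch on the right type. */
static int
add_dbval (const dval *l, const dval *r, dval *res)
{
  switch (l->type)
    {
    case DT_INT:
      if (r->type == DT_INT)
        {
          int sum;
          int rc = add_int ((int) l->v, (int) r->v, &sum);
          if (rc == OK)
            {
              res->type = DT_INT;
              res->v = sum;
            }
          return rc;
        }
      /* INT + BIGINT: promote to the wider type. */
      /* FALLTHRU */
    case DT_BIGINT:
      {
        long long sum;
        int rc = add_bigint (l->v, r->v, &sum);
        if (rc == OK)
          {
            res->type = DT_BIGINT;
            res->v = sum;
          }
        return rc;
      }
    }
  return ERR_TYPE;
}
```

The overflow test runs before the addition because signed overflow is undefined behaviour in C. Checking the result after the add would be too late.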

Arithmetic primitives (`arithmetic.c`)

Math: db_floor_dbval, db_ceil_dbval, db_sign_dbval, db_abs_dbval, db_exp_dbval, db_sqrt_dbval, db_power_dbval. MOD: db_mod_dbval plus per-type db_mod_short/_int/_bigint/_float/_double/_string/_numeric/_monetary. Rounding: db_round_dbval, db_trunc_dbval, round_double, round_date, truncate_double, truncate_bigint, truncate_date. Logs: db_log_dbval, db_log_generic_dbval. RNG: db_random_dbval, db_drandom_dbval. Trig: db_sin_dbval, db_cos_dbval, db_tan_dbval, db_cot_dbval, db_acos_dbval, db_asin_dbval, db_atan_dbval. Angles: db_degrees_dbval, db_radians_dbval. Misc: db_bit_count_dbval, db_typeof_dbval, db_width_bucket, db_sleep, db_least_or_greatest. JSON family (~25): db_evaluate_json_extract, db_evaluate_json_array, db_evaluate_json_object, db_evaluate_json_search, db_evaluate_json_merge_preserve, db_evaluate_json_merge_patch, db_evaluate_json_contains, db_evaluate_json_contains_path, db_evaluate_json_keys, db_evaluate_json_length, db_evaluate_json_depth, db_evaluate_json_type_dbval, db_evaluate_json_valid, db_evaluate_json_quote, db_evaluate_json_unquote, db_evaluate_json_pretty, db_evaluate_json_array_append, db_evaluate_json_array_insert, db_evaluate_json_insert, db_evaluate_json_replace, db_evaluate_json_set, db_evaluate_json_remove, db_evaluate_json_get_all_paths, db_accumulate_json_arrayagg, db_accumulate_json_objectagg.
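Most entries in this list are thin numeric wrappers, but db_width_bucket carries a small algorithm. The sketch below follows the SQL-standard WIDTH_BUCKET(expr, low, high, count) definition for an ascending range. It is not derived from CUBRID's code, and it simply rejects inputs it does not model, such as descending ranges or a non-positive count.

```c
/* SQL-standard WIDTH_BUCKET for low < high: bucket 0 is the underflow
 * bucket, count + 1 the overflow bucket, 1..count equal-width buckets. */
static int
width_bucket (double expr, double low, double high, int count)
{
  if (count <= 0 || !(low < high))
    return -1;                     /* sketch: reject what it does not model */
  if (expr < low)
    return 0;
  if (expr >= high)
    return count + 1;
  double pos = (expr - low) * count / (high - low);   /* nominally in [0, count) */
  int b = (int) pos + 1;           /* pos >= 0, so the cast acts as floor */
  return b > count ? count : b;    /* guard against rounding at the top edge */
}
```

For example, WIDTH_BUCKET(5.35, 0.024, 10.06, 5) lands in bucket 3, because (5.35 − 0.024) · 5 / 10.036 ≈ 2.65.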

Numeric (BCD) (`numeric_opfunc.c`)

Kernels: numeric_add, numeric_sub, numeric_mul, numeric_long_div, numeric_div. Byte ops: numeric_negate, numeric_negate_long, numeric_copy, numeric_copy_long, numeric_increase, numeric_decrease, numeric_zero, numeric_shift_byte, numeric_double_shift_bit. Comparators: numeric_compare, numeric_compare_pos. Predicates: numeric_is_negative, numeric_is_zero, numeric_is_long, numeric_is_bigint, numeric_overflow, numeric_is_longnum_value. Scaling: numeric_scale_by_ten, numeric_scale_dec, numeric_common_prec_scale, numeric_prec_scale_when_overflow. Public entry points: numeric_db_value_add, numeric_db_value_sub, numeric_db_value_mul, numeric_db_value_div, numeric_db_value_negate, numeric_db_value_abs, numeric_db_value_compare, numeric_db_value_is_zero, numeric_db_value_increase, numeric_db_value_print. Conversions: numeric_coerce_int_to_num, _bigint_to_num, _double_to_num, _string_to_num, _num_to_int, _num_to_bigint, _num_to_double, _num_to_dec_str, _num_to_num, _dec_str_to_num. DB_VALUE conversion: numeric_db_value_coerce_to_num, _coerce_from_num, _coerce_from_num_strict. Float→BCD: numeric_internal_real_to_num, numeric_fast_convert. Integral/fractional split: numeric_get_integral_part, numeric_get_fractional_part, numeric_is_fraction_part_zero.
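The byte-level helpers listed above (numeric_negate, numeric_shift_byte, numeric_double_shift_bit) operate on a fixed-width byte array. The sketch below shows the carry-propagating addition such a kernel performs, in the style of Knuth's multi-precision Algorithm A. The layout it uses, a 16-byte big-endian two's-complement array, is an assumption of this sketch, and the function names are hypothetical.

```c
/* Assumed layout: 16-byte big-endian two's complement (index 0 = most
 * significant byte). Function names are hypothetical. */
enum { NUM_BYTES = 16 };

/* Add byte by byte from least to most significant; returns the carry out
 * of the top byte. */
static int
bytes_add (const unsigned char a[NUM_BYTES], const unsigned char b[NUM_BYTES],
           unsigned char out[NUM_BYTES])
{
  unsigned int carry = 0;
  for (int i = NUM_BYTES - 1; i >= 0; i--)
    {
      unsigned int sum = (unsigned int) a[i] + b[i] + carry;
      out[i] = (unsigned char) (sum & 0xFF);
      carry = sum >> 8;
    }
  return (int) carry;
}

/* Two's-complement overflow: both inputs share a sign bit the result lacks.
 * Requires that out does not alias a or b. */
static int
bytes_add_overflowed (const unsigned char a[NUM_BYTES], const unsigned char b[NUM_BYTES],
                      const unsigned char out[NUM_BYTES])
{
  return ((a[0] ^ out[0]) & (b[0] ^ out[0]) & 0x80) != 0;
}
```

The carry out of the top byte does not by itself indicate signed overflow. For example, −1 + 1 produces a carry of 1 but a correct result of 0, which is why the sign-bit test is separate.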

String (`string_opfunc.c`)

Concat: db_string_concatenate, qstr_concatenate. Substring: db_string_substring, qstr_substring, qstr_bit_substring, db_string_substring_index. Repeat/space: db_string_repeat, db_string_space. Trim: db_string_trim, qstr_trim. Padding: db_string_lpad, db_string_rpad. Case: db_string_lower, db_string_upper. Replace: db_string_replace, db_string_translate. Misc: db_string_reverse, db_string_quote, db_string_chr, db_string_escape_str, db_string_escape_char. Search: db_string_instr, db_string_position, db_find_string_in_in_set. LIKE: db_string_like, qstr_eval_like. Regex: db_string_rlike, db_string_regexp_count, _regexp_instr, _regexp_like, _regexp_replace, _regexp_substr. Hashing: db_string_md5, db_string_sha_one, db_string_sha_two. Encryption: db_string_aes_encrypt, db_string_aes_decrypt, db_string_des_encrypt. Insert: db_string_insert_substring. Length: db_string_char_length, db_string_bit_length. Helpers: qstr_make_typed_string, qstr_get_category, qstr_compare, qstr_compare_with_collation.
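To make LIKE's semantics concrete, here is a minimal wildcard matcher: '%' matches any run of characters, including an empty one, and '_' matches exactly one. It is a hypothetical simplification, not qstr_eval_like. It has no ESCAPE clause, no collation, and works on single bytes rather than multibyte characters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Iterative matcher with one backtrack point: on a mismatch after a '%',
 * retry with that '%' absorbing one more character of s. */
static bool
like_match (const char *s, const char *p)
{
  const char *star_p = NULL, *star_s = NULL;
  while (*s)
    {
      if (*p == '_' || (*p != '%' && *p == *s))
        {
          s++;                     /* literal or '_' consumes one char */
          p++;
        }
      else if (*p == '%')
        {
          star_p = p++;            /* remember the '%' and where it started */
          star_s = s;
        }
      else if (star_p)
        {
          p = star_p + 1;          /* backtrack: let '%' eat one more char */
          s = ++star_s;
        }
      else
        return false;
    }
  while (*p == '%')
    p++;                           /* trailing '%' can match the empty string */
  return *p == '\0';
}
```

Keeping only the most recent '%' as a backtrack point bounds the work at O(|s| · |p|). A naive recursive matcher can go exponential on patterns with many '%' wildcards.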

Cryptography (`crypt_opfunc.c`)

SHA: crypt_sha_one, crypt_sha_two, crypt_sha_functions. MD5: crypt_md5_buffer_hex, crypt_md5_buffer_binary. AES/DES: crypt_default_encrypt, crypt_default_decrypt, aes_default_gen_key. Random: crypt_generate_random_bytes. CRC: crypt_crc32. Hex: str_to_hex, str_to_hex_prealloced. dblink: crypt_dblink_encrypt, _decrypt, shake_dblink_password, reverse_shake_dblink_password, crypt_dblink_bin_to_str, _str_to_bin.
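The "_prealloced" suffix on str_to_hex_prealloced suggests that the caller supplies the output buffer. The sketch below follows that convention, with two characters per input byte plus a terminator. The name bytes_to_hex is hypothetical, and the choice of lowercase digits is an assumption rather than a statement about CUBRID's output.

```c
#include <stddef.h>

/* Hex-encode len bytes into out, which must hold 2 * len + 1 chars.
 * Lowercase digits are an assumption of this sketch. */
static void
bytes_to_hex (const unsigned char *in, size_t len, char *out)
{
  static const char digits[] = "0123456789abcdef";
  for (size_t i = 0; i < len; i++)
    {
      out[2 * i] = digits[in[i] >> 4];          /* high nibble */
      out[2 * i + 1] = digits[in[i] & 0x0F];    /* low nibble */
    }
  out[2 * len] = '\0';
}
```

Under this convention, a 32-byte SHA-256 digest needs a 65-byte buffer to hold its 64 hex characters and the terminator.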

Regex (`string_regex*.cpp`)

Façade (string_regex.cpp): cubregex::compile, search, count, instr, replace, substr, plus internal compile_regex_internal, should_compile_skip, parse_match_type, and compiled_regex::~compiled_regex. RE2 backend (string_regex_re2.cpp): re2_compile, re2_search, re2_count, re2_instr, re2_replace, re2_substr, re2_on_match (template helper), re2_split_string_utf8, re2_distance_utf8. std::regex backend (string_regex_std.cpp): std_compile, std_search, std_count, std_instr, std_replace, std_substr, std_parse_regex_exception, std_parse_match_type, plus cub_reg_traits (collation traits subclass that throws on [[. .]] and patches iswblank portability).
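The compile-and-cache rule from Figure 3 reduces to one check: reuse the compiled pattern only while both the pattern text and the flags are unchanged. The sketch below uses POSIX &lt;regex.h&gt; as a stand-in for RE2 and std::regex. The regex_cache struct and its functions are hypothetical analogues of compiled_regex and should_compile_skip.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-call-site cache (cf. function_p->tmp_obj->compiled_regex). */
typedef struct
{
  char *pattern;      /* pattern text the cached regex was compiled from */
  int cflags;         /* compile flags used for that compilation */
  int valid;          /* nonzero once re holds a compiled pattern */
  regex_t re;         /* the compiled pattern itself */
  int compiles;       /* counts compilations, to make reuse observable */
} regex_cache;

static void
cache_clear (regex_cache *c)
{
  if (c->valid)
    {
      regfree (&c->re);
      free (c->pattern);
      c->valid = 0;
    }
}

/* Reuse is allowed only if both the pattern text and the flags match. */
static int
cache_should_skip_compile (const regex_cache *c, const char *pattern, int cflags)
{
  return c->valid && c->cflags == cflags && strcmp (c->pattern, pattern) == 0;
}

/* Return a compiled regex for pattern, compiling only on a cache miss. */
static int
cache_get (regex_cache *c, const char *pattern, int cflags, regex_t **out)
{
  if (!cache_should_skip_compile (c, pattern, cflags))
    {
      cache_clear (c);                     /* drop the stale entry first */
      if (regcomp (&c->re, pattern, cflags) != 0)
        return -1;
      size_t n = strlen (pattern) + 1;
      c->pattern = malloc (n);
      if (c->pattern == NULL)
        {
          regfree (&c->re);
          return -1;
        }
      memcpy (c->pattern, pattern, n);
      c->cflags = cflags;
      c->valid = 1;
      c->compiles++;
    }
  *out = &c->re;
  return 0;
}
```

For a constant pattern, every row after the first is a cache hit. A pattern that comes from a column forces a recompile whenever consecutive rows differ.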

Type-system glue

parser/type_checking.c: pt_eval_expr_type (PT_EXPR operator typer), pt_apply_expressions_definition (table-driven overload resolver), pt_eval_function_type (PT_FUNCTION typer), pt_evaluate_function_w_args (constant folding), pt_coerce_expression_argument (PT_CAST insertion).

parser/func_type.cpp: func_signature::get (per-FUNC_CODE signature-table lookup), func_type::Node::type_checking (new function-typing entry), sig_has_json_args, is_type_with_collation, can_signature_have_collation, plus the sig_of_* table.

compat/db_function.hpp: FUNC_CODE enum, fcode_get_lowercase_name, fcode_get_uppercase_name.

Position hints (as of `updated:`)

Symbol	File	Line
`qdata_evaluate_function`	`src/query/query_opfunc.c`	6875
`qdata_add_dbval`	`src/query/query_opfunc.c`	2436
`qdata_subtract_dbval`	`src/query/query_opfunc.c`	4522
`qdata_multiply_dbval`	`src/query/query_opfunc.c`	5225
`qdata_divide_dbval`	`src/query/query_opfunc.c`	5829
`qdata_unary_minus_dbval`	`src/query/query_opfunc.c`	5986
`qdata_add_short_to_dbval`	`src/query/query_opfunc.c`	1701
`qdata_add_int_to_dbval`	`src/query/query_opfunc.c`	1792
`qdata_strcat_dbval`	`src/query/query_opfunc.c`	6111
`qdata_regexp_function`	`src/query/query_opfunc.c`	8627
`qdata_convert_operands_to_value_and_call`	`src/query/query_opfunc.c`	8712
`fetch_peek_arith`	`src/query/fetch.c`	84
`db_floor_dbval`	`src/query/arithmetic.c`	89
`db_abs_dbval`	`src/query/arithmetic.c`	563
`db_power_dbval`	`src/query/arithmetic.c`	832
`db_mod_dbval`	`src/query/arithmetic.c`	1910
`db_round_dbval`	`src/query/arithmetic.c`	2295
`db_trunc_dbval`	`src/query/arithmetic.c`	3289
`db_evaluate_json_extract`	`src/query/arithmetic.c`	5485
`db_evaluate_json_search`	`src/query/arithmetic.c`	6242
`numeric_add`	`src/query/numeric_opfunc.c`	822
`numeric_mul`	`src/query/numeric_opfunc.c`	871
`numeric_long_div`	`src/query/numeric_opfunc.c`	953
`numeric_db_value_add`	`src/query/numeric_opfunc.c`	1596
`numeric_db_value_mul`	`src/query/numeric_opfunc.c`	1797
`numeric_db_value_div`	`src/query/numeric_opfunc.c`	1875
`db_string_concatenate`	`src/query/string_opfunc.c`	942
`db_string_substring`	`src/query/string_opfunc.c`	1727
`db_string_sha_one`	`src/query/string_opfunc.c`	2397
`db_string_sha_two`	`src/query/string_opfunc.c`	2462
`db_string_md5`	`src/query/string_opfunc.c`	2693
`db_string_insert_substring`	`src/query/string_opfunc.c`	2763
`db_string_lower`	`src/query/string_opfunc.c`	3321
`db_string_trim`	`src/query/string_opfunc.c`	3504
`db_string_like`	`src/query/string_opfunc.c`	4214
`db_string_rlike`	`src/query/string_opfunc.c`	4371
`db_string_replace`	`src/query/string_opfunc.c`	6031
`cubregex::compile`	`src/query/string_regex.cpp`	91
`cubregex::search`	`src/query/string_regex.cpp`	120
`re2_compile`	`src/query/string_regex_re2.cpp`	56
`std_compile`	`src/query/string_regex_std.cpp`	96
`crypt_default_encrypt`	`src/query/crypt_opfunc.c`	216
`crypt_sha_one`	`src/query/crypt_opfunc.c`	477
`crypt_sha_two`	`src/query/crypt_opfunc.c`	494
`pt_eval_function_type`	`src/parser/type_checking.c`	12285
`pt_apply_expressions_definition`	`src/parser/type_checking.c`	5778
`FUNC_CODE` enum	`src/compat/db_function.hpp`	26

Cross-check Notes

Caller is the regu evaluator (cubrid-query-evaluator.md). That doc describes how fetch_peek_dbval discriminates on REGU_VARIABLE::type; the two arms relevant here are TYPE_INARITH → fetch_peek_arith and TYPE_FUNC → qdata_evaluate_function. Both exit into this library. Contract: caller hands the callee a domain (TP_DOMAIN *) and a result DB_VALUE *; the callee returns NO_ERROR and writes the result, or returns an error code. Peek-vs-copy discipline is on the caller side — every db_* and qdata_* primitive returns a freshly-allocated value (with need_clear=true) for string/bit/JSON results, and the caller clears via pr_clear_value(funcp->value) (which qdata_evaluate_function does at the top).

Type resolution is in semantic-check (cubrid-semantic-check.md). That doc describes how pt_eval_expr_type and pt_eval_function_type pick an overload from per-operator signature tables and insert PT_CAST nodes. The runtime is robust against a missed cast — the per-pair kernel discriminates on DB_VALUE_DOMAIN_TYPE directly and re-coerces via tp_value_auto_cast. The asymmetry is intentional: it keeps the runtime independent of whether the parser succeeded fully (for system-internal queries that bypass semantic check).

qdata_evaluate_function and fetch_peek_dbval must be kept in sync. The comment /* should sync with fetch_peek_dbval () */ is load-bearing. Every new FUNC_CODE requires a parallel arm in both routines: the regu-variable peek path goes through fetch_peek_dbval’s function-type arm; the explicit XASL-execution path goes through qdata_evaluate_function. Drift shows up as functions that work in some query shapes but fail in others.

numeric_db_value_* is a closed system. The four arithmetic entry points (numeric_db_value_add, _sub, _mul, _div) require both operands to be DB_TYPE_NUMERIC. Cross-type cases are handled outside: qdata_add_numeric_to_dbval does the INT → NUMERIC cast before calling numeric_db_value_add. Apart from its conversion helpers (numeric_coerce_*, numeric_db_value_coerce_*), numeric_opfunc.c has no awareness of any type other than DB_TYPE_NUMERIC.

Operator-shape and function-shape regex calls cache compiled patterns in different places. Operator-shape RLIKE/REGEXP (db_string_rlike) stores the compiled pattern in arithptr->aux; function-shape F_REGEXP_* stores it in function_p->tmp_obj->compiled_regex. Both honour should_compile_skip for constant-pattern reuse.

JSON dispatchers are all “convert and call”. The 25 JSON arms forward through qdata_convert_operands_to_value_and_call with the fixed (DB_VALUE *result, DB_VALUE * const *args, int num_args) inner signature. db_evaluate_json_* are ordinary C functions with no XASL awareness; they live in arithmetic.c only because the file historically grew JSON support. A move to a dedicated json_opfunc.c would not change dispatch.
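The fixed inner signature is what makes "convert and call" generic: the dispatcher gathers evaluated operands into an array and hands them to any kernel of that shape. In the sketch below, value is a hypothetical DB_VALUE stand-in, demo_sum is a hypothetical kernel with invented NULL semantics, and convert_and_call is a simplified analogue of qdata_convert_operands_to_value_and_call.

```c
#include <stddef.h>

/* Hypothetical stand-in for DB_VALUE. */
typedef struct { int is_null; long long v; } value;

/* The fixed kernel shape: (result, args, num_args), no plan awareness. */
typedef int (*scalar_fn) (value *result, value *const *args, int num_args);

/* Example kernel: sums its arguments; any NULL argument yields NULL. */
static int
demo_sum (value *result, value *const *args, int num_args)
{
  long long acc = 0;
  for (int i = 0; i < num_args; i++)
    {
      if (args[i]->is_null)
        {
          result->is_null = 1;
          return 0;
        }
      acc += args[i]->v;
    }
  result->is_null = 0;
  result->v = acc;
  return 0;
}

enum { MAX_ARGS = 8 };

/* Dispatcher half: collect operand pointers, then call the kernel. A real
 * dispatcher would evaluate each operand at this point. */
static int
convert_and_call (value *operands, int n, scalar_fn fn, value *result)
{
  value *args[MAX_ARGS];
  if (n > MAX_ARGS)
    return -1;
  for (int i = 0; i < n; i++)
    args[i] = &operands[i];
  return fn (result, args, n);
}
```

Because the kernel only sees an array of values, moving it to another source file changes nothing about how it is dispatched.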

Open Questions

Vectorised execution. The per-pair kernels are scalar — one pair per call. Postgres has experimented with batch interpretation; CUBRID has not. The two-stage switch (fetch_peek_arith plus the per-pair kernel) is already small, but column-store-style L1-locality wins are unavailable to CUBRID’s row-at-a-time iterator.

JIT for hot functions. PostgreSQL's LLVM-backed expression JIT (jit_compile_expr, implemented by llvm_compile_expr) has been reported to save 30%+ on TPC-H. CUBRID does only eval_fnc-style operator specialisation. The flat XASL representation is well-suited to JIT (every regu-variable is small and fully typed); the gating concern is build-time LLVM cost.

UDF beyond Java SP. Current UDF lane is the Java SP engine (pl_engine/) via JNI. No in-process C-extension SDK; F_GENERIC is vestigial. A native-C extension lane would shorten per-call latency for tight numeric/string custom functions but needs a stable ABI for DB_VALUE.

Why two regex backends? Both RE2 and std::regex are linked unconditionally; PRM_ID_REGEXP_ENGINE flips per-session. Historically, std::regex came first. RE2 was added for predictable (linear-time) matching, and std::regex was kept because RE2 does not support backreferences. Whether RE2 becomes the default is a roadmap question.

Why not split string_opfunc.c? At 693 KB it is the largest file in the engine. Project guidelines forbid splitting large files (the inter-procedural inlining of qstr_* helpers would suffer), but the cognitive cost for new contributors is high. No measurement has been done either way.

Constant folding granularity. pt_evaluate_function_w_args folds calls whose arguments are all PT_VALUE; impure functions (RAND, NOW, BENCHMARK) are excluded by hand-coded predicates. A “purity” annotation in the signature table could automate this — none currently exists.
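A purity annotation could look something like the following sketch. The func_sig struct, its is_pure flag, and can_fold are hypothetical, since no such field exists today. The fold predicate then becomes a table lookup combined with the existing all-literal-arguments check, replacing hand-coded exclusions.

```c
#include <stdbool.h>

/* Hypothetical signature-table entry carrying a purity bit. */
typedef struct
{
  const char *name;
  bool is_pure;     /* same inputs always give the same output, no side effects */
} func_sig;

static const func_sig sig_table[] = {
  { "SUBSTR", true },
  { "UPPER", true },
  { "RAND", false },        /* different value per call */
  { "NOW", false },         /* depends on the clock */
  { "BENCHMARK", false },   /* exists for its side effect (elapsed time) */
};

/* Fold only when the function is pure and every argument is a literal. */
static bool
can_fold (const func_sig *sig, const bool *arg_is_literal, int n)
{
  if (!sig->is_pure)
    return false;
  for (int i = 0; i < n; i++)
    if (!arg_is_literal[i])
      return false;
  return true;
}
```

Keeping purity in the same table as the signature means a newly added FUNC_CODE cannot be folded by accident: whoever adds the entry must state its purity.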

Sources

CUBRID source files: src/query/arithmetic.c, numeric_opfunc.c, string_opfunc.c, query_opfunc.c, crypt_opfunc.c, string_regex.cpp / string_regex.hpp / string_regex_constants.hpp, string_regex_re2.cpp, string_regex_std.cpp, fetch.c; src/parser/type_checking.c, func_type.cpp / func_type.hpp; src/compat/db_function.hpp.

Theoretical references: Silberschatz, Korth, Sudarshan, Database System Concepts, 7th ed., chs. 4–5; Date, An Introduction to Database Systems, 8th ed., ch. 4; ISO/IEC 9075:2016 (SQL standard); Knuth, TAOCP vol. 2 §4.3 (multi-precision arithmetic); Cox, “Regular Expression Matching Can Be Simple And Fast”, swtch.com 2007; PostgreSQL fmgr.c / clauses.c / pg_proc.h for cross-engine comparison.

Cross-references: knowledge/code-analysis/cubrid/cubrid-query-evaluator.md (the regu-variable evaluator that calls into this library), knowledge/code-analysis/cubrid/cubrid-semantic-check.md (overload resolution that feeds the runtime dispatch).