
CUBRID JSON_TABLE — Table Function Turning JSON Documents Into Virtual Rows


JSON_TABLE is a table function: invoked in FROM, it returns a relation. Textbooks call leaves of an operator tree that produce tuples “row sources” or “table expressions”; JSON_TABLE (expr, '$.path' COLUMNS (...)) takes a JSON document and a column specification and the engine pretends the result is a base table.

The standard reference is ISO/IEC 9075-2:2016 §6.36 “JSON_TABLE” in the JSON support package (X401/X402). It prescribes four column shapes — regular (name type PATH 'expr'), EXISTS (name BOOLEAN EXISTS PATH 'expr'), ORDINALITY (name FOR ORDINALITY), NESTED (NESTED PATH 'expr' COLUMNS (...)) — plus per-column ON ERROR/ON EMPTY clauses, each with NULL (default), ERROR (raise), or DEFAULT v (substitute). These knobs make JSON_TABLE an ETL-grade primitive: with NULL or DEFAULT selected, one row whose $.user.age is the string "unknown" instead of a number cannot crash the batch.

The query language is JSONPath, from Goessner’s 2007 article and standardised as RFC 9535 (2024). CUBRID’s surface syntax is the conservative subset all major engines agree on: $ (root), .name, [i], [*], optional ** (recursive descent). The semantic core is the array wildcard at the table root ('$[*]'): it turns one document into multiple rows. CUBRID, like MySQL, makes the parent-child relationship a left-join with sibling-NULL semantics rather than a Cartesian product across siblings.

The implementation pillar is the iterator model on tree-shaped state. A JSON_TABLE invocation has a tree of column specs (root + each NESTED PATH recursively); a per-call cursor walks that tree depth-first while the executor pulls rows. Graefe’s Volcano paper (TKDE 1994) frames this as a tree of nested iterators: outer advances when inner is exhausted; inner resets per outer position. JSON_TABLE applies that pattern to JSON-array iteration. The cursor recording “where am I in this tree” is the essential state.
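The nesting discipline is easiest to see stripped of JSON entirely. A minimal sketch of the Volcano pattern with toy types (none of these names are CUBRID's): the outer iterator advances only when the inner is exhausted, and the inner resets for each outer position.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy inner iterator over an array: open() rewinds, next() pulls one element.
struct ArrayIter {
  const std::vector<int> *arr = nullptr;
  std::size_t pos = 0;
  void open (const std::vector<int> &a) { arr = &a; pos = 0; }
  bool next (int &out) {
    if (pos >= arr->size ()) return false;
    out = (*arr)[pos++];
    return true;
  }
};

// Pull all (outer, inner) pairs: the Volcano nesting in miniature.
std::vector<std::pair<int, int>>
expand (const std::vector<std::pair<int, std::vector<int>>> &rows)
{
  std::vector<std::pair<int, int>> out;
  ArrayIter inner;
  for (const auto &row : rows) {      // outer row source
    inner.open (row.second);          // inner resets per outer position
    int v;
    while (inner.next (v))            // outer advances only when inner is done
      out.emplace_back (row.first, v);
  }
  return out;
}
```

The "where am I" state is exactly the pair (outer position, inner position) — which is what the cursor tree generalises to arbitrary nesting depth.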

The last pillar is column projection with type coercion. JSON is untyped; SQL columns have fixed types. The mapping is lossy — every (JSON kind × SQL kind) pair must have a defined behaviour, with ON ERROR/ON EMPTY as the implementation-defined fallback. CUBRID reuses tp_value_cast for coercion and a per-column json_table_column_behavior record for the failure path.
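What "every pair has a defined behaviour" means in practice can be sketched with toy stand-ins (JsonValue and coerce_to_int are illustrative, not CUBRID's tp_value_cast or json_table_column_behavior):

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <variant>

// Toy JSON scalar: null, boolean, number, or string.
using JsonValue = std::variant<std::nullptr_t, bool, double, std::string>;

enum class OnError { ReturnNull, Throw, Default };

// Coerce a JSON value to a SQL INTEGER-like int. A number has a defined
// cast; every other JSON kind is a domain mismatch routed through the
// ON ERROR behaviour. std::nullopt models SQL NULL.
std::optional<int>
coerce_to_int (const JsonValue &v, OnError behavior, int default_value)
{
  if (auto d = std::get_if<double> (&v))
    return static_cast<int> (*d);               // defined (number × int) cell
  switch (behavior) {                            // all other cells: fallback
    case OnError::ReturnNull: return std::nullopt;
    case OnError::Throw:      throw std::domain_error ("incompatible domain");
    case OnError::Default:    return default_value;
  }
  return std::nullopt;
}
```

The point of the matrix is that the failure path is data, not control flow: the same extract code runs regardless, and the per-column behaviour record decides what a mismatch becomes.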

Every engine that ships JSON_TABLE (or its equivalent) converges on three layers: a parser-built spec tree, a per-row cursor, and a per-column evaluator coercing JSON to SQL.

MySQL (8.0+) was first to ship JSON_TABLE and is the closest model. It builds a Json_table_column tree at parse time; execution is Json_seekable_iterator walking Json_dom with per-column subclasses (Path, Exists, For_ordinality, Nested). The “left-join the nested path” rule is the one CUBRID copies verbatim in the header of scan_json_table.hpp.

MariaDB (10.6+) walks a Json_table_column tree via Json_table_nested_path::scan_next recursive descent — close to CUBRID’s scan_next_internal.

Oracle added JSON_TABLE in 12c (2013), predating the standard, as a row-source operator that also handles XMLTABLE. Supports richer JSONPath filters (?(@.age > 18)).

SQL Server ships OPENJSON (@json, '$.path') WITH (...) instead. No native NESTED PATH; nested expansion uses CROSS APPLY OPENJSON(...).

PostgreSQL added JSON_TABLE in 17 (2024) atop jsonb_to_recordset + jsonb_path_query. Execution: ExecJsonTable + JsonTablePlanState over a JsonTablePlan tree — closest in shape to CUBRID.

Where CUBRID sits. Functionally complete against SQL:2016. One C++ class (cubscan::json_table::scanner) with a depth-keyed cursor stack and recursive scan_next_internal. Persistent state (the spec tree of cubxasl::json_table::node) lives in src/xasl/access_json_table.cpp so it survives XASL serialisation; transient state lives in the scanner. Plugs into SCAN_ID via S_JSON_TABLE_SCAN so the executor treats it like any other scan.

The implementation splits across three layers in three directories.

flowchart TB
  subgraph Parser["src/parser/ — parse-time"]
    G[csql_grammar.y]
    PT[PT_JSON_TABLE family]
    G --> PT
  end
  subgraph XaslGen["src/xasl/"]
    SP[cubxasl::json_table::spec_node]
    NN[node + column]
    XS[xasl_stream.cpp]
    PT --> SP --> NN
    SP --> XS
  end
  subgraph Runtime["src/query/"]
    SC[cubscan::json_table::scanner]
    CUR[cursor stack]
    SID[SCAN_ID.s.jtid<br/>S_JSON_TABLE_SCAN]
    SP --> SC --> CUR
    SC -.alias.-> SID
  end
  Runtime --> Exec[qexec_open_scan: TARGET_JSON_TABLE]

The naming alias is what makes JSON_TABLE a first-class SCAN_TYPE rather than a bolt-on:

// scan_json_table.hpp — naming alias
using JSON_TABLE_SCAN_ID = cubscan::json_table::scanner;

Because scanner is a C++17 standard-layout class with a default constructor, the C-side union in scan_id_struct embeds it by value. The cost (a few dozen bytes per SCAN_ID even when the scan is a heap scan) buys discriminator-check dispatch with no virtual call and no heap allocation.
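The embedding constraint is checkable at compile time. A toy sketch of the same shape — discriminator plus by-value union arm — with hypothetical names (the real types are scanner and scan_id_struct):

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A standard-layout class: all members in one access section, no virtuals,
// no bases with data. This is what lets a C-style union hold it by value.
struct toy_scanner {
  void *spec;
  std::size_t depth;
  std::size_t height;
};
static_assert (std::is_standard_layout_v<toy_scanner>);

enum toy_scan_type { TOY_HEAP_SCAN, TOY_JSON_TABLE_SCAN };

struct toy_scan_id {
  toy_scan_type type;     // discriminator checked before touching .s
  union {
    int hsid;             // heap-scan arm
    toy_scanner jtid;     // embedded by value — no pointer, no heap, no vtable
  } s;
};
```

Dispatch is then a switch on `type` followed by direct member access — the "few dozen bytes in every SCAN_ID" trade mentioned above, bought with zero indirection.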

The parser turns JSON_TABLE (expr, '$.path' COLUMNS (...)) into three PT_NODE types defined in parse_tree.h:

// parse_tree.h — condensed
struct pt_json_table_info      { PT_NODE *expr; PT_NODE *tree; bool is_correlated; };
struct pt_json_table_node_info { PT_NODE *columns; PT_NODE *nested_paths; char *path; };
struct pt_json_table_column_info
{
  PT_NODE *name;
  char *path;
  size_t index;
  enum json_table_column_function func;
  json_table_column_behavior on_error, on_empty;
};

The grammar (csql_grammar.y) builds this tree in json_table_rule / json_table_node_rule / json_table_column_rule; pt_jt_append_column_or_nested_node dispatches each parsed item into either columns or nested_paths of the current node. Three column kinds exist by json_table_column_function: JSON_TABLE_EXTRACT (default), JSON_TABLE_EXISTS, JSON_TABLE_ORDINALITY.

XASL generation transforms the parser tree into a runtime tree under namespace cubxasl::json_table:

// access_json_table.hpp — the runtime spec tree
struct column
{
  tp_domain *m_domain;                  // SQL type to coerce to
  char *m_path;
  char *m_column_name;
  json_table_column_behavior m_on_error, m_on_empty;
  db_value *m_output_value_pointer;     // aliases an xasl outptr_list slot
  json_table_column_function m_function;

  int evaluate (const JSON_DOC &input, size_t ordinality);
};

struct node
{
  char *m_path;
  size_t m_ordinality;                  // 1-based row counter
  column *m_output_columns;
  size_t m_output_columns_size;
  node *m_nested_nodes;
  size_t m_nested_nodes_size;
  size_t m_id;
  JSON_ITERATOR *m_iterator;            // walks current array
  bool m_is_iterable_node;              // true when m_path ends in [*]

  void init_iterator ();
  void clear_columns (bool is_final_clear);
};

struct spec_node
{
  node *m_root_node;
  regu_variable_node *m_json_reguvar;   // JSON expr feeding the scan
  std::size_t m_node_count;
};

This tree survives XASL serialisation. xasl_stream.cpp carries stx_build overloads for each of column, node, spec_node. The wire format is depth-first recursive: each node writes m_output_columns_size, the columns, m_nested_nodes_size, the children. The server-side tree is rebuilt with stx_alloc_struct from the per-query private pool, so its lifetime is bounded by the XASL invocation.
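The count-prefixed depth-first format round-trips mechanically. A toy sketch of that wire shape (illustrative types only — the real code is xasl_stream.cpp's stx_build/stx_alloc_struct machinery):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy node: a payload list (stand-in for serialised columns) plus children.
struct toy_node {
  std::vector<int> columns;
  std::vector<toy_node> children;
};

// Write: column count, columns, child count, then recurse — depth-first.
void pack (const toy_node &n, std::vector<std::size_t> &wire) {
  wire.push_back (n.columns.size ());
  for (int c : n.columns) wire.push_back (static_cast<std::size_t> (c));
  wire.push_back (n.children.size ());
  for (const toy_node &ch : n.children) pack (ch, wire);
}

// Read: mirror the writer exactly; `pos` is the single read cursor.
toy_node unpack (const std::vector<std::size_t> &wire, std::size_t &pos) {
  toy_node n;
  std::size_t ncols = wire[pos++];
  for (std::size_t i = 0; i < ncols; ++i)
    n.columns.push_back (static_cast<int> (wire[pos++]));
  std::size_t nkids = wire[pos++];
  for (std::size_t i = 0; i < nkids; ++i)
    n.children.push_back (unpack (wire, pos));
  return n;
}
```

Because counts precede payloads, the reader needs no terminators and no lookahead — the same property that lets the server side rebuild the tree in one pass from the per-query pool.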

The crucial property is that m_output_value_pointer aliases an entry in the surrounding XASL’s outptr_list: when a column writes to its output value, the XASL’s projection layer reads the same memory.

Column evaluation — three shapes, one entry


Every column kind funnels through column::evaluate, which dispatches by m_function:

// column::evaluate — src/xasl/access_json_table.cpp
int
column::evaluate (const JSON_DOC &input, size_t ordinality)
{
  pr_clear_value (m_output_value_pointer);
  db_make_null (m_output_value_pointer);
  switch (m_function)
    {
    case JSON_TABLE_EXTRACT:    return evaluate_extract (input);
    case JSON_TABLE_EXISTS:     return evaluate_exists (input);
    case JSON_TABLE_ORDINALITY: return evaluate_ordinality (ordinality);
    default:                    return ER_FAILED;
    }
}

evaluate_extract calls db_json_extract_document_from_path (input, m_path, docp); on empty path-match it fires trigger_on_empty; if tp_value_cast returns DOMAIN_INCOMPATIBLE (JSON string "abc" vs SQL INTEGER) it fires trigger_on_error. evaluate_exists calls db_json_contains_path, stores 1/0, then casts to the column domain. evaluate_ordinality writes the ordinality argument as an integer.

trigger_on_error and trigger_on_empty encode the behaviour matrix:

// column::trigger_on_error — condensed
switch (m_on_error.m_behavior)
  {
  case JSON_TABLE_RETURN_NULL:
    er_clear ();
    return NO_ERROR;
  case JSON_TABLE_THROW_ERROR:
    er_set (..., ER_JSON_TABLE_ON_ERROR_INCOMP_DOMAIN, ...);
    return ER_JSON_TABLE_ON_ERROR_INCOMP_DOMAIN;
  case JSON_TABLE_DEFAULT_VALUE:
    pr_clone_value (m_on_error.m_default_value, &value_out);
    return NO_ERROR;
  }

er_clear () on the RETURN_NULL arm is load-bearing: earlier code may have set a thread-local error during the failed extract; the column swallows it and continues. trigger_on_empty is the same matrix.

Scanner and cursor stack — the transient state


The runtime layer is a single class with one private inner struct:

// scan_json_table.hpp / .cpp — condensed
class scanner
{
  public:
    void init (cubxasl::json_table::spec_node &spec);
    void clear (xasl_node *xasl_p, bool is_final, bool is_final_clear);
    int open (cubthread::entry *thread_p);
    void end (cubthread::entry *thread_p);
    int next_scan (cubthread::entry *thread_p, scan_id_struct &sid, SCAN_CODE &sc);
    SCAN_PRED &get_predicate ();
    void set_value_descriptor (val_descr *vd);

  private:
    struct cursor;

    int scan_next_internal (cubthread::entry *thread_p, size_t depth, bool &found_row_output);
    int init_cursor (const JSON_DOC &doc, cubxasl::json_table::node &node, cursor &cursor_out);
    int set_next_cursor (const cursor &current_cursor, size_t next_depth);
    int set_input_document (cursor &c, const cubxasl::json_table::node &node, const JSON_DOC &doc);
    size_t get_tree_height (const cubxasl::json_table::node &node);
    void init_iterators (cubxasl::json_table::node &node);
    void reset_ordinality (cubxasl::json_table::node &node);
    void clear_node_columns (cubxasl::json_table::node &node);

    cubxasl::json_table::spec_node *m_specp;
    cursor *m_scan_cursor;        // flat array, one level per tree depth
    size_t m_scan_cursor_depth;
    size_t m_tree_height;
    scan_pred m_scan_predicate;
    val_descr *m_vd;
};

struct scanner::cursor
{
  std::size_t m_child;                 // current child index in non-leaf walk
  cubxasl::json_table::node *m_node;   // points back into spec tree
  JSON_DOC_STORE m_input_doc;          // input document at this depth
  const JSON_DOC *m_process_doc;       // current iterator value or input doc
  bool m_is_row_fetched, m_need_advance_row, m_is_node_consumed, m_iteration_started;

  void advance_row_cursor ();
  void start_json_iterator ();
  int fetch_row ();
  void end ();
};

m_scan_cursor is a flat array sized to m_tree_height (computed once by get_tree_height), not a linked list. Depth never exceeds spec-tree height (a static property of the parse). m_scan_cursor_depth is the index of the deepest open level — where scan_next_internal is currently advancing rows. The flat-array shape is what makes recursion-by-depth+1 a constant-cost operation.
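get_tree_height itself is a one-liner recursion. A sketch over a toy node type — an assumption about shape, not the real function body:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy spec node: only the nesting structure matters for height.
struct toy_spec {
  std::vector<toy_spec> nested;
};

// Height = this node plus the deepest nested branch. Computed once at init;
// the cursor array is sized to exactly this, so depth indexing never grows.
std::size_t tree_height (const toy_spec &n) {
  std::size_t max_child = 0;
  for (const toy_spec &c : n.nested)
    max_child = std::max (max_child, tree_height (c));
  return 1 + max_child;
}
```

A flat root with no NESTED PATH yields height 1 (one cursor level); each nesting level adds one slot — never more, because siblings share a depth.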

scanner::init runs once per server-side scan-id construction. It computes m_tree_height recursively, allocates the cursor array, and pre-points each cursor at the left-most branch of the spec tree:

// scanner::init — src/query/scan_json_table.cpp
void
scanner::init (cubxasl::json_table::spec_node &spec)
{
  m_specp = &spec;
  m_tree_height = get_tree_height (*m_specp->m_root_node);
  m_scan_cursor_depth = 0;
  m_scan_cursor = new cursor[m_tree_height];

  // pre-point each cursor at the left-most branch of the spec tree
  json_table_node *t = m_specp->m_root_node;
  m_scan_cursor[0].m_node = t;
  for (size_t i = 1; t->m_nested_nodes_size != 0; ++i)
    {
      t = &t->m_nested_nodes[0];
      m_scan_cursor[i].m_node = t;
    }
  init_iterators (*m_specp->m_root_node);
}

init_iterators calls node::init_iterator on every iterable node (m_is_iterable_node true ⇔ path ends in [*]); the iterator is reused (rewound, not reallocated) on subsequent open calls.

scanner::open runs on the first next_scan (when sid.position == S_BEFORE):

// scanner::open — src/query/scan_json_table.cpp (condensed)
int
scanner::open (cubthread::entry *thread_p)
{
  DB_VALUE *value_p = NULL;
  int err = fetch_peek_dbval (thread_p, m_specp->m_json_reguvar,
                              m_vd, NULL, NULL, NULL, &value_p);
  if (err != NO_ERROR) return err;

  if (db_value_is_null (value_p))
    {
      assert (m_scan_cursor[0].m_is_node_consumed);  // NULL input ⇒ no rows
      return NO_ERROR;
    }
  if (db_value_type (value_p) == DB_TYPE_JSON)
    {
      err = init_cursor (*db_get_json_document (value_p),
                         *m_specp->m_root_node, m_scan_cursor[0]);
    }
  else
    {
      JSON_DOC_STORE document;
      err = db_value_to_json_doc (*value_p, false, document);
      if (err != NO_ERROR) return err;
      err = init_cursor (*document.get_immutable (),
                         *m_specp->m_root_node, m_scan_cursor[0]);
    }
  reset_ordinality (*m_specp->m_root_node);
  m_scan_cursor_depth = 0;
  return err;
}

fetch_peek_dbval (the regu-variable evaluator — see cubrid-scalar-functions.md) materialises m_specp->m_json_reguvar: a column ref, a host variable, or an expression like JSON_OBJECT(...). The result may already be DB_TYPE_JSON (fast path) or a string passing through db_value_to_json_doc (slow path: parse via rapidjson). The latter is what lets JSON_TABLE (varchar_col_with_json, ...) work without an explicit CAST.

init_cursor delegates to set_input_document, which does the table-level path extraction:

// scanner::set_input_document — src/query/scan_json_table.cpp
int
scanner::set_input_document (cursor &cursor_arg,
                             const cubxasl::json_table::node &node,
                             const JSON_DOC &document)
{
  cursor_arg.m_input_doc.clear ();
  int err = db_json_extract_document_from_path (&document, node.m_path,
                                                cursor_arg.m_input_doc);
  if (err != NO_ERROR) return err;

  if (cursor_arg.m_input_doc.is_null ())
    cursor_arg.m_is_node_consumed = true;   // path matched nothing
  else
    cursor_arg.start_json_iterator ();
  return NO_ERROR;
}

cursor::start_json_iterator decides whether to iterate or treat the input as a single row: if m_is_iterable_node is true (path ends in [*]), db_json_set_iterator rewinds the node’s JSON_ITERATOR to position 0 of the array; otherwise the cursor is left in single-row mode and m_process_doc = m_input_doc.
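The iterate-vs-single-row test is purely syntactic on the node path. A sketch of that decision with an illustrative helper name (the real flag is precomputed into m_is_iterable_node at XASL generation, not re-checked per row):

```cpp
#include <cassert>
#include <string>

// A node path ending in [*] means "expand the matched array, one row per
// element"; any other path is treated as a single-row document.
bool path_is_iterable (const std::string &path) {
  static const std::string suffix = "[*]";
  return path.size () >= suffix.size ()
      && path.compare (path.size () - suffix.size (), suffix.size (), suffix) == 0;
}
```

So '$.users[*]' drives the iterator while '$' or '$.user' stays in single-row mode — the two branches start_json_iterator chooses between.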

The public per-row entry wraps scan_next_internal plus the scanner-local predicate filter:

// scanner::next_scan — src/query/scan_json_table.cpp (condensed)
int
scanner::next_scan (cubthread::entry *thread_p, scan_id_struct &sid, SCAN_CODE &sc)
{
  bool has_row = false;
  DB_LOGICAL logical = V_FALSE;

  if (sid.position == S_BEFORE)
    {
      int err = open (thread_p);
      if (err != NO_ERROR) { sc = S_ERROR; return err; }
      sid.position = S_ON;
      sid.status = S_STARTED;
    }
  else if (sid.position != S_ON)
    {
      sc = S_END;
      return ER_FAILED;
    }
  while (true)
    {
      int err = scan_next_internal (thread_p, 0, has_row);
      if (err != NO_ERROR) { sc = S_ERROR; return err; }
      if (!has_row) { sid.position = S_AFTER; sc = S_END; return NO_ERROR; }
      if (m_scan_predicate.pred_expr == NULL) break;
      logical = m_scan_predicate.pr_eval_fnc (thread_p, m_scan_predicate.pred_expr, sid.vd, NULL);
      if (logical == V_TRUE) break;
      if (logical == V_ERROR) { sc = S_ERROR; return ER_FAILED; }
      // V_FALSE / V_UNKNOWN → loop, try next row
    }
  sc = S_SUCCESS;
  return NO_ERROR;
}

Two design points:

- Lazy open on first call: next_scan does the first fetch_peek_dbval itself; for a correlated JSON_TABLE inside a nested-loop join, each outer row triggers one open, which re-fetches the JSON expression and rebuilds the cursor.
- The predicate filter is scanner-local: skipping a non-qualifying row must still advance the JSON iterator; if next_scan returned a “skip” code to scan_handle_single_scan, the outer loop would re-enter from the same iterator state. Only the scanner knows the iterator position, so it owns the skip-loop.
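The ownership argument reduces to a toy: as long as the skip happens inside the scanner, the position survives across calls. A sketch with illustrative state only:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy scan: `pos` is the iterator state that only the scanner knows.
struct toy_scan {
  std::vector<int> rows;
  std::size_t pos = 0;

  // Return the next row passing `pred`. A skipped row still advances `pos`,
  // so the next call resumes past it rather than re-reading the same row.
  template <typename Pred>
  bool next_scan (Pred pred, int &out) {
    while (pos < rows.size ()) {
      int row = rows[pos++];          // always advance, even on a skip
      if (pred (row)) { out = row; return true; }
    }
    return false;                     // exhausted — the S_END analogue
  }
};
```

Hoisting the predicate loop out of next_scan would require the caller to carry `pos` too — which is exactly the coupling the scanner-local design avoids.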

scan_next_internal — the FSM in disguise


The recursive engine is ~120 lines; every flag carries weight. The shape:

// scanner::scan_next_internal — src/query/scan_json_table.cpp (condensed)
int
scanner::scan_next_internal (cubthread::entry *thread_p, size_t depth, bool &found_row_output)
{
  cursor &this_cursor = m_scan_cursor[depth];

  // (A) Resume into child if we descended on a previous call.
  if (m_scan_cursor_depth >= depth + 1)
    {
      int err = scan_next_internal (thread_p, depth + 1, found_row_output);
      if (err != NO_ERROR) return err;
      if (found_row_output) return NO_ERROR;
      this_cursor.m_child++;
    }
  // (B) Row loop at this depth.
  while (!this_cursor.m_is_node_consumed)
    {
      if (this_cursor.m_need_advance_row)
        {
          this_cursor.advance_row_cursor ();
          if (this_cursor.m_is_node_consumed) break;
        }
      int err = this_cursor.fetch_row ();
      if (err != NO_ERROR) return err;

      // (C) Leaf: every row is output.
      if (this_cursor.m_node->m_nested_nodes_size == 0)
        {
          found_row_output = true;
          this_cursor.m_need_advance_row = true;
          return NO_ERROR;
        }
      // (D) Non-leaf, all children visited: at most one spine row.
      if (this_cursor.m_child == this_cursor.m_node->m_nested_nodes_size)
        {
          this_cursor.m_need_advance_row = true;
          if (this_cursor.m_iteration_started) continue;  // children already emitted
          found_row_output = true;                        // sibling-NULL row
          return NO_ERROR;
        }
      // (E) Descend into next child.
      err = set_next_cursor (this_cursor, depth + 1);
      if (err != NO_ERROR) return err;
      cursor &next_cursor = m_scan_cursor[depth + 1];
      if (!next_cursor.m_is_node_consumed)
        {
          m_scan_cursor_depth++;
          this_cursor.m_iteration_started = true;
          err = scan_next_internal (thread_p, depth + 1, found_row_output);
          if (err != NO_ERROR) return err;
        }
      else
        {
          this_cursor.m_child++;
          continue;
        }
      if (found_row_output) return NO_ERROR;
      this_cursor.m_child++;
    }
  // (F) Node fully consumed — pop a level.
  found_row_output = false;
  if (m_scan_cursor_depth > 0) m_scan_cursor_depth--;
  return NO_ERROR;
}

stateDiagram-v2
  [*] --> CursorOpen : init_cursor
  CursorOpen --> FetchRow : iterator has element
  FetchRow --> EmitLeaf : leaf node
  FetchRow --> DescendChild : has children, m_child < N
  DescendChild --> ChildEmits
  ChildEmits --> Caller
  Caller --> ResumeChild
  FetchRow --> EmitSpine : m_child == N AND NOT m_iteration_started
  FetchRow --> Skip : m_child == N AND m_iteration_started
  EmitLeaf --> AdvanceRow
  Skip --> AdvanceRow
  AdvanceRow --> CursorOpen : advance_row_cursor
  AdvanceRow --> NodeDone : iterator exhausted

Three rules fall out of the flag manipulation:

1. A leaf row is always output (arc C); m_need_advance_row ensures the next call advances.
2. A non-leaf spine row is output only if no child emitted (arc D, gated by m_iteration_started) — this implements SQL:2016 left-join semantics, where an empty NESTED PATH array still produces a parent row with NULLs.
3. Siblings do not cross-product — the header comment is explicit: “while one nested path is expanded, the values for sibling nested paths will be all null.” This is enforced by cursor::end, which calls node::clear_columns to NULL out the just-finished sibling’s columns before the next sibling’s expansion populates its own:

// scanner::cursor::end — src/query/scan_json_table.cpp
void
scanner::cursor::end (void)
{
  m_is_node_consumed = true;
  db_json_reset_iterator (m_node->m_iterator);
  m_process_doc = NULL;
  m_node->clear_columns (false);   // NULL all this node's columns
}

fetch_row — populate the columns at this depth

// scanner::cursor::fetch_row — src/query/scan_json_table.cpp
int
scanner::cursor::fetch_row (void)
{
  if (m_is_row_fetched) return NO_ERROR;   // idempotent guard

  m_process_doc = (m_node->m_iterator != NULL)
                  ? db_json_iterator_get_document (*m_node->m_iterator)  // peek into iterator
                  : m_input_doc.get_immutable ();                        // single-row mode
  for (size_t i = 0; i < m_node->m_output_columns_size; ++i)
    {
      int err = m_node->m_output_columns[i].evaluate (*m_process_doc, m_node->m_ordinality);
      if (err != NO_ERROR) return err;
    }
  m_is_row_fetched = true;   // arms the guard until advance_row_cursor resets it
  return NO_ERROR;
}

db_json_iterator_get_document returns a peek pointer into the iterator’s current array element — no copy, no allocation. Each column’s evaluate re-extracts its own per-column path relative to this peek document. The two-level path system (node-level path + column-level path) is what makes '$.users[*]' COLUMNS (a INT PATH '$.age') work: row context from the node, per-column extraction from the column.

advance_row_cursor increments the iterator and the per-node ordinality:

// scanner::cursor::advance_row_cursor — src/query/scan_json_table.cpp
void
scanner::cursor::advance_row_cursor ()
{
  m_need_advance_row = false;
  m_iteration_started = false;
  if (m_node->m_iterator == NULL || !db_json_iterator_has_next (*m_node->m_iterator))
    {
      end ();
      return;
    }
  db_json_iterator_next (*m_node->m_iterator);
  m_is_row_fetched = false;
  m_node->m_ordinality++;
  m_child = 0;
}

The 1-based ordinality lives on the node (one counter per nesting level). reset_ordinality resets all of them to 1 at every open.

For SELECT * FROM JSON_TABLE ('{"a":1, "arr":[{"c":10},{"c":20}]}', '$' COLUMNS (a INT PATH '$.a', NESTED PATH '$.arr[*]' COLUMNS (c INT PATH '$.c'))) jt; the spec is root ($, column a, one child) and child ($.arr[*], iterable, column c).

sequenceDiagram
  participant NS as scanner::next_scan
  participant SI as scan_next_internal
  participant Root as cursor[0]
  participant Child as cursor[1]

  NS->>NS: open: fetch JSON, init_cursor[0]
  NS->>SI: scan_next_internal(0)
  SI->>Root: fetch_row → a:=1
  SI->>Child: set_next_cursor, recurse(1), fetch_row → c:=10
  SI-->>NS: row 1: (a=1, c=10), m_need_advance_row=true
  NS->>SI: next_scan call → resume at depth=1
  SI->>Child: advance_row_cursor → c:=20
  SI-->>NS: row 2: (a=1, c=20)
  NS->>SI: next call → resume(1) → iterator exhausted ⇒ end
  SI->>Root: m_iteration_started=true ⇒ continue, advance, end
  SI-->>NS: S_END

If the JSON had been {"a":1, "arr":[]}, the child would be m_is_node_consumed=true immediately after init_cursor. Arc (D) of scan_next_internal then fires on the root: m_iteration_started is false, so the spine row (a=1, c=NULL) is emitted — the SQL:2016 left-join behaviour.
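The left-join rule in miniature — a toy expansion (illustrative types, not CUBRID code) covering both the populated-array and empty-array cases, with std::nullopt standing in for SQL NULL:

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// One output row: (a, c) where c is NULL-able.
using Row = std::pair<int, std::optional<int>>;

// Expand a parent value `a` against its nested array `arr` with
// SQL:2016 left-join semantics: one row per element, and an empty
// array still yields one spine row with NULL in the nested column.
std::vector<Row>
expand_nested (int a, const std::vector<int> &arr)
{
  std::vector<Row> out;
  for (int c : arr)
    out.emplace_back (a, c);              // one row per array element
  if (arr.empty ())
    out.emplace_back (a, std::nullopt);   // spine row: a present, c NULL
  return out;
}
```

A Cartesian-product design would instead drop the parent row when the array is empty — the inner-join behaviour the standard deliberately avoids.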

Cleanup — clear, end, and the rebind path


scanner::clear is parameterised by two booleans that select among three cleanup intensities:

// scanner::clear — src/query/scan_json_table.cpp (condensed)
void
scanner::clear (xasl_node *xasl_p, bool is_final, bool is_final_clear)
{
  m_specp->m_root_node->clear_xasl (is_final_clear);
  reset_ordinality (*m_specp->m_root_node);
  if (is_final)
    {
      for (size_t i = 0; i < m_tree_height; ++i)
        {
          /* reset cursor[i] flags */
        }
      m_specp->m_root_node->clear_iterators (is_final_clear);
      if (is_final_clear) delete [] m_scan_cursor;
    }
}

- is_final=false: scan paused (next outer-loop row); only column values are cleared.
- is_final=true, is_final_clear=false: the scan ends this iteration but the XASL will re-execute; cursors and iterators are emptied but allocations kept.
- is_final=true, is_final_clear=true: the XASL is torn down for good; iterators are deleted and the cursor array freed.

This three-way granularity matches the executor’s cache-vs-tear-down policy and lets a JSON_TABLE survive correlated rebinds without leaking.

Wiring into SCAN_ID — the dispatch layer


Fan-out is two functions in scan_manager.c:

// scan_open_json_table_scan + scan_next_json_table_scan — src/query/scan_manager.c
int
scan_open_json_table_scan (THREAD_ENTRY *thread_p, SCAN_ID *scan_id, int grouped,
                           QPROC_SINGLE_FETCH single_fetch, DB_VALUE *join_dbval,
                           val_list_node *val_list, VAL_DESCR *vd, PRED_EXPR *pr)
{
  DB_TYPE single_node_type = DB_TYPE_NULL;

  assert (scan_id->type == S_JSON_TABLE_SCAN);
  scan_init_scan_id (scan_id, false, S_SELECT, true, grouped, single_fetch,
                     join_dbval, val_list, vd);
  scan_init_scan_pred (&scan_id->s.jtid.get_predicate (), NULL, pr,
                       ((pr) ? eval_fnc (thread_p, pr, &single_node_type) : NULL));
  scan_id->s.jtid.set_value_descriptor (vd);
  return NO_ERROR;
}

static SCAN_CODE
scan_next_json_table_scan (THREAD_ENTRY *thread_p, SCAN_ID *scan_id)
{
  SCAN_CODE sc;
  int err = scan_id->s.jtid.next_scan (thread_p, *scan_id, sc);
  return (err != NO_ERROR) ? S_ERROR : sc;
}

scan_open_json_table_scan does not call scanner::init; the scanner state (m_specp, cursor array, iterators) is built by XASL deserialisation in xasl_stream.cpp plus a call to scanner::init triggered when qexec_open_scan traverses the unpacked XASL_NODE. This function only sets up the outer SCAN_ID metadata.

Dispatch back into the scanner: one arm in scan_next_scan_local (case S_JSON_TABLE_SCAN: status = scan_next_json_table_scan (thread_p, scan_id);) and one arm in qexec_open_scan (case TARGET_JSON_TABLE: calling scan_open_json_table_scan). XASL generation tags the access-spec TARGET_JSON_TABLE based on PT_DERIVED_JSON_TABLE at semantic-check time.

flowchart LR
  Q[SQL with JSON_TABLE] --> P[PT_JSON_TABLE]
  P --> XG[xasl_generation:<br/>ACCESS_SPEC TARGET_JSON_TABLE,<br/>spec_node]
  XG --> XS[xasl_stream serialise]
  XS --> QE["qexec_open_scan: TARGET_JSON_TABLE"]
  QE --> SID[SCAN_ID.type = S_JSON_TABLE_SCAN]
  SID --> NL[scan_next_scan_local switch]
  NL --> NS[scanner::next_scan]
  NS --> SI[scan_next_internal]
  SI --> JI[db_json_iterator_*<br/>db_json_extract_document_from_path]
  SI --> CE[column::evaluate<br/>tp_value_cast]
  CE --> OL[outptr_list DB_VALUEs]
  NS --> Pred[m_scan_predicate.pr_eval_fnc]

Symbols are anchored on names; the line numbers in the position-hints table below are decay-prone hints, scoped to this document's `updated:` date.

Cross-module header — src/compat/json_table_def.h


enum json_table_column_behavior_type (RETURN_NULL/THROW_ERROR/DEFAULT_VALUE); enum json_table_column_function (EXTRACT/EXISTS/ORDINALITY); struct json_table_column_behavior (m_behavior plus m_default_value); enum json_table_expand_type (ARRAY/OBJECT/NO_EXPAND — advisory; runtime uses m_is_iterable_node).

Parser — src/parser/csql_grammar.y, name_resolution.c, parse_tree.h


JSON_TABLE keyword; bison productions json_table_rule, json_table_node_rule, json_table_column_rule, json_table_column_list_rule, json_table_on_error_rule_optional, json_table_on_empty_rule_optional, json_table_column_behavior_rule; pt_jt_append_column_or_nested_node (column-vs-nested separator). PT_NODE types PT_JSON_TABLE, PT_JSON_TABLE_NODE, PT_JSON_TABLE_COLUMN; info structs pt_json_table_info, pt_json_table_node_info, pt_json_table_column_info; derived-table type PT_DERIVED_JSON_TABLE. Name resolution: pt_get_all_json_table_attributes_and_types, pt_json_table_gather_attribs (synthesise as_attr_list); json_table_column_count (per-invocation index counter).

XASL spec layer — src/xasl/access_json_table.{hpp,cpp}


Namespace cubxasl::json_table with structs column, node, spec_node. Column: evaluate, evaluate_extract, evaluate_exists, evaluate_ordinality, trigger_on_error, trigger_on_empty, clear_xasl. Node: init_iterator, clear_columns, clear_iterators, clear_xasl, init_ordinality. Spec: clear_xasl. Aliases json_table_column, json_table_node, json_table_spec_node.

XASL wire format — src/xasl/xasl_stream.cpp


stx_build overloads for column, node, spec_node; stx_unpack for json_table_column_behavior; xasl_stream_compare overloads for cache equivalence.

Runtime scanner — src/query/scan_json_table.{hpp,cpp}


Class cubscan::json_table::scanner; inner scanner::cursor. Cursor methods: advance_row_cursor, start_json_iterator, fetch_row, end. Scanner public: init, clear, open, end, next_scan, get_predicate, set_value_descriptor. Scanner private: get_tree_height, init_iterators, reset_ordinality, clear_node_columns, set_input_document, init_cursor, set_next_cursor, scan_next_internal. Alias JSON_TABLE_SCAN_ID = cubscan::json_table::scanner.

SCAN_TYPE::S_JSON_TABLE_SCAN, scan_id_struct.s.jtid, scan_open_json_table_scan, scan_next_json_table_scan (in scan_manager.c); the case S_JSON_TABLE_SCAN: arms in scan_next_scan_local, scan_start_scan, scan_end_scan, scan_close_scan, scan_reset_scan_block, scan_next_scan_block (most no-ops). case TARGET_JSON_TABLE: arm in qexec_open_scan (query_executor.c).

JSON support kernels (consumed) — src/compat/db_json.{hpp,cpp}


db_json_extract_document_from_path (path kernel), db_json_contains_path (EXISTS PATH), iterator API (db_json_set_iterator / iterator_next / _has_next / _get_document / create_iterator / delete_json_iterator / clear_json_iterator / reset_iterator), db_value_to_json_doc (implicit cast for VARCHAR JSON), db_json_get_type, db_json_get_raw_json_body_from_document.

| Symbol | File | Line |
| --- | --- | --- |
| enum json_table_column_function | src/compat/json_table_def.h | 38 |
| enum json_table_column_behavior_type | src/compat/json_table_def.h | 31 |
| struct json_table_column_behavior | src/compat/json_table_def.h | 45 |
| cubxasl::json_table::column | src/xasl/access_json_table.hpp | 46 |
| cubxasl::json_table::node | src/xasl/access_json_table.hpp | 74 |
| cubxasl::json_table::spec_node | src/xasl/access_json_table.hpp | 96 |
| column::trigger_on_error / _on_empty | src/xasl/access_json_table.cpp | 43 / 81 |
| column::evaluate_extract / _exists / _ordinality | src/xasl/access_json_table.cpp | 129 / 171 / 197 |
| column::evaluate / clear_xasl | src/xasl/access_json_table.cpp | 207 / 235 |
| node::clear_columns / clear_iterators / clear_xasl / init_iterator | src/xasl/access_json_table.cpp | 270 / 280 / 298 / 309 |
| spec_node::clear_xasl | src/xasl/access_json_table.cpp | 337 |
| cubscan::json_table::scanner / JSON_TABLE_SCAN_ID alias | src/query/scan_json_table.hpp | 109 / 171 |
| scanner::cursor (struct) | src/query/scan_json_table.cpp | 37 |
| cursor::advance_row_cursor / start_json_iterator / fetch_row / end | src/query/scan_json_table.cpp | 72 / 98 / 109 / 150 |
| scanner::get_tree_height / init / clear / open / end | src/query/scan_json_table.cpp | 161 / 175 / 198 / 229 / 290 |
| scanner::next_scan / set_input_document / init_cursor / set_next_cursor | src/query/scan_json_table.cpp | 296 / 359 / 387 / 397 |
| scanner::clear_node_columns / init_iterators / reset_ordinality | src/query/scan_json_table.cpp | 405 / 415 / 426 |
| scanner::scan_next_internal / get_predicate / set_value_descriptor | src/query/scan_json_table.cpp | 437 / 564 / 570 |
| scan_open_json_table_scan / scan_next_json_table_scan | src/query/scan_manager.c | 4036 / 7014 |
| case S_JSON_TABLE_SCAN in scan_next_scan_local | src/query/scan_manager.c | 5273 |
| case TARGET_JSON_TABLE in qexec_open_scan | src/query/query_executor.c | 7591 |
| S_JSON_TABLE_SCAN enum / jtid union member | src/query/scan_manager.h | 83 / 412 |
| bison json_table_rule / _node_rule / _column_rule / _column_behavior_rule | src/parser/csql_grammar.y | 21957 / 21944 / 21888 / 21838 |
| pt_get_all_json_table_attributes_and_types / pt_json_table_gather_attribs | src/parser/name_resolution.c | 4972 / 4952 |
| stx_build (column / node / spec_node) / stx_unpack (behavior) | src/xasl/xasl_stream.cpp | 360 / 393 / 437 / 463 |
| db_json_iterator_next / _has_next / _get_document / set_iterator / reset_iterator / create_iterator / delete_json_iterator / clear_json_iterator | src/compat/db_json.hpp | 136 / 138 / 137 / 139 / 140 / 142 / 143 / 144 |

Against cubrid-scan-manager.md. That doc owns the SCAN_ID polymorphism and lifecycle protocol; the boundary is the case S_JSON_TABLE_SCAN: arm in scan_next_scan_local plus the scan_open_json_table_scan / scan_next_json_table_scan shim pair. This doc owns the cursor stack, FSM, and NESTED-PATH semantics.

Against cubrid-scalar-functions.md. That doc lists db_evaluate_json_extract etc. as the per-F_JSON_* scalar functions. JSON_TABLE does not go through that path; column::evaluate_extract calls db_json_extract_document_from_path directly. Reason: JSON_TABLE extracts over a peek iterator (no copy, no DB_VALUE allocation around the JSON_DOC); scalar JSON_EXTRACT produces a fresh DB_VALUE of DB_TYPE_JSON. Kernel shared, wrapping differs.

Predicate placement is scanner-internal. The WHERE filter is wired into m_scan_predicate and evaluated inside scanner::next_scan, not at the SCAN_ID-level scan_handle_single_scan. A skipped row must still advance the JSON iterator; only the inner function knows that state. Same pattern is used by scan_next_set_scan.

m_output_value_pointer aliases the XASL outptr_list. The XASL builder allocates one DB_VALUE per output column; the column writes via db_make_json_from_doc_store_and_release directly to that aliased slot. The pr_clear_value at the start of every evaluate is the lifetime contract.

Iterator allocation is per-XASL-invocation. JSON_ITERATORs attached to each node are allocated at scanner::init, rewound (not reallocated) at every start_json_iterator, and freed by clear_iterators with is_final_clear=true — avoiding per-row allocation on large arrays.

Re-entry on rebind. For correlated JSON_TABLE inside a nested-loop join, clear (is_final=false) resets column values; the next next_scan re-runs open, re-fetch_peek_dbvals the expression, and rebuilds the cursor.

Drift. json_table_expand_type enum is advisory (runtime uses m_is_iterable_node); only ARRAY_EXPAND is wired. pt_json_table_info::is_correlated is set at semantic check but the runtime always re-fetches on rebind.

  1. JSONPath filter expressions. RFC 9535 and MySQL/Oracle support ?(@.age > 18); CUBRID covers only the basic subset. Adding filters would push selection into the row-source.

  2. Object expansion. JSON_TABLE_OBJECT_EXPAND is unwired. '$.*' over an object could yield one row per member; the runtime hook is small but the parser syntax has not been settled.

  3. Per-node predicate pushdown. The header comment calls this out: “partition the scan predicate on scan nodes and filter invalid rows at node level”. Today filtering is leaf-only; node-aware filtering could prune entire NESTED-PATH branches.

  4. Streaming the iterator. JSON_DOC is fully materialised. A SAX-style parser would avoid materialising the whole document when only a small subset is touched.

  5. Sibling cross-product option. SQL:2016 prescribes left-join semantics; some users want Cartesian product. A per-NESTED-PATH CROSS keyword would express it without breaking existing semantics.

  6. Common-subexpression elimination on column paths. Two columns with the same path call db_json_extract_document_from_path twice; the XASL builder could share the extraction.

Code paths consumed: src/query/scan_json_table.{cpp,hpp}, src/xasl/access_json_table.{cpp,hpp}, src/compat/json_table_def.h, src/query/scan_manager.{c,h}, src/query/query_executor.c, src/parser/csql_grammar.y, src/parser/parse_tree.h, src/parser/name_resolution.c, src/xasl/xasl_stream.cpp, src/compat/db_json.hpp. Context: src/compat/db_json.cpp, src/query/string_opfunc.c.

Theoretical references: ISO/IEC 9075-2:2016 §6.36 “JSON_TABLE”; IETF RFC 9535 (2024) JSONPath: Query Expressions for JSON; Goessner (2007) JSONPath — XPath for JSON; Graefe (1994) Volcano, IEEE TKDE 6(1); Graefe (1993) Query Evaluation Techniques, ACM Computing Surveys 25(2); Silberschatz/Korth/Sudarshan, Database System Concepts 7th ed.

Cross-refs: knowledge/code-analysis/cubrid/cubrid-scan-manager.md (SCAN_ID/SCAN_TYPE polymorphism); knowledge/code-analysis/cubrid/cubrid-scalar-functions.md (peer JSON scalar functions through db_json).