CUBRID JSON_TABLE — Table Function Turning JSON Documents Into Virtual Rows
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
Section titled “Theoretical Background”JSON_TABLE is a table function: invoked in FROM, it returns a relation. Textbooks call leaves of an operator tree that produce tuples “row sources” or “table expressions”; JSON_TABLE (expr, '$.path' COLUMNS (...)) takes a JSON document and a column specification and the engine pretends the result is a base table.
The standard reference is ISO/IEC 9075-2:2016 §6.36 “JSON_TABLE” in the JSON support package (X401/X402). It prescribes four column shapes — regular (name type PATH 'expr'), EXISTS (name BOOLEAN EXISTS PATH 'expr'), ORDINALITY (name FOR ORDINALITY), NESTED (NESTED PATH 'expr' COLUMNS (...)) — plus per-column ON ERROR/ON EMPTY clauses each with NULL (default), ERROR (raise), or DEFAULT v (substitute). These knobs make JSON_TABLE an ETL-grade primitive: it cannot crash a batch when one row’s $.user.age is the string "unknown" instead of a number.
The query language is JSONPath, from Goessner’s 2007 article and standardised as RFC 9535 (2024). CUBRID’s surface syntax is the conservative subset all major engines agree on: $ (root), .name, [i], [*], optional ** (recursive descent). The semantic core is the array-wildcard at the table root ('$.[*]'): turns one document into multiple rows. CUBRID, like MySQL, makes the parent-child relationship a left-join with sibling-NULL semantics rather than a Cartesian product across siblings.
The implementation pillar is the iterator model on tree-shaped state. A JSON_TABLE invocation has a tree of column specs (root + each NESTED PATH recursively); a per-call cursor walks that tree depth-first while the executor pulls rows. Graefe’s Volcano paper (TKDE 1994) frames this as a tree of nested iterators: outer advances when inner is exhausted; inner resets per outer position. JSON_TABLE applies that pattern to JSON-array iteration. The cursor recording “where am I in this tree” is the essential state.
The last pillar is column projection with type coercion. JSON is untyped; SQL columns have fixed types. The mapping is lossy — every (JSON kind × SQL kind) pair must have a defined behaviour, with ON ERROR/ON EMPTY as the implementation-defined fallback. CUBRID reuses tp_value_cast for coercion and a per-column json_table_column_behavior record for the failure path.
Common DBMS Design
Section titled “Common DBMS Design”Every engine that ships JSON_TABLE (or its equivalent) converges on three layers: a parser-built spec tree, a per-row cursor, and a per-column evaluator coercing JSON to SQL.
MySQL (8.0+) was first to ship JSON_TABLE and is the closest model. It builds a Json_table_column tree at parse time; execution is Json_seekable_iterator walking Json_dom with per-column subclasses (Path, Exists, For_ordinality, Nested). The “left-join the nested path” rule is the one CUBRID copies verbatim in the header of scan_json_table.hpp.
MariaDB (10.6+) walks a Json_table_column tree via Json_table_nested_path::scan_next recursive descent — close to CUBRID’s scan_next_internal.
Oracle added JSON_TABLE in 12c (2013), predating the standard, as a row-source operator that also handles XMLTABLE. Supports richer JSONPath filters (?(@.age > 18)).
SQL Server ships OPENJSON (@json, '$.path') WITH (...) instead. No native NESTED PATH; nested expansion uses CROSS APPLY OPENJSON(...).
PostgreSQL added JSON_TABLE in 17 (2024) atop jsonb_to_recordset + jsonb_path_query. Execution: ExecJsonTable + JsonTablePlanState over a JsonTablePlan tree — closest in shape to CUBRID.
Where CUBRID sits. Functionally complete against SQL:2016. One C++ class (cubscan::json_table::scanner) with a depth-keyed cursor stack and recursive scan_next_internal. Persistent state (the spec tree of cubxasl::json_table::node) lives in src/xasl/access_json_table.cpp so it survives XASL serialisation; transient state lives in the scanner. Plugs into SCAN_ID via S_JSON_TABLE_SCAN so the executor treats it like any other scan.
CUBRID’s Approach
Section titled “CUBRID’s Approach”Three layers and one discriminator
Section titled “Three layers and one discriminator”The implementation splits across three layers in three directories.
flowchart TB
subgraph Parser["src/parser/ — parse-time"]
G[csql_grammar.y]
PT[PT_JSON_TABLE family]
G --> PT
end
subgraph XaslGen["src/xasl/"]
SP[cubxasl::json_table::spec_node]
NN[node + column]
XS[xasl_stream.cpp]
PT --> SP --> NN
SP --> XS
end
subgraph Runtime["src/query/"]
SC[cubscan::json_table::scanner]
CUR[cursor stack]
SID[SCAN_ID.s.jtid<br/>S_JSON_TABLE_SCAN]
SP --> SC --> CUR
SC -.alias.-> SID
end
Runtime --> Exec[qexec_open_scan: TARGET_JSON_TABLE]
The naming alias is what makes JSON_TABLE a first-class SCAN_TYPE rather than a bolt-on:
// scan_json_table.hpp — naming aliasusing JSON_TABLE_SCAN_ID = cubscan::json_table::scanner;Because scanner is a C++17 standard-layout class with a default constructor, the C-side union in scan_id_struct embeds it by value. The cost (a few dozen bytes per SCAN_ID even when the scan is a heap scan) buys discriminator-check dispatch with no virtual call and no heap allocation.
Spec tree — the persistent shape
Section titled “Spec tree — the persistent shape”The parser turns JSON_TABLE (expr, '$.path' COLUMNS (...)) into three PT_NODE types defined in parse_tree.h:
// parse_tree.h — condensedstruct pt_json_table_info { PT_NODE *expr; PT_NODE *tree; bool is_correlated; };struct pt_json_table_node_info { PT_NODE *columns; PT_NODE *nested_paths; char *path; };struct pt_json_table_column_info{ PT_NODE *name; char *path; size_t index; enum json_table_column_function func; json_table_column_behavior on_error, on_empty; };The grammar (csql_grammar.y) builds this tree in json_table_rule / json_table_node_rule / json_table_column_rule; pt_jt_append_column_or_nested_node dispatches each parsed item into either columns or nested_paths of the current node. Three column kinds exist by json_table_column_function: JSON_TABLE_EXTRACT (default), JSON_TABLE_EXISTS, JSON_TABLE_ORDINALITY.
XASL generation transforms the parser tree into a runtime tree under namespace cubxasl::json_table:
// access_json_table.hpp — the runtime spec treestruct column { tp_domain *m_domain; // SQL type to coerce to char *m_path; char *m_column_name; json_table_column_behavior m_on_error, m_on_empty; db_value *m_output_value_pointer; // aliases an xasl outptr_list slot json_table_column_function m_function; int evaluate (const JSON_DOC &input, size_t ordinality);};struct node { char *m_path; size_t m_ordinality; // 1-based row counter column *m_output_columns; size_t m_output_columns_size; node *m_nested_nodes; size_t m_nested_nodes_size; size_t m_id; JSON_ITERATOR *m_iterator; // walks current array bool m_is_iterable_node; // true when m_path ends in [*] void init_iterator (); void clear_columns (bool is_final_clear);};struct spec_node { node *m_root_node; regu_variable_node *m_json_reguvar; // JSON expr feeding the scan std::size_t m_node_count;};This tree survives XASL serialisation. xasl_stream.cpp carries stx_build overloads for each of column, node, spec_node. The wire format is depth-first recursive: each node writes m_output_columns_size, the columns, m_nested_nodes_size, the children. The server-side tree is rebuilt with stx_alloc_struct from the per-query private pool, so its lifetime is bounded by the XASL invocation.
The crucial property is that m_output_value_pointer aliases an entry in the surrounding XASL’s outptr_list: when a column writes to its output value, the XASL’s projection layer reads the same memory.
Column evaluation — three shapes, one entry
Section titled “Column evaluation — three shapes, one entry”Every column kind funnels through column::evaluate, which dispatches by m_function:
// column::evaluate — src/xasl/access_json_table.cppint column::evaluate (const JSON_DOC &input, size_t ordinality){ pr_clear_value (m_output_value_pointer); db_make_null (m_output_value_pointer); switch (m_function) { case JSON_TABLE_EXTRACT: return evaluate_extract (input); case JSON_TABLE_EXISTS: return evaluate_exists (input); case JSON_TABLE_ORDINALITY: return evaluate_ordinality (ordinality); default: return ER_FAILED; }}evaluate_extract calls db_json_extract_document_from_path (input, m_path, docp); on empty path-match it fires trigger_on_empty; if tp_value_cast returns DOMAIN_INCOMPATIBLE (JSON string "abc" vs SQL INTEGER) it fires trigger_on_error. evaluate_exists calls db_json_contains_path, stores 1/0, then casts to the column domain. evaluate_ordinality writes the ordinality argument as an integer.
trigger_on_error and trigger_on_empty encode the behaviour matrix:
// column::trigger_on_error — condensedswitch (m_on_error.m_behavior) { case JSON_TABLE_RETURN_NULL: er_clear (); return NO_ERROR; case JSON_TABLE_THROW_ERROR: er_set (..., ER_JSON_TABLE_ON_ERROR_INCOMP_DOMAIN, ...); return ER_JSON_TABLE_ON_ERROR_INCOMP_DOMAIN; case JSON_TABLE_DEFAULT_VALUE: pr_clone_value (m_on_error.m_default_value, &value_out); return NO_ERROR;}er_clear () on the RETURN_NULL arm is load-bearing: earlier code may have set a thread-local error during the failed extract; the column swallows it and continues. trigger_on_empty is the same matrix.
Scanner and cursor stack — the transient state
Section titled “Scanner and cursor stack — the transient state”The runtime layer is a single class with one private inner struct:
// scan_json_table.hpp / .cpp — condensedclass scanner {public: void init (cubxasl::json_table::spec_node &spec); void clear (xasl_node *xasl_p, bool is_final, bool is_final_clear); int open (cubthread::entry *thread_p); void end (cubthread::entry *thread_p); int next_scan (cubthread::entry *thread_p, scan_id_struct &sid, SCAN_CODE &sc); SCAN_PRED &get_predicate (); void set_value_descriptor (val_descr *vd);private: struct cursor; int scan_next_internal (cubthread::entry *thread_p, size_t depth, bool &found_row_output); int init_cursor (const JSON_DOC &doc, cubxasl::json_table::node &node, cursor &cursor_out); int set_next_cursor (const cursor ¤t_cursor, size_t next_depth); int set_input_document (cursor &c, const cubxasl::json_table::node &node, const JSON_DOC &doc); size_t get_tree_height (const cubxasl::json_table::node &node); void init_iterators (cubxasl::json_table::node &node); void reset_ordinality (cubxasl::json_table::node &node); void clear_node_columns (cubxasl::json_table::node &node);
cubxasl::json_table::spec_node *m_specp; cursor *m_scan_cursor; size_t m_scan_cursor_depth; size_t m_tree_height; scan_pred m_scan_predicate; val_descr *m_vd;};
struct scanner::cursor { std::size_t m_child; // current child index in non-leaf walk cubxasl::json_table::node *m_node; // points back into spec tree JSON_DOC_STORE m_input_doc; // input document at this depth const JSON_DOC *m_process_doc; // current iterator value or input doc bool m_is_row_fetched, m_need_advance_row, m_is_node_consumed, m_iteration_started; void advance_row_cursor (); void start_json_iterator (); int fetch_row (); void end ();};m_scan_cursor is a flat array sized to m_tree_height (computed once by get_tree_height), not a linked list. Depth never exceeds spec-tree height (a static property of the parse). m_scan_cursor_depth is the index of the deepest open level — where scan_next_internal is currently advancing rows. The flat-array shape is what makes recursion-by-depth+1 a constant-cost operation.
The lifecycle — open, next, end, clear
Section titled “The lifecycle — open, next, end, clear”scanner::init runs once per server-side scan-id construction. It computes m_tree_height recursively, allocates the cursor array, and pre-points each cursor at the left-most branch of the spec tree:
// scanner::init — src/query/scan_json_table.cppvoid scanner::init (cubxasl::json_table::spec_node &spec) { m_specp = &spec; m_tree_height = get_tree_height (*m_specp->m_root_node); m_scan_cursor_depth = 0; m_scan_cursor = new cursor[m_tree_height];
json_table_node *t = m_specp->m_root_node; m_scan_cursor[0].m_node = t; for (int i = 1; t->m_nested_nodes_size != 0; t = &t->m_nested_nodes[0], ++i) m_scan_cursor[i].m_node = t;
init_iterators (*m_specp->m_root_node);}init_iterators calls node::init_iterator on every iterable node (m_is_iterable_node true ⇔ path ends in [*]); the iterator is reused (rewound, not reallocated) on subsequent open calls.
scanner::open runs on the first next_scan (when sid.position == S_BEFORE):
// scanner::open — src/query/scan_json_table.cpp (condensed)int scanner::open (cubthread::entry *thread_p) { DB_VALUE *value_p = NULL; int err = fetch_peek_dbval (thread_p, m_specp->m_json_reguvar, m_vd, NULL, NULL, NULL, &value_p); if (err != NO_ERROR) return err; if (db_value_is_null (value_p)) { assert (m_scan_cursor[0].m_is_node_consumed); // NULL input ⇒ no rows return NO_ERROR; } if (db_value_type (value_p) == DB_TYPE_JSON) { err = init_cursor (*db_get_json_document (value_p), *m_specp->m_root_node, m_scan_cursor[0]); } else { JSON_DOC_STORE document; err = db_value_to_json_doc (*value_p, false, document); if (err != NO_ERROR) return err; err = init_cursor (*document.get_immutable (), *m_specp->m_root_node, m_scan_cursor[0]); } reset_ordinality (*m_specp->m_root_node); m_scan_cursor_depth = 0; return err;}fetch_peek_dbval (the regu-variable evaluator — see cubrid-scalar-functions.md) materialises m_specp->m_json_reguvar: a column ref, a host variable, or an expression like JSON_OBJECT(...). The result may already be DB_TYPE_JSON (fast path) or a string passing through db_value_to_json_doc (slow path: parse via rapidjson). The latter is what lets JSON_TABLE (varchar_col_with_json, ...) work without an explicit CAST.
init_cursor delegates to set_input_document, which does the table-level path extraction:
// scanner::set_input_document — src/query/scan_json_table.cppint scanner::set_input_document (cursor &cursor_arg, const cubxasl::json_table::node &node, const JSON_DOC &document) { cursor_arg.m_input_doc.clear (); int err = db_json_extract_document_from_path (&document, node.m_path, cursor_arg.m_input_doc); if (err != NO_ERROR) return err; if (cursor_arg.m_input_doc.is_null ()) cursor_arg.m_is_node_consumed = true; // path matched nothing else cursor_arg.start_json_iterator (); return NO_ERROR;}cursor::start_json_iterator decides whether to iterate or treat the input as a single row: if m_is_iterable_node is true (path ends in [*]), db_json_set_iterator rewinds the node’s JSON_ITERATOR to position 0 of the array; otherwise the cursor is left in single-row mode and m_process_doc = m_input_doc.
next_scan — the per-tuple driver
Section titled “next_scan — the per-tuple driver”The public per-row entry wraps scan_next_internal plus the scanner-local predicate filter:
// scanner::next_scan — src/query/scan_json_table.cpp (condensed)int scanner::next_scan (cubthread::entry *thread_p, scan_id_struct &sid, SCAN_CODE &sc) { bool has_row = false; DB_LOGICAL logical = V_FALSE;
if (sid.position == S_BEFORE) { int err = open (thread_p); if (err != NO_ERROR) { sc = S_ERROR; return err; } sid.position = S_ON; sid.status = S_STARTED; } else if (sid.position != S_ON) { sc = S_END; return ER_FAILED; }
while (true) { int err = scan_next_internal (thread_p, 0, has_row); if (err != NO_ERROR) { sc = S_ERROR; return err; } if (!has_row) { sid.position = S_AFTER; sc = S_END; return NO_ERROR; } if (m_scan_predicate.pred_expr == NULL) break; logical = m_scan_predicate.pr_eval_fnc (thread_p, m_scan_predicate.pred_expr, sid.vd, NULL); if (logical == V_TRUE) break; if (logical == V_ERROR) { sc = S_ERROR; return ER_FAILED; } // V_FALSE / V_UNKNOWN → loop, try next row } sc = S_SUCCESS; return NO_ERROR;}Two design points. Lazy open on first call: next_scan does the first fetch_peek_dbval itself; for correlated JSON_TABLE inside a nested-loop join, each outer row triggers one open, re-fetches the JSON expression, and rebuilds the cursor. Predicate filter is scanner-local: skipping a non-qualifying row must still advance the JSON iterator; if next_scan returned a “skip” code to scan_handle_single_scan, the outer would re-enter from the same iterator state. Only the scanner knows the iterator position, so it owns the skip-loop.
scan_next_internal — the FSM in disguise
Section titled “scan_next_internal — the FSM in disguise”The recursive engine is ~120 lines; every flag carries weight. The shape:
// scanner::scan_next_internal — src/query/scan_json_table.cpp (condensed)int scanner::scan_next_internal (cubthread::entry *thread_p, size_t depth, bool &found_row_output) { cursor &this_cursor = m_scan_cursor[depth];
// (A) Resume into child if we descended on a previous call. if (m_scan_cursor_depth >= depth + 1) { int err = scan_next_internal (thread_p, depth + 1, found_row_output); if (err != NO_ERROR) return err; if (found_row_output) return NO_ERROR; this_cursor.m_child++; }
while (!this_cursor.m_is_node_consumed) { if (this_cursor.m_need_advance_row) { this_cursor.advance_row_cursor (); if (this_cursor.m_is_node_consumed) break; } int err = this_cursor.fetch_row (); if (err != NO_ERROR) return err;
// (C) Leaf: every row is output. if (this_cursor.m_node->m_nested_nodes_size == 0) { found_row_output = true; this_cursor.m_need_advance_row = true; return NO_ERROR; } // (D) Non-leaf, all children visited: at most one spine row. if (this_cursor.m_child == this_cursor.m_node->m_nested_nodes_size) { this_cursor.m_need_advance_row = true; if (this_cursor.m_iteration_started) continue; // children already emitted found_row_output = true; return NO_ERROR; // sibling-NULL row } // (E) Descend into next child. err = set_next_cursor (this_cursor, depth + 1); if (err != NO_ERROR) return err; cursor &next_cursor = m_scan_cursor[depth + 1]; if (!next_cursor.m_is_node_consumed) { m_scan_cursor_depth++; this_cursor.m_iteration_started = true; err = scan_next_internal (thread_p, depth + 1, found_row_output); if (err != NO_ERROR) return err; } else { this_cursor.m_child++; continue; } if (found_row_output) return NO_ERROR; this_cursor.m_child++; }
// (F) Node fully consumed — pop a level. found_row_output = false; if (m_scan_cursor_depth > 0) m_scan_cursor_depth--; return NO_ERROR;}stateDiagram-v2 [*] --> CursorOpen : init_cursor CursorOpen --> FetchRow : iterator has element FetchRow --> EmitLeaf : leaf node FetchRow --> DescendChild : has children, m_child < N DescendChild --> ChildEmits ChildEmits --> Caller Caller --> ResumeChild FetchRow --> EmitSpine : m_child == N AND NOT m_iteration_started FetchRow --> Skip : m_child == N AND m_iteration_started EmitLeaf --> AdvanceRow Skip --> AdvanceRow AdvanceRow --> CursorOpen : advance_row_cursor AdvanceRow --> NodeDone : iterator exhausted
Three rules fall out of the flag manipulation. Rule 1: a leaf row is always output (arc C); m_need_advance_row ensures the next call advances. Rule 2: a non-leaf spine row is output only if no child emitted (arc D, gated by m_iteration_started) — this implements SQL:2016 left-join semantics where empty NESTED PATH arrays still produce a parent row with NULLs. Rule 3: siblings do not cross-product — the header comment is explicit: “while one nested path is expanded, the values for sibling nested paths will be all null.” This is enforced by cursor::end which calls node::clear_columns to NULL out the just-finished sibling’s columns before the next sibling’s expansion populates its own:
// scanner::cursor::end — src/query/scan_json_table.cppvoid scanner::cursor::end (void) { m_is_node_consumed = true; db_json_reset_iterator (m_node->m_iterator); m_process_doc = NULL; m_node->clear_columns (false); // NULL all this node's columns}fetch_row — populate the columns at this depth
Section titled “fetch_row — populate the columns at this depth”// scanner::cursor::fetch_row — src/query/scan_json_table.cppint scanner::cursor::fetch_row (void) { if (m_is_row_fetched) return NO_ERROR; // idempotent guard
m_process_doc = (m_node->m_iterator != NULL) ? db_json_iterator_get_document (*m_node->m_iterator) // peek into iterator : m_input_doc.get_immutable (); // single-row mode
for (size_t i = 0; i < m_node->m_output_columns_size; ++i) { int err = m_node->m_output_columns[i].evaluate (*m_process_doc, m_node->m_ordinality); if (err != NO_ERROR) return err; } return NO_ERROR;}db_json_iterator_get_document returns a peek pointer into the iterator’s current array element — no copy, no allocation. Each column’s evaluate re-extracts its own per-column path relative to this peek document. The two-level path system (node-level path + column-level path) is what makes '$.users[*]' COLUMNS (a INT PATH '$.age') work: row context from the node, per-column extraction from the column.
advance_row_cursor increments the iterator and the per-node ordinality:
// scanner::cursor::advance_row_cursor — src/query/scan_json_table.cppvoid scanner::cursor::advance_row_cursor () { m_need_advance_row = false; m_iteration_started = false; if (m_node->m_iterator == NULL || !db_json_iterator_has_next (*m_node->m_iterator)) { end (); return; } db_json_iterator_next (*m_node->m_iterator); m_is_row_fetched = false; m_node->m_ordinality++; m_child = 0;}The 1-based ordinality lives on the node (one counter per nesting level). reset_ordinality resets all of them to 1 at every open.
Walking the tree — concrete trace
Section titled “Walking the tree — concrete trace”For SELECT * FROM JSON_TABLE ('{"a":1, "arr":[{"c":10},{"c":20}]}', '$' COLUMNS (a INT PATH '$.a', NESTED PATH '$.arr[*]' COLUMNS (c INT PATH '$.c'))) jt; the spec is root ($, column a, one child) and child ($.arr[*], iterable, column c).
sequenceDiagram participant NS as scanner::next_scan participant SI as scan_next_internal participant Root as cursor[0] participant Child as cursor[1] NS->>NS: open: fetch JSON, init_cursor[0] NS->>SI: scan_next_internal(0) SI->>Root: fetch_row → a:=1 SI->>Child: set_next_cursor, recurse(1), fetch_row → c:=10 SI-->>NS: row 1: (a=1, c=10), m_need_advance_row=true NS->>SI: next_scan call → resume at depth=1 SI->>Child: advance_row_cursor → c:=20 SI-->>NS: row 2: (a=1, c=20) NS->>SI: next call → resume(1) → iterator exhausted ⇒ end SI->>Root: m_iteration_started=true ⇒ continue, advance, end SI-->>NS: S_END
If the JSON had been {"a":1, "arr":[]}, the child would be m_is_node_consumed=true immediately after init_cursor. Arc (D) of scan_next_internal then fires on the root: m_iteration_started is false, so the spine row (a=1, c=NULL) is emitted — the SQL:2016 left-join behaviour.
Cleanup — clear, end, and the rebind path
Section titled “Cleanup — clear, end, and the rebind path”scanner::clear is parameterised by two booleans that select among three cleanup intensities:
// scanner::clear — src/query/scan_json_table.cpp (condensed)void scanner::clear (xasl_node *xasl_p, bool is_final, bool is_final_clear) { m_specp->m_root_node->clear_xasl (is_final_clear); reset_ordinality (*m_specp->m_root_node); if (is_final) { for (size_t i = 0; i < m_tree_height; ++i) { /* reset cursor[i] flags */ } m_specp->m_root_node->clear_iterators (is_final_clear); if (is_final_clear) delete [] m_scan_cursor; }}is_final=false: scan paused (next outer-loop row); only column values are cleared. is_final=true, is_final_clear=false: scan ends this iteration but XASL will re-execute; cursors and iterators are emptied but allocations kept. is_final=true, is_final_clear=true: XASL torn down for good; iterators are deleted and the cursor array freed. This three-way granularity matches the executor’s cache-vs-tear-down policy and lets a JSON_TABLE survive correlated rebinds without leaking.
Wiring into SCAN_ID — the dispatch layer
Section titled “Wiring into SCAN_ID — the dispatch layer”Fan-out is two functions in scan_manager.c:
// scan_open_json_table_scan + scan_next_json_table_scan — src/query/scan_manager.cint scan_open_json_table_scan (THREAD_ENTRY *thread_p, SCAN_ID *scan_id, int grouped, QPROC_SINGLE_FETCH single_fetch, DB_VALUE *join_dbval, val_list_node *val_list, VAL_DESCR *vd, PRED_EXPR *pr) { DB_TYPE single_node_type = DB_TYPE_NULL; assert (scan_id->type == S_JSON_TABLE_SCAN); scan_init_scan_id (scan_id, false, S_SELECT, true, grouped, single_fetch, join_dbval, val_list, vd); scan_init_scan_pred (&scan_id->s.jtid.get_predicate (), NULL, pr, ((pr) ? eval_fnc (thread_p, pr, &single_node_type) : NULL)); scan_id->s.jtid.set_value_descriptor (vd); return NO_ERROR;}static SCAN_CODE scan_next_json_table_scan (THREAD_ENTRY *thread_p, SCAN_ID *scan_id) { SCAN_CODE sc; int err = scan_id->s.jtid.next_scan (thread_p, *scan_id, sc); return (err != NO_ERROR) ? S_ERROR : sc;}scan_open_json_table_scan does not call scanner::init; the scanner state (m_specp, cursor array, iterators) is built by XASL deserialisation in xasl_stream.cpp plus a call to scanner::init triggered when qexec_open_scan traverses the unpacked XASL_NODE. This function only sets up the outer SCAN_ID metadata.
Dispatch back into the scanner: one arm in scan_next_scan_local (case S_JSON_TABLE_SCAN: status = scan_next_json_table_scan (thread_p, scan_id);) and one arm in qexec_open_scan (case TARGET_JSON_TABLE: calling scan_open_json_table_scan). XASL generation tags the access-spec TARGET_JSON_TABLE based on PT_DERIVED_JSON_TABLE at semantic-check time.
flowchart LR Q[SQL with JSON_TABLE] --> P[PT_JSON_TABLE] P --> XG[xasl_generation:<br/>ACCESS_SPEC TARGET_JSON_TABLE,<br/>spec_node] XG --> XS[xasl_stream serialise] XS --> QE["qexec_open_scan: TARGET_JSON_TABLE"] QE --> SID[SCAN_ID.type = S_JSON_TABLE_SCAN] SID --> NL[scan_next_scan_local switch] NL --> NS[scanner::next_scan] NS --> SI[scan_next_internal] SI --> JI[db_json_iterator_*<br/>db_json_extract_document_from_path] SI --> CE[column::evaluate<br/>tp_value_cast] CE --> OL[outptr_list DB_VALUEs] NS --> Pred[m_scan_predicate.pr_eval_fnc]
Source Walkthrough
Section titled “Source Walkthrough”Symbols are anchored on names; line numbers in the position-hints table are decay-prone hints scoped to this document’s updated: date.
Cross-module header — src/compat/json_table_def.h
Section titled “Cross-module header — src/compat/json_table_def.h”enum json_table_column_behavior_type (RETURN_NULL/THROW_ERROR/DEFAULT_VALUE); enum json_table_column_function (EXTRACT/EXISTS/ORDINALITY); struct json_table_column_behavior (m_behavior plus m_default_value); enum json_table_expand_type (ARRAY/OBJECT/NO_EXPAND — advisory; runtime uses m_is_iterable_node).
Parser — src/parser/csql_grammar.y, name_resolution.c, parse_tree.h
Section titled “Parser — src/parser/csql_grammar.y, name_resolution.c, parse_tree.h”JSON_TABLE keyword; bison productions json_table_rule, json_table_node_rule, json_table_column_rule, json_table_column_list_rule, json_table_on_error_rule_optional, json_table_on_empty_rule_optional, json_table_column_behavior_rule; pt_jt_append_column_or_nested_node (column-vs-nested separator). PT_NODE types PT_JSON_TABLE, PT_JSON_TABLE_NODE, PT_JSON_TABLE_COLUMN; info structs pt_json_table_info, pt_json_table_node_info, pt_json_table_column_info; derived-table type PT_DERIVED_JSON_TABLE. Name resolution: pt_get_all_json_table_attributes_and_types, pt_json_table_gather_attribs (synthesise as_attr_list); json_table_column_count (per-invocation index counter).
XASL spec layer — src/xasl/access_json_table.{hpp,cpp}
Section titled “XASL spec layer — src/xasl/access_json_table.{hpp,cpp}”Namespace cubxasl::json_table with structs column, node, spec_node. Column: evaluate, evaluate_extract, evaluate_exists, evaluate_ordinality, trigger_on_error, trigger_on_empty, clear_xasl. Node: init_iterator, clear_columns, clear_iterators, clear_xasl, init_ordinality. Spec: clear_xasl. Aliases json_table_column, json_table_node, json_table_spec_node.
XASL wire format — src/xasl/xasl_stream.cpp
Section titled “XASL wire format — src/xasl/xasl_stream.cpp”stx_build overloads for column, node, spec_node; stx_unpack for json_table_column_behavior; xasl_stream_compare overloads for cache equivalence.
Runtime scanner — src/query/scan_json_table.{hpp,cpp}
Section titled “Runtime scanner — src/query/scan_json_table.{hpp,cpp}”Class cubscan::json_table::scanner; inner scanner::cursor. Cursor methods: advance_row_cursor, start_json_iterator, fetch_row, end. Scanner public: init, clear, open, end, next_scan, get_predicate, set_value_descriptor. Scanner private: get_tree_height, init_iterators, reset_ordinality, clear_node_columns, set_input_document, init_cursor, set_next_cursor, scan_next_internal. Alias JSON_TABLE_SCAN_ID = cubscan::json_table::scanner.
SCAN_ID dispatch + executor wiring
Section titled “SCAN_ID dispatch + executor wiring”SCAN_TYPE::S_JSON_TABLE_SCAN, scan_id_struct.s.jtid, scan_open_json_table_scan, scan_next_json_table_scan (in scan_manager.c); the case S_JSON_TABLE_SCAN: arms in scan_next_scan_local, scan_start_scan, scan_end_scan, scan_close_scan, scan_reset_scan_block, scan_next_scan_block (most no-ops). case TARGET_JSON_TABLE: arm in qexec_open_scan (query_executor.c).
JSON support kernels (consumed) — src/compat/db_json.{hpp,cpp}
Section titled “JSON support kernels (consumed) — src/compat/db_json.{hpp,cpp}”db_json_extract_document_from_path (path kernel), db_json_contains_path (EXISTS PATH), iterator API (db_json_set_iterator / iterator_next / _has_next / _get_document / create_iterator / delete_json_iterator / clear_json_iterator / reset_iterator), db_value_to_json_doc (implicit cast for VARCHAR JSON), db_json_get_type, db_json_get_raw_json_body_from_document.
Position hints (as of this revision)
Section titled “Position hints (as of this revision)”| Symbol | File | Line |
|---|---|---|
enum json_table_column_function | src/compat/json_table_def.h | 38 |
enum json_table_column_behavior_type | src/compat/json_table_def.h | 31 |
struct json_table_column_behavior | src/compat/json_table_def.h | 45 |
cubxasl::json_table::column | src/xasl/access_json_table.hpp | 46 |
cubxasl::json_table::node | src/xasl/access_json_table.hpp | 74 |
cubxasl::json_table::spec_node | src/xasl/access_json_table.hpp | 96 |
column::trigger_on_error / _on_empty | src/xasl/access_json_table.cpp | 43 / 81 |
column::evaluate_extract / _exists / _ordinality | src/xasl/access_json_table.cpp | 129 / 171 / 197 |
column::evaluate / clear_xasl | src/xasl/access_json_table.cpp | 207 / 235 |
node::clear_columns / clear_iterators / clear_xasl / init_iterator | src/xasl/access_json_table.cpp | 270 / 280 / 298 / 309 |
spec_node::clear_xasl | src/xasl/access_json_table.cpp | 337 |
cubscan::json_table::scanner / JSON_TABLE_SCAN_ID alias | src/query/scan_json_table.hpp | 109 / 171 |
scanner::cursor (struct) | src/query/scan_json_table.cpp | 37 |
cursor::advance_row_cursor / start_json_iterator / fetch_row / end | src/query/scan_json_table.cpp | 72 / 98 / 109 / 150 |
scanner::get_tree_height / init / clear / open / end | src/query/scan_json_table.cpp | 161 / 175 / 198 / 229 / 290 |
scanner::next_scan / set_input_document / init_cursor / set_next_cursor | src/query/scan_json_table.cpp | 296 / 359 / 387 / 397 |
scanner::clear_node_columns / init_iterators / reset_ordinality | src/query/scan_json_table.cpp | 405 / 415 / 426 |
scanner::scan_next_internal / get_predicate / set_value_descriptor | src/query/scan_json_table.cpp | 437 / 564 / 570 |
scan_open_json_table_scan / scan_next_json_table_scan | src/query/scan_manager.c | 4036 / 7014 |
case S_JSON_TABLE_SCAN in scan_next_scan_local | src/query/scan_manager.c | 5273 |
case TARGET_JSON_TABLE in qexec_open_scan | src/query/query_executor.c | 7591 |
S_JSON_TABLE_SCAN enum / jtid union member | src/query/scan_manager.h | 83 / 412 |
bison json_table_rule / _node_rule / _column_rule / _column_behavior_rule | src/parser/csql_grammar.y | 21957 / 21944 / 21888 / 21838 |
pt_get_all_json_table_attributes_and_types / pt_json_table_gather_attribs | src/parser/name_resolution.c | 4972 / 4952 |
stx_build (column / node / spec_node) / stx_unpack (behavior) | src/xasl/xasl_stream.cpp | 360 / 393 / 437 / 463 |
db_json_iterator_next / _has_next / _get_document / set_iterator / reset_iterator / create_iterator / delete_json_iterator / clear_json_iterator | src/compat/db_json.hpp | 136 / 138 / 137 / 139 / 140 / 142 / 143 / 144 |
Cross-check Notes
Section titled “Cross-check Notes”Against cubrid-scan-manager.md. That doc owns the SCAN_ID polymorphism and lifecycle protocol; the boundary is the case S_JSON_TABLE_SCAN: arm in scan_next_scan_local plus the scan_open_json_table_scan / scan_next_json_table_scan shim pair. This doc owns the cursor stack, FSM, and NESTED-PATH semantics.
Against cubrid-scalar-functions.md. That doc lists db_evaluate_json_extract etc. as the per-F_JSON_* scalar functions. JSON_TABLE does not go through that path; column::evaluate_extract calls db_json_extract_document_from_path directly. Reason: JSON_TABLE extracts over a peek iterator (no copy, no DB_VALUE allocation around the JSON_DOC); scalar JSON_EXTRACT produces a fresh DB_VALUE of DB_TYPE_JSON. Kernel shared, wrapping differs.
Predicate placement is scanner-internal. The WHERE filter is wired into m_scan_predicate and evaluated inside scanner::next_scan, not at the SCAN_ID-level scan_handle_single_scan. A skipped row must still advance the JSON iterator; only the inner function knows that state. Same pattern is used by scan_next_set_scan.
m_output_value_pointer aliases the XASL outptr_list. The XASL builder allocates one DB_VALUE per output column; the column writes via db_make_json_from_doc_store_and_release directly to that aliased slot. The pr_clear_value at the start of every evaluate is the lifetime contract.
Iterator allocation is per-XASL-invocation. JSON_ITERATORs attached to each node are allocated at scanner::init, rewound (not reallocated) at every start_json_iterator, and freed by clear_iterators with is_final_clear=true — avoiding per-row allocation on large arrays.
Re-entry on rebind. For correlated JSON_TABLE inside a nested-loop join, clear (is_final=false) resets column values; the next next_scan re-runs open, re-fetch_peek_dbvals the expression, and rebuilds the cursor.
Drift. json_table_expand_type enum is advisory (runtime uses m_is_iterable_node); only ARRAY_EXPAND is wired. pt_json_table_info::is_correlated is set at semantic check but the runtime always re-fetches on rebind.
Open Questions
Section titled “Open Questions”-
JSONPath filter expressions. RFC 9535 and MySQL/Oracle support
?(@.age > 18); CUBRID covers only the basic subset. Adding filters would push selection into the row-source. -
Object expansion.
JSON_TABLE_OBJECT_EXPANDis unwired.'$.*'over an object could yield one row per member; the runtime hook is small but the parser syntax has not been settled. -
Per-node predicate pushdown. The header comment calls this out: “partition the scan predicate on scan nodes and filter invalid rows at node level”. Today filtering is leaf-only; node-aware filtering could prune entire NESTED-PATH branches.
-
Streaming the iterator.
JSON_DOCis fully materialised. A SAX-style parser would avoid materialising the whole document when only a small subset is touched. -
Sibling cross-product option. SQL:2016 prescribes left-join semantics; some users want Cartesian product. A per-NESTED-PATH
CROSSkeyword would express it without breaking existing semantics. -
Common-subexpression elimination on column paths. Two columns with the same path call
db_json_extract_document_from_pathtwice; the XASL builder could share the extraction.
Sources
Section titled “Sources”Code paths consumed: src/query/scan_json_table.{cpp,hpp}, src/xasl/access_json_table.{cpp,hpp}, src/compat/json_table_def.h, src/query/scan_manager.{c,h}, src/query/query_executor.c, src/parser/csql_grammar.y, src/parser/parse_tree.h, src/parser/name_resolution.c, src/xasl/xasl_stream.cpp, src/compat/db_json.hpp. Context: src/compat/db_json.cpp, src/query/string_opfunc.c.
Theoretical references: ISO/IEC 9075-2:2016 §6.36 “JSON_TABLE”; IETF RFC 9535 (2024) JSONPath: Query Expressions for JSON; Goessner (2007) JSONPath — XPath for JSON; Graefe (1994) Volcano, IEEE TKDE 6(1); Graefe (1993) Query Evaluation Techniques, ACM Computing Surveys 25(2); Silberschatz/Korth/Sudarshan, Database System Concepts 7th ed.
Cross-refs: knowledge/code-analysis/cubrid/cubrid-scan-manager.md (SCAN_ID/SCAN_TYPE polymorphism); knowledge/code-analysis/cubrid/cubrid-scalar-functions.md (peer JSON scalar functions through db_json).