CUBRID JSON_TABLE: A Table Function That Unfolds a JSON Document into Virtual Rows

Academic Background
Common DBMS Design
CUBRID's Implementation
Source Code Guide
Cross-check Notes
Open Questions
Sources

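Academic Background

JSON_TABLE is a table function: it is called in the FROM clause and returns a relation. Textbooks call the leaves of the tuple-producing operator tree row sources or table expressions. JSON_TABLE (expr, '$.path' COLUMNS (...)) takes one JSON document and one column specification and lets the engine treat the result like a base table.

The standard reference is ISO/IEC 9075-2:2016 §6.36 JSON_TABLE (JSON support packages X401 / X402). The standard defines four column shapes: regular (name type PATH 'expr'), EXISTS (name BOOLEAN EXISTS PATH 'expr'), ORDINALITY (name FOR ORDINALITY), and NESTED (NESTED PATH 'expr' COLUMNS (...)). It also defines per-column ON ERROR / ON EMPTY clauses, each of which selects one of NULL (the default), ERROR (raise an exception), or DEFAULT v (substitute a value). These knobs are what make JSON_TABLE an ETL-grade primitive: a whole batch must not stop just because one row's $.user.age is the string unknown instead of a number.

The query language is JSONPath. It began with Goessner's 2007 article and was standardized as RFC 9535 (2024). CUBRID's surface syntax is the conservative subset that all major engines agree on: $ (root), .name, [i], [*], and an optional ** (recursive descent). The semantic core is the array wildcard at the table root ('$.[*]'), which spreads one document into multiple rows. Like MySQL, CUBRID treats the parent-child relationship as a left join and gives siblings NULL semantics rather than a Cartesian product.

The backbone of the implementation is the iterator model over tree-shaped state. One JSON_TABLE call holds a tree of column specifications (the root, plus each NESTED PATH recursively as a child), and one cursor per call walks that tree depth-first while the executor pulls rows. Graefe's Volcano paper (TKDE 1994) views this structure as a tree of nested iterators: when the inner one is exhausted the outer one advances, and the inner one is reset at each outer position. JSON_TABLE applies this pattern directly to JSON array iteration. The essential state is the cursor that records where in the tree the scan currently stands.

The last pillar is column projection with type coercion. JSON carries no declared column types, while SQL columns have fixed types, so the mapping between them is necessarily lossy. Every (JSON kind × SQL kind) pair needs a defined behavior, and ON ERROR / ON EMPTY serve as the implementation-defined fallback. CUBRID reuses tp_value_cast for the conversion and stores the failure path in a per-column json_table_column_behavior record.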

Common DBMS Design

Every engine that has shipped JSON_TABLE (or an equivalent) converges on three layers: a specification tree built by the parser, a cursor that runs per row, and a per-column evaluator that coerces JSON into SQL.

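MySQL (8.0+) is the closest model and, of the open-source engines listed here, the first to ship JSON_TABLE (Oracle's 12c implementation predates it). At parse time it builds a tree of Json_table_column objects, and at execution time a Json_seekable_iterator walks the Json_dom and invokes per-column subclasses (Path, Exists, For_ordinality, Nested). The rule "nested paths are left-joined" is the one CUBRID copies verbatim into the header comment of scan_json_table.hpp.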

MariaDB (10.6+) walks its Json_table_column tree by recursive descent in Json_table_nested_path::scan_next, a shape close to CUBRID's scan_next_internal.

Oracle added JSON_TABLE in 12c (2013), ahead of the standard, and implemented it as a row-source operator that also handles XMLTABLE. It supports richer JSONPath filters (?(@.age > 18)).

SQL Server ships OPENJSON (@json, '$.path') WITH (...) instead. It has no native NESTED PATH; nested expansion has to be expressed with CROSS APPLY OPENJSON(...).

PostgreSQL added JSON_TABLE only in 17 (2024), layered on top of jsonb_to_recordset + jsonb_path_query. Execution runs ExecJsonTable + JsonTablePlanState over a JsonTablePlan tree, which is the shape most similar to CUBRID's.

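Where CUBRID sits. It is functionally complete with respect to SQL:2016. A single C++ class (cubscan::json_table::scanner) contains a cursor stack keyed by depth and the recursive scan_next_internal. Persistent state (the specification tree of cubxasl::json_table::node) lives in src/xasl/access_json_table.cpp and survives XASL serialization; transient state lives inside the scanner. Because it plugs into SCAN_ID as S_JSON_TABLE_SCAN, the executor treats it exactly like any other scan.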

CUBRID's Implementation

Three layers and one discriminator

The implementation splits across three directories and three layers.

flowchart TB
  subgraph Parser["src/parser/ (parse time)"]
    G[csql_grammar.y]
    PT[PT_JSON_TABLE family]
    G --> PT
  end
  subgraph XaslGen["src/xasl/"]
    SP[cubxasl::json_table::spec_node]
    NN[node + column]
    XS[xasl_stream.cpp]
    PT --> SP --> NN
    SP --> XS
  end
  subgraph Runtime["src/query/"]
    SC[cubscan::json_table::scanner]
    CUR[cursor stack]
    SID[SCAN_ID.s.jtid<br/>S_JSON_TABLE_SCAN]
    SP --> SC --> CUR
    SC -.alias.-> SID
  end
  Runtime --> Exec[qexec_open_scan: TARGET_JSON_TABLE]

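Figure 1: The three-layer structure. The parser (csql_grammar.y) builds the PT_JSON_TABLE parse tree, and the XASL layer converts it into spec_node, node, and column and serializes it. In the runtime layer, cubscan::json_table::scanner iterates with a cursor stack and, through the alias SCAN_ID.s.jtid, becomes the first-class scan type S_JSON_TABLE_SCAN.

It is exactly this naming alias that makes JSON_TABLE a first-class SCAN_TYPE rather than a bolted-on feature.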

// scan_json_table.hpp — naming alias
using JSON_TABLE_SCAN_ID = cubscan::json_table::scanner;

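Because scanner is a C++17 standard-layout class with a default constructor, the union inside the C-side scan_id_struct can hold it by value. The cost is a few dozen extra bytes on every SCAN_ID (even when the actual scan is a heap scan); in return, dispatch needs only a discriminator check, with no virtual call and no heap allocation.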
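To make the by-value embedding concrete, here is a minimal sketch of a discriminated C-style union that stores a scanner-like class directly. Every name in it (toy_scan_id, toy_json_scanner, toy_open_json_scan, toy_next) is hypothetical; the types stand in for scan_id_struct and cubscan::json_table::scanner and do not reproduce them.

```cpp
#include <type_traits>

// Hypothetical stand-ins; none of these are CUBRID's real types.
enum toy_scan_type { TOY_HEAP_SCAN, TOY_JSON_TABLE_SCAN };

struct toy_heap_scan { int page; int slot; };

class toy_json_scanner {
public:
  toy_json_scanner () = default;        // trivial, so the enclosing union needs no constructor
  void init () { m_rows_emitted = 0; }
  int next_row () { return ++m_rows_emitted; }
private:
  int m_rows_emitted;
};

static_assert (std::is_standard_layout<toy_json_scanner>::value,
               "scanner must be storable by value in a C-style union");

struct toy_scan_id {
  toy_scan_type type;                   // discriminator
  union {
    toy_heap_scan hsid;
    toy_json_scanner jtid;              // held by value: no heap allocation
  } s;
};

inline void toy_open_json_scan (toy_scan_id &sid) {
  sid.type = TOY_JSON_TABLE_SCAN;
  sid.s.jtid = toy_json_scanner ();     // assignment makes jtid the active member
  sid.s.jtid.init ();
}

// Dispatch is a switch on the discriminator; no virtual call is involved.
inline int toy_next (toy_scan_id &sid) {
  switch (sid.type) {
    case TOY_JSON_TABLE_SCAN: return sid.s.jtid.next_row ();
    case TOY_HEAP_SCAN:       return ++sid.s.hsid.slot;
  }
  return -1;                            // unknown scan type
}
```

The trade-off matches the one described above: every toy_scan_id is as large as its biggest union member, whichever scan type is active.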

The specification tree: the shape of persistent state

The parser turns JSON_TABLE (expr, '$.path' COLUMNS (...)) into three PT_NODE types declared in parse_tree.h.

// parse_tree.h — condensed
struct pt_json_table_info       { PT_NODE *expr; PT_NODE *tree; bool is_correlated; };
struct pt_json_table_node_info  { PT_NODE *columns; PT_NODE *nested_paths; char *path; };
struct pt_json_table_column_info{ PT_NODE *name; char *path; size_t index;
                                  enum json_table_column_function func;
                                  json_table_column_behavior on_error, on_empty; };

The grammar (csql_grammar.y) builds this tree in json_table_rule / json_table_node_rule / json_table_column_rule, and pt_jt_append_column_or_nested_node routes each parsed item to the current node's columns or nested_paths. The three column kinds are distinguished by json_table_column_function: JSON_TABLE_EXTRACT (the default), JSON_TABLE_EXISTS, and JSON_TABLE_ORDINALITY.

XASL generation transcribes the parser tree into a runtime tree in the cubxasl::json_table namespace.

// access_json_table.hpp — the runtime spec tree
struct column {
  tp_domain *m_domain;                       // SQL type to coerce into
  char *m_path; char *m_column_name;
  json_table_column_behavior m_on_error, m_on_empty;
  db_value *m_output_value_pointer;          // alias of an XASL outptr_list slot
  json_table_column_function m_function;
  int evaluate (const JSON_DOC &input, size_t ordinality);
};
struct node {
  char *m_path; size_t m_ordinality;         // 1-based row counter
  column *m_output_columns;  size_t m_output_columns_size;
  node *m_nested_nodes;      size_t m_nested_nodes_size;
  size_t m_id;
  JSON_ITERATOR *m_iterator;                 // iterator over the current array
  bool m_is_iterable_node;                   // true when m_path ends with [*]
  void init_iterator (); void clear_columns (bool is_final_clear);
};
struct spec_node {
  node *m_root_node;
  regu_variable_node *m_json_reguvar;        // JSON expression fed to the scan
  std::size_t m_node_count;
};

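This tree survives XASL serialization. xasl_stream.cpp has a stx_build overload for each of column, node, and spec_node. The wire format is a depth-first recursion: each node writes m_output_columns_size, its columns, m_nested_nodes_size, and then its children, in that order. When the server rebuilds the tree, stx_alloc_struct takes memory from the per-query private pool, so the tree's lifetime is tied to a single XASL invocation.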
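The depth-first layout can be illustrated with a toy round trip. This is a sketch under assumptions, not the stx_build code: toy_node, toy_pack, and toy_unpack are invented names, and a vector of strings stands in for the real byte stream.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical spec node: column names plus nested children.
struct toy_node {
  std::vector<std::string> columns;
  std::vector<toy_node> nested;
};

// Writes: column count, column names, nested count, then each child recursively.
inline void toy_pack (const toy_node &n, std::vector<std::string> &out) {
  out.push_back (std::to_string (n.columns.size ()));
  for (const std::string &c : n.columns) out.push_back (c);
  out.push_back (std::to_string (n.nested.size ()));
  for (const toy_node &child : n.nested) toy_pack (child, out);
}

// Reads the same layout back; pos advances through the stream.
inline toy_node toy_unpack (const std::vector<std::string> &in, std::size_t &pos) {
  toy_node n;
  std::size_t ncols = std::stoul (in[pos++]);
  for (std::size_t i = 0; i < ncols; ++i) n.columns.push_back (in[pos++]);
  std::size_t nnested = std::stoul (in[pos++]);
  for (std::size_t i = 0; i < nnested; ++i) n.nested.push_back (toy_unpack (in, pos));
  return n;
}
```

Because each count precedes the items it describes, the reader never needs to look ahead, which is what lets the unpacker rebuild the tree in a single pass.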

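One key property is that m_output_value_pointer aliases an entry of the enclosing XASL's outptr_list. When a column writes its output value, the XASL projection layer reads the same memory.

Column evaluation: three shapes, one entry point

Every column kind flows into the single column::evaluate, which branches on m_function.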

// column::evaluate — src/xasl/access_json_table.cpp
int column::evaluate (const JSON_DOC &input, size_t ordinality)
{
  pr_clear_value (m_output_value_pointer);  db_make_null (m_output_value_pointer);
  switch (m_function) {
    case JSON_TABLE_EXTRACT:    return evaluate_extract (input);
    case JSON_TABLE_EXISTS:     return evaluate_exists (input);
    case JSON_TABLE_ORDINALITY: return evaluate_ordinality (ordinality);
    default:                    return ER_FAILED;
  }
}

evaluate_extract calls db_json_extract_document_from_path (input, m_path, docp). If the path matches nothing, it fires trigger_on_empty. If tp_value_cast returns DOMAIN_INCOMPATIBLE (for example, when converting the JSON string abc to SQL INTEGER), it fires trigger_on_error. evaluate_exists calls db_json_contains_path, stores 1 / 0, and then converts to the column domain. evaluate_ordinality writes the ordinality argument as an integer unchanged.

trigger_on_error and trigger_on_empty encode the behavior matrix.

// column::trigger_on_error — condensed
switch (m_on_error.m_behavior) {
  case JSON_TABLE_RETURN_NULL:    er_clear (); return NO_ERROR;
  case JSON_TABLE_THROW_ERROR:    er_set (..., ER_JSON_TABLE_ON_ERROR_INCOMP_DOMAIN, ...);
                                  return ER_JSON_TABLE_ON_ERROR_INCOMP_DOMAIN;
  case JSON_TABLE_DEFAULT_VALUE:  pr_clone_value (m_on_error.m_default_value, &value_out);
                                  return NO_ERROR;
}

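The er_clear () in the RETURN_NULL branch is a line that carries weight. Earlier code may already have set a thread-local error during the failed extraction, and the column has to swallow that error and carry on. trigger_on_empty follows the same matrix.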
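The behavior matrix can be modeled in a few lines. The sketch below is an illustration under assumptions rather than CUBRID code: toy_policy, toy_behavior, and toy_extract_int are hypothetical names, std::stoi stands in for tp_value_cast, and a C++ exception stands in for er_set.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

enum class toy_behavior { return_null, throw_error, default_value };

struct toy_policy {
  toy_behavior behavior = toy_behavior::return_null;
  int default_value = 0;                 // used only for default_value
};

// Returns nullopt for SQL NULL. `raw` is the text found at the column path,
// or nullopt when the path matched nothing.
inline std::optional<int> toy_extract_int (const std::optional<std::string> &raw,
                                           const toy_policy &on_empty,
                                           const toy_policy &on_error) {
  auto apply = [] (const toy_policy &p, const char *what) -> std::optional<int> {
    switch (p.behavior) {
      case toy_behavior::throw_error:   throw std::runtime_error (what);
      case toy_behavior::default_value: return p.default_value;
      case toy_behavior::return_null:   break;
    }
    return std::nullopt;
  };
  if (!raw) return apply (on_empty, "path matched nothing");
  try {
    std::size_t used = 0;
    int v = std::stoi (*raw, &used);
    if (used != raw->size ()) return apply (on_error, "incompatible domain");
    return v;
  } catch (const std::logic_error &) {   // std::invalid_argument or std::out_of_range
    return apply (on_error, "incompatible domain");
  }
}
```

With the default policies a bad value such as unknown quietly becomes NULL, which is the property that keeps one malformed row from stopping a batch.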

Scanner and cursor stack: transient state

The runtime layer is a single class that holds one inner struct.

// scan_json_table.hpp / .cpp — condensed
class scanner {
public:
  void init (cubxasl::json_table::spec_node &spec);
  void clear (xasl_node *xasl_p, bool is_final, bool is_final_clear);
  int  open (cubthread::entry *thread_p);
  void end (cubthread::entry *thread_p);
  int  next_scan (cubthread::entry *thread_p, scan_id_struct &sid, SCAN_CODE &sc);
  SCAN_PRED &get_predicate ();      void set_value_descriptor (val_descr *vd);
private:
  struct cursor;
  int  scan_next_internal (cubthread::entry *thread_p, size_t depth, bool &found_row_output);
  int  init_cursor (const JSON_DOC &doc, cubxasl::json_table::node &node, cursor &cursor_out);
  int  set_next_cursor (const cursor &current_cursor, size_t next_depth);
  int  set_input_document (cursor &c, const cubxasl::json_table::node &node, const JSON_DOC &doc);
  size_t get_tree_height (const cubxasl::json_table::node &node);
  void init_iterators (cubxasl::json_table::node &node);
  void reset_ordinality (cubxasl::json_table::node &node);
  void clear_node_columns (cubxasl::json_table::node &node);

  cubxasl::json_table::spec_node *m_specp;
  cursor *m_scan_cursor;   size_t m_scan_cursor_depth;  size_t m_tree_height;
  scan_pred m_scan_predicate;  val_descr *m_vd;
};

struct scanner::cursor {
  std::size_t m_child;                  // current child index in a non-leaf walk
  cubxasl::json_table::node *m_node;    // back-reference into the spec tree
  JSON_DOC_STORE m_input_doc;           // input document at this depth
  const JSON_DOC *m_process_doc;        // current iterator value or the input document
  bool m_is_row_fetched, m_need_advance_row, m_is_node_consumed, m_iteration_started;
  void advance_row_cursor (); void start_json_iterator ();
  int  fetch_row ();          void end ();
};

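m_scan_cursor is not a linked list but a flat array of size m_tree_height, which get_tree_height computes once. Depth can never exceed the height of the specification tree (a static property fixed by the parse result). m_scan_cursor_depth is the index of the deepest open level, that is, the position where scan_next_internal is currently advancing rows. The flat array makes the depth+1 recursion step a constant-cost operation.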
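A sketch of the height computation and the flat cursor array follows. toy_spec_node, toy_cursor, toy_tree_height, and toy_make_cursors are hypothetical names, and the sketch assumes that height means the number of nodes on the longest root-to-leaf path; the real get_tree_height may differ in detail.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical spec node: only the nesting structure matters here.
struct toy_spec_node { std::vector<toy_spec_node> nested; };

// Hypothetical per-depth cursor state.
struct toy_cursor { std::size_t child = 0; bool consumed = false; };

// Height counts nodes on the longest root-to-leaf path (a lone root has height 1).
inline std::size_t toy_tree_height (const toy_spec_node &n) {
  std::size_t deepest_child = 0;
  for (const toy_spec_node &c : n.nested)
    deepest_child = std::max (deepest_child, toy_tree_height (c));
  return 1 + deepest_child;
}

// One cursor slot per depth; cursors[d] serves whichever node is open at depth d.
inline std::vector<toy_cursor> toy_make_cursors (const toy_spec_node &root) {
  return std::vector<toy_cursor> (toy_tree_height (root));
}
```

Sibling subtrees share slots because only one path from the root is open at any moment, which is why the array needs the height of the tree and not its node count.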

Lifecycle: open, next, end, clear

scanner::init runs once per server-side scan-id construction. It computes m_tree_height recursively, allocates the cursor array, and pre-pins each cursor along the leftmost branch of the specification tree.

// scanner::init — src/query/scan_json_table.cpp
void scanner::init (cubxasl::json_table::spec_node &spec) {
  m_specp = &spec;
  m_tree_height = get_tree_height (*m_specp->m_root_node);
  m_scan_cursor_depth = 0;
  m_scan_cursor = new cursor[m_tree_height];

  json_table_node *t = m_specp->m_root_node;
  m_scan_cursor[0].m_node = t;
  for (int i = 1; t->m_nested_nodes_size != 0; t = &t->m_nested_nodes[0], ++i)
    m_scan_cursor[i].m_node = t;

  init_iterators (*m_specp->m_root_node);
}

init_iterators calls node::init_iterator on every iterable node (m_is_iterable_node is true ⇔ the path ends with [*]). Later open calls do not reallocate the iterator; they only rewind it.

scanner::open runs on the first next_scan (sid.position == S_BEFORE).

// scanner::open — src/query/scan_json_table.cpp (condensed)
int scanner::open (cubthread::entry *thread_p) {
  DB_VALUE *value_p = NULL;
  int err = fetch_peek_dbval (thread_p, m_specp->m_json_reguvar,
                              m_vd, NULL, NULL, NULL, &value_p);
  if (err != NO_ERROR) return err;
  if (db_value_is_null (value_p)) {
    assert (m_scan_cursor[0].m_is_node_consumed);  // NULL input ⇒ no rows
    return NO_ERROR;
  }
  if (db_value_type (value_p) == DB_TYPE_JSON) {
    err = init_cursor (*db_get_json_document (value_p),
                       *m_specp->m_root_node, m_scan_cursor[0]);
  } else {
    JSON_DOC_STORE document;
    err = db_value_to_json_doc (*value_p, false, document);
    if (err != NO_ERROR) return err;
    err = init_cursor (*document.get_immutable (),
                       *m_specp->m_root_node, m_scan_cursor[0]);
  }
  reset_ordinality (*m_specp->m_root_node);
  m_scan_cursor_depth = 0;
  return err;
}

fetch_peek_dbval (the regu-variable evaluator; see cubrid-scalar-functions.md) materializes m_specp->m_json_reguvar, which may be a column reference, a host variable, or an expression such as JSON_OBJECT(...). The result may already be DB_TYPE_JSON (the fast path) or a string that goes through db_value_to_json_doc (the slow path, parsed with rapidjson). The latter is why JSON_TABLE (varchar_col_with_json, ...) works without an explicit CAST.

init_cursor delegates to set_input_document, which is where table-level path extraction happens.

// scanner::set_input_document — src/query/scan_json_table.cpp
int scanner::set_input_document (cursor &cursor_arg,
                                 const cubxasl::json_table::node &node,
                                 const JSON_DOC &document) {
  cursor_arg.m_input_doc.clear ();
  int err = db_json_extract_document_from_path (&document, node.m_path,
                                                cursor_arg.m_input_doc);
  if (err != NO_ERROR) return err;
  if (cursor_arg.m_input_doc.is_null ())
    cursor_arg.m_is_node_consumed = true;        // the path matched nothing
  else
    cursor_arg.start_json_iterator ();
  return NO_ERROR;
}

cursor::start_json_iterator decides whether to iterate over the input or treat it as a single row. If m_is_iterable_node is true (the path ends with [*]), db_json_set_iterator rewinds the node's JSON_ITERATOR to position 0 of the array. Otherwise the cursor stays in single-row mode and m_process_doc = m_input_doc.

`next_scan`: the per-tuple driver

The public per-row entry point wraps scan_next_internal and the scanner-local predicate filter.

// scanner::next_scan — src/query/scan_json_table.cpp (condensed)
int scanner::next_scan (cubthread::entry *thread_p, scan_id_struct &sid, SCAN_CODE &sc) {
  bool has_row = false;  DB_LOGICAL logical = V_FALSE;

  if (sid.position == S_BEFORE) {
    int err = open (thread_p);
    if (err != NO_ERROR) { sc = S_ERROR; return err; }
    sid.position = S_ON;  sid.status = S_STARTED;
  } else if (sid.position != S_ON) { sc = S_END; return ER_FAILED; }

  while (true) {
    int err = scan_next_internal (thread_p, 0, has_row);
    if (err != NO_ERROR) { sc = S_ERROR; return err; }
    if (!has_row) { sid.position = S_AFTER; sc = S_END; return NO_ERROR; }
    if (m_scan_predicate.pred_expr == NULL) break;
    logical = m_scan_predicate.pr_eval_fnc (thread_p, m_scan_predicate.pred_expr, sid.vd, NULL);
    if (logical == V_TRUE)  break;
    if (logical == V_ERROR) { sc = S_ERROR; return ER_FAILED; }
    // V_FALSE / V_UNKNOWN → loop and try the next row
  }
  sc = S_SUCCESS; return NO_ERROR;
}

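Two design points stand out. First, the open is lazy and happens on the first call: next_scan itself handles everything up to the first fetch_peek_dbval. For a correlated JSON_TABLE inside a nested-loop join, open fires once per outer row, re-fetching the JSON expression and rebuilding the cursors. Second, the predicate filter is scanner-local: even when a non-qualifying row is skipped, the JSON iterator still has to advance. If next_scan returned a skip code to scan_handle_single_scan, the outer layer would re-enter with the same iterator state. Only the scanner knows the iterator position, so the scanner owns the skip loop.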
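The scanner-owned skip loop can be sketched in isolation. toy_filter_scanner is a hypothetical class over a plain vector of integers, not the CUBRID scanner; it only shows why the loop has to live next to the iterator position.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class toy_filter_scanner {
public:
  toy_filter_scanner (std::vector<int> rows, std::function<bool (int)> pred)
    : m_rows (std::move (rows)), m_pred (std::move (pred)) {}

  // Returns true and sets out when a qualifying row exists; false at end of scan.
  bool next (int &out) {
    while (m_pos < m_rows.size ()) {
      int candidate = m_rows[m_pos++];        // the position always advances
      if (!m_pred || m_pred (candidate)) { out = candidate; return true; }
      // rejected: loop and try the next row without returning to the caller
    }
    return false;
  }

private:
  std::vector<int> m_rows;
  std::function<bool (int)> m_pred;           // empty function means "no predicate"
  std::size_t m_pos = 0;
};
```

The caller sees only qualifying rows or end of scan; it never receives a "skipped" result that it would have to retry against an iterator it cannot see.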

`scan_next_internal`: an FSM in disguise

The recursive engine is about 120 lines, and every flag carries weight. Its shape is as follows.

// scanner::scan_next_internal — src/query/scan_json_table.cpp (condensed)
int scanner::scan_next_internal (cubthread::entry *thread_p, size_t depth, bool &found_row_output) {
  cursor &this_cursor = m_scan_cursor[depth];

  // (A) If the previous call descended into a child, re-enter that child.
  if (m_scan_cursor_depth >= depth + 1) {
    int err = scan_next_internal (thread_p, depth + 1, found_row_output);
    if (err != NO_ERROR) return err;
    if (found_row_output) return NO_ERROR;
    this_cursor.m_child++;
  }

  while (!this_cursor.m_is_node_consumed) {
    if (this_cursor.m_need_advance_row) {
      this_cursor.advance_row_cursor ();
      if (this_cursor.m_is_node_consumed) break;
    }
    int err = this_cursor.fetch_row ();
    if (err != NO_ERROR) return err;

    // (C) Leaf: every row is emitted.
    if (this_cursor.m_node->m_nested_nodes_size == 0) {
      found_row_output = true;
      this_cursor.m_need_advance_row = true;
      return NO_ERROR;
    }
    // (D) Non-leaf, all children visited: at most one spine row.
    if (this_cursor.m_child == this_cursor.m_node->m_nested_nodes_size) {
      this_cursor.m_need_advance_row = true;
      if (this_cursor.m_iteration_started) continue;          // a child already emitted
      found_row_output = true; return NO_ERROR;               // sibling-NULL row
    }
    // (E) Descend into the next child.
    err = set_next_cursor (this_cursor, depth + 1);
    if (err != NO_ERROR) return err;
    cursor &next_cursor = m_scan_cursor[depth + 1];
    if (!next_cursor.m_is_node_consumed) {
      m_scan_cursor_depth++;
      this_cursor.m_iteration_started = true;
      err = scan_next_internal (thread_p, depth + 1, found_row_output);
      if (err != NO_ERROR) return err;
    } else { this_cursor.m_child++; continue; }
    if (found_row_output) return NO_ERROR;
    this_cursor.m_child++;
  }

  // (F) This node is fully exhausted: pop one level.
  found_row_output = false;
  if (m_scan_cursor_depth > 0) m_scan_cursor_depth--;
  return NO_ERROR;
}

stateDiagram-v2
  [*] --> CursorOpen : init_cursor
  CursorOpen --> FetchRow : iterator has an element
  FetchRow --> EmitLeaf : leaf node
  FetchRow --> DescendChild : has children, m_child < N
  DescendChild --> ChildEmits
  ChildEmits --> Caller
  Caller --> ResumeChild
  FetchRow --> EmitSpine : m_child == N AND NOT m_iteration_started
  FetchRow --> Skip : m_child == N AND m_iteration_started
  EmitLeaf --> AdvanceRow
  Skip --> AdvanceRow
  AdvanceRow --> CursorOpen : advance_row_cursor
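  AdvanceRow --> NodeDone : iterator exhausted

Figure 2: The scan_next_internal cursor state machine. init_cursor puts the cursor in CursorOpen; at fetch_row a leaf node goes to EmitLeaf, and a node with children recurses through DescendChild. A spine row, where m_child == N and m_iteration_started is false, is emitted through EmitSpine, which implements the SQL:2016 left-join behavior. When the iterator is exhausted the machine ends in NodeDone.

Three rules follow from the flag manipulation. Rule 1: a leaf row is always emitted (arc C); m_need_advance_row guarantees progress on the next call. Rule 2: a non-leaf spine row is emitted only when no child emitted anything (arc D, gated by m_iteration_started). This implements the SQL:2016 left-join semantics, under which a parent row is still emitted, padded with NULLs, when a NESTED PATH array is empty. Rule 3: no Cartesian product is formed between siblings. The header comment states it plainly: "while one nested path is being expanded, the values of its sibling nested paths are all NULL." The rule is enforced by cursor::end calling node::clear_columns, which sets the columns of the sibling that just finished to NULL before the next sibling's expansion fills its own columns.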

// scanner::cursor::end — src/query/scan_json_table.cpp
void scanner::cursor::end (void) {
  m_is_node_consumed = true;
  db_json_reset_iterator (m_node->m_iterator);
  m_process_doc = NULL;
  m_node->clear_columns (false);             // set every column of this node to NULL
}

`fetch_row`: fills the columns at this depth

// scanner::cursor::fetch_row — src/query/scan_json_table.cpp
int scanner::cursor::fetch_row (void) {
  if (m_is_row_fetched) return NO_ERROR;          // idempotency guard

  m_process_doc = (m_node->m_iterator != NULL)
    ? db_json_iterator_get_document (*m_node->m_iterator)   // peek through the iterator
    : m_input_doc.get_immutable ();                         // single-row mode

  for (size_t i = 0; i < m_node->m_output_columns_size; ++i) {
    int err = m_node->m_output_columns[i].evaluate
      (*m_process_doc, m_node->m_ordinality);
    if (err != NO_ERROR) return err;
  }
  return NO_ERROR;
}

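db_json_iterator_get_document returns a peek pointer to the current array element the iterator is looking at, with no copy and no allocation. Each column's evaluate then re-extracts its own column path relative to that peeked document. This two-level path system (node-level path + column-level path) is what makes '$.users[*]' COLUMNS (a INT PATH '$.age') work: the row context comes from the node, and per-column extraction comes from the column.

advance_row_cursor advances the iterator and the per-node ordinality.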

// scanner::cursor::advance_row_cursor — src/query/scan_json_table.cpp
void scanner::cursor::advance_row_cursor () {
  m_need_advance_row = false;  m_iteration_started = false;
  if (m_node->m_iterator == NULL || !db_json_iterator_has_next (*m_node->m_iterator))
    { end (); return; }
  db_json_iterator_next (*m_node->m_iterator);
  m_is_row_fetched = false;
  m_node->m_ordinality++;     m_child = 0;
}

The 1-based ordinality lives on node (one counter per nesting level). reset_ordinality resets every counter to 1 on each open.

Walking the tree: a concrete trace

The specification for SELECT * FROM JSON_TABLE ('{"a":1, "arr":[{"c":10},{"c":20}]}', '$' COLUMNS (a INT PATH '$.a', NESTED PATH '$.arr[*]' COLUMNS (c INT PATH '$.c'))) jt; consists of a root ($, column a, one child) and a child ($.arr[*], iterable, column c).

sequenceDiagram
  participant NS as scanner::next_scan
  participant SI as scan_next_internal
  participant Root as cursor[0]
  participant Child as cursor[1]

  NS->>NS: open: JSON fetch, init_cursor[0]
  NS->>SI: scan_next_internal(0)
  SI->>Root: fetch_row → a:=1
  SI->>Child: set_next_cursor, recurse(1), fetch_row → c:=10
  SI-->>NS: row 1: (a=1, c=10), m_need_advance_row=true
  NS->>SI: next next_scan → resume at depth=1
  SI->>Child: advance_row_cursor → c:=20
  SI-->>NS: row 2: (a=1, c=20)
  NS->>SI: next call → resume(1) → iterator exhausted ⇒ end
  SI->>Root: m_iteration_started=true ⇒ continue, advance, end
  SI-->>NS: S_END

Figure 3: Sequence for the NESTED PATH example. Root cursor[0] fetches a:=1, and child cursor[1] iterates over $.arr[*] and emits two rows, c:=10 and c:=20. Once the child iterator is exhausted, m_iteration_started=true prevents an extra root row, and the scan returns S_END.

Had the JSON been {"a":1, "arr":[]}, the child would have had m_is_node_consumed=true right after init_cursor. Arc (D) of scan_next_internal would then fire at the root. Because m_iteration_started is false, the spine row (a=1, c=NULL) is emitted, which is exactly the SQL:2016 left-join behavior.
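The row shapes that these rules produce can be reproduced with a toy expansion of one parent value and two sibling nested arrays. toy_row and toy_expand are hypothetical and compute the result eagerly; the real scanner produces the same rows lazily through the cursor stack.

```cpp
#include <optional>
#include <vector>

// One output row: parent column a, plus one column from each sibling nested path.
struct toy_row { int a; std::optional<int> b; std::optional<int> c; };

// Expands one parent value with two sibling nested arrays.
inline std::vector<toy_row> toy_expand (int a, const std::vector<int> &bs,
                                        const std::vector<int> &cs) {
  std::vector<toy_row> out;
  for (int b : bs) out.push_back ({a, b, std::nullopt});   // sibling c stays NULL
  for (int c : cs) out.push_back ({a, std::nullopt, c});   // sibling b stays NULL
  if (out.empty ())                                         // no child emitted anything
    out.push_back ({a, std::nullopt, std::nullopt});        // spine row, all nested NULL
  return out;
}
```

For example, toy_expand (1, {10, 20}, {}) yields two rows with c unset, and toy_expand (1, {}, {}) yields the single spine row, mirroring the empty-array case just described.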

Cleanup: `clear`, `end`, and the rebind path

scanner::clear is parameterized by two booleans that select one of three cleanup strengths.

// scanner::clear — src/query/scan_json_table.cpp (condensed)
void scanner::clear (xasl_node *xasl_p, bool is_final, bool is_final_clear) {
  m_specp->m_root_node->clear_xasl (is_final_clear);
  reset_ordinality (*m_specp->m_root_node);
  if (is_final) {
    for (size_t i = 0; i < m_tree_height; ++i) { /* reset cursor[i] flags */ }
    m_specp->m_root_node->clear_iterators (is_final_clear);
    if (is_final_clear) delete [] m_scan_cursor;
  }
}

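is_final=false: the scan is paused (for the next outer-loop row); only column values are cleared. is_final=true, is_final_clear=false: the scan for this iteration ends but the XASL will be executed again; cursors and iterators are emptied but their allocations are kept. is_final=true, is_final_clear=true: the XASL is being torn down for good; the iterators are deleted and the cursor array is freed. This three-tier granularity lines up exactly with the executor's cache-versus-teardown policy and lets JSON_TABLE survive correlated rebinds without leaks.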
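The three tiers reduce to a small decision table. The sketch below models them with flags only; toy_scan_state and toy_clear are hypothetical, and the booleans stand in for real column values, cursor flags, and allocations.

```cpp
// Hypothetical summary of what a scanner holds between calls.
struct toy_scan_state {
  bool columns_filled = true;
  bool cursors_positioned = true;
  bool storage_allocated = true;
};

inline void toy_clear (toy_scan_state &s, bool is_final, bool is_final_clear) {
  s.columns_filled = false;                 // tier 1: always drop column values
  if (is_final) {
    s.cursors_positioned = false;           // tier 2: reset cursors, keep storage
    if (is_final_clear)
      s.storage_allocated = false;          // tier 3: release storage for good
  }
}
```

Note that is_final_clear has no effect on storage unless is_final is also set, which is why two booleans yield three tiers rather than four.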

Wiring into SCAN_ID: the dispatch layer

The way out is through two functions in scan_manager.c.

// scan_open_json_table_scan + scan_next_json_table_scan — src/query/scan_manager.c
int scan_open_json_table_scan (THREAD_ENTRY *thread_p, SCAN_ID *scan_id, int grouped,
                               QPROC_SINGLE_FETCH single_fetch, DB_VALUE *join_dbval,
                               val_list_node *val_list, VAL_DESCR *vd, PRED_EXPR *pr) {
  DB_TYPE single_node_type = DB_TYPE_NULL;
  assert (scan_id->type == S_JSON_TABLE_SCAN);
  scan_init_scan_id (scan_id, false, S_SELECT, true, grouped, single_fetch,
                     join_dbval, val_list, vd);
  scan_init_scan_pred (&scan_id->s.jtid.get_predicate (), NULL, pr,
                       ((pr) ? eval_fnc (thread_p, pr, &single_node_type) : NULL));
  scan_id->s.jtid.set_value_descriptor (vd);
  return NO_ERROR;
}
static SCAN_CODE scan_next_json_table_scan (THREAD_ENTRY *thread_p, SCAN_ID *scan_id) {
  SCAN_CODE sc;
  int err = scan_id->s.jtid.next_scan (thread_p, *scan_id, sc);
  return (err != NO_ERROR) ? S_ERROR : sc;
}

scan_open_json_table_scan does not call scanner::init. The scanner state (m_specp, the cursor array, the iterators) is created during XASL deserialization in xasl_stream.cpp and then completed by the scanner::init call triggered when qexec_open_scan walks the unpacked XASL_NODE. All this function does is set up the outer SCAN_ID metadata.

Dispatch back into the scanner happens in two places: one branch of scan_next_scan_local (case S_JSON_TABLE_SCAN: status = scan_next_json_table_scan (thread_p, scan_id);) and one branch of qexec_open_scan (case TARGET_JSON_TABLE: calls scan_open_json_table_scan). The XASL generator sees PT_DERIVED_JSON_TABLE at semantic-check time and tags the access specification with TARGET_JSON_TABLE.

flowchart LR
  Q[SQL containing JSON_TABLE] --> P[PT_JSON_TABLE]
  P --> XG[xasl_generation:<br/>ACCESS_SPEC TARGET_JSON_TABLE,<br/>spec_node]
  XG --> XS[xasl_stream serialization]
  XS --> QE["qexec_open_scan: TARGET_JSON_TABLE"]
  QE --> SID[SCAN_ID.type = S_JSON_TABLE_SCAN]
  SID --> NL[scan_next_scan_local switch]
  NL --> NS[scanner::next_scan]
  NS --> SI[scan_next_internal]
  SI --> JI[db_json_iterator_*<br/>db_json_extract_document_from_path]
  SI --> CE[column::evaluate<br/>tp_value_cast]
  CE --> OL[outptr_list DB_VALUEs]
  NS --> Pred[m_scan_predicate.pr_eval_fnc]

Figure 4: The full path from SQL to scan execution. Once the parser builds PT_JSON_TABLE, the XASL generator attaches ACCESS_SPEC TARGET_JSON_TABLE and the spec_node and serializes them. When qexec_open_scan opens the scan as S_JSON_TABLE_SCAN, the scan_next_scan_local switch calls scanner::next_scan, which extracts JSON values through db_json_iterator_*, converts types in column::evaluate, and fills outptr_list.

Source Code Guide

Symbols are anchored by name. The line numbers in the location-hints table are decay-prone hints, valid as of this document's updated: date.

Cross-module header: `src/compat/json_table_def.h`

enum json_table_column_behavior_type (RETURN_NULL/THROW_ERROR/DEFAULT_VALUE); enum json_table_column_function (EXTRACT/EXISTS/ORDINALITY); struct json_table_column_behavior (m_behavior and m_default_value); enum json_table_expand_type (ARRAY/OBJECT/NO_EXPAND; advisory, since the runtime uses m_is_iterable_node).

Parser: `src/parser/csql_grammar.y`, `name_resolution.c`, `parse_tree.h`

The JSON_TABLE keyword. bison productions json_table_rule, json_table_node_rule, json_table_column_rule, json_table_column_list_rule, json_table_on_error_rule_optional, json_table_on_empty_rule_optional, json_table_column_behavior_rule. pt_jt_append_column_or_nested_node (column-vs-nested routing). PT_NODE types PT_JSON_TABLE, PT_JSON_TABLE_NODE, PT_JSON_TABLE_COLUMN. Info structs pt_json_table_info, pt_json_table_node_info, pt_json_table_column_info. Derived-table type PT_DERIVED_JSON_TABLE. Name resolution: pt_get_all_json_table_attributes_and_types, pt_json_table_gather_attribs (synthesizes as_attr_list). json_table_column_count (per-call index counter).

XASL specification layer: `src/xasl/access_json_table.{hpp,cpp}`

struct column, node, and spec_node inside namespace cubxasl::json_table. Column: evaluate, evaluate_extract, evaluate_exists, evaluate_ordinality, trigger_on_error, trigger_on_empty, clear_xasl. Node: init_iterator, clear_columns, clear_iterators, clear_xasl, init_ordinality. Spec: clear_xasl. Aliases json_table_column, json_table_node, json_table_spec_node.

XASL wire format: `src/xasl/xasl_stream.cpp`

stx_build overloads for column, node, and spec_node. stx_unpack for json_table_column_behavior. xasl_stream_compare overloads for cache equality.

Runtime scanner: `src/query/scan_json_table.{hpp,cpp}`

Class cubscan::json_table::scanner; inner scanner::cursor. Cursor methods: advance_row_cursor, start_json_iterator, fetch_row, end. Scanner public: init, clear, open, end, next_scan, get_predicate, set_value_descriptor. Scanner private: get_tree_height, init_iterators, reset_ordinality, clear_node_columns, set_input_document, init_cursor, set_next_cursor, scan_next_internal. Alias JSON_TABLE_SCAN_ID = cubscan::json_table::scanner.

SCAN_ID dispatch + executor wiring

SCAN_TYPE::S_JSON_TABLE_SCAN, scan_id_struct.s.jtid, scan_open_json_table_scan, scan_next_json_table_scan (scan_manager.c). The case S_JSON_TABLE_SCAN: branches of scan_next_scan_local, scan_start_scan, scan_end_scan, scan_close_scan, scan_reset_scan_block, and scan_next_scan_block (mostly no-ops). The case TARGET_JSON_TABLE: branch of qexec_open_scan (query_executor.c).

JSON support kernel (consumer side): `src/compat/db_json.{hpp,cpp}`

db_json_extract_document_from_path (the path kernel), db_json_contains_path (EXISTS PATH), the iterator API (db_json_set_iterator / iterator_next / _has_next / _get_document / create_iterator / delete_json_iterator / clear_json_iterator / reset_iterator), db_value_to_json_doc (implicit cast for VARCHAR JSON), db_json_get_type, db_json_get_raw_json_body_from_document.

Location hints (as of this revision)

Symbol	File	Line
`enum json_table_column_function`	`src/compat/json_table_def.h`	38
`enum json_table_column_behavior_type`	`src/compat/json_table_def.h`	31
`struct json_table_column_behavior`	`src/compat/json_table_def.h`	45
`cubxasl::json_table::column`	`src/xasl/access_json_table.hpp`	46
`cubxasl::json_table::node`	`src/xasl/access_json_table.hpp`	74
`cubxasl::json_table::spec_node`	`src/xasl/access_json_table.hpp`	96
`column::trigger_on_error` / `_on_empty`	`src/xasl/access_json_table.cpp`	43 / 81
`column::evaluate_extract` / `_exists` / `_ordinality`	`src/xasl/access_json_table.cpp`	129 / 171 / 197
`column::evaluate` / `clear_xasl`	`src/xasl/access_json_table.cpp`	207 / 235
`node::clear_columns` / `clear_iterators` / `clear_xasl` / `init_iterator`	`src/xasl/access_json_table.cpp`	270 / 280 / 298 / 309
`spec_node::clear_xasl`	`src/xasl/access_json_table.cpp`	337
`cubscan::json_table::scanner` / `JSON_TABLE_SCAN_ID` alias	`src/query/scan_json_table.hpp`	109 / 171
`scanner::cursor` (struct)	`src/query/scan_json_table.cpp`	37
`cursor::advance_row_cursor` / `start_json_iterator` / `fetch_row` / `end`	`src/query/scan_json_table.cpp`	72 / 98 / 109 / 150
`scanner::get_tree_height` / `init` / `clear` / `open` / `end`	`src/query/scan_json_table.cpp`	161 / 175 / 198 / 229 / 290
`scanner::next_scan` / `set_input_document` / `init_cursor` / `set_next_cursor`	`src/query/scan_json_table.cpp`	296 / 359 / 387 / 397
`scanner::clear_node_columns` / `init_iterators` / `reset_ordinality`	`src/query/scan_json_table.cpp`	405 / 415 / 426
`scanner::scan_next_internal` / `get_predicate` / `set_value_descriptor`	`src/query/scan_json_table.cpp`	437 / 564 / 570
`scan_open_json_table_scan` / `scan_next_json_table_scan`	`src/query/scan_manager.c`	4036 / 7014
`case S_JSON_TABLE_SCAN` in `scan_next_scan_local`	`src/query/scan_manager.c`	5273
`case TARGET_JSON_TABLE` in `qexec_open_scan`	`src/query/query_executor.c`	7591
`S_JSON_TABLE_SCAN` enum / `jtid` union member	`src/query/scan_manager.h`	83 / 412
bison `json_table_rule` / `_node_rule` / `_column_rule` / `_column_behavior_rule`	`src/parser/csql_grammar.y`	21957 / 21944 / 21888 / 21838
`pt_get_all_json_table_attributes_and_types` / `pt_json_table_gather_attribs`	`src/parser/name_resolution.c`	4972 / 4952
`stx_build (column / node / spec_node)` / `stx_unpack (behavior)`	`src/xasl/xasl_stream.cpp`	360 / 393 / 437 / 463
`db_json_iterator_next` / `_has_next` / `_get_document` / `set_iterator` / `reset_iterator` / `create_iterator` / `delete_json_iterator` / `clear_json_iterator`	`src/compat/db_json.hpp`	136 / 138 / 137 / 139 / 140 / 142 / 143 / 144

Cross-check Notes

Versus cubrid-scan-manager.md. That document owns SCAN_ID polymorphism and the lifecycle protocol. The boundary is the case S_JSON_TABLE_SCAN: branch of scan_next_scan_local together with the scan_open_json_table_scan / scan_next_json_table_scan shim pair. This document owns the cursor stack, the FSM, and the NESTED-PATH semantics.

Versus cubrid-scalar-functions.md. That document lists the scalar functions at F_JSON_* granularity, such as db_evaluate_json_extract. JSON_TABLE does not take that path: column::evaluate_extract calls db_json_extract_document_from_path directly. The reason is that JSON_TABLE extracts over a peek iterator (no copy, no DB_VALUE allocated around the JSON_DOC), unlike scalar JSON_EXTRACT, which produces a new DB_VALUE of DB_TYPE_JSON. The kernel is shared; the wrapping differs.

Predicate placement is inside the scanner. The WHERE filter is wired into m_scan_predicate and evaluated inside scanner::next_scan, not in the SCAN_ID-level scan_handle_single_scan. Skipped rows must still advance the JSON iterator, and only the inner function knows that state. scan_next_set_scan uses the same pattern.

m_output_value_pointer aliases the XASL outptr_list. The XASL builder allocates one DB_VALUE per output column, and the column writes directly into that aliased slot with db_make_json_from_doc_store_and_release. The pr_clear_value at the top of every evaluate is the lifetime contract.

Iterator allocation is per XASL invocation. The JSON_ITERATOR attached to each node is allocated at scanner::init time, rewound (not reallocated) on every start_json_iterator, and released in clear_iterators when is_final_clear=true. This avoids a per-row allocation when iterating over large arrays.

Re-entry on rebind. For a correlated JSON_TABLE inside a nested-loop join, clear (is_final=false) resets the column values, and the next next_scan reruns open, which calls fetch_peek_dbval on the expression again and rebuilds the cursors.

Drift. The json_table_expand_type enum is advisory (the runtime uses m_is_iterable_node), and only ARRAY_EXPAND is actually wired up. pt_json_table_info::is_correlated is set during semantic checking, but the runtime always re-fetches on rebind regardless.

Open Questions

JSONPath filter expressions. RFC 9535 and Oracle support filters such as ?(@.age > 18); MySQL's path syntax does not. CUBRID handles only the basic subset. Adding filters would move selection into the row source.
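Object expansion. JSON_TABLE_OBJECT_EXPAND is not wired up. '$.*' over an object could produce one row per member. The runtime hook is small, but the parser grammar has not been settled.
Per-node predicate pushdown. The header comment says so explicitly: "split the scan predicate per scan node and filter invalid rows at node level". Today only the leaf level filters. Per-node filtering could prune entire NESTED-PATH branches.
Iterator streaming. JSON_DOC is fully materialized. A SAX-style parser could avoid materializing the whole document when only a small subset is touched.
Sibling Cartesian-product option. SQL:2016 prescribes left-join semantics, but some users want a Cartesian product. A CROSS keyword at NESTED-PATH granularity could express this without breaking the existing semantics.
Common-subexpression elimination for column paths. Two columns with the same path call db_json_extract_document_from_path twice. The XASL builder could share the extraction.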

Sources

Code paths consumed: src/query/scan_json_table.{cpp,hpp}, src/xasl/access_json_table.{cpp,hpp}, src/compat/json_table_def.h, src/query/scan_manager.{c,h}, src/query/query_executor.c, src/parser/csql_grammar.y, src/parser/parse_tree.h, src/parser/name_resolution.c, src/xasl/xasl_stream.cpp, src/compat/db_json.hpp. Context: src/compat/db_json.cpp, src/query/string_opfunc.c.

Theoretical references: ISO/IEC 9075-2:2016 §6.36 JSON_TABLE; IETF RFC 9535 (2024) JSONPath: Query Expressions for JSON; Goessner (2007) JSONPath - XPath for JSON; Graefe (1994) Volcano, IEEE TKDE 6(1); Graefe (1993) Query Evaluation Techniques, ACM Computing Surveys 25(2); Silberschatz/Korth/Sudarshan, Database System Concepts, 7th ed.

Cross-references: knowledge/code-analysis/cubrid/cubrid-scan-manager.md (SCAN_ID/SCAN_TYPE polymorphism); knowledge/code-analysis/cubrid/cubrid-scalar-functions.md (the sibling JSON scalar functions that pass through db_json).