CUBRID Reading Path — How a SELECT Executes End-to-End

This document follows one concrete query — SELECT * FROM t WHERE x > 10 — from the moment a JDBC driver issues executeQuery until the last result row arrives back at the application. The trip starts as a TCP byte stream into a CUBRID broker daemon, is handed to a CAS worker process via Unix-domain SCM_RIGHTS file-descriptor passing, lands inside a cub_server request handler through a single CSS-framed NRP opcode, walks the entire compile pipeline (lexer → bison parser → semantic check → query rewrite → cost-based optimizer → XASL generator → XASL cache), and then enters the executor where a Volcano-style operator tree iterates rows out of either a sequential heap scan or an index range scan. Each candidate row is evaluated against the x > 10 predicate by a PRED_EXPR walker, then submitted to MVCC visibility via mvcc_satisfies_snapshot, and surviving rows are materialised into a list-file that the cursor reads back across the wire. The same network frame that brought the query in carries the result back out, with cub_cas and cub_broker hops in reverse. Every step below names one or two detail docs in knowledge/code-analysis/cubrid/ whose ## CUBRID's Approach section makes the corresponding mechanism precise; the prose here is synthesis, not fresh code reading.

The query is small on purpose. SELECT * FROM t WHERE x > 10 exercises exactly one base table, one inequality predicate, no joins, no aggregates, no ordering. That keeps the trip on the main spine of the engine — every step that fires for this query fires for nearly every query — and lets the document focus on threading the spine rather than enumerating branches. Branches that are not on this path (group-by, hash join, partition pruning, parallel-query, post-processing) are catalogued in the “What we did NOT cover” section at the end with one-line pointers to their detail docs.

Step 1 — JDBC driver → broker → CAS

The journey starts in a JDBC client process. The application calls Statement.executeQuery("SELECT * FROM t WHERE x > 10"); the JDBC driver wraps that call into a CCI (CUBRID Call Interface) request and pushes it down a TCP socket to the CUBRID broker. The broker daemon cub_broker is a SysV-shared-memory-coordinated parent process that has, at startup, forked a fixed pool of cub_cas worker processes; each CAS is a real Unix process with its own address space, embedding the CUBRID client library. The detail doc cubrid-broker.md is precise on the resulting topology: cub_broker owns the public TCP listener (sock_fd bound to getenv(PORT_NUMBER_ENV_STR)), accepts the JDBC client’s connection on a receiver_thr_f thread, queues the accepted descriptor in a max-heap-ordered job queue (T_MAX_HEAP_NODE job_queue[] inside the shared T_SHM_APPL_SERVER), and a dispatch_thr_f consults find_idle_cas to pick a CAS worker.

What happens next is the design’s most distinctive move. Rather than proxying every byte of SQL traffic through the broker, CUBRID hands the open kernel file descriptor of the client TCP socket to the chosen CAS over a pre-existing AF_UNIX rendezvous channel using the POSIX SCM_RIGHTS ancillary message — the same primitive nginx and pgbouncer use. After send_fd()/recv_fd(), the broker drops its copy of the descriptor and the CAS owns the data plane outright. From this point forward the JDBC driver and the CAS worker speak directly; the broker is back to its admin role, and SQL log collection, ACL enforcement, and connection-pool metering happen in the control plane through shared memory. cubrid-dbi-cci.md traces the CAS side of the handshake: ux_database_connect runs the CUBRID client API’s db_restart (which calls boot_restart_client in boot_cl.c), opening a transactional connection from the CAS process into a cub_server instance for the requested database. The CAS now holds two sockets — the JDBC-facing one it received via fd-handoff, and a server-facing one (the standard CSS connection, see cubrid-network-protocol.md).
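The fd-passing primitive itself is small enough to sketch. A minimal SCM_RIGHTS pair over an AF_UNIX socket; CUBRID’s actual send_fd()/recv_fd() wrappers are more defensive, so read this as the shape of the mechanism rather than its implementation:

```c
/* Minimal SCM_RIGHTS descriptor passing: the broker-side send and the
 * CAS-side receive. The kernel duplicates the descriptor into the
 * receiving process; after this, both sides hold the same open socket. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static int send_fd(int unix_sock, int fd_to_pass)
{
    char dummy = 'F';                      /* must send >= 1 byte of data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = cbuf, .msg_controllen = sizeof cbuf };
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;            /* ancillary payload is an fd */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd_to_pass, sizeof(int));
    return sendmsg(unix_sock, &msg, 0) == 1 ? 0 : -1;
}

static int recv_fd(int unix_sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = cbuf, .msg_controllen = sizeof cbuf };
    if (recvmsg(unix_sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    int fd = -1;
    if (cm != NULL && cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;                             /* new descriptor in this process */
}
```

After recv_fd returns, the broker’s close() on its copy leaves the CAS as the sole owner of the data plane, which is exactly the ownership transfer described above.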

The CAS receives the SQL text on the JDBC-facing side, runs it through the T_SRV_HANDLE-keyed ux_prepare / ux_execute pair, and inside those functions calls into the embedded db_* API: db_open_buffer produces a DB_SESSION, db_compile_statement_local runs the compile pipeline (Initial → Compiled → Prepared, FSM in session->stage[]), and db_execute_statement_local flips it Prepared → Executed. The split between prepare and execute is what lets the CAS reuse a compiled DB_SESSION across many executeQuery calls when the JDBC client uses a PreparedStatement; for our example, since the SQL is a literal text execution, prepare and execute fire back-to-back in the same ux_execute call. The detail doc cubrid-dbi-cci.md is the source of truth for how every binding (JDBC, CCI native, ODBC, Python, PHP, CSQL) collapses onto this same db_* core; the broker’s CAS is just the wire-driver wrapper that adapts CCI opcodes (CAS_FC_* in cas_protocol.h) to db_* calls through the flat server_fn_table in broker/cas.c.

Step 2 — CSS framing, session, and transaction binding

The CAS now needs the server to compile and execute. The CAS is a CUBRID client in network terms — it runs the same network_cl.c stub that an embedded csql would use — and it ships the compile/execute work to cub_server through the CSS framing layer. cubrid-network-protocol.md is precise about the framing: every server entry point is a single NET_SERVER_* opcode in the central enum (enum net_server_request in network.h), dispatched on the server side through a static table (static struct net_request net_Requests[] in network_sr.c) populated at startup by net_server_init(). Each row of the table carries a function pointer plus an attribute bitmask — CHECK_DB_MODIFICATION, CHECK_AUTHORIZATION, IN_TRANSACTION — that declares the side-conditions of the call. The wire frame itself is length-prefixed: a NET_HEADER struct precedes the body, and pack/unpack is symmetric (or_pack_int, or_unpack_int, or_pack_value, …) so the client stub and the server handler form a mirrored pair.
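The mirrored pack/unpack discipline is easy to see in miniature. A sketch with an illustrative header struct standing in for the real NET_HEADER layout:

```c
/* Length-prefixed framing with symmetric pack/unpack, in the spirit of
 * or_pack_int / or_unpack_int. Integers travel in network byte order;
 * the header fields here are stand-ins, not CUBRID's NET_HEADER. */
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

static char *pack_int(char *ptr, int32_t v)      /* client-stub side */
{
    uint32_t net = htonl((uint32_t) v);
    memcpy(ptr, &net, sizeof net);
    return ptr + sizeof net;
}

static char *unpack_int(char *ptr, int32_t *v)   /* server-handler side */
{
    uint32_t net;
    memcpy(&net, ptr, sizeof net);
    *v = (int32_t) ntohl(net);
    return ptr + sizeof net;
}

struct frame_header {          /* stand-in for the length-prefixed header */
    int32_t request;           /* a NET_SERVER_* opcode */
    int32_t body_len;          /* byte count of the body that follows */
};
```

The symmetry is the point: each client stub and its server handler walk the same sequence of pack/unpack calls over the same buffer.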

Connection acceptance on the server side is handled by cub_master, which forwards the new connection to the requested cub_server via master::connector over a Unix-domain socket; from there the connection is owned by an epoll-based cubconn::connection::worker that reads CSS-framed packets and pushes the decoded request through net_Requests[]. For our SELECT, the relevant opcode is NET_SERVER_QM_QUERY_PREPARE (or the combined prepare+execute path used when JDBC issues Statement.executeQuery rather than PreparedStatement.executeQuery); the handler unpacks the SQL text plus host variables and dispatches into the query manager.

Before any compile work, the server has to bind the request to its session and transaction context. cubrid-server-session.md describes this in detail. Each CSS_CONN_ENTRY already carries a session_id and a cached session_p pointer; the first request on a new connection runs xsession_check_session to look up the SESSION_STATE in a server-wide lock-free hash keyed by SESSION_ID, after which subsequent requests on the same socket pay no hash lookup at all — the cached pointer is read directly. The session is the bookkeeping container (prepared statements, autocommit, last insert id, locale) and survives across many transactions; the transaction is a separate object owned by the lock and recovery managers. cubrid-transaction.md names that object the TDES (transaction descriptor): a per-transaction record in a server-wide trantable carrying a stable trid, a lifecycle state, an isolation level, and the MVCC snapshot. The connection layer owns one transaction index per connection; the worker thread copies that index into its THREAD_ENTRY so LOG_FIND_THREAD_TRAN_INDEX(thread_p) returns the right LOG_TDES. After this binding, every layer below — parser, optimizer, executor, scan manager, MVCC checker — has implicit access to both the SESSION_STATE and the LOG_TDES through thread_p.
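The bind-once discipline reduces to a cached-pointer pattern. A sketch with stand-in types; the real CSS_CONN_ENTRY and the lock-free session hash carry far more state:

```c
/* First request on a connection: hash lookup keyed by SESSION_ID.
 * Every later request on the same socket: read the cached pointer. */
struct session_state;                   /* prepared stmts, autocommit, ... */

struct conn_entry {                     /* stand-in for CSS_CONN_ENTRY */
    unsigned session_id;
    struct session_state *session_p;    /* NULL until first lookup */
};

extern struct session_state *session_hash_find(unsigned session_id);

static struct session_state *conn_get_session(struct conn_entry *conn)
{
    if (conn->session_p == NULL)        /* only the first request pays this */
        conn->session_p = session_hash_find(conn->session_id);
    return conn->session_p;
}
```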

Step 3 — Parse + semantic-check + rewrite


The query manager hands the raw SQL text to the compile front-end. cubrid-parser.md walks the lexer/parser pipeline. The lexer is Flex-generated from csql_lexer.l; its yylex() returns one (token_class, lexeme) pair per call, walking a single-buffer YY_INPUT. The parser is Bison-generated from csql_grammar.y and is compiled with %glr-parser — the Generalized LR variant that forks on grammar conflicts and discards branches that fail to reduce — because SQL has historic ambiguities (the classical GROUP BY c, d list-vs-tuple parse) that LALR(1) cannot handle cleanly. Reduce actions in the grammar call parser_new_node to allocate PT_NODE objects out of a per-PARSER_CONTEXT block allocator, so the entire parse tree can later be freed in one pass. The result is a PT_NODE tree shaped exactly as the user wrote the query: a PT_SELECT root pointing at a PT_SPEC for t, a PT_NAME projection list (the * is expanded later), and a PT_EXPR of opcode > whose operands are a PT_NAME for x and a PT_VALUE for 10.
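The per-context allocation scheme is worth seeing concretely, since it explains why freeing a parse tree is O(blocks) rather than O(nodes). A sketch of a block allocator in the spirit of parser_new_node; block size and node shape are invented:

```c
/* Arena-style allocator: parser_new_node-style allocations carve from a
 * chain of blocks owned by the parser context; one pass frees everything. */
#include <stdlib.h>

struct pt_block { struct pt_block *next; size_t used; char mem[64 * 1024]; };
struct parser_ctx { struct pt_block *blocks; };

static void *parser_alloc(struct parser_ctx *ctx, size_t n)
{
    n = (n + 7) & ~(size_t) 7;             /* keep allocations aligned */
    struct pt_block *b = ctx->blocks;
    if (b == NULL || b->used + n > sizeof b->mem) {
        b = calloc(1, sizeof *b);          /* grow: push a fresh block */
        if (b == NULL)
            return NULL;
        b->next = ctx->blocks;
        ctx->blocks = b;
    }
    void *p = b->mem + b->used;            /* no per-node free(), ever */
    b->used += n;
    return p;
}

static void parser_free_all(struct parser_ctx *ctx)
{
    while (ctx->blocks != NULL) {          /* one pass frees the whole tree */
        struct pt_block *next = ctx->blocks->next;
        free(ctx->blocks);
        ctx->blocks = next;
    }
}
```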

A freshly parsed tree is grammatically well-formed but not yet meaningful against the catalog. cubrid-semantic-check.md describes the four passes that pt_check_with_info chains. (1) Name resolution walks the tree under a stack of SCOPES (linked list, head innermost) and binds every PT_NAME to a PT_SPEC provider; for x this means consulting the catalog for t’s columns and threading the column descriptor onto the PT_NAME node. (2) Where-clause aggregate check ensures the predicate does not contain unbound aggregates — trivial here. (3) Host-variable replacement would substitute prepared-statement parameters; not relevant for our literal-text query. (4) semantic_check_local invokes pt_semantic_type for type evaluation: the > operator’s left operand x has a column type from the catalog, the right operand 10 is an integer literal, the type-checker selects an EXPRESSION_DEFINITION overload from pt_apply_expressions_definition and inserts a PT_CAST if implicit promotion is needed. Constant folding happens here too: a predicate like WHERE 1=1 AND x > 10 would fold the first conjunct into TRUE and drop it. Finally, pt_cnf rewrites the WHERE clause into conjunctive normal form so each conjunct is a candidate for index-driven evaluation.
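Constant folding on the WHERE 1=1 AND x > 10 example is a small tree rewrite. An illustrative fold pass over a toy expression node, not CUBRID’s actual folding code:

```c
/* Drop conjuncts that fold to TRUE: (TRUE AND p) becomes p. A real
 * folder also evaluates constant sub-expressions to produce the TRUE. */
enum op { OP_AND, OP_GT, OP_ATTR, OP_INT, OP_CONST_TRUE };
struct expr { enum op op; struct expr *l, *r; int ival; };

static struct expr *fold_and(struct expr *e)
{
    if (e == NULL || e->op != OP_AND)
        return e;                          /* leaves pass through */
    e->l = fold_and(e->l);
    e->r = fold_and(e->r);
    if (e->l != NULL && e->l->op == OP_CONST_TRUE)
        return e->r;                       /* TRUE AND r  ==>  r */
    if (e->r != NULL && e->r->op == OP_CONST_TRUE)
        return e->l;                       /* l AND TRUE  ==>  l */
    return e;
}
```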

After semantic check, the tree enters the rewrite layer. cubrid-query-rewrite.md catalogues the transformations (mq_rewrite is the entry point in query_rewrite.c): predicate pushdown, view inlining, subquery flattening, outer-join reduction, redundant-join elimination, auto-parameterisation, and the LIMIT-clause lowering case study. For our SELECT none of the heavyweight transforms fire — there are no views, no subqueries, no joins, no LIMIT. The rewrite phase still runs and produces a canonical normalised form so the XASL cache key is stable across equivalent textual variations; cubrid-xasl-cache.md explicitly notes that the cache hashes the rewritten SQL (“hash text”) rather than the user-supplied raw text, which is what makes SELECT * FROM t WHERE x>10 and SELECT * FROM t WHERE x > 10 collide on the same cache slot.

Step 4 — Optimizer + XASL generator + XASL cache


The optimizer turns the resolved, normalised PT_NODE into a costed plan. cubrid-query-optimizer.md describes the lowering: qo_optimize_query builds a QO_ENV query graph in which each FROM source becomes a QO_NODE, each predicate becomes a QO_TERM, and each column reference becomes a QO_SEGMENT. Statistics drive the cost model; cubrid-statistics.md is precise about what’s available — server-side xstats_update_statistics walks the heap and B+Tree to record cardinality, NDV, leaf/page counts, and partial-key fanouts on the catalog’s latest disk representation; the client reads them back through qo_get_attr_info so that qo_iscan_cost / qo_sscan_cost can score plans and qo_equal_selectivity / qo_range_selectivity can estimate predicate selectivity. The cost model itself is System R-shaped, with a fixed CPU/IO startup component plus a variable per-row component scaled by estimated cardinality.

For our query, the planner’s task is one-table access-path selection: heap scan or index scan over an index that covers x. If t has an index on x, the planner compares qo_sscan_cost (sequential heap scan, reads every page) against qo_iscan_cost (B+Tree range descent followed by per-OID heap fetches) using qo_range_selectivity for the x > 10 predicate — a small selectivity favours the index scan; a large one favours the heap scan because random heap reads outweigh the saved page count. With no usable index, the planner has only one choice. The DP join enumeration in qo_search_partition_join is a no-op for one table, so the optimizer settles on a single QO_PLAN with the chosen access spec. (Above 8 tables CUBRID switches from full DP to a partial-join “first-node-then-extend” heuristic, but that’s irrelevant here.)
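The crossover is easy to reproduce numerically. A toy cost comparison in the System R shape described above; the constants are invented and only the qualitative behaviour matters:

```c
/* Heap scan pays every page once; index scan pays a descent, a share of
 * the leaf level, and one random heap fetch per matching row. */
#include <stdio.h>

struct table_stats { double pages, rows, btree_height, keys_per_leaf; };

static double sscan_cost(const struct table_stats *s)
{
    return s->pages;                       /* sequential: read all pages */
}

static double iscan_cost(const struct table_stats *s, double sel)
{
    double matching = s->rows * sel;
    return s->btree_height                 /* root-to-leaf descent */
         + matching / s->keys_per_leaf     /* leaf pages walked */
         + matching;                       /* random heap fetch per OID */
}

int main(void)
{
    struct table_stats t = { .pages = 1000, .rows = 100000,
                             .btree_height = 3, .keys_per_leaf = 200 };
    double sels[] = { 0.001, 0.01, 0.1, 0.5 };
    for (int i = 0; i < 4; i++)
        printf("sel=%.3f  sscan=%7.0f  iscan=%9.1f  -> %s\n", sels[i],
               sscan_cost(&t), iscan_cost(&t, sels[i]),
               iscan_cost(&t, sels[i]) < sscan_cost(&t) ? "index" : "heap");
    return 0;
}
```

With these made-up numbers the crossover sits near one percent selectivity, which is the qualitative trade the prose describes: random heap fetches erase the index’s advantage quickly.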

The chosen QO_PLAN is procedural-but-not-yet-runnable; turning it into something the executor can call is the job of cubrid-xasl-generator.md. xasl_generation.c walks the plan with gen_outer/gen_inner and produces an XASL_NODE tree. For our SELECT, the root is a BUILDLIST_PROC XASL whose spec_list carries one ACCESS_SPEC_TYPE for table t (with where_pred populated for x > 10), whose outptr_list projects all columns (the * was expanded during semantic check), whose aptr_list and dptr_list are empty (no subqueries), and whose scan_ptr is NULL (no chained join). The predicate x > 10 becomes a PRED_EXPR of opcode T_PRED, with a T_EVAL_TERM leaf comparing a REGU_VARIABLE of type TYPE_ATTR_ID (column reference for x) against a REGU_VARIABLE of type TYPE_DBVAL (the constant 10). The whole tree is then offset-table-serialised via xasl_to_stream.c (the xts_* family) into a self-describing byte buffer; pointers are flattened to offsets, and shared sub-trees are emitted once.
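Pointer flattening is the core trick of the xts_* family. A toy flattener over a two-child node; the real serializer also dedups shared sub-trees through an offset table, which this sketch omits:

```c
/* Post-order write: children first, then the parent record with its
 * child pointers replaced by byte offsets (-1 encodes NULL). The buffer
 * is assumed large enough; bounds checks are elided. */
#include <stdint.h>
#include <string.h>

struct node { int32_t opcode; struct node *left, *right; };

static int32_t flatten(const struct node *n, char *buf, int32_t *pos)
{
    if (n == NULL)
        return -1;
    int32_t lhs = flatten(n->left, buf, pos);
    int32_t rhs = flatten(n->right, buf, pos);
    int32_t at = *pos;
    int32_t rec[3] = { n->opcode, lhs, rhs };   /* pointers -> offsets */
    memcpy(buf + at, rec, sizeof rec);
    *pos += (int32_t) sizeof rec;
    return at;                             /* offset of this node's record */
}
```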

cubrid-xasl-cache.md describes what happens next. The serialised stream’s hash text is SHA-1’d into an XASL_ID, and a server-wide latch-free hashmap is consulted: hit → reuse the cached XASL_CACHE_ENTRY (skip everything from parsing to here on the next execution of the same SQL), miss → insert. Cache entries carry the serialised plan, a per-class OID dependency list (so DDL fires xcache_remove_by_oid to invalidate dependents), a statistics snapshot at compile time (drift triggers a soft RT recompile), refcount + eviction metadata, and the time_stored timestamp. For our query’s first execution we land in the miss path: compile, insert, return the XASL_ID. Subsequent executions hit the cache and skip directly to step 5.

Step 5 — Executor dispatches the operator tree


With an XASL_ID in hand the server’s query manager enters the executor. cubrid-query-executor.md is precise about the entry chain: xqmgr_execute_query looks up the cached XASL by XASL_ID, unpacks the host-variable buffer, and calls qmgr_process_query, which unpacks the XASL stream into an in-memory XASL_NODE tree (if not already a tree) and calls qexec_execute_query. The latter initialises an XASL_STATE (carrying a VAL_DESCR with host variables and timezone) and dispatches into qexec_execute_mainblock, the recursion entry that everything else (subqueries, CTEs, joined blocks) ultimately calls back into. qexec_execute_mainblock_internal then runs a switch (xasl->type) over the proc types — UPDATE_PROC, DELETE_PROC, INSERT_PROC, MERGE_PROC, CTE_PROC, DO_PROC, and so on — but our SELECT’s root is BUILDLIST_PROC, which falls through to the generic pull interpreter qexec_intprt_fnc.

The interpreter is the literal Volcano shape with three nested rings: an outer loop over qexec_next_scan_block_iterations (each iteration resets SCAN_IDs for a new combination of access specs), a middle loop over scan_next_scan for tuples within the current block, and an inner per-row body that runs bptr_list (path-expression fetches), dptr_list (correlated subqueries), scan_ptr (chained join block, recursive), and if_pred (residual predicates not pushed into the scan). The outer two loops give qexec_intprt_fnc its name; the inner block does the per-row work. For our one-table predicate-only query the inner block is degenerate — no path expressions, no correlated subqueries, no scan_ptr chain — so the loop reduces to: open root scan, pull next tuple, check predicate, append to xasl->list_id if qualified, repeat. The shape is identical to what Database Internals (Petrov, ch. 12) calls a Volcano-style filter-over-scan, just with CUBRID’s three-ring scaffolding.
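That degenerate loop is short enough to write out. A simplified rendering with stand-in signatures; the real functions take a THREAD_ENTRY and much richer state:

```c
/* The BUILDLIST_PROC inner loop reduced to its skeleton: start the scan,
 * pull tuples, evaluate the predicate under three-valued logic, append
 * qualifying rows to the output list file, tear the scan down. */
enum scan_code { S_SUCCESS, S_END, S_ERROR };
enum db_logical { V_FALSE, V_TRUE, V_UNKNOWN, V_ERROR };

struct scan_id; struct pred_expr; struct list_id;
extern int scan_start_scan(struct scan_id *);
extern enum scan_code scan_next_scan(struct scan_id *);
extern void scan_end_scan(struct scan_id *);
extern enum db_logical eval_pred(const struct pred_expr *, struct scan_id *);
extern int qfile_add_tuple_to_list(struct list_id *, struct scan_id *);

static int run_buildlist(struct scan_id *scan, const struct pred_expr *where,
                         struct list_id *result)
{
    enum scan_code sc;
    if (scan_start_scan(scan) != 0)        /* S_OPENED -> S_STARTED */
        return -1;
    while ((sc = scan_next_scan(scan)) == S_SUCCESS) {
        enum db_logical q = eval_pred(where, scan);  /* x > 10, 3-valued */
        if (q == V_ERROR)
            break;
        if (q == V_TRUE)                   /* FALSE and UNKNOWN both drop */
            qfile_add_tuple_to_list(result, scan);   /* project + append */
    }
    scan_end_scan(scan);
    return sc == S_END ? 0 : -1;
}
```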

To open the root scan, the executor calls qexec_open_scan on the spec (scan_open_<type>_scan underneath, dispatched into scan_manager.c). This returns a populated SCAN_ID whose type discriminator selects which arm of the tagged union holds live state. The status field starts at S_OPENED; the next call (scan_start_scan) advances it to S_STARTED by fixing initial pages, allocating HEAP_SCANCACHE for heap or BTREE_SCAN for index, and stamping the MVCC snapshot. From here the inner Volcano loop runs scan_next_scan repeatedly until S_END is returned, then scan_end_scan and scan_close_scan tear the SCAN_ID down. The lifecycle is the textbook five-step open → start → next (repeated) → end → close sequence that the heap-scan deck spelled out and that cubrid-query-executor.md and cubrid-scan-manager.md describe in matched detail.

Step 6 — Scan-manager picks the access method


cubrid-scan-manager.md is the single source of truth for what happens inside scan_open_<type>_scan and scan_next_scan. The polymorphism is data-driven: one SCAN_ID struct, one SCAN_TYPE discriminator, one switch per public function. The SCAN_TYPE enum enumerates exactly which access methods exist — S_HEAP_SCAN, S_PARALLEL_HEAP_SCAN, S_CLASS_ATTR_SCAN, S_INDX_SCAN, S_LIST_SCAN, S_SET_SCAN, S_JSON_TABLE_SCAN, S_METHOD_SCAN, S_VALUES_SCAN, S_SHOWSTMT_SCAN, S_HEAP_SCAN_RECORD_INFO, S_HEAP_PAGE_SCAN, S_INDX_KEY_INFO_SCAN, S_INDX_NODE_INFO_SCAN, S_DBLINK_SCAN, S_HEAP_SAMPLING_SCAN — and the catalogue is fixed at compile time. Adding a new access path means adding an enum value, a sub-struct in the union, a scan_open_* function, and switch arms in scan_start_scan/scan_next_scan_local/scan_end_scan/scan_close_scan.
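The polymorphism pattern itself is plain C. A sketch showing two of the sixteen arms, with invented field names:

```c
/* One struct, one discriminator, one switch per public entry point.
 * Adding an access path means a new enum value, a new union member, and
 * a new arm in each switch, exactly as the prose describes. */
typedef enum { S_HEAP_SCAN, S_INDX_SCAN /* , ...fourteen more */ } SCAN_TYPE;

typedef struct { int cur_page, cur_slot; } HEAP_SCAN_ID;   /* stand-ins */
typedef struct { int cur_leaf, cur_key; } INDX_SCAN_ID;

typedef struct {
    SCAN_TYPE type;                        /* discriminator */
    union {
        HEAP_SCAN_ID hsid;                 /* live iff type == S_HEAP_SCAN */
        INDX_SCAN_ID isid;                 /* live iff type == S_INDX_SCAN */
    } s;
} SCAN_ID;

extern int heap_scan_next(HEAP_SCAN_ID *);
extern int indx_scan_next(INDX_SCAN_ID *);

static int scan_next_scan_local(SCAN_ID *sid)
{
    switch (sid->type) {
    case S_HEAP_SCAN: return heap_scan_next(&sid->s.hsid);
    case S_INDX_SCAN: return indx_scan_next(&sid->s.isid);
    default:          return -1;           /* unhandled arm */
    }
}
```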

For WHERE x > 10, the optimizer’s choice in step 4 has already decided which arm fires. The two relevant cases:

  • Heap scan path (S_HEAP_SCAN). The optimizer picked sequential access either because no index on x exists or because the predicate is unselective enough that index access would cost more (random heap reads dominate). scan_open_heap_scan populates a HEAP_SCAN_ID with HEAP_SCANCACHE and a starting OID; scan_next_scan per tuple does heap_next (or the MVCC-aware heap_next_record) to advance through the heap, then evaluates where_pred (the predicate x > 10) on each fetched record. Predicates not pushed into the scan stay on xasl->if_pred and run after the scan.
  • Index range scan path (S_INDX_SCAN). The optimizer picked index access because an index on x exists and x > 10 is selective. scan_open_index_scan populates an INDX_SCAN_ID with a BTREE_SCAN, an OID buffer, and three predicate triples — range_pred (bounds the index walk: x > 10 becomes the lower bound of the range), key_pred (applied during the walk to columns in the index), scan_pred (applied after the heap fetch). Each is a (regu_list, pr_eval_fnc, ...) triple stored on INDX_SCAN_ID. scan_next_scan per tuple advances the B+Tree cursor to the next (key, OID) pair, applies key_pred, follows the OID into the heap (skipping the heap fetch entirely when the index is covering — every referenced column is already in the key), and applies scan_pred.

The split between range_pred/key_pred/scan_pred is the same one PostgreSQL calls IndexQual/IndexFilter/Filter and MySQL calls range/ref/Using where, just with CUBRID’s identifiers. The detail doc spells out the matching field layout in INDX_SCAN_ID.

Step 7 — Heap or B+Tree access method runs


Below the scan manager sit the access-method modules. They speak in pages, slots, and OIDs. cubrid-page-buffer-manager.md describes the substrate: every page fetch goes through pgbuf_fix on a Buffer Control Block (BCB), pinning the page in the buffer pool with a custom read/write/flush latch and a fix count. Eviction is decided by the buffer pool’s three-zone LRU, which is split into per-thread private lists and shared lists with adjustable quotas; for our scan, well-clustered sequential heap pages stay hot in the LRU.

For the heap-scan branch, cubrid-heap-manager.md walks the per-page logic. CUBRID’s heap pages are slotted: a small fixed header, a slot directory growing back from the end, and record bodies growing forward from the header. Each row carries an OID = (file, page, slot); the slot is the stable identifier (record bodies move during compaction but slot numbers do not). heap_next walks HEAP_CHAIN.next_vpid from the current page to the next, and within each page iterates slots; per-slot dispatch is on record_typeREC_HOME (record body lives in the slot), REC_RELOCATION (forwarding pointer to another slot), REC_BIGONE (overflow record in a separate file), and a few more. The fetched record is decoded by HEAP_CACHE_ATTRINFO::heap_attrinfo_read_dbvalues into a DB_VALUE slot per attribute, ready for predicate evaluation. The MVCC version chain is anchored in the heap header — each record carries its insert and delete MVCCIDs and a back-pointer to the prior version — and heap_get_visible_version_internal is what walks the chain when the on-page version is invisible to our snapshot.
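The slotted layout can be written down in a few structs. Field names and offsets here are illustrative, not CUBRID’s exact page format:

```c
/* Slotted page: fixed header at the front, record bodies growing forward,
 * slot directory growing backward from the page end. Compaction moves
 * bodies and rewrites slot offsets, but slot numbers (and OIDs) survive. */
#include <stdint.h>

#define PAGE_SIZE 16384

struct slot { uint16_t offset; uint16_t length; };  /* directory entry */

struct page_header {
    int32_t  next_vpid;                    /* HEAP_CHAIN-style page link */
    uint16_t num_slots;
    uint16_t free_space_start;             /* records grow toward this */
};

static const char *page_get_record(const char *page, int slot_no)
{
    const struct page_header *hdr = (const struct page_header *) page;
    if (slot_no >= hdr->num_slots)
        return 0;
    const struct slot *dir =               /* directory counts from the end */
        (const struct slot *) (page + PAGE_SIZE
                               - (slot_no + 1) * sizeof(struct slot));
    return dir->length ? page + dir->offset : 0;    /* 0 length: free slot */
}
```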

For the index-scan branch, cubrid-btree.md carries the per-node logic. CUBRID’s B+Tree nodes are slotted (inherited from heap pages), keys in non-unique indexes are stored as key || OID concatenations (the OID is the duplicate-key tie-breaker), and unique indexes store pure keys with the OID stored at a known offset (overflowing into a per-key OID list when the count is large). Descent is lock-coupled (parent-then-child, release parent) on the read path; the write path adds a “restart from root” recovery when a concurrent split has invalidated the descent. For our x > 10 range scan, btree_keyval_search descends to the leaf containing the smallest key strictly greater than 10, then walks the sibling-link chain forward, yielding (key, OID) pairs to the caller. Each OID is then a heap fetch (skipped when covering); the heap fetch itself goes back through cubrid-heap-manager.md’s slot iteration and cubrid-page-buffer-manager.md’s page fix.
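The range walk reduces to one descent plus a sibling chase. A skeleton with hypothetical helpers standing in for the BTREE_SCAN machinery:

```c
/* Range scan for x > 10: descend once to the first candidate leaf, then
 * follow sibling links, yielding (key, OID) pairs. The real descent is
 * lock-coupled and the leaf format is the slotted key||OID layout. */
struct leaf_page;
struct oid { int volid, pageid, slotid; };

extern struct leaf_page *btree_descend_to_leaf(int lower_bound_exclusive);
extern int leaf_num_keys(struct leaf_page *);
extern int leaf_key_at(struct leaf_page *, int idx);
extern struct oid leaf_oid_at(struct leaf_page *, int idx);
extern struct leaf_page *leaf_next_sibling(struct leaf_page *);

static void range_scan_gt(int bound, void (*emit)(struct oid))
{
    struct leaf_page *leaf = btree_descend_to_leaf(bound);
    while (leaf != NULL) {
        for (int i = 0; i < leaf_num_keys(leaf); i++)
            if (leaf_key_at(leaf, i) > bound)    /* keys are ordered */
                emit(leaf_oid_at(leaf, i));      /* then: heap fetch by OID */
        leaf = leaf_next_sibling(leaf);          /* sibling-link chain */
    }
}
```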

Either way, the cost paid per row is one heap-page fix (sequential under heap scan, random under index scan), one slot dereference, one heap_attrinfo_read_dbvalues per accessed column, and (for index scan) one B+Tree leaf-page fix that amortises across many (key, OID) pairs because leaf pages hold dozens to hundreds of entries.

Step 8 — Predicate evaluation

Once the access method has fetched a candidate record into DB_VALUE slots, the predicate has to fire. cubrid-query-evaluator.md is the dispatcher’s source of truth. The predicate x > 10 is represented as a PRED_EXPR tree: a T_PRED Boolean root over a T_EVAL_TERM leaf that compares two REGU_VARIABLEs. The walker eval_pred traverses the tree under three-valued logic — every node returns one of V_TRUE, V_FALSE, V_UNKNOWN, or V_ERROR — and short-circuits AND/OR according to the SQL truth table. Each leaf calls fetch_peek_dbval to resolve a REGU_VARIABLE to a DB_VALUE. The dispatcher reads the regu’s type tag (constant, attribute fetch, list-file position, arithmetic expression, function call, host variable, OID, list-id) and routes into a path-specific resolver; for our predicate, the left operand resolves via TYPE_ATTR_ID into the column slot the access method just populated, and the right operand resolves via TYPE_DBVAL to the constant 10.

The actual > comparison is the work of the scalar-function library. cubrid-scalar-functions.md describes the operator-primitive layer: each OPERATOR_TYPE (here T_GT) dispatches into a qdata_* arithmetic dispatcher, which fans out by DB_TYPE into per-pair variants. The two operands’ types are coerced to a common domain via tp_value_auto_cast if necessary (e.g., one side INTEGER and the other BIGINT), then a type-specific comparator returns the Boolean. NULL semantics propagate: if either side is NULL, the comparator returns V_UNKNOWN, and the WHERE-clause collapse rule (encoded in QPROC_QUALIFICATION) treats it as V_FALSE for filter purposes. Only rows that produce V_TRUE advance to the next step.
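The three-valued rules are compact enough to state in full. A self-contained sketch of the truth-table behaviour, not CUBRID’s eval code:

```c
/* NULL operands make a comparison UNKNOWN; AND short-circuits on FALSE;
 * the WHERE-clause collapse keeps only rows that evaluate to TRUE. */
typedef enum { V_FALSE = 0, V_TRUE = 1, V_UNKNOWN = 2 } DB_LOGICAL;

static DB_LOGICAL tvl_and(DB_LOGICAL a, DB_LOGICAL b)
{
    if (a == V_FALSE || b == V_FALSE)
        return V_FALSE;                    /* FALSE dominates AND */
    if (a == V_UNKNOWN || b == V_UNKNOWN)
        return V_UNKNOWN;
    return V_TRUE;
}

static DB_LOGICAL eval_gt_int(const int *lhs, const int *rhs)
{
    if (lhs == NULL || rhs == NULL)        /* NULL propagates as UNKNOWN */
        return V_UNKNOWN;
    return *lhs > *rhs ? V_TRUE : V_FALSE;
}

static int row_qualifies(DB_LOGICAL v)
{
    return v == V_TRUE;                    /* UNKNOWN filters out, like FALSE */
}
```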

CUBRID can specialise common predicate shapes for speed. eval_fnc inspects the PRED_EXPR and, for predicate shapes it recognises, returns a function pointer to a hand-written shape-specific evaluator (eval_pred_comp0 for binary equality, eval_pred_like6 for LIKE, etc.) that bypasses the recursive walker. For x > 10 the optimisation matters because the predicate evaluator is on the hot loop — once per row, multiplied by row count — and the saved virtual-call overhead is real on large heaps. The full recursive walker remains the fallback for any predicate too compound to specialise.

Step 9 — MVCC visibility

A row that satisfies the predicate is not yet a result row — it has to be visible under the transaction’s snapshot. cubrid-mvcc.md describes the model: every record header carries (inserted_by_mvccid, deleted_by_mvccid) plus a back-pointer to the previous version; every transaction takes a logical snapshot at its first read; visibility is decided by mvcc_satisfies_snapshot against the snapshot’s active-MVCCID set. A version is visible iff the inserter committed before the snapshot was taken (inserter not active, inserter MVCCID below the snapshot’s high-water mark) and the deleter (if any) either is still active or committed after the snapshot.
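The visibility rule compresses to a few lines. A sketch under an assumed snapshot representation (an active-ID array plus a high-water mark); the real MVCC_SNAPSHOT is more elaborate:

```c
/* A version is visible iff its inserter committed before the snapshot
 * and its deleter, if any, did not (still active, or committed later). */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t MVCCID;
#define MVCCID_NULL ((MVCCID) 0)           /* "never deleted" sentinel */

struct snapshot {
    MVCCID highest_completed;              /* high-water mark at snap time */
    const MVCCID *active;                  /* txns in flight at snap time */
    int n_active;
};

static bool committed_before_snapshot(const struct snapshot *s, MVCCID id)
{
    if (id >= s->highest_completed)
        return false;                      /* started after the snapshot */
    for (int i = 0; i < s->n_active; i++)
        if (s->active[i] == id)
            return false;                  /* in flight at snapshot time */
    return true;
}

static bool satisfies_snapshot(const struct snapshot *s,
                               MVCCID inserted_by, MVCCID deleted_by)
{
    if (!committed_before_snapshot(s, inserted_by))
        return false;                      /* insert not yet visible */
    if (deleted_by == MVCCID_NULL)
        return true;                       /* never deleted */
    return !committed_before_snapshot(s, deleted_by);  /* delete invisible */
}
```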

For heap scans, heap_next_record (the MVCC-aware variant of heap_next) calls heap_get_visible_version_internal per fetched record. If the on-page version is visible under our snapshot, that’s the row the predicate evaluator above sees; if it’s invisible (e.g., inserted by a transaction still active when our snapshot was taken), the version chain is walked back through the heap or the undo segment until a visible version is found, or the row is skipped entirely. For index scans, the visibility check is two-stage: the B+Tree leaf yields a candidate OID, the heap page is fetched, and only then is mvcc_satisfies_snapshot consulted on the heap-side header. The index scan additionally re-checks via locator_lock_and_get_object_with_evaluation when mvcc_select_lock_needed is set (i.e., SELECT ... FOR UPDATE); for our plain SELECT * the flag is off and no row-level lock is taken.

The visibility predicate is read-only and pays no log writes, but it does interact with the vacuum subsystem: a long-running snapshot pins the global “oldest visible MVCCID” watermark, which gates how aggressively dead versions can be reclaimed. For our short SELECT this is not a concern, but the structural cost of MVCC — that vacuum is gated by the longest-lived reader — is the textbook trade-off the design accepts in exchange for non-blocking reads. Predicate satisfaction and MVCC visibility are both required; a row that fails either is dropped silently from the result.

Step 10 — Materialisation into a list-file

Visible, predicate-satisfying rows now have to land somewhere. cubrid-list-file.md describes the substrate. Every materialised tuple stream in CUBRID — sub-query result, sort output, hash-build side, group-by accumulator, and the final query result — is the same QFILE_LIST_ID linked-page abstraction backed by a per-query QMGR_TEMP_FILE membuf-then-FILE_TEMP substrate. The producer writes via qfile_add_tuple_to_list; the consumer reads via qfile_open_list_scan + qfile_scan_list_next. The page format on disk is PAGE_QRESULT with a 32-byte QFILE_PAGE_HEADER; tuples are length-prefixed packed rows with a per-tuple length and a per-value length each. The list-file is non-recoverable and non-WAL-logged — there is nothing to roll back, no transaction has committed against its contents, so crashes simply discard it.

For our BUILDLIST_PROC XASL, qexec_end_one_iteration is the per-row producer call. Inside the inner Volcano loop, after the predicate has returned V_TRUE and MVCC has marked the row visible, the executor projects the columns into the output VAL_LIST, packs them into a QFILE_TUPLE, and calls qfile_add_tuple_to_list on xasl->list_id. The list-file lives in the in-memory membuf array as long as it fits; once it grows past the threshold it migrates transparently to a FILE_TEMP on disk via file_create_temp in file_manager.c. The transition is invisible to the producer and consumer.
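The spill discipline is simple to model. A sketch with stdio’s tmpfile() standing in for file_create_temp; page structure and thresholds are elided:

```c
/* Append into an in-memory buffer until it overflows, then migrate the
 * contents to a temp file once and keep appending there. The producer's
 * call site never changes, matching the "invisible transition" above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct list_file {
    char *membuf;                          /* in-memory phase */
    size_t used, cap;
    FILE *temp;                            /* non-NULL once spilled */
};

static int lf_append(struct list_file *lf, const void *tuple, size_t len)
{
    if (lf->temp == NULL && lf->used + len > lf->cap) {
        lf->temp = tmpfile();              /* spill point: migrate once */
        if (lf->temp == NULL
            || fwrite(lf->membuf, 1, lf->used, lf->temp) != lf->used)
            return -1;
        free(lf->membuf);
        lf->membuf = NULL;                 /* no WAL, no recovery: a crash
                                              simply discards the file */
    }
    if (lf->temp != NULL)
        return fwrite(tuple, 1, len, lf->temp) == len ? 0 : -1;
    memcpy(lf->membuf + lf->used, tuple, len);
    lf->used += len;
    return 0;
}
```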

When the inner loop drains (the access method returns S_END), the list-file is closed for writes (qfile_close_list) and the result is ready to be shipped. A small SELECT may stream rows back to the wire as it produces them rather than going through a separate post-emit phase; either way, the final list-file is the canonical handoff point. The cursor object that the client side will manipulate sits on top of this list-file: cubrid-cursor.md describes how a CURSOR_ID is a client-side fetch handle that locks onto the server-side QFILE_LIST_ID and pages tuples one network-page at a time across qfile_get_list_file_page round-trips. Holdable cursors (those that survive COMMIT) are detached from the transaction and re-attached to the session’s holdable_cursors list via the mechanisms cubrid-cursor.md describes; a non-holdable cursor disappears at COMMIT/ROLLBACK.

Step 11 — The return trip

The trip back is the trip out, in reverse. cubrid-network-protocol.md describes the server-to-client packing: each result row is or_pack_value’d into a wire buffer (one or_pack_* call per column), the buffer is wrapped in a CSS frame, and css_send_data_packet_for_request ships it out the connection socket. For SELECT queries the response is a result-set descriptor (column types, nullability, length list) followed by N rows; the descriptor is built from the XASL’s outptr_list and the list-file’s QFILE_TUPLE_VALUE_TYPE_LIST. The CAS process on the other end of the server-facing socket does the symmetric or_unpack_* walk to decode rows back into client-side DB_VALUE slots, then re-encodes them into CCI’s wire format on the JDBC-facing socket.

cubrid-broker.md notes that the broker is not on this hot path. Because of the SCM_RIGHTS fd handoff back in step 1, the JDBC client and the CAS speak directly; the broker cannot see individual result rows even if it wanted to. The broker only sees the data plane through indirect signals — SQL-log writes the CAS performs into shared memory, monitor counters incremented in the per-CAS T_APPL_SERVER_INFO, the CAS’s idle/busy state. This is why cub_broker can be debugged or restarted without dropping live queries (in principle — the detail doc flags one open question about whether all paths actually preserve this property under restart).

The JDBC driver receives the row stream and presents it to the application as a ResultSet. Each rs.next() call may either return a row already buffered in driver memory or pull the next network page from the CAS via the cursor protocol. When the application closes the result set or commits the transaction, the cursor closes (or migrates to the holdable list), qfile_destroy_list reclaims the list-file’s temp pages, the CAS releases the T_SRV_HANDLE, the DB_SESSION cleans up if no statements remain prepared, and — eventually, on driver-side Connection.close() — the JDBC TCP socket closes. The CAS then either parks in the broker’s idle pool (waiting for the next assignment) or, if KEEP_CONNECTION = AUTO and time_to_kill has expired, is harvested and a new CAS forked in its place.

Diagram — full pipeline

flowchart TD
  JDBC["JDBC client<br/>Statement.executeQuery"]
  Driver["JDBC driver / CCI native"]
  Broker["cub_broker<br/>(receiver_thr_f → dispatch_thr_f<br/>find_idle_cas)"]
  CAS["cub_cas worker<br/>(ux_database_connect /<br/>ux_prepare / ux_execute)"]
  DBI["client db_∗ API<br/>db_open_buffer →<br/>db_compile_statement_local →<br/>db_execute_statement_local"]
  NetCl["network_cl<br/>or_pack_∗<br/>NET_SERVER_QM_QUERY_PREPARE"]
  Master["cub_master + connector<br/>over AF_UNIX"]
  Worker["cubconn::connection::worker<br/>(epoll loop)"]
  Dispatch["net_Requests[]<br/>dispatch table"]
  Session["xsession_check_session<br/>SESSION_STATE binding"]
  TDES["LOG_FIND_THREAD_TRAN_INDEX<br/>LOG_TDES binding"]
  Lex["Flex csql_lexer.l<br/>yylex()"]
  Bison["Bison %glr-parser<br/>csql_grammar.y<br/>parser_new_node"]
  PT["PT_NODE tree"]
  Sem["pt_check_with_info<br/>name_resolution → type_check<br/>→ constant_fold → pt_cnf"]
  Rew["mq_rewrite<br/>(query_rewrite.c)"]
  Opt["qo_optimize_query<br/>QO_NODE/QO_TERM/QO_SEGMENT<br/>DP join enumeration"]
  Stats["xstats_update_statistics<br/>· qo_iscan/sscan_cost"]
  Gen["xasl_generation.c<br/>gen_outer/gen_inner<br/>· xts_∗ serializer"]
  Cache["XASL cache<br/>SHA-1 hash text<br/>xcache_remove_by_oid"]
  Exec["qexec_execute_mainblock_internal<br/>switch (xasl->type)<br/>BUILDLIST_PROC → qexec_intprt_fnc"]
  Scan["scan_open_<type>_scan<br/>scan_next_scan<br/>switch (SCAN_TYPE)"]
  Heap["heap_next / heap_next_record<br/>HEAP_CHAIN walk<br/>heap_attrinfo_read_dbvalues"]
  BTree["btree_keyval_search<br/>leaf-page sibling walk<br/>(key||OID)"]
  PGB["pgbuf_fix BCB<br/>three-zone LRU"]
  Eval["eval_pred PRED_EXPR walk<br/>fetch_peek_dbval<br/>eval_fnc shape specialisation"]
  Sclr["qdata_∗_dbval<br/>tp_value_auto_cast<br/>type-pair comparator"]
  MVCC["mvcc_satisfies_snapshot<br/>active-MVCCID set check<br/>version chain walk"]
  LF["qfile_add_tuple_to_list<br/>QFILE_LIST_ID<br/>membuf → FILE_TEMP"]
  Cur["CURSOR_ID<br/>qfile_get_list_file_page"]
  NetSr["or_pack_value rows<br/>CSS framed reply"]
  CASBack["CAS server-side socket<br/>or_unpack_value"]
  CASOut["CAS JDBC-side socket<br/>(SCM_RIGHTS handoff target)"]
  Result["JDBC ResultSet<br/>application code"]

  JDBC --> Driver
  Driver -. TCP .-> Broker
  Broker -. SCM_RIGHTS fd .-> CAS
  CAS --> DBI
  DBI --> NetCl
  NetCl -. CSS frame .-> Master
  Master --> Worker
  Worker --> Dispatch
  Dispatch --> Session
  Session --> TDES
  TDES --> Lex
  Lex --> Bison
  Bison --> PT
  PT --> Sem
  Sem --> Rew
  Rew --> Opt
  Stats -. fed into .-> Opt
  Opt --> Gen
  Gen --> Cache
  Cache --> Exec
  Exec --> Scan
  Scan --> Heap
  Scan --> BTree
  Heap --> PGB
  BTree --> PGB
  PGB --> Eval
  Eval --> Sclr
  Sclr --> MVCC
  MVCC -- visible --> LF
  MVCC -- skip --> Scan
  LF --> Cur
  Cur --> NetSr
  NetSr -. CSS frame .-> CASBack
  CASBack --> CASOut
  CASOut -. TCP .-> Driver
  Driver --> Result

  classDef detail fill:#eef,stroke:#557,stroke-width:1px;
  class Broker,CAS detail;
  class DBI detail;
  class NetCl,Master,Worker,Dispatch detail;
  class Session,TDES detail;
  class Lex,Bison,PT,Sem,Rew detail;
  class Opt,Stats,Gen,Cache detail;
  class Exec,Scan detail;
  class Heap,BTree,PGB detail;
  class Eval,Sclr,MVCC detail;
  class LF,Cur,NetSr detail;

Annotations of the major arrows, by detail doc:

  • JDBC → Broker → CAS — cubrid-broker.md (TCP listener, job-queue dispatch, SCM_RIGHTS fd handoff) plus cubrid-dbi-cci.md (CCI wire, T_SRV_HANDLE, ux_database_connect).
  • CAS → DBI → NetCl → Master → Worker → Dispatch — cubrid-network-protocol.md (CSS framing, NET_SERVER_* opcode dispatch, net_Requests[] table) plus cubrid-dbi-cci.md (db_* API surface, four-stage statement FSM).
  • Dispatch → Session → TDES — cubrid-server-session.md (SESSION_STATE lock-free hash, CSS_CONN_ENTRY cache) plus cubrid-transaction.md (TDES binding via LOG_FIND_THREAD_TRAN_INDEX).
  • Lex → Bison → PT → Sem → Rew — cubrid-parser.md (Flex/Bison GLR, PT_NODE tree, per-context allocator), cubrid-semantic-check.md (name resolution, type checking, constant folding, CNF), cubrid-query-rewrite.md (mq_rewrite, transformation catalogue).
  • Opt + Stats → Gen → Cache — cubrid-query-optimizer.md (QO_ENV graph, DP join enumeration, cost model), cubrid-statistics.md (cardinality/NDV/page counts), cubrid-xasl-generator.md (gen_outer/gen_inner, REGU_VARIABLE/ACCESS_SPEC/OUTPTR_LIST, xts_* serialiser), cubrid-xasl-cache.md (SHA-1 hash text, RT recompile, per-class OID invalidation).
  • Cache → Exec → Scan — cubrid-query-executor.md (qexec_execute_mainblock_internal, qexec_intprt_fnc, three-ring Volcano loop) plus cubrid-scan-manager.md (SCAN_ID dispatch, open/start/next/end/close protocol).
  • Scan → Heap/BTree → PGB — cubrid-heap-manager.md (slotted page, OID = (file, page, slot), heap_next, record_type dispatch, MVCC version chain anchor), cubrid-btree.md (key||OID, latch-coupling, leaf sibling walk), cubrid-page-buffer-manager.md (BCB, three-zone LRU, page fix/unfix protocol).
  • PGB → Eval → Sclr → MVCC — cubrid-query-evaluator.md (eval_pred PRED_EXPR walk, fetch_peek_dbval, three-valued logic, eval_fnc specialisation), cubrid-scalar-functions.md (qdata_*_dbval, type-pair comparator, NULL propagation), cubrid-mvcc.md (mvcc_satisfies_snapshot, active-MVCCID set, version-chain walk).
  • MVCC → LF → Cur → NetSr → CAS → JDBC — cubrid-list-file.md (QFILE_LIST_ID, qfile_add_tuple_to_list, membuf-to-temp transition), cubrid-cursor.md (CURSOR_ID, holdable handoff, page-at-a-time fetch), cubrid-network-protocol.md (or_pack_value, CSS reply framing) plus cubrid-broker.md and cubrid-dbi-cci.md for the symmetric outbound CAS hop.

What we did NOT cover

The example query was deliberately small. Several major branches of the executor and its surrounding modules are off the path. They are listed here with one-line pointers so a reader who lands on this doc looking for a different query shape can navigate.

  • Post-processing (group-by, order-by, distinct, window/analytic). Triggered when the XASL’s qexec_groupby / qexec_orderby_distinct / qexec_execute_analytic phases run after the main pull loop drains. See cubrid-post-processing.md for the full second-pass machinery.
  • Hash join (build/probe). Fires when the optimizer picks HASH_LIST_SCAN for a join leaf; the build side reads tuples from a list file and hashes them, the probe side issues hash lookups per outer-loop tuple. See cubrid-hash-join.md.
  • Partition pruning. When the queried table is partitioned, the optimizer prunes irrelevant partitions before plan generation; the executor then iterates surviving partitions inside the scan-block loop. See cubrid-partition.md.
  • Parallel query. When S_PARALLEL_HEAP_SCAN is chosen, the heap is split across worker threads coordinated by a parallel-scan manager. See cubrid-parallel-query.md.
  • JSON_TABLE. A virtual relation derived from a JSON document column; uses the S_JSON_TABLE_SCAN arm of the SCAN_ID union and the cubscan::json_table::scanner C++ object. See cubrid-json-table.md.
  • dblink (foreign data). Cross-DB queries that route through S_DBLINK_SCAN and a remote CCI driver. See the dblink coverage in cubrid-scan-manager.md’s section on DBLINK_SCAN_ID.
  • DML proc types (INSERT / UPDATE / DELETE / MERGE). Handled by qexec_execute_insert / _update / _delete / _merge rather than qexec_intprt_fnc. See cubrid-ddl-execution.md for the broader DML discussion.
  • Triggers and authorisation. A SELECT may fire BEFORE/AFTER STATEMENT triggers (none here); authorisation is enforced before parser entry via the CHECK_AUTHORIZATION attribute on net_Requests[]. See cubrid-trigger.md and cubrid-authentication.md.
  • Locking. Plain MVCC-snapshot SELECT does not acquire row locks. SELECT ... FOR UPDATE flips mvcc_select_lock_needed and routes through cubrid-lock-manager.md.
  • Recovery / WAL. SELECT writes no log records; the page buffer’s WAL ordering constraint (cubrid-log-manager.md, cubrid-recovery-manager.md) only fires on writes.
  • Replication / HA. A standby server applies log records but does not run the SELECT pipeline; the same cub_master machinery routes the connection to the master if HA is configured (cubrid-ha-replication.md, cubrid-heartbeat.md).
  • Vacuum / version-chain reclamation. The MVCC reader pins the global “oldest visible” watermark; vacuum runs in a separate process. See cubrid-vacuum.md.

This doc is synthesis. The detail docs cited at each step carry the actual mechanism descriptions; the symbol names referenced inline come from those docs’ ## Source Walkthrough and ## CUBRID's Approach sections, not from fresh source-tree reading. The CUBRID source tree itself lives at /data/hgryoo/references/cubrid/ and the relevant entry-point files implied by the trip are:

  • src/broker/broker.c, src/broker/cas.c, src/broker/cas_execute.c — broker and CAS (step 1, step 11).
  • src/compat/db_admin.c, src/compat/db_vdb.c, src/compat/db_query.c — db_* client API (step 1).
  • src/communication/network_cl.c, src/communication/network_sr.c, src/communication/network_interface_*.{c,cpp} — NRP wire (step 2, step 11).
  • src/connection/connection_sr.c, src/connection/server_support.c — CSS server-side framing (step 2).
  • src/session/session.c, src/session/session_sr.c — SESSION_STATE (step 2).
  • src/transaction/log_tran_table.c, src/transaction/transaction_sr.c — TDES (step 2).
  • src/parser/csql_lexer.l, src/parser/csql_grammar.y, src/parser/parse_tree.h, src/parser/parse_tree_cl.c — parser (step 3).
  • src/parser/semantic_check.c, src/parser/name_resolution.c, src/parser/type_checking.c, src/parser/cnf.c — semantic check (step 3).
  • src/optimizer/rewriter/query_rewrite*.c, src/parser/compile.c — query rewrite (step 3).
  • src/optimizer/query_graph.c, src/optimizer/query_planner.c, src/optimizer/plan_generation.c — optimizer (step 4).
  • src/storage/statistics_sr.c, src/storage/statistics_cl.c — statistics (step 4).
  • src/parser/xasl_generation.c, src/query/xasl_to_stream.c, src/xasl/xasl_stream.cpp — XASL generator (step 4).
  • src/query/xasl_cache.c, src/query/query_manager.c — XASL cache (step 4).
  • src/query/query_executor.c — executor (step 5).
  • src/query/scan_manager.c, src/query/scan_manager.h — scan manager (step 6).
  • src/storage/heap_file.c, src/storage/slotted_page.c — heap manager (step 7).
  • src/storage/btree.c, src/storage/btree_load.c — B+Tree (step 7).
  • src/storage/page_buffer.c — page buffer (step 7).
  • src/query/query_evaluator.c, src/query/fetch.c, src/query/regu_var.cpp — predicate evaluator (step 8).
  • src/query/arithmetic.c, src/query/numeric_opfunc.c, src/query/string_opfunc.c, src/query/query_opfunc.c — scalar functions (step 8).
  • src/transaction/mvcc.c, src/transaction/mvcc_table.cpp, src/transaction/mvcc_active_tran.cpp — MVCC (step 9).
  • src/query/list_file.c, src/query/query_list.h — list file (step 10).
  • src/query/cursor.c — cursor (step 10, step 11).

The full set of detail docs cited (in path order) is in the frontmatter references: block at the top of this file; the diagram-annotation list under ## Diagram — full pipeline names them again grouped by step. The off-path branches in ## What we did NOT cover link to the additional docs that this trip does not exercise.