CUBRID Query Executor — XASL Interpretation, Iterator Model, and Heap/Index Scan Operators
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
A query executor is the runtime that turns a physical plan tree into a stream of result tuples. Database Internals (Petrov, Ch. 12 “Query Execution”) and Garcia-Molina/Ullman/Widom Database Systems: The Complete Book (Ch. 16 “The Query Compiler” and Ch. 15 “Query Execution”) frame it as a single design problem with two competing solution families: pull-based iterator models (Volcano), where each operator is asked for tuples one at a time, and push-based / vectorised models (MonetDB/X100, modern HyPer), where each operator emits batches into the next.
Goetz Graefe’s Volcano — An Extensible and Parallel Query Evaluation System (TKDE 1994) is the canonical reference for the iterator model. Its core is the three-method contract every operator implements:
- open() — initialise scan state, allocate buffers, open underlying files and child operators recursively.
- next() — return the next tuple (or end-of-stream), pulling from children as needed.
- close() — release scan state, close child operators recursively.
The contract has three structural consequences that show up in every Volcano-style engine, including CUBRID:
- Composition is by tree, not by graph. Each operator names its children directly; the same tree shape that the optimiser produced is the shape the executor walks.
- Pipelining is automatic for non-blocking operators. A Filter over a Scan over an Index produces no intermediate materialisation; control simply walks down the tree on every next().
- Blocking operators (sort, hash-build, group-by, top-N) materialise. They consume their entire input inside open() (or on the first next()), write to a temporary file or in-memory structure, and become a leaf for everything above them.
Three textbook scan operators sit at the leaves of every plan tree:
- Heap scan — sequential walk over a row-oriented heap file. The cheapest leaf when the predicate is unselective and there is no usable index. CUBRID’s heap scan walks HEAP_CHAIN.next_vpid and dispatches per slot via record_type (REC_HOME / REC_RELOCATION / REC_BIGONE); see cubrid-heap-manager.md.
- Index scan — B+-tree range walk yielding (key, OID) pairs followed by a heap point-read per OID (or no heap read at all when the index is covering — all referenced columns are in the key).
- List scan — read back a previously materialised tuple stream. The result of open()-ing a Sort or a sub-block is a list file; the parent operator reads it as a list scan. List scan is what makes the hierarchical XASL tree work: every materialising sub-XASL produces a list, and its parent consumes that list as if it were any other relation.
The pull-based model has one well-known cost — per-tuple virtual call overhead — and one well-known benefit — trivial composition. CUBRID pays the cost (a function pointer per operator per tuple, in scan_next_scan → scan_next_scan_local → scan_next_<type>_scan) and reaps the benefit (~600 KB of query_executor.c glue is enough to handle the entire SQL surface, including correlated subqueries, group-by, analytic windows, hierarchical queries, CTEs, and partitioned classes). The vectorised alternative would require restructuring every operator’s interface; that work has not happened, and the document accordingly tracks the plan-tree iterator model as it stands.
Common DBMS Design
Every iterator-model executor — PostgreSQL, MySQL/InnoDB, Oracle, SQL Server, CUBRID — adopts the same patterns on top of Graefe’s three-method contract. Naming this vocabulary first makes the CUBRID-specific choices in “CUBRID’s Approach” below legible as dial settings, not inventions.
Per-operator state and dispatch
Each operator carries a state object that lives across open / next / close. PostgreSQL splits the immutable plan (Plan) from the per-execution plan state (PlanState); ExecProcNode(PlanState *) reads type and jumps into the per-operator next (ExecSeqScan, ExecIndexScan, ExecHashJoin, …) via a function pointer set in ExecInitNode. MySQL’s handler API (handler::rnd_init / rnd_next / rnd_end) plays the analogous role for storage-side scans. CUBRID’s plan is the XASL node tree (xasl_node); the per-operator state for a scan is the SCAN_ID tagged union (one of HEAP_SCAN_ID, INDX_SCAN_ID, LLIST_SCAN_ID, SHOWSTMT_SCAN_ID, …), and dispatch is switch (xasl->type) in qexec_execute_mainblock_internal over BUILDLIST_PROC / BUILDVALUE_PROC / UNION_PROC / MERGELIST_PROC / UPDATE_PROC / DELETE_PROC / INSERT_PROC / MERGE_PROC / CONNECTBY_PROC / CTE_PROC / DO_PROC and a few more. Row-producing types fall through to a generic pull interpreter, qexec_intprt_fnc.
Materialised output as a list file
Sort, hash-build, group-by, and sub-block boundaries cannot stream. The standard implementation gives each blocking operator a backing temporary file and a scan over that file as its output. PostgreSQL uses Tuplesortstate / Tuplestorestate. CUBRID writes list files (QFILE_LIST_ID, list_file.c); the parent reads them back as a list scan (scan_open_list_scan). Every BUILDLIST XASL materialises into xasl->list_id; the final query result is the top XASL’s list file, returned by qexec_execute_query.
Subquery placement — three pointers
Where a subquery hangs in the operator tree determines when it executes. PostgreSQL distinguishes InitPlan (uncorrelated, once-only) from SubPlan (correlated, per-row); CUBRID encodes the same distinction directly on the XASL node:
- xasl->aptr_list — uncorrelated subqueries. Executed once before the outer scan; the result is referenced via a REGU_VARIABLE of type CONSTANT whose dbvalptr points into the materialised tuple.
- xasl->dptr_list — correlated subqueries. Executed once per outer row from inside qexec_intprt_fnc after the outer’s correlation columns are bound.
- xasl->scan_ptr — the chained scan operator the outer iterates against. Multi-table joins are unrolled as a chain: outer.scan_ptr = middle, middle.scan_ptr = inner.
Predicate split and visibility checks
When the leaf is an index scan, predicates split three ways: range (bounds the index walk), key (applied during the walk to columns in the index), data (applied after the heap fetch). CUBRID’s INDX_SCAN_ID stores them as range_pred / key_pred / scan_pred, each with its own regu_list and pr_eval_fnc. The same split appears in MySQL’s range/ref/Using where and PostgreSQL’s IndexQual/IndexFilter/Filter.
A heap scan calls mvcc_satisfies_snapshot on each fetched record (via heap_next → heap_get_visible_version_internal). An index scan additionally re-checks via locator_lock_and_get_object_with_evaluation when mvcc_select_lock_needed is set (SELECT ... FOR UPDATE).
Theory ↔ CUBRID mapping
| Theoretical concept | CUBRID name |
|---|---|
| Operator plan node | xasl_node / XASL_NODE (src/xasl/xasl_node.hpp) |
| Plan-state / per-exec state | xasl_state + per-spec SCAN_ID (src/query/scan_manager.h) |
| Dispatch by operator type | switch (xasl->type) in qexec_execute_mainblock_internal (query_executor.c) |
| Generic pull-loop interpreter | qexec_intprt_fnc (query_executor.c) |
| Operator open() | qexec_open_scan (query_executor.c) → scan_open_<type>_scan (scan_manager.c) |
| Operator next() | scan_next_scan → scan_next_scan_local → scan_next_<type>_scan (scan_manager.c) |
| Operator close() | scan_end_scan + scan_close_scan (scan_manager.c) |
| Materialised output | QFILE_LIST_ID (list_file.c); per XASL on xasl->list_id |
| Uncorrelated subquery | xasl->aptr_list (executed once, pre-scan) |
| Correlated subquery | xasl->dptr_list (executed per outer row) |
| Join / nested operator chain | xasl->scan_ptr (single-link chain, walked top-down) |
| Index range / key / data predicates | INDX_SCAN_ID::range_pred / key_pred / scan_pred |
| Predicate evaluator function pointer | SCAN_PRED::pr_eval_fnc set by eval_fnc (query_evaluator.c) |
| Outer-block iteration entry | qexec_next_scan_block_iterations (query_executor.c) |
| Outer-block result write | qexec_end_one_iteration (query_executor.c) |
| Group-by | qexec_groupby / qexec_hash_gby_* (query_executor.c) |
| Order by + distinct | qexec_orderby_distinct (query_executor.c) |
| Analytic functions | qexec_execute_analytic (query_executor.c) |
| Server-side query entry | xqmgr_execute_query → qmgr_process_query → qexec_execute_query |
| Heap-scan operator | S_HEAP_SCAN SCAN_ID type, opened by scan_open_heap_scan |
| Index-scan operator | S_INDX_SCAN SCAN_ID type, opened by scan_open_index_scan |
| List-scan operator | S_LIST_SCAN SCAN_ID type, opened by scan_open_list_scan |
| Parallel heap scan | S_PARALLEL_HEAP_SCAN (scan_open_parallel_heap_scan, query/parallel/) |
| Per-scan heap state | HEAP_SCANCACHE (cached page latch + snapshot — see cubrid-heap-manager.md) |
| Per-record decoder | HEAP_CACHE_ATTRINFO (heap_attrinfo.h; heap_attrinfo_read_dbvalues) |
CUBRID’s Approach
The executor does four things: (1) walk the XASL tree and dispatch each node by xasl->type; (2) for the row-producing types, run a uniform open/next/close loop over the spec’s SCAN_ID; (3) for blocking nodes, materialise output into a list file and read it back via list scan; (4) for boundary nodes (group-by, order-by, analytic), drive a separate phase after the main loop. The four are interleaved by the execution model rather than sequential; the rest of this section walks them in the order an actual query traverses them.
Server-side entry — from xqmgr_execute_query to qexec_execute_mainblock_internal
A query reaches the executor through the query manager. xqmgr_execute_query looks up the cached XASL by XASL_ID, unpacks the host-variable buffer, and calls qmgr_process_query, which unpacks the XASL stream (if not already a tree) and calls qexec_execute_query. The latter initialises an XASL_STATE (carrying the VAL_DESCR with host vars and timezone) and dispatches to qexec_execute_mainblock, which is the recursion entry point that everything else (subqueries, joined blocks, CTEs) calls back into:
```c
// qexec_execute_mainblock — src/query/query_executor.c (condensed)
int
qexec_execute_mainblock (THREAD_ENTRY * thread_p, xasl_node * xasl, xasl_state * xstate,
                         UPDDEL_CLASS_INSTANCE_LOCK_INFO * p_class_lock_info)
{
  // ... recursion-depth guard, trace timer setup ...
  thread_inc_recursion_depth (thread_p);
  error = qexec_execute_mainblock_internal (thread_p, xasl, xstate, p_class_lock_info);
  // ... trace timer stop, perfmon stats accumulate ...
  thread_dec_recursion_depth (thread_p);
  return error;
}
```

qexec_execute_mainblock_internal is the operator-type dispatch:
```c
// qexec_execute_mainblock_internal — src/query/query_executor.c (condensed)
static int
qexec_execute_mainblock_internal (THREAD_ENTRY * thread_p, XASL_NODE * xasl,
                                  XASL_STATE * xasl_state,
                                  UPDDEL_CLASS_INSTANCE_LOCK_INFO * p_lock_info)
{
  // ... limit-clause early exit ...
  switch (xasl->type)
    {
    case UPDATE_PROC:
      error = qexec_execute_update (thread_p, xasl, false, xasl_state);
      break;
    case DELETE_PROC:
      error = qexec_execute_delete (thread_p, xasl, xasl_state);
      break;
    case INSERT_PROC:
      error = qexec_execute_insert (thread_p, xasl, xasl_state, false);
      break;
    case MERGE_PROC:
      error = qexec_execute_merge (thread_p, xasl, xasl_state);
      break;
    case CTE_PROC:
      error = qexec_execute_cte (thread_p, xasl, xasl_state);
      break;
    case DO_PROC:
      error = qexec_execute_do_stmt (thread_p, xasl, xasl_state);
      break;
    case CONNECTBY_PROC:
      /* fall through to special hierarchical interpreter */
      break;
    /* BUILDLIST_PROC / BUILDVALUE_PROC / UNION_PROC / ... fall through to pull-loop below */
    }
  // ... pre-processing (open scan, run aptr)
  // ... processing (qexec_intprt_fnc loop)
  // ... post-processing (group-by, order-by, analytic)
}
```

DML proc types route to dedicated functions because they have to drive an outer scan and perform the modification with proper locking. Row-producing types share the same generic pull interpreter.
XASL tree shape — aptr / dptr / scan_ptr
A single XASL node carries three pointers to other XASLs that frame its surroundings:
flowchart TB
subgraph X["xasl_node"]
T["type:<br/>BUILDLIST / BUILDVALUE /<br/>UNION / SCAN / ..."]
SP["spec_list<br/>(ACCESS_SPEC_TYPE chain —<br/>tables/indexes/list-files for this block)"]
OP["outptr_list<br/>(REGU_VARIABLE list — output projection)"]
A["aptr_list →<br/>uncorrelated subqueries<br/>(executed pre-scan, once)"]
D["dptr_list →<br/>correlated subqueries<br/>(executed per outer row)"]
S["scan_ptr →<br/>next operator in chain<br/>(joined block / nested scan)"]
L["list_id →<br/>materialised output<br/>(QFILE_LIST_ID)"]
end
A -. recurse .-> X1["aptr xasl"]
D -. recurse .-> X2["dptr xasl"]
S -. recurse .-> X3["scan_ptr xasl"]
The example from the v0.7 deck makes the relationship concrete:
```sql
SELECT (select col1 from tab c where col1 = a.col1)   -- DPTR (correlated)
FROM tab a, tab b                                     -- SCAN_PTR chain
WHERE a.col2 = b.col2
  AND a.col1 = (select col1 from tab d);              -- APTR (uncorrelated)
```

flowchart LR
  TopXASL["BUILDLIST<br/>(SELECT)<br/>spec=tab a"]
  AptrXASL["BUILDLIST<br/>spec=tab d<br/>flag=XASL_LINK_TO_REGU_VARIABLE"]
  DptrXASL["BUILDLIST<br/>spec=tab c<br/>flag=XASL_LINK_TO_REGU_VARIABLE"]
  ScanPtrXASL["SCAN_PROC<br/>spec=tab b"]
  TopXASL -- aptr_list --> AptrXASL
  TopXASL -- dptr_list --> DptrXASL
  TopXASL -- scan_ptr --> ScanPtrXASL
Plan-evaluation order: aptr → outer scan starts → for each outer row → dptr (executes once per row) → scan_ptr (joins), then end_one_iteration writes the projected row to the outer’s list_id.
The pull loop — qexec_intprt_fnc
For BUILDLIST / BUILDVALUE / SCAN_PROC nodes the executor runs the generic interpreter:
```c
// qexec_intprt_fnc — src/query/query_executor.c (condensed)
static SCAN_CODE
qexec_intprt_fnc (THREAD_ENTRY * thread_p, XASL_NODE * xasl, XASL_STATE * xasl_state,
                  QFILE_TUPLE_RECORD * tplrec, XASL_SCAN_FNC_PTR next_scan_fnc)
{
  while ((xb_scan = qexec_next_scan_block_iterations (thread_p, xasl)) == S_SUCCESS)
    {
      while ((ls_scan = scan_next_scan (thread_p, &xasl->curr_spec->s_id)) == S_SUCCESS)
        {
          /* per-row: bptr (path expressions) */
          for (xptr = xasl->bptr_list; qualified && xptr; xptr = xptr->next)
            qexec_execute_obj_fetch (thread_p, xptr, xasl_state, scan_op);

          /* per-row: dptr (correlated subqueries) */
          for (xptr = xasl->dptr_list; xptr; xptr = xptr->next)
            qexec_execute_mainblock (thread_p, xptr, xasl_state, NULL);

          /* per-row: scan_ptr (joined block, recursive) */
          if (xasl->scan_ptr)
            qexec_execute_scan (thread_p, xasl->scan_ptr, xasl_state, tplrec,
                                next_scan_fnc);

          /* per-row: predicates not pushed to the scan */
          if (xasl->if_pred)
            ev_res = eval_pred (thread_p, xasl->if_pred, &xasl_state->vd, NULL);

          /* write the qualified row to xasl->list_id */
          if (qualified)
            qexec_end_one_iteration (thread_p, xasl, xasl_state, tplrec);
        }
    }
}
```

Three nested loops: outer over scan blocks (each next_scan_block resets the SCAN_IDs for a new combination of access specs — see “Block iterations” below), middle over scan_next_scan for tuples within the current block, inner over the per-row work (path expressions, correlated subqueries, join chain, residual predicates). The loop is the literal Volcano shape, just with three guard rings instead of one.
Open / Next / Close — SCAN_ID is the operator state
The SCAN_ID is a tagged union over all the scan kinds CUBRID supports. Its type field selects which arm of the union is live:
```c
// SCAN_ID — src/query/scan_manager.h (sketch)
typedef enum
{
  S_HEAP_SCAN, S_HEAP_SCAN_RECORD_INFO, S_HEAP_PAGE_SCAN, S_HEAP_SAMPLING_SCAN,
  S_INDX_SCAN, S_INDX_KEY_INFO_SCAN, S_INDX_NODE_INFO_SCAN,
  S_LIST_SCAN, S_CLASS_ATTR_SCAN, S_SET_SCAN, S_JSON_TABLE_SCAN,
  S_VALUES_SCAN, S_DBLINK_SCAN, S_PARALLEL_HEAP_SCAN, S_SHOWSTMT_SCAN, ...
} SCAN_TYPE;

struct scan_id_struct
{
  SCAN_TYPE type;
  SCAN_OPERATION_TYPE scan_op_type;
  bool fixed;                     /* PEEK ok? (page latch held across calls) */
  bool grouped;                   /* iterating in groups (UPDATE/DELETE join helper) */
  bool mvcc_select_lock_needed;
  S_DIRECTION direction;          /* S_FORWARD / S_BACKWARD */
  S_POSITION position;            /* S_BEFORE / S_ON / S_AFTER */
  S_STATUS status;                /* S_OPENED / S_STARTED / S_ENDED / S_CLOSED */
  QPROC_QUALIFICATION qualification;
  VAL_LIST *val_list;             /* output projection target */
  VAL_DESCR *vd;
  union
  {
    HEAP_SCAN_ID hsid;
    INDX_SCAN_ID isid;
    LLIST_SCAN_ID llsid;          /* list scan */
    SET_SCAN_ID ssid;
    REGU_VALUES_SCAN_ID rvsid;    /* VALUES (...), VALUES (...) */
    PARALLEL_HEAP_SCAN_ID phsid;
    /* ... */
  } s;
};
```

The lifecycle of a SCAN_ID is the textbook five-step sequence the heap-scan deck spelled out:
stateDiagram-v2
  [*] --> S_OPENED : scan_open_<type>_scan
  S_OPENED --> S_STARTED : scan_start_scan (fix initial pages, allocate caches)
  S_STARTED --> S_STARTED : scan_next_scan loop
  S_STARTED --> S_ENDED : scan_end_scan (release per-block resources)
  S_ENDED --> S_OPENED : (next access-spec / block iteration)
  S_ENDED --> [*] : scan_close_scan (release file/list/temp)
The split between start/end and open/close exists because of block iterations: when a query has multiple access specs (e.g. UNION-ALL of two tables, or a partitioned class with multiple sub-classes), each block walks start_scan → next_scan loop → end_scan, while the file-level resources (allocated by open_scan) are kept across blocks and released only at close_scan.
Heap scan path — concretely
scan_open_heap_scan builds a HEAP_SCAN_ID from the access spec’s regu lists:
```c
// scan_open_heap_scan — src/query/scan_manager.c (condensed)
int
scan_open_heap_scan (THREAD_ENTRY * thread_p, SCAN_ID * scan_id,
                     bool mvcc_select_lock_needed, SCAN_OPERATION_TYPE op_type,
                     int fixed, int grouped, QPROC_SINGLE_FETCH single_fetch,
                     DB_VALUE * join_dbval, val_list_node * val_list, VAL_DESCR * vd,
                     OID * cls_oid, HFID * hfid,
                     regu_variable_list_node * regu_list_pred, PRED_EXPR * pr,
                     regu_variable_list_node * regu_list_rest, ...)
{
  HEAP_SCAN_ID *hsidp = &scan_id->s.hsid;

  scan_id->type = scan_type;    /* S_HEAP_SCAN */
  scan_init_scan_id (scan_id, mvcc_select_lock_needed, op_type, fixed, grouped, ...);

  COPY_OID (&hsidp->cls_oid, cls_oid);
  hsidp->hfid = *hfid;
  UT_CAST_TO_NULL_HEAP_OID (&hsidp->hfid, &hsidp->curr_oid);   /* iterator cursor */

  /* attach scan predicate + its compiled evaluator (eval_fnc picks the fastest
   * pr_eval_fnc for this PRED_EXPR shape) */
  scan_init_scan_pred (&hsidp->scan_pred, regu_list_pred, pr,
                       ((pr) ? eval_fnc (thread_p, pr, &single_node_type) : NULL));
  scan_init_scan_attrs (&hsidp->pred_attrs, num_attrs_pred, attrids_pred, cache_pred);
  hsidp->rest_regu_list = regu_list_rest;
  scan_init_scan_attrs (&hsidp->rest_attrs, num_attrs_rest, attrids_rest, cache_rest);

  hsidp->scancache_inited = false;   /* HEAP_SCANCACHE allocated lazily in start_scan */
  hsidp->scanrange_inited = false;
  return NO_ERROR;
}
```

The shape is uniform across scan types: copy identifiers and predicates from the optimiser-produced access spec into the SCAN_ID’s union, install a compiled predicate evaluator, and defer the actual page-fix / latch acquisition to scan_start_scan. eval_fnc (query_evaluator.c) is the operator-fusion entry point: it inspects the PRED_EXPR shape and returns one of a handful of optimised function pointers (single-comparison, conjunction, disjunction, between, …) so the per-row inner loop pays one indirect call instead of recursing through a generic predicate tree.
scan_next_heap_scan is the per-tuple next(). The structure is exactly the one the heap-scan deck describes:
```c
// scan_next_heap_scan — src/query/scan_manager.c (condensed)
static SCAN_CODE
scan_next_heap_scan (THREAD_ENTRY * thread_p, SCAN_ID * scan_id)
{
  HEAP_SCAN_ID *hsidp = &scan_id->s.hsid;
  FILTER_INFO data_filter = { &hsidp->scan_pred, &hsidp->pred_attrs, NULL, NULL,
                              scan_id->val_list, scan_id->vd, &hsidp->cls_oid, ... };

  while (1)
    {
      /* 1. Pull the next visible tuple from the heap. */
      sp_scan = heap_next (thread_p, &hsidp->hfid, &hsidp->cls_oid, &hsidp->curr_oid,
                           &recdes, &hsidp->scan_cache, is_peeking);
      if (sp_scan != S_SUCCESS)
        return (sp_scan == S_END) ? S_END : S_ERROR;

      /* 2. Evaluate the WHERE-pushed predicate against the tuple's pred_attrs. */
      ev_res = eval_data_filter (thread_p, p_current_oid, &recdes,
                                 &hsidp->scan_cache, &data_filter);
      if (ev_res != V_TRUE)
        continue;               /* not qualified — try next */

      /* 3. If FOR UPDATE / non-MVCC class, lock and re-fetch — re-evaluate predicate. */
      if (scan_id->mvcc_select_lock_needed)
        {
          sp_scan = locator_lock_and_get_object_with_evaluation (...);
          if (mvcc_reev_data.filter_result == V_FALSE)
            continue;
        }

      /* 4. Fetch the rest of the columns (those not needed for the predicate). */
      if (hsidp->rest_regu_list)
        fetch_val_list (thread_p, hsidp->rest_regu_list, scan_id->vd,
                        &hsidp->cls_oid, p_current_oid, NULL, COPY);

      return S_SUCCESS;
    }
}
```

The four-step inner shape — next obj → predicate → maybe-lock → rest values — is the executor-side amplifier of the predicate-attribute split: only pred_attrs are decoded before the predicate is evaluated, so a row that fails the WHERE clause never pays for decoding the projection columns. heap_attrinfo_read_dbvalues (called inside eval_data_filter) decodes only the pred_attrs; the rest_attrs decoder runs only after the row qualifies.
heap_next itself is the heap-side iterator that the executor calls into:
```c
// heap_next — src/storage/heap_file.c (entry; condensed)
SCAN_CODE
heap_next (THREAD_ENTRY * thread_p, const HFID * hfid, OID * class_oid, OID * next_oid,
           RECDES * recdes, HEAP_SCANCACHE * scan_cache, int ispeeking)
{
  return heap_next_internal (thread_p, hfid, class_oid, next_oid, recdes, scan_cache,
                             ispeeking, false /*reversed*/, NULL, NULL);
}

// heap_next_internal — src/storage/heap_file.c (condensed)
static SCAN_CODE
heap_next_internal (THREAD_ENTRY * thread_p, const HFID * hfid, OID * class_oid,
                    OID * next_oid, RECDES * recdes, HEAP_SCANCACHE * scan_cache,
                    bool ispeeking, bool reversed_direction, ...)
{
  while (true)
    {
      while (true)
        {
          /* Reuse the cached page latch if it covers oid; otherwise unfix
           * the old one and fix the new vpid via heap_scan_pb_lock_and_fetch. */
          if (scan_cache->page_watcher.pgptr != NULL
              && !VPID_EQ (&vpid, vpidptr_incache))
            pgbuf_replace_watcher (thread_p, &scan_cache->page_watcher,
                                   &old_page_watcher);
          if (scan_cache->page_watcher.pgptr == NULL)
            scan_cache->page_watcher.pgptr =
              heap_scan_pb_lock_and_fetch (thread_p, &vpid, OLD_PAGE_PREVENT_DEALLOC,
                                           S_LOCK, scan_cache,
                                           &scan_cache->page_watcher);
          /* Walk to next slot on the page; if we ran off the page, follow
           * HEAP_CHAIN.next_vpid and continue. */
          ...
        }
      /* Apply mvcc_satisfies_snapshot and follow REC_RELOCATION / REC_BIGONE
       * chains to materialise the visible record body into recdes. */
      ...
    }
}
```

The cache_last_fix_page pattern (the page latch is kept across heap_next calls when consecutive next-OIDs land on the same page) is what gives a heap scan amortised O(1)-per-tuple page-fix cost. See cubrid-heap-manager.md for record-type dispatch (REC_HOME / REC_RELOCATION → REC_NEWHOME / REC_BIGONE → overflow) inside heap_next_internal.
Index scan path — predicate split + heap point-read
scan_open_index_scan builds an INDX_SCAN_ID whose three predicate slots — range_pred, key_pred, scan_pred — each carry a regu_list and an eval_fnc-compiled evaluator. scan_next_index_scan walks the B+-tree under btree_range_search, evaluates the range and key predicates during the walk, and for each surviving OID either (a) returns from the covering index directly when the projection is fully covered, or (b) calls locator_attribute_info_force / heap_get_visible_version to fetch the heap row and apply scan_pred against it.
flowchart LR
Spec["ACCESS_SPEC<br/>(TARGET_CLASS, ACCESS_METHOD_INDEX)"] --> Open["scan_open_index_scan"]
Open --> SID["INDX_SCAN_ID<br/>{range_pred, key_pred, scan_pred,<br/>cls_regu_list_rest, indx_cov, ...}"]
SID --> Start["scan_start_scan"]
Start --> Next["scan_next_index_scan"]
Next -->|btree_range_search| Btree["B+-tree leaf walk"]
Btree --> KeyEval["evaluate key_pred"]
KeyEval -- pass --> CovTest{covering<br/>index?}
CovTest -- yes --> ListWrite["copy from index key<br/>directly to val_list"]
CovTest -- no --> Heap["heap_get_visible_version (oid)"]
Heap --> ScanEval["evaluate scan_pred"]
ScanEval -- pass --> ListWrite
ListWrite --> Next
The benefit of pushing predicates this finely is workload-dependent: a query whose key_pred rejects 90 % of the index range pays no heap-fetch tax for those keys, only a B+-tree leaf comparison.
List scan path — feeding parents from materialised children
When the optimiser produces a sub-block — a derived table, a sorted intermediate, an aptr that can’t be inlined, an aggregation result — the sub-block’s XASL is its own BUILDLIST_PROC that materialises into xasl->list_id. The parent’s access spec is TARGET_LIST and points at the sub-XASL:
```sql
SELECT a.col1
FROM (select col1 from tab a where col2 = 3) a, tab b
WHERE a.col1 = b.col1;
```

flowchart LR
  TopXASL["Outer BUILDLIST<br/>spec_list=[TARGET_LIST(a), TARGET_CLASS(b)]"]
  AptrSub["Sub-XASL<br/>BUILDLIST(tab a)<br/>materialises → list_id"]
  TopXASL -- aptr_list --> AptrSub
  TopXASL --> ScanA["scan_open_list_scan over a.list_id"]
  TopXASL --> ScanB["scan_open_heap_scan over tab b"]
  ScanA --> Next1["scan_next_list_scan tuple"]
  ScanB --> Next2["scan_next_heap_scan tuple"]
scan_next_list_scan walks the list file via qfile_scan_list_next, applies scan_pred against the materialised tuple, and projects via rest_regu_list — exactly the same shape as heap-scan but with qfile_scan_list_next instead of heap_next and no MVCC step (the list file already holds projected DB_VALUEs).
scan_next_scan — the outer dispatcher
The interpreter never calls a scan-type-specific next directly; it goes through scan_next_scan:
```c
// scan_next_scan — src/query/scan_manager.c (condensed)
SCAN_CODE
scan_next_scan (THREAD_ENTRY * thread_p, SCAN_ID * s_id)
{
  return scan_handle_single_scan (thread_p, s_id, scan_next_scan_local);
}
```

scan_handle_single_scan adds wrappers that handle (a) single_fetch semantics for SELECT-INTO-style queries, (b) per-tuple statistics counters, and (c) the qualification resolution loop. scan_next_scan_local is the actual dispatch:
```c
// scan_next_scan_local — src/query/scan_manager.c (condensed)
static SCAN_CODE
scan_next_scan_local (THREAD_ENTRY * thread_p, SCAN_ID * scan_id)
{
  switch (scan_id->type)
    {
    case S_HEAP_SCAN:
    case S_HEAP_SCAN_RECORD_INFO:
    case S_HEAP_SAMPLING_SCAN:
      return scan_next_heap_scan (thread_p, scan_id);
    case S_INDX_SCAN:
      return scan_next_index_scan (thread_p, scan_id);
    case S_LIST_SCAN:
      return scan_next_list_scan (thread_p, scan_id);
    case S_CLASS_ATTR_SCAN:
      return scan_next_class_attr_scan (thread_p, scan_id);
    case S_SHOWSTMT_SCAN:
      return scan_next_showstmt_scan (thread_p, scan_id);
    case S_SET_SCAN:
      return scan_next_set_scan (thread_p, scan_id);
    case S_JSON_TABLE_SCAN:
      return scan_next_json_table_scan (thread_p, scan_id);
    case S_VALUES_SCAN:
      return scan_next_value_scan (thread_p, scan_id);
    case S_DBLINK_SCAN:
      return scan_next_dblink_scan (thread_p, scan_id);
    case S_HEAP_PAGE_SCAN:
      return scan_next_heap_page_scan (thread_p, scan_id);
    case S_INDX_KEY_INFO_SCAN:
      return scan_next_index_key_info_scan (thread_p, scan_id);
    /* ... S_PARALLEL_HEAP_SCAN: returned by a different path, see below ... */
    default:
      return S_ERROR;
    }
}
```

Adding a new scan operator is a closed surgery: extend SCAN_TYPE, add the union arm, add an open / next / start / end / close set, and append a case to this switch.
End-to-end — a concrete SELECT walk
```sql
SELECT col1, col2 FROM t1 WHERE col1 = 1;
```

The plan is a single BUILDLIST_PROC with one TARGET_CLASS access spec on t1. The optimiser chose ACCESS_METHOD_SEQUENTIAL (no usable index, or seqscan cost wins). The runtime path is:
sequenceDiagram
participant CL as Client
participant QM as qmgr_process_query
participant QE as qexec_execute_query
participant MB as qexec_execute_mainblock_internal
participant IPF as qexec_intprt_fnc
participant SC as scan_next_scan / scan_next_heap_scan
participant HF as heap_next_internal
participant LF as list_file
CL->>QM: xqmgr_execute_query (xasl_id, host_vars)
QM->>QE: qexec_execute_query (xasl_p)
QE->>MB: qexec_execute_mainblock_internal
Note over MB: Pre-processing
MB->>MB: qexec_open_scan(spec=t1) → scan_open_heap_scan
MB->>MB: aptr loop (none here)
MB->>IPF: dispatch BUILDLIST_PROC
Note over IPF: Processing — pull loop
loop while qexec_next_scan_block_iterations == S_SUCCESS
IPF->>SC: scan_start_scan + scan_next_scan
loop while scan_next_scan == S_SUCCESS
SC->>HF: heap_next (next OID, fetch RECDES)
HF-->>SC: RECDES + page latch (cached)
SC->>SC: heap_attrinfo_read_dbvalues (pred_attrs)
SC->>SC: pr_eval_fnc(scan_pred) — col1 = 1 ?
alt qualified
SC->>SC: fetch_val_list (rest_regu_list — col2)
SC-->>IPF: S_SUCCESS
IPF->>IPF: dptr loop, scan_ptr loop, if_pred
IPF->>LF: qexec_end_one_iteration → write tuple
else not qualified
SC->>SC: continue (no rest fetch)
end
end
SC->>SC: scan_end_scan
end
Note over MB: Post-processing
MB->>MB: groupby? no. orderby? no. analytic? no.
MB->>MB: qexec_close_scan
MB-->>QE: NO_ERROR
QE-->>QM: xasl->list_id
QM-->>CL: cloned QFILE_LIST_ID
The dotted-line property the deck spelled out: nothing about this loop changes for a join — a join just makes xasl->scan_ptr non-NULL, and the per-row body in qexec_intprt_fnc recursively calls qexec_execute_scan on the inner block. Likewise, a correlated subquery just adds a dptr_list entry that runs once per qualified outer row.
Block iterations — (a ∪ b) ⋈ (c ∪ d)
When an access spec covers multiple targets — typically a partitioned class (ALL super_class) or an OID-list spec — the spec carries a chain of ACCESS_SPEC_TYPE nodes. The interpreter handles this through qexec_next_scan_block_iterations, which advances the spec chain top-down and resets every joined-block scan whenever an outer block resets.
flowchart LR
Out0["outer access_spec<br/>= (a, b)"] -- block 1: a --> InRing0
Out0 -- block 2: b --> InRing1
subgraph InRing0[" "]
direction LR
In0a["inner access_spec<br/>= (c, d)<br/>block 1: c"] --> In0b["inner block 2: d"]
end
subgraph InRing1[" "]
direction LR
In1a["inner access_spec<br/>= (c, d)<br/>block 1: c (reset)"] --> In1b["inner block 2: d"]
end
The four-block expansion (a∪b)⋈(c∪d) = (a⋈c)∪(a⋈d)∪(b⋈c)∪(b⋈d) falls out by running the outer block selector through qexec_next_scan_block on the outermost XASL, then recursively resetting all scan_ptr chains beneath it via the loop the v0.7 deck reproduces.
```c
// qexec_next_scan_block_iterations — src/query/query_executor.c (sketch)
static SCAN_CODE
qexec_next_scan_block_iterations (THREAD_ENTRY * thread_p, XASL_NODE * xasl)
{
  /* find the deepest scan_ptr that still has a qualified block */
  for (last_xptr = xasl; last_xptr->scan_ptr; last_xptr = last_xptr->scan_ptr)
    if (!last_xptr->next_scan_block_on || (...))
      break;

  /* advance that scan_ptr to its next block */
  xs_scan = qexec_next_scan_block (thread_p, last_xptr);

  /* reset every scan_ptr below it back to its first block */
  for (xptr2 = last_xptr; xptr2; xptr2 = xptr2->scan_ptr)
    qexec_next_scan_block (thread_p, xptr2->scan_ptr);

  /* reset every scan above it to re-run on the new combination */
  while (last_xptr != xasl)
    {
      /* unwind back up */
    }
}
```

Subqueries — APTR runs once, DPTR runs per row, REGU_VARIABLE binds the result
Both subquery kinds end up as separate XASL trees and execute via recursive `qexec_execute_mainblock`. The difference is when and how the result reaches the parent.
For an APTR (xasl->aptr_list), the outer’s pre-processing runs qexec_execute_mainblock(aptr) once. The result list is referenced via a REGU_VARIABLE of type CONSTANT whose value.dbvalptr points to a slot the executor populates from the materialised tuple. The aptr’s XASL has XASL_LINK_TO_REGU_VARIABLE set so that when the executor walks the aptr_list later, it skips it (the result is already materialised and bound).
For a DPTR (xasl->dptr_list), the per-row loop runs qexec_execute_mainblock(dptr) for each outer row. Inside the dptr’s scan, the outer’s correlation columns are visible because they live in the shared VAL_DESCR (via the XASL_STATE).
For a derived-table-style FROM-clause subquery, the outer’s spec is TARGET_LIST pointing at the inner XASL; pre-processing runs the inner once via aptr, and the outer reads the result with scan_open_list_scan.
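A compact way to see the once-versus-per-row difference (counters and function names are hypothetical, standing in for `qexec_execute_mainblock` on the aptr and dptr trees):

```c
#include <assert.h>

/* Hypothetical counters showing when each subquery kind runs relative to
   the outer loop: aptr in pre-processing (once), dptr in the per-row body. */
static int aptr_runs, dptr_runs;

static int run_aptr (void)          { aptr_runs++; return 42; }             /* uncorrelated */
static int run_dptr (int outer_row) { dptr_runs++; return outer_row * 2; }  /* correlated */

/* Mimics the mainblock ordering: aptr_list before the pull loop,
   dptr_list once per qualified outer row. */
static void
execute_mainblock (int n_outer_rows)
{
  aptr_runs = dptr_runs = 0;
  int bound = run_aptr ();	/* result bound via a CONSTANT regu variable */
  for (int row = 0; row < n_outer_rows; row++)
    {
      (void) bound;		/* outer rows just read the bound value */
      (void) run_dptr (row);	/* correlated subquery re-executes per row */
    }
}

/* Packs the two counters for easy checking: aptr * 100 + dptr. */
static int
run_counts (int n_outer_rows)
{
  execute_mainblock (n_outer_rows);
  return aptr_runs * 100 + dptr_runs;
}
```

Note that the aptr still runs exactly once even when the outer produces zero rows, because it executes during pre-processing, before the pull loop starts.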
Post-processing — group-by, order-by, analytic
After the main pull loop completes, the executor runs three optional post-processing phases on `xasl->list_id`:
```c
// post-processing tail — qexec_execute_mainblock_internal (sketch)
{
  /* ... main loop drained ... */

  /* GROUP BY — sort or hash, then evaluate aggregates per group */
  if (xasl->type == BUILDLIST_PROC && xasl->proc.buildlist.groupby_list)
    qexec_groupby (thread_p, xasl, xasl_state, &tplrec);

  /* analytic functions (window) — separate post-pass over list_id */
  if (xasl->type == BUILDLIST_PROC && xasl->proc.buildlist.a_eval_list)
    qexec_execute_analytic (thread_p, xasl, xasl_state, ...);

  /* ORDER BY / DISTINCT — sort_listfile with SORT_ELIM_DUP or SORT_DUP */
  if (xasl->orderby_list || (XASL_IS_FLAGED (xasl, XASL_HAS_DISTINCT)))
    qexec_orderby_distinct (thread_p, xasl, query_opt_mode, &tplrec);
}
```

Group-by has two implementations — sort group-by and hash group-by — chosen by the optimiser:
```mermaid
flowchart LR
subgraph Sort["sort group-by (qexec_groupby)"]
SR["result list"] --> SS["sort_listfile"]
SS --> SI["iterate sorted run"]
SI --> SA["accumulate ACC<br/>(qdata_evaluate_aggregate_list)"]
SA --> SO["result list"]
end
subgraph Hash["hash group-by"]
HM["per-row evaluation<br/>(qexec_hash_gby_agg_tuple)"] --> HACC["in-memory hash ACC"]
HACC -- spill --> HPL["partial list (sorted)"]
HPL --> HM2["sort_listfile"]
HM2 --> HMG["merge eq groups<br/>(qdata_aggregate_accumulator_to_accumulator)"]
HMG --> HSA["accumulate residual ACC"]
HSA --> HO["result list"]
end
```
Hash group-by switches to the spill path when the in-memory hash crosses the configured budget (prm_get_integer_value(PRM_ID_GBY_HASH_TO_SCAN_THRESHOLD)); the spilled partial list is then merged via the same qdata_aggregate_accumulator_to_accumulator path. Order-by and DISTINCT both flow through sort_listfile with the SORT_ELIM_DUP flag for DISTINCT.
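The merge step is just field-wise accumulator combination. A minimal sketch of an AVG accumulator under spill (hypothetical struct, not CUBRID's accumulator type):

```c
#include <assert.h>

/* Hypothetical AVG accumulator; merging two partial accumulators for the
   same group key is what the accumulator-to-accumulator merge step does
   when spilled hash groups are re-combined. */
typedef struct
{
  long sum;
  long count;
} acc;

static void acc_add (acc * a, long v) { a->sum += v; a->count++; }

/* Merge the spilled partial accumulator into the in-memory one. */
static void
acc_merge (acc * into, const acc * from)
{
  into->sum += from->sum;
  into->count += from->count;
}

static long
avg_after_spill (void)
{
  acc in_memory = { 0, 0 }, spilled = { 0, 0 };
  for (long v = 1; v <= 4; v++)
    acc_add (&in_memory, v);	/* rows seen before the budget is hit */
  for (long v = 5; v <= 8; v++)
    acc_add (&spilled, v);	/* rows accumulated after the spill */
  acc_merge (&in_memory, &spilled);
  return in_memory.sum / in_memory.count;	/* (1+...+8) / 8 */
}
```

This is also why AVG is carried as (sum, count) rather than a running average: the pair merges associatively, a finished average does not.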
Parallel heap scan — the one push-side detour
For sequential heap scans on large heaps, CUBRID can split the file across multiple worker threads. `qexec_open_scan` first tries `scan_open_parallel_heap_scan`; if it fails (small heap, partition too small to split), it falls back to single-thread `scan_open_heap_scan`. The parallel implementation lives under `src/query/parallel/`; the SCAN_ID arm is `S_PARALLEL_HEAP_SCAN` and produces results that the rest of the executor reads as if they were a list scan.
```c
// scan_open_parallel_heap_scan — src/query/scan_manager.c (sketch via qexec_open_scan)
#if SERVER_MODE && !WINDOWS
  error_code = scan_open_parallel_heap_scan (thread_p, s_id, mvcc_select_lock_needed,
					     fixed, grouped, vd, curr_spec,
					     &ACCESS_SPEC_CLS_OID (curr_spec),
					     &ACCESS_SPEC_HFID (curr_spec),
					     xasl, query_id);
  if (s_id->type == S_PARALLEL_HEAP_SCAN)
    {
      assert (s_id->s.phsid.manager != nullptr);
    }
  else
    {
      /* fallback — small heap; revert to S_HEAP_SCAN */
      scan_open_heap_scan (thread_p, s_id, ...);
    }
#endif
```

Parallel-heap is the one place the executor briefly leaves pure pull-mode: the workers push partial result tuples into a coordinator queue that the parent reads back in pull style.
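The push-then-pull hand-off can be sketched as a single-threaded simulation (hypothetical queue, standing in for the coordinator structure under `src/query/parallel/`):

```c
#include <assert.h>

/* Hypothetical coordinator queue: workers push partial tuples, the parent
   drains them in pull style.  Single-threaded simulation only — the real
   pool synchronises concurrent producers with one consumer. */
#define QCAP 64
static int queue[QCAP];
static int q_head, q_tail;

static void
worker_push (int tuple)
{
  queue[q_tail++ % QCAP] = tuple;	/* push side: worker threads */
}

/* Pull side: the parent operator's next().  -1 signals end-of-scan. */
static int
coordinator_next (void)
{
  return (q_head == q_tail) ? -1 : queue[q_head++ % QCAP];
}

static int
drain_sum (void)
{
  q_head = q_tail = 0;
  for (int w = 0; w < 4; w++)	/* four workers, each pushes three tuples */
    for (int t = 0; t < 3; t++)
      worker_push (w * 10 + t);
  int sum = 0, v;
  while ((v = coordinator_next ()) != -1)
    sum += v;			/* the executor above sees an ordinary scan */
  return sum;
}
```

The design point: everything above the coordinator keeps the Volcano contract, so the push-side detour is invisible to the rest of the plan tree.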
Source Walkthrough
Anchor on symbol names. Line numbers are scoped to this document’s `updated:` date and are intended only as quick hints; the symbol names are stable.
Server entry and dispatch (src/query/)
- `xqmgr_execute_query` (`query_manager.c`) — wire-side server entry; looks up the cached XASL.
- `qmgr_process_query` (`query_manager.c`) — unpack stream → XASL tree, call `qexec_execute_query`, clone/return list id.
- `qexec_execute_query` (`query_executor.c`) — top entry; sets up `XASL_STATE`, returns the top-level `QFILE_LIST_ID`.
- `qexec_execute_mainblock` / `qexec_execute_mainblock_internal` — recursion-depth wrapper + the operator-type dispatch switch.
- `qexec_intprt_fnc` — generic pull-loop interpreter (BUILDLIST/BUILDVALUE/SCAN_PROC).
- `qexec_execute_scan` / `qexec_execute_scan_ptr` — recursive entry for `xasl->scan_ptr` (joined-block chain).
- `qexec_open_scan` — fan-out from access spec to `scan_open_<type>_scan`.
- `qexec_next_scan_block` / `qexec_next_scan_block_iterations` — advance access-spec chain across UNION-ALL of partitions or sub-classes.
- `qexec_end_one_iteration` / `qexec_end_mainblock_iterations` — write qualified row, close down the main block’s scans.
DML and special proc-type handlers
- `qexec_execute_update` / `qexec_execute_delete` / `qexec_execute_insert` / `qexec_execute_merge` — pair a BUILDLIST row source with the mutation in `locator_*` / `heap_*`.
- `qexec_execute_dblink_query` — `TARGET_DBLINK` leaf.
- `qexec_execute_do_stmt` — DO-statement (no result, side effects only).
- `qexec_execute_connect_by` — hierarchical query (CONNECT BY).
- `qexec_execute_cte` — CTE materialisation/recursion (`xasl->max_iterations` ceiling).
Post-processing
- `qexec_groupby` / `qexec_groupby_index` — sort and index-driven group-by.
- `qexec_hash_gby_agg_tuple` / `qexec_hash_gby_get_next` / `qexec_hash_gby_put_next` — hash group-by per-row, cursor wrappers.
- `qdata_evaluate_aggregate_list` / `qdata_aggregate_accumulator_to_accumulator` — accumulator update; merge for hash spill.
- `qdata_save_agg_htable_to_list` / `qdata_load_agg_hvalue_in_agg_list` — spill out / load back hash accumulators.
- `qexec_orderby_distinct` / `qexec_orderby_distinct_by_sorting` — order-by + distinct via `sort_listfile` (SORT_ELIM_DUP / SORT_DUP).
- `qexec_execute_analytic` — analytic-function (window) post-pass.
Scan management (src/query/scan_manager.{c,h})
- `SCAN_ID` — operator state (tagged union); the `SCAN_TYPE` enum lists all scan kinds (`S_HEAP_SCAN`, `S_INDX_SCAN`, `S_LIST_SCAN`, `S_CLASS_ATTR_SCAN`, `S_SET_SCAN`, `S_JSON_TABLE_SCAN`, `S_VALUES_SCAN`, `S_SHOWSTMT_SCAN`, `S_DBLINK_SCAN`, `S_PARALLEL_HEAP_SCAN`, `S_HEAP_PAGE_SCAN`, `S_INDX_KEY_INFO_SCAN`).
- `S_STATUS` lifecycle (`S_OPENED` / `S_STARTED` / `S_ENDED` / `S_CLOSED`); `S_DIRECTION`, `S_POSITION`.
- `scan_init_scan_id` / `scan_init_scan_pred` / `scan_init_scan_attrs` — common initialisers.
- `scan_open_heap_scan` / `scan_open_index_scan` / `scan_open_list_scan` — the three principal opens.
- `scan_open_class_attr_scan` / `scan_open_set_scan` / `scan_open_values_scan` / `scan_open_json_table_scan` / `scan_open_dblink_scan` / `scan_open_showstmt_scan` / `scan_open_heap_page_scan` — secondary leaves.
- `scan_open_parallel_heap_scan` — worker-pool-driven heap scan; falls back to single-thread `scan_open_heap_scan`.
- `scan_start_scan` / `scan_end_scan` / `scan_close_scan` — per-block / file-level resource lifecycle.
- `scan_next_scan_block` — advance access-spec chain inside one SCAN_ID.
- `scan_next_scan` → `scan_handle_single_scan` → `scan_next_scan_local` — outer dispatcher chain.
- `scan_next_heap_scan` / `scan_next_index_scan` / `scan_next_list_scan` — the three principal nexts.
- `scan_next_class_attr_scan` / `scan_next_set_scan` / `scan_next_value_scan` / `scan_next_json_table_scan` / `scan_next_dblink_scan` / `scan_next_heap_page_scan` / `scan_next_index_key_info_scan` — secondary nexts.
Heap-side helpers (src/storage/heap_file.c)
- `heap_next` / `heap_prev` — forward/backward iterators; thin wrappers over `heap_next_internal` (page-fix, slot walk, MVCC visibility, `REC_RELOCATION` / `REC_BIGONE` chain).
- `heap_next_sampling` / `heap_next_record_info` — sampling and record-info variants.
- `heap_first` / `heap_last` — single-fetch / positional-index entry points.
- `heap_get_visible_version` / `heap_get_visible_version_internal` / `heap_prepare_object_page` — point-read with snapshot evaluation and version-chain walk.
- `heap_attrinfo_start` / `heap_attrinfo_end` / `heap_attrinfo_read_dbvalues` — decoder cache lifecycle and RECDES → DB_VALUE conversion.
Predicate evaluator, fetcher, list I/O
- `eval_data_filter` / `eval_fnc` / `eval_pred*` (`query_evaluator.c`) — `FILTER_INFO` bridge, per-shape compiled predicate, fast paths.
- `fetch_val_list` / `fetch_peek_dbval` (`fetch.c`) — REGU_VARIABLE → DB_VALUE materialisation.
- `qexec_execute_obj_fetch` — bptr (path-expression) fetcher.
- `QFILE_LIST_ID`, `qfile_open_list` / `qfile_close_list`, `qfile_scan_list_next`, `qfile_save_tuple`, `qfile_clone_list_id`, `sort_listfile` (`list_file.c`) — list-file I/O.
XASL node types (src/xasl/xasl_node.hpp)
- `xasl_node` — the operator/plan node.
- `proc_type` — `BUILDLIST_PROC` / `BUILDVALUE_PROC` / `UNION_PROC` / `INTERSECTION_PROC` / `DIFFERENCE_PROC` / `MERGELIST_PROC` / `OBJFETCH_PROC` / `SCAN_PROC` / `UPDATE_PROC` / `DELETE_PROC` / `INSERT_PROC` / `MERGE_PROC` / `CONNECTBY_PROC` / `CTE_PROC` / `DO_PROC`.
- `ACCESS_SPEC_TYPE` — `TARGET_CLASS` / `TARGET_LIST` / `TARGET_REGUVAL_LIST` / `TARGET_SET` / `TARGET_DBLINK` / `TARGET_JSON_TABLE` / `TARGET_VALUES_LIST` / `TARGET_SHOWSTMT`.
- `ACCESS_METHOD` — `ACCESS_METHOD_SEQUENTIAL` / `ACCESS_METHOD_INDEX` / `ACCESS_METHOD_SEQUENTIAL_RECORD_INFO` / `..._SAMPLING_SCAN`.
Position hints as of 2026-04-30
| Symbol | File | Line |
|---|---|---|
xqmgr_execute_query | query_manager.c | 1299 |
qmgr_process_query | query_manager.c | 1177 |
qmgr_get_query_entry | query_manager.c | 543 |
qexec_execute_query | query_executor.c | 16292 |
qexec_execute_mainblock | query_executor.c | 14900 |
qexec_execute_mainblock_internal | query_executor.c | 15058 |
qexec_intprt_fnc | query_executor.c | 9092 |
qexec_open_scan | query_executor.c | 7303 |
qexec_execute_scan | query_executor.c | 8300 |
qexec_execute_scan_ptr | query_executor.c | 8261 |
qexec_next_scan_block | query_executor.c | 7851 |
qexec_next_scan_block_iterations | query_executor.c | 7979 |
qexec_end_one_iteration | query_executor.c | 1130 |
qexec_end_mainblock_iterations | query_executor.c | 14701 |
qexec_groupby | query_executor.c | 5247 |
qexec_groupby_index | query_executor.c | 20526 |
qexec_hash_gby_agg_tuple | query_executor.c | 4537 |
qexec_hash_gby_get_next | query_executor.c | 4778 |
qexec_hash_gby_put_next | query_executor.c | 4793 |
qexec_orderby_distinct | query_executor.c | 3936 |
qexec_orderby_distinct_by_sorting | query_executor.c | 4000 |
qexec_execute_analytic | query_executor.c | 21007 |
qexec_execute_obj_fetch | query_executor.c | 13182 |
qexec_execute_connect_by | query_executor.c | 16716 |
qexec_execute_cte | query_executor.c | 17472 |
scan_open_heap_scan | scan_manager.c | 2846 |
scan_open_index_scan | scan_manager.c | 3067 |
scan_open_list_scan | scan_manager.c | 3715 |
scan_start_scan | scan_manager.c | 4136 |
scan_next_scan_block | scan_manager.c | 4606 |
scan_end_scan | scan_manager.c | 4749 |
scan_close_scan | scan_manager.c | 4882 |
scan_next_scan_local | scan_manager.c | 5193 |
scan_next_heap_scan | scan_manager.c | 5361 |
scan_next_index_scan | scan_manager.c | 5909 |
scan_next_list_scan | scan_manager.c | 6647 |
scan_next_scan | scan_manager.c | 7329 |
heap_next_internal | storage/heap_file.c | 7902 |
heap_next | storage/heap_file.c | 19427 |
heap_first | storage/heap_file.c | 8483 |
heap_last | storage/heap_file.c | 8511 |
heap_attrinfo_read_dbvalues | storage/heap_file.c | 10683 |
heap_get_visible_version | storage/heap_file.c | 25456 |
heap_get_visible_version_internal | storage/heap_file.c | 25577 |
heap_prepare_object_page | storage/heap_file.c | 25856 |
Cross-check Notes
The raw analysis (`analysis_QP_executor_0.7.pptx`, dated 2020-03-16) is version 0.7 — explicitly an early exploratory pass — and tracks the executor as it was around CUBRID 10.x / 11-early. The `Heap scan.pdf` companion is more recent and aligned with the current source. Several aspects have shifted since the v0.7 deck was authored; flag these when reading the deck against the current tree:
- The deck’s narrative entry point is `qexec_execute_mainblock()`. The current code adds an `_internal` indirection: `qexec_execute_mainblock` is just a recursion-depth + perfmon wrapper around `qexec_execute_mainblock_internal`. Symbol names quoted from the deck still resolve, but the dispatch switch lives one frame deeper.
- `SELECT_PROC` is no longer a separate proc type. The deck’s slide on “EXECUTOR (`qexec_execute_mainblock()`)” lists pre-processing/processing/post-processing as the only code path, implicitly assuming a SELECT_PROC. The current code uses `BUILDLIST_PROC` and `BUILDVALUE_PROC` (and several others) and routes them through the same `qexec_intprt_fnc` interpreter; the proc type is what the optimiser writes, not what the executor branches on at the top level.
- DML proc types (`UPDATE_PROC` / `DELETE_PROC` / `INSERT_PROC` / `MERGE_PROC`). The deck’s flowchart shows only the SELECT path. The current `qexec_execute_mainblock_internal` has dedicated arms for DML proc types that route to `qexec_execute_update` / `_delete` / `_insert` / `_merge`; these are not covered by the v0.7 narrative.
- Hierarchical / CTE proc types. `CONNECTBY_PROC` and `CTE_PROC` are dispatched separately — neither is in the deck. The CTE recursive-iteration counter (`xasl->max_iterations`) drives a `cte_offset_read_tuple` hand-shake inside `qexec_intprt_fnc` that postdates the deck.
- Parallel heap scan. `S_PARALLEL_HEAP_SCAN` and `scan_open_parallel_heap_scan` are recent (`SERVER_MODE && !WINDOWS` only). The deck’s heap-scan flow assumes a single-threaded `S_HEAP_SCAN`. For large heaps with no usable index, the dispatch in `qexec_open_scan` first tries parallel and falls back to single-thread.
- `S_JSON_TABLE_SCAN` / `S_DBLINK_SCAN` / `S_VALUES_SCAN` / `S_SHOWSTMT_SCAN`. Modern leaf scan types not present in the v0.7 deck. The same open/start/next/end/close shape applies, but each has its own `scan_open_*` / `scan_next_*` set.
- Hash group-by spill mechanics. The deck’s diagram shows the partial-list-merge logic but does not name `PRM_ID_GBY_HASH_TO_SCAN_THRESHOLD` or the exact spill predicate. Current code uses a budgeted in-memory hash with explicit spill via `qdata_save_agg_htable_to_list` when the budget is exceeded; the merge step uses `qdata_aggregate_accumulator_to_accumulator` to combine equal-key accumulators.
- `grouped` / `single_fetch` semantics on SCAN_ID. The deck does not detail these flags, but they meaningfully alter `scan_next_heap_scan` (fixed page latch + PEEK semantics under `grouped`) and the higher wrapper `scan_handle_single_scan` (single-tuple SELECT-INTO).
- MVCC visibility was reworked. The deck pre-dates `heap_get_visible_version` / `heap_get_visible_version_internal` in their current form; in particular, the `prev_version_lsa` chain walking the undo log for too-new versions (see `cubrid-mvcc.md` / `cubrid-heap-manager.md`) is materially different from the deck’s brief MVCC-snapshot reference.
- The heap-scan PDF’s five-step lifecycle (`scan_open` / `scan_start` / `scan_next` / `scan_end` / `scan_close`) is correct as of current source. The PDF’s description of the four-step inner loop in `scan_next_heap_scan` (next obj → predicate → maybe-lock → rest) also matches the current code body.
Open Questions
- Cost of the per-tuple virtual call. `scan_next_scan` → `scan_handle_single_scan` → `scan_next_scan_local` → `scan_next_<type>_scan` is at least three indirect calls per tuple, on top of `pr_eval_fnc`. For TPC-H-shaped scans on cold caches, the cache-miss cost on the indirect targets is plausibly the bottleneck. Investigation path: PMU profiling on `scan_next_heap_scan` under sustained scans; consider the cost of a vector-batched alternative.
- `cache_last_fix_page` invariants under MVCC re-fetch. When `mvcc_select_lock_needed == true` and the row needs a re-fetch with the lock, the page LSA may have advanced (`PGBUF_IS_PAGE_CHANGED`), at which point the code drops PEEK to COPY and restarts. Are there cases where this loop does not converge — for instance, a heavily-updated heap page under repeated `SELECT FOR UPDATE`?
- Hash group-by spill threshold default. `PRM_ID_GBY_HASH_TO_SCAN_THRESHOLD` controls when the in-memory hash spills. The current default has not been re-tuned against modern memory sizes; a benchmark sweep would surface a better default (or argue for a workload-adaptive policy).
- `xasl->max_iterations` for recursive CTE. The interpreter terminates a recursive CTE when `recursive_iterations >= xasl->max_iterations` (`qexec_intprt_fnc`). What does the optimiser set this to? Is it user-tunable, or is it a hard ceiling? Investigation path: track all assignments to `xasl_node::max_iterations` in `xasl_generation/`.
- `count_star_with_iscan_opt` correctness. The optimisation that skips per-key iteration for `SELECT COUNT(*) FROM t` when there is a usable index relies on `specp->s_id.s.isid.need_count_only = true` and a single `oids_count` accumulation per scan block. Are there filter shapes that silently bypass this and quietly produce wrong counts?
- Parallel heap scan fallback boundary. `scan_open_parallel_heap_scan` falls back to single-thread when the heap is small or the partition makes parallelism uneconomic. The exact boundary is internal; document the predicate and how it interacts with the optimiser’s cost model.
- `S_DBLINK_SCAN` push-down. Predicates that would normally split into key/scan/data for a local index — what fraction of those are pushed to the remote? Are there predicate shapes (NLS comparisons, JSON ops) that are silently re-evaluated locally?
- `SELECT_PROC` versus `BUILDLIST_PROC` history. Was there ever a distinct `SELECT_PROC` proc type that was later folded into BUILDLIST? The v0.7 deck is ambiguous; git history should resolve this. (A `git log --diff-filter=D -S 'SELECT_PROC'` pass on `xasl/` would settle it.)
- `XASL_LINK_TO_REGU_VARIABLE` lifecycle. When an aptr’s result is bound into a parent’s REGU_VARIABLE, re-execution of the same XASL (e.g. inside a UNION-ALL spec) must not re-run the aptr. The flag suppresses re-execution, but the ownership semantics of the bound DB_VALUE are subtle; investigate corner cases when an aptr returns a SET-typed value.
Sources
Raw analyses (raw/code-analysis/cubrid/query-processing/)
- `analysis_QP_executor_0.7.pptx` — early (2020-03) executor overview deck; main-block flow, sub-query placement, group-by sketch.
- `Heap scan.pdf` — heap-scan deep dive; the five-step scan lifecycle, four-step inner loop, HEAP_SCANCACHE / HEAP_CACHE_ATTRINFO local-cache framing.
- `_converted/analysisqpexecutor0.7.pptx.md` — markitdown extract of the PPTX (used inline above).
- `_converted/heap-scan.pdf.md` — markitdown extract of the PDF.
Sibling docs
- `knowledge/code-analysis/cubrid/cubrid-heap-manager.md` — record-type vocabulary (`REC_HOME` / `REC_RELOCATION` / `REC_BIGONE`), `heap_next` page-walk, and the five caches that `HEAP_SCANCACHE` participates in.
- `knowledge/code-analysis/cubrid/cubrid-mvcc.md` — `mvcc_satisfies_snapshot`, `prev_version_lsa` chain, and how visibility plugs into the heap iterator.
Textbook chapters / papers
- Database Internals (Petrov), Ch. 12 “Query Execution” — pull vs push, operator state, materialisation strategies.
- Database Systems: The Complete Book (Garcia-Molina/Ullman/Widom), Ch. 15 “Query Execution” + Ch. 16 “The Query Compiler” — Volcano model framing, sort/hash/group operator pseudocode.
- Graefe, Goetz. Volcano — An Extensible and Parallel Query Evaluation System, IEEE TKDE 6(1), 1994 — the canonical iterator-model paper; introduces `open`/`next`/`close` and the exchange operator.
- Graefe, Goetz. Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25(2), 1993 — broader survey including blocking-operator design and join algorithms.
CUBRID source (/data/hgryoo/references/cubrid/)
- `src/query/query_executor.c` — the executor (~28 KLOC). Entry: `qexec_execute_mainblock_internal`.
- `src/query/query_manager.c` — server-side query plumbing; entry: `xqmgr_execute_query`.
- `src/query/scan_manager.c` — the per-tuple iterator dispatcher; entry: `scan_next_scan`.
- `src/query/scan_manager.h` — `SCAN_ID`, `SCAN_TYPE`, `S_STATUS` definitions.
- `src/query/fetch.c` — `fetch_val_list` / `fetch_peek_dbval` regu-variable materialisation.
- `src/query/query_evaluator.c` — `eval_fnc`, `eval_data_filter`, the per-shape predicate fast paths.
- `src/query/list_file.c` / `list_file.h` — `QFILE_LIST_ID`, `qfile_scan_list_next`, `sort_listfile`.
- `src/query/parallel/` — parallel heap-scan worker pool implementation.
- `src/storage/heap_file.c` — `heap_next` / `heap_next_internal` and the heap-side scan helpers used from `scan_next_heap_scan`.
- `src/xasl/xasl_node.hpp` — `xasl_node`, proc type constants, `ACCESS_SPEC_TYPE`.