CUBRID Scan Manager — SCAN_ID Dispatch, Open/Next/Close Protocol, and the Access-Method Catalogue
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
Section titled “Theoretical Background”The job of a relational executor is to stitch operators into a tree and ask the leaves for tuples. The leaves are access methods: heap scans, index scans, list-file scans, generated rows, foreign-data scans, virtual catalogues. Above them sit filters, projections, joins, sorts, group-by, and the rest of the operator algebra. The runtime contract that lets every operator above pretend it is talking to a uniform table is what this document calls the scan manager.
Three theoretical pillars underpin the design.
The first is access-path selection. Selinger et al.’s 1979 Access Path Selection in a Relational DBMS (the System R paper) is the founding statement of the optimiser/executor split: the optimiser picks an access path per relation — sequential heap scan, index scan over a chosen index, or composite — using cost estimates over CPU and I/O; the executor faithfully runs whatever path was chosen. The split lives because access-path costing is a search problem (it requires statistics, join-order enumeration, dynamic programming) while access-path execution is a mechanical dispatch problem (apply the chosen heap walk, B+Tree descent, or list reader). System R also introduced the idea that a single uniform scan handle can hide the per-method bookkeeping from the operator above. CUBRID inherits both halves verbatim: the optimiser writes the choice into the XASL access-spec node, and scan_open_* plus scan_next_scan translate that choice into actual page reads.
The second pillar is the iterator model. Goetz Graefe formalised it in Volcano — An Extensible and Parallel Query Evaluation System (TKDE 1994) and surveyed it in Query Evaluation Techniques for Large Databases (ACM Computing Surveys 1993). Each operator implements three methods — open, next, close — and the executor pulls a tuple at a time. Two structural consequences matter for the scan manager. First, every leaf must implement the same three-method interface so that the operator above does not need to know what kind of leaf it has. This is the polymorphism CUBRID encodes in SCAN_ID. Second, the state of an in-flight scan is not the function-call stack; it is data carried inside the scan handle, because the executor returns to next over and over. This is why a heap scan keeps HEAP_SCANCACHE and curr_oid inside HEAP_SCAN_ID, and why an index scan keeps BTREE_SCAN, OID list, and curr_oidno inside INDX_SCAN_ID.
The third pillar is the access-method catalogue. Hellerstein and Stonebraker’s Anatomy of a Database System (Red Book Ch. 4) and Petrov’s Database Internals (Ch. 1 “Introduction and Overview” and Ch. 2-3 on B-tree variants) describe a DBMS as a layered system with a small set of stable access-method modules — heap files, B+Trees, hash indexes, sometimes bitmap or LSM variants — each owning its own page formats, latching, and concurrency. The scan manager is the dispatch layer between the operator algebra above and these access-method modules below. Database System Concepts (Silberschatz, Korth, Sudarshan), in its Query Processing chapter, frames the same idea as “evaluation algorithms”: every algebraic operator picks an evaluation algorithm (e.g., index scan, sort-merge join, hash aggregate), and the scan dispatch is what selects among heap-vs-index for relation access at the leaves.
Two further textbook ideas matter for CUBRID specifically. Database Internals Ch. 2-3 catalogues B-tree variants used in real systems, including the covering index (the leaf carries enough columns that the heap fetch can be skipped), index skip scan (cycling through the leading column’s distinct values when the predicate omits that column), and index loose scan (reading only one entry per distinct prefix for DISTINCT/MIN/MAX queries). All three appear inside INDX_SCAN_ID as flags and inline state. Database System Concepts on multi-table joins introduces the hash join build-and-probe pattern, which CUBRID layers on top of list scan via HASH_LIST_SCAN — the build side reads tuples from a list file and hashes them; the probe side issues hash lookups on each outer-loop tuple. The scan manager owns this layering because the build/probe machinery is bolted onto LLIST_SCAN_ID, not factored into a separate operator.
Where the scan operator sits in the operator tree fixes its responsibilities. Above it, every Volcano-style operator (filter, project, sort, hash-build, group-by, top-N, join driver) ultimately calls next() on a scan. Below it, the access methods (heap file, B+Tree, list file) expose lower-level interfaces (page-at-a-time iteration, key-value search, tuple-at-a-time list reads). The scan manager is the layer that bridges these two abstractions. It is the only place in CUBRID where the polymorphism implied by Selinger 1979 — “the optimiser chose this access path; the executor runs it” — actually gets compiled in C as a switch over SCAN_TYPE.
Common DBMS Design
Section titled “Common DBMS Design”Every iterator-model relational engine implements an access-method dispatch layer; the differences lie in how it is reified.
PostgreSQL keeps separate plan-state node types for each access path: SeqScanState, IndexScanState, BitmapHeapScanState, BitmapIndexScanState, IndexOnlyScanState, TidScanState, SubqueryScanState, FunctionScanState, ValuesScanState, CteScanState, WorkTableScanState, ForeignScanState. Each is a struct that extends ScanState, and ExecProcNode dispatches via the function pointer set by ExecInitNode. The polymorphism is by C-struct inheritance and a per-node function pointer table; the cost is a thicker plumbing layer (one initializer, one execProc, one shutdown per scan kind) but the benefit is per-kind type safety in the C code. Index scan further partitions into IndexScanDesc plus per-AM (am_beginscan, am_gettuple, am_endscan) callbacks dispatched through the pluggable index access-method API; this is how Postgres supports B-tree, hash, GiST, GIN, BRIN, and SP-GiST through a single executor.
MySQL/InnoDB pushes the dispatch one level lower: the SQL layer’s JOIN_TAB / QEP_TAB doesn’t care about the storage engine at all; it calls into the handler API, where handler::ha_index_init, ha_index_read_map, ha_index_next, ha_rnd_init, ha_rnd_next, ha_rnd_end are the three-method contract. Per-engine handlers (ha_innobase, ha_myisam, ha_archive, ha_federated, ha_partition) supply the actual scan logic. The dispatch is therefore double — once at the SQL/handler boundary, once inside the engine. The advantage is engine pluggability; the disadvantage is that the handler API has accreted dozens of methods over time as features (covering indexes, multi-range read, condition pushdown, parallel scan) leaked across the boundary.
SQL Server’s relational engine uses access-method “row sources” (the term row source is borrowed from Oracle, where the same idea appears as ROWSOURCE_SCAN, ROWSOURCE_INDEX_RANGE, etc.). A query plan is a tree of operators, each with Open, GetNext, and Close methods, and the leaves are access-method row sources that talk to the storage engine through a uniform tuple interface. Oracle additionally compiles its row-source tree to position-independent code so it can be shared across cursors.
CUBRID picks a third design point: one polymorphic struct, one function table, one switch. The SCAN_ID struct carries a SCAN_TYPE discriminator and an anonymous union of per-type sub-structs (HEAP_SCAN_ID, INDX_SCAN_ID, LLIST_SCAN_ID, SET_SCAN_ID, REGU_VALUES_SCAN_ID, SHOWSTMT_SCAN_ID, JSON_TABLE_SCAN_ID, DBLINK_SCAN_ID, METHOD_SCAN_ID, PARALLEL_HEAP_SCAN_ID, HEAP_PAGE_SCAN_ID, INDEX_NODE_SCAN_ID). Every public entry — scan_open_<kind>, scan_start_scan, scan_next_scan, scan_end_scan, scan_close_scan, scan_reset_scan_block, scan_next_scan_block — is a single function whose body is a switch (scan_id->type) that fans out to a per-type implementation. The polymorphism is therefore in data (the discriminator) rather than in types (Postgres) or in interfaces (MySQL). The trade-off CUBRID accepted: less type safety (everything compiles even when a SCAN_TYPE is wrong because the switch’s default is a runtime error), but a vastly thinner mechanism — one open function, one next function, one close function in scan_manager.h, all shared.
The SCAN_TYPE enum in scan_manager.h enumerates exactly which access methods CUBRID supports today: S_HEAP_SCAN, S_PARALLEL_HEAP_SCAN, S_CLASS_ATTR_SCAN, S_INDX_SCAN, S_LIST_SCAN, S_SET_SCAN, S_JSON_TABLE_SCAN, S_METHOD_SCAN, S_VALUES_SCAN, S_SHOWSTMT_SCAN, S_HEAP_SCAN_RECORD_INFO, S_HEAP_PAGE_SCAN, S_INDX_KEY_INFO_SCAN, S_INDX_NODE_INFO_SCAN, S_DBLINK_SCAN, S_HEAP_SAMPLING_SCAN. The catalogue is fixed at compile time; adding a new access path means adding an enum value, a sub-struct in the union, an scan_open_* function, and switch arms in scan_start_scan / scan_next_scan_local / scan_end_scan / scan_close_scan. The whole catalogue lives in roughly 240 KB of scan_manager.c.
CUBRID’s Approach
Section titled “CUBRID’s Approach”SCAN_ID — one struct, many shapes
Section titled “SCAN_ID — one struct, many shapes”The polymorphism is realised in a single struct in scan_manager.h. The fields above the union are common to every scan kind; the union encodes the per-kind state.
// scan_id_struct — src/query/scan_manager.hstruct scan_id_struct{ SCAN_TYPE type; /* discriminator */ SCAN_STATUS status; /* S_OPENED|S_STARTED|S_ENDED|S_CLOSED */ SCAN_POSITION position; /* S_BEFORE|S_ON|S_AFTER */ SCAN_DIRECTION direction; /* S_FORWARD|S_BACKWARD */ bool mvcc_select_lock_needed; SCAN_OPERATION_TYPE scan_op_type; /* S_SELECT|S_DELETE|S_UPDATE */ int fixed; int grouped; int qualified_block; QPROC_SINGLE_FETCH single_fetch; int single_fetched; int null_fetched; QPROC_QUALIFICATION qualification; DB_VALUE *join_dbval; val_list_node *val_list; val_descr *vd; union { LLIST_SCAN_ID llsid; HEAP_SCAN_ID hsid; PARALLEL_HEAP_SCAN_ID phsid; HEAP_PAGE_SCAN_ID hpsid; INDX_SCAN_ID isid; INDEX_NODE_SCAN_ID insid; SET_SCAN_ID ssid; DBLINK_SCAN_ID dblid; REGU_VALUES_SCAN_ID rvsid; SHOWSTMT_SCAN_ID stsid; JSON_TABLE_SCAN_ID jtid; METHOD_SCAN_ID msid; } s; SCAN_STATS scan_stats; SCAN_STATS *partition_stats; bool scan_immediately_stop;};The common fields above the union are exactly those the operator algebra needs uniformly: position/status form the iterator’s state machine; single_fetch/single_fetched encode outer-join NULL-padding, read by scan_handle_single_scan regardless of scan kind; mvcc_select_lock_needed/scan_op_type propagate transactional intent to every kind that consults visibility.
The union compresses fundamentally different state shapes into a single in-memory layout: a HEAP_SCANCACHE plus current OID for heap; a BTREE_SCAN plus OID buffer plus three filter triples plus ISS/MRO/covered state for index; a QFILE_LIST_SCAN_ID plus optional HASH_LIST_SCAN for list; a cubscan::json_table::scanner C++ object for JSON_TABLE (via C++17 unrestricted-union rules). The discriminator SCAN_TYPE is set exactly once in the corresponding scan_open_* function and read everywhere else through the dispatch switches.
classDiagram
class SCAN_ID {
SCAN_TYPE type
SCAN_STATUS status
SCAN_POSITION position
SCAN_DIRECTION direction
bool mvcc_select_lock_needed
SCAN_OPERATION_TYPE scan_op_type
QPROC_SINGLE_FETCH single_fetch
union s
SCAN_STATS scan_stats
}
class HEAP_SCAN_ID {
OID curr_oid
OID cls_oid
HFID hfid
HEAP_SCANCACHE scan_cache
SCAN_PRED scan_pred
SCAN_ATTRS pred_attrs
SCAN_ATTRS rest_attrs
sampling_info sampling
}
class INDX_SCAN_ID {
INDX_INFO* indx_info
BTREE_SCAN bt_scan
BTREE_ISCAN_OID_LIST* oid_list
OID* curr_oidp
SCAN_PRED key_pred
SCAN_PRED scan_pred
SCAN_PRED range_pred
INDX_COV indx_cov
MULTI_RANGE_OPT multi_range_opt
INDEX_SKIP_SCAN iss
}
class LLIST_SCAN_ID {
QFILE_LIST_ID* list_id
QFILE_LIST_SCAN_ID lsid
SCAN_PRED scan_pred
HASH_LIST_SCAN hlsid
}
class SET_SCAN_ID {
regu_variable_node* set_ptr
regu_variable_list_node* operand
DB_VALUE set
int set_card
int cur_index
}
class JSON_TABLE_SCAN_ID {
cubscan::json_table::scanner
}
class DBLINK_SCAN_ID {
DBLINK_SCAN_INFO scan_info
SCAN_PRED scan_pred
}
class SHOWSTMT_SCAN_ID {
SHOWSTMT_TYPE show_type
DB_VALUE** arg_values
DB_VALUE** out_values
int cursor
void* ctx
}
class PARALLEL_HEAP_SCAN_ID {
OID curr_oid
HFID hfid
parallel_heap_scan::manager*
accumulative_trace_storage* trace_storage
}
SCAN_ID *-- HEAP_SCAN_ID : type=S_HEAP_SCAN
SCAN_ID *-- INDX_SCAN_ID : type=S_INDX_SCAN
SCAN_ID *-- LLIST_SCAN_ID : type=S_LIST_SCAN
SCAN_ID *-- SET_SCAN_ID : type=S_SET_SCAN
SCAN_ID *-- JSON_TABLE_SCAN_ID : type=S_JSON_TABLE_SCAN
SCAN_ID *-- DBLINK_SCAN_ID : type=S_DBLINK_SCAN
SCAN_ID *-- SHOWSTMT_SCAN_ID : type=S_SHOWSTMT_SCAN
SCAN_ID *-- PARALLEL_HEAP_SCAN_ID : type=S_PARALLEL_HEAP_SCAN
The protocol — open, start, next-block, next, end, close
Section titled “The protocol — open, start, next-block, next, end, close”Five public verbs implement the lifecycle. Each is a thin switch in scan_manager.c; the per-type body either does the work inline or calls into a sibling module (heap_file.c, btree.c, list_file.c, set_scan.c, show_scan.c, dblink_scan.c, scan_json_table.cpp, parallel/px_heap_scan/).
scan_open_<kind> initialises the SCAN_ID. There is one open per SCAN_TYPE because each takes a different argument list. All open functions call scan_init_scan_id first to set the common state (position=S_BEFORE, status=S_OPENED, direction=S_FORWARD, qualification=QPROC_QUALIFIED), then populate the union member.
scan_start_scan transitions S_OPENED → S_STARTED and acquires transactional resources: heap_scancache_start for heap and index, MVCC snapshot via logtb_get_mvcc_snapshot, attribute-info caches via heap_attrinfo_start, qfile_open_list_scan + qfile_start_scan_fix for list. JSON-table, set, values, method, and dblink merely set state; show delegates to showstmt_start_scan.
scan_next_scan_block and scan_reset_scan_block exist for grouped (block-at-a-time) scans. Non-grouped scans treat them as identity functions returning S_SUCCESS once and S_END thereafter. Grouped heap scans use heap_scanrange_to_following/heap_scanrange_to_prior; grouped index scans call scan_get_index_oidset to refill the OID buffer.
scan_next_scan is the per-tuple entry point:
// scan_next_scan — src/query/scan_manager.cSCAN_CODEscan_next_scan (THREAD_ENTRY * thread_p, SCAN_ID * s_id){ return scan_handle_single_scan (thread_p, s_id, scan_next_scan_local);}scan_handle_single_scan is the second-order wrapper that enforces the single_fetch semantics (outer-join NULL-padding, early-out when the join column is NULL). scan_next_scan_local is the actual dispatch:
// scan_next_scan_local — src/query/scan_manager.c (excerpt)static SCAN_CODEscan_next_scan_local (THREAD_ENTRY * thread_p, SCAN_ID * scan_id){ /* ... trace bookkeeping elided ... */ switch (scan_id->type) { case S_HEAP_SCAN: case S_HEAP_SCAN_RECORD_INFO: case S_HEAP_SAMPLING_SCAN: status = scan_next_heap_scan (thread_p, scan_id); break; case S_PARALLEL_HEAP_SCAN: status = scan_next_parallel_heap_scan (thread_p, scan_id); break; case S_HEAP_PAGE_SCAN: status = scan_next_heap_page_scan (thread_p, scan_id); break; case S_CLASS_ATTR_SCAN: status = scan_next_class_attr_scan (thread_p, scan_id); break; case S_INDX_SCAN: status = scan_next_index_scan (thread_p, scan_id); break; case S_INDX_KEY_INFO_SCAN: status = scan_next_index_key_info_scan (thread_p, scan_id); break; case S_INDX_NODE_INFO_SCAN: status = scan_next_index_node_info_scan (thread_p, scan_id); break; case S_LIST_SCAN: if (scan_id->s.llsid.hlsid.hash_list_scan_type != HASH_METH_NOT_USE) status = scan_next_hash_list_scan (thread_p, scan_id); else status = scan_next_list_scan (thread_p, scan_id); break; case S_SHOWSTMT_SCAN: status = scan_next_showstmt_scan (thread_p, scan_id); break; case S_VALUES_SCAN: status = scan_next_value_scan (thread_p, scan_id); break; case S_SET_SCAN: status = scan_next_set_scan (thread_p, scan_id); break; case S_JSON_TABLE_SCAN: status = scan_next_json_table_scan (thread_p, scan_id); break; case S_METHOD_SCAN: status = scan_next_method_scan (thread_p, scan_id); break; case S_DBLINK_SCAN: status = scan_next_dblink_scan (thread_p, scan_id); break; default: er_set (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_QPROC_INVALID_XASLNODE, 0); return S_ERROR; } /* ... fetch / ioread perfmon delta accumulation elided ... */ return status;}scan_end_scan and scan_close_scan mirror scan_start_scan and scan_open_*. end releases per-iteration resources (close HEAP_SCANCACHE, end qfile_scan fixings, clear key copy buffers); close releases per-scan-lifetime allocations (bt_attr_ids, vstr_ids, OID buffer, copy_buf, indx_cov records, multi_range_opt arrays, prebuilt_midxkey_domains, hash-list-scan tables). The two-phase release exists because the same SCAN_ID is sometimes restarted (rewound for nested-loop joins) without being reopened — end then start_scan again is cheaper than full reopen.
stateDiagram-v2
[*] --> S_OPENED : scan_open_*
S_OPENED --> S_STARTED : scan_start_scan
S_STARTED --> S_STARTED : scan_next_scan*
S_STARTED --> S_ENDED : scan_end_scan
S_ENDED --> S_STARTED : scan_start_scan (re-iterate)
S_ENDED --> S_CLOSED : scan_close_scan
S_CLOSED --> [*]
note right of S_STARTED
position cycles
S_BEFORE -> S_ON -> S_AFTER
direction may flip
S_FORWARD <-> S_BACKWARD
end note
Where scan_next_scan sits in the operator tree
Section titled “Where scan_next_scan sits in the operator tree”The executor (cubrid-query-executor.md covers it in detail) treats every leaf of an XASL operator tree as a scan. qexec_open_scan opens whatever access path the optimiser picked by calling the right scan_open_<kind>; qexec_intprt_fnc’s pull loop calls scan_next_scan until it returns S_END; qexec_end_scan and qexec_close_scan finish the lifecycle. This means every operator above — filter, project, sort-build, group-by, hash-join build, top-N — is downstream of scan_next_scan. The scan manager is the single chokepoint where access-method polymorphism is resolved.
flowchart TB
subgraph "Operator Tree (above scan)"
GROUPBY[Group-By / Aggregate]
SORT[Sort / TOP-N]
HASHB[Hash-Join Build]
PROJ[Project / Fetch rest]
FILT[Filter / scan_pred]
GROUPBY --> SORT
SORT --> HASHB
HASHB --> PROJ
PROJ --> FILT
end
FILT --> NEXT[scan_next_scan]
NEXT --> LOCAL[scan_next_scan_local]
LOCAL --> SWITCH{switch type}
SWITCH -->|S_HEAP_SCAN| HEAP[scan_next_heap_scan]
SWITCH -->|S_INDX_SCAN| IDX[scan_next_index_scan]
SWITCH -->|S_LIST_SCAN| LIST[scan_next_list_scan / scan_next_hash_list_scan]
SWITCH -->|S_SET_SCAN| SET[scan_next_set_scan]
SWITCH -->|S_VALUES_SCAN| VALS[scan_next_value_scan]
SWITCH -->|S_JSON_TABLE_SCAN| JT[scan_next_json_table_scan]
SWITCH -->|S_DBLINK_SCAN| DBL[scan_next_dblink_scan]
SWITCH -->|S_SHOWSTMT_SCAN| SHOW[scan_next_showstmt_scan]
SWITCH -->|S_PARALLEL_HEAP_SCAN| PHS[scan_next_parallel_heap_scan]
HEAP --> HF[heap_next/prev/sampling/record_info]
IDX --> BTS[btree_keyval_search/btree_range_search]
LIST --> QF[qfile_scan_list_next]
JT --> JN[scanner::next_scan]
DBL --> CCI[dblink_scan_next via CCI]
SHOW --> SHF[showstmt_next_scan]
PHS --> WMP[worker_manager dispatch]
Heap scan path — S_HEAP_SCAN
Section titled “Heap scan path — S_HEAP_SCAN”scan_open_heap_scan initialises the HEAP_SCAN_ID from the access spec: class OID and HFID are bitwise-copied, current OID is reset to a NULL slot in the heap’s volume, the data filter scan_pred is bound to the predicate-expression tree with eval_fnc chosen as evaluator, and pred/rest attribute caches are linked. scan_type may be S_HEAP_SCAN, S_HEAP_SCAN_RECORD_INFO (adds a cache_recordinfo array exposing REC_HOME/REC_RELOCATION/REC_BIGONE/REC_NEWHOME and per-slot LSAs), or S_HEAP_SAMPLING_SCAN (computes sampling_weight = total_pages / NUMBER_OF_SAMPLING_PAGES).
scan_start_scan calls heap_scanrange_start (grouped) or heap_scancache_start (non-grouped) to obtain the HEAP_SCANCACHE that pins the most recent page across slot iterations, acquires the per-thread MVCC snapshot via logtb_get_mvcc_snapshot for non-root classes, and starts the predicate/rest attribute info caches via heap_attrinfo_start.
scan_next_heap_scan is the per-tuple loop. Forward calls heap_next/heap_next_sampling/heap_next_record_info; backward calls heap_prev/heap_prev_record_info. After the heap returns the next slot, eval_data_filter walks the pred_expr tree against pred_attrs.attr_cache’s freshly populated DB_VALUEs. Rows that fail the filter continue; rows that pass fetch the rest attribute set via heap_attrinfo_read_dbvalues and fetch_val_list, or — when mvcc_select_lock_needed is on — hit locator_lock_and_get_object_with_evaluation which acquires the row lock and re-checks the data filter against the latest visible version (see Reevaluation hook below). The page-changed-during-PEEK loop (PGBUF_IS_PAGE_CHANGED) re-fetches with COPY semantics if a concurrent write has shifted the slot’s underlying bytes between the predicate read and the rest-attribute read.
This is the only consumer of heap_next/heap_prev/heap_next_sampling/heap_next_record_info in the executor path; cubrid-heap-manager.md is canonical for those primitives.
Index scan path — S_INDX_SCAN
Section titled “Index scan path — S_INDX_SCAN”scan_open_index_scan is the longest open function (≈ 350 lines) because the state machine is the richest. The B+Tree root header is fetched via pgbuf_fix + btree_get_root_header for max_key_len and BTID; BTREE_INIT_SCAN initialises the cursor; btree_glean_root_header_info populates BTID_INT.key_type. The OID buffer comes from a pool (scan_alloc_iscan_oid_buf_list, capacity ISCAN_OID_BUFFER_COUNT) unless the scan is covering. Three filter triples — key_pred, range_pred, scan_pred — are each bound with their own regu_list and pr_eval_fnc via scan_init_scan_pred. scan_init_indx_coverage/scan_init_iss/scan_init_multi_range_optimization set up the optional covering-index list-file, ISS state, and MRO top-N heap respectively.
scan_start_scan starts the heap scan cache (heap fetches happen by default), starts attribute caches for range/key/pred/rest predicates, and resets oids_count/curr_keyno/curr_oidno/one_range. scan_next_index_scan then alternates two states:
- Out of OIDs —
call_get_next_index_oidset(ISS-aware wrapper) callsscan_get_index_oidset, which invokesbtree_keyval_search(R_KEY/R_KEYLIST) orbtree_range_search(R_RANGE/R_RANGELIST), filling the OID buffer up tomax_oid_cnt. - Have OIDs — advance
curr_oidno, pointcurr_oidpat the OID, callscan_next_index_lookup_heapto fetch the heap row, evaluate the data filter, apply MVCC reevaluation.
Four optimiser-driven variants ride on this skeleton:
- Covering index —
SCAN_IS_INDEX_COVERED(isidp). Keys are dumped intoINDX_COV.list_idviascan_dump_key_into_tupleandqfile_scan_list_nextreads them as scan output; no heap fetch (Petrov Database Internals Ch. 2). - MRO (multi-range optimization) —
multi_range_opt.use = trueforORDER BY ... LIMIT N.top_n_itemsis a heap of the best N keys; each new key competes against the worst element. O(N log N) instead of O(matching keys + sort). - ISS (index skip scan) —
iss.skipped_range = &indx_info->iss_range. When the predicate omits the leading column,scan_get_next_iss_valueadvances through distinct C1 values via B+Tree lookup, retrying the original range per value (Database Internals Ch. 3). - LISS (loose index scan) —
SCAN_IS_INDEX_ILS(isidp)withils_prefix_len > 0. Advances by skipping prefix duplicates, returning one entry per distinct prefix; what makesSELECT DISTINCTandMAX ... GROUP BYrun in O(distinct prefixes).
Every B+Tree state (BTREE_SCAN, BTID_INT, root-header fetch, range/key-val search) flows through INDX_SCAN_ID.bt_scan; cubrid-btree.md is canonical.
flowchart LR
A[scan_next_index_scan] --> B{have OIDs?}
B -- no --> C[call_get_next_index_oidset]
C --> D{ISS?}
D -- yes,empty --> E[scan_get_next_iss_value]
E --> C
D -- no/has rows --> F[scan_get_index_oidset]
F --> G[btree_keyval_search OR btree_range_search]
G --> H[oid_list filled]
H --> B
B -- yes --> I{covered?}
I -- yes --> J[scan_dump_key_into_tuple via INDX_COV.list_id]
J --> K[qfile_scan_list_next] --> L[fetch_val_list to val_list]
I -- no --> M[scan_next_index_lookup_heap]
M --> N[heap_get_visible_version]
N --> O[eval_data_filter]
O --> P{mvcc_select_lock_needed?}
P -- yes --> Q[locator_lock_and_get_object_with_evaluation]
Q --> R[reevaluate filters]
P -- no --> R
R --> S[fetch_val_list rest_regu_list] --> T[return S_SUCCESS]
List scan path — S_LIST_SCAN
Section titled “List scan path — S_LIST_SCAN”A list scan reads a previously materialised tuple stream — typically the output of a sort, hash-build, group-by, or sub-block. scan_open_list_scan binds the upstream QFILE_LIST_ID, the data filter scan_pred, the rest regu list, and (optionally) build/probe regu lists for hash-list-scan. is_read_only governs whether list-file pages are latched for write. scan_start_scan calls qfile_open_list_scan + qfile_start_scan_fix to attach the cursor and pin the first page.
The per-tuple loop walks qfile_scan_list_next (PEEK), populates predicate-attribute DB_VALUEs via fetch_val_list (scan_pred.regu_list), evaluates the predicate, branches on QPROC_QUALIFIED/NOT_QUALIFIED/QUALIFIED_OR_NOT, and fetches rest attributes via fetch_val_list (rest_regu_list) on success. Every materialising sub-XASL writes to a QFILE_LIST_ID and its parent reads via S_LIST_SCAN; this is what makes hierarchical XASL trees, derived tables, CTE materialisation, and sort-then-scan all work uniformly.
resolve_domains_on_list_scan runs once at the top of the loop to resolve TYPE_POSITION regu variables whose domain was deferred (host-variable late binding) — the position index into the list-file type list yields the actual TP_DOMAIN for variable-domain regu vars.
Hash list scan — S_LIST_SCAN with hash_list_scan_type != HASH_METH_NOT_USE
Section titled “Hash list scan — S_LIST_SCAN with hash_list_scan_type != HASH_METH_NOT_USE”When the optimiser detects that a list scan is the inner of a hash-joinable shape, it flags hash-list-scan. check_hash_list_scan validates feasibility (tuple count > 0, build/probe regu lists non-null and balanced, types coercible, no no_hash_list_scan hint) and picks one of three methods:
HASH_METH_IN_MEM— entire tuples stored in an in-memory hash table (mht_create_hls).HASH_METH_HYBRID— only tuple positions stored in memory; tuples re-read from the list file on probe viaqfile_jump_scan_tuple_position. Used when the list fits in memory but tuples are wide.HASH_METH_HASH_FILE— extendible-hash file (fhs_create/fhs_insert/fhs_search) when the list overflowsPRM_ID_MAX_HASH_LIST_SCAN_SIZE. Backed bycubrid-extendible-hash.md.
Build runs inside scan_open_list_scan via scan_build_hash_list_scan: walk the list, build keys from build_regu_list via qdata_build_hscan_key, hash via qdata_hash_scan_key, insert. Probe runs in scan_next_hash_list_scan: build the key from probe_regu_list against the current outer tuple, hash, look up. scan_hash_probe_next walks both mht_get_hls and mht_get_next_hls for collision peers; the hybrid variant additionally calls qfile_jump_scan_tuple_position to materialise the tuple at the recorded position.
This is CUBRID’s hash-join build/probe machinery, layered onto list scan rather than implemented as a separate operator. Hash join only works when the inner is already a list-file, but the gains versus nested-loop are substantial when it does.
Set / Values / JSON_TABLE / Show / Dblink / Method scans
Section titled “Set / Values / JSON_TABLE / Show / Dblink / Method scans”Set scan (S_SET_SCAN) — unnests a CUBRID collection (SET/MULTISET/SEQUENCE) into rows. scan_open_set_scan records the regu var producing the collection (set_ptr); qproc_next_set_scan (in set_scan.c) at S_BEFORE fetches the collection via fetch_copy_dbval and yields element 0, at S_ON advances cur_index and yields the next, at S_AFTER returns S_END. A special F_SEQUENCE path walks the inline operand list directly without materialising a DB_SET.
Values scan (S_VALUES_SCAN) — implements (VALUES (1,'a'),(2,'b'),...). scan_open_values_scan records the valptr_list of REGU_VALUE_LIST cells; scan_next_value_scan advances every cell’s current_value in lockstep — when the first reaches its end, returns S_END.
JSON_TABLE scan (S_JSON_TABLE_SCAN) — turns a JSON document plus a path expression into rows. scan_open_json_table_scan only initialises the embedded cubscan::json_table::scanner’s scan predicate and value descriptor; the heavy lifting is in scan_json_table.cpp. The C++ scanner is exposed to C as JSON_TABLE_SCAN_ID via C++17 unrestricted-union rules. It maintains a per-depth cursor stack — each cursor holds the current JSON document, an iterator into the array at this level, the current child index, and flags for “row fetched” / “need advance” / “node consumed”. scanner::next_scan calls scan_next_internal recursively over the cursor stack, generating rows by cross-product expansion of nested paths (each NESTED PATH multiplies row counts by its array’s cardinality). The scanner-local predicate filter lives inside next_scan itself, not in the outer scan_handle_single_scan, because the scanner must skip non-qualifying rows internally to advance JSON iterators correctly.
Show scan (S_SHOWSTMT_SCAN) — exposes server internals (SHOW VOLUME HEADER, SHOW THREADS, SHOW PAGE_BUFFER_STATUS, SHOW HEAP CAPACITY, SHOW INDEX HEADER, SHOW TIMEZONES, SHOW JOB QUEUES, SHOW TRAN_TABLES, etc.) as virtual relations. show_scan.c keeps a SHOW_REQUEST table indexed by SHOWSTMT_TYPE with start_func/next_func/end_func pointers per entry; showstmt_scan_init populates it at server bootstrap. The most common implementation pattern is showstmt_array_next_scan/showstmt_array_end_scan: pre-materialise rows into a SHOWSTMT_ARRAY_CONTEXT (DB_VALUE matrix) inside start_func, then iterate. Volume-header / log-header / slotted-page-slots / heap-header fetch rows on demand from the underlying subsystem.
Dblink scan (S_DBLINK_SCAN) — executes a remote SQL statement via the CCI driver. scan_open_dblink_scan records the predicate then calls dblink_open_scan which: reuses or creates a CCI connection via qmgr_dblink_find_conn_handle / cci_connect_with_url_ex; calls cci_prepare on the SQL text; binds host variables via dblink_bind_param; calls cci_execute and stashes column metadata via cci_get_result_info. scan_next_dblink_scan → dblink_scan_next uses cci_cursor (CCI_CURSOR_FIRST first call, CCI_CURSOR_CURRENT subsequent) + cci_fetch to advance, then per column calls cci_get_data and converts via dblink_make_cci_value + tp_value_cast_preserve_domain. End-of-result is signalled by CCI_ER_NO_MORE_DATA. dblink_scan_reset re-positions to CCI_CURSOR_FIRST for outer-loop rescans (CCI keeps the result set after cci_execute). dblink_close_scan calls cci_close_req_handle and (if auto_commit) cci_disconnect.
Parallel heap scan (S_PARALLEL_HEAP_SCAN) — the optimiser picks this when a query benefits from parallelism (aggregation or hash-join build over a large heap). scan_open_parallel_heap_scan constructs a parallel_heap_scan::manager<RESULT_TYPE> template instance (parametrised by scan-only / scan-with-aggregation / scan-as-hash-join-build); scan_start_parallel_heap_scan partitions the heap into block ranges and dispatches per-worker tasks via the worker manager. scan_next_parallel_heap_scan reads from the result handler’s reassembled stream populated asynchronously by workers; scan_end_parallel_heap_scan signals workers to stop and merges stats; scan_close_parallel_heap_scan releases the manager. The runtime substrate lives in cubrid-thread-worker-pool.md; the scan manager just owns the dispatch.
Method scan (S_METHOD_SCAN) — invokes a stored procedure (Java PL or PL/CSQL) or method and treats its result as a relation. scan_open_method_scan calls METHOD_SCAN_ID::init + open to set up the JNI bridge or PL stack; scan_next_method_scan calls METHOD_SCAN_ID::next_scan which marshals the next tuple across the JNI/method boundary into scan_id->val_list. Source-of-truth: cubrid-pl-javasp.md/cubrid-pl-plcsql.md.
Class-attribute scan (S_CLASS_ATTR_SCAN) — returns a single row of class-level “shared” attributes. scan_next_class_attr_scan evaluates the data filter once at S_BEFORE and transitions directly to S_AFTER; never produces more than one row.
Heap-page / record-info / index-key-info / index-node-info scans — introspection variants. S_HEAP_PAGE_SCAN walks heap pages without descending into slots. S_HEAP_SCAN_RECORD_INFO piggybacks on regular heap scan but populates cache_recordinfo (record_type, record_size, OID-of-home, MVCC headers) per slot. S_INDX_KEY_INFO_SCAN and S_INDX_NODE_INFO_SCAN are B+Tree leaf-key and node-structure walkers used by SHOW INDEX HEADER/SHOW INDEX CAPACITY. S_HEAP_SAMPLING_SCAN visits one page per sampling_weight pages (computed from total_pages / NUMBER_OF_SAMPLING_PAGES); used by the analyzer for cheap cardinality estimation (cubrid-statistics.md).
Predicate / range / filter hooks
Section titled “Predicate / range / filter hooks”Three predicate slots appear on scan kinds that have them: key_pred, range_pred, scan_pred. The first two are index-only; scan_pred (data filter) appears on heap, list, set, show, and dblink. SCAN_PRED carries regu_list (attribute-reading list to populate before eval), pred_expr (predicate tree), and pr_eval_fnc (evaluator obtained from eval_fnc). scan_init_scan_pred wires the three; per-type scan_open_* calls it once per slot. At eval time, the per-tuple loop calls fetch_val_list on regu_list to populate the DB_VALUEs the predicate references, then pr_eval_fnc(thread_p, pred_expr, vd, current_oid) for the boolean. eval_data_filter (in query_evaluator.c) is the canonical wrapper; it takes a FILTER_INFO bundle naming range/key/data filters, the value list, the value descriptor, the class OID, and reevaluation hooks (only some slots populated per scan kind). This is the unified hook through which the scan manager talks to the predicate evaluator.
Reevaluation hook
Section titled “Reevaluation hook”When mvcc_select_lock_needed is true (UPDATE/DELETE driver scans, FOR UPDATE selects), the scan must re-evaluate the data filter against the latest visible version after acquiring the row lock, because an in-flight transaction may have changed columns between the snapshot read and lock acquisition. The hook is wired through MVCC_REEV_DATA + UPDDEL_MVCC_COND_REEVAL + MVCC_SCAN_REEV_DATA and locator_lock_and_get_object_with_evaluation (in locator_sr.c) which acquires the row lock, refetches the latest version, evaluates the reev data, and reports filter_result. If reevaluation fails, the scan transparently skips the row. The same machinery is used for index scan via scan_next_index_lookup_heap. Source-of-truth: cubrid-mvcc.md.
Single-fetch / outer-join semantics
Section titled “Single-fetch / outer-join semantics”scan_handle_single_scan wraps any per-type next function with four single_fetch modes: QPROC_NO_SINGLE_INNER (pass-through; normal case); QPROC_SINGLE_OUTER (at most one row; if join_dbval is NULL, return a NULL-padded row without touching the access method — short-circuits a left-outer join’s inner when the outer column is NULL); QPROC_SINGLE_INNER (same single-row contract without NULL-padding fallback); QPROC_NO_SINGLE_OUTER (multiple rows allowed; if none qualify, return one NULL-padded row to preserve the outer’s row count — outer-join row preservation). The outer-join row-preservation logic lives in the scan layer so each operator above does not have to re-implement it.
Source Walkthrough
Section titled “Source Walkthrough”Symbols are grouped by responsibility. Line numbers in the position-hints table are as of this document’s updated: date; treat them as decay-prone hints and grep for the symbol name first.
SCAN_ID type plumbing
Section titled “SCAN_ID type plumbing”SCAN_TYPE— discriminator enum inscan_manager.h. Values:S_HEAP_SCAN,S_PARALLEL_HEAP_SCAN,S_CLASS_ATTR_SCAN,S_INDX_SCAN,S_LIST_SCAN,S_SET_SCAN,S_JSON_TABLE_SCAN,S_METHOD_SCAN,S_VALUES_SCAN,S_SHOWSTMT_SCAN,S_HEAP_SCAN_RECORD_INFO,S_HEAP_PAGE_SCAN,S_INDX_KEY_INFO_SCAN,S_INDX_NODE_INFO_SCAN,S_DBLINK_SCAN,S_HEAP_SAMPLING_SCAN.scan_id_struct(SCAN_ID) — the polymorphic scan handle.scan_init_scan_id— populates the common (above-union) fields.scan_init_scan_pred— wiresregu_list/pred_expr/pr_eval_fncinto a SCAN_PRED.scan_init_scan_attrs— wires the attribute-info cache into a SCAN_ATTRS.scan_initialize/scan_finalize— process-wide pool of ISCAN OID buffers (scan_Iscan_oid_buf_list).scan_alloc_oid_list/scan_free_oid_list— buffer (de)allocation primitives.scan_alloc_iscan_oid_buf_list/scan_free_iscan_oid_buf_list— pool-aware acquire/release for an INDX_SCAN_ID’s OID buffer.scan_save_scan_pos/scan_jump_scan_pos— list-file-scan-only checkpoint primitives.
Public lifecycle entries
Section titled “Public lifecycle entries”scan_open_heap_scan,scan_open_heap_page_scan,scan_open_class_attr_scan— heap variants.scan_open_index_scan,scan_open_index_key_info_scan,scan_open_index_node_info_scan— index variants.scan_open_list_scan— list scan, with optional hash-list-scan setup inline.scan_open_showstmt_scan— show.scan_open_values_scan—(VALUES ...).scan_open_set_scan— collection unnest.scan_open_json_table_scan— JSON_TABLE.scan_open_method_scan— method/SP invocation.scan_open_dblink_scan— foreign data via CCI.scan_open_parallel_heap_scan(inpx_heap_scan.cpp) — parallel heap.scan_start_scan— per-iteration acquire.scan_reset_scan_block,scan_next_scan_block— block iteration for grouped scans.scan_next_scan,scan_prev_scan— per-tuple entry; thin wrappers aroundscan_handle_single_scan.scan_end_scan— per-iteration release.scan_close_scan— per-scan-lifetime release.
Heap scan internals
Section titled “Heap scan internals”scan_next_heap_scan, scan_next_heap_page_scan, scan_next_class_attr_scan. Local enum OBJECT_GET_STATUS tracks the lock-acquire / re-fetch state machine inside scan_next_heap_scan.
Index scan internals
Section titled “Index scan internals”scan_next_index_scan, scan_next_index_lookup_heap, scan_next_index_key_info_scan, scan_next_index_node_info_scan, call_get_next_index_oidset, scan_get_index_oidset, scan_regu_key_to_index_key, scan_dbvals_to_midxkey, scan_init_iss, scan_save_range_details/scan_restore_range_details, scan_get_next_iss_value, scan_init_index_key_limit, scan_init_indx_coverage, scan_init_multi_range_optimization, scan_dump_key_into_tuple. Macros SCAN_IS_INDEX_COVERED/MRO/ISS/ILS. Sub-structures INDEX_SKIP_SCAN, MULTI_RANGE_OPT, INDX_COV.
List + hash list scan internals
Section titled “List + hash list scan internals”scan_next_list_scan, resolve_domains_on_list_scan/resolve_domain_on_regu_operand, scan_build_hash_list_scan, scan_next_hash_list_scan, scan_hash_probe_next, check_hash_list_scan. Hash primitives in query_hash_scan.c: qdata_alloc_hscan_key/hscan_value/hscan_value_OID, qdata_build_hscan_key, qdata_hash_scan_key, qdata_hscan_key_eq, qdata_copy_hscan_key/copy_hscan_key_without_alloc, fhs_create/destroy/insert/search/search_next, in-mem mht_put_hls/get_hls/get_next_hls/clear_hls/destroy_hls. Macros MAKE_TUPLE_POSTION, MAKE_TFTID_TO_TUPLE_POSTION.
Set / values / JSON / show / dblink / parallel
Section titled “Set / values / JSON / show / dblink / parallel”qproc_next_set_scan (in set_scan.c), scan_next_set_scan, scan_next_value_scan, scan_next_json_table_scan, cubscan::json_table::scanner and methods open/next_scan/end/init/clear/scan_next_internal/set_input_document/init_cursor/set_next_cursor/init_iterators/reset_ordinality/clear_node_columns/get_tree_height plus inner cursor. showstmt_scan_init, showstmt_start_scan/next_scan/end_scan, scan_next_showstmt_scan, showstmt_array_next_scan/array_end_scan, showstmt_alloc_array_context/free_array_context/alloc_tuple_in_context, SHOW_REQUEST, thread_start_scan/thread_scan_mapfunc. dblink_open_scan/close_scan/scan_next/scan_reset, dblink_bind_param, dblink_make_cci_value/make_date_time/make_date_time_tz/get_basic_utype, dblink_execute_query/end_tran. parallel_heap_scan::manager<RESULT_TYPE> template (in px_heap_scan.hpp), extern-C entries scan_open_parallel_heap_scan/scan_start_parallel_heap_scan/scan_next_parallel_heap_scan/scan_end_parallel_heap_scan/scan_close_parallel_heap_scan/scan_reset_scan_block_parallel_heap_scan, plus input_handler_ftabs, result_handler<RESULT_TYPE>, trace_handler, accumulative_trace_storage.
Predicate / single-fetch / reevaluation / stats
Section titled “Predicate / single-fetch / reevaluation / stats”scan_handle_single_scan, scan_next_scan_local, scan_prev_scan_local, eval_data_filter (in query_evaluator.c), update_logical_result, FILTER_INFO. MVCC_REEV_DATA/MVCC_SCAN_REEV_DATA/UPDDEL_MVCC_COND_REEVAL (in query_reevaluation.hpp), locator_lock_and_get_object_with_evaluation (in locator_sr.c), mvcc_reev_data.filter_result. SCAN_STATS, scan_print_stats_json/scan_print_stats_text (used by EXPLAIN).
Position hints (as of this revision)
Section titled “Position hints (as of this revision)”Symbol names are stable; line numbers below are decay-prone hints scoped to this document’s updated: date.
| Symbol | File | Line |
|---|---|---|
SCAN_TYPE enum | src/query/scan_manager.h | 75 |
scan_id_struct | src/query/scan_manager.h | 379 |
HEAP_SCAN_ID | src/query/scan_manager.h | 104 |
INDX_SCAN_ID (indx_scan_id) | src/query/scan_manager.h | 230 |
LLIST_SCAN_ID | src/query/scan_manager.h | 291 |
SET_SCAN_ID | src/query/scan_manager.h | 323 |
SHOWSTMT_SCAN_ID | src/query/scan_manager.h | 303 |
DBLINK_SCAN_ID | src/query/scan_manager.h | 97 |
PARALLEL_HEAP_SCAN_ID | src/query/scan_manager.h | 130 |
INDEX_SKIP_SCAN | src/query/scan_manager.h | 222 |
MULTI_RANGE_OPT | src/query/scan_manager.h | 193 |
INDX_COV | src/query/scan_manager.h | 168 |
SCAN_IS_INDEX_COVERED macro | src/query/scan_manager.h | 421 |
SCAN_IS_INDEX_ISS macro | src/query/scan_manager.h | 424 |
SCAN_IS_INDEX_ILS macro | src/query/scan_manager.h | 425 |
scan_init_iss | src/query/scan_manager.c | 220 |
scan_init_index_scan | src/query/scan_manager.c | 290 |
scan_save_range_details | src/query/scan_manager.c | 320 |
scan_restore_range_details | src/query/scan_manager.c | 349 |
scan_get_next_iss_value | src/query/scan_manager.c | 392 |
scan_init_scan_pred | src/query/scan_manager.c | 639 |
scan_init_scan_attrs | src/query/scan_manager.c | 654 |
scan_init_indx_coverage | src/query/scan_manager.c | 676 |
scan_init_index_key_limit | src/query/scan_manager.c | 812 |
scan_alloc_oid_list | src/query/scan_manager.c | 940 |
scan_alloc_iscan_oid_buf_list | src/query/scan_manager.c | 997 |
scan_dbvals_to_midxkey | src/query/scan_manager.c | 1475 |
scan_regu_key_to_index_key | src/query/scan_manager.c | 1932 |
scan_get_index_oidset | src/query/scan_manager.c | 2224 |
scan_init_scan_id | src/query/scan_manager.c | 2789 |
scan_open_heap_scan | src/query/scan_manager.c | 2846 |
scan_open_heap_page_scan | src/query/scan_manager.c | 2932 |
scan_open_class_attr_scan | src/query/scan_manager.c | 2981 |
scan_open_index_scan | src/query/scan_manager.c | 3067 |
scan_open_index_key_info_scan | src/query/scan_manager.c | 3437 |
scan_open_index_node_info_scan | src/query/scan_manager.c | 3643 |
scan_open_list_scan | src/query/scan_manager.c | 3715 |
scan_open_showstmt_scan | src/query/scan_manager.c | 3849 |
scan_open_values_scan | src/query/scan_manager.c | 3958 |
scan_open_set_scan | src/query/scan_manager.c | 3996 |
scan_open_json_table_scan | src/query/scan_manager.c | 4036 |
scan_open_method_scan | src/query/scan_manager.c | 4071 |
scan_open_dblink_scan | src/query/scan_manager.c | 4109 |
scan_start_scan | src/query/scan_manager.c | 4136 |
scan_reset_scan_block | src/query/scan_manager.c | 4446 |
scan_next_scan_block | src/query/scan_manager.c | 4606 |
scan_end_scan | src/query/scan_manager.c | 4749 |
scan_close_scan | src/query/scan_manager.c | 4882 |
call_get_next_index_oidset | src/query/scan_manager.c | 5118 |
scan_next_scan_local | src/query/scan_manager.c | 5193 |
scan_next_heap_scan | src/query/scan_manager.c | 5361 |
scan_next_heap_page_scan | src/query/scan_manager.c | 5739 |
scan_next_class_attr_scan | src/query/scan_manager.c | 5810 |
scan_next_index_scan | src/query/scan_manager.c | 5909 |
scan_next_index_lookup_heap | src/query/scan_manager.c | 6288 |
scan_next_index_key_info_scan | src/query/scan_manager.c | 6539 |
scan_next_index_node_info_scan | src/query/scan_manager.c | 6595 |
scan_next_list_scan | src/query/scan_manager.c | 6647 |
scan_next_value_scan | src/query/scan_manager.c | 6756 |
scan_next_showstmt_scan | src/query/scan_manager.c | 6820 |
scan_next_set_scan | src/query/scan_manager.c | 6911 |
scan_next_json_table_scan | src/query/scan_manager.c | 7014 |
scan_next_method_scan | src/query/scan_manager.c | 7038 |
scan_next_dblink_scan | src/query/scan_manager.c | 7094 |
scan_handle_single_scan | src/query/scan_manager.c | 7182 |
scan_next_scan | src/query/scan_manager.c | 7329 |
scan_prev_scan_local | src/query/scan_manager.c | 7343 |
scan_prev_scan | src/query/scan_manager.c | 7460 |
scan_save_scan_pos | src/query/scan_manager.c | 7474 |
scan_jump_scan_pos | src/query/scan_manager.c | 7491 |
scan_initialize | src/query/scan_manager.c | 7616 |
scan_finalize | src/query/scan_manager.c | 7659 |
resolve_domains_on_list_scan | src/query/scan_manager.c | 7712 |
resolve_domain_on_regu_operand | src/query/scan_manager.c | 7808 |
scan_init_multi_range_optimization | src/query/scan_manager.c | 7848 |
scan_dump_key_into_tuple | src/query/scan_manager.c | 7921 |
scan_print_stats_json | src/query/scan_manager.c | 7969 |
scan_print_stats_text | src/query/scan_manager.c | 8105 |
scan_build_hash_list_scan | src/query/scan_manager.c | 8263 |
scan_next_hash_list_scan | src/query/scan_manager.c | 8371 |
scan_hash_probe_next | src/query/scan_manager.c | 8478 |
check_hash_list_scan | src/query/scan_manager.c | 8657 |
qproc_next_set_scan | src/query/set_scan.c | 46 |
cubscan::json_table::scanner | src/query/scan_json_table.hpp | 109 |
scanner::open | src/query/scan_json_table.cpp | 230 |
scanner::next_scan | src/query/scan_json_table.cpp | 297 |
scanner::scan_next_internal | src/query/scan_json_table.cpp | 438 |
scanner::cursor | src/query/scan_json_table.cpp | 37 |
showstmt_scan_init | src/query/show_scan.c | 102 |
showstmt_next_scan | src/query/show_scan.c | 252 |
showstmt_start_scan | src/query/show_scan.c | 285 |
showstmt_end_scan | src/query/show_scan.c | 311 |
showstmt_array_next_scan | src/query/show_scan.c | 449 |
showstmt_array_end_scan | src/query/show_scan.c | 479 |
showstmt_alloc_array_context | src/query/show_scan.c | 338 |
dblink_open_scan | src/query/dblink_scan.c | 686 |
dblink_close_scan | src/query/dblink_scan.c | 788 |
dblink_scan_next | src/query/dblink_scan.c | 830 |
dblink_scan_reset | src/query/dblink_scan.c | 1056 |
parallel_heap_scan::manager (template) | src/query/parallel/px_heap_scan/px_heap_scan.hpp | 38 |
scan_next_parallel_heap_scan | src/query/parallel/px_heap_scan/px_heap_scan.cpp | 44 |
scan_reset_scan_block_parallel_heap_scan | src/query/parallel/px_heap_scan/px_heap_scan.cpp | 77 |
scan_end_parallel_heap_scan | src/query/parallel/px_heap_scan/px_heap_scan.cpp | 197 |
scan_close_parallel_heap_scan | src/query/parallel/px_heap_scan/px_heap_scan.cpp | 242 |
scan_open_parallel_heap_scan | src/query/parallel/px_heap_scan/px_heap_scan.cpp | 349 |
scan_start_parallel_heap_scan | src/query/parallel/px_heap_scan/px_heap_scan.cpp | 586 |
Cross-check Notes
Section titled “Cross-check Notes”Cross-check against cubrid-query-executor.md. The executor doc describes qexec_open_scan and qexec_intprt_fnc as the operator-tree drivers; this doc describes the access-method dispatch they sit on top of. The hand-off is exactly: qexec_open_scan calls one of the scan_open_<kind> functions enumerated here based on ACCESS_SPEC_TYPE.access, then qexec_intprt_fnc calls scan_next_scan per pulled tuple. The two docs share the SCAN_ID name but split responsibility cleanly: executor doc owns the inter-operator plumbing (aptr/dptr/scan_ptr chains, list-file materialisation between operators, fetch_val_list, trace + EXPLAIN); this doc owns the leaf-level access-method machinery.
Cross-check against cubrid-query-optimizer.md. The optimiser writes the SCAN_TYPE choice into the access spec by way of the ACCESS_SPEC_TYPE.access enum (SEQUENTIAL / INDEX etc.); xasl_generation translates that into the scan_open_<kind> call. The optimiser/executor split lives at exactly that boundary: cost-based path selection above, mechanical dispatch below. The optimiser doc covers cost models, statistics consumption, and join-ordering; this doc covers what runs once the path is chosen.
Cross-check against cubrid-heap-manager.md. The heap-scan path here is a thin client of heap_next / heap_prev / heap_next_record_info / heap_next_sampling / heap_get_visible_version / heap_attrinfo_start / heap_attrinfo_read_dbvalues / heap_scancache_start / heap_scancache_end / heap_scanrange_start / heap_scanrange_to_following. The heap-manager doc is canonical for those functions’ semantics including REC_RELOCATION/REC_BIGONE/REC_NEWHOME slot type handling; this doc only describes how the scan manager calls them.
Cross-check against cubrid-btree.md. The index-scan path is similarly a client of btree_keyval_search / btree_range_search / btree_get_next_key_info / btree_get_next_node_info / btree_glean_root_header_info / btree_get_root_header / BTREE_INIT_SCAN / BTREE_RESET_SCAN / BTREE_END_OF_SCAN. The B+Tree doc is canonical for descent, key encoding (MIDXKEY), uniqueness, and concurrency; this doc covers how the executor’s index-scan iterator wraps those primitives, including the ISS / LISS / MRO / covered variants which live at the scan-manager layer rather than inside the B+Tree.
Cross-check against cubrid-mvcc.md. The reevaluation hook (MVCC_REEV_DATA + locator_lock_and_get_object_with_evaluation + mvcc_reev_data.filter_result) is the interface by which scans participate in MVCC visibility checks under mvcc_select_lock_needed. The MVCC doc is canonical for snapshot semantics, version chains, and visibility; this doc covers how a scan acquires a snapshot via logtb_get_mvcc_snapshot at scan_start_scan time and re-checks predicates after acquiring locks for UPDATE/DELETE driver scans.
Drift between scan_manager.h types and what the executor uses today. All SCAN_TYPE values in the enum are wired into both the executor’s qexec_open_scan and the scan-manager’s switches — the catalogue is exhaustive and consistent. Two enum values are introspection-only and never produced by the regular optimiser path: S_HEAP_PAGE_SCAN and S_INDX_NODE_INFO_SCAN are reached only through the SHOW infrastructure which constructs the access spec by hand. S_HEAP_SCAN_RECORD_INFO is similarly reached only via record-info SHOW. There is no enum value or sub-struct corresponding to a future “bitmap heap scan” or “TID scan”; these would require extending both the enum and the union.
The scan_id_struct.s union holds a METHOD_SCAN_ID whose internals live in method_scan.hpp outside the scan-manager module; this is an intentional decoupling so that the JNI/PL bridge can evolve independently of access-method dispatch.
The parallel_heap_scan::manager template C++ class sits inside the PARALLEL_HEAP_SCAN_ID C struct via a void *manager pointer — the manager class cannot be a direct member because the C struct must be visible to .c files that are compiled as C++17 but historically used C-style POD layouts. This is also why parallel_heap_scan::accumulative_trace_storage *trace_storage lives as a forward-declared pointer rather than an embedded value.
The MVCC_REEV_DATA machinery lives in query_reevaluation.cpp/hpp rather than directly in scan_manager.c; the scan-manager’s job is to populate the reev structures and call locator_lock_and_get_object_with_evaluation, which delegates the actual reevaluation to the predicate evaluator with the reev data threaded through.
Open Questions
Section titled “Open Questions”-
Parallel index scan?
S_PARALLEL_HEAP_SCANexists for heap scans but there is no correspondingS_PARALLEL_INDX_SCAN. The B+Tree’s range-search interface is not currently structured to permit safe partitioning across workers — partitioning would need to either split the leaf-level link list (which is single-direction) or split by sub-range with worker-local OID buffers reassembled at the parent. Worth exploring as the parallel-query infrastructure matures. -
Adaptive scan-method switching. The choice of
SCAN_TYPEis fixed at optimiser time. There is no mechanism for the executor to fall back from index scan to heap scan after observing that the actual selectivity is much worse than estimated, or to switch from in-memory hash to file-hash when the build side overflows. PostgreSQL has prototypes for this; CUBRID does not. -
Resumable cursors / scan-state checkpointing.
scan_save_scan_posandscan_jump_scan_posexist but are list-file-only. A general resumable cursor (across heap and index scans) would let the broker pause a long-running query mid-scan; this is currently impossible for non-list scans. TheSCAN_POSstruct only carriesQFILE_TUPLE_POSITION, so extending it would require per-scan-kind serialisation. -
Vectorisation. Every scan kind currently produces tuples one at a time. A vectorised next API (
next_batch(thread_p, scan_id, batch_size, batch_out)) would amortise the per-tuplepr_eval_fnccall and the predicate-attribute fetch across many tuples, but the cost is invasive — every operator above the scan would have to handle batches too. This is the same trade-off PostgreSQL is currently working through with its custom-scan and JIT infrastructure. -
Pluggable access methods. Adding a new
SCAN_TYPEtoday requires editing the enum, the union, every switch inscan_manager.c, and the executor’sqexec_open_scan. A handler-API-style indirection (function-pointer table perSCAN_TYPE) would make the catalogue extensible at runtime; the cost would be the loss of switch-table compile-time exhaustiveness checks. Postgres chose extension at compile time via theIndexAmRoutinecallback table; MySQL chose runtime viahandlerton. CUBRID has not committed in either direction. -
Index loose scan beyond DISTINCT/MIN/MAX. LISS today is wired only when
ils_prefix_len > 0is set by the optimiser for very specific query shapes. Generalising it to arbitrary GROUP BY prefixes would let CUBRID skip duplicates insideGROUP BYvia the index rather than by hashing or sorting; the gain would be substantial for high-cardinality leading-column groupings. -
Hash-list-scan for non-list inners. Hash-list-scan only triggers when the inner is already a list-file. If the inner is a heap scan with an indexed selection, the optimiser can prepend a sort/hash-build sub-block, but it currently does not because the cost model would need to compare nested-loop-with-index against materialised-hash-then-probe. This is a planner-side gap that the scan manager merely surfaces.
Sources
Section titled “Sources”Code paths consumed:
src/query/scan_manager.h— SCAN_ID, SCAN_TYPE, sub-type structs, public entry declarations.src/query/scan_manager.c— all dispatch switches, per-typescan_open_*/scan_next_*/scan_end_*/scan_close_*, ISS / LISS / MRO / covering-index helpers, hash-list-scan build/probe, single-fetch wrapper, scan position save/jump, OID buffer pool, EXPLAIN stats output.src/query/set_scan.c,src/query/set_scan.h—qproc_next_set_scan.src/query/scan_json_table.cpp,src/query/scan_json_table.hpp—cubscan::json_table::scanner.src/query/show_scan.c,src/query/show_scan.h—SHOW_REQUESTtable,showstmt_*lifecycle,showstmt_array_*.src/query/dblink_scan.c,src/query/dblink_scan.h— CCI bridge for foreign data.src/query/query_hash_scan.c,src/query/query_hash_scan.h—HASH_LIST_SCAN,qdata_*hscan*,fhs_*extendible-hash primitives.src/query/parallel/px_heap_scan/px_heap_scan.hpp,px_heap_scan.cpp—parallel_heap_scan::managertemplate andscan_*_parallel_heap_scanextern-C entries.- Cross-references read for context:
src/query/query_executor.c(qexec_open_scan,qexec_intprt_fnc),src/storage/heap_file.c(theheap_next/heap_scancache_*/heap_scanrange_*family),src/storage/btree.c(btree_keyval_search,btree_range_search),src/query/list_file.c(qfile_open_list_scan,qfile_scan_list_next).
Theoretical references cited:
- Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., & Price, T. G. (1979). Access Path Selection in a Relational Database Management System. Proc. ACM SIGMOD. The founding paper for the optimiser/executor split and uniform access-path dispatch.
- Graefe, G. (1993). Query Evaluation Techniques for Large Databases. ACM Computing Surveys 25(2). Comprehensive survey of iterator-model and access-method techniques; chapter on iterators is the textbook anchor for the open/next/close protocol.
- Graefe, G. (1994). Volcano — An Extensible and Parallel Query Evaluation System. IEEE TKDE 6(1). The canonical Volcano paper; CUBRID’s
scan_next_scanlifecycle is a direct descendant. - Hellerstein, J. M., & Stonebraker, M. (Eds.) (2005). Anatomy of a Database System. In Readings in Database Systems (Red Book), 4th ed., Ch. 4 “Access Methods”. Layered DBMS model and the access-method catalogue concept.
- Petrov, A. (2019). Database Internals: A Deep Dive into How Distributed Data Systems Work. O’Reilly. Ch. 1 “Introduction and Overview” (file organisation), Ch. 2-3 (B-tree variants including covering, skip, and loose scan). Captured locally at
knowledge/research/dbms-general/database-internals.md. - Silberschatz, A., Korth, H. F., & Sudarshan, S. (2019). Database System Concepts, 7th ed. McGraw-Hill. Query Processing chapter on access paths; multi-table joins chapter on hash join build/probe. Captured locally at
knowledge/research/dbms-general/database-system-concepts.md.