CUBRID Cursor — Client-Side Fetch Handle Over a Server List-File With Holdability and Scroll State
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
A relational engine that has finished evaluating a query has produced a
set of tuples. The client almost never wants the whole set in one
shot — it wants to walk it. The walking interface is a cursor:
a positional handle over the result set that yields one row (or a
small batch) per call and survives across many client/server
round-trips. Database System Concepts (Silberschatz, ch. 5
“Application Development”) frames the cursor as the bridge between
relational semantics (a query yields a set; sets have no order
unless ORDER BY is given) and host-language semantics (the
host wants to iterate, branch, possibly update on a per-row basis).
The bridge is positional, so the cursor must carry a current
position and a direction-of-travel; everything else (block
size, prefetch, locking) is a tunable.
ANSI/ISO SQL distinguishes four orthogonal cursor properties, and every cursor implementation lands on a specific point in this four-dimensional space:
- Scrollability. A `FORWARD ONLY` cursor only moves forward; a `SCROLL` cursor supports `FETCH FIRST | LAST | PRIOR | NEXT | ABSOLUTE n | RELATIVE n`. The textbook cost is that scrollability forces the engine to keep the result materialised: a forward-only cursor can stream straight from the operator tree (Volcano `next()` calls), but a backward fetch needs random access into a buffered tuple stream.
- Sensitivity. An `INSENSITIVE` cursor sees a snapshot of the data taken at OPEN time; subsequent updates by the same or other transactions do not change what the cursor returns. A `SENSITIVE` cursor sees committed updates that satisfy the query's predicate. `ASENSITIVE` leaves the choice implementation-defined. The textbook implementation of insensitivity is to materialise the result; for sensitivity the engine must instead re-evaluate on each fetch.
- Updatability. An `UPDATABLE` cursor allows `WHERE CURRENT OF` updates and deletes. The implementation cost is that the cursor must carry the row's OID (or rowid, or ctid) in its tuple stream so the engine can locate the row to update.
- Holdability. A `WITH HOLD` cursor survives a `COMMIT` of the transaction that opened it. Without `WITH HOLD` the cursor is destroyed at commit because (a) the result-stream's temp file may be reclaimed at transaction end, and (b) any locks the cursor was holding evaporate. Holdability requires the engine to detach the result stream from the transaction and either re-attach it to a session-scoped store or convert it to an immutable, read-only artefact whose lifetime extends to session end (or cursor close, whichever comes first).
Every cursor implementation is thus a triplet: (client-side positional state) × (server-side result-stream) × (lifetime binding). The client side must fast-walk the local position (forward/backward by one tuple, jump to first/last) without re-asking the server. The server side must hold the stream materialised in a re-readable form. The lifetime binding decides whether the stream is destroyed at commit, query close, transaction abort, session end, or connection drop.
A separate concern is result-set vs. cursor. JDBC and ODBC
clients distinguish a “result set” object (a thin wrapper over a
cursor with column metadata) from the cursor itself. CUBRID
collapses them: the client-facing object is the CURSOR_ID, and
column metadata travels independently in the QFILE_LIST_ID’s
type list. Higher-level wrappers (DB_QUERY_RESULT, the broker’s
T_QUERY_RESULT) hold the cursor by value and add result-set-
shaped metadata around it.
Common DBMS Design
Every server-client RDBMS lands on a few standard patterns; the moving parts are where the result lives, who pages it across the wire, and how the holdability boundary is enforced. The vocabulary is roughly the same across implementations.
PostgreSQL — Portals. The Postgres analogue of a cursor is the
Portal (src/backend/utils/mmgr/portalmem.c,
src/backend/tcop/pquery.c). A Portal binds a query plan, a
QueryDesc, an executor state, and a strategy (PORTAL_ONE_SELECT
for the streaming case; PORTAL_UTIL_SELECT for the materialised
case; PORTAL_MULTI_QUERY for multi-statement). Cursors created
with DECLARE CURSOR produce Portals; the FETCH command
(PerformPortalFetch) drives PortalRun which walks the
executor’s iterator tree forward (or backward, when the strategy
is PORTAL_ONE_SELECT and cursorOptions & CURSOR_OPT_SCROLL).
WITH HOLD portals (HoldPortal) materialise their remaining rows
into a Tuplestore at commit time so they survive the transaction.
The Portal lives in a memory context that outlives the
transaction.
MySQL — Stored-procedure cursor. MySQL exposes server-side
cursors only inside stored procedures (DECLARE c CURSOR FOR SELECT ...; OPEN c; FETCH c INTO v; CLOSE c;). The implementation
(sql/sp_rcontext.cc,
sql/sql_cursor.cc) writes the result into a Server_side_cursor
backed by a MEMORY or MyISAM temp table; FETCH walks that
table by sequence number. There is no client-facing
SCROLL/HOLD API at the protocol level; what the client gets
across the wire is row-batched result sets, and “scrolling” is the
client driver’s local cache. Holdability is moot — stored
procedures complete inside a transaction.
Oracle — REF CURSOR. Oracle’s SYS_REFCURSOR is a typed
handle that PL/SQL passes back to the client. The client (OCI
or JDBC) calls OCIStmtFetch2 to pull rows. The result lives in
a server-side area (the QEPROW_FRAME) and is re-fetched on
demand; scrolling is supported via OCI’s OCI_FETCH_FIRST,
OCI_FETCH_LAST, etc. Holdability is implicit — REF CURSORs
naturally outlive the inner transaction because they are bound
to the calling block, not the transaction.
SQL Server — Cursor types. SQL Server exposes the
fullest ANSI menu: STATIC (snapshot in tempdb),
KEYSET-driven (key columns saved, data re-fetched on each
position), DYNAMIC (predicate re-evaluated on each fetch), and
FAST_FORWARD (forward-only, read-only, no re-evaluation). The
storage substrate is tempdb worktables. Holdability is governed by the
CURSOR_CLOSE_ON_COMMIT setting: cursors survive COMMIT by default, and
SET CURSOR_CLOSE_ON_COMMIT ON restores the ANSI close-at-commit behaviour.
Theory ↔ CUBRID mapping
CUBRID picks a static-snapshot, optionally-scrollable, optionally-
updatable cursor that is implemented entirely on top of a server-
side list-file (the materialised tuple stream described in
cubrid-list-file.md). The cursor is therefore always insensitive
in ANSI terms — the snapshot is whatever the executor wrote into
the list-file at OPEN time. The client side ferries one
network-page at a time and walks forward, backward, first, or
last by re-positioning inside the page chain without rerunning
the query. Holdability is implemented by detaching the underlying
list-file from the query manager’s transaction-scoped table at
COMMIT and re-attaching it to the session’s holdable-cursor list
(see cubrid-server-session.md SESSION_QUERY_ENTRY).
| Theoretical concept | CUBRID name |
|---|---|
| Cursor object (client side) | CURSOR_ID (src/query/cursor.h) |
| Underlying materialised stream | QFILE_LIST_ID (query_list.h) — see cubrid-list-file.md |
| Cursor position relative to set | CURSOR_POSITION enum (C_BEFORE / C_ON / C_AFTER) |
| Page-level position | current_vpid, current_tuple_no, current_tuple_offset |
| Open the cursor | cursor_open |
| Forward fetch | cursor_next_tuple |
| Backward fetch | cursor_prev_tuple |
| Jump to head | cursor_first_tuple |
| Jump to tail | cursor_last_tuple |
| Decode tuple value | cursor_get_tuple_value / _value_list |
| Close + free | cursor_close / cursor_free |
| Network page transport | qfile_get_list_file_page (client) / xqfile_get_list_file_page (server) |
| Updatable-cursor OID column | is_oid_included flag, cursor_get_current_oid |
| Pre-fetch dereferenced OIDs | cursor_prefetch_first_hidden_oid / _column_oids |
| Lock mode for prefetch | cursor_set_prefetch_lock_mode |
| Peek vs copy semantics | cursor_set_copy_tuple_value / is_copy_tuple_value |
| Holdable result flag | RESULT_HOLDABLE bit in QUERY_FLAG (query_list.h) |
| Holdable cursor state on session | SESSION_QUERY_ENTRY::list_id (in session.c) |
| Holdable-cursor preservation hook | session_preserve_temporary_files (session.c) |
| Cursor commit handoff | qmgr_clear_trans_wakeup → xsession_store_query_entry_info |
| Cursor commit destroy path | qmgr_clear_trans_wakeup (non-holdable branch) |
| Cursor reload after COMMIT | qmgr_get_query_entry → xsession_load_query_entry_info |
| Higher-level result wrapper | DB_QUERY_RESULT / DB_SELECT_RESULT (db_query.h) |
| Broker-side per-statement state | T_SRV_HANDLE::is_holdable, T_QUERY_RESULT (cas_execute.c) |
The architectural choice that distinguishes CUBRID is the list-file
underneath. Because every materialising operator in the executor
already produces a QFILE_LIST_ID (see cubrid-list-file.md), the
cursor module does not need its own backing store — it is a thin
client-side reader over a server-side artefact that the executor
already produces. Sort, hash-build, group-by, and final result all
write the same shape; the cursor reads it. The cost is that every
cursor result is materialised (no streaming forward-only cursor
fast path), but the architectural payoff is that there is one tuple
format, one page format, and one network protocol for every
result-bearing query.
CUBRID’s Approach
The cursor module has four moving parts: the CURSOR_ID struct
that holds the per-cursor positional state, the page-fetch loop
that pulls one QFILE_LIST_ID page at a time across the wire,
the tuple-decode path that turns length-prefixed packed bytes
into typed DB_VALUEs, and the holdability handshake between
the cursor’s underlying query entry, the query manager’s
transaction-scoped table, and the session’s holdable list. We walk
them in that order.
Overall structure
```mermaid
flowchart LR
  subgraph CL["Client side (CS or SA)"]
    APP["Application / JDBC / CCI / broker (CAS)"]
    DBQR["DB_QUERY_RESULT<br/>(db_query.h)<br/>res.s.cursor_id"]
    CID["CURSOR_ID<br/>(cursor.h)"]
    BUF["buffer_area<br/>(IO_MAX_PAGE_SIZE)"]
    LISTID_CL["local QFILE_LIST_ID copy<br/>(deep-copied at open)"]
  end
  subgraph NET["Network (CS_MODE)"]
    GLP["NET_SERVER_LS_GET_LIST_FILE_PAGE<br/>(network_interface_cl.c)"]
  end
  subgraph SR["Server"]
    XGLP["xqfile_get_list_file_page<br/>(list_file.c)"]
    QMGR["qmgr_get_query_entry<br/>(query_manager.c)"]
    QENT["QMGR_QUERY_ENTRY<br/>list_id, temp_vfid, is_holdable"]
    LISTID_SR["QFILE_LIST_ID<br/>(materialised result)"]
    PAGES["page chain<br/>membuf + FILE_TEMP"]
    SESS["SESSION_STATE.queries<br/>SESSION_QUERY_ENTRY (holdable)"]
  end
  APP --> DBQR
  DBQR --> CID
  CID --> BUF
  CID --> LISTID_CL
  CID -- one page at a time --> GLP
  GLP --> XGLP
  XGLP --> QMGR
  QMGR --> QENT
  QENT --> LISTID_SR
  LISTID_SR --> PAGES
  QENT -. preserved across COMMIT .-> SESS
```
The CURSOR_ID is the client’s authoritative view: it owns a
deep copy of the QFILE_LIST_ID (so the schema, tuple count,
and head/tail VPIDs can be walked without any server round-trip),
plus a malloc’d buffer_area the size of IO_MAX_PAGE_SIZE that
holds whatever page chunk the last qfile_get_list_file_page call
returned. Every advance routine (cursor_next_tuple,
cursor_prev_tuple, cursor_first_tuple, cursor_last_tuple)
first walks within the current page; only when the page boundary
is crossed does it call cursor_fetch_page_having_tuple, which is
the single funnel into qfile_get_list_file_page.
The CURSOR_ID struct
```c
// CURSOR_ID — src/query/cursor.h
typedef struct cursor_id CURSOR_ID;
struct cursor_id
{
  QUERY_ID query_id;               /* server-side query handle */
  QFILE_LIST_ID list_id;           /* deep copy of the result-stream id */
  OID *oid_set;                    /* prefetch OID buffer (this page) */
  MOP *mop_set;                    /* prefetch MOP buffer (parallel array) */
  int oid_ent_count;               /* sizeof oid_set / mop_set */
  CURSOR_POSITION position;        /* C_BEFORE | C_ON | C_AFTER */
  VPID current_vpid;               /* page currently in `buffer` */
  VPID next_vpid;                  /* unused in current code */
  VPID header_vpid;                /* head of multi-page network buffer */
  int on_overflow;                 /* big-tuple overflow flag */
  int tuple_no;                    /* absolute tuple index */
  QFILE_TUPLE_RECORD tuple_record; /* reassembly buffer for big tuples */
  char *buffer;                    /* current page within buffer_area */
  char *buffer_area;               /* IO_MAX_PAGE_SIZE bytes from server */
  int buffer_filled_size;          /* bytes server actually returned */
  int buffer_tuple_count;          /* tuples on `buffer` */
  int current_tuple_no;            /* tuple index within `buffer` */
  int current_tuple_offset;        /* byte offset within `buffer` */
  char *current_tuple_p;           /* pointer to current tuple bytes */
  int *oid_col_no;                 /* additional OID-bearing columns */
  int current_tuple_length;
  int oid_col_no_cnt;
  DB_FETCH_MODE prefetch_lock_mode;  /* lock mode for prefetched objects */
  int current_tuple_value_index;   /* memo for repeated cursor_get_tuple_value */
  char *current_tuple_value_p;
  bool is_updatable;
  bool is_oid_included;            /* first tuple value is hidden OID */
  bool is_copy_tuple_value;        /* true = copy DB_VALUE, false = peek */
};
```

The struct mixes four concerns:
- Identity — `query_id` and `list_id`. `query_id` is the server-side handle the network protocol uses; `list_id` is the deep copy of the type list and page-chain head/tail so the client can plan forward/backward navigation without asking the server.
- Network buffer — `buffer_area`, `buffer`, `buffer_filled_size`, `header_vpid`. The client allocates `IO_MAX_PAGE_SIZE` bytes once at `cursor_open`, and a single `qfile_get_list_file_page` round-trip may fill several `DB_PAGESIZE` pages into that buffer (the server packs as many sequential pages as fit). The buffer is the local page cache; the position fields below are offsets into it.
- Position — `position`, `tuple_no`, `current_vpid`, `current_tuple_no`, `current_tuple_offset`, `current_tuple_p`, `current_tuple_length`. The cursor state machine moves these in lock-step. `position` is the macro state (before all rows, on a row, after all rows); the rest is the micro state (which page, which tuple in the page, where in the page).
- Per-fetch decoder memo — `current_tuple_value_index` and `current_tuple_value_p` accelerate repeated `cursor_get_tuple_value (idx)` calls on the same tuple by remembering where the last decode left off, so a sequential walk of columns 0..N is O(N) rather than O(N^2).
Three flags pin the cursor's mode:
- `is_oid_included` — the result was opened with an updatable cursor in mind, so the executor synthesised a hidden first column carrying the underlying row's OID. `cursor_get_current_oid` reads it; `db_query_get_tuple_value` shifts user-visible column indices by 1.
- `is_updatable` — the caller of `cursor_open` requested update semantics. Currently this only controls whether `cursor_set_oid_columns` is allowed (it refuses if `is_updatable` is set, because the hidden-OID path takes priority).
- `is_copy_tuple_value` — chooses between `pr_data_readval (..., copy=true)` (decode into a freshly allocated `DB_VALUE`) and `pr_data_readval (..., copy=false)` (point the `DB_VALUE` directly into the cursor's network buffer, valid only until the next page fetch). Default is true; the broker flips it to false when it knows the value will be encoded back onto the wire immediately.
Lifecycle FSM
```mermaid
stateDiagram-v2
    [*] --> CLOSED : (memory uninitialised)
    CLOSED --> OPEN_BEFORE : cursor_open \n deep-copy list_id, malloc buffer_area
    OPEN_BEFORE --> OPEN_BEFORE : no rows in result
    OPEN_BEFORE --> ON_ROW : cursor_next_tuple \n or cursor_first_tuple
    ON_ROW --> ON_ROW : cursor_next_tuple \n within or across page
    ON_ROW --> ON_ROW : cursor_prev_tuple
    ON_ROW --> AFTER_LAST : cursor_next_tuple past last
    AFTER_LAST --> ON_ROW : cursor_prev_tuple \n jumps to last_vpid LAST_TPL
    AFTER_LAST --> AFTER_LAST : cursor_next_tuple
    ON_ROW --> ON_ROW : cursor_first_tuple \n cursor_last_tuple
    ON_ROW --> CLOSED : cursor_close (frees list_id copy + buffer_area)
    AFTER_LAST --> CLOSED : cursor_close
    OPEN_BEFORE --> CLOSED : cursor_close
```
The three macro states (C_BEFORE, C_ON, C_AFTER) appear
literally as the CURSOR_POSITION enum in cursor.h. The
transitions match SQL/CLI cursor semantics: a freshly opened cursor
is positioned before the first row; a successful next/first
moves it on a row; running off the tail moves it after, and
the only way back is cursor_prev_tuple or cursor_last_tuple
(which special-case C_AFTER by jumping to the last page’s
LAST_TPL).
cursor_open — the deep copy and network buffer allocation
```c
// cursor_open — src/query/cursor.c (condensed)
bool
cursor_open (CURSOR_ID * cursor_id_p, QFILE_LIST_ID * list_id_p,
             bool updatable, bool is_oid_included)
{
  static QFILE_LIST_ID empty_list_id;
  QFILE_CLEAR_LIST_ID (&empty_list_id);

  cursor_id_p->is_updatable = updatable;
  cursor_id_p->is_oid_included = is_oid_included;
  cursor_id_p->position = C_BEFORE;
  cursor_id_p->tuple_no = -1;
  VPID_SET_NULL (&cursor_id_p->current_vpid);
  /* ... condensed: more zeroing ... */
  cursor_id_p->is_copy_tuple_value = true;

  if (cursor_copy_list_id (&cursor_id_p->list_id, list_id_p) != NO_ERROR)
    return false;
  cursor_id_p->query_id = list_id_p->query_id;

  if (cursor_id_p->list_id.type_list.type_cnt)
    {
      cursor_id_p->buffer_area = (char *) malloc (CURSOR_BUFFER_AREA_SIZE);
      cursor_id_p->buffer = cursor_id_p->buffer_area;
      if (is_oid_included)
        cursor_allocate_oid_buffer (cursor_id_p);
    }
  return true;
}
```

Two facts deserve emphasis. First, cursor_copy_list_id performs a
deep copy of the type list (allocating a fresh
type_list.domp array and memcpying the source) and a shallow
clone of the page-chain head/tail VPIDs; it also mallocs a new
last_pgptr buffer if the source has one. After the copy the
cursor’s list_id is independent: the source can be freed without
disturbing the cursor.
```c
// cursor_copy_list_id — src/query/cursor.c (condensed)
int
cursor_copy_list_id (QFILE_LIST_ID * dest_list_id_p, const QFILE_LIST_ID * src_list_id_p)
{
  memcpy (dest_list_id_p, src_list_id_p, DB_SIZEOF (QFILE_LIST_ID));

  dest_list_id_p->type_list.domp = NULL;
  if (src_list_id_p->type_list.type_cnt)
    {
      size_t size = src_list_id_p->type_list.type_cnt * sizeof (TP_DOMAIN *);
      dest_list_id_p->type_list.domp = (TP_DOMAIN **) malloc (size);
      memcpy (dest_list_id_p->type_list.domp, src_list_id_p->type_list.domp, size);
    }
  dest_list_id_p->tpl_descr.f_valp = NULL;
  dest_list_id_p->sort_list = NULL;     /* never used at crs_ level */

  if (src_list_id_p->last_pgptr)
    {
      dest_list_id_p->last_pgptr = (PAGE_PTR) malloc (CURSOR_BUFFER_SIZE);
      memcpy (dest_list_id_p->last_pgptr, src_list_id_p->last_pgptr, CURSOR_BUFFER_SIZE);
    }
  return NO_ERROR;
}
```

Second, the cursor's buffer_area is IO_MAX_PAGE_SIZE, not
DB_PAGESIZE. The motivation is in xqfile_get_list_file_page
on the server: that function “appends pages until a network page
is full”, concatenating consecutive list-file pages into one wire
response so a single round-trip can ferry several DB_PAGESIZE
chunks. The client has to be ready to receive up to
IO_MAX_PAGE_SIZE bytes, then walk inside that buffer with the
overflow-vpid / next-vpid headers to find each page.
Forward fetch — cursor_next_tuple
```c
// cursor_next_tuple — src/query/cursor.c (condensed)
int
cursor_next_tuple (CURSOR_ID * cursor_id_p)
{
  cursor_initialize_current_tuple_value_position (cursor_id_p);

  if (cursor_id_p->position == C_BEFORE)
    {
      if (VPID_ISNULL (&(cursor_id_p->list_id.first_vpid)))
        return DB_CURSOR_END;
      if (cursor_fetch_page_having_tuple (cursor_id_p, &cursor_id_p->list_id.first_vpid,
                                          FIRST_TPL, 0) != NO_ERROR)
        return DB_CURSOR_ERROR;
      QFILE_COPY_VPID (&cursor_id_p->current_vpid, &cursor_id_p->list_id.first_vpid);
      cursor_id_p->position = C_ON;
      cursor_id_p->tuple_no = -1;
      cursor_id_p->current_tuple_no = -1;
      cursor_id_p->current_tuple_length = 0;
      /* fall through into the C_ON branch */
    }

  if (cursor_id_p->position == C_ON)
    {
      VPID next_vpid;

      if (cursor_id_p->current_tuple_no < cursor_id_p->buffer_tuple_count - 1)
        {
          /* fast path: still in the same page, walk forward */
          cursor_id_p->tuple_no++;
          cursor_id_p->current_tuple_no++;
          cursor_id_p->current_tuple_offset += cursor_id_p->current_tuple_length;
          cursor_id_p->current_tuple_p += cursor_id_p->current_tuple_length;
          cursor_id_p->current_tuple_length =
            QFILE_GET_TUPLE_LENGTH (cursor_id_p->current_tuple_p);
        }
      else if (QFILE_GET_NEXT_PAGE_ID (cursor_id_p->buffer) != NULL_PAGEID)
        {
          /* slow path: cross page boundary, fetch next page */
          QFILE_GET_NEXT_VPID (&next_vpid, cursor_id_p->buffer);
          if (cursor_fetch_page_having_tuple (cursor_id_p, &next_vpid,
                                              FIRST_TPL, 0) != NO_ERROR)
            return DB_CURSOR_ERROR;
          QFILE_COPY_VPID (&cursor_id_p->current_vpid, &next_vpid);
          cursor_id_p->tuple_no++;
        }
      else
        {
          cursor_id_p->position = C_AFTER;
          cursor_id_p->tuple_no = cursor_id_p->list_id.tuple_cnt;
          return DB_CURSOR_END;
        }
    }
  else if (cursor_id_p->position == C_AFTER)
    return DB_CURSOR_END;

  return DB_CURSOR_SUCCESS;
}
```

The shape is the textbook positional-cursor fast/slow split:
- Fast path (still on the current page) is pure pointer arithmetic — the per-tuple length prefix tells the cursor how far to advance, and the page header’s tuple count tells it when to stop. Zero round-trips, zero allocations, ~10 instructions per tuple.
- Slow path (cross page boundary) calls `cursor_fetch_page_having_tuple` with the next VPID; that may hit the local network-buffer cache (if the server packed several pages into one response and the next VPID is one of them) or trigger a fresh `qfile_get_list_file_page` round-trip.
The reverse function cursor_prev_tuple is symmetric, leveraging
the prev_tuple_length prefix CUBRID writes into every tuple and
the prev_pgid field on every page header (see cubrid-list-file.md
for the page format). The macros QFILE_GET_PREV_TUPLE_LENGTH
and QFILE_GET_PREV_PAGE_ID materialise the backward walk in one
arithmetic step.
The page-fetch funnel — cursor_fetch_page_having_tuple
Every position change that crosses a page boundary funnels through one routine:
```c
// cursor_fetch_page_having_tuple — src/query/cursor.c (condensed)
int
cursor_fetch_page_having_tuple (CURSOR_ID * cursor_id_p, VPID * vpid_p,
                                int position, int offset)
{
  cursor_initialize_current_tuple_value_position (cursor_id_p);

  if (!VPID_EQ (&(cursor_id_p->current_vpid), vpid_p))
    if (cursor_buffer_last_page (cursor_id_p, vpid_p) != NO_ERROR)
      return ER_FAILED;

  if (cursor_id_p->buffer == NULL)
    return ER_FAILED;

  if (cursor_point_current_tuple (cursor_id_p, position, offset) != NO_ERROR)
    return ER_FAILED;

  if (QFILE_GET_OVERFLOW_PAGE_ID (cursor_id_p->buffer) != NULL_PAGEID)
    {
      if (cursor_construct_tuple_from_overflow_pages (cursor_id_p, vpid_p) != NO_ERROR)
        return ER_FAILED;
    }
  else
    cursor_id_p->current_tuple_p = cursor_id_p->buffer + cursor_id_p->current_tuple_offset;

  if (cursor_id_p->buffer_tuple_count < 2)
    return NO_ERROR;

  if (cursor_has_first_hidden_oid (cursor_id_p))
    return cursor_prefetch_first_hidden_oid (cursor_id_p);
  else if (cursor_id_p->oid_col_no && cursor_id_p->oid_col_no_cnt)
    return cursor_prefetch_column_oids (cursor_id_p);

  return NO_ERROR;
}
```

It does five things:
- If the requested VPID is already in the local network buffer (because a previous round-trip packed it), reuse it.
- Otherwise, call `cursor_buffer_last_page`, which either points at the writer's `last_pgptr` (in SA-mode, where the executor and the cursor share the same address space) or invokes `cursor_get_list_file_page`, which is the network call.
- Set the page-relative position fields (`current_tuple_no`, `current_tuple_offset`, `current_tuple_length`) by calling `cursor_point_current_tuple`. The `position` argument is one of `FIRST_TPL = -1`, `LAST_TPL = -2`, or a literal tuple index.
- If the page indicates the tuple has overflow chunks (`QFILE_GET_OVERFLOW_PAGE_ID (buffer) != NULL_PAGEID`), reassemble the full tuple by walking the overflow chain and copying chunks into `tuple_record.tpl` (a malloc'd reassembly buffer kept on the cursor). Otherwise the current tuple lives in-place inside the network buffer.
- Vector OID prefetch. If the result has a hidden OID column (the executor planted it at column 0 because the query is updatable), or if the caller registered additional OID-bearing columns via `cursor_set_oid_columns`, walk the page once gathering OIDs into `oid_set` and issue one `locator_fetch_set` call to bring them all into the workspace. This is a page-grain optimisation: instead of paying one round-trip per row dereferenced, every page boundary triggers exactly one batched OID fetch. The decision is gated by `buffer_tuple_count < 2` (a page with one tuple is not worth batching).
The page transport itself is one network round-trip on the client side:
```c
// qfile_get_list_file_page (client-side stub) — src/communication/network_interface_cl.c
int
qfile_get_list_file_page (QUERY_ID query_id, VOLID volid, PAGEID pageid,
                          char *buffer, int *buffer_size)
{
  /* ... pack request, send NET_SERVER_LS_GET_LIST_FILE_PAGE ... */
  return net_client_request2_no_malloc (NET_SERVER_LS_GET_LIST_FILE_PAGE,
                                        request, sizeof (request),
                                        reply, sizeof (reply),
                                        NULL, 0, buffer, buffer_size);
}
```

The server-side handler (in list_file.c) is the corresponding amplifier:
```c
// xqfile_get_list_file_page — src/query/list_file.c (condensed)
int
xqfile_get_list_file_page (THREAD_ENTRY * thread_p, QUERY_ID query_id,
                           VOLID vol_id, PAGEID page_id,
                           char *page_buf_p, int *page_size_p)
{
  /* ... resolve query_id → QMGR_QUERY_ENTRY → QFILE_LIST_ID → QMGR_TEMP_FILE ... */

get_page:
  /* append pages until a network page is full */
  while ((*page_size_p + DB_PAGESIZE) <= IO_MAX_PAGE_SIZE)
    {
      page_p = qmgr_get_old_page (thread_p, &vpid, tfile_vfid_p);
      QFILE_GET_OVERFLOW_VPID (&next_vpid, page_p);
      if (next_vpid.pageid == NULL_PAGEID)
        QFILE_GET_NEXT_VPID (&next_vpid, page_p);

      /* trim trailing zero-bytes if this is a regular page */
      if (QFILE_GET_TUPLE_COUNT (page_p) == QFILE_OVERFLOW_TUPLE_COUNT_FLAG
          || QFILE_GET_OVERFLOW_PAGE_ID (page_p) != NULL_PAGEID)
        one_page_size = DB_PAGESIZE;
      else
        one_page_size = (QFILE_GET_LAST_TUPLE_OFFSET (page_p)
                         + QFILE_GET_TUPLE_LENGTH (page_p
                                                   + QFILE_GET_LAST_TUPLE_OFFSET (page_p)));

      memcpy (page_buf_p + *page_size_p, page_p, one_page_size);
      qmgr_free_old_page_and_init (thread_p, page_p, tfile_vfid_p);
      *page_size_p += one_page_size;

      VPID_COPY (&vpid, &next_vpid);
      if (VPID_ISNULL (&vpid))
        break;
    }
  return NO_ERROR;
}
```

The two important behaviours are multi-page packing (the
while loop) and last-tuple-only copy (the one_page_size
computation): a page that is logically full of e.g. 3KB of tuples
out of a 16KB physical page only needs 3KB shipped; the empty
tail is trimmed before memcpy. The trade-off is the overflow
case, where the trailing chunks must be copied verbatim because
they are not normal tuples.
Tuple decode — cursor_get_tuple_value
The on-page tuple format is the list-file's length-prefixed packed row described in cubrid-list-file.md — a per-tuple header followed by one header per value:

```
[ tuple_length (4) | prev_tuple_length (4) | val0 | val1 | ... ]
[ flag (4) | val_len (4) | <packed bytes, MAX_ALIGNMENT-padded> ]
```

The cursor's decoder is cursor_get_tuple_value:
```c
// cursor_get_tuple_value — src/query/cursor.c (condensed)
int
cursor_get_tuple_value (CURSOR_ID * cursor_id_p, int index, DB_VALUE * value_p)
{
  if (cursor_id_p->is_oid_included == true)
    index++;                    /* shift past the hidden first column */

  char *tuple_p = cursor_peek_tuple (cursor_id_p);
  if (tuple_p == NULL)
    return ER_FAILED;
  return cursor_get_tuple_value_from_list (cursor_id_p, index, value_p, tuple_p);
}
```

cursor_peek_tuple returns the cached current_tuple_p (and
errors out if position != C_ON). The actual decode is in
cursor_get_tuple_value_from_list:
```c
// cursor_get_tuple_value_from_list — src/query/cursor.c (condensed)
static int
cursor_get_tuple_value_from_list (CURSOR_ID * cursor_id_p, int index,
                                  DB_VALUE * value_p, char *tuple_p)
{
  QFILE_TUPLE_VALUE_TYPE_LIST *type_list_p = &cursor_id_p->list_id.type_list;
  OR_BUF buffer;

  or_init (&buffer, tuple_p, QFILE_GET_TUPLE_LENGTH (tuple_p));

  /* fast path: previous call left us pointing at column k, k <= index */
  int i;
  if (cursor_id_p->current_tuple_value_index >= 0
      && cursor_id_p->current_tuple_value_index <= index
      && cursor_id_p->current_tuple_value_p != NULL)
    {
      i = cursor_id_p->current_tuple_value_index;
      tuple_p = cursor_id_p->current_tuple_value_p;
    }
  else
    {
      i = 0;
      tuple_p += QFILE_TUPLE_LENGTH_SIZE;
    }

  for (; i < index; i++)
    tuple_p += (QFILE_TUPLE_VALUE_HEADER_SIZE + QFILE_GET_TUPLE_VALUE_LENGTH (tuple_p));

  cursor_id_p->current_tuple_value_index = i;
  cursor_id_p->current_tuple_value_p = tuple_p;

  QFILE_TUPLE_VALUE_FLAG flag = QFILE_GET_TUPLE_VALUE_FLAG (tuple_p);
  tuple_p += QFILE_TUPLE_VALUE_HEADER_SIZE;
  buffer.ptr = tuple_p;

  return cursor_get_tuple_value_to_dbvalue (&buffer, type_list_p->domp[i], flag,
                                            value_p, cursor_id_p->is_copy_tuple_value);
}
```

The forward-walking memo in current_tuple_value_index /
current_tuple_value_p makes a per-row column scan
(get_value(0); get_value(1); ...; get_value(N-1)) cost O(N)
total bytes-skipped rather than O(N^2) — when the previous call
ended at column k and the next call asks for column k+1 the
loop begins at k, not at 0. The memo is invalidated by every
cursor-position change (cursor_initialize_current_tuple_value_position
is called from cursor_next_tuple, cursor_prev_tuple, and
cursor_fetch_page_having_tuple).
The actual byte-to-DB_VALUE conversion is delegated to the
primitive type’s data_readval:
```c
// cursor_get_tuple_value_to_dbvalue — src/query/cursor.c (condensed)
static int
cursor_get_tuple_value_to_dbvalue (OR_BUF * buffer_p, TP_DOMAIN * domain_p,
                                   QFILE_TUPLE_VALUE_FLAG value_flag,
                                   DB_VALUE * value_p, bool is_copy)
{
  const PR_TYPE *pr_type = domain_p->type;

  if (value_flag == V_UNBOUND)
    {
      db_value_domain_init (value_p, pr_type->id, domain_p->precision, domain_p->scale);
      return NO_ERROR;          /* SQL NULL */
    }
  if (pr_type->id == DB_TYPE_VOBJ)
    return cursor_copy_vobj_to_dbvalue (buffer_p, value_p);

  if (pr_type->data_readval (buffer_p, value_p, domain_p, -1, is_copy, NULL, 0) != NO_ERROR)
    return ER_FAILED;
  return cursor_fixup_vobjs (value_p);
}
```

cursor_fixup_vobjs is the post-decode hook that turns OID-bearing
values into MOPs (managed-object pointers — see the locator/MOP
module): a DB_TYPE_OID becomes DB_TYPE_OBJECT via
vid_oid_to_object, a DB_TYPE_VOBJ becomes a vmop via
vid_vobj_to_object, and a set/sequence/multiset of either is
walked and recursively fixed up. Without this hook the cursor
would hand the application a raw OID; the application code
expects to receive an object handle that already has its
workspace entry populated.
Updatable cursor — hidden OID column and OID prefetch
When cursor_open(... is_oid_included=true), the executor has
prepended a hidden first column to every tuple carrying the
underlying row’s OID (or VOBJ for view rows). This is what makes
UPDATE WHERE CURRENT OF possible — the cursor knows which row
to point the update at.
cursor_get_current_oid reads it:
```c
// cursor_get_current_oid — src/query/cursor.c
int
cursor_get_current_oid (CURSOR_ID * cursor_id_p, DB_VALUE * value_p)
{
  assert (cursor_id_p->is_oid_included == true);

  char *tuple_p = cursor_peek_tuple (cursor_id_p);
  if (tuple_p == NULL)
    return ER_FAILED;
  return cursor_get_first_tuple_value (tuple_p, &cursor_id_p->list_id.type_list,
                                       value_p, cursor_id_p->is_copy_tuple_value);
}
```

The user-visible side effect is that cursor_get_tuple_value(idx)
shifts idx by 1 to skip past the hidden column.
The bigger optimisation around the hidden OID is
page-grain vector prefetch. After every page fetch,
cursor_fetch_page_having_tuple calls
cursor_prefetch_first_hidden_oid (or _column_oids if the caller
registered additional columns):
```c
// cursor_prefetch_first_hidden_oid — src/query/cursor.c (condensed)
static int
cursor_prefetch_first_hidden_oid (CURSOR_ID * cursor_id_p)
{
  int tuple_count = QFILE_GET_TUPLE_COUNT (cursor_id_p->buffer);
  QFILE_TUPLE current_tuple = cursor_id_p->buffer + QFILE_PAGE_HEADER_SIZE;
  int oid_index = 0;

  for (int i = 0; i < tuple_count; i++)
    {
      int current_tuple_length = QFILE_GET_TUPLE_LENGTH (current_tuple);
      DB_TYPE type = TP_DOMAIN_TYPE (cursor_id_p->list_id.type_list.domp[0]);
      char *tuple_p = (char *) current_tuple + QFILE_TUPLE_LENGTH_SIZE;

      if (QFILE_GET_TUPLE_VALUE_FLAG (tuple_p) != V_BOUND)
        {
          current_tuple = tuple_p + current_tuple_length;
          continue;
        }
      OID *current_oid_p = cursor_get_oid_from_tuple (tuple_p, type);
      if (current_oid_p && oid_index < cursor_id_p->oid_ent_count)
        {
          COPY_OID (&cursor_id_p->oid_set[oid_index], current_oid_p);
          oid_index++;
        }
      current_tuple = (char *) current_tuple + current_tuple_length;
    }

  return cursor_fetch_oids (cursor_id_p, oid_index, cursor_id_p->prefetch_lock_mode,
                            (cursor_id_p->prefetch_lock_mode == DB_FETCH_WRITE)
                            ? DB_FETCH_QUERY_WRITE : DB_FETCH_QUERY_READ);
}
```

Three behaviours:
- The walk visits every tuple on the page once, extracting the first-column OID into `oid_set`.
- `cursor_fetch_oids` calls `locator_fetch_set` (or `locator_fetch_object` if there is exactly one OID), which is the locator-manager's batched fetch primitive. This is one network round-trip for the whole page, replacing what would otherwise be one `locator_fetch_object` call per row.
- The lock mode is whatever the caller registered via `cursor_set_prefetch_lock_mode`. If the cursor is opened by `SELECT ... FOR UPDATE` the broker / driver flips this to `DB_FETCH_WRITE` so the prefetch acquires X locks; otherwise the default is `DB_FETCH_READ`.
The `oid_set` and `mop_set` parallel arrays are sized at
`cursor_open` time as `CEIL_PTVDIV(DB_PAGESIZE, sizeof(OID)) - 1`,
which is the maximum number of OIDs that can fit on a single
page (allowing one slot for the page header). Allocation failure
of either is non-fatal — `cursor_allocate_oid_buffer` simply zeroes
`oid_ent_count` and the prefetch is silently skipped.
Holdability — surviving COMMIT
Holdability is the most subtle concern in the cursor module and the one place where the cursor abstraction reaches deeply into the rest of the engine. The data flow is:
```mermaid
sequenceDiagram
  autonumber
  participant CL as Client (broker / app)
  participant QM as Query Manager
  participant TR as Transaction
  participant SE as Session
  CL->>QM: prepare/execute (RESULT_HOLDABLE flag)
  QM->>QM: query_p->is_holdable = true
  Note right of QM: Result list-file built, tuples written
  CL->>TR: COMMIT
  TR->>QM: qmgr_clear_trans_wakeup(tran_index, is_abort=false)
  loop for each query in tran_entry_p->query_entry_list_p
    QM->>QM: if query_p->is_holdable && !is_abort
    QM->>SE: xsession_store_query_entry_info (query_p)
    SE->>SE: qentry_to_sentry — moves list_id/temp_vfid pointer<br/>session_preserve_temporary_files — file_temp_preserve
    SE->>SE: prepend SESSION_QUERY_ENTRY to state_p->queries
    QM->>QM: query_p->list_id = NULL, temp_vfid = NULL
    QM->>QM: free QMGR_QUERY_ENTRY
  end
  Note over CL: COMMIT returns; cursor still holds query_id
  CL->>QM: cursor_next_tuple → qfile_get_list_file_page (query_id)
  QM->>QM: qmgr_get_query_entry not in tran-table
  QM->>SE: xsession_load_query_entry_info (query_id)
  SE->>QM: sentry_to_qentry — recreate QMGR_QUERY_ENTRY
  QM->>CL: serve page
```
The key invariant is that the server-side QFILE_LIST_ID and its
backing FILE_TEMP survive across the COMMIT because:
- The query manager's `qmgr_clear_trans_wakeup` (called at transaction end) detects `is_holdable && !is_abort` and, instead of destroying the list-file, moves ownership of `list_id` and `temp_vfid` from the transaction-scoped `QMGR_QUERY_ENTRY` to a new session-scoped `SESSION_QUERY_ENTRY`. The query manager's copy of the pointers is nulled (`query_p->list_id = NULL; query_p->temp_vfid = NULL;`) so the subsequent `qfile_close_list` / `qmgr_free_query_temp_file_helper` is a no-op for the holdable path.
- `session_preserve_temporary_files` walks the temp-file chain and calls `file_temp_preserve` on every backing `FILE_TEMP` so the file manager's transaction-end cleanup (`file_tempcache_drop_tran`) skips them.
```c
// qmgr_clear_trans_wakeup — src/query/query_manager.c (the holdable branch, condensed)
if (query_p->is_holdable)
  {
    if (is_abort || is_tran_died)
      xsession_clear_query_entry_info (thread_p, query_p->query_id);
    else
      {
        xsession_store_query_entry_info (thread_p, query_p);
        query_p->list_id = NULL;
        query_p->temp_vfid = NULL;
      }
  }
/* fall-through: destroy whatever pointers remain (NULL for holdable+commit) */
if (query_p->list_id)
  {
    qfile_close_list (thread_p, query_p->list_id);
    QFILE_FREE_AND_INIT_LIST_ID (query_p->list_id);
  }
if (query_p->temp_vfid != NULL)
  (void) qmgr_free_query_temp_file_helper (thread_p, query_p);
```

```c
// session_store_query_entry_info — src/session/session.c (condensed)
void
session_store_query_entry_info (THREAD_ENTRY * thread_p, QMGR_QUERY_ENTRY * qentry_p)
{
  SESSION_STATE *state_p = session_get_session_state (thread_p);
  if (state_p == NULL)
    return;

  for (SESSION_QUERY_ENTRY *current = state_p->queries; current; current = current->next)
    if (current->query_id == qentry_p->query_id)
      {
        /* idempotent — caller is in qmgr_clear_trans_wakeup, will null these */
        qentry_p->list_id = NULL;
        qentry_p->temp_vfid = NULL;
        return;
      }

  SESSION_QUERY_ENTRY *sqentry_p = qentry_to_sentry (qentry_p);
  /* qentry_to_sentry STEALS list_id and temp_file from qentry_p, nulling them */
  session_preserve_temporary_files (thread_p, sqentry_p);
  sqentry_p->next = state_p->queries;
  state_p->queries = sqentry_p;
  sessions.num_holdable_cursors++;
}
```

The second half of the protocol — the post-COMMIT fetch — is in
qmgr_get_query_entry:
```c
// qmgr_get_query_entry — src/query/query_manager.c (condensed)
QMGR_QUERY_ENTRY *
qmgr_get_query_entry (THREAD_ENTRY * thread_p, QUERY_ID query_id, int tran_index)
{
  /* normal path: look up in this transaction's list */
  pthread_mutex_lock (&tran_entry_p->mutex);
  query_p = qmgr_find_query_entry (tran_entry_p->query_entry_list_p, query_id);
  pthread_mutex_unlock (&tran_entry_p->mutex);
  if (query_p != NULL)
    return query_p;

  /* fallback: maybe it's a holdable result on the session */
  query_p = qmgr_allocate_query_entry (thread_p, tran_entry_p);
  query_p->query_id = query_id;
  if (xsession_load_query_entry_info (thread_p, query_p) != NO_ERROR)
    {
      qmgr_free_query_entry (thread_p, tran_entry_p, query_p);
      return NULL;
    }
  qmgr_add_query_entry (thread_p, query_p, tran_index);
  return query_p;
}
```

The first transactional fetch after COMMIT will find the
transaction’s query-entry list empty for that query_id, fall
through to xsession_load_query_entry_info, copy the list-file
pointers from the SESSION_QUERY_ENTRY back into a freshly
allocated QMGR_QUERY_ENTRY, attach that to the new transaction,
and continue serving pages. The cursor on the client never knew
the difference — it kept calling cursor_next_tuple, which kept
calling qfile_get_list_file_page (query_id), and the only
transitional cost is one O(N) walk of the session’s holdable
list to find the right entry (N bounded by the holdable-cursor
cap, `MAX_HOLDABLE_CURSORS_COUNT`).
The broker’s role — RESULT_HOLDABLE flag
The flag that drives the whole protocol is set at the broker (CAS) layer based on what the JDBC/CCI client requested:
```c
// cas_execute.c (condensed; multiple call sites)
if (jdbc_holdable_request)
  srv_handle->is_holdable = true;
db_session_set_holdable ((DB_SESSION *) srv_handle->session, srv_handle->is_holdable);

/* ... later, after execute ... */
if (srv_handle->is_holdable == true)
  {
    srv_handle->q_result->is_holdable = true;
    as_info->num_holdable_results++;
  }
```

`db_session_set_holdable` propagates the bit into the session's
prepared-statement state which, on db_execute_and_keep_statement,
ORs RESULT_HOLDABLE into the QUERY_FLAG shipped to the server.
The server’s xqmgr_execute_query reads it:
```c
// (paraphrased) — query_manager.c
if (*flag_p & RESULT_HOLDABLE)
  query_p->is_holdable = true;
else
  query_p->is_holdable = false;
```

The CAS counter `as_info->num_holdable_results` is the broker's
local view of how many cursors in this connection are holdable; it
matches the server’s sessions.num_holdable_cursors across the
session (subject to broker-restart and connection-drop edge
cases — see Open Questions).
Higher-level wrappers
Most callers do not touch CURSOR_ID directly. The two main
wrappers are:
- `DB_QUERY_RESULT` in `compat/db_query.h`. Holds the cursor by value inside `res.s.cursor_id` for `T_SELECT`-typed results, and exposes the `db_query_*` family (`db_query_first_tuple`, `db_query_next_tuple`, `db_query_seek_tuple`, etc.) that delegate to the corresponding `cursor_*` calls. The `db_query_seek_tuple` function gives an absolute/relative/end-relative seek by walking forward or backward via repeated `cursor_next_tuple` / `cursor_prev_tuple` calls, optionally short-circuited by a `db_query_get_tplpos` / `db_query_set_tplpos` save/restore pair (the `DB_QUERY_TPLPOS` struct stores `(crs_pos, vpid, tpl_no, tpl_off)` — exactly the fields needed to re-seat a cursor on a previously-visited tuple without re-walking the pages).
- `T_SRV_HANDLE` / `T_QUERY_RESULT` in `broker/cas_handle.h`. The CAS process's per-statement state, holding a `DB_QUERY_RESULT *` plus driver-side metadata (column types in CAS wire format, prepared-handle id, holdability bit). The broker's wire-protocol fetch handlers (`fn_fetch`, `fn_get_db_parameter`, …) translate the wire request into `db_query_seek_tuple` / `db_query_get_tuple_value` calls.
A note on the cursor_free_list_id macro
The header exports two intriguing macros:
```c
#define cursor_free_list_id(list_id) \
  do { ... free_and_init the inner pointers ... } while (0)

#define cursor_free_self_list_id(list_id) \
  do { cursor_free_list_id (list_id); free_and_init (list_id); } while (0)
```

These are not the symmetric counterpart of `qfile_free_list_id` — they are the
client-side cleanup for a QFILE_LIST_ID that was deep-copied
by cursor_copy_list_id. The macro frees last_pgptr (allocated
fresh by cursor_copy_list_id), tpl_descr.f_valp (typically
NULL on the client side), sort_list (always NULL on the client
side per cursor_copy_list_id), and type_list.domp (the
malloc’d domain pointer array). Because the cursor’s list_id is
embedded by value (not by pointer), cursor_free calls the
non-self form on &cursor_id_p->list_id.
Source Walkthrough
Symbols grouped by concern. Line numbers are observed values as of
this revision and will decay over time; anchor on the symbol name.
CURSOR_ID lifecycle
Section titled “CURSOR_ID lifecycle”| Symbol | Role |
|---|---|
CURSOR_ID (struct) | The client-side handle (cursor.h) |
CURSOR_POSITION enum | C_BEFORE / C_ON / C_AFTER (cursor.h) |
cursor_open | Constructor — deep-copies list_id, allocates buffer_area, optionally allocates oid_set |
cursor_close | Destructor wrapper — calls cursor_free then zeroes positional fields |
cursor_free | Frees deep-copied list_id inner pointers, buffer_area, tuple_record.tpl, oid_set, mop_set |
cursor_copy_list_id | The deep copy routine (called from cursor_open); allocates fresh domp[] and last_pgptr |
cursor_free_list_id macro | The matching shallow free (called from cursor_free) — releases last_pgptr, tpl_descr.f_valp, sort_list, type_list.domp |
cursor_free_self_list_id macro | The owning version — additionally frees the struct itself |
cursor_allocate_oid_buffer | Sizes and allocates oid_set / mop_set for hidden-OID prefetch |
cursor_set_oid_columns | Registers extra OID-bearing columns; refuses if is_oid_included or is_updatable already set |
cursor_set_copy_tuple_value | Toggles copy-vs-peek for cursor_get_tuple_value |
cursor_set_prefetch_lock_mode | Toggles lock mode for cursor_prefetch_*_oids |
Positional state machine
Section titled “Positional state machine”| Symbol | Role |
|---|---|
cursor_next_tuple | Forward fetch; fast path stays in buffer, slow path triggers cursor_fetch_page_having_tuple |
cursor_prev_tuple | Backward fetch; uses tuple’s prev_tuple_length and page’s prev_pgid |
cursor_first_tuple | Jump to head — fetch list_id.first_vpid, position=FIRST_TPL |
cursor_last_tuple | Jump to tail — fetch list_id.last_vpid, position=LAST_TPL |
cursor_point_current_tuple | Sets current_tuple_no, current_tuple_offset, current_tuple_length from a position+offset; understands FIRST_TPL = -1, LAST_TPL = -2 |
cursor_initialize_current_tuple_value_position | Invalidates the per-tuple decode memo on every position change |
cursor_peek_tuple | Returns current_tuple_p — errors if position != C_ON |
Page transport
Section titled “Page transport”| Symbol | Role |
|---|---|
cursor_fetch_page_having_tuple | The single funnel into the network round-trip; integrates page-fetch + position + overflow-reassemble + OID-prefetch |
cursor_buffer_last_page | Either points at writer’s last_pgptr (SA-mode) or calls cursor_get_list_file_page |
cursor_get_list_file_page | Walks the local network buffer for a hit; on miss calls qfile_get_list_file_page |
qfile_get_list_file_page | Client-side network stub (network_interface_cl.c) — wire request NET_SERVER_LS_GET_LIST_FILE_PAGE |
xqfile_get_list_file_page | Server-side handler (list_file.c) — packs multiple pages until IO_MAX_PAGE_SIZE is full |
cursor_construct_tuple_from_overflow_pages | Reassembles a big tuple from its overflow chain into tuple_record.tpl |
cursor_allocate_tuple_area | Malloc/realloc for the reassembly buffer |
Tuple decoding
Section titled “Tuple decoding”| Symbol | Role |
|---|---|
cursor_get_tuple_value | User-facing decoder; shifts index by 1 if is_oid_included |
cursor_get_tuple_value_list | Convenience wrapper looping over all columns |
cursor_get_tuple_value_from_list | Skip-ahead memoised column walker |
cursor_get_first_tuple_value | Specialised walk to column 0 (used by cursor_get_current_oid) |
cursor_get_tuple_value_to_dbvalue | Dispatches to pr_type->data_readval (or cursor_copy_vobj_to_dbvalue for DB_TYPE_VOBJ) |
cursor_fixup_vobjs | Post-decode hook turning DB_TYPE_OID/DB_TYPE_VOBJ into DB_TYPE_OBJECT (MOP) and recursing into sets |
cursor_fixup_set_vobjs | The set/multiset/sequence variant of the above |
cursor_copy_vobj_to_dbvalue | Decode a DB_TYPE_VOBJ packed value into a vmop |
OID prefetch
Section titled “OID prefetch”| Symbol | Role |
|---|---|
cursor_has_first_hidden_oid | Predicate: is_oid_included && oid_ent_count > 0 && type_list.domp[0] is DB_TYPE_OBJECT |
cursor_prefetch_first_hidden_oid | Page-grain walk gathering first-column OIDs |
cursor_prefetch_column_oids | Page-grain walk gathering OIDs from oid_col_no[] columns |
cursor_get_oid_from_tuple | Reads a single OID/VOBJ value out of a tuple |
cursor_get_oid_from_vobj | The VOBJ→base-instance unwrap |
cursor_fetch_oids | Calls locator_fetch_object (single) or locator_fetch_set (batch) |
cursor_get_current_oid | User-facing read of the hidden first column |
Holdability hand-off (cross-module)
Section titled “Holdability hand-off (cross-module)”| Symbol | File | Role |
|---|---|---|
RESULT_HOLDABLE | src/query/query_list.h | The wire-protocol bit set by the client to request holdable cursor |
db_session_set_holdable | src/compat/db_session.c | Propagates holdable bit from broker into the session |
T_SRV_HANDLE::is_holdable / T_QUERY_RESULT::is_holdable | src/broker/cas_handle.h | Broker-side per-handle holdable flag |
as_info->num_holdable_results | src/broker/cas_execute.c | Broker-side counter |
QMGR_QUERY_ENTRY::is_holdable | src/query/query_manager.h | Server-side per-query holdable flag |
qmgr_clear_trans_wakeup | src/query/query_manager.c | Transaction-end hook — routes holdable entries to session |
xsession_store_query_entry_info | src/session/session_sr.c | Server-entry wrapper around session_store_query_entry_info |
session_store_query_entry_info | src/session/session.c | Moves list-file ownership from query manager to session |
qentry_to_sentry | src/session/session.c | Steals list_id/temp_file pointers (zeroes the source) |
session_preserve_temporary_files | src/session/session.c | Calls file_temp_preserve on every backing temp file |
xsession_load_query_entry_info | src/session/session_sr.c | Server-entry wrapper around session_load_query_entry_info |
session_load_query_entry_info | src/session/session.c | Reverse — finds the holdable entry and copies pointers back |
sentry_to_qentry | src/session/session.c | The reverse copy, sets is_holdable = true on the new query entry |
qmgr_get_query_entry | src/query/query_manager.c | Hot-path lookup with holdable fallback |
session_remove_query_entry_info | src/session/session.c | Removes a holdable entry on cursor close |
session_remove_query_entry_all | src/session/session.c | Bulk remove on connection drop |
Higher-level wrappers (callers of cursor_*)
Section titled “Higher-level wrappers (callers of cursor_*)”| Symbol | File | Role |
|---|---|---|
DB_QUERY_RESULT::res::s::cursor_id | src/compat/db_query.h | The embed point in the result-set wrapper |
db_query_first_tuple / _last_tuple / _next_tuple / _prev_tuple / _seek_tuple | src/compat/db_query.c | Thin delegators to cursor_* |
db_query_get_tplpos / _set_tplpos | src/compat/db_query.c | Save/restore a position into a DB_QUERY_TPLPOS |
db_query_get_tuple_object / _value | src/compat/db_query.c | Wrap cursor_get_current_oid / cursor_get_tuple_value |
pt_new_query_result_descriptor | src/parser/query_result.c | Constructs a DB_QUERY_RESULT from a parsed query — the place cursor_open is called for each compiled SELECT |
parse_evaluate.c cursor calls | src/parser/parse_evaluate.c | Used for inline subquery evaluation in the parser (open, next, close) |
cas_execute.c is_holdable set | src/broker/cas_execute.c | Where RESULT_HOLDABLE enters the protocol |
Position hints (as observed for this revision)
Section titled “Position hints (as observed for this revision)”| Symbol | File | Line |
|---|---|---|
CURSOR_ID (struct) | src/query/cursor.h | 52 |
CURSOR_POSITION | src/query/cursor.h | 44 |
cursor_free_list_id macro | src/query/cursor.h | 86 |
cursor_free_self_list_id macro | src/query/cursor.h | 105 |
cursor_open | src/query/cursor.c | 1194 |
cursor_close | src/query/cursor.c | 1381 |
cursor_free | src/query/cursor.c | 1342 |
cursor_copy_list_id | src/query/cursor.c | 105 |
cursor_allocate_oid_buffer | src/query/cursor.c | 1140 |
cursor_set_oid_columns | src/query/cursor.c | 1322 |
cursor_set_copy_tuple_value | src/query/cursor.c | 1291 |
cursor_set_prefetch_lock_mode | src/query/cursor.c | 1267 |
cursor_next_tuple | src/query/cursor.c | 1482 |
cursor_prev_tuple | src/query/cursor.c | 1568 |
cursor_first_tuple | src/query/cursor.c | 1652 |
cursor_last_tuple | src/query/cursor.c | 1696 |
cursor_get_tuple_value | src/query/cursor.c | 1734 |
cursor_get_tuple_value_list | src/query/cursor.c | 1778 |
cursor_get_tuple_value_from_list | src/query/cursor.c | 424 |
cursor_get_tuple_value_to_dbvalue | src/query/cursor.c | 375 |
cursor_get_first_tuple_value | src/query/cursor.c | 483 |
cursor_fetch_page_having_tuple | src/query/cursor.c | 992 |
cursor_buffer_last_page | src/query/cursor.c | 946 |
cursor_get_list_file_page | src/query/cursor.c | 506 |
cursor_point_current_tuple | src/query/cursor.c | 911 |
cursor_construct_tuple_from_overflow_pages | src/query/cursor.c | 666 |
cursor_allocate_tuple_area | src/query/cursor.c | 639 |
cursor_initialize_current_tuple_value_position | src/query/cursor.c | 85 |
cursor_peek_tuple | src/query/cursor.c | 1420 |
cursor_get_current_oid | src/query/cursor.c | 1449 |
cursor_fixup_vobjs | src/query/cursor.c | 282 |
cursor_fixup_set_vobjs | src/query/cursor.c | 185 |
cursor_copy_vobj_to_dbvalue | src/query/cursor.c | 333 |
cursor_has_first_hidden_oid | src/query/cursor.c | 727 |
cursor_prefetch_first_hidden_oid | src/query/cursor.c | 786 |
cursor_prefetch_column_oids | src/query/cursor.c | 841 |
cursor_fetch_oids | src/query/cursor.c | 740 |
cursor_get_oid_from_tuple | src/query/cursor.c | 622 |
cursor_get_oid_from_vobj | src/query/cursor.c | 591 |
cursor_print_list (debug) | src/query/cursor.c | 1062 |
qfile_get_list_file_page (client) | src/communication/network_interface_cl.c | 6676 |
xqfile_get_list_file_page (server) | src/query/list_file.c | 2312 |
qmgr_clear_trans_wakeup | src/query/query_manager.c | 2271 |
qmgr_get_query_entry (holdable fallback) | src/query/query_manager.c | 566 |
session_store_query_entry_info | src/session/session.c | 2508 |
session_load_query_entry_info | src/session/session.c | 2593 |
session_remove_query_entry_info | src/session/session.c | 2652 |
session_remove_query_entry_all | src/session/session.c | 2622 |
qentry_to_sentry | src/session/session.c | 2406 |
sentry_to_qentry | src/session/session.c | 2484 |
session_preserve_temporary_files | src/session/session.c | 2442 |
RESULT_HOLDABLE | src/query/query_list.h | 584 |
DB_SELECT_RESULT::cursor_id | src/compat/db_query.h | 74 |
DB_CURSOR_SUCCESS / END / ERROR | src/compat/dbtype_def.h | 176 |
Cross-check Notes
- vs. `cubrid-list-file.md`. The list-file document treats `QFILE_LIST_ID` as the producer/consumer boundary inside one query execution: `qfile_open_list` writes, `qfile_open_list_scan` reads. The cursor takes the place of the executor's list-scan when the consumer is the network client rather than another XASL operator. The on-page tuple format is identical — `tuple_length`, `prev_tuple_length`, value flag, value length, packed bytes. The cursor pays the same price for big tuples (overflow chain reassembly) and the same price for multi-page network round-trips (the `IO_MAX_PAGE_SIZE`-sized buffer). What is new in the cursor module is the network-buffer cache (the cursor walks within `buffer_area` until it crosses a page that was not packed into the last response) and the OID-prefetch optimisation, which the in-server `qfile_open_list_scan` does not need because the executor already has the OIDs in MOP form.
- vs. `cubrid-server-session.md`. The session document enumerates `SESSION_QUERY_ENTRY` as one of the three named catalogue lists hung off `SESSION_STATE` (alongside session variables and prepared statements), and notes that `session_store_query_entry_info` is "called by the query manager for each holdable result; it copies the query manager's entry into the session and steals the `list_id` and `temp_vfid` pointers". This document is the consumer side of that contract — the cursor is what the client uses to read what the session is now keeping alive. The session document also flags a `sessions.num_holdable_cursors` global counter; the cursor side bumps `as_info->num_holdable_results` on the broker side, and the two should stay in lockstep modulo broker restart.
- vs. `cubrid-query-executor.md`. The executor document describes `S_LIST_SCAN` as one of the SCAN_ID arms — the server-side, in-process consumer of a `QFILE_LIST_ID`. The cursor is the client-side, cross-network consumer of the same artefact. The `xqfile_get_list_file_page` server entry is what bridges them: it asks the query manager for the query's `QFILE_LIST_ID`, walks its page chain via `qmgr_get_old_page` (the same gatekeeper used by `S_LIST_SCAN`), and copies the page bytes into the network reply. The two consumers do not share state — the cursor's position is purely client-side, the `S_LIST_SCAN`'s position is purely server-side — but they share the underlying tuple format, page format, and storage substrate.
- One-direction VOBJ dependency. The cursor calls `vid_oid_to_object` and `vid_vobj_to_object` from `cursor_fixup_vobjs`. These are part of the virtual-objects / view-instance subsystem (see also `src/object/virtual_object.c`), and require the workspace (`src/object/work_space.c`) and the locator client (`src/object/locator_cl.c`) to be initialised. This is why `cursor.c` is compiled into the client library variants (CS_MODE and SA_MODE). In SERVER_MODE the file is still in the build (the server-side page helper, `xqfile_get_list_file_page`, lives in `list_file.c`, not in `cursor.c`), but the cursor itself is not used on the server.
Open Questions
- Why `static QFILE_LIST_ID empty_list_id` in `cursor_open`? The comment says `TODO: remove static empty_list_id`. The variable is local-static, used only to zero the cursor's `list_id` before `cursor_copy_list_id` overwrites it. The `QFILE_CLEAR_LIST_ID (&empty_list_id)` is run on every call, so two threads racing on `cursor_open` could see torn writes; in practice the variable is recomputed bit-for-bit identical every time, so torn writes are benign, but the construct is a classic source of "data race in static storage" warnings and the TODO acknowledges it.
- Cursor state across a holdable cursor's COMMIT — what position survives? The session captures `list_id`, `temp_vfid`, `num_tmp`, `total_count`, `query_flag`. It does not capture any cursor positional state — `current_vpid`, `current_tuple_no`, `current_tuple_offset`, etc. The position is purely client-side state on the `CURSOR_ID`, so a COMMIT inside an open cursor is silently transparent to the client: the next `cursor_next_tuple` sees the same `current_vpid` and asks for the next page. Whether that survives the broker process layer (the broker's `T_SRV_HANDLE` and its embedded `DB_QUERY_RESULT` are kept across COMMIT, so yes for normal use) is an integration detail rather than a cursor-module one.
- `cursor_set_oid_columns` vs. `is_updatable`. The current code refuses `cursor_set_oid_columns` if `is_updatable` is set — but `is_updatable` only turns on this refusal; nothing else in the cursor module reads it. Updatable cursors are effectively expressed by `is_oid_included`, with `is_updatable` as a guardrail that makes the API hard to misuse. Whether a future updatable cursor with non-first-column OIDs would need to be supported is unspecified.
- Multi-page packing and the `header_vpid` field. The cursor's `header_vpid` records the first VPID that the last network buffer fill received, so that subsequent `cursor_get_list_file_page` calls can walk forward through the buffer comparing each packed page's VPID against the request. The walk is O(packed-pages) and starts over from `header_vpid` on every miss; in the worst case (the cursor zigzags forwards then backwards across pages that are not in the buffer), this means re-fetching the same network page repeatedly. There is no eviction policy beyond "the buffer holds whatever the last fetch returned"; whether a larger, LRU'd page cache on the cursor would help workloads with mixed-direction cursors is an open question.
- Cursor at scale — why no streaming? The cursor pays for full materialisation (the entire result is in the list-file before the first `cursor_next_tuple` returns). Postgres' `PORTAL_ONE_SELECT` streams forward-only cursors directly from the executor without materialising. Whether a CUBRID forward-only `cursor_open` could shortcut around the list-file by binding directly to the executor's iterator tree is an open architectural question; the prerequisite is that the executor stay alive across `cursor_next_tuple` calls, which is incompatible with the current model where `qexec_execute_query` drives to completion.
- Connection-drop vs. broker-restart visibility of holdable cursors. `session_remove_query_entry_all` is called by `net_server_conn_down` to flush all holdable cursors when the TCP socket dies. A broker that recycles its CAS process (e.g., `BROKER_RESTART_TIME` exhausted) drops the connection cleanly, and the holdable cursor is destroyed. A broker that is kill-9'd has its TCP socket closed by the kernel and the same cleanup runs. So holdable cursors do not survive broker recycling; only intra-broker COMMITs benefit. Whether this is a documented contract or a side effect is unclear.
Sources
- `src/query/cursor.c` — the entire cursor module: ~1800 lines covering positional navigation, page fetch, decode, OID prefetch, lifecycle
- `src/query/cursor.h` — `CURSOR_ID` struct, `CURSOR_POSITION` enum, `cursor_*` exported API, `cursor_free_list_id` macro family
- `src/query/list_file.c` — `xqfile_get_list_file_page` server-side handler (the page-packing transport partner)
- `src/query/query_list.h` — `RESULT_HOLDABLE` flag, page header / tuple-value macros consumed by cursor.c
- `src/query/query_manager.c` — `qmgr_clear_trans_wakeup` (commit hand-off), `qmgr_get_query_entry` (post-commit reload), `is_holdable` flag
- `src/query/query_manager.h` — `QMGR_QUERY_ENTRY::is_holdable`
- `src/session/session.c` — `session_store_query_entry_info`, `session_load_query_entry_info`, `qentry_to_sentry`, `sentry_to_qentry`, `session_preserve_temporary_files`, `session_remove_query_entry_*`
- `src/communication/network_interface_cl.c` — client stub `qfile_get_list_file_page` issuing `NET_SERVER_LS_GET_LIST_FILE_PAGE`
- `src/compat/db_query.h` / `db_query.c` — `DB_QUERY_RESULT`, `DB_SELECT_RESULT::cursor_id`, `db_query_seek_tuple` and the `db_query_*` family delegating to `cursor_*`
- `src/parser/query_result.c` — `pt_new_query_result_descriptor`, the place `cursor_open` is invoked for compiled SELECTs
- `src/parser/parse_evaluate.c` — inline subquery evaluation via `cursor_open` / `_next_tuple` / `_close`
- `src/broker/cas_execute.c` — `T_SRV_HANDLE::is_holdable` and the propagation into `db_session_set_holdable` and `RESULT_HOLDABLE`
- Sibling docs: `cubrid-list-file.md`, `cubrid-server-session.md`, `cubrid-query-executor.md`