CUBRID List-File — Spillable Tuple-Stream Inter-Operator Pipe and Materialisation Substrate

A query plan is a directed acyclic graph of operators that ingest tuples and emit tuples. Two textbook strategies move tuples between adjacent operators. Volcano (Graefe 1994) couples them by demand-driven iteration: the consumer calls next() on the producer and the producer fabricates exactly one tuple per call, walking down the tree until a leaf yields a row. Nothing is materialised; the call stack is the pipe. The other strategy, explicit materialisation, has the producer write every tuple it generates into a temporary tuple-stream first, and the consumer reads that stream back. Graefe 1993 (Query Evaluation Techniques for Large Databases) catalogues the two designs side-by-side and observes that real engines mix them — most operators iterate, but a small set must materialise.

Four classes of operator have to materialise no matter how aggressive the engine is about pipelining:

  1. Sort. A merge-sort consumes its entire input before it can emit the smallest tuple. The input has to live somewhere — typically on disk if the run does not fit in memory.
  2. Hash build (build side of hash join, hash GROUP BY). The probe side streams, but the build side must be fully consumed and indexed before any probe begins.
  3. Multi-pass scan. Operations that need to walk the same input more than once — analytic functions over a partition, recursive CTEs, certain CONNECT BY traversals, the inner side of a correlated subquery executed once per outer row — must keep the input around. The most efficient way to do this is to capture it in a re-scannable buffer.
  4. Sub-query result memoisation. When an uncorrelated sub-query feeds the outer plan via a CONSTANT reference, the sub-query is run once, its result tuple is stored, and every reference reads from that store. Without materialisation the sub-query would re-execute on every reference.

Hellerstein and Stonebraker, in Anatomy of a Database System, name this object the inter-operator pipe and observe that it dominates the executor’s runtime cost when queries spill. The pipe has three properties that distinguish it from a generic temp file:

  • Sequential write, sequential read. The pipe is never randomly mutated mid-query; the producer appends until it is finished, then becomes immutable for the consumer’s reads. Sort and hash internally re-write, but the pipe they emit is append-only.
  • Self-describing tuples. The pipe must carry enough metadata that a reader can decode each tuple back into typed values (for example, to evaluate predicates against it). Per-tuple length-prefixes plus a stream-level type list are the canonical encoding; Database Internals (Petrov, ch. 1, “File Organization”) describes the same pattern under “slotted page” and “fixed/variable-length tuples”.
  • Spillable. When the result fits in a few KB the pipe should not allocate disk space; when it grows past that, it should transition to disk transparently. Postgres’ Tuplestore and CUBRID’s list-file both pin themselves to a small in-memory page array first and migrate to a temporary file under pressure.

Tuple representation inside the pipe is a separate design dial. Three families exist: packed row format (fixed-or-variable-length values written contiguously, length prefixes for variable parts; what most row stores use for temp streams), columnar (one buffer per column; favoured by analytical engines), and opaque encoded blobs (e.g., serialised C structs; brittle and rare). CUBRID picks packed row format with two-level length prefixes — a per-tuple length and a per-value length — described in detail in the CUBRID sections below.

The final theoretical piece is lifecycle. A pipe exists strictly inside a query’s execution; it is created when the producer opens, written until the producer closes, scanned by the consumer until the consumer closes, then destroyed. Crashes do not need to recover the pipe — there is nothing to roll back, no transaction has committed against its contents — and so the storage manager treats it as a non-recoverable, non-WAL-logged file. CUBRID’s FILE_TEMP and PAGE_QRESULT page type encode exactly this contract.

Engines diverge in how many tuple-stream abstractions they expose, and in where the in-memory-to-disk transition happens. The shape of the abstractions is otherwise uniform.

PostgreSQL has two distinct objects:

  • Tuplestore (src/backend/utils/sort/tuplestore.c) — a generic spillable tuple buffer. Used for Material plan nodes, WindowAgg, Tablefunc, and as the holding tank for RowsFromFunc. Lives entirely in work_mem until it overflows, then writes to a BufFile-backed disk file. Supports forward scan, backward scan, and rewind. The producer calls tuplestore_puttuple / tuplestore_putvalues; the consumer iterates with tuplestore_gettupleslot.
  • Tuplesort (tuplesort.c) — a specialised spillable buffer that sorts on insertion. Used by Sort, Agg (sort strategy), Unique, Limit, and by ANALYZE. Internally falls into one of several states (TSS_INITIAL, TSS_BOUNDED, TSS_BUILDRUNS, TSS_FINALMERGE) depending on whether the input fits in work_mem and whether top-N can be exploited.

The split is intentional: Tuplestore does not pay for sort logic when not needed, and Tuplesort’s sort is not bolted onto a generic buffer. Both share BufFile as the spill substrate.

MySQL — filesort, temp tables, Filesort_buffer

MySQL’s analogue is split across more pieces. Filesort (sql/filesort.cc) handles ORDER BY and is conceptually a Tuplesort-shaped object — sort buffer in memory, spill to a sequence of merge files in tmpdir when the buffer is full. Temporary tables (sql/sql_tmp_table.cc) handle GROUP BY and DISTINCT; they look more like a regular MyISAM/InnoDB table than a stream, and migrate from in-memory MEMORY engine to on-disk MyISAM/InnoDB when the size limit (tmp_table_size) is exceeded. The streaming half of the equation is JOIN_CACHE and the iterator interfaces in modern MySQL 8 (AggregateIterator, WindowIterator).

SQL Server has two spool operators in the plan tree itself: Table Spool materialises an arbitrary tuple stream into TempDB, and Index Spool materialises a stream and adds a B-tree index for re-probe. Spools are first-class plan nodes, so the planner reasons explicitly about which sub-plans to spool versus re-execute. The spool storage lives in TempDB, which has its own buffer pool and recovery rules.

CUBRID collapses everything PostgreSQL splits across Tuplestore and Tuplesort into a single tuple-stream abstraction — the list file. Sort, group-by, hash build, sub-query memo, and final-result delivery all use the same QFILE_LIST_ID underneath. Sort is implemented as a function on top of a list file (qfile_sort_list reads an unsorted list file and produces a sorted list file via external_sort); it is not a different object.

| Theoretical concept | CUBRID name |
| --- | --- |
| Tuple-stream identifier (the “what”) | QFILE_LIST_ID (query_list.h) |
| Underlying storage handle | QMGR_TEMP_FILE (query_manager.h) |
| In-memory spill buffer | QMGR_TEMP_FILE::membuf[] (PAGE_PTR array) |
| On-disk spill substrate | FILE_TEMP via file_create_temp (file_manager.c) |
| Page format on disk | PAGE_QRESULT (set by qmgr_init_external_file_page) |
| Per-page header | QFILE_PAGE_HEADER (32 bytes) |
| Tuple format | length-prefixed packed row, QFILE_TUPLE |
| Per-tuple value format | QFILE_TUPLE_VALUE_HEADER (flag + length) |
| Stream-level type list | QFILE_TUPLE_VALUE_TYPE_LIST (array of TP_DOMAIN *) |
| Producer iterator state | implicit in QFILE_LIST_ID::last_pgptr / last_offset / lasttpl_len |
| Consumer iterator state | QFILE_LIST_SCAN_ID (query_list.h) |
| Scannable wrapper as scan operator | S_LIST_SCAN / LLIST_SCAN_ID (scan_manager.h) |
| Open the pipe (producer side) | qfile_open_list |
| Append a tuple | qfile_add_tuple_to_list (or qfile_generate_tuple_into_list) |
| Close the pipe (producer side) | qfile_close_list |
| Open a scan over the pipe | qfile_open_list_scan |
| Pull next tuple (consumer side) | qfile_scan_list_next |
| Close the scan | qfile_close_scan |
| Destroy the pipe (release temp pages) | qfile_destroy_list |
| Set-operation pipes | qfile_combine_two_list (UNION / INTERSECT / DIFFERENCE) |
| Sort wrapper | qfile_sort_list / qfile_sort_list_with_func |
| Cross-query result memoisation | QFILE_LIST_CACHE_ENTRY + qfile_lookup_list_cache_entry |

The architectural payoff of one shared abstraction is large: the executor (query_executor.c), the scan layer (scan_manager.c), the post-processing layer (query_aggregate.cpp, query_analytic.cpp), and the cross-query cache (the list-file cache) all speak the same vocabulary. Adding a new materialising operator (or a new consumer) is mostly a matter of calling qfile_open_list and qfile_scan_list_next; the spill logic, page layout, and lifecycle are inherited.

This section walks the abstraction at four levels of zoom — the identifier (QFILE_LIST_ID), the page format (QFILE_PAGE_HEADER + tuples), the producer/consumer operations, and the lifecycle binding to the surrounding XASL execution. Each level is exposed by a small set of functions in list_file.c, anchored by symbol names rather than line numbers (the position-hint table at the end of the Source Walkthrough section ties symbols to current line offsets).

QFILE_LIST_ID — the tuple-stream identifier

QFILE_LIST_ID is the handle every component passes around. It is not the tuples and not the bytes — it is a small struct (~120 bytes) describing where the bytes live, how many tuples are there, what their schema is, and which writer-side cursor state is current.

// QFILE_LIST_ID — src/query/query_list.h
struct qfile_list_id
{
  QFILE_TUPLE_VALUE_TYPE_LIST type_list;  /* domain of each column */
  SORT_LIST *sort_list;                   /* sort attributes (if sorted) */
  INT64 tuple_cnt;                        /* total tuples */
  int page_cnt;                           /* total pages */
  VPID first_vpid;                        /* head of page chain */
  VPID last_vpid;                         /* tail of page chain */
  PAGE_PTR last_pgptr;                    /* live last-page latch (writer) */
  int last_offset;                        /* end-of-data on last page */
  int lasttpl_len;                        /* last tuple length (for prev links) */
  QUERY_ID query_id;                      /* owning query */
  VFID temp_vfid;                         /* duplicated from tfile_vfid */
  struct qmgr_temp_file *tfile_vfid;      /* the actual storage handle */
  QFILE_TUPLE_DESCRIPTOR tpl_descr;       /* writer-side fast path */
  bool is_domain_resolved;
  bool is_result_cached;
  QFILE_LIST_ID *dependent_list_id;       /* chained on qfile_connect_list */
};

Three observations:

  1. The struct mixes a static description of the file (type list, sort attributes, query owner) with a live writer cursor (last_pgptr, last_offset, lasttpl_len). When a list-file is being constructed, those cursor fields point at a latched page; once qfile_close_list runs they are nulled out and the page is unlatched. A list-file is therefore in one of three states: open-and-writing (cursor live), closed-and-readable (cursor null, pages on disk), or destroyed (struct cleared).
  2. The actual storage is one indirection away. tfile_vfid points to a QMGR_TEMP_FILE, which holds the in-memory membuf[] and, once the membuf is full, the temp_vfid of a FILE_TEMP file allocated via file_create_temp. The list-file does not see disk directly — it sees qmgr_get_new_page / qmgr_get_old_page, both of which check the membuf first.
  3. dependent_list_id is a uni-directional chain of secondary list-files attached via qfile_connect_list. Used when a producer needs to logically extend a parent stream with a child — for example, a sub-query’s intermediate that the outer execution will outlive. The chain is destroyed transitively (qfile_destroy_list recurses into dependent_list_id), so ownership is unambiguous.

A list-file is a doubly-linked chain of database pages, each of size DB_PAGESIZE (default 16 KB). Each page begins with a 32-byte QFILE_PAGE_HEADER and stores tuples back-to-back from byte 32 to last_offset.

// qfile_page_header — src/query/query_list.h
struct qfile_page_header
{
  int pg_tplcnt;        /* # tuples on this page */
  PAGEID prev_pgid;     /* previous data page */
  PAGEID next_pgid;     /* next data page */
  int lasttpl_off;      /* last-tuple offset (for backward scan) */
  PAGEID ovfl_pgid;     /* overflow page chain head */
  VOLID prev_volid;     /* (volid, pageid) split into separate fields */
  VOLID next_volid;
  VOLID ovfl_volid;
};

A separate per-page overflow chain (rooted at ovfl_pgid / ovfl_volid) holds tuples larger than QFILE_MAX_TUPLE_SIZE_IN_PAGE (= DB_PAGESIZE - 32). Big tuples are split: the first chunk lives in-line on the data page, the remaining chunks span overflow pages and are stitched back together by qfile_get_tuple_from_current_list when the consumer encounters a tuple whose page header has ovfl_pgid != NULL_PAGEID.

flowchart LR
  subgraph MEMBUF["QMGR_TEMP_FILE.membuf[] (in-memory pages)"]
    M0["page 0<br/>QFILE_PAGE_HEADER<br/>tuples..."]
    M1["page 1<br/>tuples..."]
    M2["page 2<br/>tuples..."]
  end
  subgraph TEMP["FILE_TEMP (on-disk spill)"]
    D0["page 3<br/>tuples..."]
    D1["page 4<br/>tuples..."]
    DN["page N<br/>tuples..."]
  end
  LISTID["QFILE_LIST_ID<br/>first_vpid → page 0<br/>last_vpid → page N<br/>last_pgptr → page N latch"]
  M0 -- next_vpid --> M1
  M1 -- next_vpid --> M2
  M2 -- next_vpid --> D0
  D0 -- next_vpid --> D1
  D1 -- next_vpid --> DN
  LISTID -. owns .-> M0
  LISTID -. live cursor on .-> DN

The chain spans both flavours of storage. A page in the membuf has volid == NULL_VOLID and a small pageid index into the array; a page in FILE_TEMP has a real (volid, pageid). The reader does not care: qmgr_get_old_page consults the membuf first (vpid->pageid <= membuf_last) and falls through to pgbuf_fix for real pages. From the list-file’s perspective the chain is uniform.

Tuple format — packed row, two-level length prefix

Each tuple inside a page is a length-prefixed packed row:

[ tuple_length (4) | prev_tuple_length (4) | value_0 | value_1 | ... | value_N-1 ]

prev_tuple_length enables backward scan (qfile_scan_list_prev) without revisiting the page header. Each value is itself prefixed:

[ flag (4) | val_len (4) | <val_len bytes of packed value, MAX_ALIGNMENT-padded> ]

The flag is V_BOUND for a present value or V_UNBOUND for SQL NULL (in which case val_len == 0). The packed value is in disk-image format — what pr_data_writeval of the column’s PR_TYPE produces. Decoding back into a DB_VALUE is the consumer’s job (typically via qdata_set_valptr_list_with_domain after fetch_val_list).

The macros that realise this layout — QFILE_PUT_TUPLE_LENGTH, QFILE_GET_TUPLE_VALUE_HEADER_POSITION, QFILE_GET_TUPLE_VALUE_LENGTH, QFILE_GET_TUPLE_VALUE_FLAG — live in query_list.h and are inlined wherever a tuple is touched. They are not type-aware; they manipulate raw byte offsets.

Why two prefixes (tuple-length and value-length)?

Because the consumer needs to skip past values it does not care about (e.g., to evaluate a predicate on column 5 without decoding columns 0–4), and because the writer needs to know how much space the next tuple needs before allocating a new page. The tuple-level prefix lets qfile_scan_next walk pages without decoding; the value-level prefix lets qfile_locate_tuple_value jump to column N in O(N) macro stepping.

sequenceDiagram
  autonumber
  participant Op as Producer (BUILDLIST_PROC)
  participant LF as list_file
  participant QM as query_manager
  participant FM as file_manager
  Op->>LF: qfile_open_list(type_list, query_id, flag, NULL)
  LF->>QM: qmgr_create_new_temp_file(TEMP_FILE_MEMBUF_NORMAL)
  QM-->>LF: QMGR_TEMP_FILE (membuf empty, no FILE_TEMP yet)
  LF-->>Op: QFILE_LIST_ID*
  loop for each output tuple
    Op->>LF: qfile_generate_tuple_into_list(T_NORMAL)
    LF->>LF: qfile_allocate_new_page_if_need
    alt membuf has free page
      LF->>QM: qmgr_get_new_page (returns membuf[i+1])
    else membuf full → spill
      LF->>QM: qmgr_get_new_page (no membuf left)
      QM->>FM: file_create_temp(1, &temp_vfid) (first time)
      QM->>FM: file_alloc(temp_vfid, init_page, ...)
      FM-->>QM: real disk page
      QM-->>LF: PAGE_PTR
    end
    LF->>LF: qfile_save_normal_tuple → memcpy values<br/>qfile_add_tuple_to_list_id (advance cursor)<br/>qfile_set_dirty_page
  end
  Op->>LF: qfile_close_list (unlatch last_pgptr)

qfile_open_list is a constructor: it allocates the QFILE_LIST_ID, copies the type list (deep-copies the TP_DOMAIN * array), allocates a QMGR_TEMP_FILE for storage, optionally builds a SORT_LIST if QFILE_FLAG_DISTINCT was set, and registers the file in the per-query temp-file accounting.

// qfile_open_list — src/query/list_file.c (condensed)
QFILE_LIST_ID *
qfile_open_list (THREAD_ENTRY * thread_p, QFILE_TUPLE_VALUE_TYPE_LIST * type_list_p, SORT_LIST * sort_list_p,
                 QUERY_ID query_id, int flag, QFILE_LIST_ID * existing_list_id)
{
  QFILE_LIST_ID *list_id_p = existing_list_id ? existing_list_id : (QFILE_LIST_ID *) malloc (sizeof (QFILE_LIST_ID));
  // ... condensed: NULL checks ...
  QFILE_CLEAR_LIST_ID (list_id_p);
  list_id_p->type_list.type_cnt = type_list_p->type_cnt;
  list_id_p->query_id = query_id;
  if (QFILE_IS_FLAG_SET (flag, QFILE_FLAG_RESULT_FILE) && !QFILE_IS_LIST_CACHE_DISABLED)
    list_id_p->tfile_vfid = qmgr_create_result_file (thread_p, query_id);
  else if (QFILE_IS_FLAG_SET (flag, QFILE_FLAG_USE_KEY_BUFFER))
    list_id_p->tfile_vfid = qmgr_create_new_temp_file (thread_p, query_id, TEMP_FILE_MEMBUF_KEY_BUFFER);
  else
    list_id_p->tfile_vfid = qmgr_create_new_temp_file (thread_p, query_id, TEMP_FILE_MEMBUF_NORMAL);
  // ... condensed: allocate type_list.domp, build sort_list if DISTINCT ...
  return list_id_p;
}

Three flavours of underlying temp file are chosen at open time:

  • qmgr_create_result_file — the final result of a query. Used when QFILE_FLAG_RESULT_FILE is set on the top-level XASL’s list-file. Its lifetime survives the executor; the client fetches pages via xqfile_get_list_file_page.
  • TEMP_FILE_MEMBUF_NORMAL — the default. Membuf size = PRM_ID_TEMP_MEM_BUFFER_PAGES (default 4 pages). Most intermediate list-files use this.
  • TEMP_FILE_MEMBUF_KEY_BUFFER — a larger membuf (sized by PRM_ID_INDEX_SCAN_KEY_BUFFER_PAGES) for index-scan key buffers. Same code path otherwise.

After open, qfile_open_list does not allocate a data page yet. The first page is created lazily inside the first qfile_add_tuple_to_list (or qfile_generate_tuple_into_list / qfile_get_first_page) call:

// qfile_allocate_new_page (condensed) — src/query/list_file.c
static PAGE_PTR
qfile_allocate_new_page (THREAD_ENTRY * thread_p, QFILE_LIST_ID * list_id_p, PAGE_PTR page_p, bool is_ovf_page)
{
  VPID new_vpid;
  PAGE_PTR new_page_p = qmgr_get_new_page (thread_p, &new_vpid, list_id_p->tfile_vfid);
  if (new_page_p == NULL)
    return NULL;
  QFILE_PUT_TUPLE_COUNT (new_page_p, 0);
  QFILE_PUT_PREV_VPID (new_page_p, &list_id_p->last_vpid);
  if (is_ovf_page)
    QFILE_PUT_NEXT_VPID_NULL (new_page_p);
  else
    LS_PUT_NEXT_VPID (new_page_p);      /* NULL_PAGEID_IN_PROGRESS */
  QFILE_PUT_LAST_TUPLE_OFFSET (new_page_p, QFILE_PAGE_HEADER_SIZE);
  QFILE_PUT_OVERFLOW_VPID_NULL (new_page_p);
  if (page_p)
    QFILE_PUT_NEXT_VPID (page_p, &new_vpid);    /* link old → new */
  list_id_p->page_cnt++;
  if (page_p)
    qfile_set_dirty_page (thread_p, page_p, FREE, list_id_p->tfile_vfid);
  else
    QFILE_COPY_VPID (&list_id_p->first_vpid, &new_vpid);        /* very first page */
  QFILE_COPY_VPID (&list_id_p->last_vpid, &new_vpid);
  list_id_p->last_pgptr = new_page_p;
  list_id_p->last_offset = QFILE_PAGE_HEADER_SIZE;
  return new_page_p;
}

qmgr_get_new_page is the gatekeeper that decides membuf-vs-disk:

// qmgr_get_new_page (condensed) — src/query/query_manager.c
PAGE_PTR
qmgr_get_new_page (THREAD_ENTRY * thread_p, VPID * vpid_p, QMGR_TEMP_FILE * tfile_vfid_p)
{
  /* first try the in-memory page array */
  if (tfile_vfid_p->membuf != NULL && tfile_vfid_p->membuf_last < tfile_vfid_p->membuf_npages - 1)
    {
      vpid_p->volid = NULL_VOLID;
      vpid_p->pageid = ++(tfile_vfid_p->membuf_last);
      return tfile_vfid_p->membuf[tfile_vfid_p->membuf_last];
    }
  /* membuf exhausted → spill: ensure a FILE_TEMP exists, then alloc from it */
  if (VFID_ISNULL (&tfile_vfid_p->temp_vfid))
    {
      file_create_temp (thread_p, 1, &tfile_vfid_p->temp_vfid);
      tfile_vfid_p->temp_file_type = FILE_TEMP;
      /* ... TDE setup if encrypted ... */
    }
  return qmgr_get_external_file_page (thread_p, vpid_p, tfile_vfid_p);
}

Pages live in two regions stitched into one logical chain. The membuf is a fixed-size array of pre-allocated DB_PAGESIZE buffers carved out of a single large malloc (qmgr_allocate_tempfile_with_buffer); the FILE_TEMP pages are real disk pages tracked by the file manager’s tempcache. The transition happens once per list-file, when the (fixed-size) membuf is full.

The producer has three entry points, distinguished by how it knows the tuple bytes:

  • qfile_add_tuple_to_list — the bytes are already in a QFILE_TUPLE buffer. Used when the producer already serialised the tuple (e.g., a sub-block copies into a parent’s list, or the executor ran out of tpl_descr capacity and fell back to qdata_copy_valptr_list_to_tuple). Handles tuples larger than a page via the overflow-page mechanism (qfile_allocate_new_ovf_page).
  • qfile_generate_tuple_into_list — the bytes are not serialised yet; they live in a QFILE_TUPLE_DESCRIPTOR whose f_valp[] array points at DB_VALUE *s. The descriptor’s tuple_type (T_SINGLE_BOUND_ITEM, T_NORMAL, T_SORTKEY, T_MERGE) selects one of four qfile_save_*_tuple callbacks that walk the descriptor and write the values. This is the fast path used by the executor’s main loop in qexec_end_one_iteration.
  • qfile_fast_intint_tuple_to_list / qfile_fast_intval_tuple_to_list / qfile_fast_val_tuple_to_list — micro-optimisations that bypass the descriptor and write a 1-or-2-value tuple directly. Used by performance-sensitive paths that emit small fixed-shape rows (e.g., the parallel-coordinator emitting (worker_id, count) pairs).
// qfile_save_normal_tuple — src/query/list_file.c (condensed)
static int
qfile_save_normal_tuple (QFILE_TUPLE_DESCRIPTOR * tuple_descr_p, char *tuple_p, char *page_p, int tuple_length)
{
  int total_tuple_value_size = 0, tuple_value_size;
  for (int i = 0; i < tuple_descr_p->f_cnt; i++)
    {
      if (qdata_copy_db_value_to_tuple_value (tuple_descr_p->f_valp[i], tuple_p, &tuple_value_size) != NO_ERROR)
        return ER_FAILED;
      total_tuple_value_size += tuple_value_size;
      tuple_p += tuple_value_size;
    }
  QFILE_PUT_TUPLE_LENGTH (page_p, tuple_length);
  return NO_ERROR;
}

qdata_copy_db_value_to_tuple_value writes [flag, val_len, packed_bytes] for each field, returning the bytes consumed. After all fields are written, the first four bytes of the row are stamped with the total tuple length so consumers can stride past it.

If tuple_length > QFILE_MAX_TUPLE_SIZE_IN_PAGE (= DB_PAGESIZE - 32), the writer takes the overflow path:

// qfile_add_tuple_to_list — overflow tail (condensed)
prev_page_p = cur_page_p;
for (offset = tuple_page_size, tuple_p = tuple + offset; offset < tuple_length;
     offset += tuple_page_size, tuple_p += tuple_page_size)
  {
    new_page_p = qfile_allocate_new_ovf_page (thread_p, list_id_p, cur_page_p, prev_page_p,
                                              tuple_length, offset, &tuple_page_size);
    if (new_page_p == NULL)
      return ER_FAILED;         /* allocation failed — abort the append */
    memcpy ((char *) new_page_p + QFILE_PAGE_HEADER_SIZE, tuple_p, tuple_page_size);
    prev_page_p = new_page_p;
  }

The first chunk fits on the data page like a normal tuple; subsequent chunks live one-per-page in a chain rooted at the data page’s ovfl_pgid. Each overflow page is marked by QFILE_PUT_TUPLE_COUNT (page, QFILE_OVERFLOW_TUPLE_COUNT_FLAG) (= -2), so consumers can recognise it and not mistake it for a regular page.

qfile_close_list is a small unlatch-only operation. The data is already on disk (or in membuf); closing only releases the latch on the live writer page so consumers and other producers can fix it:

// qfile_close_list — src/query/list_file.c
void
qfile_close_list (THREAD_ENTRY * thread_p, QFILE_LIST_ID * list_id_p)
{
  if (list_id_p && list_id_p->last_pgptr != NULL)
    {
      QFILE_PUT_NEXT_VPID_NULL (list_id_p->last_pgptr);         /* terminate chain */
      qmgr_free_old_page_and_init (thread_p, list_id_p->last_pgptr, list_id_p->tfile_vfid);
    }
}

Note the chain-termination write: while the writer was building the file, every newly allocated page had its next_vpid stamped with NULL_PAGEID_IN_PROGRESS (via the LS_PUT_NEXT_VPID macro under SERVER_MODE); only qfile_close_list finalises it to NULL_PAGEID. This signals to consumers (especially in async / streaming mode) that the producer is done and there will be no further pages. qfile_reopen_list_as_append_mode reverses this — it re-fetches the last page with a write latch so subsequent qfile_add_tuple_to_list calls can extend the file.

// QFILE_LIST_SCAN_ID — src/query/query_list.h
struct qfile_list_scan_id
{
  SCAN_STATUS status;           /* S_OPENED / S_STARTED / S_ENDED / S_CLOSED */
  SCAN_POSITION position;       /* S_BEFORE / S_ON / S_AFTER */
  VPID curr_vpid;
  PAGE_PTR curr_pgptr;          /* live page latch */
  QFILE_TUPLE curr_tpl;         /* pointer into curr_pgptr */
  bool keep_page_on_finish;
  bool is_read_only;
  int curr_offset;
  int curr_tplno;
  QFILE_TUPLE_RECORD tplrec;    /* spliced buffer for overflow tuples */
  QFILE_LIST_ID list_id;        /* COPY of the list-file (independent cursor) */
};

Note that list_id inside the scan is a copy of the producer’s QFILE_LIST_ID, made by qfile_copy_list_id inside qfile_open_list_scan. This decouples the consumer’s read cursor from the producer (and from any other consumer) — multiple readers can scan the same list-file concurrently, each holding its own curr_pgptr latch. Lifecycle is handled by ref-counting the underlying QMGR_TEMP_FILE (via qfile_update_qlist_count) so the temp file is not destroyed until the last cursor closes.

// qfile_open_list_scan — src/query/list_file.c
int
qfile_open_list_scan (QFILE_LIST_ID * list_id_p, QFILE_LIST_SCAN_ID * scan_id_p)
{
  scan_id_p->status = S_OPENED;
  scan_id_p->position = S_BEFORE;
  scan_id_p->keep_page_on_finish = 0;
  scan_id_p->is_read_only = false;
  scan_id_p->curr_vpid.pageid = NULL_PAGEID;
  scan_id_p->curr_vpid.volid = NULL_VOLID;
  QFILE_CLEAR_LIST_ID (&scan_id_p->list_id);
  if (qfile_copy_list_id (&scan_id_p->list_id, list_id_p, true, QFILE_SKIP_DEPENDENT) != NO_ERROR)
    return ER_FAILED;
  scan_id_p->tplrec.size = 0;
  scan_id_p->tplrec.tpl = NULL;
  return NO_ERROR;
}

State starts at (S_OPENED, S_BEFORE, no-page-fixed). The scan does not fix any page until the first qfile_scan_list_next is called; until then there is no buffer-pool cost for an open-but-unread cursor.

qfile_scan_list_next is a thin wrapper that delegates to qfile_scan_next (the position machine) and then qfile_retrieve_tuple (the materialiser):

// qfile_scan_next — src/query/list_file.c (condensed)
static SCAN_CODE
qfile_scan_next (THREAD_ENTRY * thread_p, QFILE_LIST_SCAN_ID * scan_id_p)
{
  if (scan_id_p->position == S_BEFORE)
    {
      if (scan_id_p->list_id.tuple_cnt == 0)
        return S_END;
      /* fix first page, position cursor at first tuple */
      scan_id_p->curr_pgptr = qmgr_get_old_page (thread_p, &scan_id_p->list_id.first_vpid,
                                                 scan_id_p->list_id.tfile_vfid);
      QFILE_COPY_VPID (&scan_id_p->curr_vpid, &scan_id_p->list_id.first_vpid);
      scan_id_p->curr_offset = QFILE_PAGE_HEADER_SIZE;
      scan_id_p->curr_tpl = (char *) scan_id_p->curr_pgptr + QFILE_PAGE_HEADER_SIZE;
      scan_id_p->curr_tplno = 0;
      scan_id_p->position = S_ON;
      return S_SUCCESS;
    }
  else if (scan_id_p->position == S_ON)
    {
      /* still tuples on this page? */
      if (scan_id_p->curr_tplno < QFILE_GET_TUPLE_COUNT (scan_id_p->curr_pgptr) - 1)
        {
          scan_id_p->curr_offset += QFILE_GET_TUPLE_LENGTH (scan_id_p->curr_tpl);
          scan_id_p->curr_tpl += QFILE_GET_TUPLE_LENGTH (scan_id_p->curr_tpl);
          scan_id_p->curr_tplno++;
          return S_SUCCESS;
        }
      /* page exhausted — follow next_vpid */
      else if (qfile_has_next_page (scan_id_p->curr_pgptr))
        {
          VPID next_vpid;
          QFILE_GET_NEXT_VPID (&next_vpid, scan_id_p->curr_pgptr);
          PAGE_PTR next_page_p = qmgr_get_old_page (thread_p, &next_vpid, scan_id_p->list_id.tfile_vfid);
          qmgr_free_old_page_and_init (thread_p, scan_id_p->curr_pgptr, scan_id_p->list_id.tfile_vfid);
          QFILE_COPY_VPID (&scan_id_p->curr_vpid, &next_vpid);
          scan_id_p->curr_pgptr = next_page_p;
          scan_id_p->curr_tplno = 0;
          scan_id_p->curr_offset = QFILE_PAGE_HEADER_SIZE;
          scan_id_p->curr_tpl = (char *) scan_id_p->curr_pgptr + QFILE_PAGE_HEADER_SIZE;
          return S_SUCCESS;
        }
      else
        {
          scan_id_p->position = S_AFTER;
          if (!scan_id_p->keep_page_on_finish)
            qmgr_free_old_page_and_init (thread_p, scan_id_p->curr_pgptr, scan_id_p->list_id.tfile_vfid);
          return S_END;
        }
    }
  /* ... S_AFTER → S_END ... */
}

The state machine is exactly what one would expect: S_BEFORE → S_ON (fix first page) → S_ON (advance within the page, or to the next page) → S_AFTER / S_END (release the page). qfile_scan_list_prev is the symmetric operation, using the per-tuple prev_tuple_length prefix to step backward within a page and prev_pgid in the page header to walk the chain backward.

// qfile_retrieve_tuple — src/query/list_file.c (condensed)
static SCAN_CODE
qfile_retrieve_tuple (THREAD_ENTRY * thread_p, QFILE_LIST_SCAN_ID * scan_id_p,
                      QFILE_TUPLE_RECORD * tuple_record_p, int peek)
{
  if (QFILE_GET_OVERFLOW_PAGE_ID (scan_id_p->curr_pgptr) == NULL_PAGEID)
    {
      if (peek)
        tuple_record_p->tpl = scan_id_p->curr_tpl;      /* zero-copy */
      else
        memcpy (tuple_record_p->tpl, scan_id_p->curr_tpl, QFILE_GET_TUPLE_LENGTH (scan_id_p->curr_tpl));
    }
  else
    {
      /* tuple has overflow pages → splice into scan_id_p->tplrec */
      qfile_get_tuple_from_current_list (thread_p, scan_id_p, &scan_id_p->tplrec);
      tuple_record_p->tpl = scan_id_p->tplrec.tpl;
    }
  return S_SUCCESS;
}

peek = true is the zero-copy fast path used by everything that reads the tuple before advancing — predicate evaluation, projection, sort-key extraction, sub-tuple set-membership tests. The pointer is valid until the next qfile_scan_list_next (which may unlatch the page). peek = false copies into a caller-owned buffer and is used when the caller wants to retain the tuple beyond the next advance — e.g., merge join’s “remember the current row of the inner side while we advance the outer”.

For overflow tuples (QFILE_GET_OVERFLOW_PAGE_ID(curr_pgptr) != NULL_PAGEID), the peek path is impossible — the bytes are spread across multiple pages — so qfile_get_tuple_from_current_list always copies them into the per-scan tplrec buffer first.

UNION, INTERSECT, DIFFERENCE between two list-files share one driver: qfile_combine_two_list. The high-level shape is “sort-merge”, with per-tuple action callbacks plumbed in by flag:

flowchart TB
  A["qfile_combine_two_list(L, R, flag)"] --> B{"flag == UNION ALL?"}
  B -- "yes" --> U["qfile_union_list (clone L, append R, no sort)"]
  B -- "no" --> S1["sort L by all columns (Q_DISTINCT or Q_ALL)"]
  S1 --> S2["sort R by all columns"]
  S2 --> O["open dest list"]
  O --> M{"choose action callbacks"}
  M -- "INTERSECT" --> M1["act_both = qfile_add_one_tuple<br/>act_left = act_right = NULL"]
  M -- "DIFFERENCE" --> M2["act_left = qfile_add_tuple<br/>act_both = act_right = NULL"]
  M -- "UNION DISTINCT" --> M3["act_left = act_right = qfile_add_tuple<br/>act_both = qfile_add_one_tuple"]
  M -- "UNION ALL (post-sort)" --> M4["act_both = qfile_add_two_tuple<br/>(both copies)"]
  M1 --> L["merge loop: compare top-of-L vs top-of-R<br/>cmp<0 → act_left, cmp==0 → act_both, cmp>0 → act_right"]
  M2 --> L
  M3 --> L
  M4 --> L
  L --> C["qfile_close_list (dest)"]

The qfile_advance_single vs qfile_advance_group choice is what makes DISTINCT cheap: _group collapses runs of equal keys into one logical position before each comparison, so duplicate rows in the inputs do not generate duplicate output rows even when both sides have them.

qfile_union_list is the all-the-way-fast-path for UNION ALL: it does not sort, it does not iterate tuples — it physically appends the second list’s pages to the first by editing the next_vpid of the first list’s last page to point at the second list’s first page (qfile_append_list).

Sort wrapper — qfile_sort_list / qfile_sort_list_with_func

Section titled “Sort wrapper — qfile_sort_list / qfile_sort_list_with_func”

Sort is not a list-file primitive. It is a function from one list-file to another, implemented as a callback sandwich around external_sort:

// qfile_sort_list_with_func — src/query/list_file.c (condensed)
QFILE_LIST_ID *
qfile_sort_list_with_func (THREAD_ENTRY * thread_p, QFILE_LIST_ID * list_id_p, SORT_LIST * sort_list_p,
			   QUERY_OPTIONS option, int flag, SORT_GET_FUNC * get_func, SORT_PUT_FUNC * put_func,
			   SORT_CMP_FUNC * cmp_func, void *extra_arg, int limit, bool do_close,
			   int parallelism, ORDERBY_STATS * orderby_stats)
{
  qfile_close_list (thread_p, list_id_p);	/* unlatch input writer */
  QFILE_LIST_ID *srlist_id = qfile_open_list (thread_p, &list_id_p->type_list, sort_list_p,
					      list_id_p->query_id, flag, NULL);	/* output */
  QFILE_LIST_SCAN_ID t_scan_id;
  QFILE_SORT_SCAN_ID s_scan_id;
  SORT_INFO info;
  qfile_open_list_scan (list_id_p, &t_scan_id);
  qfile_initialize_sort_info (&info, list_id_p, sort_list_p);
  info.s_id = &s_scan_id;
  info.output_file = srlist_id;
  info.input_file = list_id_p;
  if (get_func == NULL)
    get_func = &qfile_get_next_sort_item;	/* default reader callback */
  if (put_func == NULL)
    put_func = &qfile_put_next_sort_item;	/* default writer callback */
  if (cmp_func == NULL)
    cmp_func = (info.key_info.use_original ? &qfile_compare_partial_sort_record
					   : &qfile_compare_all_sort_record);
  int sort_result = sort_listfile (thread_p, NULL_VOLID, estimated_pages, get_func, &info,
				   put_func, &info, cmp_func, &info.key_info,
				   (option == Q_DISTINCT ? SORT_ELIM_DUP : SORT_DUP),
				   limit, srlist_id->tfile_vfid->tde_encrypted, parallelism);
  /* ... destroy input list_id_p, copy srlist_id over list_id_p ... */
  return list_id_p;
}

Three callbacks plug external_sort into the list-file:

  • get_func — external_sort calls this to read the next input record. Default is qfile_get_next_sort_item, which pulls a tuple from the input scan and copies its sort-key columns into a SORT_REC.
  • put_func — external_sort calls this to emit a sorted record. Default is qfile_put_next_sort_item, which decodes the SORT_REC back into a tuple and appends it to the output via qfile_add_tuple_to_list. Aggregation overrides this with qexec_gby_put_next (sort-based GROUP BY) so each emitted tuple folds into the running aggregate instead of being written.
  • cmp_func — external_sort calls this to compare two SORT_RECs. Default is qfile_compare_all_sort_record (full-key comparison) or qfile_compare_partial_sort_record (when sort-key columns are a prefix of the original tuple).

flowchart LR
  subgraph CONS["CUBRID list-file"]
    LIN["unsorted list-file<br/>(QFILE_LIST_ID)"]
    LSCAN["QFILE_LIST_SCAN_ID<br/>(input cursor)"]
    LOUT["sorted list-file<br/>(QFILE_LIST_ID)"]
  end
  subgraph EXT["external_sort.c"]
    GETFN["get_func →<br/>qfile_get_next_sort_item"]
    PUTFN["put_func →<br/>qfile_put_next_sort_item"]
    CMPFN["cmp_func →<br/>qfile_compare_∗_sort_record"]
    INPHASE["IN-PHASE:<br/>build sorted runs<br/>in sort buffer"]
    EXPHASE["EX-PHASE:<br/>k-way merge runs<br/>through sort buffer"]
  end
  LIN --> LSCAN
  LSCAN --> GETFN
  GETFN --> INPHASE
  INPHASE --> EXPHASE
  EXPHASE --> CMPFN
  EXPHASE --> PUTFN
  PUTFN --> LOUT

external_sort itself spills runs to its own FILE_TEMP files when the sort buffer is full — a separate spill substrate from the list-file’s own membuf-then-FILE_TEMP. The list-file is a transport layer; sort is the algorithm.
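The division of labour behind that callback sandwich can be mimicked with qsort standing in for the sort kernel. This is a sketch, not sort_listfile: the real kernel builds runs, spills them to FILE_TEMP, and k-way merges, and all names below are invented — but the contract (a kernel that knows nothing about its source or sink, wired in through get/put/cmp callbacks) is the same:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy "callback sandwich": the kernel only ever touches records through
   the three callbacks, exactly the division between sort_listfile and
   the list-file's default get/put/cmp functions. */
typedef int  (*get_fn) (void *ctx, int *rec);	/* 1 = got record, 0 = EOF */
typedef void (*put_fn) (void *ctx, int rec);
typedef int  (*cmp_fn) (const void *a, const void *b);

static void
toy_sort_kernel (get_fn get, void *gctx, put_fn put, void *pctx, cmp_fn cmp)
{
  int buf[64], n = 0, rec;
  while (n < 64 && get (gctx, &rec))
    buf[n++] = rec;			/* "in-phase": fill the sort buffer */
  qsort (buf, n, sizeof (int), cmp);	/* stand-in for run build + merge */
  for (int i = 0; i < n; i++)
    put (pctx, buf[i]);			/* "ex-phase": emit in sorted order */
}

/* The "list-file" side: a plain array behind the callbacks. */
struct stream { int *v; int n, pos; };

static int
get_item (void *c, int *r)
{
  struct stream *s = c;
  return s->pos < s->n ? (*r = s->v[s->pos++], 1) : 0;
}

static void
put_item (void *c, int r)
{
  struct stream *s = c;
  s->v[s->n++] = r;
}

static int
cmp_int (const void *a, const void *b)
{
  return *(const int *) a - *(const int *) b;
}
```

Swapping put_item for an aggregating callback is the whole trick behind sort-based GROUP BY: the sorted stream folds into running aggregates instead of landing in an output list.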

Type-list inference and the executor binding

Section titled “Type-list inference and the executor binding”

A list-file carries its column types as an array of TP_DOMAIN * (type_list.domp). The executor builds this from the producing XASL’s outptr_list:

flowchart LR
  X["xasl_node<br/>type=BUILDLIST_PROC"] --> O["outptr_list<br/>(REGU_VARIABLE list)"]
  O --> R["each REGU_VARIABLE has<br/>domain (TP_DOMAIN*)"]
  R --> T["pt_node_to_db_domain or<br/>regu_variable_get_domain"]
  T --> TL["QFILE_TUPLE_VALUE_TYPE_LIST<br/>(domp[], type_cnt)"]
  TL --> Q["qfile_open_list(..., &type_list, ...)"]
  Q --> L["new QFILE_LIST_ID<br/>type_list.domp = deep copy"]

The deep copy in qfile_open_list is intentional — the source domains may be on the parser-tree arena (which is freed before query execution finishes); the list-file outlives that, so it must own its own copies. qfile_clear_list_id free_and_inits type_list.domp when the list is destroyed.

qfile_unify_types reconciles two list-files’ type lists when one is appended to or combined with another; it walks both domp[] arrays and picks the LUB (least upper bound) domain via tp_domain_resolve_value. This is the path that turns SELECT 'abc' UNION SELECT 1 into a stream whose column has VARCHAR domain and whose integer rows are coerced.
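The column-by-column LUB walk looks roughly like this toy version. The three-type rank lattice here is a deliberate invention — the real resolution is tp_domain_resolve_value over the full CUBRID type system — but the shape of the loop is the point:

```c
#include <assert.h>

/* Toy domain unification in the spirit of qfile_unify_types: walk both
   type lists and pick the wider type per column. The lattice
   INT < BIGINT < VARCHAR is an invented simplification. */
enum toy_type { TOY_INT, TOY_BIGINT, TOY_VARCHAR };

static enum toy_type
toy_lub (enum toy_type a, enum toy_type b)
{
  return a > b ? a : b;		/* enum order encodes the coercion lattice */
}

/* Unify two type lists column-by-column into dst; non-zero on arity mismatch. */
static int
toy_unify_types (const enum toy_type *l, const enum toy_type *r,
                 enum toy_type *dst, int lcnt, int rcnt)
{
  if (lcnt != rcnt)
    return -1;			/* column-count mismatch is an error, not a coercion */
  for (int i = 0; i < lcnt; i++)
    dst[i] = toy_lub (l[i], r[i]);
  return 0;
}
```

Run on the SELECT 'abc' UNION SELECT 1 shape — VARCHAR on one side, INT on the other — the unified column lands on VARCHAR, and the integer rows get coerced at read time.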

qfile_modify_type_list is a lighter rebuilder used when the executor knows the type list it wants and just needs to override the one auto-derived from outptr_list (e.g., for hash-join build sides where the build columns are a subset).

A list-file is owned by exactly one XASL node for the duration of that XASL’s execution. The owning node is whichever node constructed the file via qfile_open_list; the result is stored as xasl->list_id. When the executor’s qexec_clear_xasl_head runs (during qexec_clear_xasl cleanup of an XASL tree), it walks every node and calls:

// qexec_clear_xasl_head excerpt — src/query/query_executor.c
if (xasl->list_id && !xasl->list_id->is_result_cached)
  {
    (void) qfile_close_list (thread_p, xasl->list_id);
    qfile_destroy_list (thread_p, xasl->list_id);
  }

Two important implications:

  1. A consumed sub-block’s list-file dies with the consumer’s XASL. A BUILDLIST_PROC sub-XASL’s list_id is materialised once during qexec_open_scan, the parent reads it via S_LIST_SCAN, and the file is destroyed when the parent’s XASL tree is cleared. There is no GC; lifetimes are stack-shaped.
  2. Result-cached list-files survive across queries. When is_result_cached is true, the list-file is owned by the list-file cache (QFILE_LIST_CACHE_ENTRY), not by the producing XASL. The XASL tree is torn down but the list-file stays — until the cache decides to evict it (qfile_clear_list_cache).
sequenceDiagram
  autonumber
  participant E as Query (executor)
  participant X as Parent XASL (BUILDLIST_PROC)
  participant SX as Sub-XASL (BUILDLIST_PROC)
  participant LF as list-file (sub-XASL.list_id)
  E->>X: qexec_execute_mainblock(parent)
  X->>SX: qexec_execute_mainblock(sub) (e.g., aptr_list)
  SX->>LF: qfile_open_list (TEMP_FILE_MEMBUF_NORMAL)
  loop sub-XASL produces tuples
    SX->>LF: qfile_generate_tuple_into_list (T_NORMAL)
  end
  SX->>LF: qfile_close_list
  X->>LF: scan_open_list_scan (S_LIST_SCAN)
  loop parent reads tuples
    X->>LF: qfile_scan_list_next (peek)
  end
  X->>LF: qfile_close_scan
  E->>X: qexec_clear_xasl (tree cleanup)
  X->>LF: qfile_close_list + qfile_destroy_list (sub-XASL.list_id)

The list-file cache is a per-XASL-cache-entry hash table that memoises the result of an entire query, keyed on its parameter values (DB_VALUE_ARRAY). A hit on qfile_lookup_list_cache_entry returns a QFILE_LIST_CACHE_ENTRY whose list_id is a previously built (and not destroyed) list-file:

// qfile_lookup_list_cache_entry — src/query/list_file.c (condensed)
QFILE_LIST_CACHE_ENTRY *
qfile_lookup_list_cache_entry (THREAD_ENTRY * thread_p, XASL_CACHE_ENTRY * xasl,
			       const DB_VALUE_ARRAY * params, bool * result_cached)
{
  *result_cached = false;
  if (QFILE_IS_LIST_CACHE_DISABLED)
    return NULL;
  if (qfile_List_cache.n_hts == 0)
    return NULL;
  csect_enter (thread_p, CSECT_QPROC_LIST_CACHE, INF_WAIT);
  if (xasl->list_ht_no < 0)
    xasl->list_ht_no = qcache_get_new_ht_no (thread_p);
  QFILE_LIST_CACHE_ENTRY *lent = mht_get (qfile_List_cache.list_hts[xasl->list_ht_no], params);
  qfile_List_cache.lookup_counter++;
  if (lent && !lent->deletion_marker)
    {
      *result_cached = true;
      /* register tran_index, bump ref_count + time_last_used */
    }
  csect_exit (thread_p, CSECT_QPROC_LIST_CACHE);
  return *result_cached ? lent : NULL;
}

When a query begins execution and a cache hit is found, the executor skips materialisation — it uses the cached list_id directly as the result, and the XASL tree’s own list-files are never built. When a query finishes execution and conditions allow caching (no triggers fired, no autocommit boundary issues, etc.), qfile_update_list_cache_entry clones the producing list-file into a new cache entry. Eviction is by the standard combination of TTL + LRU (time_last_used, ref_count) plus capacity (PRM_ID_LIST_MAX_QUERY_CACHE_PAGES), implemented in qfile_list_cache_cleanup.
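The eviction policy can be sketched as follows. This is a toy model — the real qfile_list_cache_cleanup works over the cache’s hash-table pool and counts actual list-file pages, and every name below is invented — but it captures the ref_count/time_last_used interplay:

```c
#include <assert.h>

/* Toy LRU + TTL eviction in the spirit of qfile_list_cache_cleanup:
   an entry is evictable only when unreferenced; capacity pressure
   evicts oldest-first, and a TTL pass expires stale unpinned entries. */
struct toy_entry { int pages, ref_count, time_last_used, live; };

static int
toy_cleanup (struct toy_entry *e, int n, int max_pages, int now, int ttl)
{
  int total = 0, evicted = 0;
  for (int i = 0; i < n; i++)
    if (e[i].live)
      total += e[i].pages;
  while (total > max_pages)	/* capacity pass (cf. PRM_ID_LIST_MAX_QUERY_CACHE_PAGES) */
    {
      int victim = -1;
      for (int i = 0; i < n; i++)	/* LRU among unpinned entries */
        if (e[i].live && e[i].ref_count == 0
            && (victim < 0 || e[i].time_last_used < e[victim].time_last_used))
          victim = i;
      if (victim < 0)
        break;			/* everything left is pinned */
      e[victim].live = 0;
      total -= e[victim].pages;
      evicted++;
    }
  for (int i = 0; i < n; i++)	/* TTL pass: expire stale unpinned entries */
    if (e[i].live && e[i].ref_count == 0 && now - e[i].time_last_used > ttl)
      {
        e[i].live = 0;
        evicted++;
      }
  return evicted;
}
```

The ref_count guard is what makes qfile_end_use_of_list_cache_entry matter: an entry becomes eviction-eligible only once its last transaction drops its reference.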

The interaction with the XASL cache is: every list-cache entry is owned by an XASL-cache entry. When the XASL cache evicts a plan, qfile_clear_list_cache walks all of that XASL’s list-cache entries and deletes them. This guarantees a list-cache entry never references a stale plan.

This section names the symbols that any cross-reference into the list-file module should anchor on. Group order matches the producer / consumer / set-ops / sort / cache / lifecycle order used in ## CUBRID's Approach.

  • QFILE_LIST_ID — the tuple-stream identifier.
  • QFILE_LIST_SCAN_ID — consumer cursor.
  • QFILE_PAGE_HEADER (32 bytes) and offset macros: QFILE_TUPLE_COUNT_OFFSET, QFILE_PREV_PAGE_ID_OFFSET, QFILE_NEXT_PAGE_ID_OFFSET, QFILE_LAST_TUPLE_OFFSET, QFILE_OVERFLOW_PAGE_ID_OFFSET, QFILE_PREV_VOL_ID_OFFSET, QFILE_NEXT_VOL_ID_OFFSET, QFILE_OVERFLOW_VOL_ID_OFFSET.
  • Tuple-format macros: QFILE_TUPLE_LENGTH_SIZE (= 8 — both length prefixes), QFILE_TUPLE_VALUE_HEADER_SIZE (= 8), QFILE_MAX_TUPLE_SIZE_IN_PAGE, QFILE_GET_TUPLE_LENGTH, QFILE_GET_PREV_TUPLE_LENGTH, QFILE_GET_TUPLE_VALUE_FLAG, QFILE_GET_TUPLE_VALUE_LENGTH, QFILE_GET_TUPLE_VALUE_HEADER_POSITION.
  • Stream-flag enum: QFILE_FLAG_RESULT_FILE, QFILE_FLAG_UNION, QFILE_FLAG_INTERSECT, QFILE_FLAG_DIFFERENCE, QFILE_FLAG_ALL, QFILE_FLAG_DISTINCT, QFILE_FLAG_USE_KEY_BUFFER, QFILE_NOT_USE_MEMBUF.
  • Sort metadata: SORT_LIST, SORT_TYPE, SORT_ORDER, SORT_NULLS, QFILE_SORTED_LIST_ID, QFILE_SORT_SCAN_ID.
  • Single-tuple type enum: QFILE_TUPLE_TYPE (T_SINGLE_BOUND_ITEM, T_NORMAL, T_SORTKEY, T_MERGE).
  • Special flag for overflow pages: QFILE_OVERFLOW_TUPLE_COUNT_FLAG (= -2).
  • qfile_initialize / qfile_finalize — module-wide setup of qfile_sort_list_Freelist (lock-free freelist of SORT_LIST records) and the qfile_Max_tuple_page_size constant.
  • qfile_open_list — constructor; allocates QFILE_LIST_ID, picks TEMP_FILE_MEMBUF_NORMAL / _KEY_BUFFER / result-file substrate, deep-copies type list, builds optional sort list for DISTINCT.
  • qfile_close_list — terminates next_vpid chain and unfixes the live writer page.
  • qfile_reopen_list_as_append_mode — re-fixes the last page with a write latch; used by UNION ALL fast-path appending.
  • qfile_get_first_page — eagerly allocates the first data page (used when external code needs the first VPID before adding tuples).
  • qfile_destroy_list — unwinds the temp-file storage; recurses into dependent_list_id.
  • qfile_free_list_id / QFILE_FREE_AND_INIT_LIST_ID — frees the QFILE_LIST_ID struct and its owned arrays.
  • qfile_clear_list_id — zeroes the struct and frees type_list.domp / sort_list / tpl_descr.f_valp without touching the underlying temp-file (for transferring ownership).
  • qfile_copy_list_id / qfile_clone_list_id — duplicate the identifier with QFILE_DEPENDENT_MODE controlling whether the chained dependent_list_id is followed.
  • qfile_add_tuple_to_list — append a pre-serialised QFILE_TUPLE; handles oversize tuples via overflow pages.
  • qfile_generate_tuple_into_list — append from a QFILE_TUPLE_DESCRIPTOR; selects qfile_save_normal_tuple / qfile_save_single_bound_item_tuple / qfile_save_sort_key_tuple / qfile_save_merge_tuple.
  • qfile_save_tuple — entry point dispatching to the four qfile_save_* callbacks.
  • qfile_fast_intint_tuple_to_list / qfile_fast_intval_tuple_to_list / qfile_fast_val_tuple_to_list — bypassing the descriptor for known small shapes.
  • qfile_add_item_to_list — single-column convenience wrapper.
  • qfile_add_overflow_tuple_to_list — copies an already-overflowing tuple from one list-file into another, preserving its overflow chain.
  • qfile_add_tuple_get_pos_in_list — append + return the QFILE_TUPLE_POSITION of the just-written tuple (used by hash-join build to record where each key landed).
  • qfile_allocate_new_page — fix a fresh page from qmgr_get_new_page, link prev/next, copy chain pointers into QFILE_LIST_ID.
  • qfile_allocate_new_ovf_page — same but for an overflow page.
  • qfile_allocate_new_page_if_need — branchy gate around the two allocators called from every tuple-write path.
  • qfile_add_tuple_to_list_id — bookkeeping after a tuple is written (advance tuple_cnt, last_offset, lasttpl_len).
  • qfile_initialize_page_header — zero-init a fresh page header.
  • qfile_set_dirty_page / qfile_set_dirty_page_and_skip_logging — mark the page dirty and either send through the page-buffer’s normal flush path (with WAL) or skip logging for FILE_TEMP pages (which are not recovered).
  • qfile_open_list_scan — initialize a QFILE_LIST_SCAN_ID, deep-copy list_id.
  • qfile_scan_list_next — wrapper calling qfile_scan_list (qfile_scan_next, ...).
  • qfile_scan_list_prev — wrapper calling qfile_scan_list (qfile_scan_prev, ...).
  • qfile_close_scan — release the live page latch, free per-scan tplrec, deep-clear the scan-local list_id.
  • qfile_save_current_scan_tuple_position / qfile_jump_scan_tuple_position — bookmark and seek; used by hash-join build to record per-key row positions and probe to seek to them.
  • qfile_start_scan_fix / qfile_end_scan_fix — pin/unpin the current page across a sub-call that should not lose the cursor (e.g., the executor briefly swaps to a sub-XASL).
  • qfile_get_tuple — read a single tuple by QFILE_TUPLE_POSITION (used to materialise an overflow tuple’s full bytes).
  • qfile_overwrite_tuple — in-place rewrite of a tuple’s bytes (UPDATE-style modification on a list-file used as a working buffer; rare).
  • qfile_set_tuple_column_value — narrower in-place column update.
  • qfile_scan_next / qfile_scan_prev — the S_BEFORE / S_ON / S_AFTER position machine.
  • qfile_retrieve_tuple — peek-vs-copy materialise after positioning.
  • qfile_get_tuple_from_current_list — splice an overflow tuple’s bytes from its overflow-page chain into the scan’s tplrec.
  • qfile_combine_two_list — UNION / INTERSECT / DIFFERENCE driver; sorts inputs (unless UNION ALL fast-path), opens dest, runs sort-merge with action callbacks.
  • qfile_union_list (static, called by qfile_combine_two_list for UNION ALL) — physical page-chain splice via qfile_clone_list_id + qfile_reopen_list_as_append_mode + qfile_copy_tuple (for the tail).
  • qfile_append_list — physical page-chain extension (no tuple iteration).
  • qfile_connect_list — link a child dependent_list_id onto a parent so destruction cascades.
  • qfile_truncate_list — drop all tuples but keep the QFILE_LIST_ID open for re-use.
  • qfile_advance_single / qfile_advance_group (static) — advance one or one-group within the merge.
  • qfile_add_one_tuple / qfile_add_two_tuple (static) — action callbacks plumbed into the merge by qfile_combine_two_list.
  • qfile_compare_tuple_helper (static) — full-row comparison for the merge.
  • qfile_unify_types — reconcile column domains across the two sides.
  • qfile_sort_list — narrow wrapper choosing flags from QUERY_OPTIONS (Q_DISTINCT vs Q_ALL).
  • qfile_sort_list_with_func — full-power entry for callers that supply their own get_func / put_func / cmp_func (the sort-based GROUP BY in query_executor.c does this; see cubrid-post-processing.md).
  • qfile_make_sort_key / qfile_generate_sort_tuple — pack/unpack a tuple into/out of SORT_REC (the in-buffer record passed to external_sort).
  • qfile_compare_partial_sort_record / qfile_compare_all_sort_record — default cmp_funcs.
  • qfile_initialize_sort_key_info / qfile_clear_sort_key_info — build/free a SORTKEY_INFO from a SORT_LIST.
  • qfile_initialize_sort_info / qfile_clear_sort_info (static) — build/free the SORT_INFO plumbing struct.
  • qfile_get_next_sort_item / qfile_put_next_sort_item (static) — default callbacks plugged into external_sort.
  • qfile_get_estimated_pages_for_sorting — fed to external_sort so it knows how many runs to expect.
  • qfile_initialize_list_cache / qfile_finalize_list_cache — module setup of qfile_List_cache (the global QFILE_LIST_CACHE instance plus its hash-table pool).
  • qfile_lookup_list_cache_entry — hit-or-miss probe by (xasl_cache_entry, params).
  • qfile_update_list_cache_entry — insert (called when a query that opted into caching completes).
  • qfile_clear_list_cache — invalidate every list-cache entry attached to a given XASL-cache entry.
  • qfile_end_use_of_list_cache_entry — drop a transaction’s reference; eviction-eligible when ref_count → 0.
  • qfile_list_cache_cleanup (static) — LRU + TTL eviction driven by PRM_ID_LIST_MAX_QUERY_CACHE_PAGES.
  • qfile_clear_uncommited_list_cache_entry — purge cache entries from a transaction that aborted (their results are not visible to anyone else).
  • qfile_allocate_list_cache_entry / qfile_free_list_cache_entry — pool-backed allocation (qfile_List_cache_entry_pool).
  • qfile_print_list_cache_entry / qfile_dump_list_cache_internal — diagnostic.
  • qfile_hash_db_value_array / qfile_compare_equal_db_value_array — hash-table key functions.
  • QMGR_TEMP_FILE (query_manager.h) — wraps (membuf[], membuf_last, membuf_npages, temp_vfid, temp_file_type, tde_encrypted).
  • qmgr_create_new_temp_file (query_manager.c) — build a QMGR_TEMP_FILE with a freshly carved membuf.
  • qmgr_create_result_file — same, but flagged so the temp file is preserved (survives the producing query for the client to fetch).
  • qmgr_get_new_page (query_manager.c) — membuf-vs-FILE_TEMP arbitration on append.
  • qmgr_get_old_page / qmgr_get_old_page_read_only — membuf-vs-FILE_TEMP arbitration on read.
  • qmgr_free_old_page_and_init — release a fixed page (no-op for membuf pages, pgbuf_unfix for FILE_TEMP).
  • qmgr_free_list_temp_file — destroy a QMGR_TEMP_FILE (called from qfile_destroy_list).
  • qmgr_get_external_file_page (static) — file_alloc against the temp_vfid, sets PAGE_QRESULT ptype.
  • file_create_temp (file_manager.c) — allocate a FILE_TEMP VFID; this is what backs the spill.
  • file_temp_retire — return a FILE_TEMP to the tempcache.
  • xqfile_get_list_file_page — server-to-client page transfer for query results.
  • qexec_end_one_iteration — calls qfile_generate_tuple_into_list (T_NORMAL) for each row that survives outer-block predicates.
  • qexec_clear_xasl / qexec_clear_xasl_head — calls qfile_close_list + qfile_destroy_list on every XASL’s list_id (unless is_result_cached).
  • qexec_orderby_distinct — calls qfile_sort_list.
  • qexec_groupby / qexec_hash_gby_finalize — calls qfile_sort_list_with_func with qexec_gby_put_next (sort-based) or builds a hash via qexec_hash_gby_agg_tuple (hash-based; the resulting list-file is the spill from the hash’s overflow tuples).
  • qexec_execute_analytic — drives qfile_sort_list over the analytic input plus several auxiliary list-files (group_list_id, value_list_id for window-function state).
  • xqmgr_execute_query / qmgr_process_query — the entry point that owns the top-level QFILE_LIST_ID and registers it with the query manager.
  • scan_open_list_scan — opens a QFILE_LIST_SCAN_ID and wraps it in a LLIST_SCAN_ID plus a SCAN_ID of type S_LIST_SCAN. This is what makes a list-file appear as a scan operator to the rest of the executor.
  • scan_next_scan_local (case S_LIST_SCAN) → scan_next_list_scan — pulls one tuple via qfile_scan_list_next, then evaluates regu_list_pred / regu_list_rest against it.
  • scan_init_scan_id / scan_init_scan_pred — generic scan-id init shared across all scan types.

Position hints (as of 2026-05-01, on the worktree at /data/hgryoo/references/cubrid)

Section titled “Position hints (as of 2026-05-01, on the worktree at /data/hgryoo/references/cubrid)”
| Symbol | File | Line |
| --- | --- | --- |
| qfile_page_header (struct) | src/query/query_list.h | 66 |
| qfile_list_id (struct) | src/query/query_list.h | 425 |
| qfile_list_scan_id (struct) | src/query/query_list.h | 498 |
| qfile_initialize | src/query/list_file.c | 1156 |
| qfile_finalize | src/query/list_file.c | 1178 |
| qfile_open_list | src/query/list_file.c | 1203 |
| qfile_close_list | src/query/list_file.c | 1352 |
| qfile_reopen_list_as_append_mode | src/query/list_file.c | 1373 |
| qfile_allocate_new_page | src/query/list_file.c | 1465 |
| qfile_allocate_new_ovf_page | src/query/list_file.c | 1526 |
| qfile_allocate_new_page_if_need | src/query/list_file.c | 1561 |
| qfile_add_tuple_to_list_id | src/query/list_file.c | 1592 |
| qfile_add_tuple_to_list | src/query/list_file.c | 1610 |
| qfile_save_single_bound_item_tuple | src/query/list_file.c | 1672 |
| qfile_save_normal_tuple | src/query/list_file.c | 1695 |
| qfile_save_sort_key_tuple | src/query/list_file.c | 1715 |
| qfile_save_merge_tuple | src/query/list_file.c | 1751 |
| qfile_save_tuple | src/query/list_file.c | 1804 |
| qfile_generate_tuple_into_list | src/query/list_file.c | 1848 |
| qfile_fast_intint_tuple_to_list | src/query/list_file.c | 1908 |
| qfile_fast_intval_tuple_to_list | src/query/list_file.c | 1968 |
| qfile_fast_val_tuple_to_list | src/query/list_file.c | 2055 |
| qfile_add_overflow_tuple_to_list | src/query/list_file.c | 2138 |
| qfile_get_first_page | src/query/list_file.c | 2218 |
| qfile_destroy_list | src/query/list_file.c | 2269 |
| xqfile_get_list_file_page | src/query/list_file.c | 2312 |
| qfile_add_item_to_list | src/query/list_file.c | 2442 |
| qfile_combine_two_list | src/query/list_file.c | 2688 |
| qfile_append_list | src/query/list_file.c | 2953 |
| qfile_connect_list | src/query/list_file.c | 3130 |
| qfile_truncate_list | src/query/list_file.c | 3210 |
| qfile_union_list (static) | src/query/list_file.c | 3359 |
| qfile_make_sort_key | src/query/list_file.c | 3507 |
| qfile_generate_sort_tuple | src/query/list_file.c | 3643 |
| qfile_get_next_sort_item | src/query/list_file.c | 3729 |
| qfile_put_next_sort_item | src/query/list_file.c | 3765 |
| qfile_compare_partial_sort_record | src/query/list_file.c | 3925 |
| qfile_compare_all_sort_record | src/query/list_file.c | 4003 |
| qfile_initialize_sort_key_info | src/query/list_file.c | 4145 |
| qfile_clear_sort_key_info | src/query/list_file.c | 4243 |
| qfile_initialize_sort_info | src/query/list_file.c | 4266 |
| qfile_clear_sort_info (static) | src/query/list_file.c | 4290 |
| qfile_sort_list_with_func | src/query/list_file.c | 4331 |
| qfile_sort_list | src/query/list_file.c | 4482 |
| qfile_copy_list_pages (static) | src/query/list_file.c | 4509 |
| qfile_duplicate_list | src/query/list_file.c | 4626 |
| qfile_get_tuple | src/query/list_file.c | 4663 |
| qfile_get_tuple_from_current_list (static) | src/query/list_file.c | 4738 |
| qfile_scan_next (static) | src/query/list_file.c | 4762 |
| qfile_scan_prev (static) | src/query/list_file.c | 4865 |
| qfile_save_current_scan_tuple_position | src/query/list_file.c | 4959 |
| qfile_retrieve_tuple (static) | src/query/list_file.c | 4971 |
| qfile_jump_scan_tuple_position | src/query/list_file.c | 5033 |
| qfile_start_scan_fix | src/query/list_file.c | 5124 |
| qfile_open_list_scan | src/query/list_file.c | 5162 |
| qfile_scan_list (static) | src/query/list_file.c | 5206 |
| qfile_scan_list_next | src/query/list_file.c | 5229 |
| qfile_scan_list_prev | src/query/list_file.c | 5243 |
| qfile_end_scan_fix | src/query/list_file.c | 5257 |
| qfile_close_scan | src/query/list_file.c | 5279 |
| qfile_initialize_list_cache | src/query/list_file.c | 5400 |
| qfile_finalize_list_cache | src/query/list_file.c | 5521 |
| qfile_clear_list_cache | src/query/list_file.c | 5570 |
| qfile_lookup_list_cache_entry | src/query/list_file.c | 6035 |
| qfile_update_list_cache_entry | src/query/list_file.c | 6196 |
| qfile_end_use_of_list_cache_entry | src/query/list_file.c | 6419 |
| qfile_add_tuple_get_pos_in_list | src/query/list_file.c | 6559 |
| qfile_has_next_page | src/query/list_file.c | 6636 |
| qfile_update_domains_on_type_list | src/query/list_file.c | 6648 |
| qfile_set_tuple_column_value | src/query/list_file.c | 6740 |
| qfile_overwrite_tuple | src/query/list_file.c | 6868 |
| qfile_update_qlist_count | src/query/list_file.c | 7041 |
| qfile_collect_list_sector_info | src/query/list_file.c | 7085 |
| qfile_free_list_sector_info | src/query/list_file.c | 7181 |
| qmgr_create_new_temp_file | src/query/query_manager.c | 2954 |
| qmgr_get_new_page | src/query/query_manager.c | 2822 |
| qmgr_get_external_file_page (static) | src/query/query_manager.c | 2912 |
| qmgr_init_external_file_page (static) | src/query/query_manager.c | 2890 |
| scan_open_list_scan | src/query/scan_manager.c | 3715 |
  • Versus cubrid-query-executor.md — that doc names list-file as the materialised output of every BUILDLIST_PROC and asserts qexec_end_one_iteration writes via qfile_generate_tuple_into_list (T_NORMAL). Verified: query_executor.c near line 1262 in this worktree. The fast-path branch through qexec_hash_gby_agg_tuple (when g_hash_eligible is set) calls into qfile_generate_tuple_into_list only on the spill path; the in-memory hash table holds tuples in an AGGREGATE_HASH_CONTEXT, not in a list-file. The spill list-file is agg_hash_context->part_list_id — a list-file used as a hash-spill substrate, not as a streaming pipe.
  • Versus cubrid-post-processing.md — that doc names qfile_sort_list as the entry to sort-based GROUP BY and ORDER BY, and qfile_combine_two_list as the UNION/INTERSECT/DIFFERENCE driver. Both verified. The post-processing doc also names external_sort.c::sort_listfile as the actual sort kernel — list-file is the input/output substrate, sort is external_sort’s job. The boundary is the sort_listfile callback contract (get_func / put_func / cmp_func); see qfile_sort_list_with_func in ## CUBRID's Approach.
  • Versus cubrid-page-buffer-manager.md — list-file pages are normal database pages once they spill to FILE_TEMP. They go through pgbuf_fix / pgbuf_unfix like any other page, with PAGE_QRESULT ptype. This means a list-file’s spill pages compete for slots in the page buffer pool with heap and B-tree pages, and they participate in the same LRU. Membuf pages, by contrast, are not in the page buffer pool at all — they are raw malloc’d memory in qmgr_allocate_tempfile_with_buffer. The two-tier design means that a small list-file is essentially free (no pgbuf slot occupied), and a large one becomes a page-buffer citizen only after it spills.
  • MVCC interaction. List-files are query-private; no other transaction ever sees them. The visibility check for tuples flowing into a list-file happens at the producing scan (heap_get_visible_version and friends in scan_next_heap_scan). Once written into the list-file, tuples are committed-from-the-perspective-of-this-query — there is no per-list-file MVCC. This means a query whose outer scan ran early in a transaction and whose materialised list-file is read late in the same transaction will see the same tuples both times even if the underlying heap has been updated in between. That is the desired snapshot-stable behaviour for a single statement; for cursors (HOLDABLE cursors) the result list-file is the only thing that survives a commit, so it is implicitly snapshot-stable across the commit boundary.
  • WAL. List-file pages on FILE_TEMP use qfile_set_dirty_page (or, for index-key buffer mode, qfile_set_dirty_page_and_skip_logging). Because FILE_TEMP files are not recovered, log records for these pages are unnecessary and CUBRID skips logging them where it can. After a crash, the tempcache is wiped, and any in-flight queries are simply restarted from the client. This is the same trade-off PostgreSQL makes for BufFile and SQL Server makes for TempDB minimal-logging — temp tuple-streams pay no WAL cost, but they cost a re-execute on crash.
  • Naming hazard. The list-file cache (QFILE_LIST_CACHE_ENTRY) is not the XASL cache (XASL_CACHE_ENTRY); the two are linked but distinct. The XASL cache memoises plans; the list-file cache memoises results of those plans. Result-cache hits skip execution; plan-cache hits skip parse-and-optimise. A result-cache entry holds a back-pointer to the XASL-cache entry (xcache_entry) so XASL eviction cascades into list-cache eviction (qfile_clear_list_cache).
  • Parallel writers. qfile_open_list produces a single live writer cursor (last_pgptr). The parallel-heap-scan path (scan_open_parallel_heap_scan) appears to fan tuples out to multiple producer threads — verify whether each parallel worker writes into its own private list-file and whether a coordinator merges them, or whether the writer cursor is mutex-protected. Searching query_executor.c for parallelism reveals an ORDERBY_STATS parallel argument threaded through qfile_sort_list_with_func, suggesting parallel sort is supported, but the parallel-write path inside list-file itself is not obvious.
  • Compressed tuple format. Every tuple is stored uncompressed; for wide list-files (e.g., a sub-query selecting many columns) this is wasteful. Whether CUBRID has plans for or experiments with row-compressed list-files is unclear; no compress symbols appear in list_file.c.
  • Columnar variant. The packed-row format is fine for OLTP-shaped queries but pathological for wide-aggregation analytics. CUBRID has no columnar list-file. PostgreSQL has experimental “columnar” extensions but not in core; MySQL/MariaDB ColumnStore is a separate engine. A columnar list-file (QFILE_LIST_ID with one membuf chain per column) would be an interesting refactor with bounded blast radius, but no evidence of work in this direction.
  • Membuf size policy. PRM_ID_TEMP_MEM_BUFFER_PAGES defaults to a few pages; in practice many sub-query results are small and never spill. Whether the default is correct for current workloads — and whether per-query adaptive sizing would pay off — would need profiling. The split between TEMP_FILE_MEMBUF_NORMAL and TEMP_FILE_MEMBUF_KEY_BUFFER shows the engine already differentiates by use-case, so adding a TEMP_FILE_MEMBUF_LARGE for known-big producers (CONNECT BY, large hash builds) is a small change away.
  • is_result_cached ownership boundary. When xasl->list_id->is_result_cached is true, the executor skips destroying the list-file and the cache takes ownership. The control flow that flips this bit is spread across qexec_execute_query, qmgr_process_query, and xcache_can_entry_cache_list. A single ownership-transfer function would help, but the current scattering is consistent with CUBRID’s “C error model” — every owner returns success/failure and the caller decides. Worth a refactor only if a real bug is found.
  • src/query/list_file.c (~7200 lines) — the implementation, including page allocation (qfile_allocate_new_page), tuple writing (qfile_save_normal_tuple and friends), scan state machine (qfile_scan_next), set operations (qfile_combine_two_list), sort wrapper (qfile_sort_list_with_func), and the result-cache (qfile_lookup_list_cache_entry etc.).
  • src/query/list_file.h — public API.
  • src/query/query_list.hQFILE_LIST_ID, QFILE_LIST_SCAN_ID, QFILE_PAGE_HEADER, tuple-format macros, sort-list types.
  • src/query/query_manager.hQMGR_TEMP_FILE (the storage handle).
  • src/query/query_manager.cqmgr_create_new_temp_file, qmgr_get_new_page / qmgr_get_old_page, the membuf-vs-FILE_TEMP arbitration.
  • src/query/scan_manager.cscan_open_list_scan (the wrapping that turns a list-file into an S_LIST_SCAN scan operator).
  • src/query/query_executor.c — call sites: qexec_end_one_iteration (writer), qexec_clear_xasl (lifecycle), qexec_groupby / qexec_orderby_distinct / qexec_execute_analytic (sort callers), qexec_clear_list_cache_by_class (cache invalidation).
  • src/storage/file_manager.cfile_create_temp, file_alloc, file_temp_retire; the FILE_TEMP substrate underneath the spill.

Textbook references:

  • Goetz Graefe, Volcano — An Extensible and Parallel Query Evaluation System (TKDE 1994). The iterator model and the open / next / close contract that every list-file consumer follows.
  • Goetz Graefe, Query Evaluation Techniques for Large Databases (Computing Surveys 1993). Tuple buffers, materialisation, sort-merge for set operations.
  • Alex Petrov, Database Internals (O’Reilly 2019). Chapter 1 (file organisation, slotted page, tuple length prefixes) and Chapter 14 (sort, merge, group-by). Captured locally at knowledge/research/dbms-general/database-internals.md.
  • Joseph M. Hellerstein, Michael Stonebraker, James Hamilton, Architecture of a Database System (Foundations and Trends in Databases 2007). The “inter-operator pipe” terminology and the operator-tree implementation model.
  • Abraham Silberschatz, Henry Korth, S. Sudarshan, Database System Concepts (6th ed.) — Chapter 12 (“Query Processing”) and Chapter 13 (“Query Optimization”) describe the materialisation-vs-pipelining trade-off textbook-style. Captured locally at knowledge/research/dbms-general/database-system-concepts.md.