CUBRID Reading Path — How a Write Commits End-to-End
Contents:
- What this traces
- Step 1 — Client to server (briefly)
- Step 2 — Parse + semantic-check + statement dispatch
- Step 3 — Locator’s force family
- Step 4 — Heap insert
- Step 5 — Index update
- Step 6 — Constraint and FK checks
- Step 7 — Trigger fires (if any)
- Step 8 — Log records via the prior list
- Step 9 — Replication record (if HA)
- Step 10 — Lock acquisition
- Step 11 — COMMIT statement
- Step 12 — Durability
- Step 13 — Eventually: dirty-page flush + DWB
- Step 14 — MVCC vacuum eventually reclaims dead versions
- Diagram — full pipeline
- What we did NOT cover
- Sources
What this traces
A single client connection sends INSERT INTO t VALUES (...) and
then COMMIT. The reading path that follows is the durable path —
how those two statements turn into bytes on three stable-storage
targets (active log volume, heap data volume, overflow file) so the
client’s commit acknowledgment is the engine’s promise that the
row will survive any subsequent crash. We thread through the
detail docs in order: parser, semantic-check, statement dispatch,
locator _force fan-in, heap manager, overflow file, B+Tree, MVCC,
trigger, prior list, log manager, lock manager, HA replication,
transaction state machine, page buffer, DWB, checkpoint, vacuum.
The trip ends
not at “COMMIT returned” but at the eventual page flush and
eventually-eventual vacuum reclamation — calling the write “done”
requires the LOG_COMMIT record durable, the row reachable through
every relevant index under every isolation level, and the dirty
heap+btree pages either flushed or covered by a torn-write defense.
Step 1 — Client to server (briefly)
Identical to the SELECT path: the application calls db_* (CCI,
ODBC, JDBC, or a CAS broker shim), the broker ships the SQL text
plus bind parameters over the network protocol to the server’s
request dispatcher, and a worker thread picks it up. The
client-side workspace (work_space.c, locator_cl.c) holds MOPs
that will later receive permanent OIDs. See cubrid-rpath-select.md
steps 1–3. The only INSERT-specific note is that the workspace will
mark the new MOP dirty with LC_FLUSH_INSERT so the eventual flush
packs it into an LC_COPYAREA — but for the executor-driven path
that this doc follows (INSERT ... VALUES (...) parsed and run
inside one statement), the workspace flush is bypassed and the
executor calls locator_attribute_info_force directly on the
server with locally-built attribute-info bundles.
Step 2 — Parse + semantic-check + statement dispatch
The server’s parser phase is shared with SELECT: the GLR Bison
grammar in csql_grammar.y reduces the INSERT INTO t VALUES (...)
text into a PT_INSERT node — a PT_NODE whose node_type is the
disambiguating tag and whose info.insert arm carries the target
class, the value list, and the optional ON DUPLICATE KEY UPDATE
clause. The lexer is a Flex DFA with start-condition states; the
parser is built with %glr-parser so it can absorb SQL’s historic
ambiguities. Semantic check then runs (cubrid-semantic-check.md):
name resolution binds t to its class OID, the value-list types
are unified against the table’s column types via the type-coercion
rules, and any user-specified DEFAULT values are filled in. See
cubrid-parser.md for the parse-tree shape and
cubrid-semantic-check.md for the resolution and type-checking
passes.
The post-semantic-check statement is then dispatched. The single
entry point is do_statement in src/query/execute_statement.c,
which is also the dispatcher for DDL — do_statement is one big
switch on PT_NODE.node_type that routes to per-kind handlers. For
PT_INSERT the handler is do_insert (and its prepared-statement
sibling do_execute_statement re-enters the same switch on a
re-prepared PT_NODE). do_insert builds an XASL fragment for the
value-list, runs it through the executor, and for each produced row
calls into the locator’s force family. See cubrid-ddl-execution.md
§“Top-level dispatch — do_statement and the DDL switch” for the
switch’s structure; the DML arms sit alongside the DDL arms in the
same dispatch.
Step 3 — Locator’s force family
locator_attribute_info_force (locator_sr.c) is the canonical
fan-in for every server-side row mutation. Its body is a
switch (operation) on LC_COPYAREA_OPERATION. For INSERT the
LC_FLUSH_INSERT arm builds a RECDES from HEAP_CACHE_ATTRINFO
via locator_allocate_copy_area_by_attr_info and dispatches to
locator_insert_force. UPDATE falls through into the same
encoding step after reading the existing record; INSERT has no
existing version and skips the snapshot-aware read.
locator_insert_force drives six responsibilities in order:
(1) partition pruning to pick the actual partition class,
(2) heap insert via heap_insert_logical (decides the OID),
(3) the per-index loop in locator_add_or_remove_index that
touches every B+Tree on the class, (4) FK checks via
locator_check_foreign_key, and the side effects: (5) HA
replication via repl_log_insert and (6) WAL via the heap
and btree primitives’ own log_append_* calls. See
cubrid-locator.md §“The ‘force’ family”.
Step 4 — Heap insert
heap_insert_logical (heap_file.c) is the slotted-page side.
(a) It stamps the record’s MVCC header — mvcc_ins_id is
assigned to the transaction’s MVCCID (lazily allocated via
mvcctable::get_new_mvccid if this is the first write; see
cubrid-mvcc.md §“MVCCID assignment policy”). A brand-new INSERT’s
mvcc_rec_header flag byte carries only VALID_INSID. (b) If
the record is too big for any heap page, the row body goes to the
overflow file via heap_ovf_insert → overflow_insert
(cubrid-overflow-file.md), laid across a chain of
OVERFLOW_FIRST_PART + OVERFLOW_REST_PART pages under a
log_sysop_start / log_sysop_attach_to_outer bracket; the heap
home slot stores a REC_BIGONE forwarding record. Each overflow
page emits an RVOVF_NEWPAGE_INSERT redo record; a
LOG_DUMMY_OVF_RECORD on the head page anchors an LSN for HA
replication and vacuum. (c) Otherwise the record is REC_HOME;
the heap manager finds a target home page via
HEAP_STATS_BESTSPACE_CACHE, falling back to
HEAP_HDR_STATS.estimates.best[], a bounded scan, and finally
heap_alloc_new_page. (d) A slot is allocated; the slot id +
(volid, pageid) is the row’s permanent OID. FILE_HEAP pages are
ANCHORED_DONT_REUSE_SLOTS, so the OID never aliases. (e)
Per-page stats and the bestspace cache update; the page is dirty.
See cubrid-heap-manager.md §“Insert flow”.
Step 5 — Index update
For each B+Tree on the class, the locator’s locator_add_or_remove_index
extracts the key columns from the new record via
heap_attrvalue_get_key, and calls btree_insert with the
(key, OID) pair. The btree side traverses the tree under
latch-coupling discipline: the descent fixes parent with S
latch, fixes child, releases parent, all the way down — except on
the write path where it escalates the leaf to an X latch. If the
leaf is full and a split is needed, btree_insert_helper opens a
log_sysop_start system-op bracket, calls btree_split_node (or
btree_split_root for a height-growing split), promotes a
separator key into the parent, and closes with
log_sysop_end_logical_undo — so abort can re-merge by replaying
logical undo rather than reverse-applying physical page movement.
See cubrid-btree.md §“Splits — split point, key promotion, parent
update”.
The leaf record stores a LEAF_REC prefix (a fixed VPID ovfl + short key_len), then the key bytes, then an OID list.
For non-unique keys the OID list grows inline up to a per-page
threshold; spillover crosses into a per-key overflow chain
allocated from BTID_INT::ovfid — PAGE_BTREE-typed pages headed
by a BTREE_OVERFLOW_HEADER whose slot 0 carries next_vpid and
whose slot 1 carries the OIDs sorted by OID for binary search. New
overflow pages are linked at the head (the new page’s
next_vpid becomes the old first overflow page) so subsequent
inserts don’t pay tail-walk cost. See
cubrid-overflow-file.md §“Walk: B+Tree overflow OID list”.
Unique-key check happens at insert time, under the leaf’s X
latch, dispatched via btree_find_oid_and_its_page with
BTREE_OP_PURPOSE = INSERT. Because the OIDs in a leaf record’s
suffix list are sorted by OID, a unique index has at most one OID
per key — finding any OID for the key already there is the
duplicate. The check is gated by the BTREE_NEED_UNIQUE_CHECK
macro: it runs only on active transactions, never on recovery
redo (recovery never inserts a duplicate because the original
insert was already validated). See cubrid-btree.md §“Unique-key
handling — OID-list and stats”.
Step 6 — Constraint and FK checks
After heap+index, locator_insert_force runs the
constraint-orchestration helpers from the locator. Three are
relevant for INSERT.
locator_add_or_remove_index (already invoked in step 5) is
itself the unique-key check loop — btree_insert returns
ER_BTREE_UNIQUE_FAILED if the key already exists in a unique
B+Tree, and the locator propagates the error.
locator_check_unique_btree_entries is the deeper integrity
check used by CHECKDB and post-restore consistency, not on the
hot insert path. locator_check_foreign_key walks the FK list
on the class representation, extracts the referencing-column key
from the new record, and probes the parent class’s PK B+Tree via
btree_keyoid_checks; on miss the insert is rejected with
ER_FK_INVALID. See cubrid-locator.md §“Constraint orchestration”.
The order is deliberate: heap insert first (to pin the OID), then
indexes (to populate every key including the PK that other FKs
might reference), then FK check (the parent might also be a row
that this transaction inserted earlier in the same batch).
Within locator_check_foreign_key, the parent lookup is a regular
btree fetch with the transaction’s own snapshot, so a parent
inserted earlier in this transaction is visible by virtue of
“my own writes are visible to me” — see cubrid-mvcc.md’s
mvcc_satisfies_snapshot truth table for the
MVCC_IS_REC_INSERTED_BY_ME arm.
Step 7 — Trigger fires (if any)
If the target class has BEFORE INSERT or AFTER INSERT triggers
defined, they fire from the client side — not the server.
Trigger firing happens in obt_apply_assignments
(object_template.c) before the dirty MOP is packed into the
LC_COPYAREA. The dispatch is gated by sm_active_triggers,
which short-circuits the no-trigger case in O(1). When triggers
do exist, tr_prepare_class builds a TR_STATE, the BEFORE pass
runs via tr_before_object → tr_execute_activities (calling
eval_action on each trigger in priority order), the heap
mutation happens, and tr_after_object runs the AFTER pass and
queues DEFERRED triggers onto the per-transaction
tr_Deferred_activities chain. Recursion is bounded in two ways: a
depth counter (tr_Current_depth ≤ 32) catches infinite row-level
recursion, and an OID stack (tr_Stack) silently skips re-entry of
a STATEMENT-level trigger. See cubrid-trigger.md §“Firing path”
and §“Recursion control”.
For the executor-driven insert path that this doc traces, the
trigger fires before the DML reaches locator_attribute_info_force,
so by the time we are at steps 4–6 the trigger has either accepted,
rejected (raising ER_TR_REJECTED and rolling back to the
statement boundary), or invalidated the transaction
(tr_Invalid_transaction = true so the eventual COMMIT becomes
ABORT). AFTER triggers run after the heap mutation, on the next
visit to tr_after_object. The locator’s force family knows
nothing about triggers — cubrid-locator.md’s server side
explicitly does heap, lock, btree, FK, log, and replication, but
not triggers; the #error "Does not belong to server module" guard
in trigger_manager.h makes that boundary explicit.
Step 8 — Log records via the prior list
Every page change in steps 4–5 calls the WAL append API
(log_append_undoredo_data, log_append_redo_data,
log_append_undo_data). MVCC-flavored variants
(LOG_MVCC_UNDOREDO_DATA, etc.) carry the writer’s MVCCID and a
LOG_VACUUM_INFO whose prev_mvcc_op_log_lsa chains MVCC
operations into a list the vacuum subsystem can walk without
re-reading every record.
Each call funnels into prior_lsa_alloc_and_copy_data / _crumbs
(log_append.cpp:273/:410), which mallocs a LOG_PRIOR_NODE
outside any global mutex, optionally zlib-compresses payloads
over log_Zip_min_size_to_compress, and returns the node. Then
prior_lsa_next_record assigns the LSA from
log_Gl.prior_info.prior_lsa under prior_lsa_mutex, links the
node onto the tail, bumps list_size, unlocks. The mutex is held
only across O(1) link manipulation and LSN arithmetic — the
expensive compression and memcpy happen outside, so N producers
build in parallel. See cubrid-prior-list.md §“Producer step 2”.
The drain runs separately. log_Flush_daemon (and any
backpressure self-help) calls
logpb_prior_lsa_append_all_list under LOG_CS_OWN_WRITE_MODE,
which detaches the prior list under the mutex (swap head/tail/size
to NULL/NULL/0), releases, then walks the detached list with
logpb_append_next_record to copy each node’s bytes into the
authoritative LOG_PAGE buffer. The disk write is a separate
stage — see step 12.
For a single INSERT into a class with two indexes, the prior list
receives LOG_MVCC_UNDOREDO (from heap_insert_logical), two
LOG_UNDOREDO_DATA (from btree_insert × 2), plus a
LOG_DUMMY_OVF_RECORD if the row spilled to overflow. Each
carries LOG_RECORD_HEADER { prev_tranlsa, back_lsa, forw_lsa, trid, type } — the triple-LSA layout ARIES needs: prev_tranlsa
chains records of the same transaction so undo walks backward;
back_lsa/forw_lsa chain records in physical log order so redo
scans forward.
Step 9 — Replication record (if HA)
If the server is configured as an HA master, the same
locator_*_force flow that emitted the WAL records also calls
repl_log_insert (replication.c), which appends a
LOG_REPL_RECORD to the per-transaction staging array
tdes->repl_records[]. The staging entry is intentionally minimal:
the repl_data payload is | packed_pkey_size | class_name | pkey_dbvalue | — class name plus primary-key value, no full row
image. The slave will re-fetch the row from the master’s heap when
it applies the event. This keeps the per-transaction staging cost
bounded even for batch inserts touching millions of rows.
The actual LOG_REPLICATION_DATA log record is not appended at
this point — the entries sit on tdes->repl_records[] until commit
time. See cubrid-ha-replication.md §“Master side —
LOG_REPL_RECORD and the staging array”. The CDC channel is
populated separately: every DML emits a LOG_SUPPLEMENTAL_INFO
record (record type 52) inline with the WAL via
log_append_supplemental_*, carrying a richer self-describing
payload (table OID, before/after image, transaction user) so
external pull-style consumers can decode without consulting the
catalog. See cubrid-cdc.md §“LOG_SUPPLEMENTAL_INFO — the modern
event format” and cubrid-log-manager.md §“LOG_SUPPLEMENTAL_INFO
is the channel CDC uses”.
Step 10 — Lock acquisition
Locks flow through the locator path. For INSERT, the row’s OID is
decided during heap_insert_logical when the slot is allocated,
so the X-lock is acquired inside the heap path rather than upstream
— INSERT is one of the few ops that takes its row lock inside the
heap primitive. The lock manager’s public entry is lock_object
(lock_manager.c:5945), delegating to
lock_internal_perform_lock_object for the hash → resource →
compatibility-check sequence.
lock_object finds-or-inserts an LK_RES keyed by
LK_RES_KEY{type=INSTANCE, oid, class_oid}, then either grants
immediately (the resource is fresh), grants by adding to the
holder list (the request is compatible with total_holders_mode |
total_waiters_mode), or splices into the waiter list and
suspends. The compatibility check is one O(1)
matrix lookup against aggregated mode bits. CUBRID’s 12-mode
vocabulary (NA … SCH-M) carries IX for the parent class
(taken upstream in the executor) and X for the row OID. The new
OID has no prior holders, so the LK_RES is fresh and granting is
trivial.
Index-key locks are not taken on the inline OID-list — CUBRID
relies on MVCC + row-OID locking for non-SERIALIZABLE isolation.
Under SERIALIZABLE, key-range locks are taken at scan boundaries.
Under READ COMMITTED the instance lock is short-duration (released
at statement end via lock_unlock_object_by_isolation); under
REPEATABLE READ / SERIALIZABLE it is long-duration. See
cubrid-lock-manager.md §“Lock acquisition flow”.
Step 11 — COMMIT statement
The client sends COMMIT. xtran_server_commit
(transaction_sr.c:71) forwards to log_commit
(log_manager.c:5352), which delegates to log_commit_local:
- Commit-side triggers. tr_check_commit_triggers runs any user-trigger TR_EVENT_COMMIT and drains the per-transaction tr_Deferred_activities queue. If a deferred action raises tr_Invalid_transaction, commit converts to abort (ER_TR_TRANSACTION_INVALIDATED).
- Drain postpones. If LOG_POSTPONE records were buffered, LOG_COMMIT_WITH_POSTPONE is appended, log_do_postpone replays them, and state moves to TRAN_UNACTIVE_COMMITTED_WITH_POSTPONE.
- Atomic repl + commit emission. log_append_repl_info_and_commit_log takes prior_lsa_mutex once and appends every tdes->repl_records[] entry as LOG_REPLICATION_DATA (or _STATEMENT) records atomically with the commit record — no peer transaction’s commit can slip between them. See cubrid-ha-replication.md §“Atomic emission”.
- Append LOG_COMMIT. The commit record’s LSA (commit_lsa) is the transaction’s promise handle.
- Wait for durability. logpb_flush_pages(commit_lsa) parks the committer on gc_cond (under default async_commit=false, group_commit=true); the log-flush daemon ticks every log_get_log_group_commit_interval and broadcasts; the committer wakes when nxio_lsa >= commit_lsa. See cubrid-prior-list.md §“Commit waiters”.
- Transition + release. TDES state → TRAN_UNACTIVE_COMMITTED, logtb_complete_mvcc flips the bit in the active set, locks are released (or retained if retain_lock), and the trantable index is freed via logtb_release_tran_index. See cubrid-transaction.md.
Step 12 — Durability
log_Flush_daemon (log_manager.c::log_flush_execute) puts the
bytes on disk. Each tick (timer or on-demand wakeup):
```c
// log_flush_execute — log_manager.c (condensed)
LOG_CS_ENTER (&thread_ref);
logpb_flush_pages_direct (&thread_ref);
//   → logpb_prior_lsa_append_all_list (drain prior list → LOG_PAGE buffer)
//   → logpb_flush_all_append_pages (write LOG_PAGE → active log + fsync)
LOG_CS_EXIT (&thread_ref);
pthread_cond_broadcast (&log_Gl.group_commit_info.gc_cond);
```

logpb_flush_all_append_pages walks the dirty LOG_PAGE list,
issues fileio_write_pages on the active log volume, and advances
log_append_info::nxio_lsa — the lowest LSA not yet on stable
storage. The two-step flush of partial records (everything except
the page where the most-recent record header lives, then the
header page last) makes the write resilient to a crash mid-flush:
the on-disk log always ends at either an old end-of-log marker or
a new one, never a dangling forward pointer. See
cubrid-log-manager.md §“Flush”.
After the daemon’s broadcast, every committer whose commit_lsa <= nxio_lsa wakes, observes the watermark, and returns the
acknowledgment to the client. This is the moment the durability
promise crystallizes — once log_commit returns
TRAN_UNACTIVE_COMMITTED, the row is reachable through every
index and constraint after any subsequent crash. What is not yet
on disk: the heap page’s modified slot, the btree leaf’s new
entry, the overflow chain. Only the WAL is durable; the data
pages catch up later.
Step 13 — Eventually: dirty-page flush + DWB
The dirty heap and btree pages are flushed lazily by the page
buffer’s three daemons (see cubrid-page-buffer-manager.md):
Page Flush Daemon picks dirty BCBs and writes them at a rate
adapted to the dirty ratio; Page Post-Flush Daemon
post-processes flushed BCBs and hands them to direct-victim
waiters; Page Maintenance Daemon adjusts per-private-LRU
quotas every 100 ms.
Every dirty data-page write goes through the double-write
buffer to defend against torn writes (cubrid-double-write-buffer.md).
Producer-side: dwb_acquire_next_slot CAS-bumps the position
counter to claim a slot in the in-memory DWB block,
dwb_set_data_on_next_slot copies the page bytes in, the page is
inserted into dwb_Global.slots_hashmap (so a concurrent reader
finds it via dwb_read_page instead of re-reading a possibly-torn
home), and when the block fills the dwb-flush-block daemon
writes the block sequentially to the DWB volume, fsyncs, then
writes each slot’s contents to its home volume.
Lockstep with the WAL invariant: before any data page is written
home, pgbuf_flush_check_log_lsa ensures nxio_lsa >= page->lsa.
Step 12’s commit force satisfies this for our INSERT’s pages, so
their flush is unconditional from this point on.
The next checkpoint records that the flush has happened.
logpb_checkpoint’s pgbuf_flush_checkpoint(newchkpt_lsa, ...)
returns tmp_chkpt.redo_lsa = the smallest oldest_unflush_lsa
remaining; this advances the recovery anchor. The checkpoint then
walks the trantable, packs an active-transaction snapshot into
LOG_REC_CHKPT, emits LOG_END_CHKPT, fsyncs, and updates
log_Gl.hdr.chkpt_lsa in the active log header. See
cubrid-checkpoint.md §“Top-level flow”. A crash after the
checkpoint and before subsequent dirty-page flush replays only
redo records below the new redo-LSA.
Step 14 — MVCC vacuum eventually reclaims dead versions
For an INSERT specifically, vacuum has little to do — there is no
prior version to reclaim. But to close the loop on the MVCC
machinery: once the oldest active snapshot moves past our commit’s
MVCCID the row is universally visible. The transaction’s
LOG_MVCC_UNDOREDO record is nonetheless visible to vacuum’s
per-block scanner. vacuum_consume_buffer_log_blocks sweeps
the log forward, chunking it into blocks of 31 log pages
(VACUUM_LOG_BLOCK_PAGES_DEFAULT); our record’s MVCCID feeds the
block’s newest_mvccid. Once the global oldest_visible_mvccid
exceeds newest_mvccid, the block becomes dispatchable.
vacuum_master_task picks it up via its vacuum_job_cursor,
CAS-transitions AVAILABLE → IN_PROGRESS, and hands it to a
vacuum_worker from the pool of up to 50.
The worker walks the block’s MVCC chain backward via
LOG_VACUUM_INFO::prev_mvcc_op_log_lsa, decompresses the undo
image into its per-thread log_zip_p, builds
VACUUM_HEAP_OBJECT { vfid, oid } per candidate, transitions to
EXECUTE, fixes the target page through its private LRU and removes
dead versions. For our INSERT the work is “walk past, alive”;
material work happens for later DELETEs/UPDATEs that touch this
row. vacuum_is_file_dropped short-circuits if the class was
dropped. On success the block transitions to VACUUMED; on
interrupt, INTERRUPTED + AVAILABLE so the master re-dispatches.
See cubrid-vacuum.md §“Worker”.
Diagram — full pipeline
flowchart TB
CLIENT["client: INSERT INTO t VALUES (...);<br/>then COMMIT"]
CLIENT -->|"net protocol<br/>(cubrid-rpath-select.md steps 1-3)"| PARSE
subgraph SERVER["cub_server worker thread"]
direction TB
PARSE["Parse<br/>(cubrid-parser.md)<br/>PT_INSERT node"]
SEM["Semantic check<br/>(cubrid-semantic-check.md)<br/>name resolution + type unify"]
DISP["do_statement → do_insert<br/>(cubrid-ddl-execution.md)"]
LOC["locator_attribute_info_force<br/>switch (LC_FLUSH_INSERT)<br/>(cubrid-locator.md)"]
INS["locator_insert_force"]
HI["heap_insert_logical<br/>(cubrid-heap-manager.md)"]
HSTAMP["MVCC stamp<br/>mvcc_ins_id<br/>(cubrid-mvcc.md)"]
OVF["heap_ovf_insert → overflow_insert<br/>(cubrid-overflow-file.md)<br/>only if record > page"]
BTI["btree_insert × N indexes<br/>(cubrid-btree.md)<br/>latch-coupling, unique check"]
CONS["locator_check_unique_btree_entries<br/>locator_check_foreign_key<br/>(cubrid-locator.md)"]
LK["lock_object<br/>X on row OID<br/>(cubrid-lock-manager.md)"]
REPL["repl_log_insert<br/>tdes->repl_records[]<br/>(cubrid-ha-replication.md)"]
SUP["log_append_supplemental_*<br/>(cubrid-cdc.md)"]
end
CLIENT_TR["BEFORE/AFTER triggers<br/>(client side, cubrid-trigger.md)<br/>obt_apply_assignments"]
CLIENT -. "if class has triggers" .-> CLIENT_TR
CLIENT_TR -.-> DISP
PARSE --> SEM --> DISP --> LOC --> INS
INS --> HI --> HSTAMP
INS --> BTI
HI --> OVF
HI -.OID assigned.-> LK
INS --> CONS
INS --> REPL
INS --> SUP
subgraph WAL["Per page change → WAL"]
direction TB
APP["log_append_undoredo_data<br/>log_append_redo_data<br/>(cubrid-log-manager.md)"]
PRA["prior_lsa_alloc_and_copy_data<br/>malloc node, zlib outside mutex<br/>(cubrid-prior-list.md)"]
PRN["prior_lsa_next_record<br/>assign LSN, link tail<br/>under prior_lsa_mutex"]
PL["prior_list<br/>singly-linked queue"]
APP --> PRA --> PRN --> PL
end
HI --> APP
BTI --> APP
OVF --> APP
CLIENT -->|"second statement: COMMIT"| COMMIT
COMMIT["log_commit_local<br/>(cubrid-transaction.md, cubrid-log-manager.md)"]
COMMIT --> TRDC["tr_check_commit_triggers<br/>drain tr_Deferred_activities<br/>(cubrid-trigger.md)"]
COMMIT --> POST["replay LOG_POSTPONE if any"]
COMMIT --> RPC["log_append_repl_info_and_commit_log<br/>flush tdes->repl_records[]<br/>· append LOG_COMMIT<br/>under one prior_lsa_mutex hold"]
RPC --> PRA
COMMIT --> WAIT["logpb_flush_pages(commit_lsa)<br/>park on gc_cond timed-wait"]
subgraph DAEMON["log_Flush_daemon"]
DR["logpb_prior_lsa_append_all_list<br/>detach prior list under mutex<br/>copy nodes into LOG_PAGE buffer"]
FLU["logpb_flush_all_append_pages<br/>fileio_write_pages → active log volume<br/>fsync; advance nxio_lsa"]
BC["pthread_cond_broadcast(gc_cond)"]
DR --> FLU --> BC
end
PL --> DR
WAIT --> BC
BC -->|"nxio_lsa >= commit_lsa"| ACK["return TRAN_UNACTIVE_COMMITTED<br/>release locks<br/>logtb_release_tran_index"]
ACK --> CLIENT_OK["COMMIT acknowledgment to client"]
subgraph LATER["Eventually — page-buffer flush daemons"]
direction TB
PFD["Page Flush Daemon<br/>(cubrid-page-buffer-manager.md)"]
PPF["Page Post-Flush Daemon"]
PMD["Page Maintenance Daemon<br/>quota adjust every 100ms"]
DWB["dwb_acquire_next_slot<br/>dwb_add_page<br/>(cubrid-double-write-buffer.md)<br/>sequential write + fsync"]
HOME["fileio_write_pages → home volume"]
PFD --> DWB --> HOME
PPF -.-> DWB
end
HI -. "dirty heap pages" .-> PFD
BTI -. "dirty btree pages" .-> PFD
subgraph CHK["Periodic — log-checkpoint daemon"]
CHKD["logpb_checkpoint<br/>(cubrid-checkpoint.md)<br/>LOG_START_CHKPT → pgbuf_flush_checkpoint<br/>→ LOG_END_CHKPT → log header fsync"]
end
HOME -. "advances redo-LSA" .-> CHKD
subgraph VACUUM["Eventually-eventually — vacuum"]
VC["vacuum_consume_buffer_log_blocks<br/>(cubrid-vacuum.md)<br/>chunk WAL into 31-page blocks"]
VM["vacuum_master_task<br/>cursor over vacuum_Data<br/>dispatch IN_PROGRESS"]
VW["vacuum_worker × ≤ 50<br/>walk MVCC chain backward<br/>fix page, remove dead version"]
VC --> VM --> VW
end
PL -. "LOG_MVCC_* records" .-> VC
VW -. "next visit" .-> HOME
Each arrow is annotated with the detail doc that owns its mechanism. Steps that share one thread of execution (parse → semantic check → dispatch → locator → heap → btree → WAL append) collapse into the upper region; the durability transition (commit wait → daemon flush → broadcast) is the boundary the client crosses. The bottom half — page-buffer flush, DWB, checkpoint, vacuum — happens outside the client’s commit acknowledgment.
What we did NOT cover
The path above is the single-row, single-statement, single-server INSERT + COMMIT. Adjacent paths intentionally out of scope:
- Bulk INSERT via loaddb — BU class-lock + bottom-up B+Tree build via xbtree_load_index. See cubrid-loaddb.md, cubrid-btree.md §“Bulk load”.
- DELETE specifics — sets mvcc_del_id; vacuum reclaims later. See cubrid-heap-manager.md §“Delete flow”.
- UPDATE specifics — read old + encode new + diff-driven index update filtered by att_id[]; may relocate or push to overflow. See cubrid-locator.md §“locator_update_force”, cubrid-heap-manager.md §“Update flow”.
- Trigger internals — ECA model, action PT_NODE lazy compile, recursion counter + OID stack, deferred drain. See cubrid-trigger.md.
- Deadlock detection — LK_WFG_EDGE on conflict; lock_detect_local_deadlock aborts the most-recently-blocked transaction. See cubrid-lock-manager.md §“Deadlock detection”.
- Two-phase commit (cross-server XA) — LOG_2PC_* records, LOG_TDES::coord / gtrinfo, a separate state machine on top of TRAN_STATE. See cubrid-2pc.md.
- Replication apply / CDC consumer — slave-side applylogdb / la_apply_log_file and pull-style cdc_make_loginfo. See cubrid-ha-replication.md, cubrid-cdc.md.
- Crash recovery — three-pass ARIES (analysis/redo/undo) anchored on log_Gl.hdr.chkpt_lsa. See cubrid-recovery-manager.md, cubrid-checkpoint.md §“Recovery integration”.
Sources
CUBRID source (/data/hgryoo/references/cubrid/)
- src/parser/csql_grammar.y, parse_tree.h — PT_INSERT.
- src/query/execute_statement.c — do_statement switch, do_insert, do_execute_statement.
- src/transaction/locator_sr.c — locator_attribute_info_force, locator_insert_force, locator_add_or_remove_index, locator_check_foreign_key.
- src/storage/heap_file.c — heap_insert_logical, heap_ovf_insert, heap_set_mvcc_rec_header_on_overflow.
- src/storage/btree.c, btree_load.c — btree_insert, btree_split_node, btree_find_oid_and_its_page, btree_start_overflow_page.
- src/storage/overflow_file.c — overflow_insert, RVOVF_NEWPAGE_INSERT.
- src/transaction/mvcc_table.cpp — mvcctable::get_new_mvccid, complete_mvcc.
- src/object/trigger_manager.c — tr_prepare_class, tr_before_object, tr_after_object, tr_check_commit_triggers.
- src/transaction/replication.c — repl_log_insert, repl_add_update_lsa.
- src/transaction/log_manager.c, log_append.cpp, log_page_buffer.c — log_append_*, prior_lsa_alloc_and_copy_data, prior_lsa_next_record, logpb_prior_lsa_append_all_list, logpb_flush_all_append_pages, log_flush_execute, log_commit, log_append_repl_info_and_commit_log.
- src/transaction/transaction_sr.c, log_tran_table.c — xtran_server_commit, logtb_release_tran_index, logtb_complete_mvcc.
- src/transaction/lock_manager.c — lock_object, lock_internal_perform_lock_object, lock_detect_local_deadlock.
- src/storage/page_buffer.c — pgbuf_flush_check_log_lsa, pgbuf_flush_victim_candidates, the three flush daemons.
- src/storage/double_write_buffer.cpp — dwb_acquire_next_slot, dwb_add_page, dwb_flush_block.
- src/query/vacuum.c — vacuum_consume_buffer_log_blocks, vacuum_master_task, vacuum_process_log_block.
Sibling reading-path doc
cubrid-rpath-select.md — read path; steps 1–3 reused above.
Detail docs threaded through this synthesis
cubrid-parser.md, cubrid-semantic-check.md,
cubrid-ddl-execution.md, cubrid-locator.md,
cubrid-heap-manager.md, cubrid-overflow-file.md,
cubrid-btree.md, cubrid-mvcc.md, cubrid-trigger.md,
cubrid-lock-manager.md, cubrid-prior-list.md,
cubrid-log-manager.md, cubrid-ha-replication.md,
cubrid-cdc.md, cubrid-transaction.md,
cubrid-page-buffer-manager.md, cubrid-double-write-buffer.md,
cubrid-checkpoint.md, cubrid-vacuum.md.