CUBRID Locator — OID Workspace, Bulk Fetch/Flush, and the Server-Side Insert/Update/Delete Bridge
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
A relational engine that touches the disk has to bridge two very
different vocabularies. The lower layers — heap manager, B-tree, page
buffer, log — speak in physical addresses: a record is a slot on a
page in a file on a volume, identified by an OID (volid, pageid, slotid) (CUBRID), a TID (blocknumber, offsetnumber) (PostgreSQL),
or a ROWID (Oracle). The upper layers — the executor, the schema
operations, the catalog, the network protocol that talks to clients —
speak in logical objects: rows with column values, classes with
attributes, instances with identities. Something has to translate, and
that something is what CUBRID calls the locator.
The textbook problem the locator solves is, in Database Internals (Petrov, Ch. 3 “File Formats” and Ch. 4 “Implementing B-Trees”), called object identity: a way to name a row that survives compaction of its containing page, that survives moves between pages of the same file (forwarding pointers), and that the index layer can embed in its leaves so a B-tree lookup terminates at exactly one heap slot. The OID is the artifact; the locator is the layer that creates OIDs (when a new record is inserted), resolves OIDs (when a fetch needs to read the row body), and mutates the row at a known OID (when an UPDATE or DELETE is applied). Stonebraker’s POSTGRES (1986) and the EXODUS storage manager (Carey & DeWitt, 1986) introduced the canonical shape of this layer in object-oriented databases — they called it the object manager — and although CUBRID is a relational engine today, its lineage as a hybrid object/relational system shows in the name.
Two textbook ingredients complete the picture:
- The workspace pattern. Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom), §10.6 “Object-Oriented Database Systems”, describes the workspace as the in-memory cache of objects an application has touched. Reads pull objects into the workspace; writes mark them dirty; commit flushes the dirty set back to disk in a single batch. The workspace pattern was the central design choice of object stores like ObjectStore and GemStone, and it is the reason CUBRID’s client still carries a Memory Object Pointer (MOP) ↔ OID map even though most modern relational clients do not.
- Bulk-fetch / bulk-flush vs per-row APIs. A relational client that touches N rows and round-trips to the server N times pays N × RTT. A workspace-based client pays roughly one RTT per transaction: at commit, the entire dirty set is packed into a single buffer and sent to the server. The buffer (CUBRID’s LC_COPYAREA) carries a header describing N objects plus the N row bodies concatenated, and the server unpacks and dispatches per-OID inside one transaction-scoped top-op. The same shape appears in distributed transaction monitors (X/Open XA), in bulk-load APIs (PostgreSQL COPY, MySQL LOAD DATA), and in ORMs that “session.flush()” before commit.
This document tracks how CUBRID realizes both pieces — the object
workspace (locator_cl.c) and the server-side fan-in
(locator_sr.c) — and how the network shape (LC_COPYAREA packed
in locator.c) ties them together.
Common DBMS Design
The textbook gives the model; this section names the engineering
conventions that almost every row-oriented engine adopts in some
form. CUBRID’s specific choices in “CUBRID’s Approach” below are best
read as one set of dials within this shared design space.
Three layers that meet at the locator
Every DBMS has three layers that need to agree on what “a row” is:
- The object/row layer (executor, parser, type-checker, catalog) speaks in fully decoded values — DB_VALUEs, PT_NODEs, RECDESs with column offsets parsed.
- The storage layer (heap, btree, page buffer) speaks in raw byte arrays plus OIDs — it knows where the bytes live but not what they mean.
- The cross-cutting services (lock manager, MVCC, log, vacuum, HA replication, foreign-key checker) speak in OIDs plus class metadata.
A canonical insert has to: (1) ask the heap to find a page, (2) allocate an OID, (3) lock the OID exclusively, (4) write the row bytes, (5) update every affected B-tree, (6) check unique, (7) check foreign keys, (8) write a log record per page touched, (9) generate a replication record if HA is on, (10) bump catalog statistics. None of those layers should know how to do all the others’ jobs. They need a conductor. CUBRID’s locator is that conductor.
Postgres: no separate workspace
In PostgreSQL, the same role is split across heap_insert,
heap_update, heap_delete, plus ExecInsertIndexTuples and the
ON-CONFLICT machinery. The executor calls these directly; there is
no client-side workspace in the modern (post-Berkeley) Postgres —
client tuples are sent on the wire as ASCII or binary parameters of
a Bind/Execute message and the server materializes them into a
HeapTuple inside the buffer pool. PostgreSQL’s executor therefore
is the locator; the dispatch is implicit.
MySQL: the handler interface
InnoDB and the other storage engines hide behind ha_innobase (and
the abstract handler class). Rows enter and leave through
ha_write_row, ha_update_row, ha_delete_row. There is a
ha_bulk_update_row for batched updates, but the contract is
per-table-handle, not per-transaction. A workspace exists in the
form of the Field* array decoded from the row buffer, but it is
per-statement, not per-transaction.
Oracle: row sources + dirty-buffer write-behind
Oracle’s executor produces row sources that pipe into a DML
operator, which calls kdusru / kdusrf for the actual mutation.
The buffer cache is dirtied per row; a background DBWn writes
behind. Fetch is per-row through the row source; the bulk path is
PL/SQL’s FORALL … BULK COLLECT, which is a syntactic form, not a
storage layer.
CUBRID: explicit client workspace + server-side fan-in
CUBRID inherits its workspace from its roots as an object-relational system (UniSQL, the predecessor, was an OODBMS). Even after the system became fully relational, the client kept the workspace because:
- The same protocol serves the C API (db_* functions in src/compat/) where the application’s objects live in process memory — a MOP is a pointer the application holds long-term.
- The catalog itself is read as objects via the locator (the root class and per-class system records are MOPs), so the workspace mechanism doubles as the catalog cache.
- Stand-alone mode (SA_MODE) compiles the client and the server into the same process; the workspace is the boundary between them.
On the server, the dual is locator_sr.c’s force family —
locator_attribute_info_force is the canonical entry that the
executor (qexec_execute_*), the trigger machinery, the schema
manipulator, the type-checker, and the partition-pruner all
eventually call. It dispatches to locator_insert_force /
locator_update_force / locator_delete_force, and those drive
heap, btree, FK, unique-check, replication, and log in the right
order.
Theory ↔ CUBRID mapping
| Theory | CUBRID name |
|---|---|
| Per-record identifier | OID = (volid, pageid, slotid) |
| Memory pointer to an object | MOP (Memory Object Pointer) — opaque handle |
| MOP → OID mapping | workspace hash (ws_* API; ws_oid (mop)) |
| Transient (un-persisted) identifier | “temp OID” — OID_ISTEMP (oid) |
| Workspace dirty list | ws_* dirty list scanned by locator_mflush |
| Bulk fetch buffer | LC_COPYAREA (locator.h) |
| Bulk fetch request | LC_LOCKSET / LC_LOCKHINT |
| Per-object descriptor in the buffer | LC_COPYAREA_ONEOBJ |
| Workspace-mflush staging | LOCATOR_MFLUSH_CACHE (locator_cl.c) |
| Client fetch entry | locator_fetch_object / _class / _instance / _set |
| Client flush entry | locator_flush_class / _instance / _all_instances / locator_force |
| Server fetch entry | xlocator_fetch / _lockset / _lockhint_classes |
| Server force entry (transport) | xlocator_force |
| Server canonical DML entry | locator_attribute_info_force |
| Server per-op force | locator_insert_force / locator_update_force / locator_delete_force |
| Server constraint orchestration | locator_add_or_remove_index / locator_update_index / locator_check_foreign_key |
| Server snapshot-aware read | locator_get_object / locator_lock_and_get_object |
| Catalog (class) lookup | xlocator_find_class_oid |
| Pre-mint of OIDs | xlocator_assign_oid / xlocator_assign_oid_batch |
CUBRID’s Approach
CUBRID’s locator has three faces. On the client side
(locator_cl.c), it is a workspace + bulk-flush coordinator: it
maintains the MOP-to-OID map, watches a workspace dirty list, packs
dirty objects into an LC_COPYAREA, and ships the buffer to the
server. On the server side (locator_sr.c), it is the DML
fan-in: every insert/update/delete in the system, regardless of
who originated it, comes through locator_attribute_info_force →
locator_{insert,update,delete}_force, which drive heap +
btree + FK + unique + log + replication in the right order. In
between (locator.c + locator.h), it is a protocol layer: the
LC_COPYAREA, LC_LOCKSET, and LC_LOCKHINT structs are the
on-wire shape that serializes the workspace.
The distinguishing choices are: (1) the workspace is explicit —
not folded into the catalog cache as in PostgreSQL — and survives
across transactions; (2) bulk flush packs all dirty objects of a
transaction into one buffer, dispatched server-side under one
top-op for atomicity; (3) the canonical server-side entry is one
function (locator_attribute_info_force) with a switch on
LC_COPYAREA_OPERATION rather than three independent code paths,
which keeps cross-cutting work (lock acquisition, snapshot reading,
class-OID resolution, partition pruning) in exactly one place.
How a DML statement flows end-to-end
flowchart LR
A["application:\nINSERT/UPDATE/DELETE"] --> B["compat layer\n(db_∗)"]
B --> C["workspace:\nfind/create MOP,\nmark dirty"]
C --> D{"commit?"}
D -- "no" --> E["return MOP\nto application"]
D -- "yes (or explicit flush)" --> F["locator_mflush\n(workspace traversal)"]
F --> G["LC_COPYAREA packing\n(LC_COPYAREA_MANYOBJS\n· N × LC_COPYAREA_ONEOBJ\n· row bodies)"]
G --> H["wire: net_client_request_recv_copyarea"]
H --> I["server: xlocator_force\n(transport entry)"]
I --> J["per-object dispatch:\nlocator_attribute_info_force\nor inline switch"]
J --> K{"operation?"}
K -- "INSERT" --> KI["locator_insert_force"]
K -- "UPDATE" --> KU["locator_update_force"]
K -- "DELETE" --> KD["locator_delete_force"]
KI --> L["heap_insert_logical"]
KU --> L2["heap_update_logical"]
KD --> L3["heap_delete_logical"]
L --> M["btree_update\nlocator_add_or_remove_index"]
L2 --> M2["locator_update_index\n(diff old/new attr_info)"]
L3 --> M
M --> N["btree_check_unique\nFK check\nrepl_log_insert (HA)\nlog_append (WAL)"]
M2 --> N
L --> N
L2 --> N
L3 --> N
N --> Z["LC_COPYAREA\nreturned with\nfinal OIDs"]
Z --> ZZ["client: ws_update_oid_and_class\n(temp OID → perm OID)"]
Each labeled box is unpacked in the subsections below. Note that the
executor path (server-side query execution that produces an
INSERT INTO … SELECT or UPDATE … WHERE) does not go through
xlocator_force — it builds attr_info structures locally on the
server and calls locator_attribute_info_force directly. The
xlocator_force entry is for the client-driven path (workspace
flush). Both paths converge at locator_attribute_info_force →
locator_*_force.
Workspace model — locator_cl.c
The workspace (“ws”) is implemented in src/object/work_space.c
with locator_cl.c as the bridge that ferries objects between
the workspace and the server. The workspace’s data structure is a
hash of MOPs:
```c
// MOP — src/object/work_space.h (sketch)
struct db_object                /* typedef MOP */
{
  OID oid;                      /* server OID; OID_ISTEMP until flushed */
  MOP class_mop;                /* MOP of the class object */
  void *object;                 /* in-memory decoded object (MOBJ) */
  unsigned dirty:1;             /* needs flush */
  unsigned deleted:1;           /* logical delete */
  unsigned no_objects:1;        /* class with no instances cached */
  /* ... */
};
```

A MOP is the application’s long-lived handle — held across statements, returned from queries, used to navigate from one row to another. The locator translates between MOP and OID at the boundary.
Fetch — pulling objects from server to workspace
Public entry points (locator_cl.h):
```c
// locator_cl.h — fetch entries
extern MOBJ locator_fetch_object (MOP mop, DB_FETCH_MODE purpose,
                                  LC_FETCH_VERSION_TYPE fetch_version_type);
extern MOBJ locator_fetch_class (MOP class_mop, DB_FETCH_MODE purpose);
extern MOBJ locator_fetch_class_of_instance (MOP inst_mop, MOP *class_mop,
                                             DB_FETCH_MODE purpose);
extern MOBJ locator_fetch_instance (MOP mop, DB_FETCH_MODE purpose,
                                    LC_FETCH_VERSION_TYPE fetch_version_type);
extern MOBJ locator_fetch_set (int num_mops, MOP *mop_set,
                               DB_FETCH_MODE inst_purpose,
                               DB_FETCH_MODE class_purpose,
                               int quit_on_errors);
extern MOBJ locator_fetch_nested (MOP mop, DB_FETCH_MODE purpose,
                                  int prune_level, int quit_on_errors);
```

The entries differ in scope: _object is one MOP, _instance an instance
MOP, _class a class MOP, _class_of_instance resolves the class given an
instance MOP, _set a vector of MOPs, and _nested follows attribute
references to a configurable depth. All of them end up in
locator_lock (for a single MOP) or locator_lock_set (vector),
which builds an LC_LOCKSET and round-trips to the server’s
xlocator_fetch_lockset.
The reason for the vector form is prefetch. Reading one MOP that
references many other MOPs (a class with many indexes pointing at
many instances) and round-tripping per MOP would be N × RTT. With
fetch_set, the workspace asks the server “I will need all these
MOPs; please send the lot back in one buffer.” The server replies
with one LC_COPYAREA containing many LC_COPYAREA_ONEOBJ
descriptors plus the bodies.
flowchart LR
  WS["workspace"] -->|"miss\non N MOPs"| FS["locator_fetch_set(N, [mop_1..mop_N])"]
  FS --> LS["build LC_LOCKSET\n(N reqobjs)"]
  LS --> NET["wire: net_client_request_2recv_copyarea"]
  NET --> SRV["server: xlocator_fetch_lockset"]
  SRV --> HEAP["heap_get_visible_version × N"]
  HEAP --> CA["pack LC_COPYAREA\nwith N ONEOBJs"]
  CA --> NET2["wire: reply"]
  NET2 --> CACHE["locator_cache:\nfor each ONEOBJ,\nwrite into MOP,\nclear dirty bit,\nset chn"]
  CACHE --> DONE["all N MOPs\nin workspace"]
Mflush — packing dirty objects for flush
The flush path is the output dual of fetch. Its core data structure:
```c
// LOCATOR_MFLUSH_CACHE — src/transaction/locator_cl.c
struct locator_mflush_cache
{
  LC_COPYAREA *copy_area;               /* staging buffer */
  LC_COPYAREA_MANYOBJS *mobjs;          /* N-objects descriptor */
  LC_COPYAREA_ONEOBJ *obj;              /* current ONEOBJ slot */
  LOCATOR_MFLUSH_TEMP_OID *mop_toids;   /* MOPs whose OID is temp */
  LOCATOR_MFLUSH_TEMP_OID *mop_uoids;   /* MOPs being repartitioned */
  MOP mop_tail_toid;
  MOP mop_tail_uoid;
  MOP class_mop;                        /* class of last mflushed obj */
  MOBJ class_obj;                       /* its decoded class */
  HFID *hfid;                           /* its heap */
  RECDES recdes;                        /* current record body */
  bool decache;                         /* drop after flush */
  bool isone_mflush;                    /* single-object mflush */
};
```

The flush is driven by a map over the workspace’s dirty list:
stateDiagram-v2
  [*] --> UNFETCHED : workspace miss
  UNFETCHED --> FETCHED : locator_fetch_*
  FETCHED --> DIRTY : locator_update_instance / add_instance / remove_instance
  DIRTY --> FLUSHING : ws_map_dirty + locator_mflush
  FLUSHING --> FLUSHED : locator_mflush_force succeeds
  FLUSHED --> FETCHED : on next operation
  FLUSHED --> FREED : decache or process exit
  DIRTY --> FREED : explicit decache (rare)
  FETCHED --> FREED : eviction
The packing loop, in locator_mflush:
- For each dirty MOP, compute its LC_COPYAREA_ONEOBJ descriptor: operation ∈ {LC_FLUSH_INSERT, LC_FLUSH_UPDATE, LC_FLUSH_DELETE}; flag carries LC_FLAG_HAS_INDEX, LC_FLAG_HAS_UNIQUE_INDEX, LC_FLAG_TRIGGER_INVOLVED, LC_FLAG_UPDATED_BY_ME; hfid is the class’s heap; class_oid the class OID; oid the row OID (possibly temp); length and offset describe where in the buffer the row body lives.
- The row body is encoded by locator_mem_to_disk (instance) or locator_class_to_disk (class), which calls into the schema / primitive layer to serialize the in-memory object into a raw RECDES.
- If the buffer overflows, locator_mflush_force is called now to drain the current contents to the server, then the buffer is reset and the loop continues with the overflowing object.
- If the MOP has a temp OID, it is recorded in mop_toids so that after the server replies with the permanent OID assigned during the flush, the workspace can be patched (ws_update_oid_and_class).
The wire shape, defined in locator.h:
```c
// LC_COPYAREA — src/transaction/locator.h
struct lc_copy_area
{
  char *mem;                    /* the buffer */
  int length;                   /* size */
};

// LC_COPYAREA_MANYOBJS — at the END of the buffer, growing backward
struct lc_copyarea_manyobjs
{
  LC_COPYAREA_ONEOBJ objs;      /* first object descriptor */
  int multi_update_flags;       /* IS / START / END_MULTI_UPDATE */
  int num_objs;
};

// LC_COPYAREA_ONEOBJ — one per object, packed N-wise at end
struct lc_copyarea_oneobj
{
  LC_COPYAREA_OPERATION operation;  /* LC_FLUSH_INSERT/UPDATE/DELETE/etc */
  int flag;                         /* LC_FLAG_HAS_INDEX | ... */
  HFID hfid;                        /* heap file id */
  OID class_oid;                    /* class OID */
  OID oid;                          /* row OID (may be temp) */
  int length;
  int offset;                       /* offset of body in buffer */
};
```

The buffer is laid out bidirectionally: row bodies grow from the
front of mem, and LC_COPYAREA_ONEOBJ descriptors grow backward
from the end (anchored at LC_COPYAREA_MANYOBJS). A run of macros
in locator.h walks the descriptors:
```c
// locator.h — descriptor walk macros
#define LC_MANYOBJS_PTR_IN_COPYAREA(copy_areaptr) \
  ((LC_COPYAREA_MANYOBJS *) ((char *) (copy_areaptr)->mem \
                             + (copy_areaptr)->length \
                             - DB_SIZEOF (LC_COPYAREA_MANYOBJS)))
#define LC_START_ONEOBJ_PTR_IN_COPYAREA(manyobjs_ptr) (&(manyobjs_ptr)->objs)
#define LC_NEXT_ONEOBJ_PTR_IN_COPYAREA(oneobj_ptr) ((oneobj_ptr) - 1)
```

The descriptors point at row bodies via obj->offset; the body’s
length is obj->length. Because both ends grow toward the middle
and meet at the watermark mflush->recdes.data, packing is bounded
by available buffer space without two passes.
flowchart LR
subgraph CA["LC_COPYAREA buffer"]
direction LR
HEAD["row body 0\nrow body 1\nrow body 2"]
GAP["… free gap …"]
DESC2["ONEOBJ 2"]
DESC1["ONEOBJ 1"]
DESC0["ONEOBJ 0"]
META["LC_COPYAREA_MANYOBJS\n(num_objs, flags)"]
HEAD --> GAP --> DESC2 --> DESC1 --> DESC0 --> META
end
Three flush entry points
```c
// locator_cl.h — flush entries
extern int locator_flush_class (MOP class_mop);
extern int locator_flush_instance (MOP mop);
extern int locator_flush_all_instances (MOP class_mop, bool decache);
extern int locator_flush_for_multi_update (MOP class_mop);
extern int locator_all_flush (void);
```

locator_flush_instance is the explicit call when the application
or the upper-layer code wants to make an in-memory change visible
before commit. locator_flush_class and _all_instances are
broader sweeps. locator_all_flush is what the commit path calls —
it walks every workspace partition and pushes everything dirty.
locator_flush_for_multi_update is the special path for UPDATE
statements that may produce multiple updates per row (triggers, FK
cascades) and needs the START_MULTI_UPDATE / END_MULTI_UPDATE
markers in LC_COPYAREA_MANYOBJS.multi_update_flags.
Internally they all funnel into locator_mflush_initialize →
ws_map_dirty(locator_mflush, mflush) → locator_mflush_force →
locator_force (the wire send), which calls
net_client_request_recv_copyarea to the server’s xlocator_force.
OID lifecycle
An OID’s life has three stages: temp (workspace-allocated, not yet known to the server), assigned (server has bound it to a heap slot), resolved (server has confirmed the row body).
Temp OID minting on the client
Section titled “Temp OID minting on the client”When db_create is called for a new instance, the workspace mints a
temp OID — OID_ISTEMP returns true; the value is a sentinel
that is not a real (volid, pageid, slotid) tuple. The MOP is
inserted into the dirty list with operation LC_FLUSH_INSERT. No
server contact yet.
Permanent OID minting on the server
Section titled “Permanent OID minting on the server”At flush time, the server’s locator_insert_force calls
heap_insert_logical, which (via the heap manager, see
cubrid-heap-manager.md) finds a target page, allocates a slot,
and that slot id becomes the permanent OID. The new OID is
written back into the LC_COPYAREA_ONEOBJ.oid field of the reply
buffer; on reply, locator_mflush_force’s post-processing walks
mop_toids and calls ws_update_oid_and_class to remap the MOP.
For the rare case where an OID needs to be known before the row
body is written — a catalog entry that needs to reference itself —
there is xlocator_assign_oid:
```c
// xlocator_assign_oid — src/transaction/locator_sr.c
int
xlocator_assign_oid (THREAD_ENTRY *thread_p, const HFID *hfid,
                     OID *perm_oid, int expected_length, OID *class_oid,
                     const char *classname)
{
  if (heap_assign_address (thread_p, hfid, class_oid, perm_oid,
                           expected_length) != NO_ERROR)
    {
      return ER_FAILED;
    }

  if (classname != NULL)
    {
      locator_permoid_class_name (thread_p, classname, perm_oid);
    }
  return NO_ERROR;
}
```

heap_assign_address allocates a slot containing only a
REC_ASSIGN_ADDRESS placeholder (see cubrid-heap-manager.md).
The OID exists; the row body comes later.
For batches — the bulk catalog case where a CREATE TABLE creates
many catalog rows in one shot — xlocator_assign_oid_batch
(driven by LC_OIDSET / LC_CLASS_OIDSET from locator.h) does
the same for many OIDs in one round trip.
OID resolution
```c
// LC_FETCH_VERSION_TYPE — src/transaction/locator.h
typedef enum
{
  LC_FETCH_CURRENT_VERSION = 0x01,          /* latest committed, no lock */
  LC_FETCH_MVCC_VERSION = 0x02,             /* visible to my snapshot */
  LC_FETCH_DIRTY_VERSION = 0x03,            /* updatable: S-lock + dirty */
  LC_FETCH_CURRENT_VERSION_NO_CHECK = 0x04, /* skip server-side checks */
} LC_FETCH_VERSION_TYPE;
```

The version-type knob is the lock + visibility policy of a fetch. The header has a long comment explaining which is right when:
- MVCC version for SELECT reads — no lock, snapshot visibility, “reader does not block writer”.
- Dirty version for SELECT FOR UPDATE and existence checks — takes an S-lock, returns the latest committed version even if the snapshot would not have seen it; the lock prevents concurrent delete.
- Current version when the caller already holds X-lock — saves the lock-acquisition cost; reads the latest committed version without further checks.
The server’s xlocator_fetch switches on fetch_version_type to
build the right MVCC_SNAPSHOT and pass it to the heap layer.
The “force” family — server-side fan-in
The force family is where the locator stops being a transport layer
and starts being a conductor. The canonical entry is
locator_attribute_info_force:
```c
// locator_attribute_info_force — src/transaction/locator_sr.c (signature)
int
locator_attribute_info_force (THREAD_ENTRY *thread_p, const HFID *hfid,
                              OID *oid, HEAP_CACHE_ATTRINFO *attr_info,
                              ATTR_ID *att_id, int n_att_id,
                              LC_COPYAREA_OPERATION operation, int op_type,
                              HEAP_SCANCACHE *scan_cache, int *force_count,
                              bool not_check_fk, REPL_INFO_TYPE repl_info,
                              int pruning_type, PRUNING_CONTEXT *pcontext,
                              FUNC_PRED_UNPACK_INFO *func_preds,
                              MVCC_REEV_DATA *mvcc_reev_data,
                              UPDATE_INPLACE_STYLE force_update_inplace,
                              RECDES *rec_descriptor, bool need_locking);
```

The signature alone gives away the responsibilities. The function
takes an attribute-info bundle (HEAP_CACHE_ATTRINFO, see
cubrid-heap-manager.md’s “AttrInfo cache”) and an
LC_COPYAREA_OPERATION and dispatches based on the operation.
The body is a switch (operation):
```c
// locator_attribute_info_force — body sketch
switch (operation)
  {
  case LC_FLUSH_UPDATE:
  case LC_FLUSH_UPDATE_PRUNE:
  case LC_FLUSH_UPDATE_PRUNE_VERIFY:
    /* (1) Read the existing row using the right MVCC discipline */
    if (HEAP_IS_UPDATE_INPLACE (force_update_inplace) || !need_locking)
      scan = heap_get_last_version (thread_p, &context);
    else
      scan = locator_lock_and_get_object (thread_p, oid, &class_oid,
                                          &copy_recdes, scan_cache, X_LOCK,
                                          COPY, NULL_CHN,
                                          LOG_ERROR_IF_DELETED);
    old_recdes = &copy_recdes;
    /* FALLTHRU */

  case LC_FLUSH_INSERT:
  case LC_FLUSH_INSERT_PRUNE:
  case LC_FLUSH_INSERT_PRUNE_VERIFY:
    /* (2) Encode attr_info + (for UPDATE) old_recdes into a new RECDES */
    copyarea =
      locator_allocate_copy_area_by_attr_info (thread_p, attr_info,
                                               old_recdes, &new_recdes, -1,
                                               LOB_FLAG_INCLUDE_LOB);
    if (LC_IS_FLUSH_INSERT (operation))
      error_code = locator_insert_force (thread_p, &class_hfid, &class_oid,
                                         oid, &new_recdes, true, op_type,
                                         scan_cache, force_count,
                                         pruning_type, pcontext, func_preds,
                                         UPDATE_INPLACE_NONE, NULL, false,
                                         false);
    else                        /* LC_FLUSH_UPDATE */
      error_code = locator_update_force (thread_p, &class_hfid, &class_oid,
                                         oid, old_recdes, &new_recdes,
                                         has_index, att_id, n_att_id,
                                         op_type, scan_cache, force_count,
                                         not_check_fk, repl_info,
                                         pruning_type, pcontext,
                                         mvcc_reev_data,
                                         force_update_inplace, need_locking);
    break;

  case LC_FLUSH_DELETE:
    error_code = locator_delete_force (thread_p, &class_hfid, oid, true,
                                       op_type, scan_cache, force_count,
                                       mvcc_reev_data, need_locking);
    break;
  }
```

Three things to internalize:
- The UPDATE path falls through into the INSERT path. That is the C [[fallthrough]] you can see in the source. UPDATE is “read old + encode new + apply” — INSERT is just “encode new + apply”. Sharing the encoding step keeps them honest.
- Locking happens here, not in the heap. Whether the row gets X-locked depends on need_locking and force_update_inplace. For ordinary executor-driven UPDATE / DELETE, the row was already X-locked during the SELECT phase that drove the predicate (the executor calls locator_lock_and_get_object with X_LOCK for S_DELETE / S_UPDATE); the force path skips the lock. For the workspace-driven case (client flushed an object the server has not seen X-locked yet), locator_lock_and_get_object inside the force is what acquires it. This is the design reason the lock manager analysis (cubrid-lock-manager.md) says “locks flow through the locator”.
- Snapshot is consulted for UPDATE/DELETE under MVCC, but not for INSERT. Inserts do not have an existing version to be visible against. The fall-through arrangement makes this structural rather than conditional.
locator_insert_force — what an insert touches
Section titled “locator_insert_force — what an insert touches”// locator_insert_force — src/transaction/locator_sr.c (skeleton)static intlocator_insert_force (THREAD_ENTRY *thread_p, HFID *hfid, OID *class_oid, OID *oid, RECDES *recdes, int has_index, int op_type, HEAP_SCANCACHE *scan_cache, int *force_count, int pruning_type, PRUNING_CONTEXT *pcontext, FUNC_PRED_UNPACK_INFO *func_preds, UPDATE_INPLACE_STYLE force_in_place, PGBUF_WATCHER *home_hint_p, bool has_BU_lock, bool dont_check_fk, bool use_bulk_logging){ /* (1) Partition pruning — if the class is partitioned, choose * the actual partition that will receive the row. */ if (pruning_type != DB_NOT_PARTITIONED_CLASS) partition_prune_insert (...);
/* (2) Heap insert. The OID is decided by the slot the heap chose; * the row is now physically present. */ recdes->type = REC_HOME; heap_create_insert_context (&context, &real_hfid, &real_class_oid, recdes, local_scan_cache); context.update_in_place = force_in_place; context.is_bulk_op = has_BU_lock; context.use_bulk_logging = use_bulk_logging;
heap_insert_logical (thread_p, &context, home_hint_p); COPY_OID (oid, &context.res_oid);
/* (3) Index updates — for every B-tree on the class, encode the * key out of the new record and insert the (key, oid) pair. * locator_add_or_remove_index does the per-index loop. */ if (has_index) locator_add_or_remove_index (thread_p, recdes, oid, &real_class_oid, /*is_insert=*/true, op_type, scan_cache, /*datayn=*/true, /*need_replication=*/true, &real_hfid, func_preds, has_BU_lock, dont_check_fk);
/* (4) Foreign key checks — for every FK whose referencing column is * an attribute of this class, look up the parent in the parent * B-tree; error if not found. (locator_check_foreign_key does * the per-FK loop.) */ if (!dont_check_fk) locator_check_foreign_key (...);
/* (5) HA replication record (if HA enabled). * (6) WAL log entry — implicit, written by heap_insert_logical / * btree_insert as those primitives flush their own redo/undo. */}Step (3) is where the unique-check lives — btree_insert returns
ER_BTREE_UNIQUE_FAILED if the key already exists in a unique
B-tree, and locator_add_or_remove_index_internal propagates the
error. Step (4) is where the FK existence lives — if the parent
is missing, the server returns ER_FK_INVALID.
locator_update_force — diff-driven index updates
UPDATE is more interesting because it might not touch every
index. If the user updated only column c5 and only one B-tree
covers c5, the others should be left alone.
```c
// locator_update_force flow (paraphrased; see locator_sr.c)
//
// (1) Read the existing record (already done by attr_info_force;
//     old_recdes in hand).
// (2) Build new_recdes from attr_info.
// (3) Decide policy:
//     - REC_HOME stays REC_HOME if size fits → in-place update
//       (heap_update_logical).
//     - Otherwise → may relocate or move to overflow (heap manager
//       decides).
// (4) Index update loop:
//     locator_update_index (new_recdes, old_recdes, att_id[], n_att_id, ...)
//     For each btree on the class:
//       if no att_id from att_id[] is part of this btree's key → SKIP
//       else extract old_key from old_recdes, new_key from new_recdes
//       if old_key != new_key:
//         btree_delete (old_key, oid)
//         btree_insert (new_key, oid)   /* unique-check happens here */
// (5) FK checks for changed referencing keys.
// (6) HA replication + WAL.
```

The att_id[] / n_att_id arguments are the key. They tell the
locator which columns the executor updated; the locator uses them
to filter which B-trees need a touch. Without them, every UPDATE
would re-evaluate every index. This is a non-trivial saving for
wide tables with many indexes and narrow updates.
locator_delete_force — symmetric
Section titled “locator_delete_force — symmetric”// locator_delete_force — src/transaction/locator_sr.cintlocator_delete_force (THREAD_ENTRY *thread_p, HFID *hfid, OID *oid, int has_index, int op_type, HEAP_SCANCACHE *scan_cache, int *force_count, MVCC_REEV_DATA *mvcc_reev_data, bool need_locking){ return locator_delete_force_internal (thread_p, hfid, oid, has_index, op_type, scan_cache, force_count, mvcc_reev_data, FOR_INSERT_OR_DELETE, NULL, NULL, need_locking);}The for_moving variant (locator_delete_force_for_moving) is the
partitioned-table case: an UPDATE that moves a row from partition A
to partition B is implemented as delete from A + insert into B,
and the delete side carries the new OID + new partition class so
HA replication and trigger logic know it is a move, not a real
deletion. Both variants share locator_delete_force_internal,
which (1) reads the row to confirm the key, (2) calls
locator_add_or_remove_index with is_insert=false to remove the
keys from every covered B-tree, (3) calls heap_delete_logical
which sets mvcc_del_id (no physical removal — see
cubrid-mvcc.md), (4) writes the HA replication record, (5) the
WAL entry is again implicit in the heap/btree primitives.
Constraint orchestration
```c
// locator_sr.c — index orchestration entries
extern int locator_add_or_remove_index (THREAD_ENTRY *thread_p,
                                        RECDES *recdes, OID *inst_oid,
                                        OID *class_oid, int is_insert,
                                        int op_type,
                                        HEAP_SCANCACHE *scan_cache,
                                        bool datayn, bool need_replication,
                                        HFID *hfid,
                                        FUNC_PRED_UNPACK_INFO *func_preds,
                                        bool has_BU_lock,
                                        bool skip_checking_fk);

extern int locator_update_index (THREAD_ENTRY *thread_p, RECDES *new_recdes,
                                 RECDES *old_recdes, ATTR_ID *att_id,
                                 int n_att_id, OID *oid, OID *class_oid,
                                 int op_type, HEAP_SCANCACHE *scan_cache,
                                 REPL_INFO *repl_info);
```

Both call into locator_add_or_remove_index_internal, which
allocates a HEAP_CACHE_ATTRINFO index_attrinfo (the per-index
decoder cache; see cubrid-heap-manager.md’s “AttrInfo cache”),
then iterates over or_classrep->indexes[], the parsed list of
B-trees from the class representation:
```c
// locator_add_or_remove_index_internal — sketch
heap_attrinfo_start_with_index (thread_p, class_oid, NULL, &index_attrinfo,
                                &idx_info);
num_btids = idx_info.num_btids;

for (i = 0; i < num_btids; i++)
  {
    index = &index_attrinfo.last_classrepr->indexes[i];
    btid = index->btid;

    /* Skip indexes that are functional / partial / where filter excludes
       this row. */
    if (or_pred && !pred_eval (or_pred, recdes))
      continue;

    /* Compute the key.  heap_attrvalue_get_key extracts the key columns
       out of recdes using attr_info, encoding multi-column keys via
       tp_value_string_to_key_value. */
    key_dbvalue = heap_attrvalue_get_key (thread_p, i, &index_attrinfo,
                                          recdes, btid, &dbvalue, ...);

    /* Insert or delete the (key, inst_oid) pair. */
    if (is_insert)
      btree_insert (thread_p, btid, key_dbvalue, class_oid, inst_oid, ...,
                    &unique_pk, ...);
    else
      btree_delete (thread_p, btid, key_dbvalue, class_oid, inst_oid, ...,
                    &unique_pk, ...);
  }
```

locator_check_unique_btree_entries is the integrity-check variant used by CHECKDB and the post-restore consistency pass — it walks every B-tree of every class and confirms that every leaf entry has a corresponding heap row, and that no two leaf entries map the same unique key to different OIDs.
Foreign keys are orchestrated by locator_check_foreign_key:
```c
// locator_check_foreign_key — src/transaction/locator_sr.c (signature)
static int
locator_check_foreign_key (THREAD_ENTRY *thread_p, HFID *hfid,
                           OID *class_oid, OID *inst_oid, RECDES *recdes,
                           RECDES *new_recdes, bool *is_cached,
                           LC_COPYAREA **cache_attr_copyarea);
```

It walks the FK list on the class representation, extracts the referencing-column key, and probes the parent class’s PK B-tree via btree_keyoid_checks. On a miss, the row insert or update is rejected with ER_FK_INVALID.
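That probe-and-reject decision can be sketched minimally, with a sorted array standing in for the parent's PK B-tree. The names (`toy_pk_contains`, `toy_check_foreign_key`) and the `-1` error code are illustrative, not CUBRID's API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Binary search, standing in for a B-tree key probe. */
bool
toy_pk_contains (const int *pk_keys, size_t n, int key)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (pk_keys[mid] < key)
	lo = mid + 1;
      else
	hi = mid;
    }
  return lo < n && pk_keys[lo] == key;
}

/* Mirrors the FK decision: a NULL referencing key passes; a non-NULL key
   must exist in the parent's PK.  -1 plays the role of ER_FK_INVALID. */
int
toy_check_foreign_key (const int *pk_keys, size_t n, const int *fk_key)
{
  if (fk_key == NULL)
    {
      return 0;			/* NULL FK: nothing to check */
    }
  return toy_pk_contains (pk_keys, n, *fk_key) ? 0 : -1;
}
```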
Class fetch & catalog cache
Section titled “Class fetch & catalog cache”

The catalog itself is reached via the locator. The “root class” is itself an OID (oid_Root_class_oid); every user class is an OID in the root class’s heap. When the executor needs to know the layout of T1, it asks for T1’s class OID and reads it as a record.

```c
// xlocator_find_class_oid — src/transaction/locator_sr.c
extern LC_FIND_CLASSNAME xlocator_find_class_oid (THREAD_ENTRY *thread_p,
                                                  const char *classname,
                                                  OID *class_oid, LOCK lock);
```

This is the catalog lookup. It returns one of LC_CLASSNAME_EXIST / LC_CLASSNAME_DELETED / LC_CLASSNAME_ERROR. The mapping itself is held in a memory-hash table locator_Mht_classnames (file-scope in locator_sr.c), keyed by class name.
Two transient-name mechanisms stack on top:
- Reservation — xlocator_reserve_class_names / xlocator_reserve_class_name. During CREATE TABLE in a transaction, the name is reserved but the class is not committed yet. A second concurrent CREATE TABLE with the same name observes the reservation and fails or waits.
- Drop transient on commit/abort — locator_drop_transient_class_name_entries / locator_savepoint_transient_class_name_entries reconcile the reservation set with the durable hash on transaction boundaries and on savepoint rollback.
On the client side, locator_fetch_class and
locator_fetch_class_with_classmop pull the class object into the
workspace via the same LC_COPYAREA mechanism as instance fetch.
The workspace’s MOP for the class is the durable handle the
executor uses for the rest of its work; the schema cache built on
top is what backs HEAP_CLASSREPR_CACHE at the storage layer.
Snapshot-aware read entries
Section titled “Snapshot-aware read entries”

```c
// locator_sr.c — snapshot-aware reads
extern SCAN_CODE locator_get_object (THREAD_ENTRY *thread_p, const OID *oid,
                                     OID *class_oid, RECDES *recdes,
                                     HEAP_SCANCACHE *scan_cache,
                                     SCAN_OPERATION_TYPE op_type,
                                     LOCK lock_mode, int ispeeking, int chn);
extern SCAN_CODE locator_lock_and_get_object (THREAD_ENTRY *thread_p,
                                              const OID *oid, OID *class_oid,
                                              RECDES *recdes,
                                              HEAP_SCANCACHE *scan_cache,
                                              LOCK lock, int ispeeking,
                                              int old_chn,
                                              NON_EXISTENT_HANDLING handling);
extern SCAN_CODE locator_lock_and_get_object_with_evaluation (
  THREAD_ENTRY *thread_p, OID *oid, OID *class_oid, RECDES *recdes,
  HEAP_SCANCACHE *scan_cache, SCAN_OPERATION_TYPE op_type, LOCK lock,
  int ispeeking, int chn, MVCC_REEV_DATA *mvcc_reev_data,
  UPDATE_INPLACE_STYLE inplace);
```

locator_get_object is the read counterpart of the force family. It is what the executor calls during scan, and what the force functions call to read the old image during UPDATE / DELETE. The body decides the lock mode automatically based on op_type and whether the class is MVCC-disabled:
```c
// locator_get_object — body sketch (src/transaction/locator_sr.c)
if (!OID_IS_ROOTOID (class_oid))
  {
    if (op_type == S_SELECT && !mvcc_is_mvcc_disabled_class (class_oid))
      lock_mode = NULL_LOCK;	/* MVCC: no lock */
    else if (op_type == S_DELETE || op_type == S_UPDATE)
      lock_mode = X_LOCK;
    else
      lock_mode = S_LOCK;	/* SELECT FOR UPDATE / non-MVCC */
  }

if (op_type == S_SELECT && lock_mode == NULL_LOCK)
  scan_code = heap_get_visible_version_internal (thread_p, &context, false);
else
  scan_code = locator_lock_and_get_object_internal (thread_p, &context,
                                                    lock_mode);
```

This is where the cubrid-mvcc.md claim (“MVCC headers are stamped by locator_* flows”) gets cashed out: every read above the heap layer goes through this function, and it is the function that knows when to take a lock (op_type-driven), when to take a snapshot (the MVCC-disabled-class check), and what scan-code semantics to return (S_SUCCESS / S_DOESNT_EXIST / S_SNAPSHOT_NOT_SATISFIED / S_ERROR).
locator_lock_and_get_object_with_evaluation is the variant that
re-evaluates the predicate after acquiring the lock — used in
SERIALIZABLE / REPEATABLE READ to detect that the row no longer
matches the WHERE clause after a concurrent UPDATE committed (the
“have I lost my row?” check in MVCC). The evaluation re-runs the
WHERE clause against the now-locked record; on V_FALSE, the row
is skipped.
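That lock-then-re-evaluate pattern can be sketched as follows. The names and the single-struct "record" are illustrative; the real path re-reads the latest committed version under the granted lock before re-running the predicate:

```c
#include <stdbool.h>

typedef enum
{
  TOY_V_TRUE,
  TOY_V_FALSE
} TOY_EVAL;

typedef struct
{
  int balance;
} TOY_RECORD;

typedef TOY_EVAL (*toy_pred) (const TOY_RECORD *);

/* Returns true if the caller should process the row, false to skip it.
   In the real engine the lock wait happens first, then the now-current
   image is fetched, then the WHERE predicate is re-evaluated. */
bool
toy_lock_and_reevaluate (TOY_RECORD * row, toy_pred where)
{
  /* ... lock_object (X_LOCK) would block here until granted ... */
  /* ... re-read the latest committed version under the lock ... */
  return where (row) == TOY_V_TRUE;
}

TOY_EVAL
toy_where_balance_positive (const TOY_RECORD * r)
{
  return r->balance > 0 ? TOY_V_TRUE : TOY_V_FALSE;
}
```

The second evaluation is what prevents updating a row that a concurrent committed transaction already moved out of the WHERE set.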
Bulk transport & wire packing
Section titled “Bulk transport & wire packing”

src/transaction/locator.c is the plumbing: serialization helpers for LC_COPYAREA, LC_LOCKSET, LC_LOCKHINT, LC_OIDSET to and from network buffers, and the shared free-list of areas (locator_initialize_areas / locator_free_areas).

```c
// locator.c — packing entries
extern char *locator_pack_copy_area_descriptor (int num_objs, LC_COPYAREA *,
                                                char *desc, int desc_len);
extern char *locator_unpack_copy_area_descriptor (int num_objs, LC_COPYAREA *,
                                                  char *desc, int packed_size);
extern int locator_pack_lockset (LC_LOCKSET *, bool pack_classes,
                                 bool pack_objects);
extern int locator_unpack_lockset (LC_LOCKSET *, bool unpack_classes,
                                   bool unpack_objects);
extern int locator_pack_lockhint (LC_LOCKHINT *, bool pack_classes);
extern int locator_unpack_lockhint (LC_LOCKHINT *, bool unpack_classes);
extern char *locator_pack_oid_set (char *buffer, LC_OIDSET *);
extern LC_OIDSET *locator_unpack_oid_set_to_new (THREAD_ENTRY *, char *buffer);
```

The packing is separate from the body. LC_COPYAREA.mem carries the row bodies (already in network-endian by the upper-layer encoder); pack_copy_area_descriptor packs only the descriptors (LC_COPYAREA_MANYOBJS + N × LC_COPYAREA_ONEOBJ) into a caller-provided byte array. This lets the network layer send the two parts as separate buffers, avoiding a copy. The LC_AREA_ONEOBJ_PACKED_SIZE macro in locator.h computes the fixed packed size of a descriptor (4 ints + 1 HFID + 2 OIDs).
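The descriptor/body split can be illustrated with a toy packer. The field set below is deliberately reduced (the real descriptor also carries the HFID and OIDs and is converted to network endianness); all names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Reduced stand-in for LC_COPYAREA_ONEOBJ. */
typedef struct
{
  int32_t operation;		/* e.g. a flush op code */
  int32_t length;		/* body length in the mem buffer */
  int32_t offset;		/* body offset in the mem buffer */
} TOY_ONEOBJ;

/* Analogue of LC_AREA_ONEOBJ_PACKED_SIZE: fixed size per descriptor. */
#define TOY_ONEOBJ_PACKED_SIZE (3 * (int) sizeof (int32_t))

/* Pack only the descriptors into a caller-provided array; the bodies stay
   in their own buffer and are sent as-is.  Returns one past the last
   packed byte, mirroring the char* return of the real packer. */
char *
toy_pack_descriptors (const TOY_ONEOBJ * objs, int num_objs, char *desc)
{
  for (int i = 0; i < num_objs; i++)
    {
      memcpy (desc, &objs[i].operation, sizeof (int32_t));
      memcpy (desc + sizeof (int32_t), &objs[i].length, sizeof (int32_t));
      memcpy (desc + 2 * sizeof (int32_t), &objs[i].offset,
	      sizeof (int32_t));
      desc += TOY_ONEOBJ_PACKED_SIZE;
    }
  return desc;
}
```

Because every descriptor has the same packed size, the receiver can reconstruct the array with simple pointer arithmetic and then slice the body buffer by (offset, length).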
```mermaid
flowchart TB
  subgraph CL["client side"]
    WS["workspace dirty list"]
    MF["LOCATOR_MFLUSH_CACHE\n(staging)"]
    CA["LC_COPYAREA"]
    WS --> MF --> CA
  end
  subgraph WIRE["wire"]
    PD["packed descriptors\n(network-endian)"]
    PB["packed bodies\n(already encoded)"]
    CA -- "pack_copy_area_descriptor" --> PD
    CA -- "(zero-copy)" --> PB
  end
  subgraph SR["server side"]
    UCA["LC_COPYAREA reconstruct\n(unpack_copy_area_descriptor)"]
    XF["xlocator_force"]
    LOOP["per-OID dispatch:\nfor each ONEOBJ\n  switch operation\n  locator_∗_force"]
    PD --> UCA
    PB --> UCA
    UCA --> XF --> LOOP
  end
```
One end-to-end picture
Section titled “One end-to-end picture”

```mermaid
sequenceDiagram
  participant APP as application
  participant WS as workspace
  participant LCL as locator_cl
  participant NET as wire
  participant XS as xlocator_force
  participant LSR as locator_sr
  participant HM as heap_manager
  participant BT as btree
  participant LK as lock_manager
  participant LG as log_manager
  participant HA as repl_log
  APP->>WS: db_create / db_get / set_attr
  WS->>WS: mark MOP dirty (operation = INSERT/UPDATE)
  APP->>WS: commit
  WS->>LCL: locator_all_flush
  LCL->>LCL: locator_mflush_initialize
  LCL->>LCL: ws_map_dirty(locator_mflush)
  loop per dirty MOP
    LCL->>LCL: encode object → RECDES
    LCL->>LCL: append LC_COPYAREA_ONEOBJ
  end
  LCL->>NET: locator_force(copy_area)
  NET->>XS: xlocator_force
  XS->>XS: tran_server_start_topop (atomic)
  loop per ONEOBJ in copy_area
    XS->>LSR: locator_attribute_info_force
    LSR->>LK: lock_object (X_LOCK)
    LSR->>HM: heap_insert_logical / heap_update_logical / heap_delete_logical
    HM->>LG: log_append (redo/undo)
    LSR->>BT: locator_add_or_remove_index / locator_update_index
    BT->>LG: log_append (redo/undo for btree)
    LSR->>LSR: locator_check_foreign_key (if FK present)
    LSR->>HA: repl_log_insert (if HA enabled)
  end
  XS->>XS: tran_server_end_topop
  XS->>NET: reply LC_COPYAREA (with perm OIDs)
  NET->>LCL: receive
  LCL->>WS: ws_update_oid_and_class (temp → perm)
  WS->>APP: commit returns
```
The HA replication step deserves a note: locator_attribute_info_force
is the canonical handle for replication, which is why the
cubrid-ha-replication.md doc walks
locator_attribute_info_force → heap_*_logical → btree_update → repl_log_insert. The replication record is built from the
locator’s RECDES (already encoded) plus the OID and class OID;
repl_info.repl_info_type (REPL_INFO_TYPE_RBR_NORMAL for
row-based replication) selects the format. Because every DML goes
through this one function, the replication coverage is complete by
construction — there is no “I forgot to replicate this code path”
failure mode short of someone introducing a new DML operation that
bypasses the locator.
Source Walkthrough
Section titled “Source Walkthrough”

Anchor on symbol names, not line numbers. The CUBRID source moves; a function name (or struct/enum tag) is the stable handle. Use `git grep -n '<symbol>' src/transaction/` to locate the current position. The line numbers in the position-hint table at the end of this section were observed when the document was last updated and are intended only as quick hints.
Type definitions (src/transaction/locator.h)
Section titled “Type definitions (src/transaction/locator.h)”

- enum LC_COPYAREA_OPERATION — 11-value enum: LC_FETCH, LC_FETCH_DELETED, LC_FETCH_DECACHE_LOCK, LC_FLUSH_INSERT, LC_FLUSH_INSERT_PRUNE, LC_FLUSH_INSERT_PRUNE_VERIFY, LC_FLUSH_DELETE, LC_FLUSH_UPDATE, LC_FLUSH_UPDATE_PRUNE, LC_FLUSH_UPDATE_PRUNE_VERIFY, LC_FETCH_VERIFY_CHN. The _PRUNE suffixes mark partition-pruning variants.
- enum LC_FETCH_VERSION_TYPE — 4-value enum: LC_FETCH_CURRENT_VERSION, LC_FETCH_MVCC_VERSION, LC_FETCH_DIRTY_VERSION, LC_FETCH_CURRENT_VERSION_NO_CHECK.
- struct lc_copyarea_oneobj (LC_COPYAREA_ONEOBJ) — per-object descriptor: operation, flag, hfid, class_oid, oid, length, offset.
- struct lc_copyarea_manyobjs (LC_COPYAREA_MANYOBJS) — header for the descriptor array (objs first, multi_update_flags, num_objs).
- struct lc_copy_area (LC_COPYAREA) — (mem, length): the buffer.
- struct lc_lock_set (LC_LOCKSET) — bulk-fetch request: num_reqobjs, LC_LOCKSET_REQOBJ *objects, LC_LOCKSET_CLASSOF *classes, instance/class lock modes.
- struct lc_lock_hint (LC_LOCKHINT) — lockhint area: list of classes to prefetch with their locks.
- struct lc_oidset / lc_class_oidset / lc_oidmap — permanent-OID assignment request.
- enum lc_prefetch_flags — LC_PREF_FLAG_LOCK, LC_PREF_FLAG_COUNT_OPTIM.
- enum MULTI_UPDATE_FLAG — IS_MULTI_UPDATE, START_MULTI_UPDATE, END_MULTI_UPDATE.
- LC_FLAG_HAS_INDEX, LC_FLAG_UPDATED_BY_ME, LC_FLAG_HAS_UNIQUE_INDEX, LC_FLAG_TRIGGER_INVOLVED — per-object flag bits.
Workspace — client side (src/transaction/locator_cl.c)
Section titled “Workspace — client side (src/transaction/locator_cl.c)”

- struct locator_mflush_cache — staging buffer for bulk flush.
- struct locator_mflush_temp_oid — temp-OID list link.
- struct locator_cache_lock — per-fetch lock context.
- locator_is_root / locator_is_class — type predicates on a MOP.
- locator_fetch_object / _class / _class_of_instance / _instance / _set / _nested — fetch entries (mapped to locator_lock / locator_lock_set / locator_lock_nested).
- locator_lock / locator_lock_set / locator_lock_nested — build LC_LOCKSET, round-trip to server.
- locator_cache_lock / locator_cache_lock_set — workspace-side lock caching.
- locator_cache / locator_cache_object_class / _cache_object_instance / _cache_have_object / _cache_not_have_object — unpack a server reply LC_COPYAREA into the workspace.
- locator_flush_class / _instance / _all_instances / _for_multi_update / _all_flush — public flush entries.
- locator_flush_and_decache_instance — flush + drop from cache.
- locator_mflush — per-MOP encode-and-pack (ws_map_dirty callback).
- locator_mflush_initialize / _reset / _end / _reallocate_copy_area — staging lifecycle.
- locator_mflush_force — drain staging buffer to server, reconcile temp→perm OIDs, free per-flush state.
- locator_mem_to_disk / locator_class_to_disk — encoders (instance / class).
- locator_add_class / _add_instance / _add_root — workspace inserts.
- locator_remove_class / _remove_instance — workspace deletes.
- locator_update_instance / _update_class / _update_tree_classes — workspace updates that mark dirty.
- locator_prepare_rename_class — name-reservation handshake during ALTER TABLE RENAME.
- locator_force — wire send (calls net_client_request_recv_copyarea).
- locator_repl_* — replication-side flush variants (locator_repl_mflush_force, locator_repl_flush_all).
Server-side — DML fan-in (src/transaction/locator_sr.c)
Section titled “Server-side — DML fan-in (src/transaction/locator_sr.c)”Force family (the canonical DML entry)
Section titled “Force family (the canonical DML entry)”

- locator_attribute_info_force — top-level entry; switch on LC_COPYAREA_OPERATION.
- locator_insert_force — heap insert + index insert + FK check.
- locator_update_force (static) — heap update + diff-driven index update + FK check + replication.
- locator_delete_force / locator_delete_force_for_moving / locator_delete_force_internal — heap delete + index delete + FK cascade + replication.
- locator_move_record — partition-move helper (delete on A + insert on B with linkage).
- locator_force_for_multi_update — multi-update path with trigger-aware ordering.
- xlocator_force — wire entry; loops over LC_COPYAREA_ONEOBJs and calls per-op force functions inside a top-op.
- xlocator_force_repl_update — HA-applier variant.
Constraint orchestration
Section titled “Constraint orchestration”

- locator_add_or_remove_index (extern) / locator_add_or_remove_index_for_moving (static) / locator_add_or_remove_index_internal (static) — per-index loop for INSERT/DELETE.
- locator_update_index — per-index loop for UPDATE (diff-driven by att_id[]).
- locator_check_foreign_key (static) — FK existence probe.
- locator_check_unique_btree_entries — CHECKDB integrity sweep.
- locator_check_btree_entries / locator_check_class / locator_check_by_class_oid / locator_check_all_entries_of_all_btrees — the rest of the integrity-check family.
- locator_was_index_already_applied — guard against double application of a shared B-tree (PK ↔ FK overlap).
- xlocator_check_fk_validity — wire-callable FK validator (used during ALTER TABLE ADD CONSTRAINT).
OID lifecycle
Section titled “OID lifecycle”

- xlocator_assign_oid — pre-mint a single permanent OID (heap_assign_address + name binding).
- xlocator_assign_oid_batch — batch variant for LC_OIDSET.
- xlocator_find_class_oid — class-name → class-OID with lock.
- locator_permoid_class_name — bind a freshly-minted OID to a reserved class name.
- xlocator_reserve_class_names / xlocator_reserve_class_name / xlocator_get_reserved_class_name_oid — name-reservation protocol for CREATE TABLE.
- xlocator_delete_class_name / xlocator_rename_class_name — catalog name maintenance.
- locator_drop_transient_class_name_entries / locator_savepoint_transient_class_name_entries — reconcile the transient name set on commit / abort / savepoint.
- locator_check_class_names / locator_dump_class_names — diagnostics over the name hash.
Snapshot-aware reads
Section titled “Snapshot-aware reads”

- locator_get_object — switch on op_type to choose lock mode; dispatch to heap_get_visible_version_internal (no lock) or locator_lock_and_get_object_internal (with lock).
- locator_lock_and_get_object — explicit-lock entry.
- locator_lock_and_get_object_with_evaluation — re-evaluate predicate after acquiring lock (for SERIALIZABLE / RR semantics).
- locator_lock_and_get_object_internal (static) — shared body.
Bulk fetch (the wire-side dual of bulk flush)
Section titled “Bulk fetch (the wire-side dual of bulk flush)”

- xlocator_fetch — single-OID server-side fetch (returns one LC_COPYAREA).
- xlocator_fetch_all — heap scan returning one LC_COPYAREA per page-batch (used during boot to populate caches).
- xlocator_fetch_lockset — bulk fetch of an LC_LOCKSET request.
- xlocator_fetch_all_reference_lockset — transitive-closure fetch (referenced classes/instances).
- xlocator_fetch_lockhint_classes — class-prefetch from LC_LOCKHINT.
- locator_lock_and_return_object (static) — per-OID body of the bulk fetch path.
- locator_return_object_assign (static) — pack one OID into the reply LC_COPYAREA.
- locator_all_reference_lockset (static) — build the full reference closure for an OID.
- locator_find_lockset_missing_class_oids (static) — fill in LC_LOCKSET_REQOBJ.class_index for objects whose class was unknown to the caller.
- locator_guess_sub_classes (static) — expand subclass references in LC_LOCKHINT.
- xlc_fetch_allrefslockset — wire entry into the reference-closure path.
Lifecycle and module state (src/transaction/locator_sr.c)
Section titled “Lifecycle and module state (src/transaction/locator_sr.c)”

- locator_initialize / locator_finalize — create / destroy the locator_Mht_classnames hash.
- locator_Pseudo_pageid_first / _Pseudo_pageid_last / _Pseudo_pageid_crt — pseudo-pageid range used for reservations during transient-name handling.
- locator_Mht_classnames — module-scope hash from class name to cached class OID + reservation state.
Special / replication
Section titled “Special / replication”

- locator_repl_prepare_force (static) — pre-check for HA-applier flush (resolves key, fetches old OID).
- locator_repl_get_key_value (static) — extract key columns from the replication record.
- locator_repl_add_error_to_copyarea (static) — pack a per-object error result back into the reply.
- xlocator_redistribute_partition_data — reshape partition data after ALTER TABLE ... REORGANIZE PARTITION.
- locator_rv_redo_rename — recovery hook for class renames.
Transport (src/transaction/locator.c)
Section titled “Transport (src/transaction/locator.c)”

- locator_allocate_copy_area_by_length / locator_reallocate_copy_area_by_length / locator_free_copy_area — buffer lifecycle.
- locator_pack_copy_area_descriptor / locator_unpack_copy_area_descriptor.
- locator_send_copy_area — split into (contents, descriptor) for the network layer.
- locator_recv_allocate_copyarea — server-side mirror.
- locator_allocate_lockset / _reallocate_lockset / _free_lockset / locator_pack_lockset / locator_unpack_lockset / locator_allocate_and_unpack_lockset.
- locator_allocate_lockhint / _reallocate_lockhint / _free_lockhint / locator_pack_lockhint / locator_unpack_lockhint / locator_allocate_and_unpack_lockhint.
- locator_initialize_areas / locator_free_areas — module-wide free lists.
- locator_make_oid_set / _clear_oid_set / _free_oid_set / _add_oid_set / _get_packed_oid_set_size / _pack_oid_set / _unpack_oid_set_to_new / _unpack_oid_set_to_exist.
- locator_manyobj_flag_is_set / _remove / _set — multi-update flag manipulation on LC_COPYAREA_MANYOBJS.
Position hints as of this revision
Section titled “Position hints as of this revision”

The line column reflects positions observed when the doc was last updated and decays over time. If you land at a different definition, the symbol name is authoritative; update the table on the way through.
| Symbol | File | Line |
|---|---|---|
| enum LC_COPYAREA_OPERATION | locator.h | 107 |
| enum LC_FETCH_VERSION_TYPE | locator.h | 179 |
| struct lc_copyarea_oneobj | locator.h | 224 |
| struct lc_copyarea_manyobjs | locator.h | 243 |
| struct lc_copy_area | locator.h | 252 |
| struct lc_lock_set | locator.h | 286 |
| struct lc_lock_hint | locator.h | 329 |
| struct lc_oidset | locator.h | 395 |
| struct locator_mflush_cache | locator_cl.c | 69 |
| struct locator_mflush_temp_oid | locator_cl.c | 61 |
| locator_fetch_object | locator_cl.c | (varies) |
| locator_fetch_class | locator_cl.c | (varies) |
| locator_fetch_set | locator_cl.c | (varies) |
| locator_mflush | locator_cl.c | 4435 |
| locator_mflush_initialize | locator_cl.c | 3802 |
| locator_mflush_force | locator_cl.c | 3995 |
| locator_flush_class | locator_cl.c | 4890 |
| locator_flush_instance | locator_cl.c | 5058 |
| locator_all_flush | locator_cl.c | 5279 |
| locator_initialize | locator_sr.c | 246 |
| locator_finalize | locator_sr.c | 364 |
| xlocator_reserve_class_names | locator_sr.c | 409 |
| xlocator_find_class_oid | locator_sr.c | 1033 |
| xlocator_assign_oid | locator_sr.c | 2043 |
| xlocator_fetch | locator_sr.c | 2374 |
| xlocator_fetch_all | locator_sr.c | 2772 |
| xlocator_fetch_lockset | locator_sr.c | 3052 |
| xlocator_fetch_all_reference_lockset | locator_sr.c | 3818 |
| locator_check_foreign_key | locator_sr.c | 4023 |
| locator_insert_force | locator_sr.c | 4938 |
| locator_delete_force | locator_sr.c | 6116 |
| locator_delete_force_internal | locator_sr.c | 6172 |
| locator_force_for_multi_update | locator_sr.c | 6543 |
| xlocator_force | locator_sr.c | 7129 |
| locator_attribute_info_force | locator_sr.c | 7461 |
| locator_add_or_remove_index | locator_sr.c | 7695 |
| locator_add_or_remove_index_internal | locator_sr.c | 7760 |
| locator_update_index | locator_sr.c | 8260 |
| locator_check_unique_btree_entries | locator_sr.c | 9768 |
| xlocator_fetch_lockhint_classes | locator_sr.c | 11356 |
| xlocator_assign_oid_batch | locator_sr.c | 11577 |
| xlocator_check_fk_validity | locator_sr.c | 11754 |
| locator_lock_and_get_object_internal | locator_sr.c | 12936 |
| locator_lock_and_get_object_with_evaluation | locator_sr.c | 13100 |
| locator_get_object | locator_sr.c | 13241 |
| locator_lock_and_get_object | locator_sr.c | 13352 |
| locator_allocate_copy_area_by_length | locator.c | (varies) |
| locator_pack_copy_area_descriptor | locator.c | (varies) |
| locator_pack_lockset | locator.c | (varies) |
| locator_pack_lockhint | locator.c | (varies) |
| locator_pack_oid_set | locator.c | (varies) |
locator_sr.c is ≈ 14 000 lines and locator_cl.c is ≈ 7 100
lines; symbol-level git grep is the recommended lookup.
Cross-check Notes
Section titled “Cross-check Notes”

This document was written without raw analysis materials, so the cross-checks below are against the other CUBRID code-analysis docs that mention the locator. They are paraphrases of the claims in those docs, followed by what this reading confirmed or refined.
- cubrid-heap-manager.md says “locator_insert/update/delete_force are the primary callers of heap_*_logical.” Confirmed. locator_insert_force calls heap_create_insert_context then heap_insert_logical; locator_delete_force_internal calls heap_delete_logical; locator_update_force calls heap_update_logical. The claim is precise — there are very few other call sites of the heap logicals, and those (e.g., recovery-time replays, vacuum) are special by design.
- cubrid-ha-replication.md walks locator_attribute_info_force → heap_*_logical → btree_update → repl_log_insert. Confirmed structurally. The replication record is built inside locator_insert_force / locator_update_force / locator_delete_force_internal after the heap operation succeeds and after the index updates, so the replication record always describes the post-state. The repl_info argument threads through locator_attribute_info_force and is the choice point: REPL_INFO_TYPE_RBR_NORMAL for row-based, REPL_INFO_TYPE_STMT_NORMAL for statement-based (now rare), and REPL_INFO_TYPE_RBR_AT_LEAST_ONE_RECORD for the multi-row case.
- cubrid-lock-manager.md says “locks are acquired through locator paths”. Refined. The lock acquisition site depends on who initiated the operation:
  - Executor-driven path (e.g., UPDATE WHERE): the X-lock is taken during the SELECT-with-WHERE phase by locator_lock_and_get_object (called from scan_manager.c), and locator_attribute_info_force is then called with need_locking=false.
  - Workspace-driven path (client flushes a previously-fetched object the server has not seen X-locked yet): the X-lock is taken inside locator_attribute_info_force itself, in the LC_FLUSH_UPDATE branch’s locator_lock_and_get_object call, with need_locking=true.
  - Force-update-in-place path (catalog updates, recovery): force_update_inplace is UPDATE_INPLACE_OLD_MVCCID or similar; the function uses heap_get_last_version, which does not consult MVCC and does not lock. The caller is responsible for upstream locking.

  Either way, the locator is the lens through which row locks are taken.
- cubrid-mvcc.md says “MVCC headers are stamped by locator_* flows”. Confirmed. The MVCC header (mvcc_rec_header) is stamped inside the heap’s heap_insert_logical / heap_update_logical / heap_delete_logical, which are called only from the locator’s force functions on the DML path. The one MVCC field the locator touches directly is the inserter MVCCID for newly-allocated heap rows, via the mvcc_rec_header[2] array passed down through locator_add_or_remove_index_internal to the B-tree (so unique checks see the right inserter). The cross-reference is precise.
- cubrid-page-buffer-manager.md mentions PGBUF_WATCHER in the context of multi-page latch ordering. The locator honors this contract: the force functions accept a PGBUF_WATCHER *home_hint_p and pass it to heap_insert_logical, which uses it to keep the home page latched between the slot search and the row write. This is the page-buffer-side reason the bulk insert path is fast — the locator forwards the watcher rather than refixing the page on each call.
Open Questions
Section titled “Open Questions”

- What is the actual cost of the bidirectional LC_COPYAREA layout? The layout grows row bodies forward and descriptors backward, meeting in the middle. The locator_mflush_reallocate_copy_area path triggers when they collide. Empirically, what is the distribution of “did we have to reallocate” vs “did we fit” under typical OLTP workloads? Investigation: instrument reallocations under a large write workload; revisit the default COPYAREA size.
- Why are there *_PRUNE and *_PRUNE_VERIFY variants of every flush operation? LC_FLUSH_INSERT_PRUNE triggers partition-pruning on the server; _PRUNE_VERIFY adds an extra “did we end up in the right partition?” check. The verify variant looks defensive against client/server schema-version skew. Investigation: trace where each variant is produced on the client (locator_mem_to_disk and friends) and document the trigger conditions.
- Multi-update flag semantics. IS_MULTI_UPDATE, START_MULTI_UPDATE, END_MULTI_UPDATE ride on the LC_COPYAREA_MANYOBJS header. The server’s locator_force_for_multi_update keys off IS_MULTI_UPDATE, but the START/END pair appears to bracket unique-statistics gathering across multiple LC_COPYAREAs of the same logical UPDATE. Investigation: find the producer and consumer of START_MULTI_UPDATE in the unique-stats path (btree_unique_stats).
- The classnames hash (locator_Mht_classnames) is module-scope and implicitly singleton. Resizing semantics under heavy DDL? Investigation: check whether the hash is bounded in size and what eviction policy (if any) applies.
- LC_FETCH_VERSION_TYPE covers four values, but LC_FETCH_CURRENT_VERSION_NO_CHECK is described in code comments as “skip server-side checks”. Which checks? Tracing xlocator_fetch: it sets skip_fetch_version_type_check = true and then treats _NO_CHECK like _CURRENT. The “checks” in question seem to be the assert (lock_get_object_lock (...) != NULL_LOCK) precondition — i.e., the caller is asserting it already holds the lock. Worth a more careful audit of every consumer.
- The locator_force (client) vs xlocator_force (server) naming convention. The pattern is consistent: locator_* is client-side or shared, xlocator_* is the server-callable wire entry (the x prefix denotes “external/wire entry” in CUBRID’s convention). Worth confirming this is the only prefix rule.
- Workspace decache eviction policy. The decache flag on LOCATOR_MFLUSH_CACHE and on locator_flush_and_decache_instance lets a flush also drop the MOP from the cache. When does the upper layer use this? Cache-pressure-driven evictions seem to live in work_space.c itself; tracing the call sites would clarify the contract.
- The xlocator_fetch_all_reference_lockset reference-closure path. Following references transitively can blow up under wide schemas. What stops it? Investigation: read locator_all_reference_lockset carefully; identify the prune_level and quit_on_errors interplay.
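The bidirectional layout behind the first question can be sketched with a toy structure — bodies grow up from offset 0, descriptors grow down from the end, and a reservation fails when the two cursors would cross. Field names are illustrative, not CUBRID's:

```c
#include <stdbool.h>

/* Toy model of the bidirectional LC_COPYAREA layout. */
typedef struct
{
  int area_length;		/* total bytes in the copy area */
  int body_next;		/* next free byte for row bodies (grows up) */
  int desc_next;		/* lowest byte used by descriptors (grows down) */
} TOY_COPYAREA;

/* Returns false when the area must be reallocated: the body cursor and
   the descriptor cursor would cross.  This is the toy analogue of the
   condition that sends locator_mflush down its reallocate path. */
bool
toy_copyarea_reserve (TOY_COPYAREA * ca, int body_bytes, int desc_bytes)
{
  if (ca->body_next + body_bytes > ca->desc_next - desc_bytes)
    {
      return false;		/* collision: grow the area and retry */
    }
  ca->body_next += body_bytes;
  ca->desc_next -= desc_bytes;
  return true;
}
```

Instrumenting the false branch of a check like this is exactly the measurement the question asks for.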
Sources
Section titled “Sources”

CUBRID source (under /data/hgryoo/references/cubrid/)
Section titled “CUBRID source (under /data/hgryoo/references/cubrid/)”

- src/transaction/locator.h — public types and macros (LC_COPYAREA, LC_COPYAREA_ONEOBJ, LC_COPYAREA_MANYOBJS, LC_LOCKSET, LC_LOCKHINT, LC_OIDSET, LC_COPYAREA_OPERATION, LC_FETCH_VERSION_TYPE).
- src/transaction/locator.c — wire packing (locator_pack_copy_area_descriptor, locator_pack_lockset, locator_pack_lockhint, locator_pack_oid_set), area allocation / free-list management.
- src/transaction/locator_cl.h — client-side public API (locator_fetch_*, locator_flush_*, locator_add_*, locator_remove_*, locator_update_*).
- src/transaction/locator_cl.c — workspace bridge (LOCATOR_MFLUSH_CACHE, locator_mflush*, locator_cache*, locator_lock*, locator_force).
- src/transaction/locator_sr.h — server-side public API (locator_attribute_info_force, locator_get_object, locator_lock_and_get_object, locator_add_or_remove_index, locator_update_index, locator_check_class*).
- src/transaction/locator_sr.c — server-side fan-in (xlocator_force, xlocator_fetch*, force family, constraint orchestration, OID lifecycle, classnames hash, reference-closure fetch).
Cross-reference docs in this knowledge base
Section titled “Cross-reference docs in this knowledge base”

- cubrid-heap-manager.md — heap_insert/update/delete_logical and the slotted-page substrate the locator drives.
- cubrid-mvcc.md — mvcc_rec_header stamping and the visibility predicate the locator’s reads consult.
- cubrid-lock-manager.md — lock modes and acquisition points threaded through locator_lock_and_get_object.
- cubrid-page-buffer-manager.md — PGBUF_WATCHER chains the force family preserves across pages.
- cubrid-btree.md — B-tree primitives invoked by locator_add_or_remove_index_internal.
- cubrid-ha-replication.md — HA replication record produced inside locator_attribute_info_force.
- cubrid-catalog-manager.md — root-class and per-class catalog the classnames hash backs.
Textbook chapters (under knowledge/research/dbms-general/)
Section titled “Textbook chapters (under knowledge/research/dbms-general/)”

- Database Internals (Petrov), Ch. 3 “File Formats” and Ch. 4 “Implementing B-Trees” — RID/OID/TID semantics, identity under compaction.
- Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom), §10.6 “Object-Oriented Database Systems” — workspace pattern, MOP vs persistent identity.
- Database System Concepts (Silberschatz, Korth, Sudarshan, 6th ed.), Ch. 13 “Storage and File Structure” and Ch. 17 “Database System Architectures” — client/server boundary, bulk-fetch motivation.