CUBRID Locator — OID Workspace, Bulk Fetch/Flush, and the Server-Side Insert/Update/Delete Bridge

A relational engine that touches the disk has to bridge two very different vocabularies. The lower layers — heap manager, B-tree, page buffer, log — speak in physical addresses: a record is a slot on a page in a file on a volume, identified by an OID (volid, pageid, slotid) (CUBRID), a TID (blocknumber, offsetnumber) (PostgreSQL), or a ROWID (Oracle). The upper layers — the executor, the schema operations, the catalog, the network protocol that talks to clients — speak in logical objects: rows with column values, classes with attributes, instances with identities. Something has to translate, and that something is what CUBRID calls the locator.

The textbook problem the locator solves is what Database Internals (Petrov, Ch. 3 “File Formats” and Ch. 4 “Implementing B-Trees”) calls object identity: a way to name a row that survives compaction of its containing page, that survives moves between pages of the same file (forwarding pointers), and that the index layer can embed in its leaves so that a B-tree lookup terminates at exactly one heap slot. The OID is the artifact; the locator is the layer that creates OIDs (when a new record is inserted), resolves OIDs (when a fetch needs to read the row body), and mutates the row at a known OID (when an UPDATE or DELETE is applied). Stonebraker’s POSTGRES (1986) and the EXODUS storage manager (Carey & DeWitt, 1986) introduced the canonical shape of this layer in object-oriented databases — they called it the object manager — and although CUBRID is a relational engine today, its lineage as a hybrid object/relational system shows in the name.

Two textbook ingredients complete the picture:

  1. The workspace pattern. Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom), §10.6 “Object-Oriented Database Systems”, describes the workspace as the in-memory cache of objects an application has touched. Reads pull objects into the workspace; writes mark them dirty; commit flushes the dirty set back to disk in a single batch. The workspace pattern was the central design choice of object stores like ObjectStore and GemStone, and it is the reason CUBRID’s client still carries a Memory Object Pointer (MOP) ↔ OID map even though most modern relational clients do not.

  2. Bulk-fetch / bulk-flush vs per-row APIs. A relational client that touches N rows and round-trips to the server N times pays N × RTT. A workspace-based client pays roughly one RTT per transaction: at commit, the entire dirty set is packed into a single buffer and sent to the server. The buffer (CUBRID’s LC_COPYAREA) carries a header describing N objects plus the N row bodies concatenated, and the server unpacks and dispatches per-OID inside one transaction-scoped top-op. The same shape appears in distributed transaction monitors (X/Open XA), in bulk-load APIs (PostgreSQL COPY, MySQL LOAD DATA), and in ORMs that “session.flush()” before commit.

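A back-of-envelope comparison makes the saving concrete. The numbers below are assumed for illustration; they are not measurements from the source.

// Illustrative cost model — all numbers assumed, not from the source.
// N = 1,000 dirty rows, RTT = 0.5 ms, per-row server work ≈ 0.05 ms.
//
//   per-row API : N × (RTT + work) = 1000 × 0.55 ms ≈ 550 ms
//   bulk flush  : 1 × RTT + N × work = 0.5 ms + 50 ms ≈ 50.5 ms
//
// The dominant cost flips from network latency to server-side work.
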
This document tracks how CUBRID realizes both pieces — the object workspace (locator_cl.c) and the server-side fan-in (locator_sr.c) — and how the network shape (LC_COPYAREA packed in locator.c) ties them together.

The textbook gives the model; this section names the engineering conventions that almost every row-oriented engine adopts in some form. CUBRID’s specific choices, described under “CUBRID: explicit client workspace + server-side fan-in” below, are best read as one set of dials within this shared design space.

Every DBMS has three layers that need to agree on what “a row” is:

  • The object/row layer (executor, parser, type-checker, catalog) speaks in fully decoded values — DB_VALUEs, PT_NODEs, RECDESs with column offsets parsed.
  • The storage layer (heap, btree, page buffer) speaks in raw byte arrays plus OIDs — it knows where the bytes live but not what they mean.
  • The cross-cutting services (lock manager, MVCC, log, vacuum, HA replication, foreign-key checker) speak in OIDs plus class metadata.

A canonical insert has to: (1) ask the heap to find a page, (2) allocate an OID, (3) lock the OID exclusively, (4) write the row bytes, (5) update every affected B-tree, (6) check unique, (7) check foreign keys, (8) write a log record per page touched, (9) generate a replication record if HA is on, (10) bump catalog statistics. None of those layers should know how to do all the others’ jobs. They need a conductor. CUBRID’s locator is that conductor.

In PostgreSQL, the same role is split across heap_insert, heap_update, heap_delete, plus ExecInsertIndexTuples and the ON-CONFLICT machinery. The executor calls these directly; there is no client-side workspace in modern (post-Berkeley) Postgres — client tuples arrive on the wire as text or binary parameters of a Bind/Execute message, and the server materializes them into a HeapTuple inside the buffer pool. PostgreSQL’s executor therefore is the locator; the dispatch is implicit.

InnoDB and the other storage engines hide behind ha_innobase (and the abstract handler class). Rows enter and leave through ha_write_row, ha_update_row, ha_delete_row. There is a ha_bulk_update_row for batched updates, but the contract is per-table-handle, not per-transaction. A workspace exists in the form of the Field* array decoded from the row buffer, but it is per-statement, not per-transaction.

Oracle: row sources + dirty-buffer write-behind

Oracle’s executor produces row sources that pipe into a DML operator, which calls kdusru / kdusrf for the actual mutation. The buffer cache is dirtied per row; a background DBWn writes behind. Fetch is per-row through the row source; the bulk path is PL/SQL’s FORALL … BULK COLLECT, which is a syntactic form, not a storage layer.

CUBRID: explicit client workspace + server-side fan-in

CUBRID inherits its workspace from its roots as an object-relational system (UniSQL, the predecessor, was an OODBMS). Even after the system became fully relational, the client kept the workspace because:

  • The same protocol serves the C API (db_* functions in src/compat/) where the application’s objects live in process memory — a MOP is a pointer the application holds long-term.
  • The catalog itself is read as objects via the locator (the root class and per-class system records are MOPs), so the workspace mechanism doubles as the catalog cache.
  • Stand-alone mode (SA_MODE) compiles the client and the server into the same process; the workspace is the boundary between them.

On the server, the dual is locator_sr.c’s force family. locator_attribute_info_force is the canonical entry that the executor (qexec_execute_*), the trigger machinery, the schema manipulator, the type-checker, and the partition-pruner all eventually call. It dispatches to locator_insert_force / locator_update_force / locator_delete_force, and those drive heap, btree, FK, unique-check, replication, and log in the right order.

| Theory | CUBRID name |
| --- | --- |
| Per-record identifier | OID = (volid, pageid, slotid) |
| Memory pointer to an object | MOP (Memory Object Pointer) — opaque handle |
| MOP → OID mapping | workspace hash (ws_* API; ws_oid (mop)) |
| Transient (un-persisted) identifier | “temp OID” — OID_ISTEMP (oid) |
| Workspace dirty list | ws_* dirty list scanned by locator_mflush |
| Bulk fetch buffer | LC_COPYAREA (locator.h) |
| Bulk fetch request | LC_LOCKSET / LC_LOCKHINT |
| Per-object descriptor in the buffer | LC_COPYAREA_ONEOBJ |
| Workspace-mflush staging | LOCATOR_MFLUSH_CACHE (locator_cl.c) |
| Client fetch entry | locator_fetch_object / _class / _instance / _set |
| Client flush entry | locator_flush_class / _instance / _all_instances / locator_force |
| Server fetch entry | xlocator_fetch / _lockset / _lockhint_classes |
| Server force entry (transport) | xlocator_force |
| Server canonical DML entry | locator_attribute_info_force |
| Server per-op force | locator_insert_force / locator_update_force / locator_delete_force |
| Server constraint orchestration | locator_add_or_remove_index / locator_update_index / locator_check_foreign_key |
| Server snapshot-aware read | locator_get_object / locator_lock_and_get_object |
| Catalog (class) lookup | xlocator_find_class_oid |
| Pre-mint of OIDs | xlocator_assign_oid / xlocator_assign_oid_batch |

CUBRID’s locator has three faces. On the client side (locator_cl.c), it is a workspace + bulk-flush coordinator: it maintains the MOP-to-OID map, watches a workspace dirty list, packs dirty objects into an LC_COPYAREA, and ships the buffer to the server. On the server side (locator_sr.c), it is the DML fan-in: every insert/update/delete in the system, regardless of who originated it, comes through locator_attribute_info_force → locator_{insert,update,delete}_force, which drive heap + btree + FK + unique + log + replication in the right order. In between (locator.c + locator.h), it is a protocol layer: the LC_COPYAREA, LC_LOCKSET, and LC_LOCKHINT structs are the on-wire shape that serializes the workspace.

The distinguishing choices are: (1) the workspace is explicit — not folded into the catalog cache as in PostgreSQL — and survives across transactions; (2) bulk flush packs all dirty objects of a transaction into one buffer, dispatched server-side under one top-op for atomicity; (3) the canonical server-side entry is one function (locator_attribute_info_force) with a switch on LC_COPYAREA_OPERATION rather than three independent code paths, which keeps cross-cutting work (lock acquisition, snapshot reading, class-OID resolution, partition pruning) in exactly one place.

flowchart LR
  A["application:\nINSERT/UPDATE/DELETE"] --> B["compat layer\n(db_∗)"]
  B --> C["workspace:\nfind/create MOP,\nmark dirty"]
  C --> D{"commit?"}
  D -- "no" --> E["return MOP\nto application"]
  D -- "yes (or explicit flush)" --> F["locator_mflush\n(workspace traversal)"]
  F --> G["LC_COPYAREA packing\n(LC_COPYAREA_MANYOBJS\n· N × LC_COPYAREA_ONEOBJ\n· row bodies)"]
  G --> H["wire: net_client_request_recv_copyarea"]
  H --> I["server: xlocator_force\n(transport entry)"]
  I --> J["per-object dispatch:\nlocator_attribute_info_force\nor inline switch"]
  J --> K{"operation?"}
  K -- "INSERT" --> KI["locator_insert_force"]
  K -- "UPDATE" --> KU["locator_update_force"]
  K -- "DELETE" --> KD["locator_delete_force"]
  KI --> L["heap_insert_logical"]
  KU --> L2["heap_update_logical"]
  KD --> L3["heap_delete_logical"]
  L --> M["btree_update\nlocator_add_or_remove_index"]
  L2 --> M2["locator_update_index\n(diff old/new attr_info)"]
  L3 --> M
  M --> N["btree_check_unique\nFK check\nrepl_log_insert (HA)\nlog_append (WAL)"]
  M2 --> N
  L --> N
  L2 --> N
  L3 --> N
  N --> Z["LC_COPYAREA\nreturned with\nfinal OIDs"]
  Z --> ZZ["client: ws_update_oid_and_class\n(temp OID → perm OID)"]

Each labeled box is unpacked in the subsections below. Note that the executor path (server-side query execution that executes an INSERT INTO … SELECT or UPDATE … WHERE) does not go through xlocator_force — it builds attr_info structures locally on the server and calls locator_attribute_info_force directly. The xlocator_force entry is for the client-driven path (workspace flush). Both paths converge at locator_attribute_info_force → locator_*_force.

The workspace (“ws”) is implemented in src/object/work_space.c with locator_cl.c as the bridge that ferries objects between the workspace and the server. The workspace’s data structure is a hash of MOPs:

// MOP — src/object/work_space.h (sketch)
struct db_object                /* typedef MOP */
{
  OID oid;                      /* server OID; OID_ISTEMP until flushed */
  MOP class_mop;                /* MOP of the class object */
  void *object;                 /* in-memory decoded object (MOBJ) */
  unsigned dirty:1;             /* needs flush */
  unsigned deleted:1;           /* logical delete */
  unsigned no_objects:1;        /* class with no instances cached */
  /* ... */
};

A MOP is the application’s long-lived handle — held across statements, returned from queries, used to navigate from one row to another. The locator translates between MOP and OID at the boundary.
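
A minimal usage sketch of that boundary, assuming the ws_* API shape above (ws_mop as the find-or-create lookup in work_space.c and the OID_EQ comparison macro are assumptions; exact signatures may differ):

// Hedged sketch — MOP ↔ OID translation at the workspace boundary.
OID oid;                               /* an OID returned by the server */
MOP mop = ws_mop (&oid, class_mop);    /* find-or-create the client handle */
assert (OID_EQ (ws_oid (mop), &oid));  /* MOP → OID direction */
if (OID_ISTEMP (ws_oid (mop)))
  ;                                    /* not yet flushed: no heap slot yet */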

Fetch — pulling objects from server to workspace

Public entry points (locator_cl.h):

// locator_cl.h — fetch entries
extern MOBJ locator_fetch_object (MOP mop, DB_FETCH_MODE purpose,
                                  LC_FETCH_VERSION_TYPE fetch_version_type);
extern MOBJ locator_fetch_class (MOP class_mop, DB_FETCH_MODE purpose);
extern MOBJ locator_fetch_class_of_instance (MOP inst_mop, MOP *class_mop,
                                             DB_FETCH_MODE purpose);
extern MOBJ locator_fetch_instance (MOP mop, DB_FETCH_MODE purpose,
                                    LC_FETCH_VERSION_TYPE fetch_version_type);
extern MOBJ locator_fetch_set (int num_mops, MOP *mop_set,
                               DB_FETCH_MODE inst_purpose,
                               DB_FETCH_MODE class_purpose,
                               int quit_on_errors);
extern MOBJ locator_fetch_nested (MOP mop, DB_FETCH_MODE purpose,
                                  int prune_level, int quit_on_errors);

The entries differ in scope: _object is one MOP, _instance is an instance MOP, _class is a class MOP, _class_of_instance resolves the class given an instance MOP, _set is a vector of MOPs, and _nested follows attribute references to a configurable depth. All of them end up in locator_lock (for a single MOP) or locator_lock_set (for a vector), which builds an LC_LOCKSET and round-trips to the server’s xlocator_fetch_lockset.

The reason for the vector form is prefetch. Reading one MOP that references many other MOPs (a class with many indexes pointing at many instances) and round-tripping per MOP would be N × RTT. With fetch_set, the workspace asks the server “I will need all these MOPs; please send the lot back in one buffer.” The server replies with one LC_COPYAREA containing many LC_COPYAREA_ONEOBJ descriptors plus the bodies.
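
A usage sketch of the vector form (hedged; the DB_FETCH_READ purpose constant is assumed and error handling is elided):

// Hedged usage sketch — one round trip for a batch of MOPs.
MOP mops[16];                          /* related MOPs, e.g. from a result set */
/* ... fill mops[] ... */
if (locator_fetch_set (16, mops, DB_FETCH_READ, DB_FETCH_READ,
                       /* quit_on_errors = */ true) == NULL)
  /* handle error */ ;
/* all 16 objects are now cached in the workspace; later reads are local */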

flowchart LR
  WS["workspace"] -->|"miss\non N MOPs"| FS["locator_fetch_set(N, [mop_1..mop_N])"]
  FS --> LS["build LC_LOCKSET\n(N reqobjs)"]
  LS --> NET["wire: net_client_request_2recv_copyarea"]
  NET --> SRV["server: xlocator_fetch_lockset"]
  SRV --> HEAP["heap_get_visible_version × N"]
  HEAP --> CA["pack LC_COPYAREA\nwith N ONEOBJs"]
  CA --> NET2["wire: reply"]
  NET2 --> CACHE["locator_cache:\nfor each ONEOBJ,\nwrite into MOP,\nclear dirty bit,\nset chn"]
  CACHE --> DONE["all N MOPs\nin workspace"]

Mflush — packing dirty objects for flush

The flush path is the output dual of fetch. Its core data structure:

// LOCATOR_MFLUSH_CACHE — src/transaction/locator_cl.c
struct locator_mflush_cache
{
  LC_COPYAREA *copy_area;               /* staging buffer */
  LC_COPYAREA_MANYOBJS *mobjs;          /* N-objects descriptor */
  LC_COPYAREA_ONEOBJ *obj;              /* current ONEOBJ slot */
  LOCATOR_MFLUSH_TEMP_OID *mop_toids;   /* MOPs whose OID is temp */
  LOCATOR_MFLUSH_TEMP_OID *mop_uoids;   /* MOPs being repartitioned */
  MOP mop_tail_toid;
  MOP mop_tail_uoid;
  MOP class_mop;                        /* class of last mflushed obj */
  MOBJ class_obj;                       /* its decoded class */
  HFID *hfid;                           /* its heap */
  RECDES recdes;                        /* current record body */
  bool decache;                         /* drop after flush */
  bool isone_mflush;                    /* single-object mflush */
};

The flush is driven by a map over the workspace’s dirty list:

stateDiagram-v2
  [*] --> UNFETCHED : workspace miss
  UNFETCHED --> FETCHED : locator_fetch_*
  FETCHED --> DIRTY    : locator_update_instance / add_instance / remove_instance
  DIRTY --> FLUSHING   : ws_map_dirty + locator_mflush
  FLUSHING --> FLUSHED : locator_mflush_force succeeds
  FLUSHED --> FETCHED  : on next operation
  FLUSHED --> FREED    : decache or process exit
  DIRTY --> FREED      : explicit decache (rare)
  FETCHED --> FREED    : eviction

The packing loop, in locator_mflush:

  1. For each dirty MOP, compute its LC_COPYAREA_ONEOBJ descriptor: operation ∈ {LC_FLUSH_INSERT, LC_FLUSH_UPDATE, LC_FLUSH_DELETE}; flag carries LC_FLAG_HAS_INDEX, LC_FLAG_HAS_UNIQUE_INDEX, LC_FLAG_TRIGGER_INVOLVED, LC_FLAG_UPDATED_BY_ME; hfid is the class’s heap; class_oid the class OID; oid the row OID (possibly temp); length and offset describe where in the buffer the row body lives.
  2. The row body is encoded by locator_mem_to_disk (instance) or locator_class_to_disk (class), which calls into the schema / primitive layer to serialize the in-memory object into a raw RECDES.
  3. If the buffer overflows, locator_mflush_force is called immediately to drain the current contents to the server; the buffer is then reset and the loop continues with the overflowing object (see the sketch after this list).
  4. If the MOP has a temp OID, it is recorded in mop_toids so that after the server replies with the permanent OID assigned during the flush, the workspace can be patched (ws_update_oid_and_class).
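
The pack-or-drain shape of step 3, as a hedged sketch (locator_mflush_set_dirty_and_pack and the ER_LC_COPYAREA_FULL code are hypothetical names for illustration; locator_mflush_force and locator_mflush_reset are the entries named in this doc):

// Hedged sketch of the overflow-drain pattern inside locator_mflush.
error = locator_mflush_set_dirty_and_pack (mflush, mop);   /* hypothetical */
if (error == ER_LC_COPYAREA_FULL)                          /* hypothetical */
  {
    if (locator_mflush_force (mflush) != NO_ERROR)         /* drain to server */
      return ER_FAILED;
    locator_mflush_reset (mflush);                         /* reuse the buffer */
    error = locator_mflush_set_dirty_and_pack (mflush, mop);  /* retry */
  }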

The wire shape, defined in locator.h:

// LC_COPYAREA — src/transaction/locator.h
struct lc_copy_area
{
  char *mem;                    /* the buffer */
  int length;                   /* size */
};

// LC_COPYAREA_MANYOBJS — at the END of the buffer, growing backward
struct lc_copyarea_manyobjs
{
  LC_COPYAREA_ONEOBJ objs;      /* first object descriptor */
  int multi_update_flags;       /* IS / START / END_MULTI_UPDATE */
  int num_objs;
};

// LC_COPYAREA_ONEOBJ — one per object, packed N-wise at end
struct lc_copyarea_oneobj
{
  LC_COPYAREA_OPERATION operation;  /* LC_FLUSH_INSERT/UPDATE/DELETE/etc */
  int flag;                         /* LC_FLAG_HAS_INDEX | ... */
  HFID hfid;                        /* heap file id */
  OID class_oid;                    /* class OID */
  OID oid;                          /* row OID (may be temp) */
  int length;
  int offset;                       /* offset of body in buffer */
};

The buffer is laid out bidirectionally: row bodies grow from the front of mem, and LC_COPYAREA_ONEOBJ descriptors grow backward from the end (anchored at LC_COPYAREA_MANYOBJS). A run of macros in locator.h walks the descriptors:

// locator.h — descriptor walk macros
#define LC_MANYOBJS_PTR_IN_COPYAREA(copy_areaptr) \
  ((LC_COPYAREA_MANYOBJS *) ((char *) (copy_areaptr)->mem \
                             + (copy_areaptr)->length \
                             - DB_SIZEOF (LC_COPYAREA_MANYOBJS)))
#define LC_START_ONEOBJ_PTR_IN_COPYAREA(manyobjs_ptr) (&(manyobjs_ptr)->objs)
#define LC_NEXT_ONEOBJ_PTR_IN_COPYAREA(oneobj_ptr) ((oneobj_ptr) - 1)

The descriptors point at row bodies via obj->offset; the body’s length is obj->length. Because both ends grow toward the middle and meet at the watermark mflush->recdes.data, packing is bounded by available buffer space without two passes.
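
Putting the macros together, a reply buffer can be walked like this (a minimal sketch; only names defined above are used):

// Walking every ONEOBJ descriptor in an LC_COPYAREA (sketch).
LC_COPYAREA_MANYOBJS *mobjs = LC_MANYOBJS_PTR_IN_COPYAREA (copy_area);
LC_COPYAREA_ONEOBJ *obj = LC_START_ONEOBJ_PTR_IN_COPYAREA (mobjs);
int i;

for (i = 0; i < mobjs->num_objs;
     i++, obj = LC_NEXT_ONEOBJ_PTR_IN_COPYAREA (obj))
  {
    char *body = copy_area->mem + obj->offset;  /* row bytes */
    int body_len = obj->length;
    /* dispatch on obj->operation, obj->oid, obj->class_oid ... */
  }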

flowchart LR
  subgraph CA["LC_COPYAREA buffer"]
    direction LR
    HEAD["row body 0\nrow body 1\nrow body 2"]
    GAP["… free gap …"]
    DESC2["ONEOBJ 2"]
    DESC1["ONEOBJ 1"]
    DESC0["ONEOBJ 0"]
    META["LC_COPYAREA_MANYOBJS\n(num_objs, flags)"]
    HEAD --> GAP --> DESC2 --> DESC1 --> DESC0 --> META
  end

// locator_cl.h — flush entries
extern int locator_flush_class (MOP class_mop);
extern int locator_flush_instance (MOP mop);
extern int locator_flush_all_instances (MOP class_mop, bool decache);
extern int locator_flush_for_multi_update (MOP class_mop);
extern int locator_all_flush (void);

locator_flush_instance is the explicit call when the application or the upper-layer code wants to make an in-memory change visible before commit. locator_flush_class and _all_instances are broader sweeps. locator_all_flush is what the commit path calls — it walks every workspace partition and pushes everything dirty. locator_flush_for_multi_update is the special path for UPDATE statements that may produce multiple updates per row (triggers, FK cascades) and needs the START_MULTI_UPDATE / END_MULTI_UPDATE markers in LC_COPYAREA_MANYOBJS.multi_update_flags.

Internally they all funnel into locator_mflush_initialize → ws_map_dirty(locator_mflush, mflush) → locator_mflush_force → locator_force (the wire send), which calls net_client_request_recv_copyarea to reach the server’s xlocator_force.

An OID’s life has three stages: temp (workspace-allocated, not yet known to the server), assigned (server has bound it to a heap slot), resolved (server has confirmed the row body).

When db_create is called for a new instance, the workspace mints a temp OID — OID_ISTEMP returns true for it; the value is a sentinel, not a real (volid, pageid, slotid) tuple. The MOP is inserted into the dirty list with operation LC_FLUSH_INSERT. No server contact happens yet.

At flush time, the server’s locator_insert_force calls heap_insert_logical, which (via the heap manager, see cubrid-heap-manager.md) finds a target page, allocates a slot, and that slot id becomes the permanent OID. The new OID is written back into the LC_COPYAREA_ONEOBJ.oid field of the reply buffer; on reply, locator_mflush_force’s post-processing walks mop_toids and calls ws_update_oid_and_class to remap the MOP.
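
The patch-up after the reply, as a hedged sketch (the LOCATOR_MFLUSH_TEMP_OID link fields and the ws_update_oid_and_class signature are assumptions):

// Hedged sketch of locator_mflush_force post-processing (field names assumed).
for (toid = mflush->mop_toids; toid != NULL; toid = toid->next)
  {
    /* locate the reply descriptor that corresponds to this temp OID */
    LC_COPYAREA_ONEOBJ *reply_obj = /* descriptor for toid's object ... */;
    /* the server wrote the permanent OID into the reply descriptor */
    ws_update_oid_and_class (toid->mop, &reply_obj->oid,
                             &reply_obj->class_oid);
  }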

For the rare case where an OID needs to be known before the row body is written — a catalog entry that needs to reference itself — there is xlocator_assign_oid:

// xlocator_assign_oid — src/transaction/locator_sr.c
int
xlocator_assign_oid (THREAD_ENTRY *thread_p, const HFID *hfid, OID *perm_oid,
                     int expected_length, OID *class_oid,
                     const char *classname)
{
  if (heap_assign_address (thread_p, hfid, class_oid, perm_oid,
                           expected_length) != NO_ERROR)
    return ER_FAILED;
  if (classname != NULL)
    locator_permoid_class_name (thread_p, classname, perm_oid);
  return NO_ERROR;
}

heap_assign_address allocates a slot containing only a REC_ASSIGN_ADDRESS placeholder (see cubrid-heap-manager.md). The OID exists; the row body comes later.

For batches — the bulk catalog case where a CREATE TABLE creates many catalog rows in one shot — xlocator_assign_oid_batch (driven by LC_OIDSET / LC_CLASS_OIDSET from locator.h) does the same for many OIDs in one round trip.

// LC_FETCH_VERSION_TYPE — src/transaction/locator.h
typedef enum
{
  LC_FETCH_CURRENT_VERSION = 0x01,          /* latest committed, no lock */
  LC_FETCH_MVCC_VERSION = 0x02,             /* visible to my snapshot */
  LC_FETCH_DIRTY_VERSION = 0x03,            /* updatable: S-lock + dirty */
  LC_FETCH_CURRENT_VERSION_NO_CHECK = 0x04, /* skip server-side checks */
} LC_FETCH_VERSION_TYPE;

The version-type knob is the lock + visibility policy of a fetch. The header carries a long comment explaining which one is right in which situation:

  • MVCC version for SELECT reads — no lock, snapshot visibility, “reader does not block writer”.
  • Dirty version for SELECT FOR UPDATE and existence checks — takes an S-lock, returns the latest committed version even if the snapshot would not have seen it; the lock prevents concurrent delete.
  • Current version when the caller already holds X-lock — saves the lock-acquisition cost; reads the latest committed version without further checks.

The server’s xlocator_fetch switches on fetch_version_type to build the right MVCC_SNAPSHOT and pass it to the heap layer.
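
In practice the knob surfaces at the client fetch entries. A hedged example of choosing the policy per statement intent (the DB_FETCH_READ / DB_FETCH_WRITE purpose constants are assumptions):

// Hedged sketch — choosing LC_FETCH_VERSION_TYPE by statement intent.
MOBJ obj;

/* plain SELECT: snapshot read, no lock, reader never blocks writer */
obj = locator_fetch_instance (mop, DB_FETCH_READ, LC_FETCH_MVCC_VERSION);

/* SELECT ... FOR UPDATE / existence check: S-lock + latest committed */
obj = locator_fetch_instance (mop, DB_FETCH_WRITE, LC_FETCH_DIRTY_VERSION);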

The “force” family — server-side fan-in

The force family is where the locator stops being a transport layer and starts being a conductor. The canonical entry is locator_attribute_info_force:

// locator_attribute_info_force — src/transaction/locator_sr.c (signature)
int
locator_attribute_info_force (THREAD_ENTRY *thread_p,
                              const HFID *hfid, OID *oid,
                              HEAP_CACHE_ATTRINFO *attr_info,
                              ATTR_ID *att_id, int n_att_id,
                              LC_COPYAREA_OPERATION operation, int op_type,
                              HEAP_SCANCACHE *scan_cache, int *force_count,
                              bool not_check_fk, REPL_INFO_TYPE repl_info,
                              int pruning_type, PRUNING_CONTEXT *pcontext,
                              FUNC_PRED_UNPACK_INFO *func_preds,
                              MVCC_REEV_DATA *mvcc_reev_data,
                              UPDATE_INPLACE_STYLE force_update_inplace,
                              RECDES *rec_descriptor, bool need_locking);

The signature alone gives away the responsibilities. The function takes an attribute-info bundle (HEAP_CACHE_ATTRINFO, see cubrid-heap-manager.md’s “AttrInfo cache”) and an LC_COPYAREA_OPERATION and dispatches based on the operation. The body is a switch (operation):

// locator_attribute_info_force — body sketch
switch (operation)
  {
  case LC_FLUSH_UPDATE:
  case LC_FLUSH_UPDATE_PRUNE:
  case LC_FLUSH_UPDATE_PRUNE_VERIFY:
    /* (1) Read the existing row using the right MVCC discipline */
    if (HEAP_IS_UPDATE_INPLACE (force_update_inplace) || !need_locking)
      scan = heap_get_last_version (thread_p, &context);
    else
      scan = locator_lock_and_get_object (thread_p, oid, &class_oid,
                                          &copy_recdes, scan_cache, X_LOCK,
                                          COPY, NULL_CHN,
                                          LOG_ERROR_IF_DELETED);
    old_recdes = &copy_recdes;
    /* fallthrough */
  case LC_FLUSH_INSERT:
  case LC_FLUSH_INSERT_PRUNE:
  case LC_FLUSH_INSERT_PRUNE_VERIFY:
    /* (2) Encode attr_info + (for UPDATE) old_recdes into a new RECDES */
    copyarea = locator_allocate_copy_area_by_attr_info (thread_p, attr_info,
                                                        old_recdes,
                                                        &new_recdes, -1,
                                                        LOB_FLAG_INCLUDE_LOB);
    if (LC_IS_FLUSH_INSERT (operation))
      error_code = locator_insert_force (thread_p, &class_hfid, &class_oid,
                                         oid, &new_recdes, true, op_type,
                                         scan_cache, force_count,
                                         pruning_type, pcontext, func_preds,
                                         UPDATE_INPLACE_NONE, NULL,
                                         false, false);
    else			/* LC_FLUSH_UPDATE */
      error_code = locator_update_force (thread_p, &class_hfid, &class_oid,
                                         oid, old_recdes, &new_recdes,
                                         has_index, att_id, n_att_id,
                                         op_type, scan_cache, force_count,
                                         not_check_fk, repl_info,
                                         pruning_type, pcontext,
                                         mvcc_reev_data,
                                         force_update_inplace, need_locking);
    break;
  case LC_FLUSH_DELETE:
    error_code = locator_delete_force (thread_p, &class_hfid, oid, true,
                                       op_type, scan_cache, force_count,
                                       mvcc_reev_data, need_locking);
    break;
  }

Three things to internalize:

  1. The UPDATE path falls through into the INSERT path. That is the C [[fallthrough]] you can see in the source. UPDATE is “read old + encode new + apply” — INSERT is just “encode new + apply”. Sharing the encoding step keeps them honest.

  2. Locking happens here, not in the heap. Whether the row gets X-locked depends on need_locking and force_update_inplace. For ordinary executor-driven UPDATE / DELETE, the row was already X-locked during the SELECT phase that drove the predicate (the executor calls locator_lock_and_get_object with X_LOCK for S_DELETE / S_UPDATE); the force path skips the lock. For the workspace-driven case (client flushed an object the server has not seen X-locked yet), locator_lock_and_get_object inside the force is what acquires it. This is the design reason the lock manager analysis (cubrid-lock-manager.md) says “locks flow through the locator”.

  3. Snapshot is consulted for UPDATE/DELETE under MVCC, but not for INSERT. Inserts do not have an existing version to be visible against. The fall-through arrangement makes this structural rather than conditional.

locator_insert_force — what an insert touches

// locator_insert_force — src/transaction/locator_sr.c (skeleton)
static int
locator_insert_force (THREAD_ENTRY *thread_p, HFID *hfid, OID *class_oid,
                      OID *oid, RECDES *recdes, int has_index, int op_type,
                      HEAP_SCANCACHE *scan_cache, int *force_count,
                      int pruning_type, PRUNING_CONTEXT *pcontext,
                      FUNC_PRED_UNPACK_INFO *func_preds,
                      UPDATE_INPLACE_STYLE force_in_place,
                      PGBUF_WATCHER *home_hint_p, bool has_BU_lock,
                      bool dont_check_fk, bool use_bulk_logging)
{
  /* (1) Partition pruning — if the class is partitioned, choose
   *     the actual partition that will receive the row. */
  if (pruning_type != DB_NOT_PARTITIONED_CLASS)
    partition_prune_insert (...);

  /* (2) Heap insert. The OID is decided by the slot the heap chose;
   *     the row is now physically present. */
  recdes->type = REC_HOME;
  heap_create_insert_context (&context, &real_hfid, &real_class_oid, recdes,
                              local_scan_cache);
  context.update_in_place = force_in_place;
  context.is_bulk_op = has_BU_lock;
  context.use_bulk_logging = use_bulk_logging;
  heap_insert_logical (thread_p, &context, home_hint_p);
  COPY_OID (oid, &context.res_oid);

  /* (3) Index updates — for every B-tree on the class, encode the
   *     key out of the new record and insert the (key, oid) pair.
   *     locator_add_or_remove_index does the per-index loop. */
  if (has_index)
    locator_add_or_remove_index (thread_p, recdes, oid, &real_class_oid,
                                 /* is_insert = */ true, op_type, scan_cache,
                                 /* datayn = */ true,
                                 /* need_replication = */ true,
                                 &real_hfid, func_preds, has_BU_lock,
                                 dont_check_fk);

  /* (4) Foreign key checks — for every FK whose referencing column is
   *     an attribute of this class, look up the parent in the parent
   *     B-tree; error if not found. (locator_check_foreign_key does
   *     the per-FK loop.) */
  if (!dont_check_fk)
    locator_check_foreign_key (...);

  /* (5) HA replication record (if HA enabled).
   * (6) WAL log entry — implicit, written by heap_insert_logical /
   *     btree_insert as those primitives flush their own redo/undo. */
}

Step (3) is where the unique-check lives — btree_insert returns ER_BTREE_UNIQUE_FAILED if the key already exists in a unique B-tree, and locator_add_or_remove_index_internal propagates the error. Step (4) is where the FK existence lives — if the parent is missing, the server returns ER_FK_INVALID.

locator_update_force — diff-driven index updates

UPDATE is more interesting because it might not touch every index. If the user updated only column c5 and only one B-tree covers c5, the others should be left alone.

// locator_update_force flow (paraphrased; see locator_sr.c)
//
// (1) Read the existing record (already done by attr_info_force;
//     old_recdes in hand).
// (2) Build new_recdes from attr_info.
// (3) Decide placement policy:
//     - REC_HOME stays REC_HOME if the new size fits → in-place update
//       (heap_update_logical).
//     - Otherwise → may relocate or move to overflow (heap manager decides).
// (4) Index update loop:
//     locator_update_index (new_recdes, old_recdes, att_id[], n_att_id, ...)
//     For each btree on the class:
//       if no att_id from att_id[] is part of this btree's key → SKIP
//       else extract old_key from old_recdes, new_key from new_recdes
//       if old_key != new_key:
//         btree_delete (old_key, oid)
//         btree_insert (new_key, oid)   /* unique-check happens here */
// (5) FK checks for changed referencing keys.
// (6) HA replication + WAL.

The att_id[] / n_att_id arguments are the key. They tell the locator which columns the executor updated; the locator uses them to filter which B-trees need a touch. Without them, every UPDATE would re-evaluate every index. This is a non-trivial saving for wide tables with many indexes and narrow updates.
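
The skip test in step (4) reduces to a set intersection between the updated columns and each index’s key columns. A hedged sketch of the shape (the helper name is hypothetical; the OR_INDEX field names are assumed from the class-representation structs):

// Hedged sketch — does this B-tree cover any updated column?
static bool
index_touched_by_update (const OR_INDEX *index, const ATTR_ID *att_id,
                         int n_att_id)
{
  int i, j;

  for (i = 0; i < n_att_id; i++)          /* columns the executor changed */
    for (j = 0; j < index->n_atts; j++)   /* columns in this index's key */
      if (att_id[i] == index->atts[j]->id)
        return true;                      /* key may change: touch index */
  return false;                           /* key unchanged: skip index */
}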

// locator_delete_force — src/transaction/locator_sr.c
int
locator_delete_force (THREAD_ENTRY *thread_p, HFID *hfid, OID *oid,
                      int has_index, int op_type, HEAP_SCANCACHE *scan_cache,
                      int *force_count, MVCC_REEV_DATA *mvcc_reev_data,
                      bool need_locking)
{
  return locator_delete_force_internal (thread_p, hfid, oid, has_index,
                                        op_type, scan_cache, force_count,
                                        mvcc_reev_data, FOR_INSERT_OR_DELETE,
                                        NULL, NULL, need_locking);
}

The for_moving variant (locator_delete_force_for_moving) is the partitioned-table case: an UPDATE that moves a row from partition A to partition B is implemented as delete from A + insert into B, and the delete side carries the new OID + new partition class so HA replication and trigger logic know it is a move, not a real deletion. Both variants share locator_delete_force_internal, which (1) reads the row to confirm the key, (2) calls locator_add_or_remove_index with is_insert=false to remove the keys from every covered B-tree, (3) calls heap_delete_logical which sets mvcc_del_id (no physical removal — see cubrid-mvcc.md), (4) writes the HA replication record, (5) the WAL entry is again implicit in the heap/btree primitives.

// locator_sr.c — index orchestration entries
extern int locator_add_or_remove_index (THREAD_ENTRY *thread_p,
                                        RECDES *recdes,
                                        OID *inst_oid, OID *class_oid,
                                        int is_insert, int op_type,
                                        HEAP_SCANCACHE *scan_cache,
                                        bool datayn, bool need_replication,
                                        HFID *hfid,
                                        FUNC_PRED_UNPACK_INFO *func_preds,
                                        bool has_BU_lock,
                                        bool skip_checking_fk);
extern int locator_update_index (THREAD_ENTRY *thread_p,
                                 RECDES *new_recdes, RECDES *old_recdes,
                                 ATTR_ID *att_id, int n_att_id,
                                 OID *oid, OID *class_oid,
                                 int op_type, HEAP_SCANCACHE *scan_cache,
                                 REPL_INFO *repl_info);

Both call into locator_add_or_remove_index_internal, which allocates a HEAP_CACHE_ATTRINFO index_attrinfo (the per-index decoder cache; see cubrid-heap-manager.md’s “AttrInfo cache”), then iterates over or_classrep->indexes[], the parsed list of B-trees from the class representation:

// locator_add_or_remove_index_internal — sketch
heap_attrinfo_start_with_index (thread_p, class_oid, NULL, &index_attrinfo,
                                &idx_info);
num_btids = idx_info.num_btids;
for (i = 0; i < num_btids; i++)
  {
    index = &index_attrinfo.last_classrepr->indexes[i];
    btid = index->btid;

    /* Skip indexes that are functional / partial / whose filter
       excludes this row. */
    if (or_pred && !pred_eval (or_pred, recdes))
      continue;

    /* Compute the key. heap_attrvalue_get_key extracts the key
       columns out of recdes using attr_info, encoding multi-column
       keys via tp_value_string_to_key_value. */
    key_dbvalue = heap_attrvalue_get_key (thread_p, i, &index_attrinfo,
                                          recdes, btid, &dbvalue, ...);

    /* Insert or delete the (key, inst_oid) pair. */
    if (is_insert)
      btree_insert (thread_p, btid, key_dbvalue, class_oid, inst_oid, ...,
                    &unique_pk, ...);
    else
      btree_delete (thread_p, btid, key_dbvalue, class_oid, inst_oid, ...,
                    &unique_pk, ...);
  }

locator_check_unique_btree_entries is the integrity-check variant used by CHECKDB and the post-restore consistency pass — it walks every B-tree of every class and confirms that every leaf entry has a corresponding heap row, and that no two leaf entries map the same unique key to different OIDs.

Foreign keys are orchestrated by locator_check_foreign_key:

// locator_check_foreign_key — src/transaction/locator_sr.c (signature)
static int
locator_check_foreign_key (THREAD_ENTRY *thread_p, HFID *hfid,
                           OID *class_oid, OID *inst_oid, RECDES *recdes,
                           RECDES *new_recdes, bool *is_cached,
                           LC_COPYAREA **cache_attr_copyarea);

It walks the FK list on the class representation, extracts the referencing-column key, and probes the parent class’s PK B-tree via btree_keyoid_checks. On miss, the row insert or update is rejected with ER_FK_INVALID.

The catalog itself is reached via the locator. The “root class” is itself an OID (oid_Root_class_oid); every user class is an OID in the root class’s heap. When the executor needs to know the layout of T1, it asks for T1’s class OID and reads it as a record.

// xlocator_find_class_oid — src/transaction/locator_sr.c
extern LC_FIND_CLASSNAME xlocator_find_class_oid (THREAD_ENTRY *thread_p,
                                                  const char *classname,
                                                  OID *class_oid, LOCK lock);

This is the catalog lookup. It returns one of LC_CLASSNAME_EXIST / LC_CLASSNAME_DELETED / LC_CLASSNAME_ERROR. The mapping itself is held in a memory-hash table locator_Mht_classnames (file-scope in locator_sr.c), keyed by class name.
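
A hedged usage sketch (the IX_LOCK constant is an assumption about the intended lock mode; error handling elided):

// Hedged sketch — resolving a class name to its OID with an intent lock.
OID class_oid;

switch (xlocator_find_class_oid (thread_p, "t1", &class_oid, IX_LOCK))
  {
  case LC_CLASSNAME_EXIST:
    /* class_oid is valid; the lock is held for the upcoming row work */
    break;
  case LC_CLASSNAME_DELETED:
  case LC_CLASSNAME_ERROR:
  default:
    /* report "table not found" */
    break;
  }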

Two transient-name mechanisms stack on top:

  • Reservation — xlocator_reserve_class_names / xlocator_reserve_class_name. During CREATE TABLE in a transaction, the name is reserved but the class is not committed yet. A second concurrent CREATE TABLE with the same name observes the reservation and fails or waits.
  • Drop transient on commit/abort — locator_drop_transient_class_name_entries / locator_savepoint_transient_class_name_entries reconcile the reservation set with the durable hash on transaction boundaries and on savepoint rollback.

On the client side, locator_fetch_class and locator_fetch_class_with_classmop pull the class object into the workspace via the same LC_COPYAREA mechanism as instance fetch. The workspace’s MOP for the class is the durable handle the executor uses for the rest of its work; the schema cache built on top is what backs HEAP_CLASSREPR_CACHE at the storage layer.

// locator_sr.c — snapshot-aware reads
extern SCAN_CODE locator_get_object (THREAD_ENTRY *thread_p,
                                     const OID *oid, OID *class_oid,
                                     RECDES *recdes,
                                     HEAP_SCANCACHE *scan_cache,
                                     SCAN_OPERATION_TYPE op_type,
                                     LOCK lock_mode, int ispeeking, int chn);
extern SCAN_CODE locator_lock_and_get_object (THREAD_ENTRY *thread_p,
                                              const OID *oid, OID *class_oid,
                                              RECDES *recdes,
                                              HEAP_SCANCACHE *scan_cache,
                                              LOCK lock, int ispeeking,
                                              int old_chn,
                                              NON_EXISTENT_HANDLING handling);
extern SCAN_CODE locator_lock_and_get_object_with_evaluation (
  THREAD_ENTRY *thread_p, OID *oid, OID *class_oid, RECDES *recdes,
  HEAP_SCANCACHE *scan_cache, SCAN_OPERATION_TYPE op_type, LOCK lock,
  int ispeeking, int chn, MVCC_REEV_DATA *mvcc_reev_data,
  UPDATE_INPLACE_STYLE inplace);

locator_get_object is the read counterpart of the force family. It is what the executor calls during scan, and what the force functions call to read the old image during UPDATE / DELETE. The body decides the lock mode automatically based on op_type and whether the class is MVCC-disabled:

// locator_get_object — body sketch (src/transaction/locator_sr.c)
if (!OID_IS_ROOTOID (class_oid))
  {
    if (op_type == S_SELECT && !mvcc_is_mvcc_disabled_class (class_oid))
      lock_mode = NULL_LOCK;    /* MVCC: no lock */
    else if (op_type == S_DELETE || op_type == S_UPDATE)
      lock_mode = X_LOCK;
    else
      lock_mode = S_LOCK;       /* SELECT FOR UPDATE / non-MVCC */
  }
if (op_type == S_SELECT && lock_mode == NULL_LOCK)
  scan_code = heap_get_visible_version_internal (thread_p, &context, false);
else
  scan_code = locator_lock_and_get_object_internal (thread_p, &context,
                                                    lock_mode);

This is where the cubrid-mvcc.md claim (“MVCC headers are stamped by locator_* flows”) gets cashed out: every read above the heap layer goes through this function, and it is the function that knows when to take a lock (op_type-driven), when to take a snapshot (the MVCC-disabled-class check), and which scan code to return (S_SUCCESS / S_DOESNT_EXIST / S_SNAPSHOT_NOT_SATISFIED / S_ERROR).

locator_lock_and_get_object_with_evaluation is the variant that re-evaluates the predicate after acquiring the lock — used in SERIALIZABLE / REPEATABLE READ to detect that the row no longer matches the WHERE clause after a concurrent UPDATE committed (the “have I lost my row?” check in MVCC). The evaluation re-runs the WHERE clause against the now-locked record; on V_FALSE, the row is skipped.
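
The shape of that re-evaluation, as a hedged sketch inside a scan loop (the filter_result field of MVCC_REEV_DATA is an assumption; see cubrid-mvcc.md):

// Hedged sketch — the "have I lost my row?" check after lock acquisition.
scan = locator_lock_and_get_object_with_evaluation (thread_p, oid,
                                                    &class_oid, &recdes,
                                                    scan_cache, S_UPDATE,
                                                    X_LOCK, COPY, NULL_CHN,
                                                    mvcc_reev_data,
                                                    UPDATE_INPLACE_NONE);
if (scan == S_SUCCESS && mvcc_reev_data != NULL
    && mvcc_reev_data->filter_result == V_FALSE)
  /* a concurrent committed UPDATE moved the row out of the WHERE set */
  continue;                     /* skip this row */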

src/transaction/locator.c is the plumbing: serialization helpers for LC_COPYAREA, LC_LOCKSET, LC_LOCKHINT, LC_OIDSET to and from network buffers, and the shared free-list of areas (locator_initialize_areas / locator_free_areas).

// locator.c — packing entries
extern char *locator_pack_copy_area_descriptor (int num_objs, LC_COPYAREA *,
                                                char *desc, int desc_len);
extern char *locator_unpack_copy_area_descriptor (int num_objs, LC_COPYAREA *,
                                                  char *desc,
                                                  int packed_size);
extern int locator_pack_lockset (LC_LOCKSET *, bool pack_classes,
                                 bool pack_objects);
extern int locator_unpack_lockset (LC_LOCKSET *, bool unpack_classes,
                                   bool unpack_objects);
extern int locator_pack_lockhint (LC_LOCKHINT *, bool pack_classes);
extern int locator_unpack_lockhint (LC_LOCKHINT *, bool unpack_classes);
extern char *locator_pack_oid_set (char *buffer, LC_OIDSET *);
extern LC_OIDSET *locator_unpack_oid_set_to_new (THREAD_ENTRY *,
                                                 char *buffer);

The packing is separate from the body. LC_COPYAREA.mem carries the row bodies (already in network-endian by the upper-layer encoder); pack_copy_area_descriptor packs only the descriptors (LC_COPYAREA_MANYOBJS + N × LC_COPYAREA_ONEOBJ) into a caller-provided byte array. This lets the network layer send the two parts as separate buffers, avoiding a copy. The LC_AREA_ONEOBJ_PACKED_SIZE macro in locator.h computes the fixed packed size of a descriptor (4 ints + 1 HFID + 2 OIDs).

flowchart TB
  subgraph CL["client side"]
    WS["workspace dirty list"]
    MF["LOCATOR_MFLUSH_CACHE\n(staging)"]
    CA["LC_COPYAREA"]
    WS --> MF --> CA
  end
  subgraph WIRE["wire"]
    PD["packed descriptors\n(network-endian)"]
    PB["packed bodies\n(already encoded)"]
    CA -- "pack_copy_area_descriptor" --> PD
    CA -- "(zero-copy)" --> PB
  end
  subgraph SR["server side"]
    UCA["LC_COPYAREA reconstruct\n(unpack_copy_area_descriptor)"]
    XF["xlocator_force"]
    LOOP["per-OID dispatch:\nfor each ONEOBJ\n  switch operation\n    locator_∗_force"]
    PD --> UCA
    PB --> UCA
    UCA --> XF --> LOOP
  end
sequenceDiagram
    participant APP as application
    participant WS as workspace
    participant LCL as locator_cl
    participant NET as wire
    participant XS as xlocator_force
    participant LSR as locator_sr
    participant HM as heap_manager
    participant BT as btree
    participant LK as lock_manager
    participant LG as log_manager
    participant HA as repl_log

    APP->>WS: db_create / db_get / set_attr
    WS->>WS: mark MOP dirty (operation = INSERT/UPDATE)
    APP->>WS: commit
    WS->>LCL: locator_all_flush
    LCL->>LCL: locator_mflush_initialize
    LCL->>LCL: ws_map_dirty(locator_mflush)
    loop per dirty MOP
      LCL->>LCL: encode object → RECDES
      LCL->>LCL: append LC_COPYAREA_ONEOBJ
    end
    LCL->>NET: locator_force(copy_area)
    NET->>XS: xlocator_force
    XS->>XS: tran_server_start_topop (atomic)
    loop per ONEOBJ in copy_area
      XS->>LSR: locator_attribute_info_force
      LSR->>LK: lock_object (X_LOCK)
      LSR->>HM: heap_insert_logical / heap_update_logical / heap_delete_logical
      HM->>LG: log_append (redo/undo)
      LSR->>BT: locator_add_or_remove_index / locator_update_index
      BT->>LG: log_append (redo/undo for btree)
      LSR->>LSR: locator_check_foreign_key (if FK present)
      LSR->>HA: repl_log_insert (if HA enabled)
    end
    XS->>XS: tran_server_end_topop
    XS->>NET: reply LC_COPYAREA (with perm OIDs)
    NET->>LCL: receive
    LCL->>WS: ws_update_oid_and_class (temp → perm)
    WS->>APP: commit returns

The HA replication step deserves a note: locator_attribute_info_force is the canonical handle for replication, which is why the cubrid-ha-replication.md doc walks locator_attribute_info_force → heap_*_logical → btree_update → repl_log_insert. The replication record is built from the locator’s RECDES (already encoded) plus the OID and class OID; repl_info.repl_info_type (REPL_INFO_TYPE_RBR_NORMAL for row-based replication) selects the format. Because every DML goes through this one function, the replication coverage is complete by construction — there is no “I forgot to replicate this code path” failure mode short of someone introducing a new DML operation that bypasses the locator.

Anchor on symbol names, not line numbers. The CUBRID source moves; a function name (or struct/enum tag) is the stable handle. Use git grep -n '<symbol>' src/transaction/ to locate the current position. The line numbers in the position-hint table at the end of this section were observed when the document was last updated and are intended only as quick hints.

Type definitions (src/transaction/locator.h)

  • enum LC_COPYAREA_OPERATION — 11-value enum: LC_FETCH, LC_FETCH_DELETED, LC_FETCH_DECACHE_LOCK, LC_FLUSH_INSERT, LC_FLUSH_INSERT_PRUNE, LC_FLUSH_INSERT_PRUNE_VERIFY, LC_FLUSH_DELETE, LC_FLUSH_UPDATE, LC_FLUSH_UPDATE_PRUNE, LC_FLUSH_UPDATE_PRUNE_VERIFY, LC_FETCH_VERIFY_CHN. The _PRUNE suffixes mark partition-pruning variants.
  • enum LC_FETCH_VERSION_TYPE — 4-value enum: LC_FETCH_CURRENT_VERSION, LC_FETCH_MVCC_VERSION, LC_FETCH_DIRTY_VERSION, LC_FETCH_CURRENT_VERSION_NO_CHECK.
  • struct lc_copyarea_oneobj (LC_COPYAREA_ONEOBJ) — per-object descriptor: operation, flag, hfid, class_oid, oid, length, offset.
  • struct lc_copyarea_manyobjs (LC_COPYAREA_MANYOBJS) — header for the descriptor array (objs first, multi_update_flags, num_objs).
  • struct lc_copy_area (LC_COPYAREA) — (mem, length): the buffer.
  • struct lc_lock_set (LC_LOCKSET) — bulk-fetch request: num_reqobjs, LC_LOCKSET_REQOBJ *objects, LC_LOCKSET_CLASSOF *classes, instance/class lock modes.
  • struct lc_lock_hint (LC_LOCKHINT) — lockhint area: list of classes to prefetch with their locks.
  • struct lc_oidset / lc_class_oidset / lc_oidmap — permanent-OID assignment request.
  • enum lc_prefetch_flags — LC_PREF_FLAG_LOCK, LC_PREF_FLAG_COUNT_OPTIM.
  • enum MULTI_UPDATE_FLAG — IS_MULTI_UPDATE, START_MULTI_UPDATE, END_MULTI_UPDATE.
  • LC_FLAG_HAS_INDEX, LC_FLAG_UPDATED_BY_ME, LC_FLAG_HAS_UNIQUE_INDEX, LC_FLAG_TRIGGER_INVOLVED — per-object flag bits.

Workspace — client side (src/transaction/locator_cl.c)

  • struct locator_mflush_cache — staging buffer for bulk flush.
  • struct locator_mflush_temp_oid — temp-OID list link.
  • struct locator_cache_lock — per-fetch lock context.
  • locator_is_root / locator_is_class — type predicates on a MOP.
  • locator_fetch_object / _class / _class_of_instance / _instance / _set / _nested — fetch entries (mapped to locator_lock / locator_lock_set / locator_lock_nested).
  • locator_lock / locator_lock_set / locator_lock_nested — build LC_LOCKSET, round-trip to server.
  • locator_cache_lock / locator_cache_lock_set — workspace-side lock caching.
  • locator_cache / locator_cache_object_class / _cache_object_instance / _cache_have_object / _cache_not_have_object — unpack a server reply LC_COPYAREA into the workspace.
  • locator_flush_class / _instance / _all_instances / _for_multi_update / _all_flush — public flush entries.
  • locator_flush_and_decache_instance — flush + drop from cache.
  • locator_mflush — per-MOP encode-and-pack (ws_map_dirty callback).
  • locator_mflush_initialize / _reset / _end / _reallocate_copy_area — staging lifecycle.
  • locator_mflush_force — drain staging buffer to server, reconcile temp→perm OIDs, free per-flush state.
  • locator_mem_to_disk / locator_class_to_disk — encoders (instance / class).
  • locator_add_class / _add_instance / _add_root — workspace inserts.
  • locator_remove_class / _remove_instance — workspace deletes.
  • locator_update_instance / _update_class / _update_tree_classes — workspace updates that mark dirty.
  • locator_prepare_rename_class — name-reservation handshake during ALTER TABLE RENAME.
  • locator_force — wire send (calls net_client_request_recv_copyarea).
  • locator_repl_* — replication-side flush variants (locator_repl_mflush_force, locator_repl_flush_all).

Server-side — DML fan-in (src/transaction/locator_sr.c)

  • locator_attribute_info_force — top-level entry; switch on LC_COPYAREA_OPERATION.
  • locator_insert_force — heap insert + index insert + FK check.
  • locator_update_force (static) — heap update + diff-driven index update + FK check + replication.
  • locator_delete_force / locator_delete_force_for_moving / locator_delete_force_internal — heap delete + index delete + FK cascade + replication.
  • locator_move_record — partition-move helper (delete on A + insert on B with linkage).
  • locator_force_for_multi_update — multi-update path with trigger-aware ordering.
  • xlocator_force — wire entry; loops over LC_COPYAREA_ONEOBJs and calls per-op force functions inside a top-op.
  • xlocator_force_repl_update — HA-applier variant.
  • locator_add_or_remove_index (extern) / locator_add_or_remove_index_for_moving (static) / locator_add_or_remove_index_internal (static) — per-index loop for INSERT/DELETE.
  • locator_update_index — per-index loop for UPDATE (diff-driven by att_id[]).
  • locator_check_foreign_key (static) — FK existence probe.
  • locator_check_unique_btree_entries — CHECKDB integrity sweep.
  • locator_check_btree_entries / locator_check_class / locator_check_by_class_oid / locator_check_all_entries_of_all_btrees — the rest of the integrity-check family.
  • locator_was_index_already_applied — guard against double application of a shared B-tree (PK ↔ FK overlap).
  • xlocator_check_fk_validity — wire-callable FK validator (used during ALTER TABLE ADD CONSTRAINT).
  • xlocator_assign_oid — pre-mint a single permanent OID (heap_assign_address + name binding).
  • xlocator_assign_oid_batch — batch variant for LC_OIDSET.
  • xlocator_find_class_oid — class-name → class-OID with lock.
  • locator_permoid_class_name — bind a freshly-minted OID to a reserved class name.
  • xlocator_reserve_class_names / xlocator_reserve_class_name / xlocator_get_reserved_class_name_oid — name-reservation protocol for CREATE TABLE.
  • xlocator_delete_class_name / xlocator_rename_class_name — catalog name maintenance.
  • locator_drop_transient_class_name_entries / locator_savepoint_transient_class_name_entries — reconcile the transient name set on commit / abort / savepoint.
  • locator_check_class_names / locator_dump_class_names — diagnostics over the name hash.
  • locator_get_object — switch on op_type to choose lock mode; dispatch to heap_get_visible_version_internal (no lock) or locator_lock_and_get_object_internal (with lock).
  • locator_lock_and_get_object — explicit-lock entry.
  • locator_lock_and_get_object_with_evaluation — re-evaluate predicate after acquiring lock (for SERIALIZABLE / RR semantics).
  • locator_lock_and_get_object_internal (static) — shared body.

Bulk fetch (the wire-side dual of bulk flush)

  • xlocator_fetch — single-OID server-side fetch (returns one LC_COPYAREA).
  • xlocator_fetch_all — heap scan returning one LC_COPYAREA per page-batch (used during boot to populate caches).
  • xlocator_fetch_lockset — bulk fetch of LC_LOCKSET request.
  • xlocator_fetch_all_reference_lockset — transitive-closure fetch (referenced classes/instances).
  • xlocator_fetch_lockhint_classes — class-prefetch from LC_LOCKHINT.
  • locator_lock_and_return_object (static) — per-OID body of the bulk fetch path.
  • locator_return_object_assign (static) — pack one OID into the reply LC_COPYAREA.
  • locator_all_reference_lockset (static) — build the full reference closure for an OID.
  • locator_find_lockset_missing_class_oids (static) — fill in LC_LOCKSET_REQOBJ.class_index for objects whose class was unknown to the caller.
  • locator_guess_sub_classes (static) — expand subclass references in LC_LOCKHINT.
  • xlc_fetch_allrefslockset — wire entry into the reference-closure path.

Lifecycle and module state (src/transaction/locator_sr.c)

  • locator_initialize / locator_finalize — create / destroy the locator_Mht_classnames hash.
  • locator_Pseudo_pageid_first / _Pseudo_pageid_last / _Pseudo_pageid_crt — pseudo-pageid range used for reservations during transient-name handling.
  • locator_Mht_classnames — module-scope hash from class name to cached class OID + reservation state.
  • locator_repl_prepare_force (static) — pre-check for HA-applier flush (resolves key, fetches old OID).
  • locator_repl_get_key_value (static) — extract key columns from the replication record.
  • locator_repl_add_error_to_copyarea (static) — pack a per-object error result back into the reply.
  • xlocator_redistribute_partition_data — reshape partition data after ALTER TABLE ... REORGANIZE PARTITION.
  • locator_rv_redo_rename — recovery hook for class renames.
  • locator_allocate_copy_area_by_length / locator_reallocate_copy_area_by_length / locator_free_copy_area — buffer lifecycle.
  • locator_pack_copy_area_descriptor / locator_unpack_copy_area_descriptor.
  • locator_send_copy_area — split into (contents, descriptor) for the network layer.
  • locator_recv_allocate_copyarea — server-side mirror.
  • locator_allocate_lockset / _reallocate_lockset / _free_lockset / locator_pack_lockset / locator_unpack_lockset / locator_allocate_and_unpack_lockset.
  • locator_allocate_lockhint / _reallocate_lockhint / _free_lockhint / locator_pack_lockhint / locator_unpack_lockhint / locator_allocate_and_unpack_lockhint.
  • locator_initialize_areas / locator_free_areas — module-wide free lists.
  • locator_make_oid_set / _clear_oid_set / _free_oid_set / _add_oid_set / _get_packed_oid_set_size / _pack_oid_set / _unpack_oid_set_to_new / _unpack_oid_set_to_exist.
  • locator_manyobj_flag_is_set / _remove / _set — multi-update flag manipulation on LC_COPYAREA_MANYOBJS.

The Line column reflects positions observed when the doc was last updated and decays over time. If you land at a different definition, the symbol name is authoritative; update the table on the way through.

| Symbol | File | Line |
| --- | --- | --- |
| enum LC_COPYAREA_OPERATION | locator.h | 107 |
| enum LC_FETCH_VERSION_TYPE | locator.h | 179 |
| struct lc_copyarea_oneobj | locator.h | 224 |
| struct lc_copyarea_manyobjs | locator.h | 243 |
| struct lc_copy_area | locator.h | 252 |
| struct lc_lock_set | locator.h | 286 |
| struct lc_lock_hint | locator.h | 329 |
| struct lc_oidset | locator.h | 395 |
| struct locator_mflush_cache | locator_cl.c | 69 |
| struct locator_mflush_temp_oid | locator_cl.c | 61 |
| locator_fetch_object | locator_cl.c | (varies) |
| locator_fetch_class | locator_cl.c | (varies) |
| locator_fetch_set | locator_cl.c | (varies) |
| locator_mflush | locator_cl.c | 4435 |
| locator_mflush_initialize | locator_cl.c | 3802 |
| locator_mflush_force | locator_cl.c | 3995 |
| locator_flush_class | locator_cl.c | 4890 |
| locator_flush_instance | locator_cl.c | 5058 |
| locator_all_flush | locator_cl.c | 5279 |
| locator_initialize | locator_sr.c | 246 |
| locator_finalize | locator_sr.c | 364 |
| xlocator_reserve_class_names | locator_sr.c | 409 |
| xlocator_find_class_oid | locator_sr.c | 1033 |
| xlocator_assign_oid | locator_sr.c | 2043 |
| xlocator_fetch | locator_sr.c | 2374 |
| xlocator_fetch_all | locator_sr.c | 2772 |
| xlocator_fetch_lockset | locator_sr.c | 3052 |
| xlocator_fetch_all_reference_lockset | locator_sr.c | 3818 |
| locator_check_foreign_key | locator_sr.c | 4023 |
| locator_insert_force | locator_sr.c | 4938 |
| locator_delete_force | locator_sr.c | 6116 |
| locator_delete_force_internal | locator_sr.c | 6172 |
| locator_force_for_multi_update | locator_sr.c | 6543 |
| xlocator_force | locator_sr.c | 7129 |
| locator_attribute_info_force | locator_sr.c | 7461 |
| locator_add_or_remove_index | locator_sr.c | 7695 |
| locator_add_or_remove_index_internal | locator_sr.c | 7760 |
| locator_update_index | locator_sr.c | 8260 |
| locator_check_unique_btree_entries | locator_sr.c | 9768 |
| xlocator_fetch_lockhint_classes | locator_sr.c | 11356 |
| xlocator_assign_oid_batch | locator_sr.c | 11577 |
| xlocator_check_fk_validity | locator_sr.c | 11754 |
| locator_lock_and_get_object_internal | locator_sr.c | 12936 |
| locator_lock_and_get_object_with_evaluation | locator_sr.c | 13100 |
| locator_get_object | locator_sr.c | 13241 |
| locator_lock_and_get_object | locator_sr.c | 13352 |
| locator_allocate_copy_area_by_length | locator.c | (varies) |
| locator_pack_copy_area_descriptor | locator.c | (varies) |
| locator_pack_lockset | locator.c | (varies) |
| locator_pack_lockhint | locator.c | (varies) |
| locator_pack_oid_set | locator.c | (varies) |

locator_sr.c is ≈ 14 000 lines and locator_cl.c is ≈ 7 100 lines; symbol-level git grep is the recommended lookup.

This document was written without raw analysis materials, so the cross-checks below are against the other CUBRID code-analysis docs that mention the locator. They are paraphrases of the claims in those docs followed by what this reading confirmed or refined.

  • cubrid-heap-manager.md says “locator_insert/update/delete_force are the primary callers of heap_*_logical.” Confirmed. locator_insert_force calls heap_create_insert_context then heap_insert_logical; locator_delete_force_internal calls heap_delete_logical; locator_update_force calls heap_update_logical. The claim is precise — there are very few other call sites of the heap logicals, and those (e.g., recovery-time replays, vacuum) are special by design.

  • cubrid-ha-replication.md walks locator_attribute_info_force → heap_*_logical → btree_update → repl_log_insert. Confirmed structurally. The replication record is built inside locator_insert_force / locator_update_force / locator_delete_force_internal after the heap operation succeeds and after the index updates, so the replication record always describes the post-state. The repl_info argument threads through locator_attribute_info_force and is the choice point: REPL_INFO_TYPE_RBR_NORMAL for row-based, REPL_INFO_TYPE_STMT_NORMAL for statement-based (now rare), and REPL_INFO_TYPE_RBR_AT_LEAST_ONE_RECORD for the multi-row case.

  • cubrid-lock-manager.md says “locks are acquired through locator paths”. Refined. The lock acquisition site depends on who initiated the operation:

    • Executor-driven path (e.g., UPDATE WHERE): the X-lock is taken during the SELECT-with-WHERE phase by locator_lock_and_get_object (called from scan_manager.c), and locator_attribute_info_force is then called with need_locking=false.
    • Workspace-driven path (client flushes a previously-fetched object the server has not seen X-locked yet): the X-lock is taken inside locator_attribute_info_force itself, in the LC_FLUSH_UPDATE branch’s locator_lock_and_get_object call, with need_locking=true.
    • Force-update-in-place path (catalog updates, recovery): force_update_inplace is UPDATE_INPLACE_OLD_MVCCID or similar; the function uses heap_get_last_version, which does not consult MVCC and does not lock. The caller is responsible for upstream locking.

  Either way, the locator is the lens through which row locks are taken.
  • cubrid-mvcc.md says “MVCC headers are stamped by locator_* flows”. Confirmed. The MVCC header (mvcc_rec_header) is stamped inside the heap’s heap_insert_logical / heap_update_logical / heap_delete_logical, which are called only from the locator’s force functions on the DML path. The one MVCC field the locator touches directly is the inserter MVCCID for newly-allocated heap rows, via the mvcc_rec_header[2] array passed down through locator_add_or_remove_index_internal to the B-tree (so unique checks see the right inserter). The cross-reference is precise.

  • cubrid-page-buffer-manager.md mentions PGBUF_WATCHER in the context of multi-page latch ordering. The locator honors this contract: the force functions accept a PGBUF_WATCHER *home_hint_p and pass it to heap_insert_logical, which uses it to keep the home page latched between the slot search and the row write. This is the page-buffer-side reason the bulk insert path is fast — the locator forwards the watcher rather than refixing the page on each call.

  1. What is the actual cost of the bidirectional LC_COPYAREA layout? The layout grows row bodies forward and descriptors backward, meeting in the middle. The locator_mflush_reallocate_copy_area path triggers when they collide. Empirically, what is the distribution of “did we have to reallocate” vs “did we fit” under typical OLTP workloads? Investigation: instrument reallocations under a large write workload; revisit the default COPYAREA size.

  2. Why are there *_PRUNE and *_PRUNE_VERIFY variants of every flush operation? LC_FLUSH_INSERT_PRUNE triggers partition-pruning on the server; _PRUNE_VERIFY adds an extra “did we end up in the right partition?” check. The verify variant looks defensive against client/server schema-version skew. Investigation: trace where each variant is produced on the client (locator_mem_to_disk and friends) and document the trigger conditions.

  3. Multi-update flag semantics. IS_MULTI_UPDATE, START_MULTI_UPDATE, END_MULTI_UPDATE ride on the LC_COPYAREA_MANYOBJS header. The server’s locator_force_for_multi_update keys off IS_MULTI_UPDATE, but the START/END pair appears to bracket unique-statistics gathering across multiple LC_COPYAREAs of the same logical UPDATE. Investigation: find the producer and consumer of START_MULTI_UPDATE in the unique-stats path (btree_unique_stats).

  4. The classnames hash (locator_Mht_classnames) is module-scope and implicitly singleton. Resizing semantics under heavy DDL? Investigation: check whether the hash is bounded in size and what eviction policy (if any) applies.

  5. LC_FETCH_VERSION_TYPE covers four values, but LC_FETCH_CURRENT_VERSION_NO_CHECK is described in code comments as “skip server-side checks”. Which checks? Tracing xlocator_fetch: it sets skip_fetch_version_type_check = true and then treats _NO_CHECK like _CURRENT. The “checks” in question seem to be the assert (lock_get_object_lock(...) != NULL_LOCK) precondition — i.e., the caller is asserting it already holds the lock. Worth a more careful audit of every consumer.

  6. The locator_force (client) vs xlocator_force (server) naming convention. The pattern is consistent: locator_* is client-side or shared, xlocator_* is the server-callable wire entry (the x prefix denotes “external/wire entry” in CUBRID’s convention). Worth confirming this is the only prefix rule.

  7. Workspace decache eviction policy. The decache flag on LOCATOR_MFLUSH_CACHE and on locator_flush_and_decache_instance lets a flush also drop the MOP from the cache. When does the upper layer use this? Cache-pressure-driven evictions seem to live in work_space.c itself; tracing the call sites would clarify the contract.

  8. The xlocator_fetch_all_reference_lockset reference-closure path. Following references transitively can blow up under wide schemas. What stops it? Investigation: read locator_all_reference_lockset carefully; identify the prune_level and quit_on_errors interplay.

CUBRID source (under /data/hgryoo/references/cubrid/)

  • src/transaction/locator.h — public types and macros (LC_COPYAREA, LC_COPYAREA_ONEOBJ, LC_COPYAREA_MANYOBJS, LC_LOCKSET, LC_LOCKHINT, LC_OIDSET, LC_COPYAREA_OPERATION, LC_FETCH_VERSION_TYPE).
  • src/transaction/locator.c — wire packing (locator_pack_copy_area_descriptor, locator_pack_lockset, locator_pack_lockhint, locator_pack_oid_set), area allocation / free-list management.
  • src/transaction/locator_cl.h — client-side public API (locator_fetch_*, locator_flush_*, locator_add_*, locator_remove_*, locator_update_*).
  • src/transaction/locator_cl.c — workspace bridge (LOCATOR_MFLUSH_CACHE, locator_mflush*, locator_cache*, locator_lock*, locator_force).
  • src/transaction/locator_sr.h — server-side public API (locator_attribute_info_force, locator_get_object, locator_lock_and_get_object, locator_add_or_remove_index, locator_update_index, locator_check_class*).
  • src/transaction/locator_sr.c — server-side fan-in (xlocator_force, xlocator_fetch*, force family, constraint orchestration, OID lifecycle, classnames hash, reference-closure fetch).

Cross-reference docs in this knowledge base

  • cubrid-heap-manager.md — heap_insert/update/delete_logical and the slotted-page substrate the locator drives.
  • cubrid-mvcc.md — mvcc_rec_header stamping and the visibility predicate the locator’s reads consult.
  • cubrid-lock-manager.md — lock modes and acquisition points threaded through locator_lock_and_get_object.
  • cubrid-page-buffer-manager.md — the PGBUF_WATCHER chains the force family preserves across pages.
  • cubrid-btree.md — B-tree primitives invoked by locator_add_or_remove_index_internal.
  • cubrid-ha-replication.md — HA replication record produced inside locator_attribute_info_force.
  • cubrid-catalog-manager.md — root-class and per-class catalog the classnames hash backs.

Textbook chapters (under knowledge/research/dbms-general/)

  • Database Internals (Petrov), Ch. 3 “File Formats” and Ch. 4 “Implementing B-Trees” — RID/OID/TID semantics, identity under compaction.
  • Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom), §10.6 “Object-Oriented Database Systems” — workspace pattern, MOP vs persistent identity.
  • Database System Concepts (Silberschatz, Korth, Sudarshan, 6th ed.), Ch. 13 “Storage and File Structure” and Ch. 17 “Database System Architectures” — client/server boundary, bulk-fetch motivation.