CUBRID Lock Manager

Multi-Granularity Locking, Conversion, Deadlock Detection

2026-06 · hgryoo · Code Analysis Seminar

© 2026 CUBRID Corporation. All rights reserved.

Agenda

  1. The problem — what a lock manager is for
  2. Theory — 2PL phases & variants, MGL, lub; what every DBMS lock manager shares
  3. Data structures — OID keys, LK_ENTRY dual-view, lk_Gl, boot & memory tiers
  4. The life of a lock request — paths A–G, conversion & UPR, release & NON2PL, escalation, suspend/resume
  5. Deadlock — WFG construction, cycle scan, six-criteria victim selection
  6. Special paths — instant probe, composite lock, hint / demote / subclass

Sources: cubrid-lock-manager.md (design intent) + cubrid-lock-manager-detail.md (code-level deep dive).
Appendix A: Lock vs Latch, in depth. Appendix B: the lock-table hash function.

© 2026 CUBRID Corporation. All rights reserved.

Start conservative — one transaction at a time

The trivially correct scheduler: one exclusive lock on the entire database.

center

  • Serial by construction — every transaction has the database to itself, so nothing can possibly go wrong. The cost: a 10 ms read stuck behind a 10 s batch waits 1,000× longer than its own work.
  • The fix is a dial, not a switch: shrink the unit of exclusion — database → class → row — so non-overlapping work runs in parallel. The same dial returns at intention modes and escalation.
  • Shrinking the grain is what admits interleavings — the next slide catalogs the five you just let in.
© 2026 CUBRID Corporation. All rights reserved.

The problem — what a lock manager is for

Interleave transactions on shared data without coordination and ACID's I (Isolation) breaks — exactly five classical anomalies appear:

Anomaly Scenario What goes wrong
Dirty read T₁ reads T₂'s uncommitted write T₂ rolls back → T₁ saw a value that "never existed"
Lost update T₁ and T₂ both read n, both write n+1 one of the increments is silently lost
Non-repeatable read T₁ reads R twice; T₂ updates R between them second read disagrees with the first
Phantom T₁ runs WHERE color='red' twice; T₂ inserts a red row second read returns a new row
Write skew T₁, T₂ both read R; each writes a different cell using R both updates pass their pre-condition; together they violate it
  • Goal: keep the concurrency but stay serializable — every result one a serial run (previous slide) could have produced.
  • Lock manager's role: the lock primitive — "I am reading / writing this object" — plus conflict arbitration (grant, queue, escalate, abort as deadlock victim).
  • Locks alone are not enough: lock-and-release per access still fails — the moment T₁ releases R, T₂ slips in between its steps; the missing piece is a discipline on lock timing → next slide.
  • Isolation levels (RC / RR / Serializable) are tunable trade-offs: which anomalies you tolerate for more concurrency.
© 2026 CUBRID Corporation. All rights reserved.

2PL — two phases, one lock point

How to allow interleavings and still be equivalent to some serial run? The classical answer:

center

  • Growing phase: a transaction may acquire any lock; once one is released, the shrinking phase has begun.
  • Shrinking phase: only releases are allowed. No further acquires.
  • The transition is the lock point — it does not have to be at commit time, but everything ordered around it determines the schedule.
  • Theorem (Eswaran 1976): if every transaction observes 2PL, the resulting schedule is serializable.

Two-phase locking is a discipline on the timing of lock(R) / unlock(R) calls, not on which locks are held.

© 2026 CUBRID Corporation. All rights reserved.

2PL variants — basic, strict, rigorous

Variant When does the shrinking phase end? Cascading abort?
Basic 2PL as soon as locks are no longer needed possible
Strict 2PL write locks held until commit / abort impossible
Rigorous 2PL every lock held until commit / abort impossible
  • What rigorous buys on top: holding every lock — S included — to commit makes the serialization order equal the commit order, so replicas and log readers can trust the order they observe. The price: read locks block writers until commit.
  • Cascading abort: if T₁ releases an X lock on R before committing and T₂ reads R, then T₁ aborts, T₂ must abort too. Strict 2PL forbids this by construction.
  • CUBRID uses strict 2PL for X locks — every X lock holds to commit. Reads are MVCC: plain SELECT on user tables takes no row lock. The RC-vs-RR S-lock release knob applies only to MVCC-disabled classes (root, serials, collation, HA apply-info) via lock_unlock_object_by_isolation.

"Strict" is the line WAL recovery assumes — undo at abort works only because no other transaction has seen our writes yet.

© 2026 CUBRID Corporation. All rights reserved.

Worked example — one abort drags down everyone downstream

T1 follows basic 2PL to the letter — and still poisons two other transactions:

t T1 T2 T3
1 writes R (X lock)
2 releases X on R before commit — basic 2PL allows it
3 reads R — uncommitted data
4 writes S using R; also releases early
5 reads S
6 aborts — R rolls back must abort — saw a value that now "never existed" must abort — derived from T2
  • This is the dirty read from §The problem, weaponized: the write became visible before commit, so one rollback cascades through every downstream reader, recursively.
  • The cure is blunt: hold X locks to commit (strict 2PL). Nobody can read your writes early — the chain can never start. CUBRID applies this to X locks unconditionally.
© 2026 CUBRID Corporation. All rights reserved.

Cost of 2PL — and the four standard remedies

  • Guarantee: serializable schedules (no anomaly).
  • Cost #1: contention. Long shrinking phases serialize the workload.
  • Cost #2: deadlocks. Hold-and-wait + circular dependency is unavoidable in general.
  • Four standard remedies:
    • Timeout — cheap to implement, prone to false positives under load.
    • Conservative 2PL — acquire every lock up front, before doing any work. Avoids deadlock entirely, but caller must know its working set.
    • Waits-for graph cycle scanCUBRID's primary detector (lock_detect_local_deadlock).
    • Timestamp avoidancewait-die, wound-wait. Decides who waits using transaction age; common in distributed designs (Spanner, CockroachDB), rare in monolithic DBMSs.

CUBRID picks WFG detection as the primary mechanism and falls back to per-request timeout when WFG maintenance lags.

© 2026 CUBRID Corporation. All rights reserved.

Three theoretical building blocks

Beyond plain S/X locks, three pieces of theory shape every real lock manager:

  1. Intention modes (IS, IX, SIX) — multi-granularity locking. A coarse lock declares intent to take finer locks underneath, so two transactions can lock different rows of the same table without false table-level conflict.
  2. Compatibility matrix — 2×2 (S, X) → 5×5 (with intention) → 12×12 in CUBRID (add BU, SCH-S/M, U, NON2PL).
  3. Conversion — re-requesting your own lock should upgrade, not deadlock with yourself. The new mode is the least upper bound (lub) of held and requested.
    Definition. In the lock-mode lattice, lub(A, B) is the smallest mode that satisfies both A and B. E.g. lub(S, IX) = SIX, lub(IS, S) = S, lub(S, X) = X.

The data structures in Part II realize these three pieces as LK_ENTRY + the lock-mode matrix + a conversion table.

© 2026 CUBRID Corporation. All rights reserved.

Why intention modes exist

Without intention modes (IS, IX, SIX) you face a binary choice for class-grain locking:

Choice Concurrency story Problem
Whole-class S / X only Two writers can never coexist, even on different rows dreadful concurrency on hot tables
No class lock at all Writer on row 5 can't tell a DDL is mid-ALTER TABLE on the same class breaks schema isolation

Intention modes solve both at once. Taking IX on a class declares "I will hold an X on at least one row underneath." Then:

  • Two writers on different rows can both hold IX on the class — no conflict (IX ∧ IX = ✓).
  • DDL takes SCH-M, which conflicts with IX — writers wait for the DDL to commit, but fine-grained concurrency between writers is preserved.
  • A reader who needs S on the whole class still conflicts with IX — correct, because someone's writing a row.

Without IS/IX/SIX, MGL collapses back to S/X-only. With them, you get row-grain data concurrency + class-grain DDL protection in the same compatibility matrix.

© 2026 CUBRID Corporation. All rights reserved.

Seven common DBMS patterns

  1. Lock vs latch separation — lock = logical, transaction-long, in the external lock table (LK_ENTRY); latch = physical, microseconds, inside the page (PGBUF_LATCH). Depth: Appendix A
  2. Resource hashing — OID, or (relation, key range)
  3. Aggregate-mode cachetotal_holders_mode (O(holders) → O(1))
  4. Dual-view threading — resource view + transaction view
  5. MGL + intention modes — IS, IX, S, SIX, X
  6. Conversion (lub) tableS + IX = SIX
  7. FIFO queue + starvation guard — holders ∧ waiters

None of these seven is a CUBRID invention — every serious engine sets the same seven dials. Part II shows the position CUBRID chose for each one.

© 2026 CUBRID Corporation. All rights reserved.

Overview — how a lock request flows

center

  • Two grains, one exception. Intent at the class first; the row lock is real for writes and MVCC-disabled reads — plain reads on MVCC tables skip it (snapshot visibility).
  • Compatibility is two-sided — granted modes and the wait queue (the starvation guard). A suspend ends one of three ways: granted by a releaser, timeout, or deadlock-victim rollback.
  • Release timing is scoped. Per-statement release exists only for S locks on MVCC-disabled classes under RC; everything else — X, class, intent — holds to commit.
© 2026 CUBRID Corporation. All rights reserved.

Part II

How does CUBRID turn the dials?

© 2026 CUBRID Corporation. All rights reserved.

Naming everything with OID

center

  • OID = (volid, pageid, slotid) — the db_identifier C struct; the tuples above are examples.
  • Granularity hierarchy = OID hierarchy: Root class → Class → Instance.
  • Every parent→child arrow above is exactly an intention-lock parent/child edge — the hierarchies mirror one-for-one.
© 2026 CUBRID Corporation. All rights reserved.

Three core types

// src/transaction/lock_manager.h
struct lk_res_key { LOCK_RESOURCE_TYPE type; OID oid; OID class_oid; };

struct lk_res {
  LK_RES_KEY key;
  LOCK       total_holders_mode;   // aggregate of granted modes
  LOCK       total_waiters_mode;   // aggregate of waiting modes
  LK_ENTRY  *holder, *waiter, *non2pl;
  pthread_mutex_t res_mutex;
};

struct lk_entry {
  LK_RES   *res_head;
  int       tran_index;
  LOCK      granted_mode, blocked_mode;
  int       count;                 // re-entrant counter
  LK_ENTRY *next;                  // resource list (holder or waiter)
  LK_ENTRY *tran_next, *tran_prev; // transaction list
  LK_ENTRY *class_entry;           // parent class, one hop
  int       ngranules;             // children below this intention lock
};

Watch three field pairs: the two aggregate modes (O(1) compatibility — §matrix), the two mode fields on LK_ENTRY (conversion state — §paths F/G), and the two linkages next vs tran_next/prev (§dual-view).

© 2026 CUBRID Corporation. All rights reserved.

LK_TRAN_LOCK — one transaction's lock state

// struct lk_tran_lock — src/transaction/lock_manager.c
struct lk_tran_lock {
  LK_ENTRY *inst_hold_list, *class_hold_list, *root_class_hold;
  LK_ENTRY *waiting;             // the one lock this txn is blocked on
  LK_ENTRY *lk_entry_pool;       // local LK_ENTRY pool, max 10
  LK_ENTRY *non2pl_list;         // early-released lock shadows (RC)
  bool      lock_escalation_on;  // escalation re-entrance guard
  bool      is_instant_duration; // instant-lock mode flag
};
  • Three hold lists, split by grain — "what do I hold on class A?" walks only class_hold_list: O(classes held), not O(all locks).
  • waiting is singular — a transaction blocks on at most one lock at a time.
  • The 10-entry local pool, the escalation guard, and the instant flag each return in the memory / escalation / special-path slides.
© 2026 CUBRID Corporation. All rights reserved.

The global lock table — lk_global_data

center

Three globals: the hash table, the per-tran table with a 10-entry local pool, a shared freelist for overflow. Sizing: §Boot.

© 2026 CUBRID Corporation. All rights reserved.

lk_Gl in action — one contended row, two views

Scenario: T7 is mid-UPDATE on a row of class A; T9's UPDATE of the same row has to wait.

Resource view — m_obj_hash_table:

LK_RES total_holders total_waiters holder list waiter list
class A IX NULL T7(IX) → T9(IX)
row (2, 4100, 1) X X T7(X) T9(X)

Transaction view — tran_lock_table:

LK_TRAN_LOCK class_hold_list inst_hold_list waiting
T7 IX on class A X on the row
T9 IX on class A X on the row
  • The two bold cells are one and the same LK_ENTRY — threaded into the resource's holder list and T7's hold list at once (§Dual-view, two slides ahead).
  • Both transactions hold IX on the class with zero conflict — the real fight happens one grain below.
© 2026 CUBRID Corporation. All rights reserved.

Entry initialization — granted vs blocked

// lock_initialize_entry_as_granted — src/transaction/lock_manager.c
  e->tran_index   = tran;          e->res_head = res;
  e->granted_mode = lock;          // → this entry is a holder
  e->blocked_mode = NULL_LOCK;
  e->count        = 1;             /* list pointers nulled */

// lock_initialize_entry_as_blocked — same file
  e->thrd_entry   = th;            // the thread to suspend
  e->granted_mode = NULL_LOCK;
  e->blocked_mode = lock;          // → this entry is a waiter
granted / blocked The entry is a… Lives in
X / – holder — fully acquired holder list
IX / SIX upgrader — holds IX, pending target SIX (§paths F/G) holder list, upgrader zone
– / X waiter — nothing acquired yet (§path D) waiter list
  • One entry per (resource, txn) — the field pair plus which list it threads is the state. (blocked_mode is a pending target, not an "intention" — that word belongs to IS/IX/SIX.) A third constructor builds the §NON2PL shadows — not a lock at all.
© 2026 CUBRID Corporation. All rights reserved.

Dual-View Threading

center

  • The same LK_ENTRY lives in two lists at once.
  • Resource view: who holds R?next.   Txn view: what does T hold?tran_next/prev.
  • Commit walks the txn list. Compatibility check reads the resource-side aggregate.
© 2026 CUBRID Corporation. All rights reserved.

Boot — lock_initialize, five steps in order

center

  • Steps 1–4 populate the lk_Gl globals from the assembly diagram (colors match); step 5 starts the only background thread. Functions: lock_initialize_* — full walk in detail ch. 2.
  • Order is load-bearing — step 4 sizes the WFG by num_trans, which step 1 sets.
  • The daemon does three duties per 100 ms tick: interrupt check, timeout check, and the (gated) deadlock scan — 100 ms is the timeout granularity, by design.
© 2026 CUBRID Corporation. All rights reserved.

Memory — three tiers, and the lockfree boundary

center

  • A 3–5-row OLTP transaction never leaves tier 1 — allocation is contention-free.
  • The boundary: finding an LK_RES (hash find_or_insert) is lockfree; mutating its lists is mutex-guarded — res_mutex per resource, hold_mutex per transaction.
  • Retired entries reclaim epoch-based (del_id) — the same machinery as CUBRID's other lockfree modules.
© 2026 CUBRID Corporation. All rights reserved.

A 12-mode vocabulary

Value Mode Meaning
0 NA / NULL placeholder
1 NON2PL tracker for S locks released early under RC (MVCC-disabled classes)
3 SCH-S / SCH-M schema stability / modification (DDL takes SCH-M)
4 IS / IX intention shared / exclusive
5 S shared
7 BU bulk update — loaddb takes it at class grain; load workers then skip row locks entirely
8 SIX shared + intention exclusive
9 U upgrade-intent read — survives in the enum & tables, no current code path issues it
10 X exclusive

Gray's 5 modes (IS / IX / S / SIX / X) + CUBRID's 7 extras.

© 2026 CUBRID Corporation. All rights reserved.

How SQL statements map to lock modes

A statement typically takes two locks: an intent at class grain — planted by lock_object's MGL preparation, or by lock_scan at heap-scan start — and a real lock at row grain.

SQL Class grain (intent) Row grain (lock_object)
SELECT (plain) IS none on MVCC-enabled classes  ·  S on MVCC-disabled (serials, etc.)
SELECT … FOR UPDATE IX X on each selected row
INSERT IX X on the new row
UPDATE IX X on each updated row
DELETE IX X on each row
LOAD DATA / bulk load BU none — load workers skip row locks; class BU subsumes them
CREATE / ALTER / DROP TABLE SCH-M — (DDL is class-level only)
SELECT against a class mid-DDL SCH-S —  ←  waits behind SCH-M
  • The IX row repeats deliberately — every row-level write takes the same class-level intent mode. The compatibility matrix does the rest.
© 2026 CUBRID Corporation. All rights reserved.

Compatibility matrix + starvation guard

NULL SCH-S IS S IX BU SIX X SCH-M
SCH-S
IS
S
IX
BU
SIX
X
SCH-M

A new request must be compatible with both total_holders_mode and total_waiters_mode — the latter is the starvation guard that stops a stream of S from leapfrogging a waiting X.

© 2026 CUBRID Corporation. All rights reserved.

Worked example — the starvation guard at work

A shape you can reproduce on any CUBRID table — long reader, then DDL, then more readers (class grain):

t Action Holders Waiters total_holders total_waiters Outcome
1 T1: long SELECTIS on class A T1(IS) IS NULL granted
2 T2: ALTER TABLE ASCH-M T1(IS) T2(SCH-M) IS SCH-M wait (SCH-M ∧ IS = ✗)
3 T3: new SELECTIS T1(IS) T2(SCH-M) IS SCH-M wait — fine vs holders, but IS ∧ SCH-M = ✗ vs waiters
  • At t = 3 a holders-only test would grant T3 (IS ∧ IS = ✓) — and a steady stream of readers would starve the DDL forever. The waiter check queues T3 behind T2; the ALTER runs as soon as T1 finishes.
  • A row-grain version of this story exists only on MVCC-disabled rows (serials) — plain MVCC readers never take row S locks.

PostgreSQL ships the same rule as strong-lock fairness; in CUBRID it is the total_waiters_mode side of the dual test.

© 2026 CUBRID Corporation. All rights reserved.

Part III

The life of a lock request — at the code level

© 2026 CUBRID Corporation. All rights reserved.

Who calls the lock manager

Caller APIs Why
locator_sr.c (41 sites) lock_object, lock_unlock_object, lock_classes_lock_hint object fetch / store / DDL — the dominant caller
btree.c (13) lock_object, …donot_move_to_non2pl key locks, FK checks — those rows never reach the client
heap_file.c (8) lock_scan, lock_object class IS at scan start, row locks per fetch
scan_manager.c (4) lock_hold_object_instantlock_object optimistic probe first, blocking lock as fallback
query_executor.c (4) instant-mode brackets, composite lock WHERE evaluation; bulk DELETE / UPDATE
serial.c (4) lock_object(X), lock_unlock_object NEXT VALUE — the main NON2PL exerciser under RC
log_manager.c (5) lock_unlock_all commit / rollback — the only bulk-release caller

The lock manager is a service layer: ten files account for ~90 % of all call sites.

© 2026 CUBRID Corporation. All rights reserved.

Two entry points, one internal workhorse

center

  • lock_scan — class-grain only; IS for reads, IX for writes. One real call site: heap scan start.
  • lock_object — dispatches on the OID (root / class / instance) and first secures the parent's intention lock.
  • Fast-out by subsumption. If a held class lock already covers the row request — class X covering row X, say — lock_object returns LK_GRANTED with no hash lookup, no LK_ENTRY. It's why escalation's remaining 40,000 rows (§worked example) cost nothing. Check: lock_is_class_lock_escalated.
© 2026 CUBRID Corporation. All rights reserved.

Inside the workhorse — the dispatch tree

center

  • Two questions sort every requestis the resource empty? and do I (the requesting txn) already hold it? — then compatibility decides inside each branch; the small captions name the LK_RES field each decision reads. (Instance requests pass the escalation check before any of this.)
© 2026 CUBRID Corporation. All rights reserved.

Inside the workhorse — seven paths

Path Do I already hold R? Condition Outcome
A no resource empty grant unconditionally
B no compatible with holders and waiters grant immediately
C no incompatible + ZERO_WAIT return timeout, never block
D no incompatible + willing to wait enqueue FIFO, suspend
E yes lub(req, held) = held re-entrance: count++, done
F yes upgrade compatible with other holders convert in place
G yes upgrade incompatible set blocked_mode, reposition, suspend

E is the OLTP hot path — the class-entry fast path needs no hash lookup and no res_mutex, just hold_mutex and a count bump.
F/G test against the group_mode of others — a transaction cannot conflict with itself.

© 2026 CUBRID Corporation. All rights reserved.

The dispatch in code — compressed

// lock_internal_perform_lock_object — lock_manager.c
start:      /* re-entered ONLY from the
               sibling branches in D & G */
  find_or_insert (key, &res);            /* lockfree */
  if (res is empty)
    return grant_fresh ();               /* PATH A */
  if (find_mine (res->holder, my_tran))
    goto lock_tran_lk_entry;
  if (compat (lock, res->total_waiters_mode)
      && compat (lock, res->total_holders_mode))
    return grant_immediately ();         /* PATH B */
  if (wait_msecs == ZERO_WAIT)
    return TIMEOUT;                      /* PATH C */
  suspend (); return woken_outcome ();   /* PATH D */

lock_tran_lk_entry:   /* I already hold R */
  new_mode = lock_conv (lock, granted_mode);
  if (new_mode == granted_mode)
    { count++; return GRANTED; }         /* PATH E */
  group = lub_of_OTHER_holders ();
  if (compat (new_mode, group))
    { granted_mode = new_mode;
      return GRANTED; }                  /* PATH F */
  blocked_mode = new_mode;               /* PATH G */
  reposition_per_UPR (); suspend ();
  • A–D, new request: the dual compat test is the starvation guard; D enqueues at the FIFO tail, folding into total_waiters_mode. Instance requests run the escalation check at start:.
  • E–F–G, re-request: lub decides — no-op (E), in-place upgrade checked against other holders only (F), or blocked upgrade: both mode fields set + UPR reposition (G).
  • Exits & re-entry: every grant returns; a woken D-waiter returns without re-checking — the releaser made it a holder. goto start lives only in the sibling branches inside D and G (tran_next_wait).
  • Both labels are the real symbolsgit grep-able.
© 2026 CUBRID Corporation. All rights reserved.

What PATH D leaves behind — the structures

center

  • Continuing the earlier snapshot (T7 holds X, T9 waits): T12's X lands on PATH D — a new LK_ENTRY strictly behind T9 at the FIFO tail, total_waiters_mode = lub(X, X), thread suspended.
  • The wake promise: the releaser's thread examines and promotes T12's entry before waking it — so T12 wakes already a holder; otherwise timeout / deadlock victim (§Suspend & resume).
  • These waiter→holder relationships are exactly what the deadlock detector later reads off as WFG edges (Part IV).
© 2026 CUBRID Corporation. All rights reserved.

Lock conversion — the lub rule

  • lub = least upper bound. In the lock-mode lattice ordered by strength, lub(A, B) is the smallest mode that satisfies both A and B simultaneously.
  • Re-requesting your own lock should upgrade, not deadlock with yourself.
  • Rule: granted_mode ← lub(granted_mode, requested_mode)
    • lub(S, IX) = SIX — need both shared and intention-to-write
    • lub(IS, S) = SS already implies IS
    • lub(S, X) = XX already implies S
  • Implementation: a square table lock_Conv[requested][current]src/transaction/lock_table.c.
  • A holder mid-conversion sets both granted_mode (current) and blocked_mode (target) until the upgrade succeeds — that is PATH G.
© 2026 CUBRID Corporation. All rights reserved.

Worked example A — self-upgrade with no conflict (PATH F)

T1 reads then writes the same row r. Premise: r belongs to an MVCC-disabled class (a serial, say) — on MVCC tables the read takes no row S and the write jumps straight to X.

-- autocommit off: the transaction starts with the first statement
SELECT * FROM accounts WHERE id = 5;             -- step 1
UPDATE accounts SET balance = 200 WHERE id = 5;  -- step 2
COMMIT;
Step What T1 asks for granted_mode blocked_mode What LK_ENTRY does
1 S lock on row 5 S NULL_LOCK new entry; placed in r's holder list
2 requested mode X arrives conversion table: lub(S, X) = X. No other holder ⇒ in-place upgrade
3 after upgrade X NULL_LOCK same LK_ENTRY, mode field flipped
  • The aggregate on LK_RES also flips: total_holders_mode goes SX. The holder count stays at 1.
  • Re-requesting your own lock is not an "ungrant + reacquire". The conversion table replaces that potentially deadlock-prone sequence with one atomic transition.

Why this matters: without the conversion table an upgrade is release-then-reacquire — a window for another txn to slip in and deadlock T1 against itself.

© 2026 CUBRID Corporation. All rights reserved.

Worked example B — self-upgrade waiting on a conflict (PATH G)

Same code, same MVCC-disabled premise — but another transaction T2 already holds S on row r when T1 begins.

t T1 (the upgrader) T2 (other reader) r's holders T1's LK_ENTRY
1 acquires S on r T2(S)
2 acquires S on r → granted (S ∧ S = ✓) T1(S), T2(S) granted_mode = S
3 requests X (UPDATE) (still holding S, mid-txn) T1(S), T2(S) granted_mode = S  ·  blocked_mode = X  ←  pending
4 suspended — entry stays in the holder list, repositioned per UPR commits → releases S T1(S) unchanged
5 T2's release runs lock_grant_blocked_holder → upgrade granted T1(X) granted_mode = X  ·  blocked_mode = NULL_LOCK

Different from example A: at step 3, T1 is simultaneously a holder (S) and a pending upgrader (to X). T1 never releases its S while waiting — that's why LK_ENTRY carries two mode fields instead of one.

© 2026 CUBRID Corporation. All rights reserved.

Self-upgrade — worked examples A vs B at a glance

A — no conflict (PATH F) B — conflicting holder (PATH G)
Holder list before step 2 T1(S) T1(S), T2(S)
At conversion request granted_mode = S → X directly granted_mode = S, blocked_mode = X
Does T1 release S while waiting? n/a — never waits no, holds S throughout
Where does the entry sit? holder list holder list (not the waiter list) — repositioned per UPR
What unblocks the upgrade nothing — immediate T2's release triggers lock_grant_blocked_holder
Mode fields used granted_mode only both granted_mode and blocked_mode

Same LK_ENTRY shape, two different code paths. The conversion table covers both: every defined cell of the matrix has a lub, so there's always a target mode to upgrade to.

© 2026 CUBRID Corporation. All rights reserved.

The buried upgrader — why holder-list order matters

A blocked upgrader waits inside the holder list (PATH G: it still holds its old mode, so the waiter queue is off-limits) — and on every release the grant pass scans that list front-to-back, stopping at the first plain holder: while (holder && holder->blocked_mode != NULL_LOCK).

Naive insertion — just append. A class-grain timeline; each entry reads granted/blocked ( = NULL_LOCK):

t event holder list (front → back)
1 T5 IS, T3 IX, T2 IX — all compatible T5 IS/–T3 IX/–T2 IX/–
2 T3 upgrades to SIX; T2's IX conflicts → only the blocked field is written, suspended in place T5 IS/–T3 IX/SIXT2 IX/–
3 T2 commits — grant pass starts at T5: blocked = –exits, nobody examined T5 IS/–T3 IX/SIX
4 SIXIS = ✓ — T3 is grantable, yet sleeps until timeout unchanged — T3 IX/SIX forever
  • An upgrader behind any plain holder is invisible to this and every future release — the pass never scans past the zone boundary.
  • Position is a correctness property, not bookkeeping. So where exactly must an upgrader sit? → drawn out next, then the rule.
© 2026 CUBRID Corporation. All rights reserved.

The buried upgrader — the same timeline, drawn

center

  • Panel ② is the bug: the while test fails at plain T5 — T3's eligibility is never even computed.
  • Panel ③ is the cost: a grantable upgrade (SIX ∧ IS = ✓) sleeps until timeout or deadlock victimhood.
  • The green strip is the real code: UPR moves T3 to the front before it sleeps — granted on the very next release.
© 2026 CUBRID Corporation. All rights reserved.

Why not just scan the whole list? — the trade behind UPR

A full scan would fix the liveness bug — examine every holder on every release, and position stops mattering. The code chooses otherwise, and the reason is the hot path:

center

  • The classic invariant-vs-scan trade: pay for placement once, on the rare path (the PATH G suspend) — and keep the every-release path constant.
  • Full scan would also need continue (not the prefix break) and loses the ->next shortcut — sound only because granted upgrades are repositioned to the zone boundary.
  • The holder list's order has exactly one consumer — this scan; everything else is order-independent.
© 2026 CUBRID Corporation. All rights reserved.

Upgrader Positioning Rule — the placement test

The invariant: keep the holder list shaped [ upgraders… ][ plain holders… ] — then the front-zone scan provably reaches every pending upgrade. lock_position_holder_entry walks the zone with the entry being positioned — call it new — and inserts it before the first available of ta > tb > tc, else appends.

Candidate Condition on existing upgrader i Why insert before it
ta new.blocked_modei.blocked_mode = ✓ compatible upgrades group together — grantable in one pass
tb new.blocked_modei.granted_mode = ✓  ·  i.blocked_modenew.granted_mode = ✗ the other order would wedge the two upgraders
tc first plain holder (blocked_mode == NULL_LOCK) at minimum, stay inside the upgrader zone
  • The grant pass maintains it too: a just-granted upgrade is repositioned back to the zone boundary (lock_grant_blocked_holder) before the pass continues.
  • Provenance: upgrades-before-new-waiters is classical (Gray & Reuter; System R / DB2 lineage); the ta/tb/tc test and the name "UPR" are CUBRID's own (lock_manager.c source comment).
© 2026 CUBRID Corporation. All rights reserved.

ta vs tb — grouping, versus the only order that survives

ta — my target ∧ your target = ✓. We can be granted in the same pass: stand together, and the prefix scan takes the whole group on one release. A batching rule.

tb — we conflict, but one-directionally. Say new = IX→SIX, i = IS→X:  SIX ∧ IS = ✓i's held mode doesn't block me — but X ∧ IX = ✗my held mode blocks i for as long as I hold. Then the order decides everything:

Order in the upgrader zone (front of the holder list) What the grant pass does Outcome
[ new, i ] — UPR's pick new: SIX ∧ IS = ✓granted; i gets its turn at new's commit progress
[ i, new ] i: X ∧ IX = ✗breaknew never examined, though grantable both sleep — wedged
  • The pass grants a compatible prefix and stops at the first failure — so a conflicting pair has exactly one viable order: whoever can be granted while the other still holds goes first.
© 2026 CUBRID Corporation. All rights reserved.

Suspend & resume — six ways to wake

lockwait_state Set by Meaning
LOCK_RESUMED the releaser's grant cascade granted — entry already a holder
LOCK_RESUMED_ABORTED_FIRST deadlock detector victim; this thread runs the rollback
LOCK_RESUMED_ABORTED_OTHER deadlock detector victim; a sibling thread handles the abort
LOCK_RESUMED_DEADLOCK_TIMEOUT deadlock detector victim woken as a retryable timeout
LOCK_RESUMED_TIMEOUT daemon timeout check wait_msecs exceeded
LOCK_RESUMED_INTERRUPT shutdown / interrupt exit with ER_INTERRUPTED
  • Race-free by construction: the waker sets the state before signaling, under the thread-entry mutex.
  • lock_suspend asserts no page latch is held — waiting on a lock while holding a latch is the classic ordering deadlock (Appendix A).
  • Fast gate: an atomic waiter counter (deadlock_and_timeout_detector) makes an idle 100 ms daemon tick cost one read.
  • The three victim rows differ in transaction fate and in who runs the rollback — unpacked in Part IV, right after victim selection.
© 2026 CUBRID Corporation. All rights reserved.

After the wake — what the resumed thread does

Six states, but only three behaviors — one switch in lock_suspend, one verdict back to the dispatch tree:

// lock_suspend, after the sleep — lock_manager.c
wake_chained_siblings ();    /* tran_next_wait relay */
switch (lockwait_state) {
  case RESUMED:            /* the grant cascade ran */
    return RESUMED;        /* already a holder — done */
  case ABORTED_FIRST:      /* I own the rollback */
    set_error (UNILATERALLY_ABORTED);
    abort_reason = TRAN_ABORT_DUE_DEADLOCK;
    while (sibling_workers (my_tran) > 0)
      { interrupt; sleep (10 ms); }  /* drain them */
    return ABORTED;
  case ABORTED_OTHER:      /* a sibling is FIRST */
  case DEADLOCK_TIMEOUT:   /* txn survives */
  case TIMEOUT:            /* wait_msecs ran out */
    set_error_for_timeout ();
    return DEADLOCK_TIMEOUT or TIMEOUT;
  case INTERRUPT: return INTERRUPT;
}
// back in lock_internal_perform_lock_object
if (ret != RESUMED)        /* entry still parked — */
  perform_unlock_object (entry);  /* detach my own */
  • The relay first — before judging its own fate, the wakee forwards the signal down the tran_next_wait chain to sibling waiters.
  • Success is free — on RESUMED the releaser already moved the entry into the holder list; nothing is re-checked.
  • One rollback, exactly — only ABORTED_FIRST does real work: flag the abort, drain sibling workers, unwind. ABORTED_OTHER / DEADLOCK_TIMEOUT just stamp a timeout error — fates split in Part IV.
  • Failure cleans up after itself — every non-granted state leaves the entry in a blocked list; the wakee detaches its own entry via the unlock path.
© 2026 CUBRID Corporation. All rights reserved.

Crossing into release — it starts at lock_unlock_object

Everything since the dispatch tree ran inside lock_object — and the acquire side never wakes anyone. Wake-ups belong to the release path, and its entry point decides what is released when:

// lock_unlock_object — src/transaction/lock_manager.c
if (force) {                       // commit / rollback
  lock_internal_perform_unlock_object
    (..., false, true);
  return;
}
if (lock != S_LOCK) return;        // X is commit-bound
switch (logtb_find_isolation (tran_index)) {
  case TRAN_SERIALIZABLE:
  case TRAN_REPEATABLE_READ: return;
  case TRAN_READ_COMMITTED:
    lock_unlock_object_by_isolation (...); break;
}
  • X locks — always commit-bound. The lock != S_LOCK short-circuit is "These will not be released" in the source.
  • S locks — none on MVCC-enabled classes (plain SELECT never took one). On MVCC-disabled classes, RC releases per statement; RR / SERIALIZABLE hold to commit.
  • Class locks — always commit-bound (intention locks outlive their children).
© 2026 CUBRID Corporation. All rights reserved.

Inside the release — the grant cascade

All three scopes funnel into lock_internal_perform_unlock_object. Right after the released entry comes off the list, the granting happens — in a fixed order:

center

  • Upgraders before waiters — pass 1 walks the upgrader zone (UPR pays off here); pass 2 walks the FIFO against the updated aggregate, stopping at the first incompatible entry.
  • The recompute is O(holders): lub is not invertible — you cannot subtract a mode from an aggregate.
© 2026 CUBRID Corporation. All rights reserved.

Release internals — count, release_flag, and commit

// lock_internal_perform_unlock_object — src/transaction/lock_manager.c
if (release_flag == false) {
  entry_ptr->count--;
  if (entry_ptr->blocked_mode == NULL_LOCK && entry_ptr->count > 0)
    return;            /* re-entrance guard: most unlock calls exit here */
}
  • Every lock_object bumps count; every unlock decrements it. The entry is freed only at zero — a nested caller can't pull the rug from under an outer one.
  • lock_unlock_all (commit / rollback; called only by log_manager.c) passes release_flag = true and releases instances → classes → root — leaf-to-root, the reverse of acquisition order.
  • An LK_RES is erased from the hash table only when holder, waiter and non2pl lists are all empty; a shadow-only resource stays alive.
© 2026 CUBRID Corporation. All rights reserved.

Worked example — RC vs RR on the same timeline

Scope. Textbook 2PL story — in CUBRID it applies only to MVCC-disabled classes (serials, etc.). On MVCC user tables, T2 takes no S lock; the same anomaly is resolved by snapshot timing — see cubrid-mvcc.md.

T2 reads twice in one transaction; T1 updates and commits between the reads. What does T2 see at the second read?

t T1 (writer) T2 (reader, one txn)
1 SELECT WHERE id=r1000  ·  takes S
2 UPDATE WHERE id=r — requests X
3 T1 commits (if able)
4 SELECT WHERE id=r — second read
T2 iso S released at T1 at t = 2 T2 reads at t = 4
RC end of statement (t = 1) X granted → commits 1100non-repeatable read
RR T2 commit X waits for T2's S 1000forbidden
© 2026 CUBRID Corporation. All rights reserved.

Isolation levels at the row — what the code actually varies

Row access READ COMMITTED RR / SERIALIZABLE
SELECT, MVCC table no row lock — snapshot, rebuilt per statement no row lock — snapshot, taken once per txn
SELECT, MVCC-disabled class (serials…) S released at statement end → NON2PL shadow S held to commit
UPDATE / DELETE row lock X held to commit X held to commit — identical
Write to a row modified after my snapshot proceeds — RC re-evaluates on the latest committed version ER_MVCC_SERIALIZABLE_CONFLICT → abort (first-updater-wins, no range locks)
  • Dispatch is two predicates, not the lock table: mvcc_is_mvcc_disabled_class (4 class kinds) and logtb_check_class_for_rr_isolation_err (3 exempt metadata classes; check sits in locator_sr.c).
  • RR ≡ SERIALIZABLE at runtime — every branch tests isolation > TRAN_READ_COMMITTED; SERIALIZABLE is honestly snapshot isolation (TRAN_NO_PHANTOM_READ alias), write skew not prevented.
  • Net: isolation moves snapshot timing + one write-time check. The lock manager's X-to-commit discipline never changes.
© 2026 CUBRID Corporation. All rights reserved.

Lock escalation

// lock_escalate_if_needed — src/transaction/lock_manager.c
if (!lock_check_escalate (th, class_entry, tran_lock))
  return LK_NOTGRANTED;                          // threshold not reached
if (class_entry->granted_mode == IX_LOCK
    || class_entry->granted_mode == SIX_LOCK)
  max_class_lock = X_LOCK;                       // writer  → class X
else
  max_class_lock = S_LOCK;                       // reader  → class S
granted = lock_internal_perform_lock_object (... , max_class_lock,
            LK_FORCE_ZERO_WAIT, &class_entry, NULL);
  • Checked on every instance request — one integer compare (ngranules vs the lock_escalation parameter) under hold_mutex, ~100 ns.
  • The IS → S, IX/SIX → X choice is a heuristic — counting per-row S vs. X would cost exactly what escalation is trying to save.
  • Conditional by design: LK_FORCE_ZERO_WAIT means a contended class lock silently abandons escalation — per-row locking just continues.
  • Opt-out: rollback_on_lock_escalation aborts the transaction instead of coarsening its footprint.
© 2026 CUBRID Corporation. All rights reserved.

Worked example — escalation under bulk DML

A maintenance job runs:

UPDATE accounts SET status = 'archived' WHERE created_at < '2020-01-01';
-- imagine 50,000 rows match
Step What the lock manager does LK_ENTRY count (this txn)
1 lock_scan(accounts, IX) → class-level intent 1 (class IX)
2 lock_object(row_1, X), (row_2, X), … 1 + N row entries
3 at ngranules = lock_escalation (e.g. 10,000): the next request fires lock_escalate_if_needed escalation triggered
4 class IX → X conversion (no-wait); conversion cleanup frees the now-subsumed row entries 1 (class X)
5 remaining 40,000 rows: subsumed by class X — fast-out before the hash lookup stays at 1
  • ✅ Memory drops from O(rows touched) to O(1); later checks are one matrix lookup.
  • ❌ Other transactions wanting any row in accounts now block on the class X for the rest of the txn.
  • ⚠️ If another txn holds a conflicting class lock at step 4, the no-wait attempt fails — escalation is abandoned and per-row locking continues.
© 2026 CUBRID Corporation. All rights reserved.

Deadlock — Waits-For Graph

center

  • Each detection run derives edges (LK_WFG_EDGE) from the lock table — built fresh per scan, not maintained online at wait time.
  • lock_detect_local_deadlock walks the graph with DFS; a back-edge to a node on the path is a cycle.
  • "Local" — distributed deadlocks are not this detector's problem; they fall back to per-request timeout.
© 2026 CUBRID Corporation. All rights reserved.

Detection — three gates, five phases

center

  • The WFG is rebuilt from scratch on every run — zero bookkeeping on the lock/unlock hot path; the scan costs O(active resources), at most once per second (lock_deadlock_detect_interval).
  • Safety net: 60 consecutive victim-less runs (~6 s) force-time-out one suspended thread to break undetectable hangs.
© 2026 CUBRID Corporation. All rights reserved.

WFG in code — the detector's own structures

// lock_manager.c — TWFG: rebuilt into lk_Gl on every detection run
struct lk_WFG_node {              // one slot per transaction
  int   first_edge;               // head of my outgoing edges (-1 = none)
  int   current, ancestor;        // DFS cursor + path parent (phase 3)
  INT64 thrd_wait_stime;          // when this txn started waiting
  int   tran_edge_seq_num;        // stale-edge filter
  bool  checked_by_deadlock_detector, DL_victim;
};
struct lk_WFG_edge {              // one "waits-for" arrow
  int   to_tran_index;            // the txn I wait for
  int   holder_flag;              // target holds? → victim criterion #1
  int   next;                     // next edge of the same from-node
  int   edge_seq_num;   INT64 edge_wait_stime;
  LK_ENTRY *holder, *waiter;      // evidence back in the lock table
};
  • An adjacency list in two flat arrays: lk_Gl.TWFG_node[tran_index] + one shared TWFG_edge[] pool — edges chain by index (first_edgenext), not by pointer.
  • Nodes are transactions, not threads or resources; edge storage grows in tiers — 200 (static) → 1,000 → MAX_NTRANS² (malloc) — read-only workloads never allocate.
© 2026 CUBRID Corporation. All rights reserved.

Detection in compressed code — one run, five phases

// lock_detect_local_deadlock — lock_manager.c
/* 1 reset */
for (k : every txn slot)
  TWFG_node[k] = { first_edge: -1, checked: true };
/* 2 build */
for (res : every LK_RES)          /* hash iterate */
  for ((blocked, blocking) : holder×holder,
        waiter×holder, waiter×waiter pairs)
    if (!compat (blocked_mode, their mode))
      lock_add_WFG_edge (from, to, holder_flag);
/* 3 DFS */
for (k : every txn)  walk current / ancestor;
  reach a node already on my path → CYCLE
/* 4–5 victims */
lock_select_deadlock_victim ();   /* six criteria */
lock_wakeup_deadlock_victim_* (); /* abort | timeout */
  • 2 build: the three pair-loops are literally the three edge kinds (next slide).
  • 3 DFS: iterative — the cursor lives in the node (current / ancestor); no recursion, no extra stack. Stale nodes pruned by seq_num.
  • Function names are the real symbolsgit grep-able.
© 2026 CUBRID Corporation. All rights reserved.

Building the graph — three kinds of edges

Edge From → To Captures
(a) upgrader → upgrader conversion conflicts inside one holder list
(b) waiter → holder the classic "blocked by a granted (or pending) mode"
(c) waiter → earlier waiter FIFO ordering: an incompatible request ahead of you blocks you too
  • Every (blocked, blocking) pair on every resource becomes one directed LK_WFG_EDGE; the holder_flag on each edge feeds victim criterion #1.
  • Stale edges (their txn committed mid-scan) are filtered during DFS by sequence-number checks; a cycle containing one is discarded as false.
© 2026 CUBRID Corporation. All rights reserved.

Victim selection — six tie-breakers, in order

  1. Must be a lock holder somewhere in the cycle (holder_flag) — only a holder's rollback frees what the cycle waits on; aborting a pure waiter breaks nothing.
  2. Must be an active transaction — one already committing / aborting is exiting anyway; it cannot be aborted again.
  3. Prefer one without deadlock priority — protected transactions survive.
  4. Prefer fewer log records written — the cheapest rollback.
  5. Prefer one with a finite timeout — that application already expects retries.
  6. Prefer the youngest (highest tranid) — the least work thrown away, and elders keep their progress instead of being re-victimized forever.
  • Resolution is two-mode: finite-wait_msecs victims wake as LOCK_RESUMED_DEADLOCK_TIMEOUT (retryable); infinite-wait victims wake as LOCK_RESUMED_ABORTED_* (rollback).
  • Not "most recently blocked" — earlier revisions of this deck (and the high-level doc) inferred that from insertion order; the detail doc's walk of lock_select_deadlock_victim settles it.
© 2026 CUBRID Corporation. All rights reserved.

Resolving the victim — why three wake states

One victim, three possible wake states. The differences: what happens to the transaction, and who does the work:

State The transaction This thread's job Surfaces as
DEADLOCK_TIMEOUT not rolled back — only this statement fails remove its own entry, return retryable lock-timeout error
ABORTED_FIRST rolled back owns the rollback — flags the abort, waits for sibling workers to drain, then unwinds transaction-aborted error
ABORTED_OTHER rolled back — by the FIRST thread nothing — returns immediately (in fact returns DEADLOCK_TIMEOUT to its caller) timeout-style error; quit to client
  • The TIMEOUT / ABORTED split is the victim's wait_msecs: finite → that application already handles timeout-shaped failures, so the transaction survives and may retry; infinite → abort is the only exit.
  • FIRST / OTHER exists because one transaction can have several threads parked on locks at once — the detector wakes them all, but exactly one is "charged of the transaction abortion" (the source comment); the rest are notified as timeouts so the rollback never runs twice.
© 2026 CUBRID Corporation. All rights reserved.

Worked example — two transactions, two rows

Setup: accounts (id, balance), initial A.balance = 1000, B.balance = 1000. Two sessions start at the same time:

Txn Statement Intent
T1 UPDATE accounts SET balance = balance - 100 WHERE id = 'A' X lock on A
T1 UPDATE accounts SET balance = balance + 100 WHERE id = 'B' X lock on B
T2 UPDATE accounts SET balance = balance - 50 WHERE id = 'B' X lock on B
T2 UPDATE accounts SET balance = balance + 50 WHERE id = 'A' X lock on A

T1 transfers from A to B; T2 transfers from B to A. The schedule depends on how the scheduler interleaves the two — and one of them is unlucky.

© 2026 CUBRID Corporation. All rights reserved.

Worked example — interleaving creates the cycle

t T1 acts T2 acts Lock table state
1 X-lock A → granted A: holder = T1
2 X-lock B → granted A: T1 ; B: T2
3 X-lock B → wait A: T1 ; B: T2, waiter T1
4 X-lock A → wait A: T1, waiter T2 ; B: T2, waiter T1
  • After t = 4, both transactions are in LOCK_SUSPENDED.
  • Neither will progress without external intervention — the lock table now encodes a cycle of length 2, waiting for the next detection run to find it.

Per-request timeout would eventually break this, but minutes of wasted wait — the detector resolves it within its one-second scan interval.

© 2026 CUBRID Corporation. All rights reserved.

Worked example — the cycle, visualized

center

  • Left: the concrete lock table — two LK_RES records, each with a holder and a waiter.
  • Right: the abstract WFG — one node per transaction, one waiter→holder edge per blocking relationship (edge category (b)).
  • The detector never inspects rows. It sweeps the lock table once to build the graph, then walks only the graph.
© 2026 CUBRID Corporation. All rights reserved.

Worked example — the detector resolves it

A daemon tick passes the gates (two suspended threads; interval up):

  1. Build. Sweep the lock table: row A yields edge T2 → T1; row B yields edge T1 → T2.
  2. DFS. Start at T1 → follow to T2 → back-edge to T1, which is on the DFS path → cycle found.
  3. Victim. Both are holders, both active, no deadlock priority set, log-record counts tie at one update each — the youngest tie-breaker picks T2 (it started later).
  4. Wake. With the default infinite wait, T2 resumes as LOCK_RESUMED_ABORTED_FIRST → rollback releases its X on B. (With a finite wait_msecs it would get the retryable DEADLOCK_TIMEOUT instead.)
  5. Cascade. The release grants T1's pending X on B (§grant cascade); T1 runs the rest of its statements and commits.

No lock is ever taken away from a running txn. The loser pays a rollback; the winner only notices a bit of extra wait.

© 2026 CUBRID Corporation. All rights reserved.

Worked example — a three-transaction cycle

Each transaction grabs one row, then reaches for the next — a circular chain.

t T1 T2 T3 Wait state
1 X on R1 → granted
2 X on R2 → granted
3 X on R3 → granted
4 X on R2 → wait T1 → T2
5 X on R3 → wait T1 → T2 → T3
6 X on R1 → wait T1 → T2 → T3 → T1 (cycle)

center

  • DFS from T1 walks T1→T2→T3, back-edge to T1 — cycle at depth 3. Victim = T3: criteria 1–5 tie; youngest decides.
  • T3 aborts → R3 freed → T2 wakes → R2 freed → T1 wakes — a chain of grant cascades.
© 2026 CUBRID Corporation. All rights reserved.

Special paths — instant probe

lock_hold_object_instant lock_object
hash operation find (read-only) find_or_insert
creates LK_ENTRY never yes
modifies holder / waiter lists never yes
on conflict returns LK_NOTGRANTED suspends the thread
  • "Would this lock be granted right now?" — the same dual-aggregate compatibility test, with zero state created.
  • User & workload — and why: scan_manager.c, row visits during heap / index scans. A real lock_object per visited row costs a hash find_or_insert plus list mutation under res_mutexper row — yet most rows are uncontended. So the scan probes read-only first and falls back to lock_object only on conflict: the common case never pays for a lock entry.
© 2026 CUBRID Corporation. All rights reserved.

Special paths — composite lock

center

  • User & workload — and why: query_executor.c, bulk DELETE / UPDATE plans (carried in the XASL). Acquiring each row's X mid-scan invites deadlocks with concurrent transactions and threads the suspends through the scan loop — so the scan only collects OIDs and all X locks land in one batch at finalize (detail ch. 9).
  • Built-in escalation: if a class's collected-OID count reaches the lock_escalation threshold, collection stops and finalize takes class X instead.
© 2026 CUBRID Corporation. All rights reserved.

Special paths — NON2PL, shadows for early-released S locks

Phase Function What happens
1 — create lock_add_non2pl_lock the released S entry becomes a shadow on the resource + txn non2pl lists
2 — mark lock_update_non2pl_list a later conflicting grant flips it to INCON_NON_TWO_PHASE_LOCK
3 — notify lock_notify_isolation_incons next fetch returns LC_FETCH_DECACHE_LOCK — the client drops its cached copy
4 — cleanup lock_unlock_all all shadows removed at commit / rollback
  • A shadow never blocks anyone — it is bookkeeping, not a lock mode in the matrix sense.
  • Why it exists: RC's statement-end S release is a deliberate 2PL violation — afterwards the client still holds a cached copy it no longer protects. The shadow records that fact, so a later conflicting grant can mark the copy stale (phase 2) and decache it at the next fetch (phase 3).
  • User & workload: serial.c — serial-row access under RC (§caller map: the main NON2PL exerciser). Predates MVCC, when every RC read released early; today only the MVCC-disabled niche remains.
© 2026 CUBRID Corporation. All rights reserved.

Special paths — the rest of the API surface

API One line
lock_classes_lock_hint batch class locks for a prefetch set (locator_sr.c), sorted by OID first — consistent order prevents class-level deadlocks
lock_demote_class_lock the only downgrade path; class locks only, "carefully used" (checksumdb, index load)
lock_subclass MGL one level up: a subclass lock requires an intent lock on the superclass (partitions)
lock_rep_read_tran root-class lock for ALTER TABLE … ADD COLUMN NOT NULL under RR — one call site

None of these introduce new machinery — every one bottoms out in lock_internal_perform_lock_object.

© 2026 CUBRID Corporation. All rights reserved.

The two analysis docs carry what a slide can't:

Doc What's there
cubrid-lock-manager.md design intent, theory ↔ code mapping, open questions, position hints
…-detail.md ch. 1–2 every struct field; boot, the hash function, lockfree integration
ch. 3–4 paths A–G line-by-line; UPR deep dive (ta / tb / tc candidates)
ch. 5–7 NON2PL & instant-lock internals; suspend / timeout machinery
ch. 8–9 WFG staleness checks, victim-selection code; composite / demote / subclass

Both docs keep position-hint tables with line numbers as of their updated: dates.
When they drift: git grep -n '<symbol>' src/transaction/.

© 2026 CUBRID Corporation. All rights reserved.

Beyond CUBRID — research frontiers

Direction One line
Bamboo (2021) Release X locks before commit. Generalizes CUBRID's NON2PL.
Brook-2PL (2025) Static dependency pre-analysis → deadlock-free 2PL.
TXSQL (2025) Adaptive lock-mode adjustment + contention-aware scheduling.
OCC (Hekaton / Silo / Cicada) Replace the lock table with commit-time validation.
SSI (PostgreSQL) Predicate locking — a cheaper path to serializability.
VLL Partition the lock table itself when it is the bottleneck.

CUBRID's dial position: textbook 2PL + MGL + WFG detection. A well-defended point.

© 2026 CUBRID Corporation. All rights reserved.

Thank you

Q & A

  • Analysis: cubrid-lock-manager.md  ·  deep dive: cubrid-lock-manager-detail.md
  • Code: src/transaction/lock_manager.{h,c} · lock_table.{h,c} · wait_for_graph.{h,c}
  • Questions afterwards: hgryoo@cubrid.com
© 2026 CUBRID Corporation. All rights reserved.

Appendix

© 2026 CUBRID Corporation. All rights reserved.

Appendix A — Lock vs Latch, side by side

Two words that sound alike. Same DBMS, completely different machines.

Dimension Lock Latch
What it protects transactional serialization order (logical) physical structure integrity (mid-split, mid-rebalance)
Held for duration of the transaction (or per-statement under RC) duration of a critical section (microseconds)
Stored external lock table (LK_RES, LK_ENTRY) inside the page / structure itself (PGBUF_LATCH)
Acquisition discipline 2PL (growing → shrinking) latch coupling, fixed lock order, deadlock-free by design
Granularity OID-shaped: database, class, instance page, list, hash bucket
Conflict resolution wait → WFG cycle scan → victim rollback spin or sleep; never deadlocks if discipline is followed

Database Internals ch. 5's first hard rule: don't conflate them.

© 2026 CUBRID Corporation. All rights reserved.

Appendix A — Lock vs Latch, in code (CUBRID heap insert)

PGBUF_LATCH page (X)              ← physical: nobody else may mutate this page
allocate slot, write record       ← page is now correct in memory
lock_object on (page, slot) OID   ← logical: claim transactional visibility
UNLATCH page                      ← physical exclusion ends here

... continue with the rest of the txn (latch released, lock retained) ...

at commit: lock_unlock_object     ← logical lock finally released
  • Page latch lives for microseconds — only while the bytes mutate.
  • Row lock lives for minutes to hours — until the txn commits.
  • The two timescales differ by orders of magnitude. Conflating them is a structural bug:
    • holding a lock during a page-level critical section serializes the entire workload through that page.
    • holding a latch across a network round-trip serializes the entire workload through that thread.
  • B-tree split uses latches only — no transactional visibility implications.
© 2026 CUBRID Corporation. All rights reserved.

Appendix B — the lock-table hash is not a modulo

// lock_get_hash_value — src/transaction/lock_manager.c
next_base_slotid = 2;
while (next_base_slotid <= slotid)
  next_base_slotid *= 2;
addr = pageid + (htsize / next_base_slotid)
              * (2 * slotid - next_base_slotid + 1);
return addr % htsize;
OID (page, slot) bucket (htsize = 6,000)
(100, 1) 3100
(100, 2) 1600
(100, 3) 4600
(100, 4) 850
  • Consecutive slots of one heap page are strided apart (van Emde Boas-style), so a scan locking rows in order hits widely separated buckets — no chain contention between parallel scans.
  • slotid ≤ 0 (index last-key OIDs use -1) falls back to a simple pageid - slotid.
© 2026 CUBRID Corporation. All rights reserved.

Appendix C — Q&A: how does release detach an entry?

"When a transaction commits, does it search the lock table for its entries?" — No. Commit walks its own hold lists and calls unlock with the entry pointer already in hand (§dual-view). The only walk left is splicing the entry out of the resource's holder list:

// lock_internal_perform_unlock_object — src/transaction/lock_manager.c
/* resource-side list is singly linked — find prev to splice */
prev = NULL;  curr = res_ptr->holder;
while (curr != NULL) {
  if (curr->tran_index == tran_index) break;
  prev = curr;  curr = curr->next;
}
if (prev == NULL) res_ptr->holder = curr->next;
else              prev->next = curr->next;
Linkage Shape Removal Why this shape
tran_next / tran_prev — txn view doubly linked O(1) commit unlinks thousands, one by one (lock_unlock_all)
next — resource view singly linked prev-walk above few holders per object; the same res_mutex section re-walks the list anyway (recompute + grant passes)
© 2026 CUBRID Corporation. All rights reserved.