CUBRID Transaction — TDES, Isolation, and Savepoints

Contents:

Theoretical Background
Common DBMS Design
CUBRID’s Approach
Source Walkthrough
Source verification (as of 2026-05-01)
Beyond CUBRID — Comparative Designs & Research Frontiers
Sources

Theoretical Background

A transaction in the relational sense is the unit of atomicity and isolation the engine sells to the client. Database Internals (Petrov, ch. 5 §“Transactions”) frames the responsibilities hierarchically: ACID-A (atomicity) is owned by the recovery manager through WAL and CLRs; ACID-D (durability) by the log force-flush at commit; ACID-C (consistency) by the data-side constraints; and ACID-I (isolation) by the joint operation of the lock manager and the MVCC visibility machinery. The transaction module sits at the hub where those four threads meet: it owns the per-transaction state that lock manager, log manager, MVCC table, and recovery manager all read and write.

The unit of state is the transaction descriptor (TDES in CUBRID, PGPROC in PostgreSQL, trx_t in InnoDB). The descriptor has a long list of obligations:

A stable identity (trid) that survives connection drops and appears in every log record.
A lifecycle state (active, committed, aborted, in-doubt, …) that recovery’s analysis pass reconstructs at restart.
An isolation level that gates what the access path’s visibility and lock acquisition look like.
A history of LSAs — head, tail, undo-next, postpone-next, savepoint, top-op — that recovery and rollback need to walk the transaction backwards.
A collection of side state — modified-classes registry, replication records, unique-index statistics, lob locators — that needs to be cleaned at commit / abort.

Two implementation choices the transaction model leaves open shape the rest of this document:

Where the TDES lives and how it is named. The textbook answer is “in a fixed-size transaction table indexed by transaction index”. The variants are how the table is sized (static vs. elastic), how indices are reused, and what is hot vs. cold inside the descriptor. CUBRID picks a fixed-size table, allocates TDES slots from a contiguous area, and recycles slots through a hint_free_index.
How isolation is enforced: at the access boundary, at the statement boundary, or both. SI engines enforce read isolation via snapshot acquisition; 2PL engines enforce it via lock acquisition; hybrid engines (CUBRID is one) acquire snapshots and additionally take key-range locks for SERIALIZABLE, while REPEATABLE READ relies on its snapshot for read stability. The isolation field on the TDES is the dispatch key.

Once those choices are named, every other piece of state on the TDES is in service of one of them.

Common DBMS Design

Every relational engine that supports nested rollbacks, isolation level toggling, and recovery uses the same handful of patterns around the transaction descriptor.

Per-transaction descriptor table

A fixed-size array of descriptors, indexed by a small integer (“transaction index”). The trade-off: fixed size means client count caps at table size, but indexing is O(1) and the memory layout is cache-friendly. PostgreSQL uses a MaxBackends-sized PGPROC array; InnoDB has a trx_sys->trx_list; SQL Server uses a hash on tid. CUBRID is in the fixed-array camp.

Isolation level as TDES enum

Isolation is a per-TDES enum with three or four values (READ COMMITTED, REPEATABLE READ, SERIALIZABLE; some engines add READ UNCOMMITTED). The value is read at three places: snapshot acquisition (which kind of MVCC snapshot to build), lock acquisition (whether to take key-range locks), and statement boundary (whether to release short-cursor locks early). Every engine has the same three-way switch.
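
The three-way switch can be pictured as a small lookup from isolation level to policy. The following is a minimal sketch, not engine code: the enum values mirror the DB_TRAN_ISOLATION values quoted later in this document, while IsolationPolicy and policy_for are hypothetical names introduced only for illustration.

```cpp
#include <cassert>

// Values mirror DB_TRAN_ISOLATION; everything else here is illustrative.
enum TranIsolation
{
  TRAN_READ_COMMITTED = 0x04,
  TRAN_REPEATABLE_READ = 0x05,
  TRAN_SERIALIZABLE = 0x06
};

// Hypothetical summary of the three decisions the isolation level gates.
struct IsolationPolicy
{
  bool snapshot_per_statement;    // rebuild the MVCC snapshot for each statement
  bool key_range_locks;           // take key-range locks during scans
  bool release_at_statement_end;  // drop short-lived read state when a statement ends
};

inline IsolationPolicy
policy_for (TranIsolation iso)
{
  switch (iso)
    {
    case TRAN_READ_COMMITTED:
      return {true, false, true};
    case TRAN_REPEATABLE_READ:
      return {false, false, false};
    case TRAN_SERIALIZABLE:
      return {false, true, false};
    }
  assert (false && "unknown isolation level");
  return {false, false, false};
}
```

A caller would evaluate policy_for once per decision point (snapshot build, lock request, statement end) and branch on the matching field, so the level stays the single dispatch key.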

Nested top-operations / system operations

Any operation that touches multiple pages atomically (B+Tree split, heap overflow allocation, schema mutation) needs a sub-transactional unit of recovery — a “system op” in CUBRID, a “subxact” in PostgreSQL, a “mtr” in InnoDB. The TDES carries a stack of in-progress system ops. Commit pops the top frame and merges its log range into the parent; abort pops and rolls back the frame. The stack is a recursion-style structure, not a tree.

Savepoints as named LSAs

A savepoint is a name attached to “the LSA of the most recent log record at savepoint creation”. Rollback-to-savepoint is “undo all log records between current tail LSA and the savepoint LSA”. The implementation is a chain anchored on the TDES: savept_lsa points at the newest savepoint record, and each savepoint log record carries a prv_savept pointer to the one before it. Re-establishing savepoints in recovery happens by walking the chain.

Lifecycle as state machine, not flag soup

Naive engines use a bool committed and a bool aborted. Real engines use an enum with 10+ states because 2PC and postpone operations create transitions (“committed-with-postpone”, “committed-informing-participants”, “2PC-prepared”, “unilaterally-aborted”) that aren’t expressible with two booleans. CUBRID’s TRAN_STATE enum has 15 values; PostgreSQL has fewer (2PC is its own subsystem); InnoDB has fewer still (its XA support needs only a single prepared state).

Theory ↔ CUBRID mapping

Theoretical concept	CUBRID name
Transaction identifier	`TRANID trid` (in `LOG_TDES`)
Transaction descriptor	`LOG_TDES` (`log_impl.h`)
Transaction table	`TRANTABLE log_Gl.trantable` (`log_impl.h`)
Transaction state enum	`TRAN_STATE` — 15 states (`log_comm.h`)
Isolation level enum	`DB_TRAN_ISOLATION` aliased to `TRAN_ISOLATION` (`compat/dbtran_def.h`)
Active / committed / aborted predicates	`LOG_ISTRAN_ACTIVE`, `LOG_ISTRAN_COMMITTED`, `LOG_ISTRAN_ABORTED` macros (`log_impl.h`)
MVCC info (visibility, snapshot)	`LOG_TDES::mvccinfo` of type `MVCC_INFO`
Nested top-op stack	`LOG_TDES::topops` of type `LOG_TOPOPS_STACK` (`log_impl.h`)
Top-op log range	`LOG_TOPOPS_ADDRESSES { lastparent_lsa; posp_lsa }` per stack frame
Savepoint	`LOG_TDES::savept_lsa` + `LOG_REC_SAVEPT` chain
Postpone (deferred actions)	`LOG_TDES::posp_nxlsa` + `log_postpone_cache m_log_postpone_cache`
Modified-class registry	`LOG_TDES::m_modified_classes` of type `tx_transient_class_registry`
Per-tran B+Tree unique stats	`LOG_TDES::m_multiupd_stats` of type `multi_index_unique_stats`
2PC coordinator info	`LOG_TDES::coord` of type `LOG_2PC_COORDINATOR *` (covered in cubrid-2pc.md)
2PC global tran info	`LOG_TDES::gtrinfo` of type `LOG_2PC_GTRINFO`
Recovery-time TDES annotations	`LOG_TDES::rcv` of type `LOG_RCV_TDES`
Server-side commit entry	`xtran_server_commit` (`transaction_sr.c`)
Server-side abort entry	`xtran_server_abort` (`transaction_sr.c`)
Client-visible commit	`tran_commit` (`transaction_cl.c`)
Reusable index assignment	`logtb_assign_tran_index` (`log_tran_table.c`)
System op (sub-transaction) stack push	`log_sysop_start` (`log_manager.c`)

CUBRID’s Approach

The transaction module’s four moving parts are the trantable that holds all live TDES, the TDES itself, the lifecycle state machine that the trantable’s entries traverse, and the system-op stack the TDES owns for sub-transactional rollback. We walk them in that order.

Overall structure

flowchart LR
  subgraph CL["Client side (transaction_cl.c)"]
    TCL["tm_Tran_index\ntm_Tran_isolation\ntm_Tran_ID"]
    API["tran_commit\ntran_abort\ntran_savepoint_internal"]
    TCL --> API
  end
  subgraph SR["Server side (transaction_sr.c)"]
    XSC["xtran_server_commit"]
    XSA["xtran_server_abort"]
    XSV["xtran_server_savepoint"]
  end
  subgraph TT["Trantable (log_Gl.trantable)"]
    HDR["TRANTABLE { num_total_indices, hint_free_index, all_tdes[] }"]
    T1["LOG_TDES idx=1\ntrid=42"]
    T2["LOG_TDES idx=2\ntrid=43"]
    Tn["..."]
    HDR --> T1
    HDR --> T2
    HDR --> Tn
  end
  subgraph LM["log_manager"]
    LC["log_commit"]
    LA["log_abort"]
    LS["log_sysop_start / commit / abort"]
  end
  API -->|RPC| XSC
  API -->|RPC| XSA
  API -->|RPC| XSV
  XSC --> LC
  XSA --> LA
  LC --> T1
  LA --> T1
  LS --> T1

The figure encodes the three boundaries the transaction module sits at. Client/server: the client TDES is a thin shadow (the tm_Tran_* globals); the heavy state lives server-side. TDES/trantable: TDES slots are owned by the trantable; lookups are O(1) by index. TDES/log: every commit, abort, savepoint, or system-op call mutates the TDES and appends a log record; the two are kept in lockstep.

TDES — the descriptor

LOG_TDES in log_impl.h is the central data structure of the transaction module. It is large; below is the load-bearing slice with line-comment annotation.

// LOG_TDES — src/transaction/log_impl.h
struct log_tdes
{
  /* === MVCC and identity === */
  MVCC_INFO mvccinfo;           /* MVCC info — snapshot, MVCCID, sub-IDs */

  int       tran_index;         /* Index into trantable */
  TRANID    trid;               /* Stable transaction identifier */

  /* === lifecycle === */
  bool          isloose_end;
  TRAN_STATE    state;          /* 15-value enum */
  TRAN_ISOLATION isolation;     /* READ_COMMITTED | REPEATABLE_READ | SERIALIZABLE */
  int           wait_msecs;     /* Lock wait timeout */

  /* === LSA chain — these are the recovery anchors === */
  LOG_LSA head_lsa;             /* First record of this transaction */
  LOG_LSA tail_lsa;             /* Last record */
  LOG_LSA undo_nxlsa;           /* Next record to undo (compensate-aware) */
  LOG_LSA posp_nxlsa;           /* First / next postpone record */
  LOG_LSA savept_lsa;           /* Last user/system savepoint */
  LOG_LSA topop_lsa;            /* Last system op */
  LOG_LSA tail_topresult_lsa;   /* Last partial abort/commit */
  LOG_LSA commit_abort_lsa;     /* Commit/abort record (used by checkpoint) */

  /* === client identity, locking, and 2PC === */
  int             client_id;
  int             gtrid;        /* Global tran ID for 2PC */
  CLIENTIDS       client;
  SYNC_RMUTEX     rmutex_topop; /* Reentrant mutex serialising sysop begin/end */
  LOG_TOPOPS_STACK topops;      /* Active sub-transactional ops */
  LOG_2PC_GTRINFO gtrinfo;
  LOG_2PC_COORDINATOR *coord;   /* NULL unless this site is the 2PC coordinator */

  /* === per-transaction caches and stats === */
  int   num_unique_btrees;
  multi_index_unique_stats m_multiupd_stats;
  volatile sig_atomic_t interrupt;
  tx_transient_class_registry m_modified_classes;
  int   num_transient_classnames;
  int   num_repl_records;
  struct log_repl *repl_records;
  LOG_LSA repl_insert_lsa;
  LOG_LSA repl_update_lsa;
  void   *first_save_entry;
  int    suppress_replication;
  struct lob_rb_root lob_locator_root;
  INT64 query_timeout;
  INT64 query_start_time;
  INT64 tran_start_time;
  XASL_ID xasl_id;
  LK_RES *waiting_for_res;       /* The lock-resource I'm blocked on, if any */
  int     disable_modifications;
  TRAN_ABORT_REASON tran_abort_reason;
  int    num_exec_queries;
  DB_VALUE_ARRAY bind_history[MAX_NUM_EXEC_QUERY_HISTORY];
  int    num_log_records_written;
  LOG_TRAN_UPDATE_STATS log_upd_stats;
  bool   has_deadlock_priority;
  bool   block_global_oldest_active_until_commit;
  bool   is_user_active;
  LOG_RCV_TDES rcv;              /* Recovery-time annotations only */
  log_postpone_cache m_log_postpone_cache;
  bool   has_supplemental_log;
  char  *ddl_sql_user_text;
  // ... member functions for sysop locking and oldest-mvccid pinning ...
};

The struct is dense but cleanly stratified. The first block (mvcc info, identity) is what every page access reads. The second block (state, isolation, wait_msecs) is what lock acquisition and visibility decisions read. The LSA chain is the recovery-side contract: every TDES needs head_lsa/tail_lsa so analysis can identify the transaction, undo_nxlsa for rollback, posp_nxlsa for postpone replay, and savept_lsa / topop_lsa / tail_topresult_lsa for partial-rollback semantics. The remaining fields are bookkeeping: the lock the transaction is waiting on, lobs to clean up, replication records to ship, modified classes to invalidate on commit.

Trantable — the table of TDES

The trantable in log_impl.h is a small header plus a contiguous allocation area for TDES.

// TRANTABLE — src/transaction/log_impl.h
struct trantable
{
  int num_total_indices;        /* Capacity (configured at boot) */
  int num_assigned_indices;     /* Currently in use */
  int num_coord_loose_end_indices;
  int num_prepared_loose_end_indices;
  int hint_free_index;          /* Speeds up next assignment */
  volatile sig_atomic_t num_interrupts;
  LOG_ADDR_TDESAREA *area;      /* Linked list of TDES storage areas */
  LOG_TDES **all_tdes;          /* Indexed pointer table */
};

The two important properties: (a) all_tdes is the lookup table indexed by transaction index, so LOG_FIND_TDES(idx) is one load. (b) area is a chain of contiguous allocations rather than a single block, because the table can grow (logtb_grow_* paths) without invalidating existing pointers.
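
The two properties can be illustrated with a toy model. This is a sketch under stated assumptions, not CUBRID's allocator: Tdes, TranTableModel, grow, and find are hypothetical names, and the real growth trigger and locking are not modelled. It only shows why chaining areas keeps existing descriptor pointers valid while the pointer table gives a one-step lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Tdes { int tran_index = -1; };   // stand-in for LOG_TDES

// Hypothetical model of the trantable: each area is one contiguous
// allocation; growing appends an area and never moves existing descriptors.
class TranTableModel
{
public:
  void grow (std::size_t count)
  {
    assert (count > 0);
    auto area = std::make_unique<Tdes[]> (count);
    for (std::size_t i = 0; i < count; i++)
      {
        area[i].tran_index = static_cast<int> (all_tdes.size ());
        all_tdes.push_back (&area[i]);
      }
    areas.push_back (std::move (area));
  }

  // Lookup by transaction index, in the spirit of LOG_FIND_TDES.
  Tdes *find (int idx) const
  {
    if (idx < 0 || static_cast<std::size_t> (idx) >= all_tdes.size ())
      return nullptr;
    return all_tdes[idx];
  }

private:
  std::vector<std::unique_ptr<Tdes[]>> areas;  // chain of storage areas
  std::vector<Tdes *> all_tdes;                // index -> descriptor
};
```

Had the model stored descriptors in one resizable block, growing it could relocate every Tdes and invalidate pointers held by running threads; with chained areas only the index table is reallocated.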

logtb_assign_tran_index (log_tran_table.c:796) is the assigner. It uses hint_free_index to find a free slot fast, allocates a new area if needed, and initializes a fresh TDES — including logtb_set_loose_end_chkpt_lsa and the first head_lsa. The matching logtb_release_tran_index (1139) clears the TDES, releases locks the transaction held, and updates hint_free_index.
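
A hint-based slot search of this kind can be sketched as follows. This is an illustrative, single-threaded model: SlotTable, assign, release, and NULL_TRAN_INDEX are hypothetical names, and whether the real release path points the hint at the freed slot is an assumption. Synchronisation, which the engine handles through the trantable critical section, is left out.

```cpp
#include <cassert>
#include <vector>

constexpr int NULL_TRAN_INDEX = -1;  // hypothetical "no slot" sentinel

struct SlotTable
{
  std::vector<bool> in_use;
  int hint_free_index = 0;

  explicit SlotTable (int capacity) : in_use (capacity, false) {}

  // Scan circularly starting at the hint; return NULL_TRAN_INDEX when full.
  int assign ()
  {
    const int n = static_cast<int> (in_use.size ());
    for (int step = 0; step < n; step++)
      {
        int idx = (hint_free_index + step) % n;
        if (!in_use[idx])
          {
            in_use[idx] = true;
            hint_free_index = (idx + 1) % n;
            return idx;
          }
      }
    return NULL_TRAN_INDEX;
  }

  // Free the slot and point the hint at it so the next assign finds it first.
  void release (int idx)
  {
    assert (idx >= 0 && idx < static_cast<int> (in_use.size ()) && in_use[idx]);
    in_use[idx] = false;
    hint_free_index = idx;
  }
};
```

The hint only shortens the common-case search; correctness does not depend on it, because the scan still visits every slot before reporting that the table is full.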

The trantable lives behind the TR_TABLE_CS critical section (csect_enter (CSECT_TRAN_TABLE)); writers (assign / release) take it in write mode, readers (most TDES lookups) take it in read mode or skip it entirely when going via direct index.

Lifecycle — the TRAN_STATE state machine

TRAN_STATE (log_comm.h) is the lifecycle enum. It has 15 values, which matters: a “transaction” is more than alive/dead/in-progress because postpone, 2PC, and unilateral abort each create their own intermediate states the recovery analysis pass must distinguish.

// TRAN_STATE — src/transaction/log_comm.h
typedef enum
{
  TRAN_RECOVERY,                                  /* system tran for recovery */
  TRAN_ACTIVE,                                    /* normal in-flight */

  TRAN_UNACTIVE_COMMITTED,                        /* commit complete */
  TRAN_UNACTIVE_WILL_COMMIT,                      /* commit log written, force pending */
  TRAN_UNACTIVE_COMMITTED_WITH_POSTPONE,          /* committed, postpones running */
  TRAN_UNACTIVE_TOPOPE_COMMITTED_WITH_POSTPONE,   /* sysop-postpone variant */

  TRAN_UNACTIVE_ABORTED,                          /* user-initiated abort */
  TRAN_UNACTIVE_UNILATERALLY_ABORTED,             /* system aborted (crash) */

  TRAN_UNACTIVE_2PC_PREPARE,                      /* prepared, waiting decision */
  TRAN_UNACTIVE_2PC_COLLECTING_PARTICIPANT_VOTES, /* coordinator phase 1 */
  TRAN_UNACTIVE_2PC_ABORT_DECISION,               /* coordinator phase 2 — abort */
  TRAN_UNACTIVE_2PC_COMMIT_DECISION,              /* coordinator phase 2 — commit */
  TRAN_UNACTIVE_COMMITTED_INFORMING_PARTICIPANTS, /* informing after commit */
  TRAN_UNACTIVE_ABORTED_INFORMING_PARTICIPANTS,   /* informing after abort */

  TRAN_UNACTIVE_UNKNOWN
} TRAN_STATE;

The LOG_ISTRAN_* predicates in log_impl.h:143-183 collapse the enum into the questions the rest of the engine asks: LOG_ISTRAN_ACTIVE checks “is this a normal in-flight tran on a restarted server”, LOG_ISTRAN_COMMITTED collapses 5 commit-side states, LOG_ISTRAN_ABORTED collapses 4 abort-side states, LOG_ISTRAN_2PC_IN_SECOND_PHASE collapses the second-phase 2PC states. The collapsed views are how the recovery manager decides “do we need to redo, undo, or finish 2PC”, and how callers like logpb_checkpoint decide whether a TDES needs further attention.
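
The collapsing can be sketched as plain functions over the enum. The enumerators are copied from the TRAN_STATE listing above; the function names are hypothetical, and the exact membership of each group is an assumption chosen to match the counts stated in the text (five commit-side, four abort-side), not a transcription of the macros.

```cpp
#include <cassert>

// Enumerators copied from the TRAN_STATE listing so the sketch compiles alone.
enum TranState
{
  TRAN_RECOVERY,
  TRAN_ACTIVE,
  TRAN_UNACTIVE_COMMITTED,
  TRAN_UNACTIVE_WILL_COMMIT,
  TRAN_UNACTIVE_COMMITTED_WITH_POSTPONE,
  TRAN_UNACTIVE_TOPOPE_COMMITTED_WITH_POSTPONE,
  TRAN_UNACTIVE_ABORTED,
  TRAN_UNACTIVE_UNILATERALLY_ABORTED,
  TRAN_UNACTIVE_2PC_PREPARE,
  TRAN_UNACTIVE_2PC_COLLECTING_PARTICIPANT_VOTES,
  TRAN_UNACTIVE_2PC_ABORT_DECISION,
  TRAN_UNACTIVE_2PC_COMMIT_DECISION,
  TRAN_UNACTIVE_COMMITTED_INFORMING_PARTICIPANTS,
  TRAN_UNACTIVE_ABORTED_INFORMING_PARTICIPANTS,
  TRAN_UNACTIVE_UNKNOWN
};

// "Normal in-flight transaction on a restarted server".
inline bool
is_active (TranState s, bool server_restarted)
{
  return s == TRAN_ACTIVE && server_restarted;
}

// Assumed membership: the five commit-side states.
inline bool
is_committed (TranState s)
{
  switch (s)
    {
    case TRAN_UNACTIVE_WILL_COMMIT:
    case TRAN_UNACTIVE_COMMITTED:
    case TRAN_UNACTIVE_COMMITTED_WITH_POSTPONE:
    case TRAN_UNACTIVE_2PC_COMMIT_DECISION:
    case TRAN_UNACTIVE_COMMITTED_INFORMING_PARTICIPANTS:
      return true;
    default:
      return false;
    }
}

// Assumed membership: the four abort-side states.
inline bool
is_aborted (TranState s)
{
  switch (s)
    {
    case TRAN_UNACTIVE_ABORTED:
    case TRAN_UNACTIVE_UNILATERALLY_ABORTED:
    case TRAN_UNACTIVE_2PC_ABORT_DECISION:
    case TRAN_UNACTIVE_ABORTED_INFORMING_PARTICIPANTS:
      return true;
    default:
      return false;
    }
}
```

The point of the grouping is that callers ask one question ("did this transaction end up on the commit side?") instead of enumerating states, so adding a state later means touching one predicate rather than every call site.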

Isolation — three levels, dispatched at access time

DB_TRAN_ISOLATION (compat/dbtran_def.h) is an enum whose three usable levels (0x04 to 0x06) fit in three bits:

// DB_TRAN_ISOLATION — src/compat/dbtran_def.h
typedef enum
{
  TRAN_UNKNOWN_ISOLATION = 0x00,

  TRAN_READ_COMMITTED      = 0x04,   /* alias TRAN_REP_CLASS_COMMIT_INSTANCE,
                                                TRAN_CURSOR_STABILITY */
  TRAN_REPEATABLE_READ     = 0x05,   /* alias TRAN_REP_READ */
  TRAN_SERIALIZABLE        = 0x06,   /* alias TRAN_NO_PHANTOM_READ */

  TRAN_DEFAULT_ISOLATION = TRAN_READ_COMMITTED,
} DB_TRAN_ISOLATION;

The value is set per-TDES (log_tdes::isolation), with a client-side shadow in tm_Tran_isolation (transaction_cl.h). It is read at three places:

Snapshot acquisition (mvcc_satisfies_snapshot in cubrid-mvcc.md): READ COMMITTED reacquires per-statement; REPEATABLE READ holds the snapshot for the transaction; SERIALIZABLE behaves like REPEATABLE READ at the snapshot level but adds key-range locks.
Lock acquisition (lock manager): SERIALIZABLE takes key-range locks at scan boundaries. REPEATABLE READ relies on MVCC for read-stability and only locks data being written. READ COMMITTED takes minimal locks; locks released at statement boundary.
Statement boundary (xtran_*_query_end_*): for cursor-stability semantics READ COMMITTED releases its snapshot; the others retain.

The deliberate aliasing — TRAN_CURSOR_STABILITY is the same value as TRAN_READ_COMMITTED — is a backward-compatibility seam: older APIs named the levels by the locking-engine vocabulary (cursor-stability, repeatable-class), newer code uses the SQL-standard names. Both work, and both compile to the same dispatch.
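
The aliasing is easy to demonstrate in isolation. The sketch below spells each alias named in the enum comments as its own enumerator, which is an assumption about how the header expresses them; DbTranIsolation and isolation_name are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Values and alias names as listed in the enum above; the spelling of the
// aliases as separate enumerators is an assumption of this sketch.
enum DbTranIsolation
{
  TRAN_UNKNOWN_ISOLATION = 0x00,
  TRAN_READ_COMMITTED = 0x04,
  TRAN_REP_CLASS_COMMIT_INSTANCE = TRAN_READ_COMMITTED,
  TRAN_CURSOR_STABILITY = TRAN_READ_COMMITTED,
  TRAN_REPEATABLE_READ = 0x05,
  TRAN_REP_READ = TRAN_REPEATABLE_READ,
  TRAN_SERIALIZABLE = 0x06,
  TRAN_NO_PHANTOM_READ = TRAN_SERIALIZABLE,
  TRAN_DEFAULT_ISOLATION = TRAN_READ_COMMITTED
};

static_assert (TRAN_CURSOR_STABILITY == TRAN_READ_COMMITTED,
               "old and new names share one value");

// Hypothetical helper: one switch arm serves every alias of a level.
// Adding a second case label for an alias would not compile (duplicate case).
inline std::string
isolation_name (DbTranIsolation iso)
{
  switch (iso)
    {
    case TRAN_READ_COMMITTED:
      return "READ COMMITTED";
    case TRAN_REPEATABLE_READ:
      return "REPEATABLE READ";
    case TRAN_SERIALIZABLE:
      return "SERIALIZABLE";
    default:
      return "UNKNOWN";
    }
}
```

Because the aliases are the same integer, code written against either vocabulary reaches the same switch arm with no translation layer.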

Commit and abort — the standard paths

Server-side commit lands at xtran_server_commit (transaction_sr.c:71); abort at xtran_server_abort (128). They are thin RPC wrappers that forward to log_commit / log_abort in log_manager.c, which is where the actual sequencing happens.

// xtran_server_commit — src/transaction/transaction_sr.c (condensed)
TRAN_STATE
xtran_server_commit (THREAD_ENTRY *thread_p, bool retain_lock)
{
  TRAN_STATE state;
  int tran_index = LOG_FIND_THREAD_TRAN_INDEX (thread_p);

  /* Guard rails: no in-flight queries, no held mutex stack. */
  // ... condensed ...

  state = log_commit (thread_p, tran_index, retain_lock);

  /* Fire post-commit triggers (replication, CDC supplemental flush). */
  // ... condensed ...
  return state;
}

The path inside log_commit (covered in cubrid-log-manager.md): append LOG_COMMIT_WITH_POSTPONE if any postpone records are buffered, run pending postpones, append LOG_COMMIT, force-flush, transition state to TRAN_UNACTIVE_COMMITTED, release locks (or retain if retain_lock), free the TDES via logtb_release_tran_index. log_abort is the mirror: append LOG_ABORT, drive undo, release locks, free.
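
The ordering of the commit steps can be made concrete with a toy model. This is a sketch of the sequence described above, not log_commit itself: MiniTdes and commit_sketch are hypothetical, the log is a vector of record names, the "(postpone)" entries are illustrative placeholders rather than real record types, and the force-flush and TDES release are only marked by a comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum TranState
{
  TRAN_ACTIVE,
  TRAN_UNACTIVE_COMMITTED_WITH_POSTPONE,
  TRAN_UNACTIVE_COMMITTED
};

struct MiniTdes
{
  TranState state = TRAN_ACTIVE;
  std::vector<std::string> postpones;  // buffered deferred actions
  bool holds_locks = true;
};

// Hypothetical stand-in for the commit path; `log` collects record names
// in the order they would be appended.
inline TranState
commit_sketch (MiniTdes &tdes, bool retain_lock, std::vector<std::string> &log)
{
  assert (tdes.state == TRAN_ACTIVE);
  if (!tdes.postpones.empty ())
    {
      log.push_back ("LOG_COMMIT_WITH_POSTPONE");
      tdes.state = TRAN_UNACTIVE_COMMITTED_WITH_POSTPONE;
      for (const std::string &action : tdes.postpones)
        log.push_back ("(postpone) " + action);  // run each deferred action
      tdes.postpones.clear ();
    }
  log.push_back ("LOG_COMMIT");
  // A real engine would force-flush the log up to this record here.
  tdes.state = TRAN_UNACTIVE_COMMITTED;
  if (!retain_lock)
    tdes.holds_locks = false;
  return tdes.state;
}
```

The design point the ordering protects: the commit-with-postpone record is durable before any deferred action runs, so a crash in the middle leaves recovery enough information to finish the postpones instead of undoing the transaction.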

System ops — sub-transactional units of recovery

Operations that are atomic-as-a-group but touch many pages (B+Tree splits, heap overflow allocation, schema mutations) use system ops. A system op opens with log_sysop_start (log_manager.c:3599), nests on the TDES’s topops stack, runs its sub-operation, then commits with log_sysop_commit (3916) or aborts with log_sysop_abort.

// LOG_TOPOPS_STACK / LOG_TOPOPS_ADDRESSES — src/transaction/log_impl.h
struct log_topops_addresses
{
  LOG_LSA lastparent_lsa;     /* Where the parent's log range was when this op began */
  LOG_LSA posp_lsa;           /* First postpone of this op */
};

struct log_topops_stack
{
  int max;
  int last;                   /* -1 ⇒ no system op in progress */
  LOG_TOPOPS_ADDRESSES *stack;
};

log_sysop_start pushes a new LOG_TOPOPS_ADDRESSES whose lastparent_lsa is the parent’s current tail_lsa. While the system op is active, its log records form a contiguous range in the transaction’s own log chain (records from other transactions may interleave in the physical log); log_sysop_commit writes a LOG_SYSOP_END_COMMIT record (with lastparent_lsa and prv_topresult_lsa for chaining) that marks the range complete. log_sysop_abort writes LOG_SYSOP_END_ABORT and walks the range backward applying undos.
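
The push and pop discipline can be sketched with a simplified per-transaction log. Assumptions are loud here: an LSA is modelled as a plain counter private to one transaction, MiniTdes and its methods are hypothetical, and undo itself is reduced to counting the records in the aborted range.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Lsa = std::int64_t;          // simplified LSA: one counter per transaction
constexpr Lsa NULL_LSA = -1;

struct TopopFrame
{
  Lsa lastparent_lsa;              // parent's tail when this op began
  Lsa posp_lsa;                    // first postpone of this op (unused here)
};

struct MiniTdes
{
  Lsa tail_lsa = NULL_LSA;
  Lsa next_lsa = 0;
  std::vector<TopopFrame> topops;  // back() is the innermost system op

  Lsa append_record () { tail_lsa = next_lsa++; return tail_lsa; }

  void sysop_start () { topops.push_back ({tail_lsa, NULL_LSA}); }

  // Pop the frame and append an end record; returns the lastparent_lsa
  // that the end record would carry.
  Lsa sysop_commit ()
  {
    assert (!topops.empty ());
    Lsa lastparent = topops.back ().lastparent_lsa;
    topops.pop_back ();
    append_record ();              // stands in for LOG_SYSOP_END_COMMIT
    return lastparent;
  }

  // Pop the frame; every record after lastparent_lsa belongs to the op.
  // Returns how many records fall in the range that must be undone.
  int sysop_abort ()
  {
    assert (!topops.empty ());
    Lsa lastparent = topops.back ().lastparent_lsa;
    topops.pop_back ();
    int to_undo = static_cast<int> (tail_lsa - lastparent);
    append_record ();              // stands in for LOG_SYSOP_END_ABORT
    return to_undo;
  }
};
```

Only the innermost frame is ever popped, which is why a stack is enough: a nested op finishes (commit or abort) before its parent can.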

The variants on log_sysop_end_* correspond to the union arms in LOG_REC_SYSOP_END (covered in cubrid-log-manager.md):

log_sysop_end_logical_undo — system op carries its own logical undo image (used for index splits where physical undo is not enough).
log_sysop_end_logical_compensate — system op was undone, leaves a CLR pointer.
log_sysop_end_logical_run_postpone — system op was used to drive a postpone.

Recovery is aware of these variants: the analysis pass categorises sysop ranges by their LOG_SYSOP_END_TYPE, and the redo / undo passes invoke the right path per variant. (Details in cubrid-recovery-manager.md.)

Savepoints — named LSAs in a chain

Savepoint creation is a log_append_savepoint (log_manager.c, declared in log_manager.h:132) that emits a LOG_SAVEPOINT record carrying the savepoint name. The TDES updates savept_lsa to the new record’s LSA; the record carries a prv_savept pointer to the previous savepoint, so the chain can be walked at rollback-to-savepoint time.

// LOG_REC_SAVEPT — src/transaction/log_record.hpp
struct log_rec_savept
{
  LOG_LSA prv_savept;       /* Previous savepoint record */
  int     length;           /* Savepoint name length follows */
};

Rollback to savepoint:

log_abort_partial(savepoint_name, savept_lsa)
  → walk savept_lsa chain to find named savepoint
  → undo from tail_lsa back to that savepoint
  → emit CLRs at each step
  → leave undo_nxlsa at the savepoint record's LSA (tail_lsa keeps advancing as CLRs are appended)

Savepoints come in two flavours marked by SAVEPOINT_TYPE (transaction_cl.h:49): USER_SAVEPOINT (named explicitly via SQL SAVEPOINT foo) and SYSTEM_SAVEPOINT (engine-internal, e.g., to bracket a statement so an error rolls back just that statement).
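
The chain walk behind rollback-to-savepoint can be sketched as follows. This is a toy model under stated assumptions: the log holds one transaction's records only, an LSA is an index into that vector, and LogRecord, MiniTdes, find_savepoint, and records_to_undo are hypothetical names. Undo is reduced to counting the records a rollback would have to compensate.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Lsa = int;                   // simplified: index into this transaction's records
constexpr Lsa NULL_LSA = -1;

struct LogRecord
{
  bool is_savepoint;
  std::string savept_name;         // meaningful only for savepoint records
  Lsa prv_savept;                  // previous savepoint, as in LOG_REC_SAVEPT
};

struct MiniTdes
{
  std::vector<LogRecord> log;      // this transaction's records only
  Lsa savept_lsa = NULL_LSA;       // most recent savepoint

  Lsa tail () const { return static_cast<Lsa> (log.size ()) - 1; }

  Lsa append_update ()
  {
    log.push_back (LogRecord{false, std::string (), NULL_LSA});
    return tail ();
  }

  Lsa append_savepoint (const std::string &name)
  {
    log.push_back (LogRecord{true, name, savept_lsa});
    savept_lsa = tail ();
    return savept_lsa;
  }

  // Walk the prv_savept chain from the newest savepoint to the named one.
  Lsa find_savepoint (const std::string &name) const
  {
    for (Lsa lsa = savept_lsa; lsa != NULL_LSA; lsa = log[lsa].prv_savept)
      if (log[lsa].savept_name == name)
        return lsa;
    return NULL_LSA;
  }

  // Number of update records a rollback to `name` would undo,
  // or -1 when no savepoint has that name.
  int records_to_undo (const std::string &name) const
  {
    Lsa target = find_savepoint (name);
    if (target == NULL_LSA)
      return -1;
    int n = 0;
    for (Lsa lsa = tail (); lsa > target; lsa--)
      if (!log[lsa].is_savepoint)
        n++;
    return n;
  }
};
```

Because the walk starts at the newest savepoint, a name that was reused resolves to its most recent definition.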

Lifecycle, end to end

stateDiagram-v2
  [*] --> ACTIVE: logtb_assign_tran_index
  ACTIVE --> WILL_COMMIT: log_commit\n(append LOG_COMMIT)
  WILL_COMMIT --> COMMITTED_W_POSTPONE: postpones queued
  COMMITTED_W_POSTPONE --> COMMITTED: log_do_postpone done
  WILL_COMMIT --> COMMITTED: no postpones
  ACTIVE --> ABORTED: log_abort\n(append LOG_ABORT, drive undo)
  ACTIVE --> UNILATERALLY_ABORTED: crash detected\nin recovery analysis
  ACTIVE --> PREPARED_2PC: log_2pc_prepare
  PREPARED_2PC --> COMMIT_DECISION: coordinator says commit
  PREPARED_2PC --> ABORT_DECISION: coordinator says abort
  COMMIT_DECISION --> INFORMING_PARTICIPANTS_C: phase 2 send
  ABORT_DECISION --> INFORMING_PARTICIPANTS_A: phase 2 send
  INFORMING_PARTICIPANTS_C --> COMMITTED: all acks received
  INFORMING_PARTICIPANTS_A --> ABORTED: all acks received
  COMMITTED --> [*]: logtb_release_tran_index
  ABORTED --> [*]
  UNILATERALLY_ABORTED --> [*]

Each transition is named by its log-record emission: log_commit emits LOG_COMMIT_WITH_POSTPONE (when postpones are pending) followed by LOG_COMMIT; log_abort emits LOG_ABORT; the 2PC paths (covered in cubrid-2pc.md) emit LOG_2PC_* types. The analysis pass at recovery walks the log forward, builds a per-TDES picture, and uses LOG_ISTRAN_* to decide each TDES’s fate.
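
The diagram can be encoded as a transition check, which is handy for assertions in tests or tooling. The sketch below transcribes only the edges drawn above, using the diagram's abbreviated state names; State and can_transition are hypothetical and do not correspond to a CUBRID function.

```cpp
#include <cassert>

// State names follow the diagram's abbreviations, not the TRAN_STATE spellings.
enum State
{
  ACTIVE,
  WILL_COMMIT,
  COMMITTED_W_POSTPONE,
  COMMITTED,
  ABORTED,
  UNILATERALLY_ABORTED,
  PREPARED_2PC,
  COMMIT_DECISION,
  ABORT_DECISION,
  INFORMING_PARTICIPANTS_C,
  INFORMING_PARTICIPANTS_A
};

// True when the diagram draws an edge from `from` to `to`.
inline bool
can_transition (State from, State to)
{
  switch (from)
    {
    case ACTIVE:
      return to == WILL_COMMIT || to == ABORTED
             || to == UNILATERALLY_ABORTED || to == PREPARED_2PC;
    case WILL_COMMIT:
      return to == COMMITTED_W_POSTPONE || to == COMMITTED;
    case COMMITTED_W_POSTPONE:
      return to == COMMITTED;
    case PREPARED_2PC:
      return to == COMMIT_DECISION || to == ABORT_DECISION;
    case COMMIT_DECISION:
      return to == INFORMING_PARTICIPANTS_C;
    case ABORT_DECISION:
      return to == INFORMING_PARTICIPANTS_A;
    case INFORMING_PARTICIPANTS_C:
      return to == COMMITTED;
    case INFORMING_PARTICIPANTS_A:
      return to == ABORTED;
    case COMMITTED:
    case ABORTED:
    case UNILATERALLY_ABORTED:
      return false;                // terminal states
    }
  return false;
}
```

A table like this also documents what is absent: there is no edge back to ACTIVE, so a descriptor that leaves the active state can only move toward a terminal one.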

Client-side shadow and the API surface

The client side of the transaction module is small: a handful of globals (tm_Tran_index, tm_Tran_isolation, tm_Tran_ID, tm_Tran_async_ws, tm_Tran_wait_msecs) and the tran_* API in transaction_cl.h.

// Client-visible API — src/transaction/transaction_cl.h (excerpt)
extern int  tran_commit (bool retain_lock);
extern int  tran_abort (void);
extern int  tran_unilaterally_abort (void);
extern int  tran_reset_isolation (TRAN_ISOLATION isolation, bool async_ws);
extern int  tran_reset_wait_times (int wait_in_msecs);
extern int  tran_savepoint_internal (const char *name, SAVEPOINT_TYPE type);
extern int  tran_abort_upto_user_savepoint (const char *name);
extern int  tran_abort_upto_system_savepoint (const char *name);
extern int  tran_2pc_start (void);
extern int  tran_2pc_prepare (void);
extern int  tran_set_global_tran_info (int gtrid, void *info, int size);
extern bool tran_has_updated (void);

Each entry point does some local bookkeeping (e.g., calling any registered tran_end_libcas_function for the broker) and then performs the server RPC. The server side is covered above.

Source Walkthrough

Anchor on symbol names, not line numbers.

Headers and types

log_tdes (log_impl.h) — the descriptor.
trantable (log_impl.h) — the table of descriptors.
log_topops_stack, log_topops_addresses (log_impl.h) — system op stack.
log_rcv_tdes (log_impl.h) — recovery-time TDES annotations.
TRAN_STATE (log_comm.h) — 15-value lifecycle enum.
DB_TRAN_ISOLATION (compat/dbtran_def.h) — 3-level isolation enum.
LOG_TOPOP_RANGE (log_manager.h) — pair of (start_lsa, end_lsa) used for nested-top postpone replay.
tx_transient_class_registry (transaction_transient.hpp) — modified-classes list that needs invalidation.

Trantable management

logtb_assign_tran_index (log_tran_table.c) — allocate a slot for a new transaction.
logtb_release_tran_index (log_tran_table.c) — return the slot.
logtb_set_current_tran_index (log_tran_table.c) — set thread-current index.
logtb_complete_mvcc (log_tran_table.c) — close out MVCC info on commit/abort.
logtb_grow_* (log_tran_table.c) — table growth.

Server entry points

xtran_server_commit (transaction_sr.c) — server commit.
xtran_server_abort (transaction_sr.c) — server abort.
xtran_server_savepoint (transaction_sr.c) — server savepoint creation.
xtran_server_unilaterally_abort_tran (transaction_sr.c) — forced abort during error recovery.

System op surface

log_sysop_start (log_manager.c) — push frame.
log_sysop_commit (log_manager.c) — pop with commit, write LOG_SYSOP_END_COMMIT.
log_sysop_abort (log_manager.c) — pop with abort, write LOG_SYSOP_END_ABORT, walk backward for undo.
log_sysop_start_atomic (log_manager.c) — atomic variant for recovery-sensitive operations (file allocation/deallocation).
log_sysop_end_logical_undo (log_manager.c) — system op that carries its own logical undo.
log_sysop_end_logical_compensate / log_sysop_end_logical_run_postpone (log_manager.c) — variants for compensation and postpone replay.
log_sysop_attach_to_outer (log_manager.c) — attach a system op’s log range to its parent without writing an end record (used when a system op is essentially a marker).

Client-side API

tran_commit (transaction_cl.c).
tran_abort (transaction_cl.c).
tran_savepoint_internal (transaction_cl.c) — both USER and SYSTEM savepoints land here.
tran_abort_upto_user_savepoint / tran_abort_upto_system_savepoint (transaction_cl.c).
tran_reset_isolation (transaction_cl.c) — flips tm_Tran_isolation and forwards to server.

Position hints as of 2026-05-01

Symbol	File	Line
`log_tdes` (struct)	`log_impl.h`	475
`log_topops_stack`	`log_impl.h`	362
`log_topops_addresses`	`log_impl.h`	353
`log_rcv_tdes`	`log_impl.h`	458
`trantable`	`log_impl.h`	602
`LOG_ISTRAN_ACTIVE` (macro)	`log_impl.h`	143
`LOG_ISTRAN_COMMITTED` (macro)	`log_impl.h`	146
`LOG_ISTRAN_ABORTED` (macro)	`log_impl.h`	153
`LOG_ISTRAN_2PC` (macro)	`log_impl.h`	173
`TRAN_STATE` enum	`log_comm.h`	36
`DB_TRAN_ISOLATION` enum	`dbtran_def.h`	36
`logtb_assign_tran_index`	`log_tran_table.c`	796
`logtb_release_tran_index`	`log_tran_table.c`	1139
`logtb_complete_mvcc`	`log_tran_table.c`	4050
`logtb_set_current_tran_index`	`log_tran_table.c`	6002
`xtran_server_commit`	`transaction_sr.c`	71
`xtran_server_abort`	`transaction_sr.c`	128
`xtran_server_savepoint`	`transaction_sr.c`	348
`log_sysop_start`	`log_manager.c`	3599
`log_sysop_start_atomic`	`log_manager.c`	3665
`log_sysop_commit_internal`	`log_manager.c`	3825
`log_sysop_commit`	`log_manager.c`	3916
`log_commit`	`log_manager.c`	5352
`log_abort`	`log_manager.c`	5461

Source verification (as of 2026-05-01)

Verified facts

LOG_TDES is a single struct of ~50 fields, not split between hot and cold. Verified at log_impl.h:475. Unlike PostgreSQL before version 14 (which split PGXACT out of PGPROC so visibility scans only touch hot fields), CUBRID inlines visibility-relevant mvccinfo next to bookkeeping fields like bind_history and query_timeout. Implication: visibility scans of the TDES table read more cache lines per descriptor than strictly necessary.
TRAN_STATE has 15 values, of which 7 belong to the 2PC state machine. Verified at log_comm.h:36-67. The LOG_ISTRAN_2PC macro at log_impl.h:173-176 collapses 6 of them into “is in 2PC”. TRAN_RECOVERY is not one of the 2PC variants; it is a pseudo-state used for the recovery worker’s pseudo-transaction.
Default isolation is TRAN_READ_COMMITTED (0x04). Verified at dbtran_def.h:53 (TRAN_DEFAULT_ISOLATION = TRAN_READ_COMMITTED) and dbtran_def.h:54 (MVCC_TRAN_DEFAULT_ISOLATION = TRAN_READ_COMMITTED). Both defaults agree because CUBRID is MVCC across the board; there is no non-MVCC mode where the default would differ.
Isolation-level enum values are deliberately aliased. TRAN_READ_COMMITTED == TRAN_REP_CLASS_COMMIT_INSTANCE == TRAN_CURSOR_STABILITY == 0x04. Verified at dbtran_def.h:40-42. The aliases preserve API compatibility with the older locking-vocabulary names; they compile to the same dispatch path.
Trantable size is configured at boot, not dynamic per transaction. Verified by reading logtb_assign_tran_index (log_tran_table.c:796): it allocates from contiguous areas chained in a LOG_ADDR_TDESAREA linked list, growing only when exhausted and never shrinking. The cap is set by the max_clients server parameter.
System ops nest via a stack on the TDES, not a separate table. Verified at log_impl.h:361-367 (LOG_TOPOPS_STACK). The stack’s last field is -1 when no system op is active, an integer index otherwise. There is no global system-op table — every TDES owns its own stack.
Lock-acquisition wait timeout is per-TDES. Verified at log_impl.h:486 (wait_msecs field) and the corresponding client-side global tm_Tran_wait_msecs in transaction_cl.h:58. The macro TRAN_LOCK_INFINITE_WAIT = -1 (log_comm.h:29) encodes the “wait forever” sentinel.
block_global_oldest_active_until_commit exists for long-running operations that need to do their own vacuuming. Verified at log_impl.h:555 and the lock_global_oldest_visible_mvccid member function declared at log_impl.h:585. Used by reorganize-partition / upgrade-domain code paths that scan large amounts of data and would otherwise have their MVCC threshold pushed forward by concurrent transactions.
LOG_2PC_GTRINFO and LOG_2PC_COORDINATOR * are inline TDES fields, present even for non-2PC transactions. Verified at log_impl.h:505-508. coord is NULL if the site is not the coordinator. The cost is one pointer plus the inline gtrinfo struct per TDES; the benefit is that attaching a 2PC role to a previously-local transaction does not re-allocate the descriptor.
LOG_RCV_TDES carries meaningful data only during recovery. Verified at log_impl.h:458 (struct definition) and 558 (inlined into log_tdes::rcv, so the member itself always exists). Its fields (sysop_start_postpone_lsa, tran_start_postpone_lsa, atomic_sysop_start_lsa, analysis_last_aborted_sysop_*) are populated during the analysis pass and consumed during redo/undo.

Open questions

TDES hot/cold split. Has anyone measured the cache-miss penalty of putting mvccinfo next to bind_history? Other engines split, presumably for a reason. Investigation path: perf stat -e cache-misses on a high-concurrency read workload; compare against a hypothetical TDES split.
Trantable growth. The header field LOG_ADDR_TDESAREA *area suggests growth is supported at runtime, but the trigger and coordination are unverified. Investigation path: grep for area writes in log_tran_table.c; check whether growth happens in the request path or only at a quiescent point.
hint_free_index correctness under contention. Multiple threads can simultaneously call logtb_assign_tran_index. The hint is single-valued — what guards it? Investigation path: read the body of logtb_assign_tran_index for compare-and-swap or mutex usage.
System-op rmutex_topop behaviour. A reentrant mutex per-TDES suggests system ops can recursively start while one is in progress on the same thread, but the depth bound is unverified. Investigation path: examine log_sysop_start for lock_topop() calls and chase the reentrance count.
Postpone cache integration. m_log_postpone_cache is a C++ class (log_postpone_cache) inlined into the TDES. Its purpose per the field comment is to remember postpone records that may be replayed at log_do_postpone. The exact lifetime (cleared on commit? on abort? carried across sysop boundaries?) is unverified. Investigation path: read log_postpone_cache.cpp together with log_do_postpone in log_manager.c.
Client-side TDES shadow vs. server reality. tm_Tran_* are client-side globals; what happens on a connection failover when the server has a different wait_msecs? Investigation path: trace tran_cache_tran_settings consumers; check whether the CAS broker re-syncs on reconnect.

Beyond CUBRID — Comparative Designs & Research Frontiers

Pointers, not analysis. Each bullet is a starting handle for a follow-up doc.

PostgreSQL PGPROC / PGXACT split: PG releases before 14 split the descriptor into a hot half (PGXACT: xid, xmin, vacuumFlags) read by visibility scans and a cold half (PGPROC: locktag arrays, myProcLocks); PostgreSQL 14 removed PGXACT in favour of dense per-field arrays. A side-by-side with CUBRID’s monolithic TDES would measure the cache cost.
InnoDB trx_t plus lock_sys reservation — InnoDB embeds per-tran lock reservation inside trx_t::lock and uses a global lock_sys_t mutex. CUBRID separates this: LK_RES *waiting_for_res on the TDES plus the lock manager’s per-resource hash. Comparing the two would illuminate the lock-acquisition critical path.
Hekaton in-memory transaction map (Larson et al., VLDB 2011) — Hekaton stores TDES in a lock-free hash on transaction-id, with versions stored inline on records. CUBRID’s fixed-array trantable is the opposite design point.
Partial rollback chains in PostgreSQL subtransactions — PG uses SubTransactionId and a per-backend stack much like CUBRID’s topops stack. The two-version subtransaction-id mapping in PG (subxact + parent xid) is more elaborate than CUBRID’s LOG_TOPOPS_ADDRESSES but the lifecycle is structurally identical.
Optimistic concurrency control on RDMA (FaRM, NSDI 2014) — FaRM eliminates the TDES table by encoding transaction state directly in record versions. CUBRID’s TDES survives because its isolation modes need the descriptor for lock acquisition; comparison highlights what the descriptor is for on a shared-memory engine.
JTA XAResource semantics (JSR 907) — the CUBRID 2PC TRAN_STATE branch is conformant to JTA prepared/commit/rollback semantics; the cubrid-2pc.md doc is the natural follow-up that enumerates the conformance points.
CockroachDB serializable + parallel commits (Taft et al., SIGMOD 2020) — Cockroach pushes the descriptor into a distributed KV layer and commits a transaction by writing a single intent record whose status is resolved lazily; the “transaction record” plays the role of CUBRID’s TDES but without a fixed-size table. A side-by-side would surface what shared-memory engines pay (the trantable cap) versus what shared-nothing engines pay (intent resolution traffic).

Sources

Raw analyses (`raw/code-analysis/cubrid/storage/transaction/`)

Transaction Internals.pdf
Transaction Internals.pptx

Textbook chapters (under `knowledge/research/dbms-general/`)

Database Internals (Petrov), Ch. 5 “Transactions and Recovery”, §“ACID” and §“Isolation levels”.
Concurrency Control and Recovery in Database Systems (Bernstein, Hadzilacos, Goodman), Ch. 1–4.

CUBRID source (`/data/hgryoo/references/cubrid/`)

src/transaction/log_impl.h — TDES, trantable, sysop stack.
src/transaction/log_tran_table.c — trantable management.
src/transaction/transaction_cl.{h,c} — client-side API.
src/transaction/transaction_sr.{h,c} — server entry points.
src/transaction/transaction_global.hpp — system tran constants.
src/transaction/transaction_transient.hpp — modified-class registry, lob locator chain.
src/transaction/log_comm.h — TRAN_STATE enum.
src/transaction/log_manager.c — sysop, commit, abort.
src/compat/dbtran_def.h — DB_TRAN_ISOLATION enum.

Sibling docs in this knowledge base

knowledge/code-analysis/cubrid/cubrid-log-manager.md — log records the TDES emits.
knowledge/code-analysis/cubrid/cubrid-mvcc.md — consumer of log_tdes::mvccinfo.
knowledge/code-analysis/cubrid/cubrid-lock-manager.md — consumer of log_tdes::wait_msecs and producer of log_tdes::waiting_for_res.
knowledge/code-analysis/cubrid/cubrid-recovery-manager.md — consumer of TDES at analysis time; in-progress in the same batch.
knowledge/code-analysis/cubrid/cubrid-2pc.md — owner of the 2PC state-machine arms and coord / gtrinfo; in-progress in the same batch.