CUBRID Transaction — TDES, Isolation, and Savepoints

A transaction in the relational sense is the unit of atomicity and isolation the engine sells to the client. Database Internals (Petrov, ch. 5 §“Transactions”) frames the responsibilities hierarchically: ACID-A (atomicity) is owned by the recovery manager through WAL and CLRs; ACID-D (durability) by the log force-flush at commit; ACID-C (consistency) by the data-side constraints; and ACID-I (isolation) by the joint operation of the lock manager and the MVCC visibility machinery. The transaction module sits at the hub where those four threads meet: it owns the per-transaction state that lock manager, log manager, MVCC table, and recovery manager all read and write.

The unit of state is the transaction descriptor (TDES in CUBRID, PGPROC in PostgreSQL, trx_t in InnoDB). The descriptor has a long list of obligations:

  • A stable identity (trid) that survives connection drops and appears in every log record.
  • A lifecycle state (active, committed, aborted, in-doubt, …) that recovery’s analysis pass reconstructs at restart.
  • An isolation level that gates what the access path’s visibility and lock acquisition look like.
  • A history of LSAs — head, tail, undo-next, postpone-next, savepoint, top-op — that recovery and rollback need to walk the transaction backwards.
  • A collection of side state — modified-classes registry, replication records, unique-index statistics, lob locators — that needs to be cleaned at commit / abort.

Two implementation choices the transaction model leaves open shape the rest of this document:

  1. Where the TDES lives and how it is named. The textbook answer is “in a fixed-size transaction table indexed by transaction index”. The variants are how the table is sized (static vs. elastic), how indices are reused, and what is hot vs. cold inside the descriptor. CUBRID sizes the table at boot, allocates TDES slots from contiguous areas, and recycles slots through a hint_free_index.
  2. How isolation is enforced — at the access boundary, at the statement boundary, or both. SI engines enforce read isolation via snapshot acquisition; 2PL engines enforce it via lock acquisition; hybrid engines (CUBRID is one) acquire snapshots and add key-range locks at the strictest level (SERIALIZABLE, with REPEATABLE READ relying on the snapshot alone). The isolation field on the TDES is the dispatch key.

Once those choices are named, every other piece of state on the TDES is in service of one of them.

Every relational engine that supports nested rollbacks, isolation level toggling, and recovery uses the same handful of patterns around the transaction descriptor.

A fixed-size array of descriptors, indexed by a small integer (“transaction index”). The trade-off: a fixed size caps the client count at the table size, but indexing is O(1) and the memory layout is cache-friendly. PostgreSQL uses a MaxBackends-sized PGPROC array; InnoDB has a trx_sys->trx_list; SQL Server uses a hash on tid. CUBRID is in the fixed-array camp.

Isolation is a per-TDES enum with three or four values (READ COMMITTED, REPEATABLE READ, SERIALIZABLE; some engines add READ UNCOMMITTED). The value is read at three places: snapshot acquisition (which kind of MVCC snapshot to build), lock acquisition (whether to take key-range locks), and statement boundary (whether to release short-cursor locks early). Every engine has the same three-way switch.

Any operation that touches multiple pages atomically (B+Tree split, heap overflow allocation, schema mutation) needs a sub-transactional unit of recovery — a “system op” in CUBRID, a “subxact” in PostgreSQL, an “mtr” in InnoDB. The TDES carries a stack of in-progress system ops. Commit pops the top frame and merges its log range into the parent; abort pops and rolls back the frame. The stack is a recursion-style structure, not a tree.

A savepoint is a name attached to “the LSA of the most recent log record at savepoint creation”. Rollback-to-savepoint is “undo all log records between current tail LSA and the savepoint LSA”. The implementation is a chain on the TDES: savept_lsa plus a prv_savept chain in each savepoint log record. Re-establishing savepoints in recovery happens by walking the chain.

Naive engines use a bool committed and a bool aborted. Real engines use an enum with 10+ states because 2PC and postpone operations create transitions (“committed-with-postpone”, “committed-informing-participants”, “2PC-prepared”, “unilaterally-aborted”) that aren’t expressible with two booleans. CUBRID’s TRAN_STATE enum has 15 values; PostgreSQL has fewer (2PC is its own subsystem); InnoDB has fewer still (no 2PC server side).

| Theoretical concept | CUBRID name |
| --- | --- |
| Transaction identifier | TRANID trid (in LOG_TDES) |
| Transaction descriptor | LOG_TDES (log_impl.h) |
| Transaction table | TRANTABLE log_Gl.trantable (log_impl.h) |
| Transaction state enum | TRAN_STATE — 15 states (log_comm.h) |
| Isolation level enum | DB_TRAN_ISOLATION aliased to TRAN_ISOLATION (compat/dbtran_def.h) |
| Active / committed / aborted predicates | LOG_ISTRAN_ACTIVE, LOG_ISTRAN_COMMITTED, LOG_ISTRAN_ABORTED macros (log_impl.h) |
| MVCC info (visibility, snapshot) | LOG_TDES::mvccinfo of type MVCC_INFO |
| Nested top-op stack | LOG_TDES::topops of type LOG_TOPOPS_STACK (log_impl.h) |
| Top-op log range | LOG_TOPOPS_ADDRESSES { lastparent_lsa; posp_lsa } per stack frame |
| Savepoint | LOG_TDES::savept_lsa + LOG_REC_SAVEPT chain |
| Postpone (deferred actions) | LOG_TDES::posp_nxlsa + log_postpone_cache m_log_postpone_cache |
| Modified-class registry | LOG_TDES::m_modified_classes of type tx_transient_class_registry |
| Per-tran B+Tree unique stats | LOG_TDES::m_multiupd_stats of type multi_index_unique_stats |
| 2PC coordinator info | LOG_TDES::coord of type LOG_2PC_COORDINATOR * (covered in cubrid-2pc.md) |
| 2PC global tran info | LOG_TDES::gtrinfo of type LOG_2PC_GTRINFO |
| Recovery-time TDES annotations | LOG_TDES::rcv of type LOG_RCV_TDES |
| Server-side commit entry | xtran_server_commit (transaction_sr.c) |
| Server-side abort entry | xtran_server_abort (transaction_sr.c) |
| Client-visible commit | tran_commit (transaction_cl.c) |
| Reusable index assignment | logtb_assign_tran_index (log_tran_table.c) |
| System op (sub-transaction) stack push | log_sysop_start (log_manager.c) |

The transaction module’s four moving parts are the trantable that holds all live TDES, the TDES itself, the lifecycle state machine that the trantable’s entries traverse, and the system-op stack the TDES owns for sub-transactional rollback. We walk them in that order.

flowchart LR
  subgraph CL["Client side (transaction_cl.c)"]
    TCL["tm_Tran_index\ntm_Tran_isolation\ntm_Tran_ID"]
    API["tran_commit\ntran_abort\ntran_savepoint_internal"]
    TCL --> API
  end
  subgraph SR["Server side (transaction_sr.c)"]
    XSC["xtran_server_commit"]
    XSA["xtran_server_abort"]
    XSV["xtran_server_savepoint"]
  end
  subgraph TT["Trantable (log_Gl.trantable)"]
    HDR["TRANTABLE { num_total_indices, hint_free_index, all_tdes[] }"]
    T1["LOG_TDES idx=1\ntrid=42"]
    T2["LOG_TDES idx=2\ntrid=43"]
    Tn["..."]
    HDR --> T1
    HDR --> T2
    HDR --> Tn
  end
  subgraph LM["log_manager"]
    LC["log_commit"]
    LA["log_abort"]
    LS["log_sysop_start / commit / abort"]
  end
  API -->|RPC| XSC
  API -->|RPC| XSA
  API -->|RPC| XSV
  XSC --> LC
  XSA --> LA
  LC --> T1
  LA --> T1
  LS --> T1

The figure encodes the three boundaries the transaction module sits at:

  • Client / server: the client TDES is a thin shadow (tm_Tran_* globals); the heavy state lives server-side.
  • TDES / trantable: TDES slots are owned by the trantable; lookups are O(1) by index.
  • TDES / log: every commit / abort / savepoint / system-op call mutates the TDES and appends a log record; the two are kept in lockstep.

LOG_TDES in log_impl.h is the central data structure of the transaction module. It is large; below is the load-bearing slice with line-comment annotation.

// LOG_TDES — src/transaction/log_impl.h
struct log_tdes
{
  /* === MVCC and identity === */
  MVCC_INFO mvccinfo;                 /* MVCC info — snapshot, MVCCID, sub-IDs */
  int tran_index;                     /* Index into trantable */
  TRANID trid;                        /* Stable transaction identifier */

  /* === lifecycle === */
  bool isloose_end;
  TRAN_STATE state;                   /* 15-value enum */
  TRAN_ISOLATION isolation;           /* READ_COMMITTED | REPEATABLE_READ | SERIALIZABLE */
  int wait_msecs;                     /* Lock wait timeout */

  /* === LSA chain — these are the recovery anchors === */
  LOG_LSA head_lsa;                   /* First record of this transaction */
  LOG_LSA tail_lsa;                   /* Last record */
  LOG_LSA undo_nxlsa;                 /* Next record to undo (compensate-aware) */
  LOG_LSA posp_nxlsa;                 /* First / next postpone record */
  LOG_LSA savept_lsa;                 /* Last user/system savepoint */
  LOG_LSA topop_lsa;                  /* Last system op */
  LOG_LSA tail_topresult_lsa;         /* Last partial abort/commit */
  LOG_LSA commit_abort_lsa;           /* Commit/abort record (used by checkpoint) */

  /* === client identity, locking, and 2PC === */
  int client_id;
  int gtrid;                          /* Global tran ID for 2PC */
  CLIENTIDS client;
  SYNC_RMUTEX rmutex_topop;           /* Reentrant mutex serialising sysop begin/end */
  LOG_TOPOPS_STACK topops;            /* Active sub-transactional ops */
  LOG_2PC_GTRINFO gtrinfo;
  LOG_2PC_COORDINATOR *coord;         /* NULL unless this site is the 2PC coordinator */

  /* === per-transaction caches and stats === */
  int num_unique_btrees;
  multi_index_unique_stats m_multiupd_stats;
  volatile sig_atomic_t interrupt;
  tx_transient_class_registry m_modified_classes;
  int num_transient_classnames;
  int num_repl_records;
  struct log_repl *repl_records;
  LOG_LSA repl_insert_lsa;
  LOG_LSA repl_update_lsa;
  void *first_save_entry;
  int suppress_replication;
  struct lob_rb_root lob_locator_root;
  INT64 query_timeout;
  INT64 query_start_time;
  INT64 tran_start_time;
  XASL_ID xasl_id;
  LK_RES *waiting_for_res;            /* The lock-resource I'm blocked on, if any */
  int disable_modifications;
  TRAN_ABORT_REASON tran_abort_reason;
  int num_exec_queries;
  DB_VALUE_ARRAY bind_history[MAX_NUM_EXEC_QUERY_HISTORY];
  int num_log_records_written;
  LOG_TRAN_UPDATE_STATS log_upd_stats;
  bool has_deadlock_priority;
  bool block_global_oldest_active_until_commit;
  bool is_user_active;
  LOG_RCV_TDES rcv;                   /* Recovery-time annotations only */
  log_postpone_cache m_log_postpone_cache;
  bool has_supplemental_log;
  char *ddl_sql_user_text;
  // ... member functions for sysop locking and oldest-mvccid pinning ...
};

The struct is dense but cleanly stratified. The first block (MVCC info, identity) is what every page access reads. The second block (state, isolation, wait_msecs) is what lock acquisition and visibility decisions read. The LSA chain is the recovery-side contract: every TDES needs head_lsa/tail_lsa so analysis can identify the transaction, undo_nxlsa for rollback, posp_nxlsa for postpone replay, and savept_lsa / topop_lsa / tail_topresult_lsa for partial-rollback semantics. The remaining fields are bookkeeping: the lock the transaction is waiting on, lobs to clean up, replication records to ship, and modified classes to invalidate on commit.
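
To make the LSA-chain contract concrete, the following minimal sketch shows how appending a record could maintain head_lsa, tail_lsa, and undo_nxlsa. The two-field LSA layout, the -1 null sentinel, and every `sketch_` name are assumptions made for illustration, not CUBRID's definitions. The sketch also ignores compensation records, where undo_nxlsa would move backward instead of following the tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical LSA: (page id, offset). A pageid of -1 means "null". */
typedef struct { int64_t pageid; int32_t offset; } sketch_lsa;

static const sketch_lsa SKETCH_NULL_LSA = { -1, -1 };

static bool sketch_lsa_is_null(sketch_lsa a) { return a.pageid == -1; }

/* Total order on LSAs: by page first, then by offset within the page. */
static int sketch_lsa_cmp(sketch_lsa a, sketch_lsa b)
{
  if (a.pageid != b.pageid) return a.pageid < b.pageid ? -1 : 1;
  if (a.offset != b.offset) return a.offset < b.offset ? -1 : 1;
  return 0;
}

/* Only the recovery anchors of the descriptor are modelled. */
typedef struct { sketch_lsa head_lsa, tail_lsa, undo_nxlsa; } sketch_tdes;

static void sketch_tdes_init(sketch_tdes *t)
{
  t->head_lsa = t->tail_lsa = t->undo_nxlsa = SKETCH_NULL_LSA;
}

/* Called after an undoable record of this transaction lands at `at`. */
static void sketch_on_append(sketch_tdes *t, sketch_lsa at)
{
  /* Records of one transaction must be appended in increasing LSA order. */
  assert(sketch_lsa_is_null(t->tail_lsa)
         || sketch_lsa_cmp(t->tail_lsa, at) < 0);
  if (sketch_lsa_is_null(t->head_lsa))
    t->head_lsa = at;   /* first record of the transaction */
  t->tail_lsa = at;     /* always the most recent record */
  t->undo_nxlsa = at;   /* a rollback would start from here */
}
```

After three appends, head_lsa still names the first record while tail_lsa and undo_nxlsa name the third, which is the shape the analysis pass and rollback rely on.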

The trantable in log_impl.h is a small header plus a contiguous allocation area for TDES.

// TRANTABLE — src/transaction/log_impl.h
struct trantable
{
  int num_total_indices;              /* Capacity (configured at boot) */
  int num_assigned_indices;           /* Currently in use */
  int num_coord_loose_end_indices;
  int num_prepared_loose_end_indices;
  int hint_free_index;                /* Speeds up next assignment */
  volatile sig_atomic_t num_interrupts;
  LOG_ADDR_TDESAREA *area;            /* Linked list of TDES storage areas */
  LOG_TDES **all_tdes;                /* Indexed pointer table */
};

The two important properties: (a) all_tdes is the lookup table indexed by transaction index, so LOG_FIND_TDES(idx) is one load. (b) area is a chain of contiguous allocations rather than a single block, because the table can grow (logtb_grow_* paths) without invalidating existing pointers.

logtb_assign_tran_index (log_tran_table.c:796) is the assigner. It uses hint_free_index to find a free slot fast, allocates a new area if needed, and initializes a fresh TDES — including logtb_set_loose_end_chkpt_lsa and the first head_lsa. The matching logtb_release_tran_index (log_tran_table.c:1139) clears the TDES, releases locks the transaction held, and updates hint_free_index.
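
The assign / release pairing can be sketched as a circular scan that starts at the hint. This is a single-threaded illustration under stated assumptions: the capacity of 4, the `sketch_` names, and the policy of pointing the hint at the slot just freed are hypothetical, and area growth is left out entirely. It is not the body of logtb_assign_tran_index.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_CAPACITY 4        /* stands in for the boot-time capacity */
#define SKETCH_NO_INDEX (-1)

typedef struct { int tran_index; int trid; bool in_use; } sketch_tdes;

typedef struct
{
  int num_total_indices;
  int num_assigned_indices;
  int hint_free_index;           /* where the next search starts */
  int next_trid;
  sketch_tdes slots[SKETCH_CAPACITY];
} sketch_trantable;

static void sketch_table_init(sketch_trantable *tt)
{
  tt->num_total_indices = SKETCH_CAPACITY;
  tt->num_assigned_indices = 0;
  tt->hint_free_index = 0;
  tt->next_trid = 1;
  for (int i = 0; i < SKETCH_CAPACITY; i++)
    {
      tt->slots[i].tran_index = i;
      tt->slots[i].trid = 0;
      tt->slots[i].in_use = false;
    }
}

/* Returns the assigned index, or SKETCH_NO_INDEX when the table is full. */
static int sketch_assign(sketch_trantable *tt)
{
  if (tt->num_assigned_indices == tt->num_total_indices)
    return SKETCH_NO_INDEX;
  /* Circular scan from the hint; the count check guarantees a free slot. */
  for (int n = 0; n < tt->num_total_indices; n++)
    {
      int i = (tt->hint_free_index + n) % tt->num_total_indices;
      if (!tt->slots[i].in_use)
        {
          tt->slots[i].in_use = true;
          tt->slots[i].trid = tt->next_trid++;   /* fresh identity per use */
          tt->num_assigned_indices++;
          tt->hint_free_index = (i + 1) % tt->num_total_indices;
          return i;
        }
    }
  return SKETCH_NO_INDEX;        /* unreachable given the count check */
}

static void sketch_release(sketch_trantable *tt, int idx)
{
  assert(idx >= 0 && idx < tt->num_total_indices && tt->slots[idx].in_use);
  tt->slots[idx].in_use = false;
  tt->num_assigned_indices--;
  tt->hint_free_index = idx;     /* the freed slot is the best next guess */
}
```

The point of the hint is that the common case (assign right after a release) finds a slot on the first probe; the slot index is reused while the trid is not.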

The trantable lives behind the TR_TABLE_CS critical section (csect_enter (CSECT_TRAN_TABLE)); writers (assign / release) take it in write mode, readers (most TDES lookups) take it in read mode or skip it entirely when going via direct index.

Lifecycle — the TRAN_STATE state machine

TRAN_STATE (log_comm.h) is the lifecycle enum. It has 15 values, which matters: a “transaction” is more than alive/dead/in-progress because postpone, 2PC, and unilateral abort each create their own intermediate states the recovery analysis pass must distinguish.

// TRAN_STATE — src/transaction/log_comm.h
typedef enum
{
  TRAN_RECOVERY,                                   /* system tran for recovery */
  TRAN_ACTIVE,                                     /* normal in-flight */
  TRAN_UNACTIVE_COMMITTED,                         /* commit complete */
  TRAN_UNACTIVE_WILL_COMMIT,                       /* commit log written, force pending */
  TRAN_UNACTIVE_COMMITTED_WITH_POSTPONE,           /* committed, postpones running */
  TRAN_UNACTIVE_TOPOPE_COMMITTED_WITH_POSTPONE,    /* sysop-postpone variant */
  TRAN_UNACTIVE_ABORTED,                           /* user-initiated abort */
  TRAN_UNACTIVE_UNILATERALLY_ABORTED,              /* system aborted (crash) */
  TRAN_UNACTIVE_2PC_PREPARE,                       /* prepared, waiting decision */
  TRAN_UNACTIVE_2PC_COLLECTING_PARTICIPANT_VOTES,  /* coordinator phase 1 */
  TRAN_UNACTIVE_2PC_ABORT_DECISION,                /* coordinator phase 2 — abort */
  TRAN_UNACTIVE_2PC_COMMIT_DECISION,               /* coordinator phase 2 — commit */
  TRAN_UNACTIVE_COMMITTED_INFORMING_PARTICIPANTS,  /* informing after commit */
  TRAN_UNACTIVE_ABORTED_INFORMING_PARTICIPANTS,    /* informing after abort */
  TRAN_UNACTIVE_UNKNOWN
} TRAN_STATE;

The LOG_ISTRAN_* predicates in log_impl.h:143-183 collapse the enum into the questions the rest of the engine asks: LOG_ISTRAN_ACTIVE checks “is this a normal in-flight tran on a restarted server”, LOG_ISTRAN_COMMITTED collapses 5 commit-side states, LOG_ISTRAN_ABORTED collapses 4 abort-side states, LOG_ISTRAN_2PC_IN_SECOND_PHASE collapses the second-phase 2PC states. The collapsed views are how the recovery manager decides “do we need to redo, undo, or finish 2PC”, and how callers like logpb_checkpoint decide whether a TDES needs further attention.
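
As an illustration of how such predicates collapse the enum, the sketch below groups five states as commit-side and four as abort-side. The enum is the one listed above; the membership of each group is an assumption chosen to match the stated counts and is not copied from log_impl.h, and the `sketch_` function names are hypothetical.

```c
#include <stdbool.h>

typedef enum
{
  TRAN_RECOVERY,
  TRAN_ACTIVE,
  TRAN_UNACTIVE_COMMITTED,
  TRAN_UNACTIVE_WILL_COMMIT,
  TRAN_UNACTIVE_COMMITTED_WITH_POSTPONE,
  TRAN_UNACTIVE_TOPOPE_COMMITTED_WITH_POSTPONE,
  TRAN_UNACTIVE_ABORTED,
  TRAN_UNACTIVE_UNILATERALLY_ABORTED,
  TRAN_UNACTIVE_2PC_PREPARE,
  TRAN_UNACTIVE_2PC_COLLECTING_PARTICIPANT_VOTES,
  TRAN_UNACTIVE_2PC_ABORT_DECISION,
  TRAN_UNACTIVE_2PC_COMMIT_DECISION,
  TRAN_UNACTIVE_COMMITTED_INFORMING_PARTICIPANTS,
  TRAN_UNACTIVE_ABORTED_INFORMING_PARTICIPANTS,
  TRAN_UNACTIVE_UNKNOWN
} TRAN_STATE;

/* Assumed commit-side group: the outcome is "commit", even if work remains. */
static bool sketch_is_committed(TRAN_STATE s)
{
  switch (s)
    {
    case TRAN_UNACTIVE_WILL_COMMIT:
    case TRAN_UNACTIVE_COMMITTED:
    case TRAN_UNACTIVE_COMMITTED_WITH_POSTPONE:
    case TRAN_UNACTIVE_2PC_COMMIT_DECISION:
    case TRAN_UNACTIVE_COMMITTED_INFORMING_PARTICIPANTS:
      return true;
    default:
      return false;
    }
}

/* Assumed abort-side group: the outcome is "abort". */
static bool sketch_is_aborted(TRAN_STATE s)
{
  switch (s)
    {
    case TRAN_UNACTIVE_ABORTED:
    case TRAN_UNACTIVE_UNILATERALLY_ABORTED:
    case TRAN_UNACTIVE_2PC_ABORT_DECISION:
    case TRAN_UNACTIVE_ABORTED_INFORMING_PARTICIPANTS:
      return true;
    default:
      return false;
    }
}
```

Callers then ask one question ("is the outcome decided, and which way?") instead of switching over fifteen values at every site.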

Isolation — three levels, dispatched at access time

DB_TRAN_ISOLATION (compat/dbtran_def.h) is an enum with three usable levels; every value fits in three bits:

// DB_TRAN_ISOLATION — src/compat/dbtran_def.h
typedef enum
{
  TRAN_UNKNOWN_ISOLATION = 0x00,
  TRAN_READ_COMMITTED = 0x04,     /* alias TRAN_REP_CLASS_COMMIT_INSTANCE,
                                     TRAN_CURSOR_STABILITY */
  TRAN_REPEATABLE_READ = 0x05,    /* alias TRAN_REP_READ */
  TRAN_SERIALIZABLE = 0x06,       /* alias TRAN_NO_PHANTOM_READ */
  TRAN_DEFAULT_ISOLATION = TRAN_READ_COMMITTED,
} DB_TRAN_ISOLATION;

The value is set per-TDES (log_tdes::isolation), with a client-side shadow in tm_Tran_isolation (transaction_cl.h). It is read at three places:

  • Snapshot acquisition (mvcc_satisfies_snapshot in cubrid-mvcc.md): READ COMMITTED reacquires per-statement; REPEATABLE READ holds the snapshot for the transaction; SERIALIZABLE behaves like REPEATABLE READ at the snapshot level but adds key-range locks.
  • Lock acquisition (lock manager): SERIALIZABLE takes key-range locks at scan boundaries. REPEATABLE READ relies on MVCC for read-stability and only locks data being written. READ COMMITTED takes minimal locks and releases them at the statement boundary.
  • Statement boundary (xtran_*_query_end_*): for cursor-stability semantics READ COMMITTED releases its snapshot; the others retain.

The deliberate aliasing — TRAN_CURSOR_STABILITY is the same value as TRAN_READ_COMMITTED — is a backward-compatibility seam: older APIs named the levels by the locking-engine vocabulary (cursor-stability, repeatable-class), newer code uses the SQL-standard names. Both work, and both compile to the same dispatch.
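
The three-way dispatch and the aliasing can be sketched together. The enum values are the ones listed above; the `sketch_policy` record and `sketch_policy_for` function are hypothetical names that condense the three read sites into one lookup, which the engine itself does not necessarily do.

```c
#include <stdbool.h>

typedef enum
{
  TRAN_UNKNOWN_ISOLATION = 0x00,
  TRAN_READ_COMMITTED = 0x04,
  TRAN_CURSOR_STABILITY = TRAN_READ_COMMITTED,   /* legacy alias, same value */
  TRAN_REPEATABLE_READ = 0x05,
  TRAN_SERIALIZABLE = 0x06
} DB_TRAN_ISOLATION;

/* Hypothetical policy record: one answer per decision point. */
typedef struct
{
  bool snapshot_per_statement;      /* snapshot acquisition */
  bool key_range_locks;             /* lock acquisition */
  bool drop_snapshot_at_stmt_end;   /* statement boundary */
} sketch_policy;

static sketch_policy sketch_policy_for(DB_TRAN_ISOLATION iso)
{
  sketch_policy p = { false, false, false };
  switch (iso)
    {
    case TRAN_READ_COMMITTED:       /* also reached via TRAN_CURSOR_STABILITY */
      p.snapshot_per_statement = true;
      p.drop_snapshot_at_stmt_end = true;
      break;
    case TRAN_SERIALIZABLE:         /* transaction-long snapshot plus range locks */
      p.key_range_locks = true;
      break;
    case TRAN_REPEATABLE_READ:      /* transaction-long snapshot only */
    default:                        /* unknown level: a real engine would reject it */
      break;
    }
  return p;
}
```

Because the alias shares the value 0x04, a second `case TRAN_CURSOR_STABILITY:` label would be a duplicate-case compile error; the old and new names cannot diverge by accident.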

Server-side commit lands at xtran_server_commit (transaction_sr.c:71); abort at xtran_server_abort (transaction_sr.c:128). They are thin RPC wrappers that forward to log_commit / log_abort in log_manager.c, which is where the actual sequencing happens.

// xtran_server_commit — src/transaction/transaction_sr.c (condensed)
TRAN_STATE
xtran_server_commit (THREAD_ENTRY *thread_p, bool retain_lock)
{
  TRAN_STATE state;
  int tran_index = LOG_FIND_THREAD_TRAN_INDEX (thread_p);

  /* Guard rails: no in-flight queries, no held mutex stack. */
  // ... condensed ...
  state = log_commit (thread_p, tran_index, retain_lock);

  /* Fire post-commit triggers (replication, CDC supplemental flush). */
  // ... condensed ...
  return state;
}

The path inside log_commit (covered in cubrid-log-manager.md): append LOG_COMMIT_WITH_POSTPONE if any postpone records are buffered, run pending postpones, append LOG_COMMIT, force-flush, transition state to TRAN_UNACTIVE_COMMITTED, release locks (or retain if retain_lock), free the TDES via logtb_release_tran_index. log_abort is the mirror: append LOG_ABORT, drive undo, release locks, free.
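
The ordering in that path can be sketched against a toy in-memory log. Everything here is a stand-in: the record and state names are shortened, `REC_RUN_POSTPONE` is a hypothetical marker for one executed postpone, and "force-flush" is modelled as copying the log length. It shows the sequencing only, not log_commit itself.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REC_COMMIT_WITH_POSTPONE, REC_RUN_POSTPONE, REC_COMMIT } sketch_rec;
typedef enum { ST_ACTIVE, ST_COMMITTED_WITH_POSTPONE, ST_COMMITTED } sketch_state;

#define SKETCH_LOG_MAX 16

typedef struct
{
  sketch_state state;
  int pending_postpones;          /* postpone records buffered so far */
  bool holds_locks;
  sketch_rec log[SKETCH_LOG_MAX];
  int log_len;
  int flushed_len;                /* prefix of the log known to be durable */
} sketch_tran;

static void sketch_append(sketch_tran *t, sketch_rec r)
{
  assert(t->log_len < SKETCH_LOG_MAX);
  t->log[t->log_len++] = r;
}

static sketch_state sketch_commit(sketch_tran *t, bool retain_lock)
{
  assert(t->state == ST_ACTIVE);
  if (t->pending_postpones > 0)
    {
      /* Announce the commit decision before running deferred actions. */
      sketch_append(t, REC_COMMIT_WITH_POSTPONE);
      t->state = ST_COMMITTED_WITH_POSTPONE;
      while (t->pending_postpones > 0)
        {
          sketch_append(t, REC_RUN_POSTPONE);
          t->pending_postpones--;
        }
    }
  sketch_append(t, REC_COMMIT);
  t->flushed_len = t->log_len;    /* force-flush: commit record is durable */
  t->state = ST_COMMITTED;
  if (!retain_lock)
    t->holds_locks = false;       /* locks go only after the flush */
  return t->state;
}
```

The property worth noticing is the order of the last three steps: the commit record is durable before the state flips and before any lock is released, so no other transaction can observe effects of a commit that a crash could still undo.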

System ops — sub-transactional units of recovery

Operations that are atomic-as-a-group but touch many pages (B+Tree splits, heap overflow allocation, schema mutations) use system ops. A system op opens with log_sysop_start (log_manager.c:3599), nests on the TDES’s topops stack, runs its sub-operation, then commits with log_sysop_commit (log_manager.c:3916) or aborts with log_sysop_abort.

// LOG_TOPOPS_STACK / LOG_TOPOPS_ADDRESSES — src/transaction/log_impl.h
struct log_topops_addresses
{
  LOG_LSA lastparent_lsa;   /* Where the parent's log range was when this op began */
  LOG_LSA posp_lsa;         /* First postpone of this op */
};

struct log_topops_stack
{
  int max;
  int last;                 /* -1 ⇒ no system op in progress */
  LOG_TOPOPS_ADDRESSES *stack;
};

log_sysop_start pushes a new LOG_TOPOPS_ADDRESSES whose lastparent_lsa is the parent’s current tail_lsa. While the system op is active, its log records form a contiguous range in the transaction’s own record chain (records of other transactions may interleave on the physical log); log_sysop_commit writes a LOG_SYSOP_END_COMMIT record (with lastparent_lsa and prv_topresult_lsa for chaining) that marks the range complete. log_sysop_abort writes LOG_SYSOP_END_ABORT and walks the range backward applying undos.
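
The push / pop discipline can be sketched with integer LSAs. The fixed depth of 8, the dense integer LSA, and the `sketch_` names are assumptions for illustration; the abort path only counts the records it would undo and writes no compensation records.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t sketch_lsa;       /* hypothetical: one integer per log position */
#define SKETCH_NULL_LSA ((sketch_lsa) -1)
#define SKETCH_MAX_DEPTH 8

typedef struct { sketch_lsa lastparent_lsa; sketch_lsa posp_lsa; } sketch_frame;

typedef struct
{
  int last;                       /* -1 means no system op in progress */
  sketch_frame stack[SKETCH_MAX_DEPTH];
  sketch_lsa tail_lsa;            /* the transaction's most recent record */
  sketch_lsa next_lsa;            /* where the next record will land */
} sketch_tdes;

static void sketch_init(sketch_tdes *t)
{
  t->last = -1;
  t->tail_lsa = SKETCH_NULL_LSA;
  t->next_lsa = 0;
}

static sketch_lsa sketch_append(sketch_tdes *t)
{
  t->tail_lsa = t->next_lsa++;
  return t->tail_lsa;
}

/* Push a frame that remembers where the parent's range ended. */
static void sketch_sysop_start(sketch_tdes *t)
{
  assert(t->last + 1 < SKETCH_MAX_DEPTH);
  t->last++;
  t->stack[t->last].lastparent_lsa = t->tail_lsa;
  t->stack[t->last].posp_lsa = SKETCH_NULL_LSA;
}

/* Commit: append an end record and pop; the range now belongs to the parent. */
static sketch_lsa sketch_sysop_commit(sketch_tdes *t)
{
  assert(t->last >= 0);
  sketch_lsa end = sketch_append(t);
  t->last--;
  return end;
}

/* Abort: records in (lastparent_lsa, tail_lsa] belong to the op; with dense
   integer LSAs their count is a subtraction. Then append an end record and pop. */
static int sketch_sysop_abort(sketch_tdes *t)
{
  assert(t->last >= 0);
  int undone = (int) (t->tail_lsa - t->stack[t->last].lastparent_lsa);
  sketch_append(t);
  t->last--;
  return undone;
}
```

Because each frame stores only where its parent stopped, nesting costs one small frame per level and the abort of an inner op never touches records that belong to the outer one.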

The variants on log_sysop_end_* correspond to the union arms in LOG_REC_SYSOP_END (covered in cubrid-log-manager.md):

  • log_sysop_end_logical_undo — system op carries its own logical undo image (used for index splits where physical undo is not enough).
  • log_sysop_end_logical_compensate — system op was undone, leaves a CLR pointer.
  • log_sysop_end_logical_run_postpone — system op was used to drive a postpone.

Recovery is aware of these variants: the analysis pass categorises sysop ranges by their LOG_SYSOP_END_TYPE, and the redo / undo passes invoke the right path per variant. (Details in cubrid-recovery-manager.md.)

Savepoint creation goes through log_append_savepoint (log_manager.c, declared in log_manager.h:132), which emits a LOG_SAVEPOINT record carrying the savepoint name. The TDES updates savept_lsa to the new record’s LSA; the record carries a prv_savept pointer to the previous savepoint, so the chain can be walked at rollback-to-savepoint time.

// LOG_REC_SAVEPT — src/transaction/log_record.hpp
struct log_rec_savept
{
  LOG_LSA prv_savept;   /* Previous savepoint record */
  int length;           /* Savepoint name length follows */
};

Rollback to savepoint:

log_abort_partial(savepoint_name, savept_lsa)
→ walk savept_lsa chain to find named savepoint
→ undo from tail_lsa back to that savepoint
→ emit CLRs at each step
→ reset tail_lsa to the savepoint record's LSA

Savepoints come in two flavours marked by SAVEPOINT_TYPE (transaction_cl.h:49): USER_SAVEPOINT (named explicitly via SQL SAVEPOINT foo) and SYSTEM_SAVEPOINT (engine-internal, e.g., to bracket a statement so an error rolls back just that statement).
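
The chain walk and partial rollback can be sketched on a toy log where a record's LSA is its array index. All names are hypothetical, the log is kept append-only, and the rollback position is tracked in an undo_nxlsa-style field (each record remembers the value that field had when it was appended). This follows the generic compensation-record pattern and is not a transcription of log_abort_partial.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_LOG_MAX 32
#define SKETCH_NULL_LSA (-1)

typedef enum { REC_UPDATE, REC_SAVEPOINT, REC_CLR } sketch_type;

typedef struct
{
  sketch_type type;
  int prev_undo;       /* undo_nxlsa at the time this record was appended */
  int prv_savept;      /* savepoint records only: previous savepoint */
  const char *name;    /* savepoint records only */
} sketch_rec;

typedef struct
{
  sketch_rec log[SKETCH_LOG_MAX];
  int len;
  int savept_lsa;      /* most recent savepoint, or SKETCH_NULL_LSA */
  int undo_nxlsa;      /* next record a rollback would look at */
} sketch_tdes;

static void sketch_init(sketch_tdes *t)
{
  t->len = 0;
  t->savept_lsa = SKETCH_NULL_LSA;
  t->undo_nxlsa = SKETCH_NULL_LSA;
}

static int sketch_append(sketch_tdes *t, sketch_type type, const char *name)
{
  assert(t->len < SKETCH_LOG_MAX);
  assert(type != REC_SAVEPOINT || name != NULL);
  int lsa = t->len++;
  t->log[lsa].type = type;
  t->log[lsa].prev_undo = t->undo_nxlsa;
  t->log[lsa].prv_savept = SKETCH_NULL_LSA;
  t->log[lsa].name = name;
  if (type == REC_SAVEPOINT)
    {
      t->log[lsa].prv_savept = t->savept_lsa;   /* link into the chain */
      t->savept_lsa = lsa;
    }
  if (type != REC_CLR)
    t->undo_nxlsa = lsa;                        /* CLRs are never undone */
  return lsa;
}

/* Returns the number of CLRs written, or -1 if the name is not on the chain. */
static int sketch_abort_partial(sketch_tdes *t, const char *name)
{
  int target = t->savept_lsa;
  while (target != SKETCH_NULL_LSA && strcmp(t->log[target].name, name) != 0)
    target = t->log[target].prv_savept;
  if (target == SKETCH_NULL_LSA)
    return -1;
  int clrs = 0;
  while (t->undo_nxlsa > target)
    {
      int lsa = t->undo_nxlsa;
      if (t->log[lsa].type == REC_UPDATE)
        {
          sketch_append(t, REC_CLR, NULL);      /* compensate one update */
          clrs++;
        }
      t->undo_nxlsa = t->log[lsa].prev_undo;    /* step backward */
    }
  t->savept_lsa = target;                       /* later savepoints are gone */
  return clrs;
}
```

Rolling back to an outer savepoint discards the inner ones, and repeating the same rollback writes no further compensation records because undo_nxlsa already sits at the savepoint.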

stateDiagram-v2
  [*] --> ACTIVE: logtb_assign_tran_index
  ACTIVE --> WILL_COMMIT: log_commit
  WILL_COMMIT --> COMMITTED_W_POSTPONE: postpones queued\n(append LOG_COMMIT_WITH_POSTPONE)
  COMMITTED_W_POSTPONE --> COMMITTED: log_do_postpone done\n(append LOG_COMMIT)
  WILL_COMMIT --> COMMITTED: no postpones\n(append LOG_COMMIT)
  ACTIVE --> ABORTED: log_abort\n(append LOG_ABORT, drive undo)
  ACTIVE --> UNILATERALLY_ABORTED: crash detected\nin recovery analysis
  ACTIVE --> PREPARED_2PC: log_2pc_prepare
  PREPARED_2PC --> COMMIT_DECISION: coordinator says commit
  PREPARED_2PC --> ABORT_DECISION: coordinator says abort
  COMMIT_DECISION --> INFORMING_PARTICIPANTS_C: phase 2 send
  ABORT_DECISION --> INFORMING_PARTICIPANTS_A: phase 2 send
  INFORMING_PARTICIPANTS_C --> COMMITTED: all acks received
  INFORMING_PARTICIPANTS_A --> ABORTED: all acks received
  COMMITTED --> [*]: logtb_release_tran_index
  ABORTED --> [*]
  UNILATERALLY_ABORTED --> [*]

Each transition is named by its log-record emission: log_commit emits LOG_COMMIT_WITH_POSTPONE or LOG_COMMIT; log_abort emits LOG_ABORT; the 2PC paths (covered in cubrid-2pc.md) emit LOG_2PC_* types. The analysis pass at recovery walks the log forward, builds a per-TDES picture, and uses LOG_ISTRAN_* to decide each TDES’s fate.
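
The diagram can also be read as an edge table, which makes it easy to check a proposed transition. The state names below are the diagram's shortened labels, not the TRAN_STATE identifiers, and the `sketch_` helpers are hypothetical; the engine does not necessarily validate transitions this way.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
  S_ACTIVE, S_WILL_COMMIT, S_COMMITTED_W_POSTPONE, S_COMMITTED,
  S_ABORTED, S_UNILATERALLY_ABORTED, S_PREPARED_2PC,
  S_COMMIT_DECISION, S_ABORT_DECISION,
  S_INFORMING_PARTICIPANTS_C, S_INFORMING_PARTICIPANTS_A
} sketch_state;

typedef struct { sketch_state from, to; } sketch_edge;

/* One entry per arrow between named states in the diagram. */
static const sketch_edge sketch_edges[] = {
  { S_ACTIVE, S_WILL_COMMIT },
  { S_WILL_COMMIT, S_COMMITTED_W_POSTPONE },
  { S_COMMITTED_W_POSTPONE, S_COMMITTED },
  { S_WILL_COMMIT, S_COMMITTED },
  { S_ACTIVE, S_ABORTED },
  { S_ACTIVE, S_UNILATERALLY_ABORTED },
  { S_ACTIVE, S_PREPARED_2PC },
  { S_PREPARED_2PC, S_COMMIT_DECISION },
  { S_PREPARED_2PC, S_ABORT_DECISION },
  { S_COMMIT_DECISION, S_INFORMING_PARTICIPANTS_C },
  { S_ABORT_DECISION, S_INFORMING_PARTICIPANTS_A },
  { S_INFORMING_PARTICIPANTS_C, S_COMMITTED },
  { S_INFORMING_PARTICIPANTS_A, S_ABORTED },
};

static const size_t sketch_num_edges = sizeof sketch_edges / sizeof sketch_edges[0];

/* True when the diagram has an arrow from `from` to `to`. */
static bool sketch_can_move(sketch_state from, sketch_state to)
{
  for (size_t i = 0; i < sketch_num_edges; i++)
    if (sketch_edges[i].from == from && sketch_edges[i].to == to)
      return true;
  return false;
}

/* True when no arrow leaves `s` for another named state. */
static bool sketch_is_terminal(sketch_state s)
{
  for (size_t i = 0; i < sketch_num_edges; i++)
    if (sketch_edges[i].from == s)
      return false;
  return true;
}
```

Under this reading the only terminal states are COMMITTED, ABORTED, and UNILATERALLY_ABORTED, which matches the three arrows into the final node.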

The client side of the transaction module is small: a handful of globals (tm_Tran_index, tm_Tran_isolation, tm_Tran_ID, tm_Tran_async_ws, tm_Tran_wait_msecs) and the tran_* API in transaction_cl.h.

// Client-visible API — src/transaction/transaction_cl.h (excerpt)
extern int tran_commit (bool retain_lock);
extern int tran_abort (void);
extern int tran_unilaterally_abort (void);
extern int tran_reset_isolation (TRAN_ISOLATION isolation, bool async_ws);
extern int tran_reset_wait_times (int wait_in_msecs);
extern int tran_savepoint_internal (const char *name, SAVEPOINT_TYPE type);
extern int tran_abort_upto_user_savepoint (const char *name);
extern int tran_abort_upto_system_savepoint (const char *name);
extern int tran_2pc_start (void);
extern int tran_2pc_prepare (void);
extern int tran_set_global_tran_info (int gtrid, void *info, int size);
extern bool tran_has_updated (void);

Each entry point does some local bookkeeping (e.g., call any registered tran_end_libcas_function for the broker) and then performs the server RPC. The server side is covered above.

Anchor on symbol names, not line numbers.

  • log_tdes (log_impl.h) — the descriptor.
  • trantable (log_impl.h) — the table of descriptors.
  • log_topops_stack, log_topops_addresses (log_impl.h) — system op stack.
  • log_rcv_tdes (log_impl.h) — recovery-time TDES annotations.
  • TRAN_STATE (log_comm.h) — 15-value lifecycle enum.
  • DB_TRAN_ISOLATION (compat/dbtran_def.h) — 3-level isolation enum.
  • LOG_TOPOP_RANGE (log_manager.h) — pair of (start_lsa, end_lsa) used for nested-top postpone replay.
  • tx_transient_class_registry (transaction_transient.hpp) — modified-classes list that needs invalidation.
  • logtb_assign_tran_index (log_tran_table.c) — allocate a slot for a new transaction.
  • logtb_release_tran_index (log_tran_table.c) — return the slot.
  • logtb_set_current_tran_index (log_tran_table.c) — set thread-current index.
  • logtb_complete_mvcc (log_tran_table.c) — close out MVCC info on commit/abort.
  • logtb_grow_* (log_tran_table.c) — table growth.
  • xtran_server_commit (transaction_sr.c) — server commit.
  • xtran_server_abort (transaction_sr.c) — server abort.
  • xtran_server_savepoint (transaction_sr.c) — server savepoint creation.
  • xtran_server_unilaterally_abort_tran (transaction_sr.c) — forced abort during error recovery.
  • log_sysop_start (log_manager.c) — push frame.
  • log_sysop_commit (log_manager.c) — pop with commit, write LOG_SYSOP_END_COMMIT.
  • log_sysop_abort (log_manager.c) — pop with abort, write LOG_SYSOP_END_ABORT, walk backward for undo.
  • log_sysop_start_atomic (log_manager.c) — atomic variant for recovery-sensitive operations (file allocation/deallocation).
  • log_sysop_end_logical_undo (log_manager.c) — system op that carries its own logical undo.
  • log_sysop_end_logical_compensate / log_sysop_end_logical_run_postpone (log_manager.c) — variants for compensation and postpone replay.
  • log_sysop_attach_to_outer (log_manager.c) — attach a system op’s log range to its parent without writing an end record (used when a system op is essentially a marker).
  • tran_commit (transaction_cl.c).
  • tran_abort (transaction_cl.c).
  • tran_savepoint_internal (transaction_cl.c) — both USER and SYSTEM savepoints land here.
  • tran_abort_upto_user_savepoint / tran_abort_upto_system_savepoint (transaction_cl.c).
  • tran_reset_isolation (transaction_cl.c) — flips tm_Tran_isolation and forwards to server.

| Symbol | File | Line |
| --- | --- | --- |
| log_tdes (struct) | log_impl.h | 475 |
| log_topops_stack | log_impl.h | 362 |
| log_topops_addresses | log_impl.h | 353 |
| log_rcv_tdes | log_impl.h | 458 |
| trantable | log_impl.h | 602 |
| LOG_ISTRAN_ACTIVE (macro) | log_impl.h | 143 |
| LOG_ISTRAN_COMMITTED (macro) | log_impl.h | 146 |
| LOG_ISTRAN_ABORTED (macro) | log_impl.h | 153 |
| LOG_ISTRAN_2PC (macro) | log_impl.h | 173 |
| TRAN_STATE enum | log_comm.h | 36 |
| DB_TRAN_ISOLATION enum | dbtran_def.h | 36 |
| logtb_assign_tran_index | log_tran_table.c | 796 |
| logtb_release_tran_index | log_tran_table.c | 1139 |
| logtb_complete_mvcc | log_tran_table.c | 4050 |
| logtb_set_current_tran_index | log_tran_table.c | 6002 |
| xtran_server_commit | transaction_sr.c | 71 |
| xtran_server_abort | transaction_sr.c | 128 |
| xtran_server_savepoint | transaction_sr.c | 348 |
| log_sysop_start | log_manager.c | 3599 |
| log_sysop_start_atomic | log_manager.c | 3665 |
| log_sysop_commit_internal | log_manager.c | 3825 |
| log_sysop_commit | log_manager.c | 3916 |
| log_commit | log_manager.c | 5352 |
| log_abort | log_manager.c | 5461 |

  • LOG_TDES is a single struct of ~50 fields, not split between hot and cold. Verified at log_impl.h:475. Unlike PostgreSQL releases before 14 (which split PGPROC from PGXACT so visibility scans only touched hot fields), CUBRID inlines visibility-relevant mvccinfo next to bookkeeping fields like bind_history and query_timeout. Implication: visibility scans of the TDES table read more cache lines per descriptor than strictly necessary.

  • TRAN_STATE has 15 values, of which 6 belong to the 2PC state machine (the four TRAN_UNACTIVE_2PC_* values and the two *_INFORMING_PARTICIPANTS values). Verified at log_comm.h:36-67. The LOG_ISTRAN_2PC macro at log_impl.h:173-176 collapses 6 states into “is in 2PC”. TRAN_RECOVERY is not a 2PC variant; it is a pseudo-state used for the recovery worker’s pseudo-tran.

  • Default isolation is TRAN_READ_COMMITTED (0x04). Verified at dbtran_def.h:53 (TRAN_DEFAULT_ISOLATION = TRAN_READ_COMMITTED) and dbtran_def.h:54 (MVCC_TRAN_DEFAULT_ISOLATION = TRAN_READ_COMMITTED). Both defaults agree because CUBRID is MVCC across the board; there is no non-MVCC mode where the default would differ.

  • Isolation-level enum values are deliberately aliased. TRAN_READ_COMMITTED == TRAN_REP_CLASS_COMMIT_INSTANCE == TRAN_CURSOR_STABILITY == 0x04. Verified at dbtran_def.h:40-42. The aliases preserve API compatibility with the older locking-vocabulary names; they compile to the same dispatch path.

  • Trantable size is configured at boot, not dynamic per transaction. Verified by reading logtb_assign_tran_index (log_tran_table.c:796) — it allocates from a contiguous area managed by LOG_ADDR_TDESAREA linked list, growing only when exhausted, never shrinking. The cap is set by the max_clients server parameter.

  • System ops nest via a stack on the TDES, not a separate table. Verified at log_impl.h:361-367 (LOG_TOPOPS_STACK). The stack’s last field is -1 when no system op is active, an integer index otherwise. There is no global system-op table — every TDES owns its own stack.

  • Lock-acquisition wait timeout is per-TDES. Verified at log_impl.h:486 (wait_msecs field) and the corresponding client-side global tm_Tran_wait_msecs in transaction_cl.h:58. The macro TRAN_LOCK_INFINITE_WAIT = -1 (log_comm.h:29) encodes the “wait forever” sentinel.

  • block_global_oldest_active_until_commit exists for long-running operations that need to do their own vacuuming. Verified at log_impl.h:555 and the lock_global_oldest_visible_mvccid member function declared at log_impl.h:585. Used by reorganize-partition / upgrade-domain code paths that scan large amounts of data and would otherwise have their MVCC threshold pushed forward by concurrent transactions.

  • LOG_2PC_GTRINFO and LOG_2PC_COORDINATOR * are inline TDES fields, present even for non-2PC transactions. Verified at log_impl.h:505-508. coord is NULL if the site is not the coordinator. The cost is one pointer plus an inline LOG_2PC_GTRINFO per TDES; the benefit is that attaching a 2PC role to a previously-local transaction does not re-allocate.

  • LOG_RCV_TDES is populated only during recovery. Verified at log_impl.h:458 (struct definition) and 558 (inlined into log_tdes::rcv, so it is always present but otherwise unused). Its fields (sysop_start_postpone_lsa, tran_start_postpone_lsa, atomic_sysop_start_lsa, analysis_last_aborted_sysop_*) are filled in during the analysis pass and consumed during redo/undo.

  1. TDES hot/cold split. Has anyone measured the cache-miss penalty of putting mvccinfo next to bind_history? Other engines split, presumably for a reason. Investigation path: perf stat -e cache-misses on a high-concurrency read workload; compare against a hypothetical TDES split.

  2. Trantable growth. The header field LOG_ADDR_TDESAREA *area suggests growth is supported at runtime, but the trigger and coordination are unverified. Investigation path: grep for area writes in log_tran_table.c; check whether growth happens in the request path or only at a quiescent point.

  3. hint_free_index correctness under contention. Multiple threads can simultaneously call logtb_assign_tran_index. The hint is single-valued — what guards it? Investigation path: read the body of logtb_assign_tran_index for compare-and-swap or mutex usage.

  4. System-op rmutex_topop behaviour. A reentrant mutex per-TDES suggests system ops can recursively start while one is in progress on the same thread, but the depth bound is unverified. Investigation path: examine log_sysop_start for lock_topop() calls and chase the reentrance count.

  5. Postpone cache integration. m_log_postpone_cache is a C++ class (log_postpone_cache) inlined into the TDES. Its purpose per the field comment is to remember postpone records that may be replayed at log_do_postpone. The exact lifetime (cleared on commit? on abort? carried across sysop boundaries?) is unverified. Investigation path: read log_postpone_cache.cpp together with log_do_postpone in log_manager.c.

  6. Client-side TDES shadow vs. server reality. tm_Tran_* are client-side globals; what happens on a connection failover when the server has a different wait_msecs? Investigation path: trace tran_cache_tran_settings consumers; check whether the CAS broker re-syncs on reconnect.

Beyond CUBRID — Comparative Designs & Research Frontiers

Pointers, not analysis. Each bullet is a starting handle for a follow-up doc.

  • PostgreSQL PGPROC / PGXACT split — Before version 14, PG split the descriptor into a hot half (PGXACT: xid, xmin, vacuumFlags) read by visibility scans and a cold half (PGPROC: locktag arrays, myProcLocks). A side-by-side with CUBRID’s monolithic TDES would measure the cache cost.

  • InnoDB trx_t plus lock_sys reservation — InnoDB embeds per-tran lock reservation inside trx_t::lock and uses a global lock_sys_t mutex. CUBRID separates this: LK_RES *waiting_for_res on the TDES plus the lock manager’s per-resource hash. Comparing the two would illuminate the lock-acquisition critical path.

  • Hekaton in-memory transaction map (Larson et al., VLDB 2011) — Hekaton stores TDES in a lock-free hash on transaction-id, with versions stored inline on records. CUBRID’s fixed-array trantable is the opposite design point.

  • Partial rollback chains in PostgreSQL subtransactions — PG uses SubTransactionId and a per-backend stack much like CUBRID’s topops stack. The two-version subtransaction-id mapping in PG (subxact + parent xid) is more elaborate than CUBRID’s LOG_TOPOPS_ADDRESSES but the lifecycle is structurally identical.

  • Optimistic concurrency control on RDMA (FaRM, NSDI 2014) — FaRM eliminates the TDES table by encoding transaction state directly in record versions. CUBRID’s TDES survives because its isolation modes need the descriptor for lock acquisition; comparison highlights what the descriptor is for on a shared-memory engine.

  • JTA XAResource semantics (JSR 907) — the CUBRID 2PC TRAN_STATE branch is conformant to JTA prepared/commit/rollback semantics; the cubrid-2pc.md doc is the natural follow-up that enumerates the conformance points.

  • CockroachDB serializable + parallel commits (Taft et al., SIGMOD 2020) — Cockroach pushes the descriptor into a distributed KV layer and commits a transaction by writing a single intent record whose status is resolved lazily; the “transaction record” plays the role of CUBRID’s TDES but without a fixed-size table. A side-by-side would surface what shared-memory engines pay (the trantable cap) versus what shared-nothing engines pay (intent resolution traffic).

Raw analyses (raw/code-analysis/cubrid/storage/transaction/)

  • Transaction Internals.pdf
  • Transaction Internals.pptx

Textbook chapters (under knowledge/research/dbms-general/)

  • Database Internals (Petrov), Ch. 5 “Transactions and Recovery”, §“ACID” and §“Isolation levels”.
  • Concurrency Control and Recovery in Database Systems (Bernstein, Hadzilacos, Goodman), Ch. 1–4.

CUBRID source (/data/hgryoo/references/cubrid/)

  • src/transaction/log_impl.h — TDES, trantable, sysop stack.
  • src/transaction/log_tran_table.c — trantable management.
  • src/transaction/transaction_cl.{h,c} — client-side API.
  • src/transaction/transaction_sr.{h,c} — server entry points.
  • src/transaction/transaction_global.hpp — system tran constants.
  • src/transaction/transaction_transient.hpp — modified-class registry, lob locator chain.
  • src/transaction/log_comm.h — TRAN_STATE enum.
  • src/transaction/log_manager.c — sysop, commit, abort.
  • src/compat/dbtran_def.h — DB_TRAN_ISOLATION enum.
  • knowledge/code-analysis/cubrid/cubrid-log-manager.md — log records the TDES emits.
  • knowledge/code-analysis/cubrid/cubrid-mvcc.md — consumer of log_tdes::mvccinfo.
  • knowledge/code-analysis/cubrid/cubrid-lock-manager.md — consumer of log_tdes::wait_msecs and producer of log_tdes::waiting_for_res.
  • knowledge/code-analysis/cubrid/cubrid-recovery-manager.md — consumer of TDES at analysis time; in-progress in the same batch.
  • knowledge/code-analysis/cubrid/cubrid-2pc.md — owner of the 2PC state-machine arms and coord / gtrinfo; in-progress in the same batch.