CUBRID 2PC — Two-Phase Commit and In-Doubt Recovery

The two-phase commit (2PC) protocol is the canonical answer to “how do N independent sites agree to commit or abort one distributed transaction without a central truthkeeper”. Jim Gray named it in Notes on Database Operating Systems (1978); the JTA/XA specification (X/Open, then Java JSR 907) made it the interoperability standard between transaction managers and resource managers. Database Internals (Petrov, ch. 13 “Distributed Transactions”) gives the textbook treatment.

The protocol has two roles and two phases:

  • Coordinator drives the transaction; participants are the resource managers that hold parts of the state.
  • Phase 1 (prepare): coordinator asks each participant “ready to commit?”. Each participant either votes YES (and promises not to abort unilaterally) or NO. The vote is durable before the participant replies.
  • Phase 2 (decision): if all voted YES, coordinator decides COMMIT and tells each participant; otherwise ABORT. The decision is durable before the coordinator sends. Participants ack and the coordinator forgets the gtrid.
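
The decision rule in phase 2 can be sketched in a few lines of C. This is a generic illustration of the textbook rule, not CUBRID code; the names twopc_vote, twopc_decision and twopc_decide are hypothetical, and a vote that never arrives is assumed to be recorded as NO by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; none of these are CUBRID identifiers. */
typedef enum { VOTE_YES, VOTE_NO } twopc_vote;
typedef enum { DECISION_COMMIT, DECISION_ABORT } twopc_decision;

/* Phase-2 rule: COMMIT only if every participant voted YES.
 * Assumption: a missing vote (timeout) has already been stored as VOTE_NO. */
static twopc_decision
twopc_decide (const twopc_vote *votes, size_t num_participants)
{
  assert (votes != NULL || num_participants == 0);
  for (size_t i = 0; i < num_participants; i++)
    {
      if (votes[i] != VOTE_YES)
        {
          return DECISION_ABORT;	/* a single NO aborts everyone */
        }
    }
  return DECISION_COMMIT;
}
```

A single NO is enough to abort, which is why the loop can return early without inspecting the remaining votes.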

The ugly state in 2PC is in-doubt: a participant has voted YES but has not yet heard the coordinator’s decision. If it crashes in that state, then on restart it must re-acquire and hold all of the transaction’s locks until the decision arrives, and it asks the coordinator for the outcome. If the coordinator is also gone, operator intervention is needed. This is why XA defines heuristic decisions and a separate XID identity space.

Two implementation choices the model leaves open shape every real engine and frame the rest of this document:

  1. Same site as coordinator and participant? A single CUBRID server can be both: a query that updates this server and a linked server is a distributed transaction whose coordinator is this server and participant is the linked server. CUBRID’s LOG_2PC_EXECUTE enum dispatches by role.
  2. How is a prepared transaction identified across crashes? The protocol needs a global ID (gtrid / XID) that survives the local TDES being recycled. CUBRID assigns gtrids at log_2pc_start and stores them on the TDES; in-doubt recovery rebuilds the gtrid → tid map from prepared-state log records.

With these two choices named, every CUBRID-specific structure in this document either implements one of them or makes the protocol durable.

Most engines that support 2PC adopt the same set of patterns on top of Gray’s protocol.

The prepared state and the decision must be on stable storage before the network message is sent. Both records are force-flushed (cubrid-log-manager.md §“Force-at-commit”). PostgreSQL, InnoDB, and Oracle all share this discipline.

gtrid as a separate identifier from local trid

A local trid is reused after the transaction terminates. gtrid survives — until in-doubt recovery completes or the TM heuristically forgets it. Engines store the gtrid in a side-channel (TDES field, separate table) so the local trid can be recycled without losing the prepared transaction.

The coordinator side of a distributed transaction needs to remember the participant list and ack state. Storing this on the TDES (rather than a separate coordinator table) means a crash recovers it together with the rest of the TDES via LOG_2PC_START / LOG_2PC_PREPARE log records.

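If a coordinator crashes after sending PREPARE but before deciding, no commit-decision record exists, so the recovered coordinator answers ABORT to any in-doubt participant that asks. A participant that has voted YES cannot abort on its own; it must keep asking. The standard “presumed-abort” optimisation builds on this: the coordinator does not force-log abort decisions and may forget an aborted transaction immediately, because an inquiry about a transaction it has no record of is answered with ABORT.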
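
As a generic illustration of presumed abort (not a description of what CUBRID does, which this document leaves open), the coordinator’s answer to a participant inquiry can be sketched as a lookup that falls back to ABORT. The types commit_record and outcome and the function resolve_inquiry are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not CUBRID identifiers. */
typedef enum { OUTCOME_COMMIT, OUTCOME_ABORT } outcome;

/* One durable commit-decision record found in the coordinator's log. */
typedef struct
{
  int gtrid;
} commit_record;

/* Answer an inquiry about gtrid: COMMIT only if a commit record exists.
 * Absence of any record is presumed to mean ABORT. */
static outcome
resolve_inquiry (const commit_record *commits, size_t num_commits, int gtrid)
{
  assert (commits != NULL || num_commits == 0);
  for (size_t i = 0; i < num_commits; i++)
    {
      if (commits[i].gtrid == gtrid)
        {
          return OUTCOME_COMMIT;
        }
    }
  return OUTCOME_ABORT;		/* no record found: presume abort */
}
```

The point of the sketch is the asymmetry: only commit decisions need to be findable after a crash, so abort decisions need not be forced to the log.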

Java/JTA + the X/Open XA spec define a TM↔RM contract. CUBRID exposes XA through tran_2pc_* client APIs; gtrid here becomes XA’s XID. The CUBRID server is both an RM (when attached as a participant) and an internal coordinator (when driving its own dependent participants).

| Theoretical concept | CUBRID name |
| --- | --- |
| Coordinator/participant role enum | LOG_2PC_EXECUTE { FULL, PREPARE, COMMIT_DECISION, ABORT_DECISION } |
| Global transaction id | gtrid field on LOG_TDES; LOG_2PC_NULL_GTRID = -1 |
| Coordinator state on TDES | LOG_2PC_COORDINATOR { num_particps, particp_id_length, block_particps_ids } |
| Global tran user info (XID payload) | LOG_2PC_GTRINFO { info_length, info_data } |
| Lock-acquire flag on read-prepare | LOG_2PC_OBTAIN_LOCKS = true / LOG_2PC_DONT_OBTAIN_LOCKS = false |
| Phase 1 commit | log_2pc_commit_first_phase (log_2pc.c:437) |
| Phase 2 commit | log_2pc_commit_second_phase (log_2pc.c:503) |
| Phase dispatch | log_2pc_commit (log_2pc.c:632) |
| Prepared log record | LOG_2PC_PREPARE + LOG_REC_2PC_PREPCOMMIT (log_record.hpp:387) |
| Start record | LOG_2PC_START + LOG_REC_2PC_START (log_record.hpp:399) |
| Decision records | LOG_2PC_COMMIT_DECISION, LOG_2PC_ABORT_DECISION |
| Inform-participants records | LOG_2PC_COMMIT_INFORM_PARTICPS, LOG_2PC_ABORT_INFORM_PARTICPS |
| Ack record | LOG_2PC_RECV_ACK + LOG_REC_2PC_PARTICP_ACK (log_record.hpp:412) |
| Prepared TDES state | TRAN_UNACTIVE_2PC_PREPARE |
| In-doubt-collecting state | TRAN_UNACTIVE_2PC_COLLECTING_PARTICIPANT_VOTES |
| Phase-2 decision states | TRAN_UNACTIVE_2PC_COMMIT_DECISION / _ABORT_DECISION |
| Informing-after-decision states | TRAN_UNACTIVE_COMMITTED_INFORMING_PARTICIPANTS / _ABORTED_INFORMING_* |
| In-doubt recovery | log_2pc_recovery (log_2pc.h:96) |
| In-doubt analysis-pass annotation | log_2pc_recovery_analysis_info (log_2pc.h:95) |
| XA prepared-list query | log_2pc_recovery_prepared (log_2pc.c:915) |
| XA attach by gtrid | log_2pc_attach_global_tran (log_2pc.c:1036) |
| XA prepare for an attached gtrid | log_2pc_prepare_global_tran (log_2pc.c:1126) |

The 2PC module has four moving parts: the role-dispatch machinery that routes one TDES through different code paths depending on coordinator/participant role, the prepared-state log records that make the protocol durable, the in-doubt recovery that brings prepared transactions back at restart, and the XA bridge that lets external transaction managers drive the protocol. We walk them in that order.

flowchart LR
  subgraph CL["Client / TM (XA)"]
    XA["xa_prepare\nxa_commit\nxa_rollback"]
    TC["tran_2pc_∗\n(transaction_cl.c)"]
    XA --> TC
  end
  subgraph SR["Server (transaction_sr + log_2pc)"]
    XSC["xtran_2pc_∗"]
    L2C["log_2pc_∗"]
    XSC --> L2C
  end
  subgraph TDES["log_tdes (per-tran state)"]
    GT["gtrid"]
    GI["gtrinfo (XID payload)"]
    CO["coord (NULL if not coordinator)"]
    ST["state (TRAN_UNACTIVE_2PC_∗)"]
  end
  subgraph LOG["WAL records"]
    R0["LOG_2PC_PREPARE"]
    R1["LOG_2PC_START"]
    R2["LOG_2PC_COMMIT_DECISION"]
    R3["LOG_2PC_ABORT_DECISION"]
    R4["LOG_2PC_∗_INFORM_PARTICPS"]
    R5["LOG_2PC_RECV_ACK"]
  end
  subgraph PART["Participants"]
    P1["site B"]
    P2["site C"]
  end
  TC -->|RPC| XSC
  L2C --> TDES
  L2C --> LOG
  L2C -->|prepare| PART
  PART -->|vote| L2C
  L2C -->|commit / abort| PART
  PART -->|ack| L2C

The figure encodes three boundaries. Client / server: the XA / tran_2pc_* API is the client face; the server-side log_2pc_* is the implementation. TDES / log: the TDES holds the live state (gtrid, coord, isolation-relevant fields); the log holds the durable trail from which recovery re-establishes that state. Coordinator / participant: the same TDES can be either, dispatched by the LOG_2PC_EXECUTE enum.

// LOG_2PC_EXECUTE (src/transaction/log_2pc.h:45)
enum log_2pc_execute
{
  LOG_2PC_EXECUTE_FULL,             /* The root coordinator */
  LOG_2PC_EXECUTE_PREPARE,          /* Participant that is also a non-root
                                       coordinator running phase 1 */
  LOG_2PC_EXECUTE_COMMIT_DECISION,  /* Participant + non-root coordinator
                                       running phase 2 (commit) */
  LOG_2PC_EXECUTE_ABORT_DECISION    /* Same but abort, possibly without
                                       phase 1 */
};
typedef enum log_2pc_execute LOG_2PC_EXECUTE;

The four values cover the two positions a single CUBRID server can occupy in a distributed transaction, with the non-root position split by phase:

  • FULL — this server is the root coordinator. It drives prepare, collects votes, decides, and informs all participants.
  • PREPARE — this server is somewhere in the middle of the tree. It is a participant from the perspective of a higher coordinator, and a coordinator for participants below it. Phase 1 from above triggers phase 1 below.
  • COMMIT_DECISION / ABORT_DECISION — same middle position, but executing phase 2.

The dispatch happens in log_2pc_commit (log_2pc.c:632):

// log_2pc_commit, src/transaction/log_2pc.c (signature)
TRAN_STATE
log_2pc_commit (THREAD_ENTRY *thread_p,
                log_tdes *tdes,
                LOG_2PC_EXECUTE execute_2pc_type,
                bool *decision);

The execute_2pc_type argument selects the path; *decision is filled with the local outcome (true=commit, false=abort) that propagates up to a parent coordinator if any.
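
One way to read the role descriptions above is as a mapping from execute type to the phases that run locally. The sketch below encodes that reading only; it is not the body of log_2pc_commit. The sk_-prefixed names are hypothetical mirrors of the enum shown earlier, and the phase assignment is an assumption drawn from the bullet list, not from the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of LOG_2PC_EXECUTE, for illustration only. */
enum sk_execute
{
  SK_EXECUTE_FULL,
  SK_EXECUTE_PREPARE,
  SK_EXECUTE_COMMIT_DECISION,
  SK_EXECUTE_ABORT_DECISION
};

struct sk_phases
{
  bool run_first_phase;		/* prepare and vote collection */
  bool run_second_phase;	/* decision, inform, acks */
};

/* Assumed mapping: the root runs both phases; a middle node runs
 * one phase per call, driven by its parent coordinator. */
static struct sk_phases
sk_phases_for (enum sk_execute type)
{
  struct sk_phases p = { false, false };

  switch (type)
    {
    case SK_EXECUTE_FULL:
      p.run_first_phase = true;
      p.run_second_phase = true;
      break;
    case SK_EXECUTE_PREPARE:
      p.run_first_phase = true;
      break;
    case SK_EXECUTE_COMMIT_DECISION:
    case SK_EXECUTE_ABORT_DECISION:
      p.run_second_phase = true;
      break;
    default:
      assert (!"unknown execute type");
      break;
    }
  return p;
}
```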

When a TDES acts as a coordinator, the coord pointer points to a LOG_2PC_COORDINATOR block:

// LOG_2PC_COORDINATOR (src/transaction/log_2pc.h:64)
struct log_2pc_coordinator
{
  int num_particps;          /* Number of participating sites */
  int particp_id_length;     /* Length of one participant identifier */
  void *block_particps_ids;  /* Block of N × particp_id_length bytes */
#ifdef LOG_2PC_ACK_RECV_REQUIRED
  bool *ack_received;        /* Per-participant ack vector */
#endif
};

block_particps_ids is a flat byte block of N participant IDs of length particp_id_length each — a network address, a name, or whatever the calling code passes to log_2pc_alloc_coord_info. Storing it as a flat block (rather than an array of pointers) means it serialises directly into the LOG_2PC_START record:

// LOG_REC_2PC_START (src/transaction/log_record.hpp:399)
struct log_rec_2pc_start
{
  char user_name[DB_MAX_USER_LENGTH + 1];
  int gtrid;
  int num_particps;
  int particp_id_length;
  /* immediately followed by num_particps × particp_id_length bytes */
};
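
To make the flat-block layout concrete, here is a minimal sketch of indexing into it and of computing the number of bytes that trail the start-record header. The struct coord_sketch copies only the three fields described above; the helpers particp_id_at and particp_block_size are hypothetical and do not exist in CUBRID under these names.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative copy of the three coordinator fields (ack vector omitted). */
struct coord_sketch
{
  int num_particps;
  int particp_id_length;
  void *block_particps_ids;
};

/* Hypothetical helper: pointer to the i-th participant id in the flat block. */
static const void *
particp_id_at (const struct coord_sketch *coord, int i)
{
  assert (coord != NULL && i >= 0 && i < coord->num_particps);
  return (const char *) coord->block_particps_ids
    + (size_t) i * (size_t) coord->particp_id_length;
}

/* Hypothetical helper: bytes that follow the fixed start-record header. */
static size_t
particp_block_size (const struct coord_sketch *coord)
{
  assert (coord != NULL);
  return (size_t) coord->num_particps * (size_t) coord->particp_id_length;
}
```

Because every id has the same length, locating an id is plain pointer arithmetic and the whole block can be copied into the log record with a single memcpy of particp_block_size bytes.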

The LOG_2PC_ACK_RECV_REQUIRED #ifdef controls whether the coordinator tracks per-participant acks. When defined, the ack vector is populated by LOG_2PC_RECV_ACK records during phase 2.

log_2pc_alloc_coord_info (declared at log_2pc.h:93) attaches this struct to a TDES; log_2pc_free_coord_info releases it on transaction end.

A gtrid is an int handed out at log_2pc_start and stored in log_tdes::gtrid (log_impl.h:499). The companion LOG_2PC_GTRINFO carries the XA-style payload:

// LOG_2PC_GTRINFO (src/transaction/log_2pc.h:57)
struct log_2pc_gtrinfo
{
  int info_length;
  void *info_data;  /* opaque to the engine, XID payload */
};

log_2pc_set_global_tran_info (log_2pc.c:705) writes the payload onto the TDES; log_2pc_get_global_tran_info (log_2pc.c:772) reads it back.

log_2pc_make_global_tran_id (log_2pc.c:323) generates a new gtrid; log_2pc_check_duplicate_global_tran_id (log_2pc.c:407) guards against gtrid collision (used during in-doubt recovery to ensure a recovered gtrid doesn’t clash with a freshly assigned one).

The first-phase function, called via log_2pc_commit (..., FULL, &decision) or log_2pc_commit (..., PREPARE, &decision):

  1. Append LOG_2PC_START record (only at the root) listing the participants.
  2. Send PREPARE to each participant (log_2pc_send_prepare, log_2pc.c:190).
  3. Append LOG_2PC_PREPARE for the local TDES (LOG_REC_2PC_PREPCOMMIT payload).
  4. Force-flush the log.
  5. Transition state to TRAN_UNACTIVE_2PC_COLLECTING_PARTICIPANT_VOTES.
  6. Wait for participant votes.
  7. If all voted YES, set *decision = true; transition to TRAN_UNACTIVE_2PC_COMMIT_DECISION. If any voted NO, set *decision = false; transition to TRAN_UNACTIVE_2PC_ABORT_DECISION.

The local prepared record carries the lock catalogue:

// LOG_REC_2PC_PREPCOMMIT (src/transaction/log_record.hpp:387)
struct log_rec_2pc_prepcommit
{
  char user_name[DB_MAX_USER_LENGTH + 1];
  int gtrid;
  int gtrinfo_length;             /* length of XID payload that follows */
  unsigned int num_object_locks;
  unsigned int num_page_locks;
  /* followed by gtrinfo bytes, object-lock list, page-lock list */
};

The lock catalogue matters because in-doubt recovery re-acquires locks before exposing the prepared transaction — otherwise a freshly-restarted server could let a concurrent transaction read or modify objects the prepared transaction holds.
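
The “header, then gtrinfo bytes, then lock lists” layout can be sketched as a small locator over a raw record buffer. This is an illustration under loud assumptions: SKETCH_USER_LEN stands in for DB_MAX_USER_LENGTH, the trailing data is assumed to follow the header with no padding, and prep_header and prep_locate are hypothetical names. The real on-disk format may align or encode these fields differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_USER_LEN 32	/* assumption: stand-in for DB_MAX_USER_LENGTH */

/* Illustrative copy of the fixed-size header fields shown above. */
struct prep_header
{
  char user_name[SKETCH_USER_LEN + 1];
  int gtrid;
  int gtrinfo_length;
  unsigned int num_object_locks;
  unsigned int num_page_locks;
};

/* Hypothetical helper: copy out the header and locate the variable parts.
 * Assumes gtrinfo bytes start right after the header, locks right after that. */
static void
prep_locate (const unsigned char *record, struct prep_header *header_out,
	     const unsigned char **gtrinfo_out,
	     const unsigned char **locks_out)
{
  assert (record != NULL && header_out != NULL);
  assert (gtrinfo_out != NULL && locks_out != NULL);

  /* memcpy avoids unaligned reads from the raw buffer */
  memcpy (header_out, record, sizeof (*header_out));
  assert (header_out->gtrinfo_length >= 0);

  *gtrinfo_out = record + sizeof (*header_out);
  *locks_out = *gtrinfo_out + header_out->gtrinfo_length;
}
```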

After phase 1 produces a decision:

  1. Append the decision record: LOG_2PC_COMMIT_DECISION (log_record.hpp enum value 30) or LOG_2PC_ABORT_DECISION (31).
  2. Force-flush.
  3. Transition to TRAN_UNACTIVE_*_INFORMING_PARTICIPANTS.
  4. Send the decision to each participant (log_2pc_send_commit_decision / _send_abort_decision, log_2pc.c:222 / 261).
  5. Append LOG_2PC_*_INFORM_PARTICPS (32 / 33).
  6. Wait for participant acks.
  7. As each ack arrives, append LOG_2PC_RECV_ACK (34) with the acknowledging participant’s index.
  8. When all acks received, transition to TRAN_UNACTIVE_COMMITTED / _ABORTED and release locks.

Seven record types form the durable 2PC trail:

| Type number | Name | Purpose |
| --- | --- | --- |
| 28 | LOG_2PC_PREPARE | Local prepared state with lock catalogue |
| 29 | LOG_2PC_START | Coordinator’s record of participants |
| 30 | LOG_2PC_COMMIT_DECISION | Phase-2 commit decision |
| 31 | LOG_2PC_ABORT_DECISION | Phase-2 abort decision |
| 32 | LOG_2PC_COMMIT_INFORM_PARTICPS | Sent commit to participants |
| 33 | LOG_2PC_ABORT_INFORM_PARTICPS | Sent abort to participants |
| 34 | LOG_2PC_RECV_ACK | Received ack from one participant |

The order on the log of a successful distributed commit:

LOG_2PC_START
...participant work records...
LOG_2PC_PREPARE
(force flush, send prepare)
(collect votes)
LOG_2PC_COMMIT_DECISION
(force flush, send decision)
LOG_2PC_COMMIT_INFORM_PARTICPS
LOG_2PC_RECV_ACK (×N)
LOG_COMMIT

LOG_2PC_RECV_ACK carries a LOG_REC_2PC_PARTICP_ACK { particp_index } (log_record.hpp:412) — just the index into the start record’s participant block.

In-doubt recovery — the analysis-pass dance

The recovery analysis pass (cubrid-recovery-manager.md §“Analysis pass”) classifies every TRANID. For 2PC, the classification depends on what records are present:

  • LOG_2PC_PREPARE but no decision record → state TRAN_UNACTIVE_2PC_PREPARE. In-doubt. The recovery must hold locks and wait for the coordinator’s decision.
  • LOG_2PC_COMMIT_DECISION but LOG_2PC_*_INFORM_PARTICPS not seen for some participants → state TRAN_UNACTIVE_COMMITTED_INFORMING_PARTICIPANTS. The decision is durable; we need to re-send to the missed participants and collect acks.
  • LOG_2PC_RECV_ACK for all participants → done; transition to TRAN_UNACTIVE_COMMITTED.

log_2pc_recovery_analysis_info (log_2pc.h:95) is called from the analysis pass for each 2PC-bearing TDES. After the analysis pass, log_2pc_recovery (log_2pc.h:96) walks the in-doubt set:

  • For each TRAN_UNACTIVE_2PC_PREPARE — re-acquire locks (log_2pc_read_prepare reads the lock catalogue from the prepared record, and LOG_2PC_OBTAIN_LOCKS = true makes it acquire them); the transaction stays in-doubt until the coordinator (or an operator) decides.
  • For each TRAN_UNACTIVE_*_INFORMING_PARTICIPANTS — resume inform-and-ack; re-send decision to participants whose ack is missing.

The fifth recovery phase LOG_RECOVERY_FINISH_2PC_PHASE (declared in log_impl.h:631) is the named slot for this work, even though the current log_recovery driver in cubrid-recovery-manager.md doesn’t call it — open question §4 in this doc.

XA bridge — external transaction managers

The XA APIs (xa_prepare, xa_commit, xa_rollback, xa_recover) flow through tran_2pc_* on the client (transaction_cl.h) into xtran_2pc_* on the server. The key entry points:

  • tran_2pc_start → log_2pc_start (log_2pc.c:833): generate a gtrid, install on TDES.
  • tran_2pc_prepare → log_2pc_prepare (log_2pc.c:877): run phase 1 with LOG_2PC_EXECUTE_FULL if the local server is the root, else LOG_2PC_EXECUTE_PREPARE.
  • tran_2pc_recovery_prepared → log_2pc_recovery_prepared (log_2pc.c:915): xa_recover equivalent; returns the list of in-doubt gtrids the TM should resolve.
  • tran_2pc_attach_global_tran → log_2pc_attach_global_tran (log_2pc.c:1036): xa_start resume; attaches to an existing gtrid (used after a connection failover or a thread switch in a thread-per-request server).
  • tran_2pc_prepare_global_tran → log_2pc_prepare_global_tran (log_2pc.c:1126): drive prepare on a previously attached gtrid.

log_2pc_find_tran_descriptor (log_2pc.c:952) is the gtrid → TDES lookup used by every attach-style call.
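
Conceptually, a gtrid → TDES lookup is a search of the transaction table keyed on the gtrid field. The sketch below shows that idea with a linear scan; it is not the body of log_2pc_find_tran_descriptor, and tdes_sketch, SKETCH_NULL_GTRID and find_by_gtrid are hypothetical names (the sentinel value -1 is taken from LOG_2PC_NULL_GTRID as described in this document).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NULL_GTRID (-1)	/* mirrors LOG_2PC_NULL_GTRID = -1 */

/* Illustrative slice of a transaction descriptor. */
struct tdes_sketch
{
  int trid;			/* local id, recycled after termination */
  int gtrid;			/* global id, SKETCH_NULL_GTRID if not distributed */
};

/* Hypothetical helper: find the descriptor that carries a given gtrid. */
static struct tdes_sketch *
find_by_gtrid (struct tdes_sketch *table, size_t num_tdes, int gtrid)
{
  assert (table != NULL || num_tdes == 0);
  if (gtrid == SKETCH_NULL_GTRID)
    {
      return NULL;		/* "no gtrid" must never match a slot */
    }
  for (size_t i = 0; i < num_tdes; i++)
    {
      if (table[i].gtrid == gtrid)
	{
	  return &table[i];
	}
    }
  return NULL;
}
```

Rejecting the sentinel up front keeps non-distributed transactions, whose gtrid is the null value, from being returned by accident.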

sequenceDiagram
  participant TM as Transaction Manager (XA)
  participant CO as Coordinator (CUBRID server)
  participant LM as log_manager
  participant P1 as Participant 1
  participant P2 as Participant 2

  TM->>CO: xa_start (gtrid)
  CO->>CO: log_2pc_start: assign gtrid, install on TDES
  Note over CO: ...transaction work happens...
  TM->>CO: xa_prepare
  CO->>LM: append LOG_2PC_START (participant block)
  CO->>P1: send PREPARE
  CO->>P2: send PREPARE
  CO->>LM: append LOG_2PC_PREPARE (local lock catalogue)
  CO->>LM: force flush
  CO->>CO: state = TRAN_UNACTIVE_2PC_COLLECTING_PARTICIPANT_VOTES
  P1-->>CO: vote YES
  P2-->>CO: vote YES
  CO->>CO: state = TRAN_UNACTIVE_2PC_COMMIT_DECISION
  TM->>CO: xa_commit
  CO->>LM: append LOG_2PC_COMMIT_DECISION
  CO->>LM: force flush
  CO->>P1: send COMMIT
  CO->>P2: send COMMIT
  CO->>LM: append LOG_2PC_COMMIT_INFORM_PARTICPS
  P1-->>CO: ack
  CO->>LM: append LOG_2PC_RECV_ACK (idx=1)
  P2-->>CO: ack
  CO->>LM: append LOG_2PC_RECV_ACK (idx=2)
  CO->>LM: append LOG_COMMIT
  CO->>CO: state = TRAN_UNACTIVE_COMMITTED
  CO->>CO: release locks

Anchor on symbol names, not line numbers.

  • LOG_2PC_NULL_GTRID (log_2pc.h) — sentinel for “no gtrid”.
  • LOG_2PC_OBTAIN_LOCKS / LOG_2PC_DONT_OBTAIN_LOCKS (log_2pc.h) — flags for log_2pc_read_prepare.
  • LOG_2PC_EXECUTE enum (log_2pc.h) — role dispatch.
  • LOG_2PC_GTRINFO (log_2pc.h) — XA payload wrapper.
  • LOG_2PC_COORDINATOR (log_2pc.h) — coordinator state on TDES.
  • LOG_REC_2PC_PREPCOMMIT (log_record.hpp) — prepared record payload.
  • LOG_REC_2PC_START (log_record.hpp) — start record payload.
  • LOG_REC_2PC_PARTICP_ACK (log_record.hpp) — ack payload.
  • log_2pc_start (log_2pc.c) — assign gtrid.
  • log_2pc_make_global_tran_id (log_2pc.c) — gtrid generator.
  • log_2pc_check_duplicate_global_tran_id (log_2pc.c) — recovery-time guard.
  • log_2pc_send_prepare (log_2pc.c) — phase-1 send.
  • log_2pc_send_commit_decision / log_2pc_send_abort_decision (log_2pc.c) — phase-2 send.
  • log_2pc_alloc_coord_info (log_2pc.h) — attach LOG_2PC_COORDINATOR to TDES.
  • log_2pc_free_coord_info (log_2pc.h) — release.
  • log_2pc_commit_first_phase (log_2pc.c).
  • log_2pc_commit_second_phase (log_2pc.c).
  • log_2pc_commit (log_2pc.c) — top-level dispatcher.
  • log_2pc_prepare (log_2pc.c) — XA prepare entry.
  • log_2pc_append_start (log_2pc.c).
  • log_2pc_append_decision (log_2pc.c).
  • log_2pc_recovery_analysis_info (log_2pc.h) — per-TDES classification during analysis pass.
  • log_2pc_recovery (log_2pc.h) — post-analysis driver for in-doubt and informing-participants TDES.
  • log_2pc_read_prepare (log_2pc.h) — read prepared record; optionally reacquire locks.
  • log_2pc_set_global_tran_info / log_2pc_get_global_tran_info (log_2pc.c).
  • log_2pc_recovery_prepared (log_2pc.c) — xa_recover equivalent.
  • log_2pc_find_tran_descriptor (log_2pc.c).
  • log_2pc_attach_client (log_2pc.c) — bind a client to a TDES.
  • log_2pc_attach_global_tran (log_2pc.c) — xa_start resume.
  • log_2pc_prepare_global_tran (log_2pc.c).
  • log_2pc_get_num_participants (log_2pc.c).
  • log_2pc_dump_participants / log_2pc_dump_gtrinfo / log_2pc_dump_acqobj_locks (log_2pc.c) — debug dumps.
  • log_2pc_is_tran_distributed (log_2pc.h) — bool query.
  • log_2pc_clear_and_is_tran_distributed (log_2pc.h).

| Symbol | File | Line |
| --- | --- | --- |
| LOG_2PC_EXECUTE enum | log_2pc.h | 45 |
| LOG_2PC_GTRINFO (struct) | log_2pc.h | 58 |
| LOG_2PC_COORDINATOR (struct) | log_2pc.h | 65 |
| log_2pc_get_num_participants | log_2pc.c | 132 |
| log_2pc_dump_participants | log_2pc.c | 162 |
| log_2pc_send_prepare | log_2pc.c | 190 |
| log_2pc_send_commit_decision | log_2pc.c | 222 |
| log_2pc_send_abort_decision | log_2pc.c | 261 |
| log_2pc_make_global_tran_id | log_2pc.c | 323 |
| log_2pc_check_duplicate_global_tran_id | log_2pc.c | 407 |
| log_2pc_commit_first_phase | log_2pc.c | 437 |
| log_2pc_commit_second_phase | log_2pc.c | 503 |
| log_2pc_commit | log_2pc.c | 632 |
| log_2pc_set_global_tran_info | log_2pc.c | 705 |
| log_2pc_get_global_tran_info | log_2pc.c | 772 |
| log_2pc_start | log_2pc.c | 833 |
| log_2pc_prepare | log_2pc.c | 877 |
| log_2pc_recovery_prepared | log_2pc.c | 915 |
| log_2pc_find_tran_descriptor | log_2pc.c | 952 |
| log_2pc_attach_client | log_2pc.c | 984 |
| log_2pc_attach_global_tran | log_2pc.c | 1036 |
| log_2pc_prepare_global_tran | log_2pc.c | 1126 |
| log_2pc_read_prepare (LSA variant) | log_2pc.c | 1313 |
| log_2pc_read_prepare (reader variant) | log_2pc.c | 1389 |
| log_2pc_dump_gtrinfo | log_2pc.c | 1476 |
| log_2pc_dump_acqobj_locks | log_2pc.c | 1491 |
| log_2pc_append_start | log_2pc.c | 1513 |
| log_2pc_append_decision | log_2pc.c | 1570 |

  • The LOG_2PC_EXECUTE enum has four values, three of them for non-root coordinators. Verified at log_2pc.h:45. FULL is the root path; the other three correspond to a middle-of-tree node that is both a participant from above and a coordinator below.

  • Coordinator info is attached to the TDES, not stored separately. Verified at log_impl.h:506 (LOG_TDES::coord of type LOG_2PC_COORDINATOR *) plus log_2pc.h:65. The pointer is NULL when this site is not the coordinator; non-NULL when it owns the participant block.

  • Per-participant ack tracking is #ifdef LOG_2PC_ACK_RECV_REQUIRED. Verified at log_2pc.h:70. The macro is presumably defined on builds that need conservative ack tracking; the alternative is to skip per-participant acks and rely on the LOG_2PC_*_INFORM_PARTICPS records’ sequencing.

  • Seven log record types form the 2PC durable trail (28-34). Verified at log_record.hpp:99-107: LOG_2PC_PREPARE (28), LOG_2PC_START (29), LOG_2PC_COMMIT_DECISION (30), LOG_2PC_ABORT_DECISION (31), LOG_2PC_COMMIT_INFORM_PARTICPS (32), LOG_2PC_ABORT_INFORM_PARTICPS (33), LOG_2PC_RECV_ACK (34). The values are stable: archived logs written by older versions use the same numbers.

  • The prepared record carries the full lock catalogue. Verified at log_record.hpp:387-396 (LOG_REC_2PC_PREPCOMMIT::num_object_locks and num_page_locks). After the fixed-size header, the record carries the gtrinfo bytes followed by the lock list. This is what log_2pc_read_prepare reads at recovery time to reacquire locks.

  • There are two overloads of log_2pc_read_prepare. Verified at log_2pc.h:88-90: one takes a LOG_LSA * + LOG_PAGE *, the other takes a log_reader &. The two exist for compatibility — older code paths use the explicit LSA, newer code paths use the log_reader class (cubrid-recovery-manager.md).

  • In-doubt recovery is a separate phase named LOG_RECOVERY_FINISH_2PC_PHASE. Verified at log_impl.h:631. The phase is named in the enum but does not appear in the log_recovery body sketched in cubrid-recovery-manager.md; where log_2pc_recovery is actually invoked has not been traced (open question §4).

  • gtrid is an int, not an opaque XID. Verified at log_impl.h:499 (LOG_TDES::gtrid is int) and log_2pc.h:41 (LOG_2PC_NULL_GTRID = -1). The XA-style XID payload travels through LOG_2PC_GTRINFO::info_data separately.

  • log_2pc_recovery_prepared is the xa_recover equivalent. Verified by signature (int gtrids[], int size) and name. Returns a list of currently in-doubt gtrids the external TM should resolve.

  • log_2pc_attach_global_tran resumes a transaction by gtrid. Verified at log_2pc.c:1036. Used by the XA xa_start resume path when a previously suspended transaction is being re-attached, possibly on a different thread.

  • Lock acquisition during prepare is controlled by a flag. Verified at log_2pc.h:42-43: LOG_2PC_OBTAIN_LOCKS = true / LOG_2PC_DONT_OBTAIN_LOCKS = false. The flag is passed to log_2pc_read_prepare. False is for diagnostic dumping of a prepared record; true is for actual recovery use.

  1. Heuristic abort / heuristic commit handling. XA defines xa_forget for resolved heuristic decisions. CUBRID’s API surface (tran_2pc_*) does not obviously expose the heuristic-decision record type. Investigation path: search tran_2pc_* and xtran_2pc_* for a forget call.

  2. Presumed-abort optimisation. The standard “no abort log record on coordinator timeout” pattern — does CUBRID implement it? log_2pc_send_abort_decision appends a record before sending; whether this is force-flushed or skipped on coordinator timeout was not traced. Investigation path: read log_2pc_send_abort_decision body.

  3. Multi-level coordination tree. LOG_2PC_EXECUTE_PREPARE handles “I am a participant and coordinator below me”. How does the protocol handle 3+ levels? Are votes propagated serially? Investigation path: log_2pc_commit_first_phase for LOG_2PC_EXECUTE_PREPARE arm.

  4. LOG_RECOVERY_FINISH_2PC_PHASE invocation. The phase is named in log_impl.h:631 but the log_recovery driver in cubrid-recovery-manager.md does not call into it explicitly. Where exactly does log_2pc_recovery get invoked? Investigation path: grep for log_2pc_recovery callers.

  5. Coordinator-down-during-decision recovery. If the root coordinator crashes after LOG_2PC_COMMIT_DECISION but before LOG_2PC_COMMIT_INFORM_PARTICPS, the participants are in-doubt and the coordinator’s restart must re-send. The state TRAN_UNACTIVE_COMMITTED_INFORMING_PARTICIPANTS captures this, but the re-send timing (how often, how long) was not traced. Investigation path: log_2pc_recovery body.

  6. gtrid space exhaustion. gtrid is an int (~2 billion). Recycling vs. exhaustion behaviour wasn’t traced. Investigation path: log_2pc_make_global_tran_id and log_2pc_check_duplicate_global_tran_id.

Beyond CUBRID — Comparative Designs & Research Frontiers

Pointers, not analysis.

  • Paxos commit (Gray & Lamport, 2006) — replaces the blocking 2PC with a non-blocking protocol via Paxos consensus among coordinators. CUBRID’s LOG_2PC_* is classical 2PC; a Paxos-commit follow-up doc would document what CUBRID gives up by not running multiple coordinators.

  • Spanner’s 2PC over Paxos groups (Corbett et al., OSDI 2012) — globally-distributed 2PC where each participant is itself a Paxos group. The protocol is the same, but the participant side is replicated. Out of scope for CUBRID, but a useful contrast for the failure model.

  • Presumed-abort and presumed-commit optimisations (PA / PC, from the R* work of Mohan, Lindsay and Obermarck, 1986): reduce log volume in the common case. CUBRID’s discipline appears to be “log everything”; an audit of whether the optimisation could apply would be a good follow-up.

  • JTA/XA (X/Open CAE Spec C193, 1991) — the canonical resource-manager-to-transaction-manager contract. CUBRID supports it through the C XA library and the JDBC driver’s XADataSource.

  • Spanner’s TrueTime + commit wait: uses bounded clock uncertainty to provide external consistency. CUBRID’s 2PC has no clock-based ordering, so cross-server reads are not guaranteed a consistent point-in-time view.

  • eXtended Architecture for distributed transactions (D-XA, P-XA) — extensions for parallel and pipelined 2PC. Modern CUBRID could in principle pipeline phase 1 across participants more aggressively.

Raw analyses (raw/code-analysis/cubrid/storage/transaction/)

  • Transaction Internals.pdf
  • Transaction Internals.pptx — the 2PC chapters; the document is shared with cubrid-transaction.md, with scope-decisions in .meta/cubrid-2pc.yaml documenting the split.
  • knowledge/code-analysis/cubrid/cubrid-transaction.md — parent: TDES, isolation, savepoints. The lifecycle states TRAN_UNACTIVE_2PC_* are listed there in full.
  • knowledge/code-analysis/cubrid/cubrid-log-manager.md: the on-disk format of the seven 2PC log record types.
  • knowledge/code-analysis/cubrid/cubrid-recovery-manager.md — the analysis pass that classifies in-doubt and informing TDES.
  • knowledge/code-analysis/cubrid/cubrid-lock-manager.md — the lock manager whose lock catalogue the prepared record serialises.

Textbook chapters (under knowledge/research/dbms-general/)

  • Database Internals (Petrov), Ch. 13 “Distributed Transactions” — 2PC, Paxos commit, presumed abort/commit.
  • Gray, Notes on Database Operating Systems, 1978 — the original 2PC protocol description.
  • Concurrency Control and Recovery in Database Systems (Bernstein et al.), Ch. 7 “Distributed Recovery”.

CUBRID source (/data/hgryoo/references/cubrid/)

  • src/transaction/log_2pc.{c,h}
  • src/transaction/log_record.hpp: LOG_REC_2PC_* payload structs.
  • src/transaction/log_recovery.c — analysis-pass classification of 2PC records.
  • src/transaction/log_tran_table.c — TDES allocation (gtrid lives on the TDES).
  • src/transaction/transaction_{cl,sr}.{h,c} — the public tran_2pc_* / xtran_2pc_* API.