CUBRID HA Replication — Logical-Log Based Master/Slave Replication via copylogdb and applylogdb

Replication in a relational database engine is, at the abstraction level, the engineering problem of keeping two on-disk databases equivalent under a query workload without forcing the writers on both nodes to coordinate per-statement. The textbook framing (Kleppmann, Designing Data-Intensive Applications, ch. 5 “Replication”; Petrov, Database Internals, ch. 13 “Replication”) splits the design space along three axes that every real engine — including CUBRID — has to choose a coordinate on.

The first axis is what gets shipped. Physical replication ships the engine’s own WAL records — page-level redo records, in their on-disk layout. The slave is an exact byte-equivalent of the master (same page layout, same heap free-space distribution, same B-tree page splits at the same LSAs). PostgreSQL streaming replication, MySQL InnoDB redo log replication, and Oracle’s physical standby use this model. Logical replication ships statements or row events: “INSERT INTO t VALUES (…)”, or “row with PK=X in table T was updated; before-image=A, after-image=B”. The slave is equivalent but not byte-identical — it might choose different physical layouts. MySQL row-based binlog, PostgreSQL logical replication slots, and Oracle GoldenGate use this model.

The trade-off is symmetric. Physical replication is cheap to produce (the WAL already exists; no extra emission cost) and fast to apply (memcpy onto a page), but it requires the slave to be byte-compatible with the master, which precludes schema divergence, mixed-version clusters, or partial replication (replicating a subset of tables). Logical replication is more expensive to produce (the engine must extract row images and metadata at DML time) and slower to apply (each row event re-runs the SQL execution path), but the slave is decoupled from the master’s physical layout, mixed versions and table-level filtering become possible, and the wire format is portable across engines that agree on it.

The second axis is statement-level versus row-level. Inside the logical camp, an event can be a SQL statement (“UPDATE t SET c = c+1 WHERE x > 10”) or a row image (“row OID=O, before=A, after=B”). Statement-level events are compact but non-deterministic: a NOW() or RAND() call must evaluate the same way on both nodes; an ordering-sensitive query without an ORDER BY may apply rows in different orders and produce different results. Row-level events are larger but deterministic because the slave applies a row image, not a query plan. Most modern engines either default to row-level (MySQL post-5.7, PostgreSQL logical replication) or expose a hybrid (CUBRID emits both — LOG_REPLICATION_DATA for row events, LOG_REPLICATION_STATEMENT for DDL and trigger-bound statements).

The third axis is synchronous versus asynchronous. Synchronous replication holds the master’s commit until the slave has acknowledged the commit’s records — providing zero-data-loss failover at the cost of master commit latency rising to slave round-trip plus apply time. Asynchronous replication lets the master commit immediately; the slave catches up at its own pace, and a master crash before the next batch ships loses the in-flight window. This is eventual consistency in the textbook sense: the slave converges to the master’s state given enough time and no further master writes. CUBRID’s HA replication is asynchronous — the slave’s apply is decoupled from the master’s commit; the only contract is that the slave’s apply order matches the master’s commit order.

Once these three axes are named, every CUBRID-specific structure in this document is implementing a coordinate choice on one of them, or making the resulting state machine durable.

Below the textbook abstraction, every primary/standby DBMS that ships a logical replication path — MySQL row-based binlog, PostgreSQL logical decoding, Oracle GoldenGate, CUBRID HA — reaches for the same handful of patterns. They are not in the original replication chapters; they are the engineering vocabulary that lives between the model and the source.

The master’s regular WAL is physiological: a LOG_UNDOREDO_DATA record carries the page id, slot id, before-image and after-image for a single page mutation. That record is enough for crash recovery on the master itself, but a slave applier cannot decode it without also knowing the master’s catalog state at the time of the write — which class id maps to which table name, which index on which column, which heap layout corresponds to the slot. The remedy is universal: emit a second record at DML time that carries the catalog-resolved logical view (table name, primary-key column, primary-key value, operation kind). MySQL row-based binlog, the PostgreSQL pgoutput plugin, and CUBRID’s LOG_REPLICATION_DATA are all the same idea — a redundant record sitting next to the physiological WAL, paying the bandwidth cost up front so the slave can apply without consulting the master’s catalog.

Master-side staging in the transaction descriptor

The DML operation generates one or more replication records, but the records are not appended to the WAL stream until commit. They have to live somewhere in the meantime. The standard pattern is a per-transaction array hanging off the transaction descriptor (tdes in CUBRID, binlog_cache in MySQL, ReorderBuffer entries in PostgreSQL). On commit, the array is walked and each entry is turned into a real WAL record, atomically with the commit record itself; on abort, the array is discarded. CUBRID’s tdes->repl_records is exactly this — an array of LOG_REPL_RECORD staged until commit.

Atomic emission of replication and commit records

The slave’s apply algorithm walks the log forward and reacts to LOG_COMMIT by flushing all replication records seen for that transaction. The interleaving rule the slave depends on is strict: between transaction T’s last replication record and its commit record, no other transaction’s commit record may appear. If a peer transaction’s commit slipped in, a slave that crashed and restarted between the two records would incorrectly mark T as committed without applying its records. The fix is to emit T’s queued replication records and its commit record under a single hold of the prior-LSA mutex. CUBRID’s log_append_repl_info_and_commit_log is precisely this idiom — lock, append all repl records, append the commit record, unlock.

Slave-side push or pull, and where the daemon lives

The slave’s log fetch is a separate concern from the slave’s log apply. Pulling the log can be done by a thread on the slave server itself (PostgreSQL’s walreceiver), or by an out-of-process daemon (CUBRID’s copylogdb, the legacy MySQL replication SQL thread). The apply side is similarly separable. Splitting them is universal because their failure modes are independent: a slow apply must not stop the slave from receiving new log, or the slave will fall arbitrarily behind and the master’s archive will eventually be deleted out from under it.

Forward log walking with a position cursor and durable bookmark

The applier carries an LSA cursor; it advances the cursor only after an event has been applied and acknowledged downstream. On restart it reads the cursor from a persistent location (a system table on the slave, a file in the log directory, an entry in a control database). CUBRID’s _db_ha_apply_info system table is exactly this — committed_lsa, committed_rep_lsa, required_lsa, maintained by la_log_commit so a daemon restart picks up where the previous run left off.

Per-transaction buffering until commit on the slave

Logical events are emitted in interleaved order (T1’s INSERT, T2’s UPDATE, T1’s INSERT, T2’s COMMIT, T1’s COMMIT), but the slave wants them in commit order, transaction at a time, all of T2 then all of T1. The applier solves this with a per-trid hash of pending events. On LOG_COMMIT it walks the trid’s bucket and dispatches in order; on LOG_ABORT it discards the bucket. CUBRID’s la_Info.repl_lists[] (an array of LA_APPLY per transaction) plus la_Info.commit_head (a queue of LA_COMMIT for committed transactions) implement this protocol.

| Theoretical concept | CUBRID name |
| --- | --- |
| Auxiliary logical-event log record | LOG_REPLICATION_DATA = 39 and LOG_REPLICATION_STATEMENT = 40 (log_record.hpp:116-117) |
| Per-record kind (recovery index) | RVREPL_DATA_INSERT/UPDATE/DELETE/STATEMENT/UPDATE_START/UPDATE_END (recovery.h:149-154) |
| Master-side per-tran staging entry | LOG_REPL_RECORD (replication.h:78) with repl_type, rcvindex, inst_oid, lsa, repl_data, length, must_flush, tde_encrypted |
| Per-tran array on the descriptor | LOG_TDES::repl_records[], num_repl_records, cur_repl_record, fl_mark_repl_recidx (log_impl.h:522-526) |
| Update-LSA back-patch | LOG_TDES::repl_insert_lsa, repl_update_lsa (log_impl.h:527-528); repl_add_update_lsa (replication.c:229) |
| Insert into staging | repl_log_insert (replication.c:293) |
| Statement-level emission | repl_log_insert_statement (replication.c:512) |
| Flush mark for system DDL | repl_start_flush_mark (replication.c:606), repl_end_flush_mark (replication.c:635) |
| Master-side commit-time emission | log_append_repl_info_internal (log_manager.c:4555), log_append_repl_info_and_commit_log (log_manager.c:4647) |
| Atomic repl + commit emission | log_append_repl_info_and_commit_log holds prior_lsa_mutex across both appends |
| Server side of copy protocol | xlogwr_get_log_pages (log_writer.c:2571) |
| Slave side daemon (copylogdb) | logwr_copy_log_file (log_writer.c:1659/1960); writes via logwr_flush_all_append_pages (1016) and logwr_archive_active_log (1275) |
| Slave side daemon (applylogdb) | la_apply_log_file (log_applier.c:8074) |
| Slave per-record dispatch | la_log_record_process (log_applier.c:6101) |
| Slave per-trid pending list | LA_INFO::repl_lists[] of LA_APPLY (log_applier.c:255-264, 298) |
| Slave commit queue | LA_INFO::commit_head / commit_tail of LA_COMMIT (log_applier.c:266-276, 304-305) |
| Slave dispatch fan-out | la_apply_repl_log switching on item->item_type to la_apply_insert/update/delete/statement_log |
| Slave durable bookmark | LA_HA_APPLY_INFO row in _db_ha_apply_info (log_applier.c:393) |
| Slave retryable-error mask | LA_RETRY_ON_ERROR (log_applier.h:34) |
| Slave table-level filter | REPL_FILTER_TYPE and LA_REPL_FILTER (log_applier.h:48, log_applier.c:206) |

CUBRID HA replication has four moving parts: the master-side emission path that, during DML, puts one LOG_REPL_RECORD into the transaction descriptor for every catalog-visible row mutation; the master-side flush path that, on commit, drains those staged records into the log stream as LOG_REPLICATION_DATA / LOG_REPLICATION_STATEMENT log records atomically with the commit record; copylogdb, a client-mode daemon running on the slave host that pulls active and archive log volumes from the master via a single net request and writes them to local storage; and applylogdb, another client-mode daemon that walks the local log volumes forward, dispatches per record type, and replays the DML through the slave server’s regular client API. We walk them in that order.

flowchart LR
  subgraph M["Master cub_server"]
    DML["DML transaction\n(qexec_execute_∗)"]
    LOC["locator_∗_force\nlocator_attribute_info_force"]
    HEAP["heap_∗_logical\nbtree_update"]
    REPL["repl_log_insert\nrepl_add_update_lsa"]
    TDES["tdes->repl_records[]\n(LOG_REPL_RECORD)"]
    COMMIT["log_commit ->\nlog_append_repl_info_and_commit_log"]
    PRIOR["prior_lsa list +\nlog_append_repl\n(LOG_REPLICATION_DATA)"]
    LGAT["active log\n(lgat) +\narchive volumes"]
    XLW["xlogwr_get_log_pages\n(NET_SERVER_LOGWR_GET_LOG_PAGES)"]
    DML --> LOC --> HEAP --> REPL --> TDES
    DML --> COMMIT --> PRIOR --> LGAT
    LGAT --> XLW
  end
  subgraph S["Slave host"]
    CLDB["copylogdb\nlogwr_copy_log_file"]
    SLOG["slave-side log volumes\n(active + archive)"]
    APPL["applylogdb\nla_apply_log_file"]
    REC["la_log_record_process\nper-record dispatch"]
    PEND["la_Info.repl_lists[]\nper-trid LA_APPLY"]
    CQ["la_Info.commit_head\nLA_COMMIT queue"]
    AP["la_apply_repl_log\nla_apply_insert/update/delete/statement_log"]
    SLAVE["slave cub_server\n(client-mode connection)"]
    HA["_db_ha_apply_info row\nLA_HA_APPLY_INFO"]
    XLW -- "log pages" --> CLDB --> SLOG --> APPL
    APPL --> REC
    REC -- "REPL records" --> PEND
    REC -- "COMMIT/ABORT" --> CQ
    CQ --> AP
    AP --> SLAVE
    AP --> HA
  end

The figure encodes four boundaries. Emit / flush: the master populates tdes->repl_records[] during DML, but the actual LOG_REPLICATION_DATA log record is only written at commit by log_append_repl_info. Commit atomicity: log_append_repl_info_and_commit_log takes the prior-LSA mutex once and holds it across the repl appends and the commit append, guaranteeing no peer commit slips between them. Copy / apply: copylogdb and applylogdb are separate processes that share only the on-disk log volumes — apply slowness cannot back-pressure the copy. Client-mode slave server: the apply daemon connects to the slave’s cub_server over the regular client/server protocol; it does not write pages directly. This is what makes table-level filtering, schema-divergence tolerance, and per-row error retry possible.

Master side — LOG_REPL_RECORD and the staging array

A DML operation on the master is dispatched through the same execution path it would take on a non-HA server:

sqmgr_execute_query
→ xqmgr_execute_query
→ qmgr_process_query
→ qexec_execute_main_block / qexec_execute_mainblock_internal
→ qexec_execute_<insert|update|delete>
→ locator_attribute_info_force
→ locator_insert_force / locator_update_force / locator_delete_force

Inside locator_*_force, two side-effects happen for every row operation: a physiological log record is emitted via heap_insert_logical / heap_update_logical / heap_delete_logical (or the index variants btree_update), and a replication record is queued via repl_log_insert. The replication record does not go into the WAL stream at this point — it is appended to a per-transaction staging array on the transaction descriptor:

// LOG_TDES replication fields — src/transaction/log_impl.h:522
int num_repl_records;          /* # of replication records (capacity) */
int cur_repl_record;           /* # of replication records used so far */
int append_repl_recidx;        /* cursor used at commit-time emission */
int fl_mark_repl_recidx;       /* index of flush-marked record (DDL) */
struct log_repl *repl_records; /* the array */
LOG_LSA repl_insert_lsa;       /* insert-or-MVCC-update target LSA */
LOG_LSA repl_update_lsa;       /* in-place-update target LSA */

The staging entry is a LOG_REPL_RECORD:

// LOG_REPL_RECORD — src/transaction/replication.h:78
typedef struct log_repl LOG_REPL_RECORD;
struct log_repl
{
  LOG_RECTYPE repl_type;      /* LOG_REPLICATION_DATA or LOG_REPLICATION_STATEMENT */
  LOG_RCVINDEX rcvindex;      /* RVREPL_DATA_INSERT / UPDATE / DELETE /
                                 UPDATE_START / UPDATE_END / STATEMENT */
  OID inst_oid;               /* OID of the row being changed */
  LOG_LSA lsa;                /* LSA of the related "real" log record
                                 (filled in later for UPDATE) */
  char *repl_data;            /* | pkey size | class_name | pkey dbvalue | */
  int length;                 /* repl_data length */
  LOG_REPL_FLUSH must_flush;  /* DONT_NEED_FLUSH=-1, COMMIT_NEED_FLUSH=0,
                                 NEED_FLUSH=1 */
  bool tde_encrypted;         /* class is TDE-encrypted */
};

The repl_data payload is intentionally minimal: a 4-byte packed-key length, an or-packed class name, and the or-packed primary-key DB_VALUE. The slave does not need the full row image at this point; it will re-fetch the row from the master’s heap log when it applies the event. Keeping the staging entry small bounds the per-transaction memory cost when a DML touches millions of rows.

// repl_log_insert — src/transaction/replication.c:293 (condensed)
int
repl_log_insert (THREAD_ENTRY *thread_p, const OID *class_oid, const OID *inst_oid,
                 LOG_RECTYPE log_type, LOG_RCVINDEX rcvindex,
                 DB_VALUE *key_dbvalue, REPL_INFO_TYPE repl_info)
{
  int tran_index = LOG_FIND_THREAD_TRAN_INDEX (thread_p);
  LOG_TDES *tdes = LOG_FIND_TDES (tran_index);
  LOG_REPL_RECORD *repl_rec;

  if (tdes->suppress_replication != 0)
    {
      LSA_SET_NULL (&tdes->repl_insert_lsa);
      LSA_SET_NULL (&tdes->repl_update_lsa);
      return NO_ERROR;
    }

  /* Allocate / grow tdes->repl_records as needed. */
  if (REPL_LOG_IS_NOT_EXISTS (tran_index))
    repl_log_info_alloc (tdes, REPL_LOG_INFO_ALLOC_SIZE, false);
  else if (REPL_LOG_IS_FULL (tran_index))
    repl_log_info_alloc (tdes, REPL_LOG_INFO_ALLOC_SIZE, true);  /* realloc +100 */

  repl_rec = (LOG_REPL_RECORD *) (&tdes->repl_records[tdes->cur_repl_record]);
  repl_rec->repl_type = log_type;
  repl_rec->rcvindex = rcvindex;
  /* RBR_START / RBR_END refine UPDATE into UPDATE_START / UPDATE_END */
  if (rcvindex == RVREPL_DATA_UPDATE)
    {
      /* ... map repl_info → rcvindex ... */
    }
  COPY_OID (&repl_rec->inst_oid, inst_oid);

  if (log_type == LOG_REPLICATION_DATA)
    {
      /* Build | packed_key_size | class_name | pkey_dbvalue | */
      repl_rec->length = OR_INT_SIZE
        + or_packed_string_length (class_name, &strlen)
        + OR_VALUE_ALIGNED_SIZE (key_dbvalue);
      repl_rec->repl_data = malloc (repl_rec->length);
      /* ... pack class_name + key_dbvalue, fill packed_key_size ... */
    }
  repl_rec->must_flush = LOG_REPL_COMMIT_NEED_FLUSH;

  /* Bookkeeping: link the LSA of the heap log to the repl record. */
  switch (rcvindex)
    {
    case RVREPL_DATA_INSERT:
      if (!LSA_ISNULL (&tdes->repl_insert_lsa))
        {
          LSA_COPY (&repl_rec->lsa, &tdes->repl_insert_lsa);
          LSA_SET_NULL (&tdes->repl_insert_lsa);
          LSA_SET_NULL (&tdes->repl_update_lsa);
        }
      break;
    case RVREPL_DATA_UPDATE:
      /* For update, the heap log is written *after* the repl record;
         repl_add_update_lsa back-patches repl_rec->lsa later. */
      LSA_SET_NULL (&repl_rec->lsa);
      break;
    case RVREPL_DATA_DELETE:
      /* For delete, no after-image is needed — pkey is enough. */
      break;
    }

  tdes->cur_repl_record++;
  tdes->must_flush = LOG_REPL_NEED_FLUSH;
  return NO_ERROR;
}

Three points worth marking up. (a) The default array size is 100 (REPL_LOG_INFO_ALLOC_SIZE), grown by 100 on overflow via realloc; transactions that never exceed 100 row mutations never pay reallocation cost. (b) RVREPL_DATA_UPDATE maps to one of three sub-kinds — UPDATE, UPDATE_START, UPDATE_END — driven by REPL_INFO_TYPE. The split exists so the slave can recognise multi-statement updates within a system op (the START / END pair brackets the changes). (c) The relationship between the replication record’s lsa field and the heap log is asymmetric across operations. For INSERT, the heap log is written before repl_log_insert is called, so tdes->repl_insert_lsa already holds it and the function copies it directly. For UPDATE, the heap log is written after the index update path that triggered repl_log_insert, so the field is set null and back-patched later by repl_add_update_lsa once locator_update_force has the heap LSA in hand. For DELETE, the lsa is irrelevant — the slave only needs the primary key.

repl_add_update_lsa — the back-patch for UPDATE

// repl_add_update_lsa — src/transaction/replication.c:229 (condensed)
int
repl_add_update_lsa (THREAD_ENTRY *thread_p, const OID *inst_oid)
{
  LOG_TDES *tdes = LOG_FIND_TDES (LOG_FIND_THREAD_TRAN_INDEX (thread_p));

  if (tdes->suppress_replication != 0)
    return NO_ERROR;

  /* Walk backwards through repl_records; the last one matching this
   * OID with a non-null repl_update_lsa is the one we just inserted. */
  for (int i = tdes->cur_repl_record - 1; i >= 0; i--)
    {
      LOG_REPL_RECORD *repl_rec = &tdes->repl_records[i];
      if (OID_EQ (&repl_rec->inst_oid, inst_oid)
          && !LSA_ISNULL (&tdes->repl_update_lsa))
        {
          assert (repl_rec->rcvindex == RVREPL_DATA_UPDATE
                  || repl_rec->rcvindex == RVREPL_DATA_UPDATE_START
                  || repl_rec->rcvindex == RVREPL_DATA_UPDATE_END);
          LSA_COPY (&repl_rec->lsa, &tdes->repl_update_lsa);
          LSA_SET_NULL (&tdes->repl_update_lsa);
          LSA_SET_NULL (&tdes->repl_insert_lsa);
          return NO_ERROR;
        }
    }
  return NO_ERROR;  /* not found is not an error — debug log only */
}

The function is called by locator_update_force after the heap update has been logged; tdes->repl_update_lsa holds the LSA of the heap log; the matching LOG_REPL_RECORD (recently produced by btree_update → repl_log_insert) gets that LSA stamped into its lsa field. Walking backwards is correct because the most recent matching OID is always the one we just inserted: even when the same row is updated repeatedly within one transaction, the prior update’s lsa was already non-null (it was patched in its own repl_add_update_lsa call) and is skipped over.

Walking one INSERT through the master path

The deck illustrates the pattern with INSERT (1, "가") followed by INSERT (2, "나") and COMMIT on t1(c1 PK, c2). The order of work for each insert is:

  1. heap_insert_logical → heap_insert_physical writes the row into the slotted page. heap_log_insert_physical appends a LOG_UNDOREDO_DATA record to the prior list.
  2. locator_add_or_remove_index → btree_insert updates the primary-key B-tree index. Inside the index path, repl_log_insert is called: a new LOG_REPL_RECORD is appended to tdes->repl_records[] with rcvindex = RVREPL_DATA_INSERT, inst_oid = OID of the new row, lsa = LSA of the heap log, and the packed (pk_size, class_name, pk_dbvalue) payload.

After the second insert, tdes->repl_records[] has two entries:

| idx | rcvindex | inst_oid | lsa | repl_data |
| --- | --- | --- | --- | --- |
| 0 | RVREPL_DATA_INSERT | (oid_1) | LSA(heap_1) | “t1” + 1 |
| 1 | RVREPL_DATA_INSERT | (oid_2) | LSA(heap_2) | “t1” + 2 |

Up to this point no LOG_REPLICATION_DATA record exists in the WAL stream. The records exist only on the descriptor. COMMIT is the trigger that converts them.

Commit-time emission — log_append_repl_info_and_commit_log

log_commit calls log_append_repl_info_and_commit_log (when the HA configuration demands it), which is the atomic emission idiom:

// log_append_repl_info_and_commit_log — src/transaction/log_manager.c:4647
static void
log_append_repl_info_and_commit_log (THREAD_ENTRY *thread_p, LOG_TDES *tdes,
                                     LOG_LSA *commit_lsa)
{
  if (tdes->has_supplemental_log)
    {
      log_append_supplemental_info (thread_p, LOG_SUPPLEMENT_TRAN_USER,
                                    strlen (tdes->client.get_db_user ()),
                                    tdes->client.get_db_user ());
      tdes->has_supplemental_log = false;
    }

  log_Gl.prior_info.prior_lsa_mutex.lock ();
  log_append_repl_info_with_lock (thread_p, tdes, true);
  log_append_commit_log_with_lock (thread_p, tdes, commit_lsa);
  log_Gl.prior_info.prior_lsa_mutex.unlock ();
}

The mutex is held across both appends; this is the atomicity guarantee the slave depends on. Inside log_append_repl_info_internal each staged record is converted into a real prior-list node:

// log_append_repl_info_internal — src/transaction/log_manager.c:4555 (condensed)
static void
log_append_repl_info_internal (THREAD_ENTRY *thread_p, LOG_TDES *tdes,
                               bool is_commit, int with_lock)
{
  if (tdes->append_repl_recidx == -1 || is_commit)
    tdes->append_repl_recidx = 0;

  while (tdes->append_repl_recidx < tdes->cur_repl_record)
    {
      LOG_REPL_RECORD *repl_rec = &tdes->repl_records[tdes->append_repl_recidx];
      if ((repl_rec->repl_type == LOG_REPLICATION_DATA
           || repl_rec->repl_type == LOG_REPLICATION_STATEMENT)
          && ((is_commit && repl_rec->must_flush != LOG_REPL_DONT_NEED_FLUSH)
              || repl_rec->must_flush == LOG_REPL_NEED_FLUSH))
        {
          LOG_PRIOR_NODE *node =
            prior_lsa_alloc_and_copy_data (thread_p, repl_rec->repl_type,
                                           RV_NOT_DEFINED, NULL,
                                           repl_rec->length, repl_rec->repl_data,
                                           0, NULL);
          LOG_REC_REPLICATION *log = (LOG_REC_REPLICATION *) node->data_header;
          if (repl_rec->rcvindex == RVREPL_DATA_DELETE
              || repl_rec->rcvindex == RVREPL_STATEMENT)
            LSA_SET_NULL (&log->lsa);
          else
            LSA_COPY (&log->lsa, &repl_rec->lsa);
          log->length = repl_rec->length;
          log->rcvindex = repl_rec->rcvindex;
          prior_lsa_next_record_with_lock (thread_p, node, tdes);
          repl_rec->must_flush = LOG_REPL_DONT_NEED_FLUSH;
        }
      tdes->append_repl_recidx++;
    }
}

The function emits LOG_REPLICATION_DATA (= 39) or LOG_REPLICATION_STATEMENT (= 40) — both real LOG_RECTYPE values defined alongside the regular log record types in log_record.hpp. The emitted record’s LOG_REC_REPLICATION data-header carries the rcvindex (so the slave knows whether this is INSERT/UPDATE/DELETE/STATEMENT), the lsa of the referenced heap log (so the slave can fetch the row image), and the length of the repl_data payload (the inline class-name + primary-key bytes, copied as the prior-node’s body).

For DELETE and STATEMENT the lsa field is set null in the emitted record — the slave needs only the primary key (DELETE) or no row image at all (STATEMENT); for INSERT the lsa points at the preceding heap log so the slave can fetch the row image; for UPDATE the lsa was back-patched by repl_add_update_lsa and is now copied through.

After the loop, must_flush = LOG_REPL_DONT_NEED_FLUSH on every emitted record so a subsequent abort path will not re-emit them.

After log_append_repl_info_and_commit_log returns, the prior list contains, in order, every staged repl record followed by the LOG_COMMIT. The drain in logpb_prior_lsa_append_all_list (see cubrid-log-manager.md) walks the list and copies records into the log page buffer; logpb_flush_all_append_pages writes them to the active log file. The slave-bound emissions are now durable on the master and visible to a copylogdb poll.

Slave side — copylogdb, the log-volume puller

copylogdb is a client-mode CUBRID utility (registered in util_service.c and started by cubrid hb start together with the rest of the HA topology). Its job is to keep a slave-local copy of the master’s active and archive log volumes up to date.

The protocol is one-shot per loop iteration: send a NET_SERVER_LOGWR_GET_LOG_PAGES request whose body is the first page id the slave is missing (first_pageid_torecv = last_recv_pageid); the master’s xlogwr_get_log_pages (log_writer.c:2571) responds with up to LOGWR_COPY_LOG_BUFFER_NPAGES * LOG_PAGESIZE bytes (default 128 pages × 16 KiB = 2 MiB) of contiguous log pages. If the requested page does not yet exist on the master, the master blocks and responds when it does — turning what would otherwise be a poll into an event-driven push.

The master-side scaffolding to satisfy a request:

// xlogwr_get_log_pages — src/transaction/log_writer.c (high-level)
xlogwr_get_log_pages (THREAD_ENTRY *thread_p, LOG_PAGEID first_pageid, LOGWR_MODE mode)
{
  /* For each page from first_pageid to eof_lsa.pageid,
   *   - if !logpb_is_page_in_archive: read from active via
   *     logpb_copy_page_from_file → logpb_read_page_from_file → fileio_read
   *   - else: locate the right archive via logpb_get_guess_archive_num,
   *     logpb_arv_page_info_table search, then fetch via
   *     logpb_fetch_from_archive
   * Pack via logwr_pack_log_pages, send via xlog_send_log_pages_to_client. */
}

The master, when locating an archive, consults the logpb_arv_page_info_table (an in-memory cache of (arv_num, fpageid, lpageid) records updated whenever an archive is created), and falls back to Log_Nname_info file scanning if the cache is cold. The “guess” in the function name refers to the arithmetic estimator: when an active log header is available, the function divides the requested pageid by LOGPB_ACTIVE_NPAGES to estimate the archive number; otherwise it starts from archive 0 and scans forward. Once it has a candidate archive, it compares the candidate’s Arv_hdr->fpageid against the requested pageid and walks forward (direction = +1) or backward (direction = -1) through the archive sequence as needed.

On the slave side, logwr_copy_log_file (log_writer.c:1659/1960) issues the request, fills its own Logwr_Gl structure with the arrived pages, and writes them through logwr_flush_all_append_pages (1016) to the slave-local active log. When the active log crosses its size boundary, logwr_archive_active_log (1275) copies the current active log’s contents into a new archive volume, page by page (fileio_read_pages + fileio_write_pages), and the slave-local active log is reset. Its name and structure are identical to a regular CUBRID log volume — applylogdb reads it the same way the master’s recovery would.

Slave side — applylogdb, the per-record dispatcher

applylogdb’s entry point is la_apply_log_file (log_applier.c:8074). It runs as a long-running daemon. Its main loop fetches log records forward from la_Info.final_lsa and hands each record to la_log_record_process:

// la_log_record_process — src/transaction/log_applier.c:6101 (condensed)
static int
la_log_record_process (LOG_RECORD_HEADER *lrec, LOG_LSA *final, LOG_PAGE *pg_ptr)
{
  /* Defensive: a non-EOL record must have non-null prev_tranlsa. */
  if (lrec->trid == NULL_TRANID || LSA_GT (&lrec->prev_tranlsa, final)
      || LSA_GT (&lrec->back_lsa, final))
    {
      if (lrec->type != LOG_END_OF_LOG)
        return ER_LOG_PAGE_CORRUPTED;
    }

  /* First time we see this trid — register an LA_APPLY for it. */
  if ((lrec->type != LOG_END_OF_LOG && lrec->type != LOG_DUMMY_HA_SERVER_STATE)
      && lrec->trid != LOG_SYSTEM_TRANID
      && LSA_ISNULL (&lrec->prev_tranlsa))
    {
      LA_APPLY *apply = la_add_apply_list (lrec->trid);
      /* ... start_lsa bookkeeping ... */
    }

  switch (lrec->type)
    {
    case LOG_END_OF_LOG:
      /* Reached end of currently-known log.  Set is_end_of_record and
       * return ER_INTERRUPTED so the caller waits for more pages. */
      return ER_INTERRUPTED;

    case LOG_REPLICATION_DATA:
    case LOG_REPLICATION_STATEMENT:
      /* Buffer this event in the trid's apply list. */
      return la_set_repl_log (pg_ptr, lrec->type, lrec->trid, final);

    case LOG_SYSOP_END:
    case LOG_COMMIT:
      /* Flush the trid's apply list onto the slave. */
      if (LSA_GT (final, &la_Info.committed_lsa))
        {
          eot_time = (lrec->type == LOG_SYSOP_END) ? 0
            : la_retrieve_eot_time (pg_ptr, final);
          la_add_node_into_la_commit_list (lrec->trid, final, lrec->type, eot_time);
          do
            {
              error = la_apply_commit_list (&lsa_apply, final_pageid);
              /* ... handle ER_NET_CANT_CONNECT_SERVER, ER_HA_LA_EXCEED_MAX_MEM_SIZE,
                 LA_IS_FLUSH_ERROR, ER_TDE_CIPHER_IS_NOT_LOADED ... */
              if (!LSA_ISNULL (&lsa_apply))
                {
                  LSA_COPY (&la_Info.committed_lsa, &lsa_apply);
                  if (lrec->type == LOG_COMMIT)
                    la_Info.commit_counter++;
                }
            }
          while (!LSA_ISNULL (&lsa_apply));
        }
      else
        {
          la_free_repl_items_by_tranid (lrec->trid);  /* already past committed */
        }
      break;

    case LOG_ABORT:
      la_add_node_into_la_commit_list (lrec->trid, final, LOG_ABORT, 0);
      break;

    case LOG_DUMMY_HA_SERVER_STATE:
      /* Detect master role change; if state != ACTIVE && != TO_BE_STANDBY,
       * the slave's role has changed → set is_role_changed and return
       * ER_INTERRUPTED so the daemon shuts down cleanly. */
      break;

    default:
      break;
    }

  /* ... handle out-of-bounds forw_lsa / type → ER_LOG_PAGE_CORRUPTED ... */
  return NO_ERROR;
}

The dispatch is tight: every record type is either buffered (the two REPL types), triggered for flush (COMMIT, SYSOP_END, ABORT), or consumed for control (DUMMY_HA_SERVER_STATE, END_OF_LOG, DUMMY_CRASH_RECOVERY, END_CHKPT). All other types fall through the default arm — they are not relevant to apply.

la_set_repl_log — buffering a REPL record

// la_set_repl_log — src/transaction/log_applier.c:3419
static int
la_set_repl_log (LOG_PAGE *log_pgptr, int log_type, int tranid, LOG_LSA *lsa)
{
  LA_APPLY *apply = la_find_apply_list (tranid);
  if (apply == NULL)
    return NO_ERROR;

  /* Long transaction: bypass the per-item buffer; just remember last_lsa. */
  if (apply->is_long_trans)
    {
      LSA_COPY (&apply->last_lsa, lsa);
      return NO_ERROR;
    }

  /* Cap per-trid items at LA_MAX_REPL_ITEMS (1000) — overflow degrades
   * the trid into "long transaction" mode (re-fetch from log on apply). */
  if (apply->num_items >= LA_MAX_REPL_ITEMS)
    {
      la_free_all_repl_items_except_head (apply);
      apply->is_long_trans = true;
      LSA_COPY (&apply->last_lsa, lsa);
      return NO_ERROR;
    }

  LA_ITEM *item = la_make_repl_item (log_pgptr, log_type, tranid, lsa);
  la_add_repl_item (apply, item);
  return NO_ERROR;
}

The bucketed structure is LA_INFO::repl_lists[] — an array of LA_APPLY pointers, indexed by a hash of trid. Each LA_APPLY holds the per-trid linked list of LA_ITEM:

// LA_APPLY and LA_ITEM — src/transaction/log_applier.c:236-264
struct la_item {
  LA_ITEM *next, *prev;
  int log_type;               /* LOG_REPLICATION_DATA / LOG_REPLICATION_STATEMENT */
  int item_type;              /* RVREPL_DATA_INSERT / UPDATE / DELETE / STATEMENT */
  char *class_name;           /* unpacked from the REPL record */
  char *db_user;
  char *ha_sys_prm;
  int packed_key_value_length;
  char *packed_key_value;     /* disk image of pkey value */
  DB_VALUE key;               /* unpacked from packed_key_value on demand */
  LOG_LSA lsa;                /* LSA of the LOG_REPLICATION_* record itself */
  LOG_LSA target_lsa;         /* LSA of the target heap/btree log record */
};
struct la_apply {
  int tranid;
  int num_items;
  bool is_long_trans;         /* exceeded LA_MAX_REPL_ITEMS — re-walk on apply */
  LOG_LSA start_lsa;
  LOG_LSA last_lsa;
  LA_ITEM *head;
  LA_ITEM *tail;
};

The is_long_trans flag is the escape hatch for the million-row-update problem: rather than carry a million LA_ITEM in memory, the daemon switches to a mode where it remembers only start_lsa and last_lsa, and on commit it walks the log forward from start_lsa to last_lsa re-fetching each REPL record. The trade-off is one extra log walk per long-transaction trid, in exchange for bounded memory.

la_apply_commit_list and la_apply_repl_log — the apply fan-out


When LOG_COMMIT arrives, the dispatcher queues a LA_COMMIT node in la_Info.commit_head/commit_tail and calls la_apply_commit_list in a loop until the head goes empty:

// la_apply_commit_list / la_apply_repl_log — src/transaction/log_applier.c:5920, 5739
static int
la_apply_commit_list (LOG_LSA *lsa, LOG_PAGEID final_pageid) {
  int error = NO_ERROR;
  LA_COMMIT *commit = la_Info.commit_head;
  if (commit && (commit->type == LOG_COMMIT || commit->type == LOG_SYSOP_END
                 || commit->type == LOG_ABORT)) {
    error = la_apply_repl_log (commit->tranid, commit->type,
                               &commit->log_lsa, &la_Info.total_rows,
                               final_pageid);
    LSA_COPY (lsa, &commit->log_lsa);
    /* ... unlink commit, advance head, update _db_ha_apply_info ... */
  }
  return error;
}
static int
la_apply_repl_log (int tranid, int rectype, LOG_LSA *commit_lsa,
                   int *total_rows, LOG_PAGEID final_pageid) {
  int error = NO_ERROR;
  LA_ITEM *item, *next;
  LA_APPLY *apply = la_find_apply_list (tranid);
  if (rectype == LOG_ABORT) { la_clear_applied_info (apply); return NO_ERROR; }
  item = apply->head;
  while (item != NULL) {
    if (LSA_GT (&item->lsa, &la_Info.last_committed_rep_lsa)
        && la_need_filter_out (item) == false) {
      if (item->log_type == LOG_REPLICATION_DATA) {
        switch (item->item_type) {
        case RVREPL_DATA_UPDATE_START:
        case RVREPL_DATA_UPDATE_END:
        case RVREPL_DATA_UPDATE: error = la_apply_update_log (item); break;
        case RVREPL_DATA_INSERT: error = la_apply_insert_log (item); break;
        case RVREPL_DATA_DELETE: error = la_apply_delete_log (item); break;
        }
      } else if (item->log_type == LOG_REPLICATION_STATEMENT) {
        error = la_apply_statement_log (item);
      }
      if (error == NO_ERROR) LSA_COPY (&la_Info.committed_rep_lsa, &item->lsa);
      else if (LA_RETRY_ON_ERROR (error)) { LA_SLEEP (10, 0); continue; /* retry same item */ }
      /* ... handle ER_NET_CANT_CONNECT_SERVER, log error, advance ... */
    }
    next = la_get_next_repl_item (item, apply->is_long_trans, &apply->last_lsa);
    la_free_repl_item (apply, item);
    item = next;
  }
  /* ... end-of-trid bookkeeping; clear or free per LOG_SYSOP_END semantics ... */
  return error;
}

la_apply_insert_log, la_apply_update_log, and la_apply_delete_log are the three workers. They share a common shape:

  1. Resolve the class. class_name from the LA_ITEM is resolved against the slave’s catalog to get a DB_OBJECT*.
  2. Reconstruct the row image. For INSERT/UPDATE, the item’s target_lsa points at the master’s heap log record; the daemon reads it via la_get_log_data, with helpers la_get_overflow_recdes (BIGONE / link-change), la_get_relocation_recdes (REC_RELOCATION + REC_NEWHOME), and la_get_next_update_log (REC_ASSIGN_ADDRESS deferred update). The result is a RECDES containing the after-image only; CUBRID does not ship before-images for replication.
  3. Apply the row. la_repl_add_object calls into the slave server’s regular client API (db_create, db_otmpl_*) with the reconstructed row. The slave server runs the operation as a normal DML, taking its own locks, generating its own MVCC IDs, writing its own WAL.
  4. Track or retry. On success, committed_rep_lsa advances. On a retryable error (deadlock, lock timeout, page latch abort, TDE cipher not loaded — the LA_RETRY_ON_ERROR mask), the daemon sleeps 10 seconds and retries. On a non-retryable error, the operation is logged and the daemon advances past it.

Delete is simpler: only the primary key is needed, no row image fetch, la_repl_add_object is called with recdes = NULL.

Reconstructing the row image — case analysis


la_get_recdes is the dispatcher that produces an after-image RECDES from an item->target_lsa. Five record-type cases matter:

  1. Normal heap record (REC_HOME).
     la_get_log_data() — read header + redo + undo, copy the redo image into recdes.
  2. RVOVF_CHANGE_LINK — the record is BIGONE, but only the linkage to overflow pages changed.
     la_get_overflow_recdes(..., RVOVF_PAGE_UPDATE) — walk forward collecting overflow-page redo until the dummy/anchor record.
  3. recdes->type == REC_BIGONE — the record is a fresh BIGONE.
     la_get_overflow_recdes(..., RVOVF_NEWPAGE_INSERT) — collect the freshly inserted overflow chain.
  4. RVHF_INSERT && recdes->type == REC_ASSIGN_ADDRESS — the heap reserved a slot first, then deferred the actual data update.
     la_get_next_update_log() — chase forw_lsa within the same trid to find the deferred update record.
  5. (RVHF_UPDATE || RVHF_UPDATE_NOTIFY_VACUUM) && recdes->type == REC_RELOCATION — the record is the REC_RELOCATION pointer; the actual REC_NEWHOME lives elsewhere.
     la_get_relocation_recdes() — chase prev_tranlsa within the same trid to find the REC_NEWHOME companion.

The chase functions all read forward or backward through the log following one of the three header LSAs (forw_lsa, prev_tranlsa, back_lsa); they decode physiological log records and decompress them with the daemon’s per-instance LOG_ZIP contexts (la_Info.undo_unzip_ptr, la_Info.redo_unzip_ptr).

After every batch of applies, la_log_commit updates the _db_ha_apply_info system table on the slave with the new committed_lsa, committed_rep_lsa, final_lsa, and the running counters. The row is keyed by the master’s db_name/copied_log_path; on daemon restart the row is read back and used to seed la_Info.final_lsa. This is the durable end of the apply cursor.

A complete master → slave commit, end to end

sequenceDiagram
  participant TX as Master DML thread
  participant LOC as locator_*_force
  participant REPL as repl_log_insert
  participant TDES as tdes->repl_records
  participant FLUSH as log_append_repl_info_<br/>and_commit_log
  participant PRIOR as prior_lsa list
  participant LGAT as master active log
  participant XLW as xlogwr_get_log_pages
  participant CL as copylogdb (slave host)
  participant SLOG as slave-local log
  participant AL as applylogdb
  participant DISP as la_log_record_process
  participant AP as la_apply_<insert|update|delete>_log
  participant SS as slave cub_server
  participant HA as _db_ha_apply_info

  TX->>LOC: INSERT (1, "가")
  LOC->>LOC: heap_insert_logical → log_undoredo
  LOC->>REPL: repl_log_insert (RVREPL_DATA_INSERT)
  REPL->>TDES: append LOG_REPL_RECORD
  Note over TX,TDES: caller continues — no WAL emission yet
  TX->>FLUSH: COMMIT
  FLUSH->>FLUSH: prior_lsa_mutex.lock()
  loop each LOG_REPL_RECORD
    FLUSH->>PRIOR: append LOG_REPLICATION_DATA
  end
  FLUSH->>PRIOR: append LOG_COMMIT
  FLUSH->>FLUSH: prior_lsa_mutex.unlock()
  PRIOR->>LGAT: drain + flush (logpb_flush_all_append_pages)

  CL->>XLW: NET_SERVER_LOGWR_GET_LOG_PAGES (last_recv_pageid)
  XLW->>LGAT: read pages
  XLW-->>CL: up to 128 × LOG_PAGESIZE bytes
  CL->>SLOG: write pages (logwr_flush_all_append_pages)

  AL->>SLOG: la_get_page_buffer (la_Info.final_lsa)
  AL->>DISP: lrec = LOG_REPLICATION_DATA
  DISP->>DISP: la_set_repl_log → repl_lists[trid]
  AL->>DISP: lrec = LOG_COMMIT
  DISP->>DISP: la_add_node_into_la_commit_list
  DISP->>AP: la_apply_commit_list → la_apply_repl_log
  AP->>SS: db_otmpl_create / db_template_*
  SS-->>AP: success / retryable / fatal
  AP->>HA: la_log_commit (committed_rep_lsa)

The pipeline holds two interleaved orderings. LSA order is enforced on the master at attach time by the prior-LSA mutex — the emitted LOG_REPLICATION_DATA records and the LOG_COMMIT are strictly monotonic. Apply order on the slave is enforced by the per-trid buffer plus the commit queue — events are buffered as they appear, but applied only when LOG_COMMIT is reached, in the order in which LOG_COMMIT records arrive. The two orderings agree because the slave walks the log forward in LSA order and only one commit’s events fan out at a time.

Anchor on symbol names, not line numbers.

  • LOG_REPL_RECORD (replication.h) — staging entry per row mutation.
  • LOG_REPL_FLUSH enum (replication.h) — DONT_NEED_FLUSH = -1, COMMIT_NEED_FLUSH = 0, NEED_FLUSH = 1.
  • REPL_INFO_TYPE enum (replication.h) — SBR, RBR_START, RBR_NORMAL, RBR_END.
  • LOG_TDES::repl_records / num_repl_records / cur_repl_record / append_repl_recidx / fl_mark_repl_recidx / repl_insert_lsa / repl_update_lsa / must_flush (log_impl.h).
  • REPL_LOG_INFO_ALLOC_SIZE, REPL_LOG_IS_NOT_EXISTS, REPL_LOG_IS_FULL (replication.c).
  • repl_log_insert (replication.c) — append a LOG_REPL_RECORD to tdes->repl_records[].
  • repl_log_insert_statement (replication.c) — statement-based emission for DDL / replicated session statements.
  • repl_add_update_lsa (replication.c) — back-patch repl_rec->lsa after the heap log for UPDATE.
  • repl_log_info_alloc (replication.c) — initial alloc + grow-by-100 realloc.
  • repl_start_flush_mark / repl_end_flush_mark (replication.c) — bracket DDL emissions that must flush even on rollback.
  • repl_log_abort_after_lsa (replication.c) — drop staged records past a savepoint LSA.
  • locator_attribute_info_force / locator_insert_force / locator_update_force / locator_delete_force (locator_sr.c).
  • locator_add_or_remove_index (locator_sr.c) — INSERT path, calls btree_insert.
  • locator_update_index (locator_sr.c) — UPDATE path, calls btree_update.
  • heap_insert_logical / heap_update_logical / heap_delete_logical / heap_log_insert_physical / heap_log_update_physical (heap_file.c).
  • btree_update (btree.c) — the index-side update path inside which repl_log_insert is called.
  • log_append_repl_info_internal (log_manager.c) — convert staged records to prior-list nodes.
  • log_append_repl_info (log_manager.c) — public entry, no lock.
  • log_append_repl_info_with_lock (log_manager.c) — variant taken when caller already holds the prior mutex.
  • log_append_repl_info_and_commit_log (log_manager.c) — atomic emission of repl + commit.
  • log_commit / log_commit_local (log_manager.c) — top-level commit drivers.
  • LOG_REC_REPLICATION (log_record.hpp) — on-disk data header for LOG_REPLICATION_DATA / LOG_REPLICATION_STATEMENT.
  • xlogwr_get_log_pages (log_writer.c) — server entry for NET_SERVER_LOGWR_GET_LOG_PAGES.
  • logwr_pack_log_pages (log_writer.c) — packs a contiguous range.
  • xlog_send_log_pages_to_client (server-support side) — wire write.
  • logpb_copy_page_from_file / logpb_read_page_from_file (log_page_buffer.c) — physical fetch.
  • logpb_fetch_from_archive / logpb_get_guess_archive_num (log_page_buffer.c) — archive lookup.
  • logpb_arv_page_info_table — in-memory cache, updated on every archive create.
  • logwr_initialize (log_writer.c) — init Logwr_Gl, open active log.
  • logwr_copy_log_file (log_writer.c) — main loop: fetch → write → archive.
  • logwr_set_hdr_and_flush_info (log_writer.c) — header reconciliation after each batch.
  • logwr_writev_append_pages / logwr_flush_all_append_pages (log_writer.c) — write to slave-local active log.
  • logwr_archive_active_log (log_writer.c) — roll active to archive when full.
  • logwr_flush_header_page / logwr_flush_bgarv_header_page (log_writer.c) — header writeback.
  • logwr_to_physical_pageid (log_writer.c) — logical → physical page id.
  • logwr_check_page_checksum (log_writer.c) — per-page integrity check.
  • LA_INFO global (log_applier.c) — per-process state.
  • LA_APPLY / LA_ITEM / LA_COMMIT / LA_HA_APPLY_INFO / LA_CACHE_PB / LA_CACHE_BUFFER / LA_REPL_FILTER / LA_OVF_PAGE_LIST / LA_RECDES_POOL (log_applier.c) — internal types.
  • LA_RETRY_ON_ERROR mask (log_applier.h) — retryable apply errors.
  • REPL_FILTER_TYPE enum (log_applier.h) — NONE / INCLUDE_TBL / EXCLUDE_TBL.
  • LA_MAX_REPL_ITEMS (1000), LA_MAX_REPL_ITEM_WITHOUT_RELEASE_PB (50), LA_STATUS_BUSY / LA_STATUS_IDLE (log_applier.c).
  • la_log_fetch / la_log_fetch_from_archive (log_applier.c) — read a log page from active or archive.
  • la_get_page_buffer / la_release_page_buffer (log_applier.c) — page cache access with refcount.
  • la_init_cache_pb / la_init_cache_log_buffer (log_applier.c) — page cache init.
  • la_cache_buffer_replace / la_invalidate_page_buffer / la_decache_page_buffers (log_applier.c) — eviction.
  • la_init_recdes_pool / la_assign_recdes_from_pool (log_applier.c) — preallocated RECDES pool.
  • la_apply_log_file (log_applier.c) — daemon main, the entry from cubrid hb start.
  • la_init (log_applier.c) — init globals, allocate caches, spawn helper threads.
  • la_apply_pre (log_applier.c) — pre-flight: lock, fetch header, check duplicates.
  • la_change_state (log_applier.c) — slave-state change handler.
  • la_log_commit (log_applier.c) — checkpoint _db_ha_apply_info.
  • la_force_shutdown (log_applier.h) — external shutdown hook.
  • la_log_record_process (log_applier.c) — switch on lrec->type.
  • la_set_repl_log (log_applier.c) — buffer a LOG_REPLICATION_* record.
  • la_make_repl_item / la_add_repl_item (log_applier.c) — build LA_ITEM from a log page.
  • la_find_apply_list / la_add_apply_list (log_applier.c) — per-trid bucket lookup.
  • la_init_repl_lists (log_applier.c) — bucket array init / realloc.
  • la_add_node_into_la_commit_list / la_retrieve_eot_time (log_applier.c) — commit queue.
  • la_log_copy_fromlog (log_applier.c) — copy bytes across log-page boundaries.
  • la_apply_commit_list (log_applier.c) — drain LA_COMMIT queue, dispatch one trid per call.
  • la_apply_repl_log (log_applier.c) — per-item dispatch over LA_APPLY::head.
  • la_apply_insert_log / la_apply_update_log / la_apply_delete_log / la_apply_statement_log (log_applier.c) — the four per-kind appliers.
  • la_repl_add_object (log_applier.c) — common end-stage that calls into the slave server.
  • la_get_recdes (log_applier.c) — five-case after-image reconstructor.
  • la_get_log_data (log_applier.c) — read+decompress one heap log.
  • la_get_overflow_recdes (log_applier.c) — BIGONE chain walker.
  • la_get_relocation_recdes (log_applier.c) — REC_RELOCATION → REC_NEWHOME chase.
  • la_get_next_update_log (log_applier.c) — REC_ASSIGN_ADDRESS deferred-update chase.
  • la_get_undoredo_diff / la_get_zipped_data (log_applier.c) — diff and zlib unzip.
  • la_make_room_for_mvcc_insid / la_make_room_for_mvcc_delid_and_prev_ver (log_applier.c) — MVCC header injection on the slave-applied row.
  • la_disk_to_obj (log_applier.c) — RECDES → DB_OTMPL conversion.
  • la_need_filter_out / la_create_repl_filter / la_print_repl_filter_info (log_applier.c) — table-level filter.
  • la_init_ha_apply_info (log_applier.c) — zero-fill a LA_HA_APPLY_INFO.
  • la_get_ha_apply_info (log_applier.c) — read _db_ha_apply_info.
  • la_insert_ha_apply_info / la_update_ha_last_applied_info / la_update_ha_apply_info_start_time / la_update_ha_apply_info_log_record_time (log_applier.c) — write back.
  • la_delete_ha_apply_info (log_applier.c) — cleanup on full reset.
  • la_get_last_ha_applied_info (log_applier.c) — restart bookmark.
  • la_find_required_lsa (log_applier.c) — minimal-needed-LSA computation.
  • la_remove_archive_logs (log_applier.c) — slave-local archive trimming after apply.
  • util_service.c — cubrid hb start / cubrid heartbeat start wires cub_master plus per-host copylogdb and applylogdb.
  • commdb.c — operator-side activation of HA via cub_commdb.
  • connection/heartbeat.c — process-side hb_register_to_master is called from copylogdb / applylogdb startup so cub_master knows they are alive.
Symbol — file:line

LOG_REPLICATION_DATA — log_record.hpp:116
LOG_REPLICATION_STATEMENT — log_record.hpp:117
RVREPL_DATA_INSERT/UPDATE/DELETE/STATEMENT — recovery.h:149-154
LOG_REPL_RECORD (struct log_repl) — replication.h:78
LOG_REPL_FLUSH enum — replication.h:70
REPL_INFO_TYPE enum — replication.h:43
LOG_TDES::repl_records group — log_impl.h:522-528
REPL_LOG_INFO_ALLOC_SIZE — replication.c:49
repl_log_info_alloc — replication.c:165
repl_add_update_lsa — replication.c:229
repl_log_insert — replication.c:293
repl_log_insert_statement — replication.c:512
repl_start_flush_mark — replication.c:606
repl_end_flush_mark — replication.c:635
repl_log_abort_after_lsa — replication.c:673
log_append_repl_info_internal — log_manager.c:4555
log_append_repl_info — log_manager.c:4623
log_append_repl_info_with_lock — log_manager.c:4629
log_append_repl_info_and_commit_log — log_manager.c:4647
locator_insert_force — locator_sr.c:4938
locator_update_force — locator_sr.c:5396
locator_delete_force — locator_sr.c:6116
locator_attribute_info_force — locator_sr.c:7461
locator_add_or_remove_index — locator_sr.c:7695
locator_update_index — locator_sr.c:8260
xlogwr_get_log_pages — log_writer.c:2571
logwr_initialize — log_writer.c:428
logwr_set_hdr_and_flush_info — log_writer.c:639
logwr_writev_append_pages — log_writer.c:838
logwr_flush_all_append_pages — log_writer.c:1016
logwr_flush_header_page — log_writer.c:1207
logwr_archive_active_log — log_writer.c:1275
logwr_write_log_pages — log_writer.c:1512
logwr_copy_log_file — log_writer.c:1659/1960
LA_RETRY_ON_ERROR (macro) — log_applier.h:34
REPL_FILTER_TYPE (enum) — log_applier.h:48
LA_CACHE_BUFFER/LA_CACHE_PB — log_applier.c:177-204
LA_REPL_FILTER — log_applier.c:206
LA_ITEM — log_applier.c:236
LA_APPLY — log_applier.c:254
LA_COMMIT — log_applier.c:266
LA_INFO — log_applier.c:279
LA_HA_APPLY_INFO — log_applier.c:393
la_init_ha_apply_info — log_applier.c:606
la_get_page_buffer — log_applier.c:1297
la_get_ha_apply_info — log_applier.c:1514
la_init_recdes_pool — log_applier.c:2416
la_init_cache_pb — log_applier.c:2474
la_init_cache_log_buffer — log_applier.c:2528
la_init_repl_lists — log_applier.c:2773
la_find_apply_list — log_applier.c:2860
la_log_copy_fromlog — log_applier.c:2960
la_add_repl_item — log_applier.c:3050
la_make_repl_item — log_applier.c:3092
la_set_repl_log — log_applier.c:3419
la_add_node_into_la_commit_list — log_applier.c:3473
la_get_log_data — log_applier.c:3949
la_get_overflow_recdes — log_applier.c:4249
la_get_next_update_log — log_applier.c:4393
la_get_relocation_recdes — log_applier.c:4552
la_get_recdes — log_applier.c:4604
la_repl_add_object — log_applier.c:4882
la_apply_delete_log — log_applier.c:5000
la_apply_update_log — log_applier.c:5110
la_apply_insert_log — log_applier.c:5311
la_apply_statement_log — log_applier.c:5496
la_apply_repl_log — log_applier.c:5739
la_apply_commit_list — log_applier.c:5920
la_log_record_process — log_applier.c:6101
la_change_state — log_applier.c:6397
la_log_commit — log_applier.c:6531
la_init — log_applier.c:6917
la_apply_log_file — log_applier.c:8074

The raw deck (HA replication.pdf / .pptx) was authored against an earlier branch. Most of what it shows is still accurate against the 11.5.x source under /data/hgryoo/references/cubrid — verifying each major claim against that source was straightforward. The drift points are recorded here.

  • repl_log_insert signature is unchanged. The deck shows tdes->repl_records[ ], tdes->num_repl_records, tdes->cur_repl_record, default size = 100, tdes->must_flush = LOG_REPL_NEED_FLUSH. All five names are present in log_impl.h:522-528 and replication.c:293. The only refinement modern code adds is tdes->fl_mark_repl_recidx (log_impl.h:525) and the RVREPL_DATA_UPDATE_START / RVREPL_DATA_UPDATE_END sub-kinds (recovery.h:152-154) plus the tde_encrypted field on the struct itself (replication.h:88). The deck’s LOG_REPL_RECORD enumeration predates these additions.

  • The LOG_REPL_RECORD::repl_data layout in the deck — | pkey size | class_name | pkey dbvalue | — matches current source. Verified at replication.c:411-419: the function reserves a leading OR_INT_SIZE for packed_key_value_size, then or-packs class_name, then or-packs the pkey DB_VALUE, then back-fills the leading int with the actual packed-key byte length.

  • Commit-time emission goes through log_append_repl_info_*. The deck’s Log_commit() → Log_commit_local() → Log_append_repl_info_and_commit_log() → Log_append_repl_info() → Log_append_repl_info_with_lock() → Log_append_repl_info_internal() sequence matches the current source’s log_commit → log_append_repl_info_and_commit_log → log_append_repl_info_with_lock → log_append_repl_info_internal. The deck splits the with-lock and without-lock variants at the caller level; the source does the same.

  • The atomic repl_info + commit_log idiom. The deck does not call out the atomicity but the current source documents it explicitly (log_manager.c:4642-4645): “Atomic write of replication log and commit log is crucial for replication consistencies. When a commit log of others is written in the middle of one’s replication and commit log, a restart of replication will break consistencies of slaves/replicas.”

  • copylogdb request format matches. The deck describes the request as a (ctx_ptr->last_error, mode, First_pageid_torecv) tuple sent under NET_SERVER_LOGWR_GET_LOG_PAGES. The current source has xlogwr_get_log_pages at log_writer.c:2571 taking (THREAD_ENTRY*, LOG_PAGEID first_pageid, LOGWR_MODE mode) and the slave-side logwr_copy_log_file at log_writer.c:1659/1960 driving the request loop. The buffer-size constant LOGWR_COPY_LOG_BUFFER_NPAGES = 128 is unchanged.

  • applylogdb per-record dispatch in la_log_record_process matches the deck. Verified at log_applier.c:6101. The switch arms — LOG_END_OF_LOG, LOG_REPLICATION_DATA / LOG_REPLICATION_STATEMENT, LOG_SYSOP_END / LOG_COMMIT, LOG_ABORT, LOG_DUMMY_CRASH_RECOVERY, LOG_END_CHKPT, LOG_DUMMY_HA_SERVER_STATE — are all present. The deck lists fewer arms because it focuses on the REPL + COMMIT path.

  • la_apply_repl_log dispatch table is the same shape. Verified at log_applier.c:5797-5826. The deck shows RVREPL_DATA_INSERT → la_apply_insert_log, RVREPL_DATA_UPDATE → la_apply_update_log, RVREPL_DATA_DELETE → la_apply_delete_log. The current source also dispatches RVREPL_DATA_UPDATE_START and RVREPL_DATA_UPDATE_END to la_apply_update_log (the START/END brackets on row-based-replication boundaries). The deck does not mention these sub-kinds.

  • The five-case la_get_recdes matches. Verified at log_applier.c:4604+. Cases 1-5 — normal, RVOVF_CHANGE_LINK, REC_BIGONE, RVHF_INSERT + REC_ASSIGN_ADDRESS, and (RVHF_UPDATE | RVHF_UPDATE_NOTIFY_VACUUM) + REC_RELOCATION — are all present.

  • LA_RETRY_ON_ERROR is now broader. The deck does not enumerate the error mask. Current source (log_applier.h:34-46) lists ER_LK_UNILATERALLY_ABORTED, three flavors of ER_LK_OBJECT_TIMEOUT, ER_LK_PAGE_TIMEOUT, two flavors of ER_PAGE_LATCH_*, three flavors of ER_LK_OBJECT_DL_TIMEOUT, ER_TDE_CIPHER_IS_NOT_LOADED, and ER_LK_DEADLOCK_CYCLE_DETECTED — twelve codes total. TDE is the most recent addition.

  • REPL_FILTER_TYPE and table-level filtering. The deck does not show the filter; current source has it in log_applier.h:48-53 (NONE, INCLUDE_TBL, EXCLUDE_TBL) with LA_REPL_FILTER consumed by la_need_filter_out inside la_apply_repl_log (log_applier.c:5797). The filter is evaluated per item on the slave at apply time, not on the master at emission time. This means filtered events still cost a la_get_recdes walk on the slave; only the final la_repl_add_object is skipped.

  • MVCC injection on the slave. The deck does not cover this. la_make_room_for_mvcc_insid and la_make_room_for_mvcc_delid_and_prev_ver (log_applier.c, declared near 503-504) reserve space in the reconstructed RECDES so the slave server’s MVCC layer can stamp the slave’s own MVCCID at apply time — the master’s MVCC IDs are not copied; the slave generates fresh ones.

  • la_log_record_process handles LOG_DUMMY_HA_SERVER_STATE for role-change detection. Verified at log_applier.c:6292+. When ha_server_state->state is not HA_SERVER_STATE_ACTIVE and not HA_SERVER_STATE_TO_BE_STANDBY, the daemon sets is_role_changed = true and returns ER_INTERRUPTED so the caller can shut the daemon down cleanly. The deck does not cover this path but it is the mechanism by which a master-to-slave demotion (driven by cub_master heartbeat failover, see cubrid-heartbeat.md) propagates into the applier.

  • is_long_trans overflow handling is unchanged. Verified at log_applier.c:3437-3443: when apply->num_items >= LA_MAX_REPL_ITEMS (1000), the daemon frees all items except the head, sets is_long_trans = true, and starts tracking only last_lsa. Apply for such a trid will re-walk the log between start_lsa and last_lsa. The deck does not surface this but the constants and branch are present in current source.

  • TDE on the master-side staging entry. The deck does not cover TDE. Current LOG_REPL_RECORD::tde_encrypted (replication.h:88) is set in repl_log_insert based on heap_get_class_tde_algorithm; on the prior-list emission side, prior_set_tde_encrypted is called when the flag is true (log_manager.c:4585-4592). On the slave, the daemon’s la_load_tde (and logwr_load_tde on the copy side) handles the symmetric decrypt.

  1. Synchronous replication mode. The model section calls out sync vs. async; the walkthrough above covers only the async path. LOGWR_MODE (passed to logwr_copy_log_file and xlogwr_get_log_pages) is enumerated, and the deck shows mode in the request. Are there sync values, or only async? Investigation: read the LOGWR_MODE enum and grep for non-LOGWR_MODE_ASYNC writers.

  2. fl_mark_repl_recidx semantics for DDL. The repl_start_flush_mark / repl_end_flush_mark pair sets fl_mark_repl_recidx to bracket DDL records. The intent is that records inside the bracket carry must_flush = LOG_REPL_NEED_FLUSH so they emit even on rollback (DDL is non-transactional in CUBRID for replication purposes). What guarantees this is robust against nested DDL and partial rollback? Investigation path: read the must_flush writers and their interaction with repl_log_abort_after_lsa.

  3. The repl_lists[] bucket size. LA_INFO::repl_cnt is the number of buckets; the deck does not specify how it is sized, and la_init_repl_lists shows a realloc-on-demand pattern. What is the initial cap and what triggers regrow? Investigation path: read la_init_repl_lists (log_applier.c:2773) and la_add_apply_list.

  4. Long-transaction re-walk performance. When a trid trips is_long_trans, la_get_next_repl_item_from_log walks the log forward looking for the next REPL record for the same trid. The cost is O(records-since-start-lsa) per item. What limits this from becoming O(N²) for huge transactions? Investigation path: read la_get_next_repl_item_from_log and measure on a synthetic million-row update.

  5. TDE key sharing between copylogdb and applylogdb. The UNSTABLE_TDE_FOR_REPLICATION_LOG guard in log_applier.c (lines 350-352) shows a unix-socket protocol between copylogdb and the apply side for sharing TDE data keys. The “unstable” name suggests this is not production. Is TDE + replication actually supported, or is it an internal-only feature flag? Investigation path: search for the symbol in release notes and CMake feature flags.

  6. logpb_get_guess_archive_num worst-case behavior. When the logpb_arv_page_info_table cache is cold, the master’s archive lookup falls back to estimating + scanning. On a node with thousands of archives, what is the worst-case latency? Investigation path: read the function body and measure with a pre-built archive set.

  7. The _db_ha_apply_info row’s recovery semantics. On slave-server crash mid-apply, the row holds the last acknowledged committed_rep_lsa; on restart the daemon re-walks from that point. But la_log_commit updates the row transactionally on the slave server, and that server’s own recovery may roll the row back. What happens if the slave server crashes between la_repl_add_object and la_log_commit? Are records re-applied (and idempotent against PK uniqueness), or is there a separate per-item ack? Investigation path: read la_log_commit (log_applier.c:6531) and trace the slave server’s recovery interaction.

  8. Statement-based replication and non-determinism. The LOG_REPLICATION_STATEMENT path replays SQL text via la_apply_statement_log. CUBRID does not block non-deterministic functions (NOW(), RAND()) at master emission time. What prevents drift between master and slave on such statements? Investigation path: read la_apply_statement_log (log_applier.c:5496) and check for pre-bound parameter substitution.

  9. Filter race on consumer reconfigure. The LA_REPL_FILTER is loaded by la_create_repl_filter at daemon start and consulted by la_need_filter_out per item. If an operator changes the filter list while the daemon is running, when does the new filter take effect? Investigation path: read la_create_repl_filter and check for SIGHUP handlers.

  10. Replica vs. slave distinction. The heartbeat module distinguishes HB_NSTATE_SLAVE from HB_NSTATE_REPLICA; both run applylogdb, but a replica can never become master. Does the apply path differ between slave and replica, or are the two roles purely about the cluster-side FSM? Investigation path: cross-reference cubrid-heartbeat.md’s HB_NSTATE_REPLICA with apply-side branches (la_check_replica_info, etc.).

  • raw/code-analysis/cubrid/distributed/HA replication.pdf — the PDF render of the deck.
  • raw/code-analysis/cubrid/distributed/HA replication.pptx — the source slide deck.
  • raw/code-analysis/cubrid/distributed/_converted/ha-replication.pdf.md — pdftotext extract of the PDF.
  • raw/code-analysis/cubrid/distributed/_converted/ha-replication.pptx.md — markitdown extract of the PPTX.
  • knowledge/code-analysis/cubrid/cubrid-log-manager.md — the WAL machinery the master emits into and the slave’s applylogdb walks.
  • knowledge/code-analysis/cubrid/cubrid-cdc.md — the modern pull-style alternative; shares log_record.hpp types and the la_apply_* legacy code path.
  • knowledge/code-analysis/cubrid/cubrid-heartbeat.md — the cub_master cluster FSM that supervises copylogdb and applylogdb and triggers the role changes the apply daemon detects via LOG_DUMMY_HA_SERVER_STATE.
  • knowledge/code-analysis/cubrid/cubrid-recovery-manager.md — the master-side analysis/redo/undo passes share record decoding and log_reader infrastructure with applylogdb.
  • Designing Data-Intensive Applications (Kleppmann), Ch. 5 “Replication” — primary/standby, sync vs async, statement vs row vs WAL shipping.
  • Database Internals (Petrov), Ch. 13 “Replication” — leader-follower, fail-over and consistency guarantees, log shipping.
  • Database System Concepts (Silberschatz, Korth, Sudarshan), Ch. 19 “Recovery System” + Ch. 23 “Distributed Databases” — replication consistency models, distributed commit, recovery on replicas.

CUBRID source (/data/hgryoo/references/cubrid/)

  • src/transaction/replication.c / replication.h — the master-side staging primitives.
  • src/transaction/log_manager.c / log_manager.h — the log_append_repl_info_* family.
  • src/transaction/log_record.hpp — the LOG_REPLICATION_DATA and LOG_REPLICATION_STATEMENT record-type enum entries.
  • src/transaction/recovery.h — the RVREPL_DATA_* recovery indices.
  • src/transaction/log_impl.h — the LOG_TDES replication fields.
  • src/transaction/log_writer.c / log_writer.h — the master-side xlogwr_* server endpoint and the slave-side logwr_* daemon (i.e., copylogdb).
  • src/transaction/log_applier.c / log_applier.h — the slave-side la_* daemon (i.e., applylogdb).
  • src/storage/heap_file.c — the heap_*_logical / heap_log_*_physical emission sites.
  • src/storage/btree.c — the btree_update / btree_insert index side, where repl_log_insert is called for index ops.
  • src/transaction/locator_sr.c — locator_*_force and locator_attribute_info_force, the upstream entry points that drive both the heap log and the replication staging.
  • src/executables/util_service.c — cubrid hb start / cubrid heartbeat start wires cub_master, copylogdb, and applylogdb together.
  • src/connection/heartbeat.c — the process-side hb_register_to_master invoked by both daemons on startup.