CUBRID HA Replication — Logical-Log Based Master/Slave Replication via copylogdb and applylogdb
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
Replication in a relational database engine is, at the abstraction level, the engineering problem of keeping two on-disk databases equivalent under a query workload without forcing the writers on both nodes to coordinate per-statement. The textbook framing (Kleppmann, Designing Data-Intensive Applications, ch. 5 “Replication”; Petrov, Database Internals, ch. 13 “Replication”) splits the design space along three axes that every real engine — including CUBRID — has to choose a coordinate on.
The first axis is what gets shipped. Physical replication ships the engine’s own WAL records — page-level redo records, in their on-disk layout. The slave is an exact byte-equivalent of the master (same page layout, same heap free-space distribution, same B-tree page splits at the same LSAs). PostgreSQL streaming replication, MySQL InnoDB redo log replication, and Oracle’s physical standby use this model. Logical replication ships statements or row events: “INSERT INTO t VALUES (…)”, or “row with PK=X in table T was updated; before-image=A, after-image=B”. The slave is equivalent but not byte-identical — it might choose different physical layouts. MySQL row-based binlog, PostgreSQL logical replication slots, and Oracle GoldenGate use this model.
The trade-off is symmetric. Physical replication is cheap to produce (the WAL already exists; no extra emission cost) and fast to apply (memcpy onto a page), but it requires the slave to be byte-compatible with the master, which precludes schema divergence, mixed-version clusters, or partial replication (replicating a subset of tables). Logical replication is more expensive to produce (the engine must extract row images and metadata at DML time) and slower to apply (each row event re-runs the SQL execution path), but the slave is decoupled from the master’s physical layout, mixed versions and table-level filtering become possible, and the wire format is portable across engines that agree on it.
The second axis is statement-level versus row-level. Inside
the logical camp, an event can be a SQL statement
(“UPDATE t SET c = c+1 WHERE x > 10”) or a row image
(“row OID=O, before=A, after=B”). Statement-level events are
compact but non-deterministic: a NOW() or RAND() call must
evaluate the same way on both nodes; an ordering-sensitive query
without an ORDER BY may apply rows in different orders and produce
different results. Row-level events are larger but deterministic
because the slave applies a row image, not a query plan. Most
modern engines either default to row-level (MySQL post-5.7,
PostgreSQL logical replication) or expose a hybrid (CUBRID emits
both — LOG_REPLICATION_DATA for row events,
LOG_REPLICATION_STATEMENT for DDL and trigger-bound statements).
The third axis is synchronous versus asynchronous. Synchronous replication holds the master’s commit until the slave has acknowledged the commit’s records, providing zero-data-loss failover at the cost of master commit latency rising to slave round-trip plus apply time. Asynchronous replication lets the master commit immediately while the slave catches up at its own pace; a master crash before the next batch ships loses the in-flight window. This is eventual consistency in the textbook sense: the slave converges to the master’s state given enough time and no further master writes. CUBRID’s HA replication is asynchronous: the slave’s apply is decoupled from the master’s commit, and the only contract is that the slave’s apply order matches the master’s commit order.
Once these three axes are named, every CUBRID-specific structure in this document is implementing a coordinate choice on one of them, or making the resulting state machine durable.
Common DBMS Design
Below the textbook abstraction, every primary/standby DBMS that ships a logical replication path — MySQL row-based binlog, PostgreSQL logical decoding, Oracle GoldenGate, CUBRID HA — reaches for the same handful of patterns. They are not in the original replication chapters; they are the engineering vocabulary that lives between the model and the source.
Auxiliary log records emitted at DML time
The master’s regular WAL is physiological: a LOG_UNDOREDO_DATA
record carries the page id, slot id, before-image and after-image
for a single page mutation. That record is enough for crash recovery
on the master itself, but a slave applier cannot decode it without
also knowing the master’s catalog state at the time of the write —
which class id maps to which table name, which index on which
column, which heap layout corresponds to the slot. The remedy is
universal: emit a second record at DML time that carries the
catalog-resolved logical view (table name, primary-key column,
primary-key value, operation kind). MySQL row-based binlog, the
PostgreSQL pgoutput plugin, and CUBRID’s LOG_REPLICATION_DATA
are all the same idea — a redundant record sitting next to the
physiological WAL, paying the bandwidth cost up front so the slave
can apply without consulting the master’s catalog.
Master-side staging in the transaction descriptor
The DML operation generates one or more replication records, but
the records are not appended to the WAL stream until commit. They
have to live somewhere in the meantime. The standard pattern is a
per-transaction array hanging off the transaction descriptor
(tdes in CUBRID, binlog_cache in MySQL, ReorderBuffer entries
in PostgreSQL). On commit, the array is walked and each entry is
turned into a real WAL record, atomically with the commit record
itself; on abort, the array is discarded. CUBRID’s tdes->repl_records
is exactly this — an array of LOG_REPL_RECORD staged until commit.
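The staging pattern above can be sketched in a few lines of C. This is a minimal illustration, not CUBRID’s code: `txn_desc`, `staged_event`, `wal_sink`, and the growth step of 4 are all hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical staged event; a real engine would store table and key data. */
typedef struct { int event_id; } staged_event;

/* Hypothetical transaction descriptor with a growable staging array. */
typedef struct
{
  staged_event *events;
  int used;
  int capacity;
} txn_desc;

/* Hypothetical stand-in for the WAL: it only counts what was appended. */
typedef struct { int repl_records; int commit_records; } wal_sink;

enum { STAGING_GROW_STEP = 4 };	/* illustrative growth step */

/* Stage one event on the descriptor; nothing reaches the WAL yet. */
static int
txn_stage (txn_desc *t, int event_id)
{
  assert (t != NULL);
  if (t->used == t->capacity)
    {
      int new_cap = t->capacity + STAGING_GROW_STEP;
      staged_event *p = realloc (t->events, (size_t) new_cap * sizeof *p);
      if (p == NULL)
	return -1;
      t->events = p;
      t->capacity = new_cap;
    }
  t->events[t->used++].event_id = event_id;
  return 0;
}

/* Drop the staging array and return the descriptor to its empty state. */
static void
txn_reset (txn_desc *t)
{
  free (t->events);
  t->events = NULL;
  t->used = 0;
  t->capacity = 0;
}

/* Commit: every staged entry becomes a WAL record, then the commit record. */
static void
txn_commit (txn_desc *t, wal_sink *w)
{
  for (int i = 0; i < t->used; i++)
    w->repl_records++;
  w->commit_records++;
  txn_reset (t);
}

/* Abort: staged entries are discarded and the WAL never sees them. */
static void
txn_abort (txn_desc *t)
{
  txn_reset (t);
}
```

The point of the design is that an aborted transaction costs the log nothing: its events only ever lived in memory owned by the descriptor.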
Atomic emission of replication and commit records
The slave’s apply algorithm walks the log forward and reacts to
LOG_COMMIT by flushing all replication records seen for that
transaction. The interleaving rule the slave depends on is strict:
between transaction T’s last replication record and its commit
record, no other transaction’s commit record may appear. If a peer
transaction’s commit slipped in, a slave that crashed and restarted
between the two records would incorrectly mark T as committed
without applying its records. The fix is to emit T’s queued
replication records and its commit record under a single hold of
the prior-LSA mutex. CUBRID’s log_append_repl_info_and_commit_log
is precisely this idiom — lock, append all repl records, append the
commit record, unlock.
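The lock, append, append, unlock idiom can be sketched with a POSIX mutex. This is a hedged illustration only: `shared_log`, `commit_atomically`, and `log_is_well_formed` are hypothetical names, and the checker enforces a slightly stricter property than the rule above (each transaction’s group is fully contiguous), which is what a single mutex hold produces in this toy model.

```c
#include <assert.h>
#include <pthread.h>

#define LOG_CAP 64

typedef enum { REC_REPL, REC_COMMIT } rec_kind;
typedef struct { rec_kind kind; int trid; } log_rec;

/* Hypothetical shared log tail protected by one mutex. */
typedef struct
{
  pthread_mutex_t mutex;
  log_rec recs[LOG_CAP];
  int len;
} shared_log;

static void
shared_log_init (shared_log *l)
{
  pthread_mutex_init (&l->mutex, NULL);
  l->len = 0;
}

/* Append one record; the caller must already hold l->mutex. */
static void
append_locked (shared_log *l, rec_kind kind, int trid)
{
  assert (l->len < LOG_CAP);
  l->recs[l->len].kind = kind;
  l->recs[l->len].trid = trid;
  l->len++;
}

/* Emit n_repl replication records and the commit record under ONE hold,
   so no other committer can interleave between them. */
static void
commit_atomically (shared_log *l, int trid, int n_repl)
{
  pthread_mutex_lock (&l->mutex);
  for (int i = 0; i < n_repl; i++)
    append_locked (l, REC_REPL, trid);
  append_locked (l, REC_COMMIT, trid);
  pthread_mutex_unlock (&l->mutex);
}

/* Returns 1 if every group of records ends with a commit of the same trid
   and no foreign record appears inside a group; 0 otherwise.
   Assumes trids are non-negative. */
static int
log_is_well_formed (const shared_log *l)
{
  int group_trid = -1;		/* -1 means no group is open */
  for (int i = 0; i < l->len; i++)
    {
      const log_rec *r = &l->recs[i];
      if (group_trid != -1 && r->trid != group_trid)
	return 0;
      group_trid = (r->kind == REC_COMMIT) ? -1 : r->trid;
    }
  return group_trid == -1;
}
```

If the two appends were made under separate lock acquisitions, a peer’s commit record could land between them, which is exactly the case `log_is_well_formed` rejects.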
Slave-side push or pull, and where the daemon lives
The slave’s log fetch is a separate concern from the slave’s log
apply. Pulling the log can be done by a thread on the slave server
itself (PostgreSQL’s walreceiver, MySQL’s replication I/O thread), or
by an out-of-process daemon (CUBRID’s copylogdb).
The apply side is similarly separable. Splitting them is universal
because their failure modes are independent: a slow apply must not
stop the slave from receiving new log, or the slave will fall
arbitrarily behind and the master’s archive will eventually be
deleted out from under it.
Forward log walking with a position cursor and durable bookmark
The applier carries an LSA cursor; it advances the cursor only after
an event has been applied and acknowledged downstream. On restart
it reads the cursor from a persistent location (a system table on
the slave, a file in the log directory, an entry in a control
database). CUBRID’s _db_ha_apply_info system table is exactly
this — committed_lsa, committed_rep_lsa, required_lsa,
maintained by la_log_commit so a daemon restart picks up where
the previous run left off.
Per-transaction buffering until commit on the slave
Logical events are emitted in interleaved order (T1’s INSERT, T2’s
UPDATE, T1’s INSERT, T2’s COMMIT, T1’s COMMIT), but the slave wants
them in commit order, transaction at a time, all of T2 then all of
T1. The applier solves this with a per-trid hash of pending events.
On LOG_COMMIT it walks the trid’s bucket and dispatches in order;
on LOG_ABORT it discards the bucket. CUBRID’s la_Info.repl_lists[]
(an array of LA_APPLY per transaction) plus la_Info.commit_head
(a queue of LA_COMMIT for committed transactions) implement
this protocol.
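The reordering from log order into commit order can be sketched as follows. This is an illustrative model under simplifying assumptions: `applier`, `log_event`, and the fixed-size arrays indexed directly by trid are hypothetical, whereas a real applier hashes trids and applies rows through the server rather than into an array.

```c
#include <assert.h>
#include <string.h>

#define MAX_TRIDS 8
#define MAX_EVENTS 16

typedef enum { EV_DATA, EV_COMMIT, EV_ABORT } ev_kind;
typedef struct { ev_kind kind; int trid; int payload; } log_event;

/* Hypothetical applier state: one pending bucket per trid, plus the
   sequence of payloads in the order they were actually applied. */
typedef struct
{
  int pending[MAX_TRIDS][MAX_EVENTS];
  int n_pending[MAX_TRIDS];
  int applied[MAX_TRIDS * MAX_EVENTS];
  int n_applied;
} applier;

static void
applier_init (applier *a)
{
  memset (a, 0, sizeof *a);
}

/* Feed one event in log order. Data is buffered; commit dispatches the
   whole bucket in order; abort discards the bucket. */
static void
applier_feed (applier *a, log_event ev)
{
  assert (ev.trid >= 0 && ev.trid < MAX_TRIDS);
  switch (ev.kind)
    {
    case EV_DATA:
      assert (a->n_pending[ev.trid] < MAX_EVENTS);
      a->pending[ev.trid][a->n_pending[ev.trid]++] = ev.payload;
      break;
    case EV_COMMIT:
      for (int i = 0; i < a->n_pending[ev.trid]; i++)
	{
	  assert (a->n_applied < MAX_TRIDS * MAX_EVENTS);
	  a->applied[a->n_applied++] = a->pending[ev.trid][i];
	}
      a->n_pending[ev.trid] = 0;
      break;
    case EV_ABORT:
      a->n_pending[ev.trid] = 0;
      break;
    }
}
```

Feeding the interleaving from the paragraph above (T1 data, T2 data, T1 data, T2 commit, T1 commit) leaves `applied` holding T2’s event first and then both of T1’s, which is the commit order the slave wants.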
Theory ↔ CUBRID mapping
| Theoretical concept | CUBRID name |
|---|---|
| Auxiliary logical-event log record | LOG_REPLICATION_DATA = 39 and LOG_REPLICATION_STATEMENT = 40 (log_record.hpp:116-117) |
| Per-record kind (recovery index) | RVREPL_DATA_INSERT/UPDATE/DELETE/STATEMENT/UPDATE_START/UPDATE_END (recovery.h:149-154) |
| Master-side per-tran staging entry | LOG_REPL_RECORD (replication.h:78) with repl_type, rcvindex, inst_oid, lsa, repl_data, length, must_flush, tde_encrypted |
| Per-tran array on the descriptor | LOG_TDES::repl_records[], num_repl_records, cur_repl_record, fl_mark_repl_recidx (log_impl.h:522-526) |
| Update-LSA back-patch | LOG_TDES::repl_insert_lsa, repl_update_lsa (log_impl.h:527-528); repl_add_update_lsa (replication.c:229) |
| Insert into staging | repl_log_insert (replication.c:293) |
| Statement-level emission | repl_log_insert_statement (replication.c:512) |
| Flush mark for system DDL | repl_start_flush_mark (replication.c:606), repl_end_flush_mark (replication.c:635) |
| Master-side commit-time emission | log_append_repl_info_internal (log_manager.c:4555), log_append_repl_info_and_commit_log (log_manager.c:4647) |
| Atomic repl + commit emission | log_append_repl_info_and_commit_log holds prior_lsa_mutex across both appends |
| Server side of copy protocol | xlogwr_get_log_pages (log_writer.c:2571) |
| Slave side daemon (copylogdb) | logwr_copy_log_file (log_writer.c:1659/1960); writes via logwr_flush_all_append_pages (1016) and logwr_archive_active_log (1275) |
| Slave side daemon (applylogdb) | la_apply_log_file (log_applier.c:8074) |
| Slave per-record dispatch | la_log_record_process (log_applier.c:6101) |
| Slave per-trid pending list | LA_INFO::repl_lists[] of LA_APPLY (log_applier.c:255-264, 298) |
| Slave commit queue | LA_INFO::commit_head / commit_tail of LA_COMMIT (log_applier.c:266-276, 304-305) |
| Slave dispatch fan-out | la_apply_repl_log switching on item->item_type to la_apply_insert/update/delete/statement_log |
| Slave durable bookmark | LA_HA_APPLY_INFO row in _db_ha_apply_info (log_applier.c:393) |
| Slave retryable-error mask | LA_RETRY_ON_ERROR (log_applier.h:34) |
| Slave table-level filter | REPL_FILTER_TYPE and LA_REPL_FILTER (log_applier.h:48, log_applier.c:206) |
CUBRID’s Approach
CUBRID HA replication has four moving parts: the master-side
emission path that, during DML, puts one LOG_REPL_RECORD into
the transaction descriptor for every catalog-visible row mutation;
the master-side flush path that, on commit, drains those staged
records into the log stream as LOG_REPLICATION_DATA /
LOG_REPLICATION_STATEMENT log records atomically with the commit
record; copylogdb, a client-mode daemon running on the slave
host that pulls active and archive log volumes from the master via
a single net request and writes them to local storage; and
applylogdb, another client-mode daemon that walks the local
log volumes forward, dispatches per record type, and replays the
DML through the slave server’s regular client API. We walk them
in that order.
Overall structure
```mermaid
flowchart LR
  subgraph M["Master cub_server"]
    DML["DML transaction\n(qexec_execute_∗)"]
    LOC["locator_∗_force\nlocator_attribute_info_force"]
    HEAP["heap_∗_logical\nbtree_update"]
    REPL["repl_log_insert\nrepl_add_update_lsa"]
    TDES["tdes->repl_records[]\n(LOG_REPL_RECORD)"]
    COMMIT["log_commit ->\nlog_append_repl_info_and_commit_log"]
    PRIOR["prior_lsa list +\nlog_append_repl\n(LOG_REPLICATION_DATA)"]
    LGAT["active log\n(lgat) +\narchive volumes"]
    XLW["xlogwr_get_log_pages\n(NET_SERVER_LOGWR_GET_LOG_PAGES)"]
    DML --> LOC --> HEAP --> REPL --> TDES
    DML --> COMMIT --> PRIOR --> LGAT
    LGAT --> XLW
  end
  subgraph S["Slave host"]
    CLDB["copylogdb\nlogwr_copy_log_file"]
    SLOG["slave-side log volumes\n(active + archive)"]
    APPL["applylogdb\nla_apply_log_file"]
    REC["la_log_record_process\nper-record dispatch"]
    PEND["la_Info.repl_lists[]\nper-trid LA_APPLY"]
    CQ["la_Info.commit_head\nLA_COMMIT queue"]
    AP["la_apply_repl_log\nla_apply_insert/update/delete/statement_log"]
    SLAVE["slave cub_server\n(client-mode connection)"]
    HA["_db_ha_apply_info row\nLA_HA_APPLY_INFO"]
    XLW -- "log pages" --> CLDB --> SLOG --> APPL
    APPL --> REC
    REC -- "REPL records" --> PEND
    REC -- "COMMIT/ABORT" --> CQ
    CQ --> AP
    AP --> SLAVE
    AP --> HA
  end
```

The figure encodes four boundaries. (emit / flush) the master
populates tdes->repl_records[] during DML, but the actual
LOG_REPLICATION_DATA log record is only written at commit by
log_append_repl_info. (commit atomicity) log_append_repl_info_and_commit_log
takes the prior-LSA mutex once and holds it across the repl
appends and the commit append, guaranteeing no peer commit
slips between them. (copy / apply) copylogdb and
applylogdb are separate processes that share only the on-disk
log volumes — apply slowness cannot back-pressure the copy. (client-mode
slave server) the apply daemon connects to the slave’s
cub_server over the regular client/server protocol; it does
not write pages directly. This is what makes table-level filtering,
schema divergence tolerance, and per-row error retry possible.
Master side — LOG_REPL_RECORD and the staging array
A DML operation on the master is dispatched through the same execution path it would take on a non-HA server:

```
sqmgr_execute_query
  → xqmgr_execute_query
  → qmgr_process_query
  → qexec_execute_main_block / qexec_execute_mainblock_internal
  → qexec_execute_<insert|update|delete>
  → locator_attribute_info_force
  → locator_insert_force / locator_update_force / locator_delete_force
```

Inside locator_*_force, two side-effects happen for every row
operation: a physiological log record is emitted via
heap_insert_logical / heap_update_logical / heap_delete_logical
(or the index variants btree_update), and a replication record
is queued via repl_log_insert. The replication record does not go
into the WAL stream at this point — it is appended to a per-transaction
staging array on the transaction descriptor:

```c
// LOG_TDES replication fields — src/transaction/log_impl.h:522
int num_repl_records;		/* # of replication records (capacity) */
int cur_repl_record;		/* # of replication records used so far */
int append_repl_recidx;		/* cursor used at commit-time emission */
int fl_mark_repl_recidx;	/* index of flush-marked record (DDL) */
struct log_repl *repl_records;	/* the array */
LOG_LSA repl_insert_lsa;	/* insert-or-MVCC-update target LSA */
LOG_LSA repl_update_lsa;	/* in-place-update target LSA */
```

The staging entry is a LOG_REPL_RECORD:

```c
// LOG_REPL_RECORD — src/transaction/replication.h:78
typedef struct log_repl LOG_REPL_RECORD;
struct log_repl
{
  LOG_RECTYPE repl_type;	/* LOG_REPLICATION_DATA or LOG_REPLICATION_STATEMENT */
  LOG_RCVINDEX rcvindex;	/* RVREPL_DATA_INSERT / UPDATE / DELETE / UPDATE_START / UPDATE_END / STATEMENT */
  OID inst_oid;			/* OID of the row being changed */
  LOG_LSA lsa;			/* LSA of the related "real" log record (filled in later for UPDATE) */
  char *repl_data;		/* | pkey size | class_name | pkey dbvalue | */
  int length;			/* repl_data length */
  LOG_REPL_FLUSH must_flush;	/* DONT_NEED_FLUSH=-1, COMMIT_NEED_FLUSH=0, NEED_FLUSH=1 */
  bool tde_encrypted;		/* class is TDE-encrypted */
};
```

The repl_data payload is intentionally minimal: a 4-byte packed-key
length, an or-packed class name, and the or-packed primary-key
DB_VALUE. The slave does not need the full row image at this
point; it will re-fetch the row from the master’s heap log when it
applies the event. Keeping the staging entry small bounds the
per-transaction memory cost when a DML touches millions of rows.
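The three-part payload layout can be illustrated with a small pack/unpack pair. This sketch does not use CUBRID’s or_pack routines or its DB_VALUE encoding; it assumes a simplified layout (host byte order, NUL-terminated class name padded to 4 bytes, raw key bytes), and `pack_repl_payload`, `unpack_repl_payload`, and `align4` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of 4. */
static size_t
align4 (size_t n)
{
  return (n + 3u) & ~(size_t) 3u;
}

/* Simplified layout (an assumption for this sketch, host byte order):
   | int32 key_size | class name + NUL, zero-padded to 4 | key bytes |
   Returns a malloc'ed buffer the caller must free, or NULL. */
static char *
pack_repl_payload (const char *class_name, const void *key, int32_t key_size, size_t *out_len)
{
  assert (class_name != NULL && key != NULL && key_size >= 0);
  size_t name_len = align4 (strlen (class_name) + 1);
  size_t total = sizeof (int32_t) + name_len + (size_t) key_size;
  char *buf = calloc (1, total);	/* zero fill supplies NUL and padding */
  if (buf == NULL)
    return NULL;
  memcpy (buf, &key_size, sizeof key_size);
  memcpy (buf + sizeof (int32_t), class_name, strlen (class_name));
  memcpy (buf + sizeof (int32_t) + name_len, key, (size_t) key_size);
  *out_len = total;
  return buf;
}

/* Returns a pointer to the class name inside buf, copies the key bytes
   into key_out (which must be large enough) and its size into key_size. */
static const char *
unpack_repl_payload (const char *buf, void *key_out, int32_t *key_size)
{
  const char *name = buf + sizeof (int32_t);
  memcpy (key_size, buf, sizeof *key_size);
  memcpy (key_out, name + align4 (strlen (name) + 1), (size_t) *key_size);
  return name;
}
```

For class `t1` and a 4-byte integer key, the packed entry is 12 bytes, which shows why staging stays cheap even when a statement touches many rows.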
repl_log_insert — the staging primitive

```c
// repl_log_insert — src/transaction/replication.c:293 (condensed)
int
repl_log_insert (THREAD_ENTRY *thread_p, const OID *class_oid, const OID *inst_oid, LOG_RECTYPE log_type,
		 LOG_RCVINDEX rcvindex, DB_VALUE *key_dbvalue, REPL_INFO_TYPE repl_info)
{
  int tran_index = LOG_FIND_THREAD_TRAN_INDEX (thread_p);
  LOG_TDES *tdes = LOG_FIND_TDES (tran_index);
  LOG_REPL_RECORD *repl_rec;

  if (tdes->suppress_replication != 0)
    {
      LSA_SET_NULL (&tdes->repl_insert_lsa);
      LSA_SET_NULL (&tdes->repl_update_lsa);
      return NO_ERROR;
    }

  /* Allocate / grow tdes->repl_records as needed. */
  if (REPL_LOG_IS_NOT_EXISTS (tran_index))
    repl_log_info_alloc (tdes, REPL_LOG_INFO_ALLOC_SIZE, false);
  else if (REPL_LOG_IS_FULL (tran_index))
    repl_log_info_alloc (tdes, REPL_LOG_INFO_ALLOC_SIZE, true);	/* realloc +100 */

  repl_rec = (LOG_REPL_RECORD *) (&tdes->repl_records[tdes->cur_repl_record]);
  repl_rec->repl_type = log_type;
  repl_rec->rcvindex = rcvindex;
  /* RBR_START / RBR_END refine UPDATE into UPDATE_START / UPDATE_END */
  if (rcvindex == RVREPL_DATA_UPDATE)
    {
      /* ... map repl_info → rcvindex ... */
    }

  COPY_OID (&repl_rec->inst_oid, inst_oid);

  if (log_type == LOG_REPLICATION_DATA)
    {
      /* Build | packed_key_size | class_name | pkey_dbvalue | */
      repl_rec->length = OR_INT_SIZE + or_packed_string_length (class_name, &strlen)
	+ OR_VALUE_ALIGNED_SIZE (key_dbvalue);
      repl_rec->repl_data = malloc (repl_rec->length);
      /* ... pack class_name + key_dbvalue, fill packed_key_size ... */
    }
  repl_rec->must_flush = LOG_REPL_COMMIT_NEED_FLUSH;

  /* Bookkeeping: link the LSA of the heap log to the repl record. */
  switch (rcvindex)
    {
    case RVREPL_DATA_INSERT:
      if (!LSA_ISNULL (&tdes->repl_insert_lsa))
	{
	  LSA_COPY (&repl_rec->lsa, &tdes->repl_insert_lsa);
	  LSA_SET_NULL (&tdes->repl_insert_lsa);
	  LSA_SET_NULL (&tdes->repl_update_lsa);
	}
      break;
    case RVREPL_DATA_UPDATE:
      /* For update, the heap log is written *after* the repl record;
         repl_add_update_lsa back-patches repl_rec->lsa later. */
      LSA_SET_NULL (&repl_rec->lsa);
      break;
    case RVREPL_DATA_DELETE:
      /* For delete, no after-image is needed — pkey is enough. */
      break;
    }
  tdes->cur_repl_record++;
  tdes->must_flush = LOG_REPL_NEED_FLUSH;
  return NO_ERROR;
}
```

Three points worth marking up. (a) The default array size is
100 (REPL_LOG_INFO_ALLOC_SIZE), grown by 100 on overflow via
realloc; transactions that never exceed 100 row mutations never
pay reallocation cost. (b) RVREPL_DATA_UPDATE maps to one of
three sub-kinds — UPDATE, UPDATE_START, UPDATE_END — driven
by REPL_INFO_TYPE. The split exists so the slave can recognise
multi-statement updates within a system op (the START / END pair
brackets the changes). (c) The relationship between the
replication record’s lsa field and the heap log is asymmetric
across operations. For INSERT, the heap log is written before
repl_log_insert is called, so tdes->repl_insert_lsa already
holds it and the function copies it directly. For UPDATE, the heap
log is written after the index update path that triggered
repl_log_insert, so the field is set null and back-patched later
by repl_add_update_lsa once locator_update_force has the heap
LSA in hand. For DELETE, the lsa is irrelevant — the slave only
needs the primary key.
repl_add_update_lsa — the back-patch for UPDATE

```c
// repl_add_update_lsa — src/transaction/replication.c:229 (condensed)
int
repl_add_update_lsa (THREAD_ENTRY *thread_p, const OID *inst_oid)
{
  LOG_TDES *tdes = LOG_FIND_TDES (LOG_FIND_THREAD_TRAN_INDEX (thread_p));
  if (tdes->suppress_replication != 0)
    return NO_ERROR;

  /* Walk backwards through repl_records; the last one matching this
   * OID with a non-null repl_update_lsa is the one we just inserted. */
  for (int i = tdes->cur_repl_record - 1; i >= 0; i--)
    {
      LOG_REPL_RECORD *repl_rec = &tdes->repl_records[i];
      if (OID_EQ (&repl_rec->inst_oid, inst_oid) && !LSA_ISNULL (&tdes->repl_update_lsa))
	{
	  assert (repl_rec->rcvindex == RVREPL_DATA_UPDATE
		  || repl_rec->rcvindex == RVREPL_DATA_UPDATE_START
		  || repl_rec->rcvindex == RVREPL_DATA_UPDATE_END);
	  LSA_COPY (&repl_rec->lsa, &tdes->repl_update_lsa);
	  LSA_SET_NULL (&tdes->repl_update_lsa);
	  LSA_SET_NULL (&tdes->repl_insert_lsa);
	  return NO_ERROR;
	}
    }
  return NO_ERROR;		/* not found is not an error — debug log only */
}
```

The function is called by locator_update_force after the heap
update has been logged; tdes->repl_update_lsa holds the LSA of
the heap log; the matching LOG_REPL_RECORD (recently produced by
btree_update → repl_log_insert) gets that LSA stamped into its
lsa field. Walking backwards is correct because the most recent
matching OID is always the one we just inserted: the loop returns at
the first OID match, so when the same row is updated repeatedly within
one transaction, the record of an earlier update (whose lsa was already
patched by its own repl_add_update_lsa call) is never reached.
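The backward walk can be exercised in isolation with a toy descriptor. This is a sketch under assumptions: `lsa_t`, `staged_rec`, `txn_desc`, and `patch_update_lsa` are hypothetical stand-ins, OIDs are reduced to plain integers, and the pending-LSA check is hoisted out of the loop for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical LSA: (page id, offset); pageid -1 means "null". */
typedef struct { int64_t pageid; int offset; } lsa_t;
static const lsa_t LSA_NULL_VALUE = { -1, -1 };

static int
lsa_is_null (const lsa_t *l)
{
  return l->pageid == -1;
}

/* Hypothetical staged record: an integer OID plus the heap-log LSA. */
typedef struct { int oid; lsa_t lsa; } staged_rec;

typedef struct
{
  staged_rec recs[8];
  int used;
  lsa_t pending_update_lsa;	/* set by the heap update, consumed by the patch */
} txn_desc;

/* Stamp the pending heap LSA into the most recent record for this OID.
   Returns the patched index, or -1 if there was nothing to patch. */
static int
patch_update_lsa (txn_desc *t, int oid)
{
  assert (t->used >= 0 && t->used <= 8);
  if (lsa_is_null (&t->pending_update_lsa))
    return -1;
  for (int i = t->used - 1; i >= 0; i--)
    {
      if (t->recs[i].oid == oid)
	{
	  t->recs[i].lsa = t->pending_update_lsa;
	  t->pending_update_lsa = LSA_NULL_VALUE;	/* consume it */
	  return i;		/* first match from the back wins */
	}
    }
  return -1;
}
```

Updating the same OID twice patches index 0 on the first call and index 1 on the second; the earlier record keeps its original LSA because the scan stops at the newest match.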
Walking one INSERT through the master path
The deck illustrates the pattern with INSERT (1, "가") followed
by INSERT (2, "나") and COMMIT on t1(c1 PK, c2). The order of
work for each insert is:
1. heap_insert_logical → heap_insert_physical writes the row into the slotted page.
2. heap_log_insert_physical appends a LOG_UNDOREDO_DATA record to the prior list.
3. locator_add_or_remove_index → btree_insert updates the primary-key B-tree index. Inside the index path, repl_log_insert is called: a new LOG_REPL_RECORD is appended to tdes->repl_records[] with rcvindex = RVREPL_DATA_INSERT, inst_oid = OID of the new row, lsa = LSA of the heap log, and the packed (pk_size, class_name, pk_dbvalue) payload.

After the second insert, tdes->repl_records[] has two entries:
| idx | rcvindex | inst_oid | lsa | repl_data |
|---|---|---|---|---|
| 0 | RVREPL_DATA_INSERT | (oid_1) | LSA(heap_1) | “t1” + 1 |
| 1 | RVREPL_DATA_INSERT | (oid_2) | LSA(heap_2) | “t1” + 2 |
Up to this point no LOG_REPLICATION_DATA record exists in the
WAL stream. The records exist only on the descriptor. COMMIT
is the trigger that converts them.
Commit-time emission — log_append_repl_info_and_commit_log
log_commit calls log_append_repl_info_and_commit_log (when the
HA configuration demands it), which is the atomic emission idiom:

```c
// log_append_repl_info_and_commit_log — src/transaction/log_manager.c:4647
static void
log_append_repl_info_and_commit_log (THREAD_ENTRY *thread_p, LOG_TDES *tdes, LOG_LSA *commit_lsa)
{
  if (tdes->has_supplemental_log)
    {
      log_append_supplemental_info (thread_p, LOG_SUPPLEMENT_TRAN_USER, strlen (tdes->client.get_db_user ()),
				    tdes->client.get_db_user ());
      tdes->has_supplemental_log = false;
    }

  log_Gl.prior_info.prior_lsa_mutex.lock ();
  log_append_repl_info_with_lock (thread_p, tdes, true);
  log_append_commit_log_with_lock (thread_p, tdes, commit_lsa);
  log_Gl.prior_info.prior_lsa_mutex.unlock ();
}
```

The mutex is held across both appends; this is the atomicity
guarantee the slave depends on. Inside log_append_repl_info_internal
each staged record is converted into a real prior-list node:

```c
// log_append_repl_info_internal — src/transaction/log_manager.c:4555 (condensed)
static void
log_append_repl_info_internal (THREAD_ENTRY *thread_p, LOG_TDES *tdes, bool is_commit, int with_lock)
{
  if (tdes->append_repl_recidx == -1 || is_commit)
    tdes->append_repl_recidx = 0;

  while (tdes->append_repl_recidx < tdes->cur_repl_record)
    {
      LOG_REPL_RECORD *repl_rec = &tdes->repl_records[tdes->append_repl_recidx];

      if ((repl_rec->repl_type == LOG_REPLICATION_DATA || repl_rec->repl_type == LOG_REPLICATION_STATEMENT)
	  && ((is_commit && repl_rec->must_flush != LOG_REPL_DONT_NEED_FLUSH)
	      || repl_rec->must_flush == LOG_REPL_NEED_FLUSH))
	{
	  LOG_PRIOR_NODE *node = prior_lsa_alloc_and_copy_data (thread_p, repl_rec->repl_type, RV_NOT_DEFINED, NULL,
								 repl_rec->length, repl_rec->repl_data, 0, NULL);
	  LOG_REC_REPLICATION *log = (LOG_REC_REPLICATION *) node->data_header;
	  if (repl_rec->rcvindex == RVREPL_DATA_DELETE || repl_rec->rcvindex == RVREPL_STATEMENT)
	    LSA_SET_NULL (&log->lsa);
	  else
	    LSA_COPY (&log->lsa, &repl_rec->lsa);
	  log->length = repl_rec->length;
	  log->rcvindex = repl_rec->rcvindex;

	  prior_lsa_next_record_with_lock (thread_p, node, tdes);
	  repl_rec->must_flush = LOG_REPL_DONT_NEED_FLUSH;
	}
      tdes->append_repl_recidx++;
    }
}
```

The function emits LOG_REPLICATION_DATA (= 39) or
LOG_REPLICATION_STATEMENT (= 40) — both real LOG_RECTYPE
values defined alongside the regular log record types in
log_record.hpp. The emitted record’s LOG_REC_REPLICATION
data-header carries the rcvindex (so the slave knows whether
this is INSERT/UPDATE/DELETE/STATEMENT), the lsa of the
referenced heap log (so the slave can fetch the row image), and
the length of the repl_data payload (the inline class-name +
primary-key bytes, copied as the prior-node’s body).
For DELETE and STATEMENT the lsa field is set null in the emitted
log: a delete needs only the primary key, and a statement record is
replayed as a statement rather than from a row image. For INSERT the
lsa captured from tdes->repl_insert_lsa at staging time is copied
through, and for UPDATE the lsa back-patched by repl_add_update_lsa
is copied through.
Inside the loop, must_flush is set to LOG_REPL_DONT_NEED_FLUSH on every
emitted record so a subsequent abort path will not re-emit them.
Master flush path
After log_append_repl_info_and_commit_log returns, the prior list
contains, in order, every staged repl record followed by the
LOG_COMMIT. The drain in logpb_prior_lsa_append_all_list (see
cubrid-log-manager.md) walks the list and copies records into
the log page buffer; logpb_flush_all_append_pages writes them to
the active log file. The slave-bound emissions are now durable on
the master and visible to a copylogdb poll.
Slave side — copylogdb, the log-volume puller
copylogdb is a client-mode CUBRID utility (registered in
util_service.c and started by cubrid hb start together with
the rest of the HA topology). Its job is to keep a slave-local
copy of the master’s active and archive log volumes up to date.
The protocol is one-shot per loop iteration: send a
NET_SERVER_LOGWR_GET_LOG_PAGES request whose body is the
first-page LSA the slave is missing (first_pageid_torecv = last_recv_pageid); the master’s xlogwr_get_log_pages
(log_writer.c:2571) responds with up to
LOGWR_COPY_LOG_BUFFER_NPAGES * LOG_PAGESIZE bytes (default 128
pages × 16 KiB = 2 MiB) of contiguous log pages. If the requested
page does not yet exist on the master, the master blocks and
returns when it does — turning what would otherwise be a poll into
an event-driven push.
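The buffer arithmetic, and what it implies for catch-up, can be written out directly. The two constants below are the defaults quoted above (128 pages, 16 KiB per page); the names `COPY_BUFFER_NPAGES`, `PAGE_SIZE_BYTES`, and `requests_needed` are hypothetical, and the request count assumes every reply is filled to capacity while a backlog exists.

```c
#include <assert.h>
#include <stddef.h>

/* Defaults quoted in the text; the identifiers are illustrative only. */
enum
{
  COPY_BUFFER_NPAGES = 128,
  PAGE_SIZE_BYTES = 16 * 1024
};

/* Upper bound on the log bytes carried by a single reply. */
static size_t
max_reply_bytes (void)
{
  return (size_t) COPY_BUFFER_NPAGES * PAGE_SIZE_BYTES;
}

/* Round trips needed to drain a backlog, assuming full replies. */
static long
requests_needed (long backlog_pages)
{
  assert (backlog_pages >= 0);
  return (backlog_pages + COPY_BUFFER_NPAGES - 1) / COPY_BUFFER_NPAGES;
}
```

Under these assumptions a slave that is 1000 pages behind needs 8 requests to catch up, each carrying at most 2 MiB.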
The master-side scaffolding to satisfy a request:

```c
// xlogwr_get_log_pages — src/transaction/log_writer.c (high-level)
xlogwr_get_log_pages (THREAD_ENTRY *thread_p, LOG_PAGEID first_pageid, LOGWR_MODE mode)
{
  /* For each page from first_pageid to eof_lsa.pageid,
   *   - if !logpb_is_page_in_archive: read from active via
   *     logpb_copy_page_from_file → logpb_read_page_from_file → fileio_read
   *   - else: locate the right archive via logpb_get_guess_archive_num,
   *     logpb_arv_page_info_table search, then fetch via
   *     logpb_fetch_from_archive
   * Pack via logwr_pack_log_pages, send via xlog_send_log_pages_to_client. */
}
```

The master, when locating an archive, consults the
logpb_arv_page_info_table (an in-memory cache of
(arv_num, fpageid, lpageid) records updated whenever an archive
is created), and falls back to scanning the log_Name_info file if
the cache is cold. The “guess” in the function name refers to the
arithmetic estimator: when an active log header is available, the
function divides the requested pageid by LOGPB_ACTIVE_NPAGES to
estimate the archive number; otherwise it starts from archive 0
and scans forward. Once it has a candidate archive, it compares
the candidate’s Arv_hdr->fpageid against the requested pageid and
walks forward (direction = +1) or backward (direction = -1)
through the archive sequence as needed.
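The estimate-then-walk search can be sketched over an in-memory array of archive headers. This is a simplified model, not logpb_get_guess_archive_num itself: `archive_hdr` and `find_archive` are hypothetical, archives are assumed to be sorted and contiguous, and out-of-range guesses are clamped to the last archive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical archive header: first page id and page count. */
typedef struct { long fpageid; long npages; } archive_hdr;

/* Returns the index of the archive that holds pageid, or -1.
   active_npages <= 0 models "no active log header available". */
static int
find_archive (const archive_hdr *arvs, int n_arvs, long pageid, long active_npages)
{
  assert (arvs != NULL);
  if (n_arvs <= 0 || pageid < 0)
    return -1;

  /* Arithmetic estimate, or archive 0 when no header is available. */
  long guess = (active_npages > 0) ? pageid / active_npages : 0;
  if (guess >= n_arvs)
    guess = n_arvs - 1;

  /* Compare against the candidate's first page to pick a direction. */
  int direction = (pageid < arvs[guess].fpageid) ? -1 : +1;
  for (long i = guess; i >= 0 && i < n_arvs; i += direction)
    {
      if (pageid >= arvs[i].fpageid && pageid < arvs[i].fpageid + arvs[i].npages)
	return (int) i;
    }
  return -1;
}
```

The estimate is only exact when every archive holds the same number of pages as the active log; the directional walk is what absorbs archives of other sizes.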
On the slave side, logwr_copy_log_file (log_writer.c:1659/1960)
issues the request, fills its own Logwr_Gl structure with the
arrived pages, and writes them through logwr_flush_all_append_pages
(1016) to the slave-local active log. When the active log
crosses its size boundary, logwr_archive_active_log (1275)
copies the current active log’s contents into a new archive
volume, page by page (fileio_read_pages + fileio_write_pages),
and the slave-local active log is reset. Its name and structure
are identical to a regular CUBRID log volume — applylogdb reads
it the same way the master’s recovery would.
Slave side — applylogdb, the per-record dispatcher
applylogdb’s entry point is la_apply_log_file (log_applier.c:8074).
It runs as a long-running daemon. Its main loop fetches log records
forward from la_Info.final_lsa and hands each record to
la_log_record_process:

```c
// la_log_record_process — src/transaction/log_applier.c:6101 (condensed)
static int
la_log_record_process (LOG_RECORD_HEADER *lrec, LOG_LSA *final, LOG_PAGE *pg_ptr)
{
  /* Defensive: a non-EOL record must carry a valid trid, and its
   * prev_tranlsa / back_lsa must not point past the current position. */
  if (lrec->trid == NULL_TRANID || LSA_GT (&lrec->prev_tranlsa, final) || LSA_GT (&lrec->back_lsa, final))
    {
      if (lrec->type != LOG_END_OF_LOG)
	return ER_LOG_PAGE_CORRUPTED;
    }

  /* First time we see this trid — register an LA_APPLY for it. */
  if ((lrec->type != LOG_END_OF_LOG && lrec->type != LOG_DUMMY_HA_SERVER_STATE)
      && lrec->trid != LOG_SYSTEM_TRANID && LSA_ISNULL (&lrec->prev_tranlsa))
    {
      LA_APPLY *apply = la_add_apply_list (lrec->trid);
      /* ... start_lsa bookkeeping ... */
    }

  switch (lrec->type)
    {
    case LOG_END_OF_LOG:
      /* Reached end of currently-known log. Set is_end_of_record and
       * return ER_INTERRUPTED so the caller waits for more pages. */
      return ER_INTERRUPTED;

    case LOG_REPLICATION_DATA:
    case LOG_REPLICATION_STATEMENT:
      /* Buffer this event in the trid's apply list. */
      return la_set_repl_log (pg_ptr, lrec->type, lrec->trid, final);

    case LOG_SYSOP_END:
    case LOG_COMMIT:
      /* Flush the trid's apply list onto the slave. */
      if (LSA_GT (final, &la_Info.committed_lsa))
	{
	  eot_time = (lrec->type == LOG_SYSOP_END) ? 0 : la_retrieve_eot_time (pg_ptr, final);
	  la_add_node_into_la_commit_list (lrec->trid, final, lrec->type, eot_time);
	  do
	    {
	      error = la_apply_commit_list (&lsa_apply, final_pageid);
	      /* ... handle ER_NET_CANT_CONNECT_SERVER, ER_HA_LA_EXCEED_MAX_MEM_SIZE,
	         LA_IS_FLUSH_ERROR, ER_TDE_CIPHER_IS_NOT_LOADED ... */
	      if (!LSA_ISNULL (&lsa_apply))
		{
		  LSA_COPY (&la_Info.committed_lsa, &lsa_apply);
		  if (lrec->type == LOG_COMMIT)
		    la_Info.commit_counter++;
		}
	    }
	  while (!LSA_ISNULL (&lsa_apply));
	}
      else
	{
	  la_free_repl_items_by_tranid (lrec->trid);	/* already past committed */
	}
      break;

    case LOG_ABORT:
      la_add_node_into_la_commit_list (lrec->trid, final, LOG_ABORT, 0);
      break;

    case LOG_DUMMY_HA_SERVER_STATE:
      /* Detect master role change; if state != ACTIVE && != TO_BE_STANDBY,
       * the slave's role has changed → set is_role_changed and return
       * ER_INTERRUPTED so the daemon shuts down cleanly. */
      break;

    default:
      break;
    }
  /* ... handle out-of-bounds forw_lsa / type → ER_LOG_PAGE_CORRUPTED ... */
  return NO_ERROR;
}
```

The dispatch is tight: every record type is either buffered (the two REPL types), a trigger for a flush (COMMIT, SYSOP_END, ABORT), or consumed for control (DUMMY_HA_SERVER_STATE, END_OF_LOG, DUMMY_CRASH_RECOVERY, END_CHKPT). All other types fall through the default arm because they are not relevant to apply.
la_set_repl_log — buffering a REPL record
    // la_set_repl_log (src/transaction/log_applier.c:3419)
    static int
    la_set_repl_log (LOG_PAGE *log_pgptr, int log_type, int tranid, LOG_LSA *lsa)
    {
      LA_APPLY *apply = la_find_apply_list (tranid);
      if (apply == NULL)
        return NO_ERROR;

      /* Long transaction: bypass the per-item buffer; just remember last_lsa. */
      if (apply->is_long_trans) {
        LSA_COPY (&apply->last_lsa, lsa);
        return NO_ERROR;
      }

      /* Cap per-trid items at LA_MAX_REPL_ITEMS (1000). Overflow degrades
       * the trid into "long transaction" mode (re-fetch from log on apply). */
      if (apply->num_items >= LA_MAX_REPL_ITEMS) {
        la_free_all_repl_items_except_head (apply);
        apply->is_long_trans = true;
        LSA_COPY (&apply->last_lsa, lsa);
        return NO_ERROR;
      }

      LA_ITEM *item = la_make_repl_item (log_pgptr, log_type, tranid, lsa);
      la_add_repl_item (apply, item);
      return NO_ERROR;
    }

The bucketed structure is LA_INFO::repl_lists[], an array of LA_APPLY pointers indexed by a hash of trid. Each LA_APPLY holds the per-trid linked list of LA_ITEM:
    // LA_APPLY and LA_ITEM (src/transaction/log_applier.c:236-264)
    struct la_item {
      LA_ITEM *next, *prev;
      int log_type;                /* LOG_REPLICATION_DATA / LOG_REPLICATION_STATEMENT */
      int item_type;               /* RVREPL_DATA_INSERT / UPDATE / DELETE / STATEMENT */
      char *class_name;            /* unpacked from the REPL record */
      char *db_user;
      char *ha_sys_prm;
      int packed_key_value_length;
      char *packed_key_value;      /* disk image of pkey value */
      DB_VALUE key;                /* unpacked from packed_key_value on demand */
      LOG_LSA lsa;                 /* LSA of the LOG_REPLICATION_* record itself */
      LOG_LSA target_lsa;          /* LSA of the target heap/btree log record */
    };

    struct la_apply {
      int tranid;
      int num_items;
      bool is_long_trans;          /* exceeded LA_MAX_REPL_ITEMS: re-walk on apply */
      LOG_LSA start_lsa;
      LOG_LSA last_lsa;
      LA_ITEM *head;
      LA_ITEM *tail;
    };

The is_long_trans flag is the escape hatch for the million-row-update problem: rather than carry a million LA_ITEM in memory, the daemon switches to a mode where it remembers only start_lsa and last_lsa, and on commit it walks the log forward from start_lsa to last_lsa, re-fetching each REPL record. The trade-off is one extra log walk per long-transaction trid, in exchange for bounded memory.
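The bounded-memory behaviour can be illustrated with a minimal, self-contained sketch. All names here (apply_buf, buf_item, lsa_pos, buffer_event) are hypothetical simplifications; the real code stores far richer items, and where exactly start_lsa gets recorded is simplified here to "on the first buffered item".

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_ITEMS 1000                 /* mirrors LA_MAX_REPL_ITEMS from the text */

typedef struct { long pageid; int offset; } lsa_pos;    /* simplified LSA */
typedef struct buf_item { struct buf_item *next; lsa_pos lsa; } buf_item;

typedef struct {
  int num_items;
  bool is_long_trans;
  lsa_pos start_lsa, last_lsa;
  buf_item *head, *tail;
} apply_buf;

/* Free every buffered item except the head, keeping the list valid. */
void free_all_except_head (apply_buf *a)
{
  if (a->head == NULL)
    return;
  buf_item *it = a->head->next;
  while (it != NULL) {
    buf_item *next = it->next;
    free (it);
    it = next;
  }
  a->head->next = NULL;
  a->tail = a->head;
  a->num_items = 1;
}

/* Buffer one event, or degrade to long-transaction mode at the cap.
 * Returns 0 on success, -1 on allocation failure. */
int buffer_event (apply_buf *a, lsa_pos lsa)
{
  if (a->is_long_trans) {              /* already degraded: track the end only */
    a->last_lsa = lsa;
    return 0;
  }
  if (a->num_items >= MAX_ITEMS) {     /* cap reached: drop items, keep the range */
    free_all_except_head (a);
    a->is_long_trans = true;
    a->last_lsa = lsa;
    return 0;
  }
  buf_item *it = malloc (sizeof *it);
  if (it == NULL)
    return -1;
  it->next = NULL;
  it->lsa = lsa;
  if (a->tail != NULL) {
    a->tail->next = it;
  } else {
    a->head = it;
    a->start_lsa = lsa;                /* assumption: first item marks the start */
  }
  a->tail = it;
  a->num_items++;
  return 0;
}
```

After the cap is hit, memory use stays constant no matter how many more events arrive for that transaction; the cost moves to commit time, when the range between start_lsa and last_lsa has to be read again.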
la_apply_commit_list and la_apply_repl_log — the apply fan-out
When LOG_COMMIT (or LOG_SYSOP_END) arrives, the dispatcher queues an LA_COMMIT node in la_Info.commit_head/commit_tail and calls la_apply_commit_list in a loop until the head goes empty:
    // la_apply_commit_list / la_apply_repl_log (src/transaction/log_applier.c:5920, 5739)
    static int
    la_apply_commit_list (LOG_LSA *lsa, LOG_PAGEID final_pageid)
    {
      int error = NO_ERROR;
      LA_COMMIT *commit = la_Info.commit_head;

      if (commit && (commit->type == LOG_COMMIT || commit->type == LOG_SYSOP_END
                     || commit->type == LOG_ABORT)) {
        error = la_apply_repl_log (commit->tranid, commit->type, &commit->log_lsa,
                                   &la_Info.total_rows, final_pageid);
        LSA_COPY (lsa, &commit->log_lsa);
        /* ... unlink commit, advance head, update _db_ha_apply_info ... */
      }
      return error;
    }

    static int
    la_apply_repl_log (int tranid, int rectype, LOG_LSA *commit_lsa, int *total_rows,
                       LOG_PAGEID final_pageid)
    {
      int error = NO_ERROR;
      LA_ITEM *item, *next;
      LA_APPLY *apply = la_find_apply_list (tranid);

      if (rectype == LOG_ABORT) {
        la_clear_applied_info (apply);
        return NO_ERROR;
      }

      /* A while loop, so that `continue` retries the same item. */
      item = apply->head;
      while (item != NULL) {
        if (LSA_GT (&item->lsa, &la_Info.last_committed_rep_lsa)
            && la_need_filter_out (item) == false) {
          if (item->log_type == LOG_REPLICATION_DATA) {
            switch (item->item_type) {
            case RVREPL_DATA_UPDATE_START:
            case RVREPL_DATA_UPDATE_END:
            case RVREPL_DATA_UPDATE:
              error = la_apply_update_log (item);
              break;
            case RVREPL_DATA_INSERT:
              error = la_apply_insert_log (item);
              break;
            case RVREPL_DATA_DELETE:
              error = la_apply_delete_log (item);
              break;
            }
          } else if (item->log_type == LOG_REPLICATION_STATEMENT) {
            error = la_apply_statement_log (item);
          }

          if (error == NO_ERROR) {
            LSA_COPY (&la_Info.committed_rep_lsa, &item->lsa);
          } else if (LA_RETRY_ON_ERROR (error)) {
            LA_SLEEP (10, 0);
            continue;
          }
          /* ... handle ER_NET_CANT_CONNECT_SERVER, log error, advance ... */
        }
        next = la_get_next_repl_item (item, apply->is_long_trans, &apply->last_lsa);
        la_free_repl_item (apply, item);
        item = next;
      }
      /* ... end-of-trid bookkeeping; clear or free per LOG_SYSOP_END semantics ... */
      return error;
    }

la_apply_insert_log, la_apply_update_log, and la_apply_delete_log are the three workers. They share a common shape:
- Resolve the class. class_name from the LA_ITEM is resolved against the slave’s catalog to get a DB_OBJECT*.
- Reconstruct the row image. For INSERT/UPDATE, the item’s target_lsa points at the master’s heap log record; the daemon reads it via la_get_log_data, with helpers la_get_overflow_recdes (BIGONE / link-change), la_get_relocation_recdes (REC_RELOCATION + REC_NEWHOME), and la_get_next_update_log (REC_ASSIGN_ADDRESS deferred update). The result is a RECDES containing the after-image only; CUBRID does not ship before-images for replication.
- Apply the row. la_repl_add_object calls into the slave server’s regular client API (db_create, db_otmpl_*) with the reconstructed row. The slave server runs the operation as a normal DML, taking its own locks, generating its own MVCC IDs, writing its own WAL.
- Track or retry. On success, committed_rep_lsa advances. On a retryable error (deadlock, lock timeout, page latch abort, TDE cipher not loaded; see the LA_RETRY_ON_ERROR mask), the daemon sleeps 10 seconds and retries. On a non-retryable error, the operation is logged and the daemon advances past it.

Delete is simpler: only the primary key is needed, so there is no row image fetch and la_repl_add_object is called with recdes = NULL.
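The "track or retry" step can be sketched as a small retry wrapper. This is a hedged illustration, not the daemon's code: the error codes, the is_retryable predicate, and the max_attempts cap are assumptions of the sketch (the loop shown earlier retries without a cap), and the sleep is injected as a callback so the example does not actually block.

```c
#include <stdbool.h>

/* Hypothetical error codes; the real contents of LA_RETRY_ON_ERROR
 * are not reproduced here. */
enum { APPLY_OK = 0, ERR_DEADLOCK = -1, ERR_LOCK_TIMEOUT = -2, ERR_UNIQUE_VIOLATION = -3 };

/* Transient errors are worth retrying; logical errors are not. */
bool is_retryable (int err)
{
  return err == ERR_DEADLOCK || err == ERR_LOCK_TIMEOUT;
}

typedef int (*apply_fn) (void *ctx);
typedef void (*sleep_fn) (int seconds);

/* Apply one item: retry while the error is transient, give up otherwise.
 * Returns the last error code observed. */
int apply_with_retry (apply_fn apply, void *ctx, sleep_fn do_sleep, int max_attempts)
{
  int err = APPLY_OK;
  for (int attempt = 0; attempt < max_attempts; attempt++) {
    err = apply (ctx);
    if (err == APPLY_OK || !is_retryable (err))
      return err;                      /* success, or an error retrying cannot fix */
    do_sleep (10);                     /* same 10-second back-off as in the text */
  }
  return err;
}

/* Test double: fails with a deadlock until *remaining reaches zero. */
int flaky_apply (void *ctx)
{
  int *remaining = ctx;
  if (*remaining > 0) {
    (*remaining)--;
    return ERR_DEADLOCK;
  }
  return APPLY_OK;
}

/* Test double: always fails with a non-retryable error. */
int broken_apply (void *ctx)
{
  (void) ctx;
  return ERR_UNIQUE_VIOLATION;
}

/* Test double: records the time slept instead of blocking. */
int total_slept = 0;
void fake_sleep (int seconds) { total_slept += seconds; }
```

With flaky_apply primed to fail twice, the wrapper sleeps twice and succeeds on the third attempt; with broken_apply it returns immediately, which corresponds to the "log and advance" branch.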
Reconstructing the row image — case analysis
la_get_recdes is the dispatcher that produces an after-image RECDES from an item->target_lsa. Five record-type cases matter:

1. Normal heap record (REC_HOME). la_get_log_data(): header + redo + undo, copy redo into recdes.
2. RVOVF_CHANGE_LINK: the record is BIGONE, but only the linkage to overflow pages changed. la_get_overflow_recdes(..., RVOVF_PAGE_UPDATE): walk forward collecting overflow-page redo until the dummy/anchor record.
3. recdes->type == REC_BIGONE: the record is a fresh BIGONE. la_get_overflow_recdes(..., RVOVF_NEWPAGE_INSERT): collect the freshly-inserted overflow chain.
4. RVHF_INSERT && recdes->type == REC_ASSIGN_ADDRESS: the heap reserved a slot first, then the actual data update was deferred. la_get_next_update_log(): chase forw_lsa within the same trid to find the deferred update record.
5. (RVHF_UPDATE || RVHF_UPDATE_NOTIFY_VACUUM) && recdes->type == REC_RELOCATION: the record is the REC_RELOCATION pointer; the actual REC_NEWHOME lives elsewhere. la_get_relocation_recdes(): chase prev_tranlsa within the same trid to find the REC_NEWHOME companion.

The chase functions all read forward or backward through the log following one of the three header LSAs (forw_lsa, prev_tranlsa, back_lsa); they decode physiological log records and decompress them with the daemon’s per-instance LOG_ZIP contexts (la_Info.undo_unzip_ptr, la_Info.redo_unzip_ptr).
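The five cases above amount to a decision table keyed on the recovery index and the record type. The sketch below encodes that table with hypothetical enum names; the order in which the conditions are tested is an assumption of this sketch and may differ from the real la_get_recdes.

```c
/* Hypothetical stand-ins for the recovery indices and record types in the text. */
typedef enum {
  RCV_HF_INSERT, RCV_HF_UPDATE, RCV_HF_UPDATE_NOTIFY_VACUUM,
  RCV_OVF_CHANGE_LINK, RCV_OTHER
} rcv_index;

typedef enum { RT_HOME, RT_BIGONE, RT_ASSIGN_ADDRESS, RT_RELOCATION } rec_kind;

typedef enum {
  PATH_LOG_DATA,             /* case 1: copy redo from the heap log record */
  PATH_OVF_PAGE_UPDATE,      /* case 2: overflow link changed */
  PATH_OVF_NEWPAGE_INSERT,   /* case 3: fresh BIGONE chain */
  PATH_NEXT_UPDATE_LOG,      /* case 4: deferred update after slot reservation */
  PATH_RELOCATION            /* case 5: follow REC_RELOCATION to REC_NEWHOME */
} recon_path;

/* Pick the reconstruction path for one target log record. */
recon_path choose_path (rcv_index rcv, rec_kind type)
{
  if (rcv == RCV_OVF_CHANGE_LINK)
    return PATH_OVF_PAGE_UPDATE;
  if (type == RT_BIGONE)
    return PATH_OVF_NEWPAGE_INSERT;
  if (rcv == RCV_HF_INSERT && type == RT_ASSIGN_ADDRESS)
    return PATH_NEXT_UPDATE_LOG;
  if ((rcv == RCV_HF_UPDATE || rcv == RCV_HF_UPDATE_NOTIFY_VACUUM) && type == RT_RELOCATION)
    return PATH_RELOCATION;
  return PATH_LOG_DATA;
}
```

Everything that does not match one of the four special cases falls back to the plain heap-record path, which is why case 1 appears last in the code even though it is listed first in the text.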
la_log_commit — durable bookmark
After every batch of applies, la_log_commit updates the
_db_ha_apply_info system table on the slave with the new
committed_lsa, committed_rep_lsa, final_lsa, and the
running counters. The row is keyed by the master’s
db_name/copied_log_path; on daemon restart the row is read
back and used to seed la_Info.final_lsa. This is the durable
end of the apply cursor.
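How a persisted bookmark keeps a restart from re-applying old events can be sketched with a simplified LSA comparison. The lsa_pos type and both helpers are hypothetical; the real LOG_LSA layout and the LSA_GT macro are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified LSA: a log page id plus a byte offset inside that page. */
typedef struct { int64_t pageid; int16_t offset; } lsa_pos;

/* Strictly-greater comparison: page first, then offset. */
bool lsa_gt (lsa_pos a, lsa_pos b)
{
  return a.pageid > b.pageid || (a.pageid == b.pageid && a.offset > b.offset);
}

/* After a restart, only events strictly past the bookmark are applied.
 * Returns how many of the n events would be applied. */
int count_to_apply (const lsa_pos *events, int n, lsa_pos bookmark)
{
  int count = 0;
  for (int i = 0; i < n; i++) {
    if (lsa_gt (events[i], bookmark))
      count++;
  }
  return count;
}
```

The strict comparison matters: an event whose LSA equals the bookmark has already been applied, so it must be skipped rather than replayed.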
A complete master → slave commit, end to end
sequenceDiagram
participant TX as Master DML thread
participant LOC as locator_*_force
participant REPL as repl_log_insert
participant TDES as tdes->repl_records
participant FLUSH as log_append_repl_info_<br/>and_commit_log
participant PRIOR as prior_lsa list
participant LGAT as master active log
participant XLW as xlogwr_get_log_pages
participant CL as copylogdb (slave host)
participant SLOG as slave-local log
participant AL as applylogdb
participant DISP as la_log_record_process
participant AP as la_apply_<insert|update|delete>_log
participant SS as slave cub_server
participant HA as _db_ha_apply_info
TX->>LOC: INSERT (1, "가")
LOC->>LOC: heap_insert_logical → log_undoredo
LOC->>REPL: repl_log_insert (RVREPL_DATA_INSERT)
REPL->>TDES: append LOG_REPL_RECORD
Note over TX,TDES: caller continues — no WAL emission yet
TX->>FLUSH: COMMIT
FLUSH->>FLUSH: prior_lsa_mutex.lock()
loop each LOG_REPL_RECORD
FLUSH->>PRIOR: append LOG_REPLICATION_DATA
end
FLUSH->>PRIOR: append LOG_COMMIT
FLUSH->>FLUSH: prior_lsa_mutex.unlock()
PRIOR->>LGAT: drain + flush (logpb_flush_all_append_pages)
CL->>XLW: NET_SERVER_LOGWR_GET_LOG_PAGES (last_recv_pageid)
XLW->>LGAT: read pages
XLW-->>CL: up to 128 × LOG_PAGESIZE bytes
CL->>SLOG: write pages (logwr_flush_all_append_pages)
AL->>SLOG: la_get_page_buffer (la_Info.final_lsa)
AL->>DISP: lrec = LOG_REPLICATION_DATA
DISP->>DISP: la_set_repl_log → repl_lists[trid]
AL->>DISP: lrec = LOG_COMMIT
DISP->>DISP: la_add_node_into_la_commit_list
DISP->>AP: la_apply_commit_list → la_apply_repl_log
AP->>SS: db_otmpl_create / db_template_*
SS-->>AP: success / retryable / fatal
AP->>HA: la_log_commit (committed_rep_lsa)
The pipeline holds two interleaved orderings. LSA order is
enforced on the master at attach time by the prior-LSA mutex — the
emitted LOG_REPLICATION_DATA records and the LOG_COMMIT are
strictly monotonic. Apply order on the slave is enforced by
the per-trid buffer plus the commit queue — events are buffered as
they appear, but applied only when LOG_COMMIT is reached, in the
order in which LOG_COMMIT records arrive. The two orderings agree
because the slave walks the log forward in LSA order and only one
commit’s events fan out at a time.
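The agreement between the two orderings can be illustrated with a toy replay loop: events are read in log order, buffered per transaction, and emitted only when that transaction's commit is seen. Everything here (log_event, replay, the fixed-size buffers) is a hypothetical simplification for illustration.

```c
#include <string.h>

#define MAX_TRID 8
#define MAX_EVENTS 16

typedef enum { EV_DATA, EV_COMMIT } ev_kind;
typedef struct { ev_kind kind; int trid; char row; } log_event;

/* Walk the stream in LSA (array) order. EV_DATA rows are buffered per trid;
 * a transaction's rows are appended to out[] only when its EV_COMMIT is
 * reached. Returns the number of rows written; out is NUL-terminated. */
int replay (const log_event *stream, int n, char *out)
{
  char pending[MAX_TRID][MAX_EVENTS];
  int count[MAX_TRID] = {0};
  int written = 0;

  for (int i = 0; i < n; i++) {
    int t = stream[i].trid;
    if (t < 0 || t >= MAX_TRID)
      continue;                        /* out-of-range ids are ignored in this sketch */
    if (stream[i].kind == EV_DATA) {
      if (count[t] < MAX_EVENTS)
        pending[t][count[t]++] = stream[i].row;
    } else {
      memcpy (out + written, pending[t], (size_t) count[t]);
      written += count[t];
      count[t] = 0;                    /* the buffer is released after apply */
    }
  }
  out[written] = '\0';
  return written;
}
```

For an interleaved stream where transaction 1 writes a and b, transaction 2 writes x and commits first, and transaction 3 never commits, the applied sequence is x, a, b: commit order across transactions, log order within each one, and nothing from the uncommitted transaction.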
Source Walkthrough
Anchor on symbol names, not line numbers.
Master — staging structures
- LOG_REPL_RECORD (replication.h): staging entry per row mutation.
- LOG_REPL_FLUSH enum (replication.h): DONT_NEED_FLUSH = -1, COMMIT_NEED_FLUSH = 0, NEED_FLUSH = 1.
- REPL_INFO_TYPE enum (replication.h): SBR, RBR_START, RBR_NORMAL, RBR_END.
- LOG_TDES::repl_records / num_repl_records / cur_repl_record / append_repl_recidx / fl_mark_repl_recidx / repl_insert_lsa / repl_update_lsa / must_flush (log_impl.h).
- REPL_LOG_INFO_ALLOC_SIZE, REPL_LOG_IS_NOT_EXISTS, REPL_LOG_IS_FULL (replication.c).
Master — staging primitives
- repl_log_insert (replication.c): append a LOG_REPL_RECORD to tdes->repl_records[].
- repl_log_insert_statement (replication.c): statement-based emission for DDL / replicated session statements.
- repl_add_update_lsa (replication.c): back-patch repl_rec->lsa after the heap log for UPDATE.
- repl_log_info_alloc (replication.c): initial alloc + grow-by-100 realloc.
- repl_start_flush_mark / repl_end_flush_mark (replication.c): bracket DDL emissions that must flush even on rollback.
- repl_log_abort_after_lsa (replication.c): drop staged records past a savepoint LSA.
Master — DML emission sites
- locator_attribute_info_force / locator_insert_force / locator_update_force / locator_delete_force (locator_sr.c).
- locator_add_or_remove_index (locator_sr.c): INSERT path, calls btree_insert.
- locator_update_index (locator_sr.c): UPDATE path, calls btree_update.
- heap_insert_logical / heap_update_logical / heap_delete_logical / heap_log_insert_physical / heap_log_update_physical (heap_file.c).
- btree_update (btree.c): inside, repl_log_insert is called for the index side.
Master — commit-time emission
- log_append_repl_info_internal (log_manager.c): convert staged records to prior-list nodes.
- log_append_repl_info (log_manager.c): public entry, no lock.
- log_append_repl_info_with_lock (log_manager.c): variant taken when caller already holds the prior mutex.
- log_append_repl_info_and_commit_log (log_manager.c): atomic emission of repl + commit.
- log_commit / log_commit_local (log_manager.c): top-level commit drivers.
- LOG_REC_REPLICATION (log_record.hpp): on-disk data header for LOG_REPLICATION_DATA / LOG_REPLICATION_STATEMENT.
Master — wire side of the copy protocol
- xlogwr_get_log_pages (log_writer.c): server entry for NET_SERVER_LOGWR_GET_LOG_PAGES.
- logwr_pack_log_pages (log_writer.c): packs a contiguous range.
- xlog_send_log_pages_to_client (server-support side): wire write.
- logpb_copy_page_from_file / logpb_read_page_from_file (log_page_buffer.c): physical fetch.
- logpb_fetch_from_archive / logpb_get_guess_archive_num (log_page_buffer.c): archive lookup.
- logpb_arv_page_info_table: in-memory cache, updated on every archive create.
Slave — copylogdb / logwr_*
- logwr_initialize (log_writer.c): init Logwr_Gl, open active log.
- logwr_copy_log_file (log_writer.c): main loop (fetch → write → archive).
- logwr_set_hdr_and_flush_info (log_writer.c): header reconciliation after each batch.
- logwr_writev_append_pages / logwr_flush_all_append_pages (log_writer.c): write to slave-local active log.
- logwr_archive_active_log (log_writer.c): roll active to archive when full.
- logwr_flush_header_page / logwr_flush_bgarv_header_page (log_writer.c): header writeback.
- logwr_to_physical_pageid (log_writer.c): logical → physical page id.
- logwr_check_page_checksum (log_writer.c): per-page integrity check.
Slave — applylogdb infrastructure
- LA_INFO global (log_applier.c): per-process state.
- LA_APPLY / LA_ITEM / LA_COMMIT / LA_HA_APPLY_INFO / LA_CACHE_PB / LA_CACHE_BUFFER / LA_REPL_FILTER / LA_OVF_PAGE_LIST / LA_RECDES_POOL (log_applier.c): internal types.
- LA_RETRY_ON_ERROR mask (log_applier.h): retryable apply errors.
- REPL_FILTER_TYPE enum (log_applier.h): NONE / INCLUDE_TBL / EXCLUDE_TBL.
- LA_MAX_REPL_ITEMS (1000), LA_MAX_REPL_ITEM_WITHOUT_RELEASE_PB (50), LA_STATUS_BUSY / LA_STATUS_IDLE (log_applier.c).
Slave — log fetch and page cache
- la_log_fetch / la_log_fetch_from_archive (log_applier.c): read a log page from active or archive.
- la_get_page_buffer / la_release_page_buffer (log_applier.c): page cache access with refcount.
- la_init_cache_pb / la_init_cache_log_buffer (log_applier.c): page cache init.
- la_cache_buffer_replace / la_invalidate_page_buffer / la_decache_page_buffers (log_applier.c): eviction.
- la_init_recdes_pool / la_assign_recdes_from_pool (log_applier.c): preallocated RECDES pool.
Slave — daemon entry and main loop
- la_apply_log_file (log_applier.c): daemon main, the entry from cubrid hb start.
- la_init (log_applier.c): init globals, allocate caches, spawn helper threads.
- la_apply_pre (log_applier.c): pre-flight (lock, fetch header, check duplicates).
- la_change_state (log_applier.c): slave-state change handler.
- la_log_commit (log_applier.c): checkpoint _db_ha_apply_info.
- la_force_shutdown (log_applier.h): external shutdown hook.
Slave — record dispatch
- la_log_record_process (log_applier.c): switch on lrec->type.
- la_set_repl_log (log_applier.c): buffer a LOG_REPLICATION_* record.
- la_make_repl_item / la_add_repl_item (log_applier.c): build LA_ITEM from a log page.
- la_find_apply_list / la_add_apply_list (log_applier.c): per-trid bucket lookup.
- la_init_repl_lists (log_applier.c): bucket array init / realloc.
- la_add_node_into_la_commit_list / la_retrieve_eot_time (log_applier.c): commit queue.
- la_log_copy_fromlog (log_applier.c): copy bytes across log-page boundaries.
Slave — apply and row reconstruction
- la_apply_commit_list (log_applier.c): drain LA_COMMIT queue, dispatch one trid per call.
- la_apply_repl_log (log_applier.c): per-item dispatch over LA_APPLY::head.
- la_apply_insert_log / la_apply_update_log / la_apply_delete_log / la_apply_statement_log (log_applier.c): the four per-kind appliers.
- la_repl_add_object (log_applier.c): common end-stage that calls into the slave server.
- la_get_recdes (log_applier.c): five-case after-image reconstructor.
- la_get_log_data (log_applier.c): read+decompress one heap log.
- la_get_overflow_recdes (log_applier.c): BIGONE chain walker.
- la_get_relocation_recdes (log_applier.c): REC_RELOCATION → REC_NEWHOME chase.
- la_get_next_update_log (log_applier.c): REC_ASSIGN_ADDRESS deferred-update chase.
- la_get_undoredo_diff / la_get_zipped_data (log_applier.c): diff and zlib unzip.
- la_make_room_for_mvcc_insid / la_make_room_for_mvcc_delid_and_prev_ver (log_applier.c): MVCC header injection on the slave-applied row.
- la_disk_to_obj (log_applier.c): RECDES → DB_OTMPL conversion.
- la_need_filter_out / la_create_repl_filter / la_print_repl_filter_info (log_applier.c): table-level filter.
Slave — durable bookkeeping
- la_init_ha_apply_info (log_applier.c): zero-fill an LA_HA_APPLY_INFO.
- la_get_ha_apply_info (log_applier.c): read _db_ha_apply_info.
- la_insert_ha_apply_info / la_update_ha_last_applied_info / la_update_ha_apply_info_start_time / la_update_ha_apply_info_log_record_time (log_applier.c): write back.
- la_delete_ha_apply_info (log_applier.c): cleanup on full reset.
- la_get_last_ha_applied_info (log_applier.c): restart bookmark.
- la_find_required_lsa (log_applier.c): minimal-needed-LSA computation.
- la_remove_archive_logs (log_applier.c): slave-local archive trimming after apply.
Wire-up at the utility level
- util_service.c: cubrid hb start / cubrid heartbeat start wires cub_master plus per-host copylogdb and applylogdb.
- commdb.c: operator-side activation of HA via cub_commdb.
- connection/heartbeat.c: process-side hb_register_to_master is called from copylogdb / applylogdb startup so cub_master knows they are alive.
Position hints as of 2026-04-30
| Symbol | File | Line |
|---|---|---|
LOG_REPLICATION_DATA | log_record.hpp | 116 |
LOG_REPLICATION_STATEMENT | log_record.hpp | 117 |
RVREPL_DATA_INSERT/UPDATE/DELETE/STATEMENT | recovery.h | 149-154 |
LOG_REPL_RECORD (struct log_repl) | replication.h | 78 |
LOG_REPL_FLUSH enum | replication.h | 70 |
REPL_INFO_TYPE enum | replication.h | 43 |
LOG_TDES::repl_records group | log_impl.h | 522-528 |
REPL_LOG_INFO_ALLOC_SIZE | replication.c | 49 |
repl_log_info_alloc | replication.c | 165 |
repl_add_update_lsa | replication.c | 229 |
repl_log_insert | replication.c | 293 |
repl_log_insert_statement | replication.c | 512 |
repl_start_flush_mark | replication.c | 606 |
repl_end_flush_mark | replication.c | 635 |
repl_log_abort_after_lsa | replication.c | 673 |
log_append_repl_info_internal | log_manager.c | 4555 |
log_append_repl_info | log_manager.c | 4623 |
log_append_repl_info_with_lock | log_manager.c | 4629 |
log_append_repl_info_and_commit_log | log_manager.c | 4647 |
locator_insert_force | locator_sr.c | 4938 |
locator_update_force | locator_sr.c | 5396 |
locator_delete_force | locator_sr.c | 6116 |
locator_attribute_info_force | locator_sr.c | 7461 |
locator_add_or_remove_index | locator_sr.c | 7695 |
locator_update_index | locator_sr.c | 8260 |
xlogwr_get_log_pages | log_writer.c | 2571 |
logwr_initialize | log_writer.c | 428 |
logwr_set_hdr_and_flush_info | log_writer.c | 639 |
logwr_writev_append_pages | log_writer.c | 838 |
logwr_flush_all_append_pages | log_writer.c | 1016 |
logwr_flush_header_page | log_writer.c | 1207 |
logwr_archive_active_log | log_writer.c | 1275 |
logwr_write_log_pages | log_writer.c | 1512 |
logwr_copy_log_file | log_writer.c | 1659/1960 |
LA_RETRY_ON_ERROR (macro) | log_applier.h | 34 |
REPL_FILTER_TYPE (enum) | log_applier.h | 48 |
LA_CACHE_BUFFER/LA_CACHE_PB | log_applier.c | 177-204 |
LA_REPL_FILTER | log_applier.c | 206 |
LA_ITEM | log_applier.c | 236 |
LA_APPLY | log_applier.c | 254 |
LA_COMMIT | log_applier.c | 266 |
LA_INFO | log_applier.c | 279 |
LA_HA_APPLY_INFO | log_applier.c | 393 |
la_init_ha_apply_info | log_applier.c | 606 |
la_get_page_buffer | log_applier.c | 1297 |
la_get_ha_apply_info | log_applier.c | 1514 |
la_init_recdes_pool | log_applier.c | 2416 |
la_init_cache_pb | log_applier.c | 2474 |
la_init_cache_log_buffer | log_applier.c | 2528 |
la_init_repl_lists | log_applier.c | 2773 |
la_find_apply_list | log_applier.c | 2860 |
la_log_copy_fromlog | log_applier.c | 2960 |
la_add_repl_item | log_applier.c | 3050 |
la_make_repl_item | log_applier.c | 3092 |
la_set_repl_log | log_applier.c | 3419 |
la_add_node_into_la_commit_list | log_applier.c | 3473 |
la_get_log_data | log_applier.c | 3949 |
la_get_overflow_recdes | log_applier.c | 4249 |
la_get_next_update_log | log_applier.c | 4393 |
la_get_relocation_recdes | log_applier.c | 4552 |
la_get_recdes | log_applier.c | 4604 |
la_repl_add_object | log_applier.c | 4882 |
la_apply_delete_log | log_applier.c | 5000 |
la_apply_update_log | log_applier.c | 5110 |
la_apply_insert_log | log_applier.c | 5311 |
la_apply_statement_log | log_applier.c | 5496 |
la_apply_repl_log | log_applier.c | 5739 |
la_apply_commit_list | log_applier.c | 5920 |
la_log_record_process | log_applier.c | 6101 |
la_change_state | log_applier.c | 6397 |
la_log_commit | log_applier.c | 6531 |
la_init | log_applier.c | 6917 |
la_apply_log_file | log_applier.c | 8074 |
Cross-check Notes
The raw deck (HA replication.pdf / .pptx) was authored against an earlier branch. Most of what it shows is still accurate against the 11.5.x source under /data/hgryoo/references/cubrid; verifying each major claim against the source as of the document's updated: date was straightforward. The drift points are recorded here.
- repl_log_insert signature is unchanged. The deck shows tdes->repl_records[ ], tdes->num_repl_records, tdes->cur_repl_record, default size = 100, tdes->must_flush = LOG_REPL_NEED_FLUSH. All five names are present in log_impl.h:522-528 and replication.c:293. The refinements modern code adds are tdes->fl_mark_repl_recidx (log_impl.h:525), the RVREPL_DATA_UPDATE_START / RVREPL_DATA_UPDATE_END sub-kinds (recovery.h:152-154), and the tde_encrypted field on the struct itself (replication.h:88). The deck’s LOG_REPL_RECORD enumeration predates these additions.
- The LOG_REPL_RECORD::repl_data layout in the deck, | pkey size | class_name | pkey dbvalue |, matches current source. Verified at replication.c:411-419: the function reserves a leading OR_INT_SIZE for packed_key_value_size, then or-packs class_name, then or-packs the pkey DB_VALUE, then back-fills the leading int with the actual packed-key byte length.
- Commit-time emission goes through log_append_repl_info_*. The deck’s Log_commit() → Log_commit_local() → Log_append_repl_info_and_commit_log() → Log_append_repl_info() → Log_append_repl_info_with_lock() → Log_append_repl_info_internal() sequence matches the current source’s log_commit → log_append_repl_info_and_commit_log → log_append_repl_info_with_lock → log_append_repl_info_internal. The deck splits the with-lock and without-lock variants at the caller level; the source does the same.
- The atomic repl_info + commit_log idiom. The deck does not call out the atomicity but the current source documents it explicitly (log_manager.c:4642-4645): “Atomic write of replication log and commit log is crucial for replication consistencies. When a commit log of others is written in the middle of one’s replication and commit log, a restart of replication will break consistencies of slaves/replicas.”
- copylogdb request format matches. The deck describes the request as a (ctx_ptr->last_error, mode, First_pageid_torecv) tuple sent under NET_SERVER_LOGWR_GET_LOG_PAGES. The current source has xlogwr_get_log_pages at log_writer.c:2571 taking (THREAD_ENTRY*, LOG_PAGEID first_pageid, LOGWR_MODE mode) and the slave-side logwr_copy_log_file at log_writer.c:1659/1960 driving the request loop. The buffer-size constant LOGWR_COPY_LOG_BUFFER_NPAGES = 128 is unchanged.
- applylogdb per-record dispatch in la_log_record_process matches the deck. Verified at log_applier.c:6101. The switch arms (LOG_END_OF_LOG, LOG_REPLICATION_DATA / LOG_REPLICATION_STATEMENT, LOG_SYSOP_END / LOG_COMMIT, LOG_ABORT, LOG_DUMMY_CRASH_RECOVERY, LOG_END_CHKPT, LOG_DUMMY_HA_SERVER_STATE) are all present. The deck lists fewer arms because it focuses on the REPL + COMMIT path.
- la_apply_repl_log dispatch table is the same shape. Verified at log_applier.c:5797-5826. The deck shows RVREPL_DATA_INSERT → la_apply_insert_log, RVREPL_DATA_UPDATE → la_apply_update_log, RVREPL_DATA_DELETE → la_apply_delete_log. The current source also dispatches RVREPL_DATA_UPDATE_START and RVREPL_DATA_UPDATE_END to la_apply_update_log (the START/END brackets on row-based-replication boundaries). The deck does not mention these sub-kinds.
- The five-case la_get_recdes matches. Verified at log_applier.c:4604+. Cases 1-5 (normal, RVOVF_CHANGE_LINK, REC_BIGONE, RVHF_INSERT + REC_ASSIGN_ADDRESS, and (RVHF_UPDATE | RVHF_UPDATE_NOTIFY_VACUUM) + REC_RELOCATION) are all present.
- LA_RETRY_ON_ERROR is now broader. The deck does not enumerate the error mask. Current source (log_applier.h:34-46) lists ER_LK_UNILATERALLY_ABORTED, three flavors of ER_LK_OBJECT_TIMEOUT, ER_LK_PAGE_TIMEOUT, two flavors of ER_PAGE_LATCH_*, three flavors of ER_LK_OBJECT_DL_TIMEOUT, ER_TDE_CIPHER_IS_NOT_LOADED, and ER_LK_DEADLOCK_CYCLE_DETECTED: twelve codes total. TDE is the most recent addition.
- REPL_FILTER_TYPE and table-level filtering. The deck does not show the filter; current source has it in log_applier.h:48-53 (NONE, INCLUDE_TBL, EXCLUDE_TBL) with LA_REPL_FILTER consumed by la_need_filter_out inside la_apply_repl_log (log_applier.c:5797). The filter is evaluated per item on the slave at apply time, not on the master at emission time. This means filtered events still cost a la_get_recdes walk on the slave; only the final la_repl_add_object is skipped.
- MVCC injection on the slave. The deck does not cover this. la_make_room_for_mvcc_insid and la_make_room_for_mvcc_delid_and_prev_ver (log_applier.c, declared near 503-504) reserve space in the reconstructed RECDES so the slave server’s MVCC layer can stamp the slave’s own MVCCID at apply time. The master’s MVCC IDs are not copied; the slave generates fresh ones.
- la_log_record_process handles LOG_DUMMY_HA_SERVER_STATE for role-change detection. Verified at log_applier.c:6292+. When ha_server_state->state is not HA_SERVER_STATE_ACTIVE and not HA_SERVER_STATE_TO_BE_STANDBY, the daemon sets is_role_changed = true and returns ER_INTERRUPTED so the caller can shut the daemon down cleanly. The deck does not cover this path but it is the mechanism by which a master-to-slave demotion (driven by cub_master heartbeat failover, see cubrid-heartbeat.md) propagates into the applier.
- is_long_trans overflow handling is unchanged. Verified at log_applier.c:3437-3443: when apply->num_items >= LA_MAX_REPL_ITEMS (1000), the daemon frees all items except the head, sets is_long_trans = true, and starts tracking only last_lsa. Apply for such a trid will re-walk the log between start_lsa and last_lsa. The deck does not surface this but the constants and branch are present in current source.
- TDE on the master-side staging entry. The deck does not cover TDE. Current LOG_REPL_RECORD::tde_encrypted (replication.h:88) is set in repl_log_insert based on heap_get_class_tde_algorithm; on the prior-list emission side, prior_set_tde_encrypted is called when the flag is true (log_manager.c:4585-4592). On the slave, the daemon’s la_load_tde (and logwr_load_tde on the copy side) handles the symmetric decrypt.
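The back-filled length prefix in the | pkey size | class_name | pkey dbvalue | layout noted in the list above can be sketched as follows. This is a simplified illustration under stated assumptions: pack_repl_data is a hypothetical name, the class name is stored as a plain NUL-terminated string, and the integer is written in host byte order without alignment padding, which is not how CUBRID's or_pack_* routines encode data.

```c
#include <stdint.h>
#include <string.h>

/* Layout: [int32 key_len][class_name bytes + NUL][key bytes].
 * The leading int is reserved first and filled in last, once the packed
 * key length is known. Returns total bytes written, or 0 on bad input
 * or if buf is too small. */
size_t pack_repl_data (uint8_t *buf, size_t cap, const char *class_name,
                       const uint8_t *key, int32_t key_len)
{
  if (key_len < 0)
    return 0;
  size_t name_len = strlen (class_name) + 1;
  size_t total = sizeof (int32_t) + name_len + (size_t) key_len;
  if (total > cap)
    return 0;

  uint8_t *p = buf + sizeof (int32_t);           /* reserve the leading int */
  memcpy (p, class_name, name_len);
  p += name_len;

  uint8_t *key_start = p;
  memcpy (p, key, (size_t) key_len);
  p += key_len;

  int32_t packed_len = (int32_t) (p - key_start);
  memcpy (buf, &packed_len, sizeof packed_len);  /* back-fill the prefix */
  return (size_t) (p - buf);
}
```

Reserving the prefix first and writing it last lets a packer whose output size is only known after packing still emit a length-prefixed record in a single pass over the buffer.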
Open Questions
Section titled “Open Questions”-
Synchronous replication mode. The model section calls out sync vs. async; current source ships only the async path.
LOGWR_MODE(passed tologwr_copy_log_fileandxlogwr_get_log_pages) is enumerated and the deck showsmodein the request. Are there sync values, or only async? Investigation: read theLOGWR_MODEenum and grep for non-LOGWR_MODE_ASYNCwriters. -
fl_mark_repl_recidxsemantics for DDL. Therepl_start_flush_mark/repl_end_flush_markpair setsfl_mark_repl_recidxto bracket DDL records. The intent is that records inside the bracket carrymust_flush = LOG_REPL_NEED_FLUSHso they emit even on rollback (DDL is non-transactional in CUBRID for replication purposes). What guarantees this is robust against nested DDL and partial rollback? Investigation path: read themust_flushwriters and their interaction withrepl_log_abort_after_lsa. -
The
repl_lists[]bucket size.LA_INFO::repl_cntis the number of buckets; the deck does not specify how it is sized, andla_init_repl_listsshows a realloc-on-demand pattern. What is the initial cap and what triggers regrow? Investigation path: readla_init_repl_lists(log_applier.c:2773) andla_add_apply_list. -
Long-transaction re-walk performance. When a trid trips
is_long_trans,la_get_next_repl_item_from_logwalks the log forward looking for the next REPL record for the same trid. The cost is O(records-since-start-lsa) per item. What limits this from becoming O(N²) for huge transactions? Investigation path: readla_get_next_repl_item_from_logand measure on a synthetic million-row update. -
TDE key sharing between
copylogdbandapplylogdb. TheUNSTABLE_TDE_FOR_REPLICATION_LOGguard inlog_applier.c(lines 350-352) shows a unix-socket protocol betweencopylogdband the apply side for sharing TDE data keys. The “unstable” name suggests this is not production. Is TDE + replication actually supported, or is it an internal-only feature flag? Investigation path: search for the symbol in release notes and CMake feature flags. -
logpb_get_guess_archive_numworst-case behavior. When thelogpb_arv_page_info_tablecache is cold, the master’s archive lookup falls back to estimating + scanning. On a node with thousands of archives, what is the worst-case latency? Investigation path: read the function body and measure with a pre-built archive set. -
The
_db_ha_apply_inforow’s recovery semantics. On slave-server crash mid-apply, the row holds the last acknowledgedcommitted_rep_lsa; on restart the daemon re-walks from that point. Butla_log_commitupdates the row transactionally on the slave server, and that server’s own recovery may roll the row back. What happens if the slave server crashes betweenla_repl_add_objectandla_log_commit? Are records re-applied (and idempotent against PK uniqueness), or is there a separate per-item ack? Investigation path: readla_log_commit(log_applier.c:6531) and trace the slave- server’s recovery interaction. -
Statement-based replication and non-determinism. The
LOG_REPLICATION_STATEMENTpath replays SQL text viala_apply_statement_log. CUBRID does not block non-deterministic functions (NOW(),RAND()) at master emission time. What prevents drift between master and slave on such statements? Investigation path: readla_apply_statement_log(log_applier.c:5496) and check for pre-bound parameter substitution. -
Filter race on consumer reconfigure. The
LA_REPL_FILTERis loaded byla_create_repl_filterat daemon start and consulted byla_need_filter_outper item. If an operator changes the filter list while the daemon is running, when does the new filter take effect? Investigation path: readla_create_repl_filterand check for SIGHUP handlers. -
Replica vs. slave distinction. The heartbeat module distinguishes
HB_NSTATE_SLAVEfromHB_NSTATE_REPLICA; both runapplylogdb, but a replica can never become master. Does the apply path differ between slave and replica, or are the two roles purely about the cluster-side FSM? Investigation path: cross-referencecubrid-heartbeat.md’sHB_NSTATE_REPLICAwith apply-side branches (la_check_replica_info, etc.).
Sources
Raw analyses
- raw/code-analysis/cubrid/distributed/HA replication.pdf: the PDF render of the deck.
- raw/code-analysis/cubrid/distributed/HA replication.pptx: the source slide deck.
- raw/code-analysis/cubrid/distributed/_converted/ha-replication.pdf.md: pdftotext extract of the PDF.
- raw/code-analysis/cubrid/distributed/_converted/ha-replication.pptx.md: markitdown extract of the PPTX.
Sibling docs in this knowledge base
- knowledge/code-analysis/cubrid/cubrid-log-manager.md: the WAL machinery the master emits into and the slave's applylogdb walks.
- knowledge/code-analysis/cubrid/cubrid-cdc.md: the modern pull-style alternative; shares log_record.hpp types and the la_apply_* legacy code path.
- knowledge/code-analysis/cubrid/cubrid-heartbeat.md: the cub_master cluster FSM that supervises copylogdb and applylogdb and triggers the role changes the apply daemon detects via LOG_DUMMY_HA_SERVER_STATE.
- knowledge/code-analysis/cubrid/cubrid-recovery-manager.md: the master-side analysis/redo/undo passes share record decoding and log_reader infrastructure with applylogdb.
Textbook chapters
- Designing Data-Intensive Applications (Kleppmann), Ch. 5 "Replication": primary/standby, sync vs async, statement vs row vs WAL shipping.
- Database Internals (Petrov), Ch. 13 "Replication": leader-follower, fail-over and consistency guarantees, log shipping.
- Database System Concepts (Silberschatz, Korth, Sudarshan), Ch. 19 "Recovery System" + Ch. 23 "Distributed Databases": replication consistency models, distributed commit, recovery on replicas.
CUBRID source (/data/hgryoo/references/cubrid/)
- src/transaction/replication.c / replication.h: the master-side staging primitives.
- src/transaction/log_manager.c / log_manager.h: log_append_repl_info_* family.
- src/transaction/log_record.hpp: LOG_REPLICATION_DATA and LOG_REPLICATION_STATEMENT record-type enum entries.
- src/transaction/recovery.h: RVREPL_DATA_* recovery indices.
- src/transaction/log_impl.h: LOG_TDES replication fields.
- src/transaction/log_writer.c / log_writer.h: the master-side xlogwr_* server endpoint and the slave-side logwr_* daemon (i.e., copylogdb).
- src/transaction/log_applier.c / log_applier.h: the slave-side la_* daemon (i.e., applylogdb).
- src/storage/heap_file.c: heap_*_logical / heap_log_*_physical emission sites.
- src/storage/btree.c: btree_update / btree_insert index side, where repl_log_insert is called for index ops.
- src/transaction/locator_sr.c: locator_*_force and locator_attribute_info_force, the upstream entry points that drive both the heap log and the replication staging.
- src/executables/util_service.c: cubrid hb start / cubrid heartbeat start wires cub_master, copylogdb, and applylogdb together.
- src/connection/heartbeat.c: the process-side hb_register_to_master invoked by both daemons on startup.