CUBRID Checkpoint — Fuzzy ARIES Checkpoint Protocol, Active-TX Snapshot, and Recovery Anchor LSA
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
A checkpoint is the database engine’s negotiated truce with the recovery manager: a periodic record in the write-ahead log whose position bounds the work the next restart will have to do. Without checkpoints, the analysis pass at restart would have to walk the entire log from the beginning of time — every record ever written — because it cannot otherwise know which transactions were active when the engine crashed and which dirty pages may not have reached their home volumes. With checkpoints, the engine periodically commits a piece of self-knowledge to the log: “as of this LSA, here is the set of in-flight transactions and here is the smallest LSA whose data page is still dirty in memory; recovery may safely begin its analysis pass from that LSA.” Database Internals (Petrov, ch. 5 §“Recovery”) frames it as the fundamental tool that makes restart time bounded rather than proportional to engine uptime.
The contract every checkpoint protocol must satisfy is laid out in the ARIES paper (Mohan et al., TODS 17.1, 1992). The paper distinguishes two checkpoint families on a single axis — how much of the engine has to stop while the checkpoint runs.
- Sharp / consistent / quiescent checkpoints. Drain all in-flight transactions, flush every dirty page, then write one checkpoint record. Captured state is consistent. Cost: every checkpoint freezes user transactions for the duration of the buffer-pool flush. Bernstein/Hadzilacos/Goodman (ch. 6 §“Checkpointing”) describe this as the textbook variant; few production engines use it because the throughput hit is unacceptable.
- Fuzzy / non-blocking checkpoints. Take a snapshot of the active-transaction set and the dirty-page set without freezing them, write it to the log between two bracketing records (begin-CHKPT and end-CHKPT in ARIES parlance), and continue serving traffic. Captured state is internally inconsistent — between the bracket records, transactions commit, abort, and dirty new pages — but ARIES proves the analysis pass tolerates this by treating any record in the window the same as a record after end-CHKPT.
CUBRID picks the fuzzy variant. ARIES frames correctness around three properties:
- Coverage. Every transaction active at end-CHKPT must either appear in the snapshot or have its activity visible in records after end-CHKPT. ARIES guarantees this by walking the trantable under a read-mode lock.
- Anchoring. The redo-LSA hint must be a lower bound on the LSA of any dirty page. Anything below is provably on disk; anything above may or may not be.
- Restartability. A crash during checkpoint must not corrupt the recovery boundary. ARIES makes the checkpoint entirely log-resident — the in-memory pointer (chkpt_lsa) advances only after both bracket records are durable.
The two-step (begin-CHKPT / end-CHKPT) pattern is the canonical ARIES expression of these properties. The rest of this document is the slow zoom into how CUBRID realises it.
Common DBMS Design
Almost every WAL-based engine implements a fuzzy checkpoint
protocol; the names differ but the mechanics are remarkably stable.
This section names the shared engineering vocabulary so that the
CUBRID-specific symbols in the “CUBRID’s Approach” section slot
into a familiar shape.
Periodic timer or threshold trigger
The checkpoint is launched either on a wall-clock timer or when
the log has consumed a configured amount of space since the
previous checkpoint. The two policies are usually combined.
Time-based triggering bounds recovery time after long quiet
periods; space-based triggering bounds it after busy periods.
PostgreSQL exposes checkpoint_timeout + max_wal_size; CUBRID
exposes log_checkpoint_interval (time) and log_checkpoint_size
(log-page count).
Active-transaction snapshot
The checkpoint emits one entry per in-flight transaction with
enough state to re-bootstrap recovery’s transaction table from the
checkpoint record alone. The minimum is (trid, state, head_lsa, tail_lsa); state distinguishes losers (active, will be undone)
from in-doubts (2PC-prepared, kept alive) from
committed-with-postpone (need postpones replayed). Most engines
also capture undo_nxlsa and savepoint LSAs.
Dirty-page snapshot — or the redo-LSA hint that subsumes it
Two design points. (a) Some engines emit the entire dirty-page
table — (volid, pageid, recovery_lsa) triples — so recovery
reconstructs the DPT directly; this is the layout the ARIES paper
itself describes. Downside: a busy engine
emits a multi-megabyte record. (b) Most modern engines compress
emits a multi-megabyte record. (b) Most modern engines compress
the DPT into a single redo-LSA scalar (smallest recovery_lsa).
Recovery rebuilds the DPT inline by walking forward from this LSA.
CUBRID takes the compressed approach: LOG_REC_CHKPT.redo_lsa is
a single LSA.
Bracketing records — begin-CHKPT and end-CHKPT
The checkpoint emits begin-CHKPT (marking the LSA the next analysis
starts from) and end-CHKPT (carries the snapshot). Between them
ordinary transactions continue producing log records; ARIES proves
that bracket-window records are correctly handled by analysis as
if they were after end-CHKPT.
Page-buffer flush coordination & header durability
The checkpoint drives enough dirty pages to disk that the next
redo-LSA can advance — otherwise the redo-LSA stays pinned and
recovery work keeps growing. PostgreSQL’s CheckPointBuffers,
InnoDB’s flush-list worker, and CUBRID’s pgbuf_flush_checkpoint
all share this role. Recovery starts by reading the active-log
header to find the checkpoint LSA; the header therefore carries a
chkpt_lsa field that advances only after the checkpoint record
is durable.
Comparative landscape
| Engine | Header / pointer | Trigger | Dirty-page hint |
|---|---|---|---|
| PostgreSQL | pg_control.checkPoint | checkpoint_timeout + max_wal_size | redoLSN scalar; runningXacts side record |
| InnoDB | log-file header | log-bytes-since-last + dirty-pages-pct | oldest_modification per page tracker |
| Oracle | controlfile SCN | redo-log size + manual checkpoint | Mean Time To Recover targets |
| SQL Server | bootpage dbi_checkptLSN | recovery-interval target | dirty-page LSN watermark |
| CUBRID | log_Gl.hdr.chkpt_lsa | log_checkpoint_interval (timer) | LOG_REC_CHKPT.redo_lsa scalar |
All entries are fuzzy. CUBRID sits in the mainstream; its distinctive choices are the explicit dual bracket records (faithful to the ARIES paper) and a timer-only trigger.
Theory ↔ CUBRID mapping
| Theoretical concept | CUBRID name |
|---|---|
| Fuzzy checkpoint daemon | log_Checkpoint_daemon (created by log_checkpoint_daemon_init) |
| Daemon period (timer) | log_get_checkpoint_interval reading PRM_ID_LOG_CHECKPOINT_INTERVAL_SECS |
| Daemon body | log_checkpoint_execute → logpb_checkpoint |
| Begin-CHKPT log record | LOG_START_CHKPT (= 25) |
| End-CHKPT log record | LOG_END_CHKPT (= 26) |
| End-CHKPT payload | LOG_REC_CHKPT { redo_lsa, ntrans, ntops } |
| Per-tran snapshot row | LOG_INFO_CHKPT_TRANS { trid, state, head_lsa, tail_lsa, undo_nxlsa, posp_nxlsa, savept_lsa, ... } |
| Per-sysop snapshot row | LOG_INFO_CHKPT_SYSOP { trid, sysop_start_postpone_lsa, atomic_sysop_start_lsa } |
| Recovery anchor LSA in log header | log_Gl.hdr.chkpt_lsa |
| In-memory copy of last redo-LSA | log_Gl.chkpt_redo_lsa |
| Watermark for archive removal | log_Gl.hdr.smallest_lsa_at_last_chkpt |
| Mutex protecting chkpt_lsa & chkpt_redo_lsa | log_Gl.chkpt_lsa_lock |
| Active-TX walk | for i in trantable.num_total_indices: logpb_checkpoint_trans(...) |
| Active-sysop walk | logpb_checkpoint_topops (called twice — user trans and system trans) |
| Page-buffer flush helper | pgbuf_flush_checkpoint (flush_upto, prev_redo, *out_redo, *out_npages) |
| Force log header to disk | logpb_flush_header |
| Force WAL pages to disk | logpb_flush_pages_direct |
| File-system-level fsync of all volumes | fileio_synchronize_all (DWB cooperates inside) |
| Restart entry that consumes the checkpoint | log_recovery → log_recovery_analysis (start_lsa = log_Gl.hdr.chkpt_lsa) |
| Per-record analysis arms for chkpt records | log_rv_analysis_start_checkpoint, log_rv_analysis_end_checkpoint |
| LOG_ISCHECKPOINT_TIME macro | Page-count-based predicate (legacy path) in log_manager.c |
CUBRID’s Approach
CUBRID’s checkpoint subsystem has six moving parts: the daemon
registration that wakes the checkpoint thread on a timer; the
preflight phase that flushes the existing dirty pages and emits
LOG_START_CHKPT; the active-transaction snapshot captured by
walking the trantable under a read-mode lock; the active-sysop
snapshot for nested top-operations in commit-postpone; the
end-CHKPT emission that packs the snapshot into LOG_REC_CHKPT
plus its trailing arrays; and finally the header-update + fsync
that publishes the new chkpt_lsa so the next restart anchors on
it. We walk them in that order, then close with the recovery-side
view (how log_recovery_analysis consumes what was written) and
the cooperation with the double-write buffer.
Daemon registration
log_Checkpoint_daemon is a cubthread::daemon declared at file
scope in log_manager.c:
```cpp
// log_Checkpoint_daemon — src/transaction/log_manager.c
static cubthread::daemon *log_Checkpoint_daemon = NULL;
```

It is created by log_checkpoint_daemon_init, which the global
log-daemon bootstrap (log_daemons_init) calls during server start.
The daemon’s tick period comes from the
PRM_ID_LOG_CHECKPOINT_INTERVAL_SECS system parameter; the looper
binds a callback that re-reads the parameter on every tick so the
period can be retuned at runtime via db_change_active_log_arg-style
calls.
```cpp
// log_checkpoint_daemon_init — src/transaction/log_manager.c
REGISTER_DAEMON (log_checkpoint);

void
log_checkpoint_daemon_init ()
{
  assert (log_Checkpoint_daemon == NULL);

  cubthread::looper looper = cubthread::looper (log_get_checkpoint_interval);
  cubthread::entry_callable_task *daemon_task = new cubthread::entry_callable_task (log_checkpoint_execute);

  log_Checkpoint_daemon = cubthread::get_manager ()->create_daemon (looper, daemon_task, "log-checkpoint");
}
```

```cpp
// log_get_checkpoint_interval — src/transaction/log_manager.c
void
log_get_checkpoint_interval (bool & is_timed_wait, cubthread::delta_time & period)
{
  int log_checkpoint_interval_sec = prm_get_integer_value (PRM_ID_LOG_CHECKPOINT_INTERVAL_SECS);
  assert (log_checkpoint_interval_sec >= 0);

  if (log_checkpoint_interval_sec > 0)
    {
      is_timed_wait = true;
      period = std::chrono::seconds (log_checkpoint_interval_sec);
    }
  else
    {
      // infinite wait — checkpoint disabled until someone wakes it explicitly
      is_timed_wait = false;
    }
}
```

The default is 360 seconds (6 minutes), observable in
system_parameter.c where PRM_ID_LOG_CHECKPOINT_INTERVAL_SECS is
declared with default 360. Setting the parameter to zero disables
the timer entirely; the daemon will then wait indefinitely until
something calls log_wakeup_checkpoint_daemon. The user-facing wakeup
is the path used by ad-hoc cubrid_check requests and by the
(void) logpb_checkpoint (thread_p) call inside log_recovery after
restart finishes — the engine takes a fresh checkpoint as the very
last act of restart so the next crash starts from a clean boundary.
The daemon body is a thin wrapper:
```cpp
// log_checkpoint_execute — src/transaction/log_manager.c
static void
log_checkpoint_execute (cubthread::entry & thread_ref)
{
  if (!BO_IS_SERVER_RESTARTED ())
    {
      // wait for boot to finish — if we ran during analysis the trantable
      // would not be populated yet
      return;
    }

  logpb_checkpoint (&thread_ref);
}
```

BO_IS_SERVER_RESTARTED is the boolean that flips when boot has
fully completed, including the three-pass restart-recovery (analysis
→ redo → undo) plus all postpone replays. Until that flag is true
the daemon refuses to run — taking a checkpoint mid-restart would
record a transaction table that is itself being mutated by recovery.
A legacy page-count trigger is preserved as a macro:
```cpp
// LOG_ISCHECKPOINT_TIME — src/transaction/log_manager.c
#define LOG_ISCHECKPOINT_TIME() \
  (log_Gl.rcv_phase == LOG_RESTARTED \
   && log_Gl.run_nxchkpt_atpageid != NULL_PAGEID \
   && log_Gl.hdr.append_lsa.pageid >= log_Gl.run_nxchkpt_atpageid)
```

run_nxchkpt_atpageid is a per-process counter that
logpb_checkpoint advances by chkpt_every_npages (default
100000 log pages, clamped to ≥ PRM_ID_LOG_NBUFFERS) every time it
finishes. The intent was for log appenders to poll on each append
and trigger an inline checkpoint; in the modern code path the
daemon-driven timer dominates and the macro is defensive
belt-and-braces.
Top-level flow — logpb_checkpoint
```mermaid
sequenceDiagram
  participant T as log-checkpoint daemon
  participant CHK as logpb_checkpoint
  participant LOG as log append
  participant TT as trantable
  participant PB as page buffer
  participant DWB as DWB
  participant FS as filesystem
  participant HDR as log header page
  Note over T: timer tick (PRM_ID_LOG_CHECKPOINT_INTERVAL_SECS)
  T->>CHK: log_checkpoint_execute
  CHK->>LOG: logpb_flush_pages_direct (drain in-flight WAL)
  CHK->>LOG: prior_lsa_alloc_and_copy_data(LOG_START_CHKPT)
  Note right of CHK: newchkpt_lsa = LSA of begin record
  CHK->>PB: pgbuf_flush_checkpoint(newchkpt_lsa, prev_redo, &out_redo, &nflushed)
  PB->>DWB: dwb_flush_force (page-by-page)
  DWB->>FS: fsync DWB volume
  DWB->>FS: write home pages
  PB-->>CHK: tmp_chkpt.redo_lsa = smallest dirty LSA remaining
  CHK->>FS: fileio_synchronize_all (force all data volumes)
  CHK->>TT: TR_TABLE_CS_ENTER (read mode)
  CHK->>TT: walk trantable, build LOG_INFO_CHKPT_TRANS[]
  CHK->>TT: walk trantable, build LOG_INFO_CHKPT_SYSOP[]
  CHK->>LOG: prior_lsa_alloc_and_copy_data(LOG_END_CHKPT, packed_arrays)
  CHK->>TT: TR_TABLE_CS_EXIT
  CHK->>LOG: logpb_flush_pages_direct (force end record)
  CHK->>HDR: log_Gl.hdr.chkpt_lsa = newchkpt_lsa
  CHK->>HDR: log_Gl.hdr.smallest_lsa_at_last_chkpt = smallest_head_lsa
  CHK->>LOG: logpb_flush_header
  CHK->>FS: write checkpoint LSA into every volume's disk header
```
The numbered structure of logpb_checkpoint (spans roughly
log_page_buffer.c:6877 to :7300):
1. LOG_CS_ENTER. Exclusive on the log structure (the prior-list mutex sits inside).
2. Refuse if recovery hasn’t finished: if (BO_IS_SERVER_RESTARTED () && log_Gl.run_nxchkpt_atpageid == NULL_PAGEID) return; — the NULL_PAGEID sentinel also serves as the “only one checkpoint at a time” guard.
3. Snapshot the previous chkpt_lsa under chkpt_lsa_lock so readers like log_get_db_start_parameters don’t take the log CS.
4. Drain in-flight WAL via logpb_flush_pages_direct — every record before begin-CHKPT must be durable.
5. Emit LOG_START_CHKPT. A bare record; its LSA is captured as newchkpt_lsa, the next restart’s analysis start.
6. logtb_reflect_global_unique_stats_to_btree. CUBRID’s btrees keep cached unique counters that must be flushed to the catalog btree before the redo-LSA advances past them.
7. pgbuf_flush_checkpoint (newchkpt_lsa, …). Picks every BCB whose oldest_unflush_lsa <= newchkpt_lsa and flushes them through the DWB. The smallest remaining LSA is returned in tmp_chkpt.redo_lsa — the redo-LSA hint.
8. fileio_synchronize_all. fsync(2) every data volume.
9. Walk the trantable under TR_TABLE_CS_ENTER (read). Iterate 0..trantable.num_total_indices; skip the system transaction (the checkpoint runs as it). Each live LOG_TDES becomes one logpb_checkpoint_trans row.
10. Walk again for active sysops in TRAN_UNACTIVE_TOPOPE_COMMITTED_WITH_POSTPONE via logpb_checkpoint_topops.
11. Emit LOG_END_CHKPT. prior_lsa_alloc_and_copy_data with LOG_REC_CHKPT as data_header and the two trailing arrays.
12. Force the end record via another logpb_flush_pages_direct.
13. Update in-memory pointers under chkpt_lsa_lock: log_Gl.hdr.chkpt_lsa = newchkpt_lsa, log_Gl.chkpt_redo_lsa = tmp_chkpt.redo_lsa, log_Gl.hdr.smallest_lsa_at_last_chkpt ← smallest tran-head_lsa (the archive-removal watermark).
14. logpb_flush_header. Writes the active-log header page — after this fsync the boundary is durable.
15. Stamp every volume’s disk header for media recovery (read by log_rv_find_checkpoint during restart from a backup).
The two-phase commit pattern of the checkpoint itself is worth
naming: the checkpoint becomes effective when the log header is
durable, not when the end-CHKPT record is durable. A crash
between step 12 and step 14 leaves the log with both checkpoint
records present but the in-memory header pointer not yet on disk;
the next restart reads the on-disk header, finds the previous
chkpt_lsa, and runs analysis from there. The new bracket records
appear during that analysis and get treated as ordinary records;
because they don’t carry information the analysis needs (the
previous checkpoint already covered everything they would say
plus more), they are effectively ignored — the
log_rv_analysis_end_checkpoint arm short-circuits when
*may_use_checkpoint == false, which is the case if the start
record’s LSA didn’t match the analysis start LSA. Correctness is
preserved by paying the cost of redoing one checkpoint window.
The active-transaction snapshot — logpb_checkpoint_trans
The per-TDES extractor is short enough to read in full:
```cpp
// logpb_checkpoint_trans — src/transaction/log_page_buffer.c
void
logpb_checkpoint_trans (LOG_INFO_CHKPT_TRANS * chkpt_entries, log_tdes * tdes, int &ntrans, int &ntops,
                        LOG_LSA & smallest_lsa)
{
  LOG_INFO_CHKPT_TRANS *chkpt_entry = &chkpt_entries[ntrans];

  if (tdes != NULL && tdes->trid != NULL_TRANID && !tdes->tail_lsa.is_null () && tdes->commit_abort_lsa.is_null ())
    {
      chkpt_entry->isloose_end = tdes->isloose_end;
      chkpt_entry->trid = tdes->trid;
      chkpt_entry->state = tdes->state;
      LSA_COPY (&chkpt_entry->head_lsa, &tdes->head_lsa);
      LSA_COPY (&chkpt_entry->tail_lsa, &tdes->tail_lsa);

      if (chkpt_entry->state == TRAN_UNACTIVE_ABORTED)
        {
          /* Transaction is in the middle of an abort, since rollback is
           * not run in a critical section. Set the undo point to be the
           * same as its tail. The recovery process will read the last
           * record which is likely a compensating one, and find where to
           * continue a rollback operation. */
          LSA_COPY (&chkpt_entry->undo_nxlsa, &tdes->tail_lsa);
        }
      else
        {
          LSA_COPY (&chkpt_entry->undo_nxlsa, &tdes->undo_nxlsa);
        }

      LSA_COPY (&chkpt_entry->posp_nxlsa, &tdes->posp_nxlsa);
      LSA_COPY (&chkpt_entry->savept_lsa, &tdes->savept_lsa);
      LSA_COPY (&chkpt_entry->tail_topresult_lsa, &tdes->tail_topresult_lsa);
      LSA_COPY (&chkpt_entry->start_postpone_lsa, &tdes->rcv.tran_start_postpone_lsa);
      strncpy (chkpt_entry->user_name, tdes->client.get_db_user (), LOG_USERNAME_MAX);
      ntrans++;

      if (tdes->topops.last >= 0 && (tdes->state == TRAN_UNACTIVE_TOPOPE_COMMITTED_WITH_POSTPONE))
        {
          ntops += tdes->topops.last + 1;
        }

      if (LSA_ISNULL (&smallest_lsa) || LSA_GT (&smallest_lsa, &tdes->head_lsa))
        {
          LSA_COPY (&smallest_lsa, &tdes->head_lsa);
        }
    }
}
```

Three things deserve marking up. (a) The eligibility test
filters out three categories: null trantable slots, transactions
that haven’t logged anything yet (tail_lsa.is_null ()), and
transactions that have already appended their commit/abort record
(commit_abort_lsa.is_null () is false). The third test is
notable: a transaction whose commit record is in the prior list but
not yet drained still counts as in-flight from the trantable’s
perspective, but its commit record being already appended means
recovery will see it and resolve it without needing the snapshot
entry. The condition is “the actual transaction state is ignored
by the checkpoint mechanism as long as either the commit or the
abort log records have been appended” — the source comment is
explicit. (b) The TRAN_UNACTIVE_ABORTED path forces
undo_nxlsa = tail_lsa. Rollback in CUBRID does not hold a
critical section, so a checkpoint can land mid-rollback; the
snapshot must be a position recovery can resume from, and the tail
of the chain (most likely a CLR for the last completed undo step)
is the safe rendezvous. (c) The smallest_lsa accumulator is
the watermark used outside the loop to update
log_Gl.hdr.smallest_lsa_at_last_chkpt. This is not the redo-LSA
used by the analysis pass; it is the archive-retention watermark
— no log archive whose pages are all below this LSA is needed for
crash recovery, so archive removal is gated on it.
The active-sysop snapshot — logpb_checkpoint_topops
Sysops (CUBRID’s nested top-operations, equivalent to ARIES’s mini-transactions) only need to be snapshotted when they have side-effects that recovery’s postpone pass would replay. The extractor’s eligibility test reads:
```cpp
// logpb_checkpoint_topops — src/transaction/log_page_buffer.c (excerpt)
if (tdes != NULL && tdes->trid != NULL_TRANID
    && (!LSA_ISNULL (&tdes->rcv.sysop_start_postpone_lsa) || !LSA_ISNULL (&tdes->rcv.atomic_sysop_start_lsa)))
  {
    /* this transaction is running system operation postpone or an
     * atomic system operation
     * note: we cannot compare tdes->state with
     * TRAN_UNACTIVE_TOPOPE_COMMITTED_WITH_POSTPONE. we are
     * not synchronizing setting transaction state.
     * however, setting tdes->rcv.sysop_start_postpone_lsa is
     * protected by log_Gl.prior_info.prior_lsa_mutex. so we
     * check this instead of state. */
    ...
    LOG_INFO_CHKPT_SYSOP *chkpt_topop = &chkpt_topops[ntops];
    chkpt_topop->trid = tdes->trid;
    chkpt_topop->sysop_start_postpone_lsa = tdes->rcv.sysop_start_postpone_lsa;
    chkpt_topop->atomic_sysop_start_lsa = tdes->rcv.atomic_sysop_start_lsa;
    ntops++;
  }
```

The comment is the load-bearing piece. The natural eligibility
predicate would be tdes->state == TRAN_UNACTIVE_TOPOPE_COMMITTED_WITH_POSTPONE, but tdes->state
mutates without holding a mutex (the writers update it
optimistically and rely on per-record log entries for correctness).
What is protected by a mutex is the LSA that records “this
transaction has entered sysop-postpone”: setting
sysop_start_postpone_lsa requires prior_lsa_mutex. The
checkpoint walk already holds prior_lsa_mutex (it took the
mutex to guarantee no new prior-list nodes appear during the walk),
so the LSA test is the safe stand-in.
The captured LOG_INFO_CHKPT_SYSOP has only three fields — trid,
sysop_start_postpone_lsa, atomic_sysop_start_lsa. Recovery’s
analysis pass uses these to seed tdes->rcv.sysop_start_postpone_lsa
on the rebuilt TDES, which then drives the postpone replay during
log_recovery_finish_all_postpone.
There is a subtle interaction with the ntops counter. The first
trantable walk computes ntops by counting transactions in
TRAN_UNACTIVE_TOPOPE_COMMITTED_WITH_POSTPONE. The second walk
re-derives the actual ntops from the eligibility test above. The
two counts can diverge — a transaction can transition between
states between the two walks — and the second walk wins. The
length_all_tops buffer is reallocated inside logpb_checkpoint_topops
if the second walk’s running count exceeds the first walk’s
estimate.
The end record — LOG_REC_CHKPT and its trailing arrays
The on-log shape:
```cpp
// LOG_REC_CHKPT — src/transaction/log_record.hpp
typedef struct log_rec_chkpt LOG_REC_CHKPT;
struct log_rec_chkpt
{
  LOG_LSA redo_lsa;  /* Oldest LSA of dirty data page in page buffers */
  int ntrans;        /* Number of active transactions */
  int ntops;         /* Total number of system operations */
};

/* Transaction descriptor */
typedef struct log_info_chkpt_trans LOG_INFO_CHKPT_TRANS;
struct log_info_chkpt_trans
{
  int isloose_end;
  TRANID trid;                 /* Transaction identifier */
  TRAN_STATE state;            /* Transaction state (e.g., Active, aborted) */
  LOG_LSA head_lsa;            /* First log address of transaction */
  LOG_LSA tail_lsa;            /* Last log record address of transaction */
  LOG_LSA undo_nxlsa;          /* Next log record address for UNDO purposes */
  LOG_LSA posp_nxlsa;          /* First address of a postpone record */
  LOG_LSA savept_lsa;          /* Address of last savepoint */
  LOG_LSA tail_topresult_lsa;  /* Address of last partial abort/commit */
  LOG_LSA start_postpone_lsa;  /* Address of start postpone */
  char user_name[LOG_USERNAME_MAX];
};

typedef struct log_info_chkpt_sysop LOG_INFO_CHKPT_SYSOP;
struct log_info_chkpt_sysop
{
  TRANID trid;
  LOG_LSA sysop_start_postpone_lsa;
  LOG_LSA atomic_sysop_start_lsa;
};
```

The on-disk layout of the end-CHKPT record is therefore:

```
| LOG_RECORD_HEADER (back/forw/trid/type=LOG_END_CHKPT) |
| LOG_REC_CHKPT { redo_lsa, ntrans, ntops }             |
| LOG_INFO_CHKPT_TRANS [ntrans]                         |
| LOG_INFO_CHKPT_SYSOP [ntops]                          |
```

Two implementation details worth marking up. (a) The two arrays
are heap-allocated separately (malloc(ntrans * sizeof(...)) for
trans, ditto for topops) and the prior-list-allocator is then asked
to copy them into the appended record:
```cpp
// from logpb_checkpoint, end-CHKPT emission
node = prior_lsa_alloc_and_copy_data (thread_p, LOG_END_CHKPT, RV_NOT_DEFINED, NULL,
                                      length_all_chkpt_trans, (char *) chkpt_trans,
                                      (int) length_all_tops, (char *) chkpt_topops);

chkpt = (LOG_REC_CHKPT *) node->data_header;
*chkpt = tmp_chkpt;

prior_lsa_next_record_with_lock (thread_p, node, tdes);
```

The prior_lsa_alloc_and_copy_data overload accepts two payloads
which it concatenates after the data_header. The header itself is
filled in after allocation by writing through node->data_header.
(b) The record uses RV_NOT_DEFINED as its recovery index. A
checkpoint record has no redo function and no undo function — it is
purely informational, consumed by the analysis pass. The
RV_fun[RV_NOT_DEFINED] slot is a debug-only dump entry; trying to
redo or undo a checkpoint record would fail an assert.
Recovery interaction — analysis anchors on chkpt_lsa
The starting LSA of log_recovery_analysis is set inside
log_recovery as the very first action:
```cpp
// log_recovery — src/transaction/log_recovery.c (excerpt)
LSA_COPY (&rcv_lsa, &log_Gl.hdr.chkpt_lsa);

if (ismedia_crash != false)
  {
    /* Media crash, we may have to start from an older checkpoint... */
    (void) fileio_map_mounted (thread_p,
                               (bool (*)(THREAD_ENTRY *, VOLID, void *)) log_rv_find_checkpoint,
                               &rcv_lsa);
  }
```

The crash-vs-media distinction is significant. For crash
recovery, chkpt_lsa is exactly the LSA the analysis must start
from — every record before it has been consumed by a prior
checkpoint and is no longer needed. For media recovery (restart
from a backup), the per-volume disk headers may carry rcv-LSAs that
predate the global chkpt_lsa, because the backup was taken at a
different point in time; the loop walks all mounted volumes and
takes the minimum rcv-LSA across them.
The analysis pass arms for the two checkpoint records:
```cpp
// log_rv_analysis_start_checkpoint — src/transaction/log_recovery.c
static int
log_rv_analysis_start_checkpoint (LOG_LSA * log_lsa, LOG_LSA * start_lsa, bool * may_use_checkpoint)
{
  /* Use the checkpoint record only if it is the first record in the
   * analysis. */
  if (LSA_EQ (log_lsa, start_lsa))
    {
      *may_use_checkpoint = true;
    }
  return NO_ERROR;
}
```

```cpp
// log_rv_analysis_end_checkpoint — src/transaction/log_recovery.c (sketch)
if (*may_use_checkpoint == false)
  return NO_ERROR;
*may_use_checkpoint = false;

LSA_COPY (check_point, log_lsa);
/* read LOG_REC_CHKPT header, then the trans + topops trailing arrays */
...
for (i = 0; i < chkpt.ntrans; i++)
  {
    chkpt_one = &chkpt_trans[i];
    tdes = logtb_rv_find_allocate_tran_index (thread_p, chkpt_one->trid, log_lsa);
    logtb_clear_tdes (thread_p, tdes);

    if (chkpt_one->state == TRAN_ACTIVE || chkpt_one->state == TRAN_UNACTIVE_ABORTED)
      tdes->state = TRAN_UNACTIVE_UNILATERALLY_ABORTED;
    else
      tdes->state = chkpt_one->state;

    LSA_COPY (&tdes->head_lsa, &chkpt_one->head_lsa);
    LSA_COPY (&tdes->tail_lsa, &chkpt_one->tail_lsa);
    LSA_COPY (&tdes->undo_nxlsa, &chkpt_one->undo_nxlsa);
    LSA_COPY (&tdes->posp_nxlsa, &chkpt_one->posp_nxlsa);
    /* ...savept_lsa, tail_topresult_lsa, tran_start_postpone_lsa... */

    if (LOG_ISTRAN_2PC (tdes))
      *may_need_synch_checkpoint_2pc = true;
  }
```

Three observations. (a) The eligibility gate
if (LSA_EQ (log_lsa, start_lsa)) enforces that only the first
checkpoint encountered is consumed. Subsequent checkpoint records
inside the analysis window are skipped — they exist as ordinary
log traffic but their snapshot is stale. (b) TRAN_ACTIVE and
TRAN_UNACTIVE_ABORTED from the snapshot are coerced to
TRAN_UNACTIVE_UNILATERALLY_ABORTED — recovery treats every
still-active transaction as a loser. The 2PC-prepared state is
kept verbatim so the in-doubt path can find it. (c)
start_redo_lsa is set from chkpt.redo_lsa. This becomes the
lower bound log_recovery_redo walks from.
Recovery boundary diagram
Section titled “Recovery boundary diagram”flowchart LR
subgraph LOG["WAL on disk"]
direction LR
OLD["...older records..."] --> SC["LOG_START_CHKPT @ chkpt_lsa"]
SC --> M1["LOG_UNDOREDO_DATA"]
M1 --> M2["LOG_COMMIT (T17)"]
M2 --> EC["LOG_END_CHKPT (snapshot, redo_lsa=R)"]
EC --> P1["LOG_UNDOREDO_DATA"]
P1 --> P2["LOG_MVCC_REDO_DATA"]
P2 --> EOF["LOG_END_OF_LOG (crash here)"]
end
subgraph HDR["log header (pageid -9)"]
HC["chkpt_lsa = SC.lsa"]
HS["smallest_lsa_at_last_chkpt"]
end
subgraph PASS["analysis pass"]
A1["start_lsa = chkpt_lsa"]
A2["walk forward"]
A3["seed TT/DPT from end-CHKPT"]
A4["redo_lsa = chkpt.redo_lsa"]
end
HDR -.->|read at restart| A1
A1 --> SC
SC -.->|"start: may_use_chkpt=true"| A2
EC -.->|"consume snapshot"| A3
A3 --> A4
A4 -.->|"redo from R (may be < SC.lsa)"| LOG
The diagram makes the redo-vs-analysis distinction visible:
analysis starts from chkpt_lsa (the start-CHKPT record), but
redo starts from chkpt.redo_lsa (the smallest-dirty-LSA hint
from the end-CHKPT record). Those are typically the same LSA, but
not always — redo_lsa can be earlier than chkpt_lsa when a
page-buffer entry has an oldest_unflush_lsa that predates the
checkpoint (a long-lived dirty page). In that case redo starts
earlier in the log than the analysis start and scans forward from
there, applying each record only if the target page’s on-disk LSA is
below the record’s LSA.
Cooperation with the page buffer and DWB
pgbuf_flush_checkpoint is the page-buffer entry point that step 7
calls. Its essential body:
```cpp
// pgbuf_flush_checkpoint — src/storage/page_buffer.c (sketch)
int
pgbuf_flush_checkpoint (THREAD_ENTRY *thread_p, const LOG_LSA *flush_upto_lsa,
                        const LOG_LSA *prev_chkpt_redo_lsa, LOG_LSA *smallest_lsa, int *flushed_page_cnt)
{
  /* Things must be truly flushed up to this lsa */
  logpb_flush_log_for_wal (thread_p, flush_upto_lsa);
  LSA_SET_NULL (smallest_lsa);

  for (bufid = 0; bufid < pgbuf_Pool.num_buffers; bufid++)
    {
      bufptr = PGBUF_FIND_BCB_PTR (bufid);
      PGBUF_BCB_LOCK (bufptr);

      /* skip non-dirty, post-window-dirty, temp-volume BCBs */
      if (!pgbuf_bcb_is_dirty (bufptr)
          || LSA_GT (&bufptr->oldest_unflush_lsa, flush_upto_lsa)
          || pgbuf_is_temporary_volume (bufptr->vpid.volid))
        {
          PGBUF_BCB_UNLOCK (bufptr);
          continue;
        }

      /* defensive invariant: oldest_unflush_lsa must not predate
       * the previous checkpoint's redo-LSA */
      if (LSA_LT (&bufptr->oldest_unflush_lsa, prev_chkpt_redo_lsa))
        {
          er_set (...ER_LOG_CHECKPOINT_SKIP_INVALID_PAGE...);
          assert (false);
        }

      /* enqueue for flush, sorted by VPID */
      f_list[collected_bcbs++].bufptr = bufptr;
      ...
    }
  /* drain via pgbuf_flush_chkpt_seq_list → DWB → home volume */
}
```

Three things matter. (a) The first call
logpb_flush_log_for_wal enforces WAL ordering: every record up
to and including the begin-CHKPT (flush_upto_lsa = newchkpt_lsa)
is forced before any data page is written. (b) Pages that
became dirty after the begin-CHKPT are deliberately skipped —
flushing them would not advance the next redo-LSA usefully and
would interfere with in-flight transactions. (c) The
prev_chkpt_redo_lsa invariant catches incorrect redo-LSAs from
the previous checkpoint via assertion.
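The redo-LSA promise behind (b) and (c) reduces to a minimum computation: the end-CHKPT's redo_lsa must not exceed any surviving dirty page's oldest_unflush_lsa. A toy sketch of that computation (the `bcb_t` layout and the `chkpt_redo_lsa` helper are invented for illustration, not CUBRID code):

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  int      dirty;              /* non-zero if the page has unflushed changes */
  uint64_t oldest_unflush_lsa; /* LSA of the oldest change not yet on disk */
} bcb_t;

/* redo_lsa = min(begin-CHKPT LSA, min over dirty BCBs of oldest_unflush_lsa).
 * A page dirtied only after the begin-CHKPT has oldest_unflush_lsa above it
 * and therefore never pulls the redo point back. */
static uint64_t
chkpt_redo_lsa (uint64_t begin_chkpt_lsa, const bcb_t *bcbs, int n)
{
  uint64_t redo = begin_chkpt_lsa;
  for (int i = 0; i < n; i++)
    if (bcbs[i].dirty && bcbs[i].oldest_unflush_lsa < redo)
      redo = bcbs[i].oldest_unflush_lsa;
  return redo;
}
```

This is why a long-lived dirty page can drag the next restart's redo point well behind the checkpoint record itself, and why flushing pages dirtied after the begin-CHKPT would not advance it.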
The actual write path goes through the DWB. BCBs are sorted by VPID
and handed to pgbuf_flush_chkpt_seq_list, which drives each page
through dwb_add_page so the DWB flush daemon can write the
staging slot before the home page. Even if the engine crashes
mid-checkpoint, every page mid-home-write is either fully on disk
(clean) or recoverable from its DWB slot during restart’s pre-redo
DWB scan. See cubrid-double-write-buffer.md.
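The crash-safety claim can be simulated with a toy double-write cycle (the names `dwb_stage`, `home_write_torn`, `dwb_recover` and the byte-sum checksum are invented stand-ins for the real DWB machinery, which uses real page checksums and fsync ordering):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 8

typedef struct {
  unsigned char bytes[PAGE_SIZE];
  unsigned char checksum;       /* toy torn-write detector */
} disk_page_t;

static unsigned char
toy_sum (const unsigned char *b)
{
  unsigned char s = 0;
  for (int i = 0; i < PAGE_SIZE; i++)
    s = (unsigned char) (s + b[i]);
  return s;
}

/* Step 1: stage the full page in the DWB slot ("fsync" it first). */
static void
dwb_stage (disk_page_t *slot, const unsigned char *buf)
{
  memcpy (slot->bytes, buf, PAGE_SIZE);
  slot->checksum = toy_sum (buf);
}

/* Step 2: write home, but simulate a crash after half the bytes land. */
static void
home_write_torn (disk_page_t *home, const unsigned char *buf)
{
  memcpy (home->bytes, buf, PAGE_SIZE / 2);  /* torn write */
  /* checksum never updated: the tear is detectable at restart */
}

/* Restart: a home page whose checksum mismatches is restored from its
 * slot; an intact page is left alone. */
static bool
dwb_recover (disk_page_t *home, const disk_page_t *slot)
{
  if (toy_sum (home->bytes) == home->checksum)
    return false;               /* home page intact, nothing to do */
  *home = *slot;
  return true;
}
```

The ordering is the whole point: because the slot is durable before the home write starts, the home page is always either the complete old image, the complete new image, or detectably torn and rebuildable from the slot.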
```mermaid
flowchart LR
  subgraph CHK["logpb_checkpoint"]
    direction TB
    SC["1) emit LOG_START_CHKPT"]
    PF["2) pgbuf_flush_checkpoint(newchkpt_lsa)"]
    FA["3) fileio_synchronize_all"]
    EC["4) emit LOG_END_CHKPT"]
    HDR["5) flush log header"]
  end
  subgraph PB["page buffer"]
    BCBS["BCBs with oldest_unflush_lsa <= newchkpt_lsa"]
  end
  subgraph DWBP["DWB"]
    SLOT["staged page in DWB slot"]
    HOME["home volume page"]
  end
  subgraph FS["filesystem"]
    DV["data volumes"]
    LV["log file"]
    HV["log header page"]
  end
  SC --> LV
  PF --> BCBS
  BCBS -->|dwb_add_page| SLOT
  SLOT -->|fsync DWB volume| DV
  SLOT -->|then write home| HOME
  HOME --> DV
  PF --> FA
  FA -->|fsync each volume| DV
  EC --> LV
  HDR --> HV
  HV -.->|"chkpt_lsa = newchkpt_lsa"| LV
```
Failure cases
The protocol is designed to crash safely at any point.

- Before begin-CHKPT. No effect; the previous checkpoint remains the boundary.
- Between begin-CHKPT and end-CHKPT, or between end-CHKPT and header flush. `log_Gl.hdr.chkpt_lsa` on disk still points at the previous checkpoint (the header is the very last thing written). Analysis runs from the previous `chkpt_lsa` and encounters the partial new-bracket records as ordinary log traffic. The begin-CHKPT arm's eligibility gate (`LSA_EQ(log_lsa, start_lsa)`) fails — analysis didn't start from the new begin record — so the arm no-ops and the partial bracket is harmless. Cost: one checkpoint window's worth of re-processing.
- After header flush. The new `chkpt_lsa` is durable. Analysis starts from the new bracket; the gate fires correctly.
- Missing checkpoint (brand-new install). `chkpt_lsa` is `NULL_LSA`; analysis walks from the beginning of the log. Slow but correct.
- Checkpoint LSA past durable end-of-log. Header corruption — fatal (`logpb_fatal_error`), restore from backup.
- Mid-flush of dirty pages (step 7). DWB protects against torn pages: pages mid-home-write are restored from their DWB slot during `dwb_load_and_recover_pages` before redo runs.
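The middle failure windows turn on a single comparison inside the begin-CHKPT analysis arm. A toy model of that gate (names and the `durable_hdr_t` wrapper are invented; the real code compares with `LSA_EQ`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Durable state at crash time: which chkpt_lsa the header points at.
 * Analysis always starts from this value at restart. */
typedef struct {
  uint64_t hdr_chkpt_lsa;
} durable_hdr_t;

/* The begin-CHKPT arm's eligibility gate: a bracket is trusted only
 * if analysis actually started at this begin record. A partial
 * bracket reached mid-walk is treated as ordinary log traffic. */
static bool
start_chkpt_arm (uint64_t analysis_start_lsa, uint64_t begin_rec_lsa)
{
  return analysis_start_lsa == begin_rec_lsa;
}
```

One comparison covers both windows: a crash before the header flush leaves the old anchor in place, so the new begin record fails the gate and no-ops; a crash after it makes the new record the start point, so the gate fires.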
Source Walkthrough
Daemon registration and timing

- `log_Checkpoint_daemon` (log_manager.c) — file-scope `cubthread::daemon` pointer.
- `log_checkpoint_daemon_init` (log_manager.c) — creates the daemon at server start.
- `log_get_checkpoint_interval` (log_manager.c) — reads `PRM_ID_LOG_CHECKPOINT_INTERVAL_SECS` for the looper period.
- `log_checkpoint_execute` (log_manager.c) — daemon body; defers to `logpb_checkpoint`.
- `log_wakeup_checkpoint_daemon` (log_manager.c) — out-of-band wakeup hook.
- `log_daemons_init` / `log_daemons_destroy` (log_manager.c) — bootstrap and teardown.
- `LOG_ISCHECKPOINT_TIME` macro (log_manager.c) — page-count-based legacy trigger.
Checkpoint emission
- `logpb_checkpoint` (log_page_buffer.c) — orchestrator.
- `logpb_checkpoint_trans` (log_page_buffer.c) — per-TDES extractor.
- `logpb_checkpoint_topops` (log_page_buffer.c) — per-active-sysop extractor.
- `logpb_dump_checkpoint_trans` (log_page_buffer.c) — debug dumper for `cubrid logdump`.
- `log_dump_record_checkpoint` (log_manager.c) — top-level dispatcher for dumping a checkpoint record.
- `log_dump_checkpoint_topops` (log_manager.c) — debug dumper for the active-sysop array.
- `prior_lsa_alloc_and_copy_data` (log_append.cpp) — shared with all log appenders; both bracket records use it.
- `prior_lsa_next_record_with_lock` (log_append.cpp) — assigns the LSA and links the prior-list node.
Page-buffer cooperation
- `pgbuf_flush_checkpoint` (page_buffer.c) — selects dirty BCBs, sorts by VPID, drives them through DWB.
- `pgbuf_flush_chkpt_seq_list` (page_buffer.c) — performs the actual flush of one batch.
- `pgbuf_Pool.is_checkpoint` (page_buffer.c) — atomic flag the page-buffer flusher reads to coordinate with concurrent victims.
- `logpb_flush_log_for_wal` (log_page_buffer.c) — WAL-ordering enforcement called by `pgbuf_flush_checkpoint`.
- `fileio_synchronize_all` (file_io.c) — fsync all volumes after the dirty-page flush.
- `dwb_flush_force` (double_write_buffer.cpp) — forces pending DWB blocks; called transitively from `fileio_synchronize_all`.
Recovery-side consumption
- `log_recovery` (log_recovery.c) — sets `rcv_lsa = log_Gl.hdr.chkpt_lsa` as the analysis starting point.
- `log_rv_find_checkpoint` (log_recovery.c) — per-volume rcv-LSA scan used during media recovery.
- `log_recovery_analysis` (log_recovery.c) — forward walk from `chkpt_lsa`.
- `log_rv_analysis_record` (log_recovery.c) — switch over `LOG_RECTYPE`; arms for `LOG_START_CHKPT` and `LOG_END_CHKPT`.
- `log_rv_analysis_start_checkpoint` (log_recovery.c) — sets `may_use_checkpoint` if the start record's LSA matches the analysis start.
- `log_rv_analysis_end_checkpoint` (log_recovery.c) — reads `LOG_REC_CHKPT`, the trans array, and the topops array; seeds the trantable; sets `start_redo_lsa`.
- `logtb_rv_find_allocate_tran_index` (log_tran_table.c) — allocates a TDES slot keyed by trid for each row in the trans array.
- `logtb_clear_tdes` (log_tran_table.c) — zeroes the TDES before re-populating from the snapshot.
Header & in-memory pointers
- `log_Gl.hdr.chkpt_lsa` (log_storage.hpp field `LOG_HEADER::chkpt_lsa`) — on-disk recovery anchor.
- `log_Gl.hdr.smallest_lsa_at_last_chkpt` (log_storage.hpp) — archive-removal watermark.
- `log_Gl.chkpt_redo_lsa` (log_impl.h::log_global) — in-memory copy of the last end-CHKPT's redo_lsa, used by `pgbuf_flush_checkpoint` as `prev_chkpt_redo_lsa`.
- `log_Gl.chkpt_lsa_lock` (log_impl.h::log_global) — pthread mutex guarding the two LSAs above.
- `log_Gl.run_nxchkpt_atpageid` / `log_Gl.chkpt_every_npages` (log_impl.h::log_global) — legacy page-count trigger state.
- `logpb_flush_header` (log_page_buffer.c) — writes the active-log header page to disk.
Log record type & payload
- `LOG_START_CHKPT` (log_record.hpp, value 25) — begin marker.
- `LOG_END_CHKPT` (log_record.hpp, value 26) — end record carrying the snapshot.
- `LOG_REC_CHKPT` (log_record.hpp) — `{ redo_lsa, ntrans, ntops }` header.
- `LOG_INFO_CHKPT_TRANS` (log_record.hpp) — per-tran row.
- `LOG_INFO_CHKPT_SYSOP` (log_record.hpp) — per-active-sysop row.
System parameters
- `PRM_ID_LOG_CHECKPOINT_INTERVAL_SECS` (system_parameter.c, default 360 s, deprecated) — timer period.
- `PRM_ID_LOG_CHECKPOINT_INTERVAL` (system_parameter.c, default 360 s, replacement) — same role, different unit handling.
- `PRM_ID_LOG_CHECKPOINT_NPAGES` (system_parameter.c, default 100000, deprecated) — page-count trigger.
- `PRM_ID_LOG_CHECKPOINT_SIZE` (system_parameter.c, default 100000, replacement) — size-based equivalent.
- `PRM_ID_LOG_CHECKPOINT_SLEEP_MSECS` (system_parameter.c, default 1 ms, hidden) — inter-page flush throttle.
- `PRM_ID_LOG_CHKPT_DETAILED` (system_parameter.c) — turns on `detailed_er_log` traces inside `logpb_checkpoint`.
Position hints as of 2026-05-01
Section titled “Position hints as of 2026-05-01”| Symbol | File | Line |
|---|---|---|
| `log_Checkpoint_daemon` | log_manager.c | 359 |
| `log_get_checkpoint_interval` | log_manager.c | 10075 |
| `log_wakeup_checkpoint_daemon` | log_manager.c | 10113 |
| `log_checkpoint_execute` | log_manager.c | 10167 |
| `log_checkpoint_daemon_init` | log_manager.c | 10407 |
| `LOG_ISCHECKPOINT_TIME` macro | log_manager.c | 122 |
| `log_dump_checkpoint_topops` | log_manager.c | 6769 |
| `log_dump_record_checkpoint` | log_manager.c | 6792 |
| `logpb_checkpoint_trans` | log_page_buffer.c | 6783 |
| `logpb_checkpoint_topops` | log_page_buffer.c | 6833 |
| `logpb_checkpoint` | log_page_buffer.c | 6877 |
| `logpb_dump_checkpoint_trans` | log_page_buffer.c | 7395 |
| `log_rv_find_checkpoint` | log_recovery.c | 579 |
| `log_rv_analysis_start_checkpoint` | log_recovery.c | 1797 |
| `log_rv_analysis_end_checkpoint` | log_recovery.c | 1830 |
| `log_rv_analysis_record` (`LOG_*_CHKPT` arms) | log_recovery.c | 2436 |
| `log_recovery` (`chkpt_lsa` anchor) | log_recovery.c | 780 |
| `LOG_REC_CHKPT` struct | log_record.hpp | 345 |
| `LOG_INFO_CHKPT_TRANS` struct | log_record.hpp | 354 |
| `LOG_INFO_CHKPT_SYSOP` struct | log_record.hpp | 372 |
| `LOG_START_CHKPT` enum | log_record.hpp | 96 |
| `LOG_END_CHKPT` enum | log_record.hpp | 97 |
| `LOG_HEADER::chkpt_lsa` | log_storage.hpp | 141 |
| `LOG_HEADER::smallest_lsa_at_last_chkpt` | log_storage.hpp | 163 |
| `log_global::chkpt_lsa_lock` | log_impl.h | 681 |
| `log_global::chkpt_redo_lsa` | log_impl.h | 683 |
| `log_global::chkpt_every_npages` | log_impl.h | 684 |
| `log_global::run_nxchkpt_atpageid` | log_impl.h | 678 |
| `pgbuf_flush_checkpoint` | page_buffer.c | 3960 |
| `PRM_ID_LOG_CHECKPOINT_INTERVAL_SECS` | system_parameter.c | 1368 |
| `PRM_ID_LOG_CHECKPOINT_INTERVAL` | system_parameter.c | 1379 |
| `PRM_ID_LOG_CHECKPOINT_NPAGES` | system_parameter.c | 1346 |
| `PRM_ID_LOG_CHECKPOINT_SIZE` | system_parameter.c | 1357 |
Cross-check Notes
vs cubrid-recovery-manager.md

- Bracket records are `LOG_START_CHKPT` / `LOG_END_CHKPT`. Both docs say so; this is the faithful ARIES two-step.
- `log_Gl.hdr.chkpt_lsa` is the analysis anchor, not the redo anchor. The redo anchor (`start_redo_lsa`) is derived from `LOG_REC_CHKPT.redo_lsa`. The two typically match but can diverge when long-lived dirty pages exist.
- 2PC: the recovery-manager doc lists `LOG_RECOVERY_FINISH_2PC_PHASE` as a conditional phase; the trigger is set inside `log_rv_analysis_end_checkpoint` via `*may_need_synch_checkpoint_2pc = true` when the snapshot contains a `TRAN_2PC_PREPARED` row.
- The "post-restart final checkpoint" call at the end of `log_recovery` is the same path used at clean shutdown.
vs cubrid-log-manager.md
Checkpoint records flow through the same prior-list discipline as every other appender:

- The end-CHKPT record carries trailing payloads via the two-payload overload of `prior_lsa_alloc_and_copy_data`.
- The checkpoint bypasses the group-commit waiter and uses `logpb_flush_pages_direct` because it needs synchronous durability of both bracket records.
vs cubrid-mvcc.md
- `LOG_HEADER.mvcc_op_log_lsa` is updated alongside `chkpt_lsa`, giving vacuum a durable handle.
- `LOG_INFO_CHKPT_TRANS` does NOT carry MVCCID. Recovery rebuilds per-TDES MVCCID from the per-record `mvcc_id` field during analysis, not from the snapshot. This is correct because MVCCID issuance is lazy.
vs cubrid-double-write-buffer.md
- `pgbuf_flush_checkpoint` is the largest single DWB producer.
- `fileio_synchronize_all` calls `dwb_flush_force` transitively; step 8 of `logpb_checkpoint` ensures all DWB-staged pages reach home before the end-CHKPT record is emitted, so the redo-LSA promise is sound.
- DWB and checkpoint protocols protect against in-progress crashes independently: torn-page recovery vs. previous-checkpoint fallback. Neither relies on the other.
Open Questions
- Why are both `PRM_ID_LOG_CHECKPOINT_INTERVAL_SECS` and `PRM_ID_LOG_CHECKPOINT_INTERVAL` defined with default 360? The former is marked `PRM_DEPRECATED`, the latter is the modern replacement. Are there call sites that still read the deprecated one? Investigation path: grep for `PRM_ID_LOG_CHECKPOINT_INTERVAL` without the `_SECS` suffix; check whether `log_get_checkpoint_interval` should switch.
- Is the page-count trigger (`LOG_ISCHECKPOINT_TIME`) actually used? The macro is defined but the daemon-driven timer appears to dominate. A grep for `LOG_ISCHECKPOINT_TIME` would show whether any append path still polls it; if not, the macro is dead code preserved for legacy compatibility.
- What is the upper bound on `LOG_REC_CHKPT` record size? A busy engine with many active transactions could produce a multi-MB end-CHKPT record. Is there a clamp? What happens if the serialised size exceeds one log page? The `LOG_READ_ADVANCE_WHEN_DOESNT_FIT` macro inside `log_rv_analysis_end_checkpoint` suggests the recovery side handles multi-page checkpoint records, but the emit side's allocation is monolithic — is that correct?
- Crash atomicity of the per-volume-header write. Step 15 of `logpb_checkpoint` rewrites every data volume's disk header with the new `chkpt_lsa` for media-recovery purposes. This is not atomic across volumes. A crash mid-loop would leave some volumes with the new LSA and others with the old. Is media recovery robust to this? `log_rv_find_checkpoint` takes the minimum per-volume LSA, so the answer is "yes" — but the property should be confirmed.
- Does the standalone (`SA_MODE`) path emit checkpoints? The daemon registration is `#if defined(SERVER_MODE)`-guarded. Standalone tools (`csql -S`, `loaddb`) presumably take a checkpoint only at exit, not periodically. Investigation path: trace `logpb_checkpoint` callers under `SA_MODE`.
- The `tdes->client.set_system_internal_with_user (chkpt_one->user_name)` call in analysis recovery looks unusual. It sets a system marker with a user name from the snapshot. Why does recovery need the user name? Possibly for HA/replication audit logs. Worth tracing.
- Interaction with the page-server replication path. CUBRID has a "page server" replication mode where the page buffer is on a remote node. Does the checkpoint daemon coordinate with the page server? `log_recovery_redo.hpp` mentions the redo dispatcher is shared between recovery and page-server replication; is the checkpoint emitter also shared?
Sources
CUBRID source (`/data/hgryoo/references/cubrid/`)

- `src/transaction/log_manager.c` — daemon registration, looper, legacy `LOG_ISCHECKPOINT_TIME` macro, dump helpers.
- `src/transaction/log_page_buffer.c` — the body of `logpb_checkpoint` plus its helpers `logpb_checkpoint_trans`, `logpb_checkpoint_topops`, `logpb_dump_checkpoint_trans`, and `logpb_flush_header`.
- `src/transaction/log_record.hpp` — the on-log shape: `LOG_REC_CHKPT`, `LOG_INFO_CHKPT_TRANS`, `LOG_INFO_CHKPT_SYSOP`, the `LOG_START_CHKPT`/`LOG_END_CHKPT` enum values.
- `src/transaction/log_storage.hpp` — `LOG_HEADER::chkpt_lsa` and `smallest_lsa_at_last_chkpt`.
- `src/transaction/log_impl.h` — `log_global`'s checkpoint-related fields (`chkpt_lsa_lock`, `chkpt_redo_lsa`, `chkpt_every_npages`, `run_nxchkpt_atpageid`).
- `src/transaction/log_recovery.c` — the consumer: `log_rv_find_checkpoint`, `log_rv_analysis_start_checkpoint`, `log_rv_analysis_end_checkpoint`, the analysis dispatch arm in `log_rv_analysis_record`, the `rcv_lsa = chkpt_lsa` assignment in `log_recovery`.
- `src/transaction/log_tran_table.c` — `TR_TABLE_CS_ENTER`/`EXIT` primitives the checkpoint walks under, and `logtb_clear_tdes`, `logtb_rv_find_allocate_tran_index` used by recovery.
- `src/storage/page_buffer.c` — `pgbuf_flush_checkpoint`, the dirty-page driver invoked from inside `logpb_checkpoint`.
- `src/base/system_parameter.c` — `PRM_ID_LOG_CHECKPOINT_*` entries and their defaults.
Theoretical references
Section titled “Theoretical references”- Mohan, Haderle, Lindsay, Pirahesh, Schwarz, ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging, ACM TODS 17.1, 1992 — the fuzzy-checkpoint protocol with explicit begin/end records is the ARIES section §6.
- Bernstein, Hadzilacos, Goodman, Concurrency Control and Recovery in Database Systems, 1987 — the textbook treatment of checkpoints in §6 (“Recovery”); distinguishes consistent vs fuzzy variants.
- Petrov, Database Internals, 2019, ch. 5 §“Recovery” and §“ARIES” — modern textbook framing; introduces redo-LSA hint and the relationship between checkpoint frequency and recovery time.
- Silberschatz, Korth, Sudarshan, Database System Concepts, 7th ed., ch. 19 (“Recovery System”) — the standard undergraduate presentation; checkpoints are framed as a way to bound the redo pass.
Sibling docs in this knowledge base
- `knowledge/code-analysis/cubrid/cubrid-recovery-manager.md` — the three-pass restart protocol that consumes what this checkpoint emits.
- `knowledge/code-analysis/cubrid/cubrid-log-manager.md` — the WAL framework whose prior-list and append discipline the checkpoint uses for both bracket records.
- `knowledge/code-analysis/cubrid/cubrid-mvcc.md` — MVCC interactions through `mvcc_op_log_lsa` and the lazy-MVCCID issuance model.
- `knowledge/code-analysis/cubrid/cubrid-double-write-buffer.md` — the torn-page guard the checkpoint cooperates with during step 7 (`pgbuf_flush_checkpoint`) and step 8 (`fileio_synchronize_all`).
- `knowledge/code-analysis/cubrid/cubrid-page-buffer-manager.md` — the dirty-page tracking that drives the redo-LSA hint.
- `knowledge/code-analysis/cubrid/cubrid-2pc.md` — the in-doubt transactions that the active-TX snapshot keeps alive across restart via `may_need_synch_checkpoint_2pc`.