CUBRID Backup and Restore — Online Volume Backup, LSA Markers, and Point-in-Time Recovery

A physical backup of a running database is fundamentally racy. Pages are mutated while the backup tool is reading them, so any naïve copy of the data files captures an arbitrary mix of pre-image and post-image bytes — a “fuzzy” snapshot. WAL-based engines accept this and reach consistency only after redo replay: the data files inside the backup become valid only when paired with the log records that bracket the copy.

The bracketing is described by two log sequence numbers. A start LSA (the checkpoint LSA) marks the earliest log record whose effects must still be redone — every page mutation older than that LSA is already durably stored in the snapshot. An end LSA (the append LSA at backup-end) marks the latest log record whose effects might appear in the copied pages. A backup that includes the page snapshot, the start LSA, and every log record in [start_lsa, end_lsa] is self-contained: a restore can mount the page images, replay the log from start_lsa, and converge on any commit boundary between the two — that is, point-in-time recovery (PITR).
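
A toy predicate makes the invariant concrete. This is not CUBRID code — the LSA type and names are invented — but it captures when a backup bracketed by [start_lsa, end_lsa] can satisfy a PITR target, assuming the log window has not been trimmed (the archive-retention concern discussed next):

// bracket invariant — illustrative sketch only
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { int64_t pageid; int16_t offset; } lsa_t;   // invented LSA type

static int lsa_cmp (lsa_t a, lsa_t b)
{
  if (a.pageid != b.pageid) return a.pageid < b.pageid ? -1 : 1;
  if (a.offset != b.offset) return a.offset < b.offset ? -1 : 1;
  return 0;
}

// A backup is usable for a PITR target only if the target's stop LSA lies inside
// [start_lsa, end_lsa] and every log record in that window is still available.
static bool backup_can_reach (lsa_t start_lsa, lsa_t end_lsa,
                              lsa_t oldest_log_kept, lsa_t stop_lsa)
{
  bool window_intact   = lsa_cmp (oldest_log_kept, start_lsa) <= 0;  // no archive gap
  bool target_in_window = lsa_cmp (start_lsa, stop_lsa) <= 0
                          && lsa_cmp (stop_lsa, end_lsa) <= 0;
  return window_intact && target_in_window;
}

int main (void)
{
  lsa_t start = {100, 0}, end = {250, 0}, kept = {90, 0}, target = {200, 0};
  printf ("usable: %d\n", backup_can_reach (start, end, kept, target));
  return 0;
}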

Three follow-on choices fall out of this. The engine must keep enough log archives around so the restore can find every record between start_lsa and end_lsa; if archive deletion races ahead of an in-progress backup, the backup is irrecoverable. The page-copy phase itself does not need to lock anything because any mutation that races with the copy reappears during redo replay. And the start LSA cannot be picked arbitrarily — it must be the LSA of a fuzzy checkpoint such that all dirty pages older than it have been flushed at least once. CUBRID picks the most recent global checkpoint LSA and blocks the next checkpoint for the duration of the copy.

Backup levels are the same model at a coarser grain. A level-0 (full) backup copies every allocated page. A level-1 backup copies only pages whose prv.lsa (the page’s last-modified LSA) is strictly newer than the level-0 backup’s start LSA. A level-2 backup uses the level-1 start LSA the same way. This is “differential by LSA” rather than “differential by mtime”; the chain L0 → L1 → L2 is restored in reverse time order so the freshest image of each page wins.
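
The skip rule can be stated as a few lines of illustrative C (the real check is the LSA_LT call shown later in fileio_backup_volume; the types here are invented):

// "differential by LSA" — illustrative sketch only
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { int64_t pageid; int16_t offset; } lsa_t;   // invented LSA type
#define LSA_NULL ((lsa_t){ -1, -1 })

static bool lsa_is_null (lsa_t a) { return a.pageid == -1; }
static bool lsa_lt (lsa_t a, lsa_t b)
{ return a.pageid < b.pageid || (a.pageid == b.pageid && a.offset < b.offset); }

// parent_start_lsa is LSA_NULL for a level-0 backup (copy everything),
// the L0 start LSA for level 1, and the L1 start LSA for level 2.
static bool should_copy (lsa_t page_last_modified, lsa_t parent_start_lsa)
{
  return lsa_is_null (parent_start_lsa) || lsa_lt (parent_start_lsa, page_last_modified);
}

int main (void)
{
  lsa_t l0_start = { 100, 0 };
  lsa_t untouched = { 80, 0 }, touched = { 140, 0 };
  printf ("L0: copy everything -> %d %d\n",
          should_copy (untouched, LSA_NULL), should_copy (touched, LSA_NULL));
  printf ("L1: copy only pages newer than L0 start -> %d %d\n",
          should_copy (untouched, l0_start), should_copy (touched, l0_start));
  return 0;
}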

The PITR target is the user-facing knob. The user provides a wall-clock timestamp; the restore code translates it to a stop LSA by scanning log records until a LOG_COMMIT or LOG_ABORT exceeds the target, then stops redo just before that point. Time monotonicity is load-bearing — if two distinct events share a time(NULL) second, the restore cannot tell them apart. CUBRID acknowledges this in fileio_finish_backup with a forced 1-second sleep so the backup-end timestamp is strictly less than every post-backup commit timestamp, and explicitly flags the residual gap.

PostgreSQL ships pg_basebackup, which streams the data directory while an internal base-backup log record marks the start LSA. WAL is archived separately via archive_command or streaming. Restore replays WAL from the base backup’s start LSA up to a recovery_target_time/LSN/named point and refuses to start in primary mode until all expected WAL has been seen. The WAL stream and the base backup are decoupled, so any third party can implement either half.

MySQL historically split the world into mysqldump (logical) and various physical-copy approaches. Percona XtraBackup copies InnoDB data files while concurrently capturing redo into a sidecar file; at “prepare” time it runs InnoDB recovery against the captured redo. PITR requires keeping binary logs around. XtraBackup’s redo capture is push-based — the tool subscribes to the redo stream rather than the server bracketing a window — which is why prepare, not copy, decides consistency.

Oracle RMAN is the most integrated. RMAN runs inside the database and does block-level incrementals (blocks whose SCN exceeds the parent backup’s SCN — Oracle’s SCN is equivalent to CUBRID’s LSA), archived-redo-log backup, on-the-fly verification, and PITR by SCN/time/restore-point. RMAN’s distinguishing choice is its catalog: backup metadata lives in the database itself (or a recovery catalog DB), not in flat files alongside the backups.

CUBRID sits between Postgres and RMAN. The backup utility runs inside the server (or in SA mode, in-process), so it has direct access to checkpoint state, the LSN cursor, and the archive directory. It writes a single self-contained backup volume that bundles data pages, archived log records, and metadata — closer to XtraBackup’s combined output than to Postgres’s split. Three levels of incremental are supported. PITR is by wall-clock time (-d backuptime or -d "YYYY-MM-DD...") but internally resolves to an LSA-based stop point during redo. The on-disk format is documented inside file_io.c via FILEIO_BACKUP_HEADER and pageid sentinels (FILEIO_BACKUP_START_PAGE_ID, FILEIO_BACKUP_END_PAGE_ID, FILEIO_BACKUP_FILE_START_PAGE_ID, FILEIO_BACKUP_FILE_END_PAGE_ID); there is no external catalog and no third-party reader.

The end-to-end pipeline runs across three modules: src/storage/file_io.c owns the on-disk backup format and per-volume copy, src/transaction/log_page_buffer.c owns the orchestration and log-archive bracketing, and src/transaction/log_recovery.c owns redo replay during restore. The user-facing entry points live in src/executables/util_cs.c (backupdb) and src/executables/util_sa.c (restoredb). Both eventually go through src/transaction/boot_sr.c to enter the server-side code path.

The backup volume is a flat sequence of fixed-size pages prefixed by a header. Page identifiers are negative sentinels used as in-band markers:

file_io.c
#define FILEIO_BACKUP_START_PAGE_ID (-2) // header page
#define FILEIO_BACKUP_END_PAGE_ID (-3) // last page in this backup volume
#define FILEIO_BACKUP_FILE_START_PAGE_ID (-4) // start of one DB volume's payload
#define FILEIO_BACKUP_FILE_END_PAGE_ID (-5) // end of one DB volume's payload
#define FILEIO_BACKUP_VOL_CONT_PAGE_ID (-6) // continuation in next backup volume

FILEIO_BACKUP_PAGE is the wire format of every payload page; it carries the real iopageid at the front and a duplicate iopageid_dup at a runtime-computed tail offset — the only end-to-end consistency check (the inner page is unmodified database content). FILEIO_BACKUP_HEADER lives at FILEIO_BACKUP_START_PAGE_ID and carries everything the restore needs to authenticate the volume and chain incremental levels:

// FILEIO_BACKUP_HEADER — file_io.h (key fields)
struct fileio_backup_header {
  PAGEID iopageid;                      // = FILEIO_BACKUP_START_PAGE_ID
  char magic[CUBRID_MAGIC_MAX_LENGTH];  // "CUBRID DATABASE BACKUP"
  INT64 db_creation;                    // binds backup to its source DB
  INT64 start_time, end_time;           // bounds for PITR comparator
  char db_fullname[PATH_MAX];
  PGLENGTH db_iopagesize;
  FILEIO_BACKUP_LEVEL level;            // 0=full, 1=big incr, 2=small incr
  LOG_LSA start_lsa;                    // skip predicate: copy if prv.lsa > start_lsa
  LOG_LSA chkpt_lsa;                    // restore's redo-start cursor
  int unit_num, bkup_iosize, bkpagesize;
  FILEIO_BACKUP_RECORD_INFO previnfo[FILEIO_BACKUP_UNDEFINED_LEVEL];  // parent-level chain
  char db_prec_bkvolname[PATH_MAX];     // multi-volume back-chain
  FILEIO_ZIP_METHOD zip_method;         // NONE or LZ4
  FILEIO_ZIP_LEVEL zip_level;
};

For full backups CUBRID groups FILEIO_FULL_LEVEL_EXP = 32 database pages into one backup page (bkpagesize = db_iopagesize × 32); the grouping amortises per-page overhead and aligns with LZ4’s preferred input window. Incrementals stay at one DB page per backup page because the access pattern is sparse.
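
Back-of-envelope, with assumed sizes (a 16 KB DB page and a hypothetical 10 GB volume), the grouping works out as:

// full-level page grouping — illustrative arithmetic only
#include <stdio.h>

#define FILEIO_FULL_LEVEL_EXP 32        // DB pages grouped per backup page (full level)

int main (void)
{
  long long db_iopagesize = 16 * 1024;                              // assume 16 KB DB pages
  long long bkpagesize    = db_iopagesize * FILEIO_FULL_LEVEL_EXP;  // 512 KB backup page
  long long vol_bytes     = 10LL * 1024 * 1024 * 1024;              // assume a 10 GB volume
  long long backup_pages  = (vol_bytes + bkpagesize - 1) / bkpagesize;  // ceiling division

  printf ("bkpagesize = %lld bytes, backup pages for the volume = %lld\n",
          bkpagesize, backup_pages);
  return 0;
}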

FILEIO_BACKUP_FILE_HEADER (with iopageid = FILEIO_BACKUP_FILE_START_PAGE_ID, volid, nbytes, vlabel) precedes every database volume’s payload. FILEIO_BACKUP_SESSION (io_backup_session) is the runtime state — an in-memory mirror of one open backup volume (bkup) plus the buffer area for the in-flight DB volume (dbfile), the read-thread pool (read_thread_info), the verbose progress stream, and the per-1MB throttle (sleep_msecs).

logpb_backup (in src/transaction/log_page_buffer.c) is the orchestrator. It threads the gating concerns through the copy: serialisation against the checkpoint, log-archive bracketing, level-chain validation, and TDE key-file separation. The skeleton:

// logpb_backup — log_page_buffer.c
int
logpb_backup (THREAD_ENTRY *thread_p, int num_perm_vols, const char *allbackup_path,
              FILEIO_BACKUP_LEVEL backup_level, ...)
{
  // 1. Serialise — only one backup at a time
  LOG_CS_ENTER (thread_p);
  if (log_Gl.backup_in_progress)
    {
      LOG_CS_EXIT (thread_p);
      return ER_LOG_BKUP_DUPLICATE_REQUESTS;
    }
  log_Gl.backup_in_progress = true;
  LOG_CS_EXIT (thread_p);

  // 2. Initialise session: allocates header, area, thread pool
  fileio_initialize_backup (log_Db_fullname, allbackup_path, &session, backup_level, ...);

  // 3. Wait for in-flight checkpoint, then freeze the next one
loop:
  LOG_CS_ENTER (thread_p);
  if (log_Gl.run_nxchkpt_atpageid == NULL_PAGEID)   // checkpoint in progress
    {
      LOG_CS_EXIT (thread_p);
      thread_sleep (1000);
      goto loop;
    }
  saved_run_nxchkpt_atpageid = log_Gl.run_nxchkpt_atpageid;
  log_Gl.run_nxchkpt_atpageid = NULL_PAGEID;        // freeze
  LSA_COPY (&chkpt_lsa, &log_Gl.hdr.chkpt_lsa);     // capture start LSA
  LOG_CS_EXIT (thread_p);

  // 4. Resolve start_lsa from the level chain
  switch (backup_level)
    {
    case FILEIO_BACKUP_BIG_INCREMENT_LEVEL:         // L1: parent must exist
      if (LSA_ISNULL (&log_Gl.hdr.bkup_level0_lsa))
        return ER_LOG_BACKUP_LEVEL_NOGAPS;
      LSA_COPY (&bkup_start_lsa, &log_Gl.hdr.bkup_level0_lsa);
      break;
    case FILEIO_BACKUP_SMALL_INCREMENT_LEVEL:       // L2: requires L1
      if (LSA_ISNULL (&log_Gl.hdr.bkup_level1_lsa))
        return ER_LOG_BACKUP_LEVEL_NOGAPS;
      LSA_COPY (&bkup_start_lsa, &log_Gl.hdr.bkup_level1_lsa);
      break;
    default:
      LSA_SET_NULL (&bkup_start_lsa);               // L0 starts from -1|-1
      break;
    }

  // 5. Stamp + write header, emit FILEIO_BACKUP_START_PAGE_ID
  fileio_start_backup (thread_p, log_Db_fullname, &log_Gl.hdr.db_creation,
                       backup_level, &bkup_start_lsa, &chkpt_lsa, all_bkup_info,
                       &session, zip_method, zip_level);

  // 6. Walk every DB volume in volid order
  volid = LOG_DBTDE_KEYS_VOLID;
  do
    {
      if (volid >= LOG_DBFIRST_VOLID)
        logpb_backup_for_volume (thread_p, volid, &chkpt_lsa, &session, isincremental);
      else
        fileio_backup_volume (thread_p, &session, from_vlabel, volid, -1, false);
      volid = fileio_find_next_perm_volume (thread_p, volid);
    }
  while (volid != NULL_VOLID);

  // 7. Archive the active log so the backup is self-contained
  LOG_CS_ENTER (thread_p);
  first_arv_needed = log_Gl.hdr.last_arv_num_for_syscrashes >= 0
    ? log_Gl.hdr.last_arv_num_for_syscrashes
    : log_Gl.hdr.nxarv_num;
  if (first_arv_needed < log_Gl.hdr.nxarv_num)
    logpb_backup_needed_archive_logs (thread_p, &session,
                                      first_arv_needed, log_Gl.hdr.nxarv_num - 1);

  // 8. Append _lginf, finalise bkvinf, write end-time, 1-second monotonicity sleep
  fileio_backup_volume (thread_p, &session, log_Name_info, LOG_DBLOG_INFO_VOLID, -1, false);
  logpb_update_backup_volume_info (log_Name_bkupinfo);
  fileio_finish_backup (thread_p, &session);

  // 9. Release checkpoint freeze
  log_Gl.run_nxchkpt_atpageid = saved_run_nxchkpt_atpageid;
  log_Gl.backup_in_progress = false;
}

LSA marker emission. The start LSA is chkpt_lsa = log_Gl.hdr.chkpt_lsa, sampled while holding LOG_CS and the checkpoint mutex. This is the only LSA the restore needs to know — every page whose prv.lsa < chkpt_lsa is durable, and every page whose prv.lsa >= chkpt_lsa will be redone from the archived log. The header’s start_lsa field is different: for L0 it is -1|-1 (always copy); for incrementals it is the parent’s chkpt_lsa, also stored in log_Gl.hdr.bkup_level0_lsa/bkup_level1_lsa. Pages with prv.lsa <= start_lsa are skipped during incrementals.

Checkpoint freeze. log_Gl.run_nxchkpt_atpageid = NULL_PAGEID is the same sentinel the checkpoint daemon sets on itself, so the backup borrows that state machine. Without the freeze, a new checkpoint could advance chkpt_lsa mid-copy and break the invariant that every dirty-at-chkpt_lsa page must appear in either the snapshot or the redo log.

Archive-log self-containment. logpb_backup_needed_archive_logs copies every archive from first_arv_needed to nxarv_num - 1 into the backup volume. first_arv_needed is last_arv_num_for_syscrashes (the oldest archive still required for crash recovery) or nxarv_num, pulled further back when vacuum’s vacuum_min_log_pageid_to_keep requires older records.

Level chain. all_bkup_info[] carries the timestamps and LSAs of every prior level so the header records the full chain; restore validates parent timestamps to refuse mixed-run L0/L1/L2.

End-time monotonicity workaround. fileio_finish_backup writes time(NULL) into end_time and then sleeps until time(NULL) is strictly greater. This guards the PITR comparator (commit_timestamp > end_time) since second-resolution timestamps are not monotonic without the fence. The code comment flags this as pending millisecond-resolution LOG_REC_DONETIME.
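
A simplified sketch of that fence — not the CUBRID code, just the shape of the workaround:

// second-granularity fence — illustrative sketch only
#include <stdio.h>
#include <time.h>

int main (void)
{
  time_t end_time = time (NULL);          // value stamped into the backup header

  // Nap in 100 ms slices until the clock has advanced past end_time, so any
  // commit recorded after the backup carries a strictly larger timestamp.
  struct timespec nap = { 0, 100 * 1000 * 1000 };
  while (time (NULL) <= end_time)
    nanosleep (&nap, NULL);

  printf ("end_time=%ld, now=%ld\n", (long) end_time, (long) time (NULL));
  return 0;
}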

fileio_backup_volume is the inner loop that turns one database volume into a sequence of backup pages.

// fileio_backup_volume — file_io.c
int
fileio_backup_volume (THREAD_ENTRY *thread_p, FILEIO_BACKUP_SESSION *session_p,
                      const char *from_vol_label_p, VOLID from_vol_id,
                      PAGEID last_page, bool is_only_updated_pages)
{
  // 1. Mount source volume read-only (or reuse fd for log_active)
  session_p->dbfile.vlabel = from_vol_label_p;
  session_p->dbfile.volid = from_vol_id;
  session_p->dbfile.vdes = (from_vol_id == LOG_DBLOG_ACTIVE_VOLID)
    ? fileio_get_volume_descriptor (LOG_DBLOG_ACTIVE_VOLID)
    : fileio_open (session_p->dbfile.vlabel, O_RDONLY, 0);
  fstat (session_p->dbfile.vdes, &from_stbuf);
  session_p->dbfile.nbytes = from_stbuf.st_size;
  from_npages = CEIL_PTVDIV (session_p->dbfile.nbytes, backup_header_p->bkpagesize);

  // 2. Emit FILEIO_BACKUP_FILE_START_PAGE_ID with FILEIO_BACKUP_FILE_HEADER inside
  session_p->dbfile.area->iopageid = FILEIO_BACKUP_FILE_START_PAGE_ID;
  file_header_p = (FILEIO_BACKUP_FILE_HEADER *) (&session_p->dbfile.area->iopage);
  file_header_p->volid = session_p->dbfile.volid;
  file_header_p->nbytes = session_p->dbfile.nbytes;
  strncpy (file_header_p->vlabel, session_p->dbfile.vlabel, PATH_MAX);
  fileio_write_backup (thread_p, session_p, FILEIO_BACKUP_FILE_HEADER_PAGE_SIZE);

  // 3. Read pages (parallel via fileio_start_backup_thread, or single-threaded here)
  for (page_id = 0; page_id < from_npages; page_id++)
    {
      node_p = fileio_allocate_node (queue_p, backup_header_p);
      node_p->pageid = page_id;
      node_p->nread = fileio_read_backup (thread_p, session_p, node_p->pageid);
      // Incremental skip predicate: only copy if prv.lsa > parent start_lsa
      if (!is_only_updated_pages
          || LSA_ISNULL (&session_p->dbfile.lsa)
          || LSA_LT (&session_p->dbfile.lsa, &node_p->area->iopage.prv.lsa))
        {
          node_p->nread += FILEIO_BACKUP_PAGE_OVERHEAD;
          FILEIO_SET_BACKUP_PAGE_ID_COPY (node_p->area, node_p->pageid, backup_header_p->bkpagesize);
          if (backup_header_p->zip_method != FILEIO_ZIP_NONE_METHOD)
            fileio_compress_backup_node (node_p, backup_header_p);
          fileio_write_backup_node (thread_p, session_p, node_p, backup_header_p);
        }
      fileio_free_node (queue_p, node_p);
    }

  // 4. Emit FILEIO_BACKUP_FILE_END_PAGE_ID
  node_p = fileio_allocate_node (queue_p, backup_header_p);
  FILEIO_SET_BACKUP_PAGE_ID (node_p->area, FILEIO_BACKUP_FILE_END_PAGE_ID, backup_header_p->bkpagesize);
  fileio_write_backup_node (thread_p, session_p, node_p, backup_header_p);
}

The skip predicate LSA_LT (&session_p->dbfile.lsa, &node_p->area->iopage.prv.lsa) means “copy this page if its last-modified LSA is strictly newer than the parent’s start LSA”. session_p->dbfile.lsa was set by fileio_start_backup from bkup_start_lsa — the L0 checkpoint LSA for L1, the L1 checkpoint LSA for L2, and LSA_NULL for L0 (predicate folds to “always copy”). Full backups bundle FILEIO_FULL_LEVEL_EXP = 32 DB pages per backup chunk; incrementals stay at one DB page per chunk so the comparison is per-DB-page.

The dual pageid encoding (FILEIO_SET_BACKUP_PAGE_ID_COPY writes iopageid at the head and iopageid_dup at a runtime-computed tail offset) is the only end-to-end consistency check; FILEIO_CHECK_RESTORE_PAGE_ID compares the two during decompression.
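
A toy version of the encoding and check, with a fixed payload size instead of the runtime-computed tail offset:

// dual-pageid framing — illustrative sketch only (layout simplified)
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BKPAGESIZE 4096                   // hypothetical backup page payload size

typedef struct {
  long iopageid;                          // pageid at the head of the backup page
  char payload[BKPAGESIZE];               // raw DB page bytes (unmodified)
  long iopageid_dup;                      // same pageid repeated at the tail
} backup_page_t;

static void set_backup_page_id (backup_page_t *bp, long pageid)
{
  bp->iopageid = pageid;
  bp->iopageid_dup = pageid;              // written after the payload
}

// Restore-side check: a torn or mis-framed page shows up as a head/tail mismatch.
static bool check_restore_page_id (const backup_page_t *bp)
{
  return bp->iopageid == bp->iopageid_dup;
}

int main (void)
{
  backup_page_t bp;
  memset (&bp, 0, sizeof bp);
  set_backup_page_id (&bp, 42);
  printf ("intact: %d\n", check_restore_page_id (&bp));
  bp.iopageid_dup = 7;                    // simulate a framing error
  printf ("torn:   %d\n", check_restore_page_id (&bp));
  return 0;
}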

fileio_finish_backup — closing the volume

After all volumes are copied, fileio_finish_backup writes a FILEIO_BACKUP_END_PAGE_ID marker, pads the buffer up to the device’s I/O quantum (so tape devices that require record-aligned writes accept the trailer), fsyncs the device, and stamps the end timestamp into the header by re-seeking to offsetof(FILEIO_BACKUP_HEADER, end_time). The 1-second monotonicity sleep mentioned earlier sits at the very end of this function, after the data is durable but before control returns to logpb_backup.
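
The end-time stamp is an in-place patch of an already-written header. A minimal sketch of the pattern, assuming a plain header struct and a regular file (in CUBRID the real work is done by fileio_write_backup_end_time_to_header):

// in-place header patch — illustrative sketch only
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

typedef struct {
  int64_t start_time;
  int64_t end_time;
  // ... rest of the header ...
} demo_backup_header_t;                   // hypothetical header layout

int main (void)
{
  int fd = open ("demo.bk", O_RDWR | O_CREAT, 0644);
  if (fd < 0) { perror ("open"); return 1; }

  demo_backup_header_t hdr = { .start_time = time (NULL), .end_time = 0 };
  write (fd, &hdr, sizeof hdr);           // header written up front, end_time unknown

  // ... data pages and the end marker would be streamed here ...

  int64_t end_time = time (NULL);
  // Position at the end_time offset inside the header and patch just that field.
  pwrite (fd, &end_time, sizeof end_time, offsetof (demo_backup_header_t, end_time));
  fsync (fd);
  close (fd);
  return 0;
}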

flowchart TD
    A[backupdb CLI] --> B[boot_backup → xboot_backup]
    B --> C{logpb_backup}
    C --> D[Acquire LOG_CS, gate on backup_in_progress]
    D --> E[Wait for checkpoint completion<br/>then freeze run_nxchkpt_atpageid]
    E --> F[Capture start LSA<br/>chkpt_lsa = log_Gl.hdr.chkpt_lsa]
    F --> G[fileio_initialize_backup<br/>allocate session + bkuphdr]
    G --> H[fileio_start_backup<br/>write FILEIO_BACKUP_START_PAGE_ID + header]
    H --> I[For each DB volume:<br/>fileio_backup_volume]
    I --> J[Per page chunk:<br/>read → LSA skip predicate → optional LZ4 →<br/>FILEIO_BACKUP_PAGE with dual pageid]
    J --> I
    I --> K[After data volumes:<br/>logpb_backup_needed_archive_logs<br/>copy archives needed for redo]
    K --> L[Append _lginfo + bkvinf]
    L --> M[fileio_finish_backup<br/>FILEIO_BACKUP_END_PAGE_ID + end_time<br/>· 1-second monotonicity sleep]
    M --> N[Release checkpoint freeze<br/>backup_in_progress = false]

The restore side reuses FILEIO_BACKUP_SESSION with type = FILEIO_BACKUP_READ and adds two structures for incremental replay tracking.

FILEIO_RESTORE_PAGE_BITMAP (struct page_bitmap with vol_id, size, bitmap[], next) records which physical pages of a target volume have already been restored. Because incrementals are applied in reverse time order (newest first), a page set by L2 must not be overwritten when an older L1 is later applied. fileio_page_bitmap_set flips a bit on write; fileio_page_bitmap_is_set is consulted before each write so older backups skip pages already covered by newer ones. The list of bitmaps (one per vol_id) lives on the stack of logpb_restore.
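
A minimal sketch of the bitmap discipline — invented names, but the same idea as fileio_page_bitmap_set / fileio_page_bitmap_is_set guarding each write:

// restore page bitmap — illustrative sketch only
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
  int vol_id;
  int npages;
  unsigned char *bits;                    // one bit per DB page, rounded up to bytes
} page_bitmap_t;

static page_bitmap_t *bitmap_create (int vol_id, int npages)
{
  page_bitmap_t *bm = malloc (sizeof *bm);
  bm->vol_id = vol_id;
  bm->npages = npages;
  bm->bits = calloc ((npages + 7) / 8, 1);
  return bm;
}

static bool bitmap_is_set (const page_bitmap_t *bm, int page_id)
{ return (bm->bits[page_id / 8] >> (page_id % 8)) & 1; }

static void bitmap_set (page_bitmap_t *bm, int page_id)
{ bm->bits[page_id / 8] |= (unsigned char) (1u << (page_id % 8)); }

// Reverse-time restore: write a page only if no newer level wrote it already.
static bool restore_page (page_bitmap_t *bm, int page_id)
{
  if (bitmap_is_set (bm, page_id))
    return false;                         // newer level already restored this page
  // ... write the page image to the target volume here ...
  bitmap_set (bm, page_id);
  return true;
}

int main (void)
{
  page_bitmap_t *bm = bitmap_create (0, 1024);
  printf ("L2 pass writes page 7: %d\n", restore_page (bm, 7));   // 1
  printf ("L0 pass skips page 7:  %d\n", restore_page (bm, 7));   // 0
  free (bm->bits);
  free (bm);
  return 0;
}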

BO_RESTART_ARG is the user-side request envelope passed from restoredb through boot_restart_from_backup into logpb_restore:

// bo_restart_arg — boot_sr.h
struct bo_restart_arg {
  bool printtoc;                  // -t: dump backup TOC and exit
  time_t stopat;                  // -d "YYYY-MM-DD..." or current time
  const char *backuppath;         // -B: explicit backup file path
  int level;                      // -l: highest level to apply
  const char *verbose_file;
  bool newvolpath;                // -u: relocate volumes per databases.txt
  bool restore_upto_bktime;       // -d backuptime: stop at end_time
  bool restore_slave;
  bool is_restore_from_backup;
  INT64 db_creation;
  LOG_LSA restart_repl_lsa, restart_committed_lsa;
  char keys_file_path[PATH_MAX];  // TDE master-key path override
};

logpb_restore is shaped as a level-walk: it starts at the user-requested level and descends to L0, applying each level in turn.

// logpb_restore — log_page_buffer.c
int
logpb_restore (THREAD_ENTRY *thread_p, const char *db_fullname, const char *logpath,
               const char *prefix_logname, bo_restart_arg *r_args)
{
  try_level = (FILEIO_BACKUP_LEVEL) r_args->level;
  start_level = try_level;
  fileio_page_bitmap_list_init (&page_bitmap_list);
  LOG_CS_ENTER (thread_p);

  while (try_level >= FILEIO_BACKUP_FULL_LEVEL && try_level < FILEIO_BACKUP_UNDEFINED_LEVEL)
    {
      if (!first_time)
        {
          bkup_match_time = session->bkup.bkuphdr->previnfo[try_level].at_time;  // chain check
          fileio_finish_restore (thread_p, session);
        }

      // 1. Locate + authenticate the backup volume for this level
      fileio_get_backup_volume (thread_p, db_fullname, logpath,
                                r_args->backuppath, try_level, from_volbackup);
      fileio_start_restore (thread_p, db_fullname, from_volbackup, db_creation,
                            &bkdb_iopagesize, &bkdb_compatibility, &session_storage,
                            try_level, printtoc, bkup_match_time,
                            r_args->verbose_file, r_args->newvolpath);
      session = &session_storage;

      if (first_time)
        {
          // Resolve PITR target
          if (r_args->restore_upto_bktime)
            r_args->stopat = (time_t) session->bkup.bkuphdr->end_time;
          else if (r_args->stopat > 0)
            logpb_check_stop_at_time (session, r_args->stopat,
                                      (time_t) (session->bkup.bkuphdr->end_time > 0
                                                ? session->bkup.bkuphdr->end_time
                                                : session->bkup.bkuphdr->start_time));
          LSA_COPY (&session->bkup.last_chkpt_lsa, &session->bkup.bkuphdr->chkpt_lsa);
        }

      // 2. Walk the volumes inside this backup file
      while (true)
        {
          another_vol = fileio_get_next_restore_file (thread_p, session, to_volname, &to_volid);
          if (another_vol == 0)
            break;                        // FILEIO_BACKUP_END_PAGE_ID seen

          // Log volumes get staged to *_tmp first
          if (to_volid == LOG_DBLOG_ACTIVE_VOLID || to_volid == LOG_DBLOG_INFO_VOLID
              || to_volid == LOG_DBLOG_ARCHIVE_VOLID)
            {
              fileio_make_temp_log_files_from_backup (tmp_logfiles_from_backup, ...);
              volume_name_p = tmp_logfiles_from_backup;
            }
          else
            volume_name_p = to_volname;

          // Skip log/info volumes on lower-level passes (highest level wins)
          if (!first_time && (to_volid == LOG_DBLOG_BKUPINFO_VOLID
                              || to_volid == LOG_DBLOG_ACTIVE_VOLID
                              || to_volid == LOG_DBLOG_INFO_VOLID
                              || to_volid == LOG_DBVOLINFO_VOLID
                              || to_volid == LOG_DBLOG_ARCHIVE_VOLID
                              || to_volid == LOG_DBTDE_KEYS_VOLID))
            {
              fileio_skip_restore_volume (thread_p, session);
              continue;
            }

          // 3. Per-volid bitmap (created on first sight, reused across levels)
          if (to_volid >= LOG_DBFIRST_VOLID)
            {
              page_bitmap = fileio_page_bitmap_list_find (&page_bitmap_list, to_volid);
              if (page_bitmap == NULL)
                {
                  page_bitmap = fileio_page_bitmap_create (to_volid, total_pages);
                  fileio_page_bitmap_list_add (&page_bitmap_list, page_bitmap);
                }
            }

          // 4. Restore pages; bitmap suppresses overwrites of newer-level pages
          fileio_restore_volume (thread_p, session, volume_name_p, verbose_to_volname,
                                 prev_volname, page_bitmap,
                                 /*remember_pages=*/ start_level > FILEIO_BACKUP_FULL_LEVEL,
                                 is_prev_volheader_restored, unlinked_volinfo);

          // 5. Promote staged log iff backup's copy is fresher than on-disk
          if (volume_name_p == tmp_logfiles_from_backup)
            {
              if (logpb_is_log_active_from_backup_useful (...))
                os_rename_file (tmp_logfiles_from_backup, to_volname);
              else
                unlink (tmp_logfiles_from_backup);
            }
        }

      try_level = (FILEIO_BACKUP_LEVEL) (try_level - 1);
    }

  // 6. Late-fix linkage between volumes whose headers were absent in incrementals
  for (const auto &[volid, volnames] : unlinked_volinfo)
    disk_set_link (...);

  fileio_finish_restore (thread_p, session);
  LOG_CS_EXIT (thread_p);
  fileio_page_bitmap_list_destroy (&page_bitmap_list);
}

Reverse-time replay. try_level decrements. Each pass writes only pages whose (volid, pageid) bit is not yet set, so the freshest copy wins. The alternative (apply L0 then patch with L1, L2) would double-read shared pages and fight with partial volume headers in incrementals.

Log staging. Active/archive/info logs are extracted to *_tmp files first. logpb_is_log_active_from_backup_useful then decides whether the backup’s log is fresher than what is already on disk. If the on-disk active log is more recent than the next archive in the backup, the staged copy is discarded; otherwise it replaces the in-tree file. This matters for crash-and-restore-on-the-same-host scenarios where on-disk log records are newer than the backup.

Header-page tolerance. Incrementals frequently omit page 0 (the disk header) because its prv.lsa did not advance past the parent’s start LSA. fileio_restore_volume notes whether the disk header was present (incremental_includes_volume_header); if not, disk_set_link is deferred via the unlinked_volinfo map and applied after all levels have been processed. The L0 disk header survives as canonical while later passes still write the rest of the volume.

TDE key handling. The _keys master-key file is restored from the highest-level backup in the chain (closest to stopat); lower-level passes skip it. --keys-file-path overrides this when the master key was rotated after the parent backup was taken.

fileio_restore_volume mirrors fileio_backup_volume plus two concerns: filling holes (for full backups, gaps between consecutive iopageids represent unallocated pages that must be zeroed) and skipping pages already covered by newer levels.

// fileio_restore_volume — file_io.c (key parts)
while (true)
  {
    fileio_decompress_restore_volume (thread_p, session_p, nbytes);
    if (FILEIO_GET_BACKUP_PAGE_ID (session_p->dbfile.area) == FILEIO_BACKUP_FILE_END_PAGE_ID)
      {
        if (session_p->dbfile.level == FILEIO_BACKUP_FULL_LEVEL && next_page_id < npages)
          fileio_fill_hole_during_restore (thread_p, &next_page_id, npages, session_p, bitmap);
        break;
      }

    // Sanity check: pageid in range, dual pageid match
    if (FILEIO_GET_BACKUP_PAGE_ID (session_p->dbfile.area) > from_npages
        || !FILEIO_CHECK_RESTORE_PAGE_ID (session_p->dbfile.area, bkpagesize))
      return ER_IO_RESTORE_READ_ERROR;

    // Hole-fill on full backup
    if (session_p->dbfile.level == FILEIO_BACKUP_FULL_LEVEL
        && next_page_id < FILEIO_GET_BACKUP_PAGE_ID (session_p->dbfile.area))
      fileio_fill_hole_during_restore (thread_p, &next_page_id,
                                       session_p->dbfile.area->iopageid, session_p, bitmap);

    // Write each DB page in this chunk; bitmap suppresses overwrites
    buffer_p = (char *) &session_p->dbfile.area->iopage;
    for (i = 0; i < unit && next_page_id < npages; i++)
      {
        fileio_write_restore (thread_p, bitmap, session_p->dbfile.vdes,
                              buffer_p + i * IO_PAGESIZE, session_p->dbfile.volid,
                              next_page_id++, session_p->dbfile.level);
      }
  }

// After streaming: stamp checkpoint LSA into volume header (only if disk header was present)
if (session_p->dbfile.volid >= LOG_DBFIRST_VOLID
    && (session_p->dbfile.level == FILEIO_BACKUP_FULL_LEVEL
        || incremental_includes_volume_header))
  {
    disk_set_creation (thread_p, volid, to_vol_label_p, &backup_header_p->db_creation,
                       &session_p->bkup.last_chkpt_lsa, false, DISK_FLUSH_AND_INVALIDATE);
    if (volid != LOG_DBFIRST_VOLID && is_prev_vol_header_restored)
      disk_set_link (thread_p, prev_volid, volid, to_vol_label_p, false,
                     DISK_FLUSH_AND_INVALIDATE);
  }

fileio_write_restore is the bitmap consultation point: if bitmap != NULL && fileio_page_bitmap_is_set (bitmap, page_id), the write is skipped; otherwise the write proceeds and fileio_page_bitmap_set flips the bit. Pure-L0 restores pass bitmap = NULL so every page is written unconditionally.

disk_set_creation writes the backup header’s chkpt_lsa (session_p->bkup.last_chkpt_lsa) into the volume’s header page. The redo phase later reads volume-header LSAs and picks the minimum across all volumes as the redo-start cursor.

flowchart TD
    A[restoredb CLI] --> B[boot_restart_from_backup]
    B --> C[boot_restart_server with from_backup=true]
    C --> D[logpb_restore]
    D --> E{first_time?}
    E -->|yes| F[fileio_get_backup_volume<br/>resolve backup file path]
    E -->|no| G[bkup_match_time = previnfo at_time<br/>finish previous level]
    G --> F
    F --> H[fileio_start_restore<br/>read + auth FILEIO_BACKUP_HEADER]
    H --> I[Resolve stopat<br/>backup_time + r_args->stopat]
    I --> J[Loop fileio_get_next_restore_file<br/>one DB volume at a time]
    J --> K{volid is log?}
    K -->|yes| L[stage to *_tmp filename]
    K -->|no| M[restore directly]
    L --> N[fileio_restore_volume]
    M --> N
    N --> O[Per page: bitmap-skip → write → set bit<br/>fill holes for full level]
    O --> J
    J --> P{try_level > 0?}
    P -->|yes| Q[try_level--<br/>match previnfo at_time]
    Q --> F
    P -->|no| R[Fix unlinked volume headers<br/>via disk_set_link]
    R --> S[fileio_finish_restore<br/>fsync + close]
    S --> T[boot_restart_server proceeds<br/>mount + log_recovery]
    T --> U[log_recovery analysis →<br/>log_recovery_redo from chkpt_lsa<br/>up to stopat]
    U --> V[log_recovery_resetlog<br/>truncate log past stopat]

Recovery replay: log_recovery and log_recovery_resetlog

After logpb_restore returns, boot_restart_server mounts the restored volumes, calls boot_get_db_parm, walks the rest of the permanent volumes via boot_find_rest_volumes, and enters log_recovery (thread_p, ismedia_crash=1, &stopat). The ismedia_crash=1 flag is the signal that this is a backup-driven restart — it tells analysis to look at every volume’s disk header, not just log_Gl.hdr.chkpt_lsa, when picking the redo-start LSA.

// log_recovery — log_recovery.c
void
log_recovery (THREAD_ENTRY *thread_p, int ismedia_crash, time_t *stopat)
{
  LSA_COPY (&rcv_lsa, &log_Gl.hdr.chkpt_lsa);
  if (ismedia_crash)
    {
      // Lower rcv_lsa to the oldest chkpt_lsa stored in any volume header
      // (written there during fileio_restore_volume → disk_set_creation)
      fileio_map_mounted (thread_p,
                          (bool (*)(THREAD_ENTRY *, VOLID, void *)) log_rv_find_checkpoint,
                          &rcv_lsa);
    }
  else if (stopat)
    *stopat = -1;                 // crash recovery never stops early

  log_recovery_analysis (thread_p, &rcv_lsa, &start_redolsa, &end_redo_lsa,
                         ismedia_crash, stopat, &did_incom_recovery, &num_redo_log_records);
  log_recovery_redo (thread_p, &start_redolsa, &end_redo_lsa);
  log_recovery_undo (thread_p);

  if (did_incom_recovery)         // analysis cut log short at *stopat
    log_recovery_resetlog (thread_p, &record_header_lsa, prev_lsa);
}

The PITR cut is made inside log_recovery_analysis, not redo. Analysis scans forward from rcv_lsa parsing every record header. When it sees a LOG_COMMIT or LOG_ABORT whose LOG_REC_DONETIME.at_time > *stopat, it sets did_incom_recovery = true and lowers end_redo_lsa to the LSA of the previous record. Redo then replays only [start_redolsa, end_redo_lsa], which cannot include the late-committing transaction. log_recovery_resetlog afterwards truncates the active log past end_redo_lsa and invalidates archives containing records past the cut so a subsequent crash recovery cannot re-apply them — making the PITR cut irreversible (much like Postgres's recovery_target semantics).
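
The cut reduces to a comparator over commit/abort completion times. A toy version over an in-memory record stream (illustrative only — the real scan parses log record headers as it goes):

// analysis-phase PITR cut — illustrative sketch only
#include <stdio.h>
#include <time.h>

typedef struct { long lsa; time_t at_time; } done_record_t;   // commit/abort end record

// Returns the last LSA whose effects may be replayed: the record just before the
// first commit/abort that finished after the stop-at timestamp.
static long resolve_end_redo_lsa (const done_record_t *recs, int n, time_t stopat,
                                  int *did_incom_recovery)
{
  long end_redo_lsa = -1;
  *did_incom_recovery = 0;
  for (int i = 0; i < n; i++)
    {
      if (recs[i].at_time > stopat)
        {
          *did_incom_recovery = 1;        // log is cut short; resetlog must truncate
          break;
        }
      end_redo_lsa = recs[i].lsa;
    }
  return end_redo_lsa;
}

int main (void)
{
  done_record_t recs[] = { {100, 1000}, {150, 1010}, {220, 1025}, {300, 1040} };
  int cut = 0;
  long end = resolve_end_redo_lsa (recs, 4, /*stopat=*/ 1020, &cut);
  printf ("end_redo_lsa=%ld, incomplete=%d\n", end, cut);       // 150, 1
  return 0;
}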

Mount on restored volumes: boot_restart_server

The from_backup branch of boot_restart_server runs logpb_restore (above), then proceeds with the standard restart sequence: boot_mount (LOG_DBFIRST_VOLID), disk_get_boot_hfid, boot_get_db_parm, tde_cipher_initialize (with r_args->keys_file_path), heap_cache_class_info, boot_find_rest_volumes (which walks _vinf and mounts every non-first permanent volume; with r_args != NULL it accepts newvolpath = true for databases.txt-driven relocation), disk_manager_init, and finally log_recovery (..., &r_args->stopat). After redo+undo, logpb_recreate_volume_info rewrites _vinf so subsequent normal restarts see the same set of volumes the restore reconstructed.

If the destination is a directory and cubrid_backup_volume_max_size_bytes is set, fileio_flush_backup writes a FILEIO_BACKUP_VOL_CONT_PAGE_ID marker when the next write would overflow, then closes the current file and either auto-generates the next unit (db.bkLv0v002, v003, …) via fileio_get_next_backup_volume or prompts. fileio_add_volume_to_backup_info updates the in-memory bkvinf cache that is flushed to _lginfo/db.bkupinfo. On restore, fileio_continue_restore follows the chain via db_prec_bkvolname and prompts for missing units.

backupdb (util_cs.c) flags map into logpb_backup parameters: -D (destination, directory or FIFO), -l (level 0/1/2), -r (remove archives older than chkpt_lsa after backup), -o (verbose output), --no-check (skip CHECKDB pre-pass — consistency verification can outrun the backup itself), --no-compress (disable LZ4), -t (read worker count, server mode only), --sleep-msecs (per-1MB throttle), --separate-keys (TDE master key sidecar), -S (SA mode).

restoredb (util_sa.c, SA-only by definition) flags map into BO_RESTART_ARG: -d "YYYY-MM-DD..." or -d backuptime (stopat / restore_upto_bktime), -l (level, highest level to apply), -B (backuppath, explicit backup path), -o (verbose_file), -u (newvolpath, use databases.txt), -t (printtoc, list only), -p (partial archive log replay), --keys (keys_file_path).

A few things worth flagging:

  • No CRC over backup pages. The dual-pageid trick is the only consistency check; corrupt-but-valid-pageid pages will pass through. CUBRID relies on the underlying filesystem and on operator process around backup-volume integrity (compression with checksum-bearing LZ4 frames is the closest the engine gets, and it is disabled by --no-compress).
  • No incremental verify. restoredb -t (printtoc) lists what is in the backup but does not validate it end-to-end the way Oracle RMAN’s restore validate does. The backup is “validated” only by attempting an actual restore.
  • No streaming WAL. Unlike Postgres, there is no equivalent of archive_command that would let the user keep restoring forward with WAL beyond what was archived inside the backup. The PITR window is [chkpt_lsa, end_lsa] for that backup; to extend it the user would need a newer backup or HA replica replay (which is a different code path entirely, in the cubrid_replica_* modules).
  • No partial-volume restore. Either the entire volume comes back or none of it does; there is no per-table restore. Logical export is via unloaddb / loaddb.

Freeze-checkpoint via sentinel reuse. Setting log_Gl.run_nxchkpt_atpageid = NULL_PAGEID borrows the in-progress sentinel from the checkpoint daemon rather than introducing a new mutex. Economical, but couples backup correctness to checkpoint semantics — a future change allowing concurrent checkpoints would silently break backup gating.

LSA-internal, timestamp-external. The LSA_LT(parent_chkpt_lsa, page.prv.lsa) rule is LSA-clean, but the user-facing PITR surface is wall-clock time. logpb_check_stop_at_time does almost nothing — the actual translation happens inside log_recovery_analysis while parsing each commit/abort record. Adding --stopat-lsa would be a small change confined to BO_RESTART_ARG plus log_recovery_analysis.

End-time monotonicity sleep. The 1-second forced sleep in fileio_finish_backup is an acknowledged workaround pending millisecond-resolution LOG_REC_DONETIME. Until then, restoring to end_time - 1 is conservative.

Reverse-time level walk with bitmap. One bit per DB page per volume — ~80 MB for a 10 TB / 16 KB-page database, fine. The bitmap list (FILEIO_RESTORE_PAGE_BITMAP_LIST) walks linearly in fileio_page_bitmap_list_find; with hundreds of volumes this is O(volumes²) but not currently a hotspot.
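
The footprint claim checks out with simple arithmetic (assumed 16 KB pages):

// bitmap footprint — illustrative arithmetic only
#include <stdio.h>

int main (void)
{
  long long db_bytes  = 10LL * 1024 * 1024 * 1024 * 1024;   // 10 TB of data
  long long page_size = 16 * 1024;                          // 16 KB pages
  long long pages     = db_bytes / page_size;               // ~671 million pages
  long long bitmap_mb = pages / 8 / (1024 * 1024);          // one bit per page
  printf ("%lld pages -> ~%lld MB of bitmap\n", pages, bitmap_mb);  // ~80 MB
  return 0;
}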

Disk-header tolerance. incremental_includes_volume_header + unlinked_volinfo is post-hoc fixup that emerged because L1 backups frequently omit the disk header page (its prv.lsa did not advance past the parent). Forcing the header into every incremental would simplify restore at the cost of slightly larger incrementals.

Symbol — File:Line

FILEIO_BACKUP_LEVEL enum — src/storage/file_io.h:96
FILEIO_BACKUP_VOL_TYPE enum — src/storage/file_io.h:122
FILEIO_BACKUP_TYPE enum — src/storage/file_io.h:146
fileio_backup_page struct — src/storage/file_io.h:239
page_bitmap struct — src/storage/file_io.h:255
fileio_backup_record_info struct — src/storage/file_io.h:271
fileio_backup_header struct — src/storage/file_io.h:280
fileio_backup_buffer struct — src/storage/file_io.h:317
fileio_backup_db_buffer struct — src/storage/file_io.h:341
fileio_backup_file_header struct — src/storage/file_io.h:351
io_backup_session struct — src/storage/file_io.h:423
FILEIO_BACKUP_*_PAGE_ID constants — src/storage/file_io.c:273
FILEIO_FULL_LEVEL_EXP — src/storage/file_io.c:197
FILEIO_BACKUP_HEADER_IO_SIZE — src/storage/file_io.c:205
FILEIO_BACKUP_FILE_HEADER_PAGE_SIZE — src/storage/file_io.c:227
fileio_initialize_backup — src/storage/file_io.c:6732
fileio_initialize_backup_thread — src/storage/file_io.c:6654
fileio_finalize_backup_thread — src/storage/file_io.c:6966
fileio_abort_backup — src/storage/file_io.c:7055
fileio_start_backup — src/storage/file_io.c:7158
fileio_write_backup_end_time_to_header — src/storage/file_io.c:7258
fileio_finish_backup — src/storage/file_io.c:7340
fileio_remove_all_backup — src/storage/file_io.c:7486
fileio_compress_backup_node — src/storage/file_io.c:7702
fileio_write_backup_node — src/storage/file_io.c:7788
fileio_read_backup_volume — src/storage/file_io.c:7844
fileio_write_backup_volume — src/storage/file_io.c:8055
fileio_start_backup_thread — src/storage/file_io.c:8190
fileio_backup_volume — src/storage/file_io.c:8258
fileio_flush_backup — src/storage/file_io.c:8620
fileio_write_backup_header — src/storage/file_io.c:8930
fileio_initialize_restore — src/storage/file_io.c:8987
fileio_abort_restore — src/storage/file_io.c:9023
fileio_read_restore — src/storage/file_io.c:9039
fileio_read_restore_header — src/storage/file_io.c:9240
fileio_start_restore — src/storage/file_io.c:9313
fileio_continue_restore — src/storage/file_io.c:9384
fileio_finish_restore — src/storage/file_io.c:9732
fileio_list_restore — src/storage/file_io.c:9752
fileio_get_backup_volume — src/storage/file_io.c:9895
fileio_get_next_restore_file — src/storage/file_io.c:10009
fileio_fill_hole_during_restore — src/storage/file_io.c:10084
fileio_decompress_restore_volume — src/storage/file_io.c:10133
fileio_restore_volume — src/storage/file_io.c:10287
fileio_write_restore — src/storage/file_io.c:10594
fileio_skip_restore_volume — src/storage/file_io.c:10647
fileio_get_next_backup_volume — src/storage/file_io.c:10891
fileio_add_volume_to_backup_info — src/storage/file_io.c:11042
fileio_page_bitmap_list_init — src/storage/file_io.c:11694
fileio_page_bitmap_list_find — src/storage/file_io.c:11746
fileio_page_bitmap_list_add — src/storage/file_io.c:11777
fileio_page_bitmap_set — src/storage/file_io.c:11850
fileio_page_bitmap_is_set — src/storage/file_io.c:11865
logpb_initialize_backup_info — src/transaction/log_page_buffer.c:1277
logpb_backup_for_volume — src/transaction/log_page_buffer.c:7447
logpb_backup — src/transaction/log_page_buffer.c:7593
logpb_check_stop_at_time — src/transaction/log_page_buffer.c:8356
logpb_restore — src/transaction/log_page_buffer.c:8416
logpb_copy_database — src/transaction/log_page_buffer.c:9404
logpb_backup_needed_archive_logs — src/transaction/log_page_buffer.c:10752
logpb_backup_level_info_to_string — src/transaction/log_page_buffer.c:11253
log_recovery — src/transaction/log_recovery.c:736
log_recovery_redo — src/transaction/log_recovery.c:3251
log_recovery_resetlog — src/transaction/log_recovery.c:5221
bo_restart_arg struct — src/transaction/boot_sr.h:112
boot_restart_server (from_backup branch) — src/transaction/boot_sr.c:1969
xboot_restart_from_backup — src/transaction/boot_sr.c:2808
boot_reset_mk_after_restart_from_backup — src/transaction/boot_sr.c:2899
xboot_backup — src/transaction/boot_sr.c:3918
backupdb (CLI entry) — src/executables/util_cs.c:130
restoredb (CLI entry) — src/executables/util_sa.c:967

Natural extensions, ordered from least to most invasive:

Page-level CRC. A 4-byte checksum in the FILEIO_BACKUP_PAGE trailer would catch silent corruption between fileio_write_backup and fileio_decompress_restore_volume. Backward-incompatible bump of bk_hdr_version; the engine already supports version-gated reads (see CUBRID_MAGIC_DATABASE_BACKUP_OLD in fileio_continue_restore).
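
A hypothetical shape for such a trailer CRC — invented field and sizes, shown only to gauge the scope of the change:

// proposed trailer CRC — hypothetical extension, not existing CUBRID code
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Table-less reflected CRC-32 (poly 0xEDB88320), kept self-contained for the sketch.
static uint32_t crc32_bytes (const void *buf, size_t len)
{
  const uint8_t *p = buf;
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= p[i];
      for (int b = 0; b < 8; b++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  return ~crc;
}

#define PAGE_SIZE 4096                    // illustrative payload size

typedef struct {
  long iopageid;
  char payload[PAGE_SIZE];
  long iopageid_dup;
  uint32_t payload_crc;                   // proposed new trailer field
} backup_page_with_crc_t;

int main (void)
{
  backup_page_with_crc_t bp;
  memset (&bp, 0, sizeof bp);
  memcpy (bp.payload, "page contents", 13);
  bp.payload_crc = crc32_bytes (bp.payload, sizeof bp.payload);   // backup side

  bool ok = crc32_bytes (bp.payload, sizeof bp.payload) == bp.payload_crc;  // restore side
  printf ("crc ok: %d\n", ok);
  bp.payload[100] ^= 0x1;                 // simulate silent corruption
  printf ("crc ok after flip: %d\n",
          crc32_bytes (bp.payload, sizeof bp.payload) == bp.payload_crc);
  return 0;
}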

Stopat by LSA. Exposing --stopat-lsa would side-step timestamp monotonicity. Internally a tiny change in log_recovery_analysis (switch comparator from at_time to LSA); UX challenge is that LSAs are not human-friendly. Mitigation: extend cubrid_log output to print at_time, LSA pairs so operators can pick a target.

Online incremental verify. A restoredb --validate mode that streams every page through FILEIO_CHECK_RESTORE_PAGE_ID (and optional CRC) without writing to disk. Skeleton already exists in fileio_list_restore (header-only); extending to per-page is straightforward.

Streaming archive log to backup destination. Postgres-style continuous WAL archiving into the backup directory would extend the PITR window beyond end_time. Requires either a new cub_backup_writer daemon or a hook in the archive flush path. Disk-format change is minimal (bkvinf already has per-archive entries).

Block-device direct I/O. fileio_initialize_backup detects raw devices but always uses buffered I/O. Adding O_DIRECT for the backup destination would cut double-buffering on very large backups; trickiness is bkpagesize alignment to the device logical block size.

Per-table backup. CUBRID’s disk format scatters table data across heap files, B-trees, and overflow files; a logical unloaddb already covers this use case. A physical per-table backup would require walking file_tracker.c and copying only the file IDs belonging to a class — a significant project across class catalog, B-tree, and partitioning code.