
CUBRID Flashback — Transaction Summary and Per-Tran Replay From the Log


Flashback answers the question “what happened to my data between two points in the past, and can I see it?” It is not rollback — the database state is not changed; the operator gets a report of changes (or a regenerable script) that they can apply manually. The closest classical concept is “log mining” — turning the WAL into a queryable history.

Database Internals (Petrov) does not have a dedicated flashback chapter, but the topic sits at the intersection of ch. 5 (Recovery, WAL) and ch. 11 (Logging). Two implementation choices the model leaves open shape every flashback implementation and frame the rest of this document:

  1. Forward walk or backward walk? A user typically asks “between time A and B”. The implementation can scan forward from A, or scan backward from B; the choice determines which direction the per-transaction LSA chain is followed. CUBRID picks forward walk in two phases: (1) summary phase scans forward and accumulates per-trid counts; (2) loginfo phase scans the chosen trid’s range forward, materialising row images. The “backward” framing in the doc is conceptual — flashback restores a past state, even though the implementation walks the log forward.
  2. Whole-log mining or filtered? A long log range can hold millions of records. Filtering by class OID and user is essential for human-scale output. CUBRID supports both filters and caps the summary size at FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY so an unfiltered query doesn’t blow up.

After the choices are named, every CUBRID-specific structure in this document either implements one of them or makes the access faster.

Engines that ship flashback (Oracle, CUBRID, SQL Server’s “Temporal Tables”) share a small handful of patterns.

The user almost never wants every event in a wide time range. First phase: enumerate transactions and return a small per-transaction summary (trid, user, time, counts of INSERT/UPDATE/DELETE, classes touched). Second phase: the user picks a transaction and asks for its full record stream. The summary phase amortises the log walk across many transactions; the detail phase amortises the record decode across many DML statements within the chosen transaction.

Flashback and CDC both walk the log forward. Modern engines share the walker code: the per-record decoder, the LSA-to-time mapping, the indirect undo/redo chase. CUBRID does this — the flashback loginfo path packs into CDC_LOGINFO_ENTRY (the same struct CDC uses) and chases data records via cdc_get_recdes with is_flashback=true.

Archive retention separate from CDC retention


A flashback request can pin log archives older than any CDC consumer needs. The retention watermark is therefore a two-input minimum: smallest CDC pageid kept and smallest flashback pageid kept. CUBRID exposes cdc_min_log_pageid_to_keep and flashback_min_log_pageid_to_keep separately and the archive remove daemon takes the min of both.
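The two-input minimum can be sketched as follows. This is a minimal illustration, not the daemon's actual code: `min_pageid_to_keep` and the `NULL_PAGEID` sentinel (meaning "no consumer is pinning anything") are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

using LOG_PAGEID = std::int64_t;
constexpr LOG_PAGEID NULL_PAGEID = -1;   // hypothetical sentinel: nothing pinned

// Hypothetical helper mirroring what the archive remove daemon must compute:
// the oldest log page any consumer (CDC or flashback) still needs. Archives
// holding only pages older than the returned pageid are eligible for removal.
LOG_PAGEID min_pageid_to_keep (LOG_PAGEID cdc_min, LOG_PAGEID flashback_min)
{
  if (cdc_min == NULL_PAGEID)
    return flashback_min;
  if (flashback_min == NULL_PAGEID)
    return cdc_min;
  return std::min (cdc_min, flashback_min);
}
```

Either consumer alone can pin archives; only when both have released does the watermark advance.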

The user phrases the request in wall-clock time, but the engine walks LSAs. Flashback needs time_t → LOG_LSA resolution. CUBRID’s flashback_verify_time is the boundary checker; it walks log records’ commit timestamps to find the LSA that brackets a given time.

| Theoretical concept | CUBRID name |
| --- | --- |
| Flashback summary entry | FLASHBACK_SUMMARY_ENTRY { trid, user, start_time, end_time, counts, lsas, classoid_set } |
| Server-side summary context | FLASHBACK_SUMMARY_CONTEXT (flashback.h:87) |
| Server-side loginfo context | FLASHBACK_LOGINFO_CONTEXT (flashback.h:100) |
| Client-side summary entry | FLASHBACK_SUMMARY_INFO (flashback_cl.h:49) |
| Client-side summary map | FLASHBACK_SUMMARY_INFO_MAP (flashback_cl.h:58) |
| Wall-time → LSA resolver | flashback_verify_time (flashback.h:117) |
| Summary builder | flashback_make_summary_list (flashback.c:284) |
| Loginfo builder | flashback_make_loginfo (flashback.c:767) |
| Server-side init | flashback_initialize (flashback.c:109) |
| Archive-keep watermark | flashback_min_log_pageid_to_keep (flashback.h:128) |
| Active-flashback gate | flashback_is_needed_to_keep_archive (flashback.h:129) |
| Time-budget gate | flashback_check_time_exceed_threshold (flashback.h:130) |
| Per-event entry shape (shared with CDC) | CDC_LOGINFO_ENTRY (log_impl.h) |
| Per-tran summary cap | FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY macro → PRM_ID_FLASHBACK_MAX_TRANSACTION |

The flashback module has three moving parts: the summary phase that turns “time range + filters” into a per-transaction list, the loginfo phase that turns “one transaction” into a detailed event stream, and the archive-retention discipline that keeps log volumes alive while flashback requests are in progress. We walk them in that order.

flowchart LR
  subgraph CL["Utility / client (flashback_cl)"]
    USER["operator: cubrid flashback"]
    DEC["unpack + print"]
    USER --> DEC
  end
  subgraph SRV["Server (flashback.c)"]
    PHASE1["flashback_make_summary_list\n(forward walk, count per trid)"]
    PHASE2["flashback_make_loginfo\n(forward walk for one trid)"]
    PACK["flashback_pack_summary_entry\nflashback_pack_loginfo"]
  end
  subgraph LOG["WAL (archived volumes)"]
    LOGV["log archive volumes\n(cubrid-log-manager.md)"]
  end
  subgraph CDCSHR["Shared with CDC"]
    LRD["log_reader"]
    CGR["cdc_get_recdes (is_flashback=true)"]
    LE["CDC_LOGINFO_ENTRY"]
  end
  USER -->|time A,B + class/user filter| PHASE1
  PHASE1 --> LRD --> LOGV
  PHASE1 --> PACK -->|summary buffer| DEC
  USER -->|chosen trid| PHASE2
  PHASE2 --> LRD
  PHASE2 --> CGR
  PHASE2 --> LE --> PACK -->|loginfo buffer| DEC
  PHASE1 -.set retention.-> RET["flashback_set_min_log_pageid_to_keep"]
  RET -.archive remove daemon checks.-> LOGV

The figure encodes three boundaries. (client / server) the human-facing print/format runs client-side; the log walk runs server-side. (phase 1 / phase 2) the summary phase produces small per-tran rows; the loginfo phase produces the verbose event stream for one chosen tran. (flashback / CDC) the forward-walking machinery is shared with CDC, and the per-event wire format is CDC_LOGINFO_ENTRY — flashback was built after CDC and reuses its plumbing rather than duplicating.

The summary phase is driven by a context object:

// FLASHBACK_SUMMARY_CONTEXT — src/transaction/flashback.h:87
struct flashback_summary_context
{
  LOG_LSA start_lsa;                   /* time A → LSA */
  LOG_LSA end_lsa;                     /* time B → LSA */
  char *user;                          /* whitelist user (or NULL = all) */
  int num_summary;                     /* output: filled by builder */
  int num_class;
  std::vector<OID> classoids;          /* whitelist class OIDs */
  std::map<TRANID, FLASHBACK_SUMMARY_ENTRY> summary_list;
};

The summary list maps trid → per-tran roll-up. Each entry:

// FLASHBACK_SUMMARY_ENTRY — src/transaction/flashback.h:63
struct flashback_summary_entry
{
  TRANID trid;
  char user[DB_MAX_USER_LENGTH + 1];
  time_t start_time;
  time_t end_time;
  int num_insert;
  int num_update;
  int num_delete;
  LOG_LSA start_lsa;
  LOG_LSA end_lsa;
  std::unordered_set<OID> classoid_set;   /* classes this tran touched */
};

flashback_make_summary_list (flashback.c:284) is the builder. Its body walks the log forward from start_lsa to end_lsa, visiting every record. For each record:

  • If it’s a LOG_SUPPLEMENTAL_INFO of type INSERT/UPDATE/DELETE on a whitelisted class and trid, increment the corresponding per-tran counter and add the class OID to the per-tran set.
  • If it’s a LOG_SUPPLEMENT_TRAN_USER, record the user name (and filter the trid out if the user doesn’t match).
  • If it’s LOG_COMMIT or LOG_ABORT, finalise the per-tran end LSA and end time.

The summary list is capped at FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY transactions (configurable via the parameter PRM_ID_FLASHBACK_MAX_TRANSACTION); beyond the cap, additional transactions are dropped from the summary to keep memory bounded.
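The per-record dispatch above can be condensed into a sketch. Everything here (accumulate, SummaryEntry, OIDHash, max_tran) is a hypothetical stand-in; the real flashback_make_summary_list also records user names and finalises end LSAs on commit/abort:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <unordered_set>

using TRANID = int;

struct OID
{
  int volid, pageid, slotid;
  bool operator== (const OID &o) const
  { return volid == o.volid && pageid == o.pageid && slotid == o.slotid; }
};
struct OIDHash
{
  std::size_t operator() (const OID &o) const
  { return std::hash<int> () (o.pageid * 31 + o.slotid); }
};

enum RecType { REC_INSERT, REC_UPDATE, REC_DELETE };

struct SummaryEntry
{
  int num_insert = 0, num_update = 0, num_delete = 0;
  std::unordered_set<OID, OIDHash> classes;   /* set: one slot per class */
};

/* Accumulate one supplemental DML record into the per-trid summary.
 * max_tran plays the role of FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY: a trid not
 * yet tracked is dropped once the map is full, bounding memory. */
void accumulate (std::map<TRANID, SummaryEntry> &summary, TRANID trid,
                 RecType type, const OID &classoid, std::size_t max_tran)
{
  if (summary.find (trid) == summary.end () && summary.size () >= max_tran)
    return;   /* over the cap: this transaction is not summarised */

  SummaryEntry &e = summary[trid];
  switch (type)
    {
    case REC_INSERT: e.num_insert++; break;
    case REC_UPDATE: e.num_update++; break;
    case REC_DELETE: e.num_delete++; break;
    }
  e.classes.insert (classoid);   /* dedup: a class touched N times costs one slot */
}
```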

The packing function flashback_pack_summary_entry (flashback.h:119) serialises the summary into a wire buffer the client decodes via flashback_unpack_and_print_summary (flashback_cl.h:62). The wire size of one entry without the class set is:

// OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS — src/transaction/flashback.h:79
#define OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS \
  (OR_INT_SIZE                          /* trid */ \
   + DB_MAX_USER_LENGTH + MAX_ALIGNMENT \
   + OR_INT64_SIZE * 2                  /* start_time, end_time */ \
   + OR_INT_SIZE * 3                    /* counts */ \
   + OR_LOG_LSA_SIZE * 2                /* start/end LSA */ \
   + OR_INT_SIZE)                       /* num classes */
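With illustrative values for the constants (all assumed here for the sake of arithmetic — the real OR_* and DB_MAX_USER_LENGTH values live in the CUBRID headers and may differ), the macro reduces to a fixed per-entry byte count:

```cpp
// All constant values below are assumptions for illustration only.
constexpr int OR_INT_SIZE = 4;
constexpr int OR_INT64_SIZE = 8;
constexpr int OR_LOG_LSA_SIZE = 8;
constexpr int DB_MAX_USER_LENGTH = 32;
constexpr int MAX_ALIGNMENT = 8;

constexpr int SUMMARY_ENTRY_SIZE_WITHOUT_CLASS =
    OR_INT_SIZE                           /* trid */
    + DB_MAX_USER_LENGTH + MAX_ALIGNMENT  /* user name, aligned */
    + OR_INT64_SIZE * 2                   /* start_time, end_time */
    + OR_INT_SIZE * 3                     /* insert/update/delete counts */
    + OR_LOG_LSA_SIZE * 2                 /* start/end LSA */
    + OR_INT_SIZE;                        /* number of classes that follow */
// 4 + 40 + 16 + 12 + 16 + 4 = 92 bytes per entry under these assumptions
```

The class OIDs are packed after this fixed prefix, which is why the macro's name says "without class".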

Once the operator picks a transaction from the summary, the second phase fetches its full event stream:

// FLASHBACK_LOGINFO_CONTEXT — src/transaction/flashback.h:100
struct flashback_loginfo_context
{
  TRANID trid;                          /* the chosen trid */
  char *user;
  LOG_LSA start_lsa;                    /* normally summary.start_lsa */
  LOG_LSA end_lsa;
  int num_class;                        /* class filter cardinality */
  int forward;                          /* direction (always forward in current implementation) */
  int num_loginfo;                      /* output count */
  int queue_size;
  OID invalid_class;                    /* class observed during walk that wasn't in filter — diagnostics */
  std::unordered_set<OID> classoid_set; /* whitelist */
  std::queue<CDC_LOGINFO_ENTRY *> loginfo_queue;
};

flashback_make_loginfo (flashback.c:767) walks the log range again, this time emitting one CDC_LOGINFO_ENTRY per matching event. The chase from supplemental → underlying data record uses cdc_get_recdes with is_flashback=true:

// from cubrid-cdc.md, the shared chase
int cdc_get_recdes (THREAD_ENTRY *thread_p,
LOG_LSA *undo_lsa, RECDES *undo_recdes,
LOG_LSA *redo_lsa, RECDES *redo_recdes,
bool is_flashback);

The is_flashback=true flag changes behaviour in two places: (a) missing pages or broken chains are tolerated (a chain that goes off the end of a removed archive returns S_END rather than S_ERROR); (b) the function is willing to re-fetch from older archives if needed.
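The tolerance difference can be stated as a two-line decision. This is a hypothetical condensation — in the real cdc_get_recdes the branching is spread across the chain-chasing code, not isolated in one helper:

```cpp
// SCAN_CODE values as used in CUBRID's scan interfaces.
enum SCAN_CODE { S_SUCCESS, S_END, S_ERROR };

/* Hypothetical condensation: what a broken undo/redo chain means depends on
 * the caller. CDC treats it as an error; flashback treats it as "history
 * ends here" and reports whatever was recovered so far. */
SCAN_CODE classify_chain_break (bool is_flashback)
{
  return is_flashback ? S_END : S_ERROR;
}
```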

The packing function flashback_pack_loginfo (flashback.h:123) serialises the queue into a wire buffer that the client prints via flashback_print_loginfo (flashback_cl.h:65).

The user supplies a wall-clock time range; the engine needs LSAs:

// flashback_verify_time — src/transaction/flashback.h:117
int flashback_verify_time (THREAD_ENTRY *thread_p,
                           time_t *start_time, time_t *end_time,
                           LOG_LSA *start_lsa, LOG_LSA *end_lsa);

The function walks log records carrying timestamps (LOG_REC_DONETIME, LOG_REC_HA_SERVER_STATE, the donetime field on LOG_REC_START_POSTPONE) until it brackets the requested times. The output *_lsa are the LSAs that span the time range; on out-of-range request (e.g., before the oldest archive) it returns an error so the operator can be told that flashback can’t go back that far.
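The bracketing idea can be sketched over a list of timestamped records in log order. All names here are hypothetical; the real function reads the records from the log volumes rather than a vector, but the bracketing logic is the same shape:

```cpp
#include <cstdint>
#include <ctime>
#include <vector>

struct LOG_LSA { std::int64_t pageid; short offset; };
struct TimedRecord { std::time_t done_time; LOG_LSA lsa; };

/* Hypothetical sketch: given commit-time records sorted in log order, find
 * the LSAs that span [a, b]. Returns false when the window lies outside the
 * available records, mirroring the error flashback reports when archives no
 * longer go back that far. */
bool bracket_time_range (const std::vector<TimedRecord> &recs,
                         std::time_t a, std::time_t b,
                         LOG_LSA *start_lsa, LOG_LSA *end_lsa)
{
  if (recs.empty () || a > recs.back ().done_time || b < recs.front ().done_time)
    return false;

  *start_lsa = recs.front ().lsa;
  *end_lsa = recs.front ().lsa;
  for (const TimedRecord &r : recs)
    {
      if (r.done_time <= a)
        *start_lsa = r.lsa;     /* last record at or before A */
      if (r.done_time <= b)
        *end_lsa = r.lsa;       /* last record at or before B */
    }
  return true;
}
```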

Archive retention — flashback_min_log_pageid_to_keep


A flashback request in flight pins log volumes:

// Retention API — src/transaction/flashback.h
extern LOG_PAGEID flashback_min_log_pageid_to_keep ();
extern bool flashback_is_needed_to_keep_archive ();
extern bool flashback_check_time_exceed_threshold (int *threshold);
extern void flashback_set_min_log_pageid_to_keep (LOG_LSA *lsa);
extern void flashback_set_request_done_time ();
extern void flashback_set_status_active ();
extern void flashback_set_status_inactive ();
extern void flashback_reset ();

The discipline:

  1. Operator starts a flashback request → flashback_set_status_active, flashback_set_min_log_pageid_to_keep to the request’s start_lsa.pageid.
  2. The archive remove daemon (log_wakeup_remove_log_archive_daemon in cubrid-log-manager.md) takes min(cdc_min_log_pageid_to_keep, flashback_min_log_pageid_to_keep) and refuses to delete archives whose pageids are above that minimum.
  3. When the request finishes (or hits a configurable timeout per flashback_check_time_exceed_threshold), flashback_set_status_inactive is called and the daemon resumes deleting eligible archives.

The timeout exists so a stuck or abandoned flashback request doesn’t pin archives forever.
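A minimal sketch of the time-budget check, under assumed semantics (the hypothetical helper below compares elapsed time against a threshold; the real flashback_check_time_exceed_threshold reads its threshold from system parameters and internal request state):

```cpp
#include <ctime>

/* Hypothetical gate: true when the gap since the last client activity
 * exceeds the threshold. Writes the overshoot so the caller can log
 * "exceeded by N seconds" before releasing the pinned archives. */
bool time_exceeds_threshold (std::time_t last_request_done, std::time_t now,
                             int threshold_sec, int *exceeded_by_sec)
{
  int elapsed = static_cast<int> (now - last_request_done);
  if (elapsed <= threshold_sec)
    return false;
  if (exceeded_by_sec != nullptr)
    *exceeded_by_sec = elapsed - threshold_sec;
  return true;
}
```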

Active vs. inactive — single-request gate


flashback_set_status_active and flashback_set_status_inactive flip a global. The current implementation appears to support only one active flashback request at a time (see open question §1) — the _request_done_time and _check_time_exceed_threshold machinery is per-status, not per-request. Multi-tenant deployments that need concurrent flashback would need additional plumbing.
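If the global flag were made race-safe for a single slot, the usual shape is a compare-and-swap gate. This is a hypothetical sketch consistent with the single-request behaviour described above, not the actual flashback.c state:

```cpp
#include <atomic>

// Hypothetical single-slot gate for the global active/inactive status.
std::atomic<bool> g_flashback_active {false};

/* Returns true only for the one caller that wins the slot; a second
 * concurrent request sees false and must wait or fail. */
bool flashback_try_set_active ()
{
  bool expected = false;
  return g_flashback_active.compare_exchange_strong (expected, true);
}

void flashback_set_inactive ()
{
  g_flashback_active.store (false);
}
```

Whether the real code does this atomically, or simply assumes one operator at a time, is exactly open question §1.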

sequenceDiagram
  participant OP as Operator
  participant CL as flashback_cl
  participant SR as flashback (server)
  participant LR as log_reader
  participant CGR as cdc_get_recdes
  participant ARD as archive remove daemon

  OP->>CL: cubrid flashback --start A --end B --classes c1,c2
  CL->>SR: flashback_verify_time (A, B)
  SR-->>CL: start_lsa, end_lsa
  CL->>SR: flashback_set_status_active + min_pageid
  Note over ARD: archives at start_lsa..end_lsa now pinned
  CL->>SR: flashback_make_summary_list (filter, summary_list)
  SR->>LR: walk start_lsa..end_lsa
  LR-->>SR: records
  SR->>SR: per-trid count, classoid set
  SR-->>CL: packed summary buffer
  CL->>OP: print summary list
  OP->>CL: pick trid T
  CL->>SR: flashback_make_loginfo (trid=T)
  SR->>LR: walk T's range
  loop each LOG_SUPPLEMENT_*
    SR->>CGR: chase to undo+redo
    CGR-->>SR: undo_recdes, redo_recdes
    SR->>SR: pack CDC_LOGINFO_ENTRY
  end
  SR-->>CL: packed loginfo buffer
  CL->>OP: print event stream
  OP->>CL: done
  CL->>SR: flashback_set_status_inactive + reset

Anchor on symbol names, not line numbers.

  • FLASHBACK_SUMMARY_ENTRY (flashback.h) — server-side per-tran roll-up.
  • FLASHBACK_SUMMARY_CONTEXT (flashback.h) — server-side context for summary phase.
  • FLASHBACK_LOGINFO_CONTEXT (flashback.h) — server-side context for loginfo phase.
  • FLASHBACK_SUMMARY_INFO (flashback_cl.h) — client-side decoded summary entry.
  • FLASHBACK_SUMMARY_INFO_MAP (flashback_cl.h) — client-side summary map.
  • FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY macro (flashback.h) — per-request cap.
  • OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS macro (flashback.h) — wire size.
  • flashback_initialize (flashback.c) — boot-time setup.
  • flashback_make_summary_list (flashback.c) — phase 1.
  • flashback_make_loginfo (flashback.c) — phase 2.
  • flashback_verify_time (flashback.h, defined in flashback.c) — time → LSA.
  • flashback_pack_summary_entry (flashback.h) — wire pack.
  • flashback_pack_loginfo (flashback.h) — wire pack.
  • flashback_min_log_pageid_to_keep (flashback.h).
  • flashback_is_needed_to_keep_archive (flashback.h).
  • flashback_check_time_exceed_threshold (flashback.h).
  • flashback_is_loginfo_generation_finished (flashback.h).
  • flashback_set_min_log_pageid_to_keep (flashback.h).
  • flashback_set_request_done_time (flashback.h).
  • flashback_set_status_active / _inactive (flashback.h).
  • flashback_reset (flashback.h).
  • flashback_find_class_index (flashback_cl.h) — locate a class OID in the operator-provided list.
  • flashback_unpack_and_print_summary (flashback_cl.h) — decode and emit summary table.
  • flashback_print_loginfo (flashback_cl.h) — decode and emit per-event detail.
  • cdc_get_recdes — same chase as cubrid-cdc.md. Declared in log_manager.h, defined in log_manager.c.
  • CDC_LOGINFO_ENTRY (log_impl.h) — same wire format.
  • log_reader class (log_reader.hpp) — same forward walker.
| Symbol | File | Line |
| --- | --- | --- |
| FLASHBACK_SUMMARY_ENTRY (struct) | flashback.h | 63 |
| FLASHBACK_SUMMARY_CONTEXT (struct) | flashback.h | 87 |
| FLASHBACK_LOGINFO_CONTEXT (struct) | flashback.h | 100 |
| FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY | flashback.h | 45 |
| OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS | flashback.h | 79 |
| FLASHBACK_SUMMARY_INFO (struct) | flashback_cl.h | 49 |
| FLASHBACK_SUMMARY_INFO_MAP (typedef) | flashback_cl.h | 58 |
| flashback_initialize | flashback.c | 109 |
| flashback_make_summary_list | flashback.c | 284 |
| flashback_make_loginfo | flashback.c | 767 |

  • Flashback is two-phase: summary then loginfo. Verified by the existence of separate flashback_make_summary_list (flashback.c:284) and flashback_make_loginfo (flashback.c:767) entry points, with separate context structs (FLASHBACK_SUMMARY_CONTEXT vs. FLASHBACK_LOGINFO_CONTEXT).

  • Both phases walk the log forward, not backward. Verified at flashback.h:107: flashback_loginfo_context::forward is an int field, not a bool, suggesting room for a direction choice, but the flashback.c body does not implement backward walking (open question §3).

  • Per-event format is CDC_LOGINFO_ENTRY, shared with CDC. Verified at flashback.h:113: loginfo_queue is std::queue<CDC_LOGINFO_ENTRY *>. The wire format is the same one consumers already understand, so the print code is shared.

  • The undo/redo chase shares cdc_get_recdes. Verified by the is_flashback parameter on cdc_get_recdes (cubrid-cdc.md; signature in log_manager.h). Setting is_flashback=true loosens error tolerance for partial chains.

  • Per-request transaction cap is configurable. Verified at flashback.h:45: FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY is prm_get_integer_value (PRM_ID_FLASHBACK_MAX_TRANSACTION). Operators can raise it for big audit jobs at the cost of server memory.

  • The summary entry’s class set is a std::unordered_set<OID>. Verified at flashback.h:75. Implication: the same class observed N times within one tran adds to memory only once. The wire format (OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS) confirms the set is packed separately (per-class size not in the macro).

  • Archive retention is a separate watermark from CDC’s. Verified at flashback.h:128: flashback_min_log_pageid_to_keep (). The archive remove daemon (cubrid-log-manager.md) takes the min of this and cdc_min_log_pageid_to_keep.

  • A flashback request can time out and release archives. Verified at flashback.h:130 (flashback_check_time_exceed_threshold (int *threshold)). The threshold is read out via the int * argument, so the caller can log “exceeded by N seconds”.

  • Time-to-LSA conversion exists as a dedicated function. Verified at flashback.h:117 (flashback_verify_time). It walks log timestamp records to find LSAs that bracket the requested times, returning error if the window is outside available archives.

  • Client-side summary uses a smaller struct than server-side. Verified at flashback_cl.h:49 (FLASHBACK_SUMMARY_INFO) vs. flashback.h:63 (FLASHBACK_SUMMARY_ENTRY). The client side keeps only trid, user, start/end LSA — no per-class info, no counts. Counts and classes are encoded in the print output, not retained.

  1. Concurrent flashback requests. The flashback_set_status_active /_inactive API uses a single global flag. Two operators running flashback simultaneously would race. Investigation path: read flashback.c:109 (flashback_initialize) and flashback_set_status_active’s body for mutex / refcount.

  2. Behaviour on archive-removed-during-walk. If the daemon somehow deletes an archive that flashback is reading (despite the watermark), what happens? The is_flashback=true tolerance suggests partial graceful degradation, but the exact error code returned to the operator is unverified. Investigation path: trace cdc_get_recdes’s flashback arm.

  3. Backward-walking flashback. The forward field on FLASHBACK_LOGINFO_CONTEXT is an int — does the code path support backward walking (e.g., for an out-of-order replay)? Or is the field reserved for future use? Investigation path: read flashback_make_loginfo body.

  4. Trigger events. LOG_SUPPLEMENT_TRIGGER_INSERT/UPDATE/DELETE exist (cubrid-cdc.md). Does the summary phase count them alongside regular DML, or separately, or skip? Investigation path: flashback_make_summary_list body’s switch over SUPPLEMENT_REC_TYPE.

  5. DDL in flashback. Does the summary include DDL events? The summary entry’s counts are num_insert, num_update, num_delete — no DDL counter. So DDL is presumably either ignored or shown only in the loginfo phase. Investigation path: search for LOG_SUPPLEMENT_DDL handling in flashback.c.

  6. Memory bound on loginfo_queue. The summary cap is explicit (FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY) but the loginfo queue’s bound is via queue_size. What is the default? Is the consumer expected to drain incrementally? Investigation path: read flashback_make_loginfo body for queue_size consumers.

Beyond CUBRID — Comparative Designs & Research Frontiers


Pointers, not analysis.

  • Oracle Flashback Query — backed by the undo tablespace, not the WAL. Restores past row state directly via SQL. CUBRID’s flashback is closer to a log-mining utility than Oracle’s “make my SELECT see past data”. A side-by-side would highlight what each gives up.

  • PostgreSQL pg_dirtyread / pg_freespacemap — extension-level access to past row versions. Limited compared to CUBRID’s full per-tran replay; a candidate next-doc on what CUBRID’s flashback provides that PG’s extensions don’t.

  • Snowflake Time Travel — built on the table’s own version history (micro-partitions), not the log. Different storage model entirely. Flashback in column-store / cloud DBs.

  • SQL Server Temporal Tables — system-versioned tables that expose history through SYSTEM_TIME clauses. CUBRID’s flashback is post-hoc (operator-issued); temporal tables are query-time. Comparing user-visibility models would document the design trade-offs.

  • GDPR-driven audit log mining — flashback as compliance tool for “what did transaction T do to data subject X”. CUBRID’s per-class filtering supports this; the next doc could outline how to wire flashback output into a SIEM.

  • Time-travel for ML reproducibility — read past row state to retrain a model on the data it saw originally. CUBRID’s flashback gives the change history, not the past state — a candidate research direction would be a state-reconstruction layer on top.

Raw analyses (raw/code-analysis/cubrid/storage/cdc/)

  • flashback 인수인계.pptx — handover slides (Korean filename; 인수인계 means “handover”).
  • knowledge/code-analysis/cubrid/cubrid-cdc.md — opposite intent (forward streaming) but shared infrastructure (log_reader, cdc_get_recdes, CDC_LOGINFO_ENTRY).
  • knowledge/code-analysis/cubrid/cubrid-log-manager.md — the LOG_SUPPLEMENTAL_INFO records flashback consumes.
  • knowledge/code-analysis/cubrid/cubrid-recovery-manager.md — shares the log_reader class with crash-recovery redo.

Textbook chapters (under knowledge/research/dbms-general/)

  • Database Internals (Petrov), Ch. 5 §“Logging” (the WAL is the substrate flashback walks).
  • Designing Data-Intensive Applications (Kleppmann), Ch. 11 “Stream Processing” — change-history-as-stream framing.

CUBRID source (/data/hgryoo/references/cubrid/)

  • src/transaction/flashback.{c,h} — server-side.
  • src/transaction/flashback_cl.{c,h} — utility-side.
  • src/transaction/log_reader.{cpp,hpp} — shared walker.
  • src/transaction/log_manager.c — cdc_get_recdes shared entry.