
CUBRID Flashback — Transaction Summary and Per-Tran Replay From the Log


Flashback answers the question “what happened to my data between two points in the past, and can I see it?” It is not rollback — the database state is not changed; the operator gets a report of changes (or a regenerable script) that they can apply manually. The closest classical concept is “log mining” — turning the WAL into a queryable history.

Database Internals (Petrov) does not have a dedicated flashback chapter, but the topic sits at the intersection of ch. 5 (Recovery, WAL) and ch. 11 (Logging). Two implementation choices the model leaves open shape every flashback implementation and frame the rest of this document:

  1. Forward walk or backward walk? A user typically asks “between time A and B”. The implementation can scan forward from A, or scan backward from B; the choice determines which direction the per-transaction LSA chain is followed. CUBRID picks forward walk in two phases: (1) summary phase scans forward and accumulates per-trid counts; (2) loginfo phase scans the chosen trid’s range forward, materialising row images. The “backward” framing in the doc is conceptual — flashback restores a past state, even though the implementation walks the log forward.
  2. Whole-log mining or filtered? A long log range can hold millions of records. Filtering by class OID and user is essential for human-scale output. CUBRID supports both filters and caps the summary size at FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY so an unfiltered query doesn’t blow up.

After the choices are named, every CUBRID-specific structure in this document either implements one of them or makes the access faster.

Engines that ship flashback (Oracle, CUBRID, SQL Server’s “Temporal Tables”) share a small handful of patterns.

The user almost never wants every event in a wide time range. First phase: enumerate transactions and return a small per-transaction summary (trid, user, time, counts of INSERT/UPDATE/DELETE, classes touched). Second phase: the user picks a transaction and asks for its full record stream. The summary phase amortises the log walk across many transactions; the detail phase amortises the record decode across many DML statements within the chosen transaction.

Flashback and CDC both walk the log forward. Modern engines share the walker code: the per-record decoder, the LSA-to-time mapping, the indirect undo/redo chase. CUBRID does this — the flashback loginfo path packs into CDC_LOGINFO_ENTRY (the same struct CDC uses) and chases data records via cdc_get_recdes with is_flashback=true.

Archive retention separate from CDC retention


A flashback request can pin log archives older than any CDC consumer needs. The retention watermark is therefore a two-input minimum: smallest CDC pageid kept and smallest flashback pageid kept. CUBRID exposes cdc_min_log_pageid_to_keep and flashback_min_log_pageid_to_keep separately and the archive remove daemon takes the min of both.
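The two-input minimum can be sketched as follows. This is a minimal illustration, not the daemon's actual code: `min_pageid_to_keep` and the `NULL_PAGEID` sentinel (meaning "no consumer is pinning anything") are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

using LOG_PAGEID = std::int64_t;
constexpr LOG_PAGEID NULL_PAGEID = -1;   // hypothetical sentinel: nothing pinned

// Hypothetical helper mirroring what the archive remove daemon must compute:
// the oldest log page any consumer (CDC or flashback) still needs. Archives
// holding only pages older than the returned pageid are eligible for removal.
LOG_PAGEID min_pageid_to_keep (LOG_PAGEID cdc_min, LOG_PAGEID flashback_min)
{
  if (cdc_min == NULL_PAGEID)
    return flashback_min;
  if (flashback_min == NULL_PAGEID)
    return cdc_min;
  return std::min (cdc_min, flashback_min);
}
```

Either consumer alone can pin archives; only when both have released does the watermark advance.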

The user phrases the request in wall-clock time, but the engine walks LSAs. Flashback needs time_t → LOG_LSA resolution. CUBRID’s flashback_verify_time is the boundary checker; it walks log records’ commit timestamps to find the LSA that brackets a given time.

| Theoretical concept | CUBRID name |
| --- | --- |
| Flashback summary entry | FLASHBACK_SUMMARY_ENTRY { trid, user, start_time, end_time, counts, lsas, classoid_set } |
| Server-side summary context | FLASHBACK_SUMMARY_CONTEXT (flashback.h:87) |
| Server-side loginfo context | FLASHBACK_LOGINFO_CONTEXT (flashback.h:100) |
| Client-side summary entry | FLASHBACK_SUMMARY_INFO (flashback_cl.h:49) |
| Client-side summary map | FLASHBACK_SUMMARY_INFO_MAP (flashback_cl.h:58) |
| Wall-time → LSA resolver | flashback_verify_time (flashback.h:117) |
| Summary builder | flashback_make_summary_list (flashback.c:284) |
| Loginfo builder | flashback_make_loginfo (flashback.c:767) |
| Server-side init | flashback_initialize (flashback.c:109) |
| Archive-keep watermark | flashback_min_log_pageid_to_keep (flashback.h:128) |
| Active-flashback gate | flashback_is_needed_to_keep_archive (flashback.h:129) |
| Time-budget gate | flashback_check_time_exceed_threshold (flashback.h:130) |
| Per-event entry shape (shared with CDC) | CDC_LOGINFO_ENTRY (log_impl.h) |
| Per-tran summary cap | FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY macro → PRM_ID_FLASHBACK_MAX_TRANSACTION |

The flashback module has three moving parts: the summary phase that turns “time range + filters” into a per-transaction list, the loginfo phase that turns “one transaction” into a detailed event stream, and the archive-retention discipline that keeps log volumes alive while flashback requests are in progress. We walk them in that order.

flowchart LR
  subgraph CL["Utility / client (flashback_cl)"]
    USER["operator: cubrid flashback"]
    DEC["unpack + print"]
    USER --> DEC
  end
  subgraph SRV["Server (flashback.c)"]
    PHASE1["flashback_make_summary_list\n(forward walk, count per trid)"]
    PHASE2["flashback_make_loginfo\n(forward walk for one trid)"]
    PACK["flashback_pack_summary_entry\nflashback_pack_loginfo"]
  end
  subgraph LOG["WAL (archived volumes)"]
    LOGV["log archive volumes\n(cubrid-log-manager.md)"]
  end
  subgraph CDCSHR["Shared with CDC"]
    LRD["log_reader"]
    CGR["cdc_get_recdes (is_flashback=true)"]
    LE["CDC_LOGINFO_ENTRY"]
  end
  USER -->|time A,B + class/user filter| PHASE1
  PHASE1 --> LRD --> LOGV
  PHASE1 --> PACK -->|summary buffer| DEC
  USER -->|chosen trid| PHASE2
  PHASE2 --> LRD
  PHASE2 --> CGR
  PHASE2 --> LE --> PACK -->|loginfo buffer| DEC
  PHASE1 -.set retention.-> RET["flashback_set_min_log_pageid_to_keep"]
  RET -.archive remove daemon checks.-> LOGV

The figure encodes three boundaries. (client / server) the human-facing print/format runs client-side; the log walk runs server-side. (phase 1 / phase 2) the summary phase produces small per-tran rows; the loginfo phase produces the verbose event stream for one chosen tran. (flashback / CDC) the forward-walking machinery is shared with CDC, and the per-event wire format is CDC_LOGINFO_ENTRY — flashback was built after CDC and reuses its plumbing rather than duplicating.

The summary phase is driven by a context object:

// FLASHBACK_SUMMARY_CONTEXT — src/transaction/flashback.h:87
struct flashback_summary_context
{
  LOG_LSA start_lsa;                   /* time A → LSA */
  LOG_LSA end_lsa;                     /* time B → LSA */
  char *user;                          /* whitelist user (or NULL = all) */
  int num_summary;                     /* output: filled by builder */
  int num_class;
  std::vector<OID> classoids;          /* whitelist class OIDs */
  std::map<TRANID, FLASHBACK_SUMMARY_ENTRY> summary_list;
};

The summary list maps trid → per-tran roll-up. Each entry:

// FLASHBACK_SUMMARY_ENTRY — src/transaction/flashback.h:63
struct flashback_summary_entry
{
  TRANID trid;
  char user[DB_MAX_USER_LENGTH + 1];
  time_t start_time;
  time_t end_time;
  int num_insert;
  int num_update;
  int num_delete;
  LOG_LSA start_lsa;
  LOG_LSA end_lsa;
  std::unordered_set<OID> classoid_set;   /* classes this tran touched */
};

flashback_make_summary_list (flashback.c:284) is the builder. Its body walks the log forward from start_lsa to end_lsa, visiting every record. For each record:

  • If it’s a LOG_SUPPLEMENTAL_INFO of type INSERT/UPDATE/DELETE on a whitelisted class and trid, increment the corresponding per-tran counter and add the class OID to the per-tran set.
  • If it’s a LOG_SUPPLEMENT_TRAN_USER, record the user name (and filter the trid out if the user doesn’t match).
  • If it’s LOG_COMMIT or LOG_ABORT, finalise the per-tran end LSA and end time.

The summary list is capped at FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY transactions (configurable via the parameter PRM_ID_FLASHBACK_MAX_TRANSACTION); beyond the cap, additional transactions are dropped from the summary to keep memory bounded.
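The per-record dispatch above can be condensed into a sketch. Everything here (accumulate, SummaryEntry, OIDHash, max_tran) is a hypothetical stand-in; the real flashback_make_summary_list also records user names and finalises end LSAs on commit/abort:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <unordered_set>

using TRANID = int;

struct OID
{
  int volid, pageid, slotid;
  bool operator== (const OID &o) const
  { return volid == o.volid && pageid == o.pageid && slotid == o.slotid; }
};
struct OIDHash
{
  std::size_t operator() (const OID &o) const
  { return std::hash<int> () (o.pageid * 31 + o.slotid); }
};

enum RecType { REC_INSERT, REC_UPDATE, REC_DELETE };

struct SummaryEntry
{
  int num_insert = 0, num_update = 0, num_delete = 0;
  std::unordered_set<OID, OIDHash> classes;   /* set: one slot per class */
};

/* Accumulate one supplemental DML record into the per-trid summary.
 * max_tran plays the role of FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY: a trid not
 * yet tracked is dropped once the map is full, bounding memory. */
void accumulate (std::map<TRANID, SummaryEntry> &summary, TRANID trid,
                 RecType type, const OID &classoid, std::size_t max_tran)
{
  if (summary.find (trid) == summary.end () && summary.size () >= max_tran)
    return;   /* over the cap: this transaction is not summarised */

  SummaryEntry &e = summary[trid];
  switch (type)
    {
    case REC_INSERT: e.num_insert++; break;
    case REC_UPDATE: e.num_update++; break;
    case REC_DELETE: e.num_delete++; break;
    }
  e.classes.insert (classoid);   /* dedup: a class touched N times costs one slot */
}
```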

The packing function flashback_pack_summary_entry (flashback.h:119) serialises the summary into a wire buffer the client decodes via flashback_unpack_and_print_summary (flashback_cl.h:62). The wire size of one entry without the class set is:

// OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS — src/transaction/flashback.h:79
#define OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS \
  (OR_INT_SIZE                          /* trid */ \
   + DB_MAX_USER_LENGTH + MAX_ALIGNMENT \
   + OR_INT64_SIZE * 2                  /* start_time, end_time */ \
   + OR_INT_SIZE * 3                    /* counts */ \
   + OR_LOG_LSA_SIZE * 2                /* start/end LSA */ \
   + OR_INT_SIZE)                       /* num classes */
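With illustrative values for the constants (all assumed here for the sake of arithmetic — the real OR_* and DB_MAX_USER_LENGTH values live in the CUBRID headers and may differ), the macro reduces to a fixed per-entry byte count:

```cpp
// All constant values below are assumptions for illustration only.
constexpr int OR_INT_SIZE = 4;
constexpr int OR_INT64_SIZE = 8;
constexpr int OR_LOG_LSA_SIZE = 8;
constexpr int DB_MAX_USER_LENGTH = 32;
constexpr int MAX_ALIGNMENT = 8;

constexpr int SUMMARY_ENTRY_SIZE_WITHOUT_CLASS =
    OR_INT_SIZE                           /* trid */
    + DB_MAX_USER_LENGTH + MAX_ALIGNMENT  /* user name, aligned */
    + OR_INT64_SIZE * 2                   /* start_time, end_time */
    + OR_INT_SIZE * 3                     /* insert/update/delete counts */
    + OR_LOG_LSA_SIZE * 2                 /* start/end LSA */
    + OR_INT_SIZE;                        /* number of classes that follow */
// 4 + 40 + 16 + 12 + 16 + 4 = 92 bytes per entry under these assumptions
```

The class OIDs are packed after this fixed prefix, which is why the macro's name says "without class".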

Once the operator picks a transaction from the summary, the second phase fetches its full event stream:

// FLASHBACK_LOGINFO_CONTEXT — src/transaction/flashback.h:100
struct flashback_loginfo_context
{
  TRANID trid;                          /* the chosen trid */
  char *user;
  LOG_LSA start_lsa;                    /* normally summary.start_lsa */
  LOG_LSA end_lsa;
  int num_class;                        /* class filter cardinality */
  int forward;                          /* direction (always forward in current implementation) */
  int num_loginfo;                      /* output count */
  int queue_size;
  OID invalid_class;                    /* class observed during walk that wasn't in filter — diagnostics */
  std::unordered_set<OID> classoid_set; /* whitelist */
  std::queue<CDC_LOGINFO_ENTRY *> loginfo_queue;
};

flashback_make_loginfo (flashback.c:767) walks the log range again, this time emitting one CDC_LOGINFO_ENTRY per matching event. The chase from supplemental → underlying data record uses cdc_get_recdes with is_flashback=true:

// from cubrid-cdc.md, the shared chase
int cdc_get_recdes (THREAD_ENTRY *thread_p,
LOG_LSA *undo_lsa, RECDES *undo_recdes,
LOG_LSA *redo_lsa, RECDES *redo_recdes,
bool is_flashback);

The is_flashback=true flag changes behaviour in two places: (a) missing pages or broken chains are tolerated (a chain that goes off the end of a removed archive returns S_END rather than S_ERROR); (b) the function is willing to re-fetch from older archives if needed.
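The tolerance difference can be stated as a two-line decision. This is a hypothetical condensation — in the real cdc_get_recdes the branching is spread across the chain-chasing code, not isolated in one helper:

```cpp
// SCAN_CODE values as used in CUBRID's scan interfaces.
enum SCAN_CODE { S_SUCCESS, S_END, S_ERROR };

/* Hypothetical condensation: what a broken undo/redo chain means depends on
 * the caller. CDC treats it as an error; flashback treats it as "history
 * ends here" and reports whatever was recovered so far. */
SCAN_CODE classify_chain_break (bool is_flashback)
{
  return is_flashback ? S_END : S_ERROR;
}
```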

The packing function flashback_pack_loginfo (flashback.h:123) serialises the queue into a wire buffer that the client prints via flashback_print_loginfo (flashback_cl.h:65).

The user supplies a wall-clock time range; the engine needs LSAs:

// flashback_verify_time — src/transaction/flashback.h:117
int flashback_verify_time (THREAD_ENTRY *thread_p,
                           time_t *start_time, time_t *end_time,
                           LOG_LSA *start_lsa, LOG_LSA *end_lsa);

The function walks log records carrying timestamps (LOG_REC_DONETIME, LOG_REC_HA_SERVER_STATE, the donetime field on LOG_REC_START_POSTPONE) until it brackets the requested times. The output *_lsa are the LSAs that span the time range; on out-of-range request (e.g., before the oldest archive) it returns an error so the operator can be told that flashback can’t go back that far.
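The bracketing idea can be sketched over a list of timestamped records in log order. All names here are hypothetical; the real function reads the records from the log volumes rather than a vector, but the bracketing logic is the same shape:

```cpp
#include <cstdint>
#include <ctime>
#include <vector>

struct LOG_LSA { std::int64_t pageid; short offset; };
struct TimedRecord { std::time_t done_time; LOG_LSA lsa; };

/* Hypothetical sketch: given commit-time records sorted in log order, find
 * the LSAs that span [a, b]. Returns false when the window lies outside the
 * available records, mirroring the error flashback reports when archives no
 * longer go back that far. */
bool bracket_time_range (const std::vector<TimedRecord> &recs,
                         std::time_t a, std::time_t b,
                         LOG_LSA *start_lsa, LOG_LSA *end_lsa)
{
  if (recs.empty () || a > recs.back ().done_time || b < recs.front ().done_time)
    return false;

  *start_lsa = recs.front ().lsa;
  *end_lsa = recs.front ().lsa;
  for (const TimedRecord &r : recs)
    {
      if (r.done_time <= a)
        *start_lsa = r.lsa;     /* last record at or before A */
      if (r.done_time <= b)
        *end_lsa = r.lsa;       /* last record at or before B */
    }
  return true;
}
```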

Archive retention — flashback_min_log_pageid_to_keep


A flashback request in flight pins log volumes:

// Retention API — src/transaction/flashback.h
extern LOG_PAGEID flashback_min_log_pageid_to_keep ();
extern bool flashback_is_needed_to_keep_archive ();
extern bool flashback_check_time_exceed_threshold (int *threshold);
extern void flashback_set_min_log_pageid_to_keep (LOG_LSA *lsa);
extern void flashback_set_request_done_time ();
extern void flashback_set_status_active ();
extern void flashback_set_status_inactive ();
extern void flashback_reset ();

The discipline:

  1. Operator starts a flashback request → flashback_set_status_active, flashback_set_min_log_pageid_to_keep to the request’s start_lsa.pageid.
  2. The archive remove daemon (log_wakeup_remove_log_archive_daemon in cubrid-log-manager.md) takes min(cdc_min_log_pageid_to_keep, flashback_min_log_pageid_to_keep) and refuses to delete archives whose pageids are above that minimum.
  3. When the request finishes (or hits a configurable timeout per flashback_check_time_exceed_threshold), flashback_set_status_inactive is called and the daemon resumes deleting eligible archives.

The timeout exists so a stuck or abandoned flashback request doesn’t pin archives forever.
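A minimal sketch of the time-budget check, under assumed semantics (the hypothetical helper below compares elapsed time against a threshold; the real flashback_check_time_exceed_threshold reads its threshold from system parameters and internal request state):

```cpp
#include <ctime>

/* Hypothetical gate: true when the gap since the last client activity
 * exceeds the threshold. Writes the overshoot so the caller can log
 * "exceeded by N seconds" before releasing the pinned archives. */
bool time_exceeds_threshold (std::time_t last_request_done, std::time_t now,
                             int threshold_sec, int *exceeded_by_sec)
{
  int elapsed = static_cast<int> (now - last_request_done);
  if (elapsed <= threshold_sec)
    return false;
  if (exceeded_by_sec != nullptr)
    *exceeded_by_sec = elapsed - threshold_sec;
  return true;
}
```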

Active vs. inactive — single-request gate


flashback_set_status_active and flashback_set_status_inactive flip a global. The current implementation appears to support only one active flashback request at a time (see open question §1) — the _request_done_time and _check_time_exceed_threshold machinery is per-status, not per-request. Multi-tenant deployments that need concurrent flashback would need additional plumbing.
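If the global flag were made race-safe for a single slot, the usual shape is a compare-and-swap gate. This is a hypothetical sketch consistent with the single-request behaviour described above, not the actual flashback.c state:

```cpp
#include <atomic>

// Hypothetical single-slot gate for the global active/inactive status.
std::atomic<bool> g_flashback_active {false};

/* Returns true only for the one caller that wins the slot; a second
 * concurrent request sees false and must wait or fail. */
bool flashback_try_set_active ()
{
  bool expected = false;
  return g_flashback_active.compare_exchange_strong (expected, true);
}

void flashback_set_inactive ()
{
  g_flashback_active.store (false);
}
```

Whether the real code does this atomically, or simply assumes one operator at a time, is exactly open question §1.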

sequenceDiagram
  participant OP as Operator
  participant CL as flashback_cl
  participant SR as flashback (server)
  participant LR as log_reader
  participant CGR as cdc_get_recdes
  participant ARD as archive remove daemon

  OP->>CL: cubrid flashback --start A --end B --classes c1,c2
  CL->>SR: flashback_verify_time (A, B)
  SR-->>CL: start_lsa, end_lsa
  CL->>SR: flashback_set_status_active + min_pageid
  Note over ARD: archives at start_lsa..end_lsa now pinned
  CL->>SR: flashback_make_summary_list (filter, summary_list)
  SR->>LR: walk start_lsa..end_lsa
  LR-->>SR: records
  SR->>SR: per-trid count, classoid set
  SR-->>CL: packed summary buffer
  CL->>OP: print summary list
  OP->>CL: pick trid T
  CL->>SR: flashback_make_loginfo (trid=T)
  SR->>LR: walk T's range
  loop each LOG_SUPPLEMENT_*
    SR->>CGR: chase to undo+redo
    CGR-->>SR: undo_recdes, redo_recdes
    SR->>SR: pack CDC_LOGINFO_ENTRY
  end
  SR-->>CL: packed loginfo buffer
  CL->>OP: print event stream
  OP->>CL: done
  CL->>SR: flashback_set_status_inactive + reset

Anchor on symbol names, not line numbers.

  • FLASHBACK_SUMMARY_ENTRY (flashback.h) — server-side per-tran roll-up.
  • FLASHBACK_SUMMARY_CONTEXT (flashback.h) — server-side context for summary phase.
  • FLASHBACK_LOGINFO_CONTEXT (flashback.h) — server-side context for loginfo phase.
  • FLASHBACK_SUMMARY_INFO (flashback_cl.h) — client-side decoded summary entry.
  • FLASHBACK_SUMMARY_INFO_MAP (flashback_cl.h) — client-side summary map.
  • FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY macro (flashback.h) — per-request cap.
  • OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS macro (flashback.h) — wire size.
  • flashback_initialize (flashback.c) — boot-time setup.
  • flashback_make_summary_list (flashback.c) — phase 1.
  • flashback_make_loginfo (flashback.c) — phase 2.
  • flashback_verify_time (flashback.h, defined in flashback.c) — time → LSA.
  • flashback_pack_summary_entry (flashback.h) — wire pack.
  • flashback_pack_loginfo (flashback.h) — wire pack.
  • flashback_min_log_pageid_to_keep (flashback.h).
  • flashback_is_needed_to_keep_archive (flashback.h).
  • flashback_check_time_exceed_threshold (flashback.h).
  • flashback_is_loginfo_generation_finished (flashback.h).
  • flashback_set_min_log_pageid_to_keep (flashback.h).
  • flashback_set_request_done_time (flashback.h).
  • flashback_set_status_active / _inactive (flashback.h).
  • flashback_reset (flashback.h).
  • flashback_find_class_index (flashback_cl.h) — locate a class OID in the operator-provided list.
  • flashback_unpack_and_print_summary (flashback_cl.h) — decode and emit summary table.
  • flashback_print_loginfo (flashback_cl.h) — decode and emit per-event detail.
  • cdc_get_recdes — same chase as cubrid-cdc.md. Declared in log_manager.h, defined in log_manager.c.
  • CDC_LOGINFO_ENTRY (log_impl.h) — same wire format.
  • log_reader class (log_reader.hpp) — same forward walker.
| Symbol | File | Line |
| --- | --- | --- |
| FLASHBACK_SUMMARY_ENTRY (struct) | flashback.h | 63 |
| FLASHBACK_SUMMARY_CONTEXT (struct) | flashback.h | 87 |
| FLASHBACK_LOGINFO_CONTEXT (struct) | flashback.h | 100 |
| FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY | flashback.h | 45 |
| OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS | flashback.h | 79 |
| FLASHBACK_SUMMARY_INFO (struct) | flashback_cl.h | 49 |
| FLASHBACK_SUMMARY_INFO_MAP (typedef) | flashback_cl.h | 58 |
| flashback_initialize | flashback.c | 109 |
| flashback_make_summary_list | flashback.c | 284 |
| flashback_make_loginfo | flashback.c | 767 |

  • Flashback is two-phase: summary then loginfo. Verified by the existence of separate flashback_make_summary_list (flashback.c:284) and flashback_make_loginfo (flashback.c:767) entry points, with separate context structs (FLASHBACK_SUMMARY_CONTEXT vs. FLASHBACK_LOGINFO_CONTEXT).

  • Both phases walk the log forward, not backward. Verified at flashback.h:107: flashback_loginfo_context::forward is an int field, not a bool, suggesting room for a direction choice, but the flashback.c body does not implement backward walking (open question §3).

  • Per-event format is CDC_LOGINFO_ENTRY, shared with CDC. Verified at flashback.h:113: loginfo_queue is std::queue<CDC_LOGINFO_ENTRY *>. The wire format is the same one consumers already understand, so the print code is shared.

  • The undo/redo chase shares cdc_get_recdes. Verified by the is_flashback parameter on cdc_get_recdes (cubrid-cdc.md; signature in log_manager.h). Setting is_flashback=true loosens error tolerance for partial chains.

  • Per-request transaction cap is configurable. Verified at flashback.h:45: FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY is prm_get_integer_value (PRM_ID_FLASHBACK_MAX_TRANSACTION). Operators can raise it for big audit jobs at the cost of server memory.

  • The summary entry’s class set is a std::unordered_set<OID>. Verified at flashback.h:75. Implication: the same class observed N times within one tran adds to memory only once. The wire format (OR_SUMMARY_ENTRY_SIZE_WITHOUT_CLASS) confirms the set is packed separately (per-class size not in the macro).

  • Archive retention is a separate watermark from CDC’s. Verified at flashback.h:128: flashback_min_log_pageid_to_keep (). The archive remove daemon (cubrid-log-manager.md) takes the min of this and cdc_min_log_pageid_to_keep.

  • A flashback request can time out and release archives. Verified at flashback.h:130 (flashback_check_time_exceed_threshold (int *threshold)). The threshold is read out via the int * argument, so the caller can log “exceeded by N seconds”.

  • Time-to-LSA conversion exists as a dedicated function. Verified at flashback.h:117 (flashback_verify_time). It walks log timestamp records to find LSAs that bracket the requested times, returning error if the window is outside available archives.

  • Client-side summary uses a smaller struct than server-side. Verified at flashback_cl.h:49 (FLASHBACK_SUMMARY_INFO) vs. flashback.h:63 (FLASHBACK_SUMMARY_ENTRY). The client side keeps only trid, user, start/end LSA — no per-class info, no counts. Counts and classes are encoded in the print output, not retained.

  1. Concurrent flashback requests. The flashback_set_status_active /_inactive API uses a single global flag. Two operators running flashback simultaneously would race. Investigation path: read flashback.c:109 (flashback_initialize) and flashback_set_status_active’s body for mutex / refcount.

  2. Behaviour on archive-removed-during-walk. If the daemon somehow deletes an archive that flashback is reading (despite the watermark), what happens? The is_flashback=true tolerance suggests partial graceful degradation, but the exact error code returned to the operator is unverified. Investigation path: trace cdc_get_recdes’s flashback arm.

  3. Backward-walking flashback. The forward field on FLASHBACK_LOGINFO_CONTEXT is an int — does the code path support backward walking (e.g., for an out-of-order replay)? Or is the field reserved for future use? Investigation path: read flashback_make_loginfo body.

  4. Trigger events. LOG_SUPPLEMENT_TRIGGER_INSERT/UPDATE/DELETE exist (cubrid-cdc.md). Does the summary phase count them alongside regular DML, or separately, or skip? Investigation path: flashback_make_summary_list body’s switch over SUPPLEMENT_REC_TYPE.

  5. DDL in flashback. Does the summary include DDL events? The summary entry’s counts are num_insert, num_update, num_delete — no DDL counter. So DDL is presumably either ignored or shown only in the loginfo phase. Investigation path: search for LOG_SUPPLEMENT_DDL handling in flashback.c.

  6. Memory bound on loginfo_queue. The summary cap is explicit (FLASHBACK_MAX_NUM_TRAN_TO_SUMMARY) but the loginfo queue’s bound is via queue_size. What is the default? Is the consumer expected to drain incrementally? Investigation path: read flashback_make_loginfo body for queue_size consumers.

Beyond CUBRID — Comparative Designs & Research Frontiers


Pointers, not analysis.

  • Oracle Flashback Query — backed by the undo tablespace, not the WAL. Restores past row state directly via SQL. CUBRID’s flashback is closer to a log-mining utility than Oracle’s “make my SELECT see past data”. A side-by-side would highlight what each gives up.

  • PostgreSQL pg_dirtyread / pg_freespacemap — extension-level access to past row versions. Limited compared to CUBRID’s full per-tran replay; a candidate next-doc on what CUBRID’s flashback provides that PG’s extensions don’t.

  • Snowflake Time Travel — built on the table’s own version history (micro-partitions), not the log. Different storage model entirely. Flashback in column-store / cloud DBs.

  • SQL Server Temporal Tables — system-versioned tables that expose history through SYSTEM_TIME clauses. CUBRID’s flashback is post-hoc (operator-issued); temporal tables are query-time. Comparing user-visibility models would document the design trade-offs.

  • GDPR-driven audit log mining — flashback as compliance tool for “what did transaction T do to data subject X”. CUBRID’s per-class filtering supports this; the next doc could outline how to wire flashback output into a SIEM.

  • Time-travel for ML reproducibility — read past row state to retrain a model on the data it saw originally. CUBRID’s flashback gives the change history, not the past state — a candidate research direction would be a state-reconstruction layer on top.

Raw analyses (raw/code-analysis/cubrid/storage/cdc/)

  • flashback 인수인계.pptx — handover slides (Korean filename; 인수인계 means “handover”).
  • knowledge/code-analysis/cubrid/cubrid-cdc.md — opposite intent (forward streaming) but shared infrastructure (log_reader, cdc_get_recdes, CDC_LOGINFO_ENTRY).
  • knowledge/code-analysis/cubrid/cubrid-log-manager.md — the LOG_SUPPLEMENTAL_INFO records flashback consumes.
  • knowledge/code-analysis/cubrid/cubrid-recovery-manager.md — shares the log_reader class with crash-recovery redo.

Textbook chapters (under knowledge/research/dbms-general/)

  • Database Internals (Petrov), Ch. 5 §“Logging” (the WAL is the substrate flashback walks).
  • Designing Data-Intensive Applications (Kleppmann), Ch. 11 “Stream Processing” — change-history-as-stream framing.

CUBRID source (/data/hgryoo/references/cubrid/)

  • src/transaction/flashback.{c,h} — server-side.
  • src/transaction/flashback_cl.{c,h} — utility-side.
  • src/transaction/log_reader.{cpp,hpp} — shared walker.
  • src/transaction/log_manager.c — cdc_get_recdes shared entry.