CUBRID checksumdb — HA Replica vs Master Row-Checksum Verifier with Chunked, Replication-Replayed Comparison
Theoretical Background
A replica integrity verifier is the tool that answers “is my
replica really identical to the master?” — a question that comes up
after a long-running replication chain, after suspected data
corruption, or as a routine periodic check before a planned
failover. The trivial implementation (dump both sides and diff)
is impractical at scale: it requires either suspending writes for
the duration of the dump or paying expensive snapshot-tracking
costs, and the bandwidth cost is one full table-scan per replica.
Production engines therefore ship sampling or chunked-checksum verifiers that trade exhaustiveness for online operation:
- Per-row checksum. A row-by-row hash comparison; exhaustive but proportional to data size and requires synchronisation between the two sides for every row.
- Chunked checksum. Tables are split into chunks (by PK range, by row count, or by time window); per-chunk checksums are computed on each side and compared. A chunk-level mismatch flags a divergence; the operator can drill down to per-row only on flagged chunks.
- Replication-replayed checksum. The master computes per-chunk
checksums into a replicated table, the same replication
stream carries them to the slave, and the slave recomputes
the same chunks locally and updates the table with its own
checksum. Comparison becomes a
`WHERE master_checksum != slave_checksum` query on the result table. Bandwidth cost is one summary row per chunk, not one row per data row.
CUBRID’s checksumdb picks option 3, with chunking by primary-key
range along the PK ordering (so each chunk is contiguous in the
table’s logical ordering), a configurable chunk size measured in
rows, and a separate schema-checksum side that detects DDL drift
even when row data is intact.
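The replication-replayed scheme boils down to hashing each PK-ordered chunk on both sides with an order-insensitive combiner and comparing only the per-chunk summaries. A minimal sketch in plain C; the FNV-style row hash and the helper names are illustrative stand-ins for the real `BIT_XOR(MD5(...))` aggregation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy per-row hash (stand-in for MD5), combined per chunk with XOR,
 * mirroring the order-insensitive BIT_XOR(MD5(...)) aggregation. */
static uint64_t row_hash (int pk, const char *payload)
{
  uint64_t h = 1469598103934665603ULL;  /* FNV-1a offset basis */
  h = (h ^ (uint64_t) pk) * 1099511628211ULL;
  for (const char *p = payload; *p; p++)
    h = (h ^ (uint8_t) *p) * 1099511628211ULL;
  return h;
}

/* Checksum one chunk: rows [start, start + chunk_size) in PK order. */
static uint64_t chunk_checksum (const int *pks, const char **rows,
                                size_t nrows, size_t start, size_t chunk_size)
{
  uint64_t sum = 0;
  for (size_t i = start; i < start + chunk_size && i < nrows; i++)
    sum ^= row_hash (pks[i], rows[i]);
  return sum;
}
```

Because XOR is commutative, each side can aggregate its own rows locally and still produce a comparable value; only the per-chunk summaries ever need to travel.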
Common DBMS Design
| Engine | Tool | Approach | Replica trip |
|---|---|---|---|
| MySQL | pt-table-checksum (Percona Toolkit) | Chunked CRC32 by PK range; replicates the checksum SQL via binlog so slaves compute on their own data | One result row per chunk replicated |
| PostgreSQL | pg_compare, pglogical_compare, bucardo_ctl validate | Per-table snapshot diff; some tools use chunked hashes | Varies by tool |
| Oracle | DBMS_COMPARISON package | Chunked PK-range hashes; can also reconcile differences | One summary row per scan column |
| MongoDB | db.collection.dataSize() summary checks; no first-class row-level integrity verifier | Application-level | n/a |
| CUBRID checksumdb | Chunked CRC by PK range, replicated through the same WAL stream that ferries data; explicit schema-checksum side | One result row per chunk replicated; schema rows replicated separately |
The CUBRID design is closest to pt-table-checksum — chunked PK
range, replication-replayed, lock-during-checksum. The
distinguishing trait is the explicit schema-checksum table:
DDL drift between master and replica produces a separate row class
that doesn’t conflict with row-data divergence reports.
CUBRID’s Approach
The two replicated tables
checksumdb maintains two dedicated tables that exist only for
its own use:
| Table | Contents | Purpose |
|---|---|---|
| `db_ha_apply_info_chksum_*` (the result table) | One row per (target table, chunk_id) pair: PK lower bound, chunk size, master checksum, master replication info, slave checksum (computed at replay time) | Per-chunk verification |
| `db_ha_apply_info_chksum_*_schema` (the schema table) | One row per checked target table: serialised schema definition + repid + master schema checksum | DDL-drift detection |
The exact name suffix is configurable; the constants
chksum_result_Table_name and chksum_schema_Table_name are
populated from the database name during startup.
The two tables are regular CUBRID tables — they participate in
WAL like any other catalog row, which means the master’s writes
to them propagate automatically to slaves through the existing
HA replication path (see cubrid-ha-replication.md). On the
slave side, the apply-log daemon replays the master’s row inserts
into the result table; checksumdb-on-the-slave then re-reads each
row, recomputes the chunk’s checksum locally, and writes the
slave’s own checksum back into the same row for comparison.
Chunking by PK range
For each user table, checksumdb walks the data along the PK
ordering, breaking it into chunks of --chunk-size rows
(default 10,000). Each chunk has:
- `lower_bound`: a printable expression equivalent to the first row of the chunk, written as a `WHERE` clause that selects “rows where (pk_col1, pk_col2, …) >= (lb1, lb2, …)” with proper quoting and escaping.
- `upper_bound`: implicit — the next chunk’s lower bound, or the end of the table.
- `chunk_id`: a sequential integer per (table, chunk) pair.
- `checksum`: a CRC over the chunk’s rows in PK order, computed by a `SELECT BIT_XOR(MD5(...))` query that reads from the table itself with the `WHERE` bound applied and `LIMIT chunk_size`.
```c
// chksum_calculate_checksum — checksumdb.c:1686
chksum_calculate_checksum (PARSER_CONTEXT *parser, const OID *class_oidp,
                           const char *table_name, DB_ATTRIBUTE *attributes,
                           PARSER_VARCHAR *lower_bound, int chunk_id,
                           int chunk_size)
{
  /* build the actual checksum SELECT for this chunk */
  checksum_query = chksum_print_checksum_query (parser, table_name,
                                                attributes, lower_bound,
                                                chunk_id, chunk_size);

  /* set replication-record info so the master's replay of the
   * INSERT-into-result-table to slaves carries the right context;
   * acquire SHARED lock on the user table so the chunk is
   * consistent for the duration of the SELECT */
  chksum_set_repl_info_and_demote_table_lock (table_name, checksum_query, ...);

  /* execute the checksum SELECT; INSERT result row into the
   * result table; advance to the next chunk via
   * chksum_get_next_lower_bound */
}
```

The lock pattern is important: an exclusive `LOCK TABLE WRITE`
would block readers; a LOCK TABLE READ (SHARED) acquired on the
user table allows other readers but blocks writers for just the
duration of the chunk’s SELECT, then is downgraded / released.
This is the trade-off between consistency and online operation:
replication during the lock will queue, briefly stall, then
resume.
The lower-bound iteration
chksum_get_next_lower_bound (called between chunks) issues a
short SELECT to find the PK of the row that would be the first
row of the next chunk:
```sql
SELECT pk_col1, pk_col2, ...
FROM <table>
WHERE (pk_col1, pk_col2, ...) >= (lb1, lb2, ...)
ORDER BY pk_col1 ASC, pk_col2 ASC, ...
LIMIT <chunk_size>;
-- last row of the result is the boundary
```

The “last row” of the limited result becomes the next chunk’s
lower bound. If the result has fewer than chunk_size rows, the
table is at its end and the loop terminates.
For the very first chunk, chksum_get_initial_lower_bound
returns a synthetic “minimum PK” expression — usually a NULL or
MIN(pk) derivative, depending on PK column types — that selects
all rows.
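The boundary walk above can be sketched over an in-memory sorted PK array. This toy `next_chunk` helper is hypothetical: it uses integer PKs so that “the first PK past the chunk” is just `last + 1`, and it glosses over the real tool’s composite-PK rendering and inclusivity details:

```c
#include <assert.h>
#include <stddef.h>

/* Take up to chunk_size PKs >= *lower_bound (the sorted array stands in
 * for the ORDER BY ... LIMIT chunk_size query); on a full chunk, advance
 * *lower_bound past the chunk's last row. Returns the number of rows in
 * the chunk; a short chunk means the table is exhausted. */
static size_t next_chunk (const int *pks, size_t n, size_t chunk_size,
                          int *lower_bound)
{
  size_t taken = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (pks[i] >= *lower_bound)
        {
          taken++;
          if (taken == chunk_size)
            {
              *lower_bound = pks[i] + 1;  /* toy: integer PKs, so +1 works */
              break;
            }
        }
    }
  return taken;
}
```

Driving this in a loop until a short chunk comes back reproduces the terminate-on-partial-result rule from the text.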
chksum_arg — option carrier
Section titled “chksum_arg — option carrier”struct chksum_arg { int chunk_size; // rows per chunk; default 10000 int sleep_msecs; // pause between chunks to throttle int timeout_msecs; // per-chunk lock-wait timeout bool resume; // pick up from previous run bool cont_on_err; // skip a failing table, continue bool schema_only; // skip data chunks; do schema check only dynamic_array *include_list; // explicit table allowlist dynamic_array *exclude_list; // table blocklist (default: catalog tables)};Five operational knobs:
- `--chunk-size`. Smaller chunks mean more lock-acquire overhead and more result rows, but a shorter per-chunk lock window. The default of 10,000 is the production sweet spot for OLTP tables; very wide rows or very busy tables may want smaller.
- `--sleep-msecs`. Sleep N ms between chunks to keep the verifier from saturating either side; especially relevant on the master, where the SELECT competes with the live workload.
- `--timeout-msecs`. How long to wait for the per-chunk SHARED lock; expiration aborts the chunk (and, depending on `--cont-on-err`, the table or the whole run).
- `--resume`. Skip already-checksummed chunks by reading the prior result table and starting from the highest stored `chunk_id` per table. `chksum_get_prev_checksum_results` populates `chksum_Prev_results`; `chksum_set_initial_chunk_id_and_lower_bound` consumes it.
- `--schema-only`. Skip data and only re-check the schema table. Useful as a fast pre-flight before a full data run, or for ongoing DDL-drift monitoring without the per-row cost.
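How the timeout and continue-on-error knobs interact can be sketched as a driver loop. Everything here is hypothetical: `do_chunk` stands in for the real lock, SELECT, and INSERT sequence, and the names are not the actual checksumdb symbols:

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of one chunk attempt (stub for the real lock+SELECT+INSERT). */
enum chunk_rc { CHUNK_OK, CHUNK_TIMEOUT, CHUNK_DONE };

/* Process one table. Returns the number of chunks completed, or -1 when
 * a chunk times out and continuing on error is disabled (abort the run). */
static int run_table (enum chunk_rc (*do_chunk) (int chunk_id, void *ctx),
                      void *ctx, int cont_on_err)
{
  int done = 0;
  for (int chunk_id = 1;; chunk_id++)
    {
      enum chunk_rc rc = do_chunk (chunk_id, ctx);
      if (rc == CHUNK_DONE)
        break;
      if (rc == CHUNK_TIMEOUT)
        {
          if (!cont_on_err)
            return -1;          /* lock-wait expiry aborts the whole run */
          break;                /* give up on this table; run continues */
        }
      done++;
      /* real tool: sleep sleep_msecs here to throttle between chunks */
    }
  return done;
}

/* Example table whose third chunk hits the lock timeout. */
static enum chunk_rc demo_chunk (int chunk_id, void *ctx)
{
  (void) ctx;
  if (chunk_id == 3)
    return CHUNK_TIMEOUT;
  return chunk_id > 4 ? CHUNK_DONE : CHUNK_OK;
}
```

The sketch makes the documented trade visible: a timeout either ends the run outright or sacrifices the rest of the current table while the run proceeds.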
Include / exclude list filtering
Include and exclude lists are processed per-table by
chksum_need_skip_table:
- If `include_list` is non-empty, only listed tables are checksummed.
- The `exclude_list` always wins over the include list.
- The default `exclude_list` includes catalog tables (`db_class`, `db_attribute`, etc.), which would otherwise produce a flood of schema noise.
The two lists are read from files specified by
--include-class-file / --exclude-class-file (one table name
per line), or set directly via repeated --include-class /
--exclude-class flags.
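The filtering rule is small enough to state as code. A sketch of the predicate (hypothetical helpers over string arrays, not the real `chksum_need_skip_table` signature, which works on the parsed `dynamic_array` lists):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int in_list (const char *name, const char **list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, list[i]) == 0)
      return 1;
  return 0;
}

/* Returns 1 when the table must be skipped:
 * - exclusion always wins;
 * - a non-empty include list acts as an allowlist;
 * - an empty include list means "all tables". */
static int need_skip_table (const char *name,
                            const char **incl, size_t n_incl,
                            const char **excl, size_t n_excl)
{
  if (in_list (name, excl, n_excl))
    return 1;
  if (n_incl > 0 && !in_list (name, incl, n_incl))
    return 1;
  return 0;
}
```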
Schema check
Before the data run starts, checksumdb computes the schema checksum for every target table:
```c
// for each target table:
chksum_update_current_schema_definition (table_name, repid);
// serialises the class definition (column list with types,
// constraints, owner) into a textual canonical form;
// computes its MD5; compares against the stored schema row;
// inserts a fresh row if absent;
// marks divergence in chksum_report_schema_diff if mismatched.
```

The schema checksum is computed as of the current revision
(repid is the schema revision id from the catalog). On the
slave, the same canonical-form serialisation runs locally and
the recomputed checksum is compared.
If a table’s schema diverges, data-chunk checksums are still attempted (they may also diverge or coincidentally match — both states are recorded), but the run reports the schema mismatch as the primary error.
Report
After a run completes (or partially completes), chksum_report
opens an output file and writes three sections:
- Header — `chksum_report_header`: database name, run start time, output table names, `--chunk-size`, include/exclude lists.
- Summary — `chksum_report_summary`: per-table totals (chunks checked, chunks matched, chunks mismatched, schema status).
- Diffs — `chksum_report_diff` (data) and `chksum_report_schema_diff` (schema): per-mismatch rows with PK lower bound, master vs slave checksum, replication info.
The report is human-readable text; downstream tooling typically parses the summary table for monitoring integration.
Resume mode
--resume is the operational mode for long-running verifications
that may be interrupted (operator stops the run, the slave
restarts during apply, etc.). On resume:
- `chksum_get_prev_checksum_results` reads every existing row from the result table into the in-memory linked list `chksum_Prev_results` (one entry per table, with the highest chunk_id already done).
- For each target table at chunk-start time, `chksum_set_initial_chunk_id_and_lower_bound` consults the list; if the table has prior results, it sets `chunk_id = prev_chunk_id + 1` and `lower_bound` from the prior row’s `last_lower_bound`; otherwise it starts from chunk 1 and the synthetic minimum-PK bound.
The result table is therefore both an output and a checkpoint — the same WAL replication that propagates the result rows to slaves also supports the master’s restart from a partial run.
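The resume consultation is a scan over the prior-result rows for the table's highest completed chunk. A sketch (the `prev_result` struct and helper are illustrative, not the real `CHKSUM_RESULT` layout):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the (toy) prior-result table. */
struct prev_result
{
  const char *table;
  int chunk_id;
};

/* Resume-aware starting chunk for a table: highest stored chunk_id + 1,
 * or chunk 1 when the table has no prior rows (fresh start). */
static int initial_chunk_id (const char *table,
                             const struct prev_result *prev, size_t n)
{
  int max_id = 0;
  for (size_t i = 0; i < n; i++)
    if (strcmp (prev[i].table, table) == 0 && prev[i].chunk_id > max_id)
      max_id = prev[i].chunk_id;
  return max_id + 1;
}
```

The real tool also carries the prior row’s lower bound forward; this sketch shows only the chunk-id half of the checkpoint.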
Source Walkthrough
Top-level (checksumdb.c)
| Symbol | Role |
|---|---|
| `checksumdb` (entry, called by `cubrid checksumdb`) | UTIL_FUNCTION_ARG-style entry; option parse; calls `chksum_start` |
| `chksum_start` | Main loop; for each target table, run schema check then data chunks |
| `chksum_init_checksum_tables` | Drop + recreate result tables, or read prior results when resuming |
| `chksum_drop_and_create_checksum_table` | Catalog-level drop + create of the result/schema tables |
| `chksum_get_prev_checksum_results` / `_get_checksum_result` / `_free_results` | Resume-mode prior-run consumer |
| `chksum_need_skip_table` | Include/exclude predicate |
Chunking and SQL emit
| Symbol | Role |
|---|---|
| `chksum_calculate_checksum` | Per-chunk: build SELECT, set repl info, lock, execute, insert result, advance |
| `chksum_print_checksum_query` | Build the `SELECT BIT_XOR(MD5(...))` over the chunk |
| `chksum_print_pk_list` | Render PK-column list |
| `chksum_print_attribute_list` | Render full attribute list (used in checksum query) |
| `chksum_print_select_last_chunk` | Boundary-finding SELECT |
| `chksum_print_lower_bound_string` / `chksum_get_quote_escaped_lower_bound` | Render PK tuple as quoted SQL literals |
| `chksum_get_initial_lower_bound` / `chksum_get_next_lower_bound` | Per-chunk boundary advance |
| `chksum_set_initial_chunk_id_and_lower_bound` | Resume-aware first-chunk setup |
Locking and replication
| Symbol | Role |
|---|---|
| `chksum_set_repl_info_and_demote_table_lock` | Acquire SHARED lock on the user table, set replication-record context so the master’s INSERT-into-result-table replicates with the right metadata |
Schema check
| Symbol | Role |
|---|---|
| `chksum_update_current_schema_definition` | Serialise current schema, compute MD5, update existing row |
| `chksum_insert_schema_definition` | Insert when no prior row exists |
Master checksum SELECT
| Symbol | Role |
|---|---|
| `chksum_print_select_master_checksum` | Build SELECT to read master checksum from the result table |
| `chksum_print_update_master_checksum` | Build UPDATE to write master checksum |
| `chksum_update_master_checksum` | Execute the update |
Reporting
| Symbol | Role |
|---|---|
| `chksum_report` | Top-level report driver |
| `chksum_report_open_file` | Open report output file |
| `chksum_report_header` | Header |
| `chksum_report_summary` | Per-table totals |
| `chksum_report_diff` | Per-mismatch data rows |
| `chksum_report_schema_diff` | Per-mismatch schema rows |
Position hints (as of 2026-05-05)
| Symbol | Path |
|---|---|
| `CHKSUM_RESULT` (struct) | src/executables/checksumdb.c:84 |
| `CHKSUM_ARG` (struct) | src/executables/checksumdb.c:94 |
| `chksum_calculate_checksum` | src/executables/checksumdb.c:1686 |
| `chksum_print_checksum_query` | src/executables/checksumdb.c:1082 |
| `chksum_get_next_lower_bound` | src/executables/checksumdb.c:1319 |
| `chksum_start` | src/executables/checksumdb.c:1798 |
| `chksum_report_open_file` | src/executables/checksumdb.c:160 |
| `chksum_report_header` | src/executables/checksumdb.c:175 |
Symbol names are the canonical anchor; line numbers are hints
scoped to the `updated:` date.
Cross-check Notes
- Result table is itself replicated. The point of replicating
the master’s checksum rows is so that the slave’s local recompute
has something to compare against without a separate connection
to the master. A non-HA database has no use for
`checksumdb` beyond the schema-only mode.
- Replication delay matters. If the slave hasn’t caught up to
the moment when the master finished a chunk, the slave’s
recomputed checksum will be over a prior state of the table.
Operators run
`checksumdb` only after replication is confirmed caught-up (via `cubrid heartbeat status` or `db_ha_apply_info_*` lag indicators).
- SHARED lock semantics. The per-chunk lock is `LOCK TABLE READ`, downgraded after the SELECT. Concurrent INSERT/UPDATE to the same table queues briefly per chunk, multiplied by the number of chunks in the table. For very large tables the cumulative impact is meaningful — `--sleep-msecs` exists exactly to spread that impact.
- PK is required. Tables without a primary key cannot be chunked along a stable order; checksumdb skips them with a warning and records the skip in the report.
- Catalog-table exclusion is the hard-coded default. The
exclude list defaults include
`db_class`, `db_attribute`, `db_attr_setdomain_elm`, etc. Operators wanting to verify the catalog itself would have to override the exclude list, but the catalog is also the most likely victim of replication drift, so this is mostly a noise-suppression default rather than a correctness call.
- Schema serialisation must be canonical. Both sides must produce byte-identical text from the same schema; otherwise every checksum comparison would falsely flag drift. The serialiser explicitly normalises whitespace, attribute ordering (using attribute order, not storage order), and quoting.
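The canonicalisation point can be illustrated: once whitespace is normalised, cosmetically different renderings of the same schema hash identically, while a real type change still flips the digest. A toy sketch that only collapses whitespace (the real serialiser also fixes attribute order and quoting; FNV-1a stands in for MD5):

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Collapse runs of whitespace to single spaces and trim both ends,
 * writing into out (caller-sized). Toy stand-in for the canonical
 * schema serialisation. */
static void canonicalize (const char *in, char *out)
{
  int pending_space = 0, started = 0;
  char *o = out;
  for (; *in; in++)
    {
      if (isspace ((unsigned char) *in))
        {
          pending_space = started;   /* never emit a leading space */
        }
      else
        {
          if (pending_space)
            *o++ = ' ';
          pending_space = 0;
          started = 1;
          *o++ = *in;
        }
    }
  *o = '\0';
}

/* FNV-1a digest of the canonical form (MD5 in the real tool). */
static uint64_t canon_checksum (const char *ddl)
{
  char buf[256];                 /* toy bound; assumes short DDL strings */
  canonicalize (ddl, buf);
  uint64_t h = 1469598103934665603ULL;
  for (const char *p = buf; *p; p++)
    h = (h ^ (uint8_t) *p) * 1099511628211ULL;
  return h;
}
```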
Open Questions
- Per-chunk lock backoff. A chunk that times out is recorded as failed; there’s no in-loop retry with backoff. Long-held application transactions can therefore leave gaps in the chunk grid that don’t get re-attempted within the run.
- MVCC snapshot bracketing. `checksumdb` doesn’t pin a snapshot across chunks — each chunk reads at its own MVCC snapshot. A row that moves between chunks during the run could be missed (counted in neither chunk) or double-counted. Real divergence detection therefore tolerates a small false-positive rate from concurrent writes; operators typically run `checksumdb` against a quiesced or read-mostly window.
- Cross-version compatibility. The serialised schema form
is tied to the schema’s
`repid`; cross-version replication with a schema change in flight can produce diffs that aren’t meaningful. The remediation is to run `checksumdb` only when master and slave are at the same `repid`.
- No ECC-style auto-repair. `checksumdb` reports differences but doesn’t reconcile. Repair requires manual intervention or re-baselining the slave from a fresh master backup.
Sources
- `src/executables/checksumdb.c` — entire utility (single file, ~2400 lines)
- `src/executables/AGENTS.md` — agent guide
- Adjacent docs: `cubrid-ha-replication.md` (the replication channel checksumdb relies on), `cubrid-cdc.md` (the CDC alternative for change capture; checksumdb verifies HA, CDC publishes changes), `cubrid-cub-admin.md` § “Database-admin verbs” (checksumdb is a CS_ONLY verb in the unified `cubrid` admin entry), `cubrid-master-process.md` (cub_master is involved only as the connection broker; checksumdb itself runs in `libcubridcs`)