
CUBRID checksumdb — HA Replica vs Master Row-Checksum Verifier with Chunked, Replication-Replayed Comparison

A replica integrity verifier is the tool that answers “is my replica really identical to the master?” — a question that comes up after a long-running replication chain, after suspected data corruption, or as a routine periodic check before a planned failover. The trivial implementation (dump both sides and diff) is impractical at scale: it requires either suspending writes for the duration of the dump or paying expensive snapshot-tracking costs, and the bandwidth cost is one full table-scan per replica.

Production engines therefore ship sampling or chunked-checksum verifiers that trade exhaustiveness for online operation:

  1. Per-row checksum. A row-by-row hash comparison; exhaustive but proportional to data size and requires synchronisation between the two sides for every row.
  2. Chunked checksum. Tables are split into chunks (by PK range, by row count, or by time window); per-chunk checksums are computed on each side and compared. A chunk-level mismatch flags a divergence; the operator can drill down to per-row only on flagged chunks.
  3. Replication-replayed checksum. The master computes per-chunk checksums into a replicated table, the same replication stream carries them to the slave, and the slave recomputes the same chunks locally and updates the table with its own checksum. Comparison becomes a WHERE master_checksum != slave_checksum query on the result table. Bandwidth cost is one summary row per chunk, not one row per data row.
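
As a sketch of where option 3 ends up, the slave-side comparison is conceptually a single query; the table and column names below are placeholders for this illustration, not checksumdb’s actual identifiers:

```sql
-- Conceptual end state on the slave: one row per diverged chunk.
-- result_table and its columns are placeholders for this sketch.
SELECT class_name, chunk_id, lower_bound,
       master_checksum, slave_checksum
  FROM result_table
 WHERE master_checksum <> slave_checksum
 ORDER BY class_name, chunk_id;
```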

CUBRID’s checksumdb picks option 3, with chunking by primary-key range along the PK ordering (so each chunk is contiguous in the table’s logical ordering), a configurable chunk size measured in rows, and a separate schema-checksum side that detects DDL drift even when row data is intact.

| Engine | Tool | Approach | Replica trip |
| --- | --- | --- | --- |
| MySQL | pt-table-checksum (Percona Toolkit) | Chunked CRC32 by PK range; replicates the checksum SQL via binlog so slaves compute on their own data | One result row per chunk replicated |
| PostgreSQL | pg_compare, pglogical_compare, bucardo_ctl validate | Per-table snapshot diff; some tools use chunked hashes | Varies by tool |
| Oracle | DBMS_COMPARISON package | Chunked PK-range hashes; can also reconcile differences | One summary row per scan column |
| MongoDB | db.collection.dataSize() summary checks; no first-class row-level integrity verifier | Application-level | n/a |
| CUBRID | checksumdb | Chunked CRC by PK range, replicated through the same WAL stream that ferries data; explicit schema-checksum side | One result row per chunk replicated; schema rows replicated separately |

The CUBRID design is closest to pt-table-checksum — chunked PK range, replication-replayed, lock-during-checksum. The distinguishing trait is the explicit schema-checksum table: DDL drift between master and replica produces a separate row class that doesn’t conflict with row-data divergence reports.

checksumdb maintains two dedicated tables that exist only for its own use:

| Table | Contents | Purpose |
| --- | --- | --- |
| db_ha_apply_info_chksum_* (the result table) | One row per (target table, chunk_id) pair: PK lower bound, chunk size, master checksum, master replication info, slave checksum (computed at replay time) | Per-chunk verification |
| db_ha_apply_info_chksum_*_schema (the schema table) | One row per checked target table: serialised schema definition + repid + master schema checksum | DDL-drift detection |

The exact name suffix is configurable; the file-scope globals chksum_result_Table_name and chksum_schema_Table_name are populated from the database name during startup.
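
As a concrete illustration, the pair could look like the following for a database named demo. The column names and types are hypothetical; the real DDL is generated at startup by chksum_drop_and_create_checksum_table and includes master replication-info columns omitted here.

```sql
-- Hypothetical shapes only; the actual definitions are emitted by
-- chksum_drop_and_create_checksum_table.
CREATE TABLE db_ha_apply_info_chksum_demo (          -- result table
  class_name      VARCHAR(255) NOT NULL,             -- target table
  chunk_id        INT          NOT NULL,             -- sequential per table
  lower_bound     VARCHAR(4096),                     -- printable PK lower bound
  chunk_size      INT,                               -- rows covered by the chunk
  master_checksum INT,                               -- computed on the master
  slave_checksum  INT,                               -- recomputed on the slave at replay
  PRIMARY KEY (class_name, chunk_id)
);

CREATE TABLE db_ha_apply_info_chksum_demo_schema (   -- schema table
  class_name      VARCHAR(255) NOT NULL PRIMARY KEY,
  repid           INT,                               -- schema revision id
  schema_def      VARCHAR(4096),                     -- canonical serialised definition
  master_checksum VARCHAR(32)                        -- MD5 of schema_def on the master
);
```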

The two tables are regular CUBRID tables — they participate in WAL like any other table’s rows, which means the master’s writes to them propagate automatically to slaves through the existing HA replication path (see cubrid-ha-replication.md). On the slave side, the apply-log daemon replays the master’s row inserts into the result table; checksumdb on the slave then re-reads each row, recomputes the chunk’s checksum locally, and writes the slave’s own checksum back into the same row for comparison.
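
That slave-side write-back is conceptually a plain UPDATE against the replicated row (names follow the hypothetical sketch above; the actual statement is issued internally, not by an operator):

```sql
-- Conceptual slave-side write-back after the local recompute.
UPDATE db_ha_apply_info_chksum_demo
   SET slave_checksum = ?        -- the slave's locally recomputed value
 WHERE class_name = ? AND chunk_id = ?;
```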

For each user table, checksumdb walks the data along the PK ordering, breaking it into chunks of --chunk-size rows (default 10,000). Each chunk has:

  • lower_bound: a printable expression identifying the first row of the chunk, written as a WHERE clause that selects “rows where (pk_col1, pk_col2, …) >= (lb1, lb2, …)”, with proper quoting and escaping.
  • upper_bound: implicit — the next chunk’s lower bound, or the end of the table.
  • chunk_id: a sequential integer per (table, chunk) pair.
  • checksum: a CRC over the chunk’s rows in PK order, computed by a SELECT BIT_XOR(MD5(...)) query that reads from the table itself with the WHERE-bound applied and LIMIT chunk_size.
```c
// chksum_calculate_checksum — checksumdb.c:1686 (annotated excerpt)
static int
chksum_calculate_checksum (PARSER_CONTEXT *parser, const OID *class_oidp,
                           const char *table_name, DB_ATTRIBUTE *attributes,
                           PARSER_VARCHAR *lower_bound, int chunk_id,
                           int chunk_size)
{
  PARSER_VARCHAR *checksum_query;

  /* build the actual checksum SELECT for this chunk */
  checksum_query = chksum_print_checksum_query (parser, table_name, attributes,
                                                lower_bound, chunk_id,
                                                chunk_size);

  /* set replication-record info so the master's replay of the
   * INSERT-into-result-table to slaves carries the right context;
   * acquire a SHARED lock on the user table so the chunk is
   * consistent for the duration of the SELECT */
  chksum_set_repl_info_and_demote_table_lock (table_name, checksum_query, ...);

  /* execute the checksum SELECT; INSERT the result row into the
   * result table; advance to the next chunk via
   * chksum_get_next_lower_bound */
}
```
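
For illustration, here is one hypothetical rendering of the query that chksum_print_checksum_query could emit for a table t with PK (id) and attributes (id, a, b); the real text differs in hash plumbing, quoting, and NULL handling:

```sql
-- Illustrative only: the real query is assembled from
-- chksum_print_pk_list / chksum_print_attribute_list /
-- chksum_print_lower_bound_string.
SELECT BIT_XOR(MD5(CONCAT(id, a, b))) AS chunk_checksum
  FROM t
 WHERE (id) >= (12345)   -- this chunk's lower_bound
 ORDER BY id             -- PK order
 LIMIT 10000;            -- chunk_size
```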

The lock pattern is important: an exclusive LOCK TABLE WRITE would block readers; a LOCK TABLE READ (SHARED) acquired on the user table allows other readers but blocks writers for just the duration of the chunk’s SELECT, and is then downgraded / released. This is the trade-off between consistency and online operation: writes arriving during the lock, including replication replay on the slave, queue briefly and then resume.

chksum_get_next_lower_bound (called between chunks) issues a short SELECT to find the PK of the row that would be the first row of the next chunk:

```sql
SELECT pk_col1, pk_col2, ...
  FROM <table>
 WHERE (pk_col1, pk_col2, ...) >= (lb1, lb2, ...)
 ORDER BY pk_col1 ASC, pk_col2 ASC, ...
 LIMIT <chunk_size>;
-- last row of the result is the boundary
```

The “last row” of the limited result becomes the next chunk’s lower bound. If the result has fewer than chunk_size rows, the table is at its end and the loop terminates.

For the very first chunk, chksum_get_initial_lower_bound returns a synthetic “minimum PK” expression — usually a NULL or MIN(pk) derivative, depending on PK column types — that selects all rows.
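
A hypothetical rendering of that first-chunk query, again for a single INT PK, where the synthetic bound is simply a floor no row falls below:

```sql
-- Hypothetical first chunk for table t with INT PK id: the
-- synthetic minimum-PK bound admits every row, so the chunk
-- starts at the head of the PK order.
SELECT BIT_XOR(MD5(CONCAT(id, a, b))) AS chunk_checksum
  FROM t
 WHERE id >= (SELECT MIN(id) FROM t)
 ORDER BY id
 LIMIT 10000;
```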

```c
struct chksum_arg
{
  int chunk_size;               // rows per chunk; default 10000
  int sleep_msecs;              // pause between chunks to throttle
  int timeout_msecs;            // per-chunk lock-wait timeout
  bool resume;                  // pick up from previous run
  bool cont_on_err;             // skip a failing table, continue
  bool schema_only;             // skip data chunks; do schema check only
  dynamic_array *include_list;  // explicit table allowlist
  dynamic_array *exclude_list;  // table blocklist (default: catalog tables)
};
```

Five operational knobs:

  • --chunk-size. Smaller chunks = more lock-acquire overhead and more result rows, but a shorter per-chunk lock window. The default of 10,000 is the production sweet spot for OLTP tables; very wide rows or very busy tables may want smaller.
  • --sleep-msecs. Sleep N ms between chunks to keep the verifier from saturating either side; especially relevant on the master where the SELECT competes with live workload.
  • --timeout-msecs. How long to wait for the per-chunk SHARED lock; expiration aborts the chunk (and, depending on --cont-on-err, the table or the whole run).
  • --resume. Skip already-checksummed chunks by reading the prior result table and starting from the highest stored chunk_id per table. chksum_get_prev_checksum_results populates chksum_Prev_results; chksum_set_initial_chunk_id_and_lower_bound consumes it.
  • --schema-only. Skip data and only re-check the schema table. Useful as a fast pre-flight before a full data run, or for ongoing DDL-drift monitoring without the per-row cost.

Include and exclude lists are processed per-table by chksum_need_skip_table:

  • If include_list is non-empty, only listed tables are checksummed.
  • The exclude_list always wins over the include list.
  • The default exclude_list includes catalog tables (db_class, db_attribute, etc.), which would otherwise produce a flood of schema noise.

The two lists are read from files specified by --include-class-file / --exclude-class-file (one table name per line), or set directly via repeated --include-class / --exclude-class flags.

Before the data run starts, checksumdb computes the schema checksum for every target table:

```c
// for each target table:
chksum_update_current_schema_definition (table_name, repid);
//   serialises the class definition (column list with types,
//   constraints, owner) into a textual canonical form;
//   computes its MD5; compares against the stored schema row;
//   inserts a fresh row if absent;
//   marks divergence in chksum_report_schema_diff if mismatched.
```

The schema checksum is computed as of the current revision (repid is the schema revision id from the catalog). On the slave, the same canonical-form serialisation runs locally and the recomputed checksum is compared.
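
In SQL terms, the update/insert split maps onto chksum_update_current_schema_definition and chksum_insert_schema_definition; the table and column names below follow the hypothetical sketch given earlier:

```sql
-- Conceptual master-side upsert of the per-table schema row.
UPDATE db_ha_apply_info_chksum_demo_schema
   SET repid = ?, schema_def = ?, master_checksum = ?
 WHERE class_name = ?;

-- When no prior row exists (chksum_insert_schema_definition):
INSERT INTO db_ha_apply_info_chksum_demo_schema
       (class_name, repid, schema_def, master_checksum)
VALUES (?, ?, ?, ?);
```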

If a table’s schema diverges, data-chunk checksums are still attempted (they may also diverge or coincidentally match — both states are recorded), but the run reports the schema mismatch as the primary error.

After a run completes (or partially completes), chksum_report opens an output file and writes three sections:

  1. Header (chksum_report_header): database name, run start time, output table names, --chunk-size, include/exclude lists.
  2. Summary (chksum_report_summary): per-table totals (chunks checked, chunks matched, chunks mismatched, schema status).
  3. Diffs (chksum_report_diff for data, chksum_report_schema_diff for schema): per-mismatch rows with PK lower bound, master vs slave checksum, replication info.

The report is human-readable text; downstream tooling typically parses the summary table for monitoring integration.

--resume is the operational mode for long-running verifications that may be interrupted (operator stops the run, the slave restarts during apply, etc.). On resume:

  1. chksum_get_prev_checksum_results reads every existing row from the result table into the in-memory linked list chksum_Prev_results (one entry per table with the highest chunk_id already done).
  2. For each target table at chunk-start time, chksum_set_initial_chunk_id_and_lower_bound consults the list; if the table has prior results, it sets chunk_id = prev_chunk_id + 1 and lower_bound from the prior row’s last_lower_bound; otherwise starts from chunk 1 and the synthetic minimum-PK bound.

The result table is therefore both an output and a checkpoint — the same WAL replication that propagates the result rows to slaves also supports the master’s restart from a partial run.
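
Conceptually, that prior-run read is equivalent to the following scan (the real code loads whole rows into chksum_Prev_results and picks the maximum in memory; names again from the hypothetical sketch):

```sql
-- Conceptual equivalent of chksum_get_prev_checksum_results:
-- the highest completed chunk per target table.
SELECT class_name, MAX(chunk_id) AS last_chunk_id
  FROM db_ha_apply_info_chksum_demo
 GROUP BY class_name;
```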

| Symbol | Role |
| --- | --- |
| checksumdb (entry, called by cubrid checksumdb) | UTIL_FUNCTION_ARG-style entry; option parse; calls chksum_start |
| chksum_start | Main loop; for each target table, run schema check then data chunks |
| chksum_init_checksum_tables | Drop + recreate result tables, or read prior results when resuming |
| chksum_drop_and_create_checksum_table | Catalog-level drop + create of the result/schema tables |
| chksum_get_prev_checksum_results / _get_checksum_result / _free_results | Resume-mode prior-run consumer |
| chksum_need_skip_table | Include/exclude predicate |

| Symbol | Role |
| --- | --- |
| chksum_calculate_checksum | Per-chunk: build SELECT, set repl info, lock, execute, insert result, advance |
| chksum_print_checksum_query | Build the SELECT BIT_XOR(MD5(...)) over the chunk |
| chksum_print_pk_list | Render PK-column list |
| chksum_print_attribute_list | Render full attribute list (used in checksum query) |
| chksum_print_select_last_chunk | Boundary-finding SELECT |
| chksum_print_lower_bound_string / chksum_get_quote_escaped_lower_bound | Render PK tuple as quoted SQL literals |
| chksum_get_initial_lower_bound / chksum_get_next_lower_bound | Per-chunk boundary advance |
| chksum_set_initial_chunk_id_and_lower_bound | Resume-aware first-chunk setup |

| Symbol | Role |
| --- | --- |
| chksum_set_repl_info_and_demote_table_lock | Acquire SHARED lock on the user table, set replication-record context so the master’s INSERT-into-result-table replicates with the right metadata |

| Symbol | Role |
| --- | --- |
| chksum_update_current_schema_definition | Serialise current schema, compute MD5, update existing row |
| chksum_insert_schema_definition | Insert when no prior row exists |

| Symbol | Role |
| --- | --- |
| chksum_print_select_master_checksum | Build SELECT to read master checksum from the result table |
| chksum_print_update_master_checksum | Build UPDATE to write master checksum |
| chksum_update_master_checksum | Execute the update |

| Symbol | Role |
| --- | --- |
| chksum_report | Top-level report driver |
| chksum_report_open_file | Open report output file |
| chksum_report_header | Header |
| chksum_report_summary | Per-table totals |
| chksum_report_diff | Per-mismatch data rows |
| chksum_report_schema_diff | Per-mismatch schema rows |

| Symbol | Path |
| --- | --- |
| CHKSUM_RESULT (struct) | src/executables/checksumdb.c:84 |
| CHKSUM_ARG (struct) | src/executables/checksumdb.c:94 |
| chksum_calculate_checksum | src/executables/checksumdb.c:1686 |
| chksum_print_checksum_query | src/executables/checksumdb.c:1082 |
| chksum_get_next_lower_bound | src/executables/checksumdb.c:1319 |
| chksum_start | src/executables/checksumdb.c:1798 |
| chksum_report_open_file | src/executables/checksumdb.c:160 |
| chksum_report_header | src/executables/checksumdb.c:175 |

Symbol names are the canonical anchor; line numbers are hints, valid only as of the document’s updated: date.

  • Result table is itself replicated. The point of replicating the master’s checksum rows is so that the slave’s local recompute has something to compare against without a separate connection to the master. A non-HA database has no use for checksumdb beyond the schema-only mode.
  • Replication delay matters. If the slave hasn’t caught up to the moment when the master finished a chunk, the slave’s recomputed checksum will be over a prior state of the table. Operators run checksumdb only after replication is confirmed caught up (via cubrid heartbeat status or the db_ha_apply_info_* lag indicators).
  • SHARED lock semantics. The per-chunk lock is LOCK TABLE READ, downgraded after the SELECT. Concurrent INSERT/UPDATE to the same table queues briefly per chunk, multiplied by the number of chunks in the table; a 100-million-row table at the default chunk size means 10,000 separate lock acquisitions. For very large tables, the cumulative impact is meaningful — --sleep-msecs exists exactly to spread that impact.
  • PK is required. Tables without a primary key cannot be chunked along a stable order; checksumdb skips them with a warning and records the skip in the report.
  • Catalog-table exclusion is the hard-coded default. The default exclude list includes db_class, db_attribute, db_attr_setdomain_elm, etc. Operators wanting to verify the catalog itself would have to override the exclude list, but the catalog is also the most likely victim of replication drift, so this is mostly a noise-suppression default rather than a correctness call.
  • Schema serialisation must be canonical. Both sides must produce byte-identical text from the same schema; otherwise every checksum comparison would falsely flag drift. The serialiser explicitly normalises whitespace, attribute ordering (using attribute-order, not storage-order), and quoting.
  • Per-chunk lock backoff. A chunk that times out is recorded as failed; there’s no in-loop retry with backoff. Long-held application transactions can therefore leave gaps in the chunk grid that don’t get re-attempted within the run.
  • MVCC snapshot bracketing. checksumdb doesn’t pin a snapshot across chunks — each chunk reads at its own MVCC snapshot. A row that moves between chunks during the run could be missed (counted in neither chunk) or double-counted. Real divergence detection therefore tolerates a small false-positive rate from concurrent writes; operators typically run checksumdb against a quiesced or read-mostly window.
  • Cross-version compatibility. The serialised schema form is tied to the schema’s repid; cross-version replication with a schema change in flight can produce diffs that aren’t meaningful. The remediation is to run checksumdb only when master and slave are at the same repid.
  • No ECC-style auto-repair. checksumdb reports differences but doesn’t reconcile. Repair requires manual intervention or a re-baselining of the slave from a fresh master backup.
  • src/executables/checksumdb.c — entire utility (single file, ~2400 lines)
  • src/executables/AGENTS.md — agent guide
  • Adjacent docs: cubrid-ha-replication.md (the replication channel checksumdb relies on), cubrid-cdc.md (the CDC alternative for change capture; checksumdb verifies HA, CDC publishes changes), cubrid-cub-admin.md §“Database-admin verbs” (checksumdb is a CS_ONLY verb in the unified cubrid admin entry), cubrid-master-process.md (cub_master is involved only as the connection broker; checksumdb itself runs in libcubridcs)