CUBRID checksumdb — HA Replica vs Master Row-Checksum Verifier with Chunked, Replication-Replayed Comparison
Theoretical Background
A replica integrity verifier is the tool that answers “is my
replica really identical to the master?” — a question that comes up
after a long-running replication chain, after suspected data
corruption, or as a routine periodic check before a planned
failover. The trivial implementation (dump both sides and diff)
is impractical at scale: it requires either suspending writes for
the duration of the dump or paying expensive snapshot-tracking
costs, and the bandwidth cost is one full table-scan per replica.
Production engines therefore ship sampling or chunked-checksum verifiers that trade exhaustiveness for online operation:
- Per-row checksum. A row-by-row hash comparison; exhaustive but proportional to data size and requires synchronisation between the two sides for every row.
- Chunked checksum. Tables are split into chunks (by PK range, by row count, or by time window); per-chunk checksums are computed on each side and compared. A chunk-level mismatch flags a divergence; the operator can drill down to per-row only on flagged chunks.
- Replication-replayed checksum. The master computes per-chunk
checksums into a replicated table, the same replication
stream carries them to the slave, and the slave recomputes
the same chunks locally and updates the table with its own
checksum. Comparison becomes a
`WHERE master_checksum != slave_checksum` query on the result table. Bandwidth cost is one summary row per chunk, not one row per data row.
CUBRID’s checksumdb picks option 3, with chunking by primary-key
range along the PK ordering (so each chunk is contiguous in the
table’s logical ordering), a configurable chunk size measured in
rows, and a separate schema-checksum side that detects DDL drift
even when row data is intact.
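The replication-replayed scheme boils down to hashing each PK-ordered chunk on both sides with an order-insensitive combiner and comparing only the per-chunk summaries. A minimal sketch in plain C; the FNV-style row hash and the helper names are illustrative stand-ins for the real `BIT_XOR(MD5(...))` aggregation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy per-row hash (stand-in for MD5), combined per chunk with XOR,
 * mirroring the order-insensitive BIT_XOR(MD5(...)) aggregation. */
static uint64_t row_hash (int pk, const char *payload)
{
  uint64_t h = 1469598103934665603ULL;  /* FNV-1a offset basis */
  h = (h ^ (uint64_t) pk) * 1099511628211ULL;
  for (const char *p = payload; *p; p++)
    h = (h ^ (uint8_t) *p) * 1099511628211ULL;
  return h;
}

/* Checksum one chunk: rows [start, start + chunk_size) in PK order. */
static uint64_t chunk_checksum (const int *pks, const char **rows,
                                size_t nrows, size_t start, size_t chunk_size)
{
  uint64_t sum = 0;
  for (size_t i = start; i < start + chunk_size && i < nrows; i++)
    sum ^= row_hash (pks[i], rows[i]);
  return sum;
}
```

Because XOR is commutative, each side can aggregate its own rows locally and still produce a comparable value; only the per-chunk summaries ever need to travel.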
Common DBMS Design
| Engine | Tool | Approach | Replica trip |
|---|---|---|---|
| MySQL | pt-table-checksum (Percona Toolkit) | Chunked CRC32 by PK range; replicates the checksum SQL via binlog so slaves compute on their own data | One result row per chunk replicated |
| PostgreSQL | pg_compare, pglogical_compare, bucardo_ctl validate | Per-table snapshot diff; some tools use chunked hashes | Varies by tool |
| Oracle | DBMS_COMPARISON package | Chunked PK-range hashes; can also reconcile differences | One summary row per scan column |
| MongoDB | db.collection.dataSize() summary checks; no first-class row-level integrity verifier | Application-level | n/a |
| CUBRID checksumdb | Chunked CRC by PK range, replicated through the same WAL stream that ferries data; explicit schema-checksum side | One result row per chunk replicated; schema rows replicated separately |
The CUBRID design is closest to pt-table-checksum — chunked PK
range, replication-replayed, lock-during-checksum. The
distinguishing trait is the explicit schema-checksum table:
DDL drift between master and replica produces a separate row class
that doesn’t conflict with row-data divergence reports.
CUBRID’s Approach
The two replicated tables
checksumdb maintains two dedicated tables that exist only for
its own use:
| Table | Contents | Purpose |
|---|---|---|
| `db_ha_apply_info_chksum_*` (the result table) | One row per (target table, chunk_id) pair: PK lower bound, chunk size, master checksum, master replication info, slave checksum (computed at replay time) | Per-chunk verification |
| `db_ha_apply_info_chksum_*_schema` (the schema table) | One row per checked target table: serialised schema definition + repid + master schema checksum | DDL-drift detection |
The exact name suffix is configurable; the constants
chksum_result_Table_name and chksum_schema_Table_name are
populated from the database name during startup.
The two tables are regular CUBRID tables — they participate in
WAL like any other catalog row, which means the master’s writes
to them propagate automatically to slaves through the existing
HA replication path (see cubrid-ha-replication.md). On the
slave side, the apply-log daemon replays the master’s row inserts
into the result table; checksumdb-on-the-slave then re-reads each
row, recomputes the chunk’s checksum locally, and writes the
slave’s own checksum back into the same row for comparison.
Chunking by PK range
For each user table, checksumdb walks the data along the PK
ordering, breaking it into chunks of --chunk-size rows
(default 10,000). Each chunk has:
- `lower_bound`: a printable expression equivalent to the first row of the chunk, written as a `WHERE` clause that selects “rows where (pk_col1, pk_col2, …) >= (lb1, lb2, …)” with proper quoting and escaping.
- `upper_bound`: implicit — the next chunk’s lower bound, or the end of the table.
- `chunk_id`: a sequential integer per (table, chunk) pair.
- `checksum`: a CRC over the chunk’s rows in PK order, computed by a `SELECT BIT_XOR(MD5(...))` query that reads from the table itself with the `WHERE` bound applied and `LIMIT chunk_size`.
```c
// chksum_calculate_checksum — checksumdb.c:1686
chksum_calculate_checksum (PARSER_CONTEXT *parser, const OID *class_oidp,
                           const char *table_name, DB_ATTRIBUTE *attributes,
                           PARSER_VARCHAR *lower_bound, int chunk_id,
                           int chunk_size)
{
  /* build the actual checksum SELECT for this chunk */
  checksum_query = chksum_print_checksum_query (parser, table_name,
                                                attributes, lower_bound,
                                                chunk_id, chunk_size);

  /* set replication-record info so the master's replay of the
   * INSERT-into-result-table to slaves carries the right context;
   * acquire SHARED lock on the user table so the chunk is
   * consistent for the duration of the SELECT */
  chksum_set_repl_info_and_demote_table_lock (table_name, checksum_query, ...);

  /* execute the checksum SELECT; INSERT result row into the
   * result table; advance to the next chunk via
   * chksum_get_next_lower_bound */
}
```

The lock pattern is important: an exclusive `LOCK TABLE WRITE`
would block readers; a LOCK TABLE READ (SHARED) acquired on the
user table allows other readers but blocks writers for just the
duration of the chunk’s SELECT, then is downgraded / released.
This is the trade-off between consistency and online operation:
replication during the lock will queue, briefly stall, then
resume.
The lower-bound iteration
chksum_get_next_lower_bound (called between chunks) issues a
short SELECT to find the PK of the row that would be the first
row of the next chunk:
```sql
SELECT pk_col1, pk_col2, ...
FROM <table>
WHERE (pk_col1, pk_col2, ...) >= (lb1, lb2, ...)
ORDER BY pk_col1 ASC, pk_col2 ASC, ...
LIMIT <chunk_size>;
-- last row of the result is the boundary
```

The “last row” of the limited result becomes the next chunk’s
lower bound. If the result has fewer than chunk_size rows, the
table is at its end and the loop terminates.
For the very first chunk, chksum_get_initial_lower_bound
returns a synthetic “minimum PK” expression — usually a NULL or
MIN(pk) derivative, depending on PK column types — that selects
all rows.
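The boundary walk above can be sketched over an in-memory sorted PK array. This toy `next_chunk` helper is hypothetical: it uses integer PKs so that “the first PK past the chunk” is just `last + 1`, and it glosses over the real tool’s composite-PK rendering and inclusivity details:

```c
#include <assert.h>
#include <stddef.h>

/* Take up to chunk_size PKs >= *lower_bound (the sorted array stands in
 * for the ORDER BY ... LIMIT chunk_size query); on a full chunk, advance
 * *lower_bound past the chunk's last row. Returns the number of rows in
 * the chunk; a short chunk means the table is exhausted. */
static size_t next_chunk (const int *pks, size_t n, size_t chunk_size,
                          int *lower_bound)
{
  size_t taken = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (pks[i] >= *lower_bound)
        {
          taken++;
          if (taken == chunk_size)
            {
              *lower_bound = pks[i] + 1;  /* toy: integer PKs, so +1 works */
              break;
            }
        }
    }
  return taken;
}
```

Driving this in a loop until a short chunk comes back reproduces the terminate-on-partial-result rule from the text.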
chksum_arg — option carrier
Section titled “chksum_arg — option carrier”struct chksum_arg { int chunk_size; // rows per chunk; default 10000 int sleep_msecs; // pause between chunks to throttle int timeout_msecs; // per-chunk lock-wait timeout bool resume; // pick up from previous run bool cont_on_err; // skip a failing table, continue bool schema_only; // skip data chunks; do schema check only dynamic_array *include_list; // explicit table allowlist dynamic_array *exclude_list; // table blocklist (default: catalog tables)};Five operational knobs:
- `--chunk-size`. Smaller chunks mean more lock-acquire overhead and more result rows, but a shorter per-chunk lock window. The default of 10,000 is the production sweet spot for OLTP tables; very wide rows or very busy tables may want smaller.
- `--sleep-msecs`. Sleep N ms between chunks to keep the verifier from saturating either side; especially relevant on the master, where the SELECT competes with the live workload.
- `--timeout-msecs`. How long to wait for the per-chunk SHARED lock; expiration aborts the chunk (and, depending on `--cont-on-err`, the table or the whole run).
- `--resume`. Skip already-checksummed chunks by reading the prior result table and starting from the highest stored `chunk_id` per table. `chksum_get_prev_checksum_results` populates `chksum_Prev_results`; `chksum_set_initial_chunk_id_and_lower_bound` consumes it.
- `--schema-only`. Skip data and only re-check the schema table. Useful as a fast pre-flight before a full data run, or for ongoing DDL-drift monitoring without the per-row cost.
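How the timeout and continue-on-error knobs interact can be sketched as a driver loop. Everything here is hypothetical: `do_chunk` stands in for the real lock, SELECT, and INSERT sequence, and the names are not the actual checksumdb symbols:

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of one chunk attempt (stub for the real lock+SELECT+INSERT). */
enum chunk_rc { CHUNK_OK, CHUNK_TIMEOUT, CHUNK_DONE };

/* Process one table. Returns the number of chunks completed, or -1 when
 * a chunk times out and continuing on error is disabled (abort the run). */
static int run_table (enum chunk_rc (*do_chunk) (int chunk_id, void *ctx),
                      void *ctx, int cont_on_err)
{
  int done = 0;
  for (int chunk_id = 1;; chunk_id++)
    {
      enum chunk_rc rc = do_chunk (chunk_id, ctx);
      if (rc == CHUNK_DONE)
        break;
      if (rc == CHUNK_TIMEOUT)
        {
          if (!cont_on_err)
            return -1;          /* lock-wait expiry aborts the whole run */
          break;                /* give up on this table; run continues */
        }
      done++;
      /* real tool: sleep sleep_msecs here to throttle between chunks */
    }
  return done;
}

/* Example table whose third chunk hits the lock timeout. */
static enum chunk_rc demo_chunk (int chunk_id, void *ctx)
{
  (void) ctx;
  if (chunk_id == 3)
    return CHUNK_TIMEOUT;
  return chunk_id > 4 ? CHUNK_DONE : CHUNK_OK;
}
```

The sketch makes the documented trade visible: a timeout either ends the run outright or sacrifices the rest of the current table while the run proceeds.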
Include / exclude list filtering
Include and exclude lists are processed per-table by
chksum_need_skip_table:
- If `include_list` is non-empty, only listed tables are checksummed.
- The `exclude_list` always wins over the include list.
- The default `exclude_list` includes catalog tables (`db_class`, `db_attribute`, etc.), which would otherwise produce a flood of schema noise.
The two lists are read from files specified by
--include-class-file / --exclude-class-file (one table name
per line), or set directly via repeated --include-class /
--exclude-class flags.
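The filtering rule is small enough to state as code. A sketch of the predicate (hypothetical helpers over string arrays, not the real `chksum_need_skip_table` signature, which works on the parsed `dynamic_array` lists):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int in_list (const char *name, const char **list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, list[i]) == 0)
      return 1;
  return 0;
}

/* Returns 1 when the table must be skipped:
 * - exclusion always wins;
 * - a non-empty include list acts as an allowlist;
 * - an empty include list means "all tables". */
static int need_skip_table (const char *name,
                            const char **incl, size_t n_incl,
                            const char **excl, size_t n_excl)
{
  if (in_list (name, excl, n_excl))
    return 1;
  if (n_incl > 0 && !in_list (name, incl, n_incl))
    return 1;
  return 0;
}
```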
Schema check
Before the data run starts, checksumdb computes the schema checksum for every target table:
```c
// for each target table:
chksum_update_current_schema_definition (table_name, repid);
// serialises the class definition (column list with types,
// constraints, owner) into a textual canonical form;
// computes its MD5; compares against the stored schema row;
// inserts a fresh row if absent;
// marks divergence in chksum_report_schema_diff if mismatched.
```

The schema checksum is computed as of the current revision
(repid is the schema revision id from the catalog). On the
slave, the same canonical-form serialisation runs locally and
the recomputed checksum is compared.
If a table’s schema diverges, data-chunk checksums are still attempted (they may also diverge or coincidentally match — both states are recorded), but the run reports the schema mismatch as the primary error.
Report
After a run completes (or partially completes), chksum_report
opens an output file and writes three sections:
- Header — `chksum_report_header`: database name, run start time, output table names, `--chunk-size`, include/exclude lists.
- Summary — `chksum_report_summary`: per-table totals (chunks checked, chunks matched, chunks mismatched, schema status).
- Diffs — `chksum_report_diff` (data) and `chksum_report_schema_diff` (schema): per-mismatch rows with PK lower bound, master vs slave checksum, replication info.
The report is human-readable text; downstream tooling typically parses the summary table for monitoring integration.
Resume mode
--resume is the operational mode for long-running verifications
that may be interrupted (operator stops the run, the slave
restarts during apply, etc.). On resume:
- `chksum_get_prev_checksum_results` reads every existing row from the result table into the in-memory linked list `chksum_Prev_results` (one entry per table, with the highest chunk_id already done).
- For each target table at chunk-start time, `chksum_set_initial_chunk_id_and_lower_bound` consults the list; if the table has prior results, it sets `chunk_id = prev_chunk_id + 1` and `lower_bound` from the prior row’s `last_lower_bound`; otherwise it starts from chunk 1 and the synthetic minimum-PK bound.
The result table is therefore both an output and a checkpoint — the same WAL replication that propagates the result rows to slaves also supports the master’s restart from a partial run.
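The resume consultation is a scan over the prior-result rows for the table's highest completed chunk. A sketch (the `prev_result` struct and helper are illustrative, not the real `CHKSUM_RESULT` layout):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the (toy) prior-result table. */
struct prev_result
{
  const char *table;
  int chunk_id;
};

/* Resume-aware starting chunk for a table: highest stored chunk_id + 1,
 * or chunk 1 when the table has no prior rows (fresh start). */
static int initial_chunk_id (const char *table,
                             const struct prev_result *prev, size_t n)
{
  int max_id = 0;
  for (size_t i = 0; i < n; i++)
    if (strcmp (prev[i].table, table) == 0 && prev[i].chunk_id > max_id)
      max_id = prev[i].chunk_id;
  return max_id + 1;
}
```

The real tool also carries the prior row’s lower bound forward; this sketch shows only the chunk-id half of the checkpoint.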
Source Walkthrough
Top-level (checksumdb.c)
| Symbol | Role |
|---|---|
| `checksumdb` (entry, called by `cubrid checksumdb`) | UTIL_FUNCTION_ARG-style entry; option parse; calls `chksum_start` |
| `chksum_start` | Main loop; for each target table, run schema check then data chunks |
| `chksum_init_checksum_tables` | Drop + recreate result tables, or read prior results when resuming |
| `chksum_drop_and_create_checksum_table` | Catalog-level drop + create of the result/schema tables |
| `chksum_get_prev_checksum_results` / `_get_checksum_result` / `_free_results` | Resume-mode prior-run consumer |
| `chksum_need_skip_table` | Include/exclude predicate |
Chunking and SQL emit
| Symbol | Role |
|---|---|
| `chksum_calculate_checksum` | Per-chunk: build SELECT, set repl info, lock, execute, insert result, advance |
| `chksum_print_checksum_query` | Build the `SELECT BIT_XOR(MD5(...))` over the chunk |
| `chksum_print_pk_list` | Render PK-column list |
| `chksum_print_attribute_list` | Render full attribute list (used in checksum query) |
| `chksum_print_select_last_chunk` | Boundary-finding SELECT |
| `chksum_print_lower_bound_string` / `chksum_get_quote_escaped_lower_bound` | Render PK tuple as quoted SQL literals |
| `chksum_get_initial_lower_bound` / `chksum_get_next_lower_bound` | Per-chunk boundary advance |
| `chksum_set_initial_chunk_id_and_lower_bound` | Resume-aware first-chunk setup |
Locking and replication
| Symbol | Role |
|---|---|
| `chksum_set_repl_info_and_demote_table_lock` | Acquire SHARED lock on the user table, set replication-record context so the master’s INSERT-into-result-table replicates with the right metadata |
Schema check
| Symbol | Role |
|---|---|
| `chksum_update_current_schema_definition` | Serialise current schema, compute MD5, update existing row |
| `chksum_insert_schema_definition` | Insert when no prior row exists |
Master checksum SELECT
| Symbol | Role |
|---|---|
| `chksum_print_select_master_checksum` | Build SELECT to read master checksum from the result table |
| `chksum_print_update_master_checksum` | Build UPDATE to write master checksum |
| `chksum_update_master_checksum` | Execute the update |
Reporting
| Symbol | Role |
|---|---|
| `chksum_report` | Top-level report driver |
| `chksum_report_open_file` | Open report output file |
| `chksum_report_header` | Header |
| `chksum_report_summary` | Per-table totals |
| `chksum_report_diff` | Per-mismatch data rows |
| `chksum_report_schema_diff` | Per-mismatch schema rows |
Position hints (as of 2026-05-05)
| Symbol | Path |
|---|---|
| `CHKSUM_RESULT` (struct) | src/executables/checksumdb.c:84 |
| `CHKSUM_ARG` (struct) | src/executables/checksumdb.c:94 |
| `chksum_calculate_checksum` | src/executables/checksumdb.c:1686 |
| `chksum_print_checksum_query` | src/executables/checksumdb.c:1082 |
| `chksum_get_next_lower_bound` | src/executables/checksumdb.c:1319 |
| `chksum_start` | src/executables/checksumdb.c:1798 |
| `chksum_report_open_file` | src/executables/checksumdb.c:160 |
| `chksum_report_header` | src/executables/checksumdb.c:175 |
Symbol names are the canonical anchor; line numbers are hints
scoped to the `updated:` date.
Cross-check Notes
- Result table is itself replicated. The point of replicating
the master’s checksum rows is so that the slave’s local recompute
has something to compare against without a separate connection
to the master. A non-HA database has no use for
`checksumdb` beyond the schema-only mode.
- Replication delay matters. If the slave hasn’t caught up to
the moment when the master finished a chunk, the slave’s
recomputed checksum will be over a prior state of the table.
Operators run
`checksumdb` only after replication is confirmed caught-up (via `cubrid heartbeat status` or `db_ha_apply_info_*` lag indicators).
- SHARED lock semantics. The per-chunk lock is `LOCK TABLE READ`, downgraded after the SELECT. Concurrent INSERT/UPDATE to the same table queues briefly per chunk, multiplied by the number of chunks in the table. For very large tables the cumulative impact is meaningful — `--sleep-msecs` exists exactly to spread that impact.
- PK is required. Tables without a primary key cannot be chunked along a stable order; checksumdb skips them with a warning and records the skip in the report.
- Catalog-table exclusion is the hard-coded default. The
exclude list defaults include
`db_class`, `db_attribute`, `db_attr_setdomain_elm`, etc. Operators wanting to verify the catalog itself would have to override the exclude list, but the catalog is also the most likely victim of replication drift, so this is mostly a noise-suppression default rather than a correctness call.
- Schema serialisation must be canonical. Both sides must produce byte-identical text from the same schema; otherwise every checksum comparison would falsely flag drift. The serialiser explicitly normalises whitespace, attribute ordering (using attribute order, not storage order), and quoting.
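The canonicalisation point can be illustrated: once whitespace is normalised, cosmetically different renderings of the same schema hash identically, while a real type change still flips the digest. A toy sketch that only collapses whitespace (the real serialiser also fixes attribute order and quoting; FNV-1a stands in for MD5):

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Collapse runs of whitespace to single spaces and trim both ends,
 * writing into out (caller-sized). Toy stand-in for the canonical
 * schema serialisation. */
static void canonicalize (const char *in, char *out)
{
  int pending_space = 0, started = 0;
  char *o = out;
  for (; *in; in++)
    {
      if (isspace ((unsigned char) *in))
        {
          pending_space = started;   /* never emit a leading space */
        }
      else
        {
          if (pending_space)
            *o++ = ' ';
          pending_space = 0;
          started = 1;
          *o++ = *in;
        }
    }
  *o = '\0';
}

/* FNV-1a digest of the canonical form (MD5 in the real tool). */
static uint64_t canon_checksum (const char *ddl)
{
  char buf[256];                 /* toy bound; assumes short DDL strings */
  canonicalize (ddl, buf);
  uint64_t h = 1469598103934665603ULL;
  for (const char *p = buf; *p; p++)
    h = (h ^ (uint8_t) *p) * 1099511628211ULL;
  return h;
}
```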
Open Questions
- Per-chunk lock backoff. A chunk that times out is recorded as failed; there’s no in-loop retry with backoff. Long-held application transactions can therefore leave gaps in the chunk grid that don’t get re-attempted within the run.
- MVCC snapshot bracketing. `checksumdb` doesn’t pin a snapshot across chunks — each chunk reads at its own MVCC snapshot. A row that moves between chunks during the run could be missed (counted in neither chunk) or double-counted. Real divergence detection therefore tolerates a small false-positive rate from concurrent writes; operators typically run `checksumdb` against a quiesced or read-mostly window.
- Cross-version compatibility. The serialised schema form
is tied to the schema’s
`repid`; cross-version replication with a schema change in flight can produce diffs that aren’t meaningful. The remediation is to run `checksumdb` only when master and slave are at the same `repid`.
- No ECC-style auto-repair. `checksumdb` reports differences but doesn’t reconcile. Repair requires manual intervention or re-baselining the slave from a fresh master backup.
Sources
- `src/executables/checksumdb.c` — entire utility (single file, ~2400 lines)
- `src/executables/AGENTS.md` — agent guide
- Adjacent docs: `cubrid-ha-replication.md` (the replication channel checksumdb relies on), `cubrid-cdc.md` (the CDC alternative for change capture; checksumdb verifies HA, CDC publishes changes), `cubrid-cub-admin.md` § “Database-admin verbs” (checksumdb is a CS_ONLY verb in the unified `cubrid` admin entry), `cubrid-master-process.md` (cub_master is involved only as the connection broker; checksumdb itself runs in `libcubridcs`)