CUBRID Heartbeat — Cluster Liveness, Failover and Failback
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Source verification (as of 2026-05-01)
- Beyond CUBRID — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
The heartbeat machinery is the contract holder of cluster liveness. Two textbook problems live underneath it: failure detection (deciding a peer is gone) and leader election (deciding who replaces the gone peer). Designing Data-Intensive Applications (Kleppmann), Ch. 5 “Replication” and Ch. 9 “Consistency and Consensus” give the modern framing; Chandra & Toueg’s Unreliable Failure Detectors for Reliable Distributed Systems (JACM 1996) is the formal reference for why an asynchronous network forces every detector to be either unsafe (false positives) or untimely (slow).
In a primary/standby database cluster the same machinery covers three operational events:
- Failover — a slave promotes itself to master because the former master is no longer reachable.
- Failback — a master demotes itself to slave because another node is the rightful master (split-brain loser, or the local resource has failed).
- Demote — a master temporarily steps aside while a new master is elected (resource-side disk failure on the master is the canonical trigger).
Three implementation choices the model leaves open shape every real engine and frame the rest of this document:
- Consensus or local decision? Classical Raft (Ongaro & Ousterhout, USENIX ATC 2014) and ZooKeeper-style ZAB elect a leader by quorum: a node only believes it is the leader once a majority has confirmed. CUBRID makes the opposite choice — each `cub_master` reaches an independent verdict from its own peer table. This is cheaper but inherits the asymmetric partition and split-brain problems consensus-based systems sidestep.
- Push or pull liveness? Some systems push periodic “I am alive” packets; others pull (the watcher polls the watched). CUBRID pushes — `cub_master` broadcasts a UDP heartbeat to every other `cub_master` it knows about, and updates its local view of each peer’s state from the received packets.
- What stops a network-partitioned slave from promoting? An isolated slave that cannot reach the master would, with timeout-only logic, unilaterally promote and create split-brain. CUBRID has two guards: `ha_ping_hosts` (a list of external addresses that must be reachable for the local node to trust its own promotion decision) and the `is_isolated` predicate (the local node is isolated if every non-replica peer is in `HB_NSTATE_UNKNOWN`; a minimal sketch follows below).
After these are named, every CUBRID-specific structure in this document either implements one of them or makes the resulting state machine durable.
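To make the isolation predicate concrete, here is a minimal sketch of it as just stated — the `peer` struct and field names are illustrative stand-ins, not CUBRID's actual `HB_NODE_ENTRY`:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical peer record; CUBRID's real HB_NODE_ENTRY carries many more fields. */
enum peer_state { PSTATE_UNKNOWN, PSTATE_SLAVE, PSTATE_MASTER, PSTATE_REPLICA };

struct peer
{
  enum peer_state state;
  bool is_self;
  struct peer *next;
};

/* Isolated: every peer other than myself that could vote (i.e., is not a
 * replica) is in the UNKNOWN state.  A single reachable non-replica peer
 * is enough to disprove isolation. */
static bool
cluster_is_isolated (const struct peer *peers)
{
  for (const struct peer *p = peers; p != NULL; p = p->next)
    {
      if (p->is_self || p->state == PSTATE_REPLICA)
	continue;
      if (p->state != PSTATE_UNKNOWN)
	return false;
    }
  return true;
}
```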
Common DBMS Design
Every primary/standby cluster — MySQL/Galera, PostgreSQL with Patroni or repmgr, Oracle Data Guard, MongoDB replica sets, CUBRID — adopts the same set of patterns on top of the textbook failure detector. They are not in the original Chandra-Toueg paper; they are the engineering vocabulary that lives between the theory and the source.
Per-peer scoring with priority and state
The detector cannot just emit a boolean per peer; it has to
order peers so that promotion is deterministic across nodes.
The standard pattern packs (state, priority) into one
comparable scalar: high bits encode role (master, to-be-master,
slave, replica, unknown), low bits encode priority within that
role. PostgreSQL’s synchronous_standby_names priority list,
MongoDB’s priority member field, and CUBRID’s node->score
are the same idea.
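A compilable sketch of the packing pattern, assuming a 4-bit role field above a 12-bit priority — the widths and role values are illustrative, not CUBRID's exact masks (those appear in the mapping table below):

```c
#include <stdio.h>

/* Role in the high bits dominates the comparison; priority in the low
 * bits breaks ties within a role.  Smaller packed value wins. */
enum role { ROLE_MASTER = 0x0, ROLE_SLAVE = 0x4, ROLE_UNKNOWN = 0x7 };

static unsigned short
pack_score (enum role r, unsigned short priority)
{
  return (unsigned short) ((r << 12) | (priority & 0x0FFF));
}

int
main (void)
{
  /* A master at priority 3 still beats a slave at priority 1. */
  printf ("%d\n", pack_score (ROLE_MASTER, 3) < pack_score (ROLE_SLAVE, 1));	/* prints 1 */
  return 0;
}
```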
Heartbeat gap and last-heard timestamp
Two independent staleness signals coexist because each catches
a different failure mode. The gap counter (incremented on
every send, decremented on every receive) catches symmetric
loss — the network drops both directions equally. The
last-heard timestamp catches asymmetric loss — receives
work but our packets vanish on the way out. CUBRID keeps both
in HB_NODE_ENTRY (heartbeat_gap and last_recv_hbtime);
either threshold breach demotes the peer to HB_NSTATE_UNKNOWN.
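A sketch of the two checks side by side; the names and thresholds are stand-ins for CUBRID's `heartbeat_gap` / `last_recv_hbtime` pair and the corresponding `ha_*` parameters:

```c
#include <stdbool.h>
#include <sys/time.h>

#define MAX_GAP        5	/* cf. ha_max_heartbeat_gap (default 5) */
#define MAX_SILENCE_MS 3000	/* cf. ha_calc_score_interval_in_msecs (default 3000) */

static long
elapsed_ms (struct timeval now, struct timeval then)
{
  return (now.tv_sec - then.tv_sec) * 1000 + (now.tv_usec - then.tv_usec) / 1000;
}

/* Symmetric loss: sends outrun receives by more than MAX_GAP probes.
 * Asymmetric loss: nothing heard for MAX_SILENCE_MS, even though our own
 * sends may be vanishing silently.  Either breach marks the peer stale. */
static bool
peer_is_stale (int heartbeat_gap, struct timeval last_recv, struct timeval now)
{
  return heartbeat_gap > MAX_GAP || elapsed_ms (now, last_recv) > MAX_SILENCE_MS;
}
```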
A separate ping channel for split-brain prevention
Pure peer-to-peer liveness cannot tell “the master is gone”
apart from “I am partitioned away from the master”. A node
that promotes on the wrong answer creates split-brain. The
canonical fix is a third reference point: a static list of
external hosts (gateway, DNS server, witness node) the local
node pings before accepting a promotion. Pacemaker calls this
fencing; ZooKeeper calls it the witness; CUBRID calls it
ha_ping_hosts. The semantics are identical — an isolated
node that also cannot ping the witnesses is genuinely cut off
and must not promote.
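A sketch of the witness gate under these semantics — `probe_fn` is a hypothetical stand-in for whatever reachability check the system uses (CUBRID pings the hosts listed in `ha_ping_hosts`):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reachability probe; CUBRID's equivalent pings ha_ping_hosts. */
typedef bool (*probe_fn) (const char *host);

/* A node that can reach at least one witness is partitioned from its peers
 * but not from the world — promotion stays on the table.  A node that
 * reaches no witness must assume it is the side that got cut off. */
static bool
promotion_permitted (const char *const witnesses[], size_t n, probe_fn probe)
{
  if (n == 0)
    return true;	/* no witnesses configured: timeout-only logic applies */
  for (size_t i = 0; i < n; i++)
    if (probe (witnesses[i]))
      return true;
  return false;
}
```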
Job queue + worker thread
The detector cannot run on the I/O path because send/receive
must not block on slow operations like fork()/execv() of a
restarted server. The standard split is two threads minimum:
one reads the wire, one runs the FSM transitions as deferred
jobs. Galera’s gcs thread plus applier, etcd’s
raft.Node.Tick plus applier, CUBRID’s
cluster_reader_th plus cluster_worker_th follow the same
shape.
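A minimal sketch of the split, assuming a plain POSIX mutex/condvar queue; CUBRID's `hb_job_queue` / `hb_job_dequeue` add per-job expiry times and ordering on top of this shape:

```c
#include <pthread.h>
#include <stddef.h>

struct job
{
  int type;			/* which FSM transition to run */
  struct job *next;
};

static struct job *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

/* Reader side: never blocks on FSM work, only on the queue lock.
 * (LIFO push for brevity; the real queue is ordered by expiry time.) */
static void
job_enqueue (struct job *j)
{
  pthread_mutex_lock (&queue_lock);
  j->next = queue_head;
  queue_head = j;
  pthread_cond_signal (&queue_cond);
  pthread_mutex_unlock (&queue_lock);
}

/* Worker side: sleeps until work arrives, then runs transitions off the wire path. */
static struct job *
job_dequeue (void)
{
  pthread_mutex_lock (&queue_lock);
  while (queue_head == NULL)
    pthread_cond_wait (&queue_cond, &queue_lock);
  struct job *j = queue_head;
  queue_head = j->next;
  pthread_mutex_unlock (&queue_lock);
  return j;
}
```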
Resource side — keep cluster decisions and process management apart
A heartbeat module also has to start, stop, and re-mode local
processes (the database server, replication readers). Mixing
this with cluster gossip leads to deadlocks (sending a
heartbeat blocks because we are holding a process-table lock).
The standard separation is two protected blocks with two
locks: cluster state (peer table) and resource state (local
process table). CUBRID materialises this as hb_Cluster /
hb_Resource, each with its own pthread_mutex_t lock.
Theory ↔ CUBRID mapping
| Theoretical concept | CUBRID name |
|---|---|
| Failure detector + leader election | hb_Cluster global with peer table (HB_NODE_ENTRY linked list) |
| Local node FSM state | HB_NODE_STATE { UNKNOWN, SLAVE, TO_BE_MASTER, TO_BE_SLAVE, MASTER, REPLICA } (heartbeat.h:86) |
| Peer-state-induced score | node->score = node->priority | HB_NODE_SCORE_<state> in hb_cluster_calc_score |
| Score role bit-mask | HB_NODE_SCORE_MASTER 0x8000, HB_NODE_SCORE_TO_BE_MASTER 0xF000, HB_NODE_SCORE_SLAVE 0x0000, HB_NODE_SCORE_UNKNOWN 0x7FFF (master_heartbeat.h:122-125) |
| Symmetric-loss staleness signal | HB_NODE_ENTRY::heartbeat_gap + ha_max_heartbeat_gap (default 5) |
| Asymmetric-loss staleness signal | HB_NODE_ENTRY::last_recv_hbtime + ha_calc_score_interval_in_msecs (default 3000) |
| Witness-host channel | HB_PING_HOST_ENTRY list under hb_Cluster->ping_hosts; gate flag is_ping_check_enabled |
| Isolation predicate | hb_cluster_is_isolated (master_heartbeat.c:762) |
| Split-brain “two masters” detection | num_master > 1 branch in hb_cluster_job_calc_score (master_heartbeat.c:867) |
| Cluster job FSM enum | HB_CLUSTER_JOB { INIT, HEARTBEAT, CALC_SCORE, CHECK_PING, FAILOVER, FAILBACK, CHECK_VALID_PING_SERVER, DEMOTE } (master_heartbeat.h:62) |
| Resource job FSM enum | HB_RESOURCE_JOB { PROC_START, PROC_DEREG, CONFIRM_START, CONFIRM_DEREG, CHANGE_MODE, DEMOTE_START_SHUTDOWN, DEMOTE_CONFIRM_SHUTDOWN, CLEANUP_ALL, CONFIRM_CLEANUP_ALL } (master_heartbeat.h:76) |
| Process state on resource side | HB_PROC_STATE { DEAD, DEREGISTERED, STARTED, REGISTERED_AND_STANDBY, REGISTERED_AND_TO_BE_STANDBY, REGISTERED_AND_ACTIVE, REGISTERED_AND_TO_BE_ACTIVE } (master_heartbeat.h:93) |
| Wire header | HBP_HEADER { type, r:1, len, seq, group_id, orig_host_name, dest_host_name } (heartbeat.h:114) |
| Wire body | one HB_NODE_STATE_TYPE packed via or_pack_int (master_heartbeat.c:1719) |
| Reader thread | hb_thread_cluster_reader (master_heartbeat.c:4704) |
| Cluster worker thread | hb_thread_cluster_worker (master_heartbeat.c:4659) |
| Resource worker thread | hb_thread_resource_worker (master_heartbeat.c:4769) |
| Server-hang detector thread | hb_thread_check_disk_failure (master_heartbeat.c:4814) |
CUBRID’s Approach
The heartbeat module has four moving parts: the cluster-side FSM that gossips peer state and elects a master, the resource-side FSM that registers and supervises local processes, the job queue + worker pair that drives both FSMs, and the wire protocol they exchange. We walk them in that order.
Overall structure
```mermaid
flowchart LR
subgraph WIRE["UDP cluster gossip"]
PEER1["cub_master @ Node1"]
PEER2["cub_master @ Node2"]
PEER3["cub_master @ Node3"]
end
subgraph LOCAL["Local cub_master process"]
R["cluster_reader_th\nhb_thread_cluster_reader"]
CW["cluster_worker_th\nhb_thread_cluster_worker"]
RW["resource_worker_th\nhb_thread_resource_worker"]
DK["check_disk_failure_th\nhb_thread_check_disk_failure"]
CJQ["cluster_Jobs\n(CJOB queue)"]
RJQ["resource_Jobs\n(RJOB queue)"]
HC["hb_Cluster\n(peer table)"]
HR["hb_Resource\n(local proc table)"]
end
subgraph PROCS["Local HA processes"]
SVR["cub_server"]
CL["copylogdb"]
AL["applylogdb"]
end
PEER2 -- HBP_CLUSTER_HEARTBEAT --> R
PEER3 -- HBP_CLUSTER_HEARTBEAT --> R
R --> HC
CW --> CJQ
CJQ --> CW
CW --> HC
CW --> RJQ
RW --> RJQ
RJQ --> RW
RW --> HR
RW --> SVR
RW --> CL
RW --> AL
DK --> HR
DK --> SVR
PEER1 <-- HBP_CLUSTER_HEARTBEAT --> R
CW -. broadcast .-> PEER1
CW -. broadcast .-> PEER2
CW -. broadcast .-> PEER3
```
The figure encodes three boundaries:
- Reader / worker — the wire is read by one thread (cluster_reader_th) and the FSM runs in another (cluster_worker_th); the queue between them (cluster_Jobs) is the only synchronisation.
- Cluster / resource — peer-table mutations and process-table mutations are protected by separate locks (hb_Cluster->lock, hb_Resource->lock); the worker threads cross between the two on transitions like failover, but each lock is held only as briefly as the transition demands.
- cub_master / managed processes — the local cub_master owns no database state of its own; it supervises the processes that do.
Cluster-side FSM — node state transitions
The peer state space is six values:
```c
// HB_NODE_STATE — src/connection/heartbeat.h:86
enum HB_NODE_STATE
{
  HB_NSTATE_UNKNOWN = 0,
  HB_NSTATE_SLAVE = 1,
  HB_NSTATE_TO_BE_MASTER = 2,
  HB_NSTATE_TO_BE_SLAVE = 3,
  HB_NSTATE_MASTER = 4,
  HB_NSTATE_REPLICA = 5,
  HB_NSTATE_MAX
};
```

UNKNOWN is the absence-of-information state — every node starts as SLAVE (or REPLICA when configured as a replica-only host), and a peer only enters UNKNOWN in our table if its gap or last-heard time crosses a threshold. TO_BE_MASTER and TO_BE_SLAVE are the in-flight transitions; the source notes that TO_BE_SLAVE is reachable only by remote update (a peer telling us it is going slave) — local MASTER → SLAVE transitions skip it and go direct.
```mermaid
stateDiagram-v2
    [*] --> SLAVE : start
    [*] --> REPLICA : ha_replica_list
    SLAVE --> TO_BE_MASTER : calc_score elects me
    TO_BE_MASTER --> MASTER : failover confirms
    TO_BE_MASTER --> SLAVE : failover cancelled
    MASTER --> SLAVE : failback (split-brain loser)
    MASTER --> SLAVE : demote (resource fail)
    MASTER --> UNKNOWN : demote (transient, before SLAVE)
    UNKNOWN --> SLAVE : demote step 2
    SLAVE --> UNKNOWN : peer view only
    MASTER --> UNKNOWN : peer view only
    REPLICA --> REPLICA : never elected
```
The most important property is what is not in the
diagram: there is no transition from SLAVE to MASTER that
bypasses TO_BE_MASTER. Every promotion goes through the
intermediate state because the intermediate state is the
window in which a re-scored cluster can still cancel the
promotion (the failover branch of calc_score writes
TO_BE_MASTER, then the subsequent failover job re-runs
hb_cluster_calc_score and can revert to SLAVE if the
result has changed).
Score computation — local leader election
Each cub_master independently runs hb_cluster_calc_score
on a timer (default ha_calc_score_interval_in_msecs = 3000).
The function maps every known peer to a short score, then
the smallest score wins.
```c
// hb_cluster_calc_score — src/executables/master_heartbeat.c:1556
static int
hb_cluster_calc_score (void)
{
  int num_master = 0;
  short min_score = HB_NODE_SCORE_UNKNOWN;
  HB_NODE_ENTRY *node;
  struct timeval now;

  hb_Cluster->myself->state = hb_Cluster->state;
  gettimeofday (&now, NULL);

  for (node = hb_Cluster->nodes; node; node = node->next)
    {
      /* Demote stale peers to UNKNOWN — symmetric or asymmetric loss. */
      if (node->heartbeat_gap > prm_get_integer_value (PRM_ID_HA_MAX_HEARTBEAT_GAP)
	  || (!HB_IS_INITIALIZED_TIME (node->last_recv_hbtime)
	      && HB_GET_ELAPSED_TIME (now, node->last_recv_hbtime) >
		 prm_get_integer_value (PRM_ID_HA_CALC_SCORE_INTERVAL_IN_MSECS)))
	{
	  // ... condensed: save peer name if it was master, then ...
	  node->state = HB_NSTATE_UNKNOWN;
	}

      switch (node->state)
	{
	case HB_NSTATE_MASTER:
	case HB_NSTATE_TO_BE_SLAVE:
	  node->score = node->priority | HB_NODE_SCORE_MASTER;	/* 0x8000 */
	  break;
	case HB_NSTATE_TO_BE_MASTER:
	  node->score = node->priority | HB_NODE_SCORE_TO_BE_MASTER;	/* 0xF000 */
	  break;
	case HB_NSTATE_SLAVE:
	  node->score = node->priority | HB_NODE_SCORE_SLAVE;	/* 0x0000 */
	  break;
	case HB_NSTATE_REPLICA:
	case HB_NSTATE_UNKNOWN:
	default:
	  node->score = node->priority | HB_NODE_SCORE_UNKNOWN;	/* 0x7FFF */
	  break;
	}

      if (node->score < min_score)
	{
	  hb_Cluster->master = node;
	  min_score = node->score;
	}
      if (node->score < (short) HB_NODE_SCORE_TO_BE_MASTER)
	{
	  num_master++;
	}
    }
  return num_master;
}
```

Two non-obvious facts. First, the role-bit assignment is
ordered so that MASTER (0x8000) is smallest as a short
(0x8000 reads as -32768) — the min_score comparison
naturally favours the existing master, with priority breaking
ties between two masters. Second, TO_BE_SLAVE shares the
master role bit; this is intentional because a peer mid-demote
is still authoritative until its successor is confirmed.
The num_master counter (every node whose score is below
HB_NODE_SCORE_TO_BE_MASTER) is the split-brain detector — if
more than one node thinks it is master simultaneously, the
caller in hb_cluster_job_calc_score queues a FAILBACK for
the loser.
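The signed-short trick is easy to verify in isolation — a small demo using the constants from the mapping table above (the priorities are chosen arbitrarily; two's-complement wrap is assumed, as on every platform CUBRID targets):

```c
#include <stdio.h>

#define HB_NODE_SCORE_MASTER        0x8000
#define HB_NODE_SCORE_TO_BE_MASTER  0xF000
#define HB_NODE_SCORE_SLAVE         0x0000
#define HB_NODE_SCORE_UNKNOWN       0x7FFF

int
main (void)
{
  /* Read as signed shorts, the pure role constants order as
   * MASTER -32768 < TO_BE_MASTER -4096 < SLAVE 0 < UNKNOWN 32767,
   * so min-score election prefers masters, then candidates, then slaves. */
  short master = (short) (1 | HB_NODE_SCORE_MASTER);		/* priority-1 master    */
  short to_be = (short) (2 | HB_NODE_SCORE_TO_BE_MASTER);	/* priority-2 candidate */
  short slave = (short) (1 | HB_NODE_SCORE_SLAVE);		/* priority-1 slave     */
  short unknown = (short) (3 | HB_NODE_SCORE_UNKNOWN);		/* priority absorbed    */

  printf ("%d %d %d %d\n", master, to_be, slave, unknown);	/* -32767 -4094 1 32767 */
  return 0;
}
```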
Cluster job FSM — what the worker dispatches
The cluster worker dequeues from cluster_Jobs and dispatches
through the cluster job table:
```c
// hb_cluster_jobs — src/executables/master_heartbeat.c:259
static HB_JOB_FUNC hb_cluster_jobs[] = {
  hb_cluster_job_init,
  hb_cluster_job_heartbeat,
  hb_cluster_job_calc_score,
  hb_cluster_job_check_ping,
  hb_cluster_job_failover,
  hb_cluster_job_failback,
  hb_cluster_job_check_valid_ping_server,
  hb_cluster_job_demote,
  NULL
};
```

The index into the array is the HB_CLUSTER_JOB enum value. The table stays in lockstep with the enum only by convention — there is no static assert, so a missing entry would dispatch through the terminating NULL and crash (see the verified-facts section below).
The transitions between cluster jobs are not represented as a formal state diagram — they emerge from each job re-queueing its successor. The spine of normal operation is:
```mermaid
flowchart LR
    INIT["INIT"]
    HB["HEARTBEAT (every 0.5s)"]
    CV["CHECK_VALID_PING_SERVER"]
    CS["CALC_SCORE (every 3s)"]
    CP["CHECK_PING"]
    FO["FAILOVER"]
    FB["FAILBACK"]
    DM["DEMOTE"]
    INIT --> HB
    INIT --> CV
    INIT --> CS
    HB --> HB
    CV --> CV
    CS --> CS
    CS --> CP
    CP --> FO
    CP --> CS
    CP --> FB
    FO --> CS
    FB --> CS
    DM --> DM
```
Each arrow is the job calling hb_cluster_job_queue on its
own successor. The most consequential branch is in
hb_cluster_job_calc_score: depending on the local node’s
state, the isolation predicate, and the split-brain count, the
job either continues with another CALC_SCORE, queues a
CHECK_PING to validate isolation, or queues FAILBACK
directly.
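The self-sustaining shape is worth seeing in miniature. A hedged sketch in which each periodic job ends by scheduling itself again — `queue_job_after` is a hypothetical helper standing in for `hb_cluster_job_queue` plus an expiry argument:

```c
/* Hypothetical scheduler hook: run fn after delay_ms milliseconds. */
typedef void (*job_fn) (void);
extern void queue_job_after (job_fn fn, int delay_ms);

static void
job_heartbeat (void)
{
  /* ... broadcast one round of heartbeats ... */
  queue_job_after (job_heartbeat, 500);		/* re-arm: every 0.5 s */
}

static void
job_calc_score (void)
{
  /* ... score peers; may queue failover/failback instead ... */
  queue_job_after (job_calc_score, 3000);	/* re-arm: every 3 s */
}

/* Bootstrap (cf. HB_CJOB_INIT): queue the periodic jobs once; from then on
 * the cycle is self-sustaining — there is no central timer loop. */
static void
job_init (void)
{
  queue_job_after (job_heartbeat, 0);
  queue_job_after (job_calc_score, 0);
}
```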
One failover, end to end
Section titled “One failover, end to end”sequenceDiagram
participant M as Old master (Node1)
participant N as Slave (Node2, will promote)
participant P as Slave (Node3)
participant W as ha_ping_hosts witnesses
Note over M: cub_master crash or partition
loop every 0.5s
N->>M: HBP_CLUSTER_HEARTBEAT (req)
N->>P: HBP_CLUSTER_HEARTBEAT (req)
P-->>N: HBP_CLUSTER_HEARTBEAT (resp, state=SLAVE)
Note over N: node[Node1].heartbeat_gap++
end
Note over N: heartbeat_gap > ha_max_heartbeat_gap
N->>N: hb_cluster_calc_score:<br/>node[Node1].state = UNKNOWN
N->>N: min score is myself (priority|SLAVE)
N->>N: state = TO_BE_MASTER, broadcast hb
N->>W: ping ha_ping_hosts
W-->>N: at least one replies
N->>N: hb_cluster_job_failover:<br/>recompute score → still me
N->>N: state = MASTER<br/>hb_Resource->state = MASTER
N->>N: queue HB_RJOB_CHANGE_MODE
N->>P: HBP_CLUSTER_HEARTBEAT (state=MASTER)
P->>P: peer table updated, master=Node2
```
Two timing properties matter. (a) TO_BE_MASTER is held for
at least ha_failover_wait_time_in_msecs (default 3 seconds)
when no ping witnesses exist or when not all peers replied —
this is the window in which a still-alive but slow master can
reassert. (b) The transition TO_BE_MASTER → MASTER is gated
on a second score recomputation inside hb_cluster_job_failover,
not just the first one in hb_cluster_job_calc_score; if the
peer table changed during the wait, failover cancels.
Failover — hb_cluster_job_failover
```c
// hb_cluster_job_failover — src/executables/master_heartbeat.c:1163
static void
hb_cluster_job_failover (HB_JOB_ARG * arg)
{
  int num_master;

  pthread_mutex_lock (&hb_Cluster->lock);
  num_master = hb_cluster_calc_score ();

  if (hb_Cluster->master && hb_Cluster->myself
      && hb_Cluster->master->priority == hb_Cluster->myself->priority)
    {
      /* I am still the highest-priority master after the wait. */
      hb_Cluster->state = HB_NSTATE_MASTER;
      hb_Resource->state = HB_NSTATE_MASTER;
      hb_resource_job_set_expire_and_reorder (HB_RJOB_CHANGE_MODE, HB_JOB_TIMER_IMMEDIATELY);
    }
  else
    {
      /* A new master appeared during the wait — abort. */
      hb_Cluster->state = HB_NSTATE_SLAVE;
    }

  hb_cluster_request_heartbeat_to_all ();
  pthread_mutex_unlock (&hb_Cluster->lock);
  // ... condensed: re-queue CALC_SCORE ...
}
```

The promotion is two-sided: hb_Cluster->state advances the
cluster-side FSM, and hb_Resource->state advances the
resource-side FSM. The resource side is what
HB_RJOB_CHANGE_MODE reads to flip the local cub_server
from HA_SERVER_STATE_STANDBY to HA_SERVER_STATE_ACTIVE.
Failback — hb_cluster_job_failback
Failback is unconditional: by the time the job runs, the caller has already decided this node is no longer master. The job’s responsibility is to clean up.
```c
// hb_cluster_job_failback — src/executables/master_heartbeat.c:1351 (condensed)
static void
hb_cluster_job_failback (HB_JOB_ARG * arg)
{
  HB_PROC_ENTRY *proc;
  pid_t *pids = NULL;
  int count = 0;

  pthread_mutex_lock (&hb_Cluster->lock);
  hb_Cluster->state = HB_NSTATE_SLAVE;
  hb_Cluster->myself->state = hb_Cluster->state;
  hb_cluster_request_heartbeat_to_all ();	/* announce SLAVE */
  pthread_mutex_unlock (&hb_Cluster->lock);

  pthread_mutex_lock (&hb_Resource->lock);
  hb_Resource->state = HB_NSTATE_SLAVE;
  for (proc = hb_Resource->procs; proc; proc = proc->next)
    {
      if (proc->type == HB_PTYPE_SERVER)
	{
	  pids = (pid_t *) realloc (pids, sizeof (pid_t) * (count + 1));
	  pids[count++] = proc->pid;
	}
    }
  pthread_mutex_unlock (&hb_Resource->lock);

  hb_kill_process (pids, count);	/* SIGTERM, then SIGKILL on timeout */
  // ... condensed: re-queue CALC_SCORE ...
}
```

Note the absence of an HB_NSTATE_TO_BE_SLAVE write —
failback is a one-step transition from MASTER to SLAVE.
The TO_BE_SLAVE value exists in the enum but is reachable
only via peer-side state mirrored in from a remote
HBP_CLUSTER_HEARTBEAT.
The cub_server process is killed rather than sent a
mode-change RPC because the slave-side cub_server has a
different in-process configuration (recovery semantics, log
applier) than the master-side one. Restart is the simplest way
to ensure the process comes back in the right shape.
Demote — hb_cluster_job_demote
Demote is the master-initiated step-aside path, used when the
local resource has failed (typically detected by
hb_thread_check_disk_failure). Unlike failback, demote
waits for a new master to appear — up to
HB_MAX_WAIT_FOR_NEW_MASTER (60) iterations of the job, one
per second, before giving up and reasserting master.
The state sequence is MASTER → UNKNOWN → SLAVE with the
hide_to_demote flag asserted throughout. While hide_to_demote
is true the local node neither broadcasts heartbeats nor
participates in calc_score, so peers see this node as
UNKNOWN (no heartbeat in the gap window) and are free to
elect a new master without contention.
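A condensed sketch of that wait loop, reconstructed from the semantics described above rather than copied from the source — the struct fields and helpers are illustrative:

```c
#include <stdbool.h>

enum node_state { STATE_UNKNOWN, STATE_SLAVE, STATE_MASTER };

struct cluster { enum node_state state; bool hide_to_demote; };
struct demote_arg { int retries; };

/* Hypothetical helpers standing in for peer-table inspection and job re-queueing. */
extern bool cluster_has_new_master (const struct cluster *c);
extern void requeue_demote (struct demote_arg *arg, int delay_ms);

static void
demote_step (struct cluster *c, struct demote_arg *arg)
{
  if (arg->retries == 0)
    {
      c->hide_to_demote = true;	/* stop broadcasting, stop participating in calc_score */
      c->state = STATE_UNKNOWN;	/* step 1 of MASTER -> UNKNOWN -> SLAVE */
    }

  if (cluster_has_new_master (c))
    {
      c->state = STATE_SLAVE;	/* successor elected: finish the demote */
      c->hide_to_demote = false;
      return;
    }

  if (++arg->retries >= 60)	/* cf. HB_MAX_WAIT_FOR_NEW_MASTER */
    {
      c->hide_to_demote = false;	/* give up waiting and resurface; whether MASTER
					 * is formally reasserted is an open question below */
      return;
    }

  requeue_demote (arg, 1000);	/* retry in one second */
}
```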
Wire protocol — UDP HBP_HEADER + state int
A heartbeat packet is one HBP_HEADER followed by an
or-packed int carrying the sender’s HB_NODE_STATE_TYPE:
```c
// HBP_HEADER — src/connection/heartbeat.h:114
struct hbp_header
{
  unsigned char type;		/* HBP_CLUSTER_HEARTBEAT */
  /* (bit-field portability — big-endian/little-endian variants) */
  char reserved:7;
  char r:1;			/* 1 = request, 0 = response */
  unsigned short len;		/* body length */
  unsigned int seq;
  char group_id[HB_MAX_GROUP_ID_LEN];	/* HA group filter */
  char orig_host_name[CUB_MAXHOSTNAMELEN];
  char dest_host_name[CUB_MAXHOSTNAMELEN];
};
```

The header carries a group_id because multiple HA clusters
can run on the same hosts; a packet whose group_id does not
match hb_Cluster->group_id is dropped silently. The r bit
distinguishes request from response — a request expects a
response, and the gap counter is incremented on every send
while the receive side does not increment a “response sent”
counter (so an isolated node’s gap monotonically grows even
if it answers received probes).
```c
// hb_cluster_send_heartbeat_internal — src/executables/master_heartbeat.c:1702 (condensed)
static int
hb_cluster_send_heartbeat_internal (struct sockaddr_in *saddr, socklen_t saddr_len,
				    char *dest_host_name, bool is_req)
{
  HBP_HEADER *hbp_header;
  char buffer[HB_BUFFER_SZ], *p;

  hbp_header = (HBP_HEADER *) (&buffer[0]);
  hb_set_net_header (hbp_header, HBP_CLUSTER_HEARTBEAT, is_req, OR_INT_SIZE, 0, dest_host_name);

  p = (char *) (hbp_header + 1);
  p = or_pack_int (p, hb_Cluster->state);

  return sendto (hb_Cluster->sfd, buffer, sizeof (HBP_HEADER) + OR_INT_SIZE, 0,
		 (struct sockaddr *) saddr, saddr_len) > 0 ? NO_ERROR : ER_FAILED;
}
```

Reading and dispatching incoming heartbeats
hb_thread_cluster_reader is the only thread that calls
recvfrom on hb_Cluster->sfd. Its body is intentionally
short — it pushes work into hb_cluster_receive_heartbeat
and loops:
```c
// hb_thread_cluster_reader — src/executables/master_heartbeat.c:4704 (condensed)
static void *
hb_thread_cluster_reader (void *arg)
{
  SOCKET sfd = hb_Cluster->sfd;
  char buffer[HB_BUFFER_SZ + MAX_ALIGNMENT];
  char *aligned_buffer = PTR_ALIGN (buffer, MAX_ALIGNMENT);
  struct pollfd po[1] = { {0, 0, 0} };

  while (hb_Cluster->shutdown == false)
    {
      po[0].fd = sfd;
      po[0].events = POLLIN;
      if (poll (po, 1, 1) <= 0)
	continue;

      if ((po[0].revents & POLLIN) && sfd == hb_Cluster->sfd)
	{
	  struct sockaddr_in from;
	  socklen_t from_len = sizeof (from);
	  int len = recvfrom (sfd, aligned_buffer, HB_BUFFER_SZ, 0,
			      (struct sockaddr *) &from, &from_len);
	  if (len > 0)
	    hb_cluster_receive_heartbeat (aligned_buffer, len, &from, from_len);
	}
    }
  return NULL;
}
```

hb_cluster_receive_heartbeat is where the inbound packet mutates hb_Cluster:
1. Validate `dest_host_name` against `hb_Cluster->host_name` (drop wrong-host).
2. Validate body length against `hbp_header->len`.
3. If `hb_is_heartbeat_valid` rejects the packet (unknown host, group mismatch, IP mismatch), record the sender as an `HB_UI_NODE_ENTRY` for diagnostics and drop.
4. If `r == 1` and `hide_to_demote == false`, send a response via `hb_cluster_send_heartbeat_resp`.
5. Look up the sender in the peer table; update `node->state`, decrement `heartbeat_gap` (floored at 0), and stamp `last_recv_hbtime`.
6. If the previously-known master demoted itself in this message (`old.state == MASTER && new.state != MASTER`), set `is_state_changed = true` and, after releasing the lock, call `hb_cluster_job_set_expire_and_reorder` to bump `CALC_SCORE` to immediate.
The “bump score immediate” in step 6 is the path that lets a
peer-side state change (e.g., the master demoting on its
disk-failure timer) propagate into our local view inside one
heartbeat interval rather than waiting for the next periodic
CALC_SCORE.
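For orientation, the six steps collapse into roughly this shape — a reconstruction from the list above, with illustrative struct and helper names rather than the exact source symbols:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the wire packet and peer-table entry. */
struct packet { bool is_request; int sender_state; };
struct peer   { int state; int heartbeat_gap; long last_recv_ms; };

#define PEER_STATE_MASTER 4	/* cf. HB_NSTATE_MASTER */

extern bool packet_valid (const struct packet *p);	/* host / group / length checks */
extern struct peer *lookup_sender (const struct packet *p);
extern void send_response (const struct packet *p);
extern void bump_calc_score_to_immediate (void);
extern long now_ms (void);

static void
receive_heartbeat (const struct packet *p, bool hide_to_demote)
{
  if (!packet_valid (p))
    return;			/* steps 1-3: validate, log unknown senders, drop */

  if (p->is_request && !hide_to_demote)
    send_response (p);		/* step 4 */

  struct peer *peer = lookup_sender (p);
  if (peer == NULL)
    return;

  bool master_demoted = (peer->state == PEER_STATE_MASTER
			 && p->sender_state != PEER_STATE_MASTER);

  peer->state = p->sender_state;	/* step 5: mirror the reported state */
  if (peer->heartbeat_gap > 0)
    peer->heartbeat_gap--;		/* floored at 0 */
  peer->last_recv_ms = now_ms ();

  if (master_demoted)
    bump_calc_score_to_immediate ();	/* step 6: react within one interval */
}
```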
Resource side — supervising local processes
hb_Resource tracks the cub_server, copylogdb, and
applylogdb processes the local cub_master is responsible
for. Each process is one HB_PROC_ENTRY:
```c
// HB_PROC_ENTRY — src/executables/master_heartbeat.h:272 (excerpt)
struct HB_PROC_ENTRY
{
  HB_PROC_ENTRY *next;
  HB_PROC_ENTRY **prev;

  unsigned char state;		/* HB_PROC_STATE — REGISTERED_AND_ACTIVE etc. */
  unsigned char type;		/* HB_PTYPE_SERVER / COPYLOGDB / APPLYLOGDB */
  int sfd;			/* TCP socket cub_master ↔ process */
  int pid;
  char exec_path[HB_MAX_SZ_PROC_EXEC_PATH];
  char args[HB_MAX_SZ_PROC_ARGS];

  struct timeval frtime, rtime, dtime, ktime, stime;

  unsigned short changemode_rid;
  unsigned short changemode_gap;

  LOG_LSA prev_eof;		/* previous server-reported EOF LSA */
  LOG_LSA curr_eof;		/* current server-reported EOF LSA */
  bool is_curr_eof_received;

  CSS_CONN_ENTRY *conn;
  bool being_shutdown;
  bool server_hang;		/* set when prev_eof == curr_eof */
};
```

The two LOG_LSA fields plus server_hang are how
hb_thread_check_disk_failure decides the master’s local
cub_server has hung: it asks the server for its current
EOF LSA every ha_check_disk_failure_interval_in_secs
(default 15), and if two consecutive answers are equal the
server is presumed wedged. The detector then queues
HB_RJOB_DEMOTE_START_SHUTDOWN, which triggers the
MASTER → SLAVE demote chain.
Server-hang detection
```c
// hb_thread_check_disk_failure — src/executables/master_heartbeat.c:4814 (condensed)
static void *
hb_thread_check_disk_failure (void *arg)
{
  while (hb_Resource->shutdown == false)
    {
      int interval = prm_get_integer_value (PRM_ID_HA_CHECK_DISK_FAILURE_INTERVAL_IN_SECS);

      if (interval > 0 && remaining_time_msecs <= 0)
	{
	  pthread_mutex_lock (&css_Master_socket_anchor_lock);
	  pthread_mutex_lock (&hb_Cluster->lock);
	  pthread_mutex_lock (&hb_Resource->lock);

	  if (hb_Cluster->is_isolated == false && hb_Resource->state == HB_NSTATE_MASTER)
	    {
	      if (hb_resource_check_server_log_grow () == false)
		{
		  /* Two equal EOF LSAs -> server wedged -> demote. */
		  hb_Resource->state = HB_NSTATE_SLAVE;
		  // ... condensed: drop locks ...
		  hb_resource_job_queue (HB_RJOB_DEMOTE_START_SHUTDOWN, NULL,
					 HB_JOB_TIMER_IMMEDIATELY);
		  continue;
		}
	    }
	  if (hb_Resource->state == HB_NSTATE_MASTER)
	    hb_resource_send_get_eof ();	/* ask server for fresh EOF */
	  // ... condensed: unlock ...
	  remaining_time_msecs = interval * 1000;
	}
      SLEEP_MILISEC (0, HB_DISK_FAILURE_CHECK_TIMER_IN_MSECS);
      remaining_time_msecs -= HB_DISK_FAILURE_CHECK_TIMER_IN_MSECS;
    }
  return NULL;
}
```

The detector only acts when this node is master and not
isolated — the isolation guard is critical because an isolated
master cannot trust the is_state_changed signal from peers,
and so cannot distinguish a hung local server from a healthy
local server whose hb messages are not getting through.
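The predicate the detector relies on reduces to one comparison. A sketch under the described semantics (two equal EOF LSAs across consecutive probes mean the log did not grow); the `lsa` layout here is illustrative, though CUBRID's LOG_LSA is likewise a (pageid, offset) pair:

```c
#include <stdbool.h>
#include <stdint.h>

struct lsa { int64_t pageid; int16_t offset; };

struct server_probe
{
  struct lsa prev_eof;		/* EOF reported by the previous probe */
  struct lsa curr_eof;		/* EOF reported by the latest probe */
  bool is_curr_eof_received;
};

/* True if the server's log grew between the last two probes.  Two equal
 * EOF LSAs in a row mean no log record was appended in the interval —
 * the server is presumed wedged and the demote chain is queued. */
static bool
server_log_grew (const struct server_probe *s)
{
  if (!s->is_curr_eof_received)
    return true;	/* no fresh answer yet: benefit of the doubt */
  return s->curr_eof.pageid != s->prev_eof.pageid
      || s->curr_eof.offset != s->prev_eof.offset;
}
```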
Process registration and HB_PSTATE_*
A cub_server, copylogdb, or applylogdb registers itself
on startup by sending an HBP_PROC_REGISTER message to
cub_master’s control socket. The handler is
hb_register_new_process (master_heartbeat.c:4238); it
resolves the existing entry by args (so a restart of the
same configuration re-uses the slot), allocates a new entry if
none exists via hb_alloc_new_proc, and transitions the
entry’s state field through:
```c
// HB_PROC_STATE — src/executables/master_heartbeat.h:93
enum HB_PROC_STATE
{
  HB_PSTATE_UNKNOWN = 0,
  HB_PSTATE_DEAD = 1,
  HB_PSTATE_DEREGISTERED = 2,
  HB_PSTATE_STARTED = 3,
  HB_PSTATE_NOT_REGISTERED = 4,
  HB_PSTATE_REGISTERED = 5,
  HB_PSTATE_REGISTERED_AND_STANDBY = HB_PSTATE_REGISTERED,
  HB_PSTATE_REGISTERED_AND_TO_BE_STANDBY = 6,
  HB_PSTATE_REGISTERED_AND_ACTIVE = 7,
  HB_PSTATE_REGISTERED_AND_TO_BE_ACTIVE = 8,
  HB_PSTATE_MAX
};
```

The crucial relationship is between this enum and the cluster
side: when hb_Cluster->state advances to MASTER, the
HB_RJOB_CHANGE_MODE job moves any
HB_PSTATE_REGISTERED_AND_STANDBY server entry to
HB_PSTATE_REGISTERED_AND_TO_BE_ACTIVE and asks the server
itself to become HA_SERVER_STATE_ACTIVE. The server’s
acknowledgement (via hb_resource_receive_changemode,
master_heartbeat.c:4444) finalises the entry as
HB_PSTATE_REGISTERED_AND_ACTIVE.
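A sketch of that two-phase flip, reconstructed from the description (the request parks the entry in a TO_BE_* state; the server's acknowledgement finalises it) — the state names mirror a subset of HB_PSTATE_*, and the helpers are hypothetical:

```c
enum proc_state
{
  PSTATE_REGISTERED_AND_STANDBY,
  PSTATE_REGISTERED_AND_TO_BE_ACTIVE,
  PSTATE_REGISTERED_AND_ACTIVE
};

struct proc_entry
{
  enum proc_state state;
  int sfd;			/* TCP socket to the managed cub_server */
};

extern void send_changemode_active (int sfd);	/* hypothetical: ask the server to go ACTIVE */

/* Phase 1 — the cluster FSM reached MASTER: request the flip.
 * The entry waits in TO_BE_ACTIVE until the server answers. */
static void
change_mode_to_active (struct proc_entry *p)
{
  if (p->state == PSTATE_REGISTERED_AND_STANDBY)
    {
      p->state = PSTATE_REGISTERED_AND_TO_BE_ACTIVE;
      send_changemode_active (p->sfd);
    }
}

/* Phase 2 — the server acknowledged (cf. hb_resource_receive_changemode). */
static void
on_changemode_ack (struct proc_entry *p)
{
  if (p->state == PSTATE_REGISTERED_AND_TO_BE_ACTIVE)
    p->state = PSTATE_REGISTERED_AND_ACTIVE;
}
```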
Initialization flow
Activation goes through hb_master_init (master_heartbeat.c:5250):
```c
// hb_master_init — src/executables/master_heartbeat.c (sketch)
int
hb_master_init (void)
{
  hb_cluster_initialize (ha_node_list, ha_replica_list);
  hb_cluster_job_initialize ();	/* queues HB_CJOB_INIT */
  hb_resource_initialize ();
  hb_resource_job_initialize ();
  hb_thread_initialize ();	/* spawns the four threads */
  return NO_ERROR;
}
```

HB_CJOB_INIT is the bootstrap that queues the three
periodic jobs (HEARTBEAT, CHECK_VALID_PING_SERVER,
CALC_SCORE) and exits. From that point the cluster is
self-sustaining — every periodic job re-queues itself with
its configured interval.
The activation entrypoint exposed to operators is
hb_activate_heartbeat (master_heartbeat.c:6599); its
deactivation counterpart is hb_deactivate_heartbeat (6557).
Both are reachable from cub_commdb (the operator-facing
utility) via ACTIVATE_HEARTBEAT / DEACTIVATE_HEARTBEAT
control messages handled in css_process_activate_heartbeat.
Source Walkthrough
Anchor on symbol names, not line numbers. The CUBRID source moves; the position table at the end is scoped to this doc's updated date (2026-05-01).
Wire protocol and headers
- `HBP_HEADER` (heartbeat.h) — UDP wire header.
- `HBP_CLUSTER_HEARTBEAT` (heartbeat.h) — only `HBP_HEADER::type` value in current code.
- `HBP_PROC_REGISTER` (heartbeat.h) — TCP register-with-master payload.
- `HB_NODE_STATE` (heartbeat.h) — six-value cluster FSM.
- `HB_PROC_TYPE` (heartbeat.h) — server / copylogdb / applylogdb.
- `hb_set_net_header` (master_heartbeat.c) — fills the header including `group_id` and `orig_host_name`.
Cluster-side global and tables
- `hb_Cluster` (master_heartbeat.h) — the global pointer.
- `HB_CLUSTER` struct (master_heartbeat.h) — peer table, myself/master cursors, isolation flags, ping list.
- `HB_NODE_ENTRY` (master_heartbeat.h) — one entry per peer.
- `HB_PING_HOST_ENTRY` (master_heartbeat.h) — witness host.
- `HB_UI_NODE_ENTRY` (master_heartbeat.h) — unidentified inbound-source diagnostics.
Cluster jobs
- `HB_CLUSTER_JOB` enum (master_heartbeat.h) — eight values.
- `hb_cluster_jobs[]` (master_heartbeat.c) — function table.
- `hb_cluster_job_init` — queues HEARTBEAT, CHECK_VALID_PING_SERVER, CALC_SCORE.
- `hb_cluster_job_heartbeat` — broadcast and re-queue.
- `hb_cluster_job_calc_score` — score, classify, branch.
- `hb_cluster_calc_score` — the score computation itself.
- `hb_cluster_is_isolated` — every non-replica peer is UNKNOWN.
- `hb_cluster_is_received_heartbeat_from_all` — peer-poll freshness predicate used by the failover wait.
- `hb_cluster_job_check_ping` — witness consultation.
- `hb_cluster_check_valid_ping_server` — periodic witness refresh (only the result enables/disables ping use).
- `hb_cluster_job_check_valid_ping_server` — the periodic job wrapper.
- `hb_cluster_job_failover` — second score check, promote.
- `hb_cluster_job_failback` — kill cub_server, demote.
- `hb_cluster_job_demote` — wait-for-new-master loop with `hide_to_demote`.
Wire I/O
- `hb_cluster_receive_heartbeat` — inbound packet dispatcher.
- `hb_cluster_send_heartbeat_internal` — outbound packet builder.
- `hb_cluster_send_heartbeat_req` / `_resp` — direction wrappers.
- `hb_cluster_request_heartbeat_to_all` — broadcast loop; increments per-peer `heartbeat_gap`.
Resource-side global and tables
- `hb_Resource` (master_heartbeat.h) — local proc table.
- `HB_RESOURCE` struct — proc list, FSM state, shutdown flag.
- `HB_PROC_ENTRY` — one per server / copylogdb / applylogdb.
- `HB_PROC_STATE` — registered / standby / active / etc.
Resource jobs
- `HB_RESOURCE_JOB` enum (master_heartbeat.h).
- `hb_resource_jobs[]` (master_heartbeat.c) — function table.
- `hb_resource_job_proc_start` / `_confirm_start` — fork+execv a dead process and confirm liveness.
- `hb_resource_job_proc_dereg` / `_confirm_dereg` — graceful shutdown with SIGTERM, escalate to SIGKILL on timeout.
- `hb_resource_job_change_mode` — flip cub_server between STANDBY and ACTIVE.
- `hb_resource_job_demote_start_shutdown` / `_demote_confirm_shutdown` — used by the disk-fail demote path.
- `hb_resource_job_cleanup_all` / `_confirm_cleanup_all` — invoked by `cubrid hb stop`.
- `hb_resource_demote_start_shutdown_server_proc` — helper that initiates the cub_server stop for demote.
Process registration and changemode
- `hb_register_new_process` — `HBP_PROC_REGISTER` handler.
- `hb_alloc_new_proc` / `hb_remove_proc` — list ops on `hb_Resource->procs`.
- `hb_resource_receive_changemode` — server’s reply to a CHANGE_MODE request.
- `hb_resource_receive_get_eof` — server’s reply to the disk-failure detector’s EOF probe.
- `hb_resource_send_changemode` — outbound change-mode RPC.
- `hb_resource_send_get_eof` — outbound EOF probe.
- `hb_resource_check_server_log_grow` — `prev_eof == curr_eof` predicate.
Threads and lifecycle
- `hb_thread_initialize` — spawns the four worker threads.
- `hb_thread_cluster_reader` — UDP reader.
- `hb_thread_cluster_worker` — CJOB dispatcher.
- `hb_thread_resource_worker` — RJOB dispatcher.
- `hb_thread_check_disk_failure` — server-hang detector.
- `hb_master_init` — cluster init + thread spawn entry.
- `hb_activate_heartbeat` / `hb_deactivate_heartbeat` — operator-driven on/off.
- `hb_resource_shutdown_and_cleanup` / `hb_cluster_shutdown_and_cleanup` — shutdown.
Job-queue primitives
- `HB_JOB`, `HB_JOB_ENTRY`, `HB_JOB_ARG` (master_heartbeat.h).
- `hb_job_queue` / `hb_job_dequeue` / `hb_job_set_expire_and_reorder` — generic ordered queue used by both cluster and resource sides.
- `hb_cluster_job_queue` / `hb_resource_job_queue` — typed wrappers.
Process-side bridge (heartbeat.{c,h} under connection/)
- `hb_register_to_master` — server / replication process registers itself on startup.
- `hb_deregister_from_master` — symmetric.
- `hb_process_init` — connect, register, start reader.
- `hb_process_master_request` — receive loop on the process-to-master TCP socket.
- `hb_thread_master_reader` — process side of the master liveness check (kills self on disconnect).
Position hints as of 2026-05-01
| Symbol | File | Line |
|---|---|---|
| HB_NODE_STATE enum | heartbeat.h | 86 |
| HBP_HEADER struct | heartbeat.h | 114 |
| HBP_PROC_REGISTER struct | heartbeat.h | 138 |
| HBP_CLUSTER_HEARTBEAT | heartbeat.h | 75 |
| HB_CLUSTER_JOB enum | master_heartbeat.h | 62 |
| HB_RESOURCE_JOB enum | master_heartbeat.h | 76 |
| HB_PROC_STATE enum | master_heartbeat.h | 93 |
| HB_NODE_SCORE_* macros | master_heartbeat.h | 122 |
| HB_NODE_ENTRY struct | master_heartbeat.h | 200 |
| HB_CLUSTER struct | master_heartbeat.h | 242 |
| HB_PROC_ENTRY struct | master_heartbeat.h | 272 |
| HB_RESOURCE struct | master_heartbeat.h | 307 |
| hb_cluster_jobs[] | master_heartbeat.c | 259 |
| hb_resource_jobs[] | master_heartbeat.c | 272 |
| hb_cluster_job_init | master_heartbeat.c | 708 |
| hb_cluster_job_heartbeat | master_heartbeat.c | 734 |
| hb_cluster_is_isolated | master_heartbeat.c | 762 |
| hb_cluster_is_received_heartbeat_from_all | master_heartbeat.c | 785 |
| hb_cluster_job_calc_score | master_heartbeat.c | 812 |
| hb_cluster_job_check_ping | master_heartbeat.c | 992 |
| hb_cluster_job_failover | master_heartbeat.c | 1163 |
| hb_cluster_job_demote | master_heartbeat.c | 1236 |
| hb_cluster_job_failback | master_heartbeat.c | 1351 |
| hb_cluster_check_valid_ping_server | master_heartbeat.c | 1463 |
| hb_cluster_job_check_valid_ping_server | master_heartbeat.c | 1500 |
| hb_cluster_calc_score | master_heartbeat.c | 1556 |
| hb_cluster_request_heartbeat_to_all | master_heartbeat.c | 1646 |
| hb_cluster_send_heartbeat_req | master_heartbeat.c | 1677 |
| hb_cluster_send_heartbeat_resp | master_heartbeat.c | 1696 |
| hb_cluster_send_heartbeat_internal | master_heartbeat.c | 1702 |
| hb_cluster_receive_heartbeat | master_heartbeat.c | 1750 |
| hb_set_net_header | master_heartbeat.c | 1914 |
| hb_cluster_load_group_and_node_list | master_heartbeat.c | 2730 |
| hb_resource_demote_start_shutdown_server_proc | master_heartbeat.c | 3307 |
| hb_resource_job_demote_confirm_shutdown | master_heartbeat.c | 3416 |
| hb_resource_job_demote_start_shutdown | master_heartbeat.c | 3494 |
| hb_resource_job_confirm_start | master_heartbeat.c | 3552 |
| hb_resource_job_confirm_dereg | master_heartbeat.c | 3702 |
| hb_resource_job_change_mode | master_heartbeat.c | 3791 |
| hb_alloc_new_proc | master_heartbeat.c | 3925 |
| hb_register_new_process | master_heartbeat.c | 4238 |
| hb_resource_send_changemode | master_heartbeat.c | 4356 |
| hb_resource_receive_changemode | master_heartbeat.c | 4444 |
| hb_resource_check_server_log_grow | master_heartbeat.c | 4518 |
| hb_resource_send_get_eof | master_heartbeat.c | 4577 |
| hb_resource_receive_get_eof | master_heartbeat.c | 4605 |
| hb_thread_cluster_worker | master_heartbeat.c | 4659 |
| hb_thread_cluster_reader | master_heartbeat.c | 4704 |
| hb_thread_resource_worker | master_heartbeat.c | 4769 |
| hb_thread_check_disk_failure | master_heartbeat.c | 4814 |
| hb_thread_initialize | master_heartbeat.c | 5146 |
| hb_master_init | master_heartbeat.c | 5250 |
| hb_deactivate_heartbeat | master_heartbeat.c | 6557 |
| hb_activate_heartbeat | master_heartbeat.c | 6599 |
| css_send_heartbeat_request | connection/heartbeat.c | 160 |
| hb_register_to_master | connection/heartbeat.c | 298 |
| hb_process_init | connection/heartbeat.c | 691 |
Source verification (as of 2026-05-01)
Verified facts
- Each `cub_master` decides its master independently — there is no consensus protocol. Verified at `hb_cluster_calc_score` (master_heartbeat.c:1556): the function loops over `hb_Cluster->nodes`, computes a local `score` per peer, and writes `hb_Cluster->master` to the smallest-score peer. No quorum, no inter-`cub_master` vote. Convergence relies on every node seeing the same input (heartbeat-driven peer states); divergent inputs produce divergent verdicts (handled by the split-brain branch).
- The role-bit constants are intentionally negative when read as `short`. Verified at master_heartbeat.h:122-125: `HB_NODE_SCORE_MASTER 0x8000` is the smallest signed-`short` value (-32768). The `min_score` comparison in `hb_cluster_calc_score` therefore favours masters; priority acts only as a tiebreaker within the same role bit. Two masters with the same priority is impossible because priority is assigned 1, 2, 3 … by configuration order in `hb_cluster_load_group_and_node_list` (master_heartbeat.c:2730).
- `HB_NSTATE_TO_BE_SLAVE` is unreachable from local transitions. Verified by inspecting all writes to `hb_Cluster->state`: failback writes `SLAVE` directly (master_heartbeat.c:1364), demote walks `MASTER → UNKNOWN → SLAVE` (1259, 1267), failover writes `MASTER` or `SLAVE` (1180, 1192). The only way a node enters `TO_BE_SLAVE` is via `hb_cluster_receive_heartbeat` reading a peer’s reported state — i.e., it is a peer-side view, not a local-side action.
- `hide_to_demote == true` silences both outbound heartbeats and score participation. Verified at `hb_cluster_job_heartbeat` (master_heartbeat.c:740, `if (hb_Cluster->hide_to_demote == false)` gates the broadcast) and `hb_cluster_job_calc_score` (828, `goto calc_end` skips the master/split-brain branches when `hide_to_demote` is set). The flag is asserted only inside `hb_cluster_job_demote` and cleared when a new master is found (1310) or the wait expires (1274).
- The wire protocol carries one int (the sender’s state) beyond the header. Verified at `hb_cluster_send_heartbeat_internal` (master_heartbeat.c:1702): after `hb_set_net_header`, the body is exactly `or_pack_int (p, hb_Cluster->state)`, length `OR_INT_SIZE` (4 bytes). The receiver in `hb_cluster_receive_heartbeat` (1802) `or_unpack_int`s the same field. There is no other body content; `HBP_HEADER`’s `len` field always equals 4.
- The reader uses `recvfrom` over a single UDP socket; UDP loss is therefore tolerated by design. Verified at `hb_thread_cluster_reader` (master_heartbeat.c:4704): a single `recvfrom` per loop on `hb_Cluster->sfd`. The UDP-level retry is the gap counter — every send increments `node->heartbeat_gap`, every receive decrements it (floored at 0), so a single dropped packet is invisible whereas `ha_max_heartbeat_gap`-many consecutive losses demote the peer to `UNKNOWN`.
- The disk-failure detector does nothing on non-master nodes, by guard. Verified at `hb_thread_check_disk_failure` (master_heartbeat.c:4840): the `is_isolated == false && hb_Resource->state == HB_NSTATE_MASTER` guard skips both the EOF probe and the demote check on slaves. The thread still wakes every `HB_DISK_FAILURE_CHECK_TIMER_IN_MSECS` (100 ms) even on slaves — the cost is one mutex-trylock dance and back to sleep.
- The cluster job table and the cluster job enum are size-locked at compile time only by convention. Verified at master_heartbeat.c:259-269 (`hb_cluster_jobs[]`) and master_heartbeat.h:62 (`HB_CLUSTER_JOB`). There is no static assert; a contributor adding an enum value but forgetting the table entry would crash on first dispatch. The terminating `NULL` in the array is for diagnostic clarity rather than dispatch (the `hb_cluster_job_queue` boundary check on `HB_CJOB_MAX` already prevents out-of-bounds).
- `HB_MAX_NUM_NODES = 8` is hard-coded; clusters larger than eight nodes are unsupported. Verified at master_heartbeat.h:128. The figure also appears in the property documentation as the upper bound for `ha_node_list`. Not a runtime parameter — exceeding it requires source changes.
- The peer table and the local proc table use separate mutexes. Verified at master_heartbeat.h:244 (`HB_CLUSTER::lock`) and master_heartbeat.h:309 (`HB_RESOURCE::lock`). The failover and failback paths cross both: failback releases the cluster lock between announcing `SLAVE` and killing servers (1390, 1392) — i.e., the cluster lock is not held during the kill loop, which is what keeps `hb_kill_process` from blocking the cluster reader.
- The activation entry point is callable from `cub_commdb` remote operators. Verified at master_heartbeat.c:6599 (`hb_activate_heartbeat`) plus the `ACTIVATE_HEARTBEAT` / `DEACTIVATE_HEARTBEAT` command handlers in commdb.c (operator path goes `cubrid hb start` → `cub_commdb --activate-heartbeat` → `css_process_activate_heartbeat` → `hb_activate_heartbeat` → `hb_master_init`).
Open questions
- `HB_MAX_WAIT_FOR_NEW_MASTER = 60` semantics. The demote loop bounds itself at 60 retries with one-second waits (master_heartbeat.c:1270), then reasserts master (`hide_to_demote = false` at 1274, no state write back to `MASTER`). Is the intent that the original master comes back if no successor is found, or is this a leak (the node ends up `SLAVE` with `hide_to_demote == false` but its server-side `cub_server` was already killed)? Investigation path: trace the relationship between `hb_cluster_job_demote` exit and the resource side after the wait expires.
- `changemode_rid` and `changemode_gap` are commented “unused” in the deck. The deck marks `HB_PROC_ENTRY::changemode_rid` / `_gap` as unused (미사용). Verified the fields are still defined at master_heartbeat.h:292-293. Are they referenced anywhere in the current source, or have they decayed to dead state? Investigation path: `git grep changemode_rid` and `git grep changemode_gap` across the tree.
- UDP packet authentication is `group_id` only. Verified that `hb_is_heartbeat_valid` checks `group_id` and host resolution but no cryptographic signature. A spoofed UDP packet with the right `group_id` and any of the configured host names would be accepted. Is this acceptable in the HA threat model, or is there an external assumption (firewall, private network) that needs to be documented?
- Bit-field byte ordering in `HBP_HEADER`. heartbeat.h:117-122 has `#if defined(HPUX) || defined(_AIX) || defined(sparc)` reordering of the `r:1` and `reserved:7` fields. Modern builds are mostly Linux x86_64; the alternative arms are probably untested. Investigation path: build under `_AIX` / `sparc` toolchains if available; check whether the wire is interoperable across mixed-endian clusters.
- `HB_PSTATE_REGISTERED_AND_TO_BE_ACTIVE` to `_ACTIVE` completion path. The deck sketches the transition but the exact `cub_server` reply that flips it is not traced in detail. Investigation path: read `hb_resource_receive_changemode` (master_heartbeat.c:4444) and the matching server-side sender in src/connection/server_support.c.
- Failback on an isolated master without `ha_ping_hosts`. An isolated master with no ping witnesses goes through `hb_cluster_job_check_ping` to the `ping_check_cancel` branch (master_heartbeat.c:1011) — i.e., it stays master indefinitely. The deck flags this as “maintains master status → loops indefinitely until the isolation clears” (master 지위 유지 → 고립 해소 전 무한 반복). Is this the intended permanent policy, or should there be a max-isolation timeout that eventually demotes? Investigation path: open RND tickets touching `is_ping_check_enabled` defaults.
Beyond CUBRID — Comparative Designs & Research Frontiers
Pointers, not analysis.
- Raft (Ongaro & Ousterhout, USENIX ATC 2014) — quorum-based leader election with explicit term numbers and a log-comparison rule. CUBRID’s per-node `calc_score` is the opposite design point; a follow-up doc could quantify what CUBRID gives up by not running consensus (specifically: the isolated-master indefinite-master-retention behaviour surfaced in open question §6).
- ZooKeeper / ZAB (Hunt et al., USENIX ATC 2010) — atomic broadcast with a designated leader and a witness service external to the data path; systems such as Kafka (pre-KRaft), HBase, and Hadoop NameNode HA delegate election to it. CUBRID’s `ha_ping_hosts` achieves a similar witness role at much lower cost but without ZAB’s safety guarantees.
- MySQL Group Replication / Galera Cluster — peer-to-peer certification-based replication where election is a byproduct of the certification protocol. Comparable to CUBRID in the “no central coordinator” choice but very different in failure-detection style (vector clocks vs. heartbeat gap).
- Patroni (PostgreSQL HA orchestrator) — delegates leader election to an external DCS (etcd / Consul / ZooKeeper), using the database engine itself as a passive participant. CUBRID embeds the equivalent role into `cub_master`. A comparative doc could trace how the failure-injection surface differs.
- Pacemaker + Corosync — Linux-HA’s stack for active/passive clusters. STONITH (Shoot The Other Node In The Head) provides the fence guarantee that CUBRID’s `kill (proc->pid, SIGKILL)` in failback approximates — but Pacemaker fences the whole node, not just the database process.
- MongoDB replica sets — heartbeat-driven election with priority and a 10-second timeout. The state machine (PRIMARY / SECONDARY / RECOVERING / ROLLBACK / FATAL) is larger than CUBRID’s six-value FSM and includes data-side states that the heartbeat module here intentionally keeps out of cluster-side scope.
- Designing Data-Intensive Applications (Kleppmann), Ch. 5 “Replication” + Ch. 9 “Consistency and Consensus” — the textbook framing for the choices CUBRID’s heartbeat makes. Especially the “split-brain” treatment in Ch. 9 motivates open question §6 above.
Sources
Raw analyses (raw/code-analysis/cubrid/distributed/heartbeat/)
- `heartbeat 코드 분석.pdf` (“heartbeat code analysis”)
- `heartbeat 코드 분석.pptx`
- `_converted/heartbeat-code-analysis.pdf.txt` — pdftotext extract of the PDF.
- `_converted/heartbeat-code-analysis.pptx.md` — markitdown extract of the PPTX.
Sibling docs
- `knowledge/code-analysis/cubrid/cubrid-recovery-manager.md` — heartbeat triggers in-doubt recovery on the new master at failover.
- `knowledge/code-analysis/cubrid/cubrid-2pc.md` — cross-node 2PC interacts with the cluster FSM through the same control surface (XA from `cub_commdb`).
Textbook chapters / papers
- Designing Data-Intensive Applications (Kleppmann), Ch. 5 “Replication” — primary/standby framing, sync vs. async.
- Designing Data-Intensive Applications (Kleppmann), Ch. 9 “Consistency and Consensus” — split-brain, fencing.
- Chandra & Toueg, Unreliable Failure Detectors for Reliable Distributed Systems, JACM 1996 — formal treatment of why asynchronous detectors are necessarily unsafe or untimely.
- Ongaro & Ousterhout, In Search of an Understandable Consensus Algorithm (Raft), USENIX ATC 2014 — counterpoint to CUBRID’s local-decision design.
CUBRID source (/data/hgryoo/references/cubrid/)
- `src/executables/master_heartbeat.{c,h}` — the cub_master side; the bulk of the module.
- `src/connection/heartbeat.{c,h}` — the process side; how cub_server / copylogdb / applylogdb register and run their master-reader.
- `src/executables/util_service.c` — `cubrid hb start|stop|...` utility entry points.
- `src/executables/commdb.c` — `cub_commdb` operator-side utility that brokers commands to `cub_master`.
- `src/connection/server_support.c` — server-side process registration (`hb_register_to_master` is called from `net_server_start` → `css_init`).