CUBRID Heartbeat — Cluster Liveness, Failover and Failback

The heartbeat machinery is the contract holder of cluster liveness. Two textbook problems live underneath it: failure detection (deciding a peer is gone) and leader election (deciding who replaces the gone peer). Designing Data-Intensive Applications (Kleppmann), Ch. 5 “Replication” and Ch. 9 “Consistency and Consensus” give the modern framing; Chandra & Toueg’s Unreliable Failure Detectors for Reliable Distributed Systems (PODC 1996) is the formal reference for why an asynchronous network forces every detector to be either unsafe (false positives) or untimely (slow).

In a primary/standby database cluster the same machinery covers three operational events:

  • Failover — a slave promotes itself to master because the former master is no longer reachable.
  • Failback — a master demotes itself to slave because another node is the rightful master (the split-brain loser's path).
  • Demote — a master temporarily steps aside while a new master is elected (resource-side disk failure on the master is the canonical trigger).

Three implementation choices the model leaves open shape every real engine and frame the rest of this document:

  1. Consensus or local decision? Classical Raft (Ongaro & Ousterhout, USENIX ATC 2014) and ZooKeeper-style ZAB elect a leader by quorum: a node only believes it is the leader once a majority has confirmed. CUBRID makes the opposite choice — each cub_master reaches an independent verdict from its own peer table. This is cheaper but inherits the asymmetric partition and split-brain problems consensus-based systems sidestep.
  2. Push or pull liveness? Some systems push periodic “I am alive” packets; others pull (the watcher polls the watched). CUBRID pushes — cub_master broadcasts a UDP heartbeat to every other cub_master it knows about, and updates its local view of each peer’s state from the received packets.
  3. What stops a network-partitioned slave from promoting? An isolated slave that cannot reach the master would, with timeout-only logic, unilaterally promote and create split-brain. CUBRID has two guards: ha_ping_hosts (a list of external addresses that must be reachable for the local node to trust its own promotion decision) and the is_isolated predicate (the local node is isolated if every non-replica peer is in HB_NSTATE_UNKNOWN).

After these are named, every CUBRID-specific structure in this document either implements one of them or makes the resulting state machine durable.

Every primary/standby cluster — MySQL/Galera, PostgreSQL with Patroni or repmgr, Oracle Data Guard, MongoDB replica sets, CUBRID — adopts the same set of patterns on top of the textbook failure detector. They are not in the original Chandra-Toueg paper; they are the engineering vocabulary that lives between the theory and the source.

The detector cannot just emit a boolean per peer; it has to order peers so that promotion is deterministic across nodes. The standard pattern packs (state, priority) into one comparable scalar: high bits encode role (master, to-be-master, slave, replica, unknown), low bits encode priority within that role. PostgreSQL’s synchronous_standby_names priority list, MongoDB’s priority member field, and CUBRID’s node->score are the same idea.
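
To make the packing concrete: a minimal sketch of the idea, assuming the role-bit constants quoted later from master_heartbeat.h (the helper pack_score and the driver are illustrative, not CUBRID code).

#include <stdio.h>

#define SCORE_MASTER        0x8000      /* -32768 as a signed short */
#define SCORE_TO_BE_MASTER  0xF000      /*  -4096 */
#define SCORE_SLAVE         0x0000
#define SCORE_UNKNOWN       0x7FFF      /*  32767 */

/* High bits carry the role, low bits the configured priority; the result
 * compares as one signed short, so role dominates and priority tiebreaks. */
static short
pack_score (unsigned short role_bits, unsigned short priority)
{
  return (short) (role_bits | priority);
}

int
main (void)
{
  /* A master with priority 2 still sorts below a slave with priority 1:
   * -32766 < 1. */
  printf ("master(p=2)=%d  slave(p=1)=%d\n",
          pack_score (SCORE_MASTER, 2), pack_score (SCORE_SLAVE, 1));
  return 0;
}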

Two independent staleness signals coexist because each catches a different failure mode. The gap counter (incremented on every send, decremented on every receive) catches symmetric loss — the network drops both directions equally. The last-heard timestamp catches asymmetric loss — receives work but our packets vanish on the way out. CUBRID keeps both in HB_NODE_ENTRY (heartbeat_gap and last_recv_hbtime); either threshold breach demotes the peer to HB_NSTATE_UNKNOWN.
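
A minimal sketch of the dual test, assuming a simplified peer struct (the real fields live on HB_NODE_ENTRY; the names here are illustrative stand-ins).

#include <stdbool.h>
#include <sys/time.h>

struct peer
{
  int heartbeat_gap;            /* ++ on every send, -- on every receive (floor 0) */
  struct timeval last_recv;     /* stamped on every receive */
};

static long
elapsed_msec (struct timeval now, struct timeval then)
{
  return (now.tv_sec - then.tv_sec) * 1000
    + (now.tv_usec - then.tv_usec) / 1000;
}

/* Either breach marks the peer UNKNOWN: the gap catches symmetric loss,
 * the timestamp catches asymmetric loss. */
static bool
peer_is_stale (const struct peer *p, struct timeval now,
               int max_gap, long max_silence_msec)
{
  return p->heartbeat_gap > max_gap
    || elapsed_msec (now, p->last_recv) > max_silence_msec;
}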

A separate ping channel for split-brain prevention

Pure peer-to-peer liveness cannot tell “the master is gone” apart from “I am partitioned away from the master”. A node that promotes on the wrong answer creates split-brain. The canonical fix is a third reference point: a static list of external hosts (gateway, DNS server, witness node) the local node pings before accepting a promotion. Pacemaker calls this fencing; ZooKeeper calls it the witness; CUBRID calls it ha_ping_hosts. The semantics are identical — an isolated node that also cannot ping the witnesses is genuinely cut off and must not promote.
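
A sketch of how the isolation predicate and the witness gate compose, with illustrative types standing in for HB_NODE_ENTRY and the ping list (the real code routes this through hb_cluster_is_isolated and the CHECK_PING job, with more branches than shown here).

#include <stdbool.h>

enum nstate { NSTATE_UNKNOWN, NSTATE_SLAVE, NSTATE_MASTER, NSTATE_REPLICA };

struct peer
{
  enum nstate state;
  bool is_self;
  struct peer *next;
};

/* Isolated: every non-replica peer other than myself is UNKNOWN. */
static bool
is_isolated (const struct peer *peers)
{
  for (const struct peer *p = peers; p != NULL; p = p->next)
    {
      if (!p->is_self && p->state != NSTATE_REPLICA
          && p->state != NSTATE_UNKNOWN)
        {
          return false;
        }
    }
  return true;
}

/* May the local node trust its own promotion verdict? If it is isolated,
 * only when at least one witness still answers; silence then means the
 * peers are really down, not that this node is cut off. */
static bool
promotion_trustworthy (const struct peer *peers, bool ping_enabled,
                       int witnesses_reachable)
{
  if (!is_isolated (peers))
    return true;                /* some peer still answers: let calc_score decide */
  return ping_enabled && witnesses_reachable > 0;
}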

The detector cannot run on the I/O path: the thread that reads the wire must never block behind slow FSM work such as fork()/execv() of a restarted server. The standard split is two threads minimum: one reads the wire, one runs the FSM transitions as deferred jobs. Galera’s gcs thread plus applier, etcd’s raft.Node.Tick plus applier, and CUBRID’s cluster_reader_th plus cluster_worker_th follow the same shape.

Resource side — keep cluster decisions and process management apart

A heartbeat module also has to start, stop, and re-mode local processes (the database server, replication readers). Mixing this with cluster gossip leads to deadlocks (sending a heartbeat blocks because we are holding a process-table lock). The standard separation is two protected blocks with two locks: cluster state (peer table) and resource state (local process table). CUBRID materialises this as hb_Cluster / hb_Resource, each with its own pthread_mutex_t lock.
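
A minimal sketch of the split, assuming illustrative structs in place of hb_Cluster and hb_Resource: each table sits behind its own mutex, and a transition takes each lock briefly rather than nesting wire I/O under a process-table lock.

#include <pthread.h>

struct cluster_state
{
  pthread_mutex_t lock;         /* guards the peer table */
  /* ... peer table ... */
};

struct resource_state
{
  pthread_mutex_t lock;         /* guards the local process table */
  /* ... process table ... */
};

static struct cluster_state cluster = { PTHREAD_MUTEX_INITIALIZER };
static struct resource_state resource = { PTHREAD_MUTEX_INITIALIZER };

static void
on_promotion (void)
{
  /* Cluster verdict under the cluster lock only. */
  pthread_mutex_lock (&cluster.lock);
  /* ... recompute score, decide to promote ... */
  pthread_mutex_unlock (&cluster.lock);

  /* Process re-moding under the resource lock only; a slow fork()/execv()
   * here can never block heartbeat traffic. */
  pthread_mutex_lock (&resource.lock);
  /* ... queue CHANGE_MODE work for the local cub_server ... */
  pthread_mutex_unlock (&resource.lock);
}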

| Theoretical concept | CUBRID name |
| --- | --- |
| Failure detector + leader election | hb_Cluster global with peer table (HB_NODE_ENTRY linked list) |
| Local node FSM state | HB_NODE_STATE { UNKNOWN, SLAVE, TO_BE_MASTER, TO_BE_SLAVE, MASTER, REPLICA } (heartbeat.h:86) |
| Peer-state-induced score | node->score = node->priority \| HB_NODE_SCORE_<state> in hb_cluster_calc_score |
| Score role bit-mask | HB_NODE_SCORE_MASTER 0x8000, HB_NODE_SCORE_TO_BE_MASTER 0xF000, HB_NODE_SCORE_SLAVE 0x0000, HB_NODE_SCORE_UNKNOWN 0x7FFF (master_heartbeat.h:122-125) |
| Symmetric-loss staleness signal | HB_NODE_ENTRY::heartbeat_gap + ha_max_heartbeat_gap (default 5) |
| Asymmetric-loss staleness signal | HB_NODE_ENTRY::last_recv_hbtime + ha_calc_score_interval_in_msecs (default 3000) |
| Witness-host channel | HB_PING_HOST_ENTRY list under hb_Cluster->ping_hosts; gate flag is_ping_check_enabled |
| Isolation predicate | hb_cluster_is_isolated (master_heartbeat.c:762) |
| Split-brain “two masters” detection | num_master > 1 branch in hb_cluster_job_calc_score (master_heartbeat.c:867) |
| Cluster job FSM enum | HB_CLUSTER_JOB { INIT, HEARTBEAT, CALC_SCORE, CHECK_PING, FAILOVER, FAILBACK, CHECK_VALID_PING_SERVER, DEMOTE } (master_heartbeat.h:62) |
| Resource job FSM enum | HB_RESOURCE_JOB { PROC_START, PROC_DEREG, CONFIRM_START, CONFIRM_DEREG, CHANGE_MODE, DEMOTE_START_SHUTDOWN, DEMOTE_CONFIRM_SHUTDOWN, CLEANUP_ALL, CONFIRM_CLEANUP_ALL } (master_heartbeat.h:76) |
| Process state on resource side | HB_PROC_STATE { DEAD, DEREGISTERED, STARTED, REGISTERED_AND_STANDBY, REGISTERED_AND_TO_BE_STANDBY, REGISTERED_AND_ACTIVE, REGISTERED_AND_TO_BE_ACTIVE } (master_heartbeat.h:93) |
| Wire header | HBP_HEADER { type, r:1, len, seq, group_id, orig_host_name, dest_host_name } (heartbeat.h:114) |
| Wire body | one HB_NODE_STATE_TYPE packed via or_pack_int (master_heartbeat.c:1719) |
| Reader thread | hb_thread_cluster_reader (master_heartbeat.c:4704) |
| Cluster worker thread | hb_thread_cluster_worker (master_heartbeat.c:4659) |
| Resource worker thread | hb_thread_resource_worker (master_heartbeat.c:4769) |
| Server-hang detector thread | hb_thread_check_disk_failure (master_heartbeat.c:4814) |

The heartbeat module has four moving parts: the cluster-side FSM that gossips peer state and elects a master, the resource-side FSM that registers and supervises local processes, the job queue + worker pair that drives both FSMs, and the wire protocol they exchange. We walk them in that order.

flowchart LR
  subgraph WIRE["UDP cluster gossip"]
    PEER1["cub_master @ Node1"]
    PEER2["cub_master @ Node2"]
    PEER3["cub_master @ Node3"]
  end
  subgraph LOCAL["Local cub_master process"]
    R["cluster_reader_th\nhb_thread_cluster_reader"]
    CW["cluster_worker_th\nhb_thread_cluster_worker"]
    RW["resource_worker_th\nhb_thread_resource_worker"]
    DK["check_disk_failure_th\nhb_thread_check_disk_failure"]
    CJQ["cluster_Jobs\n(CJOB queue)"]
    RJQ["resource_Jobs\n(RJOB queue)"]
    HC["hb_Cluster\n(peer table)"]
    HR["hb_Resource\n(local proc table)"]
  end
  subgraph PROCS["Local HA processes"]
    SVR["cub_server"]
    CL["copylogdb"]
    AL["applylogdb"]
  end
  PEER2 -- HBP_CLUSTER_HEARTBEAT --> R
  PEER3 -- HBP_CLUSTER_HEARTBEAT --> R
  R --> HC
  CW --> CJQ
  CJQ --> CW
  CW --> HC
  CW --> RJQ
  RW --> RJQ
  RJQ --> RW
  RW --> HR
  RW --> SVR
  RW --> CL
  RW --> AL
  DK --> HR
  DK --> SVR
  PEER1 <-- HBP_CLUSTER_HEARTBEAT --> R
  CW -. broadcast .-> PEER1
  CW -. broadcast .-> PEER2
  CW -. broadcast .-> PEER3

The figure encodes three boundaries. Reader/worker: the wire is read by one thread (cluster_reader_th) and the FSM runs in another (cluster_worker_th); the queue between them (cluster_Jobs) is the only synchronisation. Cluster/resource: peer-table mutations and process-table mutations are protected by separate locks (hb_Cluster->lock, hb_Resource->lock); the worker threads cross between the two on transitions like failover, but each lock is held only as briefly as the transition demands. cub_master/managed processes: the local cub_master owns no database state of its own; it supervises the processes that do.

Cluster-side FSM — node state transitions

The peer state space is six values:

// HB_NODE_STATE — src/connection/heartbeat.h:86
enum HB_NODE_STATE
{
  HB_NSTATE_UNKNOWN = 0,
  HB_NSTATE_SLAVE = 1,
  HB_NSTATE_TO_BE_MASTER = 2,
  HB_NSTATE_TO_BE_SLAVE = 3,
  HB_NSTATE_MASTER = 4,
  HB_NSTATE_REPLICA = 5,
  HB_NSTATE_MAX
};

UNKNOWN is the absence-of-information state — every node starts as SLAVE (or REPLICA when configured as a replica-only host), and a peer is marked UNKNOWN in the local table only when its gap or last-heard time crosses a threshold. TO_BE_MASTER and TO_BE_SLAVE are the in-flight transitions; the source notes that TO_BE_SLAVE is reachable only by remote update (a peer telling us it is going slave) — local MASTER → SLAVE transitions skip it and go direct.

stateDiagram-v2
  [*] --> SLAVE : start
  [*] --> REPLICA : ha_replica_list
  SLAVE --> TO_BE_MASTER : calc_score elects me
  TO_BE_MASTER --> MASTER : failover confirms
  TO_BE_MASTER --> SLAVE : failover cancelled
  MASTER --> SLAVE : failback (split-brain loser)
  MASTER --> SLAVE : demote (resource fail)
  MASTER --> UNKNOWN : demote (transient, before SLAVE)
  UNKNOWN --> SLAVE : demote step 2
  SLAVE --> UNKNOWN : peer view only
  MASTER --> UNKNOWN : peer view only
  REPLICA --> REPLICA : never elected

The most important property is what is not in the diagram: there is no transition from SLAVE to MASTER that bypasses TO_BE_MASTER. Every promotion goes through the intermediate state because the intermediate state is the window in which a re-scored cluster can still cancel the promotion (the failover branch of hb_cluster_job_calc_score writes TO_BE_MASTER, then the subsequent FAILOVER job re-runs hb_cluster_calc_score and can revert to SLAVE if the result has changed).

Score computation — local leader election

Each cub_master independently runs hb_cluster_calc_score on a timer (ha_calc_score_interval_in_msecs, default 3000 ms). The function maps every known peer to a signed-short score; the smallest score wins.

// hb_cluster_calc_score — src/executables/master_heartbeat.c:1556
static int
hb_cluster_calc_score (void)
{
  int num_master = 0;
  short min_score = HB_NODE_SCORE_UNKNOWN;
  HB_NODE_ENTRY *node;
  struct timeval now;

  hb_Cluster->myself->state = hb_Cluster->state;
  gettimeofday (&now, NULL);

  for (node = hb_Cluster->nodes; node; node = node->next)
    {
      /* Demote stale peers to UNKNOWN — symmetric or asymmetric loss. */
      if (node->heartbeat_gap > prm_get_integer_value (PRM_ID_HA_MAX_HEARTBEAT_GAP)
          || (!HB_IS_INITIALIZED_TIME (node->last_recv_hbtime)
              && HB_GET_ELAPSED_TIME (now, node->last_recv_hbtime)
              > prm_get_integer_value (PRM_ID_HA_CALC_SCORE_INTERVAL_IN_MSECS)))
        {
          // ... condensed: save peer name if it was master, then ...
          node->state = HB_NSTATE_UNKNOWN;
        }

      switch (node->state)
        {
        case HB_NSTATE_MASTER:
        case HB_NSTATE_TO_BE_SLAVE:
          node->score = node->priority | HB_NODE_SCORE_MASTER;          /* 0x8000 */
          break;
        case HB_NSTATE_TO_BE_MASTER:
          node->score = node->priority | HB_NODE_SCORE_TO_BE_MASTER;    /* 0xF000 */
          break;
        case HB_NSTATE_SLAVE:
          node->score = node->priority | HB_NODE_SCORE_SLAVE;           /* 0x0000 */
          break;
        case HB_NSTATE_REPLICA:
        case HB_NSTATE_UNKNOWN:
        default:
          node->score = node->priority | HB_NODE_SCORE_UNKNOWN;         /* 0x7FFF */
          break;
        }

      if (node->score < min_score)
        {
          hb_Cluster->master = node;
          min_score = node->score;
        }
      if (node->score < (short) HB_NODE_SCORE_TO_BE_MASTER)
        {
          num_master++;
        }
    }
  return num_master;
}

Two non-obvious facts. First, the role-bit assignment is ordered so that MASTER (0x8000) is smallest as a short (0x8000 reads as -32768) — the min_score comparison naturally favours the existing master, with priority breaking ties between two masters. Second, TO_BE_SLAVE shares the master role bit; this is intentional because a peer mid-demote is still authoritative until its successor is confirmed.
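
The signedness claim is easy to check in isolation, using the constants quoted from master_heartbeat.h:122-125 (standalone demo, not CUBRID code).

#include <stdio.h>

int
main (void)
{
  short master = (short) 0x8000;        /* -32768 */
  short to_be_master = (short) 0xF000;  /*  -4096 */
  short slave = (short) 0x0000;         /*      0 */
  short unknown = (short) 0x7FFF;       /*  32767 */

  /* Prints: -32768 < -4096 < 0 < 32767 */
  printf ("%d < %d < %d < %d\n", master, to_be_master, slave, unknown);
  return 0;
}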

The num_master counter (every node whose score is below HB_NODE_SCORE_TO_BE_MASTER) is the split-brain detector — if more than one node thinks it is master simultaneously, the caller in hb_cluster_job_calc_score queues a FAILBACK for the loser.

Cluster job FSM — what the worker dispatches

The cluster worker dequeues from cluster_Jobs and dispatches through the cluster job table:

// hb_cluster_jobs — src/executables/master_heartbeat.c:259
static HB_JOB_FUNC hb_cluster_jobs[] = {
  hb_cluster_job_init,
  hb_cluster_job_heartbeat,
  hb_cluster_job_calc_score,
  hb_cluster_job_check_ping,
  hb_cluster_job_failover,
  hb_cluster_job_failback,
  hb_cluster_job_check_valid_ping_server,
  hb_cluster_job_demote,
  NULL
};

The index into the array is the HB_CLUSTER_JOB enum value; the table must stay in lockstep with the enum. A missing entry is not caught at compile time — the trailing NULL is only a readability sentinel, and out-of-range dispatch is prevented by the bounds check against HB_CJOB_MAX in hb_cluster_job_queue.
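
A sketch of the dispatch shape, with an abbreviated enum and illustrative job bodies (not CUBRID code):

typedef void (*HB_JOB_FUNC) (void *arg);

enum cluster_job
{ CJOB_INIT = 0, CJOB_HEARTBEAT, CJOB_MAX };

static void job_init (void *arg) { (void) arg; /* bootstrap periodic jobs */ }
static void job_heartbeat (void *arg) { (void) arg; /* broadcast + re-queue */ }

/* Enum order and array order stay in lockstep by convention only. */
static HB_JOB_FUNC cluster_jobs[] = { job_init, job_heartbeat, NULL };

static void
dispatch (enum cluster_job type, void *arg)
{
  /* The bounds check, not the trailing NULL, prevents out-of-range
   * dispatch; the sentinel is for human readers. */
  if ((int) type >= 0 && type < CJOB_MAX && cluster_jobs[type] != NULL)
    cluster_jobs[type] (arg);
}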

The transitions between cluster jobs are not represented as a formal state diagram — they emerge from each job re-queueing its successor. The spine of normal operation is:

flowchart LR
  INIT["INIT"]
  HB["HEARTBEAT (every 0.5s)"]
  CV["CHECK_VALID_PING_SERVER"]
  CS["CALC_SCORE (every 3s)"]
  CP["CHECK_PING"]
  FO["FAILOVER"]
  FB["FAILBACK"]
  DM["DEMOTE"]
  INIT --> HB
  INIT --> CV
  INIT --> CS
  HB --> HB
  CV --> CV
  CS --> CS
  CS --> CP
  CP --> FO
  CP --> CS
  CP --> FB
  FO --> CS
  FB --> CS
  DM --> DM

Each arrow is the job calling hb_cluster_job_queue on its own successor. The most consequential branch is in hb_cluster_job_calc_score: depending on the local node’s state, the isolation predicate, and the split-brain count, the job either continues with another CALC_SCORE, queues a CHECK_PING to validate isolation, or queues FAILBACK directly.

sequenceDiagram
  participant M as Old master (Node1)
  participant N as Slave (Node2, will promote)
  participant P as Slave (Node3)
  participant W as ha_ping_hosts witnesses
  Note over M: cub_master crash or partition
  loop every 0.5s
    N->>M: HBP_CLUSTER_HEARTBEAT (req)
    N->>P: HBP_CLUSTER_HEARTBEAT (req)
    P-->>N: HBP_CLUSTER_HEARTBEAT (resp, state=SLAVE)
    Note over N: node[Node1].heartbeat_gap++
  end
  Note over N: heartbeat_gap > ha_max_heartbeat_gap
  N->>N: hb_cluster_calc_score:<br/>node[Node1].state = UNKNOWN
  N->>N: min score is myself (priority|SLAVE)
  N->>N: state = TO_BE_MASTER, broadcast hb
  N->>W: ping ha_ping_hosts
  W-->>N: at least one replies
  N->>N: hb_cluster_job_failover:<br/>recompute score → still me
  N->>N: state = MASTER<br/>hb_Resource->state = MASTER
  N->>N: queue HB_RJOB_CHANGE_MODE
  N->>P: HBP_CLUSTER_HEARTBEAT (state=MASTER)
  P->>P: peer table updated, master=Node2

Two timing properties matter. (a) TO_BE_MASTER is held for at least ha_failover_wait_time_in_msecs (default 3 seconds) when no ping witnesses exist or when not all peers replied — this is the window in which a still-alive but slow master can reassert. (b) The transition TO_BE_MASTER → MASTER is gated on a second score recomputation inside hb_cluster_job_failover, not just the first one in hb_cluster_job_calc_score; if the peer table changed during the wait, failover cancels.

// hb_cluster_job_failover — src/executables/master_heartbeat.c:1163
static void
hb_cluster_job_failover (HB_JOB_ARG * arg)
{
  int num_master;

  pthread_mutex_lock (&hb_Cluster->lock);
  num_master = hb_cluster_calc_score ();

  if (hb_Cluster->master && hb_Cluster->myself
      && hb_Cluster->master->priority == hb_Cluster->myself->priority)
    {
      /* I am still the highest-priority master after the wait. */
      hb_Cluster->state = HB_NSTATE_MASTER;
      hb_Resource->state = HB_NSTATE_MASTER;
      hb_resource_job_set_expire_and_reorder (HB_RJOB_CHANGE_MODE,
                                              HB_JOB_TIMER_IMMEDIATELY);
    }
  else
    {
      /* A new master appeared during the wait — abort. */
      hb_Cluster->state = HB_NSTATE_SLAVE;
    }

  hb_cluster_request_heartbeat_to_all ();
  pthread_mutex_unlock (&hb_Cluster->lock);
  // ... condensed: re-queue CALC_SCORE ...
}

The promotion is two-sided: hb_Cluster->state advances the cluster-side FSM, and hb_Resource->state advances the resource-side FSM. The resource side is what HB_RJOB_CHANGE_MODE reads to flip the local cub_server from HA_SERVER_STATE_STANDBY to HA_SERVER_STATE_ACTIVE.

Failback is unconditional: by the time the job runs, the caller has already decided this node is no longer master. The job’s responsibility is to clean up.

// hb_cluster_job_failback — src/executables/master_heartbeat.c:1351 (condensed)
static void
hb_cluster_job_failback (HB_JOB_ARG * arg)
{
  HB_PROC_ENTRY *proc;
  pid_t *pids = NULL;
  int count = 0;

  pthread_mutex_lock (&hb_Cluster->lock);
  hb_Cluster->state = HB_NSTATE_SLAVE;
  hb_Cluster->myself->state = hb_Cluster->state;
  hb_cluster_request_heartbeat_to_all ();       /* announce SLAVE */
  pthread_mutex_unlock (&hb_Cluster->lock);

  pthread_mutex_lock (&hb_Resource->lock);
  hb_Resource->state = HB_NSTATE_SLAVE;
  for (proc = hb_Resource->procs; proc; proc = proc->next)
    {
      if (proc->type == HB_PTYPE_SERVER)
        {
          pids = (pid_t *) realloc (pids, sizeof (pid_t) * (count + 1));
          pids[count++] = proc->pid;
        }
    }
  pthread_mutex_unlock (&hb_Resource->lock);

  hb_kill_process (pids, count);        /* SIGTERM, then SIGKILL on timeout */
  free (pids);
  // ... condensed: re-queue CALC_SCORE ...
}

Note the absence of an HB_NSTATE_TO_BE_SLAVE write — failback is a one-step transition from MASTER to SLAVE. The TO_BE_SLAVE value exists in the enum but is reachable only via peer-side state mirrored in from a remote HBP_CLUSTER_HEARTBEAT.

The cub_server process is killed rather than sent a mode-change RPC because the slave-side cub_server has a different in-process configuration (recovery semantics, log applier) than the master-side one. Restart is the simplest way to ensure the process comes back in the right shape.

Demote is the master-initiated step-aside path, used when the local resource has failed (typically detected by hb_thread_check_disk_failure). Unlike failback, demote waits for a new master to appear — up to HB_MAX_WAIT_FOR_NEW_MASTER (60) iterations of the job, one per second, before giving up and reasserting master.

The state sequence is MASTER → UNKNOWN → SLAVE with the hide_to_demote flag asserted throughout. While hide_to_demote is true the local node neither broadcasts heartbeats nor participates in calc_score, so peers see this node as UNKNOWN (no heartbeat in the gap window) and are free to elect a new master without contention.
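
Linearised as a loop for readability (the real implementation is a self-re-queueing job), the demote path looks roughly like this; everything except HB_MAX_WAIT_FOR_NEW_MASTER and hide_to_demote is an illustrative stand-in.

#include <stdbool.h>
#include <unistd.h>

#define HB_MAX_WAIT_FOR_NEW_MASTER 60

enum nstate { NSTATE_UNKNOWN, NSTATE_SLAVE, NSTATE_MASTER };

static bool hide_to_demote = false;
static enum nstate my_state = NSTATE_MASTER;

static bool
new_master_appeared (void)
{
  /* Illustrative: would scan the peer table for a MASTER entry. */
  return false;
}

static void
demote_sketch (void)
{
  hide_to_demote = true;        /* stop broadcasting, skip calc_score */
  my_state = NSTATE_UNKNOWN;    /* step 1: MASTER -> UNKNOWN */

  for (int i = 0; i < HB_MAX_WAIT_FOR_NEW_MASTER; i++)
    {
      sleep (1);                /* the real job re-queues itself instead */
      if (new_master_appeared ())
        {
          my_state = NSTATE_SLAVE;      /* step 2: UNKNOWN -> SLAVE */
          hide_to_demote = false;
          return;
        }
    }
  hide_to_demote = false;       /* wait expired: stop hiding, reassert master */
}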

Wire protocol — UDP HBP_HEADER + state byte

A heartbeat packet is one HBP_HEADER followed by an or-packed int carrying the sender’s HB_NODE_STATE_TYPE:

// HBP_HEADER — src/connection/heartbeat.h:114
struct hbp_header
{
  unsigned char type;           /* HBP_CLUSTER_HEARTBEAT */
  /* (bit-field portability — big-endian/little-endian variants) */
  char reserved:7;
  char r:1;                     /* 1 = request, 0 = response */
  unsigned short len;           /* body length */
  unsigned int seq;
  char group_id[HB_MAX_GROUP_ID_LEN];   /* HA group filter */
  char orig_host_name[CUB_MAXHOSTNAMELEN];
  char dest_host_name[CUB_MAXHOSTNAMELEN];
};

The header carries a group_id because multiple HA clusters can run on the same hosts; a packet whose group_id does not match hb_Cluster->group_id is dropped silently. The r bit distinguishes request from response — a request expects a response. Note the asymmetry in the counters: the gap is incremented on every send and decremented only on receive; answering a peer’s probes does not touch our own counters, so an isolated node’s per-peer gap grows monotonically even while it keeps responding to whatever probes still arrive.

// hb_cluster_send_heartbeat_internal — src/executables/master_heartbeat.c:1702 (condensed)
static int
hb_cluster_send_heartbeat_internal (struct sockaddr_in *saddr, socklen_t saddr_len,
                                    char *dest_host_name, bool is_req)
{
  HBP_HEADER *hbp_header;
  char buffer[HB_BUFFER_SZ], *p;

  hbp_header = (HBP_HEADER *) (&buffer[0]);
  hb_set_net_header (hbp_header, HBP_CLUSTER_HEARTBEAT, is_req,
                     OR_INT_SIZE, 0, dest_host_name);
  p = (char *) (hbp_header + 1);
  p = or_pack_int (p, hb_Cluster->state);

  return sendto (hb_Cluster->sfd, buffer, sizeof (HBP_HEADER) + OR_INT_SIZE, 0,
                 (struct sockaddr *) saddr, saddr_len) > 0
    ? NO_ERROR : ER_FAILED;
}
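
The body is thus a single 4-byte integer. Assuming or_pack_int follows CUBRID’s network-byte-order packing convention, a portable equivalent of the pack/unpack pair would be:

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Illustrative equivalents of or_pack_int / or_unpack_int for a 4-byte
 * big-endian int body. */
static char *
pack_int (char *p, int v)
{
  uint32_t be = htonl ((uint32_t) v);
  memcpy (p, &be, sizeof be);
  return p + sizeof be;
}

static char *
unpack_int (char *p, int *v)
{
  uint32_t be;
  memcpy (&be, p, sizeof be);
  *v = (int) ntohl (be);
  return p + sizeof be;
}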

Reading and dispatching incoming heartbeats

hb_thread_cluster_reader is the only thread that calls recvfrom on hb_Cluster->sfd. Its body is intentionally short — it pushes work into hb_cluster_receive_heartbeat and loops:

// hb_thread_cluster_reader — src/executables/master_heartbeat.c:4704 (condensed)
static void *
hb_thread_cluster_reader (void *arg)
{
  SOCKET sfd = hb_Cluster->sfd;
  char buffer[HB_BUFFER_SZ + MAX_ALIGNMENT];
  char *aligned_buffer = PTR_ALIGN (buffer, MAX_ALIGNMENT);
  struct pollfd po[1] = { {0, 0, 0} };

  while (hb_Cluster->shutdown == false)
    {
      po[0].fd = sfd;
      po[0].events = POLLIN;
      if (poll (po, 1, 1) <= 0)
        {
          continue;
        }
      if ((po[0].revents & POLLIN) && sfd == hb_Cluster->sfd)
        {
          struct sockaddr_in from;
          socklen_t from_len = sizeof (from);
          int len = recvfrom (sfd, aligned_buffer, HB_BUFFER_SZ, 0,
                              (struct sockaddr *) &from, &from_len);
          if (len > 0)
            {
              hb_cluster_receive_heartbeat (aligned_buffer, len, &from, from_len);
            }
        }
    }
  return NULL;
}

hb_cluster_receive_heartbeat is where the inbound packet mutates hb_Cluster:

  1. Validate dest_host_name against hb_Cluster->host_name (drop wrong-host).
  2. Validate body length against hbp_header->len.
  3. If hb_is_heartbeat_valid rejects the packet (unknown host, group mismatch, IP mismatch), record the sender as an HB_UI_NODE_ENTRY for diagnostics and drop.
  4. If r == 1 and hide_to_demote == false, send a response via hb_cluster_send_heartbeat_resp.
  5. Look up the sender in the peer table; update node->state, decrement heartbeat_gap (floored at 0), and stamp last_recv_hbtime.
  6. If the previously-known master demoted itself in this message (old.state == MASTER && new.state != MASTER), set is_state_changed = true and after releasing the lock call hb_cluster_job_set_expire_and_reorder to bump CALC_SCORE to immediate.

The “bump score immediate” in step 6 is the path that lets a peer-side state change (e.g., the master demoting on its disk-failure timer) propagate into our local view inside one heartbeat interval rather than waiting for the next periodic CALC_SCORE.

Resource side — supervising local processes

hb_Resource tracks the cub_server, copylogdb, and applylogdb processes the local cub_master is responsible for. Each process is one HB_PROC_ENTRY:

// HB_PROC_ENTRY — src/executables/master_heartbeat.h:272 (excerpt)
struct HB_PROC_ENTRY
{
  HB_PROC_ENTRY *next;
  HB_PROC_ENTRY **prev;
  unsigned char state;          /* HB_PROC_STATE — REGISTERED_AND_ACTIVE etc. */
  unsigned char type;           /* HB_PTYPE_SERVER / COPYLOGDB / APPLYLOGDB */
  int sfd;                      /* TCP socket cub_master ↔ process */
  int pid;
  char exec_path[HB_MAX_SZ_PROC_EXEC_PATH];
  char args[HB_MAX_SZ_PROC_ARGS];
  struct timeval frtime, rtime, dtime, ktime, stime;
  unsigned short changemode_rid;
  unsigned short changemode_gap;
  LOG_LSA prev_eof;             /* previous server-reported EOF LSA */
  LOG_LSA curr_eof;             /* current server-reported EOF LSA */
  bool is_curr_eof_received;
  CSS_CONN_ENTRY *conn;
  bool being_shutdown;
  bool server_hang;             /* set when prev_eof == curr_eof */
};

The two LOG_LSA fields plus server_hang are how hb_thread_check_disk_failure decides the master’s local cub_server has hung: it asks the server for its current EOF LSA every ha_check_disk_failure_interval_in_secs (default 15), and if two consecutive answers are equal the server is presumed wedged. The detector then queues HB_RJOB_DEMOTE_START_SHUTDOWN, which triggers the MASTER → SLAVE demote chain.

// hb_thread_check_disk_failure — src/executables/master_heartbeat.c:4814 (condensed)
static void *
hb_thread_check_disk_failure (void *arg)
{
  int remaining_time_msecs = 0;

  while (hb_Resource->shutdown == false)
    {
      int interval = prm_get_integer_value (PRM_ID_HA_CHECK_DISK_FAILURE_INTERVAL_IN_SECS);

      if (interval > 0 && remaining_time_msecs <= 0)
        {
          pthread_mutex_lock (&css_Master_socket_anchor_lock);
          pthread_mutex_lock (&hb_Cluster->lock);
          pthread_mutex_lock (&hb_Resource->lock);

          if (hb_Cluster->is_isolated == false
              && hb_Resource->state == HB_NSTATE_MASTER)
            {
              if (hb_resource_check_server_log_grow () == false)
                {
                  /* Two equal EOF LSAs → server wedged → demote. */
                  hb_Resource->state = HB_NSTATE_SLAVE;
                  // ... condensed: drop locks ...
                  hb_resource_job_queue (HB_RJOB_DEMOTE_START_SHUTDOWN, NULL,
                                         HB_JOB_TIMER_IMMEDIATELY);
                  continue;
                }
            }
          if (hb_Resource->state == HB_NSTATE_MASTER)
            {
              hb_resource_send_get_eof ();      /* ask server for fresh EOF */
            }
          // ... condensed: unlock ...
          remaining_time_msecs = interval * 1000;
        }
      SLEEP_MILISEC (0, HB_DISK_FAILURE_CHECK_TIMER_IN_MSECS);
      remaining_time_msecs -= HB_DISK_FAILURE_CHECK_TIMER_IN_MSECS;
    }
  return NULL;
}

The detector only acts when this node is master and not isolated — the isolation guard is critical because an isolated master cannot trust the is_state_changed signal from peers, and so cannot distinguish a hung local server from a healthy local server whose hb messages are not getting through.
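
The predicate itself reduces to an LSA comparison. A sketch, assuming LOG_LSA is a (pageid, offset) pair and that a missing fresh answer gives the server the benefit of the doubt (field names mirror HB_PROC_ENTRY; the helper is illustrative):

#include <stdbool.h>
#include <stdint.h>

/* Illustrative reduction of LOG_LSA to its comparable fields. */
struct log_lsa
{
  int64_t pageid;
  int16_t offset;
};

/* True when the server's log grew between probes (healthy); false means
 * two equal EOF answers, i.e. presumed wedged. */
static bool
server_log_grew (struct log_lsa prev_eof, struct log_lsa curr_eof,
                 bool is_curr_eof_received)
{
  if (!is_curr_eof_received)
    return true;                /* no fresh answer yet: do not demote */
  return prev_eof.pageid != curr_eof.pageid
    || prev_eof.offset != curr_eof.offset;
}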

A cub_server, copylogdb, or applylogdb registers itself on startup by sending an HBP_PROC_REGISTER message to cub_master’s control socket. The handler is hb_register_new_process (master_heartbeat.c:4238); it resolves the existing entry by args (so a restart of the same configuration re-uses the slot), allocates a new entry if none exists via hb_alloc_new_proc, and transitions the entry’s state field through:

// HB_PROC_STATE — src/executables/master_heartbeat.h:93
enum HB_PROC_STATE
{
  HB_PSTATE_UNKNOWN = 0,
  HB_PSTATE_DEAD = 1,
  HB_PSTATE_DEREGISTERED = 2,
  HB_PSTATE_STARTED = 3,
  HB_PSTATE_NOT_REGISTERED = 4,
  HB_PSTATE_REGISTERED = 5,
  HB_PSTATE_REGISTERED_AND_STANDBY = HB_PSTATE_REGISTERED,
  HB_PSTATE_REGISTERED_AND_TO_BE_STANDBY = 6,
  HB_PSTATE_REGISTERED_AND_ACTIVE = 7,
  HB_PSTATE_REGISTERED_AND_TO_BE_ACTIVE = 8,
  HB_PSTATE_MAX
};

The crucial relationship is between this enum and the cluster side: when hb_Cluster->state advances to MASTER, the HB_RJOB_CHANGE_MODE job moves any HB_PSTATE_REGISTERED_AND_STANDBY server entry to HB_PSTATE_REGISTERED_AND_TO_BE_ACTIVE and asks the server itself to become HA_SERVER_STATE_ACTIVE. The server’s acknowledgement (via hb_resource_receive_changemode, master_heartbeat.c:4444) finalises the entry as HB_PSTATE_REGISTERED_AND_ACTIVE.
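
The handshake reduces to a two-step state walk on the proc entry. A sketch with illustrative function names standing in for hb_resource_send_changemode / hb_resource_receive_changemode:

enum proc_state
{ REG_AND_STANDBY, REG_AND_TO_BE_ACTIVE, REG_AND_ACTIVE };

struct proc
{
  enum proc_state state;
};

/* Cluster side became MASTER: CHANGE_MODE marks the transition in flight
 * and asks the server to go active. */
static void
change_mode (struct proc *p)
{
  if (p->state == REG_AND_STANDBY)
    {
      p->state = REG_AND_TO_BE_ACTIVE;
      /* ... send the HA_SERVER_STATE_ACTIVE request to cub_server ... */
    }
}

/* The server's acknowledgement finalises the transition. */
static void
receive_changemode_ack (struct proc *p)
{
  if (p->state == REG_AND_TO_BE_ACTIVE)
    p->state = REG_AND_ACTIVE;
}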

Activation goes through hb_master_init (master_heartbeat.c:5250):

// hb_master_init — src/executables/master_heartbeat.c (sketch)
int
hb_master_init (void)
{
  hb_cluster_initialize (ha_node_list, ha_replica_list);
  hb_cluster_job_initialize ();         /* queues HB_CJOB_INIT */
  hb_resource_initialize ();
  hb_resource_job_initialize ();
  hb_thread_initialize ();              /* spawns the four threads */
  return NO_ERROR;
}

HB_CJOB_INIT is the bootstrap that queues the three periodic jobs (HEARTBEAT, CHECK_VALID_PING_SERVER, CALC_SCORE) and exits. From that point the cluster is self-sustaining — every periodic job re-queues itself with its configured interval.
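
The self-sustaining pattern is simply that each periodic job’s last act is to queue itself again. A sketch with an illustrative queue API standing in for hb_cluster_job_queue:

enum cluster_job
{ CJOB_HEARTBEAT /* ... */ };

/* Illustrative stand-in: the real queue is expiry-ordered and drained by
 * hb_thread_cluster_worker. */
static void
job_queue (enum cluster_job type, void *arg, int delay_msec)
{
  (void) type;
  (void) arg;
  (void) delay_msec;
}

static void
job_heartbeat (void *arg)
{
  /* ... broadcast HBP_CLUSTER_HEARTBEAT to every known peer ... */
  job_queue (CJOB_HEARTBEAT, arg, 500); /* re-arm: runs every 0.5 s */
}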

The activation entrypoint exposed to operators is hb_activate_heartbeat (master_heartbeat.c:6599); its deactivation counterpart is hb_deactivate_heartbeat (6557). Both are reachable from cub_commdb (the operator-facing utility) via ACTIVATE_HEARTBEAT / DEACTIVATE_HEARTBEAT control messages handled in css_process_activate_heartbeat.

Anchor on symbol names, not line numbers. The CUBRID source moves; the position table at the end is scoped to the doc’s updated: date.

  • HBP_HEADER (heartbeat.h) — UDP wire header.
  • HBP_CLUSTER_HEARTBEAT (heartbeat.h) — only HBP_HEADER::type value in current code.
  • HBP_PROC_REGISTER (heartbeat.h) — TCP register-with-master payload.
  • HB_NODE_STATE (heartbeat.h) — six-value cluster FSM.
  • HB_PROC_TYPE (heartbeat.h) — server / copylogdb / applylogdb.
  • hb_set_net_header (master_heartbeat.c) — fills the header including group_id and orig_host_name.
  • hb_Cluster (master_heartbeat.h) — the global pointer.
  • HB_CLUSTER struct (master_heartbeat.h) — peer table, myself/master cursors, isolation flags, ping list.
  • HB_NODE_ENTRY (master_heartbeat.h) — one entry per peer.
  • HB_PING_HOST_ENTRY (master_heartbeat.h) — witness host.
  • HB_UI_NODE_ENTRY (master_heartbeat.h) — unidentified inbound-source diagnostics.
  • HB_CLUSTER_JOB enum (master_heartbeat.h) — eight values.
  • hb_cluster_jobs[] (master_heartbeat.c) — function table.
  • hb_cluster_job_init — queues HEARTBEAT, CHECK_VALID_PING_SERVER, CALC_SCORE.
  • hb_cluster_job_heartbeat — broadcast and re-queue.
  • hb_cluster_job_calc_score — score, classify, branch.
  • hb_cluster_calc_score — the score computation itself.
  • hb_cluster_is_isolated — every non-replica peer is UNKNOWN.
  • hb_cluster_is_received_heartbeat_from_all — peer-poll freshness predicate used by the failover wait.
  • hb_cluster_job_check_ping — witness consultation.
  • hb_cluster_check_valid_ping_server — periodic witness refresh (only the result enables/disables ping use).
  • hb_cluster_job_check_valid_ping_server — the periodic job wrapper.
  • hb_cluster_job_failover — second score check, promote.
  • hb_cluster_job_failback — kill cub_server, demote.
  • hb_cluster_job_demote — wait-for-new-master loop with hide_to_demote.
  • hb_cluster_receive_heartbeat — inbound packet dispatcher.
  • hb_cluster_send_heartbeat_internal — outbound packet builder.
  • hb_cluster_send_heartbeat_req / _resp — direction wrappers.
  • hb_cluster_request_heartbeat_to_all — broadcast loop; increments per-peer heartbeat_gap.
  • hb_Resource (master_heartbeat.h) — local proc table.
  • HB_RESOURCE struct — proc list, FSM state, shutdown flag.
  • HB_PROC_ENTRY — one per server / copylogdb / applylogdb.
  • HB_PROC_STATE — registered / standby / active / etc.
  • HB_RESOURCE_JOB enum (master_heartbeat.h).
  • hb_resource_jobs[] (master_heartbeat.c) — function table.
  • hb_resource_job_proc_start / _confirm_start — fork+execv a dead process and confirm liveness.
  • hb_resource_job_proc_dereg / _confirm_dereg — graceful shutdown with SIGTERM, escalate to SIGKILL on timeout.
  • hb_resource_job_change_mode — flip cub_server between STANDBY and ACTIVE.
  • hb_resource_job_demote_start_shutdown / _demote_confirm_shutdown — used by disk-fail demote path.
  • hb_resource_job_cleanup_all / _confirm_cleanup_all — invoked by cubrid hb stop.
  • hb_resource_demote_start_shutdown_server_proc — helper that initiates the cub_server stop for demote.
  • hb_register_new_process — HBP_PROC_REGISTER handler.
  • hb_alloc_new_proc / hb_remove_proc — list ops on hb_Resource->procs.
  • hb_resource_receive_changemode — server’s reply to a CHANGE_MODE request.
  • hb_resource_receive_get_eof — server’s reply to the disk-failure detector’s EOF probe.
  • hb_resource_send_changemode — outbound change-mode RPC.
  • hb_resource_send_get_eof — outbound EOF probe.
  • hb_resource_check_server_log_grow — the prev_eof == curr_eof predicate.
  • hb_thread_initialize — spawns the four worker threads.
  • hb_thread_cluster_reader — UDP reader.
  • hb_thread_cluster_worker — CJOB dispatcher.
  • hb_thread_resource_worker — RJOB dispatcher.
  • hb_thread_check_disk_failure — server-hang detector.
  • hb_master_init — cluster init + thread spawn entry.
  • hb_activate_heartbeat / hb_deactivate_heartbeat — operator-driven on/off.
  • hb_resource_shutdown_and_cleanup / hb_cluster_shutdown_and_cleanup — shutdown.
  • HB_JOB, HB_JOB_ENTRY, HB_JOB_ARG (master_heartbeat.h).
  • hb_job_queue / hb_job_dequeue / hb_job_set_expire_and_reorder — generic ordered queue used by both cluster and resource sides.
  • hb_cluster_job_queue / hb_resource_job_queue — typed wrappers.

Process-side bridge (heartbeat.{c,h} under connection/)

  • hb_register_to_master — server / replication process registers itself on startup.
  • hb_deregister_from_master — symmetric.
  • hb_process_init — connect, register, start reader.
  • hb_process_master_request — receive loop on the process-to-master TCP socket.
  • hb_thread_master_reader — process side of the master liveness check (kills self on disconnect).

| Symbol | File | Line |
| --- | --- | --- |
| HB_NODE_STATE enum | heartbeat.h | 86 |
| HBP_HEADER struct | heartbeat.h | 114 |
| HBP_PROC_REGISTER struct | heartbeat.h | 138 |
| HBP_CLUSTER_HEARTBEAT | heartbeat.h | 75 |
| HB_CLUSTER_JOB enum | master_heartbeat.h | 62 |
| HB_RESOURCE_JOB enum | master_heartbeat.h | 76 |
| HB_PROC_STATE enum | master_heartbeat.h | 93 |
| HB_NODE_SCORE_* macros | master_heartbeat.h | 122 |
| HB_NODE_ENTRY struct | master_heartbeat.h | 200 |
| HB_CLUSTER struct | master_heartbeat.h | 242 |
| HB_PROC_ENTRY struct | master_heartbeat.h | 272 |
| HB_RESOURCE struct | master_heartbeat.h | 307 |
| hb_cluster_jobs[] | master_heartbeat.c | 259 |
| hb_resource_jobs[] | master_heartbeat.c | 272 |
| hb_cluster_job_init | master_heartbeat.c | 708 |
| hb_cluster_job_heartbeat | master_heartbeat.c | 734 |
| hb_cluster_is_isolated | master_heartbeat.c | 762 |
| hb_cluster_is_received_heartbeat_from_all | master_heartbeat.c | 785 |
| hb_cluster_job_calc_score | master_heartbeat.c | 812 |
| hb_cluster_job_check_ping | master_heartbeat.c | 992 |
| hb_cluster_job_failover | master_heartbeat.c | 1163 |
| hb_cluster_job_demote | master_heartbeat.c | 1236 |
| hb_cluster_job_failback | master_heartbeat.c | 1351 |
| hb_cluster_check_valid_ping_server | master_heartbeat.c | 1463 |
| hb_cluster_job_check_valid_ping_server | master_heartbeat.c | 1500 |
| hb_cluster_calc_score | master_heartbeat.c | 1556 |
| hb_cluster_request_heartbeat_to_all | master_heartbeat.c | 1646 |
| hb_cluster_send_heartbeat_req | master_heartbeat.c | 1677 |
| hb_cluster_send_heartbeat_resp | master_heartbeat.c | 1696 |
| hb_cluster_send_heartbeat_internal | master_heartbeat.c | 1702 |
| hb_cluster_receive_heartbeat | master_heartbeat.c | 1750 |
| hb_set_net_header | master_heartbeat.c | 1914 |
| hb_cluster_load_group_and_node_list | master_heartbeat.c | 2730 |
| hb_resource_demote_start_shutdown_server_proc | master_heartbeat.c | 3307 |
| hb_resource_job_demote_confirm_shutdown | master_heartbeat.c | 3416 |
| hb_resource_job_demote_start_shutdown | master_heartbeat.c | 3494 |
| hb_resource_job_confirm_start | master_heartbeat.c | 3552 |
| hb_resource_job_confirm_dereg | master_heartbeat.c | 3702 |
| hb_resource_job_change_mode | master_heartbeat.c | 3791 |
| hb_alloc_new_proc | master_heartbeat.c | 3925 |
| hb_register_new_process | master_heartbeat.c | 4238 |
| hb_resource_send_changemode | master_heartbeat.c | 4356 |
| hb_resource_receive_changemode | master_heartbeat.c | 4444 |
| hb_resource_check_server_log_grow | master_heartbeat.c | 4518 |
| hb_resource_send_get_eof | master_heartbeat.c | 4577 |
| hb_resource_receive_get_eof | master_heartbeat.c | 4605 |
| hb_thread_cluster_worker | master_heartbeat.c | 4659 |
| hb_thread_cluster_reader | master_heartbeat.c | 4704 |
| hb_thread_resource_worker | master_heartbeat.c | 4769 |
| hb_thread_check_disk_failure | master_heartbeat.c | 4814 |
| hb_thread_initialize | master_heartbeat.c | 5146 |
| hb_master_init | master_heartbeat.c | 5250 |
| hb_deactivate_heartbeat | master_heartbeat.c | 6557 |
| hb_activate_heartbeat | master_heartbeat.c | 6599 |
| css_send_heartbeat_request | connection/heartbeat.c | 160 |
| hb_register_to_master | connection/heartbeat.c | 298 |
| hb_process_init | connection/heartbeat.c | 691 |
  • Each cub_master decides its master independently — there is no consensus protocol. Verified at hb_cluster_calc_score (master_heartbeat.c:1556): the function loops over hb_Cluster->nodes, computes a local score per peer, and writes hb_Cluster->master to the smallest-score peer. No quorum, no inter-cub_master vote. Convergence relies on every node seeing the same input (heartbeat-driven peer states); divergent inputs produce divergent verdicts (handled by the split-brain branch).

  • The role-bit constants are intentionally negative when read as short. Verified at master_heartbeat.h:122-125: HB_NODE_SCORE_MASTER 0x8000 is SHRT_MIN, the smallest signed-short value. The min_score comparison in hb_cluster_calc_score therefore favours masters; priority acts only as a tiebreaker within the same role bit. Two masters with the same priority is impossible because priority is assigned 1, 2, 3 … by configuration order in hb_cluster_load_group_and_node_list (master_heartbeat.c:2730).

  • HB_NSTATE_TO_BE_SLAVE is unreachable from local transitions. Verified by inspecting all writes to hb_Cluster->state: failback writes SLAVE directly (master_heartbeat.c:1364), demote walks MASTER → UNKNOWN → SLAVE (1259, 1267), failover writes MASTER or SLAVE (1180, 1192). The only way a node enters TO_BE_SLAVE is via hb_cluster_receive_heartbeat reading a peer’s reported state — i.e., it’s a peer-side view, not a local-side action.

  • hide_to_demote == true silences both outbound heartbeats and score participation. Verified at hb_cluster_job_heartbeat (master_heartbeat.c:740, if (hb_Cluster->hide_to_demote == false) gates the broadcast) and hb_cluster_job_calc_score (828, goto calc_end skips the master/split-brain branches when hide_to_demote is set). The flag is asserted only inside hb_cluster_job_demote and cleared when a new master is found (1310) or the wait expires (1274).

  • The wire protocol carries one int (the sender’s state) beyond the header. Verified at hb_cluster_send_heartbeat_internal (master_heartbeat.c:1702): after hb_set_net_header, the body is exactly or_pack_int (p, hb_Cluster->state), length OR_INT_SIZE (4 bytes). The receiver in hb_cluster_receive_heartbeat (1802) unpacks the same field with or_unpack_int. There is no other body content; HBP_HEADER’s len field always equals 4.

  • The reader uses recvfrom over a single UDP socket; UDP loss is therefore tolerated by design. Verified at hb_thread_cluster_reader (master_heartbeat.c:4704): a single recvfrom per loop on hb_Cluster->sfd. There is no UDP-level retry; the gap counter absorbs loss — every send increments node->heartbeat_gap and every receive decrements it (floored at 0), so a single dropped packet is invisible whereas ha_max_heartbeat_gap consecutive losses demote the peer to UNKNOWN.

  • The disk-failure detector does nothing on non-master nodes, by guard. Verified at hb_thread_check_disk_failure (master_heartbeat.c:4840): the is_isolated == false && hb_Resource->state == HB_NSTATE_MASTER guard skips both the EOF probe and the demote check on slaves. The thread still wakes every HB_DISK_FAILURE_CHECK_TIMER_IN_MSECS (100 ms) even on slaves — the cost is one mutex-trylock dance and back to sleep.

  • The cluster job table and the cluster job enum are size-locked at compile time only by convention. Verified at master_heartbeat.c:259-269 (hb_cluster_jobs[]) and master_heartbeat.h:62 (HB_CLUSTER_JOB). There is no static-assert; a contributor adding an enum value but forgetting the table entry would crash on first dispatch. The terminating NULL in the array is for diagnostic clarity rather than dispatch (the hb_cluster_job_queue boundary check on HB_CJOB_MAX already prevents out-of-bounds).

  • HB_MAX_NUM_NODES = 8 is hard-coded; clusters larger than eight nodes are unsupported. Verified at master_heartbeat.h:128. The figure also appears in the property documentation as the upper bound for ha_node_list. Not a runtime parameter — exceeding it requires source changes.

  • The peer table and the local proc table use separate mutexes. Verified at master_heartbeat.h:244 (HB_CLUSTER::lock) and master_heartbeat.h:309 (HB_RESOURCE::lock). The failover and failback paths cross both: failback releases the cluster lock between announcing SLAVE and killing servers (1390, 1392) — i.e., the cluster lock is not held during the kill loop, which is what makes hb_kill_process safe from blocking the cluster reader.

  • The activation entry point is callable from cub_commdb remote operators. Verified at master_heartbeat.c:6599 (hb_activate_heartbeat) plus the ACTIVATE_HEARTBEAT / DEACTIVATE_HEARTBEAT command handlers in commdb.c (the operator path goes cubrid hb start → cub_commdb --activate-heartbeat → css_process_activate_heartbeat → hb_activate_heartbeat → hb_master_init).

  1. HB_MAX_WAIT_FOR_NEW_MASTER = 60 semantics. The demote loop bounds itself at 60 retries with one-second waits (master_heartbeat.c:1270), then reasserts master (hide_to_demote = false at 1274, no state write back to MASTER). Is the intent that the original master comes back if no successor is found, or is this a leak (the node ends up SLAVE with hide_to_demote == false but its server-side cub_server was already killed)? Investigation path: trace the relationship between hb_cluster_job_demote exit and the resource side after the wait expires.

  2. changemode_rid and changemode_gap are marked unused in the deck. The deck labels HB_PROC_ENTRY::changemode_rid / _gap as “unused”. Verified the fields are still defined at master_heartbeat.h:292-293. Are they referenced anywhere in the current source, or have they decayed to dead state? Investigation path: git grep changemode_rid and git grep changemode_gap across the tree.

  3. UDP packet authentication is group_id only. Verified that hb_is_heartbeat_valid checks group_id and host resolution but no cryptographic signature. A spoofed UDP packet with the right group_id and any of the configured host names would be accepted. Is this acceptable in the HA threat model, or is there an external assumption (firewall, private network) that needs to be documented?

  4. Bit-field byte ordering in HBP_HEADER. heartbeat.h:117-122 has #if defined(HPUX) || defined(_AIX) || defined(sparc) reordering of the r:1 and reserved:7 fields. Modern builds are mostly Linux x86_64; the alternative arms are probably untested. Investigation path: build under _AIX/sparc toolchains if available; check whether the wire is interoperable across mixed-endian clusters.

  5. HB_PSTATE_REGISTERED_AND_TO_BE_ACTIVE to _ACTIVE completion path. The deck sketches the transition but the exact cub_server reply that flips it is not traced in detail. Investigation path: read hb_resource_receive_changemode (master_heartbeat.c:4444) and the matching server-side sender in src/connection/server_support.c.

  6. Failback on isolated master without ha_ping_hosts. An isolated master with no ping witnesses goes through hb_cluster_job_check_ping to the ping_check_cancel branch (master_heartbeat.c:1011) — i.e., it stays master indefinitely. The deck flags this as “retains master status; loops indefinitely until the isolation clears”. Is this the intended permanent policy, or should there be a max-isolation timeout that eventually demotes? Investigation path: open RND tickets touching is_ping_check_enabled defaults.

Beyond CUBRID — Comparative Designs & Research Frontiers

Pointers, not analysis.

  • Raft (Ongaro & Ousterhout, USENIX ATC 2014) — quorum-based leader election with explicit term numbers and a log comparison rule. CUBRID’s per-node calc_score is the opposite design point; a follow-up doc could quantify what CUBRID gives up by not running consensus (specifically: the isolated-master indefinite-master-retention behaviour surfaced in open question §6).

  • ZooKeeper / ZAB (Hunt et al., USENIX ATC 2010) — atomic broadcast with a designated leader and a witness service external to the data path. Pacemaker, etcd, and Kubernetes all delegate election to it. CUBRID’s ha_ping_hosts achieves a similar witness role at much lower cost but without ZAB’s safety guarantees.

  • MySQL Group Replication / Galera Cluster — peer-to-peer certification-based replication where election is a byproduct of the certification protocol. Comparable to CUBRID in the “no central coordinator” choice but very different in failure-detection style (vector clocks vs. heartbeat gap).

  • Patroni (PostgreSQL HA orchestrator) — delegates leader election to an external DCS (etcd / Consul / ZooKeeper), using the database engine itself as a passive participant. CUBRID embeds the equivalent role into cub_master. A comparative doc could trace how the failure-injection surface differs.

  • Pacemaker + Corosync — Linux-HA’s stack for active/passive clusters. STONITH (Shoot The Other Node In The Head) provides the fence guarantee CUBRID’s kill (proc->pid, SIGKILL) in failback approximates — but Pacemaker fences the whole node, not just the database process.

  • MongoDB replica sets — heartbeat-driven election with priority and a 10-second timeout. The state machine (PRIMARY / SECONDARY / RECOVERING / ROLLBACK / FATAL) is larger than CUBRID’s six-value FSM and includes data-side states the heartbeat module here intentionally keeps out of cluster-side scope.

  • Designing Data-Intensive Applications (Kleppmann), Ch. 5 “Replication” + Ch. 9 “Consistency and Consensus” — the textbook framing for the choices CUBRID’s heartbeat makes. Especially the “split-brain” treatment in Ch. 9 motivates the open-question §6 above.

Raw analyses (raw/code-analysis/cubrid/distributed/heartbeat/)

  • heartbeat 코드 분석.pdf
  • heartbeat 코드 분석.pptx
  • _converted/heartbeat-code-analysis.pdf.txt — pdftotext extract of the PDF.
  • _converted/heartbeat-code-analysis.pptx.md — markitdown extract of the PPTX.
  • knowledge/code-analysis/cubrid/cubrid-recovery-manager.md — heartbeat triggers in-doubt recovery on the new master at failover.
  • knowledge/code-analysis/cubrid/cubrid-2pc.md — cross-node 2PC interacts with the cluster FSM through the same control surface (XA from cub_commdb).
  • Designing Data-Intensive Applications (Kleppmann), Ch. 5 “Replication” — primary/standby framing, sync vs. async.
  • Designing Data-Intensive Applications (Kleppmann), Ch. 9 “Consistency and Consensus” — split-brain, fencing.
  • Chandra & Toueg, Unreliable Failure Detectors for Reliable Distributed Systems, PODC 1996 — formal treatment of why asynchronous detectors are necessarily unsafe or untimely.
  • Ongaro & Ousterhout, In Search of an Understandable Consensus Algorithm (Raft), USENIX ATC 2014 — counterpoint to CUBRID’s local-decision design.

CUBRID source (/data/hgryoo/references/cubrid/)

  • src/executables/master_heartbeat.{c,h} — the cub_master side; the bulk of the module.
  • src/connection/heartbeat.{c,h} — the process side; how cub_server / copylogdb / applylogdb register and run their master-reader.
  • src/executables/util_service.c — cubrid hb start|stop|... utility entry points.
  • src/executables/commdb.c — cub_commdb operator-side utility that brokers commands to cub_master.
  • src/connection/server_support.c — server-side process registration (hb_register_to_master is called from net_server_start → css_init).