
CUBRID cub_master Process — Daemon Lifecycle, Connection Registry, Request Dispatch, and the Auto-Restart Server Monitor

A service-registry daemon is the long-lived process that owns the per-host map of “what database servers are running here, on what ports, with what arguments.” Every multi-server engine needs one: clients must be able to ask something “where’s database mydb?” without already knowing the per-server port. The design choices divide along two axes:

  1. Per-host vs. per-cluster. A per-host registry (PostgreSQL’s postmaster is per-cluster by definition but per-host in practice; MySQL relies on systemd; Oracle has CRS for cluster-wide supervision and the listener per host) keeps the registry coresident with the processes it tracks. A per-cluster registry (etcd-backed registries in newer engines) survives host failures but adds a network dependency. CUBRID picks per-host: every CUBRID install runs one cub_master per host, and clients connect to it on a well-known port to discover the actual cub_server ports.

  2. Process supervision scope. A bare registry just answers “where’s X?” and lets an external supervisor (systemd, monit, runit) handle restarts. A supervising registry also tracks PIDs and re-forks on abnormal exit. CUBRID’s master defaults to bare registry but has an opt-in auto_restart_server mode that activates the server_monitor C++ subsystem — a small process supervisor embedded in cub_master that re-execs cub_server from the recorded argv when the kernel reaps a child unexpectedly.

The cub_master binary is therefore three things bundled into one: (a) a request server that handles status / shutdown queries from commdb / cubrid commdb, (b) a connection registry that brokers introductions between clients and cub_server instances, and (c) optionally a process supervisor for the server instances themselves. The HA replication subsystem (cubrid-heartbeat.md) layers on top of all three.

| Engine | Per-host daemon | Process supervision | Client discovery |
| --- | --- | --- | --- |
| PostgreSQL | postmaster per data directory; one Postgres backend per client connection | postmaster re-forks backends on crash; cluster-wide restart on PANIC | Client connects directly to the postmaster port (5432) |
| MySQL | mysqld per data directory; thread-per-connection | systemd / mysql.server script restarts mysqld; no in-process supervision | Direct TCP/UDS connect to mysqld (3306 / socket file) |
| Oracle | Oracle Net Listener (tnslsnr, controlled via lsnrctl) per host accepts connections and brokers to the right Oracle instance; OHASD/CRSD for clustered Oracle | OHASD/CRSD restart Oracle instances on failure | Client connects to the listener (1521); listener forwards to the appropriate instance |
| MongoDB | mongod per node; mongos is the router for sharded clusters | systemd; replica set primary handles failover via election | Direct connect to mongod (or to mongos for sharded) |
| CUBRID | cub_master per host; brokers between clients and cub_servers | Optional in-process server_monitor (enabled by auto_restart_server); HA failover via master_heartbeat.c (separate doc) | Client connects to cub_master (default 1523), receives the per-database cub_server port, then connects to that |

CUBRID’s listener (cub_master) is closest in spirit to the Oracle listener (the per-host tnslsnr process driven by lsnrctl): a per-host introduction broker. The distinguishing trait is the optional in-process supervisor — most engines delegate restarts to systemd/init, but CUBRID’s master can own server lifecycle directly when configured to.

cub_master’s main (in master.c:1207) runs the following sequence:

```c
// master.c::main (paraphrased)
utility_initialize ();                            // message catalog
util_config_ret = master_util_config_startup ((argc > 1) ? argv[1] : NULL,
                                              &port_id);  // read $CUBRID/conf/cubrid.conf
GETHOSTNAME (hostname, ...);                      // for error log filename
er_init (errlog, ER_NEVER_EXIT);                  // <hostname>_master.err
if (css_does_master_exist (port_id))              // duplicate-master check
  goto cleanup;                                   // bail out — another master already there
msgcat_final ();                                  // close catalog before fork
er_final (ER_ALL_FINAL);
if (envvar_get ("NO_DAEMON") == NULL)
  css_daemon_start ();                            // fork into background
utility_initialize ();                            // reopen catalog in child
er_init (errlog, ER_NEVER_EXIT);
time (&css_Start_time);                           // record start time for status
if (css_master_init (port_id, css_Master_socket_fd) != NO_ERROR)
  goto cleanup;                                   // socket bind / signal setup failed
if (envvar_get ("NO_DAEMON") != NULL)
  os_set_signal_handler (SIGINT, css_master_cleanup);
if (!HA_DISABLED ())
  hb_master_init ();                              // see cubrid-heartbeat.md
auto_Restart_server = prm_get_bool_value (PRM_ID_AUTO_RESTART_SERVER);
if (auto_Restart_server)
  master_Server_monitor.reset (new server_monitor ());
conn = css_make_conn (css_Master_socket_fd[0]);
css_add_request_to_socket_queue (conn, false, NULL, css_Master_socket_fd[0],
                                 READ_WRITE, 0, &css_Master_socket_anchor);
/* ... add second socket fd for IPv6 / UDS ... */
/* ... main select loop ... */
```

The flow has six observable phases:

  1. Config + duplicate detection. master_util_config_startup reads cubrid.conf for the master_shm_id and listening port. css_does_master_exist probes the port — if another master already listens, the new instance refuses to start.
  2. Error-log init. Errors land in $CUBRID/log/<hostname>_master.err. Done before the fork because both pre-fork and post-fork code can hit errors.
  3. Daemonisation. css_daemon_start does the standard double-fork to detach from the controlling terminal. The NO_DAEMON env var skips this — used for foreground debugging and inside containers where an init process (PID 1) already supervises the daemon.
  4. Master socket setup. css_master_init binds the listening socket(s) (one IPv4, one IPv6 / UDS), installs SIGCHLD / SIGINT / SIGTERM handlers, and seeds css_Master_socket_anchor (the doubly-linked list of every active connection — the select() set).
  5. Optional HA bootstrap. If cubrid.conf has ha_mode = on, hb_master_init initialises the heartbeat subsystem (covered in cubrid-heartbeat.md) which adds heartbeat-specific connection types and worker threads.
  6. Optional server monitor. If auto_restart_server = on, instantiate the server_monitor C++ object (described below). This is the in-process supervisor that re-execs crashed cub_server instances.
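The duplicate check in phase 1 can be sketched as a loopback connect probe. This is a hypothetical reconstruction (the real css_does_master_exist may use a different mechanism, and the function name below is ours):

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Try to connect to the master port on loopback. A successful connect
// means some process (presumably another master) already listens there,
// so a new master should refuse to start.
bool master_port_in_use (int port)
{
  int fd = socket (AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return false;                               // cannot tell; assume free

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons (static_cast<uint16_t> (port));
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);

  bool in_use = connect (fd, reinterpret_cast<sockaddr *> (&addr),
                         sizeof (addr)) == 0;
  close (fd);
  return in_use;                                // true => another master is there
}
```

One consequence of a probe like this (noted again in the gotchas below): it is per-port, not per-config, so two masters on different ports would both happily run.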

After init, the master enters a select() loop multiplexing every connection in css_Master_socket_anchor. Each socket-queue entry has a name field that identifies what kind of connection it is:

  • Listening sockets (IPv4, IPv6/UDS) — accept new clients.
  • Registered cub_server connections — established when a cub_server boots and registers itself with master (so master can broker incoming clients to it). The connection name is the database name (or <dbname>@<hostname> for HA).
  • HA-server / HA-copylog / HA-applylog connections — held open by the heartbeat subsystem for replication-process liveness tracking. Identified by name prefixes (IS_MASTER_CONN_NAME_HA_SERVER, IS_MASTER_CONN_NAME_HA_COPYLOG, IS_MASTER_CONN_NAME_HA_APPLYLOG).
  • Driver / commdb / management-tool connections — short-lived client requests asking for status or shutdown (commdb, cubrid commdb, the cubrid manager web tool).
  • Client introductions — a client connecting to find a database; master answers with the cub_server’s port and the client reconnects directly. The introduction connection itself is then closed.

Every iteration of the select loop:

  1. Drains accept-pending listening sockets via css_process_master_request (which also re-arms them in the anchor).
  2. For each connection with data ready, reads a request and dispatches by opcode.
  3. For connections that closed (peer EOF), removes them from the anchor and frees the per-connection state. If the closed connection was a registered cub_server and auto_restart_server is on, enqueues a REVIVE_SERVER job to the server monitor.
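The per-iteration work can be sketched as one poll pass, with a std::vector of fds standing in for the richer css_Master_socket_anchor entries (an illustrative simplification, not CUBRID's code):

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <vector>

// One select() pass over a simplified "anchor": returns the fds that
// have data (or a pending accept) ready. The real loop then dispatches
// each ready fd by connection kind and opcode.
std::vector<int> poll_anchor_once (const std::vector<int> &anchor, int timeout_ms)
{
  fd_set readset;
  FD_ZERO (&readset);
  int maxfd = -1;
  for (int fd : anchor)
    {
      FD_SET (fd, &readset);
      if (fd > maxfd)
        maxfd = fd;
    }

  timeval tv{timeout_ms / 1000, (timeout_ms % 1000) * 1000};
  std::vector<int> ready;
  if (select (maxfd + 1, &readset, nullptr, nullptr, &tv) > 0)
    for (int fd : anchor)
      if (FD_ISSET (fd, &readset))
        ready.push_back (fd);           // ready for read (or EOF)
  return ready;
}
```

EOF detection falls out of the same pass: a "readable" fd whose subsequent read() returns 0 is a closed peer, which is the trigger for the REVIVE_SERVER enqueue described in step 3.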

The opcode dispatch lives at master_request.c:1947. Three families:

```c
// master_request.c — request dispatch (paraphrased)
switch (request) {
  // Status family
  case GET_START_TIME: css_process_start_time_info (...); break;
  case GET_SHUTDOWN_TIME: css_process_shutdown_time_info (...); break;
  case GET_SERVER_COUNT: css_process_server_count_info (...); break;
  case GET_REQUEST_COUNT: css_process_request_count_info (...); break;
  case GET_SERVER_LIST: css_process_server_list_info (...); break;
  case GET_ALL_COUNT: css_process_all_count_info (...); break;
  case GET_ALL_LIST: css_process_all_list_info (...); break;
  case GET_SERVER_HA_MODE: css_process_get_server_ha_mode (...); break;
  case GET_SERVER_STATE: /* ... */; break;

  // Shutdown family
  case KILL_SLAVE_SERVER: css_process_kill_slave (...); break;
  case KILL_MASTER_SERVER: css_process_kill_master (); break;
  case KILL_SERVER_IMMEDIATE: css_process_kill_immediate (...); break;
  case START_SHUTDOWN: css_process_start_shutdown (...); break;

  // HA family (when ha_mode = on)
  case GET_HA_PING_HOST_INFO: css_process_ha_ping_host_info (...); break;
  case GET_HA_NODE_LIST: css_process_ha_node_list_info (..., false); break;
  case GET_HA_NODE_LIST_VERBOSE: css_process_ha_node_list_info (..., true); break;
  case GET_HA_PROCESS_LIST: css_process_ha_process_list_info (..., false); break;
  case GET_HA_PROCESS_LIST_VERBOSE: css_process_ha_process_list_info (..., true); break;
  case GET_HA_ADMIN_INFO: css_process_ha_admin_info (...); break;
  case KILL_ALL_HA_PROCESS: css_process_kill_all_ha_process (...); break;
  case DEREGISTER_HA_PROCESS_BY_PID: css_process_deregister_ha_process_by_pid (...); break;
  case DEREGISTER_HA_PROCESS_BY_ARGS: css_process_deregister_ha_process_by_args (...); break;
  case START_HA_UTIL_PROCESS: css_process_start_ha_util_process (...); break;
}
```

Each handler reads its arguments from the request packet, performs the operation (often by walking css_Master_socket_anchor and inspecting per-connection name fields), and writes a response back via the same connection.

The status family answers what commdb -P / commdb -O / cubrid service status / cubrid server status / cubrid heartbeat status ask. The shutdown family is what commdb -S / commdb -A / cubrid server stop send. The HA family is what cubrid heartbeat … sends and is the public interface to the heartbeat subsystem.

Several handlers — css_process_kill_slave, css_process_kill_immediate, css_process_start_shutdown_by_name, css_process_get_server_ha_mode, css_process_shutdown_reviving_server — walk css_Master_socket_anchor for an entry whose name matches the target server name and is not an HA-replication connection (!IS_MASTER_CONN_NAME_HA_SERVER etc.). The repeated guard filters out the heartbeat-internal connections that share the same anchor list but represent different processes — failing to exclude them would cause shutdown commands to target the wrong process.

This is the cross-cutting reason IS_MASTER_CONN_NAME_* macros appear so often in master_request.c: every per-server operation has to disambiguate “the cub_server connection for database X” from “the HA-copylog/applylog connection for database X” because both share the database name.
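A sketch of that guard pattern, with an illustrative entry struct and a boolean standing in for the IS_MASTER_CONN_NAME_HA_* prefix macros (the real anchor entries and name scheme differ):

```cpp
#include <string>
#include <vector>

struct conn_entry
{
  std::string name;   // database name (shared by server and HA connections)
  bool is_ha;         // stand-in for the IS_MASTER_CONN_NAME_HA_* prefix test
};

// Find the cub_server connection for `dbname`, skipping HA-internal
// entries that carry the same database name. Forgetting the !is_ha guard
// is exactly the bug shape the text warns about.
const conn_entry *find_server_conn (const std::vector<conn_entry> &anchor,
                                    const std::string &dbname)
{
  for (const conn_entry &e : anchor)
    if (e.name == dbname && !e.is_ha)
      return &e;
  return nullptr;
}
```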

When auto_restart_server = on, master instantiates a server_monitor (declared in master_server_monitor.hpp):

```cpp
class server_monitor
{
  public:
    enum class job_type
    {
      REGISTER_SERVER = 0,        // a new cub_server connected — record its PID + argv
      UNREGISTER_SERVER = 1,      // cub_server cleanly deregistered — drop the entry
      REVIVE_SERVER = 2,          // cub_server connection died unexpectedly — re-fork
      CONFIRM_REVIVE_SERVER = 3,  // post-fork check that the new server actually came up
      SHUTDOWN_SERVER = 4,        // explicit shutdown request — drop entry without revive
      JOB_MAX
    };

    void produce_job (job_type, int pid, const std::string &exec_path,
                      const std::string &args, const std::string &server_name);

  private:
    std::unordered_map<std::string, server_entry> m_server_entry_map;
    std::unique_ptr<std::thread> m_monitoring_thread;
    std::queue<job> m_job_queue;
    std::mutex m_server_monitor_mutex;
    std::condition_variable m_monitor_cv_consumer;

    void server_monitor_thread_worker ();  // consumer loop on m_job_queue
};

class server_entry
{
    int m_pid;
    std::string m_exec_path;
    std::unique_ptr<char *[]> m_argv;      // saved for re-exec
    volatile bool m_need_revive;
    std::chrono::steady_clock::time_point m_last_revived_time;
};
```

The supervisor is a producer-consumer queue with one consumer thread. produce_job is called from the master’s select-loop thread when various events occur:

  • REGISTER_SERVER — a new cub_server connection appears in the anchor and identifies itself; master records the PID, exec path, args, and server name in m_server_entry_map.
  • UNREGISTER_SERVER — a cub_server cleanly disconnects (e.g., normal shutdown via commdb -S); the entry is dropped without revive.
  • REVIVE_SERVER — a cub_server connection died without prior unregister; the entry’s m_need_revive is set to true.
  • CONFIRM_REVIVE_SERVER — after a revive fork, master enqueues a confirmation job that verifies the new process actually opened a new connection within a timeout.
  • SHUTDOWN_SERVER — explicit shutdown; entry dropped; no revive even if auto_restart_server is on.

The consumer thread (server_monitor_thread_worker) blocks on m_monitor_cv_consumer waiting for jobs and processes them serially — registration and revival can’t race because they share the lock around m_server_entry_map and the queue.
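The queue discipline can be sketched as a minimal single-consumer monitor. mini_monitor and its string jobs are illustrative stand-ins for server_monitor's typed job objects, not CUBRID code:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

class mini_monitor
{
  public:
    mini_monitor () : m_worker (&mini_monitor::consume, this) {}
    ~mini_monitor ()
    {
      produce ("");                    // empty job = stop sentinel
      m_worker.join ();
    }

    // Producer side: called from the "select loop" thread.
    void produce (std::string job)
    {
      {
        std::lock_guard<std::mutex> lk (m_mutex);
        m_jobs.push (std::move (job));
      }
      m_cv.notify_one ();
    }

    std::vector<std::string> processed ()
    {
      std::lock_guard<std::mutex> lk (m_mutex);
      return m_done;
    }

  private:
    // Consumer side: one thread drains jobs serially, so registration
    // and revival cannot race each other.
    void consume ()
    {
      for (;;)
        {
          std::unique_lock<std::mutex> lk (m_mutex);
          m_cv.wait (lk, [this] { return !m_jobs.empty (); });
          std::string job = std::move (m_jobs.front ());
          m_jobs.pop ();
          if (job.empty ())
            return;                    // stop sentinel
          m_done.push_back (job);      // "handle" the job
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::string> m_jobs;
    std::vector<std::string> m_done;
    std::thread m_worker;              // declared last: starts after state is ready
};
```

The single lock covering both the queue and the processed state mirrors how m_server_monitor_mutex covers both m_job_queue and m_server_entry_map.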

try_revive_server is the actual fork point: it forks and execvs <exec_path> with the saved argv, records the new PID against the existing server-name entry, and updates m_last_revived_time so a flapping server can be rate-limited (though the rate-limiting policy itself isn’t strict in current code — m_last_revived_time is recorded but the consumer doesn’t read it before the next revive).
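The fork-and-exec step can be sketched as follows; the revive helper and its signature are hypothetical, assuming only that the entry preserved the exec path and argv strings:

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Fork and exec `exec_path` with the saved argument strings. Returns the
// new child PID to record against the server-name entry (alongside an
// updated last-revived timestamp, as the text describes).
pid_t revive (const std::string &exec_path, const std::vector<std::string> &args)
{
  pid_t pid = fork ();
  if (pid == 0)
    {
      // child: rebuild a NULL-terminated argv from the saved strings
      std::vector<char *> argv;
      for (const std::string &a : args)
        argv.push_back (const_cast<char *> (a.c_str ()));
      argv.push_back (nullptr);
      execv (exec_path.c_str (), argv.data ());
      _exit (127);                     // execv only returns on failure
    }
  return pid;                          // parent: pid < 0 means fork failed
}
```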

auto_restart_server is the only condition that activates this supervisor; without it, master_Server_monitor is a null unique_ptr and master doesn’t track or restart child servers itself. (In production, operators using HA almost always set auto_restart_server = on because heartbeat failover assumes crashed servers come back; in standalone HA-disabled deployments, the operator’s choice depends on whether they have an external supervisor.)

master_util.c and master_util.h — small helpers


Three utility helpers used during boot:

  • master_util_config_startup (config_path, &port_id) — reads the master section of cubrid.conf; returns false if the file is missing or unparseable. The argv[1] from main is passed as a config-file path override (rarely used; defaults to $CUBRID/conf/cubrid.conf).
  • master_util_wait_proc_terminate — used by css_process_kill_immediate to wait for a SIGTERM-ed child to actually die before reporting completion to the requester.
  • master_util_get_eof_message — formats end-of-stream messages for connections that drop unexpectedly.

These are intentionally small — master_util.c is ~90 lines, not a substantial module on its own.
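A hypothetical shape for the wait helper (the real signature in master_util.c may differ): poll waitpid with WNOHANG so the requester is not answered until the SIGTERM-ed child is really gone.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if `pid` exited (and was reaped) within roughly timeout_ms.
bool wait_proc_terminate (pid_t pid, int timeout_ms)
{
  for (int waited = 0; waited <= timeout_ms; waited += 10)
    {
      int status = 0;
      if (waitpid (pid, &status, WNOHANG) == pid)
        return true;                   // reaped: the child is gone
      usleep (10 * 1000);              // poll every 10 ms
    }
  return false;                        // still alive after the timeout
}
```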

Master shutdown can be initiated from three places:

  1. commdb -S <db> — kills one cub_server (KILL_SLAVE_SERVER / KILL_SERVER_IMMEDIATE); master sends SIGTERM to the child, waits, drops the entry. Doesn’t shut down master itself.
  2. commdb -A — kills every registered cub_server. Each killed in turn; master remains running.
  3. commdb for master (KILL_MASTER_SERVER, or cubrid service stop) — css_process_kill_master initiates master shutdown: cleanly terminates the server monitor’s thread, sends shutdown to every registered cub_server, closes the listening socket, exits.

css_master_cleanup is the SIGINT handler installed in NO_DAEMON (foreground debug) mode; it runs the same shutdown sequence as KILL_MASTER_SERVER.

| Symbol | Role |
| --- | --- |
| main | Entry; config; duplicate-master check; daemonise; init sockets and signals; optional HA + server-monitor instantiation; enter select loop |
| css_master_init | Sets up listening sockets (IPv4 + IPv6/UDS), installs SIGCHLD/SIGINT/SIGTERM, seeds the connection anchor |
| css_daemon_start | Standard double-fork daemonisation; setsid, redirect stdio to /dev/null |
| css_master_error | Master-specific error printer that goes to both stderr and <hostname>_master.err |
| css_master_cleanup | SIGINT handler in foreground mode; runs shutdown sequence |
| css_Master_socket_fd[] | The two listening file descriptors |
| css_Master_socket_anchor | The connection anchor (doubly-linked list) the select loop iterates |
| css_Start_time | Recorded at boot for GET_START_TIME responses |
| auto_Restart_server | Bool mirror of PRM_ID_AUTO_RESTART_SERVER; gates master_Server_monitor instantiation |
| Symbol | Role |
| --- | --- |
| process_master_request | Top-level opcode switch (line 1947); status, shutdown, HA families |
| css_process_start_time_info / _shutdown_time_info | GET_START_TIME / GET_SHUTDOWN_TIME |
| css_process_server_count_info / _server_list_info | GET_SERVER_COUNT / GET_SERVER_LIST — for commdb -P |
| css_process_all_count_info / _all_list_info | GET_ALL_COUNT / GET_ALL_LIST — for commdb -O (servers + brokers + pl) |
| css_process_request_count_info | Master’s own request counter — diagnostic |
| css_process_kill_slave / _kill_immediate / _kill_master | Shutdown handlers |
| css_process_start_shutdown / _start_shutdown_by_name / _shutdown / _stop_shutdown | Two-phase shutdown coordination |
| css_process_shutdown_reviving_server | Special shutdown path for a server that’s currently being revived (race avoidance with server_monitor) |
| css_process_get_server_ha_mode | GET_SERVER_HA_MODE — used by HA logic |
| css_process_register_ha_process / _deregister_ha_process / _change_ha_mode | HA-process lifecycle (heartbeat subsystem ↔ master) |
| css_process_ha_ping_host_info / _ha_node_list_info / _ha_admin_info | HA-info-query handlers |
| css_process_get_eof | Generic EOF responder for short-lived probe connections |

Server monitor (master_server_monitor.{cpp,hpp})

| Symbol | Role |
| --- | --- |
| server_monitor (class) | The supervisor; owns m_server_entry_map, m_job_queue, the consumer thread |
| server_monitor::produce_job | Producer entry called from the master select loop |
| server_monitor::server_monitor_thread_worker | Consumer loop; pulls jobs and dispatches |
| server_monitor::register_server_entry / remove_server_entry / revive_server / try_revive_server / check_server_revived / shutdown_server | Per-job handlers |
| server_entry (inner class) | One per registered cub_server; PID + exec path + saved argv + revive timestamps |
| master_Server_monitor (global unique_ptr) | Singleton instance; nullptr unless auto_restart_server is on |
| auto_Restart_server (global bool) | Mirror of the sysparam |
| Symbol | Role |
| --- | --- |
| master_util_config_startup | Reads cubrid.conf; returns port ID and validity |
| master_util_wait_proc_terminate | waitpid wrapper used after sending SIGTERM to a child |
| master_util_get_eof_message | EOF message formatter |
| Symbol | Path |
| --- | --- |
| master.c::main | src/executables/master.c:1207 |
| css_master_init | src/executables/master.c:259 |
| process_master_request (request switch) | src/executables/master_request.c:1947 |
| css_process_start_time_info | src/executables/master_request.c:166 |
| css_process_server_list_info | src/executables/master_request.c:286 |
| css_process_all_list_info | src/executables/master_request.c:386 |
| css_process_kill_slave | src/executables/master_request.c:498 |
| css_process_kill_immediate | src/executables/master_request.c:579 |
| css_process_start_shutdown_by_name | src/executables/master_request.c:615 |
| css_process_kill_master | src/executables/master_request.c:686 |
| css_process_register_ha_process | src/executables/master_request.c:913 |
| css_process_change_ha_mode | src/executables/master_request.c:956 |
| server_monitor (class) | src/executables/master_server_monitor.hpp:38 |
| server_monitor::job_type (enum) | src/executables/master_server_monitor.hpp:43 |

Symbol names are the canonical anchors; the line numbers are hints, valid as of this doc’s `updated:` date.

  • HA-replication and master are separate. This doc covers master’s daemon shape and request dispatch. The HA-replication subsystem (master_heartbeat.c, ~7300 lines) sits on top of the same select loop and uses additional connection types identified by IS_MASTER_CONN_NAME_HA_* prefixes. Heartbeat semantics, peer-discovery, and failover live in cubrid-heartbeat.md.
  • Server monitor and heartbeat are independent. The C++ server_monitor (this doc) is per-host process supervision. Heartbeat is per-cluster replication-state coordination. A configuration could enable one without the other (HA on, auto_restart_server off — heartbeat tracks replication state but doesn’t restart crashed servers; HA off, auto_restart_server on — single-host with crash-restart). Both on is the typical HA production setup.
  • Connection-anchor name disambiguation. Every per-server request handler in master_request.c filters IS_MASTER_CONN_NAME_HA_* prefixes from its anchor walk. The pattern is repeated 8+ times in the source. Adding a new request handler that operates on cub_server connections and forgetting this filter would silently target HA-internal connections — common bug shape worth flagging in code review.
  • auto_restart_server activation is one-shot at master startup. Toggling the sysparam at runtime doesn’t switch the server monitor on or off; the master would have to be restarted. This is intentional — the supervisor’s state (the m_server_entry_map) requires consistent registration history from boot to be useful.
  • Revive race with explicit shutdown. css_process_shutdown_reviving_server exists specifically because an operator may issue commdb -S <db> while server_monitor is in the middle of re-forking that same server. The handler sets a sentinel that try_revive_server checks before actually doing the fork; if the shutdown landed first, the revive is cancelled.
  • m_last_revived_time is recorded but not enforced. The server_entry records the timestamp of each revive attempt, but the consumer doesn’t currently use it to rate-limit a flapping server. A buggy cub_server that panics within seconds of restart will be revived in a tight loop. This is a known gap; flapping detection is left to external monitoring.
  • Duplicate-master check is per-port, not per-config. css_does_master_exist (port_id) only checks if the port is in use. Two cub_master instances reading different cubrid.confs but assigned the same port will conflict; two instances on different ports will both run, neither aware of the other.
  • Containerised deployment. The double-fork daemonisation is unnecessary inside container init systems (Docker init=tini, Kubernetes pod with init container). The NO_DAEMON env var skips the fork but doesn’t change other daemon behaviours (signal handling, error log path). A fully container-aware mode would also redirect logs to stdout/stderr and not write the <hostname>_master.err file.
  • Server monitor flapping policy. As noted, no rate limiting on revive. A documented policy would be: exponential backoff with a hard ceiling (e.g., max 5 revives in 5 minutes, then deregister and require operator intervention).
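That proposed window policy is straightforward to sketch. This illustrates the documented-but-unimplemented idea (allow at most N revives inside a sliding window, then give up and require operator intervention); it is not current CUBRID code:

```cpp
#include <chrono>
#include <deque>

class revive_limiter
{
  public:
    revive_limiter (int max_revives, std::chrono::steady_clock::duration window)
      : m_max (max_revives), m_window (window) {}

    // Returns true if another revive attempt is allowed at time `now`;
    // records the attempt when allowed.
    bool allow (std::chrono::steady_clock::time_point now)
    {
      // drop attempts that have aged out of the sliding window
      while (!m_times.empty () && now - m_times.front () > m_window)
        m_times.pop_front ();
      if (static_cast<int> (m_times.size ()) >= m_max)
        return false;                  // flapping: stop reviving
      m_times.push_back (now);
      return true;
    }

  private:
    int m_max;
    std::chrono::steady_clock::duration m_window;
    std::deque<std::chrono::steady_clock::time_point> m_times;
};
```

try_revive_server already records m_last_revived_time, so a policy like this would only need the consumer to consult it before forking.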
  • Request opcode authentication. Request opcodes have no authentication — anyone who can connect to master’s port can issue KILL_MASTER_SERVER. Production deployments rely on network-level firewalling (the master port is typically not exposed). A first-class auth mechanism is undocumented.
  • HA detail. This doc deliberately stays at the connection-registry level for HA. A complete picture of the HA-info-query semantics (GET_HA_NODE_LIST, GET_HA_PROCESS_LIST, etc.) belongs in cubrid-heartbeat.md rather than here.
  • src/executables/master.c — daemon entry, init, select loop, socket anchor management, signal handlers
  • src/executables/master_request.c — opcode dispatcher and the per-opcode handlers
  • src/executables/master_request.h — opcode enum, IS_MASTER_CONN_NAME_* macros, handler prototypes
  • src/executables/master_server_monitor.{cpp,hpp} — C++ process supervisor with producer-consumer job queue
  • src/executables/master_util.{c,h} — config reader and small process helpers
  • src/executables/AGENTS.md — agent guide
  • Adjacent docs: cubrid-heartbeat.md (HA-replication subsystem layered on the same daemon — covers master_heartbeat.c and the HA-info-query handlers cross-referenced from this doc), cubrid-broker.md (broker is a separate daemon, but commdb -O reaches it through master), cubrid-cub-admin.md (the unified admin CLI; cubrid service family forks master and the cubrid commdb verb is what most of master’s request handlers serve), cubrid-overview-server-architecture.md (master’s place in the four-process model)