CUBRID cub_master Process — Daemon Lifecycle, Connection Registry, Request Dispatch, and the Auto-Restart Server Monitor
Theoretical Background
A service-registry daemon is the long-lived process that owns
the per-host map of “what database servers are running here, on
what ports, with what arguments.” Multi-server engines all need
one — clients have to ask something “where’s database mydb?”
without already knowing the per-server port. The design choices
divide along two axes:
- Per-host vs. per-cluster. Per-host designs (PostgreSQL’s postmaster is per-cluster but per-host in practice; MySQL relies on systemd; Oracle has CRS for cluster-wide and lsnrctl per-host) keep the registry co-resident with the processes it tracks. Per-cluster designs (etcd-backed registries in newer engines) survive host failures but add a network dependency. CUBRID picks per-host: every CUBRID install runs one cub_master per host, and clients connect to it on a well-known port to discover the actual cub_server ports.
- Process supervision scope. A bare registry just answers “where’s X?” and lets an external supervisor (systemd, monit, runit) handle restarts. A supervising registry also tracks PIDs and re-forks on abnormal exit. CUBRID’s master defaults to a bare registry but has an opt-in auto_restart_server mode that activates the server_monitor C++ subsystem, a small process supervisor embedded in cub_master that re-execs cub_server from the recorded argv when a server exits unexpectedly.
The cub_master binary is therefore three things bundled into
one: (a) a request server that handles status / shutdown queries
from commdb / cubrid commdb, (b) a connection registry that
brokers introductions between clients and cub_server
instances, and (c) optionally a process supervisor for the
server instances themselves. The HA replication subsystem
(cubrid-heartbeat.md) layers on top of all three.
Common DBMS Design
| Engine | Per-host daemon | Process supervision | Client discovery |
|---|---|---|---|
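| PostgreSQL | postmaster per data directory; one Postgres backend per client connection | postmaster forks backends; after any backend crash (including a PANIC) it terminates all backends and reinitializes with crash recovery (restart_after_crash); postmaster itself relies on an external supervisor | Client connects directly to the postmaster port (5432) |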
| MySQL | mysqld per data directory; thread-per-connection | systemd / mysql.server script restarts mysqld; no in-process supervision | Direct TCP/UDS connect to mysqld (3306 / socket file) |
| Oracle | Oracle Net Listener (tnslsnr, managed with lsnrctl) per host accepts connections and brokers to the right Oracle instance; OHASD/CRSD for clustered Oracle | OHASD/CRSD restart Oracle instances on failure | Client connects to listener (1521); listener forwards to the appropriate instance |
| MongoDB | mongod per node; mongos is the router for sharded clusters | systemd; replica set primary handles failover via election | Direct connect to mongod (or to mongos for sharded) |
| CUBRID | cub_master per host; brokers between clients and cub_servers | Optional in-process server_monitor (enabled by auto_restart_server); HA failover via master_heartbeat.c (separate doc) | Client connects to cub_master (default 1523), receives the per-database cub_server port, then connects to that |
CUBRID’s listener (cub_master) is closest in spirit to Oracle’s
lsnrctl: a per-host introduction broker. The distinguishing
trait is the optional in-process supervisor — most engines
delegate to systemd/init for restart, but CUBRID’s master can
own server lifecycle directly when configured to.
CUBRID’s Approach
Boot sequence
cub_master’s main (in master.c:1207) runs the following
sequence:
```cpp
// master.c::main (paraphrased)
utility_initialize ();                               // message catalog
util_config_ret = master_util_config_startup ((argc > 1) ? argv[1] : NULL, &port_id);  // read $CUBRID/conf/cubrid.conf
GETHOSTNAME (hostname, ...);                         // for error log filename
er_init (errlog, ER_NEVER_EXIT);                     // <hostname>_master.err

if (__gv_cvar.css_does_master_exist (port_id))      // duplicate-master check
  goto cleanup;                                        // bail out: another master already there

msgcat_final ();                                       // close catalog before fork
er_final (ER_ALL_FINAL);

if (envvar_get ("NO_DAEMON") == NULL)
  css_daemon_start ();                                 // fork into background

utility_initialize ();                                // reopen catalog in child
er_init (errlog, ER_NEVER_EXIT);
time (&css_Start_time);                               // record start time for status

if (css_master_init (port_id, css_Master_socket_fd) != NO_ERROR)
  goto cleanup;                                        // socket bind / signal setup failed

if (envvar_get ("NO_DAEMON") != NULL)
  os_set_signal_handler (SIGINT, css_master_cleanup);

if (!HA_DISABLED ())
  hb_master_init ();                                   // see cubrid-heartbeat.md

auto_Restart_server = prm_get_bool_value (PRM_ID_AUTO_RESTART_SERVER);
if (auto_Restart_server)
  master_Server_monitor.reset (new server_monitor ());

conn = __gv_cvar.css_make_conn (css_Master_socket_fd[0]);
css_add_request_to_socket_queue (conn, false, NULL, css_Master_socket_fd[0],
                                 READ_WRITE, 0, &css_Master_socket_anchor);
/* ... add second socket fd for IPv6 / UDS ... */

/* ... main select loop ... */
```

The flow has six observable phases:
- Config + duplicate detection. master_util_config_startup reads cubrid.conf for the master_shm_id and listening port. css_does_master_exist probes the port; if another master already listens, the new instance refuses to start.
- Error-log init. Errors land in $CUBRID/log/<hostname>_master.err. This is done before the fork because both pre-fork and post-fork code can hit errors (the log is closed before forking and reopened in the child).
- Daemonisation. css_daemon_start does the standard double-fork to detach from the controlling terminal. The NO_DAEMON env var skips this; it is used for foreground debugging and inside containers, where the process should stay in the foreground under the container’s own PID 1.
- Master socket setup. css_master_init binds the listening socket(s) (one IPv4, one IPv6 / UDS), installs SIGCHLD / SIGINT / SIGTERM handlers, and prepares css_Master_socket_anchor (the doubly-linked list of every active connection, i.e. the set the select() loop watches); main then registers the listening sockets in it.
- Optional HA bootstrap. If cubrid.conf has ha_mode = on, hb_master_init initialises the heartbeat subsystem (covered in cubrid-heartbeat.md), which adds heartbeat-specific connection types and worker threads.
- Optional server monitor. If auto_restart_server = on, main instantiates the server_monitor C++ object (described below). This is the in-process supervisor that re-execs crashed cub_server instances.
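The duplicate-master check in the first phase boils down to asking whether something already listens on the master port. A minimal sketch of that idea, assuming a plain TCP connect probe against 127.0.0.1; master_port_in_use is a hypothetical name, and this is not the body of css_does_master_exist.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if something already accepts TCP
// connections on 127.0.0.1:port. Mirrors the idea of a duplicate-master
// check (probe the well-known port), not CUBRID's actual code.
bool master_port_in_use (std::uint16_t port)
{
  int fd = socket (AF_INET, SOCK_STREAM, 0);
  assert (fd >= 0);

  sockaddr_in addr {};
  addr.sin_family = AF_INET;
  addr.sin_port = htons (port);
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);

  // A successful connect means another process (presumably an existing
  // master) is listening; ECONNREFUSED means the port is free.
  bool in_use = connect (fd, reinterpret_cast<sockaddr *> (&addr), sizeof (addr)) == 0;
  close (fd);
  return in_use;
}
```

Probing with connect() rather than bind() avoids briefly holding the port, and it also shows why such a check is per-port only: a successful connect says nothing about which cubrid.conf the listener was started from.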
The select-loop and connection registry
After init, the master enters a select() loop multiplexing
every connection in css_Master_socket_anchor. Each socket-queue
entry has a name field that identifies what kind of connection
it is:
- Listening sockets (IPv4, IPv6/UDS): accept new clients.
- Registered cub_server connections: established when a cub_server boots and registers itself with master (so master can broker incoming clients to it). The connection name is the database name (or <dbname>@<hostname> for HA).
- HA-server / HA-copylog / HA-applylog connections: held open by the heartbeat subsystem for replication-process liveness tracking. Identified by name-prefix checks (the IS_MASTER_CONN_NAME_HA_SERVER, IS_MASTER_CONN_NAME_HA_COPYLOG, and IS_MASTER_CONN_NAME_HA_APPLYLOG macros).
- Driver / commdb / management-tool connections: short-lived client requests asking for status or shutdown (commdb, cubrid commdb, the cubrid manager web tool).
- Client introductions: a client connecting to find a database; master answers with the cub_server’s port and the client reconnects directly. The introduction connection itself is then closed.
Every iteration of the select loop:
- Drains accept-pending listening sockets via css_process_master_request (which also re-arms them in the anchor).
- For each connection with data ready, reads a request and dispatches by opcode.
- For connections that closed (peer EOF), removes them from the anchor and frees the per-connection state. If the closed connection was a registered cub_server and auto_restart_server is on, enqueues a REVIVE_SERVER job to the server monitor.
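To make the per-iteration steps concrete, here is a simplified sketch of one select() pass over an anchor-like list. The conn_entry struct, select_loop_once, and the revive_queue vector (standing in for produce_job (REVIVE_SERVER, ...)) are hypothetical; listening-socket accept handling and opcode dispatch are left out.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one css_Master_socket_anchor entry.
struct conn_entry
{
  int fd;
  std::string name;            // database name for a registered cub_server
  bool is_registered_server;
};

// Runs one select() pass. Connections whose peer closed are unlinked; if a
// closed connection was a registered server and auto_restart is on, its
// name is queued for revival. Returns select()'s ready count.
int select_loop_once (std::list<conn_entry> &anchor, bool auto_restart,
                      std::vector<std::string> &revive_queue)
{
  fd_set read_set;
  FD_ZERO (&read_set);
  int max_fd = -1;
  for (const conn_entry &c : anchor)
    {
      assert (c.fd >= 0 && c.fd < FD_SETSIZE);   // select() can only watch fds below FD_SETSIZE
      FD_SET (c.fd, &read_set);
      if (c.fd > max_fd)
        max_fd = c.fd;
    }

  timeval timeout {0, 100000};   // 100 ms, so the caller can do housekeeping between passes
  int ready = select (max_fd + 1, &read_set, nullptr, nullptr, &timeout);
  if (ready <= 0)
    return ready;

  for (auto it = anchor.begin (); it != anchor.end ();)
    {
      if (!FD_ISSET (it->fd, &read_set))
        {
          ++it;
          continue;
        }
      char buf[256];
      ssize_t n = read (it->fd, buf, sizeof (buf));
      if (n > 0)
        {
          // Real code would decode the request and dispatch on its opcode here.
          ++it;
          continue;
        }
      // n == 0 is peer EOF; this sketch treats read errors the same way.
      if (it->is_registered_server && auto_restart)
        revive_queue.push_back (it->name);
      close (it->fd);
      it = anchor.erase (it);
    }
  return ready;
}
```

Erasing through the iterator returned by std::list::erase keeps the walk valid while entries are unlinked, which is the same concern the real doubly-linked anchor has to handle.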
Request dispatch (process_master_request)
The opcode dispatch lives at master_request.c:1947. Three
families:
```cpp
// master_request.c: request dispatch (paraphrased)
switch (request)
  {
  // Status family
  case GET_START_TIME:               css_process_start_time_info (...); break;
  case GET_SHUTDOWN_TIME:            css_process_shutdown_time_info (...); break;
  case GET_SERVER_COUNT:             css_process_server_count_info (...); break;
  case GET_REQUEST_COUNT:            css_process_request_count_info (...); break;
  case GET_SERVER_LIST:              css_process_server_list_info (...); break;
  case GET_ALL_COUNT:                css_process_all_count_info (...); break;
  case GET_ALL_LIST:                 css_process_all_list_info (...); break;
  case GET_SERVER_HA_MODE:           css_process_get_server_ha_mode (...); break;
  case GET_SERVER_STATE:             /* ... */; break;

  // Shutdown family
  case KILL_SLAVE_SERVER:            css_process_kill_slave (...); break;
  case KILL_MASTER_SERVER:           css_process_kill_master (); break;
  case KILL_SERVER_IMMEDIATE:        css_process_kill_immediate (...); break;
  case START_SHUTDOWN:               css_process_start_shutdown (...); break;

  // HA family (when ha_mode = on)
  case GET_HA_PING_HOST_INFO:        css_process_ha_ping_host_info (...); break;
  case GET_HA_NODE_LIST:             css_process_ha_node_list_info (..., false); break;
  case GET_HA_NODE_LIST_VERBOSE:     css_process_ha_node_list_info (..., true); break;
  case GET_HA_PROCESS_LIST:          css_process_ha_process_list_info (..., false); break;
  case GET_HA_PROCESS_LIST_VERBOSE:  css_process_ha_process_list_info (..., true); break;
  case GET_HA_ADMIN_INFO:            css_process_ha_admin_info (...); break;
  case KILL_ALL_HA_PROCESS:          css_process_kill_all_ha_process (...); break;
  case DEREGISTER_HA_PROCESS_BY_PID: css_process_deregister_ha_process_by_pid (...); break;
  case DEREGISTER_HA_PROCESS_BY_ARGS: css_process_deregister_ha_process_by_args (...); break;
  case START_HA_UTIL_PROCESS:        css_process_start_ha_util_process (...); break;
  }
```

Each handler reads its arguments from the request packet, performs
the operation (often by walking
css_Master_socket_anchor and inspecting per-connection name
fields), and writes a response back via the same connection.
The status family answers what commdb -P / commdb -O /
cubrid service status / cubrid server status / cubrid heartbeat status ask. The shutdown family is what commdb -S /
commdb -A / cubrid server stop send. The HA family is what
cubrid heartbeat … sends and is the public interface to the
heartbeat subsystem.
Server-name → connection lookup
Several handlers (css_process_kill_slave,
css_process_kill_immediate, css_process_start_shutdown_by_name,
css_process_get_server_ha_mode, and
css_process_shutdown_reviving_server) walk
css_Master_socket_anchor for an entry whose name matches the
target server name and is not an HA-replication connection
(!IS_MASTER_CONN_NAME_HA_SERVER etc.). The repeated guard
filters out the heartbeat-internal connections that share the
same anchor list but represent different processes — failing to
exclude them would cause shutdown commands to target the wrong
process.
This is the cross-cutting reason IS_MASTER_CONN_NAME_* macros
appear so often in master_request.c: every per-server operation
has to disambiguate “the cub_server connection for database X”
from “the HA-copylog/applylog connection for database X” because
both share the database name.
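The sketch below shows why the guard matters when every connection belonging to a database carries that database’s name. It assumes a hypothetical encoding in which HA-internal connections have a one-character kind prefix before the database name; the real IS_MASTER_CONN_NAME_* macros may encode this differently, and find_server_conn is an illustrative name.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical encoding for this sketch only: HA-internal connection names
// are "<kind prefix><dbname>".
constexpr char HA_SERVER_PREFIX = '#';
constexpr char HA_COPYLOG_PREFIX = '$';
constexpr char HA_APPLYLOG_PREFIX = '%';

struct anchor_entry
{
  int fd;
  std::string name;
};

bool is_ha_internal_name (const std::string &name)
{
  return !name.empty ()
         && (name[0] == HA_SERVER_PREFIX || name[0] == HA_COPYLOG_PREFIX
             || name[0] == HA_APPLYLOG_PREFIX);
}

// The database a connection belongs to, with any kind prefix stripped.
std::string db_name_of (const std::string &conn_name)
{
  return is_ha_internal_name (conn_name) ? conn_name.substr (1) : conn_name;
}

// Returns the cub_server connection for db_name, or nullptr. Dropping the
// second check would let a copylog/applylog connection for the same
// database be returned instead, depending on list order.
const anchor_entry *find_server_conn (const std::list<anchor_entry> &anchor,
                                      const std::string &db_name)
{
  for (const anchor_entry &e : anchor)
    {
      if (db_name_of (e.name) != db_name)
        continue;               // different database
      if (is_ha_internal_name (e.name))
        continue;               // same database, but a heartbeat-internal connection
      return &e;
    }
  return nullptr;
}
```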
server_monitor — the C++ supervisor
When auto_restart_server = on, master instantiates a
server_monitor (declared in master_server_monitor.hpp):
```cpp
// master_server_monitor.hpp (abridged)
class server_monitor
{
  public:
    enum class job_type
    {
      REGISTER_SERVER = 0,        // a new cub_server connected: record its PID + argv
      UNREGISTER_SERVER = 1,      // cub_server cleanly deregistered: drop the entry
      REVIVE_SERVER = 2,          // cub_server connection died unexpectedly: re-fork
      CONFIRM_REVIVE_SERVER = 3,  // post-fork check that the new server actually came up
      SHUTDOWN_SERVER = 4,        // explicit shutdown request: drop entry without revive
      JOB_MAX
    };

    void produce_job (job_type, int pid, const std::string &exec_path,
                      const std::string &args, const std::string &server_name);

  private:
    std::unordered_map<std::string, server_entry> m_server_entry_map;
    std::unique_ptr<std::thread> m_monitoring_thread;
    std::queue<job> m_job_queue;
    std::mutex m_server_monitor_mutex;
    std::condition_variable m_monitor_cv_consumer;

    void server_monitor_thread_worker ();   // consumer loop on m_job_queue
};

class server_entry
{
    int m_pid;
    std::string m_exec_path;
    std::unique_ptr<char *[]> m_argv;       // saved for re-exec
    volatile bool m_need_revive;
    std::chrono::steady_clock::time_point m_last_revived_time;
};
```

The supervisor is a producer-consumer queue with one consumer
thread. produce_job is called from the master’s select-loop
thread when various events occur:
- REGISTER_SERVER: a new cub_server connection appears in the anchor and identifies itself; master records the PID, exec path, args, and server name in m_server_entry_map.
- UNREGISTER_SERVER: a cub_server cleanly disconnects (e.g., normal shutdown via commdb -S); the entry is dropped without revive.
- REVIVE_SERVER: a cub_server connection died without a prior unregister; the entry’s m_need_revive is set to true.
- CONFIRM_REVIVE_SERVER: after a revive fork, master enqueues a confirmation job that verifies the new process actually opened a new connection within a timeout.
- SHUTDOWN_SERVER: explicit shutdown; the entry is dropped, with no revive even if auto_restart_server is on.
The consumer thread (server_monitor_thread_worker) blocks on
m_monitor_cv_consumer waiting for jobs and processes them
serially — registration and revival can’t race because they
share the lock around m_server_entry_map and the queue.
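A stripped-down model of this producer-consumer shape follows, assuming only three job kinds and replacing the fork with a counter. The mini_monitor class and its methods are hypothetical; the point is a single consumer thread draining a std::queue under one mutex and condition variable, so jobs for the same server are applied strictly in arrival order.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>

class mini_monitor
{
  public:
    enum class job_type { REGISTER_SERVER, UNREGISTER_SERVER, REVIVE_SERVER };

    mini_monitor () : m_thread (&mini_monitor::worker, this) {}
    ~mini_monitor () { stop (); }

    // Producer side; in the described design, the select-loop thread calls this.
    void produce_job (job_type type, int pid, const std::string &server_name)
    {
      {
        std::lock_guard<std::mutex> lock (m_mutex);
        m_jobs.push (job {type, pid, server_name});
      }
      m_cv.notify_one ();
    }

    // Lets the consumer finish all queued jobs, then joins it (idempotent).
    void stop ()
    {
      {
        std::lock_guard<std::mutex> lock (m_mutex);
        if (m_stop)
          return;
        m_stop = true;
      }
      m_cv.notify_one ();
      m_thread.join ();
    }

    std::size_t entry_count () const { std::lock_guard<std::mutex> lock (m_mutex); return m_entries.size (); }
    int revive_count () const { std::lock_guard<std::mutex> lock (m_mutex); return m_revives; }

  private:
    struct job
    {
      job_type type;
      int pid;
      std::string name;
    };

    // Single consumer: jobs are applied one at a time, in FIFO order.
    void worker ()
    {
      std::unique_lock<std::mutex> lock (m_mutex);
      for (;;)
        {
          m_cv.wait (lock, [this] { return m_stop || !m_jobs.empty (); });
          if (m_jobs.empty ())
            return;                                // stop requested and queue drained
          job j = std::move (m_jobs.front ());
          m_jobs.pop ();
          // Handled under the lock for simplicity; a real supervisor would
          // release it around slow work such as fork/exec.
          switch (j.type)
            {
            case job_type::REGISTER_SERVER:   m_entries[j.name] = j.pid; break;
            case job_type::UNREGISTER_SERVER: m_entries.erase (j.name); break;
            case job_type::REVIVE_SERVER:
              if (m_entries.count (j.name) != 0)   // only revive servers still registered
                ++m_revives;                        // stand-in for fork + exec
              break;
            }
        }
    }

    mutable std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<job> m_jobs;
    std::unordered_map<std::string, int> m_entries;   // server name -> pid
    int m_revives = 0;
    bool m_stop = false;
    std::thread m_thread;   // declared last: every member above exists before the worker starts
};
```

For example, registering two servers, reviving one, unregistering the other, and then sending a revive for the unregistered one leaves one entry and one revive: the late revive finds no entry and is ignored.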
try_revive_server is the actual fork point: it forks and
calls execv on <exec_path> with the saved argv, records the new
PID against the existing server-name entry, and updates
m_last_revived_time so a flapping server can be rate-limited
(though the rate-limiting policy itself isn’t strict in current
code — m_last_revived_time is recorded but the consumer
doesn’t read it before the next revive).
auto_restart_server is the only condition that activates
this supervisor; without it, master_Server_monitor is a null
unique_ptr and master doesn’t track or restart child servers
itself. (In production, operators using HA almost always set
auto_restart_server = on because heartbeat failover assumes
crashed servers come back; in standalone HA-disabled
deployments, the operator’s choice depends on whether they
have an external supervisor.)
master_util.c and master_util.h — small helpers
Three small utility helpers, used by the boot path and the request handlers:
- master_util_config_startup (config_path, &port_id): reads the master section of cubrid.conf; returns false if the file is missing or unparseable. The argv[1] from main is passed as a config-file path override (rarely used; defaults to $CUBRID/conf/cubrid.conf).
- master_util_wait_proc_terminate: used by css_process_kill_immediate to wait for a SIGTERM-ed child to actually die before reporting completion to the requester.
- master_util_get_eof_message: formats end-of-stream messages for connections that drop unexpectedly.
These are intentionally small — master_util.c is ~90 lines,
not a substantial module on its own.
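A sketch of the kind of wait master_util_wait_proc_terminate performs, assuming (as the source-walkthrough table describes) a waitpid-based poll, here with a timeout added. wait_proc_terminate and its timeout parameter are hypothetical.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <thread>

// Hypothetical helper: after SIGTERM has been sent, poll waitpid (WNOHANG)
// until the child is reaped or the timeout expires. Only works for children
// of the calling process. Returns true once the child is gone.
bool wait_proc_terminate (pid_t pid, std::chrono::milliseconds timeout)
{
  auto deadline = std::chrono::steady_clock::now () + timeout;
  for (;;)
    {
      int status = 0;
      pid_t r = waitpid (pid, &status, WNOHANG);
      if (r == pid)
        return true;   // reaped here
      if (r == -1 && errno == ECHILD)
        return true;   // no such child: assume it was already reaped (e.g. by a SIGCHLD handler)
      if (std::chrono::steady_clock::now () >= deadline)
        return false;  // still alive; the caller may escalate (e.g. SIGKILL)
      std::this_thread::sleep_for (std::chrono::milliseconds (10));
    }
}
```

Treating ECHILD as "gone" matters in a process that also reaps children from a SIGCHLD handler, as master does; the trade-off is that a PID that was never our child also reports true.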
Lifecycle and shutdown
Shutdown requests can come from three places; only the last one stops master itself:
- commdb -S <db>: kills one cub_server (KILL_SLAVE_SERVER / KILL_SERVER_IMMEDIATE). Master sends SIGTERM to the child, waits, and drops the entry; master itself keeps running.
- commdb -A: kills every registered cub_server, one at a time; master remains running.
- commdb against master itself (KILL_MASTER_SERVER, or cubrid service stop): css_process_kill_master initiates master shutdown. It cleanly terminates the server monitor’s thread, sends shutdown to every registered cub_server, closes the listening socket, and exits.
css_master_cleanup is the handler main installs for SIGINT in
NO_DAEMON (foreground debug) mode; it runs the same shutdown
sequence as KILL_MASTER_SERVER.
Source Walkthrough
Daemon (master.c)
| Symbol | Role |
|---|---|
main | Entry; config; duplicate-master check; daemonise; init sockets and signals; optional HA + server-monitor instantiation; enter select loop |
css_master_init | Sets up listening sockets (IPv4 + IPv6/UDS), installs SIGCHLD/SIGINT/SIGTERM, seeds the connection anchor |
css_daemon_start | Standard double-fork daemonisation; setsid, redirect stdio to /dev/null |
css_master_error | Master-specific error printer that goes to both stderr and <hostname>_master.err |
css_master_cleanup | SIGINT handler in foreground mode; runs shutdown sequence |
css_Master_socket_fd[] | The two listening file descriptors |
css_Master_socket_anchor | The connection anchor (doubly-linked list) the select loop iterates |
css_Start_time | Recorded at boot for GET_START_TIME responses |
auto_Restart_server | Bool mirror of PRM_ID_AUTO_RESTART_SERVER; gates master_Server_monitor instantiation |
Request dispatch (master_request.c)
| Symbol | Role |
|---|---|
process_master_request | Top-level opcode switch (line 1947); status, shutdown, HA families |
css_process_start_time_info / _shutdown_time_info | GET_START_TIME / GET_SHUTDOWN_TIME |
css_process_server_count_info / _server_list_info | GET_SERVER_COUNT / GET_SERVER_LIST — for commdb -P |
css_process_all_count_info / _all_list_info | GET_ALL_COUNT / GET_ALL_LIST — for commdb -O (servers + brokers + pl) |
css_process_request_count_info | Master’s own request counter — diagnostic |
css_process_kill_slave / _kill_immediate / _kill_master | Shutdown handlers |
css_process_start_shutdown / _start_shutdown_by_name / _shutdown / _stop_shutdown | Two-phase shutdown coordination |
css_process_shutdown_reviving_server | Special shutdown path for a server that’s currently being revived (race avoidance with server_monitor) |
css_process_get_server_ha_mode | GET_SERVER_HA_MODE — used by HA logic |
css_process_register_ha_process / _deregister_ha_process / _change_ha_mode | HA-process lifecycle (heartbeat subsystem ↔ master) |
css_process_ha_ping_host_info / _ha_node_list_info / _ha_admin_info | HA-info-query handlers |
css_process_get_eof | Generic EOF responder for short-lived probe connections |
Server monitor (master_server_monitor.{cpp,hpp})
| Symbol | Role |
|---|---|
server_monitor (class) | The supervisor; owns m_server_entry_map, m_job_queue, the consumer thread |
server_monitor::produce_job | Producer entry called from the master select loop |
server_monitor::server_monitor_thread_worker | Consumer loop; pulls jobs and dispatches |
server_monitor::register_server_entry / remove_server_entry / revive_server / try_revive_server / check_server_revived / shutdown_server | Per-job handlers |
server_entry (inner class) | One per registered cub_server; PID + exec path + saved argv + revive timestamps |
master_Server_monitor (global unique_ptr) | Singleton instance; nullptr unless auto_restart_server is on |
auto_Restart_server (global bool) | Mirror of the sysparam |
Boot / shutdown helpers (master_util.c)
| Symbol | Role |
|---|---|
master_util_config_startup | Reads cubrid.conf; returns port ID and validity |
master_util_wait_proc_terminate | waitpid wrapper used after sending SIGTERM to a child |
master_util_get_eof_message | EOF message formatter |
Position hints (as of 2026-05-05)
| Symbol | Path |
|---|---|
master.c::main | src/executables/master.c:1207 |
css_master_init | src/executables/master.c:259 |
process_master_request (request switch) | src/executables/master_request.c:1947 |
css_process_start_time_info | src/executables/master_request.c:166 |
css_process_server_list_info | src/executables/master_request.c:286 |
css_process_all_list_info | src/executables/master_request.c:386 |
css_process_kill_slave | src/executables/master_request.c:498 |
css_process_kill_immediate | src/executables/master_request.c:579 |
css_process_start_shutdown_by_name | src/executables/master_request.c:615 |
css_process_kill_master | src/executables/master_request.c:686 |
css_process_register_ha_process | src/executables/master_request.c:913 |
css_process_change_ha_mode | src/executables/master_request.c:956 |
server_monitor (class) | src/executables/master_server_monitor.hpp:38 |
server_monitor::job_type (enum) | src/executables/master_server_monitor.hpp:43 |
Symbol names are the canonical anchor; line numbers are hints
valid only for the 2026-05-05 snapshot named in the heading.
Cross-check Notes
- HA-replication and master are separate. This doc covers master’s daemon shape and request dispatch. The HA-replication subsystem (master_heartbeat.c, ~7300 lines) sits on top of the same select loop and uses additional connection types identified by IS_MASTER_CONN_NAME_HA_* prefixes. Heartbeat semantics, peer discovery, and failover live in cubrid-heartbeat.md.
- Server monitor and heartbeat are independent. The C++ server_monitor (this doc) is per-host process supervision. Heartbeat is per-cluster replication-state coordination. A configuration could enable one without the other (HA on, auto_restart_server off: heartbeat tracks replication state but doesn’t restart crashed servers; HA off, auto_restart_server on: single-host with crash-restart). Both on is the typical HA production setup.
- Connection-anchor name disambiguation. Every per-server request handler in master_request.c filters IS_MASTER_CONN_NAME_HA_* prefixes from its anchor walk. The pattern is repeated 8+ times in the source. Adding a new request handler that operates on cub_server connections and forgetting this filter would silently target HA-internal connections, a common bug shape worth flagging in code review.
- auto_restart_server activation is one-shot at master startup. Toggling the sysparam at runtime doesn’t switch the server monitor on or off; the master would have to be restarted. This is intentional: the supervisor’s state (the m_server_entry_map) requires consistent registration history from boot to be useful.
- Revive race with explicit shutdown. css_process_shutdown_reviving_server exists specifically because an operator may issue commdb -S <db> while server_monitor is in the middle of re-forking that same server. The handler sets a sentinel that try_revive_server checks before actually doing the fork; if the shutdown landed first, the revive is cancelled.
- m_last_revived_time is recorded but not enforced. The server_entry records the timestamp of each revive attempt, but the consumer doesn’t currently use it to rate-limit a flapping server. A buggy cub_server that panics within seconds of restart will be revived in a tight loop. This is a known gap; flapping detection is left to external monitoring.
- Duplicate-master check is per-port, not per-config. css_does_master_exist (port_id) only checks whether the port is in use. Two cub_master instances reading different cubrid.conf files but assigned the same port will conflict; two instances on different ports will both run, neither aware of the other.
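The revive/shutdown race is a check-then-act problem: the shutdown mark and the pre-fork check have to happen under the same lock. A minimal sketch, assuming a per-server state enum; revive_gate and its methods are hypothetical and only model the sentinel described above, not CUBRID’s code.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

class revive_gate
{
  public:
    // The select loop saw the server's connection die without an unregister.
    void mark_crashed (const std::string &name)
    {
      std::lock_guard<std::mutex> lock (m_mutex);
      m_state[name] = state::NEED_REVIVE;
    }

    // Shutdown handler (cf. css_process_shutdown_reviving_server) sets the sentinel.
    void request_shutdown (const std::string &name)
    {
      std::lock_guard<std::mutex> lock (m_mutex);
      m_state[name] = state::SHUTDOWN_REQUESTED;
    }

    // Monitor thread calls this right before it would fork. Returns true if
    // the revive may proceed; a pending shutdown (or a revive already in
    // progress) cancels it. Check and transition happen under one lock.
    bool begin_revive (const std::string &name)
    {
      std::lock_guard<std::mutex> lock (m_mutex);
      auto it = m_state.find (name);
      if (it == m_state.end () || it->second != state::NEED_REVIVE)
        return false;
      it->second = state::REVIVING;
      return true;
    }

  private:
    enum class state { NEED_REVIVE, REVIVING, SHUTDOWN_REQUESTED };
    std::mutex m_mutex;
    std::unordered_map<std::string, state> m_state;
};
```

If the shutdown lands after begin_revive has returned true, the new process already exists and the shutdown path has to stop it like any other server; the gate only closes the window before the fork.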
Open Questions
- Containerised deployment. The double-fork daemonisation is unnecessary inside containers, where an init such as tini (what Docker’s --init runs as PID 1) or the container runtime itself already supervises the main process. The NO_DAEMON env var skips the fork and adds a SIGINT cleanup handler, but leaves other daemon behaviours, such as the error-log path, unchanged. A fully container-aware mode would also redirect logs to stdout/stderr and not write the <hostname>_master.err file.
- Server monitor flapping policy. As noted, there is no rate limiting on revive. One possible policy: exponential backoff with a hard ceiling (e.g., at most 5 revives in 5 minutes, then deregister and require operator intervention).
- Request opcode authentication. Request opcodes have no authentication: anyone who can connect to master’s port can issue KILL_MASTER_SERVER. Production deployments rely on network-level firewalling (the master port is typically not exposed). No first-class auth mechanism is documented.
- HA detail. This doc deliberately stays at the connection-registry level for HA. A complete picture of the HA-info-query semantics (GET_HA_NODE_LIST, GET_HA_PROCESS_LIST, etc.) belongs in cubrid-heartbeat.md rather than here.
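The flapping policy floated above (at most 5 revives in 5 minutes, with exponential backoff between attempts) could look like the following. This is a hypothetical sketch of a proposal, not existing CUBRID behaviour; revive_limiter, its decision values, and the base delay are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <deque>

// Sliding-window revive limiter with exponential backoff. Time is passed in
// explicitly so the policy is deterministic and easy to test.
class revive_limiter
{
  public:
    using clock = std::chrono::steady_clock;
    enum class decision { REVIVE_NOW, WAIT, GIVE_UP };

    revive_limiter (int max_revives, clock::duration window, clock::duration base_delay)
      : m_max (max_revives), m_window (window), m_base (base_delay) {}

    decision decide (clock::time_point now)
    {
      // Forget revives that have fallen out of the sliding window.
      while (!m_history.empty () && now - m_history.front () > m_window)
        m_history.pop_front ();

      if (static_cast<int> (m_history.size ()) >= m_max)
        return decision::GIVE_UP;   // hard ceiling: deregister, require an operator

      if (!m_history.empty ())
        {
          // Delay doubles with each recent revive: base, 2*base, 4*base, ...
          clock::duration delay = m_base * (1 << (m_history.size () - 1));
          if (now - m_history.back () < delay)
            return decision::WAIT;
        }
      m_history.push_back (now);    // record this revive attempt
      return decision::REVIVE_NOW;
    }

  private:
    int m_max;
    clock::duration m_window;
    clock::duration m_base;
    std::deque<clock::time_point> m_history;
};
```

A caller in the monitor thread would pass std::chrono::steady_clock::now (). With revive_limiter (5, std::chrono::minutes (5), std::chrono::seconds (1)), the first crash is revived immediately, a second crash within one second gets WAIT, and once five revives sit inside the window the next decision is GIVE_UP.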
Sources
- src/executables/master.c: daemon entry, init, select loop, socket anchor management, signal handlers
- src/executables/master_request.c: opcode dispatcher and the per-opcode handlers
- src/executables/master_request.h: opcode enum, IS_MASTER_CONN_NAME_* macros, handler prototypes
- src/executables/master_server_monitor.{cpp,hpp}: C++ process supervisor with producer-consumer job queue
- src/executables/master_util.{c,h}: config reader and small process helpers
- src/executables/AGENTS.md: agent guide
- Adjacent docs: cubrid-heartbeat.md (HA-replication subsystem layered on the same daemon; covers master_heartbeat.c and the HA-info-query handlers cross-referenced from this doc), cubrid-broker.md (broker is a separate daemon, but commdb -O reaches it through master), cubrid-cub-admin.md (the unified admin CLI; the cubrid service family forks master, and the cubrid commdb verb is what most of master’s request handlers serve), cubrid-overview-server-architecture.md (master’s place in the four-process model)