
CUBRID cub_master Process — Daemon Lifecycle, Connection Registry, Request Dispatch, and the Auto-Restart Server Monitor

A service-registry daemon is the long-lived process that owns the per-host map of “what database servers are running here, on what ports, with what arguments.” Every multi-server engine needs one: clients must be able to ask something “where’s database mydb?” without already knowing the per-server port. The design choices divide along two axes:

  1. Per-host vs. per-cluster. A per-host registry (PostgreSQL’s postmaster is per-cluster by definition but per-host in practice; MySQL relies on systemd; Oracle has CRS for cluster-wide supervision and the listener per host) keeps the registry coresident with the processes it tracks. A per-cluster registry (etcd-backed registries in newer engines) survives host failures but adds a network dependency. CUBRID picks per-host: every CUBRID install runs one cub_master per host, and clients connect to it on a well-known port to discover the actual cub_server ports.

  2. Process supervision scope. A bare registry just answers “where’s X?” and lets an external supervisor (systemd, monit, runit) handle restarts. A supervising registry also tracks PIDs and re-forks on abnormal exit. CUBRID’s master defaults to bare registry but has an opt-in auto_restart_server mode that activates the server_monitor C++ subsystem — a small process supervisor embedded in cub_master that re-execs cub_server from the recorded argv when the kernel reaps a child unexpectedly.

The cub_master binary is therefore three things bundled into one: (a) a request server that handles status / shutdown queries from commdb / cubrid commdb, (b) a connection registry that brokers introductions between clients and cub_server instances, and (c) optionally a process supervisor for the server instances themselves. The HA replication subsystem (cubrid-heartbeat.md) layers on top of all three.

| Engine | Per-host daemon | Process supervision | Client discovery |
| --- | --- | --- | --- |
| PostgreSQL | postmaster per data directory; one Postgres backend per client connection | postmaster re-forks backends on crash; cluster-wide restart on PANIC | Client connects directly to the postmaster port (5432) |
| MySQL | mysqld per data directory; thread-per-connection | systemd / mysql.server script restarts mysqld; no in-process supervision | Direct TCP/UDS connect to mysqld (3306 / socket file) |
| Oracle | Oracle Net Listener (tnslsnr, controlled via lsnrctl) per host accepts connections and brokers to the right Oracle instance; OHASD/CRSD for clustered Oracle | OHASD/CRSD restart Oracle instances on failure | Client connects to the listener (1521); listener forwards to the appropriate instance |
| MongoDB | mongod per node; mongos is the router for sharded clusters | systemd; replica set primary handles failover via election | Direct connect to mongod (or to mongos for sharded) |
| CUBRID | cub_master per host; brokers between clients and cub_servers | Optional in-process server_monitor (enabled by auto_restart_server); HA failover via master_heartbeat.c (separate doc) | Client connects to cub_master (default 1523), receives the per-database cub_server port, then connects to that |

CUBRID’s listener (cub_master) is closest in spirit to the Oracle listener (the per-host tnslsnr process driven by lsnrctl): a per-host introduction broker. The distinguishing trait is the optional in-process supervisor — most engines delegate restarts to systemd/init, but CUBRID’s master can own server lifecycle directly when configured to.

cub_master’s main (in master.c:1207) runs the following sequence:

```c
// master.c::main (paraphrased)
utility_initialize ();                            // message catalog
util_config_ret = master_util_config_startup ((argc > 1) ? argv[1] : NULL,
                                              &port_id);  // read $CUBRID/conf/cubrid.conf
GETHOSTNAME (hostname, ...);                      // for error log filename
er_init (errlog, ER_NEVER_EXIT);                  // <hostname>_master.err
if (css_does_master_exist (port_id))              // duplicate-master check
  goto cleanup;                                   // bail out — another master already there
msgcat_final ();                                  // close catalog before fork
er_final (ER_ALL_FINAL);
if (envvar_get ("NO_DAEMON") == NULL)
  css_daemon_start ();                            // fork into background
utility_initialize ();                            // reopen catalog in child
er_init (errlog, ER_NEVER_EXIT);
time (&css_Start_time);                           // record start time for status
if (css_master_init (port_id, css_Master_socket_fd) != NO_ERROR)
  goto cleanup;                                   // socket bind / signal setup failed
if (envvar_get ("NO_DAEMON") != NULL)
  os_set_signal_handler (SIGINT, css_master_cleanup);
if (!HA_DISABLED ())
  hb_master_init ();                              // see cubrid-heartbeat.md
auto_Restart_server = prm_get_bool_value (PRM_ID_AUTO_RESTART_SERVER);
if (auto_Restart_server)
  master_Server_monitor.reset (new server_monitor ());
conn = css_make_conn (css_Master_socket_fd[0]);
css_add_request_to_socket_queue (conn, false, NULL, css_Master_socket_fd[0],
                                 READ_WRITE, 0, &css_Master_socket_anchor);
/* ... add second socket fd for IPv6 / UDS ... */
/* ... main select loop ... */
```

The flow has six observable phases:

  1. Config + duplicate detection. master_util_config_startup reads cubrid.conf for the master_shm_id and listening port. css_does_master_exist probes the port — if another master already listens, the new instance refuses to start.
  2. Error-log init. Errors land in $CUBRID/log/<hostname>_master.err. Done before the fork because both pre-fork and post-fork code can hit errors.
  3. Daemonisation. css_daemon_start does the standard double-fork to detach from the controlling terminal. The NO_DAEMON env var skips this — used for foreground debugging and inside containers where an init process (PID 1) already supervises the daemon.
  4. Master socket setup. css_master_init binds the listening socket(s) (one IPv4, one IPv6 / UDS), installs SIGCHLD / SIGINT / SIGTERM handlers, and seeds css_Master_socket_anchor (the doubly-linked list of every active connection — the select() set).
  5. Optional HA bootstrap. If cubrid.conf has ha_mode = on, hb_master_init initialises the heartbeat subsystem (covered in cubrid-heartbeat.md) which adds heartbeat-specific connection types and worker threads.
  6. Optional server monitor. If auto_restart_server = on, instantiate the server_monitor C++ object (described below). This is the in-process supervisor that re-execs crashed cub_server instances.
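The duplicate check in phase 1 can be sketched as a loopback connect probe. This is a hypothetical reconstruction (the real css_does_master_exist may use a different mechanism, and the function name below is ours):

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Try to connect to the master port on loopback. A successful connect
// means some process (presumably another master) already listens there,
// so a new master should refuse to start.
bool master_port_in_use (int port)
{
  int fd = socket (AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return false;                               // cannot tell; assume free

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons (static_cast<uint16_t> (port));
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);

  bool in_use = connect (fd, reinterpret_cast<sockaddr *> (&addr),
                         sizeof (addr)) == 0;
  close (fd);
  return in_use;                                // true => another master is there
}
```

One consequence of a probe like this (noted again in the gotchas below): it is per-port, not per-config, so two masters on different ports would both happily run.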

After init, the master enters a select() loop multiplexing every connection in css_Master_socket_anchor. Each socket-queue entry has a name field that identifies what kind of connection it is:

  • Listening sockets (IPv4, IPv6/UDS) — accept new clients.
  • Registered cub_server connections — established when a cub_server boots and registers itself with master (so master can broker incoming clients to it). The connection name is the database name (or <dbname>@<hostname> for HA).
  • HA-server / HA-copylog / HA-applylog connections — held open by the heartbeat subsystem for replication-process liveness tracking. Identified by name prefixes (IS_MASTER_CONN_NAME_HA_SERVER, IS_MASTER_CONN_NAME_HA_COPYLOG, IS_MASTER_CONN_NAME_HA_APPLYLOG).
  • Driver / commdb / management-tool connections — short-lived client requests asking for status or shutdown (commdb, cubrid commdb, the cubrid manager web tool).
  • Client introductions — a client connecting to find a database; master answers with the cub_server’s port and the client reconnects directly. The introduction connection itself is then closed.

Every iteration of the select loop:

  1. Drains accept-pending listening sockets via css_process_master_request (which also re-arms them in the anchor).
  2. For each connection with data ready, reads a request and dispatches by opcode.
  3. For connections that closed (peer EOF), removes them from the anchor and frees the per-connection state. If the closed connection was a registered cub_server and auto_restart_server is on, enqueues a REVIVE_SERVER job to the server monitor.
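The per-iteration work can be sketched as one poll pass, with a std::vector of fds standing in for the richer css_Master_socket_anchor entries (an illustrative simplification, not CUBRID's code):

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <vector>

// One select() pass over a simplified "anchor": returns the fds that
// have data (or a pending accept) ready. The real loop then dispatches
// each ready fd by connection kind and opcode.
std::vector<int> poll_anchor_once (const std::vector<int> &anchor, int timeout_ms)
{
  fd_set readset;
  FD_ZERO (&readset);
  int maxfd = -1;
  for (int fd : anchor)
    {
      FD_SET (fd, &readset);
      if (fd > maxfd)
        maxfd = fd;
    }

  timeval tv{timeout_ms / 1000, (timeout_ms % 1000) * 1000};
  std::vector<int> ready;
  if (select (maxfd + 1, &readset, nullptr, nullptr, &tv) > 0)
    for (int fd : anchor)
      if (FD_ISSET (fd, &readset))
        ready.push_back (fd);           // ready for read (or EOF)
  return ready;
}
```

EOF detection falls out of the same pass: a "readable" fd whose subsequent read() returns 0 is a closed peer, which is the trigger for the REVIVE_SERVER enqueue described in step 3.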

The opcode dispatch lives at master_request.c:1947. Three families:

```c
// master_request.c — request dispatch (paraphrased)
switch (request) {
  // Status family
  case GET_START_TIME: css_process_start_time_info (...); break;
  case GET_SHUTDOWN_TIME: css_process_shutdown_time_info (...); break;
  case GET_SERVER_COUNT: css_process_server_count_info (...); break;
  case GET_REQUEST_COUNT: css_process_request_count_info (...); break;
  case GET_SERVER_LIST: css_process_server_list_info (...); break;
  case GET_ALL_COUNT: css_process_all_count_info (...); break;
  case GET_ALL_LIST: css_process_all_list_info (...); break;
  case GET_SERVER_HA_MODE: css_process_get_server_ha_mode (...); break;
  case GET_SERVER_STATE: /* ... */; break;

  // Shutdown family
  case KILL_SLAVE_SERVER: css_process_kill_slave (...); break;
  case KILL_MASTER_SERVER: css_process_kill_master (); break;
  case KILL_SERVER_IMMEDIATE: css_process_kill_immediate (...); break;
  case START_SHUTDOWN: css_process_start_shutdown (...); break;

  // HA family (when ha_mode = on)
  case GET_HA_PING_HOST_INFO: css_process_ha_ping_host_info (...); break;
  case GET_HA_NODE_LIST: css_process_ha_node_list_info (..., false); break;
  case GET_HA_NODE_LIST_VERBOSE: css_process_ha_node_list_info (..., true); break;
  case GET_HA_PROCESS_LIST: css_process_ha_process_list_info (..., false); break;
  case GET_HA_PROCESS_LIST_VERBOSE: css_process_ha_process_list_info (..., true); break;
  case GET_HA_ADMIN_INFO: css_process_ha_admin_info (...); break;
  case KILL_ALL_HA_PROCESS: css_process_kill_all_ha_process (...); break;
  case DEREGISTER_HA_PROCESS_BY_PID: css_process_deregister_ha_process_by_pid (...); break;
  case DEREGISTER_HA_PROCESS_BY_ARGS: css_process_deregister_ha_process_by_args (...); break;
  case START_HA_UTIL_PROCESS: css_process_start_ha_util_process (...); break;
}
```

Each handler reads its arguments from the request packet, performs the operation (often by walking css_Master_socket_anchor and inspecting per-connection name fields), and writes a response back via the same connection.

The status family answers what commdb -P / commdb -O / cubrid service status / cubrid server status / cubrid heartbeat status ask. The shutdown family is what commdb -S / commdb -A / cubrid server stop send. The HA family is what cubrid heartbeat … sends and is the public interface to the heartbeat subsystem.

Several handlers — css_process_kill_slave, css_process_kill_immediate, css_process_start_shutdown_by_name, css_process_get_server_ha_mode, css_process_shutdown_reviving_server — walk css_Master_socket_anchor for an entry whose name matches the target server name and is not an HA-replication connection (!IS_MASTER_CONN_NAME_HA_SERVER etc.). The repeated guard filters out the heartbeat-internal connections that share the same anchor list but represent different processes — failing to exclude them would cause shutdown commands to target the wrong process.

This is the cross-cutting reason IS_MASTER_CONN_NAME_* macros appear so often in master_request.c: every per-server operation has to disambiguate “the cub_server connection for database X” from “the HA-copylog/applylog connection for database X” because both share the database name.
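A sketch of that guard pattern, with an illustrative entry struct and a boolean standing in for the IS_MASTER_CONN_NAME_HA_* prefix macros (the real anchor entries and name scheme differ):

```cpp
#include <string>
#include <vector>

struct conn_entry
{
  std::string name;   // database name (shared by server and HA connections)
  bool is_ha;         // stand-in for the IS_MASTER_CONN_NAME_HA_* prefix test
};

// Find the cub_server connection for `dbname`, skipping HA-internal
// entries that carry the same database name. Forgetting the !is_ha guard
// is exactly the bug shape the text warns about.
const conn_entry *find_server_conn (const std::vector<conn_entry> &anchor,
                                    const std::string &dbname)
{
  for (const conn_entry &e : anchor)
    if (e.name == dbname && !e.is_ha)
      return &e;
  return nullptr;
}
```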

When auto_restart_server = on, master instantiates a server_monitor (declared in master_server_monitor.hpp):

```cpp
class server_monitor
{
  public:
    enum class job_type
    {
      REGISTER_SERVER = 0,        // a new cub_server connected — record its PID + argv
      UNREGISTER_SERVER = 1,      // cub_server cleanly deregistered — drop the entry
      REVIVE_SERVER = 2,          // cub_server connection died unexpectedly — re-fork
      CONFIRM_REVIVE_SERVER = 3,  // post-fork check that the new server actually came up
      SHUTDOWN_SERVER = 4,        // explicit shutdown request — drop entry without revive
      JOB_MAX
    };

    void produce_job (job_type, int pid, const std::string &exec_path,
                      const std::string &args, const std::string &server_name);

  private:
    std::unordered_map<std::string, server_entry> m_server_entry_map;
    std::unique_ptr<std::thread> m_monitoring_thread;
    std::queue<job> m_job_queue;
    std::mutex m_server_monitor_mutex;
    std::condition_variable m_monitor_cv_consumer;

    void server_monitor_thread_worker ();  // consumer loop on m_job_queue
};

class server_entry
{
    int m_pid;
    std::string m_exec_path;
    std::unique_ptr<char *[]> m_argv;      // saved for re-exec
    volatile bool m_need_revive;
    std::chrono::steady_clock::time_point m_last_revived_time;
};
```

The supervisor is a producer-consumer queue with one consumer thread. produce_job is called from the master’s select-loop thread when various events occur:

  • REGISTER_SERVER — a new cub_server connection appears in the anchor and identifies itself; master records the PID, exec path, args, and server name in m_server_entry_map.
  • UNREGISTER_SERVER — a cub_server cleanly disconnects (e.g., normal shutdown via commdb -S); the entry is dropped without revive.
  • REVIVE_SERVER — a cub_server connection died without prior unregister; the entry’s m_need_revive is set to true.
  • CONFIRM_REVIVE_SERVER — after a revive fork, master enqueues a confirmation job that verifies the new process actually opened a new connection within a timeout.
  • SHUTDOWN_SERVER — explicit shutdown; entry dropped; no revive even if auto_restart_server is on.

The consumer thread (server_monitor_thread_worker) blocks on m_monitor_cv_consumer waiting for jobs and processes them serially — registration and revival can’t race because they share the lock around m_server_entry_map and the queue.
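The queue discipline can be sketched as a minimal single-consumer monitor. mini_monitor and its string jobs are illustrative stand-ins for server_monitor's typed job objects, not CUBRID code:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

class mini_monitor
{
  public:
    mini_monitor () : m_worker (&mini_monitor::consume, this) {}
    ~mini_monitor ()
    {
      produce ("");                    // empty job = stop sentinel
      m_worker.join ();
    }

    // Producer side: called from the "select loop" thread.
    void produce (std::string job)
    {
      {
        std::lock_guard<std::mutex> lk (m_mutex);
        m_jobs.push (std::move (job));
      }
      m_cv.notify_one ();
    }

    std::vector<std::string> processed ()
    {
      std::lock_guard<std::mutex> lk (m_mutex);
      return m_done;
    }

  private:
    // Consumer side: one thread drains jobs serially, so registration
    // and revival cannot race each other.
    void consume ()
    {
      for (;;)
        {
          std::unique_lock<std::mutex> lk (m_mutex);
          m_cv.wait (lk, [this] { return !m_jobs.empty (); });
          std::string job = std::move (m_jobs.front ());
          m_jobs.pop ();
          if (job.empty ())
            return;                    // stop sentinel
          m_done.push_back (job);      // "handle" the job
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::string> m_jobs;
    std::vector<std::string> m_done;
    std::thread m_worker;              // declared last: starts after state is ready
};
```

The single lock covering both the queue and the processed state mirrors how m_server_monitor_mutex covers both m_job_queue and m_server_entry_map.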

try_revive_server is the actual fork point: it forks and execvs <exec_path> with the saved argv, records the new PID against the existing server-name entry, and updates m_last_revived_time so a flapping server can be rate-limited (though the rate-limiting policy itself isn’t strict in current code — m_last_revived_time is recorded but the consumer doesn’t read it before the next revive).
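The fork-and-exec step can be sketched as follows; the revive helper and its signature are hypothetical, assuming only that the entry preserved the exec path and argv strings:

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Fork and exec `exec_path` with the saved argument strings. Returns the
// new child PID to record against the server-name entry (alongside an
// updated last-revived timestamp, as the text describes).
pid_t revive (const std::string &exec_path, const std::vector<std::string> &args)
{
  pid_t pid = fork ();
  if (pid == 0)
    {
      // child: rebuild a NULL-terminated argv from the saved strings
      std::vector<char *> argv;
      for (const std::string &a : args)
        argv.push_back (const_cast<char *> (a.c_str ()));
      argv.push_back (nullptr);
      execv (exec_path.c_str (), argv.data ());
      _exit (127);                     // execv only returns on failure
    }
  return pid;                          // parent: pid < 0 means fork failed
}
```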

auto_restart_server is the only condition that activates this supervisor; without it, master_Server_monitor is a null unique_ptr and master doesn’t track or restart child servers itself. (In production, operators using HA almost always set auto_restart_server = on because heartbeat failover assumes crashed servers come back; in standalone HA-disabled deployments, the operator’s choice depends on whether they have an external supervisor.)

master_util.c and master_util.h — small helpers


Three utility helpers used during boot:

  • master_util_config_startup (config_path, &port_id) — reads the master section of cubrid.conf; returns false if the file is missing or unparseable. The argv[1] from main is passed as a config-file path override (rarely used; defaults to $CUBRID/conf/cubrid.conf).
  • master_util_wait_proc_terminate — used by css_process_kill_immediate to wait for a SIGTERM-ed child to actually die before reporting completion to the requester.
  • master_util_get_eof_message — formats end-of-stream messages for connections that drop unexpectedly.

These are intentionally small — master_util.c is ~90 lines, not a substantial module on its own.
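A hypothetical shape for the wait helper (the real signature in master_util.c may differ): poll waitpid with WNOHANG so the requester is not answered until the SIGTERM-ed child is really gone.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if `pid` exited (and was reaped) within roughly timeout_ms.
bool wait_proc_terminate (pid_t pid, int timeout_ms)
{
  for (int waited = 0; waited <= timeout_ms; waited += 10)
    {
      int status = 0;
      if (waitpid (pid, &status, WNOHANG) == pid)
        return true;                   // reaped: the child is gone
      usleep (10 * 1000);              // poll every 10 ms
    }
  return false;                        // still alive after the timeout
}
```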

Master shutdown can be initiated from three places:

  1. commdb -S <db> — kills one cub_server (KILL_SLAVE_SERVER / KILL_SERVER_IMMEDIATE); master sends SIGTERM to the child, waits, drops the entry. Doesn’t shut down master itself.
  2. commdb -A — kills every registered cub_server. Each killed in turn; master remains running.
  3. commdb for master (KILL_MASTER_SERVER, or cubrid service stop) — css_process_kill_master initiates master shutdown: cleanly terminates the server monitor’s thread, sends shutdown to every registered cub_server, closes the listening socket, exits.

css_master_cleanup is the SIGINT handler installed in NO_DAEMON (foreground debug) mode; it runs the same shutdown sequence as KILL_MASTER_SERVER.

| Symbol | Role |
| --- | --- |
| main | Entry; config; duplicate-master check; daemonise; init sockets and signals; optional HA + server-monitor instantiation; enter select loop |
| css_master_init | Sets up listening sockets (IPv4 + IPv6/UDS), installs SIGCHLD/SIGINT/SIGTERM, seeds the connection anchor |
| css_daemon_start | Standard double-fork daemonisation; setsid, redirect stdio to /dev/null |
| css_master_error | Master-specific error printer that goes to both stderr and <hostname>_master.err |
| css_master_cleanup | SIGINT handler in foreground mode; runs shutdown sequence |
| css_Master_socket_fd[] | The two listening file descriptors |
| css_Master_socket_anchor | The connection anchor (doubly-linked list) the select loop iterates |
| css_Start_time | Recorded at boot for GET_START_TIME responses |
| auto_Restart_server | Bool mirror of PRM_ID_AUTO_RESTART_SERVER; gates master_Server_monitor instantiation |
| Symbol | Role |
| --- | --- |
| process_master_request | Top-level opcode switch (line 1947); status, shutdown, HA families |
| css_process_start_time_info / _shutdown_time_info | GET_START_TIME / GET_SHUTDOWN_TIME |
| css_process_server_count_info / _server_list_info | GET_SERVER_COUNT / GET_SERVER_LIST — for commdb -P |
| css_process_all_count_info / _all_list_info | GET_ALL_COUNT / GET_ALL_LIST — for commdb -O (servers + brokers + pl) |
| css_process_request_count_info | Master’s own request counter — diagnostic |
| css_process_kill_slave / _kill_immediate / _kill_master | Shutdown handlers |
| css_process_start_shutdown / _start_shutdown_by_name / _shutdown / _stop_shutdown | Two-phase shutdown coordination |
| css_process_shutdown_reviving_server | Special shutdown path for a server that’s currently being revived (race avoidance with server_monitor) |
| css_process_get_server_ha_mode | GET_SERVER_HA_MODE — used by HA logic |
| css_process_register_ha_process / _deregister_ha_process / _change_ha_mode | HA-process lifecycle (heartbeat subsystem ↔ master) |
| css_process_ha_ping_host_info / _ha_node_list_info / _ha_admin_info | HA-info-query handlers |
| css_process_get_eof | Generic EOF responder for short-lived probe connections |

Server monitor (master_server_monitor.{cpp,hpp})

| Symbol | Role |
| --- | --- |
| server_monitor (class) | The supervisor; owns m_server_entry_map, m_job_queue, the consumer thread |
| server_monitor::produce_job | Producer entry called from the master select loop |
| server_monitor::server_monitor_thread_worker | Consumer loop; pulls jobs and dispatches |
| server_monitor::register_server_entry / remove_server_entry / revive_server / try_revive_server / check_server_revived / shutdown_server | Per-job handlers |
| server_entry (inner class) | One per registered cub_server; PID + exec path + saved argv + revive timestamps |
| master_Server_monitor (global unique_ptr) | Singleton instance; nullptr unless auto_restart_server is on |
| auto_Restart_server (global bool) | Mirror of the sysparam |
| Symbol | Role |
| --- | --- |
| master_util_config_startup | Reads cubrid.conf; returns port ID and validity |
| master_util_wait_proc_terminate | waitpid wrapper used after sending SIGTERM to a child |
| master_util_get_eof_message | EOF message formatter |
| Symbol | Path |
| --- | --- |
| master.c::main | src/executables/master.c:1207 |
| css_master_init | src/executables/master.c:259 |
| process_master_request (request switch) | src/executables/master_request.c:1947 |
| css_process_start_time_info | src/executables/master_request.c:166 |
| css_process_server_list_info | src/executables/master_request.c:286 |
| css_process_all_list_info | src/executables/master_request.c:386 |
| css_process_kill_slave | src/executables/master_request.c:498 |
| css_process_kill_immediate | src/executables/master_request.c:579 |
| css_process_start_shutdown_by_name | src/executables/master_request.c:615 |
| css_process_kill_master | src/executables/master_request.c:686 |
| css_process_register_ha_process | src/executables/master_request.c:913 |
| css_process_change_ha_mode | src/executables/master_request.c:956 |
| server_monitor (class) | src/executables/master_server_monitor.hpp:38 |
| server_monitor::job_type (enum) | src/executables/master_server_monitor.hpp:43 |

Symbol names are the canonical anchors; the line numbers are hints, valid as of this doc’s `updated:` date.

  • HA-replication and master are separate. This doc covers master’s daemon shape and request dispatch. The HA-replication subsystem (master_heartbeat.c, ~7300 lines) sits on top of the same select loop and uses additional connection types identified by IS_MASTER_CONN_NAME_HA_* prefixes. Heartbeat semantics, peer-discovery, and failover live in cubrid-heartbeat.md.
  • Server monitor and heartbeat are independent. The C++ server_monitor (this doc) is per-host process supervision. Heartbeat is per-cluster replication-state coordination. A configuration could enable one without the other (HA on, auto_restart_server off — heartbeat tracks replication state but doesn’t restart crashed servers; HA off, auto_restart_server on — single-host with crash-restart). Both on is the typical HA production setup.
  • Connection-anchor name disambiguation. Every per-server request handler in master_request.c filters IS_MASTER_CONN_NAME_HA_* prefixes from its anchor walk. The pattern is repeated 8+ times in the source. Adding a new request handler that operates on cub_server connections and forgetting this filter would silently target HA-internal connections — common bug shape worth flagging in code review.
  • auto_restart_server activation is one-shot at master startup. Toggling the sysparam at runtime doesn’t switch the server monitor on or off; the master would have to be restarted. This is intentional — the supervisor’s state (the m_server_entry_map) requires consistent registration history from boot to be useful.
  • Revive race with explicit shutdown. css_process_shutdown_reviving_server exists specifically because an operator may issue commdb -S <db> while server_monitor is in the middle of re-forking that same server. The handler sets a sentinel that try_revive_server checks before actually doing the fork; if the shutdown landed first, the revive is cancelled.
  • m_last_revived_time is recorded but not enforced. The server_entry records the timestamp of each revive attempt, but the consumer doesn’t currently use it to rate-limit a flapping server. A buggy cub_server that panics within seconds of restart will be revived in a tight loop. This is a known gap; flapping detection is left to external monitoring.
  • Duplicate-master check is per-port, not per-config. css_does_master_exist (port_id) only checks if the port is in use. Two cub_master instances reading different cubrid.confs but assigned the same port will conflict; two instances on different ports will both run, neither aware of the other.
  • Containerised deployment. The double-fork daemonisation is unnecessary inside container init systems (Docker init=tini, Kubernetes pod with init container). The NO_DAEMON env var skips the fork but doesn’t change other daemon behaviours (signal handling, error log path). A fully container-aware mode would also redirect logs to stdout/stderr and not write the <hostname>_master.err file.
  • Server monitor flapping policy. As noted, no rate limiting on revive. A documented policy would be: exponential backoff with a hard ceiling (e.g., max 5 revives in 5 minutes, then deregister and require operator intervention).
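That proposed window policy is straightforward to sketch. This illustrates the documented-but-unimplemented idea (allow at most N revives inside a sliding window, then give up and require operator intervention); it is not current CUBRID code:

```cpp
#include <chrono>
#include <deque>

class revive_limiter
{
  public:
    revive_limiter (int max_revives, std::chrono::steady_clock::duration window)
      : m_max (max_revives), m_window (window) {}

    // Returns true if another revive attempt is allowed at time `now`;
    // records the attempt when allowed.
    bool allow (std::chrono::steady_clock::time_point now)
    {
      // drop attempts that have aged out of the sliding window
      while (!m_times.empty () && now - m_times.front () > m_window)
        m_times.pop_front ();
      if (static_cast<int> (m_times.size ()) >= m_max)
        return false;                  // flapping: stop reviving
      m_times.push_back (now);
      return true;
    }

  private:
    int m_max;
    std::chrono::steady_clock::duration m_window;
    std::deque<std::chrono::steady_clock::time_point> m_times;
};
```

try_revive_server already records m_last_revived_time, so a policy like this would only need the consumer to consult it before forking.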
  • Request opcode authentication. Request opcodes have no authentication — anyone who can connect to master’s port can issue KILL_MASTER_SERVER. Production deployments rely on network-level firewalling (the master port is typically not exposed). A first-class auth mechanism is undocumented.
  • HA detail. This doc deliberately stays at the connection-registry level for HA. A complete picture of the HA-info-query semantics (GET_HA_NODE_LIST, GET_HA_PROCESS_LIST, etc.) belongs in cubrid-heartbeat.md rather than here.
  • src/executables/master.c — daemon entry, init, select loop, socket anchor management, signal handlers
  • src/executables/master_request.c — opcode dispatcher and the per-opcode handlers
  • src/executables/master_request.h — opcode enum, IS_MASTER_CONN_NAME_* macros, handler prototypes
  • src/executables/master_server_monitor.{cpp,hpp} — C++ process supervisor with producer-consumer job queue
  • src/executables/master_util.{c,h} — config reader and small process helpers
  • src/executables/AGENTS.md — agent guide
  • Adjacent docs: cubrid-heartbeat.md (HA-replication subsystem layered on the same daemon — covers master_heartbeat.c and the HA-info-query handlers cross-referenced from this doc), cubrid-broker.md (broker is a separate daemon, but commdb -O reaches it through master), cubrid-cub-admin.md (the unified admin CLI; cubrid service family forks master and the cubrid commdb verb is what most of master’s request handlers serve), cubrid-overview-server-architecture.md (master’s place in the four-process model)