CUBRID Thread and Worker Pool — Workers, Daemons, Lock-Free Primitives, and Critical Sections
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
Every database engine is, at its lowest layer, a thread system: a way of turning a stream of incoming requests and a set of recurring background duties into actual operating-system threads, and a way of giving each of those threads enough state to do its job without paying for that state on every call. The CUBRID server is no different. Before the lock manager, before MVCC, before recovery, there is a thread pool, a per-thread context, a set of long-running background daemons, and a small set of synchronization primitives those higher layers reuse.
Two textbooks frame the design space.
Database Internals (Petrov, Ch. 14 “Concurrency Control” and the storage-layer chapters) names the basic split: the engine has worker threads, which exist to run user-driven units of work (connections, queries, scans, page reads), and daemon threads, which exist to do recurring engine-driven work (flushing dirty pages, trimming WAL, detecting deadlocks, vacuuming dead versions). The lifetimes are different — a worker is born to do one task and is recycled to do another, while a daemon is born once and runs until shutdown. The synchronisation requirements are also different — a worker is usually woken by a request arriving on a queue, while a daemon is usually woken by a clock or a wakeup hint from another thread.
The Art of Multiprocessor Programming (Herlihy & Shavit) names the
algorithmic problem the thread system has to solve as soon as the
worker count grows past a handful: lock-free data structures for
the queues, free-lists, and hash tables those workers and daemons
share. A naive design with a single mutex around the task queue or
the lock table serializes every thread on the critical line of code
covered by that mutex, and the engine stops scaling. The book gives
the canonical patterns — the Treiber stack (CAS-based push/pop), the
Michael–Scott queue (lock-free FIFO), and lock-free hash tables built
from lock-free linked lists (Maged Michael’s list-based sets, the
Shalev–Shavit split-ordered list). Every modern engine that pretends
to scale picks some subset of these and uses them everywhere a thread
might wait.
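To make the CAS-retry pattern concrete, here is a minimal Treiber-style stack — illustrative only, not CUBRID code; a production version (like the lock-free structures described later in this document) also needs a safe memory-reclamation scheme (epochs or hazard pointers) before the delete in pop is safe under concurrent access.

```cpp
// Minimal Treiber stack: the CAS-retry pattern named above (illustrative sketch).
#include <atomic>

template <typename T>
class treiber_stack
{
  public:
    void push (T value)
    {
      node *n = new node { std::move (value), m_top.load (std::memory_order_relaxed) };
      // retry until the CAS installs n as the new top; on failure n->next is
      // refreshed with the current top and we try again
      while (!m_top.compare_exchange_weak (n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed))
        ;
    }

    bool pop (T &out)
    {
      node *old_top = m_top.load (std::memory_order_acquire);
      while (old_top != nullptr
             && !m_top.compare_exchange_weak (old_top, old_top->next,
                                              std::memory_order_acquire,
                                              std::memory_order_acquire))
        ;   // old_top refreshed; retry
      if (old_top == nullptr)
        {
          return false;               // stack was empty
        }
      out = std::move (old_top->value);
      delete old_top;                 // unsafe without a reclamation scheme; see note above
      return true;
    }

  private:
    struct node
    {
      T value;
      node *next;
    };
    std::atomic<node *> m_top { nullptr };
};
```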
These two framings give the four implementation choices that shape every real engine and frame the rest of this document:
-
Thread-per-connection or thread-pool? The classical approach (PostgreSQL, the original C MySQL) is one OS process or thread per connection, no pool — the OS scheduler is the work queue. The pooled approach (MySQL’s thread-pool plugin, Oracle’s MTS, SQL Server’s SOS scheduler, CUBRID) decouples connections from threads: a connection arrives at a queue, and a worker picks it up. The trade-off is overhead vs isolation: pools amortise context creation and bound concurrency, but require their own scheduler and their own per-thread context to fake what thread-per-connection got from the OS for free.
-
One queue or many? A single global task queue is simple and correct but contended. The standard refinement is partitioned queues: split the workers into groups (call them cores, partitions, shards), each with its own task queue, and pick a queue per task by hash, round-robin, or NUMA proximity. The cost is uneven load (one shard may starve while another piles up); the gain is the lock on each shard’s queue is rarely contended. CUBRID picks partitioned queues with a round-robin dispatch.
-
Pool the threads or pool the entries? When a worker finishes its task, two things can be retired: the OS thread itself, and the per-thread context (the bag of state the engine code reads and writes via
thread_p). Some engines pool both (always-alive workers, never destroy contexts). Some pool only the contexts and let the OS thread come and go on demand. CUBRID does the latter by default — the context (the cubthread::entry) is dispatched from a fixed-size pool, and the OS thread is created when a task is assigned to a worker and exits after an idle timeout when no new task arrives. A tunable (pool_threads, plus a perf-test override) keeps threads alive forever when needed.
-
What synchronization primitive guards the slow paths? Every thread system has at least three layers of sync primitives. At the bottom, a pthread_mutex_t protects single critical lines (a queue head, a free list). One layer up, a reader/writer primitive lets many readers in while excluding writers (the classic textbook RW-lock). At the top, lock-free structures replace locks entirely on the hot path. CUBRID has all three: pthread_mutex_t everywhere, a custom SYNC_CRITICAL_SECTION RW primitive (the csect) for slow heavyweight gates like the transaction table, and lockfree::hashmap plus lockfree::circular_queue (in the base module) for the hot data structures.
After these four choices are named, every CUBRID-specific structure in this document either implements one of them or makes the chosen implementation faster.
Common DBMS Design
Every DBMS that supports concurrent connections ships some form of thread system, and the shapes converge on a small handful of patterns. CUBRID’s specific choices in the next section are best read as one set of dials within this shared design space.
Thread-per-connection (PostgreSQL)
PostgreSQL’s core uses a process-per-connection model, not a thread
pool. The postmaster forks a backend per connection; the backend
holds a MyProc and a PGPROC shared-memory slot until the
connection closes. There are background processes (autovacuum
launcher + workers, wal-writer, bgwriter, checkpointer, archiver,
replication walsender/walreceiver, logical replication apply
workers) but they are full processes, registered and started by the
postmaster, not threads of a pool. The result is excellent
isolation (a corrupted backend cannot corrupt others) at a cost in
per-connection memory and IPC overhead. Connection pooling is
delegated to external middleware (PgBouncer, pgpool).
Pooled threads (MySQL, MariaDB, Oracle MTS, SQL Server)
Commercial and high-fanout DBMSes pool threads for the obvious reason that the OS cannot economically dispatch tens of thousands of connections each as a thread.
- MySQL Enterprise / MariaDB thread pool assigns connections to thread groups, each with its own queue and its own waiter and listener threads. The number of groups is roughly the CPU count; each group runs one or two active threads.
- Oracle Multi-Threaded Server (MTS / shared-server architecture) multiplexes user processes onto a smaller pool of server processes via dispatchers. Dedicated-server mode bypasses the pool.
- SQL Server SOS (SQL OS) is its own user-space scheduler with workers, schedulers (one per logical CPU), tasks, and task queues. Workers are cooperatively scheduled on top of OS threads, with an optional fiber ("lightweight pooling") mode.
CUBRID sits squarely in this camp: a CUBRID server is a single multi-threaded process, with a thread-entry pool, a partitioned worker pool, and a daemon set.
Daemon set (everyone)
Independent of the worker model, every engine ships a small set of named long-running background threads. The names converge:
- Page-flush / bgwriter / lazy writer — picks dirty pages and writes them so the buffer pool always has clean victims.
- Log-flush / wal-writer — fsyncs WAL up to the latest committed LSN to enforce the WAL rule.
- Checkpoint — periodically flushes the buffer pool and writes a checkpoint log record.
- Archive / log-mover — ships old WAL out of the active range.
- Vacuum / purge / clean-up — reclaims dead MVCC versions (PostgreSQL autovacuum, InnoDB purge, CUBRID vacuum).
- Deadlock detector — periodically walks the wait-for graph.
- Stats collector — periodically samples activity counters.
Every CUBRID daemon listed in §“Daemon consumers” matches one of these.
Custom RW-lock and tracker
When a single shared resource is read constantly and written
rarely (the transaction table, the schema cache, the disk
allocation map), a pthread_rwlock_t is the textbook answer. In
practice, every serious engine ships its own RW primitive that
adds: writer queue (so writers don’t starve under reader load),
promotion (a reader upgrading to a writer without releasing first),
demotion (a writer downgrading to reader), per-section identity
(so deadlock-cycle detection can name the section), and a
tracker that, in debug builds, records which sections each
thread holds and complains on inconsistent re-entry. This is
exactly CUBRID’s SYNC_CRITICAL_SECTION plus its
critical_section_tracker.
Lock-free shared structures
The two consistently lock-free structures in modern engines are:
- A lock-free hash map for the lock table, the page table, the schema cache, the lock-free allocator buckets. The standard shape is a hash table built from lock-free linked lists — Maged Michael’s list-based sets and the Shalev–Shavit split-ordered list — which is what CUBRID’s lockfree::hashmap implements.
- A lock-free circular buffer / MPSC queue for handoff between workers and a single consumer (a flusher, a vacuumer). CUBRID’s lockfree::circular_queue (in src/base/) is used inside the page buffer for direct victim handoff.
CUBRID exposes the lock-free hash map through a thread-aware
wrapper (cubthread::lockfree_hashmap) that ties hash-map
transactions to per-thread descriptors carried inside
cubthread::entry.
CUBRID’s Approach
CUBRID compiles three different binaries from the same src/thread/
sources via preprocessor guards: SERVER_MODE (the cub_server
process), SA_MODE (standalone, where client and server are linked
into the same binary), and CS_MODE (client-side library). The
thread machinery is fully active only in SERVER_MODE. In
SA_MODE, cubthread::manager is constructed but worker pools and
daemons are not — instead, push_task executes the task synchronously
on the calling thread. This is the first detail worth carrying into
the rest of the document: every worker-pool path you see is also a
synchronous path in standalone, which simply skips the pool dispatch.
```cpp
// manager::push_task — thread/thread_manager.cpp
void
manager::push_task (worker_pool *worker_pool_arg, entry_task *exec_p)
{
  if (worker_pool_arg == NULL)
    {
      // execute on this thread
      exec_p->execute (get_entry ());
      exec_p->retire ();
    }
  else
    {
#if defined (SERVER_MODE)
      check_not_single_thread ();
      worker_pool_arg->execute (exec_p);
#else
      // ... fall back to immediate execute ...
#endif
    }
}
```

The rest of this section walks the architecture top-down: per-thread
context first (cubthread::entry), then the manager that owns the
context pool, then the worker pool template parameterised by it, then
the daemon class that uses the same context pool but a different
loop pattern, then the lock-free hashmap that those threads call
into, and finally the critical-section primitive layered on top of
pthread_mutex_t for slow shared resources.
cubthread::entry — the per-thread context
cubthread::entry is the single most important struct in the
server. Every engine routine that needs to know “who is calling me
and what state are they in” takes a THREAD_ENTRY *thread_p
(typedef alias for cubthread::entry *) as the first parameter.
The entry carries everything that, in a thread-per-connection
engine, would be implicit in the OS thread’s TLS:
- Identity — index (slot in the manager’s array, 1-based), type (worker, daemon, vacuum master, recovery, …), m_id (std::thread::id), client_id (which session is this thread serving), tran_index (which transaction descriptor — TDES — is bound), private_lru_index (which page-buffer private LRU list this thread owns when private quotas are enabled).
- Status & wait reason — an FSM with TS_DEAD, TS_FREE, TS_RUN, TS_WAIT, TS_CHECK, plus a resume_status enumerating the reason a TS_WAIT was entered (THREAD_LOCK_SUSPENDED, THREAD_PGBUF_SUSPENDED, THREAD_LOGWR_SUSPENDED, …). The reason is the protocol with whoever is going to wake the thread up: when the lock manager grants a lock, it passes THREAD_LOCK_RESUMED to thread_wakeup so the waiter can verify it was woken for the right reason and not by a spurious signal.
- Synchronization — a per-entry pthread_mutex_t th_entry_lock plus a pthread_cond_t wakeup_cond. Every blocking primitive in the server (latch, lock, csect, log writer) ultimately calls thread_suspend_wakeup_and_unlock_entry which does pthread_cond_wait on this pair. There is no futex-style fast path; every wait goes through the entry’s condition variable.
- Allocator caches — private_heap_id, an HL_HEAPID into the thread-private heap; log_zip_undo / log_zip_redo / log_data_ptr, the per-thread compression buffers reused across log appends; tran_entries[THREAD_TS_*], an array of lock-free transaction descriptors, one per shared lock-free table (sessions, catalog, lock-res, lock-ent, xcache, fpcache, hfid table, dwb-slots, …) used to coordinate with the lock-free hashmap’s epoch / hazard-pointer scheme.
- Trackers — m_alloc_tracker, m_pgbuf_tracker, m_csect_tracker. Active only in debug builds with ENABLE_TRACKERS = !NDEBUG && SERVER_MODE. They record every malloc, every page fix, every csect enter so the engine can, on transaction end, assert that the thread released everything it grabbed.
- Per-subsystem state — vacuum_worker (a pointer to vacuum worker info if this entry is bound to a vacuum thread), xasl_unpack_info_ptr (the unpacked XASL for the currently executing query), lockwait / lockwait_msecs / lockwait_state (the “I am waiting on object X for at most N ms” tuple read by the deadlock detector), event_stats (slow query timing), and a thicket of flags — interrupted, shutdown (atomic), check_interrupt, no_logging, is_cdc_daemon, trigger_involved.
```cpp
// cubthread::entry — thread/thread_entry.hpp
class entry
{
  public:
    enum class status { TS_DEAD, TS_FREE, TS_RUN, TS_WAIT, TS_CHECK };

    int index;                          // thread entry index
    thread_type type;                   // worker, daemon, vacuum, recovery, …
    int client_id;                      // client whose request this thread runs
    int tran_index;                     // bound transaction descriptor
    int private_lru_index;
    pthread_mutex_t th_entry_lock;
    pthread_cond_t wakeup_cond;
    status m_status;
    thread_resume_suspend_status resume_status;

    HL_HEAPID private_heap_id;          // private allocator cache
    css_conn_entry *conn_entry;         // network connection
    xasl_unpack_info *xasl_unpack_info_ptr;
    vacuum_worker *vacuum_worker;
    lf_tran_entry *tran_entries[THREAD_TS_COUNT];   // lock-free txn descriptors
    pgbuf_holder_anchor *m_holder_anchor;           // page-fix list head

    std::atomic_bool shutdown;
    bool interrupted;
    bool check_interrupt;
    bool wait_for_latch_promote;
    // ... condensed ...

  private:
    cuberr::context m_error;
    cubbase::alloc_tracker &m_alloc_tracker;
    cubbase::pgbuf_tracker &m_pgbuf_tracker;
    cubsync::critical_section_tracker &m_csect_tracker;
    log_system_tdes *m_systdes;
    lockfree::tran::index m_lf_tran_index;
};
```

The “private members will gradually replace public” comment in the
header is genuine: the project is mid-refactor away from a C-era
THREAD_ENTRY struct of public fields, and the public fields above
are intentional liabilities. New code that needs to read or write
entry state should prefer accessor methods (get_error_context,
get_lf_tran_index, get_pgbuf_tracker, …) over direct field
access.
The constructor allocates one private heap, two pthread mutexes, one condition variable, and the three trackers — these are the costs the manager wants to amortise across many tasks, which is the whole reason entries are pooled.
Suspend/wake protocol
Every blocking primitive in the server reduces to two C-style functions on the entry’s mutex+condvar pair. Reading them once makes the rest of the engine readable:
```cpp
// thread_suspend_wakeup_and_unlock_entry — thread/thread_entry.cpp
void
thread_suspend_wakeup_and_unlock_entry (cubthread::entry *thread_p,
                                        thread_resume_suspend_status suspended_reason)
{
  // entry's th_entry_lock must be held by caller
  thread_p->m_status = cubthread::entry::status::TS_WAIT;
  thread_p->resume_status = suspended_reason;
  // ... slow-query timing prelude ...

  pthread_cond_wait (&thread_p->wakeup_cond, &thread_p->th_entry_lock);

  thread_p->m_status = old_status;
  pthread_mutex_unlock (&thread_p->th_entry_lock);
}

// thread_wakeup_internal — thread/thread_entry.cpp
static void
thread_wakeup_internal (cubthread::entry *thread_p, thread_resume_suspend_status resume_reason,
                        bool had_mutex)
{
  if (!had_mutex)
    {
      thread_lock_entry (thread_p);
    }
  pthread_cond_signal (&thread_p->wakeup_cond);
  thread_p->resume_status = resume_reason;
  if (!had_mutex)
    {
      thread_unlock_entry (thread_p);
    }
}
```

The contract: a thread that wants to sleep on its own entry first
locks th_entry_lock, sets resume_status to a suspend-reason, and
calls cond_wait. A waker — typically holding another mutex (the
lock-table mutex, the page-buffer mutex) — calls thread_wakeup
with the matching resume-reason; the woken thread compares
resume_status against the suspend-reason it expected and either
proceeds or treats the wake as spurious. The _already_had_mutex
variant skips the lock since the caller still holds it (typical when
both the suspend and the wake are coordinated under a higher-level
mutex).
This is also how the timeout variants work. thread_suspend_with_other_mutex
takes an external mutex and lets the caller wait on the entry’s
condvar but under that external mutex — the lock manager uses this so
“hold the lock-table mutex, suspend on my entry, release lock-table
mutex” is atomic across the suspend.
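As a schematic illustration (not a verbatim CUBRID call site), a waiter/waker pair built on the functions above might look like the sketch below; the function names wait_for_grant / grant_to and the surrounding locking discipline are placeholders, while the suspend/wake calls and the resume-reason check follow the contract just described.

```cpp
// Schematic waiter/waker pair using the documented suspend/wake API.
// "wait_for_grant" and "grant_to" are illustrative names, not CUBRID symbols.

// Waiter side (e.g. a worker that must block until a lock is granted):
void
wait_for_grant (cubthread::entry *thread_p)
{
  thread_lock_entry (thread_p);                           // take th_entry_lock
  // publish "I am waiting" state under the entry lock, then sleep
  thread_suspend_wakeup_and_unlock_entry (thread_p, THREAD_LOCK_SUSPENDED);

  // woken up: verify we were woken for the expected reason
  if (thread_p->resume_status != THREAD_LOCK_RESUMED)
    {
      // spurious or interrupt wakeup — caller decides whether to retry or abort
    }
}

// Waker side (e.g. the lock manager granting a lock while holding its own mutex):
void
grant_to (cubthread::entry *waiter_p)
{
  thread_wakeup (waiter_p, THREAD_LOCK_RESUMED);          // signal waiter's wakeup_cond
}
```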
cubthread::manager — orchestrator
The manager is a singleton (Manager static, accessed via
cubthread::get_manager ()) that owns the entry pool, the worker
pool list, the daemon list, and the lock-free transaction system.
It is allocated by cubthread::initialize, set up in
initialize_thread_entries, and torn down by cubthread::finalize.
```cpp
// cubthread::manager — thread/thread_manager.hpp
class manager
{
    // entries
    entry *m_all_entries;
    entry_dispatcher *m_entry_dispatcher;     // resource_shared_pool<entry>
    std::size_t m_max_threads;
    std::size_t m_available_entries_count;

    // pools and daemons
    std::vector<worker_pool *> m_worker_pools;
    std::vector<daemon *> m_daemons;
    std::vector<daemon *> m_daemons_without_entries;

    entry_manager m_entry_manager;
    daemon_entry_manager m_daemon_entry_manager;

    // lock-free transaction system
    lockfree::tran::system *m_lf_tran_sys;

    // ...
    template <typename Res, typename ... CtArgs>
    Res *create_worker_pool (std::size_t pool_size, std::size_t core_count, CtArgs &&... args);
    daemon *create_daemon (const looper &, entry_task *, const char *name, entry_manager * = NULL);
    daemon *create_daemon_without_entry (const looper &, task_without_context *, const char *);
    entry *claim_entry (void);
    void retire_entry (entry &);
};
```

The size of the entry pool is not a knob — it is computed from
the count of registered consumers using a count_registry global:
```cpp
// manager::set_max_thread_count_from_config — thread/thread_manager.cpp
void
manager::set_max_thread_count_from_config (void)
{
  m_max_threads = cubbase::count_registry<connection>::total ()
                  + cubbase::count_registry<worker_pool>::total ()
                  + cubbase::count_registry<daemon>::total ()
                  + 1 /* PAD */;
}
```

count_registry<T>::total() aggregates registrations made by the
REGISTER_CONNECTION, REGISTER_WORKERPOOL, and REGISTER_DAEMON
macros that each subsystem invokes at file scope. The macros do not
register the quantity of threads; they register the count of
named callers. So adding a new daemon or pool requires a
REGISTER_* macro near the file’s top so the manager can size the
entry pool, otherwise create_worker_pool will return NULL when
“reserve N entries” exceeds m_available_entries_count.
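For example (an illustrative file-scope registration — my_subsys is a hypothetical name, not an existing CUBRID subsystem):

```cpp
// Illustrative file-scope registration; "my_subsys" is hypothetical.
// Without it, set_max_thread_count_from_config () does not count the daemon,
// and create_daemon () can fail because no entry was reserved for it.
REGISTER_DAEMON (my_subsys);
```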
```cpp
// macros — thread/thread_manager.hpp
#define REGISTER_CONNECTION(name, getter) static cubthread::manager::connection_registry_t _gl_reg_conn_##name (#name, getter)
#define REGISTER_WORKERPOOL(name, getter) static cubthread::manager::workerpool_registry_t _gl_reg_wp_##name (#name, getter)
#define REGISTER_DAEMON(name) static cubthread::manager::daemon_registry_t _gl_reg_daemon_##name (#name, 1)
```

Once entries exist, they are dispatched to OS threads on demand.
claim_entry pops an entry from resource_shared_pool and stores
it in a thread-local pointer tl_Entry_p; retire_entry puts it
back. Every server thread starts by calling claim_entry
(typically inside a worker pool’s init_run or a daemon’s
loop_with_context), gets back its entry &, and then runs.
```cpp
// manager::claim_entry / retire_entry — thread/thread_manager.cpp
entry *
manager::claim_entry (void)
{
  tl_Entry_p = m_entry_dispatcher->claim ();
  return tl_Entry_p;
}

void
manager::retire_entry (entry &entry_p)
{
  assert (tl_Entry_p == &entry_p);
  tl_Entry_p = NULL;
  m_entry_dispatcher->retire (entry_p);
}
```

The entry_manager (and its derivatives daemon_entry_manager,
system_worker_entry_manager) is the policy object the worker pool
calls into on each create_context / retire_context / recycle_context / stop_execution. It is the customisation hook by which subsystems
attach their per-thread state to a generic worker. For example,
vacuum’s vacuum_Worker_entry_manager overrides on_create to
attach a vacuum_worker * to the entry; pl/sp’s
m_monitor_helper_daemon uses a custom entry manager to wire up the
JNI bridge’s class loaders.
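As an illustration of that hook, a subsystem-specific entry manager could look roughly like the sketch below; the class name, the exact hook signature, and the attached state are assumptions — compare with the real entry_manager declaration and the vacuum_Worker_entry_manager mentioned above before copying.

```cpp
// Sketch of a subsystem-specific entry manager (hook name and signature assumed).
class my_subsys_entry_manager : public cubthread::entry_manager
{
  protected:
    // called when a worker claims its entry; attach per-thread subsystem state here
    void on_create (cubthread::entry &context) override
    {
      // e.g. vacuum attaches its per-worker descriptor roughly like:
      // context.vacuum_worker = allocate_my_subsys_worker_state ();
    }
};
```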
cubthread::worker_pool — partitioned pool with per-core queues
The worker pool is the engine’s only way of running an
arbitrary-but-bounded number of entry_task instances in parallel
without spawning a thread per task. It is a template
worker_pool_impl<bool Stats> so that pools that do not need
statistics (typical) and pools that do (the global per-transaction
worker pool) compile to different code with no runtime overhead.
The base class worker_pool is the runtime polymorphic interface
the manager and push_task see.
Pool → Core → Worker → Task
The pool is partitioned into cores (typically 1 for low-volume pools, several for high-volume pools), and each core owns a fixed slice of the pool’s workers and one task queue.
```
            worker_pool
            /    |    \
         core   core   core        (m_cores)
         / |      |     | \
    worker worker ... worker        (core::m_workers)
                                    (core::m_available_workers)
                                    (core::m_task_queue)
```

execute_on_core picks a core by hash (round-robin by default,
custom by argument) and calls core_impl::execute_task. The hot
path is short: take m_workers_mutex, pop an idle worker if any,
and assign the task to that worker. Otherwise enqueue.
```cpp
// core_impl::execute_task — thread/thread_worker_pool_impl.hpp
void
worker_pool_impl<Stats>::core_impl::execute_task (task_type *task_p, bool is_temp)
{
  if (!m_parent_pool->is_running ())
    {
      task_p->retire ();
      return;
    }

  wrapped_task task_ref (task_p);
  std::unique_lock<std::mutex> ulock (m_workers_mutex);

  if (!m_available_workers.empty ())
    {
      worker_impl *refp = static_cast<worker_impl *> (m_available_workers.back ());
      m_available_workers.pop_back ();
      ulock.unlock ();
      refp->assign_task (std::move (task_ref));       // wake or spawn
    }
  else if (is_temp)
    {
      ulock.unlock ();
      execute_task_as_temp (std::move (task_ref));    // spawn an ad-hoc temp worker
    }
  else
    {
      m_task_queue.push (std::move (task_ref));       // enqueue
    }
}
```

A worker, after finishing its current task, calls
get_task_or_become_available — a single critical section that
either pops a queued task or registers the worker as available:
```cpp
// core_impl::get_task_or_become_available — thread/thread_worker_pool_impl.hpp
std::optional<wrapped_task>
worker_pool_impl<Stats>::core_impl::get_task_or_become_available (worker &w)
{
  std::unique_lock<std::mutex> ulock (m_workers_mutex);
  if (!m_task_queue.empty ())
    {
      wrapped_task qt = std::move (m_task_queue.front ());
      m_task_queue.pop ();
      return std::optional<wrapped_task> (std::in_place, std::move (qt));
    }
  m_available_workers.push_back (&w);
  return std::nullopt;
}
```

So a worker is exactly in one of two states: holding a task and
running it, or in m_available_workers waiting to be assigned one.
The mutex guards both lists and serializes the assign vs available
race. There is no separate idle list; m_available_workers is the
idle list.
Worker thread lifecycle
A worker is a persistent C++ object inside the pool; what comes and
goes is the OS thread. assign_task either notifies the existing
thread (if the worker has one — m_has_thread) or spawns one fresh:
```cpp
// worker_impl::assign_task — thread/thread_worker_pool_impl.hpp
void
worker_pool_impl<Stats>::core_impl::worker_impl::assign_task (wrapped_task &&task_ref)
{
  std::unique_lock<std::mutex> ulock (m_task_mutex);
  m_wrapped_task.emplace (std::move (task_ref));

  if (m_is_temp)
    {
      m_has_thread = true;
      start_thread ();
      return;
    }
  if (m_has_thread)
    {
      ulock.unlock ();
      m_task_cv.notify_one ();
    }
  else
    {
      m_has_thread = true;
      ulock.unlock ();
      start_thread ();      // std::thread (&worker_impl::run, this).detach ()
    }
}
```

The detached thread runs worker_impl::run, which is:
```cpp
// worker_impl::run — thread/thread_worker_pool_impl.hpp
void
worker_pool_impl<Stats>::core_impl::worker_impl::run (void)
{
  os::resources::cpu::clearaffinity ();
  pthread_setname_np (pthread_self (), m_parent_core->get_parent_pool ()->get_name ().c_str ());
  init_run ();        // claim entry (context)

  if (m_is_temp)
    {
      execute_current_task ();
      finish_run ();
      return;
    }

  // loop: execute current task, then try to grab another
  if (!m_wrapped_task.has_value ())
    {
      if (get_new_task ())
        {
          /* got one */
        }
    }
  if (m_wrapped_task.has_value ())
    {
      do
        {
          execute_current_task ();
        }
      while (get_new_task ());
    }
}
```

init_run calls m_parent_core->get_entry_manager().create_context()
which under the hood claims an entry from the manager’s dispatcher and
runs the subsystem-specific on_create hook. finish_run is the
mirror — retire_context releases the entry back. Between them, the
thread loops on tasks; each task gets the same entry * as context,
so a long-running pool effectively keeps its claimed entries pinned.
get_new_task is the part of the worker that decides whether to
keep the OS thread alive:
```cpp
// worker_impl::get_new_task — thread/thread_worker_pool_impl.hpp
bool
worker_pool_impl<Stats>::core_impl::worker_impl::get_new_task (void)
{
  std::unique_lock<std::mutex> ulock (m_task_mutex, std::defer_lock);

  if (!m_stop)
    {
      // 1. queued task? pop it.
      auto qt = static_cast<core_impl *> (m_parent_core)->get_task_or_become_available (*this);
      if (qt.has_value ())
        {
          m_wrapped_task.emplace (std::move (*qt));
          return true;
        }

      // 2. no queued task; we just got added to m_available_workers.
      //    wait on m_task_cv until either a task arrives or idle_timeout fires.
      ulock.lock ();
      if (!m_wrapped_task.has_value () && !m_stop)
        {
          condvar_wait (m_task_cv, ulock, m_parent_core->get_parent_pool ()->get_idle_timeout (),
                        [this] () -> bool { return m_wrapped_task.has_value () || m_stop; });
        }
    }
  // ...
  if (!m_wrapped_task.has_value ())
    {
      // timed out; thread will exit. next assign_task spawns fresh.
      m_has_thread = false;
      finish_run ();      // retire context now
      return false;
    }
  // got a task, keep going
  return true;
}
```

The idle timeout default is 5 seconds (thread_create_worker_pool’s
default in the manager), or infinity when pool_threads=true is
passed (or perf-test mode forces it via
wp_set_force_thread_always_alive ()). When a worker’s thread times
out, the worker stays in the pool but with m_has_thread = false;
the next assign_task starts a new OS thread. This is the
“low-load: retire threads, high-load: keep them alive” knob.
Task wrapper, statistics
wrapped_task<Stats> wraps the task pointer with optional timing.
On Stats=true it records cubperf::time_point at construction and
the worker tracks per-stage timing (start_thread, create_context,
execute_task, retire_task, found_in_queue, wakeup_with_task,
recycle_context, retire_context). On Stats=false the wrapper is
essentially a task* and if constexpr (Stats) discards the
counters at compile time.
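The mechanism is the ordinary if constexpr idiom; a stand-alone sketch of the same compile-time switch (not the actual wrapped_task code) is:

```cpp
// Minimal stand-alone illustration of a Stats compile-time switch
// (not the real wrapped_task — just the idiom it relies on).
#include <chrono>

template <bool Stats>
struct timed_call
{
  template <typename Fn>
  void operator() (Fn &&fn)
  {
    if constexpr (Stats)
      {
        auto start = std::chrono::steady_clock::now ();
        fn ();
        m_last_duration = std::chrono::steady_clock::now () - start;    // kept only when Stats
      }
    else
      {
        fn ();      // timing code does not even exist in this instantiation
      }
  }

  // member kept in both instantiations for brevity; the real code
  // conditionally sizes its counters instead
  std::chrono::steady_clock::duration m_last_duration {};
};
```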
Task-cap (rate limiting)
Some pools must not be flooded — if the page-flush daemon enqueues
faster than workers can flush, the queue grows without bound and
backpressure breaks. cubthread::worker_pool_task_capper
(thread_worker_pool_taskcap.hpp) wraps a pool with a fixed token
budget. try_task returns false if budget exhausted; push_task
blocks on a condition variable until end_task (called from the
capped_task::execute epilogue) signals a free slot. This is how
recovery’s parallel redo and the page-flush dispatcher avoid
unbounded buildup.
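The underlying idea is a counting budget behind a mutex and condition variable; a generic sketch of that handshake (not the worker_pool_task_capper source) is:

```cpp
// Generic token-budget capper illustrating the push/end handshake described above.
#include <condition_variable>
#include <mutex>

class task_budget
{
  public:
    explicit task_budget (std::size_t max_tasks) : m_available (max_tasks) {}

    // non-blocking: claim a slot if one is free (like try_task)
    bool try_claim ()
    {
      std::lock_guard<std::mutex> lg (m_mutex);
      if (m_available == 0)
        {
          return false;
        }
      --m_available;
      return true;
    }

    // blocking: wait until a slot frees up, then claim it (like push_task)
    void claim ()
    {
      std::unique_lock<std::mutex> ul (m_mutex);
      m_cv.wait (ul, [this] { return m_available > 0; });
      --m_available;
    }

    // called from the task's epilogue, like end_task in the capper
    void release ()
    {
      {
        std::lock_guard<std::mutex> lg (m_mutex);
        ++m_available;
      }
      m_cv.notify_one ();
    }

  private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::size_t m_available;
};
```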
Pool consumers
Eight pools live in the codebase, found by grepping
thread_create_worker_pool and thread_create_stats_worker_pool:
| Pool name | Subsystem | Created in |
|---|---|---|
| transaction | Connection / query workers (the user-facing pool) | src/connection/server_support.c |
| vacuum | Vacuum workers (per-block log replay) | src/query/vacuum.c |
| parallel-query | Intra-query parallel scan/sort | src/query/parallel/px_worker_manager_global.cpp |
| loaddb | Bulk loader workers | src/loaddb/load_worker_manager.cpp |
| online-index | Online index builder | src/storage/btree_load.c |
| backup-read | Parallel-read backup | src/transaction/log_page_buffer.c |
| recovery-redo | Parallel WAL redo | src/transaction/log_recovery_redo_parallel.cpp |
| (per-method) | Method/SP temp workers | implicit via execute_on_core (..., is_temp=true) |
The transaction pool is the one that runs SELECT and DML — when you
ask “who actually executes my SQL”, the answer is “a worker thread of
the transaction pool”.
flowchart LR
subgraph WP["worker_pool"]
direction TB
Q1["core[0].queue"]
Q2["core[1].queue"]
Q3["core[2].queue"]
A1["available[0]"]
A2["available[1]"]
A3["available[2]"]
end
Push["push_task"] -->|round-robin| RR{"get_next_core"}
RR -->|hash%N| Q1 & Q2 & Q3
Q1 --> Wk1["worker_impl"]
Q2 --> Wk2["worker_impl"]
Q3 --> Wk3["worker_impl"]
Wk1 --> Ex1["execute_current_task"]
Wk2 --> Ex2["execute_current_task"]
Wk3 --> Ex3["execute_current_task"]
Ex1 -->|next loop| Get1["get_task_or_become_available"]
Get1 -->|queued? yes| Wk1
Get1 -->|queued? no| A1
A1 -.idle_timeout.-> Exit1["thread exit, m_has_thread=false"]
cubthread::daemon — single-thread looper
A daemon is one OS thread, one task, one looper. daemon does not
share a pool — each daemon has its own thread, its own waiter, its
own context. The thread is started in the daemon constructor via
std::thread (daemon::loop_with_context, this, ...) and joined in
stop_execution (called from ~daemon).
```cpp
// daemon::loop_with_context — thread/thread_daemon.cpp
void
daemon::loop_with_context (daemon *daemon_arg, entry_manager *entry_manager_arg,
                           entry_task *exec_arg, const char *name)
{
  pthread_setname_np (pthread_self (), name[0] ? name : "unnamed-daemon");

  entry &context = entry_manager_arg->create_context ();   // claim entry, run on_daemon_create
  daemon_arg->register_stat_start ();

  while (!daemon_arg->m_looper.is_stopped ())
    {
      exec_arg->execute (context);              // do the periodic work
      daemon_arg->register_stat_execute ();

      daemon_arg->pause ();                     // sleep per looper policy
      daemon_arg->register_stat_pause ();
    }

  entry_manager_arg->stop_execution (context);
  entry_manager_arg->retire_context (context);
  exec_arg->retire ();
}
```

The looper controls how the daemon waits between iterations
(see next subsection). The waiter is the per-daemon condvar/mutex.
wakeup calls m_waiter.wakeup () so that an external event can
shorten the next wait — the page-flush daemon is woken as soon as
the dirty page count crosses a threshold; the deadlock detector is
woken when a thread has been on a lock wait queue for longer than
the configured threshold.
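Putting the pieces together, registering a new daemon would look roughly like the sketch below; the task class, the looper construction from a fixed period, and the shutdown handling are illustrative assumptions — only create_daemon (looper, entry_task *, name) and execute (entry &) come from the interfaces shown in this document.

```cpp
// Schematic daemon registration (illustrative names; check thread_looper.hpp
// for the real looper constructors before copying).
class my_periodic_task : public cubthread::entry_task
{
  public:
    void execute (cubthread::entry &thread_ref) override
    {
      // recurring engine-driven work goes here, with thread_ref as the context
    }
};

void
start_my_daemon (cubthread::manager &thread_mgr)
{
  // assumption: a looper constructed from a fixed duration selects FIXED_WAITS
  cubthread::looper fixed_period { std::chrono::seconds (1) };
  cubthread::daemon *d = thread_mgr.create_daemon (fixed_period, new my_periodic_task (), "my-daemon");
  // keep d somewhere so it can be destroy_daemon()'d at shutdown
  (void) d;
}
```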
Daemon FSM
Section titled “Daemon FSM”stateDiagram-v2
[*] --> CreateContext
CreateContext --> ExecuteTask: entry_manager.create_context()
ExecuteTask --> Pause: exec->execute(ctx)
Pause --> CheckStopped: looper.put_to_sleep(waiter)
CheckStopped --> ExecuteTask: !looper.is_stopped()
CheckStopped --> RetireContext: looper.is_stopped()
RetireContext --> [*]: entry_manager.retire_context(ctx); exec->retire()
note right of Pause
Wait pattern decided by looper:
- INF_WAITS: cond_wait until wakeup
- FIXED_WAITS: cond_timedwait for fixed period
- INCREASING_WAITS: increasing periods, reset on wakeup
- CUSTOM_WAITS: user-provided period_function
end note
Daemon consumers
Found by grepping create_daemon:
| Daemon name | Subsystem | Looper policy | Created in |
|---|---|---|---|
log-checkpoint | Log manager | Fixed period (checkpoint_interval) | log_manager.c |
log-rm-archive | Log manager | Fixed period | log_manager.c |
log-clock | Log manager | Fixed (1s) | log_manager.c |
ha-delay-check | HA replication | Fixed period | log_manager.c |
log-flush | Log manager | INF (woken by commits) | log_manager.c |
cdc-loginfo-producer | CDC | INF | log_manager.c |
deadlock-detect | Lock manager | Fixed (deadlock_detection_interval) | lock_manager.c |
dwb-flush-block | Double-write buffer | INF (woken on full block) | double_write_buffer.cpp |
dwb-file-sync | Double-write buffer | INF | double_write_buffer.cpp |
pgbuf-maintain | Page buffer | Fixed | page_buffer.c |
pgbuf-page-flush | Page buffer | INF (woken on dirty threshold) | page_buffer.c |
pgbuf-page-post-flush | Page buffer | INF | page_buffer.c |
pgbuf-flush-control | Page buffer | Fixed | page_buffer.c |
session-control | Session | Fixed | session.c |
vacuum-master | Vacuum | Fixed (master scans WAL) | vacuum.c |
pl-monitor | PL/Java server | Fixed | pl_sr.cpp |
That’s the complete daemon set. Cross-reference with
cubrid-vacuum.md (vacuum master + workers),
cubrid-recovery-manager.md (recovery-redo pool, no daemon),
cubrid-page-buffer-manager.md (four pgbuf daemons + dwb daemons).
graph TD
subgraph "Worker pools (consumers)"
TXN["transaction (server_support)"]
VWP["vacuum (vacuum.c)"]
PXQ["parallel-query (parallel/)"]
LDDB["loaddb"]
OIB["online-index (btree_load)"]
BR["backup-read"]
RR["recovery-redo"]
end
subgraph "Daemons (consumers)"
LF["log-flush"]
LCK["log-checkpoint"]
LRA["log-rm-archive"]
DD["deadlock-detect"]
PGM["pgbuf-maintain"]
PGF["pgbuf-page-flush"]
PGPF["pgbuf-page-post-flush"]
PGFC["pgbuf-flush-control"]
DWB1["dwb-flush-block"]
DWB2["dwb-file-sync"]
SC["session-control"]
VM["vacuum-master"]
PLM["pl-monitor"]
HAD["ha-delay-check"]
CDC["cdc-loginfo-producer"]
LCK1["log-clock"]
end
TM["cubthread::manager"]
TM ---|tracks| TXN
TM ---|tracks| VWP
TM ---|tracks| PXQ
TM ---|tracks| LDDB
TM ---|tracks| OIB
TM ---|tracks| BR
TM ---|tracks| RR
TM ---|tracks| LF
TM ---|tracks| LCK
TM ---|tracks| DD
TM ---|tracks| PGF
TM ---|tracks| DWB1
TM ---|tracks| VM
cubthread::looper — wait pattern
The looper is small but central — it is the policy by which a daemon’s “do work, then sleep” loop spaces out iterations. Four patterns:
```cpp
// looper::wait_type — thread/thread_looper.hpp
enum wait_type
{
  INF_WAITS,            // sleep indefinitely until wakeup
  FIXED_WAITS,          // sleep a fixed duration, repeat
  INCREASING_WAITS,     // sleep increasing durations on timeout, reset on wakeup
  CUSTOM_WAITS,         // arbitrary period_function
};
```

put_to_sleep is the one method daemons call:
```cpp
// looper::put_to_sleep — thread/thread_looper.cpp
void
looper::put_to_sleep (waiter &waiter_arg)
{
  if (is_stopped ())
    {
      return;
    }

  bool is_timed_wait = true;
  delta_time period = delta_time (0);
  m_setup_period (is_timed_wait, period);     // policy bound at construction

  if (is_timed_wait)
    {
      // subtract task execution time from the desired period so the daemon
      // ticks at a consistent wall-clock rate, not "execute + period"
      delta_time wait_time = delta_time (0);
      delta_time exec = std::chrono::system_clock::now () - m_start_execution_time;
      if (period > exec)
        {
          wait_time = period - exec;
        }
      m_was_woken_up = waiter_arg.wait_for (wait_time);
    }
  else
    {
      waiter_arg.wait_inf ();
      m_was_woken_up = true;
    }
  m_start_execution_time = std::chrono::system_clock::now ();
}
```

The “subtract execution time” detail matters: a daemon configured with a 1-second period that takes 800ms to do its job will sleep 200ms, not 1000ms, so wall-clock periodicity is preserved.
INCREASING_WAITS is the back-off policy — used for daemons whose
work runs quickly when there’s nothing to do but should not poll
expensively. The vacuum master uses this: scan WAL, find no work,
sleep increasing intervals (e.g. 10ms → 100ms → 1s); on the first
external wakeup hint, reset the index back to 0.
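A generic sketch of the same back-off idea (not the looper source) — climb a ladder of periods on every idle timeout, reset when a wakeup hint arrives:

```cpp
// Generic illustration of the INCREASING_WAITS back-off policy.
#include <array>
#include <chrono>

class backoff_periods
{
  public:
    std::chrono::milliseconds next_period ()
    {
      auto p = m_ladder[m_index];
      if (m_index + 1 < m_ladder.size ())
        {
          ++m_index;          // nothing to do last time: wait longer next time
        }
      return p;
    }

    void reset ()             // called when an external wakeup hint arrives
    {
      m_index = 0;
    }

  private:
    std::array<std::chrono::milliseconds, 3> m_ladder
    {
      std::chrono::milliseconds (10),
      std::chrono::milliseconds (100),
      std::chrono::milliseconds (1000)
    };
    std::size_t m_index = 0;
};
```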
cubthread::lockfree_hashmap — shared hash map facade
CUBRID has two lock-free hash map implementations, kept side by
side: an older lf_hash_table_cpp and a newer
lockfree::hashmap (the split-ordered one). The thread layer
provides a single template wrapper that picks one based on
PRM_ID_ENABLE_NEW_LFHASH:
```cpp
// cubthread::lockfree_hashmap — thread/thread_lockfree_hash_map.hpp
template <class Key, class T>
class lockfree_hashmap
{
    enum type { OLD, NEW, UNKNOWN };

    lf_hash_table_cpp<Key, T> m_old_hash;
    lockfree::hashmap<Key, T> m_new_hash;
    type m_type;
    int m_entry_idx;        // index into entry::tran_entries[]
    // ...
};

#define lockfree_hashmap_forward_func(f_, tp_, ...) \
  is_old_type () \
    ? m_old_hash.f_ (get_tran_entry (tp_), __VA_ARGS__) \
    : m_new_hash.f_ ((tp_)->get_lf_tran_index (), __VA_ARGS__)
```

Every operation forwards through entry::tran_entries[m_entry_idx]
(old hash) or entry::get_lf_tran_index() (new hash) — the
thread’s per-table descriptor used by the hash-map’s epoch / hazard
scheme. The set of m_entry_idx slots is fixed in the
THREAD_TS_* enum — every consumer of the lock-free hashmap must
pre-allocate one in cubthread::entry. Today’s slots:
THREAD_TS_SPAGE_SAVING, THREAD_TS_OBJ_LOCK_RES,
THREAD_TS_OBJ_LOCK_ENT, THREAD_TS_CATALOG, THREAD_TS_SESSIONS,
THREAD_TS_FREE_SORT_LIST, THREAD_TS_GLOBAL_UNIQUE_STATS,
THREAD_TS_HFID_TABLE, THREAD_TS_XCACHE, THREAD_TS_FPCACHE,
THREAD_TS_DWB_SLOTS. To add a new lock-free hashmap, you add a
slot to the enum, request the entry slot in
entry::request_lock_free_transactions, and create the hashmap.
There is no extension point that avoids touching cubthread::entry.
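The three steps, sketched; the enum value, the map's key/value types, and the init() parameter list are illustrative assumptions — compare against an existing consumer (xcache, fpcache, …) before copying.

```cpp
// Sketch of wiring a new lock-free hashmap (hypothetical names throughout).

// 1. add a slot to the THREAD_TS_* enum in thread_entry.hpp:
//      THREAD_TS_MY_NEW_TABLE,        // before THREAD_TS_COUNT

// 2. make entry::request_lock_free_transactions also reserve a descriptor
//    for that slot, so every entry carries tran_entries[THREAD_TS_MY_NEW_TABLE].

// 3. create the map, telling it which per-entry slot it forwards through:
cubthread::lockfree_hashmap<my_key, my_value> my_table;
// my_table.init (/* transaction system, */ THREAD_TS_MY_NEW_TABLE, /* sizes, edesc, … */);
```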
SYNC_CRITICAL_SECTION (csect) — heavyweight RW gate
The csect is the engine’s coarsest synchronization primitive, reserved for shared resources where (a) reads dominate writes by orders of magnitude and (b) the resource is touched on critical control paths where a wait must be observable.
```cpp
// SYNC_CRITICAL_SECTION — thread/critical_section.h
typedef struct sync_critical_section
{
  const char *name;
  int cs_index;                           // identity, into csect_Names[]
  pthread_mutex_t lock;                   // monitor lock
  int rwlock;                             // >0 = # readers, <0 = writer, 0 = none
  unsigned int waiting_readers;
  unsigned int waiting_writers;
  pthread_cond_t readers_ok;              // wakeup readers
  THREAD_ENTRY *waiting_writers_queue;    // FIFO of waiting writers
  THREAD_ENTRY *waiting_promoters_queue;  // FIFO of demoted-then-promoting
  thread_id_t owner;                      // current writer
  int tran_index;                         // for diag/dump
  SYNC_STATS *stats;
} SYNC_CRITICAL_SECTION;
```

The csect index space is fixed and small (CSECT_LAST ≈ 18 in the
current source):
```cpp
// csect/CSECT_* — thread/critical_section.h
enum
{
  CSECT_WFG = 0,                        // wait-for-graph
  CSECT_LOG,                            // log manager
  CSECT_LOCATOR_SR_CLASSNAME_TABLE,
  CSECT_QPROC_QUERY_TABLE,
  CSECT_QPROC_LIST_CACHE,
  CSECT_DISK_CHECK,
  CSECT_CNV_FMT_LEXER,
  CSECT_HEAP_CHNGUESS,
  CSECT_TRAN_TABLE,
  CSECT_CT_OID_TABLE,
  CSECT_HA_SERVER_STATE,
  CSECT_COMPACTDB_ONE_INSTANCE,
  CSECT_ACL,
  CSECT_PARTITION_CACHE,
  CSECT_EVENT_LOG_FILE,
  CSECT_TRACE_LOG_FILE,
  CSECT_LOG_ARCHIVE,
  CSECT_ACCESS_STATUS,
  CSECT_LAST
};
```

Each csect has a name, a pthread_mutex_t guarding the rwlock state,
a single pthread_cond_t readers_ok for waking readers, and two
intrusive FIFOs (waiting_writers_queue, waiting_promoters_queue)
of THREAD_ENTRY *. The interesting feature is promotion and
demotion: a thread that entered as reader can call
csect_promote to upgrade in place to writer (without releasing
first); a writer can csect_demote to step down to reader. This is
what the lock manager and the schema manager need when they
“observed something during a read scan that requires a write”.
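Schematically, that read-scan-then-upgrade pattern looks like the sketch below; only the function names and the csect index come from this section, while the timeout argument and error handling are assumptions.

```cpp
// Schematic reader → writer promotion on a csect (illustrative, error handling elided;
// the third argument is assumed to be a wait timeout, -1 meaning "wait indefinitely").
void
scan_then_modify (THREAD_ENTRY *thread_p)
{
  csect_enter_as_reader (thread_p, CSECT_TRAN_TABLE, -1);

  // ... read scan; we observe something that requires a write ...

  // upgrade in place: become the writer without releasing the section first
  csect_promote (thread_p, CSECT_TRAN_TABLE, -1);

  // ... modify under writer ownership ...

  // optionally step back down to reader and keep scanning, then leave
  csect_demote (thread_p, CSECT_TRAN_TABLE, -1);
  csect_exit (thread_p, CSECT_TRAN_TABLE);
}
```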
Each csect also has a stats pointer — total enters, total waits,
total re-enters, total elapsed wait, max elapsed wait — visible via
csect_dump_statistics and the csect_start_scan SHOW handler.
Critical-section tracker
Section titled “Critical-section tracker”In debug builds (ENABLE_TRACKERS = !NDEBUG && SERVER_MODE), each
thread carries a cubsync::critical_section_tracker that records
how many times the thread has entered each csect, whether it is
currently the writer or a (possibly demoted) reader, and a
hard-coded interdependency rule:
```cpp
// critical_section_tracker::check_csect_interdependencies — thread/critical_section_tracker.cpp
void
critical_section_tracker::check_csect_interdependencies (int cs_index)
{
  if (cs_index == CSECT_LOCATOR_SR_CLASSNAME_TABLE)
    {
      cstrack_assert (m_cstrack_array[CSECT_CT_OID_TABLE].m_enter_count == 0);
    }
}
```

The intent: never acquire LOCATOR_SR_CLASSNAME_TABLE while
already holding CT_OID_TABLE. This is hard-coded ordering — the
project knows from past deadlock investigations that one specific
two-section cycle is forbidden, and the tracker enforces it. New
csects can be added without touching the tracker; new ordering
rules must be added by hand here.
The tracker also enforces re-enter counts: max 8 re-enters per
section (MAX_REENTERS), reader re-enter only when the thread is
already the writer or in a demoted state, writer re-enter only
when the thread is already the writer. Anything else trips
cstrack_assert (false). On task end, entry::end_resource_tracks
checks that all csects are released, the alloc tracker is balanced,
and the page-buffer fix list is empty — three independent
“release-everything-you-grabbed” assertions.
How a SQL request becomes a worker task
Putting the pieces together: a CSQL request arrives at the broker,
which forwards it to cub_server over the connection protocol.
The server-side connection-state machine in
src/connection/server_support.c calls thread_create_stats_worker_pool
with name "transaction" at startup; for each incoming request, it
constructs an entry_task and calls thread_get_manager()->push_task.
The pool dispatches to a core by round-robin, the core hands the
task to an available worker (or queues), the worker’s OS thread
(spawned on demand) calls worker_impl::run, which calls
init_run (claim entry, on_create), then execute_current_task
(which calls task::execute (entry &) — the actual SQL path),
then recycle_context, then get_new_task (next request or
become_available). The thread keeps running until its idle_timeout
expires, then exits; the next request spawns it fresh.
sequenceDiagram
participant CAS as broker (CAS)
participant SS as server_support<br/>(connection thread)
participant WP as transaction worker_pool
participant W as worker_impl
participant T as entry_task<br/>(query executor)
CAS->>SS: request bytes
SS->>WP: push_task (entry_task)
WP->>WP: round-robin core
WP->>W: assign_task (notify or spawn)
W->>W: init_run -> claim_entry, on_create
W->>T: execute (entry &)
T-->>W: returned
W->>W: retire_current_task; recycle_context
W->>WP: get_task_or_become_available
alt queued task waiting
WP-->>W: return queued task
else no task
WP-->>W: register as available, wait on m_task_cv
end
A vacuum block job, a parallel-query exchange, a recovery-redo
chunk follow the same shape — they differ only in which pool’s
push_task is called and which entry_task subclass is pushed.
That uniformity is the whole point of the worker-pool template.
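For completeness, the minimal shape every consumer shares — define an entry_task, push it at a pool — is sketched below; the task class and pool variable are illustrative, and only execute (entry &) and manager::push_task come from the interfaces documented here.

```cpp
// Schematic "push one unit of work" flow shared by all pool consumers above
// (illustrative names; the pool is assumed to have been created via the manager).
class my_unit_of_work : public cubthread::entry_task
{
  public:
    void execute (cubthread::entry &thread_ref) override
    {
      // do the work, with thread_ref as the THREAD_ENTRY context
    }
};

void
submit_one (cubthread::worker_pool *my_pool)
{
  // in SERVER_MODE this dispatches to a core; in SA_MODE it runs synchronously
  cubthread::get_manager ()->push_task (my_pool, new my_unit_of_work ());
}
```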
Source Walkthrough
Symbols grouped by subsystem.
cubthread::entry (per-thread context)
- cubthread::entry — the class.
- cubthread::entry::status — TS_DEAD/FREE/RUN/WAIT/CHECK enum.
- thread_resume_suspend_status — the protocol enum (THREAD_LOCK_SUSPENDED, THREAD_PGBUF_SUSPENDED, …).
- cubthread::entry::request_lock_free_transactions — wires tran_entries[THREAD_TS_*] to the lock-free transaction systems.
- cubthread::entry::register_id / get_id / unregister_id — store / read / clear the OS thread id.
- cubthread::entry::lock / unlock — pthread_mutex around th_entry_lock.
- cubthread::entry::end_resource_tracks / push_resource_tracks / pop_resource_tracks — debug-only alloc/pgbuf/csect tracker hooks.
- cubthread::entry::claim_system_worker / retire_system_worker — for daemons that need a system TDES.
- cubthread::entry::assign_lf_tran_index / pull_lf_tran_index / get_lf_tran_index — new-style lock-free hashmap descriptor.
- THREAD_TS_* enum values — slot identities for per-table lock-free transaction descriptors.
- thread_suspend_wakeup_and_unlock_entry, thread_suspend_timeout_wakeup_and_unlock_entry, thread_wakeup, thread_wakeup_already_had_mutex, thread_check_suspend_reason_and_wakeup, thread_suspend_with_other_mutex — the C-callable suspend/wake API.
- thread_type_to_string, thread_status_to_string, thread_resume_status_to_string — pretty-printers.
cubthread::manager (orchestrator)
- cubthread::manager — the singleton class.
- cubthread::initialize / cubthread::finalize — lifecycle.
- cubthread::initialize_thread_entries — entry pool sizing and lock-free transaction system bring-up.
- cubthread::get_manager, get_max_thread_count, get_entry, is_single_thread, check_not_single_thread — global accessors.
- cubthread::manager::alloc_entries — allocate entry[m_max_threads] and the dispatcher.
- cubthread::manager::init_entries — per-entry init, optional lock-free wiring.
- cubthread::manager::init_lockfree_system — construct lockfree::tran::system.
- cubthread::manager::create_worker_pool<Res> — template (see hpp).
- cubthread::manager::destroy_worker_pool — pool-stop + entry release.
- cubthread::manager::push_task / cubthread::manager::push_task_on_core — dispatch.
- cubthread::manager::create_daemon / cubthread::manager::create_daemon_without_entry / cubthread::manager::destroy_daemon / cubthread::manager::destroy_daemon_without_entry.
- cubthread::manager::set_max_thread_count_from_config — reads count_registry<connection>::total() + count_registry<worker_pool>::total() + count_registry<daemon>::total() + 1.
- cubthread::manager::claim_entry / retire_entry — thread-local entry plumbing through tl_Entry_p.
- cubthread::manager::find_by_tid — entry lookup.
- cubthread::manager::map_entries — iteration helper used by SHOW handlers.
- REGISTER_CONNECTION / REGISTER_WORKERPOOL / REGISTER_DAEMON macros (the count_registry API).
- cubthread::is_logging_configured — runtime gate for _er_log_debug.
cubthread::worker_pool (worker pool template)
- cubthread::worker_pool — abstract base.
- cubthread::worker_pool::core — abstract base for partitions.
- cubthread::worker_pool::core::worker — abstract base for individual workers.
- cubthread::worker_pool_impl<bool Stats> — the template impl; worker_pool_type (no stats) and stats_worker_pool_type (stats) are the two used aliases.
- cubthread::worker_pool_impl::initialize, cubthread::worker_pool_impl::execute, cubthread::worker_pool_impl::execute_on_core, cubthread::worker_pool_impl::warmup, cubthread::worker_pool_impl::stop_execution, cubthread::worker_pool_impl::is_running, cubthread::worker_pool_impl::get_worker_count, cubthread::worker_pool_impl::get_core_count — worker-pool API.
- cubthread::worker_pool_impl::map_running_contexts / map_cores — debug iteration over running threads / cores.
- cubthread::worker_pool_impl::core_impl — concrete partition with m_workers, m_available_workers, m_task_queue, m_temp_workers.
- cubthread::worker_pool_impl::core_impl::execute_task — assign-or-enqueue.
- cubthread::worker_pool_impl::core_impl::get_task_or_become_available — worker reentry.
- cubthread::worker_pool_impl::core_impl::execute_task_as_temp — ad-hoc method/SP worker spawned outside the pool.
- cubthread::worker_pool_impl::core_impl::worker_impl — concrete worker; owns m_context_p, m_wrapped_task, m_task_cv, m_task_mutex, m_stop, m_has_thread, m_is_temp.
- cubthread::worker_pool_impl::core_impl::worker_impl::run — thread main.
- cubthread::worker_pool_impl::core_impl::worker_impl::init_run / finish_run — context create / retire.
- cubthread::worker_pool_impl::core_impl::worker_impl::assign_task / start_thread / get_new_task / execute_current_task / retire_current_task / stop_execution — worker control flow.
- cubthread::worker_pool_impl::wrapped_task — task pointer + optional timing.
- cubthread::worker_pool_impl::stats — per-worker counter definitions (start_thread, create_context, execute_task, retire_task, found_in_queue, wakeup_with_task, recycle_context, retire_context).
- cubthread::worker_pool_task_capper — bounded-queue wrapper.
- cubthread::system_core_count, cubthread::wp_handle_system_error, cubthread::wp_set_force_thread_always_alive, cubthread::wp_is_thread_always_alive_forced — utilities.
cubthread::daemon and cubthread::looper
- cubthread::daemon — single-thread looper class.
- cubthread::daemon::loop_with_context, cubthread::daemon::loop_without_context — thread main.
- cubthread::daemon::wakeup, cubthread::daemon::stop_execution, cubthread::daemon::pause, cubthread::daemon::was_woken_up, cubthread::daemon::reset_looper, cubthread::daemon::is_running — control.
- cubthread::daemon::get_stats, cubthread::daemon::get_stats_value_count, cubthread::daemon::get_stat_name — stats facade.
- cubthread::looper — wait pattern.
- cubthread::looper::wait_type — INF / FIXED / INCREASING / CUSTOM.
- cubthread::looper::put_to_sleep — the called method.
- cubthread::looper::reset — used by INCREASING_WAITS on wakeup.
- cubthread::looper::stop, cubthread::looper::is_stopped — shutdown gate.
- cubthread::looper::setup_fixed_waits, cubthread::looper::setup_infinite_wait, cubthread::looper::setup_increasing_waits — internal policy function bound at construction.
- cubthread::waiter — the per-daemon mutex+condvar used by looper.
- cubthread::condvar_wait — overload that handles wait_duration<D> (infinite flag).
cubthread::task and entry-task adapters
- cubthread::task<Context> — abstract base for any task with execute(Context &) + retire().
- cubthread::callable_task<Context> — std::function-backed task.
- cubthread::task<void> / task_without_context / callable_task<void> — context-less specialisation.
- cubthread::entry_task = task<entry> — the standard adapter.
- cubthread::entry_callable_task = callable_task<entry>.
- cubthread::entry_manager — create_context / retire_context / recycle_context / stop_execution.
- cubthread::daemon_entry_manager — daemon-specific specialisation with on_daemon_create / on_daemon_retire.
- cubthread::system_worker_entry_manager — pre-baked manager that sets entry::type and a system-tdes context.
Lock-free hash map
- cubthread::lockfree_hashmap<Key, T> — facade over old/new impls.
- cubthread::lockfree_hashmap::iterator.
- cubthread::lockfree_hashmap::init, init_as_old, init_as_new, destroy.
- cubthread::lockfree_hashmap::find, find_or_insert, insert, insert_given, erase, erase_locked, unlock, clear, freelist_claim, freelist_retire, start_tran, end_tran.
- cubthread::lockfree_hashmap::is_old_type, cubthread::lockfree_hashmap::get_tran_entry — internal forwarding glue. Macros lockfree_hashmap_forward_func / lockfree_hashmap_forward_func_noarg route to old or new.
- cubthread::get_thread_entry_lftransys — accessor for the new-style lock-free transaction system.
- THREAD_TS_* slots in cubthread::entry::tran_entries — per-table descriptor identities (in thread_entry.hpp).
Critical sections
- SYNC_CRITICAL_SECTION (struct) — RW gate state.
- SYNC_RWLOCK, SYNC_RMUTEX (struct) — sibling primitives.
- SYNC_STATS, SYNC_PRIMITIVE_TYPE, SYNC_TYPE_CSECT/RWLOCK/RMUTEX/MUTEX — common stat plumbing.
- csect_initialize_static_critical_sections, csect_finalize_static_critical_sections — bring-up.
- csect_initialize_critical_section, csect_finalize_critical_section — single-csect lifecycle.
- csect_enter, csect_enter_as_reader, csect_enter_critical_section, csect_enter_critical_section_as_reader, csect_demote, csect_promote, csect_exit, csect_exit_critical_section — primary API.
- csect_check_own — assertion helper.
- csect_dump_statistics, csect_start_scan, csect_name_at — diagnostics.
- rwlock_initialize, rwlock_finalize, rwlock_read_lock, rwlock_read_unlock, rwlock_write_lock, rwlock_write_unlock — non-reentrant RW lock variant.
- rmutex_initialize, rmutex_finalize, rmutex_lock, rmutex_unlock — recursive mutex variant.
- sync_initialize_sync_stats, sync_finalize_sync_stats, sync_dump_statistics — common stats.
- cubsync::critical_section_tracker — per-thread debug tracker: start, stop, clear_all, on_enter_as_reader, on_enter_as_writer, on_promote, on_demote, on_exit, check_csect_interdependencies — the tracker API used in debug builds.
Position hints (as of this revision)
| Symbol | File | Line |
|---|---|---|
cubthread::manager (class) | src/thread/thread_manager.hpp | 111 |
cubthread::manager::push_task | src/thread/thread_manager.cpp | 157 |
cubthread::manager::create_daemon | src/thread/thread_manager.cpp | 126 |
cubthread::manager::create_worker_pool (template) | src/thread/thread_manager.hpp | 367 |
cubthread::manager::set_max_thread_count_from_config | src/thread/thread_manager.cpp | 266 |
cubthread::manager::claim_entry / retire_entry | src/thread/thread_manager.cpp | 234 / 242 |
cubthread::initialize | src/thread/thread_manager.cpp | 315 |
cubthread::initialize_thread_entries | src/thread/thread_manager.cpp | 378 |
REGISTER_CONNECTION/WORKERPOOL/DAEMON macros | src/thread/thread_manager.hpp | 496–498 |
cubthread::entry (class) | src/thread/thread_entry.hpp | 195 |
cubthread::entry::status enum | src/thread/thread_entry.hpp | 202 |
THREAD_TS_* enum | src/thread/thread_entry.hpp | 81 |
thread_resume_suspend_status enum | src/thread/thread_entry.hpp | 139 |
thread_type enum | src/thread/thread_entry.hpp | 124 |
cubthread::entry ctor | src/thread/thread_entry.cpp | 78 |
cubthread::entry::request_lock_free_transactions | src/thread/thread_entry.cpp | 220 |
thread_suspend_wakeup_and_unlock_entry | src/thread/thread_entry.cpp | 497 |
thread_wakeup_internal | src/thread/thread_entry.cpp | 600 |
thread_suspend_with_other_mutex | src/thread/thread_entry.cpp | 688 |
cubthread::worker_pool (base) | src/thread/thread_worker_pool.hpp | 54 |
cubthread::worker_pool::core | src/thread/thread_worker_pool.hpp | 123 |
cubthread::worker_pool::core::worker | src/thread/thread_worker_pool.hpp | 178 |
cubthread::worker_pool_impl | src/thread/thread_worker_pool_impl.hpp | 105 |
worker_pool_impl::core_impl | src/thread/thread_worker_pool_impl.hpp | 263 |
core_impl::worker_impl | src/thread/thread_worker_pool_impl.hpp | 339 |
core_impl::execute_task | src/thread/thread_worker_pool_impl.hpp | 896 |
core_impl::get_task_or_become_available | src/thread/thread_worker_pool_impl.hpp | 1012 |
core_impl::execute_task_as_temp | src/thread/thread_worker_pool_impl.hpp | 1194 |
worker_impl::run | src/thread/thread_worker_pool_impl.hpp | 1387 |
worker_impl::init_run / finish_run | src/thread/thread_worker_pool_impl.hpp | 1430 / 1457 |
worker_impl::assign_task (with task) | src/thread/thread_worker_pool_impl.hpp | 1267 |
worker_impl::get_new_task | src/thread/thread_worker_pool_impl.hpp | 1521 |
worker_impl::stop_execution | src/thread/thread_worker_pool_impl.hpp | 1326 |
worker_pool_impl::stop_execution | src/thread/thread_worker_pool_impl.hpp | 594 |
worker_pool_task_capper | src/thread/thread_worker_pool_taskcap.hpp | 30 |
cubthread::daemon | src/thread/thread_daemon.hpp | 87 |
daemon::loop_with_context | src/thread/thread_daemon.cpp | 209 |
daemon::loop_without_context | src/thread/thread_daemon.cpp | 248 |
daemon::stop_execution | src/thread/thread_daemon.cpp | 90 |
cubthread::looper | src/thread/thread_looper.hpp | 81 |
looper::wait_type enum | src/thread/thread_looper.hpp | 132 |
looper::put_to_sleep | src/thread/thread_looper.cpp | 119 |
looper::setup_increasing_waits | src/thread/thread_looper.cpp | 208 |
cubthread::lockfree_hashmap | src/thread/thread_lockfree_hash_map.hpp | 35 |
lockfree_hashmap_forward_func macro | src/thread/thread_lockfree_hash_map.hpp | 162 |
SYNC_CRITICAL_SECTION struct | src/thread/critical_section.h | 110 |
CSECT_* enum | src/thread/critical_section.h | 57 |
csect_Names[] | src/thread/critical_section.c | 76 |
csect_initialize_static_critical_sections | src/thread/critical_section.c | 243 |
csect_enter | src/thread/critical_section.c | 674 |
csect_enter_critical_section | src/thread/critical_section.c | 474 |
csect_enter_as_reader | src/thread/critical_section.c | 891 |
cubsync::critical_section_tracker | src/thread/critical_section_tracker.hpp | 32 |
critical_section_tracker::on_enter_as_reader | src/thread/critical_section_tracker.cpp | 60 |
critical_section_tracker::check_csect_interdependencies | src/thread/critical_section_tracker.cpp | 188 |
Pool/daemon consumer call sites
| Consumer | File | Line |
|---|---|---|
transaction worker pool | src/connection/server_support.c | 581 |
vacuum worker pool | src/query/vacuum.c | 1342 |
vacuum-master daemon | src/query/vacuum.c | 1352 |
parallel-query worker pool | src/query/parallel/px_worker_manager_global.cpp | 69 |
loaddb worker pool | src/loaddb/load_worker_manager.cpp | 124 |
online-index worker pool | src/storage/btree_load.c | 5315 |
backup-read worker pool | src/transaction/log_page_buffer.c | 7546 |
recovery-redo worker pool | src/transaction/log_recovery_redo_parallel.cpp | 662 |
log-checkpoint daemon | src/transaction/log_manager.c | 10415 |
log-rm-archive daemon | src/transaction/log_manager.c | 10440 |
log-clock daemon | src/transaction/log_manager.c | 10457 |
ha-delay-check daemon | src/transaction/log_manager.c | 10482 |
log-flush daemon | src/transaction/log_manager.c | 10500 |
cdc-loginfo-producer daemon | src/transaction/log_manager.c | 14046 |
deadlock-detect daemon | src/transaction/lock_manager.c | 5820 |
dwb-flush-block daemon | src/storage/double_write_buffer.cpp | 4078 |
dwb-file-sync daemon | src/storage/double_write_buffer.cpp | 4092 |
pgbuf-maintain daemon | src/storage/page_buffer.c | 16538 |
pgbuf-page-flush daemon | src/storage/page_buffer.c | 16556 |
pgbuf-page-post-flush daemon | src/storage/page_buffer.c | 16580 |
pgbuf-flush-control daemon | src/storage/page_buffer.c | 16604 |
session-control daemon | src/session/session.c | 588 |
pl-monitor daemon | src/sp/pl_sr.cpp | 266 |
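
Every consumer in this table follows the same two-step shape: a file-scope `REGISTER_*` macro so the manager's count registry sizes the entry pool for the new thread(s), then a `create_worker_pool` or `create_daemon` call at module init. Below is a minimal sketch of the daemon half, assuming `create_daemon` takes a looper, an `entry_task` pointer, and a name in that order; the registration-macro argument style, the 60-second period, and `sweep_sessions` are illustrative rather than copied from `session.c`.

```cpp
// Sketch under the assumptions stated above; verify the exact macro and
// create_daemon/destroy_daemon signatures in src/thread/thread_manager.hpp.
#include "thread_daemon.hpp"
#include "thread_entry_task.hpp"
#include "thread_looper.hpp"
#include "thread_manager.hpp"

#include <chrono>

// File scope: reserve one daemon entry in the count registry. Without this,
// create_daemon below finds no spare thread entry and returns NULL.
REGISTER_DAEMON (session_control);

static cubthread::daemon *session_Control_daemon = NULL;

// The recurring duty; runs once per looper tick on the daemon's own thread.
static void
sweep_sessions (cubthread::entry &thread_ref)
{
  // placeholder body: the real duty would scan and expire idle sessions
  (void) thread_ref;
}

void
session_control_daemon_init ()
{
  cubthread::looper loop (std::chrono::seconds (60));   // fixed 60 s period
  session_Control_daemon =
    cubthread::get_manager ()->create_daemon (loop,
                                              new cubthread::entry_callable_task (sweep_sessions),
                                              "session_control");
}

void
session_control_daemon_destroy ()
{
  cubthread::get_manager ()->destroy_daemon (session_Control_daemon);
}
```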
Cross-check Notes
This document is the foundation for several others; it should be read in concert with them.

- `cubrid-vacuum.md` (master + workers). Vacuum’s master is one of the daemons listed above; vacuum’s workers are one of the worker pools. The master is a `cubthread::daemon` with a fixed looper that, on each tick, scans the WAL forward and dispatches per-block jobs by calling `thread_get_manager()->push_task` on the `vacuum` pool (a minimal sketch of that dispatch shape follows this list). Every vacuum worker thread carries a `vacuum_worker *` on its `cubthread::entry` (the `vacuum_Worker_entry_manager` overrides `on_create` to attach it). Look there for the per-block job logic; everything in this document is the substrate.
- `cubrid-recovery-manager.md` (parallel redo workers). Recovery’s parallel redo uses the `recovery-redo` worker pool registered in `log_recovery_redo_parallel.cpp`. The worker pool is constructed inside the recovery flow rather than at server startup and is destroyed at the end of recovery — this is the one short-lived pool the manager hosts. Workers there pin a `worker_pool_task_capper` for backpressure on the redo job stream.
- `cubrid-page-buffer-manager.md` (flush daemon, post-flush daemon, flush-control, maintain). Four daemons live in the page-buffer module, all created in `pgbuf_initialize` with fixed-period or infinite loopers. Their wakeup hints come from the page-buffer hot path: when the dirty count crosses `pgbuf_flush_threshold`, the buffer manager calls `pgbuf_Page_flush_daemon->wakeup()`. Look there for the flush-control feedback math; everything else (the daemon mechanism) is in this document.
- `cubrid-lock-manager.md` (deadlock detector). The `deadlock-detect` daemon is in `src/transaction/lock_manager.c` with a fixed looper period of `prm_get_integer_value (PRM_ID_DEADLOCK_DETECTION_INTERVAL_SECS)`. Its task scans the wait-for graph maintained in the lock table, which is itself a `cubthread::lockfree_hashmap` over `LK_RES`/`LK_ENT` slots (`THREAD_TS_OBJ_LOCK_RES`/`THREAD_TS_OBJ_LOCK_ENT`). The csect involved is `CSECT_WFG`.
- `cubrid-log-manager.md` (log flush daemon, archive removal, CDC). Six daemons live in the log module: `log-flush`, `log-checkpoint`, `log-rm-archive`, `log-clock`, and `ha-delay-check`, plus `cdc-loginfo-producer`. All are registered together with `REGISTER_DAEMON(log_*)` macros.
- `cubrid-2pc.md` and `cubrid-ha-replication.md`. These cross-reference daemons that live outside `src/thread/` proper (the master process’s heartbeat threads in `src/executables/master_heartbeat.c` use a different pattern — see `cubrid-heartbeat.md` for that). This document is strictly about the in-server thread system.
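
The master-daemon-feeds-worker-pool composition in the vacuum item above is the canonical way daemons and pools interact, so here is a minimal sketch of the dispatch shape. `push_task`, `entry_task`, and `cubthread::get_manager` come from the source index above; `block_job_task`, `vacuum_blockid`, `g_vacuum_pool`, `scan_wal_for_ready_blocks`, and `vacuum_process_block` are hypothetical stand-ins, not the real vacuum names.

```cpp
// Sketch only (see the hedging note above). The manager API names are taken
// from src/thread/thread_manager.hpp and src/thread/thread_entry_task.hpp;
// everything vacuum-specific here is an invented placeholder.
#include "thread_entry_task.hpp"   // cubthread::entry_task
#include "thread_manager.hpp"      // cubthread::get_manager, push_task, entry_workpool

#include <vector>

using vacuum_blockid = long long;                               // placeholder payload type
extern cubthread::entry_workpool *g_vacuum_pool;                // the "vacuum" pool from the table above
extern std::vector<vacuum_blockid> scan_wal_for_ready_blocks ();
extern void vacuum_process_block (cubthread::entry &thread_ref, vacuum_blockid blockid);

// One queued job: executed later by whichever pooled worker claims it.
class block_job_task : public cubthread::entry_task
{
  public:
    explicit block_job_task (vacuum_blockid blockid) : m_blockid (blockid) {}

    void execute (cubthread::entry &thread_ref) override
    {
      // thread_ref is the claiming worker's per-thread context (error info, heap, trackers)
      vacuum_process_block (thread_ref, m_blockid);
    }

  private:
    vacuum_blockid m_blockid;
};

// One tick of the master daemon: scan the WAL forward, push one task per ready block.
static void
master_tick ()
{
  for (vacuum_blockid id : scan_wal_for_ready_blocks ())
    {
      cubthread::get_manager ()->push_task (g_vacuum_pool, new block_job_task (id));
    }
}
```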
A few drift points worth flagging:
- The `entry` class header explicitly says it is mid-refactor away from public fields. The TODO in `src/thread/thread_entry.hpp` (“make member variable private, remove content that does not belong here, migrate here thread entry related functionality from thread.c/h”) will not be finished in one sweep — expect new code to mix accessor calls and direct field access for some time.
- The lock-free hashmap façade has two live implementations (`OLD`, `NEW`) selected by `PRM_ID_ENABLE_NEW_LFHASH`. The intent is clearly to retire the old `lf_hash_table_cpp` eventually; until then, the `lockfree_hashmap_forward_func` macro hides which one is in use behind every call. New code that bypasses the macro sees only the OLD path. (A generic sketch of this forwarding shape follows this list.)
- The csect set is fixed-size (`CSECT_LAST` ≈ 18). New csects must be appended to the enum and to `csect_Names[]`; a new `cstrack_entry` slot then fits automatically into the per-thread tracker. The interdependency rule in `check_csect_interdependencies` is hard-coded against `CSECT_LOCATOR_SR_CLASSNAME_TABLE` ↔ `CSECT_CT_OID_TABLE`; any new partial ordering needs an entry there.
- Count-registry-driven sizing means a daemon or pool added without its `REGISTER_*` macro at file scope silently fails at `create_*` time (returns `NULL` because the entry pool is too small). The error is recoverable but easy to miss; the registration is the contract (see the creation sketch after the consumer table above).
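
To make the forwarding-macro point concrete, here is a generic sketch of the shape. It is not the real `lockfree_hashmap_forward_func`; every name in it (`dual_hashmap`, `dual_forward`, `m_old`, `m_new`) is invented, and standard containers stand in for the two lock-free backends.

```cpp
// Generic illustration only; the CUBRID facade lives in
// src/thread/thread_lockfree_hash_map.hpp and forwards to real lock-free tables.
#include <map>
#include <string>
#include <unordered_map>

template <typename Key, typename Val>
class dual_hashmap
{
  public:
    explicit dual_hashmap (bool use_new) : m_use_new (use_new) {}

    // One macro body routes every public method to the backend chosen at
    // construction. Callers that bypass the macro and touch a member directly
    // pin themselves to one implementation, which is the drift risk flagged above.
#define dual_forward(func, ...) \
  do { \
    if (m_use_new) { m_new.func (__VA_ARGS__); } \
    else           { m_old.func (__VA_ARGS__); } \
  } while (0)

    void insert (const Key &k, const Val &v) { dual_forward (emplace, k, v); }
    void erase (const Key &k)                { dual_forward (erase, k); }

#undef dual_forward

  private:
    bool m_use_new;                       // analogue of PRM_ID_ENABLE_NEW_LFHASH
    std::map<Key, Val> m_old;             // stand-in for the legacy table
    std::unordered_map<Key, Val> m_new;   // stand-in for the NEW table
};

int main ()
{
  dual_hashmap<int, std::string> map (true);   // pick the NEW backend
  map.insert (1, "one");
  map.erase (1);
  return 0;
}
```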
Open Questions
- Why is the round-robin counter incremented modulo `m_max_workers` rather than `m_cores.size()`? The comment in `get_round_robin_core_hash` says “preserve assignments proportional to core size”, but the modulus is the worker count, not the core count. The arithmetic works out for evenly-sized cores; for unevenly-sized cores (worker_count not divisible by core_count), dispatch is biased toward the first (remainder) cores. (A small worked example follows this list.)
- What sets `pool_threads=true` in production? The flag bypasses the idle-timeout/exit path entirely. It is wired into `loaddb` and `backup-read` explicitly; the `transaction` pool uses the default. The perf-test override (`wp_set_force_thread_always_alive`) is enabled by `PRM_ID_PERF_TEST_MODE`. There may be additional implicit enablers I have not traced.
- Can a worker context survive across very different tasks? `recycle_context` is called between tasks, so the entry’s trackers and TDES are reset, but `private_heap_id` persists. If a task allocates into the private heap and frees only on retire, a long-lived recycled entry could accumulate fragmentation. In practice the trackers assert this in debug builds; release-mode behaviour is silent.
- Does the csect tracker have any production impact? `ENABLE_TRACKERS` is `!NDEBUG && SERVER_MODE`, so the release path is statically dead and the interdependency check is debug-only. The only production protection against `CSECT_LOCATOR_SR_CLASSNAME_TABLE` ↔ `CSECT_CT_OID_TABLE` inversion is therefore the codebase’s coding discipline, not a runtime assertion.
- The intermediate `m_temp_workers` list and `register_free_temp_list` are designed for method/SP execution that needs a thread but should not consume a pool slot. Is there a hard cap on how many temp workers a single pool can spawn under load? On flooding, this could spawn unbounded OS threads. The taskcap wrapper helps, but only for queued tasks, not for `is_temp=true` paths.
- The “subtract execution time from period” detail in `looper::put_to_sleep` assumes `m_start_execution_time` is monotonic. It uses `std::chrono::system_clock`, which on Linux is wall-clock time and can step backwards on NTP adjustment. A negative `period - execution_time` becomes `delta_time(0)`, so the daemon does not sleep — fine. But a forward jump may make the daemon skip a tick. This is probably not material at second granularity, but worth noting.
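
A worked example of the bias from the first question, under one assumption: the slot produced by the modulo is mapped to a core by walking cumulative core sizes, which is one plausible reading of “preserve assignments proportional to core size” (check the real `get_round_robin_core_hash` before treating this as its behaviour). With 10 workers over 4 cores the core sizes are 3, 3, 2, 2, so the first two cores receive 3/10 of the tasks each and the last two only 2/10 each.

```cpp
// Illustrative arithmetic only; no CUBRID code is used here.
#include <cstdio>

int main ()
{
  const int core_count = 4;
  const int worker_count = 10;       // not divisible by core_count
  int hits[core_count] = { };

  for (int counter = 0; counter < 1000; counter++)
    {
      int slot = counter % worker_count;   // modulo m_max_workers, as in the question

      // assumed mapping: walk cumulative core sizes (3, 3, 2, 2) until the slot fits
      int cumulative = 0;
      for (int core = 0; core < core_count; core++)
        {
          int core_size = worker_count / core_count + (core < worker_count % core_count ? 1 : 0);
          cumulative += core_size;
          if (slot < cumulative)
            {
              hits[core]++;
              break;
            }
        }
    }

  // prints 300, 300, 200, 200: the first (remainder) cores get proportionally more tasks
  for (int core = 0; core < core_count; core++)
    {
      std::printf ("core %d: %d tasks\n", core, hits[core]);
    }
  return 0;
}
```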
Sources
- `src/thread/thread_manager.cpp`, `src/thread/thread_manager.hpp` — the manager singleton and its allocation, registration, and lifecycle.
- `src/thread/thread_entry.cpp`, `src/thread/thread_entry.hpp` — `cubthread::entry` and the C-callable suspend/wake API.
- `src/thread/thread_worker_pool.hpp` — the abstract base classes (`worker_pool`, `core`, `worker`).
- `src/thread/thread_worker_pool_impl.hpp`, `src/thread/thread_worker_pool_impl.cpp` — the templated `worker_pool_impl<bool Stats>` and its core/worker implementations.
- `src/thread/thread_worker_pool_taskcap.hpp`, `src/thread/thread_worker_pool_taskcap.cpp` — `worker_pool_task_capper`, the bounded-queue wrapper.
- `src/thread/thread_daemon.hpp`, `src/thread/thread_daemon.cpp` — single-thread looper.
- `src/thread/thread_looper.hpp`, `src/thread/thread_looper.cpp` — wait policy.
- `src/thread/thread_waiter.hpp`, `src/thread/thread_waiter.cpp` — `cubthread::waiter` (the per-daemon mutex+condvar).
- `src/thread/thread_task.hpp`, `src/thread/thread_entry_task.hpp`, `src/thread/thread_entry_task.cpp` — `task<Context>`, `entry_task`, `entry_manager` and friends.
- `src/thread/thread_lockfree_hash_map.hpp`, `src/thread/thread_lockfree_hash_map.cpp` — `cubthread::lockfree_hashmap` facade.
- `src/thread/critical_section.h`, `src/thread/critical_section.c` — `SYNC_CRITICAL_SECTION` and the csect API.
- `src/thread/critical_section_tracker.hpp`, `src/thread/critical_section_tracker.cpp` — per-thread debug tracker.
grep -nE 'create_worker_pool|thread_create_worker_pool|thread_create_stats_worker_pool|create_daemon':src/connection/server_support.c(transaction pool)src/query/vacuum.c(vacuum pool + master daemon)src/query/parallel/px_worker_manager_global.cpp(parallel-query pool)src/loaddb/load_worker_manager.cpp(loaddb pool)src/storage/btree_load.c(online-index pool)src/transaction/log_page_buffer.c(backup-read pool)src/transaction/log_recovery_redo_parallel.cpp(recovery-redo pool)src/transaction/log_manager.c(six log/HA/CDC daemons)src/transaction/lock_manager.c(deadlock-detect daemon)src/storage/double_write_buffer.cpp(two DWB daemons)src/storage/page_buffer.c(four pgbuf daemons)src/session/session.c(session-control daemon)src/sp/pl_sr.cpp(pl-monitor daemon)
- Theory: Petrov, Database Internals, Ch. 14 “Concurrency
Control”; Herlihy & Shavit, The Art of Multiprocessor
Programming (Treiber stack, Michael–Scott queue,
split-ordered hashmap); Maged M. Michael, High Performance
Dynamic Lock-Free Hash Tables and List-Based Sets (SPAA 2002);
the Java
ConcurrentHashMap lineage for the new lock-free hashmap shape.