
CUBRID Lock-Free Primitives — Overview, Two Generations, and Reclamation Spine


A lock-free data structure is one in which at least one thread is guaranteed to make progress in any bounded number of steps. The definition is operational, not aesthetic: blocking a thread on a mutex is forbidden because a swapped-out lock-holder freezes everyone else; busy-spinning a CAS-loop on a contended word is allowed because the thread that just swapped the word forward did make progress, even if its competitors did not. The Art of Multiprocessor Programming (Herlihy & Shavit, ch. 3 §“Progress Conditions”) names the hierarchy: wait-free (every thread finishes in a bounded number of steps), lock-free (at least one thread finishes), and obstruction-free (any thread that runs in isolation finishes). Most production lock-free code is lock-free, not wait-free, because making it wait-free typically costs more in cache traffic than it saves in latency.

The hardware primitive every lock-free design composes from is some form of compare-and-swap. On x86 it is CMPXCHG (and the 16-byte variant CMPXCHG16B); on ARM it is the load-link / store-conditional pair (LDREX/STREX, or LSE atomics on ARMv8.1+); in C++11 these are abstracted as std::atomic<T>::compare_exchange_weak/strong. CAS gives a single-word atomic test-and-update; everything else — Treiber stacks, Michael–Scott queues, Harris linked lists, split-ordered hashes, RCU readers — is built on top by encoding the structure’s invariants into the value of the word being CAS’d.

Three problems separate lock-free in theory from lock-free in production code, and every CUBRID lock-free file in this analysis exists to solve some combination of them.

If a CAS reads value A, then a sibling thread changes it to B and back to A before the CAS retries, the CAS will succeed as if no change happened — even though the underlying object the pointer-shaped A referred to was destroyed and rebuilt in between. The canonical reproduction is a Treiber stack: thread 1 reads top → A, plans to CAS top from A to A->next; thread 2 pops A, pops B, frees both, allocates a new A' at the same address, pushes A' so top → A' with A'->next pointing somewhere unrelated; thread 1 wakes, CAS succeeds because top is A' which compares equal to the saved A, and top becomes a dangling pointer to free memory. The classical references are Treiber’s IBM Almaden report (1986), and Michael’s “Hazard Pointers” paper (Michael 2004, IEEE TPDS).

The ABA problem is a special case of a more general one: when can a node retired from a lock-free structure be returned to the allocator? Returning it too early gives ABA and use-after-free; returning it too late is a memory leak. A real lock-free system needs an epoch (some monotonic clock) and a way to determine “no thread holds a reference to a retired node older than epoch E.” There are four mainstream techniques:

| Technique | Mechanic | Cost on read path | Reclamation cost |
|---|---|---|---|
| Hazard pointers (Michael 2004) | Each thread publishes the few pointers it is currently dereferencing | O(1) atomic store per access | Reclaimer scans every thread’s hazard slots before freeing |
| Epoch-Based Reclamation / EBR (Fraser 2004) | Each thread enters/exits an epoch on access; reclaimer waits until all threads cross the boundary | O(1) per critical section | Periodic global epoch advance |
| RCU (read-copy-update; McKenney 2001) | Readers never block; writers wait for a “grace period” that all readers transit | Effectively zero | Grace-period detection cost |
| QSBR (quiescent-state-based) | Each thread reports a “quiescent state” periodically; no node is freed until every thread has reported one since retirement | Zero on hot path | One per-thread counter check at reclaim |

CUBRID picks an EBR-flavoured scheme — its transaction-based reclamation described in §“CUBRID’s Approach” — and implements it once, in the lockfree::tran namespace. Every other lock-free structure (hashmap, freelist) plugs into it.

std::atomic<T> operations take a memory-order argument (memory_order_relaxed, acquire, release, acq_rel, seq_cst). Picking the wrong one is a defect that may not surface on x86 (which is close to sequentially consistent for word-sized loads and stores) but reorders into bugs on weak-memory architectures (ARM, POWER, RISC-V). The textbook account is C++ Concurrency in Action (Williams, ch. 5). The two anchor patterns are:

  • acquire-release synchronization. A release store on word W pairs with an acquire load on the same W; the loader then observes every store sequenced-before the releaser’s release. This is how a lock-free queue’s consumer observes the producer’s payload write.
  • read-modify-write (RMW) atomics. A successful compare_exchange is both an acquire and a release (when called with memory_order_acq_rel or stronger). It is the join point that publishes a writer’s prior stores and acquires the predecessor’s.

CUBRID’s lock-free code overwhelmingly uses default seq_cst ordering on std::atomic<T>, which is conservative but correct on every target architecture CUBRID supports. Tightening to acquire/release is a known optimization opportunity rather than a current invariant.

Every lock-free building block in CUBRID is one of three textbook shapes:

  • Lock-free linked list (Harris 2001 / Michael 2002). Insert is a single CAS that splices a new node in. Delete is a two-CAS protocol: first logically delete the node by setting a mark bit on its next pointer (so concurrent traversers know to skip it), then physically unlink it with a CAS on the predecessor’s next. Reads detect the mark and either help unlink or treat the node as gone. CUBRID’s hashmap chains are Harris-Michael lists; the mark bit is the low bit of the next pointer, encoded by lockfree::address_marker.
  • Lock-free FIFO queue (Michael & Scott 1996). The two-cursor variant: a producer cursor and a consumer cursor, each on its own atomic. CUBRID’s lockfree::circular_queue is a fixed-capacity bounded variant — closer in shape to LMAX Disruptor than to MS-queue — using cursor sequence numbers and per-slot block-flags.
  • Lock-free split-ordered hash (Shalev & Shavit 2003). The hash table is a single recursive lock-free linked list with bucket pointers acting as shortcuts; resize requires only inserting new bucket pointers, not relinking the data. CUBRID does not use split-ordered hashing — its hash table is a fixed-bucket-count chained design where each bucket points at a Harris-Michael list. Resize is not lock-free; the only resize-shaped operation is clear(), which atomic-swaps the bucket array with a back-buffer.

Three patterns name the design space the next two sections fill in.

Every modern DBMS pays for three lock-free responsibilities: inter-thread queues (the producer-consumer rendezvous that hands work between layers), shared associative tables (page buffers, lock tables, session caches keyed by some identifier), and per-thread free lists (so a transaction-private fast path does not bounce on a global allocator). The shapes converge.

The standard role: a daemon thread fills a queue, worker threads drain it (or vice versa). Variants:

  • Bounded MPMC ring buffer. Power-of-two capacity, two cursor atomics, sequence numbers per slot. Closest production exemplar: LMAX Disruptor. Used in InnoDB’s redo log writer ring, in PostgreSQL’s autovacuum worker queue, and in CUBRID’s lockfree::circular_queue (used by vacuum, page-buffer victim hand-off, and CDC log-info forwarding).
  • Unbounded Michael–Scott queue. Two atomic pointers head and tail, each node allocated on enqueue and freed on dequeue. Cleaner semantics but unbounded memory and mandatory per-op allocation.
  • Per-CPU SPMC / MPSC queues. One queue per CPU/thread group; steal across when local is empty (work-stealing). CUBRID does not use this pattern in the lock-free namespace.

Every DBMS has a lock table (one entry per resource being locked), a page buffer hash (one entry per resident page, keyed by (volid, pageid)), a session cache (one entry per client connection), and a statement cache (XASL, query plan). Every one of these wants concurrent insert / find / erase under heavy contention.

The shared engineering pattern:

  1. Bucket array of fixed size, hash(key) % size → bucket.
  2. Each bucket is the head of a lock-free linked list.
  3. List nodes carry the entry payload + a next pointer with the low bit reserved as a delete mark.
  4. Insert = walk to end of list, CAS bucket-or-tail next from NULL to new node.
  5. Delete = CAS-mark the victim’s next, then CAS-unlink it from the predecessor.
  6. Reclamation is deferred to an epoch / hazard-pointer / RCU / transactional scheme.

CUBRID’s lockfree::hashmap is exactly this shape, with one distinctive choice: per-entry pthread mutex, optional. When a descriptor declares using_mutex = 1, the hashmap takes the entry mutex during find/insert/delete and uses transactions only to guard the bucket chain against concurrent retirements; payload mutation is mutex-protected. When using_mutex = 0, the transaction itself is the only protection — payload reads must finish inside a transaction window.

Allocating a fresh list node for every insert and freeing on every delete is a malloc bottleneck even for a great allocator. The DBMS-typical answer is a per-data-structure free list of typed nodes — when an entry retires from the bucket chain, it does not go to free(), it goes to a free list keyed by the structure that owns it. Subsequent inserts pop from the free list rather than calling new. This is the role of lockfree::freelist<T>.

Two design wrinkles every implementation hits:

  • Contention on a single available list — the most-recent-pop cache line bounces between cores. Standard cure: a back-buffer block held aside, swapped wholesale into the available list when the available list drains, so the slow path of allocating a new block is hidden behind a one-CAS swap. CUBRID’s freelist does exactly this.
  • ABA on the head pointer — pop reads head, computes head->next, CAS-swaps head to head->next; in between, a sibling pops the same head and a third party pushes it back. The saved head->next is now stale. CUBRID’s freelist documents this hazard explicitly in pop_from_available() (lockfree_freelist.hpp) and relies on the transaction system rather than tagged pointers to bound the window.

The fourth shared piece — and the one CUBRID extracts most cleanly into its own class — is deferred reclamation tied to thread activity. Even if every consumer of a lock-free structure ships its own free list, somebody has to decide when a retired node can be physically reused. The shared template:

  1. A per-data-structure tranid counter; every retirement and every “I am about to read” event takes a snapshot.
  2. A per-thread descriptor, one per data structure, holding the thread’s last-fetched tranid and a list of “retired by me, not yet reclaimable” nodes.
  3. A periodic computation of min(active_tranid) across all threads.
  4. A node retired with tranid = X is reclaimable once min_active_tranid > X.

Compare this template to the Linux kernel’s RCU (read-side critical section bracketing a per-CPU counter, grace-period detection by a global counter chase) and it is the same shape, restricted to a per-DS scope rather than a system-wide one. CUBRID generalizes it as the lockfree::tran family and lets each lock-free structure own its own tran::table.

| Concept | CUBRID name (legacy) | CUBRID name (modern) |
|---|---|---|
| Per-DS reclamation epoch | LF_TRAN_SYSTEM::global_transaction_id | lockfree::tran::table::m_global_tranid |
| Per-thread reader bracket | lf_tran_start / lf_tran_end | descriptor::start_tran / end_tran |
| Per-thread retired list | LF_TRAN_ENTRY::retired_list | descriptor::m_retired_head/tail |
| Min-active-epoch scan | lf_tran_compute_minimum_transaction_id | table::compute_min_active_tranid |
| Refresh interval | LF_TRAN_SYSTEM::mati_refresh_interval = 100 | MATI_REFRESH_INTERVAL = 100 |
| Per-thread index allocator | LF_TRAN_SYSTEM::lf_bitmap (LF_BITMAP) | tran::system::m_tran_idx_map (bitmap) |
| Pointer-mark bit | ADDR_HAS_MARK / ADDR_WITH_MARK macros | lockfree::address_marker<T> template |
| Bounded MPMC queue | n/a (introduced as C++ first) | lockfree::circular_queue<T> |
| Free-list of typed nodes | LF_FREELIST | lockfree::freelist<T> |
| Hash table | LF_HASH_TABLE / lf_hash_table_cpp<K,T> | lockfree::hashmap<K,T> |
| Hash dispatch wrapper | (was integrated into lf_hash_*) | cubthread::lockfree_hashmap<K,T> |
| Bitmap allocator | LF_BITMAP | lockfree::bitmap (same class, both names) |

CUBRID’s lock-free namespace exists as two complete and co-existing implementations:

  • Legacy C — src/base/lock_free.{h,c}. The original family, shipped well before 2020. Procedural, pointer-table based, globally-instantiated LF_TRAN_SYSTEMs (one per consumer: obj_lock_res_Ts, sessions_Ts, xcache_Ts, etc.). The data structures are LF_HASH_TABLE, LF_FREELIST, plus the lf_io_list_* insert-only list, lf_stack_* Treiber stack, and the lf_list_* Harris-Michael primitives. The C++ shim lf_hash_table_cpp<K,T> (in the same header) wraps the C procedures behind a typed class.
  • Modern C++ — src/base/lockfree_*.{hpp,cpp}. Re-architected during the C++17 conversion. Templated, namespace-scoped (lockfree::*), reclamation broken out as lockfree::tran::*, address_marker<T> replacing the macro family, circular_queue<T> introduced as a new primitive, bitmap ported with both legacy typedef (LF_BITMAP) and modern name. Each new hashmap owns its own tran::table; there are no globally-instantiated reclamation systems.

The two coexist because the migration is opt-in per call site. The bridge lives in src/thread/thread_lockfree_hash_map.hpp (cubthread::lockfree_hashmap<K,T>), which holds both an old lf_hash_table_cpp<K,T> and a new lockfree::hashmap<K,T> and dispatches on an m_type ∈ {OLD, NEW, UNKNOWN} field. The choice is driven by the system parameter PRM_ID_ENABLE_NEW_LFHASH at init() time:

```cpp
// cubthread::lockfree_hashmap::init — src/thread/thread_lockfree_hash_map.hpp
void init (lf_tran_system &transys, int entry_idx, int hash_size,
           int freelist_block_size, int freelist_block_count, lf_entry_descriptor &edesc)
{
  if (prm_get_bool_value (PRM_ID_ENABLE_NEW_LFHASH))
    init_as_new (get_thread_entry_lftransys (), ...);
  else
    init_as_old (transys, hash_size, ...);
}
```

Operationally, a CUBRID server today still defaults to the legacy path unless enable_new_lfhash is set. The new path is the preferred forward direction; the legacy path remains for risk mitigation and for call sites that have not yet migrated.

The design split is intentional: every consumer of cubthread::lockfree_hashmap (lock manager’s resource and entry tables, session table, XASL cache, FP cache, slotted-page saving, catalog, global-unique-stats, hfid table, DWB slot table) gets the toggle for free; every other consumer (vacuum’s circular queue, page buffer’s victim queues, CDC’s log-info queue) is C++-only and sits on the modern stack unconditionally.

The modern stack is built around a single reclamation primitive that every lock-free data structure plugs into. Its three classes are:

```cpp
// lockfree::tran::system — src/base/lockfree_transaction_system.hpp
class system
{
  size_t m_max_tran_per_table;  // how many concurrent threads a table must accommodate
  bitmap m_tran_idx_map;        // index allocator (each thread gets one)

  index assign_index ();
  void free_index (index idx);
};

// lockfree::tran::table — src/base/lockfree_transaction_table.hpp
class table
{
  system &m_sys;
  descriptor *m_all;            // one descriptor per assigned index
  std::atomic<id> m_global_tranid;
  std::atomic<id> m_min_active_tranid;
  static const id MATI_REFRESH_INTERVAL = 100;
  // ...
};

// lockfree::tran::descriptor — src/base/lockfree_transaction_descriptor.hpp
class descriptor
{
  table *m_table;
  id m_tranid;                  // INVALID_TRANID when not in a tran
  reclaimable_node *m_retired_head, *m_retired_tail;

  void start_tran ();
  void start_tran_and_increment_id ();
  void end_tran ();
  void retire_node (reclaimable_node &);
  void reclaim_retired_list ();
  // ...
};
```

The protocol: a thread calls descriptor::start_tran() before touching any node in the data structure, finishes its work, then calls end_tran(). A retirement bumps the global counter and appends the retired node to the calling thread’s per-descriptor list. Periodically (every MATI_REFRESH_INTERVAL = 100 retirements) the table recomputes min(active_tranid) across descriptors. Any retired node whose recorded tranid is strictly less than the current min_active_tranid is reclaimable.

This mechanism is described exhaustively in cubrid-lockfree-transaction.md. For the overview it is enough to know that:

  • Every lock-free data structure that retires nodes (hashmap, freelist) owns its own tran::table — there is no global reclamation graph. (circular_queue needs none: it is bounded and retires nothing pointer-shaped.)
  • Every thread that may touch a lock-free structure is assigned a tran::index from the system’s bitmap, fixed for that thread’s lifetime.
  • The descriptor ring is dimensioned by the maximum number of concurrent threads (max_transaction_count), not by the rate of retirements; the retirement cost is O(1) on the descriptor’s list and O(N_threads) on the periodic min-scan.

The Harris-Michael delete protocol needs a way to set a “this node is logically deleted” flag without reserving extra space — the flag must travel inside the pointer itself. CUBRID encodes the flag in the low bit of the pointer (safe because T* is at minimum 2-byte aligned in the C++ ABI; in CUBRID’s case 8-byte aligned for pointer-shaped lock-free nodes). The legacy macros ADDR_HAS_MARK / ADDR_WITH_MARK / ADDR_STRIP_MARK in lock_free.c do this with raw casts; the modern lockfree::address_marker<T> template in lockfree_address_marker.hpp provides type-safe set_adress_mark / strip_address_mark / atomic_strip_address_mark static helpers and an instance form address_marker<T> that wraps a std::atomic<T*>.

Both are byte-compatible — they encode the same bit in the same position — so a node moving between code paths during the migration sees a stable representation.

Both generations need a way to assign a stable per-thread index for descriptor / entry lookup. lockfree::bitmap is the allocator: a chunked array of std::atomic<unsigned int> words, each bit representing a slot. get_entry() rotates a start_idx counter (round-robin in SERVER_MODE, last-allocated in client mode) to find a chunk with an empty bit, then CAS-flips the bit. free_entry() clears the bit. Two style modes (ONE_CHUNK for “must always succeed” usage like the transaction system, LIST_OF_CHUNKS with a usage threshold for usage-bounded contexts) are encoded in chunking_style. Full details: cubrid-lockfree-bitmap.md.

Every modern CUBRID hash table built from lockfree::hashmap<K,T> has six parts:

  1. A bucket array (m_buckets, m_size slots, each a T*). Each slot is the head of a Harris-Michael chain.
  2. A back-buffer (m_backbuffer, same size) of marked-NULL pointers, used by the clear() operation: the live array is atomic-swapped with the back-buffer, then the old array is walked and every entry retired through the freelist.
  3. A per-table freelist of typed entries (freelist<freelist_node_data> where freelist_node_data carries the T m_entry plus the lf_entry_descriptor *m_edesc needed for on_reclaim).
  4. A per-table tran::table, owned by the freelist.
  5. An lf_entry_descriptor giving offsets and callbacks (f_hash, f_key_cmp, f_key_copy, f_init, f_uninit, f_duplicate, using_mutex, the of_* field offsets) — the sole shared type with the legacy code.
  6. A per-method statistics ring (atomic_counter_timer_stat) for find/insert/erase/unlock/clear/iter/claim/retire, activated by activate_stats().

The key-pointer interaction with type T is by byte offset, not by member access. The descriptor declares of_next, of_key, of_mutex, of_del_tran_id (legacy only), and the hashmap reads T*-shaped fields by (char*)p + offset casting. This is what makes lf_entry_descriptor the bridge: a single descriptor can drive both the legacy C path and the modern C++ path because both walk the same offsets.

Full details: cubrid-lockfree-hashmap.md.

The third structural piece is a fixed-capacity multi-producer multi-consumer queue. The implementation is a power-of-two slot array with two cursor atomics (m_produce_cursor, m_consume_cursor) and a per-slot block-flag word (m_blocked_cursors[N]). Producers CAS-acquire a slot, copy the payload in, then unblock the slot. Consumers detect that the slot is unblocked, copy the payload out, then CAS-bump the consume cursor. The implementation lives entirely in the header (lockfree_circular_queue.hpp) — no .cpp file beyond a stub. Full details: cubrid-lockfree-circular-queue.md.

The queue is unconditionally on the modern stack. It is used by:

  • Page buffer (src/storage/page_buffer.c) — waiter_threads_high_priority, waiter_threads_low_priority for fixers waiting on a victim, flushed_bcbs for post-flush processing, private_lrus_with_victims, big_private_lrus_with_victims, shared_lrus_with_victims for victim hand-off across LRU partitions.
  • Vacuum (src/query/vacuum.c) — vacuum_Block_data_buffer for log-block dispatch from the log-flush daemon to vacuum workers, vacuum_Finished_job_queue for completion-back notification.
  • CDC (src/transaction/log_manager.c) — cdc_Gl.loginfo_queue for CDC log-info forwarding.

Per-data-structure free list — freelist<T>


The fourth piece is a typed free list. lockfree::freelist<T> takes a tran::system & and constructs its own tran::table. It maintains a single available-list head and a one-block back-buffer that swaps in lazily when the available list drains. Items are typed freelist<T>::free_node, which derives from tran::reclaimable_node so retirement plugs into the reclamation spine directly. The contract on T is that it expose an on_reclaim() method which the freelist invokes when the node crosses from “retired” to “reclaimed” — this is where typed cleanup runs (e.g., calling the lf_entry_descriptor::f_uninit callback from freelist_node_data::on_reclaim).

Full details: cubrid-lockfree-freelist.md.

A non-exhaustive list of in-engine consumers, by which generation they are pinned to:

| Consumer | File | Generation |
|---|---|---|
| Object-lock resource table | src/transaction/lock_manager.c | toggle (was obj_lock_res_Ts) |
| Object-lock entry table | src/transaction/lock_manager.c | toggle (was obj_lock_ent_Ts) |
| Session table | src/session/session.c | toggle (was sessions_Ts) |
| XASL cache | src/query/xasl_cache.c | toggle (was xcache_Ts) |
| Filter-predicate cache | src/query/filter_pred_cache.c | toggle (was fpcache_Ts) |
| Slotted-page saving | src/storage/slotted_page.c | toggle (was spage_saving_Ts) |
| System catalog | src/storage/system_catalog.c | toggle (was catalog_Ts) |
| Global unique stats | src/storage/btree.c (via log_manager) | toggle (was global_unique_stats_Ts) |
| HFID table | src/storage/heap_file.c | toggle (was hfid_table_Ts) |
| DWB slot table | src/storage/double_write_buffer.cpp | toggle (was dwb_slots_Ts) |
| Free-sort-list | (legacy only) | legacy (free_sort_list_Ts) |
| Vacuum block-data buffer | src/query/vacuum.c | new only (circular_queue) |
| Vacuum finished-job queue | src/query/vacuum.c | new only (circular_queue) |
| Page-buffer victim/waiter queues | src/storage/page_buffer.c | new only (circular_queue) |
| CDC log-info queue | src/transaction/log_manager.c | new only (circular_queue) |
| Per-thread lf_tran_entry map | src/thread/thread_manager.cpp | both (legacy via lf_tran_request_entry, new via lockfree::tran::system) |

```mermaid
flowchart LR
  subgraph LegacyC["Legacy C (src/base/lock_free.{h,c})"]
    LFTS["LF_TRAN_SYSTEM<br/>per consumer (11 globals)"]
    LFFL["LF_FREELIST"]
    LFHT["LF_HASH_TABLE"]
    LFTS --> LFFL --> LFHT
    LFLI["lf_list_*<br/>Harris-Michael"]
    LFHT --> LFLI
    LFCPP["lf_hash_table_cpp&lt;K,T&gt;"]
    LFCPP --> LFHT
    LFCPP --> LFFL
  end

  subgraph ModernCpp["Modern C++ (src/base/lockfree_*)"]
    TS["lockfree::tran::system<br/>(holds bitmap)"]
    TT["lockfree::tran::table"]
    TD["lockfree::tran::descriptor[]"]
    TS --> TT
    TT --> TD
    BM["lockfree::bitmap"]
    AM["lockfree::address_marker&lt;T&gt;"]
    FL["lockfree::freelist&lt;T&gt;"]
    HM["lockfree::hashmap&lt;K,T&gt;"]
    CQ["lockfree::circular_queue&lt;T&gt;"]
    TS --> BM
    FL --> TT
    HM --> FL
    HM --> AM
  end

  Bridge["cubthread::lockfree_hashmap&lt;K,T&gt;<br/>m_type = OLD or NEW"]
  Bridge --> LFCPP
  Bridge --> HM

  PRM["PRM_ID_ENABLE_NEW_LFHASH"]
  PRM --> Bridge
```

Every std::atomic access in the modern stack uses the default memory_order_seq_cst ordering — including compare_exchange_weak/strong calls, whose default ordering is likewise seq_cst. The legacy stack uses CUBRID’s ATOMIC_INC_*, ATOMIC_CAS_ADDR, ATOMIC_TAS_* macros (defined in porting.h), which expand to GCC __atomic_* builtins with __ATOMIC_SEQ_CST, plus an explicit MEMORY_BARRIER() macro (__atomic_thread_fence (__ATOMIC_SEQ_CST)) on the lf_tran_* hot paths. There are two consequences:

  1. Correctness on weak-memory architectures is conservative but correct. A POWER or ARM build will pay full sync overhead on every CAS, but no defect is masked.
  2. Tightening to acquire/release is an open optimization. No measurement campaign has been run; the cost is obscured by the much heavier transaction-table min-scan on hot paths (lf_tran_compute_minimum_transaction_id).

The transactional reclamation scheme does not solve ABA for reads outside a transaction window. A thread that reads a pointer without first calling descriptor::start_tran() may observe a recycled node as if it were the original, with no detection. The discipline therefore is:

  • Every read on a hashmap chain must be inside a transaction. hashmap::list_find, list_insert_internal, list_delete all call tdes.start_tran() before any pointer dereference and tdes.end_tran() (or a mutex-acquire that supersedes it) before return.
  • The freelist’s pop_from_available() is the documented exception. The function carries a comment explicitly naming the ABA window:
    // todo: this is a dangerous preemption point; if I am preempted
    // here, and thread 2 comes and does:
    // - second thread gets same rhead and successfully moves m_available_list to next
    // - third thread gets next and successfully moves m_available_list to next->next
    // - second thread retires rhead. m_available_list becomes rhead and its next
    // becomes next->next
    // - I wake up, compare exchange m_available_list successfully because it is
    // rhead again, but next will become the item third thread already claimed.
    The hazard is bounded by the back-buffer pattern (a retired node spends a long time in the back-buffer before circling back) but is not eliminated. This is a known weakness; the current mitigation is the back-buffer time, not a tagged-pointer redesign.

How readers should navigate the rest of these docs

| Want to know | Read |
|---|---|
| The reclamation protocol from scratch | cubrid-lockfree-transaction.md |
| Why two hash tables exist and how they share lf_entry_descriptor | cubrid-lockfree-hashmap.md §“Two generations” |
| The Harris-Michael insert/delete walk | cubrid-lockfree-hashmap.md §“List operations” |
| The MPMC ring buffer protocol | cubrid-lockfree-circular-queue.md |
| The bitmap allocator used for transaction indexes | cubrid-lockfree-bitmap.md |
| The freelist back-buffer trick | cubrid-lockfree-freelist.md |
  • lockfree::tran::system (in lockfree_transaction_system.{hpp,cpp}) — index dispenser; owns m_tran_idx_map (a bitmap).
  • lockfree::tran::table (in lockfree_transaction_table.{hpp,cpp}) — per-DS reclamation table; owns m_global_tranid and m_min_active_tranid plus an array of descriptor. Calls compute_min_active_tranid() every MATI_REFRESH_INTERVAL (=100) global-id increments.
  • lockfree::tran::descriptor (in lockfree_transaction_descriptor.{hpp,cpp}) — per-thread per-DS state; one per (table, tran::index) pair.
  • lockfree::tran::reclaimable_node (in lockfree_transaction_reclaimable.hpp) — the base class every retirable node derives from. Provides m_retired_next, m_retire_tranid, and a virtual reclaim() (default delete this).
  • lockfree::address_marker<T> (in lockfree_address_marker.hpp) — typed wrapper for the low-bit pointer-mark trick.
  • lockfree::bitmap (in lockfree_bitmap.{hpp,cpp}) — chunked atomic bitfield allocator. Two chunking_style modes.
  • lockfree::freelist<T> (in lockfree_freelist.hpp, header-only template) — typed free list with back-buffer; constructs its own tran::table from a tran::system &.
  • lockfree::hashmap<K,T> (in lockfree_hashmap.hpp, header-only template; lockfree_hashmap.cpp is a stub) — chained hashmap with Harris-Michael chains, optional per-entry mutex, back-buffer-driven clear().
  • lockfree::circular_queue<T> (in lockfree_circular_queue.hpp, header-only) — bounded MPMC ring buffer with cursor sequence numbers and per-slot block flags.
  • LF_TRAN_SYSTEM, LF_TRAN_ENTRY (in lock_free.{h,c}) — 11 global systems instantiated via lf_initialize_transaction_systems() at boot.
  • LF_FREELIST (in lock_free.{h,c}) — holds a single available stack; entries are linked through of_local_next.
  • LF_HASH_TABLE, lf_hash_* (in lock_free.{h,c}) — the legacy hashmap. Entry layout is described by LF_ENTRY_DESCRIPTOR.
  • lf_list_find / lf_list_insert_internal / lf_list_delete — Harris-Michael chain walks called from lf_hash_*.
  • lf_io_list_find / lf_io_list_find_or_insert — insert-only list (no transaction system, no freelist).
  • lf_stack_push / lf_stack_pop — Treiber stack. Not currently used by any in-engine consumer; kept for historical reasons.
  • lf_hash_table_cpp<K,T> — C++ shim over lf_hash_*, gives the legacy stack a typed surface that matches the modern hashmap’s shape.
  • cubthread::lockfree_hashmap<K,T> (in src/thread/thread_lockfree_hash_map.{hpp,cpp}) — holds both lf_hash_table_cpp<K,T> and lockfree::hashmap<K,T>, dispatches via m_type ∈ {OLD, NEW, UNKNOWN} decided at init() time by prm_get_bool_value (PRM_ID_ENABLE_NEW_LFHASH).
  • cubthread::get_thread_entry_lftransys() — accessor for the per-thread-manager lockfree::tran::system shared by every new-style hashmap on a server.
| Symbol | File | Line |
|---|---|---|
| lockfree::tran::system::system | src/base/lockfree_transaction_system.cpp | 29 |
| lockfree::tran::table::table | src/base/lockfree_transaction_table.cpp | 36 |
| lockfree::tran::table::compute_min_active_tranid | src/base/lockfree_transaction_table.cpp | 90 |
| lockfree::tran::table::get_new_global_tranid | src/base/lockfree_transaction_table.cpp | 73 |
| lockfree::tran::descriptor::retire_node | src/base/lockfree_transaction_descriptor.cpp | 65 |
| lockfree::tran::descriptor::reclaim_retired_list | src/base/lockfree_transaction_descriptor.cpp | 133 |
| lockfree::address_marker<T> | src/base/lockfree_address_marker.hpp | 31 |
| lockfree::bitmap::get_entry (lf_bitmap_get_entry) | src/base/lockfree_bitmap.cpp | 138 |
| lockfree::freelist<T>::claim | src/base/lockfree_freelist.hpp | 291 |
| lockfree::freelist<T>::pop_from_available | src/base/lockfree_freelist.hpp | 322 |
| lockfree::hashmap<K,T>::find | src/base/lockfree_hashmap.hpp | 280 |
| lockfree::hashmap<K,T>::list_insert_internal | src/base/lockfree_hashmap.hpp | 899 |
| lockfree::hashmap<K,T>::list_delete | src/base/lockfree_hashmap.hpp | 1111 |
| lockfree::hashmap<K,T>::clear | src/base/lockfree_hashmap.hpp | 378 |
| lockfree::circular_queue<T>::produce | src/base/lockfree_circular_queue.hpp | 230 |
| lockfree::circular_queue<T>::consume | src/base/lockfree_circular_queue.hpp | 318 |
| lf_tran_start (legacy) | src/base/lock_free.c | 413 |
| lf_tran_compute_minimum_transaction_id (legacy) | src/base/lock_free.c | 379 |
| cubthread::lockfree_hashmap::init (bridge) | src/thread/thread_lockfree_hash_map.hpp | 129 |
  • Two co-existing implementations live in the tree. src/base/lock_free.{h,c} (legacy) and the src/base/lockfree_*.{hpp,cpp} family (modern) are both present and compiled. The bridge in src/thread/thread_lockfree_hash_map.hpp dispatches at init() time via prm_get_bool_value (PRM_ID_ENABLE_NEW_LFHASH) (verified by reading the source on 2026-05-07).
  • The minimum-active-tranid refresh interval is 100, hard-coded. Both LF_TRAN_SYSTEM::mati_refresh_interval (initialized to 100 in lf_tran_system_init) and the modern lockfree::tran::table::MATI_REFRESH_INTERVAL are 100. Not a runtime parameter; tuning requires a source change.
  • The legacy stack instantiates 11 named global LF_TRAN_SYSTEMs at boot. spage_saving_Ts, obj_lock_res_Ts, obj_lock_ent_Ts, catalog_Ts, sessions_Ts, free_sort_list_Ts, global_unique_stats_Ts, hfid_table_Ts, xcache_Ts, fpcache_Ts, dwb_slots_Ts. Each is one reclamation domain, populated in lf_initialize_transaction_systems. The modern stack has no global tran::system — each consumer builds its own.
  • The address-mark bit is the low bit of the pointer. Identical between legacy ADDR_* macros and modern lockfree::address_marker<T>::MARK = 0x1. Verified by direct comparison.
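That shared convention is easy to sketch: because list nodes are at least 2-byte aligned, bit 0 of a node pointer is always zero and can carry the logical-deletion flag. The helper names below are illustrative, not CUBRID's actual ADDR_* macros or address_marker members:

```cpp
#include <cstdint>

// Sketch of the low-bit address mark shared by the legacy ADDR_* macros and
// lockfree::address_marker<T>. Nodes are at least 2-byte aligned, so bit 0
// of a node pointer is free to encode "logically deleted".
template <typename T>
struct addr_mark_sketch
{
  static constexpr std::uintptr_t MARK = 0x1;   // same bit in both generations

  static T *set_mark (T *p)
  {
    return reinterpret_cast<T *> (reinterpret_cast<std::uintptr_t> (p) | MARK);
  }

  static T *strip_mark (T *p)
  {
    return reinterpret_cast<T *> (reinterpret_cast<std::uintptr_t> (p) & ~MARK);
  }

  static bool is_marked (T *p)
  {
    return (reinterpret_cast<std::uintptr_t> (p) & MARK) != 0;
  }
};
```

The payoff is that a single CAS on the next-pointer word can atomically test "not deleted" and update the link, which is the heart of the Harris-style list operations both hashmap generations use.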
  • The pop_from_available ABA window is documented in source. See the multi-line comment in freelist<T>::pop_from_available (src/base/lockfree_freelist.hpp:336); the back-buffer time is the current mitigation, not a tagged pointer.
  • Statistics on the modern hashmap are opt-in. m_active_stats defaults to false; activate_stats() / deactivate_stats() toggle it. The legacy path has its own static lf_inserts, lf_deletes, etc. counters in lock_free.c, unconditional but only incremented under UNITTEST_LF.
  1. What fraction of in-engine consumers run under the modern path on a default-configured production server? The default for enable_new_lfhash is unverified in this doc; check prm_def[] in system_parameter.c and the release-notes history. If the default is false, the legacy path is still the production reality and the modern path is more “loaded gun” than “active engine.”
  2. Are any seq_cst accesses on the modern hot path measurably costly on ARM/POWER? No measurement campaign has been run. Worth a perf test on lockfree::hashmap find/insert/erase under contention, followed by profile-guided tightening to acquire/release where safe.
  3. Why is free_sort_list_Ts legacy-only? Was the migration left incomplete intentionally, or is this simply a candidate nobody has touched yet? Check git history for the most recent commit on query_executor.c’s use of free_sort_list_Ts.
  4. The freelist’s pop_from_available ABA window — has it ever been hit in production? The hazard is documented but not bounded; a stress test that intentionally retires-and-re-acquires the same head with deliberate preemption (e.g., SIGSTOP on a worker) would tell us whether the back-buffer distance is empirically sufficient.
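The vulnerable shape behind that question is the classic Treiber-style pop. The sketch below uses illustrative names, not the actual freelist code, and marks exactly where the window opens; the back-buffer mitigation works by delaying recycling so a just-popped address is unlikely to reappear at the head inside that window:

```cpp
#include <atomic>

// ABA-window sketch for a Treiber-style pop, the same shape as
// freelist<T>::pop_from_available. Correct under single-threaded use;
// under concurrency the marked window is where a freed-and-reallocated
// node at the same address makes the CAS succeed with a stale 'next'.
struct node
{
  node *next;
};

struct stack_sketch
{
  std::atomic<node *> top {nullptr};

  void push (node *n)
  {
    node *old = top.load ();
    do
      {
	n->next = old;
      }
    while (!top.compare_exchange_weak (old, n));
  }

  node *pop ()
  {
    node *old = top.load ();
    while (old != nullptr)
      {
	node *next = old->next;
	// <-- ABA window: if 'old' is popped, recycled, and pushed back
	// at the same address right here, the CAS below still succeeds
	// and installs the stale 'next'.
	if (top.compare_exchange_weak (old, next))
	  {
	    break;
	  }
      }
    return old;
  }
};
```

A stress harness along the lines of question 4 would SIGSTOP a worker inside the marked window and drive retire/re-acquire cycles from the other threads until the same address resurfaces at the head.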

Beyond CUBRID — Comparative Designs & Research Frontiers

  • Hazard pointers vs. transactional reclamation. Michael’s hazard-pointer design (Michael 2004, IEEE TPDS) reclaims per-thread published pointers, not per-transaction; the cost is a per-access release store rather than a tran-bracket. A side-by-side cost comparison on CUBRID’s hashmap workload would tell us whether the MATI_REFRESH_INTERVAL = 100 amortization is empirically cheaper than per-access HP publication.
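For concreteness, here is a minimal hazard-pointer sketch after Michael 2004, with a fixed thread count and one slot per thread; all names are illustrative. The per-access cost being compared is the publication store plus the re-validation load in protect():

```cpp
#include <atomic>
#include <array>
#include <algorithm>

// Minimal hazard-pointer sketch (after Michael 2004). Every protected
// access pays a store to a shared hazard slot plus a re-check, instead
// of the one tran-bracket per operation that CUBRID's transactional
// reclamation pays. Fixed thread count; names are illustrative.
constexpr int MAX_THREADS = 4;
std::array<std::atomic<void *>, MAX_THREADS> hazard_slot {};

// Reader: publish the pointer before dereferencing, then re-validate
// that it is still current; clear the slot when done.
void *protect (int tid, std::atomic<void *> &src)
{
  void *p;
  do
    {
      p = src.load ();
      hazard_slot[tid].store (p);	// the per-access publication store
    }
  while (src.load () != p);		// re-check: still the current node?
  return p;
}

void clear_hazard (int tid)
{
  hazard_slot[tid].store (nullptr);
}

// Reclaimer: a retired node may be freed only if no slot points at it.
bool safe_to_reclaim (void *p)
{
  return std::none_of (hazard_slot.begin (), hazard_slot.end (),
		       [p] (const std::atomic<void *> &h)
		       { return h.load () == p; });
}
```

The asymmetry to measure: hazard pointers bound unreclaimed garbage to O(threads × slots) but tax every traversal step, while the transactional scheme taxes only operation entry/exit and lets garbage grow until the next MATI refresh.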
  • Userspace RCU (liburcu, Mathieu Desnoyers). Lock-free reads with bracketed rcu_read_lock/unlock, and writers defer reclamation to a grace period. CUBRID’s transactional reclamation is structurally a per-DS userspace RCU; the question is whether using liburcu (already a system library on most Linux distros) would simplify the engine’s lockfree::tran family without losing per-DS scoping.
  • Folly’s ConcurrentHashMap (Facebook open source). A production-grade C++ lock-free hash table with hazard-pointer reclamation and split-ordered semantics. Comparing it to lockfree::hashmap would tell us whether the clear()-via-back-buffer trick we use is a real cost or a cheap luxury.
  • LMAX Disruptor (Thompson et al., 2011). The canonical bounded MPMC ring; CUBRID’s lockfree::circular_queue is the same shape minus the batched-claim and waiting-strategy generalizations. Adopting batched claims would let DWB and page-buffer victim queues drain in chunks rather than one CAS per slot.
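Structurally, a batched claim replaces one CAS per slot with a single fetch_add over a sequence cursor. The sketch below is illustrative, not lockfree::circular_queue's actual API; wrap-around and consumer-gating checks are elided to show just the claim shape:

```cpp
#include <atomic>
#include <cstdint>
#include <cstddef>

// Disruptor-style batched claim sketch: a producer reserves N consecutive
// sequences with one atomic op, fills the slots, then publishes. This is
// the shape a batched DWB / victim-queue drain would take. Capacity must
// be a power of two; wrap and gating checks elided for brevity.
template <typename T, std::size_t CAP>
struct batch_ring_sketch
{
  static_assert ((CAP & (CAP - 1)) == 0, "power-of-two capacity");

  T slots[CAP];
  std::atomic<std::uint64_t> claim_cursor {0};	 // next unclaimed sequence
  std::atomic<std::uint64_t> publish_cursor {0}; // highest published + 1

  // Reserve n consecutive sequences with a single fetch_add,
  // instead of n separate CAS attempts on the produce cursor.
  std::uint64_t claim (std::uint64_t n)
  {
    return claim_cursor.fetch_add (n);
  }

  void write (std::uint64_t seq, const T &v)
  {
    slots[seq & (CAP - 1)] = v;
  }

  // Publish in order; the real Disruptor tracks per-slot availability
  // so out-of-order producers can publish independently.
  void publish_up_to (std::uint64_t seq)
  {
    publish_cursor.store (seq + 1);
  }
};
```

The win is amortization: a drain of k entries costs one contended atomic instead of k, at the price of the producer owning a contiguous range it must fill before publishing.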
  • MERGE: a single lockfree::tran::system server-wide. Today every modern data structure builds its own tran::table from a server-wide tran::system (returned by cubthread::get_thread_entry_lftransys()). If a future refactor merged the legacy 11-system zoo into the same server-wide system, the legacy/modern distinction in consumer code would collapse and the cubthread::lockfree_hashmap bridge could retire.
  • Wait-free flat combining (Hendler et al., SPAA 2010). Replaces a CAS-loop hot spot with a per-thread publication record that a designated combiner thread sweeps. The technique pays off on rare-update / many-reader structures; the lock manager’s wait-for graph is a possible target.
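The publication-record mechanic can be sketched with a single shared counter; this is an illustrative reduction, not the paper's full algorithm, and all names are invented here. Each thread posts its operation, and whichever thread wins the combiner flag applies every pending record:

```cpp
#include <atomic>
#include <array>

// Flat-combining sketch (after Hendler et al., SPAA 2010): threads post
// operations into per-thread publication records; one thread becomes the
// combiner and applies all pending operations to the shared state, so the
// hot word is touched by a single thread at a time. Fixed thread count.
constexpr int THREADS = 4;

struct pub_record
{
  std::atomic<bool> pending {false};
  int arg {0};		// operation argument (here: amount to add)
  int result {0};	// filled in by the combiner
};

std::array<pub_record, THREADS> records;
std::atomic<bool> combiner_busy {false};
int shared_counter = 0;	  // only ever touched by the current combiner

int fc_add (int tid, int amount)
{
  records[tid].arg = amount;
  records[tid].pending.store (true);

  for (;;)
    {
      bool expected = false;
      if (combiner_busy.compare_exchange_strong (expected, true))
	{
	  // We won the combiner role: apply every pending record.
	  for (pub_record &r : records)
	    {
	      if (r.pending.load ())
		{
		  shared_counter += r.arg;
		  r.result = shared_counter;
		  r.pending.store (false);
		}
	    }
	  combiner_busy.store (false);
	  return records[tid].result;
	}
      if (!records[tid].pending.load ())  // another combiner did our op
	{
	  return records[tid].result;
	}
    }
}
```

On a rare-update structure like the wait-for graph, the sweep turns many contending CASes into one cache-friendly pass by a single thread, which is where the technique earns its keep.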
  • Source files (in the CUBRID source tree at /data/hgryoo/references/cubrid/):
    • src/base/lock_free.h, src/base/lock_free.c
    • src/base/lockfree_transaction_def.hpp
    • src/base/lockfree_transaction_system.{hpp,cpp}
    • src/base/lockfree_transaction_table.{hpp,cpp}
    • src/base/lockfree_transaction_descriptor.{hpp,cpp}
    • src/base/lockfree_transaction_reclaimable.hpp
    • src/base/lockfree_address_marker.hpp
    • src/base/lockfree_bitmap.{hpp,cpp}
    • src/base/lockfree_freelist.hpp
    • src/base/lockfree_hashmap.{hpp,cpp}
    • src/base/lockfree_circular_queue.hpp
    • src/thread/thread_lockfree_hash_map.{hpp,cpp}
  • Textbook chapters cited:
    • Herlihy & Shavit, The Art of Multiprocessor Programming, ch. 3 §“Progress Conditions”, ch. 9–10 (lock-free linked lists), ch. 13 (concurrent queues), ch. 13 §“Bounded Lock-Free Queues”.
    • Williams, C++ Concurrency in Action, ch. 5 (memory model), ch. 7 (designing lock-free data structures).
  • Foundational papers:
    • Michael, M. M. “Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects.” IEEE TPDS 15(8), 2004.
    • Harris, T. L. “A Pragmatic Implementation of Non-Blocking Linked-Lists.” DISC 2001.
    • Michael, M. M., Scott, M. L. “Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms.” PODC 1996.
    • Shalev, O., Shavit, N. “Split-Ordered Lists: Lock-Free Extensible Hash Tables.” JACM 53(3), 2006.
    • Treiber, R. K. “Systems Programming: Coping with Parallelism.” IBM Almaden RJ 5118, 1986.
    • Fraser, K. “Practical Lock-Freedom.” Univ. of Cambridge PhD thesis (epoch-based reclamation), 2004.
  • Sibling docs in this repo:
    • cubrid-lockfree-transaction.md — the reclamation framework in depth.
    • cubrid-lockfree-hashmap.md — both hash table generations, with the migration narrative.
    • cubrid-lockfree-circular-queue.md — the bounded MPMC ring.
    • cubrid-lockfree-bitmap.md — the chunked atomic bitmap allocator.
    • cubrid-lockfree-freelist.md — the typed free list with back-buffer.
  • Adjacent code-analysis docs:
    • cubrid-thread-worker-pool.md — names how cubthread::entry::tran_entries[] and get_lf_tran_index() plug into the lock-free framework.
    • cubrid-mvcc.md, cubrid-lock-manager.md — primary consumers of lf_hash_* (legacy) / cubthread::lockfree_hashmap (bridged).