CUBRID Lock-Free Primitives — Overview, Two Generations, and Reclamation Spine
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Source verification (as of 2026-05-07)
- Beyond CUBRID — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A lock-free data structure is one in which at least one thread is guaranteed to make progress in any bounded number of steps. The definition is operational, not aesthetic: blocking a thread on a mutex is forbidden because a swapped-out lock-holder freezes everyone else; busy-spinning a CAS-loop on a contended word is allowed because the thread that just swapped the word forward did make progress, even if its competitors did not. The Art of Multiprocessor Programming (Herlihy & Shavit, ch. 3 §“Progress Conditions”) names the hierarchy: wait-free (every thread finishes in a bounded number of steps), lock-free (at least one thread finishes), and obstruction-free (any thread that runs in isolation finishes). Most production lock-free code is lock-free, not wait-free, because making it wait-free typically costs more in cache traffic than it saves in latency.
The hardware primitive every lock-free design composes from is some
form of compare-and-swap. On x86 it is CMPXCHG (and the 16-byte
variant CMPXCHG16B); on ARM it is the load-link / store-conditional
pair (LDREX/STREX, or LSE atomics on ARMv8.1+); in C++11 these
are abstracted as std::atomic<T>::compare_exchange_weak/strong. CAS
gives a single-word atomic test-and-update; everything else — Treiber
stacks, Michael–Scott queues, Harris linked lists, split-ordered
hashes, RCU readers — is built on top by encoding the structure’s
invariants into the value of the word being CAS’d.
Three problems separate lock-free in theory from lock-free in production code, and every CUBRID lock-free file in this analysis exists to solve some combination of them.
1. The ABA problem
If a CAS reads value A, then a sibling thread changes it to B and
back to A before the CAS retries, the CAS will succeed as if no
change happened — even though the underlying object the pointer-
shaped A referred to was destroyed and rebuilt in between. The
canonical reproduction is a Treiber stack: thread 1 reads top → A,
plans to CAS top from A to A->next; thread 2 pops A, pops
B, frees both, allocates a new A' at the same address, pushes
A' so top → A' with A'->next pointing somewhere unrelated;
thread 1 wakes, CAS succeeds because top is A' which compares
equal to the saved A, and top becomes a dangling pointer to
free memory. The classical references are Treiber’s IBM Almaden
report (1986), and Michael’s “Hazard Pointers” paper (Michael 2004,
IEEE TPDS).
2. Safe memory reclamation
The ABA problem is a special case of a more general one: when can a
node retired from a lock-free structure be returned to the
allocator? Returning it too early gives ABA and use-after-free;
returning it too late is a memory leak. A real lock-free system
needs an epoch (some monotonic clock) and a way to determine
“no thread holds a reference to a retired node older than epoch
E.” There are four mainstream techniques:
| Technique | Mechanic | Cost on read path | Reclamation cost |
|---|---|---|---|
| Hazard pointers (Michael 2004) | Each thread publishes the few pointers it is currently dereferencing | O(1) atomic store per access | Reader scans every thread’s hazard slots before freeing |
| Epoch-Based Reclamation / EBR (Fraser 2004) | Each thread enters/exits an epoch on access; reclaimer waits until all threads cross the boundary | O(1) per critical section | Periodic global epoch advance |
| RCU (read-copy-update; McKenney 2001) | Readers are unblocked; writers wait for a “grace period” that all readers transit | Effectively zero | Grace-period detection cost |
| QSBR (quiescent-state-based) | Each thread reports a “quiescent state” periodically; no node is freed until every thread has reported one since retirement | Zero on hot path | One per-thread counter check at reclaim |
CUBRID picks an EBR-flavoured scheme — its transaction-based
reclamation described in §“CUBRID’s Approach” — and implements it
once, in the lockfree::tran namespace. Every other lock-free
structure (hashmap, freelist) plugs into it.
3. Memory ordering
std::atomic<T> operations take a memory-order argument
(memory_order_relaxed, acquire, release, acq_rel,
seq_cst). Picking the wrong one is a defect that may not surface
on x86 (which is close to sequentially consistent for word-sized
loads and stores) but reorders into bugs on weak-memory architectures
(ARM, POWER, RISC-V). The textbook account is C++ Concurrency in
Action (Williams, ch. 5). The two anchor patterns are:
- Acquire-release synchronization. A `release` store on word `W` pairs with an `acquire` load on the same `W`; the loader then observes every store sequenced-before the releaser's release. This is how a lock-free queue's consumer observes the producer's payload write.
- Read-modify-write (RMW) atomics. A successful `compare_exchange` is both an `acquire` and a `release` (when called with `memory_order_acq_rel` or stronger). It is the join point that publishes a writer's prior stores and acquires the predecessor's.
CUBRID’s lock-free code overwhelmingly uses default seq_cst ordering
on std::atomic<T>, which is conservative but correct on every
target architecture CUBRID supports. Tightening to acquire/release
is a known optimization opportunity rather than a current invariant.
4. Three structural patterns
Every lock-free building block in CUBRID is one of three textbook shapes:
- Lock-free linked list (Harris 2001 / Michael 2002). Insert is a single CAS that splices a new node in. Delete is a two-CAS protocol: first logically delete the node by setting a mark bit on its `next` pointer (so concurrent traversers know to skip it), then physically unlink it with a CAS on the predecessor's `next`. Reads detect the mark and either help unlink or treat the node as gone. CUBRID's hashmap chains are Harris-Michael lists; the mark bit is the low bit of the `next` pointer, encoded by `lockfree::address_marker`.
- Lock-free FIFO queue (Michael & Scott 1996). The two-cursor variant: a producer cursor and a consumer cursor, each on its own atomic. CUBRID's `lockfree::circular_queue` is a fixed-capacity bounded variant — closer in shape to LMAX Disruptor than to MS-queue — using cursor sequence numbers and per-slot block-flags.
- Lock-free split-ordered hash (Shalev & Shavit 2003). The hash table is a single recursive lock-free linked list with bucket pointers acting as shortcuts; resize requires only inserting new bucket pointers, not relinking the data. CUBRID does not use split-ordered hashing — its hash table is a fixed-bucket-count chained design where each bucket points at a Harris-Michael list. Resize is not lock-free; the only resize-shaped operation is `clear()`, which atomic-swaps the bucket array with a back-buffer.
These three patterns name the design space the next two sections fill in.
Common DBMS Design
Every modern DBMS pays for three lock-free responsibilities: inter-thread queues (the producer-consumer rendezvous that hands work between layers), shared associative tables (page buffers, lock tables, session caches keyed by some identifier), and per-thread free lists (so a transaction-private fast path does not bounce on a global allocator). The shapes converge.
Inter-thread queue
The standard role: a daemon thread fills a queue, worker threads drain it (or vice versa). Variants:
- Bounded MPMC ring buffer. Power-of-two capacity, two cursor atomics, sequence numbers per slot. Closest production exemplar: LMAX Disruptor. Used in InnoDB's redo log writer ring, in PostgreSQL's autovacuum worker queue, and in CUBRID's `lockfree::circular_queue` (used by vacuum, page-buffer victim hand-off, and CDC log-info forwarding).
- Unbounded Michael–Scott queue. Two atomic pointers `head` and `tail`, each node allocated on enqueue and freed on dequeue. Cleaner semantics but unbounded memory and mandatory per-op allocation.
- Per-CPU SPMC / MPSC queues. One queue per CPU/thread group; steal across when local is empty (work-stealing). CUBRID does not use this pattern in the lock-free namespace.
Shared associative table
Every DBMS has a lock table (one entry per resource being
locked), a page buffer hash (one entry per resident page,
keyed by (volid, pageid)), a session cache (one entry per
client connection), and a statement cache (XASL, query plan).
Every one of these wants concurrent insert / find / erase under
heavy contention.
The shared engineering pattern:
- Bucket array of fixed size, `hash(key) % size → bucket`.
- Each bucket is the head of a lock-free linked list.
- List nodes carry the entry payload plus a `next` pointer with the low bit reserved as a delete mark.
- Insert = walk to end of list, CAS bucket-or-tail next from `NULL` to new node.
- Delete = CAS-mark the victim's `next`, then CAS-unlink it from the predecessor.
- Reclamation is deferred to an epoch / hazard-pointer / RCU / transactional scheme.
CUBRID’s lockfree::hashmap is exactly this shape, with one
distinctive choice: per-entry pthread mutex, optional. When a
descriptor declares using_mutex = 1, the hashmap takes the entry
mutex during find/insert/delete and uses transactions only to
guard the bucket chain against concurrent retirements; payload
mutation is mutex-protected. When using_mutex = 0, the
transaction itself is the only protection — payload reads must
finish inside a transaction window.
Per-thread free list
Allocating a fresh list node for every insert and freeing on every
delete is a malloc bottleneck even for a great allocator. The
DBMS-typical answer is a per-data-structure free list of typed
nodes — when an entry retires from the bucket chain, it does not
go to free(), it goes to a free list keyed by the structure that
owns it. Subsequent inserts pop from the free list rather than
calling new. This is the role of lockfree::freelist<T>.
Two design wrinkles every implementation hits:
- Contention on a single available list — the most-recent-pop cache line bounces between cores. Standard cure: a back-buffer block held aside, swapped wholesale into the available list when the available list drains, so the slow path of allocating a new block is hidden behind a one-CAS swap. CUBRID's freelist does exactly this.
- ABA on the head pointer — pop reads `head`, computes `head->next`, CAS-swaps `head` to `head->next`; in between, a sibling pops the same head and a third party pushes it back. The saved `head->next` is now stale. CUBRID's freelist documents this hazard explicitly in `pop_from_available()` (lockfree_freelist.hpp) and relies on the transaction system rather than tagged pointers to bound the window.
Transactional reclamation, generalized
The fourth shared piece — and the one CUBRID extracts most cleanly into its own class — is deferred reclamation tied to thread activity. Even if every consumer of a lock-free structure ships its own free list, somebody has to decide when a retired node can be physically reused. The shared template:
- A per-data-structure `tranid` counter; every retirement and every “I am about to read” event takes a snapshot.
- A per-thread descriptor, one per data structure, holding the thread's last-fetched `tranid` and a list of “retired by me, not yet reclaimable” nodes.
- A periodic computation of `min(active_tranid)` across all threads.
- A node retired with `tranid = X` is reclaimable once `min_active_tranid > X`.
Compare this template to the Linux kernel’s RCU
(read-side critical section bracketing a per-CPU counter,
grace-period detection by a global counter chase) and it is the
same shape, restricted to a per-DS scope rather than a system-wide
one. CUBRID generalizes it as the lockfree::tran family and lets
each lock-free structure own its own tran::table.
Theory ↔ CUBRID mapping
| Concept | CUBRID name (legacy) | CUBRID name (modern) |
|---|---|---|
| Per-DS reclamation epoch | LF_TRAN_SYSTEM::global_transaction_id | lockfree::tran::table::m_global_tranid |
| Per-thread reader bracket | lf_tran_start / lf_tran_end | descriptor::start_tran / end_tran |
| Per-thread retired list | LF_TRAN_ENTRY::retired_list | descriptor::m_retired_head/tail |
| Min-active-epoch scan | lf_tran_compute_minimum_transaction_id | table::compute_min_active_tranid |
| Refresh interval | LF_TRAN_SYSTEM::mati_refresh_interval = 100 | MATI_REFRESH_INTERVAL = 100 |
| Per-thread index allocator | LF_TRAN_SYSTEM::lf_bitmap (LF_BITMAP) | tran::system::m_tran_idx_map (bitmap) |
| Pointer-mark bit | ADDR_HAS_MARK / ADDR_WITH_MARK macros | lockfree::address_marker<T> template |
| Bounded MPMC queue | n/a (introduced as C++ first) | lockfree::circular_queue<T> |
| Free-list of typed nodes | LF_FREELIST | lockfree::freelist<T> |
| Hash table | LF_HASH_TABLE / lf_hash_table_cpp<K,T> | lockfree::hashmap<K,T> |
| Hash dispatch wrapper | (was integrated into lf_hash_*) | cubthread::lockfree_hashmap<K,T> |
| Bitmap allocator | LF_BITMAP | lockfree::bitmap (same class, both names) |
CUBRID’s Approach
Two generations, one engine, one toggle
CUBRID's lock-free namespace exists as two complete and co-existing implementations:
- Legacy C — `src/base/lock_free.{h,c}`. The original family, shipped well before 2020. Procedural, pointer-table based, globally-instantiated `LF_TRAN_SYSTEM`s (one per consumer: `obj_lock_res_Ts`, `sessions_Ts`, `xcache_Ts`, etc.). The data structures are `LF_HASH_TABLE`, `LF_FREELIST`, plus the `lf_io_list_*` insert-only list, `lf_stack_*` Treiber stack, and the `lf_list_*` Harris-Michael primitives. The C++ shim `lf_hash_table_cpp<K,T>` (in the same header) wraps the C procedures behind a typed class.
- Modern C++ — `src/base/lockfree_*.{hpp,cpp}`. Re-architected during the C++17 conversion. Templated, namespace-scoped (`lockfree::*`), reclamation broken out as `lockfree::tran::*`, `address_marker<T>` replacing the macro family, `circular_queue<T>` introduced as a new primitive, `bitmap` ported with both legacy typedef (`LF_BITMAP`) and modern name. Each new hashmap owns its own `tran::table`; there are no globally-instantiated reclamation systems.
The two coexist because the migration is opt-in per call site.
The bridge lives in src/thread/thread_lockfree_hash_map.hpp
(cubthread::lockfree_hashmap<K,T>), which holds both an old
lf_hash_table_cpp<K,T> and a new lockfree::hashmap<K,T> and
dispatches on a m_type ∈ {OLD, NEW, UNKNOWN} field. The choice
is driven by the system parameter PRM_ID_ENABLE_NEW_LFHASH at
init() time:
```cpp
// cubthread::lockfree_hashmap::init — src/thread/thread_lockfree_hash_map.hpp
void init (lf_tran_system &transys, int entry_idx, int hash_size,
           int freelist_block_size, int freelist_block_count,
           lf_entry_descriptor &edesc)
{
  if (prm_get_bool_value (PRM_ID_ENABLE_NEW_LFHASH))
    init_as_new (get_thread_entry_lftransys (), ...);
  else
    init_as_old (transys, hash_size, ...);
}
```
Operationally, a CUBRID server today still defaults to the legacy
path unless `enable_new_lfhash` is set. The new path is the
preferred forward direction; the legacy path remains for risk
mitigation and for call sites that have not yet migrated.
The design split is intentional: every consumer of
cubthread::lockfree_hashmap (lock manager’s resource and entry
tables, session table, XASL cache, FP cache, slotted-page saving,
catalog, global-unique-stats, hfid table, DWB slot table) gets the
toggle for free; every other consumer (vacuum’s circular queue,
page buffer’s victim queues, CDC’s log-info queue) is C++-only and
sits on the modern stack unconditionally.
The reclamation spine
The modern stack is built around a single reclamation primitive that every lock-free data structure plugs into. Its three classes are:
```cpp
// lockfree::tran::system — src/base/lockfree_transaction_system.hpp
class system
{
  size_t m_max_tran_per_table;  // how many concurrent threads a table must accommodate
  bitmap m_tran_idx_map;        // index allocator (each thread gets one)

  index assign_index ();
  void free_index (index idx);
};
```
```cpp
// lockfree::tran::table — src/base/lockfree_transaction_table.hpp
class table
{
  system &m_sys;
  descriptor *m_all;            // one descriptor per assigned index
  std::atomic<id> m_global_tranid;
  std::atomic<id> m_min_active_tranid;
  static const id MATI_REFRESH_INTERVAL = 100;
  // ...
};
```
```cpp
// lockfree::tran::descriptor — src/base/lockfree_transaction_descriptor.hpp
class descriptor
{
  table *m_table;
  id m_tranid;                  // INVALID_TRANID when not in a tran
  reclaimable_node *m_retired_head, *m_retired_tail;

  void start_tran ();
  void start_tran_and_increment_id ();
  void end_tran ();
  void retire_node (reclaimable_node &);
  void reclaim_retired_list ();
  // ...
};
```
The protocol: a thread calls `descriptor::start_tran()` before
touching any node in the data structure, finishes its work, then
calls end_tran(). A retirement bumps the global counter and
appends the retired node to the calling thread’s per-descriptor
list. Periodically (every MATI_REFRESH_INTERVAL = 100
retirements) the table recomputes min(active_tranid) across
descriptors. Any retired node whose recorded tranid is strictly
less than the current min_active_tranid is reclaimable.
This mechanism is described exhaustively in
cubrid-lockfree-transaction.md. For the overview it is enough
to know that:
- Every lock-free data structure (`hashmap`, `freelist`; `circular_queue` does not need it because it is bounded with no pointer-shaped retirement) owns its own `tran::table` — there is no global reclamation graph.
- Every thread that may touch a lock-free structure is assigned a `tran::index` from the system's `bitmap`, fixed for that thread's lifetime.
- The descriptor ring is dimensioned by the maximum number of concurrent threads (`max_transaction_count`), not by the rate of retirements; the retirement cost is O(1) on the descriptor's list and O(N_threads) on the periodic min-scan.
Pointer marking — address_marker<T>
The Harris-Michael delete protocol needs a way to set a “this node
is logically deleted” flag without reserving extra space — the flag
must travel inside the pointer itself. CUBRID encodes the flag in
the low bit of the pointer (safe because T* is at minimum
2-byte aligned in the C++ ABI; in CUBRID’s case 8-byte aligned for
pointer-shaped lock-free nodes). The legacy macros
ADDR_HAS_MARK / ADDR_WITH_MARK / ADDR_STRIP_MARK in lock_free.c
do this with raw casts; the modern lockfree::address_marker<T>
template in lockfree_address_marker.hpp provides type-safe
set_adress_mark / strip_address_mark / atomic_strip_address_mark
static helpers and an instance form address_marker<T> that wraps
a std::atomic<T*>.
Both are byte-compatible — they encode the same bit in the same position — so a node moving between code paths during the migration sees a stable representation.
Per-thread index allocation — bitmap
Both generations need a way to assign a stable per-thread index
for descriptor / entry lookup. lockfree::bitmap is the
allocator: a chunked array of std::atomic<unsigned int> words,
each bit representing a slot. get_entry() rotates a start_idx
counter (round-robin in SERVER_MODE, last-allocated in client
mode) to find a chunk with an empty bit, then CAS-flips the bit.
free_entry() clears the bit. Two style modes (ONE_CHUNK for
“must always succeed” usage like the transaction system,
LIST_OF_CHUNKS with a usage threshold for usage-bounded
contexts) are encoded in chunking_style. Full details:
cubrid-lockfree-bitmap.md.
Hash map structure
Every modern CUBRID hash table built from lockfree::hashmap<K,T>
has six parts:
- A bucket array `T **m_buckets[m_size]`. Each entry is the head of a Harris-Michael chain.
- A back-buffer `T **m_backbuffer[m_size]` of marked-NULL pointers, used by the `clear()` operation: the live array is atomic-swapped with the back-buffer, the old array is then walked and every entry retired through the freelist.
- A per-table freelist of typed entries (`freelist<freelist_node_data>` where `freelist_node_data` carries the `T m_entry` plus the `lf_entry_descriptor *m_edesc` needed for `on_reclaim`).
- A per-table `tran::table`, owned by the freelist.
- An `lf_entry_descriptor` giving offsets and callbacks (`f_hash`, `f_key_cmp`, `f_key_copy`, `f_init`, `f_uninit`, `f_duplicate`, `using_mutex`, the `of_*` field offsets) — the sole shared type with the legacy code.
- A per-method statistics ring (`atomic_counter_timer_stat`) for find/insert/erase/unlock/clear/iter/claim/retire, activated by `activate_stats()`.
The key-pointer interaction with type T is by byte offset,
not by member access. The descriptor declares of_next, of_key,
of_mutex, of_del_tran_id (legacy only), and the hashmap reads
T*-shaped fields by (char*)p + offset casting. This is what
makes lf_entry_descriptor the bridge: a single descriptor can
drive both the legacy C path and the modern C++ path because both
walk the same offsets.
Full details: cubrid-lockfree-hashmap.md.
Bounded MPMC queue — circular_queue<T>
The third structural piece is a fixed-capacity multi-producer
multi-consumer queue. The implementation is a power-of-two slot
array with two cursor atomics (m_produce_cursor,
m_consume_cursor) and a per-slot block-flag word
(m_blocked_cursors[N]). Producers CAS-acquire a slot,
copy the payload in, then unblock the slot. Consumers detect that
the slot is unblocked, copy the payload out, then CAS-bump the
consume cursor. The implementation lives entirely in the header
(lockfree_circular_queue.hpp) — no .cpp file beyond a stub.
Full details: cubrid-lockfree-circular-queue.md.
The queue is unconditionally on the modern stack. It is used by:
- Page buffer (`src/storage/page_buffer.c`) — `waiter_threads_high_priority`, `waiter_threads_low_priority` for fixers waiting on a victim, `flushed_bcbs` for post-flush processing, `private_lrus_with_victims`, `big_private_lrus_with_victims`, `shared_lrus_with_victims` for victim hand-off across LRU partitions.
- Vacuum (`src/query/vacuum.c`) — `vacuum_Block_data_buffer` for log-block dispatch from the log-flush daemon to vacuum workers, `vacuum_Finished_job_queue` for completion-back notification.
- CDC (`src/transaction/log_manager.c`) — `cdc_Gl.loginfo_queue` for CDC log-info forwarding.
Per-data-structure free list — freelist<T>
The fourth piece is a typed free list. lockfree::freelist<T>
takes a tran::system & and constructs its own tran::table. It
maintains a single available-list head and a one-block back-buffer
that swaps in lazily when the available list drains. Items are
typed freelist<T>::free_node, which derives from
tran::reclaimable_node so retirement plugs into the reclamation
spine directly. The contract on T is that it expose an
on_reclaim() method which the freelist invokes when the node
crosses from “retired” to “reclaimed” — this is where typed
cleanup runs (e.g., calling the lf_entry_descriptor::f_uninit
callback from freelist_node_data::on_reclaim).
Full details: cubrid-lockfree-freelist.md.
Map of consumers
A non-exhaustive list of in-engine consumers, by which generation they are pinned to:
| Consumer | File | Generation |
|---|---|---|
| Object-lock resource table | src/transaction/lock_manager.c | toggle (was obj_lock_res_Ts) |
| Object-lock entry table | src/transaction/lock_manager.c | toggle (was obj_lock_ent_Ts) |
| Session table | src/session/session.c | toggle (was sessions_Ts) |
| XASL cache | src/query/xasl_cache.c | toggle (was xcache_Ts) |
| Filter-predicate cache | src/query/filter_pred_cache.c | toggle (was fpcache_Ts) |
| Slotted-page saving | src/storage/slotted_page.c | toggle (was spage_saving_Ts) |
| System catalog | src/storage/system_catalog.c | toggle (was catalog_Ts) |
| Global unique stats | src/storage/btree.c (via log_manager) | toggle (was global_unique_stats_Ts) |
| HFID table | src/storage/heap_file.c | toggle (was hfid_table_Ts) |
| DWB slot table | src/storage/double_write_buffer.cpp | toggle (was dwb_slots_Ts) |
| Free-sort-list | (legacy only) | legacy (free_sort_list_Ts) |
| Vacuum block-data buffer | src/query/vacuum.c | new only (circular_queue) |
| Vacuum finished-job queue | src/query/vacuum.c | new only (circular_queue) |
| Page-buffer victim/waiter queues | src/storage/page_buffer.c | new only (circular_queue) |
| CDC log-info queue | src/transaction/log_manager.c | new only (circular_queue) |
| Per-thread lf_tran_entry map | src/thread/thread_manager.cpp | both (legacy via lf_tran_request_entry, new via lockfree::tran::system) |
Component overview
```mermaid
flowchart LR
    subgraph LegacyC["Legacy C (src/base/lock_free.{h,c})"]
        LFTS["LF_TRAN_SYSTEM<br/>per consumer (10 globals)"]
        LFFL["LF_FREELIST"]
        LFHT["LF_HASH_TABLE"]
        LFTS --> LFFL --> LFHT
        LFLI["lf_list_*<br/>Harris-Michael"]
        LFHT --> LFLI
        LFCPP["lf_hash_table_cpp<K,T>"]
        LFCPP --> LFHT
        LFCPP --> LFFL
    end
    subgraph ModernCpp["Modern C++ (src/base/lockfree_*)"]
        TS["lockfree::tran::system<br/>(holds bitmap)"]
        TT["lockfree::tran::table"]
        TD["lockfree::tran::descriptor[]"]
        TS --> TT
        TT --> TD
        BM["lockfree::bitmap"]
        AM["lockfree::address_marker<T>"]
        FL["lockfree::freelist<T>"]
        HM["lockfree::hashmap<K,T>"]
        CQ["lockfree::circular_queue<T>"]
        TS --> BM
        FL --> TT
        HM --> FL
        HM --> AM
    end
    Bridge["cubthread::lockfree_hashmap<K,T><br/>m_type = OLD or NEW"]
    Bridge --> LFCPP
    Bridge --> HM
    PRM["PRM_ID_ENABLE_NEW_LFHASH"]
    PRM --> Bridge
```
Memory ordering posture
Every `std::atomic` access in the modern stack uses the default
`memory_order_seq_cst` — including the explicit
`compare_exchange_weak/strong` calls, whose default success and
failure orderings are both seq_cst. The legacy stack uses CUBRID's
`ATOMIC_INC_*`, `ATOMIC_CAS_ADDR`, `ATOMIC_TAS_*` macros (defined
in porting.h), which expand to GCC `__atomic_*` builtins with
`__ATOMIC_SEQ_CST`, plus an explicit `MEMORY_BARRIER()` macro
(`__atomic_thread_fence (__ATOMIC_SEQ_CST)`) on the `lf_tran_*`
hot paths. There are two consequences:
- Correctness on weak-memory architectures is conservative but correct. A POWER or ARM build will pay full sync overhead on every CAS, but no defect is masked.
- Tightening to `acquire`/`release` is an open optimization. No measurement campaign has been run; the cost is obscured by the much heavier transaction-table min-scan on hot paths (`lf_tran_compute_minimum_transaction_id`).
Posture against ABA
The transactional reclamation scheme does not solve ABA for
reads outside a transaction window. A thread that reads a pointer
without first calling descriptor::start_tran() may observe a
recycled node as if it were the original, with no detection. The
discipline therefore is:
- Every read on a hashmap chain must be inside a transaction. `hashmap::list_find`, `list_insert_internal`, `list_delete` all call `tdes.start_tran()` before any pointer dereference and `tdes.end_tran()` (or a mutex-acquire that supersedes it) before return.
- The freelist's `pop_from_available()` is the documented exception. The function carries a comment explicitly naming the ABA window:

  ```cpp
  // todo: this is a dangerous preemption point; if I am preempted
  // here, and thread 2 comes and does:
  // - second thread gets same rhead and successfully moves m_available_list to next
  // - third thread gets next and successfully moves m_available_list to next->next
  // - second thread retires rhead. m_available_list becomes rhead and its next
  //   becomes next->next
  // - I wake up, compare exchange m_available_list successfully because it is
  //   rhead again, but next will become the item third thread already claimed.
  ```

  The hazard is bounded by the back-buffer pattern (a retired node spends a long time in the back-buffer before circling back) but is not eliminated. This is a known weakness; the current mitigation is the back-buffer time, not a tagged-pointer redesign.
How readers should navigate the rest of these docs
Section titled “How readers should navigate the rest of these docs”| Want to know | Read |
|---|---|
| The reclamation protocol from scratch | cubrid-lockfree-transaction.md |
| Why two hash tables exist and how they share lf_entry_descriptor | cubrid-lockfree-hashmap.md §“Two generations” |
| The Harris-Michael insert/delete walk | cubrid-lockfree-hashmap.md §“List operations” |
| The MPMC ring buffer protocol | cubrid-lockfree-circular-queue.md |
| The bitmap allocator used for transaction indexes | cubrid-lockfree-bitmap.md |
| The freelist back-buffer trick | cubrid-lockfree-freelist.md |
Source Walkthrough
Section titled “Source Walkthrough”Modern C++ stack
- `lockfree::tran::system` (in `lockfree_transaction_system.{hpp,cpp}`) — index dispenser; owns `m_tran_idx_map` (a `bitmap`).
- `lockfree::tran::table` (in `lockfree_transaction_table.{hpp,cpp}`) — per-DS reclamation table; owns `m_global_tranid` and `m_min_active_tranid` plus an array of `descriptor`. Calls `compute_min_active_tranid()` every `MATI_REFRESH_INTERVAL` (=100) global-id increments.
- `lockfree::tran::descriptor` (in `lockfree_transaction_descriptor.{hpp,cpp}`) — per-thread per-DS state; one per `(table, tran::index)` pair.
- `lockfree::tran::reclaimable_node` (in `lockfree_transaction_reclaimable.hpp`) — the base class every retirable node derives from. Provides `m_retired_next`, `m_retire_tranid`, and a virtual `reclaim()` (default `delete this`).
- `lockfree::address_marker<T>` (in `lockfree_address_marker.hpp`) — typed wrapper for the low-bit pointer-mark trick.
- `lockfree::bitmap` (in `lockfree_bitmap.{hpp,cpp}`) — chunked atomic bitfield allocator. Two `chunking_style` modes.
- `lockfree::freelist<T>` (in `lockfree_freelist.hpp`, header-only template) — typed free list with back-buffer; constructs its own `tran::table` from a `tran::system &`.
- `lockfree::hashmap<K,T>` (in `lockfree_hashmap.hpp`, header-only template; `lockfree_hashmap.cpp` is a stub) — chained hashmap with Harris-Michael chains, optional per-entry mutex, back-buffer-driven `clear()`.
- `lockfree::circular_queue<T>` (in `lockfree_circular_queue.hpp`, header-only) — bounded MPMC ring buffer with cursor sequence numbers and per-slot block flags.
Legacy C stack
- `LF_TRAN_SYSTEM`, `LF_TRAN_ENTRY` — `lock_free.{h,c}`. 11 global systems instantiated via `lf_initialize_transaction_systems()` at boot.
- `LF_FREELIST` — `lock_free.{h,c}`. Holds a single `available` stack; entries are linked through `of_local_next`.
- `LF_HASH_TABLE`, `lf_hash_*` — `lock_free.{h,c}`. The legacy hashmap. Entry layout is described by `LF_ENTRY_DESCRIPTOR`.
- `lf_list_find` / `lf_list_insert_internal` / `lf_list_delete` — Harris-Michael chain walks called from `lf_hash_*`.
- `lf_io_list_find` / `lf_io_list_find_or_insert` — insert-only list (no transaction system, no freelist).
- `lf_stack_push` / `lf_stack_pop` — Treiber stack. Not currently used by any in-engine consumer; kept for historical reasons.
- `lf_hash_table_cpp<K,T>` — C++ shim over `lf_hash_*`; gives the legacy stack a typed surface that matches the modern hashmap's shape.
Bridge
- `cubthread::lockfree_hashmap<K,T>` (in `src/thread/thread_lockfree_hash_map.{hpp,cpp}`) — holds both `lf_hash_table_cpp<K,T>` and `lockfree::hashmap<K,T>`, dispatches via `m_type ∈ {OLD, NEW, UNKNOWN}` decided at `init()` time by `prm_get_bool_value (PRM_ID_ENABLE_NEW_LFHASH)`.
- `cubthread::get_thread_entry_lftransys()` — accessor for the per-thread-manager `lockfree::tran::system` shared by every new-style hashmap on a server.
Position hints (as of 2026-05-07)
| Symbol | File | Line |
|---|---|---|
| `lockfree::tran::system::system` | src/base/lockfree_transaction_system.cpp | 29 |
| `lockfree::tran::table::table` | src/base/lockfree_transaction_table.cpp | 36 |
| `lockfree::tran::table::compute_min_active_tranid` | src/base/lockfree_transaction_table.cpp | 90 |
| `lockfree::tran::table::get_new_global_tranid` | src/base/lockfree_transaction_table.cpp | 73 |
| `lockfree::tran::descriptor::retire_node` | src/base/lockfree_transaction_descriptor.cpp | 65 |
| `lockfree::tran::descriptor::reclaim_retired_list` | src/base/lockfree_transaction_descriptor.cpp | 133 |
| `lockfree::address_marker<T>` | src/base/lockfree_address_marker.hpp | 31 |
| `lockfree::bitmap::get_entry` (`lf_bitmap_get_entry`) | src/base/lockfree_bitmap.cpp | 138 |
| `lockfree::freelist<T>::claim` | src/base/lockfree_freelist.hpp | 291 |
| `lockfree::freelist<T>::pop_from_available` | src/base/lockfree_freelist.hpp | 322 |
| `lockfree::hashmap<K,T>::find` | src/base/lockfree_hashmap.hpp | 280 |
| `lockfree::hashmap<K,T>::list_insert_internal` | src/base/lockfree_hashmap.hpp | 899 |
| `lockfree::hashmap<K,T>::list_delete` | src/base/lockfree_hashmap.hpp | 1111 |
| `lockfree::hashmap<K,T>::clear` | src/base/lockfree_hashmap.hpp | 378 |
| `lockfree::circular_queue<T>::produce` | src/base/lockfree_circular_queue.hpp | 230 |
| `lockfree::circular_queue<T>::consume` | src/base/lockfree_circular_queue.hpp | 318 |
| `lf_tran_start` (legacy) | src/base/lock_free.c | 413 |
| `lf_tran_compute_minimum_transaction_id` (legacy) | src/base/lock_free.c | 379 |
| `cubthread::lockfree_hashmap::init` (bridge) | src/thread/thread_lockfree_hash_map.hpp | 129 |
Source verification (as of 2026-05-07)
Verified facts

- Two co-existing implementations live in the tree. `src/base/lock_free.{h,c}` (legacy) and the `src/base/lockfree_*.{hpp,cpp}` family (modern) are both present and compiled. The bridge in `src/thread/thread_lockfree_hash_map.hpp` dispatches at `init()` time via `prm_get_bool_value (PRM_ID_ENABLE_NEW_LFHASH)` (verified by reading the source on 2026-05-07).
- The minimum-active-tranid refresh interval is 100, hard-coded. Both `LF_TRAN_SYSTEM::mati_refresh_interval` (initialized to 100 in `lf_tran_system_init`) and the modern `lockfree::tran::table::MATI_REFRESH_INTERVAL` are 100. Not a runtime parameter; tuning requires a source change.
- The legacy stack instantiates 11 named global `LF_TRAN_SYSTEM`s at boot: `spage_saving_Ts`, `obj_lock_res_Ts`, `obj_lock_ent_Ts`, `catalog_Ts`, `sessions_Ts`, `free_sort_list_Ts`, `global_unique_stats_Ts`, `hfid_table_Ts`, `xcache_Ts`, `fpcache_Ts`, `dwb_slots_Ts`. Each is one reclamation domain, populated in `lf_initialize_transaction_systems`. The modern stack has no global `tran::system` — each consumer builds its own.
- The address-mark bit is the low bit of the pointer. Identical between the legacy `ADDR_*` macros and the modern `lockfree::address_marker<T>::MARK = 0x1`. Verified by direct comparison.
- The `pop_from_available` ABA window is documented in source. See the multi-line comment in `freelist<T>::pop_from_available` (`src/base/lockfree_freelist.hpp:336`); the back-buffer time is the current mitigation, not a tagged pointer.
- Statistics on the modern hashmap are opt-in. `m_active_stats` defaults to `false`; `activate_stats()` / `deactivate_stats()` toggle it. The legacy path has its own static `lf_inserts`, `lf_deletes`, etc. counters in `lock_free.c`, unconditional but only incremented under `UNITTEST_LF`.
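The low-bit mark trick shared by the legacy `ADDR_*` macros and `lockfree::address_marker<T>` exploits pointer alignment: heap nodes are at least 2-byte aligned, so bit 0 of every node pointer is always zero and can flag "logically deleted". A minimal illustrative template (not the CUBRID class, though it uses the same `MARK = 0x1` constant the verified fact names):

```cpp
#include <cstdint>

template <typename T>
struct marked_ptr
{
  static constexpr std::uintptr_t MARK = 0x1;   // the spare alignment bit

  static T *set_mark (T *p)
  {
    return reinterpret_cast<T *> (reinterpret_cast<std::uintptr_t> (p) | MARK);
  }
  static T *strip_mark (T *p)
  {
    return reinterpret_cast<T *> (reinterpret_cast<std::uintptr_t> (p) & ~MARK);
  }
  static bool is_marked (const T *p)
  {
    return (reinterpret_cast<std::uintptr_t> (p) & MARK) != 0;
  }
};
```

In the real structures the marked value lives inside an atomic next-pointer, so a single CAS can simultaneously test the delete flag and swing the link, which is what makes the Harris-Michael chain walks possible.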
Open questions
- What fraction of in-engine consumers run under the modern path on a default-configured production server? The default for `enable_new_lfhash` is unverified in this doc; check `prm_def[]` in `system_parameter.c` and the release-notes history. If the default is `false`, the legacy path is still the production reality and the modern path is more "loaded gun" than "active engine."
- Are any `seq_cst` accesses on the modern hot path measurably costly on ARM/POWER? No campaign run. Worth a perf test on `lockfree::hashmap` find/insert/erase under contention, with a profile-guided tightening to `acquire`/`release` where safe.
- Why is `free_sort_list_Ts` legacy-only? Migration not completed intentionally, or a candidate that simply nobody has touched yet? Check git history for the most recent commit on `query_executor.c`'s use of `free_sort_list_Ts`.
- The freelist's `pop_from_available` ABA window — has it ever been hit in production? The hazard is documented but not bounded; a stress test that intentionally retires and re-acquires the same head with deliberate preemption (e.g., `SIGSTOP` on a worker) would tell us whether the back-buffer distance is empirically sufficient.
Beyond CUBRID — Comparative Designs & Research Frontiers
- Hazard pointers vs. transactional reclamation. Michael's hazard-pointer design (Michael 2004, IEEE TPDS) reclaims via per-thread published pointers, not per-transaction brackets; the cost is a per-access `release` store rather than a tran bracket. A side-by-side cost comparison on CUBRID's hashmap workload would tell us whether the `MATI_REFRESH_INTERVAL = 100` amortization is empirically cheaper than per-access HP publication.
- Userspace RCU (liburcu, Mathieu Desnoyers). Lock-free reads with bracketed `rcu_read_lock`/`rcu_read_unlock`, and writers defer reclamation to a grace period. CUBRID's transactional reclamation is structurally a per-DS userspace RCU; the question is whether using liburcu (already a system library on most Linux distros) would simplify the engine's `lockfree::tran` family without losing per-DS scoping.
- Folly's `ConcurrentHashMap` (Facebook open source). A production-grade C++ lock-free hash table with hazard-pointer reclamation and split-ordered semantics. Comparing it to `lockfree::hashmap` would tell us whether the `clear()`-via-back-buffer trick we use is a real cost or a cheap luxury.
- LMAX Disruptor (Thompson et al., 2011). The canonical bounded MPMC ring; CUBRID's `lockfree::circular_queue` is the same shape minus the batched-claim and waiting-strategy generalizations. Adopting batched claims would let DWB and page-buffer victim queues drain in chunks rather than one CAS per slot.
- MERGE: a single `lockfree::tran::system` server-wide. Today every modern data structure builds its own `tran::table` from a server-wide `tran::system` (returned by `cubthread::get_thread_entry_lftransys()`). If a future refactor merged the legacy 11-system zoo into the same server-wide system, the legacy/modern distinction in consumer code would collapse and the `cubthread::lockfree_hashmap` bridge could retire.
- Flat combining (Hendler et al., SPAA 2010). Replaces a CAS-loop hot spot with a per-thread publication record that a designated combiner thread sweeps. The technique pays off on rare-update / many-reader structures; the lock manager's wait-for graph is a possible target.
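The "same shape" claim about `lockfree::circular_queue` and the Disruptor can be made concrete. The sketch below follows Vyukov's well-known bounded MPMC queue recipe, which matches the cursor-plus-per-slot-sequence description above; `mpmc_ring` is a hypothetical name and this is an illustration, not CUBRID's code.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

template <typename T>
class mpmc_ring
{
  struct slot
  {
    std::atomic<std::size_t> seq;    // hand-off sequence number
    T item;
  };

  std::vector<slot> m_slots;
  const std::size_t m_mask;          // capacity must be a power of two
  std::atomic<std::size_t> m_produce{0};
  std::atomic<std::size_t> m_consume{0};

public:
  explicit mpmc_ring (std::size_t capacity_pow2)
    : m_slots (capacity_pow2), m_mask (capacity_pow2 - 1)
  {
    for (std::size_t i = 0; i < capacity_pow2; i++)
      m_slots[i].seq.store (i);      // slot i is free for cursor i
  }

  bool produce (const T &v)
  {
    std::size_t pos = m_produce.load ();
    for (;;)
      {
        slot &s = m_slots[pos & m_mask];
        std::size_t seq = s.seq.load (std::memory_order_acquire);
        if (seq == pos)              // slot is free for this cursor value
          {
            if (m_produce.compare_exchange_weak (pos, pos + 1))
              {
                s.item = v;
                s.seq.store (pos + 1, std::memory_order_release);
                return true;
              }                      // CAS failure refreshed pos; retry
          }
        else if (seq < pos)          // previous lap not consumed yet: full
          return false;
        else
          pos = m_produce.load ();   // another producer moved the cursor
      }
  }

  bool consume (T &out)
  {
    std::size_t pos = m_consume.load ();
    for (;;)
      {
        slot &s = m_slots[pos & m_mask];
        std::size_t seq = s.seq.load (std::memory_order_acquire);
        if (seq == pos + 1)          // slot holds data for this cursor value
          {
            if (m_consume.compare_exchange_weak (pos, pos + 1))
              {
                out = s.item;
                // free the slot for the producer one lap ahead
                s.seq.store (pos + m_mask + 1, std::memory_order_release);
                return true;
              }
          }
        else if (seq < pos + 1)      // nothing published yet: empty
          return false;
        else
          pos = m_consume.load ();   // another consumer moved the cursor
      }
  }
};
```

Batched claims in the Disruptor style would amount to a `fetch_add (n)` on the cursor followed by publishing the n slots in sequence, which is the generalization the bullet above suggests for DWB and victim-queue draining.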
Sources
- Source files (in the CUBRID source tree at `/data/hgryoo/references/cubrid/`):
  - `src/base/lock_free.h`, `src/base/lock_free.c`
  - `src/base/lockfree_transaction_def.hpp`
  - `src/base/lockfree_transaction_system.{hpp,cpp}`
  - `src/base/lockfree_transaction_table.{hpp,cpp}`
  - `src/base/lockfree_transaction_descriptor.{hpp,cpp}`
  - `src/base/lockfree_transaction_reclaimable.hpp`
  - `src/base/lockfree_address_marker.hpp`
  - `src/base/lockfree_bitmap.{hpp,cpp}`
  - `src/base/lockfree_freelist.hpp`
  - `src/base/lockfree_hashmap.{hpp,cpp}`
  - `src/base/lockfree_circular_queue.hpp`
  - `src/thread/thread_lockfree_hash_map.{hpp,cpp}`
- Textbook chapters cited:
- Herlihy & Shavit, The Art of Multiprocessor Programming, ch. 3 §“Progress Conditions”, ch. 9–10 (lock-free linked lists), ch. 13 (concurrent queues, incl. §“Bounded Lock-Free Queues”).
- Williams, C++ Concurrency in Action, ch. 5 (memory model), ch. 7 (designing lock-free data structures).
- Foundational papers:
- Michael, M. M. “Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects.” IEEE TPDS 15(8), 2004.
- Harris, T. L. “A Pragmatic Implementation of Non-Blocking Linked-Lists.” DISC 2001.
- Michael, M. M., Scott, M. L. “Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms.” PODC 1996.
- Shalev, O., Shavit, N. “Split-Ordered Lists: Lock-Free Extensible Hash Tables.” JACM 53(3), 2006.
- Treiber, R. K. “Systems Programming: Coping with Parallelism.” IBM Almaden RJ 5118, 1986.
- Fraser, K. “Practical Lock-Freedom.” Univ. of Cambridge PhD thesis (epoch-based reclamation), 2004.
- Sibling docs in this repo:
  - `cubrid-lockfree-transaction.md` — the reclamation framework in depth.
  - `cubrid-lockfree-hashmap.md` — both hash table generations, with the migration narrative.
  - `cubrid-lockfree-circular-queue.md` — the bounded MPMC ring.
  - `cubrid-lockfree-bitmap.md` — the chunked atomic bitmap allocator.
  - `cubrid-lockfree-freelist.md` — the typed free list with back-buffer.
- Adjacent code-analysis docs:
  - `cubrid-thread-worker-pool.md` — names how `cubthread::entry::tran_entries[]` and `get_lf_tran_index()` plug into the lock-free framework.
  - `cubrid-mvcc.md`, `cubrid-lock-manager.md` — primary consumers of `lf_hash_*` (legacy) / `cubthread::lockfree_hashmap` (bridged).