CUBRID Lock-Free Freelist — Typed Node Pool with Back-Buffer Block Allocator
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Source verification (as of 2026-05-07)
- Beyond CUBRID — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A freelist is the simplest possible memory pool: a singly linked list of free objects. Allocate = pop the head; release = push to the head. The Treiber-stack protocol (one CAS to pop, one CAS to push) applies directly. The textbook coverage is in Herlihy & Shavit, The Art of Multiprocessor Programming, ch. 11 §“Concurrent Stacks and Elimination” — the same shape, applied to typed reusable objects rather than abstract elements.
The interesting design tensions are:
- What goes on the head? A single global head bounces between cores on every alloc/release. The standard mitigations are elimination (pair an alloc with a release without touching the head) and partitioning (per-CPU freelists with cross-shard stealing).
- When does the freelist grow? A fixed-size pool is simplest but exhausts under load. A growable pool needs a block allocator and a strategy for when to call it. Allocating a block on demand at exhaustion stalls the calling thread; pre-allocating once at init wastes memory.
- How is ABA prevented? The Treiber stack’s classic ABA problem — pop reads head, computes head->next, CAS-bumps head to head->next — is hit head-on by every freelist that recycles addresses. The mitigations are tagged pointers (extra bits in the head word), hazard pointers (per-thread published references), or transactional reclamation (defer reuse until no reader could be holding the popped node).
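To make the hazard concrete, here is a minimal Treiber-style pop in C++ (a generic sketch, not CUBRID code); the window between reading next and the CAS is exactly where ABA bites:

```cpp
#include <atomic>

struct node
{
  node *next;
};

std::atomic<node *> head;

// Generic Treiber-stack pop (sketch). If *old_head is freed and a node is
// later re-pushed at the same address, the CAS succeeds with a stale
// 'next': the classic ABA corruption.
node *pop ()
{
  node *old_head, *next;
  do
    {
      old_head = head.load ();
      if (old_head == nullptr)
        {
          return nullptr;
        }
      next = old_head->next;   // <-- preemption here opens the ABA window
    }
  while (!head.compare_exchange_weak (old_head, next));
  return old_head;
}
```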
CUBRID’s lockfree::freelist<T> picks: a single available list
head, a one-block back-buffer for lazy growth, and the
tran::table / tran::descriptor framework for ABA-bounded
reuse. The back-buffer is the design’s distinctive feature.
Back-buffer as growth amortization
A naive claim() that finds the available list empty must
either fail, spin, or allocate a new block. Allocation is
expensive (a malloc + N typed-init calls). Concurrent allocators
also race — three threads simultaneously discovering an empty
list each call alloc_block, getting three blocks where one
would suffice.
The back-buffer pattern: keep one already-built block held aside. When the available list drains, atomic-swap the back-buffer into the available list and asynchronously build a new back-buffer. The race is now bounded — at most one thread gets to swap the back-buffer; the others see the swap completed and try the available list again, which by then has the just-swapped block.
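A minimal sketch of the bounded race (generic, written for this doc rather than taken from the CUBRID code shown later): the exchange hands the whole spare block to exactly one winner.

```cpp
#include <atomic>

struct node { node *next; };

std::atomic<node *> available_head {nullptr};
std::atomic<node *> backbuffer_head {nullptr};  // pre-built spare block

// Sketch: the exchange gives the spare block to exactly one winner; losers
// see nullptr and go back to popping the available list.
bool try_swap_backbuffer ()
{
  node *block = backbuffer_head.exchange (nullptr);
  if (block == nullptr)
    {
      return false;                 // someone else already swapped
    }
  node *tail = block;
  while (tail->next != nullptr)     // sketch walks to the tail; CUBRID keeps
    {                               // an explicit tail pointer instead
      tail = tail->next;
    }
  node *old_head = available_head.load ();
  do
    {
      tail->next = old_head;        // standard CAS-loop chain splice
    }
  while (!available_head.compare_exchange_weak (old_head, block));
  // a real implementation now rebuilds the spare block for the next drain
  return true;
}
```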
CUBRID’s freelist runs the back-buffer pre-allocation eagerly:
on every successful swap, the freelist immediately allocates
the next back-buffer. So at steady state the freelist holds
two blocks of memory: one in m_available_list, one in
m_backbuffer_*.
on_reclaim — typed cleanup before recycle
A freelist that recycles bare bytes (just void* from
malloc/free) imposes no contract on the allocator. A typed
freelist must let the type clean itself between uses — close
file handles, release sub-allocations, scrub state. The shape
the freelist enforces:
- The payload type T must declare an on_reclaim() method (compile-time check via SFINAE on the template instantiation).
- The freelist’s reclaim() calls m_t.on_reclaim() before pushing the node back to the available list.
The lockfree::hashmap’s freelist_node_data is the canonical
example: its on_reclaim calls the per-table
lf_entry_descriptor::f_uninit callback, so a hashmap entry’s
custom destructor (releasing a sub-mutex, freeing a payload
buffer, etc.) runs at every recycle.
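For illustration, a hypothetical payload satisfying this contract (cached_handle and its members are invented for this sketch, not from the CUBRID tree):

```cpp
#include <cstdio>

// Hypothetical payload type for freelist<T>; only the on_reclaim() method
// is required by the contract.
struct cached_handle
{
  std::FILE *m_file = nullptr;

  // called by free_node::reclaim() before the node is pushed back to the
  // available list; scrub state so the next claimer gets a clean object
  void on_reclaim ()
  {
    if (m_file != nullptr)
      {
        std::fclose (m_file);
        m_file = nullptr;
      }
  }
};

// usage: lockfree::freelist<cached_handle> pool (transys, /*block_size*/ 64);
```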
Common DBMS Design
Per-data-structure free lists
Every DBMS uses pooled allocators for its lock-free entries — malloc on every insert and free on every delete is too slow. The shapes:
- PostgreSQL uses MemoryContext arenas keyed by lifetime (top-level / per-transaction / per-tuple). Not lock-free; the context is per-backend.
- InnoDB uses mem_heap_t arenas + a private freelist for hash entries (hash_table_t::heap). Mutex-protected.
- MySQL has slab allocators for fixed-size objects.
- CUBRID uses one freelist per lock-free hashmap (and per any other structure that wants pooled typed nodes), riding on tran::table reclamation.
The CUBRID approach is per-data-structure, not per-thread or per-transaction. The trade: each hashmap has its own pool with its own waste rate, but the pools never bleed cross-structure (retiring a hashmap clears only its own freelist).
Block allocator + back-buffer
The pattern of “allocate K objects at once, push as a block” is
standard. The variant of “keep one block on the side” is less
universal. Java’s ConcurrentLinkedDeque does not; Folly’s
SmallVector does (capacity grows by doubling, holding the
new buffer aside until the old is drained). CUBRID’s freelist
is on the back-buffer side because the typical workload is
bursty retirement (vacuum sweep, hashmap clear) followed by
bursty claim (replay, repopulate) — the back-buffer absorbs
the asymmetry.
Theory ↔ CUBRID mapping
| Concept | CUBRID name |
|---|---|
| Typed node pool | lockfree::freelist<T> |
| Pool node | freelist<T>::free_node (derives from tran::reclaimable_node) |
| Available stack head | m_available_list (std::atomic<free_node *>) |
| Back-buffer head/tail | m_backbuffer_head, m_backbuffer_tail (std::atomic<free_node *>) |
| Block size (init param) | m_block_size (≥ 2 enforced) |
| On-reclaim hook | T::on_reclaim () (caller-provided) |
| Per-pool reclamation | m_trantable (each freelist owns its own tran::table) |
| Allocate from available | pop_from_available () |
| Allocate via swap | swap_backbuffer () |
| Forced allocation | force_alloc_block () (bypass back-buffer) |
| Statistics: alloc count | m_alloc_count |
| Statistics: available count | m_available_count |
| Statistics: back-buffer count | m_bb_count |
| Statistics: forced allocations | m_forced_alloc_count |
| Statistics: retired count | m_retired_count |
CUBRID’s Approach
Two classes — freelist<T> and freelist<T>::free_node
```cpp
// lockfree::freelist<T> — src/base/lockfree_freelist.hpp:42
template <class T>
class freelist
{
  public:
    class free_node;

    freelist () = delete;
    freelist (tran::system &transys, size_t block_size, size_t initial_block_count = 1);
    ~freelist ();

    free_node *claim (tran::descriptor &);
    free_node *claim (tran::index);

    void retire (tran::descriptor &, free_node &);
    void retire (tran::index, free_node &);

    // statistics:
    size_t get_alloc_count () const;
    size_t get_available_count () const;
    size_t get_backbuffer_count () const;
    size_t get_forced_allocation_count () const;
    size_t get_retired_count () const;
    size_t get_claimed_count () const;  // = alloc - (available + backbuffer + retired)

    tran::system &get_transaction_system ();
    tran::table &get_transaction_table ();

  private:
    tran::system &m_transys;
    tran::table *m_trantable;           // owned: deleted in destructor
    size_t m_block_size;
    std::atomic<free_node *> m_available_list;
    std::atomic<free_node *> m_backbuffer_head;
    std::atomic<free_node *> m_backbuffer_tail;
    std::atomic<size_t> m_available_count;
    std::atomic<size_t> m_alloc_count;
    std::atomic<size_t> m_bb_count;
    std::atomic<size_t> m_forced_alloc_count;
    std::atomic<size_t> m_retired_count;

    void swap_backbuffer ();
    void alloc_backbuffer ();
    void force_alloc_block ();
    void alloc_list (free_node *&head, free_node *&tail);
    void dealloc_list (free_node *head);
    free_node *pop_from_available ();
    void push_to_list (free_node &head, free_node &tail, std::atomic<free_node *> &dest);
    void clear_free_nodes ();
    void final_sanity_checks () const;
    void check_my_pointer (free_node *node);
};
```

```cpp
// lockfree::freelist<T>::free_node — src/base/lockfree_freelist.hpp:104
template <class T>
class freelist<T>::free_node : public tran::reclaimable_node
{
  public:
    free_node ();
    ~free_node () = default;

    T &get_data ();
    void reclaim () final override;     // recycles instead of delete

  private:
    friend freelist;

    void set_owner (freelist &m_freelist);
    void set_freelist_next (free_node *next);
    void reset_freelist_next ();
    free_node *get_freelist_next ();

    freelist *m_owner;
    T m_t;                              // payload
};
```

The free_node derives from tran::reclaimable_node, so a
retired node lives on the descriptor’s retired ring through the
inherited m_retired_next field. When reclamation eventually
fires, the inherited reclaim() virtual is the recycle path
that pushes back to m_available_list.
The clever overload of m_retired_next:
- While the node is on the descriptor’s retired ring, m_retired_next is the retire-ring link.
- While the node is on the freelist’s available list, m_retired_next is the freelist link.
- A node is on exactly one of those at a time, so the field is reused.
free_node::set_freelist_next and get_freelist_next are
private accessors that present m_retired_next as a freelist
link with the correct return type (free_node *, not the base
reclaimable_node *).
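In spirit, the accessor pair looks like this (a sketch of the shape at :502-518, assuming the base class exposes m_retired_next to the derived class):

```cpp
// Sketch of the accessor pair: the base class's retire-ring link doubles
// as the freelist link; the accessors just narrow the type.
void set_freelist_next (free_node *next)
{
  m_retired_next = next;                             // reuse the base field
}

free_node *get_freelist_next ()
{
  return static_cast<free_node *> (m_retired_next);  // narrow to free_node *
}
```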
Construction — eager pre-allocation
```cpp
// freelist<T>::freelist — src/base/lockfree_freelist.hpp:138
freelist (tran::system &transys, size_t block_size, size_t initial_block_count = 1)
  : m_transys (transys)
  , m_trantable (new tran::table (transys))   // OWN tran table
  , m_block_size (block_size)
  /* counters initialized to 0 */
{
  if (initial_block_count <= 1)
    {
      m_block_size /= 2;
      initial_block_count = 2;
    }

  alloc_backbuffer ();
  for (size_t i = 0; i < initial_block_count; i++)
    {
      swap_backbuffer ();   // pre-fill available list
    }
}
```

Three things happen at init:
- The freelist creates its own tran::table from the shared tran::system. Every freelist is a distinct reclamation domain — retiring nodes in freelist A does not bump the global tranid in freelist B.
- The “block size at least 2 with initial blocks at least 2” rule. If the caller asks for initial_block_count = 1, the constructor halves the block size and doubles the block count. This guarantees there is always at least one block in the back-buffer and one block in the available list at steady state.
- The init eagerly fills the available list to initial_block_count blocks. The back-buffer is one extra block beyond what was requested. After init, the freelist contains initial_block_count + 1 total blocks of memory, initial_block_count of which are immediately claimable.
claim — the hot allocate path
```cpp
// freelist<T>::claim (descriptor flavor) — src/base/lockfree_freelist.hpp:291
free_node *claim (tran::descriptor &tdes)
{
  tdes.start_tran ();             // bracket
  tdes.reclaim_retired_list ();   // amortized cleanup

  free_node *node;
  size_t count = 0;
  for (node = pop_from_available (); node == NULL && count < 100;
       node = pop_from_available (), ++count)
    {
      // back-buffer allocator was preempted for a long time — force the swap
      swap_backbuffer ();
    }
  while (node == NULL)
    {
      // back-buffer still failed; force-allocate directly into available
      force_alloc_block ();
      node = pop_from_available ();
    }

  m_available_count--;
  check_my_pointer (node);
  return node;
}
```

The escalation ladder:
- Fast path. Pop from m_available_list. Most calls succeed here.
- Empty available list, pop returns NULL. Up to 100 retry cycles, each calling swap_backbuffer() to move the back-buffer’s block into the available list. If the back-buffer is also empty (because the back-buffer allocator was preempted between swap and rebuild), swap_backbuffer is a no-op and the next iteration tries pop again.
- After 100 empty-available + empty-back-buffer cycles, force-allocate. force_alloc_block does alloc_list + push_to_list (m_available_list) directly — bypasses the back-buffer protocol entirely. This is the “back-buffer allocator deeply preempted” escape hatch.
The tdes.start_tran() at the top is the EBR bracket — every
read of m_available_list, every CAS-pop, happens inside the
descriptor’s transaction window. The bracket is not ended
inside claim; the contract is “transaction will remain
started” (the header comment notes this) — the caller is
responsible for ending the transaction (typically by inserting
the claimed node into a structure and ending the tran inside
the insert).
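A sketch of the calling convention, assuming the hypothetical cached_handle payload from above and an end_tran () on the descriptor (the exact ending API lives with the consumer, e.g., the hashmap insert):

```cpp
// Hypothetical caller honoring the claim contract: claim() leaves the
// transaction started; the caller ends it after publishing the node.
void insert_one (lockfree::freelist<cached_handle> &pool, tran::descriptor &tdes)
{
  auto *node = pool.claim (tdes);       // EBR bracket is now open
  node->get_data ().m_file = nullptr;   // (re)initialize the payload
  // ... publish node into a lock-free structure ...
  tdes.end_tran ();                     // assumed name; close the bracket
}
```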
pop_from_available — the documented ABA window
```cpp
// freelist<T>::pop_from_available — src/base/lockfree_freelist.hpp:322
free_node *pop_from_available ()
{
  free_node *rhead = NULL, *rhead_copy = NULL, *next;
  do
    {
      rhead = m_available_list;
      if (rhead == NULL)
        {
          return NULL;
        }
      next = rhead->get_freelist_next ();
      rhead_copy = rhead;
      // todo: this is a dangerous preemption point; if I am preempted here, and
      // thread 2 comes and does:
      // - second thread gets same rhead and successfully moves m_available_list
      //   to next
      // - third thread gets next and successfully moves m_available_list to
      //   next->next
      // - second thread retires rhead. m_available_list becomes rhead and its
      //   next becomes next->next
      // - I wake up, compare exchange m_available_list successfully because it
      //   is rhead again, but next will become the item third thread already
      //   claimed.
    }
  while (!m_available_list.compare_exchange_weak (rhead_copy, next));
  rhead->reset_freelist_next ();
  return rhead;
}
```

The comment names the classical ABA scenario verbatim. The CAS
on m_available_list succeeds when the head is rhead again
— but if rhead was popped, retired, and re-pushed in
between, our saved next pointer is stale and we corrupt the
list.
The mitigation is time, not algorithm. A retired node must travel:
- Through the descriptor’s retired ring,
- Wait for min_active_tranid > retire_tranid (which means no reader’s transaction window predates the retirement),
- Get reclaimed via reclaim(),
- Be pushed back to m_available_list.
The back-buffer pattern lengthens this travel: most retirees go
to the descriptor’s retired ring → reclaim() → freelist’s
available, but if there is a back-buffer block in flight, the
back-buffer’s block waits until swap_backbuffer moves it into
available. The total time is empirically long enough that the
ABA window in pop_from_available is not hit in practice, but
it is not eliminated by algorithm.
The todo flag in the comment is the design’s open issue. A
tagged-pointer redesign (lower 16 bits as a generation tag in
the head pointer) would close the window at the cost of
constraining free_node * alignment. CUBRID has not adopted
this; the back-buffer time is the production mitigation.
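For concreteness, here is one shape such a redesign could take (a sketch, not CUBRID code; it uses a double-word CAS on a pointer+tag pair rather than bit-packing, trading bit discipline for a 16-byte atomic):

```cpp
#include <atomic>
#include <cstdint>

struct node { node *next; };

// Pointer + generation tag, compare-exchanged as one unit. On x86-64 this
// needs the 16-byte CAS (cmpxchg16b) to be lock-free.
struct tagged_head
{
  node *ptr;
  std::uintptr_t tag;
};

std::atomic<tagged_head> head;

node *pop ()
{
  tagged_head old_h = head.load ();
  for (;;)
    {
      if (old_h.ptr == nullptr)
        {
          return nullptr;
        }
      tagged_head new_h = { old_h.ptr->next, old_h.tag + 1 };
      // if old_h.ptr was recycled and re-pushed meanwhile, the tag differs,
      // so this CAS fails instead of installing a stale next pointer
      if (head.compare_exchange_weak (old_h, new_h))
        {
          return old_h.ptr;
        }
      // on failure, old_h was reloaded; loop and retry
    }
}
// push CASes the same tagged word, also bumping the tag
```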
swap_backbuffer — moving the spare block in
```cpp
// freelist<T>::swap_backbuffer — src/base/lockfree_freelist.hpp:169
void swap_backbuffer ()
{
  free_node *bb_head = m_backbuffer_head;
  if (bb_head == NULL)
    {
      return;   // somebody already swapped
    }

  free_node *bb_head_copy = bb_head;
  if (!m_backbuffer_head.compare_exchange_strong (bb_head_copy, NULL))
    {
      return;   // somebody else is swapping
    }

  free_node *bb_tail = m_backbuffer_tail.exchange (NULL);
  assert (bb_tail != NULL);

  m_available_count += m_bb_count.exchange (0);
  push_to_list (*bb_head, *bb_tail, m_available_list);

  alloc_backbuffer ();   // start building next back-buffer
}
```

The protocol:
- Claim the back-buffer. CAS m_backbuffer_head from the current pointer to NULL. Failure means someone else is already swapping; the loser returns immediately.
- Detach the tail. Atomic exchange m_backbuffer_tail to NULL. The pre-CAS head and the exchanged tail bracket the block to splice.
- Splice into available. push_to_list is the standard CAS-loop push (sketched below) — set tail->next = available_head, CAS available_head from available_head to block_head.
- Start building the next back-buffer. alloc_backbuffer new-s a fresh block and CAS-installs it in the now-empty m_backbuffer_head / m_backbuffer_tail.
The alloc_backbuffer is what hides the malloc/init cost from
the next claimer: by the time swap_backbuffer returns, the
hot path (pop_from_available) has already been refilled, and
the next swap’s destination block is being built in
parallel.
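push_to_list itself is the standard CAS-loop chain splice; in spirit it looks like this (a sketch of the shape; the real code is at :368):

```cpp
// Sketch of push_to_list's shape: splice a [head, tail] chain onto an
// atomic stack head with the usual CAS loop.
void push_to_list (free_node &head, free_node &tail, std::atomic<free_node *> &dest)
{
  free_node *old_head = dest.load ();
  do
    {
      tail.set_freelist_next (old_head);   // chain's tail points at old head
    }
  while (!dest.compare_exchange_weak (old_head, &head));
}
```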
force_alloc_block — the escape hatch
```cpp
// freelist<T>::force_alloc_block — src/base/lockfree_freelist.hpp:213
void force_alloc_block ()
{
  free_node *new_head = NULL, *new_tail = NULL;
  alloc_list (new_head, new_tail);
  m_available_count += m_block_size;
  ++m_forced_alloc_count;
  push_to_list (*new_head, *new_tail, m_available_list);
}
```

Bypasses the back-buffer entirely; allocates and pushes
straight to available. Triggered by claim’s 100-iteration
escape from the back-buffer-empty loop. Tracked by
m_forced_alloc_count so monitoring can flag pools that hit
this path.
retire — the public retirement primitive
```cpp
// freelist<T>::retire — src/base/lockfree_freelist.hpp:358
void retire (tran::descriptor &tdes, free_node &node)
{
  assert (node.get_freelist_next () == NULL);
  ++m_retired_count;
  check_my_pointer (&node);
  tdes.retire_node (node);   // hand to descriptor's retired ring
}
```

The freelist defers to tdes.retire_node(), which:
- Bumps the global tranid (via start_tran_and_increment_id).
- Amortized-reclaims the descriptor’s existing retired prefix.
- Stamps the node’s m_retire_tranid and links it to the descriptor’s retired ring.
The freelist’s m_retired_count is a global counter (atomic
inc on retire, atomic dec on reclaim) — useful for
observability but not for correctness.
free_node::reclaim — the recycle hook
```cpp
// freelist<T>::free_node::reclaim — src/base/lockfree_freelist.hpp:523
void reclaim () final override
{
  m_t.on_reclaim ();       // payload's cleanup hook

  m_retired_next = NULL;   // detach from retire ring
  --m_owner->m_retired_count;
  ++m_owner->m_available_count;
  m_owner->push_to_list (*this, *this, m_owner->m_available_list);
}
```

Three steps:
- Payload cleanup. m_t.on_reclaim(). For the hashmap’s freelist_node_data, this calls the per-table lf_entry_descriptor::f_uninit. For other consumers it is whatever destructor-like routine the type defines.
- Counters. Decrement retired, increment available.
- Push. Standard CAS-loop push to m_available_list. The node is now claimable through pop_from_available.
The final override qualifier is deliberate — no further
override is allowed. This is the only place a node transitions
from “retired” to “claimable.”
clear_free_nodes — destructor cleanup, not thread-safe
```cpp
// freelist<T>::clear_free_nodes — src/base/lockfree_freelist.hpp:267
void clear_free_nodes ()   // not thread safe!
{
  final_sanity_checks ();

  dealloc_list (m_backbuffer_head.load ());
  m_backbuffer_head = NULL;
  m_backbuffer_tail = NULL;

  dealloc_list (m_available_list.load ());
  m_available_list = NULL;

  m_available_count = m_bb_count = m_alloc_count = 0;
}
```

Used only by the destructor; assumes no other thread is touching
the freelist. The header comment makes this explicit. Reaches
into both lists and deletes each free_node directly, then
zeros the counters. The final_sanity_checks (only enabled in
debug) verifies the available + back-buffer counts equal
m_alloc_count and walks both lists to confirm.
get_claimed_count — derived statistic
The freelist tracks m_alloc_count (total ever allocated),
m_available_count (currently claimable),
m_bb_count (currently in back-buffer), and
m_retired_count (currently waiting for reclamation). The
claimed count — currently held by a caller — is derived:
```cpp
size_t get_claimed_count () const
{
  size_t alloc = m_alloc_count;
  size_t unused = m_available_count + m_bb_count + m_retired_count;
  return (alloc > unused) ? (alloc - unused) : 0;
}
```

The counters feeding the unused aggregate are loaded with seq_cst; the
math is not transactional, so a fast burst of retires and
claims can produce a transient negative (clamped to 0). For
monitoring, the value is good enough.
Component overview
```mermaid
flowchart LR
  Caller["Caller (e.g., hashmap insert)"]
  Caller --> Claim["claim(tdes)"]
  Claim --> Pop["pop_from_available()"]
  Pop -- empty --> Swap["swap_backbuffer()"]
  Swap --> BB["m_backbuffer_head/tail"]
  Swap -.alloc next.-> AllocBB["alloc_backbuffer()"]
  Swap --> Avail["m_available_list"]
  Pop --> Avail
  Swap -- still empty --> Force["force_alloc_block()"]
  Force --> Avail
  Caller --> Retire["retire(tdes, node)"]
  Retire --> Tdes["tran::descriptor"]
  Tdes --> RetRing["m_retired_head/tail ring"]
  RetRing -.tranid bump + min-scan.-> Reclaim["free_node::reclaim()"]
  Reclaim --> OnReclaim["T::on_reclaim()"]
  Reclaim --> Avail
  Avail --> Pop
```
Statistics summary
| Counter | Meaning | Updated when |
|---|---|---|
| m_alloc_count | Total free_nodes ever created | alloc_list (block alloc) |
| m_available_count | Currently in m_available_list | pop_from_available (-1), reclaim (+1), swap (+block_size) |
| m_bb_count | Currently in back-buffer | alloc_backbuffer (+block_size), swap (zeroes via exchange) |
| m_forced_alloc_count | How many times the back-buffer escape hatch fired | force_alloc_block |
| m_retired_count | Currently in some descriptor’s retired ring | retire (+1), reclaim (-1) |
A healthy freelist has small m_forced_alloc_count (ideally 0
under steady state), m_retired_count bounded by reclamation
latency, and m_available_count + m_bb_count close to one or
two block sizes.
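As a rough illustration, a monitoring probe over the public getters might encode those health conditions (the thresholds here are invented, not CUBRID’s):

```cpp
// Sketch of a health probe over the public statistics getters; the slack
// factor is illustrative only.
template <class T>
bool freelist_looks_healthy (const lockfree::freelist<T> &fl, size_t block_size)
{
  return fl.get_forced_allocation_count () == 0   // escape hatch never fired
         && fl.get_available_count () + fl.get_backbuffer_count ()
            <= 3 * block_size;                    // at most a couple of blocks idle
}
```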
Comparison with legacy LF_FREELIST
The legacy lf_freelist (lock_free.h:231+) does the same
job with a different shape:
```c
struct lf_freelist
{
  void *available;                  // single available stack head (Treiber)

  int block_size;
  int alloc_cnt;
  int available_cnt;
  int retired_cnt;

  LF_ENTRY_DESCRIPTOR *entry_desc;  // takes f_alloc/f_free callbacks
  LF_TRAN_SYSTEM *tran_system;      // shared global, not owned
};
```

Differences from the modern design:
- No back-buffer. The legacy freelist allocates blocks on-demand inside lf_freelist_claim (when lf_stack_pop (&freelist->available, edesc) returns NULL) via lf_freelist_alloc_block. The header comment acknowledges multiple threads racing to alloc a block at once, calling it “acceptable given low enough block_size” (lock_free.c:840-842).
- f_alloc / f_free callbacks instead of new / delete. Each block is built by calling the descriptor’s f_alloc per-entry; reclamation calls f_free. Modern freelist owns its own typed allocations.
- max_alloc_cnt cap. The legacy freelist tracks alloc_cnt and refuses to grow past entry_desc->max_alloc_cnt (default 2,147,483,647). When the cap is hit and a retire happens, the legacy lf_freelist_transport f_frees the retiree instead of recycling. The modern freelist has no cap; once allocated, blocks live for the freelist’s lifetime.
- Shared LF_TRAN_SYSTEM. A single legacy LF_TRAN_SYSTEM underlies many freelists in the same domain. Modern freelist owns its tran::table; reclamation pressure is per-pool.
- lf_freelist_transport is the legacy reclamation walker. Walks the per-thread retired list, partitioning into “free this” (cap exceeded) and “transport to available” (still under cap). Modern code calls descriptor::reclaim_retired_list and the per-node reclaim() virtual.
The migration path: when a hashmap consumer flips from the
legacy to the modern hashmap (via enable_new_lfhash), it
trades the global LF_TRAN_SYSTEM + lf_freelist for a
per-table tran::table + lockfree::freelist. The behavior
is observably the same (entries claimed, retired, recycled);
the bookkeeping is simpler.
Memory ordering
Every atomic in the freelist uses seq_cst defaults. The
hot operations are:
- m_available_list.compare_exchange_weak (pop and push) — seq_cst.
- m_backbuffer_head.compare_exchange_strong (claim back-buffer for swap) — seq_cst.
- m_backbuffer_tail.exchange (detach tail during swap) — seq_cst.
- Counter ++ / -- / += — seq_cst RMW.
On x86 the seq_cst overhead is small (a couple of additional
fences); on ARM it is non-trivial. No measurement campaign
exists for tightening to acquire/release.
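For reference, the acquire/release shape such a tightening would likely take on the Treiber pair (a sketch under that assumption; CUBRID has not made or measured this change):

```cpp
#include <atomic>

struct node { node *next; };

// Sketch only. Typical Treiber orderings: acquire on pop (to see the popped
// node's contents), release on push (to publish them); pure statistics
// counters could be relaxed.
node *pop_acquire (std::atomic<node *> &head)
{
  node *h = head.load (std::memory_order_acquire);
  while (h != nullptr
         && !head.compare_exchange_weak (h, h->next,
                                         std::memory_order_acquire,
                                         std::memory_order_acquire))
    ;   // on failure the CAS reloads h
  return h;
}

void push_release (std::atomic<node *> &head, node *n)
{
  node *h = head.load (std::memory_order_relaxed);
  do
    {
      n->next = h;
    }
  while (!head.compare_exchange_weak (h, n,
                                      std::memory_order_release,
                                      std::memory_order_relaxed));
}
```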
Source Walkthrough
Public surface
- lockfree::freelist<T> (class template) — lockfree_freelist.hpp:42
- freelist::freelist (tran::system &, size_t, size_t) — :138
- freelist::~freelist — :259
- freelist::claim (tran::descriptor &) — :291
- freelist::claim (tran::index) — :284
- freelist::retire (tran::descriptor &, free_node &) — :358
- freelist::retire (tran::index, free_node &) — :351
- freelist::get_alloc_count / get_available_count / ... — :382+
- freelist::get_transaction_system / get_transaction_table — :433 / :440
- freelist<T>::free_node — :104
- free_node::get_data — :534
- free_node::reclaim — :523 (the final override)
Private internals
- freelist::swap_backbuffer — :169
- freelist::alloc_backbuffer — :194
- freelist::force_alloc_block — :213
- freelist::alloc_list — :227
- freelist::dealloc_list — :247
- freelist::pop_from_available — :322
- freelist::push_to_list — :368
- freelist::clear_free_nodes — :267
- freelist::final_sanity_checks — :447
- freelist::check_my_pointer — :476
Position hints (as of 2026-05-07)
| Symbol | File | Line |
|---|---|---|
| lockfree::freelist<T> (class template) | src/base/lockfree_freelist.hpp | 42 |
| freelist<T>::free_node (class) | src/base/lockfree_freelist.hpp | 104 |
| freelist::freelist (constructor) | src/base/lockfree_freelist.hpp | 138 |
| freelist::swap_backbuffer | src/base/lockfree_freelist.hpp | 169 |
| freelist::alloc_backbuffer | src/base/lockfree_freelist.hpp | 194 |
| freelist::force_alloc_block | src/base/lockfree_freelist.hpp | 213 |
| freelist::alloc_list | src/base/lockfree_freelist.hpp | 227 |
| freelist::~freelist | src/base/lockfree_freelist.hpp | 259 |
| freelist::clear_free_nodes | src/base/lockfree_freelist.hpp | 267 |
| freelist::claim (descriptor) | src/base/lockfree_freelist.hpp | 291 |
| freelist::pop_from_available | src/base/lockfree_freelist.hpp | 322 |
| freelist::retire (descriptor) | src/base/lockfree_freelist.hpp | 358 |
| freelist::push_to_list | src/base/lockfree_freelist.hpp | 368 |
| freelist::get_claimed_count | src/base/lockfree_freelist.hpp | 418 |
| free_node::reclaim | src/base/lockfree_freelist.hpp | 523 |
| lf_freelist_alloc_block (legacy) | src/base/lock_free.c | 621 |
| lf_freelist_claim (legacy) | src/base/lock_free.c | 760 |
| lf_freelist_retire (legacy) | src/base/lock_free.c | 873 |
| lf_freelist_transport (legacy) | src/base/lock_free.c | 934 |
Source verification (as of 2026-05-07)
Verified facts
- The freelist owns its tran::table. Constructor calls m_trantable = new tran::table (transys) at lockfree_freelist.hpp:141; destructor delete m_trantable at :261.
- Block size is at least 2. Constructor enforces assert (block_size > 1) at :152 and the initial_block_count <= 1 branch at :154-158 halves block size and doubles count to ensure ≥ 2 blocks at init.
- pop_from_available’s ABA window is documented in the source. Multi-line comment at :336-342 describing the exact race scenario. No tagged-pointer mitigation in the current implementation.
- force_alloc_block is the escape hatch after 100 back-buffer-empty cycles. claim at :298 loops up to 100 swap attempts before falling through to the force_alloc_block while-loop at :306.
- free_node::reclaim is final override. Declared at lockfree_freelist.hpp:112. No further overrides allowed.
- Statistics are seq_cst atomics. All counters (m_alloc_count, m_available_count, m_bb_count, m_forced_alloc_count, m_retired_count) declared as std::atomic<size_t> at :82-86.
- claim does not end the transaction. The header comment at :53 says “transaction will remain started!” Caller is responsible for ending it.
- The free node’s m_retired_next is reused as the freelist link. set_freelist_next / get_freelist_next at :502-518 cast through the inherited tran::reclaimable_node::m_retired_next.
Open questions
- Has the pop_from_available ABA window ever been hit in production? The comment is documented; no test evidence either way. A targeted stress test (deliberate SIGSTOP / SIGCONT on a worker thread between read of next and CAS) would tell us empirically. If it has not hit in N production-machine-years, the back-buffer time is the de-facto mitigation; if it has, the tagged-pointer redesign moves up the priority list.
- Why the magic number 100 in the swap-loop limit? The count < 100 cap before falling through to force_alloc_block is unmotivated by the source. Perhaps tuned empirically; perhaps a placeholder. Worth a focused review under stress.
- Cap-on-allocation parity with legacy. Legacy lf_freelist honours entry_desc->max_alloc_cnt. Modern freelist<T> has no cap; once a block is allocated, it lives for the freelist’s lifetime. For long-running servers with bursty load, this is a memory-watermark concern. Investigation: does any consumer rely on the legacy cap to bound memory? If yes, the modern freelist needs a cap-and-shrink path.
- alloc_backbuffer’s use of compare_exchange_strong on a NULL expectation. At :204, the function does m_backbuffer_tail.compare_exchange_strong (dummy_null, new_bb_tail) — succeeds only if the tail is currently NULL (which is true after swap_backbuffer exchanged it to NULL). On the first init call the tail is also NULL. But what if a second alloc_backbuffer raced (e.g., from two near-simultaneous swaps)? The CAS would fail, and the second alloc_backbuffer would push_to_list to the already-installed back-buffer head. Behavior under that race is plausible-but-not-obviously-correct; worth a focused review.
- final_sanity_checks invariant under load. The debug-only check at :447-472 asserts m_available_count + m_bb_count == m_alloc_count. This ignores m_retired_count — i.e., it assumes no retires are in flight at destruction time. The destructor runs after every consumer has stopped using the freelist (via clear_free_nodes), so this is a reasonable precondition; but if a future change calls final_sanity_checks mid-life, the assertion is wrong.
Beyond CUBRID — Comparative Designs & Research Frontiers
- Tagged pointers to close the pop_from_available ABA window. Stuff a 16-bit generation counter into the upper bits of free_node * (or use std::atomic<__int128>). Every push bumps the tag; every pop checks tag equality. Eliminates the documented ABA hazard at the cost of pointer-arithmetic discipline.
- Hazard-pointer-based freelist (Michael 2004). Per-thread hazard pointer slot for the popped node; reclaim only when no thread has the address as a hazard. Cheaper read-side publication than EBR for hot workloads.
- Per-CPU sharded freelist with stealing. Replace the single m_available_list with one per CPU (sched_getcpu()). Allocate from local; on local-empty, steal from a neighbor. Removes the cache-line bouncing on the head pointer at the cost of more memory and more complex sanity-check accounting. A sketch of this shape follows the list.
- Linked block lists rather than per-node lists. The freelist’s available list could be a list of blocks of nodes rather than individual nodes. Each block holds K nodes; pop gives one node from the head block (move to next when empty). Cuts CAS frequency by K. Requires the block to be the unit of cache-line concern, not the node.
- Cap-and-shrink. Track the high-water mark of the in-use count; when the freelist is overprovisioned (in-use much smaller than allocated), shrink during a clear or a maintenance daemon tick. Would close the legacy parity gap (open question 3).
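A minimal sketch of the sharded shape (illustrative only; sched_getcpu is Linux-specific, and SHARDS, the stealing order, and the fallback are all invented here):

```cpp
#include <atomic>
#include <sched.h>   // sched_getcpu (Linux)

struct node { node *next; };

constexpr int SHARDS = 64;

// One head per CPU, padded to a cache line so the heads don't false-share.
struct alignas (64) shard
{
  std::atomic<node *> head {nullptr};
};

shard g_shards[SHARDS];

node *pop_one (std::atomic<node *> &head)
{
  node *h = head.load ();
  while (h != nullptr && !head.compare_exchange_weak (h, h->next))
    ;
  return h;
}

node *claim_sharded ()
{
  int cpu = sched_getcpu ();
  int home = (cpu < 0) ? 0 : cpu % SHARDS;
  if (node *n = pop_one (g_shards[home].head))
    {
      return n;   // fast path: local shard
    }
  for (int i = 1; i < SHARDS; i++)   // local empty: steal from neighbors
    {
      if (node *n = pop_one (g_shards[(home + i) % SHARDS].head))
        {
          return n;
        }
    }
  return nullptr;   // all shards empty: caller falls back to block alloc
}
```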
Sources
- Source files (in the CUBRID source tree at /data/hgryoo/references/cubrid/):
  - src/base/lockfree_freelist.hpp (header-only template, the entire implementation)
  - src/base/lockfree_transaction_descriptor.hpp (the descriptor freelist::claim / retire rides on)
  - src/base/lockfree_transaction_table.hpp (the table freelist constructs internally)
  - src/base/lockfree_transaction_reclaimable.hpp (the base class free_node derives from)
  - src/base/lockfree_hashmap.hpp (primary consumer, via freelist<freelist_node_data>)
  - src/base/lock_free.h, src/base/lock_free.c (legacy lf_freelist counterpart)
- Textbook chapters cited:
  - Herlihy & Shavit, The Art of Multiprocessor Programming, ch. 11 §“Concurrent Stacks and Elimination” — the Treiber-stack pattern at the core of pop_from_available and push_to_list.
- Foundational papers:
  - Treiber, R. K. “Systems Programming: Coping with Parallelism.” IBM Almaden RJ 5118, 1986 — the classical Treiber stack and its ABA exposure.
  - Michael, M. M. “Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects.” IEEE TPDS 15(8), 2004 — the alternative reclamation model the freelist does not use.
- Sibling docs in this repo:
  - cubrid-lockfree-overview.md — umbrella, two-generation map.
  - cubrid-lockfree-transaction.md — the reclamation framework the freelist plugs into.
  - cubrid-lockfree-hashmap.md — primary consumer; explains freelist_node_data::on_reclaim.