CUBRID Lock-Free Circular Queue — Bounded MPMC Ring with Per-Slot Block Flags
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Source verification (as of 2026-05-07)
- Beyond CUBRID — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A bounded multi-producer multi-consumer (MPMC) queue is the canonical inter-thread hand-off primitive in a high-throughput system. Producers push payloads from one set of threads; consumers pop them from another. The throughput goal is to amortize the per-item cost down to a small number of atomic operations per element. The textbook reference is Herlihy & Shavit, The Art of Multiprocessor Programming, ch. 13 §“Concurrent Queues.” Three implementation families recur:
- Linked-list MPMC (Michael & Scott 1996). Two atomic pointers (`head`, `tail`); each enqueue allocates a node and CAS-links it; each dequeue CAS-advances `head`. Unbounded; one allocation per item. Production exemplar: Java’s `ConcurrentLinkedQueue`.
- Bounded ring buffer (Lamport 1983 SPSC; many MPMC variants). Power-of-two slot array + two cursor atomics + per-slot synchronization. No per-item allocation. Production exemplar: LMAX Disruptor (Thompson et al. 2011).
- Bounded fetch-and-add ring (Yang & Mellor-Crummey 2016). Producers and consumers each fetch-and-add their cursor; the slot’s “sequence number” tells whether the slot is ready for produce or consume. The cleanest production design today (Vyukov, Adamson). Used by Boost.Lockfree’s `spsc_queue`/`queue` and many high-performance C++ libraries.
CUBRID’s lockfree::circular_queue<T> is closest to family (2)
but with a CAS-driven cursor (not fetch-and-add) and a per-slot
block flag instead of a sequence number. The choice matters:
- CAS-cursor lets a producer detect a full queue cheaply (compare consume cursor + capacity to produce cursor) but exposes a producer-vs-producer race for the same slot.
- Block flag resolves the race: several producers may read the same cursor value, but only the winner of the per-slot block-flag CAS gets to write into that slot; the cursor bump itself succeeds for exactly one of them. The block flag also doubles as a “produce complete” signal for consumers.
The design has a known weakness, called out in the source itself: a producer preempted between cursor-bump and block-release stalls every consumer at that slot until it wakes. This is not strictly lock-free under the textbook progress condition (a stalled producer can force consumers to wait). The source acknowledges this and notes a separation-of-concerns redesign as future work.
Cursor invariants
Two `uint64_t` atomic cursors:

- `m_produce_cursor` — strictly monotone; read-and-bump via `compare_exchange_strong`.
- `m_consume_cursor` — strictly monotone; read-and-bump via `compare_exchange_strong`.
The capacity is rounded up to the next power of two; the slot
index for a cursor is cursor & (capacity - 1).
The size invariant: produce_cursor - consume_cursor is the
current item count. The queue is full when
consume_cursor + capacity <= produce_cursor and empty
when produce_cursor <= consume_cursor.
Per-slot block flag
A parallel `m_blocked_cursors[capacity]` array of atomic `uint64_t`. Each slot’s flag word encodes:

- The expected cursor value in the low bits — the cursor value the slot is ready to accept next (initially the slot’s index, then bumped by `+capacity` after each produce).
- A BLOCK_FLAG in the highest bit (`0x8000_0000_0000_0000`). When set: “produce in progress; consumer must not read this slot.”
The combination encodes three states implicitly:
| Slot state | flag word |
|---|---|
| Ready for produce at cursor c | `c` (no BLOCK_FLAG) |
| Produce in progress at cursor c | `c \| BLOCK_FLAG` |
| Produced; ready for consume | `c + capacity` (after unblock) |
The consumer reads the flag word, sees that BLOCK_FLAG is clear, then reads the slot. The slot index is the cursor modulo capacity, and the generation is the cursor divided by capacity — both folded into the flag word as the expected cursor value. A producer re-claiming the same slot index for the next generation therefore must match a flag word bumped by `+capacity`, a value that no longer equals the previous generation’s expected cursor.
Common DBMS Design
Every DBMS that runs background work threads ships some MPMC queue. The shapes converge.
Vacuum dispatch (PostgreSQL, MySQL InnoDB purge, CUBRID)
The pattern: log-flush daemon publishes “log block N is now durable” entries; vacuum workers pop and process them. The queue’s role is to decouple the latency of log-flush (synchronous) from the latency of vacuum (eventually consistent).
- PostgreSQL uses no MPMC queue; the autovacuum launcher scans `pg_class` periodically and dispatches via shared-memory state. The decoupling is by polling, not by queue.
- InnoDB purge uses a single dispatch thread popping from a ring buffer (`trx_purge_t::view`); multiple worker threads pick from a partitioned set.
- CUBRID vacuum uses two `lockfree::circular_queue<T>` instances: `vacuum_Block_data_buffer` for “log block ready for vacuum” and `vacuum_Finished_job_queue` for “block N finished, can advance vacuum’s safe LSN.”
Page-buffer victim hand-off
When a thread fixes a page that is not buffered, it must evict a victim. The decision can be done inline (the fixer walks the LRU and evicts) or pipelined (a daemon scans the LRU, publishes ready-to-evict victims, fixers pop).
- MySQL InnoDB does inline eviction with mutex-protected `buf_LRU` lists.
- PostgreSQL does inline eviction with the `BufFreelist` spinlock.
- CUBRID uses the pipelined model with multiple `lockfree::circular_queue<int>` queues (`private_lrus_with_victims`, `big_private_lrus_with_victims`, `shared_lrus_with_victims`) — page-buffer utility daemons publish victim LRU partition indexes; fixer threads pop and use them.
CDC / replication forwarding
Section titled “CDC / replication forwarding”CUBRID’s CDC subsystem forwards log-info entries from the log-
manager thread to a CDC publisher thread via
cdc_Gl.loginfo_queue (a lockfree::circular_queue <CDC_LOGINFO_ENTRY *>). The shape matches the producer-
consumer streaming pattern that PostgreSQL implements via
SLRU shared memory and that MySQL implements via
Log_event allocation under the binlog mutex.
Theory ↔ CUBRID mapping
| Concept | CUBRID name |
|---|---|
| Bounded MPMC queue | lockfree::circular_queue<T> |
| Power-of-two capacity | m_capacity = next_pow2 (size) |
| Slot index mask | m_index_mask = m_capacity - 1 |
| Producer cursor | m_produce_cursor (std::atomic<uint64_t>) |
| Consumer cursor | m_consume_cursor (std::atomic<uint64_t>) |
| Per-slot block flag | m_blocked_cursors[i] (std::atomic<uint64_t>) |
| Block bit | BLOCK_FLAG = 1ULL << 63 |
| Empty test | produce_cursor <= consume_cursor |
| Full test | consume_cursor + capacity <= produce_cursor |
| Slot index from cursor | cursor & m_index_mask |
| Produce-in-progress slot signal | block flag set on m_blocked_cursors[i] |
| Generation bump on unblock | flag word ← cursor + capacity |
| Force-produce wait | std::this_thread::yield() |
| Optional execution-history trace | local_history, local_event (under DEBUG_LFCQ) |
CUBRID’s Approach
Shape and storage
```cpp
// lockfree::circular_queue<T> — src/base/lockfree_circular_queue.hpp:46
template <class T>
class circular_queue
{
  public:
    using cursor_type = std::uint64_t;
    using atomic_cursor_type = std::atomic<cursor_type>;
    using data_type = T;
    using atomic_flag_type = atomic_cursor_type;

    circular_queue (std::size_t size);
    ~circular_queue ();

    bool is_empty () const;        // cheap snapshot — may be stale
    bool is_full () const;
    bool is_half_full ();
    std::size_t size ();

    bool consume (T &element);               // non-blocking; false on empty
    bool produce (const T &element);         // non-blocking; false on full
    void force_produce (const T &element);   // yield-loop until success
    std::uint64_t get_consumer_cursor ();

  private:
    static const cursor_type BLOCK_FLAG;    // (cursor_type) 1 << 63

    data_type *m_data;                      // m_capacity slots
    atomic_flag_type *m_blocked_cursors;    // m_capacity flag words
    atomic_cursor_type m_produce_cursor;    // monotone
    atomic_cursor_type m_consume_cursor;    // monotone
    std::size_t m_capacity;                 // power of two
    std::size_t m_index_mask;               // m_capacity - 1
};
```

The constructor rounds the requested size up to the next power of two:
```cpp
circular_queue<T>::circular_queue (std::size_t size)
  : m_produce_cursor (0)
  , m_consume_cursor (0)
{
  m_capacity = next_pow2 (size);
  m_index_mask = m_capacity - 1;
  assert ((m_capacity & m_index_mask) == 0);
  m_data = new data_type[m_capacity];
  init_blocked_cursors ();
}

void init_blocked_cursors ()
{
  m_blocked_cursors = new atomic_flag_type[m_capacity];
  for (cursor_type c = 0; c < m_capacity; c++)
    m_blocked_cursors[c] = c;   // initial expected value = slot index
}
```

The initial flag word for slot i is i — meaning “ready for produce at cursor i, no block flag.” After the first produce-unblock cycle, the flag becomes i + capacity; after the next, i + 2*capacity; and so on. This is how the slot tracks its generation without needing a separate generation counter.
Produce — the protocol
```cpp
// circular_queue<T>::produce — src/base/lockfree_circular_queue.hpp:230
bool produce (const T &element)
{
  while (true)
    {
      pc = load_cursor (m_produce_cursor);
      cc = load_cursor (m_consume_cursor);

      if (test_full_cursors (pc, cc))
        return false;                        // QUEUE FULL

      did_block = block (pc);                // try to claim slot pc

      // bump the produce cursor regardless of block success
      test_and_increment_cursor (m_produce_cursor, pc);

      if (did_block)
        {
          store_data (pc, element);
          unblock (pc);                      // mark slot ready-to-consume
          return true;
        }
      // somebody else won the slot; loop
    }
}
```

The four sub-operations:
- Fullness check. `produce_cursor + 1 > consume_cursor + capacity` ≡ `consume_cursor + capacity <= produce_cursor`. Cheap; uses snapshot loads. False positives on transient races are tolerated — the next loop iteration re-tests.
- Block-flag CAS.

  ```cpp
  bool block (cursor_type cursor)
  {
    cursor_type block_val = flag<cursor_type> (cursor).set (BLOCK_FLAG).get_flags ();
    return m_blocked_cursors[cursor & m_index_mask].compare_exchange_strong (cursor, block_val);
  }
  ```

  The expected value is `cursor` (no block flag, generation matches); the new value is `cursor | BLOCK_FLAG`. The CAS succeeds only if (a) the slot’s generation is exactly `cursor`, and (b) no other producer has set the block bit.
- Cursor bump. `compare_exchange_strong` from `pc` to `pc + 1`. The bump is best-effort: it succeeds for exactly one of the racing producers. Whichever producer wins the bump-CAS pushes the cursor forward; the others ignore the result — their `did_block` decided whether they own the slot, and the cursor will catch up when a later loop iteration re-loads it.
- Store + unblock.

  ```cpp
  void unblock (cursor_type cursor)
  {
    atomic_flag_type &ref = m_blocked_cursors[cursor & m_index_mask];
    flag<cursor_type> v = ref.load ();
    assert (v.is_set (BLOCK_FLAG));
    cursor_type next_gen = v.clear (BLOCK_FLAG).get_flags () + m_capacity;
    ref.store (next_gen);
  }
  ```

  The unblock writes `cursor + capacity` (the next generation’s expected value) into the flag word. This is what tells a consumer that the slot is full, and tells the next-generation producer at the same slot index what to expect.
Consume — the protocol
```cpp
// circular_queue<T>::consume — src/base/lockfree_circular_queue.hpp:318
bool consume (T &element)
{
  while (true)
    {
      cc = load_cursor (m_consume_cursor);
      pc = load_cursor (m_produce_cursor);

      if (test_empty_cursors (pc, cc))
        return false;                  // QUEUE EMPTY
      if (is_blocked (cc))
        return false;                  // first item not yet produced

      element = load_data (cc);        // tentative copy

      if (test_and_increment_cursor (m_consume_cursor, cc))
        break;                         // CONSUME COMPLETE
      // raced — another consumer took the slot; loop
    }
  return true;
}
```

Three sub-operations:
- Empty + block check. Empty if `pc <= cc`. Even if non-empty, if the slot at `cc` is still blocked (producer hasn’t unblocked yet), treat the queue as empty for this call — return `false` so the caller can try later. This is the “preempted producer” mitigation: a consumer does not wait for a stalled producer; it bails and lets the caller re-poll.
- Tentative copy. Read the slot’s payload into the caller’s output argument. If the cursor-CAS fails (someone else consumed first), the caller’s `element` may have been partly overwritten — the caller is documented not to trust `element` on a `false` return.
- Cursor CAS. Strong CAS from `cc` to `cc + 1`. The thread that wins is the consumer of this slot.
The producer-and-consumer protocol is therefore asymmetric: producers fight for slots via the block flag, consumers fight for cursor values via the cursor CAS.
State machine
Section titled “State machine”stateDiagram-v2
[*] --> Initial : alloc, slot i flag = i
Initial --> ProduceInProgress : block flag CAS i -> i\nset BLOCK_FLAG
ProduceInProgress --> Produced : unblock\nflag = i + capacity
Produced --> Consumed : consume cursor CAS\nadvance to i+1
Consumed --> Initial : next generation\nslot reused at cursor i+capacity
note right of ProduceInProgress
Consumer sees BLOCK_FLAG\nand returns "empty"
end note
note right of Consumed
Slot becomes target for cursor i+capacity
end note
force_produce — yield-loop
```cpp
void force_produce (const T &element)
{
  while (!produce (element))
    {
      std::this_thread::yield ();
    }
}
```

The simplest possible blocking-producer wrapper. No exponential backoff, no condition variable — just yield and retry. This is acceptable because the queues that use `force_produce` are either designed to drain rapidly (vacuum’s finished-job queue) or on paths where dropping on full would be tolerable anyway.
Documented limitation — preempted producer
The produce source comment names the worst-case scenario:
```cpp
// NOTE: I have an implementation issue I don't know how to fix
// without a significant overhead. When the queue is very stressed
// (many threads produce/consume very often), the system may preempt
// a thread while it has a slot blocked and may keep it preempted
// for a very long time (long enough to produce and consume an entire
// generation). I'd like to avoid any blocking of any kind.
//
// One possible direction is to separate data storage (and slot index)
// from cursor. When one wants to produce an element, it first reserves
// a slot in storage, adds his data, and then saves slot index/cursor in
// what is now m_blocked_cursors using CAS operation. if this succeeds,
// produced data is immediately available for consumption. any
// preemption would not block the queue (just delay when the produce
// is happening).
```

The comment is the design’s open issue. The current production behavior: a preempted producer holding a slot’s BLOCK_FLAG does not stall other producers — the produce cursor advances and they fill subsequent slots — and consumers keep draining every slot before the stalled one. Only when the consume cursor reaches the blocked slot do `consume` calls start returning “empty,” until the producer wakes and unblocks. A whole-queue stall requires every in-flight producer to be preempted simultaneously, which is rare.
is_empty / is_full / size are best-effort
The public predicates do not synchronize the two cursors — they snapshot each independently and apply the test:

```cpp
bool is_empty () const
{
  return test_empty_cursors (m_produce_cursor, m_consume_cursor);
}

bool is_full () const
{
  return test_full_cursors (m_produce_cursor, m_consume_cursor);
}

std::size_t size ()
{
  cursor_type cc = load_cursor (m_consume_cursor);
  cursor_type pc = load_cursor (m_produce_cursor);
  if (pc <= cc)
    return 0;
  return pc - cc;
}
```

Callers are expected to use these as early-out hints, not for hard correctness. The header comments make this explicit. A `produce` call on a queue for which `is_full ()` just returned `false` may still return `false` if a sibling producer beat it; the caller’s loop must handle that.
Optional execution-history trace — DEBUG_LFCQ
The header carries a parallel produce_debug /
consume_debug / force_produce_debug family conditionally
compiled under DEBUG_LFCQ. Each instruments the protocol with
a thread-local local_history ring buffer, recording every
cursor load, block CAS attempt, store, and unblock as a
local_event. The intent is post-mortem debugging of stuck or
lost-item scenarios on stress benches; in production builds the
trace is dead code.
Consumers
| Owner | Type parameter | Field | File |
|---|---|---|---|
| Vacuum block-data buffer | vacuum_data_entry | vacuum_Block_data_buffer | src/query/vacuum.c:467 |
| Vacuum finished-job queue | VACUUM_LOG_BLOCKID | vacuum_Finished_job_queue | src/query/vacuum.c:475 |
| CDC log-info queue | CDC_LOGINFO_ENTRY * | cdc_Gl.loginfo_queue | src/transaction/log_manager.c:14534 |
| Page-buffer high-priority waiter queue | THREAD_ENTRY * | pgbuf_Pool.waiter_threads_high_priority | src/storage/page_buffer.c:741 |
| Page-buffer low-priority waiter queue | THREAD_ENTRY * | pgbuf_Pool.waiter_threads_low_priority | src/storage/page_buffer.c:742 |
| Page-buffer post-flush queue | PGBUF_BCB * | pgbuf_Pool.flushed_bcbs | src/storage/page_buffer.c:813 |
| Page-buffer private LRU victim hint | int (LRU index) | pgbuf_Pool.private_lrus_with_victims | src/storage/page_buffer.c:815 |
| Page-buffer big-private LRU victim hint | int | pgbuf_Pool.big_private_lrus_with_victims | src/storage/page_buffer.c:816 |
| Page-buffer shared LRU victim hint | int | pgbuf_Pool.shared_lrus_with_victims | src/storage/page_buffer.c:817 |
Most queues sit on `force_produce` for their producer side (blocking accept) and `consume` for their consumer side (non-blocking poll, retried from the daemon’s looper).
Why no transactional reclamation?
The queue is not on the reclamation spine. It does not
allocate per-item dynamic memory (the slots are pre-allocated
in m_data); it does not retire pointer-shaped nodes; it does
not have an ABA hazard on the cursor (the cursor is a 64-bit
counter, never wraps in any realistic uptime). The
tran::system / tran::table / tran::descriptor machinery is
unnecessary here.
The trade-off: the queue’s payload T must be copyable by
value. In practice CUBRID’s queues store either trivially
copyable types (int, THREAD_ENTRY *, PGBUF_BCB *,
VACUUM_LOG_BLOCKID) or small POD structs. Larger payloads
should be heap-allocated and the pointer enqueued — the
CDC_LOGINFO_ENTRY * queue is the example.
Source Walkthrough
Public surface
- `lockfree::circular_queue<T>` — class template, `lockfree_circular_queue.hpp:46`
- `circular_queue::circular_queue (size)` — `:208`
- `circular_queue::~circular_queue` — `:222`
- `circular_queue::is_empty` / `is_full` / `is_half_full` / `size` — `:373` / `:381` / `:387` / `:394`
- `circular_queue::consume` — `:318`
- `circular_queue::produce` — `:230`
- `circular_queue::force_produce` — `:301`
- `circular_queue::get_consumer_cursor` — `:311`
Private helpers
- `next_pow2` — `:409`
- `test_empty_cursors` / `test_full_cursors` — `:419` / `:425`
- `load_cursor` — `:432`
- `test_and_increment_cursor` — `:439` (the cursor-bump CAS; comments note “weak vs strong made no perf difference”)
- `get_cursor_index` — `:447`
- `store_data` / `load_data` — `:454` / `:461`
- `is_blocked` / `block` / `unblock` — `:467` / `:476` / `:485`
- `init_blocked_cursors` — `:498`
Debug variants (under DEBUG_LFCQ)
- `produce_debug` — `:511`
- `force_produce_debug` — `:563`
- `consume_debug` — `:572`
- `local_history` / `local_event` / `local_action` enum — `:118+`
Position hints (as of 2026-05-07)
| Symbol | File | Line |
|---|---|---|
| `lockfree::circular_queue<T>` (class template) | src/base/lockfree_circular_queue.hpp | 46 |
| `BLOCK_FLAG` (`(cursor_type) 1 << 63`) | src/base/lockfree_circular_queue.hpp | 80 / 203 |
| `circular_queue::circular_queue` | src/base/lockfree_circular_queue.hpp | 208 |
| `circular_queue::produce` | src/base/lockfree_circular_queue.hpp | 230 |
| `circular_queue::force_produce` | src/base/lockfree_circular_queue.hpp | 301 |
| `circular_queue::consume` | src/base/lockfree_circular_queue.hpp | 318 |
| `circular_queue::is_empty` | src/base/lockfree_circular_queue.hpp | 373 |
| `circular_queue::is_full` | src/base/lockfree_circular_queue.hpp | 381 |
| `circular_queue::size` | src/base/lockfree_circular_queue.hpp | 394 |
| `circular_queue::next_pow2` | src/base/lockfree_circular_queue.hpp | 409 |
| `circular_queue::block` | src/base/lockfree_circular_queue.hpp | 476 |
| `circular_queue::unblock` | src/base/lockfree_circular_queue.hpp | 485 |
| `circular_queue::init_blocked_cursors` | src/base/lockfree_circular_queue.hpp | 498 |
| `circular_queue::test_and_increment_cursor` | src/base/lockfree_circular_queue.hpp | 439 |
| `vacuum_Block_data_buffer` | src/query/vacuum.c | 467 |
| `vacuum_Finished_job_queue` | src/query/vacuum.c | 475 |
| `cdc_Gl.loginfo_queue` field | src/transaction/log_impl.h | 877 |
| `pgbuf_Pool.waiter_threads_high_priority` | src/storage/page_buffer.c | 741 |
| `pgbuf_Pool.flushed_bcbs` | src/storage/page_buffer.c | 813 |
Source verification (as of 2026-05-07)
Section titled “Source verification (as of 2026-05-07)”Verified facts
- Capacity is rounded up to the next power of two. `next_pow2` at `lockfree_circular_queue.hpp:409`; the constructor asserts `(m_capacity & m_index_mask) == 0`.
- `BLOCK_FLAG` is the high bit of `uint64_t`. Defined as `((cursor_type) 1) << ((sizeof (cursor_type) * CHAR_BIT) - 1)` at `:203`. On a 64-bit cursor, that’s `0x8000_0000_0000_0000`.
- The cursor-bump CAS is `compare_exchange_strong`. The comment at `:443` notes “I tested, no performance difference from using one or the other,” indicating the choice is defensive rather than essential.
- Per-slot block flags are initialized to the slot index. `init_blocked_cursors` at `:498-507`. The first generation produces against `flag = i`, sets `BLOCK_FLAG`, then unblocks with `flag = i + capacity`.
- `consume` does not wait on a blocked slot. When `is_blocked (cc)` is true, `consume` returns `false` immediately at `:349`. This is the documented “first item not yet produced” early-out.
- `force_produce` is a yield-loop, no backoff. `:301-307`.
- The queue does not use the `tran::*` reclamation framework. No `tran::reclaimable_node`, no descriptor bracketing, no retire ring. Slots are pre-allocated in `m_data`.
- 9 active in-engine consumers. Listed in the §“Consumers” table; verified via `grep -rn 'lockfree::circular_queue'`.
Open questions
- Has the preempted-producer scenario been seen in production? The source comment names it as a known weakness. Investigation: search the issue tracker for stuck-vacuum or stuck-page-buffer reports under stress; if none, the back-of-queue reuse pattern empirically dominates the worst case.
- Should `force_produce` add backoff? `std::this_thread::yield` on a tight full queue is surprisingly cheap on modern Linux (`sched_yield` is nearly a no-op if no other thread is runnable on the CPU), but exponential backoff would still be cheaper under sustained pressure. No measurement campaign exists.
- Why is the consumer’s tentative-copy pattern preferred over a swap-then-CAS? The current consumer copies the payload before the cursor CAS; if the CAS fails, the tentative copy is wasted. A swap pattern (CAS first, copy second) would require sentinels to mark “consumed but slot still occupied,” which has its own complexity. The current design’s documented contract — the caller must not trust `element` on a `false` return — is a deliberate simplification.
- `DEBUG_LFCQ` activation pattern. The trace is per-thread (`local_history`), and the comment says the caller should keep its history “in thread context to be inspected on demand.” Has anyone exercised the trace recently? The dual produce/consume code paths are a maintenance hazard if the production paths drift from the debug paths without a sync.
- 64-bit cursor wrap. At 1 billion enqueues per second (well beyond any current load), a `uint64_t` cursor wraps in ~580 years — practically infinite. The cursor-arithmetic tests (`pc <= cc`, `cc + capacity <= pc`) would misbehave at wrap. No mitigation; correctly considered out of scope.
Beyond CUBRID — Comparative Designs & Research Frontiers
- Vyukov MPMC queue. The cleanest production MPMC ring today: per-slot sequence number (no separate block flag), CAS-advanced position cursors, no producer-stall hazard. A direct comparison on CUBRID’s vacuum and page-buffer paths would tell whether the conversion buys throughput in proportion to the migration cost.
- LMAX Disruptor. Adds batched claim (a producer can claim N consecutive slots in one CAS) and pluggable waiting strategies (`BlockingWait`, `BusySpin`, `Yielding`, `LiteBlocking`). CUBRID’s `force_produce` is hard-coded to yielding; making it pluggable would let DWB use blocking and page-buffer victim hand-off use busy-spin.
- Folly’s `ProducerConsumerQueue` (SPSC, lock-free) and `MPMCQueue`. Production-grade with extensive measurement history. CUBRID’s queue is similar in spirit; the API differences are minor (`tryEnqueue`, `tryDequeue` with `&&` perfect-forwarded payloads).
- Eliminate the producer-stall hazard. The redesign proposed in the source comment — separate slot reservation from cursor publication — is essentially Vyukov’s algorithm. Adopting it would keep a preempted producer from stalling consumers at its slot.
- `std::atomic<T>` with explicit memory ordering. Today every cursor and flag access is `seq_cst`. The producer’s store-payload-then-unblock is naturally a release/acquire pair (release on the unblock store, acquire on the consumer’s flag load); tightening the orderings might reduce barrier overhead on weak-memory architectures.
Sources
- Source files (in the CUBRID source tree at
`/data/hgryoo/references/cubrid/`):
  - `src/base/lockfree_circular_queue.hpp` (the entire implementation; no `.cpp` companion)
  - `src/base/base_flag.hpp` (the `flag<T>` helper used to encode the block bit)
- Consumer source files:
  - `src/query/vacuum.c`
  - `src/transaction/log_manager.c`, `src/transaction/log_impl.h`
  - `src/storage/page_buffer.c`
- Textbook chapters cited:
  - Herlihy & Shavit, The Art of Multiprocessor Programming, ch. 13 §“Concurrent Queues” — bounded vs. unbounded, MS queue, fetch-and-add ring.
- Foundational papers:
  - Michael, M. M., Scott, M. L. “Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms.” PODC 1996.
  - Lamport, L. “Specifying Concurrent Program Modules.” ACM TOPLAS 5(2), 1983 — origin of the SPSC ring buffer.
  - Vyukov, D. “Bounded MPMC queue.” https://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
- Sibling docs in this repo:
  - `cubrid-lockfree-overview.md` — umbrella.
  - `cubrid-vacuum.md` (if curated) — primary consumer doc for `vacuum_Block_data_buffer` and `vacuum_Finished_job_queue`.
  - `cubrid-page-buffer-manager.md` — consumer doc for the waiter and victim queues.