CUBRID AREA Allocator — Slab-Style Pool Allocator With Free Lists for Same-Size Objects
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
A general-purpose malloc is the wrong tool when a subsystem
allocates and frees thousands of identical-size records every
second. Two costs dominate: per-chunk bookkeeping overhead (a
24-byte DB_VALUE paired with 16-32 bytes of allocator metadata
doubles the working-set footprint) and fragmentation (freed
cells land back in the general allocator’s size-class bins, where
they compete with every other class for cache lines and TLB
entries; the cell just freed is unlikely to be the cell reused).
The textbook answer is a slab allocator, introduced by Jeff Bonwick in The Slab Allocator: An Object-Caching Kernel Memory Allocator (USENIX Summer 1994) and refined as SLAB / SLUB / SLOB in the Linux kernel. The shape: pre-cut a contiguous arena (“slab” or “block”) into fixed cells of one specific size, track per-cell free / in-use as one bit each, chain slabs together when one fills, and avoid touching the global allocator on the hot path.
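The shape in miniature, as a single-threaded toy sketch (illustrative only, not any engine's code): one malloc'd arena pre-cut into 64 fixed cells, occupancy tracked as one bit per cell.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy slab: 64 fixed-size cells, one occupancy bit each. */
typedef struct
{
  uint64_t used;		/* bit i set => cell i in use */
  size_t cell_size;
  char *cells;			/* 64 * cell_size bytes, one malloc */
} toy_slab;

static toy_slab *
toy_slab_create (size_t cell_size)
{
  toy_slab *s = malloc (sizeof (toy_slab));
  s->used = 0;
  s->cell_size = cell_size;
  s->cells = malloc (64 * cell_size);
  return s;
}

static void *
toy_slab_alloc (toy_slab * s)
{
  for (int i = 0; i < 64; i++)
    {
      if (!(s->used & (UINT64_C (1) << i)))
	{
	  s->used |= UINT64_C (1) << i;
	  return s->cells + (size_t) i * s->cell_size;
	}
    }
  return NULL;			/* slab full; a real allocator chains another slab */
}

static void
toy_slab_free (toy_slab * s, void *p)
{
  size_t i = ((char *) p - s->cells) / s->cell_size;
  s->used &= ~(UINT64_C (1) << i);
}
```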
Database Internals (Petrov), Ch. 4 “Implementing B-Trees”, and Database System Concepts (Silberschatz, Korth, Sudarshan, 6th ed.) Ch. 13 both cite slab/pool allocation as the default discipline for “small, structurally identical, high-churn records”: index nodes during bulk-load, log records during commit, parse-tree nodes during compilation, expression atoms during evaluation.
Three implementation choices the model leaves open frame the rest of this document:
- One slab vs many slabs? Bonwick’s kernel slab maintains a per-CPU magazine to scale across cores; embedded and database engines often settle for one chain per object class and let lock-freedom inside the chain absorb contention. CUBRID picks the second.
- What backs the per-cell free bit? An embedded freelist threaded through the cells is cheaper (one pointer load) but pollutes freed cells and has the ABA hazard under concurrency; a side bitmap keeps cells clean and admits lock-free CAS. CUBRID picks the bitmap (LF_BITMAP).
- How does the allocator find a non-full slab? A linear chain scan is O(n); per-thread caches shrink it to O(1) at the price of cross-thread imbalance; a single “hint” pointer is the cheapest option. CUBRID picks the hint (area->hint_block).
Common DBMS Design
Every mature engine wraps hot-path same-size allocations behind some flavour of pool. These patterns are not in Bonwick’s original paper; they are the engineering vocabulary that sits between the kernel slab and the SQL engine.
Per-class pool, named by what it holds
Pools are not generic. PostgreSQL’s MemoryContext (AllocSetContext, SlabContext, GenerationContext), MySQL’s MEM_ROOT (THD-owned, per-statement), and CUBRID’s AREA all carry a name field plus a global pointer in the owning subsystem (Value_area, tp_Domain_area, Set_Ref_Area, …). Naming makes leaks visible — a runtime dump shows which pool bloated.
Lifetime aligned to a phase, not to a transaction
A pool’s flush-everything primitive is its real value. PostgreSQL’s MemoryContextReset at end-of-statement, MySQL’s free_root(MEM_ROOT*, MYF_FLAGS), and CUBRID’s area_flush all walk every block and free in one pass. The discipline is “allocate freely, reset at a phase boundary”; the phase is defined by the subsystem.
Bitmap or embedded-freelist for cell tracking
Two implementations co-exist: an embedded freelist (first 8
bytes of a freed cell carry the next-freed pointer; PostgreSQL
AllocSet, MySQL MEM_ROOT, most mallocs), or a side
bitmap (an external word array; cleaner cells, one extra memory
hop). CUBRID picks the side bitmap because the bitmap operations
can be made lock-free with CAS over 32-bit words, whereas the
embedded freelist has the classic ABA hazard.
Lock-free fast path, mutex-protected slow path
Almost every engine makes the “find a free cell” fast path
lock-free and the “allocate a new slab” slow path coarse-locked.
CUBRID separates them via area->area_mutex (held only when
adding a fresh AREA_BLOCK) and the per-block lock-free
LF_BITMAP::get_entry.
Variable-size cells via “size class” pools
CUBRID does not multiplex sizes inside one AREA — every AREA
holds one cell size fixed at area_create time, and each
subsystem creates as many AREAs as it needs distinct sizes. Same
choice Linux’s slab caches make.
Theory ↔ CUBRID mapping
| Theoretical concept | CUBRID name |
|---|---|
| Per-class pool | AREA struct (area_alloc.h); one global per object class (Value_area, tp_Domain_area, …) |
| Slab / chunk of fixed cells | AREA_BLOCK (area_alloc.h); block_size = element_size * alloc_count |
| Cell-occupancy tracking | AREA_BLOCK::bitmap (LF_BITMAP from lockfree_bitmap.hpp) |
| Slab chain | AREA::blockset_list → AREA_BLOCKSET_LIST::items[] → AREA_BLOCK * |
| Slow-path lock | AREA::area_mutex (held only inside area_alloc’s “add new block” branch and area_insert_block) |
| Fast-path hint | AREA::hint_block (single AREA_BLOCK *) |
| Bulk reset / destroy | area_flush, area_destroy |
| Global registry of all pools | static AREA *area_List + pthread_mutex_t area_List_lock |
| Debug double-free / use-after-free guard | AREA_PREFIX_INITED (0) / AREA_PREFIX_FREED (0x01010101) tag in the cell prefix (!NDEBUG) |
| Cells per block (config) | AREA::alloc_count (rounded up by LF_BITMAP_COUNT_ALIGN to a multiple of 32) |
| Blocks per blockset | AREA_BLOCKSET_SIZE = 256 (compile-time constant) |
CUBRID’s Approach
The AREA module is one .h and one .c file totalling about
1 000 lines. It exposes seven core public functions
(area_init, area_final, area_create, area_destroy,
area_alloc, area_free, area_flush) plus two diagnostics
(area_validate, area_dump), and three structures (AREA,
AREA_BLOCK, AREA_BLOCKSET_LIST). The distinguishing choices are:
(1) one global linked list of pools (area_List) so lifecycle
is uniform across the engine; (2) the chain of slabs is itself
chunked — blocksets of 256 block pointers — so the lookup data
structure stays cache-friendly even when the chain grows to
hundreds of blocks; (3) each block fronts a LF_BITMAP for
lock-free cell allocation; (4) a single hint_block pointer
short-cuts the common case to one bitmap operation; (5) the
mutex is held only when growing the chain by one block, never on
the alloc/free hot path; (6) debug builds embed a
4-or-8-byte prefix to catch double-free.
Overall layout
```mermaid
flowchart LR
    subgraph G["global registry"]
        AL["area_List<br/>(static AREA *)"]
        ALK["area_List_lock<br/>(pthread_mutex_t)"]
    end
    subgraph A["AREA (one per object class)"]
        AH["name, element_size,<br/>alloc_count, block_size"]
        AHB["hint_block →"]
        AM["area_mutex<br/>(grow only)"]
        ABL["blockset_list →"]
    end
    subgraph BS["AREA_BLOCKSET_LIST (chain, 256 ptrs each)"]
        BS1["BLOCKSET #0<br/>items[0..255]<br/>used_count"]
        BS2["BLOCKSET #1<br/>items[0..255]<br/>used_count"]
        BSn["…"]
    end
    subgraph B["AREA_BLOCK (one slab)"]
        BBM["LF_BITMAP<br/>(per-cell free/used)"]
        BBD["data[]<br/>(alloc_count cells of element_size)"]
    end
    AL --> A
    AL -. next .-> A
    A --> ABL
    ABL --> BS1
    BS1 -. next .-> BS2
    BS2 -. next .-> BSn
    BS1 --> B
    BS2 --> B
    AHB --> B
```
The figure encodes three boundaries. (Global / per-area) The
program-wide area_List chains pools; lookup is by code path
(each subsystem holds its own AREA * global), not by name.
(Per-area / per-blockset) The slab chain is two-level — a
linked list of blockset arrays — so that scanning the chain for a
free cell touches at most ⌈N / 256⌉ cache lines of per-set
metadata and amortises the per-block hop cost. (Per-block /
per-cell) Inside one block, the bitmap is the only structure
threads contend on; cells are addressed by index.
Buffer Control Block — AREA
```c
// AREA — src/base/area_alloc.h
struct area
{
  AREA *next;                        /* link in the global area_List */
  char *name;                        /* diagnostic only (area_dump) */
  size_t element_size;               /* cell size, including !NDEBUG prefix */
  size_t alloc_count;                /* cells per block (32-aligned) */
  size_t block_size;                 /* element_size * alloc_count */

  AREA_BLOCKSET_LIST *blockset_list; /* head of slab-pointer chain */
  AREA_BLOCK *hint_block;            /* hint for the fast path */
  pthread_mutex_t area_mutex;        /* serialises adding a new block */

  /* for dumping */
  size_t n_allocs;
  size_t n_frees;

  void (*failure_function) (void);   /* OOM hook (ws_abort_transaction in client) */
};
```

The failure_function field lets each owning subsystem hook OOM
recovery: the client links it to ws_abort_transaction so a
failed malloc in area_alloc_block aborts the current
transaction instead of crashing the process; the server leaves it
NULL and lets the error code propagate.
The element_size value passed to area_create is silently
adjusted in two ways: in debug builds an AREA_PREFIX_SIZE
(sizeof(double) = 8 bytes) header is added to every cell to
detect double-free; in all builds the size is then rounded up to
the next 8-byte multiple. A caller asking for 24-byte cells
therefore keeps 24-byte cells in release (24 is already a multiple
of 8) but gets 32-byte cells in debug (24 + 8 = 32). The cell
payload starts at data + i*element_size + AREA_PREFIX_SIZE in
debug, data + i*element_size in release.
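The two adjustments as plain arithmetic (a hypothetical helper mirroring the description above; not a CUBRID function):

```c
#include <stdio.h>
#include <stddef.h>

#define AREA_PREFIX_SIZE sizeof (double)	/* 8 bytes, debug builds only */

static size_t
adjusted_element_size (size_t requested, int debug_build)
{
  size_t size = requested;
  if (debug_build)
    {
      size += AREA_PREFIX_SIZE;	/* room for the double-free tag */
    }
  if (size % 8)
    {
      size += 8 - (size % 8);	/* round up to an 8-byte multiple */
    }
  return size;
}

int
main (void)
{
  printf ("release: %zu\n", adjusted_element_size (24, 0));	/* 24 */
  printf ("debug:   %zu\n", adjusted_element_size (24, 1));	/* 32 */
  return 0;
}
```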
Block — AREA_BLOCK
```c
// AREA_BLOCK — src/base/area_alloc.h
struct area_block
{
  LF_BITMAP bitmap;
  char *data;
};
```

Two pointers’ worth of metadata, then block_size bytes of cell
storage. The block is allocated as one contiguous malloc of
block_size + sizeof(AREA_BLOCK) and the data pointer is set to
the trailing cell region; freeing a block is a single
free_and_init after bitmap.destroy() releases the bitmap’s
own buffer.

```c
// area_alloc_block — src/base/area_alloc.c
static AREA_BLOCK *
area_alloc_block (AREA * area)
{
  AREA_BLOCK *new_block;
  size_t total = area->block_size + sizeof (AREA_BLOCK);

  new_block = (AREA_BLOCK *) malloc (total);
  if (new_block == NULL)
    {
      er_set (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_OUT_OF_VIRTUAL_MEMORY, 1, total);
      if (area->failure_function != NULL)
	{
	  (*(area->failure_function)) ();
	}
      return NULL;
    }

  new_block->bitmap.init (LF_BITMAP_LIST_OF_CHUNKS, (int) area->alloc_count,
			  LF_AREA_BITMAP_USAGE_RATIO);
  new_block->data = ((char *) new_block) + sizeof (AREA_BLOCK);
  return new_block;
}
```

Two non-obvious facts. First, the bitmap is initialised in
LF_BITMAP_LIST_OF_CHUNKS style — internally that gives every
fixed-size chunk of 32 cells its own start_idx, so concurrent
get_entry calls from different threads typically land on
different chunks (round-robin) and don’t collide on the same
32-bit CAS word. Second, the usage-ratio threshold differs
between server and client (LF_AREA_BITMAP_USAGE_RATIO):
LF_BITMAP_95PERCENTILE_USAGE_RATIO for SERVER_MODE,
LF_BITMAP_FULL_USAGE_RATIO otherwise. The 95th-percentile
threshold means a server-mode block is considered “full” once 95%
of its cells are taken (with 1024 cells per block, roughly the
last 51 cells stay unclaimed); that 5% buffer reduces CAS
contention on the last few cells, at the cost of slightly higher
memory usage. The client (one thread) takes the cheaper
full-utilisation choice.
Blockset — chunked slab chain
```c
// AREA_BLOCKSET_LIST — src/base/area_alloc.h
#define AREA_BLOCKSET_SIZE 256

struct area_blockset_list
{
  AREA_BLOCKSET_LIST *next;
  AREA_BLOCK *items[AREA_BLOCKSET_SIZE];
  int used_count;
};
```

A blockset holds at most 256 block pointers, sorted by the
AREA_BLOCK * value (which is also the lowest address of that
block’s cell storage, modulo a constant offset). The
two-level structure exists because:
- A single linked list of blocks would force area_find_block (used by area_validate and area_free) into an O(N) scan of every block to locate a cell pointer’s owner.
- An array of blocks would have to be reallocated when it grew — but a concurrent reader holding a stale pointer to the array would dereference freed memory.
- The blockset chain has both properties: each blockset is a fixed array (binary-searchable for area_find_block), and growth happens by appending a new blockset (no in-place reallocation, so readers are always safe).
The trade-off is that a blockset uses ~2 KB even when nearly empty. With 256 blocks in a typical AREA (32–1024 cells per block at 8–256 bytes per cell, i.e., anywhere from a few hundred bytes up to 256 KB of cell storage per block), the blockset’s ~2 KB of pointer array is well under 1 % of the working set.
Allocation flow — area_alloc
```mermaid
flowchart LR
    S["area_alloc(area)"] --> H["hint_block.bitmap.get_entry()"]
    H --> H1{"entry != -1?"}
    H1 -- "yes (fast path)" --> RET["pointer = data + idx*element_size<br/>(+ AREA_PREFIX_SIZE in debug)"]
    H1 -- "no" --> SC["scan blockset chain<br/>each block: bitmap.get_entry()"]
    SC --> SC1{"found?"}
    SC1 -- "yes" --> CAS["opportunistic<br/>ATOMIC_CAS_ADDR hint = block"]
    CAS --> RET
    SC1 -- "no" --> LK["lock area_mutex"]
    LK --> NB["area_alloc_block (malloc)"]
    NB --> INS["area_insert_block (sorted append)"]
    INS --> SETH["hint_block = new_block"]
    SETH --> UL["unlock"]
    UL --> RET
```
Three steps. Step 1 (lock-free) — try the hint block;
LF_BITMAP::get_entry is one CAS on the chunk word, returning a
cell index or -1 if full. Step 2 (lock-free) — walk every
blockset using VOLATILE_ACCESS reads of blockset->next and
->used_count; the ATOMIC_INC_32 of used_count in
area_insert_block is the release barrier for the preceding
items[] store, so concurrent appends are visible without further
synchronisation. Step 3 (mutex-held) — only one thread at a
time can malloc a fresh block; losers observe it on retry.
A subtle hint update lives in step 2: when an allocator finds a
free cell in some block other than the hint and the hint is
currently full, it tries ATOMIC_CAS_ADDR (&area->hint_block, hint_block, block). The CAS may fail (another thread already
upgraded it) but no thread loses progress. This converges the hint
to a non-full block under steady-state churn without taking the
mutex.
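The upgrade CAS, reduced to C11 atomics (an illustration only; ATOMIC_CAS_ADDR is the engine's own macro, and AREA_BLOCK is left opaque here):

```c
#include <stdatomic.h>

typedef struct area_block AREA_BLOCK;	/* opaque for this sketch */

/* expected = the full hint just read; better = a block known to have
 * free cells. A failed CAS means another thread already moved the
 * hint, so no retry is needed: the hint only has to converge. */
static void
try_upgrade_hint (_Atomic (AREA_BLOCK *) *hint,
		  AREA_BLOCK *expected, AREA_BLOCK *better)
{
  atomic_compare_exchange_strong (hint, &expected, better);
}
```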
```c
// area_alloc — src/base/area_alloc.c (condensed)
void *
area_alloc (AREA * area)
{
  AREA_BLOCKSET_LIST *blockset;
  AREA_BLOCK *block, *hint_block;
  int used_count, i, entry_idx;
  char *entry_ptr;
  int rv;

  /* Step 1: hint block fast path. */
  hint_block = VOLATILE_ACCESS (area->hint_block, AREA_BLOCK *);
  entry_idx = hint_block->bitmap.get_entry ();
  if (entry_idx != -1)
    {
      block = hint_block;
      goto found;
    }

  /* Step 2: scan the blockset chain. */
  for (blockset = area->blockset_list; blockset != NULL;
       blockset = VOLATILE_ACCESS (blockset->next, AREA_BLOCKSET_LIST *))
    {
      used_count = VOLATILE_ACCESS (blockset->used_count, int);
      for (i = 0; i < used_count; i++)
	{
	  block = VOLATILE_ACCESS (blockset->items[i], AREA_BLOCK *);
	  entry_idx = block->bitmap.get_entry ();
	  if (entry_idx != -1)
	    {
	      /* opportunistic hint upgrade */
	      hint_block = VOLATILE_ACCESS (area->hint_block, AREA_BLOCK *);
	      if (LF_BITMAP_IS_FULL (&hint_block->bitmap)
		  && !LF_BITMAP_IS_FULL (&block->bitmap))
		{
		  ATOMIC_CAS_ADDR (&area->hint_block, hint_block, block);
		}
	      goto found;
	    }
	}
    }

  /* Step 3: grow the chain by one block. */
  rv = pthread_mutex_lock (&area->area_mutex);

  if (area->hint_block != hint_block)	/* re-check after acquiring lock */
    {
      block = area->hint_block;
      entry_idx = block->bitmap.get_entry ();
      if (entry_idx != -1)
	{
	  pthread_mutex_unlock (&area->area_mutex);
	  goto found;
	}
    }

  block = area_alloc_block (area);
  /* ... condensed: error handling, get_entry on the fresh block,
   * area_insert_block ... */
  area->hint_block = block;
  pthread_mutex_unlock (&area->area_mutex);

found:
  entry_ptr = block->data + area->element_size * entry_idx;
#if !defined (NDEBUG)
  *(int *) entry_ptr = AREA_PREFIX_INITED;
  entry_ptr += AREA_PREFIX_SIZE;
#endif
  return ((void *) entry_ptr);
}
```

Note the n_allocs / n_frees counters are only updated outside
SERVER_MODE (the server avoids the atomic increment on every
allocation; the comment notes it is intentional). The counters
exist purely for area_dump diagnostics on the client side.
Insertion — keeping the blockset sorted by address
```c
// area_insert_block — src/base/area_alloc.c (condensed)
static int
area_insert_block (AREA * area, AREA_BLOCK * new_block)
{
  AREA_BLOCKSET_LIST **last_blockset_p = &area->blockset_list;
  AREA_BLOCKSET_LIST *blockset, *new_blockset;
  int used_count;

  for (blockset = area->blockset_list; blockset != NULL; blockset = blockset->next)
    {
      last_blockset_p = &blockset->next;
      used_count = blockset->used_count;
      if (used_count == AREA_BLOCKSET_SIZE)
	{
	  continue;		/* this set is full */
	}

      /* Append iff new_block has a higher address than the last
       * block in this blockset — keeps items[] address-sorted. */
      if (blockset->items[used_count - 1] < new_block)
	{
	  blockset->items[used_count] = new_block;
	  ATOMIC_INC_32 (&blockset->used_count, 1);	/* release barrier */
	  return NO_ERROR;
	}
    }

  /* No room — allocate a fresh blockset and append. */
  new_blockset = area_alloc_blockset (area);
  /* ... condensed ... */
  new_blockset->items[0] = new_block;
  ATOMIC_INC_32 (&new_blockset->used_count, 1);
  *last_blockset_p = new_blockset;
  return NO_ERROR;
}
```

Three properties matter. (a) Insert appends to the end of
the first non-full blockset whose last block’s address is below
the new block’s. Because malloc typically returns higher
addresses for later allocations, the appended block is
address-sorted within the blockset most of the time. (b) When
no blockset accepts the block (e.g., a low-address block came
back from malloc after a high-address one), a fresh blockset is
allocated and chained at the tail — items in the new blockset are
not necessarily greater than items in earlier blocksets. (c)
The ATOMIC_INC_32 of used_count is the release barrier; the
preceding items[used_count] = new_block must complete first.
Concurrent readers (area_alloc step 2, area_find_block)
re-read both fields and observe a consistent (item, used_count)
pair.
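The publication idiom in (c), reduced to C11 atomics: a sketch of the ordering argument, not the engine's VOLATILE_ACCESS / ATOMIC_INC_32 macros.

```c
#include <stdatomic.h>

/* Writer (runs under area_mutex): store the new slot, then bump the
 * count with release semantics. Any reader that acquire-loads
 * used_count as n + 1 is guaranteed to see the items[n] store. */
static void
publish_block (void *items[], _Atomic int *used_count, void *new_block)
{
  int n = atomic_load_explicit (used_count, memory_order_relaxed);
  items[n] = new_block;		/* plain store, ordered by ... */
  atomic_fetch_add_explicit (used_count, 1,	/* ... this release increment */
			     memory_order_release);
}

/* Reader (lock-free scan): acquire-load the count; items[0..count-1]
 * are then safe to read. */
static int
load_published_count (_Atomic int *used_count)
{
  return atomic_load_explicit (used_count, memory_order_acquire);
}
```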
The “no shifting / re-sort” comment in the source admits the
tradeoff: blocksets are only mostly sorted. area_find_block
deals with this by trying every blockset (not just the one whose
range contains the pointer) — see below.
Lookup — area_find_block for free / validate
area_free and area_validate need to find the owning block of
a cell pointer. The structure makes this O(log N) per blockset
(binary search) and O(B) over the chain (B = blockset count).
```c
// area_find_block — src/base/area_alloc.c (condensed)
static AREA_BLOCK *
area_find_block (AREA * area, const void *ptr)
{
  AREA_BLOCKSET_LIST *blockset;
  AREA_BLOCK *first_block, *last_block, *block;
  int middle, left, right, pos, used_count;

  for (blockset = area->blockset_list; blockset != NULL;
       blockset = VOLATILE_ACCESS (blockset->next, AREA_BLOCKSET_LIST *))
    {
      used_count = VOLATILE_ACCESS (blockset->used_count, int);
      first_block = blockset->items[0];
      if ((char *) ptr < first_block->data)
	{
	  continue;		/* below this set's range */
	}

      last_block = VOLATILE_ACCESS (blockset->items[used_count - 1], AREA_BLOCK *);
      if (last_block->data + area->block_size <= (char *) ptr)
	{
	  continue;		/* above this set's range */
	}

      /* binary search inside the blockset's address range */
      left = 0;
      right = used_count - 1;
      while (left <= right)
	{
	  middle = (left + right) / 2;
	  block = VOLATILE_ACCESS (blockset->items[middle], AREA_BLOCK *);
	  if (block->data > (char *) ptr)
	    {
	      right = middle - 1;
	    }
	  else
	    {
	      left = middle + 1;
	    }
	}
      pos = right;

      block = VOLATILE_ACCESS (blockset->items[pos], AREA_BLOCK *);
      if ((block->data <= (char *) ptr) && ((char *) ptr < block->data + area->block_size))
	{
	  return block;
	}
    }
  return NULL;
}
```

The first-block / last-block check is a cheap range filter: if
the pointer is below the first block’s data or above the last
block’s tail, the entire blockset is skipped. If the pointer
falls within the bracket, binary search inside the blockset
finds the block whose data is the largest pointer ≤ ptr,
and a final containment check confirms it. A pointer that
genuinely belongs to the area always passes these tests —
the “mostly sorted” property is good enough because a
fresh-from-malloc block whose address is below the existing
range never gets inserted into an existing blockset (it lands in
a fresh one), so within a single blockset the items remain
address-sorted by construction.
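A worked miniature of the predecessor search, with hypothetical offsets standing in for block addresses:

```c
#include <assert.h>
#include <stddef.h>

/* Predecessor search: largest starts[i] <= ptr, then containment. */
static int
find_owner (const size_t *starts, int n, size_t block_size, size_t ptr)
{
  int left = 0, right = n - 1;
  while (left <= right)
    {
      int middle = (left + right) / 2;
      if (starts[middle] > ptr)
	right = middle - 1;
      else
	left = middle + 1;
    }
  /* right is now the largest index with starts[right] <= ptr */
  if (right >= 0 && ptr < starts[right] + block_size)
    return right;
  return -1;
}

int
main (void)
{
  size_t starts[] = { 0x1000, 0x3000, 0x5000 };	/* three blocks' data */
  assert (find_owner (starts, 3, 0x1000, 0x3200) == 1);	/* inside block #1 */
  assert (find_owner (starts, 3, 0x1000, 0x2200) == -1);	/* in a gap: rejected */
  return 0;
}
```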
Deallocation — area_free
```c
// area_free — src/base/area_alloc.c (condensed)
int
area_free (AREA * area, void *ptr)
{
  AREA_BLOCK *block, *hint_block;
  char *entry_ptr;
  int entry_idx, offset;

  if (ptr == NULL)
    {
      /* error */
    }

#if !defined (NDEBUG)
  entry_ptr = ((char *) ptr) - AREA_PREFIX_SIZE;	/* strip debug prefix */
#else
  entry_ptr = (char *) ptr;
#endif

  block = area_find_block (area, (const void *) entry_ptr);
  if (block == NULL)
    {
      /* ER_AREA_ILLEGAL_POINTER */
    }

  offset = (int) (entry_ptr - block->data);
  if (offset < 0 || offset % area->element_size != 0)
    {
      /* ER_AREA_ILLEGAL_POINTER — pointer not at cell boundary */
    }

#if !defined (NDEBUG)
  if (*(int *) entry_ptr != AREA_PREFIX_INITED)
    {
      /* ER_AREA_FREE_TWICE */
    }
  *(int *) entry_ptr = AREA_PREFIX_FREED;
#endif

  entry_idx = offset / (int) area->element_size;
  block->bitmap.free_entry (entry_idx);

  /* opportunistic hint downgrade — pull hint to a non-full block */
  hint_block = VOLATILE_ACCESS (area->hint_block, AREA_BLOCK *);
  if (LF_BITMAP_IS_FULL (&hint_block->bitmap) && !LF_BITMAP_IS_FULL (&block->bitmap))
    {
      ATOMIC_CAS_ADDR (&area->hint_block, hint_block, block);
    }

  return NO_ERROR;
}
```

The validation chain is a four-step funnel: (1) non-NULL pointer,
(2) area_find_block identifies the owning block, (3) the offset is
a non-negative multiple of element_size (the pointer is at a cell
boundary, not into the middle of a cell), (4) in debug, the
prefix tag matches AREA_PREFIX_INITED. A pointer that fails any
step is rejected without freeing — area_free is hardened against
caller bugs in a way free is not.
The same hint-CAS that area_alloc uses on the upgrade side
runs here on the downgrade side: when a free turns a previously
full block into one with at least one free cell, the hint is
opportunistically pointed at it. The next allocation will then
hit the just-vacated block on the fast path, which keeps the
working set tight.
Lifecycle — init, create, flush, destroy, final
```mermaid
sequenceDiagram
    participant Boot as boot_sr / boot_cl
    participant Init as area_init
    participant Owner as subsystem (e.g. object_primitive)
    participant Create as area_create
    participant Use as runtime
    participant Final as area_final
    Boot->>Init: area_init() (zero area_List)
    Boot->>Owner: pr_area_init() / set_area_init() / ws_area_init()
    Owner->>Create: area_create("Value containers", sizeof(DB_VALUE), N)
    Create-->>Owner: AREA *
    Note over Owner: hold the AREA * in a global<br/>(Value_area, tp_Domain_area, …)
    Use->>Use: many area_alloc / area_free
    Boot->>Final: area_final() at shutdown
    Final->>Final: walk area_List → area_flush each → free
```
area_init is called from boot_server_main /
boot_client_main and only zeroes the global area_List. After
that, each subsystem’s own *_area_init calls area_create with
its size class; area_create mallocs the AREA, pre-allocates
the first blockset and the first block (so the allocator never
has an empty chain), grabs area_List_lock, and links the new
AREA at the head of the global list.
```c
// area_create — src/base/area_alloc.c (condensed)
AREA *
area_create (const char *name, size_t element_size, size_t alloc_count)
{
  AREA *area = (AREA *) malloc (sizeof (AREA));
  /* ... condensed: name strdup, NULL checks ... */

#if !defined (NDEBUG)
  element_size += AREA_PREFIX_SIZE;
#endif
  /* round up to 8-byte multiple */
  if (element_size % 8)
    {
      element_size += 8 - (element_size % 8);
    }

  area->element_size = element_size;
  area->alloc_count = LF_BITMAP_COUNT_ALIGN (alloc_count);	/* 32-aligned */
  area->block_size = area->element_size * area->alloc_count;
  area->n_allocs = area->n_frees = 0;
#if defined (SERVER_MODE)
  area->failure_function = NULL;
#else
  area->failure_function = ws_abort_transaction;
#endif

  area->blockset_list = area_alloc_blockset (area);
  area->hint_block = area_alloc_block (area);
  area->blockset_list->items[0] = area->hint_block;
  area->blockset_list->used_count++;

  pthread_mutex_init (&area->area_mutex, NULL);

  pthread_mutex_lock (&area_List_lock);
  area->next = area_List;
  area_List = area;
  pthread_mutex_unlock (&area_List_lock);
  return area;
}
```

Two subtleties. First, LF_BITMAP_COUNT_ALIGN rounds the requested
alloc_count up to a multiple of 32 (one bitmap word) — a request
for 100 cells per block actually allocates 128. Second,
failure_function is wired client-side to ws_abort_transaction
so that an allocation failure inside, say, pr_make_value causes
the current transaction to be rolled back rather than the call
returning NULL up an ill-prepared call stack.
area_destroy is the symmetric operation: unlink from area_List
under area_List_lock, then area_flush to free every block, then
free the AREA itself. area_flush walks blockset_list, calls
bitmap.destroy and free_and_init on every block, and frees each
blockset; it is also the body of area_final (which walks
area_List and flushes every pool).
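The condensed shape of area_flush as just described: a sketch that follows the prose, not the verbatim source.

```c
/* area_flush, per the description above (sketch): free every block in
 * every blockset, then the blocksets themselves. */
static void
area_flush_sketch (AREA * area)
{
  AREA_BLOCKSET_LIST *blockset, *next;

  for (blockset = area->blockset_list; blockset != NULL; blockset = next)
    {
      next = blockset->next;
      for (int i = 0; i < blockset->used_count; i++)
	{
	  blockset->items[i]->bitmap.destroy ();	/* release bitmap buffer */
	  free_and_init (blockset->items[i]);	/* one free per slab */
	}
      free_and_init (blockset);
    }
  area->blockset_list = NULL;
}
```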
Debug prefix — double-free guard
In !NDEBUG builds every cell carries an 8-byte prefix (declared
as sizeof(double) for alignment) holding a 4-byte tag:

```c
// debug prefix tags — src/base/area_alloc.c
#define AREA_PREFIX_SIZE sizeof (double)

enum
{
  AREA_PREFIX_INITED = 0,
  AREA_PREFIX_FREED = 0x01010101
};
```

area_alloc writes AREA_PREFIX_INITED (zero) and returns the
pointer past the prefix; area_free reads the prefix back through
the cell pointer minus 8 bytes, asserts it equals
AREA_PREFIX_INITED, and writes AREA_PREFIX_FREED. A second
free reads AREA_PREFIX_FREED and trips
ER_AREA_FREE_TWICE. The two values are deliberately chosen so
that an integer field in a freed object’s caller-visible state
appears as 0x01010101 — easy to spot in a debugger as “freed
slab cell”.
The trade-off of the prefix is allocation overhead: a 24-byte
DB_VALUE becomes 32 bytes (24 payload + 8 prefix); a 56-byte
TP_DOMAIN becomes 64. In production builds the prefix is
removed and the cell is the rounded-up payload only.
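The tag protocol in miniature (a hypothetical helper; the real area_free reports ER_AREA_FREE_TWICE through er_set rather than a return code):

```c
#define AREA_PREFIX_INITED 0
#define AREA_PREFIX_FREED  0x01010101

/* Returns 0 and marks the cell freed, or -1 on a second free. */
static int
check_and_mark_freed (char *cell_with_prefix)
{
  int *tag = (int *) cell_with_prefix;
  if (*tag != AREA_PREFIX_INITED)
    {
      return -1;		/* tag reads 0x01010101 (or garbage): double-free */
    }
  *tag = AREA_PREFIX_FREED;	/* freed cells read back as 0x01010101 */
  return 0;
}
```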
Modules using AREA
Eight call sites under src/object/ create AREAs at module-init
time. These are the hot-path same-size objects of the client/SA
mode — server-mode subsystems like the lock manager or catalog
allocate from db_private_alloc on the per-thread arena instead.
| AREA global | Element | area_create site | Purpose |
|---|---|---|---|
| Value_area | DB_VALUE | pr_area_init (object_primitive.c) | Universal value containers built by pr_make_value |
| tp_Domain_area | TP_DOMAIN | tp_init (object_domain.c) | Type-domain descriptors (one per declared type) |
| Template_area | OBJ_TEMPLATE | obt_area_init (object_template.c) | Object update/insert templates |
| Assignment_area | OBJ_TEMPASSIGN | obt_area_init (object_template.c) | Per-field assignment records inside a template |
| Template_area | SM_TEMPLATE (schema) | classobj_area_init (class_object.c) | Schema-template editing (CREATE/ALTER class) |
| Set_Ref_Area | DB_COLLECTION | set_area_init (set_object.c) | Per-handle reference to a set/multiset/sequence |
| Set_Obj_Area | COL | set_area_init (set_object.c) | The set body (header + storage) |
| Objlist_area | DB_OBJLIST | ws_area_init (work_space.c) | Singly-linked-list nodes for object id lists |
The deck’s “AREA structures used by CUBRID modules” diagram lists the same eight areas under the same eight global-variable names.
Two things to notice. (a) All eight AREAs serve client-side
or standalone-mode workloads — the parser/optimiser building
parse trees, the workspace tracking dirty objects, set bodies
backing collection-typed columns. The server proper (lock
manager, page buffer, query executor) does not consult AREA on
its hot path. (b) The naming convention is <Subsystem>_area
(or <subsystem>_<class>_area); each name maps one-to-one to a
specific area_create call and a single object class. Adding a
new same-size hot allocation in the engine therefore amounts to
declaring a new global AREA *, calling area_create from the
subsystem’s init, and routing the alloc/free through area_alloc
/ area_free.
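The recipe in code, with a hypothetical subsystem (My_widget_area and MY_WIDGET are invented names; area_create / area_alloc / area_free are the documented API):

```c
static AREA *My_widget_area = NULL;

void
my_widget_area_init (void)
{
  /* name, element_size, alloc_count (cells per block; rounded up to a
   * multiple of 32 by LF_BITMAP_COUNT_ALIGN inside area_create) */
  My_widget_area = area_create ("Widgets", sizeof (MY_WIDGET), 128);
}

MY_WIDGET *
my_widget_new (void)
{
  return (MY_WIDGET *) area_alloc (My_widget_area);
}

void
my_widget_free (MY_WIDGET * widget)
{
  area_free (My_widget_area, (void *) widget);
}
```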
Lock-free bitmap (LF_BITMAP) — brief
The engine’s LF_BITMAP (in src/base/lockfree_bitmap.{hpp,cpp})
fronts every AREA_BLOCK. Two relevant features:
- Chunking style. LIST_OF_CHUNKS (used by AREA) splits the bitmap into 32-cell chunks, each with its own start_idx for round-robin allocation; ONE_CHUNK maintains a single word. AREA always uses LIST_OF_CHUNKS so concurrent allocations from the same block tend to land on distinct chunks (a minimal reduction of the word-level CAS follows this list).
- Usage threshold. is_full () returns true when entry_count_in_use / entry_count >= usage_threshold. The threshold is a float set at init: FULL_USAGE_RATIO (1.0) for client/SA mode, NINTETYFIVE_PERCENTILE_USAGE_RATIO (0.95) for server mode. The 95% choice trades a 5% memory premium for reduced contention when the last few cells in a block would otherwise force every allocator onto the same word.
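The word-level CAS, reduced to C11 atomics and GCC/Clang's __builtin_ctz (illustrative; the real LF_BITMAP layers chunk selection, the start_idx round-robin, and the usage threshold on top):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Claim one clear bit in a 32-cell chunk word. Returns the bit index
 * (= cell index within the chunk) or -1 if the word is full. */
static int
word_get_entry (_Atomic uint32_t *word)
{
  uint32_t old = atomic_load_explicit (word, memory_order_relaxed);
  while (old != UINT32_MAX)
    {
      int bit = __builtin_ctz (~old);	/* lowest free cell */
      uint32_t desired = old | (UINT32_C (1) << bit);
      if (atomic_compare_exchange_weak_explicit (word, &old, desired,
						 memory_order_acq_rel,
						 memory_order_relaxed))
	{
	  return bit;		/* cell claimed with one CAS */
	}
      /* CAS failure reloaded `old`; retry against the fresh value */
    }
  return -1;			/* chunk full; the caller moves to the next chunk */
}
```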
This document treats LF_BITMAP as a black box; the deeper
lock-free engineering (CAS retry strategy, ABA prevention, the
start_idx round-robin) is a separate analysis target — see
Open Questions §1.
Source Walkthrough
Anchor on symbol names, not line numbers. Use git grep -n '<symbol>' src/base/area_alloc.{c,h} to locate the current position.
Public API (area_alloc.c)
- area_init, area_final — module on/off; called from boot_server_main and boot_client_main.
- area_create — alloc AREA, round size, pre-allocate first blockset+block, link into area_List.
- area_destroy — unlink, flush, free.
- area_alloc — three-step alloc (hint / scan / grow); the hot path.
- area_free — find owning block, validate cell boundary and debug prefix, clear the bit.
- area_validate — pointer-range sanity check (header-exposed, no in-tree caller).
- area_flush — free all blocks for one AREA.
- area_dump — diagnostic; iterates area_List.
Internals (area_alloc.c)
- area_alloc_block — malloc one slab, init the bitmap.
- area_alloc_blockset — malloc one 256-slot blockset.
- area_insert_block — sorted append into the chain.
- area_find_block — chain walk + binary search per blockset.
- area_info — dump body for one AREA.
Data structures (area_alloc.h)
- struct area, struct area_block, struct area_blockset_list.
Config / constants
- AREA_BLOCKSET_SIZE (area_alloc.h) — 256.
- AREA_PREFIX_SIZE, AREA_PREFIX_INITED, AREA_PREFIX_FREED (area_alloc.c, !NDEBUG).
- LF_AREA_BITMAP_USAGE_RATIO — 0.95 for SERVER_MODE, 1.0 otherwise.
- LF_BITMAP_COUNT_ALIGN (lockfree_bitmap.hpp) — round to multiple of 32.
Globals
- area_List (static AREA *), area_List_lock (pthread_mutex_t, server only).
External AREA owners (one global per object class)
- Value_area (object_primitive.c) — DB_VALUE.
- tp_Domain_area (object_domain.c) — TP_DOMAIN.
- Template_area (object_template.c) — OBJ_TEMPLATE.
- Assignment_area (object_template.c) — OBJ_TEMPASSIGN.
- Template_area (class_object.c) — schema SM_TEMPLATE (a separate TU-static binding from the object-template one).
- Set_Ref_Area, Set_Obj_Area (set_object.c) — DB_COLLECTION, COL.
- Objlist_area (work_space.c) — DB_OBJLIST.
Position hints as of 2026-04-30
| Symbol | File | Line |
|---|---|---|
| AREA_BLOCKSET_SIZE macro | base/area_alloc.h | 42 |
| struct area_block | base/area_alloc.h | 51 |
| struct area_blockset_list | base/area_alloc.h | 63 |
| struct area | base/area_alloc.h | 74 |
| area_create decl | base/area_alloc.h | 104 |
| area_alloc decl | base/area_alloc.h | 108 |
| area_free decl | base/area_alloc.h | 110 |
| AREA_PREFIX_SIZE | base/area_alloc.c | 60 |
| AREA_PREFIX_INITED / _FREED | base/area_alloc.c | 64 |
| area_List | base/area_alloc.c | 72 |
| area_List_lock | base/area_alloc.c | 74 |
| LF_AREA_BITMAP_USAGE_RATIO | base/area_alloc.c | 78 |
| area_init | base/area_alloc.c | 100 |
| area_final | base/area_alloc.c | 119 |
| area_create | base/area_alloc.c | 146 |
| area_destroy | base/area_alloc.c | 245 |
| area_alloc_block | base/area_alloc.c | 287 |
| area_alloc_blockset | base/area_alloc.c | 323 |
| area_alloc | base/area_alloc.c | 356 |
| area_validate | base/area_alloc.c | 482 |
| area_free | base/area_alloc.c | 509 |
| area_flush | base/area_alloc.c | 592 |
| area_insert_block | base/area_alloc.c | 636 |
| area_find_block | base/area_alloc.c | 703 |
| area_info | base/area_alloc.c | 778 |
| area_dump | base/area_alloc.c | 848 |
| LF_BITMAP_COUNT_ALIGN | base/lockfree_bitmap.hpp | 89 |
| Value_area = area_create(...) | object/object_primitive.c | 1774 |
| Objlist_area = area_create(...) | object/work_space.c | 3852 |
| Template_area = area_create(...) (object) | object/object_template.c | 158 |
| Assignment_area = area_create(...) | object/object_template.c | 164 |
| tp_Domain_area = area_create(...) | object/object_domain.c | 659 |
| Template_area = area_create(...) (schema) | object/class_object.c | 168 |
| Set_Ref_Area = area_create(...) | object/set_object.c | 158 |
| Set_Obj_Area = area_create(...) | object/set_object.c | 167 |
Cross-check Notes
The raw analysis (Korean PDF, v1.0) is broadly accurate but lags in five places:
1. “AREA is one global per process”. The deck phrases AREA itself as a single pool (“one global variable per process”). The source maintains a list of AREAs (area_List), one per object class; the global is the list head, not a single AREA. The deck’s own “modules using AREA” diagram is correct.
2. “AREA_BLOCKSET manages allocation lock-free”. True at the bitmap-cell level (LF_BITMAP uses CAS for get_entry / free_entry), but not at the block-add level — area_alloc’s slow path takes area->area_mutex, and area_insert_block runs strictly under that mutex. The lock-free property covers the steady-state hot path only.
3. “Free list” terminology. The deck describes AREA as maintaining a free list of cells; implementation-wise there is no list — cell occupancy is tracked by the bitmap. The semantics are equivalent.
4. Hint block. The deck’s struct diagram includes hint_block but does not describe its role. The hint is the single most-impactful structure for performance — it short-cuts step 2 of area_alloc for the common case, and the opportunistic ATOMIC_CAS_ADDR on hint upgrade/downgrade is the main reason chain-scan overhead does not dominate under churn.
5. Debug prefix. The deck’s struct diagrams show only the release-build cell layout; !NDEBUG builds prepend an 8-byte prefix (AREA_PREFIX_INITED / AREA_PREFIX_FREED) for double-free detection. This explains why element_size printed by area_dump differs from what the subsystem requested.
Open Questions
1. LF_BITMAP deeper analysis. The deck defers this to a future document, citing JIRA CBRD-21800 (lock_free.h C++ refactor). The bitmap’s CAS retry strategy under high contention, the round-robin start_idx per chunk, and the is_full threshold semantics merit a dedicated walk-through of lockfree_bitmap.cpp (276 lines).
2. Why is the slow-path lock per-AREA, not global? A single global mutex would suffice (the slow path is rare), and the per-AREA area_mutex adds 40 bytes per AREA. Was the choice contention-driven (eight inits in parallel) or just symmetry with area_List_lock?
3. LF_AREA_BITMAP_USAGE_RATIO = 0.95 server-mode trade-off. The 5 % buffer reduces CAS contention on the last cells of a block at the cost of wasted memory. Has this been measured against 0.9 / 0.99?
4. Why doesn’t the server count n_allocs / n_frees? The counter increments are commented out in SERVER_MODE. area_dump still reads them and would print zero — is the dump path intentionally client-only?
5. Address-sorting fragility under jemalloc / tcmalloc. The blockset’s “mostly sorted” property assumes malloc returns higher addresses on average. Thread-cached / arena-partitioned allocators can break this; the fallback (new blockset on out-of-order address) is correct but degrades to one block per blockset. Worth instrumenting.
6. area_validate callers. The function is exposed in the header but no external callers were found in current source — is it dead, or used only by client (db_* API) code?
Sources
Raw analyses (raw/code-analysis/cubrid/common/)
- code_analysis_Common_module_AREA.pdf — primary deck (Korean, v1.0, hgryoo). Section structure: introduction to the AREA structure, AREA internals, AREA-related functions, CUBRID modules that use the AREA structure, TODO. Four pages, three figures (struct layout, BLOCKSET layout, modules-using-AREA diagram).
- _converted/common-module-area.pdf.md — markitdown extract of the same.
Sibling docs
- knowledge/code-analysis/cubrid/cubrid-page-buffer-manager.md — uses db_private_alloc / malloc rather than AREA, but shares the design idiom of “per-class pool with lock-free bitmap” via LF_BITMAP instances inside each AREA_BLOCK.
- knowledge/code-analysis/cubrid/cubrid-heartbeat.md — different module; cited as the style template for this document.
Textbook chapters / papers
- Database Internals (Petrov), Ch. 4 “Implementing B-Trees” — references slab/pool allocation in the context of B-tree node cache.
- Database System Concepts (Silberschatz, Korth, Sudarshan, 6th ed.), Ch. 13 “Storage and File Structure” — buffer-manager framing under which same-size allocators sit.
- Bonwick, The Slab Allocator: An Object-Caching Kernel Memory Allocator, USENIX Summer 1994 — the canonical slab paper; introduces the per-class cache and constructor/destructor pattern that AREA simplifies away.
CUBRID source (/data/hgryoo/references/cubrid/)
- src/base/area_alloc.h — public types and API.
- src/base/area_alloc.c — implementation.
- src/base/lockfree_bitmap.{hpp,cpp} — the per-block bitmap used for cell-level lock-free allocation.
- src/object/object_primitive.c — Value_area setup and pr_make_value / pr_clear_value callers.
- src/object/object_domain.c — tp_Domain_area.
- src/object/object_template.c — Template_area (object) and Assignment_area.
- src/object/class_object.c — Template_area (schema).
- src/object/set_object.c — Set_Ref_Area and Set_Obj_Area.
- src/object/work_space.c — Objlist_area.
- src/transaction/boot_sr.c, src/transaction/boot_cl.c — call area_init at boot, area_final at shutdown.
- src/executables/unittests_area.c — the multi-threaded area_alloc / area_free stress test.