CUBRID AREA Allocator — Slab-Style Pool Allocator With Free Lists for Same-Size Objects

A general-purpose malloc is the wrong tool when a subsystem allocates and frees thousands of identical-size records every second. Two costs dominate: per-chunk bookkeeping overhead (a 24-byte DB_VALUE paired with 16-32 bytes of allocator metadata doubles the working-set footprint) and fragmentation (freed cells land back in the general allocator’s size-class bins, where they compete with every other class for cache lines and TLB entries; the cell just freed is unlikely to be the cell reused).

The textbook answer is a slab allocator, introduced by Jeff Bonwick in The Slab Allocator: An Object-Caching Kernel Memory Allocator (USENIX Summer 1994) and refined as SLAB / SLUB / SLOB in the Linux kernel. The shape: pre-cut a contiguous arena (“slab” or “block”) into fixed cells of one specific size, track per-cell free / in-use as one bit each, chain slabs together when one fills, and avoid touching the global allocator on the hot path.

Database Internals (Petrov), Ch. 4 “Implementing B-Trees”, and Database System Concepts (Silberschatz, Korth, Sudarshan, 6th ed.) Ch. 13 both cite slab/pool allocation as the default discipline for “small, structurally identical, high-churn records”: index nodes during bulk-load, log records during commit, parse-tree nodes during compilation, expression atoms during evaluation.

Three implementation choices the model leaves open frame the rest of this document:

  1. One slab vs many slabs? Bonwick’s kernel slab maintains a per-CPU magazine to scale across cores; embedded and database engines often settle for one chain per object class and let lock-freedom inside the chain absorb contention. CUBRID picks the second.
  2. What backs the per-cell free bit? An embedded freelist threaded through the cells is cheaper (one pointer load) but pollutes freed cells and has the ABA hazard under concurrency; a side bitmap keeps cells clean and admits lock-free CAS. CUBRID picks the bitmap (LF_BITMAP).
  3. How does the allocator find a non-full slab? Linear chain scan is O(n); per-thread caches shrink it to O(1) at the price of cross-thread imbalance; a single “hint” pointer is the cheapest option. CUBRID picks the hint (area->hint_block).

Every mature engine wraps hot-path same-size allocations behind some flavour of pool. The conventions below (naming, phase-aligned lifetime, cell tracking, the lock split, size classes) are not in Bonwick’s original paper; they are the engineering vocabulary that sits between the kernel slab and the SQL engine.

Pools are named, not anonymous. PostgreSQL’s MemoryContext (AllocSetContext, SlabContext, GenerationContext), MySQL’s MEM_ROOT (THD-owned, per-statement), and CUBRID’s AREA all carry a name field plus a global pointer in the owning subsystem (Value_area, tp_Domain_area, Set_Ref_Area, …). Naming makes leaks visible: a runtime dump shows which pool bloated.

Lifetime aligned to a phase, not to a transaction

A pool’s flush-everything primitive is its real value. PostgreSQL MemoryContextReset at end-of-statement, MySQL free_root(MEM_ROOT*, MYF_FLAGS), CUBRID area_flush all walk every block and free in one pass. The discipline is “allocate freely, reset at a phase boundary”; the phase is defined by the subsystem.
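The shape of that discipline, as a minimal sketch against a deliberately hypothetical pool_t API (pool_alloc and pool_reset are illustrative names, not any of the three engines’ real signatures):

// phase-boundary discipline — sketch, hypothetical pool_t API
typedef struct pool pool_t;
extern void *pool_alloc (pool_t * p, size_t size);	/* grab a cell; no matching free needed */
extern void pool_reset (pool_t * p);	/* walk every block, free in one pass */

void
execute_statement (pool_t * stmt_pool)
{
  /* allocate freely: helpers take the pool, nobody tracks individual lifetimes */
  char *scratch = (char *) pool_alloc (stmt_pool, 1024);
  (void) scratch;
  /* ... run the statement ... */

  /* phase boundary: everything the phase allocated goes at once */
  pool_reset (stmt_pool);
}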

Bitmap or embedded-freelist for cell tracking

Two implementations co-exist: an embedded freelist (first 8 bytes of a freed cell carry the next-freed pointer; PostgreSQL AllocSet, MySQL MEM_ROOT, most mallocs), or a side bitmap (an external word array; cleaner cells, one extra memory hop). CUBRID picks the side bitmap because the bitmap operations can be made lock-free with CAS over 32-bit words, whereas the embedded freelist has the classic ABA hazard.
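For contrast, a minimal sketch of the embedded-freelist alternative (illustrative only, not CUBRID source) makes both costs concrete: the freed cell’s first bytes are overwritten by the link, and the single-threaded pop below cannot become a naive CAS loop without hitting ABA.

// embedded freelist — sketch of the rejected alternative
typedef struct free_cell
{
  struct free_cell *next;	/* stored inside the freed cell itself */
} FREE_CELL;

static FREE_CELL *Free_head;	/* one list head per pool */

void
freelist_push (void *cell)
{
  FREE_CELL *c = (FREE_CELL *) cell;
  c->next = Free_head;		/* pollutes the freed cell's first bytes */
  Free_head = c;
}

void *
freelist_pop (void)
{
  FREE_CELL *c = Free_head;
  /* A lock-free version would CAS (&Free_head, c, c->next); if c was
   * popped, freed and re-pushed between the read and the CAS, c->next
   * is stale and the CAS still succeeds: the classic ABA failure. */
  if (c != NULL)
    {
      Free_head = c->next;
    }
  return c;
}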

Lock-free fast path, mutex-protected slow path

Almost every engine makes the “find a free cell” fast path lock-free and the “allocate a new slab” slow path coarse-locked. CUBRID separates them via area->area_mutex (held only when adding a fresh AREA_BLOCK) and the per-block lock-free LF_BITMAP::get_entry.

Variable-size cells via “size class” pools

CUBRID does not multiplex sizes inside one AREA — every AREA holds one cell size fixed at area_create time, and each subsystem creates as many AREAs as it needs distinct sizes. Same choice Linux’s slab caches make.

| Theoretical concept | CUBRID name |
| --- | --- |
| Per-class pool | AREA struct (area_alloc.h); one global per object class (Value_area, tp_Domain_area, …) |
| Slab / chunk of fixed cells | AREA_BLOCK (area_alloc.h); block_size = element_size * alloc_count |
| Cell-occupancy tracking | AREA_BLOCK::bitmap (LF_BITMAP from lockfree_bitmap.hpp) |
| Slab chain | AREA::blockset_list → AREA_BLOCKSET_LIST::items[] → AREA_BLOCK * |
| Slow-path lock | AREA::area_mutex (held only inside area_alloc’s “add new block” branch and area_insert_block) |
| Fast-path hint | AREA::hint_block (single AREA_BLOCK *) |
| Bulk reset / destroy | area_flush, area_destroy |
| Global registry of all pools | static AREA *area_List + pthread_mutex_t area_List_lock |
| Debug double-free / use-after-free guard | AREA_PREFIX_INITED (0) / AREA_PREFIX_FREED (0x01010101) tag in the cell prefix (!NDEBUG) |
| Cells per block (config) | AREA::alloc_count (rounded up by LF_BITMAP_COUNT_ALIGN to a multiple of 32) |
| Blocks per blockset | AREA_BLOCKSET_SIZE = 256 (compile-time constant) |

The AREA module is one .h and one .c file totalling about 1 000 lines. It exposes seven core public functions (area_init, area_final, area_create, area_destroy, area_alloc, area_free, area_flush) plus two diagnostics (area_validate, area_dump), and three structures (AREA, AREA_BLOCK, AREA_BLOCKSET_LIST). The distinguishing choices are: (1) one global linked list of pools (area_List) so lifecycle is uniform across the engine; (2) the chain of slabs is itself chunked — blocksets of 256 block pointers — so the lookup data structure stays cache-friendly even when the chain grows to hundreds of blocks; (3) each block fronts a LF_BITMAP for lock-free cell allocation; (4) a single hint_block pointer short-cuts the common case to one bitmap operation; (5) the mutex is held only when growing the chain by one block, never on the alloc/free hot path; (6) debug builds embed an 8-byte prefix (holding a 4-byte tag) to catch double-free.

flowchart LR
  subgraph G["global registry"]
    AL["area_List<br/>(static AREA *)"]
    ALK["area_List_lock<br/>(pthread_mutex_t)"]
  end
  subgraph A["AREA (one per object class)"]
    AH["name, element_size,<br/>alloc_count, block_size"]
    AHB["hint_block →"]
    AM["area_mutex<br/>(grow only)"]
    ABL["blockset_list →"]
  end
  subgraph BS["AREA_BLOCKSET_LIST (chain, 256 ptrs each)"]
    BS1["BLOCKSET #0<br/>items[0..255]<br/>used_count"]
    BS2["BLOCKSET #1<br/>items[0..255]<br/>used_count"]
    BSn["…"]
  end
  subgraph B["AREA_BLOCK (one slab)"]
    BBM["LF_BITMAP<br/>(per-cell free/used)"]
    BBD["data[]<br/>(alloc_count cells of element_size)"]
  end
  AL --> A
  AL -. next .-> A
  A --> ABL
  ABL --> BS1
  BS1 -. next .-> BS2
  BS2 -. next .-> BSn
  BS1 --> B
  BS2 --> B
  AHB --> B

The figure encodes three boundaries. (global / per-area) the program-wide area_List chains pools; lookup is by code path (each subsystem holds its own AREA * global), not by name. (per-area / per-blockset) the slab chain is two-level — a linked list of blockset arrays — so that scanning the chain for a free cell touches at most ⌈N / 256⌉ cache lines for the per-set metadata and amortises the per-block hop cost. (per-block / per-cell) inside one block, the bitmap is the only structure threads contend on; cells are addressed by index.

// AREA — src/base/area_alloc.h
struct area
{
  AREA *next;			/* link in the global area_List */
  char *name;			/* diagnostic only (area_dump) */
  size_t element_size;		/* cell size, including !NDEBUG prefix */
  size_t alloc_count;		/* cells per block (32-aligned) */
  size_t block_size;		/* element_size * alloc_count */
  AREA_BLOCKSET_LIST *blockset_list;	/* head of slab-pointer chain */
  AREA_BLOCK *hint_block;	/* hint for the fast path */
  pthread_mutex_t area_mutex;	/* serialises adding a new block */
  /* for dumping */
  size_t n_allocs;
  size_t n_frees;
  void (*failure_function) (void);	/* OOM hook (ws_abort_transaction in client) */
};

The failure_function field lets each owning subsystem hook OOM recovery: the client links it to ws_abort_transaction so a failed malloc in area_alloc_block aborts the current transaction instead of crashing the process; the server leaves it NULL and lets the error code propagate.

The element_size value passed to area_create is silently adjusted in two ways: in debug builds an AREA_PREFIX_SIZE (sizeof(double) = 8 bytes) header is added to every cell to detect double-free; in all builds the size is rounded up to the next 8-byte multiple. A caller asking for 24-byte cells therefore gets 24-byte cells in release (24 is already 8-aligned) and 32-byte cells in debug (24 + 8 = 32). The cell payload starts at data + i*element_size + AREA_PREFIX_SIZE in debug, data + i*element_size in release.
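A sketch of the adjustment, mirroring the arithmetic shown later in area_create: 24 requested bytes stay 24 in release and become 32 in debug; a 20-byte request would become 24 and 32 respectively.

// cell-size adjustment — sketch of the area_create arithmetic
#include <stddef.h>

size_t
adjusted_element_size (size_t requested)
{
#if !defined (NDEBUG)
  requested += 8;		/* AREA_PREFIX_SIZE (sizeof (double)) */
#endif
  if (requested % 8)
    {
      requested += 8 - (requested % 8);	/* round up to 8-byte multiple */
    }
  return requested;		/* 24 -> 24 in release, 24 -> 32 in debug */
}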

// AREA_BLOCK — src/base/area_alloc.h
struct area_block
{
  LF_BITMAP bitmap;
  char *data;
};

A small header (the bitmap object and the data pointer), then block_size bytes of cell storage. The block is allocated as one contiguous malloc of block_size + sizeof(AREA_BLOCK) with data pointed at the trailing cell region; freeing a block is a single free_and_init after bitmap.destroy() releases the bitmap’s own buffer.

// area_alloc_block — src/base/area_alloc.c
static AREA_BLOCK *
area_alloc_block (AREA * area)
{
  AREA_BLOCK *new_block;
  size_t total = area->block_size + sizeof (AREA_BLOCK);

  new_block = (AREA_BLOCK *) malloc (total);
  if (new_block == NULL)
    {
      er_set (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_OUT_OF_VIRTUAL_MEMORY, 1, total);
      if (area->failure_function != NULL)
	{
	  (*(area->failure_function)) ();
	}
      return NULL;
    }

  new_block->bitmap.init (LF_BITMAP_LIST_OF_CHUNKS, (int) area->alloc_count, LF_AREA_BITMAP_USAGE_RATIO);
  new_block->data = ((char *) new_block) + sizeof (AREA_BLOCK);

  return new_block;
}

Two non-obvious facts. First, the bitmap is initialised in LF_BITMAP_LIST_OF_CHUNKS style — internally that gives every fixed-size chunk of 32 cells its own start_idx so concurrent get_entry calls from different threads typically land on different chunks (round-robin) and don’t collide on the same 32-bit CAS word. Second, the usage-ratio threshold differs between server and client (LF_AREA_BITMAP_USAGE_RATIO): LF_BITMAP_95PERCENTILE_USAGE_RATIO for SERVER_MODE, LF_BITMAP_FULL_USAGE_RATIO otherwise. The 95th-percentile threshold means a server-mode block is considered “full” when 95% of cells are taken; the remaining 5% buffer reduces CAS contention on the last few cells, at the cost of slightly higher memory usage. The client (one thread) takes the cheaper full-utilisation choice.
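What a single get_entry attempt on one 32-cell chunk word boils down to, as a sketch (the real LF_BITMAP layers the per-chunk start_idx rotation and the usage-ratio check on top of this, and its code differs):

// one-chunk get_entry — sketch, not the LF_BITMAP source
#include <stdint.h>

int
chunk_get_entry (volatile uint32_t * word)
{
  uint32_t old, bit;
  for (;;)
    {
      old = *word;
      if (old == UINT32_MAX)
	{
	  return -1;		/* chunk full; the caller tries the next chunk */
	}
      bit = ~old & (old + 1);	/* lowest clear bit */
      if (__sync_bool_compare_and_swap (word, old, old | bit))
	{
	  return __builtin_ctz (bit);	/* claimed cell index inside the chunk */
	}
      /* CAS lost: another thread took a bit in this word; retry */
    }
}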

// AREA_BLOCKSET_LIST — src/base/area_alloc.h
#define AREA_BLOCKSET_SIZE 256

struct area_blockset_list
{
  AREA_BLOCKSET_LIST *next;
  AREA_BLOCK *items[AREA_BLOCKSET_SIZE];
  int used_count;
};

A blockset holds at most 256 block pointers, sorted by the AREA_BLOCK * value (which is also the lowest address of that block’s cell storage, modulo a constant offset). The two-level structure exists because:

  • A single linked list of blocks would force area_find_block (used by area_validate and area_free) into an O(N) scan of every block to locate a cell pointer’s owner.
  • An array of blocks would have to be reallocated when it grew — but a concurrent reader holding a stale pointer to the array would dereference freed memory.
  • The blockset chain has both: each blockset is a fixed array (binary-searchable for area_find_block), and growth happens by appending a new blockset (no in-place reallocation, so readers are always safe).

The trade-off is that a blockset uses ~2 KB (256 eight-byte pointers) even when nearly empty. Set against the blocks it indexes (at the default 32–1024 cells per block and 8–256 bytes per cell, one block runs anywhere from 256 bytes to 256 KB), the blockset overhead is well under 1 % of the working set for any realistically sized block.

flowchart LR
  S["area_alloc(area)"] --> H["hint_block.bitmap.get_entry()"]
  H --> H1{"entry != -1?"}
  H1 -- "yes (fast path)" --> RET["pointer = data + idx*element_size<br/>(+ AREA_PREFIX_SIZE in debug)"]
  H1 -- "no" --> SC["scan blockset chain<br/>each block: bitmap.get_entry()"]
  SC --> SC1{"found?"}
  SC1 -- "yes" --> CAS["opportunistic<br/>ATOMIC_CAS_ADDR hint = block"]
  CAS --> RET
  SC1 -- "no" --> LK["lock area_mutex"]
  LK --> NB["area_alloc_block (malloc)"]
  NB --> INS["area_insert_block (sorted append)"]
  INS --> SETH["hint_block = new_block"]
  SETH --> UL["unlock"]
  UL --> RET

Three steps. Step 1 (lock-free) — try the hint block; LF_BITMAP::get_entry is one CAS on the chunk word, returning a cell index or -1 if full. Step 2 (lock-free) — walk every blockset using VOLATILE_ACCESS reads of blockset->next and ->used_count; the ATOMIC_INC_32 of used_count in area_insert_block is the release barrier for the preceding items[] store, so concurrent appends are visible without further synchronisation. Step 3 (mutex-held) — only one thread at a time can malloc a fresh block; losers observe it on retry.

A subtle hint update lives in step 2: when an allocator finds a free cell in some block other than the hint and the hint is currently full, it tries ATOMIC_CAS_ADDR (&area->hint_block, hint_block, block). The CAS may fail (another thread already upgraded it) but no thread loses progress. This converges the hint to a non-full block under steady-state churn without taking the mutex.

// area_alloc — src/base/area_alloc.c (condensed)
void *
area_alloc (AREA * area)
{
  AREA_BLOCKSET_LIST *blockset;
  AREA_BLOCK *block, *hint_block;
  int used_count, i, entry_idx;
  char *entry_ptr;
  int rv;

  /* Step 1: hint block fast path. */
  hint_block = VOLATILE_ACCESS (area->hint_block, AREA_BLOCK *);
  entry_idx = hint_block->bitmap.get_entry ();
  if (entry_idx != -1)
    {
      block = hint_block;
      goto found;
    }

  /* Step 2: scan the blockset chain. */
  for (blockset = area->blockset_list; blockset != NULL;
       blockset = VOLATILE_ACCESS (blockset->next, AREA_BLOCKSET_LIST *))
    {
      used_count = VOLATILE_ACCESS (blockset->used_count, int);
      for (i = 0; i < used_count; i++)
	{
	  block = VOLATILE_ACCESS (blockset->items[i], AREA_BLOCK *);
	  entry_idx = block->bitmap.get_entry ();
	  if (entry_idx != -1)
	    {
	      /* opportunistic hint upgrade */
	      hint_block = VOLATILE_ACCESS (area->hint_block, AREA_BLOCK *);
	      if (LF_BITMAP_IS_FULL (&hint_block->bitmap) && !LF_BITMAP_IS_FULL (&block->bitmap))
		{
		  ATOMIC_CAS_ADDR (&area->hint_block, hint_block, block);
		}
	      goto found;
	    }
	}
    }

  /* Step 3: grow the chain by one block. */
  rv = pthread_mutex_lock (&area->area_mutex);
  if (area->hint_block != hint_block)
    {
      /* re-check after acquiring the lock: another thread may already
       * have added a block and moved the hint */
      block = area->hint_block;
      entry_idx = block->bitmap.get_entry ();
      if (entry_idx != -1)
	{
	  pthread_mutex_unlock (&area->area_mutex);
	  goto found;
	}
    }
  block = area_alloc_block (area);
  /* ... condensed: NULL check, area_insert_block (area, block) ... */
  entry_idx = block->bitmap.get_entry ();	/* fresh block: cannot fail */
  area->hint_block = block;
  pthread_mutex_unlock (&area->area_mutex);

found:
  entry_ptr = block->data + area->element_size * entry_idx;
#if !defined (NDEBUG)
  *(int *) entry_ptr = AREA_PREFIX_INITED;
  entry_ptr += AREA_PREFIX_SIZE;
#endif
  return ((void *) entry_ptr);
}

Note the n_allocs / n_frees counters are only updated outside SERVER_MODE (the server avoids the atomic increment on every allocation; the comment notes it is intentional). The counters exist purely for area_dump diagnostics on the client side.

Insertion — keeping the blockset sorted by address

// area_insert_block — src/base/area_alloc.c (condensed)
static int
area_insert_block (AREA * area, AREA_BLOCK * new_block)
{
  AREA_BLOCKSET_LIST **last_blockset_p = &area->blockset_list;
  AREA_BLOCKSET_LIST *blockset, *new_blockset;
  int used_count;

  for (blockset = area->blockset_list; blockset != NULL; blockset = blockset->next)
    {
      last_blockset_p = &blockset->next;
      used_count = blockset->used_count;	/* >= 1: a blockset is linked in with one item */
      if (used_count == AREA_BLOCKSET_SIZE)
	{
	  continue;		/* this set is full */
	}

      /* Append iff new_block has a higher address than the last
       * block in this blockset — keeps items[] address-sorted. */
      if (blockset->items[used_count - 1] < new_block)
	{
	  blockset->items[used_count] = new_block;
	  ATOMIC_INC_32 (&blockset->used_count, 1);	/* release barrier */
	  return NO_ERROR;
	}
    }

  /* No room — allocate a fresh blockset and append. */
  new_blockset = area_alloc_blockset (area);
  /* ... condensed ... */
  new_blockset->items[0] = new_block;
  ATOMIC_INC_32 (&new_blockset->used_count, 1);
  *last_blockset_p = new_blockset;

  return NO_ERROR;
}

Three properties matter. (a) Insert appends to the end of the first non-full blockset whose last block’s address is below the new block’s. Because malloc typically returns higher addresses for later allocations, the appended block is address-sorted within the blockset most of the time. (b) When no blockset accepts the block (e.g., a low-address block came back from malloc after a high-address one), a fresh blockset is allocated and chained at the tail — items in the new blockset are not necessarily greater than items in earlier blocksets. (c) The ATOMIC_INC_32 of used_count is the release barrier; the preceding items[used_count] = new_block must complete first. Concurrent readers (area_alloc step 2, area_find_block) re-read both fields and observe a consistent (item, used_count) pair.

The “no shifting / re-sort” comment in the source admits the tradeoff: blocksets are only mostly sorted. area_find_block deals with this by trying every blockset (not just the one whose range contains the pointer) — see below.

Lookup — area_find_block for free / validate

area_free and area_validate need to find the owning block of a cell pointer. The structure makes this O(log N) per blockset (binary search) and O(B) over the chain (B = blockset count).

// area_find_block — src/base/area_alloc.c (condensed)
static AREA_BLOCK *
area_find_block (AREA * area, const void *ptr)
{
  AREA_BLOCKSET_LIST *blockset;
  AREA_BLOCK *first_block, *last_block, *block;
  int middle, left, right, pos, used_count;

  for (blockset = area->blockset_list; blockset != NULL;
       blockset = VOLATILE_ACCESS (blockset->next, AREA_BLOCKSET_LIST *))
    {
      used_count = VOLATILE_ACCESS (blockset->used_count, int);

      first_block = blockset->items[0];
      if ((char *) ptr < first_block->data)
	{
	  continue;		/* below this set's range */
	}

      last_block = VOLATILE_ACCESS (blockset->items[used_count - 1], AREA_BLOCK *);
      if (last_block->data + area->block_size <= (char *) ptr)
	{
	  continue;		/* above this set's range */
	}

      /* binary search inside the blockset's address range */
      left = 0;
      right = used_count - 1;
      while (left <= right)
	{
	  middle = (left + right) / 2;
	  block = VOLATILE_ACCESS (blockset->items[middle], AREA_BLOCK *);
	  if (block->data > (char *) ptr)
	    {
	      right = middle - 1;
	    }
	  else
	    {
	      left = middle + 1;
	    }
	}
      pos = right;		/* largest index whose data <= ptr */

      block = VOLATILE_ACCESS (blockset->items[pos], AREA_BLOCK *);
      if ((block->data <= (char *) ptr) && ((char *) ptr < block->data + area->block_size))
	{
	  return block;
	}
    }

  return NULL;
}

The first-block / last-block check is a cheap range filter: if the pointer is below the first block’s data or above the last block’s tail, the entire blockset is skipped. If the pointer falls within the bracket, binary search inside the blockset finds the block whose data is the largest pointer ≤ ptr, and a final containment check confirms it. A pointer that genuinely belongs to the area always passes these tests — the “mostly sorted” property is good enough because a fresh-from-malloc block whose address is below the existing range never gets inserted into an existing blockset (it lands in a fresh one), so within a single blockset the items remain address-sorted by construction.

// area_free — src/base/area_alloc.c (condensed)
int
area_free (AREA * area, void *ptr)
{
  AREA_BLOCK *block, *hint_block;
  char *entry_ptr;
  int entry_idx, offset;

  if (ptr == NULL)
    {
      /* error */
    }

#if !defined (NDEBUG)
  entry_ptr = ((char *) ptr) - AREA_PREFIX_SIZE;	/* strip debug prefix */
#else
  entry_ptr = (char *) ptr;
#endif

  block = area_find_block (area, (const void *) entry_ptr);
  if (block == NULL)
    {
      /* ER_AREA_ILLEGAL_POINTER */
    }

  offset = (int) (entry_ptr - block->data);
  if (offset < 0 || offset % area->element_size != 0)
    {
      /* ER_AREA_ILLEGAL_POINTER — pointer not at cell boundary */
    }

#if !defined (NDEBUG)
  if (*(int *) entry_ptr != AREA_PREFIX_INITED)
    {
      /* ER_AREA_FREE_TWICE */
    }
  *(int *) entry_ptr = AREA_PREFIX_FREED;
#endif

  entry_idx = offset / (int) area->element_size;
  block->bitmap.free_entry (entry_idx);

  /* opportunistic hint downgrade — pull hint to a non-full block */
  hint_block = VOLATILE_ACCESS (area->hint_block, AREA_BLOCK *);
  if (LF_BITMAP_IS_FULL (&hint_block->bitmap) && !LF_BITMAP_IS_FULL (&block->bitmap))
    {
      ATOMIC_CAS_ADDR (&area->hint_block, hint_block, block);
    }

  return NO_ERROR;
}

The validation chain is a four-step funnel: (1) non-NULL pointer, (2) area_find_block identifies the owning block, (3) offset is a non-negative multiple of element_size (the pointer sits on a cell boundary, not in the middle of a cell), (4) in debug, the prefix tag matches AREA_PREFIX_INITED. A pointer that fails any step is rejected without freeing — area_free is hardened against caller bugs in a way free is not.

The same hint-CAS that area_alloc uses on the upgrade side runs here on the downgrade side: when a free turns a previously full block into one with at least one free cell, the hint is opportunistically pointed at it. The next allocation will then hit the just-vacated block on the fast path, which keeps the working set tight.

Lifecycle — init, create, flush, destroy, final

sequenceDiagram
  participant Boot as boot_sr / boot_cl
  participant Init as area_init
  participant Owner as subsystem (e.g. object_primitive)
  participant Create as area_create
  participant Use as runtime
  participant Final as area_final
  Boot->>Init: area_init() (zero area_List)
  Boot->>Owner: pr_area_init() / set_area_init() / ws_area_init()
  Owner->>Create: area_create("Value containers", sizeof(DB_VALUE), N)
  Create-->>Owner: AREA *
  Note over Owner: hold the AREA * in a global<br/>(Value_area, tp_Domain_area, …)
  Use->>Use: many area_alloc / area_free
  Boot->>Final: area_final() at shutdown
  Final->>Final: walk area_List → area_flush each → free

area_init is called from boot_server_main / boot_client_main and only zeroes the global area_List. After that, each subsystem’s own *_area_init calls area_create with its size class; area_create mallocs the AREA, pre-allocates the first blockset and the first block (so the allocator never has an empty chain), grabs area_List_lock, and links the new AREA at the head of the global list.

// area_create — src/base/area_alloc.c (condensed)
AREA *
area_create (const char *name, size_t element_size, size_t alloc_count)
{
  AREA *area = (AREA *) malloc (sizeof (AREA));
  /* ... condensed: name strdup, NULL checks ... */

#if !defined (NDEBUG)
  element_size += AREA_PREFIX_SIZE;
#endif
  /* round up to 8-byte multiple */
  if (element_size % 8)
    {
      element_size += 8 - (element_size % 8);
    }

  area->element_size = element_size;
  area->alloc_count = LF_BITMAP_COUNT_ALIGN (alloc_count);	/* 32-aligned */
  area->block_size = area->element_size * area->alloc_count;
  area->n_allocs = area->n_frees = 0;

#if defined (SERVER_MODE)
  area->failure_function = NULL;
#else
  area->failure_function = ws_abort_transaction;
#endif

  area->blockset_list = area_alloc_blockset (area);
  area->hint_block = area_alloc_block (area);
  area->blockset_list->items[0] = area->hint_block;
  area->blockset_list->used_count++;

  pthread_mutex_init (&area->area_mutex, NULL);

  pthread_mutex_lock (&area_List_lock);
  area->next = area_List;
  area_List = area;
  pthread_mutex_unlock (&area_List_lock);

  return area;
}

Two subtleties. First, LF_BITMAP_COUNT_ALIGN rounds the requested alloc_count up to a multiple of 32 (one bitmap word) — a request for 100 cells per block actually allocates 128. Second, failure_function is wired client-side to ws_abort_transaction so that an allocation failure inside, say, pr_make_value causes the current transaction to be rolled back rather than the call returning NULL up an ill-prepared call stack.
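The macro itself lives in lockfree_bitmap.hpp; the usual shape of such a round-up, as a sketch rather than the verbatim definition:

// 32-alignment — sketch of the usual round-up shape, not the real macro
#define COUNT_ALIGN_32_SKETCH(n)   ((((n) + 31) / 32) * 32)
/* COUNT_ALIGN_32_SKETCH (100) == 128; one bit per cell, 32 cells per bitmap word */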

area_destroy is the symmetric operation: unlink from area_List under area_List_lock, then area_flush to free every block, then free the AREA itself. area_flush walks blockset_list, calls bitmap.destroy and free_and_init on every block, and frees each blockset; it is also the body of area_final (which walks area_List and flushes every pool).

In !NDEBUG builds every cell carries an 8-byte prefix (declared as sizeof(double) for alignment) holding a 4-byte tag:

// debug prefix tags — src/base/area_alloc.c
#define AREA_PREFIX_SIZE sizeof(double)

enum
{
  AREA_PREFIX_INITED = 0,
  AREA_PREFIX_FREED = 0x01010101
};

area_alloc writes AREA_PREFIX_INITED (zero) and returns the pointer past the prefix; area_free reads the prefix back through the cell pointer minus 8 bytes, asserts it equals AREA_PREFIX_INITED, and writes AREA_PREFIX_FREED. A second free reads AREA_PREFIX_FREED and trips ER_AREA_FREE_TWICE. The 0x01010101 pattern is deliberately distinctive: a word of it in memory is easy to spot in a debugger as “freed slab cell”.

The trade-off of the prefix is allocation overhead: a 24-byte DB_VALUE becomes 32 bytes (24 payload + 8 prefix); a 56-byte TP_DOMAIN becomes 64. In production builds the prefix is removed and the cell is the rounded-up payload only.

Eight call sites under src/object/ create AREAs at module-init time. These are the hot-path same-size objects of the client/SA mode — server-mode subsystems like the lock manager or catalog allocate from db_private_alloc on the per-thread arena instead.

| AREA global | Element | area_create site | Purpose |
| --- | --- | --- | --- |
| Value_area | DB_VALUE | pr_area_init (object_primitive.c) | Universal value containers built by pr_make_value |
| tp_Domain_area | TP_DOMAIN | tp_init (object_domain.c) | Type-domain descriptors (one per declared type) |
| Template_area | OBJ_TEMPLATE | obt_area_init (object_template.c) | Object update/insert templates |
| Assignment_area | OBJ_TEMPASSIGN | obt_area_init (object_template.c) | Per-field assignment records inside a template |
| Template_area | SM_TEMPLATE (schema) | classobj_area_init (class_object.c) | Schema-template editing (CREATE/ALTER class) |
| Set_Ref_Area | DB_COLLECTION | set_area_init (set_object.c) | Per-handle reference to a set/multiset/sequence |
| Set_Obj_Area | COL | set_area_init (set_object.c) | The set body (header + storage) |
| Objlist_area | DB_OBJLIST | ws_area_init (work_space.c) | Singly-linked-list nodes for object id lists |

The deck’s “AREA structures used by CUBRID modules” diagram lists the same eight areas under the same eight global-variable names.

Two things to notice. (a) All eight AREAs serve client-side or standalone-mode workloads — the parser/optimiser building parse trees, the workspace tracking dirty objects, set bodies backing collection-typed columns. The server proper (lock manager, page buffer, query executor) does not consult AREA on its hot path. (b) The naming convention is <Subsystem>_area (or <subsystem>_<class>_area); each name maps one-to-one to a specific area_create call and a single object class. Adding a new same-size hot allocation in the engine therefore amounts to declaring a new global AREA *, calling area_create from the subsystem’s init, and routing the alloc/free through area_alloc / area_free.
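A hypothetical ninth pool following that recipe (MY_RECORD, My_record_area, and the function names below are illustrative, not CUBRID source):

// hypothetical ninth AREA — sketch of the convention
typedef struct my_record
{
  int id;
  double payload;
} MY_RECORD;			/* some fixed-size, high-churn object class */

static AREA *My_record_area = NULL;

void
my_area_init (void)
{
  /* called once from the subsystem's boot hook, after area_init () */
  My_record_area = area_create ("My records", sizeof (MY_RECORD), 512);
}

MY_RECORD *
my_record_new (void)
{
  return (MY_RECORD *) area_alloc (My_record_area);	/* hot path */
}

int
my_record_free (MY_RECORD * rec)
{
  return area_free (My_record_area, rec);
}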

The engine’s LF_BITMAP (in src/base/lockfree_bitmap.{hpp,cpp}) fronts every AREA_BLOCK. Two relevant features:

  • Chunking style. LIST_OF_CHUNKS (used by AREA) splits the bitmap into 32-cell chunks each with its own start_idx for round-robin allocation; ONE_CHUNK maintains a single word. AREA always uses LIST_OF_CHUNKS so concurrent allocations from the same block tend to land on distinct chunks.
  • Usage threshold. is_full () returns true when entry_count_in_use / entry_count >= usage_threshold. The threshold is a float set at init: FULL_USAGE_RATIO (1.0) for client/SA mode, NINTETYFIVE_PERCENTILE_USAGE_RATIO (0.95) for server mode. The 95% choice trades a 5% memory premium for reduced contention when the last few cells in a block would otherwise force every allocator onto the same word.

This document treats LF_BITMAP as a black box; the deeper lock-free engineering (CAS retry strategy, ABA prevention, the start_idx round-robin) is a separate analysis target — see Open Questions §1.

Anchor on symbol names, not line numbers. Use git grep -n '<symbol>' src/base/area_alloc.{c,h} to locate the current position.

  • area_init, area_final — module on/off; called from boot_server_main and boot_client_main.
  • area_create — alloc AREA, round size, pre-allocate first blockset+block, link into area_List.
  • area_destroy — unlink, flush, free.
  • area_alloc — three-step alloc (hint / scan / grow); the hot path.
  • area_free — find owning block, validate cell boundary and debug prefix, clear the bit.
  • area_validate — pointer-range sanity check (header-exposed, no in-tree caller).
  • area_flush — free all blocks for one AREA.
  • area_dump — diagnostic; iterates area_List.
  • area_alloc_block — malloc one slab, init the bitmap.
  • area_alloc_blockset — malloc one 256-slot blockset.
  • area_insert_block — sorted append into the chain.
  • area_find_block — chain walk + binary search per blockset.
  • area_info — dump body for one AREA.
  • struct area, struct area_block, struct area_blockset_list.
  • AREA_BLOCKSET_SIZE (area_alloc.h) — 256.
  • AREA_PREFIX_SIZE, AREA_PREFIX_INITED, AREA_PREFIX_FREED (area_alloc.c, !NDEBUG).
  • LF_AREA_BITMAP_USAGE_RATIO — 0.95 for SERVER_MODE, 1.0 otherwise.
  • LF_BITMAP_COUNT_ALIGN (lockfree_bitmap.hpp) — round to multiple of 32.
  • area_List (static AREA *), area_List_lock (pthread_mutex_t, server only).

External AREA owners (one global per object class)

  • Value_area (object_primitive.c) — DB_VALUE.
  • tp_Domain_area (object_domain.c) — TP_DOMAIN.
  • Template_area (object_template.c) — OBJ_TEMPLATE.
  • Assignment_area (object_template.c) — OBJ_TEMPASSIGN.
  • Template_area (class_object.c) — schema SM_TEMPLATE (separate TU-static binding from the object-template one).
  • Set_Ref_Area, Set_Obj_Area (set_object.c) — DB_COLLECTION, COL.
  • Objlist_area (work_space.c) — DB_OBJLIST.
| Symbol | File | Line |
| --- | --- | --- |
| AREA_BLOCKSET_SIZE macro | base/area_alloc.h | 42 |
| struct area_block | base/area_alloc.h | 51 |
| struct area_blockset_list | base/area_alloc.h | 63 |
| struct area | base/area_alloc.h | 74 |
| area_create decl | base/area_alloc.h | 104 |
| area_alloc decl | base/area_alloc.h | 108 |
| area_free decl | base/area_alloc.h | 110 |
| AREA_PREFIX_SIZE | base/area_alloc.c | 60 |
| AREA_PREFIX_INITED / _FREED | base/area_alloc.c | 64 |
| area_List | base/area_alloc.c | 72 |
| area_List_lock | base/area_alloc.c | 74 |
| LF_AREA_BITMAP_USAGE_RATIO | base/area_alloc.c | 78 |
| area_init | base/area_alloc.c | 100 |
| area_final | base/area_alloc.c | 119 |
| area_create | base/area_alloc.c | 146 |
| area_destroy | base/area_alloc.c | 245 |
| area_alloc_block | base/area_alloc.c | 287 |
| area_alloc_blockset | base/area_alloc.c | 323 |
| area_alloc | base/area_alloc.c | 356 |
| area_validate | base/area_alloc.c | 482 |
| area_free | base/area_alloc.c | 509 |
| area_flush | base/area_alloc.c | 592 |
| area_insert_block | base/area_alloc.c | 636 |
| area_find_block | base/area_alloc.c | 703 |
| area_info | base/area_alloc.c | 778 |
| area_dump | base/area_alloc.c | 848 |
| LF_BITMAP_COUNT_ALIGN | base/lockfree_bitmap.hpp | 89 |
| Value_area = area_create(...) | object/object_primitive.c | 1774 |
| Objlist_area = area_create(...) | object/work_space.c | 3852 |
| Template_area = area_create(...) (object) | object/object_template.c | 158 |
| Assignment_area = area_create(...) | object/object_template.c | 164 |
| tp_Domain_area = area_create(...) | object/object_domain.c | 659 |
| Template_area = area_create(...) (schema) | object/class_object.c | 168 |
| Set_Ref_Area = area_create(...) | object/set_object.c | 158 |
| Set_Obj_Area = area_create(...) | object/set_object.c | 167 |

The raw analysis (Korean PDF, v1.0) is broadly accurate but lags in five places:

  1. “AREA is one global per process”. The deck phrases AREA itself as a single pool (“one global variable per process”). The source maintains a list of AREAs (area_List), one per object class; the global is the list head, not a single AREA. The deck’s own “modules using AREA” diagram is correct.

  2. “AREA_BLOCKSET manages allocation lock-free”. True at the bitmap-cell level (LF_BITMAP uses CAS for get_entry / free_entry), but not at the block-add level — area_alloc’s slow path takes area->area_mutex, and area_insert_block runs strictly under that mutex. The lock-free property covers the steady-state hot path only.

  3. “Free list” terminology. The deck describes AREA as maintaining a free list of cells; implementation-wise there is no list — cell occupancy is tracked by the bitmap. The semantics are equivalent.

  4. Hint block. The deck’s struct diagram includes hint_block but does not describe its role. The hint is the single most-impactful structure for performance — it short-cuts step 2 of area_alloc for the common case, and the opportunistic ATOMIC_CAS_ADDR on hint upgrade/downgrade is the main reason chain-scan overhead does not dominate under churn.

  5. Debug prefix. The deck’s struct diagrams show only the release-build cell layout; !NDEBUG builds prepend an 8-byte prefix (AREA_PREFIX_INITED / AREA_PREFIX_FREED) for double-free detection. This explains why element_size printed by area_dump differs from what the subsystem requested.

Open Questions

  1. LF_BITMAP deeper analysis. The deck defers this to a future document, citing JIRA CBRD-21800 (lock_free.h C++ refactor). The bitmap’s CAS retry strategy under high contention, the round-robin start_idx per chunk, and the is_full threshold semantics merit a dedicated walk-through of lockfree_bitmap.cpp (276 lines).

  2. Why is the slow-path lock per-AREA, not global? A single global mutex would suffice (the slow path is rare), and the per-AREA area_mutex adds 40 bytes per AREA. Was the choice contention-driven (eight inits in parallel) or just symmetry with area_List_lock?

  3. LF_AREA_BITMAP_USAGE_RATIO = 0.95 server-mode trade-off. The 5 % buffer reduces CAS contention on the last cells of a block at the cost of wasted memory. Has this been measured against 0.9 / 0.99?

  4. Why doesn’t the server count n_allocs / n_frees? The counter increments are commented out in SERVER_MODE. area_dump still reads them and would print zero — is the dump path intentionally client-only?

  5. Address-sorting fragility under jemalloc / tcmalloc. The blockset’s “mostly sorted” property assumes malloc returns higher addresses on average. Thread-cached / arena-partitioned allocators can break this; the fallback (new blockset on out-of-order address) is correct but degrades to one block per blockset. Worth instrumenting.

  6. area_validate callers. The function is exposed in the header but no external callers were found in current source — is it dead, or used only by client (db_* API) code?

Raw analyses (raw/code-analysis/cubrid/common/)

  • code_analysis_Common_module_AREA.pdf — primary deck (Korean, v1.0, hgryoo). Section structure: introduction to the AREA structure, internal structure of AREA, AREA-related functions, CUBRID modules using the AREA structure, TODO. Four pages, three figures (struct layout, BLOCKSET layout, modules-using-AREA diagram).
  • _converted/common-module-area.pdf.md — markitdown extract of the same.
  • knowledge/code-analysis/cubrid/cubrid-page-buffer-manager.md — uses db_private_alloc / malloc rather than AREA, but shares the design idiom of “per-class pool with lock-free bitmap” via LF_BITMAP instances inside each AREA_BLOCK.
  • knowledge/code-analysis/cubrid/cubrid-heartbeat.md — different module; cited as the style template for this document.
  • Database Internals (Petrov), Ch. 4 “Implementing B-Trees” — references slab/pool allocation in the context of B-tree node cache.
  • Database System Concepts (Silberschatz, Korth, Sudarshan, 6th ed.), Ch. 13 “Storage and File Structure” — buffer-manager framing under which same-size allocators sit.
  • Bonwick, The Slab Allocator: An Object-Caching Kernel Memory Allocator, USENIX Summer 1994 — the canonical slab paper; introduces the per-class cache and constructor/destructor pattern that AREA simplifies away.

CUBRID source (/data/hgryoo/references/cubrid/)

  • src/base/area_alloc.h — public types and API.
  • src/base/area_alloc.c — implementation.
  • src/base/lockfree_bitmap.{hpp,cpp} — the per-block bitmap used for cell-level lock-free allocation.
  • src/object/object_primitive.c — Value_area setup and pr_make_value / pr_clear_value callers.
  • src/object/object_domain.c — tp_Domain_area.
  • src/object/object_template.c — Template_area (object) and Assignment_area.
  • src/object/class_object.c — Template_area (schema).
  • src/object/set_object.c — Set_Ref_Area and Set_Obj_Area.
  • src/object/work_space.c — Objlist_area.
  • src/transaction/boot_sr.c, src/transaction/boot_cl.c — call area_init at boot, area_final at shutdown.
  • src/executables/unittests_area.c — the multi-threaded area_alloc / area_free stress test.