CUBRID Private Allocator — Per-Thread Lea Heap, C++ STL Allocator Wrapper, and Build-Mode Routing
Contents:
- Theoretical Background
- CUBRID’s Approach
- Source Walkthrough
- Source verification (as of 2026-05-07)
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
A general-purpose malloc becomes a contention point when a query
engine churns through millions of small allocations per second
across hundreds of worker threads. Two costs dominate. First, a
single global free-list serialises every malloc/free behind
one mutex (or, on modern allocators, behind a per-size-class
fast-path that still ends up sharing cache lines on the slow
path). Second, allocations cross threads freely: a worker that
frees what another thread allocated returns the block to the
allocator’s idea of a thread-local cache, not the worker’s,
fragmenting the heap into a jumble of size classes that no thread
re-uses cleanly. The textbook answer is the per-thread arena
popularised by tcmalloc and jemalloc: each worker has its
own bookkeeping pool, allocations are served and freed locally,
and freed regions stay in the same arena that produced them.
Berger et al.’s Hoard (ASPLOS 2000) was the first widely cited
academic version of this argument; tcmalloc (Ghemawat & Menage,
2007) and jemalloc (Evans, 2006) industrialised it.
CUBRID predates both of those libraries, so its solution is older
and simpler. It bundles Doug Lea’s dlmalloc — the free-list-based
fast-bin allocator from which GNU libc’s malloc (ptmalloc)
descends — as a self-contained internal allocator
(customheaps), instantiates one Lea heap per worker thread,
and funnels every “engine-internal but transaction-scoped”
allocation through it. The result is a per-thread arena built
out of a classical algorithm, sitting one indirection above OS
malloc and one below the heap-using subsystems. The arena
layer is customheaps’s hl_register_lea_heap /
hl_lea_alloc / hl_lea_free family; the per-thread state
is THREAD_ENTRY::private_heap_id; the public façade is the
db_private_alloc / db_private_free / db_private_realloc
macro family.
A second concern, orthogonal to contention, is the C++ side.
The STL containers (std::vector, std::map, std::list, …)
take an allocator template parameter that must implement a
fixed concept (Stepanov’s allocator model from SGI STL,
codified in C++03 §20.1.5 and relaxed in C++11). CUBRID wants
its containers to allocate out of the same per-thread heap as
its C-side code, so it wraps the C db_private_alloc API in
cubmem::private_allocator<T> — a (thread_p, heap_id)-pair
allocator type that satisfies the STL concept and lets
std::vector<int, private_allocator<int>> route through the
per-thread heap automatically. private_unique_ptr<T> is the
unique-pointer flavour; PRIVATE_BLOCK_ALLOCATOR is the
cubmem::block_allocator flavour for code that streams into
expandable byte buffers.
The third axis is the build-mode split (cubrid-sa-cs-runtime.md).
The same source compiles three ways:
- cub_server (SERVER_MODE) — the engine runs as a daemon, every request is on a worker thread, and the per-thread Lea heap is the natural place to allocate.
- libcubridcs (CS_MODE) — the client side of a network split; there is no server-side heap, so allocations go to the client workspace (db_ws_alloc).
- libcubridsa (SA_MODE) — the engine is linked into a single-threaded admin utility; allocations go either to one global Lea heap (private_heap_id) when on-the-server logic is active, or to the workspace when it is not.
db_private_alloc is one façade across these three regimes.
The discriminator on SA_MODE is a per-block header
(PRIVATE_MALLOC_HEADER) that records whether the block came
from the Lea heap or from the workspace, so that
db_private_free can route the deallocation back through the
correct path even when the caller has no idea which it was.
SERVER_MODE does not need this header: the routing on free
is decided by the thread’s current private_heap_id, not by
an in-band tag.
CUBRID’s Approach
A THREAD_ENTRY carries an HL_HEAPID private_heap_id. The
heap itself is a customheaps-managed Lea heap (Doug Lea’s
dlmalloc, vendored under external/). When a server thread
starts, the boot sequence calls db_create_private_heap()
which delegates to hl_register_lea_heap() and stores the
returned id on the thread entry. From that point on, every
db_private_alloc(thread_p, size) call resolves the thread
entry’s private_heap_id and routes the request to
hl_lea_alloc. At thread exit, db_destroy_private_heap
calls hl_unregister_lea_heap to drop the entire heap;
db_clear_private_heap clears it without dropping (used at
request boundaries to recycle the heap without paying the
re-registration cost).
```c
// db_create_private_heap — src/base/memory_alloc.c
HL_HEAPID
db_create_private_heap (void)
{
  HL_HEAPID heap_id = 0;
#if defined (SERVER_MODE)
  heap_id = hl_register_lea_heap ();
#else /* SERVER_MODE */
  if (db_on_server)
    {
      heap_id = hl_register_lea_heap ();
    }
#endif /* SERVER_MODE */
  return heap_id;
}
```

Two helpers manage transient swaps. db_change_private_heap
swaps the per-thread heap id and returns the old one, used to
“isolate” a sub-tree of allocations into a private heap (for
example, the parser’s per-PARSER_CONTEXT heap so the entire
tree can be freed in one db_destroy_private_heap after the
session is done). db_replace_private_heap allocates a fresh
heap and stores it in place of the existing one, returning
the old id so the caller can db_destroy_private_heap it
once it is done with the orphan.
The C-side allocation entry point is db_private_alloc. In
SERVER_MODE it reads the calling thread’s private_heap_id
and calls hl_lea_alloc, falling back to plain malloc if
the heap id is zero (which means “the thread is not yet
fully initialised, use the OS allocator and hope someone
frees it later”):
```c
// db_private_alloc_release — src/base/memory_alloc.c (SERVER_MODE branch)
heap_id = db_private_get_heapid_from_thread (thrd);
if (heap_id)
  {
    ptr = hl_lea_alloc (heap_id, size);
  }
else
  {
    ptr = malloc (size);
    if (ptr == NULL)
      {
        er_set (ER_ERROR_SEVERITY, ARG_FILE_LINE, ER_OUT_OF_VIRTUAL_MEMORY, 1, size);
      }
  }
```

In CS_MODE the same call is rerouted to the client-side
workspace (db_ws_alloc); in SA_MODE it tags the block with
a PRIVATE_MALLOC_HEADER so a later db_private_free can
route it correctly:
```c
// db_private_alloc_release — src/base/memory_alloc.c (SA_MODE branch, condensed)
if (private_heap_id)
  {
    PRIVATE_MALLOC_HEADER *h;
    size_t req_sz = private_request_size (size);

    h = (PRIVATE_MALLOC_HEADER *) hl_lea_alloc (private_heap_id, req_sz);
    if (h != NULL)
      {
        h->magic = PRIVATE_MALLOC_HEADER_MAGIC;
        h->alloc_type = PRIVATE_ALLOC_TYPE_LEA;
        return private_hl2user_ptr (h);
      }
    return NULL;
  }
else
  {
    return malloc (size);
  }
```

The header is 8 bytes, magic-checked on free, and stores
either PRIVATE_ALLOC_TYPE_LEA (came from the Lea heap) or
PRIVATE_ALLOC_TYPE_WS (came from the workspace). On free,
db_private_free looks at the magic, asserts it, reads the
type, and dispatches:
```c
// db_private_free_release — src/base/memory_alloc.c (SA_MODE branch, condensed)
PRIVATE_MALLOC_HEADER *h = private_user2hl_ptr (ptr);

if (h->magic != PRIVATE_MALLOC_HEADER_MAGIC)
  {
    assert (false);
    return;
  }
if (h->alloc_type == PRIVATE_ALLOC_TYPE_LEA)
  {
    hl_lea_free (private_heap_id, h);
  }
else if (h->alloc_type == PRIVATE_ALLOC_TYPE_WS)
  {
    db_ws_free (ptr);
  }
```

The dispatch graph across the three build modes is:

```mermaid
flowchart LR
    CALL["db_private_alloc(thread_p, size)"]
    subgraph SVR["SERVER_MODE"]
        SVR_HID["thread_p->private_heap_id"]
        SVR_NZ{"id != 0?"}
        SVR_LEA["hl_lea_alloc(id, size)"]
        SVR_MAL["malloc(size)"]
    end
    subgraph CSM["CS_MODE"]
        CS_WS["db_ws_alloc(size)"]
    end
    subgraph SAM["SA_MODE"]
        SA_ON{"db_on_server?"}
        SA_WS["db_ws_alloc(size)"]
        SA_HID{"private_heap_id != 0?"}
        SA_HDR["wrap with<br/>PRIVATE_MALLOC_HEADER<br/>type = LEA"]
        SA_LEA["hl_lea_alloc"]
        SA_MAL["malloc(size)"]
    end
    CALL --> SVR
    CALL --> CSM
    CALL --> SAM
    SVR --> SVR_HID --> SVR_NZ
    SVR_NZ -- yes --> SVR_LEA
    SVR_NZ -- no --> SVR_MAL
    CSM --> CS_WS
    SAM --> SA_ON
    SA_ON -- no --> SA_WS
    SA_ON -- yes --> SA_HID
    SA_HID -- yes --> SA_HDR --> SA_LEA
    SA_HID -- no --> SA_MAL
```
The C++ wrapper cubmem::private_allocator<T> adds nothing of
its own — it captures (thread_p, heap_id) at construction,
calls get_private_heapid to resolve a NULL thread to
cubthread::get_entry(), and forwards allocate(count) to
private_heap_allocate(thread_p, heap_id, count * sizeof(T)):
```cpp
// cubmem::private_allocator<T> — src/base/memory_private_allocator.hpp
template <typename T>
private_allocator<T>::private_allocator (cubthread::entry *thread_p)
  : m_thread_p (thread_p)
{
  m_heapid = get_private_heapid (m_thread_p);
  register_private_allocator (m_thread_p);
}

template <typename T>
typename private_allocator<T>::pointer
private_allocator<T>::allocate (size_type count)
{
  return reinterpret_cast<T *> (private_heap_allocate (m_thread_p, m_heapid, count * sizeof (T)));
}
```

The class is stateful only in those two members; equality
(operator==) returns true unconditionally so STL containers
treat any two private_allocator<T> instances as
interchangeable — consistent with the C++ allocator concept’s
“equal-or-rebound” requirement and consistent with the fact
that, on SERVER_MODE, every private_allocator<T> constructed
on the same thread does end up routing to the same heap.
The cross-thread case (allocator constructed on thread A,
deallocator called on thread B) is the corner the runtime
asserts against:
```cpp
// cubmem::private_heap_deallocate — src/base/memory_private_allocator.cpp
if (heapid != thread_p->private_heap_id)
  {
    /* this is not something we should do! */
    assert (false);

    HL_HEAPID save_heapid = db_private_set_heapid_to_thread (thread_p, heapid);
    db_private_free (thread_p, ptr);
    (void) db_private_set_heapid_to_thread (thread_p, save_heapid);
  }
else
  {
    db_private_free (thread_p, ptr);
  }
```

The fallback path (swap heap, free, swap back) lets the call
succeed even on the wrong thread, but the assert (false) is
how the engine signals that this is a bug in the caller.
cubmem::private_unique_ptr<T> is a thin wrapper around
std::unique_ptr<T, private_pointer_deleter<T>> whose deleter
calls db_private_free(thread_p, ptr). This is the standard
way to hold a pointer that was allocated through the private
allocator without forgetting to release it on the right heap.
PRIVATE_BLOCK_ALLOCATOR is a cubmem::block_allocator that
wraps db_private_alloc / _realloc / _free for the
cubmem::block abstraction (mem_block.hpp). Consumers of
mem_block-based containers (e.g., extensible_array, the
streaming buffer in packing_packer) plug in this allocator
to get the same per-thread routing without manually building
the C++ allocator wrapper.
switch_to_global_allocator_and_call(func, args...) is the
escape hatch:
```cpp
// cubmem::switch_to_global_allocator_and_call — src/base/memory_private_allocator.hpp
HL_HEAPID save_id = db_change_private_heap (NULL, 0);
func (std::forward<Args> (args)...);
(void) db_change_private_heap (NULL, save_id);
```

It calls db_change_private_heap(NULL, 0) to deactivate the
per-thread heap (sending allocations back to plain malloc),
runs func, and restores the previous heap id. This is used
by code that must allocate something the per-thread heap will
never free — for example, a string interned into a process-
global symbol table, or an OS handle whose lifetime exceeds
the thread that allocated it.
fixed_size_allocator<T, /*is_private=*/true> (in
fixed_size_allocator.hpp) sits one layer up: it carves
fixed-size cells out of block<T> (an array of 256 nodes
sized sizeof(T)), allocates each block through
private_allocator<block<T>>, and chains free cells in a
singly-linked list:
```cpp
// cubmem::fixed_size_alloc::allocator<T, true>::expand — src/base/fixed_size_allocator.hpp
void *raw_mem = m_allocator.allocate (1);
auto deleter = [alloc = &m_allocator] (block<T> *ptr)
{
  ptr->~block ();
  alloc->deallocate (ptr);
};
m_blocks.push_back (std::shared_ptr<block<T>> (new (raw_mem) block<T> (), deleter));
for (node<T> &node : m_blocks.back ()->nodes)
  {
    /* thread the new block's nodes onto the free list */
  }
```

It is morphologically a mini-AREA (cubrid-common-area.md)
parameterised on a C++ type rather than a runtime byte size.
Lock-free freelists and hashmaps that need typed pools use
this when the type isn’t already in AREA’s hard-coded list.
Debug builds add a per-thread leak counter through
register_private_allocator /
deregister_private_allocator, which increment / decrement
thread_p->count_private_allocators. The release build
compiles both functions to no-ops:
```cpp
// cubmem::register_private_allocator — src/base/memory_private_allocator.cpp
void
register_private_allocator (cubthread::entry *thread_p)
{
#if defined (SERVER_MODE) && !defined (NDEBUG)
  thread_p->count_private_allocators++;
#else
  (void) thread_p;
#endif
}
```

Debug builds also wire the C-side db_private_alloc_debug
into cuberr::resource_tracker, which records (file, line, ptr) per allocation and warns at thread exit if the counts
don’t balance. The wrapping happens in memory_alloc.h
through the _debug macro family:
```c
#if !defined(NDEBUG)
#define db_private_alloc(thrd, size) \
  db_private_alloc_debug(thrd, size, true, __FILE__, __LINE__)
#else
#define db_private_alloc(thrd, size) \
  db_private_alloc_release(thrd, size, false)
#endif
```

The rc_track argument is the bool the tracker keys on; the
_external variants exist for callers that want to allocate
from the private heap without participating in the tracker
(public API surface, where the tracker would warn for blocks
the engine intentionally returns to the caller).
Source Walkthrough
Heap creation and per-thread state
- db_create_private_heap (memory_alloc.c) — registers a Lea heap via hl_register_lea_heap and returns the id.
- db_destroy_private_heap (memory_alloc.c) — unregisters the heap; can be called with heap_id == 0 to mean "the thread's current heap".
- db_clear_private_heap (memory_alloc.c) — hl_clear_lea_heap, recycles the heap without dropping it.
- db_change_private_heap (memory_alloc.c) — swap-and-return the per-thread heap id.
- db_replace_private_heap (memory_alloc.c) — create a fresh heap, store it as the thread's heap, return the old id.
- db_private_get_heapid_from_thread (memory_alloc.c, static, SERVER_MODE only) — accessor.
- db_private_set_heapid_to_thread (memory_alloc.c, SERVER_MODE only) — setter, returns the old id.
- THREAD_ENTRY::private_heap_id (thread_entry.hpp) — per-thread heap id.
C-side allocation API
- db_private_alloc macro (memory_alloc.h) — NDEBUG-aware dispatch into db_private_alloc_release or db_private_alloc_debug.
- db_private_alloc_release / db_private_alloc_debug (memory_alloc.c) — build-mode-aware: CS_MODE forwards to db_ws_alloc; SERVER_MODE reads the thread's heap id and calls hl_lea_alloc, falling back to malloc if the heap id is zero; SA_MODE wraps the request in a PRIVATE_MALLOC_HEADER keyed LEA and goes through the global private_heap_id, falling back to malloc if db_on_server is false or the global heap isn't set.
- db_private_realloc_release / _debug (memory_alloc.c) — same dispatch with hl_lea_realloc / db_ws_realloc underneath; SA_MODE re-keys on the block's existing alloc_type.
- db_private_free_release / _debug (memory_alloc.c) — SA_MODE reads the PRIVATE_MALLOC_HEADER to decide between hl_lea_free and db_ws_free; the magic check is an assert (false) on mismatch.
- db_private_strdup / db_private_strndup (memory_alloc.c) — convenience wrappers.
- db_private_alloc_external / db_private_free_external / db_private_realloc_external (memory_alloc.c) — non-tracking wrappers for callers on the public API surface.
Resource-tracker integration (debug only)
- cuberr::resource_tracker (resource_tracker.hpp) — per-thread (file, line, ptr) log; increment on alloc, decrement on free; reports leaks at thread exit.
- The rc_track argument threaded through every _debug variant gates the per-call participation.
C++ STL allocator wrapper
- cubmem::private_allocator<T> (memory_private_allocator.hpp) — STL allocator concept, captures (thread_p, heap_id) at construction, forwards allocate / deallocate to private_heap_allocate / _deallocate.
- cubmem::private_unique_ptr<T> / cubmem::private_pointer_deleter<T> (memory_private_allocator.hpp) — unique-pointer wrapper.
- cubmem::PRIVATE_BLOCK_ALLOCATOR (memory_private_allocator.cpp) — block_allocator for cubmem::block-based containers, doubles the block on realloc growth.
- cubmem::get_private_heapid (memory_private_allocator.cpp) — resolves a NULL thread to cubthread::get_entry(); SA_MODE returns 0.
- cubmem::private_heap_allocate / _deallocate (memory_private_allocator.cpp) — cross-heap-id assertion-and-fallback path.
- cubmem::register_private_allocator / cubmem::deregister_private_allocator (memory_private_allocator.cpp) — debug-only count_private_allocators increment / decrement.
- cubmem::switch_to_global_allocator_and_call (memory_private_allocator.hpp) — temporarily deactivates the per-thread heap.
One layer up: typed slab on top of the private allocator
- cubmem::fixed_size_alloc::allocator<T, true> (fixed_size_allocator.hpp) — typed slab of block<T> nodes; blocks come from private_allocator<block<T>>, free list threaded through node<T>::m_next.
- cubmem::fixed_size_alloc::allocator<T, false> (fixed_size_allocator.hpp) — same slab on top of std::unique_ptr<block<T>> (i.e., plain new/delete); the is_private boolean picks one or the other.
SA_MODE in-band tagging
- PRIVATE_MALLOC_HEADER (memory_alloc.h) — 8-byte header with magic + alloc_type.
- PRIVATE_ALLOC_TYPE_LEA / PRIVATE_ALLOC_TYPE_WS (memory_alloc.h) — alloc_type enum.
- private_request_size / private_hl2user_ptr / private_user2hl_ptr (memory_alloc.h) — pointer-arithmetic macros that hop across the header.
Source verification (as of 2026-05-07)
| Symbol | File | Line |
|---|---|---|
| db_create_private_heap | src/base/memory_alloc.c | ~295 |
| db_clear_private_heap | src/base/memory_alloc.c | ~316 |
| db_change_private_heap | src/base/memory_alloc.c | ~338 |
| db_replace_private_heap | src/base/memory_alloc.c | ~360 |
| db_destroy_private_heap | src/base/memory_alloc.c | ~390 |
| db_private_alloc_release | src/base/memory_alloc.c | ~437 |
| db_private_realloc_release | src/base/memory_alloc.c | ~571 |
| db_private_free_release | src/base/memory_alloc.c | ~780 |
| db_private_strdup | src/base/memory_alloc.c | ~697 |
| db_private_get_heapid_from_thread | src/base/memory_alloc.c | ~1002 |
| db_private_set_heapid_to_thread | src/base/memory_alloc.c | ~1020 |
| cubmem::private_allocator<T> (decl) | src/base/memory_private_allocator.hpp | ~57 |
| cubmem::private_allocator<T>::allocate | src/base/memory_private_allocator.hpp | ~225 |
| cubmem::switch_to_global_allocator_and_call | src/base/memory_private_allocator.hpp | ~349 |
| cubmem::PRIVATE_BLOCK_ALLOCATOR | src/base/memory_private_allocator.cpp | ~76 |
| cubmem::get_private_heapid | src/base/memory_private_allocator.cpp | ~82 |
| cubmem::private_heap_allocate | src/base/memory_private_allocator.cpp | ~96 |
| cubmem::private_heap_deallocate | src/base/memory_private_allocator.cpp | ~118 |
| cubmem::register_private_allocator | src/base/memory_private_allocator.cpp | ~138 |
| cubmem::fixed_size_alloc::allocator<T, true>::expand | src/base/fixed_size_allocator.hpp | ~191 |
| PRIVATE_MALLOC_HEADER | src/base/memory_alloc.h | ~301 |
Line numbers are hints scoped to this revision. Anchor on the symbol name when the file shifts.
Cross-check Notes
- AREA vs private allocator. cubrid-common-area.md (AREA) is for fixed-size objects whose lifetime spans many requests; the private allocator is for variable-size allocations whose lifetime is bounded by the request (or by an explicit db_clear_private_heap cycle). The two coexist: AREA never goes through db_private_alloc for its individual cells, but AREA's per-block bookkeeping arrays are allocated through db_private_alloc.
- Workspace vs private allocator on the client. On the client side (CS_MODE), every db_private_alloc is rerouted to db_ws_alloc (work_space.c), which is the OID-keyed object workspace. The "private allocator" name is a server-side abstraction; on the client the same calls land on the workspace's allocator. This is what lets header-only code in dbi-cci use db_private_alloc without caring about which side it ends up on.
- SA_MODE header tag. SA_MODE adds an 8-byte PRIVATE_MALLOC_HEADER to every block to remember whether the block came from the Lea heap or the workspace. SERVER_MODE does not do this — the routing on free is decided by the thread's current private_heap_id, not by an in-band tag. Documented in memory_alloc.h (the #if defined (SA_MODE) block) and exercised in db_private_alloc_release / db_private_realloc_release / db_private_free_release (the SA_MODE branches).
- Lea heap is dlmalloc. customheaps is a vendored copy of Doug Lea's dlmalloc from the late 1990s, exposed through hl_lea_alloc / _free / _realloc. The wrapper lives under src/base/customheaps.{c,h} and the vendored algorithm sources under external/. The choice predates tcmalloc / jemalloc — see cubrid-design-philosophy.md for the rationale.
- assert(false) on cross-heap free. private_heap_allocate and private_heap_deallocate in the C++ wrapper assert when the requested heap does not match the thread's current heap. The fallback path that swaps the heap id temporarily and restores it after the call exists for legitimate cross-heap use cases (e.g., switch_to_global_allocator_and_call's deactivation) but is considered a code smell elsewhere.
- private_allocator<T>::operator== is unconditionally true. This is consistent with the STL allocator concept's "interchangeable instances" requirement and with the fact that, on SERVER_MODE, every same-thread private_allocator<T> does route to the same heap. Two private_allocator<T> instances constructed on different threads compare equal, which would be a bug if a container were ever moved between threads — but in practice STL containers in CUBRID are bound to a single thread for their lifetime.
Open Questions
- Why two parallel APIs (db_private_alloc vs db_private_alloc_external)? The only difference is whether the resource tracker is engaged (rc_track = false for external, true for internal). The split is presumably a build-time choice for headers that are exposed outside the engine, but it is not documented in source.
- Windows stubs return NULL. db_private_alloc_release / _debug and the realloc / free variants are stubbed to return NULL on Windows (#if defined (WINDOWS) in memory_alloc.c). Either the Windows build is degraded or there is a Windows-specific path living elsewhere; the current source does not say which.
- count_private_allocators has no consumer in this file. register_private_allocator increments the counter, but no thread-exit consumer of it is visible in memory_private_allocator.cpp or memory_alloc.c. It may be checked by a thread_entry destructor, but that is unconfirmed.
Sources
- src/base/memory_alloc.{h,c} — C-side API and per-thread heap state.
- src/base/memory_private_allocator.{hpp,cpp} — C++ STL allocator wrapper.
- src/base/fixed_size_allocator.hpp — typed slab on top of the private allocator.
- src/base/customheaps.{c,h} — Lea-heap (dlmalloc) wrappers.
- src/thread/thread_entry.hpp — THREAD_ENTRY::private_heap_id.
- src/base/resource_tracker.hpp — debug-build leak tracker.
- src/object/work_space.c — db_ws_alloc / db_ws_free / db_ws_realloc, the client-side workspace allocator.
- Cross-references:
  - cubrid-common-area.md — AREA slab pool (fixed-size objects).
  - cubrid-sa-cs-runtime.md — SA_MODE / SERVER_MODE / CS_MODE build split.
  - cubrid-thread-worker-pool.md — cubthread::entry hosts private_heap_id.
  - cubrid-thread-manager-ng.md — per-worker context freelists use private_allocator<T> for STL-based caches.
  - cubrid-overview-base-infra.md — section overview.
  - cubrid-design-philosophy.md — historical reason for using vendored dlmalloc rather than tcmalloc / jemalloc.