Jan 2026
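CUBRID in the AI Era
CUBVEC: Vector Search · Private LLMs · From OLTP to Analytics
A summary of how CUBRID is tackling the new roles that AI and LLM environments have placed on the DBMS, vector search in particular. Covers the CUBVEC project's index structure, integration with embedding models, the data pipelines that in-house (private) LLMs require, and the direction of expanding beyond OLTP into analytical workloads.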
Talks & Notes
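Mostly a collection of talk materials. Code-analysis notes and memos on books and papers I have read are kept alongside. My interests are database internals and the systems software around them.
See also · Notes & Analyses ↗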
Multi-Granularity Locking, Conversion, Deadlock Detection
How the textbook 2PL and multi-granularity locking get realized on top of CUBRID's OID-shaped objects. Walks through the core data structures (LK_ENTRY, LK_RES, the aggregate-mode cache), the 12-mode compatibility matrix, the acquisition flow in lock_object, how the release path is scoped by MVCC-disabled class set and isolation level (with X locks always commit-bound), and the waits-for-graph deadlock detector with its most-recently-blocked victim policy.
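As a rough illustration of the last point, a waits-for-graph detector with a most-recently-blocked victim policy can be sketched as follows. This is a minimal sketch, not CUBRID's code: the names `find_cycle`, `pick_victim`, and `blocked_at` are hypothetical, and "most recently blocked" is read here as "the transaction in the cycle with the latest block timestamp".

```python
from typing import Dict, List, Optional, Set


def find_cycle(waits_for: Dict[int, Set[int]]) -> Optional[List[int]]:
    """Return one cycle of transaction ids in the waits-for graph, or None.

    waits_for[t] is the set of transactions that t is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the DFS stack / finished
    color: Dict[int, int] = {t: WHITE for t in waits_for}
    stack: List[int] = []

    def visit(t: int) -> Optional[List[int]]:
        color[t] = GREY
        stack.append(t)
        for holder in waits_for.get(t, ()):
            state = color.get(holder, WHITE)
            if state == GREY:
                # Back edge: everything from holder to the stack top is a cycle.
                return stack[stack.index(holder):]
            if state == WHITE:
                found = visit(holder)
                if found:
                    return found
        stack.pop()
        color[t] = BLACK
        return None

    for t in list(waits_for):
        if color[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


def pick_victim(cycle: List[int], blocked_at: Dict[int, float]) -> int:
    """Hypothetical policy: abort the member of the cycle that blocked last."""
    return max(cycle, key=lambda t: blocked_at[t])
```

With `waits_for = {1: {2}, 2: {3}, 3: {1}}`, the detector reports the cycle 1 → 2 → 3, and the victim is whichever of the three has the largest `blocked_at` value. Choosing the latest blocker tends to discard the least accumulated waiting, which is one plausible motivation for such a policy.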
Processes, Layered Stack, Query Pipeline, Distribution
Front-door router for the CUBRID code-analysis tree. Names the four long-lived processes (cub_master, cub_server, cub_pl, cub_broker + cub_cas) and their IPC; walks the layered storage stack from disk to workspace; traces the query pipeline from parser to cursor; sketches the concurrency axis (MVCC + lock + log + checkpoint + recovery + DWB), the distribution layer (heartbeat + HA + CDC + 2PC + flashback + backup), the PL family, and the cross-cutting infrastructure. One diagram per axis, with cross-refs into ~70 detail docs across eight subcategories.
OID Workspace, Bulk Fetch/Flush, Server-Side Insert/Update/Delete Bridge
How CUBRID bridges in-memory objects and on-disk OIDs. A client-side workspace of MOPs marshals dirty objects into bidirectional LC_COPYAREA buffers (LC_COPYAREA_MANYOBJS + LC_COPYAREA_ONEOBJ records); locator_mflush_cache batches them at flush time; the server-side locator_attribute_info_force / locator_{insert,update,delete}_force family is the one canonical entry point that fans out into heap, btree, lock, log, FK, and replication.
Snapshot Construction, Active-MVCCID Tracking, Vacuum Coordination
How CUBRID implements snapshot isolation on top of a lazily-issued 64-bit MVCCID, a bit-array-plus-long-tran active set (`mvcc_active_tran`), and a 2048-slot history ring (`mvcctable`). Walks through the per-record `mvcc_rec_header`, the lock-free snapshot construction in `build_mvcc_info`, the three-valued `mvcc_satisfies_snapshot` predicate, and how `complete_mvcc` advances the ring and the `m_oldest_visible` watermark that gates vacuum.
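The three-valued visibility check can be illustrated with a generic snapshot-isolation sketch. Everything here is an assumption for illustration: the `Snapshot` fields, the `SnapshotResult` names, and the simplification that every MVCCID outside the active set has committed (aborts are ignored). It is not the layout of `mvcc_active_tran` or the actual logic of `mvcc_satisfies_snapshot`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class SnapshotResult(Enum):
    TOO_OLD = "too_old"      # deleted by a transaction the snapshot sees
    SATISFIED = "satisfied"  # visible to the snapshot
    TOO_NEW = "too_new"      # inserted by a transaction the snapshot cannot see


@dataclass(frozen=True)
class Snapshot:
    lowest_active: int               # every id below this had finished
    next_id: int                     # first id not yet issued at snapshot time
    active: FrozenSet[int] = frozenset()  # in-flight ids in [lowest_active, next_id)

    def sees(self, mvccid: int) -> bool:
        """True if the transaction had finished when the snapshot was taken."""
        if mvccid < self.lowest_active:
            return True
        if mvccid >= self.next_id:
            return False
        return mvccid not in self.active


def satisfies_snapshot(insert_id: int, delete_id: Optional[int],
                       snap: Snapshot) -> SnapshotResult:
    if not snap.sees(insert_id):
        return SnapshotResult.TOO_NEW
    if delete_id is None or not snap.sees(delete_id):
        return SnapshotResult.SATISFIED
    return SnapshotResult.TOO_OLD
```

The two range checks in `sees` are what make a compact active set workable: only ids between the two bounds need a membership test.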
Parser, Optimizer, XASL, Executor, Access Methods
How CUBRID turns a SQL string into a stream of tuples. Flex+Bison parses into a PT_NODE tree, semantic check resolves names and types and pushes predicates to CNF, the rewriter flattens subqueries / inlines views / lowers LIMIT, a DP-style join enumerator with a System R cost model emits a QO_PLAN that the XGEN walk lowers to an XASL_NODE tree, a SHA-1 keyed XASL cache short-circuits compile work on re-execute, the Volcano-style executor drives open/next/close over SCAN_ID handles that dispatch across heap / btree / list / value / json-table / show / dblink access methods, predicate evaluation reuses one PRED_EXPR walker for scan filters and joins, and every materialised tuple stream lands on the QFILE_LIST_ID list-file substrate that overflows to FILE_TEMP before the cursor hands rows back through the broker / CCI.
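The open/next/close contract of a Volcano-style executor is easy to show in miniature. The sketch below is a generic iterator-model example with hypothetical operator classes (`ListScan`, `Filter`, `Project`); it does not mirror CUBRID's SCAN_ID or XASL_NODE structures.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]


class ListScan:
    """Leaf operator: iterates over an in-memory list of rows."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._pos: Optional[int] = None

    def open(self) -> None:
        self._pos = 0

    def next(self) -> Optional[Row]:
        if self._pos is None:
            raise RuntimeError("next() called before open()")
        if self._pos >= len(self._rows):
            return None  # end of stream
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def close(self) -> None:
        self._pos = None


class Filter:
    """Pulls rows from its child and passes on those matching the predicate."""

    def __init__(self, child, pred: Callable[[Row], bool]) -> None:
        self._child, self._pred = child, pred

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[Row]:
        while (row := self._child.next()) is not None:
            if self._pred(row):
                return row
        return None

    def close(self) -> None:
        self._child.close()


class Project:
    """Keeps only the named columns of each row."""

    def __init__(self, child, cols: List[str]) -> None:
        self._child, self._cols = child, cols

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[Row]:
        row = self._child.next()
        return None if row is None else {c: row[c] for c in self._cols}

    def close(self) -> None:
        self._child.close()


def run(plan) -> List[Row]:
    """Drive the root operator: open once, pull until exhausted, then close."""
    plan.open()
    try:
        out = []
        while (row := plan.next()) is not None:
            out.append(row)
        return out
    finally:
        plan.close()
```

Because every operator exposes the same three calls, the root can pull one tuple at a time without knowing which access method sits at the leaves.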
Disks, Pages, Heaps, B+Trees, Hash, Overflow
A layered tour of the CUBRID storage engine: disk_manager owns volumes, sectors (64-page allocation units), files, and pages; the page buffer caches pages through a per-bucket hash and a three-zone LRU split into per-thread private lists with adjustable quotas plus a shared list; the double-write buffer stages every dirty page sequentially before the home write to defeat torn writes; heap_manager, btree, and extendible_hash sit on top of the buffer, never on the disk. WAL ordering is structural — the buffer refuses to flush a dirty page until the matching log LSA is durable.
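The WAL rule in the last sentence can be sketched as a guard in the flush path. This is a toy model under stated assumptions: `LogManager`, `Page`, `PageBuffer`, and integer LSAs are all hypothetical stand-ins, and the double-write buffer is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict


class LogManager:
    """Toy log: hands out increasing LSAs and tracks how far it is durable."""

    def __init__(self) -> None:
        self.next_lsa = 1
        self.durable_lsa = 0

    def append(self) -> int:
        lsa = self.next_lsa
        self.next_lsa += 1
        return lsa

    def flush_up_to(self, lsa: int) -> None:
        # Cannot make durable what has not been appended yet.
        self.durable_lsa = max(self.durable_lsa, min(lsa, self.next_lsa - 1))


@dataclass
class Page:
    page_id: int
    lsa: int = 0        # LSA of the last log record that changed this page
    dirty: bool = False


class PageBuffer:
    def __init__(self, log: LogManager) -> None:
        self.log = log
        self.disk: Dict[int, int] = {}  # page_id -> LSA of the version on disk

    def update(self, page: Page) -> None:
        """Log first, then stamp the page with the record's LSA."""
        page.lsa = self.log.append()
        page.dirty = True

    def try_flush(self, page: Page) -> bool:
        """Write the page home only if its log record is already durable."""
        if not page.dirty:
            return True
        if page.lsa > self.log.durable_lsa:
            return False  # WAL rule: refuse until the log catches up
        self.disk[page.page_id] = page.lsa
        page.dirty = False
        return True
```

A caller that gets `False` back would flush the log up to `page.lsa` and retry, which is what makes the ordering structural rather than a convention each caller must remember.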
MVCC + Locks, WAL, ARIES, Vacuum, 2PC
How CUBRID realises ACID across eleven modules: MVCC and the lock manager carry isolation, while the log manager + prior list + checkpoint + recovery manager (with the double-write buffer as a Storage-Engine cross-section) underwrite atomicity and durability. Three extensions — 2PC, flashback, backup/restore — reuse the same TDES and WAL machinery for cross-server commit, time-travel queries, and point-in-time recovery.
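Paper Review for Implementing a Disk-Based k-ANN Index in CUBRID
An in-house paper review in preparation for implementing a disk-based k-ANN index in CUBRID. Summarizes three papers in turn, DiskANN (NeurIPS 2019), AiSAQ (arXiv 2024), and FreshDiskANN (arXiv 2021), and compares how each algorithm searches billion-scale vector sets while minimizing SSD round trips: the Vamana graph, Product Quantization, beam search, and streaming indexing.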
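For orientation, the greedy graph search that these papers build on can be sketched in memory as below. This is a simplified illustration, not the reviewed algorithms as published: it uses exact squared Euclidean distances instead of Product Quantization, expands one node per step instead of a beam of SSD reads, and the names `greedy_search` and `list_size` are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search(vectors: Dict[int, Vector], graph: Dict[int, List[int]],
                  start: int, query: Vector, k: int, list_size: int) -> List[int]:
    """Best-first walk over a proximity graph with a bounded candidate list.

    Repeatedly expands the closest unvisited candidate, adds its
    out-neighbours, and trims the candidate list to list_size entries.
    """
    def dist(node: int) -> float:
        return sq_dist(vectors[node], query)

    candidates: Set[int] = {start}
    visited: Set[int] = set()
    while True:
        frontier = candidates - visited
        if not frontier:
            break
        best = min(frontier, key=dist)
        visited.add(best)
        candidates.update(graph.get(best, ()))
        if len(candidates) > list_size:
            candidates = set(sorted(candidates, key=dist)[:list_size])
    return sorted(candidates, key=dist)[:k]
```

In a disk-based setting each expansion costs a read of the node's neighbour list, so the number of loop iterations is a rough proxy for the SSD round trips the papers try to minimize.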
Paper review: Pan · Wang · Li (VLDBJ 2024)
Talk given at an in-house paper seminar. Works through the VDBMS survey by Pan, Wang, and Li (VLDB Journal 2024 / arXiv:2310.14021) in eight sections (Introduction · Query Processing · Indexing · Query Optimization and Execution · Current Systems · Benchmarks · Challenges and Open Problems · Conclusion). Slides for a reading session that surveys indexing, query optimization, and the comparison of current systems with CUBRID's vector search roadmap (CUBVEC) in mind. Expanded in 2026-05: 16 pages (chapter-opening overviews, Benchmarks, Challenges and Open Problems) were written in Marp and merged as PDF with the 43-page PowerPoint original, for 59 pages in total.
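CUBRID Contribution Tutorial: Choosing a Project, Entry Points for Contributing, Development Process
The second session of 공개SW페스티벌 2021 (Open Source Software Festival 2021). Starting from CUBRID's 'OPEN COMMUNITY' goal, it guides outside contributors, based on first-hand experience, through which project to pick first, where to find entry points (documentation gaps, IN PROGRESS issues, build setup, simple fixes), and the flow that runs through JIRA, GitHub, code review, and automated tests. Also covers the CUBRID Foundation (established in Seattle in 2020-02), the license change in CUBRID 11.0 (BSD → Apache 2.0, client / core), and the per-module repository structure.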
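Comparison with MySQL / MariaDB / PostgreSQL, CUBRID's Four Phases (2008–2019), and OSS DBMS Business Models
Talk given on DD튜브. Compares the development histories of representative open-source RDBMSs (MySQL · MariaDB · PostgreSQL · CUBRID) and summarizes the four phases CUBRID has gone through since going open source in 2008. Also covers OSS license choice, public-cloud adoption cases such as G-Cloud and D-Cloud, and open-source DBMS business models: 'if you give the code away for free, what do you live on?'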
KCD 2020 Talk · JIRA · GitHub · PR · CI
Talk given at KCD (Korea Community Day) 2020. Breaks down how to join CUBRID, an open-source RDBMS, as an outside contributor into five steps (Communication · Open Issue · Implementation · Pull Request · Testing), and covers the JIRA issue categories, the GitHub PR code-review flow, and the four test suites run by the QA team (medium · SQL · performance · heavy).
3D extrusion, boolean ops, and what PostGIS still can't do (as of 2018)
Walks through PostGIS's SFCGAL-backed 3D toolbox: ST_Extrude turning NYC building footprints into 3D buildings, visualizing them in QGIS 3.4's 3D map view, and SFCGAL boolean / intersection / union / difference on solids. Closes with the practical limits of using PostGIS as a 3D-data service backend.
FOSS4G 2017 · Boston
Talk given at FOSS4G 2017 (Boston) on extending GeoServer and GeoTools so that the Web Feature Service stack can carry 3D geometries — covering the modifications to GeoTools' geometry model and the WFS handlers in GeoServer.