Hyung-Gyu Ryoo's Talks & Notes

Featured

Jun 2026

Checkpoint, Vacuum & Log Reclamation in CUBRID

Why archive logs stop being purged — and why it's checkpoint, not vacuum

A code-grounded walk through how CUBRID reclaims WAL archive logs, and the CBRD-26957 disk-full incident where archives grew monotonically under concurrent AUTO_INCREMENT load. Frames log retention as three independent horizons — REDO and UNDO on the LSA axis (owned by the fuzzy checkpoint), GC on the MVCCID axis (owned by vacuum) — that meet only at the archive-truncation MIN and at crash-recovery resume. Maps each to code: logpb_checkpoint releasing LOG_CS before the synchronous page flush, the redo point from the oldest unflushed dirty page, the checkpoint record carrying no MVCC state, the vacuum visibility gate (newest_mvccid vs oldest_visible), keep_from_log_pageid, and the MIN clamps in logpb_remove_archive_logs_exceed_limit. Closes on the incident: a hot db_serial page write-latched by ~120 workers starves the checkpoint's synchronous flush so it never completes and the syscrash horizon freezes at 0 — while vacuum, which flushes non-blocking, follows the log head normally. The sync-vs-async flush asymmetry is why it is checkpoint, not vacuum.
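The way the three horizons meet at the archive-truncation MIN can be illustrated with a minimal sketch. Everything named here (`Horizons`, `first_page_to_keep`, `removable_archives`, the field names) is hypothetical and chosen for illustration; it is not the code in logpb_remove_archive_logs_exceed_limit, and it assumes each horizon has already been translated into a log page id.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Horizons:
    """Hypothetical container: oldest log page id each horizon still needs."""
    redo_pageid: int    # REDO horizon (LSA axis), advanced by the checkpoint
    undo_pageid: int    # UNDO horizon (LSA axis)
    vacuum_pageid: int  # GC horizon, expressed as the log page vacuum still needs


def first_page_to_keep(h: Horizons) -> int:
    """Nothing at or above the minimum of the horizons may be reclaimed."""
    return min(h.redo_pageid, h.undo_pageid, h.vacuum_pageid)


def removable_archives(archives: Dict[int, Tuple[int, int]], h: Horizons) -> List[int]:
    """archives maps archive number -> (first_pageid, last_pageid).

    An archive is removable only if every page in it lies below the MIN.
    """
    keep_from = first_page_to_keep(h)
    return [num for num, (_first, last) in sorted(archives.items()) if last < keep_from]


archives = {0: (0, 99), 1: (100, 199), 2: (200, 299)}

# All horizons advancing: the two oldest archives can go.
print(removable_archives(archives, Horizons(250, 260, 240)))  # [0, 1]

# Checkpoint stalled with its horizon frozen at 0: nothing can be purged,
# no matter how far vacuum has advanced.
print(removable_archives(archives, Horizons(0, 260, 240)))    # []
```

Because the result is a minimum, a single frozen horizon is enough to pin every archive, which is the shape of the incident described above.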

cubrid, code-analysis, checkpoint, wal, vacuum, log-archive

EN html · pdf · KO html

Talks

Jun 2026

CUBRID slides-grab demo

the lock resource hash table

A demo deck proving the slides-grab authoring path in the cubrid tone.

cubrid, code-analysis, demo

EN html · pdf · KO html

Jun 2026

CUBRID Lock Manager

Multi-Granularity Locking, Conversion, Deadlock Detection

How the textbook 2PL and multi-granularity locking get realized on top of CUBRID's OID-shaped objects — now extended to the code level with the detail analysis doc. Walks through the core data structures (LK_ENTRY, LK_RES, LK_TRAN_LOCK, the aggregate-mode cache), boot and the three-tier memory/lockfree boundary, the 12-mode compatibility matrix, the seven acquisition paths (A–G) inside lock_internal_perform_lock_object, conversion with the Upgrader Positioning Rule and the releaser-driven grant cascade, the release path scoped by MVCC-disabled class set and isolation level (with the NON2PL shadow protocol), conditional lock escalation, suspend/resume wake states, the waits-for-graph detector with its six-criteria victim selection, and the special paths (instant probe, composite lock, hint/demote/subclass).
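For orientation, the textbook side of that comparison can be sketched as a compatibility check over the classic five multi-granularity modes (IS, IX, S, SIX, X). This is deliberately not CUBRID's 12-mode matrix, which the deck covers; the names `COMPATIBLE`, `compatible`, and `can_grant` are hypothetical.

```python
# Textbook multi-granularity lock modes (not CUBRID's 12-mode set).
MODES = ("IS", "IX", "S", "SIX", "X")

# For each held mode, the set of requested modes that can coexist with it.
COMPATIBLE = {
    "IS": {"IS", "IX", "S", "SIX"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "SIX": {"IS"},
    "X": set(),
}


def compatible(held: str, requested: str) -> bool:
    """True if `requested` can be granted while `held` is held by another transaction."""
    return requested in COMPATIBLE[held]


def can_grant(requested: str, holders: list) -> bool:
    """A request is grantable only if it is compatible with every current holder."""
    return all(compatible(held, requested) for held in holders)


print(can_grant("IS", ["IX", "IS"]))  # True: intention locks coexist
print(can_grant("S", ["IX"]))         # False: a reader conflicts with an intended writer
```

A lock manager typically avoids rescanning all holders on each request by caching an aggregate of the granted modes per resource, which is the role the abstract assigns to the aggregate-mode cache.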

cubrid, lock-manager, code-analysis, 2pl, deadlock

EN html · pdf · KO html · pdf

docs: Code analysis · analysis: Code-level deep dive

May 2026

CUBRID Architecture Overview (draft)

Processes, Layered Stack, Query Pipeline, Distribution

Front-door router for the CUBRID code-analysis tree. Names the four long-lived processes (cub_master, cub_server, cub_pl, cub_broker + cub_cas) and their IPC; walks the layered storage stack from disk to workspace; traces the query pipeline from parser to cursor; sketches the concurrency axis (MVCC + lock + log + checkpoint + recovery + DWB), the distribution layer (heartbeat + HA + CDC + 2PC + flashback + backup), the PL family, and the cross-cutting infrastructure. One diagram per axis, with cross-refs into ~70 detail docs across eight subcategories.

cubrid, architecture, overview, code-analysis, process-model

EN html · pdf · KO html

docs: Code analysis

May 2026

CUBRID Locator (draft)

OID Workspace, Bulk Fetch/Flush, Server-Side Insert/Update/Delete Bridge

How CUBRID bridges in-memory objects and on-disk OIDs. A client-side workspace of MOPs marshals dirty objects into bidirectional LC_COPYAREA buffers (LC_COPYAREA_MANYOBJS + LC_COPYAREA_ONEOBJ records); locator_mflush_cache batches them at flush time; the server-side locator_attribute_info_force / locator_{insert,update,delete}_force family is the one canonical entry point that fans out into heap, btree, lock, log, FK, and replication.

cubrid, locator, code-analysis, oid, workspace

EN html · pdf · KO html

docs: Code analysis

May 2026

CUBRID MVCC (draft)

Snapshot Construction, Active-MVCCID Tracking, Vacuum Coordination

How CUBRID implements snapshot isolation on top of a lazily-issued 64-bit MVCCID, a bit-array-plus-long-tran active set (`mvcc_active_tran`), and a 2048-slot history ring (`mvcctable`). Walks through the per-record `mvcc_rec_header`, the lock-free snapshot construction in `build_mvcc_info`, the three-valued `mvcc_satisfies_snapshot` predicate, and how `complete_mvcc` advances the ring and the `m_oldest_visible` watermark that gates vacuum.

cubrid, mvcc, code-analysis, snapshot-isolation, vacuum

EN html · pdf · KO html

docs: Code analysis

May 2026

CUBRID Query Processing (draft)

Parser, Optimizer, XASL, Executor, Access Methods

How CUBRID turns a SQL string into a stream of tuples. Flex+Bison parses into a PT_NODE tree, semantic check resolves names and types and pushes predicates to CNF, the rewriter flattens subqueries / inlines views / lowers LIMIT, a DP-style join enumerator with a System R cost model emits a QO_PLAN that the XGEN walk lowers to an XASL_NODE tree, a SHA-1 keyed XASL cache short-circuits compile work on re-execute, the Volcano-style executor drives open/next/close over SCAN_ID handles that dispatch across heap / btree / list / value / json-table / show / dblink access methods, predicate evaluation reuses one PRED_EXPR walker for scan filters and joins, and every materialised tuple stream lands on the QFILE_LIST_ID list-file substrate that overflows to FILE_TEMP before the cursor hands rows back through the broker / CCI.
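The open/next/close contract of a Volcano-style executor can be sketched in a few lines. The classes below (`ListScan`, `Filter`, `drain`) are hypothetical stand-ins for illustration only; they do not model SCAN_ID, XASL_NODE, or any other CUBRID structure.

```python
class ListScan:
    """Leaf operator: iterates over an in-memory list of rows."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = None  # None means "not open"

    def open(self):
        self._pos = 0

    def next(self):
        if self._pos is None:
            raise RuntimeError("scan is not open")
        if self._pos >= len(self._rows):
            return None  # end of stream
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def close(self):
        self._pos = None


class Filter:
    """Pulls rows from its child and passes on only those matching the predicate."""

    def __init__(self, child, predicate):
        self._child = child
        self._predicate = predicate

    def open(self):
        self._child.open()

    def next(self):
        while True:
            row = self._child.next()
            if row is None or self._predicate(row):
                return row

    def close(self):
        self._child.close()


def drain(op):
    """Drive an operator tree from the root: open, pull until exhausted, close."""
    op.open()
    try:
        out = []
        row = op.next()
        while row is not None:
            out.append(row)
            row = op.next()
        return out
    finally:
        op.close()


print(drain(Filter(ListScan([1, 2, 3, 4, 5, 6]), lambda r: r % 2 == 0)))  # [2, 4, 6]
```

Each operator only knows its child's three methods, so different access methods can sit behind the same interface without the parent changing.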

cubrid, query-processing, code-analysis, optimizer, xasl, executor

EN html · pdf · KO html

docs: Code analysis

May 2026

CUBRID Storage Engine (draft)

Disks, Pages, Heaps, B+Trees, Hash, Overflow

A layered tour of the CUBRID storage engine: disk_manager owns volumes, sectors (64-page allocation units), files, and pages; the page buffer caches pages through a per-bucket hash and a three-zone LRU split into per-thread private lists with adjustable quotas plus a shared list; the double-write buffer stages every dirty page sequentially before the home write to defeat torn writes; heap_manager, btree, and extendible_hash sit on top of the buffer, never on the disk. WAL ordering is structural — the buffer refuses to flush a dirty page until the matching log LSA is durable.
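The WAL ordering rule in the last sentence can be sketched as a guard in the flush path. This is a simplified model under stated assumptions: `Log`, `Page`, and `try_flush` are hypothetical names, and a plain integer stands in for the LSA, which in a real log is a position rather than a counter.

```python
from dataclasses import dataclass


class Log:
    """Toy log: tracks the newest appended LSA and the newest durable LSA."""

    def __init__(self):
        self.appended_lsa = 0
        self.durable_lsa = 0

    def append(self) -> int:
        """Append one record in memory and return its LSA."""
        self.appended_lsa += 1
        return self.appended_lsa

    def flush(self) -> None:
        """Make everything appended so far durable."""
        self.durable_lsa = self.appended_lsa


@dataclass
class Page:
    page_id: int
    lsa: int            # LSA of the newest log record that modified this page
    dirty: bool = True


def try_flush(page: Page, log: Log) -> bool:
    """Write the page out only if its log record is already durable.

    Returns False (page stays dirty) when flushing would violate WAL ordering.
    """
    if page.lsa > log.durable_lsa:
        return False
    page.dirty = False  # stands in for the actual write to disk
    return True


log = Log()
page = Page(page_id=1, lsa=log.append())

print(try_flush(page, log))  # False: the log record is not durable yet
log.flush()
print(try_flush(page, log))  # True: log is durable, the page may be written
```

Putting the check inside the flush path, rather than relying on callers to order their writes, is what makes the guarantee structural.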

cubrid, storage-engine, code-analysis, page-buffer, btree, heap

EN html · pdf · KO html

docs: Code analysis

May 2026

CUBRID Transaction & Recovery (draft)

MVCC + Locks, WAL, ARIES, Vacuum, 2PC

How CUBRID realises ACID across eleven modules: MVCC and the lock manager carry isolation, while the log manager + prior list + checkpoint + recovery manager (with the double-write buffer as a Storage-Engine cross-section) underwrite atomicity and durability. Three extensions — 2PC, flashback, backup/restore — reuse the same TDES and WAL machinery for cross-server commit, time-travel queries, and point-in-time recovery.

cubrid, txn-recovery, code-analysis, aries, mvcc, wal

EN html · pdf · KO html

docs: Code analysis

Jan 2026

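CUBRID in the AI Era

CUBVEC: Vector Search · In-House LLM · From OLTP to Analytics

A summary of how CUBRID is addressing the roles a DBMS has newly taken on in AI and LLM environments, vector search in particular. Covers the CUBVEC project's index structure, integration with embedding models, the data pipeline that an in-house (private) LLM requires, and the direction of expanding beyond OLTP into analytic workloads.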

cubrid, ai, llm, vector, rag, cubvec, talk

KO html · pdf

Sep 2025

DiskANN and Related Works on Disk-Based k-ANN Indexes

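Paper review for implementing a disk-based k-ANN index in CUBRID

An in-house paper review in preparation for implementing a disk-based k-ANN index in CUBRID. Summarizes three papers in turn, DiskANN (NeurIPS 2019), AiSAQ (arXiv 2024), and FreshDiskANN (arXiv 2021), and compares how each algorithm searches billions of vectors while minimizing SSD round trips: the Vamana graph, Product Quantization, beam search, and streaming indexing.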

cubrid, ann, vector-search, diskann, paper-review, notes

KO html · pdf

May 2025

Survey of Vector Database Management Systems

Paper review: Pan · Wang · Li (VLDBJ 2024)

In-house paper seminar presentation. Works through Pan, Wang, and Li's VDBMS survey (VLDB Journal 2024 / arXiv:2310.14021) in eight sections (Introduction · Query Processing · Indexing · Query Optimization and Execution · Current Systems · Benchmarks · Challenges and Open Problems · Conclusion). Reading-session slides that skim indexing, query optimization, and the comparison of current systems with CUBRID's vector search roadmap (CUBVEC) in mind. Expanded in 2026-05: 16 pages written in Marp (chapter-opening overviews, Benchmarks, and Challenges and Open Problems) were merged as PDF into the 43-page PowerPoint original, for 59 pages in total.

cubrid, vector-search, vdbms, ann, paper-review, survey, talk

KO html · pdf

Nov 2021

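Growing Together with Developers in Open Source

CUBRID contribution tutorial: choosing a project, contribution entry points, development process

The second session of 공개SW페스티벌 2021 (Open Source Software Festival 2021). Starting from CUBRID's stated aim of an 'OPEN COMMUNITY', it guides external contributors, from first-hand experience, through which project to pick first, where to find entry points (documentation gaps, IN PROGRESS issues, build setup, simple fixes), and the flow that runs through JIRA, GitHub, code review, and automated tests. Also covers the CUBRID Foundation (established in Seattle, 2020-02), the license change in CUBRID 11.0 (BSD → Apache 2.0 client / core), and the per-module repository structure.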

cubrid, open-source, tutorial, contribution, 공개SW페스티벌, talk

KO html · pdf

Sep 2021

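CUBRID Open Source Development Process

Comparison with MySQL / MariaDB / PostgreSQL, CUBRID's four stages (2008–2019), and OSS DBMS business models

DD튜브 talk. Compares the development histories of representative open source RDBMSs (MySQL · MariaDB · PostgreSQL · CUBRID) and summarizes the four stages CUBRID has gone through since going open source in 2008. Also covers OSS license choice, public-cloud adoption cases such as G-Cloud and D-Cloud, and open source DBMS business models: 'if the code is given away for free, what do you live on?'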

cubrid, open-source, dbms, license, g-cloud, talk, dd튜브

KO html · pdf

Nov 2020

Contributing to CUBRID, an Open Source Database

KCD 2020 talk · JIRA · GitHub · PR · CI

Talk given at KCD (Korea Community Day) 2020. Breaks down how to join CUBRID, an open source RDBMS, as an external contributor into five steps (Communication · Open Issue · Implementation · Pull Request · Testing), and walks through JIRA issue categories, the GitHub PR code-review flow, and the four test suites run by the QA team (medium · SQL · performance · heavy).

cubrid, kcd2020, open-source, contribution, jira, github, talk

KO html · pdf

Dec 2018

Working with 3D Data in PostGIS via SFCGAL

3D extrusion, boolean ops, and what PostGIS still can't do (as of 2018)

Walks through PostGIS's SFCGAL-backed 3D toolbox: ST_Extrude turning NYC building footprints into 3D buildings, visualizing them in QGIS 3.4's 3D map view, and SFCGAL boolean / intersection / union / difference on solids. Closes with the practical limits of using PostGIS as a 3D-data service backend.

postgis, sfcgal, 3d, gis, qgis, talk

EN html · pdf · KO html · pdf

Aug 2017

Development of an Extension of GeoServer for Handling 3D Spatial Data

FOSS4G 2017 · Boston

Talk given at FOSS4G 2017 (Boston) on extending GeoServer and GeoTools so that the Web Feature Service stack can carry 3D geometries — covering the modifications to GeoTools' geometry model and the WFS handlers in GeoServer.