CUBRID Internationalization — Section Overview
What this section covers
This section covers the internationalisation primitives: horizontal capabilities that every string operator, every comparison, and every date-arithmetic call eventually passes through. There are exactly two: codeset-plus-collation, and timezone. Both are infrastructure: they disappear into the rest of the engine, surfacing only as opaque per-record encoded IDs (a LANG_COLLATION index, a 32-bit TZ_ID) that the storage and query layers carry around without inspection.
This section used to be a wider catch-all (i18n-specialty) holding self-contained features that did not fit storage / query / server-architecture. Those have been redistributed to their natural homes:
- JSON_TABLE: moved to Query Processing. It is a SCAN_TYPE in the executor's scan_manager registry, sitting next to heap, list, and B+Tree scans. See cubrid-json-table.md.
- SHOW commands: moved to a new System Catalog section alongside cubrid-catalog-manager. SHOW exposes server-internal runtime state through the same uniform SQL surface that the static catalog uses for schema. See cubrid-overview-system-catalog.md.
- compactdb: moved to Utilities, alongside csql, cub-admin, loaddb, unloaddb. It is an offline SA-mode tool. See cubrid-utilities-misc.md for the utilities cluster.
What remains here is the textbook concept of i18n — locale-aware text and time, the two layers of “the database speaks more than one human convention”.
The two primitives
The two docs are independent of one another (charset code does not call into timezone code and vice versa) but they share a single architectural pattern, which is the reason they sit together:
Compile external standards data into a CUBRID-built shared library.
dlopen it at server boot. Pack the per-record state into a small fixed-width ID. Resolve the ID through the loaded library on the read path.
flowchart LR
subgraph build["Build / install time"]
LDML["LDML locale rules<br/>(per-locale XML)"]
IANA["IANA tzdata<br/>(zone1970.tab,<br/>africa, asia, ...)"]
GENL["genlocale binary"]
MKTZ["make_tz binary"]
CL["libcubrid_collations.so<br/>(UCA weight tables,<br/>per-codeset comparators)"]
TZ["libcubrid_timezones.so<br/>(zone, offset-rule,<br/>DS-rule arrays)"]
LDML --> GENL --> CL
IANA --> MKTZ --> TZ
end
subgraph runtime["cub_server runtime"]
BOOT["boot_sr → lang_init / tz_load_library"]
LANG["LANG_COLLATION vtable<br/>fastcmp / strmatch / next_alpha_char"]
TZID["TZ_ID 32 bits<br/>(zone, offset-rule, DS-rule)"]
BOOT -->|dlopen| CL
BOOT -->|dlopen| TZ
CL --> LANG
TZ --> TZID
end
LANG -.feeds.-> btree["B+Tree key compare<br/>· sort · hash · LIKE / =<br/>· every string scalar"]
TZID -.feeds.-> dt["DATETIMETZ / TIMESTAMPTZ<br/>· tz_create_datetimetz<br/>· tz_explain_tz_id<br/>· every CAST · every date scalar"]
cubrid-charset-collation.md — text. Four codesets (binary, ISO-8859-1, EUC-KR, UTF-8); LDML locale rules compiled by genlocale into UCA weight tables shipped as a per-platform shared library; comparison dispatched through a function-pointer LANG_COLLATION vtable consumed by B+Tree, sort, hash, and every string scalar.
cubrid-timezone.md — time. Raw IANA tzdata files compiled by make_tz into a generated timezones.c and a shared library libcubrid_timezones.so; a 32-bit TZ_ID packs (zone, gmt-offset-rule, ds-rule) or a raw signed offset; tz_datetime_utc_conv resolves wall-clock to UTC honouring LOCAL_STD / LOCAL_WALL / UTC “AT” qualifiers and spring-forward / fall-back overlaps.
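To make the "pack a small fixed-width ID" idea concrete, here is a minimal sketch of how a 32-bit value could hold either a (zone, offset-rule, DS-rule) triple or a raw signed offset. The field widths, the kind flag, the bias, and every name below are assumptions chosen for illustration; they are not CUBRID's actual TZ_ID layout, which is described in cubrid-timezone.md.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (NOT the real TZ_ID encoding):
 *   bits 30..31  kind flag
 *   bits 18..29  zone index        (12 bits)
 *   bits  8..17  offset-rule index (10 bits)
 *   bits  0..7   DS-rule index     (8 bits)
 * For KIND_OFFSET the low 30 bits hold a biased offset in seconds. */
typedef uint32_t tz_id_sketch;

enum { KIND_ZONE = 1, KIND_OFFSET = 2 };

#define KIND_SHIFT        30
#define ZONE_SHIFT        18
#define OFFSET_RULE_SHIFT 8
#define ZONE_MASK         0xFFFu
#define OFFSET_RULE_MASK  0x3FFu
#define DS_RULE_MASK      0xFFu
#define PAYLOAD_MASK      0x3FFFFFFFu
#define OFFSET_BIAS       (24 * 3600) /* keeps the stored offset non-negative */

/* Pack three small indexes into one 32-bit value. */
tz_id_sketch pack_zone(unsigned zone, unsigned offset_rule, unsigned ds_rule)
{
  assert(zone <= ZONE_MASK);
  assert(offset_rule <= OFFSET_RULE_MASK);
  assert(ds_rule <= DS_RULE_MASK);
  return ((uint32_t) KIND_ZONE << KIND_SHIFT)
       | ((uint32_t) zone << ZONE_SHIFT)
       | ((uint32_t) offset_rule << OFFSET_RULE_SHIFT)
       | (uint32_t) ds_rule;
}

/* Pack a raw signed UTC offset, in seconds, strictly within one day. */
tz_id_sketch pack_offset(int32_t seconds)
{
  assert(seconds > -OFFSET_BIAS && seconds < OFFSET_BIAS);
  return ((uint32_t) KIND_OFFSET << KIND_SHIFT)
       | (uint32_t) (seconds + OFFSET_BIAS);
}

unsigned tz_kind(tz_id_sketch id)        { return id >> KIND_SHIFT; }
unsigned tz_zone(tz_id_sketch id)        { return (id >> ZONE_SHIFT) & ZONE_MASK; }
unsigned tz_offset_rule(tz_id_sketch id) { return (id >> OFFSET_RULE_SHIFT) & OFFSET_RULE_MASK; }
unsigned tz_ds_rule(tz_id_sketch id)     { return id & DS_RULE_MASK; }

int32_t tz_offset_seconds(tz_id_sketch id)
{
  return (int32_t) (id & PAYLOAD_MASK) - OFFSET_BIAS;
}
```

The point of the sketch is that a layer carrying the value needs no knowledge of its contents: it copies four bytes, and only code that holds the loaded zone tables turns the indexes back into rules.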
Reading order
The two i18n primitives are independent. Pick by what you are working on:
If you care about strings, identifiers, comparison, indexing, sorting, joining, or hashing — read cubrid-charset-collation.md first. Almost every other doc in the repository eventually mentions INTL_CODESET, LANG_COLLATION, or one of the lang_*cmp* comparators. The charset-collation doc is the only place where the per-codeset comparator family, the LDML / UCA pipeline, and the LANG_GET_BINARY_COLLATION macro are explained from scratch.
If you care about dates, timestamps, sessions, or anything client-server about “local time” — read cubrid-timezone.md first. The packing of TZ_ID, the AT-time qualifier semantics, and the connection-time session region are explained nowhere else.
Read both back-to-back if you want to see the architectural pattern. “Compile external standards into a dlopen-ed .so, pack a small ID, surface through SQL” is identical in both subsystems. Reading them as a pair makes the pattern obvious — and primes you to recognise it elsewhere (e.g. PL bridge libraries).
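The "dlopen-ed .so" step of that pattern can be sketched with the standard POSIX dynamic-loading calls. This is an illustrative helper, not CUBRID's loader: the function name load_table is hypothetical, and the symbol name a caller passes in would have to match whatever the generated library actually exports, which this overview does not specify.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: open a shared library and resolve one exported
 * data symbol. Returns the symbol address, or NULL on any failure.
 * *handle_out receives the library handle on success and NULL otherwise;
 * the caller keeps the handle for as long as the table is in use. */
void *load_table(const char *lib_path, const char *symbol, void **handle_out)
{
  assert(lib_path != NULL && symbol != NULL && handle_out != NULL);

  *handle_out = NULL;
  void *handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
  if (handle == NULL)
    {
      const char *err = dlerror();
      fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
      return NULL;
    }

  dlerror(); /* clear any stale error before dlsym */
  void *table = dlsym(handle, symbol);
  if (table == NULL)
    {
      const char *err = dlerror();
      fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol resolved to NULL");
      dlclose(handle);
      return NULL;
    }

  *handle_out = handle;
  return table;
}
```

Failing fast here matches the boot-time placement of the real loaders: if the data cannot be loaded, nothing downstream can compare strings or interpret timestamps. On older glibc versions the program may need to be linked with -ldl.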
Cross-cutting concerns
Both primitives are deeply load-bearing. They touch most other sections of the knowledge base, and a comprehensive read of any of those sections is incomplete without an awareness of what these two layers are doing on its behalf.
- Charset-collation feeds B+Tree and every string operator. Every comparator on the hot path inside btree.c ultimately resolves through a LANG_COLLATION.fastcmp / LANG_COLLATION.strmatch function pointer. Same for LIKE, =, <, ORDER BY, GROUP BY, sort-merge join keys, and hash-join hashing. The collation surface is therefore inseparable from B+Tree (cubrid-btree.md), external sort (cubrid-external-sort.md), hash join (cubrid-hash-join.md), and string scalar functions (cubrid-scalar-functions.md).
- Timezone feeds DATETIMETZ / TIMESTAMPTZ in scalar functions and at session boundaries. Anything operating on DB_DATETIMETZ or DB_TIMESTAMPTZ (tz_create_datetimetz, tz_conv_tz_datetime_w_region, tz_explain_tz_id, every CAST, every date-arithmetic operator) walks the TZ_DATA blob from libcubrid_timezones.so. The session-level tz_Region_session / session_tz_region runtime variables tie this into the connection lifecycle covered by cubrid-network-protocol.md, cubrid-server-session.md, and cubrid-boot.md.
- Both surface through boot_sr at the same point in the topological boot order. lang_init and tz_load_library run in the same boot phase, after sysparams but before page buffer / log / lock, because every later subsystem may need to compare strings or interpret timestamps. See cubrid-boot.md for the full ordering.
- Both surface through SHOW commands. SHOW LOCALES and SHOW TIMEZONES expose the in-memory state of the loaded .so files (locale name, charset, codeset, contraction count, DS rule count, etc.) through the catalog overview's (cubrid-overview-system-catalog.md) virtual-scan path. This is the standard introspection surface for "what is the engine actually using?".
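The function-pointer dispatch described in the first bullet can be illustrated with a deliberately simplified stand-in. The struct, the comparator signature, and the two collations below are assumptions made for this sketch; the real LANG_COLLATION carries far more state and its fastcmp takes different arguments (see cubrid-charset-collation.md).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down analogue of a collation vtable. */
typedef struct collation_sketch
{
  const char *name;
  int (*fastcmp) (const unsigned char *a, size_t alen,
                  const unsigned char *b, size_t blen);
} collation_sketch;

/* Byte-wise comparison: shorter string sorts first on a common prefix. */
int cmp_binary(const unsigned char *a, size_t alen,
               const unsigned char *b, size_t blen)
{
  size_t n = alen < blen ? alen : blen;
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] != b[i])
        return a[i] < b[i] ? -1 : 1;
    }
  return (alen > blen) - (alen < blen);
}

/* ASCII case-insensitive comparison, standing in for a locale-aware one. */
int cmp_ascii_ci(const unsigned char *a, size_t alen,
                 const unsigned char *b, size_t blen)
{
  size_t n = alen < blen ? alen : blen;
  for (size_t i = 0; i < n; i++)
    {
      int ca = tolower(a[i]);
      int cb = tolower(b[i]);
      if (ca != cb)
        return ca < cb ? -1 : 1;
    }
  return (alen > blen) - (alen < blen);
}

const collation_sketch COLL_BINARY   = { "binary",   cmp_binary };
const collation_sketch COLL_ASCII_CI = { "ascii_ci", cmp_ascii_ci };

/* What a key compare, sort, or hash probe would do: it never inspects
 * the collation, it only calls through the pointer it was handed. */
int compare_keys(const collation_sketch *coll, const char *a, const char *b)
{
  assert(coll != NULL && coll->fastcmp != NULL);
  return coll->fastcmp((const unsigned char *) a, strlen(a),
                       (const unsigned char *) b, strlen(b));
}
```

With this shape, the same compare_keys call orders "Apple" before "apple" under the binary collation and treats them as equal under the case-insensitive one, which is why index order and equality semantics depend on the collation attached to a column.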
Detail-doc summaries
| Doc | One-line summary |
|---|---|
| cubrid-charset-collation.md | Four-codeset text model (binary, ISO-8859-1, EUC-KR, UTF-8) plus locale-aware comparison via the LANG_COLLATION vtable; LDML + UCA weights compiled by genlocale into a per-platform shared library that the server dlopens at startup. |
| cubrid-timezone.md | IANA tzdata compiled into libcubrid_timezones.so; 32-bit TZ_ID packs (zone, offset-rule, ds-rule) or a raw signed offset; tz_datetime_utc_conv walks per-zone offset and DS rules honouring s / w / u AT-time qualifiers and overlap intervals. |
Adjacent sections
- System Catalog. Both primitives surface through SHOW commands. SHOW COLLATION, SHOW LOCALES, SHOW TIMEZONES, SHOW FULL TIMEZONES all read the in-memory .so state through the virtual-scan registry. See cubrid-overview-system-catalog.md.
- Storage Engine. Charset-collation drives every B+Tree comparator (cubrid-btree.md) and surfaces through external sort (cubrid-external-sort.md). Timezone is touched only indirectly here: DATETIMETZ values stored in heaps carry TZ_IDs but storage code never inspects them.
- Query Processing. Both primitives surface in scalar functions (cubrid-scalar-functions.md), the optimiser's collation-aware index choice (cubrid-query-optimizer.md), and the executor's hash and sort comparators (cubrid-hash-join.md, cubrid-external-sort.md).
- Server Architecture. Both dlopen at boot from cubrid-boot.md. Timezone is also a session-level concept managed at connect time through cubrid-network-protocol.md and cubrid-server-session.md. Both library paths and locale settings come from system parameters (cubrid-system-parameters.md).