CUBRID Timezone — IANA Data Compilation, tz_id Resolution, and DATETIMETZ/TIMESTAMPTZ Conversion

Contents:

Theoretical Background
Common DBMS Design
CUBRID’s Approach
Source Walkthrough
Cross-check Notes
Open Questions
Sources

Theoretical Background

A timezone is not a static UTC offset — it is a piecewise function from civil time to UTC, with discontinuities at every DST transition and at every historical change of standard offset. The IANA Time Zone Database (tzdata) is the canonical encoding: ~400 named zones (Europe/Berlin, America/New_York, Asia/Seoul, …) each described by a sequence of offset rules (“from year Y₁ to year Y₂, standard time was UTC+N, daylight saving was governed by ruleset R”) and a set of DS rules (“between from_year and to_year, on month/on, at at_time, save save_time hours”). Three textbook concerns shape any real implementation.

Wall-clock to UTC is not a function. Spring-forward creates a gap (02:30 on the transition day does not exist in America/New_York); fall-back creates an overlap (01:30 occurs twice). The DBMS must (a) refuse the input, (b) pick a deterministic side, or (c) accept a user-supplied disambiguator (+05:00, EDT vs EST). PostgreSQL takes (b) by default. CUBRID combines (b) and (c): the user can append a daylight-saving abbreviation (Europe/Bucharest EET vs Europe/Bucharest EEST) and the engine searches both sides of the overlap.

AT time qualifiers. The IANA database tags every transition time with one of three suffixes: s (standard local time), w (wall clock, default), and u/g/z (UTC). The same numeric at_time = 02:00 means a different absolute moment depending on the suffix — any conversion that ignores it produces a one-hour error around transitions.

Year-boundary overlap. A DS rule active “May → September” matches the current year; a rule active “October → February” straddles a year boundary — the lookup for “1 January” must consider the previous year. CUBRID encodes this by always evaluating with year_to_apply_rule = src_year - 1 first when in_month > src_month, then comparing against src_year.

Beyond the algorithm, two engineering concerns dominate.

Storage cost. A zone-tagged datetime conceptually carries (datetime, zone, offset-rule, DS-rule); naïvely 8+64+8+8 bytes per row. Real systems pack the zone tag into 4 bytes. PostgreSQL stores the UTC instant only and re-derives the zone from session state; Oracle stores a 7-byte ID; CUBRID stores a 4-byte TZ_ID encoding either a (zone, offset-rule, DS-rule) triple or a raw signed offset, distinguished by two reserved high bits.

Update cadence. IANA releases tzdata every few months. The DBMS must absorb new versions without invalidating existing on-disk columns: zone names are stable but per-zone rules change (new countries, abolished DST, retroactive corrections). CUBRID handles this through gen_tz with three modes: new (rebuild from scratch — for new CUBRID releases), update (refresh existing rules without renumbering — disabled in current paths), and extend (rebuild rules while preserving prior zone_id numbering — the production path; if a zone changed enough to break backward compatibility, in-place migration via tzc_extend → tzc_update).

Common DBMS Design

Every relational engine supporting WITH TIME ZONE solves the same sub-problems.

Where the data lives. PostgreSQL ships zoneinfo/ straight from IANA (binary tzfile(5) blobs parsed on first use, exposed via pg_timezone_names). MySQL stores zone data inside the server (mysql.time_zone* tables, populated by mysql_tzinfo_to_sql). Oracle compiles tzdata into timezone_<n>.dat + timezlrg_<n>.dat, migrated via DBMS_DST. SQLite has no built-in database — only fixed offsets via 'utc' / 'localtime'.

Encoding. PostgreSQL stores TIMESTAMP WITH TIME ZONE as UTC int64 microseconds and the zone name only on display (session TimeZone GUC). MySQL stores TIMESTAMP as int32 UTC seconds and converts at I/O using @@session.time_zone. Oracle carries the zone explicitly: 7 bytes UTC + 2-byte zone id. CUBRID follows Oracle: DATETIMETZ is 8-byte DB_DATETIME (UTC) + 4-byte TZ_ID; TIMESTAMPTZ is 4-byte UTC DB_TIMESTAMP + 4-byte TZ_ID.

Session state. Postgres SET TimeZone; MySQL @@session.time_zone; Oracle ALTER SESSION SET TIME_ZONE. CUBRID has two settings: db_timezone (system-level, set at server boot, used by daemons with no client session) and timezone (session-level). The runtime variable tz_Region_session (client) or session_tz_region (server, per THREAD_ENTRY) holds the effective value.

DST handling. Postgres rejects ambiguous local times by default. MySQL silently picks standard-time. Oracle accepts 'US/Pacific PDT' for explicit DST. CUBRID matches Oracle: 'America/New_York EDT' resolves to the post-spring-forward instant; EST resolves to the pre-fall-back instant.

Update protocol. Postgres ships fresh zoneinfo/ per release (replace in place — no on-disk format change). MySQL re-runs mysql_tzinfo_to_sql. Oracle’s DBMS_DST.find_affected_tables enumerates affected columns, then upgrade_database rewrites them. CUBRID’s cubrid gen_tz -g extend is closest to Oracle: rebuild the library, detect whether zone IDs survived (tzc_extend), invoke tzc_update → conv_tz per row if not.

Where CUBRID sits. Smaller surface than Postgres (one library, no per-zone files). Heavier on-disk format (+4 bytes per zone-tagged value) but more accurate for read-back — Postgres TIMESTAMPTZ is lossy on zone name; CUBRID’s DATETIMETZ round-trips exactly. Closest design analogue is Oracle, but where Oracle’s identifier is opaque, CUBRID’s TZ_ID is structurally interpretable — high bits, zone-name index, offset-rule index, DS-rule index can all be extracted by bit-mask without consulting TZ_DATA, which is what makes conv_tz upgrade migrations possible.

CUBRID’s Approach

The subsystem lives in three files. src/base/timezone_lib_common.h defines the data structures (TZ_DATA, TZ_TIMEZONE, TZ_OFFSET_RULE, TZ_DS_RULE, …) shared between compiler and loader. src/base/tz_compile.c (~6,800 lines) is the build-time tooling: parse IANA source files, sort, deduplicate, generate a C source timezones.c that is compiled into libcubrid_timezones.so. src/base/tz_support.c (~5,500 lines) is the runtime: load the shared library, encode/decode TZ_ID, and drive every DATETIMETZ / TIMESTAMPTZ conversion. The compiled blob contains roughly 600 named zones (most are aliases — US/Eastern is an alias for America/New_York), 1,800 offset rules, 250 DS rulesets, and 500 DS rules; the binary is ~1 MB.

flowchart LR
  subgraph compile_time["Compile time (cubrid gen_tz)"]
    A[timezones/tzdata/<br/>africa europe asia ...<br/>leapseconds zone.tab] --> B[tzc_load_raw_data]
    B --> C[TZ_RAW_DATA] --> D[tzc_compile_data] --> E[TZ_DATA]
    E --> F[md5 checksum] --> G[tzc_export_timezone_dot_c<br/>-> timezones.c]
    G --> H[gcc -shared<br/>-> libcubrid_timezones.so]
  end
  H -. dlopen + dlsym .-> I
  subgraph runtime["Runtime"]
    I[tz_load] --> J[static tz_Timezone_data] --> K[tz_get_data]
    K --> L[tz_create_datetimetz<br/>tz_conv_tz_datetime_w_region<br/>tz_explain_tz_id]
  end

TZ_DATA — the runtime blob

The core is a single compile-time-allocated TZ_DATA struct populated by dlsym lookups against the loaded libcubrid_timezones.so. Every dynamic structure in the blob is a flat array; lookups are binary search. TZ_DATA (in timezone_lib_common.h) carries: countries (ISO 3166); timezones + timezone_names (canonical zones, parallel arrays); offset_rules (every (UTC offset, until-when) record); names (ALL names — canonical AND aliases — sorted, the lookup index for tz_get_zone_id_by_name); ds_rulesets; ds_rules; ds_leap_sec; an optional windows_iana_map under #if defined (WINDOWS); and a 32-character checksum. Aliases redirect to canonical zones via tz_name.zone_id.

tz_load_data_from_lib performs a dlsym resolution for every field — tz_country_count, tz_countries, timezone_count, tz_timezone_names, timezones, offset_rule_count, offset_rules, tz_name_count, tz_names, ds_ruleset_count, ds_rulesets, ds_rule_count, ds_rules, ds_leap_sec_count, ds_leap_sec, tz_timezone_checksum. The wrappers TZLIB_GET_VAL (scalars) and TZLIB_GET_ADDR (pointers) jump to error_loading_symbol: if any symbol is missing; these 32 names are the only hard-coded ABI between compiler and loader.

TZ_ID — packing zone, offset-rule, DS-rule into 32 bits

The on-disk identifier carried by every DATETIMETZ / TIMESTAMPTZ is a 32-bit unsigned integer with a two-bit type tag in the high bits:

 31 30 29              16 15           8 7            0
+-----+--------------------+--------------+-------------+
| F F |  zone_id (10 bits) | offset_id    | dst_id      |    F=00 : geographical zone
+-----+--------------------+--------------+-------------+
| 0 1 |       positive offset (30-bit seconds)         |    F=01 : raw +offset
+-----+------------------------------------------------+
| 1 0 |       negative offset (30-bit seconds)         |    F=10 : raw -offset
+-----+------------------------------------------------+

Constants (tz_support.c and tz_support.h): TZ_MASK_TZ_ID_FLAG = 0xc0000000, TZ_OFFSET_MASK = 0x3fffffff, TZ_ZONE_ID_MAX = 0x3ff (index into tzd->names), TZ_OFFSET_ID_MAX = TZ_DS_ID_MAX = 0xff. The encoder tz_encode_tz_id: for TZ_REGION_OFFSET, mask the absolute value with TZ_OFFSET_MASK and OR in 0x1 << 30 (positive) or 0x2 << 30 (negative); for TZ_REGION_ZONE, OR dst_id | (offset_id << 8) | (zone_id << 16).

Three structural decisions encoded here are worth flagging. (1) TZ_ID = 0 is not a sentinel — it decodes to “geographical zone, zone_id=0” (typically Africa/Abidjan after tzc_sort_raw_data). Code marking “no zone” uses TZ_ZONE_ID_MAX = 0x3ff for zones or TZ_INVALID_OFFSET = 86399 for offsets. (2) offset_id and dst_id are cached from the most recent successful conversion; they can be 0xff (“not yet resolved”), in which case tz_datetime_utc_conv runs the full search and rewrites them. This is why DATETIMETZ columns can survive an extend of the timezone library — the zone_id slot is the canonical anchor and the other two re-resolve on read. (3) The two raw-offset flags (01, 10) carry a signed magnitude representation rather than two’s complement, so the bit pattern survives signed/unsigned int punning across the JNI boundary into the Java PL engine.

TZ_REGION — the session/system zone setting

A TZ_REGION is the pre-resolution form of a TZ_ID: a tagged union carrying either a zone_id or a fixed signed offset (TZ_REGION_OFFSET / TZ_REGION_ZONE). Two globals hold system-wide and session-wide regions. The system region is module-static tz_Region_system, set once at server boot from db_timezone via tz_set_tz_region_system. The session region differs by build mode: in client builds, a single module-static tz_Region_session; in server builds, per-THREAD_ENTRY, reached through tz_get_server_tz_region_session → session_get_session_tz_region (thread_p). The server-side function has fallbacks for special threads: a CDC daemon returns the system region; a worker emulating another thread (emulate_tid != thread_id_t ()) inherits the emulated thread’s region; a TT_VACUUM_WORKER falls back to system region. This is the path that connects parser-level SYSDATETIME (wall-clock tagged with the session region) to the per-row encoder.

Wall-clock to UTC — the heart of the system

tz_datetime_utc_conv is the single algorithm that turns a (datetime, zone) pair into a (UTC datetime, fully-resolved offset_id, fully-resolved dst_id) — and turns a UTC datetime back into wall-clock when called with src_is_utc = true. It is ~900 lines (line 2817) and implements three nested searches.

flowchart TD
  S[src_dt, tz_info, src_is_utc] --> T0{type == OFFSET?}
  T0 -- yes --> X0[total_offset = tz_info.offset]
  T0 -- no  --> T2[walk gmt_off_rule_start..count<br/>pick rule whose 'until' bracket contains src_dt]
  T2 --> T3{ds_type}
  T3 -- FIXED --> T4[total_offset = gmt_off + ds_ruleset]
  T3 -- RULESET_ID --> T5[scan ds_ruleset; for each rule<br/>resolve year via from/to/in_month]
  T5 --> T6[tz_get_ds_change_julian_date_diff]
  T6 --> T7{|date_diff| < 2d?}
  T7 -- yes --> T8[tz_check_ds_match_string + leap interval]
  T7 -- no  --> T9[smallest positive date_diff wins]
  T8 --> T9
  T9 --> T10[total_offset = gmt_off + save_time]
  T4 --> T11[apply offset -> dest_dt]
  T10 --> T11
  X0 --> T11
  T11 --> Z[dest_dt + resolved offset_id, dst_id]

The first search is over the zone’s offset-rule list. Most zones have one rule; Africa/Cairo has eight; Europe/Lisbon has thirty-plus. Each rule covers [prev_rule.until, this_rule.until); the loop picks the first rule whose until is after src_dt. until_flag = UNTIL_INFINITE is treated as TZ_MAX_JULIAN_DATE. The break uses src_julian_date <= rule_julian_date + 1 — the +1 is a safety buffer for LOCAL_WALL until_time shifting up to 14 hours when interpreted as UTC.

The second search depends on ds_type. DS_TYPE_FIXED is immediate. DS_TYPE_RULESET_ID scans the DS ruleset for the most recent rule whose effective date precedes src_dt. Pruning: src_year + 1 < curr_ds_rule->from_year breaks (sorted ascending); src_year - 1 > curr_ds_rule->to_year saves a “second-best” candidate. For each surviving rule tz_get_ds_change_julian_date_diff produces the rule’s effective Julian date (year resolved via get_year_to_apply_rule) and date_diff against source. Match thresholds: date_diff >= DATE_DIFF_MATCH_SAFE_THRESHOLD_SEC (≥ 2 days) accepts immediately; |date_diff| smaller triggers the deep DST-string match + leap-interval check.

The third search is the AT time qualifier dance. IANA tags every at_time with s / w / u/g/z; CUBRID maps these to TZ_TIME_TYPE_LOCAL_STD, TZ_TIME_TYPE_LOCAL_WALL, and TZ_TIME_TYPE_UTC. tz_offset (src_is_utc, until_time_type, gmt_offset_sec, ds_save_time) produces the offset to apply when crossing a rule boundary — shifting src_dt into the rule’s reference frame (src_is_utc=true) or shifting the rule’s until into wall-clock (src_is_utc=false). When src_is_utc=true: LOCAL_STD adds gmt_offset_sec, LOCAL_WALL adds gmt_offset_sec + ds_save_time. When src_is_utc=false: UTC subtracts gmt_offset_sec + ds_save_time, LOCAL_STD subtracts ds_save_time. ds_save_time is what the rule would save if active.

The fall-back (overlap) case hits tz_check_ds_match_string, comparing the user’s optional dst_str (e.g. EDT from 'America/New_York EDT') against three candidate format strings: var_format is the %s-templated form (E%sT → EST/EDT); save_format is the explicit DST string; std_format is the explicit standard string. letter_abbrev comes from the matched DS rule’s LETTER column (S/D/W for many North American zones), with '-' treated as “no abbreviation”.

TIMESTAMP vs DATETIME — two encodings, same algorithm

CUBRID has six datetime types touched by the TZ layer:

Type	Layout	”LTZ” variant?
`DATE`	int32 julian day	n/a
`TIME`	int32 milliseconds-of-day	n/a
`DATETIME`	{ int32 julian, int32 ms-of-day }	`DATETIMELTZ`
`DATETIMETZ`	DATETIME (UTC) + 4-byte TZ_ID	—
`TIMESTAMP`	int32 UTC seconds-since-epoch	`TIMESTAMPLTZ`
`TIMESTAMPTZ`	TIMESTAMP (UTC) + 4-byte TZ_ID	—

The L-variants carry no on-disk zone tag — they store UTC and re-apply the current session zone at read time. They are CUBRID’s analogue of PostgreSQL TIMESTAMP WITH TIME ZONE: round-trip-stable in absolute time, lossy in zone name. The two *TZ variants share the conversion machinery: DATETIMETZ uses tz_create_datetimetz directly; TIMESTAMPTZ first passes through db_timestamp_encode_utc to compress the date and time-of-day into the int32 epoch representation.

tz_create_datetimetz decodes optional tz_str (or falls back to default_tz_region) into a TZ_DECODE_INFO, runs tz_datetime_utc_conv to convert wall-clock to UTC, then tz_encode_tz_id packs the resolved triple. tz_create_timestamptz is the same flow with one extra step: bump DB_TIME (seconds) to milliseconds, run the conversion, divide back, then db_timestamp_encode_utc packs UTC date+time into the 4-byte epoch. The only difference is the storage compression at the end.

Cross-zone conversion — `tz_conv_tz_datetime_w_region`

Cross-zone is two passes through tz_datetime_utc_conv: source-zone wall → UTC, then UTC → dest-zone wall. tz_conv_tz_datetime_w_zone_info compresses one pass when source or destination is already UTC (TZ_REGION_OFFSET && offset == 0), and short-circuits when source and destination are the same zone — the source datetime is copied verbatim and the resolved TZ_DECODE_INFO is propagated. The same-zone short-circuit preserves the source tz_id exactly: without it, a no-op CONVERT_TZ('America/New_York', 'America/New_York', x) would re-resolve the offset/DS sub-IDs and could produce a different TZ_ID. Public surface: tz_conv_tz_datetime_w_region (TZ_REGION → TZ_REGION) and tz_conv_tz_datetime_w_zone_name (string → string).

Library compilation — from tzdata source to `libcubrid_timezones.so`

The compile path runs from cubrid gen_tz (entry: timezone_compile_data, SA-mode-only). The fixed pipeline: tzc_check_new_package_validity (verify required files), tzc_load_raw_data (parse africa, europe, … into TZ_RAW_DATA), tzc_import_old_data (in extend mode, graft prior zone IDs from the existing library), tzc_del_unused_raw_data, tzc_sort_raw_data, tzc_index_data (assign zone_id / offset_id / ds_id), tzc_compile_data (TZ_RAW_DATA → TZ_DATA). Under Windows: tzc_load_windows_iana_map. In TZ_GEN_TYPE_EXTEND: tzc_extend preserves prior zone_id ordering; on ER_TZ_COMPILE_ERROR it falls into tzc_update to migrate user data. Finally tzc_compute_timezone_checksum (md5) and tzc_export_timezone_dot_c emit the C source.

Output timezones.c is ~2 MB, containing exactly the symbols the runtime expects: int tz_country_count = …;, TZ_COUNTRY tz_countries[] = { … };, … char tz_timezone_checksum[] = "…";. make_tz.sh orchestrates: in extend mode it iterates every database in $CUBRID_DATABASES/databases.txt and runs cubrid gen_tz -g extend $DATABASE_NAME (internally walking every DATETIMETZ / TIMESTAMPTZ column and rewriting through conv_tz if the zone’s zone_id can’t survive); in new mode it runs once. Then timezones/tzlib/build_tz.sh calls gcc -shared -o libcubrid_timezones.so timezones.c and moves the .so into $CUBRID/lib.

Migration — `conv_tz`

When a tzdata update is severe enough that a zone’s representation changes (a ruleset restructured, an offset rule split), tzc_extend returns ER_TZ_COMPILE_ERROR and tzc_update triggers a per-row rewrite via conv_tz, dispatching by DB_TYPE.

The DB_TYPE_TIMESTAMPTZ flow is representative. tz_decode_tz_id recovers the triple. Raw-offset rows (TZ_REGION_OFFSET) are copied verbatim — offsets don’t change across tzdata releases. Otherwise the function consults tz_Is_backward_compatible_timezone[ZONE_MAX] (populated by tzc_extend): if true, the migration is a pure name-lookup remap via set_new_zone_id — only the zone_id number changes, not the on-disk datetime. If false, the row’s UTC datetime is decoded, the zone name is looked up in the new library, and tz_datetime_utc_conv re-resolves offset_id / dst_id. If the new tzdata dropped the row’s DST abbreviation (e.g. MSD after Russia’s 2011 DST abolition), the loop retries with an empty dst_str — accepting that the resolved DST no longer matches user intent but preserving the absolute UTC value.

The other three types are variations: DB_TYPE_DATETIMETZ is the same without epoch compression; DB_TYPE_TIMESTAMPLTZ and DB_TYPE_DATETIMELTZ re-tag with the session zone and only convert if the session zone itself moved.

`tz_explain_tz_id` — surfacing the resolved triple to SQL

TZ_OFFSET(), DBTIMEZONE, and the format='YYYY-MM-DD HH:MI:SS TZR TZD TZH:TZM' formatter recover the human-readable form via tz_explain_tz_id, which calls tz_decode_tz_id (tz_id, true, &tz_info) (full decode — p_zone_off_rule and p_ds_rule populated). For raw-offset TZ_IDs, both string outputs are empty and tzh/tzm come from tz_info.offset / 3600 and (tz_info.offset % 3600) / 60. For geographical zones, total offset is p_zone_off_rule->gmt_off + (ds_type == DS_TYPE_RULESET_ID && p_ds_rule != NULL ? p_ds_rule->save_time : 0); tzr is tzd->names[zone_id].name; tzdst comes from tz_info.zone.dst_str — NOT recomputed from the offset rule, but populated by tz_decode_tz_id formatting var_format against the matched DS rule’s letter_abbrev. This is the only direct surface for DST abbreviations.

OS zone detection — `tz_resolve_os_timezone`

When CUBRID resolves the host zone (during boot, or cubrid_timezone='SYSTEM'), tz_resolve_os_timezone dispatches per OS. Linux: find_timezone_from_clock reads /etc/sysconfig/clock ZONE="..." (Red-Hat); fall back to find_timezone_from_localtime (the symlink target of /etc/localtime — Debian; path tail under /usr/share/zoneinfo/ is the IANA name). AIX: $TZ. Windows: _get_tzname returns a Windows name (Pacific Standard Time), looked up in tzd->windows_iana_map (from windowsZones.xml via tzc_load_windows_iana_map). The function returns the zone_id (index into tzd->names), not the name itself.

Compatibility checksum

Every TZ_DATA carries a 32-character md5 checksum computed by tzc_compute_timezone_checksum over the marshalled binary. check_timezone_compat compares client and server checksums on connect; mismatch raises ER_TZ_INCOMPATIBLE_TZ_LIB. The failure mode it prevents: a client where Asia/Seoul = zone_id 41 against a server where it is 43 — any DATETIMETZ round-trip would silently misencode. Fix is to rebuild both sides against the same tzdata or ship the same libcubrid_timezones.so.

Source Walkthrough

Symbols grouped by call-flow.

Library load. tz_load / tz_unload open/close libcubrid_timezones.so and populate tz_Timezone_data. tz_load_library wraps dlopen; tz_load_data_from_lib dlsyms every symbol via TZLIB_GET_VAL / TZLIB_GET_ADDR. tz_get_data / tz_set_data are accessors. tz_get_new_timezone_data / tz_set_new_timezone_data swap in the second blob during extend migration.

Region (session/system). tz_set_tz_region_system, tz_get_system_tz_region, tz_get_system_timezone. tz_get_session_tz_region calls tz_get_client_tz_region_session (CS) or tz_get_server_tz_region_session (SVR). tz_get_utc_tz_id, tz_get_utc_tz_region, tz_get_invalid_tz_region are sentinel constructors. tz_str_to_region parses '+08:00' / 'Europe/Berlin' / 'Europe/Berlin EEST'.

TZ_ID encode/decode. tz_encode_tz_id / tz_decode_tz_id pack/unpack the 32-bit identifier; tz_encode_tz_region / tz_decode_tz_region for TZ_REGION. tz_id_to_str renders 'America/New_York EDT'. tz_get_zone_id_by_name is binary search; tz_get_best_match_zone is the prefix-match cousin. tz_tzid_convert_region_to_offset collapses a triple into an absolute offset.

Wall ↔ UTC (the heart). tz_datetime_utc_conv (both directions, gated by src_is_utc). tz_offset (three-way LOCAL_STD / LOCAL_WALL / UTC qualifier). tz_get_ds_change_julian_date_diff converts a DS rule’s ON column (lastSun, Fri>=24) to a date diff; tz_get_first_weekday_around_date is the underlying resolver. tz_fast_find_ds_rule is the second-pass DS scan; tz_check_ds_match_string matches dst_str against var_format / save_format / std_format. Supporting predicates: get_date_diff_from_ds_rule, get_closest_ds_rule, get_saving_time_from_offset_rule, get_year_to_apply_rule, is_in_overlap_interval.

Type-creation entry points (called from string_opfunc.c). tz_create_datetimetz, tz_create_timestamptz (wall-clock + tz string); tz_create_datetimetz_from_ses (session zone); *_from_offset (±HH:MM); *_from_zoneid_and_tzd (zone-id + DST abbreviation); tz_create_datetimetz_from_utc (UTC + dest region); tz_create_datetimetz_from_parts. Session-tzid variants: tz_create_session_tzid_for_{datetime,timestamp,time}.

Cross-zone & explain. tz_conv_tz_datetime_w_region (used by CONVERT_TZ), tz_conv_tz_datetime_w_zone_name, tz_conv_tz_time_w_zone_name, tz_conv_tz_datetime_w_zone_info (internal). tz_utc_datetimetz_to_local, tz_datetimeltz_to_local reverse direction. tz_datetimetz_fix_zone, tz_timestamptz_fix_zone re-resolve sub-IDs. tz_explain_tz_id surfaces (TZR, TZD, TZH, TZM); tz_get_timezone_offset is the (tz_str, utc_dt) → offset shorthand.

Leap seconds. Gated by PRM_ID_TZ_LEAP_SECOND_SUPPORT (off by default). tz_get_leapsec_support, tz_timestamp_encode_leap_sec_adj, tz_timestamp_decode_leap_sec_adj, tz_timestamp_decode_no_leap_sec, tz_timestamp_decode_sec.

OS zone detection. tz_resolve_os_timezone per-OS dispatcher; Linux: find_timezone_from_clock, find_timezone_from_localtime; Windows: tz_get_iana_zone_id_by_windows_zone.

Migration. conv_tz per-row rewrite; set_new_zone_id name-based remap; put_timezone_checksum / check_timezone_compat connect handshake; tz_check_geographic_tz, tz_check_session_has_geographic_tz guard offset-only sessions.

Compile-time tooling (tz_compile.c). timezone_compile_data (SA-mode-only top-level). Pipeline: tzc_check_new_package_validity, tzc_load_raw_data, tzc_load_countries, tzc_load_zone_names, tzc_load_rule_file, tzc_load_backward_zones, tzc_load_leap_secs, tzc_add_{zone,link,offset_rule,ds_rule,leap_sec}, parsers tzc_parse_ds_change_on, tzc_read_time_type, str_to_offset_rule_until, str_month_to_int, str_day_to_int, str_read_day_var. Then tzc_sort_raw_data, tzc_index_data, tzc_compile_data, tzc_compile_ds_rules, tzc_extend, tzc_update, tzc_update_internal, tzc_compute_timezone_checksum, tzc_export_timezone_dot_c. Windows-only: tzc_load_windows_iana_map, xml_start_mapZone. Debug: tzc_dump_{summary,countries,timezones,one_timezone,leap_sec}.

SQL layer surface. db_to_datetimetz → tz_create_datetimetz_from_{parts,offset,zoneid_and_tzd}. db_conv_tz → conv_tz. db_format (%TZR %TZD %TZH:%TZM) → tz_explain_tz_id. db_to_char_datetimetz renders explicit DST. Builtin CONVERT_TZ → tz_conv_tz_datetime_w_zone_name; NEW_TIME → tz_conv_tz_time_w_zone_name.

Position hints (as of this revision)

Symbol	File	Line
`TZ_DATA` / `TZ_OFFSET_RULE` / `TZ_DS_RULE`	src/base/timezone_lib_common.h	198
`TZ_TIME_TYPE` / `TZ_GEN_TYPE`	src/base/timezone_lib_common.h	55
`TZ_REGION` / `TZ_ID`	src/compat/dbtype_def.h	774
`TZ_DECODE_INFO` / `TZ_MASK_TZ_ID_FLAG`	src/base/tz_support.c	58
`tz_load` / `tz_load_data_from_lib`	src/base/tz_support.c	282
`tz_get_zone_id_by_name`	src/base/tz_support.c	1110
`tz_str_timezone_decode` / `tz_str_to_region`	src/base/tz_support.c	1171
`tz_create_datetimetz` / `tz_create_timestamptz`	src/base/tz_support.c	1384
`tz_utc_datetimetz_to_local`	src/base/tz_support.c	1572
`tz_encode_tz_id` / `tz_decode_tz_id`	src/base/tz_support.c	1892
`tz_get_ds_change_julian_date_diff`	src/base/tz_support.c	2334
`tz_fast_find_ds_rule` / `tz_check_ds_match_string`	src/base/tz_support.c	2389
`tz_offset` / `get_year_to_apply_rule`	src/base/tz_support.c	2560
`tz_datetime_utc_conv`	src/base/tz_support.c	2816
`tz_conv_tz_datetime_w_zone_info` / `_w_region`	src/base/tz_support.c	3731
`tz_explain_tz_id`	src/base/tz_support.c	3894
`tz_create_datetimetz_from_offset` / `_zoneid_tzd`	src/base/tz_support.c	4069
`tz_resolve_os_timezone`	src/base/tz_support.c	4589
`tz_get_server_tz_region_session`	src/base/tz_support.c	4676
`check_timezone_compat`	src/base/tz_support.c	5050
`tz_tzid_convert_region_to_offset`	src/base/tz_support.c	5075
`set_new_zone_id` / `conv_tz`	src/base/tz_support.c	5184
`timezone_compile_data`	src/base/tz_compile.c	637
`tzc_load_raw_data` / `tzc_compile_data`	src/base/tz_compile.c	866
`tzc_export_timezone_dot_c` / `tzc_extend`	src/base/tz_compile.c	4215
`tzc_compute_timezone_checksum` / `tzc_update`	src/base/tz_compile.c	6273

Cross-check Notes

TZ_GEN_TYPE_UPDATE declared but not invoked from make_tz.sh. Shell script only passes -g new or -g extend. update mode is reachable via tzc_extend → tzc_update_internal only when a backward-incompatible change is detected.
TZ_TIME_TYPE_UTC collapses three IANA suffixes (u, g, z — aliases by IANA convention).
tz_get_offset_in_mins ignores DST (TODO at line 5483). DST-aware offsets must go through tz_explain_tz_id.
is_full_decode = false leaves pointer fields unset. Only zone_id / offset_id / dst_id populated; p_timezone, p_zone_off_rule, p_ds_rule are NULL. Reaching into them will segfault.
tz_Is_backward_compatible_timezone is SA_MODE-only. The bitmap is populated only in the standalone build; #if defined (SA_MODE) paths in conv_tz are unreachable in CS_MODE/SERVER_MODE.
The “+1 day safety buffer” in the offset-rule walk is correct: LOCAL_WALL until_time can shift up to 14 hours as UTC, crossing a Julian boundary.
TZ_ID = 0 is a valid zone, not a sentinel — decodes to the first name in tzd->names. Use tz_get_invalid_tz_region / TZ_INVALID_OFFSET.
leapseconds is parsed but parameter-gated. tzc_load_leap_secs populates tzd->ds_leap_sec always; runtime adjustment only when PRM_ID_TZ_LEAP_SECOND_SUPPORT is on (off by default).

Open Questions

gen_tz extend with a row whose dst_id no longer exists in the new ruleset? conv_tz retries with empty dst_str, but the overlap branch may flip silently, shifting the value by save_time (~1 hour). Is this drift logged anywhere?
Is gmt_off_rule_count > 0 guaranteed? The assertion at line 2886 enforces at runtime, but tzc_del_unused_raw_data could in principle filter all of a zone’s rules.
Does tz_get_offset_in_mins matter for any user-visible result? Only callers are diagnostic. Removing (vs fixing DST handling) might be cleaner.
db_timezone='Asia/Seoul' but OS host is America/New_York — which wins for SYSTIMESTAMP? Code suggests system region wins; tz_resolve_os_timezone is consulted only when db_timezone='SYSTEM' literally.
tz_Compare_datetimetz_tz_id / tz_Compare_timestamptz_tz_id SA-mode globals. Whether identical UTC datetimes with different tz_ids compare equal isn’t visible from the timezone module alone — requires tracing into mr_*_tztimestamp / mr_*_tzdatetime.

Sources

src/base/tz_support.{h,c}, src/base/tz_compile.{h,c}, src/base/timezone_lib_common.h
src/compat/dbtype_def.h (TZ_REGION, TZ_ID, DATETIMETZ, TIMESTAMPTZ)
src/query/string_opfunc.c (SQL entry points: db_to_datetimetz, db_conv_tz, db_format, CONVERT_TZ, NEW_TIME)
timezones/tzdata/ (IANA source: africa, antarctica, asia, australasia, backward, etcetera, europe, northamerica, pacificnew, southamerica, iso3166.tab, leapseconds, zone.tab, windowsZones.xml)
timezones/make_tz.sh, timezones/tzlib/build_tz.sh
IANA Time Zone Database (https://www.iana.org/time-zones) — canonical input spec.