PostgreSQL Datatype Library — varlena, numeric, datetime, jsonb, and arrays

A relational database is, at bottom, a machine for storing and comparing values, and a value only means something relative to its type. The relational model (Codd 1970; Database System Concepts, Silberschatz 7e, ch. 4 “Intermediate SQL” and the type-system discussion in ch. 5) takes the domain — the set of permissible values for an attribute — as a primitive. SQL turns the domain into a data type with an external textual syntax, an internal storage encoding, a total or partial order, and a battery of operators. For the engine, a type is therefore not a passive label but an abstract data type (ADT): a representation plus the operations that respect it. The representation is opaque to the rest of the system; only the type’s own functions may interpret the bytes. This is exactly the encapsulation that Liskov & Zilles formalized for programming languages, applied to on-disk tuples.

Three forces shape how a DBMS realizes its types. The first is storage efficiency. A row store packs many attributes into each tuple, so the engine wants fixed-length types (a 4-byte integer, an 8-byte timestamp) to be stored inline with no per-value overhead. But strings, decimals, JSON documents, and arrays are intrinsically variable-length, and a billion-row table cannot afford a fat header on every tiny string. The encoding must therefore make its overhead scale with the value: a one-character string should not carry the same header as a one-megabyte blob.

The second force is comparison and ordering. Indexes (B-trees), sorts, hash joins, and GROUP BY all require that a type expose a consistent comparison function and, where applicable, a hash function. The catch is that “consistent” is type-specific and sometimes locale- or collation-specific: byte order is the right order for bytea but the wrong order for human-language text, and 1.0 must compare equal to 1.00 for numeric even though their byte representations differ. The ordering contract — the operator class — is what lets a generic B-tree index any type without knowing what the type is (see postgres-nbtree.md and postgres-index-am.md).

The third force is extensibility. What Goes Around Comes Around (Stonebraker & Hellerstein 2005; captured in dbms-papers/goes-around.md) traces how the object-relational lineage, POSTGRES above all, chose to make the type system open: users and extensions add new base types, and the engine treats them exactly like built-ins because every type is described by catalog rows plus a set of registered C functions. The price of that openness is a rigid calling convention: every type function, built-in or third-party, must speak the same ABI so the executor can call it indirectly. The benefit is that PostGIS can add a geometry type, and the citext extension a case-insensitive text type, without patching the core executor. This is the generality that The Design of POSTGRES (Stonebraker & Rowe 1986; dbms-papers/) set out to provide.

The deep idea, then, is uniformity through indirection. The query executor manipulates values as undifferentiated Datums — a register-width token that is either the value itself (for pass-by-value types like int4) or a pointer to it (for pass-by-reference types like text). All type-specific behavior is reached by looking up a function in the catalog and calling it through a uniform signature. The ADT library is the collection of those functions for the built-in types. It is the layer where the abstract relational notion of a domain becomes concrete bytes and concrete C code.
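To make the Datum idea concrete, here is a minimal sketch in C. The names (Datum as a plain uintptr_t, int32_to_datum, pointer_to_datum, and so on) are hypothetical stand-ins for PostgreSQL's own Int32GetDatum / DatumGetInt32 / PointerGetDatum / DatumGetPointer; the point is only that one word can carry either a by-value integer or a pointer, and nothing in the word says which.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Datum: one register-width unsigned word. */
typedef uintptr_t Datum;

/* Pass-by-value: the int32 value itself is stored in the Datum. */
static inline Datum int32_to_datum(int32_t v)
{
    return (Datum) (uint32_t) v;
}

static inline int32_t datum_to_int32(Datum d)
{
    return (int32_t) (uint32_t) d;
}

/* Pass-by-reference: the Datum stores a pointer to the value's bytes. */
static inline Datum pointer_to_datum(const void *p)
{
    return (Datum) p;
}

static inline const char *datum_to_cstring(Datum d)
{
    return (const char *) d;
}
```

Which accessor applies is decided by the caller from catalog information (typbyval), never by inspecting the Datum itself.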

Common DBMS Design

Every SQL engine must answer the same questions for each type, and the answers converge on a recognizable pattern.

The I/O quartet. A type needs to convert between its internal byte representation and (a) the human-readable text form used in SQL literals and COPY/psql output, and (b) a machine-readable binary form used on the client wire protocol for efficiency. That yields four functions, which nearly every engine has in some form:

  • input — parse external text into internal bytes (used by literals, COPY ... FROM, casts from text);
  • output — render internal bytes as text (used by result rows, COPY ... TO);
  • receive — decode the binary wire form into internal bytes;
  • send — encode internal bytes into the binary wire form.

Text I/O is canonical and stable; binary I/O is an optimization that trades human-readability for parse speed and exactness (no float round-tripping through decimal). A robust engine never lets the binary form become a hidden on-disk format it cannot evolve; a binary format that may need to change can carry its own version byte, as jsonb's binary send form does.
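A minimal sketch of the quartet for a hypothetical 32-bit integer type, assuming text I/O via strtol/snprintf and a 4-byte big-endian (network byte order) binary form, which is how PostgreSQL's wire protocol carries integers. The myint_* names are illustrative only; this is not the code of int4in / int4out / int4recv / int4send.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* input: external text -> internal value. Returns 0 on success, -1 on bad text. */
int myint_in(const char *text, int32_t *out)
{
    char *end;

    errno = 0;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE ||
        v < INT32_MIN || v > INT32_MAX)
        return -1;
    *out = (int32_t) v;
    return 0;
}

/* output: internal value -> text; buf needs room for "-2147483648" plus NUL. */
void myint_out(int32_t v, char buf[12])
{
    snprintf(buf, 12, "%" PRId32, v);
}

/* send: internal value -> 4 bytes in network (big-endian) order. */
void myint_send(int32_t v, uint8_t wire[4])
{
    uint32_t u = (uint32_t) v;

    wire[0] = (uint8_t) (u >> 24);
    wire[1] = (uint8_t) (u >> 16);
    wire[2] = (uint8_t) (u >> 8);
    wire[3] = (uint8_t) u;
}

/* receive: 4 big-endian bytes -> internal value. */
int32_t myint_recv(const uint8_t wire[4])
{
    uint32_t u = ((uint32_t) wire[0] << 24) | ((uint32_t) wire[1] << 16) |
                 ((uint32_t) wire[2] << 8) | (uint32_t) wire[3];

    return (int32_t) u;
}
```

Because the binary pair moves the integer's bits directly, no decimal parsing happens on either side of the wire.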

Variable-length representation. Fixed-length types are trivial: store the bytes inline. Variable-length types need a length somewhere. The two classic choices are a length-prefixed encoding (a header word giving the byte count, then the payload) and a sentinel-terminated encoding (C strings, NUL-terminated). Databases overwhelmingly choose length-prefix because it permits embedded NULs, O(1) length, and binary-safe copying. The refinement every mature engine eventually makes is to shrink the header for small values and to spill large values out of line — because a row store wants tuples to fit on a page, and a multi-megabyte value would otherwise make the tuple unstorable. The out-of-line mechanism (Oracle’s LOBs, SQL Server’s LOB_DATA / row-overflow pages, PostgreSQL’s TOAST) stores the big payload in a side relation and leaves a small pointer in the tuple.
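The difference between the two classic encodings fits in a few lines. This sketch uses a hypothetical LenPrefixed type allocated with plain malloc; it stores a payload containing an embedded NUL. The length prefix keeps all three bytes and reports the length in O(1), while a NUL-terminated view of the same bytes stops at the first zero byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed value: a byte count, then exactly that many bytes. */
typedef struct
{
    uint32_t      len;     /* payload length, known in O(1) */
    unsigned char data[];  /* payload; may contain '\0' bytes */
} LenPrefixed;

/* Copy len bytes into a freshly allocated length-prefixed value. */
LenPrefixed *lp_make(const void *bytes, uint32_t len)
{
    LenPrefixed *v = malloc(sizeof(LenPrefixed) + len);

    if (v == NULL)
        return NULL;
    v->len = len;
    memcpy(v->data, bytes, len);
    return v;
}
```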

Arbitrary-precision decimal. float is fast but lossy; financial and exact-arithmetic workloads need a decimal type that represents 0.1 exactly and supports hundreds of significant digits. The universal implementation is a sign, an exponent/scale, and an array of digits in some radix (often a power of ten so that decimal rounding and text conversion are clean), with schoolbook algorithms for add/subtract/multiply/divide.
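A small illustration of why float is lossy for decimal work, assuming IEEE 754 doubles: adding 0.1 ten times does not give exactly 1.0, while counting in integer units of 0.1 is exact. The scaled-integer helper is a hypothetical stand-in for the idea behind a decimal type, not an implementation of one.

```c
#include <assert.h>
#include <stdint.h>

/* Adds 0.1 n times in binary floating point. 0.1 has no exact binary
 * representation, so rounding error accumulates. volatile forces each
 * partial sum to be stored as a double. */
double sum_tenths_double(int n)
{
    volatile double s = 0.0;

    for (int i = 0; i < n; i++)
        s += 0.1;
    return s;
}

/* The same sum in integer units of 0.1 (one decimal place of scale): exact. */
int64_t sum_tenths_scaled(int n)
{
    int64_t units = 0;     /* represents units / 10 */

    for (int i = 0; i < n; i++)
        units += 1;        /* exactly 0.1 */
    return units;
}
```

A real decimal type generalizes the second function: the digits are stored exactly, and only rounding to a requested scale ever discards information.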

Temporal types. Dates and timestamps are stored as integers — a count of days, or of microseconds, from an epoch — because integer arithmetic is exact and fast, and calendar conversion (the messy part: leap years, month lengths, Gregorian reform) is isolated in a Julian-day kernel that maps (year, month, day) to a single day number and back.
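One detail of integer temporal storage is worth a sketch: turning a microsecond count into a day number plus time of day must use floor division, or instants before the epoch land on the wrong day. The split_timestamp helper below is hypothetical and assumes a signed 64-bit microsecond count.

```c
#include <assert.h>
#include <stdint.h>

#define USECS_PER_DAY INT64_C(86400000000)

/* Split a microsecond count from an epoch into a day number and the
 * microseconds elapsed within that day. C's / and % truncate toward zero,
 * so negative (pre-epoch) values need a fix-up to get floor semantics. */
void split_timestamp(int64_t ts, int64_t *day, int64_t *usec_of_day)
{
    int64_t d = ts / USECS_PER_DAY;
    int64_t r = ts % USECS_PER_DAY;

    if (r < 0)
    {
        r += USECS_PER_DAY;
        d -= 1;
    }
    *day = d;
    *usec_of_day = r;
}
```

For example, one microsecond before the epoch is day -1 at 23:59:59.999999, not day 0 at a negative time.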

Semi-structured and collection types. Modern engines add JSON and array types. The naive implementation stores the text and re-parses on every access; the mature implementation stores a parsed binary tree so that field access and containment tests are fast, trading insert-time serialization cost for query-time speed. Collection types (arrays, nested tables) carry dimensionality and a null map alongside packed element data.

flowchart TD
  subgraph cat["catalog description"]
    T["pg_type row<br/>typinput typoutput<br/>typreceive typsend<br/>typlen typbyval typalign typstorage"]
    P["pg_proc rows<br/>(the C functions)"]
  end
  subgraph adt["ADT library (utils/adt)"]
    IN["typeIN: cstring to internal"]
    OUT["typeOUT: internal to cstring"]
    RECV["typeRECV: binary to internal"]
    SEND["typeSEND: internal to binary"]
    CMP["btTYPEcmp / hashTYPE / sortsupport"]
  end
  T --> P --> adt
  EXEC["executor / COPY / wire protocol"] -->|"FunctionCall via fmgr"| adt
  IN --> DATUM["Datum (value or pointer)"]
  RECV --> DATUM
  DATUM --> OUT
  DATUM --> SEND
  DATUM --> CMP

PostgreSQL’s Approach

PostgreSQL’s distinctive choice is to make this pattern fully catalog-driven and uniform across built-in and user types. There is no privileged “built-in type” code path in the executor: int4 and a PostGIS geometry are dispatched identically, through the function manager (fmgr). That is the through-line of the rest of this document.

The type is a catalog row; behavior is registered functions

A PostgreSQL type is a row in pg_type. Its scalar attributes tell generic code how to move a value around without understanding it: typlen (a positive byte width, -1 for varlena, -2 for a NUL-terminated cstring), typbyval (pass-by-value vs by-reference), typalign (c/s/i/d), and typstorage (plain/extended/external/main, governing TOAST). Its function references (typinput, typoutput, typreceive, typsend) point at pg_proc rows, i.e. at C functions in the ADT library. Comparison and hashing are registered separately, as operator classes (pg_opclass, with support functions in pg_amproc) defined for the type. Generic subsystems (the executor, COPY, the wire protocol, the planner’s selectivity code) never hard-code a type; they read these catalog fields and call the registered functions through fmgr. The mechanics of that dispatch (FmgrInfo, FunctionCallInfo, PG_FUNCTION_ARGS, the V1 ABI) are the subject of postgres-fmgr.md; here we take them as given and focus on what the ADT functions do.
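The phrase "move a value around without understanding it" can be made concrete with a sketch modeled loosely on datumGetSize in src/backend/utils/adt/datum.c: given only typlen and typbyval, generic code can work out how many bytes a value occupies. TypeInfo is a hypothetical, trimmed-down stand-in for a pg_type row, and the toy varlena here uses a plain host-order 4-byte length that includes itself, not the real flag-bit header described below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t Datum;

/* Hypothetical, trimmed-down analogue of the pg_type fields generic code reads. */
typedef struct
{
    const char *typname;
    int16_t     typlen;    /* > 0: fixed width; -1: varlena; -2: C string */
    bool        typbyval;  /* true: the value lives inside the Datum */
} TypeInfo;

/* Bytes occupied by the value a Datum represents, decided only from the
 * catalog fields, never from knowledge of the specific type. */
size_t datum_size(Datum value, const TypeInfo *t)
{
    if (t->typbyval || t->typlen > 0)
    {
        assert(t->typlen > 0);
        return (size_t) t->typlen;
    }
    if (t->typlen == -1)
    {
        /* Toy varlena: a host-order uint32 total length that includes itself. */
        uint32_t total;

        memcpy(&total, (const void *) value, sizeof total);
        return total;
    }
    assert(t->typlen == -2);
    return strlen((const char *) value) + 1;   /* include the terminating NUL */
}
```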

Every ADT function has the same C signature:

// textin — src/backend/utils/adt/varlena.c
Datum
textin(PG_FUNCTION_ARGS)
{
    char       *inputText = PG_GETARG_CSTRING(0);

    PG_RETURN_TEXT_P(cstring_to_text(inputText));
}

// textout — src/backend/utils/adt/varlena.c
Datum
textout(PG_FUNCTION_ARGS)
{
    Datum       txt = PG_GETARG_DATUM(0);

    PG_RETURN_CSTRING(TextDatumGetCString(txt));
}

PG_FUNCTION_ARGS expands to a single FunctionCallInfo fcinfo parameter; the PG_GETARG_* and PG_RETURN_* macros unpack arguments and box the result back into a Datum. This is the uniformity that lets the executor call any type’s input function by OID. The recv/send pair is the binary twin of in/out, reading and writing a StringInfo message buffer:

// textrecv — src/backend/utils/adt/varlena.c
Datum
textrecv(PG_FUNCTION_ARGS)
{
    StringInfo  buf = (StringInfo) PG_GETARG_POINTER(0);
    text       *result;
    char       *str;
    int         nbytes;

    /* consume the rest of the message as text */
    str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
    result = cstring_to_text_with_len(str, nbytes);
    pfree(str);
    PG_RETURN_TEXT_P(result);
}

Every variable-length value (text, bytea, numeric, jsonb, arrays, and user types declared with typlen = -1) is a varlena. The contract (stated at the top of src/include/varatt.h) is that the value begins with a header whose first byte carries flag bits (the low-order bits on little-endian machines, the high-order bits on big-endian ones) selecting one of four physical layouts. The header is read only through macros, never by direct field access, because the layout differs by endianness and by form.

The four forms (little-endian flag bits shown; big-endian is mirrored):

| Form | First-byte flag | Header size | Capacity | Use |
|---|---|---|---|---|
| 4-byte uncompressed | xxxxxx00 | 4 B (aligned) | up to ~1 GB | normal value |
| 4-byte compressed in-line | xxxxxx10 | 4 B + 4 B va_tcinfo | up to ~1 GB | pglz/lz4-compressed, still in the tuple |
| 1-byte short | xxxxxxx1 | 1 B (unaligned) | up to 126 B | small values; saves alignment padding |
| TOAST pointer | 00000001 | 1 B header + 1 B tag, then pointer body | n/a | out-of-line / indirect / expanded |

The 4-byte length word includes itself, so the payload length is VARSIZE - VARHDRSZ. The endianness-specific extraction is:

// VARSIZE_4B / VARSIZE_1B — src/include/varatt.h (little-endian)
#define VARSIZE_4B(PTR) \
((((varattrib_4b *) (PTR))->va_4byte.va_header >> 2) & 0x3FFFFFFF)
#define VARSIZE_1B(PTR) \
((((varattrib_1b *) (PTR))->va_header >> 1) & 0x7F)

The 1-byte short header is the cleverest piece. A normal 4-byte header must be 4-byte aligned, which on a tuple full of tiny strings wastes up to 3 padding bytes per value plus 3 of the 4 header bytes. The short form uses a single byte for both length and flag, is stored unaligned, and caps the value at 126 bytes — perfect for the short strings that dominate real schemas. A datum can be down-converted to short form when it fits (VARATT_CAN_MAKE_SHORT). Because short and external datums are unaligned, code that might see them must use the *_ANY family of macros, which dispatch on the flag bits:

// VARSIZE_ANY_EXHDR / VARDATA_ANY — src/include/varatt.h
#define VARSIZE_ANY_EXHDR(PTR) \
(VARATT_IS_1B_E(PTR) ? VARSIZE_EXTERNAL(PTR)-VARHDRSZ_EXTERNAL : \
(VARATT_IS_1B(PTR) ? VARSIZE_1B(PTR)-VARHDRSZ_SHORT : \
VARSIZE_4B(PTR)-VARHDRSZ))
#define VARDATA_ANY(PTR) \
(VARATT_IS_1B(PTR) ? VARDATA_1B(PTR) : VARDATA_4B(PTR))
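The header layouts above can be exercised without PostgreSQL headers. This sketch builds little-endian 4-byte and 1-byte headers byte by byte (so it behaves the same on any host) and reads the payload length back with the same dispatch that VARSIZE_ANY_EXHDR and VARDATA_ANY perform. The toy_* names are hypothetical; TOAST pointers are only detected, not decoded, and the compressed form is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 4-byte uncompressed header: (total size << 2), flag bits 00, little-endian. */
void toy_set_size_4b(uint8_t *p, uint32_t total)
{
    assert(total <= 0x3FFFFFFF);
    uint32_t h = total << 2;

    p[0] = (uint8_t) h;
    p[1] = (uint8_t) (h >> 8);
    p[2] = (uint8_t) (h >> 16);
    p[3] = (uint8_t) (h >> 24);
}

/* 1-byte short header: (total size << 1) | 1; total includes the header byte. */
void toy_set_size_1b(uint8_t *p, uint32_t total)
{
    assert(total >= 1 && total <= 0x7F);
    p[0] = (uint8_t) ((total << 1) | 0x01);
}

bool toy_is_1b(const uint8_t *p) { return (p[0] & 0x01) == 0x01; }
bool toy_is_1b_e(const uint8_t *p) { return p[0] == 0x01; }   /* TOAST pointer */

/* Payload length for the short and 4-byte uncompressed forms. */
size_t toy_size_any_exhdr(const uint8_t *p)
{
    assert(!toy_is_1b_e(p));          /* external datums are not handled here */
    if (toy_is_1b(p))
        return (size_t) ((p[0] >> 1) & 0x7F) - 1;
    uint32_t h = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                 ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
    return (size_t) ((h >> 2) & 0x3FFFFFFF) - 4;
}

/* Start of the payload, like VARDATA_ANY. */
const uint8_t *toy_data_any(const uint8_t *p)
{
    return toy_is_1b(p) ? p + 1 : p + 4;
}
```

Storing "hello" costs 9 bytes with the 4-byte header but 6 with the short one, before counting any alignment padding the 4-byte form would also need.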

Any datum other than a plain 4-byte uncompressed varlena (short-header, compressed in-line, or pushed out of line) counts as extended (VARATT_IS_EXTENDED). Before a type function can touch the payload it must detoast: pg_detoast_datum expands any extended datum, short headers included, to a plain 4-byte varlena, while pg_detoast_datum_packed leaves a short header alone (it only needs to undo compression/externalization, and the *_ANY macros handle the short header). The actual out-of-line fetch and decompression are done by detoast_attr in access/common/detoast.c, covered by postgres-toast.md:

// pg_detoast_datum_packed — src/backend/utils/fmgr/fmgr.c
struct varlena *
pg_detoast_datum_packed(struct varlena *datum)
{
    /* short headers pass through untouched; only compressed/external values are expanded */
    if (VARATT_IS_COMPRESSED(datum) || VARATT_IS_EXTERNAL(datum))
        return detoast_attr(datum);
    else
        return datum;
}

This is why the canonical text_to_cstring calls pg_detoast_datum_packed and VARDATA_ANY/VARSIZE_ANY_EXHDR, then frees the unpacked copy only if it differs from the original (i.e. only if detoasting actually allocated):

// text_to_cstring — src/backend/utils/adt/varlena.c
char *
text_to_cstring(const text *t)
{
    text       *tunpacked = pg_detoast_datum_packed(unconstify(text *, t));
    int         len = VARSIZE_ANY_EXHDR(tunpacked);
    char       *result;

    result = (char *) palloc(len + 1);
    memcpy(result, VARDATA_ANY(tunpacked), len);
    result[len] = '\0';

    /* free the detoasted copy only if detoasting allocated one */
    if (tunpacked != t)
        pfree(tunpacked);

    return result;
}

The constructor side is symmetric and always builds the full 4-byte header (datums “begin life untoasted”); the system may later shorten or TOAST them at tuple-assembly time:

// cstring_to_text_with_len — src/backend/utils/adt/varlena.c
text *
cstring_to_text_with_len(const char *s, int len)
{
    text       *result = (text *) palloc(len + VARHDRSZ);

    SET_VARSIZE(result, len + VARHDRSZ);    /* always the full 4-byte header */
    memcpy(VARDATA(result), s, len);

    return result;
}

flowchart TD
  D["incoming varlena Datum"] --> Q{"flag bits in first byte"}
  Q -->|"xxxxxx00"| FB["4-byte header<br/>aligned, plain<br/>VARDATA / VARSIZE"]
  Q -->|"xxxxxx10"| FC["4-byte header<br/>compressed in-line<br/>va_tcinfo holds rawsize+method"]
  Q -->|"xxxxxxx1"| SH["1-byte header<br/>unaligned short<br/>up to 126 bytes"]
  Q -->|"00000001"| EX["TOAST pointer (1b_e)<br/>tag: ONDISK / INDIRECT / EXPANDED"]
  FC -->|"detoast_attr"| FB
  EX -->|"detoast_attr"| FB
  FB --> USE["type function reads payload"]
  SH --> USE

text and bytea: copy plus collation-aware ordering

text and bytea are varlena with no extra structure: the payload is the string/byte sequence. Input/output are essentially memcpy around a header (textin/textout above; byteain/byteaout add the hex or escape encoding for arbitrary bytes, hex being the default output format). The interesting code is ordering, because text must sort by collation, not by byte value. varstr_cmp is the kernel: it fast-paths the C locale to memcmp and otherwise calls the collation provider (pg_strncoll), with a memcmp-equality shortcut that skips the expensive collation call when the strings are byte-identical:

// varstr_cmp — src/backend/utils/adt/varlena.c
int
varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
{
    int         result;
    pg_locale_t mylocale;

    check_collation_set(collid);
    mylocale = pg_newlocale_from_collation(collid);

    if (mylocale->collate_is_c)
    {
        /* C locale: byte order; on a common prefix the shorter string sorts first */
        result = memcmp(arg1, arg2, Min(len1, len2));
        if ((result == 0) && (len1 != len2))
            result = (len1 < len2) ? -1 : 1;
    }
    else
    {
        /* byte-identical strings are equal; skip the collation call */
        if (len1 == len2 && memcmp(arg1, arg2, len1) == 0)
            return 0;

        result = pg_strncoll(arg1, len1, arg2, len2, mylocale);

        /* Break tie if necessary. */
        if (result == 0 && mylocale->deterministic)
        {
            result = memcmp(arg1, arg2, Min(len1, len2));
            if ((result == 0) && (len1 != len2))
                result = (len1 < len2) ? -1 : 1;
        }
    }

    return result;
}
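The C-locale branch of varstr_cmp can be lifted out and run on its own. This sketch (hypothetical name c_locale_cmp, with its own Min macro) shows the two properties the text relies on: memcmp handles embedded NUL bytes, and byte order puts "B" before "a", which is right for bytea but not for human-language text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

/* Extracted C-locale branch of varstr_cmp: compare the common prefix with
 * memcmp, then let the shorter string sort first. */
int c_locale_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    int result = memcmp(a, b, Min(alen, blen));

    if (result == 0 && alen != blen)
        result = (alen < blen) ? -1 : 1;
    return result;
}
```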

For sorts, text registers a SortSupport function so the executor can skip the per-comparison fmgr call and even use abbreviated keys (pack a prefix of the collation key into a Datum for cheap first-pass comparison). The C-locale fast comparator is a bare memcmp:

// varstrfastcmp_c — src/backend/utils/adt/varlena.c
static int
varstrfastcmp_c(Datum x, Datum y, SortSupport ssup)
{
    VarString  *arg1 = DatumGetVarStringPP(x);     /* may detoast into a copy */
    VarString  *arg2 = DatumGetVarStringPP(y);
    char       *a1p = VARDATA_ANY(arg1);
    char       *a2p = VARDATA_ANY(arg2);
    int         len1 = VARSIZE_ANY_EXHDR(arg1);
    int         len2 = VARSIZE_ANY_EXHDR(arg2);
    int         result;

    result = memcmp(a1p, a2p, Min(len1, len2));
    if ((result == 0) && (len1 != len2))
        result = (len1 < len2) ? -1 : 1;

    /* We can't afford to leak memory here. */
    if (PointerGetDatum(arg1) != x)
        pfree(arg1);
    if (PointerGetDatum(arg2) != y)
        pfree(arg2);

    return result;
}

Note the explicit pfree of any detoasted copy: B-tree comparators must not leak, because they run once per comparison in a long sort. The collation machinery and the abbreviated-key encoding belong to the i18n/sort subsystems (postgres-overview-i18n-text.md, postgres-agg-sort-nodes.md); here the point is only that text ordering is not memcmp in general — it is a catalog-driven, collation-aware function reachable through the same operator class a B-tree uses for any type.

numeric is the canonical “fat varlena”: an arbitrary-precision decimal whose on-disk form packs a sign, a weight, a display scale, and a digit array, while its arithmetic form (NumericVar) unpacks the same digits into a mutable buffer. The digit radix is NBASE = 10000 (DEC_DIGITS = 4 decimal digits per stored int16 digit), chosen so that a product of two digits fits in an int32 and decimal text conversion is a straightforward 4-digits-at-a-time loop:

// numeric.c — radix selection (the NBASE==10000 branch is the live one)
#define NBASE 10000
#define HALF_NBASE 5000
#define DEC_DIGITS 4 /* decimal digits per NBASE digit */
#define MUL_GUARD_DIGITS 2 /* these are measured in NBASE digits */
#define DIV_GUARD_DIGITS 4
typedef int16 NumericDigit;
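To see how a decimal literal maps onto base-NBASE digits and a weight, here is a toy parser. It is a sketch, not numeric_in: it accepts only unsigned plain decimals (no sign, exponent, NaN or Infinity), ignores dscale, and uses a fixed-size digit array; ToyNumeric, toy_numeric_parse and MAX_NDIGITS are hypothetical. The final loop strips leading and trailing zero digits the way make_result does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

#define NBASE       10000
#define DEC_DIGITS  4
#define MAX_NDIGITS 64            /* toy limit, not PostgreSQL's */

typedef int16_t NumericDigit;

/* Toy unpacked decimal: value = sum of digits[i] * NBASE^(weight - i). */
typedef struct
{
    int          ndigits;
    int          weight;
    NumericDigit digits[MAX_NDIGITS];
} ToyNumeric;

/* Parse an unsigned plain decimal such as "12345.678".
 * Returns 0 on success, -1 on malformed or oversized input. */
int toy_numeric_parse(const char *s, ToyNumeric *out)
{
    const char *dot = strchr(s, '.');
    size_t intlen = dot ? (size_t) (dot - s) : strlen(s);
    size_t fraclen = dot ? strlen(dot + 1) : 0;
    size_t intgroups = (intlen + DEC_DIGITS - 1) / DEC_DIGITS;
    size_t fracgroups = (fraclen + DEC_DIGITS - 1) / DEC_DIGITS;

    if (intlen == 0 || intgroups + fracgroups > MAX_NDIGITS)
        return -1;

    /* Zero-pad the integer part on the left and the fraction on the right so
     * both split into 4-digit groups aligned on the decimal point. */
    size_t lead = intgroups * DEC_DIGITS - intlen;
    int n = 0;

    for (size_t g = 0; g < intgroups + fracgroups; g++)
    {
        int d = 0;

        for (size_t k = 0; k < DEC_DIGITS; k++)
        {
            size_t pos = g * DEC_DIGITS + k;   /* position in the padded layout */
            char c = '0';

            if (g < intgroups)
            {
                if (pos >= lead)
                    c = s[pos - lead];
            }
            else
            {
                size_t fpos = pos - intgroups * DEC_DIGITS;
                if (fpos < fraclen)
                    c = dot[1 + fpos];
            }
            if (!isdigit((unsigned char) c))
                return -1;
            d = d * 10 + (c - '0');
        }
        out->digits[n++] = (NumericDigit) d;
    }

    /* Strip leading zero digits (each one lowers the weight) and trailing
     * zero digits; an all-zero value becomes ndigits 0, weight 0. */
    int weight = (int) intgroups - 1;
    int first = 0;

    while (first < n && out->digits[first] == 0)
    {
        first++;
        weight--;
    }
    while (n > first && out->digits[n - 1] == 0)
        n--;
    memmove(out->digits, out->digits + first,
            (size_t) (n - first) * sizeof(NumericDigit));
    out->ndigits = n - first;
    out->weight = (out->ndigits == 0) ? 0 : weight;
    return 0;
}
```

For "12345.678" this yields digits {1, 2345, 6780} with weight 1, i.e. 1·10000¹ + 2345·10000⁰ + 6780·10000⁻¹.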

The in-memory NumericVar separates the palloc’d buffer (buf) from the first significant digit (digits), deliberately leaving a spare leading digit so a carry out of the top can be absorbed by just decrementing digits and incrementing weight — no reallocation:

// NumericVar — src/backend/utils/adt/numeric.c
typedef struct NumericVar
{
    int           ndigits;   /* # of digits in digits[] - can be 0! */
    int           weight;    /* weight of first digit */
    int           sign;      /* NUMERIC_POS, _NEG, _NAN, _PINF, or _NINF */
    int           dscale;    /* display scale */
    NumericDigit *buf;       /* start of palloc'd space for digits[] */
    NumericDigit *digits;    /* base-NBASE digits */
} NumericVar;

On-disk, the header is itself adaptive — a short form (one 16-bit header word, used when weight and scale are small) or a long form (two header words), with a third “special” form whose header alone encodes NaN / +Inf / -Inf. The flag bits live in the top of the first header word:

// numeric.c — on-disk header flag bits
#define NUMERIC_SIGN_MASK 0xC000
#define NUMERIC_POS 0x0000
#define NUMERIC_NEG 0x4000
#define NUMERIC_SHORT 0x8000 /* short (1-word) header */
#define NUMERIC_SPECIAL 0xC000 /* NaN / Inf, header is all there is */
#define NUMERIC_HDRSZ (VARHDRSZ + sizeof(uint16) + sizeof(int16))
#define NUMERIC_HDRSZ_SHORT (VARHDRSZ + sizeof(uint16))

Output is “unpack then stringify”: numeric_out special-cases the three non-finite values, otherwise calls init_var_from_num to get a NumericVar view over the stored digits and get_str_from_var to render them:

// numeric_out — src/backend/utils/adt/numeric.c
Datum
numeric_out(PG_FUNCTION_ARGS)
{
    Numeric     num = PG_GETARG_NUMERIC(0);
    NumericVar  x;
    char       *str;

    if (NUMERIC_IS_SPECIAL(num))    /* NaN / Infinity / -Infinity */
    {
        if (NUMERIC_IS_PINF(num))
            PG_RETURN_CSTRING(pstrdup("Infinity"));
        else if (NUMERIC_IS_NINF(num))
            PG_RETURN_CSTRING(pstrdup("-Infinity"));
        else
            PG_RETURN_CSTRING(pstrdup("NaN"));
    }

    init_var_from_num(num, &x);     /* NumericVar view of the stored digits */
    str = get_str_from_var(&x);

    PG_RETURN_CSTRING(str);
}

Arithmetic is schoolbook on the digit arrays. add_var dispatches on signs to add_abs/sub_abs (and cmp_abs to decide which is larger when signs differ), so the absolute-value routines only ever handle like-signed addition and the larger-minus-smaller subtraction:

// add_var — src/backend/utils/adt/numeric.c (sign dispatch, condensed)
static void
add_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result)
{
    if (var1->sign == NUMERIC_POS)
    {
        if (var2->sign == NUMERIC_POS)      /* (+a) + (+b) */
        {
            add_abs(var1, var2, result);
            result->sign = NUMERIC_POS;
        }
        else                                /* (+a) + (-b) */
        {
            switch (cmp_abs(var1, var2))
            {
                case 0:                     /* |a| == |b|: result is zero */
                    zero_var(result);
                    /* ... */
                    break;
                case 1:                     /* |a| > |b| */
                    sub_abs(var1, var2, result);
                    result->sign = NUMERIC_POS;
                    break;
                case -1:                    /* |a| < |b| */
                    sub_abs(var2, var1, result);
                    result->sign = NUMERIC_NEG;
                    break;
            }
        }
    }
    /* ... symmetric branch for var1->sign == NUMERIC_NEG ... */
}
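The add_abs step itself is schoolbook addition with a carry. A minimal sketch, assuming both operands are non-negative integers (weight = ndigits - 1) so no fractional alignment is needed; the real add_abs also aligns differing weights and scales. The spare leading slot res[0] plays the role of the extra digit NumericVar keeps in buf. toy_add_abs is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000

typedef int16_t NumericDigit;

/* Schoolbook addition of two non-negative integers stored as base-NBASE
 * digit arrays, most significant digit first (PostgreSQL's digit order).
 * res must have room for max(n1, n2) + 1 digits; res[0] is the spare
 * leading digit that absorbs a final carry. Returns the digit count. */
int toy_add_abs(const NumericDigit *d1, int n1,
                const NumericDigit *d2, int n2, NumericDigit *res)
{
    int nres = (n1 > n2 ? n1 : n2) + 1;
    int carry = 0;

    for (int i = nres - 1; i >= 0; i--)
    {
        int k = nres - 1 - i;     /* position counted from the least significant digit */
        int sum = carry;

        if (k < n1)
            sum += d1[n1 - 1 - k];
        if (k < n2)
            sum += d2[n2 - 1 - k];
        carry = sum >= NBASE;     /* two digits plus a carry never exceed 2*NBASE - 1 */
        res[i] = (NumericDigit) (carry ? sum - NBASE : sum);
    }
    assert(carry == 0);

    if (res[0] == 0)
    {
        /* No carry out of the top: drop the unused spare digit. */
        for (int i = 0; i < nres - 1; i++)
            res[i] = res[i + 1];
        return nres - 1;
    }
    return nres;
}
```

Adding {9999} and {1} produces {1, 0}, i.e. 10000: the carry lands in the spare digit instead of forcing a reallocation.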

Re-packing a NumericVar to the on-disk Numeric is make_result (via make_result_opt_error): it strips leading and trailing zero digits, forces a canonical zero, and chooses the short vs long header by whether the weight and scale fit (NUMERIC_CAN_BE_SHORT). This is the place the in-memory and on-disk representations meet:

// make_result_opt_error — src/backend/utils/adt/numeric.c (condensed)
n = var->ndigits;
while (n > 0 && *digits == 0)                     /* strip leading zero digits */
{
    digits++;
    weight--;
    n--;
}
while (n > 0 && digits[n - 1] == 0)               /* strip trailing zero digits */
    n--;
if (n == 0)                                       /* canonical zero */
{
    weight = 0;
    sign = NUMERIC_POS;
}

if (NUMERIC_CAN_BE_SHORT(var->dscale, weight))    /* short header */
{
    len = NUMERIC_HDRSZ_SHORT + n * sizeof(NumericDigit);
    result = (Numeric) palloc(len);
    SET_VARSIZE(result, len);
    result->choice.n_short.n_header =
        (sign == NUMERIC_NEG ? (NUMERIC_SHORT | NUMERIC_SHORT_SIGN_MASK)
                             : NUMERIC_SHORT)
        | (var->dscale << NUMERIC_SHORT_DSCALE_SHIFT)
        | (weight < 0 ? NUMERIC_SHORT_WEIGHT_SIGN_MASK : 0)
        | (weight & NUMERIC_SHORT_WEIGHT_MASK);
}
else
{
    /* long header: n_sign_dscale + n_weight */
}
memcpy(NUMERIC_DIGITS(result), digits, n * sizeof(NumericDigit));

The binary I/O pair (numeric_recv/numeric_send) reads and writes the digits over the wire as int16 words, preceded by the digit count, weight, sign, and dscale, so binary transmission is exact (no round-trip through decimal text).

flowchart TD
  ON["on-disk Numeric<br/>short / long / special header<br/>+ base-10000 digit array"]
  ON -->|"init_var_from_num"| NV["NumericVar<br/>ndigits weight sign dscale<br/>buf + digits (spare leading digit)"]
  NV -->|"add_var / mul_var<br/>(add_abs sub_abs cmp_abs)"| NV2["NumericVar result<br/>guard digits, then round"]
  NV2 -->|"make_result<br/>strip 0s, pick header"| ON2["on-disk Numeric"]
  NV -->|"get_str_from_var"| STR["cstring (numeric_out)"]
  ON -->|"numeric_send"| WIRE["binary wire (int16 digits)"]

datetime: integer storage over a Julian-day kernel

date, time, timestamp, and timestamptz are fixed-length types (typlen 4 or 8, typbyval where it fits), so unlike the varlena types above they need no header at all: the Datum carries the integer directly. date is a day count from the PostgreSQL epoch (2000-01-01); timestamp is a microsecond count from the same epoch. Their I/O does not parse calendars inline. It routes through the shared datetime tokenizer (ParseDateTime / DecodeDateTime), which handles field order, time zones, and month and day names, and the calendar math itself is isolated in the classic Julian-day kernel date2j / j2date (in src/backend/utils/adt/datetime.c). That isolation is the textbook design from §“Common DBMS Design”: exact integer arithmetic for ordering and intervals, with the messy Gregorian conversion confined to two functions. The tokenizer and the timezone database are large enough to warrant their own treatment; this doc notes only that the type contract is the same I/O quartet, with the calendar complexity pushed below it.
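A sketch of the Julian-day kernel, following the classic integer algorithm that date2j / j2date are built on. The toy_ names are hypothetical, and the constants should be checked against datetime.c before being relied on; the anchor value is that 2000-01-01 maps to Julian day 2451545, so a PostgreSQL-style date value is the Julian day minus 2451545.

```c
#include <assert.h>

#define TOY_EPOCH_JDATE 2451545   /* Julian day number of 2000-01-01 */

/* (year, month, day) -> Julian day number. */
int toy_date2j(int year, int month, int day)
{
    /* Treat January and February as months 14 and 15 of the previous year,
     * so the leap day falls at the end of the counting year. */
    if (month > 2) { month += 1; year += 4800; }
    else           { month += 13; year += 4799; }

    int century = year / 100;
    int julian = year * 365 - 32167;

    julian += year / 4 - century + century / 4;   /* Gregorian leap-year rule */
    julian += 7834 * month / 256 + day;           /* ~30.6 days per month, cumulative */
    return julian;
}

/* Julian day number -> (year, month, day). */
void toy_j2date(int jd, int *year, int *month, int *day)
{
    unsigned int julian = (unsigned int) jd + 32044;
    unsigned int quad = julian / 146097;              /* 400-year cycles */
    unsigned int extra = (julian - quad * 146097) * 4 + 3;

    julian += 60 + quad * 3 + extra / 146097;
    quad = julian / 1461;                             /* 4-year cycles */
    julian -= quad * 1461;
    int y = (int) (julian * 4 / 1461);
    julian = ((y != 0) ? ((julian + 305) % 365) : ((julian + 306) % 366)) + 123;
    y += (int) (quad * 4);
    *year = y - 4800;
    quad = julian * 2141 / 65536;
    *day = (int) (julian - 7834 * quad / 256);
    *month = (int) ((quad + 10) % 12 + 1);
}

/* PostgreSQL-style date value: days since 2000-01-01. */
int toy_date_value(int year, int month, int day)
{
    return toy_date2j(year, month, day) - TOY_EPOCH_JDATE;
}
```

Everything above the kernel (ordering, subtraction, interval arithmetic) then works on plain integers.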

jsonb: a TOAST-compressible binary document tree

jsonb is the most elaborate ADT type: it stores a parsed document so that key lookup and containment are fast, yet stays a single varlena so TOAST can compress and spill it. The unit is a JsonbContainer: a 32-bit header whose low 28 bits count children and whose top bits flag array/object/scalar, followed by a parallel JEntry array, followed by the children’s variable-length payloads:

// JsonbContainer + JEntry — src/include/utils/jsonb.h
typedef uint32 JEntry;

#define JENTRY_OFFLENMASK 0x0FFFFFFF /* length OR offset of this child */
#define JENTRY_TYPEMASK   0x70000000 /* string/numeric/bool/null/container */
#define JENTRY_HAS_OFF    0x80000000 /* this JEntry holds an offset, not a len */

typedef struct JsonbContainer
{
    uint32      header;     /* # of elements or pairs, plus flag bits */
    JEntry      children[FLEXIBLE_ARRAY_MEMBER];
    /* the data for each child node follows. */
} JsonbContainer;

#define JB_CMASK   0x0FFFFFFF /* mask for the count field */
#define JB_FSCALAR 0x10000000 /* top-level scalar wrapped in a 1-elem array */
#define JB_FOBJECT 0x20000000
#define JB_FARRAY  0x40000000

The subtle design choice is length-or-offset with a stride. Storing a length per child makes the value highly compressible (lengths of similar children are similar bytes) but turns “find child k” into an O(k) prefix sum. Storing an offset per child gives O(1) random access but defeats compression. PostgreSQL compromises: store a length in most JEntrys, but convert every JB_OFFSET_STRIDE-th (= 32nd) child’s field to a cumulative offset (flagged JENTRY_HAS_OFF), so any child is reachable by summing at most 31 lengths from the nearest stored offset:

// jsonb.h — the stride rationale (verbatim constant)
#define JB_OFFSET_STRIDE 32
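The stride scheme can be reproduced with plain uint32 arrays. This sketch encodes child lengths as JEntry-like words (type bits omitted), turning every 32nd one into a cumulative end offset exactly as the convertJsonbArray loop does, and recovers a child's start offset by walking back to the nearest stored offset, which is the job getJsonbOffset performs. The toy_ names and TOY_* constants are hypothetical mirrors of the real ones.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_OFFLENMASK 0x0FFFFFFFu  /* mirrors JENTRY_OFFLENMASK */
#define TOY_HAS_OFF    0x80000000u  /* mirrors JENTRY_HAS_OFF */
#define TOY_STRIDE     32           /* mirrors JB_OFFSET_STRIDE */

/* Encode n child lengths; entries with i % TOY_STRIDE == 0 store the
 * cumulative end offset (flagged), all others store the plain length. */
void toy_build_jentries(const uint32_t *lens, int n, uint32_t *jentries)
{
    uint32_t totallen = 0;

    for (int i = 0; i < n; i++)
    {
        totallen += lens[i];
        assert(totallen <= TOY_OFFLENMASK);
        if (i % TOY_STRIDE == 0)
            jentries[i] = totallen | TOY_HAS_OFF;   /* cumulative end offset */
        else
            jentries[i] = lens[i];                  /* plain length */
    }
}

/* Start offset of child `index`: sum lengths backwards until reaching an
 * entry that stores an end offset (at most TOY_STRIDE entries visited). */
uint32_t toy_get_offset(const uint32_t *jentries, int index)
{
    uint32_t offset = 0;

    for (int i = index - 1; i >= 0; i--)
    {
        offset += jentries[i] & TOY_OFFLENMASK;
        if (jentries[i] & TOY_HAS_OFF)
            break;
    }
    return offset;
}
```

Most entries stay small, similar-looking lengths (good for compression), yet no lookup ever sums more than a stride's worth of them.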

A value is built first as an in-memory JsonbValue tree (by the parser or by pushJsonbValue), then serialized depth-first into the flat binary form. JsonbValueToJsonb is the entry point; it wraps a bare scalar in a one-element JB_FSCALAR array (so the top level is always a container) and otherwise calls convertToJsonb:

// JsonbValueToJsonb — src/backend/utils/adt/jsonb_util.c (condensed)
Jsonb *
JsonbValueToJsonb(JsonbValue *val)
{
    Jsonb      *out;

    if (IsAJsonbScalar(val))
    {
        /* wrap a bare scalar as a one-element rawScalar array */
        JsonbParseState *pstate = NULL;
        JsonbValue  scalarArray,
                   *res;

        scalarArray.type = jbvArray;
        scalarArray.val.array.rawScalar = true;
        scalarArray.val.array.nElems = 1;

        pushJsonbValue(&pstate, WJB_BEGIN_ARRAY, &scalarArray);
        pushJsonbValue(&pstate, WJB_ELEM, val);
        res = pushJsonbValue(&pstate, WJB_END_ARRAY, NULL);

        out = convertToJsonb(res);
    }
    else if (val->type == jbvObject || val->type == jbvArray)
    {
        out = convertToJsonb(val);      /* object/array container */
    }
    else
    {
        /* jbvBinary: already-serialized child, just copy with a header */
    }

    return out;
}

The recursive serializer convertJsonbArray shows the stride logic in action: it reserves the JEntry slots, converts each element (which appends its payload to the buffer and returns a JEntry with that element’s length in the low bits), and converts the length to a cumulative offset on every stride boundary:

// convertJsonbArray — src/backend/utils/adt/jsonb_util.c (condensed)
containerhead = nElems | JB_FARRAY;
if (val->val.array.rawScalar)
    containerhead |= JB_FSCALAR;
appendToBuffer(buffer, &containerhead, sizeof(uint32));
jentry_offset = reserveFromBuffer(buffer, sizeof(JEntry) * nElems);

totallen = 0;
for (i = 0; i < nElems; i++)
{
    convertJsonbValue(buffer, &meta, &val->val.array.elems[i], level + 1);
    totallen += JBE_OFFLENFLD(meta);                  /* running data size */
    if (totallen > JENTRY_OFFLENMASK)
        ereport(ERROR, ...);                          /* 256 MB cap */
    if ((i % JB_OFFSET_STRIDE) == 0)                  /* every 32nd: store offset */
        meta = (meta & JENTRY_TYPEMASK) | totallen | JENTRY_HAS_OFF;
    copyToBuffer(buffer, jentry_offset, &meta, sizeof(JEntry));
    jentry_offset += sizeof(JEntry);
}
*header = JENTRY_ISCONTAINER | (buffer->len - base_offset);

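Comparison is structural, not byte-wise: compareJsonbContainers walks both documents with synchronized iterators (JsonbIteratorNext) and compares token by token. Because jsonb input already discards insignificant whitespace, stores object keys in a canonical order, and keeps only the last of any duplicate keys, two documents that differ only in key order or whitespace produce identical token streams and compare equal. This is what lets jsonb participate in B-tree indexes and GROUP BY with semantic equality: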

// compareJsonbContainers — src/backend/utils/adt/jsonb_util.c (condensed)
ita = JsonbIteratorInit(a);
itb = JsonbIteratorInit(b);
do
{
    ra = JsonbIteratorNext(&ita, &va, false);
    rb = JsonbIteratorNext(&itb, &vb, false);
    if (ra == rb)
    {
        if (ra == WJB_DONE)
            break;                                   /* decisively equal */
        if (va.type == vb.type)
            res = compareJsonbScalarValue(&va, &vb); /* compare like-typed */
        else
            res = (va.type > vb.type) ? 1 : -1;      /* type ordering */
    }
    else
        res = ...;      /* shorter/structurally-different document orders first */
} while (res == 0);

flowchart TD
  TXT["jsonb input text"] -->|"jsonb_in / parser"| JV["JsonbValue tree<br/>(in-memory, jbvObject/Array/String/...)"]
  JV -->|"JsonbValueToJsonb<br/>convertJsonbValue depth-first"| BIN["flat Jsonb varlena<br/>JsonbContainer header<br/>+ JEntry[] (len, offset every 32)<br/>+ payloads"]
  BIN -->|"TOAST compress / spill"| DISK["on-disk attribute"]
  BIN -->|"JsonbIteratorNext"| ACCESS["key lookup / containment"]
  BIN2["other jsonb"] --> CMP["compareJsonbContainers<br/>structural, order-independent"]
  BIN --> CMP
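The canonicalization that makes key order irrelevant happens at input time. This sketch mimics it for a flat object: keys are sorted shorter-first and then bytewise (the storage order jsonb uses for object keys), and when a key repeats only its last input occurrence survives. ToyPair and toy_canonicalize are hypothetical, and member values are stood in for by ints.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    const char *key;
    int         value;   /* stand-in for the member's value */
    int         order;   /* input position, used to keep the last duplicate */
} ToyPair;

/* Shorter keys first, then bytewise; equal keys ordered by input position. */
static int toy_pair_cmp(const void *a, const void *b)
{
    const ToyPair *pa = a;
    const ToyPair *pb = b;
    size_t la = strlen(pa->key);
    size_t lb = strlen(pb->key);

    if (la != lb)
        return (la < lb) ? -1 : 1;
    int r = memcmp(pa->key, pb->key, la);
    if (r != 0)
        return r;
    return (pa->order > pb->order) - (pa->order < pb->order);
}

/* Sort object members into canonical order and drop duplicate keys, keeping
 * the occurrence that appeared last in the input. Returns the new count. */
int toy_canonicalize(ToyPair *pairs, int n)
{
    int out = 0;

    for (int i = 0; i < n; i++)
        pairs[i].order = i;
    qsort(pairs, (size_t) n, sizeof(ToyPair), toy_pair_cmp);
    for (int i = 0; i < n; i++)
    {
        /* Equal keys are adjacent and ordered by input position, so only
         * the last element of each run is kept. */
        if (i + 1 < n && strcmp(pairs[i].key, pairs[i + 1].key) == 0)
            continue;
        pairs[out++] = pairs[i];
    }
    return out;
}
```

After this step, {"bb":1, "a":2, "bb":3, "c":4} and {"c":4, "bb":3, "a":2} both become a, c, bb with the same values, so a token-by-token comparison sees no difference.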

arrays: dimensioned varlena with an optional null bitmap

A PostgreSQL array is a varlena whose ArrayType header carries the dimensionality, an element-type OID, and an offset that is zero exactly when the array has no NULLs — the presence of a null bitmap is encoded in that one field:

// ArrayType — src/include/utils/array.h
typedef struct ArrayType
{
    int32       vl_len_;        /* varlena header (do not touch directly!) */
    int         ndim;           /* # of dimensions */
    int32       dataoffset;     /* offset to data, or 0 if no null bitmap */
    Oid         elemtype;       /* element type OID */
} ArrayType;
/* followed by: int dims[ndim], int lbound[ndim], [null bitmap], element data */
#define ARR_NDIM(a) ((a)->ndim)
#define ARR_HASNULL(a) ((a)->dataoffset != 0)
#define ARR_ELEMTYPE(a) ((a)->elemtype)

Because the element type is stored only as an OID, generic array code must look up the element’s typlen/typbyval/typalign to stride through the packed data — exactly the catalog-driven indirection of §“PostgreSQL’s Approach”. array_get_element is the canonical reader: it detoasts, validates the subscripts against dims/lbound, computes a linear offset with ArrayGetOffset, and then walks element-by-element (arrays are not randomly addressable when elements are variable-length or nullable):

// array_get_element — src/backend/utils/adt/arrayfuncs.c (condensed)
/* earlier branches: fixed-length array types (arraytyplen > 0) and expanded arrays */
else    /* the normal flat-array case */
{
    ArrayType  *array = DatumGetArrayTypeP(arraydatum);   /* detoasts */

    ndim = ARR_NDIM(array);
    dim = ARR_DIMS(array);
    lb = ARR_LBOUND(array);
    arraydataptr = ARR_DATA_PTR(array);
    arraynullsptr = ARR_NULLBITMAP(array);
}

/* subscripts must match the dimensionality and fall inside [lb, lb + dim) */
if (ndim != nSubscripts || ndim <= 0 || ndim > MAXDIM) { *isNull = true; return (Datum) 0; }
for (i = 0; i < ndim; i++)
    if (indx[i] < lb[i] || indx[i] >= (dim[i] + lb[i])) { *isNull = true; return (Datum) 0; }

offset = ArrayGetOffset(nSubscripts, dim, lb, indx);
/* then array_seek + fetch_att honoring elmlen/elmbyval/elmalign and the null map */
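The subscript arithmetic is small enough to sketch in full. The offset loop follows the row-major formula ArrayGetOffset uses (last subscript varies fastest, each subscript shifted by its lower bound); the null test assumes PostgreSQL's convention that a set bit in the null bitmap marks a non-null element and that an absent bitmap means the array has no NULLs. Function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Row-major linear offset of element indx[] in an n-dimensional array with
 * per-dimension lengths dim[] and lower bounds lb[]. */
int toy_array_offset(int n, const int *dim, const int *lb, const int *indx)
{
    int scale = 1;
    int offset = 0;

    for (int i = n - 1; i >= 0; i--)
    {
        offset += (indx[i] - lb[i]) * scale;
        scale *= dim[i];
    }
    return offset;
}

/* Null-bitmap test: a set bit means the element at that linear offset is NOT null. */
bool toy_elem_is_null(const uint8_t *nullbitmap, int offset)
{
    if (nullbitmap == NULL)        /* dataoffset == 0: no NULLs at all */
        return false;
    return (nullbitmap[offset / 8] & (1 << (offset % 8))) == 0;
}
```

For a 2×3 array with lower bounds 1, element [2][3] sits at linear offset 5. That offset only locates the element directly when elements are fixed-width and non-null; otherwise the reader still has to walk the packed data, which is why array_seek exists.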

Bulk consumers use deconstruct_array to explode the packed form into parallel Datum/isnull C arrays in one pass, and array_in/array_out handle the {...} text syntax (with per-element calls back into the element type’s own I/O functions: again the uniform quartet, one level down). The expanded-array machinery (ExpandedArrayHeader, VARATT_IS_EXTERNAL_EXPANDED) is an optimization for repeated in-place updates and is part of the TOAST/expanded-datum story in postgres-toast.md.

The ADT library is organized one file per type family under src/backend/utils/adt/. The through-line is always the same: a pg_proc-registered V1 function (Datum f(PG_FUNCTION_ARGS)) unpacks its Datum arguments, does type-specific work, and boxes a Datum back.

The I/O quartet and varlena plumbing (varlena.c, varatt.h, fmgr.c)

  • textin / textout / textrecv / textsend — the reference quartet; thin wrappers over cstring_to_text* / text_to_cstring and the StringInfo wire helpers (pq_getmsgtext).
  • cstring_to_text_with_len — the constructor; always builds a full 4-byte header via SET_VARSIZE (datums “begin life untoasted”).
  • text_to_cstring — the consumer; pg_detoast_datum_packed then VARDATA_ANY / VARSIZE_ANY_EXHDR, freeing the unpacked copy only if it differs from the input.
  • VARSIZE_4B / VARSIZE_1B / VARSIZE_ANY_EXHDR / VARDATA_ANY (in varatt.h) — the endianness- and form-dispatching header macros that make the four physical layouts uniform to callers.
  • pg_detoast_datum_packed / pg_detoast_datum (fmgr.c) — the detoast entry points; delegate to detoast_attr (access/common/detoast.c, covered by postgres-toast.md) only for compressed/external datums.
  • varstr_cmp / varstrfastcmp_c — collation-aware ordering and the C-locale memcmp SortSupport fast path.

numeric (numeric.c)

  • NumericVar — the in-memory arithmetic format; init_var_from_num views an on-disk Numeric as a NumericVar, make_result / make_result_opt_error packs one back, choosing short vs long header.
  • numeric_in / numeric_out / numeric_recv / numeric_send — the quartet; output goes through get_str_from_var, binary I/O ships raw int16 digits.
  • add_var / add_abs / sub_abs / cmp_abs / mul_var — schoolbook digit-array arithmetic with guard digits (MUL_GUARD_DIGITS).
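To make NumericVar's weight concrete: a value is the sum of digits[i] * 10000^(weight - i), so 12345.678 becomes the digits {1, 2345, 6780} with weight 1 and dscale 3. The sketch below is a hypothetical, much-reduced parser (`sk_parse` and `SkNumeric` are illustrative names; sign, exponent notation, NaN and infinities are omitted) that groups decimal digits into base-10000 digits aligned on the decimal point and then applies the leading/trailing zero stripping attributed above to make_result_opt_error. It is not numeric_in's actual code path.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define SK_DEC_DIGITS 4     /* decimal digits per base-10000 digit */
#define SK_MAXGROUPS  64    /* fixed capacity for the sketch */

/* Hypothetical mirror of NumericVar's magnitude fields (sign, NaN omitted). */
typedef struct
{
    int   ndigits;                  /* number of base-10000 digits */
    int   weight;                   /* value = sum digits[i] * 10000^(weight - i) */
    int   dscale;                   /* decimal digits after the point, as written */
    short digits[SK_MAXGROUPS];     /* int16 digits, each 0..9999 */
} SkNumeric;

/* Parse an unsigned "123.45"-style string; returns 0 on success, -1 on bad input. */
static int
sk_parse(const char *s, SkNumeric *out)
{
    const char *dot;
    size_t      intlen, fraclen, intpad, fracpad, total;
    char        buf[SK_MAXGROUPS * SK_DEC_DIGITS];
    short       groups[SK_MAXGROUPS];
    int         n, first, last, weight;

    assert(s != NULL && out != NULL);
    dot = strchr(s, '.');
    intlen = dot ? (size_t) (dot - s) : strlen(s);
    fraclen = dot ? strlen(dot + 1) : 0;
    /* Pad both sides of the point so they split into whole 4-digit groups. */
    intpad = (SK_DEC_DIGITS - intlen % SK_DEC_DIGITS) % SK_DEC_DIGITS;
    fracpad = (SK_DEC_DIGITS - fraclen % SK_DEC_DIGITS) % SK_DEC_DIGITS;
    total = intpad + intlen + fraclen + fracpad;
    if (intlen + fraclen == 0 || total > sizeof buf)
        return -1;

    memset(buf, '0', total);
    memcpy(buf + intpad, s, intlen);
    if (fraclen > 0)
        memcpy(buf + intpad + intlen, dot + 1, fraclen);

    n = (int) (total / SK_DEC_DIGITS);
    for (int g = 0; g < n; g++)
    {
        int v = 0;

        for (int k = 0; k < SK_DEC_DIGITS; k++)
        {
            char c = buf[g * SK_DEC_DIGITS + k];

            if (!isdigit((unsigned char) c))
                return -1;
            v = v * 10 + (c - '0');
        }
        groups[g] = (short) v;
    }

    /* Canonical form: strip leading zero digits (lowering weight), then trailing ones. */
    weight = (int) ((intpad + intlen) / SK_DEC_DIGITS) - 1;
    first = 0;
    last = n - 1;
    while (first <= last && groups[first] == 0)
    {
        first++;
        weight--;
    }
    while (last >= first && groups[last] == 0)
        last--;

    out->ndigits = last - first + 1;
    out->weight = (out->ndigits == 0) ? 0 : weight;   /* zero gets weight 0 */
    out->dscale = (int) fraclen;
    memcpy(out->digits, groups + first, sizeof(short) * (size_t) out->ndigits);
    return 0;
}
```

For instance, "0.0001" parses to the single digit 1 with weight -1, and "0" canonicalizes to no digits with weight 0.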

datetime (date.c, timestamp.c, datetime.c)

  • ParseDateTime / DecodeDateTime — the shared field tokenizer/decoder used by every temporal input function.
  • date2j / j2date — the Julian-day kernel: (year,month,day) ↔ day number, isolating Gregorian calendar math from the integer-arithmetic storage.

jsonb (jsonb.c, jsonb_util.c, jsonb.h)

  • JsonbContainer / JEntry / JB_OFFSET_STRIDE — the binary container format: count+flags header, length-or-offset child array, packed payloads.
  • JsonbValueToJsonb / convertToJsonb / convertJsonbValue / convertJsonbArray / convertJsonbObject / convertJsonbScalar — the in-memory-tree → flat-binary depth-first serializer.
  • getJsonbOffset / JsonbIteratorNext — random access (sum of ≤31 lengths from the nearest stored offset) and ordered traversal.
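  • compareJsonbContainers / compareJsonbScalarValue — structural comparison; because object keys are stored sorted and de-duplicated, equality does not depend on the key order of the input text.

arrays (arrayfuncs.c, array.h)

  • ArrayType and the ARR_* macros (ARR_NDIM, ARR_DIMS, ARR_LBOUND, ARR_DATA_PTR, ARR_NULLBITMAP, ARR_HASNULL, ARR_ELEMTYPE) — the header layout and accessors.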
  • array_in / array_out / array_recv — text {...} and binary I/O, recursing into each element type’s own functions.
  • array_get_element / ArrayGetOffset / deconstruct_array — element access and bulk explosion to Datum/isnull arrays.
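The date2j / j2date kernel listed under datetime is what turns date arithmetic into integer subtraction: a PostgreSQL date holds date2j(y, m, d) minus POSTGRES_EPOCH_JDATE, the day number of 2000-01-01 (2451545). The sketch below uses a standard published integer conversion rather than PostgreSQL's own constants, assuming both produce the same proleptic Gregorian day numbering. `sk_date2j`, `sk_j2date` and `sk_pg_date` are hypothetical names, and the code assumes positive years so every division it relies on operates on non-negative values.

```c
#include <assert.h>

/* Julian day number of 2000-01-01, the zero point of a PostgreSQL date. */
#define SK_PG_EPOCH_JDATE 2451545

/* (year, month, day) -> Julian day number, proleptic Gregorian, year > 0. */
static int
sk_date2j(int y, int m, int d)
{
    int a = (m - 14) / 12;      /* -1 for Jan/Feb, 0 otherwise (C truncates) */

    assert(y > 0 && m >= 1 && m <= 12 && d >= 1 && d <= 31);
    return (1461 * (y + 4800 + a)) / 4
         + (367 * (m - 2 - 12 * a)) / 12
         - (3 * ((y + 4900 + a) / 100)) / 4
         + d - 32075;
}

/* Julian day number -> (year, month, day); inverse of sk_date2j. */
static void
sk_j2date(int jd, int *y, int *m, int *d)
{
    int f, e, g, h;

    assert(jd >= 0);
    f = jd + 1401 + (((4 * jd + 274277) / 146097) * 3) / 4 - 38;
    e = 4 * f + 3;
    g = (e % 1461) / 4;
    h = 5 * g + 2;
    *d = (h % 153) / 5 + 1;
    *m = (h / 153 + 2) % 12 + 1;
    *y = e / 1461 - 4716 + (12 + 2 - *m) / 12;
}

/* Days relative to 2000-01-01: the integer a PostgreSQL date stores. */
static int
sk_pg_date(int y, int m, int d)
{
    return sk_date2j(y, m, d) - SK_PG_EPOCH_JDATE;
}
```

Once dates are day numbers, leap-year rules live only inside the conversion: the gap from 28 February to 1 March is 2 days in 2000 and 1 day in 1900, and 1970-01-01 lands at day -10957.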

Position hints (as of 2026-06-05, REL_18 273fe94)

| Symbol | File | Line |
|---|---|---|
| textin | src/backend/utils/adt/varlena.c | 588 |
| textout | src/backend/utils/adt/varlena.c | 599 |
| textrecv | src/backend/utils/adt/varlena.c | 610 |
| cstring_to_text_with_len | src/backend/utils/adt/varlena.c | 205 |
| text_to_cstring | src/backend/utils/adt/varlena.c | 226 |
| varstr_cmp | src/backend/utils/adt/varlena.c | 1666 |
| varstrfastcmp_c | src/backend/utils/adt/varlena.c | 2121 |
| VARSIZE_4B / VARSIZE_1B | src/include/varatt.h | 192 / 194 |
| VARSIZE_ANY_EXHDR | src/include/varatt.h | 317 |
| VARDATA_ANY | src/include/varatt.h | 324 |
| pg_detoast_datum | src/backend/utils/fmgr/fmgr.c | 1832 |
| pg_detoast_datum_packed | src/backend/utils/fmgr/fmgr.c | 1864 |
| NumericVar | src/backend/utils/adt/numeric.c | 313 |
| numeric_in | src/backend/utils/adt/numeric.c | 637 |
| numeric_out | src/backend/utils/adt/numeric.c | 816 |
| numeric_recv | src/backend/utils/adt/numeric.c | 1078 |
| numeric_send | src/backend/utils/adt/numeric.c | 1163 |
| init_var_from_num | src/backend/utils/adt/numeric.c | 7570 |
| get_str_from_var | src/backend/utils/adt/numeric.c | 7613 |
| make_result_opt_error | src/backend/utils/adt/numeric.c | 7901 |
| make_result | src/backend/utils/adt/numeric.c | 8010 |
| add_var | src/backend/utils/adt/numeric.c | 8550 |
| mul_var | src/backend/utils/adt/numeric.c | 8788 |
| add_abs | src/backend/utils/adt/numeric.c | 11942 |
| JsonbContainer / JEntry | src/include/utils/jsonb.h | 190 / 136 |
| JB_OFFSET_STRIDE | src/include/utils/jsonb.h | 178 |
| JsonbValueToJsonb | src/backend/utils/adt/jsonb_util.c | 92 |
| compareJsonbContainers | src/backend/utils/adt/jsonb_util.c | 191 |
| getJsonbOffset | src/backend/utils/adt/jsonb_util.c | 134 |
| JsonbIteratorNext | src/backend/utils/adt/jsonb_util.c | 859 |
| convertJsonbValue | src/backend/utils/adt/jsonb_util.c | 1603 |
| convertJsonbArray | src/backend/utils/adt/jsonb_util.c | 1628 |
| convertJsonbObject | src/backend/utils/adt/jsonb_util.c | 1712 |
| ArrayType | src/include/utils/array.h | 92 |
| array_in | src/backend/utils/adt/arrayfuncs.c | 179 |
| array_out | src/backend/utils/adt/arrayfuncs.c | 1016 |
| array_recv | src/backend/utils/adt/arrayfuncs.c | 1271 |
| array_get_element | src/backend/utils/adt/arrayfuncs.c | 1820 |
| deconstruct_array | src/backend/utils/adt/arrayfuncs.c | 3631 |

All symbols, constants, and code excerpts in this document were read directly from the REL_18_STABLE working tree at commit 273fe94 on 2026-06-05.

  • The I/O quartet is real and thin. textin/textout/textrecv in varlena.c are exactly the few-line wrappers quoted; the V1 ABI (PG_FUNCTION_ARGS, PG_GETARG_*, PG_RETURN_*) is the universal signature. Confirmed.
  • Four varlena physical forms. varatt.h defines the 4-byte (plain/compressed), 1-byte short, and 1B_E external/TOAST-pointer forms, with the flag-bit dispatch in VARSIZE_ANY_EXHDR / VARDATA_ANY. The short header caps the payload at 126 bytes (VARATT_SHORT_MAX is 0x7F and counts the header byte itself). Confirmed.
  • Detoast split. pg_detoast_datum_packed (fmgr.c) returns the input untouched unless it is VARATT_IS_COMPRESSED or VARATT_IS_EXTERNAL, in which case it calls detoast_attr. Confirmed; the actual fetch/decompress lives in access/common/detoast.c (deferred to postgres-toast.md).
  • numeric radix. The live NBASE is 10000 with DEC_DIGITS = 4 and NumericDigit = int16; MUL_GUARD_DIGITS = 2. The NumericVar struct and the short/long/special on-disk header families are as quoted. Confirmed.
  • make_result canonicalization. make_result_opt_error strips leading and trailing zero digits, canonicalizes zero to weight 0 / positive, and picks the short header when NUMERIC_CAN_BE_SHORT. Confirmed.
  • jsonb stride. JB_OFFSET_STRIDE == 32; convertJsonbArray converts every 32nd child’s JEntry to a JENTRY_HAS_OFF cumulative offset and caps payload at JENTRY_OFFLENMASK (0x0FFFFFFF, 256 MB). Confirmed.
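  • Structural jsonb comparison. compareJsonbContainers drives two JsonbIterators in lock-step over the stored form. Because object keys are stored sorted and de-duplicated, equality does not depend on the key order of the input text (array element order still counts). Confirmed.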
  • array null bitmap encoding. ARR_HASNULL(a) is literally ((a)->dataoffset != 0); the null bitmap exists iff dataoffset is nonzero. array_get_element detoasts, bounds-checks, and computes a linear offset via ArrayGetOffset. Confirmed.
  • Where exactly short-header down-conversion happens. Constructors build full 4-byte headers; the conversion to 1-byte short form occurs during tuple assembly (heap_fill_tuple / fill_val path). The precise trigger and the interaction with typstorage belong to postgres-toast.md / postgres-heap-am.md; this doc only asserts the four forms exist.
  • lz4 vs pglz selection for in-line compression. The va_tcinfo high bits encode the method, but the default-compression GUC and the per-column ALTER ... SET COMPRESSION path are TOAST concerns, not ADT concerns. Deferred.
  • Abbreviated-key encoding for non-C collations. varstrfastcmp_c is the C-locale fast path; the ICU/libc abbreviated-key converter and its abort-if-unhelpful heuristic live in the SortSupport/i18n code and are out of scope here.
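The C-locale fast path mentioned above reduces to a few lines: memcmp over the common prefix, with the shorter string sorting first when the prefixes tie. The sketch below is a hypothetical standalone version (`sk_c_locale_cmp` is an illustrative name) operating on length-counted byte strings, as varlena payloads are; it ignores detoasting and abbreviated keys.

```c
#include <assert.h>
#include <string.h>

/*
 * C-collation ordering of two length-counted byte strings (no NUL
 * terminator required): compare the common prefix bytewise, then let the
 * shorter string sort first. Returns <0, 0 or >0 like memcmp.
 */
static int
sk_c_locale_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    int result;

    assert(a != NULL && b != NULL);
    result = memcmp(a, b, alen < blen ? alen : blen);
    if (result == 0 && alen != blen)
        result = (alen < blen) ? -1 : 1;
    return result;
}
```

Because memcmp compares unsigned byte values, "B" (0x42) sorts before "a" (0x61) here, which is exactly why byte order suits the C collation but not human-language collations.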

Beyond PostgreSQL — Comparative Designs & Research Frontiers


The object-relational bet. PostgreSQL’s “a type is catalog rows plus registered C functions” is the direct descendant of The Design of POSTGRES (Stonebraker & Rowe 1986) and the lineage traced in What Goes Around Comes Around (Stonebraker & Hellerstein 2005; captured in dbms-papers/goes-around.md): the object-relational school chose an open type system over a fixed one. The payoff is visible today — PostGIS (geometry), pgvector (vector), citext, and hstore are all “just types” with no executor patches. The cost is the rigid V1 ABI and the fmgr indirection on every call, which a closed system (e.g. a hand-tuned analytics engine) avoids by hard-coding its handful of types.

Variable-length headers elsewhere. The varlena “shrink the header for small values, spill the big ones out of line” pattern recurs across engines. SQL Server uses in-row vs row-overflow vs LOB_DATA allocation units; Oracle distinguishes inline VARCHAR2 from out-of-line LOB segments with a LOB locator; MySQL/InnoDB stores long VARCHAR/BLOB columns off-page with a 20-byte pointer. PostgreSQL’s distinctive 1-byte short header (an unaligned length-and-flag byte capped at 126 bytes) is unusually aggressive about the small-string common case, reflecting how much of a real schema is short text. Column stores push this further: dictionary/RLE/bit-packing encodings (the C-Store/Vertica lineage, dbms-papers/column-vs-row.md) make the “length” implicit in the encoding rather than per-value.
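The short header is cheap to recognize because the form is encoded in the low bits of the first byte. On little-endian builds, varatt.h treats a set low bit as a 1-byte header whose upper seven bits give the total length including that byte (the value 0x01 alone marks an external TOAST pointer), and a clear low bit as a 4-byte header whose upper 30 bits give the length, with the next bit distinguishing in-line compressed data. Big-endian builds use the high bits instead. The sketch below decodes only that little-endian layout; `sk_varlena_kind`, `sk_make_short_header` and the `SK_VARLENA_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    SK_VARLENA_4B_U,    /* 4-byte header, plain */
    SK_VARLENA_4B_C,    /* 4-byte header, compressed in line */
    SK_VARLENA_1B,      /* 1-byte "short" header */
    SK_VARLENA_1B_E     /* 1-byte header, external/TOAST pointer */
} SkVarlenaKind;

/*
 * Classify a varlena by its header and report its total size (header
 * included) in *total. Little-endian layout only; *total stays 0 for
 * external pointers, whose size is derived from a following tag byte.
 */
static SkVarlenaKind
sk_varlena_kind(const uint8_t *p, uint32_t *total)
{
    uint32_t word;

    assert(p != NULL && total != NULL);
    *total = 0;
    if (p[0] == 0x01)
        return SK_VARLENA_1B_E;
    if (p[0] & 0x01)
    {
        *total = p[0] >> 1;             /* 7-bit length, header byte included */
        return SK_VARLENA_1B;
    }
    /* 4-byte header: assemble little-endian; the length is the upper 30 bits. */
    word = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
    *total = (word >> 2) & 0x3FFFFFFFu;
    return ((p[0] & 0x03) == 0x02) ? SK_VARLENA_4B_C : SK_VARLENA_4B_U;
}

/* Build a 1-byte header for an n-byte payload; fails above 126 bytes. */
static int
sk_make_short_header(size_t n, uint8_t *hdr)
{
    assert(hdr != NULL);
    if (n + 1 > 0x7F)
        return -1;                      /* too long for the short form */
    *hdr = (uint8_t) (((n + 1) << 1) | 0x01);
    return 0;
}
```

A 5-byte string thus costs a single header byte (0x0D) instead of four, which is the small-value saving the paragraph describes.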

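Decimal arithmetic. PostgreSQL’s base-10000 schoolbook arithmetic follows a common pattern for software decimals: a power-of-ten radix, so that rounding and text conversion fall on digit boundaries. IBM’s decNumber groups digits into units of a configurable power of ten, and MySQL’s DECIMAL packs nine decimal digits into each 4-byte word; Java’s BigDecimal takes the other route, pairing a binary BigInteger with a decimal scale. Hardware decimal (IEEE 754-2008 decimal floating point, POWER’s DFP unit) is the road not taken for general-purpose engines, which prefer the portability and very high precision of a software digit array (PostgreSQL’s numeric allows up to 131072 digits before the decimal point).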
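Base-10000 schoolbook addition is column addition with carries, where each column holds a whole int16 digit. The sketch below follows the general shape of add_abs (the result weight is one more than the larger operand's, leaving room for a final carry, and columns are aligned by their power of 10000) but is a hypothetical, fixed-capacity illustration: `SkDec` and `sk_add_abs` are illustrative names, and rscale handling and memory allocation are omitted.

```c
#include <assert.h>

#define SK_NBASE  10000
#define SK_MAXDIG 32

/* Hypothetical unsigned decimal: value = sum digits[i] * NBASE^(weight - i). */
typedef struct
{
    int   ndigits;
    int   weight;
    short digits[SK_MAXDIG];
} SkDec;

/* Digit of x at power-of-NBASE position pw, or 0 outside its stored range. */
static int
sk_digit_at(const SkDec *x, int pw)
{
    int idx = x->weight - pw;

    return (idx >= 0 && idx < x->ndigits) ? x->digits[idx] : 0;
}

/* r = |a| + |b|; r must not alias a or b. */
static void
sk_add_abs(const SkDec *a, const SkDec *b, SkDec *r)
{
    int lo_a = a->weight - a->ndigits + 1;
    int lo_b = b->weight - b->ndigits + 1;
    int lo = lo_a < lo_b ? lo_a : lo_b;
    int hi = (a->weight > b->weight ? a->weight : b->weight) + 1;  /* carry slot */
    int carry = 0;
    int first = 0;

    r->ndigits = hi - lo + 1;
    assert(r->ndigits <= SK_MAXDIG);
    r->weight = hi;
    /* Least significant column first, propagating the carry upward. */
    for (int pw = lo; pw <= hi; pw++)
    {
        int sum = sk_digit_at(a, pw) + sk_digit_at(b, pw) + carry;

        r->digits[hi - pw] = (short) (sum % SK_NBASE);
        carry = sum / SK_NBASE;
    }
    assert(carry == 0);         /* the extra top column always absorbs it */

    /* Strip leading zero digits (lowering weight), then trailing ones. */
    while (first < r->ndigits && r->digits[first] == 0)
    {
        first++;
        r->weight--;
    }
    r->ndigits -= first;
    for (int i = 0; i < r->ndigits; i++)
        r->digits[i] = r->digits[i + first];
    while (r->ndigits > 0 && r->digits[r->ndigits - 1] == 0)
        r->ndigits--;
    if (r->ndigits == 0)
        r->weight = 0;
}
```

Adding 9999.9999 ({9999, 9999}, weight 0) and 0.0001 ({1}, weight -1) carries through both columns and yields the single digit 1 with weight 1, that is 10000.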

Binary JSON. The length-or-offset-with-stride trick is PostgreSQL’s answer to a tension every binary-JSON format faces: MySQL’s binary JSON stores full offset tables (fast access, poor compression), while a pure length encoding compresses well but is O(n) to index. The stride is a tunable midpoint. Research on succinct/compressed semi-structured storage and on schema-inference for JSON columns (e.g. JSON tiles, Sinew) continues to probe whether a fully columnar shredding of JSON beats a single TOASTed blob for analytic workloads — a frontier where PostgreSQL’s “one varlena per document” is deliberately on the OLTP-friendly side.
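The stride trick can be sketched in a few lines. Here every 32nd child entry stores its cumulative end offset together with a has-offset flag, the rest store plain lengths, and the start of child i is found by summing lengths backward until the nearest stored offset, the lookup the document attributes to getJsonbOffset. The flag and mask constants mirror JENTRY_HAS_OFF and JENTRY_OFFLENMASK, but the JEntry type bits are omitted and `sk_encode` / `sk_start_offset` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SK_STRIDE  32             /* mirrors JB_OFFSET_STRIDE */
#define SK_HAS_OFF 0x80000000u    /* mirrors JENTRY_HAS_OFF */
#define SK_OFFLEN  0x0FFFFFFFu    /* mirrors JENTRY_OFFLENMASK */

/* Every SK_STRIDE-th child stores its end offset; the others store lengths. */
static void
sk_encode(const uint32_t *lens, int n, uint32_t *entries)
{
    uint32_t end = 0;

    for (int i = 0; i < n; i++)
    {
        end += lens[i];
        assert(end <= SK_OFFLEN);
        entries[i] = (i % SK_STRIDE == 0) ? (end | SK_HAS_OFF) : lens[i];
    }
}

/* Start offset of child idx: sum lengths back to the nearest stored offset. */
static uint32_t
sk_start_offset(const uint32_t *entries, int idx)
{
    uint32_t offset = 0;

    for (int i = idx - 1; i >= 0; i--)
    {
        offset += entries[i] & SK_OFFLEN;
        if (entries[i] & SK_HAS_OFF)
            break;
    }
    return offset;
}
```

A larger stride means longer backward walks but more entries stored as small, repetitive lengths that compress well; a smaller stride trades the other way.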

Arrays vs nested relations. PostgreSQL’s flat dimensioned array with an element-type OID is the SQL-standard ARRAY realized as a single value. The alternative lineage — nested tables / MULTISET (Oracle), and the NF² (non-first-normal-form) research tradition — models collections as first-class relations. PostgreSQL stays closer to first normal form, treating the array as an opaque scalar that the unnest and array_agg functions bridge to and from rows.

In-tree source files (REL_18_STABLE, commit 273fe94)

  • src/backend/utils/adt/varlena.c — text/bytea I/O, varstr_cmp, SortSupport, *_to_text / text_to_cstring.
  • src/backend/utils/adt/numeric.c — arbitrary-precision decimal: I/O, NumericVar, make_result, digit-array arithmetic.
  • src/backend/utils/adt/jsonb.c, src/backend/utils/adt/jsonb_util.c — jsonb I/O, the JsonbContainer/JEntry binary format, serialization (convertJsonb*), structural comparison, iteration.
  • src/backend/utils/adt/arrayfuncs.c — array I/O, ArrayType access, array_get_element, deconstruct_array.
  • src/backend/utils/adt/date.c, src/backend/utils/adt/datetime.c — temporal I/O and the date2j/j2date Julian kernel.
  • src/include/varatt.h — the varlena header layouts and VAR* macros.
  • src/include/utils/jsonb.h, src/include/utils/array.h — jsonb and array on-disk structures and accessor macros.
  • src/backend/utils/fmgr/fmgr.c — pg_detoast_datum* detoast entry points.
  • postgres-fmgr.md — the V1 calling convention, FmgrInfo, FunctionCallInfo, and how ADT functions are dispatched by OID.
  • postgres-toast.md — out-of-line storage, in-line compression (pglz/lz4), detoast_attr, expanded datums.
  • postgres-nbtree.md, postgres-index-am.md — how operator classes consume the comparison/hash functions these types register.
  • postgres-overview-base-infra.md, postgres-overview-i18n-text.md — surrounding base-infrastructure and collation/text context.
  • dbms-papers/goes-around.md — What Goes Around Comes Around (the object-relational type-system lineage).
  • research/dbms-general/database-system-concepts.md — domains, types, and the relational type system (ch. 4–5).