PostgreSQL TOAST — Oversized Attribute Storage, Compression, and Detoasting
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A fixed-size page of 8 KB is PostgreSQL’s storage atom. Every heap tuple
must fit on a single page (the canonical limit is MaxHeapTupleSize,
approximately 8 KB minus page and tuple overhead). This is a hard constraint:
the page I/O model gives the buffer manager a uniform unit, and the slotted-
page layout (Database Internals, Petrov, ch. 3 “File Formats”, §“Slotted
Pages”) addresses records by (page, slot) — a slot cannot span two pages.
Real-world data routinely violates this constraint. A TEXT column holding
a product description, a BYTEA column holding a document, or a JSONB
column holding a large object can all exceed 8 KB. The engine must therefore
provide a mechanism to store values that are larger than a page while
preserving the illusion to the query layer that the datum is just another
column in the tuple.
The two canonical approaches from Database System Concepts (Silberschatz, 7e, ch. 13 “Data Storage Structures”) are:
- Large-object / BLOB storage. The engine maintains a separate file or object store; the main tuple holds only a handle. The application must explicitly manage reads and writes through a large-object API. Oracle LOB columns and PostgreSQL’s own large_object subsystem follow this model.
- Transparent overflow. The engine intercepts writes and reads automatically, so the application sees an ordinary column of arbitrary size. The planner, executor, and access methods operate on an opaque Datum pointer; the storage layer decides whether the datum lives inline or in a secondary store, and decompresses or reassembles it before handing it to the caller. This is the TOAST model.
TOAST (acronym coined by the PostgreSQL development team as “The Oversized-Attribute Storage Technique”, sometimes glossed humorously as “the best thing since sliced bread”) is the second approach. Its design constraints are:
- Transparency. No SQL change is required from the user. Existing TEXT, BYTEA, and other variable-length (varlena) columns are eligible automatically based on their declared type storage strategy.
- Two reduction strategies. The engine first tries inline compression to keep the datum on the main page. Only if the compressed size still exceeds the threshold is the datum moved to an external table.
- Random-access slice retrieval. A caller that only needs part of a datum (e.g., substring(col, 1, 100)) should not have to fetch and decompress the entire datum.
The theoretical anchor for the compression side is general-purpose lossless data compression: PGLZ is PostgreSQL’s own LZ-family algorithm, and LZ4 is a widely used LZ77-family compressor that favours speed over compression ratio. Neither algorithm is described in the standard DBMS textbooks; both are engineering choices within the TOAST framework.
Common DBMS Design
The varlena header problem
Every engine that stores variable-length values in fixed-size pages must solve the same header problem: a column value needs to carry its own length (the page does not store it separately for variable-length attributes), and that length field consumes bytes that reduce the usable data space. The universal answer is a compact variable-length header: a small integer prefix whose bit pattern encodes both the length and metadata flags. The design tension is between header size (smaller is better for small strings) and the range of sizes that can be expressed.
PostgreSQL’s struct varlena uses a two-tier scheme:
- 4-byte header (varattrib_4b): two flag bits mark the word as a 4-byte header and record whether the payload is inline-compressed; the remaining 30 bits give the total length including the header. Supports values up to ~1 GB.
- 1-byte short header (varattrib_1b): one flag bit is set to 1 to signal “short”; the remaining 7 bits give the total length including the header. Supports values up to 126 bytes inline without paying the 4-byte tax. A short header whose length bits are all zero is reserved as the tag for an external pointer.
The short-header optimization is a well-known technique: MySQL VARCHAR
uses a 1- or 2-byte length prefix depending on the declared column length, and
SQL Server locates variable-length columns through 2-byte entries in a per-row
offset array. The exact encoding varies; the tradeoff is always header size vs.
addressable range.
Out-of-line storage patterns
Engines that support transparent overflow typically store the out-of-line data in one of three places:
1. A row-overflow page in the same file (SQL Server’s row-overflow and LOB pages; MySQL InnoDB’s off-page columns using the same tablespace).
2. A dedicated secondary heap keyed by (value_id, chunk_seq) (Oracle LOB segments; PostgreSQL TOAST tables).
3. An external file outside the page file (Oracle BFILE; PostgreSQL large objects are also external, but they are a separate mechanism from TOAST).
PostgreSQL TOAST takes approach 2: each relation that has at least one
toastable column gets an associated pg_toast_<OID> heap with a schema of
(chunk_id OID, chunk_seq INT4, chunk_data BYTEA). This keeps the out-of-line
data inside the MVCC machinery (chunks are regular heap tuples subject to
vacuum, snapshots, and WAL) but isolates it from the main heap to avoid
layout interference.
Compression in the storage path
Compressing before externalizing is a standard optimization: it reduces I/O,
increases the chance that the datum stays inline, and shrinks the number of
chunks a later detoast must fetch. The practical question is when to give up
and accept incompressibility. TOAST’s “savings of more than 2 bytes”
threshold (if (VARSIZE(tmp) < valsize - 2) in toast_compress_datum) is
typical: a compression that saves nothing net after header and alignment
padding is not worth keeping.
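To make the threshold concrete, here is a minimal sketch of the size test. It assumes, as in the quoted condition, that the compressed size includes the compressed varlena header while the raw size counts payload bytes only. The function name compression_pays_off is hypothetical and is not a PostgreSQL symbol.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: mirrors the shape of the size test quoted above.
 *   compressed_total: size of the compressed varlena, header included
 *   raw_payload:      size of the uncompressed data, header excluded
 */
static bool compression_pays_off(int32_t compressed_total, int32_t raw_payload)
{
    /* keep the compressed form only if it saves more than 2 bytes */
    return compressed_total < raw_payload - 2;
}
```

With a 2000-byte payload, a compressed result of 1997 bytes would be kept, while one of 1998 bytes would be rejected and the value stored uncompressed.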
Pointer-based indirection for out-of-line datums
When a datum is moved to secondary storage, the main tuple holds a small pointer (also called a toast pointer or indirect datum) rather than the data. The pointer must be small enough not to trigger further toasting and must carry enough information to retrieve the datum: a relation OID and a value OID, from which the engine can reconstruct the chunks. This pattern appears in every engine that supports transparent overflow; the exact pointer layout varies.
PostgreSQL’s Approach
The varlena encoding universe
All PostgreSQL variable-length types use struct varlena as their in-memory
and on-disk representation. The first one or four bytes are the header; the
exact interpretation depends on the two high bits of the first byte:
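| Pattern (big-endian first byte) | Encoding | Meaning |
|---|---|---|
| 1xxxxxxx | 1-byte short | total length = low 7 bits; data inline |
| 00xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx | 4-byte normal | total length = low 30 bits; data inline |
| 01xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx | 4-byte compressed | datum is PGLZ or LZ4 compressed; uncompressed size and method in va_tcinfo |
| 10000000 | 1-byte external tag | datum is an external pointer; VARTAG byte follows |

On little-endian machines the same flags occupy the low bits of the first byte: xxxxxxx1 (short), xxxxxx00 (4-byte normal), xxxxxx10 (4-byte compressed), and 00000001 (external tag).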
The external pointer’s VARTAG byte discriminates three kinds:
- VARTAG_ONDISK = 18: the standard varatt_external toast pointer.
- VARTAG_INDIRECT = 1: an in-memory indirect pointer to another varlena (used by the expanded-object and other transient mechanisms).
- VARTAG_EXPANDED_RO / VARTAG_EXPANDED_RW = 2 / 3: an expanded-object pointer (e.g., an in-memory array or composite in expanded form).
Only VARTAG_ONDISK ever appears on disk. The others are strictly in-memory
representations produced during query execution.
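As a rough illustration of these encodings, the sketch below classifies a varlena by its first byte. It assumes the little-endian layout, where the flag bits are the low bits of the first byte: low bit 1 means a 1-byte header, the exact value 0x01 is the external tag, and low two bits 00 or 10 mean a 4-byte uncompressed or compressed header. The enum and function names are hypothetical; PostgreSQL itself uses its VARATT_IS_* macros for this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef enum {
    HDR_SHORT,       /* 1-byte header, data inline */
    HDR_EXTERNAL,    /* 1-byte tag header, VARTAG byte follows */
    HDR_PLAIN_4B,    /* 4-byte header, uncompressed inline data */
    HDR_COMPRESSED   /* 4-byte header, inline-compressed data */
} header_kind;

/* Classify by the first byte, assuming the little-endian bit layout. */
static header_kind classify_header(const unsigned char *p)
{
    if (p[0] == 0x01)
        return HDR_EXTERNAL;
    if (p[0] & 0x01)
        return HDR_SHORT;
    if ((p[0] & 0x03) == 0x02)
        return HDR_COMPRESSED;
    return HDR_PLAIN_4B;
}

/*
 * Total size in bytes, header included, for the inline forms.
 * Returns 0 for external pointers, whose size depends on the VARTAG.
 */
static size_t header_total_size(const unsigned char *p)
{
    uint32_t word;

    switch (classify_header(p)) {
    case HDR_SHORT:
        return (size_t) ((p[0] >> 1) & 0x7F);
    case HDR_EXTERNAL:
        return 0;
    default:
        /* assemble the little-endian length word byte by byte */
        word = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
               ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
        return (size_t) ((word >> 2) & 0x3FFFFFFF);
    }
}
```

Reading the length word byte by byte keeps the sketch independent of the host's own byte order and of pointer alignment.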
Storage strategy per column
Each column of a toastable type carries a typstorage flag (stored in
pg_attribute.attstorage, default from pg_type.typstorage):
| Flag | Constant | Meaning |
|---|---|---|
| 'p' | TYPSTORAGE_PLAIN | Never toast. Store inline as-is; fail if the tuple overflows the page. |
| 'e' | TYPSTORAGE_EXTERNAL | Allow out-of-line but not inline compression. Useful for large blobs where the caller will handle compression (e.g., already-compressed PNG). |
| 'm' | TYPSTORAGE_MAIN | Prefer inline; compress inline if possible; move out-of-line only as last resort. |
| 'x' | TYPSTORAGE_EXTENDED | Full TOAST: try inline compression first; move out-of-line if still too large. Default for TEXT, BYTEA, JSONB. |
The 'm' strategy is the weakest form of the promise “keep this inline”.
The toaster will honour it up to TOAST_TUPLE_TARGET_MAIN (approximately
one tuple per page), but if even that is violated, the datum goes
out-of-line anyway.
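The per-strategy permissions in the table can be summarised as three predicates. This is a sketch only: the STORAGE_* macros and the function names are hypothetical stand-ins, and just the four single-character codes are taken from the table above.

```c
#include <stdbool.h>

/* The four attstorage codes from the table (macro names are hypothetical). */
#define STORAGE_PLAIN    'p'
#define STORAGE_EXTERNAL 'e'
#define STORAGE_MAIN     'm'
#define STORAGE_EXTENDED 'x'

/* May the toaster try inline compression for this strategy? */
static bool allows_compression(char storage)
{
    return storage == STORAGE_MAIN || storage == STORAGE_EXTENDED;
}

/* May the toaster move the value out of line at all? */
static bool allows_external(char storage)
{
    return storage != STORAGE_PLAIN;
}

/* Is out-of-line storage only a last resort for this strategy? */
static bool external_is_last_resort(char storage)
{
    return storage == STORAGE_MAIN;
}
```

Read this way, 'm' differs from 'x' only in when externalization is attempted, not in whether it is allowed.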
Toasting threshold and the four-round algorithm
The toaster activates when a tuple exceeds TOAST_TUPLE_THRESHOLD and shrinks
it toward TOAST_TUPLE_TARGET (both approximately BLCKSZ / 4 minus overhead,
roughly 2 KB for the default 8 KB page). The logic lives in
heap_toast_insert_or_update in heaptoast.c. It works through the tuple’s
attributes in four rounds, each more drastic than the last:
Round 1: Compress EXTENDED, externalize very large EXTENDED/EXTERNAL.
Pick the largest attribute with EXTENDED or EXTERNAL storage. If its
attstorage is TYPSTORAGE_EXTENDED, call toast_tuple_try_compression. An
EXTERNAL attribute is never compressed, so it is marked
TOASTCOL_INCOMPRESSIBLE and skipped in later compression passes. If the
attribute on its own is still larger than maxDataLen, push it out-of-line
immediately with toast_tuple_externalize.
Round 2: Externalize all remaining EXTENDED/EXTERNAL.
If the tuple still exceeds maxDataLen and a toast table exists
(rel->rd_rel->reltoastrelid != InvalidOid), push remaining eligible
attributes out-of-line, largest first, until the tuple fits.
Round 3: Compress MAIN.
If the tuple still exceeds maxDataLen, apply toast_tuple_try_compression
to TYPSTORAGE_MAIN attributes. This is the inline-compression last resort
for “prefer inline” columns.
Round 4: Externalize MAIN.
The target is relaxed to TOAST_TUPLE_TARGET_MAIN (approximately one full
page). If still exceeded, push MAIN attributes out-of-line.
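The two budgets named in the rounds can be derived with a little arithmetic. The sketch below assumes a default build (8192-byte block, 24-byte page header, 4-byte line pointer, 8-byte maximum alignment). Those constants and every name in the code are assumptions for illustration, and the results differ on non-default builds.

```c
#include <stddef.h>

/* Assumed default-build constants (hypothetical names). */
#define BLOCK_SIZE        8192u  /* stands in for BLCKSZ */
#define PAGE_HEADER_SIZE  24u    /* fixed page header */
#define LINE_POINTER_SIZE 4u     /* one item identifier per tuple */
#define MAX_ALIGN         8u     /* maximum alignment boundary */

/* Round n up to the next multiple of MAX_ALIGN. */
static size_t align_up(size_t n)
{
    return (n + MAX_ALIGN - 1) & ~(size_t) (MAX_ALIGN - 1);
}

/* Round n down to a multiple of MAX_ALIGN. */
static size_t align_down(size_t n)
{
    return n & ~(size_t) (MAX_ALIGN - 1);
}

/* Largest aligned tuple size that lets tuples_per_page tuples share a page. */
static size_t max_bytes_per_tuple(size_t tuples_per_page)
{
    size_t overhead = align_up(PAGE_HEADER_SIZE +
                               tuples_per_page * LINE_POINTER_SIZE);

    return align_down((BLOCK_SIZE - overhead) / tuples_per_page);
}
```

Under these assumptions max_bytes_per_tuple(4) comes to 2032 bytes, the roughly 2 KB target, and max_bytes_per_tuple(1) comes to 8160 bytes, the relaxed one-tuple-per-page budget used in Round 4.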
The entry point first deforms both the new and (for UPDATE) old tuple,
initialises the ToastTupleContext, and computes the header overhead so it
can convert TOAST_TUPLE_TARGET into a data-size limit. Note that the
loop condition is re-evaluated by recomputing heap_compute_data_size on
each iteration — the toaster reacts to the current (possibly already
compressed/externalized) state of the value array, not a precomputed plan:
```c
// heap_toast_insert_or_update (src/backend/access/heap/heaptoast.c)
heap_deform_tuple(newtup, tupleDesc, toast_values, toast_isnull);
if (oldtup != NULL)
    heap_deform_tuple(oldtup, tupleDesc, toast_oldvalues, toast_oldisnull);
/* ... fill ttc fields ... */
toast_tuple_init(&ttc);

/* compute header overhead --- this should match heap_form_tuple() */
hoff = SizeofHeapTupleHeader;
if ((ttc.ttc_flags & TOAST_HAS_NULLS) != 0)
    hoff += BITMAPLEN(numAttrs);
hoff = MAXALIGN(hoff);
/* now convert to a limit on the tuple data size */
maxDataLen = RelationGetToastTupleTarget(rel, TOAST_TUPLE_TARGET) - hoff;
```
The four passes share the same skeleton: pick the largest eligible
attribute via toast_tuple_find_biggest_attribute, act on it, and re-test.
The two boolean arguments to that selector encode the round’s policy:
for_compression (only consider not-yet-compressed columns) and
check_main (whether MAIN-storage columns are eligible this round):
```c
// heap_toast_insert_or_update (src/backend/access/heap/heaptoast.c)
/* Round 1: compress EXTENDED; externalize a single huge value early */
while (heap_compute_data_size(tupleDesc, toast_values, toast_isnull) > maxDataLen)
{
    int biggest_attno = toast_tuple_find_biggest_attribute(&ttc, true, false);
    if (biggest_attno < 0)
        break;

    if (TupleDescAttr(tupleDesc, biggest_attno)->attstorage == TYPSTORAGE_EXTENDED)
        toast_tuple_try_compression(&ttc, biggest_attno);
    else
        /* has attstorage EXTERNAL, ignore on subsequent compression passes */
        toast_attr[biggest_attno].tai_colflags |= TOASTCOL_INCOMPRESSIBLE;

    /* if it alone still busts the budget, push it out now */
    if (toast_attr[biggest_attno].tai_size > maxDataLen &&
        rel->rd_rel->reltoastrelid != InvalidOid)
        toast_tuple_externalize(&ttc, biggest_attno, options);
}

/* Round 2: externalize remaining EXTENDED/EXTERNAL (needs a toast table) */
while (heap_compute_data_size(tupleDesc, toast_values, toast_isnull) > maxDataLen &&
       rel->rd_rel->reltoastrelid != InvalidOid)
{
    int biggest_attno = toast_tuple_find_biggest_attribute(&ttc, false, false);
    if (biggest_attno < 0)
        break;
    toast_tuple_externalize(&ttc, biggest_attno, options);
}

/* Round 3: now take MAIN attributes into compression */
while (heap_compute_data_size(tupleDesc, toast_values, toast_isnull) > maxDataLen)
{
    int biggest_attno = toast_tuple_find_biggest_attribute(&ttc, true, true);
    if (biggest_attno < 0)
        break;
    toast_tuple_try_compression(&ttc, biggest_attno);
}

/* Round 4: relax the budget to one tuple/page, then externalize MAIN */
maxDataLen = TOAST_TUPLE_TARGET_MAIN - hoff;
while (heap_compute_data_size(tupleDesc, toast_values, toast_isnull) > maxDataLen &&
       rel->rd_rel->reltoastrelid != InvalidOid)
{
    int biggest_attno = toast_tuple_find_biggest_attribute(&ttc, false, true);
    if (biggest_attno < 0)
        break;
    toast_tuple_externalize(&ttc, biggest_attno, options);
}
```
The “externalize a single huge value early” branch in Round 1 is a
deliberate optimisation: in the common case of one long TEXT/JSONB
column and several short ones, pushing the giant out immediately avoids
spending CPU compressing the short columns that were never the problem.
If any value was replaced, TOAST_NEEDS_CHANGE is set and the function
rebuilds a fresh HeapTuple from the (now smaller) toast_values array
via heap_fill_tuple, recomputing t_hoff because an intervening
ALTER TABLE ADD COLUMN could have changed the null-bitmap width since the
old tuple was stored.
Saving a datum to the TOAST table
toast_save_datum in toast_internals.c opens the toast relation, assigns a
fresh valueid OID via GetNewOidWithIndex, slices the datum into
TOAST_MAX_CHUNK_SIZE-byte chunks, and inserts each as a heap tuple
(valueid, chunk_seq, chunk_data). After all chunks are inserted, it
constructs and returns an 18-byte toast pointer: a 2-byte external header
followed by the 16-byte varatt_external payload.
The on-disk pointer is the four-field varatt_external struct. The key
subtlety is va_extinfo: it packs both the stored payload size and the
compression method into one uint32, leaving va_rawsize to hold the
fully-decompressed size (so the reader can pre-size its result buffer
without decompressing):
```c
// varatt_external (src/include/varatt.h)
typedef struct varatt_external
{
    int32   va_rawsize;     /* Original data size (includes header) */
    uint32  va_extinfo;     /* External saved size (without header) and
                             * compression method */
    Oid     va_valueid;     /* Unique ID of value within TOAST table */
    Oid     va_toastrelid;  /* RelID of TOAST table containing it */
} varatt_external;
```
toast_save_datum opens the toast relation and its indexes, then derives
data_p / data_todo and the two pointer fields from the shape of the
incoming datum. A short-header datum is written as if it had a normal
header; an already-inline-compressed datum carries its compression method
straight into va_extinfo (the datum is stored compressed, never
re-compressed):
```c
// toast_save_datum (src/backend/access/common/toast_internals.c)
if (VARATT_IS_SHORT(dval))
{
    data_p = VARDATA_SHORT(dval);
    data_todo = VARSIZE_SHORT(dval) - VARHDRSZ_SHORT;
    toast_pointer.va_rawsize = data_todo + VARHDRSZ;    /* as if not short */
    toast_pointer.va_extinfo = data_todo;
}
else if (VARATT_IS_COMPRESSED(dval))
{
    data_p = VARDATA(dval);
    data_todo = VARSIZE(dval) - VARHDRSZ;
    toast_pointer.va_rawsize = VARDATA_COMPRESSED_GET_EXTSIZE(dval) + VARHDRSZ;
    VARATT_EXTERNAL_SET_SIZE_AND_COMPRESS_METHOD(toast_pointer, data_todo,
                                                 VARDATA_COMPRESSED_GET_COMPRESS_METHOD(dval));
    Assert(VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
}
else
{
    data_p = VARDATA(dval);
    data_todo = VARSIZE(dval) - VARHDRSZ;
    toast_pointer.va_rawsize = VARSIZE(dval);
    toast_pointer.va_extinfo = data_todo;
}
```
The valueid is a fresh OID from GetNewOidWithIndex (uniqueness checked
against the toast table’s own index). During CLUSTER/VACUUM FULL rewrites,
rel->rd_toastoid is set and the code instead reuses the old value’s OID,
short-circuiting the chunk loop with data_todo = 0 if the value already
exists in the new toast table. This is how a rewrite avoids duplicating
shared toast values.
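The va_extinfo packing described above can be sketched as a pair of bit operations. The sketch assumes a 30-bit size field below a 2-bit method id and a 4-byte normal header; the macro and function names are hypothetical, and the method ids used in any call are illustrative values rather than PostgreSQL constants.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXTSIZE_BITS 30                           /* assumed width of the size field */
#define EXTSIZE_MASK ((1u << EXTSIZE_BITS) - 1)
#define HEADER_4B    4                            /* assumed normal varlena header size */

/* Pack the stored payload size and a 2-bit method id into one word. */
static uint32_t pack_extinfo(uint32_t stored_size, uint32_t method_id)
{
    return (stored_size & EXTSIZE_MASK) | (method_id << EXTSIZE_BITS);
}

/* Low 30 bits: bytes actually stored in the toast table. */
static uint32_t extinfo_size(uint32_t extinfo)
{
    return extinfo & EXTSIZE_MASK;
}

/* Top 2 bits: compression method id. */
static uint32_t extinfo_method(uint32_t extinfo)
{
    return extinfo >> EXTSIZE_BITS;
}

/*
 * An external value counts as compressed when fewer bytes are stored
 * than the raw payload (raw size minus its 4-byte header) would need.
 */
static bool external_is_compressed(int32_t rawsize, uint32_t extinfo)
{
    return extinfo_size(extinfo) < (uint32_t) (rawsize - HEADER_4B);
}
```

Because the compressed flag is inferred from the two sizes, the pointer needs no separate boolean: an uncompressed value simply stores exactly rawsize minus the header.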
The chunk loop itself slices data_p into TOAST_MAX_CHUNK_SIZE-byte
spans, forms a (valueid, chunk_seq, chunk_data) heap tuple for each, and
inserts both the tuple and a matching index entry:
```c
// toast_save_datum (src/backend/access/common/toast_internals.c)
t_values[0] = ObjectIdGetDatum(toast_pointer.va_valueid);
t_values[2] = PointerGetDatum(&chunk_data);

while (data_todo > 0)
{
    CHECK_FOR_INTERRUPTS();
    chunk_size = Min(TOAST_MAX_CHUNK_SIZE, data_todo);

    t_values[1] = Int32GetDatum(chunk_seq++);
    SET_VARSIZE(&chunk_data, chunk_size + VARHDRSZ);
    memcpy(VARDATA(&chunk_data), data_p, chunk_size);
    toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);

    heap_insert(toastrel, toasttup, mycid, options, NULL);

    /* index entry for each toast index (columns mirror the table) */
    for (i = 0; i < num_indexes; i++)
        if (toastidxs[i]->rd_index->indisvalid)
            index_insert(toastidxs[i], t_values, t_isnull,
                         &(toasttup->t_self), toastrel,
                         toastidxs[i]->rd_index->indisunique ?
                         UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
                         false, NULL);

    heap_freetuple(toasttup);
    data_todo -= chunk_size;
    data_p += chunk_size;
}
```
Finally it builds the 18-byte external varlena and hands it back to
toast_tuple_externalize, which drops it into ttc_values[attno]:
```c
// toast_save_datum (src/backend/access/common/toast_internals.c)
result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
SET_VARTAG_EXTERNAL(result, VARTAG_ONDISK);
memcpy(VARDATA_EXTERNAL(result), &toast_pointer, sizeof(toast_pointer));
return PointerGetDatum(result);
```
The TOAST table always has a B-tree index on (chunk_id, chunk_seq); this
is what heap_fetch_toast_slice scans, with keys built by ScanKeyInit, to
retrieve chunks in order.
Compression
toast_compress_datum in toast_internals.c dispatches to pglz_compress_datum
or lz4_compress_datum according to the method passed in by the caller: the
column’s own compression setting when one is set, otherwise
default_toast_compression (GUC, default pglz; lz4 requires compile-time
USE_LZ4). The compression method ID is stored in the top 2 bits of
va_extinfo for external datums and in va_tcinfo for inline-compressed datums.
```c
// toast_compress_datum (src/backend/access/common/toast_internals.c)
switch (cmethod)
{
    case TOAST_PGLZ_COMPRESSION:
        tmp = pglz_compress_datum((const struct varlena *) value);
        cmid = TOAST_PGLZ_COMPRESSION_ID;
        break;
    case TOAST_LZ4_COMPRESSION:
        tmp = lz4_compress_datum((const struct varlena *) value);
        cmid = TOAST_LZ4_COMPRESSION_ID;
        break;
}

/* the compressor itself gave up: report the value as incompressible */
if (tmp == NULL)
    return PointerGetDatum(NULL);

if (VARSIZE(tmp) < valsize - 2)
{
    /* net savings; keep compressed form */
    TOAST_COMPRESS_SET_SIZE_AND_COMPRESS_METHOD(tmp, valsize, cmid);
    return PointerGetDatum(tmp);
}
else
{
    pfree(tmp);
    return PointerGetDatum(NULL);   /* incompressible */
}
```
The “net savings of more than 2 bytes” guard prevents compression from inflating small datums after header and alignment overhead.
PGLZ vs LZ4 at the datum level
The two compressors share a contract (take a plain varlena, return a
VARHDRSZ_COMPRESSED-prefixed compressed varlena or NULL on failure)
but differ in how they decide failure and in their speed/ratio tradeoff.
PGLZ is PostgreSQL’s in-tree LZ77-family coder. It refuses tiny or huge
inputs up front (PGLZ_strategy_default bounds) and treats a negative
return from the core pglz_compress as “incompressible”:
```c
// pglz_compress_datum (src/backend/access/common/toast_compression.c)
valsize = VARSIZE_ANY_EXHDR(value);
if (valsize < PGLZ_strategy_default->min_input_size ||
    valsize > PGLZ_strategy_default->max_input_size)
    return NULL;

tmp = (struct varlena *) palloc(PGLZ_MAX_OUTPUT(valsize) + VARHDRSZ_COMPRESSED);
len = pglz_compress(VARDATA_ANY(value), valsize,
                    (char *) tmp + VARHDRSZ_COMPRESSED, NULL);
if (len < 0)
{
    pfree(tmp);
    return NULL;
}
SET_VARSIZE_COMPRESSED(tmp, len + VARHDRSZ_COMPRESSED);
return tmp;
```
LZ4 (compiled in only under USE_LZ4; the stub raises an error otherwise)
is far faster with a typically lower ratio. It sizes its buffer with
LZ4_compressBound, hard-errors on a genuine library failure, and treats
“output bigger than input” as the incompressible signal:
```c
// lz4_compress_datum (src/backend/access/common/toast_compression.c)
max_size = LZ4_compressBound(valsize);
tmp = (struct varlena *) palloc(max_size + VARHDRSZ_COMPRESSED);

len = LZ4_compress_default(VARDATA_ANY(value),
                           (char *) tmp + VARHDRSZ_COMPRESSED,
                           valsize, max_size);
if (len <= 0)
    elog(ERROR, "lz4 compression failed");

/* data is incompressible so just free the memory and return NULL */
if (len > valsize)
{
    pfree(tmp);
    return NULL;
}
SET_VARSIZE_COMPRESSED(tmp, len + VARHDRSZ_COMPRESSED);
return tmp;
```
The compressed datum’s length word is followed by the four-byte va_tcinfo
word (see toast_compress_header): 30 bits of original payload size plus a
2-bit ToastCompressionId. That id is what toast_decompress_datum reads back
to choose the decompressor. Because the method travels with the datum, a
table can hold a mix of PGLZ- and LZ4-compressed values after an
ALTER TABLE ... ALTER COLUMN ... SET COMPRESSION:
```c
// toast_decompress_datum (src/backend/access/common/detoast.c)
cmid = TOAST_COMPRESS_METHOD(attr);
switch (cmid)
{
    case TOAST_PGLZ_COMPRESSION_ID:
        return pglz_decompress_datum(attr);
    case TOAST_LZ4_COMPRESSION_ID:
        return lz4_decompress_datum(attr);
    default:
        elog(ERROR, "invalid compression method id %d", cmid);
        return NULL;    /* keep compiler quiet */
}
```
The asymmetry that drives the slice path (below) lies in prefix
decompression: PGLZ exposes pglz_decompress_datum_slice, which can stop
after a prefix length, and LZ4’s LZ4_decompress_safe_partial does the same,
but only for liblz4 ≥ 1.8.3, so lz4_decompress_datum_slice falls back to
full decompression on older libraries:
```c
// lz4_decompress_datum_slice (src/backend/access/common/toast_compression.c)
/* slice decompression not supported prior to 1.8.3 */
if (LZ4_versionNumber() < 10803)
    return lz4_decompress_datum(value);

result = (struct varlena *) palloc(slicelength + VARHDRSZ);
rawsize = LZ4_decompress_safe_partial((char *) value + VARHDRSZ_COMPRESSED,
                                      VARDATA(result),
                                      VARSIZE(value) - VARHDRSZ_COMPRESSED,
                                      slicelength,
                                      slicelength);
```
Detoasting
detoast_attr in detoast.c is the full detoast path: fetch from external
storage if needed, then decompress if needed, then expand from short-header
if needed. The result is always a normal 4-byte-header varlena.
```c
// detoast_attr (src/backend/access/common/detoast.c)
if (VARATT_IS_EXTERNAL_ONDISK(attr))
{
    /* externally stored --- fetch it back from there */
    attr = toast_fetch_datum(attr);
    /* If it's compressed, decompress it */
    if (VARATT_IS_COMPRESSED(attr))
    {
        struct varlena *tmp = attr;
        attr = toast_decompress_datum(tmp);
        pfree(tmp);
    }
}
else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
{
    /* in-memory indirect pointer --- dereference and recurse */
    struct varatt_indirect redirect;
    VARATT_EXTERNAL_GET_POINTER(redirect, attr);
    attr = detoast_attr((struct varlena *) redirect.pointer);
    /* ... copy if it was already flat ... */
}
else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
    attr = detoast_external_attr(attr);     /* flatten expanded object */
else if (VARATT_IS_COMPRESSED(attr))
    attr = toast_decompress_datum(attr);    /* inline-compressed only */
else if (VARATT_IS_SHORT(attr))
{
    /* short-header varlena --- convert to 4-byte header format */
    Size data_size = VARSIZE_SHORT(attr) - VARHDRSZ_SHORT;
    struct varlena *new_attr = (struct varlena *) palloc(data_size + VARHDRSZ);
    SET_VARSIZE(new_attr, data_size + VARHDRSZ);
    memcpy(VARDATA(new_attr), VARDATA_SHORT(attr), data_size);
    attr = new_attr;
}
return attr;
```
The five branches are mutually exclusive: a fully external on-disk datum
first, then the two in-memory transient forms (INDIRECT, EXPANDED) that
never appear on disk, then inline-compressed, then short-header. The
post-condition is always a plain 4-byte-header varlena. A datum that was
already in that form is returned unchanged, so callers may pfree the result
only when it differs from the pointer they passed in.
toast_fetch_datum is the reassembly engine. It copies the (potentially
unaligned) varatt_external out of the pointer, pre-sizes a result buffer
to the stored size, marks it compressed or not so the caller’s
VARATT_IS_COMPRESSED test works, and delegates the actual chunk read to
the table AM:
```c
// toast_fetch_datum (src/backend/access/common/detoast.c)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
attrsize = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);

result = (struct varlena *) palloc(attrsize + VARHDRSZ);
if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
    SET_VARSIZE_COMPRESSED(result, attrsize + VARHDRSZ);
else
    SET_VARSIZE(result, attrsize + VARHDRSZ);

if (attrsize == 0)
    return result;      /* shouldn't happen, but be safe */

toastrel = table_open(toast_pointer.va_toastrelid, AccessShareLock);
table_relation_fetch_toast_slice(toastrel, toast_pointer.va_valueid,
                                 attrsize, 0, attrsize, result);
table_close(toastrel, AccessShareLock);
return result;
```
table_relation_fetch_toast_slice is the table-AM hook; heap implements it
as heap_fetch_toast_slice. That function computes the chunk range, builds
one to three scan keys (equality on valueid, plus an optional equality or
range condition on chunk_seq), and walks the toast index in order. The “all
chunks” fast path uses a single key; a sub-range uses BTGreaterEqual /
BTLessEqual bounds:
```c
// heap_fetch_toast_slice (src/backend/access/heap/heaptoast.c)
startchunk = sliceoffset / TOAST_MAX_CHUNK_SIZE;
endchunk = (sliceoffset + slicelength - 1) / TOAST_MAX_CHUNK_SIZE;

ScanKeyInit(&toastkey[0], (AttrNumber) 1,
            BTEqualStrategyNumber, F_OIDEQ,
            ObjectIdGetDatum(valueid));

if (startchunk == 0 && endchunk == totalchunks - 1)
    nscankeys = 1;      /* whole value */
else if (startchunk == endchunk)
{
    ScanKeyInit(&toastkey[1], (AttrNumber) 2,
                BTEqualStrategyNumber, F_INT4EQ,
                Int32GetDatum(startchunk));
    nscankeys = 2;      /* single chunk */
}
else
{
    ScanKeyInit(&toastkey[1], (AttrNumber) 2,
                BTGreaterEqualStrategyNumber, F_INT4GE,
                Int32GetDatum(startchunk));
    ScanKeyInit(&toastkey[2], (AttrNumber) 2,
                BTLessEqualStrategyNumber, F_INT4LE,
                Int32GetDatum(endchunk));
    nscankeys = 3;      /* chunk range */
}

toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
                                       get_toast_snapshot(),
                                       nscankeys, toastkey);
/* loop: copy each chunk's VARDATA into result, verifying curchunk == expectedchunk */
```
The per-chunk loop verifies curchunk == expectedchunk and that the chunk
size matches the expected size for its position, raising
ERRCODE_DATA_CORRUPTED on any gap, duplicate, or out-of-order chunk. This is
a cheap integrity check that catches toast-table corruption at read time.
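The chunk arithmetic and the per-position size check can be tried out in isolation. The sketch below assumes a chunk size of 1996 bytes, the usual value on a default 8 KB build; the struct and function names are hypothetical, and the caller is assumed to pass a non-empty slice that lies within the value.

```c
#include <stdint.h>

#define CHUNK_SIZE 1996   /* assumed TOAST_MAX_CHUNK_SIZE on a default 8 KB build */

typedef struct {
    int32_t start_chunk;
    int32_t end_chunk;
    int     scan_keys;    /* 1 = whole value, 2 = one chunk, 3 = chunk range */
} slice_plan;

/* Number of chunks needed for attrsize stored bytes (attrsize > 0). */
static int32_t total_chunks(int32_t attrsize)
{
    return (attrsize - 1) / CHUNK_SIZE + 1;
}

/* Caller guarantees attrsize > 0, length > 0, offset + length <= attrsize. */
static slice_plan plan_slice(int32_t attrsize, int32_t offset, int32_t length)
{
    slice_plan p;

    p.start_chunk = offset / CHUNK_SIZE;
    p.end_chunk = (offset + length - 1) / CHUNK_SIZE;
    if (p.start_chunk == 0 && p.end_chunk == total_chunks(attrsize) - 1)
        p.scan_keys = 1;
    else if (p.start_chunk == p.end_chunk)
        p.scan_keys = 2;
    else
        p.scan_keys = 3;
    return p;
}

/* Size a chunk must have at a given position; anything else signals corruption. */
static int32_t expected_chunk_size(int32_t attrsize, int32_t chunk_no)
{
    int32_t last = total_chunks(attrsize) - 1;

    return chunk_no < last ? CHUNK_SIZE : attrsize - last * CHUNK_SIZE;
}
```

For a 5000-byte stored value this gives three chunks; a 100-byte prefix needs a single-chunk scan with two keys, and the final chunk is expected to hold 1008 bytes.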
The slice variant detoast_attr_slice supports partial retrieval. For
uncompressed external datums it takes a fast path straight to
toast_fetch_datum_slice (which narrows the chunk range as shown above).
For compressed external datums it must fetch enough compressed bytes to
cover the requested decompressed prefix: PGLZ exposes
pglz_maximum_compressed_size to bound that; LZ4 has no equivalent, so the
slice path fetches the entire compressed value:
```c
// detoast_attr_slice (src/backend/access/common/detoast.c)
if (VARATT_IS_EXTERNAL_ONDISK(attr))
{
    VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

    /* fast path for non-compressed external datums */
    if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
        return toast_fetch_datum_slice(attr, sliceoffset, slicelength);

    /* compressed: fetch enough to decompress the requested prefix */
    if (slicelimit >= 0)
    {
        int32 max_size = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);

        if (VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer) ==
            TOAST_PGLZ_COMPRESSION_ID)
            max_size = pglz_maximum_compressed_size(slicelimit, max_size);
        /* (LZ4 has no such bound, so max_size stays = full size) */
        preslice = toast_fetch_datum_slice(attr, 0, max_size);
    }
    else
        preslice = toast_fetch_datum(attr);
}
/* ... INDIRECT / EXPANDED / inline cases ... */

if (VARATT_IS_COMPRESSED(preslice))
{
    struct varlena *tmp = preslice;

    if (slicelimit >= 0)
        preslice = toast_decompress_datum_slice(tmp, slicelimit);
    else
        preslice = toast_decompress_datum(tmp);
    if (tmp != attr)
        pfree(tmp);
}
/* ... then copy [sliceoffset, sliceoffset+slicelength) out of preslice ... */
```
This is why substring(big_text, 1, 100) on an uncompressed external value
touches only the first chunk, but the same call on an LZ4-compressed value
fetches the whole compressed datum: the prefix bound is a property of the
compressor, not of TOAST.
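A prefix bound of the kind PGLZ offers can be sketched from the format's worst case. The sketch assumes that an incompressible prefix costs one control bit per literal byte (9 bits per raw byte) and that up to 2 extra bytes may be needed to finish a match tag that straddles the cut-off. The function name is hypothetical, and this is an approximation of the idea, not a copy of pglz_maximum_compressed_size.

```c
#include <stdint.h>

/*
 * Upper bound on the compressed bytes needed to decompress a prefix of
 * raw_prefix bytes, capped at the total compressed size.
 */
static int32_t max_compressed_prefix(int32_t raw_prefix, int32_t total_compressed)
{
    /* 9 bits per raw byte, rounded up to whole bytes; 64-bit to avoid overflow */
    int64_t bound = ((int64_t) raw_prefix * 9 + 7) / 8;

    /* assumed slack for a match tag that straddles the cut-off */
    bound += 2;

    /* never ask for more than is actually stored */
    if (bound > total_compressed)
        bound = total_compressed;
    return (int32_t) bound;
}
```

Under these assumptions a 100-byte prefix needs at most 115 compressed bytes, which fits comfortably inside the first toast chunk, so only that chunk has to be read.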
To recap the fetch path: toast_fetch_datum opens the toast relation and calls
table_relation_fetch_toast_slice, which dispatches to heap_fetch_toast_slice
in heaptoast.c. That function opens the toast indexes, runs
systable_beginscan_ordered with the SnapshotToastData snapshot, and reads
chunks in chunk_seq order, copying each chunk’s VARDATA into the
pre-allocated result buffer.
TOAST snapshot semantics
Detoasting requires a snapshot to read the TOAST table. get_toast_snapshot
(in toast_internals.c) returns &SnapshotToastData, a special snapshot that
applies no time-qualification checks of its own: if the main-table row
holding a toast pointer is visible, the chunks it references are treated as
visible too. The function enforces one safety rule: the session must hold a
registered or active snapshot before detoasting is attempted. Without one,
nothing stops the TOAST data from being vacuumed away between the main-table
fetch and the detoast fetch.
```c
// get_toast_snapshot (src/backend/access/common/toast_internals.c)
if (!HaveRegisteredOrActiveSnapshot())
    elog(ERROR, "cannot fetch toast data without an active snapshot");
return &SnapshotToastData;
```
TOAST and the table AM interface
TOAST is not hardcoded to the heap access method. The table AM interface
(tableam.h) exposes table_relation_fetch_toast_slice, which the heap AM
implements as heap_fetch_toast_slice. A custom AM that stores large values
differently can provide its own implementation. The toasting decision logic
in toast_internals.c and detoast.c is AM-agnostic; only the chunk
storage and retrieval are AM-specific.
Flow diagrams
Write path (INSERT/UPDATE):
flowchart TD
A[heap_insert / heap_update] --> B[heap_toast_insert_or_update]
B --> C{tuple > TOAST_TUPLE_TARGET?}
C -- No --> Z[return original tuple]
C -- Yes --> D[Round 1: compress EXTENDED\nthen externalize if still large]
D --> E[Round 2: externalize remaining\nEXTENDED / EXTERNAL]
E --> F[Round 3: compress MAIN]
F --> G[Round 4: externalize MAIN\ntarget = TOAST_TUPLE_TARGET_MAIN]
G --> H[heap_fill_tuple with\nreplaced Datum values]
H --> Z2[return new tuple]
D -->|toast_save_datum| T1[toast rel: INSERT chunks\nassign valueid OID\nreplace Datum with varatt_external pointer]
E -->|toast_save_datum| T1
G -->|toast_save_datum| T1
Read path (detoasting):
flowchart TD
R[executor reads Datum from slot] --> V{varlena tag?}
V -- short header --> S[detoast_attr: expand to 4-byte header]
V -- inline compressed --> IC[detoast_attr: decompress\npglz / lz4]
V -- VARTAG_ONDISK --> E1[toast_fetch_datum:\nopen pg_toast_N\nsystable_beginscan_ordered\nread chunks in order]
E1 --> E2{chunk data compressed?}
E2 -- Yes --> IC
E2 -- No --> Done[return assembled varlena]
IC --> Done
S --> Done
Slice read (partial retrieval):
flowchart TD
P[detoast_attr_slice sliceoffset slicelength] --> Q{VARTAG_ONDISK?}
Q -- Not compressed --> QA[toast_fetch_datum_slice:\nnarrow chunk range scan]
Q -- PGLZ compressed --> QB[pglz_maximum_compressed_size\nfetch minimal prefix chunks]
Q -- LZ4 compressed --> QC[fetch all chunks\nno streaming prefix API]
QA --> D2[copy slice from assembled buffer]
QB --> D3[decompress prefix then slice]
QC --> D3
D2 --> Out[return slice varlena]
D3 --> Out
End-to-end datum lifecycle (one EXTENDED column):
This view follows a single oversized TEXT/JSONB value across the symbol
boundaries — from the toaster decision down through chunk storage, and back
up through reassembly and decompression on read.
flowchart TD
subgraph WRITE [Write path]
W1[toast_tuple_try_compression] --> W2[toast_compress_datum]
W2 --> W3{pglz_compress_datum / lz4_compress_datum<br/>net saved over 2 bytes?}
W3 -- No --> W4[keep raw varlena]
W3 -- Yes --> W5[set va_tcinfo: size + method id]
W4 --> W6[toast_tuple_externalize]
W5 --> W6
W6 --> W7[toast_save_datum]
W7 --> W8[slice into TOAST_MAX_CHUNK_SIZE chunks<br/>heap_insert + index_insert per chunk]
W8 --> W9[build varatt_external<br/>va_rawsize / va_extinfo / va_valueid / va_toastrelid]
W9 --> W10[replace Datum with VARTAG_ONDISK pointer]
end
W10 -.stored on disk.-> R1
subgraph READ [Read path]
R1[detoast_attr] --> R2{VARATT_IS_EXTERNAL_ONDISK?}
R2 -- Yes --> R3[toast_fetch_datum]
R3 --> R4[table_relation_fetch_toast_slice<br/>heap_fetch_toast_slice]
R4 --> R5[systable_beginscan_ordered on toast index<br/>get_toast_snapshot]
R5 --> R6{VARATT_IS_COMPRESSED result?}
R6 -- Yes --> R7[toast_decompress_datum<br/>pglz / lz4 by va_tcinfo method id]
R6 -- No --> R8[plain varlena]
R7 --> R8
end
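The slicing step in the write path (W8) can be sketched as a loop that carves the stored bytes into fixed-size pieces and numbers them. This is an illustration only: `chunk_desc` and `plan_chunks` are hypothetical names, the chunk size assumes an 8 KB page, and the real code inserts a heap tuple and an index entry per chunk rather than filling an array.

```c
#include <stddef.h>

#define TOAST_MAX_CHUNK_SIZE 1996 /* assumed value for 8 KB pages */

/* Hypothetical descriptor for one toast chunk. */
typedef struct {
    int    chunk_seq; /* sequence number stored with the chunk */
    size_t offset;    /* byte offset into the stored value */
    size_t length;    /* bytes carried by this chunk */
} chunk_desc;

/* Plan the chunks for data_len bytes; returns how many were written to out. */
size_t plan_chunks(size_t data_len, chunk_desc *out, size_t max_out)
{
    size_t todo = data_len;
    size_t offset = 0;
    size_t n = 0;

    while (todo > 0 && n < max_out) {
        size_t len = todo < TOAST_MAX_CHUNK_SIZE ? todo : TOAST_MAX_CHUNK_SIZE;

        out[n].chunk_seq = (int) n;
        out[n].offset = offset;
        out[n].length = len;
        offset += len;
        todo -= len;
        n++;
    }
    return n;
}
```

Every chunk except the last is full, which is what lets the read side locate a byte offset by division alone.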
Source Walkthrough
Write-side symbols
| Symbol | File | Role |
|---|---|---|
| heap_toast_insert_or_update | access/heap/heaptoast.c | Entry point called by heap_insert / heap_update; drives the four-round loop |
| heap_toast_delete | access/heap/heaptoast.c | Cascaded delete of toast rows when main tuple is deleted |
| toast_tuple_init | access/common/toast_helper.c | Initialises ToastTupleContext, classifies each attribute |
| toast_tuple_find_biggest_attribute | access/common/toast_helper.c | Selects the largest eligible attribute for each round |
| toast_tuple_try_compression | access/common/toast_helper.c | Calls toast_compress_datum; replaces value in ttc_values if compressed |
| toast_tuple_externalize | access/common/toast_helper.c | Calls toast_save_datum; replaces value with varatt_external pointer |
| toast_tuple_cleanup | access/common/toast_helper.c | Frees old external values that were replaced |
| toast_save_datum | access/common/toast_internals.c | Opens toast rel, assigns valueid, inserts chunks, returns pointer |
| toast_delete_datum | access/common/toast_internals.c | Deletes all chunks for one valueid |
| toast_delete_external | access/common/toast_helper.c | Iterates columns and calls toast_delete_datum for external ones |
| toast_compress_datum | access/common/toast_internals.c | Dispatches to pglz_compress_datum or lz4_compress_datum |
| toast_open_indexes | access/common/toast_internals.c | Opens all indexes on the toast relation; returns the valid one |
| toast_close_indexes | access/common/toast_internals.c | Closes toast indexes and frees the array |
Read-side symbols
| Symbol | File | Role |
|---|---|---|
| detoast_attr | access/common/detoast.c | Full detoast: fetch external + decompress + expand short header |
| detoast_external_attr | access/common/detoast.c | Fetch external datum only (may still be compressed) |
| detoast_attr_slice | access/common/detoast.c | Partial retrieval; dispatches narrow scan or prefix decompress |
| toast_fetch_datum | access/common/detoast.c (static) | Reassembles all chunks for a varatt_external datum |
| toast_fetch_datum_slice | access/common/detoast.c (static) | Reassembles a chunk range for a varatt_external datum |
| toast_decompress_datum | access/common/detoast.c (static) | Dispatches to pglz_decompress_datum or lz4_decompress_datum |
| toast_decompress_datum_slice | access/common/detoast.c (static) | Decompresses only a prefix length of the datum |
| heap_fetch_toast_slice | access/heap/heaptoast.c | AM-side chunk retrieval: systable_beginscan_ordered on the toast index |
| get_toast_snapshot | access/common/toast_internals.c | Returns SnapshotToastData; enforces active-snapshot precondition |
| toast_raw_datum_size | access/common/detoast.c | Returns decompressed size without fully detoasting |
| toast_datum_size | access/common/detoast.c | Returns physical stored size |
Compression symbols
| Symbol | File | Role |
|---|---|---|
| pglz_compress_datum | access/common/toast_compression.c | Compresses via pglz_compress; returns NULL if incompressible |
| pglz_decompress_datum | access/common/toast_compression.c | Full PGLZ decompression |
| pglz_decompress_datum_slice | access/common/toast_compression.c | Partial PGLZ decompression |
| lz4_compress_datum | access/common/toast_compression.c | LZ4 compression (requires USE_LZ4) |
| lz4_decompress_datum | access/common/toast_compression.c | Full LZ4 decompression |
| lz4_decompress_datum_slice | access/common/toast_compression.c | Partial LZ4 decompression (requires liblz4 ≥ 1.8.3) |
| toast_get_compression_id | access/common/toast_compression.c | Extracts ToastCompressionId from a varlena |
| CompressionNameToMethod | access/common/toast_compression.c | Maps "pglz" / "lz4" string to compression method char |
Key data structures
| Symbol | File | Role |
|---|---|---|
| varatt_external | include/varatt.h | On-disk toast pointer body, 16 bytes (18 with the 2-byte external header): va_rawsize, va_extinfo, va_valueid, va_toastrelid |
| varatt_indirect | include/varatt.h | In-memory indirect pointer to another varlena |
| ToastAttrInfo | include/access/toast_helper.h | Per-attribute classification flags and current size within ToastTupleContext |
| ToastTupleContext | include/access/toast_helper.h | Working context for the four-round loop: rel, values, isnull, oldvalues, attr array, flags |
| toast_compress_header | include/access/toast_internals.h | Inline-compressed datum header: vl_len_ + tcinfo (size + method ID) |
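The "size + method ID" packing that both va_extinfo and tcinfo use can be shown with a stand-in struct. This is a sketch under assumptions: `toast_pointer` is a hypothetical mirror of varatt_external with Oid modelled as `uint32_t`, the 30-bit size / 2-bit method split follows the extsize macros in varatt.h, and method IDs 0 and 1 are taken to mean PGLZ and LZ4.

```c
#include <stdint.h>

#define VARHDRSZ     4
#define EXTSIZE_BITS 30
#define EXTSIZE_MASK ((1U << EXTSIZE_BITS) - 1)

/* Hypothetical mirror of varatt_external; Oid modelled as uint32_t. */
typedef struct {
    int32_t  va_rawsize;    /* original size, 4-byte header included */
    uint32_t va_extinfo;    /* stored size (30 bits) + method id (2 bits) */
    uint32_t va_valueid;    /* OID of the value inside the toast relation */
    uint32_t va_toastrelid; /* OID of the toast relation */
} toast_pointer;

/* Combine the externally stored size with a compression method id. */
uint32_t pack_extinfo(uint32_t extsize, uint32_t method)
{
    return (extsize & EXTSIZE_MASK) | (method << EXTSIZE_BITS);
}

uint32_t ext_size(const toast_pointer *p)
{
    return p->va_extinfo & EXTSIZE_MASK;
}

uint32_t ext_method(const toast_pointer *p)
{
    return p->va_extinfo >> EXTSIZE_BITS;
}

/* Compressed when fewer bytes are stored than the raw payload holds. */
int ext_is_compressed(const toast_pointer *p)
{
    return ext_size(p) < (uint32_t) p->va_rawsize - VARHDRSZ;
}
```

No separate "is compressed" flag is stored: the reader infers compression by comparing the stored size with the raw size, which is why the struct stays at 16 bytes.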
Constants
| Constant | Defined in | Value (8 KB page) |
|---|---|---|
| TOAST_TUPLE_TARGET | heaptoast.h | ~2 KB (≡ TOAST_TUPLE_THRESHOLD, 4 tuples/page) |
| TOAST_TUPLE_TARGET_MAIN | heaptoast.h | ~8 KB (1 tuple/page, one full page) |
| TOAST_MAX_CHUNK_SIZE | heaptoast.h | ~1996 bytes (4 chunks/page minus overhead) |
| TOAST_POINTER_SIZE | detoast.h | 18 bytes (VARHDRSZ_EXTERNAL + sizeof(varatt_external)) |
| TYPSTORAGE_PLAIN/EXTERNAL/MAIN/EXTENDED | pg_type.h | 'p', 'e', 'm', 'x' |
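The approximate values in the constants table can be reproduced with a short calculation. The sketch below hard-codes the sizes assumed for a default 64-bit build (8192-byte block, 8-byte maximum alignment, 24-byte page header, 4-byte line pointer, 23-byte heap tuple header); these are stand-ins for the PostgreSQL macros rather than values read from its headers, and the function names are hypothetical.

```c
#include <stddef.h>

/* Assumed build parameters, not read from PostgreSQL headers. */
#define BLCKSZ                 8192
#define MAXIMUM_ALIGNOF        8
#define SIZE_OF_PAGE_HEADER    24 /* SizeOfPageHeaderData */
#define SIZE_OF_ITEM_ID        4  /* sizeof(ItemIdData) */
#define SIZE_OF_HEAP_TUPLE_HDR 23 /* SizeofHeapTupleHeader */
#define SIZE_OF_OID            4
#define SIZE_OF_INT32          4
#define VARHDRSZ               4

/* Round up to the next multiple of MAXIMUM_ALIGNOF. */
size_t maxalign(size_t len)
{
    return (len + (MAXIMUM_ALIGNOF - 1)) & ~((size_t) (MAXIMUM_ALIGNOF - 1));
}

/* Round down to a multiple of MAXIMUM_ALIGNOF. */
size_t maxalign_down(size_t len)
{
    return len & ~((size_t) (MAXIMUM_ALIGNOF - 1));
}

/* Largest tuple size that lets tuples_per_page tuples share one page. */
size_t max_bytes_per_tuple(size_t tuples_per_page)
{
    size_t overhead = maxalign(SIZE_OF_PAGE_HEADER
                               + tuples_per_page * SIZE_OF_ITEM_ID);

    return maxalign_down((BLCKSZ - overhead) / tuples_per_page);
}

/* Payload bytes per chunk: a quarter-page tuple minus per-chunk overhead. */
size_t toast_max_chunk_size(void)
{
    return max_bytes_per_tuple(4)
        - maxalign(SIZE_OF_HEAP_TUPLE_HDR)
        - SIZE_OF_OID      /* chunk_id */
        - SIZE_OF_INT32    /* chunk_seq */
        - VARHDRSZ;        /* header of the chunk_data bytea */
}
```

Under these assumptions `max_bytes_per_tuple(4)` is 2032, `max_bytes_per_tuple(1)` is 8160, and `toast_max_chunk_size()` is 1996, matching the "~2 KB", "~8 KB", and "~1996 bytes" entries.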
Position hints (as of 2026-06-05, commit 273fe94)
| Symbol | File | Approx. line |
|---|---|---|
| heap_toast_insert_or_update | src/backend/access/heap/heaptoast.c | 96 |
| heap_toast_delete | src/backend/access/heap/heaptoast.c | 43 |
| heap_fetch_toast_slice | src/backend/access/heap/heaptoast.c | 626 |
| toast_flatten_tuple | src/backend/access/heap/heaptoast.c | 350 |
| toast_save_datum | src/backend/access/common/toast_internals.c | 119 |
| toast_delete_datum | src/backend/access/common/toast_internals.c | 385 |
| toast_compress_datum | src/backend/access/common/toast_internals.c | 46 |
| toast_open_indexes | src/backend/access/common/toast_internals.c | 562 |
| get_toast_snapshot | src/backend/access/common/toast_internals.c | 638 |
| detoast_attr | src/backend/access/common/detoast.c | 116 |
| detoast_external_attr | src/backend/access/common/detoast.c | 45 |
| detoast_attr_slice | src/backend/access/common/detoast.c | 205 |
| toast_fetch_datum | src/backend/access/common/detoast.c | 342 |
| toast_fetch_datum_slice | src/backend/access/common/detoast.c | 395 |
| toast_decompress_datum | src/backend/access/common/detoast.c | 470 |
| toast_decompress_datum_slice | src/backend/access/common/detoast.c | 502 |
| pglz_compress_datum | src/backend/access/common/toast_compression.c | 39 |
| pglz_decompress_datum | src/backend/access/common/toast_compression.c | 81 |
| pglz_decompress_datum_slice | src/backend/access/common/toast_compression.c | 108 |
| lz4_compress_datum | src/backend/access/common/toast_compression.c | 138 |
| lz4_decompress_datum | src/backend/access/common/toast_compression.c | 181 |
| lz4_decompress_datum_slice | src/backend/access/common/toast_compression.c | 214 |
| varatt_external | src/include/varatt.h | 32 |
| TOAST_TUPLE_TARGET | src/include/access/heaptoast.h | 50 |
| TOAST_MAX_CHUNK_SIZE | src/include/access/heaptoast.h | 84 |
| TYPSTORAGE_EXTENDED | src/include/catalog/pg_type.h | 309 |
Source verification (as of 2026-06-05)
Verified against REL_18_STABLE, commit 273fe94.
Confirmed:
- heap_toast_insert_or_update four-round loop structure matches the description: rounds 1–4, TYPSTORAGE_EXTENDED / TYPSTORAGE_MAIN discrimination, TOAST_TUPLE_TARGET / TOAST_TUPLE_TARGET_MAIN thresholds.
- toast_save_datum chunk loop, valueid assignment via GetNewOidWithIndex, and varatt_external pointer construction confirmed.
- detoast_attr branch structure for ONDISK / COMPRESSED / SHORT confirmed.
- detoast_attr_slice narrow-scan path for uncompressed external, PGLZ prefix via pglz_maximum_compressed_size, and LZ4 full-fetch fallback confirmed.
- get_toast_snapshot HaveRegisteredOrActiveSnapshot guard confirmed.
- TOAST_MAX_CHUNK_SIZE defined as EXTERN_TUPLE_MAX_SIZE - MAXALIGN(SizeofHeapTupleHeader) - sizeof(Oid) - sizeof(int32) - VARHDRSZ in heaptoast.h.
- TOAST_POINTER_SIZE = VARHDRSZ_EXTERNAL + sizeof(varatt_external) in detoast.h.
- LZ4: lz4_decompress_datum_slice falls back to full decompression if LZ4_versionNumber() < 10803; confirmed.
AM interface:
- table_relation_fetch_toast_slice is the tableam dispatch point; heap’s implementation is heap_fetch_toast_slice, confirmed in src/backend/access/heap/heaptoast.c.
Unresolved / out of scope:
- The toast_build_flattened_tuple / toast_flatten_tuple variants are utility helpers (for container types and CLUSTER/rewrite); the table-rewrite OID-preservation path in toast_save_datum is present but not fully traced.
- The large_object subsystem (storage/large_object/) is a separate mechanism from TOAST and is not covered here.
- Per-column default_toast_compression overrides (ALTER TABLE … SET COMPRESSION) interact with toast_compress_datum’s cmethod parameter; verified that the parameter flows from pg_attribute.attcompression through the caller chain, but the catalog path is not traced.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
MySQL/InnoDB off-page columns
InnoDB stores large BLOB, TEXT, and VARCHAR values on overflow pages in
the same tablespace (the table's .ibd file under file-per-table). How much
stays inline depends on the row format: COMPACT and REDUNDANT keep a
768-byte prefix in the row followed by a 20-byte pointer to the overflow
pages, while DYNAMIC and COMPRESSED can move the whole value off-page and
leave only the 20-byte pointer, keeping values of roughly 40 bytes or less
inline. Unlike PostgreSQL’s TOAST, InnoDB does not compress individual
column values by default; compression is a table-level option that applies
to B-tree pages. The trade-off: InnoDB’s off-page storage avoids the
separate heap, but TOAST’s out-of-line scheme lets vacuum reclaim TOAST rows
independently and keeps value compression orthogonal to page compression.
Oracle LOBs
Oracle LOB columns are stored in a dedicated LOB segment outside the main
table segment. A LOB value below a threshold (11g: 4 KB; 12c+: configurable)
can instead be stored inside the row (an “inline LOB”). Oracle BasicFiles use
an extent-based storage model; SecureFiles (11g+) add deduplication,
compression, and encryption at the LOB level. The semantic difference from
TOAST: Oracle LOBs are a distinct SQL type with a read/write cursor API,
whereas PostgreSQL TOAST is fully transparent and the application uses TEXT
or BYTEA normally.
SQL Server row overflow and LOB pages
SQL Server stores VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX), and
TEXT/IMAGE in LOB pages separate from the 8 KB data page. The main row
holds a 24-byte pointer. Smaller variable-length values that would each fit
on a page but push the row past the 8060-byte row limit move to row-overflow
pages. SQL Server has no inline per-value compression equivalent to TOAST’s
Round 1; page-level compression (DATA_COMPRESSION = PAGE) is a table option
analogous to InnoDB’s page compression.
Research: columnar and compressed storage
TOAST’s per-datum compression model predates the columnar storage movement, which compresses whole column runs (run-length encoding, dictionary encoding, delta encoding) rather than individual values. Columnar compression ratios are typically far higher because the compressor sees many values of the same type and distribution, not isolated varlenas. Columnar systems and formats such as DuckDB and Apache Parquet compress at the column level; hybrid engines (like Greenplum’s AO tables, which build on PostgreSQL) bypass TOAST for columnar segments and handle large values at the block level.
The pluggable table AM interface in PostgreSQL 12+ (with the
table_relation_fetch_toast_slice dispatch added in PostgreSQL 13) is the
extension point for an AM that wants to bypass TOAST entirely. The Citus
columnar AM (columnar) introduced its own large-value handling; Citus is a
third-party extension rather than part of core PostgreSQL, so it falls
outside the scope of this document.
Future: TOAST and async I/O (PG18)
PostgreSQL 18 introduced an async I/O layer (storage/aio/) for prefetching
heap pages during sequential scans. Toast chunk reads (heap_fetch_toast_slice
via systable_beginscan_ordered) currently bypass async I/O and use
synchronous ReadBuffer calls. A potential future optimisation would be to
prefetch toast chunks when an index scan on the main table detects an
out-of-line pointer — this is an open engineering item, not yet implemented
in REL_18_STABLE.
Sources
- src/backend/access/heap/heaptoast.c: heap-specific toasting entry points and chunk fetch
- src/backend/access/common/toast_internals.c: toast_save_datum, toast_delete_datum, toast_compress_datum, index helpers, get_toast_snapshot
- src/backend/access/common/detoast.c: detoasting, partial fetch, size reporting
- src/backend/access/common/toast_compression.c: PGLZ and LZ4 compress/decompress dispatch
- src/include/access/heaptoast.h: TOAST_TUPLE_TARGET, TOAST_MAX_CHUNK_SIZE, function declarations
- src/include/access/toast_internals.h: toast_compress_header, TOAST_COMPRESS_* macros, function declarations
- src/include/access/detoast.h: TOAST_POINTER_SIZE
- src/include/access/toast_helper.h: ToastAttrInfo, ToastTupleContext, TOAST_HAS_NULLS, TOAST_NEEDS_CHANGE, TOASTCOL_* flags
- src/include/varatt.h: varatt_external, varatt_indirect, VARTAG_*, VARATT_IS_* macros, VARHDRSZ_*
- src/include/catalog/pg_type.h: TYPSTORAGE_PLAIN/EXTERNAL/MAIN/EXTENDED
- knowledge/code-analysis/postgres/postgres-heap-am.md: heap tuple layout, slotted page, HOT context
- knowledge/code-analysis/postgres/postgres-page-layout.md: BLCKSZ, page header layout
- knowledge/code-analysis/postgres/postgres-mvcc-snapshots.md: snapshot semantics
- knowledge/code-analysis/postgres/postgres-vacuum.md: vacuum reclaims dead TOAST rows
- Database System Concepts, Silberschatz et al., 7e, ch. 13 “Data Storage Structures”
- Database Internals, Alex Petrov, ch. 3 “File Formats”