Skip to content

PostgreSQL GiST — Generalized Search Trees and Opclass Support Functions

Contents:

A B-tree (postgres-nbtree.md) is the right index for data that has a total order: integers, strings, timestamps. Its entire machinery — the separator keys, the “move right to recover,” the binary search inside a page — rests on the assumption that any two keys can be compared with <. But a vast amount of interesting data has no natural total order. A geometric box is not “less than” another box; a point in the plane has no single linear rank that preserves nearness; an IP subnet, a range of integers, a tsvector of lexemes, a trigram set of a string — none of these sit on a number line. What these datatypes do have is a notion of containment or overlap: a query box overlaps an indexed box; a query point is contained in an indexed bounding rectangle; a query range is contained in an indexed range. An index that could prune the search tree using overlap rather than order would serve all of them.

The structure that generalizes the B-tree from order to overlap is the R-tree (Guttman 1984; the variant PostgreSQL effectively implements for spatial keys is closer to the R*-tree of Beckmann, Kriegel, Schneider & Seeger 1990, captured in dbms-papers/rstar-tree.md). An R-tree is a height-balanced tree in which every entry is a bounding predicate — for spatial data, a minimum bounding rectangle (MBR) — paired with a pointer. A leaf entry’s MBR bounds a single object; an internal entry’s MBR bounds the union of all MBRs in the subtree it points at. The two invariants that make it a search tree are:

  1. Coverage. An internal entry’s predicate covers (contains) the predicates of every child beneath it. A query predicate that is disjoint from an internal entry can therefore safely skip the entire subtree.
  2. Balance. All leaves are at the same depth, so lookups, inserts, and the R*-tree’s forced-reinsertion stay O(log N).

Unlike a B-tree, an R-tree’s bounding predicates may overlap between siblings, so a point query can descend into more than one subtree. The quality of an R-tree — how little it overlaps, how tightly its MBRs hug the data — is determined entirely by two heuristics: the insert path choice (“which child’s MBR do I enlarge least to swallow the new entry?”) and the node split (“when a page overflows, how do I partition its entries into two groups that minimize total area / overlap?”). Guttman’s quadratic and linear split heuristics, and the R*-tree’s later margin-minimizing split plus forced reinsertion, are exactly these two knobs tuned for rectangles in the plane.

The conceptual leap that produced GiST is the observation that the tree skeleton, the balance maintenance, the concurrency protocol, and the crash-recovery logic do not depend on the keys being rectangles at all. Hellerstein, Naughton & Pfeffer 1995, “Generalized Search Trees for Database Systems” (SIGMOD; the foundational reference cited at the top of src/backend/access/gist/README) made this explicit. Their thesis: a B-tree and an R-tree are both instances of one balanced, multiway, bounding- predicate tree that differs only in (a) what a “key” is and (b) four abstract operations on keys. Pull those four operations out into a user-supplied interface and you get a single, reusable index template that an extension author can specialize for any datatype with a notion of containment — not just rectangles, but ranges, full-text, network types, even exotica an extension invents. The four canonical GiST methods, in the 1995 paper’s vocabulary, are:

  • Consistent(E, q) — given an index entry E (a predicate) and a query predicate q, return false only if E and q can provably never both hold for any object. This is the pruning primitive: a false lets the search skip E’s subtree; a true (possibly over-cautious) forces descent.
  • Union(E1..En) — return a single predicate that holds for every object satisfying any of E1..En. This is what an internal downlink stores: the bounding predicate of its children.
  • Penalty(E1, E2) — a number measuring the cost of inserting E2 into the subtree rooted at E1 (typically the increase in E1’s “size,” e.g. area enlargement for a rectangle). This drives the insert path choice.
  • PickSplit(P) — given an overfull set of entries P, partition them into two groups. This drives the node split.

Two more methods round out the practical interface (the paper hints at them, PostgreSQL formalizes them): Compress/Decompress, which let the on-disk key be a lossy or compact representation of the indexed value (an MBR instead of a polygon), and Equal, used during downlink maintenance to detect when a recomputed union is unchanged. PostgreSQL adds Distance (for nearest-neighbor ORDER BY), Fetch (to reconstruct the original value for index-only scans), Options (opclass-parameter parsing), and SortSupport (to enable the bottom-up sorted build).

The payoff is enormous leverage: PostgreSQL’s box_ops, range_ops, inet_ops, tsvector_ops, PostGIS’s spatial index, pg_trgm’s similarity index, and btree_gist’s order-emulating opclasses are all the same access method, differing only in a dozen small C functions. The cost is that the access method cannot reason about the keys: it cannot binary- search a page (no order), it must call a user function for every pruning decision, and a sloppy opclass (a Consistent that returns true too often, a PickSplit that produces overlapping groups) silently degrades to a sequential scan dressed up as an index. GiST trades the B-tree’s self-contained efficiency for extensibility, and the rest of this document is about how PostgreSQL makes that trade crash-safe and concurrent.

Before reading PostgreSQL’s symbols, it helps to name the recurring design moves that any extensible, overlap-based, balanced tree makes. These are the conventions GiST instantiates.

The “template method” index: skeleton in the engine, semantics in a plug-in

Section titled “The “template method” index: skeleton in the engine, semantics in a plug-in”

The defining pattern is the classic Template Method: the database engine owns the algorithm skeleton (descend, split, propagate, lock, log, recover) and calls out to a small set of hooks that supply the datatype semantics. Every extensible index converges on roughly the same hook set — a match predicate for pruning, a bounding operation for downlinks, a cost function for path selection, and a partition function for splitting — because those are exactly the four decision points a balanced bounding- predicate tree must make and the only four that depend on what a key means. Oracle’s Extensible Indexing (the ODCIIndex interface), Informix’s Datablade virtual indexes, and Microsoft’s CLR-based spatial indexes are all the same idea under different names; GiST is the open-source canonical form, and the 1995 paper is its specification.

Because sibling predicates may overlap, three things differ from a B-tree:

  1. Search may fan out. A single query can match the Consistent test of several children of one page, so the search is not a single root-to-leaf path but a traversal of a frontier. Implementations keep a work queue (or stack) of pages still to visit rather than a single descent pointer.
  2. There is no high key to “move right” against. A B-tree recovers from a concurrent split by comparing the search key to the page’s high key. With no order, that test is unavailable — so the recovery signal must be made explicit (a sequence number per page) rather than derived from the key.
  3. Split quality is a heuristic, not a guarantee. A B-tree split is trivially balanced (cut the sorted run in half). An overlap tree’s split must choose a partition that minimizes future false descents, and the choice is datatype-specific — hence PickSplit is a user hook, and a bad one merely produces a slow index, not a wrong one.

A naive bounding-predicate insert descends to a leaf, inserts, then walks back up enlarging every ancestor’s predicate to cover the new entry — two passes, and a crash mid-way leaves ancestors that under-cover their subtrees. The robust alternative, which PostgreSQL adopts, is to enlarge the chosen downlink during the downward pass: at each internal page, before recursing into the chosen child, replace the child’s downlink with the union of the old downlink and the new key. When the leaf is reached, every ancestor already covers the new entry, so the insert is a single pass and the tree is self-consistent after writing the leaf — no upward fix-up, which is what makes crash recovery simple.

Section titled “Right-links + a sequence number for lock-light concurrency”

The B-link tree’s great trick (postgres-nbtree.md) — each level is a right-linked list, a search holds one page latch at a time, and a search that lands on a just-split page chases the right-link — transfers directly to an overlap tree, except for the split-detection signal. Lacking a high key, the overlap tree stamps each page with a node sequence number (NSN): when a split posts its downlink to the parent, it copies the parent-update’s LSN into the children’s NSN. A search remembers the parent page’s LSN when it read the downlink; on reaching the child it compares that remembered LSN to the child’s NSN. If the child’s NSN is newer, the child split after the search saw the parent, so the search must also follow the child’s right-link to the moved-away entries. This is Kornacker’s adaptation of L&Y to GiST (“Access Methods for Next-Generation Database Systems,” the README’s concurrency reference), and it is the exact mechanism PostgreSQL implements.

Crash-safe multi-page mutations via WAL-atomic steps

Section titled “Crash-safe multi-page mutations via WAL-atomic steps”

As in any WAL engine, a split spanning N pages is decomposed into atomic log records each leaving a valid tree. The canonical decomposition: (1) atomically write the new right page(s), set the old page’s right-link, and flag the old page “my right sibling has no parent downlink yet”; (2) atomically post the downlink(s) to the parent and clear the flag, stamping the NSN. A crash between (1) and (2) leaves a fully searchable tree — the flag tells both searches and the next inserter to chase the right-link — and the next descent repairs the incomplete split lazily.

Inserting a billion tuples one at a time into a disk tree is random-I/O bound: each insert touches a root-to-leaf path that is mostly not cached. Two standard accelerations apply. Sorted build (B-tree style): if the keys can be linearized into a sort order that preserves locality, sort them and pack leaves bottom-up, building each parent level from the level below. Buffering build (Arge, Hinrichs, Vahrenhold & Vitter, the README’s build reference): attach in-memory/temp-file buffers to internal nodes, push each incoming tuple only as far down as the first buffered node, and flush a buffer en masse when it fills — converting depth-first random descents into breadth-first batched ones sized to fit the cache.

Theory / conventionPostgreSQL name (gist)
Consistent (pruning predicate)GIST_CONSISTENT_PROCgiststate->consistentFn, called in gistindex_keytest
Union (bounding predicate of children)GIST_UNION_PROCunionFn, via gistMakeUnionItVec
Penalty (insert-path cost)GIST_PENALTY_PROCpenaltyFn, via gistpenalty / gistchoose
PickSplit (node partition)GIST_PICKSPLIT_PROCpicksplitFn, via gistUserPicksplit / gistSplitByKey
Compress / Decompress (on-disk key form)GIST_COMPRESS_PROC / GIST_DECOMPRESS_PROCcompressFn / decompressFn
Equal (unchanged-union detection)GIST_EQUAL_PROCequalFn, via gistgetadjusted
Distance (k-NN ORDER BY)GIST_DISTANCE_PROCdistanceFn, k-NN queue in gistget.c
Fetch (index-only reconstruction)GIST_FETCH_PROCfetchFn, via gistFetchTuple
SortSupport (bottom-up build)GIST_SORTSUPPORT_PROC, gates GIST_SORTED_BUILD
Bounding-predicate downlinkinner IndexTuple whose key is a union; t_tid is the child block
Adjust-on-the-way-downgistgetadjusted + gistinserttuple inside gistdoinsert
Node sequence number (split signal)GistPageGetNSN / GistPageSetNSN, GistNSN = XLogRecPtr
Right-linkGISTPageOpaqueData.rightlink
”Right sibling lacks downlink” flagF_FOLLOW_RIGHT (GistFollowRight / GistMarkFollowRight)
Split step 1 / step 2gistplacetopage (link+flag) / gistfinishsplit (post downlink, set NSN)
Lazy incomplete-split repairgistfixsplit (triggered on F_FOLLOW_RIGHT)
Re-find migrated parentgistFindCorrectParent / gistFindPath
Work queue for fan-out searchGISTScanOpaque->queue (a pairingheap)
Sorted bulk buildgist_indexsortbuild (GIST_SORTED_BUILD)
Buffered bulk buildgistInitBuffering + GISTBuildBuffers (GIST_BUFFERING_ACTIVE)
AM dispatch tablegisthandler returns IndexAmRoutine (see postgres-index-am.md)

PostgreSQL’s src/backend/access/gist/ is, in the README’s words, “an implementation of GiST indexing,” concurrency-hardened per Kornacker and bulk-load-accelerated per Arge et al. The on-disk tree is a height-balanced, right-linked tree of BLCKSZ pages; the AM owns descent, split, WAL, and concurrency; an opclass supplies the datatype semantics through up to twelve support functions. The whole personality of the AM is declared in one handler.

The AM handler: what GiST can and cannot do

Section titled “The AM handler: what GiST can and cannot do”
// gisthandler — src/backend/access/gist/gist.c
Datum
gisthandler(PG_FUNCTION_ARGS)
{
IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
amroutine->amstrategies = 0; // strategy numbers are opclass-defined
amroutine->amsupport = GISTNProcs; // 12 support procs (consistent..sortsupport)
amroutine->amcanorder = false; // no total order -> can't drive ORDER BY col
amroutine->amcanorderbyop = true; // CAN drive ORDER BY <-> (k-NN)
amroutine->amcanmulticol = true; // multi-column GiST
amroutine->amoptionalkey = true;
amroutine->amsearchnulls = true;
amroutine->amstorage = true; // on-disk key type != indexed type (compress)
amroutine->amclusterable = true;
amroutine->ampredlocks = true; // SERIALIZABLE predicate locking supported
amroutine->amcanparallel = false;
amroutine->amcaninclude = true; // INCLUDE columns
// ... method pointers ...
amroutine->ambuild = gistbuild;
amroutine->aminsert = gistinsert;
amroutine->ambulkdelete = gistbulkdelete;
amroutine->amgettuple = gistgettuple;
amroutine->amgetbitmap = gistgetbitmap;
amroutine->amcanreturn = gistcanreturn; // index-only scans (if Fetch defined)
amroutine->amtranslatecmptype = gisttranslatecmptype;
PG_RETURN_POINTER(amroutine);
}

The two telling flags are amcanorder = false (GiST keys have no total order, so a GiST index can never satisfy a plain ORDER BY col) and amcanorderbyop = true (it can satisfy ORDER BY col <-> point, the k-nearest-neighbor operator, because the Distance support function provides a lower-bound metric). amstorage = true reflects Compress: the stored key type may differ from the indexed type. amsupport = GISTNProcs (12) is the size of the support-function array the opclass may fill.

Every operation on a GiST index runs against a GISTSTATE, a per-statement cache of the resolved support-function FmgrInfos, built once by initGISTstate and stashed in IndexInfo->ii_AmCache:

// initGISTstate — src/backend/access/gist/gist.c
for (i = 0; i < IndexRelationGetNumberOfKeyAttributes(index); i++)
{
fmgr_info_copy(&(giststate->consistentFn[i]),
index_getprocinfo(index, i + 1, GIST_CONSISTENT_PROC), scanCxt);
fmgr_info_copy(&(giststate->unionFn[i]),
index_getprocinfo(index, i + 1, GIST_UNION_PROC), scanCxt);
/* opclasses are not required to provide a Compress method */
if (OidIsValid(index_getprocid(index, i + 1, GIST_COMPRESS_PROC)))
fmgr_info_copy(&(giststate->compressFn[i]), ... , scanCxt);
else
giststate->compressFn[i].fn_oid = InvalidOid;
// ... penalty, picksplit, equal are mandatory; decompress/distance/fetch optional ...
}

consistent, union, penalty, picksplit, and equal are mandatory; compress, decompress, distance, and fetch are optional and detected by OidIsValid. The nonLeafTupdesc is a truncated copy of the index tuple descriptor without INCLUDE columns, used when forming downlink (internal) tuples — internal keys carry only the key columns, never the payload.

Section titled “The two key-shapes: leaf entries vs. downlinks”

A GiST leaf IndexTuple stores the (possibly Compress-ed) key for one heap row, with t_tid pointing at the heap tuple. An internal IndexTuple stores a union key — the bounding predicate of its child page — with t_tid holding the child block number. The Consistent function is invoked with a flag telling it whether it is looking at a leaf entry (an exact object) or an internal entry (a bounding predicate), because “does the query overlap this box” is asked of both, but the answer’s meaning differs.

Section titled “Descending by least penalty, adjusting downlinks on the way down”

gistdoinsert walks from the root holding one page at a time. At each internal page it calls gistchoose, which evaluates penaltyFn for every downlink and returns the offset of least penalty (ties broken with a small randomization for space/cache balance). Before recursing, it asks whether the chosen downlink already covers the new key; if not, it enlarges it in place:

// gistdoinsert — src/backend/access/gist/gist.c (internal-page case)
downlinkoffnum = gistchoose(state.r, stack->page, itup, giststate);
iid = PageGetItemId(stack->page, downlinkoffnum);
idxtuple = (IndexTuple) PageGetItem(stack->page, iid);
childblkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
/* Does the child's downlink already cover the key we're inserting? */
newtup = gistgetadjusted(state.r, idxtuple, itup, giststate);
if (newtup)
{
/* No: enlarge the downlink in place (may itself split this page). */
if (gistinserttuple(&state, stack, giststate, newtup, downlinkoffnum))
{
/* the parent split; retry from the parent */
}
}

gistgetadjusted calls unionFn to compute union(old downlink, new key) and equalFn to check whether the union actually changed; it returns NULL (skip the update) when the downlink already covers the key. Because this runs on the downward pass, by the time we reach the leaf every ancestor covers the new entry — the single-pass property the README emphasizes.

Choosing the subtree: gistchoose / gistpenalty

Section titled “Choosing the subtree: gistchoose / gistpenalty”
// gistpenalty — src/backend/access/gist/gistutil.c
float
gistpenalty(GISTSTATE *giststate, int attno,
GISTENTRY *orig, bool isNullOrig,
GISTENTRY *add, bool isNullAdd)
{
float penalty = 0.0;
if (giststate->penaltyFn[attno].fn_strict == false ||
(isNullOrig == false && isNullAdd == false))
{
FunctionCall3Coll(&giststate->penaltyFn[attno],
giststate->supportCollation[attno],
PointerGetDatum(orig), PointerGetDatum(add),
PointerGetDatum(&penalty));
if (isnan(penalty) || penalty < 0.0) // sanitize a misbehaving opclass
penalty = 0.0;
}
else if (isNullOrig && isNullAdd)
penalty = 0.0;
else
penalty = get_float4_infinity(); // avoid mixing null and non-null
return penalty;
}

For box_ops the penalty is the area enlargement of the bounding box; gistchoose picks the downlink with the smallest enlargement, walking multi-column indexes column-by-column with earlier columns dominating.

Section titled “NSN, right-links, and the F_FOLLOW_RIGHT flag”

The page header’s opaque area carries the concurrency machinery:

// gist.h — page flags, NSN type, and accessors
#define F_LEAF (1 << 0) /* leaf page */
#define F_DELETED (1 << 1) /* the page has been deleted */
#define F_FOLLOW_RIGHT (1 << 3) /* page to the right has no downlink */
typedef XLogRecPtr GistNSN; /* node sequence number == a special LSN */
#define GistFollowRight(page) (GistPageGetOpaque(page)->flags & F_FOLLOW_RIGHT)
#define GistPageGetNSN(page) (PageXLogRecPtrGet(GistPageGetOpaque(page)->nsn))
#define GistPageSetNSN(page, val) (PageXLogRecPtrSet(GistPageGetOpaque(page)->nsn, val))

GistNSN is literally an XLogRecPtr; the NSN of a child is set to the LSN of the WAL record that inserted its downlink into the parent. A search compares “the parent LSN I remembered” against “this child’s NSN” — if the NSN is larger, the split happened after I read the parent, so I must chase the right-link. This is the order-free replacement for the B-tree’s high-key test.

Search as a frontier traversal with a pairing heap

Section titled “Search as a frontier traversal with a pairing heap”

Because predicates overlap, a search is not a single descent. gistgettuple seeds a queue with the root and pulls items off it; gistScanPage scans one page, runs Consistent on each entry, and pushes matching children (or, for k-NN, matching heap tuples with their distances) back onto the queue. For plain scans the queue keeps heap tuples at the front for depth-first, LIMIT- friendly output; for ORDER BY <-> it is a distance-ordered pairingheap so the nearest tuple always emerges next.

flowchart TD
  A["gistgettuple<br/>(root pushed to queue)"] --> B{"pull next<br/>queue item"}
  B -->|"heap tuple"| C["return heapPtr<br/>(recheck if lossy)"]
  B -->|"index page"| D["gistScanPage(page)"]
  D --> E{"NSN > remembered<br/>parent LSN?"}
  E -->|"yes: concurrent split"| F["push right sibling<br/>to front of queue"]
  E -->|"no"| G["scan entries"]
  F --> G
  G --> H{"Consistent(entry, query)?"}
  H -->|"false"| I["skip subtree"]
  H -->|"true, leaf"| J["enqueue heap tuple<br/>(+ distance for k-NN)"]
  H -->|"true, internal"| K["enqueue child page<br/>(remember this page's LSN)"]
  J --> B
  K --> B
  I --> B
  C --> B

This is the same “one page locked at a time, chase the right-link on a concurrent split” idiom as nbtree, with three differences forced by the overlap model: the descent is a queue not a pointer, the split signal is the NSN not a high key, and pruning is a user function call not a key comparison.

This section walks the call flows in gist.c, gistget.c, gistsplit.c, and gistbuild.c, anchored on stable symbol names. Line numbers live only in the position-hint table at the end.

The support-function contract (gist.c, gist.h)

Section titled “The support-function contract (gist.c, gist.h)”

The twelve support-procedure numbers are macros in gist.h: GIST_CONSISTENT_PROC (1), GIST_UNION_PROC (2), GIST_COMPRESS_PROC (3), GIST_DECOMPRESS_PROC (4), GIST_PENALTY_PROC (5), GIST_PICKSPLIT_PROC (6), GIST_EQUAL_PROC (7), GIST_DISTANCE_PROC (8), GIST_FETCH_PROC (9), GIST_OPTIONS_PROC (10), GIST_SORTSUPPORT_PROC (11), with GISTNProcs = 12. gisthandler exposes the AM; initGISTstate resolves each procedure’s OID to an FmgrInfo and caches it in the GISTSTATE, marking optional procedures with InvalidOid when absent. gistvalidate (not quoted here) is the amvalidate hook that CREATE OPERATOR CLASS runs to check an opclass supplies a coherent set.

The Consistent function is reached through gistindex_keytest, which is the literal pruning gate of every scan:

// gistindex_keytest — src/backend/access/gist/gistget.c
recheck = true; // safest assumption if Consistent forgets to set it
test = FunctionCall5Coll(&key->sk_func, key->sk_collation,
PointerGetDatum(&de), // index datum as GISTENTRY*
key->sk_argument, // the query value
Int16GetDatum(key->sk_strategy),
ObjectIdGetDatum(key->sk_subtype),
PointerGetDatum(&recheck));
if (!DatumGetBool(test))
return false; // provably disjoint: prune this entry/subtree
*recheck_p |= recheck; // lossy match: caller must recheck against heap

Two subtleties live here. First, recheck: a lossy opclass (e.g. pg_trgm, or a Compress that stored a bounding box for a polygon) returns true with recheck = true, meaning “this might match — confirm against the real heap value.” The executor re-evaluates the original operator on the heap tuple. Second, NULL handling: on an internal page a NULL key cannot prove the subtree is null-free (GiST’s rule is union(VAL, NULL) = VAL), so SK_SEARCHNULL only prunes on leaf pages — the README’s NULL-safety guarantee made concrete.

Insert: gistinsertgistdoinsertgistplacetopage

Section titled “Insert: gistinsert → gistdoinsert → gistplacetopage”

gistinsert is a thin wrapper: it lazily builds the GISTSTATE, forms the leaf IndexTuple via gistFormTuple (which runs Compress), stamps t_tid with the heap CTID, and calls gistdoinsert. gistdoinsert is the descent engine. It starts at GIST_ROOT_BLKNO, grabs a shared lock optimistically, and at each page checks, in order: (a) an incomplete split to repair, (b) a concurrent split/deletion to retry from the parent, (c) leaf vs. internal.

// gistdoinsert — src/backend/access/gist/gist.c (concurrency checks)
/* (a) crashed mid-split? finish it before touching the page. */
if (GistFollowRight(stack->page))
{
gistfixsplit(&state, giststate);
state.stack = stack = stack->parent; // retry from parent
continue;
}
/* (b) page split or was deleted since we read its parent downlink? */
if ((stack->blkno != GIST_ROOT_BLKNO &&
stack->parent->lsn < GistPageGetNSN(stack->page)) ||
GistPageIsDeleted(stack->page))
{
state.stack = stack = stack->parent; // rechoose best child
continue;
}

When the leaf is reached, gistinserttuple/gistinserttuples call the workhorse gistplacetopage, which either fits the tuple onto the page or splits it. The split path is where the opclass PickSplit and the WAL-atomic decomposition meet:

// gistplacetopage — src/backend/access/gist/gist.c (split path)
is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
if (is_split && GistPageIsLeaf(page) && GistPageHasGarbage(page))
{
gistprunepage(rel, page, buffer, heapRel); // reclaim LP_DEAD first
is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
}
if (is_split)
{
itvec = gistextractpage(page, &tlen); // all existing tuples
itvec = gistjoinvector(itvec, &tlen, itup, ntup);
dist = gistSplit(rel, page, itvec, tlen, giststate); // calls PickSplit
if (npage > GIST_MAX_SPLIT_PAGES)
elog(ERROR, "GiST page split into too many halves (%d, maximum %d)", ...);
// ... allocate buffers, distribute tuples, set rightlinks ...
}

Note the order: GiST tries to prune dead tuples (gistprunepage, the LP_DEAD reclamation) before splitting, exactly as nbtree deletes-or-dedups before splitting. Note also GIST_MAX_SPLIT_PAGES: because keys are variable-length and PickSplit has no size information, one split may need to produce more than two pages, and the recursion in gistSplit handles that.

Linking the split and the F_FOLLOW_RIGHT / NSN protocol

Section titled “Linking the split and the F_FOLLOW_RIGHT / NSN protocol”

After PickSplit decides the partition, gistplacetopage wires the new pages into a right-linked chain and flags every non-rightmost page:

// gistplacetopage — src/backend/access/gist/gist.c (linking + flagging)
/* Set up rightlinks */
if (ptr->next && ptr->block.blkno != GIST_ROOT_BLKNO)
GistPageGetOpaque(ptr->page)->rightlink = ptr->next->block.blkno;
else
GistPageGetOpaque(ptr->page)->rightlink = oldrlink; // tail keeps old rightlink
/* Flag all but the rightmost: "my right sibling has no parent downlink yet" */
if (ptr->next && !is_rootsplit && markfollowright)
GistMarkFollowRight(ptr->page);
else
GistClearFollowRight(ptr->page);
/* Copy the original page's NSN to every fragment so scans chase rightlinks */
GistPageSetNSN(ptr->page, oldnsn);

This is step 1 of the split: the new geometry is durable and searchable (a reader chases the right-link because F_FOLLOW_RIGHT is set), but the parent does not yet have downlinks for the new pages. gistplacetopage returns the new downlink tuples to its caller via *splitinfo. Step 2 is gistfinishsplit, which posts those downlinks to the parent (recursing up if the parent itself splits) and, in doing so, sets each child’s NSN to the LSN of the downlink-insertion record and clears F_FOLLOW_RIGHT. A crash between the two steps leaves an index that searches traverse correctly and that the next descent repairs through gistfixsplit.

flowchart TD
  A["gistplacetopage:<br/>page full"] --> B["gistSplit ->gistSplitByKey<br/>(opclass PickSplit)"]
  B --> C["allocate right pages,<br/>distribute tuples"]
  C --> D["set rightlinks,<br/>GistMarkFollowRight,<br/>copy NSN — STEP 1"]
  D --> E["XLogInsert split record<br/>(WAL-atomic)"]
  E --> F["return *splitinfo<br/>(new downlinks) to caller"]
  F --> G["gistfinishsplit:<br/>post downlinks to parent"]
  G --> H{"parent full?"}
  H -->|"yes"| A
  H -->|"no"| I["set child NSN = downlink LSN,<br/>GistClearFollowRight — STEP 2"]
  I --> J["split complete"]
  D -.->|"crash here"| K["F_FOLLOW_RIGHT left set;<br/>next descent calls<br/>gistfixsplit to repair"]

Re-finding a migrated parent: gistFindCorrectParent / gistFindPath

Section titled “Re-finding a migrated parent: gistFindCorrectParent / gistFindPath”

During an upward downlink insertion, the parent page recorded in the descent stack may itself have split and migrated rightward. gistFindCorrectParent checks whether the parent’s LSN changed since descent; if so it walks the parent level’s right-links searching for the page holding the child’s downlink, and falls back to gistFindPath — a breadth-first re-search from the root following the README’s findPath pseudocode — when right-link walking is insufficient. This is the GiST analogue of nbtree’s _bt_getstackbuf.

PickSplit and the multi-column / secondary-split machinery

Section titled “PickSplit and the multi-column / secondary-split machinery”

gistSplit (the recursive splitter) calls gistSplitByKey, which decides which column to split on and invokes the user PickSplit through gistUserPicksplit:

// gistUserPicksplit — src/backend/access/gist/gistsplit.c
FunctionCall2Coll(&giststate->picksplitFn[attno],
giststate->supportCollation[attno],
PointerGetDatum(entryvec), // entries to partition
PointerGetDatum(sv)); // GIST_SPLITVEC out-param
if (sv->spl_nleft == 0 || sv->spl_nright == 0)
{
/* opclass PickSplit put everything on one side: complain, then cope */
ereport(DEBUG1, (errmsg("picksplit method for column %d of index \"%s\" failed", ...)));
genericPickSplit(giststate, entryvec, sv, attno); // fall back to a 50/50 split
}

Several robustness layers are visible. A PickSplit that degenerately dumps everything on one side is caught and replaced with genericPickSplit (a size-balanced halving) so a buggy opclass yields a poor but correct index. For a multi-column GiST, gistSplitByKey recurses: if splitting on column 0 leaves the left and right union keys equal (gistKeyIsEQ) the split is useless on that column, so it identifies don’t-care tuples (those equally placeable on either side per findDontCares) and re-splits them on the next column — the README’s “secondary split.” NULL keys are routed wholesale to one side (gistSplitByKey’s null handling) so a page never mixes NULL and non-NULL keys, preserving the union(VAL, NULL) = VAL invariant. After the recursion, gistunionsubkey recomputes the left/right union datums for every column so the two new downlinks correctly bound their pages.

Search: gistgettuple, gistgetbitmap, and k-NN

Section titled “Search: gistgettuple, gistgetbitmap, and k-NN”

gistgettuple drives ordered and plain scans; gistgetbitmap drives bitmap scans. Both funnel through gistScanPage, already shown. The k-NN path is distinctive: gistindex_keytest also evaluates the Distance support function for each ORDER BY x <-> q clause, giving heap entries an exact distance and internal entries a lower-bound distance (the closest any child could be). The pairingheap orders the queue by distance, so pulling items in heap order yields tuples in increasing true distance — an incremental nearest-neighbor scan that never has to materialize or sort the whole result. If Distance set its recheck flag, the distance is only a lower bound and the executor re-sorts with the exact value.

gistgettuple also performs lazy dead-tuple marking: when the executor reports kill_prior_tuple, the offset is stashed in so->killedItems and later flushed by gistkillitems, which sets LP_DEAD on the index entries — the same hint-bit reclamation tier as nbtree’s _bt_killitems.

gistbuild chooses among three strategies up front:

// gistbuild — src/backend/access/gist/gistbuild.c (strategy selection)
if (options && options->buffering_mode == GIST_OPTION_BUFFERING_ON)
buildstate.buildMode = GIST_BUFFERING_STATS; // forced buffering
else
buildstate.buildMode = GIST_BUFFERING_AUTO; // decide later
if (buildstate.buildMode != GIST_BUFFERING_STATS)
{
/* Use the sorted build if EVERY key column has a sortsupport proc. */
bool hasallsortsupports = true;
for (int i = 0; i < keyscount; i++)
if (!OidIsValid(index_getprocid(index, i + 1, GIST_SORTSUPPORT_PROC)))
{ hasallsortsupports = false; break; }
if (hasallsortsupports)
buildstate.buildMode = GIST_SORTED_BUILD;
}
  1. Sorted build (GIST_SORTED_BUILD), chosen when every key column’s opclass provides GIST_SORTSUPPORT_PROC. The heap is scanned into a tuplesort (tuplesort_begin_index_gist), sorted by the opclass’s linearization (for points, typically a space-filling curve), then gist_indexsortbuild packs leaves bottom-up and builds each parent level from the level below — the B-tree-style bottom-up build. This is the fastest path and the modern default for point_ops and PostGIS.

  2. Buffering build (GIST_BUFFERING_ACTIVE), used for opclasses without sortsupport once the index outgrows effective_cache_size. gistbuild starts in plain insert mode, watches the index size in gistBuildCallback, and switches via gistInitBuffering when the tree no longer fits in cache. Buffers attached to internal nodes (GISTBuildBuffers) absorb incoming tuples; gistProcessEmptyingQueue flushes half-full buffers down a level at a time, converting random descents into cache-sized breadth-first batches (Arge et al.).

  3. Plain insert build, the fallback: gistBuildCallback calls gistdoinsert directly for each heap tuple, exactly like run-time inserts but with WAL skipped (is_build = true, pages stamped with GistBuildLSN).

// gistBuildCallback — src/backend/access/gist/gistbuild.c (per-heap-tuple)
itup = gistFormTuple(buildstate->giststate, index, values, isnull, true);
itup->t_tid = *tid;
if (buildstate->buildMode == GIST_BUFFERING_ACTIVE)
gistBufferingBuildInsert(buildstate, itup); // push into node buffers
else
gistdoinsert(index, itup, buildstate->freespace,
buildstate->giststate, buildstate->heaprel, true); // direct

Position hints (as of 2026-06-05, REL_18 273fe94)

Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”
SymbolFileLine
gisthandlersrc/backend/access/gist/gist.c59
gistinsertsrc/backend/access/gist/gist.c165
gistplacetopagesrc/backend/access/gist/gist.c230
gistdoinsertsrc/backend/access/gist/gist.c639
gistFindPathsrc/backend/access/gist/gist.c914
gistFindCorrectParentsrc/backend/access/gist/gist.c1027
gistformdownlinksrc/backend/access/gist/gist.c1140
gistfixsplitsrc/backend/access/gist/gist.c1200
gistinserttuplesrc/backend/access/gist/gist.c1260
gistSplitsrc/backend/access/gist/gist.c1466
initGISTstatesrc/backend/access/gist/gist.c1537
gistindex_keytestsrc/backend/access/gist/gistget.c125
gistScanPagesrc/backend/access/gist/gistget.c328
gistkillitemssrc/backend/access/gist/gistget.c38
gistgettuplesrc/backend/access/gist/gistget.c612
gistgetbitmapsrc/backend/access/gist/gistget.c745
gistcanreturnsrc/backend/access/gist/gistget.c797
gistunionsubkeysrc/backend/access/gist/gistsplit.c80
findDontCaressrc/backend/access/gist/gistsplit.c113
supportSecondarySplitsrc/backend/access/gist/gistsplit.c258
genericPickSplitsrc/backend/access/gist/gistsplit.c344
gistUserPicksplitsrc/backend/access/gist/gistsplit.c415
gistSplitHalfsrc/backend/access/gist/gistsplit.c585
gistSplitByKeysrc/backend/access/gist/gistsplit.c623
gistbuildsrc/backend/access/gist/gistbuild.c179
gist_indexsortbuildsrc/backend/access/gist/gistbuild.c400
gistInitBufferingsrc/backend/access/gist/gistbuild.c626
gistBuildCallbacksrc/backend/access/gist/gistbuild.c822
gistProcessItupsrc/backend/access/gist/gistbuild.c925
gistbufferinginserttuplessrc/backend/access/gist/gistbuild.c1056
gistchoosesrc/backend/access/gist/gistutil.c374
gistpenaltysrc/backend/access/gist/gistutil.c724
gistMakeUnionItVecsrc/backend/access/gist/gistutil.c155
gistdentryinitsrc/backend/access/gist/gistutil.c547
gistFetchTuplesrc/backend/access/gist/gistutil.c667
GIST_CONSISTENT_PROC..GISTNProcssrc/include/access/gist.h32–44
F_LEAF / F_DELETED / F_FOLLOW_RIGHTsrc/include/access/gist.h49–53
GistNSN / GistBuildLSNsrc/include/access/gist.h65 / 72
GistFollowRight / GistPageGetNSNsrc/include/access/gist.h185 / 189
GISTSTATE (support FmgrInfo cache)src/include/access/gist_private.h75

All claims below were checked against the REL_18_STABLE working tree at commit 273fe94.

  • Twelve support procedures, GISTNProcs == 12. Verified in src/include/access/gist.h: GIST_CONSISTENT_PROC (1) through GIST_SORTSUPPORT_PROC (11), GIST_OPTIONS_PROC (10), #define GISTNProcs 12. gisthandler sets amsupport = GISTNProcs.
  • amcanorder = false, amcanorderbyop = true. Verified verbatim in gisthandler — GiST cannot satisfy ORDER BY col but can satisfy ORDER BY col <-> q via the Distance proc.
  • Mandatory vs. optional procs. initGISTstate copies consistentFn, unionFn, penaltyFn, picksplitFn, equalFn unconditionally; compressFn, decompressFn, distanceFn, fetchFn are guarded by OidIsValid(index_getprocid(...)) and left InvalidOid when absent.
  • GistNSN is an XLogRecPtr. Verified typedef XLogRecPtr GistNSN; in gist.h. The split detector stack->parent->lsn < GistPageGetNSN(...) in gistdoinsert and the symmetric test in gistScanPage both compare a remembered parent LSN to the child NSN.
  • F_FOLLOW_RIGHT lifecycle. gistplacetopage calls GistMarkFollowRight on non-rightmost split fragments (step 1); gistfinishsplit/downlink insertion clears it and sets the NSN (step 2); gistdoinsert and gistScanPage both honor it; gistfixsplit repairs a page left flagged by a crash. All verified in gist.c/gistget.c.
  • Prune-before-split. gistplacetopage calls gistprunepage to reclaim GistPageHasGarbage LP_DEAD tuples before deciding it must split — verified in the split path.
  • GIST_MAX_SPLIT_PAGES guard. gistplacetopage raises elog(ERROR, "GiST page split into too many halves ...") — verified; a single split can legitimately produce more than two pages because keys are variable length.
  • Degenerate-PickSplit fallback. gistUserPicksplit falls back to genericPickSplit and emits a DEBUG1 “picksplit method … failed” when the opclass returns an empty side — verified in gistsplit.c.
  • Secondary split on multi-column. gistSplitByKey recurses to the next column on a degenerate split, using findDontCares and recomputing unions via gistunionsubkey — verified.
  • Three build strategies. gistbuild selects GIST_SORTED_BUILD iff every key column has GIST_SORTSUPPORT_PROC; otherwise buffering (GIST_BUFFERING_AUTO/ACTIVE via gistInitBuffering) or plain inserts through gistBuildCallbackgistdoinsert. Verified.
  • Predicate locking / SERIALIZABLE. gistScanPage calls PredicateLockPage, and gisthandler sets ampredlocks = true — verified.
  • k-NN via pairing heap. gistgettuple routes numberOfOrderBys > 0 to getNextNearest; gistScanPage pushes items into so->queue, a pairingheap, ordered by Distance-proc results. Verified.
  • No parallel scan/build. gisthandler sets amcanparallel = false and amcanbuildparallel = false on REL_18. Verified (distinguishes GiST from nbtree/BRIN which do support parallel builds).

Beyond PostgreSQL — Comparative Designs & Research Frontiers

Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”
  • GiST 1995 vs. the production implementation. Hellerstein, Naughton & Pfeffer’s paper specifies the interface (Consistent / Union / Penalty / PickSplit, plus the search and insert templates) but is silent on concurrency, recovery, and bulk loading — the original Berkeley prototype (Hellerstein & Aoki, the README notes) was a single-user “university project.” Everything that makes PostgreSQL’s GiST a production index — the NSN + right-link protocol, the two-step WAL-atomic split, the gistfixsplit lazy repair, the buffering build — is outside the 1995 paper. A focused note mapping each production mechanism back to the (absent) paper guarantee it had to invent would sharpen exactly what “generalize the B-tree” leaves unspecified. Paper: Hellerstein/Naughton/Pfeffer 1995 (the README’s foundational reference).

  • NSN concurrency vs. Lehman & Yao’s high key. GiST’s split-detection signal is Kornacker’s NSN, an explicit per-page sequence number, precisely because an overlap tree has no high key to “move right” against (postgres-nbtree.md uses the high-key test). The two protocols are isomorphic — one page latched at a time, chase the right-link on a concurrent split, repair the incomplete split lazily — but GiST pays an 8-byte NSN per page where nbtree reuses the separator it already stores. A side-by-side of _bt_moveright (key comparison) against gistScanPage’s parentlsn < GistPageGetNSN test is the cleanest way to see what “order-free Blink-tree” costs. Paper: Kornacker, “Access Methods for Next-Generation Database Systems” (the README’s concurrency reference).

  • R-tree heuristics live entirely in the opclass. Guttman’s quadratic / linear splits and the R*-tree’s margin-minimizing split + forced reinsertion are not in gistsplit.c — they are choices a box_ops / point_ops PickSplit could make. PostgreSQL’s in-core geometric PickSplit is a Guttman-style double-sort; the R*-tree’s forced reinsertion is not implemented (GiST never re-inserts entries from an overflowing page). An opclass that added R*-style reinsertion, or used a Hilbert/Z-order linearization for split, is an unrealized quality improvement the framework permits but does not provide. Paper: Beckmann, Kriegel, Schneider & Seeger 1990, R*-tree (dbms-papers/rstar-tree.md).

  • Buffering build vs. sorted build vs. STR bulk-load. PostgreSQL’s two fast builds bracket the bulk-load literature: the sorted build (GIST_SORTED_BUILD, gated on SortSupport) is the GiST analogue of bottom-up B-tree packing and, for a space-filling-curve linearization, approximates the Sort-Tile-Recursive (STR) bulk-load of spatial trees; the buffering build is Arge/Hinrichs/Vahrenhold/Vitter’s cache-aware algorithm for opclasses that cannot be linearized. The crossover point — when buffering beats plain inserts — is the effective_cache_size threshold that gistInitBuffering watches. Comparing GiST’s heuristic switch against STR’s one-shot tiling would frame when “sort once, pack bottom-up” wins over “buffer and batch.” Paper: Arge et al., “Efficient Bulk Operations on Dynamic R-trees” (the README’s build reference).

  • GiST vs. SP-GiST vs. BRIN — three answers to “no total order.” GiST is the balanced, overlapping-bounding-predicate answer; SP-GiST (postgres-spgist.md) is the space-partitioned, non-overlapping answer (quadtrees, k-d trees, radix tries) where partitions are disjoint so a point query follows a single path; BRIN (postgres-brin.md) is the lossy block-range summary answer that gives up the tree entirely for a tiny min/max-per-range map. The three share the “engine owns the skeleton, opclass owns the semantics” template but differ in whether the partition overlaps (GiST yes, SP-GiST no) and whether there is a tree at all (BRIN no). A unifying note on the extensible-index family is the natural parent doc to all three.

  • CUBRID has no GiST analogue. CUBRID ships a B+-tree and specialized spatial support but no general extensible-key index framework; an extension author who wants a range or full-text or trigram index in CUBRID has no “supply four C functions” path. Contrasting PostgreSQL’s opclass-pluggable AM against CUBRID’s fixed index set is the sharpest illustration of what the GiST abstraction buys an ecosystem. (See the CUBRID index analyses in the cubrid tree.)

In-tree READMEs and source files (REL_18_STABLE, commit 273fe94)

Section titled “In-tree READMEs and source files (REL_18_STABLE, commit 273fe94)”
  • src/backend/access/gist/README — the design document: the GiST interface (Consistent/Union/Compress/Decompress/Penalty/PickSplit/Equal/Distance/ Fetch), the findPath re-search pseudocode, the NSN + F_FOLLOW_RIGHT concurrency protocol, the incomplete-split repair, the buffering build, and the sorted build — plus the foundational paper citations quoted in section 6.
  • src/include/access/gist.hGIST_CONSISTENT_PROC..GIST_SORTSUPPORT_PROC and GISTNProcs, the F_LEAF/F_DELETED/F_FOLLOW_RIGHT page flags, GistNSN (typedef XLogRecPtr), GistBuildLSN, and the GistFollowRight/GistPageGetNSN/GistPageSetNSN accessors.
  • src/include/access/gist_private.hGISTSTATE (the per-statement support-FmgrInfo cache), GISTScanOpaque (the pairingheap queue and killedItems), GISTBuildBuffers, and the split/insert helper prototypes.
  • src/backend/access/gist/gist.cgisthandler, gistinsert, gistdoinsert, gistplacetopage, gistfinishsplit, gistfixsplit, gistFindCorrectParent, gistFindPath, gistformdownlink, gistinserttuple(s), gistSplit, and initGISTstate.
  • src/backend/access/gist/gistget.cgistindex_keytest (the Consistent gate), gistScanPage, gistgettuple, gistgetbitmap, getNextNearest (k-NN), gistkillitems, gistcanreturn.
  • src/backend/access/gist/gistsplit.cgistSplitByKey, gistUserPicksplit, genericPickSplit, findDontCares, supportSecondarySplit, gistunionsubkey, gistSplitHalf.
  • src/backend/access/gist/gistbuild.cgistbuild (strategy selection), gist_indexsortbuild, gistInitBuffering, gistBuildCallback, gistProcessItup, gistbufferinginserttuples.
  • src/backend/access/gist/gistutil.cgistchoose, gistpenalty, gistMakeUnionItVec, gistgetadjusted, gistFetchTuple, gistdentryinit.
  • Hellerstein, J. M., Naughton, J. F. & Pfeffer, A. (1995). “Generalized Search Trees for Database Systems.” Proc. VLDB 1995, 562-573. The GiST interface and search/insert templates; the README’s foundational reference.
  • Kornacker, M. “Access Methods for Next-Generation Database Systems.” The NSN + right-link concurrency protocol PostgreSQL implements; the README’s concurrency reference.
  • Arge, L., Hinrichs, K., Vahrenhold, J. & Vitter, J. S. “Efficient Bulk Operations on Dynamic R-trees.” The cache-aware buffering build; the README’s build reference.
  • Guttman, A. (1984). “R-Trees: A Dynamic Index Structure for Spatial Searching.” SIGMOD 1984, 47-57. The overlapping-MBR tree GiST generalizes; background in knowledge/research/dbms-papers/btree.md / rstar-tree.md.
  • Beckmann, N., Kriegel, H.-P., Schneider, R. & Seeger, B. (1990). “The R*-tree: An Efficient and Robust Access Method for Points and Rectangles.” SIGMOD 1990, 322-331. The split / forced-reinsertion heuristics a geometric opclass may implement; captured in knowledge/research/dbms-papers/rstar-tree.md.
  • Database Internals (Petrov 2019), ch. 2-7 — B-tree concurrency, WAL framing, and the bulk-loading discussion that frames the build strategies (knowledge/research/dbms-general/).

Sibling docs (cross-references — mechanism owned there, not duplicated here)

Section titled “Sibling docs (cross-references — mechanism owned there, not duplicated here)”
  • postgres-index-am.md — the IndexAmRoutine dispatch contract that gisthandler fills in (amgettuple/amgetbitmap/ambuild/aminsert).
  • postgres-spgist.md — the space-partitioned, non-overlapping cousin in the extensible-index family.
  • postgres-brin.md — the lossy block-range-summary answer to “no total order,” the third extensible AM.
  • postgres-nbtree.md — the Lehman & Yao high-key concurrency protocol whose order-free adaptation is GiST’s NSN.
  • postgres-buffer-manager.md — buffer pins and content-lock (LWLock) latching that GiST’s one-page-at-a-time descent is built on.
  • postgres-ssi-predicate-locking.md — the predicate locks that gistScanPage’s PredicateLockPage / CheckForSerializableConflictIn feed (ampredlocks = true).