PostgreSQL Hash Index — Linear Hashing, Buckets, and Overflow Pages
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
Section titled “Theoretical Background”A hash index answers exactly one question — “which rows have this key?” —
and it answers nothing else. Unlike a B-tree it cannot serve a range scan,
an inequality, an ORDER BY, or a prefix match, because hashing
deliberately destroys the ordering of keys: two adjacent values land in
arbitrarily distant buckets. What it buys in exchange is the textbook
promise of O(1) equality lookup. Database System Concepts (Silberschatz
7e, ch. 14 “Indexing”, captured in research/dbms-general/) frames the
ideal: a hash function h(K) maps a key to one of B buckets, and a
lookup computes h(K), goes straight to that bucket, and scans only the
handful of entries there — no root-to-leaf descent, no log_F N page
reads. The cost model is a single bucket access plus an overflow chain
walk when a bucket holds more entries than fit in one page.
The hard part of any on-disk hash table is not the lookup; it is growth.
A static hash table sized for N rows degrades catastrophically once it
holds 10N rows — every bucket becomes a long overflow chain and the O(1)
promise collapses to O(chain length). The naive fix, rehash everything
into a table twice the size, is fine in memory but ruinous on disk: it
rewrites the entire index in one stop-the-world operation, doubling the file
and blocking every concurrent reader and writer for the duration. The
discipline that solves this is dynamic hashing, and the specific variant
PostgreSQL adopts is linear hashing, due to Witold Litwin (1980) and
popularized for UNIX by the operative reference at the very top of
src/backend/access/hash/README: Margo Seltzer & Ozan Yigit, “A New
Hashing Package for UNIX”, USENIX Winter 1991 (catalogued in
postgres-paper-bibliography.md; the same lineage feeds PostgreSQL’s
in-memory dynahash.c).
Linear hashing’s insight is to grow the table one bucket at a time, and
to split buckets in a fixed round-robin order that is independent of which
bucket actually overflowed. Maintain two masks: a lowmask covering the
current “level” of 2^L buckets and a highmask = 2*lowmask + 1 covering
the next level. To address a key, first compute b = h(K) & highmask; if
b exceeds the highest bucket that currently exists, fall back to
b = h(K) & lowmask. A split pointer (PostgreSQL’s hashm_maxbucket
plays this role) marks how far the round-robin expansion has progressed.
When the table is judged too full, the bucket at the split pointer is
split — its entries are re-hashed under the wider highmask and partitioned
between the old bucket and a brand-new high-numbered bucket — and the
pointer advances by one. Crucially the bucket that splits is not the one
that overflowed; overflow pages absorb local hotspots, while the global
split pointer sweeps evenly so that, averaged over a full level, the whole
table doubles smoothly without ever rewriting it wholesale.
Two structural consequences organize the rest of this document:
-
Buckets need overflow pages. Because a split only happens when the global load factor crosses a threshold — not when a particular bucket fills — any individual bucket may temporarily receive more tuples than fit on one page. Those spill onto overflow pages chained off the primary page. Overflow is the pressure-relief valve that lets the split schedule stay round-robin rather than reactive.
-
The bucket-to-block mapping must be computable, not stored. A naive implementation would keep a directory array mapping bucket number to disk block. Linear hashing avoids the directory: because buckets are allocated in disciplined power-of-two splitpoint groups, the physical block of a bucket can be computed from the bucket number plus a small
spares[]array — which is exactly what PostgreSQL’sBUCKET_TO_BLKNOmacro does.
The companion theme is concurrency and crash safety. The textbook hash
table is single-threaded and in-memory; a production index must let many
backends read and insert concurrently while splits and garbage collection
run, and must survive a crash mid-split. PostgreSQL’s answer leans entirely
on the buffer manager’s three primitives — content locks, pins, and
cleanup locks (see postgres-buffer-manager.md) — rather than the
heavyweight lock manager, and decomposes every multi-page mutation into
single-page atomic WAL actions, exactly the discipline the B-tree uses
(postgres-nbtree.md) but adapted to the bucket-chain shape.
Common DBMS Design
Section titled “Common DBMS Design”Engineers building a disk hash index converge on a recurring set of conventions. Naming them first lets the PostgreSQL symbols read as choices within a shared design space.
Directory hashing vs. directoryless linear hashing
Section titled “Directory hashing vs. directoryless linear hashing”There are two dominant families of dynamic on-disk hashing:
- Extendible hashing (Fagin et al. 1979) keeps an explicit directory:
an array of
2^dpointers indexed by the topdbits of the hash. When a bucket overflows, only that bucket splits, and the directory either doubles (ifdmust grow) or simply re-points two slots. Lookups are always exactly two I/Os (directory + bucket), but the directory itself can become large and is a shared mutable structure. - Linear hashing (Litwin 1980; Seltzer-Yigit 1991) keeps no directory. Bucket addresses are computed arithmetically from the hash, the two masks, and the split pointer. Growth is round-robin rather than on-demand. This trades the occasional extra overflow-chain hop for the elimination of a centralized directory — a good trade when the directory would otherwise be a contention and memory hotspot.
PostgreSQL chose linear hashing without a directory, which is why the
metapage carries highmask, lowmask, maxbucket, and the spares[]
array instead of a pointer table.
flowchart TD
K["key K"] --> H["h = hash(K)<br/>32-bit hashcode"]
H --> HM["b = h & highmask"]
HM --> CMP{"b > maxbucket?"}
CMP -->|yes| LM["b = h & lowmask<br/>bucket not yet split"]
CMP -->|no| USE["use bucket b"]
LM --> USE
USE --> BLK["BUCKET_TO_BLKNO(metap, b)<br/>compute physical block"]
BLK --> PBP["primary bucket page"]
PBP --> OVF["overflow page chain<br/>(doubly linked)"]
The bucket chain: primary page plus overflow
Section titled “The bucket chain: primary page plus overflow”Almost every disk hash index represents a bucket as a linked list of
pages: one page that is permanently the bucket’s home, plus a chain of
overflow pages added as the bucket grows. PostgreSQL uses a doubly-linked
chain (forward hasho_nextblkno, backward hasho_prevblkno) so that
garbage collection can splice a freed overflow page out of the middle of the
chain by fixing its two neighbors. Entries within a page are kept sorted by
hash code so a page can be binary-searched, even though no ordering holds
across pages of a bucket.
Computed addressing via splitpoint groups
Section titled “Computed addressing via splitpoint groups”To avoid a directory while still letting buckets be allocated incrementally,
linear-hashing implementations allocate bucket pages in power-of-two
groups. PostgreSQL calls each group a splitpoint. Splitpoint s
contributes 2^(s-1) new buckets (so the table doubles per group), and to
avoid allocating a huge slab at once, groups >= 10 are broken into four
equal phases. A per-splitpoint array (spares[]) records how many
overflow pages were allocated before each splitpoint’s bucket pages, which
is the one piece of state needed to turn a bucket number into a block
number arithmetically.
Page-level locking instead of the lock manager
Section titled “Page-level locking instead of the lock manager”A B-tree and a hash index face the same concurrency problem and reach for
the same toolbox: short-lived buffer content locks for individual page
reads/writes, long-lived buffer pins to prevent a page from being
reorganized out from under an in-flight cursor, and cleanup locks (an
exclusive content lock held while observing that no one else has the page
pinned) to authorize a wholesale page or bucket reorganization. None of this
touches the heavyweight lock manager (postgres-lock-manager.md). The hash
AM’s specific convention: a cleanup lock on a primary bucket page = the
right to reorganize the entire bucket, and a scan holds a pin on the
primary bucket page for its entire lifetime.
Decompose multi-page mutations into atomic WAL actions
Section titled “Decompose multi-page mutations into atomic WAL actions”A split touches the metapage, the old bucket’s pages, and the new bucket’s
pages; an overflow free touches the doomed page, its two neighbors, a bitmap
page, and the metapage. No single WAL record can cover an unbounded number
of pages (XLR_MAX_BLOCK_ID caps it at 32). The shared discipline is to
break each logical operation into a sequence of single-record atomic
steps, each of which leaves the index in a valid, searchable state, and to
make the recovery rule “any intermediate state is legal” — so a crash
between steps is repaired lazily by the next insert, split, or VACUUM rather
than requiring an undo.
PostgreSQL’s Approach
Section titled “PostgreSQL’s Approach”PostgreSQL’s hash AM is src/backend/access/hash/. The handler
hashhandler() fills an IndexAmRoutine (the AM dispatch contract lives in
postgres-index-am.md); the interesting machinery is in five files:
hashpage.c (metapage, bucket allocation, splits), hashinsert.c
(insertion), hashsearch.c (scans), hashovfl.c (overflow page lifecycle
and squeeze), and hashutil.c (the addressing arithmetic). We walk the
design in the order a tuple experiences it: addressing, then search, then
insert, then split, then overflow and garbage collection.
Page types and the special space
Section titled “Page types and the special space”There are four physical page kinds, distinguished by hasho_flag bits in
the HashPageOpaqueData special space at the end of every page:
// HashPageOpaqueData — src/include/access/hash.htypedef struct HashPageOpaqueData{ BlockNumber hasho_prevblkno; /* prev page in chain, OR maxbucket-as-of-split */ BlockNumber hasho_nextblkno; /* next page in bucket chain */ Bucket hasho_bucket; /* bucket number this page belongs to */ uint16 hasho_flag; /* LH_* page type + transient flag bits */ uint16 hasho_page_id; /* 0xFF80, identifies hash pages to pg_filedump */} HashPageOpaqueData;
#define LH_OVERFLOW_PAGE (1 << 0)#define LH_BUCKET_PAGE (1 << 1)#define LH_BITMAP_PAGE (1 << 2)#define LH_META_PAGE (1 << 3)#define LH_BUCKET_BEING_POPULATED (1 << 4)#define LH_BUCKET_BEING_SPLIT (1 << 5)#define LH_BUCKET_NEEDS_SPLIT_CLEANUP (1 << 6)#define LH_PAGE_HAS_DEAD_TUPLES (1 << 7)The low four bits are the page type (block 0 is always LH_META_PAGE); the
high four are transient bucket-state flags used during split and cleanup.
Note the dual meaning of hasho_prevblkno: on an overflow page it is the
real previous block, but on a primary bucket page (where there is no
previous page) it instead stores the value of hashm_maxbucket as of the
bucket’s last split. That overload is the linchpin of the metapage cache
(below) — it costs nothing because a primary page’s “previous block” is
always invalid anyway.
The metapage and computed addressing
Section titled “The metapage and computed addressing”Block 0 is the metapage. It stores the table’s live geometry:
// HashMetaPageData (excerpt) — src/include/access/hash.htypedef struct HashMetaPageData{ double hashm_ntuples; /* tuples stored, drives split decision */ uint16 hashm_ffactor; /* target fill factor (tuples per bucket) */ uint32 hashm_maxbucket; /* ID of highest live bucket (= split pointer) */ uint32 hashm_highmask; /* mask into the whole current table */ uint32 hashm_lowmask; /* mask into the lower half */ uint32 hashm_ovflpoint; /* current splitpoint for overflow allocation */ uint32 hashm_firstfree; /* lowest-numbered free overflow bit */ uint32 hashm_nmaps; /* number of bitmap pages */ uint32 hashm_spares[HASH_MAX_SPLITPOINTS]; /* ovfl pages before each splitpoint */ BlockNumber hashm_mapp[HASH_MAX_BITMAPS]; /* blocks of the bitmap pages */} HashMetaPageData;The address of a key’s bucket is computed, never looked up in a directory.
Two functions do the arithmetic. _hash_hashkey2bucket applies the linear
hashing masks:
// _hash_hashkey2bucket — src/backend/access/hash/hashutil.cBucket_hash_hashkey2bucket(uint32 hashkey, uint32 maxbucket, uint32 highmask, uint32 lowmask){ Bucket bucket;
bucket = hashkey & highmask; if (bucket > maxbucket) bucket = bucket & lowmask; /* this bucket not yet split off */
return bucket;}Then BUCKET_TO_BLKNO turns a bucket number into a physical block by adding
the overflow pages that precede that bucket’s splitpoint group — the
spares[] entry is what makes this directoryless:
// BUCKET_TO_BLKNO — src/include/access/hash.h#define BUCKET_TO_BLKNO(metap,B) \ ((BlockNumber) ((B) + ((B) ? (metap)->hashm_spares[_hash_spareindex((B)+1)-1] : 0)) + 1)The + 1 accounts for the metapage at block 0; buckets 0 and 1 always live
at blocks 1 and 2. _hash_spareindex maps a bucket count to its global
splitpoint phase, and _hash_get_totalbuckets inverts it — together they
implement the “groups < 10 are single-phase, groups >= 10 are four-phase”
allocation schedule described in the README:
// _hash_spareindex — src/backend/access/hash/hashutil.cuint32_hash_spareindex(uint32 num_bucket){ uint32 splitpoint_group = pg_ceil_log2_32(num_bucket);
if (splitpoint_group < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) return splitpoint_group; /* ... groups >= 10 are split into 4 equal phases ... */ return splitpoint_phases;}Caching the metapage to avoid contention
Section titled “Caching the metapage to avoid contention”Both scanning and inserting must locate a bucket, which needs
maxbucket/highmask/lowmask. Locking the metapage on every operation
would make block 0 a brutal hotspot, so each backend keeps a copy of the
metapage in its relcache (rd_amcache) via _hash_getcachedmetap. The
danger is staleness: a concurrent split could have moved the target key into
a new bucket since the cache was filled. The validation trick uses the
overloaded hasho_prevblkno: after locking the candidate primary bucket
page, compare its stored maxbucket-as-of-last-split against the cached
maxbucket. If the page’s value is larger, the bucket has split since our
cache was made, so we drop the lock, refresh, and retry:
// _hash_getbucketbuf_from_hashkey — src/backend/access/hash/hashpage.cfor (;;){ bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket, metap->hashm_highmask, metap->hashm_lowmask); blkno = BUCKET_TO_BLKNO(metap, bucket); buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE); opaque = HashPageGetOpaque(BufferGetPage(buf));
/* If this bucket hasn't been split since our cache, we're done. */ if (opaque->hasho_prevblkno <= metap->hashm_maxbucket) break;
/* Stale: drop lock, force-refresh the cached metapage, and retry. */ _hash_relbuf(rel, buf); metap = _hash_getcachedmetap(rel, &metabuf, true);}The README justifies why retries are cheap: any given bucket can split only
a few dozen times ever (the total bucket count is bounded by 2^32),
whereas accesses are unbounded, so the steady state almost never retries.
Scanning a bucket
Section titled “Scanning a bucket”A hash scan supports only single-column equality (HTEqualStrategyNumber)
and rejects whole-index scans outright. _hash_first computes the hash key,
finds the primary bucket page, and pins it for the whole scan. The crucial
performance and concurrency move: a hash scan slurps all matching tuples
from a page into backend-local storage (so->currPos.items[]) in one shot,
then releases the page content lock and processes heap TIDs while holding no
index lock — only the bucket-page pin. This is why concurrent inserts can
proceed on the same page without the scan having to re-find its position.
flowchart TD
START["_hash_first(scan, dir)"] --> HK["hashkey = hash(scankey)"]
HK --> GBB["_hash_getbucketbuf_from_hashkey<br/>pin + lock primary bucket page"]
GBB --> POP{"bucket being<br/>populated by split?"}
POP -->|yes| OLD["also pin old (split) bucket<br/>set hashso_buc_populated"]
POP -->|no| RP["_hash_readpage"]
OLD --> RP
RP --> BIN["_hash_binsearch(page, hashkey)<br/>locate first match"]
BIN --> LOAD["_hash_load_qualified_items<br/>copy matches to currPos.items[]"]
LOAD --> REL["release content lock<br/>(retain pin on primary bucket)"]
REL --> NEXT{"more items<br/>in currPos?"}
NEXT -->|yes| RET["return xs_heaptid"]
NEXT -->|no, fwd| STEP["_hash_next: step to<br/>hasho_nextblkno overflow page"]
STEP --> RP
When a scan starts on a bucket that is being populated by a split, it must
also scan the old bucket it was split from, while skipping tuples already
copied (marked INDEX_MOVED_BY_SPLIT). _hash_load_qualified_items enforces
both that skip and the dead-tuple skip:
// _hash_load_qualified_items (forward) — src/backend/access/hash/hashsearch.cif ((so->hashso_buc_populated && !so->hashso_buc_split && (itup->t_info & INDEX_MOVED_BY_SPLIT_MASK)) || (scan->ignore_killed_tuples && (ItemIdIsDead(PageGetItemId(page, offnum))))){ offnum = OffsetNumberNext(offnum); /* skip moved-by-split / dead */ continue;}if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) && _hash_checkqual(scan, itup)){ _hash_saveitem(so, itemIndex, offnum, itup); /* qualified match */ itemIndex++;}else break; /* page is hash-sorted, so first miss ends the run */Because entries are hash-sorted within a page, the binary search jumps to the first possible match and the loop stops at the first non-match — no full page scan.
Inserting a tuple
Section titled “Inserting a tuple”_hash_doinsert extracts the precomputed hash key from the index tuple,
locates and write-locks the primary bucket page, then walks the bucket chain
looking for a page with room. Three notable behaviors: (1) if the bucket is
being split, it tries to finish the split first, because that may free
space and avoid a needless overflow page; (2) if a full page has dead
(LP_DEAD-hinted) tuples, it vacuums that one page before resorting to
overflow; (3) only after placing the tuple does it lock the metapage to bump
the tuple count and decide whether a split is now due.
// _hash_doinsert (chain walk) — src/backend/access/hash/hashinsert.cwhile (PageGetFreeSpace(page) < itemsz){ BlockNumber nextblkno;
/* try reclaiming dead tuples on this page before overflowing */ if (H_HAS_DEAD_TUPLES(pageopaque) && IsBufferCleanupOK(buf)) { _hash_vacuum_one_page(rel, heapRel, metabuf, buf); if (PageGetFreeSpace(page) >= itemsz) break; }
nextblkno = pageopaque->hasho_nextblkno; if (BlockNumberIsValid(nextblkno)) { /* step to existing overflow page (drop lock; keep pin on primary) */ if (buf != bucket_buf) _hash_relbuf(rel, buf); else LockBuffer(buf, BUFFER_LOCK_UNLOCK); buf = _hash_getbuf(rel, nextblkno, HASH_WRITE, LH_OVERFLOW_PAGE); } else { /* end of chain: allocate and link a fresh overflow page */ LockBuffer(buf, BUFFER_LOCK_UNLOCK); buf = _hash_addovflpage(rel, metabuf, buf, (buf == bucket_buf)); } page = BufferGetPage(buf); pageopaque = HashPageGetOpaque(page);}The actual placement keeps the page hash-sorted via _hash_pgaddtup (binary
search for the insertion point, unless the caller asserts append order). The
count bump and split decision are made under the metapage exclusive lock,
inside a critical section, and the whole thing is one WAL record covering the
target page and the metapage:
// _hash_doinsert (commit + split decision) — src/backend/access/hash/hashinsert.cLockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);START_CRIT_SECTION();itup_off = _hash_pgaddtup(rel, buf, itemsz, itup, sorted);MarkBufferDirty(buf);metap = HashPageGetMeta(metapage);metap->hashm_ntuples += 1;do_expand = metap->hashm_ntuples > (double) metap->hashm_ffactor * (metap->hashm_maxbucket + 1);MarkBufferDirty(metabuf);/* ... XLOG_HASH_INSERT registers buf (block 0) and metabuf (block 1) ... */END_CRIT_SECTION();LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);_hash_relbuf(rel, buf);if (do_expand) _hash_expandtable(rel, metabuf); /* may split one bucket */The split test — ntuples > ffactor * (maxbucket + 1) — is the global load
factor, kept deliberately in sync between _hash_doinsert and
_hash_expandtable. It triggers on the whole table being overfull, not on
the current bucket, which is the round-robin discipline of linear hashing.
Splitting a bucket incrementally
Section titled “Splitting a bucket incrementally”_hash_expandtable is the heart of growth. It locks the metapage, rechecks
that a split is still wanted, then picks the bucket to split using the linear
hashing rule: the new bucket is maxbucket + 1, and the old bucket it
splits from is new_bucket & lowmask. It takes a conditional cleanup lock
on the old bucket — conditional because it must not block while holding the
metapage lock; if it can’t get the lock it simply gives up, leaving the index
overfull but correct, and the next inserter retries.
// _hash_expandtable (bucket selection) — src/backend/access/hash/hashpage.cnew_bucket = metap->hashm_maxbucket + 1;old_bucket = (new_bucket & metap->hashm_lowmask);start_oblkno = BUCKET_TO_BLKNO(metap, old_bucket);
buf_oblkno = _hash_getbuf_with_condlock_cleanup(rel, start_oblkno, LH_BUCKET_PAGE);if (!buf_oblkno) goto fail; /* couldn't get cleanup lock; abandon split, stay overfull */If the old bucket still has a pending split (LH_BUCKET_BEING_SPLIT) or
leftover split-cleanup work (LH_BUCKET_NEEDS_SPLIT_CLEANUP), it finishes
that first and restarts — a bucket can never be in two splits at once. Once
clear, it may need to allocate a whole new splitpoint’s worth of bucket pages
(_hash_alloc_buckets, which just extends the file’s logical EOF by writing
a single zero page at the end of the range, relying on filesystem holes for
the rest). Then, inside a critical section, it updates the metapage masks and
flags both buckets as in-progress, all under one
XLOG_HASH_SPLIT_ALLOCATE_PAGE record:
// _hash_expandtable (mask + flag update) — src/backend/access/hash/hashpage.cSTART_CRIT_SECTION();metap->hashm_maxbucket = new_bucket;if (new_bucket > metap->hashm_highmask){ /* a full doubling completed: widen the masks one bit */ metap->hashm_lowmask = metap->hashm_highmask; metap->hashm_highmask = new_bucket | metap->hashm_lowmask;}/* old bucket: mark being-split AND stamp maxbucket into hasho_prevblkno */oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;oopaque->hasho_prevblkno = maxbucket;/* new bucket: fresh primary page, marked being-populated */nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;The mask-widening only fires at a power-of-two boundary; between boundaries
the table grows one bucket per split while the masks stay fixed — that is
linear hashing’s smooth incremental doubling. _hash_splitbucket then walks
the old bucket’s chain, and for each tuple recomputes its bucket under the
new masks. Tuples destined for the new bucket are copied (not moved in
place), stamped INDEX_MOVED_BY_SPLIT_MASK, and added to the new bucket,
allocating overflow pages there as needed:
// _hash_splitbucket (partition loop) — src/backend/access/hash/hashpage.cbucket = _hash_hashkey2bucket(_hash_get_indextuple_hashkey(itup), maxbucket, highmask, lowmask);if (bucket == nbucket){ new_itup = CopyIndexTuple(itup); new_itup->t_info |= INDEX_MOVED_BY_SPLIT_MASK; /* scans skip in old bucket */ /* buffer up; flush to new page (and chain overflow) when full */ itups[nitups++] = new_itup;}else Assert(bucket == obucket); /* stays put */Tuples that stay in the old bucket are left in place for now — they are
removed lazily later by the split-cleanup step. When the chain is exhausted,
the routine re-locks both primary pages (old first, to respect the
lower-bucket-first lock order), clears the in-progress flags, and sets
LH_BUCKET_NEEDS_SPLIT_CLEANUP on the old bucket under an
XLOG_HASH_SPLIT_COMPLETE record:
// _hash_splitbucket (completion) — src/backend/access/hash/hashpage.coopaque->hasho_flag &= ~LH_BUCKET_BEING_SPLIT;nopaque->hasho_flag &= ~LH_BUCKET_BEING_POPULATED;oopaque->hasho_flag |= LH_BUCKET_NEEDS_SPLIT_CLEANUP; /* old copies still present */The two-phase “copy now under moved-by-split, delete later under needs-cleanup” design is what makes a split survivable across a crash and invisible to concurrent scans: a scan that began before the split sees the old bucket’s originals; a scan that began during it sees the new bucket plus the old bucket’s not-yet-moved tuples; and a crash mid-split leaves both flags set, so the reader algorithm transparently scans both halves until the next insert finishes the job.
flowchart TD
INS["insert pushes ntuples ><br/>ffactor * (maxbucket+1)"] --> EXP["_hash_expandtable"]
EXP --> META["lock metapage<br/>recheck split needed"]
META --> PICK["new = maxbucket+1<br/>old = new & lowmask"]
PICK --> CL{"cond. cleanup lock<br/>on old bucket?"}
CL -->|no| GIVEUP["give up<br/>stay overfull, retry later"]
CL -->|yes| PEND{"old has pending<br/>split / cleanup?"}
PEND -->|yes| FIN["finish it, restart_expand"]
PEND -->|no| ALLOC["_hash_alloc_buckets if<br/>new splitpoint group"]
ALLOC --> FLAG["crit section: widen masks,<br/>flag old=BEING_SPLIT new=BEING_POPULATED<br/>WAL: SPLIT_ALLOCATE_PAGE"]
FLAG --> PART["_hash_splitbucket:<br/>copy nbucket tuples,<br/>mark MOVED_BY_SPLIT<br/>WAL: SPLIT_PAGE per full page"]
PART --> DONE["clear flags,<br/>set NEEDS_SPLIT_CLEANUP<br/>WAL: SPLIT_COMPLETE"]
DONE --> CLEAN["hashbucketcleanup:<br/>delete moved originals from old"]
Overflow pages: allocation, freeing, and the free-space bitmap
Section titled “Overflow pages: allocation, freeing, and the free-space bitmap”Overflow pages are tracked by bitmap pages — dedicated pages whose payload
is a giant bit array, one bit per overflow page (0 = free, 1 = in use).
_hash_addovflpage first searches the bitmaps starting at hashm_firstfree
for a recyclable page; only if none is free does it extend the index.
Allocation and chaining are a single XLOG_HASH_ADD_OVFL_PAGE record so a
crash cannot leak a half-linked page:
// _hash_addovflpage (recycle vs. extend) — src/backend/access/hash/hashovfl.cfor (; bit <= last_inpage; j++, bit += BITS_PER_MAP){ if (freep[j] != ALL_SET) /* a free bit in this bitmap word */ { bit += _hash_firstfreebit(freep[j]); blkno = bitno_to_blkno(metap, bit + (i << BMPG_SHIFT(metap))); ovflbuf = _hash_getinitbuf(rel, blkno); /* recycle this page */ goto found; }}/* ... fell through: extend the relation for a new overflow page ... */Freeing is the inverse and is atomic with moving the page’s tuples
elsewhere — _hash_freeovflpage splices the doomed page out of the
doubly-linked chain (fixing both neighbors), clears its bitmap bit, lowers
hashm_firstfree if appropriate, and adds the relocated tuples to the write
page, all under one XLOG_HASH_SQUEEZE_PAGE record. The README is explicit
that combining the move and the delink into one action is what prevents a
standby from briefly showing a tuple twice:
// _hash_freeovflpage (delink) — src/backend/access/hash/hashovfl.cif (BufferIsValid(prevbuf)) prevopaque->hasho_nextblkno = nextblkno; /* fore sibling skips us */if (BufferIsValid(nextbuf)) nextopaque->hasho_prevblkno = prevblkno; /* aft sibling skips us */CLRBIT(freep, bitmapbit); /* mark page free in bitmap */if (ovflbitno < metap->hashm_firstfree) metap->hashm_firstfree = ovflbitno;VACUUM (hashbulkdelete → hashbucketcleanup) removes dead and
moved-by-split tuples page by page under cleanup locks chained down the
bucket, then _hash_squeezebucket compacts the bucket: it works a read
pointer backward from the chain’s tail and a write pointer forward from the
primary page, moving tuples onto earlier pages and freeing emptied overflow
pages, until the two pointers meet. This reclaims overflow pages into the
bitmap free pool — the only mechanism (short of REINDEX) by which a hash
index gives space back, since the bucket count itself never shrinks.
Bucket-level locking and the scan/VACUUM interlock
Section titled “Bucket-level locking and the scan/VACUUM interlock”The locking contract is small but strict. To avoid deadlock between any two operations that must touch two buckets, the lower-numbered bucket is always locked first, and the metapage is locked only after all bucket locks. A scan keeps a pin (not a lock) on its primary bucket page for its whole life, which blocks VACUUM and split (both of which need a cleanup lock, i.e. the sole pin) from reorganizing that bucket while the scan is parked in it. The one residual hazard — a scan that starts after VACUUM released a bucket’s cleanup lock but then overtakes the still-running cleanup, and wrongly kills a TID that VACUUM is about to reuse — is closed by lock chaining: cleanup locks the next page in the chain before releasing the current one, so a scan can never leapfrog the cleanup frontier.
Source Walkthrough
Section titled “Source Walkthrough”The hash AM lives entirely under src/backend/access/hash/. Symbols are
grouped by call-flow below; the position-hint table at the end pairs each
with its (file, line) as observed on 2026-06-05.
AM dispatch surface (hash.c)
Section titled “AM dispatch surface (hash.c)”hashhandler— fills theIndexAmRoutineadvertised to the index AM layer (postgres-index-am.md). Declares the AM as supporting only equality (amsearcharray = false, one strategy), not unique, not ordered.hashbuild— bulk-build entry point; uses a spool when the index is large enough to sort by hash code first (hashsort.c, out of scope here).hashinsert— per-tuple public entry; computes the index tuple and calls_hash_doinsert.hashgettuple— scan entry; drives_hash_first/_hash_nextand manages LP_DEAD kill-item hinting.hashbulkdelete/hashbucketcleanup— VACUUM’s per-bucket cleanup driver, removing dead and moved-by-split tuples and clearingLH_BUCKET_NEEDS_SPLIT_CLEANUP, then squeezing.
Addressing arithmetic (hashutil.c)
Section titled “Addressing arithmetic (hashutil.c)”_hash_datum2hashkey— call the opclass hash proc to get the 32-bit hash code for a key (the cached, non-cross-type fast path)._hash_hashkey2bucket— applyhighmask/lowmask/maxbucketto map a hash code to a bucket number._hash_spareindex— map a bucket count to its global splitpoint phase (single-phase for groups < 10, four-phase for groups >= 10)._hash_get_totalbuckets— inverse: total buckets allocated through a given splitpoint phase._hash_get_oldblock_from_newbucket/_hash_get_newblock_from_oldbucket— translate between a split’s two bucket numbers, used by scans and split completion.BUCKET_TO_BLKNO(macro,hash.h) — bucket number to physical block.
Metapage, allocation, and split (hashpage.c)
Section titled “Metapage, allocation, and split (hashpage.c)”_hash_init/_hash_init_metabuffer— create the metapage, the initialNbucket pages, and the first bitmap page at CREATE INDEX._hash_getbuf/_hash_getnewbuf/_hash_getinitbuf/_hash_getbuf_with_condlock_cleanup— typed buffer accessors that run_hash_checkpageagainst the allowed page-type mask._hash_getcachedmetap— return (and lazily refresh) the relcache copy of the metapage inrd_amcache._hash_getbucketbuf_from_hashkey— the cache-validated lookup loop that yields a locked primary bucket page, retrying on stale-cache detection._hash_expandtable— the split driver: pick the bucket, take cleanup locks, widen masks, allocate, and flag in-progress._hash_alloc_buckets— extend the file’s logical EOF for a new splitpoint group by writing one zero page at the range end._hash_splitbucket— re-partition the old bucket’s tuples between old and new, marking moved tuplesINDEX_MOVED_BY_SPLIT._hash_finish_split— complete a split interrupted by a crash or a failed cleanup-lock acquisition, using a TID hash table to skip already-moved tuples.
Insertion (hashinsert.c)
Section titled “Insertion (hashinsert.c)”_hash_doinsert— locate bucket, walk the chain for free space (adding overflow if needed), place the tuple, bump the metapage count, and trigger a split if overfull._hash_pgaddtup/_hash_pgaddmultitup— add one / many tuples to a page preserving hash-code order (binary search for the slot)._hash_vacuum_one_page— opportunistically delete LP_DEAD tuples from a single full page before resorting to an overflow page.
Scanning (hashsearch.c)
Section titled “Scanning (hashsearch.c)”_hash_first— start a scan: hash the key, pin the primary bucket page, handle the being-populated case, position via binary search._hash_readpage— load all qualifying tuples on a page intoso->currPos.items[], then release the content lock (keep the pin)._hash_load_qualified_items— the per-page match loop, skipping moved-by-split and dead tuples._hash_next/_hash_readnext/_hash_readprev— advance to the next/previous page in the chain, crossing into the split partner bucket when a split was in progress at scan start.
Overflow lifecycle (hashovfl.c)
Section titled “Overflow lifecycle (hashovfl.c)”_hash_addovflpage— recycle a free overflow page from the bitmap, or extend the index; chain it to the bucket tail atomically._hash_freeovflpage— delink an overflow page, clear its bitmap bit, and move its tuples to a write page, all in one WAL action._hash_squeezebucket— compact a bucket by moving tuples toward the front and freeing emptied overflow pages._hash_initbitmapbuffer— initialize a bitmap page with all bits set (“in use”).bitno_to_blkno/_hash_ovflblkno_to_bitno— convert between an overflow page’s bitmap bit number and its physical block number.
Position hints (as of 2026-06-05, REL_18 273fe94)
Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”| Symbol | File | Line |
|---|---|---|
hashhandler | src/backend/access/hash/hash.c | 58 |
hashbuild | src/backend/access/hash/hash.c | 122 |
hashinsert | src/backend/access/hash/hash.c | 258 |
hashgettuple | src/backend/access/hash/hash.c | 290 |
hashbulkdelete | src/backend/access/hash/hash.c | 465 |
hashbucketcleanup | src/backend/access/hash/hash.c | 690 |
_hash_datum2hashkey | src/backend/access/hash/hashutil.c | 82 |
_hash_hashkey2bucket | src/backend/access/hash/hashutil.c | 125 |
_hash_spareindex | src/backend/access/hash/hashutil.c | 142 |
_hash_get_totalbuckets | src/backend/access/hash/hashutil.c | 174 |
_hash_get_oldblock_from_newbucket | src/backend/access/hash/hashutil.c | 422 |
_hash_get_newblock_from_oldbucket | src/backend/access/hash/hashutil.c | 461 |
_hash_init | src/backend/access/hash/hashpage.c | 327 |
_hash_init_metabuffer | src/backend/access/hash/hashpage.c | 498 |
_hash_getbuf | src/backend/access/hash/hashpage.c | 70 |
_hash_getbuf_with_condlock_cleanup | src/backend/access/hash/hashpage.c | 96 |
_hash_getnewbuf | src/backend/access/hash/hashpage.c | 198 |
_hash_expandtable | src/backend/access/hash/hashpage.c | 614 |
_hash_alloc_buckets | src/backend/access/hash/hashpage.c | 992 |
_hash_splitbucket | src/backend/access/hash/hashpage.c | 1073 |
_hash_finish_split | src/backend/access/hash/hashpage.c | 1356 |
_hash_getcachedmetap | src/backend/access/hash/hashpage.c | 1501 |
_hash_getbucketbuf_from_hashkey | src/backend/access/hash/hashpage.c | 1559 |
_hash_doinsert | src/backend/access/hash/hashinsert.c | 38 |
_hash_pgaddtup | src/backend/access/hash/hashinsert.c | 274 |
_hash_pgaddmultitup | src/backend/access/hash/hashinsert.c | 331 |
_hash_vacuum_one_page | src/backend/access/hash/hashinsert.c | 370 |
_hash_next | src/backend/access/hash/hashsearch.c | 48 |
_hash_readnext | src/backend/access/hash/hashsearch.c | 131 |
_hash_readprev | src/backend/access/hash/hashsearch.c | 197 |
_hash_first | src/backend/access/hash/hashsearch.c | 288 |
_hash_readpage | src/backend/access/hash/hashsearch.c | 448 |
_hash_load_qualified_items | src/backend/access/hash/hashsearch.c | 604 |
bitno_to_blkno | src/backend/access/hash/hashovfl.c | 35 |
_hash_ovflblkno_to_bitno | src/backend/access/hash/hashovfl.c | 62 |
_hash_addovflpage | src/backend/access/hash/hashovfl.c | 112 |
_hash_freeovflpage | src/backend/access/hash/hashovfl.c | 490 |
_hash_initbitmapbuffer | src/backend/access/hash/hashovfl.c | 777 |
_hash_squeezebucket | src/backend/access/hash/hashovfl.c | 842 |
HashMetaPageData | src/include/access/hash.h | 244 |
HashPageOpaqueData | src/include/access/hash.h | 77 |
BUCKET_TO_BLKNO | src/include/access/hash.h | 39 |
Source verification (as of 2026-06-05)
Section titled “Source verification (as of 2026-06-05)”All claims below were checked against the REL_18 tree at
/data/hgryoo/references/postgres (commit 273fe94).
- Page types and flags. The four physical page types
(
LH_OVERFLOW_PAGE,LH_BUCKET_PAGE,LH_BITMAP_PAGE,LH_META_PAGE) and the four transient flags (LH_BUCKET_BEING_POPULATED,LH_BUCKET_BEING_SPLIT,LH_BUCKET_NEEDS_SPLIT_CLEANUP,LH_PAGE_HAS_DEAD_TUPLES) are defined inhash.hlines 54–61, exactly as quoted.HASHO_PAGE_IDis0xFF80(line 101). Verified. hasho_prevblknooverload. The header comment (lines 67–72) confirms that on a primary bucket pagehasho_prevblknostores “thehashm_maxbucketvalue as of the last time the bucket was last split, or else as of the time the bucket was created”, used for cache validation._hash_initbufsets it tomax_bucket(line 176) and_hash_expandtableupdates it tomaxbucketon split (oopaque->hasho_prevblkno = maxbucket). Verified.- Metapage cache validation.
_hash_getbucketbuf_from_hashkeybreaks the loop whenopaque->hasho_prevblkno <= metap->hashm_maxbucketand otherwise force-refreshes via_hash_getcachedmetap(rel, &metabuf, true). Verified against the quoted excerpt. - Linear hashing mask logic.
_hash_hashkey2bucketcomputesbucket = hashkey & highmask; if (bucket > maxbucket) bucket &= lowmask;verbatim. The mask-widening on a doubling boundary in_hash_expandtable(lowmask = highmask; highmask = new_bucket | lowmask;) is present. Verified. - Split selection.
new_bucket = maxbucket + 1; old_bucket = new_bucket & lowmask;confirmed in_hash_expandtable. The conditional cleanup lock (_hash_getbuf_with_condlock_cleanup) and the “give up on failure” path (goto fail) are as described. Verified. - moved-by-split marking.
_hash_splitbucketcopies qualifying tuples withCopyIndexTupleand ORs inINDEX_MOVED_BY_SPLIT_MASK(=INDEX_AM_RESERVED_BIT,hash.hline 293), and_hash_load_qualified_itemsskips such tuples whenhashso_buc_populated && !hashso_buc_split. Verified. - Split decision formula. Both
_hash_doinsertand_hash_expandtableusentuples > ffactor * (maxbucket + 1), with explicit “keep in sync” comments at both sites. Verified. - WAL op codes.
XLOG_HASH_INIT_META_PAGE(0x00),INIT_BITMAP_PAGE,INSERT,ADD_OVFL_PAGE,SPLIT_ALLOCATE_PAGE,SPLIT_PAGE,SPLIT_COMPLETE,MOVE_PAGE_CONTENTS,SQUEEZE_PAGE,DELETE,SPLIT_CLEANUP,UPDATE_META_PAGE,VACUUM_ONE_PAGEare all defined inhash_xlog.h(lines 27–45). EachXLogInsert(RM_HASH_ID, ...)call cited matches its operation. Verified. - Overflow free-space bitmap.
_hash_addovflpagesearches bitmaps fromhashm_firstfree, recycles a page if afreep[j] != ALL_SETword is found (else extends), and_hash_freeovflpagedoesCLRBIT(freep, bitmapbit)and lowershashm_firstfree. The atomic move+delink underXLOG_HASH_SQUEEZE_PAGEis confirmed. Verified. - Scan locking.
_hash_firstkeepsso->hashso_bucket_bufpinned;_hash_readpagereleases the content lock but retains the primary-bucket pin (LockBuffer(... BUFFER_LOCK_UNLOCK)whencurrPos.buf == hashso_bucket_buf). The whole-page slurp intocurrPos.items[]is present. Verified. - Whole-index scan rejection.
_hash_firstraisesERRCODE_FEATURE_NOT_SUPPORTED“hash indexes do not support whole-index scans” whenscan->numberOfKeys < 1. Verified. - One caveat. The doc says VACUUM “gives space back to the bitmap free pool but never to the OS”; the README (lines 31–34) confirms a hash index is never shrunk except by REINDEX, and overflow pages are recycled but never returned to the filesystem. Verified.
- WAL and crash recovery. Hash indexes have been fully WAL-logged since
PostgreSQL 10; prior to that they were not crash-safe. The
RM_HASH_IDresource manager (hash_xlog.c) replays each of the op codes above, and the README’s “Lock definitions” and “Pseudocode” sections describe the recovery invariant that any intermediate split / squeeze state is a legal index. Verified againsthash_xlog.hand theSTART_CRIT_SECTION/XLogInsertcall sites quoted above.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”-
Extendible hashing (Fagin et al. 1979) vs. PostgreSQL’s directoryless linear hashing. The dominant alternative keeps a
2^ddirectory indexed by the topdhash bits, guaranteeing exactly two I/Os per probe and splitting only the bucket that overflowed. PostgreSQL deliberately rejected the directory —BUCKET_TO_BLKNOcomputes the block fromspares[]instead — trading the occasional extra overflow-chain hop for the elimination of a shared, mutable, memory-resident directory that would itself be a contention hotspot. A focused note mapping eachhashm_*metapage field to its extendible-hashing analogue (directory vs. masks, local-depth vs.hasho_prevblkno-as-maxbucket) would sharpen the trade. Paper: Fagin, Nievergelt, Pippenger & Strong, “Extendible Hashing”, ACM TODS 4(3), 1979. -
Litwin’s original linear hashing vs. Seltzer-Yigit’s UNIX package. The README credits Seltzer & Yigit 1991 (
dbms-papers/bibliography) as the direct lineage, which is itself an engineering refinement of Litwin’s 1980 linear hashing. The PostgreSQLspares[]/splitpoint-phase scheme (groups< 10single-phase, groups>= 10four-phase) is a disk-allocation optimization absent from the textbook algorithm — it avoids extending the file by a full power-of-two slab at once. Comparing_hash_spareindexagainst Litwin’s flat group numbering is a self-contained exercise in how a paper algorithm hardens into production storage code. Paper: Litwin, “Linear Hashing: A New Tool for File and Table Addressing”, VLDB 1980. -
Hash vs. B-tree as the default equality index. Until WAL logging arrived in PG 10, hash indexes were not crash-safe and the documentation actively steered users to B-tree even for equality. Even now a B-tree on a single column serves equality in
O(log_F N)with full crash safety, ordering, and uniqueness, so the hash AM wins only on very large keys (where storing the 4-byte hash code beats storing the full key) and high-cardinality equality workloads. A benchmark note pittingpostgres-nbtree.md’s_bt_searchagainst_hash_firston a wide-text equality column would quantify the crossover. (Seepostgres-nbtree.md.) -
In-memory dynamic hashing —
dynahash.cshares the lineage. PostgreSQL’s own in-memory hash tables (src/backend/utils/hash/dynahash.c, behind the buffer mapping table, the relcache, the lock manager’sLOCKhash) descend from the same Seltzer-Yigit package, but expand by directory segments rather than disk splitpoints and never persist. Contrasting the on-diskhashm_spares[]addressing withdynahash’s segmented directory shows two forks of one algorithm specialized for disk vs. RAM. -
Modern concurrent / latch-free hash indexes. Research engines push past page-content-lock linear hashing toward latch-free split-ordered lists (Shalev-Shavit 2006) and cache-conscious extendible hashing (e.g. level hashing, CCEH for persistent memory). Positioning PostgreSQL’s cleanup-lock-per-bucket, pin-for-scan-lifetime protocol against these would frame what a disk-oriented, buffer-shared engine gives up for crash safety and replica consistency. The hash AM’s “one atomic WAL action per page mutation, any intermediate state is searchable” discipline is precisely the property latch-free in-memory designs do not need and cannot easily provide.
-
CUBRID has no hash index AM. CUBRID’s persistent indexing is B+-tree only; hashing appears in its query executor (hash join, hash aggregation) and in in-memory structures, not as a secondary index access method. A note on why a DBMS might ship a B-tree-only index catalog — uniformity of the optimizer’s cost model, one recovery protocol to maintain — is the natural companion to this file. (See the CUBRID code-analysis tree.)
Sources
Section titled “Sources”In-tree READMEs and source files (REL_18_STABLE, commit 273fe94)
Section titled “In-tree READMEs and source files (REL_18_STABLE, commit 273fe94)”src/backend/access/hash/README— the design document: linear-hashing basics and the Seltzer-Yigit lineage, the splitpoint/spares[]addressing scheme, lock definitions (page content locks, pins, cleanup locks), the metapage-cache validation viahasho_prevblkno, incremental split protocol, overflow-page free-space bitmaps, the squeeze/cleanup VACUUM path, and the WAL crash-safety invariants.src/include/access/hash.h—HashPageOpaqueData,HashMetaPageData, theLH_*page-type and transient flag bits,BUCKET_TO_BLKNO,INDEX_MOVED_BY_SPLIT_MASK,HASHO_PAGE_ID, splitpoint/bitmap constants.src/include/access/hash_xlog.h— theXLOG_HASH_*op codes and per-record payload structs replayed byRM_HASH_ID.src/backend/access/hash/hash.c—hashhandler, the AM callbacks (hashinsert,hashgettuple,hashbulkdelete), andhashbucketcleanup.src/backend/access/hash/hashpage.c— metapage init, typed buffer accessors,_hash_getcachedmetap,_hash_getbucketbuf_from_hashkey,_hash_expandtable,_hash_alloc_buckets,_hash_splitbucket,_hash_finish_split.src/backend/access/hash/hashinsert.c—_hash_doinsert,_hash_pgaddtup,_hash_vacuum_one_page.src/backend/access/hash/hashsearch.c—_hash_first,_hash_readpage,_hash_load_qualified_items,_hash_next/_hash_readnext/_hash_readprev.src/backend/access/hash/hashovfl.c—_hash_addovflpage,_hash_freeovflpage,_hash_squeezebucket, the bitmap helpers.src/backend/access/hash/hashutil.c—_hash_hashkey2bucket,_hash_spareindex,_hash_get_totalbuckets, the old/new bucket translators.
Papers and textbook chapters
Section titled “Papers and textbook chapters”- Seltzer, M. & Yigit, O. (1991). “A New Hashing Package for UNIX.”
Proc. Winter USENIX Conf. The operative reference cited at the top of the
hash README; source of the linear-hashing core and the
spares[]addressing. Catalogued inknowledge/research/dbms-papers/and thepostgres-paper-bibliography.mdplan. - Litwin, W. (1980). “Linear Hashing: A New Tool for File and Table Addressing.” Proc. VLDB. The original linear-hashing algorithm that Seltzer-Yigit refines.
- Fagin, R., Nievergelt, J., Pippenger, N. & Strong, H. R. (1979). “Extendible Hashing — A Fast Access Method for Dynamic Files.” ACM TODS 4(3):315-344. The directory-based alternative contrasted in the “Common DBMS Design” and “Beyond PostgreSQL” sections.
- Database System Concepts (Silberschatz, Korth, Sudarshan, 7e), ch. 14
“Indexing” — the static-hashing and dynamic-hashing foundation
(
knowledge/research/dbms-general/). - Database Internals (Petrov 2019), ch. 2-6 — page layout, overflow chains, and WAL framing for on-disk hash structures.
Sibling docs (cross-references — mechanism owned there, not duplicated here)
Section titled “Sibling docs (cross-references — mechanism owned there, not duplicated here)”postgres-index-am.md— theIndexAmRoutinedispatch contract thathashhandlerfills in (build/insert/scan/vacuum callback surface).postgres-nbtree.md— the B-tree AM; the default equality index this AM is measured against, and the source of the shared “decompose into atomic WAL actions” concurrency discipline.postgres-buffer-manager.md— buffer pins, content locks (LWLocks), and cleanup locks that the bucket-level locking protocol is built on.postgres-lock-manager.md— the SQL-level heavyweight lock table, which the hash AM deliberately does not use for page-level concurrency.postgres-wal-records-rmgr.md— theRM_HASH_IDresource manager and the rmgr WAL-replay framing for theXLOG_HASH_*records.postgres-architecture-overview.md— Axis 4 (pluggable access methods), where the hash AM sits beside nbtree, GiST, GIN, SP-GiST, and BRIN.