Skip to content

PostgreSQL Sequences — Relation-Backed Generators, Caching, and WAL

Contents:

A sequence is the relational database’s answer to a deceptively simple need: hand out a stream of distinct numbers — 1, 2, 3, … — to callers that arrive concurrently, without any two callers ever receiving the same number. The textbook framing in Database System Concepts (Silberschatz 7e) places sequences in the SQL chapter under “sequences used to generate unique key values”: they are the machinery behind surrogate primary keys, SERIAL/bigserial pseudo-types, and the SQL-standard GENERATED … AS IDENTITY columns. The contract a sequence offers is narrow but precise, and getting the contract right is the entire design problem:

  1. Uniqueness. No two successful nextval calls ever return the same value. This is the one guarantee applications actually depend on for primary-key generation.
  2. Monotonicity. Values advance in the direction of the increment (ascending by default). Within a single backend the values a session sees are strictly increasing.
  3. Explicitly not gap-free. A sequence makes no promise that the values it issues are contiguous. Gaps appear for three independent reasons — a transaction that consumed a value then rolled back, a cache of values discarded at session end, and a crash that lost pre-logged values. Treating a sequence as a gap-free row counter is the single most common application-level misuse, and the design actively refuses to support it because gap-freedom is fundamentally incompatible with concurrency: a gap-free counter would have to serialize every allocator against every other and hold the allocation until commit.

The tension that organizes every sequence implementation is (2)+(3) versus performance. If correctness only required a global counter protected by a lock, the design would be trivial and slow: every nextval would contend on one lock and, if durability mattered, force one disk write. Real systems issue millions of sequence values per second, so the counter cannot be a single contended, synchronously-logged cell. The escape hatch is batching: an allocator reserves a block of values under a single lock acquisition and a single durability event, then serves individual values from that block cheaply. The price of batching is exactly the gaps in (3) — a reserved-but-unissued block that is lost (session ends, crash) becomes a permanent hole. The design chooses to pay that price, and the SQL standard blesses the choice.

A second, subtler tension is transactionality. A naive reading says a generator is just a mutable counter, so its updates should be transactional like any other row: roll back the transaction, give the number back. But that is precisely the wrong behavior for a key generator. If nextval were transactional, two concurrent transactions could each call nextval, both see “next is 5,” and the loser’s rollback would release 5 for reuse — destroying uniqueness the moment concurrency appears. So a generator’s current position must advance with non-transactional, immediately-visible semantics: the increment is durable and visible to all sessions the instant it happens, independent of the calling transaction’s fate. This is the deepest conceptual point about sequences: they live inside the transactional storage engine but deliberately opt out of transactional rollback for their core operation, while still opting in for their definition changes (renaming, re-parameterizing).

The third axis is durability granularity. A counter that is purely in-memory loses its position on crash and re-issues values — fatal for uniqueness across restarts. A counter that fsyncs on every increment is correct but unusably slow. The resolution, again, is to pre-reserve durability: log a position ahead of where you are, so that after a crash recovery resumes from the logged-ahead position and the only loss is the gap between actual and logged — never a re-issue. This is the WAL log_cnt batching that PostgreSQL implements, and it is the storage-engine twin of the in-memory cache batching: one batches the lock, the other batches the write-ahead log.

Across engines a recurring playbook for sequence/identity generation emerges. Naming the shared moves first lets PostgreSQL’s specific choices read as points in a known design space.

Two broad families:

  • Catalog/metadata-row counter. The current position is a column in a system catalog (or a dedicated one-row table). Reading/advancing it is a catalog update. Simple, but every advance contends on catalog buffers and catalog cache invalidation. Several engines store the “next value” as metadata and rely on a dedicated latch.
  • Dedicated sequence object with its own storage. The sequence is a first-class schema object backed by its own page(s), advanced under its own lock, logged under its own log-record type. This is the high-concurrency choice and the one PostgreSQL, Oracle, and DB2 all make.

Almost universally, a fast generator has two batching levels:

  1. A durable/shared reservation — a block of values reserved in the persistent counter under one lock + one log event. Oracle calls the size CACHE n; PostgreSQL spells it CACHE too but applies it per-session (below), and separately pre-logs SEQ_LOG_VALS to the WAL.
  2. A session-local cache — each backend pulls a sub-block into private memory and serves nextval from it with no shared-state access at all.

The session-local cache is what makes nextval essentially free in the common case, and it is the direct cause of inter-session gaps and out-of-order values across sessions: session A may cache [1..10] and session B [11..20], so the order rows physically appear can interleave arbitrarily with the values they carry.

The durable counter must survive crashes without re-issuing. The shared trick is to persist a position ahead of the issued position so replay resumes past anything that was handed out. The recovery rule is “on redo, fast-forward the counter to at least the logged value.” PostgreSQL’s log_cnt is exactly this: the on-disk last_value after a log record is the value after log_cnt more fetches, so replay restores the logged-ahead position and the live system simply skips the unissued tail.

The universally adopted resolution is to split a sequence’s state:

  • Position advance (nextval/setval) — non-transactional. Durable and globally visible immediately; never rolled back.
  • Definition (create/alter/drop, re-parameterize, reset) — transactional. Honors MVCC and rollback like any DDL.

Engines implement the second half by versioning the sequence’s storage: an ALTER/RESTART writes a new physical file/segment that becomes visible only on commit, so a rollback leaves the old storage in place. This is exactly PostgreSQL’s RelationSetNewRelfilenumber rewrite.

flowchart TB
  subgraph Defn["Definition state — TRANSACTIONAL (MVCC, rollback)"]
    PGS["pg_sequence catalog row<br/>start, increment, min, max,<br/>cache, cycle, typid"]
    REL["sequence relation identity<br/>relfilenumber (new file on ALTER/RESTART)"]
  end
  subgraph Pos["Position state — NON-TRANSACTIONAL (immediate, durable)"]
    TUP["one heap tuple on block 0<br/>last_value, log_cnt, is_called"]
    WAL["XLOG_SEQ_LOG record<br/>pre-logs future last_value"]
  end
  APP["application: nextval / currval / setval"] --> Pos
  DDL["CREATE / ALTER / RESTART / DROP"] --> Defn
  Pos -. "reads params from" .-> PGS

SERIAL and GENERATED AS IDENTITY are not separate mechanisms: they are a sequence plus an OWNED BY dependency plus a column default. The generator underneath is the same nextval. PostgreSQL implements the ownership link with process_owned_by, recording a pg_depend row so that dropping the table drops the sequence.

PostgreSQL’s defining decision is that a sequence is a relationrelkind = RELKIND_SEQUENCE — not a privileged in-memory object. It owns a pg_class row, a relfilenode, a buffer-managed data file, and it flows through the same buffer manager, WAL, and smgr machinery as a heap table. What makes it a sequence is only that (a) its data file holds exactly one heap tuple on block 0, with a special-area magic number, and (b) its mutating operations go through sequence.c rather than the generic executor. This “everything is a relation” choice is why sequences inherit crash safety, replication, and pg_upgrade for free.

A sequence’s state is split across two storage locations, mirroring the transactional/non-transactional divide:

// FormData_pg_sequence — src/include/catalog/pg_sequence.h
// IMMUTABLE PARAMETERS, transactional, in the pg_sequence catalog
typedef struct FormData_pg_sequence
{
Oid seqrelid; /* OID of the owning sequence relation */
Oid seqtypid; /* smallint | integer | bigint */
int64 seqstart; /* START WITH */
int64 seqincrement; /* INCREMENT BY */
int64 seqmax; /* MAXVALUE */
int64 seqmin; /* MINVALUE */
int64 seqcache; /* CACHE (per-backend batch size) */
bool seqcycle; /* CYCLE? */
} FormData_pg_sequence;
// FormData_pg_sequence_data — src/include/commands/sequence.h
// MUTABLE POSITION, non-transactional, the single tuple on block 0
typedef struct FormData_pg_sequence_data
{
int64 last_value; /* last value handed to the on-disk state */
int64 log_cnt; /* how many more fetches are pre-logged */
bool is_called; /* has nextval ever advanced past last_value? */
} FormData_pg_sequence_data;

The split is deliberate. The parameters never change during normal operation, so they live in a catalog reached through the SEQRELID syscache (a hash lookup, no buffer lock) — see postgres-catcache-syscache.md. The position changes on every cache refill, so it lives in the relation’s own page where it can be ex-locked, mutated, and WAL-logged independently of catalog machinery.

The hot-path optimization is a process-local hash table, seqhashtab, mapping each touched sequence’s relid to a SeqTableData:

// SeqTableData — src/backend/commands/sequence.c
typedef struct SeqTableData
{
Oid relid; /* pg_class OID of this sequence (hash key) */
RelFileNumber filenumber; /* last seen relfilenumber of this sequence */
LocalTransactionId lxid; /* xact in which we last did a seq op */
bool last_valid; /* do we have a valid "last" value? */
int64 last; /* value last returned by nextval */
int64 cached; /* last value already cached for nextval */
/* if last != cached, we have not used up all the cached values */
int64 increment; /* copy of sequence's increment field */
} SeqTableData;

The invariant last != cached means “this backend still holds unissued cached values.” When it holds, nextval returns by simply adding the increment to lastno buffer, no lock, no WAL, no catalog lookup. The lxid field implements once-per-transaction locking: the relation lock is taken under the top transaction’s resource owner so it is acquired at most once per xact regardless of how many nextval calls occur. cached is reset to last (discarding the window) whenever the sequence is detected to have been transactionally replaced — its relfilenumber changed out from under us, the signature of an ALTER/RESTART commit.

flowchart TB
  START["nextval_internal(relid)"] --> INIT["init_sequence:<br/>find/create SeqTable entry,<br/>lock_and_open_sequence once per xact"]
  INIT --> CACHED{"elm-&gt;last != elm-&gt;cached ?<br/>(unissued cached values)"}
  CACHED -- yes --> FAST["elm-&gt;last += increment;<br/>return — NO buffer/WAL"]
  CACHED -- no --> PARAMS["SearchSysCache SEQRELID:<br/>load incby/max/min/cache/cycle"]
  PARAMS --> READ["read_seq_tuple:<br/>ex-lock block 0, read last_value/log_cnt"]
  READ --> DECIDE{"log_cnt &lt; fetch  OR<br/>page LSN &lt;= redo ptr ?"}
  DECIDE -- yes --> FORCE["force WAL: fetch = log = cache + SEQ_LOG_VALS;<br/>logit = true"]
  DECIDE -- no --> NOLOG["serve from cache only, no new WAL"]
  FORCE --> LOOP["loop: apply increment,<br/>check max/min, cycle, count rescnt"]
  NOLOG --> LOOP
  LOOP --> CRIT["START_CRIT_SECTION;<br/>MarkBufferDirty"]
  CRIT --> WALW{"logit &amp;&amp; RelationNeedsWAL ?"}
  WALW -- yes --> EMIT["write tuple as 'after log more fetches';<br/>XLogInsert XLOG_SEQ_LOG; PageSetLSN"]
  WALW -- no --> SKIP["skip WAL"]
  EMIT --> FINAL["install final tuple:<br/>last_value=last, log_cnt=log"]
  SKIP --> FINAL
  FINAL --> RET["END_CRIT_SECTION; unlock; return result"]

Why nextval is non-transactional, and the guards that enforce it

Section titled “Why nextval is non-transactional, and the guards that enforce it”

nextval_internal never enrolls its buffer change in the transaction’s undo path — the increment is durable the instant the buffer is dirtied and the WAL (if any) is flushed at commit. Consequently it must refuse contexts where that would be unsafe or meaningless: it calls PreventCommandIfParallelMode("nextval()") (parallel workers cannot share the backend-local cache) and PreventCommandIfReadOnly("nextval()") for non-temp sequences (a read-only xact may not durably advance shared state). By contrast, the same file’s ResetSequence and AlterSequence are transactional: they call RelationSetNewRelfilenumber to stamp a brand-new storage file, so an aborted ALTER simply never publishes the new file and the old position survives.

The code divides cleanly into five clusters: the access-method shim (access/sequence/sequence.c), creation/alteration (DefineSequence, AlterSequence, ResetSequence, init_params), the nextval hot path (init_sequence, read_seq_tuple, nextval_internal), read/set helpers (currval, lastval, do_setval, pg_* functions), and WAL/redo (seq_redo, seq_mask, seq_desc). All of commands/sequence.c is walked below in call-flow order.

The sequence access method — a relkind assertion

Section titled “The sequence access method — a relkind assertion”

The “sequence AM” is intentionally trivial. Unlike the heap or nbtree AMs, it has no IndexAmRoutine/TableAmRoutine vector; it is just two wrappers around relation_open/relation_close that assert the relkind. Sequences reuse heap tuple format on the page, so no per-tuple AM is needed.

// sequence_open — src/backend/access/sequence/sequence.c
Relation
sequence_open(Oid relationId, LOCKMODE lockmode)
{
Relation r;
r = relation_open(relationId, lockmode);
validate_relation_kind(r); /* ereport unless RELKIND_SEQUENCE */
return r;
}

validate_relation_kind raises ERRCODE_WRONG_OBJECT_TYPE if the relkind is not RELKIND_SEQUENCE, which is how nextval('not_a_seq') is rejected.

DefineSequence builds an ordinary three-column table (last_value, log_cnt, is_called), creates it via DefineRelation with relkind RELKIND_SEQUENCE, then writes the initial one-row tuple and inserts the parameter row into pg_sequence:

// DefineSequence — src/backend/commands/sequence.c
init_params(pstate, seq->options, seq->for_identity, true,
&seqform, &seqdataform, &need_seq_rewrite, &owned_by);
/* ... build CreateStmt with three int8/bool columns ... */
address = DefineRelation(stmt, RELKIND_SEQUENCE, seq->ownerId, NULL, NULL);
seqoid = address.objectId;
rel = sequence_open(seqoid, AccessExclusiveLock);
tuple = heap_form_tuple(tupDesc, value, null);
fill_seq_with_data(rel, tuple); /* lay down block 0 */
/* ... then CatalogTupleInsert into pg_sequence with the parameters ... */

init_params is the long validator that turns the WITH option list into seqform/seqdataform, applying type-dependent defaults (INT8/INT4/INT2 min/max), cross-checks (seqmin < seqmax, START inside range), and crucially resets log_cnt to 0 whenever any generation-affecting parameter changes — so a parameter change can never leave a stale pre-log window.

Laying down block 0 — fill_seq_with_data

Section titled “Laying down block 0 — fill_seq_with_data”

fill_seq_with_data initializes the relation’s first page. It extends the buffer, lays a sequence_magic (0x1717) into the special area, force-sets the tuple’s xmin to FrozenTransactionId (sequences are never VACUUMed, so the tuple must be permanently visible), and WAL-logs the whole page with REGBUF_WILL_INIT:

// fill_seq_fork_with_data — src/backend/commands/sequence.c
PageInit(page, BufferGetPageSize(buf), sizeof(sequence_magic));
sm = (sequence_magic *) PageGetSpecialPointer(page);
sm->magic = SEQ_MAGIC;
/* sequences are never vacuumed → force the tuple permanently visible */
HeapTupleHeaderSetXmin(tuple->t_data, FrozenTransactionId);
HeapTupleHeaderSetXminFrozen(tuple->t_data);
/* ... */
START_CRIT_SECTION();
MarkBufferDirty(buf);
offnum = PageAddItem(page, (Item) tuple->t_data, tuple->t_len,
InvalidOffsetNumber, false, false);
if (RelationNeedsWAL(rel) || forkNum == INIT_FORKNUM)
{
xl_seq_rec xlrec;
XLogBeginInsert();
XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);
xlrec.locator = rel->rd_locator;
XLogRegisterData(&xlrec, sizeof(xl_seq_rec));
XLogRegisterData(tuple->t_data, tuple->t_len);
recptr = XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);
PageSetLSN(page, recptr);
}
END_CRIT_SECTION();

The wrapper fill_seq_with_data additionally handles unlogged sequences: it writes the same content into the INIT_FORKNUM fork (and flushes), so that after a crash the unlogged sequence is reset to its init image rather than carrying torn data.

Per-transaction locking — init_sequence and lock_and_open_sequence

Section titled “Per-transaction locking — init_sequence and lock_and_open_sequence”

Every entry point routes through init_sequence, which finds or creates the SeqTable entry and opens the relation. The relation lock is taken at most once per transaction by checking the cached lxid:

// lock_and_open_sequence — src/backend/commands/sequence.c
LocalTransactionId thislxid = MyProc->vxid.lxid;
if (seq->lxid != thislxid)
{
ResourceOwner currentOwner = CurrentResourceOwner;
CurrentResourceOwner = TopTransactionResourceOwner; /* lock held to xact end */
LockRelationOid(seq->relid, RowExclusiveLock);
CurrentResourceOwner = currentOwner;
seq->lxid = thislxid; /* remember we hold it */
}
return sequence_open(seq->relid, NoLock);

init_sequence then detects a transactional replacement: if the relation’s relfilenode differs from the cached filenumber, the sequence was ALTER/RESTART-ed under us, so any cached-but-unissued window is dropped by setting elm->cached = elm->last (while last/currval state is left intact):

// init_sequence — src/backend/commands/sequence.c
seqrel = lock_and_open_sequence(elm);
if (seqrel->rd_rel->relfilenode != elm->filenumber)
{
elm->filenumber = seqrel->rd_rel->relfilenode;
elm->cached = elm->last; /* discard pre-fetched cache window */
}

read_seq_tuple ex-locks block 0, validates the magic number, and returns a pointer into the buffer. It also opportunistically scrubs a stale xmax left by historical SELECT FOR UPDATE bugs, treating the fix as an unlogged hint:

// read_seq_tuple — src/backend/commands/sequence.c
*buf = ReadBuffer(rel, 0);
LockBuffer(*buf, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(*buf);
sm = (sequence_magic *) PageGetSpecialPointer(page);
if (sm->magic != SEQ_MAGIC)
elog(ERROR, "bad magic number in sequence \"%s\": %08X", ..., sm->magic);
lp = PageGetItemId(page, FirstOffsetNumber);
seqdatatuple->t_data = (HeapTupleHeader) PageGetItem(page, lp);
seqdatatuple->t_len = ItemIdGetLength(lp);
/* clean up a non-frozen xmax left by old SELECT FOR UPDATE bugs */
if (HeapTupleHeaderGetRawXmax(seqdatatuple->t_data) != InvalidTransactionId)
{
HeapTupleHeaderSetXmax(seqdatatuple->t_data, InvalidTransactionId);
seqdatatuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
MarkBufferDirtyHint(*buf, true);
}

nextval_internal is the heart of the module. First the cache fast path:

// nextval_internal — src/backend/commands/sequence.c (fast path)
if (elm->last != elm->cached) /* some numbers were cached */
{
Assert(elm->last_valid);
Assert(elm->increment != 0);
elm->last += elm->increment;
sequence_close(seqrel, NoLock);
last_used_seq = elm;
return elm->last; /* no buffer, no WAL, no catalog */
}

On a cache miss it loads parameters from SEQRELID, reads the tuple, and decides whether this refill must be WAL-logged. Two triggers force a log: the local demand exceeds the pre-logged budget (log < fetch), or the page’s last update predates the current checkpoint redo pointer — without the latter, redo from the checkpoint would not advance the sequence past already-issued values:

// nextval_internal — src/backend/commands/sequence.c (log decision)
log = seq->log_cnt;
if (log < fetch || !seq->is_called)
{
fetch = log = fetch + SEQ_LOG_VALS; /* grab 32 extra, pre-log them */
logit = true;
}
else
{
XLogRecPtr redoptr = GetRedoRecPtr();
if (PageGetLSN(page) <= redoptr) /* first nextval after checkpoint */
{
fetch = log = fetch + SEQ_LOG_VALS;
logit = true;
}
}

The allocation loop applies the increment up to fetch times, enforcing MAXVALUE/MINVALUE and CYCLE, and recording the first issued value as the function result and the last as the new cache high-water:

// nextval_internal — src/backend/commands/sequence.c (allocation loop, ascending arm)
if ((maxv >= 0 && next > maxv - incby) || (maxv < 0 && next + incby > maxv))
{
if (rescnt > 0) break; /* satisfied caller; stop */
if (!cycle)
ereport(ERROR, (errcode(ERRCODE_SEQUENCE_GENERATOR_LIMIT_EXCEEDED),
errmsg("nextval: reached maximum value of sequence \"%s\" (%lld)", ...)));
next = minv; /* CYCLE wraps to MINVALUE */
}
else
next += incby;
fetch--;
if (rescnt < cache) { log--; rescnt++; last = next; if (rescnt == 1) result = next; }

Finally the WAL/buffer protocol. The buffer is dirtied before the values are installed (safe because the buffer is ex-locked). The WAL record carries the tuple as it would look after log more fetches, then the in-buffer tuple is set to the actual state with the remaining log_cnt:

// nextval_internal — src/backend/commands/sequence.c (WAL + install)
START_CRIT_SECTION();
MarkBufferDirty(buf);
if (logit && RelationNeedsWAL(seqrel))
{
XLogBeginInsert();
XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);
seq->last_value = next; /* the LOGGED-AHEAD value */
seq->is_called = true;
seq->log_cnt = 0;
xlrec.locator = seqrel->rd_locator;
XLogRegisterData(&xlrec, sizeof(xl_seq_rec));
XLogRegisterData(seqdatatuple.t_data, seqdatatuple.t_len);
recptr = XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);
PageSetLSN(page, recptr);
}
/* now the intended final in-buffer state, possibly behind the logged value */
seq->last_value = last;
seq->is_called = true;
seq->log_cnt = log; /* this many fetches remain pre-logged */
END_CRIT_SECTION();

This is the log_cnt batching: after one WAL write the on-disk last_value is SEQ_LOG_VALS ahead, and the next ~32 refills decrement log_cnt in the buffer with no new WAL record. A crash loses the unissued tail (the gap), never re-issues. Note GetTopTransactionId() is called before the critical section when logit — forcing the top xact to have a real XID so commit triggers a WAL flush and synchronous-replication wait.

currval_oid returns the session’s last nextval result straight from elm->last (erroring if last_valid is false). lastval returns the last value from any sequence this session touched, via the file-static last_used_seq pointer (re-checking the sequence still exists). do_setval is nextval’s simpler sibling: it validates the target against min/max, discards the local cache (elm->cached = elm->last), and always WAL-logs the new tuple with log_cnt = 0:

// do_setval — src/backend/commands/sequence.c
if ((next < minv) || (next > maxv))
ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), ...));
if (iscalled) { elm->last = next; elm->last_valid = true; }
elm->cached = elm->last; /* forget any future cached numbers */
START_CRIT_SECTION();
seq->last_value = next;
seq->is_called = iscalled;
seq->log_cnt = 0;
MarkBufferDirty(buf);
/* ... unconditional XLOG_SEQ_LOG when RelationNeedsWAL ... */

The two- and three-argument SQL forms (setval_oid, setval3_oid) differ only in whether the caller may clear is_called — the three-arg form exists so pg_dump can restore a sequence’s exact is_called state.

seq_redo handles XLOG_SEQ_LOG by reinitializing block 0 from scratch. Because the same record type is used for live updates and a hot standby may be reading the page concurrently, redo builds the new page in palloc’d workspace and memcpys it in atomically so no torn intermediate is ever visible:

// seq_redo — src/backend/commands/sequence.c
buffer = XLogInitBufferForRedo(record, 0);
page = (Page) BufferGetPage(buffer);
localpage = (Page) palloc(BufferGetPageSize(buffer));
PageInit(localpage, BufferGetPageSize(buffer), sizeof(sequence_magic));
sm = (sequence_magic *) PageGetSpecialPointer(localpage);
sm->magic = SEQ_MAGIC;
item = (char *) xlrec + sizeof(xl_seq_rec);
itemsz = XLogRecGetDataLen(record) - sizeof(xl_seq_rec);
PageAddItem(localpage, (Item) item, itemsz, FirstOffsetNumber, false, false);
PageSetLSN(localpage, lsn);
memcpy(page, localpage, BufferGetPageSize(buffer)); /* atomic swap-in */
MarkBufferDirty(buffer);

The resource manager is registered in rmgrlist.h as PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, seq_identify, NULL, NULL, seq_mask, NULL). seq_mask (used by wal_consistency_checking) masks the page LSN/checksum and unused space before comparing standby and primary pages; seq_desc prints the relfilelocator for pg_waldump.

Position hints (as of 2026-06-05, REL_18 273fe94)

Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”
SymbolFileLine
SEQ_LOG_VALS (=32)src/backend/commands/sequence.c58
SEQ_MAGIC (=0x1717)src/backend/commands/sequence.c63
sequence_magicsrc/backend/commands/sequence.c65
SeqTableDatasrc/backend/commands/sequence.c76
seqhashtab (static)src/backend/commands/sequence.c91
last_used_seq (static)src/backend/commands/sequence.c97
DefineSequencesrc/backend/commands/sequence.c121
ResetSequencesrc/backend/commands/sequence.c262
fill_seq_with_datasrc/backend/commands/sequence.c338
fill_seq_fork_with_datasrc/backend/commands/sequence.c359
AlterSequencesrc/backend/commands/sequence.c437
SequenceChangePersistencesrc/backend/commands/sequence.c541
DeleteSequenceTuplesrc/backend/commands/sequence.c570
nextval_oidsrc/backend/commands/sequence.c615
nextval_internalsrc/backend/commands/sequence.c623
currval_oidsrc/backend/commands/sequence.c866
lastvalsrc/backend/commands/sequence.c897
do_setvalsrc/backend/commands/sequence.c945
setval_oidsrc/backend/commands/sequence.c1049
lock_and_open_sequencesrc/backend/commands/sequence.c1085
create_seq_hashtablesrc/backend/commands/sequence.c1113
init_sequencesrc/backend/commands/sequence.c1129
read_seq_tuplesrc/backend/commands/sequence.c1190
init_paramssrc/backend/commands/sequence.c1257
process_owned_bysrc/backend/commands/sequence.c1593
sequence_optionssrc/backend/commands/sequence.c1707
pg_get_sequence_datasrc/backend/commands/sequence.c1787
pg_sequence_last_valuesrc/backend/commands/sequence.c1847
seq_redosrc/backend/commands/sequence.c1892
ResetSequenceCachessrc/backend/commands/sequence.c1945
seq_masksrc/backend/commands/sequence.c1960
sequence_opensrc/backend/access/sequence/sequence.c37
sequence_closesrc/backend/access/sequence/sequence.c58
validate_relation_kindsrc/backend/access/sequence/sequence.c70
seq_desc / seq_identifysrc/backend/access/rmgrdesc/seqdesc.c21 / 34
FormData_pg_sequencesrc/include/catalog/pg_sequence.h23
FormData_pg_sequence_data / SEQ_COL_*src/include/commands/sequence.h25 / 38
XLOG_SEQ_LOG / xl_seq_recsrc/include/commands/sequence.h46 / 48
PG_RMGR(RM_SEQ_ID, "Sequence", …)src/include/access/rmgrlist.h43

Checks performed against the REL_18_STABLE worktree at /data/hgryoo/references/postgres (commit 273fe94):

  • SEQ_LOG_VALS is 32 and SEQ_MAGIC is 0x1717. Confirmed at sequence.c:58 and :63. The pre-log batch size is a compile-time constant, not a GUC.
  • FormData_pg_sequence_data has exactly three fields {last_value, log_cnt, is_called} (commands/sequence.h:25), and the relation’s three columns map to them via SEQ_COL_LASTVAL/LOG/CALLED = 1/2/3 (commands/sequence.h:38). Verified the column build loop in DefineSequence emits last_value, log_cnt, is_called in that order.
  • Parameters live in pg_sequence with fields seqrelid, seqtypid, seqstart, seqincrement, seqmax, seqmin, seqcache, seqcycle (catalog/pg_sequence.h:23), reached through MAKE_SYSCACHE(SEQRELID, pg_sequence_seqrelid_index, 32) — confirming the SearchSysCache1(SEQRELID, …) calls in nextval_internal, do_setval, ResetSequence, sequence_options.
  • The cache fast path takes no lock/WAL. Re-read nextval_internal lines 668–676: when elm->last != elm->cached it returns after only sequence_close(seqrel, NoLock), with no read_seq_tuple, no XLogInsert. Confirmed.
  • Checkpoint force-log condition is PageGetLSN(page) <= redoptr with redoptr = GetRedoRecPtr() (nextval_internal ~721–728). Verified the comment block explaining “first nextval after a checkpoint.”
  • The logged tuple is the future state. In the WAL arm, seq->last_value = next (the logged-ahead value) is set before XLogInsert, then overwritten with seq->last_value = last; seq->log_cnt = log after. Verified the two assignments straddle the XLogInsert call. This is the mechanism by which redo restores a value ahead of what was issued.
  • nextval/setval guards. PreventCommandIfParallelMode and PreventCommandIfReadOnly (the latter gated on !rd_islocaltemp) appear in both nextval_internal and do_setval. Confirmed.
  • ALTER/RESET are transactional via relfilenumber. ResetSequence and AlterSequence (when need_seq_rewrite) call RelationSetNewRelfilenumber then fill_seq_with_data. Confirmed at :309 and :503.
  • seq_redo reinit-and-memcpy. Verified XLogInitBufferForRedo, palloc local page, PageAddItem, memcpy(page, localpage, …). The comment explicitly cites hot-standby concurrent reads as the reason.
  • RM registration. rmgrlist.h:43 registers RM_SEQ_ID “Sequence” with seq_redo, seq_desc, seq_identify, and seq_mask (decode/undo hooks NULL). Confirmed no PG19-only rmgr changes are referenced.
  • The sequence AM is a relkind assertion only. access/sequence/sequence.c is 79 lines: sequence_open, sequence_close, validate_relation_kind. No AM routine vector exists. Confirmed.
  • The pre-log critical-section XID forcing. GetTopTransactionId() is invoked inside nextval_internal (line 396 path is fill_seq_with_data’s; the nextval pre-log site forces a top XID) and in both rewrite arms of AlterSequence/SequenceChangePersistence (:497, :559), each immediately preceding RelationSetNewRelfilenumber (:309, :503, :562). Confirmed the rewrite path and the pre-log path both force a real top-transaction XID so the eventual commit drives a WAL flush.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”
  • Oracle’s CACHE/NOCACHE and the NOORDER/ORDER knob. Oracle stores the sequence’s high-water mark in the SEQ$ dictionary table and caches a block of values in the SGA (shared, not per-session). Its ORDER clause forces RAC-wide ordering by serializing allocation across instances through the global lock — the same uniqueness-vs-ordering-vs-throughput trilemma PostgreSQL resolves by refusing cross-session ordering. PostgreSQL’s CACHE is per-backend (a sharper source of inter-session gaps than Oracle’s shared cache), but it never pays the cross-instance round-trip Oracle’s ORDER demands. A side-by-side of nextval_internal’s SEQ_LOG_VALS pre-log against Oracle’s SGA cache + dictionary update would sharpen what “log ahead” buys versus “cache ahead.”

  • Sequence ranges as a distributed-systems primitive. In a sharded or multi-master system a single monotonic counter is a global serialization point — exactly the bottleneck batching exists to avoid. The standard escape is range allocation: a coordinator hands each node a disjoint block ([1000..1999]), and nodes serve locally. This is structurally identical to PostgreSQL’s per-backend SeqTable window scaled up one level — the backend is the “node,” the WAL pre-log is the “coordinator handoff.” Snowflake-style ID generators go further and abandon the central counter entirely, encoding (timestamp, node_id, per-node_counter) so no coordination is needed at the cost of strict global monotonicity. PostgreSQL’s design sits at the conservative end: one authoritative durable counter, batched twice (cache + WAL), trading distributed scalability for single-node simplicity and exact crash-recovery semantics.

  • Lock-manager contention as the real cost. A Scalable Lock Manager for Multicores (dbms-papers/scalable-lock-manager.md) frames why even a RowExclusiveLock per nextval would be ruinous on many cores: the lock table’s partition latches become the bottleneck long before the buffer does. PostgreSQL sidesteps this two ways — the lxid check takes the relation lock once per transaction (not once per nextval), and the cache fast path takes no lock at all when elm->last != elm->cached. The paper’s speculative-lock-inheritance ideas are the research frontier for the rare workload that defeats both (millions of single-value nextvals in autocommit, each its own transaction, on one hot sequence).

  • MVCC-free position state. Database Internals (Petrov, ch. 5 “Transaction Processing and Recovery”) motivates why a generator must opt out of MVCC: rolling back an allocation would destroy uniqueness under concurrency. The frozen-xmin one-row tuple (HeapTupleHeaderSetXminFrozen) is PostgreSQL’s concrete encoding of “this row is never versioned, never vacuumed, always visible” — a deliberate hole in the otherwise-uniform MVCC fabric. A research comparison against systems that do version sequence state (and pay the rollback-reuse cost) would quantify the trade.

  • GENERATED AS IDENTITY vs. SERIAL and the SQL standard. The SQL:2003 identity-column feature standardized what SERIAL did informally. PostgreSQL implements both over the same nextval machinery, differing only in the catalog dependency (process_owned_by records pg_depend with DEPENDENCY_INTERNAL for identity vs. DEPENDENCY_AUTO for serial), which governs whether the sequence can be dropped independently. The generator contract — uniqueness, monotonicity, no gap-freedom — is identical; only the ownership lifecycle differs. This is a clean example of a storage mechanism (the relation-backed counter) being reused under two SQL surface syntaxes.

  • Logical replication of sequence advances. Historically nextval on a publisher did not replicate to a logical subscriber, so failover lost the position. The XLOG_SEQ_LOG record carries everything needed to replicate the logged-ahead last_value; building a sequence-decoding output-plugin path on top of seq_desc’s relfilelocator is an active area (see postgres-logical-decoding.md). The pre-log semantics matter here: a logical subscriber that replays the logged-ahead value inherits the same “skip the unissued tail” gap behavior, which is correct for uniqueness but surprising to operators expecting contiguous values.

In-tree source files (REL_18_STABLE, commit 273fe94)

Section titled “In-tree source files (REL_18_STABLE, commit 273fe94)”
  • src/backend/commands/sequence.c — the entire module: DefineSequence, AlterSequence, ResetSequence, SequenceChangePersistence, init_params, fill_seq_with_data/fill_seq_fork_with_data, the SeqTableData cache, init_sequence/lock_and_open_sequence, read_seq_tuple, nextval_internal, currval_oid/lastval, do_setval/setval_oid, process_owned_by, and the WAL seq_redo/seq_mask. The position-hint table above pins each symbol.
  • src/backend/access/sequence/sequence.c — the 79-line “sequence AM”: sequence_open, sequence_close, validate_relation_kind. A relkind assertion over relation_open, nothing more.
  • src/backend/access/rmgrdesc/seqdesc.cseq_desc/seq_identify for pg_waldump decoding of XLOG_SEQ_LOG.
  • src/include/commands/sequence.hFormData_pg_sequence_data {last_value, log_cnt, is_called}, SEQ_COL_* column numbers, XLOG_SEQ_LOG, and the xl_seq_rec WAL header struct.
  • src/include/catalog/pg_sequence.hFormData_pg_sequence (the immutable parameters: seqstart, seqincrement, seqmax, seqmin, seqcache, seqcycle, seqtypid).
  • src/include/access/rmgrlist.hPG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, seq_identify, NULL, NULL, seq_mask, NULL) registration.
  • Database System Concepts (Silberschatz, Korth, Sudarshan, 7e), SQL chapter — “sequences used to generate unique key values,” GENERATED AS IDENTITY, surrogate keys (knowledge/research/dbms-general/database-system-concepts.md).
  • Database Internals (Petrov 2019), ch. 5 “Transaction Processing and Recovery” — why a generator opts out of MVCC, and WAL log-ahead framing (knowledge/research/dbms-general/database-internals.md).
  • A Scalable Lock Manager for Multicores — lock-table partition-latch contention; motivates the once-per-transaction lock and the lock-free cache fast path (knowledge/research/dbms-papers/scalable-lock-manager.md).
  • Optimistic Concurrency Control (Kung & Robinson 1981) — background for why “validate then advance, never roll back” is the right discipline for a shared counter (knowledge/research/dbms-papers/occ.md).

Sibling docs (cross-references — mechanism owned there, not duplicated here)

Section titled “Sibling docs (cross-references — mechanism owned there, not duplicated here)”
  • postgres-heap-am.md — the heap tuple format and page item layout the one-row sequence tuple reuses; this doc treats the tuple as opaque.
  • postgres-xlog-wal.mdXLogBeginInsert/XLogRegisterBuffer/XLogInsert, REGBUF_WILL_INIT, redo buffer initialization (XLogInitBufferForRedo), and the resource-manager dispatch that routes RM_SEQ_ID to seq_redo.
  • postgres-catcache-syscache.md — the SEQRELID syscache that nextval_internal/do_setval/ResetSequence use to load sequence parameters without a buffer lock.
  • postgres-smgr-md.md / postgres-buffer-manager.md — the storage-manager and buffer layers under ReadBuffer(rel, 0) and the INIT_FORKNUM handling for unlogged sequences.
  • postgres-ddl-execution.mdDefineRelation and RelationSetNewRelfilenumber, the DDL machinery DefineSequence/AlterSequence/ResetSequence ride on.
  • postgres-architecture-overview.md — where the relation-backed sequence sits in the storage-engine axis as a degenerate one-row table.