Skip to content

PostgreSQL Lock Manager — The Heavyweight Lock Table, Lock Modes, and Deadlock Detection

Contents:

The lock manager is the component that realizes pessimistic concurrency control: before a transaction reads or writes a logical object, it asks the manager for a lock in a particular mode, and the manager either grants it, blocks the requester, or (in a cycle) aborts somebody. Database System Concepts (Silberschatz, 7e, ch. 18 “Concurrency Control”) frames this as the lock-based protocol family, whose canonical member is two-phase locking (2PL): every transaction acquires all its locks in a growing phase and releases them in a shrinking phase, never acquiring after the first release. 2PL guarantees conflict-serializability; strict 2PL — the variant nearly all engines use — additionally holds every exclusive lock until commit/abort, which also guarantees recoverability and avoids cascading aborts. PostgreSQL holds transaction-level locks to end-of-transaction, so it is a strict-2PL engine for the locks it takes.

Two refinements of the basic model shape every real lock manager and are the knobs PostgreSQL turns:

  1. Lock granularity and multi-granularity locking (MGL). A lock can name a whole table, a page, or a single tuple. Coarse locks are cheap to track but kill concurrency; fine locks maximize concurrency but explode the lock table. Database System Concepts §18.3 (“Multiple Granularity”) resolves this with a hierarchy (database → table → page → tuple) and intention modes (IS, IX, SIX): before locking a node in S or X mode, a transaction locks every ancestor in the corresponding intention mode, so a coarse locker can detect a fine locker without walking the whole subtree. The richness of the mode set — and the conflict table that defines which modes coexist — is the heart of the design.

  2. Deadlock handling. Because transactions request locks in an order the manager cannot predict, cycles are possible: A holds x and waits for y while B holds y and waits for x. The textbook (§18.2.2) gives two families: prevention (timeouts, wait-die / wound-wait timestamp ordering, or acquiring all locks atomically) and detection-and-recovery (build a waits-for graph (WFG) — a node per transaction, an edge A→B when A waits for a lock B holds — and periodically test it for a cycle, aborting a victim to break it). PostgreSQL chooses detection: it lets the cycle form, then runs a WFG search after a timeout.

The lock manager is deliberately separate from MVCC. Under snapshot isolation, reads do not take row locks at all — they read a visible version (see postgres-mvcc-visibility.md). The heavyweight lock manager exists for the conflicts MVCC cannot resolve by versioning: writer/writer serialization, DDL vs. DML, explicit LOCK TABLE, SELECT ... FOR UPDATE, and waiting on another transaction to finish. This is why a read-mostly OLTP workload takes millions of weak relation locks that never conflict — the observation that motivates the fast path (below).

The textbook gives the model (2PL, MGL, WFG). This section names the engineering conventions that nearly every production lock manager — PostgreSQL, Oracle, DB2, SQL Server, MySQL/InnoDB, CUBRID — adopts to make that model run on a multicore box. PostgreSQL’s specific choices in the next section read as one set of dials within this shared space.

A central lock table keyed by an object identifier

Section titled “A central lock table keyed by an object identifier”

Every engine keeps a hash table whose key identifies the lockable object (a table OID, a page number, a tuple address, a transaction id) and whose value records who holds and who waits. Two value records are nearly universal: a per-object record (the queue of holders and waiters on this object) and a per-holder record (what one transaction holds/awaits on this object). The per-object record carries an aggregate of all granted modes so a conflict test is one bitwise operation instead of a list walk.

The meaning of “mode M1 conflicts with mode M2” is encoded in a static conflict table (a 2-D bit matrix), not in branching code. A request is tested against the object’s aggregate “granted modes” mask by ANDing the requester’s conflict row with the mask. This makes the mode set a piece of data that can be extended without touching the acquisition path.

Within one transaction the same lock is requested and released many times (every catalog lookup re-locks pg_class). Touching shared memory each time would serialize backends on a latch. Every engine keeps a backend-local count of how many times it holds each lock, so the second through N-th acquisition of an already-held lock never touches shared memory.

A single global mutex over the lock table is the classic multicore bottleneck — the central finding of A Scalable Lock Manager for Multicores (Jung et al., SIGMOD 2013), which traced MySQL’s throughput collapse at 16 cores to cache-line bouncing on the one lock-table mutex even on a read-only, conflict-free workload. The standard mitigation is to partition the lock table by a hash of the key and give each partition its own latch, so unrelated objects never contend. The frontier the paper pushes past — splitting lock allocation from lock manipulation so the manipulation is latch-free — is exactly the gap PostgreSQL’s fast path attacks from a different angle.

A fast path for the common, non-conflicting case

Section titled “A fast path for the common, non-conflicting case”

Even partitioned, a hot table funnels every locker into one partition latch. Engines therefore add an escape hatch: if a lock is known to be weak (unlikely to conflict) and the system can cheaply prove no conflicting strong lock exists, record it in a per-backend private array and skip the shared table entirely. The cost is moved to the rare strong locker, who must “flush” everyone’s fast-path entries into the shared table before it can conflict-check.

Deadlock detection deferred and asynchronous

Section titled “Deadlock detection deferred and asynchronous”

Running a WFG search on every block would dominate the no-deadlock common case. The convention is optimistic waiting: block immediately with no check, arm a timer, and only search the WFG if the wait outlasts a timeout (PostgreSQL’s deadlock_timeout, default 1 s). A deadlock that resolves itself within the timeout costs nothing.

Theory / conventionPostgreSQL name
Lockable object identityLOCKTAG — 16-byte tag, locktag_typeLockTagType
Per-object holder/waiter recordLOCK (in shared LockMethodLockHash)
Per-holder recordPROCLOCK (in shared LockMethodProcLockHash)
Per-backend lock cacheLOCALLOCK (in backend-private LockMethodLocalHash)
Conflict table (mode × mode bit matrix)LockConflicts[] + LockMethodData.conflictTab
The 8 lock modesAccessShareLockAccessExclusiveLock (lockdefs.h)
Aggregate granted-modes maskLOCK.grantMask; awaited modes in LOCK.waitMask
Partitioned latchingNUM_LOCK_PARTITIONS (=16) partition LWLocks, hash of LOCKTAG
Fast path for weak relation locksPGPROC.fpRelId[] / fpLockBits[]; FastPathStrongRelationLocks
Wait queue on the objectLOCK.waitProcs (list of PGPROC); request stored in PGPROC
Waits-for graph + detectionDeadLockCheck / FindLockCycle (deadlock.c)
Optimistic waiting + timeoutProcSleep arms deadlock_timeout; CheckDeadLock runs the search

PostgreSQL’s README (in src/backend/storage/lmgr/) calls these “regular locks” or “heavyweight locks” and contrasts them with three lighter interprocess locks — spinlocks and LWLocks (own doc: postgres-lwlock-spinlock.md) and SIReadLock predicate locks (own doc: postgres-ssi-predicate-locking.md). The heavyweight manager is the only one of the four with “a variety of lock modes with table-driven semantics, full deadlock detection, and automatic release at transaction end.” Everything below is that manager.

The distinguishing choices are: (1) three tiers of lock record — backend-local LOCALLOCK, shared LOCK, shared PROCLOCK — so repeated acquisition is latch-free and shared state stores only one grant per backend/object/mode; (2) the conflict semantics are a static 8×8 bit table; (3) the shared hashes are partitioned 16 ways by LOCKTAG hash; (4) a fast path lets weak relation locks bypass shared memory entirely; (5) deadlock detection is deferred until deadlock_timeout and resolves soft cycles by rearranging wait queues rather than aborting.

The three lock records and how they relate

Section titled “The three lock records and how they relate”
flowchart LR
  subgraph BK["backend-private memory"]
    LL["LOCALLOCK<br/>tag = (LOCKTAG, mode)<br/>nLocks, lockOwners[]<br/>lock*, proclock*"]
  end
  subgraph SHM["shared memory (partitioned)"]
    LK["LOCK<br/>tag = LOCKTAG<br/>grantMask, waitMask<br/>granted[], requested[]<br/>procLocks, waitProcs"]
    PL["PROCLOCK<br/>tag = (myLock, myProc)<br/>holdMask, releaseMask<br/>groupLeader"]
  end
  PG["PGPROC<br/>(this backend)<br/>waitLock, waitLockMode<br/>fpRelId[], fpLockBits[]"]
  LL -- "lock*" --> LK
  LL -- "proclock*" --> PL
  PL -- "tag.myLock" --> LK
  PL -- "tag.myProc" --> PG
  LK -- "procLocks (dlist)" --> PL
  LK -- "waitProcs (dclist)" --> PG

Figure 1 — The three lock records. LOCALLOCK is a backend-private cache keyed by (object, mode) holding the local acquisition count; it points at the shared LOCK (per-object) and PROCLOCK (per-backend-per-object) when the lock lives in the main table. A PROCLOCK is doubly linked into both its LOCK’s procLocks list and its owning PGPROC. When a lock is taken via the fast path, LOCALLOCK.lock and .proclock are NULL.

The LOCK is “per lockable object” and the PROCLOCK is “per lock-and-requestor” (README). The shared structures “only allow a single lock grant to be made per lockable object/lock mode/backend” — the internal request counts (a transaction taking the same lock twice) are held in LOCALLOCK so the shared data need not be touched.

// LOCK — src/include/storage/lock.h
typedef struct LOCK
{
LOCKTAG tag; /* unique identifier of lockable object */
LOCKMASK grantMask; /* bitmask for lock types already granted */
LOCKMASK waitMask; /* bitmask for lock types awaited */
dlist_head procLocks; /* list of PROCLOCK objects assoc. with lock */
dclist_head waitProcs; /* list of PGPROC objects waiting on lock */
int requested[MAX_LOCKMODES]; /* counts of requested locks */
int nRequested; /* total of requested[] array */
int granted[MAX_LOCKMODES]; /* counts of granted locks */
int nGranted; /* total of granted[] array */
} LOCK;

grantMask is the aggregate the conflict test ANDs against; granted[i] is the per-mode count behind it (bit i of grantMask is on iff granted[i] > 0). waitMask flags modes some waiter wants but cannot get yet. When nRequested falls to zero the LOCK is freed.

// PROCLOCK + PROCLOCKTAG — src/include/storage/lock.h
typedef struct PROCLOCKTAG
{
LOCK *myLock; /* link to per-lockable-object information */
PGPROC *myProc; /* link to PGPROC of owning backend */
} PROCLOCKTAG;
typedef struct PROCLOCK
{
PROCLOCKTAG tag; /* unique identifier of proclock object */
PGPROC *groupLeader; /* proc's lock group leader, or proc itself */
LOCKMASK holdMask; /* bitmask for lock types currently held */
LOCKMASK releaseMask; /* bitmask for lock types to be released */
dlist_node lockLink; /* list link in LOCK's list of proclocks */
dlist_node procLink; /* list link in PGPROC's list of proclocks */
} PROCLOCK;

The PROCLOCKTAG uses raw pointers as a key — safe because “a PROCLOCK never outlives either its lock or its proc” (README). holdMask is the per-backend slice of LOCK.grantMask. The two dlist_nodes thread the same object into two lists: the LOCK’s holder list and the PGPROC’s held-locks list.

// LOCALLOCK + LOCALLOCKTAG — src/include/storage/lock.h
typedef struct LOCALLOCKTAG
{
LOCKTAG lock; /* identifies the lockable object */
LOCKMODE mode; /* lock mode for this table entry */
} LOCALLOCKTAG;
typedef struct LOCALLOCK
{
LOCALLOCKTAG tag; /* unique identifier of locallock entry */
uint32 hashcode; /* copy of LOCKTAG's hash value */
LOCK *lock; /* associated LOCK object, if any */
PROCLOCK *proclock; /* associated PROCLOCK object, if any */
int64 nLocks; /* total number of times lock is held */
int numLockOwners; /* # of relevant ResourceOwners */
int maxLockOwners; /* allocated size of array */
LOCALLOCKOWNER *lockOwners; /* dynamically resizable array */
bool holdsStrongLockCount; /* bumped FastPathStrongRelationLocks */
bool lockCleared; /* we read all sinval msgs for lock */
} LOCALLOCK;

Note LOCALLOCK’s tag includes the mode — there is one LOCALLOCK per (object, mode) the backend touches, whereas there is one LOCK per object across all modes. nLocks is the local re-entrancy count; lockOwners[] tracks which ResourceOwners hold it, so subtransaction rollback can release exactly its share (see postgres-backend-lifecycle.md for the resource-owner tree).

The LOCKTAG: naming any lockable object in 16 bytes

Section titled “The LOCKTAG: naming any lockable object in 16 bytes”
// LOCKTAG — src/include/storage/lock.h
typedef struct LOCKTAG
{
uint32 locktag_field1; /* a 32-bit ID field */
uint32 locktag_field2; /* a 32-bit ID field */
uint32 locktag_field3; /* a 32-bit ID field */
uint16 locktag_field4; /* a 16-bit ID field */
uint8 locktag_type; /* see enum LockTagType */
uint8 locktag_lockmethodid; /* lockmethod indicator */
} LOCKTAG;

The struct is “defined with malice aforethought to fit into 16 bytes with no padding” — padding bytes would make the hash random because the whole struct is hashed. locktag_type selects how the four ID fields are read; the SET_LOCKTAG_* macros encode the convention. The 12 types (LockTagType) span the granularity hierarchy and several non-relation objects:

locktag_typeIdentified by (fields)Typical use
LOCKTAG_RELATIONdbOid, relOidwhole table (most DML/DDL locks)
LOCKTAG_RELATION_EXTENDdbOid, relOidright to add a block to a relation
LOCKTAG_DATABASE_FROZEN_IDSdbOidpg_database.datfrozenxid update
LOCKTAG_PAGEdbOid, relOid, blocknumone page (e.g. GIN cleanup)
LOCKTAG_TUPLEdbOid, relOid, blocknum, offnumone physical tuple (SELECT FOR …)
LOCKTAG_TRANSACTIONxidwait for an xact to finish
LOCKTAG_VIRTUALTRANSACTIONprocNumber, lxidwait for a vxid (CIC, hot standby)
LOCKTAG_SPECULATIVE_TOKENxid, tokenspeculative insert (INSERT ... ON CONFLICT)
LOCKTAG_OBJECTdbOid, classOid, objOid, objsubidnon-relation catalog object
LOCKTAG_USERLOCK(reserved)old contrib/userlock
LOCKTAG_ADVISORY4 user idspg_advisory_lock(...)
LOCKTAG_APPLY_TRANSACTIONdbOid, subOid, xid, objidlogical-replication apply conflict

locktag_lockmethodid selects the lock method: DEFAULT_LOCKMETHOD (1) for everything system-driven, USER_LOCKMETHOD (2) for advisory locks. Both methods share the same mode set and conflict table — they differ only in lifetime rules (advisory/user locks can outlive a transaction; see README §“User Locks”).

// lock modes — src/include/storage/lockdefs.h
#define NoLock 0 /* not a lock; "don't get a lock" */
#define AccessShareLock 1 /* SELECT */
#define RowShareLock 2 /* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock 3 /* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4 /* VACUUM, ANALYZE, CREATE INDEX CONCURRENTLY */
#define ShareLock 5 /* CREATE INDEX (non-concurrent) */
#define ShareRowExclusiveLock 6 /* like EXCLUSIVE but allows ROW SHARE */
#define ExclusiveLock 7 /* blocks ROW SHARE / SELECT FOR UPDATE */
#define AccessExclusiveLock 8 /* ALTER/DROP TABLE, VACUUM FULL, LOCK TABLE */

These are table-level modes — PostgreSQL’s MGL hierarchy is shallow (it does not take page/tuple S/X locks for ordinary MVCC reads). The lower modes (AccessShareLock, RowShareLock, RowExclusiveLock) are what DML takes and are mutually compatible; the higher modes are DDL/maintenance and progressively exclude. The conflict semantics live entirely in one array:

// LockConflicts[] — src/backend/storage/lmgr/lock.c (condensed)
static const LOCKMASK LockConflicts[] = {
0,
/* AccessShareLock */
LOCKBIT_ON(AccessExclusiveLock),
/* RowShareLock */
LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
/* RowExclusiveLock */
LOCKBIT_ON(ShareLock) | LOCKBIT_ON(ShareRowExclusiveLock) |
LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
/* ShareUpdateExclusiveLock */
LOCKBIT_ON(ShareUpdateExclusiveLock) | LOCKBIT_ON(ShareLock) |
LOCKBIT_ON(ShareRowExclusiveLock) |
LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
/* ShareLock */
LOCKBIT_ON(RowExclusiveLock) | LOCKBIT_ON(ShareUpdateExclusiveLock) |
LOCKBIT_ON(ShareRowExclusiveLock) |
LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
/* ShareRowExclusiveLock */
LOCKBIT_ON(RowExclusiveLock) | LOCKBIT_ON(ShareUpdateExclusiveLock) |
LOCKBIT_ON(ShareLock) | LOCKBIT_ON(ShareRowExclusiveLock) |
LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
/* ExclusiveLock */
LOCKBIT_ON(RowShareLock) | LOCKBIT_ON(RowExclusiveLock) |
LOCKBIT_ON(ShareUpdateExclusiveLock) | LOCKBIT_ON(ShareLock) |
LOCKBIT_ON(ShareRowExclusiveLock) |
LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),
/* AccessExclusiveLock */
LOCKBIT_ON(AccessShareLock) | LOCKBIT_ON(RowShareLock) |
LOCKBIT_ON(RowExclusiveLock) | LOCKBIT_ON(ShareUpdateExclusiveLock) |
LOCKBIT_ON(ShareLock) | LOCKBIT_ON(ShareRowExclusiveLock) |
LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock)
};

LockConflicts[i] is the set of modes that conflict with mode i. The matrix is symmetric; the conventional rendering (✗ = conflict):

ACSRSRESUESSREEAE
AccessShare (1)
RowShare (2)
RowExclusive (3)
ShareUpdateExcl (4)
Share (5)
ShareRowExcl (6)
Exclusive (7)
AccessExclusive (8)

Read it as the MGL/intent picture flattened to table granularity: AccessShareLock (a reader) conflicts only with AccessExclusiveLock (a table rewriter), so SELECTs and most DDL-free work coexist freely. The three DML modes (1–3) form a mutually compatible block — many INSERT/UPDATE/DELETE backends run against one table at once. ShareUpdateExclusiveLock is self-conflicting, which is why only one VACUUM/ANALYZE/CREATE INDEX CONCURRENTLY runs on a table at a time. The single DoLockModesConflict helper is the public face of the table:

// DoLockModesConflict — src/backend/storage/lmgr/lock.c
bool
DoLockModesConflict(LOCKMODE mode1, LOCKMODE mode2)
{
LockMethod lockMethodTable = LockMethods[DEFAULT_LOCKMETHOD];
if (lockMethodTable->conflictTab[mode1] & LOCKBIT_ON(mode2))
return true;
return false;
}

The two shared hashes (LockMethodLockHash for LOCKs, LockMethodProcLockHash for PROCLOCKs) are partitioned 16 ways. Before 8.2 one global LockMgrLock protected everything and became “a contention bottleneck” (README); the fix was to assign each lockable object to a partition by the low-order bits of its LOCKTAG hash, each partition guarded by its own LWLock.

// partition selection — src/include/storage/lock.h + lwlock.h
#define LOG2_NUM_LOCK_PARTITIONS 4
#define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS) /* = 16 */
#define LockHashPartition(hashcode) \
((hashcode) % NUM_LOCK_PARTITIONS)
#define LockHashPartitionLock(hashcode) \
(&MainLWLockArray[LOCK_MANAGER_LWLOCK_OFFSET + \
LockHashPartition(hashcode)].lock)

A normal acquire/release touches one partition lock. The trick that lets one partition lock cover both hashes is a specialized PROCLOCK hash function that forces a PROCLOCK into the same partition as its LOCK:

// proclock_hash — src/backend/storage/lmgr/lock.c
static uint32
proclock_hash(const void *key, Size keysize)
{
const PROCLOCKTAG *proclocktag = (const PROCLOCKTAG *) key;
uint32 lockhash;
Datum procptr;
/* Look into the associated LOCK object, and compute its hash code */
lockhash = LockTagHashCode(&proclocktag->myLock->tag);
/* xor the proc address in, left-shifted past the partition-number bits */
procptr = PointerGetDatum(proclocktag->myProc);
lockhash ^= ((uint32) procptr) << LOG2_NUM_LOCK_PARTITIONS;
return lockhash;
}

The proc pointer is mixed in above the low 4 bits, so the partition number is inherited from the LOCK while the rest of the hash still spreads PROCLOCKs across hash chains. Deadlock checking, which must see the whole graph, locks all 16 partitions in partition-number order (a fixed order that prevents LWLock deadlock among the partition locks themselves).

LockAcquire (and its backing LockAcquireExtended) is the heart of the manager. The control flow has four exits — already-held (local count bump), fast-path grant, immediate shared-table grant, and wait-then-grant — plus the deadlock/no-wait error exit.

flowchart TD
  A["LockAcquireExtended(tag, mode)"] --> B["find/create LOCALLOCK<br/>(backend-private hash)"]
  B --> C{"locallock.nLocks &gt; 0?<br/>(already held)"}
  C -- "yes" --> C1["GrantLockLocal:<br/>bump local count, return"]
  C -- "no" --> D{"eligible for<br/>relation fast path?"}
  D -- "yes" --> E{"strong-lock count<br/>for partition == 0?"}
  E -- "yes" --> E1["FastPathGrantRelationLock<br/>→ record in PGPROC array,<br/>return LOCKACQUIRE_OK"]
  E -- "no" --> F
  D -- "no" --> F{"this is a strong lock<br/>that conflicts w/ fast path?"}
  F -- "yes" --> F1["BeginStrongLockAcquire +<br/>FastPathTransferRelationLocks<br/>(flush others' fp locks)"]
  F1 --> G
  F -- "no" --> G["LWLockAcquire(partitionLock)"]
  G --> H["SetupLockInTable:<br/>find/create LOCK + PROCLOCK"]
  H --> I{"conflictTab[mode] &amp; waitMask<br/>OR LockCheckConflicts?"}
  I -- "no conflict" --> J["GrantLock → OK"]
  I -- "conflict" --> K["JoinWaitQueue(dontWait)"]
  K --> L{"result?"}
  L -- "OK (grant-on-join)" --> J
  L -- "ERROR (early deadlock<br/>or dontWait)" --> M["clean up, DeadLockReport<br/>or return NOT_AVAIL"]
  L -- "WAITING" --> N["WaitOnLock → ProcSleep<br/>(sleep on semaphore)"]
  N --> O{"granted or<br/>deadlock?"}
  O -- "granted" --> J
  O -- "deadlock" --> M

Figure 2 — LockAcquireExtended control flow. The two cheap exits (local re-acquire, fast-path grant) avoid shared memory entirely. Only a genuine shared-table conflict drops the backend into JoinWaitQueueWaitOnLockProcSleep. A strong-lock acquisition first transfers every backend’s matching fast-path locks into the shared table so conflict and deadlock detection see them.

The already-held check is the first and cheapest exit:

// LockAcquireExtended — src/backend/storage/lmgr/lock.c (condensed)
locallock = (LOCALLOCK *) hash_search(LockMethodLocalHash, &localtag,
HASH_ENTER, &found);
/* ... initialize if !found ... */
/* If we already hold the lock, just increase the count locally. */
if (locallock->nLocks > 0)
{
GrantLockLocal(locallock, owner);
if (locallock->lockCleared)
return LOCKACQUIRE_ALREADY_CLEAR;
else
return LOCKACQUIRE_ALREADY_HELD;
}

If not already held, the shared-table conflict test is two steps — a cheap mask test against waiters, then the full LockCheckConflicts:

// LockAcquireExtended — shared-table grant decision (condensed)
proclock = SetupLockInTable(lockMethodTable, MyProc, locktag, hashcode, lockmode);
/* ... */
locallock->proclock = proclock;
lock = proclock->tag.myLock;
locallock->lock = lock;
/* If request conflicts with waiters, must queue; else check held locks. */
if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
found_conflict = true;
else
found_conflict = LockCheckConflicts(lockMethodTable, lockmode, lock, proclock);
if (!found_conflict)
{
GrantLock(lock, proclock, lockmode); /* No conflict — granted */
waitResult = PROC_WAIT_STATUS_OK;
}
else
waitResult = JoinWaitQueue(locallock, lockMethodTable, dontWait);

The “conflicts with waiters” test first (waitMask) implements FIFO fairness: a new requester that could be granted but would jump ahead of an incompatible waiter is made to queue instead — otherwise a stream of compatible weak lockers could starve a waiting strong locker.

GrantLock is the bookkeeping that records the grant in the shared structures:

// GrantLock — src/backend/storage/lmgr/lock.c
void
GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode)
{
lock->nGranted++;
lock->granted[lockmode]++;
lock->grantMask |= LOCKBIT_ON(lockmode);
if (lock->granted[lockmode] == lock->requested[lockmode])
lock->waitMask &= LOCKBIT_OFF(lockmode);
proclock->holdMask |= LOCKBIT_ON(lockmode);
}

LockCheckConflicts: conflict test with group-locking subtraction

Section titled “LockCheckConflicts: conflict test with group-locking subtraction”
// LockCheckConflicts — src/backend/storage/lmgr/lock.c (condensed)
int conflictMask = lockMethodTable->conflictTab[lockmode];
/* global conflict: does any granted mode conflict with my request? */
if (!(conflictMask & lock->grantMask))
return false; /* nobody conflicts — granted */
/* Subtract out locks I already hold myself. */
myLocks = proclock->holdMask;
for (i = 1; i <= numLockModes; i++)
{
if ((conflictMask & LOCKBIT_ON(i)) == 0) { conflictsRemaining[i] = 0; continue; }
conflictsRemaining[i] = lock->granted[i];
if (myLocks & LOCKBIT_ON(i))
--conflictsRemaining[i];
totalConflictsRemaining += conflictsRemaining[i];
}
if (totalConflictsRemaining == 0)
return false; /* only my own locks conflicted */
/* If no group locking, it's definitely a conflict. */
if (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)
return true;
/* ... otherwise subtract conflicts held by my parallel lock-group peers ... */

The first AND is the common-case fast test. The subtractions handle two “conflicts that aren’t really conflicts”: a process never conflicts with its own held locks (you can take a stronger lock while holding a weaker one), and — since parallel query — members of the same lock group (a leader and its parallel workers) treat each other’s locks as non-conflicting, with one exception: LOCKTAG_RELATION_EXTEND conflicts even within a group. Group locking exists because a parallel worker would otherwise self-deadlock against a lock its own leader holds; the README §“Group Locking” lays out why predicting workers’ lock requests was rejected in favor of declaring intra-group locks non-conflicting.

A read-mostly workload hammering one table sends every backend into the same partition lock — “measurable even on 2-core servers, and very pronounced as core count increases” (README). The fast path (since 9.2) lets a backend record a weak relation lock (AccessShareLock, RowShareLock, RowExclusiveLock — all below ShareUpdateExclusiveLock) in a private array in its own PGPROC, never touching the shared hash:

// PGPROC fast-path fields — src/include/storage/proc.h
LWLock fpInfoLock; /* protects per-backend fast-path state */
uint64 *fpLockBits; /* lock modes held for each fast-path slot */
Oid *fpRelId; /* slots for rel oids */

Slots are grouped (FP_LOCK_SLOTS_PER_GROUP = 16 per group), and a relation hashes to a group via FAST_PATH_REL_GROUP. Three bits per slot encode which of the three weak modes are held. Eligibility and the grant are:

// EligibleForRelationFastPath + grant — src/backend/storage/lmgr/lock.c (condensed)
#define EligibleForRelationFastPath(locktag, mode) \
((locktag)->locktag_lockmethodid == DEFAULT_LOCKMETHOD && \
(locktag)->locktag_type == LOCKTAG_RELATION && \
(locktag)->locktag_field1 == MyDatabaseId && \
MyDatabaseId != InvalidOid && \
(mode) < ShareUpdateExclusiveLock)
if (EligibleForRelationFastPath(locktag, lockmode) && /* group not full */ )
{
uint32 fasthashcode = FastPathStrongLockHashPartition(hashcode);
LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
if (FastPathStrongRelationLocks->count[fasthashcode] != 0)
acquired = false; /* a strong locker is present — fall back */
else
acquired = FastPathGrantRelationLock(locktag->locktag_field2, lockmode);
LWLockRelease(&MyProc->fpInfoLock);
if (acquired) { locallock->lock = NULL; locallock->proclock = NULL;
GrantLockLocal(locallock, owner); return LOCKACQUIRE_OK; }
}

The correctness hinge is FastPathStrongRelationLocks, a 1024-way array of strong-lock counts (a “1024-way partitioning of the lock space”, README):

// FastPathStrongRelationLockData — src/backend/storage/lmgr/lock.c
#define FAST_PATH_STRONG_LOCK_HASH_PARTITIONS (1 << 10) /* = 1024 */
typedef struct {
slock_t mutex;
uint32 count[FAST_PATH_STRONG_LOCK_HASH_PARTITIONS];
} FastPathStrongRelationLockData;
static volatile FastPathStrongRelationLockData *FastPathStrongRelationLocks;

A weak locker may use the fast path only when its partition’s strong count is zero. When a strong locker (> ShareUpdateExclusiveLock) arrives, it does the reverse: bump the strong count, then transfer every backend’s matching fast-path entries into the shared table so the conflict and deadlock machinery can see them:

sequenceDiagram
    participant W as Weak locker (DML)
    participant FP as MyProc fast-path array
    participant CNT as FastPathStrongRelationLocks
    participant S as Strong locker (DDL)
    participant SHM as Shared LOCK table

    Note over W,CNT: normal case — no strong lock present
    W->>CNT: read count[partition] == 0
    W->>FP: FastPathGrantRelationLock (set bit), return OK

    Note over S,SHM: strong lock arrives
    S->>CNT: BeginStrongLockAcquire: count[partition]++
    S->>FP: FastPathTransferRelationLocks: scan every PGPROC
    FP->>SHM: SetupLockInTable + GrantLock for each matching fp entry
    S->>SHM: now conflict-check against the (now complete) shared table

Figure 3 — Fast-path interaction. The memory-ordering guarantee (README): LWLock acquisition is a sequence point, so a weak locker that holds its own fpInfoLock and reads a zero strong count is safe — the strong locker, to transfer locks, must acquire every backend’s fpInfoLock in turn and will therefore see any weak lock taken after its own count bump. “Deadlock detection does not need to examine the fast-path data structures, because any lock that could possibly be involved in a deadlock must have been transferred to the main tables beforehand.”

When a request conflicts, JoinWaitQueue decides where in LOCK.waitProcs the backend goes. The default is the tail (FIFO), but there is one exception that pre-empts a deadlock: if the requester already holds a lock that conflicts with some earlier waiter’s request, it inserts itself just before that waiter (otherwise ProcLockWakeup would never wake it before the earlier waiter, manufacturing a wait cycle the deadlock detector would have to untangle):

// JoinWaitQueue — src/backend/storage/lmgr/proc.c (condensed)
dclist_foreach(iter, waitQueue)
{
PGPROC *proc = dlist_container(PGPROC, links, iter.cur);
/* Must he wait for me? (my held locks conflict with his request) */
if (lockMethodTable->conflictTab[proc->waitLockMode] & myHeldLocks)
{
/* Must I wait for him? (his held locks conflict with my request) */
if (lockMethodTable->conflictTab[lockmode] & proc->heldLocks)
{
RememberSimpleDeadLock(MyProc, lockmode, lock, proc);
early_deadlock = true; /* immediate, no timeout wait */
break;
}
/* I go before him; if I conflict with nothing ahead, grant now */
if ((lockMethodTable->conflictTab[lockmode] & aheadRequests) == 0 &&
!LockCheckConflicts(lockMethodTable, lockmode, lock, proclock))
{
GrantLock(lock, proclock, lockmode);
return PROC_WAIT_STATUS_OK;
}
insert_before = proc;
break;
}
aheadRequests |= LOCKBIT_ON(proc->waitLockMode);
}

The PGPROC wait fields (waitLock, waitProcLock, waitLockMode, heldLocks) are the outgoing edge of the WFG: they record exactly what this backend is blocked on. Once queued, ProcSleep arms the deadlock_timeout timer and blocks on the backend’s semaphore until granted or signaled:

// ProcSleep — src/backend/storage/lmgr/proc.c (the timeout arming, condensed)
deadlock_state = DS_NOT_YET_CHECKED;
got_deadlock_timeout = false;
/* Set a timer so CheckDeadLock() fires after deadlock_timeout ms. */
/* ... enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) ... */

When the holder releases, ProcLockWakeup walks the wait queue front-to-back and wakes each waiter whose request now conflicts with neither held locks nor the requests of earlier un-wakable waiters — preserving arrival order for conflicting requests:

// ProcLockWakeup — src/backend/storage/lmgr/proc.c (condensed)
dclist_foreach_modify(miter, waitQueue)
{
PGPROC *proc = dlist_container(PGPROC, links, miter.cur);
LOCKMODE lockmode = proc->waitLockMode;
if ((lockMethodTable->conflictTab[lockmode] & aheadRequests) == 0 &&
!LockCheckConflicts(lockMethodTable, lockmode, lock, proc->waitProcLock))
{
GrantLock(lock, proc->waitProcLock, lockmode);
ProcWakeup(proc, PROC_WAIT_STATUS_OK);
}
else
aheadRequests |= LOCKBIT_ON(lockmode); /* keep blocking those behind */
}

If deadlock_timeout elapses before the lock is granted, the signal handler runs CheckDeadLock, which locks all 16 partitions and calls DeadLockCheck. The graph is the standard WFG: a node per process, an edge A→B when A waits for a lock B holds (or, more subtly, when A is behind B in a wait queue for conflicting requests). PostgreSQL distinguishes two edge kinds:

  • Hard edge — B already holds a lock conflicting with A’s request. This cannot be undone without aborting someone.
  • Soft edge — B is merely ahead of A in a wait queue with a conflicting request, so ProcLockWakeup would grant B first. This can be undone by reordering the wait queue.
flowchart TD
  A["CheckDeadLock fires<br/>(deadlock_timeout elapsed)"] --> B["lock all 16 partitions<br/>in partition order"]
  B --> C{"still on wait queue?"}
  C -- "no (woken meanwhile)" --> Z["release locks, resume"]
  C -- "yes" --> D["DeadLockCheck →<br/>DeadLockCheckRecurse"]
  D --> E["FindLockCycle: DFS along<br/>hard + soft waits-for edges"]
  E --> F{"cycle through<br/>start point?"}
  F -- "no cycle" --> G["DS_NO_DEADLOCK<br/>(maybe DS_BLOCKED_BY_AUTOVACUUM)"]
  F -- "cycle, all hard edges" --> H["DS_HARD_DEADLOCK<br/>→ abort this transaction"]
  F -- "cycle w/ soft edges" --> I["try reversing a soft edge:<br/>TopoSort the wait queue"]
  I --> J{"rearrangement<br/>removes all cycles<br/>w/o making new ones?"}
  J -- "yes" --> K["DS_SOFT_DEADLOCK:<br/>apply new wait order,<br/>ProcLockWakeup"]
  J -- "no (exhausted)" --> H
  G --> Z
  K --> Z

Figure 4 — Deadlock detection state space. The recursive search (DeadLockCheckRecurse) tries each soft-edge reversal as a topological-sort constraint, recursively, until it either finds a queue rearrangement that eliminates every cycle or proves none exists (hard deadlock). Only then is a transaction aborted.

FindLockCycleRecurse is the DFS. It marks visited procs; returning to index 0 (the start proc) is a deadlock cycle, while a cycle that does not include the start point is “somebody else’s deadlock” and ignored (the process that closed that cycle will detect it):

// FindLockCycleRecurse — src/backend/storage/lmgr/deadlock.c (condensed)
for (i = 0; i < nVisitedProcs; i++)
{
if (visitedProcs[i] == checkProc)
{
if (i == 0) /* returned to start: real cycle */
{
nDeadlockDetails = depth;
return true;
}
return false; /* cycle not through start: ignore */
}
}
visitedProcs[nVisitedProcs++] = checkProc;

The hard-edge scan walks the awaited lock’s procLocks for holders with a conflicting holdMask; the soft-edge scan walks the wait queue (or a hypothetical reordered queue) for earlier waiters with conflicting requests:

// FindLockCycleRecurseMember — hard-edge scan (condensed)
dlist_foreach(proclock_iter, &lock->procLocks)
{
PROCLOCK *proclock = dlist_container(PROCLOCK, lockLink, proclock_iter.cur);
proc = proclock->tag.myProc;
leader = proc->lockGroupLeader == NULL ? proc : proc->lockGroupLeader;
if (leader != checkProcLeader) /* never blocks self/group */
for (lm = 1; lm <= numLockModes; lm++)
if ((proclock->holdMask & LOCKBIT_ON(lm)) &&
(conflictMask & LOCKBIT_ON(lm)))
{
if (FindLockCycleRecurse(proc, depth + 1, softEdges, nSoftEdges))
return true; /* hard edge in a cycle */
/* note an autovacuum directly hard-blocking us, for cancel */
if (checkProc == MyProc && proc->statusFlags & PROC_IS_AUTOVACUUM)
blocking_autovacuum_proc = proc;
break;
}
}

Three outcomes follow (DeadLockState): DS_NO_DEADLOCK, DS_SOFT_DEADLOCK (resolved by TopoSort-driven queue rearrangement — see README for the edge-reversal search), and DS_HARD_DEADLOCK (abort the start-point transaction via DeadLockReport, which raises ERROR and never returns). A fourth code, DS_BLOCKED_BY_AUTOVACUUM, is a special “no deadlock, but an autovacuum worker is in our way” signal — the detector is “abused” (README) to implement autovacuum’s low locking priority: the blocked backend can cancel the autovacuum rather than wait.

Two important guarantees the README proves: no deadlock is ever missed by asynchronous checking (the last process to close a cycle will always detect it when its own timeout fires), and the soft-edge rearrangement never creates a new deadlock (it is validated by re-running FindLockCycle before being applied).

LockRelease decrements the LOCALLOCK count; only when it reaches zero does it touch shared memory (or clear the fast-path bit). LockReleaseAll, called at transaction end, sweeps the backend’s LOCALLOCK hash and the fast-path array in one pass, then ProcLockWakeups each freed object’s queue. Because transaction locks live to commit/abort, this is where strict-2PL’s shrinking phase happens — all at once.

Anchor on symbol names, not line numbers. The lmgr source moves between releases; function/struct names are the stable handle. Use git grep -n '<symbol>' src/backend/storage/lmgr/ to locate the current position. Line numbers in the table below were observed at commit 273fe94 (REL_18) and are hints only.

  • struct LOCKTAG, enum LockTagType (in lock.h) — the 16-byte object key and its 12 object types; SET_LOCKTAG_* macros encode each type.
  • struct LOCK (in lock.h) — per-object record: grantMask, waitMask, granted[]/requested[], procLocks, waitProcs.
  • struct PROCLOCK + PROCLOCKTAG (in lock.h) — per-backend-per-object: holdMask, groupLeader, the two dlist_node links.
  • struct LOCALLOCK + LOCALLOCKTAG (in lock.h) — backend-private cache keyed by (object, mode); nLocks, lockOwners[].
  • enum DeadLockState (in lock.h) — the four detector verdicts.
  • lock-mode constants AccessShareLockAccessExclusiveLock, MaxLockMode (in lockdefs.h).
  • struct PGPROC fast-path + wait fields fpRelId/fpLockBits/fpInfoLock, waitLock/waitProcLock/waitLockMode/heldLocks (in proc.h).

Conflict semantics (src/backend/storage/lmgr/lock.c)

Section titled “Conflict semantics (src/backend/storage/lmgr/lock.c)”
  • LockConflicts[] — the 8×8 conflict bit matrix.
  • lock_mode_names[], default_lockmethod, user_lockmethod, LockMethods[] — the lock-method tables.
  • DoLockModesConflict — public conflict test for two modes.
  • LockCheckConflicts — full conflict test with self- and group-subtraction.
  • LockTagHashCode — hash of a LOCKTAG; low 4 bits are the partition.
  • proclock_hash / ProcLockHashCode — forces a PROCLOCK into its LOCK’s partition.
  • LockHashPartition, LockHashPartitionLock (macros in lock.h).
  • LockAcquire / LockAcquireExtended — the acquisition state machine.
  • SetupLockInTable — find/create the shared LOCK and PROCLOCK.
  • GrantLock / GrantLockLocal — record a grant in shared / local state.
  • UnGrantLock / CleanUpLock — the inverse; garbage-collect empty objects.
  • FastPathGrantRelationLock / FastPathUnGrantRelationLock — manipulate the per-backend array.
  • FastPathTransferRelationLocks — flush all backends’ fast-path entries to the shared table when a strong locker arrives.
  • FastPathStrongRelationLockData / FastPathStrongLockHashPartition — the 1024-way strong-lock counter.
  • BeginStrongLockAcquire / FinishStrongLockAcquire / AbortStrongLockAcquire.
  • JoinWaitQueue — choose queue position; detect the immediate (early) deadlock.
  • ProcSleep — arm deadlock_timeout, block on the semaphore.
  • ProcLockWakeup — wake wakable waiters in arrival order on release.
  • CheckDeadLock — timeout handler: lock all partitions, run the detector.
  • RemoveFromWaitQueue (in lock.c) — dequeue a proc (deadlock victim / cancel).
  • DeadLockCheck — entry point; returns a DeadLockState, applies soft fixes.
  • DeadLockCheckRecurse / TestConfiguration — search soft-edge reversals.
  • FindLockCycle / FindLockCycleRecurse / FindLockCycleRecurseMember — the WFG DFS over hard and soft edges.
  • ExpandConstraints / TopoSort — build a hypothetical reordered wait queue.
  • DeadLockReport — raise the ERROR that aborts the victim.
  • InitDeadLockChecking — preallocate worst-case workspace from MaxBackends.
  • LockRelationOid / UnlockRelationOid / LockRelation — the relation-lock entry points most of the backend actually calls.

Position hints (as of 2026-06-05, REL_18 273fe94)

Section titled “Position hints (as of 2026-06-05, REL_18 273fe94)”
SymbolFileLine
LockConflicts[]lock.c65
lock_mode_names[]lock.c108
LockMethods[]lock.c150
FastPathStrongRelationLockDatalock.c306
LockTagHashCodelock.c556
proclock_hashlock.c573
DoLockModesConflictlock.c622
LockAcquirelock.c808
LockAcquireExtendedlock.c835
SetupLockInTablelock.c1282
LockCheckConflictslock.c1528
GrantLocklock.c1657
UnGrantLocklock.c1680
CleanUpLocklock.c1737
LockReleaselock.c2070
LockReleaseAlllock.c2275
FastPathGrantRelationLocklock.c2750
FastPathTransferRelationLockslock.c2829
VirtualXactLocklock.c4713
JoinWaitQueueproc.c1173
ProcSleepproc.c1342
ProcLockWakeupproc.c1772
CheckDeadLockproc.c1820
DeadLockCheckdeadlock.c220
DeadLockCheckRecursedeadlock.c312
FindLockCycledeadlock.c446
FindLockCycleRecursedeadlock.c457
FindLockCycleRecurseMemberdeadlock.c536
TopoSortdeadlock.c862
DeadLockReportdeadlock.c1075
LockRelationOidlmgr.c107
struct LOCKTAGlock.h165
struct LOCKlock.h309
struct PROCLOCKlock.h370
struct LOCALLOCKlock.h427
enum DeadLockStatelock.h509
AccessShareLockMaxLockModelockdefs.h36
NUM_LOCK_PARTITIONSlwlock.h97
FP_LOCK_SLOTS_PER_GROUPproc.h98

Each entry leads with a fact about the current source at commit 273fe94 (REL_18_STABLE), readable without any external material. The trailing sentence records how it was checked. Open questions follow as recorded gaps.

  • There are exactly 8 standard lock modes plus NoLock (0). Verified in lockdefs.h: AccessShareLock=1AccessExclusiveLock=8, MaxLockMode=8. MAX_LOCKMODES=10 in lock.h is the array sizing constant (≥ modes + slack), not the mode count. InplaceUpdateTupleLock is #defined to ExclusiveLock, i.e. a reuse, not a 9th mode.

  • DEFAULT and USER lock methods share one conflict table. Both default_lockmethod and user_lockmethod point at the same LockConflicts[] and lock_mode_names[] (lock.c). They differ only in their debug trace flag and in lifetime rules enforced elsewhere; the mode semantics are identical.

  • The shared lock hash is partitioned exactly 16 ways. LOG2_NUM_LOCK_PARTITIONS = 4NUM_LOCK_PARTITIONS = 16 (lwlock.h). The partition is hashcode % 16 (LockHashPartition), and proclock_hash mixes the proc pointer in above the low 4 bits to keep a PROCLOCK in its LOCK’s partition. Confirmed by reading both the macro and proclock_hash.

  • Fast-path eligibility is strictly the three weak relation modes on an unshared relation in the current database. EligibleForRelationFastPath requires DEFAULT_LOCKMETHOD, LOCKTAG_RELATION, locktag_field1 == MyDatabaseId, and mode < ShareUpdateExclusiveLock. ShareUpdateExclusiveLock is excluded because it is self-conflicting; the comment notes it “does not conflict with any of the locks that do, so we can ignore it completely.”

  • The fast-path group count is FP_LOCK_SLOTS_PER_GROUP = 16 slots per group, marked “don’t change”. Verified in proc.h. The number of groups per backend (FastPathLockGroupsPerBackend) is derived at startup from max_locks_per_transaction, up to FP_LOCK_GROUPS_PER_BACKEND_MAX = 1024. This is a PG18-era refinement: earlier releases had a fixed 16-slot array.

  • The strong-lock counter is a 1024-way array, not per-partition-lock. FAST_PATH_STRONG_LOCK_HASH_PARTITIONS = 1 << 10 = 1024 (lock.c), a finer-grained partitioning than the 16 lock-table partitions, so a strong lock on one relation rarely disables the fast path for unrelated relations.

  • Deadlock detection is deferred until deadlock_timeout and runs under all 16 partition locks. ProcSleep arms the timer; CheckDeadLock acquires LockHashPartitionLockByIndex(i) for all i in 0..15 in order before calling DeadLockCheck. Verified by reading both functions. The fixed ascending order is what prevents LWLock deadlock among the partition locks.

  • The detector resolves soft cycles by wait-queue rearrangement before ever aborting. DeadLockCheck returns DS_SOFT_DEADLOCK and rewrites lock->waitProcs from waitOrders[] when a topological reordering removes all cycles; only an unresolvable (all-hard) cycle yields DS_HARD_DEADLOCK and an abort. Verified in DeadLockCheck / DeadLockCheckRecurse.

  • LOCKTAG_RELATION_EXTEND conflicts even within a parallel lock group, and can never be in a deadlock cycle. LockCheckConflicts special-cases it before the group-subtraction loop; FindLockCycleRecurseMember returns early for it. The IsRelationExtensionLockHeld assert enforces that no other heavyweight lock is taken while holding it, so it cannot extend a wait chain.

  • The deadlock detector is reused to implement autovacuum cancellation. A DS_BLOCKED_BY_AUTOVACUUM return with blocking_autovacuum_proc set lets the blocked backend signal-cancel a directly-blocking autovacuum worker rather than wait. Verified in FindLockCycleRecurseMember (the PROC_IS_AUTOVACUUM branch) and GetBlockingAutoVacuumPgproc.

  1. Memory-ordering correctness of the fast-path/strong-lock handshake under weak memory models. The README argues the LWLock-as-sequence-point guarantee on SMP, but the exact set of barriers and whether anything beyond LWLock acquire/release is relied upon on ARM/POWER is not traced here. Investigation path: read BeginStrongLockAcquire/FastPathTransferRelationLocks alongside the LWLock memory-barrier semantics in postgres-lwlock-spinlock.md.

  2. Worst-case cost of soft-edge rearrangement search. DeadLockCheckRecurse recursively tries soft-edge reversals; the README claims combinatorial blowup “shouldn’t be a big problem in practice” because cycles are few and small, but the actual bound as a function of wait-queue length and group size is not established. Investigation path: instrument nCurConstraints / nPossibleConstraints under a synthetic many-way soft-deadlock workload.

  3. Interaction of group locking with non-relation locks under future parallel writes. The README notes that declaring intra-group locks non-conflicting is safe only because parallel mode is currently read-only, and flags GIN page locks as a hazard if parallel writes are enabled. Whether REL_18 has tightened any of this is unverified here. Investigation path: cross-check against access/transam/README.parallel and any PG18 parallel-write work.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”

Pointers, not analysis. Each bullet is a handle for a follow-up doc.

  • A Scalable Lock Manager for Multicores (Jung, Han, Fekete, Heiser, Yeom, SIGMOD 2013) — the anchor paper. It diagnoses MySQL’s throughput collapse as lock-table latch contention and proposes staged lock allocation: split allocation/de-allocation (done in bulk, asynchronously, as garbage collection) from manipulation (done latch-free with RAW memory barriers). PostgreSQL attacks the same bottleneck from the opposite end — partitioning plus a fast path that avoids the shared table for weak locks — rather than making the shared table itself latch-free. A side-by-side study of “avoid the table” vs. “make the table latch-free” would quantify what each gives up.

  • CUBRID’s heavyweight lock manager (cubrid-lock-manager.md) — CUBRID uses lk_res/lk_entry records analogous to PG’s LOCK/PROCLOCK, but takes row-level locks for SERIALIZABLE (first-updater-wins) where PG leans on MVCC plus SSI predicate locks. CUBRID’s victim policy is “most-recently-blocked”; PG aborts the cycle-closing process. A comparison of the two deadlock-victim policies and granularity choices is a natural follow-up.

  • SSI predicate locking (postgres-ssi-predicate-locking.md; Cahill 2008, Ports & Grittner 2012) — the other lock manager. SIReadLocks are a separate subsystem (README-SSI) that detects rw-antidependency cycles to make SI serializable, complementing (not replacing) the heavyweight manager analyzed here. The boundary — what each subsystem is responsible for — is worth a dedicated cross-walk.

  • Wait-die / wound-wait timestamp ordering (Database System Concepts §18.2) — deadlock prevention schemes that PostgreSQL rejected in favor of detection. Yu et al., Staring into the Abyss (VLDB 2015), benchmarks detection vs. prevention at 1000 cores; relevant if PG’s all-partition-lock detection becomes a scaling limit.

  • Lock-table-free / optimistic designs (Hekaton, Silo, Cicada) — in-memory engines that replace the central lock table with per-record metadata or optimistic validation. Orthogonal to PG’s disk-resident design but instructive for separating costs intrinsic to locking from costs intrinsic to a centralized lock table.

  • src/backend/storage/lmgr/README — the authoritative design doc: lock data structures, internal partitioned locking, fast-path locking, the deadlock algorithm (hard/soft edges, topological-sort rearrangement), group locking, user locks, hot-standby locking.

PostgreSQL source (under /data/hgryoo/references/postgres, REL_18 273fe94)

Section titled “PostgreSQL source (under /data/hgryoo/references/postgres, REL_18 273fe94)”
  • src/backend/storage/lmgr/lock.c — the primary lock mechanism.
  • src/backend/storage/lmgr/proc.c — wait queue, ProcSleep, CheckDeadLock.
  • src/backend/storage/lmgr/deadlock.c — the WFG detector and soft-edge solver.
  • src/backend/storage/lmgr/lmgr.c — relation-lock wrappers.
  • src/include/storage/lock.h, src/include/storage/lockdefs.h — structures and lock-mode constants.
  • src/include/storage/proc.h, src/include/storage/lwlock.h — PGPROC fast-path/wait fields and partition constants.

Papers (under knowledge/research/dbms-papers/)

Section titled “Papers (under knowledge/research/dbms-papers/)”
  • scalable-lock-manager.md — Jung et al., A Scalable Lock Manager for Multicores, SIGMOD 2013.

Textbook chapters (under knowledge/research/dbms-general/)

Section titled “Textbook chapters (under knowledge/research/dbms-general/)”
  • Database System Concepts (Silberschatz et al., 7e), ch. 18 “Concurrency Control” — §18.1 lock-based protocols & 2PL, §18.2 deadlock handling (prevention vs. detection, waits-for graph), §18.3 multiple granularity & intention locks.

Sibling docs (cross-references, mechanism not duplicated here)

Section titled “Sibling docs (cross-references, mechanism not duplicated here)”
  • postgres-lwlock-spinlock.md — the lightweight locks the partition latches and fpInfoLock are built from.
  • postgres-ssi-predicate-locking.md — the separate SIReadLock subsystem.
  • postgres-shared-memory-ipc.md — where the shared hashes and PGPROC array live.
  • postgres-backend-lifecycle.md — the ResourceOwner tree behind LOCALLOCK.lockOwners and LockReleaseAll.