
PostgreSQL ResourceOwner — Hierarchical Resource Tracking and Error-Safe Release


Every database backend that executes a SQL statement acquires a swarm of short-lived, query-lifespan resources: it pins buffers so the buffer manager won’t evict pages out from under a scan, it takes lock-manager locks on the relations and tuples it touches, it bumps reference counts on cached relation descriptors and catalog-cache entries, it opens transient files for sorts and hash spills, it registers snapshots, it allocates JIT-compiled expression modules and DSM segments for parallel workers. Each of these is a claim on a finite shared pool that must be returned, and the central correctness problem is not the happy path — where the executor tidily releases each resource as it finishes — but the error path. A SQL query can fail at essentially any C statement: a palloc hits the memory ceiling, a datatype input function rejects a literal, a unique-index insertion raises a constraint violation, a SIGINT arrives mid-scan. When that happens, the fifty nested C function frames that were holding pins and references are torn down by a longjmp, and none of their carefully-placed “release” calls run. If the resources those frames held were tracked only in the frames’ own local variables, they would leak — a leaked buffer pin pins a page forever, a leaked lock blocks every other backend, a leaked relcache reference corrupts invalidation.

The conceptual answer is the same one that systems languages reach for under the name RAII (Resource Acquisition Is Initialization): bind every resource’s lifetime to the lifetime of some owner object, and make tearing down the owner automatically release everything it owns. C++ does this with stack-allocated objects whose destructors run during stack unwinding; Rust does it with Drop. C has neither destructors nor unwinding, so PostgreSQL builds the equivalent by hand. It interposes, between the resource and the C stack frame that acquired it, a heap-allocated bookkeeping object — the ResourceOwner — that records “this owner now holds this pin / lock / reference.” The owner is not tied to a C stack frame; it is tied to a transaction, subtransaction, or portal, which are precisely the scopes at whose boundaries PostgreSQL wants resources reclaimed. When a transaction ends — commit or abort — its ResourceOwner is walked and every still-held resource is released by a kind-specific callback. The error path then becomes trivial: a longjmp lands in the abort handler, which releases the top transaction’s ResourceOwner, which frees everything every failed frame forgot.
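The mechanism described above can be sketched in a few lines of plain C. This is a toy model under stated assumptions, not PostgreSQL code: `ToyOwner`, `toy_pin`, `toy_unpin`, `toy_release_all`, and `toy_run` are hypothetical names, the "buffer pool" is an array of counters, and the error path uses bare `setjmp`/`longjmp` rather than PostgreSQL's error machinery.

```c
#include <assert.h>
#include <setjmp.h>

#define TOY_MAX_HELD 8
#define TOY_NPAGES   4

/* Hypothetical stand-in for an owner: it records which pages are pinned. */
typedef struct ToyOwner
{
    int held[TOY_MAX_HELD];
    int nheld;
} ToyOwner;

static int      toy_pin_count[TOY_NPAGES];  /* toy "shared" pin counts */
static ToyOwner toy_owner;      /* static, so longjmp cannot clobber it */
static jmp_buf  toy_error_jmp;

static void
toy_pin(int page)
{
    assert(toy_owner.nheld < TOY_MAX_HELD);
    toy_pin_count[page]++;
    toy_owner.held[toy_owner.nheld++] = page;   /* register with the owner */
}

static void
toy_unpin(int page)
{
    for (int i = toy_owner.nheld - 1; i >= 0; i--)
    {
        if (toy_owner.held[i] == page)
        {
            toy_owner.held[i] = toy_owner.held[--toy_owner.nheld];
            toy_pin_count[page]--;
            return;
        }
    }
    assert(0 && "page not owned");
}

/* Abort path: release whatever the unwound frames never got to release. */
static void
toy_release_all(void)
{
    while (toy_owner.nheld > 0)
        toy_pin_count[toy_owner.held[--toy_owner.nheld]]--;
}

/* A "scan" that pins page 1 and may fail before its own unpin runs. */
static void
toy_scan(int fail)
{
    toy_pin(1);
    if (fail)
        longjmp(toy_error_jmp, 1);  /* skips the toy_unpin below */
    toy_unpin(1);
}

/* Returns the pin count left on page 1 once the "transaction" has ended. */
int
toy_run(int fail)
{
    toy_owner.nheld = 0;
    toy_pin_count[1] = 0;
    if (setjmp(toy_error_jmp) == 0)
        toy_scan(fail);
    else
        toy_release_all();          /* the error handler sweeps the owner */
    return toy_pin_count[1];
}
```

Both `toy_run(0)` and `toy_run(1)` leave the pin count at zero: in the failing case the frame's own `toy_unpin` never runs, and the sweep over the owner is what returns the pin.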

There is a second, subtler theoretical requirement: ordering. Resources are not independent. A pinned buffer is visible to other backends through the shared buffer descriptor’s refcount; a held lock is something other backends are blocked waiting on. If a committing transaction released its locks before its buffer pins, another backend could acquire the freed lock, look at the relation, and find pages still pinned (and thus possibly mid-modification) by the “finished” transaction — a backend that, from the lock’s perspective, has already left. So the release must be phased: everything visible to other backends (pins) goes before the locks; backend-internal cleanup (catalog caches, transient files) goes after. This is the same ordering discipline ARIES (Mohan et al. 1992, captured in knowledge/research/dbms-papers/aries.md) imposes on commit/abort: locks are the last thing released to other transactions, and only after the transaction’s externally-visible state is consistent. The ResourceOwner’s three-phase release is the operational embodiment of that rule inside a single backend.

The standard texts treat this only obliquely. Database System Concepts (Silberschatz 7e, ch. 17–18) discusses transactions as the unit of atomicity and recovery, and Database Internals (Petrov 2019, ch. 5–6) discusses buffer management and lock lifetimes, but neither names the in-backend bookkeeping object that ties resource lifetime to transaction scope. That object is an implementation invention, and PostgreSQL’s resowner/README is explicit that its design was modeled on the MemoryContext API — another PostgreSQL invention that solved the analogous problem for heap allocations. The two are deliberately separate (memory leaks and resource leaks have different usage patterns) but share the same shape: a tree of scopes, a “current” pointer, and a recursive bulk-free at scope exit.

Every serious DBMS confronts the leak-on-error problem; they differ in what language mechanism they lean on and how explicit the bookkeeping is.

Garbage-collected / managed engines (Java H2, Derby; C# engines) get memory reclamation for free but still need explicit release for non-memory resources (locks, file handles, latches). They typically use try/finally blocks or language using/try-with-resources constructs, which run cleanup code during exception propagation. This is RAII-by-language-runtime: the unwinding machinery is built in, and each frame’s finally releases what that frame holds. The cost is that the discipline is distributed — every frame must remember its own cleanup — and a missing finally silently leaks.

C++ engines (much of the storage layer of systems written in modern C++) use RAII directly: a BufferGuard or LockGuard is a stack object whose destructor unpins/unlocks, and C++ stack unwinding during exception propagation runs the destructors automatically. This is elegant and local — the resource and its release live in one type — but it ties resource lifetime to C stack scope, which is not always the scope you want. A buffer that must stay pinned across several function returns but be released at transaction end cannot live in a stack guard; you need a heap-resident owner anyway.

C engines — PostgreSQL, SQLite, and the many engines descended from Berkeley/Ingres lineages — have no destructors and (in PostgreSQL’s case) no true exceptions, only setjmp/longjmp. They therefore cannot rely on the compiler to run cleanup during unwinding, and must maintain an explicit, centralized registry of outstanding resources that a single error handler walks. SQLite, being single-threaded-per-connection and arena-ish, largely sidesteps this by scoping almost everything to the sqlite3 connection object and its prepared statements. PostgreSQL, being a heavily concurrent, multi-resource engine, needs something richer: a forest of owners mirroring the transaction/subtransaction/portal nesting, with a “current owner” global that acquisition primitives consult automatically so call sites don’t have to pass an owner around.

Three design axes recur across these implementations:

  1. Granularity of the owner. Tie it to the C frame (C++ guards), to the statement, or to the transaction/savepoint? PostgreSQL chooses the transaction/subtransaction/portal, because those are the recovery boundaries and the points where it actually wants bulk reclamation.

  2. Phasing of release. Engines that hold locks until end-of-transaction (two-phase locking, which is essentially all of them) must release backend-visible state before locks. Most encode this as an ordered list of cleanup steps in the commit/abort routine; PostgreSQL encodes it declaratively as a per-resource-kind phase + priority and lets the ResourceOwner sort.

  3. The lock special case. Locks held by a subtransaction or portal that commits must not be released — they belong to the enclosing transaction now. So the “release” of a child owner is, for locks, really a transfer to the parent. Engines with savepoints all need this; PostgreSQL builds it into the ResourceOwner’s lock handling directly.

The unifying idea is that resource cleanup is too important to leave to the correctness of fifty hand-written release calls scattered through the executor. Localize the tracking into one module, make acquisition implicitly register with the current owner, and make scope-exit a single recursive sweep. PostgreSQL’s resowner.c is one of the cleaner realizations of that idea in any open-source engine.

PostgreSQL’s ResourceOwner is a heap object in TopMemoryContext (so it outlives any transaction’s memory context and is freed only explicitly), holding a parent pointer and a child list — a forest. Four global owners anchor the forest:

// resowner.c — the four globally-known owners (resowner.c, GLOBAL MEMORY)
ResourceOwner CurrentResourceOwner = NULL;
ResourceOwner CurTransactionResourceOwner = NULL;
ResourceOwner TopTransactionResourceOwner = NULL;
ResourceOwner AuxProcessResourceOwner = NULL;

CurrentResourceOwner is the one that matters at acquisition time: when ReadBuffer pins a page or LockAcquire takes a lock, it records the resource against whatever CurrentResourceOwner points at right now. The README is emphatic that this is NULL outside any transaction (and inside a failed transaction), and that acquiring a query-lifespan resource then is illegal. CurTransactionResourceOwner is the current (sub)transaction’s owner; TopTransactionResourceOwner is the outermost transaction’s owner (the root of the per-transaction subtree); AuxProcessResourceOwner serves non-backend auxiliary processes (checkpointer, walwriter) that have no real transactions but still pin buffers.

The shape of the forest follows the nesting exactly. A subtransaction’s owner is created as a child of its parent’s owner:

// xact.c — StartSubTransaction creates a child owner (xact.c)
s->curTransactionOwner =
    ResourceOwnerCreate(s->parent->curTransactionOwner,
                        "SubTransaction");
CurTransactionResourceOwner = s->curTransactionOwner;
CurrentResourceOwner = s->curTransactionOwner;

A portal’s owner is created as a child of the current transaction’s owner, so that when the portal closes, any locks it still holds become the transaction’s responsibility:

// portalmem.c — CreatePortal hangs the portal owner under the transaction
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
                                       "Portal");

Each tracked resource kind registers a ResourceOwnerDesc that declares when and in what order it is released, plus the callbacks to release and to debug-print it. Buffer pins, for example, are a BEFORE_LOCKS resource (they are visible to other backends) at the buffer-pin priority:

// bufmgr.c — the buffer-pin resource kind descriptor
const ResourceOwnerDesc buffer_pin_resowner_desc =
{
    .name = "buffer pin",
    .release_phase = RESOURCE_RELEASE_BEFORE_LOCKS,
    .release_priority = RELEASE_PRIO_BUFFER_PINS,
    .ReleaseResource = ResOwnerReleaseBufferPin,
    .DebugPrint = ResOwnerPrintBufferPin
};

The three phases and the built-in priorities are fixed in the header. The phase enum encodes the ordering rule directly — pins and other externally-visible resources are BEFORE_LOCKS, locks are their own phase, and backend-internal caches (catcache, plancache, tupdesc, snapshots, files, wait-event sets) are AFTER_LOCKS:

// resowner.h — the three release phases and selected built-in priorities
typedef enum
{
    RESOURCE_RELEASE_BEFORE_LOCKS = 1,
    RESOURCE_RELEASE_LOCKS,
    RESOURCE_RELEASE_AFTER_LOCKS,
} ResourceReleasePhase;

/* priorities of built-in BEFORE_LOCKS resources */
#define RELEASE_PRIO_BUFFER_IOS     100
#define RELEASE_PRIO_BUFFER_PINS    200
#define RELEASE_PRIO_RELCACHE_REFS  300

/* priorities of built-in AFTER_LOCKS resources */
#define RELEASE_PRIO_CATCACHE_REFS  100
#define RELEASE_PRIO_SNAPSHOT_REFS  500
#define RELEASE_PRIO_FILES          600

The release itself is driven from xact.c, which calls ResourceOwnerRelease on TopTransactionResourceOwner once per phase. The phasing is the caller’s responsibility, with engine cleanup (catalog invalidation, relcache cleanup) interleaved between phases at exactly the right moment relative to lock release. The error-unwind path is the same three calls with isCommit=false. This is the crux of the PG_TRY integration: a longjmp from anywhere lands in the top-level error handler, which invokes AbortTransaction; that runs these three calls and thereby frees every resource the aborted C frames left behind.

The whole design intentionally parallels MemoryContexts: a tree of scopes, a Current* pointer that acquisition primitives consult, and a recursive bulk sweep at scope exit. The difference is that ResourceOwners track typed external claims (each with a release callback), whereas MemoryContexts track raw allocations — which is why the README resists unifying them.

flowchart TD
    TT["TopTransactionResourceOwner<br/>(TopTransaction)"]
    ST["SubTransaction owner<br/>(child of parent xact)"]
    P1["Portal owner<br/>(child of CurTransactionResourceOwner)"]
    P2["Portal owner #2"]
    AUX["AuxProcessResourceOwner<br/>(separate root: checkpointer, walwriter)"]

    TT --> ST
    TT --> P1
    TT --> P2
    ST --> STP["nested Portal under subxact"]

    CUR["CurrentResourceOwner<br/>(global: who owns NEW acquisitions)"]
    CUR -.points at active scope.-> P1

    subgraph claims["resources remembered against an owner"]
        B["buffer pins (BEFORE_LOCKS)"]
        L["lmgr locks (LOCKS phase, lossy cache)"]
        R["relcache / catcache refs"]
        F["transient files, snapshots (AFTER_LOCKS)"]
        A["AIO handles (dlist, critical-section safe)"]
    end
    P1 --> claims

ResourceOwnerData is the heart of the module. Note the three storage regions: a fixed 32-slot array for the most-recent resources, a hash table it spills into, and a separate 15-entry lock cache. The releasing/sorted flags lock the owner against further Remember/Forget once release starts.

// resowner.c — ResourceOwnerData (abridged to the load-bearing fields)
struct ResourceOwnerData
{
    ResourceOwner parent;       /* NULL if no parent (toplevel owner) */
    ResourceOwner firstchild;   /* head of linked list of children */
    ResourceOwner nextchild;    /* next child of same parent */
    const char *name;           /* name (just for debugging) */

    bool        releasing;      /* release has started; no more Remember */
    bool        sorted;         /* are 'hash' and 'arr' sorted by priority? */

    uint8       nlocks;         /* number of owned locks */
    uint8       narr;           /* how many items are stored in the array */
    uint32      nhash;          /* how many items are stored in the hash */

    ResourceElem arr[RESOWNER_ARRAY_SIZE];  /* recent resources (size 32) */
    ResourceElem *hash;         /* open-addressing spill table */
    uint32      capacity;       /* allocated length of hash[] */
    uint32      grow_at;        /* grow hash when reach this */

    LOCALLOCK  *locks[MAX_RESOWNER_LOCKS];  /* lossy lock cache (size 15) */
    dlist_head  aio_handles;    /* AIO handles, registered in crit sections */
};

A ResourceElem is just { Datum item; const ResourceOwnerDesc *kind; }. The design comment at the top of the file explains the array/hash split: the common case is remember-then-forget-shortly-after (pin a buffer, read a tuple, unpin), which a linear scan of a small array serves cheaply; long-lived or numerous resources spill into the hash. Creation just zeroes the struct in TopMemoryContext and links it under its parent:

// resowner.c — ResourceOwnerCreate
ResourceOwner
ResourceOwnerCreate(ResourceOwner parent, const char *name)
{
    ResourceOwner owner;

    owner = (ResourceOwner) MemoryContextAllocZero(TopMemoryContext,
                                                   sizeof(struct ResourceOwnerData));
    owner->name = name;

    if (parent)
    {
        owner->parent = parent;
        owner->nextchild = parent->firstchild;
        parent->firstchild = owner;
    }

    dlist_init(&owner->aio_handles);

    return owner;
}

Enlarge-then-remember: the reserve-before-acquire contract


Remembering a resource is split into two calls deliberately. ResourceOwnerEnlarge guarantees room before the resource is acquired, because if making room fails (out of memory growing the hash), it must fail before you have an untracked pin in hand:

// resowner.c — ResourceOwnerEnlarge (hash-growth path abridged)
void
ResourceOwnerEnlarge(ResourceOwner owner)
{
    if (owner->releasing)
        elog(ERROR, "ResourceOwnerEnlarge called after release started");

    if (owner->narr < RESOWNER_ARRAY_SIZE)
        return;                 /* no work needed: array has room */

    /* array full: ensure the hash has space, growing (doubling) if needed */
    if (owner->narr + owner->nhash >= owner->grow_at)
    {
        uint32      newcap = (owner->capacity > 0) ? owner->capacity * 2
                                                   : RESOWNER_HASH_INIT_SIZE;
        ResourceElem *newhash = MemoryContextAllocZero(TopMemoryContext,
                                                       newcap * sizeof(ResourceElem));

        /* ... after this point we assume no failure, so scribble on owner ... */
        owner->hash = newhash;
        owner->capacity = newcap;
        owner->grow_at = RESOWNER_HASH_MAX_ITEMS(newcap);
        /* re-hash old entries, pfree old table */
    }

    /* Drain the 32-slot array into the hash so the array is free again */
    for (int i = 0; i < owner->narr; i++)
        ResourceOwnerAddToHash(owner, owner->arr[i].item, owner->arr[i].kind);
    owner->narr = 0;
}

ResourceOwnerRemember then just appends into the (now guaranteed non-full) array. It asserts that release has not started, and it raises an error if the array is full, which means the caller skipped ResourceOwnerEnlarge:

// resowner.c — ResourceOwnerRemember appends to the fast array
void
ResourceOwnerRemember(ResourceOwner owner, Datum value, const ResourceOwnerDesc *kind)
{
    /* sanity-check the descriptor and the owner's state */
    Assert(kind->release_phase != 0);
    Assert(kind->release_priority != 0);
    Assert(!owner->releasing);
    Assert(!owner->sorted);

    /* a full array means the caller forgot ResourceOwnerEnlarge */
    if (owner->narr >= RESOWNER_ARRAY_SIZE)
        elog(ERROR, "ResourceOwnerRemember called but array was full");

    owner->arr[owner->narr].item = value;
    owner->arr[owner->narr].kind = kind;
    owner->narr++;
}

The bufmgr pin path shows the contract in practice: ResourceOwnerEnlarge(CurrentResourceOwner) is called up front (alongside ReservePrivateRefCountEntry), and the actual ResourceOwnerRemember of the pin happens later once the pin is secured — the README’s warning “make sure there are no unrelated ResourceOwnerRemember calls between your Enlarge and the Remember you reserved for” is exactly about preserving the one reserved slot.
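The reserve-before-acquire contract can be illustrated in isolation. The following is a minimal sketch, assuming a fixed-size registry and a boolean return where PostgreSQL would raise an error; `ToyRegistry`, `toy_reserve`, `toy_remember`, and `toy_acquire_tracked` are hypothetical names, and the "acquisition" is just a counter increment.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SLOTS 2

typedef struct ToyRegistry
{
    int  items[TOY_SLOTS];
    int  nitems;
    bool reserved;      /* has a slot been promised to the next remember? */
} ToyRegistry;

static int toy_outstanding;    /* resources acquired and not yet released */

/* Step 1: may fail, but nothing has been acquired yet, so nothing leaks. */
static bool
toy_reserve(ToyRegistry *reg)
{
    if (reg->nitems >= TOY_SLOTS)
        return false;
    reg->reserved = true;
    return true;
}

/* Step 3: cannot fail, because step 1 already guaranteed the slot. */
static void
toy_remember(ToyRegistry *reg, int item)
{
    assert(reg->reserved);
    reg->items[reg->nitems++] = item;
    reg->reserved = false;
}

/* Reserve, then acquire, then record. Returns false if reservation failed. */
bool
toy_acquire_tracked(ToyRegistry *reg, int item)
{
    if (!toy_reserve(reg))
        return false;           /* failed before acquisition: no orphan */
    toy_outstanding++;          /* step 2: the acquisition itself */
    toy_remember(reg, item);
    return true;
}
```

With a two-slot registry, the third call to `toy_acquire_tracked` fails at the reservation step, so the count of acquired resources never exceeds the count of recorded ones. Reversing steps 1 and 2 would leave an acquired but unrecorded resource whenever the registry is full.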

ResourceOwnerForget searches the array back-to-front (the most-recent slot is the most-likely target), and on a hit swaps the last element down — an O(1) unordered removal. Only if the array misses does it probe the hash. The back-to-front scan plus swap-with-last is what makes the forget-the-just-remembered case fall out for free, which several callers rely on:

// resowner.c — ResourceOwnerForget (array scan; hash probe abridged)
void
ResourceOwnerForget(ResourceOwner owner, Datum value, const ResourceOwnerDesc *kind)
{
    if (owner->releasing)
        elog(ERROR, "ResourceOwnerForget called for %s after release started", kind->name);
    Assert(!owner->sorted);

    /* Search the array first, newest-first */
    for (int i = owner->narr - 1; i >= 0; i--)
    {
        if (owner->arr[i].item == value && owner->arr[i].kind == kind)
        {
            owner->arr[i] = owner->arr[owner->narr - 1];    /* swap last down */
            owner->narr--;
            return;
        }
    }

    /* else probe the open-addressing hash, NULL out the slot, nhash-- */
    /* ... */

    /* reached only if neither the array nor the hash held the resource */
    elog(ERROR, "%s %p is not owned by resource owner %s",
         kind->name, DatumGetPointer(value), owner->name);
}
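The newest-first scan with swap-with-last removal is easy to check on a small integer array. This is an illustrative sketch only: `ToyArray`, `toy_push`, and `toy_forget` are hypothetical names, and the probe counter exists purely to make the cost visible.

```c
#include <assert.h>

#define TOY_ARRAY_SIZE 32

typedef struct ToyArray
{
    int items[TOY_ARRAY_SIZE];
    int n;
} ToyArray;

void
toy_push(ToyArray *a, int item)
{
    assert(a->n < TOY_ARRAY_SIZE);
    a->items[a->n++] = item;
}

/*
 * Scan newest-first and overwrite the hit with the last element.
 * Returns how many slots were inspected, or -1 if the item was absent.
 */
int
toy_forget(ToyArray *a, int item)
{
    int probes = 0;

    for (int i = a->n - 1; i >= 0; i--)
    {
        probes++;
        if (a->items[i] == item)
        {
            a->items[i] = a->items[a->n - 1];   /* order is not preserved */
            a->n--;
            return probes;
        }
    }
    return -1;
}
```

After pushing 10, 20, 30, forgetting 30 takes one probe (it is the newest entry), while forgetting 10 afterwards takes two. The removal itself is constant-time in both cases; only the scan length depends on how recently the item was remembered.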

ResourceOwnerRelease is a thin wrapper over ResourceOwnerReleaseInternal, which is where the phasing, recursion, and lock special-casing live. Two structural facts dominate it: (1) it recurses into children first so that a portal/subxact is fully released before its parent within each phase; (2) on the first call it sets releasing and sorts the resources by phase+priority, after which no more Remember/Forget is allowed.

// resowner.c — ResourceOwnerReleaseInternal (children-first recursion + sort)
static void
ResourceOwnerReleaseInternal(ResourceOwner owner, ResourceReleasePhase phase,
                             bool isCommit, bool isTopLevel)
{
    ResourceOwner child;
    ResourceOwner save;

    /* Recurse to handle descendants before self */
    for (child = owner->firstchild; child != NULL; child = child->nextchild)
        ResourceOwnerReleaseInternal(child, phase, isCommit, isTopLevel);

    if (!owner->releasing)
    {
        Assert(phase == RESOURCE_RELEASE_BEFORE_LOCKS);
        owner->releasing = true;
    }

    if (!owner->sorted)
    {
        ResourceOwnerSort(owner);   /* sort by reverse phase+priority */
        owner->sorted = true;
    }

    /* Make the release callbacks see the owner being released as current */
    save = CurrentResourceOwner;
    CurrentResourceOwner = owner;

    /* ... per-phase work below ... */

    CurrentResourceOwner = save;
}
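The children-first order can be traced on a small tree. The sketch below is an assumption-laden toy, not the real data structure: `ToyNode`, `toy_link`, `toy_release`, and `toy_demo` are hypothetical names, each node carries a one-letter tag, and "releasing" a node just appends its tag to a trace string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyNode
{
    struct ToyNode *parent;
    struct ToyNode *firstchild;
    struct ToyNode *nextchild;
    char            tag;        /* one-letter name, for the trace */
} ToyNode;

/* Link a child at the head of the parent's child list. */
void
toy_link(ToyNode *child, ToyNode *parent)
{
    child->parent = parent;
    child->nextchild = parent->firstchild;
    parent->firstchild = child;
}

/* Post-order walk: every descendant is visited before the node itself. */
void
toy_release(const ToyNode *node, char *trace)
{
    for (const ToyNode *c = node->firstchild; c != NULL; c = c->nextchild)
        toy_release(c, trace);

    size_t len = strlen(trace);

    trace[len] = node->tag;
    trace[len + 1] = '\0';
}

/* Build top xact T with subxact S (holding portal Q) and portal P. */
void
toy_demo(char *trace)
{
    ToyNode t = {NULL, NULL, NULL, 'T'};
    ToyNode s = {NULL, NULL, NULL, 'S'};
    ToyNode p = {NULL, NULL, NULL, 'P'};
    ToyNode q = {NULL, NULL, NULL, 'Q'};

    toy_link(&s, &t);
    toy_link(&p, &t);   /* head insertion: P now precedes S in T's list */
    toy_link(&q, &s);

    trace[0] = '\0';
    toy_release(&t, trace);
}
```

Called with a buffer of at least five bytes, `toy_demo` produces the trace `PQST`: the portal P, then the nested portal Q before its subtransaction S, and the top transaction T last. Siblings appear in reverse creation order because children are linked at the head of the list, as in the creation code shown earlier.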

The per-phase body is where the ordering rule is enforced. BEFORE_LOCKS releases the externally-visible resources (and drains AIO handles); AFTER_LOCKS releases backend-internal ones — both via the sorted ResourceOwnerReleaseAll. The LOCKS phase is special:

// resowner.c — the LOCKS phase: bulk for top xact, transfer-or-release for children
if (phase == RESOURCE_RELEASE_LOCKS)
{
    if (isTopLevel)
    {
        /* top xact: drop ALL locks in one lmgr call at the top of recursion */
        if (owner == TopTransactionResourceOwner)
        {
            ProcReleaseLocks(isCommit);
            ReleasePredicateLocks(isCommit, false);
        }
    }
    else
    {
        /* subxact/portal: hand this owner's locks to the lock manager */
        LOCALLOCK **locks;
        int         nlocks;

        if (owner->nlocks > MAX_RESOWNER_LOCKS)     /* cache overflowed */
        {
            locks = NULL;       /* lmgr scans its own table */
            nlocks = 0;
        }
        else
        {
            locks = owner->locks;
            nlocks = owner->nlocks;
        }

        if (isCommit)
            LockReassignCurrentOwner(locks, nlocks);    /* transfer to parent */
        else
            LockReleaseCurrentOwner(locks, nlocks);     /* truly release */
    }
}

This is the lock special case the README and theory both demand: on commit of a subtransaction or portal, locks are reassigned to the parent (they must outlive the child up to end-of-transaction); on abort, they are genuinely released. The 15-entry owner->locks cache is a lossy fast path: if a child held ≤15 locks it can be reassigned/released directly from the cache, but if it overflowed, the code passes NULL and the lock manager falls back to scanning its own local-lock hash table.
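The transfer-on-commit, release-on-abort rule can be modeled without the lock manager. This is a hedged sketch: `ToyLockOwner`, `toy_lock`, and `toy_end_child` are hypothetical names, locks are plain integer ids, and the global counter stands in for the set of locks the backend actually holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOCKS 16

typedef struct ToyLockOwner
{
    struct ToyLockOwner *parent;
    int locks[TOY_MAX_LOCKS];   /* ids of locks charged to this owner */
    int nlocks;
} ToyLockOwner;

static int toy_locks_held;      /* locks the backend holds overall */

void
toy_lock(ToyLockOwner *owner, int lock_id)
{
    assert(owner->nlocks < TOY_MAX_LOCKS);
    owner->locks[owner->nlocks++] = lock_id;
    toy_locks_held++;
}

/* End a child scope: commit re-charges its locks to the parent, abort drops them. */
void
toy_end_child(ToyLockOwner *child, bool is_commit)
{
    ToyLockOwner *parent = child->parent;

    assert(parent != NULL);
    for (int i = 0; i < child->nlocks; i++)
    {
        if (is_commit)
        {
            assert(parent->nlocks < TOY_MAX_LOCKS);
            parent->locks[parent->nlocks++] = child->locks[i];
        }
        else
            toy_locks_held--;   /* the lock really goes away */
    }
    child->nlocks = 0;
}
```

If one child holding lock 101 commits and a sibling holding lock 202 aborts, the parent ends up charged with lock 101 only, and the backend's overall lock count drops from two to one. The committed child's lock is still held; only its bookkeeping has moved.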

ResourceOwnerSort consolidates the array and hash into one contiguous run and qsorts it by resource_priority_cmp, which orders by phase then priority in reverse, so that ResourceOwnerReleaseAll can release from the tail and stop as soon as it crosses into the next phase:

// resowner.c — resource_priority_cmp orders reverse so release walks from the end
static int
resource_priority_cmp(const void *a, const void *b)
{
    const ResourceElem *ra = a;
    const ResourceElem *rb = b;

    /* Note: reverse order */
    if (ra->kind->release_phase == rb->kind->release_phase)
        return pg_cmp_u32(rb->kind->release_priority, ra->kind->release_priority);
    else if (ra->kind->release_phase > rb->kind->release_phase)
        return -1;
    else
        return 1;
}

ResourceOwnerReleaseAll then walks from nitems-1 downward, invoking each kind’s ReleaseResource(value) callback, and — when printLeakWarnings is set (i.e. on commit, where the executor should have released everything itself) — emits WARNING: resource was not closed: ... using the kind’s DebugPrint. On abort, leaks are expected and silent; that asymmetry is the README’s “at commit the owner should be empty; at abort we truly rely on this mechanism.”
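The interaction between the reverse sort and the tail walk can be checked with a standalone example. The sketch below is not the PostgreSQL implementation: `ToyElem`, `toy_cmp`, `toy_sort`, and `toy_release_phase` are hypothetical names, phases and priorities are bare integers, and "releasing" an element just records its id.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyElem
{
    int phase;      /* 1 = before locks, 3 = after locks */
    int priority;   /* lower value is released earlier within a phase */
    int id;
} ToyElem;

/* Reverse order: later phases and higher priorities sort toward the front. */
static int
toy_cmp(const void *a, const void *b)
{
    const ToyElem *ea = a;
    const ToyElem *eb = b;

    if (ea->phase != eb->phase)
        return (ea->phase > eb->phase) ? -1 : 1;
    return (eb->priority > ea->priority) - (eb->priority < ea->priority);
}

void
toy_sort(ToyElem *elems, int n)
{
    qsort(elems, (size_t) n, sizeof(ToyElem), toy_cmp);
}

/*
 * Release every element of `phase` from the tail, appending ids to `out`.
 * Returns the new element count; elements of later phases stay in place.
 */
int
toy_release_phase(ToyElem *elems, int n, int phase, int *out, int *nout)
{
    while (n > 0 && elems[n - 1].phase <= phase)
    {
        out[(*nout)++] = elems[n - 1].id;
        n--;
    }
    return n;
}
```

Given elements (phase 3, priority 100, id 1), (1, 200, 2), (1, 100, 3), and (3, 600, 4), sorting and then releasing phase 1 yields ids 3 and 2 and leaves two elements; releasing phase 3 afterwards yields ids 1 and 4. Because the sort is reversed, each phase is a contiguous suffix of the array, so the walk never has to skip over entries.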

Locks bypass the array/hash entirely and live in the 15-slot cache, populated by ResourceOwnerRememberLock. The cache is intentionally lossy — once it overflows it stops tracking, trading exact accounting for cheap bulk release/reassign:

// resowner.c — ResourceOwnerRememberLock: lossy 15-entry cache
void
ResourceOwnerRememberLock(ResourceOwner owner, LOCALLOCK *locallock)
{
    Assert(locallock != NULL);

    if (owner->nlocks > MAX_RESOWNER_LOCKS)
        return;                 /* already overflowed: stop tracking */

    if (owner->nlocks < MAX_RESOWNER_LOCKS)
        owner->locks[owner->nlocks] = locallock;
    else
    {
        /* overflowed (nlocks becomes MAX+1, a sentinel) */
    }
    owner->nlocks++;
}
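The sentinel behaviour of the counter is the subtle part, so here is a small model of it. `ToyLockCache`, `toy_remember_lock`, `toy_overflowed`, and `toy_fill` are hypothetical names, and the lock pointers are dummies; only the counting logic mirrors the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOCKS 15

typedef struct ToyLockCache
{
    const void *locks[TOY_MAX_LOCKS];
    int         nlocks;     /* TOY_MAX_LOCKS + 1 means "overflowed" */
} ToyLockCache;

void
toy_remember_lock(ToyLockCache *c, const void *lock)
{
    if (c->nlocks > TOY_MAX_LOCKS)
        return;                         /* already overflowed: stop tracking */
    if (c->nlocks < TOY_MAX_LOCKS)
        c->locks[c->nlocks] = lock;     /* still room: record it */
    c->nlocks++;                        /* the 16th call sets the sentinel */
}

/* True when the cache can no longer enumerate the owner's locks. */
bool
toy_overflowed(const ToyLockCache *c)
{
    return c->nlocks > TOY_MAX_LOCKS;
}

/* Remember `n` dummy locks and report the resulting counter. */
int
toy_fill(ToyLockCache *c, int n)
{
    static const char dummy[64];

    c->nlocks = 0;
    for (int i = 0; i < n; i++)
        toy_remember_lock(c, &dummy[i % 64]);
    return c->nlocks;
}
```

Filling the cache with 15 locks leaves the counter at 15 and the cache exact. The 16th lock pushes the counter to 16, and it stays there no matter how many more are remembered, so a consumer only needs the single comparison in `toy_overflowed` to decide whether to fall back to a full scan.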

AIO handles get their own dlist because they may be remembered inside critical sections, where the normal ResourceOwnerEnlarge (which can palloc) is forbidden — so they use a no-allocation push/pop (ResourceOwnerRememberAioHandle/ForgetAioHandle) and are drained in the BEFORE_LOCKS phase via pgaio_io_release_resowner. Finally, ResourceOwnerDelete frees the owner object itself, but only after every resource is gone — it asserts the array, hash, and lock count are empty (the lock count may legitimately be the overflow sentinel) and recursively deletes children:

// resowner.c — ResourceOwnerDelete asserts emptiness, recurses, frees
void
ResourceOwnerDelete(ResourceOwner owner)
{
    Assert(owner != CurrentResourceOwner);
    Assert(owner->narr == 0);
    Assert(owner->nhash == 0);
    Assert(owner->nlocks == 0 || owner->nlocks == MAX_RESOWNER_LOCKS + 1);

    while (owner->firstchild != NULL)
        ResourceOwnerDelete(owner->firstchild);     /* child delinks itself */

    ResourceOwnerNewParent(owner, NULL);            /* delink from parent */

    if (owner->hash)
        pfree(owner->hash);
    pfree(owner);
}

How xact.c drives it — the PG_TRY error-unwind tie-in


The phasing is the caller’s job. CommitTransaction issues the three ResourceOwnerRelease calls with isCommit=true, interleaving engine cleanup between phases — note RESOURCE_RELEASE_BEFORE_LOCKS runs, then AtEOXact_Buffers/AtEOXact_RelationCache/AtEOXact_Inval run, then the LOCKS and AFTER_LOCKS phases — so catalog invalidation is published while locks are still held:

// xact.c — CommitTransaction: phased release with cleanup interleaved
CurrentResourceOwner = NULL;
ResourceOwnerRelease(TopTransactionResourceOwner,
                     RESOURCE_RELEASE_BEFORE_LOCKS, true, true);

AtEOXact_Buffers(true);
AtEOXact_RelationCache(true);
AtEOXact_Inval(true);       /* publish catalog invalidations under lock */
AtEOXact_MultiXact();

ResourceOwnerRelease(TopTransactionResourceOwner,
                     RESOURCE_RELEASE_LOCKS, true, true);
ResourceOwnerRelease(TopTransactionResourceOwner,
                     RESOURCE_RELEASE_AFTER_LOCKS, true, true);

AbortTransaction runs the same three phases with isCommit=false. This is the payoff of the whole design: when a longjmp from a failing C frame lands in the abort path (via PG_TRY/sigsetjmp in the top-level loop), the executor’s own release calls never ran, so the ResourceOwner is the only thing that knows about the still-held pins, locks, and references — and these three calls free them all:

// xact.c — AbortTransaction: same three phases, isCommit=false (error unwind)
if (TopTransactionResourceOwner != NULL)
{
    CallXactCallbacks(XACT_EVENT_ABORT);

    ResourceOwnerRelease(TopTransactionResourceOwner,
                         RESOURCE_RELEASE_BEFORE_LOCKS, false, true);
    AtEOXact_Buffers(false);
    AtEOXact_RelationCache(false);
    AtEOXact_Inval(false);
    ResourceOwnerRelease(TopTransactionResourceOwner,
                         RESOURCE_RELEASE_LOCKS, false, true);
    ResourceOwnerRelease(TopTransactionResourceOwner,
                         RESOURCE_RELEASE_AFTER_LOCKS, false, true);
}

CleanupTransaction then calls ResourceOwnerDelete(TopTransactionResourceOwner) to free the (now-empty) owner objects and nulls the three globals. The ResourceOwnerReleaseInternal re-entrancy comment (“if an error happens between the release phases, we might get called again for the same ResourceOwner from AbortTransaction”) is what makes a failure during commit’s release safely fall through to abort’s release without double-sorting.

flowchart TD
    ERR["error: ereport(ERROR) / elog(ERROR)<br/>anywhere in executor C frames"]
    LJ["siglongjmp to PG_exception_stack<br/>(set by PG_TRY/sigsetjmp in main loop)"]
    AB["AbortTransaction()"]
    P1["ResourceOwnerRelease(BEFORE_LOCKS, isCommit=false)<br/>→ unpin buffers, drain AIO"]
    EC["AtEOXact_Buffers / RelationCache / Inval"]
    P2["ResourceOwnerRelease(LOCKS, false)<br/>→ ProcReleaseLocks (truly release)"]
    P3["ResourceOwnerRelease(AFTER_LOCKS, false)<br/>→ drop catcache/files/snapshots"]
    CL["CleanupTransaction → ResourceOwnerDelete<br/>free owner tree, null globals"]

    ERR --> LJ --> AB --> P1 --> EC --> P2 --> P3 --> CL
    P1 -. children-first recursion .-> P1

Position hints (as of 2026-06-06, REL_18 273fe94)

| Symbol | File | Line |
|---|---|---|
| struct ResourceOwnerData | src/backend/utils/resowner/resowner.c | 112 |
| RESOWNER_ARRAY_SIZE (32) | src/backend/utils/resowner/resowner.c | 73 |
| MAX_RESOWNER_LOCKS (15) | src/backend/utils/resowner/resowner.c | 107 |
| CurrentResourceOwner (globals) | src/backend/utils/resowner/resowner.c | 173 |
| resource_priority_cmp | src/backend/utils/resowner/resowner.c | 269 |
| ResourceOwnerSort | src/backend/utils/resowner/resowner.c | 292 |
| ResourceOwnerReleaseAll | src/backend/utils/resowner/resowner.c | 348 |
| ResourceOwnerCreate | src/backend/utils/resowner/resowner.c | 421 |
| ResourceOwnerEnlarge | src/backend/utils/resowner/resowner.c | 452 |
| ResourceOwnerRemember | src/backend/utils/resowner/resowner.c | 524 |
| ResourceOwnerForget | src/backend/utils/resowner/resowner.c | 564 |
| ResourceOwnerRelease | src/backend/utils/resowner/resowner.c | 658 |
| ResourceOwnerReleaseInternal | src/backend/utils/resowner/resowner.c | 678 |
| ResourceOwnerReleaseAllOfKind | src/backend/utils/resowner/resowner.c | 818 |
| ResourceOwnerDelete | src/backend/utils/resowner/resowner.c | 871 |
| ResourceOwnerNewParent | src/backend/utils/resowner/resowner.c | 914 |
| CreateAuxProcessResourceOwner | src/backend/utils/resowner/resowner.c | 999 |
| ResourceOwnerRememberLock | src/backend/utils/resowner/resowner.c | 1062 |
| ResourceOwnerForgetLock | src/backend/utils/resowner/resowner.c | 1082 |
| ResourceOwnerRememberAioHandle | src/backend/utils/resowner/resowner.c | 1104 |
| ResourceReleasePhase enum | src/include/utils/resowner.h | 52 |
| ResourceOwnerDesc struct | src/include/utils/resowner.h | 91 |
| buffer_pin_resowner_desc | src/backend/storage/buffer/bufmgr.c | 244 |
| Subxact owner create | src/backend/access/transam/xact.c | 1293 |
| Portal owner create | src/backend/utils/mmgr/portalmem.c | 205 |
| CommitTransaction release | src/backend/access/transam/xact.c | 2411 |
| AbortTransaction release | src/backend/access/transam/xact.c | 2967 |
| CleanupTransaction delete | src/backend/access/transam/xact.c | 3027 |

  • A ResourceOwner is allocated in TopMemoryContext and freed only explicitly. Verified in ResourceOwnerCreate (MemoryContextAllocZero(TopMemoryContext, ...)) and ResourceOwnerDelete (pfree(owner)). This is why owners survive the reset of any per-transaction memory context — the resource accounting must outlive the memory it tracks.

  • The forest mirrors transaction/portal nesting: subxact owner is a child of its parent xact’s owner; portal owner is a child of CurTransactionResourceOwner. Verified in xact.c StartSubTransaction (ResourceOwnerCreate(s->parent->curTransactionOwner, "SubTransaction")) and portalmem.c CreatePortal (ResourceOwnerCreate(CurTransactionResourceOwner, "Portal")). The README’s “any remaining [portal] resources become the responsibility of the current transaction” is implemented by this parent linkage.

  • Release runs in exactly three phases, driven three times by the caller, with children released before parents within each phase. Verified in ResourceOwnerReleaseInternal (children-first for loop over firstchild) and the RESOURCE_RELEASE_BEFORE_LOCKS / LOCKS / AFTER_LOCKS branch structure, plus the three ResourceOwnerRelease calls in both CommitTransaction and AbortTransaction.

  • Buffer pins are BEFORE_LOCKS and released before locks. Verified in bufmgr.c buffer_pin_resowner_desc (.release_phase = RESOURCE_RELEASE_BEFORE_LOCKS, .release_priority = RELEASE_PRIO_BUFFER_PINS). The README’s rationale is that pins are visible to other backends, so they must be gone before a lock another backend waits on is released.

  • On subtransaction/portal commit, locks are reassigned to the parent, not released; on abort they are released. Verified in the RESOURCE_RELEASE_LOCKS non-top-level branch: isCommit ? LockReassignCurrentOwner(...) : LockReleaseCurrentOwner(...). The README’s “release operation on a child transfers lock ownership to the parent if isCommit is true” matches exactly.

  • The lock cache holds at most MAX_RESOWNER_LOCKS (15) entries and is lossy on overflow. Verified in ResourceOwnerRememberLock (returns early when nlocks > MAX_RESOWNER_LOCKS; sentinel nlocks == MAX+1) and the LOCKS phase passing locks = NULL when overflowed so lmgr scans its own table. The 15 value and its pg_dump-derived justification are in the MAX_RESOWNER_LOCKS comment.

  • The fast store is a 32-slot array spilling into an open-addressing hash; Enlarge must be called before Remember. Verified by RESOWNER_ARRAY_SIZE 32, ResourceOwnerEnlarge (grows/drains before returning), and ResourceOwnerRemember (elog(ERROR, "...array was full") if you skipped Enlarge). The bufmgr pin path calls ResourceOwnerEnlarge(CurrentResourceOwner) up front (bufmgr.c ~692/2023/2366).

  • Release sorts in reverse priority and walks from the tail, stopping at the phase boundary. Verified in resource_priority_cmp (/* Note: reverse order */) and ResourceOwnerReleaseAll (while (nitems > 0) { ... if (kind->release_phase > phase) break; ... nitems--; }).

  • Commit warns on leaked resources; abort is silent. Verified in ResourceOwnerReleaseAll: printLeakWarnings (passed as isCommit) gates the elog(WARNING, "resource was not closed: %s", ...). The README states the owner should be empty at commit but it is normal to have resources at abort.

  • AIO handles use a dlist, not the ResourceElem array, because they may be remembered in critical sections. Verified in ResourceOwnerRememberAioHandle (dlist_push_tail) and the BEFORE_LOCKS drain loop calling pgaio_io_release_resowner. The struct comment names the critical-section constraint explicitly.
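
The release ordering described in the items above (children before parents within each phase, a reverse-priority sort, and a walk from the tail that stops at the phase boundary) can be illustrated with a small standalone model. This is a sketch under stated assumptions, not PostgreSQL code: PhaseOwner, PhaseItem, and every phase_* function are hypothetical names invented for the example, and a fixed 8-slot array stands in for the real array/hash store.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model: all identifiers here are hypothetical, not PostgreSQL's. */
enum { BEFORE_LOCKS = 1, LOCKS = 2, AFTER_LOCKS = 3 };

typedef struct PhaseItem {
    int  phase;     /* one of the three phases above */
    int  priority;  /* lower value is released first within a phase */
    char tag;       /* label appended to the release log */
} PhaseItem;

#define PHASE_MAX_ITEMS 8

typedef struct PhaseOwner {
    struct PhaseOwner *firstchild;
    struct PhaseOwner *nextchild;
    PhaseItem items[PHASE_MAX_ITEMS];
    int       nitems;
    int       sorted;
} PhaseOwner;

static char phase_log[64];
static int  phase_log_len;

static void phase_remember(PhaseOwner *o, int phase, int priority, char tag)
{
    assert(o->nitems < PHASE_MAX_ITEMS);
    o->items[o->nitems].phase = phase;
    o->items[o->nitems].priority = priority;
    o->items[o->nitems].tag = tag;
    o->nitems++;
    o->sorted = 0;
}

/* Reverse order: the item to release first ends up at the tail. */
static int phase_cmp_reverse(const void *a, const void *b)
{
    const PhaseItem *x = a;
    const PhaseItem *y = b;

    if (x->phase != y->phase)
        return (x->phase < y->phase) ? 1 : -1;
    if (x->priority != y->priority)
        return (x->priority < y->priority) ? 1 : -1;
    return 0;
}

static void phase_release(PhaseOwner *o, int phase)
{
    PhaseOwner *c;

    /* Children are released before their parent within each phase. */
    for (c = o->firstchild; c != NULL; c = c->nextchild)
        phase_release(c, phase);

    if (!o->sorted) {
        qsort(o->items, (size_t) o->nitems, sizeof(PhaseItem), phase_cmp_reverse);
        o->sorted = 1;
    }
    /* Pop from the tail until an item of a later phase is reached. */
    while (o->nitems > 0) {
        const PhaseItem *it = &o->items[o->nitems - 1];

        if (it->phase > phase)
            break;
        phase_log[phase_log_len++] = it->tag;
        o->nitems--;
    }
}

/* Builds a parent with one child and drives the three phases in order. */
static const char *phase_demo_release_order(void)
{
    PhaseOwner parent, child;
    int phase;

    memset(&parent, 0, sizeof parent);
    memset(&child, 0, sizeof child);
    parent.firstchild = &child;

    phase_remember(&parent, AFTER_LOCKS, 100, 'f');
    phase_remember(&parent, BEFORE_LOCKS, 100, 'P');
    phase_remember(&child, AFTER_LOCKS, 100, 's');
    phase_remember(&child, BEFORE_LOCKS, 200, 'r');
    phase_remember(&child, BEFORE_LOCKS, 100, 'p');

    phase_log_len = 0;
    for (phase = BEFORE_LOCKS; phase <= AFTER_LOCKS; phase++)
        phase_release(&parent, phase);
    phase_log[phase_log_len] = '\0';
    return phase_log;   /* "prPsf" for the items registered above */
}
```

In this model the child’s two BEFORE_LOCKS items go first (lower priority value first), then the parent’s BEFORE_LOCKS item, and only in the third pass do the AFTER_LOCKS items go, again child before parent. The sort happens once per owner, which is why the later phases only need to keep popping from the tail.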

  1. How often the 15-entry lock cache actually overflows in real OLTP. The MAX_RESOWNER_LOCKS comment cites 9.2-era pg_dump measurements (≤9 locks per non-top owner). Whether modern partitioned schemas with hundreds of per-partition locks routinely overflow the top owner’s cache (forcing the slower lmgr-hash scan at commit) is workload-dependent and unmeasured here. Investigation path: instrument ResourceOwnerRememberLock’s overflow branch under a partition-heavy benchmark.

  2. The cost of the array→hash spill threshold (32) for wide executor trees. A deep plan pinning many buffers simultaneously crosses the 32-slot array into the hash, paying a re-hash and losing the cheap linear forget. Whether RESOWNER_ARRAY_SIZE is still well-tuned for current executor pin counts is not established from the code. Investigation path: trace narr high-water marks across TPC-style queries.

  3. Whether ResourceOwnerReleaseAllOfKind (the retail bulk-release used by, e.g., snapshot/relcache reset) interacts cleanly with a subsequent normal phased release. It temporarily sets releasing without sorting; the re-entrancy comment in ResourceOwnerReleaseInternal suggests the interplay is intentional, but the exact set of callers that mix the two on one owner is not enumerated here.
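
For the first question, the overflow behaviour and a possible instrumentation point can be sketched as follows. This is a minimal standalone model that assumes only what the text above states (a 15-entry cache, an early return once overflowed, a sentinel of MAX_RESOWNER_LOCKS + 1, and a NULL lock list handed to the release path after overflow). LockCacheOwner, the cache_* functions, and the overflow_events counter are hypothetical; the counter marks where a measurement hook might go and does not exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RESOWNER_LOCKS 15   /* value quoted in the text above */

/* Toy owner: only the lock cache is modeled; names are hypothetical. */
typedef struct LockCacheOwner {
    int         nlocks;                       /* MAX + 1 means "overflowed" */
    const void *locks[MAX_RESOWNER_LOCKS];
    long        overflow_events;              /* hypothetical instrumentation */
} LockCacheOwner;

static void cache_remember_lock(LockCacheOwner *o, const void *lock)
{
    if (o->nlocks > MAX_RESOWNER_LOCKS)
        return;                     /* already overflowed: cache stays lossy */

    if (o->nlocks < MAX_RESOWNER_LOCKS)
        o->locks[o->nlocks] = lock; /* still room: record the lock */
    else
        o->overflow_events++;       /* this lock tips the cache over */

    o->nlocks++;
}

/*
 * What the release path would receive: the cached list while it is exact,
 * or NULL once the cache has overflowed and a full scan is required.
 */
static const void **cache_locks_for_release(LockCacheOwner *o, int *nlocks)
{
    if (o->nlocks > MAX_RESOWNER_LOCKS) {
        *nlocks = 0;
        return NULL;
    }
    *nlocks = o->nlocks;
    return o->locks;
}

/* Returns 1 when an owner that took `count` locks must fall back to the scan. */
static int cache_demo_overflows(int count)
{
    static int dummy_lock;
    LockCacheOwner o = {0};
    int n;
    int i;

    for (i = 0; i < count; i++)
        cache_remember_lock(&o, &dummy_lock);
    return cache_locks_for_release(&o, &n) == NULL;
}
```

Counting the transition rather than every remembered lock keeps the hook cheap: the counter is bumped once per owner, on the lock that crosses the limit.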

Beyond PostgreSQL — Comparative Designs & Research Frontiers

  • CUBRID. CUBRID does not have a single unifying ResourceOwner abstraction; resource cleanup is distributed across its transaction descriptor (LOG_TDES), its page-fix bookkeeping in the page buffer (pgbuf), and its lock manager’s per-transaction lock entry list. Buffer fixes are tracked per thread/transaction and unfixed at the end of a request or on rollback; locks are chained off the transaction descriptor and released in lock_unlock_all. The effect is the same — everything is reclaimed at transaction end and on error unwinding through er_set/setjmp-style error handling — but the bookkeeping is per-subsystem rather than funneled through one owner object with declarative release phases. PostgreSQL’s single phase+priority-sorted owner is notably more uniform; CUBRID trades that uniformity for subsystem-local control. (See the CUBRID code-analysis tree under knowledge/code-analysis/cubrid/ for the lock-manager and buffer details.)

  • C++ storage engines (RocksDB-style, InnoDB). These lean on RAII guard objects (mtr_t mini-transactions in InnoDB; unique_ptr/scope guards in RocksDB) whose destructors run during C++ stack unwinding. The lifetime is lexical (bound to the C++ stack frame), which is cleaner for resources that genuinely have lexical scope but cannot express “pin held across many frames, released at transaction end” without a heap-resident owner. InnoDB’s mini-transaction is the closest analog to a scoped ResourceOwner, but it is statement/operation-scoped rather than transaction-scoped, and locks are tracked separately in the trx_t.

  • Managed-runtime engines. Java/C# engines get memory reclamation from GC and use try-with-resources/using for the rest, distributing cleanup across frames. The trade-off versus PostgreSQL’s centralized owner is the classic one: language-integrated unwinding is ergonomic but each frame must remember its own cleanup, whereas a central registry makes a single sweep authoritative at the cost of an explicit Remember/Forget protocol on every acquisition.

  • Relation to the MemoryContext lineage. The README states the ResourceOwner API was modeled on MemoryContexts, and that parallel is the observation most relevant to the research frontier: PostgreSQL discovered that a tree-of-scopes + current-pointer + recursive-bulk-free pattern, proven for heap allocations, generalizes to any reclaimable claim. The deliberate decision not to unify them (different usage patterns: allocations are untyped and enormously frequent; resources are typed, callback-bearing, and fewer) is a small but instructive design judgment: the same abstraction shape, instantiated twice with different cost models.

  • Error handling as the real driver. The deeper point, echoing ARIES’s insistence on disciplined commit/abort ordering, is that the ResourceOwner exists because PostgreSQL chose setjmp/longjmp over per-frame cleanup. An engine built on a language with destructors or checked exceptions might never invent it. It is the absence of automatic unwinding in C, combined with the need for transaction-scoped (not frame-scoped) lifetimes, that makes a centralized, phase-ordered owner the right answer here. See postgres-error-handling.md for the PG_TRY/sigsetjmp machinery that this module is the cleanup half of.
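
The contrast drawn in this list, frame-local cleanup that a longjmp skips versus a central registry that one sweep makes authoritative, can be shown in a few lines of standard C. This is a sketch under assumptions rather than PostgreSQL’s machinery: plain setjmp/longjmp stands in for PG_TRY/sigsetjmp, and SweepOwner, current_owner, and the sweep_* functions are hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>

#define SWEEP_MAX_HELD 16

/* Toy registry of held buffer pins; all names here are hypothetical. */
typedef struct SweepOwner {
    int held[SWEEP_MAX_HELD];
    int nheld;
} SweepOwner;

/* File scope, so its contents stay well defined after a longjmp. */
static SweepOwner current_owner;
static jmp_buf    error_env;
static int        pins_outstanding;

static void sweep_pin(int buf)
{
    assert(current_owner.nheld < SWEEP_MAX_HELD);
    current_owner.held[current_owner.nheld++] = buf;   /* "Remember" */
    pins_outstanding++;
}

static void sweep_unpin(int buf)
{
    int i;

    for (i = current_owner.nheld - 1; i >= 0; i--) {
        if (current_owner.held[i] == buf) {
            /* "Forget": move the last entry into the freed slot. */
            current_owner.held[i] = current_owner.held[current_owner.nheld - 1];
            current_owner.nheld--;
            pins_outstanding--;
            return;
        }
    }
    assert(!"unpin of a buffer the owner does not hold");
}

/* The single sweep: releases whatever is still registered. */
static int sweep_release_all(void)
{
    int released = 0;

    while (current_owner.nheld > 0) {
        current_owner.nheld--;
        pins_outstanding--;
        released++;
    }
    return released;
}

static void sweep_scan(int fail)
{
    sweep_pin(1);
    sweep_pin(2);
    if (fail)
        longjmp(error_env, 1);   /* the two unpin calls below never run */
    sweep_unpin(2);
    sweep_unpin(1);
}

/* Returns how many pins the end-of-scope sweep had to reclaim. */
static int sweep_run(int fail)
{
    int swept;

    current_owner.nheld = 0;
    pins_outstanding = 0;

    if (setjmp(error_env) == 0)
        sweep_scan(fail);        /* normal path, or jumps back on error */

    swept = sweep_release_all(); /* runs on both the success and error path */
    assert(pins_outstanding == 0);
    return swept;
}
```

On the success path the sweep finds nothing to do; on the error path it reclaims both pins that the abandoned frame never released. The price is visible too: every acquisition has to go through the registry, which is the explicit Remember/Forget protocol the list above describes.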

In-tree READMEs and source files (REL_18_STABLE, commit 273fe94)

  • src/backend/utils/resowner/README — the design document: the MemoryContext-modeled rationale, the forest/parent-transfer semantics, the lock special case, the “adding a new resource type” recipe, and the three-phase release ordering with the worked parent/child priority example.
  • src/include/utils/resowner.h — the ResourceOwner opaque type, the four global owners, ResourceReleasePhase, the RELEASE_PRIO_* built-in priorities, ResourceOwnerDesc, and the full exported function surface (Create/Release/Delete/Enlarge/Remember/Forget/RememberLock/…).
  • src/backend/utils/resowner/resowner.c — ResourceOwnerData, the array/hash store, Enlarge/Remember/Forget, ResourceOwnerSort + resource_priority_cmp, ResourceOwnerReleaseInternal (the three-phase recursion and lock transfer), the lock cache, AIO handles, and the aux-process owner.
  • src/backend/access/transam/xact.c — the driver: StartSubTransaction owner creation, and the three phased ResourceOwnerRelease calls in CommitTransaction / AbortTransaction, with ResourceOwnerDelete in CleanupTransaction.
  • src/backend/utils/mmgr/portalmem.c — CreatePortal hangs the portal’s owner under CurTransactionResourceOwner.
  • src/backend/storage/buffer/bufmgr.c — buffer_pin_resowner_desc / buffer_io_resowner_desc and the ResourceOwnerEnlarge reserve-before-pin call sites.
  • src/backend/storage/lmgr/lock.c — ResourceOwnerRememberLock / ForgetLock call sites and LockReassignCurrentOwner / LockReleaseCurrentOwner used by the LOCKS phase.
  • Mohan, C. et al. (1992). “ARIES: A Transaction Recovery Method…” ACM TODS 17(1):94-162. The discipline of releasing locks last and only after externally-visible state is consistent — the principle the three-phase release embodies. Captured in knowledge/research/dbms-papers/aries.md.
  • Database System Concepts (Silberschatz, Korth, Sudarshan, 7e), ch. 17-18 — transactions as the unit of atomicity/recovery, the scope at whose boundary resources are reclaimed (knowledge/research/dbms-general/).
  • Database Internals (Petrov 2019), ch. 5-6 — buffer management and lock lifetimes; the resource claims a ResourceOwner tracks (knowledge/research/dbms-general/).
  • RAII / scope-bound resource management (Stroustrup, the C++ idiom) — the language-runtime alternative PostgreSQL emulates by hand because C lacks destructors and true exceptions.

Cross-references within this knowledge base

  • postgres-memory-contexts.md — the sibling allocator the ResourceOwner API was modeled on; tree-of-scopes + current-pointer + recursive bulk-free.
  • postgres-error-handling.md — PG_TRY/sigsetjmp/ereport; the unwinding half whose cleanup this module performs.
  • postgres-buffer-manager.md — buffer pins, the canonical BEFORE_LOCKS resource, and the ReservePrivateRefCountEntry + ResourceOwnerEnlarge reserve protocol.
  • postgres-lock-manager.md — LOCALLOCK, LockReassignCurrentOwner, and the local-lock hash that backs the lossy lock cache.
  • postgres-xact.md — transaction/subtransaction state machine that creates and drives the per-transaction owners.
  • postgres-portals-prepared.md — portals, whose owners are children of the current transaction’s owner.
  • postgres-aio.md — asynchronous I/O handles tracked via the owner’s dlist.
  • postgres-overview-base-infra.md — where ResourceOwner sits in the base-infrastructure layer.