CUBRID File & Disk Manager — Code-Level Deep Dive

Where this document fits: The high-level analysis cubrid-disk-manager.md covers design intent and theoretical background for both the file and disk managers. This document traces every branch and field at the code level, centred on file_manager.c with the disk manager as its substrate. Each chapter is self-contained, but reading in order follows the full lifecycle of a single data page — from reserved sector to owning file — inside the kernel.

Contents:

Ch	Title	Status
1	Data-Structure Map	✅
2	Initialization and Memory Management	✅
3	Volume Format and the Sector Allocation Table	✅
4	Sector Reservation Two-Step Protocol	✅
5	Volume Extension as a Nested Top Action	✅
6	File Creation and the Three-Table Layout	✅
7	Permanent File Page Allocation	✅
8	Temporary File Page Allocation	✅
9	Page Deallocation and File Destruction	✅
10	Numerable Files and the User Page Table	✅
11	Special Paths: Tempcache, Tracker, Sticky Page, TDE, and Recovery	✅

Chapter 1: Data-Structure Map

This chapter is the field dictionary for the whole document; later chapters trace operations over these structures without re-explaining a field. This chapter answers: what are all the structures the disk and file managers share, and what does every field mean? For design rationale, read the companion cubrid-disk-manager.md (“Volume layout”, “File architecture”, “Permanent vs temporary purpose split”); this chapter assumes that theory and only names fields.

Two boundaries organize everything. The disk/file boundary: the disk manager owns volumes and hands out sectors (64-page extents); the file manager carves pages from them. The on-disk/in-memory boundary: some structures persist byte-for-byte in pages (disk_volume_header, file_header, file_extensible_data, file_partial_sector); others live only in server heap to summarize or coordinate them (disk_cache, disk_extend_info, disk_stab_cursor, disk_reserve_context).

1.1 The disk side, at a glance

flowchart TB
  subgraph ondisk["On disk (one per volume)"]
    VH["disk_volume_header<br/>page 0 of every volume"]
    STAB["sector allocation table<br/>bitmap pages, 1 bit per sector"]
  end
  subgraph mem["In memory (one disk_Cache, process-wide)"]
    DC["disk_cache"]
    DC --> VOLS["vols[LOG_MAX_DBVOLID+1]<br/>per-volume disk_cache_volinfo"]
    DC --> PERM["perm_purpose_info<br/>disk_perm_info"]
    DC --> TEMP["temp_purpose_info<br/>disk_temp_info"]
    PERM --> PEI["extend_info: disk_extend_info"]
    TEMP --> TEI["extend_info: disk_extend_info"]
  end
  subgraph transient["Transient (per reserve call / per iteration)"]
    RC["disk_reserve_context"]
    RC --> CVR["cache_vol_reserve[]:<br/>disk_cache_vol_reserve"]
    CUR["disk_stab_cursor"]
  end
  VH -. "cached as" .-> VOLS
  STAB -. "walked by" .-> CUR
  DC -. "drained into" .-> RC

Figure 1-1. Disk-side structure relationships. disk_Cache is the single in-memory summary of all volumes; disk_reserve_context and disk_stab_cursor are transient scratch used while reserving sectors and walking the bitmap.

`disk_volume_header` — the persisted page-0 of every volume

The only disk-manager structure with variable size: it ends in a var_fields[1] flexible region holding the full volume path strings, so sizeof is never used on it (note the literal comment DON'T USE sizeof on this structure).

// disk_volume_header -- src/storage/disk_manager.c
struct disk_volume_header
{
  char magic[CUBRID_MAGIC_MAX_LENGTH];  /* magic for file/magic Unix utility; DON'T MOVE */
  INT16 iopagesize;
  INT16 volid;
  INT8 db_charset;
  INT8 dummy1;
  DB_VOLPURPOSE purpose;
  DB_VOLTYPE type;
  DKNPAGES sect_npgs;           /* pages per sector (== DISK_SECTOR_NPAGES = 64) */
  DKNSECTS nsect_total;
  DKNSECTS nsect_max;
  SECTID hint_allocsect;
  DKNPAGES stab_npages;
  PAGEID stab_first_page;
  PAGEID sys_lastpage;
  INT32 dummy2;
  INT64 db_creation;
  INT64 vol_creation;
  LOG_LSA chkpt_lsa;
  HFID boot_hfid;
  INT32 reserved0; INT32 reserved1; INT32 reserved2; INT32 reserved3;
  INT16 next_volid;
  INT16 offset_to_vol_fullname;
  INT16 offset_to_next_vol_fullname;
  INT16 offset_to_vol_remarks;
  char var_fields[1];           /* variable: vol_fullname, next_vol_fullname, remarks */
};

Field	Role	Why it exists
`magic`	Fixed signature at byte 0	`file`/`magic(5)` and CUBRID’s own check identify a volume by it; must not move.
`iopagesize`	IO page size at format	Sanity check only; authoritative size is in the log.
`volid`	This volume’s id	Self-identification; traces a stray page to its volume.
`db_charset`	Database charset code	Volume must match db charset; checked at attach.
`dummy1`, `dummy2`	Alignment padding.	—
`purpose`	Permanent vs temporary data purpose	Picks which rollup the free space feeds (§1.2).
`type`	Permanent vs temporary volume type	Differs from purpose: a perm-typed volume may serve temp purpose.
`sect_npgs`	Pages per sector	Always 64; stored so the format is self-describing.
`nsect_total`	Sectors currently formatted	Upper bound for sector ids that physically exist now.
`nsect_max`	Max sectors after all extension	Sizes the allocation table once so the bitmap never moves.
`hint_allocsect`	Next sector to scan	Skips known-full prefix of the bitmap.
`stab_npages`	Table length in pages	`DISK_STAB_NPAGES(nsect_max)`; bounds the cursor walk.
`stab_first_page`	First bitmap page id	Table starts after the header; cursor maps offsets via this.
`sys_lastpage`	Last system page	Everything `<= sys_lastpage` is header+table; user sectors follow.
`db_creation`	DB creation timestamp	Replicated everywhere so a foreign volume can’t be attached.
`vol_creation`	This volume’s creation time	Per-volume provenance.
`chkpt_lsa`	Recovery start LSA	Recovery skips older log records for this volume.
`boot_hfid`	Boot/system heap file id	Bootstraps multivolume access.
`reserved0..3`	Four spare `INT32` for forward-compatible growth without an offset change.	—
`next_volid`	Link to next volume	The volume set is a singly linked chain.
`offset_to_vol_fullname`	Offset within `var_fields`	This volume’s path string.
`offset_to_next_vol_fullname`	Offset within `var_fields`	Next volume’s path (chain followed without a catalog).
`offset_to_vol_remarks`	Offset within `var_fields`	Free-text remarks.
`var_fields`	Flexible tail	Holds the three strings; length is `DB_PAGESIZE` minus the byte offset of `var_fields` within the page.

Invariant — the sector allocation table is sized once, for nsect_max, never for nsect_total. From creation, stab_npages == DISK_STAB_NPAGES(nsect_max) and sys_lastpage covers the header plus the full table. Extension (Ch.5) raises nsect_total toward nsect_max but never touches stab_first_page/stab_npages. If the table could move, every cached disk_stab_cursor.pageid and reserved VSID would dangle.
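The sizing rule can be sketched in a few lines. This is an illustration only: `sketch_stab_npages` and `sketch_can_extend` are hypothetical helpers, not CUBRID's `DISK_STAB_NPAGES` macro, and `bits_per_page` assumes the whole bitmap page holds sector bits (a real page reserves some header bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pages needed for a one-bit-per-sector table.
 * bits_per_page is an assumed capacity, not a CUBRID constant. */
static int32_t
sketch_stab_npages (int32_t nsect_max, int32_t bits_per_page)
{
  assert (nsect_max > 0 && bits_per_page > 0);
  /* Ceiling division: the last page may be only partly used. */
  return (nsect_max + bits_per_page - 1) / bits_per_page;
}

/* Extension only moves nsect_total toward nsect_max; the table size
 * depends on nsect_max alone, so it never has to grow or move. */
static int
sketch_can_extend (int32_t nsect_total, int32_t nsect_max, int32_t nsect_add)
{
  return nsect_add > 0 && nsect_add <= nsect_max - nsect_total;
}
```

Because the page count is a function of `nsect_max` only, any extension accepted by `sketch_can_extend` leaves the table's first page and length untouched.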

`disk_cache_volinfo`, `disk_extend_info`, `disk_perm_info`, `disk_temp_info`, `disk_cache`

These five form the in-memory free-space summary. disk_cache is the root; there is exactly one (static DISK_CACHE *disk_Cache).

// disk_cache_volinfo -- src/storage/disk_manager.c
struct disk_cache_volinfo
{
  DB_VOLPURPOSE purpose;
  DKNSECTS nsect_free;          /* hint of free sectors on this volume */
};

Field (`disk_cache_volinfo`)	Role	Why it exists
`purpose`	Per-volume purpose (perm/temp)	Classifies `vols[volid]` without reading the volume header.
`nsect_free`	Per-volume free-sector hint	Fast per-volume estimate; the bitmap holds the authoritative count, this is a cache hint.

// disk_extend_info -- src/storage/disk_manager.c
struct disk_extend_info
{
  volatile DKNSECTS nsect_free;      /* free sectors across all volumes of this purpose */
  volatile DKNSECTS nsect_total;
  volatile DKNSECTS nsect_max;
  volatile DKNSECTS nsect_intention; /* sectors a thread intends to add by extending */
  pthread_mutex_t mutex_reserve;
#if !defined (NDEBUG)
  volatile int owner_reserve;        /* debug: tid holding mutex_reserve */
#endif
  DKNSECTS nsect_vol_max;
  VOLID volid_extend;
  DB_VOLTYPE voltype;
};

Field (`disk_extend_info`)	Role	Why it exists
`nsect_free`	Free sectors over all volumes of one purpose	Fast number reservation decrements before touching any bitmap (Ch.4); `volatile` forces a fresh read on each access, while updates are still serialized by `mutex_reserve`.
`nsect_total`	Formatted sectors of this purpose	Distinguishes exhausted from merely fragmented.
`nsect_max`	Ceiling of this purpose	Distinguishes “extend existing” from “add new volume”.
`nsect_intention`	Sectors promised but not yet committed by an extender	Prevents thundering-herd extension (Ch.5).
`mutex_reserve`	Lock guarding the four counters	Serializes the hot reservation path.
`owner_reserve`	Debug owner tid	Debug builds only (compiled out when `NDEBUG` is defined); lock-discipline aid.
`nsect_vol_max`	Largest sector count a new volume may take	Caps a single extension’s size.
`volid_extend`	Volume the next extension grows	Cached target, no rescan.
`voltype`	Volume type for this rollup	Tags perm vs temp.

// disk_perm_info / disk_temp_info -- src/storage/disk_manager.c
struct disk_perm_info  { DISK_EXTEND_INFO extend_info; };
struct disk_temp_info  {
  DISK_EXTEND_INFO extend_info;
  DKNSECTS nsect_perm_free;     /* free sectors on PERMANENT volumes usable for temp purpose */
  DKNSECTS nsect_perm_total;
};

Field	Role	Why it exists
`disk_perm_info.extend_info`	The perm-purpose rollup	All permanent free space funnels here.
`disk_temp_info.extend_info`	The temp-volume rollup	Free space on genuine temp volumes.
`disk_temp_info.nsect_perm_free`	Free temp-usable sectors on perm volumes	Fallback pool when temp volumes exhausted (temp-on-perm); kept separate so temp alloc prefers real temp volumes first.
`disk_temp_info.nsect_perm_total`	Total such sectors	Sizes the fallback pool.

// disk_cache -- src/storage/disk_manager.c
struct disk_cache
{
  int nvols_perm;
  int nvols_temp;
  DISK_CACHE_VOLINFO vols[LOG_MAX_DBVOLID + 1];    /* per-volume free hint, indexed by volid */
  DISK_PERM_PURPOSE_INFO perm_purpose_info;
  DISK_TEMP_PURPOSE_INFO temp_purpose_info;
  pthread_mutex_t mutex_extend; /* never take while holding a reserve mutex */
#if !defined (NDEBUG)
  volatile int owner_extend;
#endif
};

Field (`disk_cache`)	Role	Why it exists
`nvols_perm`	Number of permanent volumes	Iteration bound / placement.
`nvols_temp`	Number of temporary volumes	Same; temp volumes index from the high end of `vols`.
`vols[]`	Per-volume `disk_cache_volinfo`	Direct `vols[volid]` lookup; sized `LOG_MAX_DBVOLID + 1`.
`perm_purpose_info`	Permanent rollup	Aggregate perm free space.
`temp_purpose_info`	Temporary rollup	Aggregate temp free space plus perm-fallback.
`mutex_extend`	Lock for volume-set extension	Coarser than `mutex_reserve`.
`owner_extend`	Debug owner	Debug builds only (compiled out when `NDEBUG` is defined).

LOG_MAX_DBVOLID is VOLID_MAX - 1 (SHRT_MAX - 1), so vols[] indexes any valid VOLID.

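Invariant — lock ordering: never acquire mutex_extend while holding a mutex_reserve; a thread that needs both takes mutex_extend first. The comment on disk_cache.mutex_extend states this rule. Reservation (frequent) takes only mutex_reserve; extension (rare) takes mutex_extend and then, as needed, a mutex_reserve to update the counters. A thread that requested mutex_extend while still holding mutex_reserve could deadlock against an extender waiting for that same mutex_reserve. Ch.4 and Ch.5 rely on this.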

`disk_stab_cursor` and `DISK_STAB_UNIT`

The sector allocation table is a bitmap, one bit per sector. The iteration unit is a UINT64.

// DISK_STAB_UNIT -- src/storage/disk_manager.c
typedef UINT64 DISK_STAB_UNIT;        /* one 64-bit word of the bitmap */

// disk_stab_cursor -- src/storage/disk_manager.c
struct disk_stab_cursor
{
  const DISK_VOLUME_HEADER *volheader;
  PAGEID pageid;                       /* current bitmap page id (real, not table-relative) */
  int offset_to_unit;
  int offset_to_bit;
  SECTID sectid;
  PAGE_PTR page;                       /* fixed bitmap page (NULL until fixed) */
  DISK_STAB_UNIT *unit;                /* pointer to current unit inside page */
};

Field	Role	Why it exists
`volheader`	Volume being walked	Source of `stab_first_page`/`nsect_total` bounds.
`pageid`	Current real bitmap page	From `sectid` plus `stab_first_page`.
`offset_to_unit`	Which `UINT64` word in the page	`DISK_ALLOCTBL_SECTOR_UNIT_OFFSET`.
`offset_to_bit`	Which bit in the word	`DISK_ALLOCTBL_SECTOR_BIT_OFFSET`.
`sectid`	Sector the cursor names	The (page, unit, bit) triple decomposes this.
`page`	Pinned page pointer	`NULL` = no page fixed; non-NULL = a latch is held.
`unit`	Pointer into `page` at `offset_to_unit`	Reads/writes the live word without recomputing the address.

Invariant — page == NULL iff no latch is held. Crossing a page boundary must unfix the old page before fixing the next and resetting unit. A non-NULL page left after the walk is a leaked latch. Ch.4’s bitmap-commit step depends on this.
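The (page, unit, bit) decomposition described in the field table can be sketched as plain integer arithmetic. The names below are hypothetical, and `units_per_page` is an assumed parameter (how many 64-bit words fit in one bitmap page); the real macros and page layout may differ.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BITS_PER_UNIT 64	/* one DISK_STAB_UNIT is a UINT64 */

struct sketch_cursor
{
  int32_t pageid;		/* real page id: stab_first_page + table-relative page */
  int offset_to_unit;		/* which 64-bit word inside that page */
  int offset_to_bit;		/* which bit inside that word */
  int32_t sectid;
};

/* Decompose a sector id into the triple the cursor carries. */
static struct sketch_cursor
sketch_cursor_locate (int32_t sectid, int32_t stab_first_page, int units_per_page)
{
  struct sketch_cursor c;
  int32_t bits_per_page = units_per_page * SKETCH_BITS_PER_UNIT;

  assert (sectid >= 0 && units_per_page > 0);
  c.sectid = sectid;
  c.pageid = stab_first_page + sectid / bits_per_page;
  c.offset_to_unit = (sectid % bits_per_page) / SKETCH_BITS_PER_UNIT;
  c.offset_to_bit = sectid % SKETCH_BITS_PER_UNIT;
  return c;
}

/* Test one sector bit in a unit without disturbing the others. */
static int
sketch_unit_is_set (uint64_t unit, int offset_to_bit)
{
  return (int) ((unit >> offset_to_bit) & 1u);
}
```

With 2048 units per page, sector 131269 lands on the second bitmap page, word 3, bit 5; a walk that steps past the last bit of a page is where the unfix-then-fix rule applies.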

`disk_cache_vol_reserve` and `disk_reserve_context`

Transient scratch the two-step reservation (Ch.4) uses. disk_reserve_context lives on the caller’s stack for one reservation.

// disk_cache_vol_reserve -- src/storage/disk_manager.c
struct disk_cache_vol_reserve
{
  VOLID volid;        /* a volume from which sectors were drawn */
  DKNSECTS nsect;     /* how many sectors drawn from it */
};

// disk_reserve_context -- src/storage/disk_manager.c
struct disk_reserve_context
{
  int nsect_total;                              /* total sectors this request must reserve */
  VSID *vsidp;                                  /* output cursor: next VSID write position */
  DISK_CACHE_VOL_RESERVE cache_vol_reserve[VOLID_MAX]; /* per-volume tally drawn from cache */
  int n_cache_vol_reserve;
  int n_cache_reserve_remaining;                /* entries not yet committed to bitmaps */
  DKNSECTS nsects_lastvol_remaining;            /* sectors still owed on the last volume */
  DB_VOLPURPOSE purpose;
};

Field (`disk_reserve_context`)	Role	Why it exists
`nsect_total`	Sectors the request needs	The loop’s goal.
`vsidp`	Write cursor into the caller’s `VSID[]`	Each committed sector appends here.
`cache_vol_reserve[]`	Per-volume plan (volume, count)	Step one fills it from cache; step two replays against bitmaps. Sized `VOLID_MAX`.
`n_cache_vol_reserve`	Count of populated plan entries	Bounds the replay loop.
`n_cache_reserve_remaining`	Entries not yet committed	Enables precise rollback.
`nsects_lastvol_remaining`	Sectors still owed on the current volume	Progress within one entry.
`purpose`	Request purpose	Routes to perm or temp rollup.

disk_cache_vol_reserve is just a (volid, nsect) pair; an array is the reservation plan. DISK_PRERESERVE_BUF_DEFAULT (16) is the default batch the cache reserve fills.
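Step one of the reservation, building the plan from the cached hints, can be illustrated with a simplified model. Everything here is hypothetical: the types are cut down, and visiting volumes in ascending volid order is an assumption made for the sketch, not a statement about CUBRID's selection policy.

```c
#include <assert.h>

struct sketch_vol_reserve
{
  short volid;			/* volume the sectors are drawn from */
  int nsect;			/* how many sectors drawn from it */
};

/* Step one only: take sectors from per-volume free hints and record the
 * (volid, nsect) plan. Returns the number of sectors still unsatisfied. */
static int
sketch_plan_from_cache (int *vol_nsect_free, int nvols, int nsect_wanted,
			struct sketch_vol_reserve *plan, int *n_plan)
{
  int remaining = nsect_wanted;
  int volid;

  *n_plan = 0;
  for (volid = 0; volid < nvols && remaining > 0; volid++)
    {
      int take = vol_nsect_free[volid] < remaining ? vol_nsect_free[volid] : remaining;
      if (take <= 0)
	continue;
      vol_nsect_free[volid] -= take;	/* the hint is decremented up front */
      plan[*n_plan].volid = (short) volid;
      plan[*n_plan].nsect = take;
      (*n_plan)++;
      remaining -= take;
    }
  return remaining;
}

/* Rollback: give every planned sector back to its volume's hint. */
static void
sketch_plan_undo (int *vol_nsect_free, const struct sketch_vol_reserve *plan, int n_plan)
{
  int i;
  for (i = 0; i < n_plan; i++)
    vol_nsect_free[plan[i].volid] += plan[i].nsect;
}
```

A non-zero return value corresponds to the case where the cache cannot satisfy the request and extension has to be considered; the undo helper shows why keeping the plan as explicit (volid, nsect) pairs makes rollback exact.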

1.2 The file side, at a glance

flowchart TB
  subgraph hdrpage["File header page (page 0 of a file)"]
    FH["file_header"]
    FH --> TS["tablespace: file_tablespace"]
    FH --> DESC["descriptor: file_descriptors (union, 64 B)"]
    FH -. "offset_to_partial_ftab" .-> PART["partial table:<br/>file_extensible_data of file_partial_sector"]
    FH -. "offset_to_full_ftab" .-> FULL["full table:<br/>file_extensible_data of VSID"]
    FH -. "offset_to_user_page_ftab" .-> UPT["user page table (numerable):<br/>file_extensible_data of VPID"]
  end
  PART --> PS["file_partial_sector<br/>{ vsid, page_bitmap }"]
  PART -. "vpid_next" .-> MOREP["overflow extdata page"]

Figure 1-2. File-side structure relationships. The header page embeds file_header, which carries the tablespace policy, the typed descriptor, and three byte-offsets into the three extensible tables co-located in the same page (overflowing via vpid_next).

`file_header` — the persisted page-0 of every file

// file_header -- src/storage/file_manager.c
struct file_header
{
  INT64 time_creation;
  VFID self;                       /* this file's own VFID */
  FILE_TABLESPACE tablespace;
  FILE_DESCRIPTORS descriptor;
  /* Page counts. */
  int n_page_total;
  int n_page_user;
  int n_page_ftab;
  int n_page_free;                 /* reserved-on-disk, not yet allocated */
  int n_page_mark_delete;          /* numerable: pages marked deleted */
  /* Sector counts. */
  int n_sector_total;
  int n_sector_partial;
  int n_sector_full;
  int n_sector_empty;              /* empty sectors are a subset of partial */
  FILE_TYPE type;
  INT32 file_flags;                /* NUMERABLE / TEMPORARY / ENCRYPTED_* */
  VOLID volid_last_expand;
  INT16 offset_to_partial_ftab;
  INT16 offset_to_full_ftab;
  INT16 offset_to_user_page_ftab;  /* user page table (numerable only) */
  VPID vpid_sticky_first;          /* first page if sticky; never deallocated */
  /* Temporary files: last-allocation cursor. */
  VPID vpid_last_temp_alloc;
  int offset_to_last_temp_alloc;
  /* Numerable files. */
  VPID vpid_last_user_page_ftab;   /* last page of user page table (append point) */
  VPID vpid_find_nth_last;         /* cache: page of last find_nth result */
  int first_index_find_nth_last;   /* cache: index of first entry in that page */
  INT32 reserved0; INT32 reserved1; INT32 reserved2; INT32 reserved3;
};

Field	Role	Why it exists
`time_creation`	File creation timestamp	Provenance.
`self`	The file’s own `VFID`	A header page in isolation knows its file.
`tablespace`	Expansion policy (below)	Drives growth aggressiveness.
`descriptor`	Typed metadata union (below)	Each type stashes its ids here.
`n_page_total`	Pages owned (user + table + free)	Master accounting.
`n_page_user`	Pages handed to the owner	The useful count.
`n_page_ftab`	Pages used by the three tables	Overhead; grows on overflow.
`n_page_free`	Reserved-but-unallocated pages	Available without a new reservation.
`n_page_mark_delete`	Numerable pages marked deleted	Numerable files flag, not remove (Ch.10).
`n_sector_total`	Sectors reserved	`= n_sector_partial + n_sector_full`.
`n_sector_partial`	Partial sectors	Have a free page; in the partial table.
`n_sector_full`	Full sectors	All 64 pages used; in the full table (perm only).
`n_sector_empty`	Sectors with zero pages	Subset of partial; reclaimed first on extension.
`type`	`FILE_TYPE`	Selects table layout and numerable/temp eligibility.
`file_flags`	Bit flags	`NUMERABLE 0x1`, `TEMPORARY 0x2`, `ENCRYPTED_AES 0x4`, `ENCRYPTED_ARIA 0x8`; via `FILE_IS_*`.
`volid_last_expand`	Volume last grown	Locality hint for next expansion.
`offset_to_partial_ftab`	Offset to partial table in this page	`FILE_HEADER_GET_PART_FTAB` asserts range.
`offset_to_full_ftab`	Offset to full table	Asserted non-temporary (temp has no full table).
`offset_to_user_page_ftab`	Offset to user page table	Numerable only; asserted numerable.
`vpid_sticky_first`	First page, if sticky	Never deallocated (Ch.11).
`vpid_last_temp_alloc`	Temp alloc cursor: page	Temp files alloc forward, never dealloc (Ch.8).
`offset_to_last_temp_alloc`	Temp alloc cursor: sector offset	Offset component of that cursor.
`vpid_last_user_page_ftab`	Numerable: append point	New user pages appended here (Ch.10).
`vpid_find_nth_last`	Numerable: cached find-nth page	Optimizes sequential find-nth (Ch.10).
`first_index_find_nth_last`	Numerable: cached first-entry index	Companion to the cache above.
`reserved0..3`	Four spare `INT32` for forward compatibility.	—
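The `file_flags` bits and the table-presence rules from the field notes above can be sketched as follows. The bit values are the ones listed in the table; the `SKETCH_*` and `sketch_*` names are illustrative and are not the source's `FILE_IS_*` macros.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as listed in the table above; names are illustrative. */
#define SKETCH_FLAG_NUMERABLE       0x1
#define SKETCH_FLAG_TEMPORARY       0x2
#define SKETCH_FLAG_ENCRYPTED_AES   0x4
#define SKETCH_FLAG_ENCRYPTED_ARIA  0x8

static int
sketch_is_numerable (int32_t flags)
{
  return (flags & SKETCH_FLAG_NUMERABLE) != 0;
}

static int
sketch_is_temporary (int32_t flags)
{
  return (flags & SKETCH_FLAG_TEMPORARY) != 0;
}

static int
sketch_is_encrypted (int32_t flags)
{
  return (flags & (SKETCH_FLAG_ENCRYPTED_AES | SKETCH_FLAG_ENCRYPTED_ARIA)) != 0;
}

/* Per the field notes: temp files have no full table, and only
 * numerable files carry a user page table. */
static int
sketch_has_full_table (int32_t flags)
{
  return !sketch_is_temporary (flags);
}

static int
sketch_has_user_page_table (int32_t flags)
{
  return sketch_is_numerable (flags);
}
```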

Invariant — accounting balances: n_page_total == n_page_user + n_page_ftab + n_page_free, n_sector_total == n_sector_partial + n_sector_full, and n_sector_empty <= n_sector_partial. Every alloc/dealloc in Ch.7–Ch.9 adjusts these as a set under the header latch. Drift means the file believes it owns space it does not, or leaks it; file_validate and the FILE_HEADER_GET_*_FTAB assertions guard against it. The empty-subset relation lets extension prefer empty sectors without a separate table.
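The three identities can be written as a single predicate. This is a minimal sketch over a cut-down, hypothetical struct; it is not `file_validate`, only the arithmetic the invariant names.

```c
#include <assert.h>

struct sketch_counts
{
  int n_page_total;
  int n_page_user;
  int n_page_ftab;
  int n_page_free;
  int n_sector_total;
  int n_sector_partial;
  int n_sector_full;
  int n_sector_empty;
};

/* Returns non-zero when all three accounting relations hold. */
static int
sketch_counts_balance (const struct sketch_counts *h)
{
  return h->n_page_total == h->n_page_user + h->n_page_ftab + h->n_page_free
    && h->n_sector_total == h->n_sector_partial + h->n_sector_full
    && h->n_sector_empty >= 0 && h->n_sector_empty <= h->n_sector_partial;
}

/* Handing out one already-reserved page moves it from free to user;
 * the total is unchanged, so the identity still holds afterwards. */
static void
sketch_alloc_user_page (struct sketch_counts *h)
{
  assert (h->n_page_free > 0);
  h->n_page_free--;
  h->n_page_user++;
}
```

Adjusting only one counter of a pair, for example decrementing `n_page_free` without incrementing `n_page_user`, is exactly the drift the predicate detects.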

`file_extensible_data` — the generic multi-page table component

All three file tables are file_extensible_data: a small header followed by an array of fixed-size items, chained page-to-page.

// file_extensible_data -- src/storage/file_manager.c
struct file_extensible_data
{
  VPID vpid_next;       /* next component page, NULL if last */
  INT16 max_size;       /* capacity in bytes for items in this component */
  INT16 size_of_item;   /* byte size of one item */
  INT16 n_items;        /* number of items currently stored */
};

Field	Role	Why it exists
`vpid_next`	Link to next component	Chains overflow pages; NULL terminates.
`max_size`	Byte capacity here	Bounds `n_items`.
`size_of_item`	Size of one item	Partial table = `file_partial_sector` (16 B), full = `VSID` (8 B), user-page = `VPID` (8 B). One format, three item types.
`n_items`	Items stored	Drives iteration; insert/delete bump it.

Invariant — n_items * size_of_item <= max_size, with items kept densely packed starting at offset FILE_EXTDATA_HEADER_ALIGNED_SIZE. An insert that would overflow allocates a new component linked via vpid_next; a delete shifts the tail down. This density is what lets find-nth index by position (Ch.6–Ch.10).
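A minimal sketch of the capacity check and the shift-down delete, assuming a simplified header (the `vpid_next` link is omitted) and a caller-supplied item array. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct sketch_extdata
{
  int16_t max_size;		/* byte capacity for items in this component */
  int16_t size_of_item;		/* byte size of one item */
  int16_t n_items;		/* items currently stored */
};

/* True when one more item would break n_items * size_of_item <= max_size. */
static int
sketch_extdata_is_full (const struct sketch_extdata *e)
{
  return (e->n_items + 1) * e->size_of_item > e->max_size;
}

/* Delete the item at pos by shifting the tail down, keeping items dense.
 * items points at the first item, i.e. just past the aligned header. */
static void
sketch_extdata_remove_at (struct sketch_extdata *e, unsigned char *items, int pos)
{
  assert (pos >= 0 && pos < e->n_items);
  memmove (items + pos * e->size_of_item,
	   items + (pos + 1) * e->size_of_item,
	   (size_t) (e->n_items - pos - 1) * e->size_of_item);
  e->n_items--;
}
```

Because a delete never leaves a hole, the i-th item is always at byte offset `i * size_of_item` from the first item, which is what positional lookup depends on.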

`file_partial_sector`, `FILE_ALLOC_BITMAP`, `file_tablespace`, `file_descriptors`

// FILE_ALLOC_BITMAP -- src/storage/file_manager.h
typedef UINT64 FILE_ALLOC_BITMAP;     /* one bit per page in a sector (64 pages) */
#define FILE_FULL_PAGE_BITMAP      0xFFFFFFFFFFFFFFFF  /* Full allocation bitmap */
#define FILE_EMPTY_PAGE_BITMAP      0x0000000000000000  /* Empty allocation bitmap */
#define FILE_ALLOC_BITMAP_NBITS ((int) (sizeof (FILE_ALLOC_BITMAP) * CHAR_BIT))  /* 64 */

// file_partial_sector -- src/storage/file_manager.h
struct file_partial_sector
{
  VSID vsid;      /* MUST be first member: reinterpreted as VSID in file table */
  FILE_ALLOC_BITMAP page_bitmap;
};

VSID is { int32_t sectid; short volid; } = 6 bytes padded to 8; FILE_ALLOC_BITMAP is a UINT64 = 8 bytes; so sizeof (file_partial_sector) == 16.

Field	Role	Why it exists
`file_partial_sector.vsid`	The reserved sector’s id (8 B)	First member by contract: the full table stores bare `VSID`, so a `file_partial_sector` is reinterpreted as `VSID` on promotion.
`file_partial_sector.page_bitmap`	64-bit page allocation map (8 B)	Bit i set = page i allocated. `FILE_FULL_PAGE_BITMAP` = full; `FILE_EMPTY_PAGE_BITMAP` = empty.

Invariant — vsid is the first member, deliberately. The source comment: “VSID must be first member … the FILE_PARTIAL_SECTOR pointers in file table are reinterpreted as VSID.” When a partial sector fills, the file manager moves the leading VSID bytes into the full table without copying the bitmap. Reordering would corrupt the full table silently. FILE_ALLOC_BITMAP_NBITS == DISK_SECTOR_NPAGES == 64, so one bitmap covers one sector exactly.
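The first-member contract can be checked at compile time and exercised in a few lines. The structs below are stand-ins that mirror the shapes quoted above; the `sketch_*` names are hypothetical and the promotion helper only illustrates the pointer reinterpretation, not the file manager's actual table update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_vsid
{
  int32_t sectid;
  short volid;
};

struct sketch_partial_sector
{
  struct sketch_vsid vsid;	/* must stay first */
  uint64_t page_bitmap;		/* bit i set = page i allocated */
};

#define SKETCH_FULL_BITMAP UINT64_C (0xFFFFFFFFFFFFFFFF)

/* Fails the build if someone reorders the members. */
_Static_assert (offsetof (struct sketch_partial_sector, vsid) == 0, "vsid must stay first");

static void
sketch_set_page (struct sketch_partial_sector *ps, int page_offset)
{
  assert (page_offset >= 0 && page_offset < 64);
  ps->page_bitmap |= UINT64_C (1) << page_offset;
}

static int
sketch_sector_is_full (const struct sketch_partial_sector *ps)
{
  return ps->page_bitmap == SKETCH_FULL_BITMAP;
}

/* Promotion: a pointer to the entry is also a valid pointer to its VSID,
 * so the leading bytes can be copied out without touching the bitmap. */
static struct sketch_vsid
sketch_promote (const struct sketch_partial_sector *ps)
{
  const struct sketch_vsid *as_vsid = (const struct sketch_vsid *) ps;
  return *as_vsid;
}
```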

// file_tablespace -- src/storage/file_manager.h
struct file_tablespace
{
  INT64 initial_size;     /* bytes the file starts with */
  float expand_ratio;     /* fraction of current size to add when expanding */
  int expand_min_size;    /* lower clamp on an expansion, in bytes */
  int expand_max_size;    /* upper clamp on an expansion, in bytes */
};

Field	Role	Why it exists
`initial_size`	Starting byte size	`MAX(1, npages) * DB_PAGESIZE` at create.
`expand_ratio`	Growth fraction	~1% of current size for perm (`FILE_TABLESPACE_DEFAULT_RATIO_EXPAND`); 0 for temp.
`expand_min_size`	Minimum expansion	At least one sector for perm; 0 for temp.
`expand_max_size`	Maximum expansion	Caps one growth (1024 sectors perm; 0 temp).

Temp files use FILE_TABLESPACE_FOR_TEMP_NPAGES, zeroing ratio/min/max — temp files do not auto-expand the same way.
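One plausible reading of the four tablespace fields is "take a fraction of the current size, then clamp it". The sketch below encodes that reading; `sketch_expand_bytes` is a hypothetical helper and the exact formula the file manager applies (rounding, sector alignment, order of clamps) is not shown in this chapter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reading of the policy: ratio of current size, clamped to
 * [expand_min_size, expand_max_size]. Not taken from the source. */
static int64_t
sketch_expand_bytes (int64_t current_bytes, float expand_ratio,
		     int expand_min_size, int expand_max_size)
{
  int64_t want = (int64_t) ((double) current_bytes * (double) expand_ratio);

  if (want < expand_min_size)
    want = expand_min_size;	/* lower clamp */
  if (want > expand_max_size)
    want = expand_max_size;	/* upper clamp */
  return want;
}
```

Under this reading a small permanent file always grows by at least the minimum, a very large one by at most the maximum, and a temp file with ratio, minimum, and maximum all zero computes no automatic growth at all.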

// file_descriptors -- src/storage/file_manager.h
/* note: if you change file descriptors size, make sure to change disk compatibility version too! */
#define FILE_DESCRIPTORS_SIZE 64
union file_descriptors
{
  FILE_HEAP_DES heap;
  FILE_OVF_HEAP_DES heap_overflow;
  FILE_BTREE_DES btree;
  FILE_OVF_BTREE_DES btree_key_overflow;  /* TODO: rename FILE_OVF_BTREE_DES */
  FILE_EHASH_DES ehash;
  FILE_VACUUM_DATA_DES vacuum_data;
  char dummy_align[FILE_DESCRIPTORS_SIZE];
};

The per-member struct shapes below are added annotations (the source defines each FILE_*_DES separately, not inline):

Member	Shape (annotation)	Role	Why it exists
`heap`	`{ OID class_oid; HFID hfid; }`	Heap file’s class OID + HFID	A heap file points back to its class and heap id.
`heap_overflow`	`{ HFID hfid; OID class_oid; }`	Overflow heap’s HFID + class OID	Overflow records for large heap rows.
`btree`	`{ OID class_oid; int attr_id; }`	Index file’s class OID + attribute id	A btree file knows the indexed class and attribute.
`btree_key_overflow`	`{ BTID btid; OID class_oid; }`	Long-key overflow file (`file_ovf_btree_des`)	Long keys overflow into a separate file.
`ehash`	`{ OID class_oid; int attr_id; }`	Extensible hash’s class OID + attr id	Identifies the hashed attribute.
`vacuum_data`	`{ VPID vpid_first; }`	First VPID of vacuum data	Vacuum’s bookkeeping file.
`dummy_align`	`char[FILE_DESCRIPTORS_SIZE]`	64-byte padding	Pins the union at `FILE_DESCRIPTORS_SIZE`; the source ties this size to the on-disk compatibility version, so it must not change casually.

The union is interpreted per file_header.type. FILE_TYPE_CAN_BE_NUMERABLE, FILE_TYPE_IS_ALWAYS_TEMP, and the file_flags bits decide which tables the file actually carries — covered in Ch.6 and Ch.10.
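The padding member is what fixes the union's size. A reduced sketch, with two hypothetical member shapes standing in for the real `FILE_*_DES` structs, shows how a compile-time check can pin the size that the on-disk format depends on.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DESCRIPTORS_SIZE 64

/* Stand-in shapes; the real descriptor structs are defined in the source. */
struct sketch_oid
{
  int32_t pageid;
  short slotid;
  short volid;
};

struct sketch_btree_des
{
  struct sketch_oid class_oid;
  int attr_id;
};

struct sketch_vacuum_des
{
  int32_t first_pageid;
  short first_volid;
};

union sketch_descriptors
{
  struct sketch_btree_des btree;
  struct sketch_vacuum_des vacuum_data;
  char dummy_align[SKETCH_DESCRIPTORS_SIZE];	/* pins the union size */
};

/* A member that outgrows the padding would change the on-disk layout;
 * this turns that mistake into a build failure. */
_Static_assert (sizeof (union sketch_descriptors) == SKETCH_DESCRIPTORS_SIZE,
		"descriptor union must stay 64 bytes");
```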

1.3 Chapter summary — key takeaways

There are two persisted page-0 structures — disk_volume_header (one per volume) and file_header (one per file) — and the rest either summarize them in memory (disk_cache family) or are scratch for one operation (disk_reserve_context, disk_stab_cursor).
The disk manager hands out sectors (64-page extents) tracked by a one-bit-per-sector table sized once for nsect_max; the table is immovable, which is why every reserved VSID and cached cursor stays valid across volume extension.
disk_cache is the single in-memory free-space oracle: vols[] per-volume hints feed two purpose rollups (disk_perm_info, disk_temp_info), and the lock order (mutex_extend is never acquired while a mutex_reserve is held) is a hard invariant against deadlock.
Sector reservation is two-step: disk_reserve_context drains a plan from the cache into cache_vol_reserve[], then replays it against the bitmaps via a disk_stab_cursor; the *_remaining counters make a partial reservation precisely reversible.
The file manager carves pages from reserved sectors using three file_extensible_data tables — partial, full, user-page — the same chained, densely-packed, fixed-item format differing only in size_of_item.
file_partial_sector (16 B) puts vsid first on purpose so a filled sector can be promoted to the full table by reinterpreting the pointer as a bare VSID; its 64-bit page_bitmap maps exactly one sector’s 64 pages.
file_header’s page and sector counters must balance as accounting identities; its three offset_to_*_ftab, the temp-alloc cursor, and the numerable find-nth cache are the only state distinguishing regular, temporary, and numerable files — operational meaning deferred to Ch.7, Ch.8, and Ch.10.

Chapter 2: Initialization and Memory Management

The in-memory structures from Chapter 1 (the disk_cache family) have no on-disk form; the source of truth is the per-volume header plus its sector allocation table (Chapter 3), and disk_Cache is a derived rollup recomputed from those headers at every boot. This chapter answers: where do disk_Cache and the file-manager globals come from at server start, and how is the cache rebuilt by walking the mounted-volume chain? For why CUBRID keeps a coarse RAM counter, see the companion’s “In-memory cache” section.

2.1 The bootstrap call chain

Two modules wake up at boot: the disk manager (owns disk_Cache, disk_manager_init) reconstructs state from disk; the file manager (owns file_Tempcache and the tracker globals, file_manager_init) only zeroes RAM.

flowchart TD
  boot["server boot"] --> dmi["disk_manager_init(load_from_disk=true)"]
  dmi --> dci["disk_cache_init -> malloc disk_Cache"]
  dci --> dclav["disk_cache_load_all_volumes"]
  dclav --> fmm["fileio_map_mounted: walk mounted volumes"]
  fmm --> dclv["disk_cache_load_volume (per volid)"]
  dclv --> dvb["disk_volume_boot: read header + count free"]
  boot --> fmi["file_manager_init"]
  fmi --> ftci["file_tempcache_init -> zero file_Tempcache"]

Figure 2-1. Boot-time initialization fan-out.

2.2 `disk_manager_init` — parameter capture, reload guard, optional load

disk_manager_init does four things in order: derive the temp-volume sector cap, capture the logging flag, (re)allocate the cache, and conditionally load from disk.

// disk_manager_init -- src/storage/disk_manager.c
int
disk_manager_init (THREAD_ENTRY * thread_p, bool load_from_disk)
{
  int error_code = NO_ERROR;

  disk_Temp_max_sects = (DKNSECTS) prm_get_integer_value (PRM_ID_BOSR_MAXTMP_PAGES);
  if (disk_Temp_max_sects < 0)
    disk_Temp_max_sects = SECTID_MAX;                  /* <- negative param means "no cap" (infinite) */
  else
    disk_Temp_max_sects = disk_Temp_max_sects / DISK_SECTOR_NPAGES;  /* <- pages -> sectors */
  // ... condensed: disk_Logging = prm_get_bool_value (PRM_ID_DISK_LOGGING) ...

  if (disk_Cache != NULL)
    disk_cache_final ();                        /* <- idempotent reload: tear down stale cache first */
  error_code = disk_cache_init ();
  if (error_code != NO_ERROR)
    {
      ASSERT_ERROR ();
      return error_code;                        /* <- malloc failure: nothing to clean up */
    }
  assert (disk_Cache != NULL);

  if (load_from_disk && !disk_cache_load_all_volumes (thread_p))
    {
      ASSERT_ERROR_AND_SET (error_code);
      disk_manager_final ();                    /* <- partial load failed: roll the whole cache back */
      return error_code;
    }
  return NO_ERROR;
}

Branch accounting:

Branch	Condition	Effect
`disk_Temp_max_sects < 0`	param negative (default `-1`)	cap = `SECTID_MAX` -> infinite temp space
else	param `>= 0`	param is a page count; / `DISK_SECTOR_NPAGES` -> sector cap
`disk_Cache != NULL`	prior cache exists (reload)	`disk_cache_final` frees it first — makes init idempotent
`disk_cache_init != NO_ERROR`	`malloc` failed	early return; nothing allocated to free
`load_from_disk && load fails`	a volume failed to boot	`disk_manager_final` frees the half-cache, propagate error
`load_from_disk == false`	first-volume format path	cache stays empty; caller fills it manually

The static initializer static DKNSECTS disk_Temp_max_sects = -2; is a pre-init sentinel (“not yet computed”), distinct from the parameter default -1 (“Infinite”). disk_manager_init always overwrites it from PRM_ID_BOSR_MAXTMP_PAGES (temp_file_max_size_in_pages) per the branch table; this later bounds permanent-volume-as-temp growth.
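The first two rows of the branch table reduce to a small pure function. In this sketch `SKETCH_SECTID_MAX` is an assumed stand-in for `SECTID_MAX`, and `sketch_temp_max_sects` is a hypothetical name; only the sign test and the division by 64 come from the code above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_NPAGES 64		/* pages per sector */
#define SKETCH_SECTID_MAX INT32_MAX	/* assumed stand-in for SECTID_MAX */

/* Convert the page-count parameter into a sector cap. */
static int32_t
sketch_temp_max_sects (int param_pages)
{
  if (param_pages < 0)
    return SKETCH_SECTID_MAX;	/* negative means "no cap" */
  return param_pages / SKETCH_SECTOR_NPAGES;	/* integer division truncates */
}
```

One consequence worth noting: because the division truncates, a non-negative setting below 64 pages yields a cap of zero sectors in this sketch.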

Invariant — the reload path is destructive-then-rebuilding. disk_manager_init may run more than once (reload after recovery phases), so it must never leak the old cache. The disk_Cache != NULL guard calls disk_cache_final first; without it a second init leaks the previous allocation and its three mutexes.
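The destructive-then-rebuilding pattern can be modeled with a toy cache. All names here are hypothetical, the mutexes are left out, and the loader is passed in as a callback purely so the failure path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_cache
{
  int nvols_perm;
  int nvols_temp;
};

static struct sketch_cache *sketch_Cache = NULL;

static void
sketch_cache_final (void)
{
  free (sketch_Cache);
  sketch_Cache = NULL;
}

static int
sketch_cache_init (void)
{
  assert (sketch_Cache == NULL);	/* never double-allocate */
  sketch_Cache = (struct sketch_cache *) malloc (sizeof (*sketch_Cache));
  if (sketch_Cache == NULL)
    return -1;
  sketch_Cache->nvols_perm = sketch_Cache->nvols_temp = 0;
  return 0;
}

/* Toy loaders standing in for the per-volume load. */
static int sketch_load_ok (void) { return 1; }
static int sketch_load_fail (void) { return 0; }

static int
sketch_manager_init (int (*load) (void))
{
  if (sketch_Cache != NULL)
    sketch_cache_final ();	/* reload: tear down the stale cache first */
  if (sketch_cache_init () != 0)
    return -1;
  if (load != NULL && !load ())
    {
      sketch_cache_final ();	/* partial load failed: roll everything back */
      return -1;
    }
  return 0;
}
```

Calling `sketch_manager_init` twice in a row is safe because the guard frees the old cache before the asserting allocator runs; a failed load leaves the global `NULL` rather than half-filled.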

2.3 `disk_cache_init` — allocating and zeroing the global cache

disk_cache_init is the only allocator of disk_Cache. It mallocs one flat DISK_CACHE (the vols[] array is inline, with LOG_MAX_DBVOLID + 1 slots), then zeroes every counter so the per-volume load can simply add into the rollup.

// disk_cache_init -- src/storage/disk_manager.c
static int
disk_cache_init (void)
{
  int i;
  assert (disk_Cache == NULL);                 /* <- never double-allocate */
  disk_Cache = (DISK_CACHE *) malloc (sizeof (DISK_CACHE));
  if (disk_Cache == NULL)
    { /* ... er_set OUT_OF_VIRTUAL_MEMORY, return ER_OUT_OF_VIRTUAL_MEMORY ... */ }

  disk_Cache->nvols_perm = disk_Cache->nvols_temp = 0;
  disk_Cache->perm_purpose_info.extend_info.nsect_vol_max =                   /* default new-vol size */
    DISK_SECTS_ROUND_UP ((DKNSECTS) (prm_get_bigint_value (PRM_ID_DB_VOLUME_SIZE) / IO_SECTORSIZE));
  // ... condensed: perm free/total/max = 0 (load ADDS in); volid_extend = NULL_VOLID; voltype = PERM ...
  // ... condensed: temp extend_info same vol_max, zeroed, NULL_VOLID; nsect_perm_free/total = 0 ...
  // ... condensed: 3 pthread_mutex_init (perm/temp mutex_reserve, mutex_extend) ...
  for (i = 0; i <= LOG_MAX_DBVOLID; i++)       /* <- inclusive of highest legal volid */
    {
      disk_Cache->vols[i].purpose = DISK_UNKNOWN_PURPOSE;       /* <- every slot starts "no volume here" */
      disk_Cache->vols[i].nsect_free = 0;
    }
  return NO_ERROR;
}

nsect_vol_max (both purposes) is the default new-volume size for later auto-extension, not a current value. Both volid_extend start NULL_VOLID (discovered during load), both nvols_* start 0, and every slot starts DISK_UNKNOWN_PURPOSE / zero free. Since load only adds, a fresh disk_cache_init must precede any load.

2.4 `disk_cache_load_all_volumes` — walking the mounted-volume chain

disk_cache_load_all_volumes is a thin wrapper — it asserts the cache exists and returns fileio_map_mounted (thread_p, disk_cache_load_volume, NULL), handing the per-volume callback to the chain walker.

fileio_map_mounted (in file_io.c) is that walker. It iterates the file-IO volume-info header in two passes: permanent volumes ascending from volid 0 up to next_perm_volid - 1, then temporary volumes descending to next_temp_volid (the file-IO equivalent of the on-disk next_volid chain). Unmounted slots (vol_info_p->vdes == NULL_VOLDES) are skipped. If the callback returns false, the walk stops and returns false, which disk_manager_init treats as fatal.

flowchart TD
  start["fileio_map_mounted"] --> permloop{"perm volid <= next_perm_volid-1?"}
  permloop -- "vdes live" --> cb1["disk_cache_load_volume(volid)"]
  permloop -- "skip / done" --> temploop{"temp volid >= next_temp_volid?"}
  cb1 -- false --> stopf["return false"]
  cb1 -- true --> permloop
  temploop -- "vdes live" --> cb2["disk_cache_load_volume(volid)"]
  cb2 -- false --> stopf
  cb2 -- true --> temploop
  temploop -- done --> okt["return true"]

Figure 2-2. fileio_map_mounted two-pass walk driving the cache load.
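The two-pass order and the stop-on-false contract can be sketched with a toy volume table. This is a minimal illustration, not the file_io.c code: toy_volinfo, toy_map_mounted, toy_record, TOY_MAX_VOLID and TOY_NULL_VOLDES are hypothetical stand-ins, and starting the descending pass at the top of the table is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VOLID 7        /* hypothetical stand-in for LOG_MAX_DBVOLID */
#define TOY_NULL_VOLDES (-1)   /* hypothetical stand-in for NULL_VOLDES */

typedef bool (*toy_vol_func) (int volid, void *arg);

struct toy_volinfo
{
  int vdes[TOY_MAX_VOLID + 1]; /* TOY_NULL_VOLDES marks an unmounted slot */
  int next_perm_volid;         /* permanent ids are 0 .. next_perm_volid - 1 */
  int next_temp_volid;         /* lower bound of the descending temporary pass */
};

/* Visit permanent volumes ascending, then temporary volumes descending.
 * Returns false as soon as the callback returns false, true otherwise. */
static bool
toy_map_mounted (const struct toy_volinfo *info, toy_vol_func fn, void *arg)
{
  int volid;

  assert (info != NULL && fn != NULL);

  for (volid = 0; volid <= info->next_perm_volid - 1; volid++)
    {
      if (info->vdes[volid] == TOY_NULL_VOLDES)
        continue;                      /* skip unmounted slot */
      if (!fn (volid, arg))
        return false;                  /* callback failure stops the walk */
    }
  for (volid = TOY_MAX_VOLID; volid >= info->next_temp_volid; volid--)
    {
      if (info->vdes[volid] == TOY_NULL_VOLDES)
        continue;
      if (!fn (volid, arg))
        return false;
    }
  return true;
}

/* Example callback: record the visit order and fail on one chosen volid. */
struct toy_trace
{
  int order[TOY_MAX_VOLID + 1];
  int n;
  int fail_volid;
};

static bool
toy_record (int volid, void *arg)
{
  struct toy_trace *t = (struct toy_trace *) arg;

  t->order[t->n++] = volid;
  return volid != t->fail_volid;
}
```

With permanent slots 0 and 1 mounted, slot 2 unmounted, and temporary slots 7 and 6 mounted, the callback sees 0, 1, 7, 6; a callback that fails on volid 1 ends the walk after two visits.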

2.5 `disk_cache_load_volume` — rolling one header into the rollup

The heart of cache reconstruction. Per volume it boots the header via disk_volume_boot (reads the header, counts free sectors — Chapter 3), then folds the result into the right purpose info.

// disk_cache_load_volume -- src/storage/disk_manager.c
static bool
disk_cache_load_volume (THREAD_ENTRY * thread_p, INT16 volid, void *ignore)
{
  DB_VOLPURPOSE vol_purpose;
  DB_VOLTYPE vol_type;
  DISK_VOLUME_SPACE_INFO space_info = DISK_VOLUME_SPACE_INFO_INITIALIZER;

  if (disk_volume_boot (thread_p, volid, &vol_purpose, &vol_type, &space_info) != NO_ERROR)
    {
      ASSERT_ERROR ();
      return false;                            /* <- aborts the whole map walk */
    }

  if (vol_type != DB_PERMANENT_VOLTYPE)
    {
      /* don't save temporary volumes... they will be dropped anyway */
      return true;                             /* <- temp-type volumes are not cached at all */
    }

  if (vol_purpose == DB_PERMANENT_DATA_PURPOSE)
    {
      // perm_purpose_info.extend_info.nsect_{free,total,max} += space_info.n_{free,total,max}_sects
      // ... condensed: assert nsect_free <= nsect_total <= nsect_max ...
      if (space_info.n_total_sects < space_info.n_max_sects)
        {
          assert (disk_Cache->perm_purpose_info.extend_info.volid_extend == NULL_VOLID);
          disk_Cache->perm_purpose_info.extend_info.volid_extend = volid;  /* <- this vol can still grow */
        }
    }
  else                                          /* perm type, temp purpose */
    {
      assert (space_info.n_total_sects == space_info.n_max_sects);         /* <- perm-as-temp is fully grown */
      // temp_purpose_info.nsect_perm_{free,total} += space_info.n_{free,total}_sects
      // ... condensed: assert nsect_perm_free <= nsect_perm_total ...
    }

  disk_Cache->vols[volid].nsect_free = space_info.n_free_sects;
  disk_Cache->vols[volid].purpose = vol_purpose;
  disk_Cache->nvols_perm++;                     /* <- runs for BOTH branches above */
  return true;
}

Branch accounting:

Branch	Condition	Effect
boot fails	`disk_volume_boot != NO_ERROR`	return `false`; map walk and whole init abort
temporary-type volume	`vol_type != DB_PERMANENT_VOLTYPE`	return `true`; not cached (dropped/reformatted at boot)
perm volume, perm data	`vol_purpose == DB_PERMANENT_DATA_PURPOSE`	add free/total/max into `perm_purpose_info.extend_info`; if below max size, set `volid_extend`
perm volume repurposed for temp	else (perm type, temp purpose)	add free/total into `temp_purpose_info.nsect_perm_*`; assert fully grown

The else-branch is the subtle case: type (survives restart?) and purpose (what it holds) are orthogonal. A perm-type/temp-purpose volume’s space rolls into nsect_perm_* (“permanent sectors lent to temp”), distinct from temp_purpose_info.extend_info (genuine temporary-type volumes, skipped above). The perm-path assert (... == NULL_VOLID) enforces at most one permanent volume “growing”. After the if/else, the slot recording (vols[volid].*) and nvols_perm++ run unconditionally for every permanent-TYPE volume regardless of purpose — so a perm-as-temp volume is counted in nvols_perm, never nvols_temp; since temporary-type volumes returned early, after a full load nvols_temp == 0.
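The type/purpose split can be replayed on a toy rollup to see which counters move. This is a sketch under stated assumptions: the toy_* structs flatten the real DISK_CACHE layout, disk_volume_boot is replaced by a caller-supplied toy_space, and the per-slot vols[] recording is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NULL_VOLID (-1)          /* hypothetical stand-in for NULL_VOLID */

enum toy_voltype { TOY_PERM_TYPE, TOY_TEMP_TYPE };
enum toy_purpose { TOY_PERM_DATA, TOY_TEMP_DATA };

struct toy_space { int n_free, n_total, n_max; };

struct toy_cache
{
  int nvols_perm, nvols_temp;
  int perm_free, perm_total, perm_max;  /* models perm_purpose_info.extend_info */
  int perm_volid_extend;
  int temp_perm_free, temp_perm_total;  /* models temp_purpose_info.nsect_perm_* */
};

/* Zero every counter so that loading only ever adds. */
static void
toy_cache_init (struct toy_cache *c)
{
  assert (c != NULL);
  c->nvols_perm = c->nvols_temp = 0;
  c->perm_free = c->perm_total = c->perm_max = 0;
  c->perm_volid_extend = TOY_NULL_VOLID;
  c->temp_perm_free = c->temp_perm_total = 0;
}

/* Fold one volume into the rollup, mirroring the branches described above. */
static bool
toy_cache_load_volume (struct toy_cache *c, int volid, enum toy_voltype type,
                       enum toy_purpose purpose, struct toy_space sp)
{
  if (type != TOY_PERM_TYPE)
    return true;                        /* temporary-type: not cached at all */

  if (purpose == TOY_PERM_DATA)
    {
      c->perm_free += sp.n_free;
      c->perm_total += sp.n_total;
      c->perm_max += sp.n_max;
      if (sp.n_total < sp.n_max)
        {
          assert (c->perm_volid_extend == TOY_NULL_VOLID);  /* at most one grower */
          c->perm_volid_extend = volid;
        }
    }
  else
    {
      assert (sp.n_total == sp.n_max);  /* perm-as-temp is fully grown */
      c->temp_perm_free += sp.n_free;
      c->temp_perm_total += sp.n_total;
    }
  c->nvols_perm++;                      /* runs for both permanent-type branches */
  return true;
}
```

Loading two permanent-data volumes, one permanent-type/temporary-purpose volume and one temporary-type volume leaves nvols_perm at 3 and nvols_temp at 0, matching the accounting in the paragraph above.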

Invariant — the cache is a derived rollup and may legitimately undercount. nsect_free is allowed to be lower than reality at any time; the two-step reservation protocol (Chapter 4) depends on this — a reservation may pessimistically decrement the cache and reconcile against the allocation table later. Never treat nsect_free as exact; the allocation table is the source of truth.

2.6 `disk_manager_final` / `disk_cache_final` — teardown

Teardown is branch-light; disk_manager_final delegates to disk_cache_final.

// disk_manager_final -- src/storage/disk_manager.c
void disk_manager_final (void) { disk_cache_final (); }

// disk_cache_final -- src/storage/disk_manager.c
static void
disk_cache_final (void)
{
  if (disk_Cache == NULL)
    {
      return;                                  /* <- safe to call when never initialized */
    }
  // ... condensed: assert perm/temp owner_reserve == -1 and owner_extend == -1 (no lock held at teardown) ...
  // ... condensed: pthread_mutex_destroy the perm/temp mutex_reserve and mutex_extend ...
  free_and_init (disk_Cache);                  /* <- frees and NULLs the pointer */
}

The disk_Cache == NULL guard makes final idempotent, which is why both the reload path and the load-failure rollback call it unconditionally. The three owner_* asserts (debug only) document that no thread may hold the reserve or extend mutex at teardown — a violation is caught here, not as a destroyed locked mutex. free_and_init zeroes the pointer so a later disk_cache_init passes assert (disk_Cache == NULL).

2.7 `file_manager_init` / `file_manager_final` and the file-manager globals

The file manager reconstructs nothing from disk: it captures one logging flag, sanity-checks a size assumption, and initializes the temporary-file cache.

// file_manager_init -- src/storage/file_manager.c
int
file_manager_init (void)
{
  file_Logging = prm_get_bool_value (PRM_ID_FILE_LOGGING);
  assert (FILE_DESCRIPTORS_SIZE == sizeof (FILE_DESCRIPTORS));   /* <- layout self-check */
  return file_tempcache_init ();
}

// file_manager_final -- src/storage/file_manager.c
void file_manager_final (void) { file_tempcache_final (); }

file_manager_init does not touch file_Tracker_vfid / file_Tracker_vpid; they are statically zero-initialized (VFID_INITIALIZER / VPID_INITIALIZER) and only filled when the tracker file is created or located during boot (Chapters 6 and 9). file_Tempcache is likewise static, “empty” until file_tempcache_init populates it:

// file_tempcache_init -- src/storage/file_manager.c
static int
file_tempcache_init (void)
{
  int ntrans = logtb_get_number_of_total_tran_indices () + 1;   /* SERVER_MODE; else 1 */
  assert (file_Tempcache.tran_files == NULL);                   /* <- tran_files != NULL means "initialized" */

  // ... condensed: free_entries/cached_* = NULL, ncached_* = 0, nfree_entries_max = ntrans*8 ...
  file_Tempcache.ncached_max = prm_get_integer_value (PRM_ID_MAX_ENTRIES_IN_TEMP_FILE_CACHE);
  pthread_mutex_init (&file_Tempcache.mutex, NULL);

  file_Tempcache.tran_files = (FILE_TEMPCACHE_TRAN_ENTRY *) malloc (ntrans * sizeof (...));
  if (file_Tempcache.tran_files == NULL)
    {
      pthread_mutex_destroy (&file_Tempcache.mutex);            /* <- undo the mutex on alloc failure */
      // ... er_set OUT_OF_VIRTUAL_MEMORY; return ER_OUT_OF_VIRTUAL_MEMORY ...
    }
  // ... condensed: memset tran_files; per-tran mutex_init loop; memset spacedb_temp ...
  return NO_ERROR;
}

Branch accounting: the only non-trivial branch is the malloc failure, which destroys file_Tempcache.mutex before returning so nothing is half-constructed. file_tempcache_final mirrors this — early return if tran_files == NULL, else free every per-transaction list, the cached numerable / not-numerable lists and the free-entry pool, and destroy the mutexes.

Invariant — file_Tempcache.tran_files == NULL is the “uninitialized” sentinel. Both init (via assert) and final (via early return) treat tran_files as the single truth for whether the tempcache exists. Code that allocates or frees it must keep this honest, or final skips a real teardown or double-frees.
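The sentinel discipline can be shown in isolation. The sketch below is a reduced model, not the file_manager.c code: toy_Tempcache, toy_tempcache_init and toy_tempcache_final are hypothetical names, and the mutexes and cached-entry lists are omitted so that only the tran_files sentinel remains.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_tran_entry { int nfiles; };

struct toy_tempcache
{
  struct toy_tran_entry *tran_files;   /* NULL means "not initialized" */
  int ntrans;
};

static struct toy_tempcache toy_Tempcache = { NULL, 0 };

/* Allocate the per-transaction array; returns 0 on success, -1 on failure. */
static int
toy_tempcache_init (int ntrans)
{
  assert (toy_Tempcache.tran_files == NULL);   /* never initialize twice */
  assert (ntrans > 0);

  toy_Tempcache.tran_files =
    (struct toy_tran_entry *) malloc ((size_t) ntrans * sizeof (struct toy_tran_entry));
  if (toy_Tempcache.tran_files == NULL)
    return -1;                                 /* sentinel still says "uninitialized" */

  memset (toy_Tempcache.tran_files, 0, (size_t) ntrans * sizeof (struct toy_tran_entry));
  toy_Tempcache.ntrans = ntrans;
  return 0;
}

/* Safe to call at any time: the sentinel decides whether there is work to do. */
static void
toy_tempcache_final (void)
{
  if (toy_Tempcache.tran_files == NULL)
    return;
  free (toy_Tempcache.tran_files);
  toy_Tempcache.tran_files = NULL;             /* restore the sentinel */
  toy_Tempcache.ntrans = 0;
}
```

Because final resets the pointer, a second final is a no-op and a later init passes its assert. Forgetting that reset is the double-free case the invariant warns about.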

2.8 Chapter summary — key takeaways

disk_manager_init is the only assembler of disk_Cache and is idempotent: the disk_Cache != NULL guard tears down any prior cache, disk_cache_init allocates, and a failed load_from_disk rolls back via disk_manager_final.
disk_cache_init zeroes all rollup counters so load purely adds, and seeds every vols[] slot to DISK_UNKNOWN_PURPOSE.
The cache is rebuilt by walking mounted volumes — fileio_map_mounted (two-pass perm-ascending / temp-descending, bounded by next_*_volid), one disk_cache_load_volume per live descriptor.
disk_cache_load_volume distinguishes type from purpose: temp-type volumes are skipped; perm-data feeds perm_purpose_info.extend_info (may set the single volid_extend); perm-type/temp-purpose feeds temp_purpose_info.nsect_perm_*. nvols_perm++ runs for every permanent-type volume regardless of purpose, so after a full load nvols_temp == 0.
The cache is a derived, lower-bound rollup that may legitimately undercount free sectors; the allocation table is the source of truth.
disk_Temp_max_sects starts at -2 (pre-init sentinel, vs parameter default -1 = Infinite), overwritten from PRM_ID_BOSR_MAXTMP_PAGES: negatives map to SECTID_MAX, non-negative pages divide by DISK_SECTOR_NPAGES.
The file manager reconstructs nothing from disk: file_manager_init only captures a flag and runs file_tempcache_init; the tracker globals stay static *_INITIALIZER zeros, and file_Tempcache.tran_files == NULL is the uninitialized sentinel guarding both init and final.

Chapter 3: Volume Format and the Sector Allocation Table

This chapter answers: how is a CUBRID volume laid out on disk, and how does the disk manager flip bits in the sector allocation table without scanning the bitmap one bit at a time? The high-level companion (cubrid-disk-manager.md) covers why a sector is the allocation quantum and why a bitmap beats a free-list; here we trace the byte layout, the format-time writers, and the bitmap-as-functor machinery. DISK_VOLUME_HEADER and DISK_STAB_CURSOR are introduced field-by-field in Chapter 1.

3.1 The on-disk volume layout

Every CUBRID volume — permanent or temporary, first or extension — shares one macro-layout: page 0 is the volume header, then a contiguous run of sector-table (STAB) pages, then data. Three header fields fix it:

// disk_volume_header_set_stab -- src/storage/disk_manager.c
volheader->stab_first_page = DISK_VOLHEADER_PAGE + 1;                                    /* <- STAB always starts at page 1 */
volheader->stab_npages = CEIL_PTVDIV (volheader->nsect_max, DISK_STAB_PAGE_BIT_COUNT);  /* <- sized by nsect_max, not nsect_total */
volheader->sys_lastpage = volheader->stab_first_page + volheader->stab_npages - 1;      /* <- last reserved sys page */

DISK_VOLHEADER_PAGE is 0, so stab_first_page is always page 1. The decisive choice is the divisor — nsect_max, not nsect_total: a volume grows its used size up to its capacity without moving the data region, because the STAB was sized for the maximum on day one. Chapter 5 (extension) depends on this — extension flips already-present STAB bits and never re-lays-out the volume.

flowchart LR
  subgraph Volume["Volume file (pages)"]
    H["page 0<br/>DISK_VOLUME_HEADER<br/>magic, volid, purpose,<br/>nsect_total, nsect_max,<br/>stab_first_page, stab_npages,<br/>sys_lastpage, hint_allocsect"]
    S["pages 1 .. sys_lastpage<br/>SECTOR ALLOCATION TABLE<br/>stab_npages pages of UINT64 units<br/>1 bit == 1 sector"]
    D["pages sys_lastpage+1 .. end<br/>DATA SECTORS<br/>64 pages each"]
  end
  H --> S --> D

Figure 3-1: macro-layout of any CUBRID volume. The STAB is sized for nsect_max so the data region’s start never moves.

A “sector” is 64 consecutive pages (DISK_SECTOR_NPAGES); SECTOR_FROM_PAGEID(pageid) is pageid / 64. The system sectors a volume self-reserves at format time number SECTOR_FROM_PAGEID(sys_lastpage) + 1 (header + all STAB pages, rounded up) — the value that drives disk_stab_init (§3.3).

Invariant — STAB sizing is pinned to nsect_max. disk_verify_volume_header asserts stab_npages == CEIL_PTVDIV(nsect_max, DISK_STAB_PAGE_BIT_COUNT), stab_npages >= CEIL_PTVDIV(nsect_total, ...), and stab_first_page == DISK_VOLHEADER_PAGE + 1. Sizing by nsect_total instead would leave a later extension with no bitmap bits for the new sectors, and the assert would fire on the next header fetch.
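A worked instance of the sizing arithmetic makes the nsect_max choice concrete. The sketch assumes a bitmap page holding 16384 * 8 bits purely for illustration (the real DISK_STAB_PAGE_BIT_COUNT depends on the configured page size), and the toy_* names and TOY_CEIL_DIV macro are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VOLHEADER_PAGE 0
#define TOY_SECTOR_NPAGES 64
#define TOY_STAB_PAGE_BIT_COUNT (16384 * 8)        /* assumed bits per STAB page */
#define TOY_CEIL_DIV(a, b) (((a) + (b) - 1) / (b)) /* ceiling division, positive operands */

struct toy_volheader
{
  int nsect_total;      /* sectors currently in use by the volume */
  int nsect_max;        /* capacity the volume may grow to */
  int stab_first_page;
  int stab_npages;
  int sys_lastpage;
};

/* Derive the three layout fields from nsect_max, as the header setter does. */
static void
toy_set_stab (struct toy_volheader *h)
{
  assert (h != NULL && h->nsect_total <= h->nsect_max);
  h->stab_first_page = TOY_VOLHEADER_PAGE + 1;
  h->stab_npages = TOY_CEIL_DIV (h->nsect_max, TOY_STAB_PAGE_BIT_COUNT);
  h->sys_lastpage = h->stab_first_page + h->stab_npages - 1;
}

/* Number of sectors occupied by the header and STAB pages, rounded up. */
static int
toy_nsects_sys (const struct toy_volheader *h)
{
  return h->sys_lastpage / TOY_SECTOR_NPAGES + 1;
}
```

For nsect_total = 1024 and nsect_max = 300000 the sketch yields three STAB pages, sys_lastpage = 3 and one system sector. Sizing by nsect_total would have produced a single STAB page covering only 131072 sectors, too few for the volume's eventual capacity.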

3.2 disk_format and disk_format_first_volume — writing the header

disk_format creates any volume; disk_format_first_volume is a thin shim that bootstraps the first volume (LOG_DBFIRST_VOLID) plus the cache: it calls disk_manager_init, sets disk_Cache->nvols_perm to 1 (rolled back to 0 on failure), and sets ext_info.nsect_total equal to ext_info.nsect_max (no headroom on the first volume).

disk_format has many error paths. The flowchart accounts for every branch via its edge labels; the prose below adds only what the flowchart cannot carry.

flowchart TD
  A["validate name & purpose"] -->|name too long| RET1["return ER_..._TOO_LONG"]
  A -->|bad purpose| RET2["return ER_DISK_UNKNOWN_PURPOSE"]
  A -->|ok| B{"voltype == PERMANENT?"}
  B -->|yes: log undo RVDK_FORMAT| C["force flush both paths<br/>then fileio_format OS file"]
  B -->|no| C
  C -->|NULL_VOLDES| RET3["return error, nothing to clean"]
  C -->|ok| E["fix page 0 NEW_PAGE,<br/>ptype PAGE_VOLHEADER"]
  E -->|fix fails| X["goto exit"]
  E -->|ok| F["fill header,<br/>set_stab"]
  F --> G{"sys_lastpage >= extend_npages?"}
  G -->|yes: ER_IO_FORMAT_BAD_NPAGES| X
  G -->|no: set params/name/remarks, err goto exit| I{"PERMANENT?"}
  I -->|yes: RVDK_NEWVOL + RVDK_FORMAT redo offset=-1| K["disk_stab_init"]
  I -->|no| K
  K -->|err| X
  K -->|ok| L{"PERMANENT and volid != FIRST?"}
  L -->|yes: disk_set_link prev vol, err goto exit| N{"PERMANENT?"}
  L -->|no| N
  N -->|yes: RVDK_FORMAT redo offset=0| P{"TEMPORARY?"}
  N -->|no| P
  P -->|yes: flush+dwb, sys pages temp-LSA, err goto exit| R["nsect_free_out, dirty_and_free,<br/>flush + dwb_synchronize"]
  P -->|no| R
  R --> X["exit: unfix header page"]
  X --> S{"error_code != NO_ERROR?"}
  S -->|no| RET4["return NO_ERROR"]
  S -->|yes| T["pgbuf_invalidate_all"]
  T --> U{"TEMPORARY?"}
  U -->|yes| V["disk_unformat now,<br/>temp not logged"]
  U -->|no| RET5["return error, rollback removes it"]
  V --> RET5

Figure 3-2: every branch of disk_format. The cleanup split at the bottom is the heart of crash safety.

Two points the flowchart cannot fully carry:

Undo is logical, force-flush is unconditional. Only the undo RVDK_FORMAT (log_append_undo_data, carrying just the name) is gated on voltype == DB_PERMANENT_VOLTYPE — it lets rollback remove the whole volume, since there is no page-level undo. But logpb_force_flush_pages then runs on both paths, so the log reaches disk before the OS file exists and a crash mid-format is recoverable.
The exit: split. After any post-fix error, goto exit unfixes the header page and then calls pgbuf_invalidate_all. A temporary volume is then disk_unformat-ed immediately (no log, no rollback to lean on); a permanent volume returns the error and lets the top-action rollback (Chapter 5) replay the logical undo. The two permanent RVDK_FORMAT redos use addr.offset = -1 before disk_stab_init and 0 after linking; that offset is the sentinel recovery uses to tell a started format from a completed one.

3.3 disk_stab_init — laying out the bitmap

After the header is written, disk_stab_init walks every STAB page and marks the system sectors (those the header+STAB occupy) reserved, leaving the rest zero (free).

// disk_stab_init -- src/storage/disk_manager.c
DKNSECTS nsects_sys = SECTOR_FROM_PAGEID (volheader->sys_lastpage) + 1;   /* <- sectors to pre-reserve */
assert (nsects_sys < DISK_STAB_PAGE_BIT_COUNT);                          /* <- sys region fits in the first STAB page */
for ( /* each STAB page */ ; ...; vpid_stab.pageid++)
  {
    page_stab = pgbuf_fix (..., NEW_PAGE, PGBUF_LATCH_WRITE, ...);        // NULL -> return error
    pgbuf_set_page_ptype (thread_p, page_stab, PAGE_VOLBITMAP);
    if (volheader->purpose == DB_TEMPORARY_DATA_PURPOSE) pgbuf_set_lsa_as_temporary (...);  /* <- no log for temp */
    memset (page_stab, 0, DB_PAGESIZE);                                  /* <- all sectors free by default */
    if (nsects_sys > 0)                                                  /* <- only while sys sectors remain (first STAB page only) */
      { nsect_copy = nsects_sys;
        disk_stab_cursor_set_at_sectid (volheader, /* page start */ ..., &start_cursor);
        if ( /* last STAB page */ ) disk_stab_cursor_set_at_end (volheader, &end_cursor);   /* <- end = nsect_total */
        else disk_stab_cursor_set_at_sectid (volheader, /* next page start */ ..., &end_cursor);
        error_code = disk_stab_iterate_units (..., disk_stab_set_bits_contiguous, &nsect_copy); }  // err -> unfix + return
    if (volheader->purpose != DB_TEMPORARY_DATA_PURPOSE)                 /* <- permanent: log only the count, not the image */
      { DKNSECTS nsects_set = nsects_sys - nsect_copy;
        log_append_redo_data2 (thread_p, RVDK_INITMAP, NULL, page_stab, NULL_OFFSET, sizeof (nsects_set), &nsects_set); }
    if (!LOG_ISRESTARTED ()) { pgbuf_set_dirty (...); pgbuf_flush (..., FREE); page_stab = NULL; }  /* <- format: flush, pool invalidated next */
    else pgbuf_set_dirty_and_free (thread_p, page_stab);                                            /* <- recovery replay: dirty+free */
    nsects_sys = nsect_copy; nsect_copy = 0;   /* <- carry leftover to next page (normally 0 after the first STAB page) */
  }

Every branch is tagged inline. The loop runs stab_npages times zeroing each page; the nsects_sys > 0 block fires only on the first page (the assert guarantees the system sectors fit there), and disk_stab_set_bits_contiguous fills whole BIT64_FULL units then trailing bits up to the end cursor.
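The fill strategy (whole full units first, then the trailing bits) can be sketched on a plain array of 64-bit units. This is an illustration rather than the disk_stab_set_bits_contiguous source: the toy_* names are hypothetical, the cursor and page machinery is omitted, and "trailing bits" is assumed to mean the least significant bits of a unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_UNIT_BITS 64
#define TOY_FULL UINT64_MAX

/* Set the lowest n bits of a unit, 0 <= n <= 64. */
static uint64_t
toy_set_trailing_bits (uint64_t unit, int n)
{
  assert (n >= 0 && n <= TOY_UNIT_BITS);
  if (n == TOY_UNIT_BITS)
    return TOY_FULL;                      /* a 64-bit shift by 64 would be undefined */
  return unit | ((UINT64_C (1) << n) - 1);
}

/* Mark the first nsects sectors as reserved in a bitmap of nunits units.
 * Returns how many sectors could not be set because the bitmap ended,
 * which models the leftover carried to the next page. */
static int
toy_set_bits_contiguous (uint64_t *units, size_t nunits, int nsects)
{
  size_t i;

  assert (units != NULL && nsects >= 0);
  for (i = 0; i < nunits && nsects > 0; i++)
    {
      int n = nsects < TOY_UNIT_BITS ? nsects : TOY_UNIT_BITS;

      units[i] = toy_set_trailing_bits (units[i], n);   /* full unit or partial tail */
      nsects -= n;
    }
  return nsects;
}
```

Reserving 70 sectors in a three-unit bitmap fills the first unit, sets the low six bits of the second, and leaves the third untouched.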

3.4 disk_unformat — removing the OS file

Destruction is anticlimactic: the disk manager owns no in-memory bitmap, so disk_unformat only flushes, invalidates the page-buffer image, and deletes the file.

// disk_unformat -- src/storage/disk_manager.c
volid = fileio_find_volume_id_with_label (thread_p, vol_fullname);
if (volid != NULL_VOLID)
  {
    (void) pgbuf_flush_all (thread_p, volid);        /* <- push any dirty pages */
    (void) pgbuf_invalidate_all (thread_p, volid);   /* <- drop them from the pool */
  }
fileio_unformat (thread_p, vol_fullname);            /* <- delete the OS file */
return ret;                                          /* <- always NO_ERROR */

The single branch is volid != NULL_VOLID: an unmounted volume (no id for the label) skips flush/invalidate and only fileio_unformat runs. This is what disk_format calls on its temporary-volume error path (§3.2) and what recovery calls when undoing a permanent format.

3.5 The bitmap-as-functor pattern

Callers never read the STAB bit-by-bit. The manager quantizes it into 64-bit units and exposes one iterator — disk_stab_iterate_units — driving a DISK_STAB_UNIT_FUNC callback over a unit range. Reserve, unreserve, count-free, has-used, and contiguous-set are all just different callbacks.

Quantization. DISK_STAB_UNIT is UINT64. The macros mapping a SECTID to a position are pure integer arithmetic — a flat index split into (page, unit, bit):

// allocation-table addressing macros -- src/storage/disk_manager.c
#define DISK_ALLOCTBL_SECTOR_PAGE_OFFSET(sect) ((sect) / DISK_STAB_PAGE_BIT_COUNT)
#define DISK_ALLOCTBL_SECTOR_UNIT_OFFSET(sect) (((sect) % DISK_STAB_PAGE_BIT_COUNT) / DISK_STAB_UNIT_BIT_COUNT)
#define DISK_ALLOCTBL_SECTOR_BIT_OFFSET(sect)  (((sect) % DISK_STAB_PAGE_BIT_COUNT) % DISK_STAB_UNIT_BIT_COUNT)
#define DISK_STAB_NPAGES(nsect_max) (CEIL_PTVDIV (nsect_max, DISK_STAB_PAGE_BIT_COUNT))

DISK_STAB_NPAGES is the same CEIL_PTVDIV as in disk_volume_header_set_stab, keeping the header field and the macro in agreement.

flowchart LR
  SECT["SECTID"] --> PG["page offset<br/>sect / PAGE_BIT_COUNT"]
  SECT --> UN["unit offset<br/>(sect mod PAGE_BIT_COUNT) / 64"]
  SECT --> BT["bit offset<br/>(sect mod PAGE_BIT_COUNT) mod 64"]
  PG --> POS["cursor.pageid = stab_first_page + page offset"]
  UN --> POS2["cursor.offset_to_unit"]
  BT --> POS3["cursor.offset_to_bit"]

Figure 3-3: a SECTID split into (page, unit, bit) by three modulo/divide macros. The cursor stores all three plus the live unit pointer.

Cursor positioning

Three inline setters seed a DISK_STAB_CURSOR (fields in Chapter 1), differing only in the target sector; all leave page/unit NULL (the page is fixed lazily by disk_stab_cursor_fix).

disk_stab_cursor_set_at_sectid — general case: asserts 0 <= sectid <= nsect_total, fills pageid/offset_to_unit/offset_to_bit from the three macros, asserting pageid stays within stab_npages.
disk_stab_cursor_set_at_end — one past the last valid sector via set_at_sectid(volheader, nsect_total, cursor), first asserting nsect_total is unit-rounded (DISK_SECTS_ASSERT_ROUNDED) so iteration ends on a 64-bit boundary.
disk_stab_cursor_set_at_start — hard-codes sectid = 0, pageid = stab_first_page, both offsets 0 (skips set_at_sectid; the all-zero position is trivial).

Invariant — cursor position consistency. disk_stab_cursor_check_valid asserts (pageid - stab_first_page) * PAGE_BIT_COUNT + offset_to_unit * 64 + offset_to_bit == sectid, and that whenever unit != NULL, (char*)unit - page == offset_to_unit * DISK_STAB_UNIT_SIZE_OF. The iterator re-establishes this before every callback. If the offsets drift from sectid, reserved VSIDs name the wrong sectors — silent cross-linking corruption.
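The split into (page, unit, bit) and the consistency equation from the invariant can be exercised as a round trip. This is a minimal sketch: toy_cursor carries only the positional fields, TOY_PAGE_BIT_COUNT assumes 16384 * 8 bits per bitmap page for illustration, and the toy_* function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNIT_BIT_COUNT 64
#define TOY_PAGE_BIT_COUNT (16384 * 8)   /* assumed bits per STAB page */

struct toy_cursor
{
  int sectid;
  int pageid;
  int offset_to_unit;
  int offset_to_bit;
};

/* Split a flat sector id into page, unit and bit offsets. */
static void
toy_cursor_set_at_sectid (int stab_first_page, int sectid, struct toy_cursor *c)
{
  assert (c != NULL && sectid >= 0);
  c->sectid = sectid;
  c->pageid = stab_first_page + sectid / TOY_PAGE_BIT_COUNT;
  c->offset_to_unit = (sectid % TOY_PAGE_BIT_COUNT) / TOY_UNIT_BIT_COUNT;
  c->offset_to_bit = (sectid % TOY_PAGE_BIT_COUNT) % TOY_UNIT_BIT_COUNT;
}

/* Recompose the sector id from the three offsets (the check-valid equation). */
static int
toy_cursor_to_sectid (int stab_first_page, const struct toy_cursor *c)
{
  return (c->pageid - stab_first_page) * TOY_PAGE_BIT_COUNT
    + c->offset_to_unit * TOY_UNIT_BIT_COUNT + c->offset_to_bit;
}
```

Sector 131202 (one full page of bits plus 130) lands on the second STAB page, unit 2, bit 2, and recomposes to the same id.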

The iterator

// disk_stab_iterate_units -- src/storage/disk_manager.c
assert (disk_stab_cursor_compare (start, end) < 0);                   /* <- start strictly before end */
for (cursor = *start; cursor.pageid <= end->pageid; cursor.pageid++, cursor.offset_to_unit = 0)
  {
    error_code = disk_stab_cursor_fix (thread_p, &cursor, mode);       /* <- fix this STAB page */
    // ... err -> return ...
    end_unit = ((DISK_STAB_UNIT *) cursor.page)
      + (cursor.pageid == end->pageid ? end->offset_to_unit : DISK_STAB_PAGE_UNITS_COUNT);  /* <- clamp last page */
    for (; cursor.unit < end_unit;
         cursor.unit++, cursor.offset_to_unit++,
         cursor.sectid += (DISK_STAB_UNIT_BIT_COUNT - cursor.offset_to_bit),  /* <- advance by remaining bits */
         cursor.offset_to_bit = 0)
      {
        error_code = f_unit (thread_p, &cursor, &stop, f_unit_args);   /* <- the functor */
        if (error_code != NO_ERROR) { disk_stab_cursor_unfix (...); return error_code; }
        if (stop) { disk_stab_cursor_unfix (...); return NO_ERROR; }   /* <- early-out */
      }
    disk_stab_cursor_unfix (thread_p, &cursor);
  }

The inner stride advances sectid by DISK_STAB_UNIT_BIT_COUNT - cursor.offset_to_bit — normally a full 64, but a callback may leave offset_to_bit partway through a unit (as disk_stab_unit_reserve does), so the stride compensates. Two short-circuits unfix the page first: a callback error (returns the error) and a callback setting *stop = true (returns NO_ERROR). disk_stab_iterate_units_all wraps this with set_at_start/set_at_end.
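The functor shape (one driver, many callbacks, an early-out flag) can be sketched over an in-memory array. This is a simplification under stated assumptions: there are no pages to fix or unfix, the offset_to_bit stride compensation is dropped, and toy_iterate_units, toy_count_free and toy_find_free_unit are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback contract: return 0 on success, nonzero on error; set *stop to end early. */
typedef int (*toy_unit_func) (uint64_t *unit, int sectid, bool *stop, void *args);

/* Drive a callback over units [start, end). */
static int
toy_iterate_units (uint64_t *units, size_t start, size_t end, toy_unit_func f, void *args)
{
  bool stop = false;
  size_t i;

  assert (units != NULL && f != NULL && start < end);
  for (i = start; i < end; i++)
    {
      int err = f (&units[i], (int) (i * 64), &stop, args);

      if (err != 0)
        return err;         /* callback error ends the walk */
      if (stop)
        return 0;           /* early-out requested by the callback */
    }
  return 0;
}

/* Callback 1: count zero (free) bits across all visited units. */
static int
toy_count_free (uint64_t *unit, int sectid, bool *stop, void *args)
{
  int *nfree = (int *) args;
  uint64_t u = ~*unit;      /* free sectors are zero bits */

  (void) sectid;
  (void) stop;
  while (u != 0)
    {
      u &= u - 1;           /* clear the lowest set bit */
      (*nfree)++;
    }
  return 0;
}

/* Callback 2: report the base sector id of the first unit that is not full. */
static int
toy_find_free_unit (uint64_t *unit, int sectid, bool *stop, void *args)
{
  if (*unit != UINT64_MAX)
    {
      *(int *) args = sectid;
      *stop = true;
    }
  return 0;
}
```

The same driver serves both callbacks: counting visits every unit, while the finder stops at the first unit with a free bit.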

Reserve — disk_stab_unit_reserve

The most branch-rich functor: it reserves up to nsects_lastvol_remaining free bits and records each VSID. All three branches are tagged inline.

// disk_stab_unit_reserve -- src/storage/disk_manager.c
if (*cursor->unit == BIT64_FULL) return NO_ERROR;    /* <- (1) full unit: nothing free, skip; no dirty/log */
context = (DISK_RESERVE_CONTEXT *) args;
if (*cursor->unit == 0)                              /* <- (2) empty unit: grab up to 64 in one store */
  { int bits_to_set = MIN (context->nsects_lastvol_remaining, DISK_STAB_UNIT_BIT_COUNT);
    *cursor->unit = (bits_to_set == DISK_STAB_UNIT_BIT_COUNT) ? BIT64_FULL
                                                              : bit64_set_trailing_bits (*cursor->unit, bits_to_set);
    log_unit = *cursor->unit; context->nsects_lastvol_remaining -= bits_to_set; /* ... emit one VSID per bit ... */ }
else                                                 /* <- (3) mixed unit: skip leading ones, set each free bit */
  { log_unit = 0;
    for (cursor->offset_to_bit = bit64_count_trailing_ones (*cursor->unit), cursor->sectid += cursor->offset_to_bit;
         cursor->offset_to_bit < DISK_STAB_UNIT_BIT_COUNT && context->nsects_lastvol_remaining > 0;
         cursor->offset_to_bit++, cursor->sectid++)
      if (!disk_stab_cursor_is_bit_set (cursor))
        { disk_stab_cursor_set_bit (cursor); log_unit = bit64_set (log_unit, cursor->offset_to_bit);
          context->nsects_lastvol_remaining--; /* ... push VSID ... */ } }
assert (log_unit != 0 && (log_unit & *cursor->unit) == log_unit);
if (context->purpose == DB_PERMANENT_DATA_PURPOSE)   /* <- permanent: undoredo delta; temp skips logging */
  log_append_undoredo_data2 (thread_p, RVDK_RESERVE_SECTORS, NULL, cursor->page, cursor->offset_to_unit,
                             sizeof (log_unit), sizeof (log_unit), &log_unit, &log_unit);
pgbuf_set_dirty (thread_p, cursor->page, DONT_FREE);
if (context->nsects_lastvol_remaining <= 0) *stop = true;

log_unit accumulates only the bits this call set; for permanent volumes it is both the redo and undo image of RVDK_RESERVE_SECTORS (redo re-sets, undo clears).

Invariant — log_unit is a non-empty subset of the unit’s set bits. The assert (log_unit != 0 && (log_unit & *cursor->unit) == log_unit) guarantees the logged delta holds only bits actually set and is never a no-op. The subset need not be strict: when an empty unit is filled completely, log_unit equals the unit. A bit present in log_unit but absent from *cursor->unit would make recovery’s redo set a bit the live run never set, so the logged and live bitmaps would diverge.
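The three branches and the log_unit delta can be traced on a single unit. This sketch is a simplified model, not the disk_manager.c functor: toy_reserve_ctx reduces the context to a counter and an output cursor, logging and page dirtying are omitted, and the function returns log_unit instead of appending a log record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_reserve_ctx
{
  int remaining;   /* plays the role of nsects_lastvol_remaining */
  int *out;        /* write cursor for reserved sector ids, like vsidp */
};

/* Count consecutive one bits starting from bit 0. */
static int
toy_count_trailing_ones (uint64_t u)
{
  int n = 0;

  while (u & 1)
    {
      u >>= 1;
      n++;
    }
  return n;
}

/* Reserve free bits in one unit; base_sectid is the sector id of bit 0.
 * Returns the bits set by this call (the would-be logged delta). */
static uint64_t
toy_unit_reserve (uint64_t *unit, int base_sectid, struct toy_reserve_ctx *ctx, bool *stop)
{
  uint64_t log_unit = 0;
  int bit;

  assert (ctx->remaining > 0);
  if (*unit == UINT64_MAX)
    return 0;                                   /* (1) full unit: nothing to do */

  if (*unit == 0)                               /* (2) empty unit: one store */
    {
      int n = ctx->remaining < 64 ? ctx->remaining : 64;

      log_unit = (n == 64) ? UINT64_MAX : ((UINT64_C (1) << n) - 1);
      *unit = log_unit;
      for (bit = 0; bit < n; bit++)
        *ctx->out++ = base_sectid + bit;
      ctx->remaining -= n;
    }
  else                                          /* (3) mixed unit: bit by bit */
    {
      for (bit = toy_count_trailing_ones (*unit); bit < 64 && ctx->remaining > 0; bit++)
        {
          uint64_t mask = UINT64_C (1) << bit;

          if ((*unit & mask) == 0)
            {
              *unit |= mask;
              log_unit |= mask;
              *ctx->out++ = base_sectid + bit;
              ctx->remaining--;
            }
        }
    }
  assert (log_unit != 0 && (log_unit & *unit) == log_unit);
  if (ctx->remaining <= 0)
    *stop = true;
  return log_unit;
}
```

For a unit holding 0x0B with three sectors requested, the sketch skips the two leading ones, sets bits 2, 4 and 5, and returns 0x34 as the delta.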

Unreserve — disk_stab_unit_unreserve

The mirror functor clears bits whose sector IDs the caller already knows (sorted in context->vsidp).

// disk_stab_unit_unreserve -- src/storage/disk_manager.c
while (context->nsects_lastvol_remaining > 0 && context->vsidp->sectid < cursor->sectid + DISK_STAB_UNIT_BIT_COUNT)
  { unreserve_bits = bit64_set (unreserve_bits, context->vsidp->sectid - cursor->sectid);  /* <- accumulate this unit's window, abs->rel bit */
    context->nsects_lastvol_remaining--; context->vsidp++; nsect++; }
assert ((unreserve_bits & (*cursor->unit)) == unreserve_bits);   /* <- only clear bits that are set */
if (unreserve_bits != 0)                                         /* <- skip an untouched unit */
  {
    if (context->purpose == DB_PERMANENT_DATA_PURPOSE)                 /* <- permanent: postpone clears at commit, rollback skips it */
      log_append_postpone (thread_p, RVDK_UNRESERVE_SECTORS, &addr /* page,offset_to_unit */, ..., &unreserve_bits);
    else                                                              /* <- temp: clear now + cache update under temp reserve lock */
      { (*cursor->unit) &= ~unreserve_bits; pgbuf_set_dirty (thread_p, cursor->page, DONT_FREE);
        disk_cache_update_vol_free (cursor->volheader->volid, nsect); }
  }
if (context->nsects_lastvol_remaining <= 0) *stop = true;

The purpose split is the asymmetry worth remembering, and it is tagged inline: permanent unreserve emits a postpone record, so a rollback never runs it and the sectors stay reserved; temporary unreserve clears immediately and updates the cache free count.

Invariant — unreserve only clears set bits. assert((unreserve_bits & *cursor->unit) == unreserve_bits) enforces that every sector being freed was actually reserved; a violation means double-free or a stale VSID list, corrupting free-sector accounting.
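The windowing step (collect every sorted sector id that falls inside the current unit, then clear them in one mask) can be sketched as follows. Only the temporary-purpose behaviour is modelled, meaning bits are cleared immediately; the permanent path would emit a postpone record instead. The toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_unreserve_ctx
{
  const int *sectids;  /* sorted ascending; read cursor, like vsidp */
  int remaining;       /* ids not yet consumed */
};

/* Clear, in one unit, every listed sector id that falls in its 64-bit window.
 * base_sectid is the sector id of bit 0. Returns the number of bits cleared. */
static int
toy_unit_unreserve (uint64_t *unit, int base_sectid, struct toy_unreserve_ctx *ctx, bool *stop)
{
  uint64_t bits = 0;
  int nsect = 0;

  while (ctx->remaining > 0 && *ctx->sectids < base_sectid + 64)
    {
      assert (*ctx->sectids >= base_sectid);            /* sorted input, no id behind us */
      bits |= UINT64_C (1) << (*ctx->sectids - base_sectid);
      ctx->sectids++;
      ctx->remaining--;
      nsect++;
    }
  assert ((bits & *unit) == bits);                      /* only clear bits that are set */
  if (bits != 0)
    *unit &= ~bits;                                     /* temporary-purpose: clear now */
  if (ctx->remaining <= 0)
    *stop = true;
  return nsect;
}
```

With ids {2, 5, 64} and units {0x3F, 0x03}, the first call clears two bits from unit 0 and the second clears one bit from unit 1 and raises stop.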

3.6 The 64-bit coupling and the hint_allocsect note

Hidden 64-bit coupling. The cursor primitives call bit64_is_set, bit64_set, bit64_set_trailing_bits, bit64_count_trailing_ones, bit64_count_zeros — all hard-wired to 64-bit operands. The DISK_STAB_UNIT comment suggests the unit type “can be modified and handled automatically,” but changing typedef UINT64 DISK_STAB_UNIT would silently break every bit64_* call and BIT64_FULL. The quantization macros adapt via DISK_STAB_UNIT_SIZE_OF; the bit-op layer does not. (Open question: whether the “automatic” claim was ever true.) Treat 64 bits as a fixed contract.

hint_allocsect. disk_format only seeds this to NULL_SECTID; the live update is on the reservation path Chapter 4 owns (disk_reserve_sectors_in_volume). The subtlety relevant here: it goes stale after an unreserve — disk_stab_unit_unreserve frees bits below the hint but never lowers it, so a later reservation skips the freshly freed sectors until the wrap-around pass reclaims them. It is an optimization, not an invariant, so the code neither logs nor dirties it.
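The staleness effect can be reproduced on a toy volume. This is a heavily simplified sketch: the real update logic lives on the reservation path covered in Chapter 4, so the search order (from the hint to the end, then a wrap-around pass from 0 to the hint), the way the hint advances, and the starting hint of 0 are all assumptions made for illustration, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSECT 16

struct toy_vol
{
  bool used[TOY_NSECT];
  int hint;             /* where the next search starts; never lowered on free */
};

/* Reserve one sector: search [hint, end) first, then wrap to [0, hint).
 * Returns the sector id, or -1 if the volume is full. */
static int
toy_reserve_one (struct toy_vol *v)
{
  int pass, s;

  assert (v != NULL);
  for (pass = 0; pass < 2; pass++)
    {
      int from = (pass == 0) ? v->hint : 0;
      int to = (pass == 0) ? TOY_NSECT : v->hint;

      for (s = from; s < to; s++)
        {
          if (!v->used[s])
            {
              v->used[s] = true;
              v->hint = s + 1;   /* assumed: hint moves just past the last reservation */
              return s;
            }
        }
    }
  return -1;
}

/* Free one sector; the hint is deliberately left untouched. */
static void
toy_unreserve (struct toy_vol *v, int s)
{
  assert (v != NULL && s >= 0 && s < TOY_NSECT && v->used[s]);
  v->used[s] = false;
}
```

After reserving sectors 0 to 3 and freeing sector 1, the next reservation returns 4, not 1; sector 1 is handed out again only once the first pass finds nothing and the wrap-around pass runs.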

3.7 Chapter summary — key takeaways

Layout is fixed and header-driven. Page 0 is the header; stab_first_page (always 1) begins a contiguous STAB sized by DISK_STAB_NPAGES(nsect_max); data follows sys_lastpage. Sizing by nsect_max not nsect_total lets a volume grow without re-layout.
disk_format is branch-heavy for crash safety. The logical undo (RVDK_FORMAT) is permanent-only, but logpb_force_flush_pages runs unconditionally before the OS file is created; permanent volumes log the header redo twice (offset -1 then 0); temporary volumes get temp-LSAs and are disk_unformat-ed immediately on error.
disk_stab_init pre-reserves exactly the system sectors (SECTOR_FROM_PAGEID(sys_lastpage)+1, all in the first STAB page), leaves the rest free, and logs only the count (RVDK_INITMAP), not the page image.
The bitmap is never scanned bit-by-bit. A SECTID decomposes into (page, unit, bit) via three macros, and disk_stab_iterate_units drives a DISK_STAB_UNIT_FUNC over 64-bit units, short-circuiting on full/empty units.
Reserve and unreserve are mirror functors with a purpose split. Permanent reserve logs an undoredo delta; permanent unreserve uses a postpone record so rollback keeps the sectors; temporary skips logging and updates the cache directly. The log_unit/unreserve_bits invariants keep logged and live bitmaps in lockstep.
64 bits is a hard contract, not a tunable: the bit64_* primitives and BIT64_FULL are not parameterized by unit size, despite the optimistic comment on DISK_STAB_UNIT.
hint_allocsect is live state owned by Chapter 4; disk_format only seeds it to NULL_SECTID. Its one subtlety here is staleness after unreserve — freeing sectors below the hint never lowers it.

Chapter 4: Sector Reservation Two-Step Protocol

A file that needs N sectors does not flip N bits under one lock. The disk manager splits the work into two disjoint phases: step 1 reserves against the in-memory cache counters, and step 2 flips the corresponding bits in each volume’s sector table (the high-level companion, cubrid-disk-manager.md, explains why the cache exists). This chapter answers: when a file needs N sectors, how does the disk manager hand them out across volumes while keeping the hot mutex short and staying crash-safe?

4.1 The two structs that carry a reservation

A reservation is disk_reserve_context, a stack local in disk_reserve_sectors (re-built in the unreserve path), threaded through every function below.

// disk_reserve_context -- src/storage/disk_manager.c
struct disk_reserve_context
{
  int nsect_total;                                     /* original request size */
  VSID *vsidp;                                         /* write cursor into output array */
  DISK_CACHE_VOL_RESERVE cache_vol_reserve[VOLID_MAX]; /* per-volume ledger from step 1 */
  int n_cache_vol_reserve;                             /* ledger slots used */
  int n_cache_reserve_remaining;                       /* cache-phase debt */
  DKNSECTS nsects_lastvol_remaining;                   /* current-volume bitmap debt */
  DB_VOLPURPOSE purpose;                               /* permanent-data / temporary-data */
};

Field	Role	Why it exists
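`nsect_total`	Immutable copy of request `N`.	Never decremented; it feeds the `min_free` threshold in `disk_reserve_from_cache_vols` and equals the `n_sectors` checked by the final `assert (vsidp - reserved_sectors == n_sectors)`.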
`vsidp`	Write pointer into `reserved_sectors[]`.	`vsidp - reserved_sectors` = sectors reserved so far; error path reads it for rollback.
`cache_vol_reserve[]`	Step-1 ledger, one `{volid, nsect}` per volume drawn from.	Step 2 replays it; error path refunds un-flipped sectors from it.
`n_cache_vol_reserve`	Count of used ledger slots.	Loop bound for step 2 and the rollback scan.
`n_cache_reserve_remaining`	Cache-phase debt; starts `N`, decremented by `disk_reserve_from_cache_volume`, 0 when satisfied.	Drives volume-iteration and extend decisions in step 1.
`nsects_lastvol_remaining`	Bitmap-phase debt within the current volume; seeded per-volume, decremented as bits flip.	`disk_stab_unit_reserve` drives off it; 0 sets `*stop`.
`purpose`	`DB_PERMANENT_DATA_PURPOSE` / `DB_TEMPORARY_DATA_PURPOSE`.	Selects cache mutex/extend-info and whether STAB changes are logged.

// disk_cache_vol_reserve -- src/storage/disk_manager.c
struct disk_cache_vol_reserve { VOLID volid; DKNSECTS nsect; };

Field	Role	Why it exists
`volid`	Volume the cache reserved from.	Step 2 fixes its header and flips its bits; rollback decrements its cache counter.
`nsect`	Count promised from `volid`.	Seeds `nsects_lastvol_remaining`; rollback decrements it per sector returned by undo, leaving the not-yet-flipped remainder.

Each ledger entry {volid, nsect} seeds one step-2 per-volume scan (Figure 4-1).

flowchart LR
  RC["disk_reserve_context"] --> L["cache_vol_reserve[i]\n{volid, nsect}"] -.seeds nsects_lastvol_remaining.-> S2["step 2 per-volume scan -> reserved_sectors[]"]

Figure 4-1. Reserve context, its per-volume ledger, and the step-2 scan that fills the output array.

Invariant — the two remaining-counters never alias. n_cache_reserve_remaining is the cache debt; nsects_lastvol_remaining is the current-volume bitmap debt. The cache phase finishes with n_cache_reserve_remaining == 0 and sum(cache_vol_reserve[i].nsect) == N. Separating them lets step 2 be re-driven per volume without re-touching the cache; aliasing would let a partial volume scan corrupt cache accounting.

4.2 The outer driver: `disk_reserve_sectors`

disk_reserve_sectors(thread_p, purpose, volid_hint, n_sectors, reserved_sectors) is the disk/file boundary call. volid_hint is accepted but ignored; volume order is governed by purpose.

Guards. assert purpose is perm or temp; n_sectors <= 0 || reserved_sectors == NULL -> assert_release(false); ER_FAILED.

Sysop precondition for permanent reservations (their STAB changes are logged onto the outer transaction):

// disk_reserve_sectors -- src/storage/disk_manager.c
if (purpose != DB_TEMPORARY_DATA_PURPOSE && !log_check_system_op_is_started (thread_p))
  { assert (false); er_set (...ER_GENERIC_ERROR, 0); return ER_FAILED; } /* caller forgot sysop */

retry: label, then log_sysop_start. Even temp reservations open a sysop to scope the bitmap phase.
CSECT_DISK_CHECK as reader (excludes the consistency checker). Fail -> log_sysop_abort; return.
Init context in place: nsect_total = n_cache_reserve_remaining = n_sectors, vsidp = reserved_sectors, n_cache_vol_reserve = 0.
Step 1 — disk_reserve_from_cache. Error -> goto error.
Step 2 — loop disk_reserve_sectors_in_volume over [0, n_cache_vol_reserve). Any error -> goto error.
Success. assert ((vsidp - reserved_sectors) == n_sectors); exit csect; log_sysop_attach_to_outer; in debug, if did_extend, disk_check; return NO_ERROR. The error: path (4.7) handles rollback.

flowchart TD
  C["csect_enter CSECT_DISK_CHECK after sysop start"] -->|fail| AB["log_sysop_abort, return err"]
  C -->|ok| D["init context"] --> E["step 1: disk_reserve_from_cache"]
  E -->|err| ERR["goto error: rollback (4.7)"]
  E -->|ok| F["step 2 loop: disk_reserve_sectors_in_volume per ledger entry"]
  F -->|err| ERR
  F -->|ok| H["assert vsidp-base==N, attach_to_outer, NO_ERROR"]

Figure 4-2. disk_reserve_sectors control flow.

4.3 Step 1 entry: `disk_reserve_from_cache`

Moves free-sector counts into the ledger, extending the disk if short, holding the reserve mutex only across counter math.

disk_Cache == NULL -> assert_release(false); return ER_FAILED.
Lock the purpose’s reserve mutex (disk_cache_lock_reserve_for_purpose).

Temp purpose prefers perm-type-temp-purpose volumes before genuine temp volumes:

// disk_reserve_from_cache -- src/storage/disk_manager.c
if (context->purpose == DB_TEMPORARY_DATA_PURPOSE)
  {
    extend_info = &disk_Cache->temp_purpose_info.extend_info;
    if (disk_Cache->temp_purpose_info.nsect_perm_free > 0)
      disk_reserve_from_cache_vols (DB_PERMANENT_VOLTYPE, context); /* <- perm-temp first */
    if (context->n_cache_reserve_remaining <= 0)                    /* satisfied from perm-temp */
      { disk_cache_unlock_reserve_for_purpose (context->purpose); return NO_ERROR; }
    // ... temp-ceiling check, then fall through to temp-volume extend ...
  }
else
  extend_info = &disk_Cache->perm_purpose_info.extend_info;

nsect_perm_free = free sectors on perm-type volumes carrying temp purpose; when 0 those volumes are skipped.

Temp-space ceiling (temp branch, before extending temp volumes): if extend_info->nsect_total - extend_info->nsect_free + n_cache_reserve_remaining > disk_Temp_max_sects -> er_set (ER_BO_MAXTEMP_SPACE_HAS_BEEN_EXCEEDED ...); unlock; return. Operands are the extend-info pool aggregates, not the context’s nsect_total.
Common tail: assert (n_cache_reserve_remaining > 0) and assert this thread holds the mutex.

Reserve from existing free space if the pool is big enough:

if (extend_info->nsect_free > context->n_cache_reserve_remaining) /* strict >: a hair of headroom, see Ch 5 */
  {
    disk_reserve_from_cache_vols (extend_info->voltype, context);
    if (context->n_cache_reserve_remaining <= 0)
      { disk_cache_unlock_reserve (extend_info); return NO_ERROR; } /* <- done from existing */
  }

Short -> extend. Bump extend_info->nsect_intention (signals concurrent reservers), drop the reserve mutex, take disk_lock_extend(), re-take the reserve mutex and re-check. If a peer already extended so nsect_free now suffices: decrement intention, retry disk_reserve_from_cache_vols, return. Else call disk_extend (Ch 5) and back the intention out. Both locks released on every exit.
Post-extend. disk_extend error -> return it. Still n_cache_reserve_remaining > 0 -> assert_release(false); ER_FAILED. Else *did_extend = true; return NO_ERROR.

Invariant — the reserve mutex is never held across a STAB scan or an extend. The intention counter is the hand-off token that lets the mutex drop during the slow extend without two threads double-extending.
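The order of the step-1 checks can be condensed into one pure decision function. This is a minimal sketch under stated assumptions: `step1_decide`, `struct model_pool`, and the enum values are illustrative names rather than CUBRID identifiers, plain `int` stands in for `DKNSECTS`, and the sketch ignores the case where the volume scan leaves residual debt even though the pool looked large enough.

```c
#include <assert.h>

/* Outcome of the step-1 checks (hypothetical names for this sketch). */
enum step1_decision
{
  STEP1_FROM_EXISTING,		/* pool covers the request with headroom */
  STEP1_MUST_EXTEND,		/* publish intention, then take the extend mutex */
  STEP1_TEMP_CEILING_EXCEEDED	/* maps to ER_BO_MAXTEMP_SPACE_HAS_BEEN_EXCEEDED */
};

/* Reduced stand-in for the extend-info pool aggregates. */
struct model_pool
{
  int nsect_total;
  int nsect_free;
};

static enum step1_decision
step1_decide (int is_temp, const struct model_pool *pool, int remaining, int temp_max_sects)
{
  assert (remaining > 0);

  /* Temp ceiling: sectors already in use plus the new debt must fit. */
  if (is_temp && pool->nsect_total - pool->nsect_free + remaining > temp_max_sects)
    {
      return STEP1_TEMP_CEILING_EXCEEDED;
    }
  /* Strict '>' as in the excerpt: an exact fit still goes to extend. */
  if (pool->nsect_free > remaining)
    {
      return STEP1_FROM_EXISTING;
    }
  return STEP1_MUST_EXTEND;
}
```

With a pool of 100 sectors and 20 free, a request for 10 is served from existing space, while a pool with exactly 10 free takes the extend path because the comparison is strict.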

4.4 Iterating volumes: `disk_reserve_from_cache_vols`

// disk_reserve_from_cache_vols -- src/storage/disk_manager.c
if (type == DB_PERMANENT_VOLTYPE)                       /* perm: ascend 0..nvols_perm */
  { start_iter = 0; end_iter = disk_Cache->nvols_perm; incr = 1; min_free = MIN (context->nsect_total, perm...nsect_vol_max) / 2; }
else                                                    /* temp: descend from top of volid space */
  { start_iter = LOG_MAX_DBVOLID; end_iter = LOG_MAX_DBVOLID - disk_Cache->nvols_temp; incr = -1; min_free = MIN (context->nsect_total, temp...nsect_vol_max) / 2; }
min_free = MAX (min_free, 1);                           /* half the smaller of request/per-vol max, floored at 1 */

for (volid_iter = start_iter;
     volid_iter != end_iter && context->n_cache_reserve_remaining > 0;     /* stop when range exhausted or debt paid */
     volid_iter += incr)
  {
    if (disk_Cache->vols[volid_iter].purpose != context->purpose) continue;   /* wrong purpose */
    if (disk_Cache->vols[volid_iter].nsect_free < min_free) continue;         /* below min_free: skip rather than fragment the request */
    disk_reserve_from_cache_volume (volid_iter, context);
  }

4.5 Decrementing one volume’s counter: `disk_reserve_from_cache_volume`

The only place step 1 actually moves sectors out of the cache.

// disk_reserve_from_cache_volume -- src/storage/disk_manager.c
if (context->n_cache_vol_reserve >= LOG_MAX_DBVOLID)
  { assert_release (false); return; }                       /* <- ledger overflow guard */
disk_check_own_reserve_for_purpose (context->purpose);      /* <- assert mutex held by us */
nsects = MIN (disk_Cache->vols[volid].nsect_free, context->n_cache_reserve_remaining);
disk_cache_update_vol_free (volid, -nsects);                /* <- decrement cache + purpose pool */
context->cache_vol_reserve[context->n_cache_vol_reserve].volid = volid;
context->cache_vol_reserve[context->n_cache_vol_reserve].nsect = nsects;
context->n_cache_reserve_remaining -= nsects;
context->n_cache_vol_reserve++;                             /* <- bitmap untouched, only counters */

disk_cache_update_vol_free also adjusts the matching purpose-pool aggregate.
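Sections 4.4 and 4.5 together amount to a counter-only pass that builds the ledger. The following is a minimal sketch, assuming a reduced model: every `model_*` / `MODEL_*` name is illustrative, `model_vol_free[]` stands in for `disk_Cache->vols[].nsect_free`, only the ascending (permanent-type) scan is shown, and the purpose filter and the purpose-pool aggregate are omitted.

```c
#include <assert.h>

#define MODEL_NVOLS 4		/* illustrative volume count */
#define MODEL_MIN(a, b) ((a) < (b) ? (a) : (b))
#define MODEL_MAX(a, b) ((a) > (b) ? (a) : (b))

struct model_ledger_entry
{
  int volid;
  int nsect;
};

struct model_context
{
  int nsect_total;		/* original request, never decremented */
  int n_cache_reserve_remaining;	/* cache-phase debt */
  struct model_ledger_entry ledger[MODEL_NVOLS];
  int n_ledger;
};

/* Per-volume free counters (stand-in for the disk cache). */
static int model_vol_free[MODEL_NVOLS];

/* Counter-only move: charge one volume and record it in the ledger. */
static void
model_reserve_from_volume (int volid, struct model_context *ctx)
{
  int nsects;

  if (ctx->n_ledger >= MODEL_NVOLS)
    {
      return;			/* ledger overflow guard */
    }
  nsects = MODEL_MIN (model_vol_free[volid], ctx->n_cache_reserve_remaining);
  model_vol_free[volid] -= nsects;
  ctx->ledger[ctx->n_ledger].volid = volid;
  ctx->ledger[ctx->n_ledger].nsect = nsects;
  ctx->n_ledger++;
  ctx->n_cache_reserve_remaining -= nsects;
}

/* Ascending scan that skips volumes holding fewer than min_free sectors. */
static void
model_reserve_from_vols (int nsect_vol_max, struct model_context *ctx)
{
  int min_free = MODEL_MAX (MODEL_MIN (ctx->nsect_total, nsect_vol_max) / 2, 1);

  for (int volid = 0; volid < MODEL_NVOLS && ctx->n_cache_reserve_remaining > 0; volid++)
    {
      if (model_vol_free[volid] < min_free)
	{
	  continue;
	}
      model_reserve_from_volume (volid, ctx);
    }
}
```

For a request of 10 against free counts {4, 6, 1, 9}, `min_free` is 5, so volumes 0 and 2 are skipped and the ledger ends as {1, 6}, {3, 4}. The entries sum to the request and the cache debt is zero, which is the end-of-step-1 condition from 4.1.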

4.6 Step 2: `disk_reserve_sectors_in_volume` flips the bits

Per ledger entry, fixes the volume header under a write latch (cache mutex not held) and flips STAB bits until the per-volume debt hits zero.

Read ledger. volid = cache_vol_reserve[vol_index].volid; if NULL_VOLID -> assert_release(false); ER_FAILED. Seed nsects_lastvol_remaining = cache_vol_reserve[vol_index].nsect.
Fix volume header PGBUF_LATCH_WRITE; on error -> return.

Hint-guided scan. Three scan shapes via disk_stab_iterate_units(..., disk_stab_unit_reserve, context); each error path does goto exit:

// disk_reserve_sectors_in_volume -- src/storage/disk_manager.c
if (volheader->hint_allocsect > 0 && volheader->hint_allocsect < volheader->nsect_total)
  {
    // ... cursors hint..end; iterate ...                              /* after hint */
    if (context->nsects_lastvol_remaining > 0)                         /* still short: wrap start..hint */
      { end_cursor = start_cursor; disk_stab_cursor_set_at_start (volheader, &start_cursor);
        error_code = disk_stab_iterate_units (...); }
  }
else
  { /* ... cursors start..end; iterate whole table ... */ }

Must be satisfied. if (nsects_lastvol_remaining != 0) { assert_release(false); ER_FAILED; goto exit; } — residue means cache and bitmap disagree (a bug).
Advance the hint. hint_allocsect = (vsidp - 1)->sectid + 1; best-effort, neither dirtied nor logged.
exit: unfix the header if fixed; return error_code.
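The three scan shapes and the hint update can be illustrated on a flat array. This is a sketch under simplifying assumptions: it works per sector with a `bool` array instead of 64-bit STAB units and cursors, and `model_scan_range` / `model_reserve_in_volume` are hypothetical names. A return value of -1 plays the role of the "must be satisfied" failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Reserve free sectors in [from, to), appending their ids to out.
 * Returns how many sectors were taken (at most want). */
static int
model_scan_range (bool *used, int from, int to, int want, int *out)
{
  int taken = 0;

  for (int s = from; s < to && taken < want; s++)
    {
      if (!used[s])
	{
	  used[s] = true;
	  out[taken++] = s;
	}
    }
  return taken;
}

/* Hint-guided scan: [hint, nsect_total) first, then wrap to [0, hint).
 * Without a usable hint the whole table is scanned once.
 * Returns the new hint (last reserved sectid + 1), or -1 if debt remains. */
static int
model_reserve_in_volume (bool *used, int nsect_total, int hint, int want, int *out)
{
  int taken;

  assert (want > 0);
  if (hint > 0 && hint < nsect_total)
    {
      taken = model_scan_range (used, hint, nsect_total, want, out);
      if (taken < want)
	{
	  taken += model_scan_range (used, 0, hint, want - taken, out + taken);
	}
    }
  else
    {
      taken = model_scan_range (used, 0, nsect_total, want, out);
    }
  if (taken != want)
    {
      return -1;		/* cache and bitmap disagree */
    }
  return out[want - 1] + 1;
}
```

Note that after a wrap the new hint follows the last sector written, which may lie below the old hint; that matches the `(vsidp - 1)->sectid + 1` rule described above.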

The bit-flip lives in the disk_stab_unit_reserve callback, invoked per 64-bit STAB unit (full unit BIT64_FULL returns early; empty unit 0 filled in bulk; partial unit walked bit by bit), recording each new sector into context->vsidp. Permanent purpose logs each change:

// disk_stab_unit_reserve -- src/storage/disk_manager.c
if (context->purpose == DB_PERMANENT_DATA_PURPOSE)         /* redo+undo image = the changed bits mask */
  log_append_undoredo_data2 (thread_p, RVDK_RESERVE_SECTORS, NULL, cursor->page,
                             cursor->offset_to_unit, sizeof (log_unit), sizeof (log_unit), &log_unit, &log_unit);
pgbuf_set_dirty (thread_p, cursor->page, DONT_FREE);
if (context->nsects_lastvol_remaining <= 0) { *stop = true; }   /* <- end the volume scan */

Redo and undo images are the same log_unit mask; the recovery handlers disk_rv_reserve_sectors / disk_rv_unreserve_sectors re-sync the cache under CSECT_DISK_CHECK (recovery chapter). Temporary reservations log nothing — their bits reset wholesale on restart.
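The per-unit callback logic (full, empty, partial) can be sketched on a single `uint64_t`. Assumptions are stated up front: `model_unit_reserve` and the `MODEL_*` macros are illustrative names, logging and page dirtying are left out, and taking an empty unit in bulk only when the remaining debt covers all 64 bits is a choice of this sketch, not something the excerpt confirms. The returned mask corresponds to the changed-bits image that the permanent path would log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_UNIT_BITS 64
#define MODEL_UNIT_FULL UINT64_MAX

/* Reserve up to *remaining sectors from one 64-bit unit.
 * first_sectid is the sector id of bit 0; new sector ids are written
 * through *out_cursor. Returns the mask of bits that were flipped. */
static uint64_t
model_unit_reserve (uint64_t *unit, int first_sectid, int *remaining, int **out_cursor, bool *stop)
{
  uint64_t changed = 0;

  assert (*remaining > 0);
  if (*unit == MODEL_UNIT_FULL)
    {
      return 0;			/* full unit: nothing to take */
    }
  if (*unit == 0 && *remaining >= MODEL_UNIT_BITS)
    {
      changed = MODEL_UNIT_FULL;	/* empty unit: take all 64 at once */
      for (int bit = 0; bit < MODEL_UNIT_BITS; bit++)
	{
	  *(*out_cursor)++ = first_sectid + bit;
	}
      *remaining -= MODEL_UNIT_BITS;
    }
  else
    {
      /* partial unit (or small debt): walk bit by bit */
      for (int bit = 0; bit < MODEL_UNIT_BITS && *remaining > 0; bit++)
	{
	  uint64_t mask = UINT64_C (1) << bit;
	  if ((*unit & mask) == 0)
	    {
	      changed |= mask;
	      *(*out_cursor)++ = first_sectid + bit;
	      (*remaining)--;
	    }
	}
    }
  *unit |= changed;
  if (*remaining <= 0)
    {
      *stop = true;		/* per-volume debt paid: end the scan */
    }
  return changed;
}
```

Because the same mask can be OR-ed in to redo and AND-NOT-ed out to undo, a single image suffices for both directions, which is consistent with the excerpt passing `&log_unit` twice.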

Invariant — the cache mutex is released throughout step 2. Step 1 charged the counters; step 2 touches only page latches and the WAL, so the hot reserve mutex is held for O(volumes) counter math, never O(sectors) bitmap I/O.

4.7 Failure and rollback

If either step errors, disk_reserve_sectors jumps to error:. Let nreserved = vsidp - reserved_sectors be the sectors actually flipped.

nreserved > 0 and temp purpose: nothing was logged, so abort cannot undo the partial bitmap changes; disable interrupt checks, qsort the VSIDs, call disk_unreserve_ordered_sectors_without_csect. Permanent skips this — the log_sysop_abort below undoes its logged changes.

Reconcile the ledger with what abort/undo already returned: for each flipped sector, decrement its volume’s cache_vol_reserve[].nsect, leaving only sectors charged to the cache but never flipped:

// disk_reserve_sectors (error path) -- src/storage/disk_manager.c
for (iter_vsid = 0; iter_vsid < nreserved; iter_vsid++)
  {
    for (iter = 0; iter < context.n_cache_vol_reserve; iter++)
      if (reserved_sectors[iter_vsid].volid == context.cache_vol_reserve[iter].volid)
        { context.cache_vol_reserve[iter].nsect--; break; }      /* <- don't double-credit */
    assert (iter < context.n_cache_vol_reserve);
  }

Refund the residue via disk_cache_free_reserved(&context) (adds remaining nsect back through disk_cache_update_vol_free under the reserve mutex).
Exit csect, log_sysop_abort — for permanent purpose this rolls the logged STAB bits back.
Classify the error. Expected IO/interrupt errors (ER_INTERRUPTED, ER_IO_MOUNT_FAIL, ER_IO_FORMAT_OUT_OF_SPACE, ER_IO_WRITE, ER_BO_CANNOT_CREATE_VOL) return as-is. Anything else trips assert_release(false) and self-heals: if not yet retried, disk_check(thread_p, true); if it reports DISK_INVALID, clear the error, set retried = true, goto retry. A second failure or non-skew cause returns.

disk_unreserve_ordered_sectors_without_csect rebuilds a fresh context from the ordered VSID list, grouping consecutive same-volid runs into ledger entries (asserting increasing volids and sectids), then calls disk_unreserve_sectors_from_volume per group. It returns the first error (ASSERT_ERROR (); return error_code;) without refunding the remaining groups. Its disk_stab_unit_unreserve callback clears bits and returns sectors to the cache, which is the “removed from cache too” effect that the ledger-reconciliation loop shown above compensates for.

Invariant — reserve order cache->bitmap, release order bitmap->cache; the cache never overcounts. On reserve the counter drops before the bit is set; on release the bit clears before the counter rises. Both transients leave the cache showing less free than the bitmap, so two reservers can never both be told a sector is free; disk_check repairs the bounded skew, which is why the error: path can retry through it.
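The reconcile-then-refund sequence of the error path can be checked with concrete numbers. A minimal sketch, assuming a reduced model: `model_reconcile_and_refund`, `struct model_vsid`, and `struct model_ledger_entry` are illustrative names, `vol_free[]` stands in for the per-volume cache counters, and the flipped sectors are assumed to have been credited back to `vol_free[]` already by undo or unreserve.

```c
#include <assert.h>

struct model_vsid
{
  int volid;
  int sectid;
};

struct model_ledger_entry
{
  int volid;
  int nsect;
};

/* Pass 1: for every sector that was flipped (and already credited back),
 * drop one from its volume's ledger entry.
 * Pass 2: refund what is left, i.e. sectors charged to the cache but never
 * flipped. Returns the number of sectors refunded. */
static int
model_reconcile_and_refund (const struct model_vsid *reserved, int nreserved,
			    struct model_ledger_entry *ledger, int n_ledger, int *vol_free)
{
  int refunded = 0;

  for (int v = 0; v < nreserved; v++)
    {
      int i;
      for (i = 0; i < n_ledger; i++)
	{
	  if (reserved[v].volid == ledger[i].volid)
	    {
	      ledger[i].nsect--;	/* do not credit this sector twice */
	      break;
	    }
	}
      assert (i < n_ledger);
    }
  for (int i = 0; i < n_ledger; i++)
    {
      vol_free[ledger[i].volid] += ledger[i].nsect;
      refunded += ledger[i].nsect;
      ledger[i].nsect = 0;
    }
  return refunded;
}
```

Example: volumes start with 5 and 6 free sectors and step 1 charges 3 and 4. Step 2 flips three sectors on volume 0 and one on volume 1 before failing; undo returns those four, leaving free counts of 5 and 3. Reconciliation leaves a residue of 3 on volume 1, and the refund restores it to 6.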

4.8 Chapter summary — key takeaways

Two disjoint phases. Step 1 (disk_reserve_from_cache) moves free-sector counts out of the cache under the short reserve mutex; step 2 (disk_reserve_sectors_in_volume) flips STAB bits under page latches with that mutex released.
Two independent debt counters. n_cache_reserve_remaining (cache) and nsects_lastvol_remaining (per-volume bitmap) never alias, so step 2 is driven volume by volume off cache_vol_reserve[].
Temp prefers perm-type-temp-purpose volumes. When nsect_perm_free > 0 those volumes are scanned first; any remaining debt falls through to temp volumes and, if needed, to temp-volume extension bounded by disk_Temp_max_sects.
The hot mutex is never held across slow work. An intention counter lets the reserve mutex drop during disk_extend; step 2 never re-takes it.
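Permanent reservations are WAL-logged per STAB unit; temporary ones are not. Because nothing was logged, log_sysop_abort cannot undo a partial temp reservation, so temp rollback physically un-flips the bits via disk_unreserve_ordered_sectors_without_csect; after a crash, temp bits simply reset on restart.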
The transient skew is always conservative. Reserve cache->bitmap, release bitmap->cache, so the cache never reports more free than exists; disk_check repairs the bounded skew and the error: path retries through it once.
The error path reconciles before refunding. It decrements ledger entries for already-returned sectors, then disk_cache_free_reserved refunds only the never-flipped residue, avoiding double-credit.

Chapter 5: Volume Extension as a Nested Top Action

The reader question this chapter answers: what happens inside Step 1 of sector reservation when even the permanent-type / temporary-purpose fallback runs dry and the cache can no longer satisfy the request? The reserving thread must grow the database — extend an existing OS file or create a new volume — before it can finish reserving. This chapter traces that escalation from disk_reserve_from_cache through disk_extend, disk_volume_expand, and disk_add_volume, and shows why the growth must be a nested top action committed independently of the triggering reservation. It continues Chapter 4 and the cache-vs-disk split in the high-level companion (cubrid-disk-manager.md, “Sector reservation”).

5.1 Where extension is triggered: the race window in `disk_reserve_from_cache`

When the running free count cannot cover n_cache_reserve_remaining, the function records its intention, drops the reserve mutex, then takes the extend mutex. The order is mandatory: mutex_extend carries the comment "never get expand mutex while keeping reserve mutexes". An expander holding mutex_extend re-takes the reserve mutex to publish the new free space, so a thread that waited for mutex_extend while still holding the reserve mutex would deadlock against it.

// disk_reserve_from_cache -- src/storage/disk_manager.c
  extend_info->nsect_intention += context->n_cache_reserve_remaining;   /* <- publish demand BEFORE releasing */
  disk_cache_unlock_reserve (extend_info);
  disk_lock_extend ();                          /* <- serializes all expanders; flips reserve -> extend mutex */
  disk_cache_lock_reserve (extend_info);
  if (extend_info->nsect_free > context->n_cache_reserve_remaining)     /* <- race: someone already grew it */
    {
      extend_info->nsect_intention -= context->n_cache_reserve_remaining;
      disk_reserve_from_cache_vols (extend_info->voltype, context);
      if (context->n_cache_reserve_remaining <= 0)
        { disk_cache_unlock_reserve (extend_info); disk_unlock_extend (); return NO_ERROR; }  /* <- no extend */
      extend_info->nsect_intention += context->n_cache_reserve_remaining;
    }
  save_remaining = context->n_cache_reserve_remaining;   /* <- snapshot, to undo intention after extend */
  disk_cache_unlock_reserve (extend_info);
  error_code = disk_extend (thread_p, extend_info, context);    /* <- the slow path */

The interval between the two mutexes is the race window: another thread can grab mutex_extend first, grow the volume, and refill nsect_free. The double-check after disk_lock_extend() catches that — if the volume is now large enough this thread reverses its nsect_intention bump and reserves from the grown cache with no disk I/O. (disk_extend opens with assert (disk_Cache->owner_extend == thread_get_entry_index (thread_p)), proving it runs only under the extend mutex.)

INVARIANT — nsect_intention is the load-bearing accumulator of unmet demand. A thread adds its remaining need under the reserve mutex and subtracts the same save_remaining snapshot once met. If violated (an add with no matching subtract on an error path), every future disk_extend over-allocates by the leaked amount forever, since it reads nsect_intention as the floor of how much to grow.
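The balance of `nsect_intention` across the double-check can be traced with a small model. This is a sketch under stated assumptions: `model_publish_intention` and `model_recheck_after_extend_lock` are hypothetical names, locking is omitted entirely, and the `reachable` parameter is an invented stand-in for how many sectors the volume scan can actually take (volumes below `min_free` are skipped, so it may be less than the pool's free count).

```c
#include <assert.h>

/* Reduced stand-in for the extend-info counters used by the double-check. */
struct model_extend_info
{
  int nsect_free;
  int nsect_intention;
};

/* Done under the reserve mutex, before it is released. */
static void
model_publish_intention (struct model_extend_info *info, int remaining)
{
  info->nsect_intention += remaining;
}

/* Done after taking the extend mutex and re-taking the reserve mutex.
 * Returns the debt still unmet; 0 means no extend is needed. */
static int
model_recheck_after_extend_lock (struct model_extend_info *info, int remaining, int reachable)
{
  assert (remaining > 0 && reachable >= 0);
  if (info->nsect_free > remaining)
    {
      int taken = reachable < remaining ? reachable : remaining;

      info->nsect_intention -= remaining;	/* withdraw the published demand */
      info->nsect_free -= taken;
      remaining -= taken;
      if (remaining <= 0)
	{
	  return 0;		/* a peer already grew the pool */
	}
      info->nsect_intention += remaining;	/* re-publish only the residue */
    }
  return remaining;		/* intention stays published for disk_extend */
}
```

In every path the counter ends up equal to the debt that is still unmet, which is the balance the invariant above requires.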

5.2 `disk_extend`: deciding how much, then expand-then-add

disk_extend runs under mutex_extend over a snapshot of the DISK_EXTEND_INFO counters (Chapter 1), sizing the growth then executing it in two phases.

// disk_extend -- src/storage/disk_manager.c
  target_free = MAX ((DKNSECTS) (total * 0.01), DISK_MIN_VOLUME_SECTS);   /* <- 1% of size, floored */
  nsect_extend = MAX (target_free - free, 0) + intention;                 /* <- coalesce all unmet demand */
  if (nsect_extend <= 0)
    return NO_ERROR;            /* <- branch 1: free exceeds target, no intentions */
  // ... condensed ...
  if (total < max)              /* <- phase 1: extendable volume still has room */
    {
      to_expand = MIN (nsect_extend, max - total);    /* <- never exceed this volume's ceiling */
      log_sysop_start (thread_p);                      /* <- NESTED TOP ACTION begins */
      error_code = disk_volume_expand (thread_p, extend_info->volid_extend, voltype, to_expand, &nsect_free_new);
      if (error_code != NO_ERROR)
        { ASSERT_ERROR (); log_sysop_abort (thread_p); return error_code; }   /* <- header undo */
      log_sysop_commit (thread_p);             /* <- commit independently of outer reservation */
      if (extend_info->nsect_total == extend_info->nsect_max)
        extend_info->volid_extend = NULL_VOLID;   /* <- maxed out; never extend this volume again */
      nsect_extend -= nsect_free_new;
      // ... condensed: bump nsect_total; under reserve mutex update vol_free + reserve ahead ...
      if (nsect_extend <= 0)
        return NO_ERROR;                        /* <- expansion alone covered the demand */
    }
  // ... condensed: assert (nsect_extend > 0); volext init (nsect_max, voltype, purpose, overwrite=false) ...
  while (nsect_extend > 0)     /* <- phase 2: add fresh volumes */
    {
      if (check_interrupt && logtb_is_interrupted (thread_p, true, &continue_check))
        { er_set (..., ER_INTERRUPTED, 0); return ER_INTERRUPTED; }   /* <- branch: only if re-enabled */
      volext.nsect_total = nsect_extend + DISK_SYS_NSECT_SIZE (volext.nsect_max);
      // ... condensed: clamp to [DISK_MIN_VOLUME_SECTS, nsect_max] then DISK_SECTS_ROUND_UP ...
      error_code = disk_add_volume (thread_p, &volext, &volid_new, &nsect_free_new);
      if (error_code != NO_ERROR)
        { ASSERT_ERROR (); return error_code; }   /* <- disk_add_volume aborted its own sysop */
      nsect_extend -= nsect_free_new;
      // ... condensed: bump nsect_total/nsect_max; under reserve mutex set vol_free + reserve ahead ...
      if (extend_info->nsect_total < extend_info->nsect_max)
        extend_info->volid_extend = volid_new;  /* <- newest non-maxed volume becomes extendable */
    }
  return NO_ERROR;

nsect_extend adds the (non-negative) headroom shortfall to intention, so one expansion serves every thread blocked on this purpose. Phase 1 grows the sub-ceiling volid_extend volume and reserves ahead; phase 2’s three branches — interrupt, disk_add_volume error (callee already aborted), and the volid_extend update (only if sub-max) — are annotated inline.

INVARIANT — exactly one volume per purpose is “extendable”. volid_extend names the single volume phase 1 grows; the code clears it to NULL_VOLID the instant a volume reaches nsect_max and re-points it at the newest sub-max volume. If violated, a maxed volume could reach disk_volume_expand with to_expand = MIN(nsect_extend, max - total) non-positive, tripping an assert.

flowchart TD
  C{"nsect_extend <= 0?"} -->|yes| Z1["return NO_ERROR"]
  C -->|no| D{"total < max?"}
  D -->|yes| E["sysop_start; disk_volume_expand"]
  E --> F{"error?"} -->|yes| G["sysop_abort; return error"]
  F -->|no| H["sysop_commit; cache + reserve ahead"]
  H --> I{"nsect_extend <= 0?"} -->|yes| Z2["return NO_ERROR"]
  I -->|no| J["phase 2 loop"]
  D -->|no| J
  J --> K{"interrupted?"} -->|yes| L["return ER_INTERRUPTED"]
  K -->|no| M["size volext; disk_add_volume"]
  M --> N{"error?"} -->|yes| O["return error"]
  N -->|no| P["cache + reserve ahead; set volid_extend if sub-max"]
  P --> Q{"nsect_extend > 0?"} -->|yes| K
  Q -->|no| Z3["return NO_ERROR"]

Figure 5-1. Branch-complete flow of disk_extend: sizing, optional in-place expand, then the add-volume loop.
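The sizing arithmetic at the top of disk_extend can be worked through numerically. A minimal sketch with labeled assumptions: `model_plan_extend` and `struct model_extend_plan` are illustrative names, `min_volume_sects` is passed as a parameter because the value of DISK_MIN_VOLUME_SECTS is not given here, and the in-place expansion is assumed to yield exactly `to_expand` free sectors. `to_add` is the residual demand handed to the add-volume loop before any per-volume clamping.

```c
#include <assert.h>

#define MODEL_MIN(a, b) ((a) < (b) ? (a) : (b))
#define MODEL_MAX(a, b) ((a) > (b) ? (a) : (b))

struct model_extend_plan
{
  int nsect_extend;		/* total growth wanted */
  int to_expand;		/* served by growing volid_extend in place */
  int to_add;			/* residue left for new volumes */
};

static struct model_extend_plan
model_plan_extend (int total, int max, int free_sects, int intention, int min_volume_sects)
{
  struct model_extend_plan plan = { 0, 0, 0 };
  int target_free = MODEL_MAX ((int) (total * 0.01), min_volume_sects);

  /* headroom shortfall (never negative) plus all published demand */
  plan.nsect_extend = MODEL_MAX (target_free - free_sects, 0) + intention;
  if (plan.nsect_extend <= 0)
    {
      return plan;		/* free exceeds target and nobody is waiting */
    }
  if (total < max)
    {
      plan.to_expand = MODEL_MIN (plan.nsect_extend, max - total);
    }
  plan.to_add = plan.nsect_extend - plan.to_expand;
  return plan;
}
```

With total = 10000, max = 10050, 30 free sectors, and 200 sectors of published intention, the target is 100 free, the shortfall is 70, and the growth is 270: 50 by in-place expansion and 220 left for new volumes.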

5.3 `disk_volume_expand`: growing one file as its own sysop

disk_volume_expand grows a single volume in place. It follows a six-step recipe, and the ordering of those steps is what makes the growth crash-safe.

// disk_volume_expand -- src/storage/disk_manager.c
  error_code = disk_get_volheader (thread_p, volid, PGBUF_LATCH_WRITE, &page_volheader, &volheader);
  if (error_code != NO_ERROR)
    { assert_release (false); er_set (..., ER_GENERIC_ERROR, 0); return ER_FAILED; }  /* <- header fix fatal */
  do_logging = (volheader->type == DB_PERMANENT_VOLTYPE);   /* <- temp volumes are not logged */
  log_sysop_start (thread_p);                    /* step 1: own sysop so header change can be undone */
  volheader->nsect_total += nsect_extend;
  if (do_logging)
    log_append_undoredo_data2 (thread_p, RVDK_VOLHEAD_EXPAND, ...);   /* step 2: header undo/redo */
  volume_new_npages = DISK_SECTS_NPAGES (volheader->nsect_total);
  if (do_logging)
    log_append_dboutside_redo (thread_p, RVDK_EXPAND_VOLUME, ...);    /* step 3: unattached redo */
  pgbuf_set_dirty_and_free (thread_p, page_volheader);   /* free header only after step 3 is logged */
  log_sysop_commit (thread_p);                   /* step 4: cancel the header-undo */
  logpb_force_flush_pages (thread_p);            /* step 5: log MUST be on disk before the file grows */
  error_code = fileio_expand_to (thread_p, volid, volume_new_npages, voltype);   /* step 6: grow OS file */
  if (error_code != NO_ERROR)
    { assert (false); return error_code; }        /* <- cannot-happen: growth already durable; cache desyncs */
  *nsect_extended_out = nsect_extend;
  return NO_ERROR;

The header-fix failure is fatal. The do_logging branch skips both log records for temp volumes, which are never recovered. The fileio_expand_to failure is a cannot-happen branch: by then the sysop has committed and the log has been force-flushed, so the growth is already durable in the log, and an error would leave the OS file and the cache out of sync. RVDK_VOLHEAD_EXPAND (disk_rv_volhead_extend_undo/..._redo) adjusts nsect_total and the cache by the same delta; RVDK_EXPAND_VOLUME re-runs fileio_expand_to on recovery.

INVARIANT — the file-growth redo log must be durable before the file is grown. Step 5 (logpb_force_flush_pages) sits between the committed header update and fileio_expand_to. If skipped, a crash in between leaves the recovered header and the OS file disagreeing on size.
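The log-before-grow ordering can be expressed as a check over an event trace. This is purely a model, not CUBRID code: the `model_*` functions and `EV_*` events are invented for illustration, and each event merely stands for the corresponding step of the recipe above.

```c
#include <assert.h>

enum model_event
{
  EV_SYSOP_START, EV_LOG_HEADER, EV_LOG_EXPAND_REDO,
  EV_SYSOP_COMMIT, EV_LOG_FLUSH, EV_FILE_GROW
};

struct model_trace
{
  enum model_event ev[8];
  int n;
};

static void
model_emit (struct model_trace *t, enum model_event e)
{
  assert (t->n < 8);
  t->ev[t->n++] = e;
}

/* Emits the six steps in recipe order; temp volumes skip both log records. */
static void
model_volume_expand (struct model_trace *t, int do_logging)
{
  model_emit (t, EV_SYSOP_START);
  if (do_logging)
    {
      model_emit (t, EV_LOG_HEADER);
      model_emit (t, EV_LOG_EXPAND_REDO);
    }
  model_emit (t, EV_SYSOP_COMMIT);
  model_emit (t, EV_LOG_FLUSH);
  model_emit (t, EV_FILE_GROW);
}

/* Index of the first occurrence of e, or -1 if absent. */
static int
model_index_of (const struct model_trace *t, enum model_event e)
{
  for (int i = 0; i < t->n; i++)
    {
      if (t->ev[i] == e)
	{
	  return i;
	}
    }
  return -1;
}

/* The invariant: if a redo record exists and the file grew, a flush must
 * sit between them. */
static int
model_log_is_durable_before_grow (const struct model_trace *t)
{
  int redo = model_index_of (t, EV_LOG_EXPAND_REDO);
  int flush = model_index_of (t, EV_LOG_FLUSH);
  int grow = model_index_of (t, EV_FILE_GROW);

  if (grow < 0 || redo < 0)
    {
      return 1;			/* nothing grown, or nothing logged (temp) */
    }
  return redo < flush && flush < grow;
}
```

A trace that grows the file without a preceding flush fails the check, which is the crash window the invariant describes.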

5.4 `disk_add_volume`: a fresh OS file plumbed into three registries

When in-place expansion is exhausted, disk_extend calls disk_add_volume — the second nested top action — wrapping the cache-mutating steps in log_sysop_start/log_sysop_commit.

// disk_add_volume -- src/storage/disk_manager.c
  if (disk_Cache->nvols_perm + disk_Cache->nvols_temp >= LOG_MAX_DBVOLID)
    return ER_BO_MAXNUM_VOLS_HAS_BEEN_EXCEEDED;            /* <- volume-id space exhausted */
  error_code = boot_get_new_volume_name_and_id (..., &volid);     /* step 1: name + id from boot */
  // ... condensed: raw-device symlink, partition free-space check ...
  if (nsect_part_max >= 0 && nsect_part_max < extinfo->nsect_max)
    return ER_IO_FORMAT_OUT_OF_SPACE;                       /* step 2 failed: not enough OS disk space */
  if (!extinfo->overwrite && fileio_is_volume_exist (extinfo->name))
    { /* ... condensed: disk_can_overwrite_data_volume check ... */
      return ER_BO_VOLUME_EXISTS;                           /* <- refuse to clobber an existing file */ }
  log_sysop_start (thread_p);                                /* NESTED TOP ACTION begins */
  if (extinfo->voltype == DB_PERMANENT_VOLTYPE) disk_Cache->nvols_perm++;   /* step 3: cache before format */
  else disk_Cache->nvols_temp++;
  disk_Cache->vols[volid].purpose = extinfo->purpose;
  error_code = disk_format (thread_p, boot_db_full_name (), volid, extinfo, nsects_free_out);  /* step 4 */
  if (error_code != NO_ERROR)
    { ASSERT_ERROR (); goto exit; }
  if (extinfo->voltype == DB_PERMANENT_VOLTYPE)
    if (logpb_add_volume (NULL, volid, extinfo->name, DB_PERMANENT_DATA_PURPOSE) == NULL_VOLID)
      { ASSERT_ERROR_AND_SET (error_code); goto exit; }      /* step 5: register in _vinf (perm only) */
  error_code = boot_dbparm_save_volume (thread_p, extinfo->voltype, volid);    /* step 6: persist in boot_Db_parm */
  if (error_code != NO_ERROR)
    {
      ASSERT_ERROR ();
      if (extinfo->voltype == DB_TEMPORARY_VOLTYPE && disk_unformat (thread_p, extinfo->name) != NO_ERROR)
        assert (false);   /* <- rollback won't drop temp file; do it by hand */
      goto exit;
    }
  *volid_out = volid;
exit:
  if (error_code == NO_ERROR)
    log_sysop_commit (thread_p);
  else
    {
      log_sysop_abort (thread_p);
      if (extinfo->voltype == DB_TEMPORARY_VOLTYPE) disk_Cache->nvols_temp--;   /* <- undo cache count manually */
      else disk_Cache->nvols_perm--;
    }
  return error_code;

Three registries (Figure 5-2): boot_Db_parm is updated last, so a crash before it leaves no dangling reference; the _vinf registry is updated via logpb_add_volume, for permanent volumes only; and disk_Cache has nvols_* and vols[volid].purpose bumped first so disk_format’s page fixes find the volume classified. Every goto exit funnels into one log_sysop_abort. Two things logging cannot undo are fixed by hand: the raw nvols_* counter, decremented in the abort arm, and, for a temp volume whose boot_dbparm_save_volume call failed, the file itself (disk_unformat, since temp creation is not journaled). A permanent volume’s file is handled by recovery via the logged format records.

graph TD
  AV["disk_add_volume\nnew volume file"] --> BP["boot_Db_parm\nboot_dbparm_save_volume()"]
  AV --> VI["_vinf volinfo registry\nlogpb_add_volume() perm only"]
  AV --> DC["disk_Cache\nnvols_*++, vols[volid].purpose"]
  AV --> FMT["disk_format()\nzeroes file, writes volheader + sector table"]

Figure 5-2. The three registries disk_add_volume plumbs a new volume into, plus the on-disk format step.
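The single-exit cleanup of disk_add_volume can be modeled with failure injection. A minimal sketch, assuming a reduced model: all `model_*` names and the `FAIL_*` values are hypothetical, the three steps are reduced to failure flags, and the result struct only records which cleanup actions ran.

```c
#include <assert.h>

struct model_cache
{
  int nvols_perm;
  int nvols_temp;
};

/* Which step is forced to fail in this run of the model. */
enum model_fail_step
{ FAIL_NONE, FAIL_FORMAT, FAIL_LOG_REGISTER, FAIL_BOOT_SAVE };

struct model_result
{
  int error;
  int sysop_committed;
  int sysop_aborted;
  int temp_file_unformatted;
};

static struct model_result
model_add_volume (struct model_cache *cache, int is_temp, enum model_fail_step fail)
{
  struct model_result r = { 0, 0, 0, 0 };

  /* cache counters are bumped before the format step */
  if (is_temp)
    cache->nvols_temp++;
  else
    cache->nvols_perm++;

  if (fail == FAIL_FORMAT)
    {
      r.error = -1;
      goto done;
    }
  if (!is_temp && fail == FAIL_LOG_REGISTER)	/* permanent volumes only */
    {
      r.error = -1;
      goto done;
    }
  if (fail == FAIL_BOOT_SAVE)
    {
      r.error = -1;
      if (is_temp)
	r.temp_file_unformatted = 1;	/* rollback will not drop the temp file */
      goto done;
    }

done:
  if (r.error == 0)
    {
      r.sysop_committed = 1;
    }
  else
    {
      r.sysop_aborted = 1;
      /* the counter is plain memory, so it is undone by hand */
      if (is_temp)
	cache->nvols_temp--;
      else
	cache->nvols_perm--;
    }
  return r;
}
```

Whatever step fails, the volume counts return to their previous values, while a success leaves exactly one counter incremented.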

5.5 `disk_add_volume_extension`: the addvoldb / boot-time entry, and the retired daemon

disk_extend is the automatic path; disk_add_volume_extension is the explicit entry, called by addvoldb and at database creation. It does not size against nsect_intention — the caller dictates npages — but respects the same serialization, taking disk_lock_extend() and the CSECT_DISK_CHECK reader latch so an admin addvol cannot race an automatic disk_extend.

// disk_add_volume_extension -- src/storage/disk_manager.c
  error_code = csect_enter_as_reader (thread_p, CSECT_DISK_CHECK, INF_WAIT);
  disk_lock_extend ();                            /* <- block other expansions */
  // ... condensed: realpath, fill ext_info from caller args ...
  ext_info.nsect_total = disk_sectors_to_extend_npages (npages);
  ext_info.nsect_max = ext_info.nsect_total;      /* <- born at its max: never auto-grown */
  if (voltype == DB_TEMPORARY_VOLTYPE)
    {
      if (disk_Cache->temp_purpose_info.extend_info.nsect_total + ext_info.nsect_total > disk_Temp_max_sects)
        { er_set (..., ER_BO_MAXTEMP_SPACE_HAS_BEEN_EXCEEDED, ...);
          disk_unlock_extend (); csect_exit (thread_p, CSECT_DISK_CHECK);
          return ER_BO_MAXTEMP_SPACE_HAS_BEEN_EXCEEDED; }   /* <- temp-space cap: release BOTH locks */
      ext_info.voltype = DB_TEMPORARY_VOLTYPE;
    }
  else
    ext_info.voltype = DB_PERMANENT_VOLTYPE;
  error_code = disk_add_volume (thread_p, &ext_info, &volid_new, &nsect_free);
  if (error_code != NO_ERROR)
    { ASSERT_ERROR (); disk_unlock_extend (); csect_exit (thread_p, CSECT_DISK_CHECK); return error_code; }
  // ... condensed: bump per-purpose nsect_total/nsect_max, update vol_free under reserve mutex ...
  disk_unlock_extend (); csect_exit (thread_p, CSECT_DISK_CHECK);
  *volid_out = volid_new;
  return NO_ERROR;

ext_info.nsect_max = ext_info.nsect_total means a user-added volume is born at its maximum size, never a candidate for in-place expansion. Three branches: the temp-space-exceeded early return, the disk_add_volume error return (both releasing the extend mutex and the critical section), and the success path. The post-add bookkeeping distinguishes a permanent-type volume serving temporary purpose from a true temporary-type volume — the three-way classification used throughout the cache.

The retired daemon. The comment atop disk_extend still mentions an auto-expansion thread keeping “a stable level of free space,” but that daemon has been removed — which is why nsect_intention is now the sole coalescing mechanism: the first thread to take the extend mutex must grow enough for itself and every thread that published an intention while it waited.

5.6 Why a nested top action — and not the outer transaction

Both disk_volume_expand and disk_add_volume wrap their durable work in log_sysop_start/log_sysop_commit rather than letting it ride on the outer reservation’s transaction. The grower acts on behalf of all co-users of the new space: reserve-ahead hands fresh sectors to the triggering reservation, but other waiting threads reserve from the same volume once the extend mutex is released. If the growth rode the outer transaction and that transaction later rolled back, every co-user would be forced to roll back too — a volume several transactions depend on would vanish. Committing as an independent nested top action makes the space durable regardless of the triggering transaction’s fate: the reservation can still abort; the volume stays. This is the discipline the companion describes for file-table updates, applied to the coarsest unit of growth.

5.7 Chapter summary — key takeaways

The extend path is entered only after the cache fails twice. disk_reserve_from_cache records nsect_intention, releases the reserve mutex, takes mutex_extend, and re-checks free space — the double-check absorbs the race where another thread already grew the volume between the two mutexes.
nsect_intention is the load-bearing accumulator. With the auto-expansion daemon removed, it is the only mechanism coalescing concurrent demand; disk_extend adds it to nsect_extend so one expansion serves every waiting thread, and paired +=/-= (with a save_remaining snapshot) keep it balanced across error paths.
disk_extend is expand-then-add. It grows the single volid_extend volume in place up to nsect_max (one volume per purpose may grow), then loops adding fresh volumes for residual demand, reserving ahead into the caller’s context after each step.
disk_volume_expand orders log-before-grow. Header undo/redo plus an unattached RVDK_EXPAND_VOLUME redo, a forced log flush, then fileio_expand_to — whose failure is unrecoverable by construction.
disk_add_volume plumbs the new file into three registries — boot_Db_parm (last), the _vinf file (permanent only), and disk_Cache (counts first) — manually undoing the unlogged cache counter and unformatting orphaned temp files on error.
disk_add_volume_extension is the explicit addvol / boot-time twin: same mutex_extend serialization, caller-supplied size, and nsect_max == nsect_total so user volumes are never auto-grown.
Growth is a nested top action so co-users are not held hostage: committing the expansion independently means a later rollback of the triggering reservation cannot destroy a volume other transactions now depend on.

Chapter 6: File Creation and the Three-Table Layout

The high-level companion (cubrid-disk-manager.md) explains why a file is a set of reserved sectors. This chapter answers the mechanical follow-up: once disk_reserve_sectors (Ch.4) returns a sorted VSID array, how does file_create turn it into a usable file — header page, VFID, and the partial / full / user-page tables every later allocation relies on? We assume the VSID array exists and trace the file-manager side.

file_create (in file_manager.c) is the one engine. Everything else — the file_create_heap / temp / ehash family — is a thin wrapper that picks two booleans (is_temp, is_numerable), a FILE_TYPE, a FILE_TABLESPACE, and an optional FILE_DESCRIPTORS, then calls it.

A file’s header page (PAGE_FTAB) begins with one file_header struct, followed by one to three file_extensible_data table headers. Two of file_header’s members are themselves structs (FILE_TABLESPACE, FILE_DESCRIPTORS).

// struct file_header -- src/storage/file_manager.c
struct file_header
{
  INT64 time_creation;          /* Time of file creation. */
  VFID self;                    /* Self VFID */
  FILE_TABLESPACE tablespace;   /* The table space definition */
  FILE_DESCRIPTORS descriptor;  /* File descriptor. Depends on file type. */
  // ... page / sector counters, flags, table offsets, temp+numerable cursors ...
};

file_header fields.

Field	Role / why it exists
`time_creation`	Wall-clock create time; distinguishes reused fileids.
`self`	This file’s own VFID; self-identifying header for recovery.
`tablespace`	Embedded `FILE_TABLESPACE`; perm extension (Ch.5), zeroed temp.
`descriptor`	Embedded `FILE_DESCRIPTORS` union; type-specific owner metadata.
`n_page_total`	Total pages over all sectors; allocation ceiling.
`n_page_user`	User pages handed out (0 at creation); separates user pages from table pages.
`n_page_ftab`	Pages used by the file’s tables; starts at 1 (header).
`n_page_free`	Reserved-but-unallocated pages; Ch.7/8 draws down.
`n_page_mark_delete`	Removed numerable pages; marked, not compacted.
`n_sector_total`	Reserved-sector count; equals `n_sectors`.
`n_sector_partial`	Sectors with a free page (`total-full`); alloc candidates.
`n_sector_full`	Sectors with no free page; at creation these are sectors consumed entirely by table pages. Listed in the full table for perm files only.
`n_sector_empty`	Sectors with no page allocated; starts `-1` (header sector).
`type`	`FILE_TYPE` enum; type routing.
`file_flags`	`NUMERABLE`/`TEMPORARY`/`ENCRYPTED_*` flag bits; source of truth for the `FILE_IS_*` macros.
`volid_last_expand`	Last volume that supplied a sector; seeds next extension.
`offset_to_partial_ftab`	Offset to partial table; anchors `GET_PART_FTAB`.
`offset_to_full_ftab`	Offset to full table; perm only, else `NULL_OFFSET`.
`offset_to_user_page_ftab`	Offset to user-page table; numerable only, else `NULL_OFFSET`.
`vpid_sticky_first`	Undeletable first page; set later (Ch.11).
`vpid_last_temp_alloc` + `offset_to_last_temp_alloc`	Temp-alloc cursor (page + offset); temp shortcut (Ch.8).
`vpid_last_user_page_ftab`	Last user-page-table page; numerable append (Ch.10).
`vpid_find_nth_last` / `first_index_find_nth_last`	Cached `find_nth` position; nth-lookup speedup (Ch.10).
`reserved0..3`	Padding, zeroed; forward-compat.

graph LR
  FH["file_header"] -->|embeds| TS["FILE_TABLESPACE"]
  FH -->|embeds union| DES["FILE_DESCRIPTORS"]
  FH -->|offset_to_partial_ftab| PT["partial table"]
  FH -->|offset_to_full_ftab perm| FT["full table"]
  FH -->|offset_to_user_page_ftab numerable| UT["user-page table"]

Figure 6-1. file_header embeds two structs and points at one to three file_extensible_data tables in the header page.

FILE_TABLESPACE — four fields set by FILE_TABLESPACE_FOR_PERM_NPAGES / _FOR_TEMP_NPAGES: initial_size (requested bytes MAX(1,npages)*DB_PAGESIZE, seeds total_size); expand_ratio (geometric-growth fraction, 0 for temp); expand_min_size / expand_max_size (per-extension clamps, both 0 for temp so temp never auto-extends).

FILE_DESCRIPTORS is a union padded to 64 bytes (FILE_DESCRIPTORS_SIZE). Arms: heap (class_oid, hfid), heap_overflow, btree (class_oid, attr_id), btree_key_overflow, ehash (class_oid, attr_id), vacuum_data (vpid_first), dummy_align (forces the 64-byte footprint). The fixed size is load-bearing — the header warns “if you change file descriptors size, make sure to change disk compatibility version too!”: the union size is part of the on-disk format.

file_extensible_data is the table header repeated up to three times after file_header. Four fields: vpid_next (continuation-page link when a table outgrows the header page), max_size (item capacity in bytes, fixed at file_extdata_init), size_of_item (bytes per item — one struct, three item types), n_items (items stored, starts 0).

6.2 Estimating size: data plus worst-case file-table sectors

file_create turns the requested byte size into a sector count, then reserves extra sectors for the file’s own tables. The estimate is pessimistic on purpose — over-reserving is cheap, under-reserving forces a mid-create extension.

// file_create -- src/storage/file_manager.c
total_size = tablespace->initial_size;
if (!is_numerable) max_size_ftab = total_size / 8 / 1024;       /* <- partial+full (~1 byte/8KB) */
else               max_size_ftab = total_size * 33 / 8 / 1024;  /* <- + user-page table */
total_size += max_size_ftab;
n_sectors = (int) CEIL_PTVDIV (total_size, DB_SECTORSIZE);
vsids_reserved = (VSID *) db_private_alloc (thread_p, n_sectors * sizeof (VSID));

On db_private_alloc failure: er_set(ER_OUT_OF_VIRTUAL_MEMORY) then goto exit (nothing reserved yet). Otherwise, for permanent files only (do_logging = !is_temp), log_sysop_start opens a system operation; temp files skip it. This do_logging split recurs at every dirty/unfix call below.
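
To make the estimate concrete, here is a minimal sketch of the same arithmetic as a standalone function. It is not the CUBRID code: the `sketch_` names are hypothetical, and the 16 KB page size and 64-page sector size are assumed values chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only: 16 KB pages and 64-page sectors. */
#define SKETCH_PAGESIZE   ((int64_t) 16384)
#define SKETCH_SECTORSIZE (SKETCH_PAGESIZE * 64)

/* Ceiling division for a non-negative dividend (stands in for CEIL_PTVDIV). */
static int64_t
sketch_ceil_div (int64_t dividend, int64_t divisor)
{
  assert (dividend >= 0 && divisor > 0);
  return (dividend + divisor - 1) / divisor;
}

/* Data bytes plus the worst-case file-table bytes, rounded up to whole sectors. */
static int
sketch_estimate_sectors (int64_t initial_size, int is_numerable)
{
  int64_t max_size_ftab;

  if (!is_numerable)
    max_size_ftab = initial_size / 8 / 1024;        /* partial + full tables */
  else
    max_size_ftab = initial_size * 33 / 8 / 1024;   /* plus the user-page table */

  return (int) sketch_ceil_div (initial_size + max_size_ftab, SKETCH_SECTORSIZE);
}
```

Under these assumed sizes, a request for exactly 1000 sectors of data reserves 1001 sectors for a regular file and 1005 for a numerable one; the extra sectors are the pessimistic table budget.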

6.3 Reserving and choosing the VFID

// file_create -- src/storage/file_manager.c
volpurpose = is_temp ? DB_TEMPORARY_DATA_PURPOSE : DB_PERMANENT_DATA_PURPOSE;
error_code = disk_reserve_sectors (thread_p, volpurpose, NULL_VOLID, n_sectors, vsids_reserved);
if (error_code != NO_ERROR) { ASSERT_ERROR (); goto exit; }
was_temp_reserved = is_temp;                              /* <- arm temp-leak cleanup */
volid_last_expand = vsids_reserved[n_sectors - 1].volid;  /* <- before sort! */
qsort (vsids_reserved, n_sectors, sizeof (VSID), disk_compare_vsids);

volid_last_expand is grabbed before the sort: sectors come back in reservation order, and the last one is the most-recently-extended volume, where future growth should continue. was_temp_reserved arms the manual unreserve at exit (temp reservations are not undone by recovery).
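
The ordering hazard is easy to reproduce in isolation. The sketch below uses a simplified, hypothetical `SKETCH_VSID` and assumes the comparator orders by volume first and sector second; it shows that the last element before the sort and the last element after it can name different volumes.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for VSID; field widths are illustrative. */
typedef struct { int volid; int sectid; } SKETCH_VSID;

/* Assumed ordering: by volume, then by sector within the volume. */
static int
sketch_compare_vsids (const void *a, const void *b)
{
  const SKETCH_VSID *x = (const SKETCH_VSID *) a;
  const SKETCH_VSID *y = (const SKETCH_VSID *) b;

  if (x->volid != y->volid)
    return x->volid < y->volid ? -1 : 1;
  if (x->sectid != y->sectid)
    return x->sectid < y->sectid ? -1 : 1;
  return 0;
}

/* Returns the volume of the last sector in reservation order, then sorts in place. */
static int
sketch_last_volid_then_sort (SKETCH_VSID *vsids, int n)
{
  int volid_last_expand;

  assert (n > 0);
  volid_last_expand = vsids[n - 1].volid;   /* must be read before qsort reorders */
  qsort (vsids, (size_t) n, sizeof (SKETCH_VSID), sketch_compare_vsids);
  return volid_last_expand;
}
```

With sectors reserved in the order (vol 0), (vol 2), (vol 1), the function returns 1, while the sorted array ends on volume 2. Reading the value after the sort would seed future growth on the wrong volume.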

Header-page (hence VFID) selection then branches on type (Figure 6-2):

flowchart TD
  B{"SERVER_MODE and type in\nBTREE/HEAP/HEAP_REUSE_SLOTS?"}
  B -->|yes| C["scan fileids in first volume\nvacuum_is_file_dropped per fileid"]
  C --> D{"non-dropped found?"}
  D -->|yes| E["vfid = found_vfid"]
  D -->|no| F["assert_release false -> exit"]
  B -->|no| G["vfid = first page of sectid[0]"]
  E --> H["vpid_fhead = vfid"]
  G --> H

Figure 6-2. Header-page / VFID selection.

The default branch takes the first page of the first sorted sector. The vacuum-aware branch (SERVER_MODE and type in BTREE/HEAP/HEAP_REUSE_SLOTS) exists because reusing a VFID that vacuum still believes is “dropped” would corrupt its dropped-files list. It walks every fileid of every sector in the first volume (the VFID must share that volume) and picks the first one that vacuum_is_file_dropped reports clean. If vacuum_is_file_dropped itself fails, control jumps to exit; finding every fileid in the first volume dropped is treated as impossible and trips assert_release(false).

6.4 Initializing the header

// file_create -- src/storage/file_manager.c
page_fhead = pgbuf_fix (thread_p, &vpid_fhead, NEW_PAGE, PGBUF_LATCH_WRITE, PGBUF_UNCONDITIONAL_LATCH);
if (page_fhead == NULL) { ASSERT_ERROR_AND_SET (error_code); goto exit; }
// ... condensed: memset(0), set ptype PAGE_FTAB, fhead = page; self/tablespace/type set ...
if (des != NULL) { fhead->descriptor = *des; }   /* <- temp/query-area pass NULL */
if (is_numerable) { fhead->file_flags |= FILE_FLAG_NUMERABLE; }
if (is_temp)      { fhead->file_flags |= FILE_FLAG_TEMPORARY; }
// ... condensed: time_creation, NULL cursors, zero counters ...
fhead->n_page_ftab = 1;        /* <- the header page is itself a table page */
fhead->n_sector_empty--;       /* <- start negative: header sector is not empty */

The header page is fixed as NEW_PAGE (a NULL result takes the error path), zeroed, typed PAGE_FTAB, and stamped with self, tablespace, and type. Two non-obvious seeds: n_page_ftab starts at 1 because the header is itself a table page, and n_sector_empty starts at -1 so that the header’s sector is not counted as empty when partial sectors are tallied later.

6.5 The three-table layout — four flavors of the header byte budget

After the offset cursor offset_ftab is seeded to FILE_HEADER_ALIGNED_SIZE (the first byte past the fixed header), file_create carves the remaining DB_PAGESIZE - offset_ftab bytes into tables (Figure 6-3). The four-way split keys on the (is_temp, is_numerable) pair:

flowchart TD
  N{"is_numerable?"}
  N -->|yes| NT{"is_temp?"}
  N -->|no| RT{"is_temp?"}
  NT -->|yes| A["temp numerable\npartial 1/16, user-page 15/16"]
  NT -->|no| B["perm numerable\npartial 1/32, full 1/32, user-page 15/16"]
  RT -->|yes| C["temp regular\npartial = all remaining"]
  RT -->|no| D["perm regular\npartial 1/2, full 1/2"]

Figure 6-3. The four flavors of header-page partitioning. Every flavor allocates a partial table; full and user-page are conditional.

Each table is initialized with file_extdata_init(item_size, size, extdata) — item_size is sizeof(FILE_PARTIAL_SECTOR) for partial, sizeof(VSID) for full, sizeof(VPID) for user-page. Each assignment fhead->offset_to_*_ftab = offset_ftab is followed by offset_ftab += file_extdata_max_size(extdata) so the next table starts aligned past it. The permanent-numerable arm is the only one to advance the cursor twice (after partial, after full) before the user-page table consumes the remainder; all others advance it at most once.

Invariant: every file has a partial table, correctly aligned. All four branches end asserting offset_to_partial_ftab != NULL_OFFSET, and every offset_to_*_ftab assignment is followed by assert((INT16) DB_ALIGN(offset, MAX_ALIGNMENT) == offset). The partial table is the universal entry point (Ch.7/8 walk it first); full/user-page offsets stay NULL_OFFSET when unused. Alignment holds by construction (FILE_HEADER_ALIGNED_SIZE is pre-aligned, file_extdata_max_size returns an aligned span). The FILE_HEADER_GET_*_FTAB macros enforce the contract on every later read: GET_FULL_FTAB asserts !FILE_IS_TEMPORARY(fh), GET_USER_PAGE_FTAB asserts FILE_IS_NUMERABLE(fh), and all three bound the offset in [FILE_HEADER_ALIGNED_SIZE, DB_PAGESIZE). A mis-set offset is a loud crash, not silent corruption; broken alignment means unaligned INT64/VPID access.
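
The four-way partitioning and its two invariants can be sketched as a pure function over byte offsets. This is an illustrative model, not the file_create code: the `sketch_` names, the 8-byte alignment, and the round-down rule standing in for file_extdata_max_size are all assumptions.

```c
#include <assert.h>

#define SKETCH_NULL_OFFSET (-1)
#define SKETCH_ALIGNMENT   8     /* assumed stand-in for MAX_ALIGNMENT */

typedef struct
{
  int offset_to_partial_ftab;
  int offset_to_full_ftab;
  int offset_to_user_page_ftab;
} SKETCH_OFFSETS;

/* Round a byte count down to a multiple of the alignment. */
static int
sketch_align_down (int size)
{
  return size - size % SKETCH_ALIGNMENT;
}

/* Carve the bytes after the fixed header into one to three tables. */
static SKETCH_OFFSETS
sketch_layout (int header_size, int page_size, int is_temp, int is_numerable)
{
  SKETCH_OFFSETS o = { SKETCH_NULL_OFFSET, SKETCH_NULL_OFFSET, SKETCH_NULL_OFFSET };
  int offset = header_size;
  int budget = page_size - header_size;

  assert (header_size % SKETCH_ALIGNMENT == 0 && budget > 0);

  o.offset_to_partial_ftab = offset;          /* every flavor has a partial table */
  if (is_numerable)
    {
      offset += sketch_align_down (budget / (is_temp ? 16 : 32));
      if (!is_temp)
        {
          o.offset_to_full_ftab = offset;     /* perm numerable: second advance */
          offset += sketch_align_down (budget / 32);
        }
      o.offset_to_user_page_ftab = offset;    /* user-page table takes the rest */
    }
  else if (!is_temp)
    {
      offset += sketch_align_down (budget / 2);
      o.offset_to_full_ftab = offset;         /* perm regular: half and half */
    }
  /* temp regular: the partial table keeps the whole budget, no advance */

  assert (o.offset_to_partial_ftab != SKETCH_NULL_OFFSET);
  return o;
}
```

With a hypothetical 256-byte aligned header in a 16 KB page, the permanent-regular flavor places the full table at offset 8320, and both numerable flavors place the user-page table at offset 1264, since 1/32 + 1/32 and 1/16 consume the same prefix.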

6.6 Populating the partial table and splitting full sectors

file_create walks vsids_reserved, appending one FILE_PARTIAL_SECTOR per sector into the partial table (file_extdata_append). When the in-header table fills (file_extdata_is_full), it allocates a continuation page from the sectors it is currently recording, chains it via vpid_next, bumps n_page_ftab, and continues there; continuation pages’ bits are set in their sectors’ bitmaps so they are never re-handed to a user.

After the walk, the sector that partsect_ftab points at may itself be full because it held the last table page; if so, file_create advances partsect_ftab and increments fhead->n_sector_full. Then, for permanent files only, sectors fully consumed by the file table migrate from the partial table into the full table:

// file_create (full-sector migration) -- src/storage/file_manager.c
if (!is_temp && fhead->n_sector_full > 0)
  {
    // ... condensed: GET_PART_FTAB + GET_FULL_FTAB into extdata_part_ftab / extdata_full_ftab ...
    for (i = 0; i < fhead->n_sector_full; i++)
      {
        partsect_iter = (FILE_PARTIAL_SECTOR *) file_extdata_at (extdata_part_ftab, i);
        /* ... condensed: drops the file_extdata_is_full / assert_release(false) guard ... */
        file_extdata_append (extdata_full_ftab, &partsect_iter->vsid);  /* <- VSID only */
      }
    file_extdata_remove_at (extdata_part_ftab, 0, fhead->n_sector_full);
  }

Temp files skip this entirely (no full table); they instead seed the temp cursor (vpid_last_temp_alloc = vpid_fhead, offset_to_last_temp_alloc = n_sector_full). Numerable files (temp or perm) seed the user-page-table head (vpid_last_user_page_ftab and vpid_find_nth_last both set to vpid_fhead). Finally the counters are reconciled — n_sector_total = n_sectors, n_sector_partial = total - full, n_sector_empty += n_sector_partial, n_page_total = n_sector_total * DISK_SECTOR_NPAGES, n_page_free = n_page_total - n_page_ftab — and file_header_sanity_check asserts the header is internally consistent.
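
The counter reconciliation is plain arithmetic, and a worked instance helps check the -1 seed. The sketch below uses a hypothetical `SKETCH_COUNTERS` struct holding only the fields named above, and covers the simple case where no sector was filled by table pages (n_sector_full == 0). Its single assert is a reduced stand-in for file_header_sanity_check, not a copy of it.

```c
#include <assert.h>

#define SKETCH_SECTOR_NPAGES 64

typedef struct
{
  int n_page_total, n_page_user, n_page_ftab, n_page_free;
  int n_sector_total, n_sector_partial, n_sector_full, n_sector_empty;
} SKETCH_COUNTERS;

/* Apply the end-of-create reconciliation in the order the text lists it. */
static void
sketch_reconcile (SKETCH_COUNTERS *c, int n_sectors)
{
  c->n_sector_total = n_sectors;
  c->n_sector_partial = c->n_sector_total - c->n_sector_full;
  c->n_sector_empty += c->n_sector_partial;      /* seed was -1 for the header sector */
  c->n_page_total = c->n_sector_total * SKETCH_SECTOR_NPAGES;
  c->n_page_free = c->n_page_total - c->n_page_ftab;

  /* Every page is free, a table page, or a user page. */
  assert (c->n_page_free + c->n_page_ftab + c->n_page_user == c->n_page_total);
}
```

For a four-sector file whose only table page is the header (n_page_ftab = 1, n_sector_empty = -1 on entry), the result is 4 partial sectors, 3 empty sectors, 256 total pages, and 255 free pages.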

6.7 Commit, tracker registration, and the error/exit path

// file_create (finish) -- src/storage/file_manager.c
if (do_logging)
  { pgbuf_log_new_page (thread_p, page_fhead, DB_PAGESIZE, PAGE_FTAB);
    pgbuf_unfix_and_init (thread_p, page_fhead); }
else
  { pgbuf_set_dirty_and_free (thread_p, page_fhead); }    /* <- temp: no redo log */
if (!is_temp && file_type != FILE_TRACKER)
  { error_code = file_tracker_register (thread_p, vfid, file_type, NULL);
    if (error_code != NO_ERROR) { ASSERT_ERROR (); goto exit; } }
if (is_temp) { ATOMIC_INC_32 (&file_Tempcache.spacedb_temp.nfile, 1); /* ...stats... */ }

Permanent files log the header for redo and register with the file tracker — except FILE_TRACKER itself, which would be circular (tracker registration is Ch.11). Temp files only bump in-memory spacedb_temp counters. The shared exit label handles every branch’s failure: (1) unfix page_ftab / page_fhead if still held; (2) if is_sysop_started, on error log_sysop_abort (rolls back reserve+layout), on success log_sysop_end_logical_undo(RVFL_DESTROY, vfid) so a later transaction abort tears the whole file down; (3) on error VFID_SET_NULL(vfid) so callers never see a half-built id, and if was_temp_reserved the temp sectors are manually unreserved here (recovery won’t, since temp work isn’t logged) under logtb_set_check_interrupt(false); (4) always db_private_free(vsids_reserved).

6.8 The wrappers and what each sets

Every public creator funnels into file_create with a fixed (is_temp, is_numerable, file_type):

Wrapper	`file_type`	`is_temp`	`is_numerable`	Descriptor	Tablespace
`file_create_heap`	`FILE_HEAP` / `FILE_HEAP_REUSE_SLOTS`	no	no	`heap` (class_oid)	perm, npages=1
`file_create_temp`	`FILE_TEMP`	yes	no	NULL	temp
`file_create_temp_numerable`	`FILE_TEMP`	yes	yes	NULL	temp
`file_create_query_area`	`FILE_QUERY_AREA`	yes	no	NULL	temp, npages=1
`file_create_ehash`	`FILE_EXTENDIBLE_HASH`	caller’s `is_tmp`	yes	`ehash`	temp-sized
`file_create_ehash_dir`	`FILE_EXTENDIBLE_HASH_DIRECTORY`	caller’s `is_tmp`	yes	`ehash`	temp-sized

file_create_heap builds the descriptor (memset, then des.heap.class_oid = *class_oid) and routes through file_create_with_npages. The three temp wrappers all go through file_create_temp_internal, which is not a thin pass-through:

// file_create_temp_internal -- src/storage/file_manager.c
error_code = file_tempcache_get (thread_p, ftype, is_numerable, &tempcache_entry);
if (VFID_ISNULL (&tempcache_entry->vfid))         /* <- cache miss: create fresh */
  {
    FILE_TABLESPACE_FOR_TEMP_NPAGES (&tablespace, npages);
    file_tempcache_lock_tran_entry (tran_entry);  /* <- rmutex_topop guard */
    error_code = file_create (thread_p, ftype, &tablespace, NULL, true, is_numerable, vfid_out);
    file_tempcache_unlock_tran_entry (tran_entry);
    // ... condensed: on error file_tempcache_retire_entry + return; else cache the vfid ...
  }
else { *vfid_out = tempcache_entry->vfid; }        /* <- cache hit: reuse, no file_create */
file_tempcache_push_tran_file (thread_p, tempcache_entry);

So temp creation may skip file_create entirely and return a cached file. When it does call file_create, it wraps the call in a per-transaction lock because file_create’s log_sysop_start uses rmutex_topop, unsafe across parallel workers of one transaction (tempcache is Ch.11). The ehash wrappers are thin: temp-sized tablespace, FILE_EHASH_DES as descriptor, is_numerable = true unconditionally (nth-page lookup), is_temp forwarded from the caller’s is_tmp.

6.9 Chapter summary — key takeaways

file_create is the single engine; the wrappers only pick (file_type, is_temp, is_numerable, descriptor, tablespace). Heap/ehash supply a descriptor; temp/query-area pass NULL.
The reserved-sector count is over-estimated to fit the file’s own tables (total/8/1024 extra bytes regular, total*33/8/1024 numerable), avoiding a mid-create extension. The VFID is the first page of the first reserved sector — except heap/btree under SERVER_MODE, which scan past any fileid vacuum still considers dropped.
The header page hosts one file_header plus one to three file_extensible_data tables, partitioned by flavor: perm regular 1/2+1/2, perm numerable 1/32+1/32+15/16, temp regular partial-only, temp numerable 1/16+15/16. Two invariants hold throughout: every file has a partial table, and every offset is MAX_ALIGNMENT-aligned (enforced by FILE_HEADER_GET_*_FTAB).
Permanent files migrate fully-consumed file-table sectors into the full table; temp files keep one cursor (vpid_last_temp_alloc) instead.
do_logging = !is_temp governs durability: perm files run a sysop that logs the header and registers RVFL_DESTROY as logical undo; temp files are set-dirty-and-free, manually unreserved on error, and may be served straight from the tempcache without reaching file_create.

Chapter 7: Permanent File Page Allocation

This chapter answers: given a permanent file that already owns sectors, how does file_alloc hand out the next user page while preserving the head-of-Partial-table invariant that keeps the next allocation O(1)? We trace file_alloc, the engine file_perm_alloc, and its helpers. Theory of sectors, partial vs. full tables, and FILE_EXTENSIBLE_DATA lives in the companion (cubrid-disk-manager.md, “File layout” / “Three-table model”). Temporary allocation (Ch.8) and numerable tables (Ch.10) are out of scope.

7.1 The data unit: `file_partial_sector`

Every entry in the partial table is a file_partial_sector (typedef FILE_PARTIAL_SECTOR); file_perm_alloc mutates its bitmap on every fast-path allocation.

// file_partial_sector -- src/storage/file_manager.h
struct file_partial_sector
{
  VSID vsid;            /* Important - VSID must be first member ...
                          * Sometimes, the FILE_PARTIAL_SECTOR pointers
                          * in file table are reinterpreted as VSID. */
  FILE_ALLOC_BITMAP page_bitmap;
};

Field	Role	Why it exists
`vsid`	Sector identity `{ volid, sectid }`; the on-disk address of the 64-page run this entry covers.	Says “this sector is reserved by this file.” Also the bare full-table entry (see invariant).
`page_bitmap`	64-bit `FILE_ALLOC_BITMAP` (`UINT64`); bit k set ⇒ page k allocated. `DISK_SECTOR_NPAGES == 64`, one bit per page.	Flip one bit instead of scanning. `0x0…0` = `FILE_EMPTY_PAGE_BITMAP`, `0xF…F` = `FILE_FULL_PAGE_BITMAP`.

Invariant — VSID must be the first member. Full-table and expansion code reinterprets a FILE_PARTIAL_SECTOR * as a VSID * (a full-table entry is just a VSID). A field placed before vsid would make that cast read garbage. The struct layout is the contract.

classDiagram
  class FILE_PARTIAL_SECTOR {
    VSID vsid
    FILE_ALLOC_BITMAP page_bitmap
  }
  FILE_PARTIAL_SECTOR --> "full table reuses the prefix" VSID : vsid is first

Figure 7-1. The full table stores a bare VSID — exactly the leading field of a partial entry. This prefix compatibility recurs throughout the chapter.
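
The layout contract can also be pinned down at compile time. The following sketch uses simplified, hypothetical stand-in types (the real VSID field types may differ) and a C11 `_Static_assert`; nothing here is claimed to exist in the CUBRID source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real field types may differ. */
typedef struct { int32_t sectid; int16_t volid; } SKETCH_VSID;

typedef struct
{
  SKETCH_VSID vsid;          /* must stay the first member */
  uint64_t page_bitmap;
} SKETCH_PARTIAL_SECTOR;

/* Break the build if a field is ever placed before vsid. */
_Static_assert (offsetof (SKETCH_PARTIAL_SECTOR, vsid) == 0,
                "vsid must be the first member");

/* View a partial-table entry as the bare VSID a full-table entry would hold. */
static const SKETCH_VSID *
sketch_as_vsid (const SKETCH_PARTIAL_SECTOR *partsect)
{
  return (const SKETCH_VSID *) partsect;   /* valid only because vsid is first */
}
```

C guarantees that a pointer to a struct, suitably converted, points to its first member, so the cast is well defined as long as the member order holds. The static assertion turns an accidental reordering into a compile error rather than a garbage read.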

7.2 `file_alloc`: the dispatcher

file_alloc fixes the header, branches on FILE_IS_TEMPORARY, allocates, optionally registers and initializes the page, and frames the permanent path inside a logical-undo system operation.

// file_alloc -- src/storage/file_manager.c
page_fhead = pgbuf_fix (thread_p, &vpid_fhead, OLD_PAGE, PGBUF_LATCH_WRITE, ...);
// ... condensed ...
if (FILE_IS_TEMPORARY (fhead))
  error_code = file_temp_alloc (thread_p, page_fhead, FILE_ALLOC_USER_PAGE, vpid_out);  /* <- Ch.8 */
else
  {
    log_sysop_start_atomic (thread_p);   /* <- nested top action, atomic so undo is one unit */
    is_sysop_started = true;
    error_code = file_perm_alloc (thread_p, page_fhead, FILE_ALLOC_USER_PAGE, vpid_out);  /* <- 7.3 */
    VFID_COPY ((VFID *) undo_log_data, vfid);
    VPID_COPY ((VPID *) (undo_log_data + sizeof (VFID)), vpid_out);  /* <- undo payload {vfid,vpid} */
  }

Remaining exits (errors → goto exit): (1) pgbuf_fix fails → return, nothing fixed. (2) numerable → file_numerable_add_page (tail call, Ch.10). (3) f_init supplied → fix the new page NEW_PAGE, init, set TDE, hand back via page_out or unfix; failure unfixes. (4) no f_init → asserted temporary; return the raw page if page_out requested. (5) exit → sysop aborts on error, else commit-and-undo via log_sysop_end_logical_undo (RVFL_ALLOC, …), then sanity-check and unfix. Structural changes are redo-logged eagerly inside file_perm_alloc; the sysop’s single logical undo (RVFL_ALLOC) is the “deallocate {vfid,vpid}” record — the nested-top-action discipline of Ch.5.

7.3 `file_perm_alloc`: the engine

Four phases: ensure free pages, ensure the header section holds a partial sector, flip a head-sector bit, then migrate to the full table if it just filled.

flowchart TD
  A["file_perm_alloc(alloc_type)"] --> B{"n_page_free == 0 ?"}
  B -- yes --> C["file_perm_expand\nreserve more sectors"]
  B -- no --> D
  C --> D{"header partial section empty ?"}
  D -- yes --> E["file_table_move_partial_sectors_to_header"]
  E --> F{"vpid_alloc_out set ?"}
  F -- yes --> Z["goto exit\npage already chosen"]
  F -- no --> G
  D -- no --> G["partsect = head of partial section"]
  G --> H["file_partsect_alloc:\nset first 0-bit, emit vpid"]
  H --> I{"alloc_type ==\nTABLE_PAGE_FULL_SECTOR ?"}
  I -- yes --> J["file_table_append_full_sector_page"]
  I -- no --> K
  J --> K["file_header_alloc:\ncounters + WAL"]
  K --> L{"sector now full ?"}
  L -- no --> Z2["exit OK"]
  L -- yes --> M["remove head from partial table"]
  M --> N["file_table_add_full_sector(vsid)"]
  N --> Z2

Figure 7-2. file_perm_alloc control flow — every branch and goto.

Phase 1 — guarantee a free page

// file_perm_alloc -- src/storage/file_manager.c
if (fhead->n_page_free == 0)
  {
    error_code = file_perm_expand (thread_p, page_fhead);   /* <- 7.4 */
    if (error_code != NO_ERROR) { ASSERT_ERROR (); goto exit; }
  }
assert (fhead->n_page_free > 0 && fhead->n_sector_partial > 0);

Invariant: the header holds a Partial entry while n_page_free > 0. Free pages live only inside partial sectors (full sectors have none, and empty sectors are a subset of partial), so any free page implies a partial sector; the two conditions of the assert above confirm it. Phase 2 guarantees one sits in the header section.

Phase 2 — guarantee the head section is non-empty

FILE_HEADER_GET_PART_FTAB (fhead, extdata_part_ftab);
if (file_extdata_is_empty (extdata_part_ftab))
  {
    error_code = file_table_move_partial_sectors_to_header (thread_p, page_fhead, alloc_type, vpid_alloc_out);  /* <- 7.5 */
    if (error_code != NO_ERROR) { ASSERT_ERROR (); goto exit; }
    if (!VPID_ISNULL (vpid_alloc_out))
      {
        goto exit;   /* <- a freed overflow page was reused as the allocation; done */
      }
  }
assert (!file_extdata_is_empty (extdata_part_ftab));

Either the move repopulated the header (vpid_alloc_out NULL, fall through) or it drained an overflow page and reused that page as the result (vpid_alloc_out set → goto exit, no bitmap touched — the CBRD-21242 path, 7.5).

Phase 3 — flip the head bit

partsect = (FILE_PARTIAL_SECTOR *) file_extdata_start (extdata_part_ftab);  /* <- head item, position 0 */
assert (!file_partsect_is_full (partsect));
was_empty = file_partsect_is_empty (partsect);
if (!file_partsect_alloc (partsect, vpid_alloc_out, &offset_to_alloc_bit))  /* <- 7.6 */
  {
    assert_release (false);          /* head sector must have a free page (invariant) */
    error_code = ER_FAILED; goto exit;
  }
log_append_undoredo_data2 (thread_p, RVFL_PARTSECT_ALLOC, NULL, page_fhead,
                           (PGLENGTH) ((char *) partsect - page_fhead), /* <- offset of partsect in page */
                           ..., &offset_to_alloc_bit, &offset_to_alloc_bit);  /* <- undo == redo == bit offset */

Invariant — the head sector always has a free page. Allocation always reads position 0 (file_extdata_start); Phases 1–2 guarantee a non-full partial sector there, so the two asserts treat a full head as a logic error. RVFL_PARTSECT_ALLOC logs only the bit offset (undo == redo) at partsect’s byte offset in the header page.

`FILE_ALLOC_TABLE_PAGE` vs `FILE_ALLOC_USER_PAGE`

Right after the bit flip, if (alloc_type == FILE_ALLOC_TABLE_PAGE_FULL_SECTOR) calls file_table_append_full_sector_page (...) (7.7). alloc_type says what the page is for; file_header_alloc (7.8) bumps n_page_user or n_page_ftab accordingly. The enum file_alloc_type has three values — FILE_ALLOC_USER_PAGE, FILE_ALLOC_TABLE_PAGE, FILE_ALLOC_TABLE_PAGE_FULL_SECTOR. The last is requested by file_table_add_full_sector when the full table needs a page; that page must link in before the current sector migrates, else migration finds no room and recurses — the reason the third value exists.

Phase 4 — migrate to full on overflow

is_full = file_partsect_is_full (partsect);
file_header_alloc (fhead, alloc_type, was_empty, is_full);   /* <- 7.8: counters + WAL */
file_log_fhead_alloc (thread_p, page_fhead, alloc_type, was_empty, is_full);
if (is_full)
  {
    VSID vsid_full = partsect->vsid;                 /* <- save before removal */
    file_log_extdata_remove (thread_p, extdata_part_ftab, page_fhead, 0, 1);
    file_extdata_remove_at (extdata_part_ftab, 0, 1);            /* <- drop head item */
    error_code = file_table_add_full_sector (thread_p, page_fhead, &vsid_full);  /* <- 7.7 */
    if (error_code != NO_ERROR) { ASSERT_ERROR (); goto exit; }
  }

Counters update first (correct before any nested allocation), then the full head sector is removed from position 0 and added to the full table — restoring the head-of-Partial invariant. Rollback is owned by the enclosing file_alloc sysop.
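
Phases 3 and 4 together can be modelled with two small arrays. The sketch below is a toy, not file_perm_alloc: it has no logging, no header counters, and no expansion, and every `sketch_` name is hypothetical. It only shows that allocating from position 0 and migrating a filled head keeps a non-full sector at the head.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_SECTORS 8
#define SKETCH_FULL_BITMAP UINT64_MAX

typedef struct { int sectid; uint64_t page_bitmap; } SKETCH_PARTSECT;

typedef struct
{
  SKETCH_PARTSECT partial[SKETCH_MAX_SECTORS];   /* head is partial[0] */
  int n_partial;
  int full[SKETCH_MAX_SECTORS];                  /* sector ids only, like bare VSIDs */
  int n_full;
} SKETCH_FILE;

/* Allocate one page from the head partial sector; returns the page id or -1. */
static int
sketch_perm_alloc (SKETCH_FILE *f)
{
  SKETCH_PARTSECT *head;
  int offset = 0;
  int pageid;

  if (f->n_partial == 0)
    return -1;                                   /* the real code would expand here */
  head = &f->partial[0];
  assert (head->page_bitmap != SKETCH_FULL_BITMAP);   /* head-of-partial invariant */

  while (offset < 64 && ((head->page_bitmap >> offset) & 1u))
    offset++;                                    /* lowest clear bit */
  head->page_bitmap |= (uint64_t) 1 << offset;
  pageid = head->sectid * 64 + offset;

  if (head->page_bitmap == SKETCH_FULL_BITMAP)
    {
      int sectid_full = head->sectid;            /* save before removal */

      memmove (&f->partial[0], &f->partial[1],
               (size_t) (f->n_partial - 1) * sizeof (SKETCH_PARTSECT));
      f->n_partial--;
      f->full[f->n_full++] = sectid_full;        /* migrate to the full table */
    }
  return pageid;
}
```

Starting with two empty sectors, 64 calls drain the first one page by page; the 64th call moves it to the full table, and the 65th call is served from the sector that has become the new head.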

7.4 `file_perm_expand`: refill the partial table

Called when n_page_free == 0. Reserves a batch of new sectors, appending them as empty partial entries in the header.

// file_perm_expand -- src/storage/file_manager.c
expand_size_in_sectors = (int) ((float) fhead->n_sector_total * fhead->tablespace.expand_ratio);
expand_size_in_sectors = MAX (expand_size_in_sectors, expand_min_size_in_sectors);
expand_size_in_sectors = MIN (expand_size_in_sectors, expand_max_size_in_sectors);  /* <- tablespace ceiling */
// ... condensed: clamp again to file_extdata_remaining_capacity of the header partial table; db_private_alloc vsids_reserved buffer ...
log_sysop_start (thread_p);   /* <- separate committed sysop: expansion is permanent */
error_code = disk_reserve_sectors (thread_p, DB_PERMANENT_DATA_PURPOSE, fhead->volid_last_expand,
                                   expand_size_in_sectors, vsids_reserved);   /* fail -> goto exit, abort */
qsort (vsids_reserved, expand_size_in_sectors, sizeof (VSID), disk_compare_vsids);
partsect.page_bitmap = FILE_EMPTY_PAGE_BITMAP;
for (... each reserved vsid ...)
  { partsect.vsid = *vsid_iter; file_extdata_append (extdata_part_ftab, &partsect); }  /* <- empty entries into header */
fhead->n_sector_total += expand_size_in_sectors;
fhead->n_sector_empty = fhead->n_sector_partial = expand_size_in_sectors;   /* asserted 0 before */
fhead->n_page_free = expand_size_in_sectors * DISK_SECTOR_NPAGES;           /* asserted 0 before */
fhead->n_page_total += fhead->n_page_free;

Branches: (1) size clamped to header file_extdata_remaining_capacity — expansion never needs a new table page. (2) VSID-buffer db_private_alloc fails → ER_OUT_OF_VIRTUAL_MEMORY, return before any sysop. (3) disk_reserve_sectors fails → goto exit, sysop aborted. (4) success sets the counters (each asserted 0 first, confirming expand runs only on full exhaustion). The inner sysop commits on success, aborts on error (its own nested top action, Ch.5); RVFL_EXPAND logs the reserved VSID array as redo with empty undo.
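
The sizing rule reduces to a ratio followed by three clamps. The sketch below takes the per-extension minimum and maximum already converted to sectors, and applies the header-capacity clamp last; that ordering and the `sketch_` names are assumptions for illustration.

```c
#include <assert.h>

#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* How many sectors one expansion asks for, given the tablespace policy. */
static int
sketch_expand_size_in_sectors (int n_sector_total, float expand_ratio,
                               int min_sectors, int max_sectors, int header_capacity)
{
  int n = (int) ((float) n_sector_total * expand_ratio);   /* geometric growth */

  assert (min_sectors <= max_sectors && header_capacity > 0);
  n = SKETCH_MAX (n, min_sectors);        /* never below the tablespace minimum */
  n = SKETCH_MIN (n, max_sectors);        /* never above the tablespace maximum */
  n = SKETCH_MIN (n, header_capacity);    /* must fit in the header partial table */
  return n;
}
```

The final clamp is what lets the expansion append every new entry to the header section without allocating a table page in the middle of the operation.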

7.5 `file_table_move_partial_sectors_to_header`

When the header section is empty but overflow pages still hold partial sectors, this function hoists items from the first overflow page into the header.

// file_table_move_partial_sectors_to_header -- src/storage/file_manager.c
page_part_ftab_first = pgbuf_fix (thread_p, &extdata_part_ftab_head->vpid_next, OLD_PAGE, ...);  /* fail -> exit */
n_items_to_move = file_extdata_item_count (extdata_part_ftab_first);
if (n_items_to_move == 0) { assert_release (false); error_code = ER_FAILED; goto exit; }
// ... condensed: re-check header is empty ...
n_items_to_move = MIN (n_items_to_move, file_extdata_remaining_capacity (extdata_part_ftab_head));  /* <- cap to header room */
file_extdata_append_array (extdata_part_ftab_head, file_extdata_start (extdata_part_ftab_first), n_items_to_move);
file_log_extdata_add (thread_p, extdata_part_ftab_head, page_fhead, 0, n_items_to_move, ...);
if (n_items_to_move < file_extdata_item_count (extdata_part_ftab_first))
  { /* partial move: remove copied prefix; first page survives */
    file_log_extdata_remove (thread_p, extdata_part_ftab_first, page_part_ftab_first, 0, n_items_to_move);
    file_extdata_remove_at (extdata_part_ftab_first, 0, n_items_to_move);
  }
else
  { /* whole page drained: unlink and REUSE it (CBRD-21242) */
    VPID save_next = extdata_part_ftab_head->vpid_next;  /* <- drained page id, saved before relink */
    // ... relink: head->vpid_next = first->vpid_next (skip drained page) ...
    *vpid_alloc_out = save_next;
    pgbuf_dealloc_page (thread_p, page_part_ftab_first);
    if (alloc_type == FILE_ALLOC_TABLE_PAGE_FULL_SECTOR) { file_table_append_full_sector_page (...); }
    else if (alloc_type == FILE_ALLOC_USER_PAGE) { fhead->n_page_ftab--; fhead->n_page_user++;
        log_append_undoredo_data2 (thread_p, RVFL_FHEAD_CONVERT_FTAB_TO_USER, ...); }
  }

Error/assert branches before the split: header vpid_next NULL → assert(false), ER_FAILED; pgbuf_fix of the first overflow page fails → goto exit; n_items_to_move == 0 → assert_release(false); header not actually empty → silent goto exit. The full-drain path saves vpid_next before relinking, reuses the drained page as the result, and converts a table page to a user page (RVFL_FHEAD_CONVERT_FTAB_TO_USER) — avoiding a deallocate-then-reallocate loop, which is why Phase 2 short-circuits on !VPID_ISNULL (vpid_alloc_out).
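
The partial-move versus full-drain split can be seen with two small in-memory tables. This is a simplified model under stated assumptions: items are plain ints, both tables live in one hypothetical `SKETCH_TABLE` type, and the return value stands in for "the first overflow page was drained and can be reused".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_ITEMS 8

typedef struct
{
  int items[SKETCH_MAX_ITEMS];
  int n_items;
  int capacity;      /* how many items this table section can hold */
} SKETCH_TABLE;

/* Move a prefix of `first` into the empty `head`.
 * Returns true when `first` is fully drained, false when items remain in it. */
static bool
sketch_move_to_header (SKETCH_TABLE *head, SKETCH_TABLE *first)
{
  int n_to_move = first->n_items;
  int room = head->capacity - head->n_items;

  assert (head->n_items == 0 && n_to_move > 0);
  if (n_to_move > room)
    n_to_move = room;                     /* cap to header room */

  memcpy (head->items, first->items, (size_t) n_to_move * sizeof (int));
  head->n_items = n_to_move;

  if (n_to_move < first->n_items)
    {
      /* partial move: drop the copied prefix, the overflow page survives */
      memmove (first->items, first->items + n_to_move,
               (size_t) (first->n_items - n_to_move) * sizeof (int));
      first->n_items -= n_to_move;
      return false;
    }
  first->n_items = 0;                     /* whole page drained */
  return true;
}
```

A header with room for two items and an overflow page holding three yields a partial move that leaves one item behind; a header with room for four drains the page, which is the case where the real code unlinks the page and hands it back as the allocation.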

7.6 `file_partsect_alloc` and the bit helpers

Allocation is one bit flip in the head sector’s bitmap.

// file_partsect_alloc -- src/storage/file_manager.c
int offset_to_zero = bit64_count_trailing_ones (partsect->page_bitmap);  /* <- index of first 0-bit */
if (offset_to_zero >= FILE_ALLOC_BITMAP_NBITS)        /* 64: bitmap all ones */
  { assert (file_partsect_is_full (partsect)); return false; }  /* <- caller treats as logic error */
file_partsect_set_bit (partsect, offset_to_zero);
if (offset_out)  *offset_out = offset_to_zero;
if (vpid_out)                                          /* <- reconstruct VPID from vsid + offset */
  {
    vpid_out->volid = partsect->vsid.volid;
    vpid_out->pageid = SECTOR_FIRST_PAGEID (partsect->vsid.sectid) + offset_to_zero;
  }
return true;

bit64_count_trailing_ones finds the lowest unset bit (pages go out densely from the sector bottom). file_partsect_set_bit asserts the bit is clear and ORs it via bit64_set. The inverse file_partsect_pageid_to_offset subtracts SECTOR_FIRST_PAGEID (sectid) — used by deallocation (Ch.9). The bitmap is the page list.
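
The bit flip can be modeled in a few lines of standalone C. This is a minimal sketch, not the CUBRID source: `toy_partsect`, `toy_count_trailing_ones`, `toy_partsect_alloc`, and `toy_pageid_to_offset` are hypothetical stand-ins, the loop-based bit scan is an assumption about behavior rather than the real `bit64_count_trailing_ones` implementation, and the first page of a sector is assumed to be `sectid * 64`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SECTOR_NPAGES 64	/* assumed: one bitmap bit per page of a sector */

/* Hypothetical stand-in for FILE_PARTIAL_SECTOR (volume id omitted). */
typedef struct toy_partsect
{
  int32_t sectid;
  uint64_t page_bitmap;
} toy_partsect;

/* Count consecutive 1-bits starting at bit 0; returns 64 when all bits are set. */
int
toy_count_trailing_ones (uint64_t bits)
{
  int n = 0;
  while (n < TOY_SECTOR_NPAGES && ((bits >> n) & 1u))
    {
      n++;
    }
  return n;
}

/* Allocate the lowest free page of the sector; false means the sector is full. */
bool
toy_partsect_alloc (toy_partsect * ps, int32_t * pageid_out)
{
  int offset = toy_count_trailing_ones (ps->page_bitmap);
  if (offset >= TOY_SECTOR_NPAGES)
    {
      return false;
    }
  ps->page_bitmap |= (uint64_t) 1 << offset;
  /* assumed layout: first page of the sector plus the bit offset */
  *pageid_out = ps->sectid * TOY_SECTOR_NPAGES + offset;
  return true;
}

/* Inverse mapping, as the deallocation side would need it. */
int
toy_pageid_to_offset (const toy_partsect * ps, int32_t pageid)
{
  return pageid - ps->sectid * TOY_SECTOR_NPAGES;
}
```

With sector 3 and bitmap `0x7`, the next allocation in this model takes bit 3 and yields page `3 * 64 + 3`; an all-ones bitmap makes the allocator return false, which is the case the real caller treats as a logic error.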

7.7 Adding a full sector: `file_table_add_full_sector` and `file_table_append_full_sector_page`

When the head sector fills, its VSID migrates to the full table.

// file_table_add_full_sector -- src/storage/file_manager.c
FILE_HEADER_GET_FULL_FTAB (fhead, extdata_full_ftab);
error_code = file_extdata_find_not_full (thread_p, &extdata_full_ftab, &page_ftab, &found);
if (!found)
  { /* full table is full: allocate a NEW table page for it */
    error_code = file_perm_alloc (thread_p, page_fhead, FILE_ALLOC_TABLE_PAGE_FULL_SECTOR, &vpid_ftab_new);  /* <- recursion */
    page_ftab = pgbuf_fix (thread_p, &vpid_ftab_new, OLD_PAGE, ...);   /* already initialized */
    extdata_full_ftab = (FILE_EXTENSIBLE_DATA *) page_ftab;
  }
page_extdata = page_ftab != NULL ? page_ftab : page_fhead;   /* <- which page the add is logged against */
file_extdata_find_ordered (extdata_full_ftab, vsid, disk_compare_vsids, &found, &pos);
if (found) { assert_release (false); error_code = ER_FAILED; goto exit; }   /* duplicate VSID */
file_extdata_insert_at (extdata_full_ftab, pos, 1, vsid);   /* + file_log_extdata_add(..., page_extdata, ...) */

Branches: (1) free space in an existing component → insert ordered. (2) no space → recurse into file_perm_alloc with FILE_ALLOC_TABLE_PAGE_FULL_SECTOR; bounded because that type appends the new page to the full table before further migration. (3) duplicate VSID → ER_FAILED. Entries stay sorted by disk_compare_vsids for binary search.
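
The ordered insert with duplicate detection can be sketched on a plain array. This is an illustration under stated assumptions, not the extensible-data code: `toy_vsid`, `toy_compare_vsids`, `toy_find_ordered`, and `toy_insert_ordered` are hypothetical names, the comparison order (volume first, then sector) is assumed to match `disk_compare_vsids`, and a fixed-capacity array stands in for one table component.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for VSID. */
typedef struct toy_vsid
{
  int volid;
  int sectid;
} toy_vsid;

/* Assumed ordering: by volume id, then by sector id. */
int
toy_compare_vsids (const toy_vsid * a, const toy_vsid * b)
{
  if (a->volid != b->volid)
    {
      return a->volid < b->volid ? -1 : 1;
    }
  if (a->sectid != b->sectid)
    {
      return a->sectid < b->sectid ? -1 : 1;
    }
  return 0;
}

/* Binary search: *pos receives the match position or the insertion point. */
bool
toy_find_ordered (const toy_vsid * items, int count, const toy_vsid * key, int *pos)
{
  int lo = 0, hi = count;
  while (lo < hi)
    {
      int mid = lo + (hi - lo) / 2;
      int c = toy_compare_vsids (&items[mid], key);
      if (c == 0)
	{
	  *pos = mid;
	  return true;
	}
      if (c < 0)
	{
	  lo = mid + 1;
	}
      else
	{
	  hi = mid;
	}
    }
  *pos = lo;
  return false;
}

/* Insert keeping order; false on a duplicate or when the component is full. */
bool
toy_insert_ordered (toy_vsid * items, int *count, int capacity, const toy_vsid * key)
{
  int pos = 0;
  if (*count >= capacity || toy_find_ordered (items, *count, key, &pos))
    {
      return false;
    }
  memmove (&items[pos + 1], &items[pos], (size_t) (*count - pos) * sizeof (toy_vsid));
  items[pos] = *key;
  (*count)++;
  return true;
}
```

In the real function the "component is full" case does not fail; it triggers the recursive table-page allocation of branch (2). The sketch only covers branches (1) and (3).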

file_table_append_full_sector_page initializes the new page and splices it in directly after the in-header component, ahead of any overflow pages already in the chain:

// file_table_append_full_sector_page -- src/storage/file_manager.c
page_ftab = pgbuf_fix (thread_p, vpid_new, NEW_PAGE, ...);   /* fail -> ASSERT_ERROR_AND_SET, return */
pgbuf_set_page_ptype (thread_p, page_ftab, PAGE_FTAB);
file_extdata_init (sizeof (VSID), DB_PAGESIZE, extdata_new_ftab);   /* <- full entries are bare VSIDs */
VPID_COPY (&extdata_new_ftab->vpid_next, &extdata_full_ftab->vpid_next);  /* new page points at old head */
pgbuf_log_new_page (thread_p, page_ftab, file_extdata_size (extdata_new_ftab), PAGE_FTAB);
pgbuf_unfix_and_init (thread_p, page_ftab);                        /* <- new page no longer fixed */
file_log_extdata_set_next (thread_p, extdata_full_ftab, page_fhead, vpid_new);   /* old head -> new page */
VPID_COPY (&extdata_full_ftab->vpid_next, vpid_new);

file_extdata_init uses sizeof (VSID), not sizeof (FILE_PARTIAL_SECTOR) — the 7.1 prefix compatibility in action.

7.8 Counter updates in `file_header_alloc`

file_header_alloc is where an allocation updates the header counters: of the eight (n_page_total/user/ftab/free, n_sector_total/partial/full/empty), it adjusts every one except the two totals.

// file_header_alloc -- src/storage/file_manager.c
fhead->n_page_free--;
if (alloc_type == FILE_ALLOC_USER_PAGE)  fhead->n_page_user++;
else                                     fhead->n_page_ftab++;   /* table page of either flavor */
if (was_empty)  fhead->n_sector_empty--;     /* sector now holds a page: no longer empty */
if (is_full)  { fhead->n_sector_partial--; fhead->n_sector_full++; }   /* migrated to full */

The leading assert (!was_empty || !is_full) enforces that one allocation cannot take a sector empty→full (only empty→partial or partial→full). file_log_fhead_alloc writes a 3-bool redo {is_ftab_page, was_empty, is_full} replayed by file_rv_fhead_alloc. n_page_total/n_sector_total change only on expansion (7.4).
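
The counter arithmetic is small enough to check in isolation. The sketch below is a toy model, not the real header: `toy_counters`, `toy_header_alloc`, and `toy_page_sum` are hypothetical names, and a single `is_user_page` flag replaces the allocation-type enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the header counters touched by one allocation. */
typedef struct toy_counters
{
  int n_page_user, n_page_ftab, n_page_free;
  int n_sector_partial, n_sector_full, n_sector_empty;
} toy_counters;

/* Apply one page allocation to the counters. */
void
toy_header_alloc (toy_counters * h, bool is_user_page, bool was_empty, bool is_full)
{
  assert (!was_empty || !is_full);	/* one page cannot take a sector from empty to full */
  h->n_page_free--;
  if (is_user_page)
    {
      h->n_page_user++;
    }
  else
    {
      h->n_page_ftab++;
    }
  if (was_empty)
    {
      h->n_sector_empty--;
    }
  if (is_full)
    {
      h->n_sector_partial--;
      h->n_sector_full++;
    }
}

/* Sum of the three page counters; an allocation moves one page between them. */
int
toy_page_sum (const toy_counters * h)
{
  return h->n_page_user + h->n_page_ftab + h->n_page_free;
}
```

Each call decrements one page counter and increments another, so `toy_page_sum` is unchanged by construction. That is consistent with the totals moving only on expansion.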

7.9 Chapter summary — key takeaways

file_alloc dispatches on FILE_IS_TEMPORARY: temporary → file_temp_alloc (Ch.8), no sysop; permanent → an atomic nested-top-action sysop closed by log_sysop_end_logical_undo (RVFL_ALLOC, {vfid,vpid}).
file_partial_sector is {vsid, page_bitmap}, vsid MUST be first — full-table code reinterprets the pointer as a bare VSID; the 64-bit bitmap is one bit per page of a 64-page sector.
Phases 1–2 (expand, then move-to-header) restore the two bold invariants before the bit flip in file_partsect_alloc, which uses bit64_count_trailing_ones and reconstructs the VPID from SECTOR_FIRST_PAGEID + offset.
A filled sector migrates to the full table: file_header_alloc counters update first, then the head item moves to the sorted full table, which grows via bounded recursion using FILE_ALLOC_TABLE_PAGE_FULL_SECTOR. FILE_ALLOC_USER_PAGE vs _TABLE_PAGE[_FULL_SECTOR] decides n_page_user vs n_page_ftab; only file_perm_expand grows n_*_total (RVFL_EXPAND, its own committed sysop).
Numerable registration is a tail call (file_numerable_add_page, Ch.10); the full-drain branch reuses the emptied overflow page (CBRD-21242), logging the table-to-user conversion via RVFL_FHEAD_CONVERT_FTAB_TO_USER.

Chapter 8: Temporary File Page Allocation

Temporary files back sorts, hash joins, and query-result materialization. They live and die inside a single transaction (or get parked in the tempcache for reuse, Ch.11), so the file manager drops most of the machinery permanent files depend on. This chapter answers: why do temporary files skip the Partial-to-Full migration, and how does a single header cursor make allocation O(1) with no logging? The high-level rationale lives in the companion cubrid-disk-manager.md; this chapter traces the code and contrasts it with file_perm_alloc (Ch.7) rather than re-deriving that path.

8.1 The fork in `file_alloc`

Every page allocation enters through file_alloc. The header is fixed and sanity-checked, then a single predicate splits the world:

// file_alloc -- src/storage/file_manager.c
if (FILE_IS_TEMPORARY (fhead))
  error_code = file_temp_alloc (thread_p, page_fhead, FILE_ALLOC_USER_PAGE, vpid_out);   /* <- no sysop, no undo */
else
  {
    log_sysop_start_atomic (thread_p);                /* <- permanent path opens a nested top action (Ch.5) */
    is_sysop_started = true;
    error_code = file_perm_alloc (thread_p, page_fhead, FILE_ALLOC_USER_PAGE, vpid_out);
    VFID_COPY ((VFID *) undo_log_data, vfid);         /* <- pack (VFID,VPID) logical-undo payload */
    VPID_COPY ((VPID *) (undo_log_data + sizeof (VFID)), vpid_out);
  }

Three asymmetries propagate everywhere: the temp branch starts no system operation (is_sysop_started stays false), builds no undo data, and calls file_temp_alloc. The exit-label sysop epilogue is guarded by if (is_sysop_started), so the temporary path skips the whole block (log_sysop_abort on error, else log_sysop_end_logical_undo (thread_p, RVFL_ALLOC, ...)). A temporary allocation thus produces nothing for recovery to replay; if the transaction dies mid-flight the file is simply discarded — nothing was logged, so nothing to roll back.

The f_init handling also diverges: a temporary file’s f_init may be NULL (sort buffers init their own pages), and the else branch asserts FILE_IS_TEMPORARY (fhead) before fixing the page NEW_PAGE. Numerable temp files still call file_numerable_add_page (Ch.10) — temporary does not exempt a file from the user page table. The fork is the top of Figure 8-2.

8.2 The header cursor: the entire bookkeeping state

A permanent file tracks two extensible tables (Partial and Full) and migrates sectors between them. A temporary file keeps only the Partial table plus a two-field cursor — its entire allocation state:

Field	Role	Why it exists
`vpid_last_temp_alloc`	VPID of the Partial-table page holding the sector being filled	Lets allocation jump straight to the live table page; equals the header VPID for the in-header copy, else an overflow `PAGE_FTAB` page
`offset_to_last_temp_alloc`	Index, in that page’s extensible data, of the `FILE_PARTIAL_SECTOR` being filled	Names the exact sector; advances only when the sector fills, so it also counts fully-consumed sectors in the page

The struct comment states the design contract directly — “Temporary file pages are never deallocated … keep a cursor: when the sector becomes full it is incremented; when all page becomes full it moves to next page”:

// FILE_HEADER -- src/storage/file_manager.c
VPID vpid_last_temp_alloc;     /* VPID of partial table page last used to allocate a page. */
int offset_to_last_temp_alloc; /* Sector offset in partial table last used to allocate a page. */

The cursor is seeded at creation: file_create’s temp branch sets vpid_last_temp_alloc = vpid_fhead (the header’s own Partial table) and offset_to_last_temp_alloc = fhead->n_sector_full, skipping sectors already full at creation.

Invariant (cursor consistency). offset_to_last_temp_alloc is always a valid index into the extensible data at vpid_last_temp_alloc, or exactly its item count (“advance to next page next call”); file_temp_alloc asserts both halves before dereferencing. If violated, file_extdata_at indexes past the array and corrupts an adjacent sector descriptor or reads garbage as a VSID.

graph LR
  H["FILE_HEADER"] -->|vpid_last_temp_alloc| P0["Partial table page\nin-header or PAGE_FTAB"]
  H -->|offset_to_last_temp_alloc| PS["FILE_PARTIAL_SECTOR + page_bitmap"]
  P0 -->|vpid_next| P1["next Partial table page ..."]

Figure 8-1. Cursor-to-table relationship. No Full table — full sectors stay in place ahead of the cursor.

8.3 Walking `file_temp_alloc` branch by branch

The function first disables interrupt checking (logtb_set_check_interrupt (thread_p, false), saved into save_check_interrupt) — there is no rollback, so a half-finished temp allocation must not be torn down — then asserts FILE_IS_TEMPORARY (fhead).

Step 1 — locate the live Partial-table page. If the cursor points at the header the in-header table is used directly; otherwise the overflow page is fixed with a write latch, only ER_INTERRUPTED tolerated on failure:

// file_temp_alloc -- src/storage/file_manager.c
if (VPID_EQ (&vpid_fhead, &fhead->vpid_last_temp_alloc))
  FILE_HEADER_GET_PART_FTAB (fhead, extdata_part_ftab);   /* <- table lives in header page */
else
  {
    page_ftab = pgbuf_fix (thread_p, &fhead->vpid_last_temp_alloc, OLD_PAGE, PGBUF_LATCH_WRITE, PGBUF_UNCONDITIONAL_LATCH);
    if (page_ftab == NULL)
      { error_code = er_errid (); if (error_code != ER_INTERRUPTED) assert_release (false); goto exit; }
    extdata_part_ftab = (FILE_EXTENSIBLE_DATA *) page_ftab;
  }

Step 2 — expand if out of free pages. The inline equivalent of file_temp_expand: when n_page_free == 0 it reserves one new sector via the disk manager (Ch.4) with DB_TEMPORARY_DATA_PURPOSE, so it lands in a temp volume:

// file_temp_alloc -- src/storage/file_manager.c
if (fhead->n_page_free == 0)
  {
    FILE_PARTIAL_SECTOR partsect_new = FILE_PARTIAL_SECTOR_INITIALIZER;
    error_code = disk_reserve_sectors (thread_p, DB_TEMPORARY_DATA_PURPOSE, fhead->volid_last_expand, 1, &partsect_new.vsid);
    if (error_code != NO_ERROR) { /* same ER_INTERRUPTED-tolerated handling as Step 1 */ goto exit; }

Two sub-branches follow, on whether the current page has room for one more FILE_PARTIAL_SECTOR. Sub-branch 2a — table page is full: the new sector cannot be recorded here, so its first page is stolen to host a fresh Partial-table page (bit 0 set, type PAGE_FTAB, previous vpid_next linked forward, cursor wrapped to offset 0):

// file_temp_alloc -- src/storage/file_manager.c
if (file_extdata_is_full (extdata_part_ftab))
  {
    vpid_ftab_new.volid = partsect_new.vsid.volid;
    vpid_ftab_new.pageid = SECTOR_FIRST_PAGEID (partsect_new.vsid.sectid);
    file_partsect_set_bit (&partsect_new, 0);                 /* <- page 0 becomes the table page */
    page_ftab_new = pgbuf_fix (thread_p, &vpid_ftab_new, NEW_PAGE, PGBUF_LATCH_WRITE, PGBUF_UNCONDITIONAL_LATCH);
    if (page_ftab_new == NULL) { error_code = ER_FAILED; goto exit; }
    pgbuf_set_page_ptype (thread_p, page_ftab_new, PAGE_FTAB);
    VPID_COPY (&extdata_part_ftab->vpid_next, &vpid_ftab_new); /* <- link old table -> new table */
    if (page_ftab != NULL) pgbuf_set_dirty_and_free (thread_p, page_ftab);
    VPID_COPY (&fhead->vpid_last_temp_alloc, &vpid_ftab_new);  /* <- cursor wraps to fresh table page */
    fhead->offset_to_last_temp_alloc = 0;
    page_ftab = page_ftab_new;  extdata_part_ftab = (FILE_EXTENSIBLE_DATA *) page_ftab;
    file_extdata_init (sizeof (FILE_PARTIAL_SECTOR), DB_PAGESIZE, extdata_part_ftab);
    ATOMIC_INC_32 (&file_Tempcache.spacedb_temp.npage_reserved, DISK_SECTOR_NPAGES - 1);
    ATOMIC_INC_32 (&file_Tempcache.spacedb_temp.npage_ftab, 1);
  }
else
  ATOMIC_INC_32 (&file_Tempcache.spacedb_temp.npage_reserved, DISK_SECTOR_NPAGES);  /* all pages reservable */

This is the only place the cursor wraps to a fresh table page during expansion; when a table page is carved out, one page counts as npage_ftab and only DISK_SECTOR_NPAGES - 1 are reservable. After either sub-branch the sector is appended and counters bumped — empty vs. table-hosting is encoded in partsect_new.page_bitmap (non-empty only in 2a):

// file_temp_alloc -- src/storage/file_manager.c
file_extdata_append (extdata_part_ftab, &partsect_new);
fhead->n_sector_partial++;  fhead->n_sector_total++;          // n_page_free/n_page_total += DISK_SECTOR_NPAGES
if (partsect_new.page_bitmap == FILE_EMPTY_PAGE_BITMAP) fhead->n_sector_empty++;
else { fhead->n_page_free--; fhead->n_page_ftab++; }          /* <- table page already consumed */

Invariant (sectors never leave Partial). A filled sector keeps its all-ones bitmap in place; nothing migrates it to a Full table. If violated, the cursor offset (which counts consumed sectors in the page) would no longer match the extensible-data layout and the Step-3 page-hop would skip live sectors.
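
The Step 2 accounting for both sub-branches can be replayed with plain integers. This is a sketch under assumptions: `toy_fhead`, `toy_spacedb`, and `toy_expand_accounting` are hypothetical names, ordinary `int` additions replace the `ATOMIC_INC_32` calls, and only the counters named in the excerpts above are modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SECTOR_NPAGES 64	/* assumed sector size in pages */

/* Hypothetical subset of the file header counters. */
typedef struct toy_fhead
{
  int n_page_total, n_page_free, n_page_ftab;
  int n_sector_total, n_sector_partial, n_sector_empty;
} toy_fhead;

/* Hypothetical model of the tempcache space counters. */
typedef struct toy_spacedb
{
  int npage_reserved;
  int npage_ftab;
} toy_spacedb;

/* Account for one newly reserved sector. table_page_full selects sub-branch 2a,
 * where the first page of the sector is consumed as a new table page. */
void
toy_expand_accounting (toy_fhead * h, toy_spacedb * s, bool table_page_full)
{
  h->n_sector_partial++;
  h->n_sector_total++;
  h->n_page_free += TOY_SECTOR_NPAGES;
  h->n_page_total += TOY_SECTOR_NPAGES;
  if (table_page_full)
    {
      h->n_page_free--;		/* page 0 is already used as the table page */
      h->n_page_ftab++;
      s->npage_reserved += TOY_SECTOR_NPAGES - 1;
      s->npage_ftab += 1;
    }
  else
    {
      h->n_sector_empty++;	/* bitmap is still empty */
      s->npage_reserved += TOY_SECTOR_NPAGES;
    }
}
```

Either way the sector contributes exactly 64 pages across the two tempcache counters; the sub-branch only decides whether one of them is booked as a table page.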

Step 3 — advance to the next page if the cursor sits at the item count. A previous call may have left offset_to_last_temp_alloc one past the last sector of a now-full page. The guard if (fhead->offset_to_last_temp_alloc == file_extdata_item_count (extdata_part_ftab)) then fires: it asserts file_extdata_is_full (...) && !VPID_ISNULL (&extdata_part_ftab->vpid_next), unfixes the old page_ftab, fixes vpid_next (write latch, only ER_INTERRUPTED tolerated), and sets vpid_last_temp_alloc = vpid_next; offset_to_last_temp_alloc = 0.

Step 4 — allocate from the sector under the cursor. file_partsect_alloc sets the first zero bit. Its false return is impossible here (the cursor never points at a full sector) and is treated as a logic error:

// file_temp_alloc -- src/storage/file_manager.c
partsect = (FILE_PARTIAL_SECTOR *) file_extdata_at (extdata_part_ftab, fhead->offset_to_last_temp_alloc);
was_empty = file_partsect_is_empty (partsect);
if (!file_partsect_alloc (partsect, vpid_alloc_out, NULL))
  { assert_release (false); error_code = ER_FAILED; goto exit; }   /* <- full sector under cursor == bug */
if (file_partsect_is_full (partsect))
  { is_full = true; fhead->offset_to_last_temp_alloc++; }   /* <- advance cursor; page hop deferred to next call */
file_header_alloc (fhead, alloc_type, was_empty, is_full);   /* <- shared with perm path: pure counter math */
pgbuf_set_dirty (thread_p, page_fhead, DONT_FREE);

The cursor advances on fullness, not on every allocation: while a sector has free bits it stays put, so the common case touches only the header and one table page. file_header_alloc is the permanent path’s helper (Ch.7); its is_full shuffle still updates n_sector_full/n_sector_partial here, but no table migration accompanies it — those counters are advisory statistics, not table membership.
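
The advance-on-fullness rule can be exercised with a toy model of a single table page. Everything here is hypothetical scaffolding (`toy_temp_file`, `toy_temp_alloc`, page numbers computed as sector index times 64 plus bit); it models only Step 4 and signals with -1 the situation where the real function would first expand (Step 2) or hop to the next table page (Step 3).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NPAGES 64		/* assumed pages per sector */
#define TOY_MAX_SECTORS 4

/* Hypothetical model: one table page holding sector bitmaps, plus the cursor. */
typedef struct toy_temp_file
{
  uint64_t bitmaps[TOY_MAX_SECTORS];
  int n_sectors;		/* item count of the table page */
  int cursor;			/* models offset_to_last_temp_alloc */
} toy_temp_file;

/* Returns the allocated page number, or -1 when the cursor sits at the item
 * count and the caller would have to expand or hop first. */
int
toy_temp_alloc (toy_temp_file * f)
{
  uint64_t *bm;
  int bit = 0;
  int page;

  if (f->cursor == f->n_sectors)
    {
      return -1;
    }
  bm = &f->bitmaps[f->cursor];
  while (bit < TOY_NPAGES && ((*bm >> bit) & 1u))
    {
      bit++;
    }
  assert (bit < TOY_NPAGES);	/* the cursor never rests on a full sector */
  *bm |= (uint64_t) 1 << bit;
  page = f->cursor * TOY_NPAGES + bit;
  if (*bm == UINT64_MAX)
    {
      f->cursor++;		/* advance only when the sector fills */
    }
  return page;
}
```

In this model 63 allocations leave the cursor at 0, the 64th moves it to 1, and no call ever looks at a sector behind the cursor.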

Step 5 — unconditional cleanup. The exit label runs on every path: file_header_sanity_check, unfix page_ftab if held, restore the saved interrupt flag via logtb_set_check_interrupt, return error_code. No pgbuf_set_dirty is ever paired with a log append — the only durability action is marking pages dirty for the non-WAL-ordered flush of temp data.

flowchart TD
  A["file_temp_alloc\ndisable interrupt check"] --> C{"cursor == header VPID?"}
  C -->|yes| F{"n_page_free == 0?"}
  C -->|no| E["fix cursor's table page"] --> F
  F -->|yes| G["disk_reserve_sectors 1 sector"] --> H{"table page full?"}
  H -->|yes| I["carve page0 as PAGE_FTAB\nlink vpid_next, wrap cursor"] --> L["append sector, bump counters"]
  H -->|no| J["npage_reserved += DISK_SECTOR_NPAGES"] --> L
  F -->|no| K{"offset == item_count?"}
  L --> K
  K -->|yes| N["fix vpid_next, cursor offset 0"] --> O["file_extdata_at + file_partsect_alloc"]
  K -->|no| O
  O --> Q{"sector now full?"}
  Q -->|yes| R["offset_to_last_temp_alloc++"] --> T["file_header_alloc, set_dirty"]
  Q -->|no| T
  T --> U["exit: unfix, restore interrupt"]

Figure 8-2. file_temp_alloc complete branch map, including both expansion sub-branches and the deferred page-hop.

8.4 Why no Full table, no postpone, no WAL

The Full table exists in permanent files only so the allocation scan can skip sectors with no free pages (Ch.7). A temporary file never scans — it allocates from the single cursor sector and advances linearly — so a filled sector is never revisited and a second table buys nothing while costing the logging the design avoids. The companion cubrid-disk-manager.md enumerates the savings; the code above is the mechanism.

Invariant (monotone, bookkeeping-free allocation). The cursor only advances (offset++ on sector-full, page-hop on item-count), never backward. file_dealloc never clears an allocation bit for a temporary file — it takes the empty else branch (no postpone, no bitmap change) and skips the deallocation entirely (Ch.9) — so no sector regains a free bit behind the cursor and the monotone property holds without reconciliation. If violated (a freed bit behind the cursor), that page is silently leaked and n_page_free drifts from reality.

8.5 Recycling: the cursor reset

This minimalism lets the tempcache (Ch.11) recycle a file by reset rather than rebuild. file_temp_reset_user_pages re-collects the partial-table bitmaps, rebuilds the n_sector_*/n_page_* counters, zeroes the user count, and rewinds the cursor to the header VPID, offset 0:

// file_temp_reset_user_pages -- src/storage/file_manager.c
fhead->n_page_user = 0;                          // ... n_sector_*/n_page_* rebuilt from re-collected bitmaps ...
fhead->vpid_last_temp_alloc = vpid_fhead;        /* <- cursor rewinds to header VPID, offset 0 */
fhead->offset_to_last_temp_alloc = 0;

This seed differs from file_create’s, which sets offset_to_last_temp_alloc = fhead->n_sector_full; reset always rewinds to offset 0. A reset file keeps its reserved sectors (no disk round-trip) and hands pages out from the front again. That is the payoff of skipping the Partial-to-Full machinery: allocation state collapses to two header fields (a VPID and an offset) that cost nothing to reset.
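
Rebuilding counters from bitmaps can be sketched as a popcount pass. This is a loose model and rests on an assumption not shown in the excerpt above: that after a reset only the bits belonging to table pages remain set in each sector. `toy_header`, `toy_reset`, `toy_popcount64`, and the `ftab_masks` input are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NPAGES 64		/* assumed pages per sector */

/* Hypothetical subset of the header: counters plus the two-field cursor. */
typedef struct toy_header
{
  int n_page_user, n_page_ftab, n_page_free;
  int n_sector_partial, n_sector_full, n_sector_empty;
  int cursor_page;		/* 0 models "the header VPID" */
  int cursor_offset;		/* models offset_to_last_temp_alloc */
} toy_header;

static int
toy_popcount64 (uint64_t v)
{
  int n = 0;
  while (v)
    {
      v &= v - 1;		/* clear the lowest set bit */
      n++;
    }
  return n;
}

/* Assumption: bitmaps[i] is rewritten to ftab_masks[i], so only table-page
 * bits survive, and every counter is recomputed from the result. */
void
toy_reset (toy_header * h, uint64_t * bitmaps, const uint64_t * ftab_masks, int n_sectors)
{
  int i;
  h->n_page_user = h->n_page_ftab = h->n_page_free = 0;
  h->n_sector_partial = h->n_sector_full = h->n_sector_empty = 0;
  for (i = 0; i < n_sectors; i++)
    {
      int used;
      bitmaps[i] = ftab_masks[i];
      used = toy_popcount64 (bitmaps[i]);
      h->n_page_ftab += used;
      h->n_page_free += TOY_NPAGES - used;
      if (used == TOY_NPAGES)
	{
	  h->n_sector_full++;
	}
      else
	{
	  h->n_sector_partial++;
	}
      if (used == 0)
	{
	  h->n_sector_empty++;
	}
    }
  h->cursor_page = 0;		/* rewind to the header */
  h->cursor_offset = 0;
}
```

The point of the sketch is the shape of the work: one pass over the bitmaps and two assignments to the cursor, with no table rebuilt.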

8.6 Chapter summary — key takeaways

file_alloc forks on FILE_IS_TEMPORARY: the temp lane calls file_temp_alloc with no sysop, no undo data, no log records; the permanent lane wraps file_perm_alloc in a nested top action with RVFL_ALLOC logical undo.
Temporary files keep only a Partial sectors table; a filled sector stays in place. The complete allocation state is vpid_last_temp_alloc/offset_to_last_temp_alloc.
The cursor makes allocation O(1): Step 4 allocates directly from the cursor sector via file_partsect_alloc, advancing the offset only on sector-full and deferring the page-hop to the next call (Step 3).
Expansion is inline (n_page_free == 0): it reserves one sector with DB_TEMPORARY_DATA_PURPOSE; the table-full sub-branch carves the sector’s first page into a fresh PAGE_FTAB, links vpid_next, and wraps the cursor to offset 0.
There is no Full-table migration, no postpone, zero WAL — temp pages are never individually deallocated (file_dealloc takes the empty else branch), so the cursor is provably monotone and needs no reconciliation.
file_header_alloc is shared with the permanent path, but for temp files its is_full shuffle is advisory statistics only — no table movement.
The bookkeeping-free design lets the tempcache recycle a file by resetting the cursor (vpid_last_temp_alloc = header VPID, offset_to_last_temp_alloc = 0, n_page_user = 0) and rebuilding counters from the bitmaps, not rebuilding tables (Ch.11). Reset rewinds to offset 0, unlike file_create’s n_sector_full seed.

Chapter 9: Page Deallocation and File Destruction

This chapter traces the inverse of permanent allocation (Chapter 7): how is a page — and an entire file — given back, and why is the actual bit-flip postponed to commit time? It assumes Chapter 4 (the two-step reservation protocol, the bitmap-then-cache release-order invariant) and Chapter 7 (file_perm_alloc, the Partial/Full tables); the companion cubrid-disk-manager.md covers the sector-bitmap and disk/file split. The central fact: a freed permanent page or sector is not cleared synchronously — the releaser stages a postpone log record and the clear runs at do-postpone.

9.1 Why postpone — the committed-releaser hazard

If the bit were cleared the moment transaction T1 freed page P, a second transaction T2 could reserve that sector, allocate P, and commit its data. Should T1 then abort, undo would restore P’s old contents and clobber T2’s committed work. CUBRID therefore defers the clear to do-postpone, which runs only after commit is logically certain; until then the bit stays set, so no allocator hands the page out. This is the same reasoning as Chapter 4’s release-order invariant.

INVARIANT (deferred-free): A permanent page/sector freed by an active transaction keeps its bit set until do-postpone, enforced by routing all permanent frees through log_append_postpone (RVFL_DEALLOC for pages, RVDK_UNRESERVE_SECTORS for sectors) instead of mutating the bitmap inline. If violated, a concurrent allocator hands the page out again and a later abort corrupts the new owner’s data.

stateDiagram-v2
  [*] --> Allocated
  Allocated --> PostponeStaged : file_dealloc \n RVFL_DEALLOC appended, bit still set
  PostponeStaged --> Allocated : transaction abort \n postpone discarded, page stays allocated
  PostponeStaged --> Freed : do-postpone \n file_perm_dealloc clears bit
  Freed --> [*]

Figure 9-1 — Lifecycle of a permanent page bit. The abort edge is the point: until do-postpone, nothing changed on disk.
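
The lifecycle in Figure 9-1 can be written as a three-state transition function. The names below (`toy_page_state`, `toy_event`, `toy_step`, `toy_bit_is_set`) are hypothetical and exist only to make the abort edge testable; the real system tracks none of this as an explicit enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of one permanent page's allocation bit. */
typedef enum
{ TOY_ALLOCATED, TOY_POSTPONE_STAGED, TOY_FREED } toy_page_state;

/* Hypothetical events: dealloc request, transaction abort, do-postpone. */
typedef enum
{ TOY_EV_DEALLOC, TOY_EV_ABORT, TOY_EV_DO_POSTPONE } toy_event;

/* Transition function; events that do not apply leave the state unchanged. */
toy_page_state
toy_step (toy_page_state s, toy_event ev)
{
  switch (s)
    {
    case TOY_ALLOCATED:
      return ev == TOY_EV_DEALLOC ? TOY_POSTPONE_STAGED : s;
    case TOY_POSTPONE_STAGED:
      if (ev == TOY_EV_ABORT)
	{
	  return TOY_ALLOCATED;	/* postpone discarded, nothing changed */
	}
      if (ev == TOY_EV_DO_POSTPONE)
	{
	  return TOY_FREED;	/* the bit is finally cleared */
	}
      return s;
    default:
      return s;
    }
}

/* The bit stays set in every state except TOY_FREED. */
bool
toy_bit_is_set (toy_page_state s)
{
  return s != TOY_FREED;
}
```

The property worth noticing is that `toy_bit_is_set` is true in the staged state, so an allocator consulting the bitmap cannot hand the page out while the releasing transaction can still abort.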

9.2 `file_dealloc` — staging, not freeing

file_dealloc is the public entry for giving back one page; despite its name it usually stages rather than frees. The header fix is conditional: a release build with a trustworthy concrete file_type_hint skips it to save an I/O, while a debug build always fixes (the #if defined (NDEBUG) guard) to assert the hint matches fhead->type and that vpid is not the sticky first page. The postpone decision is conservative under uncertainty — it postpones unless it can prove the file temporary:

// file_dealloc -- src/storage/file_manager.c
  if ((fhead != NULL && !FILE_IS_TEMPORARY (fhead)) || file_type_hint != FILE_TEMP)
    {
      VFID_COPY ((VFID *) log_data, vfid);
      VPID_COPY ((VPID *) (log_data + sizeof (VFID)), vpid);
      log_append_postpone (thread_p, RVFL_DEALLOC, &log_addr, LOG_DATA_SIZE, log_data);  /* <- stage only */
    }
  /* else: we do not deallocate pages from temporary files */

The RVFL_DEALLOC record carries only (VFID, VPID) — no bitmap state — because the real work is recomputed at do-postpone. Temporary files take the else (reclaimed wholesale at destroy / tempcache reset, Chapter 11). Two early exits then key on numerability: goto exit if !FILE_TYPE_CAN_BE_NUMERABLE (file_type_hint) (not numerable by type) and again if !FILE_IS_NUMERABLE (fhead) (type allows it but this file is not). Only a genuinely numerable file acts now — it searches the user page table and sets FILE_USER_PAGE_MARK_DELETED, logging RVFL_USER_PAGE_MARK_DELETE for non-temporary files (mechanics deferred to Chapter 10).
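
The postpone decision is a two-input predicate, which makes it easy to tabulate. The sketch mirrors the condition in the excerpt above; `toy_header_view` and `toy_must_postpone` are hypothetical names, with the enum folding together "header not fixed" and the header's temporary flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the caller knows from the header page. */
typedef enum
{
  TOY_HEADER_NOT_FIXED,		/* fhead == NULL */
  TOY_HEADER_PERMANENT,		/* fhead != NULL and not temporary */
  TOY_HEADER_TEMPORARY		/* fhead != NULL and temporary */
} toy_header_view;

/* Mirrors: (fhead != NULL && !FILE_IS_TEMPORARY (fhead)) || file_type_hint != FILE_TEMP */
bool
toy_must_postpone (toy_header_view header, bool hint_is_temp)
{
  bool header_says_permanent = (header == TOY_HEADER_PERMANENT);
  return header_says_permanent || !hint_is_temp;
}
```

In this model the only inputs that skip the postpone are a FILE_TEMP hint combined with a header that is either unfixed or temporary; every other combination stages the record.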

INVARIANT (numerable consistency): In a numerable file the page must exist in the user page table and not already be marked deleted (enforced by assert_release (false) on !found and on FILE_USER_PAGE_IS_MARKED_DELETED). If violated, the user page table and the allocation tables have diverged — a hard bug.

The exit: label unfixes page_fhead and page_ftab if held.

9.3 `file_perm_dealloc` — the actual bit-flip at do-postpone

At commit, do-postpone replays each RVFL_DEALLOC through file_rv_dealloc_on_postpone → file_rv_dealloc_internal, which fixes the header, starts a system operation, and calls file_perm_dealloc — where the bit is finally cleared. Entry asserts the contract: log_check_system_op_is_started (must be inside a sysop) and !FILE_IS_TEMPORARY (fhead) (permanent only); it then computes vsid_dealloc from vpid_dealloc (SECTOR_FROM_PAGEID).

INVARIANT (sysop-wrapped table change): All file-table mutations in file_perm_dealloc must commit as a nested system operation before the header page is unfixed. If violated, a crash mid-update leaves the Partial/Full tables and header counters inconsistent with no atomic recovery boundary.

flowchart TB
  START["file_perm_dealloc(vpid)"] --> SEARCH["search Partial table"]
  SEARCH --> FOUND{found in Partial?}
  FOUND -- yes --> CLEAR["clear bit in partsect<br/>log RVFL_PARTSECT_DEALLOC<br/>is_empty?"]
  FOUND -- no --> REMOVE["remove vsid from Full table<br/>was_full = true"]
  REMOVE --> MERGED{ftab page merged away?}
  MERGED -- "same sector" --> SAMESEC["clear merged page's bit too<br/>simulate ftab dealloc"]
  MERGED -- "other sector" --> RECURSE["file_perm_dealloc(merged) recursive"]
  MERGED -- none --> BUILD["build partsect_new = FULL minus bit"]
  SAMESEC --> BUILD
  RECURSE --> BUILD
  BUILD --> SPACE{free slot in Partial?}
  SPACE -- yes --> INSERT["file_extdata_insert_at ordered"]
  SPACE -- no --> NEWPG["file_perm_alloc new ftab page"]
  CLEAR --> HDR["file_header_dealloc<br/>update counters"]
  INSERT --> HDR
  NEWPG --> HDR
  HDR --> DEALLOC["pgbuf_dealloc_page(vpid)"]
  DEALLOC --> EXIT["exit: unfix page_ftab"]

Figure 9-2 — Branch map of file_perm_dealloc. Left: sector already Partial (common case). Right: sector was Full, where the Full-to-Partial migration happens and may recurse.

Left branch — already Partial. The sector has a free page so it is already in the Partial table: clear the bit, recompute is_empty, log it with RVFL_PARTSECT_DEALLOC via log_append_undoredo_data — undoredo, not postpone, because by do-postpone time we are executing the free, so the table edit is a normal recoverable change.

Right branch — sector was Full. Every reserved sector is in exactly one table (Chapter 6), so if not Partial it is Full. The function sets was_full = true and calls file_extdata_find_and_remove_item on the Full table; this may empty the last Full-table component, returning a vpid_merged — a now-orphaned table page that must itself be freed. The guard, written as the two merged cases in Figure 9-2, hinges on VSID_IS_SECTOR_OF_VPID (&vsid_dealloc, &vpid_merged):

different sector → file_perm_dealloc (..., &vpid_merged, FILE_ALLOC_TABLE_PAGE) recurses to free it normally;
same sector (the one being moved to Partial) → do not recurse; set is_merged_page_from_sector, clear that page’s bit too in the new descriptor, simulate accounting via file_header_dealloc (..., FILE_ALLOC_TABLE_PAGE, ...) then pgbuf_dealloc_page (vpid_merged).

The new descriptor starts from partsect_new.page_bitmap = FILE_FULL_PAGE_BITMAP with the freed bit(s) file_partsect_clear_bit’d, then is inserted at the ordered position; if Partial has no free slot a new table page comes from file_perm_alloc (FILE_ALLOC_TABLE_PAGE). Guards: file_extdata_find_ordered must report the VSID not present (assert_release (false) on duplicate), and assert (page_ftab == NULL) confirms all transient table pages were unfixed.
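
Building the new Partial descriptor is bit arithmetic on a full bitmap. The following sketch assumes a single volume and 64-page sectors; `toy_partsect`, `toy_sector_owns_page`, and `toy_full_to_partial` are hypothetical, and the `must_recurse` output only reports which of the two merged-page cases applies rather than performing the recursive free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NPAGES 64		/* assumed pages per sector */
#define TOY_FULL_BITMAP UINT64_MAX

/* Hypothetical stand-in for FILE_PARTIAL_SECTOR (volume id omitted). */
typedef struct toy_partsect
{
  int32_t sectid;
  uint64_t page_bitmap;
} toy_partsect;

/* Single-volume model of the "is this page in that sector" test. */
bool
toy_sector_owns_page (int32_t sectid, int32_t pageid)
{
  return pageid / TOY_NPAGES == sectid;
}

/* Build the descriptor for a sector leaving the Full table: start from all
 * ones, clear the freed page, and clear a merged table page only when it
 * lives in the same sector. */
toy_partsect
toy_full_to_partial (int32_t sectid, int32_t freed_pageid, bool has_merged,
		     int32_t merged_pageid, bool *must_recurse)
{
  toy_partsect ps = { sectid, TOY_FULL_BITMAP };
  ps.page_bitmap &= ~((uint64_t) 1 << (freed_pageid - sectid * TOY_NPAGES));
  *must_recurse = false;
  if (has_merged)
    {
      if (toy_sector_owns_page (sectid, merged_pageid))
	{
	  ps.page_bitmap &= ~((uint64_t) 1 << (merged_pageid - sectid * TOY_NPAGES));
	}
      else
	{
	  *must_recurse = true;	/* other sector: free it through the normal path */
	}
    }
  return ps;
}
```

For sector 2, freeing page 130 clears bit 2. A merged table page 135 in the same sector also clears bit 7, while a merged page 300 belongs to another sector and is left for the recursive call.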

Tail — both branches. file_header_dealloc (fhead, alloc_type, is_empty, was_full) adjusts n_page_free / sector counters (file_log_fhead_dealloc logs it); the page is then fixed and handed to pgbuf_dealloc_page (§9.6), and PSTAT_FILE_NUM_PAGE_DEALLOCS bumps. is_empty/was_full drive the math: a was_full sector now contributes free pages, an is_empty sector becomes fully free. Most error paths unfix any held page_ftab via ASSERT_ERROR (); goto exit; the two Full-branch sub-paths — the recursive file_perm_dealloc of an other-sector orphan and the same-sector merged-page pgbuf_fix failure — instead return error_code directly, which is safe because page_ftab is still NULL at those points. The hard-fail-during-recovery guard lives one level up in file_rv_dealloc_internal (§9.8), not here.

9.4 `file_destroy` — giving back the whole file

Destroying a file returns every sector it reserved; is_temp forks the entire function. The prologue: a permanent file calls file_tracker_unregister (catalog-visible, dropped first); a temporary file calls logtb_set_check_interrupt (thread_p, false) so destroy cannot abort halfway and leak pages. The header is fixed, file_table_collect_all_vsids gathers every sector, then the forks diverge on eviction and re-converge on one disk_unreserve_ordered_sectors call.

flowchart TB
  P["file_destroy(vfid, is_temp)"] --> FORK{is_temp?}
  FORK -- no --> UNREG["file_tracker_unregister"]
  FORK -- yes --> NOINT["disable interrupt check"]
  UNREG --> FIX["fix header page"]
  NOINT --> FIX
  FIX --> COLLECT["file_table_collect_all_vsids<br/>-> vsid_collector"]
  COLLECT --> FORK2{permanent or temporary?}
  FORK2 -- permanent --> PDEAL["file_sector_map_dealloc over Partial+Full<br/>pgbuf_dealloc_page each user+ftab page<br/>pgbuf_dealloc_page(header)"]
  FORK2 -- temporary --> TDEAL["file_sector_map_dealloc_temp over Partial<br/>pgbuf_dealloc_temp_page each<br/>decrement Tempcache counters"]
  PDEAL --> UNRES["disk_unreserve_ordered_sectors"]
  TDEAL --> UNRES
  UNRES --> EXIT["exit: free collectors, unfix header,<br/>restore interrupt check"]

Figure 9-3 — file_destroy two forks.

Permanent fork. file_extdata_apply_funcs over Partial then Full passes file_extdata_collect_ftab_pages (gather file-table-page sectors into a FILE_FTAB_COLLECTOR) and file_sector_map_dealloc (fix each user page, pgbuf_dealloc_page); it then evicts each collected table-page sector and finally the header. Every owned page becomes a PAGE_UNKNOWN eviction candidate before sectors are unreserved.

Temporary fork. No Full table, so only Partial is walked via file_sector_map_dealloc_temp / pgbuf_dealloc_temp_page. It logs nothing and tolerates a missing page (pgbuf_simple_fix NULL → continue) since temporary pages need not be on disk, then decrements the global tempcache spacedb_temp counters (Chapter 11) and frees the header.

INVARIANT (evict-before-unreserve): Every buffer-pool page of a file must become an eviction candidate (pgbuf_dealloc_page / pgbuf_dealloc_temp_page) before its sectors are unreserved. If violated, a stale dirty BCB could be flushed to a sector already unreserved and re-reserved by another file, writing one file’s bytes into another.

The exit: label is universal cleanup: unfix the header, db_private_free both collector arrays, restore the interrupt-check flag for the temporary case.

9.5 `file_vsid_collector` and `file_table_collect_all_vsids`

The collector is a fixed-size array plus count:

// struct file_vsid_collector -- src/storage/file_manager.c
struct file_vsid_collector { VSID *vsids; int n_vsids; };

Field	Role	Why it exists
`vsids`	Pointer to a `db_private_alloc`’d array of `fhead->n_sector_total` `VSID`s	Output buffer, sized exactly to the sector count so no realloc is ever needed.
`n_vsids`	Running count of sectors appended	Both the array cursor during collection and the element count handed to `disk_unreserve_ordered_sectors`. After collection it must equal `n_sector_total`.

file_table_collect_all_vsids allocates the array, then applies file_table_collect_vsid (collector->vsids[collector->n_vsids++] = *vsid) across Partial and — for permanent files only — Full:

// file_table_collect_all_vsids -- src/storage/file_manager.c
  collector_out->vsids = (VSID *) db_private_alloc (thread_p, fhead->n_sector_total * sizeof (VSID));
  FILE_HEADER_GET_PART_FTAB (fhead, extdata_ftab);
  error_code = file_extdata_apply_funcs (thread_p, extdata_ftab, NULL, NULL, file_table_collect_vsid, collector_out, ...);
  if (!FILE_IS_TEMPORARY (fhead))
    {
      FILE_HEADER_GET_FULL_FTAB (fhead, extdata_ftab);    /* <- temporary files have no full table */
      error_code = file_extdata_apply_funcs (thread_p, extdata_ftab, NULL, NULL, file_table_collect_vsid, collector_out, ...);
    }
  if (collector_out->n_vsids != fhead->n_sector_total)
    assert_release (false);                               /* <- the count invariant, checked */
  qsort (collector_out->vsids, fhead->n_sector_total, sizeof (VSID), disk_compare_vsids);   /* <- ordered output */

INVARIANT (complete collection): The collected VSID count must equal fhead->n_sector_total. If violated, the file’s bookkeeping is corrupt and destroy fails with assert_release (false).

The final qsort establishes the next function’s precondition — the VSID list ordered by (volid, sectid) so disk_unreserve_ordered_sectors can batch per volume in one pass.
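The collect-then-sort contract can be illustrated with a small standalone sketch. It is not CUBRID code: collect_vsid, collect_buf, collect_compare, and collect_all are hypothetical names, plain arrays stand in for the Partial and Full tables, and malloc replaces db_private_alloc. It only shows the two properties the text relies on, the count check against n_sector_total and the (volid, sectid) ordering of the output.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a VSID: volume id plus sector id. */
typedef struct { short volid; int sectid; } collect_vsid;

/* Output buffer plus running count, shaped like the collector in the text. */
typedef struct { collect_vsid *vsids; int n_vsids; } collect_buf;

/* Order by volume first, then by sector (illustrative comparator only). */
static int collect_compare (const void *a, const void *b)
{
  const collect_vsid *x = (const collect_vsid *) a;
  const collect_vsid *y = (const collect_vsid *) b;
  if (x->volid != y->volid)
    return (x->volid < y->volid) ? -1 : 1;
  if (x->sectid != y->sectid)
    return (x->sectid < y->sectid) ? -1 : 1;
  return 0;
}

/* Gather both tables into one array sized to n_sector_total, verify the
 * count, then sort. Returns 0 on success, -1 on any failure. */
static int collect_all (const collect_vsid *partial, int n_partial,
                        const collect_vsid *full, int n_full,
                        int n_sector_total, collect_buf *out)
{
  int i;
  out->vsids = NULL;
  out->n_vsids = 0;
  if (n_sector_total <= 0)
    return -1;
  out->vsids = (collect_vsid *) malloc ((size_t) n_sector_total * sizeof (collect_vsid));
  if (out->vsids == NULL)
    return -1;
  for (i = 0; i < n_partial && out->n_vsids < n_sector_total; i++)
    out->vsids[out->n_vsids++] = partial[i];
  for (i = 0; i < n_full && out->n_vsids < n_sector_total; i++)
    out->vsids[out->n_vsids++] = full[i];
  if (n_partial + n_full != n_sector_total)   /* the count invariant */
    {
      free (out->vsids);
      out->vsids = NULL;
      out->n_vsids = 0;
      return -1;
    }
  qsort (out->vsids, (size_t) out->n_vsids, sizeof (collect_vsid), collect_compare);
  return 0;
}
```

Sizing the buffer from the header count up front is what lets the append step stay a bare array store; the sketch reports a mismatch as an error where the text describes an assert_release.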

9.6 `pgbuf_dealloc_page` — the eviction hint

Both file_perm_dealloc and the permanent file_destroy fork hand each freed page to pgbuf_dealloc_page, which does no flush or write I/O — it resets the page type to PAGE_UNKNOWN and steers the BCB toward victimization:

// pgbuf_dealloc_page -- src/storage/page_buffer.c
  /* how it works: page is "deallocated" by resetting its type to PAGE_UNKNOWN. also prepare bcb for victimization.
   * note: the bcb used to be invalidated. but that means flushing page to disk and waiting for IO write. that may be
   *       too slow. if we add the bcb to the bottom of a lru list, it will be eventually flushed by flush thread and
   *       victimized. */
  CAST_PGPTR_TO_BFPTR (bcb, page_dealloc);
  assert (get_fcnt (&bcb->atomic_latch) == 1);   /* <- caller must hold the only latch */

Deallocation is a hint, not a synchronous discard — the page may still be flushed later by the flush thread, which is exactly why the evict-before-unreserve invariant (§9.4) requires it be issued before the sector becomes reusable.

9.7 `disk_unreserve_ordered_sectors` — returning sectors

The disk-manager counterpart of Chapter 4’s reservation: a thin wrapper that takes CSECT_DISK_CHECK as a reader and delegates to disk_unreserve_ordered_sectors_without_csect. The worker exploits the §9.5 sort: it groups consecutive vsids sharing a volid into per-volume runs in a DISK_RESERVE_CONTEXT (asserting that volid strictly increases across runs and sectid increases within one) and issues one disk_unreserve_sectors_from_volume per volume. That call iterates sector-table units and invokes disk_stab_unit_unreserve, the leaf where the permanent-vs-temporary postpone split lands, mirroring §9.2 at the sector level:

// disk_stab_unit_unreserve -- src/storage/disk_manager.c
  assert ((unreserve_bits & (*cursor->unit)) == unreserve_bits);   /* <- all target bits were actually set */
  if (unreserve_bits != 0)
    {
      if (context->purpose == DB_PERMANENT_DATA_PURPOSE)
        log_append_postpone (thread_p, RVDK_UNRESERVE_SECTORS, &addr, sizeof (unreserve_bits), &unreserve_bits);  /* <- deferred */
      else
        {
          (*cursor->unit) &= ~unreserve_bits;                       /* <- bitmap cleared NOW */
          /* ... pgbuf_set_dirty + lock_reserve_for_purpose condensed ... */
          disk_cache_update_vol_free (cursor->volheader->volid, nsect);   /* <- then cache, Ch.4 order */
        }
    }

Permanent purpose stages the clear via log_append_postpone (RVDK_UNRESERVE_SECTORS), upholding the deferred-free invariant (§9.1) at sector granularity; temporary purpose clears the bits immediately, in the bitmap-then-cache order Chapter 4’s release-order invariant mandates (bit cleared, then disk_cache_update_vol_free). The entry assert guards that every freed sector was genuinely reserved.
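The per-volume batching that the sorted input makes possible can be sketched as a single pass. This is a simplified model, not the disk manager’s code: run_vsid, run_span, and run_group_by_volume are hypothetical names, and the real worker fills a DISK_RESERVE_CONTEXT rather than an array of spans.

```c
#include <assert.h>

/* Hypothetical stand-ins: a sector id pair and one contiguous per-volume run. */
typedef struct { short volid; int sectid; } run_vsid;
typedef struct { short volid; int first; int count; } run_span;

/* Split a (volid, sectid)-sorted array into one run per volume.
 * Returns the number of runs written, or -1 if max_runs is too small. */
static int run_group_by_volume (const run_vsid *vsids, int n,
                                run_span *runs, int max_runs)
{
  int nruns = 0;
  int i;
  for (i = 0; i < n; i++)
    {
      if (nruns > 0 && runs[nruns - 1].volid == vsids[i].volid)
        {
          /* same volume: sector ids keep increasing inside the run */
          assert (vsids[i - 1].sectid < vsids[i].sectid);
          runs[nruns - 1].count++;
          continue;
        }
      /* new volume: volume ids strictly increase across runs */
      assert (nruns == 0 || runs[nruns - 1].volid < vsids[i].volid);
      if (nruns == max_runs)
        return -1;
      runs[nruns].volid = vsids[i].volid;
      runs[nruns].first = i;
      runs[nruns].count = 1;
      nruns++;
    }
  return nruns;
}
```

Because the input is already ordered, each volume is visited exactly once, which is the reason the sort is paid for during collection.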

9.8 The abort path — restoring state

Aborting a permanent deallocation is free: its staged postpone records are discarded, never run, so the page stays allocated (Figure 9-1’s back edge). Real undo happens only when a page allocation is rolled back. Both do-postpone and undo route through file_rv_dealloc_internal, which fixes the header, opens the sysop, and calls file_perm_dealloc. Because a recovery replay must never fail silently, it hard-fails via if (error_code != NO_ERROR) { assert_release (false); } on any error return. It then closes the sysop in one of three ways, selected by a single parameter: log_sysop_abort on error, log_sysop_end_logical_compensate for FILE_RV_DEALLOC_COMPENSATE (undo of an alloc), otherwise log_sysop_end_logical_run_postpone (do-postpone of a dealloc). All three run before the header is unfixed (§9.3), so by then the table change is either durable or rolled back.

9.9 Chapter summary — key takeaways

file_dealloc stages, it does not free — a non-temporary file appends an RVFL_DEALLOC postpone record carrying (VFID, VPID); temporary files deallocate nothing; numerable files also mark-delete the user-page-table entry now (Chapter 10).
Postpone closes the committed-releaser window — no transaction can grab a freed page before the releaser’s commit is irreversible, the sector-level analogue of Chapter 4’s release ordering.
file_perm_dealloc is the real free, branch-rich. Partial: clear a bit. Full: migrate to Partial, recursing to free an orphaned table page except a same-sector orphan (inlined). Must run inside a system operation.
file_destroy forks on is_temp end to end. Permanent: unregister, evict via pgbuf_dealloc_page, unreserve postponed. Temporary: disable interrupts, evict via pgbuf_dealloc_temp_page, adjust tempcache counters, unreserve immediately.
Collection precedes destruction, sorted — file_table_collect_all_vsids gathers exactly n_sector_total VSIDs (asserting the count) and qsorts them so unreserve batches per volume.
pgbuf_dealloc_page is an eviction hint, not a flush — it queues the PAGE_UNKNOWN BCB for victimization, so pages must be evicted before their sectors are unreserved.
The postpone split bottoms out in disk_stab_unit_unreserve — permanent stages RVDK_UNRESERVE_SECTORS; temporary clears bitmap then cache inline; abort of a permanent dealloc is free.

Chapter 10: Numerable Files and the User Page Table

A numerable file adds one promise: ask for “the n-th page I allocated”, in allocation order, in amortized O(1). The sector allocation machinery of Ch 3-Ch 7 cannot answer this — it stores ownership, not order — so the numerable layer keeps a second, separately-externalized index over the same VPIDs: the User Page Table. See cubrid-disk-manager.md (“Numerable files”) for the high-level contract; here we trace every branch.

10.1 Why the sector table cannot recover allocation order

The Partial and Full sector tables (Ch 3) are kept VSID-sorted via disk_compare_vsids so reservation and lookup are binary searches. That sort destroys history in two ways: (1) promotion erases batch identity: a filled partial sector migrates Partial -> Full and is re-sorted by VSID, losing which batch reserved it; (2) cross-expand reorders: a batch spanning a fresh reservation gives new sectors a VSID order unrelated to allocation order, so a sorted bitmap scan yields pages in a different sequence than the user received them.

The sector table answers membership but not order. The User Page Table re-externalizes that lost order as an append-only list of VPIDs — one entry per user page, in allocation order — so find_nth(n) is a positional index into it.

The table is a chain of FILE_EXTENSIBLE_DATA components (the extdata primitive used throughout file_manager.c) whose items are bare VPIDs. The header caches the last component in vpid_last_user_page_ftab for O(1) appends; FILE_HEADER_GET_USER_PAGE_FTAB locates the first component in the header page.

10.2 The find-nth context and the header’s order-keeping fields

file_find_nth_context (struct { VPID *vpid_nth; int nth; int first_index; }) is the accumulator threaded through the scan callbacks:

Field	Role	Why it exists
`vpid_nth`	Out-param pointer for the found VPID	Scan writes through it so the caller’s slot is filled in place
`nth`	Remaining index, decremented as components/items are skipped	Countdown; the scan stops when it reaches the target item
`first_index`	Absolute item index of the current component’s entry 0	Feeds the cache: where in the global sequence the landing component begins

Five FILE_HEADER fields carry the order machinery (struct covered in Ch 1):

Field	Role	Why it exists
`vpid_last_user_page_ftab`	Hint to the last UPT component page	O(1) append target; equals the header VPID while the table lives in-header
`vpid_find_nth_last`	Cached page of the last `find_nth` landing	Lets sequential `find_nth(n), find_nth(n+1)...` resume mid-table
`first_index_find_nth_last`	Global index of entry 0 on `vpid_find_nth_last`	Turns the cached page into an absolute offset for the next search
`n_page_user`	Total user pages (incl. mark-deleted)	Minuend of the live-page count
`n_page_mark_delete`	Count of mark-delete-bit entries	Correction term: live pages = `n_page_user - n_page_mark_delete`

Invariant (live-count correction). Findable pages = n_page_user - n_page_mark_delete, never n_page_user. file_numerable_find_nth enforces this at the auto-alloc test and when skipping marked entries; drift would make find_nth return a deleted page or allocate at the wrong index. Kept exact by file_header_update_mark_deleted logging +1/-1 on every set/clear.

Invariant (cache validity). FILE_CACHE_LAST_FIND_NTH is true only for FILE_TEMP numerable files on a non-parallel thread, so the cache may be read/written without a write latch or dirty flag. Any deallocation resets it (VPID_SET_NULL (&fhead->vpid_find_nth_last)); appends leave it valid because they only extend the tail.

10.3 file_numerable_add_page — appending on every allocation

file_alloc calls file_numerable_add_page right after a page’s bit is set, whenever FILE_IS_NUMERABLE (fhead), so the UPT grows in lock-step. It resolves the tail from vpid_last_user_page_ftab (in-header if equal to the header VPID, else pgbuf_fix WRITE), chains a component if full, then appends:

// file_numerable_add_page -- src/storage/file_manager.c
if (VPID_EQ (&fhead->vpid_last_user_page_ftab, &vpid_fhead))
  FILE_HEADER_GET_USER_PAGE_FTAB (fhead, extdata_user_page_ftab);   /* tail in header */
else page_ftab = pgbuf_fix (..., OLD_PAGE, PGBUF_LATCH_WRITE, ...); /* else fix tail page */
// ... condensed: if (file_extdata_is_full) chain via file_temp_alloc/file_perm_alloc ...
file_extdata_append (extdata_user_page_ftab, vpid);                /* <- the append */

flowchart TD
  A["hint = vpid_last_user_page_ftab"] --> B{"hint == header VPID?"}
  B -->|yes| C["extdata = in-header UPT"]
  B -->|no| D{"pgbuf_fix WRITE ok?"}
  D -->|no| Z["ASSERT_ERROR_AND_SET, goto exit"]
  D -->|yes| F["extdata = that ftab page"]
  C --> G{"file_extdata_is_full?"}
  F --> G
  G -->|no| M["file_extdata_append vpid"]
  G -->|yes| H{"FILE_IS_TEMPORARY?"}
  H -->|yes| I["file_temp_alloc TABLE_PAGE"]
  H -->|no| J["file_perm_alloc TABLE_PAGE"]
  I --> K["fix NEW_PAGE, link prev->next, init extdata,\n advance last_user_page_ftab"]
  J --> K
  K --> M
  M --> N{"temporary?"}
  N -->|no| O["file_log_extdata_add WAL"]
  N -->|yes| P["pgbuf_set_dirty only"]
  O --> Q["exit: unfix page_ftab if held"]
  P --> Q
  Z --> Q

Figure 10-1. file_numerable_add_page, all branches.

The branch worth restating is the temp-vs-permanent logging asymmetry (Figure 10-1 node N): a permanent append emits file_log_extdata_add WAL (plus RVFL_FHEAD_SET_LAST_USER_PAGE_FTAB undoredo when a component is chained), a temporary append only marks pages dirty. A closing assert (!file_extdata_is_full (...)) rules out overflow.
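The tail-hint append can be modelled in a few lines. The sketch below is an assumption-laden simplification: upt_component, upt_table, and the upt_* functions are hypothetical, integers stand in for VPIDs, heap nodes stand in for table pages, and logging, latching, and the temporary/permanent split are left out. It shows only the O(1) tail lookup and the chain-on-full step.

```c
#include <assert.h>
#include <stdlib.h>

#define UPT_CAPACITY 4   /* hypothetical items per component */

typedef struct upt_component
{
  int items[UPT_CAPACITY];      /* stand-ins for VPIDs, in allocation order */
  int n_items;
  struct upt_component *next;
} upt_component;

typedef struct
{
  upt_component head;           /* first component, kept "in the header" */
  upt_component *last;          /* tail hint, like vpid_last_user_page_ftab */
} upt_table;

static void upt_init (upt_table *t)
{
  t->head.n_items = 0;
  t->head.next = NULL;
  t->last = &t->head;           /* hint equals the header while the table is in-header */
}

/* Append one page in allocation order. Returns 0, or -1 if chaining fails. */
static int upt_add_page (upt_table *t, int page)
{
  upt_component *tail = t->last;          /* no walk from the head */
  if (tail->n_items == UPT_CAPACITY)
    {
      upt_component *fresh = (upt_component *) calloc (1, sizeof (*fresh));
      if (fresh == NULL)
        return -1;
      tail->next = fresh;                 /* link prev->next */
      t->last = fresh;                    /* advance the hint */
      tail = fresh;
    }
  tail->items[tail->n_items++] = page;    /* the append */
  assert (tail->n_items <= UPT_CAPACITY);
  return 0;
}

/* Release every chained component; the in-header one is not heap memory. */
static void upt_free (upt_table *t)
{
  upt_component *c = t->head.next;
  while (c != NULL)
    {
      upt_component *next = c->next;
      free (c);
      c = next;
    }
  upt_init (t);
}
```

Appending six pages with a capacity of four fills the in-header component and leaves two entries in one chained component, with the hint pointing at the latter.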

10.4 file_numerable_find_nth — the indexed lookup

The function fixes the header READ, asserts numerable, then branches three ways. Auto-alloc-at-end (auto_alloc && nth == fhead->n_page_user - fhead->n_page_mark_delete) promotes the latch and calls file_alloc to grow the file, re-fixing WRITE and re-checking on ER_PAGE_LATCH_PROMOTE_FAIL. Otherwise the search splits on n_page_mark_delete: with holes it visits every item (file_extdata_find_nth_vpid_and_skip_marked); with no holes it strides components and may resume from the cache, whose predicate is load-bearing:

// file_numerable_find_nth (no-holes branch) -- src/storage/file_manager.c
if (FILE_CACHE_LAST_FIND_NTH (fhead, thread_p) && !VPID_ISNULL (&fhead->vpid_find_nth_last)
    && !VPID_EQ (&vpid_fhead, &fhead->vpid_find_nth_last) && nth >= fhead->first_index_find_nth_last)
  { find_nth_context.first_index = fhead->first_index_find_nth_last;   /* resume from cache */
    find_nth_context.nth -= fhead->first_index_find_nth_last; }        /* <- rebase the countdown */

flowchart TD
  A["fix header READ, assert numerable"] --> B{"auto_alloc and nth == live count?"}
  B -->|yes| C{"promote latch ok?"}
  C -->|FAIL| E["re-fix WRITE, re-check, file_alloc, exit"]
  C -->|ok| F["file_alloc, exit"]
  B -->|no| G{"n_page_mark_delete > 0?"}
  G -->|yes| H["skip-marked over EVERY item"]
  G -->|no| I{"cache usable?"}
  I -->|yes| J["fix cached page, rebase nth"]
  I -->|no| K["first_index = 0, from head"]
  J --> L["find_nth_vpid: stride components"]
  K --> L
  L --> M{"cache eligible?"}
  M -->|yes| N["store landing page + first_index"]
  M -->|no| O["skip cache update"]
  H --> P{"vpid_nth still NULL?"}
  N --> P
  O --> P
  P -->|yes| Q["assert_release false, ER_FAILED"]
  P -->|no| R["exit: unfix pages"]

Figure 10-2. file_numerable_find_nth, all branches.

The four predicate conjuncts above (notably nth >= first_index_find_nth_last, which forbids a backward resume) let the search start mid-table and walk only the landing component, which yields the amortized O(1) for the run-merge pattern find_nth(0), find_nth(1), .... Exit cleanup avoids double-unfixing aliased page pointers.
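The resume decision can be isolated as a pure function. In this sketch nth_cache, nth_start, and nth_choose_start are hypothetical; components are numbered with small integers, where 0 plays the role of the header page and -1 the role of a NULL VPID. It mirrors the cache predicate and the rebasing step, nothing more.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
  bool cache_enabled;       /* stands in for FILE_CACHE_LAST_FIND_NTH */
  int cached_component;     /* -1: nothing cached; 0: the header itself */
  int cached_first_index;   /* global index of entry 0 on the cached component */
} nth_cache;

typedef struct
{
  int start_component;      /* where the scan begins */
  int first_index;          /* global index of that component's entry 0 */
  int nth;                  /* countdown relative to the start */
} nth_start;

static nth_start nth_choose_start (const nth_cache *c, int nth)
{
  nth_start s = { 0, 0, nth };              /* default: scan from the head */
  if (c->cache_enabled
      && c->cached_component != -1          /* something is cached */
      && c->cached_component != 0           /* and it is not the header */
      && nth >= c->cached_first_index)      /* never resume backward */
    {
      s.start_component = c->cached_component;
      s.first_index = c->cached_first_index;
      s.nth = nth - c->cached_first_index;  /* rebase the countdown */
    }
  return s;
}
```

With a cache pointing at component 3 whose first entry has global index 120, a request for index 125 starts there with a countdown of 5, while a request for index 119 falls back to a scan from the head.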

10.5 The two scan callbacks

file_extdata_apply_funcs invokes a per-component and/or per-item function. file_extdata_find_nth_vpid is the per-component (no-holes) callback — a whole component is one O(1) stride:

// file_extdata_find_nth_vpid -- src/storage/file_manager.c
int count_vpid = file_extdata_item_count (extdata);
if (count_vpid <= find_nth_context->nth)
  { find_nth_context->nth -= count_vpid;            /* <- skip whole component */
    find_nth_context->first_index += count_vpid; }  /* <- keep global index accurate */
else
  { VPID_COPY (find_nth_context->vpid_nth, (VPID *) file_extdata_at (extdata, find_nth_context->nth));
    assert (!FILE_USER_PAGE_IS_MARKED_DELETED (find_nth_context->vpid_nth));  /* <- no holes */
    *stop = true; }

file_extdata_find_nth_vpid_and_skip_marked is the per-item (holes) callback; it inspects every VPID because a deleted entry consumes a slot but not an index:

// file_extdata_find_nth_vpid_and_skip_marked -- src/storage/file_manager.c
if (FILE_USER_PAGE_IS_MARKED_DELETED (vpidp))   return NO_ERROR;   /* <- skip, do not advance nth */
if (find_nth_context->nth == 0)  { *find_nth_context->vpid_nth = *vpidp; *stop = true; }
else                             find_nth_context->nth--;

The asymmetry is the point: no holes lets you stride components and keep first_index to prime the cache; holes do not, since a component’s live-entry count is not its item count.
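A side-by-side model makes the cost difference concrete. The following sketch uses hypothetical names (scan_component, scan_find_nth_stride, scan_find_nth_skip_marked) and plain arrays of 32-bit values in place of table pages; SCAN_MARK reuses the bit position the text gives for the mark-delete flag.

```c
#include <assert.h>
#include <stdint.h>

#define SCAN_MARK 0x80000000u   /* top bit flags a mark-deleted entry */

typedef struct { const uint32_t *items; int n_items; } scan_component;

/* No-holes lookup: one step per component, using only its item count.
 * Returns 0 and fills *found, or -1 if nth is out of range. */
static int scan_find_nth_stride (const scan_component *comps, int n_comps,
                                 int nth, uint32_t *found)
{
  int c;
  for (c = 0; c < n_comps; c++)
    {
      if (comps[c].n_items <= nth)
        {
          nth -= comps[c].n_items;          /* skip the whole component */
          continue;
        }
      *found = comps[c].items[nth];
      return 0;
    }
  return -1;
}

/* Holes lookup: every item is inspected and only unmarked ones count. */
static int scan_find_nth_skip_marked (const scan_component *comps, int n_comps,
                                      int nth, uint32_t *found)
{
  int c, i;
  for (c = 0; c < n_comps; c++)
    for (i = 0; i < comps[c].n_items; i++)
      {
        if (comps[c].items[i] & SCAN_MARK)
          continue;                         /* holds a slot, not an index */
        if (nth == 0)
          {
            *found = comps[c].items[i];
            return 0;
          }
        nth--;
      }
  return -1;
}
```

The stride version does work proportional to the number of components, the skip version to the number of items, which is why a single marked entry is enough to push the lookup onto the slower path.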

10.6 The mark-delete machinery (permanent numerable)

A numerable page cannot vanish from the middle of the UPT mid-transaction — that would renumber later pages and corrupt concurrent find_nth — so file_dealloc removes in two phases. Phase 1, in-transaction, only sets the top bit of the pageid (FILE_USER_PAGE_MARK_DELETE_FLAG == 0x80000000) via FILE_USER_PAGE_MARK_DELETED (vpid_found), logs RVFL_USER_PAGE_MARK_DELETE undoredo (permanent only), bumps the counter via file_header_update_mark_deleted (..., 1), and resets the cache if FILE_CACHE_LAST_FIND_NTH. The entry keeps its slot, later indices are undisturbed, and find_nth skips it via the per-item callback.

Phase 2, at commit run-postpone, physically removes the entry via file_extdata_find_and_remove_item: it walks the chain (linear, ordered=false, since the UPT is append-ordered not VSID-ordered), removes the item with file_extdata_remove_at (logged via file_log_extdata_remove), pops the VPID into an out-param, and merges an emptied component with its predecessor, reporting the freed table page through vpid_merged; it asserts a system op is active and assert_release(false)s on a missing item. A marked pop decrements the counter:

// file_dealloc run-postpone body -- src/storage/file_manager.c
file_extdata_find_and_remove_item (..., vpid_dealloc, file_compare_vpids, false,
                                   &vpid_removed, &vpid_merged);
if (!VPID_ISNULL (&vpid_merged))                                  /* table page emptied -> free it */
  file_perm_dealloc (thread_p, page_fhead, &vpid_merged, FILE_ALLOC_TABLE_PAGE);
if (FILE_USER_PAGE_IS_MARKED_DELETED (&vpid_removed))
  file_header_update_mark_deleted (thread_p, page_fhead, -1);     /* <- counter back down */

On abort, file_rv_user_page_unmark_delete_logical undoes phase 1. Because concurrent transactions may have shifted the table, it cannot trust the original position — it re-searches by VPID (file_extdata_search_item), asserts the bit is set, clears it with FILE_USER_PAGE_CLEAR_MARK_DELETED, and logs a RVFL_USER_PAGE_MARK_DELETE_COMPENSATE record via log_append_compensate.

Invariant (slot stability under deletion). A marked-deleted entry never moves or re-indexes until commit, so a concurrent reader’s cached vpid_find_nth_last stays structurally valid through a mark (deallocation resets only the cache, not the slots). Compacting on mark would renumber pages mid-transaction.
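The flag arithmetic and the counter correction are easy to show in isolation. The sketch below is illustrative only: md_vpid, md_counts, and the md_* helpers are hypothetical, the page id is held as raw unsigned bits to keep the bit operations portable, and no logging is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MD_FLAG 0x80000000u   /* top bit of a 32-bit page id, as in the text */

/* Hypothetical VPID stand-in with the page id stored as unsigned bits. */
typedef struct { uint32_t pageid_bits; int16_t volid; } md_vpid;

/* The two header counters behind the live-count invariant. */
typedef struct { int n_page_user; int n_page_mark_delete; } md_counts;

static bool md_is_marked (const md_vpid *v)
{
  return (v->pageid_bits & MD_FLAG) != 0;
}

/* Page id with the flag masked off. */
static uint32_t md_page_id (const md_vpid *v)
{
  return v->pageid_bits & ~MD_FLAG;
}

static int md_live_pages (const md_counts *c)
{
  return c->n_page_user - c->n_page_mark_delete;
}

/* Phase 1: flag the entry in place and bump the counter; the slot stays. */
static void md_mark (md_vpid *v, md_counts *c)
{
  assert (!md_is_marked (v));
  v->pageid_bits |= MD_FLAG;
  c->n_page_mark_delete++;
}

/* Abort path: clear the flag and bring the counter back down. */
static void md_unmark (md_vpid *v, md_counts *c)
{
  assert (md_is_marked (v));
  v->pageid_bits &= ~MD_FLAG;
  c->n_page_mark_delete--;
}
```

Marking leaves the underlying page id recoverable by masking, so an unmark restores the entry exactly and later indices never shift.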

10.7 file_numerable_truncate — dealloc-driven shrink

Truncation is the only public shrink path, leaning on find_nth + file_dealloc:

// file_numerable_truncate -- src/storage/file_manager.c
if (!FILE_IS_NUMERABLE (fhead)) { assert_release (false); error_code = ER_FAILED; goto exit; }
if (fhead->n_page_mark_delete != 0) { assert (false); return NO_ERROR; }  /* <- refuse mid-dealloc */
while (fhead->n_page_user > npages) {                                     /* repeatedly drop index npages */
    file_numerable_find_nth (thread_p, vfid, npages, false, NULL, NULL, &vpid);  /* auto-alloc off */
    file_dealloc (thread_p, vfid, &vpid, fhead->type); }

Each iteration deallocates the page now at index npages; as n_page_user drops the loop ends exactly at npages. It bails on n_page_mark_delete != 0, since a half-finished dealloc makes the index meaningless.

10.8 Real callers and the dead-code finding

file_numerable_find_nth has three callers across two file-type families; mark-delete is exercised only by the permanent family. The extendible-hash family is consumed by both src/storage/extendible_hash.c and the file-hash-scan code in src/query/query_hash_scan.c — fhs_fix_nth_page calls file_numerable_find_nth, and its files are created via file_create_ehash / file_create_ehash_dir, so they are FILE_EXTENDIBLE_HASH(_DIRECTORY), the same family as the storage row.

Caller	File type	Deallocates?	Mark-delete used?
External sort run files (`external_sort.c`, `file_create_temp_numerable`)	`FILE_TEMP`	never	no (dead)
Extendible hash bucket/directory (`extendible_hash.c` find_nth, truncate)	`FILE_EXTENDIBLE_HASH(_DIRECTORY)`	yes	yes
File-hash-scan FHS (`query_hash_scan.c`, `fhs_fix_nth_page`)	`FILE_EXTENDIBLE_HASH(_DIRECTORY)`	via truncate path	yes

Non-numerable temp consumers — list_file query intermediates and the query result cache (FILE_QUERY_AREA) — never touch this layer; they chain pages via QFILE_PAGE_HEADER.next_vpid, with no find_nth contract.

Critical finding. For FILE_TEMP numerable files (external sort), pages are never deallocated individually (temporary files deallocate nothing, §9.9), so FILE_USER_PAGE_MARK_DELETED, n_page_mark_delete, and the whole RVFL_USER_PAGE_MARK_DELETE* chain are effectively dead code there: n_page_mark_delete stays 0 and find_nth always takes the no-holes/cache branch. The table data structure is still mandatory because it supplies the order contract the sort merge depends on. The dead part is the deletion sub-apparatus, not the table.

10.9 Chapter summary — key takeaways

VSID-sorted sector tables store which pages a file owns but discard order; the User Page Table re-externalizes order as an append-only VPID list, so find_nth(n) is a positional index.
file_numerable_add_page appends one VPID per allocation inside file_alloc, using vpid_last_user_page_ftab as an O(1) tail hint and chaining a component (logged for permanent, dirty-only for temporary) when the tail fills.
file_numerable_find_nth is O(1)-amortized only in the no-holes branch (file_extdata_find_nth_vpid strides components, the cache resumes mid-table); the holes branch falls back to a per-item skip scan.
Live page count is n_page_user - n_page_mark_delete, governing auto-alloc-at-end and deleted-entry skipping, kept exact by logged deltas.
Permanent deletion is two-phase: phase 1 sets FILE_USER_PAGE_MARK_DELETE_FLAG and bumps the counter (slot kept); phase 2 at run-postpone removes via file_extdata_find_and_remove_item; abort re-searches by VPID and clears the bit.
file_numerable_truncate is a thin find_nth(npages) + file_dealloc loop, refusing to run while n_page_mark_delete != 0.
Mark-delete is dead code for FILE_TEMP numerable (external sort never deallocates), yet the table itself stays necessary — the dead part is the deletion sub-apparatus, not the structure.

Chapter 11: Special Paths Tempcache Tracker Sticky Page TDE and Recovery

Five machines sit beside the single-page lifecycle of Ch 6-10: the temp-file cache, the File Tracker, the sticky-first-page escape hatch, the TDE flags, and the recovery handlers. This chapter dissects only the code; for the why, see the companion’s “Temporary file cache”, “File destruction and the File Tracker”, and “Two-step sector reservation” sections.

11.1 The temp-file cache: recycling whole files

file_Tempcache is a global pool holding retired temp files intact so the next request of the same shape gets one back instead of destroy-and-recreate. Three structs cooperate.

// file_tempcache_entry -- src/storage/file_manager.c
struct file_tempcache_entry { VFID vfid; FILE_TYPE ftype; FILE_TEMPCACHE_ENTRY *next; };

// file_tempcache_tran_entry -- src/storage/file_manager.c
struct file_tempcache_tran_entry {
  pthread_mutex_t mutex;
  FILE_TEMPCACHE_ENTRY *head;
#if !defined (NDEBUG)
  int owner_mutex;
#endif
};

// file_tempcache -- src/storage/file_manager.c
struct file_tempcache {
  FILE_TEMPCACHE_ENTRY *free_entries;
  int nfree_entries_max, nfree_entries;
  FILE_TEMPCACHE_ENTRY *cached_not_numerable, *cached_numerable;
  int ncached_max, ncached_not_numerable, ncached_numerable;
  pthread_mutex_t mutex;
#if !defined (NDEBUG)
  int owner_mutex;
#endif
  FILE_TEMPCACHE_TRAN_ENTRY *tran_files;
  SPACEDB_FILES spacedb_temp;
};
static FILE_TEMPCACHE file_Tempcache;

file_tempcache_entry

Field	Role	Why it exists
`vfid`	identifies the cached file	the cache stores real, allocated files, not descriptors
`ftype`	file type of the cached file	a `get` matches by type; a near-miss is re-typed in place
`next`	list link	one entry travels between `free_entries`, a tran list, and a cached list — never on two at once

file_tempcache_tran_entry (one per transaction index)

Field	Role	Why it exists
`mutex`	per-transaction lock over this transaction’s `head`	held by `file_tempcache_lock_tran_entry` / `unlock_tran_entry` during the commit/abort drain
`head`	files this transaction created and still owns	drained at commit/abort by `file_tempcache_drop_tran_temp_files`
`owner_mutex`	debug-build-only ownership tracker (compiled under `!defined (NDEBUG)`)	records which thread holds `mutex` for the lock/unlock assertions

file_tempcache (the global)

Field	Role	Why it exists
`free_entries`	pool of empty entry shells	avoids malloc/free on every cache op
`nfree_entries_max` / `nfree_entries`	cap / current size of the shell pool	`init` sets max to `ntrans * 8`
`cached_not_numerable`	retired regular temp files	a `get(numerable=false)` pops here
`cached_numerable`	retired numerable temp files	separate list since the user page table differs (Ch 10)
`ncached_max`	total capacity (`PRM_ID_MAX_ENTRIES_IN_TEMP_FILE_CACHE`)	`put` refuses once `not_numerable + numerable >= max`
`ncached_not_numerable` / `ncached_numerable`	per-list counts	kept lock-step with the lists (see invariant)
`mutex`	guards global lists and shell pool	one lock for all global state
`owner_mutex`	debug-build-only ownership tracker (compiled under `!defined (NDEBUG)`)	which thread holds `mutex`, for `file_tempcache_lock` / `unlock` asserts
`tran_files`	array of per-transaction lists	indexed by tran index so commit is O(1)
`spacedb_temp`	temp-space accounting	feeds `SPACEDB` reporting

Invariant — list head and count agree. (cached_not_numerable == NULL) == (ncached_not_numerable == 0) and likewise for numerable; put asserts both before linking. If a count drifted, get underflows (it asserts ncached_* > 0) or put over-admits past ncached_max, leaking temp files.

11.1.1 `file_tempcache_get` — hand out a recycled file or a fresh shell

// file_tempcache_get -- src/storage/file_manager.c
*entry = numerable ? file_Tempcache.cached_numerable : file_Tempcache.cached_not_numerable;
if (*entry != NULL && (*entry)->ftype != ftype) {       /* cached file is wrong type */
    error_code = file_temp_set_type (thread_p, &(*entry)->vfid, ftype);
    if (error_code != NO_ERROR) *entry = NULL;          /* <- re-type failed: fall to miss */
    else (*entry)->ftype = ftype;
}
if (*entry != NULL) { /* hit: unlink, decrement the matching ncached_* */ ... }
else { error_code = file_tempcache_alloc_entry (entry); /* miss: bare shell, VFID_SET_NULL */ }

Five branches: hit/type-matches pops and decrements; hit/re-type succeeds patches ftype then pops; hit/re-type fails nulls *entry to the miss path; miss allocates a shell (its vfid nulled via VFID_SET_NULL); shell-alloc failure propagates. A hit names an allocated file; a miss returns a shell for the caller to create into.
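The hit and re-type branches, together with the head/count agreement they depend on, can be modelled on one list. This is a sketch under assumptions: tc_entry, tc_list, and tc_get are hypothetical, integers replace VFID and FILE_TYPE, the retype_ok flag stands in for the outcome of file_temp_set_type, and a miss simply returns NULL instead of allocating a shell.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct tc_entry
{
  int vfid;                 /* stand-in for the cached file's VFID */
  int ftype;                /* stand-in for its FILE_TYPE */
  struct tc_entry *next;
} tc_entry;

typedef struct { tc_entry *cached; int ncached; } tc_list;

/* Pop the head of one cached list, re-typing it when the type differs.
 * Returns NULL on a miss (empty list or failed re-type). */
static tc_entry *tc_get (tc_list *list, int ftype, bool retype_ok)
{
  tc_entry *e = list->cached;
  assert ((list->cached == NULL) == (list->ncached == 0));   /* head and count agree */
  if (e == NULL)
    return NULL;                      /* miss: caller would take an empty shell */
  if (e->ftype != ftype)
    {
      if (!retype_ok)
        return NULL;                  /* re-type failed: miss, list untouched */
      e->ftype = ftype;               /* re-typed in place */
    }
  list->cached = e->next;             /* hit: unlink and decrement together */
  list->ncached--;
  e->next = NULL;
  return e;
}
```

Unlinking and decrementing in the same step is what keeps the head pointer and the count from drifting apart.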

11.1.2 `file_tempcache_put` — admit a file back, or refuse

// file_tempcache_put -- src/storage/file_manager.c
if (file_header_copy (...&entry->vfid, &fhead) != NO_ERROR
    || fhead.n_page_user > prm_get_integer_value (PRM_ID_MAX_PAGES_IN_TEMP_FILE_CACHE))
  return false;                                  /* <- too big / unreadable: no lock taken yet */
file_tempcache_lock ();
if (ncached_not_numerable + ncached_numerable < ncached_max) {
    if (file_temp_reset_user_pages (thread_p, &entry->vfid) != NO_ERROR)
      { file_tempcache_unlock (); return false; } /* <- reset failed: cannot reuse */
    /* push onto cached_numerable / cached_not_numerable per FILE_IS_NUMERABLE(&fhead) */
    file_tempcache_unlock (); return true;
}
file_tempcache_unlock (); return false;          /* cache full */

Four exits, one keeps the file: header-copy-fails-or-too-big (false, before locking), cache-full (false), reset-fails (false), and all-clear (push onto the list chosen by the real header’s FILE_IS_NUMERABLE, true). A false return tells the caller to destroy it.
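The four exits reduce to an ordered series of checks, which the sketch below encodes as a pure decision function. put_result, put_cache, and put_decide are hypothetical names; the boolean inputs model the outcomes of the header copy and the user-page reset, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
  PUT_CACHED,                  /* admitted: the file is kept */
  PUT_REJECT_HEADER_OR_SIZE,   /* header unreadable or file too big */
  PUT_REJECT_FULL,             /* cache at capacity */
  PUT_REJECT_RESET             /* user pages could not be reset */
} put_result;

typedef struct
{
  int ncached_max;
  int ncached_not_numerable;
  int ncached_numerable;
} put_cache;

static put_result put_decide (put_cache *cache, bool header_ok, int n_page_user,
                              int max_pages, bool numerable, bool reset_ok)
{
  if (!header_ok || n_page_user > max_pages)
    return PUT_REJECT_HEADER_OR_SIZE;       /* decided before any lock */
  if (cache->ncached_not_numerable + cache->ncached_numerable >= cache->ncached_max)
    return PUT_REJECT_FULL;
  if (!reset_ok)
    return PUT_REJECT_RESET;
  if (numerable)                            /* list chosen from the real header */
    cache->ncached_numerable++;
  else
    cache->ncached_not_numerable++;
  return PUT_CACHED;
}
```

Every result other than PUT_CACHED corresponds to a false return, after which the caller destroys the file.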

11.1.3 Commit/abort drain — `file_tempcache_drop_tran_temp_files`

// file_tempcache_drop_tran_temp_files -- src/storage/file_manager.c
int tran_index = file_get_tempcache_entry_index (thread_p);
file_tempcache_lock_tran_entry (&file_Tempcache.tran_files[tran_index]);
if (file_Tempcache.tran_files[tran_index].head != NULL)
  file_tempcache_cache_or_drop_entries (thread_p, &file_Tempcache.tran_files[tran_index].head);
file_tempcache_unlock_tran_entry (&file_Tempcache.tran_files[tran_index]);

file_tempcache_cache_or_drop_entries walks head; per entry it calls file_tempcache_put, and on false calls file_destroy(..., true) (interrupts suppressed so nothing leaks mid-drop) then file_tempcache_retire_entry; the list ends empty. tran_files is sized ntrans, where ntrans = logtb_get_number_of_total_tran_indices () + 1 in server mode (the +1 reserves index 0) and 1 in SA mode — the array is ntrans-sized, not ntrans + 1.

11.1.4 Query-manager-owned files — `file_temp_preserve` / `file_temp_retire_preserved`

A temp file that must outlive the request but not the session cannot stay on the transaction list, or the next commit reclaims it. file_temp_preserve removes it:

// file_temp_preserve -- src/storage/file_manager.c
entry = file_tempcache_pop_tran_file (thread_p, vfid);
if (entry == NULL) assert_release (false);    /* must have been on the list */
else file_tempcache_retire_entry (entry);     /* return the shell; file is now untracked */

When done the owner calls file_temp_retire_preserved = file_temp_retire_internal(..., /*was_preserved=*/true). The flag changes how the entry is obtained: a preserved file is on no list, so retire allocates a fresh shell with vfid from the argument; a non-preserved retire pops the existing entry. Both funnel into file_tempcache_put and on false file_destroy(..., true).

Invariant — a temp file lives on exactly one tracking list: its transaction’s head, OR preserved (on no list), OR a global cached list, OR destroyed. file_temp_preserve enforces the hand-off by popping before retiring. Skip the pop and both the commit drain and the query manager retire it — a double-free.

11.2 The File Tracker: the catalog of permanent files

The File Tracker is one permanent file per database whose body is a single FILE_EXTENSIBLE_DATA chain of FILE_TRACK_ITEM records — one per permanent file. It is located through two globals seeded at boot from boot_Db_parm->trk_vfid: file_Tracker_vfid (its VFID) and file_Tracker_vpid (its sticky first page, 11.3). boot_sr.c calls file_tracker_create at creation, file_tracker_load at every restart.

// file_track_metadata / file_track_item -- src/storage/file_manager.c
union file_track_metadata {           /* 8 bytes, role depends on item->type */
  FILE_TRACK_HEAP_METADATA heap;      /* { bool is_marked_deleted; bool dummy[7]; } */
  INT64 metadata_size_tracker;        /* forces the union to exactly 8 bytes */
};
struct file_track_item {
  INT32 fileid; INT16 volid; INT16 type;  /* type is a FILE_TYPE cast to INT16 */
  FILE_TRACK_METADATA metadata;           /* total 16 bytes */
};

file_track_item — (volid, fileid) is the search key:

Field	Role	Why it exists
`fileid`	low 4 bytes of the VFID	with `volid`, uniquely names the file
`volid`	volume of the file	items kept ordered by `file_compare_track_items` for binary search
`type`	`FILE_TYPE` as 16 bits	lets `file_tracker_map` filter by type without fixing each header
`metadata`	per-type side data	meaningful only for heaps; otherwise zero

file_track_metadata — a role matrix, because the union means different things by type:

`item->type`	Active member	Meaning
`FILE_HEAP` / `FILE_HEAP_REUSE_SLOTS`	`heap.is_marked_deleted`	heap is logically dropped but kept for reuse (`file_tracker_item_reuse_heap`)
any other type	`metadata_size_tracker`	unused; written `0` by `file_tracker_register` when `metadata == NULL`

Invariant — items are ordered across the whole chain by file_compare_track_items; register inserts at the binary-search position. Both unregister (file_extdata_find_and_remove_item) and the iterator’s resume-by-cursor logic rely on this order — an out-of-order insert makes a later lookup silently miss a file that exists.
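The 16-byte layout and the keyed lookup can be checked with a compact sketch. The trk_* names are hypothetical, and the comparator assumes volid is compared before fileid; the text only states that (volid, fileid) is the key, so the field priority is an assumption. The size assertions hold on common ABIs where bool occupies one byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef union
{
  struct { bool is_marked_deleted; bool dummy[7]; } heap;
  int64_t metadata_size_tracker;      /* pins the union at 8 bytes */
} trk_metadata;

typedef struct
{
  int32_t fileid;
  int16_t volid;
  int16_t type;
  trk_metadata metadata;
} trk_item;

_Static_assert (sizeof (trk_metadata) == 8, "metadata union is 8 bytes");
_Static_assert (sizeof (trk_item) == 16, "tracker item is 16 bytes");

/* Key comparison: volume first, then file id (assumed priority).
 * type and metadata do not take part in the key. */
static int trk_compare (const void *a, const void *b)
{
  const trk_item *x = (const trk_item *) a;
  const trk_item *y = (const trk_item *) b;
  if (x->volid != y->volid)
    return (x->volid < y->volid) ? -1 : 1;
  if (x->fileid != y->fileid)
    return (x->fileid < y->fileid) ? -1 : 1;
  return 0;
}

/* Binary search over an ordered array; NULL when the file is not tracked. */
static const trk_item *trk_find (const trk_item *items, size_t n,
                                 int16_t volid, int32_t fileid)
{
  trk_item key = { fileid, volid, 0, { .metadata_size_tracker = 0 } };
  return (const trk_item *) bsearch (&key, items, n, sizeof (trk_item), trk_compare);
}
```

bsearch is only correct on input ordered by the same comparator, which is the practical meaning of the ordering invariant: one misplaced item can hide an existing file from lookup.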

flowchart LR
  parm["boot_Db_parm->trk_vfid"] --> vfid["file_Tracker_vfid"]
  parm --> sticky["sticky first page"]
  sticky --> vpid["file_Tracker_vpid"]
  vpid --> head["FILE_EXTENSIBLE_DATA (head page)"]
  head -->|vpid_next| more["FILE_EXTENSIBLE_DATA (more pages)"]
  head --> items["FILE_TRACK_ITEM[] (volid,fileid,type,metadata)"]

Figure 11-1. Boot parameter to tracker globals to the extensible-data item chain.

11.2.1 `file_tracker_register` — add an item on permanent create

Called from file_create for every permanent file (Ch 6), under a started system op.

// file_tracker_register -- src/storage/file_manager.c
assert (log_check_system_op_is_started (thread_p));
item.volid = vfid->volid; item.fileid = vfid->fileid; item.type = (INT16) ftype;
if (metadata == NULL) item.metadata.metadata_size_tracker = 0;   /* zero-fill */
else                  item.metadata = *metadata;
page_track_head = pgbuf_fix (..., &file_Tracker_vpid, OLD_PAGE, PGBUF_LATCH_WRITE, ...);
if (page_track_head == NULL) { ASSERT_ERROR_AND_SET (error_code); return error_code; }
error_code = file_tracker_register_internal (thread_p, page_track_head, &item);

Placement lives in file_tracker_register_internal and runs in five steps: (1) find a page that is not full (file_extdata_find_not_full); (2) if there is none, allocate a new tracker page with file_alloc (&file_Tracker_vfid, ...) and link it via file_log_extdata_set_next; (3) binary-search the slot; (4) check for a duplicate, which is treated as a bug and trips assert_release (false); (5) file_extdata_insert_at plus file_log_extdata_add, then mark the page dirty. Both error exits and the duplicate path goto exit.
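The slot search, duplicate check, and shifting insert can be sketched for a single fixed-capacity page. This is an illustrative model only: the `sketch_reg_` names, the four-item capacity, and the return codes are hypothetical, and where the real code asserts on a duplicate or allocates a new page the sketch simply returns a code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_REG_CAPACITY = 4 };   /* hypothetical page capacity */
enum { SKETCH_REG_OK = 0, SKETCH_REG_PAGE_FULL = -1, SKETCH_REG_DUPLICATE = -2 };

typedef struct { int16_t volid; int32_t fileid; } sketch_reg_key;
typedef struct { int count; sketch_reg_key items[SKETCH_REG_CAPACITY]; } sketch_reg_page;

/* Assumed key order: volid first, then fileid. */
static int
sketch_reg_compare (const sketch_reg_key *a, const sketch_reg_key *b)
{
  if (a->volid != b->volid)
    return a->volid < b->volid ? -1 : 1;
  if (a->fileid != b->fileid)
    return a->fileid < b->fileid ? -1 : 1;
  return 0;
}

/* Insert key at its binary-search position, keeping the page ordered. */
static int
sketch_reg_insert (sketch_reg_page *page, sketch_reg_key key)
{
  int lo = 0, hi = page->count;

  if (page->count >= SKETCH_REG_CAPACITY)
    return SKETCH_REG_PAGE_FULL;    /* the real code would move to another page */

  while (lo < hi)
    {
      int mid = lo + (hi - lo) / 2;
      int c = sketch_reg_compare (&page->items[mid], &key);

      if (c == 0)
        return SKETCH_REG_DUPLICATE;  /* the real code asserts here */
      if (c < 0)
        lo = mid + 1;
      else
        hi = mid;
    }

  /* Shift the tail one slot to the right and drop the key into the gap. */
  memmove (&page->items[lo + 1], &page->items[lo],
           (size_t) (page->count - lo) * sizeof (sketch_reg_key));
  page->items[lo] = key;
  page->count++;
  return SKETCH_REG_OK;
}
```

Inserting fileids 30, 10, 20 in that order leaves the page holding 10, 20, 30, and a second insert of 20 is rejected without changing the count.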

11.2.2 `file_tracker_unregister` — remove an item on permanent destroy

// file_tracker_unregister -- src/storage/file_manager.c
log_sysop_start (thread_p);                       /* its own nested system op */
item_inout.volid = vfid->volid; item_inout.fileid = vfid->fileid;
error_code = file_extdata_find_and_remove_item (..., &item_inout, file_compare_track_items, true,
                                                &item_inout, &vpid_merged);
if (error_code != NO_ERROR) goto exit;            /* -> sysop_abort */
if (!VPID_ISNULL (&vpid_merged))                  /* removal emptied/merged a page */
  error_code = file_dealloc (thread_p, &file_Tracker_vfid, &vpid_merged, FILE_TRACKER);
exit:
  if (error_code != NO_ERROR) log_sysop_abort (thread_p);
  else log_sysop_end_logical_undo (thread_p, RVFL_TRACKER_UNREGISTER, NULL, sizeof (item_inout), &item_inout);

Branches: if fixing the tracker head page fails, the function returns early with no system op open (that fix is elided from the excerpt above); a failure in find-and-remove, or in the file_dealloc that follows a merge, jumps to exit and aborts the system op; success ends it with a logical undo record. Logical rather than physical undo is the key point: items shift between pages as the chain compacts, so a physical undo would target the wrong slot. The undo handler (file_rv_tracker_unregister_undo) instead replays file_tracker_register_internal from the saved item.
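Why a remembered slot goes stale can be shown with a toy ordered table. The sketch below is a simplification under stated assumptions: a single in-memory array of ids replaces the page chain, and all `sketch_undo_` names are hypothetical. It contrasts an undo that searches for the position again with one that trusts the slot recorded at removal time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_UNDO_CAP = 8 };
typedef struct { int count; int32_t ids[SKETCH_UNDO_CAP]; } sketch_undo_table;

/* First slot whose id is >= the given id (lower bound). */
static int
sketch_undo_slot_for (const sketch_undo_table *t, int32_t id)
{
  int lo = 0, hi = t->count;

  while (lo < hi)
    {
      int mid = lo + (hi - lo) / 2;

      if (t->ids[mid] < id)
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo;
}

static void
sketch_undo_insert_at (sketch_undo_table *t, int slot, int32_t id)
{
  assert (t->count < SKETCH_UNDO_CAP && slot >= 0 && slot <= t->count);
  memmove (&t->ids[slot + 1], &t->ids[slot], (size_t) (t->count - slot) * sizeof (int32_t));
  t->ids[slot] = id;
  t->count++;
}

/* Remove id; returns the slot it occupied, or -1 if it was absent. */
static int
sketch_undo_remove (sketch_undo_table *t, int32_t id)
{
  int slot = sketch_undo_slot_for (t, id);

  if (slot == t->count || t->ids[slot] != id)
    return -1;
  memmove (&t->ids[slot], &t->ids[slot + 1], (size_t) (t->count - slot - 1) * sizeof (int32_t));
  t->count--;
  return slot;
}

/* Logical undo: search for the right position again, then re-insert. */
static void
sketch_undo_logical (sketch_undo_table *t, int32_t id)
{
  sketch_undo_insert_at (t, sketch_undo_slot_for (t, id), id);
}

/* Physical undo: trust the slot remembered at removal time. */
static void
sketch_undo_physical (sketch_undo_table *t, int32_t id, int old_slot)
{
  if (old_slot > t->count)
    old_slot = t->count;
  sketch_undo_insert_at (t, old_slot, id);
}

static int
sketch_undo_is_sorted (const sketch_undo_table *t)
{
  int i;

  for (i = 1; i < t->count; i++)
    if (t->ids[i - 1] > t->ids[i])
      return 0;
  return 1;
}
```

Starting from 10, 20, 30, removing 20 and then 10 leaves only 30. Re-inserting 20 logically yields 20, 30, while re-inserting it at its old slot 1 yields 30, 20 and breaks the ordering that later lookups depend on.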

11.2.3 `file_tracker_map` — enumerate every file

// file_tracker_map -- src/storage/file_manager.c
page_track_head = pgbuf_fix (..., &file_Tracker_vpid, OLD_PAGE, latch_mode, ...);
page_extdata = page_track_head;
while (true) {                                        /* walk the extdata chain */
    extdata = (FILE_EXTENSIBLE_DATA *) page_extdata;  /* table header of the current page */
    for (index_item = 0; index_item < file_extdata_item_count (extdata); index_item++) {
        error_code = func (thread_p, page_extdata, extdata, index_item, &stop, args);
        if (error_code != NO_ERROR || stop) goto exit;   /* error, or callback early-out */
    }
    vpid_next = extdata->vpid_next;                      /* copy before the page is unfixed */
    if (page_track_other != NULL) pgbuf_unfix_and_init (thread_p, page_track_other);
    if (VPID_ISNULL (&vpid_next)) break;                 /* end of chain */
    page_track_other = pgbuf_fix (..., &vpid_next, OLD_PAGE, latch_mode, ...);
    if (page_track_other == NULL) { ASSERT_ERROR_AND_SET (error_code); goto exit; }
    page_extdata = page_track_other;
}

file_tracker_map keeps the head page fixed and rotates a single page_track_other through the rest of the chain, so at most two tracker pages are latched at once. The companion file_tracker_interruptable_iterate instead returns a cursor (vfid) plus an OID lock so a long scan can be interrupted and resumed without pinning the tracker; its FILE_GET_TRACKER_LOCK_MODE macro picks IX_LOCK for B-trees and SCH_S_LOCK otherwise.
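The callback contract of the walk (visit every item, stop on error or when the callback raises `stop`) can be sketched without any page buffer. The `sketch_map_` names, the linked in-memory pages, and the counting callback are all hypothetical; latching is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_MAP_CAP = 4 };

typedef struct { int32_t fileid; int16_t volid; int16_t type; } sketch_map_item;

typedef struct sketch_map_page
{
  const struct sketch_map_page *next;   /* plays the role of vpid_next */
  int count;
  sketch_map_item items[SKETCH_MAP_CAP];
} sketch_map_page;

typedef int (*sketch_map_func) (const sketch_map_item *item, int *stop, void *args);

/* Visit every item of every page until the chain ends, the callback
 * returns an error, or the callback sets *stop. */
static int
sketch_map_walk (const sketch_map_page *head, sketch_map_func func, void *args)
{
  const sketch_map_page *page;
  int stop = 0;
  int i;

  for (page = head; page != NULL; page = page->next)
    for (i = 0; i < page->count; i++)
      {
        int error_code = func (&page->items[i], &stop, args);

        if (error_code != 0 || stop)
          return error_code;
      }
  return 0;
}

/* Example callback: count items of one type, optionally stopping at a limit. */
typedef struct { int16_t wanted_type; int limit; int matches; int visited; } sketch_map_count_args;

static int
sketch_map_count_type (const sketch_map_item *item, int *stop, void *args)
{
  sketch_map_count_args *a = (sketch_map_count_args *) args;

  a->visited++;
  if (item->type == a->wanted_type)
    a->matches++;
  if (a->limit > 0 && a->matches >= a->limit)
    *stop = 1;                          /* early-out, not an error */
  return 0;
}
```

With a limit of zero the callback sees every item in the chain; with a limit of one the walk ends at the first match, which is how a type filter can avoid touching the rest of the tracker.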

11.3 Sticky first page

Some files must keep their first user page at a fixed VPID forever — the tracker itself and the boot HFID heap. file_alloc_sticky_first_page allocates page #1 and records it.

// file_alloc_sticky_first_page -- src/storage/file_manager.c
assert (fhead->n_page_user == 0 && VPID_ISNULL (&fhead->vpid_sticky_first));  /* brand-new file */
error_code = file_alloc (thread_p, vfid, f_init, f_init_args, vpid_out, page_out);
if (error_code != NO_ERROR) goto exit;
log_append_undoredo_data2 (thread_p, RVFL_FHEAD_STICKY_PAGE, NULL, page_fhead, 0,
                           sizeof (VPID), sizeof (VPID), &fhead->vpid_sticky_first, vpid_out);
fhead->vpid_sticky_first = *vpid_out;                      /* remember it */
pgbuf_set_dirty (thread_p, page_fhead, DONT_FREE);

An ordinary file_alloc plus one logged header write (recovered by file_rv_fhead_sticky_page). The payoff is on dealloc: file_dealloc and its helpers assert (!VPID_EQ (&fhead->vpid_sticky_first, vpid)), exempting the sticky page from the Ch 9 lifecycle. This is a debug assertion (compiled out under NDEBUG), not a runtime short-circuit — callers are simply expected never to pass the sticky VPID. file_get_sticky_first_page reads it back (assert_release(false) if NULL); this is how file_tracker_load recovers file_Tracker_vpid.
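The contract can be modeled in a few lines. This is a toy sketch under loud assumptions: the `sketch_sticky_` names are hypothetical, a bump counter stands in for `file_alloc`, and logging is omitted. As in the text, the dealloc side only asserts that the caller did not pass the sticky page.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STICKY_NULL_PAGEID (-1)

typedef struct { int32_t pageid; int16_t volid; } sketch_sticky_vpid;

typedef struct
{
  int n_page_user;
  sketch_sticky_vpid vpid_sticky_first;
  int32_t next_pageid;   /* hypothetical bump allocator, stands in for file_alloc */
} sketch_sticky_fhead;

static int
sketch_sticky_vpid_eq (sketch_sticky_vpid a, sketch_sticky_vpid b)
{
  return a.pageid == b.pageid && a.volid == b.volid;
}

static void
sketch_sticky_init (sketch_sticky_fhead *fhead)
{
  fhead->n_page_user = 0;
  fhead->vpid_sticky_first.pageid = SKETCH_STICKY_NULL_PAGEID;
  fhead->vpid_sticky_first.volid = -1;
  fhead->next_pageid = 1;
}

/* Ordinary allocation: hand out the next page id on volume 0. */
static sketch_sticky_vpid
sketch_sticky_alloc (sketch_sticky_fhead *fhead)
{
  sketch_sticky_vpid vpid;

  vpid.pageid = fhead->next_pageid++;
  vpid.volid = 0;
  fhead->n_page_user++;
  return vpid;
}

/* Sticky allocation: only legal on a brand-new file; remembers the page. */
static sketch_sticky_vpid
sketch_sticky_alloc_first (sketch_sticky_fhead *fhead)
{
  sketch_sticky_vpid vpid;

  assert (fhead->n_page_user == 0
          && fhead->vpid_sticky_first.pageid == SKETCH_STICKY_NULL_PAGEID);
  vpid = sketch_sticky_alloc (fhead);
  fhead->vpid_sticky_first = vpid;
  return vpid;
}

/* Deallocation: a debug-only assertion guards the sticky page. */
static void
sketch_sticky_dealloc (sketch_sticky_fhead *fhead, sketch_sticky_vpid vpid)
{
  assert (!sketch_sticky_vpid_eq (fhead->vpid_sticky_first, vpid));
  fhead->n_page_user--;
}
```

Because the guard is an `assert`, building with NDEBUG removes it, which mirrors the point that release builds rely on callers never passing the sticky VPID.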

11.4 TDE flags — orthogonal to allocation

TDE is two mutually exclusive bits in fhead->file_flags: FILE_FLAG_ENCRYPTED_AES (0x4) and FILE_FLAG_ENCRYPTED_ARIA (0x8).

// file_set_tde_algorithm_internal -- src/storage/file_manager.c
fhead->file_flags &= ~FILE_FLAG_ENCRYPTED_MASK;            /* clear both bits first */
switch (tde_algo) {
  case TDE_ALGORITHM_AES:  fhead->file_flags |= FILE_FLAG_ENCRYPTED_AES;  break;
  case TDE_ALGORITHM_ARIA: fhead->file_flags |= FILE_FLAG_ENCRYPTED_ARIA; break;
  case TDE_ALGORITHM_NONE: break;                          /* already cleared */
}

Neither sector reservation (Ch 4) nor page allocation (Ch 7/8) consults these flags. file_get_tde_algorithm_internal asserts the two bits are never both set, then reports AES, ARIA, or NONE. Encryption is applied per page at the buffer layer; the allocation machinery is algorithm-blind, so TDE is a non-event for a reader who is modifying allocation.
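The clear-then-set pattern and the mutual-exclusion check are easy to exercise in isolation. In this sketch the bit values 0x4 and 0x8 come from the text above; everything else is an assumption: the `sketch_tde_` names are hypothetical, and the mask is simply taken to be the OR of the two bits.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TDE_FLAG_AES   0x4
#define SKETCH_TDE_FLAG_ARIA  0x8
/* Assumed: the mask covers exactly the two encryption bits. */
#define SKETCH_TDE_FLAG_MASK  (SKETCH_TDE_FLAG_AES | SKETCH_TDE_FLAG_ARIA)

typedef enum { SKETCH_TDE_NONE, SKETCH_TDE_AES, SKETCH_TDE_ARIA } sketch_tde_algorithm;

/* Clear both bits first, then set at most one, so the two can never coexist. */
static void
sketch_tde_set (int32_t *file_flags, sketch_tde_algorithm algo)
{
  *file_flags &= ~SKETCH_TDE_FLAG_MASK;
  switch (algo)
    {
    case SKETCH_TDE_AES:
      *file_flags |= SKETCH_TDE_FLAG_AES;
      break;
    case SKETCH_TDE_ARIA:
      *file_flags |= SKETCH_TDE_FLAG_ARIA;
      break;
    case SKETCH_TDE_NONE:
      break;                    /* already cleared */
    }
}

static sketch_tde_algorithm
sketch_tde_get (int32_t file_flags)
{
  /* Both bits set at once would violate the invariant. */
  assert ((file_flags & SKETCH_TDE_FLAG_MASK) != SKETCH_TDE_FLAG_MASK);
  if (file_flags & SKETCH_TDE_FLAG_AES)
    return SKETCH_TDE_AES;
  if (file_flags & SKETCH_TDE_FLAG_ARIA)
    return SKETCH_TDE_ARIA;
  return SKETCH_TDE_NONE;
}
```

Switching a file from AES to ARIA and back to none leaves every unrelated flag bit untouched, which is why the setter masks before it ORs.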

11.5 The shared primitive — `file_extdata_apply_funcs`

Every table in the module (tracker items, user-page table, sector tables) is a FILE_EXTENSIBLE_DATA chain, and almost every walk goes through this generic visitor.

// file_extdata_apply_funcs -- src/storage/file_manager.c
while (true) {
    if (f_extdata != NULL) { error_code = f_extdata (...); if (error_code || stop) goto exit; }  /* per-page */
    if (f_item != NULL)                                                                          /* per-item */
      for (i = 0; i < file_extdata_item_count (extdata_in); i++)
        { error_code = f_item (..., file_extdata_at (extdata_in, i), i, &stop, ...); if (error_code || stop) goto exit; }
    if (VPID_ISNULL (&extdata_in->vpid_next)) break;
    // ... unfix current, fix extdata_in->vpid_next, goto exit on NULL ...
}
exit:
    if (stop && page_out != NULL) *page_out = page_extdata;   /* hand page back, latched */
    else if (page_extdata != NULL) pgbuf_unfix (thread_p, page_extdata);

There are two optional callbacks, f_extdata per page and f_item per item, and either one can stop the walk or return an error. The exit policy is the subtle part: on stop with a non-NULL page_out, the page is handed back still latched so that a search-then-modify caller can act on it; in every other case it is unfixed here. This walk underlies file_extdata_search_item / _find_ordered (binary search), _insert_at / _remove_at, and file_extdata_merge. latch_mode is WRITE when for_write is set, else READ.
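The exit policy is easier to see with latches reduced to a counter. The sketch below is a simplified model, not the real visitor: the `sketch_ext_` names are hypothetical, only the per-item callback is modeled, and the walk fixes the first page itself, which is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_EXT_CAP = 4 };

typedef struct sketch_ext_page
{
  struct sketch_ext_page *next;   /* plays the role of vpid_next */
  int fix_count;                  /* plays the role of a page latch */
  int count;
  int items[SKETCH_EXT_CAP];
} sketch_ext_page;

typedef int (*sketch_ext_item_func) (int item, int index, int *stop, void *args);

/* Walk the chain with one page fixed at a time. On stop with a non-NULL
 * page_out the current page is returned still fixed; otherwise it is unfixed. */
static int
sketch_ext_apply (sketch_ext_page *first, sketch_ext_item_func f_item, void *args,
                  sketch_ext_page **page_out)
{
  sketch_ext_page *page = first;
  int error_code = 0;
  int stop = 0;
  int i;

  if (page_out != NULL)
    *page_out = NULL;
  page->fix_count++;
  while (1)
    {
      for (i = 0; i < page->count; i++)
        {
          error_code = f_item (page->items[i], i, &stop, args);
          if (error_code != 0 || stop)
            goto exit;
        }
      if (page->next == NULL)
        break;
      page->fix_count--;          /* unfix the current page */
      page = page->next;
      page->fix_count++;          /* fix the next one */
    }
exit:
  if (stop && page_out != NULL)
    *page_out = page;             /* hand the page back, still fixed */
  else
    page->fix_count--;
  return error_code;
}

/* Example callback: stop at the first item equal to *(int *) args. */
static int
sketch_ext_match (int item, int index, int *stop, void *args)
{
  (void) index;
  if (item == *(const int *) args)
    *stop = 1;
  return 0;
}
```

A caller that passes `page_out` owns the returned page and must release it after modifying it; a caller that passes NULL gets a walk that leaves nothing fixed behind.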

11.6 Recovery handlers for this chapter, and the open question

Both modules register undo/redo/dump callbacks indexed by the RV* enum. The handlers introduced by this chapter’s machinery:

RV index	Handler(s)	What it replays
`RVFL_FHEAD_STICKY_PAGE`	`file_rv_fhead_sticky_page`	sticky-first-page VPID (11.3)
`RVFL_TRACKER_UNREGISTER`	`file_rv_tracker_unregister_undo`	logical undo of tracker removal (11.2.2)
`RVFL_SET_TDE_ALGORITHM`	`file_rv_set_tde_algorithm`	TDE flag change (11.4)
`RVFL_EXTDATA_ADD/REMOVE/SET_NEXT/MERGE`	`file_rv_extdata_add` / `_remove` / `_set_next` / `_merge`	every extensible-data edit (11.5)
`RVDK_FORMAT`	`disk_rv_undo_format`, `disk_rv_redo_format`, `disk_rv_dump_hdr`	volume create/format — the open question below

The core-lifecycle handlers — sector reserve/unreserve, volume-header expand, file-header alloc/dealloc, partial-sector bitmap edits, postponed dealloc, file destroy — belong to Chapters 3-5, 7, 9; see those for their RVDK_* / RVFL_* rows.

Open question — mid-disk_format crash idempotency. disk_rv_redo_format carries an is_first_call flag (rcv->offset == -1) that skips the disk-cache update on the first of its two calls, so the format handlers encode an implicit assumption about how far disk_format got before a crash. Whether every interleaving (volume file created / cache registered / volume_info written) is covered — notably a crash between the redo’s two calls — is not provable from the handlers alone and is left as a verification target.

11.7 Chapter summary — key takeaways

file_Tempcache recycles whole temp files: get pops a cached list or allocates a shell; put admits a reset file or returns false so the caller destroys it.
A temp file lives on exactly one tracking list; file_temp_preserve pops it off the transaction list for the query manager, drop_tran_temp_files drains the rest at commit/abort.
The File Tracker is a (volid, fileid)-ordered FILE_TRACK_ITEM chain reached via trk_vfid → file_Tracker_vfid / file_Tracker_vpid; unregister uses logical undo because items migrate between pages.
Sticky first pages are lifecycle-exempt by contract — file_dealloc only asserts (debug build) the page is never vpid_sticky_first; release builds rely on callers never passing it.
TDE (mutually exclusive AES/ARIA bits in file_flags) is orthogonal to allocation; the bits are read only when bytes hit disk.
file_extdata_apply_funcs is the one engine behind every table, with per-page/per-item callbacks and an exit policy that can hand the stopped-on page back still latched.
Recovery is indexed by RV* constants; the one unproven corner is disk_rv_*_format idempotency across mid-disk_format crash points.

Position hints as of this revision

The following are line numbers as observed on 2026-06-09; symbols are the canonical anchor and line numbers are hints that decay.

Symbol	File	Line
`bit64_count_trailing_ones`	`src/base/bit.c`	515
`PRM_ID_BOSR_MAXTMP_PAGES`	`src/base/system_parameter.c`	1246
`DB_VOLPURPOSE`	`src/compat/dbtype_def.h`	196
`DB_VOLTYPE`	`src/compat/dbtype_def.h`	203
`VSID`	`src/compat/dbtype_def.h`	939
`fhs_fix_nth_page`	`src/query/query_hash_scan.c`	1078
`disk_volume_header`	`src/storage/disk_manager.c`	75
`disk_cache_volinfo`	`src/storage/disk_manager.c`	155
`disk_extend_info`	`src/storage/disk_manager.c`	162
`disk_perm_info`	`src/storage/disk_manager.c`	180
`disk_temp_info`	`src/storage/disk_manager.c`	186
`nsect_perm_free`	`src/storage/disk_manager.c`	189
`disk_cache`	`src/storage/disk_manager.c`	194
`disk_Cache`	`src/storage/disk_manager.c`	209
`disk_Temp_max_sects`	`src/storage/disk_manager.c`	211
`DISK_STAB_UNIT`	`src/storage/disk_manager.c`	224
`disk_stab_cursor`	`src/storage/disk_manager.c`	229
`DISK_STAB_PAGE_BIT_COUNT`	`src/storage/disk_manager.c`	250
`DISK_ALLOCTBL_SECTOR_PAGE_OFFSET`	`src/storage/disk_manager.c`	253
`DISK_ALLOCTBL_SECTOR_UNIT_OFFSET`	`src/storage/disk_manager.c`	255
`DISK_STAB_NPAGES`	`src/storage/disk_manager.c`	263
`disk_cache_vol_reserve`	`src/storage/disk_manager.c`	273
`DISK_PRERESERVE_BUF_DEFAULT`	`src/storage/disk_manager.c`	278
`disk_reserve_context`	`src/storage/disk_manager.c`	281
`DISK_MIN_VOLUME_SECTS`	`src/storage/disk_manager.c`	300
`DISK_SYS_NSECT_SIZE`	`src/storage/disk_manager.c`	347
`disk_format`	`src/storage/disk_manager.c`	512
`disk_unformat`	`src/storage/disk_manager.c`	822
`disk_rv_undo_format`	`src/storage/disk_manager.c`	1235
`disk_rv_redo_format`	`src/storage/disk_manager.c`	1340
`disk_extend`	`src/storage/disk_manager.c`	1633
`disk_volume_expand`	`src/storage/disk_manager.c`	1904
`disk_rv_volhead_extend_redo`	`src/storage/disk_manager.c`	2022
`disk_rv_volhead_extend_undo`	`src/storage/disk_manager.c`	2081
`disk_add_volume`	`src/storage/disk_manager.c`	2117
`disk_add_volume_extension`	`src/storage/disk_manager.c`	2326
`disk_volume_boot`	`src/storage/disk_manager.c`	2443
`disk_cache_load_volume`	`src/storage/disk_manager.c`	2567
`disk_cache_init`	`src/storage/disk_manager.c`	2627
`disk_cache_final`	`src/storage/disk_manager.c`	2688
`disk_cache_load_all_volumes`	`src/storage/disk_manager.c`	2714
`disk_cache_free_reserved`	`src/storage/disk_manager.c`	2728
`disk_cache_update_vol_free`	`src/storage/disk_manager.c`	2748
`disk_lock_extend`	`src/storage/disk_manager.c`	2791
`disk_unlock_extend`	`src/storage/disk_manager.c`	2817
`disk_cache_lock_reserve_for_purpose`	`src/storage/disk_manager.c`	2837
`disk_volume_header_set_stab`	`src/storage/disk_manager.c`	3166
`disk_verify_volume_header`	`src/storage/disk_manager.c`	3179
`disk_stab_cursor_set_at_sectid`	`src/storage/disk_manager.c`	3258
`disk_stab_cursor_set_at_end`	`src/storage/disk_manager.c`	3284
`disk_stab_cursor_set_at_start`	`src/storage/disk_manager.c`	3303
`disk_stab_cursor_check_valid`	`src/storage/disk_manager.c`	3372
`disk_stab_cursor_is_bit_set`	`src/storage/disk_manager.c`	3414
`disk_stab_cursor_set_bit`	`src/storage/disk_manager.c`	3429
`disk_stab_cursor_fix`	`src/storage/disk_manager.c`	3493
`disk_stab_unit_reserve`	`src/storage/disk_manager.c`	3544
`disk_stab_iterate_units`	`src/storage/disk_manager.c`	3665
`disk_stab_iterate_units_all`	`src/storage/disk_manager.c`	3738
`disk_stab_set_bits_contiguous`	`src/storage/disk_manager.c`	3807
`disk_rv_reserve_sectors`	`src/storage/disk_manager.c`	3899
`disk_rv_unreserve_sectors`	`src/storage/disk_manager.c`	3982
`disk_reserve_sectors_in_volume`	`src/storage/disk_manager.c`	4066
`disk_reserve_sectors`	`src/storage/disk_manager.c`	4290
`disk_reserve_from_cache`	`src/storage/disk_manager.c`	4463
`disk_reserve_from_cache_vols`	`src/storage/disk_manager.c`	4612
`disk_reserve_from_cache_volume`	`src/storage/disk_manager.c`	4666
`disk_unreserve_ordered_sectors`	`src/storage/disk_manager.c`	4703
`disk_unreserve_ordered_sectors_without_csect`	`src/storage/disk_manager.c`	4735
`disk_unreserve_sectors_from_volume`	`src/storage/disk_manager.c`	4794
`disk_stab_unit_unreserve`	`src/storage/disk_manager.c`	4848
`disk_stab_init`	`src/storage/disk_manager.c`	4909
`disk_manager_init`	`src/storage/disk_manager.c`	5002
`disk_manager_final`	`src/storage/disk_manager.c`	5044
`disk_format_first_volume`	`src/storage/disk_manager.c`	5062
`disk_sectors_to_extend_npages`	`src/storage/disk_manager.c`	6845
`DISK_VOLHEADER_PAGE`	`src/storage/disk_manager.h`	35
`fileio_map_mounted`	`src/storage/file_io.c`	3448
`file_header`	`src/storage/file_manager.c`	90
`n_page_mark_delete`	`src/storage/file_manager.c`	104
`volid_last_expand`	`src/storage/file_manager.c`	117
`vpid_sticky_first`	`src/storage/file_manager.c`	123
`vpid_last_temp_alloc`	`src/storage/file_manager.c`	132
`offset_to_last_temp_alloc`	`src/storage/file_manager.c`	133
`vpid_last_user_page_ftab`	`src/storage/file_manager.c`	139
`vpid_find_nth_last`	`src/storage/file_manager.c`	156
`first_index_find_nth_last`	`src/storage/file_manager.c`	157
`FILE_HEADER_ALIGNED_SIZE`	`src/storage/file_manager.c`	167
`FILE_FLAG_NUMERABLE`	`src/storage/file_manager.c`	170
`FILE_FLAG_ENCRYPTED_AES`	`src/storage/file_manager.c`	172
`FILE_FLAG_ENCRYPTED_ARIA`	`src/storage/file_manager.c`	173
`FILE_CACHE_LAST_FIND_NTH`	`src/storage/file_manager.c`	181
`FILE_HEADER_GET_PART_FTAB`	`src/storage/file_manager.c`	199
`FILE_HEADER_GET_FULL_FTAB`	`src/storage/file_manager.c`	203
`FILE_HEADER_GET_USER_PAGE_FTAB`	`src/storage/file_manager.c`	208
`file_extensible_data`	`src/storage/file_manager.c`	232
`FILE_EXTDATA_HEADER_ALIGNED_SIZE`	`src/storage/file_manager.c`	240
`FILE_TABLESPACE_FOR_PERM_NPAGES`	`src/storage/file_manager.c`	281
`FILE_TABLESPACE_FOR_TEMP_NPAGES`	`src/storage/file_manager.c`	287
`file_vsid_collector`	`src/storage/file_manager.c`	296
`file_alloc_type`	`src/storage/file_manager.c`	388
`FILE_USER_PAGE_MARK_DELETE_FLAG`	`src/storage/file_manager.c`	425
`FILE_USER_PAGE_IS_MARKED_DELETED`	`src/storage/file_manager.c`	426
`FILE_USER_PAGE_MARK_DELETED`	`src/storage/file_manager.c`	427
`FILE_USER_PAGE_CLEAR_MARK_DELETED`	`src/storage/file_manager.c`	428
`file_find_nth_context`	`src/storage/file_manager.c`	433
`file_tempcache_entry`	`src/storage/file_manager.c`	448
`file_tempcache_tran_entry`	`src/storage/file_manager.c`	457
`file_tempcache`	`src/storage/file_manager.c`	467
`file_Tempcache`	`src/storage/file_manager.c`	490
`file_Tracker_vfid`	`src/storage/file_manager.c`	496
`file_Tracker_vpid`	`src/storage/file_manager.c`	497
`file_track_metadata`	`src/storage/file_manager.c`	507
`file_track_item`	`src/storage/file_manager.c`	515
`file_manager_init`	`src/storage/file_manager.c`	859
`file_manager_final`	`src/storage/file_manager.c`	872
`file_header_alloc`	`src/storage/file_manager.c`	1093
`file_header_update_mark_deleted`	`src/storage/file_manager.c`	1317
`file_extdata_init`	`src/storage/file_manager.c`	1492
`file_extdata_max_size`	`src/storage/file_manager.c`	1520
`file_extdata_apply_funcs`	`src/storage/file_manager.c`	1886
`file_extdata_find_and_remove_item`	`src/storage/file_manager.c`	2571
`file_partsect_is_full`	`src/storage/file_manager.c`	2758
`file_partsect_is_empty`	`src/storage/file_manager.c`	2770
`file_partsect_set_bit`	`src/storage/file_manager.c`	2796
`file_partsect_pageid_to_offset`	`src/storage/file_manager.c`	2826
`file_partsect_alloc`	`src/storage/file_manager.c`	2847
`file_create_with_npages`	`src/storage/file_manager.c`	3101
`file_create_heap`	`src/storage/file_manager.c`	3126
`file_create_temp_internal`	`src/storage/file_manager.c`	3155
`file_create_temp`	`src/storage/file_manager.c`	3217
`file_create_temp_numerable`	`src/storage/file_manager.c`	3231
`file_create_query_area`	`src/storage/file_manager.c`	3244
`file_create_ehash`	`src/storage/file_manager.c`	3261
`file_create_ehash_dir`	`src/storage/file_manager.c`	3285
`file_create`	`src/storage/file_manager.c`	3311
`file_table_collect_vsid`	`src/storage/file_manager.c`	3915
`file_table_collect_all_vsids`	`src/storage/file_manager.c`	3934
`file_destroy`	`src/storage/file_manager.c`	4121
`file_temp_retire_preserved`	`src/storage/file_manager.c`	4445
`file_temp_retire_internal`	`src/storage/file_manager.c`	4476
`file_perm_expand`	`src/storage/file_manager.c`	4644
`file_table_move_partial_sectors_to_header`	`src/storage/file_manager.c`	4772
`file_table_append_full_sector_page`	`src/storage/file_manager.c`	4976
`file_table_add_full_sector`	`src/storage/file_manager.c`	5026
`file_perm_alloc`	`src/storage/file_manager.c`	5166
`file_alloc`	`src/storage/file_manager.c`	5405
`file_alloc_sticky_first_page`	`src/storage/file_manager.c`	5681
`file_rv_fhead_sticky_page`	`src/storage/file_manager.c`	5753
`file_get_sticky_first_page`	`src/storage/file_manager.c`	5779
`file_set_tde_algorithm_internal`	`src/storage/file_manager.c`	5896
`file_get_tde_algorithm_internal`	`src/storage/file_manager.c`	5963
`file_dealloc`	`src/storage/file_manager.c`	6116
`file_perm_dealloc`	`src/storage/file_manager.c`	6309
`file_rv_dealloc_internal`	`src/storage/file_manager.c`	6616
`file_rv_dealloc_on_undo`	`src/storage/file_manager.c`	6758
`file_rv_dealloc_on_postpone`	`src/storage/file_manager.c`	6773
`file_numerable_add_page`	`src/storage/file_manager.c`	7935
`file_extdata_find_nth_vpid`	`src/storage/file_manager.c`	8119
`file_extdata_find_nth_vpid_and_skip_marked`	`src/storage/file_manager.c`	8153
`file_numerable_find_nth`	`src/storage/file_manager.c`	8193
`file_rv_user_page_mark_delete`	`src/storage/file_manager.c`	8381
`file_rv_user_page_unmark_delete_logical`	`src/storage/file_manager.c`	8406
`file_numerable_truncate`	`src/storage/file_manager.c`	8577
`file_temp_alloc`	`src/storage/file_manager.c`	8650
`disk_reserve_sectors` (call site)	`src/storage/file_manager.c`	8715
`file_temp_reset_user_pages`	`src/storage/file_manager.c`	8949
`file_temp_preserve`	`src/storage/file_manager.c`	9143
`file_tempcache_init`	`src/storage/file_manager.c`	9171
`file_tempcache_final`	`src/storage/file_manager.c`	9234
`file_tempcache_get`	`src/storage/file_manager.c`	9414
`file_tempcache_put`	`src/storage/file_manager.c`	9541
`file_tempcache_drop_tran_temp_files`	`src/storage/file_manager.c`	9645
`file_tempcache_cache_or_drop_entries`	`src/storage/file_manager.c`	9664
`file_tempcache_pop_tran_file`	`src/storage/file_manager.c`	9702
`file_tracker_create`	`src/storage/file_manager.c`	9861
`file_tracker_load`	`src/storage/file_manager.c`	9910
`file_tracker_register`	`src/storage/file_manager.c`	9960
`file_tracker_register_internal`	`src/storage/file_manager.c`	10016
`file_tracker_unregister`	`src/storage/file_manager.c`	10113
`file_tracker_map`	`src/storage/file_manager.c`	10306
`file_tracker_interruptable_iterate`	`src/storage/file_manager.c`	10992
`file_heap_des`	`src/storage/file_manager.h`	82
`file_btree_des`	`src/storage/file_manager.h`	98
`file_ovf_btree_des`	`src/storage/file_manager.h`	106
`FILE_DESCRIPTORS_SIZE`	`src/storage/file_manager.h`	128
`file_descriptors`	`src/storage/file_manager.h`	130
`file_tablespace`	`src/storage/file_manager.h`	143
`FILE_ALLOC_BITMAP`	`src/storage/file_manager.h`	153
`FILE_FULL_PAGE_BITMAP`	`src/storage/file_manager.h`	154
`FILE_ALLOC_BITMAP_NBITS`	`src/storage/file_manager.h`	157
`file_partial_sector`	`src/storage/file_manager.h`	162
`pgbuf_dealloc_page`	`src/storage/page_buffer.c`	14562
`DISK_SECTOR_NPAGES`	`src/storage/storage_common.h`	109
`trk_vfid`	`src/transaction/boot_sr.c`	119
`LOG_MAX_DBVOLID`	`src/transaction/log_volids.hpp`	34

Sources

cubrid-disk-manager.md — the high-level companion (covers both file and disk managers).
Raw analyses under raw/code-analysis/cubrid/storage/disk_manager/ and the numerable-file Q&A note raw/code-analysis/cubrid/file-manager-numerable-qa.md.
Code: src/storage/file_manager.{c,h}, src/storage/disk_manager.{c,h}.
Methodology: knowledge/methodology/code-analysis-detail-doc.md.