PostgreSQL pg_waldump — WAL Decoding and Inspection Utility

A write-ahead log is opaque by design: it is a binary byte stream optimized for sequential append, not for human inspection. The correctness guarantee it provides — “if these bytes survived, these changes survived” — is valuable precisely because the engine does not need to interpret the bytes during normal operation. But that opacity is a liability the moment something goes wrong. A DBA who sees a standby falling behind, an engineer debugging an unexpected VACUUM stall, or a developer writing a new extension that emits WAL records all need the same thing: a way to read the log and understand what is in it.

The general problem is WAL introspection: given a physical log file on disk, produce a human-readable rendering of each record’s identity, position, size, and payload summary. Three properties must hold for this to be safe and useful.

Offline read-only access. The tool must be able to read a WAL segment without the database being online — the primary use case is post-mortem analysis of a crashed server or inspection of an archived WAL stream that is not attached to a live instance. This means the tool cannot call into the buffer manager, shared memory, lock manager, or any other infrastructure that assumes a running server. It must be a pure reader operating on files.

Faithful record decoding. A WAL record is not self-describing prose; its payload is a binary blob whose layout is defined by the resource manager (rmgr) that wrote it. A Heap INSERT record, a B-tree page-split record, and a checkpoint record all have the same wire framing but different payloads. The introspection tool must be able to invoke the same desc / identify logic that the recovery driver would use, but without executing the record’s redo function. This requires the rmgr table to be present in the tool, with its description callbacks but without its recovery logic.

Position and size accounting. Operators debugging replication lag or checkpoint pressure need more than record text. They need aggregate counts and sizes: how many bytes does the Heap rmgr consume per WAL segment? What fraction is full-page images? Which record type generates the most volume? This calls for a statistics accumulation mode layered on top of the basic record scan.

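These three properties shape pg_waldump's design: offline file access and faithful decoding underpin the plain record display, and position and size accounting is the statistics mode (--stats). A third mode, --follow, keeps reading as new WAL arrives.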

The ARIES paper (Mohan et al. 1992) defines the log as a sequence of self-describing records linked by prevLSN back-pointers. The back-pointer is what makes sequential backward traversal possible. pg_waldump exploits the forward-link structure (each record carries its own length, so the next record starts at the current LSN plus the padded record length) and uses the xl_prev back-pointer (XLogRecGetPrev) to report the predecessor LSN in each display line. See postgres-xlog-wal.md for the record framing and postgres-wal-records-rmgr.md for the rmgr dispatch table.
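The forward-link arithmetic can be sketched in a few lines. This is a minimal illustration, not the xlogreader code: the names align_up and next_record_lsn are hypothetical, the 8-byte alignment is an assumption (PostgreSQL uses MAXALIGN, which is 8 on common platforms), and the sketch ignores the page headers a real record may cross.

```c
#include <stdint.h>

/* Assumed record alignment; stands in for PostgreSQL's MAXALIGN (often 8). */
#define SKETCH_RECORD_ALIGN 8u

/* Round len up to the next multiple of SKETCH_RECORD_ALIGN. */
static uint64_t
align_up(uint64_t len)
{
    return (len + (SKETCH_RECORD_ALIGN - 1)) &
        ~(uint64_t) (SKETCH_RECORD_ALIGN - 1);
}

/*
 * Start of the record that follows one beginning at lsn with total
 * length tot_len. Simplification: page headers are not accounted for.
 */
static uint64_t
next_record_lsn(uint64_t lsn, uint32_t tot_len)
{
    return lsn + align_up(tot_len);
}
```

A 53-byte record is padded to 56 bytes, so its successor starts 56 bytes later in this simplified model.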

Every production-quality DBMS ships a WAL inspection tool. They converge on a small set of architectural choices that are worth naming before looking at PostgreSQL’s specific implementation.

Frontend-mode binary with shared library support

The standard pattern is a standalone binary compiled from the same source tree as the server, sharing the WAL-reader and rmgr-description code but excluding the recovery executor and all server-side infrastructure. MySQL/InnoDB ships mysqlbinlog (operates on binlog files, which are MySQL’s logical WAL analog). Oracle ships LogMiner, which runs as a server-side package rather than an external binary. SQL Server ships fn_dblog and fn_dump_dblog as T-SQL table-valued functions callable from a live connection. PostgreSQL’s design choice — an external binary that links against the same C library used by the server but runs entirely as a frontend process — sits between Oracle’s fully server-integrated approach and a hypothetical third-party parser.

A WAL reader that works both inside the server (reading from WAL buffers or streaming-replication receive buffers) and outside it (reading from segment files on disk) cannot hard-code the I/O path. The universal solution is a page-read callback supplied by the caller. Inside the server, the callback reads from shared WAL buffers. Outside the server, the callback opens segment files directly. The WAL reader’s core loop — record framing, CRC checking, block-reference decoding — is the same in both cases. pg_waldump supplies its own WALDumpReadPage callback for the file-based path.
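The callback pattern can be shown with a toy reader. Everything here is hypothetical (SketchReader, page_read_fn, MemSource, mem_page_read) and far simpler than XLogReaderState; the point is only that the core loop never knows where pages come from.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* stand-in for XLOG_BLCKSZ */

typedef struct SketchReader SketchReader;

/* Caller-supplied page source: returns bytes copied into buf, or -1 at end. */
typedef int (*page_read_fn) (SketchReader *r, size_t page_off, char *buf);

struct SketchReader
{
    page_read_fn page_read;     /* the only I/O the core loop performs */
    void       *private_data;   /* opaque state owned by the callback */
    char        page[SKETCH_BLCKSZ];
};

/* Source-agnostic core loop: counts pages until the callback reports end. */
static int
count_pages(SketchReader *r)
{
    int         n = 0;
    size_t      off = 0;

    while (r->page_read(r, off, r->page) > 0)
    {
        n++;
        off += SKETCH_BLCKSZ;
    }
    return n;
}

/* One possible callback: serve pages from an in-memory buffer. */
typedef struct
{
    const char *data;
    size_t      len;
} MemSource;

static int
mem_page_read(SketchReader *r, size_t page_off, char *buf)
{
    MemSource  *src = r->private_data;
    size_t      n;

    if (page_off >= src->len)
        return -1;
    n = src->len - page_off;
    if (n > SKETCH_BLCKSZ)
        n = SKETCH_BLCKSZ;
    memcpy(buf, src->data + page_off, n);
    return (int) n;
}
```

Swapping mem_page_read for a callback that reads segment files changes nothing in count_pages, which is the property the shared WAL reader relies on.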

Raw WAL output is voluminous. A WAL segment defaults to 16 MB and can contain thousands of records. Every inspection tool offers filters to narrow the output: by time range (Oracle LogMiner START_TIME/END_TIME), by object (OBJECT_NAME), by transaction. pg_waldump exposes filters by rmgr, XID, relation (tablespace OID / database OID / relation filenode), block number, fork, and full-page-write presence. The filter is applied in a tight main loop after each successful XLogReadRecord; matching records are displayed, skipped records advance the loop with no output.

Statistics mode as an independent output path

An aggregate view of WAL content — “how many bytes does each record type contribute over this segment?” — is qualitatively different from per-record display. It requires accumulating counts and sizes across the scan, then printing a summary table. Separating the accumulation path from the display path keeps both clean: the display path is stateless (one record in, one line out), while the statistics path is stateful (N records in, one table out). pg_waldump implements this split with XLogDumpDisplayRecord and XLogDumpDisplayStats as two mutually exclusive output paths inside the same main loop.

pg_waldump is a single-file frontend binary (src/bin/pg_waldump/pg_waldump.c, 1323 lines). Its companion rmgrdesc.c provides the RmgrDescTable — a stripped-down version of the server-side rmgr table that carries only the rm_name, rm_desc, and rm_identify callbacks, not the redo or startup/cleanup callbacks. The binary links against libpgcommon and libpgport for utility functions but against none of the server backend libraries.

flowchart TB
    CLI["main()<br/>option parsing<br/>XLogDumpConfig + XLogDumpPrivate"]
    DIR["identify_target_directory()<br/>locate WAL segment dir<br/>probe ., pg_wal, $PGDATA/pg_wal"]
    ALLOC["XLogReaderAllocate()<br/>WalSegSz, waldir<br/>XL_ROUTINE callbacks"]
    FIND["XLogFindNextRecord()<br/>scan forward to first valid record"]
    LOOP["main loop<br/>XLogReadRecord()"]
    FILTER{"filters pass?<br/>rmgr / xid / relation<br/>block / fork / fpw"}
    DISPLAY["XLogDumpDisplayRecord()<br/>rm_name + rm_identify + rm_desc<br/>+ XLogRecGetBlockRefInfo"]
    STATS["XLogRecStoreStats()<br/>accumulate XLogStats"]
    FPW["XLogRecordSaveFPWs()<br/>RestoreBlockImage → file"]
    SUMMARY["XLogDumpDisplayStats()<br/>per-rmgr/record table"]
    END["XLogReaderFree()"]

    CLI --> DIR --> ALLOC --> FIND --> LOOP
    LOOP --> FILTER
    FILTER -->|"no"| LOOP
    FILTER -->|"yes, --stats"| STATS --> FPW --> LOOP
    FILTER -->|"yes, display"| DISPLAY --> FPW --> LOOP
    LOOP -->|"end or --limit"| SUMMARY --> END

Figure 1 — pg_waldump main control flow. The two output paths (display and stats) are mutually exclusive inside the loop; --save-fullpage runs on every matching record regardless of mode.

Startup: locating WAL and reading the segment header

main calls identify_target_directory to find the directory that contains WAL segment files. The search order is: (1) the explicit --path argument; (2) that path plus /pg_wal; (3) if no --path, the current directory, then pg_wal, then $PGDATA/pg_wal. For each candidate, search_directory opens a file matching the WAL filename pattern and reads the first 8 KB page to extract WalSegSz from the XLogLongPageHeader:

// search_directory — pg_waldump.c (condensed)
r = read(fd, buf.data, XLOG_BLCKSZ);
if (r == XLOG_BLCKSZ)
{
    XLogLongPageHeader longhdr = (XLogLongPageHeader) buf.data;

    WalSegSz = longhdr->xlp_seg_size;
    if (!IsValidWalSegSize(WalSegSz))
        /* error: not a power of two between 1 MB and 1 GB */
        exit(1);
}

IsValidWalSegSize checks that the size is a power of two in [1 MB, 1 GB]. The segment size is a compile-time default (16 MB) but is overridable at initdb time, so pg_waldump must discover it from the file rather than assuming the compiled default. This matters for clusters initialized with a non-default --wal-segsize.
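The validity rule is small enough to restate as standalone code. This is a sketch of the check as described above, not the server macro itself; the function and constant names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIN_SEG_SIZE (1024u * 1024u)             /* 1 MB */
#define SKETCH_MAX_SEG_SIZE (1024u * 1024u * 1024u)     /* 1 GB */

/* True when exactly one bit is set. */
static bool
is_power_of_2(uint32_t x)
{
    return x > 0 && (x & (x - 1)) == 0;
}

/* Power of two within [1 MB, 1 GB], mirroring the rule in the text. */
static bool
is_valid_wal_seg_size(uint32_t size)
{
    return is_power_of_2(size) &&
        size >= SKETCH_MIN_SEG_SIZE &&
        size <= SKETCH_MAX_SEG_SIZE;
}
```

The default 16 MB passes, while 3 MB (not a power of two) and 512 kB (below the minimum) are rejected.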

XLogReaderState initialization and the three callbacks

After locating the WAL directory and segment size, main calls XLogReaderAllocate to create an XLogReaderState. The three file-I/O callbacks are supplied via the XL_ROUTINE macro, which initializes a compound literal of type XLogReaderRoutine:

// main — pg_waldump.c (condensed)
xlogreader_state =
    XLogReaderAllocate(WalSegSz, waldir,
                       XL_ROUTINE(.page_read = WALDumpReadPage,
                                  .segment_open = WALDumpOpenSegment,
                                  .segment_close = WALDumpCloseSegment),
                       &private);

WALDumpOpenSegment builds the segment filename from (tli, segno, segsize) via XLogFileName and opens it read-only. In --follow mode it retries up to 10 times with 500 ms sleeps, a bounded sleep-and-retry (not a busy-wait) that covers the gap between the server finishing the previous segment and creating the next one.
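How (tli, segno, segsize) turn into a 24-character filename can be sketched as follows. The helper name wal_file_name is hypothetical. The layout (three 8-digit hex fields, with the segment number split by the number of segments per 4 GB of WAL) is an assumption based on the conventional PostgreSQL naming, not a copy of XLogFileName.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a WAL segment file name into buf.
 * Assumption: one "xlogid" spans 4 GB, so it holds 2^32 / seg_size segments.
 */
static void
wal_file_name(char *buf, size_t buflen,
              uint32_t tli, uint64_t segno, uint32_t seg_size)
{
    uint64_t    segs_per_id;

    assert(seg_size != 0);
    segs_per_id = UINT64_C(0x100000000) / seg_size;

    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / segs_per_id),
             (unsigned) (segno % segs_per_id));
}
```

With 16 MB segments there are 256 segments per xlogid, so segment 255 ends in 000000FF and segment 256 rolls the middle field over to 00000001.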

WALDumpReadPage is the core I/O callback — pg_waldump’s file-based equivalent of the server’s read_local_xlog_page. It checks whether the request falls within the configured end-pointer (private.endptr), computes how many bytes to read, and calls WALRead. If WALRead fails it reports the segment filename and offset from the WALReadError struct. A read that exceeds endptr sets private.endptr_reached = true and returns -1 to stop the loop:

// WALDumpReadPage — pg_waldump.c (condensed)
static int
WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
                XLogRecPtr targetPtr, char *readBuff)
{
    XLogDumpPrivate *private = state->private_data;
    int         count = XLOG_BLCKSZ;
    WALReadError errinfo;

    if (private->endptr != InvalidXLogRecPtr)
    {
        if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
            count = XLOG_BLCKSZ;
        else if (targetPagePtr + reqLen <= private->endptr)
            count = private->endptr - targetPagePtr;
        else
        {
            private->endptr_reached = true;
            return -1;          /* stops the main loop */
        }
    }

    if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
                 &errinfo))
    {
        WALOpenSegment *seg = &errinfo.wre_seg;
        char        fname[MAXPGPATH];

        XLogFileName(fname, seg->ws_tli, seg->ws_segno,
                     state->segcxt.ws_segsize);
        /* pg_fatal with fname + errinfo.wre_off ... */
    }

    return count;
}

The three branches on endptr matter: a full page is read when the whole page lies at or before the boundary; a partial page (count = endptr - targetPagePtr) is read when the page straddles the boundary but the requested reqLen bytes still end at or before it; and a request that would extend past the boundary sets endptr_reached and returns -1. WALRead (shared with the server) handles the segment_open/segment_close calls into WALDumpOpenSegment / WALDumpCloseSegment under the hood, so the callback never opens files itself.
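The endptr decision can be isolated as a pure function, which makes the three cases easy to check by hand. This is a sketch with hypothetical names (bytes_to_read, SKETCH_BLCKSZ); it uses 0 as the "no end pointer" sentinel, standing in for InvalidXLogRecPtr.

```c
#include <stdint.h>

#define SKETCH_BLCKSZ 8192      /* stand-in for XLOG_BLCKSZ */

/*
 * How many bytes of the page at page_ptr may be read, given that the
 * caller needs req_len of them and reading must stop at end_ptr.
 * Returns -1 when the request would extend past end_ptr.
 * end_ptr == 0 means "no limit" in this sketch.
 */
static int
bytes_to_read(uint64_t page_ptr, int req_len, uint64_t end_ptr)
{
    if (end_ptr == 0)
        return SKETCH_BLCKSZ;                   /* unbounded scan */
    if (page_ptr + SKETCH_BLCKSZ <= end_ptr)
        return SKETCH_BLCKSZ;                   /* whole page before boundary */
    if (page_ptr + (uint64_t) req_len <= end_ptr)
        return (int) (end_ptr - page_ptr);      /* partial page up to boundary */
    return -1;                                  /* boundary reached */
}
```

For a page at offset 8192 with the boundary 500 bytes in, a 100-byte request yields a 500-byte read, while a 600-byte request returns -1.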

WALDumpCloseSegment simply closes the file descriptor and resets it to -1.

After XLogFindNextRecord positions the reader at the first valid record at or after the requested start LSN, the main loop calls XLogReadRecord repeatedly:

// main — pg_waldump.c (condensed)
for (;;)
{
    if (time_to_stop)
        break;                  /* SIGINT handler set this */

    record = XLogReadRecord(xlogreader_state, &errormsg);
    if (!record)
    {
        if (!config.follow || private.endptr_reached)
            break;
        pg_usleep(1000000L);    /* follow mode: sleep 1 s and retry */
        continue;
    }

    /* apply filters */
    if (config.filter_by_rmgr_enabled &&
        !config.filter_by_rmgr[record->xl_rmid])
        continue;
    if (config.filter_by_xid_enabled &&
        config.filter_by_xid != record->xl_xid)
        continue;
    if (config.filter_by_extended &&
        !XLogRecordMatchesRelationBlock(...))
        continue;
    if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
        continue;

    /* output */
    if (!config.quiet)
    {
        if (config.stats)
        {
            XLogRecStoreStats(&stats, xlogreader_state);
            stats.endptr = ...;
        }
        else
            XLogDumpDisplayRecord(&config, xlogreader_state);
    }

    if (config.save_fullpage_path)
        XLogRecordSaveFPWs(xlogreader_state, config.save_fullpage_path);

    config.already_displayed_records++;
    if (config.stop_after_records > 0 &&
        config.already_displayed_records >= config.stop_after_records)
        break;
}

XLogReadRecord handles page boundaries, record spanning, CRC verification, and back-link validation internally. pg_waldump’s loop sees only a decoded XLogRecord * (or NULL on end-of-WAL or error) and the unpacked block references in the XLogReaderState.

The real per-record dispatch — the heart of the loop, after filters pass — selects between the stats and display output paths and then optionally extracts full-page images. This is the verbatim shape of the body:

// main (loop body) — pg_waldump.c (verbatim)
/* perform any per-record work */
if (!config.quiet)
{
    if (config.stats == true)
    {
        XLogRecStoreStats(&stats, xlogreader_state);
        stats.endptr = xlogreader_state->EndRecPtr;
    }
    else
        XLogDumpDisplayRecord(&config, xlogreader_state);
}

/* save full pages if requested */
if (config.save_fullpage_path != NULL)
    XLogRecordSaveFPWs(xlogreader_state, config.save_fullpage_path);

/* check whether we printed enough */
config.already_displayed_records++;
if (config.stop_after_records > 0 &&
    config.already_displayed_records >= config.stop_after_records)
    break;

Two details are load-bearing. First, stats.endptr is refreshed on every stats record (not once at the end) because the loop may stop early via --limit, and the summary table’s byte-range header must reflect the last record actually counted. Second, XLogRecordSaveFPWs is outside the if (!config.quiet) block, so --save-fullpage --quiet still extracts images while suppressing text — a deliberate decoupling of the FPI side-channel from the display path.

flowchart TB
    TOP["loop top"]
    SIG{"time_to_stop?<br/>SIGINT handler"}
    READ["record = XLogReadRecord()"]
    NULLQ{"record == NULL?"}
    FOLLOW{"follow and not<br/>endptr_reached?"}
    SLEEP["pg_usleep 1 s<br/>continue"]
    EXIT["break loop"]
    FRMGR{"rmgr filter pass?"}
    FXID{"xid filter pass?"}
    FEXT{"relation/block/fork<br/>filter pass?"}
    FFPW{"fpw filter pass?"}
    OUT{"stats mode?"}
    STORE["XLogRecStoreStats()<br/>stats.endptr = EndRecPtr"]
    DISP["XLogDumpDisplayRecord()"]
    FPW["save_fullpage_path?<br/>XLogRecordSaveFPWs()"]
    LIMIT{"stop_after_records<br/>reached?"}

    TOP --> SIG
    SIG -->|"yes"| EXIT
    SIG -->|"no"| READ --> NULLQ
    NULLQ -->|"yes"| FOLLOW
    FOLLOW -->|"yes"| SLEEP --> TOP
    FOLLOW -->|"no"| EXIT
    NULLQ -->|"no"| FRMGR
    FRMGR -->|"no"| TOP
    FRMGR -->|"yes"| FXID
    FXID -->|"no"| TOP
    FXID -->|"yes"| FEXT
    FEXT -->|"no"| TOP
    FEXT -->|"yes"| FFPW
    FFPW -->|"no"| TOP
    FFPW -->|"yes"| OUT
    OUT -->|"yes"| STORE --> FPW
    OUT -->|"no, display"| DISP --> FPW
    FPW --> LIMIT
    LIMIT -->|"yes"| EXIT
    LIMIT -->|"no"| TOP

Figure 2 — the decode loop. Each iteration reads one record, runs the four filter gates in fixed order (rmgr, xid, extended relation/block/fork, fpw) — a continue on any failed gate returns to the loop top with no output — then dispatches to exactly one of the stats or display paths. --save-fullpage and the --limit check run after either output path. Filters short-circuit left-to-right, so the cheapest checks (single array index, single integer compare) precede the per-block-iteration checks.

XLogDumpConfig holds all filter state. Four independent filter gates, covering six command-line options, can be combined:

  • rmgr (--rmgr): a boolean array filter_by_rmgr[RM_MAX_ID + 1]. The --rmgr=list option calls print_rmgr_list which iterates GetRmgrDesc(i)->rm_name for i in 0..RM_MAX_BUILTIN_ID, printing the builtin rmgr names.
  • xid (--xid): a single TransactionId.
  • relation + block + fork (--relation, --block, --fork): checked by XLogRecordMatchesRelationBlock, which iterates over all block IDs in the record via XLogRecMaxBlockId / XLogRecGetBlockTagExtended.
  • full-page image presence (--fullpage): checked by XLogRecordHasFPW, which iterates block IDs and tests XLogRecHasBlockImage.

The filter_by_extended flag is the gate for the relation/block/fork check; it is set whenever any of --relation, --block, or --fork is supplied. --block requires --relation; the code enforces this after option parsing.

The --rmgr option (case 'r' in the getopt_long switch) resolves a name to an rmgr ID at parse time and sets one slot of the filter_by_rmgr[] boolean array. It accepts three forms — the literal "list" (print names and exit), a "custom###" numeric ID for an unloaded custom rmgr, and a builtin name matched case-insensitively against the rmgr table:

// main (case 'r') — pg_waldump.c (condensed)
if (pg_strcasecmp(optarg, "list") == 0)
{
    print_rmgr_list();
    exit(EXIT_SUCCESS);
}

/* "custom###": the module isn't loaded, so match by numeric ID */
if (sscanf(optarg, "custom%03d", &rmid) == 1)
{
    if (!RmgrIdIsCustom(rmid))
        /* error: custom resource manager does not exist */ ;
    config.filter_by_rmgr[rmid] = true;
    config.filter_by_rmgr_enabled = true;
}
else
{
    /* then look for builtin rmgrs by name */
    for (rmid = 0; rmid <= RM_MAX_BUILTIN_ID; rmid++)
        if (pg_strcasecmp(optarg, GetRmgrDesc(rmid)->rm_name) == 0)
        {
            config.filter_by_rmgr[rmid] = true;
            config.filter_by_rmgr_enabled = true;
            break;
        }
    if (rmid > RM_MAX_BUILTIN_ID)
        /* error: resource manager does not exist */ ;
}

Resolving the name to an integer ID at parse time is what lets the main loop’s filter check be a single O(1) array index (config.filter_by_rmgr[record->xl_rmid]) rather than a string comparison per record. Repeating --rmgr sets multiple slots, so the filter is a union (records matching any listed rmgr pass).
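The parse-once, index-per-record idea can be sketched without any PostgreSQL headers. The name table below is an illustrative subset whose indexes do not correspond to real rmgr IDs, and RmgrFilter, filter_add, and filter_pass are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative names only; positions are NOT the real rmgr IDs. */
static const char *const sketch_rmgr_names[] = {
    "XLOG", "Transaction", "Storage", "Heap", "Btree"
};
#define SKETCH_N_RMGRS \
    (sizeof(sketch_rmgr_names) / sizeof(sketch_rmgr_names[0]))

typedef struct
{
    bool        enabled;                    /* was any --rmgr given? */
    bool        by_rmgr[SKETCH_N_RMGRS];    /* one slot per rmgr ID */
} RmgrFilter;

/* Case-insensitive string equality. */
static bool
names_equal_ci(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
        a++;
        b++;
    }
    return *a == *b;
}

/* Parse time: resolve a name to its slot; repeated calls form a union. */
static bool
filter_add(RmgrFilter *f, const char *name)
{
    for (size_t i = 0; i < SKETCH_N_RMGRS; i++)
        if (names_equal_ci(name, sketch_rmgr_names[i]))
        {
            f->by_rmgr[i] = true;
            f->enabled = true;
            return true;
        }
    return false;               /* unknown resource manager */
}

/* Scan time: a single array lookup per record. */
static bool
filter_pass(const RmgrFilter *f, unsigned rmid)
{
    if (!f->enabled)
        return true;
    return rmid < SKETCH_N_RMGRS && f->by_rmgr[rmid];
}
```

All string comparison happens in filter_add; filter_pass, the per-record path, touches only one boolean.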

XLogDumpDisplayRecord produces one line per record. It calls GetRmgrDesc(XLogRecGetRmid(record)) to obtain the RmgrDescData for this record’s rmgr, then prints the fixed fields, then calls the two text callbacks:

// XLogDumpDisplayRecord — pg_waldump.c (condensed)
const RmgrDescData *desc = GetRmgrDesc(XLogRecGetRmid(record));

XLogRecGetLen(record, &rec_len, &fpi_len);

printf("rmgr: %-11s len (rec/tot): %6u/%6u, tx: %10u, lsn: %X/%08X, prev %X/%08X, ",
       desc->rm_name,
       rec_len, XLogRecGetTotalLen(record),
       XLogRecGetXid(record),
       LSN_FORMAT_ARGS(record->ReadRecPtr),
       LSN_FORMAT_ARGS(xl_prev));

id = desc->rm_identify(info);
if (id == NULL)
    printf("desc: UNKNOWN (%x) ", info & ~XLR_INFO_MASK);
else
    printf("desc: %s ", id);

initStringInfo(&s);
desc->rm_desc(&s, record);
printf("%s", s.data);

/* clear the buffer so the rm_desc text is not printed twice */
resetStringInfo(&s);
XLogRecGetBlockRefInfo(record, true, config->bkp_details, &s, NULL);
printf("%s", s.data);

rm_identify maps the 4-bit opcode (upper nibble of xl_info) to a string such as "INSERT", "UPDATE", or "LOCK". rm_desc appends the payload summary: for a Heap INSERT, this is the target block number and offset; for a checkpoint, it is the redo pointer and timeline. Both callbacks are defined in the rmgr's *desc.c file (e.g. heapdesc.c, nbtdesc.c) and are compiled into both the server and pg_waldump's rmgrdesc.c via the same PG_RMGR X-macro expansion.
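The opcode split can be made concrete with a toy identify function. The opcode values and names below are illustrative, not a copy of any rmgr's real table, and sketch_identify and opcode_slot are hypothetical helpers. The one assumption taken from PostgreSQL is that the low four bits of xl_info are reserved for the WAL machinery.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_XLR_INFO_MASK 0x0F   /* low nibble: not part of the opcode */

/* Map the upper nibble of info to a name; NULL means "unknown opcode". */
static const char *
sketch_identify(uint8_t info)
{
    switch (info & ~SKETCH_XLR_INFO_MASK & 0xFF)
    {
        case 0x00:
            return "INSERT";    /* illustrative opcode assignments */
        case 0x10:
            return "DELETE";
        case 0x20:
            return "UPDATE";
        default:
            return NULL;
    }
}

/* Index into a 16-entry per-rmgr table, as a stats mode might use. */
static unsigned
opcode_slot(uint8_t info)
{
    return (unsigned) (info >> 4);
}
```

Flag bits in the low nibble do not affect the result, so 0x13 and 0x10 identify the same operation.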

XLogRecGetBlockRefInfo appends the block reference list — a comma-separated blk N: rel T/D/R fork main blk B summary for each registered block, plus FPW if the block carries a full-page image.

When --stats is supplied, XLogDumpDisplayRecord is replaced by XLogRecStoreStats (from src/include/access/xlogstats.h) which increments XLogStats.rmgr_stats[rmid] and optionally record_stats[rmid][xl_info >> 4]. At the end of the scan, XLogDumpDisplayStats prints the summary:

// XLogDumpDisplayStats — pg_waldump.c (condensed)
for (ri = 0; ri <= RM_MAX_ID; ri++)
{
    if (!RmgrIdIsValid(ri))
        continue;

    desc = GetRmgrDesc(ri);

    if (!config->stats_per_record)
    {
        count = stats->rmgr_stats[ri].count;
        /* ... */
        XLogDumpStatsRow(desc->rm_name, count, total_count,
                         rec_len, total_rec_len, fpi_len, total_fpi_len,
                         tot_len, total_len);
    }
    else
    {
        for (rj = 0; rj < MAX_XLINFO_TYPES; rj++)
        {
            /* skip zero-count entries */
            id = desc->rm_identify(rj << 4);
            XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id), ...);
        }
    }
}

MAX_XLINFO_TYPES is 16 (the 4-bit opcode space per rmgr). The stats table has four columns — record count (%), record-body bytes (%), FPI bytes (%), combined bytes (%) — each shown with a percentage of the column total. --stats=record produces per-rmgr/opcode rows; --stats alone produces per-rmgr rows. The percentages in per-row cells are against the column total; the final Total row shows FPI share as [xx.xx%] against the row total.
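The accumulate-then-summarize shape can be sketched as below. The struct and function names are hypothetical simplifications of XLogStats and XLogRecStoreStats, with only per-rmgr slots. The sketch assumes the record-body length is the total length minus the full-page-image bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_N_RMGRS 4

typedef struct
{
    uint64_t    count;
    uint64_t    rec_len;        /* record bytes excluding FPIs */
    uint64_t    fpi_len;        /* full-page-image bytes */
} SketchRecStats;

typedef struct
{
    uint64_t    count;          /* records seen overall */
    SketchRecStats rmgr[SKETCH_N_RMGRS];
} SketchStats;

/* Stateful path: fold one record into the running totals. */
static void
store_stats(SketchStats *s, unsigned rmid, uint32_t tot_len, uint32_t fpi_len)
{
    assert(rmid < SKETCH_N_RMGRS && fpi_len <= tot_len);
    s->count++;
    s->rmgr[rmid].count++;
    s->rmgr[rmid].rec_len += tot_len - fpi_len;
    s->rmgr[rmid].fpi_len += fpi_len;
}

/* Share of a column total, guarding against an empty scan. */
static double
pct(uint64_t part, uint64_t whole)
{
    return whole == 0 ? 0.0 : 100.0 * (double) part / (double) whole;
}
```

After the scan, each row prints its own count and byte totals alongside pct(row, column_total).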

--save-fullpage=DIR enables XLogRecordSaveFPWs, which runs after every matching record (regardless of display/stats mode). It iterates over block IDs, skips any that lack XLogRecHasBlockImage, calls RestoreBlockImage to decompress the stored image into a PGAlignedBlock buffer, then writes it to a file named:

<DIR>/<tli>-<LSN_hi>-<LSN_lo>.<spcOid>.<dbOid>.<relNumber>.<blk>_<fork>

This is the same image that the recovery driver would apply to a data page during redo. The file is a raw 8 KB (or BLCKSZ-byte) page — it can be opened with any tool that understands the PostgreSQL page layout (see postgres-page-layout.md).
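A sketch of how such a file name might be assembled follows. The helper fpw_file_name is hypothetical. The 8-digit uppercase hex fields for the timeline and the two LSN halves, joined by hyphens, are an assumption about the exact formatting and should be checked against the pg_waldump documentation for the version in use.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/<tli>-<lsn_hi>-<lsn_lo>.<spc>.<db>.<rel>.<blk>_<fork>".
 * Returns the snprintf result (length that was or would be written).
 */
static int
fpw_file_name(char *buf, size_t buflen, const char *dir,
              uint32_t tli, uint64_t lsn,
              uint32_t spc, uint32_t db, uint32_t rel,
              uint32_t blk, const char *fork)
{
    return snprintf(buf, buflen, "%s/%08X-%08X-%08X.%u.%u.%u.%u_%s",
                    dir,
                    (unsigned) tli,
                    (unsigned) (lsn >> 32),
                    (unsigned) (lsn & 0xFFFFFFFFu),
                    (unsigned) spc, (unsigned) db, (unsigned) rel,
                    (unsigned) blk, fork);
}
```

Because every component of the block's identity is in the name, the extracted images can be sorted or grouped with ordinary shell tools.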

pg_waldump runs without loading extension modules. When it encounters a record with a custom rmgr ID (above RM_MAX_BUILTIN_ID), GetRmgrDesc calls initialize_custom_rmgrs on the first such lookup, which populates CustomRmgrDesc[] with numeric names of the form custom### (the three digits are the rmgr ID itself, so the custom range 128..255 yields custom128..custom255) and stub default_desc / default_identify callbacks. The stub rm_identify returns NULL (triggering "UNKNOWN" in the display); default_desc appends just the raw rmgr ID. The --rmgr option accepts "custom###" names in this same convention so a user can filter on a numeric custom rmgr ID even without the module loaded.
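The custom-name convention round-trips through the same format string, as the sketch below shows. The helper names are hypothetical, and the 128..255 custom ID range is an assumption standing in for RM_MIN_CUSTOM_ID and RM_MAX_CUSTOM_ID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MIN_CUSTOM_ID 128    /* assumed RM_MIN_CUSTOM_ID */
#define SKETCH_MAX_CUSTOM_ID 255    /* assumed RM_MAX_CUSTOM_ID */

/* Parse "custom###" into an rmgr ID; rejects names outside the custom range. */
static bool
parse_custom_rmgr(const char *arg, int *rmid)
{
    int         id;

    if (sscanf(arg, "custom%03d", &id) != 1)
        return false;           /* not in the custom### form */
    if (id < SKETCH_MIN_CUSTOM_ID || id > SKETCH_MAX_CUSTOM_ID)
        return false;           /* numeric, but not a custom rmgr ID */
    *rmid = id;
    return true;
}

/* Produce the placeholder name shown for an unloaded custom rmgr. */
static void
custom_rmgr_name(char *buf, size_t buflen, int rmid)
{
    snprintf(buf, buflen, "custom%03d", rmid);
}
```

Using one format for both directions means whatever name appears in the rmgr column of the output can be pasted back into --rmgr.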

--follow keeps the loop alive after end-of-WAL rather than exiting. When XLogReadRecord returns NULL (no further valid record is available yet) and endptr_reached is false, the loop sleeps 1 second and retries. WALDumpOpenSegment also retries up to 10 times with 500 ms delays when the next segment file does not yet exist, covering the window after the server has written the final record of the previous segment but before it has created the new one. SIGINT sets the volatile sig_atomic_t time_to_stop flag, which the main loop checks at the top of each iteration.

Anchor on symbol names, not line numbers. Line numbers in the table below are hints scoped to commit 273fe94 (REL_18_STABLE, 2026-06-06).

  • main — option parsing (getopt_long over 18 long options), directory discovery, XLogReaderAllocate, XLogFindNextRecord, main read loop, final XLogDumpDisplayStats if --stats, XLogReaderFree.
  • sigint_handler — sets time_to_stop = true; checked at loop top.
  • identify_target_directory — three-candidate search (explicit path, path + /pg_wal, then ., pg_wal, $PGDATA/pg_wal).
  • search_directory — opens a WAL file (by name or by scanning for any IsXLogFileName match), reads first page to discover WalSegSz.
  • verify_directory / open_file_in_directory / split_path — path helpers.
  • create_fullpage_directory — create output dir for --save-fullpage via pg_check_dir / pg_mkdir_p.
  • WALDumpOpenSegment: XLogReaderRoutine.segment_open callback; retries 10 × 500 ms in follow mode.
  • WALDumpCloseSegment: XLogReaderRoutine.segment_close callback.
  • WALDumpReadPage: XLogReaderRoutine.page_read callback; calls WALRead with endptr enforcement.
  • XLogRecordMatchesRelationBlock — iterates block IDs via XLogRecGetBlockTagExtended; tests RelFileLocatorEquals, block number, fork number.
  • XLogRecordHasFPW — iterates block IDs; tests XLogRecHasBlockImage.
  • XLogRecordSaveFPWs — iterates block IDs, RestoreBlockImage, writes raw page to --save-fullpage dir.
  • XLogDumpDisplayRecord — formats the per-record display line: rm_name, lengths, xid, LSN, prev LSN, rm_identify(info), rm_desc(s, record), XLogRecGetBlockRefInfo.
  • XLogDumpStatsRow — formats one row of the stats table with four count/pct columns.
  • XLogDumpDisplayStats — two-pass stats printer: first pass totals; second pass per-rmgr (or per-rmgr/opcode) rows; final Total row with FPI share.
  • print_rmgr_list — prints builtin rmgr names for --rmgr=list.
  • usage — help text listing all 18 options.

rmgr description layer (rmgrdesc.c, rmgrdesc.h)

  • RmgrDescTable[RM_N_BUILTIN_IDS] — static array built by the PG_RMGR X-macro over access/rmgrlist.h; each entry holds rm_name, rm_desc, rm_identify. No redo, startup, or cleanup entries.
  • CustomRmgrDesc[RM_N_CUSTOM_IDS]: lazy-initialized by initialize_custom_rmgrs on first custom-rmgr lookup; names follow the "custom%03d" format applied to the rmgr ID, i.e. "custom128" .. "custom255" for the custom ID range.
  • GetRmgrDesc(rmid) — returns &RmgrDescTable[rmid] for builtins; initializes and returns &CustomRmgrDesc[rmid - RM_MIN_CUSTOM_ID] for customs.
  • XLogDumpPrivate: {timeline, startptr, endptr, endptr_reached}. Passed as private_data to XLogReaderState.
  • XLogDumpConfig — all option state: display flags (quiet, bkp_details, follow, stats, stats_per_record), filter fields (filter_by_rmgr[], filter_by_xid, filter_by_relation, filter_by_relation_block, filter_by_relation_forknum, filter_by_fpw), stop_after_records, save_fullpage_path.
  • XLogStats (from xlogstats.h) — {count, startptr, endptr, rmgr_stats[RM_MAX_ID+1], record_stats[RM_MAX_ID+1][MAX_XLINFO_TYPES]}. Each XLogRecStats slot holds {count, rec_len, fpi_len}.
  • RmgrDescData (from rmgrdesc.h in pg_waldump, mirroring rmgr.h) — {rm_name, rm_desc, rm_identify}.
  • XLogReaderRoutine (from xlogreader.h) — {page_read, segment_open, segment_close} function pointers; XL_ROUTINE(...) initializes a compound literal.

Position hints (as of 2026-06-06, REL_18 273fe94)

| Symbol | File | Line |
| --- | --- | --- |
| XLogDumpPrivate (typedef) | pg_waldump.c | 47 |
| XLogDumpConfig (typedef) | pg_waldump.c | 55 |
| sigint_handler | pg_waldump.c | 91 |
| print_rmgr_list | pg_waldump.c | 97 |
| verify_directory | pg_waldump.c | 113 |
| create_fullpage_directory | pg_waldump.c | 127 |
| split_path | pg_waldump.c | 160 |
| open_file_in_directory | pg_waldump.c | 188 |
| search_directory | pg_waldump.c | 209 |
| identify_target_directory | pg_waldump.c | 291 |
| WALDumpOpenSegment | pg_waldump.c | 337 |
| WALDumpCloseSegment | pg_waldump.c | 379 |
| WALDumpReadPage | pg_waldump.c | 388 |
| XLogRecordMatchesRelationBlock | pg_waldump.c | 437 |
| XLogRecordHasFPW | pg_waldump.c | 469 |
| XLogRecordSaveFPWs | pg_waldump.c | 489 |
| XLogDumpDisplayRecord | pg_waldump.c | 545 |
| XLogDumpStatsRow | pg_waldump.c | 584 |
| XLogDumpDisplayStats | pg_waldump.c | 625 |
| usage | pg_waldump.c | 755 |
| main | pg_waldump.c | 792 |
| RmgrDescTable[] | rmgrdesc.c | 37 |
| initialize_custom_rmgrs | rmgrdesc.c | 68 |
| GetRmgrDesc | rmgrdesc.c | 86 |
| XLogStats (struct) | xlogstats.h | 28 |
| XLogRecStoreStats | xlogstats.h | 41 |
| XLogReaderRoutine (struct) | xlogreader.h | 72 |
| XL_ROUTINE (macro) | xlogreader.h | 117 |
| XLogReaderAllocate | xlogreader.h | 331 |

  • WalSegSz is always discovered from the first page of a segment file, never assumed from the compiled default. Verified in search_directory: it reads XLOG_BLCKSZ bytes, casts to XLogLongPageHeader, and reads xlp_seg_size. IsValidWalSegSize checks the power-of-two constraint before WalSegSz is used anywhere else.

  • The three I/O callbacks are registered via XL_ROUTINE, which expands to a compound-literal of XLogReaderRoutine. Verified in xlogreader.h: #define XL_ROUTINE(...) &(XLogReaderRoutine){__VA_ARGS__}. The XLogReaderAllocate signature takes XLogReaderRoutine *routine.

  • --follow sleeps at two points: 500 ms in WALDumpOpenSegment (up to 10 retries for the next segment file) and 1 second in the main loop (when XLogReadRecord returns NULL). Verified in WALDumpOpenSegment (tries < 10, pg_usleep(500 * 1000)) and in the main loop (pg_usleep(1000000L)).

  • The filter_by_rmgr[] check uses record->xl_rmid directly (the xl_rmid field of the XLogRecord header), not a secondary decoded field. Verified: !config.filter_by_rmgr[record->xl_rmid] in the main loop. XLogReadRecord returns the raw XLogRecord *, so xl_rmid is the actual header byte.

  • --block requires --relation. Verified in main after getopt_long: if (config.filter_by_relation_block_enabled && !config.filter_by_relation_enabled) triggers a pg_log_error and goto bad_argument.

  • Custom rmgr names in --rmgr are accepted as "custom###" three-digit IDs, consistent with rmgrdesc.c’s initialize_custom_rmgrs. Verified: the case 'r' branch sscanf(optarg, "custom%03d", &rmid) before the builtin-name loop, and initialize_custom_rmgrs uses the same "custom%03d" format.

  • Statistics accumulation uses XLogRecStoreStats from xlogstats.h, not local code. Verified: XLogRecStoreStats(&stats, xlogreader_state) in the main loop; the function is declared extern in xlogstats.h and is shared with pg_walinspect (the contrib SQL-accessible analog, mentioned in the source comment at line 37 of pg_waldump.c).

  • XLogRecordSaveFPWs calls RestoreBlockImage to decompress FPIs before writing. Verified: if (!RestoreBlockImage(record, block_id, page)) pg_fatal(...) before fwrite. The written file is the decompressed 8 KB page, not the compressed WAL representation.

  • The stats table skips custom rmgr rows with zero count but prints all builtin rmgr rows (including zero-count ones). Verified: if (RmgrIdIsCustom(ri) && count == 0) continue; inside the stats loop — the guard applies only to custom rmgrs, so builtins always get a row.

  1. pg_walinspect parity. The source comment at the top of pg_waldump.c says “it is highly recommended to give a thought about doing the same in pg_walinspect contrib module as well” for any code change or fix. The degree of code sharing (vs. duplication) between the two is not fully traced here; a pg_walinspect-focused follow-up would clarify the boundary.

  2. Timeline handling across history files. WALDumpOpenSegment advances *tli_p if the segment is not found on the current timeline. The logic for reading timeline history files (.history files in pg_wal) and switching timelines during a scan is inherited from the XLogReader infrastructure and is not walked in detail here. A follow-up anchored on XLogReadRecord’s internal timeline switching would complete the picture.

  3. --save-fullpage file naming for non-main forks. The filename includes forkname derived from forkNames[fork] with an underscore prefix. The behavior when a fork number outside [0, MAX_FORKNUM] appears is an immediate pg_fatal rather than a skip — whether this can be triggered by a legitimate FPW of an init fork is left as an open question.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

MySQL’s binary log is a logical log: it records row images (in ROW format) or SQL statements (in STATEMENT format), not physical page edits. mysqlbinlog is structurally similar to pg_waldump — a standalone binary, file-based I/O, a filter pipeline — but because MySQL’s log is row-oriented rather than page-oriented, mysqlbinlog can reconstruct readable DML (INSERT INTO … VALUES …) rather than rmgr-specific payload summaries. The trade-off is that MySQL’s binlog is not suitable for crash recovery of the storage engine (InnoDB uses its own redo log for that); MySQL therefore has two separate logs for the two separate concerns. PostgreSQL’s WAL serves both crash recovery and logical decoding from a single stream (the latter via postgres-logical-decoding.md).

Oracle LogMiner (introduced in Oracle 8i) is the opposite architectural choice from pg_waldump: it is a server-side PL/SQL package (DBMS_LOGMNR) that reads redo log files and presents the decoded output through SQL views (V$LOGMNR_CONTENTS). This means it has access to the full data dictionary, can reconstruct SQL-level DML, and can filter by schema and object name — but it requires a live database connection and cannot inspect redo logs from an offline crashed instance without an auxiliary setup. pg_waldump requires only files, making it usable in scenarios where the server cannot start.

pg_walinspect is a contrib extension that exposes pg_waldump-equivalent functionality through SQL functions: pg_get_wal_records_info, pg_get_wal_stats, and pg_get_wal_block_info. Because it runs inside the server, it can join WAL record data against the live catalog and is accessible to any client with the right privilege — useful for monitoring dashboards and automated tooling. The source comment in pg_waldump.c explicitly calls out the expectation that bug fixes in pg_waldump be ported to pg_walinspect. The two share XLogStats and the XLogRecStoreStats accounting function, but they are otherwise separate codebases.

Research: WAL analytics for performance tuning

The aggregate WAL statistics that --stats produces are a practical instance of a broader research area: using log content as a workload signal. Papers in this space include work on adaptive checkpoint scheduling (minimizing I/O amplification from FPWs by tuning checkpoint frequency per workload mix) and on WAL compression schemes that exploit the redundancy in FPIs across adjacent checkpoints. Greenplum and CockroachDB have published engineering reports on structured WAL analytics — characterizing the volume and distribution of record types across workload phases — as a lever for capacity planning. pg_waldump’s --stats=record mode is the manual version of this analysis; automated collection is the direction pg_walinspect’s SQL interface enables.

  • src/bin/pg_waldump/pg_waldump.c — main binary (REL_18_STABLE, commit 273fe94)
  • src/bin/pg_waldump/rmgrdesc.c / rmgrdesc.h — rmgr description table for pg_waldump
  • src/include/access/xlogreader.h: XLogReaderState, XLogReaderRoutine, XL_ROUTINE, XLogReaderAllocate, XLogFindNextRecord, XLogReadRecord, XLogReaderFree
  • src/include/access/xlogstats.h: XLogStats, XLogRecStats, XLogRecStoreStats
  • src/include/access/xlog_internal.h: XLogLongPageHeader, XLogLongPageHeaderData, IsXLogFileName, XLogFileName, XLogFromFileName, XLogSegNoOffsetToRecPtr, XLByteInSeg
  • src/include/access/xlogrecord.h: XLogRecord header fields
  • postgres-xlog-wal.md — WAL insertion, LSN mechanics, flush watermarks
  • postgres-wal-records-rmgr.md — rmgr dispatch table, record anatomy, FPIs
  • postgres-page-layout.md — page layout (for interpreting --save-fullpage output)
  • postgres-logical-decoding.md — logical decoding, a separate WAL consumer