
PostgreSQL pg_upgrade — In-Place Major-Version Upgrade


A major-version upgrade replaces the server binary and, typically, the on-disk format of system catalogs. Two strategies exist for carrying user data across this boundary.

Logical migration (dump-reload) serializes every relation into SQL text via pg_dump, drops the old cluster, initializes a new one, and replays the SQL. It is always correct, because the restore path is the same code path used for any other restore, but its cost is proportional to the total size of the data: every row is re-inserted and every index rebuilt, so restoring a 10 TB database takes roughly as long as loading it from scratch. See postgres-pg-dump-restore.md for the mechanism.

Physical migration (pg_upgrade) moves the heap and index files directly. The premise is that while system catalogs change between major versions, user table heap pages and index pages are often compatible: PostgreSQL’s page layout (postgres-page-layout.md), on-disk tuple format, and B-tree on-disk representation have been stable across most major-version boundaries since 8.x. If the data files can be moved as-is, the upgrade time shrinks to the time needed to transfer files, which with hard-link or directory-swap modes is close to zero and largely independent of data size.

The core invariant physical migration must preserve is OID stability for a specific set of catalog columns that are stored in user data:

| Catalog column | Stored in user data as |
| --- | --- |
| pg_class.oid / pg_class.relfilenode | Toast pointers in heap tuples |
| pg_type.oid | Composite type values in user tables |
| pg_enum.oid | Enum values in user tables |
| pg_tablespace.oid | Directory name on disk |
| pg_database.oid | Directory name on disk |
| pg_authid.oid | Legacy large-object metadata |

If the new cluster assigns different OIDs to these objects, the transferred heap files reference dangling pointers. pg_upgrade therefore takes explicit control of OID assignment during the new-cluster schema restore: the schema is dumped with pg_dump --binary-upgrade and restored into a new server running in binary-upgrade mode, which forces OIDs to match the old cluster’s values.

Two additional invariants govern the physical-file layer:

  1. Block-size homogeneity. A page must be blocksz bytes in both clusters. Mismatches are a hard stop.
  2. WAL segment compatibility. WAL records reference page LSNs; if the WAL segment size differs, pg_resetwal cannot safely re-seal the new WAL directory.

Both are verified by check_control_data at pre-flight time.

Physical-migration tools for relational databases share a structure regardless of vendor:

Before touching any files the tool verifies that the two clusters are compatible: same block size, same WAL segment size, no prepared transactions that would leave the old cluster in a non-clean state, no extension libraries present in the old cluster that are absent from the new one, no data types whose on-disk representation changed between the two versions, no user tables using column types that reference unstable OIDs. Each check is independent and can be run without performing the actual upgrade (--check mode).

The new cluster is initialized fresh by initdb. The old cluster’s schema (DDL for every database object, in dependency order) is dumped via the logical backup tool in a mode that emits explicit OID assignments alongside the DDL; in PostgreSQL these are mostly calls to binary_upgrade_set_next_* support functions placed ahead of each CREATE statement. This schema is restored into the new cluster, which is running in a mode that accepts and enforces those explicit OIDs. The result is a new cluster whose system catalogs are structurally new but whose OID namespace matches the old cluster exactly for the objects that matter.

With schema in place, each user relation’s heap and index files are transferred from old data directory to new. Five transfer strategies cover the speed/safety/flexibility tradeoffs:

| Strategy | Mechanism | Data-size cost | Rollback safety |
| --- | --- | --- | --- |
| Copy | cp / read+write | Full copy | Old cluster intact |
| Clone | ioctl FICLONE / reflink | Copy-on-write, near-zero | Old cluster intact |
| Copy-file-range | copy_file_range(2) | In-kernel copy | Old cluster intact |
| Hard-link | link(2) | Near-zero | Shared inodes; old cluster unsafe after new starts |
| Swap | Directory rename | Near-zero | Old cluster directory replaced |
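The rollback caveat for hard-link mode follows directly from what link(2) does: both names point at one inode, so a write through either name is visible through the other. The sketch below is not pg_upgrade code; it creates a scratch file under /tmp (an illustrative location), links it under a second name, and checks that both names resolve to the same inode. The function name and paths are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a scratch file, hard-link it under a second name, and report
 * whether both names resolve to the same inode.
 * Returns 1 if they do, 0 if they do not, -1 if a system call fails.
 */
static int
demo_link_shares_inode(void)
{
    char        old_path[] = "/tmp/pgupg_demo_XXXXXX";
    char        new_path[64];
    struct stat old_st;
    struct stat new_st;
    int         fd;
    int         result = -1;

    fd = mkstemp(old_path);     /* stands in for an old-cluster relation file */
    if (fd < 0)
        return -1;
    close(fd);

    snprintf(new_path, sizeof(new_path), "%s.new", old_path);
    if (link(old_path, new_path) == 0)
    {
        if (stat(old_path, &old_st) == 0 && stat(new_path, &new_st) == 0)
            result = (old_st.st_ino == new_st.st_ino &&
                      old_st.st_dev == new_st.st_dev &&
                      new_st.st_nlink == 2);
        unlink(new_path);
    }
    unlink(old_path);
    return result;
}
```

On a file system that supports hard links the function returns 1, which is why the old cluster can no longer be treated as an independent fallback once the new server has started writing.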

After the physical files land, the new cluster must inherit the old cluster’s transaction counters (nextXid, nextOid, nextMultiXact, WAL position) so that no future transaction ID collides with an existing MVCC visibility record. pg_resetwal is used to write these values into the new cluster’s pg_control without starting a full server cycle.
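A small sketch can show why the next-XID counter has to be inherited. PostgreSQL compares normal transaction IDs circularly, modulo 2^32, so an XID stamped on a transferred tuple is only "in the past" relative to a counter that has already moved beyond it. The helper names below are hypothetical, special XIDs are ignored, and the value 750 merely stands in for the low counter of a freshly initialized cluster.

```c
#include <stdint.h>

/*
 * Circular comparison of two normal transaction IDs: returns 1 when
 * `a` is logically older than `b`, treating the 32-bit XID space as a ring.
 */
static int
xid_precedes(uint32_t a, uint32_t b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/*
 * A tuple's inserting XID can be judged against the cluster's next-XID
 * counter only if it is logically older than that counter.
 */
static int
xmin_is_in_past(uint32_t xmin, uint32_t next_xid)
{
    return xid_precedes(xmin, next_xid);
}
```

With xmin = 5000000 and a counter of 750 the tuple appears to come from the future; with the old cluster’s counter transplanted (5000001 in this example) it is correctly seen as past.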

| Concept | PostgreSQL name |
| --- | --- |
| Old cluster descriptor | ClusterInfo old_cluster |
| New cluster descriptor | ClusterInfo new_cluster |
| Per-relation file mapping | FileNameMap |
| Transfer strategy enum | transferMode / TRANSFER_MODE_* |
| Binary-upgrade switches | --binary-upgrade (pg_dump / pg_dumpall); -b (postmaster) |
| Transaction-counter write | pg_resetwal invocations in copy_xact_xlog_xid |

main() in pg_upgrade.c runs a linear pipeline. The two phases (“OLD” and “NEW” in the source comments) each start and stop a postmaster:

// main — src/bin/pg_upgrade/pg_upgrade.c
parseCommandLine(argc, argv);
adjust_data_dir(&old_cluster);
adjust_data_dir(&new_cluster);
make_outputdirs(new_cluster.pgdata); /* pg_upgrade_output.d/$timestamp/ */
setup(argv[0]);
output_check_banner();
check_cluster_versions();
check_cluster_compatibility(); /* pg_control cross-check */
check_and_dump_old_cluster(); /* OLD: checks + pg_dump schema */
/* -- NEW -- */
start_postmaster(&new_cluster, true);
check_new_cluster();
report_clusters_compatible();
set_locale_and_encoding();
prepare_new_cluster(); /* vacuumdb --all --analyze + --freeze */
stop_postmaster(false);
copy_xact_xlog_xid(); /* pg_resetwal: xid/epoch/multixact/WAL */
set_new_cluster_char_signedness();
/* -- NEW (second time) -- */
start_postmaster(&new_cluster, true);
prepare_new_globals(); /* set_frozenxids + restore globals dump */
create_new_objects(); /* pg_restore per database, parallel */
stop_postmaster(false);
transfer_all_new_tablespaces(...); /* move/link/clone/swap relation files */
/* pg_resetwal -o (next OID) */
create_logical_replication_slots(); /* if any */
issue_warnings_and_set_wal_level();


flowchart TD
    A[parseCommandLine] --> B[check_cluster_versions<br/>check_cluster_compatibility]
    B --> C[check_and_dump_old_cluster<br/>OLD postmaster: checks + pg_dump schema]
    C --> D[start NEW postmaster]
    D --> E[check_new_cluster<br/>report_clusters_compatible]
    E --> F[prepare_new_cluster<br/>vacuumdb analyze + freeze]
    F --> G[stop NEW postmaster]
    G --> H[copy_xact_xlog_xid<br/>pg_resetwal: counters + WAL]
    H --> I[start NEW postmaster]
    I --> J[prepare_new_globals<br/>set_frozenxids + restore globals]
    J --> K[create_new_objects<br/>pg_restore per-db parallel]
    K --> L[stop NEW postmaster]
    L --> M[transfer_all_new_tablespaces<br/>copy / link / clone / swap]
    M --> N[pg_resetwal -o next OID]
    N --> O[create_logical_replication_slots]
    O --> P[issue_warnings_and_set_wal_level]
    P --> Q[Upgrade Complete]

Figure 1 — pg_upgrade main pipeline (REL_18_STABLE). Two postmaster start/stop cycles bracket the schema restore and the physical file transfer.

Every piece of state about old and new clusters lives in the two global ClusterInfo instances:

// ClusterInfo, ControlData — src/bin/pg_upgrade/pg_upgrade.h
typedef struct
{
    ControlData     controldata;        /* pg_control snapshot */
    DbLocaleInfo   *template0;          /* template0 locale / encoding */
    DbInfoArr       dbarr;              /* per-database: relations + logical slots */
    char           *pgdata;             /* $PGDATA path */
    char           *bindir;             /* bin/ path (pg_dump, pg_restore, etc.) */
    unsigned short  port;               /* postmaster listen port */
    uint32          major_version;
    const char     *tablespace_suffix;
} ClusterInfo;

typedef struct
{
    uint32  cat_ver;                    /* catalog version, checked for compatibility */
    uint32  chkpnt_nxtxid;              /* next transaction ID to transplant */
    uint32  chkpnt_nxtoid;              /* next OID to transplant */
    uint32  chkpnt_nxtmulti;            /* next MultiXactId to transplant */
    uint32  chkpnt_nxtmxoff;            /* next MultiXact offset */
    uint32  chkpnt_oldstMulti;          /* oldest MultiXactId */
    uint32  chkpnt_oldstxid;            /* oldest transaction ID */
    uint32  blocksz;                    /* must match between clusters */
    uint32  walseg;                     /* WAL segment size, must match */
    bool    default_char_signedness;    /* PG18: char signed/unsigned flag */
    /* ... */
} ControlData;

check_control_data cross-checks blocksz, walseg, align, index, toast, large_object and a handful of other fields. Any mismatch is a hard abort.
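The cross-check amounts to a field-by-field comparison in which the first difference ends the run. The following is a minimal sketch of that idea, not the real check_control_data: it uses a trimmed-down stand-in struct with only three of the fields named above, and the type, enum, and function names are hypothetical. Treating a zero value as invalid is an assumption of this sketch.

```c
#include <stdint.h>

/* Trimmed-down stand-in for ControlData (hypothetical, three fields only). */
typedef struct
{
    uint32_t align;
    uint32_t blocksz;
    uint32_t walseg;
} MiniControlData;

typedef enum
{
    CTRL_OK = 0,
    CTRL_BAD_ALIGN,
    CTRL_BAD_BLOCKSZ,
    CTRL_BAD_WALSEG
} MiniControlResult;

/* Convenience constructor so callers can build a value inline. */
static MiniControlData
mini_control(uint32_t align, uint32_t blocksz, uint32_t walseg)
{
    MiniControlData c;

    c.align = align;
    c.blocksz = blocksz;
    c.walseg = walseg;
    return c;
}

/* Report the first field that is unset (zero) or differs between clusters. */
static MiniControlResult
first_control_mismatch(MiniControlData oldctrl, MiniControlData newctrl)
{
    if (oldctrl.align == 0 || oldctrl.align != newctrl.align)
        return CTRL_BAD_ALIGN;
    if (oldctrl.blocksz == 0 || oldctrl.blocksz != newctrl.blocksz)
        return CTRL_BAD_BLOCKSZ;
    if (oldctrl.walseg == 0 || oldctrl.walseg != newctrl.walseg)
        return CTRL_BAD_WALSEG;
    return CTRL_OK;
}
```

A caller would abort on any result other than CTRL_OK, which mirrors the hard-abort behaviour described above.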

check_and_dump_old_cluster and check_new_cluster (both in check.c) run the pre-flight battery. Key checks:

// check_and_dump_old_cluster — src/bin/pg_upgrade/check.c
check_for_connection_status(&old_cluster);
get_db_rel_and_slot_infos(&old_cluster);
check_is_install_user(&old_cluster);
check_for_prepared_transactions(&old_cluster);
check_for_isn_and_int8_passing_mismatch(&old_cluster);
check_for_data_types_usage(&old_cluster); /* reg* types, line, jsonb, aclitem … */
check_for_unicode_update(&old_cluster);
/* version-gated: encoding conversions, postfix ops, polymorphics,
   tables with OIDs, NOT NULL inheritance (new in PG18), pg_ role prefixes */
generate_old_dump(); /* pg_dump --schema-only --binary-upgrade */

The data-type checks are driven by the DataTypesUsageChecks table — a statically initialized array of structs, each carrying a status string, a report filename, a SQL query that extracts the OID of a problematic type, a human-readable error text, and a threshold_version that controls whether the check applies to the old cluster’s version:

// DataTypesUsageChecks data_types_usage_checks[] — src/bin/pg_upgrade/check.c
{
    .status = "Checking for system-defined composite types in user tables",
    .base_query = "SELECT t.oid FROM pg_catalog.pg_type t ... WHERE typtype = 'c' "
                  "AND (t.oid < 16384 OR nspname = 'information_schema')",
    .threshold_version = ALL_VERSIONS
},
{
    .status = "Checking for reg* data types in user tables",
    .base_query = "SELECT oid FROM pg_catalog.pg_type t WHERE t.typname IN "
                  "('regcollation','regconfig','regdictionary','regnamespace',"
                  "'regoper','regoperator','regproc','regprocedure')",
    .threshold_version = ALL_VERSIONS
},
/* ... aclitem (<=15), unknown (<=9.6), sql_identifier (<=11), jsonb (manual),
   abstime/reltime/tinterval (<=11), line (<=9.3) ... */

Each check is run via an UpgradeTask that connects to each database in the old cluster and executes the query. A failing check writes a report file to pg_upgrade_output.d/$timestamp/ and aborts.
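The threshold gating can be sketched as a table walk. This is an illustration under stated assumptions rather than the real array: the struct, the sentinel value, and the function names are hypothetical, only four entries are shown, and versions are assumed to be encoded as major × 100 (so 1500 for PG15 and 903 for 9.3), in line with the "<= 1800" gate mentioned elsewhere in this document.

```c
#include <stddef.h>

#define DEMO_ALL_VERSIONS (-1)  /* hypothetical sentinel: applies to every source */

typedef struct
{
    const char *status;
    int         threshold_version;  /* highest source version the check applies to */
} DemoTypeCheck;

static const DemoTypeCheck demo_checks[] = {
    {"reg* data types", DEMO_ALL_VERSIONS},
    {"aclitem", 1500},
    {"sql_identifier", 1100},
    {"line", 903},
};

/* A check runs when it is unconditional or the source is old enough. */
static int
demo_check_applies(const DemoTypeCheck *check, int old_major)
{
    return check->threshold_version == DEMO_ALL_VERSIONS ||
           old_major <= check->threshold_version;
}

/* Count how many of the demo checks would run for a given source version. */
static int
demo_count_applicable(int old_major)
{
    int n = 0;

    for (size_t i = 0; i < sizeof(demo_checks) / sizeof(demo_checks[0]); i++)
        if (demo_check_applies(&demo_checks[i], old_major))
            n++;
    return n;
}
```

For a PG17 source only the unconditional entry runs; for a 9.2 source all four do.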

The PG18-new check (check_for_not_null_inheritance) rejects child tables that omit NOT NULL constraints required by their parent columns, because the schema restore will fail for those.

copy_xact_xlog_xid wires the old cluster’s counters into the new cluster’s pg_control using a sequence of pg_resetwal invocations:

// copy_xact_xlog_xid — src/bin/pg_upgrade/pg_upgrade.c
/* Copy pg_xact (commit log) from old to new */
copy_subdir_files("pg_xact", "pg_xact");

/* Transplant oldest and next XID */
exec_prog(..., "\"%s/pg_resetwal\" -f -u %u \"%s\"",
          new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid, ...);
exec_prog(..., "\"%s/pg_resetwal\" -f -x %u \"%s\"",
          new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid, ...);

/* Transplant MultiXact counters (if format compatible) */
copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
copy_subdir_files("pg_multixact/members", "pg_multixact/members");
exec_prog(..., "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
          new_cluster.bindir,
          old_cluster.controldata.chkpnt_nxtmxoff,
          old_cluster.controldata.chkpnt_nxtmulti,
          old_cluster.controldata.chkpnt_oldstMulti, ...);

/* Reset WAL archives to match old cluster's LSN */
exec_prog(..., "\"%s/pg_resetwal\" -l 00000001%s \"%s\"",
          new_cluster.bindir,
          old_cluster.controldata.nextxlogfile + 8, ...);

The pg_xact directory (commit log) is copied verbatim because the new cluster must know which of the old XIDs committed. Without it, the new server could not determine the commit status of the XIDs stamped on tuples in the transferred heap files.
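To make the dependency on the commit log concrete, here is a simplified lookup. PostgreSQL’s commit log stores two status bits per transaction, four transactions per byte; the sketch below treats the log as one flat byte array and ignores the page and segment structure of the real files. The function names, the sample buffer, and the "unknown" return value are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define XACT_BITS       2       /* status bits per transaction */
#define XACTS_PER_BYTE  4
#define XACT_MASK       0x03

enum
{
    XACT_UNKNOWN = -1,          /* hypothetical: no commit-log data for this XID */
    XACT_IN_PROGRESS = 0,
    XACT_COMMITTED = 1,
    XACT_ABORTED = 2,
    XACT_SUB_COMMITTED = 3
};

/* Look up the 2-bit status of `xid` in a flat commit-log buffer. */
static int
clog_get_status(const uint8_t *clog, size_t clog_len, uint32_t xid)
{
    size_t   byteno = xid / XACTS_PER_BYTE;
    unsigned shift = (xid % XACTS_PER_BYTE) * XACT_BITS;

    if (byteno >= clog_len)
        return XACT_UNKNOWN;
    return (clog[byteno] >> shift) & XACT_MASK;
}

/*
 * Sample log covering XIDs 0..7:
 * byte 0 = 0x01 -> XID 0 committed; byte 1 = 0x09 -> XID 4 committed,
 * XID 5 aborted; everything else in progress.
 */
static const uint8_t demo_clog[] = {0x01, 0x09};

static int
demo_clog_status(uint32_t xid)
{
    return clog_get_status(demo_clog, sizeof(demo_clog), xid);
}
```

An XID beyond the end of the buffer yields XACT_UNKNOWN here, which is the situation a new cluster would be in for every old XID if pg_xact were not carried over.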

transfer_all_new_tablespaces (in relfilenumber.c) iterates over every database and relation, builds a FileNameMap array pairing old and new file paths, then dispatches to transfer_single_new_db:

// FileNameMap — src/bin/pg_upgrade/pg_upgrade.h
typedef struct
{
    const char     *old_tablespace;
    const char     *new_tablespace;
    const char     *old_tablespace_suffix;
    const char     *new_tablespace_suffix;
    Oid             db_oid;
    RelFileNumber   relfilenumber;
    char           *nspname;
    char           *relname;
} FileNameMap;

transfer_relfile handles one relation, iterating over 1 GB segment files (relfilenumber, relfilenumber.1, relfilenumber.2, …) and their forks (_fsm, _vm). For visibility-map forks when upgrading from a cluster predating VISIBILITY_MAP_FROZEN_BIT_CAT_VER, it rewrites the VM file to add the frozen bit rather than copying it verbatim.
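The visibility-map rewrite mentioned above widens each heap page’s entry from one bit (all-visible) to two bits (all-visible plus all-frozen), leaving the new frozen bit clear. A minimal sketch of the bit expansion, assuming a plain byte stream and ignoring page headers and the fact that one old VM page expands into two new ones; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define NEW_BITS_PER_HEAPBLOCK 2
#define VM_ALL_VISIBLE         0x01

/*
 * Widen one old-format VM byte (8 heap pages, 1 bit each) into 16 bits
 * (8 heap pages, 2 bits each). Only the all-visible bit is carried over;
 * the all-frozen bit of every page stays 0.
 */
static uint16_t
vm_widen_byte(uint8_t old_byte)
{
    uint16_t widened = 0;

    for (int bit = 0; bit < 8; bit++)
        if (old_byte & (1u << bit))
            widened |= (uint16_t) (VM_ALL_VISIBLE << (NEW_BITS_PER_HEAPBLOCK * bit));
    return widened;
}

/* Expand a run of old bytes; new_bytes must have room for 2 * old_len bytes. */
static void
vm_add_frozen_bit(const uint8_t *old_bytes, size_t old_len, uint8_t *new_bytes)
{
    for (size_t i = 0; i < old_len; i++)
    {
        uint16_t widened = vm_widen_byte(old_bytes[i]);

        new_bytes[2 * i] = (uint8_t) (widened & 0xFF);      /* low byte first */
        new_bytes[2 * i + 1] = (uint8_t) (widened >> 8);
    }
}
```

Because the output is twice the size of the input, the VM fork cannot simply be linked or copied in this case, which is why it is handled separately from the other forks.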

This is where the relfilenumber-preservation invariant becomes concrete: the old and new file paths are built from the same map->relfilenumber field, so the heap/index file keeps its on-disk name across the transfer. That name was pinned during the binary-upgrade schema restore, which is why toast pointers inside transferred tuples still resolve. The loop also shows the per-mode dispatch and the VM-rewrite short-circuit:

// transfer_relfile — src/bin/pg_upgrade/relfilenumber.c
/* same relfilenumber on both sides: the name is preserved, not reassigned */
snprintf(old_file, sizeof(old_file), "%s%s/%u/%u%s%s",
         map->old_tablespace, map->old_tablespace_suffix,
         map->db_oid, map->relfilenumber, type_suffix, extent_suffix);
snprintf(new_file, sizeof(new_file), "%s%s/%u/%u%s%s",
         map->new_tablespace, map->new_tablespace_suffix,
         map->db_oid, map->relfilenumber, type_suffix, extent_suffix);
unlink(new_file);

if (vm_must_add_frozenbit && strcmp(type_suffix, "_vm") == 0)
    /* rewrite VM to add per-page frozen bit instead of copying verbatim */
    rewriteVisibilityMap(old_file, new_file, map->nspname, map->relname);
else
    switch (user_opts.transfer_mode)
    {
        case TRANSFER_MODE_CLONE:           /* ioctl FICLONE / reflink */
            cloneFile(old_file, new_file, map->nspname, map->relname);
            break;
        case TRANSFER_MODE_COPY:            /* read + write */
            copyFile(old_file, new_file, map->nspname, map->relname);
            break;
        case TRANSFER_MODE_COPY_FILE_RANGE: /* copy_file_range(2) */
            copyFileByRange(old_file, new_file, map->nspname, map->relname);
            break;
        case TRANSFER_MODE_LINK:            /* link(2): shared inode */
            linkFile(old_file, new_file, map->nspname, map->relname);
            break;
        case TRANSFER_MODE_SWAP:            /* handled in do_swap, not here */
            pg_fatal("should never happen");
            break;
    }

Note that TRANSFER_MODE_SWAP is explicitly forbidden in this path — swap mode renames whole database directories in do_swap and never touches the per-segment loop.
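The path construction used by transfer_relfile can be tried in isolation. The sketch below reuses the "%s%s/%u/%u%s%s" format from the excerpt and assumes the extent suffix is empty for segment 0 and ".N" for later segments, as the segment naming described above implies. The function names, the static buffer, and the example directory values are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the path of one segment of one fork of a relation.
 * Returns a pointer to a static buffer (fine for a sketch, not thread-safe).
 */
static const char *
demo_relfile_path(const char *tablespace, const char *tablespace_suffix,
                  unsigned int db_oid, unsigned int relfilenumber,
                  const char *type_suffix, int segno)
{
    static char path[1024];
    char        extent_suffix[32];

    if (segno == 0)
        extent_suffix[0] = '\0';    /* first segment has no ".N" suffix */
    else
        snprintf(extent_suffix, sizeof(extent_suffix), ".%d", segno);

    snprintf(path, sizeof(path), "%s%s/%u/%u%s%s",
             tablespace, tablespace_suffix, db_oid, relfilenumber,
             type_suffix, extent_suffix);
    return path;
}

/* Convenience check: does the generated path equal `expected`? */
static int
demo_relfile_path_is(const char *expected,
                     const char *tablespace, const char *tablespace_suffix,
                     unsigned int db_oid, unsigned int relfilenumber,
                     const char *type_suffix, int segno)
{
    return strcmp(expected,
                  demo_relfile_path(tablespace, tablespace_suffix, db_oid,
                                    relfilenumber, type_suffix, segno)) == 0;
}
```

Calling it once with the old tablespace root and once with the new one, with all other arguments equal, yields two paths that differ only in their prefix, which is the name-preservation property the transfer relies on.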

The --swap mode (new in PG18) is structurally different from the other transfer modes. Instead of copying or linking individual files, it:

  1. Moves the old cluster’s entire database directory ($PGDATA/base/$db_oid) into the new cluster’s slot using rename(2).
  2. Moves the pg_restore-generated catalog files from the moved directory back into the new database directory.
  3. Moves the remaining old catalog files aside to moved_for_upgrade/ under the old cluster, so delete_old_cluster.sh can clean them up later.
// do_swap / swap_catalog_files — src/bin/pg_upgrade/relfilenumber.c
static void
swap_catalog_files(FileNameMap *maps, int size,
                   const char *old_catalog_dir,
                   const char *new_db_dir,
                   const char *moved_db_dir)
{
    /* Move old catalog files aside (those not in maps[]; the user data
       files stay in place from the renamed directory) */
    /* Move new pg_restore-generated catalog files into place */
    /* Fsync everything via sync_queue */
}

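The swap mode’s fsync strategy uses a batched sync_queue (a fixed-size array of paths that is flushed with fsync once full or at the end). Each path gets a writeback hint (pre_sync_fname) when it is queued, so the later batch of fsync calls has less left to write; this is cheaper than a blocking fsync immediately after each move of the catalog files produced by pg_restore (which were written with fsync=off).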
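The queue-then-flush pattern can be sketched as follows. This is not the relfilenumber.c implementation: the capacity is shrunk to 4 so the batching is easy to observe, the platform-specific writeback hint is omitted, scratch files are created under /tmp, and every name prefixed demo_ or DEMO_ is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DEMO_QUEUE_CAP 4    /* tiny on purpose */
#define DEMO_PATH_MAX  64
#define DEMO_MAX_FILES 16

static char demo_queue[DEMO_QUEUE_CAP][DEMO_PATH_MAX];
static int  demo_queue_len;
static int  demo_batches;   /* number of times the queue was drained */

/* fsync every queued path in one pass, then empty the queue. */
static void
demo_queue_sync_all(void)
{
    if (demo_queue_len == 0)
        return;
    for (int i = 0; i < demo_queue_len; i++)
    {
        int fd = open(demo_queue[i], O_RDONLY);

        if (fd >= 0)
        {
            (void) fsync(fd);
            close(fd);
        }
    }
    demo_queue_len = 0;
    demo_batches++;
}

/* Remember a path; drain the queue as soon as it becomes full. */
static void
demo_queue_push(const char *path)
{
    snprintf(demo_queue[demo_queue_len++], DEMO_PATH_MAX, "%s", path);
    if (demo_queue_len >= DEMO_QUEUE_CAP)
        demo_queue_sync_all();
}

/*
 * Create nfiles scratch files, push each one, drain the remainder, and
 * return how many batches were flushed (-1 on setup failure).
 */
static int
demo_count_batches(int nfiles)
{
    char paths[DEMO_MAX_FILES][DEMO_PATH_MAX];
    int  created = 0;

    if (nfiles < 0 || nfiles > DEMO_MAX_FILES)
        return -1;
    demo_queue_len = 0;
    demo_batches = 0;
    for (int i = 0; i < nfiles; i++)
    {
        int fd;

        snprintf(paths[i], DEMO_PATH_MAX, "/tmp/pgupg_sync_XXXXXX");
        fd = mkstemp(paths[i]);
        if (fd < 0)
            break;
        close(fd);
        created++;
        demo_queue_push(paths[i]);
    }
    demo_queue_sync_all();      /* flush whatever is left at the end */
    for (int i = 0; i < created; i++)
        unlink(paths[i]);
    return created == nfiles ? demo_batches : -1;
}
```

With a capacity of 4, five files produce two batches: one when the queue fills and one for the final drain.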

flowchart LR
    A["transfer_all_new_tablespaces"] --> B["transfer_all_new_dbs<br/>per tablespace"]
    B --> C["gen_db_file_maps<br/>old + new RelInfo arrays → FileNameMap[]"]
    C --> D["transfer_single_new_db"]
    D --> E{transfer_mode}
    E -- SWAP --> F["do_swap<br/>rename dir + swap_catalog_files"]
    E -- LINK --> G["transfer_relfile<br/>link primary + _fsm + _vm segments"]
    E -- COPY/CLONE/CFR --> H["transfer_relfile<br/>copy/clone/cfr per segment"]
    G --> I["vm_must_add_frozenbit?<br/>rewriteVisibilityMap"]
    H --> I

Figure 2 — relation-file transfer dispatch in relfilenumber.c.

PG18 introduced a cluster-level default for the signedness of the char type (default_char_signedness in ControlData). After copy_xact_xlog_xid, set_new_cluster_char_signedness reads the old cluster’s value (or the user’s --set-char-signedness override) and calls pg_resetwal --char-signedness signed|unsigned if the new cluster’s default differs. This check is version-gated: --set-char-signedness is rejected when upgrading from PG18 or later (the option only makes sense for clusters that predate the per-cluster default).
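The decision described above has two parts: resolve the target signedness, then reset only if the new cluster differs. A minimal sketch, assuming an encoding of 1 for signed, 0 for unsigned, and -1 for "option not given" (the -1 default is stated elsewhere in this document for UserOpts.char_signedness; the rest of the encoding and all function names are assumptions):

```c
#define CHAR_SIGNEDNESS_UNSET (-1)  /* --set-char-signedness not supplied */

/*
 * Resolve the target: an explicit override wins, otherwise inherit the
 * old cluster's default. Returns 1 for signed, 0 for unsigned.
 */
static int
resolve_char_signedness(int user_opt, int old_cluster_signed)
{
    if (user_opt != CHAR_SIGNEDNESS_UNSET)
        return user_opt == 1;
    return old_cluster_signed != 0;
}

/* pg_resetwal --char-signedness is needed only when the values differ. */
static int
needs_char_signedness_reset(int new_cluster_signed, int target_signed)
{
    return (new_cluster_signed != 0) != (target_signed != 0);
}
```

For example, an unset option with a signed old cluster resolves to signed, and a new cluster that is already signed then needs no pg_resetwal call.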

Frozen-XID bootstrapping and schema restore


Before any user objects are restored, set_frozenxids(false) sets pg_class.relfrozenxid and pg_database.datfrozenxid for all initdb-created tables to the old cluster’s next-XID value. This prevents autovacuum from immediately aging those catalog tables relative to the transplanted XID counter.

create_new_objects runs pg_restore for each database in two passes: template1 first (serially, because transiently dropping it blocks connections), then all other databases in parallel (parallel_exec_prog). Each pg_restore invocation uses --transaction-size=1000 (the RESTORE_TRANSACTION_SIZE constant) to batch TOC entries; in parallel mode the transaction size is divided by the job count to stay within lock limits.
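The per-job batch size follows the Max(1000 / jobs, 10) rule noted elsewhere in this document. A small sketch of that arithmetic; the function name is hypothetical and the floor constant is given a name only for readability:

```c
#define RESTORE_TRANSACTION_SIZE 1000
#define DEMO_MIN_TRANSACTION_SIZE 10    /* floor from the Max(..., 10) rule */

/* Transaction size handed to each pg_restore for a given job count. */
static int
per_job_transaction_size(int jobs)
{
    int size = RESTORE_TRANSACTION_SIZE;

    if (jobs > 1)
    {
        size = RESTORE_TRANSACTION_SIZE / jobs;     /* integer division */
        if (size < DEMO_MIN_TRANSACTION_SIZE)
            size = DEMO_MIN_TRANSACTION_SIZE;
    }
    return size;
}
```

With 4 jobs each pg_restore batches 250 TOC entries per transaction; beyond 100 jobs the floor of 10 applies.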

If the old cluster contains logical replication slots (PG17+), create_logical_replication_slots restores them in the new cluster by calling pg_create_logical_replication_slot for each slot, passing the original plugin name, two-phase decode flag, and failover flag. This happens after pg_resetwal because the slot creation records LSNs and requires the WAL to be in its final state.

  • main — top-level pipeline; owns the two postmaster start/stop cycles.
  • setup — verifies no stale postmaster PID files; tries to start/stop any found.
  • make_outputdirs — creates pg_upgrade_output.d/$timestamp/{dump,log}/.
  • prepare_new_cluster — runs vacuumdb --all --analyze then vacuumdb --all --freeze on the new cluster so pg_statistic is frozen before the old counters are transplanted.
  • prepare_new_globals — calls set_frozenxids(false) then restores globals.dump (roles, tablespaces).
  • create_new_objects — pg_restore per database, template1 first, then parallel; calls get_db_rel_and_slot_infos on the new cluster after.
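  • copy_xact_xlog_xid: copies pg_xact / pg_multixact and calls pg_resetwal for the XID, epoch, and multixact counters and the WAL position. The next-OID counter is set separately by the pg_resetwal -o step that main runs after the file transfer.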
  • set_frozenxids — issues UPDATE on pg_class and pg_database to align frozen-XID markers with the old cluster’s XID counter.
  • set_new_cluster_char_signedness — PG18 char-signedness alignment.
  • set_locale_and_encoding — UPDATEs template0’s datcollate, datctype, datlocprovider, datlocale in the new cluster to match the old.
  • create_logical_replication_slots — per-database loop calling pg_create_logical_replication_slot.
  • check_and_dump_old_cluster — orchestrates old-cluster checks and calls generate_old_dump at the end.
  • check_new_cluster — new-cluster checks: empty check, loadable libraries, transfer-mode-specific checks (clone / copy_file_range / link / swap), logical slot and subscription state.
  • check_cluster_versions — enforces minimum source version (9.2), target must be current PG, no downgrade, binaries match data dirs. PG18: rejects --set-char-signedness for source >= 18.
  • check_cluster_compatibility — calls get_control_data + check_control_data; rejects port collision for live-check mode.
  • check_for_data_types_usage — iterates data_types_usage_checks[], builds UpgradeTask steps, runs per-database; reports failures and aborts.
  • DataTypesUsageChecks — static array; each entry has status, report_filename, base_query, report_text, threshold_version, optional version_hook.
  • check_for_not_null_inheritance — PG18-new: rejects child tables missing parent NOT NULL constraints (schema restore would fail).
  • transfer_all_new_tablespaces — entry point; dispatches by mode (parallel-by-tablespace for jobs > 1).
  • transfer_all_new_dbs — iterates old/new database pairs, calls gen_db_file_maps then transfer_single_new_db.
  • transfer_single_new_db — checks vm_must_add_frozenbit; routes to do_swap or per-map transfer_relfile loop.
  • transfer_relfile — iterates 1 GB segment files + _fsm / _vm forks; calls rewriteVisibilityMap when the frozen bit must be added; dispatches to cloneFile / copyFile / copyFileByRange / linkFile per mode.
  • do_swap — sorts maps by relfilenumber, calls prepare_for_swap then swap_catalog_files per tablespace.
  • prepare_for_swap — renames old db dir into new cluster’s slot, creates moved_for_upgrade/ staging area.
  • swap_catalog_files — moves old catalog files aside, moves pg_restore-generated catalog files into place, fsyncs via sync_queue.
  • sync_queue_* — fixed-size 1024-path queue with pre_sync_fname + batch fsync; drains on full or at end.
  • ClusterInfo — per-cluster state: controldata, dbarr, pgdata, bindir, port, major_version, tablespace_suffix.
  • ControlData — pg_control snapshot: cat_ver, chkpnt_nxtxid, chkpnt_nxtoid, chkpnt_nxtmulti, blocksz, walseg, default_char_signedness (PG18).
  • FileNameMap — per-relation transfer mapping: old/new tablespace paths + suffixes, db_oid, relfilenumber.
  • DbInfo / DbInfoArr — per-database: db_oid, db_name, rel_arr, slot_arr.
  • RelInfo — per-relation: nspname, relname, reloid, relfilenumber, tablespace.
  • LogicalSlotInfo / LogicalSlotInfoArr — per-slot: slotname, plugin, two_phase, failover.
  • UserOpts — parsed CLI options: check, live_check, transfer_mode, jobs, char_signedness (PG18), do_statistics.
  • transferMode enum — TRANSFER_MODE_{CLONE,COPY,COPY_FILE_RANGE,LINK,SWAP}.

Position hints (as of 2026-06-06 / commit 273fe94)

| Symbol | File | Line |
| --- | --- | --- |
| main | pg_upgrade.c | 88 |
| make_outputdirs | pg_upgrade.c | 252 |
| setup | pg_upgrade.c | 337 |
| set_locale_and_encoding | pg_upgrade.c | 440 |
| prepare_new_cluster | pg_upgrade.c | 519 |
| prepare_new_globals | pg_upgrade.c | 549 |
| create_new_objects | pg_upgrade.c | 571 |
| copy_xact_xlog_xid | pg_upgrade.c | 749 |
| set_frozenxids | pg_upgrade.c | 874 |
| create_logical_replication_slots | pg_upgrade.c | 976 |
| set_new_cluster_char_signedness | pg_upgrade.c | 404 |
| output_check_banner | check.c | 570 |
| check_and_dump_old_cluster | check.c | 588 |
| check_new_cluster | check.c | 709 |
| check_cluster_versions | check.c | 849 |
| check_cluster_compatibility | check.c | 904 |
| check_for_data_types_usage | check.c | 463 |
| data_types_usage_checks[] | check.c | 98 |
| check_for_not_null_inheritance | check.c | n/a (grep: “not_null_inheritance”) |
| DataTypesUsageChecks (struct) | check.c | 42 |
| transfer_all_new_tablespaces | relfilenumber.c | 107 |
| transfer_all_new_dbs | relfilenumber.c | 170 |
| prepare_for_swap | relfilenumber.c | 236 |
| swap_catalog_files | relfilenumber.c | 362 |
| do_swap | relfilenumber.c | 452 |
| transfer_single_new_db | relfilenumber.c | 500 |
| transfer_relfile | relfilenumber.c | 552 |
| sync_queue_push | relfilenumber.c | 74 |
| FileNameMap (struct) | pg_upgrade.h | 180 |
| ClusterInfo (struct) | pg_upgrade.h | 287 |
| ControlData (struct) | pg_upgrade.h | 229 |
| UserOpts (struct) | pg_upgrade.h | 328 |
| transferMode (enum) | pg_upgrade.h | 259 |
| RESTORE_TRANSACTION_SIZE | pg_upgrade.c | 58 |
| DEFAULT_CHAR_SIGNEDNESS_CAT_VER | pg_upgrade.h | 132 |

  • --swap transfer mode is present in REL_18_STABLE. TRANSFER_MODE_SWAP appears in the transferMode enum (pg_upgrade.h:265) and is dispatched in transfer_single_new_db. The swap-specific helpers (prepare_for_swap, swap_catalog_files, do_swap, sync_queue_*) are all present in relfilenumber.c. Swap mode requires source >= PG10 (pg_upgrade.c main, check.c:753).

  • --set-char-signedness option is new in PG18 and is rejected for source >= 18. UserOpts.char_signedness defaults to -1 (unset). The option is accepted from the CLI. check_cluster_versions (check.c:895) aborts with a clear message if source cluster major version >= 18 and the option was supplied. set_new_cluster_char_signedness (pg_upgrade.c:404) calls pg_resetwal --char-signedness only when the new cluster’s value differs from the resolved target.

  • RESTORE_TRANSACTION_SIZE is 1000 (hard-coded). Defined at pg_upgrade.c:58. A comment notes this could become user-controllable; it is not a GUC. In parallel mode (jobs > 1) the per-job transaction size is Max(1000 / jobs, 10).

  • check_for_not_null_inheritance is a PG18-new check. The version gate in check_and_dump_old_cluster is <= 1800, meaning it fires for all source versions up to and including PG18 — the check is unconditional for the current target. Its purpose is to catch the pre-PG18 schema where child tables could omit parent NOT NULL constraints.

  • pg_xact is copied verbatim; pg_multixact copy is version-gated. copy_xact_xlog_xid copies pg_xact unconditionally. pg_multixact files are copied only if both clusters have cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER; otherwise only the counter values are reset (not the files).

  • Logical replication slot migration is conditional on count_old_cluster_logical_slots() > 0. The new postmaster is started a third time only if old-cluster slots exist. This avoids the extra postmaster cycle for clusters without logical replication.

  1. check_for_not_null_inheritance line number. The function is declared as a static in check.c:28 but the definition was not reached in the lines read (check.c runs 2376 lines). Investigation path: grep -n check_for_not_null_inheritance /data/hgryoo/references/postgres/src/bin/pg_upgrade/check.c.

  2. --do-statistics option behavior. UserOpts.do_statistics is parsed but its effect in the pipeline (whether it triggers a separate statistics transfer step) was not traced in the lines read. Investigation path: grep -n do_statistics /data/hgryoo/references/postgres/src/bin/pg_upgrade/*.c.

  3. Swap mode tablespace limitation. prepare_for_swap contains a comment (“XXX: The below line is a hack”) indicating that the new tablespace path is assumed equal to the old tablespace path, which blocks support for in-place tablespaces in swap mode. Whether this is tracked as a known limitation or a planned fix is not captured in the source.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

  • Oracle Database upgrade (dbupgrade / DBUA) — Oracle performs in-place catalog upgrade by running upgrade scripts against the running database. No physical file transfer needed because the heap format is stable; the catalog DDL is versioned separately. A comparison would reveal whether PostgreSQL’s schema-dump + OID-pin approach has lower correctness risk than Oracle’s in-place catalog mutation.

  • MySQL / InnoDB upgrade — InnoDB marks data dictionary tables with format versions; the server runs DDL upgrade on first start. No analog to pg_upgrade’s explicit OID control; InnoDB’s clustered-index design avoids the toast-pointer OID problem that makes OID stability mandatory in PostgreSQL.

  • pg_upgrade + logical replication as a zero-downtime path — Combining pg_upgrade (for the physical copy) with logical replication (to replay in-flight writes during the copy window) is a documented production pattern. The --swap mode (PG18) shrinks the physical-copy window to near zero even for large databases, making this combination more practical.

  • Reflink / copy-on-write upgrade: TRANSFER_MODE_CLONE relies on file-system reflinks (ioctl(FICLONE) on Linux), available on btrfs and on XFS created with reflink support; the separate copy-file-range mode can yield similar block sharing on file systems that implement copy_file_range that way. Cloning makes the transfer near-instantaneous and keeps the old files intact (blocks are copied on the first write to either copy). The performance implications on WAL-heavy workloads post-upgrade warrant measurement.

(none — synthesized directly from source tree)

Source code (REL_18_STABLE, commit 273fe94)

  • src/bin/pg_upgrade/pg_upgrade.c — main pipeline, helpers
  • src/bin/pg_upgrade/pg_upgrade.h — all key types
  • src/bin/pg_upgrade/check.c — pre-flight checks
  • src/bin/pg_upgrade/relfilenumber.c — relation-file transfer
  • knowledge/code-analysis/postgres/postgres-pg-dump-restore.md — pg_dump / pg_restore mechanism (the schema-dump and restore steps pg_upgrade calls)
  • knowledge/code-analysis/postgres/postgres-page-layout.md — heap page format (why pages are transferable across versions)
  • knowledge/code-analysis/postgres/postgres-mvcc-snapshots.md — XID visibility (why commit-log transplant is necessary)
  • knowledge/code-analysis/postgres/postgres-initdb-bootstrap-genbki.md — initdb / binary-upgrade mode (how the new cluster’s OID assignments are forced to match)