PostgreSQL pg_upgrade — In-Place Major-Version Upgrade
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-06)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A major-version upgrade replaces the server binary and, typically, the on-disk format of system catalogs. Two strategies exist for carrying user data across this boundary.
Logical migration (dump-reload) serializes every relation into SQL text via
pg_dump, drops the old cluster, initializes a new one, and replays the SQL.
It is always correct — the restore path is the same code path used for any
other restore — but its cost is proportional to the total size of the data.
A 10 TB database takes the same time to restore as it took to load originally.
See postgres-pg-dump-restore.md for the mechanism.
Physical migration (pg_upgrade) moves the heap and index files directly.
The premise is that while system catalogs change between major versions, user
table heap pages and index pages are usually compatible: PostgreSQL’s page layout
(postgres-page-layout.md), on-disk tuple format, and B-tree on-disk representation
have been stable across most major-version boundaries since 8.x. If the data
files can be moved as-is, the upgrade time shrinks to the time needed to
transfer files, which with hard-link or directory-swap modes depends on the
number of files rather than on the volume of data.
The core invariant physical migration must preserve is OID stability for a specific set of catalog columns that are stored in user data:
| Catalog column | Stored in user data as |
|---|---|
| pg_class.oid / pg_class.relfilenode | TOAST pointers in heap tuples (oid); relation file names on disk (relfilenode) |
| pg_type.oid | Composite type values in user tables |
| pg_enum.oid | Enum values in user tables |
| pg_tablespace.oid | Directory name on disk |
| pg_database.oid | Directory name on disk |
| pg_authid.oid | Legacy large-object metadata |
If the new cluster assigns different OIDs to these objects, the transferred
heap files contain dangling references. pg_upgrade therefore takes explicit
control of OID assignment during the new-cluster schema restore: the schema is
dumped with pg_dump --binary-upgrade and restored into a server running in
binary-upgrade mode, which forces OIDs to match the old cluster’s values.
Two additional invariants govern the physical-file layer:
- Block-size homogeneity. A page must be blocksz bytes in both clusters. Mismatches are a hard stop.
- WAL segment compatibility. WAL records reference page LSNs; if the WAL segment size differs, pg_resetwal cannot safely re-seal the new WAL directory.
Both are verified by check_control_data at pre-flight time.
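
The two invariants above can be sketched as a small comparison routine. This is only an illustration under stated assumptions: MiniControlData and mini_check_control_data are hypothetical names, and the real check_control_data compares more fields than these two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for pg_upgrade's ControlData. */
typedef struct
{
    uint32_t blocksz;   /* page size in bytes */
    uint32_t walseg;    /* WAL segment size in bytes */
} MiniControlData;

/* Returns NULL when compatible, else a message naming the first mismatch. */
static const char *
mini_check_control_data(const MiniControlData *oldctrl,
                        const MiniControlData *newctrl)
{
    assert(oldctrl != NULL && newctrl != NULL);

    if (oldctrl->blocksz == 0 || oldctrl->blocksz != newctrl->blocksz)
        return "block sizes are invalid or do not match";
    if (oldctrl->walseg == 0 || oldctrl->walseg != newctrl->walseg)
        return "WAL segment sizes are invalid or do not match";
    return NULL;
}
```

A caller would treat any non-NULL result as a hard stop before touching files.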
Common DBMS Design
Physical-migration tools for relational databases share a structure regardless of vendor:
Pre-flight check battery
Before touching any files the tool verifies that the two clusters are
compatible: same block size, same WAL segment size, no prepared transactions
that would leave the old cluster in a non-clean state, no extension libraries
present in the old cluster that are absent from the new one, no data types
whose on-disk representation changed between the two versions, no user tables
using column types that reference unstable OIDs. Each check is independent
and can be run without performing the actual upgrade (--check mode).
Schema-only dump + binary-mode restore
The new cluster is initialized fresh by initdb. The old cluster’s schema
(DDL for every database object, in dependency order) is dumped via the logical
backup tool in a mode that emits explicit OID assignments alongside the DDL
(in PostgreSQL, calls to binary_upgrade_set_next_* support functions). This schema is
restored into the new cluster, which is running in a mode that accepts and
enforces those explicit OIDs. The result is a new cluster whose system
catalogs are structurally new but whose OID namespace matches the old cluster
exactly for the objects that matter.
Relation-file transfer
With schema in place, each user relation’s heap and index files are transferred from old data directory to new. Five transfer strategies cover the speed/safety/flexibility tradeoffs:
| Strategy | Mechanism | Data-size cost | Rollback safety |
|---|---|---|---|
| Copy | cp / read+write | Full copy | Old cluster intact |
| Clone | ioctl FICLONE / reflink | Copy-on-write, near-zero | Old cluster intact |
| Copy-file-range | copy_file_range(2) | In-kernel copy | Old cluster intact |
| Hard-link | link(2) | Near-zero | Shared inodes — old cluster unsafe after new starts |
| Swap | Directory rename | Near-zero | Old cluster directory replaced |
Transaction-counter transplant
The new cluster must inherit the old cluster’s
transaction counters (nextXid, nextOid, nextMultiXact, WAL position)
so that no future transaction ID collides with an existing MVCC visibility
record. In PostgreSQL, pg_resetwal writes these values into the new cluster’s
pg_control while the server is stopped: most of them before the schema
restore, and the next OID after the file transfer.
Theory ↔ PostgreSQL mapping
| Concept | PostgreSQL name |
|---|---|
| Old cluster descriptor | ClusterInfo old_cluster |
| New cluster descriptor | ClusterInfo new_cluster |
| Per-relation file mapping | FileNameMap |
| Transfer strategy enum | transferMode / TRANSFER_MODE_* |
| Binary-upgrade flags | --binary-upgrade (pg_dump / pg_dumpall); -b (server started in binary-upgrade mode) |
| Transaction-counter write | pg_resetwal invocations in copy_xact_xlog_xid, plus pg_resetwal -o in main |
PostgreSQL’s Approach
Overall pipeline
main() in pg_upgrade.c runs a linear pipeline. The two phases (“OLD” and
“NEW” in the source comments) each start and stop a postmaster:

```c
/* main (src/bin/pg_upgrade/pg_upgrade.c) */
parseCommandLine(argc, argv);
adjust_data_dir(&old_cluster);
adjust_data_dir(&new_cluster);
make_outputdirs(new_cluster.pgdata);    /* pg_upgrade_output.d/$timestamp/ */
setup(argv[0]);

output_check_banner();
check_cluster_versions();
check_cluster_compatibility();          /* pg_control cross-check */
check_and_dump_old_cluster();           /* OLD: checks + pg_dump schema */

/* -- NEW -- */
start_postmaster(&new_cluster, true);
check_new_cluster();
report_clusters_compatible();

set_locale_and_encoding();
prepare_new_cluster();                  /* vacuumdb --all --analyze + --freeze */
stop_postmaster(false);

copy_xact_xlog_xid();                   /* pg_resetwal: xid/multixact/WAL */
set_new_cluster_char_signedness();

/* -- NEW (second time) -- */
start_postmaster(&new_cluster, true);
prepare_new_globals();                  /* set_frozenxids + restore globals dump */
create_new_objects();                   /* pg_restore per database, parallel */
stop_postmaster(false);

transfer_all_new_tablespaces(...);      /* move/link/clone/swap relation files */
/* pg_resetwal -o (next OID) */
create_logical_replication_slots();     /* if any */
issue_warnings_and_set_wal_level();
```

flowchart TD
A[parseCommandLine] --> B[check_cluster_versions<br/>check_cluster_compatibility]
B --> C[check_and_dump_old_cluster<br/>OLD postmaster: checks + pg_dump schema]
C --> D[start NEW postmaster]
D --> E[check_new_cluster<br/>report_clusters_compatible]
E --> F[prepare_new_cluster<br/>vacuumdb analyze + freeze]
F --> G[stop NEW postmaster]
G --> H[copy_xact_xlog_xid<br/>pg_resetwal: counters + WAL]
H --> I[start NEW postmaster]
I --> J[prepare_new_globals<br/>set_frozenxids + restore globals]
J --> K[create_new_objects<br/>pg_restore per-db parallel]
K --> L[stop NEW postmaster]
L --> M[transfer_all_new_tablespaces<br/>copy / link / clone / swap]
M --> N[pg_resetwal -o next OID]
N --> O[create_logical_replication_slots]
O --> P[issue_warnings_and_set_wal_level]
P --> Q[Upgrade Complete]
Figure 1 — pg_upgrade main pipeline (REL_18_STABLE). Two start/stop cycles of the new postmaster cover the preparation and the schema restore; the physical file transfer runs afterwards, with the server stopped.
ClusterInfo and ControlData
Every piece of state about old and new clusters lives in the two global
ClusterInfo instances:

```c
/* ClusterInfo, ControlData (src/bin/pg_upgrade/pg_upgrade.h) */
typedef struct
{
    ControlData   controldata;      /* pg_control snapshot */
    DbLocaleInfo *template0;        /* template0 locale / encoding */
    DbInfoArr     dbarr;            /* per-database: relations + logical slots */
    char         *pgdata;           /* $PGDATA path */
    char         *bindir;           /* bin/ path (pg_dump, pg_restore, etc.) */
    unsigned short port;            /* postmaster listen port */
    uint32        major_version;
    const char   *tablespace_suffix;
} ClusterInfo;

typedef struct
{
    uint32 cat_ver;                 /* catalog version, checked for compatibility */
    uint32 chkpnt_nxtxid;           /* next transaction ID to transplant */
    uint32 chkpnt_nxtoid;           /* next OID to transplant */
    uint32 chkpnt_nxtmulti;         /* next MultiXactId to transplant */
    uint32 chkpnt_nxtmxoff;         /* next MultiXact offset */
    uint32 chkpnt_oldstMulti;       /* oldest MultiXactId */
    uint32 chkpnt_oldstxid;         /* oldest transaction ID */
    uint32 blocksz;                 /* must match between clusters */
    uint32 walseg;                  /* WAL segment size, must match */
    bool   default_char_signedness; /* PG18: char signed/unsigned flag */
    /* ... */
} ControlData;
```

check_control_data cross-checks blocksz, walseg, align, index,
toast, large_object and a handful of other fields. Any mismatch is a
hard abort.
Pre-flight check battery
check_and_dump_old_cluster and check_new_cluster (both in check.c) run
the pre-flight battery. Key checks:

```c
/* check_and_dump_old_cluster (src/bin/pg_upgrade/check.c) */
check_for_connection_status(&old_cluster);
get_db_rel_and_slot_infos(&old_cluster);
check_is_install_user(&old_cluster);
check_for_prepared_transactions(&old_cluster);
check_for_isn_and_int8_passing_mismatch(&old_cluster);
check_for_data_types_usage(&old_cluster);   /* reg* types, line, jsonb, aclitem ... */
check_for_unicode_update(&old_cluster);
/*
 * version-gated: encoding conversions, postfix ops, polymorphics,
 * tables with OIDs, NOT NULL inheritance (new in PG18), pg_ role prefixes
 */
generate_old_dump();                        /* pg_dump --schema-only --binary-upgrade */
```

The data-type checks are driven by the DataTypesUsageChecks table: a
statically initialized array of structs, each carrying a status string, a
report filename, a SQL query that extracts the OID of a problematic type, a
human-readable error text, and a threshold_version that controls whether
the check applies to the old cluster’s version:

```c
/* DataTypesUsageChecks data_types_usage_checks[] (src/bin/pg_upgrade/check.c) */
{
    .status = "Checking for system-defined composite types in user tables",
    .base_query =
        "SELECT t.oid FROM pg_catalog.pg_type t ... "
        "WHERE typtype = 'c' AND (t.oid < 16384 OR nspname = 'information_schema')",
    .threshold_version = ALL_VERSIONS
},
{
    .status = "Checking for reg* data types in user tables",
    .base_query =
        "SELECT oid FROM pg_catalog.pg_type t "
        "WHERE t.typname IN ('regcollation','regconfig','regdictionary','regnamespace', "
        "'regoper','regoperator','regproc','regprocedure')",
    .threshold_version = ALL_VERSIONS
},
/* ... aclitem (<=15), unknown (<=9.6), sql_identifier (<=11), jsonb (manual),
   abstime/reltime/tinterval (<=11), line (<=9.3) ... */
```

Each check is run via an UpgradeTask that connects to each database in the
old cluster and executes the query. A failing check writes a report file to
pg_upgrade_output.d/$timestamp/ and aborts.
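
The version gating described above can be sketched as a filter over a table of checks. This is a simplified illustration, not the code in check.c: MiniTypeCheck and count_applicable_checks are hypothetical, the sentinel value chosen for ALL_VERSIONS is an assumption, and versions are written in the 903 / 1100 / 1800 style used elsewhere in this document.

```c
#include <assert.h>
#include <stddef.h>

#define ALL_VERSIONS (-1)   /* assumed sentinel: applies to every source version */

/* Hypothetical reduced form of one data-type check entry. */
typedef struct
{
    const char *status;
    int         threshold_version;  /* e.g. 1100 = applies to sources <= PG11 */
} MiniTypeCheck;

/* Counts the entries that apply to a source cluster of the given version. */
static size_t
count_applicable_checks(const MiniTypeCheck *checks, size_t n, int old_major)
{
    size_t applicable = 0;

    assert(checks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (checks[i].threshold_version == ALL_VERSIONS ||
            old_major <= checks[i].threshold_version)
            applicable++;
    }
    return applicable;
}
```

Keeping the gate in data rather than in control flow means a new check only needs a new array entry.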
The PG18-new check (check_for_not_null_inheritance) rejects child tables
that omit NOT NULL constraints required by their parent columns, because the
schema restore will fail for those.
Transaction-counter transplant
copy_xact_xlog_xid wires the old cluster’s counters into the new cluster’s
pg_control using a sequence of pg_resetwal invocations:

```c
/* copy_xact_xlog_xid (src/bin/pg_upgrade/pg_upgrade.c) */

/* Copy pg_xact (commit log) from old to new */
copy_subdir_files("pg_xact", "pg_xact");

/* Transplant oldest and next XID */
exec_prog(..., "\"%s/pg_resetwal\" -f -u %u \"%s\"",
          new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid, ...);
exec_prog(..., "\"%s/pg_resetwal\" -f -x %u \"%s\"",
          new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid, ...);

/* Transplant MultiXact counters (if format compatible) */
copy_subdir_files("pg_multixact/offsets", "pg_multixact/offsets");
copy_subdir_files("pg_multixact/members", "pg_multixact/members");
exec_prog(..., "\"%s/pg_resetwal\" -O %u -m %u,%u \"%s\"",
          new_cluster.bindir,
          old_cluster.controldata.chkpnt_nxtmxoff,
          old_cluster.controldata.chkpnt_nxtmulti,
          old_cluster.controldata.chkpnt_oldstMulti, ...);

/* Reset WAL archives to match old cluster's LSN */
exec_prog(..., "\"%s/pg_resetwal\" -l 00000001%s \"%s\"",
          new_cluster.bindir, old_cluster.controldata.nextxlogfile + 8, ...);
```

The pg_xact directory (commit log) is copied verbatim because the new
cluster must know which of those old XIDs were committed. Without it, the
commit status of heap tuples in the transferred files could not be determined.
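
The last pg_resetwal call builds its -l argument from nextxlogfile + 8 behind a literal 00000001 prefix. A minimal sketch of that string manipulation follows, assuming the usual 24-hex-digit WAL segment name whose first 8 digits are the timeline ID; wal_name_on_timeline_one is a hypothetical helper, not a pg_upgrade function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrites a 24-character WAL segment name so that its timeline ID is 1. */
static void
wal_name_on_timeline_one(const char *nextxlogfile, char *out, size_t outlen)
{
    assert(nextxlogfile != NULL && strlen(nextxlogfile) == 24);
    assert(out != NULL && outlen >= 25);

    /* skip the 8 hex digits of the old timeline, keep the remaining 16 */
    snprintf(out, outlen, "00000001%s", nextxlogfile + 8);
}
```

The effect is that the new cluster resumes at the old cluster's WAL position but on timeline 1.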
Relation-file transfer
transfer_all_new_tablespaces (in relfilenumber.c) iterates over every
database and relation, builds a FileNameMap array pairing old and new
file paths, then dispatches to transfer_single_new_db:

```c
/* FileNameMap (src/bin/pg_upgrade/pg_upgrade.h) */
typedef struct
{
    const char   *old_tablespace;
    const char   *new_tablespace;
    const char   *old_tablespace_suffix;
    const char   *new_tablespace_suffix;
    Oid           db_oid;
    RelFileNumber relfilenumber;
    char         *nspname;
    char         *relname;
} FileNameMap;
```

transfer_relfile handles one relation, iterating over 1 GB segment files
(relfilenumber, relfilenumber.1, relfilenumber.2, …) and their forks
(_fsm, _vm). For visibility-map forks when upgrading from a cluster
predating VISIBILITY_MAP_FROZEN_BIT_CAT_VER, it rewrites the VM file to
add the frozen bit rather than copying it verbatim.
This is where the relfilenumber-preservation invariant becomes concrete:
the old and new file paths are built from the same map->relfilenumber
field, so the heap/index file keeps its on-disk name across the transfer.
That name was pinned during the binary-upgrade schema restore, which is why
toast pointers inside transferred tuples still resolve. The loop also shows
the per-mode dispatch and the VM-rewrite short-circuit:

```c
/* transfer_relfile (src/bin/pg_upgrade/relfilenumber.c) */

/* same relfilenumber on both sides: the name is preserved, not reassigned */
snprintf(old_file, sizeof(old_file), "%s%s/%u/%u%s%s",
         map->old_tablespace, map->old_tablespace_suffix,
         map->db_oid, map->relfilenumber, type_suffix, extent_suffix);
snprintf(new_file, sizeof(new_file), "%s%s/%u/%u%s%s",
         map->new_tablespace, map->new_tablespace_suffix,
         map->db_oid, map->relfilenumber, type_suffix, extent_suffix);

unlink(new_file);

if (vm_must_add_frozenbit && strcmp(type_suffix, "_vm") == 0)
    /* rewrite VM to add per-page frozen bit instead of copying verbatim */
    rewriteVisibilityMap(old_file, new_file, map->nspname, map->relname);
else
    switch (user_opts.transfer_mode)
    {
        case TRANSFER_MODE_CLONE:           /* ioctl FICLONE / reflink */
            cloneFile(old_file, new_file, map->nspname, map->relname);
            break;
        case TRANSFER_MODE_COPY:            /* read + write */
            copyFile(old_file, new_file, map->nspname, map->relname);
            break;
        case TRANSFER_MODE_COPY_FILE_RANGE: /* copy_file_range(2) */
            copyFileByRange(old_file, new_file, map->nspname, map->relname);
            break;
        case TRANSFER_MODE_LINK:            /* link(2), shared inode */
            linkFile(old_file, new_file, map->nspname, map->relname);
            break;
        case TRANSFER_MODE_SWAP:            /* handled in do_swap, not here */
            pg_fatal("should never happen");
            break;
    }
```

Note that TRANSFER_MODE_SWAP is explicitly forbidden in this path: swap
mode renames whole database directories in do_swap and never touches the
per-segment loop.
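
The segment and fork naming used by the loop can be isolated in a small helper. This is a sketch under stated assumptions: build_segment_path is a hypothetical function that mirrors the snprintf format shown above, and the directory names in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds the path of one segment of one fork. type_suffix is "", "_fsm" or
 * "_vm"; segment 0 has no numeric extension, later ones get ".1", ".2", ...
 */
static void
build_segment_path(char *dst, size_t dstlen, const char *tablespace,
                   const char *suffix, unsigned db_oid,
                   unsigned relfilenumber, const char *type_suffix, int segno)
{
    char extent_suffix[32];

    assert(dst != NULL && dstlen > 0 && segno >= 0);
    if (segno == 0)
        extent_suffix[0] = '\0';
    else
        snprintf(extent_suffix, sizeof(extent_suffix), ".%d", segno);

    snprintf(dst, dstlen, "%s%s/%u/%u%s%s", tablespace, suffix,
             db_oid, relfilenumber, type_suffix, extent_suffix);
}
```

Calling it with ("/old", "/base", 5, 16384, "", 0) and ("/new", "/base", 5, 16384, "", 0) gives two paths that differ only in the cluster root, which is the point of pinning the relfilenumber.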
The --swap mode (new in PG18) is structurally different from the other transfer modes. Instead of copying or linking individual files, it:
- Moves the old cluster’s entire database directory ($PGDATA/base/$db_oid) into the new cluster’s slot using rename(2).
- Moves the old catalog files aside to moved_for_upgrade/ under the old cluster, so delete_old_cluster.sh can clean them up later.
- Moves the pg_restore-generated catalog files from the moved directory back into the new database directory.

```c
/* do_swap / swap_catalog_files (src/bin/pg_upgrade/relfilenumber.c) */
static void
swap_catalog_files(FileNameMap *maps, int size, const char *old_catalog_dir,
                   const char *new_db_dir, const char *moved_db_dir)
{
    /*
     * Move old catalog files aside (those not in maps[]; the user data
     * files stay in place from the renamed directory)
     */
    /* Move new pg_restore-generated catalog files into place */
    /* Fsync everything via sync_queue */
}
```

The swap mode’s fsync strategy uses a batched sync_queue (a fixed-size
array of paths that is flushed with fsync once full or at the end). Each path
is queued rather than synced immediately, which avoids a blocking fsync after
every individual catalog file produced by pg_restore (those files were
written with fsync=off).
flowchart LR
A["transfer_all_new_tablespaces"] --> B["transfer_all_new_dbs<br/>per tablespace"]
B --> C["gen_db_file_maps<br/>old + new RelInfo arrays → FileNameMap[]"]
C --> D["transfer_single_new_db"]
D --> E{transfer_mode}
E -- SWAP --> F["do_swap<br/>rename dir + swap_catalog_files"]
E -- LINK --> G["transfer_relfile<br/>link primary + _fsm + _vm segments"]
E -- COPY/CLONE/CFR --> H["transfer_relfile<br/>copy/clone/cfr per segment"]
G --> I["vm_must_add_frozenbit?<br/>rewriteVisibilityMap"]
H --> I
Figure 2 — relation-file transfer dispatch in relfilenumber.c.
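
The visibility-map rewrite that Figure 2 routes to rewriteVisibilityMap can be illustrated at the level of a single byte. The sketch assumes the old format stores one all-visible bit per heap page and the new format stores two bits per page (all-visible, then all-frozen); expand_vm_byte and the constants are illustrative names, and page headers and file I/O are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE       8
#define BITS_PER_HEAPBLOCK  2       /* assumed new format: visible + frozen */
#define ALL_VISIBLE_BIT     0x01

/* Expands one old-format VM byte (1 bit per heap page) into 16 new bits. */
static uint16_t
expand_vm_byte(uint8_t old_byte)
{
    uint16_t new_bits = 0;

    for (int i = 0; i < BITS_PER_BYTE; i++)
    {
        if (old_byte & (1 << i))
            new_bits |= (uint16_t) (ALL_VISIBLE_BIT << (BITS_PER_HEAPBLOCK * i));
    }

    /* no all-frozen bit (the odd bit positions) may be set by the rewrite */
    assert((new_bits & 0xAAAA) == 0);
    return new_bits;
}
```

Because every old byte becomes two new bytes, the rewritten fork is about twice as large, which is why it cannot simply be linked or cloned.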
Char signedness (PG18)
PG18 introduced a cluster-level default for the signedness of the char type
(default_char_signedness in ControlData). After copy_xact_xlog_xid,
set_new_cluster_char_signedness reads the old cluster’s value (or the
user’s --set-char-signedness override) and calls pg_resetwal --char-signedness signed|unsigned if the new cluster’s default differs.
This check is version-gated: --set-char-signedness is rejected when
upgrading from PG18 or later (the option only makes sense for clusters that
predate the per-cluster default).
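
The resolution rule described above (user override first, otherwise inherit from the old cluster, and only reset when the new cluster differs) fits in a few lines. This is a hedged sketch: both function names are hypothetical, and the 0 / 1 encoding of the user's choice is an assumption; only the -1 "unset" default comes from this document.

```c
#include <assert.h>
#include <stdbool.h>

/* user_choice: -1 = option not given, 0 = unsigned, 1 = signed (assumed). */
static bool
resolve_char_signedness(int user_choice, bool old_cluster_signed)
{
    assert(user_choice >= -1 && user_choice <= 1);
    if (user_choice != -1)
        return user_choice == 1;
    return old_cluster_signed;
}

/* True when the new cluster's pg_control value would need rewriting. */
static bool
needs_signedness_reset(bool new_cluster_signed, int user_choice,
                       bool old_cluster_signed)
{
    return new_cluster_signed !=
        resolve_char_signedness(user_choice, old_cluster_signed);
}
```

When needs_signedness_reset returns false, no pg_resetwal call is required for this step.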
Frozen-XID bootstrapping and schema restore
Before any user objects are restored, set_frozenxids(false) sets
pg_class.relfrozenxid and pg_database.datfrozenxid for all
initdb-created tables to the old cluster’s next-XID value. This prevents
autovacuum from immediately aging those catalog tables relative to the
transplanted XID counter.
create_new_objects runs pg_restore for each database in two passes:
template1 first (serially, because transiently dropping it blocks connections),
then all other databases in parallel (parallel_exec_prog). Each pg_restore
invocation uses --transaction-size=1000 (the RESTORE_TRANSACTION_SIZE
constant) to batch TOC entries; in parallel mode the transaction size is
divided by the job count to stay within lock limits.
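
The per-job transaction size can be worked through as arithmetic. The sketch below uses the constant named in the text and the floor of 10 recorded in the verification notes later in this document; per_job_transaction_size is a hypothetical helper name.

```c
#include <assert.h>

#define RESTORE_TRANSACTION_SIZE 1000

/* Transaction size handed to each pg_restore job for a given --jobs value. */
static int
per_job_transaction_size(int jobs)
{
    int txn_size = RESTORE_TRANSACTION_SIZE;

    assert(jobs >= 1);
    if (jobs > 1)
    {
        txn_size /= jobs;       /* keep the total lock footprint bounded */
        if (txn_size < 10)
            txn_size = 10;      /* floor for very large job counts */
    }
    return txn_size;
}
```

With 4 jobs each restore batches 250 TOC entries per transaction; with 200 jobs the floor of 10 applies.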
Logical replication slot migration
If the old cluster contains logical replication slots (PG17+),
create_logical_replication_slots restores them in the new cluster by calling
pg_create_logical_replication_slot for each slot, passing the original
plugin name, two-phase decode flag, and failover flag. This happens after
pg_resetwal because the slot creation records LSNs and requires the WAL to
be in its final state.
Source Walkthrough
pg_upgrade.c — pipeline and helpers
- main: top-level pipeline; owns the two postmaster start/stop cycles.
- setup: verifies no stale postmaster PID files; tries to start/stop any found.
- make_outputdirs: creates pg_upgrade_output.d/$timestamp/{dump,log}/.
- prepare_new_cluster: runs vacuumdb --all --analyze then vacuumdb --all --freeze on the new cluster so pg_statistic is frozen before the old counters are transplanted.
- prepare_new_globals: calls set_frozenxids(false) then restores globals.dump (roles, tablespaces).
- create_new_objects: pg_restore per database, template1 first, then parallel; calls get_db_rel_and_slot_infos on the new cluster afterwards.
- copy_xact_xlog_xid: copies pg_xact / pg_multixact, calls pg_resetwal for XID, epoch, multixact counters, and WAL position (the next OID is set later, by pg_resetwal -o in main).
- set_frozenxids: issues UPDATE on pg_class and pg_database to align frozen-XID markers with the old cluster’s XID counter.
- set_new_cluster_char_signedness: PG18 char-signedness alignment.
- set_locale_and_encoding: UPDATEs template0’s datcollate, datctype, datlocprovider, datlocale in the new cluster to match the old.
- create_logical_replication_slots: per-database loop calling pg_create_logical_replication_slot.
check.c — pre-flight
- check_and_dump_old_cluster: orchestrates old-cluster checks and calls generate_old_dump at the end.
- check_new_cluster: new-cluster checks, covering the empty check, loadable libraries, transfer-mode-specific checks (clone / copy_file_range / link / swap), logical slot and subscription state.
- check_cluster_versions: enforces minimum source version (9.2), target must be current PG, no downgrade, binaries match data dirs. PG18: rejects --set-char-signedness for source >= 18.
- check_cluster_compatibility: calls get_control_data + check_control_data; rejects port collision for live-check mode.
- check_for_data_types_usage: iterates data_types_usage_checks[], builds UpgradeTask steps, runs per-database; reports failures and aborts.
- DataTypesUsageChecks: entry type of the static data_types_usage_checks[] array; each entry has status, report_filename, base_query, report_text, threshold_version, optional version_hook.
- check_for_not_null_inheritance: PG18-new; rejects child tables missing parent NOT NULL constraints (schema restore would fail).
relfilenumber.c — file transfer
- transfer_all_new_tablespaces: entry point; dispatches by mode (parallel-by-tablespace for jobs > 1).
- transfer_all_new_dbs: iterates old/new database pairs, calls gen_db_file_maps then transfer_single_new_db.
- transfer_single_new_db: checks vm_must_add_frozenbit; routes to do_swap or the per-map transfer_relfile loop.
- transfer_relfile: iterates 1 GB segment files + _fsm / _vm forks; calls rewriteVisibilityMap when the frozen bit must be added; dispatches to cloneFile / copyFile / copyFileByRange / linkFile per mode.
- do_swap: sorts maps by relfilenumber, calls prepare_for_swap then swap_catalog_files per tablespace.
- prepare_for_swap: renames old db dir into new cluster’s slot, creates moved_for_upgrade/ staging area.
- swap_catalog_files: moves old catalog files aside, moves pg_restore-generated catalog files into place, fsyncs via sync_queue.
- sync_queue_*: fixed-size 1024-path queue with pre_sync_fname + batch fsync; drains on full or at end.
pg_upgrade.h — key types
- ClusterInfo: per-cluster state: controldata, dbarr, pgdata, bindir, port, major_version, tablespace_suffix.
- ControlData: pg_control snapshot: cat_ver, chkpnt_nxtxid, chkpnt_nxtoid, chkpnt_nxtmulti, blocksz, walseg, default_char_signedness (PG18).
- FileNameMap: per-relation transfer mapping: old/new tablespace paths + suffixes, db_oid, relfilenumber.
- DbInfo / DbInfoArr: per-database: db_oid, db_name, rel_arr, slot_arr.
- RelInfo: per-relation: nspname, relname, reloid, relfilenumber, tablespace.
- LogicalSlotInfo / LogicalSlotInfoArr: per-slot: slotname, plugin, two_phase, failover.
- UserOpts: parsed CLI options: check, live_check, transfer_mode, jobs, char_signedness (PG18), do_statistics.
- transferMode enum: TRANSFER_MODE_{CLONE,COPY,COPY_FILE_RANGE,LINK,SWAP}.
Position hints (as of 2026-06-06 / commit 273fe94)
| Symbol | File | Line |
|---|---|---|
| main | pg_upgrade.c | 88 |
| make_outputdirs | pg_upgrade.c | 252 |
| setup | pg_upgrade.c | 337 |
| set_locale_and_encoding | pg_upgrade.c | 440 |
| prepare_new_cluster | pg_upgrade.c | 519 |
| prepare_new_globals | pg_upgrade.c | 549 |
| create_new_objects | pg_upgrade.c | 571 |
| copy_xact_xlog_xid | pg_upgrade.c | 749 |
| set_frozenxids | pg_upgrade.c | 874 |
| create_logical_replication_slots | pg_upgrade.c | 976 |
| set_new_cluster_char_signedness | pg_upgrade.c | 404 |
| output_check_banner | check.c | 570 |
| check_and_dump_old_cluster | check.c | 588 |
| check_new_cluster | check.c | 709 |
| check_cluster_versions | check.c | 849 |
| check_cluster_compatibility | check.c | 904 |
| check_for_data_types_usage | check.c | 463 |
| data_types_usage_checks[] | check.c | 98 |
| check_for_not_null_inheritance | check.c | — (grep: “not_null_inheritance”) |
| DataTypesUsageChecks (struct) | check.c | 42 |
| transfer_all_new_tablespaces | relfilenumber.c | 107 |
| transfer_all_new_dbs | relfilenumber.c | 170 |
| prepare_for_swap | relfilenumber.c | 236 |
| swap_catalog_files | relfilenumber.c | 362 |
| do_swap | relfilenumber.c | 452 |
| transfer_single_new_db | relfilenumber.c | 500 |
| transfer_relfile | relfilenumber.c | 552 |
| sync_queue_push | relfilenumber.c | 74 |
| FileNameMap (struct) | pg_upgrade.h | 180 |
| ClusterInfo (struct) | pg_upgrade.h | 287 |
| ControlData (struct) | pg_upgrade.h | 229 |
| UserOpts (struct) | pg_upgrade.h | 328 |
| transferMode (enum) | pg_upgrade.h | 259 |
| RESTORE_TRANSACTION_SIZE | pg_upgrade.c | 58 |
| DEFAULT_CHAR_SIGNEDNESS_CAT_VER | pg_upgrade.h | 132 |
Source verification (as of 2026-06-06)
Verified facts
- --swap transfer mode is present in REL_18_STABLE. TRANSFER_MODE_SWAP appears in the transferMode enum (pg_upgrade.h:265) and is dispatched in transfer_single_new_db. The swap-specific helpers (prepare_for_swap, swap_catalog_files, do_swap, sync_queue_*) are all present in relfilenumber.c. Swap mode requires source >= PG10 (pg_upgrade.c main, check.c:753).
- --set-char-signedness option is new in PG18 and is rejected for source >= 18. UserOpts.char_signedness defaults to -1 (unset). The option is accepted from the CLI. check_cluster_versions (check.c:895) aborts with a clear message if source cluster major version >= 18 and the option was supplied. set_new_cluster_char_signedness (pg_upgrade.c:404) calls pg_resetwal --char-signedness only when the new cluster’s value differs from the resolved target.
- RESTORE_TRANSACTION_SIZE is 1000 (hard-coded). Defined at pg_upgrade.c:58. A comment notes this could become user-controllable; it is not a GUC. In parallel mode (jobs > 1) the per-job transaction size is Max(1000 / jobs, 10).
- check_for_not_null_inheritance is a PG18-new check. The version gate in check_and_dump_old_cluster is <= 1800, meaning it fires for all source versions up to and including PG18; in effect the check is unconditional for the current target. Its purpose is to catch the pre-PG18 schema where child tables could omit parent NOT NULL constraints.
- pg_xact is copied verbatim; pg_multixact copy is version-gated. copy_xact_xlog_xid copies pg_xact unconditionally. pg_multixact files are copied only if both clusters have cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER; otherwise only the counter values are reset (not the files).
- Logical replication slot migration is conditional on count_old_cluster_logical_slots() > 0. The new postmaster is started a third time only if old-cluster slots exist. This avoids the extra postmaster cycle for clusters without logical replication.
Open questions
- check_for_not_null_inheritance line number. The function is declared as a static in check.c:28 but the definition was not reached in the lines read (check.c runs 2376 lines). Investigation path: grep -n check_for_not_null_inheritance /data/hgryoo/references/postgres/src/bin/pg_upgrade/check.c.
- --do-statistics option behavior. UserOpts.do_statistics is parsed but its effect in the pipeline (whether it triggers a separate statistics transfer step) was not traced in the lines read. Investigation path: grep -n do_statistics /data/hgryoo/references/postgres/src/bin/pg_upgrade/*.c.
- Swap mode tablespace limitation. prepare_for_swap carries a comment (“XXX: The below line is a hack”) stating that the new tablespace path is assumed equal to the old tablespace path, which blocks support for in-place tablespaces in swap mode. Whether this is tracked as a known limitation or a planned fix is not captured in the source.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Oracle Database upgrade (dbupgrade / DBUA): Oracle performs in-place catalog upgrade by running upgrade scripts against the running database. No physical file transfer is needed because the heap format is stable; the catalog DDL is versioned separately. A comparison would reveal whether PostgreSQL’s schema-dump + OID-pin approach has lower correctness risk than Oracle’s in-place catalog mutation.
- MySQL / InnoDB upgrade: InnoDB marks data dictionary tables with format versions; the server runs DDL upgrade on first start. There is no analog to pg_upgrade’s explicit OID control; InnoDB’s clustered-index design avoids the toast-pointer OID problem that makes OID stability mandatory in PostgreSQL.
- pg_upgrade + logical replication as a zero-downtime path: Combining pg_upgrade (for the physical copy) with logical replication (to replay in-flight writes during the copy window) is a documented production pattern. The --swap mode (PG18) shrinks the physical-copy window to near zero even for large databases, making this combination more practical.
- Reflink / copy-on-write upgrade: TRANSFER_MODE_CLONE uses ioctl(FICLONE) on Linux, which works on file systems with reflink support such as Btrfs and XFS; copy_file_range(2) is a separate mode (TRANSFER_MODE_COPY_FILE_RANGE). Cloning makes the transfer near-instantaneous and keeps the old files intact (blocks are copied on the first write to either copy). The performance implications on WAL-heavy workloads post-upgrade warrant measurement.
Sources
Raw files
(none; synthesized directly from source tree)
Source code (REL_18_STABLE, commit 273fe94)
- src/bin/pg_upgrade/pg_upgrade.c: main pipeline, helpers
- src/bin/pg_upgrade/pg_upgrade.h: all key types
- src/bin/pg_upgrade/check.c: pre-flight checks
- src/bin/pg_upgrade/relfilenumber.c: relation-file transfer
Adjacent docs
- knowledge/code-analysis/postgres/postgres-pg-dump-restore.md: pg_dump / pg_restore mechanism (the schema-dump and restore steps pg_upgrade calls)
- knowledge/code-analysis/postgres/postgres-page-layout.md: heap page format (why pages are transferable across versions)
- knowledge/code-analysis/postgres/postgres-mvcc-snapshots.md: XID visibility (why commit-log transplant is necessary)
- knowledge/code-analysis/postgres/postgres-initdb-bootstrap-genbki.md: initdb / binary-upgrade mode (how the new cluster’s OID assignments are forced to match)