PostgreSQL pg_upgrade failures: extension, collation, and locale gotchas

You ran pg_upgrade --check and it reported a collation version mismatch. Or the upgrade finished, your application reconnected, and PostGIS functions failed with could not access file '$libdir/postgis-3'. Maybe SELECT queries against text columns returned rows in the wrong order, or unique indexes threw violations against previously clean data. These are structural gaps between what pg_upgrade copies (catalog metadata) and what it does not verify (shared libraries, OS collation rules, and runtime configuration).

If you used --link mode, the situation is worse. Once the new cluster starts, do not restart the old cluster. The two clusters share hard-linked files. Starting the old cluster after the new one has written WAL corrupts both. Rollback is only possible if the new cluster was never started.

What this means

pg_upgrade copies system catalogs – extensions, types, functions, and collation objects – but does not reinstall extensions, rebuild indexes against current OS collation rules, or copy postgresql.conf. When the new cluster’s environment diverges from those assumptions, failures surface as library load errors, collation mismatch warnings, or silent index corruption.

The most deceptive failures happen when the upgrade appears to succeed. A SQL-only function like PostGIS_Full_Version() runs without loading a shared library, while a C function like ST_Contains fails immediately. A B-tree index on a text column continues to exist but returns rows in an order that no longer matches current collation rules, producing wrong results or unique-constraint violations.

flowchart TD
    A[pg_upgrade copies catalogs] --> B[New cluster starts]
    A -.->|does not verify| C[Extension .so libraries]
    A -.->|carries forward| D[Collation version metadata]
    E[OS/library upgrade] -->|new rules| D
    C -->|missing| F[Extension call fails]
    D -->|drift| G[Wrong index ordering]
    B --> H[Queries return errors or wrong results]
    F --> I[PostGIS catch-22]
    G --> J[Unique violation on existing data]

Common causes

CauseWhat it looks likeFirst thing to check
Missing extension shared library in new clusterERROR: could not access file "$libdir/postgis-3" when calling a C function after upgrade; SQL-only functions like PostGIS_Full_Version() still workFile existence in the new cluster’s library directory via pg_config --pkglibdir
glibc or ICU collation version driftWrong sort order, unique-index violations on existing data, or startup warnings about collation version mismatchSELECT pg_collation_actual_version(oid) FROM pg_collation WHERE collversion IS NOT NULL LIMIT 1;
Data checksum incompatibilitypg_upgrade refuses with “old cluster does not use data checksums but the new one does”pg_controldata checksum state on the old cluster
Utility extensions requiring manual removalpg_upgrade warns or fails because extensions such as pg_repack or hypopg must be dropped before upgrade and recreated afterSELECT extname FROM pg_extension WHERE extname IN ('pg_repack', 'hypopg');
Uncommitted prepared transactions“There are uncommitted prepared transactions” in pg_upgrade outputSELECT * FROM pg_prepared_xacts;

Quick checks

# Check for prepared transactions that block pg_upgrade
psql -c "SELECT * FROM pg_prepared_xacts;"
# Identify extensions that need manual handling
psql -c "SELECT extname, extversion FROM pg_extension WHERE extname IN ('postgis', 'pg_repack', 'hypopg');"
# List collations with version metadata
psql -c "SELECT collname, collprovider, collversion FROM pg_collation WHERE collversion IS NOT NULL;"
# Compare actual OS glibc collation version against catalog
psql -c "SELECT pg_collation_actual_version(oid) FROM pg_collation WHERE collprovider = 'c' AND collversion IS NOT NULL LIMIT 1;"
# Identify replication slots that must be dropped before upgrade
psql -c "SELECT slot_name, slot_type FROM pg_replication_slots;"
# Verify data checksum state on the old cluster
pg_controldata /path/to/old/datadir | grep -i checksum
# Confirm extension shared libraries exist in the new cluster
ls "$(/usr/lib/postgresql/16/bin/pg_config --pkglibdir)"/postgis*.so

How to diagnose it

  1. Run pg_upgrade --check with the exact flags you intend to use for the real upgrade, including --link or --clone if applicable. Do not skip this step.
  2. If the check reports extension conflicts, verify whether the extension includes C functions. Extensions that rely on shared libraries need the .so file in the new cluster’s library directory before the upgrade; SQL-only extensions typically upgrade via ALTER EXTENSION.
  3. Check for prepared transactions with pg_prepared_xacts. Commit or rollback any that exist.
  4. Check collation version drift. Run pg_collation_actual_version(oid) for glibc-backed collations and compare the result against the collversion column in pg_collation. A difference means the OS locale rules changed between versions.
  5. Check the old cluster’s checksum state. If the new cluster defaults to checksums and the old cluster lacks them, use pg_checksums --enable on the old cluster after a clean shutdown.
  6. List replication slots. Drop physical slots before upgrade; they do not survive major version upgrades.
  7. After a successful --check, perform the upgrade. Start the new cluster and immediately test one C-function call per extension per database.
  8. On any database where collation drift was detected, run REINDEX DATABASE ... during a maintenance window, then refresh the tracked version with ALTER DATABASE ... REFRESH COLLATION VERSION on PostgreSQL 14+.

Fixes

Missing extension shared libraries

Install the extension package into the new cluster before running pg_upgrade. Installing after the upgrade and running ALTER EXTENSION name UPDATE; may not force the library to load if the catalog version already matches; a cluster restart may be required.

The PostGIS catch-22 is particularly deceptive: PostGIS_Full_Version() is a SQL function and succeeds even when the underlying C library is missing. Verify with a C function such as SELECT ST_AsText(ST_MakePoint(0, 0));. Drop utility extensions such as pg_repack and hypopg before the upgrade and recreate them after.

Collation version drift

If the OS glibc or ICU version changed between the old and new cluster installation, B-tree indexes on text columns may return incorrect results because the index ordering no longer matches current collation rules. After upgrade, run REINDEX DATABASE name; for each affected database during a maintenance window; this locks indexes and blocks writes. On PostgreSQL 14+, follow with ALTER DATABASE name REFRESH COLLATION VERSION;. For ICU collations specifically, use ALTER COLLATION pg_catalog."name" REFRESH VERSION;.

Data checksum incompatibility

When upgrading to PostgreSQL 18 on systems where the new cluster defaults to data page checksums, pg_upgrade refuses if the old cluster lacks them. Take the old cluster offline cleanly and run pg_checksums --enable -D /path/to/old/datadir. This rewrites data pages and requires sufficient disk I/O headroom. Do not attempt this on a running instance.

Prepared transactions and replication slots

Commit or rollback all prepared transactions before upgrade. Drop physical replication slots; they do not survive major version upgrades. Recreate them afterward.

Configuration and statistics

pg_upgrade does not copy postgresql.conf. Manually migrate shared_preload_libraries, max_connections, work_mem, and other GUCs. After upgrade, run vacuumdb --all --analyze-in-stages to rebuild optimizer statistics. Extended statistics created with CREATE STATISTICS may need to be repopulated with ANALYZE.

If you used --link mode and the new cluster has started, do not restart the old cluster. The two clusters share hard-linked data files. Starting the old cluster after the new one has written WAL corrupts both. Rollback is impossible. Have a tested logical or base backup before you begin.

Prevention

  • Always run pg_upgrade --check with the intended mode flags before the real upgrade.
  • Verify extension package availability in the new cluster’s library directory before starting.
  • Test one C-function call per extension in a staging environment.
  • Check OS glibc and ICU versions. If they differ from the old environment, plan for a post-upgrade REINDEX and REFRESH COLLATION VERSION.
  • Drop utility extensions such as pg_repack and hypopg before upgrade.
  • Resolve all prepared transactions and drop physical replication slots.
  • Verify that data checksum states are compatible, or enable checksums on the old cluster beforehand.
  • Run pg_amcheck on the old cluster before upgrade to detect silent corruption that pg_upgrade would copy forward.
  • Archive WAL and verify base backups before upgrade.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
pg_collation_actual_version vs collversionDetects OS collation rule changes before indexes return wrong dataValues differ for glibc or ICU collations
Extension C-function call successVerifies that installed extensions have matching shared librariesERROR: could not access file "$libdir/..."
pg_prepared_xacts countPrepared transactions block pg_upgrade entirelyAny non-empty result before upgrade
pg_replication_slots countPhysical slots do not survive major version upgradesSlots present immediately before upgrade
data_checksums setting (old vs new)Checksum mismatch may prevent pg_upgradeOld cluster off, new cluster on

How Netdata helps

  • Correlate PostgreSQL log warnings about collation version mismatch or extension load failures with system-level events on the host.
  • Alert on replication slot lag and WAL growth before upgrade to prevent disk-full incidents during the transition window.
  • Monitor query error rates post-upgrade to detect extension function failures that application smoke tests might miss.
  • Track I/O saturation during post-upgrade REINDEX or vacuumdb operations to avoid unexpected latency spikes.