PostgreSQL pg_upgrade failures: extension, collation, and locale gotchas
You ran pg_upgrade --check and it reported a collation version mismatch. Or the upgrade finished, your application reconnected, and PostGIS functions failed with could not access file '$libdir/postgis-3'. Maybe SELECT queries against text columns returned rows in the wrong order, or unique indexes threw violations against previously clean data. These are structural gaps between what pg_upgrade copies (catalog metadata) and what it does not verify (shared libraries, OS collation rules, and runtime configuration).
If you used --link mode, the situation is worse. Once the new cluster starts, do not restart the old cluster. The two clusters share hard-linked files. Starting the old cluster after the new one has written WAL corrupts both. Rollback is only possible if the new cluster was never started.
What this means
pg_upgrade copies system catalogs – extensions, types, functions, and collation objects – but does not reinstall extensions, rebuild indexes against current OS collation rules, or copy postgresql.conf. When the new cluster’s environment diverges from those assumptions, failures surface as library load errors, collation mismatch warnings, or silent index corruption.
The most deceptive failures happen when the upgrade appears to succeed. A SQL-only function like PostGIS_Full_Version() runs without loading a shared library, while a C function like ST_Contains fails immediately. A B-tree index on a text column continues to exist but returns rows in an order that no longer matches current collation rules, producing wrong results or unique-constraint violations.
flowchart TD
A[pg_upgrade copies catalogs] --> B[New cluster starts]
A -.->|does not verify| C[Extension .so libraries]
A -.->|carries forward| D[Collation version metadata]
E[OS/library upgrade] -->|new rules| D
C -->|missing| F[Extension call fails]
D -->|drift| G[Wrong index ordering]
B --> H[Queries return errors or wrong results]
F --> I[PostGIS catch-22]
G --> J[Unique violation on existing data]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Missing extension shared library in new cluster | ERROR: could not access file "$libdir/postgis-3" when calling a C function after upgrade; SQL-only functions like PostGIS_Full_Version() still work | File existence in the new cluster’s library directory via pg_config --pkglibdir |
| glibc or ICU collation version drift | Wrong sort order, unique-index violations on existing data, or startup warnings about collation version mismatch | SELECT pg_collation_actual_version(oid) FROM pg_collation WHERE collversion IS NOT NULL LIMIT 1; |
| Data checksum incompatibility | pg_upgrade refuses with “old cluster does not use data checksums but the new one does” | pg_controldata checksum state on the old cluster |
| Utility extensions requiring manual removal | pg_upgrade warns or fails because extensions such as pg_repack or hypopg must be dropped before upgrade and recreated after | SELECT extname FROM pg_extension WHERE extname IN ('pg_repack', 'hypopg'); |
| Uncommitted prepared transactions | “There are uncommitted prepared transactions” in pg_upgrade output | SELECT * FROM pg_prepared_xacts; |
Quick checks
# Check for prepared transactions that block pg_upgrade
psql -c "SELECT * FROM pg_prepared_xacts;"
# Identify extensions that need manual handling
psql -c "SELECT extname, extversion FROM pg_extension WHERE extname IN ('postgis', 'pg_repack', 'hypopg');"
# List collations with version metadata
psql -c "SELECT collname, collprovider, collversion FROM pg_collation WHERE collversion IS NOT NULL;"
# Compare actual OS glibc collation version against catalog
psql -c "SELECT pg_collation_actual_version(oid) FROM pg_collation WHERE collprovider = 'c' AND collversion IS NOT NULL LIMIT 1;"
# Identify replication slots that must be dropped before upgrade
psql -c "SELECT slot_name, slot_type FROM pg_replication_slots;"
# Verify data checksum state on the old cluster
pg_controldata /path/to/old/datadir | grep -i checksum
# Confirm extension shared libraries exist in the new cluster
ls "$(/usr/lib/postgresql/16/bin/pg_config --pkglibdir)"/postgis*.so
How to diagnose it
- Run
pg_upgrade --checkwith the exact flags you intend to use for the real upgrade, including--linkor--cloneif applicable. Do not skip this step. - If the check reports extension conflicts, verify whether the extension includes C functions. Extensions that rely on shared libraries need the
.sofile in the new cluster’s library directory before the upgrade; SQL-only extensions typically upgrade viaALTER EXTENSION. - Check for prepared transactions with
pg_prepared_xacts. Commit or rollback any that exist. - Check collation version drift. Run
pg_collation_actual_version(oid)for glibc-backed collations and compare the result against thecollversioncolumn inpg_collation. A difference means the OS locale rules changed between versions. - Check the old cluster’s checksum state. If the new cluster defaults to checksums and the old cluster lacks them, use
pg_checksums --enableon the old cluster after a clean shutdown. - List replication slots. Drop physical slots before upgrade; they do not survive major version upgrades.
- After a successful
--check, perform the upgrade. Start the new cluster and immediately test one C-function call per extension per database. - On any database where collation drift was detected, run
REINDEX DATABASE ...during a maintenance window, then refresh the tracked version withALTER DATABASE ... REFRESH COLLATION VERSIONon PostgreSQL 14+.
Fixes
Missing extension shared libraries
Install the extension package into the new cluster before running pg_upgrade. Installing after the upgrade and running ALTER EXTENSION name UPDATE; may not force the library to load if the catalog version already matches; a cluster restart may be required.
The PostGIS catch-22 is particularly deceptive: PostGIS_Full_Version() is a SQL function and succeeds even when the underlying C library is missing. Verify with a C function such as SELECT ST_AsText(ST_MakePoint(0, 0));. Drop utility extensions such as pg_repack and hypopg before the upgrade and recreate them after.
Collation version drift
If the OS glibc or ICU version changed between the old and new cluster installation, B-tree indexes on text columns may return incorrect results because the index ordering no longer matches current collation rules. After upgrade, run REINDEX DATABASE name; for each affected database during a maintenance window; this locks indexes and blocks writes. On PostgreSQL 14+, follow with ALTER DATABASE name REFRESH COLLATION VERSION;. For ICU collations specifically, use ALTER COLLATION pg_catalog."name" REFRESH VERSION;.
Data checksum incompatibility
When upgrading to PostgreSQL 18 on systems where the new cluster defaults to data page checksums, pg_upgrade refuses if the old cluster lacks them. Take the old cluster offline cleanly and run pg_checksums --enable -D /path/to/old/datadir. This rewrites data pages and requires sufficient disk I/O headroom. Do not attempt this on a running instance.
Prepared transactions and replication slots
Commit or rollback all prepared transactions before upgrade. Drop physical replication slots; they do not survive major version upgrades. Recreate them afterward.
Configuration and statistics
pg_upgrade does not copy postgresql.conf. Manually migrate shared_preload_libraries, max_connections, work_mem, and other GUCs. After upgrade, run vacuumdb --all --analyze-in-stages to rebuild optimizer statistics. Extended statistics created with CREATE STATISTICS may need to be repopulated with ANALYZE.
–link mode safety
If you used --link mode and the new cluster has started, do not restart the old cluster. The two clusters share hard-linked data files. Starting the old cluster after the new one has written WAL corrupts both. Rollback is impossible. Have a tested logical or base backup before you begin.
Prevention
- Always run
pg_upgrade --checkwith the intended mode flags before the real upgrade. - Verify extension package availability in the new cluster’s library directory before starting.
- Test one C-function call per extension in a staging environment.
- Check OS glibc and ICU versions. If they differ from the old environment, plan for a post-upgrade
REINDEXandREFRESH COLLATION VERSION. - Drop utility extensions such as
pg_repackandhypopgbefore upgrade. - Resolve all prepared transactions and drop physical replication slots.
- Verify that data checksum states are compatible, or enable checksums on the old cluster beforehand.
- Run
pg_amcheckon the old cluster before upgrade to detect silent corruption that pg_upgrade would copy forward. - Archive WAL and verify base backups before upgrade.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
pg_collation_actual_version vs collversion | Detects OS collation rule changes before indexes return wrong data | Values differ for glibc or ICU collations |
| Extension C-function call success | Verifies that installed extensions have matching shared libraries | ERROR: could not access file "$libdir/..." |
pg_prepared_xacts count | Prepared transactions block pg_upgrade entirely | Any non-empty result before upgrade |
pg_replication_slots count | Physical slots do not survive major version upgrades | Slots present immediately before upgrade |
data_checksums setting (old vs new) | Checksum mismatch may prevent pg_upgrade | Old cluster off, new cluster on |
How Netdata helps
- Correlate PostgreSQL log warnings about collation version mismatch or extension load failures with system-level events on the host.
- Alert on replication slot lag and WAL growth before upgrade to prevent disk-full incidents during the transition window.
- Monitor query error rates post-upgrade to detect extension function failures that application smoke tests might miss.
- Track I/O saturation during post-upgrade
REINDEXorvacuumdboperations to avoid unexpected latency spikes.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns
- PostgreSQL autovacuum blocked by long-running transaction: detection and fix
- PostgreSQL autovacuum not running: detection, causes, and fixes
- PostgreSQL autovacuum tuning: per-table thresholds for high-churn workloads
- PostgreSQL blocking queries: finding the root blocker in a lock cascade
- PostgreSQL checkpoint storms: detection, causes, and tuning
- PostgreSQL: checkpoints are occurring too frequently – what to tune
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis
- PostgreSQL: database is not accepting commands to avoid wraparound data loss
- PostgreSQL dead tuples piling up: why autovacuum can’t keep up






