$ guides / postgres / postgres-frozen-xid-monitoring ▌

Operations Guides

PostgreSQL frozen XID monitoring: catching wraparound 6 months early

Transaction ID wraparound is a slow burn, not a sudden crisis. The 32-bit XID counter advances with every transaction. If VACUUM freeze does not keep pace, PostgreSQL will eventually refuse new transactions to prevent data corruption. By the time built-in log warnings fire, you may have only hours of runway left. Monitor age(datfrozenxid) and age(relfrozenxid) with tiered thresholds so you act with months of lead time.

To use those metrics effectively, you need to know what blocks freeze progress and which response matches each threshold tier. This guide covers the freeze mechanism, exact queries, common blocker patterns, and operational thresholds.

What XID wraparound is and why it matters

Wraparound is a hard architectural limit. PostgreSQL assigns every transaction a 32-bit unsigned transaction ID, cycling at roughly 2.147 billion transactions. To avoid cycling back over live data, VACUUM marks old row versions as frozen. The oldest unfrozen XID in each database is recorded in pg_database.datfrozenxid. age(datfrozenxid) returns the transaction span between that frozen point and the current XID counter.

When age(datfrozenxid) grows, the database is consuming XIDs faster than freeze advances. PostgreSQL emits warnings when approximately 40 million XIDs remain (age roughly 2.107 billion). When roughly 3 million XIDs remain, the database refuses new transactions. There is no parameter that expands the 32-bit space. Your only operational defense is to monitor age() per database and per table, remove blockers, and ensure anti-wraparound vacuum completes.

How freezing works

Freeze is a side effect of VACUUM. Every table stores a relfrozenxid in pg_class that tracks the oldest unfrozen XID in that table. When VACUUM runs, it freezes tuples older than vacuum_freeze_min_age and performs an aggressive freeze scan when the table’s age exceeds vacuum_freeze_table_age (default 150 million). If a table’s age crosses autovacuum_freeze_max_age (default 200 million), PostgreSQL launches an anti-wraparound autovacuum even if autovacuum is globally disabled. You can identify these emergency workers in pg_stat_activity by the query string pattern autovacuum: ... to prevent wraparound.

relfrozenxid updates only when VACUUM finishes. If a freeze scan runs for hours, the reported table age does not drop until completion. This confuses operators who expect immediate progress.

The database-level datfrozenxid is the oldest relfrozenxid across all tables in that database, including system catalogs. Even a user database with no application writes advances its XID counter because background processes and system catalog churn consume transaction IDs. Monitor all databases, not just application-facing ones.

flowchart LR
    XID[XID counter increments] --> VAC{VACUUM freeze}
    VAC -->|Unblocked| FROZEN[Tuple frozen
relfrozenxid advances]
    FROZEN --> DBAGE[datfrozenxid advances]
    DBAGE --> SAFE[Wraparound risk resets]
    VAC -->|Blocked by long tx
or replication slot| STALL[Freeze stalls]
    STALL --> AGE[age(datfrozenxid) grows]
    AGE --> WRAP[Approach 2.147B ceiling]

Where freeze lag shows up in production

Freeze stalls when something pins the visibility horizon VACUUM uses to judge whether a tuple can be frozen. The two most common blockers are long-running transactions and inactive replication slots.

An open transaction holds a snapshot that prevents VACUUM from freezing tuples created after it started. This includes idle in transaction sessions left by connection leaks or ORM auto-begin behavior. While the transaction sits idle, age(datfrozenxid) climbs.

Replication slots, especially logical slots for CDC or forgotten physical slots, retain WAL and hold back the freeze horizon (logical slots via catalog_xmin, physical slots via xmin). An inactive slot forces the primary to retain WAL and prevents freeze from advancing. Combined with WAL growth, this can fill the disk and compound the outage.

The monitoring surface: database, table, and multixact metrics

Run these checks daily or stream them into your time-series store.

Database-level age. The primary scalar for wraparound risk:

SELECT datname, age(datfrozenxid)
FROM pg_database
ORDER BY 2 DESC;

Table-level age. Identifies which tables are lagging. Check the main table and its TOAST table; TOAST has its own relfrozenxid and can independently drive database age:

SELECT n.nspname, c.relname, age(c.relfrozenxid), age(t.relfrozenxid)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relkind = 'r';

Multixact age. Multixact IDs wrap independently and are consumed by row-level share locks. A database heavy on FOR SHARE or FOR KEY SHARE can hit multixact wraparound even when XID age looks safe. The equivalent forced-vacuum threshold is autovacuum_multixact_freeze_max_age:

SELECT datname, mxid_age(datminmxid)
FROM pg_database
ORDER BY 2 DESC;

Vacuum progress. When you are actively remediating, confirm that freeze vacuum is scanning rather than blocked. Focus on phase; values like scanning heap or vacuuming indexes confirm forward motion. If heap_blks_scanned is static for minutes, the worker is likely blocked on a lock or I/O:

SELECT pid, phase, heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;

Alert thresholds and tiered response

Use raw XID age against the 2.147 billion ceiling. Do not wait for PostgreSQL’s built-in warnings.

Tier	age(datfrozenxid)	Meaning	Response
Early warning	500 million	Roughly 25% of safe range consumed	Review autovacuum schedule and blocker queries
Urgent	1 billion	Roughly 50% consumed	Schedule manual `VACUUM FREEZE` windows; terminate blockers
Emergency	1.5 billion	Roughly 75% consumed	Emergency `VACUUM FREEZE` on lagging tables; page on-call
Critical	2.107 billion	~40 million XIDs remain	PostgreSQL warnings are firing; writes will stop imminently

Anti-wraparound autovacuum triggers automatically when a table crosses autovacuum_freeze_max_age (default 200 million), but the database-level datfrozenxid can still be driven upward by a single lagging table or a blocked vacuum. Tiered alerts give you time to act before the automatic emergency.

Warning: VACUUM FREEZE acquires a SHARE UPDATE EXCLUSIVE lock and generates heavy sequential I/O. Run it during a maintenance window on large tables, and verify the freeze is not blocked before starting.

Blockers that stall freeze progress

Before running manual freeze, check for the blockers that make vacuum ineffective.

Long-running transactions. Check for sessions with old xact_start timestamps, especially those in idle in transaction:

SELECT pid, usename, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start < NOW() - INTERVAL '5 minutes'
ORDER BY xact_start;

Set idle_in_transaction_session_timeout to a reasonable value (for example, 2 minutes) to prevent connection leaks from holding back freeze indefinitely.

Inactive replication slots. Check for slots that are not currently active but are retaining WAL:

SELECT slot_name, active, restart_lsn
FROM pg_replication_slots
WHERE NOT active;

Drop slots for decommissioned consumers. On PostgreSQL 13 and later, set max_slot_wal_keep_size to prevent an inactive slot from filling the disk.

Signals to watch in production

Signal	Why it matters	Warning sign
`age(datfrozenxid)`	Primary scalar for wraparound risk	Exceeds 500 million
`age(relfrozenxid)` per table	Identifies specific lagging tables	Any table approaching 1 billion
`mxid_age(datminmxid)`	Multixact wraparound is a separate failure mode	Growing steadily across databases
`pg_stat_activity.query`	Confirms anti-wraparound vacuum is running	Pattern matches `to prevent wraparound`
`pg_replication_slots.active`	Inactive slots block freeze and fill disk	`false` with a stale `restart_lsn`
`pg_stat_activity.state`	Open transactions pin the freeze horizon	`idle in transaction` with an old `xact_start`

How Netdata helps

Netdata collects age(datfrozenxid) and age(relfrozenxid) as time-series, exposing upward trends before a threshold breach.
Alerts on XID and multixact age can be set to the tiers above, paging you before PostgreSQL’s built-in warnings fire.
pg_stat_progress_vacuum metrics show whether a remediation vacuum is scanning heap blocks or stalled.
Correlating XID age with pg_stat_activity state counts and replication slot status distinguishes “VACUUM cannot keep up” from “VACUUM is blocked by a long transaction or slot.”

The Netdata solution

PostgreSQL monitoring with Netdata

Netdata monitors PostgreSQL with per-second metrics, pre-built dashboards, and ML-powered anomaly detection. Correlate connection saturation, lock waits, autovacuum progress, replication lag, and checkpoint I/O against the rest of your stack so you catch the incidents in these runbooks before they page anyone.

See PostgreSQL monitoring → Start monitoring free

PostgreSQL frozen XID monitoring: catching wraparound 6 months early

PostgreSQL frozen XID monitoring: catching wraparound 6 months early

What XID wraparound is and why it matters

How freezing works

Where freeze lag shows up in production

The monitoring surface: database, table, and multixact metrics

Alert thresholds and tiered response

Blockers that stall freeze progress

Signals to watch in production

How Netdata helps

Related guides

PostgreSQL monitoring with Netdata