PostgreSQL autovacuum blocked by long-running transaction: detection and fix
Autovacuum workers are active, but n_dead_tup climbs and queries slow down as they scan dead tuples. Table bloat grows because VACUUM cannot reclaim dead row versions: a long-running transaction, abandoned replication slot, or hot-standby feedback is pinning the xmin horizon cluster-wide. The worker is not broken; it is blocked by an older transaction ID that must remain visible.
What this means
PostgreSQL’s MVCC keeps old tuple versions in the table until VACUUM removes them. VACUUM cannot remove any tuple that might still be visible to an active transaction. The boundary is the xmin horizon: the oldest transaction ID still active anywhere in the cluster.
If a backend holds backend_xmin or backend_xid far in the past, every dead tuple created after that point is frozen in place. Autovacuum scans the table, consumes I/O, and reports success, but n_dead_tup does not drop. On high-churn tables this causes rapid bloat, degraded sequential-scan performance, and increased I/O. The blockage is not table-local: one stale transaction protects dead tuples across every table in the database.
The ANALYZE phase of an autovacuum worker also acquires a snapshot and holds it for the entire duration, blocking cleanup while it runs.
flowchart TD
A[Long-running transaction or slot] --> B[ Pins xmin horizon ]
B --> C[ VACUUM cannot remove newer dead tuples ]
C --> D[ n_dead_tup grows despite autovacuum ]
D --> E[ Table bloat and scan slowdown ]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Long-running transaction on primary | pg_stat_activity shows active or idle in transaction with an old xact_start and backend_xmin set | SELECT age(backend_xmin), age(backend_xid) FROM pg_stat_activity WHERE backend_xmin IS NOT NULL |
| Abandoned replication slot | Disk usage growing on primary; slot consumer is offline | SELECT slot_name, age(xmin), age(catalog_xmin) FROM pg_replication_slots WHERE active = false |
| Orphaned prepared transaction | Two-phase commit transaction left in prepared state | SELECT gid, prepared, age(transaction) FROM pg_prepared_xacts |
| Hot standby feedback | Replica long query holds backend_xmin; primary bloat grows | SELECT application_name, age(backend_xmin) FROM pg_stat_replication |
Quick checks
# Backends holding back the xmin horizon
psql -c "SELECT pid, datname, usename, state,
backend_xmin, backend_xid,
clock_timestamp() - xact_start AS xact_age,
clock_timestamp() - state_change AS state_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
OR backend_xid IS NOT NULL
ORDER BY COALESCE(age(backend_xmin), age(backend_xid)) DESC;"
# Abandoned replication slots
psql -c "SELECT slot_name, slot_type, database,
active, age(xmin) AS xmin_age, age(catalog_xmin) AS catalog_age
FROM pg_replication_slots
WHERE xmin IS NOT NULL
OR catalog_xmin IS NOT NULL
ORDER BY age(xmin) DESC NULLS LAST, age(catalog_xmin) DESC NULLS LAST;"
# Orphaned prepared transactions
psql -c "SELECT gid, prepared, owner, database,
age(transaction) AS xact_age
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;"
# Standby xmin held back on the primary
psql -c "SELECT application_name, client_addr, state,
backend_xmin, age(backend_xmin) AS xmin_age
FROM pg_stat_replication
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;"
# Dead tuple growth on affected tables
psql -c "SELECT schemaname, relname, n_live_tup, n_dead_tup,
round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0), 3) AS dead_ratio,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;"
How to diagnose it
- Confirm autovacuum runs but does not reduce dead tuples. Query
pg_stat_user_tablesfor tables wheren_dead_tupis high andlast_autovacuumis recent. If the dead tuple count grows despite a recentlast_autovacuum, the worker is blocked. - Find the xmin horizon blocker on the primary. Query
pg_stat_activityfor rows wherebackend_xminorbackend_xidis not null. Order by age descending. The oldest value defines the horizon. Anidle in transactionsession underSERIALIZABLEisolation can block cleanup even without writes. - Check replication slots if no backend explains the horizon. Query
pg_replication_slots. An inactive slot with an oldxminorcatalog_xminretains WAL and prevents vacuum from cleaning tuples newer than that point. Logical slots holdcatalog_xminand block catalog vacuuming. - Check prepared transactions. Query
pg_prepared_xacts. A prepared transaction from a failed two-phase commit holds its XID indefinitely until committed or rolled back. - Check physical streaming standbys. If
hot_standby_feedback = on, a long query on a replica sends itsbackend_xminto the primary. Querypg_stat_replicationon the primary to seebackend_xminfrom each standby connection. - Correlate with verbose vacuum output. Run
VACUUM (VERBOSE)on the affected table. The output includes a line such asN dead row versions cannot be removed yet, oldest xmin: XXXXXX, confirming the exact horizon value.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
pg_stat_activity transaction start age | Directly measures how far back the xmin horizon is pinned | backend_xmin set and xact_start older than 5 minutes |
pg_stat_user_tables.n_dead_tup | Measures dead tuple accumulation | Growing across consecutive checks on a high-write table |
pg_replication_slots xmin / catalog_xmin | Slots pin the horizon even when no query is running | Inactive slot with non-null xmin or catalog_xmin |
age(datfrozenxid) | Wraparound safety; blocked vacuum delays freezing | > 500 million transactions (early); > 1 billion (urgent) |
| Dead tuple ratio | Bloat pressure that degrades scans | > 20% and rising |
pg_stat_replication.backend_xmin | Standby feedback can bloat the primary | Standby backend_xmin lagging behind primary |
Fixes
Terminate the blocking backend
If the blocker is a long-running query or an abandoned idle in transaction session on the primary, terminate it.
pg_cancel_backend interrupts the current query but does not close the transaction. The snapshot and xmin horizon remain until the backend disconnects or commits. Use pg_terminate_backend to force disconnection and roll back the transaction, which immediately releases the horizon.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = <blocker_pid>;
Warning: this kills the session and rolls back any uncommitted work.
Drop or advance the replication slot
If an abandoned slot is the cause, confirm the consumer is permanently offline, then drop it:
SELECT pg_drop_replication_slot('slot_name');
For logical slots where the subscriber is paused due to a schema mismatch, resolve the mismatch and let the subscriber advance, or drop and recreate the slot and reseed the subscriber. Dropping a slot means the consumer must be reinitialized if it comes back.
Roll back the prepared transaction
For orphaned two-phase commits:
ROLLBACK PREPARED 'gid';
Only use this if you are certain the prepared transaction is not part of an in-flight distributed transaction.
Reduce standby feedback impact
If a hot-standby query is holding back the primary, you have three options:
- Terminate the long query on the replica with
pg_terminate_backend. - Set
hot_standby_feedback = offon the standby (requires reload). Tradeoff: queries on the replica face more replication-conflict cancellations. - Reduce
max_standby_streaming_delayon the standby (requires reload) so WAL replay cancels long queries rather than waiting indefinitely.
Reclaim existing bloat
Once the horizon advances, subsequent autovacuum runs will reclaim the dead tuples. If bloat is already severe and query performance is critical, run pg_repack on the affected table to reclaim space online without the aggressive locking of VACUUM FULL. Avoid VACUUM FULL in production unless you can tolerate an exclusive lock for the duration.
Prevention
idle_in_transaction_session_timeout. Set this to kill abandoned sessions before they pin the horizon.statement_timeout. Prevent runaway queries from holding snapshots for hours.- Transaction age alerts. Alert on any
backend_xminolder than your threshold, not just query runtime. A transaction can stay open indefinitely if individual queries stay understatement_timeoutand gaps between queries stay underidle_in_transaction_session_timeout. - Replication slot monitoring. Alert on
active = falsewith a growingxminage. Setmax_slot_wal_keep_size(PostgreSQL 13+) to bound disk risk. - Cron-based terminators. A scheduled job that terminates backends with
xact_startolder than a hard wall-clock limit is a reliable backstop. PostgreSQL does not provide a nativetransaction_timeoutsetting.
How Netdata helps
- Correlate
postgres.table_dead_tupleswithpostgres.transaction_xmin_ageandpostgres.replication_slot_xmin_ageto identify whether dead tuple growth is caused by a pinned horizon. - Track
postgres.table_auto_vacuum_countandpostgres.table_last_autovacuum_timeagainstpostgres.table_dead_tuplesto spot ineffective autovacuum runs. - Alert on
postgres.database_transaction_id_ageto catch wraparound risk before it becomes an emergency. - Monitor
postgres.replication_slot_retain_wal_sizeto detect abandoned slots filling disk.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns
- PostgreSQL autovacuum not running: detection, causes, and fixes
- PostgreSQL autovacuum tuning: per-table thresholds for high-churn workloads
- PostgreSQL blocking queries: finding the root blocker in a lock cascade
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis
- PostgreSQL dead tuples piling up: why autovacuum can’t keep up
- PostgreSQL deadlock detected: how to diagnose and prevent deadlocks
- PostgreSQL idle in transaction: detecting and killing zombie sessions
- PostgreSQL index bloat: detection and REINDEX CONCURRENTLY recovery
- PostgreSQL ERROR: could not obtain lock — diagnosis and recovery






