PostgreSQL autovacuum not running: detection, causes, and fixes
Table sizes grow faster than insert rates. Query latency creeps up on UPDATE-heavy workloads. pg_stat_user_tables.n_dead_tup climbs while last_autovacuum stays frozen. Autovacuum should clean this up, but it is not. Dead tuple accumulation degrades performance and can eventually trigger transaction-ID-wraparound shutdown. Detect why autovacuum is stalled, find the blocker, and fix it without making things worse.
What this means
Autovacuum is a background subsystem that spawns workers to run VACUUM and ANALYZE based on table-level thresholds. When it works, it reclaims dead tuple space, updates the free space map, maintains the visibility map, and freezes old transaction IDs to prevent wraparound. When it stops, effects cascade: table and index bloat grow, sequential scans read more dead pages, the planner chooses worse plans as statistics stale, and database age advances toward the 2-billion-transaction hard limit.
“Not running” means three different things: the launcher is disabled, workers cannot launch because of locks or configuration, or workers are running but blocked by a long-running transaction or cost throttling.
flowchart TD
A[Dead tuples growing] --> B{Autovacuum enabled?}
B -->|No| C[Enable autovacuum]
B -->|Yes| D{Workers running?}
D -->|No| E{Thresholds met?}
E -->|No| F[Lower scale factor]
E -->|Yes| G[Check DDL locks]
D -->|Yes| H{Making progress?}
H -->|No| I[Check long transactions]
H -->|Yes| J[Check cost throttling]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Autovacuum disabled globally | No autovacuum workers in pg_stat_activity; last_autovacuum is stale across all tables | SHOW autovacuum; and SHOW autovacuum_max_workers; |
| Long transaction or idle-in-transaction session | n_dead_tup grows despite active autovacuum workers; old snapshots prevent cleanup | SELECT pid, state, xact_start FROM pg_stat_activity WHERE xact_start IS NOT NULL AND now() - xact_start > interval '5 minutes' ORDER BY xact_start; |
| DDL lock or heavy manual vacuum blocking worker | An autovacuum worker is stuck in pg_stat_activity with wait_event_type = 'Lock' | SELECT * FROM pg_locks WHERE NOT granted AND mode = 'ShareUpdateExclusiveLock'; |
| Thresholds too high for large or insert-only tables | Specific tables are never vacuumed despite high churn; only some tables are affected | SELECT schemaname, relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables WHERE n_dead_tup > 10000 ORDER BY n_dead_tup DESC LIMIT 5; |
| Cost throttling or too few workers | Workers are constantly running but falling behind; bloat grows slowly but steadily | SHOW autovacuum_vacuum_cost_delay;, SHOW autovacuum_max_workers;, and pg_stat_progress_vacuum |
| Prepared transaction holding XID horizon | Unexplained vacuum blockage with no visible session in pg_stat_activity | SELECT * FROM pg_prepared_xacts; |
Quick checks
# Verify autovacuum is enabled globally
psql -c "SHOW autovacuum;" -c "SHOW autovacuum_max_workers;"
# List running autovacuum workers
psql -c "SELECT pid, datname, state, wait_event_type, wait_event, xact_start, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';"
# Find tables with the most dead tuples
psql -c "SELECT schemaname, relname, n_dead_tup, last_autovacuum, last_autoanalyze FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"
# Check for sessions that block vacuum
psql -c "SELECT pid, state, xact_start, now() - xact_start AS xact_age FROM pg_stat_activity WHERE xact_start IS NOT NULL AND now() - xact_start > interval '5 minutes' ORDER BY xact_start;"
# Check for autovacuum workers waiting on locks
psql -c "SELECT pid, wait_event_type, wait_event, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%' AND wait_event_type = 'Lock';"
# Check database XID age for wraparound risk
psql -c "SELECT datname, age(datfrozenxid) FROM pg_database WHERE datallowconn ORDER BY age(datfrozenxid) DESC;"
# Check for prepared transactions
psql -c "SELECT transaction, owner, database, prepared, age(now(), prepared) AS age FROM pg_prepared_xacts;"
How to diagnose it
- Confirm global settings. Run
SHOW autovacuum;andSHOW autovacuum_max_workers;. If either is off or zero, autovacuum is disabled. Re-enable it before investigating anything else. - See if workers exist. Query
pg_stat_activityfor queries starting withautovacuum:. If you see none, the launcher may be napping or thresholds are not met. Checkpg_stat_user_tablesto confirm dead tuple counts exceed the trigger formula. If you see workers, note theirwait_event_type. - Identify the tables under pressure. Query
pg_stat_user_tablesordered byn_dead_tup. Iflast_autovacuumis NULL or weeks old for a high-churn table, that table is not being reached. - Find the blocker. If workers are present but
n_dead_tupis not dropping, look for long-running transactions inpg_stat_activitywith oldxact_start. Anyidle in transactionsession holds a snapshot that prevents cleanup. Also checkpg_locksfor ungranted locks. - Check XID age. Run
age(datfrozenxid)per database. If the age is climbing toward 500 million, vacuum is not freezing aggressively enough. Above 1 billion, this is an urgent wraparound risk; the hard limit is roughly 2 billion. - Inspect per-table configuration. A table-level
autovacuum_enabled = falsesetting, or an extremely highautovacuum_vacuum_scale_factoron a large table, can make autovacuum mathematically unable to trigger. Checkpg_class.reloptionsto see custom settings.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Dead tuple ratio per table | Indicates bloat pressure and whether vacuum is reclaiming space | n_dead_tup / (n_live_tup + n_dead_tup) > 0.20 |
| Transaction ID age | Wraparound risk; autovacuum must freeze old tuples regularly | age(datfrozenxid) > 500 million |
| Time since last autovacuum | Confirms autovacuum is reaching the table | last_autovacuum older than 24 hours on high-write tables |
| Active autovacuum workers vs dead tuple trend | Distinguishes “not running” from “running but blocked” | Workers active in pg_stat_activity but n_dead_tup still increasing |
| Long-running transaction age | Old snapshots prevent dead tuple removal | Any idle in transaction or active transaction older than 5 minutes |
Lock waits in pg_locks | DDL or heavy locks can block autovacuum from starting | Ungranted ShareUpdateExclusiveLock or stronger on user tables |
Fixes
Re-enable autovacuum globally
If autovacuum = off or autovacuum_max_workers = 0, re-enable immediately. Do not treat this as a tuning knob; disabling autovacuum trades a small, visible I/O cost for permanent bloat and wraparound debt. If you are worried about I/O impact, tune throttling rather than disabling the daemon.
Clear blocking transactions
Identify and terminate offending sessions.
# Terminate idle-in-transaction sessions older than 5 minutes
psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND state_change < NOW() - INTERVAL '5 minutes';"
WARNING: Terminating a session rolls back its open work.
Then set idle_in_transaction_session_timeout to a reasonable value such as 2 minutes to prevent recurrence.
Tune per-table thresholds
Default thresholds require 20% of a table plus 50 tuples to change before vacuum fires. For a 500-million-row table, that is 100 million dead tuples. Lower the threshold:
ALTER TABLE events SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 10000,
autovacuum_analyze_scale_factor = 0.005
);
For insert-only tables on PostgreSQL 13+, also tune autovacuum_vacuum_insert_scale_factor. Tradeoff: vacuum runs more often, consuming more I/O and CPU.
Reduce throttling and raise resources
Reduce autovacuum_vacuum_cost_delay and raise autovacuum_vacuum_cost_limit to let workers clean faster. Increase autovacuum_work_mem to 1-2 GB per worker so scans do not spill to disk. If workers are saturated across many tables, raise autovacuum_max_workers. Account for total memory: each worker uses its own autovacuum_work_mem allocation. Tradeoff: more aggressive vacuum competes with foreground queries for I/O and CPU.
Run emergency cleanup
If bloat is severe and autovacuum cannot catch up, run a manual VACUUM FREEZE on the affected table during a low-traffic window. Monitor progress in pg_stat_progress_vacuum.
WARNING: Do not use VACUUM FULL in production unless you are prepared for an exclusive lock that can last hours. Use pg_repack for online bloat removal instead. pg_repack requires a primary key or unique NOT NULL index and roughly 2x the table size in free disk space.
Resolve prepared transactions and slot pressure
If pg_prepared_xacts shows old prepared transactions, commit or roll them back. If inactive logical replication slots are holding back the catalog XMIN, drop them with SELECT pg_drop_replication_slot('name'); after confirming the consumer is decommissioned.
Prevention
- Global autovacuum toggle. Never set
autovacuum = off. Temporary I/O relief creates permanent bloat and wraparound debt. - Idle-in-transaction timeout. Set
idle_in_transaction_session_timeout = '2min'to kill abandoned sessions before they block vacuum. - Per-table tuning. Apply aggressive
autovacuum_vacuum_scale_factorandautovacuum_vacuum_thresholdsettings to high-churn and large tables at creation time. - Wraparound monitoring. Alert on
age(datfrozenxid)at 500 million and page the on-call at 1 billion. - Autovacuum logging. Set
log_autovacuum_min_duration = 0to log every autovacuum run so you can verify activity without queryingpg_stat_user_tables. - Weekly review. Review
pg_stat_user_tablesfor tables whoselast_autovacuumhas not updated in the past week. Monitorpg_stat_progress_vacuumduring peak hours to verify workers complete before the next churn cycle.
How Netdata helps
- Correlate dead tuple ratios from
pg_stat_user_tableswith query latency to confirm bloat impact. - Alert on
age(datfrozenxid)approaching 500 million and 1 billion without ad-hoc queries. - Track active autovacuum worker counts alongside long-running transaction counts to spot blockers.
- Surface historical
pg_stat_user_tablesmetrics to identify tables that drift out of maintenance. - Monitor replication slot lag and WAL growth on the primary to catch slot-driven vacuum blockage early.
Related guides
- How PostgreSQL actually works in production: a mental model for operators: /guides/postgres/how-postgres-works-in-production/
- PostgreSQL blocking queries: finding the root blocker in a lock cascade: /guides/postgres/postgres-blocking-queries/
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention: /guides/postgres/postgres-connection-exhaustion/
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis: /guides/postgres/postgres-connection-refused/
- PostgreSQL deadlock detected: how to diagnose and prevent deadlocks: /guides/postgres/postgres-deadlock-detected/
- PostgreSQL idle in transaction: detecting and killing zombie sessions






