PostgreSQL autovacuum not running: detection, causes, and fixes

Table sizes grow faster than insert rates. Query latency creeps up on UPDATE-heavy workloads. pg_stat_user_tables.n_dead_tup climbs while last_autovacuum stays frozen. Autovacuum should clean this up, but it is not. Dead tuple accumulation degrades performance and can eventually trigger transaction-ID-wraparound shutdown. Detect why autovacuum is stalled, find the blocker, and fix it without making things worse.

What this means

Autovacuum is a background subsystem that spawns workers to run VACUUM and ANALYZE based on table-level thresholds. When it works, it reclaims dead tuple space, updates the free space map, maintains the visibility map, and freezes old transaction IDs to prevent wraparound. When it stops, effects cascade: table and index bloat grow, sequential scans read more dead pages, the planner chooses worse plans as statistics stale, and database age advances toward the 2-billion-transaction hard limit.

“Not running” means three different things: the launcher is disabled, workers cannot launch because of locks or configuration, or workers are running but blocked by a long-running transaction or cost throttling.

flowchart TD
    A[Dead tuples growing] --> B{Autovacuum enabled?}
    B -->|No| C[Enable autovacuum]
    B -->|Yes| D{Workers running?}
    D -->|No| E{Thresholds met?}
    E -->|No| F[Lower scale factor]
    E -->|Yes| G[Check DDL locks]
    D -->|Yes| H{Making progress?}
    H -->|No| I[Check long transactions]
    H -->|Yes| J[Check cost throttling]

Common causes

CauseWhat it looks likeFirst thing to check
Autovacuum disabled globallyNo autovacuum workers in pg_stat_activity; last_autovacuum is stale across all tablesSHOW autovacuum; and SHOW autovacuum_max_workers;
Long transaction or idle-in-transaction sessionn_dead_tup grows despite active autovacuum workers; old snapshots prevent cleanupSELECT pid, state, xact_start FROM pg_stat_activity WHERE xact_start IS NOT NULL AND now() - xact_start > interval '5 minutes' ORDER BY xact_start;
DDL lock or heavy manual vacuum blocking workerAn autovacuum worker is stuck in pg_stat_activity with wait_event_type = 'Lock'SELECT * FROM pg_locks WHERE NOT granted AND mode = 'ShareUpdateExclusiveLock';
Thresholds too high for large or insert-only tablesSpecific tables are never vacuumed despite high churn; only some tables are affectedSELECT schemaname, relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables WHERE n_dead_tup > 10000 ORDER BY n_dead_tup DESC LIMIT 5;
Cost throttling or too few workersWorkers are constantly running but falling behind; bloat grows slowly but steadilySHOW autovacuum_vacuum_cost_delay;, SHOW autovacuum_max_workers;, and pg_stat_progress_vacuum
Prepared transaction holding XID horizonUnexplained vacuum blockage with no visible session in pg_stat_activitySELECT * FROM pg_prepared_xacts;

Quick checks

# Verify autovacuum is enabled globally
psql -c "SHOW autovacuum;" -c "SHOW autovacuum_max_workers;"
# List running autovacuum workers
psql -c "SELECT pid, datname, state, wait_event_type, wait_event, xact_start, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';"
# Find tables with the most dead tuples
psql -c "SELECT schemaname, relname, n_dead_tup, last_autovacuum, last_autoanalyze FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"
# Check for sessions that block vacuum
psql -c "SELECT pid, state, xact_start, now() - xact_start AS xact_age FROM pg_stat_activity WHERE xact_start IS NOT NULL AND now() - xact_start > interval '5 minutes' ORDER BY xact_start;"
# Check for autovacuum workers waiting on locks
psql -c "SELECT pid, wait_event_type, wait_event, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%' AND wait_event_type = 'Lock';"
# Check database XID age for wraparound risk
psql -c "SELECT datname, age(datfrozenxid) FROM pg_database WHERE datallowconn ORDER BY age(datfrozenxid) DESC;"
# Check for prepared transactions
psql -c "SELECT transaction, owner, database, prepared, age(now(), prepared) AS age FROM pg_prepared_xacts;"

How to diagnose it

  1. Confirm global settings. Run SHOW autovacuum; and SHOW autovacuum_max_workers;. If either is off or zero, autovacuum is disabled. Re-enable it before investigating anything else.
  2. See if workers exist. Query pg_stat_activity for queries starting with autovacuum:. If you see none, the launcher may be napping or thresholds are not met. Check pg_stat_user_tables to confirm dead tuple counts exceed the trigger formula. If you see workers, note their wait_event_type.
  3. Identify the tables under pressure. Query pg_stat_user_tables ordered by n_dead_tup. If last_autovacuum is NULL or weeks old for a high-churn table, that table is not being reached.
  4. Find the blocker. If workers are present but n_dead_tup is not dropping, look for long-running transactions in pg_stat_activity with old xact_start. Any idle in transaction session holds a snapshot that prevents cleanup. Also check pg_locks for ungranted locks.
  5. Check XID age. Run age(datfrozenxid) per database. If the age is climbing toward 500 million, vacuum is not freezing aggressively enough. Above 1 billion, this is an urgent wraparound risk; the hard limit is roughly 2 billion.
  6. Inspect per-table configuration. A table-level autovacuum_enabled = false setting, or an extremely high autovacuum_vacuum_scale_factor on a large table, can make autovacuum mathematically unable to trigger. Check pg_class.reloptions to see custom settings.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
Dead tuple ratio per tableIndicates bloat pressure and whether vacuum is reclaiming spacen_dead_tup / (n_live_tup + n_dead_tup) > 0.20
Transaction ID ageWraparound risk; autovacuum must freeze old tuples regularlyage(datfrozenxid) > 500 million
Time since last autovacuumConfirms autovacuum is reaching the tablelast_autovacuum older than 24 hours on high-write tables
Active autovacuum workers vs dead tuple trendDistinguishes “not running” from “running but blocked”Workers active in pg_stat_activity but n_dead_tup still increasing
Long-running transaction ageOld snapshots prevent dead tuple removalAny idle in transaction or active transaction older than 5 minutes
Lock waits in pg_locksDDL or heavy locks can block autovacuum from startingUngranted ShareUpdateExclusiveLock or stronger on user tables

Fixes

Re-enable autovacuum globally

If autovacuum = off or autovacuum_max_workers = 0, re-enable immediately. Do not treat this as a tuning knob; disabling autovacuum trades a small, visible I/O cost for permanent bloat and wraparound debt. If you are worried about I/O impact, tune throttling rather than disabling the daemon.

Clear blocking transactions

Identify and terminate offending sessions.

# Terminate idle-in-transaction sessions older than 5 minutes
psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND state_change < NOW() - INTERVAL '5 minutes';"

WARNING: Terminating a session rolls back its open work.

Then set idle_in_transaction_session_timeout to a reasonable value such as 2 minutes to prevent recurrence.

Tune per-table thresholds

Default thresholds require 20% of a table plus 50 tuples to change before vacuum fires. For a 500-million-row table, that is 100 million dead tuples. Lower the threshold:

ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 10000,
    autovacuum_analyze_scale_factor = 0.005
);

For insert-only tables on PostgreSQL 13+, also tune autovacuum_vacuum_insert_scale_factor. Tradeoff: vacuum runs more often, consuming more I/O and CPU.

Reduce throttling and raise resources

Reduce autovacuum_vacuum_cost_delay and raise autovacuum_vacuum_cost_limit to let workers clean faster. Increase autovacuum_work_mem to 1-2 GB per worker so scans do not spill to disk. If workers are saturated across many tables, raise autovacuum_max_workers. Account for total memory: each worker uses its own autovacuum_work_mem allocation. Tradeoff: more aggressive vacuum competes with foreground queries for I/O and CPU.

Run emergency cleanup

If bloat is severe and autovacuum cannot catch up, run a manual VACUUM FREEZE on the affected table during a low-traffic window. Monitor progress in pg_stat_progress_vacuum.

WARNING: Do not use VACUUM FULL in production unless you are prepared for an exclusive lock that can last hours. Use pg_repack for online bloat removal instead. pg_repack requires a primary key or unique NOT NULL index and roughly 2x the table size in free disk space.

Resolve prepared transactions and slot pressure

If pg_prepared_xacts shows old prepared transactions, commit or roll them back. If inactive logical replication slots are holding back the catalog XMIN, drop them with SELECT pg_drop_replication_slot('name'); after confirming the consumer is decommissioned.

Prevention

  • Global autovacuum toggle. Never set autovacuum = off. Temporary I/O relief creates permanent bloat and wraparound debt.
  • Idle-in-transaction timeout. Set idle_in_transaction_session_timeout = '2min' to kill abandoned sessions before they block vacuum.
  • Per-table tuning. Apply aggressive autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold settings to high-churn and large tables at creation time.
  • Wraparound monitoring. Alert on age(datfrozenxid) at 500 million and page the on-call at 1 billion.
  • Autovacuum logging. Set log_autovacuum_min_duration = 0 to log every autovacuum run so you can verify activity without querying pg_stat_user_tables.
  • Weekly review. Review pg_stat_user_tables for tables whose last_autovacuum has not updated in the past week. Monitor pg_stat_progress_vacuum during peak hours to verify workers complete before the next churn cycle.

How Netdata helps

  • Correlate dead tuple ratios from pg_stat_user_tables with query latency to confirm bloat impact.
  • Alert on age(datfrozenxid) approaching 500 million and 1 billion without ad-hoc queries.
  • Track active autovacuum worker counts alongside long-running transaction counts to spot blockers.
  • Surface historical pg_stat_user_tables metrics to identify tables that drift out of maintenance.
  • Monitor replication slot lag and WAL growth on the primary to catch slot-driven vacuum blockage early.
  • How PostgreSQL actually works in production: a mental model for operators: /guides/postgres/how-postgres-works-in-production/
  • PostgreSQL blocking queries: finding the root blocker in a lock cascade: /guides/postgres/postgres-blocking-queries/
  • PostgreSQL connection exhaustion: detection, diagnosis, and prevention: /guides/postgres/postgres-connection-exhaustion/
  • PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis: /guides/postgres/postgres-connection-refused/
  • PostgreSQL deadlock detected: how to diagnose and prevent deadlocks: /guides/postgres/postgres-deadlock-detected/
  • PostgreSQL idle in transaction: detecting and killing zombie sessions