$ guides / postgres / postgres-autovacuum-not-running ▌

Operations Guides

PostgreSQL autovacuum not running: detection, causes, and fixes

Table sizes grow faster than insert rates. Query latency creeps up on UPDATE-heavy workloads. pg_stat_user_tables.n_dead_tup climbs while last_autovacuum stays frozen. Autovacuum should clean this up, but it is not. Dead tuple accumulation degrades performance and can eventually trigger transaction-ID-wraparound shutdown. Detect why autovacuum is stalled, find the blocker, and fix it without making things worse.

What this means

Autovacuum is a background subsystem that spawns workers to run VACUUM and ANALYZE based on table-level thresholds. When it works, it reclaims dead tuple space, updates the free space map, maintains the visibility map, and freezes old transaction IDs to prevent wraparound. When it stops, effects cascade: table and index bloat grow, sequential scans read more dead pages, the planner chooses worse plans as statistics stale, and database age advances toward the 2-billion-transaction hard limit.

“Not running” means three different things: the launcher is disabled, workers cannot launch because of locks or configuration, or workers are running but blocked by a long-running transaction or cost throttling.

flowchart TD
    A[Dead tuples growing] --> B{Autovacuum enabled?}
    B -->|No| C[Enable autovacuum]
    B -->|Yes| D{Workers running?}
    D -->|No| E{Thresholds met?}
    E -->|No| F[Lower scale factor]
    E -->|Yes| G[Check DDL locks]
    D -->|Yes| H{Making progress?}
    H -->|No| I[Check long transactions]
    H -->|Yes| J[Check cost throttling]

Common causes

Cause	What it looks like	First thing to check
Autovacuum disabled globally	No autovacuum workers in `pg_stat_activity`; `last_autovacuum` is stale across all tables	`SHOW autovacuum;` and `SHOW autovacuum_max_workers;`
Long transaction or idle-in-transaction session	`n_dead_tup` grows despite active autovacuum workers; old snapshots prevent cleanup	`SELECT pid, state, xact_start FROM pg_stat_activity WHERE xact_start IS NOT NULL AND now() - xact_start > interval '5 minutes' ORDER BY xact_start;`
DDL lock or heavy manual vacuum blocking worker	An autovacuum worker is stuck in `pg_stat_activity` with `wait_event_type = 'Lock'`	`SELECT * FROM pg_locks WHERE NOT granted AND mode = 'ShareUpdateExclusiveLock';`
Thresholds too high for large or insert-only tables	Specific tables are never vacuumed despite high churn; only some tables are affected	`SELECT schemaname, relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables WHERE n_dead_tup > 10000 ORDER BY n_dead_tup DESC LIMIT 5;`
Cost throttling or too few workers	Workers are constantly running but falling behind; bloat grows slowly but steadily	`SHOW autovacuum_vacuum_cost_delay;`, `SHOW autovacuum_max_workers;`, and `pg_stat_progress_vacuum`
Prepared transaction holding XID horizon	Unexplained vacuum blockage with no visible session in `pg_stat_activity`	`SELECT * FROM pg_prepared_xacts;`

Quick checks

# Verify autovacuum is enabled globally
psql -c "SHOW autovacuum;" -c "SHOW autovacuum_max_workers;"

# List running autovacuum workers
psql -c "SELECT pid, datname, state, wait_event_type, wait_event, xact_start, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';"

# Find tables with the most dead tuples
psql -c "SELECT schemaname, relname, n_dead_tup, last_autovacuum, last_autoanalyze FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"

# Check for sessions that block vacuum
psql -c "SELECT pid, state, xact_start, now() - xact_start AS xact_age FROM pg_stat_activity WHERE xact_start IS NOT NULL AND now() - xact_start > interval '5 minutes' ORDER BY xact_start;"

# Check for autovacuum workers waiting on locks
psql -c "SELECT pid, wait_event_type, wait_event, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%' AND wait_event_type = 'Lock';"

# Check database XID age for wraparound risk
psql -c "SELECT datname, age(datfrozenxid) FROM pg_database WHERE datallowconn ORDER BY age(datfrozenxid) DESC;"

# Check for prepared transactions
psql -c "SELECT transaction, owner, database, prepared, age(now(), prepared) AS age FROM pg_prepared_xacts;"

How to diagnose it

Confirm global settings. Run SHOW autovacuum; and SHOW autovacuum_max_workers;. If either is off or zero, autovacuum is disabled. Re-enable it before investigating anything else.
See if workers exist. Query pg_stat_activity for queries starting with autovacuum:. If you see none, the launcher may be napping or thresholds are not met. Check pg_stat_user_tables to confirm dead tuple counts exceed the trigger formula. If you see workers, note their wait_event_type.
Identify the tables under pressure. Query pg_stat_user_tables ordered by n_dead_tup. If last_autovacuum is NULL or weeks old for a high-churn table, that table is not being reached.
Find the blocker. If workers are present but n_dead_tup is not dropping, look for long-running transactions in pg_stat_activity with old xact_start. Any idle in transaction session holds a snapshot that prevents cleanup. Also check pg_locks for ungranted locks.
Check XID age. Run age(datfrozenxid) per database. If the age is climbing toward 500 million, vacuum is not freezing aggressively enough. Above 1 billion, this is an urgent wraparound risk; the hard limit is roughly 2 billion.
Inspect per-table configuration. A table-level autovacuum_enabled = false setting, or an extremely high autovacuum_vacuum_scale_factor on a large table, can make autovacuum mathematically unable to trigger. Check pg_class.reloptions to see custom settings.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Dead tuple ratio per table	Indicates bloat pressure and whether vacuum is reclaiming space	`n_dead_tup / (n_live_tup + n_dead_tup)` > 0.20
Transaction ID age	Wraparound risk; autovacuum must freeze old tuples regularly	`age(datfrozenxid)` > 500 million
Time since last autovacuum	Confirms autovacuum is reaching the table	`last_autovacuum` older than 24 hours on high-write tables
Active autovacuum workers vs dead tuple trend	Distinguishes “not running” from “running but blocked”	Workers active in `pg_stat_activity` but `n_dead_tup` still increasing
Long-running transaction age	Old snapshots prevent dead tuple removal	Any `idle in transaction` or active transaction older than 5 minutes
Lock waits in `pg_locks`	DDL or heavy locks can block autovacuum from starting	Ungranted `ShareUpdateExclusiveLock` or stronger on user tables

Fixes

Re-enable autovacuum globally

If autovacuum = off or autovacuum_max_workers = 0, re-enable immediately. Do not treat this as a tuning knob; disabling autovacuum trades a small, visible I/O cost for permanent bloat and wraparound debt. If you are worried about I/O impact, tune throttling rather than disabling the daemon.

Clear blocking transactions

Identify and terminate offending sessions.

# Terminate idle-in-transaction sessions older than 5 minutes
psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND state_change < NOW() - INTERVAL '5 minutes';"

WARNING: Terminating a session rolls back its open work.

Then set idle_in_transaction_session_timeout to a reasonable value such as 2 minutes to prevent recurrence.

Tune per-table thresholds

Default thresholds require 20% of a table plus 50 tuples to change before vacuum fires. For a 500-million-row table, that is 100 million dead tuples. Lower the threshold:

ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 10000,
    autovacuum_analyze_scale_factor = 0.005
);

For insert-only tables on PostgreSQL 13+, also tune autovacuum_vacuum_insert_scale_factor. Tradeoff: vacuum runs more often, consuming more I/O and CPU.

Reduce throttling and raise resources

Reduce autovacuum_vacuum_cost_delay and raise autovacuum_vacuum_cost_limit to let workers clean faster. Increase autovacuum_work_mem to 1-2 GB per worker so scans do not spill to disk. If workers are saturated across many tables, raise autovacuum_max_workers. Account for total memory: each worker uses its own autovacuum_work_mem allocation. Tradeoff: more aggressive vacuum competes with foreground queries for I/O and CPU.

Run emergency cleanup

If bloat is severe and autovacuum cannot catch up, run a manual VACUUM FREEZE on the affected table during a low-traffic window. Monitor progress in pg_stat_progress_vacuum.

WARNING: Do not use VACUUM FULL in production unless you are prepared for an exclusive lock that can last hours. Use pg_repack for online bloat removal instead. pg_repack requires a primary key or unique NOT NULL index and roughly 2x the table size in free disk space.

Resolve prepared transactions and slot pressure

If pg_prepared_xacts shows old prepared transactions, commit or roll them back. If inactive logical replication slots are holding back the catalog XMIN, drop them with SELECT pg_drop_replication_slot('name'); after confirming the consumer is decommissioned.

Prevention

Global autovacuum toggle. Never set autovacuum = off. Temporary I/O relief creates permanent bloat and wraparound debt.
Idle-in-transaction timeout. Set idle_in_transaction_session_timeout = '2min' to kill abandoned sessions before they block vacuum.
Per-table tuning. Apply aggressive autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold settings to high-churn and large tables at creation time.
Wraparound monitoring. Alert on age(datfrozenxid) at 500 million and page the on-call at 1 billion.
Autovacuum logging. Set log_autovacuum_min_duration = 0 to log every autovacuum run so you can verify activity without querying pg_stat_user_tables.
Weekly review. Review pg_stat_user_tables for tables whose last_autovacuum has not updated in the past week. Monitor pg_stat_progress_vacuum during peak hours to verify workers complete before the next churn cycle.

How Netdata helps

Correlate dead tuple ratios from pg_stat_user_tables with query latency to confirm bloat impact.
Alert on age(datfrozenxid) approaching 500 million and 1 billion without ad-hoc queries.
Track active autovacuum worker counts alongside long-running transaction counts to spot blockers.
Surface historical pg_stat_user_tables metrics to identify tables that drift out of maintenance.
Monitor replication slot lag and WAL growth on the primary to catch slot-driven vacuum blockage early.

How PostgreSQL actually works in production: a mental model for operators: /guides/postgres/how-postgres-works-in-production/
PostgreSQL blocking queries: finding the root blocker in a lock cascade: /guides/postgres/postgres-blocking-queries/
PostgreSQL connection exhaustion: detection, diagnosis, and prevention: /guides/postgres/postgres-connection-exhaustion/
PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis: /guides/postgres/postgres-connection-refused/
PostgreSQL deadlock detected: how to diagnose and prevent deadlocks: /guides/postgres/postgres-deadlock-detected/
PostgreSQL idle in transaction: detecting and killing zombie sessions

The Netdata solution

PostgreSQL monitoring with Netdata

Netdata monitors PostgreSQL with per-second metrics, pre-built dashboards, and ML-powered anomaly detection. Correlate connection saturation, lock waits, autovacuum progress, replication lag, and checkpoint I/O against the rest of your stack so you catch the incidents in these runbooks before they page anyone.

See PostgreSQL monitoring → Start monitoring free

PostgreSQL autovacuum not running: detection, causes, and fixes

PostgreSQL autovacuum not running: detection, causes, and fixes

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Re-enable autovacuum globally

Clear blocking transactions

Tune per-table thresholds

Reduce throttling and raise resources

Run emergency cleanup

Resolve prepared transactions and slot pressure

Prevention

How Netdata helps

Related guides

PostgreSQL monitoring with Netdata