$ guides / postgres / postgres-dead-tuples-piling-up ▌

Operations Guides

PostgreSQL dead tuples piling up: why autovacuum can't keep up

PostgreSQL dead tuples piling up: why autovacuum can’t keep up

Table sizes grow faster than insert rates, and queries that scanned thousands of rows last week now scan millions. pg_stat_user_tables shows n_dead_tup climbing into the millions while last_autovacuum is stale or absent. Autovacuum is running somewhere in the cluster, but it is losing the race.

PostgreSQL creates dead tuples on every UPDATE and DELETE. Autovacuum reclaims them only when the dead tuple count crosses a threshold. On large or high-churn tables, the default threshold is far too conservative, and even when vacuum does fire, long-running transactions, worker starvation, or streaming replica feedback can prevent tuple removal. The result is bloat: wasted space, slower scans, and eventually transaction ID wraparound risk.

What this means

Dead tuples are row versions no longer visible to any transaction. They remain until VACUUM marks the space reusable. Autovacuum compares n_dead_tup against a per-table threshold: threshold + scale_factor * reltuples. The global defaults are autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2. On a 100-million-row table, vacuum does not trigger until roughly 20 million tuples are dead.

Even when autovacuum starts, it can fail to reduce n_dead_tup. A long-running transaction holds an MVCC snapshot that prevents VACUUM from removing tuples newer than the snapshot. On a streaming replica, hot_standby_feedback = on tells the primary to retain dead tuples the replica might still need. With only autovacuum_max_workers = 3 by default, many tables hitting their thresholds simultaneously can queue high-churn tables behind cold ones.

On PostgreSQL 16 and earlier, autovacuum tracks dead tuple TIDs in a memory array capped by autovacuum_work_mem. When that array fills, the worker rescans the heap for additional index vacuum phases. PostgreSQL 17 replaces this with a TIDStore backed by an Adaptive Radix Tree, cutting memory use and collapsing multiple index phases into one.

flowchart TD
    A[n_dead_tup growing] --> B[Check last_autovacuum
per table]
    B -->|Stale or absent| C[Check scale_factor
vs table size]
    B -->|Recent but n_dead_tup
still growing| D[Check blockers]
    D --> E[Long-running xacts?]
    D --> F[hot_standby_feedback?]
    D --> G[Workers saturated?]
    E --> H[Terminate or fix app]
    F --> I[Disable feedback or
tune replica queries]
    G --> J[Raise workers or
tune per-table]
    C --> K[Lower scale_factor
or threshold]

Common causes

Cause	What it looks like	First thing to check
Scale factor too high for large tables	`n_dead_tup` stable in the millions; `last_autovacuum` is days old	Table-level `autovacuum_vacuum_scale_factor`
Long-running transaction blocking vacuum	`n_dead_tup` grows despite active autovacuum workers; vacuum completes but count does not drop	`pg_stat_activity` for `xact_start` older than vacuum cycles
Worker starvation	Many tables exceed thresholds; small tables get vacuumed while hot tables wait	`SHOW autovacuum_max_workers`; worker occupancy
`hot_standby_feedback` on replica	Primary bloats with modest write rates; replica runs long reports	`pg_replication_slots` and `hot_standby_feedback` setting
Insufficient `work_mem` for dead tuple tracking (pre-PG17)	Workers active but `n_dead_tup` keeps climbing; repeated index vacuum phases	`autovacuum_work_mem` or `maintenance_work_mem`

Quick checks

# Top tables by dead tuple count
psql -c "SELECT schemaname, relname, n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC LIMIT 10;"

# Sessions that may be blocking vacuum
psql -c "SELECT pid, usename, state, xact_start, state_change, query
FROM pg_stat_activity
WHERE xact_start < NOW() - INTERVAL '5 minutes'
   OR (state = 'idle in transaction' AND state_change < NOW() - INTERVAL '5 minutes')
ORDER BY xact_start;"

# Currently running autovacuum workers
psql -c "SELECT pid, query, backend_start, state
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';"

# Wraparound age per database
psql -c "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;"

# Replication slot status on the primary
psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"

# Current autovacuum configuration
psql -c "SHOW autovacuum_max_workers;"
psql -c "SHOW autovacuum_vacuum_scale_factor;"
psql -c "SHOW autovacuum_work_mem;"

How to diagnose it

Quantify bloat. Run the dead tuple ratio query. Any table with a ratio above 0.20 and growing is actively bloating. Compare n_dead_tup to n_tup_upd and n_tup_del to confirm write rate.
Check if vacuum reaches the table. Look at last_autovacuum in pg_stat_user_tables. If it is recent but n_dead_tup does not drop, vacuum fires but cannot reclaim tuples. If it is stale, the table may not have hit its threshold or workers are starved.
Find blockers. Query pg_stat_activity for sessions with old xact_start, especially state = 'idle in transaction'. These pin snapshots that block dead tuple removal. Check pg_replication_slots on the primary for inactive slots, and verify whether hot_standby_feedback is enabled on replicas running long queries.
Check worker saturation. If pg_stat_activity shows autovacuum workers constantly busy and last_autovacuum is stale on hot tables, the default autovacuum_max_workers = 3 may be too low.
Check memory pressure on PostgreSQL 16 and earlier. If a table goes through multiple index vacuum phases in pg_stat_progress_vacuum, autovacuum_work_mem may be too small to hold all dead tuple TIDs without rescanning the heap.
Correlate primary bloat with replicas. If the primary bloats but write volume is moderate, check replica query patterns. hot_standby_feedback = on trades primary space for replica query stability.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Dead tuple ratio (`n_dead_tup / (n_live_tup + n_dead_tup)`)	Direct measure of bloat pressure	> 0.20 sustained
`last_autovacuum` per table	Confirms autovacuum is reaching the table	Stale (> 24h) on write-heavy tables
`age(datfrozenxid)`	Freeze progress; vacuum failure leads to wraparound	> 500 million
`pg_stat_activity` idle in transaction age	Blockers preventing dead tuple removal	Any session > 5 minutes
Active autovacuum workers vs `autovacuum_max_workers`	Worker starvation	All workers busy continuously

Fixes

Tune thresholds for high-churn tables

Do not rely on the global 20% scale factor for large or hot tables. Set per-table overrides:

ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 100
);

This triggers vacuum after roughly 1% of the table changes instead of 20%. For insert-only tables, use autovacuum_vacuum_insert_scale_factor (PostgreSQL 13+).

Tradeoff: More frequent vacuum consumes more I/O and CPU. Raise vacuum_cost_delay to throttle if the cluster becomes I/O saturated.

Remove blockers

WARNING: Terminating backends aborts in-flight work and can disrupt applications. Coordinate with application teams before running this in production.

Terminate idle-in-transaction sessions that are pinning snapshots:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < NOW() - INTERVAL '5 minutes';

Set a preventive limit:

ALTER SYSTEM SET idle_in_transaction_session_timeout = '2min';
SELECT pg_reload_conf();

If hot_standby_feedback is causing primary bloat, evaluate whether the replica’s long queries can be moved to a different replica, batched into smaller windows, or run with feedback disabled. Disabling feedback reduces bloat at the cost of more query cancels on the replica.

Tradeoff: Killing transactions rolls back application work. Coordinate with application teams before terminating.

Increase worker and memory headroom

Raise the global worker pool when many tables compete:

ALTER SYSTEM SET autovacuum_max_workers = 6;

Restart PostgreSQL to apply autovacuum_max_workers.

For PostgreSQL 16 and earlier, raise autovacuum_work_mem so a single worker can track more dead tuples without rescanning:

ALTER SYSTEM SET autovacuum_work_mem = '1GB';
SELECT pg_reload_conf();

On PostgreSQL 17, the TIDStore uses less memory for the same workload, but aggressive tables still benefit from higher limits.

Tradeoff: More workers and memory increase resource competition with foreground queries. Monitor CPU and I/O saturation after changes.

Reclaim existing bloat online

If bloat is already severe, regular VACUUM will not shrink the table. Use pg_repack for online reorganization:

pg_repack -d mydb -t events

This requires a primary key or unique NOT NULL index and roughly twice the table size in free disk space. It does not hold an exclusive lock for the duration.

Avoid VACUUM FULL in production. It acquires an ACCESS EXCLUSIVE lock and blocks all access for the entire rewrite.

Prevention

Tune per-table autovacuum settings on all high-churn and large tables before bloat appears.
Partition large time-series tables. Parallel vacuum runs per partition and keeps table sizes manageable.
Set fillfactor below 100 on update-heavy tables to enable HOT updates, which reduce index churn and dead tuple creation.
Monitor dead tuple ratio and wraparound age proactively. Alert at 10% dead ratio and 500 million XID age.
Set idle_in_transaction_session_timeout globally to prevent abandoned transactions from blocking vacuum.
Test backup restores and keep pg_repack available. Bloat recovery is an operational event.

How Netdata helps

Charts n_dead_tup per table alongside query latency and sequential scan rates to show bloat impact.
Alerts on dead tuple ratio and age(datfrozenxid) before wraparound becomes critical.
Tracks active autovacuum workers against autovacuum_max_workers to detect saturation.
Shows idle in transaction age and replication slot lag with dead tuple growth to identify blockers.
Correlates primary table bloat with replica lag and hot_standby_feedback status to diagnose cross-instance pressure.

How PostgreSQL actually works in production: a mental model for operators: /guides/postgres/how-postgres-works-in-production/
PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns: /guides/postgres/postgres-alter-table-blocked/
PostgreSQL autovacuum not running: detection, causes, and fixes: /guides/postgres/postgres-autovacuum-not-running/
PostgreSQL blocking queries: finding the root blocker in a lock cascade: /guides/postgres/postgres-blocking-queries/
PostgreSQL connection exhaustion: detection, diagnosis, and prevention: /guides/postgres/postgres-connection-exhaustion/
PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis: /guides/postgres/postgres-connection-refused/
PostgreSQL deadlock detected: how to diagnose and prevent deadlocks: /guides/postgres/postgres-deadlock-detected/
PostgreSQL idle in transaction: detecting and killing zombie sessions: /guides/postgres/postgres-idle-in-transaction/
PostgreSQL ERROR: could not obtain lock — diagnosis and recovery: /guides/postgres/postgres-lock-not-available/
PostgreSQL monitoring checklist: the signals every production database needs: /guides/postgres/postgres-monitoring-checklist/
PostgreSQL monitoring maturity model: from reactive to self-healing: /guides/postgres/postgres-monitoring-maturity-model/
PgBouncer pool exhausted: how to diagnose and fix client waits: /guides/postgres/postgres-pgbouncer-pool-exhausted/

The Netdata solution

PostgreSQL monitoring with Netdata

Netdata monitors PostgreSQL with per-second metrics, pre-built dashboards, and ML-powered anomaly detection. Correlate connection saturation, lock waits, autovacuum progress, replication lag, and checkpoint I/O against the rest of your stack so you catch the incidents in these runbooks before they page anyone.

See PostgreSQL monitoring → Start monitoring free