PostgreSQL autovacuum tuning: per-table thresholds for high-churn workloads

Default autovacuum settings target modest OLTP workloads. On a 500-million-row table, the global autovacuum_vacuum_scale_factor of 0.2 means autovacuum ignores the table until dead tuples exceed twenty percent of the row count. For high-churn tables, that delay lets bloat accumulate, degrades indexes, slows scans, and pushes transaction ID age toward wraparound. PostgreSQL overrides these thresholds per table via storage parameters (reloptions), so a 50 GB hot table can run aggressive settings without saturating workers across a 500 MB reference table. This guide covers calculating overrides, sizing worker memory, and verifying vacuum keeps up without drowning the cluster in background I/O.

Prerequisites

  • PostgreSQL instance with access to pg_stat_user_tables and ALTER TABLE ... SET (...).
  • Superuser or table owner on target tables.
  • Disk I/O headroom for more frequent vacuum work. If storage is already saturated, raise autovacuum_vacuum_cost_delay or increase the scale factor before you lower thresholds.
  • Table-level statistics source: pg_stat_user_tables or a metrics exporter.

Procedure

  1. Identify candidate tables

    Query pg_stat_user_tables for tables with high dead-tuple ratios or high absolute dead-tuple counts. Focus on large tables with frequent UPDATE or DELETE.

    SELECT schemaname, relname,
           n_live_tup, n_dead_tup,
           ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0), 3) AS dead_ratio
    FROM pg_stat_user_tables
    WHERE n_dead_tup > 10000
    ORDER BY n_dead_tup DESC
    LIMIT 10;
    

    A sustained dead-tuple ratio above 0.10 on a busy table is a strong signal that the default 20 percent threshold is too conservative.

  2. Choose a trigger strategy

    Autovacuum fires when dead tuples exceed:

    threshold + scale_factor * reltuples
    

    With default settings, a 50-million-row table needs more than 10 million dead tuples before autovacuum triggers. Two common fixes:

    • Fixed threshold: Set autovacuum_vacuum_scale_factor = 0 and set autovacuum_vacuum_threshold to an absolute number of dead tuples, for example 50000. This is predictable on very large tables.
    • Reduced scale factor: Keep a small scale factor, such as 0.01, so the trigger scales with table growth but fires earlier than the default.
  3. Apply per-table storage parameters

    Use ALTER TABLE ... SET (...) to persist overrides in the catalog.

    ALTER TABLE events SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_threshold = 50,
        autovacuum_analyze_scale_factor = 0.005,
        autovacuum_analyze_threshold = 50,
        autovacuum_vacuum_cost_delay = 2
    );
    

    autovacuum_vacuum_cost_delay = 2 matches the modern global default. For tables on fast SSD where bloat matters more than I/O latency, some operators set this to 0 to disable throttling entirely. Do that only after confirming the storage subsystem can absorb the extra load. You can also raise autovacuum_vacuum_cost_limit per table to let that table’s worker consume more of the global cost budget.

  4. Size worker memory

    autovacuum_work_mem defaults to -1, so each worker falls back to maintenance_work_mem. If that fallback is small, a large table with many indexes forces vacuum to scan indexes in multiple passes.

    ALTER SYSTEM SET autovacuum_work_mem = '1GB';
    

    1 GB to 2 GB is a common starting point for high-churn tables with several indexes. Reload the configuration after changing it.

    psql -c "SELECT pg_reload_conf();"
    
  5. Tune insert-only and append-only tables

    If the table rarely sees UPDATE or DELETE, dead tuples are not the problem, but freeze progress and visibility map maintenance are. From PostgreSQL 13 onward, use autovacuum_vacuum_insert_scale_factor to trigger vacuum based on insert volume rather than dead tuples.

    ALTER TABLE logs SET (
        autovacuum_vacuum_insert_scale_factor = 0.01,
        autovacuum_vacuum_insert_threshold = 1000
    );
    

    For truly insert-only tables, also monitor age(relfrozenxid) closely because the default dead-tuple threshold may never be reached.

  6. Prime the table after bulk loads

    After loading or backfilling data, do not wait for autovacuum. Run a manual VACUUM ANALYZE to update the free space map and statistics immediately.

    VACUUM ANALYZE events;
    

    For very large post-load tables, VACUUM (DISABLE_PAGE_SKIPPING) ensures a thorough first pass.

flowchart TD
    A[Identify high-churn table] --> B{Prefer fixed or relative trigger?}
    B -->|Large table, predictable churn| C[Set scale_factor = 0
Set threshold = N tuples] B -->|Growing table, variable size| D[Set scale_factor = 0.01
Keep threshold low] C --> E[Set cost_delay per table] D --> E E --> F[Size autovacuum_work_mem] F --> G[Monitor dead_tup and relfrozenxid]

Verifying it works

After applying settings, monitor the table over the next hours and days:

  • Recency: Check that last_autovacuum in pg_stat_user_tables advances after each expected trigger window.

  • Dead tuple trend: n_dead_tup should stop growing monotonically and stay below your target.

  • Log output: If log_autovacuum_min_duration is set, look for completion lines that mention the table and show index cleanup finishing in a single pass. Multiple passes indicate insufficient autovacuum_work_mem.

  • Freeze progress: Query age(relfrozenxid) from pg_class.

    SELECT relname, age(relfrozenxid)
    FROM pg_class
    WHERE relname = 'events'
      AND relkind = 'r';
    

    The value should increase slowly or step back after vacuum. If it is not resetting, verify that autovacuum is not blocked by a long-running transaction or an abandoned replication slot.

Common pitfalls

Over-tuning every table. Lowering thresholds across the whole database can saturate the autovacuum_max_workers pool (default 3) and cause small tables to starve large ones. Tune only tables that prove they need it.

Ignoring the xmin horizon. If n_dead_tup does not decrease despite active autovacuum workers, something is holding back the transaction horizon: a long-running query, an abandoned replication slot, or an idle in transaction session. No threshold change fixes a blocked horizon. Check pg_stat_activity and pg_replication_slots before assuming the tuning failed.

Undersizing memory. A table with ten indexes and 1 GB autovacuum_work_mem may still need multiple index passes if the dead-tuple map does not fit. If autovacuum logs show repeated index scans, raise the memory or reduce bloat with pg_repack first.

Tuning vacuum while ignoring fillfactor. If the table is update-heavy and fillfactor is 100, PostgreSQL cannot perform HOT updates on the same page. Every update becomes a dead tuple plus a new live tuple, increasing vacuum load. Lowering fillfactor to 85 or 90 leaves room for HOT updates, reducing dead-tuple generation before it reaches vacuum.

Forgetting ANALYZE. Vacuum reclaims space; ANALYZE updates planner statistics. A table that is vacuumed aggressively but analyzed rarely can still suffer plan regressions. Set autovacuum_analyze_scale_factor low alongside the vacuum scale factor.

Signals to monitor

SignalWhy it mattersWarning sign
pg_stat_user_tables.n_dead_tup / (n_live_tup + n_dead_tup)Measures bloat pressure per table> 0.10 sustained on a high-churn table
pg_stat_user_tables.last_autovacuumConfirms vacuum reaches the tableOlder than autovacuum_naptime * 3 on hot tables
pg_class.age(relfrozenxid)Tracks freeze progress> 500 million XIDs
pg_stat_activity autovacuum workersReveals worker saturationAll workers pinned to one table for > 1 hour
pg_stat_user_tables.n_tup_upd vs n_dead_tupValidates vacuum effectivenessDead tuples growing faster than update rate implies vacuum is not keeping up

How Netdata helps

Netdata exposes pg_stat_user_tables metrics, including per-table dead-tuple counts and last-vacuum age, without manual queries. Correlate spikes in pgsql.table_dead_tuples with query latency percentiles to confirm bloat is causing slow reads. Netdata also tracks pgsql.db_transaction_id_age per database, so you can alert well before the wraparound warning threshold. During tuning, use disk I/O charts to verify that more aggressive autovacuum is not saturating the storage subsystem.

  • How PostgreSQL actually works in production: a mental model for operators: /guides/postgres/how-postgres-works-in-production/
  • PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns: /guides/postgres/postgres-alter-table-blocked/
  • PostgreSQL autovacuum not running: detection, causes, and fixes: /guides/postgres/postgres-autovacuum-not-running/
  • PostgreSQL blocking queries: finding the root blocker in a lock cascade: /guides/postgres/postgres-blocking-queries/
  • PostgreSQL connection exhaustion: detection, diagnosis, and prevention: /guides/postgres/postgres-connection-exhaustion/
  • PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis: /guides/postgres/postgres-connection-refused/
  • PostgreSQL deadlock detected: how to diagnose and prevent deadlocks: /guides/postgres/postgres-deadlock-detected/
  • PostgreSQL idle in transaction: detecting and killing zombie sessions: /guides/postgres/postgres-idle-in-transaction/
  • PostgreSQL ERROR: could not obtain lock — diagnosis and recovery: /guides/postgres/postgres-lock-not-available/
  • PostgreSQL monitoring checklist: the signals every production database needs: /guides/postgres/postgres-monitoring-checklist/
  • PostgreSQL monitoring maturity model: from reactive to self-healing: /guides/postgres/postgres-monitoring-maturity-model/
  • PgBouncer pool exhausted: how to diagnose and fix client waits: /guides/postgres/postgres-pgbouncer-pool-exhausted/