PostgreSQL table bloat: detection, measurement, and remediation

Table bloat appears as tables growing while row counts stay flat, sequential scans slowing despite indexes, and disk alarms that do not track business growth. PostgreSQL’s MVCC writes new tuple versions instead of overwriting old ones; every UPDATE leaves a dead row and every DELETE leaves invisible garbage. VACUUM reclaims that space for reuse within the file, but plain VACUUM does not shrink the relation on disk. When dead tuples outpace cleanup, a table can become several times larger than its logical content.

Not all bloat is harmful. A table with steady churn can maintain persistent bloat that improves write performance because PostgreSQL reuses free space faster than it allocates new pages. The danger is runaway growth: tables swell past their working-set memory footprint, pressure the buffer cache, and force index scans through fragmented leaf pages. Distinguish harmless steady-state bloat from pathological growth, measure it accurately, and remediate without taking the table offline during peak traffic.

This guide covers lightweight detection with catalog statistics, exact measurement with pgstattuple, remediation with pg_repack or planned VACUUM FULL, and prevention through per-table autovacuum tuning.

What this means

Table bloat is the difference between a table’s physical size and the space required to store its live tuples. It consists of dead tuples awaiting VACUUM, free space that VACUUM has marked reusable, and page fragmentation. The canonical formula using pgstattuple is (table_len - tuple_len) / table_len * 100. Ratios above 30-40 percent often warrant intervention, though the threshold depends on table size, churn rate, and whether the working set still fits in memory.

flowchart TD
    A[Measure bloat with pgstattuple] --> B{Can table be locked?}
    B -->|Maintenance window| C[VACUUM FULL]
    B -->|Must stay online| D{Has PK or unique NOT NULL index?}
    D -->|Yes| E[pg_repack]
    D -->|No| F[pg_squeeze or scheduled rewrite]
    C --> G[Re-run pgstattuple to verify]
    E --> G
    F --> G

Common causes

CauseWhat it looks likeFirst thing to check
Update-heavy workload with default autovacuumn_tup_upd » n_tup_ins, dead tuple ratio climbingpg_stat_user_tables.last_autovacuum and autovacuum_vacuum_scale_factor
Delete-heavy batch job without post-load vacuumn_dead_tup tracks n_tup_del, table size jumps after ETLlast_autovacuum timestamp and manual VACUUM schedule
Long transaction blocking VACUUMn_dead_tup growing despite active autovacuum workerspg_stat_activity for idle in transaction sessions older than the last vacuum
Autovacuum too conservative for large tableTable has billions of rows; 20% threshold is never reachedautovacuum_vacuum_scale_factor multiplied by table cardinality
Indexed column updates disabling HOTHigh n_tup_upd but low n_tup_hot_updfillfactor and whether updated columns are indexed

Quick checks

pg_stat_user_tables counters accumulate until reset; on a newly promoted replica or after a statistics reset, timestamps and ratios may look misleading until the workload stabilizes.

# Dead tuple leaderboard: lightweight check without table scans
psql -c "SELECT schemaname, relname, n_live_tup, n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC LIMIT 10;"
# Idle-in-transaction blockers that prevent VACUUM progress
psql -c "SELECT pid, usename, state, state_change, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < NOW() - INTERVAL '5 minutes';"
# Last autovacuum age per table
psql -c "SELECT schemaname, relname,
       last_autovacuum, last_vacuum,
       now() - last_autovacuum AS since_autovacuum
FROM pg_stat_user_tables
WHERE last_autovacuum IS NULL
   OR last_autovacuum < NOW() - INTERVAL '24 hours'
ORDER BY n_dead_tup DESC LIMIT 10;"
# Table size ranking
psql -c "SELECT schemaname, relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;"
# Install pgstattuple for exact measurement
psql -c "CREATE EXTENSION IF NOT EXISTS pgstattuple;"
# Freeze age per table: wraparound indicator independent of dead tuple counts
psql -c "SELECT n.nspname AS schema, c.relname,
       age(c.relfrozenxid) AS xid_age,
       round(100 * age(c.relfrozenxid) / 2147483647.0, 2) AS wraparound_pct
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND age(c.relfrozenxid) > 500000000
ORDER BY age(c.relfrozenxid) DESC LIMIT 10;"

How to diagnose it

  1. Run the dead tuple leaderboard to identify candidates. A high n_dead_tup relative to n_live_tup signals vacuum lag.
  2. Check pg_stat_activity for idle in transaction sessions. A transaction older than last_autovacuum can hold a snapshot that prevents VACUUM from removing dead tuples visible to that snapshot.
  3. Create the pgstattuple extension if it is not already installed: CREATE EXTENSION pgstattuple;.
  4. Run an exact scan on the candidate table: SELECT * FROM pgstattuple('my_schema.my_table');. This performs a full table scan; on tables larger than tens of gigabytes expect minutes of runtime and I/O load. For faster approximate results, use pgstattuple_approx('my_schema.my_table'), which leverages the visibility map and avoids scanning all pages. Accuracy depends on how recently the table was vacuumed.
  5. Interpret the output. Key fields are table_len (total physical bytes), tuple_len (live tuple bytes), dead_tuple_len, free_space, and free_percent. Compute bloat percentage as (table_len - tuple_len) / table_len * 100. Values above 30-40 percent usually justify remediation.
  6. Do not trust catalog-based heuristic bloat queries. These underreport bloat significantly because they estimate free space using statistical assumptions rather than page-level inspection. A table can show zero bloat in a heuristic check while pgstattuple reveals 37 percent or more.
  7. Compare n_tup_hot_upd to n_tup_upd. If the ratio is low and the table has non-trivial update volume, bloat is likely coming from indexed-column updates that cannot use Heap-Only Tuple updates.
  8. If the table has steady churn and autovacuum is keeping the ratio stable, monitor trends rather than immediately remediating.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
n_dead_tup / (n_live_tup + n_dead_tup)Dead tuple ratio indicates vacuum effectiveness> 20% sustained or climbing
pgstattuple.free_percentExact fragmentation and free space> 40% with climbing trend
last_autovacuum ageWhether autovacuum reaches the table> 24 hours on high-write tables
Table physical size vs. logical row countSpace amplification from bloatSize grows faster than insert rate implies
age(relfrozenxid) per tableFreeze progress independent of dead tuple cleanup> 500 million (wraparound risk)

If you are collecting metrics with Netdata, chart pg_stat_user_tables.n_dead_tup alongside n_live_tup and pg_database.size. A widening gap between live tuples and table size is often the first visible signal of bloat, and it lets you spot problems without repeatedly querying the catalog on the production instance.

Fixes

Online remediation with pg_repack

pg_repack is the standard tool for online table reorganization. It creates a new compact copy of the table, builds fresh indexes, and replays concurrent DML from a trigger-based log table. The final catalog swap acquires an ACCESS EXCLUSIVE lock briefly; on healthy systems this is often sub-second, but it can stretch under heavy load or when many objects depend on the table.

Requirements: the table must have a primary key or a unique NOT NULL index. You need free disk space roughly equal to the size of the table plus its indexes, because the tool maintains both copies until the swap completes. Run it during a low-traffic window because the trigger adds write overhead during the copy.

# Basic pg_repack invocation
pg_repack -d mydb -t my_schema.my_table

If pg_repack aborts, trigger artifacts and the temporary table may be left behind. Check for orphaned log tables and triggers before retrying. Table owners who are not superusers can run it with --no-superuser-check where appropriate.

When pg_squeeze is an option

pg_squeeze provides similar online compaction but uses logical decoding instead of triggers. It can squeeze tables without a primary key and integrates with the autovacuum framework. However, it requires adding pg_squeeze to shared_preload_libraries, configuring wal_level and max_replication_slots, and restarting the cluster. It is not available on Amazon RDS.

Offline remediation with VACUUM FULL

VACUUM FULL rewrites the entire table and all indexes while holding an ACCESS EXCLUSIVE lock for the entire duration. No reads or writes are possible. For large tables this can take hours. Use this only during an explicit maintenance window when the table can be taken completely offline. It also requires disk space for the second copy of the table during the rewrite. If cancelled, the transaction rolls back and the table remains unchanged.

Plain VACUUM and autovacuum tuning

If bloat is mild and the table is not critical, aggressive autovacuum may be enough. Lower the per-table threshold so VACUUM triggers before bloat becomes severe:

ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 10000
);

For insert-only tables, PostgreSQL 13 and later support autovacuum_vacuum_insert_scale_factor to trigger vacuum for freezing even when no dead tuples exist.

Prevention

  • Per-table autovacuum tuning. Default settings assume modest write rates. High-churn tables need autovacuum_vacuum_scale_factor as low as 0.01, higher autovacuum_work_mem so each pass processes more dead tuples, and enough autovacuum_max_workers to reach hot tables before they bloat.
  • Set fillfactor below 100 on UPDATE-heavy tables. This reserves page space for HOT updates, reducing index bloat and fragmentation from non-indexed column updates.
  • Keep transactions short. Long-running transactions, even idle ones, pin dead tuples and block freeze progress, which lets bloat accumulate and raises wraparound risk.
  • Monitor freeze age independently from dead tuple counts. A table can have low bloat but an old relfrozenxid; autovacuum driven by insert traffic or freeze requirements uses the same worker pool, and contention there delays dead tuple cleanup.