PostgreSQL index bloat: detection and REINDEX CONCURRENTLY recovery

Queries that used to return in milliseconds are now spiking to hundreds. Disk usage grows faster than insert volume. EXPLAIN shows a bitmap index scan pulling thousands of heap pages for a selective filter. The cause is usually index bloat: dead pages and fragmentation in B-tree indexes. Unlike table bloat, index bloat is not visible in pg_stat_user_tables, and it often degrades query latency or fills a volume before you notice it. Detect bloat with pgstattuple and recover online with REINDEX CONCURRENTLY.

What this means

PostgreSQL B-tree indexes store entries in 8 KB pages. When rows are updated or deleted, the index entries become dead, but the pages are not immediately reclaimed. VACUUM marks index tuples as dead, yet page-level fragmentation remains because new entries do not perfectly backfill the gaps. The result is low average leaf density. Many pages contain a mix of live and dead tuples, or are entirely dead but still allocated. A bloated index consumes extra disk space, pollutes the buffer cache, and forces bitmap scans to visit more heap pages than necessary. Over time, the index can grow larger than its table.

flowchart TD
  A["Index size > table size or latency spike"] --> B{"Run pgstatindex"}
  B -->|"avg_leaf_density < 70%"| C["Significant bloat confirmed"]
  B -->|"avg_leaf_density >= 70%"| D["Check table stats and HOT ratio"]
  C --> E{"PG version and index type"}
  E -->|"PG12+ B-tree"| F["REINDEX INDEX CONCURRENTLY"]
  E -->|"Exclusion or PG11"| G["Maintenance window or pg_repack"]
  F --> H{"Success?"}
  H -->|"No, _ccnew left"| I["Drop invalid _ccnew and retry"]
  H -->|"Yes, _ccold left"| J["Drop invalid _ccold remnant"]
  D --> K["Tune autovacuum or fillfactor"]

Common causes

CauseWhat it looks likeFirst thing to check
High UPDATE/DELETE rate on indexed columnsIndex size grows faster than table size; index scan latency risespg_stat_user_tables.n_tup_upd and n_tup_del vs insert rate
Long-running transactions blocking vacuumDead tuples persist despite active autovacuum workersOldest xact_start or backend_xid in pg_stat_activity
Autovacuum settings too conservative for table churnDead tuple ratio > 20% on high-write tables; last_autovacuum is stalePer-table autovacuum_vacuum_scale_factor and last_autovacuum timestamp
Low HOT update ratioEvery update touches indexes even when non-indexed heap columns changen_tup_hot_upd / n_tup_upd in pg_stat_user_tables
Excessive indexes on a high-write tableWrite amplification and vacuum overhead spread across many indexespg_stat_user_indexes.idx_scan vs idx_tup_read for unused indexes

Quick checks

These commands are read-only and safe to run during an incident.

# Find indexes that are larger than their table
psql -c "SELECT schemaname, relname, indexrelname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS idx_size,
  pg_size_pretty(pg_relation_size(relid)) AS tbl_size
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > pg_relation_size(relid)
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;"
# Identify long-running transactions that may block vacuum
psql -c "SELECT pid, usename, state,
  EXTRACT(EPOCH FROM (now() - xact_start)) AS xact_age_sec
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND EXTRACT(EPOCH FROM (now() - xact_start)) > 300
ORDER BY xact_start;"
# Review dead tuple counts and last autovacuum per table
psql -c "SELECT schemaname, relname, n_dead_tup, n_live_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;"
# Measure B-tree index bloat with pgstatindex (requires pgstattuple extension)
psql -c "SELECT avg_leaf_density, leaf_fragmentation, deleted_pages
FROM pgstatindex('public.my_index');"
# List invalid indexes, including leftovers from failed concurrent rebuilds
psql -c "SELECT n.nspname AS schema, c.relname AS index_name, i.indisvalid AS valid
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;"

How to diagnose it

  1. Spot size anomalies. Query pg_stat_user_indexes for indexes larger than their parent table. Join pg_class to compare reltuples against index size if you need a row-count estimate.
  2. Confirm bloat with pgstatindex. Install the pgstattuple extension if it is not already present. Run SELECT avg_leaf_density, leaf_fragmentation, deleted_pages FROM pgstatindex('idx_name');. Bloat percentage is roughly 100 - avg_leaf_density. Values below 70% combined with elevated deleted_pages indicate significant bloat. Access to pgstatindex is restricted to members of the pg_stat_scan_tables role; superusers bypass this.
  3. Correlate with table activity. Check pg_stat_user_tables for high n_tup_upd and n_tup_del on the underlying table. A low n_tup_hot_upd ratio means updates are not heap-only, so they generate index churn.
  4. Check for blockers. Verify that no long-running transactions or idle in transaction sessions are preventing vacuum from reclaiming dead tuples and index pages.
  5. Inspect for invalid remnants. Query pg_index for indisvalid = false. A failed REINDEX CONCURRENTLY may have left an invalid _ccnew index that consumes space without providing any benefit.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
pg_relation_size(indexrelid) / pg_relation_size(relid)Bloated indexes can exceed table size on diskRatio > 1.0 for non-unique indexes
avg_leaf_density from pgstatindexMeasures packing efficiency of leaf pagesBelow 70% indicates significant bloat
leaf_fragmentation from pgstatindexMeasures physical ordering degradationAbove 30% suggests recovery will help
pg_stat_user_tables.n_dead_tupDead tuples drive index bloatGrowing steadily on indexed tables
pg_stat_activity.xact_start ageLong transactions block vacuum and freezeAny backend older than 5 minutes
pg_stat_user_indexes.idx_scanUnused indexes still incur write and vacuum overheadidx_scan = 0 over a full business cycle

Fixes

Rebuild online with REINDEX CONCURRENTLY (PostgreSQL 12+)

REINDEX INDEX CONCURRENTLY idx_name;

This is the preferred fix for B-tree index bloat on PostgreSQL 12 and later. It takes a SHARE UPDATE EXCLUSIVE lock instead of ACCESS EXCLUSIVE, allowing concurrent inserts, updates, and deletes. The command performs two table scans and a final catalog swap. It cannot be run inside a transaction block. Only one concurrent index build per table is permitted at a time.

To rebuild all indexes on a table:

REINDEX TABLE CONCURRENTLY table_name;

Note that this skips exclusion-constraint indexes. You must handle those separately.

Exclusion constraints. REINDEX CONCURRENTLY raises an error if you target an exclusion-constraint index directly, and skips it when reindexing a table or database. Non-concurrent REINDEX is required for these indexes, so plan a maintenance window.

System catalogs. REINDEX SYSTEM does not support CONCURRENTLY.

Recover from a failed REINDEX CONCURRENTLY

If the rebuild fails mid-operation (for example, due to a uniqueness violation found during the second scan), PostgreSQL leaves behind an invalid index. A transient new index is suffixed _ccnew; if the original index could not be dropped after a successful rebuild, it is suffixed _ccold. Numeric suffixes such as _ccnew1 may appear to keep names unique.

Warning: DROP INDEX is destructive. Verify the index name and confirm it is invalid before dropping.

For an invalid _ccnew index, drop it and retry:

DROP INDEX idx_name_ccnew;
REINDEX INDEX CONCURRENTLY idx_name;

For an invalid _ccold index, the rebuild succeeded and the old index is no longer needed. Drop the remnant:

DROP INDEX idx_name_ccold;

When REINDEX CONCURRENTLY is unavailable

On PostgreSQL 11 and earlier, REINDEX CONCURRENTLY is not available. Options include:

  • CREATE INDEX CONCURRENTLY with a swap. Build a new index with a temporary name concurrently, drop the old index, and rename the new one. This requires updating any foreign-key or constraint references that depend on the old index name. Caution: This does not work for indexes backing PRIMARY KEY or UNIQUE constraints without dropping and recreating the constraint.
  • pg_repack. The pg_repack extension can reorganize a table and its indexes online without an extended exclusive lock. It requires the table to have a primary key or a unique NOT NULL index, plus roughly twice the disk space of the target table and its indexes. The final swap acquires a brief ACCESS EXCLUSIVE lock that is typically measured in milliseconds, not proportional to table size.

Non-B-tree and specialized indexes

pgstatindex works only for B-tree indexes. For GIN, GiST, BRIN, and HASH indexes, monitor physical size growth over time as the primary signal.

GIN indexes deserve special attention. When fastupdate is enabled (the default), inserts go to a pending list that is later flushed by autovacuum or ANALYZE. A large pending list slows searches and consumes space. You can force a flush with gin_clean_pending_list() or consider fastupdate = off if insert latency is acceptable and search performance is critical.

Prevention

  • Aggressive per-table autovacuum. Lower autovacuum_vacuum_scale_factor to 0.01–0.05 on high-churn tables so dead tuples are reclaimed before they fragment index pages.
  • HOT-friendly fillfactor. Set fillfactor to 85 or 90 on update-heavy tables to leave room for heap-only tuple updates, which avoid index maintenance when indexed columns do not change.
  • Remove unused indexes. Every index slows writes and adds vacuum overhead. Drop indexes with zero scans over a full business cycle.
  • Short transaction timeouts. Set idle_in_transaction_session_timeout to 60–300 seconds to prevent abandoned transactions from pinning dead tuples and blocking vacuum.
  • GIN pending list limits. Monitor GIN pending list growth. If searches degrade, use gin_clean_pending_list() or disable fastupdate for the index.

How Netdata helps

  • Index size metrics from pg_stat_user_indexes highlight disproportionate growth against the parent table.
  • Dead tuple ratio per table from pg_stat_user_tables warns that bloat pressure is building before scans degrade.
  • pg_stat_statements latency percentiles flag plan regressions from bloated indexes, such as bitmap scans that fetch excessive heap pages.
  • Connection state charts surface idle in transaction sessions that block autovacuum.
  • Custom alerts on index-to-table size ratios or dead tuple percentages can trigger before you need emergency REINDEX.