PostgreSQL missing indexes: detection from pg_stat_statements and logs

When the planner cannot find a suitable index path, it falls back to a sequential scan. On large tables, this turns millisecond queries into multi second outages.

PostgreSQL exposes evidence through built-in instrumentation: pg_stat_user_tables tracks sequential versus index scan ratios, pg_stat_statements surfaces the most time-consuming query fingerprints, and auto_explain logs execution plans that reveal Seq Scan nodes directly.

This guide gives an operator workflow to detect missing indexes using read-only checks and hypothetical index simulation. It assumes pg_stat_statements is enabled; if not, adding it to shared_preload_libraries requires a server restart.

The workflow moves from broad table-level counters to specific query fingerprints, then to plan verification and safe validation with HypoPG. Each step includes the exact queries and safety notes you need to run during a live incident without making things worse.

What this captures

This workflow uses four layers of evidence:

  • Table scan patterns from pg_stat_user_tables, showing which tables are scanned sequentially and how many rows are read.
  • Query-level timing from pg_stat_statements, identifying which normalized query fingerprints consume the most execution time.
  • Execution plans from auto_explain or manual EXPLAIN (ANALYZE, BUFFERS), revealing whether the planner is choosing sequential scans because no index exists or because statistics are stale.
  • Hypothetical index impact using the HypoPG extension, letting you test whether a proposed index changes the plan before writing any disk pages or WAL.

Together these answer two questions: where is the missing index, and will adding it actually help?

Prerequisites

Confirm the following before running the workflow:

  • pg_stat_statements is enabled. It must be listed in shared_preload_libraries and the extension created in the target database. Changing shared_preload_libraries requires a PostgreSQL restart.
  • You have superuser or pg_monitor role membership. pg_stat_statements and pg_stat_user_tables are readable by pg_monitor, but auto_explain configuration and HypoPG installation require superuser.
  • (Optional) auto_explain is loaded. To capture plans automatically from logs, auto_explain must also be in shared_preload_libraries. Like pg_stat_statements, this requires a restart to activate. Once loaded, its parameters can be changed with a configuration reload.
  • (Optional) HypoPG is installed. Available from the hypopg package on most distributions. Install with CREATE EXTENSION hypopg; in the target database.

Detection workflow

flowchart TD
    A[pg_stat_user_tables high seq_tup_read] --> B{Table > 10k rows?}
    B -->|No| C[Scan likely correct]
    B -->|Yes| D[Find query in pg_stat_statements]
    D --> E[Capture plan with EXPLAIN]
    E --> F{Seq Scan on large table?}
    F -->|No| G[Check statistics]
    F -->|Yes| H[Test with HypoPG]
    H --> I{Index Scan preferred?}
    I -->|Yes| J[CREATE INDEX CONCURRENTLY]
    I -->|No| K[Revise query or predicates]

Follow these steps in order. All steps are read-only except for HypoPG index creation, which is session-local and writes no WAL.

1. Identify tables with excessive sequential scans

Start with pg_stat_user_tables. Counters are cumulative since the last stats reset or server restart, so compare relative volumes rather than absolute thresholds.

SELECT schemaname, relname,
       seq_scan, seq_tup_read,
       idx_scan,
       CASE WHEN seq_scan + idx_scan > 0
            THEN round(100.0 * seq_scan / (seq_scan + idx_scan), 2)
            ELSE 0 END AS seq_pct
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
ORDER BY seq_tup_read DESC
LIMIT 20;

What to look for: Large values of seq_tup_read on tables with more than a few thousand live rows. A high seq_pct alone is not enough; a table scanned ten thousand times with zero index scans is suspicious, but if the table has only a few hundred rows the sequential scan may still be correct. Focus on absolute seq_tup_read volume.

Note: these counters reset on server restart. If you restart frequently, baseline-correct by noting pre- and post-restart values.

2. Correlate with pg_stat_statements

Find the query fingerprints consuming the most execution time:

SELECT query, calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 4) AS mean_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Cross-reference the tables from Step 1 with the query text. Look for WHERE clauses on columns that lack indexes, or joins between large tables without supporting index paths.

Caveat: pg_stat_statements normalizes queries by replacing literals with placeholders. A query that is fast for most values but slow for a specific parameter will appear as a single fingerprint. Check stddev_exec_time on PostgreSQL 13 and later; a high standard deviation relative to the mean suggests parameter-sensitive plan flapping or bimodality that a single index may not fix.

3. Capture execution plans

For immediate inspection of a specific query, use manual EXPLAIN:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ...

If auto_explain is loaded and you need continuous capture, configure it to log plans for queries exceeding your latency threshold:

ALTER SYSTEM SET auto_explain.log_min_duration = 1000;
ALTER SYSTEM SET auto_explain.log_analyze = true;
ALTER SYSTEM SET auto_explain.log_buffers = true;
SELECT pg_reload_conf();

Warning: auto_explain.log_analyze = true causes the server to execute queries internally to collect timing. This adds overhead and can trigger side effects from volatile functions. Enable it only for short diagnostic windows, then revert.

If auto_explain is unavailable, log_min_duration_statement logs slow query text without plans and requires no preload library.

What to look for: Seq Scan nodes on the tables identified in Step 1. Check the rows estimate versus actual rows. If the estimate is wildly wrong, the problem may be stale statistics rather than a missing index; run ANALYZE on the table first and re-check.

The BUFFERS output is critical. Look for shared hit versus shared read. If a sequential scan shows high shared read, the table is not cached and the scan is hitting disk. If shared hit dominates, the table fits in memory and the scan is less expensive, though still CPU-intensive for large tables.

4. Determine if the sequential scan is actually wrong

Sequential scans are not always mistakes. If a query returns more than 5-10% of a table’s rows, or if the table is small enough to fit in a few pages, a sequential scan is often faster than random index I/O. Before concluding that an index is missing, confirm:

  • The table has more than roughly ten thousand rows.
  • The query filters on a column that would be highly selective with an index.
  • The Seq Scan node in EXPLAIN shows high actual time relative to the rest of the plan.

If the planner estimates it would return a large fraction of the table, it is probably correct to avoid an index. If the table is small or the query is unselective, move on.

5. Test a hypothetical index with HypoPG

Ensure the HypoPG extension is installed in the target database. Then create a virtual index that exists only in the planner’s cost model. It generates no disk I/O, no WAL, and no locks.

SELECT * FROM hypopg_create_index('CREATE INDEX ON my_table(my_column)');

-- Check the new plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...

What to look for: If the planner switches from Seq Scan to Index Scan or Bitmap Index Scan and the estimated cost drops significantly, the index is likely to help.

Important: HypoPG indexes are session-local. They disappear when the session ends. They are invisible to other sessions and do not appear in pg_indexes. You cannot use them to warm production workloads.

6. Force the planner to consider index paths (diagnostic only)

If HypoPG is not available, you can temporarily bias the planner to see whether an index path is even possible:

SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...
RESET enable_seqscan;

If the planner still chooses a sequential scan or materialized scan even with enable_seqscan = off, then no useful index path exists for that query shape. You may need to reconsider the query, add a composite index, or fix statistics.

Warning: This affects the entire session. Never leave enable_seqscan = off in production.

7. Create the index safely

If HypoPG or the diagnostic above confirms the index will help, create it without blocking writes:

CREATE INDEX CONCURRENTLY idx_my_table_my_column ON my_table(my_column);

CREATE INDEX CONCURRENTLY avoids locking the table for writes, though it takes longer than a standard CREATE INDEX and cannot run inside a transaction block. Monitor the build progress in system views.

Verifying it works

After the index is built, verify the change had the intended effect:

  • Re-run the EXPLAIN (ANALYZE, BUFFERS) from Step 3. Confirm the planner now uses the index.
  • Check pg_stat_user_indexes for the new index. idx_scan should increment as queries use it.
  • Re-query pg_stat_statements after a representative workload cycle. The target fingerprint’s mean_exec_time should drop. Because counters are cumulative, compare rates before and after rather than absolute totals.
  • If pg_stat_user_indexes shows zero scans on the new index after a representative workload cycle, the query shape may not match the index. Common causes include a function wrapper on the indexed column, an implicit type cast in the query, or a join condition that uses a different column order than a composite index. Re-examine the query text and the plan.

Common pitfalls

  • Small tables dominate seq_scan counters. Tables with fewer than a few thousand rows are often scanned sequentially by design. Filter them out of your investigation.
  • Stale statistics cause false positives. A missing index is not the only cause of sequential scans. If EXPLAIN shows a large gap between estimated and actual rows, run ANALYZE on the table and re-check the plan before creating an index.
  • Counters reset on restart. A server restart clears pg_stat_user_tables and pg_stat_statements. Baseline your metrics before making changes.
  • pg_stat_statements evictions. When pg_stat_statements.max is exceeded, least-used entries are deallocated. This causes queryid churn and can make before-and-after comparisons difficult. If you see frequent deallocations in pg_stat_statements_info, raise pg_stat_statements.max.
  • Over-indexing. Every index slows writes and consumes space. Do not create an index solely because a sequential scan exists. Validate with HypoPG first, and drop unused indexes (idx_scan = 0 over a full business cycle) during regular maintenance.

Signals to monitor

SignalWhy it mattersWarning sign
pg_stat_user_tables.seq_tup_readRows read via sequential scanGrowing on tables with more than 10k live rows
pg_stat_user_tables.seq_scan / idx_scan ratioIndex efficiencyseq_scan dominates on large, queried tables
pg_stat_statements.mean_exec_timeQuery latency trendSustained increase on specific fingerprints
pg_stat_statements.stddev_exec_timePlan stabilityHigh variance suggests flapping or skew
auto_explain Seq Scan frequencyPlan qualityRecurring Seq Scan on tables that should use indexes
pg_stat_user_indexes.idx_scanIndex adoptionNew index shows zero scans after creation

How Netdata helps

Netdata surfaces PostgreSQL statistics alongside system telemetry. Use it to correlate sequential scan spikes with query latency and disk I/O.

  • Correlate sequential scan rates on specific tables with disk read throughput and query latency.
  • Alert on per-table scan volume growth to catch regressions before they dominate workload latency.
  • Compare application query latency with PostgreSQL statement statistics to confirm that queries hitting sequentially scanned tables are the same ones spiking in application telemetry.