PostgreSQL missing indexes: detection from pg_stat_statements and logs
When the planner cannot find a suitable index path, it falls back to a sequential scan. On large tables, this turns millisecond queries into multi second outages.
PostgreSQL exposes evidence through built-in instrumentation: pg_stat_user_tables tracks sequential versus index scan ratios, pg_stat_statements surfaces the most time-consuming query fingerprints, and auto_explain logs execution plans that reveal Seq Scan nodes directly.
This guide gives an operator workflow to detect missing indexes using read-only checks and hypothetical index simulation. It assumes pg_stat_statements is enabled; if not, adding it to shared_preload_libraries requires a server restart.
The workflow moves from broad table-level counters to specific query fingerprints, then to plan verification and safe validation with HypoPG. Each step includes the exact queries and safety notes you need to run during a live incident without making things worse.
What this captures
This workflow uses four layers of evidence:
- Table scan patterns from
pg_stat_user_tables, showing which tables are scanned sequentially and how many rows are read. - Query-level timing from
pg_stat_statements, identifying which normalized query fingerprints consume the most execution time. - Execution plans from
auto_explainor manualEXPLAIN (ANALYZE, BUFFERS), revealing whether the planner is choosing sequential scans because no index exists or because statistics are stale. - Hypothetical index impact using the HypoPG extension, letting you test whether a proposed index changes the plan before writing any disk pages or WAL.
Together these answer two questions: where is the missing index, and will adding it actually help?
Prerequisites
Confirm the following before running the workflow:
pg_stat_statementsis enabled. It must be listed inshared_preload_librariesand the extension created in the target database. Changingshared_preload_librariesrequires a PostgreSQL restart.- You have superuser or
pg_monitorrole membership.pg_stat_statementsandpg_stat_user_tablesare readable bypg_monitor, butauto_explainconfiguration and HypoPG installation require superuser. - (Optional)
auto_explainis loaded. To capture plans automatically from logs,auto_explainmust also be inshared_preload_libraries. Likepg_stat_statements, this requires a restart to activate. Once loaded, its parameters can be changed with a configuration reload. - (Optional) HypoPG is installed. Available from the
hypopgpackage on most distributions. Install withCREATE EXTENSION hypopg;in the target database.
Detection workflow
flowchart TD
A[pg_stat_user_tables high seq_tup_read] --> B{Table > 10k rows?}
B -->|No| C[Scan likely correct]
B -->|Yes| D[Find query in pg_stat_statements]
D --> E[Capture plan with EXPLAIN]
E --> F{Seq Scan on large table?}
F -->|No| G[Check statistics]
F -->|Yes| H[Test with HypoPG]
H --> I{Index Scan preferred?}
I -->|Yes| J[CREATE INDEX CONCURRENTLY]
I -->|No| K[Revise query or predicates]Follow these steps in order. All steps are read-only except for HypoPG index creation, which is session-local and writes no WAL.
1. Identify tables with excessive sequential scans
Start with pg_stat_user_tables. Counters are cumulative since the last stats reset or server restart, so compare relative volumes rather than absolute thresholds.
SELECT schemaname, relname,
seq_scan, seq_tup_read,
idx_scan,
CASE WHEN seq_scan + idx_scan > 0
THEN round(100.0 * seq_scan / (seq_scan + idx_scan), 2)
ELSE 0 END AS seq_pct
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
ORDER BY seq_tup_read DESC
LIMIT 20;
What to look for: Large values of seq_tup_read on tables with more than a few thousand live rows. A high seq_pct alone is not enough; a table scanned ten thousand times with zero index scans is suspicious, but if the table has only a few hundred rows the sequential scan may still be correct. Focus on absolute seq_tup_read volume.
Note: these counters reset on server restart. If you restart frequently, baseline-correct by noting pre- and post-restart values.
2. Correlate with pg_stat_statements
Find the query fingerprints consuming the most execution time:
SELECT query, calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 4) AS mean_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Cross-reference the tables from Step 1 with the query text. Look for WHERE clauses on columns that lack indexes, or joins between large tables without supporting index paths.
Caveat: pg_stat_statements normalizes queries by replacing literals with placeholders. A query that is fast for most values but slow for a specific parameter will appear as a single fingerprint. Check stddev_exec_time on PostgreSQL 13 and later; a high standard deviation relative to the mean suggests parameter-sensitive plan flapping or bimodality that a single index may not fix.
3. Capture execution plans
For immediate inspection of a specific query, use manual EXPLAIN:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ...
If auto_explain is loaded and you need continuous capture, configure it to log plans for queries exceeding your latency threshold:
ALTER SYSTEM SET auto_explain.log_min_duration = 1000;
ALTER SYSTEM SET auto_explain.log_analyze = true;
ALTER SYSTEM SET auto_explain.log_buffers = true;
SELECT pg_reload_conf();
Warning: auto_explain.log_analyze = true causes the server to execute queries internally to collect timing. This adds overhead and can trigger side effects from volatile functions. Enable it only for short diagnostic windows, then revert.
If auto_explain is unavailable, log_min_duration_statement logs slow query text without plans and requires no preload library.
What to look for: Seq Scan nodes on the tables identified in Step 1. Check the rows estimate versus actual rows. If the estimate is wildly wrong, the problem may be stale statistics rather than a missing index; run ANALYZE on the table first and re-check.
The BUFFERS output is critical. Look for shared hit versus shared read. If a sequential scan shows high shared read, the table is not cached and the scan is hitting disk. If shared hit dominates, the table fits in memory and the scan is less expensive, though still CPU-intensive for large tables.
4. Determine if the sequential scan is actually wrong
Sequential scans are not always mistakes. If a query returns more than 5-10% of a table’s rows, or if the table is small enough to fit in a few pages, a sequential scan is often faster than random index I/O. Before concluding that an index is missing, confirm:
- The table has more than roughly ten thousand rows.
- The query filters on a column that would be highly selective with an index.
- The
Seq Scannode inEXPLAINshows high actual time relative to the rest of the plan.
If the planner estimates it would return a large fraction of the table, it is probably correct to avoid an index. If the table is small or the query is unselective, move on.
5. Test a hypothetical index with HypoPG
Ensure the HypoPG extension is installed in the target database. Then create a virtual index that exists only in the planner’s cost model. It generates no disk I/O, no WAL, and no locks.
SELECT * FROM hypopg_create_index('CREATE INDEX ON my_table(my_column)');
-- Check the new plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...
What to look for: If the planner switches from Seq Scan to Index Scan or Bitmap Index Scan and the estimated cost drops significantly, the index is likely to help.
Important: HypoPG indexes are session-local. They disappear when the session ends. They are invisible to other sessions and do not appear in pg_indexes. You cannot use them to warm production workloads.
6. Force the planner to consider index paths (diagnostic only)
If HypoPG is not available, you can temporarily bias the planner to see whether an index path is even possible:
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...
RESET enable_seqscan;
If the planner still chooses a sequential scan or materialized scan even with enable_seqscan = off, then no useful index path exists for that query shape. You may need to reconsider the query, add a composite index, or fix statistics.
Warning: This affects the entire session. Never leave enable_seqscan = off in production.
7. Create the index safely
If HypoPG or the diagnostic above confirms the index will help, create it without blocking writes:
CREATE INDEX CONCURRENTLY idx_my_table_my_column ON my_table(my_column);
CREATE INDEX CONCURRENTLY avoids locking the table for writes, though it takes longer than a standard CREATE INDEX and cannot run inside a transaction block. Monitor the build progress in system views.
Verifying it works
After the index is built, verify the change had the intended effect:
- Re-run the
EXPLAIN (ANALYZE, BUFFERS)from Step 3. Confirm the planner now uses the index. - Check
pg_stat_user_indexesfor the new index.idx_scanshould increment as queries use it. - Re-query
pg_stat_statementsafter a representative workload cycle. The target fingerprint’smean_exec_timeshould drop. Because counters are cumulative, compare rates before and after rather than absolute totals. - If
pg_stat_user_indexesshows zero scans on the new index after a representative workload cycle, the query shape may not match the index. Common causes include a function wrapper on the indexed column, an implicit type cast in the query, or a join condition that uses a different column order than a composite index. Re-examine the query text and the plan.
Common pitfalls
- Small tables dominate
seq_scancounters. Tables with fewer than a few thousand rows are often scanned sequentially by design. Filter them out of your investigation. - Stale statistics cause false positives. A missing index is not the only cause of sequential scans. If
EXPLAINshows a large gap between estimated and actual rows, runANALYZEon the table and re-check the plan before creating an index. - Counters reset on restart. A server restart clears
pg_stat_user_tablesandpg_stat_statements. Baseline your metrics before making changes. - pg_stat_statements evictions. When
pg_stat_statements.maxis exceeded, least-used entries are deallocated. This causesqueryidchurn and can make before-and-after comparisons difficult. If you see frequent deallocations inpg_stat_statements_info, raisepg_stat_statements.max. - Over-indexing. Every index slows writes and consumes space. Do not create an index solely because a sequential scan exists. Validate with HypoPG first, and drop unused indexes (
idx_scan = 0over a full business cycle) during regular maintenance.
Signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
pg_stat_user_tables.seq_tup_read | Rows read via sequential scan | Growing on tables with more than 10k live rows |
pg_stat_user_tables.seq_scan / idx_scan ratio | Index efficiency | seq_scan dominates on large, queried tables |
pg_stat_statements.mean_exec_time | Query latency trend | Sustained increase on specific fingerprints |
pg_stat_statements.stddev_exec_time | Plan stability | High variance suggests flapping or skew |
auto_explain Seq Scan frequency | Plan quality | Recurring Seq Scan on tables that should use indexes |
pg_stat_user_indexes.idx_scan | Index adoption | New index shows zero scans after creation |
How Netdata helps
Netdata surfaces PostgreSQL statistics alongside system telemetry. Use it to correlate sequential scan spikes with query latency and disk I/O.
- Correlate sequential scan rates on specific tables with disk read throughput and query latency.
- Alert on per-table scan volume growth to catch regressions before they dominate workload latency.
- Compare application query latency with PostgreSQL statement statistics to confirm that queries hitting sequentially scanned tables are the same ones spiking in application telemetry.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns
- PostgreSQL autovacuum blocked by long-running transaction: detection and fix
- PostgreSQL autovacuum not running: detection, causes, and fixes
- PostgreSQL autovacuum tuning: per-table thresholds for high-churn workloads
- PostgreSQL blocking queries: finding the root blocker in a lock cascade
- PostgreSQL checkpoint storms: detection, causes, and tuning
- PostgreSQL: checkpoints are occurring too frequently – what to tune
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis
- PostgreSQL: database is not accepting commands to avoid wraparound data loss
- PostgreSQL dead tuples piling up: why autovacuum can’t keep up






