PostgreSQL statistics out of date: ANALYZE, default_statistics_target, and bad plans

When EXPLAIN shows a nested loop joining a million-row table, or a sequential scan where an index should win, and the query text has not changed, stale planner statistics are the likely culprit.

PostgreSQL’s cost-based optimizer relies on ANALYZE-derived catalog data to estimate cardinality. ANALYZE samples table contents and writes histograms, most-common-value lists, and correlation data into pg_statistic. When that data no longer matches the on-disk distribution, the planner misestimates row counts. A cardinality underestimate favors a nested loop; an overestimate favors a sequential scan. The result is a sudden latency spike that connection pooling or hardware scaling will not fix.

This guide covers how to recognize stale-statistics regressions, diagnose affected tables and columns, and fix them without restarting the database or locking tables.

flowchart TD
    A[Stale planner statistics] --> B[Cardinality misestimate]
    B --> C[Wrong join order or scan type]
    C --> D[Nested loop on large table]
    C --> E[Seq scan on selective query]
    D --> F[Latency spike or CPU saturation]
    E --> F

Common causes

CauseWhat it looks likeFirst thing to check
Bulk load without follow-up ANALYZEQuery slow immediately after ETL or batch insert; seq scan on recently loaded tablelast_autoanalyze in pg_stat_user_tables
default_statistics_target too low for skewed dataConsistent cardinality underestimate on high-cardinality or long-tail columnshistogram_bounds array length in pg_stats for the column
Correlated columns without extended statisticsWHERE on two correlated columns returns far fewer rows than the planner expectspg_statistic_ext objects covering the column pair
Table bloat skewing reltuples estimatesPlanner thinks table size is far from reality after heavy updates or deletes; often caused by vacuum blocked by long transactionspg_class.reltuples versus pg_stat_user_tables.n_live_tup

Quick checks

-- Check when ANALYZE last ran per table
SELECT schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY COALESCE(last_autoanalyze, last_analyze) ASC NULLS FIRST
LIMIT 10;
-- Check for plan flapping in pg_stat_statements
SELECT query, calls, mean_exec_time, stddev_exec_time,
       stddev_exec_time / NULLIF(mean_exec_time, 0) AS cv
FROM pg_stat_statements
WHERE stddev_exec_time / NULLIF(mean_exec_time, 0) > 1
ORDER BY total_exec_time DESC
LIMIT 10;
# Compare planner estimate to reality for a suspect query
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;
-- Check per-column statistics targets (-1 means default_statistics_target)
SELECT attname,
       CASE WHEN attstattarget = -1 THEN 'default' ELSE attstattarget::text END AS target
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
  AND attnum > 0
  AND NOT attisdropped;
-- Check for vacuum/analyze blockers
SELECT pid, usename, state, state_change, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < NOW() - INTERVAL '5 minutes';
-- Check extended statistics objects
SELECT stxname, stxrelid::regclass, stxkeys
FROM pg_statistic_ext
WHERE stxrelid = 'my_table'::regclass;

How to diagnose it

  1. Capture the bad plan with EXPLAIN (ANALYZE, BUFFERS). Look for nodes where estimated rows diverge from actual rows by more than 10x.
  2. Note the tables and columns involved in the misestimated nodes. Focus on join conditions, WHERE predicates, and GROUP BY columns.
  3. Check pg_stat_user_tables.last_autoanalyze for those tables. If it is older than the last significant data change, the planner is working from outdated samples.
  4. Inspect pg_stats for the affected columns. If a skewed column has a very short histogram_bounds array or a sparse most_common_vals list, the default statistics target is likely too low.
  5. Check for extended statistics. If the query filters on multiple correlated columns, verify whether a CREATE STATISTICS object exists and has been analyzed.
  6. Check for long-running transactions. These do not block ANALYZE, but they block VACUUM, allowing dead tuples and bloat to grow. Bloat skews relpages and reltuples, which degrades plan quality even when histograms are current.
  7. Run a manual ANALYZE on the affected table and recapture the plan. If the plan improves immediately, you have confirmed statistics staleness. If estimates are still wrong, raise the per-column target and re-run.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
pg_stat_user_tables.last_autoanalyzeShows when the planner last received fresh samples for a tableTimestamp older than the table’s typical churn window
pg_stat_statements.stddev_exec_timeReveals plan flapping for the same normalized query textCoefficient of variation (stddev / mean) greater than 1.0
pg_class.reltuples vs n_live_tupDetects drift between catalog estimates and live tuple countsRatio diverges significantly from 1.0
EXPLAIN estimate/actual ratioDirect measure of statistics accuracy at the node levelOff by more than 10x on any plan node
Dead tuple growth with stale vacuumIndicates vacuum may be blocked by long transactionsn_dead_tup rising while last_autovacuum is static

Fixes

Run manual ANALYZE after bulk changes

ANALYZE is safe to run during production. Unlike VACUUM FULL, it does not rewrite the table. It acquires only a ShareUpdateExclusiveLock that blocks DDL but not reads or writes. However, on large tables it still reads a sample of pages and can generate I/O load.

After a bulk load, large DELETE, or significant UPDATE, run ANALYZE on the affected table immediately. Do not wait for autovacuum to notice. If you also need to reclaim dead tuple space, run VACUUM ANALYZE, but be aware that VACUUM generates I/O.

Raise the per-column statistics target

The default statistics target is 100. For columns with skewed distributions or high cardinality, this sample size is often too low. Increase it with:

ALTER TABLE my_table ALTER COLUMN my_column SET STATISTICS 500;

Then run ANALYZE my_table. Higher targets consume more catalog space and slightly lengthen ANALYZE time, but they produce finer histograms and more accurate most-common-value lists. Reduce the target only if ANALYZE duration becomes disruptive.

Add extended statistics for correlated columns

When a query filters on multiple columns that are not independent, the planner multiplies selectivities and overestimates the result set. Create an extended statistics object:

CREATE STATISTICS my_stats (dependencies) ON city, zip_code FROM addresses;
ANALYZE addresses;

This tells the planner that the two columns are correlated. The object does not help until ANALYZE populates it.

Force emergency plan stabilization

If a statistics change has caused an immediate production regression and you need breathing room, you can force the previous plan shape temporarily. For example, if a nested loop is destroying performance, SET enable_nestloop = off in the session or transaction can restore a hash join while you fix the underlying statistics. This is an emergency bandage, not a fix.

Unblock autovacuum

A long-running idle in transaction session blocks VACUUM and lets bloat grow. It does not block ANALYZE, but reclaiming dead tuples requires vacuum to run.

Warning: Terminating a backend aborts the client’s work and can leave the application in an inconsistent state. Verify the session is truly abandoned before killing it.

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < NOW() - INTERVAL '5 minutes';

Then monitor whether last_autovacuum advances and n_dead_tup falls.

Adjust autovacuum analyze thresholds

For high-churn tables, the default thresholds may let too many changes accumulate before autoanalyze fires. Tune per-table:

ALTER TABLE high_churn_table SET (
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_analyze_threshold = 100
);

Tradeoff: more frequent autoanalyze consumes more CPU and I/O.

Prevention

  • Schedule ANALYZE after bulk operations. ETL jobs and large migrations should end with an explicit ANALYZE on every table they touch.
  • Set per-column targets during schema design. If you know a column will be skewed, set a higher statistics target when the table is created.
  • Create extended statistics proactively. For any pair of columns that are logically correlated and frequently filtered together, create a CREATE STATISTICS object before the query reaches production.
  • Monitor last_autoanalyze per table. Do not rely solely on database-wide metrics. A table that changed by millions of rows needs its own timeline.
  • Prevent autovacuum blockage. Set idle_in_transaction_session_timeout to a low value so abandoned transactions cannot hold back vacuum indefinitely.

How Netdata helps

  • Correlate query latency spikes in pg_stat_statements with autovacuum and autoanalyze activity on the same table.
  • Alert on stddev_exec_time outliers that indicate plan flapping before the latency spike becomes an outage.
  • Track dead tuple growth per table alongside query execution time to spot when a blocked vacuum is letting bloat grow.
  • Surface pg_stat_user_tables metrics so you can see which tables have not been analyzed recently without writing ad-hoc queries.