PostgreSQL statistics out of date: ANALYZE, default_statistics_target, and bad plans
When EXPLAIN shows a nested loop joining a million-row table, or a sequential scan where an index should win, and the query text has not changed, stale planner statistics are the likely culprit.
PostgreSQL’s cost-based optimizer relies on ANALYZE-derived catalog data to estimate cardinality. ANALYZE samples table contents and writes histograms, most-common-value lists, and correlation data into pg_statistic. When that data no longer matches the on-disk distribution, the planner misestimates row counts. A cardinality underestimate favors a nested loop; an overestimate favors a sequential scan. The result is a sudden latency spike that connection pooling or hardware scaling will not fix.
This guide covers how to recognize stale-statistics regressions, diagnose affected tables and columns, and fix them without restarting the database or locking tables.
flowchart TD
A[Stale planner statistics] --> B[Cardinality misestimate]
B --> C[Wrong join order or scan type]
C --> D[Nested loop on large table]
C --> E[Seq scan on selective query]
D --> F[Latency spike or CPU saturation]
E --> FCommon causes
| Cause | What it looks like | First thing to check |
|---|---|---|
Bulk load without follow-up ANALYZE | Query slow immediately after ETL or batch insert; seq scan on recently loaded table | last_autoanalyze in pg_stat_user_tables |
default_statistics_target too low for skewed data | Consistent cardinality underestimate on high-cardinality or long-tail columns | histogram_bounds array length in pg_stats for the column |
| Correlated columns without extended statistics | WHERE on two correlated columns returns far fewer rows than the planner expects | pg_statistic_ext objects covering the column pair |
Table bloat skewing reltuples estimates | Planner thinks table size is far from reality after heavy updates or deletes; often caused by vacuum blocked by long transactions | pg_class.reltuples versus pg_stat_user_tables.n_live_tup |
Quick checks
-- Check when ANALYZE last ran per table
SELECT schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY COALESCE(last_autoanalyze, last_analyze) ASC NULLS FIRST
LIMIT 10;
-- Check for plan flapping in pg_stat_statements
SELECT query, calls, mean_exec_time, stddev_exec_time,
stddev_exec_time / NULLIF(mean_exec_time, 0) AS cv
FROM pg_stat_statements
WHERE stddev_exec_time / NULLIF(mean_exec_time, 0) > 1
ORDER BY total_exec_time DESC
LIMIT 10;
# Compare planner estimate to reality for a suspect query
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;
-- Check per-column statistics targets (-1 means default_statistics_target)
SELECT attname,
CASE WHEN attstattarget = -1 THEN 'default' ELSE attstattarget::text END AS target
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
AND attnum > 0
AND NOT attisdropped;
-- Check for vacuum/analyze blockers
SELECT pid, usename, state, state_change, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND state_change < NOW() - INTERVAL '5 minutes';
-- Check extended statistics objects
SELECT stxname, stxrelid::regclass, stxkeys
FROM pg_statistic_ext
WHERE stxrelid = 'my_table'::regclass;
How to diagnose it
- Capture the bad plan with
EXPLAIN (ANALYZE, BUFFERS). Look for nodes where estimated rows diverge from actual rows by more than 10x. - Note the tables and columns involved in the misestimated nodes. Focus on join conditions,
WHEREpredicates, andGROUP BYcolumns. - Check
pg_stat_user_tables.last_autoanalyzefor those tables. If it is older than the last significant data change, the planner is working from outdated samples. - Inspect
pg_statsfor the affected columns. If a skewed column has a very shorthistogram_boundsarray or a sparsemost_common_valslist, the default statistics target is likely too low. - Check for extended statistics. If the query filters on multiple correlated columns, verify whether a
CREATE STATISTICSobject exists and has been analyzed. - Check for long-running transactions. These do not block
ANALYZE, but they blockVACUUM, allowing dead tuples and bloat to grow. Bloat skewsrelpagesandreltuples, which degrades plan quality even when histograms are current. - Run a manual
ANALYZEon the affected table and recapture the plan. If the plan improves immediately, you have confirmed statistics staleness. If estimates are still wrong, raise the per-column target and re-run.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
pg_stat_user_tables.last_autoanalyze | Shows when the planner last received fresh samples for a table | Timestamp older than the table’s typical churn window |
pg_stat_statements.stddev_exec_time | Reveals plan flapping for the same normalized query text | Coefficient of variation (stddev / mean) greater than 1.0 |
pg_class.reltuples vs n_live_tup | Detects drift between catalog estimates and live tuple counts | Ratio diverges significantly from 1.0 |
EXPLAIN estimate/actual ratio | Direct measure of statistics accuracy at the node level | Off by more than 10x on any plan node |
| Dead tuple growth with stale vacuum | Indicates vacuum may be blocked by long transactions | n_dead_tup rising while last_autovacuum is static |
Fixes
Run manual ANALYZE after bulk changes
ANALYZE is safe to run during production. Unlike VACUUM FULL, it does not rewrite the table. It acquires only a ShareUpdateExclusiveLock that blocks DDL but not reads or writes. However, on large tables it still reads a sample of pages and can generate I/O load.
After a bulk load, large DELETE, or significant UPDATE, run ANALYZE on the affected table immediately. Do not wait for autovacuum to notice. If you also need to reclaim dead tuple space, run VACUUM ANALYZE, but be aware that VACUUM generates I/O.
Raise the per-column statistics target
The default statistics target is 100. For columns with skewed distributions or high cardinality, this sample size is often too low. Increase it with:
ALTER TABLE my_table ALTER COLUMN my_column SET STATISTICS 500;
Then run ANALYZE my_table. Higher targets consume more catalog space and slightly lengthen ANALYZE time, but they produce finer histograms and more accurate most-common-value lists. Reduce the target only if ANALYZE duration becomes disruptive.
Add extended statistics for correlated columns
When a query filters on multiple columns that are not independent, the planner multiplies selectivities and overestimates the result set. Create an extended statistics object:
CREATE STATISTICS my_stats (dependencies) ON city, zip_code FROM addresses;
ANALYZE addresses;
This tells the planner that the two columns are correlated. The object does not help until ANALYZE populates it.
Force emergency plan stabilization
If a statistics change has caused an immediate production regression and you need breathing room, you can force the previous plan shape temporarily. For example, if a nested loop is destroying performance, SET enable_nestloop = off in the session or transaction can restore a hash join while you fix the underlying statistics. This is an emergency bandage, not a fix.
Unblock autovacuum
A long-running idle in transaction session blocks VACUUM and lets bloat grow. It does not block ANALYZE, but reclaiming dead tuples requires vacuum to run.
Warning: Terminating a backend aborts the client’s work and can leave the application in an inconsistent state. Verify the session is truly abandoned before killing it.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND state_change < NOW() - INTERVAL '5 minutes';
Then monitor whether last_autovacuum advances and n_dead_tup falls.
Adjust autovacuum analyze thresholds
For high-churn tables, the default thresholds may let too many changes accumulate before autoanalyze fires. Tune per-table:
ALTER TABLE high_churn_table SET (
autovacuum_analyze_scale_factor = 0.005,
autovacuum_analyze_threshold = 100
);
Tradeoff: more frequent autoanalyze consumes more CPU and I/O.
Prevention
- Schedule ANALYZE after bulk operations. ETL jobs and large migrations should end with an explicit
ANALYZEon every table they touch. - Set per-column targets during schema design. If you know a column will be skewed, set a higher statistics target when the table is created.
- Create extended statistics proactively. For any pair of columns that are logically correlated and frequently filtered together, create a
CREATE STATISTICSobject before the query reaches production. - Monitor
last_autoanalyzeper table. Do not rely solely on database-wide metrics. A table that changed by millions of rows needs its own timeline. - Prevent autovacuum blockage. Set
idle_in_transaction_session_timeoutto a low value so abandoned transactions cannot hold back vacuum indefinitely.
How Netdata helps
- Correlate query latency spikes in
pg_stat_statementswith autovacuum and autoanalyze activity on the same table. - Alert on
stddev_exec_timeoutliers that indicate plan flapping before the latency spike becomes an outage. - Track dead tuple growth per table alongside query execution time to spot when a blocked vacuum is letting bloat grow.
- Surface
pg_stat_user_tablesmetrics so you can see which tables have not been analyzed recently without writing ad-hoc queries.
Related guides
- How PostgreSQL actually works in production: a mental model for operators
- PostgreSQL ALTER TABLE blocked: zero-downtime DDL patterns
- PostgreSQL autovacuum blocked by long-running transaction: detection and fix
- PostgreSQL autovacuum not running: detection, causes, and fixes
- PostgreSQL autovacuum tuning: per-table thresholds for high-churn workloads
- PostgreSQL blocking queries: finding the root blocker in a lock cascade
- PostgreSQL checkpoint storms: detection, causes, and tuning
- PostgreSQL: checkpoints are occurring too frequently – what to tune
- PostgreSQL connection exhaustion: detection, diagnosis, and prevention
- PostgreSQL connection refused: pg_hba, listen_addresses, and TCP diagnosis
- PostgreSQL: database is not accepting commands to avoid wraparound data loss
- PostgreSQL dead tuples piling up: why autovacuum can’t keep up






