MySQL query plan regression: when a query gets slow after a deploy or ANALYZE
A query that completed in milliseconds yesterday now times out. The deploy was clean, or maybe someone ran ANALYZE TABLE during maintenance. Slow_queries climbs, Handler_read_rnd_next jumps, and the same application code suddenly kills the database. The optimizer has changed how it executes one or more queries, and the new plan is far more expensive.
Plan regressions are sharp step-changes, not gradual growth. They correlate tightly with specific events: a schema migration, an index drop, a bulk load that triggered auto-recalc, a manual ANALYZE TABLE, or a MySQL version upgrade. The cause is usually stale or shifted statistics, an optimizer behavior change, or a missing index the optimizer can no longer use.
This guide covers identifying the regressed query, comparing execution plans, checking the statistics that drove the decision, and applying the safest fix. All checks run without restarting the server or changing application code.
What this means
The MySQL optimizer uses table and index statistics to estimate row counts and choose access paths. Inaccurate statistics or a new optimizer feature can flip the plan from an index lookup to a full table scan, change join order, or merge derived tables in ways that explode row examination. A small shift in estimated cardinality can flip the chosen plan. A table with millions of rows can look tiny to the optimizer if statistics are skewed, triggering a nested loop join that repeatedly scans the larger table.
Symptoms: a sudden spike in Handler_read_rnd_next (reads next row in data file, spiking during full scans), Select_full_join (unindexed join), or Sort_merge_passes, alongside a matching rise in Slow_queries. The operational clue is the step-change: one minute the query is fine, the next it is not.
flowchart TD
A[Slow query spike detected] --> B{EXPLAIN shows scan or bad join?}
B -->|Yes| C[Check Handler_read_rnd_next and Select_full_join]
B -->|No| D[Check lock waits and I/O saturation]
C --> E{Recent deploy or ANALYZE?}
E -->|Yes| F[Compare stats and cardinality]
E -->|No| G[Check optimizer_switch and range_optimizer_max_mem_size]
F --> H[Rebuild table or ANALYZE with higher sample_pages]
G --> I[Add hint or rewrite query]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Stale statistics after ANALYZE TABLE or bulk load | Handler_read_rnd_next step-change; EXPLAIN shows ALL/scan | mysql.innodb_table_stats.n_rows and mysql.innodb_index_stats.n_diff_pfx* vs actual counts |
| Accidental index drop or invisible index | Select_full_join or Select_scan spike; specific join lost its index | SHOW INDEX FROM table and EXPLAIN for the query |
| MySQL 8.0+ optimizer changes | Plan changed after upgrade; hash join or derived merge active | EXPLAIN for hash join vs nested loop; optimizer_switch flags |
| Range optimizer memory exhaustion | Large IN-list queries silently scan; SHOW WARNINGS after EXPLAIN | Code 3170 warning; range_optimizer_max_mem_size |
| Newly created index without statistics | Query uses new index but estimates are zero or wild | mysql.innodb_index_stats for the new index |
| index_merge_intersection misuse | Query intersects multiple indexes and reads more rows than a single index | EXPLAIN for index_merge; disable via optimizer_switch |
Quick checks
-- Check slow query rate and threshold
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';
-- Check for full table scans and unindexed joins
SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';
SHOW GLOBAL STATUS LIKE 'Select_full_join';
-- Check overall throughput to confirm impact
SHOW GLOBAL STATUS LIKE 'Questions';
-- See what is running right now
SHOW PROCESSLIST;
How to diagnose it
- Identify the regressed query. Check the slow query log, or query
performance_schema.events_statements_summary_by_digestfor a jump inAVG_TIMER_WAITorSUM_ROWS_EXAMINEDwith stableCOUNT_STAR:
SELECT SCHEMA_NAME, DIGEST_TEXT,
AVG_TIMER_WAIT / 1e9 AS avg_ms,
SUM_ROWS_EXAMINED, COUNT_STAR
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'your_database'
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;
Capture the current plan. Run
EXPLAINfor the query. If you are on MySQL 8.0.18 or later, runEXPLAIN ANALYZEto see actual row counts and timing per operator.Check for silent warnings. Immediately after
EXPLAINin the same session, runSHOW WARNINGS. Look for Code 3170 , which indicates the range optimizer exceededrange_optimizer_max_mem_sizeand fell back to a less efficient plan.Compare statistics before and after. Query
mysql.innodb_table_statsandmysql.innodb_index_statsfor the affected tables. Comparen_rows,n_diff_pfx*values, andlast_updatetimestamps against your known baseline. If the table is fragmented,ANALYZE TABLEalone may not produce accurate estimates.Verify index health. Run
SHOW INDEX FROM table_name. Confirm the expected index exists, is visible, and has a cardinality that is not zero and roughly matches the actual distinct row count. If an index was dropped or marked invisible, the optimizer cannot use it.Inspect optimizer switches. Check
SHOW GLOBAL VARIABLES LIKE 'optimizer_switch'. Look forindex_merge_intersection=on,hash_join=on, orderived_merge=on. Upgrades to MySQL 8.0+ enablehash_joinandderived_mergeby default, and they can degrade specific queries.Check for new indexes with missing stats. If the query uses an index created by an online schema change tool, run
ANALYZE TABLEspecifically for that table. Newly created indexes have no statistics until an explicit or automatic recalc runs.Correlate with the event. Match the timestamp of the regression against deployments,
ANALYZE TABLEruns, bulk loads, or version changes. The closer the correlation, the more likely the root cause is statistical or optimizer-related rather than data growth.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Slow_queries rate | Direct indicator of query regression | >3x baseline for 10 minutes |
| Handler_read_rnd_next | Reads next row in data file; spikes during full scans | Step-change >5x baseline sustained |
| Select_full_join | Joins executing without usable index | Nonzero sustained rate in OLTP |
| Innodb_rows_read / Com_select | Row examination efficiency | Ratio jumps significantly |
| Sort_merge_passes | Sort buffer overflow to disk | Increasing rate |
| Created_tmp_disk_tables | Temp table spills | Ratio to total temp tables >25% |
Fixes
Stale or inaccurate statistics
If EXPLAIN shows a table scan because cardinality estimates are wrong, the fastest safe fix is to rebuild the table statistics. Run ANALYZE TABLE with an increased sample size to reduce sampling error on fragmented or skewed indexes:
-- Requires SUPER. On managed platforms, use a parameter group instead.
-- High sample pages increase I/O during ANALYZE.
SET GLOBAL innodb_stats_persistent_sample_pages = 512;
ANALYZE TABLE affected_table;
SET GLOBAL innodb_stats_persistent_sample_pages = 20;
If the table is heavily fragmented, ANALYZE TABLE samples random leaf pages that may be unrepresentative. Rebuild the table to defragment and reset page layout:
-- Warning: this locks the table and rewrites it.
-- Use online DDL or a schema change tool in production.
ALTER TABLE affected_table ENGINE=InnoDB;
After rebuilding, run ANALYZE TABLE again so the optimizer sees the new page distribution.
Range optimizer memory exhaustion
If SHOW WARNINGS shows Code 3170 after EXPLAIN, the query’s range analysis consumed more memory than range_optimizer_max_mem_size allows. The optimizer silently skips range optimization and falls back to a full index or table scan. Increase the limit dynamically:
SET GLOBAL range_optimizer_max_mem_size = 16777216;
For a permanent fix without raising the global limit, rewrite large IN-lists to use a temporary table join. This avoids the range optimizer path entirely and scales better for lists with hundreds or thousands of values.
Optimizer feature regressions
For queries that degrade due to index_merge_intersection, hash_join, or derived_merge in MySQL 8.0+, disable the feature at the session level to confirm the cause:
SET SESSION optimizer_switch='index_merge_intersection=off';
If disabling restores performance, the permanent fix is usually a hint such as FORCE INDEX, a query rewrite to simplify the join or derived table, or updated statistics so the cost model chooses correctly. Do not leave optimizer switches disabled globally without thorough testing; it affects every query on the server.
Missing or invisible indexes
If SHOW INDEX shows a critical index was dropped or set to invisible, restore it and run ANALYZE TABLE. If the index exists but the query ignores it, compare the index cardinality in mysql.innodb_index_stats against the actual table data. An index with a cardinality of zero or one will never be chosen even if structurally correct.
If you need an immediate tactical workaround and cannot change indexes, FORCE INDEX can pin the plan. Document the hint: it will rot as data distributions change.
Prevention
- Run explicit ANALYZE TABLE after bulk data changes. InnoDB’s auto-recalc triggers only after roughly 10 percent of rows change, and the background thread is not guaranteed to run promptly. Explicit analysis after bulk loads, partition exchanges, and schema migrations is the safe path.
- Baseline per-query latency and row counts. Track
AVG_TIMER_WAITandSUM_ROWS_EXAMINEDper digest inperformance_schema.events_statements_summary_by_digest. Alert on digest-level deviations before they cascade into saturation. - Validate query plans before major upgrades. MySQL 8.0 enables
hash_join,derived_merge, andindex_merge_intersectionby default. Use plan comparison tooling against captured slow query logs before upgrading from 5.7 or older 8.0 releases. - Keep slow query logging operational. Set
long_query_timeto 1 second or less in production OLTP. The default of 10 seconds hides regressions until they cause connection pile-ups. - Audit index statistics after online schema changes. Tools like gh-ost and pt-online-schema-change create indexes without statistics. Run
ANALYZE TABLEimmediately after cutover. - Monitor range query patterns. If your workload uses large IN-lists or complex OR conditions, watch for warnings after
EXPLAINand trackrange_optimizer_max_mem_sizeheadroom.
How Netdata helps
Netdata correlates Slow_queries rate with Handler_read_rnd_next and Select_full_join step-changes. It surfaces per-query digest latency from performance_schema to identify the exact degraded query pattern, and alerts on buffer pool and I/O saturation that compound the impact of a bad plan. Lock wait metrics help distinguish plan regressions from lock contention or checkpoint stalls.
Related guides
- How MySQL actually works in production: a mental model for operators
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis
- MySQL adaptive hash index latch contention: high CPU, low throughput
- MySQL InnoDB buffer pool hit ratio collapse: the cliff edge
- MySQL slow after restart: buffer pool warm-up and the cold cache
- MySQL innodb_buffer_pool_size tuning: 60-80% of RAM and when that breaks
- MySQL Innodb_buffer_pool_wait_free > 0: buffer pool memory pressure
- MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches
- MySQL connection exhaustion: detection, diagnosis, and prevention
- MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction
- MySQL FLUSH TABLES WITH READ LOCK stall: backups that freeze the server







