MySQL query plan regression: when a query gets slow after a deploy or ANALYZE

A query that completed in milliseconds yesterday now times out. The deploy was clean, or maybe someone ran ANALYZE TABLE during maintenance. Slow_queries climbs, Handler_read_rnd_next jumps, and the same application code suddenly kills the database. The optimizer has changed how it executes one or more queries, and the new plan is far more expensive.

Plan regressions are sharp step-changes, not gradual growth. They correlate tightly with specific events: a schema migration, an index drop, a bulk load that triggered auto-recalc, a manual ANALYZE TABLE, or a MySQL version upgrade. The cause is usually stale or shifted statistics, an optimizer behavior change, or a missing index the optimizer can no longer use.

This guide covers identifying the regressed query, comparing execution plans, checking the statistics that drove the decision, and applying the safest fix. All checks run without restarting the server or changing application code.

What this means

The MySQL optimizer uses table and index statistics to estimate row counts and choose access paths. Inaccurate statistics or a new optimizer feature can flip the plan from an index lookup to a full table scan, change join order, or merge derived tables in ways that explode row examination. A small shift in estimated cardinality can flip the chosen plan. A table with millions of rows can look tiny to the optimizer if statistics are skewed, triggering a nested loop join that repeatedly scans the larger table.

Symptoms: a sudden spike in Handler_read_rnd_next (reads next row in data file, spiking during full scans), Select_full_join (unindexed join), or Sort_merge_passes, alongside a matching rise in Slow_queries. The operational clue is the step-change: one minute the query is fine, the next it is not.

flowchart TD
  A[Slow query spike detected] --> B{EXPLAIN shows scan or bad join?}
  B -->|Yes| C[Check Handler_read_rnd_next and Select_full_join]
  B -->|No| D[Check lock waits and I/O saturation]
  C --> E{Recent deploy or ANALYZE?}
  E -->|Yes| F[Compare stats and cardinality]
  E -->|No| G[Check optimizer_switch and range_optimizer_max_mem_size]
  F --> H[Rebuild table or ANALYZE with higher sample_pages]
  G --> I[Add hint or rewrite query]

Common causes

CauseWhat it looks likeFirst thing to check
Stale statistics after ANALYZE TABLE or bulk loadHandler_read_rnd_next step-change; EXPLAIN shows ALL/scanmysql.innodb_table_stats.n_rows and mysql.innodb_index_stats.n_diff_pfx* vs actual counts
Accidental index drop or invisible indexSelect_full_join or Select_scan spike; specific join lost its indexSHOW INDEX FROM table and EXPLAIN for the query
MySQL 8.0+ optimizer changesPlan changed after upgrade; hash join or derived merge activeEXPLAIN for hash join vs nested loop; optimizer_switch flags
Range optimizer memory exhaustionLarge IN-list queries silently scan; SHOW WARNINGS after EXPLAINCode 3170 warning; range_optimizer_max_mem_size
Newly created index without statisticsQuery uses new index but estimates are zero or wildmysql.innodb_index_stats for the new index
index_merge_intersection misuseQuery intersects multiple indexes and reads more rows than a single indexEXPLAIN for index_merge; disable via optimizer_switch

Quick checks

-- Check slow query rate and threshold
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';

-- Check for full table scans and unindexed joins
SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';
SHOW GLOBAL STATUS LIKE 'Select_full_join';

-- Check overall throughput to confirm impact
SHOW GLOBAL STATUS LIKE 'Questions';

-- See what is running right now
SHOW PROCESSLIST;

How to diagnose it

  1. Identify the regressed query. Check the slow query log, or query performance_schema.events_statements_summary_by_digest for a jump in AVG_TIMER_WAIT or SUM_ROWS_EXAMINED with stable COUNT_STAR:
SELECT SCHEMA_NAME, DIGEST_TEXT,
       AVG_TIMER_WAIT / 1e9 AS avg_ms,
       SUM_ROWS_EXAMINED, COUNT_STAR
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'your_database'
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;
  1. Capture the current plan. Run EXPLAIN for the query. If you are on MySQL 8.0.18 or later, run EXPLAIN ANALYZE to see actual row counts and timing per operator.

  2. Check for silent warnings. Immediately after EXPLAIN in the same session, run SHOW WARNINGS. Look for Code 3170 , which indicates the range optimizer exceeded range_optimizer_max_mem_size and fell back to a less efficient plan.

  3. Compare statistics before and after. Query mysql.innodb_table_stats and mysql.innodb_index_stats for the affected tables. Compare n_rows, n_diff_pfx* values, and last_update timestamps against your known baseline. If the table is fragmented, ANALYZE TABLE alone may not produce accurate estimates.

  4. Verify index health. Run SHOW INDEX FROM table_name. Confirm the expected index exists, is visible, and has a cardinality that is not zero and roughly matches the actual distinct row count. If an index was dropped or marked invisible, the optimizer cannot use it.

  5. Inspect optimizer switches. Check SHOW GLOBAL VARIABLES LIKE 'optimizer_switch'. Look for index_merge_intersection=on, hash_join=on, or derived_merge=on. Upgrades to MySQL 8.0+ enable hash_join and derived_merge by default, and they can degrade specific queries.

  6. Check for new indexes with missing stats. If the query uses an index created by an online schema change tool, run ANALYZE TABLE specifically for that table. Newly created indexes have no statistics until an explicit or automatic recalc runs.

  7. Correlate with the event. Match the timestamp of the regression against deployments, ANALYZE TABLE runs, bulk loads, or version changes. The closer the correlation, the more likely the root cause is statistical or optimizer-related rather than data growth.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
Slow_queries rateDirect indicator of query regression>3x baseline for 10 minutes
Handler_read_rnd_nextReads next row in data file; spikes during full scansStep-change >5x baseline sustained
Select_full_joinJoins executing without usable indexNonzero sustained rate in OLTP
Innodb_rows_read / Com_selectRow examination efficiencyRatio jumps significantly
Sort_merge_passesSort buffer overflow to diskIncreasing rate
Created_tmp_disk_tablesTemp table spillsRatio to total temp tables >25%

Fixes

Stale or inaccurate statistics

If EXPLAIN shows a table scan because cardinality estimates are wrong, the fastest safe fix is to rebuild the table statistics. Run ANALYZE TABLE with an increased sample size to reduce sampling error on fragmented or skewed indexes:

-- Requires SUPER. On managed platforms, use a parameter group instead.
-- High sample pages increase I/O during ANALYZE.
SET GLOBAL innodb_stats_persistent_sample_pages = 512;
ANALYZE TABLE affected_table;
SET GLOBAL innodb_stats_persistent_sample_pages = 20;

If the table is heavily fragmented, ANALYZE TABLE samples random leaf pages that may be unrepresentative. Rebuild the table to defragment and reset page layout:

-- Warning: this locks the table and rewrites it.
-- Use online DDL or a schema change tool in production.
ALTER TABLE affected_table ENGINE=InnoDB;

After rebuilding, run ANALYZE TABLE again so the optimizer sees the new page distribution.

Range optimizer memory exhaustion

If SHOW WARNINGS shows Code 3170 after EXPLAIN, the query’s range analysis consumed more memory than range_optimizer_max_mem_size allows. The optimizer silently skips range optimization and falls back to a full index or table scan. Increase the limit dynamically:

SET GLOBAL range_optimizer_max_mem_size = 16777216;

For a permanent fix without raising the global limit, rewrite large IN-lists to use a temporary table join. This avoids the range optimizer path entirely and scales better for lists with hundreds or thousands of values.

Optimizer feature regressions

For queries that degrade due to index_merge_intersection, hash_join, or derived_merge in MySQL 8.0+, disable the feature at the session level to confirm the cause:

SET SESSION optimizer_switch='index_merge_intersection=off';

If disabling restores performance, the permanent fix is usually a hint such as FORCE INDEX, a query rewrite to simplify the join or derived table, or updated statistics so the cost model chooses correctly. Do not leave optimizer switches disabled globally without thorough testing; it affects every query on the server.

Missing or invisible indexes

If SHOW INDEX shows a critical index was dropped or set to invisible, restore it and run ANALYZE TABLE. If the index exists but the query ignores it, compare the index cardinality in mysql.innodb_index_stats against the actual table data. An index with a cardinality of zero or one will never be chosen even if structurally correct.

If you need an immediate tactical workaround and cannot change indexes, FORCE INDEX can pin the plan. Document the hint: it will rot as data distributions change.

Prevention

  • Run explicit ANALYZE TABLE after bulk data changes. InnoDB’s auto-recalc triggers only after roughly 10 percent of rows change, and the background thread is not guaranteed to run promptly. Explicit analysis after bulk loads, partition exchanges, and schema migrations is the safe path.
  • Baseline per-query latency and row counts. Track AVG_TIMER_WAIT and SUM_ROWS_EXAMINED per digest in performance_schema.events_statements_summary_by_digest. Alert on digest-level deviations before they cascade into saturation.
  • Validate query plans before major upgrades. MySQL 8.0 enables hash_join, derived_merge, and index_merge_intersection by default. Use plan comparison tooling against captured slow query logs before upgrading from 5.7 or older 8.0 releases.
  • Keep slow query logging operational. Set long_query_time to 1 second or less in production OLTP. The default of 10 seconds hides regressions until they cause connection pile-ups.
  • Audit index statistics after online schema changes. Tools like gh-ost and pt-online-schema-change create indexes without statistics. Run ANALYZE TABLE immediately after cutover.
  • Monitor range query patterns. If your workload uses large IN-lists or complex OR conditions, watch for warnings after EXPLAIN and track range_optimizer_max_mem_size headroom.

How Netdata helps

Netdata correlates Slow_queries rate with Handler_read_rnd_next and Select_full_join step-changes. It surfaces per-query digest latency from performance_schema to identify the exact degraded query pattern, and alerts on buffer pool and I/O saturation that compound the impact of a bad plan. Lock wait metrics help distinguish plan regressions from lock contention or checkpoint stalls.