$ guides / mysql / mysql-query-plan-regression ▌

Operations Guides

MySQL query plan regression: when a query gets slow after a deploy or ANALYZE

A query that completed in milliseconds yesterday now times out. The deploy was clean, or maybe someone ran ANALYZE TABLE during maintenance. Slow_queries climbs, Handler_read_rnd_next jumps, and the same application code suddenly kills the database. The optimizer has changed how it executes one or more queries, and the new plan is far more expensive.

Plan regressions are sharp step-changes, not gradual growth. They correlate tightly with specific events: a schema migration, an index drop, a bulk load that triggered auto-recalc, a manual ANALYZE TABLE, or a MySQL version upgrade. The cause is usually stale or shifted statistics, an optimizer behavior change, or a missing index the optimizer can no longer use.

This guide covers identifying the regressed query, comparing execution plans, checking the statistics that drove the decision, and applying the safest fix. All checks run without restarting the server or changing application code.

What this means

The MySQL optimizer uses table and index statistics to estimate row counts and choose access paths. Inaccurate statistics or a new optimizer feature can flip the plan from an index lookup to a full table scan, change join order, or merge derived tables in ways that explode row examination. A small shift in estimated cardinality can flip the chosen plan. A table with millions of rows can look tiny to the optimizer if statistics are skewed, triggering a nested loop join that repeatedly scans the larger table.

Symptoms: a sudden spike in Handler_read_rnd_next (reads next row in data file, spiking during full scans), Select_full_join (unindexed join), or Sort_merge_passes, alongside a matching rise in Slow_queries. The operational clue is the step-change: one minute the query is fine, the next it is not.

flowchart TD
  A[Slow query spike detected] --> B{EXPLAIN shows scan or bad join?}
  B -->|Yes| C[Check Handler_read_rnd_next and Select_full_join]
  B -->|No| D[Check lock waits and I/O saturation]
  C --> E{Recent deploy or ANALYZE?}
  E -->|Yes| F[Compare stats and cardinality]
  E -->|No| G[Check optimizer_switch and range_optimizer_max_mem_size]
  F --> H[Rebuild table or ANALYZE with higher sample_pages]
  G --> I[Add hint or rewrite query]

Common causes

Cause	What it looks like	First thing to check
Stale statistics after ANALYZE TABLE or bulk load	Handler_read_rnd_next step-change; EXPLAIN shows ALL/scan	`mysql.innodb_table_stats.n_rows` and `mysql.innodb_index_stats.n_diff_pfx*` vs actual counts
Accidental index drop or invisible index	Select_full_join or Select_scan spike; specific join lost its index	`SHOW INDEX FROM table` and `EXPLAIN` for the query
MySQL 8.0+ optimizer changes	Plan changed after upgrade; hash join or derived merge active	`EXPLAIN` for hash join vs nested loop; `optimizer_switch` flags
Range optimizer memory exhaustion	Large IN-list queries silently scan; `SHOW WARNINGS` after `EXPLAIN`	Code 3170 warning; `range_optimizer_max_mem_size`
Newly created index without statistics	Query uses new index but estimates are zero or wild	`mysql.innodb_index_stats` for the new index
index_merge_intersection misuse	Query intersects multiple indexes and reads more rows than a single index	`EXPLAIN` for index_merge; disable via `optimizer_switch`

Quick checks

-- Check slow query rate and threshold
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';

-- Check for full table scans and unindexed joins
SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';
SHOW GLOBAL STATUS LIKE 'Select_full_join';

-- Check overall throughput to confirm impact
SHOW GLOBAL STATUS LIKE 'Questions';

-- See what is running right now
SHOW PROCESSLIST;

How to diagnose it

Identify the regressed query. Check the slow query log, or query performance_schema.events_statements_summary_by_digest for a jump in AVG_TIMER_WAIT or SUM_ROWS_EXAMINED with stable COUNT_STAR:

SELECT SCHEMA_NAME, DIGEST_TEXT,
       AVG_TIMER_WAIT / 1e9 AS avg_ms,
       SUM_ROWS_EXAMINED, COUNT_STAR
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'your_database'
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;

Capture the current plan. Run EXPLAIN for the query. If you are on MySQL 8.0.18 or later, run EXPLAIN ANALYZE to see actual row counts and timing per operator.
Check for silent warnings. Immediately after EXPLAIN in the same session, run SHOW WARNINGS. Look for Code 3170 , which indicates the range optimizer exceeded range_optimizer_max_mem_size and fell back to a less efficient plan.
Compare statistics before and after. Query mysql.innodb_table_stats and mysql.innodb_index_stats for the affected tables. Compare n_rows, n_diff_pfx* values, and last_update timestamps against your known baseline. If the table is fragmented, ANALYZE TABLE alone may not produce accurate estimates.
Verify index health. Run SHOW INDEX FROM table_name. Confirm the expected index exists, is visible, and has a cardinality that is not zero and roughly matches the actual distinct row count. If an index was dropped or marked invisible, the optimizer cannot use it.
Inspect optimizer switches. Check SHOW GLOBAL VARIABLES LIKE 'optimizer_switch'. Look for index_merge_intersection=on, hash_join=on, or derived_merge=on. Upgrades to MySQL 8.0+ enable hash_join and derived_merge by default, and they can degrade specific queries.
Check for new indexes with missing stats. If the query uses an index created by an online schema change tool, run ANALYZE TABLE specifically for that table. Newly created indexes have no statistics until an explicit or automatic recalc runs.
Correlate with the event. Match the timestamp of the regression against deployments, ANALYZE TABLE runs, bulk loads, or version changes. The closer the correlation, the more likely the root cause is statistical or optimizer-related rather than data growth.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Slow_queries rate	Direct indicator of query regression	>3x baseline for 10 minutes
Handler_read_rnd_next	Reads next row in data file; spikes during full scans	Step-change >5x baseline sustained
Select_full_join	Joins executing without usable index	Nonzero sustained rate in OLTP
Innodb_rows_read / Com_select	Row examination efficiency	Ratio jumps significantly
Sort_merge_passes	Sort buffer overflow to disk	Increasing rate
Created_tmp_disk_tables	Temp table spills	Ratio to total temp tables >25%

Fixes

Stale or inaccurate statistics

If EXPLAIN shows a table scan because cardinality estimates are wrong, the fastest safe fix is to rebuild the table statistics. Run ANALYZE TABLE with an increased sample size to reduce sampling error on fragmented or skewed indexes:

-- Requires SUPER. On managed platforms, use a parameter group instead.
-- High sample pages increase I/O during ANALYZE.
SET GLOBAL innodb_stats_persistent_sample_pages = 512;
ANALYZE TABLE affected_table;
SET GLOBAL innodb_stats_persistent_sample_pages = 20;

If the table is heavily fragmented, ANALYZE TABLE samples random leaf pages that may be unrepresentative. Rebuild the table to defragment and reset page layout:

-- Warning: this locks the table and rewrites it.
-- Use online DDL or a schema change tool in production.
ALTER TABLE affected_table ENGINE=InnoDB;

After rebuilding, run ANALYZE TABLE again so the optimizer sees the new page distribution.

Range optimizer memory exhaustion

If SHOW WARNINGS shows Code 3170 after EXPLAIN, the query’s range analysis consumed more memory than range_optimizer_max_mem_size allows. The optimizer silently skips range optimization and falls back to a full index or table scan. Increase the limit dynamically:

SET GLOBAL range_optimizer_max_mem_size = 16777216;

For a permanent fix without raising the global limit, rewrite large IN-lists to use a temporary table join. This avoids the range optimizer path entirely and scales better for lists with hundreds or thousands of values.

Optimizer feature regressions

For queries that degrade due to index_merge_intersection, hash_join, or derived_merge in MySQL 8.0+, disable the feature at the session level to confirm the cause:

SET SESSION optimizer_switch='index_merge_intersection=off';

If disabling restores performance, the permanent fix is usually a hint such as FORCE INDEX, a query rewrite to simplify the join or derived table, or updated statistics so the cost model chooses correctly. Do not leave optimizer switches disabled globally without thorough testing; it affects every query on the server.

Missing or invisible indexes

If SHOW INDEX shows a critical index was dropped or set to invisible, restore it and run ANALYZE TABLE. If the index exists but the query ignores it, compare the index cardinality in mysql.innodb_index_stats against the actual table data. An index with a cardinality of zero or one will never be chosen even if structurally correct.

If you need an immediate tactical workaround and cannot change indexes, FORCE INDEX can pin the plan. Document the hint: it will rot as data distributions change.

Prevention

Run explicit ANALYZE TABLE after bulk data changes. InnoDB’s auto-recalc triggers only after roughly 10 percent of rows change, and the background thread is not guaranteed to run promptly. Explicit analysis after bulk loads, partition exchanges, and schema migrations is the safe path.
Baseline per-query latency and row counts. Track AVG_TIMER_WAIT and SUM_ROWS_EXAMINED per digest in performance_schema.events_statements_summary_by_digest. Alert on digest-level deviations before they cascade into saturation.
Validate query plans before major upgrades. MySQL 8.0 enables hash_join, derived_merge, and index_merge_intersection by default. Use plan comparison tooling against captured slow query logs before upgrading from 5.7 or older 8.0 releases.
Keep slow query logging operational. Set long_query_time to 1 second or less in production OLTP. The default of 10 seconds hides regressions until they cause connection pile-ups.
Audit index statistics after online schema changes. Tools like gh-ost and pt-online-schema-change create indexes without statistics. Run ANALYZE TABLE immediately after cutover.
Monitor range query patterns. If your workload uses large IN-lists or complex OR conditions, watch for warnings after EXPLAIN and track range_optimizer_max_mem_size headroom.

How Netdata helps

Netdata correlates Slow_queries rate with Handler_read_rnd_next and Select_full_join step-changes. It surfaces per-query digest latency from performance_schema to identify the exact degraded query pattern, and alerts on buffer pool and I/O saturation that compound the impact of a bad plan. Lock wait metrics help distinguish plan regressions from lock contention or checkpoint stalls.

The Netdata solution

MySQL monitoring with Netdata

Netdata monitors MySQL and MariaDB with per-second metrics and ML anomaly detection. Track connection usage, query throughput, slow queries, redo-log pressure, and replication lag alongside the host and storage signals that explain them.

See MySQL monitoring → Start monitoring free

MySQL query plan regression: when a query gets slow after a deploy or ANALYZE

MySQL query plan regression: when a query gets slow after a deploy or ANALYZE

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Stale or inaccurate statistics

Range optimizer memory exhaustion

Optimizer feature regressions

Missing or invisible indexes

Prevention

How Netdata helps

Related guides

MySQL monitoring with Netdata