MySQL Sort_merge_passes climbing: filesort and sort_buffer_size
Sort_merge_passes climbing fast is the signature of filesort spilling to disk. You notice it in SHOW GLOBAL STATUS, along with rising disk I/O on the tmpdir partition and a growing Handler_read_rnd. Queries that returned in milliseconds now take seconds.
The absolute value of this cumulative counter is meaningless. Rate of change is what matters. A rapid climb within hours means sorts that used to stay in memory are now writing chunks to disk and merging them in multiple passes. Each pass costs disk I/O and adds latency. Active threads pile up behind slow sorts, driving Threads_running higher and the Questions rate lower.
Confirm the overflow, find the queries responsible, and fix them without causing memory pressure.
What this means
When MySQL cannot satisfy an ORDER BY clause from an index, it performs a filesort. The server reads rows into a sort buffer capped by sort_buffer_size. If the sort key and accompanying columns do not fit, MySQL writes sorted chunks to a temporary file on disk and merges them in subsequent passes. Each merge pass increments Sort_merge_passes.
Sort_scan counts sorts after a full table scan, while Sort_range counts sorts after a range read. Together they represent the total sort operations. When the merge-pass count grows faster than the sum of these two, the average sort requires more than one disk pass.
The operational formula uses deltas over an interval:
rate(Sort_merge_passes) / (rate(Sort_scan) + rate(Sort_range)) > 1
When this ratio exceeds 1, the sort buffer is too small for the workload. Handler_read_rnd indicates filesort activity, typically from ORDER BY on non-indexed columns. It tracks reads by row position during the sort output phase. This is distinct from full table scans, which show up in Handler_read_rnd_next. A sustained rise in Handler_read_rnd alongside Sort_merge_passes confirms that filesort is the mechanism causing the pain.
flowchart TD
A[Query with ORDER BY] --> B{Index covers sort order?}
B -->|No| C[Filesort]
C --> D[Read rows into sort buffer]
D --> E{Fits within sort_buffer_size?}
E -->|No| F[Write sorted chunks to disk]
F --> G[Merge passes]
G --> H[Sort_merge_passes increments]
H --> I[Slower queries, higher disk I/O]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Missing index on sort columns | Handler_read_rnd rising; EXPLAIN shows Using filesort | EXPLAIN for top ORDER BY queries |
sort_buffer_size too small for workload | Sort_merge_passes rate exceeds combined Sort_scan + Sort_range rate | Global status ratio over a 1-minute window |
| Application selecting and sorting more rows than needed | Innodb_rows_read far exceeds final rows sent; high Sort_rows relative to results | Query WHERE clauses and LIMIT usage |
| Batch or ETL jobs running during peak hours | Step-change in Sort_merge_passes correlated with job schedule | SHOW PROCESSLIST or information_schema.INNODB_TRX for long threads |
Quick checks
-- Sort overflow ratio inputs
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';
SHOW GLOBAL STATUS LIKE 'Sort_scan';
SHOW GLOBAL STATUS LIKE 'Sort_range';
-- Filesort activity
SHOW GLOBAL STATUS LIKE 'Handler_read_rnd';
-- Per-session sort buffer limit
SHOW GLOBAL VARIABLES LIKE 'sort_buffer_size';
-- Temp table spill correlation
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';
SHOW GLOBAL STATUS LIKE 'Created_tmp_tables';
-- Long-running active queries
SELECT ID, USER, HOST, DB, TIME, STATE, LEFT(INFO, 100) AS query_snippet
FROM information_schema.PROCESSLIST
WHERE COMMAND != 'Sleep' AND TIME > 10
ORDER BY TIME DESC;
-- Top query patterns by total latency
SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
How to diagnose it
- Sample
Sort_merge_passes,Sort_scan, andSort_rangeat the start and end of a 5-minute window during peak traffic. Compute the rate ratio. A value above 1 means sorts are spilling. - Confirm filesort by checking whether
Handler_read_rndis increasing. This counter tracks row-position reads during filesort output. - Check
Created_tmp_disk_tablesagainstCreated_tmp_tables. If the disk ratio is also elevated, the workload is generating large intermediate results; sorting is one symptom of a broader query-efficiency problem. - Identify offending queries. Query
performance_schema.events_statements_summary_by_digestfor high cumulative latency, or querysys.statements_with_sortingto surface explicit sorts directly. UseSHOW PROCESSLISTto catch long-running sorts in real time. Look forORDER BYon large tables. - Run
EXPLAINon suspects. If theExtracolumn containsUsing filesort, the optimizer cannot use an index to satisfy the sort. - Check whether the query reads a large intermediate result that is later trimmed. A missing
LIMIT, a broad date range, or a low-cardinalityWHEREpredicate can force MySQL to sort thousands of rows to return ten. - Verify memory headroom before changing
sort_buffer_size. The variable is per-session. Each sort can allocate up to the fullsort_buffer_size; a global increase multiplies across every connection that sorts. Raising it without accounting formax_connectionscan convert a sort problem into an OOM kill.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
Sort_merge_passes rate | Direct measure of sort spill to disk | Rapid growth over hours, or ratio > 1 |
Sort_scan + Sort_range | Denominator for overflow ratio | Rising alongside Sort_merge_passes |
Handler_read_rnd | Confirms filesort execution | Sustained increase above baseline |
Created_tmp_disk_tables / Created_tmp_tables | Sort spill often correlates with temp table spill | Ratio sustained above 25% |
Threads_running | Spilled sorts slow queries and cause pile-up | Rising while Questions rate drops |
Slow_queries | Merge passes add latency | Rate increase correlating with sort spill |
Fixes
Add an index covering the sort order
The best fix is to eliminate the filesort. Create an index whose leading columns match the ORDER BY sequence so the optimizer reads rows in sorted order. If the query has a WHERE clause, the index must also satisfy the filter or the optimizer may still choose a filesort. A covering index that includes all selected columns avoids clustered-index lookups, reducing the row size entering the sort buffer. Tradeoff: extra index maintenance on writes and additional disk space.
Reduce the sort footprint
Add restrictive WHERE clauses, use LIMIT where business logic allows, and avoid selecting columns that are not needed. If the application fetches thousands of rows only to display ten, fix the application logic. This reduces rows entering the sort buffer without touching server configuration. Note that if the optimizer cannot use an index for the ORDER BY, it may still filesort the entire intermediate result before applying LIMIT.
Raise sort_buffer_size for a specific session
When a legitimate batch or reporting query must sort a large result set and cannot be rewritten, increase the buffer for that connection only:
SET SESSION sort_buffer_size = 524288;
Tradeoff: that session now allocates more memory for each sort. Use this sparingly for known operations, and disconnect afterward to free the memory.
Raise sort_buffer_size globally (with caution)
If the workload consistently requires a larger buffer and indexing is not possible, increase the global value modestly. The buffer is allocated per session, so a global value of 4 MB across 200 peak connections consumes 800 MB. Verify that max_connections * sort_buffer_size fits inside available RAM after accounting for the InnoDB buffer pool and other per-thread buffers. Large values often yield diminishing returns and increase memory pressure. Do not raise it blindly.
Prevention
- Baseline
Sort_merge_passesper hour during normal operations. A step-change after a deployment is usually a missing index or a newORDER BYpattern. - Run
EXPLAINon new queries in staging and reject any that showUsing filesorton a large table without a compensating index or a very small result set. - Monitor the sort overflow ratio alongside
Handler_read_rndas a code-quality gate before production release. - Use session-level
sort_buffer_sizeadjustments for known large sorts rather than raising the global default. - Correlate sort spill with temporary table spill. If both
Sort_merge_passesandCreated_tmp_disk_tablesare rising, the workload is generating large intermediate results that need query-level fixes, not just buffer tuning.
How Netdata helps
- Netdata polls
Sort_merge_passes,Sort_scan,Sort_range, andHandler_read_rndfromSHOW GLOBAL STATUSand plots rates, so you see the step-change without manual sampling. - Correlate climbing
Sort_merge_passeswith disk I/O on the tmpdir partition and withThreads_runningto confirm that sort spill is translating into user-facing latency. - Watch the
Slow_queriesrate alongside sort counters to build a composite signal that catches filesort degradation before it cascades into connection pile-up.
Related guides
- How MySQL actually works in production: a mental model for operators: /guides/mysql/how-mysql-works-in-production/
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis: /guides/mysql/mysql-aborted-connections/
- MySQL adaptive hash index latch contention: high CPU, low throughput: /guides/mysql/mysql-adaptive-hash-index-latch-contention/
- MySQL InnoDB buffer pool hit ratio collapse: the cliff edge: /guides/mysql/mysql-buffer-pool-hit-ratio-collapse/
- MySQL slow after restart: buffer pool warm-up and the cold cache: /guides/mysql/mysql-buffer-pool-not-warming-up/
- MySQL innodb_buffer_pool_size tuning: 60-80% of RAM and when that breaks: /guides/mysql/mysql-buffer-pool-sizing/
- MySQL Innodb_buffer_pool_wait_free > 0: buffer pool memory pressure: /guides/mysql/mysql-buffer-pool-wait-free/
- MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches: /guides/mysql/mysql-checkpoint-age-monitoring/
- MySQL connection exhaustion: detection, diagnosis, and prevention: /guides/mysql/mysql-connection-exhaustion/
- MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck: /guides/mysql/mysql-deadlock-detect-off-high-concurrency/
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction: /guides/mysql/mysql-deadlock-found/
- MySQL FLUSH TABLES WITH READ LOCK stall: backups that freeze the server: /guides/mysql/mysql-flush-tables-with-read-lock-stall/







