MySQL Sort_merge_passes climbing: filesort and sort_buffer_size

Sort_merge_passes climbing fast is the signature of filesort spilling to disk. You notice it in SHOW GLOBAL STATUS, along with rising disk I/O on the tmpdir partition and a growing Handler_read_rnd. Queries that returned in milliseconds now take seconds.

The absolute value of this cumulative counter is meaningless. Rate of change is what matters. A rapid climb within hours means sorts that used to stay in memory are now writing chunks to disk and merging them in multiple passes. Each pass costs disk I/O and adds latency. Active threads pile up behind slow sorts, driving Threads_running higher and the Questions rate lower.

Confirm the overflow, find the queries responsible, and fix them without causing memory pressure.

What this means

When MySQL cannot satisfy an ORDER BY clause from an index, it performs a filesort. The server reads rows into a sort buffer capped by sort_buffer_size. If the sort key and accompanying columns do not fit, MySQL writes sorted chunks to a temporary file on disk and merges them in subsequent passes. Each merge pass increments Sort_merge_passes.

Sort_scan counts sorts after a full table scan, while Sort_range counts sorts after a range read. Together they represent the total sort operations. When the merge-pass count grows faster than the sum of these two, the average sort requires more than one disk pass.

The operational formula uses deltas over an interval:

rate(Sort_merge_passes) / (rate(Sort_scan) + rate(Sort_range)) > 1

When this ratio exceeds 1, the sort buffer is too small for the workload. Handler_read_rnd indicates filesort activity, typically from ORDER BY on non-indexed columns. It tracks reads by row position during the sort output phase. This is distinct from full table scans, which show up in Handler_read_rnd_next. A sustained rise in Handler_read_rnd alongside Sort_merge_passes confirms that filesort is the mechanism causing the pain.

flowchart TD
    A[Query with ORDER BY] --> B{Index covers sort order?}
    B -->|No| C[Filesort]
    C --> D[Read rows into sort buffer]
    D --> E{Fits within sort_buffer_size?}
    E -->|No| F[Write sorted chunks to disk]
    F --> G[Merge passes]
    G --> H[Sort_merge_passes increments]
    H --> I[Slower queries, higher disk I/O]

Common causes

CauseWhat it looks likeFirst thing to check
Missing index on sort columnsHandler_read_rnd rising; EXPLAIN shows Using filesortEXPLAIN for top ORDER BY queries
sort_buffer_size too small for workloadSort_merge_passes rate exceeds combined Sort_scan + Sort_range rateGlobal status ratio over a 1-minute window
Application selecting and sorting more rows than neededInnodb_rows_read far exceeds final rows sent; high Sort_rows relative to resultsQuery WHERE clauses and LIMIT usage
Batch or ETL jobs running during peak hoursStep-change in Sort_merge_passes correlated with job scheduleSHOW PROCESSLIST or information_schema.INNODB_TRX for long threads

Quick checks

-- Sort overflow ratio inputs
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';
SHOW GLOBAL STATUS LIKE 'Sort_scan';
SHOW GLOBAL STATUS LIKE 'Sort_range';
-- Filesort activity
SHOW GLOBAL STATUS LIKE 'Handler_read_rnd';
-- Per-session sort buffer limit
SHOW GLOBAL VARIABLES LIKE 'sort_buffer_size';
-- Temp table spill correlation
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';
SHOW GLOBAL STATUS LIKE 'Created_tmp_tables';
-- Long-running active queries
SELECT ID, USER, HOST, DB, TIME, STATE, LEFT(INFO, 100) AS query_snippet
FROM information_schema.PROCESSLIST
WHERE COMMAND != 'Sleep' AND TIME > 10
ORDER BY TIME DESC;
-- Top query patterns by total latency
SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

How to diagnose it

  1. Sample Sort_merge_passes, Sort_scan, and Sort_range at the start and end of a 5-minute window during peak traffic. Compute the rate ratio. A value above 1 means sorts are spilling.
  2. Confirm filesort by checking whether Handler_read_rnd is increasing. This counter tracks row-position reads during filesort output.
  3. Check Created_tmp_disk_tables against Created_tmp_tables. If the disk ratio is also elevated, the workload is generating large intermediate results; sorting is one symptom of a broader query-efficiency problem.
  4. Identify offending queries. Query performance_schema.events_statements_summary_by_digest for high cumulative latency, or query sys.statements_with_sorting to surface explicit sorts directly. Use SHOW PROCESSLIST to catch long-running sorts in real time. Look for ORDER BY on large tables.
  5. Run EXPLAIN on suspects. If the Extra column contains Using filesort, the optimizer cannot use an index to satisfy the sort.
  6. Check whether the query reads a large intermediate result that is later trimmed. A missing LIMIT, a broad date range, or a low-cardinality WHERE predicate can force MySQL to sort thousands of rows to return ten.
  7. Verify memory headroom before changing sort_buffer_size. The variable is per-session. Each sort can allocate up to the full sort_buffer_size; a global increase multiplies across every connection that sorts. Raising it without accounting for max_connections can convert a sort problem into an OOM kill.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
Sort_merge_passes rateDirect measure of sort spill to diskRapid growth over hours, or ratio > 1
Sort_scan + Sort_rangeDenominator for overflow ratioRising alongside Sort_merge_passes
Handler_read_rndConfirms filesort executionSustained increase above baseline
Created_tmp_disk_tables / Created_tmp_tablesSort spill often correlates with temp table spillRatio sustained above 25%
Threads_runningSpilled sorts slow queries and cause pile-upRising while Questions rate drops
Slow_queriesMerge passes add latencyRate increase correlating with sort spill

Fixes

Add an index covering the sort order

The best fix is to eliminate the filesort. Create an index whose leading columns match the ORDER BY sequence so the optimizer reads rows in sorted order. If the query has a WHERE clause, the index must also satisfy the filter or the optimizer may still choose a filesort. A covering index that includes all selected columns avoids clustered-index lookups, reducing the row size entering the sort buffer. Tradeoff: extra index maintenance on writes and additional disk space.

Reduce the sort footprint

Add restrictive WHERE clauses, use LIMIT where business logic allows, and avoid selecting columns that are not needed. If the application fetches thousands of rows only to display ten, fix the application logic. This reduces rows entering the sort buffer without touching server configuration. Note that if the optimizer cannot use an index for the ORDER BY, it may still filesort the entire intermediate result before applying LIMIT.

Raise sort_buffer_size for a specific session

When a legitimate batch or reporting query must sort a large result set and cannot be rewritten, increase the buffer for that connection only:

SET SESSION sort_buffer_size = 524288;

Tradeoff: that session now allocates more memory for each sort. Use this sparingly for known operations, and disconnect afterward to free the memory.

Raise sort_buffer_size globally (with caution)

If the workload consistently requires a larger buffer and indexing is not possible, increase the global value modestly. The buffer is allocated per session, so a global value of 4 MB across 200 peak connections consumes 800 MB. Verify that max_connections * sort_buffer_size fits inside available RAM after accounting for the InnoDB buffer pool and other per-thread buffers. Large values often yield diminishing returns and increase memory pressure. Do not raise it blindly.

Prevention

  • Baseline Sort_merge_passes per hour during normal operations. A step-change after a deployment is usually a missing index or a new ORDER BY pattern.
  • Run EXPLAIN on new queries in staging and reject any that show Using filesort on a large table without a compensating index or a very small result set.
  • Monitor the sort overflow ratio alongside Handler_read_rnd as a code-quality gate before production release.
  • Use session-level sort_buffer_size adjustments for known large sorts rather than raising the global default.
  • Correlate sort spill with temporary table spill. If both Sort_merge_passes and Created_tmp_disk_tables are rising, the workload is generating large intermediate results that need query-level fixes, not just buffer tuning.

How Netdata helps

  • Netdata polls Sort_merge_passes, Sort_scan, Sort_range, and Handler_read_rnd from SHOW GLOBAL STATUS and plots rates, so you see the step-change without manual sampling.
  • Correlate climbing Sort_merge_passes with disk I/O on the tmpdir partition and with Threads_running to confirm that sort spill is translating into user-facing latency.
  • Watch the Slow_queries rate alongside sort counters to build a composite signal that catches filesort degradation before it cascades into connection pile-up.
  • How MySQL actually works in production: a mental model for operators: /guides/mysql/how-mysql-works-in-production/
  • MySQL Aborted_connects and Aborted_clients climbing: diagnosis: /guides/mysql/mysql-aborted-connections/
  • MySQL adaptive hash index latch contention: high CPU, low throughput: /guides/mysql/mysql-adaptive-hash-index-latch-contention/
  • MySQL InnoDB buffer pool hit ratio collapse: the cliff edge: /guides/mysql/mysql-buffer-pool-hit-ratio-collapse/
  • MySQL slow after restart: buffer pool warm-up and the cold cache: /guides/mysql/mysql-buffer-pool-not-warming-up/
  • MySQL innodb_buffer_pool_size tuning: 60-80% of RAM and when that breaks: /guides/mysql/mysql-buffer-pool-sizing/
  • MySQL Innodb_buffer_pool_wait_free > 0: buffer pool memory pressure: /guides/mysql/mysql-buffer-pool-wait-free/
  • MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches: /guides/mysql/mysql-checkpoint-age-monitoring/
  • MySQL connection exhaustion: detection, diagnosis, and prevention: /guides/mysql/mysql-connection-exhaustion/
  • MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck: /guides/mysql/mysql-deadlock-detect-off-high-concurrency/
  • MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction: /guides/mysql/mysql-deadlock-found/
  • MySQL FLUSH TABLES WITH READ LOCK stall: backups that freeze the server: /guides/mysql/mysql-flush-tables-with-read-lock-stall/