MySQL adaptive hash index latch contention: high CPU, low throughput

Burning CPU while barely answering queries: OS user time near 100%, application latency spiking, and Questions flat or falling. Threads_connected is high but Threads_running stays low, and disk I/O is quiet. This pattern matches an Adaptive Hash Index (AHI) latch storm.

InnoDB builds an in-memory hash index over frequently accessed B-tree pages to speed lookups. Since MySQL 5.7, the latch is partitioned into innodb_adaptive_hash_index_parts (default 8). Under high concurrency, especially point lookups on a hot index, threads can collide on the same partition latch and spin instead of executing. CPU saturates while throughput collapses.

The fastest fix is SET GLOBAL innodb_adaptive_hash_index = OFF. It takes effect immediately, drops the entire AHI structure, and stops the spin waits.

What this means

AHI lives in the buffer pool. When a thread looks up a popular index, InnoDB may hash the target B-tree page to skip tree traversal. The btr_search rw-latch protects each partition. If one thread holds the latch exclusively to build or evict a hash entry, concurrent threads spin. On a many-core server with a point-lookup workload, eight partitions can serialize access and create a convoy.

Threads are counted as connected but not running, so Threads_running stays below Threads_connected while CPU climbs. Because the bottleneck is a memory latch, not disk, Innodb_buffer_pool_reads does not spike and I/O utilization remains low. Queries may not appear in the slow query log because they are not waiting on locks or disk; they are stuck in CPU spin loops inside InnoDB.

Common causes

CauseWhat it looks likeFirst thing to check
AHI partition collision on a hot indexbtr_search waits in SHOW ENGINE INNODB STATUS with spin rounds above 100K; a specific index may be namedSEMAPHORES section for btr_search latch lines
innodb_adaptive_hash_index_parts too small for core countHigh CPU on a many-core host running point lookups with the default partition countSHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index_parts' against CPU core count
Buffer pool churn invalidating AHI entriesAHI hit ratio flat or dropping alongside high buffer pool readsINFORMATION_SCHEMA.INNODB_METRICS for adaptive_hash_searches versus adaptive_hash_searches_btree
Workload shifted to scans or range lookupsHash searches below 20% of total lookups; CPU high but latch waits absentAHI hit ratio from INNODB_METRICS and recent query plan changes

Quick checks

-- Measure the active thread gap
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Threads_connected';
-- Sample query throughput twice to compute rate
SHOW GLOBAL STATUS LIKE 'Questions';
-- Look for btr_search latch waits and spin rounds
SHOW ENGINE INNODB STATUS\G
-- Focus on: SEMAPHORES section
-- Compute AHI efficiency
SELECT NAME, COUNT FROM INFORMATION_SCHEMA.INNODB_METRICS
WHERE NAME IN ('adaptive_hash_searches', 'adaptive_hash_searches_btree');
-- Check partition count against CPU cores
SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index_parts';
-- Rule out I/O saturation
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
-- Rule out row lock contention (MySQL 8.0)
SELECT COUNT(*) FROM performance_schema.data_lock_waits;

-- MySQL 5.7
SELECT COUNT(*) FROM information_schema.INNODB_LOCK_WAITS;

How to diagnose it

Use this flow to confirm an AHI latch storm before making changes.

flowchart TD
    A[High CPU and low throughput] --> B{Threads_running < Threads_connected?}
    B -->|Yes| C[Check SEMAPHORES btr_search]
    B -->|No| D[Check row locks and I/O]
    C --> E{Spin rounds > 100K?}
    E -->|Yes| F[AHI latch storm]
    E -->|No| G[Other mutex contention]
    F --> H[Disable AHI dynamically]
  1. Correlate CPU and throughput. Confirm that OS CPU is above 90% while the MySQL Questions rate has dropped sharply from baseline.
  2. Confirm the thread gap. If Threads_running is far below Threads_connected, threads are stuck on internal waits rather than executing. This gap is not unique to AHI, but it rules out simple CPU saturation from productive work.
  3. Inspect SHOW ENGINE INNODB STATUS and find the SEMAPHORES section. Look for lines containing btr_search. Spin rounds above 100K are a heuristic indicating threads are burning CPU on this latch.
  4. Check AHI utility. Query INFORMATION_SCHEMA.INNODB_METRICS for adaptive_hash_searches and adaptive_hash_searches_btree. If hash searches are less than 20% of total, AHI is not earning its keep.
  5. Compare innodb_adaptive_hash_index_parts to CPU core count. If the partition count is far below core count, the latch is a serialization bottleneck.
  6. Rule out I/O and row locks. Verify that Innodb_buffer_pool_reads is not spiking and that performance_schema.data_lock_waits (MySQL 8.0) or information_schema.INNODB_LOCK_WAITS (MySQL 5.7) is empty. An AHI storm is CPU-bound with quiet disk and no row lock waits.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
CPU utilization vs Questions rateDistinguishes spinning from productive loadCPU above 90% with flat or falling Questions
btr_search semaphore waitsDirect evidence of AHI partition contentionSpin rounds above 100K in SHOW ENGINE INNODB STATUS
Threads_running / Threads_connected gapReveals threads stuck waiting instead of executingThreads_running sustained below Threads_connected during load
AHI hit ratioShows whether AHI is worth its memory and riskadaptive_hash_searches / total lookups below 20%
Innodb_buffer_pool_read_requests rateContext for total lookup volume; AHI only helps point lookupsHigh volume with low AHI utilization suggests mispartitioning or scans

Fixes

Immediate relief: disable AHI dynamically

Warning: this is disruptive. The AHI structure is dropped immediately and queries revert to B-tree traversal.

Run:

SET GLOBAL innodb_adaptive_hash_index = OFF;

Requires SUPER or SYSTEM_VARIABLES_ADMIN. Threads stop spinning immediately. Expect a brief throughput dip, followed by recovery as convoyed queries resume. If the workload genuinely relied on AHI, you may see a small post-recovery latency increase on hot point lookups. During a storm, this is still usually a net win because the system moves from near-zero throughput to stable execution.

Permanent fix: increase partition count

Set innodb_adaptive_hash_index_parts to roughly twice the CPU core count in the configuration file, then restart the server. Valid range is 1 to 512. More partitions spread latch pressure across separate latches, reducing the chance that two hot threads land on the same partition. Tradeoff: additional partitions consume slightly more memory and can reduce per-partition cache locality. This does not eliminate contention, but it raises the concurrency level at which the storm appears.

Operational choice: leave AHI disabled

If your workload is scan-heavy, write-heavy, or already well served by the buffer pool, disabling AHI permanently removes this failure mode. Some production runbooks disable it by default. Tradeoff: you lose the hash shortcut for hot page lookups. On modern hardware with fast NVMe and large buffer pools, the extra B-tree traversal cost is often imperceptible compared to the risk of a production latch storm.

Fix query patterns

If the workload shifted to range scans or unanchored LIKE queries, AHI cannot help because it only accelerates exact-match page lookups. Optimize queries to use precise point lookups, add covering indexes, or route analytical scans to a replica. If the scan pattern is permanent, disable AHI rather than letting it consume buffer pool memory for no benefit.

Prevention

  • Proactively monitor SHOW ENGINE INNODB STATUS SEMAPHORES for btr_search waits after any major workload change or deployment.
  • Size innodb_adaptive_hash_index_parts to twice CPU cores at provisioning time, not after the first incident.
  • Baseline AHI hit ratio from INFORMATION_SCHEMA.INNODB_METRICS; if it stays below 50% for a full business cycle, disable AHI permanently in the configuration.
  • Do not expect AHI to help on workloads dominated by range scans, full table scans, or heavy writes. It is designed for repeated point lookups on a stable working set.

How Netdata helps

  • Netdata correlates CPU utilization with MySQL Questions and Threads_running in the same view, making the spinning-threads gap immediately visible.
  • Buffer pool and disk I/O charts help rule out storage saturation without switching tools.
  • Alerts on Threads_running diverging from Threads_connected can catch latch storms before throughput collapses completely.
  • Long-term baseline tracking of query throughput highlights subtle slowdowns that precede full CPU saturation.