Kafka disk I/O latency high: await, LocalTimeMs, and the slow-disk broker

iostat shows await climbing, and maybe a disk alert fired. But elevated await alone is not a pageable event. It is still the right signal to start from. On SSDs and RAID arrays, %util hits 100% under modest load because it measures the share of time the device had at least one request in flight, not saturation, and these devices serve many requests in parallel. await, the average time for I/O requests to be served, reflects what requests actually experience. At the Kafka layer, the counterpart is LocalTimeMs in the request latency breakdown. Both spike during normal operations: a broker restart with a cold page cache, log compaction, and partition reassignment all drive up disk latency without indicating a hardware fault. This guide shows how to distinguish a transient spike from a slow disk that will shrink your ISR and block produce requests.

What this means

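await is the request-weighted average of r_await and w_await: total time spent on reads and writes divided by the number of reads and writes completed. It captures queue time plus service time. For Kafka, write latency (w_await) reflects the durability path for produce requests and log flushes. Read latency (r_await) reflects the consumer fetch path. Kafka relies on the OS page cache for reads, so healthy tail consumers are served from memory and generate almost no disk reads. As a result, r/s and r_await should stay near zero. When r_await jumps, consumers are reading data that has already been evicted from the cache.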
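The sketch below recomputes iostat's numbers from two /proc/diskstats snapshots. It shows where the numbers come from and why %util saturates while await does not. It assumes the standard Linux field layout: reads completed in column 4, ms reading in 7, writes completed in 8, ms writing in 11, and ms doing I/O in 13. The function name `disk_latency` is hypothetical.

```bash
# disk_latency DEV BEFORE AFTER SECONDS
# Recompute iostat-style r_await, w_await, await (ms) and %util for one
# device from two /proc/diskstats snapshots taken SECONDS apart.
disk_latency() {
  local dev="$1" before="$2" after="$3" secs="$4"
  awk -v dev="$dev" -v secs="$secs" '
    $3 != dev { next }                      # only the requested device
    FNR == NR {                             # first snapshot: remember counters
      r0 = $4; rms0 = $7; w0 = $8; wms0 = $11; io0 = $13; next
    }
    {                                       # second snapshot: compute deltas
      dr = $4 - r0;   drms = $7 - rms0      # reads completed, ms spent reading
      dw = $8 - w0;   dwms = $11 - wms0     # writes completed, ms spent writing
      r_await = dr ? drms / dr : 0
      w_await = dw ? dwms / dw : 0
      # await is the request-weighted average of r_await and w_await
      await = (dr + dw) ? (drms + dwms) / (dr + dw) : 0
      # %util: share of wall-clock time with at least one request in flight
      util = ($13 - io0) / (secs * 1000) * 100
      printf "r_await=%.2f w_await=%.2f await=%.2f util=%.1f%%\n",
             r_await, w_await, await, util
    }' "$before" "$after"
}
```

Example usage: `cat /proc/diskstats > /tmp/ds.0; sleep 10; cat /proc/diskstats > /tmp/ds.1; disk_latency nvme0n1 /tmp/ds.0 /tmp/ds.1 10`. %util only counts time with any request in flight. A device that serves many requests in parallel can therefore report 100% while await stays low.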

LocalTimeMs measures how long the leader broker spends processing a request locally, including appending to the log or reading from it. When LocalTimeMs rises alongside OS await, you are looking at the same bottleneck from two perspectives. LocalTimeMs can also rise for reasons unrelated to the disk, such as message format conversion. Correlate it with RequestQueueTimeMs and RequestHandlerAvgIdlePercent. If LocalTimeMs is high but handler idle percent is healthy and RequestQueueTimeMs is flat, the disk is not the problem.

```mermaid
flowchart TD
    A[OS await elevated] -->|check| B{Broker impact?}
    B -->|idle% low / URP rising| C[LocalTimeMs high]
    B -->|idle% normal| D[Transient]
    C -->|w_await up| E[Disk degradation]
    C -->|r_await up| F[Page cache miss]
    D -->|cause| G[Cold start / compaction / reassignment]
    E -->|confirm| H[LogFlushRateAndTimeMs]
    F -->|confirm| I[pgmajfault / consumer lag]
```
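The flowchart can be expressed as a small decision function. This is a sketch under assumptions. `classify_await_spike` and its argument order are hypothetical. "idle% low" is taken as below 0.2, the threshold used in the diagnosis steps. "w_await up" versus "r_await up" is decided by whichever grew more over its own baseline.

```bash
# classify_await_spike IDLE URP_RISING R_DELTA_MS W_DELTA_MS
#   IDLE        RequestHandlerAvgIdlePercent as a 0-1 fraction
#   URP_RISING  1 if UnderReplicatedPartitions is climbing, else 0
#   R_DELTA_MS  r_await minus its normal baseline
#   W_DELTA_MS  w_await minus its normal baseline
# Prints: transient | disk-degradation | page-cache-miss
classify_await_spike() {
  awk -v idle="$1" -v urp="$2" -v r="$3" -v w="$4" 'BEGIN {
    if (idle >= 0.2 && urp == 0) { print "transient"; exit }  # broker not impacted
    if (w >= r) print "disk-degradation"  # write path grew more; confirm with LogFlushRateAndTimeMs
    else        print "page-cache-miss"   # read path grew more; confirm with pgmajfault and lag
  }'
}
```

For example, `classify_await_spike 0.12 1 2 35` prints `disk-degradation`. The function does not look at LocalTimeMs itself. Following the flowchart, confirm that LocalTimeMs is elevated before acting on the result.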

Common causes

Cause	What it looks like	First thing to check
Disk degradation or hardware wear	`w_await` grows steadily; `LocalTimeMs` for Produce spikes; `LogFlushRateAndTimeMs` p99 exceeds 500 ms; `RequestHandlerAvgIdlePercent` drops below 0.2	`iostat -xz 1` and JMX `LogFlushRateAndTimeMs`
Page cache eviction from backfill consumer	`r_await` jumps; `FetchConsumer` `LocalTimeMs` spikes; `BytesOutPerSec` rises without `BytesInPerSec` increase; `pgmajfault` rate doubles	Consumer group lag and `/proc/vmstat` `pgmajfault`
Log compaction burst	Transient `await` spikes; `max-dirty-percent` climbing; no growth in `RequestQueueTimeMs` or URP	JMX `kafka.log:type=LogCleanerManager,name=max-dirty-percent`
Cold start or partition reassignment	`await` high after broker restart or during reassignment; `RequestHandlerAvgIdlePercent` normal; URP transient	Broker uptime and `ReassigningPartitions` status
Swap pressure from JVM heap	`await` elevated with swap activity; long GC pauses; `si` and `so` visible in `vmstat`	`vmstat 1` and GC logs

Quick checks

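```bash
# True disk saturation indicator: await, not %util
iostat -xz 1

# Page cache pressure: compare two samples 10 seconds apart.
# pgmajfault counts cache misses on mmap'd files (such as Kafka index files);
# pgpgin counts all KB read from disk, including log segment reads.
grep -E '^(pgmajfault|pgpgin) ' /proc/vmstat; sleep 10; grep -E '^(pgmajfault|pgpgin) ' /proc/vmstat

# Kafka broker impact: request handler idle and queue time
echo "get -b kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent OneMinuteRate" | java -jar jmxterm.jar -l localhost:9999 -n
echo "get -b kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request=Produce 99thPercentile" | java -jar jmxterm.jar -l localhost:9999 -n

# Local disk processing time
echo "get -b kafka.network:type=RequestMetrics,name=LocalTimeMs,request=Produce 99thPercentile" | java -jar jmxterm.jar -l localhost:9999 -n
echo "get -b kafka.network:type=RequestMetrics,name=LocalTimeMs,request=FetchConsumer 99thPercentile" | java -jar jmxterm.jar -l localhost:9999 -n

# Disk failure signals: fsync latency and offline log directories
echo "get -b kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs 99thPercentile" | java -jar jmxterm.jar -l localhost:9999 -n
echo "get -b kafka.log:type=LogManager,name=OfflineLogDirectoryCount Value" | java -jar jmxterm.jar -l localhost:9999 -n

# Cluster durability status
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions

# Disk space on log directories (strip the "log.dirs=" key before splitting)
grep '^log.dirs=' /etc/kafka/server.properties | cut -d= -f2- | tr ',' '\n' | while read -r d; do df -h "$d"; done

# Log cleaner health (compacted topics)
echo "get -b kafka.log:type=LogCleaner,name=DeadThreadCount Value" | java -jar jmxterm.jar -l localhost:9999 -n
```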
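Comparing two /proc/vmstat samples by eye is error-prone. The sketch below assumes each sample is saved to a file first. `vmstat_rate`, a hypothetical helper, turns two snapshots into a per-second rate. pgpgin is reported in KB, so its rate is KB/s.

```bash
# vmstat_rate COUNTER BEFORE AFTER SECONDS
# Per-second rate of one /proc/vmstat counter between two snapshot files.
vmstat_rate() {
  awk -v c="$1" -v secs="$4" '
    $1 != c { next }                 # only the requested counter
    FNR == NR { v0 = $2; next }      # first snapshot: remember the value
    { printf "%s %.1f/s\n", c, ($2 - v0) / secs }
  ' "$2" "$3"
}

# Usage (Linux): take two snapshots 10 seconds apart, then compare them.
#   cat /proc/vmstat > /tmp/vmstat.0; sleep 10; cat /proc/vmstat > /tmp/vmstat.1
#   vmstat_rate pgmajfault /tmp/vmstat.0 /tmp/vmstat.1 10
#   vmstat_rate pgpgin     /tmp/vmstat.0 /tmp/vmstat.1 10
```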

How to diagnose it

Establish the hardware baseline. SSD await should normally be under 5 ms; HDD under 10 ms. Sustained await above 20 ms for SSDs or 50 ms for HDDs is abnormal.
Split reads and writes. Use r_await and w_await. Write spikes point to disk degradation. Read spikes point to page cache misses.
Map to Kafka request type. If w_await is high, check LocalTimeMs for Produce. If r_await is high, check LocalTimeMs for FetchConsumer.
Confirm broker impact. Check RequestHandlerAvgIdlePercent. If it is below 0.2 and falling, or RequestQueueTimeMs is growing, the disk problem is backing up the broker. If idle percent is above 0.5, the spike may be transient background I/O.
Check for transient explanations. Look at broker uptime (under 600 s suggests a cold start), reassignment status, and the compaction dirty ratio (max-dirty-percent). If any of these match, the latency is expected and should subside without intervention.
Check for disk failure signals. OfflineLogDirectoryCount above 0 means a log directory has failed outright; it is a binary signal, not a trend. LogFlushRateAndTimeMs p99 above 500 ms confirms the write path is struggling.
Identify the consumer culprit for read spikes. Look for consumer groups with lag that is both large and actively shrinking, indicating a backfill. Corroborate with BytesOutPerSec rising without BytesInPerSec.
Check for swap. Run vmstat 1. If si and so are nonzero, JVM heap pages may be swapping to disk, compounding I/O latency. Confirm vm.swappiness is set to 1.
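The backfill check in the steps above can be scripted by diffing two snapshots of consumer group lag. The sketch assumes the column layout of recent `kafka-consumer-groups.sh --describe` output: GROUP, TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, and so on. `backfill_candidates` and its MIN_LAG threshold are hypothetical; set the threshold from your own traffic.

```bash
# backfill_candidates BEFORE AFTER MIN_LAG
# Print consumer groups whose total lag is at least MIN_LAG and shrinking
# between two saved snapshots of:
#   kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --all-groups
backfill_candidates() {
  awk -v min="$3" '
    $1 == "GROUP" || NF < 6 || $6 !~ /^[0-9]+$/ { next }  # headers, blanks, non-numeric lag
    FNR == NR { lag0[$1] += $6; next }                    # first snapshot: sum lag per group
    { lag1[$1] += $6 }                                    # second snapshot
    END {
      for (g in lag1)
        if ((g in lag0) && lag1[g] >= min && lag1[g] < lag0[g])
          printf "%s lag %.0f -> %.0f\n", g, lag0[g], lag1[g]
    }' "$1" "$2"
}
```

A group that shows up here while BytesOutPerSec rises without BytesInPerSec is the likely source of the read spike.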

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`iostat` `await`	True disk saturation indicator; `%util` misleads on SSD and RAID	SSD above 20 ms sustained; HDD above 50 ms sustained
`r_await` vs `w_await`	Distinguishes read cache misses from write path degradation	`w_await` up signals disk wear; `r_await` up signals cache miss
`LocalTimeMs` (Produce / Fetch)	Broker-level view of time spent in local I/O	p99 above 2-3x baseline
`RequestHandlerAvgIdlePercent`	Fraction of time request handler (I/O) threads are idle; drops when they are busy or blocked on disk	Below 0.3 sustained; below 0.2 critical
`RequestQueueTimeMs`	Queue between network and I/O threads	Growing while idle percent falls
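`UnderReplicatedPartitions`	Durability degradation from slow followers	Nonzero for more than 5 minutes outside maintenance
`pgmajfault` and `pgpgin` rates	Page cache effectiveness (`pgmajfault` counts misses on mmap'd files such as indexes; `pgpgin` counts all KB read from disk)	2x baseline or higher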
`LogFlushRateAndTimeMs`	Fsync latency on log segments	p99 above 500 ms
`OfflineLogDirectoryCount`	Binary disk failure signal	Any nonzero value
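For alerting scripts, the await and handler-idle thresholds from the table above can be encoded directly. This is a minimal sketch and the function names are hypothetical. The "elevated" band between the normal and abnormal await thresholds is an interpretation of the diagnosis steps, not a documented Kafka or iostat limit. Feed the functions a sustained value, such as a 5-minute average, not a single iostat sample.

```bash
# await_status MEDIA AWAIT_MS   (MEDIA is ssd or hdd)
# normal below 5 ms (SSD) / 10 ms (HDD); abnormal above 20 ms / 50 ms sustained
await_status() {
  awk -v m="$1" -v a="$2" 'BEGIN {
    if (m == "ssd")      { ok = 5;  bad = 20 }
    else if (m == "hdd") { ok = 10; bad = 50 }
    else { print "unknown-media"; exit 1 }
    if (a > bad)      print "abnormal"
    else if (a >= ok) print "elevated"
    else              print "normal"
  }'
}

# handler_pressure IDLE   (RequestHandlerAvgIdlePercent as a 0-1 fraction)
# critical below 0.2, warning below 0.3, otherwise ok
handler_pressure() {
  awk -v i="$1" 'BEGIN {
    if (i < 0.2)      print "critical"
    else if (i < 0.3) print "warning"
    else              print "ok"
  }'
}
```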

Fixes

Disk degradation

If w_await is sustained and RequestHandlerAvgIdlePercent is below 0.2, the disk is the bottleneck. Do not restart the broker as a first fix, for three reasons:
- A restart puts the same slow disk straight back into the data path.
- The returning broker triggers a wave of follower fetches to catch up its replicas.
- If the host reboots, the page cache starts cold.

Instead, take the broker out of the data path with a controlled shutdown. A normal stop performs one when controlled.shutdown.enable is true, the default. This moves leadership away cleanly and lets clients and replicas work against healthier brokers. Tradeoff: you will see URP while the broker is out and during any partition migration. If the disk is JBOD and only one directory is slow, you may be able to move partitions off that specific log directory, but this requires reassignment planning.
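For the JBOD case, kafka-reassign-partitions.sh accepts an optional `log_dirs` list alongside `replicas`. This lets you move one replica to another directory on the same broker. The sketch below builds that JSON for a single partition. `logdir_move_json` is a hypothetical helper. It keeps the current replica list unchanged and uses `"any"` so the other replicas get no directory constraint. Check the reassignment JSON format for your Kafka version before running it.

```bash
# logdir_move_json TOPIC PARTITION REPLICAS BROKER DIR
#   REPLICAS  current replica list, comma-separated (e.g. 1,2,3)
#   BROKER    broker whose replica should move to DIR
logdir_move_json() {
  local topic="$1" partition="$2" replicas="$3" broker="$4" dir="$5"
  local -a ids
  local dirs="" id
  IFS=',' read -r -a ids <<< "$replicas"
  for id in "${ids[@]}"; do
    # target broker gets the new directory; others are left unconstrained
    if [[ "$id" == "$broker" ]]; then dirs+="\"$dir\","; else dirs+="\"any\","; fi
  done
  printf '{"version":1,"partitions":[{"topic":"%s","partition":%s,"replicas":[%s],"log_dirs":[%s]}]}\n' \
    "$topic" "$partition" "$replicas" "${dirs%,}"
}
```

Example usage: `logdir_move_json orders 0 1,2,3 2 /data/disk2 > move.json`, then `kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file move.json --execute`. Follow up with `--verify` in place of `--execute` to watch progress.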

Page cache thrashing

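If a backfill consumer is driving r_await and evicting hot data, throttle it with a Kafka client quota on consumer_byte_rate. This caps its read bandwidth on each broker without stopping the job. Alternatively, on Kafka 2.4+, enable follower fetching (KIP-392) so backfill reads are served by a follower replica in the consumer's rack instead of the leader. On the brokers, set broker.rack and replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector; on the backfill consumer, set client.rack. Tradeoff: backfill takes longer, but tail consumer latency recovers as hot data returns to the page cache.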
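Client quotas are set with kafka-configs.sh. The sketch below wraps that command with a dry-run switch so you can review the exact change before applying it. `throttle_consumer`, the client.id `backfill-job`, and the `DRY_RUN` and `BOOTSTRAP` variables are hypothetical. The rate must be a whole number of MiB/s, and the quota applies per broker.

```bash
# throttle_consumer CLIENT_ID MIB_PER_SEC
# Set a consumer_byte_rate quota for one client.id. With DRY_RUN=1 the
# kafka-configs.sh command is printed instead of executed.
throttle_consumer() {
  local client_id="$1" mib_per_sec="$2"
  local bytes=$(( mib_per_sec * 1024 * 1024 ))   # quota is in bytes/sec
  local cmd=(kafka-configs.sh --bootstrap-server "${BOOTSTRAP:-localhost:9092}"
             --alter --entity-type clients --entity-name "$client_id"
             --add-config "consumer_byte_rate=${bytes}")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Example usage: `DRY_RUN=1 throttle_consumer backfill-job 10` prints the command for a 10 MiB/s cap. When the backfill finishes, remove the quota with the same kafka-configs.sh invocation, passing `--delete-config consumer_byte_rate` in place of `--add-config`.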

Compaction I/O saturation

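If log cleaner threads are driving the spikes while DeadThreadCount is zero, the cleaner is working and the disk itself is too slow for the compaction workload. Raising log.cleaner.threads will not help an I/O-bound cleaner; more threads only add more reads and writes to the same disk. The fix is faster storage or reducing compacted topic throughput. Tradeoff: infrastructure cost. As a stopgap, log.cleaner.io.max.bytes.per.second caps cleaner I/O so compaction competes less with produce and fetch traffic, at the cost of slower cleaning.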

Silent cleaner failure

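If DeadThreadCount is above 0, compaction has stopped, and compacted topics will keep growing until the disk fills. A broker restart resurrects the cleaner thread. Before restarting, search the broker logs for the error that killed it. With Kafka's default log4j configuration, cleaner output goes to log-cleaner.log. Otherwise the same fault can stop it again. Tradeoff: brief URP while the broker rejoins.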

Disk space pressure

If await is high because the disk is above 90% full, emergency retention reduction or volume expansion is required. Be aware that retention.ms and retention.bytes only delete segments on topics whose cleanup.policy includes delete. Topics with cleanup.policy=compact shrink only when the cleaner runs, so they free space less predictably. Tradeoff: reducing retention risks data loss for consumers that have not caught up.
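A small filter over POSIX df output shows which log directories are past the 90% mark. This is a sketch: `full_log_dirs` is hypothetical. It assumes df -P column order, with Capacity in column 5 and the mount point in column 6, and mount points without spaces. Pass 80 or 85 instead of 90 to enforce the 15-20% free-space target from the prevention list.

```bash
# full_log_dirs THRESHOLD_PCT  < df -P output
# Print "mountpoint capacity" for filesystems at or above THRESHOLD_PCT.
full_log_dirs() {
  awk -v t="$1" 'NR > 1 {
    pct = $5; sub(/%$/, "", pct)          # "91%" -> "91"
    if (pct + 0 >= t) print $6, $5
  }'
}

# Usage: df -P /data/kafka1 /data/kafka2 | full_log_dirs 90
```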

Prevention

Alert on await, not %util. Set thresholds relative to your hardware baseline.
Monitor the Kafka request latency breakdown (RequestQueueTimeMs, LocalTimeMs, RemoteTimeMs, ResponseQueueTimeMs, ResponseSendTimeMs), not just TotalTimeMs.
Set vm.swappiness = 1 to prevent the OS from swapping JVM heap pages.
Monitor DeadThreadCount and max-dirty-percent to catch silent compaction failure before disk fills.
Maintain at least 15-20% free space on each log.dirs volume to account for compaction doubling and reassignment copies.
Run game-day tests: shut down a broker intentionally and measure ISR recovery time and page cache warmup duration. This establishes your real baselines for await during failure.
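The swappiness setting from the list above is easy to verify in a provisioning or audit script. This is a minimal sketch. `check_swappiness` is hypothetical and reads /proc/sys/vm/swappiness unless given another file, which is useful for testing. To apply the setting, `sysctl -w vm.swappiness=1` changes it immediately, and a line in /etc/sysctl.d/ persists it across reboots.

```bash
# check_swappiness [FILE]
# Prints the current vm.swappiness; returns 1 if it is above 1.
check_swappiness() {
  local file="${1:-/proc/sys/vm/swappiness}" cur
  cur=$(<"$file")
  if (( cur > 1 )); then
    echo "vm.swappiness=$cur (recommended: 1)"
    return 1
  fi
  echo "vm.swappiness=$cur ok"
}
```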

How Netdata helps

Correlates OS disk await with Kafka LocalTimeMs to show whether a broker-level latency spike matches the disk or a different layer.
Shows RequestHandlerAvgIdlePercent and RequestQueueTimeMs per broker to confirm impact before paging.
Collects UnderReplicatedPartitions, OfflineLogDirectoryCount, and LogFlushRateAndTimeMs without JMX scripting.
Tracks page cache pressure through major page fault metrics, highlighting backfill consumers before they degrade tail latency.
Baselines disk latency per broker and flags deviations from historical norms, catching disk wear early.
