Redis CPU saturation: hitting the single-core throughput ceiling

Redis latency climbs. PING returns PONG, but simple GETs take milliseconds instead of microseconds. Host CPU looks moderate - perhaps 25% across eight cores - yet commands queue. The likely cause is main-thread CPU saturation. Redis executes all commands on a single event-loop thread. Once that thread saturates one core, latency rises linearly with queue depth. There is no performance cliff - only a steady ramp that eventually drives client timeouts. On multi-core hosts, aggregate process CPU hides this bottleneck because background children, I/O threads, and system accounting spread usage across cores.

What this means

Redis is a single-threaded command processor. Since Redis 6.0, optional I/O threads (the io-threads setting, off by default) can handle network socket reads and writes in parallel, but command execution, key expiry, active defragmentation, and incremental hash-table rehashing all run on the main thread. When main-thread CPU approaches one full core, every additional command waits in the event-loop queue. That wait time adds directly to latency. Because there is no preemption inside the event loop, a single slow operation blocks all subsequent commands until it completes.

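The aggregate CPU counters used_cpu_user and used_cpu_sys cover every thread in the server process: the main thread, bio threads, and I/O workers. Forked children, such as RDB and AOF rewrite processes, are reported separately in used_cpu_user_children and used_cpu_sys_children. On a multi-core server, process CPU expressed as a share of total host capacity can look low while the main thread is saturated. Redis 6.2 introduced used_cpu_user_main_thread and used_cpu_sys_main_thread, which isolate event-loop CPU. These counters are cumulative CPU-seconds, so to detect saturation, sample them twice and divide the difference by the elapsed time to get a per-second rate.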
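As a minimal sketch of that two-sample calculation, the Python below parses two INFO cpu snapshots and returns main-thread CPU-seconds consumed per wall-clock second. The helper names (parse_info, main_thread_rate) and the sample values are illustrative assumptions, not part of Redis or any client library, and the code assumes Redis 6.2+ so that both main-thread fields are present.

```python
def parse_info(text: str) -> dict[str, str]:
    """Turn raw INFO output ("field:value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# Section" headers.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def main_thread_cpu_seconds(info: dict[str, str]) -> float:
    """Cumulative user + system CPU seconds spent by the main thread."""
    return (float(info["used_cpu_user_main_thread"])
            + float(info["used_cpu_sys_main_thread"]))


def main_thread_rate(before: str, after: str, elapsed_seconds: float) -> float:
    """CPU-seconds per second between two samples; 1.0 is one full core."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = (main_thread_cpu_seconds(parse_info(after))
             - main_thread_cpu_seconds(parse_info(before)))
    return delta / elapsed_seconds


# Hypothetical samples taken 10 seconds apart (values are made up).
sample_1 = "# CPU\r\nused_cpu_sys_main_thread:40.0\r\nused_cpu_user_main_thread:60.0\r\n"
sample_2 = "# CPU\r\nused_cpu_sys_main_thread:42.5\r\nused_cpu_user_main_thread:67.0\r\n"
print(main_thread_rate(sample_1, sample_2, 10.0))  # 0.95
```

In practice the two strings would come from redis-cli INFO cpu or a client library's INFO call, with the elapsed time measured by the caller.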

Several internal processes compete for the same main-thread CPU: O(N) commands scanning large keys, the active expiry cycle sampling TTLs, active defragmentation relocating allocations, and incremental rehashing resizing the keyspace hash table. Any of these can push a stable host into saturation.

flowchart TD
  A[Main-thread CPU rate approaches 1.0] --> B{"SLOWLOG shows O(N) commands?"}
  B -->|Yes| C["Replace KEYS with SCAN, chunk large ops, optimize Lua"]
  B -->|No| D{"expired_keys or evicted_keys spiking?"}
  D -->|Yes| E["Add TTL jitter, increase maxmemory, or shard"]
  D -->|No| F{"active_defrag_running > 0?"}
  F -->|Yes| G["Tune defrag cycles or disable"]
  F -->|No| H["Shard or reduce command volume"]

Common causes

Cause	What it looks like	First thing to check
Throughput exceeds single-core capacity	`instantaneous_ops_per_sec` plateaus while main-thread CPU rate approaches 1.0	`INFO cpu` main-thread rate versus `instantaneous_ops_per_sec`
O(N) commands blocking the event loop	Slowlog dominated by `KEYS`, `SMEMBERS`, `HGETALL`, `SORT`, or Lua scripts; latency spikes across all command types	`SLOWLOG GET 50`, `INFO commandstats`
Active expiry cycle under mass TTL pressure	CPU spikes correlate with jumps in `expired_keys`; `expired_time_cap_reached_count` climbing	`INFO stats` for `expired_keys` rate and time-cap counter
Active defragmentation	Elevated CPU during otherwise idle periods; `active_defrag_running` sustained above zero	`INFO memory` and `INFO stats` defrag metrics
Hash table rehashing after growth	Steady CPU overhead following bulk loads or rapid keyspace growth	`INFO keyspace` key count trend
Eviction under memory pressure	`evicted_keys` rate climbing alongside CPU; `used_memory` at `maxmemory`	`INFO stats` evicted keys rate, `used_memory` versus `maxmemory`

Quick checks

# Main-thread CPU counters (Redis 6.2+)
redis-cli INFO cpu | grep -E 'used_cpu_user_main_thread|used_cpu_sys_main_thread'

# Current throughput
redis-cli INFO stats | grep instantaneous_ops_per_sec

# Commands blocking the loop
redis-cli SLOWLOG GET 10

# Expensive command types
redis-cli INFO commandstats | grep -E 'cmdstat_keys|cmdstat_smembers|cmdstat_hgetall|cmdstat_sort'

# Expiry and eviction pressure
redis-cli INFO stats | grep -E 'expired_keys|evicted_keys'

# Active defrag status
redis-cli INFO memory | grep active_defrag
redis-cli INFO stats | grep active_defrag

# Internal latency events (requires latency-monitor-threshold > 0)
redis-cli LATENCY LATEST

# Clients parked in blocking commands such as BLPOP (waiting on data, not on main-thread CPU)
redis-cli INFO clients | grep blocked_clients

How to diagnose it

Compute the main-thread CPU rate. Sample INFO cpu twice, 10 seconds apart. On Redis 6.2+, sum used_cpu_user_main_thread and used_cpu_sys_main_thread in each sample, then divide the difference between the two sums by the elapsed seconds. A rate above 0.9 means the main thread is saturated. On Redis versions older than 6.2, sum used_cpu_user and used_cpu_sys instead, but recognize that this aggregate also counts background and I/O threads: a rate well below 1.0 rules out main-thread saturation, while a rate near or above 1.0 does not prove it.
Correlate CPU with throughput. Check INFO stats for instantaneous_ops_per_sec. If throughput is flat or falling while client demand rises, the instance has hit its single-core execution ceiling.
Identify event loop blockers. Inspect SLOWLOG GET 50. If the same pattern appears repeatedly - especially KEYS, SMEMBERS, HGETALL, unbounded LRANGE, or SORT - that command is serially delaying everything behind it. Check INFO commandstats for high usec_per_call outliers.
Check background CPU consumers. Compute the rate of expired_keys and evicted_keys from INFO stats. If either is spiking, the main thread is spending cycles on memory management instead of client commands. Check INFO memory for active_defrag_running above zero.
Verify internal latency sources. If latency-monitor-threshold is set, run LATENCY LATEST. Look for command, expire-cycle, or eviction-cycle events. These categories confirm which subsystem is consuming time.
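Distinguish from I/O bottlenecks. Check INFO stats for io_threaded_reads_processed and io_threaded_writes_processed (Redis 6.0+). Both are cumulative counters, so compare two samples. If they are climbing, network I/O is already offloaded to I/O threads, which makes socket syscalls an unlikely primary bottleneck and points to execution saturation. If they stay at zero, I/O threads are not in use.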
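The commandstats check in the steps above can be sketched as follows. This is an illustrative Python parser for INFO commandstats output that ranks commands by usec_per_call and shows each command's share of total execution time. The function names and the sample figures are assumptions made for the example.

```python
def parse_commandstats(text: str) -> dict[str, dict[str, float]]:
    """Parse "cmdstat_<name>:calls=..,usec=..,usec_per_call=.." lines."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue
        name, _, payload = line.partition(":")
        fields = dict(pair.split("=", 1) for pair in payload.split(",") if "=" in pair)
        stats[name[len("cmdstat_"):]] = {
            "calls": float(fields["calls"]),
            "usec": float(fields["usec"]),
            "usec_per_call": float(fields["usec_per_call"]),
        }
    return stats


def rank_by_usec_per_call(stats, min_calls=1):
    """Commands sorted from slowest to fastest average execution time."""
    rows = [(name, s["usec_per_call"])
            for name, s in stats.items() if s["calls"] >= min_calls]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def share_of_command_time(stats):
    """Fraction of total recorded execution time spent in each command."""
    total = sum(s["usec"] for s in stats.values())
    return {name: (s["usec"] / total if total else 0.0)
            for name, s in stats.items()}


# Hypothetical INFO commandstats output (values are made up).
sample = (
    "# Commandstats\r\n"
    "cmdstat_get:calls=1000000,usec=2000000,usec_per_call=2.00,rejected_calls=0,failed_calls=0\r\n"
    "cmdstat_keys:calls=40,usec=6000000,usec_per_call=150000.00,rejected_calls=0,failed_calls=0\r\n"
    "cmdstat_hgetall:calls=5000,usec=2000000,usec_per_call=400.00,rejected_calls=0,failed_calls=0\r\n"
)
stats = parse_commandstats(sample)
print(rank_by_usec_per_call(stats)[0])       # ('keys', 150000.0)
print(share_of_command_time(stats)["keys"])  # 0.6
```

The counters are cumulative since the last restart or CONFIG RESETSTAT, so diffing two snapshots gives a better picture of recent behavior than a single reading.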

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Main-thread CPU rate (Redis 6.2+)	Isolates single-thread saturation from aggregate process CPU	Rate > 0.7 sustained; > 0.9 is saturated
`instantaneous_ops_per_sec`	Current command volume against the single-thread ceiling	Plateauing despite increasing client load
Slowlog growth rate	Reveals commands that block the event loop	> 10 new entries per minute, or repeated patterns
`expired_keys` rate	Active expiry consumes main-thread CPU	Sudden spike > 10x baseline
`evicted_keys` rate	Eviction consumes CPU and signals memory pressure	Sustained non-zero rate on persistent workloads
`active_defrag_running`	Defrag runs on the main thread	Sustained > 0 with high miss ratio
`LATENCY LATEST` events	Internal latency breakdown by subsystem	`command`, `expire-cycle`, or `eviction-cycle` events appearing
Keyspace size trend	Rehashing adds incremental overhead	Rapid growth after bulk loads
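The warning signs in the table reduce to a few comparisons. The Python below is a rough sketch using the thresholds from the table (0.7 and 0.9 for main-thread rate, 10x baseline for expiry spikes). The function names are hypothetical, and the baseline is assumed to come from your own historical data.

```python
def counter_rate(before: int, after: int, elapsed_seconds: float) -> float:
    """Per-second rate of a cumulative counter such as expired_keys."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # A restart resets counters to zero; clamp so the rate never goes negative.
    return max(0, after - before) / elapsed_seconds


def classify_main_thread_rate(rate: float) -> str:
    """Map a main-thread CPU rate (1.0 = one full core) to a status label."""
    if rate > 0.9:
        return "saturated"
    if rate > 0.7:
        return "warning"
    return "ok"


def is_spike(current_rate: float, baseline_rate: float, factor: float = 10.0) -> bool:
    """True when the current rate exceeds `factor` times the baseline."""
    return current_rate > factor * baseline_rate


# Hypothetical readings: expired_keys went from 1000 to 8000 in 10 seconds,
# against a historical baseline of 60 expirations per second.
rate = counter_rate(1000, 8000, 10.0)
print(rate)                             # 700.0
print(is_spike(rate, 60.0))             # True
print(classify_main_thread_rate(0.95))  # saturated
```

A sustained condition should be judged over several consecutive samples rather than a single reading.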

Fixes

If throughput exceeds single-core capacity

Scale CPU-bound command execution by sharding across multiple Redis instances or cluster nodes. Redis Cluster splits the keyspace across independent event loops, each capable of roughly one core of command execution. I/O threads offload network reads and writes, but they do not parallelize command execution. Do not expect I/O threads to relieve main-thread saturation.
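To make the keyspace split concrete, here is a small Python sketch of the key-to-slot mapping described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key modulo 16384, with hash tags honored. It is for illustration only. Cluster-aware clients compute this for you, and CLUSTER KEYSLOT returns the server's own answer. The function name key_hash_slot is an assumption.

```python
import binascii

HASH_SLOTS = 16384  # Redis Cluster divides the keyspace into 16384 slots.


def key_hash_slot(key: bytes) -> int:
    """Slot for a key: CRC16 of the key (or its hash tag) modulo 16384."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Hash only the tag when the first "{...}" pair is non-empty.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx uses the CRC-CCITT polynomial 0x1021; an initial
    # value of 0 gives the XMODEM variant.
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


# Keys sharing a hash tag land in the same slot, and so on the same node.
same_slot = key_hash_slot(b"{user1000}.following") == key_hash_slot(b"{user1000}.followers")
print(same_slot)  # True
```

Because each slot belongs to exactly one node, a hot key or an overused hash tag still concentrates load on a single event loop, which is worth keeping in mind when choosing key names.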

If O(N) commands block the event loop

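Replace KEYS with SCAN in all application code. Break large reads such as SMEMBERS, HGETALL, and unbounded LRANGE into smaller batches using SSCAN, HSCAN, and bounded LRANGE windows. Review Lua scripts for loops over large keyspaces. Note that lua-time-limit (busy-reply-threshold in Redis 7.0+) does not abort a long-running script: once the limit passes, Redis starts answering other clients with BUSY errors and accepts SCRIPT KILL, so set it low enough that runaway scripts surface quickly.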
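A cursor-based replacement for KEYS might look like the sketch below. It assumes a client object exposing scan(cursor=..., match=..., count=...) that returns a (next_cursor, keys) pair, which is the shape redis-py's scan method uses. The helper name iter_keys and the batch size are illustrative.

```python
from typing import Iterator


def iter_keys(client, pattern: str, count: int = 500) -> Iterator[str]:
    """Yield keys matching `pattern` using incremental SCAN calls."""
    cursor = 0
    while True:
        # Each SCAN call does a bounded amount of work, so other commands can
        # run between calls instead of waiting behind a single O(N) KEYS.
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        # A returned cursor of 0 means the iteration is complete.
        if cursor == 0:
            break
```

SCAN can return the same key more than once and may return empty batches before the cursor reaches 0, so callers should deduplicate if that matters and should not stop on an empty batch. The same loop shape applies to SSCAN and HSCAN for large sets and hashes.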

After remediation, SLOWLOG RESET clears old noise so you can confirm the pattern disappears.

Warning: SLOWLOG RESET immediately and irreversibly clears the slow log history.

If active expiry or eviction consumes CPU

Add jitter to TTLs so keys do not expire in synchronized waves. If eviction is constant because the dataset exceeds maxmemory, increase the memory limit or shard the data. Tuning the eviction policy does not remove the CPU cost of sampling and deleting keys under pressure.
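One simple way to add jitter is to draw each TTL uniformly from a window around the intended value. The Python below is a minimal sketch. The function name jittered_ttl and the 10% default spread are assumptions to adjust for your workload.

```python
import random


def jittered_ttl(base_seconds: int, spread: float = 0.1, rng=random) -> int:
    """Return a TTL drawn uniformly from base * (1 - spread) to base * (1 + spread)."""
    if base_seconds <= 0:
        raise ValueError("base_seconds must be positive")
    if not 0 <= spread < 1:
        raise ValueError("spread must be in [0, 1)")
    low = base_seconds * (1 - spread)
    high = base_seconds * (1 + spread)
    # Never return less than 1 second so the key is not expired immediately.
    return max(1, round(rng.uniform(low, high)))


# Keys written together with a nominal one-hour TTL now expire across
# roughly a 12-minute window instead of in the same instant.
ttl = jittered_ttl(3600)
print(3240 <= ttl <= 3960)  # True
```

The resulting value would be passed as the EX argument of SET or to EXPIRE in place of the fixed TTL.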

If active defrag consumes CPU

Review defrag effectiveness by comparing active_defrag_hits to active_defrag_misses. If the hit ratio is low but CPU overhead is high, lower active-defrag-cycle-max or disable activedefrag temporarily. Enable defrag only when mem_fragmentation_ratio stays above 1.5 over a sustained period.
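The hit ratio can be computed directly from INFO stats. This Python sketch defines it as hits / (hits + misses), which is an assumption about what counts as the ratio here. The function name and the sample values are also illustrative.

```python
from typing import Optional


def defrag_hit_ratio(info_stats_text: str) -> Optional[float]:
    """Share of defrag attempts that relocated an allocation, or None if no data."""
    counters = {}
    for line in info_stats_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("active_defrag_hits", "active_defrag_misses"):
            counters[key] = int(value)
    hits = counters.get("active_defrag_hits", 0)
    misses = counters.get("active_defrag_misses", 0)
    total = hits + misses
    return hits / total if total else None


# Hypothetical INFO stats excerpt (values are made up).
sample_stats = "# Stats\r\nactive_defrag_hits:250\r\nactive_defrag_misses:750\r\n"
print(defrag_hit_ratio(sample_stats))  # 0.25
```

Both counters are cumulative, so comparing the ratio across two snapshots shows whether recent defrag cycles are still doing useful work.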

If rehashing adds overhead

Rehashing is incremental and usually transient. If it persists after bulk loads, the keyspace may be growing beyond planned capacity. Shard before the next bulk operation.

Prevention

Main-thread CPU headroom. Keep main-thread CPU below 70% of one core during peak traffic. This leaves room for expiry cycles, defrag, and sudden command mix shifts.
Monitor main-thread rate, not aggregate CPU. Aggregate used_cpu_* metrics are misleading on multi-core hosts. Use the Redis 6.2+ main-thread counters.
Prohibit KEYS in production. Use ACLs or rename-command to prevent applications from issuing KEYS.
Run periodic big-key analysis. Schedule redis-cli --bigkeys or MEMORY USAGE sampling on representative keys to catch keys that will eventually block the loop.
Add TTL jitter. Prevent mass expiry events by distributing TTLs across a time window.
Size maxmemory to avoid chronic eviction. Persistent workloads should not rely on eviction as a steady-state mechanism.

How Netdata helps

Netdata derives the per-second rate from used_cpu_user_main_thread and used_cpu_sys_main_thread, charting main-thread CPU in isolation from aggregate process noise.

It correlates main-thread CPU with instantaneous_ops_per_sec, slowlog growth, and LATENCY LATEST events, which helps distinguish execution saturation from network or disk bottlenecks.

Alerts fire when main-thread CPU rate crosses 70% and 90% thresholds, and anomaly detection flags instantaneous_ops_per_sec plateaus.

Netdata tracks evicted_keys, expired_keys, and active_defrag_running alongside CPU to identify which background consumer is competing for the event loop.

It also monitors individual cluster nodes to reveal shard-level hot spots that aggregate cluster metrics miss.
