Redis big keys: finding the giant key that blocks the event loop

Application latency spikes while redis-cli PING still returns PONG. Simple GET commands take hundreds of milliseconds. Aggregate used_memory looks stable, instantaneous_ops_per_sec drops, and the slowlog grows. The culprit is often a single oversized key: a sorted set with millions of elements, a hash with millions of fields, or a list fetched with an unbounded range. Redis executes commands sequentially on one main thread; an O(N) command on a giant key blocks every other client until it completes. This guide shows how to find that key and fix it without restarting Redis.

What this means

Redis is single-threaded for command execution. I/O threads can read and write sockets in parallel since Redis 6.0, but traversing a hash table, sorting a set, or freeing a large object happens on the main thread. When a command touches a large structure, every other command queues behind it. A ZRANGEBYSCORE on a 50M-element sorted set, an HGETALL on a giant hash, a DEL on a massive key, or an unbounded LRANGE consumes CPU and wall-clock time proportionally to the key’s size. Aggregate memory metrics hide this: used_memory can look healthy while one key causes periodic freezes.

flowchart TD
    A[Latency spike / ops drop] --> B{"SLOWLOG shows O(N) command?"}
    B -->|Yes| C[Note key name and command]
    B -->|No| D[Check LATENCY LATEST for fork/fsync]
    D --> E[Not a big key issue]
    C --> F[Run redis-cli --bigkeys]
    F --> G{Key in top per-type results?}
    G -->|Yes| H[MEMORY USAGE on suspect]
    G -->|No| I[Sample random keys with MEMORY USAGE]
    H --> J[Confirm oversized key]
    I --> J
    J --> K[Use UNLINK or paginate access]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| O(N) command on a large collection | Slowlog shows `HGETALL`, `SMEMBERS`, `LRANGE 0 -1`, `SORT`, or `ZRANGEBYSCORE` with high execution times | `SLOWLOG GET 10` and `INFO commandstats` |
| A single key growing without bound | Latency spikes correlate with writes to one key; key count is stable but one structure is bloated | `redis-cli --bigkeys` |
| Synchronous deletion of a large key | A single `DEL` causes a multi-second freeze; `LATENCY LATEST` shows a `command` spike | `LATENCY LATEST` and `LATENCY HISTORY command` |
| Lua script iterating a large key | Slowlog shows `EVAL` or `EVALSHA` entries with very long execution times; `INFO commandstats` shows a high `usec_per_call` for them | `SLOWLOG GET`, looking for script entries |
| Application fetching entire structures instead of paginating | Repeated large output buffer spikes in `CLIENT LIST`; high outbound network traffic | `CLIENT LIST` `omem` values |

Quick checks

# Check for recent slow commands and their arguments
redis-cli SLOWLOG GET 10

# Find the biggest key per data type via incremental SCAN
redis-cli --bigkeys

# Estimate RAM for a specific suspected key
redis-cli MEMORY USAGE my:suspect:key SAMPLES 5

# Identify commands with high per-call latency
redis-cli INFO commandstats | grep -E 'cmdstat_hgetall|cmdstat_smembers|cmdstat_lrange|cmdstat_sort|cmdstat_zrangebyscore|cmdstat_eval'

# List clients and sort by output buffer size to spot fetch-heavy connections
redis-cli CLIENT LIST | awk -F'[= ]' '{for(i=1;i<=NF;i++) if($i=="omem") print $(i+1)}' | sort -rn | head -10

# Check internal latency events for command spikes
redis-cli LATENCY LATEST
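
The omem one-liner above prints only buffer sizes, which makes it hard to tell which client owns each buffer. A minimal sketch of a variant that keeps the client address and last command is shown here. `top_omem_clients` is a hypothetical helper name, and the sketch assumes the space-separated `key=value` layout that CLIENT LIST prints.

```shell
# Hypothetical helper: reads CLIENT LIST output on stdin and prints
# "omem addr cmd" for the clients with the largest output buffers.
# Optional first argument: how many clients to show (default 10).
top_omem_clients() {
  awk '{
    omem = 0; addr = "?"; cmd = "?"
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")            # split each field on its first "="
      if (eq == 0) continue
      key = substr($i, 1, eq - 1)
      val = substr($i, eq + 1)
      if (key == "omem") omem = val
      else if (key == "addr") addr = val
      else if (key == "cmd") cmd = val
    }
    print omem, addr, cmd
  }' | sort -k1,1nr | head -n "${1:-10}"
}

# Usage (against a live server):
#   redis-cli CLIENT LIST | top_omem_clients 5
```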

How to diagnose it

1. Confirm the event loop is blocked by commands, not by fork or fsync. Run SLOWLOG GET 10 and LATENCY LATEST. Slowlog reports execution time in microseconds, so 100 ms appears as 100000. LATENCY LATEST records events only when latency-monitor-threshold is set to a non-zero value. If slowlog entries show execution times over 100 ms and LATENCY LATEST reports command spikes, the event loop is wedged by expensive operations.
2. Identify the command pattern. Use INFO commandstats and look for outliers in usec_per_call. Common offenders: HGETALL, SMEMBERS, LRANGE, SORT, ZRANGEBYSCORE, and EVAL.
3. Find the largest keys. Run redis-cli --bigkeys. It walks the keyspace incrementally with SCAN rather than blocking the server the way KEYS does, so it is generally safe for production; it still adds load, and the -i option inserts a pause between batches. It reports the biggest key per data type: by byte length for strings and by element count for collections.
4. Estimate memory for suspects. Run MEMORY USAGE <key> [SAMPLES count] on the candidates from step 3 and on the keys accessed by the slow commands. For collections the result is an estimate extrapolated from the sampled elements (SAMPLES 0 samples every element, at a higher cost). High byte counts confirm which structures are overweight.
5. Correlate keys to clients. Run CLIENT LIST and look for connections with large omem values or cmd fields matching the slow command. This identifies which application instance is generating the load.
6. Determine if the key is necessary. If it is temporary or cache data, removal is the fastest fix. If it is required data, change the access pattern instead of deleting the structure.
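
The command-pattern step can be sketched as a small filter that ranks INFO commandstats entries by usec_per_call. `rank_commandstats` is a hypothetical helper name; the sketch assumes the `cmdstat_<name>:calls=...,usec=...,usec_per_call=...` line layout and strips the carriage returns that redis-cli passes through when its output is piped.

```shell
# Hypothetical helper: reads INFO commandstats output on stdin and prints
# "usec_per_call command calls", highest usec_per_call first.
# Optional first argument: how many commands to show (default 10).
rank_commandstats() {
  tr -d '\r' | awk -F'[:,=]' '/^cmdstat_/ {
    name = substr($1, 9)             # drop the "cmdstat_" prefix
    calls = ""; per_call = ""
    for (i = 2; i < NF; i += 2) {    # fields alternate name, value
      if ($i == "calls") calls = $(i + 1)
      else if ($i == "usec_per_call") per_call = $(i + 1)
    }
    if (per_call != "") print per_call, name, calls
  }' | sort -k1,1nr | head -n "${1:-10}"
}

# Usage (against a live server):
#   redis-cli INFO commandstats | rank_commandstats 5
```

Because usec_per_call is an average since the last stats reset, a command that is slow only occasionally can hide behind many fast calls; the slowlog remains the better source for individual outliers.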

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Slowlog entry count rate | Direct evidence of commands blocking the event loop | Sustained growth > 10 entries per minute |
| Main-thread CPU utilization | Big key operations saturate the single execution core. In Redis 7+, derive a rate from `used_cpu_user_main_thread` and `used_cpu_sys_main_thread` deltas | Rate approaching 1.0 second per second (100% of one core) |
| `instantaneous_ops_per_sec` | Drops when the event loop is blocked by a slow command | Sustained drop > 50% from baseline with stable client count |
| `cmdstat_*` `usec_per_call` | Reveals which specific command types are expensive. Run `CONFIG RESETSTAT` after deployments to isolate recent behaviour | O(N) command averaging > 10 ms per call since last reset |
| Client output buffer memory (`omem`) | Large buffers indicate clients retrieving oversized values | Any single client `omem` > 256 MB |
| Aggregate `used_memory` vs per-key `MEMORY USAGE` | Aggregate hides outliers; a single key can dominate | Top key consumes > 20% of total dataset memory |
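
The main-thread CPU rate in the table is a delta between two samples divided by the sampling interval. A minimal sketch of that arithmetic follows; both function names are hypothetical, and the first prints 0.000000 on servers whose INFO output lacks the main-thread fields.

```shell
# Hypothetical helper: reads INFO cpu output on stdin and prints the sum of
# used_cpu_user_main_thread and used_cpu_sys_main_thread, in seconds.
main_thread_cpu_seconds() {
  tr -d '\r' | awk -F: '
    $1 == "used_cpu_user_main_thread" || $1 == "used_cpu_sys_main_thread" { total += $2 }
    END { printf "%.6f\n", total }'
}

# Hypothetical helper: prints (after - before) / interval_seconds,
# i.e. the fraction of one core the main thread used over the interval.
main_thread_cpu_rate() {
  awk -v before="$1" -v after="$2" -v interval="$3" \
    'BEGIN { printf "%.2f\n", (after - before) / interval }'
}

# Usage (against a live server):
#   before=$(redis-cli INFO cpu | main_thread_cpu_seconds)
#   sleep 5
#   after=$(redis-cli INFO cpu | main_thread_cpu_seconds)
#   main_thread_cpu_rate "$before" "$after" 5
```

A result near 1.00 means the main thread was busy for almost the whole interval.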

Fixes

Immediate: remove the key safely

If the key is expendable, do not use DEL. DEL frees memory synchronously and blocks the event loop for the entire duration of the deletion, potentially for seconds on a multi-gigabyte structure. Use UNLINK instead. UNLINK removes the key from the keyspace immediately and defers memory reclamation to a background thread. On Redis 6.0 and later you can also run CONFIG SET lazyfree-lazy-user-del yes to make DEL behave like UNLINK.

Warning: removing the key destroys data. Confirm the key name and its purpose before running UNLINK.
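
One way to build that confirmation into the removal is a wrapper that shows the key's type and estimated size and deletes only on an explicit "yes". This is a sketch, not a standard tool: `unlink_big_key` and the `REDIS_CLI` override variable are hypothetical names introduced here.

```shell
# Hypothetical helper: prints the key's type and estimated size, then runs
# UNLINK only when the second argument is exactly "yes".
# REDIS_CLI is a hypothetical override (e.g. "redis-cli -h myhost").
unlink_big_key() {
  key="$1"
  confirm="${2:-no}"
  cli="${REDIS_CLI:-redis-cli}"
  echo "type: $($cli TYPE "$key")"
  echo "bytes: $($cli MEMORY USAGE "$key" SAMPLES 5)"
  if [ "$confirm" = "yes" ]; then
    $cli UNLINK "$key"               # prints the number of keys removed
  else
    echo "dry run: pass 'yes' as the second argument to UNLINK $key"
  fi
}

# Usage:
#   unlink_big_key my:suspect:key        # inspect only
#   unlink_big_key my:suspect:key yes    # inspect, then remove
```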

Change application access patterns

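Replace full-structure commands with scoped alternatives. Instead of HGETALL, use HSCAN or fetch specific fields with HMGET. Instead of SMEMBERS, use SSCAN or test membership with SISMEMBER. Instead of LRANGE 0 -1, use bounded ranges. Instead of ZRANGEBYSCORE with no limit, iterate with ZSCAN or paginate with the LIMIT offset count clause; when paginating, advance the minimum score between pages rather than growing the offset, because Redis still has to walk past the skipped elements. Each call then does work proportional to the chunk it returns rather than to the whole structure, so no single command holds the main thread for long.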
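
As an illustration of the cursor pattern, here is a minimal sketch that walks a hash with HSCAN instead of HGETALL. `scan_hash` and the `REDIS_CLI` override are hypothetical names; the sketch assumes redis-cli's non-interactive output, where the first line of each reply is the next cursor and the remaining lines alternate field and value.

```shell
# Hypothetical helper: prints every field and value of a hash, one per line,
# fetching roughly COUNT entries per round trip.
# Arguments: key, optional COUNT hint (default 1000).
scan_hash() {
  key="$1"
  count="${2:-1000}"
  cli="${REDIS_CLI:-redis-cli}"
  cursor=0
  while :; do
    reply=$($cli HSCAN "$key" "$cursor" COUNT "$count")
    cursor=$(printf '%s\n' "$reply" | head -n 1)
    # Stop on an empty or non-numeric cursor (connection error, wrong type).
    case "$cursor" in ''|*[!0-9]*) return 1 ;; esac
    printf '%s\n' "$reply" | tail -n +2
    # A cursor of 0 means the iteration is complete.
    [ "$cursor" = "0" ] && break
  done
}

# Usage:
#   scan_hash my:suspect:key 500
```

COUNT is only a hint, so pages can be larger or smaller than requested, and a field modified during the scan may be returned more than once.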

Optimize or remove Lua scripts

If a Lua script iterates a large structure, break it into smaller batches executed from the client side, or refactor to avoid full traversals. Set lua-time-limit (default 5000 ms) to define when Redis flags a script as slow and allows SCRIPT KILL. Note that SCRIPT KILL succeeds only against scripts that have not yet performed writes.

Shard large structures

If the data must remain and be accessed in bulk, shard it across multiple smaller keys. For example, split a giant hash into user:1000:profile, user:1001:profile, and so on, or partition a sorted set by score range. This keeps any single key small enough that O(N) traversals complete quickly.
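
When there is no natural partition such as a user ID or score range, one common alternative is to route each field to one of a fixed number of bucket keys by hashing the field name. The sketch below assumes that approach, which the text above does not prescribe; `shard_key`, the `base:N` naming, and the default of 16 buckets are all illustrative choices.

```shell
# Hypothetical helper: maps a field name to one of N bucket keys using the
# POSIX cksum CRC, so the same field always lands in the same bucket.
# Arguments: base key name, field name, optional bucket count (default 16).
shard_key() {
  base="$1"
  field="$2"
  shards="${3:-16}"
  sum=$(printf '%s' "$field" | cksum | cut -d' ' -f1)
  echo "${base}:$((sum % shards))"
}

# Usage: write and read through the computed bucket key.
#   redis-cli HSET "$(shard_key user:profiles alice)" alice '{"plan":"pro"}'
#   redis-cli HGET "$(shard_key user:profiles alice)" alice
```

Changing the bucket count later moves fields between buckets, so pick a count with headroom or plan a migration.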

Enable lazy freeing by default

Set lazyfree-lazy-user-del yes in redis.conf or via CONFIG SET. This ensures future accidental or intentional DEL operations on large keys do not block the event loop. For FLUSHDB and FLUSHALL, pass the ASYNC flag to avoid synchronous deletion.
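
A small sketch of applying the setting at runtime and verifying that it took effect is shown below. `enable_lazy_user_del` and the `REDIS_CLI` override are hypothetical names. A value applied with CONFIG SET is lost on restart unless it is also written to redis.conf, for example with CONFIG REWRITE.

```shell
# Hypothetical helper: enables lazyfree-lazy-user-del and returns success
# only if CONFIG GET then reports "yes".
enable_lazy_user_del() {
  cli="${REDIS_CLI:-redis-cli}"
  $cli CONFIG SET lazyfree-lazy-user-del yes > /dev/null
  # CONFIG GET prints the parameter name, then its value, one per line.
  value=$($cli CONFIG GET lazyfree-lazy-user-del | tail -n 1)
  [ "$value" = "yes" ]
}

# Usage:
#   enable_lazy_user_del && redis-cli CONFIG REWRITE
```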

Prevention

Schedule periodic redis-cli --bigkeys or MEMORY USAGE sampling runs via cron or configuration management. Trend per-key memory to catch growth before it blocks the event loop.
Ban unbounded O(N) commands in application code reviews. Enforce pagination for all collection access.
Set client-output-buffer-limit normal <hard> <soft> <seconds>; for example, client-output-buffer-limit normal 256mb 128mb 60. This disconnects runaway fetches before they destabilize the server.
Monitor INFO commandstats for usec_per_call regressions after each deployment; reset stats with CONFIG RESETSTAT to establish a clean baseline.
Keep lazyfree-lazy-user-del yes enabled on all production instances.
Maintain per-key memory dashboards if your monitoring system supports scraping MEMORY USAGE samples.
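
The periodic sampling run could look like the sketch below, which walks the keyspace with redis-cli --scan and reports keys whose estimated size exceeds a threshold. `report_big_keys` and the `REDIS_CLI` override are hypothetical names. The sketch issues one MEMORY USAGE call per key, so on a large keyspace it is better run against a replica or a sampled subset.

```shell
# Hypothetical helper: prints "bytes key" for every key whose MEMORY USAGE
# estimate is at or above the threshold, largest first.
# Argument: threshold in bytes.
report_big_keys() {
  threshold="$1"
  cli="${REDIS_CLI:-redis-cli}"
  $cli --scan | while IFS= read -r key; do
    bytes=$($cli MEMORY USAGE "$key" SAMPLES 5 < /dev/null)
    # Skip keys that expired between SCAN and MEMORY USAGE (empty reply).
    case "$bytes" in ''|*[!0-9]*) continue ;; esac
    if [ "$bytes" -ge "$threshold" ]; then
      echo "$bytes $key"
    fi
  done | sort -k1,1nr
}

# Usage, e.g. from cron: log every key estimated at 100 MB or more.
#   report_big_keys 104857600 >> /var/log/redis-big-keys.log
```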

How Netdata helps

Correlates drops in instantaneous operations per second with keyspace hits and system CPU spikes to confirm event loop blocking.
Surfaces main-thread CPU saturation when a big key monopolizes the single execution core.
Tracks memory usage alongside application latency to expose when stable aggregate memory masks per-key outliers.
Alerts on rejected connections and connected client anomalies that follow latency spikes caused by queued commands.
Provides slowlog integration to visualize command latency outliers without manual SLOWLOG GET queries.
