Redis slowlog filling up: finding and fixing the slow commands
Clients report timeouts and latency has a new plateau. Check SLOWLOG LEN: if it is climbing or already at the default limit of 128, the slowlog is filling. The slowlog is a circular buffer that logs commands whose execution time exceeds slowlog-log-slower-than (set in microseconds; the default of 10000 is 10 ms). Rapid rotation means entries are evicted before you can inspect them, and every entry marks a command that blocked the single-threaded event loop. Extract the culprits before the evidence disappears, distinguish execution time from queue wait, and stop the bleed.
What this means
Each slowlog entry records a unique ID, a Unix timestamp, the execution duration in microseconds, the command name and arguments, the client address, and the client name. It does not record network round-trip time, output buffer delays, or time spent waiting in the event loop queue. A filling slowlog indicates either sustained command execution over the threshold or a buffer too small to retain history. With the default slowlog-max-len of 128, a busy instance can overwrite evidence within seconds.
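As a minimal sketch of that entry layout, the snippet below turns one raw six-element SLOWLOG GET entry (the shape documented for Redis 4.0 and later) into a named structure. The class name, field names, and sample values are illustrative assumptions, not part of any client library; some clients already return parsed entries.

```python
from dataclasses import dataclass
from typing import List, Sequence


def _text(value) -> str:
    """Decode bytes replies; pass other values through str()."""
    return value.decode() if isinstance(value, bytes) else str(value)


@dataclass
class SlowlogEntry:  # hypothetical name, for illustration only
    entry_id: int        # unique, increasing ID
    timestamp: int       # Unix time at which the command was logged
    duration_us: int     # execution time only, in microseconds
    command: List[str]   # command name followed by its arguments
    client_addr: str     # "ip:port" of the caller
    client_name: str     # set via CLIENT SETNAME, empty if unset

    @property
    def duration_ms(self) -> float:
        return self.duration_us / 1000.0


def parse_entry(raw: Sequence) -> SlowlogEntry:
    """Convert one raw six-element SLOWLOG GET entry into a SlowlogEntry."""
    entry_id, timestamp, duration_us, args, addr, name = raw
    return SlowlogEntry(
        entry_id=int(entry_id),
        timestamp=int(timestamp),
        duration_us=int(duration_us),
        command=[_text(a) for a in args],
        client_addr=_text(addr),
        client_name=_text(name),
    )
```

Note that duration_us covers execution only, so a client-side latency of 200 ms can coexist with a 15 ms slowlog entry.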
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| O(N) commands on large keys | SLOWLOG GET shows KEYS *, SMEMBERS, HGETALL, LRANGE 0 -1, or SORT with high microseconds | redis-cli --bigkeys and MEMORY USAGE on the key names from the log |
| Unoptimized Lua scripts | Repeated EVAL or EVALSHA entries with execution times in the hundreds of milliseconds | INFO commandstats for cmdstat_eval and usec_per_call |
| Large synchronous deletions | DEL on a large list, hash, or sorted set appears in the slowlog | MEMORY USAGE on the deleted key to confirm size |
| Buffer too small | SLOWLOG LEN stays at the max and entries rotate before you can inspect them | CONFIG GET slowlog-max-len |
| Threshold set too low | Many entries for normally fast commands like GET or HGET at 11-20 ms | CONFIG GET slowlog-log-slower-than and compare to your SLA |
| Queue wait masquerading as slow execution | Client latency is high but the slowlog is empty or shows only modest times | LATENCY LATEST for internal blocking events and INFO stats for ops/sec drops |
Quick checks
# Check whether the slowlog is pinned at its limit
redis-cli SLOWLOG LEN
redis-cli CONFIG GET slowlog-max-len
# Inspect the most recent 20 entries for repeating patterns
redis-cli SLOWLOG GET 20
# View per-command call counts and average execution time
redis-cli INFO commandstats
# Check for internal latency events that stall the event loop
redis-cli LATENCY LATEST
# Sample the keyspace for oversized keys
redis-cli --bigkeys
# Map slow commands to source hosts or apps
redis-cli CLIENT LIST
# Check if throughput is dropping because the event loop is blocked
redis-cli INFO stats | grep instantaneous_ops_per_sec
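The INFO commandstats output is easier to scan once it is ranked. The following is a rough sketch that assumes the text was captured from redis-cli INFO commandstats; the function names are hypothetical and the parser only relies on the cmdstat_<name>:key=value,... line format.

```python
from typing import Dict, List, Tuple


def parse_commandstats(text: str) -> Dict[str, Dict[str, float]]:
    """Parse INFO commandstats text into {command: {field: value}}."""
    stats: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the "# Commandstats" header and blank lines
        name, _, fields = line.partition(":")
        stats[name[len("cmdstat_"):]] = {
            key: float(value)
            for key, _, value in (pair.partition("=") for pair in fields.split(","))
        }
    return stats


def slowest_commands(stats: Dict[str, Dict[str, float]], limit: int = 5) -> List[Tuple[str, float]]:
    """Return (command, usec_per_call) pairs, highest average first."""
    ranked = sorted(
        stats.items(),
        key=lambda item: item[1].get("usec_per_call", 0.0),
        reverse=True,
    )
    return [(cmd, fields.get("usec_per_call", 0.0)) for cmd, fields in ranked[:limit]]
```

Averages hide rare spikes, so treat this ranking as a complement to the slowlog, not a replacement for it.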
How to diagnose it
- Quantify the bleed rate. Compare SLOWLOG LEN to slowlog-max-len. If they are equal, the log is full and evicting old entries. You are losing history.
- Pull the evidence. Run SLOWLOG GET 50 and look for a dominant command pattern. Repeated commands or key names indicate a single offender.
- Check key sizes. For commands that operate on keys, run MEMORY USAGE on the key names from the slowlog arguments. Keys with tens of megabytes or millions of elements turn O(1) or O(N) commands into event loop wedges.
- Validate with commandstats. Run INFO commandstats and look for usec_per_call outliers. A cmdstat_keys entry in production is a red flag regardless of call count.
- Distinguish execution from queue wait. If client latency is high but the slowlog looks benign, the delay is likely network, output buffer backpressure, or event loop queueing. Run LATENCY LATEST to look for internal blocking events such as fork that stall the event loop.
- Identify the client. Match CLIENT LIST fields addr or name to the client IP and port recorded in the slowlog entries to find the source host or service.
- Check for correlation with ops/sec drops. Run INFO stats and look for instantaneous_ops_per_sec dropping during the same window that slowlog entries are created. A drop confirms the event loop is stalled.
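Looking for a dominant pattern by eye gets error-prone past a few dozen entries. The sketch below groups raw SLOWLOG GET entries by command and first argument (usually the key) and ranks the groups by total blocked time. It assumes the entries have already been fetched and decoded to strings; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def summarize_slowlog(entries: Iterable[Sequence]) -> List[Tuple[str, str, int, int]]:
    """Group raw SLOWLOG GET entries by (command, first argument).

    Returns (command, first_arg, count, total_usec) rows, worst total first.
    """
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for _id, _ts, duration_us, args, *_client in entries:
        command = str(args[0]).upper() if args else ""
        first_arg = str(args[1]) if len(args) > 1 else ""
        bucket = totals[(command, first_arg)]
        bucket[0] += 1                 # how often this pattern appears
        bucket[1] += int(duration_us)  # total time it blocked the event loop
    rows = [(cmd, arg, count, usec) for (cmd, arg), (count, usec) in totals.items()]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

A single row that accounts for most of the total time points at one key or one code path; a flat distribution suggests a threshold or capacity problem instead.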
Fixes
Increase slowlog retention and tune the threshold
If the log is rotating too fast, increase the buffer and adjust sensitivity:
redis-cli CONFIG SET slowlog-max-len 1024
redis-cli CONFIG SET slowlog-log-slower-than 5000
redis-cli CONFIG REWRITE
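This is safe and immediate: CONFIG SET takes effect without a restart. The threshold is in microseconds, so 5000 logs anything slower than 5 ms. CONFIG REWRITE persists the change and only works when the server was started with a configuration file. A larger buffer retains evidence during spikes. Lower the threshold to catch sub-10 ms outliers; raise it if normal variance floods the log.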
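One way to choose a buffer size is to measure how much history the current buffer holds and scale from there. This is a back-of-the-envelope sketch, assuming you pass in the Unix timestamps from the entries returned by SLOWLOG GET; the function names are hypothetical and the estimate assumes the observed entry rate stays constant.

```python
import math
from typing import Sequence


def retention_seconds(timestamps: Sequence[int]) -> int:
    """Seconds of history covered by the entries currently in the slowlog."""
    if not timestamps:
        return 0
    return max(timestamps) - min(timestamps)


def suggested_max_len(timestamps: Sequence[int], wanted_seconds: int) -> int:
    """Estimate a slowlog-max-len that keeps roughly wanted_seconds of
    history at the entry rate observed in the sample."""
    window = max(retention_seconds(timestamps), 1)  # avoid division by zero
    rate_per_second = len(timestamps) / window
    return math.ceil(rate_per_second * wanted_seconds)
```

If the result is far above a few thousand entries, the threshold is probably too low for the workload and raising it is a better fix than growing the buffer.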
Replace O(N) commands with incremental alternatives
Replace KEYS * with SCAN in all application code. For large containers:
- Page through large sets and hashes with SSCAN and HSCAN instead of SMEMBERS and HGETALL, and give ZRANGEBYSCORE (LIMIT) and LRANGE (bounded start and stop) explicit limits instead of retrieving entire collections.
- Avoid SORT on large datasets.
- Replace DEL on large keys with UNLINK, which frees memory asynchronously in the background and does not block the event loop.
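The replacements above can be sketched as a cursor loop. This example assumes a redis-py style client whose scan(cursor=, match=, count=) returns a (next_cursor, keys) pair and whose unlink(*names) returns the number of keys removed; the helper names are hypothetical. redis-py also ships scan_iter, which wraps the same loop.

```python
from typing import Iterator


def scan_keys(client, pattern: str = "*", count: int = 500) -> Iterator[str]:
    """Yield keys matching pattern in small batches instead of one KEYS call.

    `client` is assumed to expose redis-py's scan(cursor=, match=, count=).
    """
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if cursor == 0:  # a zero cursor means the iteration is complete
            break


def unlink_matching(client, pattern: str, count: int = 500) -> int:
    """Remove matching keys with UNLINK so memory is reclaimed in the background."""
    removed = 0
    for key in scan_keys(client, pattern, count):
        removed += client.unlink(key)
    return removed
```

SCAN can return the same key more than once and count is only a hint, so callers that need exact results should deduplicate.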
Optimize or restrict Lua scripts
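Lua scripts execute atomically on the main thread. If EVAL or EVALSHA dominates the slowlog, check lua-time-limit (default 5000 ms; named busy-reply-threshold since Redis 7.0). Once a script runs past that limit, Redis answers other clients with BUSY errors and accepts SCRIPT KILL, which only succeeds if the script has not yet performed a write. Then review the script logic to remove O(N) operations inside loops.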
Kill misbehaving clients
Once you have identified the client via CLIENT LIST, disconnect it immediately:
redis-cli CLIENT KILL <ip:port>
Warning: this is disruptive to that client. Use it to stop acute damage while you deploy a code fix.
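Before killing anything, it helps to confirm which connection the slowlog address belongs to. The sketch below parses CLIENT LIST text (one client per line, space-separated key=value pairs) and looks up an ip:port taken from a slowlog entry. The function names are hypothetical, and the lookup assumes the client has not reconnected from a new port since the entry was logged.

```python
from typing import Dict, List, Optional


def parse_client_list(text: str) -> List[Dict[str, str]]:
    """Parse CLIENT LIST output into one dict of fields per client."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # partition("=")[::2] keeps (key, value) and drops the separator
        fields = dict(pair.partition("=")[::2] for pair in line.split())
        clients.append(fields)
    return clients


def find_client(clients: List[Dict[str, str]], addr: str) -> Optional[Dict[str, str]]:
    """Return the client whose addr matches the ip:port from a slowlog entry."""
    for client in clients:
        if client.get("addr") == addr:
            return client
    return None
```

When the address no longer matches any connection, fall back to the name field, which stays stable across reconnects if the application calls CLIENT SETNAME.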
Split large keys at the application layer
If MEMORY USAGE shows a single key consuming tens of megabytes, shard the data into smaller keys or use a different data structure. A single large hash or sorted set turns every access into a potential latency spike.
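A minimal sketch of application-level sharding for a large hash follows. It assumes fields are accessed individually, so each field can be routed to one of a fixed number of smaller keys; the function name, the key format, and the default of 64 shards are illustrative choices, not a Redis feature.

```python
import zlib


def shard_key(base_key: str, field: str, shards: int = 64) -> str:
    """Map a hash field to one of `shards` smaller keys, deterministically."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    # crc32 is stable across processes, unlike Python's built-in hash()
    index = zlib.crc32(field.encode("utf-8")) % shards
    return f"{base_key}:{index}"
```

The application would then write to shard_key("cart", field) instead of "cart". The trade-off is that reading the whole collection now requires visiting every shard, and changing the shard count later means migrating data.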
Prevention
- Set slowlog-max-len to 1024 or higher in production so you do not lose entries during incidents.
- Monitor the rate of change of SLOWLOG LEN as a leading indicator.
- Prohibit KEYS in application code; enforce SCAN via code review or ACL restrictions.
- Run redis-cli --bigkeys or MEMORY USAGE sampling periodically to catch key-size regressions.
- Review INFO commandstats weekly for unexpected commands or climbing usec_per_call.
- Require CLIENT SETNAME in application connection pools so slowlog entries and CLIENT LIST are immediately attributable.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| SLOWLOG LEN growth rate | A circular buffer that fills faster than you can inspect loses forensic evidence | SLOWLOG LEN increasing steadily toward slowlog-max-len |
| Slowlog entry execution time | Direct measure of event loop blocking by a single command | Any entry above 100 ms; the same command pattern recurring multiple times |
| instantaneous_ops_per_sec | Drop confirms the event loop is stalled and clients are queuing | Sustained drop greater than 50% from baseline while clients are connected |
| INFO commandstats | Reveals which command types consume disproportionate CPU | usec_per_call orders of magnitude above theoretical complexity |
| LATENCY LATEST | Captures internal latency events that block the event loop | Non-command events (for example, fork) above your monitor threshold |
| CLIENT LIST output buffer memory | Slow commands with large replies can exhaust memory via client buffers | Any client with omem exceeding 256 MB |
| Key size via MEMORY USAGE | Large keys turn O(1) or small-O(N) commands into blockers | Keys consuming tens of megabytes or containers with millions of elements |
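SLOWLOG LEN stops growing once it reaches slowlog-max-len, so a growth rate derived from the length flatlines when the log is full. One possible workaround, sketched below, is to sample the ID of the newest entry (from SLOWLOG GET 1) instead: per the Redis documentation, entry IDs keep increasing until the server restarts. The function name and the sampling scheme are assumptions for illustration.

```python
def slow_entries_per_second(prev_id: int, prev_time: float,
                            curr_id: int, curr_time: float) -> float:
    """Rate of new slowlog entries between two samples of the newest entry ID."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    if curr_id < prev_id:
        # IDs start over after a server restart; treat this sample as a new baseline.
        return 0.0
    return (curr_id - prev_id) / elapsed
```

For example, newest IDs of 100 and 160 sampled 30 seconds apart give 2 slow commands per second, regardless of how many entries the buffer can hold.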
How Netdata helps
- Correlate slowlog growth with instantaneous_ops_per_sec drops on the same timeline to confirm event loop blocking.
- Alert on SLOWLOG LEN approaching slowlog-max-len so entries are not lost before inspection.
- Overlay LATENCY LATEST events with slowlog timestamps to separate command execution time from internal blocking.
- Track used_memory and mem_fragmentation_ratio alongside slowlog spikes to detect large-key pressure.
- Monitor rejected_connections to catch the cascade that follows slow-command client timeouts.
Related guides
- How Redis actually works in production: a mental model for operators
- Redis aof_last_write_status:err: AOF write failures and recovery
- Redis appendfsync always latency: durability vs throughput trade-offs
- Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix
- Redis event loop blocked: when one slow command freezes everything
- Redis eviction policy tuning: allkeys-lru vs volatile-ttl vs noeviction
- Redis fork/COW memory storm: why persistence doubles RSS and OOM-kills the box
- Redis KEYS command blocking production: why to replace it with SCAN
- Redis latest_fork_usec too high: THP, NUMA, and fork latency
- Redis maxmemory not set: why every production instance needs a memory limit
- MISCONF Redis is configured to save RDB snapshots - what it means and how to fix it
- Redis monitoring checklist: the signals every production instance needs







