Redis slowlog filling up: finding and fixing the slow commands

Clients report timeouts and latency has settled at a new plateau. Check SLOWLOG LEN: if it is climbing or already at 128 (the default slowlog-max-len), the slowlog is filling. The slowlog is a circular buffer that logs commands whose execution time exceeds slowlog-log-slower-than (default 10000 microseconds, or 10 ms). Rapid rotation means entries are evicted before you can inspect them, and every entry marks a stretch during which the single-threaded event loop served nothing else. Extract the culprits before the evidence disappears, distinguish execution time from queue wait, and stop the bleed.

What this means

Each slowlog entry records a unique, monotonically increasing ID, a Unix timestamp, the execution duration in microseconds, the command name and arguments, the client address, and the client name. It does not record network round-trip time, output buffer delays, or time spent waiting in the event loop queue. A filling slowlog indicates either sustained command execution over the threshold or a buffer too small to retain history. With a default slowlog-max-len of 128, a busy instance can overwrite evidence within seconds.
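To make "within seconds" concrete: the retention window is roughly slowlog-max-len divided by the rate at which slow commands are logged. The sketch below is only that arithmetic; the helper name and the figure of 50 slow commands per second are illustrative assumptions, not measurements.

```shell
# Estimate how long a slowlog entry survives before it is overwritten.
# Usage: slowlog_retention_seconds MAX_LEN ENTRIES_PER_SECOND
# (hypothetical helper; both inputs are values you supply)
slowlog_retention_seconds() {
  awk -v max="$1" -v rate="$2" 'BEGIN {
    if (rate <= 0) { print "inf"; exit }   # nothing logged, nothing evicted
    printf "%.2f\n", max / rate
  }'
}

slowlog_retention_seconds 128 50     # default buffer: prints 2.56
slowlog_retention_seconds 1024 50    # larger buffer, same rate: prints 20.48
```

At that assumed rate the default buffer holds under three seconds of history, which is why the buffer size matters as much as the threshold.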

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| O(N) commands on large keys | SLOWLOG GET shows KEYS *, SMEMBERS, HGETALL, LRANGE 0 -1, or SORT with high microseconds | redis-cli --bigkeys and MEMORY USAGE on the key names from the log |
| Unoptimized Lua scripts | Repeated EVAL or EVALSHA entries with execution times in the hundreds of milliseconds | INFO commandstats for cmdstat_eval and usec_per_call |
| Large synchronous deletions | DEL on a large list, hash, or sorted set appears in the slowlog | MEMORY USAGE on the deleted key to confirm size |
| Buffer too small | SLOWLOG LEN stays at the max and entries rotate before you can inspect them | CONFIG GET slowlog-max-len |
| Threshold set too low | Many entries for normally fast commands like GET or HGET at 11-20 ms | CONFIG GET slowlog-log-slower-than and compare to your SLA |
| Queue wait masquerading as slow execution | Client latency is high but the slowlog is empty or shows only modest times | LATENCY LATEST for internal blocking events and INFO stats for ops/sec drops |

Quick checks

# Check whether the slowlog is pinned at its limit
redis-cli SLOWLOG LEN
redis-cli CONFIG GET slowlog-max-len
# Inspect the most recent 20 entries for repeating patterns
redis-cli SLOWLOG GET 20
# View per-command call counts and average execution time
redis-cli INFO commandstats
# Check for internal latency events that stall the event loop
redis-cli LATENCY LATEST
# Sample the keyspace for oversized keys
redis-cli --bigkeys
# Map slow commands to source hosts or apps
redis-cli CLIENT LIST
# Check if throughput is dropping because the event loop is blocked
redis-cli INFO stats | grep instantaneous_ops_per_sec

How to diagnose it

  1. Quantify the bleed rate. Compare SLOWLOG LEN to slowlog-max-len. If they are equal, the log is full and evicting old entries. You are losing history.
  2. Pull the evidence. Run SLOWLOG GET 50 and look for a dominant command pattern. Repeated commands or key names indicate a single offender.
  3. Check key sizes. For commands that operate on keys, run MEMORY USAGE on the key names from the slowlog arguments. Keys that occupy tens of megabytes or hold millions of elements turn even nominally cheap commands into event loop stalls; DEL, for example, is O(N) in the number of elements when the key is a collection.
  4. Validate with commandstats. Run INFO commandstats and look for usec_per_call outliers. A cmdstat_keys entry in production is a red flag regardless of call count.
  5. Distinguish execution from queue wait. If client latency is high but the slowlog looks benign, the delay is likely network, output buffer backpressure, or event loop queueing. Run LATENCY LATEST to look for internal blocking events such as fork that stall the event loop.
  6. Identify the client. Match the client address (ip:port) and client name recorded in each slowlog entry against the addr and name fields in CLIENT LIST to find the source host or service.
  7. Check for correlation with ops/sec drops. Run INFO stats and look for instantaneous_ops_per_sec dropping during the same window that slowlog entries are created. A drop confirms the event loop is stalled.
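Step 4 is easier when the commandstats output is sorted. The following is a minimal sketch that assumes the usual cmdstat_<name>:calls=...,usec=...,usec_per_call=... line format; the function name rank_commandstats is hypothetical.

```shell
# Rank commands by average execution time.
# Reads `INFO commandstats` text on stdin; prints "usec_per_call calls command".
# Usage: redis-cli INFO commandstats | rank_commandstats [TOP_N]
rank_commandstats() {
  tr -d '\r' |                          # INFO lines end in CRLF
    awk -F: '/^cmdstat_/ {
      name = substr($1, 9)              # strip the "cmdstat_" prefix
      n = split($2, kv, ",")
      calls = ""; upc = ""
      for (i = 1; i <= n; i++) {
        split(kv[i], pair, "=")
        if (pair[1] == "calls") calls = pair[2]
        if (pair[1] == "usec_per_call") upc = pair[2]
      }
      print upc, calls, name
    }' |
    sort -rn |                          # highest usec_per_call first
    head -n "${1:-10}"
}
```

A command with a high usec_per_call but very few calls can still be the offender, so read the calls column alongside the average rather than ranking on one number alone.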

Fixes

Increase slowlog retention and tune the threshold

If the log is rotating too fast, increase the buffer and adjust sensitivity:

redis-cli CONFIG SET slowlog-max-len 1024
redis-cli CONFIG SET slowlog-log-slower-than 5000
redis-cli CONFIG REWRITE

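Both changes take effect immediately and need no restart. A larger buffer retains evidence during spikes. slowlog-log-slower-than is expressed in microseconds, so 5000 means 5 ms: lower the threshold to catch sub-10 ms outliers, or raise it if normal variance floods the log. CONFIG REWRITE persists the change to redis.conf and returns an error if the server was started without a config file.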

Replace O(N) commands with incremental alternatives

Replace KEYS * with SCAN in all application code. For large containers:

  • Paginate SMEMBERS, HGETALL, ZRANGEBYSCORE, and LRANGE with explicit limits instead of retrieving entire collections.
  • Avoid SORT on large datasets.
  • Replace DEL on large keys with UNLINK, which frees memory asynchronously in the background and does not block the event loop.
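As one way to combine the first and last points, the sketch below iterates with redis-cli --scan (which uses SCAN rather than KEYS) and removes each match with UNLINK. The function name and the REDIS_CLI override variable are assumptions made for illustration, and the loop assumes key names contain no newlines.

```shell
# Remove keys matching a pattern without KEYS and without a blocking DEL.
# REDIS_CLI is a hypothetical override so the client command can be swapped,
# for example to add host and port options.
# Usage: unlink_matching 'session:*'
unlink_matching() {
  pattern=$1
  cli=${REDIS_CLI:-redis-cli}
  # --scan walks the keyspace incrementally instead of one blocking KEYS call
  $cli --scan --pattern "$pattern" |
    while IFS= read -r key; do
      # UNLINK detaches the key now and reclaims its memory in the background
      $cli UNLINK "$key" </dev/null >/dev/null
    done
}
```

Issuing one UNLINK per key keeps the sketch simple; batching several keys per call would cut round trips at the cost of a longer script.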

Optimize or restrict Lua scripts

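Lua scripts execute atomically on the main thread, so nothing else runs until the script returns. If EVAL or EVALSHA dominates the slowlog, set lua-time-limit so long-running scripts become eligible for SCRIPT KILL, then review script logic to remove O(N) operations inside loops. The limit does not stop a script by itself, and SCRIPT KILL only works on a script that has not yet performed a write.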

Kill misbehaving clients

Once you have identified the client via CLIENT LIST, disconnect it immediately:

redis-cli CLIENT KILL <ip:port>

Warning: this is disruptive to that client. Use it to stop acute damage while you deploy a code fix.
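Before killing anything, it helps to confirm which connection sits behind the address taken from the slowlog. A minimal sketch, assuming the space-separated key=value layout of CLIENT LIST; the function name client_by_addr and the example address are hypothetical.

```shell
# Print id, addr, name, and last command for the connection at a given address.
# Reads `CLIENT LIST` text on stdin.
# Usage: redis-cli CLIENT LIST | client_by_addr 10.0.0.5:51234
client_by_addr() {
  tr -d '\r' | awk -v want="$1" '{
    id = ""; addr = ""; name = ""; cmd = ""
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")
      k = substr($i, 1, eq - 1); v = substr($i, eq + 1)
      if (k == "id") id = v
      else if (k == "addr") addr = v      # exact match, so "laddr" is ignored
      else if (k == "name") name = v
      else if (k == "cmd") cmd = v
    }
    if (addr == want)
      print "id=" id, "addr=" addr, "name=" (name == "" ? "(unset)" : name), "cmd=" cmd
  }'
}
```

The printed id can be passed to CLIENT KILL ID <id>, which targets the connection even if the peer reconnects from the same host on a different port in the meantime.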

Split large keys at the application layer

If MEMORY USAGE shows a single key consuming tens of megabytes, shard the data into smaller keys or use a different data structure. A single large hash or sorted set turns every access into a potential latency spike.
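One possible sharding scheme, shown only as a sketch: hash each field name and route it to one of N smaller keys. The naming convention base:bucket, the use of cksum as the hash, and the example key and field names are all assumptions, not something Redis prescribes.

```shell
# Map a field name to one of N smaller keys.
# Hypothetical scheme: bucket = CRC(field) mod N, key = "<base>:<bucket>".
# Usage: shard_key BASE_KEY FIELD SHARD_COUNT
shard_key() {
  base=$1
  field=$2
  shards=$3
  # cksum prints "<crc> <byte count>"; keep only the checksum
  crc=$(printf '%s' "$field" | cksum | cut -d' ' -f1)
  echo "${base}:$(( crc % shards ))"
}

# Example: route the field "alice" into one of 16 hashes
# redis-cli HSET "$(shard_key user:sessions alice 16)" alice "<value>"
```

The mapping is stable only while the shard count stays fixed; changing it sends existing fields to different keys, so the count should be chosen with headroom.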

Prevention

  • Set slowlog-max-len to 1024 or higher in production so you do not lose entries during incidents.
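  • Monitor the rate of change of SLOWLOG LEN as a leading indicator. Once the length is pinned at slowlog-max-len it stops moving, so also track how fast the newest entry ID advances.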
  • Prohibit KEYS in application code; enforce SCAN via code review or ACL restrictions.
  • Run redis-cli --bigkeys or MEMORY USAGE sampling periodically to catch key-size regressions.
  • Review INFO commandstats weekly for unexpected commands or climbing usec_per_call.
  • Require CLIENT SETNAME in application connection pools so slowlog entries and CLIENT LIST are immediately attributable.
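For the monitoring bullet, slowlog entry IDs keep increasing for the life of the server process, so the difference between two samples of the newest ID gives a logging rate even when the buffer is full. The sketch below does only that arithmetic; the function name is hypothetical, and the commented sampling commands assume that non-interactive redis-cli prints the entry ID as the first line of SLOWLOG GET 1.

```shell
# Slow commands logged per second between two samples of the newest entry ID.
# Usage: slowlog_rate FIRST_ID SECOND_ID SECONDS_BETWEEN
slowlog_rate() {
  awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN {
    if (dt <= 0) { print "0.00"; exit }   # guard against a zero interval
    printf "%.2f\n", (b - a) / dt
  }'
}

# Sampling sketch (assumes the ID is the first output line):
#   first=$(redis-cli SLOWLOG GET 1 | head -n 1)
#   sleep 10
#   second=$(redis-cli SLOWLOG GET 1 | head -n 1)
#   slowlog_rate "$first" "$second" 10
```

IDs restart after a server restart, so a negative result usually means the instance was restarted between samples rather than that the rate dropped.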

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| SLOWLOG LEN growth rate | A circular buffer that fills faster than you can inspect loses forensic evidence | SLOWLOG LEN increasing steadily toward slowlog-max-len |
| Slowlog entry execution time | Direct measure of event loop blocking by a single command | Any entry above 100 ms; the same command pattern recurring multiple times |
| instantaneous_ops_per_sec | Drop confirms the event loop is stalled and clients are queuing | Sustained drop greater than 50% from baseline while clients are connected |
| INFO commandstats | Reveals which command types consume disproportionate CPU | usec_per_call orders of magnitude above theoretical complexity |
| LATENCY LATEST | Captures internal latency events that block the event loop | Non-command events (for example, fork) above your monitor threshold |
| CLIENT LIST output buffer memory | Slow commands with large replies can exhaust memory via client buffers | Any client with omem exceeding 256 MB |
| Key size via MEMORY USAGE | Large keys turn O(1) or small-O(N) commands into blockers | Keys consuming tens of megabytes or containers with millions of elements |
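The output buffer signal can be checked directly from CLIENT LIST, where omem is reported in bytes. A minimal sketch; the function name is hypothetical and the default limit simply mirrors the 256 MB warning sign above.

```shell
# List connections whose output buffer (omem, in bytes) exceeds a limit.
# Reads `CLIENT LIST` text on stdin; prints "addr name omem" per offender.
# Usage: redis-cli CLIENT LIST | clients_over_omem [LIMIT_BYTES]
clients_over_omem() {
  limit=${1:-268435456}                  # 256 MB
  tr -d '\r' | awk -v limit="$limit" '{
    addr = ""; name = ""; omem = 0
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")
      k = substr($i, 1, eq - 1); v = substr($i, eq + 1)
      if (k == "addr") addr = v
      else if (k == "name") name = v
      else if (k == "omem") omem = v + 0   # force numeric comparison
    }
    if (omem > limit) print addr, name, omem
  }'
}
```

Passing a lower limit, for example 67108864 for 64 MB, turns the same check into an early warning instead of a last-resort alarm.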

How Netdata helps

  • Correlate slowlog growth with instantaneous_ops_per_sec drops on the same timeline to confirm event loop blocking.
  • Alert on SLOWLOG LEN approaching slowlog-max-len so entries are not lost before inspection.
  • Overlay LATENCY LATEST events with slowlog timestamps to separate command execution time from internal blocking.
  • Track used_memory and mem_fragmentation_ratio alongside slowlog spikes to detect large-key pressure.
  • Monitor rejected_connections to catch the cascade that follows slow-command client timeouts.