Redis client output buffer overflow: slow consumers and client-output-buffer-limit

Redis memory climbs faster than the dataset justifies. used_memory approaches maxmemory, keys are evicted, or the process is OOM-killed, yet the keyspace has not grown. Logs show “scheduled to be closed ASAP for overcoming of output buffer limits,” or clients vanish and reconnect. The culprit is usually a slow consumer that cannot drain its output buffer as fast as Redis fills it; a forgotten MONITOR session and an application that holds a socket open but has stopped reading are the textbook cases. Client output buffers are allocated from the main Redis heap, so unread replies for normal and Pub/Sub clients count against maxmemory. The default client-output-buffer-limit normal 0 0 0 leaves normal clients unbounded, turning one slow reader into a memory leak that can kill the instance.

What this means

Redis maintains an output buffer per client. When the server writes faster than the client reads, the buffer grows. Three independent limit classes control this: normal, pubsub, and replica, each configured as client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>. Reaching the hard limit closes the client as soon as possible. Staying at or above the soft limit for longer than the soft-limit duration also closes it.
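
The rule for a single class can be sketched as a small shell function. This is a simplified model written for illustration, not Redis source: obuf_would_close is a hypothetical name, the caller supplies how long the buffer has already stayed over the soft limit (Redis tracks that per client internally), and the model assumes the soft limit trips once that streak lasts longer than the configured seconds.

```shell
# Simplified model of one client-output-buffer-limit class (illustrative, not Redis source).
# Arguments, all integers:
#   $1 buffer bytes (omem)          $2 hard limit bytes (0 = off)
#   $3 soft limit bytes (0 = off)   $4 soft-limit seconds
#   $5 seconds the buffer has already stayed at or above the soft limit (assumed known)
# Prints "close" or "keep".
obuf_would_close() {
    used=$1 hard=$2 soft=$3 soft_secs=$4 over_secs=$5
    # Hard limit: reaching it is enough on its own.
    if [ "$hard" -gt 0 ] && [ "$used" -ge "$hard" ]; then
        echo close
        return 0
    fi
    # Soft limit: the client must stay over it for longer than the configured seconds.
    if [ "$soft" -gt 0 ] && [ "$used" -ge "$soft" ] && [ "$over_secs" -gt "$soft_secs" ]; then
        echo close
        return 0
    fi
    echo keep
}

# Replica-class values (256mb hard, 64mb soft, 60 s):
obuf_would_close 300000000 268435456 67108864 60 0    # close: hard limit reached
obuf_would_close 100000000 268435456 67108864 60 30   # keep: over the soft limit for only 30 s
obuf_would_close 100000000 268435456 67108864 60 61   # close: over the soft limit for 61 s
# Normal class at the 0 0 0 default:
obuf_would_close 5000000000 0 0 0 3600                # keep: no limit ever trips
```

The last call is the failure mode this article is about: with both limits at 0, no buffer size ever produces close.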

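The normal class defaults to 0 0 0: no hard limit, no soft limit, no timeout. The pubsub class defaults to 32mb 8mb 60 and the replica class to 256mb 64mb 60, so only normal clients can grow without bound. With no limit in place, a normal client can accumulate gigabytes of buffered replies and Redis never disconnects it. That memory comes from the Redis heap, so it drives eviction pressure or the OOM killer. MONITOR is especially dangerous: Redis treats it as a normal client and echoes every command to it. A slow replica fails differently. Its default limit eventually disconnects it, and a replica that reconnects, resynchronizes, and falls behind again can loop through repeated full syncs. Replica output buffers are excluded from the maxmemory eviction calculation, but they still consume process memory on the primary.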
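
A quick way to spot an unbounded class is to parse the CONFIG GET reply. The sketch below assumes redis-cli output is piped, so it prints the raw key line and value line rather than the numbered interactive format, and that each class appears as a group of four fields; obuf_limits_report is a hypothetical helper. Depending on the server version, the replica class may be reported as slave, so the class name is printed exactly as returned.

```shell
# Hypothetical helper: report each client-output-buffer-limit class from
# piped `redis-cli CONFIG GET client-output-buffer-limit` output.
# Assumes the value is a flat list of "<class> <hard> <soft> <seconds>" groups.
obuf_limits_report() {
    tr -d '\r' | awk '
        $1 == "client-output-buffer-limit" { next }   # skip the key line
        {
            for (i = 1; i + 3 <= NF; i += 4) {
                # Hard and soft both 0 means nothing ever disconnects this class.
                flag = ($(i + 1) == 0 && $(i + 2) == 0) ? " UNBOUNDED" : ""
                printf "%s hard=%s soft=%s seconds=%s%s\n", $i, $(i + 1), $(i + 2), $(i + 3), flag
            }
        }'
}

# Usage: redis-cli CONFIG GET client-output-buffer-limit | obuf_limits_report
```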

flowchart TD
    A[Memory spike or client disconnect] --> B{Check CLIENT LIST omem}
    B -->|Normal client| C[Check for MONITOR or slow app]
    B -->|Pub/Sub| D[Check subscriber read loop]
    B -->|Replica| E[Check replica lag and backlog]
    C --> F[Set normal buffer limits]
    D --> G[Fix consumer or shard channels]
    E --> H[Fix replica or adjust replica limit]

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| MONITOR left running | One normal client whose omem rises and falls with overall command throughput | redis-cli CLIENT LIST \| grep cmd=monitor |
| Slow Pub/Sub subscriber | One subscriber in CLIENT LIST TYPE pubsub with a large omem while channel counts stay stable | Sort CLIENT LIST TYPE pubsub by omem |
| Replica falling behind | Primary memory rises; the replica reports intermittent master_link_status:down, and sync_full increments on the primary | CLIENT LIST TYPE replica on the primary, sorted by omem |
| Slow application consumer | One or more normal clients with a large omem, often after large reads or from a client that has stopped reading its socket | CLIENT LIST TYPE normal, sorted by omem |
| Unbounded normal limit | No client is ever disconnected for buffer growth; memory climbs until eviction or OOM | CONFIG GET client-output-buffer-limit includes normal 0 0 0 |
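
To map the largest buffers onto the causes in the table, the flags field of CLIENT LIST is enough: O marks a MONITOR session, S a replica, and P a Pub/Sub subscriber. The helper obuf_suspects below is a hypothetical sketch; its cause column is a first guess taken from the table, not a diagnosis.

```shell
# Hypothetical helper: read `redis-cli CLIENT LIST` on stdin and print
# "<omem> <addr> <name> <cmd> <likely cause>", largest omem first.
# Cause labels follow the causes table, keyed on CLIENT LIST flags:
# O = MONITOR, S = replica, P = Pub/Sub; anything else is treated as an app client.
obuf_suspects() {
    tr -d '\r' | awk '
    NF == 0 { next }                      # ignore blank lines
    {
        omem = "0"; addr = "-"; name = "-"; cmd = "-"; flags = ""
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq == 0) continue
            key = substr($i, 1, eq - 1)
            val = substr($i, eq + 1)
            if (key == "omem") omem = val
            else if (key == "addr") addr = val
            else if (key == "name" && val != "") name = val
            else if (key == "cmd") cmd = val
            else if (key == "flags") flags = val
        }
        if (flags ~ /O/) cause = "MONITOR session"
        else if (flags ~ /S/) cause = "replica falling behind"
        else if (flags ~ /P/) cause = "slow Pub/Sub subscriber"
        else cause = "slow application consumer"
        print omem, addr, name, cmd, cause
    }' | sort -nr
}

# Usage: redis-cli CLIENT LIST | obuf_suspects | head -10
```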

Quick checks

# Check the client count and the largest recent output buffer
redis-cli INFO clients | grep -E "connected_clients|client_recent_max_output_buffer"

# Find the largest normal client output buffers, keeping each client's full line (addr, name, cmd)
redis-cli CLIENT LIST TYPE normal | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^omem=/) print substr($i, 6), $0}' | sort -nr | head -10

# Detect an active MONITOR session
redis-cli CLIENT LIST | grep "cmd=monitor"

# Inspect Pub/Sub subscriber buffers, largest first
redis-cli CLIENT LIST TYPE pubsub | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^omem=/) print substr($i, 6), $0}' | sort -nr | head -10

# Inspect replica buffers on the primary, largest first
redis-cli CLIENT LIST TYPE replica | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^omem=/) print substr($i, 6), $0}' | sort -nr | head -10

# View current output buffer limit policy
redis-cli CONFIG GET client-output-buffer-limit

# Check if memory pressure is already causing evictions
redis-cli INFO stats | grep evicted_keys

# Compare total memory, dataset size, overhead, and client memory to confirm non-data bloat
redis-cli INFO memory | grep -E "^used_memory:|used_memory_dataset|used_memory_overhead|mem_clients_"
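
The INFO memory comparison can be turned into percentages. This sketch assumes the server reports mem_clients_normal and mem_clients_slaves in INFO memory, as current versions do; mem_breakdown is a hypothetical helper name, and the trailing \r that INFO emits on each line is stripped first.

```shell
# Hypothetical helper: read `redis-cli INFO memory` on stdin and print the share of
# used_memory taken by the dataset, by all overhead, and by client memory.
# Client memory covers query and output buffers, not output buffers alone.
mem_breakdown() {
    tr -d '\r' | awk '
        {
            c = index($0, ":")
            if (c > 0) v[substr($0, 1, c - 1)] = substr($0, c + 1)
        }
        END {
            used = v["used_memory"] + 0
            if (used <= 0) { print "used_memory not found"; exit 1 }
            clients = v["mem_clients_normal"] + v["mem_clients_slaves"]
            printf "dataset=%.1f%% overhead=%.1f%% clients=%.1f%%\n",
                100 * v["used_memory_dataset"] / used,
                100 * v["used_memory_overhead"] / used,
                100 * clients / used
        }'
}

# Usage: redis-cli INFO memory | mem_breakdown
```

A client share that keeps rising while the dataset share falls is the pattern the diagnosis steps look for.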

How to diagnose it

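  1. Confirm buffers are the source. Compare used_memory to used_memory_dataset; the gap is non-data overhead, and a widening gap points to client buffers, the replication backlog, or other internal structures. Fragmentation shows up elsewhere, as used_memory_rss exceeding used_memory. If used_memory_overhead grows while key count is flat, client buffers or replication backlogs are the likely cause. Check client_recent_max_output_buffer.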
  2. Classify the client type. Run CLIENT LIST TYPE normal, TYPE pubsub, and TYPE replica. Look for the largest omem in each class.
  3. Identify the offender. Note addr, name, and cmd. One client with omem in the hundreds of megabytes is usually the target. If name is empty, adopt CLIENT SETNAME in your applications so future incidents map faster to services.
  4. Determine if limits are unbounded. Run CONFIG GET client-output-buffer-limit. If the normal class is 0 0 0, there is no safety rail.
  5. Correlate with workload. If cmd=monitor, the session echoes every command. If the client is a replica, compare master_repl_offset on the primary to slave_repl_offset on the replica; a widening gap that coincides with omem growth confirms the replica is the bottleneck. If it is a subscriber, check whether the application read loop is stalled.
  6. Assess collateral damage. If used_memory is near maxmemory, check evicted_keys and total_error_replies. Buffer bloat can silently evict data or, under the noeviction policy, cause writes to be rejected with OOM errors before the slow client is disconnected.
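
Step 5 compares offsets across two hosts. A variant that needs only the primary reads the offset each replica last acknowledged from the slaveN lines of INFO replication. replica_gaps is a hypothetical helper; because replicas acknowledge roughly once per second, a small nonzero gap is normal, and what matters is a gap that keeps widening.

```shell
# Hypothetical helper: read `redis-cli INFO replication` from the PRIMARY on stdin
# and print "<ip:port> <bytes behind>" for each replica, comparing the offset each
# replica last acknowledged (slaveN lines) with master_repl_offset.
replica_gaps() {
    tr -d '\r' | awk '
        {
            c = index($0, ":")
            if (c == 0) next
            key = substr($0, 1, c - 1)
            val = substr($0, c + 1)
            if (key == "master_repl_offset") master = val + 0
            else if (key ~ /^slave[0-9]+$/) {
                n = split(val, parts, ",")
                ip = ""; port = ""; off = 0
                for (i = 1; i <= n; i++) {
                    eq = index(parts[i], "=")
                    k = substr(parts[i], 1, eq - 1)
                    if (k == "ip") ip = substr(parts[i], eq + 1)
                    else if (k == "port") port = substr(parts[i], eq + 1)
                    else if (k == "offset") off = substr(parts[i], eq + 1) + 0
                }
                acked[ip ":" port] = off
            }
        }
        END { for (r in acked) printf "%s %.0f\n", r, master - acked[r] }' | sort
}

# Usage: redis-cli -h <primary> INFO replication | replica_gaps
```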

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| CLIENT LIST omem | Direct per-client output buffer size | A single client's omem approaches the configured hard limit, or grows without bound while the normal limit is 0 0 0 |
| client_recent_max_output_buffer | Recent peak output buffer across all clients | Sustained growth over minutes |
| used_memory vs maxmemory | Buffers compete with the dataset for memory | Ratio approaching 0.9 while client counts are stable |
| used_memory_overhead | Non-data memory, including client buffers and the replication backlog | Growth while keyspace size is flat |
| evicted_keys rate | Buffer bloat forces premature eviction | Spike correlated with traffic but not with keyspace growth |
| master_link_status | Replicas disconnected by the limit appear as replication failures | Intermittent down on the replica, with sync_full rising on the primary |
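
The sustained-growth warning for client_recent_max_output_buffer can be checked by polling and counting consecutive increases. growth_alert is a hypothetical helper; the polling interval and the streak length are assumptions to tune against your own traffic, not recommended values.

```shell
# Hypothetical helper: read one integer sample per line (for example
# client_recent_max_output_buffer polled at a fixed interval) and print an
# alert for every sample that extends a run of at least N consecutive increases.
growth_alert() {
    awk -v n="${1:-5}" '
        { v = $1 + 0 }
        NR > 1 && v > prev  { streak++ }
        NR > 1 && v <= prev { streak = 0 }
        { prev = v }
        streak >= n + 0 { print "ALERT sample " NR ": " n "+ consecutive increases, now " $1 }'
}

# Usage sketch (interval and streak length are assumptions, not recommendations):
# while sleep 10; do
#     redis-cli INFO clients | tr -d '\r' |
#         awk -F: '$1 == "client_recent_max_output_buffer" { print $2 }'
# done | growth_alert 30
```

Counting a streak of increases, rather than alerting on a fixed threshold, matches the table's warning sign: a buffer that keeps climbing matters more than one that spikes once and drains.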
