Redis MONITOR left running: the output-buffer OOM footgun

A forgotten MONITOR session typically produces this pattern:

  • used_memory and used_memory_rss climb steadily.
  • If maxmemory is set and an eviction policy is active, keys disappear; otherwise the kernel OOM killer may terminate the process.
  • Application latency spikes.
  • Network output runs at roughly double network input.
  • No large keys, no persistence fork, and no replication backlog overflow.

MONITOR streams a serialized copy of every executed command into the requesting client’s output buffer. That buffer is allocated on the main heap and counts toward used_memory, so it adds to both maxmemory pressure and RSS growth. Redis classifies MONITOR clients as normal clients, and the default client-output-buffer-limit normal 0 0 0 places no bound on normal client output buffers. Under production load, a buffer that is not being drained can grow by multiple gigabytes per minute. The MONITOR client also receives roughly one extra copy of the command traffic, so output bandwidth can approximately double.
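The growth rate described above can be estimated with simple arithmetic. The sketch below is illustrative only: the function name monitor_growth_mib_per_min is hypothetical, and both inputs (commands per second and average bytes per echoed MONITOR line) are assumptions you would supply from your own workload, not values Redis reports. It models the worst case, in which the MONITOR client reads nothing.

```shell
# Worst-case output-buffer growth, assuming the MONITOR client reads nothing.
# Both arguments are illustrative inputs, not values reported by Redis.
#   $1 = commands per second
#   $2 = assumed average bytes per echoed MONITOR line
monitor_growth_mib_per_min() {
    awk -v ops="$1" -v bytes="$2" \
        'BEGIN { printf "%.1f\n", ops * bytes * 60 / 1048576 }'
}

# 100,000 commands/s at an assumed 200 bytes per line
monitor_growth_mib_per_min 100000 200   # prints 1144.4
```

At these assumed figures the buffer gains a little over 1 GiB per minute; higher command rates or larger payloads scale the result linearly.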

Unlike Pub/Sub and replica clients, which have non-zero default output buffer limits, normal clients are trusted to drain promptly. MONITOR violates that assumption. There is no built-in alert for an active MONITOR session; you detect it by inspecting client state.
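Inspecting client state can be partly automated. CLIENT LIST marks clients in MONITOR mode with the O flag, so a small filter can pick them out. This is a minimal sketch: the function name monitor_clients is hypothetical, and it reads CLIENT LIST output from stdin so that it can be tried on saved output without touching a live server.

```shell
# Print id, addr, idle, and omem for clients whose flags include "O"
# (MONITOR mode). Reads CLIENT LIST output on stdin.
monitor_clients() {
    awk '{
        id = addr = idle = omem = flags = ""
        for (i = 1; i <= NF; i++) {
            p = index($i, "=")              # split each field at the first "="
            if (p == 0) continue
            key = substr($i, 1, p - 1)
            val = substr($i, p + 1)
            if (key == "id") id = val
            else if (key == "addr") addr = val
            else if (key == "idle") idle = val
            else if (key == "omem") omem = val
            else if (key == "flags") flags = val
        }
        if (index(flags, "O") > 0)
            print "id=" id, "addr=" addr, "idle=" idle, "omem=" omem
    }'
}
```

A typical invocation would be redis-cli CLIENT LIST | monitor_clients. Empty output means no client is currently in MONITOR mode.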

flowchart LR
    A[Client runs MONITOR] --> B[Every command copied to output buffer]
    B --> C[Normal class limit is 0 0 0]
    C --> D[Buffer grows unbounded under load]
    D --> E[Memory exhausted or OOM]
    E --> F[Eviction or process killed]

Common causes

| Cause | What it looks like | First check |
| --- | --- | --- |
| Forgotten interactive MONITOR | A developer ran redis-cli monitor in a terminal, screen, or tmux and detached. Memory climbs while idle time in CLIENT LIST grows. | CLIENT LIST for a single client with disproportionate omem |
| Monitoring script or tool using MONITOR | A custom script streams MONITOR output continuously. cmdstat_monitor shows non-zero calls. | INFO commandstats for cmdstat_monitor |
| Default normal client buffer limit left at 0 0 0 | Any slow normal consumer can grow its output buffer without bound. | CONFIG GET client-output-buffer-limit |

Quick checks

Run these safe, read-only commands:

# Has MONITOR ever been invoked since startup or last CONFIG RESETSTAT?
redis-cli INFO commandstats | grep cmdstat_monitor

# List clients; look for one with huge omem and high idle time
redis-cli CLIENT LIST

# Logical vs OS-reported memory
redis-cli INFO memory | grep -E 'used_memory:|used_memory_rss:'

# Recent peak output buffer size
redis-cli INFO clients | grep client_recent_max_output_buffer

# Asymmetric network traffic
redis-cli INFO stats | grep -E 'instantaneous_input_kbps:|instantaneous_output_kbps:'

# Current output buffer limits
redis-cli CONFIG GET client-output-buffer-limit

# Keys evicted, clients evicted, and clients dropped for breaching buffer limits
# Note: evicted_clients (Redis 7.0+) counts maxmemory-clients evictions;
# client_output_buffer_limit_disconnections is reported in Redis 7.4+
redis-cli INFO stats | grep -E 'evicted_keys:|evicted_clients:|client_output_buffer_limit_disconnections:'

# Host-level OOM kill check (often unavailable inside containers)
dmesg | grep -i 'killed process.*redis'

How to diagnose it

  1. Confirm buffer-driven growth. Compare used_memory with used_memory_rss. If used_memory is rising rapidly while key count in INFO keyspace is flat, the growth is overhead, buffers, or fragmentation. A MONITOR client inflates used_memory through its heap-allocated output buffer.
  2. Find the oversized client. In CLIENT LIST, look for one client whose omem is orders of magnitude larger than the rest. A healthy application client usually has a small output buffer unless it is pipelining large responses. An idle client with a large, growing omem is suspicious.
  3. Correlate with command statistics. INFO commandstats should show cmdstat_monitor. The counter increments once per MONITOR invocation, not per echoed command, so even a single call is enough to explain the behavior.
  4. Check network asymmetry. Compare instantaneous_output_kbps with instantaneous_input_kbps. If output is roughly double input and the instance is not replicating to multiple replicas, a MONITOR client is likely consuming a full copy of the traffic.
  5. Look for eviction or OOM. If used_memory is near maxmemory, check evicted_keys. If there is no maxmemory limit or the policy is noeviction, check system logs for OOM killer activity. CLIENT LIST will still reveal the buffer hog if the process is alive.
  6. Verify buffer limits. Run CONFIG GET client-output-buffer-limit. If the normal class is 0 0 0, the instance has no protection against this failure mode.
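Step 2 above, finding the oversized client, is tedious by eye on a busy instance. The following sketch ranks clients by omem. The function name top_omem is hypothetical, and it reads CLIENT LIST output from stdin; the row count argument defaults to 5.

```shell
# Rank clients by output-buffer memory, largest first.
# Reads CLIENT LIST output on stdin; $1 = number of rows to show (default 5).
top_omem() {
    awk '{
        id = idle = flags = "?"
        omem = 0
        for (i = 1; i <= NF; i++) {
            p = index($i, "=")
            if (p == 0) continue
            key = substr($i, 1, p - 1)
            val = substr($i, p + 1)
            if (key == "id") id = val
            else if (key == "idle") idle = val
            else if (key == "flags") flags = val
            else if (key == "omem") omem = val
        }
        # omem goes first so sort can order on it numerically
        print omem, "id=" id, "idle=" idle, "flags=" flags
    }' | sort -k1,1nr | head -n "${1:-5}"
}
```

For example, redis-cli CLIENT LIST | top_omem 3 shows the three largest buffers. A first row that is orders of magnitude above the second, with a high idle value, matches the pattern described in step 2.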

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| CLIENT LIST omem | Per-client output buffer memory; a runaway MONITOR accumulates here | Single client omem exceeds 100 MB or grows steadily |
| INFO commandstats cmdstat_monitor | Confirms MONITOR was executed | Non-zero call count outside a known debugging window |
| client_recent_max_output_buffer | Tracks the largest recent output buffer | Sustained increase indicates a slow consumer |
| used_memory vs maxmemory | Output buffers count in used_memory and can pressure the memory limit | used_memory climbing toward maxmemory with flat key count |
| used_memory_rss | What the OS and OOM killer see | RSS approaching host or container memory limit |
| instantaneous_output_kbps | MONITOR doubles output bandwidth | Output rate far exceeds input rate without replication load |
| INFO stats evicted_clients | Clients evicted under maxmemory-clients (Redis 7.0+); output-buffer-limit disconnects are counted separately in client_output_buffer_limit_disconnections (Redis 7.4+) | Non-zero count indicates a prior runaway consumer |
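The output-versus-input signal in the table can be reduced to a single number. This is a rough sketch: the function name net_out_in_ratio is hypothetical, and it reads INFO stats output from stdin. What counts as abnormal depends on the workload (read-heavy instances send far more than they receive even when healthy), so compare the ratio against the instance's own baseline rather than a fixed threshold.

```shell
# Ratio of instantaneous output to input bandwidth from INFO stats (stdin).
# Prints "n/a" when input is zero or either field is missing.
net_out_in_ratio() {
    awk -F: '{
        sub(/\r$/, "")                      # INFO lines end in CRLF
        if ($1 == "instantaneous_input_kbps")  in_kbps  = $2
        if ($1 == "instantaneous_output_kbps") out_kbps = $2
    }
    END {
        if (in_kbps == "" || out_kbps == "" || in_kbps + 0 == 0)
            print "n/a"
        else
            printf "%.2f\n", out_kbps / in_kbps
    }'
}
```

A typical invocation would be redis-cli INFO stats | net_out_in_ratio, sampled a few times because the instantaneous counters fluctuate.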

Fixes

Kill the MONITOR client

Identify the client id from CLIENT LIST.

# WARNING: terminates the client connection
redis-cli CLIENT KILL ID <client-id>

If the instance is so overloaded that redis-cli times out, close the TCP session from the client host or network path. Once the connection is gone, the output buffer memory is freed to the allocator. Due to jemalloc behavior, RSS may not return to the OS immediately; monitor used_memory_rss and run MEMORY PURGE if the build supports it.

After killing the client, rerun INFO memory and CLIENT LIST to confirm omem has dropped and used_memory has stabilized. Latency should begin recovering within seconds. If memory does not drop, look for additional slow consumers or a background rewrite that is temporarily retaining RSS.

Set a normal client output buffer limit

Apply a bounded hard limit to the normal client class. This prevents any single normal client, including a forgotten MONITOR, from consuming unlimited memory.

# WARNING: changes the running configuration; affects all normal clients
# Read current limits first, then preserve existing replica/pubsub values
redis-cli CONFIG GET client-output-buffer-limit

# Example: normal class with hard and soft limits of 100 MB (soft window 60 seconds)
redis-cli CONFIG SET client-output-buffer-limit "normal 104857600 104857600 60 replica <existing> pubsub <existing>"

Persist the change in redis.conf. Tune limits to your workload’s peak burst output; legitimate clients that retrieve large values or pipeline heavily can also hit the limit. Pub/Sub and replica clients use separate classes and are unaffected by the normal class setting.
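Filling in the existing replica and pubsub values by hand is error-prone. The sketch below rewrites only the normal class in whatever value CONFIG GET currently returns, leaving the other classes untouched. The function name with_normal_limit is hypothetical, and it reads the current value string from stdin so it can be checked before anything is sent to the server.

```shell
# Rewrite only the "normal" class in an existing client-output-buffer-limit
# value. Reads the current value on stdin.
#   $1 = hard limit (bytes), $2 = soft limit (bytes), $3 = soft window (seconds)
with_normal_limit() {
    awk -v hard="$1" -v soft="$2" -v secs="$3" '{
        for (i = 1; i <= NF - 3; i++)
            if ($i == "normal") {
                $(i + 1) = hard
                $(i + 2) = soft
                $(i + 3) = secs
            }
        print
    }'
}
```

One way to use it, assuming redis-cli prints the parameter name on the first line and the value on the second: capture the output of redis-cli CONFIG GET client-output-buffer-limit | tail -n 1 | with_normal_limit 104857600 104857600 60, review it, then pass it as the single quoted argument to CONFIG SET client-output-buffer-limit.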

Replace MONITOR with safer observability

Do not use MONITOR as a continuous observability feed. Use these instead:

  • INFO commandstats and SLOWLOG GET for expensive commands.
  • LATENCY LATEST for internal latency breakdowns.
  • External metrics collection instead of streaming command replication.

Prevention

  • Treat MONITOR like a debugging breakpoint, not a dashboard source. Never leave it running unattended.
  • Set client-output-buffer-limit normal to a non-zero hard limit in every production instance.
  • Audit INFO commandstats. Any appearance of cmdstat_monitor outside a defined maintenance window should trigger investigation.
  • Restrict MONITOR via ACL rules (Redis 6.0+) or rename-command if your deployment does not require it.
  • Include CLIENT LIST inspection in memory-related incident response runbooks, because Redis has no built-in alert for an active MONITOR session.
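The commandstats audit in the list above lends itself to a scheduled check. This is a minimal sketch: the function name monitor_calls is hypothetical, and it reads INFO commandstats output from stdin. Because the counter is cumulative since startup or the last CONFIG RESETSTAT, an audit job would compare the value against the previous sample rather than alert on any non-zero count.

```shell
# Print the cumulative MONITOR call count from INFO commandstats (stdin).
# Prints 0 when cmdstat_monitor is absent.
monitor_calls() {
    awk -F'[:=,]' '
        $1 == "cmdstat_monitor" { calls = $3 }   # line: cmdstat_monitor:calls=N,...
        END { print calls + 0 }
    '
}
```

For example, redis-cli INFO commandstats | monitor_calls returns the current count, which a cron job could store and diff on its next run.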

How Netdata helps

  • redis.client_output_buffer and related memory charts surface output buffer growth without manual CLIENT LIST parsing.
  • The redis.commandstats dimension for cmdstat_monitor makes MONITOR usage visible as soon as it occurs.
  • Memory charts correlate used_memory and used_memory_rss. Both climbing while the key count stays flat points to buffer pressure; RSS climbing alone points to fragmentation.
  • Network throughput charts highlight asymmetric output spikes that correlate with forgotten MONITOR sessions.
  • Redis response time metrics help spot latency impact of memory pressure before the OOM killer intervenes.

Related guides

  • How Redis actually works in production: a mental model for operators: /guides/redis/how-redis-works-in-production/
  • Redis aof_last_write_status:err: AOF write failures and recovery: /guides/redis/redis-aof-last-write-status-err/
  • Redis appendfsync always latency: durability vs throughput trade-offs: /guides/redis/redis-appendfsync-always-latency/
  • Redis BUSY Redis is busy running a script: blocking Lua and how to recover: /guides/redis/redis-busy-running-script/
  • Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix: /guides/redis/redis-cant-save-in-background-fork/
  • Redis client output buffer overflow: slow consumers and client-output-buffer-limit: /guides/redis/redis-client-output-buffer-limit/
  • Redis connected_clients climbing: connection leak detection: /guides/redis/redis-connected-clients-climbing/
  • Redis connection exhaustion: leaks, pools, and the retry storm: /guides/redis/redis-connection-exhaustion/
  • Redis event loop blocked: when one slow command freezes everything: /guides/redis/redis-event-loop-blocked/
  • Redis eviction policy tuning: allkeys-lru vs volatile-ttl vs noeviction: /guides/redis/redis-eviction-policy-tuning/
  • Redis fork/COW memory storm: why persistence doubles RSS and OOM-kills the box: /guides/redis/redis-fork-cow-storm/
  • Redis KEYS command blocking production: why to replace it with SCAN: /guides/redis/redis-keys-command-blocking-production/