Redis max number of clients reached: maxclients and rejected_connections

An ERR max number of clients reached error in application logs, or a rising rejected_connections metric, means Redis is refusing new TCP connections while existing connections keep working normally: PING on an established connection still returns PONG. The failure is a hard cliff: once the total connection count hits maxclients, every new connection is rejected immediately. Because rejected_connections is cumulative, a flat line is healthy and any upward slope is an active incident.

maxclients defaults to 10000, but the effective ceiling is the smaller of the configured value and the OS file-descriptor limit minus Redis’s internal reserve. Redis reserves 32 file descriptors for internal use. At startup it checks ulimit -n. If maxclients + 32 exceeds ulimit -n, Redis lowers maxclients to ulimit -n - 32 and reports the change only as a warning in the startup log. On a host with the common default LimitNOFILE=1024, the effective limit is 992 clients even when redis.conf says 10000. That startup warning is easy to miss.
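The arithmetic above can be sketched as a small shell helper. The function name effective_maxclients is hypothetical (it is not a Redis command), and the sketch assumes the 32-descriptor reserve described above; it estimates the ceiling and does not query Redis.

```shell
# Sketch: estimate the effective client limit from the two inputs described above.
# effective_maxclients is a hypothetical helper, not part of Redis.
effective_maxclients() {
    configured=$1   # maxclients from redis.conf or CONFIG GET maxclients
    nofile=$2       # file-descriptor limit of the redis-server process
    reserve=32      # descriptors Redis keeps for internal use
    ceiling=$((nofile - reserve))
    # The effective limit is the smaller of the two values.
    if [ "$configured" -le "$ceiling" ]; then
        echo "$configured"
    else
        echo "$ceiling"
    fi
}
```

For example, effective_maxclients 10000 1024 prints 992, matching the LimitNOFILE=1024 case above.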

Do not restart Redis to fix this. Identify who is holding connections open, reclaim slots safely with CLIENT KILL, raise the limit without a restart when possible, and remove the root cause.

What this means

maxclients is the maximum number of simultaneous connections Redis accepts, including client, replica, and cluster bus connections. When the total reaches the limit, Redis returns -ERR max number of clients reached, closes the new socket, and increments rejected_connections in INFO stats.

Alert on the rate of change, not the absolute value: any increase during a monitoring window means a client is being turned away right now. The failure mode has a dangerous feedback loop: applications that retry without backoff open more sockets, keeping the count at the ceiling and rejected_connections climbing.
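One way to turn the cumulative counter into a rate is to take two samples and subtract. The sketch below assumes INFO output is piped in on stdin; info_field and counter_delta are hypothetical helper names, and the ten-second interval in the commented usage is only an example.

```shell
# Print the value of one "key:value" field from INFO output read on stdin.
info_field() {
    # INFO lines end in \r\n, so strip the carriage return first.
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Difference between two samples of a cumulative counter (older first).
counter_delta() {
    echo $(( $2 - $1 ))
}

# Example usage against a live server (commented out so the block has no side effects):
#   before=$(redis-cli INFO stats | info_field rejected_connections)
#   sleep 10
#   after=$(redis-cli INFO stats | info_field rejected_connections)
#   counter_delta "$before" "$after"   # anything above 0 means active rejections
```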

flowchart TD
    A[Connection leak, pool bloat, or traffic spike] --> B[connected_clients + slaves + cluster_connections approaches maxclients]
    B --> C{New connection arrives}
    C -->|total >= maxclients| D[Redis rejects the connection]
    D --> E[rejected_connections increments]
    E --> F[Application retries open more sockets]
    F --> B

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Application connection leak | connected_clients grows while throughput is flat; many high idle connections in CLIENT LIST | CLIENT LIST sorted by idle |
| maxclients or ulimit -n too low | connected_clients flat at ~992 or ~10000; startup log shows maxclients was reduced | CONFIG GET maxclients, ulimit -n, systemd LimitNOFILE |
| Connection pool misconfiguration | Many short-lived connections; high total_connections_received with low active command count | INFO stats -> total_connections_received and CLIENT LIST age/idle |
| Sentinel / Cluster overhead | Replicas or cluster bus consume slots expected to be free | INFO replication -> connected_slaves, INFO clients -> cluster_connections |
| Thundering herd after restart | rejected_connections spikes after an uptime reset | uptime_in_seconds and the rate of connected_clients growth |
| Slow consumer / forgotten MONITOR | Memory pressure from output buffers; visible cmd=monitor client | CLIENT LIST -> omem, cmd=monitor |

Quick checks

Run in order. Read-only unless noted.

# Confirm rejected_connections is increasing
redis-cli INFO stats | grep rejected_connections
# See the configured maxclients value
redis-cli CONFIG GET maxclients
# Count all connection types that consume maxclients
redis-cli INFO clients | grep -E "connected_clients|cluster_connections"
redis-cli INFO replication | grep connected_slaves
# Check the OS file-descriptor ceiling for the Redis process
grep "Max open files" /proc/$(pgrep -x redis-server | head -n 1)/limits
# List clients by idle time to spot leaks
redis-cli CLIENT LIST | awk -F'[= ]' '{for(i=1;i<=NF;i++) if($i=="idle") print $(i+1), $0}' | sort -rn | head -20
# Measure connection churn; compare two samples to compute a rate
redis-cli INFO stats | grep total_connections_received
# Spot a forgotten MONITOR client or a client with a large output buffer
redis-cli CLIENT LIST | grep -E "cmd=monitor|omem=[1-9][0-9]{7,}"

How to diagnose it

  1. Confirm an active incident. Sample rejected_connections twice, ten seconds apart. A static value means past rejections; an increase means clients are being refused right now.
  2. Compute real capacity usage. Sum connected_clients, connected_slaves, and cluster_connections, then divide by maxclients. Above 80% is tight; at 100% rejections begin.
  3. Check whether the limit was lowered at startup. Search the Redis log for a message about maxclients being reduced because of the file-descriptor limit. If found, the effective limit is ulimit -n - 32, not the value in redis.conf.
  4. Profile the client list. Run CLIENT LIST and look for:
    • many connections with high idle (connection leak)
    • low age and high churn (connection pool misconfiguration)
    • high omem (slow consumer or large response backlog)
    • many connections from a single IP (runaway service or load balancer)
    • cmd=monitor (forgotten debugging session)
  5. Correlate with application metrics. Check application logs for pool exhaustion, connection timeouts, or retry storms. A sudden rise in total_connections_received with flat instantaneous_ops_per_sec indicates churn, not load.
  6. Check for a thundering herd. If uptime_in_seconds just reset and connected_clients spiked immediately, the issue is mass reconnection after a restart or failover, not a slow leak.
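The capacity calculation in step 2 can be sketched as follows. capacity_percent is a hypothetical helper that reads INFO output on stdin and takes maxclients as its argument; it assumes the three fields named in step 2 are the ones that count against the limit, and it truncates to a whole-number percentage.

```shell
# Sum the connection types that share maxclients and print usage as a percentage.
# capacity_percent is a hypothetical helper; pass the effective maxclients as $1.
capacity_percent() {
    tr -d '\r' | awk -F: -v max="$1" '
        # Fields absent from the input (for example outside cluster mode) add nothing.
        $1 == "connected_clients" || $1 == "connected_slaves" || $1 == "cluster_connections" { used += $2 }
        END {
            if (max <= 0) { print "maxclients must be positive" > "/dev/stderr"; exit 1 }
            printf "%d\n", (used * 100) / max
        }
    '
}

# Example usage against a live server (commented out):
#   { redis-cli INFO clients; redis-cli INFO replication; } | capacity_percent 10000
```

A result above 80 corresponds to the "tight" threshold in step 2.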

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| rejected_connections rate | Cumulative counter of hard refusals | Any increase in a monitoring window |
| Connection capacity ratio | Shared by clients, replicas, and cluster bus | (connected_clients + connected_slaves + cluster_connections) / maxclients > 0.8 |
| connected_clients | Direct count of client sockets | Sustained growth uncorrelated with traffic |
| total_connections_received rate | Connection churn indicator | Rate more than 2x baseline while throughput is flat |
| client_recent_max_output_buffer / omem | Large buffers consume memory and can hide connection pressure | Single client omem > 64 MB or rising trend |
| blocked_clients | Blocking consumers hold sockets idle | Sustained growth above baseline |
| uptime_in_seconds | Resets indicate restarts that trigger reconnection storms | Unexpected reset plus connection spike |

Fixes

Reclaim connections immediately

If connected_clients is at maxclients, kill stale or harmful clients before raising the limit. CLIENT KILL is disruptive: the targeted client receives a connection error.

# Kill a specific client by address
redis-cli CLIENT KILL 192.0.2.44:57312

Pick targets from CLIENT LIST. Prefer high-idle connections from known non-critical sources. Avoid killing replica or Sentinel connections unless you accept the failover risk.
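As a rough aid for picking targets, the sketch below reads CLIENT LIST output on stdin and prints a CLIENT KILL ID command for each client idle at least a given number of seconds. idle_kill_candidates is a hypothetical helper and the 600-second threshold in the usage line is an assumption. It skips connections flagged S (replica) or M (master), but it cannot recognize Sentinel connections, so review the output before running any of it.

```shell
# Print (do not execute) CLIENT KILL commands for long-idle clients.
# idle_kill_candidates is a hypothetical helper; $1 is the minimum idle time in seconds.
idle_kill_candidates() {
    awk -v min_idle="$1" '{
        id = ""; idle = -1; flags = ""
        # Each CLIENT LIST line is a series of space-separated key=value pairs.
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "id") id = kv[2]
            else if (kv[1] == "idle") idle = kv[2] + 0
            else if (kv[1] == "flags") flags = kv[2]
        }
        # Skip replication links (flags S or M); killing them risks a resync or failover.
        if (id != "" && idle >= min_idle && flags !~ /[SM]/)
            print "CLIENT KILL ID " id
    }'
}

# Example usage (commented out): review the list, then run the lines you accept.
#   redis-cli CLIENT LIST | idle_kill_candidates 600
```

Printing the commands instead of running them keeps the decision with the operator. Pub/Sub subscribers and blocked consumers can be legitimately idle, so a high idle value alone is not proof of a leak.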

Raise maxclients at runtime

If the OS file-descriptor limit allows, raise the limit without restarting Redis:

redis-cli CONFIG SET maxclients 15000

This fails if the OS limit is the blocker. Check ulimit -n or systemd LimitNOFILE first. Every connection costs memory; 10,000 idle connections can consume roughly 100 MB before data buffers.

Persist the change

Edit redis.conf:

maxclients 15000

If Redis runs under systemd, raise the OS limit so the next start does not silently cap it again. Adjust the unit name for your installation:

sudo systemctl edit redis

Add:

[Service]
LimitNOFILE=65535

Then reload and restart:

sudo systemctl daemon-reload
sudo systemctl restart redis  # disruptive: drops all connections

The safe formula is LimitNOFILE >= maxclients + 32.

Fix a connection leak

The most common root cause is an application that opens connections without closing them. CLIENT LIST will show many connections whose idle is high and close to their age, meaning they were opened and then never used again. Fix the application; do not raise maxclients to absorb the leak.

Set an idle timeout so abandoned connections are reclaimed automatically:

timeout 300

The default timeout 0 leaves idle connections open forever.

Fix connection pool churn

If CLIENT LIST shows many connections with age=0 or age=1 and total_connections_received is climbing rapidly, the client is opening and closing sockets instead of reusing them. Tune the pool for longer-lived connections and reduce the pool size per instance.
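A quick way to quantify churn is to count how many connections in CLIENT LIST are very young. The sketch below is illustrative: young_connections is a hypothetical helper, and the age cutoff is whatever you pass as its argument.

```shell
# Read CLIENT LIST on stdin; print "<young> <total>", where young means age <= $1 seconds.
# young_connections is a hypothetical helper name.
young_connections() {
    awk -v max_age="$1" '{
        total++
        for (i = 1; i <= NF; i++)
            if ($i ~ /^age=/) {
                split($i, kv, "=")
                if (kv[2] + 0 <= max_age) young++
            }
    }
    END { printf "%d %d\n", young, total }'
}

# Example usage (commented out):
#   redis-cli CLIENT LIST | young_connections 1
```

A single snapshot only shows the connections alive at that instant, so compare several runs alongside the total_connections_received rate before drawing conclusions.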

Reduce overhead from replicas and cluster

In Primary/Replica or Cluster deployments, reserve headroom for non-client connections:

maxclients >= expected_app_clients + connected_slaves + cluster_connections + monitoring_headroom
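A worked instance of the sizing formula, combined with the LimitNOFILE >= maxclients + 32 rule, might look like the sketch below. required_limits is a hypothetical helper and the numbers in the usage example are made up for illustration.

```shell
# Print "<maxclients> <LimitNOFILE>" for the given connection budget.
# Arguments: expected_app_clients connected_slaves cluster_connections monitoring_headroom
# required_limits is a hypothetical helper, not a Redis or systemd tool.
required_limits() {
    maxclients=$(( $1 + $2 + $3 + $4 ))
    nofile=$(( maxclients + 32 ))   # 32 descriptors reserved by Redis
    echo "$maxclients $nofile"
}
```

For example, required_limits 4000 2 16 100 prints 4118 4150: at least 4118 for maxclients and at least 4150 for LimitNOFILE. These are minimums; the 80% guidance elsewhere in this guide argues for setting both higher.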

If replica connections push you near the limit, spread read traffic across replicas so fewer clients hit the primary.

Cap dangerous output buffers

A forgotten MONITOR session or a slow Pub/Sub subscriber can consume enough memory to pressure the server even when connected_clients is below maxclients.

Set hard limits for Pub/Sub output buffers:

client-output-buffer-limit pubsub 32mb 8mb 60

Kill any cmd=monitor client immediately. Forbid MONITOR in production via ACLs or rename-command. The default client-output-buffer-limit normal 0 0 0 is unlimited, which is normally required for large key reads but dangerous when combined with MONITOR.

Prevention

  • Set maxclients explicitly in redis.conf.
  • Set timeout to a non-zero value.
  • Keep peak connection capacity below 80% of maxclients. Reserve the rest for Sentinel, Cluster, replicas, monitoring, and incident response.
  • Keep LimitNOFILE at least maxclients + 32. Verify the effective limit in the startup log after every deployment.
  • Monitor rejected_connections as a rate. Any increment is a ticket-level event.
  • Audit CLIENT LIST periodically for high idle, high omem, and cmd=monitor.
  • Review application connection pool sizing. Prefer a small number of long-lived connections.
  • In containerized environments, check both the container and host file-descriptor limits; the lower one wins.

How Netdata helps

  • Tracks redis.connections and redis.rejected_connections as time-series, so you can alert on the rate of increase instead of the cumulative value.
  • Correlates redis.connections with redis.net and redis.operations to distinguish connection leaks from legitimate traffic spikes.
  • Surfaces redis.memory alongside connection metrics to catch output-buffer bloat from slow consumers or forgotten MONITOR sessions.
  • Provides per-second resolution during incidents, making it easier to see whether CLIENT KILL or a CONFIG SET maxclients change had immediate effect.
  • Supports composite alerts that fire only when connected_clients is high and rejected_connections is increasing, reducing noise from transient reconnections.
