Redis keyspace growing unbounded: keys without TTL and memory leaks
Redis memory climbs. used_memory approaches maxmemory, or the OOM killer intervenes. In INFO keyspace, keys rises while expires stays flat. Or both rise, but used_memory_rss outpaces the dataset. Unbounded keyspace growth is a symptom, not a root cause. The usual root causes are missing TTLs, application bugs, an expiration backlog, or leaks in client buffers and allocators.
What this means
INFO keyspace reports db{N}:keys=X,expires=Y,avg_ttl=Z. The keys field counts every key; expires counts only those with a TTL. When keys grows and expires does not, new keys are being created without expiration. Even when expires tracks keys, memory can still grow if the active expiration cycle cannot delete keys as fast as they are created.
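As a rough illustration of the keys versus expires comparison, the sketch below parses INFO keyspace output and prints the number of TTL-less keys per database. The function name keyspace_gap is hypothetical, and the parsing assumes the db{N}:keys=X,expires=Y,... line format described above.

```shell
# Reads `redis-cli INFO keyspace` output on stdin.
# Prints one line per database: <db> <keys> <expires> <keys_without_ttl>
# Usage (assumes redis-cli can reach the server):
#   redis-cli INFO keyspace | keyspace_gap
keyspace_gap() {
  # redis-cli emits CRLF line endings; strip the CR before parsing.
  tr -d '\r' | awk -F'[:=,]' '/^db[0-9]+:/ {
    keys = 0; expires = 0
    # Fields alternate name, value after the leading db name.
    for (i = 2; i < NF; i += 2) {
      if ($i == "keys") keys = $(i + 1)
      if ($i == "expires") expires = $(i + 1)
    }
    print $1, keys, expires, keys - expires
  }'
}
```

Sampling this periodically and watching the last column grow is one way to confirm that new keys are being written without expiration.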
Redis cleans expired keys lazily on access and actively via a background sampling loop running hz times per second. If neither keeps pace, expired keys accumulate. DBSIZE counts expired-but-not-yet-cleaned keys, so it may report a higher number than the live keyspace. The avg_ttl field is an approximate average TTL, in milliseconds, of keys with expiration. In mixed workloads the TTL distribution is often bimodal, so the average is unreliable for alerting.
flowchart TD
A[Keyspace growing] --> B{keys vs expires diverging?}
B -->|Yes| C[Keys without TTL]
B -->|No| D{expired_keys rate low?}
D -->|Yes| E[Expire cycle behind]
D -->|No| F{RSS above used_memory?}
F -->|Yes| G[Fragmentation or buffer leak]
F -->|No| H[Replica lag or allocator bloat]
C --> I[Audit app code for missing EX/EXPIRE]
E --> J[Tune hz or active-expire-effort]
G --> K[Check CLIENT LIST omem / run MEMORY PURGE]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Application creating keys without TTL | keys grows while expires stays flat or grows much slower | INFO keyspace and application code for SET/HSET without EX/PX |
| Active expiration falling behind | expired_keys rate drops or stalls; expired_time_cap_reached_count increases | INFO stats for expire cycle throttling |
| Expired keys accumulating before cleanup | DBSIZE higher than keys in INFO keyspace | Compare DBSIZE to INFO keyspace |
| Client output buffer leak | used_memory climbs but keys is stable; large omem in CLIENT LIST | CLIENT LIST sorted by output buffer size |
| Memory fragmentation or allocator bloat | used_memory_rss grows faster than used_memory; mem_fragmentation_ratio > 1.5 | INFO memory fragmentation metrics |
| Replica expiration lag | Replica shows significantly more keys than primary | Keyspace counts on primary vs replica |
Quick checks
# Keyspace divergence
redis-cli INFO keyspace
# Expiration throughput and throttling
redis-cli INFO stats | grep -E "expired_keys|expired_time_cap_reached_count"
# Logical vs reported size
redis-cli DBSIZE
# Memory composition
redis-cli INFO memory | grep -E "used_memory:|used_memory_rss:|mem_fragmentation_ratio:"
# Largest client output buffers
redis-cli CLIENT LIST | grep -o 'omem=[0-9]*' | cut -d= -f2 | sort -rn | head -10
How to diagnose it
- Quantify divergence. Note keys and expires in INFO keyspace. If the ratio of expires to keys drops, keys are being created without TTL.
- Check expiration throughput. Compute the rate of change of expired_keys from INFO stats. If it is low relative to key creation, expired keys accumulate.
- Check for throttling. If expired_time_cap_reached_count is increasing, the active expire cycle is hitting its CPU budget.
- Compare counts. A materially higher DBSIZE than INFO keyspace keys indicates expired-but-uncleaned keys.
- Inspect memory. If used_memory and used_memory_dataset_perc track keyspace growth, the dataset itself is growing. If used_memory_rss outpaces used_memory, check mem_fragmentation_ratio and allocator_frag_ratio.
- Check client buffers. Run CLIENT LIST and look for large omem values. These count against memory and can mimic a leak.
- Sample the keyspace. Iterate with SCAN and verify TTL on recent keys. Do not use KEYS *.
- Check replicas. Replicas do not independently delete expired keys until the primary propagates DEL. Since Redis 3.2, replicas return nil for logically expired keys on read, but the keys remain in memory.
- Correlate with deployments. If growth started after a deploy, audit new write paths for missing expiration parameters.
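The expiration throughput check in the steps above can be sketched as two samples of a cumulative counter divided by the sampling interval. The helper names info_field and rate_per_sec are illustrative and not part of redis-cli, and the 60-second interval in the usage comment is an arbitrary assumption.

```shell
# Extract one field (for example expired_keys) from INFO output on stdin.
info_field() {
  tr -d '\r' | awk -F: -v name="$1" '$1 == name { print $2 }'
}

# Per-second rate between two samples of a cumulative counter.
# Arguments: <previous_value> <current_value> <seconds_between_samples>
rate_per_sec() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Usage against a live server (assumes redis-cli can reach it):
#   a=$(redis-cli INFO stats | info_field expired_keys); sleep 60
#   b=$(redis-cli INFO stats | info_field expired_keys)
#   rate_per_sec "$a" "$b" 60
```

The same pair of helpers works for expired_time_cap_reached_count, where any sustained non-zero rate suggests throttling.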
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| keyspace keys minus expires | Direct measure of TTL-less key accumulation | Delta growing steadily |
| expired_keys rate | Expiration system throughput | Rate drops while write volume is steady |
| expired_time_cap_reached_count | Expire cycle CPU budget exhaustion | Any sustained increase |
| used_memory / maxmemory | Memory pressure from keyspace growth | > 80% |
| mem_fragmentation_ratio | Allocator waste amplifying memory pressure | > 1.5 sustained |
| db0:avg_ttl | Average TTL of expiring keys | Do not alert on this; it is bimodal and misleading in mixed workloads |
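For the used_memory / maxmemory signal in the table, a minimal sketch of the calculation follows. The function name memory_pressure is hypothetical; it only assumes the used_memory and maxmemory fields that INFO memory reports, and it treats maxmemory of 0 as "no limit configured".

```shell
# Reads `redis-cli INFO memory` output on stdin and prints used_memory as a
# percentage of maxmemory.
# Usage (assumes redis-cli can reach the server):
#   redis-cli INFO memory | memory_pressure
memory_pressure() {
  tr -d '\r' | awk -F: '
    # Exact field-name matches, so used_memory_rss and maxmemory_policy are ignored.
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max + 0 == 0) { print "no maxmemory set"; exit }
      printf "%.1f\n", 100 * used / max
    }'
}
```

A result above 80 corresponds to the warning threshold listed in the table.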
Fixes
Keys created without TTL
Audit application code for SET, HSET, LPUSH, SADD, and other writes that omit an expiration argument. Add EX/PX to SET, or follow writes with EXPIRE/PEXPIREAT. For existing TTL-less keys, use a script that iterates with SCAN and applies EXPIRE in batches. Warning: This generates write load; run it during low-traffic periods.
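One possible shape for the backfill described above is sketched here. The function name backfill_ttl and the REDIS_CLI, BATCH_SIZE, and BATCH_PAUSE variables are assumptions made for this example, not redis-cli features. The sketch also assumes key names contain no newlines.

```shell
# Hypothetical helper: add a TTL to every key matching a pattern that has none.
# Usage: backfill_ttl 'session:*' 86400
# Set REDIS_CLI to override the client command (for example a wrapper script).
backfill_ttl() {
  pattern=$1
  ttl_seconds=$2
  cli=${REDIS_CLI:-redis-cli}
  batch=${BATCH_SIZE:-500}     # keys to update before pausing
  pause=${BATCH_PAUSE:-1}      # seconds to sleep between batches
  count=0
  # --scan walks the keyspace with SCAN, so it does not block like KEYS.
  "$cli" --scan --pattern "$pattern" |
  while IFS= read -r key; do
    # TTL replies -1 when the key exists but has no expiration.
    if [ "$("$cli" TTL "$key" </dev/null)" = "-1" ]; then
      "$cli" EXPIRE "$key" "$ttl_seconds" </dev/null >/dev/null
      count=$((count + 1))
      # Throttle so the backfill does not saturate the server with writes.
      if [ $((count % batch)) -eq 0 ]; then sleep "$pause"; fi
    fi
  done
}
```

There is a small window between the TTL check and the EXPIRE call in which the application could set its own expiration. On Redis 7.0 and later, EXPIRE accepts an NX option that sets the TTL only when none exists, which would close that window.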
Active expiration falling behind
If expired_time_cap_reached_count is climbing, the active expire cycle cannot keep up. Increase hz (default 10) to run the loop more frequently, or raise active-expire-effort (default 1, max 10) to let each cycle sample more keys. Both consume main-thread CPU. Add jitter to TTLs to prevent synchronized mass expiry.
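To illustrate the TTL jitter mentioned above, the following sketch spreads a base TTL by a percentage in either direction. The function name jittered_ttl and the default 10 percent spread are assumptions for the example. It seeds from /dev/urandom because awk's default time-based seed would repeat within the same second.

```shell
# Print a TTL in seconds, randomly spread around a base value.
# Arguments: <base_seconds> [spread_percent, default 10]
jittered_ttl() {
  base=$1
  pct=${2:-10}
  # Read 4 random bytes as an unsigned integer to seed awk's generator.
  seed=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
  awk -v base="$base" -v pct="$pct" -v seed="$seed" 'BEGIN {
    srand(seed)
    spread = base * pct / 100
    # Uniform value in [base - spread, base + spread), truncated to an integer.
    printf "%d\n", base - spread + rand() * 2 * spread
  }'
}

# Usage (assumes redis-cli can reach the server; key and value are placeholders):
#   redis-cli SET cache:item "value" EX "$(jittered_ttl 3600 10)"
```

In practice the jitter would normally be applied in the application's write path; the shell form is only meant to show the arithmetic.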
Expired-but-uncleaned keys
If DBSIZE is inflated relative to INFO keyspace, expired keys are awaiting cleanup. This is usually transient. If it persists, ensure hz is adequate and watch expired_time_cap_reached_count. MEMORY PURGE (jemalloc only) can encourage the allocator to return pages to the OS, but it does not force deletion of expired keys.
Client output buffer leak
If CLIENT LIST shows a client with large omem, that connection is consuming heap memory. Kill it with CLIENT KILL <ip:port> or CLIENT KILL TYPE pubsub if appropriate. Warning: This is disruptive to the client. Then tune client-output-buffer-limit for normal, replica, and pubsub clients to prevent recurrence.
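Before killing a connection it helps to know which address owns the large buffer. The sketch below keeps the addr field next to omem when ranking clients. The function name top_omem_clients is hypothetical, and the parsing assumes the space-separated field=value layout that CLIENT LIST prints.

```shell
# Reads `redis-cli CLIENT LIST` output on stdin.
# Prints "<omem> <addr>" for the N clients with the largest output buffers.
# Usage (assumes redis-cli can reach the server):
#   redis-cli CLIENT LIST | top_omem_clients 5
top_omem_clients() {
  tr -d '\r' | awk '{
    addr = ""; omem = 0
    for (i = 1; i <= NF; i++) {
      split($i, kv, "=")
      # Exact name match, so tot-mem and similar fields are not picked up.
      if (kv[1] == "addr") addr = kv[2]
      if (kv[1] == "omem") omem = kv[2]
    }
    print omem, addr
  }' | sort -rn | head -n "${1:-10}"
}
```

The address in the second column is the value to pass to CLIENT KILL.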
Fragmentation and allocator bloat
When mem_fragmentation_ratio is high and used_memory_rss is the real threat, run MEMORY PURGE. If the problem is chronic and you are on Redis 4.0+, enable activedefrag yes. Be aware that active defrag itself consumes CPU.
Replica key count skew
A replica holding more keys than the primary is expected when the primary has recently expired many keys and the replica has not yet received the DEL propagation. If the gap is large and sustained, check replication lag and ensure the replica is not stalled.
Prevention
- Enforce TTLs at the application layer. Treat writes without expiration as a bug in cache and session workloads.
- Monitor the delta between keys and expires as a first-class metric.
- Set maxmemory and choose an eviction policy appropriate for your data model. allkeys-lru can mask missing TTLs by evicting cold keys, but noeviction causes writes to fail once memory is full. volatile-lru only evicts keys with TTLs, so it will not help if most keys lack them.
- Add jitter to TTLs to smooth out expiration spikes.
- Include keyspace growth rate in capacity planning. Linear extrapolation of used_memory to maxmemory gives time-to-intervention.
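The linear extrapolation in the last item can be sketched as simple arithmetic on two used_memory samples. The function name seconds_until_full is hypothetical, and the estimate assumes growth stays linear, which real workloads may not do.

```shell
# Estimate seconds until used_memory reaches maxmemory at the current growth rate.
# Arguments: <used_memory_then> <used_memory_now> <seconds_between_samples> <maxmemory>
seconds_until_full() {
  awk -v a="$1" -v b="$2" -v dt="$3" -v max="$4" 'BEGIN {
    rate = (b - a) / dt          # bytes per second
    if (rate <= 0) { print "not growing"; exit }
    printf "%d\n", (max - b) / rate
  }'
}

# Example: memory grew from 1000 to 1600 bytes in 60 s with maxmemory at 4000.
#   seconds_until_full 1000 1600 60 4000
```

Sampling over a longer interval, such as a day, smooths out short bursts and gives a more useful planning number.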
How Netdata helps
- Netdata collects INFO keyspace keys and expires per database, surfacing divergence without manual redis-cli checks.
- It correlates used_memory, used_memory_rss, and maxmemory on one timeline, distinguishing dataset growth from fragmentation or buffer bloat.
- It computes rates for cumulative counters like expired_keys, evicted_keys, and total_commands_processed, so you don’t need to calculate deltas manually.
- The Redis collector surfaces client_output_buffer memory and blocked_clients, helping you identify whether memory growth is in the dataset or in connections.
- Alerts can be configured on used_memory / maxmemory ratio and on sudden drops in the expiration rate.
Related guides
- How Redis actually works in production: a mental model for operators
- Redis aof_last_write_status:err: AOF write failures and recovery
- Redis appendfsync always latency: durability vs throughput trade-offs
- Redis big keys: finding the giant key that blocks the event loop
- Redis blocked_clients growing: dead consumers vs healthy queues
- Redis BUSY Redis is busy running a script: blocking Lua and how to recover
- Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix
- Redis client output buffer overflow: slow consumers and client-output-buffer-limit
- Redis cluster_slots_pfail > 0: impending node failure in a cluster
- Redis CLUSTERDOWN / cluster_state:fail: slot coverage and recovery
- Redis connected_clients climbing: connection leak detection
- Redis connected_slaves dropped: detecting replica disconnects on the primary







