Redis keyspace growing unbounded: keys without TTL and memory leaks

Redis memory keeps climbing: used_memory approaches maxmemory, or the OOM killer intervenes. In INFO keyspace, keys rises while expires stays flat, or both rise but used_memory_rss outpaces the dataset. Unbounded keyspace growth is a symptom, not a root cause. The underlying cause is usually one of the following: missing TTLs, application bugs, an expiration backlog, or leaks in client buffers and allocators.

What this means

INFO keyspace reports db{N}:keys=X,expires=Y,avg_ttl=Z. The keys field counts every key; expires counts only those with a TTL. When keys grows and expires does not, new keys are being created without expiration. Even when expires tracks keys, memory can still grow if the active expiration cycle cannot delete keys as fast as they are created.
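As a rough illustration of that arithmetic, the sketch below reads INFO keyspace output on standard input and prints, per database, the number of keys that carry no TTL (keys minus expires). The function name ttl_less_keys is made up for this example, and the field positions assume the keys=X,expires=Y,... layout described above.

```shell
# Hypothetical helper: reads `INFO keyspace` output on stdin and prints
# "<db> <keys without TTL>" for every database line.
ttl_less_keys() {
  # Strip the CR that Redis appends to each line, then split on ":", "=" and ","
  # so that $1 is the db name, $3 the keys count and $5 the expires count.
  tr -d '\r' | awk -F'[:=,]' '/^db[0-9]+:/ { print $1, $3 - $5 }'
}

# Usage against a live server (not executed here):
#   redis-cli INFO keyspace | ttl_less_keys
```

Sampling this value periodically and watching its trend is more informative than a single reading, since a steadily growing difference is what points to writes without expiration.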

Redis cleans expired keys lazily on access and actively via a background sampling loop that runs hz times per second. If neither keeps pace, expired keys accumulate. DBSIZE counts expired-but-not-yet-cleaned keys, so it may report a higher number than the live keyspace. The avg_ttl field is an approximate average TTL of keys with expiration. In mixed workloads the underlying TTL distribution is often bimodal, so the average is unreliable for alerting.

flowchart TD
    A[Keyspace growing] --> B{keys vs expires diverging?}
    B -->|Yes| C[Keys without TTL]
    B -->|No| D{expired_keys rate low?}
    D -->|Yes| E[Expire cycle behind]
    D -->|No| F{RSS above used_memory?}
    F -->|Yes| G[Fragmentation or buffer leak]
    F -->|No| H[Replica lag or allocator bloat]
    C --> I[Audit app code for missing EX/EXPIRE]
    E --> J[Tune hz or active-expire-effort]
    G --> K[Check CLIENT LIST omem / run MEMORY PURGE]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Application creating keys without TTL | keys grows while expires stays flat or grows much slower | INFO keyspace and application code for SET/HSET without EX/PX |
| Active expiration falling behind | expired_keys rate drops or stalls; expired_time_cap_reached_count increases | INFO stats for expire cycle throttling |
| Expired keys accumulating before cleanup | DBSIZE higher than keys in INFO keyspace | Compare DBSIZE to INFO keyspace |
| Client output buffer leak | used_memory climbs but keys is stable; large omem in CLIENT LIST | CLIENT LIST sorted by output buffer size |
| Memory fragmentation or allocator bloat | used_memory_rss grows faster than used_memory; mem_fragmentation_ratio > 1.5 | INFO memory fragmentation metrics |
| Replica expiration lag | Replica shows significantly more keys than primary | Keyspace counts on primary vs replica |

Quick checks

# Keyspace divergence
redis-cli INFO keyspace

# Expiration throughput and throttling
redis-cli INFO stats | grep -E "expired_keys|expired_time_cap_reached_count"

# Logical vs reported size
redis-cli DBSIZE

# Memory composition
redis-cli INFO memory | grep -E "used_memory:|used_memory_rss:|mem_fragmentation_ratio:"

# Largest client output buffers
redis-cli CLIENT LIST | grep -o 'omem=[0-9]*' | cut -d= -f2 | sort -rn | head -10

How to diagnose it

  1. Quantify divergence. Note keys and expires in INFO keyspace. If the ratio of expires to keys drops, keys are being created without TTL.
  2. Check expiration throughput. Compute the rate of change of expired_keys from INFO stats. If it is low relative to key creation, expired keys accumulate.
  3. Check for throttling. If expired_time_cap_reached_count is increasing, the active expire cycle is hitting its CPU budget.
  4. Compare counts. A materially higher DBSIZE than INFO keyspace keys indicates expired-but-uncleaned keys.
  5. Inspect memory. If used_memory and used_memory_dataset_perc track keyspace growth, the dataset itself is growing. If used_memory_rss outpaces used_memory, check mem_fragmentation_ratio and allocator_frag_ratio.
  6. Check client buffers. Run CLIENT LIST and look for large omem values. These count against memory and can mimic a leak.
  7. Sample the keyspace. Iterate with SCAN and verify TTL on recent keys. Do not use KEYS *.
  8. Check replicas. Replicas do not delete expired keys on their own; they wait for the primary to propagate DEL. Since Redis 3.2, replicas return nil for logically expired keys on read, but the keys remain in memory until that DEL arrives.
  9. Correlate with deployments. If growth started after a deploy, audit new write paths for missing expiration parameters.
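Steps 2 and 3 rely on rates of change rather than raw counter values. A minimal sketch of that calculation, assuming two INFO stats samples taken a known number of seconds apart; the helper names info_field and rate_per_sec are invented for this example.

```shell
# Hypothetical helper: reads INFO output on stdin and prints the value of one field.
info_field() {
  tr -d '\r' | awk -F: -v k="$1" '$1 == k { print $2 }'
}

# Hypothetical helper: per-second rate between two samples of a cumulative counter.
# Arguments: previous value, current value, seconds between the samples.
rate_per_sec() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
    if (s <= 0) exit 1          # refuse a zero or negative interval
    printf "%.1f\n", (b - a) / s
  }'
}

# Usage against a live server (not executed here):
#   before=$(redis-cli INFO stats | info_field expired_keys)
#   sleep 60
#   after=$(redis-cli INFO stats | info_field expired_keys)
#   rate_per_sec "$before" "$after" 60
```

The same pair of helpers works for expired_time_cap_reached_count: any rate above zero over a sustained window suggests the expire cycle is hitting its time budget.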

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| keyspace keys minus expires | Direct measure of TTL-less key accumulation | Delta growing steadily |
| expired_keys rate | Expiration system throughput | Rate drops while write volume is steady |
| expired_time_cap_reached_count | Expire cycle CPU budget exhaustion | Any sustained increase |
| used_memory / maxmemory | Memory pressure from keyspace growth | > 80% |
| mem_fragmentation_ratio | Allocator waste amplifying memory pressure | > 1.5 sustained |
| db0:avg_ttl | Average TTL of expiring keys | Do not alert on this; it is bimodal and misleading in mixed workloads |

Fixes

Keys created without TTL

Audit application code for SET, HSET, LPUSH, SADD, and other writes that omit an expiration argument. Add EX/PX to SET, or follow writes with EXPIRE/PEXPIREAT. For existing TTL-less keys, use a script that iterates with SCAN and applies EXPIRE in batches. Warning: This generates write load; run it during low-traffic periods.
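One possible shape for such a backfill script is sketched here. It is an illustration under stated assumptions, not a vetted tool: the function name expire_ttl_less_keys, the REDIS_CLI override variable, and the batch and pause defaults are all invented for this example. It also assumes key names contain no newlines, and it does not guard against a key gaining a TTL between the TTL check and the EXPIRE call.

```shell
# Hypothetical helper: give every TTL-less key matching a pattern an expiry.
# REDIS_CLI is an assumed override for the client command (defaults to redis-cli).
expire_ttl_less_keys() {
  pattern=$1        # glob passed to SCAN, for example 'session:*'
  ttl=$2            # expiry in seconds to apply
  batch=${3:-500}   # keys to examine between pauses
  pause=${4:-1}     # seconds to sleep after each batch
  cli=${REDIS_CLI:-redis-cli}
  n=0
  # --scan iterates with SCAN, so it does not block the server the way KEYS does.
  "$cli" --scan --pattern "$pattern" </dev/null |
  while IFS= read -r key; do
    # TTL returns -1 for a key that exists but has no expiry.
    if [ "$("$cli" TTL "$key" </dev/null)" = "-1" ]; then
      "$cli" EXPIRE "$key" "$ttl" </dev/null >/dev/null
      printf '%s\n' "$key"    # report each key that was changed
    fi
    n=$((n + 1))
    if [ $((n % batch)) -eq 0 ]; then sleep "$pause"; fi
  done
}

# Usage against a live server (not executed here):
#   expire_ttl_less_keys 'session:*' 86400
```

On Redis 7.0 and later, EXPIRE accepts an NX option that sets the expiry only when the key has none, which would close the check-then-set gap noted above.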

Active expiration falling behind

If expired_time_cap_reached_count is climbing, the active expire cycle cannot keep up. Increase hz (default 10) to run the loop more frequently, or raise active-expire-effort (default 1, max 10) to let each cycle sample more keys. Both consume main-thread CPU. Add jitter to TTLs to prevent synchronized mass expiry.
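TTL jitter is normally applied in the application at write time. The sketch below shows the idea in shell, assuming a symmetric spread expressed as a percentage of the base TTL; the function name jittered_ttl and the 10 percent default are choices made for this example only.

```shell
# Hypothetical helper: print a TTL drawn uniformly from base +/- pct percent.
# Arguments: base TTL in seconds, spread in percent (default 10).
jittered_ttl() {
  base=$1
  pct=${2:-10}
  # Seed awk from /dev/urandom so calls within the same second differ.
  seed=$(od -An -N4 -tu4 /dev/urandom | tr -d ' ')
  awk -v base="$base" -v pct="$pct" -v seed="$seed" 'BEGIN {
    srand(seed)
    spread = base * pct / 100
    # rand() is in [0, 1), so the result lies in [base - spread, base + spread).
    printf "%d\n", base - spread + rand() * 2 * spread
  }'
}

# Usage against a live server (not executed here):
#   redis-cli SET cache:item:42 "value" EX "$(jittered_ttl 3600 10)"
```

With a base of 3600 seconds and a 10 percent spread, keys written together expire across a window of roughly 12 minutes instead of in the same instant.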

Expired-but-uncleaned keys

If DBSIZE is inflated relative to INFO keyspace, expired keys are awaiting cleanup. This is usually transient. If it persists, ensure hz is adequate and watch expired_time_cap_reached_count. MEMORY PURGE (jemalloc only) can encourage the allocator to return pages to the OS, but it does not force deletion of expired keys.

Client output buffer leak

If CLIENT LIST shows a client with large omem, that connection is consuming heap memory. Kill it with CLIENT KILL <ip:port> or CLIENT KILL TYPE pubsub if appropriate. Warning: This is disruptive to the client. Then tune client-output-buffer-limit for normal, replica, and pubsub clients to prevent recurrence.
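Before killing anything, it helps to see which connection owns the large buffer. The following sketch reads CLIENT LIST output on standard input and prints the omem, id, and addr fields for the clients with the largest output buffers; top_omem_clients is a name invented for this example.

```shell
# Hypothetical helper: reads `CLIENT LIST` output on stdin and prints
# "<omem> <id> <addr>" for the N clients with the largest output buffers.
# Argument: N (default 10).
top_omem_clients() {
  tr -d '\r' | awk '{
    id = ""; addr = ""; omem = 0
    # Each line is a space-separated list of key=value pairs.
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^id=/) id = substr($i, 4)
      else if ($i ~ /^addr=/) addr = substr($i, 6)
      else if ($i ~ /^omem=/) omem = substr($i, 6)
    }
    if (id != "") print omem, id, addr
  }' | sort -rn | head -n "${1:-10}"
}

# Usage against a live server (not executed here):
#   redis-cli CLIENT LIST | top_omem_clients 5
```

The addr column is the value CLIENT KILL expects, so the output can be used directly to target the offending connection.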

Fragmentation and allocator bloat

When mem_fragmentation_ratio is high and used_memory_rss is the real threat, run MEMORY PURGE. If the problem is chronic and you are on Redis 4.0+ built with jemalloc, enable activedefrag yes. Be aware that active defrag itself consumes CPU.
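A ratio alone can mislead on small instances, where a high value may correspond to only a few megabytes. The sketch below, which assumes INFO memory output on standard input, prints both the ratio of used_memory_rss to used_memory and the absolute difference in bytes; frag_report is a name made up for this example.

```shell
# Hypothetical helper: reads `INFO memory` output on stdin and prints the
# RSS-to-used ratio together with the absolute overhead in bytes.
frag_report() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory"     { used = $2 }
    $1 == "used_memory_rss" { rss = $2 }
    END {
      if (used > 0)
        printf "ratio=%.2f overhead_bytes=%.0f\n", rss / used, rss - used
    }'
}

# Usage against a live server (not executed here):
#   redis-cli INFO memory | frag_report
```

Comparing the overhead in bytes before and after MEMORY PURGE gives a direct read on how much memory the allocator actually returned.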

Replica key count skew

A replica holding more keys than the primary is expected when the primary has recently expired many keys and the replica has not yet received the DEL propagation. If the gap is large and sustained, check replication lag and ensure the replica is not stalled.
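One way to check for a stalled replica is to compare replication offsets on the primary. The sketch below assumes INFO replication output from the primary on standard input, with replica lines in the slaveN:ip=...,offset=...,lag=... form; the function name replica_offset_lag is invented for this example.

```shell
# Hypothetical helper: reads the primary's `INFO replication` output on stdin
# and prints "<replica> <bytes behind the primary>" for each attached replica.
replica_offset_lag() {
  tr -d '\r' | awk '
    /^master_repl_offset:/ { master = substr($0, index($0, ":") + 1) }
    /^slave[0-9]+:/ {
      name = substr($0, 1, index($0, ":") - 1)
      # The rest of the line is key=value pairs separated by commas.
      n = split(substr($0, index($0, ":") + 1), kv, /[=,]/)
      for (i = 1; i < n; i += 2)
        if (kv[i] == "offset") off[name] = kv[i + 1]
    }
    END { for (r in off) print r, master - off[r] }'
}

# Usage against a live primary (not executed here):
#   redis-cli INFO replication | replica_offset_lag
```

A byte gap that stays small while the key counts diverge suggests ordinary expiry propagation delay; a gap that keeps growing suggests the replica itself is falling behind.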

Prevention

  • Enforce TTLs at the application layer. Treat writes without expiration as a bug in cache and session workloads.
  • Monitor the delta between keys and expires as a first-class metric.
  • Set maxmemory and choose an eviction policy appropriate for your data model. allkeys-lru can mask missing TTLs by evicting cold keys, but noeviction causes writes to fail once memory is full. volatile-lru only evicts keys with TTLs, so it will not help if most keys lack them.
  • Add jitter to TTLs to smooth out expiration spikes.
  • Include keyspace growth rate in capacity planning. Linear extrapolation of used_memory to maxmemory gives time-to-intervention.
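The linear extrapolation in the last point can be sketched as follows, assuming two used_memory samples taken a known number of seconds apart and a fixed maxmemory. The function name seconds_until_full is made up for this example, and real growth is rarely perfectly linear, so treat the result as a rough estimate.

```shell
# Hypothetical helper: estimate seconds until used_memory reaches maxmemory.
# Arguments: earlier used_memory, later used_memory, seconds between the
# two samples, maxmemory (all memory values in bytes).
seconds_until_full() {
  awk -v u1="$1" -v u2="$2" -v dt="$3" -v max="$4" 'BEGIN {
    if (u2 <= u1) exit 1   # memory is flat or shrinking: no estimate
    # remaining bytes divided by growth rate, written to avoid rounding drift
    printf "%.0f\n", (max - u2) * dt / (u2 - u1)
  }'
}

# Example: 4.0 GB an hour ago, 4.1 GB now, 8 GB maxmemory.
#   seconds_until_full 4000000000 4100000000 3600 8000000000
```

The function prints nothing and returns a non-zero status when memory is not growing, which makes it straightforward to skip the estimate in a monitoring script.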

How Netdata helps

  • Netdata collects INFO keyspace keys and expires per database, surfacing divergence without manual redis-cli checks.
  • It correlates used_memory, used_memory_rss, and maxmemory on one timeline, distinguishing dataset growth from fragmentation or buffer bloat.
  • It computes rates for cumulative counters like expired_keys, evicted_keys, and total_commands_processed, so you don’t need to calculate deltas manually.
  • The Redis collector surfaces client_output_buffer memory and blocked_clients, helping you identify whether memory growth is in the dataset or in connections.
  • Alerts can be configured on used_memory / maxmemory ratio and on sudden drops in the expiration rate.