Redis mem_fragmentation_ratio high: jemalloc fragmentation and active defrag
A mem_fragmentation_ratio sustained above 1.5 on a production instance means Redis holds significantly more physical memory (RSS) than its logical dataset size (used_memory), wasting RAM that could hold data or absorb spikes. This is not a memory leak. Redis uses jemalloc by default, which does not return freed pages to the OS eagerly. Deleted or resized keys leave holes in allocator arenas, inflating RSS while used_memory stays flat or drops.
The immediate risk is OOM kill. The Linux OOM killer targets RSS. An instance with 20 GB of logical data and a fragmentation ratio of 2.0 occupies 40 GB RAM. A background fork for RDB or AOF copy-on-write can push RSS over a 48 GB host or cgroup limit and kill the process, even though the dataset appears to fit.
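The arithmetic in the example above can be checked with a short sketch. The function name rss_headroom and its three positional arguments (logical size, ratio, limit, all in GB) are hypothetical and exist only for this illustration.

```shell
# rss_headroom USED_GB RATIO LIMIT_GB
# Estimates RSS as used_memory * fragmentation ratio and prints the gap to the limit.
rss_headroom() {
  awk -v u="$1" -v r="$2" -v l="$3" 'BEGIN {
    rss = u * r
    printf "rss_gb=%.1f headroom_gb=%.1f\n", rss, l - rss
  }'
}

# Values from the example in the text: 20 GB of data, ratio 2.0, 48 GB limit.
rss_headroom 20 2.0 48   # prints: rss_gb=40.0 headroom_gb=8.0
```

With these numbers only 8 GB remains for copy-on-write pages during a fork, which is why a write-heavy instance can cross the limit while the dataset itself appears to fit.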
Ignore the ratio on instances with less than 50 MB of used_memory. jemalloc’s minimum allocation granularity dominates at small scale. On larger instances, sustained elevation above 1.5 is a capacity incident.
What this means
mem_fragmentation_ratio is used_memory_rss divided by used_memory. A value of 1.0 to 1.1 is optimal; 1.1 to 1.5 is common for active workloads. Sustained values above 1.5 indicate significant waste. Values below 1.0 on instances larger than 100 MB suggest swap, which is catastrophic for latency.
This ratio is coarse. It includes process overhead such as code segments, shared libraries, and stack space, not just allocator fragmentation. For precision, Redis 4.0 and later expose allocator_frag_ratio (allocator_active / allocator_allocated), which isolates true jemalloc external fragmentation. An allocator_frag_ratio above 4.0 warrants attention regardless of the top-level ratio.
The ratio is also unreliable after peak memory events. The allocator holds freed pages for reuse rather than releasing them to the OS. If Redis briefly filled memory with a bulk import and then deleted the keys, used_memory drops but RSS remains at the peak. This produces an artificially high ratio that may not reflect active fragmentation. In this scenario, mem_fragmentation_bytes (the absolute difference between RSS and used_memory) is often more actionable than the ratio itself.
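As a rough illustration of how the ratio and the absolute byte figure relate, the sketch below recomputes both from INFO memory text on stdin. The function name frag_summary is hypothetical. The values Redis reports itself are sampled periodically, so they may differ slightly from this recomputation.

```shell
# frag_summary: reads `INFO memory` output on stdin and prints
# used_memory_rss / used_memory plus the absolute difference in bytes.
frag_summary() {
  # redis-cli emits CRLF line endings; strip the carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "used_memory"     { used = $2 }
    $1 == "used_memory_rss" { rss = $2 }
    END {
      if (used + 0 > 0)
        printf "ratio=%.2f frag_bytes=%.0f\n", rss / used, rss - used
    }'
}
```

Typical use would be `redis-cli INFO memory | frag_summary`. A high ratio with a small frag_bytes value is usually less urgent than a moderate ratio that represents many gigabytes.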
flowchart TD
A[mem_fragmentation_ratio above 1.5] --> B{used_memory below 50MB?}
B -->|Yes| C[Noise: ignore]
B -->|No| D{allocator_frag_ratio above 4.0?}
D -->|Yes| E[True jemalloc fragmentation]
D -->|No| F[RSS padding or process overhead]
E --> G{active_defrag_running?}
G -->|Yes| H[Check hits and misses ratio]
G -->|No| I[Enable activedefrag or run MEMORY PURGE]
F --> J[Check used_memory_peak vs used_memory]
H --> K{Hits high?}
K -->|Yes| L[Defrag working: wait or tune CPU]
K -->|No| M[Workload skips large fields or defrag ineffective]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Post-peak deallocation | Ratio jumps after bulk deletion or eviction; used_memory drops sharply while RSS stays flat | used_memory_peak against current used_memory |
| Heavy key churn with variable-size values | Ratio climbs steadily under write load as jemalloc arenas fragment | allocator_frag_ratio in INFO memory |
| Active defrag disabled or ineffective | Sustained high ratio on Redis 4.0+ with heavy mutation; defrag metrics show misses dominating hits | active_defrag_hits vs active_defrag_misses |
| Large hashes or sorted sets skipping defrag | active_defrag_running is positive but ratio does not decrease | active_defrag_key_misses relative to active_defrag_key_hits |
| Tiny instance noise | Ratio of 2.0 to 10.0 on instances with less than 50 MB of data | Absolute used_memory value |
Quick checks
Run these read-only commands to characterize the situation before making changes.
# Check top-level fragmentation and memory totals
redis-cli INFO memory | grep -E "used_memory:|used_memory_rss:|mem_fragmentation_ratio:|used_memory_peak:"
# Check allocator-level fragmentation for precision (Redis 4.0+)
redis-cli INFO memory | grep -E "allocator_frag_ratio:|allocator_rss_ratio:"
# Check active defrag status and effectiveness
redis-cli INFO memory | grep active_defrag_running
redis-cli INFO stats | grep -E "active_defrag_hits|active_defrag_misses"
redis-cli CONFIG GET activedefrag
# Run built-in diagnostics (Redis 4.0+)
redis-cli MEMORY DOCTOR
# Check for THP interference
cat /sys/kernel/mm/transparent_hugepage/enabled
How to diagnose it
- Filter out noise. If used_memory is below 50 MB, ignore the ratio.
- Check for post-peak artifact. Compare used_memory_peak to current used_memory. If the peak is many times larger than the current value, the high ratio is likely residual RSS from prior bulk allocations.
- Isolate true allocator fragmentation. On Redis 4.0+, inspect allocator_frag_ratio. If it is above 4.0, you have genuine jemalloc external fragmentation. If it is low but mem_fragmentation_ratio is high, the gap is process overhead or RSS padding.
- Assess active defrag state. If activedefrag is disabled and you are on Redis 4.0 or later, the instance is not attempting to compact live objects. If it is enabled but active_defrag_misses is high relative to active_defrag_hits, defrag is working hard without reducing fragmentation. This can happen when hashes or sorted sets have more fields than active-defrag-max-scan-fields (default 1000) and are skipped per scan cycle.
- Check for THP interference. If Transparent Huge Pages are not set to never, jemalloc’s page management is impaired, which can amplify fragmentation and worsen fork latency. The enabled file should show never as the selected (bracketed) value, for example: always madvise [never].
- Correlate with persistence events. used_memory_rss spikes temporarily during RDB or AOF rewrite because of copy-on-write. If the ratio is elevated only while rdb_bgsave_in_progress or aof_rewrite_in_progress is 1, the condition is transient.
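The first checks above can be sketched as a small triage function that reads INFO memory text on stdin. The function name triage_fragmentation and its output strings are hypothetical. The 50 MB, 1.5, and 4.0 cut-offs come from this guide, and the 2x peak cut-off is taken from the warning sign in the monitoring table. This is a starting point, not a complete diagnosis.

```shell
# triage_fragmentation: classifies an instance from `INFO memory` output on stdin.
# Missing fields are treated as 0, so pre-4.0 output never reports allocator fragmentation.
triage_fragmentation() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory"             { used = $2 }
    $1 == "used_memory_peak"        { peak = $2 }
    $1 == "mem_fragmentation_ratio" { ratio = $2 }
    $1 == "allocator_frag_ratio"    { afrag = $2 }
    END {
      if (used + 0 < 50 * 1024 * 1024) { print "noise: used_memory below 50 MB"; exit }
      if (ratio + 0 <= 1.5)            { print "ok: ratio at or below 1.5"; exit }
      if (afrag + 0 > 4.0)             { print "allocator fragmentation: evaluate active defrag"; exit }
      if (peak + 0 >= 2 * used)        { print "post-peak residue: consider MEMORY PURGE"; exit }
      print "process overhead or RSS padding: compare RSS against limits"
    }'
}
```

Typical use would be `redis-cli INFO memory | triage_fragmentation`. The THP and persistence checks are left out because they need data from outside INFO memory.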
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| mem_fragmentation_ratio | Top-level waste indicator; OOM killer uses RSS | Sustained > 1.5 on instances > 50 MB |
| allocator_frag_ratio | True jemalloc external fragmentation, excluding process overhead | > 4.0 |
| used_memory_peak vs used_memory | Identifies artificial inflation from prior bulk allocations | Peak is 2x or more above current |
| active_defrag_running | Indicates whether the defragmenter is active | Non-zero while misses exceed hits |
| active_defrag_hits / (hits + misses) | Measures whether defrag is successfully moving objects | < 0.5 while running |
| used_memory_rss | Physical memory footprint; determines OOM proximity | Approaching host or cgroup limit |
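The hit ratio in the table can be computed from INFO stats text, as in this minimal sketch. The function name defrag_hit_ratio is hypothetical. The counters are cumulative, so comparing two samples taken some minutes apart is likely to be more informative than a single reading.

```shell
# defrag_hit_ratio: reads `INFO stats` output on stdin and prints
# active_defrag_hits / (active_defrag_hits + active_defrag_misses), or "n/a" if both are 0.
defrag_hit_ratio() {
  tr -d '\r' | awk -F: '
    $1 == "active_defrag_hits"   { hits = $2 }
    $1 == "active_defrag_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) { print "n/a"; exit }
      printf "%.2f\n", hits / total
    }'
}
```

Typical use would be `redis-cli INFO stats | defrag_hit_ratio`. A result below 0.5 while active_defrag_running is non-zero matches the warning sign in the table.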
Fixes
Post-peak RSS retention: run MEMORY PURGE
If the fragmentation is residual from a prior peak, MEMORY PURGE asks jemalloc to purge dirty pages so they can be reclaimed. This is a jemalloc-specific operation; it is a NOOP when using libc or tcmalloc. It reduces used_memory_rss without changing used_memory. On large heaps this command can be slow. Run it during a low-traffic window.
# Ask jemalloc to release dirty pages to the OS
redis-cli MEMORY PURGE
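To confirm whether the purge actually reclaimed memory, compare used_memory_rss before and after. The sketch below is one way to do that. The function names, the REDIS_CLI override, and the SETTLE_SECONDS pause are assumptions made for this example. The pause exists because the RSS figure in INFO is refreshed periodically rather than on every call.

```shell
# Command used to reach Redis; override to add -h, -p, or auth options.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# rss_bytes: prints the current used_memory_rss value.
rss_bytes() {
  $REDIS_CLI INFO memory | tr -d '\r' | awk -F: '$1 == "used_memory_rss" { print $2 }'
}

# purge_and_report: runs MEMORY PURGE and prints RSS before, after, and the difference.
purge_and_report() {
  before=$(rss_bytes)
  $REDIS_CLI MEMORY PURGE >/dev/null
  sleep "${SETTLE_SECONDS:-1}"   # give Redis time to refresh its RSS sample
  after=$(rss_bytes)
  echo "rss_before=$before rss_after=$after reclaimed=$((before - after))"
}
```

Calling purge_and_report against an instance that does not use jemalloc should report a reclaimed value near zero, since the command has no effect there.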
Sustained fragmentation: enable active defrag
For Redis 4.0 and later, enabling active defragmentation moves live objects to fresh memory and releases fragmented pages. This consumes CPU and can add latency if misconfigured. Enable it live to test:
redis-cli CONFIG SET activedefrag yes
The default thresholds are:
- active-defrag-ignore-bytes 100mb
- active-defrag-threshold-lower 10
- active-defrag-cycle-min 1
- active-defrag-cycle-max 25
- active-defrag-threshold-upper 100
- active-defrag-max-scan-fields 1000
Do not lower active-defrag-ignore-bytes without reason. The threshold exists to prevent the CPU cost of defrag on small absolute fragmentation. If mem_fragmentation_bytes is below 100 MB, defrag will not trigger even if the percentage threshold is met.
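The interaction between the two start conditions can be illustrated with a small check. The function name defrag_would_start and its arguments are hypothetical. The caller supplies the wasted bytes and the fragmentation percentage, and the defaults mirror the thresholds listed above (100mb is taken as 100 * 1024 * 1024 bytes).

```shell
# defrag_would_start FRAG_BYTES FRAG_PCT [IGNORE_BYTES] [THRESHOLD_LOWER]
# Both thresholds must be met before defrag starts.
defrag_would_start() {
  awk -v b="$1" -v p="$2" -v ib="${3:-104857600}" -v lp="${4:-10}" 'BEGIN {
    if (b < ib)      print "no: below active-defrag-ignore-bytes"
    else if (p < lp) print "no: below active-defrag-threshold-lower"
    else             print "yes"
  }'
}

# 50 MB wasted at 40% fragmentation: the byte floor blocks defrag.
defrag_would_start 52428800 40
```

The example shows why a small instance with a dramatic percentage still does not trigger defrag under the default settings.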
If defrag is enabled but the ratio does not drop, check whether your workload uses very large hashes or sorted sets. Keys with more fields than active-defrag-max-scan-fields are skipped during each cycle, which limits effectiveness on those key types. You can raise the scan limit, but doing so increases the CPU cost per cycle.
Version-specific active defrag bugs
Before applying active defrag as a long-term fix, check your Redis version. On Redis 7.2.5 and later, a known bug can cause RSS to grow unbounded even when active defrag is enabled. If you observe this behavior, the workaround is a planned restart. On Redis 8.0.0 and 8.0.1, enabling active defrag causes cron-based timers to run twice as fast due to a scheduling interaction. Upgrade to 8.0.2 or later if you use active defrag.
Disruptive fallback: restart
Restarting Redis resets RSS and eliminates fragmentation immediately. This is effective but causes downtime, cache warmup, and replication delay if the instance is a primary. Use it only when MEMORY PURGE and active defrag have failed, or when you hit version-specific bugs such as unbounded RSS growth despite active defrag. Plan the restart during a maintenance window.
System-level: disable Transparent Huge Pages
If THP is enabled, disable it. The following commands take effect immediately but do not survive reboot:
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
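Persist the change in your system configuration if required, for example with the transparent_hugepage=never kernel boot parameter in GRUB or a boot-time unit that writes these files. THP has no sysctl setting.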
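To verify the setting from a script, the selected mode can be extracted from the bracketed entry in the sysfs file. This is a minimal sketch. The function name thp_selected is hypothetical, and the optional path argument exists only so the parsing can be tried against a sample file.

```shell
# thp_selected [FILE]: prints the bracketed (active) THP mode, e.g. "never".
thp_selected() {
  file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}
```

A check such as `[ "$(thp_selected)" = "never" ]` could then be used in a host validation script.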
Prevention
- Monitor allocator_frag_ratio on Redis 4.0+ alongside mem_fragmentation_ratio to catch true fragmentation early. On large instances, track mem_fragmentation_bytes for an absolute view of wasted memory.
- Review MEMORY MALLOC-STATS periodically to understand arena-level fragmentation. Sustained growth in dirty or muzzy pages indicates the allocator is struggling to reuse memory efficiently.
- Size memory limits with fragmentation headroom. Persistent instances should keep used_memory below roughly 50% of physical RAM or maxmemory, whichever is tighter, to leave space for RSS overhead and COW during fork.
- Avoid mass deletion patterns that leave jemalloc arenas sparse. When possible, use expiration with jitter rather than bulk deletes.
- Evaluate active defrag effectiveness through active_defrag_hits and active_defrag_misses rather than enabling it and forgetting it.
How Netdata helps
- Netdata collects mem_fragmentation_ratio, used_memory, and used_memory_rss natively, so you can correlate RSS growth with logical memory changes.
- On Redis 4.0+, Netdata also surfaces allocator_frag_ratio, helping you distinguish allocator waste from process overhead.
- Netdata’s alert templates filter out tiny instances, suppressing noise on instances below 50 MB.
- You can correlate active_defrag_running with CPU utilization and command latency to spot when defrag itself is becoming a performance cost.
- RSS and used_memory are plotted on the same charts, exposing post-peak deallocation patterns.
Related guides
- How Redis actually works in production: a mental model for operators: /guides/redis/how-redis-works-in-production/
- Redis aof_last_write_status:err: AOF write failures and recovery: /guides/redis/redis-aof-last-write-status-err/
- Redis appendfsync always latency: durability vs throughput trade-offs: /guides/redis/redis-appendfsync-always-latency/
- Redis big keys: finding the giant key that blocks the event loop: /guides/redis/redis-big-keys-latency/
- Redis blocked_clients growing: dead consumers vs healthy queues: /guides/redis/redis-blocked-clients-growing/
- Redis BUSY Redis is busy running a script: blocking Lua and how to recover: /guides/redis/redis-busy-running-script/
- Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix: /guides/redis/redis-cant-save-in-background-fork/
- Redis client output buffer overflow: slow consumers and client-output-buffer-limit: /guides/redis/redis-client-output-buffer-limit/
- Redis cluster_slots_pfail > 0: impending node failure in a cluster: /guides/redis/redis-cluster-slots-pfail/
- Redis CLUSTERDOWN / cluster_state:fail: slot coverage and recovery: /guides/redis/redis-cluster-state-fail/
- Redis connected_clients climbing: connection leak detection: /guides/redis/redis-connected-clients-climbing/
- Redis connected_slaves dropped: detecting replica disconnects on the primary: /guides/redis/redis-connected-slaves-dropped/







