
Redis mem_fragmentation_ratio below 1.0: detecting swap death

A mem_fragmentation_ratio of 0.72 on a substantial dataset means the operating system has swapped out Redis memory pages. Redis stays alive and responds to PING, but every command touching a swapped key blocks the single event loop on disk I/O. Because Redis logs nothing about swap, the resulting latency catastrophe looks like a mystery.

mem_fragmentation_ratio equals used_memory_rss / used_memory. When the ratio drops below 1.0, the resident set size is smaller than the memory Redis requested from its allocator. The missing bytes are in swap. Operators typically watch for ratios above 1.5, so a low ratio is easily misread as "good fragmentation" when it is actually the worst memory-related failure mode short of an OOM kill.
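To see the relationship concretely, the sketch below recomputes the ratio from the two raw fields. It assumes `redis-cli INFO memory` output on stdin (INFO lines end in CRLF, so carriage returns are stripped first); `ratio_and_gap` is a hypothetical helper name. Redis samples RSS periodically, so the recomputed value can differ slightly from the server's own `mem_fragmentation_ratio`. The `not_resident_bytes` figure, used_memory minus used_memory_rss, is only a rough estimate of how much allocated memory is not in RAM.

```shell
# ratio_and_gap: read `redis-cli INFO memory` text on stdin and print the
# recomputed ratio plus the bytes that are allocated but not resident.
# Hypothetical helper; the field names are the ones INFO memory reports.
ratio_and_gap() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory"     { used = $2 }   # exact match skips *_human fields
    $1 == "used_memory_rss" { rss = $2 }
    END {
      if (used == "" || rss == "" || used == 0) {
        print "missing used_memory or used_memory_rss" > "/dev/stderr"
        exit 1
      }
      gap = used - rss
      if (gap < 0) gap = 0                  # RSS above used_memory: no swap gap
      printf "ratio=%.2f not_resident_bytes=%.0f\n", rss / used, gap
    }'
}
```

Usage: `redis-cli INFO memory | ratio_and_gap`. Input containing `used_memory:1000` and `used_memory_rss:720` prints `ratio=0.72 not_resident_bytes=280`.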

The operational threshold is roughly 0.8 on instances with used_memory above 100 MB. On smaller instances, fixed process overhead (code, stack, shared libraries) is a large share of RSS and makes the ratio noisy, but on a production dataset a sustained value below 0.8 means swap death.
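The rule of thumb can be written as a small classifier. This is a sketch under stated assumptions: `swap_verdict` is a hypothetical name, 100 MB is taken as 100 MiB, and the `watch` band between 0.8 and 1.0 is an addition of this sketch rather than part of the rule above.

```shell
# swap_verdict USED_MEMORY_BYTES RATIO
# Encodes this guide's rule of thumb: ratio < 0.8 with used_memory above
# 100 MB is treated as swap. Hypothetical helper, not a Redis tool.
swap_verdict() {
  awk -v used="$1" -v ratio="$2" 'BEGIN {
    min_used = 100 * 1024 * 1024        # 100 MB floor, taken as MiB (assumption)
    if (used + 0 <= min_used)  print "ambiguous"  # process overhead dominates
    else if (ratio + 0 < 0.8)  print "swap"
    else if (ratio + 0 < 1.0)  print "watch"      # extra band added by this sketch
    else                       print "ok"
  }'
}
```

For example, `swap_verdict 2000000000 0.72` prints `swap`, while the same ratio on a 50 MB instance prints `ambiguous`.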

What this means

used_memory tracks what Redis requested from its allocator. used_memory_rss tracks what the OS kept in RAM. When RSS falls below allocated memory, the kernel paged some of Redis’s anonymous memory out to disk.

Redis has no awareness that its pages are on swap. The event loop continues accepting commands, but when it touches a swapped key, value, or internal structure, the thread takes a page fault and blocks until the kernel reads the page back from swap. Because command execution is single-threaded, one swapped access stalls every other client. The latency is a wall, not a slope.

Swapped pages persist even after host memory pressure subsides. The kernel does not bring them back automatically when free RAM becomes available; they stay on disk until Redis touches them again. The latency hit can outlast the original pressure by hours.

Common causes

Cause	What it looks like	First thing to check
Host memory overcommit or competing processes	Ratio drops as other processes allocate RAM; system swap usage grows; latency climbs silently	`free -h` and `/proc/swaps`
Fork COW pressure from persistence	Ratio drops during or immediately after `BGSAVE` or AOF rewrite; `latest_fork_usec` spiked	`rdb_bgsave_in_progress`, `aof_rewrite_in_progress`, and host RAM headroom
`vm.swappiness` too high	Gradual ratio decline under normal load; no sudden memory spike in Redis	`cat /proc/sys/vm/swappiness`
Container memory limits without host headroom	Ratio drops inside a container while the host shows free RAM; OOM kills may follow	Container cgroup memory limits versus `used_memory_rss`
NUMA misconfiguration	Intermittent ratio drops on large bare-metal hosts under load	`numactl --hardware` or `/sys/devices/system/node/`
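For the container row, the comparison can be scripted. A minimal sketch, assuming it runs inside the container and is given the cgroup limit file: typically `/sys/fs/cgroup/memory.max` on cgroup v2 (which reads `max` when no limit is set) or `/sys/fs/cgroup/memory/memory.limit_in_bytes` on cgroup v1. `cgroup_headroom` is a hypothetical helper; the RSS argument is used_memory_rss from `INFO memory`.

```shell
# cgroup_headroom LIMIT_FILE RSS_BYTES
# Prints the container memory limit, Redis RSS, and the difference.
# Hypothetical helper; LIMIT_FILE is the cgroup v1 or v2 limit file.
cgroup_headroom() {
  cg_limit=$(cat "$1") || return 1
  if [ "$cg_limit" = "max" ]; then
    echo "unlimited"                # cgroup v2 reports "max" when unlimited
    return 0
  fi
  awk -v limit="$cg_limit" -v rss="$2" 'BEGIN {
    printf "limit=%.0f rss=%.0f headroom=%.0f\n", limit, rss, limit - rss
  }'
}
```

On cgroup v1 an unlimited container reports a very large number instead of `max`, so an enormous headroom value means the same thing.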

Quick checks

# Confirm ratio and instance size
redis-cli INFO memory | grep -E "mem_fragmentation_ratio|used_memory:"

# Check system swap usage and available RAM
free -h && swapon --show

# Check kernel tendency to swap anonymous pages
cat /proc/sys/vm/swappiness

# Check per-process swap consumption for the Redis process
grep VmSwap /proc/"$(pidof -s redis-server)"/status

# Check allocator metrics to rule out non-swap artifacts (Redis 4.0+)
redis-cli INFO memory | grep -E "allocator_frag_ratio|allocator_rss_ratio"

# Check for active persistence forks that may have triggered COW bloat
redis-cli INFO persistence | grep -E "rdb_bgsave_in_progress|aof_rewrite_in_progress"

All of these are read-only. None change server state.

How to diagnose it

Validate the signal. On instances with used_memory above 100 MB, a mem_fragmentation_ratio below 0.8 indicates swap. On smaller instances the ratio may be ambiguous due to process overhead; corroborate with OS metrics before treating it as swap death.
Confirm OS-level swap. Run free -h and check that swap used is non-zero or trending upward. Check VmSwap in /proc/<pid>/status for the Redis process. Non-zero VmSwap is ground truth that the kernel has moved Redis pages out of RAM.
Rule out allocator artifacts. On Redis 4.0+, check allocator_frag_ratio and allocator_rss_ratio. If these sit in normal ranges while mem_fragmentation_ratio is depressed, the low ratio is not an allocator artifact. It is swap.
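Correlate with persistence events. Check whether the ratio drop coincides with rdb_bgsave_in_progress=1 or aof_rewrite_in_progress=1. After fork(), parent and child share memory copy-on-write, and every page the parent modifies while the child runs is duplicated; under a heavy write load, memory use can approach double. If the host had no headroom, the kernel may have swapped out Redis parent pages to make room for that COW overhead. This is more likely when latest_fork_usec was unusually high just beforehand, since fork time grows with the size of the process's page tables.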
Check system memory pressure history. Inspect dmesg for OOM killer activity or memory-reclaim messages around the time the ratio dropped. Run vmstat 1 and look for sustained non-zero values in the si (swap-in) and so (swap-out) columns.
Determine whether pressure is ongoing or residual. If host free RAM has recovered but mem_fragmentation_ratio remains low, the pages are still swapped: Redis has not touched them since, so no page fault has brought them back. The event loop is running on borrowed time until the next access to a cold page stalls every client.
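The vmstat check in these steps is easy to eyeball but also easy to script. A sketch, assuming procps `vmstat` output on stdin; it locates the `si` and `so` columns by header name and skips the first data row, which vmstat reports as an average since boot rather than current activity. `swap_activity` is a hypothetical name.

```shell
# swap_activity: read `vmstat INTERVAL COUNT` output on stdin and count the
# sampling intervals that showed swap-in (si) or swap-out (so) traffic.
swap_activity() {
  awk '
    $1 == "r" {                         # column header: find si/so positions
      for (i = 1; i <= NF; i++) { if ($i == "si") si = i; if ($i == "so") so = i }
      next
    }
    $1 !~ /^[0-9]+$/ { next }           # skip the "procs ---memory---" banner
    { rows++ }
    rows == 1 { next }                  # first row is the since-boot average
    { samples++; if ($si + $so > 0) active++ }
    END { printf "intervals=%d swapping=%d\n", samples, active }'
}
```

Usage: `vmstat 1 30 | swap_activity` samples for 30 seconds. Non-zero `swapping` intervals mean the kernel is actively moving pages; zero alongside a low ratio points to the residual case described in the last step.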

flowchart TD
    A[mem_fragmentation_ratio < 1.0] --> B{used_memory > 100MB?}
    B -->|No| C[Ambiguous: process overhead dominates]
    B -->|Yes| D[Check OS swap usage]
    D --> E[Swap confirmed]
    E --> F[Identify memory pressure source]
    F --> G[Restart Redis to reload pages in RAM]

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`mem_fragmentation_ratio`	Primary indicator comparing RSS to allocated memory	`< 0.8` while `used_memory > 100 MB`
`used_memory_rss`	What the OS sees; the OOM killer and swap subsystem use this	Sustained divergence below `used_memory`
`used_memory`	Allocator-reported consumption; provides context for the ratio	Validates whether ratio thresholds are meaningful
`allocator_frag_ratio` (Redis 4.0+)	True allocator fragmentation, separate from OS swap behavior	Normal value while `mem_fragmentation_ratio` is low confirms swap
`latest_fork_usec`	Duration of the most recent fork() in microseconds; long forks reflect a large memory footprint and often precede COW pressure	`> 500 ms` (500000 µs) before a ratio drop suggests COW-induced swap
OS swap used	Ground truth for whether swapping is active	Non-zero and correlated with ratio below 1.0
`rdb_bgsave_in_progress` / `aof_rewrite_in_progress`	Forks trigger COW that can push a tight host into swap	Ratio dropping during or after persistence operations
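Two of the fork-related rows can be checked from one `INFO` call, since the default output includes both the persistence and stats sections. A sketch with a hypothetical `fork_signals` helper; the 500 ms cutoff is this guide's threshold, and latest_fork_usec is reported in microseconds.

```shell
# fork_signals: read `redis-cli INFO` text on stdin and flag a slow last fork
# and any fork child (BGSAVE or AOF rewrite) that is currently running.
fork_signals() {
  tr -d '\r' | awk -F: '
    $1 == "latest_fork_usec"        { fork_us = $2 }
    $1 == "rdb_bgsave_in_progress"  { bgsave = $2 }
    $1 == "aof_rewrite_in_progress" { aofrw = $2 }
    END {
      printf "latest_fork_ms=%.1f", fork_us / 1000
      if (fork_us > 500000)          printf " SLOW_FORK"          # > 500 ms
      if (bgsave == 1 || aofrw == 1) printf " FORK_CHILD_ACTIVE"
      printf "\n"
    }'
}
```

Usage: `redis-cli INFO | fork_signals`. A `SLOW_FORK` flag followed by a ratio drop matches the COW-induced swap pattern in the table.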

Fixes

Immediate: stop the memory pressure

Identify and terminate or migrate the non-Redis memory consumers that pushed the host over the edge. Stopping the pressure prevents additional pages from being swapped. It does not automatically bring already-swapped Redis pages back into RAM.
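To find the competing consumers, per-process resident memory can be read straight from `/proc`. A sketch, assuming Linux; `top_rss` is a hypothetical helper, the optional second argument exists only so it can be pointed at a test directory, and kernel threads (which have no `VmRSS` line) are skipped. Output columns are resident kB, PID, and process name.

```shell
# top_rss [N] [PROC_ROOT]
# Lists the N processes with the largest VmRSS from PROC_ROOT/<pid>/status.
top_rss() {
  n=${1:-10}
  proc_root=${2:-/proc}
  for f in "$proc_root"/[0-9]*/status; do
    [ -r "$f" ] || continue              # process may have exited meanwhile
    awk -v pid="$(basename "$(dirname "$f")")" '
      $1 == "Name:"  { name = $0; sub(/^Name:[ \t]*/, "", name) }
      $1 == "VmRSS:" { rss = $2 }        # value is in kB
      END { if (rss != "") printf "%s %s %s\n", rss, pid, name }' "$f" 2>/dev/null
  done | sort -rn | head -n "$n"
}
```

Usage: `top_rss 10` on the Redis host. Anything large that is not `redis-server` is a candidate to stop or move elsewhere.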

Recover the pages

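Restart Redis. On startup, the dataset loads from RDB or AOF into fresh physical memory, and the old process's swapped pages are released when it exits. If persistence is disabled, a restart comes back with an empty dataset. Restart only after the pressure source is gone, or the reloaded dataset can be swapped out again.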

If a restart is not immediately feasible and you have root access, swapoff -a && swapon -a forces the kernel to move swapped pages back to RAM if capacity exists. Warning: if insufficient free memory exists, this command can hang for minutes or hours and may trigger the OOM killer. Do not run it on a memory-constrained host without an escape plan.
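Before attempting swapoff, a precheck can compare swap in use with what the kernel estimates it can still hand out. A sketch reading `/proc/meminfo`; `swapoff_precheck` is hypothetical, the 1.2 safety margin is an assumption, and `MemAvailable` is only present on kernels 3.14 and later (if it is missing, the check fails closed). Swap used here counts every process, not just Redis, which errs on the safe side.

```shell
# swapoff_precheck [MEMINFO_FILE] [MARGIN]
# Exit 0 only if MemAvailable covers swap in use times MARGIN (default 1.2,
# an assumption of this sketch, not a kernel rule).
swapoff_precheck() {
  awk -v margin="${2:-1.2}" '
    $1 == "MemAvailable:" { avail = $2 }   # all values in kB
    $1 == "SwapTotal:"    { total = $2 }
    $1 == "SwapFree:"     { free = $2 }
    END {
      used = total - free
      printf "swap_used_kB=%.0f mem_available_kB=%.0f\n", used, avail
      if (avail >= used * margin) exit 0
      exit 1
    }' "${1:-/proc/meminfo}"
}
```

Usage, as root: `swapoff_precheck && swapoff -a && swapon -a` only proceeds when the check passes. MemAvailable is an estimate, so a pass reduces the risk described above but does not remove it.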

Do not rely on MEMORY PURGE. It instructs jemalloc to release dirty pages to the OS, but swapped pages are already on disk, not in allocator arenas. It will not reload them.

Right-size the host

Persistent instances need roughly 50% headroom above used_memory_rss to survive fork copy-on-write without pressuring the OS into swap. Cache-only instances still need headroom for client buffers, replication backlogs, and fragmentation. If the host cannot provide this, shard the dataset or move to larger hardware before the next persistence event triggers the same failure.
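As a worked example of the persistent-instance rule: with 50% headroom, an 8 GiB used_memory_rss calls for about 12 GiB of RAM for Redis alone. The sketch below applies that arithmetic; `headroom_check` is a hypothetical helper, and it ignores other processes on the host, whose memory must fit in whatever remains.

```shell
# headroom_check RSS_BYTES HOST_RAM_BYTES
# Applies the rule of thumb above: host RAM should cover used_memory_rss
# plus roughly 50% for fork copy-on-write. Exit 1 when it does not.
headroom_check() {
  awk -v rss="$1" -v ram="$2" 'BEGIN {
    need = rss * 1.5
    printf "need=%.0f have=%.0f\n", need, ram
    if (ram < need) exit 1
  }'
}
```

For example, `headroom_check 8589934592 17179869184` (8 GiB RSS on a 16 GiB host) prints `need=12884901888 have=17179869184` and passes; a 12 GiB RSS on the same host fails.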

Prevention

Set vm.swappiness to 0 or 1. This reduces the kernel’s tendency to swap anonymous pages, though it does not eliminate swap under severe memory pressure.
Account for COW in capacity planning. For instances with RDB or AOF enabled, maintain used_memory below 50% of physical RAM. For cache-only instances, stay below 75% of available memory.
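Disable Transparent Huge Pages. With THP enabled, a write to a shared page after fork() can copy a whole 2 MB huge page instead of a 4 KB page, which inflates COW memory and latency; Redis can log a THP warning at startup for this reason. The more memory each background save duplicates, the more likely a tight host is pushed into swap.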
Set maxmemory and enforce host-level headroom. A Redis instance with no limit grows until the OOM killer intervenes. Even with maxmemory configured, ensure the host has enough RSS headroom that the OS never needs to reclaim Redis pages.
Monitor RSS, not just used_memory. The swap subsystem and the OOM killer operate on RSS. A healthy used_memory means nothing if used_memory_rss is being reclaimed.
In containerized environments, ensure the container memory limit includes COW overhead. A limit equal to maxmemory leaves no room for copy-on-write pages or client buffers, so the next background save is likely to end in an OOM kill or, if the cgroup permits swap, in swap death.
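The two kernel settings from this list can be audited together. A sketch with a hypothetical `host_tuning_report` helper; paths default to the standard Linux locations and are arguments only for testing. It treats swappiness at or below 1 and THP set to `never` as the target, matching the recommendations above.

```shell
# host_tuning_report [SWAPPINESS_FILE] [THP_FILE]
# Prints the current values and exits 0 only if both match the targets.
host_tuning_report() {
  swp=$(cat "${1:-/proc/sys/vm/swappiness}") || return 1
  # The active THP mode is the bracketed word, e.g. "always madvise [never]".
  thp=$(sed -n 's/.*\[\([a-z]*\)\].*/\1/p' \
    "${2:-/sys/kernel/mm/transparent_hugepage/enabled}") || return 1
  echo "swappiness=$swp thp=$thp"
  [ "$swp" -le 1 ] && [ "$thp" = "never" ]
}
```

To apply the settings as root on a running system: `sysctl -w vm.swappiness=1` and `echo never > /sys/kernel/mm/transparent_hugepage/enabled`. Neither change persists across a reboot on its own.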

How Netdata helps

Correlates mem_fragmentation_ratio, used_memory, and used_memory_rss on the same charts to expose the swap gap.
Alerts on mem_fragmentation_ratio < 0.8 when used_memory > 100 MB, suppressing noise from small instances where process overhead dominates.
Surfaces system-level swap metrics alongside Redis metrics to confirm OS-level swapping.
Tracks latest_fork_usec spikes that precede COW-induced memory pressure.
Displays allocator_frag_ratio and allocator_rss_ratio on Redis 4.0+ instances to help distinguish true swap from allocator artifacts.

The Netdata solution

Redis monitoring with Netdata

Netdata monitors Redis with per-second metrics and ML anomaly detection. Track memory usage and fragmentation, fork/COW latency, replication backlog, evictions, and connection pressure to spot the failure modes in these runbooks early.

