Redis OOM-killed by the kernel: RSS, overcommit, and recovery

Redis reports used_memory at 60% of maxmemory, then disappears. The container status is OOMKilled, or dmesg shows the kernel OOM killer selected redis-server. The kernel enforces resident memory (RSS), while used_memory and maxmemory track logical allocator state. Fragmentation, copy-on-write pages during persistence, and client buffers inflate RSS above the logical figure most operators monitor. When RSS hits the host or cgroup memory ceiling, the kernel terminates the process even though Redis believes it is within limits.

During a background save or AOF rewrite, Redis forks a child process. Under Linux copy-on-write semantics, the child shares pages with the parent until either modifies them. On a write-heavy instance, each page the parent modifies is physically duplicated, which can temporarily push RSS toward twice the normal working set. If the system or container is sized only for logical used_memory, an OOM kill during persistence becomes likely. Tuning eviction policies does not address this; size the host for the physical memory Redis actually occupies.

What this means

The kernel OOM killer targets processes by RSS, not Redis used_memory. RSS includes allocator fragmentation, shared libraries, client output buffers, and copy-on-write dirty pages. A Redis instance reporting used_memory of 2.75 GB can have used_memory_rss of 4.12 GB and be killed because the kernel sees 4.12 GB resident.

maxmemory does not protect against kernel OOM kills. It caps Redis’s logical allocator, but the OOM killer acts on physical RSS. If fragmentation or COW doubles RSS, the kernel may kill the process before maxmemory is reached. The gap between used_memory and used_memory_rss is the danger zone.
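
To make the gap concrete, here is a minimal sketch that computes the RSS-to-logical ratio from INFO memory output. The function name rss_to_logical_ratio is illustrative and not part of Redis; it assumes the INFO text is piped in on stdin.

```shell
# Hypothetical helper: reads `redis-cli INFO memory` output on stdin and
# prints used_memory_rss / used_memory to two decimals.
rss_to_logical_ratio() {
  awk -F: '
    # "+ 0" forces numeric conversion and ignores the trailing CR on INFO lines
    $1 == "used_memory"     { used = $2 + 0 }
    $1 == "used_memory_rss" { rss  = $2 + 0 }
    END {
      if (used > 0) printf "%.2f\n", rss / used
      else exit 1   # used_memory missing or zero
    }'
}

# Usage against a live instance (assumes redis-cli can reach it):
#   redis-cli INFO memory | rss_to_logical_ratio
```

With the figures from the example above (2.75 GB logical, 4.12 GB resident) the ratio is about 1.5, which is the kind of gap that maxmemory alone does not account for.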

flowchart TD
    A[Fork for BGSAVE or AOF rewrite] --> B[COW duplicates dirty pages]
    B --> C[used_memory_rss spikes]
    C --> D{Memory limit reached?}
    D -->|Yes| E[Kernel OOM killer]
    E --> F[Redis terminated]
    F --> G[Restart with cold cache]
    G --> H[Client thundering herd]

Common causes

Cause	What it looks like	First thing to check
Allocator fragmentation	`mem_fragmentation_ratio` sustained above 1.5, stable `used_memory`, climbing `used_memory_rss`	`redis-cli INFO memory` for `mem_fragmentation_ratio` and `allocator_frag_ratio`
COW bloat during persistence	RSS doubles while `rdb_bgsave_in_progress` or `aof_rewrite_in_progress` is 1	`redis-cli INFO persistence` for `rdb_last_cow_size` or `aof_last_cow_size`
vm.overcommit_memory = 0	Fork fails or succeeds without headroom for COW pages; Redis logs “Cannot allocate memory” or the kernel OOM killer fires mid-save	`cat /proc/sys/vm/overcommit_memory`
Client output buffer accumulation	A single client or replica consumes hundreds of megabytes; `used_memory` climbs slowly but RSS jumps	`redis-cli CLIENT LIST` and inspect `omem` values
Transparent Huge Pages enabled	Fork latency spikes and COW copies 2 MB pages instead of 4 KB, amplifying RSS	`cat /sys/kernel/mm/transparent_hugepage/enabled`

Quick checks

# Logical vs physical memory
redis-cli INFO memory | grep -E "used_memory:|used_memory_rss:"

# Fragmentation ratio
redis-cli INFO memory | grep mem_fragmentation_ratio

# Kernel OOM evidence
sudo dmesg | grep -i "killed process"

# Overcommit policy
cat /proc/sys/vm/overcommit_memory

# THP status
cat /sys/kernel/mm/transparent_hugepage/enabled

# Persistence activity and recent COW size
redis-cli INFO persistence | grep -E "cow_size|bgsave_in_progress|rewrite_in_progress"

# Recent fork latency
redis-cli INFO stats | grep latest_fork_usec

# Client buffer bloat
redis-cli CLIENT LIST | awk -F'[= ]' '{for(i=1;i<=NF;i++) if($i=="omem") print $(i+1)}' | sort -rn | head -20

# Configured memory limit
redis-cli CONFIG GET maxmemory

How to diagnose it

Confirm a kernel OOM event. Check dmesg or /var/log/kern.log for Killed process near the time of death. Compare with uptime_in_seconds from INFO server; a sudden reset confirms a restart.
Measure the RSS-to-logical gap. Run redis-cli INFO memory and compare used_memory_rss to used_memory. A ratio above 1.5 on an instance with more than 100 MB of data signals meaningful overhead.
Identify COW as the trigger. Check rdb_last_cow_size and aof_last_cow_size in INFO persistence. If either exceeds 50% of used_memory, the last fork duplicated enough pages to push RSS toward the limit. On Redis 7.0+, monitor current_cow_peak during active forks.
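Check vm.overcommit_memory. If it is 0 (the default), the kernel applies a heuristic and can refuse a fork when it judges that duplicating the parent’s address space would overcommit memory. This either causes fork failures (Redis logs “Cannot allocate memory”) or, when the fork succeeds on a nearly full host, leaves little margin for COW growth, making mid-operation OOM kills likely.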
Audit client buffers. Run CLIENT LIST and look for large omem values. The default client-output-buffer-limit normal 0 0 0 means no limit for normal clients, so a slow subscriber or forgotten MONITOR session can consume gigabytes of RSS.
Inspect THP status. If /sys/kernel/mm/transparent_hugepage/enabled is not [never], a single-byte write during a fork can duplicate an entire 2 MB huge page instead of a 4 KB standard page, multiplying COW overhead.
Distinguish from Redis-level OOM. If maxmemory was reached with a noeviction policy, Redis returns -OOM errors tracked in errorstat_OOM (Redis 6.2+) rather than being killed by the kernel. Kernel OOM and Redis OOM require different fixes.
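
The first and last steps can be sketched as a small log filter. This is an illustration under assumptions: the name classify_redis_oom_kills is hypothetical, and the pattern assumes the message format of recent kernels, where the cgroup case is prefixed with "Memory cgroup out of memory" on the same line as "Killed process". Older kernels format these lines differently.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints one line per
# redis-server OOM kill, labelled by whether a cgroup or the whole host ran out.
classify_redis_oom_kills() {
  awk '
    /Killed process [0-9]+ \(redis-server\)/ {
      scope = ($0 ~ /Memory cgroup out of memory/) ? "cgroup" : "host"
      pid = $0
      sub(/.*Killed process /, "", pid)   # drop everything before the PID
      sub(/ .*/, "", pid)                 # drop everything after the PID
      print scope, pid
    }'
}

# Usage (assumes permission to read the kernel log):
#   sudo dmesg | classify_redis_oom_kills
# For the Redis-level case, look at the error counters instead (Redis 6.2+):
#   redis-cli INFO errorstats | grep errorstat_OOM
```

An empty result from the filter combined with a non-zero errorstat_OOM counter points to Redis-level OOM rather than a kernel kill.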

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`used_memory_rss / total_system_memory`	Kernel OOM killer uses RSS, not logical memory	> 0.75 on persistent instances that fork
`mem_fragmentation_ratio`	Allocator fragmentation inflates RSS silently	Sustained > 1.5 on instances > 100 MB
`rdb_last_cow_size` / `aof_last_cow_size`	Pages duplicated during the last fork	> 50% of `used_memory`
`latest_fork_usec`	Long forks block the event loop and correlate with heavy COW	> 500,000 µs (500 ms)
`current_cow_peak` (Redis 7.0+)	Real-time COW memory during an active fork	Approaching container or host limit
`allocator_frag_ratio` (Redis 4.0+)	Isolates allocator waste from process overhead	Sustained > 1.5
Client `omem`	Output buffers are allocated from the heap and count toward RSS	Any single client > 256 MB
`uptime_in_seconds`	Detects unexpected restarts after OOM kills	Sudden drop or reset
`errorstat_OOM` (Redis 6.2+)	Distinguishes Redis-level OOM errors from kernel kills	Non-zero rate
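
As a rough sketch, three of the thresholds in the table can be evaluated with a few lines of shell. The function name and argument order are assumptions made for this example; a monitoring system would normally do this for you.

```shell
# Hypothetical helper: evaluate three signals against the warning thresholds
# in the table. All arguments are plain integers.
#   $1 used_memory_rss (bytes)   $2 host or cgroup memory limit (bytes)
#   $3 used_memory (bytes)       $4 last COW size (bytes)
#   $5 latest_fork_usec (microseconds)
check_redis_memory_signals() {
  awk -v rss="$1" -v limit="$2" -v used="$3" -v cow="$4" -v fork_us="$5" '
    BEGIN {
      status = 0
      if (limit > 0 && rss / limit > 0.75) { print "WARN rss_ratio"; status = 1 }
      if (used > 0 && cow / used > 0.50)   { print "WARN cow_size";  status = 1 }
      if (fork_us > 500000)                { print "WARN fork_time"; status = 1 }  # 500 ms
      if (status == 0) print "OK"
      exit status
    }'
}

# Example: 4.12 GB resident against a 5 GB limit trips only the RSS check.
#   check_redis_memory_signals 4120000000 5000000000 2750000000 500000000 120000
```

The non-zero exit status on any warning makes the function usable from a cron job or health check script.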

Fixes

Reduce RSS from fragmentation

Run MEMORY PURGE to return jemalloc dirty pages to the OS (it has no effect with other allocators). This reduces RSS briefly but cannot defragment live objects. For persistent fragmentation, enable activedefrag yes (Redis 4.0+, jemalloc builds only) to compact live allocations in the background. Active defrag consumes main-thread CPU; cap active-defrag-cycle-max to avoid latency spikes.

Right-size for COW during persistence

For instances with RDB or AOF enabled, keep used_memory below 50% of physical RAM so a worst-case COW spike does not hit the ceiling. If you cannot add memory, disable automatic save directives and schedule BGSAVE during low-traffic windows. Tradeoff: wider RPO and manual operational burden. Alternatively, set appendonly no if AOF rewrite COW is the primary trigger, though this sacrifices AOF durability.
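
The 50% rule is simple arithmetic. The sketch below assumes you already know the memory limit in bytes; suggest_maxmemory is a hypothetical name, and the percentage is adjustable if your measured COW overhead is lower.

```shell
# Hypothetical helper: print a maxmemory value in bytes as a percentage of the
# memory limit. Defaults to the 50% ceiling discussed above.
#   $1 memory limit in bytes   $2 optional percentage (default 50)
suggest_maxmemory() {
  echo $(( $1 * ${2:-50} / 100 ))
}

# Usage on a host with 8 GiB available to Redis:
#   redis-cli CONFIG SET maxmemory "$(suggest_maxmemory 8589934592)"
# To disable automatic snapshots and schedule BGSAVE yourself:
#   redis-cli CONFIG SET save ""
```

Changes made with CONFIG SET are lost on restart unless the same values are also written to redis.conf.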

Fix overcommit and THP

Set vm.overcommit_memory = 1 so fork() succeeds without requiring free RAM equal to the parent’s RSS. With this setting the kernel no longer refuses allocations up front, so you must rely on your own capacity planning rather than the kernel’s heuristic.

Disable Transparent Huge Pages:

# Warning: run as root. Applies immediately but resets on reboot.
# Persist via init scripts or systemd tmpfiles to survive reboot.
echo never > /sys/kernel/mm/transparent_hugepage/enabled

Tradeoff: slightly higher TLB pressure for some workloads, but fork latency improves and COW page granularity drops from 2 MB to 4 KB.
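
One way to make both settings survive a reboot is a pair of drop-in files. This is a sketch that assumes a systemd-based distribution; the function name and the file names 99-redis.conf and redis-thp.conf are arbitrary choices for illustration.

```shell
# Hypothetical helper: write drop-in files that re-apply both settings at boot.
# $1 is an optional root prefix so the output can be inspected before
# installing; leave it empty (and run as root) to write under /etc.
write_redis_host_tuning() {
  root=${1:-}
  mkdir -p "$root/etc/sysctl.d" "$root/etc/tmpfiles.d"
  # sysctl.d drop-in: applied at boot, or immediately via `sysctl --system`
  printf 'vm.overcommit_memory = 1\n' > "$root/etc/sysctl.d/99-redis.conf"
  # tmpfiles.d "w" entry: systemd writes the argument into the path at boot
  printf 'w /sys/kernel/mm/transparent_hugepage/enabled - - - - never\n' \
    > "$root/etc/tmpfiles.d/redis-thp.conf"
}

# Dry run into a scratch directory:
#   write_redis_host_tuning /tmp/redis-tuning && find /tmp/redis-tuning -type f
```

Writing to a prefix first lets you review the generated files before they affect the host.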

Contain client buffers

Set explicit output buffer limits for normal clients instead of the default unlimited:

# The class and its three limits must be passed as one quoted argument
redis-cli CONFIG SET client-output-buffer-limit "normal 64mb 32mb 60"

Add the same directive to redis.conf to survive restart.

Tradeoff: slow clients are forcibly disconnected. Audit CLIENT LIST for MONITOR sessions, which copy every command to an output buffer and can OOM an instance within minutes under load.
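
To find the offending clients rather than just the largest buffer sizes, a sketch like the following keeps the client id and address next to each omem value. The function name is hypothetical and the default threshold of 256 MB is an assumption taken from the warning sign in the metrics table.

```shell
# Hypothetical helper: reads `redis-cli CLIENT LIST` output on stdin and prints
# "omem id addr" for clients whose output buffer exceeds a byte threshold.
#   $1 optional threshold in bytes (default 268435456 = 256 MB)
clients_over_omem() {
  awk -v threshold="${1:-268435456}" '
    {
      id = ""; addr = ""; omem = 0
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")               # each field is key=value
        if (kv[1] == "id")   id = kv[2]
        if (kv[1] == "addr") addr = kv[2]
        if (kv[1] == "omem") omem = kv[2] + 0
      }
      if (omem > threshold) print omem, id, addr
    }' | sort -rn
}

# Usage:
#   redis-cli CLIENT LIST | clients_over_omem
# A client found this way can be disconnected by id:
#   redis-cli CLIENT KILL ID <id>
```

Sorting by omem first puts the client most likely to be responsible for the RSS growth at the top.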

Prevention

Set maxmemory to leave headroom for RSS overhead, not just logical data. On persistent instances, treat 50% of available RAM as the practical ceiling for used_memory.
Monitor used_memory_rss and alert on it approaching the host or cgroup limit. Do not rely solely on used_memory or maxmemory ratio alerts.
Size repl-backlog-size to at least 100 MB to avoid replica disconnections that trigger full resyncs and additional forks.
Keep vm.overcommit_memory=1 and THP disabled on all Redis hosts. Verify both at provisioning time and after kernel upgrades.
Run redis-cli --bigkeys or MEMORY USAGE sampling periodically to catch single keys that disproportionately expand the dataset and COW cost.
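
The host-level checks above can be scripted as a preflight. This is a minimal sketch; the function name is hypothetical, and the file paths are parameters only so the logic can be exercised against fixture files.

```shell
# Hypothetical preflight: verifies overcommit and THP settings.
#   $1 optional path to the overcommit file   $2 optional path to the THP file
redis_host_preflight() {
  overcommit_file=${1:-/proc/sys/vm/overcommit_memory}
  thp_file=${2:-/sys/kernel/mm/transparent_hugepage/enabled}
  status=0
  if [ "$(cat "$overcommit_file")" != "1" ]; then
    echo "FAIL vm.overcommit_memory is not 1"; status=1
  fi
  # The active THP mode is the bracketed word, e.g. "always madvise [never]"
  if ! grep -q '\[never\]' "$thp_file"; then
    echo "FAIL transparent huge pages are not disabled"; status=1
  fi
  [ "$status" -eq 0 ] && echo "PASS"
  return "$status"
}

# Usage at provisioning time or after a kernel upgrade:
#   redis_host_preflight || echo "fix host settings before starting Redis"
```

Running it from provisioning tooling catches hosts where a kernel upgrade or image change silently reverted either setting.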

How Netdata helps

Collects used_memory_rss, used_memory, and mem_fragmentation_ratio from INFO memory, correlating them with system RAM and container cgroup metrics to expose the gap that leads to kernel OOM kills.
Tracks rdb_last_cow_size, aof_last_cow_size, and latest_fork_usec to correlate persistence events with RSS spikes.
The redis.instance_available alarm triggers on uptime_in_seconds resets, surfacing OOM-killed restarts immediately.
Surfaces RSS-based memory usage alongside logical allocator memory, making fragmentation and COW bloat visible before the kernel intervenes.
On Kubernetes, monitors container memory.working_set and memory.limit, distinguishing cgroup OOM from global kernel OOM.
