Redis Linux kernel tuning: vm.overcommit_memory, swappiness, and NUMA
Redis uses fork-based copy-on-write for background saves, AOF rewrites, and full replication resyncs. Linux defaults for memory overcommit, swap, page size, and socket queuing suit general-purpose workloads, not an in-memory store that clones multi-gigabyte address spaces. Left unchanged, they produce intermittent Cannot allocate memory errors during BGSAVE, 10-100x latency spikes during AOF rewrite, and silent OOM kills. This guide covers the five host-level tunables, the failure mode each prevents, and the production signals that expose a misconfigured host.
The fork path: where kernel defaults bite
flowchart TD
A[BGSAVE AOF rewrite or full resync] -->|fork| B[Copy-on-Write]
B --> C{vm.overcommit_memory}
C -->|0 or 2| D[fork fails<br/>Cannot allocate memory]
C -->|1| E[fork succeeds]
E --> F{THP enabled?}
F -->|yes| G[2MB page copies<br/>500x amplification]
F -->|no| H[4KB page copies]
G --> I[Latency spike RSS doubles]
H --> J[Normal COW overhead]
E --> K{vm.swappiness}
K -->|0| L[OOM killer targets Redis]
K -->|1| M[Emergency swap preserved]
E --> N{NUMA layout}
N -->|single-node bind| O[Local memory exhaustion]
N -->|interleave=all| P[Even cross-node allocation]
vm.overcommit_memory: making fork reliable
Redis forks a child for BGSAVE, BGREWRITEAOF, and full replication resyncs. The kernel must duplicate the parent page tables. With vm.overcommit_memory=0 (the default on most distributions), the kernel heuristically checks whether free memory can satisfy the child’s worst-case demand. On a busy host where the parent already holds most RAM, this check fails and the fork is denied. Redis logs Can't save in background: fork: Cannot allocate memory and, if stop-writes-on-bgsave-error is enabled (the default), rejects all subsequent write commands.
The refusal does not require memory to be truly exhausted. The heuristic judges the fork by how much the child could touch, not by the few pages copy-on-write will actually duplicate, so it can deny a fork that would have succeeded. vm.overcommit_memory=1 tells the kernel to always permit fork() and rely on COW to avoid actual exhaustion. Redis warns at startup when this is not set to 1.
Check:
sysctl vm.overcommit_memory
Apply and persist:
sysctl vm.overcommit_memory=1
echo 'vm.overcommit_memory=1' >> /etc/sysctl.conf
vm.overcommit_memory is a global host kernel parameter; it is not namespaced. Containers inherit the host setting. If the host is wrong, every container on that node is affected. You cannot change this from inside a container. Set it on the node directly.
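The three overcommit modes can be summarized in a small check. This is a minimal sketch: `describe_overcommit` is a hypothetical helper name, not part of Redis or procps, and the wording of each verdict is a paraphrase rather than kernel output.

```bash
#!/usr/bin/env bash
# describe_overcommit VALUE
# Hypothetical helper: map a vm.overcommit_memory value to its effect on fork().
describe_overcommit() {
  case "$1" in
    0) echo "heuristic: fork may fail when Redis holds most of RAM" ;;
    1) echo "always: fork is always permitted" ;;
    2) echo "strict: fork fails once the commit limit is reached" ;;
    *) echo "unknown value: $1" ;;
  esac
}

# Read the live value when the file exists (Linux only); skipped elsewhere.
if [ -r /proc/sys/vm/overcommit_memory ]; then
  describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
fi
```

Because the parameter is not namespaced, running this inside a container reports the node's value, which makes it a quick way to audit a host from within a pod.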
vm.swappiness: keeping Redis out of swap without inviting the OOM killer
Redis on swap is dead. A swapped page adds disk latency to the single-threaded event loop and causes cascading timeouts. The old advice to set vm.swappiness=0 is harmful on modern kernels. On Linux 3.5 and later, 0 tells the kernel to avoid swapping anonymous memory until it has almost no other reclaimable pages left. During a temporary memory spike, such as a fork COW burst or sudden client buffer expansion, the kernel has no graceful reclaim path and the OOM killer can terminate Redis directly.
Set vm.swappiness=1. This makes swapping nearly impossible under normal conditions while preserving an emergency margin that lets the kernel swap cold pages rather than kill the process.
Check:
sysctl vm.swappiness
Apply and persist:
sysctl vm.swappiness=1
echo 'vm.swappiness=1' >> /etc/sysctl.conf
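Appending with `>>` adds a duplicate line every time the command is rerun. One possible alternative is an idempotent helper that replaces an existing entry or appends a new one. This is a sketch: `persist_sysctl` is a hypothetical function name, and it assumes GNU sed (`-i`, `-E`) as found on typical Linux hosts.

```bash
#!/usr/bin/env bash
# persist_sysctl KEY VALUE FILE
# Hypothetical helper: replace an existing "KEY=..." line in FILE,
# or append one if the key is not present yet.
persist_sysctl() {
  local key="$1" value="$2" file="$3" pattern
  # Escape dots so "vm.swappiness" is matched literally by the regex.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  touch "$file"
  if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
    sed -i -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, `persist_sysctl vm.swappiness 1 /etc/sysctl.conf` can be run repeatedly and leaves a single `vm.swappiness=1` line. The same helper works for the vm.overcommit_memory and net.core.somaxconn entries in this guide.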
Transparent Huge Pages: the 500x copy-on-write multiplier
With Transparent Huge Pages (THP) enabled, the kernel allocates memory in 2 MB pages instead of 4 KB. During a fork, the child shares the parent’s pages via copy-on-write. If a write touches a single byte in a 2 MB huge page, the kernel copies the entire 2 MB page. This amplifies COW overhead by roughly 500x. The result is a sudden RSS spike, doubled memory usage, and main-thread freezes measured in hundreds of milliseconds or seconds. THP is the most commonly missed host-level tuning and the leading cause of fork latency incidents.
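The "roughly 500x" figure is the ratio of the two page sizes. The arithmetic below is a worked sketch: `cow_copy_bytes` is a hypothetical helper, and the figure of 10,000 scattered writes is an illustrative assumption, not a measurement.

```bash
#!/usr/bin/env bash
# cow_copy_bytes WRITES PAGE_SIZE_BYTES
# Worst case: every write lands on a different shared page,
# so each write forces the kernel to copy one full page.
cow_copy_bytes() {
  echo $(( $1 * $2 ))
}

small=$((4 * 1024))          # 4 KB base page
huge=$((2 * 1024 * 1024))    # 2 MB transparent huge page

echo "amplification: $(( huge / small ))x"   # prints "amplification: 512x"

# Illustrative workload: 10000 writes during a fork, each touching a distinct page.
echo "4 KB pages: $(( $(cow_copy_bytes 10000 "$small") / 1024 / 1024 )) MB copied"   # 39 MB
echo "2 MB pages: $(( $(cow_copy_bytes 10000 "$huge") / 1024 / 1024 )) MB copied"    # 20000 MB
```

Under these assumptions the same write pattern copies about 39 MB with base pages and about 20 GB with huge pages, which is why RSS can approach double the dataset size during a save.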
Disable THP:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
Persist across reboots by adding those commands to /etc/rc.local or a systemd ExecStartPre directive. Verify:
cat /sys/kernel/mm/transparent_hugepage/enabled
The output should show [never], as in always madvise [never]. If you cannot disable THP host-wide, madvise mode (reported as always [madvise] never) suppresses the Redis startup warning but does not eliminate the latency risk during fork.
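The active THP mode is the bracketed word in that file, so a verification step can extract it rather than match the whole line. A minimal sketch, with `thp_active_mode` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# thp_active_mode "always [madvise] never"  ->  madvise
# Hypothetical helper: print the word enclosed in square brackets.
thp_active_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
# Only inspect the live setting when the file exists (Linux with THP support).
if [ -r "$thp_file" ]; then
  mode=$(thp_active_mode "$(cat "$thp_file")")
  if [ "$mode" = "never" ]; then
    echo "THP disabled"
  else
    echo "THP mode is '$mode'; expected 'never'"
  fi
fi
```

The same function can be pointed at the contents of the defrag file, which uses the same bracket convention.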
NUMA placement on multi-socket hosts
On multi-socket physical servers, the kernel’s NUMA policy determines which memory nodes Redis uses. Without explicit placement, the kernel may concentrate allocations on one node. That node can exhaust local memory, forcing remote-node access or premature OOM kills even when aggregate system memory is sufficient. You may see one NUMA node at 95% while another sits at 30%.
Start Redis with numactl --interleave=all to spread allocations evenly:
numactl --interleave=all redis-server /etc/redis/redis.conf
Avoid binding Redis to a single node with --membind. That restricts allocations to the one node and risks exhausting its capacity. Automatic NUMA balancing helps dynamic workloads, but explicit interleaving is safer for long-running Redis instances with large, stable datasets. Most cloud VMs are single-socket and do not need this tuning. Check first:
numactl --hardware
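To spot the kind of imbalance described above, the per-node size and free lines from `numactl --hardware` can be turned into a used percentage. This is a sketch that assumes the usual `node N size: X MB` and `node N free: Y MB` line format; `numa_used_pct` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# numa_used_pct: read `numactl --hardware` output on stdin and
# print "node <id> <used%>" for every node that reports size and free.
numa_used_pct() {
  awk '
    $1 == "node" && $3 == "size:" { size[$2] = $4 }
    $1 == "node" && $3 == "free:" && size[$2] > 0 {
      printf "node %s %d%%\n", $2, (size[$2] - $4) * 100 / size[$2]
    }
  '
}
```

On a multi-socket host, `numactl --hardware | numa_used_pct` prints one line per node. A large gap between nodes suggests allocations are concentrated on one of them.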
TCP backlog and somaxconn alignment
The effective Redis listen queue depth is the lesser of tcp-backlog in redis.conf (default 511) and /proc/sys/net/core/somaxconn (default 128 on kernels before Linux 5.4, 4096 from 5.4 onward). When somaxconn is lower, the kernel silently truncates the queue. Under connection storms, the truncated queue overflows and new connection attempts are dropped or time out before maxclients is reached. Redis logs the mismatch at startup:
WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128
Raise the kernel limits:
sysctl net.core.somaxconn=65535
sysctl net.ipv4.tcp_max_syn_backlog=65535
echo 'net.core.somaxconn=65535' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_syn_backlog=65535' >> /etc/sysctl.conf
Ensure tcp-backlog in redis.conf is at least 511, or higher if your workload sees thundering-herd reconnections after failover. Each connection consumes one file descriptor. The effective maxclients ceiling is roughly (ulimit -n) - 32. If you raise the backlog, also set LimitNOFILE to at least 65536 in your systemd unit or limits.conf.
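The two sizing rules in this section are simple enough to compute directly. The sketch below uses hypothetical helper names (`effective_backlog`, `maxclients_ceiling`) and the values quoted in the text.

```bash
#!/usr/bin/env bash
# effective_backlog TCP_BACKLOG SOMAXCONN
# The kernel caps the listen queue at somaxconn, so the smaller value wins.
effective_backlog() {
  if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# maxclients_ceiling NOFILE_LIMIT
# Approximate ceiling: the descriptor limit minus the 32 noted above.
maxclients_ceiling() {
  echo $(( $1 - 32 ))
}

echo "tcp-backlog 511, somaxconn 128 -> queue depth $(effective_backlog 511 128)"   # 128
echo "nofile 65536 -> maxclients ceiling $(maxclients_ceiling 65536)"               # 65504
```

On a live host the inputs can come from `cat /proc/sys/net/core/somaxconn` and `ulimit -n`, run as the user that starts Redis.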
Do not use net.ipv4.tcp_tw_recycle. It was removed in Linux 4.12 and breaks NAT. net.ipv4.tcp_tw_reuse=1 remains valid on modern kernels.
Signals to watch in production
| Signal | Why it matters | Warning sign |
|---|---|---|
| latest_fork_usec | Main-thread freeze duration during fork. High values indicate THP, NUMA imbalance, or overcommit denial. | > 500 ms consistently, or sudden spikes after kernel changes. |
| used_memory_rss during rdb_bgsave_in_progress or aof_rewrite_in_progress | COW overhead pushes RSS toward the host memory limit. | RSS approaching total physical RAM during a save or rewrite. |
| mem_fragmentation_ratio | Ratio below 1.0 means pages have been swapped out. | Sustained ratio < 1.0 with used_memory > 100 MB. |
| rdb_last_cow_size / aof_last_cow_size | Actual memory cost of the last fork. Use to size host RAM. | > 50% of used_memory after a save or rewrite. |
| rejected_connections | New clients cannot connect. Indicates somaxconn truncation or true maxclients exhaustion. | Any rate > 0. |
| Redis startup warnings | Redis explicitly warns about vm.overcommit_memory, THP, and somaxconn. | Any warning in the startup log. |
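Several of these thresholds can be checked from a single INFO snapshot. The following is a rough sketch: `check_redis_info` is a hypothetical helper, it covers only three rows of the table, and it treats rejected_connections as a cumulative counter rather than a rate. It assumes the `field:value` line format of INFO output and strips the carriage returns Redis includes.

```bash
#!/usr/bin/env bash
# check_redis_info: read `redis-cli INFO` output on stdin and print one
# WARN line per tripped threshold. Prints nothing when all checks pass.
check_redis_info() {
  tr -d '\r' | awk -F: '
    { v[$1] = $2 }
    END {
      # latest_fork_usec is in microseconds; 500 ms = 500000 usec.
      if (v["latest_fork_usec"] + 0 > 500000)
        print "WARN latest_fork_usec=" v["latest_fork_usec"]
      # Ratio below 1.0 only matters once the dataset exceeds 100 MB.
      if (v["mem_fragmentation_ratio"] != "" &&
          v["mem_fragmentation_ratio"] + 0 < 1.0 &&
          v["used_memory"] + 0 > 100 * 1024 * 1024)
        print "WARN mem_fragmentation_ratio=" v["mem_fragmentation_ratio"]
      if (v["rejected_connections"] + 0 > 0)
        print "WARN rejected_connections=" v["rejected_connections"]
    }'
}
```

Typical use would be `redis-cli INFO | check_redis_info` from a cron job or health check. A single snapshot cannot show "consistently" or "sustained", so repeated sampling is still needed for those conditions.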
How Netdata helps
- Correlate latest_fork_usec with host-level CPU and memory metrics to distinguish kernel misconfiguration from dataset growth.
- Alert on used_memory_rss spikes that coincide with rdb_bgsave_in_progress or aof_rewrite_in_progress, catching COW-induced memory pressure before the OOM killer fires.
- Flag mem_fragmentation_ratio below 1.0, which indicates swap usage on modern kernels.
- Monitor system memory page fault rates alongside Redis latency to detect THP-related fork stalls.
- Surface per-second rejected_connections to identify TCP backlog truncation before application errors cascade.