Redis Linux kernel tuning: vm.overcommit_memory, swappiness, and NUMA

Redis uses fork-based copy-on-write for background saves, AOF rewrites, and full replication resyncs. Linux defaults for memory overcommit, swap, page size, and socket queuing suit general-purpose workloads, not an in-memory store that clones multi-gigabyte address spaces. Left unchanged, they produce intermittent Cannot allocate memory errors during BGSAVE, 10-100x latency spikes during AOF rewrite, and silent OOM kills. This guide covers the five host-level tunables, the failure mode each prevents, and the production signals that expose a misconfigured host.

The fork path: where kernel defaults bite

flowchart TD
    A[BGSAVE AOF rewrite or full resync] -->|fork| B[Copy-on-Write]
    B --> C{vm.overcommit_memory}
    C -->|0 or 2| D[fork fails<br/>Cannot allocate memory]
    C -->|1| E[fork succeeds]
    E --> F{THP enabled?}
    F -->|yes| G[2MB page copies<br/>500x amplification]
    F -->|no| H[4KB page copies]
    G --> I[Latency spike RSS doubles]
    H --> J[Normal COW overhead]
    E --> K{vm.swappiness}
    K -->|0| L[OOM killer targets Redis]
    K -->|1| M[Emergency swap preserved]
    E --> N{NUMA layout}
    N -->|single-node bind| O[Local memory exhaustion]
    N -->|interleave=all| P[Even cross-node allocation]

vm.overcommit_memory: making fork reliable

Redis forks a child for BGSAVE, BGREWRITEAOF, and full replication resyncs. The kernel must duplicate the parent page tables. With vm.overcommit_memory=0 (the default on most distributions), the kernel heuristically checks whether free memory can satisfy the child’s worst-case demand. On a busy host where the parent already holds most RAM, this check fails and the fork is denied. Redis logs Can't save in background: fork: Cannot allocate memory and, if stop-writes-on-bgsave-error is enabled (the default), rejects all subsequent write commands.

The refusal can happen even when the fork would have succeeded in practice: the heuristic budgets for the child potentially touching every page, although copy-on-write means it rarely copies more than a fraction of them. vm.overcommit_memory=1 tells the kernel to always permit fork() and rely on COW to avoid actual exhaustion. Redis warns at startup when this is not set to 1.

Check:

sysctl vm.overcommit_memory

Apply and persist:

sysctl vm.overcommit_memory=1
echo 'vm.overcommit_memory=1' >> /etc/sysctl.conf

vm.overcommit_memory is a global host kernel parameter; it is not namespaced. Containers inherit the host setting. If the host is wrong, every container on that node is affected. You cannot change this from inside a container. Set it on the node directly.
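A check like the following could gate a deploy script or health probe on the setting. It is a minimal sketch: overcommit_verdict and its messages are hypothetical names, not part of Redis or the kernel tooling, and the function only classifies a value passed to it without changing anything.

```shell
# Classify a vm.overcommit_memory value (hypothetical helper).
# Prints a one-line verdict; returns non-zero when fork may be refused.
overcommit_verdict() {
    case "$1" in
        1) echo "ok: fork is always permitted" ;;
        0) echo "warn: heuristic mode, fork may fail under memory pressure"; return 1 ;;
        2) echo "warn: strict accounting, fork may fail under memory pressure"; return 1 ;;
        *) echo "error: unexpected value '$1'"; return 2 ;;
    esac
}

# On a live host (run on the node, not inside a container you cannot trust):
#   overcommit_verdict "$(cat /proc/sys/vm/overcommit_memory)"
```

Because the parameter is not namespaced, running the check inside a container reports the host value, which is the value that matters.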

vm.swappiness: keeping Redis out of swap without inviting the OOM killer

Redis on swap is dead. A swapped page adds disk latency to the single-threaded event loop and causes cascading timeouts. The old advice to set vm.swappiness=0 is harmful on modern kernels. On Linux 3.5 and later, 0 disables swap except under OOM conditions. During a temporary memory spike, such as a fork COW burst or sudden client buffer expansion, the kernel has no graceful reclaim path and the OOM killer terminates Redis directly.

Set vm.swappiness=1. This makes swapping nearly impossible under normal conditions while preserving an emergency margin that lets the kernel swap cold pages rather than kill the process.

Check:

sysctl vm.swappiness

Apply and persist:

sysctl vm.swappiness=1
echo 'vm.swappiness=1' >> /etc/sysctl.conf
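To confirm that Redis itself has nothing in swap, the VmSwap line in /proc/<pid>/status is one place to look. The sketch below assumes that field is present on your kernel; swap_kb_from_status is a hypothetical helper name, and the pidof lookup in the usage comment assumes a single redis-server process on the host.

```shell
# Extract the VmSwap value (in kB) from /proc/<pid>/status text on stdin.
# Prints "unknown" when the field is missing.
swap_kb_from_status() {
    awk '$1 == "VmSwap:" { print $2; found = 1 }
         END { if (!found) print "unknown" }'
}

# On a live host (assumes exactly one redis-server process):
#   swap_kb_from_status < "/proc/$(pidof redis-server)/status"
```

Any value above 0 means part of the dataset has already left RAM and is worth investigating before the next fork.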

Transparent Huge Pages: the 500x copy-on-write multiplier

With Transparent Huge Pages (THP) enabled, the kernel allocates memory in 2 MB pages instead of 4 KB. During a fork, the child shares the parent’s pages via copy-on-write. If a write touches a single byte in a 2 MB huge page, the kernel copies the entire 2 MB page. This amplifies COW overhead by roughly 500x. The result is a sudden RSS spike, doubled memory usage, and main-thread freezes measured in hundreds of milliseconds or seconds. THP is the most commonly missed host-level tuning and the leading cause of fork latency incidents.
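The multiplier is simple arithmetic, sketched here for a worst case in which every write during the fork lands on a different page. The helper name cow_copy_bytes and the figure of 1000 writes are illustrative assumptions, not measurements.

```shell
# Worst-case bytes copied while a forked child exists, assuming the parent
# writes to `pages` distinct pages of `page_bytes` each (illustrative model).
cow_copy_bytes() {
    pages=$1
    page_bytes=$2
    echo $(( pages * page_bytes ))
}

small=$(cow_copy_bytes 1000 4096)       # 1000 writes landing on 4 KB pages
huge=$(cow_copy_bytes 1000 2097152)     # the same writes landing on 2 MB pages
echo "4KB pages: $small bytes, 2MB pages: $huge bytes, ratio: $(( huge / small ))x"
```

The ratio is 2 MB / 4 KB = 512, which is where the "roughly 500x" figure comes from. Real workloads cluster writes, so observed amplification is usually lower than this bound.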

Disable THP:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Persist across reboots by adding those commands to /etc/rc.local or a systemd ExecStartPre directive. Verify:

cat /sys/kernel/mm/transparent_hugepage/enabled

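The output lists all modes with the active one in brackets and should read always madvise [never]. If you cannot disable THP host-wide, madvise mode (reported as always [madvise] never) suppresses the Redis startup warning but does not eliminate the latency risk during fork for regions that the allocator or a library has opted in with madvise(MADV_HUGEPAGE).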
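For scripted verification, the active mode is the token inside the brackets. A minimal sketch, with thp_mode as a hypothetical helper name:

```shell
# Print the active THP mode (the bracketed token) from text on stdin.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# On a live host:
#   thp_mode < /sys/kernel/mm/transparent_hugepage/enabled
#   [ "$(thp_mode < /sys/kernel/mm/transparent_hugepage/enabled)" = "never" ] || echo "THP still active"
```

The pattern only matches lowercase letters, which is enough for the enabled file; other THP files may list modes that contain other characters.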

NUMA placement on multi-socket hosts

On multi-socket physical servers, the kernel’s NUMA policy determines which memory nodes Redis uses. Without explicit placement, the kernel may concentrate allocations on one node. That node can exhaust local memory, forcing remote-node access or premature OOM kills even when aggregate system memory is sufficient. You may see one NUMA node at 95% while another sits at 30%.

Start Redis with numactl --interleave=all to spread allocations evenly:

numactl --interleave=all redis-server /etc/redis/redis.conf

Avoid --membind with a single node. It restricts Redis allocations to that node and risks exhausting its capacity. Automatic NUMA balancing helps dynamic workloads, but explicit interleaving is safer for long-running Redis instances with large, stable datasets. Most cloud VMs are single-socket and do not need this tuning. Check first:

numactl --hardware
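The imbalance described above can be read off the size and free lines that numactl --hardware prints for each node. The sketch below assumes that output layout ("node N size: X MB" followed by "node N free: Y MB"); numa_node_usage is a hypothetical helper name.

```shell
# Summarise per-node memory use from `numactl --hardware` output on stdin.
# Prints one line per node, in the order the nodes appear.
numa_node_usage() {
    awk '
        $1 == "node" && $3 == "size:" { size[$2] = $4 }
        $1 == "node" && $3 == "free:" && size[$2] > 0 {
            used_pct = int((size[$2] - $4) * 100 / size[$2])
            printf "node %s used %d%%\n", $2, used_pct
        }
    '
}

# On a live multi-socket host:
#   numactl --hardware | numa_node_usage
```

This reports total memory use per node, not Redis alone. For a per-process breakdown, numastat -p with the Redis PID is the more direct tool where it is installed.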

TCP backlog and somaxconn alignment

The effective Redis listen queue depth is the lesser of tcp-backlog in redis.conf (default 511) and /proc/sys/net/core/somaxconn (default 128 on kernels before Linux 5.4, 4096 since). When somaxconn is lower, the kernel silently truncates the queue. Under connection storms, the accept queue overflows and clients see connection timeouts or resets before maxclients is reached. Redis logs the truncation at startup:

WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128

Raise the kernel limits:

sysctl net.core.somaxconn=65535
sysctl net.ipv4.tcp_max_syn_backlog=65535
echo 'net.core.somaxconn=65535' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_syn_backlog=65535' >> /etc/sysctl.conf

Ensure tcp-backlog in redis.conf is at least 511, or higher if your workload sees thundering-herd reconnections after failover. Each connection consumes one file descriptor. The effective maxclients ceiling is roughly (ulimit -n) - 32. If you raise the backlog, also set LimitNOFILE to at least 65536 in your systemd unit or limits.conf.
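Both limits in this section reduce to one-line calculations. The helpers below are a sketch with hypothetical names (effective_backlog, maxclients_ceiling); they take the values as arguments so they can be checked against planned settings before anything is applied.

```shell
# Effective listen queue depth: the lesser of tcp-backlog and somaxconn.
effective_backlog() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Approximate maxclients ceiling for a given file descriptor limit.
maxclients_ceiling() {
    echo $(( $1 - 32 ))
}

# On a live host (assumes redis-cli can reach the instance):
#   effective_backlog "$(redis-cli config get tcp-backlog | tail -n 1)" \
#                     "$(cat /proc/sys/net/core/somaxconn)"
#   maxclients_ceiling 65536
```

Note that ulimit -n in an interactive shell is not necessarily the limit the Redis process runs with; the value set through LimitNOFILE or limits.conf is the one to pass in.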

Do not use net.ipv4.tcp_tw_recycle. It was removed in Linux 4.12 and breaks NAT. net.ipv4.tcp_tw_reuse=1 remains valid on modern kernels.

Signals to watch in production

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| latest_fork_usec | Main-thread freeze duration during fork. High values indicate THP, NUMA imbalance, or overcommit denial. | > 500 ms consistently, or sudden spikes after kernel changes. |
| used_memory_rss during rdb_bgsave_in_progress or aof_rewrite_in_progress | COW overhead pushes RSS toward the host memory limit. | RSS approaching total physical RAM during a save or rewrite. |
| mem_fragmentation_ratio | Ratio below 1.0 means pages have been swapped out. | Sustained ratio < 1.0 with used_memory > 100 MB. |
| rdb_last_cow_size / aof_last_cow_size | Actual memory cost of the last fork. Use to size host RAM. | > 50% of used_memory after a save or rewrite. |
| rejected_connections | New clients cannot connect. Indicates somaxconn truncation or true maxclients exhaustion. | Any rate > 0. |
| Redis startup warnings | Redis explicitly warns about vm.overcommit_memory, THP, and somaxconn. | Any warning in the startup log. |
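Most of these signals are fields in the output of the Redis INFO command, so a spot check needs little more than a field extractor. The sketch below uses hypothetical helper names (info_field, cow_over_half) and takes the 50% COW threshold from the table above; INFO lines end in carriage returns, which the extractor strips.

```shell
# Read one field from `redis-cli info` output on stdin.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Succeeds when the last fork's COW size exceeds 50% of used_memory.
# $1 = rdb_last_cow_size or aof_last_cow_size, $2 = used_memory (bytes).
cow_over_half() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# On a live host (assumes redis-cli can reach the instance):
#   redis-cli info stats | info_field latest_fork_usec
#   cow=$(redis-cli info persistence | info_field rdb_last_cow_size)
#   used=$(redis-cli info memory | info_field used_memory)
#   cow_over_half "$cow" "$used" && echo "COW above 50% of used_memory"
```

latest_fork_usec is reported in microseconds, so the 500 ms warning level corresponds to a value of 500000.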

How Netdata helps

  • Correlate latest_fork_usec with host-level CPU and memory metrics to distinguish kernel misconfiguration from dataset growth.
  • Alert on used_memory_rss spikes that coincide with rdb_bgsave_in_progress or aof_rewrite_in_progress, catching COW-induced memory pressure before the OOM killer fires.
  • Flag mem_fragmentation_ratio below 1.0, which indicates swap usage on modern kernels.
  • Monitor system memory page fault rates alongside Redis latency to detect THP-related fork stalls.
  • Surface per-second rejected_connections to identify TCP backlog truncation before application errors cascade.