Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix
Redis logs Can't save in background: fork: Cannot allocate memory. free -h shows plenty of free RAM, yet BGSAVE or BGREWRITEAOF fails. If stop-writes-on-bgsave-error is yes (default), writes fail too. The gap between free RAM and fork failure is the key.
This is not a simple OOM. It is a kernel commit charge failure. Linux fork() must account for the worst case where every copy-on-write page is modified. With vm.overcommit_memory=0 (the default), the kernel enforces a heuristic commit limit. When Redis RSS is large, that limit blocks fork() even with free physical memory. The fix is usually one sysctl, but THP, container limits, and actual RAM headroom determine whether it holds.
What this means
Redis calls fork() for BGSAVE and BGREWRITEAOF. The child inherits the parent’s page tables and reads the dataset while the parent continues serving writes. Copy-on-write keeps physical pages shared until one process modifies them, so RAM does not double immediately. But the kernel commit charge at fork() time must account for the worst case: every shared page being copied.
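When vm.overcommit_memory is 0, the kernel applies a heuristic and refuses allocations it judges it cannot back with RAM plus swap. The often-quoted limit of swap plus 50% of physical RAM is the strict accounting used in mode 2 with the default vm.overcommit_ratio of 50, but it is a useful rule of thumb here too: the child's worst-case charge is roughly the size of the parent, so fork() tends to fail with ENOMEM once Redis RSS passes about half of memory. A server reporting 40% memory usage can still refuse to fork because the kernel cannot guarantee the child will not eventually copy every page.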
If stop-writes-on-bgsave-error is yes (default), Redis rejects writes after the failed save, turning a persistence failure into a write availability incident.
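The kernel exposes its commit accounting in /proc/meminfo, which makes the gap between free RAM and commit charge visible. The sketch below reads the CommitLimit and Committed_AS fields; the function names are illustrative, not part of any tool. Note that CommitLimit is strictly enforced only when vm.overcommit_memory=2, so under other modes treat the result as a rough indicator, not a hard limit.

```shell
# meminfo_kb FIELD [FILE]: print the value (in kB) of a /proc/meminfo field.
# FILE defaults to /proc/meminfo; passing another path is useful for testing.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# commit_headroom_kb [FILE]: CommitLimit minus Committed_AS, in kB.
# A negative result means the system is already committed past CommitLimit.
commit_headroom_kb() {
    limit=$(meminfo_kb CommitLimit "${1:-/proc/meminfo}")
    used=$(meminfo_kb Committed_AS "${1:-/proc/meminfo}")
    echo $((limit - used))
}

# Example (run on the Redis host):
#   commit_headroom_kb
```

If the headroom is smaller than Redis RSS, a fork of the Redis process is a plausible candidate for refusal.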
flowchart TD
A[BGSAVE or AOF rewrite triggers fork] --> B{vm.overcommit_memory = 1?}
B -->|Yes| C[Kernel allows virtual commit]
B -->|No| D{Redis RSS < swap + 50% RAM?}
D -->|Yes| C
D -->|No| E[fork returns ENOMEM]
E --> F[Redis logs 'Cannot allocate memory']
F --> G{stop-writes-on-bgsave-error}
G -->|yes| H[Redis rejects writes]
G -->|no| I[Writes continue with no snapshot]

Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| vm.overcommit_memory=0 | Exact error in logs; RSS above ~50% of RAM; host not actually OOM | sysctl vm.overcommit_memory |
| THP enabled | Fork succeeds intermittently but COW spikes are massive; latest_fork_usec is high | cat /sys/kernel/mm/transparent_hugepage/enabled |
| Container memory limit | Same error inside containers even when host memory is free; child process disappears | Host vm.overcommit_memory and container memory limit |
| Dataset exhausting physical RAM | used_memory_rss near total memory; swap or OOM kills follow | redis-cli INFO memory and system free |
Quick checks
Run these read-only checks to confirm the failure mode before making changes.
# Check kernel overcommit mode
sysctl vm.overcommit_memory
# Check THP status
cat /sys/kernel/mm/transparent_hugepage/enabled
# Check last bgsave status and timestamp
redis-cli INFO persistence | grep -E "rdb_last_bgsave_status|rdb_last_save_time"
# Check last fork duration
redis-cli INFO stats | grep latest_fork_usec
# Check Redis RSS vs logical memory
redis-cli INFO memory | grep -E "used_memory_rss|used_memory:"
# Check COW cost from last save
redis-cli INFO persistence | grep rdb_last_cow_size
# Check for recent OOM kills in kernel log
dmesg | grep -i "oom killer\|killed process"
How to diagnose it
- Confirm the error. Look for `Can't save in background: fork: Cannot allocate memory` in the Redis log, or check `redis-cli INFO persistence` for `rdb_last_bgsave_status:err`.
- Check `vm.overcommit_memory`. If it returns `0`, the kernel heuristic is blocking fork. This is the most common root cause.
- Compare RSS to total memory. Run `redis-cli INFO memory | grep used_memory_rss` and compare it to physical RAM. If RSS is above ~50% of RAM and overcommit is `0`, the failure is expected.
- Check `latest_fork_usec`. Values trending upward or above 500ms suggest THP or dataset size pressure.
- Check THP status. If `cat /sys/kernel/mm/transparent_hugepage/enabled` does not show `[never]`, COW granularity is inflated and memory pressure is amplified.
- Verify container context. If Redis runs inside a container, confirm the host `vm.overcommit_memory` value. Containers inherit it by default; changing it inside a container usually fails.
- Check `rdb_last_cow_size`. If it exceeds 50% of `used_memory`, your write rate during saves is high and headroom is insufficient even with overcommit enabled.
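The RSS comparison in the steps above can be scripted. This is a minimal sketch under the rule of thumb used in this guide (RSS at or above 50% of RAM with overcommit mode 0); the helper names `info_field`, `pct`, and `fork_diagnosis` are hypothetical and the result is a hint, not a verdict.

```shell
# info_field NAME: read `redis-cli INFO` text on stdin and print NAME's value.
# INFO lines end in CRLF, so carriage returns are stripped first. The exact
# key match avoids confusing used_memory with used_memory_rss.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# pct A B: integer percentage of A relative to B (B must be non-zero).
pct() {
    echo $(( $1 * 100 / $2 ))
}

# fork_diagnosis RSS_BYTES TOTAL_BYTES OVERCOMMIT_MODE
# Flags the fork as at risk when mode is 0 and RSS is at least 50% of RAM.
fork_diagnosis() {
    if [ "$3" -eq 0 ] && [ "$(pct "$1" "$2")" -ge 50 ]; then
        echo "fork-at-risk"
    else
        echo "not-flagged"
    fi
}

# Example (run on the Redis host):
#   rss=$(redis-cli INFO memory | info_field used_memory_rss)
#   total=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) * 1024 ))
#   fork_diagnosis "$rss" "$total" "$(sysctl -n vm.overcommit_memory)"
```

The same `pct` helper can compare `rdb_last_cow_size` against `used_memory` for the last step.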
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| used_memory_rss vs system memory | RSS is the practical metric for commit accounting on Redis hosts | RSS approaching 50% of RAM when vm.overcommit_memory=0 |
| rdb_last_bgsave_status | Binary indicator of save health | Any err value |
| latest_fork_usec | Fork freezes the main thread; duration blocks all commands | Sustained values above 500ms, or above 200ms per GB of dataset |
| rdb_last_cow_size | Measures actual COW memory cost during last save | Exceeding 50% of used_memory |
| THP kernel setting | THP copies 2MB pages on write, amplifying COW | Any value other than [never] |
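The latest_fork_usec thresholds from the table can be expressed as a small check. This sketch assumes "GB" means GiB (1073741824 bytes) and that the dataset size is taken from used_memory; `fork_latency_check` is a hypothetical helper name.

```shell
# fork_latency_check FORK_USEC DATASET_BYTES
# Warns when the fork took over 500ms in absolute terms, or over
# 200ms per GiB of dataset. Cross-multiplying avoids integer division.
fork_latency_check() {
    usec=$1
    bytes=$2
    gib=1073741824
    if [ "$usec" -gt 500000 ]; then
        echo "warn: fork took over 500ms"
    elif [ $((usec * gib)) -gt $((200000 * bytes)) ]; then
        echo "warn: fork slower than 200ms per GB"
    else
        echo "ok"
    fi
}

# Example, with both values read from redis-cli INFO:
#   fork_latency_check "$fork_usec" "$used_memory"
```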
Fixes
Kernel overcommit policy
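The canonical fix is vm.overcommit_memory=1. In this mode the kernel does not refuse allocations at commit time; pressure only surfaces later, as swapping or OOM kills, if physical memory actually runs out. For Redis this is generally safe because the background child only reads the shared dataset. Only pages the parent modifies during the save are copied, so real memory growth tracks write volume, not dataset size.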
Apply live:
sudo sysctl vm.overcommit_memory=1
Persist by adding vm.overcommit_memory = 1 to /etc/sysctl.conf or a file under /etc/sysctl.d/.
Tradeoff: On multi-tenant hosts, overcommit=1 allows other processes to allocate freely, increasing the risk of OOM kills under genuine memory pressure. Isolate Redis on dedicated hosts or reserved slices if possible.
Transparent huge pages
Disable THP host-wide:
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
This takes effect immediately. Persist it across reboots using your distribution’s standard method.
The Redis documentation recommends disabling THP. A single write to a 2MB huge page copies the entire page during COW, inflating RSS and fork latency by an order of magnitude.
Tradeoff: 4KB pages increase TLB miss rates for some workloads, but the improvement in fork predictability outweighs the cost.
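Because THP has to be disabled before Redis starts, a pre-start check can catch a host where the setting did not persist. The sketch below extracts the active mode (the bracketed word in the sysfs file) and follows this guide's rule that anything other than never is a warning; the function names are illustrative.

```shell
# thp_mode [FILE]: print the active THP mode, i.e. the word in brackets.
# FILE defaults to the standard sysfs path.
thp_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# thp_check [FILE]: report whether THP is disabled.
thp_check() {
    mode=$(thp_mode "$@")
    if [ "$mode" = "never" ]; then
        echo "ok: THP disabled"
    else
        echo "warn: THP mode is $mode"
    fi
}

# Example (run before starting Redis):
#   thp_check
```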
Memory headroom and sizing
If used_memory_rss is near physical RAM, overcommit=1 alone will not prevent the OOM killer from terminating the child or parent. Reduce maxmemory, enable stricter eviction, shard the dataset, or add RAM.
Persistent instances should keep used_memory below roughly 50% of physical RAM to leave room for COW. Cache-only instances can run closer to the limit, but fork() still requires free pages for page table duplication.
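As a starting point for sizing, the 50% guideline can be turned into a maxmemory value. This is a rough sketch: `suggest_maxmemory_bytes` is a hypothetical helper, the default of 50 percent is this guide's rule of thumb, and maxmemory bounds used_memory rather than RSS, so fragmentation may still push RSS higher.

```shell
# suggest_maxmemory_bytes TOTAL_KB [PERCENT]
# Converts physical RAM in kB (as reported by MemTotal) into a byte value
# equal to PERCENT of RAM. PERCENT defaults to 50.
suggest_maxmemory_bytes() {
    echo $(( $1 * 1024 * ${2:-50} / 100 ))
}

# Example (run on the Redis host):
#   total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   redis-cli CONFIG SET maxmemory "$(suggest_maxmemory_bytes "$total_kb")"
```

CONFIG SET changes the running instance only; add the value to redis.conf to keep it across restarts.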
Container-specific behavior
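Containers share the host kernel, so they see the host's vm.overcommit_memory. The setting is not namespaced, so changing it inside a container usually fails, and runtime --sysctl flags generally accept only namespaced keys. Apply it on the host. Ensure container memory limits account for COW spikes; a limit tight to current RSS will get the child or the parent OOM-killed as copied pages accumulate during the save.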
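To check whether a container limit leaves room for COW, compare RSS plus an estimate of the COW cost against the cgroup limit. The sketch below assumes the common cgroup mount at /sys/fs/cgroup (memory.max for cgroup v2, memory/memory.limit_in_bytes for v1) and uses rdb_last_cow_size as the estimate, which is only as good as the last save was representative. Both function names are hypothetical.

```shell
# cgroup_mem_limit [ROOT]: print the memory limit visible in this cgroup.
# Prints "max" for an unlimited cgroup v2, "unknown" if no file is found.
cgroup_mem_limit() {
    root=${1:-/sys/fs/cgroup}
    if [ -r "$root/memory.max" ]; then
        cat "$root/memory.max"
    elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
        cat "$root/memory/memory.limit_in_bytes"
    else
        echo "unknown"
    fi
}

# cow_fits RSS_BYTES COW_BYTES LIMIT
# Reports whether RSS plus the estimated COW cost stays within the limit.
cow_fits() {
    case "$3" in
        max|unknown) echo "no-limit-detected"; return ;;
    esac
    if [ $(( $1 + $2 )) -le "$3" ]; then
        echo "fits"
    else
        echo "exceeds-limit"
    fi
}

# Example (run inside the container):
#   cow_fits "$rss" "$last_cow_size" "$(cgroup_mem_limit)"
```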
Do not mask the failure
Setting stop-writes-on-bgsave-error no turns a loud persistence failure into silent data-loss risk. Fix the fork or disable automatic saves if you do not need them; do not suppress the error.
Prevention
- Set `vm.overcommit_memory=1` on every Redis host before production.
- Disable THP before starting Redis.
- Size instances with COW headroom. Persistent instances should keep `used_memory` below roughly 50% of physical RAM.
- Monitor `latest_fork_usec` and `rdb_last_cow_size` after every save to trend memory pressure.
- Schedule `BGSAVE` during low-write windows, or rely on AOF with periodic rewrites instead of frequent RDB snapshots when fork pressure is high.
How Netdata helps
- Correlate `redis.latest_fork_usec` with system memory metrics to spot growing fork duration before it fails.
- Alert on `redis.rdb_last_bgsave_status` transitioning to `err` immediately after the first failed save.
- Track `redis.used_memory_rss` against system available memory to visualize commit headroom.
- Surface `redis.rdb_last_cow_size` trends to predict whether the next fork will fit within container or host limits.
- Cross-reference Redis persistence health with kernel THP state and `vm.overcommit_memory` context.
Related guides
- How Redis actually works in production: a mental model for operators
- Redis eviction policy tuning: allkeys-lru vs volatile-ttl vs noeviction
- Redis maxmemory not set: why every production instance needs a memory limit
- Redis monitoring checklist: the signals every production instance needs
- Redis monitoring maturity model: from survival to expert
- Redis OOM command not allowed when used memory > ‘maxmemory’ - causes and fixes
- Redis OOM-killed by the kernel: RSS, overcommit, and recovery







