Redis latest_fork_usec too high: THP, NUMA, and fork latency

INFO stats shows latest_fork_usec in the hundreds of milliseconds. Every fork() blocks the single event loop, so during that window no commands are processed. Clients time out, replicas disconnect, and a full resync can trigger another fork, creating a loop of latency and reconnection storms. A normal fork costs roughly 10-20ms per gigabyte of resident memory with Transparent Huge Pages disabled. If you are seeing 10-100x that, the culprit is usually THP, NUMA, or memory overcommit policy.

What this means

latest_fork_usec measures wall-clock time of the fork(2) syscall Redis uses for background RDB saves, AOF rewrites, and full replication resyncs. It measures only the time the main thread is frozen, not the total BGSAVE duration. Redis is single-threaded for command execution, so the entire event loop stops during this window. For latency-sensitive workloads, the impact is indistinguishable from an outage.

With THP disabled, expect roughly 10-20ms per GB of used_memory_rss. Values above 200ms on a reasonably sized instance point to THP interference, NUMA remote-node memory access, or an overcommitted hypervisor. Sustained spikes above 500ms cause client-side timeouts. Above one second, cascading replica disconnects and full resyncs become likely. THP also amplifies copy-on-write cost: after fork, a single byte write to a 2MB huge page forces the kernel to copy the entire page, inflating RSS and latency together.
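
To make the copy-on-write amplification concrete, here is a small worked sketch. It assumes the worst case, where every write during the fork window lands on a different page; the function name cow_copy_bytes and the figure of 10,000 writes are illustrative, not taken from Redis.

```shell
# Worst-case bytes copied by copy-on-write when each write touches a
# different page. Arguments: distinct pages written, page size in bytes.
cow_copy_bytes() {
  echo $(( $1 * $2 ))
}

cow_copy_bytes 10000 4096      # standard 4 KB pages
cow_copy_bytes 10000 2097152   # 2 MB huge pages
```

Under that assumption, the same 10,000 writes copy about 41 MB with standard pages and about 21 GB with huge pages. Real workloads cluster their writes, so the observed gap is usually smaller, but the direction is the same.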

flowchart TD
    A[Fork for BGSAVE or full resync] --> B{THP enabled or NUMA remote?}
    B -->|Yes| C[Fork latency spikes >500ms]
    B -->|No| D[Normal fork ~20ms/GB]
    C --> E[Main thread frozen]
    E --> F[Clients timeout]
    F --> G[Replicas disconnect]
    G --> H[Full resync on reconnect]
    H --> A

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Transparent Huge Pages enabled | latest_fork_usec is 10-100x baseline, often over 1s on multi-GB instances | cat /sys/kernel/mm/transparent_hugepage/enabled |
| NUMA misconfiguration | Elevated fork latency on large bare-metal or multi-socket VMs; memory allocated on a remote node | numactl --hardware and numastat |
| vm.overcommit_memory not set to 1 | Fork fails or stalls under memory pressure; ENOMEM may appear in logs | sysctl vm.overcommit_memory |
| Dataset too large for the host | Fork latency grows linearly with RSS; approaching physical memory limits | used_memory_rss versus total system RAM |
| Overcommitted VM or page table fragmentation | Slower than expected forks even with THP disabled; virtualization overhead | Guest steal time, host hypervisor memory statistics, and RSS trends |

Quick checks

Run these safe, read-only commands to establish baseline state.

# Check the most recent fork duration in microseconds
redis-cli INFO stats | grep latest_fork_usec

# Check whether THP is active
cat /sys/kernel/mm/transparent_hugepage/enabled

# Check if a background save or rewrite is currently running
redis-cli INFO persistence | grep -E "rdb_bgsave_in_progress|aof_rewrite_in_progress"

# Check RSS to compute the expected fork baseline
redis-cli INFO memory | grep used_memory_rss

# Verify memory overcommit policy
sysctl vm.overcommit_memory

# Check replica count; full resyncs trigger additional forks
redis-cli INFO replication | grep connected_slaves

# Check for recent full resyncs
redis-cli INFO stats | grep -E "sync_full|sync_partial_err"

# Check NUMA layout
numactl --hardware

# Check per-node memory distribution for the Redis process (assumes one instance)
numastat -p $(pgrep -n redis-server)
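
The two numbers that matter most from these checks can be combined into a single per-GB figure. The following is a minimal sketch, assuming INFO output is piped in on stdin; the function name fork_ms_per_gb is hypothetical, and it treats a gigabyte as 1024^3 bytes.

```shell
# Read redis-cli INFO output on stdin and print fork time in ms per GB of RSS.
fork_ms_per_gb() {
  awk -F: '
    { gsub(/\r/, "") }                       # redis-cli emits CRLF line endings
    $1 == "latest_fork_usec" { usec = $2 }
    $1 == "used_memory_rss"  { rss = $2 }
    END {
      if (usec == "" || rss + 0 == 0) {
        print "latest_fork_usec or used_memory_rss missing" > "/dev/stderr"
        exit 1
      }
      printf "%.1f\n", (usec / 1000) / (rss / 1073741824)
    }'
}

# Usage on a live host:
#   { redis-cli INFO stats; redis-cli INFO memory; } | fork_ms_per_gb
```

A result near or below 20 matches the normal range described above; a result far above it is the signal to keep diagnosing.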

How to diagnose it

  1. Establish the per-GB ratio. Convert latest_fork_usec to milliseconds (divide by 1,000), then divide by used_memory_rss expressed in gigabytes. If the result is well above 20ms/GB, continue. If it is at or below 20ms/GB, the fork itself is behaving normally and the problem is more likely dataset size or client timeout tuning.

  2. Check THP status. Run cat /sys/kernel/mm/transparent_hugepage/enabled. The active mode is the one in brackets. If it is [always], THP is active and is the most likely cause. [madvise] is usually safe for Redis because the kernel then uses huge pages only for memory regions that explicitly request them, which Redis does not do by default. Setting it to [never] still eliminates the variable.

  3. Verify vm.overcommit_memory. Run sysctl vm.overcommit_memory. Redis requires this to be 1. Without it, the kernel performs conservative allocation checks that can cause fork() to fail with ENOMEM or stall. After a failed background save, clients see write errors that begin with MISCONF Redis is configured to save RDB snapshots.

  4. Map the fork to a trigger. Check rdb_bgsave_in_progress and aof_rewrite_in_progress. If neither is active but latest_fork_usec updated, the fork was likely triggered by a replica full resync. Check sync_full and sync_partial_err in INFO stats to confirm.

  5. Evaluate memory headroom. Compare used_memory_rss to total physical RAM. On instances with persistence enabled, keep RSS below roughly 50% of RAM so copy-on-write pages have room during the fork window. If RSS is above that, a write-heavy fork window can push the host into memory pressure, and fork behavior becomes unpredictable even with correct settings.

  6. Inspect NUMA topology. On multi-socket hosts or large VMs, run numactl --hardware to list nodes, then numastat -p $(pgrep -n redis-server) to see how the process memory is split across nodes. If most of that memory sits on a different node than the CPUs Redis runs on (system-wide, a growing other_node counter in plain numastat points the same way), remote memory access is slowing page-table walks during fork. Co-locate CPU and memory, or use interleaved allocation.

  7. Enable latency monitoring if it is off. Run CONFIG GET latency-monitor-threshold. A value of 0 means the monitor is disabled, in which case LATENCY LATEST always returns empty. Run CONFIG SET latency-monitor-threshold 100 (the value is in milliseconds), then check LATENCY HISTORY fork after the next persistence event to confirm the event is captured.

  8. Check for hypervisor overhead. If Redis runs on a virtualized host, run vmstat 1 or top and watch st (steal time). Consistent steal time alongside ballooning or host overcommit inflates fork times without any change in guest configuration.
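
The headroom comparison in step 5 can be sketched as a small helper. This is a minimal example, assuming the total_system_memory field in INFO memory reflects the memory actually available to Redis; inside a container with a memory limit it may report the host total instead. The function name rss_pct_of_ram is hypothetical.

```shell
# Read redis-cli INFO memory output on stdin and print RSS as a percentage of RAM.
rss_pct_of_ram() {
  awk -F: '
    { gsub(/\r/, "") }                          # strip CR from CRLF line endings
    $1 == "used_memory_rss"     { rss = $2 }
    $1 == "total_system_memory" { total = $2 }
    END {
      if (rss == "" || total + 0 == 0) {
        print "used_memory_rss or total_system_memory missing" > "/dev/stderr"
        exit 1
      }
      printf "%.1f\n", 100 * rss / total
    }'
}

# Usage on a live host:
#   redis-cli INFO memory | rss_pct_of_ram
```

A value above 50 on an instance with persistence enabled means the headroom guideline from step 5 is not being met.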

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| latest_fork_usec | Direct measure of main-thread freeze during fork | >500ms, or >20ms per GB of RSS |
| rdb_bgsave_in_progress / aof_rewrite_in_progress | Identifies when forks are triggered by persistence | Correlates with latency spikes |
| used_memory_rss | Page table size and COW footprint drive fork cost | RSS approaching total RAM |
| sync_full rate | Full resyncs force additional forks on the primary | Increasing counter means replicas are cycling |
| mem_fragmentation_ratio | High fragmentation reduces available memory for COW | Sustained >1.5 |
| LATENCY LATEST fork events | Built-in latency tracking confirms internal impact | Any fork event over your threshold |
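
The latest_fork_usec warning sign can be turned into a check that a cron job or monitoring hook could call. This is a sketch only: the function name fork_check is hypothetical, the thresholds are the 500ms and 20ms/GB figures from the table, and a gigabyte is taken as 1024^3 bytes.

```shell
# Arguments: latest_fork_usec, used_memory_rss (bytes).
# Prints OK or WARN with both figures; returns 1 on WARN, 2 on bad input.
fork_check() {
  awk -v usec="$1" -v rss="$2" 'BEGIN {
    if (rss + 0 <= 0) { print "rss must be positive" > "/dev/stderr"; exit 2 }
    ms = usec / 1000
    per_gb = ms / (rss / 1073741824)
    if (ms > 500 || per_gb > 20) {
      printf "WARN %.0fms %.1fms/GB\n", ms, per_gb
      exit 1
    }
    printf "OK %.0fms %.1fms/GB\n", ms, per_gb
  }'
}

# Usage:
#   fork_check 40000 4294967296    # 40 ms fork on 4 GB of RSS
#   fork_check 400000 4294967296   # 400 ms fork on 4 GB of RSS
```

The second call stays under 500ms in absolute terms but still warns, because 100ms/GB is well above the per-GB baseline.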

Fixes

Disable Transparent Huge Pages

This is the most common fix and the first change to make.

# Disable THP immediately (run as root)
echo never > /sys/kernel/mm/transparent_hugepage/enabled

The tradeoff is a slightly higher TLB miss rate for generic workloads, but the Redis documentation explicitly recommends disabling THP. The improvement is usually immediate. The sysfs write does not survive a reboot, so persist the change, for example with the transparent_hugepage=never kernel boot parameter or a boot-time unit that writes the sysfs value before Redis starts.
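
After changing the setting, it is worth confirming which mode is actually active, since the sysfs file lists all modes and marks the current one with brackets. A minimal sketch follows; the function name thp_active_mode is hypothetical.

```shell
# Extract the active mode (the bracketed word) from the sysfs THP string,
# e.g. "always madvise [never]" -> "never".
thp_active_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Usage on a live host:
#   thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
```

Checking for the literal value never in a startup script or health check catches hosts where the boot-time setting was lost.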

Set vm.overcommit_memory to 1

Without this, the kernel may refuse the fork or behave conservatively under load.

# Set immediately
sysctl -w vm.overcommit_memory=1

The tradeoff is that you rely on the OOM killer rather than allocation-time failure, but Redis requires this for reliable fork behavior. Set it permanently in /etc/sysctl.conf or a drop-in file.
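
One way to make the setting permanent is a drop-in file, sketched below. The file name 99-redis.conf is an assumption chosen for illustration; the function name write_overcommit_dropin is likewise hypothetical.

```shell
# Write a sysctl drop-in that sets vm.overcommit_memory to 1.
# Argument: target path (defaults to a hypothetical file under /etc/sysctl.d).
write_overcommit_dropin() {
  target="${1:-/etc/sysctl.d/99-redis.conf}"
  printf 'vm.overcommit_memory = 1\n' > "$target"
}

# Usage (as root), then reload all sysctl configuration files:
#   write_overcommit_dropin && sysctl --system
```

Keeping the setting in its own file makes it easy to ship with a base image or configuration management without editing /etc/sysctl.conf in place.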

Fix NUMA placement

If the host has multiple NUMA nodes, bind the Redis process to cores and memory on the same node, or use interleaved allocation across nodes so no single fork pays remote-memory latency. Binding requires a process restart. The tradeoff is that pinning to one node restricts CPU scheduling, while interleaving removes locality benefits for other workloads.
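
The two placement options map onto numactl roughly as follows. This sketch only prints the command line rather than running it, and it assumes Redis is started directly with a config file at /etc/redis/redis.conf; the function name numa_launch_cmd is hypothetical, and a service-managed instance would need the equivalent change in its unit or init script.

```shell
# Print (do not run) a numactl command line for the chosen placement policy.
# Arguments: mode (bind|interleave), config path, NUMA node (bind only, default 0).
numa_launch_cmd() {
  mode="$1"; conf="$2"; node="${3:-0}"
  case "$mode" in
    bind)       echo "numactl --cpunodebind=$node --membind=$node redis-server $conf" ;;
    interleave) echo "numactl --interleave=all redis-server $conf" ;;
    *)          echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

numa_launch_cmd bind /etc/redis/redis.conf 0
numa_launch_cmd interleave /etc/redis/redis.conf
```

Binding keeps CPU and memory on the same node, while interleaving spreads pages evenly across all nodes.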

Reduce fork frequency

Increase repl-backlog-size to prevent full resyncs on brief replica disconnections. The default 1MB is almost always too small for production.

# Set the replication backlog to 100MB (the value is in bytes)
redis-cli CONFIG SET repl-backlog-size 104857600

Also review your save directives. Frequent automatic BGSAVE on a large instance multiplies the fork penalty. The tradeoff is that a larger backlog consumes more memory, and less frequent RDB snapshots widen your recovery point objective.
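
As a rough way to sanity-check a backlog size, the sketch below estimates how many bytes of replication stream accumulate during a replica outage. It rests on a rule-of-thumb assumption that is not stated above: the backlog should hold at least the writes produced while a replica is disconnected. The write rate is derived from two samples of master_repl_offset in INFO replication; the function name backlog_bytes_needed and the sample numbers are hypothetical.

```shell
# Arguments: offset at first sample, offset at second sample,
#            seconds between samples, outage window to tolerate in seconds.
# Prints the bytes of replication stream written during such an outage.
backlog_bytes_needed() {
  offset_start="$1"; offset_end="$2"; sample_secs="$3"; outage_secs="$4"
  if [ "$sample_secs" -le 0 ]; then
    echo "sample interval must be positive" >&2
    return 1
  fi
  echo $(( (offset_end - offset_start) * outage_secs / sample_secs ))
}

# 60,000,000 bytes written in 60 s, tolerating a 100 s disconnect
backlog_bytes_needed 1000000 61000000 60 100
```

With these sample numbers the estimate comes to 100,000,000 bytes, which is in line with the 100MB setting shown above. Measure during peak write traffic, since an average taken over a quiet period underestimates the requirement.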

Right-size or shard the instance

If fork latency is still unacceptable after THP is disabled and NUMA is corrected, the dataset may be too large for a single process. Shard the keyspace across multiple Redis instances or enable clustering. The tradeoff is operational complexity, but it removes the single-core and single-fork bottleneck.

Prevention

  • Bake THP disable into base images. Ensure every Redis host boots with THP set to never before the server starts.
  • Set vm.overcommit_memory=1 at boot. This avoids fork failures during traffic spikes.
  • Monitor latest_fork_usec after every fork. Alert when it exceeds 20ms per GB of RSS.
  • Size repl-backlog-size to 100MB or more. This prevents replica reconnections from triggering expensive full resyncs.
  • Keep RSS below 50% of physical RAM on persistent instances. This leaves headroom for COW pages during the fork window.

How Netdata helps

  • Correlates latest_fork_usec with persistence flags to tie fork spikes to BGSAVE, AOF rewrite, or replica sync events.
  • Tracks used_memory_rss, mem_fragmentation_ratio, and system memory to flag COW pressure.
  • Surfaces replication metrics including sync_full and sync_partial_err to catch backlog overflow loops.
  • Captures instantaneous_ops_per_sec drops that coincide with fork events, confirming client impact.
  • Monitors system-level THP state and vm.overcommit_memory alongside Redis metrics.