MongoDB swapping: why mongod must never swap and how to tune the OS

Application timeouts climb. MongoDB latency jumps from milliseconds to seconds or minutes, yet mongod is still running and accepting connections. CPU usage is low, disk I/O does not look saturated, and the MongoDB log shows no errors. The process has not crashed; it has entered swap death. MongoDB relies on the WiredTiger cache and the OS page cache staying resident in RAM, so when the Linux kernel evicts mongod pages to swap, the database keeps working, but orders of magnitude more slowly than normal.

What this means

Swapping turns a memory access into a disk read. For mongod this is catastrophic, because both the storage engine and the OS cache are designed for RAM-speed access. WiredTiger maintains its own in-memory cache inside mongod's heap, separate from the OS page cache. When the kernel swaps WiredTiger cache pages out, every access to them becomes a major page fault served from the swap device; when the kernel reclaims the page cache, reads that would have been served from memory fall through to the data files. The process stays alive, heartbeats continue, and replica set elections may not trigger because the node is technically responsive. Operations queue up behind the slow ones. The degradation is self-reinforcing: slow operations hold tickets and connections longer, which increases memory pressure and causes more swapping.

flowchart TD
    A[Memory pressure] --> B[OS swaps mongod pages]
    B --> C[Cache misses hit disk]
    C --> D[Latency spikes 10-100x]
    D --> E[Connections pile up]
    E --> F[More memory pressure]
    F --> A

Common causes

Cause | What it looks like | First thing to check
vm.swappiness at default (60) | Host swaps under moderate pressure even when buffers could be dropped | cat /proc/sys/vm/swappiness
Working set exceeds RAM | Page fault rate climbs after warmup; RSS sits near physical memory limit | db.serverStatus().extra_info.page_faults and ps RSS
NUMA imbalance on multi-socket | Uneven memory allocation across sockets; some nodes saturated while others are free | numastat and /proc/<pid>/numa_maps
Transparent Huge Pages enabled | Latency spikes and fragmentation under load, especially on older MongoDB versions | cat /sys/kernel/mm/transparent_hugepage/enabled
Container memory limit too small | OOM kills or swap pressure inside the container despite free host RAM | WiredTiger cache max bytes vs container limit

Quick checks

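# Pick the mongod PID. pgrep -x matches the exact process name; if several
# mongod instances run on this host, set MONGOD_PID to the one you are debugging.
MONGOD_PID=$(pgrep -x mongod | head -n 1)

# Current swappiness (1 is the recommended value for mongod hosts).
cat /proc/sys/vm/swappiness

# Overall memory and swap usage, and the active swap devices.
free -h
cat /proc/swaps

# Swap used by mongod itself; anything above 0 kB is a problem.
grep VmSwap "/proc/$MONGOD_PID/status"

# Resident and virtual size of mongod, in kB.
ps -o rss,vsz,comm -p "$MONGOD_PID"

# Configured WiredTiger cache size in bytes (add a connection string if needed).
mongosh --quiet --eval 'db.serverStatus().wiredTiger.cache["maximum bytes configured"]'

# Cumulative page faults since startup; sample twice to get a rate.
mongosh --quiet --eval 'db.serverStatus().extra_info.page_faults'

# The active THP mode is the value shown in brackets.
cat /sys/kernel/mm/transparent_hugepage/enabled

# Per-NUMA-node memory for mongod (numastat usually ships with numactl).
numastat -p "$MONGOD_PID"

# OOM score adjustment; -1000 exempts mongod from the OOM killer.
cat "/proc/$MONGOD_PID/oom_score_adj"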

How to diagnose it

  1. Confirm mongod is swapped. Any nonzero VmSwap in /proc/<pid>/status is abnormal.
  2. Correlate with latency. db.serverStatus().opLatencies reports cumulative latency (in microseconds) and operation counts for reads, writes, and commands, so compute the average over an interval as the change in latency divided by the change in ops. Read and write latency spikes together with a rising page fault rate confirm memory pressure. extra_info.page_faults is also cumulative; calculate its delta over the same interval.
  3. Find the memory consumer. Compare mongod RSS to wiredTiger.cache["maximum bytes configured"], budgeting roughly 1 MB per connection plus 1-2 GB of internal overhead on top of the cache. If that expected footprint is below physical RAM but swapping still occurs, either another process is consuming memory or a high vm.swappiness is making the kernel swap too eagerly.
  4. Inspect cache sizing. By default, the WiredTiger cache is the larger of 50% of (RAM - 1 GB) or 256 MB. If co-hosted software or a container limit leaves less memory than that default assumes, the cache competes with everything else for RAM and pushes the OS toward swapping.
  5. Verify kernel tuning. Check vm.swappiness, NUMA policy, and THP. After insufficient RAM, misconfiguration of these settings is one of the most common root causes.
  6. Check for application-thread evictions. In db.serverStatus().wiredTiger.cache, a growing "pages evicted by application threads" counter means background eviction cannot keep the cache within its targets, so application threads are forced to evict pages themselves. That cache pressure often precedes OS-level swapping.
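Steps 1 and 2 can be scripted. The following is a minimal POSIX shell sketch, not a MongoDB-provided tool: the function names vmswap_kb, page_fault_rate, avg_latency_us, and sample_page_fault_rate are hypothetical, and the sampling helper assumes mongosh is on PATH and can reach the node at the given connection string (mongodb://localhost:27017 by default).

```shell
#!/usr/bin/env bash
# Hypothetical diagnosis helpers (not part of MongoDB). Requires Linux /proc;
# sample_page_fault_rate also requires mongosh on PATH.

# Print the VmSwap value in kB for a PID, or 0 if the field is absent.
vmswap_kb() {
  awk '/^VmSwap:/ { print $2; found = 1 } END { if (!found) print 0 }' "/proc/$1/status"
}

# Per-second rate from two cumulative counter readings taken SECONDS apart.
# Args: before after seconds
page_fault_rate() {
  local before="$1" after="$2" seconds="$3"
  echo $(( (after - before) / seconds ))
}

# Average latency in microseconds over an interval, from two cumulative
# opLatencies readings. Args: latency_before latency_after ops_before ops_after
avg_latency_us() {
  local ops=$(( $4 - $3 ))
  if [ "$ops" -eq 0 ]; then
    echo 0
    return
  fi
  echo $(( ($2 - $1) / ops ))
}

# Sample extra_info.page_faults twice, INTERVAL seconds apart, and print the
# per-second rate. Args: [interval-seconds] [connection-string]
sample_page_fault_rate() {
  local interval="${1:-10}" uri="${2:-mongodb://localhost:27017}"
  local js='print(String(db.serverStatus().extra_info.page_faults))'
  local before after
  before=$(mongosh "$uri" --quiet --eval "$js")
  sleep "$interval"
  after=$(mongosh "$uri" --quiet --eval "$js")
  page_fault_rate "$before" "$after" "$interval"
}
```

For example, pass the mongod PID to vmswap_kb to read its swap usage in kB, or call sample_page_fault_rate 30 to measure the fault rate over a 30-second window. The same delta approach applies to opLatencies.reads.latency and opLatencies.reads.ops via avg_latency_us.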

Metrics and signals to monitor

Signal | Why it matters | Warning sign
System swap usage | Any swap consumed by mongod signals swap death risk | VmSwap > 0 for the mongod process
Page fault rate | Hard page faults mean data is not resident | Rate increasing after the warmup period
WiredTiger cache fill ratio | Pressure here precedes OS swapping | Sustained > 80%
WiredTiger cache dirty ratio | Dirty data accumulation strains flush capacity and increases memory pressure | Sustained > 10%
Memory RSS vs system memory | Approaching the limit triggers swap or OOM | RSS > 90% of system RAM
Connection count | Each connection adds ~1MB of thread stack memory | Growth correlating with an RSS spike
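The thresholds in the table above can be turned into simple checks. This is a sketch with hypothetical function names: the raw values are passed as arguments (for example the WiredTiger cache statistics "bytes currently in the cache", "tracked dirty bytes in the cache", and "maximum bytes configured"), and each call evaluates a single sample, so real alerting should require the condition to be sustained. The footprint estimate applies the rough budget from the diagnosis steps, about 1 MB per connection plus overhead, and assumes 1024 MB of overhead by default (the low end of the 1-2 GB range).

```shell
#!/usr/bin/env bash
# Hypothetical threshold checks for the memory signals listed above.
# Each function evaluates one sample; alert only when a warning persists.

# Integer percentage of PART relative to WHOLE (WHOLE must be non-zero).
pct() {
  echo $(( $1 * 100 / $2 ))
}

# Args: bytes-in-cache dirty-bytes max-bytes. Warn above 80% fill or 10% dirty.
check_cache_ratios() {
  local fill dirty
  fill=$(pct "$1" "$3")
  dirty=$(pct "$2" "$3")
  if [ "$fill" -gt 80 ]; then echo "WARN fill=${fill}%"; else echo "OK fill=${fill}%"; fi
  if [ "$dirty" -gt 10 ]; then echo "WARN dirty=${dirty}%"; else echo "OK dirty=${dirty}%"; fi
}

# Args: rss-kB memtotal-kB. Warn when RSS is above 90% of physical RAM.
check_rss() {
  local used
  used=$(pct "$1" "$2")
  if [ "$used" -gt 90 ]; then echo "WARN rss=${used}%"; else echo "OK rss=${used}%"; fi
}

# Args: cache-bytes connections [overhead-MB]. Rough expected footprint in MB:
# cache + ~1 MB per connection + overhead (1024 MB assumed by default).
expected_footprint_mb() {
  local overhead="${3:-1024}"
  echo $(( $1 / 1048576 + $2 + overhead ))
}

# Example inputs (run separately):
#   mongosh --quiet --eval 'const c = db.serverStatus().wiredTiger.cache; print(c["bytes currently in the cache"], c["tracked dirty bytes in the cache"], c["maximum bytes configured"])'
#   ps -o rss= -p "$MONGOD_PID"; awk '/^MemTotal:/ { print $2 }' /proc/meminfo
```

With a 4 GiB cache and 200 connections, expected_footprint_mb 4294967296 200 yields 5320 MB; if RSS is far above such an estimate, look for memory outside the cache.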

Fixes

Reduce memory pressure immediately

If mongod is actively swapping, do not restart it as a first response. A restart triggers cache warmup, potential election churn, and connection storms. Kill unnecessary long-running operations with db.currentOp() and db.killOp() to free tickets and memory. If the working set exceeds RAM, reduce the WiredTiger cache size temporarily or move the node to a larger instance.
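A sketch of this triage step, assuming mongosh is available. The helpers only print JavaScript for mongosh, so nothing is killed or resized automatically; the function names and the 60-second default threshold are illustrative assumptions. The runtime cache change uses the wiredTigerEngineRuntimeConfig server parameter, which does not persist across restarts, so also update cacheSizeGB in the configuration if the smaller cache should stick.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that print mongosh JavaScript for triage.
# Review the output before acting; nothing here kills or resizes by itself.

# JavaScript that lists active operations running at least SECS seconds.
long_ops_js() {
  local secs="${1:-60}"
  cat <<EOF
db.currentOp({ active: true, secs_running: { \$gte: ${secs} } }).inprog.forEach(function (op) {
  print(op.opid + "\t" + op.secs_running + "s\t" + op.op + "\t" + op.ns);
});
EOF
}

# JavaScript that shrinks the WiredTiger cache at runtime to SIZE_GB gigabytes.
cache_resize_js() {
  echo "db.adminCommand({ setParameter: 1, wiredTigerEngineRuntimeConfig: \"cache_size=${1}G\" })"
}

# Usage:
#   mongosh --quiet --eval "$(long_ops_js 120)"
#   mongosh --quiet --eval 'db.killOp(<opid>)'   # only for ops you have reviewed
#   mongosh --quiet --eval "$(cache_resize_js 4)"
```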

Set vm.swappiness to 1

A value of 1 tells the kernel to keep swapping to a minimum while still allowing it. Do not set it to 0. On Linux 3.5 and later, 0 makes the kernel avoid swapping anonymous memory almost entirely, which raises the risk that the OOM killer terminates mongod under sudden memory pressure instead of the kernel paging out a few cold pages.

Set it immediately:

sudo sysctl vm.swappiness=1

Persist it in /etc/sysctl.conf:

echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
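Appending with tee -a adds another copy of the line every time the command runs. This hypothetical helper reads the live value and appends the setting only when no config file already sets vm.swappiness = 1. By default it checks /etc/sysctl.conf and /etc/sysctl.d/*.conf; it is a sketch and does not account for a later file overriding the value with something else.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking and persisting vm.swappiness.
# File arguments default to the standard locations so they can be tested on fixtures.

# Print the live swappiness value.
current_swappiness() {
  cat "${1:-/proc/sys/vm/swappiness}"
}

# Return 0 if any given file (default: sysctl.conf and sysctl.d/*.conf)
# contains a vm.swappiness = 1 line.
swappiness_persisted() {
  if [ $# -eq 0 ]; then
    set -- /etc/sysctl.conf /etc/sysctl.d/*.conf
  fi
  for f in "$@"; do
    [ -r "$f" ] || continue
    if grep -Eq '^[[:space:]]*vm\.swappiness[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$f"; then
      return 0
    fi
  done
  return 1
}

# Append the setting to FILE only if it is not already there (idempotent).
# Run as root for real config files, then apply with: sysctl -p
persist_swappiness() {
  local f="${1:-/etc/sysctl.conf}"
  swappiness_persisted "$f" || echo 'vm.swappiness = 1' >> "$f"
}
```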

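Tradeoff: A low swappiness value keeps mongod's memory resident, but under sustained pressure the kernel reaches an out-of-memory condition sooner, so the OOM killer may fire where a higher value would have swapped. If no swap is configured at all, swappiness has no effect and OOM kills are the only emergency relief valve.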

Disable Transparent Huge Pages

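On MongoDB versions before 8.0, THP causes latency spikes and memory fragmentation. MongoDB 8.0 and later use an upgraded TCMalloc and recommend enabling THP instead, so check the production notes for your version. Check the current state: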

cat /sys/kernel/mm/transparent_hugepage/enabled

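The file lists every mode and brackets the active one. If the active mode is [always] or [madvise], write never to the sysfs control file (MongoDB's guidance also sets /sys/kernel/mm/transparent_hugepage/defrag to never) and persist the setting through your host's init framework.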
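A minimal sketch for reading the active mode and switching it off for the current boot. The function names are hypothetical, the base directory defaults to the standard sysfs path so the helpers can be pointed at test fixtures, and the change does not survive a reboot without the init-framework step described above.

```shell
#!/usr/bin/env bash
# Hypothetical THP helpers for pre-8.0 MongoDB hosts.

# Print the active (bracketed) mode from a THP control file,
# for example "always madvise [never]" prints never.
thp_mode() {
  sed -n 's/.*\[\(.*\)].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Set both THP controls to never. Needs root for the real sysfs path;
# runtime only, so the setting is lost on reboot.
disable_thp_now() {
  local base="${1:-/sys/kernel/mm/transparent_hugepage}"
  echo never > "${base}/enabled"
  echo never > "${base}/defrag"
}
```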

Tradeoff: Disabling THP slightly increases TLB pressure for workloads that would benefit from huge pages. For MongoDB, the latency stability gain outweighs this cost.

Configure NUMA interleaving

On multi-socket servers, run mongod with memory interleaved across all NUMA nodes to prevent one socket from saturating while others remain free:

numactl --interleave=all mongod ...

Also ensure the numad daemon is not running, because its dynamic placement conflicts with static interleaving.
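Interleaving only matters on hosts with more than one NUMA node. This sketch, with hypothetical function names, counts the nodes the kernel exposes under /sys/devices/system/node, checks whether numad is running, and checks whether a numa_maps file (for example /proc/<pid>/numa_maps for mongod) shows an interleave policy.

```shell
#!/usr/bin/env bash
# Hypothetical NUMA checks; paths default to the standard Linux locations.

# Count NUMA nodes exposed under sysfs (node0, node1, ...).
numa_node_count() {
  local dir="${1:-/sys/devices/system/node}" n=0 d
  for d in "$dir"/node[0-9]*; do
    [ -d "$d" ] && n=$((n + 1))
  done
  echo "$n"
}

# Return 0 if a process named exactly "numad" is running.
numad_running() {
  pgrep -x numad > /dev/null
}

# Return 0 if a numa_maps file shows any mapping with an interleave policy.
is_interleaved() {
  grep -q 'interleave:' "$1"
}

# Usage:
#   [ "$(numa_node_count)" -gt 1 ] && ! is_interleaved "/proc/$MONGOD_PID/numa_maps" && echo "not interleaved"
#   numad_running && echo "numad is running"
```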

Tradeoff: Interleaving adds minor cross-socket memory latency for localized access patterns, but prevents catastrophic imbalance.

Protect mongod from the OOM killer

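Because mongod has a high RSS, the Linux OOM killer often selects it first. Set the OOM score adjustment to -1000 to exclude mongod from OOM killing. The value applies only to the running process and is lost when mongod restarts:

for pid in $(pgrep -x mongod); do echo -1000 | sudo tee "/proc/$pid/oom_score_adj"; done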
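Writing to /proc affects only the running process. If mongod runs under systemd, the unit's OOMScoreAdjust= setting makes the adjustment persistent. This sketch assumes the unit is named mongod.service, as in MongoDB's Linux packages; the function names and the drop-in file name oom.conf are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a persistent OOM score adjustment via systemd.

# Write a drop-in that sets OOMScoreAdjust for the service.
# Args: [drop-in directory] (default assumes mongod.service)
write_oom_dropin() {
  local dir="${1:-/etc/systemd/system/mongod.service.d}"
  mkdir -p "$dir"
  printf '[Service]\nOOMScoreAdjust=-1000\n' > "${dir}/oom.conf"
}

# Print the live oom_score_adj for a PID.
oom_adj() {
  cat "/proc/${1}/oom_score_adj"
}

# Usage, as root, at a planned maintenance window (not while mongod is swapping):
#   write_oom_dropin
#   systemctl daemon-reload && systemctl restart mongod
```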

Tradeoff: Protecting mongod means another process will be killed instead. Ensure your host is not running other critical unprotected services that could cause a cascading failure if OOM-killed.

Container-specific tuning

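When mongod runs inside a container, the WiredTiger cache default can size itself against host RAM rather than the container limit, depending on the MongoDB version and cgroup version. Explicitly set storage.wiredTiger.engineConfig.cacheSizeGB by applying the default formula to the container memory limit: 50% of (limit - 1 GB), with a floor of 0.25 GB. vm.swappiness is a host-wide kernel setting, so set vm.swappiness=1 on the host kernel. If your orchestrator uses cgroup-level swap controls (memory.swappiness under cgroup v1, memory.swap.max under cgroup v2), ensure they do not override the host setting.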
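A sketch of that calculation, assuming the limit is readable from inside the container at the usual cgroup v2 path (/sys/fs/cgroup/memory.max) or cgroup v1 path (/sys/fs/cgroup/memory/memory.limit_in_bytes). The function names are hypothetical. An unlimited cgroup v1 limit appears as a very large number rather than max, so sanity-check the result before using it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: derive cacheSizeGB from a container memory limit
# using max(0.5 * (limit - 1 GB), 0.25 GB).

# Print the cgroup memory limit in bytes (v2 memory.max first, then v1),
# or nothing if the limit is "max" or no file is readable.
cgroup_mem_limit() {
  local v2="${1:-/sys/fs/cgroup/memory.max}"
  local v1="${2:-/sys/fs/cgroup/memory/memory.limit_in_bytes}"
  local val=""
  if [ -r "$v2" ]; then
    val=$(cat "$v2")
  elif [ -r "$v1" ]; then
    val=$(cat "$v1")
  fi
  [ "$val" = "max" ] && val=""
  echo "$val"
}

# Convert a byte limit into a cacheSizeGB value with two decimals.
cache_size_gb_for_limit() {
  awk -v b="$1" 'BEGIN {
    gb = b / (1024 * 1024 * 1024)
    size = (gb - 1) / 2
    if (size < 0.25) size = 0.25
    printf "%.2f\n", size
  }'
}
```

For a container limited to 8 GiB, cache_size_gb_for_limit 8589934592 prints 3.50, the value to place in storage.wiredTiger.engineConfig.cacheSizeGB.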

Prevention

  • Set vm.swappiness=1 before production.
  • Size the WiredTiger cache for the deployment. Reduce the limit if you co-host other software or run inside a container.
  • Monitor swap usage continuously. Any swap consumed by mongod is an emergency, not a warning.
  • Disable THP and configure NUMA at provision time. Treat these as standard host image hardening.
  • Right-size instances before data growth exceeds RAM. Track page fault rates and cache fill trends weekly to forecast runway.
  • Limit application connection pool sizes. Unbounded growth increases mongod RSS directly.

How Netdata helps

  • Track per-process swap usage for mongod. Any nonzero value is a critical signal.
  • Correlate page fault rates with MongoDB opLatencies to distinguish swapping from slow queries.
  • Alert on memory utilization approaching limits alongside connection count growth.
  • Monitor kernel settings such as vm.swappiness and THP status to detect configuration drift after host updates.