MongoDB swapping: why mongod must never swap and how to tune the OS
Application timeouts climb. MongoDB latency jumps from milliseconds to seconds or minutes, yet mongod is still running and accepting connections. CPU is low, disk I/O is not saturated, and the MongoDB log shows no errors. The process has not crashed. It has entered swap death. When the Linux kernel evicts mongod pages to swap, the database continues to function at roughly 1/1000th of normal speed. MongoDB relies on the WiredTiger cache and OS page cache to remain resident in RAM.
What this means
Swapping turns a memory access into a disk read. For mongod, this is catastrophic: the storage engine assumes its cache is reachable at RAM speed. WiredTiger maintains its own in-memory cache, separate from the OS page cache. When WiredTiger cache pages are swapped out, a lookup that should be a memory access waits on the swap device; when page-cache pages are dropped, reads fall through to the data files on disk. The process stays alive and heartbeats continue, so replica set elections may not trigger: the node is technically responsive. Operations queue indefinitely. The degradation is self-reinforcing: slow operations hold tickets and connections longer, which increases memory pressure and causes more swapping.
flowchart TD
A[Memory pressure] --> B[OS swaps mongod pages]
B --> C[Cache misses hit disk]
C --> D[Latency spikes 10-100x]
D --> E[Connections pile up]
E --> F[More memory pressure]
F --> A
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| vm.swappiness at default (60) | Host swaps under moderate pressure even when buffers could be dropped | cat /proc/sys/vm/swappiness |
| Working set exceeds RAM | Page fault rate climbs after warmup; RSS sits near physical memory limit | db.serverStatus().extra_info.page_faults and ps RSS |
| NUMA imbalance on multi-socket | Uneven memory allocation across sockets; some nodes saturated while others are free | numastat and /proc/<pid>/numa_maps |
| Transparent Huge Pages enabled | Latency spikes and fragmentation under load, especially on older MongoDB versions | cat /sys/kernel/mm/transparent_hugepage/enabled |
| Container memory limit too small | OOM kills or swap pressure inside the container despite free host RAM | WiredTiger cache max bytes vs container limit |
Quick checks
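# Substitute the mongod PID explicitly if multiple instances are running.
# -x matches the exact process name; -o picks the oldest matching process.
MONGOD_PID=$(pgrep -x -o mongod)
# Kernel swap policy, then system-wide memory and swap usage
cat /proc/sys/vm/swappiness
free -h
cat /proc/swaps
# Swap and resident memory for the mongod process itself
grep VmSwap "/proc/$MONGOD_PID/status"
ps -o rss,vsz,comm -p "$MONGOD_PID"
# Configured WiredTiger cache ceiling and cumulative page faults
mongosh --quiet --eval 'db.serverStatus().wiredTiger.cache["maximum bytes configured"]'
mongosh --quiet --eval 'db.serverStatus().extra_info.page_faults'
# THP mode, per-NUMA-node memory, and OOM killer adjustment
cat /sys/kernel/mm/transparent_hugepage/enabled
numastat -p "$MONGOD_PID"
cat "/proc/$MONGOD_PID/oom_score_adj"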
How to diagnose it
- Confirm mongod is swapped. Any nonzero VmSwap in /proc/<pid>/status is abnormal.
- Correlate with latency. Read and write spikes in db.serverStatus().opLatencies, together with rising page fault rates, confirm memory pressure. extra_info.page_faults is cumulative; calculate the delta over an interval.
- Find the memory consumer. Compare mongod RSS to wiredTiger.cache["maximum bytes configured"]. Budget roughly 1MB per connection plus 1-2GB of internal overhead. If the expected footprint is below physical RAM but swapping still occurs, another process may be consuming memory, or the kernel is swapping too eagerly because of vm.swappiness.
- Inspect cache sizing. The WiredTiger default is 50% of (RAM minus 1GB), with a floor of 256MB. If co-hosted software or container limits reduce available memory below this default, the cache pressures the OS.
- Verify kernel tuning. Check vm.swappiness, NUMA policy, and THP. Misconfiguration is the most common root cause after insufficient RAM.
- Check for application-thread evictions. In db.serverStatus().wiredTiger.cache, a growing "pages evicted by application threads" counter means the cache is under memory pressure, which often precedes OS-level swapping.
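The first two checks in the list above can be scripted. What follows is a minimal sketch rather than a finished tool: vmswap_kb, is_swapped, and counter_rate are hypothetical helper names, and the only assumption about the input is the usual "VmSwap: <n> kB" line in a /proc/<pid>/status file.

```shell
# Print the swapped-out size in kB from a /proc/<pid>/status-style file.
# Prints 0 when the file has no VmSwap line.
vmswap_kb() {
    awk '/^VmSwap:/ { kb = $2 } END { print kb + 0 }' "$1"
}

# Succeed (exit status 0) when the process behind the status file has pages in swap.
is_swapped() {
    [ "$(vmswap_kb "$1")" -gt 0 ]
}

# Per-second rate from two readings of a cumulative counter, such as
# extra_info.page_faults, taken a known number of seconds apart.
counter_rate() {
    # $1 = earlier reading, $2 = later reading, $3 = interval in seconds
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Example (not run here; MONGOD_PID as set in the quick checks):
#   is_swapped "/proc/$MONGOD_PID/status" && echo "mongod has pages in swap"
```

Reading the counter twice and dividing by the interval matters because a large cumulative page fault total on a long-running node says nothing about what is happening now.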
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| System swap usage | Any swap consumed by mongod signals swap death risk | VmSwap > 0 for the mongod process |
| Page fault rate | Hard page faults mean data is not resident | Rate increasing after the warmup period |
| WiredTiger cache fill ratio | Pressure here precedes OS swapping | Sustained > 80% |
| WiredTiger cache dirty ratio | Dirty data accumulation strains flush capacity and increases memory pressure | Sustained > 10% |
| Memory RSS vs system memory | Approaching the limit triggers swap or OOM | RSS > 90% of system RAM |
| Connection count | Each connection adds ~1MB of thread stack memory | Growth correlating with an RSS spike |
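The rules of thumb in this guide (about 1MB per connection, 1-2GB of internal overhead, and a warning at 90% of system RAM) can be combined into a rough budget. This is an illustrative sketch only: mongod_budget_mb and budget_pct_of_ram are hypothetical names, and the 2048MB default overhead is simply the upper end of the stated range, not a measured figure.

```shell
# Rough expected mongod footprint in MB:
# WiredTiger cache + ~1 MB per connection + fixed internal overhead.
mongod_budget_mb() {
    # $1 = cache size in MB, $2 = connection count, $3 = overhead in MB (default 2048)
    echo $(( $1 + $2 * 1 + ${3:-2048} ))
}

# Share of physical RAM the budget would consume, as an integer percentage
# (rounded down).
budget_pct_of_ram() {
    # $1 = budget in MB, $2 = physical RAM in MB
    echo $(( $1 * 100 / $2 ))
}

# Example: a 15872 MB cache with 2000 connections on a 32768 MB host
#   mongod_budget_mb 15872 2000      # prints 19920
#   budget_pct_of_ram 19920 32768    # prints 60
```

A result well under 90 while mongod still shows swap usage points away from mongod itself and toward another memory consumer or kernel tuning.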
Fixes
Reduce memory pressure immediately
If mongod is actively swapping, do not restart it as a first response. A restart triggers cache warmup, potential election churn, and connection storms. Instead, find unnecessary long-running operations with db.currentOp() and terminate them with db.killOp() to free tickets and memory. If the working set exceeds RAM, reduce the WiredTiger cache size temporarily or move the node to a larger instance.
Set vm.swappiness to 1
A value of 1 tells the kernel to avoid swapping unless absolutely necessary. Do not set it to 0. A value of 0 disables proactive swap and increases the risk that the kernel kills mongod under sudden memory pressure rather than paging out cleanly.
Set it immediately:
sudo sysctl vm.swappiness=1
Persist it in /etc/sysctl.conf:
echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
Tradeoff: A low swappiness value protects mongod but means the OOM killer may target other processes first. If no swap is configured and swappiness is 1, the system has no emergency relief valve other than OOM kills.
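Appending to /etc/sysctl.conf with tee -a adds a duplicate line each time it is run. One alternative, sketched here under the assumption that the host reads drop-in files from /etc/sysctl.d, is to write a dedicated file that is overwritten on each run. The function name and the file name 99-mongod-swappiness.conf are hypothetical.

```shell
# Write a sysctl drop-in that pins vm.swappiness. The file is overwritten,
# so repeated runs leave exactly one setting in place.
write_swappiness_dropin() {
    # $1 = target directory (normally /etc/sysctl.d), $2 = value (default 1)
    printf 'vm.swappiness = %s\n' "${2:-1}" > "$1/99-mongod-swappiness.conf"
}

# Example (as root; "sysctl --system" reloads all sysctl configuration files):
#   write_swappiness_dropin /etc/sysctl.d && sysctl --system
```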
Disable Transparent Huge Pages
THP causes latency spikes and memory fragmentation. Check the current state:
cat /sys/kernel/mm/transparent_hugepage/enabled
If the output includes [always] or [madvise], write never to the sysfs control file and persist the setting through your host’s init framework.
Tradeoff: Disabling THP slightly increases TLB pressure for workloads that would benefit from huge pages. For MongoDB, the latency stability gain outweighs this cost.
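The check and the write described above might look like the following sketch. thp_mode and disable_thp are hypothetical helper names; the only assumption is that the control file marks the active mode in square brackets, as in "always madvise [never]". The write needs root and does not survive a reboot, so persistence through the init framework is still required.

```shell
# Print the active THP mode: the bracketed word in the control file.
thp_mode() {
    # $1 = control file (default: the kernel's THP "enabled" file)
    sed -n 's/.*\[\([a-z_]*\)].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Write "never" to the control file unless it is already the active mode.
disable_thp() {
    thp_file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    [ "$(thp_mode "$thp_file")" = "never" ] || echo never > "$thp_file"
}

# Example (as root):
#   thp_mode        # prints the current mode, for example: always
#   disable_thp
```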
Configure NUMA interleaving
On multi-socket servers, run mongod with memory interleaved across all NUMA nodes to prevent one socket from saturating while others remain free:
numactl --interleave=all mongod ...
Also ensure the numad daemon is not running, because its dynamic placement conflicts with static interleaving.
Tradeoff: Interleaving adds minor cross-socket memory latency for localized access patterns, but prevents catastrophic imbalance.
Protect mongod from the OOM killer
Because mongod has a high RSS, the Linux OOM killer often selects it first. Set the OOM score adjustment to -1000 to exclude mongod from OOM killing:
# -x matches the exact process name; -o picks the oldest match if several exist.
echo -1000 | sudo tee "/proc/$(pgrep -x -o mongod)/oom_score_adj"
Tradeoff: Protecting mongod means another process will be killed instead. Ensure your host is not running other critical unprotected services that could cause a cascading failure if OOM-killed.
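A value written to /proc/<pid>/oom_score_adj belongs to that process and is lost when mongod restarts. If mongod is managed by systemd, the same adjustment can be declared in a unit drop-in using the OOMScoreAdjust directive. The sketch below assumes a unit named mongod.service, which may differ on your host; write_oom_dropin and the file name oom.conf are hypothetical.

```shell
# Create a systemd drop-in that sets the OOM score adjustment for the service.
write_oom_dropin() {
    # $1 = drop-in directory, e.g. /etc/systemd/system/mongod.service.d
    mkdir -p "$1" &&
    printf '[Service]\nOOMScoreAdjust=-1000\n' > "$1/oom.conf"
}

# Example (as root; assumes the unit is called mongod.service):
#   write_oom_dropin /etc/systemd/system/mongod.service.d
#   systemctl daemon-reload
```

The drop-in takes effect the next time the service starts, so the /proc write above is still the way to protect an already running process.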
Container-specific tuning
When mongod runs inside a container, the default WiredTiger cache may be sized against host RAM rather than the container limit, depending on the MongoDB version and how the limit is exposed to the process. Explicitly set storage.wiredTiger.engineConfig.cacheSizeGB to roughly 50% of (the container memory limit minus 1GB). Set vm.swappiness=1 on the host kernel. If your orchestrator uses cgroup-level swap controls, ensure they do not override the host setting.
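As a worked example of the sizing rule, the sketch below computes a cacheSizeGB value from a container memory limit. It assumes the formula 50% of (limit minus 1GB) with a 0.25GB floor, mirroring WiredTiger's documented default; wt_cache_gb is a hypothetical helper name, and the result is a starting point to validate against your workload, not a recommendation.

```shell
# Suggested WiredTiger cache size in GB for a given memory limit:
# 50% of (limit - 1 GB), never below 0.25 GB.
wt_cache_gb() {
    # $1 = memory limit in GB (may be fractional)
    awk -v m="$1" 'BEGIN {
        c = (m - 1) / 2
        if (c < 0.25) c = 0.25
        printf "%.2f\n", c
    }'
}

# Example:
#   wt_cache_gb 8    # prints 3.50
#   wt_cache_gb 1    # prints 0.25
```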
Prevention
- Set vm.swappiness=1 before production.
- Size the WiredTiger cache for the deployment. Reduce the limit if you co-host other software or run inside a container.
- Monitor swap usage continuously. Any swap consumed by mongod is an emergency, not a warning.
- Disable THP and configure NUMA at provision time. Treat these as standard host image hardening.
- Right-size instances before data growth exceeds RAM. Track page fault rates and cache fill trends weekly to forecast runway.
- Limit application connection pool sizes. Unbounded growth increases mongod RSS directly.
How Netdata helps
- Track per-process swap usage for mongod. Any nonzero value is a critical signal.
- Correlate page fault rates with MongoDB opLatencies to distinguish swapping from slow queries.
- Alert on memory utilization approaching limits alongside connection count growth.
- Monitor kernel settings such as vm.swappiness and THP status to detect configuration drift after host updates.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB balancer stuck and jumbo chunks: permanent imbalance and how to fix it
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB chunk migration storms: moveChunk I/O pressure and range locks
- MongoDB connection churn: high totalCreated rate and thread creation overhead
- MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling
- MongoDB connection storm spiral: reconnection floods after an election or deploy







