MongoDB page faults high: working set exceeding memory after warmup
Hard page faults long after startup mean the active data set exceeds resident memory. On Linux, extra_info.page_faults counts major faults. Each one means the OS had to read data from disk because the page was in neither the WiredTiger cache nor the OS page cache. A brief spike after restart is normal during warmup, but sustained faults mean the working set does not fit. On EBS gp3, 50 faults per second can already degrade latency. On NVMe, hundreds per second may be tolerable, but neither is free. Confirm the cause, distinguish warmup from memory pressure, and reduce the fault rate without guessing.
What this means
MongoDB uses a two-tier memory hierarchy. WiredTiger maintains its own uncompressed cache. By default it is sized at 50% of (RAM minus 1 GB) or 256 MB, whichever is larger. When a document is not in the WiredTiger cache, WiredTiger may still find the compressed on-disk page in the OS page cache. A page fault fires only when neither layer holds the data, which forces a physical disk read. Sustained faults after warmup therefore mean the active data set exceeds the combined memory of both tiers. This is worse than a WiredTiger cache miss served by the OS page cache. It is an OS-level signal that the node is memory-bound, and every fault adds disk I/O latency directly to the operation.
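The default sizing rule can be written as a small function, which makes it easy to see how much memory is left for the OS page cache. The sketch below assumes the documented default: the larger of 50% of (RAM minus 1 GB) or 256 MB. The name defaultWtCacheGB is hypothetical. Its input is whatever memory figure you want to evaluate, such as host RAM or a container limit.

```javascript
// Hypothetical helper: estimate WiredTiger's default cache size in GB.
// Rule assumed here: the larger of 50% of (RAM - 1 GB) or 256 MB (0.25 GB).
function defaultWtCacheGB(ramGB) {
  if (!(ramGB > 0)) {
    throw new RangeError("ramGB must be a positive number");
  }
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

// Illustrative inputs only; whatever the cache does not take is shared by the
// OS page cache, connection threads, and everything else on the host.
for (const ram of [1, 4, 16, 64]) {
  console.log(`${ram} GB RAM -> ~${defaultWtCacheGB(ram).toFixed(2)} GB WiredTiger cache`);
}
```

To compare the estimate with what is actually in effect, read wiredTiger.cache["maximum bytes configured"] from db.serverStatus().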
```mermaid
flowchart TD
    A[Query requests page] --> B{In WiredTiger cache?}
    B -->|Yes| C[Serve from WT cache]
    B -->|No| D{In OS page cache?}
    D -->|Yes| E[Read into WT cache]
    D -->|No| F["Major page fault<br/>disk I/O required"]
    E --> C
    F --> C
```
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Working set growth or unindexed queries | Faults rise with disk read IOPS; docsExamined far exceeds nreturned in slow queries. | WiredTiger cache fill ratio, and db.currentOp() for collection scans. |
| WiredTiger cache undersized or container limit ignored | Faults are high despite a modest active set; cache is capped far below available RAM. | wiredTiger.cache.maximum bytes configured against host or container memory limit. |
| Long-running snapshots pinning old versions | Cache fill is high but dirty ratio is low; faults persist with few new writes. | db.currentOp() for open transactions and metrics.cursor.open.noTimeout count. |
| External memory pressure or swap | Faults spike alongside system-level memory exhaustion; mongod RSS is stable but available memory is low. | free -m and vmstat 1 for swap activity and system reclaim. |
| Inadequate storage for unavoidable faults | Fault rate is acceptable for NVMe but painful on EBS gp3; latency spikes correlate with fault spikes. | Storage device type and iostat -x 1 for await and utilization. |
Quick checks
Run these read-only commands to baseline the current state.
```bash
# Check system memory and swap pressure
free -m && vmstat 1 3
# Major page faults per mongod process (field 12 of /proc/PID/stat, cumulative since the process started)
pgrep -x mongod | while read -r pid; do
  awk '{print "pid " $1 " majflt: " $12}' "/proc/$pid/stat"
done
```
```javascript
// Check WiredTiger cache fill, dirty ratio, and configured size
var c = db.serverStatus().wiredTiger.cache;
var max = c["maximum bytes configured"];
var used = c["bytes currently in the cache"];
var dirty = c["tracked dirty bytes in the cache"];
print("Cache used: " + (100 * used / max).toFixed(1) + "%");
print("Cache dirty: " + (100 * dirty / max).toFixed(1) + "%");
print("Max configured: " + (max / 1024 / 1024 / 1024).toFixed(1) + " GB");

// Check cumulative page faults (compute delta over 60s for a rate)
db.serverStatus().extra_info.page_faults

// Check for long-running operations and open transactions
db.currentOp({ "active": true, "secs_running": { "$gt": 60 } }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});

// Check for cursors that never time out and can pin snapshots
printjson(db.serverStatus().metrics.cursor)
```
```bash
# Check disk I/O latency and utilization
iostat -x 1 5
```
```javascript
// Check resident memory vs expected baseline (one serverStatus call for both values)
var ss = db.serverStatus();
print("RSS MB: " + ss.mem.resident);
print("Connections: " + ss.connections.current);
```
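Because extra_info.page_faults is cumulative, only the difference between two samples means anything. The minimal sketch below turns two samples into a per-second rate. The helper name counterRate is hypothetical. It assumes that mongosh may return 64-bit counters as Long objects exposing toNumber(), and it handles plain numbers too.

```javascript
// Hypothetical helper: per-second rate from two samples of a cumulative counter.
function counterRate(before, after, elapsedSeconds) {
  // Normalize: mongosh may return 64-bit integers as Long objects with toNumber().
  const toNum = (v) =>
    v !== null && typeof v === "object" && typeof v.toNumber === "function"
      ? v.toNumber()
      : Number(v);
  if (!(elapsedSeconds > 0)) throw new RangeError("elapsedSeconds must be > 0");
  const delta = toNum(after) - toNum(before);
  // A negative delta means the counter restarted (for example, mongod restarted between samples).
  if (delta < 0) return null;
  return delta / elapsedSeconds;
}

// In mongosh, the samples could be taken like this:
//   var a = db.serverStatus().extra_info.page_faults;
//   sleep(60 * 1000);  // mongosh sleep() takes milliseconds
//   var b = db.serverStatus().extra_info.page_faults;
//   print("Major faults/s: " + counterRate(a, b, 60).toFixed(1));
```

The same helper works for connections.totalCreated when you want connection churn as a rate rather than a raw total.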
How to diagnose it
1. Confirm the fault rate is abnormal. Sample extra_info.page_faults twice, 60 seconds apart, and compute the delta. If the node recently restarted (check serverStatus().uptime), high faults are expected while the cache warms. Wait until the working set should have loaded before treating faults as abnormal.
2. Check the two-tier memory state. Inspect the WiredTiger cache fill ratio. If it is below 70% and faults are high, the working set likely exceeds the OS page cache because other processes are consuming RAM or the OS is reclaiming cache aggressively. If cache fill is above 80%, WiredTiger itself is under pressure.
3. Identify snapshot retention. Run db.currentOp() filtered for transactions and aggregations running longer than 60 seconds, and check metrics.cursor.open.noTimeout. If either is elevated, old snapshots are preventing WiredTiger from evicting historical versions, which reduces the effective cache available for the working set.
4. Correlate with query efficiency. Scan the slow query log for COLLSCAN plans or for queries where docsExamined vastly exceeds nreturned. A new unindexed query can pull far more data into memory than necessary, displacing the real working set and causing faults on subsequent accesses.
5. Validate the cache sizing. Compare maximum bytes configured to the host’s physical RAM. In containers, set the cache size explicitly based on the container limit, because the default formula may use host RAM rather than the container limit. On a 64 GB host, the default formula yields about 31.5 GB of cache. A container with a 4 GB limit on that host can be OOM-killed if the cache is sized from host RAM, or can suffer cache pressure if the cache is capped too low.
6. Check storage backend latency. Run iostat -x 1. If read await (r_await on newer sysstat versions) is high during fault spikes, the disk subsystem is the bottleneck. On EBS gp2, check the burst balance. On gp3, verify that provisioned IOPS and throughput are not saturated.
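Step 4 can be partly automated. The sketch below assumes MongoDB 4.4+ structured (JSON) logs, where slow operations appear with msg "Slow query" and attr fields such as planSummary, docsExamined, nreturned, and ns. It flags collection scans and entries whose examined-to-returned ratio crosses a threshold. The default threshold of 100 and the function name findWastefulQueries are arbitrary, hypothetical choices.

```javascript
// Hypothetical helper: flag slow-query log entries that examine far more documents than they return.
// Assumes JSON log lines (MongoDB 4.4+); minRatio is an arbitrary, tunable threshold.
function findWastefulQueries(logLines, minRatio = 100) {
  const flagged = [];
  for (const line of logLines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (e) {
      continue; // skip blank or non-JSON lines
    }
    if (!entry || entry.msg !== "Slow query" || !entry.attr) continue;
    const a = entry.attr;
    const examined = Number(a.docsExamined ?? 0);
    const returned = Number(a.nreturned ?? 0);
    // Treat zero returned documents as 1 so the ratio stays finite.
    const ratio = examined / Math.max(returned, 1);
    const isCollscan = typeof a.planSummary === "string" && a.planSummary.startsWith("COLLSCAN");
    if (isCollscan || ratio >= minRatio) {
      flagged.push({ ns: a.ns, planSummary: a.planSummary, docsExamined: examined, nreturned: returned, ratio });
    }
  }
  return flagged;
}
```

In Node, you could feed it the contents of the mongod log split on newlines. The log path depends on your deployment.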
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| extra_info.page_faults rate | Hard faults mean disk I/O on every miss. | Sustained rate above 50/s on EBS gp3, or trending upward after warmup. |
| WiredTiger cache fill ratio | Shows if the working set exceeds the internal cache. | Above 80% sustained, especially with rising eviction rates. |
| WiredTiger cache dirty ratio | Dirty data accumulation can displace clean pages and worsen faults. | Above 10% sustained; above 20% risks checkpoint stalls. |
| metrics.cursor.open.noTimeout | Each cursor can hold a snapshot, pinning old versions. | Above zero is a risk; above 10 strongly indicates cache pressure from snapshots. |
| currentOp max age | One runaway query can flood the cache with irrelevant pages. | Any non-background operation above 300 seconds. |
| System available memory / swap | External memory pressure steals page cache from MongoDB. | Available memory near zero or any swap activity. |
| Disk read await (iostat) | Confirms whether faults are actually causing queueing. | await above 20 ms sustained during fault spikes. |
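The warning signs in the table can be turned into a simple check. The sketch below evaluates one snapshot against those thresholds. The input field names (faultsPerSec, cacheFillPct, and so on) and the function name are hypothetical. Missing fields are skipped. A single snapshot cannot prove a condition is "sustained", so a caller would still need to require repeated breaches. The 50/s default is the EBS gp3 figure and should be raised for faster storage.

```javascript
// Hypothetical helper: compare one snapshot of signals with the table's warning thresholds.
// Fields that are undefined compare as false and are effectively skipped.
function checkSignals(s, faultThreshold = 50) {
  const warnings = [];
  if (s.faultsPerSec > faultThreshold) warnings.push(`page faults ${s.faultsPerSec}/s > ${faultThreshold}/s`);
  if (s.cacheFillPct > 80) warnings.push(`cache fill ${s.cacheFillPct}% > 80%`);
  if (s.cacheDirtyPct > 20) warnings.push(`cache dirty ${s.cacheDirtyPct}% > 20% (checkpoint stall risk)`);
  else if (s.cacheDirtyPct > 10) warnings.push(`cache dirty ${s.cacheDirtyPct}% > 10%`);
  if (s.noTimeoutCursors > 10) warnings.push(`${s.noTimeoutCursors} noTimeout cursors (> 10)`);
  else if (s.noTimeoutCursors > 0) warnings.push(`${s.noTimeoutCursors} noTimeout cursor(s) open`);
  if (s.maxOpAgeSecs > 300) warnings.push(`oldest operation ${s.maxOpAgeSecs}s > 300s`);
  if (s.swapActive) warnings.push("swap activity detected");
  if (s.readAwaitMs > 20) warnings.push(`read await ${s.readAwaitMs} ms > 20 ms`);
  return warnings;
}
```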
Fixes
Reduce the working set or improve locality
Add missing indexes or optimize queries so MongoDB touches fewer pages. Use db.collection.aggregate([{ $indexStats: {} }]) to verify indexes are being used. A single new collection scan can displace a previously stable working set. Tradeoff: write amplification from additional indexes and the I/O cost of index builds.
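The $indexStats output can be filtered to find indexes that consume cache and write bandwidth without serving reads. The minimal sketch below lists indexes whose accesses.ops is zero. Two caveats apply. The counter only covers the period since accesses.since, which resets when mongod restarts. The statistics are also per node, so each replica set member should be checked before anything is dropped. The function name and the example collection are hypothetical.

```javascript
// Hypothetical helper: names of indexes with zero recorded accesses in $indexStats output.
function unusedIndexes(indexStats) {
  // mongosh may return accesses.ops as a Long object; normalize to a number.
  const toNum = (v) =>
    v !== null && typeof v === "object" && typeof v.toNumber === "function"
      ? v.toNumber()
      : Number(v);
  return indexStats
    .filter((ix) => ix.name !== "_id_") // the _id index cannot be dropped, so skip it
    .filter((ix) => toNum(ix.accesses.ops) === 0)
    .map((ix) => ix.name);
}

// In mongosh (the "orders" collection is a placeholder):
//   unusedIndexes(db.orders.aggregate([{ $indexStats: {} }]).toArray())
```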
Right-size the WiredTiger cache
If the cache is too small for the working set, increase it with --wiredTigerCacheSizeGB or storage.wiredTiger.engineConfig.cacheSizeGB in the configuration file. Do not exceed roughly 80% of available RAM; the OS page cache and connection thread stacks also need space. In containers, set this explicitly based on the container limit, not the host’s. Tradeoff: less RAM for the OS page cache, which can paradoxically increase faults if overdone.
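A quick sanity check compares the configured cache against the memory limit and the roughly 80% ceiling above. The sketch assumes the limit comes from db.hostInfo().system.memLimitMB. That field reports a container memory limit when mongod detects one, on versions that provide it. The helper name and the 0.8 default are hypothetical and follow this section's guidance.

```javascript
// Hypothetical helper: compare a configured WiredTiger cache size with a memory limit.
function checkCacheSizing(cacheBytes, memLimitMB, maxFraction = 0.8) {
  const cacheGB = cacheBytes / 1024 ** 3;
  const limitGB = memLimitMB / 1024;
  const ceilingGB = limitGB * maxFraction; // leave room for the OS page cache and thread stacks
  return { cacheGB, limitGB, ceilingGB, overCeiling: cacheGB > ceilingGB };
}

// In mongosh:
//   var ss = db.serverStatus();
//   var hi = db.hostInfo();
//   printjson(checkCacheSizing(ss.wiredTiger.cache["maximum bytes configured"], hi.system.memLimitMB));
```

A result with overCeiling set to true suggests lowering cacheSizeGB. A cache far below the ceiling while faults stay high suggests the opposite.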
Free pinned snapshots
Kill unnecessarily long-running operations via db.killOp(). Identify applications leaving noCursorTimeout cursors open and close them. This immediately increases the pool of evictable pages. Warning: killing operations is disruptive to clients and can interrupt in-flight transactions or ETL jobs.
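Because killing operations is disruptive, it helps to separate selection from action. The sketch below only selects long-running active operations from db.currentOp().inprog and sorts them oldest first. The default of 300 seconds matches the warning threshold in the metrics table. The function name is hypothetical, and the opid in the usage comment is a placeholder.

```javascript
// Hypothetical helper: select long-running active operations for manual review (it never kills anything).
function longRunningOps(inprog, minSecs = 300) {
  // secs_running may arrive as a Long in mongosh; normalize it.
  const toNum = (v) =>
    v !== null && typeof v === "object" && typeof v.toNumber === "function"
      ? v.toNumber()
      : Number(v);
  return inprog
    .filter((op) => op.active && toNum(op.secs_running) >= minSecs)
    .map((op) => ({ opid: op.opid, op: op.op, ns: op.ns, secs: toNum(op.secs_running) }))
    .sort((a, b) => b.secs - a.secs);
}

// In mongosh, review the list first, then kill only what you have confirmed is safe to stop:
//   longRunningOps(db.currentOp().inprog).forEach((o) => printjson(o));
//   db.killOp(12345);  // placeholder opid
```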
Reduce memory competition
Shrink application connection pool sizes to reduce thread stack overhead, or move non-MongoDB workloads off the node. Ensure vm.swappiness is set to 1 so the OS prefers reclaiming page cache over swapping. If swap is active, faults become far more expensive.
Scale out or archive cold data
If the working set exceeds what can fit in memory economically, shard the collection to spread the working set across nodes, or archive cold data to reduce the active set. Tradeoff: operational complexity.
Upgrade storage if faults are unavoidable
If the working set cannot be reduced and memory cannot be increased, ensure the storage layer can absorb the fault rate. Moving from EBS gp3 to NVMe-backed instances can turn a latency crisis into manageable background noise, because each fault costs far less on low-latency local storage.
Prevention
- Trend cache fill and dirty ratio over weeks. A steady climb from 60% to 75% gives early warning that the working set is approaching limits.
- Audit index usage monthly. Unused indexes consume cache and write bandwidth. Missing indexes cause scans that bloat the effective working set.
- Monitor connection churn, not just connection count. High totalCreated rates increase memory fragmentation and RSS pressure.
- Gate alerts on uptime. Suppress page fault alerts during the first 30 minutes after restart to avoid false positives during warmup.
- Track currentOp max age continuously. Catching a runaway query at 60 seconds prevents it from flooding the cache and causing a fault storm.
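The uptime gate can be expressed as a small predicate. The sketch below suppresses fault alerts during the first 30 minutes of uptime, using serverStatus().uptime, which is reported in seconds. It treats "sustained" as every recent rate sample exceeding the threshold, and that definition is an assumption. The function name and defaults are hypothetical.

```javascript
// Hypothetical helper: decide whether a page fault alert should fire.
// uptimeSecs: db.serverStatus().uptime; recentRates: recent faults/s samples (newest last).
function shouldAlertOnFaults(uptimeSecs, recentRates, threshold = 50, warmupSecs = 30 * 60) {
  if (uptimeSecs < warmupSecs) return false; // still warming up: faults are expected
  if (recentRates.length === 0) return false; // no data, no alert
  // "Sustained" assumed to mean every recent sample is above the threshold.
  return recentRates.every((r) => r > threshold);
}
```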
How Netdata helps
Netdata correlates extra_info.page_faults with WiredTiger cache fill, dirty ratio, and eviction rates. OS-level disk latency and mongod RSS on the same dashboard distinguish external memory pressure from internal cache saturation. Historical tracking of long-running operation age and cursor counts shows which query or noTimeout cursor preceded a fault spike. Connection churn is shown as a rate, surfacing thread-creation overhead that competes with the page cache. Second-granularity collection catches brief fault bursts that slower tools average away.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB balancer stuck and jumbo chunks: permanent imbalance and how to fix it
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB chunk migration storms: moveChunk I/O pressure and range locks
- MongoDB connection churn: high totalCreated rate and thread creation overhead
- MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling
- MongoDB connection storm spiral: reconnection floods after an election or deploy