MongoDB OOM-killed by the kernel: RSS, cache sizing, and oom_score_adj

You find mongod gone. The replica set has no primary. Applications time out. MongoDB logs show no graceful shutdown. Instead, dmesg shows Out of memory: Killed process 12345 (mongod). The Linux OOM killer has reaped the process. MongoDB is a frequent target because its resident set size is usually the largest on the host.

An OOM kill is not a MongoDB bug. It is the kernel freeing RAM by terminating the highest-scoring process. mongod’s RSS is dominated by the WiredTiger cache, plus roughly 1 MB per connection, plus roughly 500 MB to 1 GB of internal overhead for indexes, session buffers, and stack. When that sum comes within 1 GB of total RAM, the node is in the danger zone. The kill is abrupt: no stepdown, no replica set coordination, and after restart the cache must warm again.

The fix is rarely just adding RAM. Size the WiredTiger cache so RSS fits safely inside the host or container limit, control connection count and churn, and use oom_score_adj as a protective signal without making the host unmanageable.

flowchart TD
    A[System RAM] --> B[mongod RSS]
    B --> C[WiredTiger cache]
    B --> D[~1 MB per connection]
    B --> E[Internal overhead ~500 MB-1 GB]
    C --> F[Cache fill >80%]
    D --> G[Connection storm]
    F --> H[RSS approaches RAM]
    G --> H
    H --> I[OOM killer selects mongod]
    I --> J[mongod terminated]

What this means

RSS is the physical memory mongod occupies. In a healthy node:

RSS ~= WiredTiger cache size + (connections current × ~1 MB) + ~500 MB-1 GB overhead

If your node has 16 GB RAM and the default cache formula applies, WiredTiger claims max(0.5 × (16 GB - 1 GB), 0.25 GB) = 7.5 GB. With 1,000 connections (~1 GB) and 1 GB of overhead, RSS sits around 9.5 GB. That is comfortable. But if mongod runs in an 8 GB container on that host and sizes its cache from host RAM instead of the container limit, the cache still claims 7.5 GB, leaving almost no room for connections or overhead. RSS quickly reaches the container limit and the cgroup OOM killer terminates mongod.
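The arithmetic in this example can be reproduced with a small calculator. This is a minimal sketch in plain JavaScript (it should also paste into mongosh). The function names are hypothetical, and the 1 MB per connection and 1 GB overhead figures are this article's rules of thumb, not values reported by MongoDB.

```javascript
// Default WiredTiger cache size in GB:
// the larger of 50% of (RAM - 1 GB) and 256 MB (0.25 GB).
function defaultCacheGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

// Rough RSS estimate in GB: cache + connections + fixed overhead.
// overheadGB and perConnMB are approximations, not measured values.
function estimateRssGB(cacheGB, connections, overheadGB = 1, perConnMB = 1) {
  return cacheGB + (connections * perConnMB) / 1024 + overheadGB;
}

const hostCacheGB = defaultCacheGB(16);             // 7.5
const hostRssGB = estimateRssGB(hostCacheGB, 1000); // roughly 9.5
console.log(`16 GB host: cache ${hostCacheGB} GB, RSS ~${hostRssGB.toFixed(1)} GB`);

// The same host-derived cache inside an 8 GB container exceeds the limit.
console.log(`Over an 8 GB limit: ${hostRssGB > 8}`);
```

Swapping in your own connection count and overhead estimate shows how much of the limit the cache alone is allowed to consume.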

Treat mem.resident within 1 GB of total RAM, or within 1 GB of the container memory limit, as a pre-OOM condition.
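As a sketch, the pre-OOM rule reduces to a single comparison. The helper name `isPreOom` and the sample numbers are illustrative. The check assumes both values use the same unit; `mem.resident` is reported in MiB.

```javascript
// True when resident memory is within marginMB of the memory ceiling.
// residentMB: db.serverStatus().mem.resident
// limitMB: total RAM or the container memory limit, in the same unit
function isPreOom(residentMB, limitMB, marginMB = 1024) {
  return limitMB - residentMB <= marginMB;
}

console.log(isPreOom(7300, 8192)); // true: 892 MB of headroom left
console.log(isPreOom(5000, 8192)); // false: 3192 MB of headroom left
```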

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| WiredTiger cache using host RAM formula inside a container | OOM kill shortly after startup; RSS hovers near the container limit even under light load | wiredTiger.cache.maximum bytes configured vs the container memory limit |
| Connection storm after failover, deploy, or DNS blip | totalCreated spikes; RSS tracks connections.current; latency rises from thread overhead | db.serverStatus().connections correlated with db.serverStatus().mem.resident |
| Default cache on a small VM | RSS reaches the 2 GB limit despite the 256 MB cache floor | mem.resident vs total RAM minus 1 GB safety margin |
| Long-running snapshots pinning cache | Cache fill and dirty ratio rise without workload increase; operations slow from application-thread eviction | db.currentOp() for old transactions and metrics.cursor.open.noTimeout |
| Memory leak or heap fragmentation | RSS grows steadily while cache utilization and connection count are flat | db.serverStatus().tcmalloc.generic for heap_size vs current_allocated_bytes |

Quick checks

All are read-only. Run in order.

# Confirm the OOM kill in the kernel log
sudo dmesg -T | grep -i "out of memory"
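# kern.log exists on Debian/Ubuntu; on systemd hosts without it, try: sudo journalctl -k | grep -i "killed process"
sudo grep "Killed process.*mongod" /var/log/kern.log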
# Check mongod RSS in kilobytes
for pid in $(pgrep mongod); do grep VmRSS /proc/$pid/status; done
# Check current oom_score_adj
for pid in $(pgrep mongod); do cat /proc/$pid/oom_score_adj; done
# Check swap and swappiness
sysctl vm.swappiness
cat /proc/swaps

// In mongosh: check RSS, cache size, and connection count
var status = db.serverStatus(); // one call, so all values come from the same sample
var mem = status.mem;
var wt = status.wiredTiger.cache;
var conn = status.connections;
print("RSS MB: " + mem.resident);
print("Cache max MB: " + (wt["maximum bytes configured"] / 1024 / 1024).toFixed(0));
print("Cache used %: " + (100 * wt["bytes currently in the cache"] / wt["maximum bytes configured"]).toFixed(1));
print("Connections current: " + conn.current);
print("Connections totalCreated: " + conn.totalCreated);

How to diagnose it

  1. Confirm the kill was OOM. Look for Out of memory: Killed process <pid> (mongod) in dmesg or /var/log/kern.log. In containers, the runtime may also emit a cgroup-specific OOM message.
  2. Measure RSS after restart. Use db.serverStatus().mem.resident and compare it to total system RAM or the container limit. If it is already within 1 GB, the cache is oversized or connections are too high.
  3. Compare the configured cache to available memory. wiredTiger.cache.maximum bytes configured should leave room for connections, overhead, and the OS page cache.
  4. Check for connection churn. A high totalCreated delta with stable current means connections are being destroyed and recreated rapidly. Each creation allocates a thread stack and spikes RSS.
  5. Check for container misconfiguration. In containers, ensure storage.wiredTiger.engineConfig.cacheSizeGB is set explicitly and sized for the container limit, not the host RAM.
  6. Look for snapshot pinning. Long-running multi-document transactions and noCursorTimeout cursors hold old cache snapshots open, preventing eviction and inflating RSS.
  7. Suspect a leak only after ruling out cache and connections. Compare tcmalloc.generic.heap_size to current_allocated_bytes. Significant and growing divergence suggests fragmentation.
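Steps 4 and 7 both come down to simple ratios over `serverStatus` fields. The sketch below uses hypothetical helper names and made-up sample values. It assumes plain numeric inputs; in mongosh some counters may come back as 64-bit integer types and need converting first.

```javascript
// Connections created per second between two serverStatus().connections samples.
function connectionChurnPerSec(prevConn, currConn, intervalSec) {
  return (currConn.totalCreated - prevConn.totalCreated) / intervalSec;
}

// Share of the tcmalloc heap not backing live allocations (0 to 1).
// generic: db.serverStatus().tcmalloc.generic
function heapSlackRatio(generic) {
  return (generic.heap_size - generic.current_allocated_bytes) / generic.heap_size;
}

// Illustrative samples taken 60 seconds apart, not real output.
const sampleBefore = { current: 400, totalCreated: 10000 };
const sampleAfter = { current: 405, totalCreated: 16000 };
console.log(connectionChurnPerSec(sampleBefore, sampleAfter, 60)); // 100

console.log(heapSlackRatio({ heap_size: 8e9, current_allocated_bytes: 6e9 })); // 0.25
```

A churn rate of 100 per second with `current` nearly flat is the pattern step 4 describes. For the heap ratio, the trend across samples matters more than any single reading.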

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| mem.resident | Tracks mongod RSS, the memory the OOM killer scores | Within 1 GB of total RAM or cgroup limit |
| wiredTiger.cache.maximum bytes configured | Shows whether the cache is sized to the host instead of the container limit | Leaves no room for connections and overhead |
| wiredTiger.cache.bytes currently in the cache | Cache fill directly adds to RSS | Sustained >80% with rising eviction |
| wiredTiger.cache.tracked dirty bytes in the cache | Dirty ratio predicts cache pressure and checkpoint stall | >20% of maximum bytes configured |
| wiredTiger.cache.pages evicted by application threads | Indicates cache pressure forcing user threads to do eviction work | Any sustained nonzero rate |
| connections.current | Each connection adds ~1 MB of RSS and scheduling overhead | Sustained >80% of maxIncomingConnections |
| connections.totalCreated delta | Churn allocates and destroys thread stacks repeatedly | Sharp spike while current is stable |
| metrics.cursor.open.noTimeout | Each cursor can pin a cache snapshot indefinitely | Growing or unexpectedly high |
| extra_info.page_faults rate | Sustained high rate after warmup indicates the working set exceeds memory | Sustained high rate after warmup |
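Several of these thresholds can be evaluated from one `serverStatus` document. The following is a minimal sketch, not a monitoring implementation. `memoryWarnings` is a hypothetical helper, the sample document is invented, and the connection ceiling is approximated as `current + available`. It checks a single sample, whereas the table's warning signs refer to sustained values.

```javascript
// Return the warning signs that a single serverStatus sample trips.
function memoryWarnings(status) {
  const cache = status.wiredTiger.cache;
  const conn = status.connections;
  const maxBytes = cache["maximum bytes configured"];
  const warnings = [];
  if (cache["bytes currently in the cache"] / maxBytes > 0.8) {
    warnings.push("cache fill above 80%");
  }
  if (cache["tracked dirty bytes in the cache"] / maxBytes > 0.2) {
    warnings.push("dirty ratio above 20%");
  }
  // current + available approximates the configured connection ceiling.
  if (conn.current / (conn.current + conn.available) > 0.8) {
    warnings.push("connections above 80% of limit");
  }
  return warnings;
}

// Invented sample: 90% cache fill, 10% dirty, 90% of connections in use.
const sampleStatus = {
  wiredTiger: { cache: {
    "maximum bytes configured": 4e9,
    "bytes currently in the cache": 3.6e9,
    "tracked dirty bytes in the cache": 0.4e9,
  } },
  connections: { current: 900, available: 100 },
};
console.log(memoryWarnings(sampleStatus));
```

In mongosh, `memoryWarnings(db.serverStatus())` would apply the same checks to a live node.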

Fixes

Resize the WiredTiger cache

The safe cache size is not the default. It is the largest value that keeps total RSS below the danger zone.

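For a container or VM with memory limit L, set cacheSizeGB so expected RSS stays at least 1 GB below L. With the approximation RSS ~= cache + connections + overhead, that means cache <= L - 1 GB - connections - overhead. Treat that as an upper bound and size below it when you can.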

Example: an 8 GB container with 1,000 connections (~1 GB) and 1 GB overhead should set cacheSizeGB to roughly 3 GB, not the host-derived 7.5 GB or the container-aware default of 3.5 GB. That yields an expected RSS near 5 GB, keeping 3 GB of headroom.
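To apply the same reasoning to other limits, the sketch below computes an upper bound for the cache. It assumes two constraints from this article: stay at least 1 GB below the limit, and keep RSS under roughly 80% of the limit. The function name and default figures are illustrative, not MongoDB guidance.

```javascript
// Upper bound for cacheSizeGB under a memory limit, in GB.
// A result below 0.25 means the limit is too small for this connection load.
function maxSafeCacheGB(limitGB, connections, overheadGB = 1, headroomGB = 1, perConnMB = 1) {
  // RSS must satisfy both rules, so take the stricter ceiling.
  const rssCeilingGB = Math.min(limitGB - headroomGB, 0.8 * limitGB);
  return rssCeilingGB - (connections * perConnMB) / 1024 - overheadGB;
}

// 8 GB container, 1,000 connections, 1 GB overhead.
console.log(maxSafeCacheGB(8, 1000).toFixed(2)); // "4.42"
```

The 3 GB chosen in the example sits well under this bound, which leaves extra room for a connection storm.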

Update mongod.conf:

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 3

Restart mongod to apply. You can also resize the cache at runtime with the setParameter command and its wiredTigerEngineRuntimeConfig parameter (for example, cache_size=3G), but the change does not survive a restart. Always update mongod.conf as the source of truth.
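A sketch of the runtime change follows. The helper `buildCacheResizeCommand` is hypothetical and only builds the command document. Running it requires mongosh and a user permitted to call `setParameter`.

```javascript
// Build the setParameter command that resizes the WiredTiger cache at runtime.
// cacheMB is a whole number of megabytes; 256 MB is the cache floor.
function buildCacheResizeCommand(cacheMB) {
  if (!Number.isInteger(cacheMB) || cacheMB < 256) {
    throw new RangeError("cacheMB must be an integer of at least 256");
  }
  return { setParameter: 1, wiredTigerEngineRuntimeConfig: `cache_size=${cacheMB}M` };
}

const resizeCmd = buildCacheResizeCommand(3072);
console.log(JSON.stringify(resizeCmd));
// In mongosh, run it against the admin database:
//   db.adminCommand(resizeCmd);
```

Shrinking the cache on a busy node forces eviction, so make the change during a quiet period and watch eviction metrics afterwards.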

Reduce connection pressure

If a connection storm triggered the OOM kill, reduce the driver pool size, eliminate connection leaks, and route read traffic to secondaries. If the server is actively flooded, temporarily lowering net.maxIncomingConnections rejects new connections cleanly instead of accepting them and dying from RSS growth.
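Before changing pool sizes, it helps to estimate the worst case. The sketch below assumes every application instance can fill its driver pool (`maxPoolSize`) at the same moment and ignores monitoring connections. The 50% target comes from this article's prevention guidance, and `poolBudget` is a hypothetical helper.

```javascript
// Worst-case connection load if every app instance fills its driver pool.
function poolBudget(appInstances, maxPoolSize, maxIncomingConnections, perConnMB = 1) {
  const peak = appInstances * maxPoolSize;
  return {
    peak,                                   // connections at full pool usage
    rssMB: peak * perConnMB,                // approximate RSS cost of those connections
    withinHalfOfLimit: peak <= 0.5 * maxIncomingConnections,
  };
}

// 40 instances with a pool of 100: 4,000 connections, over half of a 5,000 limit.
console.log(poolBudget(40, 100, 5000));
// Halving the pool brings the peak to 2,000 connections, within the target.
console.log(poolBudget(40, 50, 5000));
```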

Protect mongod with oom_score_adj

Set oom_score_adj to a negative value so the kernel is less likely to select mongod first. Do not set it to -1000; full immunity can cause the kernel to kill sshd, systemd, or other critical processes instead, potentially locking you out of the host. A practical protective value is often around -900.

# Run as root. This applies only to the running process and is lost when mongod restarts.
# For permanence, set it at service start, for example OOMScoreAdjust=-900 in the mongod systemd unit.
for pid in $(pgrep mongod); do echo -900 > /proc/$pid/oom_score_adj; done

Release pinned snapshots

Warning: killOp terminates operations immediately. Use only if you have identified the specific transaction or cursor causing pressure.

Kill long-running transactions and noCursorTimeout cursors that pin cache snapshots:

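// Find transactions open > 60 seconds
db.currentOp({ "transaction": { "$exists": true } }).inprog.forEach(function(op) {
  var openSec = op.transaction.timeOpenMicros / 1000000;
  if (openSec <= 60) return;
  if (op.opid !== undefined) {
    print("Killing " + op.opid + " open for " + openSec + "s");
    db.killOp(op.opid);
  } else {
    // An idle session holding a transaction may have no opid; report its session id instead.
    print("Idle transaction open for " + openSec + "s, lsid: " + tojson(op.lsid));
  }
});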

Prevention

  • Size for headroom. Keep mem.resident below roughly 80% of total RAM or the container limit, and never within 1 GB of the ceiling.
  • Plot RSS weekly. Track mem.resident against cache size and connection count. If RSS grows without cache growth, investigate fragmentation or leaks.
  • Cap connections. Operate below 50% of maxIncomingConnections to leave room for reconnection storms.
  • Set vm.swappiness=1. MongoDB should not swap. A value of 1 lets the kernel swap only under extreme pressure without evicting hot WiredTiger pages eagerly.
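  • Disable Transparent Huge Pages on MongoDB 7.0 and earlier. THP causes latency spikes and fragmentation for database workloads there, so set it to never or madvise. MongoDB 8.0 changed its allocator and its documentation recommends leaving THP enabled, so check the guidance for your version.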
  • Monitor dirty ratio and application-thread evictions. Rising dirty ratio and application-thread evictions are leading indicators that cache pressure is building before RSS explodes.

How Netdata helps

  • Correlate mem.resident, wiredTiger.cache utilization, and system memory usage on the same timeline to see when RSS approaches the host or cgroup limit.
  • Alert on WiredTiger cache dirty ratio climbing above safe thresholds before the pressure cascades into RSS growth.
  • Track connections.current and connections.totalCreated deltas alongside RSS to distinguish a connection storm from cache-driven memory growth.
  • Show container memory limits next to MongoDB metrics to expose container misconfiguration immediately.
  • Map kernel OOM events to MongoDB process state changes to confirm whether a restart was caused by the OOM killer.