MongoDB RSS growing without cache growth: leaks, threads, and tcmalloc fragmentation

db.serverStatus().mem.resident climbs while WiredTiger cache utilization stays flat and the host is not swapping. Virtual memory is larger than RSS by design and is not an alert target. Only RSS reflects physical memory pressure. When RSS grows without cache growth, the problem lives outside the storage engine.

This pattern points to one of three areas: tcmalloc heap retention and fragmentation, per-connection thread stack accumulation, or unbounded internal allocations from cursors, plan caches, or aggregation pipelines. Each connection gets its own thread with up to 1 MB of stack, so a connection storm can add gigabytes of RSS in minutes. TCMalloc keeps freed memory in per-thread or per-CPU caches, which inflates RSS independently of the WiredTiger cache.

Version-specific allocator changes complicate the picture. MongoDB 8.0 moved to an updated tcmalloc with per-CPU caches, and the THP recommendation flips with it: enable THP on 8.0, disable it on earlier versions. MongoDB 7.0 introduced a confirmed memory leak in the Slot-Based Execution plan cache (SERVER-96924). Container deployments add another wrinkle: MongoDB may detect host RAM instead of the container limit, so the default cache size can exceed what the cgroup allows.

Use the read-only checks below to isolate the source before restarting. A restart drops RSS and erases the diagnostic state you need to prevent recurrence.

What this means

WiredTiger cache is a managed buffer pool with its own memory budget (cacheSizeGB). When cache utilization is flat but RSS rises, the additional memory comes from the C++ heap (tcmalloc), thread stacks, or internal data structures. TCMalloc retains deallocated blocks in per-thread or per-CPU caches to reduce lock contention. This cached memory counts toward RSS but is invisible to WiredTiger.

MongoDB uses one thread per connection. Each connection thread reserves up to 1 MB of virtual address space for its stack, although the resident portion is typically tens to hundreds of kilobytes. At thousands of connections, thread stacks alone can still consume multiple gigabytes of RSS in the worst case.
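
As a rough worked example of the arithmetic above, the sketch below compares the reservation ceiling with a plausible resident figure. The connection count and the 100 KB per-stack value are illustrative assumptions, not measurements, and `stackFootprintMb` is a hypothetical helper.

```javascript
// Estimate aggregate thread-stack memory for a given connection count.
// perThreadKb is an assumption: 1024 KB is the reservation ceiling named in the text,
// while the resident portion per stack is usually much smaller.
function stackFootprintMb(connections, perThreadKb) {
  return (connections * perThreadKb) / 1024;
}

const sampleConnections = 5000; // hypothetical connection count
console.log("Ceiling MB: " + stackFootprintMb(sampleConnections, 1024));
console.log("Assumed-typical MB: " + stackFootprintMb(sampleConnections, 100).toFixed(0));
```

The gap between the two figures is why connection count alone does not prove that stacks explain an RSS increase; it only bounds how much they could explain.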

Internal structures can also balloon. Aggregation pipeline stages allocate memory outside the WiredTiger cache, capped at 100 MB per stage by default. Cursors with noCursorTimeout hold snapshots open, pinning memory until they close. The query plan cache, particularly in MongoDB 7.0, has exhibited unbounded growth under specific query patterns.

flowchart TD
    A[RSS growing] --> B{Cache flat?}
    B -->|Yes| C[Non-cache growth]
    C --> D[tcmalloc retention]
    C --> E[Connection threads]
    C --> F[Cursor leaks]
    C --> G[Plan cache leak]
    C --> H[Aggregation memory]
    B -->|No| I[See cache pressure guides]

Common causes

Cause	What it looks like	First thing to check
TCMalloc heap retention and fragmentation	RSS exceeds active allocations; `pageheap_free_bytes` plus `total_free_bytes` is high	`db.serverStatus().tcmalloc`
Connection thread stack accumulation	RSS spikes correlate with connection count spikes; `current` in the thousands	`db.serverStatus().connections`
Cursor or aggregation memory leak	`open.noTimeout` or `open.total` growing; long-running aggregations	`db.serverStatus().metrics.cursor` and `db.currentOp()`
Plan cache leak (MongoDB 7.0 SBE)	Unbounded RSS growth on 7.0 with complex `$in` arrays; plan cache size climbing	`db.serverStatus().metrics.query.planCacheTotalSizeEstimateBytes`
Container cache misconfiguration	RSS approaches container memory limit while `cacheSizeGB` is sized for host RAM	`maximum bytes configured` in `db.serverStatus().wiredTiger.cache` vs cgroup limit

Quick checks

# Compare RSS to WiredTiger cache limit
mongosh --quiet --eval 'var s=db.serverStatus(); print("RSS MB: " + s.mem.resident); print("Cache max GB: " + (s.wiredTiger.cache["maximum bytes configured"]/1024/1024/1024).toFixed(1));'

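// Check tcmalloc retained memory (run in mongosh)
// The free-byte counters sit under the nested tcmalloc subdocument; exact field names vary by version.
var tc = db.serverStatus().tcmalloc;
print("Retained bytes: " + (tc.tcmalloc.pageheap_free_bytes + tc.tcmalloc.total_free_bytes));
print("Heap size to allocated ratio: " + (tc.generic.heap_size / tc.generic.current_allocated_bytes).toFixed(2));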

// Check connection count and churn
var c = db.serverStatus().connections;
printjson({current: c.current, available: c.available, totalCreated: c.totalCreated});

// Check cursor state
printjson(db.serverStatus().metrics.cursor);

// Check plan cache size
print("Plan cache bytes: " + db.serverStatus().metrics.query.planCacheTotalSizeEstimateBytes);

// Find long-running operations
db.currentOp({ "active": true, "secs_running": { "$gt": 60 } }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});

# Check cgroup memory limit if containerized
cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || cat /sys/fs/cgroup/memory.max 2>/dev/null

How to diagnose it

Confirm the cache is flat. Sample db.serverStatus().wiredTiger.cache fill ratio and dirty ratio over time. Stable values below 80 percent rule out cache-driven growth.
Compare RSS to the expected baseline. Expected RSS is approximately cacheSizeGB plus 500 MB to 1 GB of internal overhead plus connections.current multiplied by roughly 1 MB. If actual RSS exceeds this by more than 20 percent, continue.
Check tcmalloc stats. Sum pageheap_free_bytes and total_free_bytes. If this sum represents a large portion of the RSS gap, the cause is fragmentation or allocator caching rather than a leak.
Check connection count. If current is high (thousands) and correlates with the RSS timeline, thread stacks are the likely source. A high totalCreated delta indicates churn.
Check cursor state. Elevated open.noTimeout or growing open.total without corresponding workload means cursors are leaking. Each one may hold a snapshot and memory.
Check plan cache size on MongoDB 7.0. Continuous growth of planCacheTotalSizeEstimateBytes with complex $in arrays suggests the SBE plan cache leak (SERVER-96924).
Check for aggregation memory pressure. Review db.currentOp() for long-running aggregations.
Verify container memory limits. Ensure cacheSizeGB is sized for the container limit, not host RAM.
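
The baseline comparison in the second step can be sketched as a small calculation. The helper names, the 1 GB overhead default, and the sample numbers are assumptions for illustration; substitute values from your own deployment.

```javascript
// Expected RSS in MB: cache budget + internal overhead + one stack per connection.
// overheadMb defaults to 1024 (upper end of the 500 MB to 1 GB range in the text).
function expectedRssMb(cacheSizeGb, connections, overheadMb = 1024, stackMbPerConn = 1) {
  return cacheSizeGb * 1024 + overheadMb + connections * stackMbPerConn;
}

// Compare actual RSS (mem.resident is reported in MB) against the baseline.
function rssGap(actualRssMb, expectedMb) {
  const excessMb = actualRssMb - expectedMb;
  const excessRatio = excessMb / expectedMb;
  return { excessMb, excessRatio, investigate: excessRatio > 0.20 };
}

// Hypothetical deployment: 16 GB cache, 2000 connections, 24000 MB resident.
const baselineMb = expectedRssMb(16, 2000);
console.log(baselineMb, rssGap(24000, baselineMb));
```

Because the per-connection figure is a ceiling, this baseline errs on the high side; an RSS that exceeds it by more than 20 percent is a stronger signal than the raw number suggests.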

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`mem.resident`	Physical memory consumed by mongod	Exceeds `cacheSizeGB + 2 GB + connections × 1 MB` by more than 20 percent
`tcmalloc` retained bytes	Allocator-cached memory inflates RSS independently of cache	`pageheap_free_bytes + total_free_bytes` grows steadily or exceeds 20 percent of RSS
`connections.current`	Each connection can add up to roughly 1 MB of stack to RSS	Sustained count greater than 1000 or rapid spikes
`metrics.cursor.open.noTimeout`	Leaked cursors pin snapshots and memory	Count greater than 10 or growing steadily
`metrics.query.planCacheTotalSizeEstimateBytes`	Plan cache leak indicator on affected versions	Monotonic growth without workload change
`wiredTiger.cache` fill ratio	Rules out cache-driven growth	Flat while RSS climbs
`opcounters.getmore`	High cursor iteration rates can indicate leaked or large result sets	Spike without corresponding query increase
`currentOp` max duration	Long-running aggregations allocate outside cache	Operations exceeding 300 seconds
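
Several warning signs in the table depend on steady growth rather than an absolute value. A minimal sketch of such a check over periodically sampled values follows; the function name, the minimum of three samples, and the 10 percent threshold are hypothetical choices, not recommendations from MongoDB.

```javascript
// Return true when a series never decreases and grows by at least
// minTotalGrowthRatio from the first sample to the last.
function growsMonotonically(samples, minTotalGrowthRatio = 0.10) {
  if (samples.length < 3) return false; // too few points to call it a trend
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] < samples[i - 1]) return false; // any dip breaks the pattern
  }
  const first = samples[0];
  const last = samples[samples.length - 1];
  return first > 0 && (last - first) / first >= minTotalGrowthRatio;
}

// Hypothetical hourly samples of planCacheTotalSizeEstimateBytes.
console.log(growsMonotonically([100e6, 110e6, 125e6, 140e6]));
```

The same check applies to the tcmalloc retained-bytes sum or to `open.noTimeout`; pair it with a workload metric so that organic growth is not mistaken for a leak.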

Fixes

TCMalloc fragmentation and retention

High pageheap_free_bytes plus total_free_bytes with stable current_allocated_bytes indicates fragmentation, not a leak.

MongoDB 8.0: Verify THP is enabled. Ensure Restartable Sequences (rseq) are available to tcmalloc. If glibc registered rseq first and tcmalloc fell back to per-thread caches, set GLIBC_TUNABLES=glibc.pthread.rseq=0 in the environment before starting mongod.
Prior to 8.0: Disable THP to reduce latency spikes and fragmentation.
If retained memory threatens OOM: Schedule a rolling restart during a maintenance window. Disruptive but effective.
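
The distinction drawn at the start of this subsection, retention versus a leak, can be sketched by comparing two `db.serverStatus().tcmalloc` samples taken some time apart. This assumes the `generic` and `tcmalloc` subdocument layout, which varies by version; the helper names and the 5 percent tolerance are hypothetical.

```javascript
// Sum allocator-held free memory. Falls back to top-level fields in case
// the counters are not nested under a tcmalloc subdocument.
function retainedBytes(tc) {
  const t = tc.tcmalloc || tc;
  return (t.pageheap_free_bytes || 0) + (t.total_free_bytes || 0);
}

// Classify growth between two samples of the tcmalloc section.
function classifyHeapGrowth(earlier, later, tolerance = 0.05) {
  const before = earlier.generic.current_allocated_bytes;
  const after = later.generic.current_allocated_bytes;
  if ((after - before) / before > tolerance) return "allocations-growing";
  if (retainedBytes(later) > retainedBytes(earlier)) return "allocator-retention";
  return "stable";
}

// Hypothetical samples: live allocations flat, free bytes up by 2.5 GB.
const t0 = { generic: { current_allocated_bytes: 10e9 }, tcmalloc: { pageheap_free_bytes: 1e9, total_free_bytes: 0.5e9 } };
const t1 = { generic: { current_allocated_bytes: 10.1e9 }, tcmalloc: { pageheap_free_bytes: 3e9, total_free_bytes: 1e9 } };
console.log(classifyHeapGrowth(t0, t1));
```

A result of "allocations-growing" points away from the allocator and toward the cursor, plan cache, or aggregation sections.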

Connection and thread stack overhead

Reduce connection count to shrink the aggregate stack footprint.

Review driver pool sizes. Lower maxIncomingConnections if the server accepts more than the workload needs.
Fix connection churn. A high totalCreated delta means pools are destroying and recreating connections. Check for network blips, DNS issues, or election storms causing mass reconnects.
Tradeoff: Lowering limits may cause connection refused errors during spikes, but prevents memory exhaustion.
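
Churn is easiest to see as a rate. The sketch below derives it from two `db.serverStatus().connections` samples; the helper name and the sample values are hypothetical.

```javascript
// Compute connection creation rate and net change between two samples
// of the serverStatus connections section, taken elapsedSec apart.
function connectionChurn(earlier, later, elapsedSec) {
  return {
    createdPerSec: (later.totalCreated - earlier.totalCreated) / elapsedSec,
    netChange: later.current - earlier.current
  };
}

// Hypothetical samples 60 seconds apart.
const c0 = { current: 800, totalCreated: 100000 };
const c1 = { current: 810, totalCreated: 106000 };
console.log(connectionChurn(c0, c1, 60));
```

A high creation rate alongside a small net change means connections are being torn down and rebuilt rather than accumulating, which points at pool settings or network instability instead of raw connection count.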

Cursor and aggregation leaks

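Close leaked cursors. Idle cursors generally do not appear in db.currentOp() output, so identify idle noTimeout cursors with the $currentOp aggregation stage (idleCursors: true) and close them with the killCursors command if safe. Use db.killOp() for operations that are still running.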
Fix application code to close cursors explicitly and avoid noCursorTimeout unless necessary.
For aggregations that risk exceeding memory limits, enable allowDiskUse: true. This spills intermediate data to disk and avoids OOM, though it increases latency. In MongoDB 6.0 and later, the allowDiskUseByDefault server parameter controls the global default and is true unless changed.
Tradeoff: Disk-based aggregation increases I/O load and slows pipeline execution.
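
One way to list idle cursors opened with noCursorTimeout is a `$currentOp` pipeline. The sketch below only builds the pipeline; it assumes the `idleCursors` option and the `cursor.*` output fields documented for `$currentOp`, which you should confirm against your server version. The function name is hypothetical.

```javascript
// Build an aggregation pipeline that reports idle cursors created with
// noCursorTimeout, including when each was created and last used.
function idleNoTimeoutCursorPipeline() {
  return [
    { $currentOp: { allUsers: true, idleCursors: true } },
    { $match: { type: "idleCursor", "cursor.noCursorTimeout": true } },
    { $project: { ns: 1, "cursor.cursorId": 1, "cursor.createdDate": 1, "cursor.lastAccessDate": 1 } }
  ];
}

console.log(JSON.stringify(idleNoTimeoutCursorPipeline(), null, 2));
```

In mongosh the pipeline would be run against the admin database, for example `db.getSiblingDB("admin").aggregate(idleNoTimeoutCursorPipeline())`. Cursors with an old `lastAccessDate` are the candidates to review before closing anything.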

Plan cache leak (MongoDB 7.0)

Upgrade to MongoDB 8.0, which resolves SERVER-96924.
If upgrading is not viable, disable the Slot-Based Execution engine, for example by setting the internalQueryFrameworkControl parameter to forceClassicEngine.
As an interim measure, schedule weekly rolling restarts to truncate the plan cache.
Tradeoff: Disabling SBE may change query plans and performance characteristics. Test before applying.

Container memory limits

Explicitly set cacheSizeGB in mongod.conf based on the container memory limit, not host RAM.
Leave headroom for connection stacks and heap overhead: the container limit should typically be cacheSizeGB plus 2 to 3 GB.
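
The sizing rule above can be sketched as a calculation from the raw cgroup limit. The helper names and the quarter-gigabyte rounding are hypothetical; the 2 GB default headroom is the lower end of the range in the text, and 0.25 is the smallest value cacheSizeGB accepts.

```javascript
// Parse the contents of memory.max (cgroup v2) or memory.limit_in_bytes (v1).
// Returns the limit in bytes, or null when no limit is set.
function parseCgroupLimit(raw) {
  const text = String(raw).trim();
  if (text === "" || text === "max") return null; // v2 reports "max" when unlimited
  const bytes = Number(text);
  // v1 reports a very large sentinel value when unlimited.
  if (!Number.isFinite(bytes) || bytes >= 2 ** 60) return null;
  return bytes;
}

// Suggest cacheSizeGB: container limit minus headroom, rounded down to 0.25 GB.
function suggestCacheSizeGb(limitBytes, headroomGb = 2) {
  const cacheGb = limitBytes / 1024 ** 3 - headroomGb;
  return cacheGb > 0.25 ? Math.floor(cacheGb * 4) / 4 : 0.25;
}

const limit = parseCgroupLimit("8589934592\n"); // hypothetical 8 GiB container
console.log(limit === null ? "no cgroup limit" : suggestCacheSizeGb(limit));
```

Raise `headroomGb` for deployments with thousands of connections or heavy aggregation workloads, since both consume memory outside the cache.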

Prevention

Trend RSS, cache fill, and tcmalloc retained bytes together. A widening gap between RSS and cache used predicts allocator pressure before it becomes critical.
Monitor connection count and totalCreated delta. Alert on connection churn, not just max connections.
Avoid noCursorTimeout cursors in application code. Close cursors explicitly and use standard timeouts.
Size cacheSizeGB explicitly in containers and standalone deployments so WiredTiger does not fall back to a default derived from host RAM.
For MongoDB 7.0 deployments using complex aggregations with large $in arrays, plan an upgrade path to 8.0.

How Netdata helps

Charts mem.resident alongside wiredTiger.cache fill ratio, exposing divergence between RSS and cache.
Tracks connection count and churn.
Collects tcmalloc memory statistics where exposed, distinguishing allocator-retained bytes from active allocations.
High-resolution operation latency and queue depth metrics help identify cursor leaks and aggregation pressure.
Container-aware memory charts reveal cgroup limit pressure even when the process sees host RAM.

