MongoDB exceeded memory limit for $group — aggregation spills and allowDiskUse

Application logs show error 16945, or an aggregation pipeline slows by an order of magnitude. In MongoDB, every aggregation stage not backed by an index is limited to 100 megabytes of RAM. When a stage exceeds this limit and disk spilling is not enabled, the operation fails immediately. If spilling is enabled, MongoDB writes temporary files to disk, which keeps the pipeline alive but adds unpredictable latency and extra I/O load.

Before MongoDB 6.0, you had to explicitly opt in to disk spilling with { allowDiskUse: true }. Starting in 6.0, the allowDiskUseByDefault server parameter is true, so eligible stages spill automatically. That removes the hard failure for many pipelines, but it also makes it easier for heavy workloads to hide behind disk I/O instead of failing fast. Some stages and accumulators, such as $graphLookup and the $push and $addToSet accumulators inside $group, cannot spill to disk at all regardless of the setting.

What this means

The 100MB limit applies per stage. When a $group, $sort, $bucket, $bucketAuto, $setWindowFields, or $sortByCount stage accumulates more than 100MB of data in memory, MongoDB has two choices: abort the operation or write intermediate results to temporary files on disk. The decision is controlled by allowDiskUse. In MongoDB 6.0 and later, the server default is true, so most eligible stages spill automatically. In earlier versions, the default is false, and the pipeline aborts unless the client explicitly requests disk spilling.
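A back-of-the-envelope check can show whether a `$group` stage is likely to cross the per-stage limit before you run it. The sketch below is only an estimate. The function names are hypothetical, the per-group byte size is a figure you supply, and the limit is assumed to be 100 × 1024 × 1024 bytes.

```javascript
// Rough estimate of a $group stage's in-memory state.
// Assumption: the 100MB limit is read as 100 * 1024 * 1024 bytes.
const STAGE_LIMIT_BYTES = 100 * 1024 * 1024;

// distinctKeys: estimated number of distinct group keys
// avgBytesPerGroup: estimated accumulator state per group (caller-supplied guess)
function estimateGroupStateBytes(distinctKeys, avgBytesPerGroup) {
  return distinctKeys * avgBytesPerGroup;
}

function likelyExceedsLimit(distinctKeys, avgBytesPerGroup) {
  return estimateGroupStateBytes(distinctKeys, avgBytesPerGroup) > STAGE_LIMIT_BYTES;
}

// 2 million distinct keys at roughly 80 bytes of state each
console.log(estimateGroupStateBytes(2000000, 80)); // 160000000
console.log(likelyExceedsLimit(2000000, 80));      // true
// 200 thousand keys at the same size stay under the limit
console.log(likelyExceedsLimit(200000, 80));       // false
```

Treat the result as a prompt to look closer, not as a prediction. Actual memory use depends on the accumulators and document shapes involved.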

Not everything can spill. The $graphLookup stage ignores allowDiskUse entirely and is hard-capped at 100MB. If it exceeds the limit, it throws its own memory error. Inside a $group stage, the $push and $addToSet accumulators also cannot spill to disk. Even with allowDiskUse: true, unbounded arrays in these accumulators will hit the memory ceiling and fail.
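One way to catch these cases before deployment is to scan the pipeline definition for the operators named above. The following is a minimal sketch, not a MongoDB API. `findNonSpillingOperators` is a hypothetical helper, the collection fields are invented, and the scan only looks at top-level accumulators in `$group`.

```javascript
// Accumulators the text above describes as unable to spill inside $group.
const NON_SPILLING_ACCUMULATORS = ["$push", "$addToSet"];

// Returns one finding per $graphLookup stage and per $push/$addToSet accumulator.
function findNonSpillingOperators(pipeline) {
  const findings = [];
  pipeline.forEach((stage, index) => {
    if (stage.$graphLookup) {
      findings.push({ index, stage: "$graphLookup", field: null, operator: "$graphLookup" });
    }
    if (stage.$group) {
      for (const [field, spec] of Object.entries(stage.$group)) {
        // Skip the group key and anything that is not an accumulator document.
        if (field === "_id" || spec === null || typeof spec !== "object") continue;
        for (const op of NON_SPILLING_ACCUMULATORS) {
          if (op in spec) findings.push({ index, stage: "$group", field, operator: op });
        }
      }
    }
  });
  return findings;
}

// Hypothetical pipeline: orderIds grows without bound per customer.
const pipeline = [
  { $match: { status: "shipped" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" }, orderIds: { $push: "$_id" } } },
];
console.log(findNonSpillingOperators(pipeline));
```

A finding does not mean the pipeline will fail, only that `allowDiskUse` will not rescue that part of it if the arrays grow large.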

When spilling does occur, MongoDB records usedDisk: true for the operation in the profiler and the slow-query log. Temporary spill files exist for the duration of the pipeline execution and consume disk space on the instance’s storage volume. On instances with a small root volume, aggressive spilling can fill the disk and trigger secondary failures. In a sharded cluster, enabling allowDiskUse for a pipeline with a sorting or grouping stage causes the merge step to run on a randomly selected shard rather than on mongos, which can concentrate disk I/O on an unexpected node.

flowchart TD
    A[Stage exceeds 100MB RAM] --> B{allowDiskUse?}
    B -->|Yes| C{Stage supports spill?}
    B -->|No| D[Error 16945]
    C -->|Yes| E[Write temp files, usedDisk true]
    C -->|No| F[Error cannot spill]
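The same decision path can be written as a small function, which is handy when reasoning about a specific stage. This is an illustrative sketch only. `stageOutcome` and its return labels are hypothetical, and the limit is assumed to be 100 × 1024 × 1024 bytes.

```javascript
// Mirrors the flowchart: limit check, then allowDiskUse, then spill support.
function stageOutcome({ stageBytes, allowDiskUse, stageCanSpill }) {
  const LIMIT_BYTES = 100 * 1024 * 1024; // assumed reading of "100MB"
  if (stageBytes <= LIMIT_BYTES) return "in-memory";
  if (!allowDiskUse) return "memory-limit-error";   // e.g. 16945 for $group
  if (!stageCanSpill) return "cannot-spill-error";  // e.g. $graphLookup, $push
  return "spill-to-disk";                            // profiler shows usedDisk: true
}

// A 300 MiB $group with allowDiskUse on a stage that supports spilling
console.log(stageOutcome({ stageBytes: 300 * 1024 * 1024, allowDiskUse: true, stageCanSpill: true }));
// The same stage with disk use turned off
console.log(stageOutcome({ stageBytes: 300 * 1024 * 1024, allowDiskUse: false, stageCanSpill: true }));
```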

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Missing `allowDiskUse` on pre-6.0 or explicit opt-out | Error 16945 in driver logs; pipeline fails immediately | `db.adminCommand({ getParameter: 1, allowDiskUseByDefault: 1 })` and the command options |
| `$push` or `$addToSet` with unbounded arrays | Memory error even when `allowDiskUse: true` | Accumulators inside the `$group` stage |
| High-cardinality `$group` or large blocking `$sort` | Pipeline slows dramatically; `usedDisk: true` in profiler | `db.currentOp()` for active aggregations |
| Missing early `$match` before heavy stage | Spikes in scanned objects; temp files grow | `explain("executionStats")` for `docsExamined` |
| Blocking `$sort` without index support | `$sort` stage hits limit when not index-backed | `explain("executionStats")`: whether the sort appears as a blocking `SORT` stage |

Quick checks

Run these read-only commands to assess the current state without risking further impact.

# Check active aggregations and their run time
mongosh --quiet --eval 'db.currentOp({ active: true, "secs_running": { $gt: 10 } }).inprog.forEach(o => { if(o.command && o.command.aggregate) print(o.ns + " | " + o.secs_running + "s") })'

// Check the server default for disk spilling (6.0+)
db.adminCommand({ getParameter: 1, allowDiskUseByDefault: 1 })

# Search slow query log for memory limit errors or disk use
grep -iE "Exceeded memory limit|usedDisk" /var/log/mongodb/mongod.log | tail -10

// Query the system profiler for recent disk spills
db.system.profile.find({ usedDisk: true }).sort({ ts: -1 }).limit(5)

// Check command latency for aggregation tail latency
var l = db.serverStatus().opLatencies;
print("Command avg µs:", Math.floor(l.commands.latency / l.commands.ops));

// Check WiredTiger cache dirty ratio for I/O pressure
var c = db.serverStatus().wiredTiger.cache;
var dirty = 100 * c["tracked dirty bytes in the cache"] / c["maximum bytes configured"];
print("Cache dirty %:", dirty.toFixed(1));

# Check disk space on the data volume
df -h /data/db

// Check queue depth to see if spills are causing contention
var q = db.serverStatus().globalLock.currentQueue;
print("Queued readers:", q.readers, "writers:", q.writers);

How to diagnose it

1. Confirm the error and the failing stage. Check application logs and the MongoDB log for “Exceeded memory limit”. Identify whether the failing stage is $group, $sort, or another stage. Note the error code: 16945 for $group, 16819 for $sort.
2. Check whether allowDiskUse is enabled. Run getParameter for allowDiskUseByDefault on 6.0+. On earlier versions, the server default is false. If the client explicitly passed { allowDiskUse: false }, the pipeline will abort instead of spilling.
3. Review the pipeline shape. Use db.currentOp() to find active aggregations. Look at the stages that run before $group or $sort. If there is no selective $match near the start, the stage may be processing the entire collection.
4. Run explain("executionStats"). Look for COLLSCAN, high docsExamined, and whether a $sort is using an index. An index-backed sort does not count toward the 100MB limit. If the sort stage shows a blocking SORT instead of an index stage, it is consuming the memory budget.
5. Inspect the profiler for usedDisk. If usedDisk: true appears, the pipeline is spilling. Correlate the timestamp with disk I/O latency and WiredTiger cache pressure to determine if the spill is saturating storage.
6. Check for non-spill accumulators. If the error persists despite allowDiskUse: true, inspect the $group stage for $push or $addToSet. These accumulators cannot spill. The fix is pipeline restructuring, not a flag change.
7. Verify driver and framework defaults. Some drivers and ORMs override the server default, so confirm which allowDiskUse value the client actually sends with the command.
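The explain step above can be partly automated by walking the explain output and collecting every `stage` name it contains. The sketch below assumes nothing about the exact layout beyond nested objects and arrays with `stage` string fields. `collectStages`, `summarizePlan`, and the sample plan are hypothetical, and real explain output is larger and varies by server version.

```javascript
// Recursively collect every "stage" string found anywhere in the explain output.
function collectStages(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectStages(child, found));
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") found.push(node.stage);
    Object.values(node).forEach((child) => collectStages(child, found));
  }
  return found;
}

// Flag the plan stages the diagnosis steps ask about.
function summarizePlan(explainOutput) {
  const stages = collectStages(explainOutput);
  return {
    collectionScan: stages.includes("COLLSCAN"),
    blockingSort: stages.includes("SORT"),
    indexScan: stages.includes("IXSCAN"),
  };
}

// Hand-written, heavily simplified plan used only for illustration.
const samplePlan = {
  queryPlanner: {
    winningPlan: { stage: "SORT", inputStage: { stage: "COLLSCAN" } },
  },
};
console.log(summarizePlan(samplePlan));
// { collectionScan: true, blockingSort: true, indexScan: false }
```

In mongosh the input would come from something like `db.orders.explain("executionStats").aggregate(pipeline)`, where `orders` is a placeholder collection name.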

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| `system.profile` `usedDisk` | Detects aggregations writing temporary spill files | Any `usedDisk: true` on production query patterns |
| `opLatencies.commands` | Aggregations run as commands; rising latency reveals spill cost | Command avg or tail latency spikes during batch windows |
| `wiredTiger.cache` dirty ratio | Spill I/O competes with checkpoint writes, so dirty data accumulates in the cache | Dirty ratio sustained above 10% during aggregation peaks |
| `wiredTiger.log` sync latency | Temp-file I/O and journal pressure show up as sync latency | Average sync latency above 30 ms sustained |
| `globalLock.currentQueue` | Slow disk spills hold tickets and cause queuing | `readers` + `writers` sustained above 20 |
| `metrics.document.returned` | Unfiltered pipelines examine and return excessive documents | `returned` rate spikes without a matching rise in `opcounters.query` |
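Two of the thresholds in the table can be checked directly against a `serverStatus` document. The sketch below uses the same field names as the quick checks earlier in this guide. `evaluateSignals` is a hypothetical helper, the sample numbers are invented, and it tests a single sample whereas the table calls for sustained values.

```javascript
// Evaluate cache dirty ratio and queue depth from a serverStatus-shaped object.
function evaluateSignals(status) {
  const cache = status.wiredTiger.cache;
  const queue = status.globalLock.currentQueue;
  // Number() also accepts the Long values mongosh may return for counters.
  const dirtyPct =
    (100 * Number(cache["tracked dirty bytes in the cache"])) /
    Number(cache["maximum bytes configured"]);
  const queued = Number(queue.readers) + Number(queue.writers);

  const warnings = [];
  if (dirtyPct > 10) warnings.push("cache dirty ratio above 10%");
  if (queued > 20) warnings.push("queued readers + writers above 20");
  return { dirtyPct, queued, warnings };
}

// Invented sample: 150 MB dirty out of a 1 GB cache, 5 queued operations.
const sample = {
  wiredTiger: {
    cache: {
      "tracked dirty bytes in the cache": 150000000,
      "maximum bytes configured": 1000000000,
    },
  },
  globalLock: { currentQueue: { readers: 3, writers: 2 } },
};
console.log(evaluateSignals(sample));
```

In mongosh, pass `db.serverStatus()` instead of the sample object, and call the function repeatedly over time to judge whether a condition is sustained.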

Fixes

Enable or verify `allowDiskUse`

Before MongoDB 6.0, pass { allowDiskUse: true } in the aggregation command to allow eligible stages to spill. On MongoDB 6.0+, the server defaults to true, but explicitly passing { allowDiskUse: false } opts out. Some application frameworks override the default to false. Check your driver documentation and set the flag explicitly to match your intent.

In a sharded cluster, remember that allowDiskUse: true can move the merge stage to a randomly selected shard, concentrating I/O there.
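Setting the flag explicitly is easiest to enforce if every aggregation goes through one helper that refuses to guess. The following sketch builds the document for the `aggregate` database command. `buildAggregateCommand` is a hypothetical wrapper, and the collection and field names are placeholders.

```javascript
// Build an aggregate command document with an explicit allowDiskUse value.
function buildAggregateCommand(collection, pipeline, allowDiskUse) {
  if (typeof allowDiskUse !== "boolean") {
    throw new TypeError("allowDiskUse must be set explicitly to true or false");
  }
  return { aggregate: collection, pipeline, cursor: {}, allowDiskUse };
}

const cmd = buildAggregateCommand(
  "orders", // placeholder collection name
  [{ $group: { _id: "$customerId", total: { $sum: "$amount" } } }],
  true
);
console.log(JSON.stringify(cmd));

// In mongosh this document can be sent with db.runCommand(cmd). The shell
// helper form is db.orders.aggregate(pipeline, { allowDiskUse: true }).
```

Requiring a boolean makes the choice visible in code review, which matters more once the server default differs between versions.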

Reduce stage input with early filtering

Move $match stages as early as possible in the pipeline to reduce the document count entering $group or $sort. If your use case allows it, add a $limit before the heavy stage. Every document that reaches the stage consumes part of the 100MB budget.
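A simple lint over the pipeline array can flag heavy stages that run before any filter. This is a sketch under stated assumptions. `heavyStagesWithoutEarlierMatch` is a hypothetical helper, the field names are invented, and the check cannot tell whether an existing `$match` is selective or whether a later `$match` on computed fields could legitimately move earlier.

```javascript
// Stages that this guide lists as subject to the per-stage memory limit.
const HEAVY_STAGES = ["$group", "$sort", "$bucket", "$bucketAuto", "$setWindowFields", "$sortByCount"];

// Report heavy stages that appear before the first $match in the pipeline.
function heavyStagesWithoutEarlierMatch(pipeline) {
  let sawMatch = false;
  const unfiltered = [];
  pipeline.forEach((stage, index) => {
    const name = Object.keys(stage)[0];
    if (name === "$match") {
      sawMatch = true;
    } else if (HEAVY_STAGES.includes(name) && !sawMatch) {
      unfiltered.push({ index, name });
    }
  });
  return unfiltered;
}

// Groups the whole collection, then filters the result.
const lateFilter = [
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $match: { total: { $gt: 1000 } } },
];
// Filters by date first, so fewer documents reach $group.
const earlyFilter = [
  { $match: { createdAt: { $gte: new Date("2024-01-01") } } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
];

console.log(heavyStagesWithoutEarlierMatch(lateFilter));
console.log(heavyStagesWithoutEarlierMatch(earlyFilter));
```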

Ensure `$sort` is index-backed

An index-backed $sort bypasses the 100MB memory restriction entirely because the sort is performed during the index traversal. If explain shows a blocking SORT stage, add an index that matches the sort fields and direction. If the query also has equality filters, place those fields before the sort fields in the index.
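The field ordering described above can be captured in a helper that produces the key document for the index. This is a minimal sketch. `buildSortIndexKeys` is a hypothetical function, the field names are placeholders, and it relies on JavaScript preserving insertion order for ordinary string keys (field names that look like integers would be reordered).

```javascript
// Equality-filter fields first (ascending), then sort fields with their directions.
function buildSortIndexKeys(equalityFields, sortSpec) {
  const keys = {};
  for (const field of equalityFields) keys[field] = 1;
  for (const [field, direction] of Object.entries(sortSpec)) keys[field] = direction;
  return keys;
}

// Pipeline filters on status and sorts by createdAt descending.
const keys = buildSortIndexKeys(["status"], { createdAt: -1 });
console.log(JSON.stringify(keys)); // {"status":1,"createdAt":-1}

// In mongosh the result could be passed to db.orders.createIndex(keys),
// where "orders" is a placeholder collection name.
```

After creating the index, rerun explain to confirm that the blocking SORT stage is gone rather than assuming the planner picked it up.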

Replace accumulators that cannot spill

If you are using $push or $addToSet inside $group and the grouped arrays grow large, allowDiskUse will not help. Restructure the pipeline to return grouped identifiers without building the full arrays server-side, or move the array assembly to the application layer. For very large grouping operations, consider whether an out-of-band batch process is more appropriate. Map-reduce is deprecated as of MongoDB 5.0, so avoid it as a replacement.
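Moving array assembly to the application layer can be done in constant memory per group if the documents arrive sorted by the grouping key. The generator below is a sketch of that idea, not a driver feature. `groupSortedDocs` and the field names are hypothetical, and the `!==` comparison only suits primitive keys such as strings or numbers.

```javascript
// Assemble per-key arrays from documents that are already sorted by keyField.
// Only one group is held in memory at a time.
function* groupSortedDocs(docs, keyField, valueField) {
  let currentKey;
  let values = null;
  for (const doc of docs) {
    if (values === null || doc[keyField] !== currentKey) {
      if (values !== null) yield { _id: currentKey, values };
      currentKey = doc[keyField];
      values = [];
    }
    values.push(doc[valueField]);
  }
  if (values !== null) yield { _id: currentKey, values };
}

// Stand-in for query results sorted by customerId.
const sortedDocs = [
  { customerId: "a", orderId: 1 },
  { customerId: "a", orderId: 2 },
  { customerId: "b", orderId: 3 },
];
for (const group of groupSortedDocs(sortedDocs, "customerId", "orderId")) {
  console.log(JSON.stringify(group));
}
// {"_id":"a","values":[1,2]}
// {"_id":"b","values":[3]}
```

The server-side query then needs only a projection and an index-backed sort on the grouping key, both of which avoid the accumulators that cannot spill.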

Address disk spill latency

If usedDisk is true and latency is unacceptable, enabling the flag is only a stopgap. The real fix is reducing the data volume the stage must process. If the pipeline is already optimized, the bottleneck is disk throughput. Ensure your storage layer can handle the temporary I/O without degrading journal sync latency or checkpoint performance. On cloud block storage, watch for burst-credit depletion during spill-heavy windows.

Prevention

- Restructure pipelines to filter early. A $match at the start of the pipeline reduces the document count entering $group and $sort.
- Index sort fields. An index-backed $sort does not count toward the 100MB limit.
- Monitor profiler for usedDisk. Any production aggregation writing temp files is a candidate for pipeline optimization.
- Set explicit allowDiskUse in driver code. Do not rely on driver defaults; declare the intent explicitly to avoid framework overrides.
- Size disks for temp file overhead. Spilled aggregations write temporary files to the instance storage. Ensure the volume has enough free space to avoid secondary failures.

How Netdata helps

Netdata collects serverStatus metrics including opLatencies.commands and wiredTiger.cache dirty ratio. Disk latency and utilization charts show temp-file I/O saturation. Alerts on WiredTiger cache dirty ratio and application-thread evictions flag cache pressure cascades. Queue depth and ticket utilization metrics reveal when slow aggregations consume concurrency tickets.

