MongoDB operation exceeded time limit (MaxTimeMSExpired): maxTimeMS and killed operations

Error code 50, MaxTimeMSExpired, means the server killed an operation that exceeded its processing budget. The operation was already pathologically slow; maxTimeMS ended it before it could consume more resources or hold locks and tickets indefinitely.

maxTimeMS sets a cumulative processing budget in milliseconds. MongoDB enforces it using the same interrupt mechanism as killOp, terminating the operation only at designated interrupt points. Idle time between cursor batches does not count toward the limit, and on direct connections network latency is excluded from the server-side clock. On sharded clusters, however, latency between mongos and shard mongod instances counts against the limit. Distinguish a true MaxTimeMSExpired from a client-side socket timeout, where the client gives up before the server responds.
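To make the server-side versus client-side distinction concrete, here is a minimal sketch of an error classifier. It assumes the caught error is a plain object with optional `code`, `codeName`, and `message` properties, which is roughly how drivers surface server errors; the exact exception classes differ per driver, and the function name and return labels are hypothetical.

```javascript
// Sketch only: the error shape (code / codeName / message) is an assumption.
// A server-side kill carries error code 50; a client-side timeout has no code.
const MAX_TIME_MS_EXPIRED = 50;

function classifyTimeout(err) {
  if (!err) return "other";
  if (err.code === MAX_TIME_MS_EXPIRED || err.codeName === "MaxTimeMSExpired") {
    return "server-maxTimeMS"; // the server ran out of its processing budget
  }
  // No MongoDB error code and a timeout-looking message: the client gave up first.
  if (err.code === undefined && /timed? ?out/i.test(String(err.message))) {
    return "client-timeout";
  }
  return "other";
}

// In mongosh, a budget is attached per operation, for example:
//   db.orders.find({ status: "open" }).maxTimeMS(5000)
// ("orders" and the filter are placeholders).
```

Logging the classification alongside the operation name makes it easier to tell whether to investigate the query or the client timeout settings.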

What this means

MaxTimeMSExpired releases whatever resources the operation held: WiredTiger read or write tickets, cache space, and locks. The operation may have been scanning millions of documents or pinning an old snapshot.

This error is a symptom. Raising maxTimeMS without fixing the underlying slowness converts acute failures into chronic resource exhaustion. Long-running operations can block eviction and trigger cache pressure cascades. Find why the operation was slow and fix that.

flowchart TD
    A[Client sees MaxTimeMSExpired] --> B{Server-side or socket timeout?}
    B -->|Error code 50| C[Server killed operation]
    B -->|Network exception| D[Client timed out first]
    C --> E[currentOp shows long-running op]
    E --> F{Slow query plan?}
    F -->|COLLSCAN or bad IXSCAN| G[Missing index or plan regression]
    F -->|Plan is good| H[System saturation]
    H --> I[Cache dirty ratio high or tickets exhausted]
    D --> J[Raise socketTimeoutMS above maxTimeMS]
    G --> K[Build index or fix query]
    I --> L[Kill runaway ops or reduce load]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Missing or dropped index | Slow query log shows `COLLSCAN` or `keysExamined:docsReturned` > 100:1 | `db.collection.getIndexes()` and compare to query predicates |
| Query plan regression | Query was fast yesterday, slow today; same shape, different plan | `explain("executionStats")` or plan cache state |
| Cache pressure or ticket exhaustion | opLatencies spiking for all operations, not just one; app-thread evictions rising | `serverStatus().wiredTiger.cache` and `concurrentTransactions` |
| Runaway aggregation or large `$lookup` | `currentOp` shows `aggregate` with huge `docsReturned` or long `secs_running` | `db.currentOp({ "active": true, "secs_running": { "$gt": 60 } })` |
| Heavy load on secondary | Timeouts appear only on secondary reads while primary is healthy | `rs.status()` for lag and `serverStatus().flowControl` |
| Long-lived cursor with high maxTimeMS | Cursor killed after extended runtime despite high limit | Session idle lifetime and `metrics.cursor` |

Quick checks

These are read-only unless otherwise noted.

# Check operations running longer than 10 seconds
mongosh --quiet --eval 'db.currentOp({ "active": true, "secs_running": { "$gt": 10 } }).inprog.forEach(function(op) { print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns + " | " + JSON.stringify(op.command || {}).substring(0,120)); })'

# Tail the slow query log for recent timeouts
grep -E "MaxTimeMSExpired|Slow query" /var/log/mongodb/mongod.log | tail -20

// Check WiredTiger cache pressure and dirty ratio
var c = db.serverStatus().wiredTiger.cache;
print("Cache fill: " + (100 * c["bytes currently in the cache"] / c["maximum bytes configured"]).toFixed(1) + "%");
print("Dirty ratio: " + (100 * c["tracked dirty bytes in the cache"] / c["maximum bytes configured"]).toFixed(1) + "%");

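// Check available read and write tickets
// (newer releases may report this section under a different serverStatus path, such as queues.execution; verify on your version)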
var t = db.serverStatus().wiredTiger.concurrentTransactions;
print("Read tickets available: " + t.read.available + " / " + t.read.totalTickets);
print("Write tickets available: " + t.write.available + " / " + t.write.totalTickets);

// Check replication lag if secondaries are timing out
rs.printSecondaryReplicationInfo()

// Sample the system profiler for slow operations (returns nothing unless profiling is enabled on this database)
db.system.profile.find().sort({ ts: -1 }).limit(5).forEach(function(doc) { print(doc.ts + " | " + doc.ns + " | " + doc.millis + "ms | " + doc.planSummary); });

How to diagnose it

1. Confirm it is server-side. A MaxTimeMSExpired response includes error code 50 and "codeName": "MaxTimeMSExpired". Client-side socket timeouts surface as network exceptions in the driver, without a MongoDB error code. If socketTimeoutMS is less than or equal to maxTimeMS, the client may give up before the server returns the error, masking the root cause.
2. Capture the operation in currentOp. Run the currentOp query from the quick checks. Look for:
   - High secs_running
   - waitingForLock: true
   - op: "query" or "command" with aggregation stages
   - Large docsExamined vs docsReturned ratios for the same operation in the slow log
3. Correlate with the slow query log. Filter for the same ns (namespace) and time window. Key ratios (the returned-document count is logged as nreturned):
   - keysExamined / docsReturned should be near 1:1 for indexed queries. A ratio of 100:1 indicates a badly targeted index scan.
   - docsExamined / docsReturned near 1:1 is healthy. 1000:1 means nearly every document examined was discarded, typical of a missing index or a collection scan.
4. Check for system-wide pressure. If many unrelated operations are timing out, look at:
   - WiredTiger cache dirty ratio > 15%
   - Application-thread evictions incrementing
   - Available tickets below 25% of total
   - Queue depths (globalLock.currentQueue) sustained above 20

   If these are elevated, the root cause is saturation, not a single bad query.
5. Check replication state for secondary timeouts. If reads with secondaryPreferred are failing while primary reads succeed, check replication lag. A secondary under heavy oplog application load may be slow to respond. Also verify the secondary is not in RECOVERING.
6. Inspect cursors. If the timed-out operation is a long-running analytical cursor, check db.serverStatus().metrics.cursor. If noTimeout cursors are high, or if the session has been idle, the operation may have been killed by the session idle timeout rather than maxTimeMS.
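The slow-log ratios from the diagnosis steps can be computed mechanically. The sketch below assumes each slow-log or profiler entry has been parsed into an object with `planSummary`, `keysExamined`, `docsExamined`, and `nreturned` fields; the function name and verdict strings are hypothetical, and the cut-offs are simply the 100:1 and 1000:1 ratios quoted above.

```javascript
// Sketch: classify one parsed slow-query entry by its examined/returned ratios.
// Field names follow the slow query log; the entry objects are illustrative.
function scanEfficiency(entry) {
  const returned = Math.max(entry.nreturned, 1); // avoid division by zero
  const keysRatio = entry.keysExamined / returned;
  const docsRatio = entry.docsExamined / returned;

  let verdict = "healthy";
  if (String(entry.planSummary).startsWith("COLLSCAN") || docsRatio >= 1000) {
    verdict = "likely missing index";
  } else if (keysRatio >= 100) {
    verdict = "badly targeted index scan";
  }
  return { keysRatio, docsRatio, verdict };
}

// Illustrative entry, not real log data:
const sample = { planSummary: "COLLSCAN", keysExamined: 0, docsExamined: 2000000, nreturned: 20 };
console.log(scanEfficiency(sample).verdict);
```

Running this over every entry for one namespace and time window gives a quick view of whether the timeouts come from wasted scanning or from something outside the query.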

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Slow query rate | Directly precedes MaxTimeMSExpired spikes | Sustained increase above baseline |
| `docsExamined:docsReturned` ratio | Reveals wasted work per operation | Ratio > 100:1 for OLTP queries |
| WiredTiger cache dirty ratio | Dirty data accumulation causes checkpoint stalls and global slowdown | > 10% sustained |
| Application-thread evictions | Indicates background eviction cannot keep up; latency spikes follow | Any sustained nonzero rate |
| Available read/write tickets | Ticket exhaustion makes all operations queue | < 25% of total available |
| `currentOp` max operation age | Catches runaway queries before they cascade | Any non-background op > 300s |
| Replication lag | Explains secondary-only timeouts | > 10s sustained or > 25% of oplog window |
| `opcounters` throughput | Sudden drop suggests global blocking | > 50% drop from baseline |
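Two of these warning signs, dirty ratio and ticket availability, can be evaluated from a single serverStatus document. This is a rough sketch that assumes the document has the `wiredTiger.cache` and `wiredTiger.concurrentTransactions` sections used in the quick checks; the function name and the sample numbers are hypothetical, and a single sample cannot show whether a condition is sustained.

```javascript
// Sketch: flag the dirty-ratio and ticket warning signs from one
// serverStatus-shaped document. Thresholds are the ones in the table above.
function warningSigns(status) {
  const cache = status.wiredTiger.cache;
  const tickets = status.wiredTiger.concurrentTransactions;
  const signs = [];

  const dirtyPct = 100 * cache["tracked dirty bytes in the cache"] / cache["maximum bytes configured"];
  if (dirtyPct > 10) signs.push("dirty ratio " + dirtyPct.toFixed(1) + "%");

  for (const kind of ["read", "write"]) {
    const availablePct = 100 * tickets[kind].available / tickets[kind].totalTickets;
    if (availablePct < 25) signs.push(kind + " tickets " + availablePct.toFixed(1) + "% available");
  }
  return signs;
}

// Illustrative snapshot, not real server output:
const snapshot = {
  wiredTiger: {
    cache: { "tracked dirty bytes in the cache": 1200, "maximum bytes configured": 8000 },
    concurrentTransactions: {
      read: { available: 120, totalTickets: 128 },
      write: { available: 8, totalTickets: 128 }
    }
  }
};
console.log(warningSigns(snapshot));
```

In mongosh the same function could be fed `db.serverStatus()` directly, provided the sections exist under those names on the server version in use.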

Fixes

Fix the query, not the timeout

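If currentOp and the slow log show a collection scan or an inefficient index scan, add or restore the correct index. On MongoDB 4.2 and later, every index build holds an exclusive lock only briefly at the start and end of the build, and the background option is ignored. On older versions, request a background build explicitly to avoid blocking the collection:

// MongoDB 4.2+: no option needed; the build yields to reads and writes
db.collection.createIndex({ field: 1 });
// MongoDB 4.0 and earlier: request a background build explicitly
db.collection.createIndex({ field: 1 }, { background: true });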

If the query planner has regressed, evict the bad plan from the cache or force an index with hint() as a temporary measure. Compare the winning plan in explain("executionStats") to the expected index.
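Comparing the winning plan to the expected index can be scripted. The sketch below walks an explain document and lists the scan stages it finds; it assumes the classic `queryPlanner.winningPlan` tree with `stage`, `inputStage`, `inputStages`, and `indexName` fields, and falls back to a nested `queryPlan` if one is present. The function names, index name, and sample documents are hypothetical.

```javascript
// Sketch: collect IXSCAN / COLLSCAN stages from an explain plan tree.
function collectScans(stage) {
  if (!stage) return [];
  let here = [];
  if (stage.stage === "IXSCAN") here = ["IXSCAN:" + stage.indexName];
  if (stage.stage === "COLLSCAN") here = ["COLLSCAN"];
  const children = stage.inputStages || (stage.inputStage ? [stage.inputStage] : []);
  return children.reduce((acc, child) => acc.concat(collectScans(child)), here);
}

function usesExpectedIndex(explainOutput, expectedIndexName) {
  const winning = explainOutput.queryPlanner.winningPlan;
  const root = winning.queryPlan || winning; // assumption: some versions nest the tree
  return collectScans(root).includes("IXSCAN:" + expectedIndexName);
}

// Illustrative explain fragments, not real output:
const goodPlan = { queryPlanner: { winningPlan: {
  stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "status_1_createdAt_-1" } } } };
const regressedPlan = { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } };

console.log(usesExpectedIndex(goodPlan, "status_1_createdAt_-1"));
console.log(usesExpectedIndex(regressedPlan, "status_1_createdAt_-1"));
```

A check like this can run against the output of explain("executionStats") for a handful of critical queries, so a plan change is noticed before it shows up as timeouts.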

Reduce resource consumption

For aggregations that time out due to data volume:

Push $match stages as early as possible in the pipeline.
Use $project or aggregation $unset to reduce document size.
Add $limit if the application only needs a subset.
For large $lookup operations, ensure the foreign collection has an index on the foreignField.
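The ordering advice above might look like the following pipeline. The collection names ("orders", "customers"), field names, and limit are all hypothetical; the point is only the stage order: filter first, trim fields, cap the row count, then join. The two helper functions are illustrative lint checks, not part of any MongoDB API.

```javascript
// Sketch: a pipeline shaped to do as little work as possible before $lookup.
const pipeline = [
  // 1. Filter early so later stages see fewer documents.
  { $match: { status: "shipped", createdAt: { $gte: new Date("2024-01-01T00:00:00Z") } } },
  // 2. Keep only the fields the application needs.
  { $project: { customerId: 1, total: 1, createdAt: 1 } },
  // 3. Cap the result before the join.
  { $limit: 500 },
  // 4. Join on _id, which is always indexed in the foreign collection.
  { $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } }
];

// Small lint helpers: list the stage operators and check that $match leads.
function stageOrder(p) {
  return p.map(stage => Object.keys(stage)[0]);
}
function matchIsFirst(p) {
  return stageOrder(p)[0] === "$match";
}

console.log(stageOrder(pipeline).join(" -> "));
// In mongosh this could be run with a budget, for example:
//   db.orders.aggregate(pipeline, { maxTimeMS: 5000 })
```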

Kill and reroute

If an operation is already running and blocking others, kill it:

db.killOp(<opid>)

Warning: killOp is best-effort and may not terminate immediately. Killing a write operation may leave multi-document writes partially completed. After killing a long-running write, verify data consistency in the affected collection.
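Given that warning, one cautious approach is to shortlist only long-running read operations for killOp and review writes by hand. The sketch below filters the `inprog` array returned by currentOp; the helper names, the 60-second cut-off, and the sample operations are hypothetical, and the read-only test is deliberately conservative rather than exhaustive.

```javascript
// Sketch: pick opids of long-running, read-only operations from currentOp output.
function isReadOnly(op) {
  if (op.op === "query" || op.op === "getmore") return true;
  if (op.op !== "command") return false;
  const cmd = op.command || {};
  if (cmd.find || cmd.count || cmd.distinct) return true;
  if (cmd.aggregate) {
    // Aggregations that write ($out / $merge) are not treated as read-only.
    const stages = cmd.pipeline || [];
    return !stages.some(stage => "$out" in stage || "$merge" in stage);
  }
  return false;
}

function killCandidates(inprog, minSeconds) {
  return inprog
    .filter(op => op.active && op.secs_running > minSeconds)
    .filter(isReadOnly)
    .map(op => op.opid);
}

// Illustrative operations, not real currentOp output:
const inprog = [
  { opid: 101, active: true, secs_running: 420, op: "command", command: { aggregate: "events", pipeline: [{ $match: {} }] } },
  { opid: 102, active: true, secs_running: 900, op: "update", command: {} },
  { opid: 103, active: true, secs_running: 5, op: "query", command: { find: "users" } },
  { opid: 104, active: true, secs_running: 700, op: "command", command: { aggregate: "events", pipeline: [{ $match: {} }, { $out: "tmp" }] } }
];
console.log(killCandidates(inprog, 60));
```

The resulting opids still deserve a manual look before being passed to db.killOp, since internal or replication-related operations can also appear in currentOp.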

If the workload is legitimate but heavy, move it to a hidden secondary or an analytics node, or schedule it during low-traffic windows.

Address saturation

If the root cause is cache pressure or ticket exhaustion:

Pause batch jobs or bulk imports to reduce write pressure.
Kill unnecessary long-running transactions or noCursorTimeout cursors that pin snapshots.
Check storage health with iostat -x 1 for elevated await or %util.
If storage is degraded, step down the primary to shift writes to a healthier member.

Warning: Stepping down the primary triggers an election and interrupts writes. Use only during a maintenance window or confirmed storage degradation.

Prevention

Monitor slow query trends, not just max age. A query that drifts from 10 ms to 500 ms over a week will eventually hit any reasonable maxTimeMS. Trend the 95th percentile of the slow query log.
Set operation-class timeouts. OLTP reads should have a tight maxTimeMS (for example, 5 seconds). Long analytical queries can have a higher limit, but only if the query is efficient and the infrastructure can support it.
Audit indexes after every deployment. Use $indexStats to confirm critical indexes are being used. If a key index shows zero operations after restart, investigate before the plan cache warms with a bad plan.
Keep headroom in cache and tickets. Operate WiredTiger cache below 80% fill and below 5% dirty during peak. Keep available tickets above 25% of total. These margins absorb transient slowdowns without cascading into timeouts.
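Operation-class timeouts can be kept in one place in application code. This is a minimal sketch: the 5-second OLTP value comes from the guidance above, while the analytics budget, the socket margin, and all names are hypothetical and would need tuning for a real workload.

```javascript
// Sketch: one table of maxTimeMS budgets per operation class.
const TIMEOUTS_MS = {
  oltpRead: 5000,     // tight budget for OLTP reads
  analytics: 300000   // hypothetical budget for long analytical queries
};
const SOCKET_MARGIN_MS = 2000; // hypothetical margin so the server error arrives first

function optionsFor(opClass) {
  const maxTimeMS = TIMEOUTS_MS[opClass];
  if (maxTimeMS === undefined) {
    throw new Error("unknown operation class: " + opClass);
  }
  return { maxTimeMS };
}

// The client socket timeout should exceed maxTimeMS; otherwise the client
// gives up before the server can return MaxTimeMSExpired.
function socketTimeoutIsSafe(socketTimeoutMS, opClass) {
  return socketTimeoutMS >= TIMEOUTS_MS[opClass] + SOCKET_MARGIN_MS;
}

console.log(optionsFor("oltpRead"));
console.log(socketTimeoutIsSafe(5000, "oltpRead"));
```

Centralizing the budgets this way makes it harder for a single call site to quietly raise its timeout instead of fixing the query.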

How Netdata helps

Correlate MaxTimeMSExpired spikes with per-second opLatencies, scanned/returned ratios, and slow query rates to distinguish a single bad query from global pressure.
Alert on WiredTiger cache dirty ratio and application-thread evictions before they drive operations into timeout.
Surface ticket utilization and queue depth to catch storage engine saturation before queries start dying.
Track currentOp age and replication lag to catch secondary-side timeouts early.

The Netdata solution

MongoDB monitoring with Netdata

Netdata monitors MongoDB with per-second metrics and automatic dashboards. Watch WiredTiger cache pressure, oplog window, connection counts, checkpoint stalls, and replication health in one place, correlated with the underlying host.
