MongoDB operation exceeded time limit (MaxTimeMSExpired): maxTimeMS and killed operations
Error code 50, MaxTimeMSExpired, means the server killed an operation that exceeded its processing budget. Raising the timeout without fixing the root cause turns acute failures into chronic resource exhaustion. The operation was already pathologically slow; maxTimeMS ended it before it consumed more resources or held locks and tickets indefinitely.
maxTimeMS sets a cumulative processing budget in milliseconds. MongoDB enforces it using the same interrupt mechanism as killOp, terminating the operation only at designated interrupt points. Idle time between cursor batches does not count toward the limit, and on direct connections network latency is excluded from the server-side clock. On sharded clusters, however, latency between mongos and shard mongod instances counts against the limit. Distinguish a true MaxTimeMSExpired from a client-side socket timeout, where the client gives up before the server responds.
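The distinction between a server-side kill and a client-side timeout can be sketched as a small check on the error object the driver hands back. This is a minimal sketch, not driver API: `classifyTimeout` is a hypothetical helper, and it assumes server errors expose a numeric `code` and a `codeName`, while client-side timeouts surface as network exceptions with neither.

```javascript
// Hypothetical helper: classify a failed operation by the error the driver surfaced.
// Assumption: server errors carry `code` / `codeName`; client-side timeouts carry neither.
function classifyTimeout(err) {
  if (err && (err.code === 50 || err.codeName === "MaxTimeMSExpired")) {
    return "server-side: maxTimeMS expired";
  }
  if (err && typeof err.code === "number") {
    return "other server error";
  }
  return "client-side: no server error code (check socket timeout settings)";
}

// Example error shapes (illustrative only).
console.log(classifyTimeout({ code: 50, codeName: "MaxTimeMSExpired" }));
console.log(classifyTimeout({ name: "MongoNetworkTimeoutError", message: "connection timed out" }));
```

Matching on the numeric code first keeps the check independent of how a given driver formats the error message.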
What this means
When the server kills an operation with MaxTimeMSExpired, it releases whatever resources the operation held: WiredTiger read or write tickets, cache space, and locks. The operation may have been scanning millions of documents or pinning an old snapshot.
This error is a symptom. Raising maxTimeMS without fixing the underlying slowness converts acute failures into chronic resource exhaustion. Long-running operations can block eviction and trigger cache pressure cascades. Find why the operation was slow and fix that.
flowchart TD
A[Client sees MaxTimeMSExpired] --> B{Server-side or socket timeout?}
B -->|Error code 50| C[Server killed operation]
B -->|Network exception| D[Client timed out first]
C --> E[currentOp shows long-running op]
E --> F{Slow query plan?}
F -->|COLLSCAN or bad IXSCAN| G[Missing index or plan regression]
F -->|Plan is good| H[System saturation]
H --> I[Cache dirty ratio high or tickets exhausted]
D --> J[Raise socketTimeoutMS above maxTimeMS]
G --> K[Build index or fix query]
I --> L[Kill runaway ops or reduce load]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Missing or dropped index | Slow query log shows COLLSCAN or keysExamined:docsReturned > 100:1 | db.collection.getIndexes() and compare to query predicates |
| Query plan regression | Query was fast yesterday, slow today; same shape, different plan | explain("executionStats") or plan cache state |
| Cache pressure or ticket exhaustion | opLatencies spiking for all operations, not just one; app-thread evictions rising | serverStatus().wiredTiger.cache and concurrentTransactions |
| Runaway aggregation or large $lookup | currentOp shows aggregate with huge docsReturned or long secs_running | db.currentOp({ "active": true, "secs_running": { "$gt": 60 } }) |
| Heavy load on secondary | Timeouts appear only on secondary reads while primary is healthy | rs.status() for lag and serverStatus().flowControl |
| Long-lived cursor with high maxTimeMS | Cursor killed after extended runtime despite high limit | Session idle lifetime and metrics.cursor |
Quick checks
These are read-only unless otherwise noted.
# Check operations running longer than 10 seconds
mongosh --quiet --eval 'db.currentOp({ "active": true, "secs_running": { "$gt": 10 } }).inprog.forEach(function(op) { print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns + " | " + JSON.stringify(op.command || {}).substring(0,120)); })'
# Tail the slow query log for recent timeouts
grep -E "MaxTimeMSExpired|Slow query" /var/log/mongodb/mongod.log | tail -20
// Check WiredTiger cache pressure and dirty ratio
var c = db.serverStatus().wiredTiger.cache;
print("Cache fill: " + (100 * c["bytes currently in the cache"] / c["maximum bytes configured"]).toFixed(1) + "%");
print("Dirty ratio: " + (100 * c["tracked dirty bytes in the cache"] / c["maximum bytes configured"]).toFixed(1) + "%");
// Check available read and write tickets
var t = db.serverStatus().wiredTiger.concurrentTransactions;
print("Read tickets available: " + t.read.available + " / " + t.read.totalTickets);
print("Write tickets available: " + t.write.available + " / " + t.write.totalTickets);
// Check replication lag if secondaries are timing out
rs.printSecondaryReplicationInfo()
// Sample the system profiler for slow operations
db.system.profile.find().sort({ ts: -1 }).limit(5).forEach(function(doc) { print(doc.ts + " | " + doc.ns + " | " + doc.millis + "ms | " + doc.planSummary); });
How to diagnose it
1. Confirm it is server-side. A MaxTimeMSExpired response includes error code 50 and "codeName": "MaxTimeMSExpired". Client-side socket timeouts manifest as network exceptions in the driver without a MongoDB error code. If socketTimeoutMS equals or is lower than maxTimeMS, the client may give up before the server returns the error, masking the root cause.
2. Capture the operation in currentOp. Run the currentOp query from the quick checks. Look for:
   - High secs_running
   - waitingForLock: true
   - op: "query" or "command" with aggregation stages
   - Large docsExamined vs docsReturned ratios in the slow log
3. Correlate with the slow query log. Filter for the same ns (namespace) and time window. Key ratios:
   - keysExamined / docsReturned should be near 1:1 for indexed queries. A ratio of 100:1 indicates a badly targeted index scan.
   - docsExamined / docsReturned near 1:1 is healthy. 1000:1 means nearly every document examined was discarded, typical of a missing index or a collection scan.
4. Check for system-wide pressure. If many unrelated operations are timing out, look at:
   - WiredTiger cache dirty ratio > 15%
   - Application-thread evictions incrementing
   - Available tickets below 25% of total
   - Queue depths (globalLock.currentQueue) sustained above 20
   If these are elevated, the root cause is saturation, not a single bad query.
5. Check replication state for secondary timeouts. If reads with secondaryPreferred are failing while primary reads succeed, check replication lag. A secondary under heavy oplog application load may be slow to respond. Also verify the secondary is not in RECOVERING.
6. Inspect cursors. If the timed-out operation is a long-running analytical cursor, check db.serverStatus().metrics.cursor. If noTimeout cursors are high, or if the session has been idle, the operation may have been killed by the session idle timeout rather than maxTimeMS.
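The ratio checks from the slow-log step can be sketched as a small function. This is an illustrative sketch under assumptions: `scanEfficiency` is a hypothetical helper, the entry is assumed to carry `keysExamined`, `docsExamined`, and `nreturned` counters (rename them if your log source uses different field names), and the 100:1 and 1000:1 cutoffs are the ones quoted in this guide, not server defaults.

```javascript
// Hypothetical helper: flag wasted work in one slow-query entry.
// Assumption: `entry` has keysExamined, docsExamined and nreturned counters.
function scanEfficiency(entry) {
  const returned = Math.max(entry.nreturned || 0, 1); // treat 0 returned as 1 to avoid dividing by zero
  const keysRatio = (entry.keysExamined || 0) / returned;
  const docsRatio = (entry.docsExamined || 0) / returned;

  let verdict = "no flag";
  if (docsRatio >= 1000) {
    verdict = "likely missing index or collection scan";
  } else if (keysRatio >= 100 || docsRatio >= 100) {
    verdict = "badly targeted index scan";
  }
  return { keysRatio, docsRatio, verdict };
}

// Illustrative entries, not real log data.
console.log(scanEfficiency({ keysExamined: 0, docsExamined: 500000, nreturned: 10 }));
console.log(scanEfficiency({ keysExamined: 12, docsExamined: 12, nreturned: 12 }));
```

Ratios between 1:1 and 100:1 return "no flag" here only because the guide gives no cutoff for that range; they may still deserve a look on hot query shapes.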
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Slow query rate | Directly precedes MaxTimeMSExpired spikes | Sustained increase above baseline |
| docsExamined:docsReturned ratio | Reveals wasted work per operation | Ratio > 100:1 for OLTP queries |
| WiredTiger cache dirty ratio | Dirty data accumulation causes checkpoint stalls and global slowdown | > 10% sustained |
| Application-thread evictions | Indicates background eviction cannot keep up; latency spikes follow | Any sustained nonzero rate |
| Available read/write tickets | Ticket exhaustion makes all operations queue | < 25% of total tickets available |
| currentOp max operation age | Catches runaway queries before they cascade | Any non-background op > 300s |
| Replication lag | Explains secondary-only timeouts | > 10s sustained or > 25% of oplog window |
| opcounters throughput | Sudden drop suggests global blocking | > 50% drop from baseline |
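A few of these signals can be evaluated together from a single serverStatus document. The sketch below is hypothetical: `assessSaturation` and its default thresholds are illustrative (10% dirty and 25% free tickets from the table above, queue depth 20 from the diagnosis steps), and it assumes the same `wiredTiger.cache`, `wiredTiger.concurrentTransactions`, and `globalLock.currentQueue` layout the quick checks read. Field locations can differ between server versions.

```javascript
// Hypothetical helper: compare a serverStatus-shaped document against saturation thresholds.
// Assumption: field layout matches the quick checks in this guide.
function assessSaturation(status, limits = { dirtyPct: 10, minFreeTicketPct: 25, maxQueue: 20 }) {
  const cache = status.wiredTiger.cache;
  const tickets = status.wiredTiger.concurrentTransactions;
  const findings = [];

  const dirtyPct = 100 * cache["tracked dirty bytes in the cache"] / cache["maximum bytes configured"];
  if (dirtyPct > limits.dirtyPct) findings.push("dirty ratio " + dirtyPct.toFixed(1) + "%");

  for (const kind of ["read", "write"]) {
    const freePct = 100 * tickets[kind].available / tickets[kind].totalTickets;
    if (freePct < limits.minFreeTicketPct) findings.push(kind + " tickets " + freePct.toFixed(1) + "% free");
  }

  const queued = status.globalLock.currentQueue.total;
  if (queued > limits.maxQueue) findings.push("queue depth " + queued);
  return findings; // an empty array means no threshold was crossed
}

// Illustrative sample, not real server output.
const sampleStatus = {
  wiredTiger: {
    cache: { "tracked dirty bytes in the cache": 200, "maximum bytes configured": 1000 },
    concurrentTransactions: {
      read: { available: 10, totalTickets: 128 },
      write: { available: 100, totalTickets: 128 },
    },
  },
  globalLock: { currentQueue: { total: 5 } },
};
console.log(assessSaturation(sampleStatus));
```

In mongosh the same function could be called as `assessSaturation(db.serverStatus())`. A single sample only shows the current instant, so the "sustained" conditions in the table still need a time series.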
Fixes
Fix the query, not the timeout
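If currentOp and the slow log show a collection scan or an inefficient index scan, add or restore the correct index. Since MongoDB 4.2, every index build holds an exclusive collection lock only briefly at the start and end of the build, and the background option is deprecated and ignored. On earlier versions, pass { background: true } to avoid blocking the collection for the whole build:
// MongoDB 4.2+: no background option needed; the build locks the collection only briefly
db.collection.createIndex({ field: 1 });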
If the query planner has regressed, evict the bad plan from the cache or force an index with hint() as a temporary measure. Compare the winning plan in explain("executionStats") to the expected index.
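Comparing the winning plan to the expected index usually starts with listing the stages in the plan tree. The sketch below assumes the classic explain layout, where each plan node has a `stage` and an optional `inputStage` or `inputStages`; it also follows a `queryPlan` wrapper in case the output nests the tree one level down. `collectStages` and the sample document are hypothetical.

```javascript
// Hypothetical helper: collect stage names from an explain() winning plan.
// Assumption: nodes look like { stage, inputStage? , inputStages? }.
function collectStages(plan, out = []) {
  if (!plan || typeof plan !== "object") return out;
  if (plan.stage) out.push(plan.stage);
  if (plan.queryPlan) collectStages(plan.queryPlan, out); // some outputs nest the tree here
  if (plan.inputStage) collectStages(plan.inputStage, out);
  (plan.inputStages || []).forEach((child) => collectStages(child, out));
  return out;
}

// Illustrative explain fragment, not real server output.
const sampleExplain = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "status_1" } },
  },
};

const stages = collectStages(sampleExplain.queryPlanner.winningPlan);
console.log(stages.join(" <- "));
console.log(stages.includes("COLLSCAN") ? "collection scan in winning plan" : "no collection scan in winning plan");
```

In mongosh the input would come from something like `db.collection.find(query).explain("executionStats")`. An IXSCAN on the wrong index still passes this check, so compare the reported index name as well.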
Reduce resource consumption
For aggregations that time out due to data volume:
- Push $match stages as early as possible in the pipeline.
- Use $project or aggregation $unset to reduce document size.
- Add $limit if the application only needs a subset.
- For large $lookup operations, ensure the foreign collection has an index on the foreignField.
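As a rough illustration of that ordering, the pipeline below filters first, trims fields, caps the result, and only then joins. Everything in it is hypothetical: the `orders` and `customers` collections, the field names, and the `lintPipeline` helper exist only for this sketch. Placing $limit before $lookup is a design choice that assumes the application wants at most that many source documents.

```javascript
// Hypothetical pipeline over an "orders" collection joined to "customers".
const pipeline = [
  { $match: { status: "open", createdAt: { $gte: new Date("2024-01-01") } } }, // filter first
  { $project: { customerId: 1, total: 1 } },                                    // shrink documents
  { $limit: 1000 },                                                             // cap the working set
  // Joining on _id uses the index every collection already has.
  { $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } },
];

// Minimal lint: is $match the first stage, and is there a $limit at all?
function lintPipeline(stages) {
  const names = stages.map((stage) => Object.keys(stage)[0]);
  return { matchIsFirst: names[0] === "$match", hasLimit: names.includes("$limit") };
}

console.log(lintPipeline(pipeline));
// In mongosh this could run as: db.orders.aggregate(pipeline, { maxTimeMS: 5000 })
```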
Kill and reroute
If an operation is already running and blocking others, kill it:
db.killOp(<opid>)
Warning: killOp is best-effort and may not terminate immediately. Killing a write operation may leave multi-document writes partially completed. After killing a long-running write, verify data consistency in the affected collection.
If the workload is legitimate but heavy, move it to a hidden secondary or an analytics node, or schedule it during low-traffic windows.
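Before killing anything, it helps to shortlist candidates rather than act on a raw currentOp dump. The sketch below is a hypothetical filter over the `inprog` array: the 300-second default mirrors the operation-age warning sign in the metrics table, and skipping the `local`, `admin`, and `config` namespaces is an assumption made here to avoid touching internal operations.

```javascript
// Hypothetical helper: shortlist kill candidates from the `inprog` array of db.currentOp().
// Assumption: internal namespaces (local, admin, config) should never be shortlisted.
function findRunawayOps(inprog, maxSeconds = 300) {
  return inprog
    .filter((op) => op.active && op.secs_running > maxSeconds)
    .filter((op) => !/^(local|admin|config)\./.test(op.ns || ""))
    .sort((a, b) => b.secs_running - a.secs_running) // longest-running first
    .map((op) => ({ opid: op.opid, ns: op.ns, secs_running: op.secs_running }));
}

// Illustrative sample, not real server output.
const sampleOps = [
  { opid: 101, active: true, secs_running: 1200, ns: "shop.orders" },
  { opid: 102, active: true, secs_running: 4000, ns: "local.oplog.rs" },
  { opid: 103, active: true, secs_running: 12, ns: "shop.carts" },
  { opid: 104, active: false, secs_running: 900, ns: "shop.orders" },
];
console.log(findRunawayOps(sampleOps));
// In mongosh: findRunawayOps(db.currentOp({ active: true }).inprog), then review each opid before db.killOp(opid)
```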
Address saturation
If the root cause is cache pressure or ticket exhaustion:
- Pause batch jobs or bulk imports to reduce write pressure.
- Kill unnecessary long-running transactions or noCursorTimeout cursors that pin snapshots.
- Check storage health with iostat -x 1 for elevated await or %util.
- If storage is degraded, step down the primary to shift writes to a healthier member.
Warning: Stepping down the primary triggers an election and interrupts writes. Use only during a maintenance window or confirmed storage degradation.
Prevention
- Monitor slow query trends, not just max age. A query that drifts from 10 ms to 500 ms over a week will eventually hit any reasonable maxTimeMS. Trend the 95th percentile of the slow query log.
- Set operation-class timeouts. OLTP reads should have a tight maxTimeMS (for example, 5 seconds). Long analytical queries can have a higher limit, but only if the query is efficient and the infrastructure can support it.
- Audit indexes after every deployment. Use $indexStats to confirm critical indexes are being used. If a key index shows zero operations after restart, investigate before the plan cache warms with a bad plan.
- Keep headroom in cache and tickets. Operate WiredTiger cache below 80% fill and below 5% dirty during peak. Keep available tickets above 25% of total. These margins absorb transient slowdowns without cascading into timeouts.
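Operation-class timeouts can be kept in one place so that every call site picks its budget by class. The sketch below is hypothetical: only the 5-second OLTP figure comes from this guide, while the other budgets and the socket margin are placeholders to tune. It derives a socket timeout above the server budget so the server error can reach the client first.

```javascript
// Hypothetical per-class budgets in milliseconds.
// Only the 5-second OLTP value comes from the guide; the rest are placeholders.
const MAX_TIME_MS = { oltp: 5000, reporting: 60000, analytics: 600000 };
const SOCKET_MARGIN_MS = 5000; // assumed headroom above maxTimeMS

function timeoutsFor(opClass) {
  const maxTimeMS = MAX_TIME_MS[opClass];
  if (maxTimeMS === undefined) throw new Error("unknown operation class: " + opClass);
  return { maxTimeMS, socketTimeoutMS: maxTimeMS + SOCKET_MARGIN_MS };
}

console.log(timeoutsFor("oltp"));
// Usage sketch: collection.find(query).maxTimeMS(timeoutsFor("oltp").maxTimeMS)
```

Socket timeouts are normally set per client rather than per operation, so a client that serves several classes would need the value derived from its slowest class, or separate clients per class.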
How Netdata helps
- Correlate MaxTimeMSExpired spikes with per-second opLatencies, scanned/returned ratios, and slow query rates to distinguish a single bad query from global pressure.
- Alert on WiredTiger cache dirty ratio and application-thread evictions before they drive operations into timeout.
- Surface ticket utilization and queue depth to catch storage engine saturation before queries start dying.
- Track currentOp age and replication lag to catch secondary-side timeouts early.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB connection churn: high totalCreated rate and thread creation overhead
- MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling
- MongoDB connection storm spiral: reconnection floods after an election or deploy
- MongoDB exceeded memory limit for $group — aggregation spills and allowDiskUse
- MongoDB flow control throttling writes: when the primary slows itself down







