MongoDB slow query COLLSCAN: collection scans and the missing index
Queries that used to return in tens of milliseconds now breach application timeouts. Read latency climbs while write throughput stays flat. In the MongoDB slow query log, you see planSummary: "COLLSCAN" attached to operations that should be indexed. A collection scan reads documents that will never be returned, wastes disk I/O, floods the WiredTiger cache with irrelevant data, and holds read tickets until the whole instance cascades into cache pressure.
The severity scales with collection size and the docsExamined:docsReturned ratio. A ratio of 1:1 is ideal. A ratio of 1000:1 means 999 of every 1000 documents read were discarded after comparison. On a collection with millions of documents, that is an active incident.
MongoDB surfaces collection scans in two places: the slow query log and the query planner. The default slowms threshold is 100 milliseconds. Operations exceeding this are written to the diagnostic log with their plan summary. The system profiler captures the same data when profiling is enabled.
A COLLSCAN stage means MongoDB examined every document in the collection to answer the query. This happens when the query planner decides no existing index can satisfy the predicate efficiently. You will see it in explain("executionStats") output under winningPlan.stage, or in the planSummary field of slow query log entries and system.profile documents.
When a collection scan runs, two efficiency ratios degrade together. docsExamined:docsReturned climbs far above 1:1 because the server fetches and filters documents that do not match. keysExamined:docsReturned degrades as well. For a pure collection scan, keysExamined is zero while docsExamined is large. Both are primary signals of wasted work.
You can capture more detail by enabling profiling level 1 with db.setProfilingLevel(1, { slowms: 100 }). Profiling level 2 logs every operation and ignores user-provided slowms or filter values. It floods I/O on busy collections, so avoid it in production.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Missing index on query filter fields | planSummary: "COLLSCAN", keysExamined: 0, docsExamined far exceeds docsReturned | db.collection.getIndexes() compared to the query predicate |
| Accidentally dropped index | Queries that were fast are now slow; $indexStats shows zero ops for a previously used index since restart | db.collection.getIndexes() against your schema documentation |
| Query plan regression | Same query shape is slow after a restart or plan cache eviction | explain("executionStats") comparing current and expected plans |
$or with mixed index coverage | Some branches use indexes but others do not, forcing a full collection scan | explain("queryPlanner") on each branch of the $or |
$where or unanchored regex | JavaScript predicates and unanchored regular expressions cannot use indexes | Query source code for $where and leading-wildcard patterns |
Explicit hint: { $natural: 1 } | Application code forces the planner to ignore indexes | Code search for $natural hints |
Quick checks
# Check slow query log for collection scan entries
grep "COLLSCAN" /var/log/mongodb/mongod.log | tail -20
# Query the profiler for recent COLLSCAN operations
mongosh --quiet --eval 'db.system.profile.find({ planSummary: { $regex: "COLLSCAN" } }).sort({ millis: -1 }).limit(5).forEach(doc => print(JSON.stringify(doc)))'
# List current indexes on the affected collection
mongosh --quiet --eval 'db.collection.getIndexes().forEach(function(i) { print(i.name, JSON.stringify(i.key)); })'
# Check index usage statistics since last restart
mongosh --quiet --eval 'db.collection.aggregate([{ $indexStats: {} }]).forEach(function(i) { print(i.name, i.accesses.ops); })'
# Run explain with execution stats for a suspect query shape
mongosh --quiet --eval 'db.collection.find({ status: "pending" }).explain("executionStats")'
# Find long-running operations that may be collection scans
mongosh --quiet --eval 'db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog.forEach(function(op) { print(op.opid, op.ns, op.planSummary || "N/A", op.secs_running + "s"); })'
How to diagnose
- Confirm the scan. Run
explain("executionStats")on the slow query, or check the slow query log forplanSummary: "COLLSCAN". Verify thatwinningPlan.stageisCOLLSCANand not anIXSCANwith a high-cost fetch stage. If the plan showsIXSCANbut the query is still slow, the issue may be index selectivity, not a missing index. - Measure the waste. Divide
executionStats.totalDocsExaminedbyexecutionStats.nReturned. A ratio above 100:1 indicates severe inefficiency. IftotalKeysExaminedis zero whiletotalDocsExaminedis large, you have a pure collection scan. - Verify index existence. Run
db.collection.getIndexes()and compare the index keys to the query predicate. Check for exact matches and prefix matches on compound indexes. A compound index{a: 1, b: 1}can satisfy a query on{a: ...}but not a query on{b: ...}alone. - Check index usage. Run
db.collection.aggregate([{ $indexStats: {} }]). If an expected index shows zero operations since the last process restart, the planner is not choosing it. This can happen after an accidentaldropIndex, a failed background index build, or a plan cache shift. - Inspect the query shape. Look for
$whereclauses, unanchored regular expressions, or$orexpressions where some branches lack index coverage.$wherewith JavaScript forces a collection scan because the server cannot evaluate the predicate via an index. Unanchored regex patterns also cannot use an index. With$or, each branch must independently satisfy an index, or the whole expression falls back to a collection scan. - Check for forced scans. Search application code and logs for
hint: { $natural: 1 }, which explicitly bypasses index selection and forces the planner to read the collection in natural order. - Correlate with system pressure. Rising read latencies in
opLatencies, increasing WiredTiger cache fill, and nonzero application-thread evictions suggest the scan is hurting the whole instance. A single long-running collection scan can hold a read ticket for minutes and cascade into queue buildup.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
docsExamined:docsReturned ratio | Measures how many documents are read and then discarded | Sustained ratio above 100:1 |
keysExamined:docsReturned ratio | Shows whether an index is narrowing the search | keysExamined at zero while docsExamined exceeds docsReturned |
| COLLSCAN rate in slow query log | Direct evidence of collection scans on production paths | Any occurrence on collections above 10,000 documents |
| WiredTiger cache fill ratio | Collection scans load excess documents into cache, displacing the working set | Sustained increase above 80% with rising eviction rates |
| Application-thread evictions | Forced eviction adds latency to all operations | Nonzero sustained rate |
opLatencies read p99 | End-user impact of inefficient queries | p99 read latency doubling from baseline for more than 5 minutes |
Fixes
Missing index
If getIndexes() shows no index that matches the query predicate, create one. Use a background build to avoid blocking the collection:
db.collection.createIndex({ status: 1 }, { background: true })
Background builds still consume disk I/O and CPU. Run them during lower traffic when possible. After the build completes, verify the new plan with explain("executionStats") and confirm winningPlan.stage is IXSCAN.
Dropped or unused index
If an index existed but $indexStats shows zero usage since restart, confirm it is still present with getIndexes(). If it was accidentally dropped, rebuild it with a background build. If it exists but the planner ignores it, review the query predicate for selectivity issues. A highly non-selective index can be slower than a scan if the planner estimates it will return most of the collection.
Query plan regression
If explain() shows the planner choosing COLLSCAN despite a usable index, the plan cache may have shifted. Flush it as a temporary workaround:
db.collection.getPlanCache().clear()
For a persistent fix, review the query predicate for selectivity issues. If the query returns a large fraction of the collection, the planner may correctly choose a scan. Adding a more selective predicate or a compound index can change the calculation.
Predicate problems and forced scans
Remove $where clauses and replace them with standard query operators. Anchor regular expressions at the start of the string so they can use an index. Restructure $or expressions so every branch can use an index. If application code contains hint: { $natural: 1 }, remove it. If you must use a hint as a temporary workaround, document it with an expiration date and monitor it.
Kill runaway operations
For immediate relief, identify the operation ID with db.currentOp() and terminate it with db.killOp(opid). This interrupts the scan but does not fix the underlying index or query shape. Killing reads is safe. Killing multi-document writes that are not inside a transaction can leave data partially updated.
Prevention
- Review slow query logs regularly for new
COLLSCANpatterns before they become critical. - Track
docsExamined:docsReturnedandkeysExamined:docsReturnedratios in staging and load tests for every new query shape. - Enable profiling level 1 with a conservative
slowmsthreshold. On busy collections, consider loweringsampleRateto reduce profiler overhead. Avoid profiling level 2 in production because it floods I/O. - Monitor
$indexStatsafter every deployment to detect sudden drops in index usage. - Use
notablescanonly in development and testing environments. It prevents collection scans globally, including administrative queries, and is not safe for production workloads.
How Netdata helps
- Correlate MongoDB slow query metrics with host-level CPU, disk I/O, and memory pressure to distinguish a single bad query from systemic storage degradation.
- Monitor WiredTiger cache fill and dirty ratios alongside eviction rates to detect when collection scans are pushing the cache into application-thread eviction.
- Track
opLatenciesread histograms to catch tail latency increases caused by scans before applications time out. - Alert on sustained deviations in query efficiency ratios without needing continuous polling of
system.profile.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB connection churn: high totalCreated rate and thread creation overhead
- MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling
- MongoDB connection storm spiral: reconnection floods after an election or deploy
- MongoDB flow control throttling writes: when the primary slows itself down
- MongoDB journal sync latency high: the storage signal that warns 60 seconds early







