MongoDB slow query COLLSCAN: collection scans and the missing index

Queries that used to return in tens of milliseconds now breach application timeouts. Read latency climbs while write throughput stays flat. In the MongoDB slow query log, you see planSummary: "COLLSCAN" attached to operations that should be indexed. A collection scan reads documents that will never be returned, wastes disk I/O, floods the WiredTiger cache with irrelevant data, and holds read tickets until the whole instance cascades into cache pressure.

The severity scales with collection size and the docsExamined:docsReturned ratio. A ratio of 1:1 is ideal. A ratio of 1000:1 means 999 of every 1000 documents read were discarded after comparison. On a collection with millions of documents, that is an active incident.

MongoDB surfaces collection scans in two places: the slow query log and the query planner. The default slowms threshold is 100 milliseconds. Operations exceeding this are written to the diagnostic log with their plan summary. The system profiler captures the same data when profiling is enabled.

A COLLSCAN stage means MongoDB examined every document in the collection to answer the query. This happens when the query planner decides no existing index can satisfy the predicate efficiently. You will see it in explain("executionStats") output under winningPlan.stage, or in the planSummary field of slow query log entries and system.profile documents.

When a collection scan runs, two efficiency ratios degrade together. docsExamined:docsReturned climbs far above 1:1 because the server fetches and filters documents that do not match. keysExamined:docsReturned degrades as well. For a pure collection scan, keysExamined is zero while docsExamined is large. Both are primary signals of wasted work.

You can capture more detail by enabling profiling level 1 with db.setProfilingLevel(1, { slowms: 100 }). Profiling level 2 logs every operation and ignores user-provided slowms or filter values. It floods I/O on busy collections, so avoid it in production.

Common causes

CauseWhat it looks likeFirst thing to check
Missing index on query filter fieldsplanSummary: "COLLSCAN", keysExamined: 0, docsExamined far exceeds docsReturneddb.collection.getIndexes() compared to the query predicate
Accidentally dropped indexQueries that were fast are now slow; $indexStats shows zero ops for a previously used index since restartdb.collection.getIndexes() against your schema documentation
Query plan regressionSame query shape is slow after a restart or plan cache evictionexplain("executionStats") comparing current and expected plans
$or with mixed index coverageSome branches use indexes but others do not, forcing a full collection scanexplain("queryPlanner") on each branch of the $or
$where or unanchored regexJavaScript predicates and unanchored regular expressions cannot use indexesQuery source code for $where and leading-wildcard patterns
Explicit hint: { $natural: 1 }Application code forces the planner to ignore indexesCode search for $natural hints

Quick checks

# Check slow query log for collection scan entries
grep "COLLSCAN" /var/log/mongodb/mongod.log | tail -20
# Query the profiler for recent COLLSCAN operations
mongosh --quiet --eval 'db.system.profile.find({ planSummary: { $regex: "COLLSCAN" } }).sort({ millis: -1 }).limit(5).forEach(doc => print(JSON.stringify(doc)))'
# List current indexes on the affected collection
mongosh --quiet --eval 'db.collection.getIndexes().forEach(function(i) { print(i.name, JSON.stringify(i.key)); })'
# Check index usage statistics since last restart
mongosh --quiet --eval 'db.collection.aggregate([{ $indexStats: {} }]).forEach(function(i) { print(i.name, i.accesses.ops); })'
# Run explain with execution stats for a suspect query shape
mongosh --quiet --eval 'db.collection.find({ status: "pending" }).explain("executionStats")'
# Find long-running operations that may be collection scans
mongosh --quiet --eval 'db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog.forEach(function(op) { print(op.opid, op.ns, op.planSummary || "N/A", op.secs_running + "s"); })'

How to diagnose

  1. Confirm the scan. Run explain("executionStats") on the slow query, or check the slow query log for planSummary: "COLLSCAN". Verify that winningPlan.stage is COLLSCAN and not an IXSCAN with a high-cost fetch stage. If the plan shows IXSCAN but the query is still slow, the issue may be index selectivity, not a missing index.
  2. Measure the waste. Divide executionStats.totalDocsExamined by executionStats.nReturned. A ratio above 100:1 indicates severe inefficiency. If totalKeysExamined is zero while totalDocsExamined is large, you have a pure collection scan.
  3. Verify index existence. Run db.collection.getIndexes() and compare the index keys to the query predicate. Check for exact matches and prefix matches on compound indexes. A compound index {a: 1, b: 1} can satisfy a query on {a: ...} but not a query on {b: ...} alone.
  4. Check index usage. Run db.collection.aggregate([{ $indexStats: {} }]). If an expected index shows zero operations since the last process restart, the planner is not choosing it. This can happen after an accidental dropIndex, a failed background index build, or a plan cache shift.
  5. Inspect the query shape. Look for $where clauses, unanchored regular expressions, or $or expressions where some branches lack index coverage. $where with JavaScript forces a collection scan because the server cannot evaluate the predicate via an index. Unanchored regex patterns also cannot use an index. With $or, each branch must independently satisfy an index, or the whole expression falls back to a collection scan.
  6. Check for forced scans. Search application code and logs for hint: { $natural: 1 }, which explicitly bypasses index selection and forces the planner to read the collection in natural order.
  7. Correlate with system pressure. Rising read latencies in opLatencies, increasing WiredTiger cache fill, and nonzero application-thread evictions suggest the scan is hurting the whole instance. A single long-running collection scan can hold a read ticket for minutes and cascade into queue buildup.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
docsExamined:docsReturned ratioMeasures how many documents are read and then discardedSustained ratio above 100:1
keysExamined:docsReturned ratioShows whether an index is narrowing the searchkeysExamined at zero while docsExamined exceeds docsReturned
COLLSCAN rate in slow query logDirect evidence of collection scans on production pathsAny occurrence on collections above 10,000 documents
WiredTiger cache fill ratioCollection scans load excess documents into cache, displacing the working setSustained increase above 80% with rising eviction rates
Application-thread evictionsForced eviction adds latency to all operationsNonzero sustained rate
opLatencies read p99End-user impact of inefficient queriesp99 read latency doubling from baseline for more than 5 minutes

Fixes

Missing index

If getIndexes() shows no index that matches the query predicate, create one. Use a background build to avoid blocking the collection:

db.collection.createIndex({ status: 1 }, { background: true })

Background builds still consume disk I/O and CPU. Run them during lower traffic when possible. After the build completes, verify the new plan with explain("executionStats") and confirm winningPlan.stage is IXSCAN.

Dropped or unused index

If an index existed but $indexStats shows zero usage since restart, confirm it is still present with getIndexes(). If it was accidentally dropped, rebuild it with a background build. If it exists but the planner ignores it, review the query predicate for selectivity issues. A highly non-selective index can be slower than a scan if the planner estimates it will return most of the collection.

Query plan regression

If explain() shows the planner choosing COLLSCAN despite a usable index, the plan cache may have shifted. Flush it as a temporary workaround:

db.collection.getPlanCache().clear()

For a persistent fix, review the query predicate for selectivity issues. If the query returns a large fraction of the collection, the planner may correctly choose a scan. Adding a more selective predicate or a compound index can change the calculation.

Predicate problems and forced scans

Remove $where clauses and replace them with standard query operators. Anchor regular expressions at the start of the string so they can use an index. Restructure $or expressions so every branch can use an index. If application code contains hint: { $natural: 1 }, remove it. If you must use a hint as a temporary workaround, document it with an expiration date and monitor it.

Kill runaway operations

For immediate relief, identify the operation ID with db.currentOp() and terminate it with db.killOp(opid). This interrupts the scan but does not fix the underlying index or query shape. Killing reads is safe. Killing multi-document writes that are not inside a transaction can leave data partially updated.

Prevention

  • Review slow query logs regularly for new COLLSCAN patterns before they become critical.
  • Track docsExamined:docsReturned and keysExamined:docsReturned ratios in staging and load tests for every new query shape.
  • Enable profiling level 1 with a conservative slowms threshold. On busy collections, consider lowering sampleRate to reduce profiler overhead. Avoid profiling level 2 in production because it floods I/O.
  • Monitor $indexStats after every deployment to detect sudden drops in index usage.
  • Use notablescan only in development and testing environments. It prevents collection scans globally, including administrative queries, and is not safe for production workloads.

How Netdata helps

  • Correlate MongoDB slow query metrics with host-level CPU, disk I/O, and memory pressure to distinguish a single bad query from systemic storage degradation.
  • Monitor WiredTiger cache fill and dirty ratios alongside eviction rates to detect when collection scans are pushing the cache into application-thread eviction.
  • Track opLatencies read histograms to catch tail latency increases caused by scans before applications time out.
  • Alert on sustained deviations in query efficiency ratios without needing continuous polling of system.profile.