$ guides / mongodb / mongodb-silent-index-regression ▌

Operations Guides

MongoDB silent index regression: when a dropped index quietly becomes a collection scan

Read latency on the primary doubles while connection counts and write throughput stay flat. There are no election events or cache pressure alerts. Traffic is unchanged. Yet p99 read latency climbs until operations time out.

The slow query log shows queries that used to finish in milliseconds now taking seconds. The plans show COLLSCAN. An index that existed last week is gone, or the query planner switched to a less efficient index after a cache invalidation. Because queries still return correct results, the regression is silent until it becomes an outage.

What this means

MongoDB’s query planner evaluates available indexes for each query shape and caches the winning plan. When an index supporting that shape is dropped, or when a plan cache entry becomes stale, the planner may fall back to a collection scan (COLLSCAN). Unlike an outright failure, the query succeeds but examines every document to return a small result set.

Latency grows with collection size. A 10,000-document collection may tolerate a scan. A 10-million-document collection will not. Scans pull more data into the WiredTiger cache, displace the working set, and increase I/O pressure. The system reaches a tipping point where cache eviction and ticket contention amplify the regression into a cluster-wide latency spike.

flowchart TD
    A[Index dropped or planner regression] --> B[Query replans to COLLSCAN]
    B --> C[docsExamined rises with collection size]
    C --> D[Read latency increases gradually]
    D --> E[Cache fills with scanned pages]
    E --> F[Tipping point: I/O and ticket saturation]

Common causes

Cause	What it looks like	First thing to check
Accidental `dropIndexes` during maintenance	Slow query log shows `COLLSCAN` on collections that previously used an index	`db.collection.getIndexes()` compared to a known-good schema
Failed background index build	Index exists in catalog but is incomplete or ignored by planner	MongoDB logs for index build failures around the time latency changed
Query plan cache invalidation choosing a worse plan	Same query shape switches from `IXSCAN` to `COLLSCAN` or a less selective index	`explain("executionStats")` on the query shape to inspect the winning plan
Schema migration changing index selectivity	New field or type makes an existing index less effective; `keysExamined:docsReturned` degrades	Slow query log entries comparing `keysExamined` to `docsReturned`

Quick checks

Run these safe, read-only checks to confirm the regression.

# Recent collection scans in the log (requires slow operation logging)
grep "COLLSCAN" /var/log/mongodb/mongod.log | tail -20

// Query efficiency counters since server start
var qe = db.serverStatus().metrics.queryExecutor;
print("scanned (keysExamined): " + qe.scanned);
print("scannedObjects (docsExamined): " + qe.scannedObjects);

// Index access counters since server start
db.collection.aggregate([{ $indexStats: {} }]).forEach(function(i) {
  print(i.name + " | ops: " + i.accesses.ops + " | since: " + i.accesses.since);
});

// If profiling is enabled, compare docs examined to docs returned
db.system.profile.find().sort({ ts: -1 }).limit(20).forEach(function(p) {
  var ratio = p.docsExamined / Math.max(p.nreturned || 1, 1);
  print(p.ns + " | examined: " + p.docsExamined + " | returned: " + (p.nreturned || 0) + " | ratio: " + ratio.toFixed(0) + ":1");
});

// Current indexes on the affected collection
db.collection.getIndexes().forEach(function(i) {
  print(i.name + " | " + JSON.stringify(i.key));
});

// Average read latency in microseconds
var lat = db.serverStatus().opLatencies.reads;
print("Read avg (us): " + (lat.latency / lat.ops).toFixed(0));

// Long-running read operations
db.currentOp({ active: true, secs_running: { $gt: 10 }, op: "query" })

// Check winning plan for a suspect query shape
db.collection.explain("executionStats").find({ field: "value" })

How to diagnose it

Isolate read efficiency from load. Verify opLatencies.reads.latency is rising while opcounters.query and connections.current remain stable. This rules out a traffic spike or connection storm.
Inspect the slow query log. Look for planSummary: COLLSCAN or a high docsExamined:docsReturned ratio. A ratio above 100:1 on a large collection indicates the executor is discarding most examined documents.
Sample metrics.queryExecutor twice. Record scanned and scannedObjects, wait five minutes, then sample again. A rising ratio of scanned objects to queries means the executor is traversing more documents than it returns.
Run $indexStats on suspect collections. An index that previously served a query shape but shows zero ops since restart is either missing or bypassed. Compare accesses.since to the server uptime.
Verify index existence and state. Use db.collection.getIndexes(). If the expected index is gone, it was dropped. If it exists but is unused, check whether it is hidden or was left incomplete by a failed build.
Inspect the winning plan. Run explain("executionStats") on the affected query shape. If the plan shows COLLSCAN on a large collection, or an IXSCAN on a low-selectivity index followed by a large docsExamined, the planner has regressed. Note the executionTimeMillis and totalDocsExamined.
Check for plan cache pollution. Query the plan cache to see if a COLLSCAN plan is cached while a selective index exists. Clear the cache for the collection to force replanning before considering a restart.
Correlate with DDL events. Search MongoDB logs for dropIndexes, createIndexes, failed index builds, or collMod near the time the latency trend changed. DDL clears the plan cache and forces replanning.
Validate collection size. MongoDB may legitimately choose COLLSCAN for very small collections. Confirm the document count is high enough that a scan is pathological.
```
db.collection.estimatedDocumentCount()
```

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`opLatencies.reads.latency` / `opLatencies.reads.ops`	Server-side read latency trend	Average doubles from baseline for more than 5 minutes with no corresponding write or connection spike
`metrics.queryExecutor.scanned` / `scannedObjects`	Query executor efficiency	Sustained increase in the ratio without a workload change
Slow query log `COLLSCAN` rate	Direct evidence of collection scans	New `COLLSCAN` entries on collections with more than 100,000 documents
`wiredTiger.cache.pages read into cache`	I/O volume driven by scans	Sustained increase correlating with `COLLSCAN` onset in logs
`$indexStats.ops` per index	Whether expected indexes remain active	A previously used index drops to zero ops after a restart or DDL event
`globalLock.currentQueue.readers`	Read queuing caused by slow scans	Sustained queue depth greater than 20 correlating with scan onset

Fixes

Force a query hint (immediate mitigation)

If the correct index exists but the planner is ignoring it, apply a query hint to buy time while you fix the root cause.

db.collection.find({ field: "value" }).hint({ field: 1 })

Warning: Hints bypass the planner entirely. If the hinted index cannot support the query, the operation fails. Remove the hint after the root cause is resolved so future index additions are considered.

Recreate a missing index

If getIndexes() shows the index is gone, rebuild it with a background build.

db.collection.createIndex({ field: 1 }, { background: true })

Warning: Background builds consume CPU, I/O, and disk space. They generate oplog entries that secondaries apply, which can increase replication lag. Monitor lag with rs.printSecondaryReplicationInfo() during the build. Avoid starting builds during peak traffic.

Clear a stale plan cache

If the index exists but the planner avoids it, clear the plan cache for the collection. This forces replanning on the next execution.

db.runCommand({ planCacheClear: "collection" })

If the regression persists across the whole cluster and you cannot identify the offending entry, a rolling restart clears all cached plans. This is disruptive and should be a last resort.

Prevention

Review all index changes in a staging environment that mirrors production data volume. Run explain("executionStats") on critical query shapes before and after the change.
Monitor $indexStats deltas weekly. An index that drops out of the top-used list warrants investigation.
Alert on the ratio of metrics.queryExecutor.scanned to scannedObjects. A sustained increase is an early warning of regressing query efficiency.
Keep the slow query threshold low enough to catch new COLLSCAN patterns before they become outages.
Before dropping any index, verify it is not used by checking $indexStats across a full business cycle and reviewing slow query logs for the query shapes it serves.

How Netdata helps

Correlate rising opLatencies.reads with flat opcounters and connections to isolate query-efficiency regressions from load spikes.
Alert on metrics.queryExecutor.scanned rate increasing while opcounters.query stays flat, catching collection scans early.
Track per-node read latency. A silent index regression on a secondary shows up as elevated read latency for that node without primary impact.
Surface slow query frequency and planSummary patterns from MongoDB logs, flagging COLLSCAN as it appears.
Visualize $indexStats usage trends over time to spot index abandonment before latency degrades.

The Netdata solution

MongoDB monitoring with Netdata

Netdata monitors MongoDB with per-second metrics and automatic dashboards. Watch WiredTiger cache pressure, oplog window, connection counts, checkpoint stalls, and replication health in one place, correlated with the underlying host.

See MongoDB monitoring → Start monitoring free

MongoDB silent index regression: when a dropped index quietly becomes a collection scan

MongoDB silent index regression: when a dropped index quietly becomes a collection scan

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Force a query hint (immediate mitigation)

Recreate a missing index

Clear a stale plan cache

Prevention

How Netdata helps

Related guides

MongoDB monitoring with Netdata