MongoDB unused indexes: $indexStats, write amplification, and safe removal

Every insert and delete updates all indexes on a MongoDB collection; every update that modifies an indexed field updates every index containing that field. An unused index consumes WiredTiger cache space, increases write latency, and amplifies disk I/O. Over time, redundant indexes accumulate from schema migrations, abandoned query patterns, and exploratory tuning.

The $indexStats aggregation stage exposes per-index operation counts, but counters reset on mongod restart and internal operations can inflate values. This guide covers detecting truly unused indexes, validating redundancy, and removing them without causing query regressions.

On write-heavy workloads, each extra index adds fixed overhead to every write. In WiredTiger, index pages compete with documents for cache space. A large unused index reduces the RAM available to active data, forcing eviction threads to work harder. If eviction cannot keep pace, application threads perform page eviction directly, adding latency. Dropping an index based on stale statistics is a reliable way to turn a healthy query into a collection scan.

What unused indexes cost you

Disk and cache space. Index entries live in WiredTiger files on disk and are cached in memory. A large unused index reduces the cache available to active documents. On disk, the index contributes to storageSize that WiredTiger does not automatically return to the filesystem after data deletes.

Write amplification. Every insert adds entries to all indexes. Every update that changes an indexed field deletes the old entry and inserts a new one in each affected index. Every delete removes entries from all indexes. An unnecessary index adds sustained disk I/O and CPU overhead to every write.

Lock contention. dropIndex requires an exclusive collection lock. During removal, concurrent writes can stall. Dropping an index during peak traffic can queue operations visible in globalLock.currentQueue and lock wait times in db.serverStatus().locks.

Prerequisites

Before evaluating indexes, confirm the following.

  • Uptime. The mongod process has been running for at least 24 hours under a representative workload. $indexStats counters reset on restart.
  • Privileges. You need permission to run $indexStats and dropIndex on the target collections.
  • Workload coverage. The observed period should include both peak and off-peak traffic. A nightly batch job may use an index that daytime OLTP does not.
  • Rollback plan. Document the current index definitions with db.collection.getIndexes() before changing anything.

Procedure

1. Collect $indexStats per collection

Run $indexStats as the first stage in an aggregation pipeline. It returns one document per index with fields including name, key, host, and an accesses document containing ops and a since timestamp.

// Check index usage for a single collection
db.collection.aggregate([{ $indexStats: {} }])

To survey all collections in a database:

db.getCollectionNames().forEach(function(coll) {
  print("=== " + coll + " ===");
  db[coll].aggregate([{ $indexStats: {} }]).forEach(function(idx) {
    print("  " + idx.name + ": " + idx.accesses.ops + " ops");
  });
});

On a sharded cluster, run this command on each shard primary individually. $indexStats on a mongos does not aggregate statistics across shards.

2. Filter removal candidates

Identify indexes where accesses.ops equals zero. Exclude the _id index, which shows usage from internal operations, and exclude unique indexes that enforce constraints even if read operations are zero.

Zero operations after 24 or more hours of normal load indicates a removal candidate. If the process restarted after the since timestamp, the counter is meaningless. Wait for a full workload cycle.

3. Check compound-prefix redundancy

A compound index such as { field: 1, other: 1 } serves queries that filter on { field: 1 } alone, provided the filter uses the exact prefix fields in order. If a separate single-field index { field: 1 } exists and shows zero usage, it is redundant. Keep the compound index; drop the single-field one.

4. Aggregate across shards

In sharded collections, $indexStats is local to each shard primary. An index may show zero usage on one shard but serve queries on another. Connect to each shard primary, run the command, and sum accesses.ops across all shards before concluding an index is globally unused.

5. Drop during low traffic

Warning: destructive. dropIndex removes the index immediately and requires an exclusive collection lock.

// Drop a specific index by name
db.collection.dropIndex("index_name")

Perform the operation during a maintenance window or low-traffic period. The lock can stall concurrent writes.

6. Monitor for regressions

Immediately after removal, check the MongoDB slow query log for COLLSCAN on the affected collection. Run explain("executionStats") on critical query shapes to confirm they still use an efficient index plan rather than scanning the collection.

flowchart TD
    A[Run $indexStats] --> B{Uptime > 24h?}
    B -->|No| C[Wait and re-check]
    B -->|Yes| D{Ops == 0?}
    D -->|No| E[Keep index]
    D -->|Yes| F{Redundant prefix?}
    F -->|Yes| G[Drop single-field index]
    F -->|No| H{Shard confirms unused?}
    H -->|No| E
    H -->|Yes| I[Drop during low traffic]
    I --> J[Monitor slow queries]

Verifying the removal

After dropping the index, confirm the outcome with these checks.

  • Index list. Run db.collection.getIndexes() to confirm the index is gone.
  • Slow query log. A missing index often appears as COLLSCAN or as a sudden increase in the docsExamined to docsReturned ratio. Look for these patterns in the log.
  • Write latency. Removing an unnecessary index should reduce per-write overhead. Compare opLatencies.writes before and after removal to confirm improvement.
  • Cache pressure. Fewer indexes mean less cache churn per write. Watch the WiredTiger cache dirty ratio for downward pressure after cleanup.

Common pitfalls

  • Short uptime. Do not drop indexes within 24 hours of a mongod restart. Counters reset to zero on restart, making active indexes appear unused.
  • Sharded blind spots. Relying on stats from a single shard misses usage on other shards. Always aggregate across the cluster.
  • Prefix redundancy. Keeping a single-field index while dropping its compound cover wastes space. Retain the compound index; drop the prefix index.
  • Constraint-only indexes. A unique index may show zero read ops while still enforcing integrity. Dropping it removes the constraint and can allow duplicate keys.
  • Internal _id usage. The _id index is required and shows internal usage. Never treat it as a removal candidate.
  • Query plan changes. After removal, the optimizer may choose a different existing index. A query that used a dropped index might switch to another index rather than a COLLSCAN. Verify with explain() on critical query shapes.
  • Lock stalls. Dropping an index during heavy write traffic can spike lock waits. If you see globalLock.currentQueue growing during removal, pause and retry during a quieter window.

Signals to monitor

SignalWhy it mattersWarning sign
$indexStats zero opsConfirms an index has not been used since restartaccesses.ops at zero after 24+ hours of uptime
Slow query COLLSCANDetects queries regressing to collection scans after removalNew COLLSCAN entries on the collection
opLatencies.writesTracks per-write latency overheadUnexpected increase after index removal
WiredTiger cache dirty ratioIndexes compete for cache spaceElevated dirty ratio from inactive index pages
Lock waits during dropIndexDropping an index acquires a collection lockSpike in globalLock.currentQueue or locks wait times

How Netdata helps

  • Correlate zero-ops indexes from $indexStats with MongoDB opcounters and opLatencies metrics to measure write amplification before and after removal.
  • Monitor WiredTiger cache dirty ratio and eviction rates to observe cache pressure relief after dropping large unused indexes.
  • Alert on globalLock.currentQueue and lock wait times to detect contention caused by dropIndex operations.
  • Track slow query patterns and COLLSCAN indicators after index changes to catch regressions immediately.
  • Trend storage growth alongside per-collection size metrics to prioritize cleanup on the largest collections.