MongoDB lock wait times: collection and metadata lock contention during DDL

When p99 latency jumps and globalLock.currentQueue grows, check serverStatus().locks. If timeAcquiringMicros is climbing for Collection or Metadata, the cause is almost always DDL: createIndexes, dropIndexes, collMod, renameCollection, or similar commands that acquire exclusive collection, database, or metadata locks. WiredTiger uses document-level concurrency for ordinary reads and writes, so normal CRUD rarely blocks on locks. A single schema change, however, can serialize operations on a hot collection or across a database during peak traffic.

This guide shows how to read the lock metrics, find the DDL holder, and resolve the contention without making it worse.

What this means

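db.serverStatus().locks reports several lock types; four matter here: Global, Database, Collection, and Metadata. Each type carries four subdocuments, each keyed by lock mode (r for intent shared, w for intent exclusive, R for shared, W for exclusive):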

  • acquireCount.{mode}: total acquisitions
  • acquireWaitCount.{mode}: acquisitions that had to wait
  • timeAcquiringMicros.{mode}: cumulative wait time in microseconds
  • deadlockCount.{mode}: deadlocks detected

Average wait per contested acquisition:

timeAcquiringMicros.{mode} / acquireWaitCount.{mode}
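
Because both counters are cumulative since startup, the ratio is more useful when computed over a sampling window. The following is a minimal sketch, assuming two snapshots of a single lock type's subdocument (for example serverStatus().locks.Collection) taken at the start and end of the window. The function name avgWaitMicros and the sample numbers are illustrative, not part of MongoDB.

```javascript
// Average wait (microseconds) per contested acquisition between two snapshots
// of serverStatus().locks[type]. `before` and `after` are hypothetical names
// for snapshots taken at the start and end of a sampling window.
function avgWaitMicros(before, after, mode) {
  // A mode that was never acquired is simply absent, so default to 0.
  function read(snapshot, field) {
    var sub = snapshot && snapshot[field];
    return sub && sub[mode] !== undefined ? Number(sub[mode]) : 0;
  }
  var waits = read(after, "acquireWaitCount") - read(before, "acquireWaitCount");
  var micros = read(after, "timeAcquiringMicros") - read(before, "timeAcquiringMicros");
  // No contested acquisitions in the window means there is nothing to average.
  return waits > 0 ? micros / waits : 0;
}

// Illustrative numbers only, not real server output.
var before = { acquireWaitCount: { W: 10 }, timeAcquiringMicros: { W: 50000 } };
var after = { acquireWaitCount: { W: 14 }, timeAcquiringMicros: { W: 250000 } };
console.log(avgWaitMicros(before, after, "W")); // 50000
```

In this example, 4 contested acquisitions accumulated 200000 microseconds of waiting, so each waited 50 ms on average.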

WiredTiger uses intent locks for routine operations. Conflicts at the document level are retried, not queued. DDL is different: dropIndexes, collMod, renameCollection, and drop require an exclusive (W) collection lock while they run, and createIndexes takes one at the start and end of the build on MongoDB 4.2 and later. collMod and some cross-collection operations also acquire a database lock. Metadata locks serialize schema changes. When one of these commands holds its exclusive lock on a busy collection, every other operation targeting that collection queues.

A healthy target is lock wait time below 1% of total operation time. Compare timeAcquiringMicros deltas to opLatencies totals over the same window.
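
The 1% comparison can be sketched as follows, assuming two full serverStatus() snapshots taken over the same window. Summing the reads, writes, and commands latencies as "total operation time" is an assumption of this sketch, and lockWaitFraction is a hypothetical helper name.

```javascript
// Fraction of operation time spent waiting on one lock type/mode in a window.
// `beforeStatus` and `afterStatus` are hypothetical names for two
// serverStatus() snapshots.
function lockWaitFraction(beforeStatus, afterStatus, type, mode) {
  function waitMicros(status) {
    var lock = status.locks && status.locks[type];
    var sub = lock && lock.timeAcquiringMicros;
    return sub && sub[mode] !== undefined ? Number(sub[mode]) : 0;
  }
  // Assumption: total operation time = reads + writes + commands latency.
  function opMicros(status) {
    var total = 0;
    ["reads", "writes", "commands"].forEach(function (kind) {
      var entry = status.opLatencies && status.opLatencies[kind];
      if (entry) total += Number(entry.latency);
    });
    return total;
  }
  var waited = waitMicros(afterStatus) - waitMicros(beforeStatus);
  var spent = opMicros(afterStatus) - opMicros(beforeStatus);
  return spent > 0 ? waited / spent : 0;
}

// Illustrative numbers only, not real server output.
var s1 = {
  locks: { Collection: { timeAcquiringMicros: { W: 1000 } } },
  opLatencies: { reads: { latency: 100000 }, writes: { latency: 100000 } }
};
var s2 = {
  locks: { Collection: { timeAcquiringMicros: { W: 31000 } } },
  opLatencies: { reads: { latency: 600000 }, writes: { latency: 1100000 } }
};
console.log(lockWaitFraction(s1, s2, "Collection", "W")); // 0.02
```

Here 30000 microseconds of lock wait against 1500000 microseconds of operation time gives 2%, which is above the 1% target.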

flowchart TD
    A[Lock wait > 1% of operation time] --> B{Which lock type rises?}
    B -->|Global| C[Cross-database DDL or admin command]
    B -->|Database| D[DB-level lock holder such as collMod]
    B -->|Collection| E[DDL on one collection: createIndexes, dropIndexes, collMod, rename]
    B -->|Metadata| F[Schema change serialization]
    E --> G[currentOp shows DDL holder]
    F --> G
    G --> H{Impact scope}
    H -->|Single namespace| I[Kill or wait for maintenance window]
    H -->|Replica set| J[Reschedule maintenance; avoid killing index builds]
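
The first branch of the flowchart can be expressed as a small lookup. This is only a sketch: triageLockType is a hypothetical helper, and its input is assumed to be a map of lock type to the increase in timeAcquiringMicros over your sampling window.

```javascript
// Likely cause per lock type, copied from the flowchart above.
var LIKELY_CAUSE = {
  Global: "Cross-database DDL or admin command",
  Database: "DB-level lock holder such as collMod",
  Collection: "DDL on one collection: createIndexes, dropIndexes, collMod, rename",
  Metadata: "Schema change serialization"
};

// `deltas` is a hypothetical map: lock type -> increase in timeAcquiringMicros.
// Returns the type with the largest increase and its likely cause, or null.
function triageLockType(deltas) {
  var worst = null;
  Object.keys(LIKELY_CAUSE).forEach(function (type) {
    var delta = Number(deltas[type] || 0);
    if (delta > 0 && (worst === null || delta > worst.delta)) {
      worst = { type: type, delta: delta };
    }
  });
  return worst ? { type: worst.type, cause: LIKELY_CAUSE[worst.type] } : null;
}

// Illustrative numbers only.
var result = triageLockType({ Global: 0, Database: 1200, Collection: 900000, Metadata: 40000 });
console.log(result.type); // Collection
```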

Common causes

Cause | What it looks like | First thing to check
--- | --- | ---
Active DDL on a hot collection | Collection lock waits rising; operations on one namespace slow or queue | currentOp for createIndexes, dropIndexes, collMod, or renameCollection on that namespace
Schema change serialization | Metadata lock waits rising; multiple DDL commands queue behind each other | currentOp with waitingForLock: true filtered to DDL commands
Long-running transaction blocking DDL | Database lock waits rising; collMod or DDL appears stuck on a database | currentOp for transactions with timeOpenMicros > 60 seconds on the same database
Database-level or global DDL | Global or Database lock waits spike; broad latency impact across collections | currentOp for renameCollection, cloneCollectionAsCapped, or admin commands
Oplog lock contention on a heavy primary | oplog lock waits increase alongside high write throughput | serverStatus().locks.oplog and opcounters write rate

Quick checks

These checks are read-only and safe to run on a live primary or secondary.

// Print acquireWaitCount and timeAcquiringMicros for each lock type/mode
var locks = db.serverStatus().locks;
for (var type in locks) {
  var l = locks[type];
  if (l.acquireWaitCount) {
    for (var mode in l.acquireWaitCount) {
      print(type + " " + mode +
            " waits: " + l.acquireWaitCount[mode] +
            " totalUs: " + l.timeAcquiringMicros[mode]);
    }
  }
}
// Average wait per contested Collection W (exclusive) acquisition
var c = db.serverStatus().locks.Collection;
if (c && c.acquireWaitCount && c.acquireWaitCount.W > 0) {
  print("Collection W avg wait us: " +
        (c.timeAcquiringMicros.W / c.acquireWaitCount.W).toFixed(0));
}
// Operations currently waiting for locks
db.currentOp({ waitingForLock: true }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});
// Active operations running longer than 10 seconds
db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});
// Active multi-document transactions
db.currentOp({ "transaction": { "$exists": true }, active: true }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.ns + " | open " +
        (op.transaction.timeOpenMicros / 1000000).toFixed(1) + "s");
});
// Current lock queue depths
printjson(db.serverStatus().globalLock.currentQueue);

# Recent DDL entries in the MongoDB log (run from the host shell, not mongosh)
grep -iE "createIndexes|dropIndexes|collMod|renameCollection" /var/log/mongodb/mongod.log | tail -20

How to diagnose it

  1. Confirm lock wait growth from serverStatus().locks. Compute the average wait per contested acquisition for Collection, Metadata, Database, and Global over a 1-5 minute window. If Collection or Metadata average wait is rising and exceeds roughly 1% of typical operation latency, you have DDL contention.

  2. Identify the lock type. Collection waits point to a single collection locked by DDL. Metadata waits point to schema-change serialization. Database waits often involve collMod or multi-collection DDL. Global waits indicate cross-database operations or administrative commands.

  3. Find the holder and the waiters. Run db.currentOp({ waitingForLock: true }) and list active operations older than 10 seconds on the affected namespace. Look for createIndexes, dropIndexes, collMod, or renameCollection.

  4. Check for blocking transactions. If collMod or DDL is stuck on a database, search currentOp for active multi-document transactions on that database. A long-running transaction can hold a database lock and block DDL.

  5. Correlate with workload impact. Compare lock wait growth to opLatencies, opcounters, and globalLock.currentQueue. A drop in throughput or spike in latency that aligns with lock waits confirms user impact.

  6. Assess scope and risk. Single-collection contention is usually limited to one namespace. Database or global lock contention affects many clients. On a replica set, DDL replicates to secondaries and can inflate replication lag while secondaries apply it.
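
Step 3 can be sketched as a filter over the inprog array returned by db.currentOp(). This is a minimal sketch, assuming each DDL operation exposes its command name as a key of op.command. findDdlOps is a hypothetical helper, and the sample documents are illustrative rather than real server output.

```javascript
// Command names treated as DDL here, taken from the list in this guide.
var DDL_COMMANDS = ["createIndexes", "dropIndexes", "collMod", "renameCollection", "drop"];

// Return the DDL operations found in a currentOp `inprog` array.
// `ns` is optional; when given, only operations on that namespace are kept.
function findDdlOps(inprog, ns) {
  var found = [];
  inprog.forEach(function (op) {
    var cmd = op.command || {};
    var name = DDL_COMMANDS.filter(function (c) { return cmd[c] !== undefined; })[0];
    if (!name) return;               // not a DDL command
    if (ns && op.ns !== ns) return;  // different namespace
    found.push({
      opid: op.opid,
      ddl: name,
      ns: op.ns,
      secs_running: op.secs_running,
      waitingForLock: !!op.waitingForLock
    });
  });
  return found;
}

// Illustrative input shaped like db.currentOp().inprog.
var inprog = [
  { opid: 101, op: "command", ns: "shop.orders", secs_running: 95, command: { createIndexes: "orders" } },
  { opid: 102, op: "update", ns: "shop.orders", secs_running: 12, waitingForLock: true, command: { q: {}, u: {} } },
  { opid: 103, op: "command", ns: "shop.users", secs_running: 3, waitingForLock: true, command: { collMod: "users" } }
];
var holders = findDdlOps(inprog, "shop.orders");
console.log(holders.length, holders[0].opid, holders[0].ddl); // 1 101 createIndexes
```

In mongosh, the same function could be fed live data with findDdlOps(db.currentOp({ active: true }).inprog, "shop.orders"), where the namespace is a placeholder for your own.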

Metrics and signals to monitor

Signal | Why it matters | Warning sign
--- | --- | ---
locks.Collection.timeAcquiringMicros.W / acquireWaitCount.W | Average wait for exclusive collection locks; reveals DDL blocking CRUD on a collection | Sustained increase or average wait > 1% of operation latency
locks.Metadata.timeAcquiringMicros.W / acquireWaitCount.W | Schema change serialization; often the hidden bottleneck during migration scripts | Non-zero sustained wait count or rising time
locks.Database.timeAcquiringMicros | DB-level waits from collMod or multi-collection DDL | Increasing during normal traffic
locks.Global.timeAcquiringMicros | Cross-database DDL or admin operations blocking broad traffic | Spike outside maintenance windows
globalLock.currentQueue.writers | Operations queued behind exclusive locks | Sustained > 20 and growing
opLatencies.reads and opLatencies.writes | User-visible latency increase | p99 > 2x baseline coinciding with lock wait growth
currentOp max age and waitingForLock count | Direct view of the DDL holder and blocked operations | DDL operation > 60 seconds or more than 20 waiters
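
The numeric warning signs in the table can be checked against a single sample as sketched below. The sample object and its field names (queuedWriters, p99Ms, baselineP99Ms, oldestDdlSecs, waitingForLock) are hypothetical and would come from your own collector. Conditions such as "sustained" or "growing" need several samples and are not covered here.

```javascript
// Thresholds copied from the warning-sign column of the table above.
var THRESHOLDS = { queuedWriters: 20, p99Multiplier: 2, ddlSecs: 60, waiters: 20 };

// `sample` is a hypothetical object assembled by your own monitoring code.
// Returns a list of human-readable warnings for one point in time.
function warningSigns(sample) {
  var signs = [];
  if (sample.queuedWriters > THRESHOLDS.queuedWriters) {
    signs.push("writer queue above " + THRESHOLDS.queuedWriters);
  }
  if (sample.p99Ms > THRESHOLDS.p99Multiplier * sample.baselineP99Ms) {
    signs.push("p99 above " + THRESHOLDS.p99Multiplier + "x baseline");
  }
  if (sample.oldestDdlSecs > THRESHOLDS.ddlSecs) {
    signs.push("DDL running longer than " + THRESHOLDS.ddlSecs + "s");
  }
  if (sample.waitingForLock > THRESHOLDS.waiters) {
    signs.push("more than " + THRESHOLDS.waiters + " lock waiters");
  }
  return signs;
}

// Illustrative numbers only.
var signs = warningSigns({
  queuedWriters: 35, p99Ms: 180, baselineP99Ms: 60, oldestDdlSecs: 95, waitingForLock: 12
});
console.log(signs.length); // 3
```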

Fixes

Active DDL blocking a collection

If currentOp shows createIndexes, dropIndexes, or collMod running during peak traffic, the safest fix is to let it finish and move future runs to a maintenance window. If impact is severe and the operation is not an index build, you can abort it with db.killOp(opid).

Warning: Killing a multi-document write can leave it partially applied, with some documents updated and others untouched. Only kill if the impact outweighs the data risk.

Avoid killing index builds. Aborting a replicated index build partway through can require a resync.

Modern MongoDB versions use hybrid index builds that yield more than pre-4.2 foreground builds, but they still take collection locks at critical phases. Run index builds when traffic is low.

Long-running transaction blocking DDL

When a transaction holds a database lock and blocks collMod or similar DDL, identify the transaction in currentOp using the transaction filter. Have the application commit or abort it. If the transaction is abandoned, killOp will abort it and release the lock. Aborting a transaction rolls back its in-flight writes.
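
Finding candidate transactions can be sketched as a filter over currentOp output. This sketch assumes the inprog documents carry transaction.timeOpenMicros, as the quick check earlier in this guide does. longTransactions is a hypothetical helper, and the sample data is illustrative.

```javascript
// Return transactions that have been open longer than `minOpenSecs`.
// `inprog` is an array shaped like db.currentOp().inprog.
function longTransactions(inprog, minOpenSecs) {
  return inprog.filter(function (op) {
    // Operations outside a transaction have no `transaction` subdocument.
    return !!op.transaction &&
           Number(op.transaction.timeOpenMicros) > minOpenSecs * 1e6;
  }).map(function (op) {
    return {
      opid: op.opid,
      ns: op.ns,
      openSecs: Number(op.transaction.timeOpenMicros) / 1e6
    };
  });
}

// Illustrative input only.
var ops = [
  { opid: 7, ns: "shop.orders", transaction: { timeOpenMicros: 125000000 } },
  { opid: 8, ns: "shop.orders", transaction: { timeOpenMicros: 2000000 } },
  { opid: 9, ns: "shop.users" }
];
var stuck = longTransactions(ops, 60);
console.log(stuck.length, stuck[0].opid, stuck[0].openSecs); // 1 7 125
```

The 60-second cutoff matches the threshold used in the causes table; tune it to your own transaction timeouts.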

Database-level or global DDL

Cancel or reschedule renameCollection, cloneCollectionAsCapped, and similar operations that acquire Database or Global locks. These are not routine production traffic operations and should run during maintenance windows. If you must rename a collection, doing it within the same database acquires only collection locks on recent MongoDB versions and is far less disruptive than a cross-database rename.

Oplog lock contention on write-heavy primaries

If locks.oplog waits grow with write throughput, reduce burstiness at the application layer. Split large batches into smaller ones, avoid massive multi-document transactions, and ensure write concern is not forcing unnecessary serialization. If the primary is saturated, throttle ingest or scale out.
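
Splitting large batches can be sketched with a small chunking helper. The helper name chunk, the batch size of 500, and the events collection in the usage comment are all assumptions for illustration; pick a size that suits your workload.

```javascript
// Split an array of documents into batches of at most `size` items.
function chunk(docs, size) {
  if (!(size > 0)) throw new Error("size must be a positive number");
  var batches = [];
  for (var i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

// Illustrative data only.
var batches = chunk([1, 2, 3, 4, 5], 2);
console.log(batches.length); // 3

// Possible use in mongosh, assuming a hypothetical `events` collection:
// chunk(docs, 500).forEach(function (batch) {
//   db.events.insertMany(batch, { ordered: false });
// });
```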

Prevention

  • Run DDL in maintenance windows. createIndexes, dropIndexes, collMod, and renameCollection acquire exclusive locks. Schedule them away from peak traffic.
  • Keep transactions short. Long-running transactions hold database locks and block DDL. Enforce tight application timeouts and small transaction scopes.
  • Stagger schema changes. Avoid running multiple collMod or index operations concurrently on the same database because they serialize through metadata locks.
  • Monitor lock wait deltas. Trend timeAcquiringMicros / acquireWaitCount for Collection and Metadata. Alert when the average wait exceeds 1% of operation time.
  • Audit automated migrations. Some ORMs and schema management tools run collMod silently. Review change pipelines so DDL does not slip into production deploys.
  • Avoid cross-database renames during traffic. Prefer same-database renames or application-level copy-and-switch patterns.

How Netdata helps

  • Collects serverStatus().locks metrics per type and mode, surfacing acquireWaitCount and timeAcquiringMicros deltas to spot rising Collection or Metadata lock waits.
  • Correlates lock wait spikes with globalLock.currentQueue, opLatencies, and currentOp longest-running operation age on the same timeline to distinguish DDL contention from cache pressure or ticket exhaustion.
  • Computes average wait per contested acquisition and alerts when Collection or Metadata lock waits exceed a baseline threshold.
  • Surfaces replica set member state and replication lag alongside lock metrics, so you can see if DDL on the primary is delaying secondaries.
  • Captures lock metrics and queue depths at per-second resolution, catching short DDL bursts that slower polls miss.