MongoDB lock wait times: collection and metadata lock contention during DDL
When p99 latency jumps and globalLock.currentQueue grows, check serverStatus().locks. If timeAcquiringMicros is climbing for Collection or Metadata, the cause is almost always DDL: createIndexes, dropIndexes, collMod, renameCollection, or similar commands that acquire exclusive collection, database, or metadata locks. WiredTiger uses document-level concurrency for ordinary reads and writes, so normal CRUD rarely blocks. A single schema change, however, can serialize operations on a hot collection or across a database during peak traffic.
This guide covers how to read the lock metrics, find the DDL holder, and resolve the contention without making it worse.
What this means
db.serverStatus().locks reports four lock types that matter here: Global, Database, Collection, and Metadata. Each type carries up to four subdocuments, each keyed by lock mode (R shared, W exclusive, r intent shared, w intent exclusive):
- acquireCount.{mode}: total acquisitions
- acquireWaitCount.{mode}: acquisitions that had to wait
- timeAcquiringMicros.{mode}: cumulative wait time in microseconds
- deadlockCount.{mode}: deadlocks detected
Average wait per contested acquisition:
timeAcquiringMicros.{mode} / acquireWaitCount.{mode}
WiredTiger uses intent locks for routine operations. Conflicts at the document level are retried, not queued. DDL is different: createIndexes, dropIndexes, collMod, renameCollection, and drop require an exclusive (W) collection lock, either for their full duration or, for index builds on MongoDB 4.2 and later, at the start and end of the build. collMod and some cross-collection operations also acquire a database lock. The Metadata lock serializes schema changes. While one of these holds its exclusive lock on a busy collection, every other operation targeting that collection queues.
A healthy target is lock wait time below 1% of total operation time. Compare timeAcquiringMicros deltas to opLatencies totals over the same window.
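The 1% target can be checked by diffing two serverStatus() snapshots. The sketch below is one way to do that, assuming the counters coerce cleanly with Number() and that opLatencies latency totals are in microseconds. The helper names lockWaitMicros, opLatencyMicros, and lockWaitShare are hypothetical, not part of MongoDB.

```javascript
// Sum timeAcquiringMicros across all modes for one lock type.
function lockWaitMicros(status, type) {
  var t = (status.locks[type] || {}).timeAcquiringMicros || {};
  return Object.keys(t).reduce(function (sum, mode) {
    return sum + Number(t[mode]);
  }, 0);
}

// Sum the opLatencies latency totals (microseconds) for reads, writes, commands.
function opLatencyMicros(status) {
  return ["reads", "writes", "commands"].reduce(function (sum, key) {
    var entry = (status.opLatencies || {})[key];
    return sum + (entry ? Number(entry.latency) : 0);
  }, 0);
}

// Fraction of operation time spent waiting on `type` locks between two snapshots.
function lockWaitShare(before, after, type) {
  var waited = lockWaitMicros(after, type) - lockWaitMicros(before, type);
  var total = opLatencyMicros(after) - opLatencyMicros(before);
  return total > 0 ? waited / total : 0;
}

// Shell usage, guarded so the helpers can also be loaded outside the shell.
if (typeof db !== "undefined") {
  var s1 = db.serverStatus();
  sleep(60 * 1000); // 60-second window; adjust to taste
  var s2 = db.serverStatus();
  print("Collection lock wait share: " +
        (lockWaitShare(s1, s2, "Collection") * 100).toFixed(2) + "%");
}
```

A result above 1% for Collection or Metadata over the window is the signal described above. Counter resets, such as a restart between snapshots, would produce negative deltas and are not handled here.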
flowchart TD
A[Lock wait > 1% of operation time] --> B{Which lock type rises?}
B -->|Global| C[Cross-database DDL or admin command]
B -->|Database| D[DB-level lock holder such as collMod]
B -->|Collection| E[DDL on one collection: createIndexes, dropIndexes, collMod, rename]
B -->|Metadata| F[Schema change serialization]
E --> G[currentOp shows DDL holder]
F --> G
G --> H{Impact scope}
H -->|Single namespace| I[Kill or wait for maintenance window]
H -->|Replica set| J[Reschedule maintenance; avoid killing index builds]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Active DDL on a hot collection | Collection lock waits rising; operations on one namespace slow or queue | currentOp for createIndexes, dropIndexes, collMod, or renameCollection on that namespace |
| Schema change serialization | Metadata lock waits rising; multiple DDL commands queue behind each other | currentOp with waitingForLock: true filtered to DDL commands |
| Long-running transaction blocking DDL | Database lock waits rising; collMod or DDL appears stuck on a database | currentOp for transactions with timeOpenMicros > 60000000 (60 seconds) on the same database |
| Database-level or global DDL | Global or Database lock waits spike; broad latency impact across collections | currentOp for renameCollection, cloneCollectionAsCapped, or admin commands |
| Oplog lock contention on a heavy primary | oplog lock waits increase alongside high write throughput | serverStatus().locks.oplog and opcounters write rate |
Quick checks
These checks are read-only and safe to run on a live primary or secondary.
// Print acquireWaitCount and timeAcquiringMicros for each lock type/mode
var locks = db.serverStatus().locks;
for (var type in locks) {
  var l = locks[type];
  if (l.acquireWaitCount) {
    for (var mode in l.acquireWaitCount) {
      print(type + " " + mode +
        " waits: " + l.acquireWaitCount[mode] +
        " totalUs: " + (l.timeAcquiringMicros || {})[mode]);
    }
  }
}
// Average wait per contested Collection w (intent-exclusive) acquisition,
// i.e. writes that queued behind an exclusive (W) holder such as DDL
var c = db.serverStatus().locks.Collection;
if (c && c.acquireWaitCount && c.timeAcquiringMicros && c.acquireWaitCount.w > 0) {
  print("Collection w avg wait us: " +
    (c.timeAcquiringMicros.w / c.acquireWaitCount.w).toFixed(0));
}
// Operations currently waiting for locks
db.currentOp({ waitingForLock: true }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});
// Active operations running longer than 10 seconds
db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});
// Active multi-document transactions
db.currentOp({ "transaction": { "$exists": true }, active: true }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.ns + " | open " +
    (op.transaction.timeOpenMicros / 1000000).toFixed(1) + "s");
});
// Current lock queue depths
printjson(db.serverStatus().globalLock.currentQueue);
# Recent DDL entries in the MongoDB log
grep -iE "createIndexes|dropIndexes|collMod|renameCollection" /var/log/mongodb/mongod.log | tail -20
How to diagnose it
1. Confirm lock wait growth from serverStatus().locks. Compute the average wait per contested acquisition for Collection, Metadata, Database, and Global over a 1-5 minute window. If Collection or Metadata average wait is rising and exceeds roughly 1% of typical operation latency, you have DDL contention.
2. Identify the lock type. Collection waits point to a single collection locked by DDL. Metadata waits point to schema-change serialization. Database waits often involve collMod or multi-collection DDL. Global waits indicate cross-database operations or administrative commands.
3. Find the holder and the waiters. Run db.currentOp({ waitingForLock: true }) and list active operations older than 10 seconds on the affected namespace. Look for createIndexes, dropIndexes, collMod, or renameCollection.
4. Check for blocking transactions. If collMod or DDL is stuck on a database, search currentOp for active multi-document transactions on that database. A long-running transaction can hold a database lock and block DDL.
5. Correlate with workload impact. Compare lock wait growth to opLatencies, opcounters, and globalLock.currentQueue. A drop in throughput or spike in latency that aligns with lock waits confirms user impact.
6. Assess scope and risk. Single-collection contention is usually limited to one namespace. Database or global lock contention affects many clients. On a replica set, DDL replicates to secondaries and can inflate replication lag while secondaries apply it.
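Finding the holder and the waiters can be partly automated by classifying currentOp entries. The following is a minimal sketch, assuming each entry exposes its command document under op.command and that the DDL command name appears as a key in it. DDL_COMMANDS, ddlCommandName, and splitHoldersAndWaiters are hypothetical names, and the 10-second cutoff mirrors the step above.

```javascript
// DDL command names taken from this guide; extend as needed.
var DDL_COMMANDS = ["createIndexes", "dropIndexes", "collMod", "renameCollection", "drop"];

// Return the DDL command name for a currentOp entry, or null if it is not DDL.
function ddlCommandName(op) {
  var cmd = op.command || {};
  for (var i = 0; i < DDL_COMMANDS.length; i++) {
    if (Object.prototype.hasOwnProperty.call(cmd, DDL_COMMANDS[i])) {
      return DDL_COMMANDS[i];
    }
  }
  return null;
}

// Split in-progress operations into likely DDL holders and lock waiters.
// An operation that is itself waiting for a lock is always counted as a waiter.
function splitHoldersAndWaiters(inprog, minSeconds) {
  var holders = [];
  var waiters = [];
  inprog.forEach(function (op) {
    if (op.waitingForLock) {
      waiters.push(op);
    } else if (ddlCommandName(op) !== null && (op.secs_running || 0) >= minSeconds) {
      holders.push(op);
    }
  });
  return { holders: holders, waiters: waiters };
}

// Shell usage, guarded so the helpers can also be loaded outside the shell.
if (typeof db !== "undefined") {
  var result = splitHoldersAndWaiters(db.currentOp({ active: true }).inprog, 10);
  result.holders.forEach(function (op) {
    print("holder " + op.opid + " | " + ddlCommandName(op) + " | " +
          op.secs_running + "s | " + op.ns);
  });
  print("waiters: " + result.waiters.length);
}
```

This only narrows the candidates. It does not prove that a listed holder is the operation blocking a given waiter, so compare namespaces before acting.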
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| locks.Collection.timeAcquiringMicros.w / acquireWaitCount.w | Average wait for intent-exclusive collection locks; reveals DDL blocking writes on a collection | Sustained increase or average wait > 1% of operation latency |
| locks.Metadata.timeAcquiringMicros.W / acquireWaitCount.W | Schema change serialization; often the hidden bottleneck during migration scripts | Non-zero sustained wait count or rising time |
| locks.Database.timeAcquiringMicros | DB-level waits from collMod or multi-collection DDL | Increasing during normal traffic |
| locks.Global.timeAcquiringMicros | Cross-database DDL or admin operations blocking broad traffic | Spike outside maintenance windows |
| globalLock.currentQueue.writers | Operations queued behind exclusive locks | Sustained > 20 and growing |
| opLatencies.reads and opLatencies.writes | User-visible latency increase | p99 > 2x baseline coinciding with lock wait growth |
| currentOp max age and waitingForLock count | Direct view of the DDL holder and blocked operations | DDL operation > 60 seconds or more than 20 waiters |
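The numeric warning signs in the table can be expressed as a simple check. This is an illustrative sketch only: the sample field names (queuedWriters, p99Micros, baselineP99Micros, longestDdlSeconds, lockWaiters) are hypothetical, and the p99 values are assumed to come from your own monitoring. A single sample cannot establish the "sustained" part of a threshold, so evaluate several consecutive samples before alerting.

```javascript
// Evaluate one monitoring sample against the thresholds from the table.
// Returns a list of human-readable warning strings (empty when healthy).
function warningSigns(sample) {
  var signs = [];
  if (sample.queuedWriters > 20) {
    signs.push("writer queue above 20");
  }
  if (sample.baselineP99Micros > 0 && sample.p99Micros > 2 * sample.baselineP99Micros) {
    signs.push("p99 latency above 2x baseline");
  }
  if (sample.longestDdlSeconds > 60) {
    signs.push("DDL operation running longer than 60 seconds");
  }
  if (sample.lockWaiters > 20) {
    signs.push("more than 20 operations waiting for locks");
  }
  return signs;
}

// Example with made-up numbers: a long index build and a deep writer queue.
var example = warningSigns({
  queuedWriters: 35,
  p99Micros: 9000,
  baselineP99Micros: 5000,
  longestDdlSeconds: 140,
  lockWaiters: 12
});
// example contains the writer-queue and long-DDL warnings only.
```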
Fixes
Active DDL blocking a collection
If currentOp shows createIndexes, dropIndexes, or collMod running during peak traffic, the safest fix is usually to let it finish and reschedule future runs for a maintenance window. If impact is severe and the operation is not an index build, you can abort it with db.killOp(opid).
Warning: Killing a write operation partway through can leave it partially applied, with some documents updated and others not. Only kill if the impact outweighs the data risk.
Avoid killing index builds. Aborting a replicated index build partway through can require a resync.
Modern MongoDB versions use hybrid index builds that yield more than pre-4.2 foreground builds, but they still take collection locks at critical phases. Run index builds when traffic is low.
Long-running transaction blocking DDL
When a transaction holds a database lock and blocks collMod or similar DDL, identify the transaction in currentOp using the transaction filter. Have the application commit or abort it. If the transaction is abandoned, killOp will abort it and release the lock. Aborting a transaction rolls back its in-flight writes.
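To find candidate transactions, filter the currentOp output by how long each transaction has been open. The sketch below assumes entries carry transaction.timeOpenMicros, as used in the quick checks. The helper name longOpenTransactions and the 60-second threshold are illustrative choices. It only lists candidates and deliberately does not call killOp.

```javascript
// List transactions open longer than thresholdSeconds, longest first.
function longOpenTransactions(inprog, thresholdSeconds) {
  var thresholdMicros = thresholdSeconds * 1000000;
  return inprog
    .filter(function (op) {
      return op.transaction &&
             Number(op.transaction.timeOpenMicros) > thresholdMicros;
    })
    .map(function (op) {
      return {
        opid: op.opid,
        openSeconds: Number(op.transaction.timeOpenMicros) / 1000000
      };
    })
    .sort(function (a, b) {
      return b.openSeconds - a.openSeconds;
    });
}

// Shell usage, guarded so the helper can also be loaded outside the shell.
if (typeof db !== "undefined") {
  var ops = db.currentOp({ "transaction": { "$exists": true } }).inprog;
  longOpenTransactions(ops, 60).forEach(function (t) {
    print(t.opid + " | open " + t.openSeconds.toFixed(1) + "s");
  });
}
```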
Database-level or global DDL
Cancel or reschedule renameCollection, cloneCollectionAsCapped, and similar operations that acquire Database or Global locks. These are not routine production traffic operations and should run during maintenance windows. If you must rename a collection, doing it within the same database acquires only collection locks and is far less disruptive than cross-database renames.
Oplog lock contention on write-heavy primaries
If locks.oplog waits grow with write throughput, reduce burstiness at the application layer. Split large batches into smaller ones, avoid massive multi-document transactions, and ensure write concern is not forcing unnecessary serialization. If the primary is saturated, throttle ingest or scale out.
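Splitting a large batch is straightforward at the application layer. The following is a small sketch of the idea: chunk is a hypothetical helper, the batch size of 500 is an arbitrary starting point rather than a recommendation, and db.events in the usage comment is a placeholder collection.

```javascript
// Split an array of documents into batches of at most `size` items.
function chunk(docs, size) {
  if (!(size > 0)) {
    throw new Error("size must be a positive number");
  }
  var batches = [];
  for (var i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

// Possible shell usage (db.events is a placeholder collection name):
// chunk(docs, 500).forEach(function (batch) {
//   db.events.insertMany(batch, { ordered: false });
// });
```

Smaller batches spread the write load over more, shorter operations. Tune the size against observed locks.oplog waits rather than treating any fixed number as correct.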
Prevention
- Run DDL in maintenance windows. createIndexes, dropIndexes, collMod, and renameCollection acquire exclusive locks. Schedule them away from peak traffic.
- Keep transactions short. Long-running transactions hold database locks and block DDL. Enforce tight application timeouts and small transaction scopes.
- Stagger schema changes. Avoid running multiple collMod or index operations concurrently on the same database because they serialize through metadata locks.
- Monitor lock wait deltas. Trend timeAcquiringMicros / acquireWaitCount for Collection and Metadata. Alert when the average wait exceeds 1% of operation time.
- Audit automated migrations. Some ORMs and schema management tools run collMod silently. Review change pipelines so DDL does not slip into production deploys.
- Avoid cross-database renames during traffic. Prefer same-database renames or application-level copy-and-switch patterns.
How Netdata helps
- Collects serverStatus().locks metrics per type and mode, surfacing acquireWaitCount and timeAcquiringMicros deltas to spot rising Collection or Metadata lock waits.
- Correlates lock wait spikes with globalLock.currentQueue, opLatencies, and currentOp longest-running operation age on the same timeline to distinguish DDL contention from cache pressure or ticket exhaustion.
- Computes average wait per contested acquisition and alerts when Collection or Metadata lock waits exceed a baseline threshold.
- Surfaces replica set member state and replication lag alongside lock metrics, so you can see if DDL on the primary is delaying secondaries.
- Captures lock metrics and queue depths at per-second resolution, catching short DDL bursts that slower polls miss.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB connection churn: high totalCreated rate and thread creation overhead
- MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling
- MongoDB connection storm spiral: reconnection floods after an election or deploy
- MongoDB flow control throttling writes: when the primary slows itself down
- MongoDB journal sync latency high: the storage signal that warns 60 seconds early







