

MongoDB WriteConflict errors: optimistic concurrency retries under contention

WriteConflict errors (error code 112) in application logs, or unexplained write latency spikes, point to document-level contention under WiredTiger's optimistic concurrency control. Outside of transactions, MongoDB retries single-document writes internally, so the client sees slower responses rather than errors. Inside multi-document transactions, MongoDB aborts the transaction immediately and returns error 112. Either way, the root cause is concurrent writers targeting the same document.

What this means

WiredTiger uses optimistic concurrency control at the document level. Concurrent writes to the same document do not wait on a lock: the first writer to modify the document proceeds, and the other encounters a conflict when it tries to write.

For single-document writes outside a transaction, MongoDB applies wait-on-conflict semantics: the server retries the write internally with backoff. The call usually succeeds, but latency rises as retries accumulate. For multi-document transactions, MongoDB uses fail-on-conflict semantics: WiredTiger reports the conflict immediately, and MongoDB aborts the transaction and returns WriteConflict (error 112) with the TransientTransactionError label. The server does not retry automatically. The application or driver must detect the transient error and replay the entire transaction.
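A minimal sketch of how application code might tell these outcomes apart, assuming a driver error object that exposes MongoDB error labels (the Node.js driver's `MongoError` provides `hasErrorLabel()`; a plain `errorLabels` array is accepted as a fallback). The helper names `hasLabel` and `classifyTransactionError` are hypothetical. `TransientTransactionError` means the whole transaction can be replayed, while `UnknownTransactionCommitResult` means only the commit should be retried. The drivers' callback API (for example `session.withTransaction()` in the Node.js driver) already applies this logic, so hand-written handling like this is mainly useful when you manage transactions yourself.

```javascript
// Hypothetical helper: check an error label on a driver error or a plain object.
function hasLabel(err, label) {
  if (!err) return false;
  if (typeof err.hasErrorLabel === "function") return err.hasErrorLabel(label);
  return Array.isArray(err.errorLabels) && err.errorLabels.includes(label);
}

// Hypothetical helper: decide how to react to an error thrown inside a transaction.
function classifyTransactionError(err) {
  // A WriteConflict (112) inside a transaction carries this label: replay the whole transaction.
  if (hasLabel(err, "TransientTransactionError")) return "replay-transaction";
  // The commit outcome is unknown (for example a network error during commit): retry the commit only.
  if (hasLabel(err, "UnknownTransactionCommitResult")) return "retry-commit";
  // Anything else is not safe to retry blindly.
  return "fail";
}
```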

High WriteConflict rates mean hot-document contention. Causes include multiple writers on the same document, long-running transactions pinning snapshots, or application-level read-modify-write races.

```mermaid
flowchart TD
    A[Client sends write] --> B{Inside multi-doc transaction?}
    B -->|No| C[WiredTiger wait-on-conflict]
    C --> D[Server retries internally with backoff]
    D --> E[Client sees success or timeout]
    B -->|Yes| F[WiredTiger fail-on-conflict]
    F --> G[Server returns WriteConflict 112]
    G --> H[Driver or app must retry entire transaction]
```

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Hot-document updates | Error 112 in logs, or rising `asserts.user`, with slow writes targeting the same `_id` or shard key | `db.currentOp()` filtered by `ns` to find repeated access to one document |
| Long-running multi-document transactions | Rising `transactions.totalAborted`, transactions approaching `transactionLifetimeLimitSeconds` (60 seconds by default), growing queue depths | `db.currentOp({ "transaction": { "$exists": true } })` for `transaction.timeOpenMicros` |
| Read-modify-write races | Application fetches a document, mutates it in memory, then writes it back without atomic operators | Profiler or logs for `find` followed by `updateOne` on the same `ns` without a version predicate |
| Transaction retry storms | Abort rate exceeds commit rate, latency spikes correlate with application error logs | `db.serverStatus().transactions` abort-to-commit ratio over time |

Quick checks

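```javascript
// User assertion count (cumulative; includes all user errors, not only WriteConflict)
var asserts = db.serverStatus().asserts;
print("User assertions: " + asserts.user);

// Operations that encountered write conflicts (cumulative counter)
print("Write conflicts: " + db.serverStatus().metrics.operation.writeConflicts);

// Transaction abort versus commit balance
var txns = db.serverStatus().transactions;
print("Aborted: " + txns.totalAborted + ", Committed: " + txns.totalCommitted);

// Long-running transactions and their age
db.currentOp({ "transaction": { "$exists": true } }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.ns + " | " + (op.transaction.timeOpenMicros / 1000000) + "s");
});

// Writer queue depth
db.serverStatus().globalLock.currentQueue;

// Write ticket availability (newer versions may report tickets under serverStatus().queues.execution)
var tickets = db.serverStatus().wiredTiger.concurrentTransactions;
print("Write tickets available: " + tickets.write.available + " / " + tickets.write.totalTickets);
```

```shell
# Check logs for WriteConflict evidence (path varies by installation).
# The case-insensitive match catches the WriteConflict error name and the writeConflicts slow-op field.
grep -i "writeconflict" /var/log/mongodb/mongod.log | tail -20
```

```javascript
// Recent profiled operations, newest first, for contention patterns
// Requires profiling, for example db.setProfilingLevel(1, { slowms: 100 })
db.system.profile.find().sort({ ts: -1 }).limit(20).pretty();

// Only profiled operations that hit at least one write conflict
db.system.profile.find({ writeConflicts: { $gt: 0 } }).sort({ ts: -1 }).limit(20).pretty();
```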

How to diagnose it

1. Quantify the error rate. Sample `asserts.user` at two points and compute the delta. A sustained rise signals growing user errors, but this counter includes all user assertions, not only WriteConflicts. Where your server version reports it, `metrics.operation.writeConflicts` counts operations that hit write conflicts directly. Correlate with MongoDB logs for “WriteConflict” or code 112 to confirm the pattern.
2. Determine whether the problem is inside transactions. Compare `transactions.totalAborted` against `transactions.totalCommitted`. An abort rate above baseline, together with a growing `transactions.currentOpen`, suggests transactions are colliding and retrying. Aborts also include lifetime-limit expirations and explicit application aborts, so confirm with the logs.
3. Find the contested namespace. Use `db.currentOp()` to list active operations and look for many write operations on the same `ns`, especially with identical query shapes or document keys. If multiple operations share the same `planSummary` and target the same `_id` or shard key, you have found the hot document. Long-running transactions show a high `transaction.timeOpenMicros`.
4. Inspect the slow query log and profiler. Look for update operations with nonzero `writeConflicts` counts and long durations. Read-modify-write patterns appear as a `find` followed shortly by an `updateOne` or `replaceOne` on the same collection that writes back values computed from the earlier read, instead of using an atomic operator such as `$inc` or a version predicate in the filter. A `COLLSCAN` inside a transaction lengthens the transaction and widens the window in which other writers can collide with it.
5. Check for cascading saturation. Operations that retry after conflicts hold tickets and snapshots for longer. Check whether `wiredTiger.concurrentTransactions.write.available` has dropped below 25 percent of `totalTickets`, or whether `globalLock.currentQueue.writers` is nonzero. If tickets are exhausted, the problem has moved from document contention to system-wide queuing. If writers queue while tickets remain available, look for operations holding exclusive locks, or for CPU or storage saturation.
6. Correlate with cache pressure. Long-running transactions pin old WiredTiger snapshots, so the cache must keep older versions of modified documents that cannot be evicted until the snapshot is released. Compute the dirty ratio and check application-thread evictions:
```
var s = db.serverStatus().wiredTiger.cache;
var dirtyRatio = s["tracked dirty bytes in the cache"] / s["maximum bytes configured"];
var appEvictions = s["pages evicted by application threads"];
print("Dirty ratio: " + dirtyRatio + ", App evictions: " + appEvictions);
```
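The eviction counter is cumulative, so compare it between samples. If the dirty ratio climbs toward 20 percent (WiredTiger's default `eviction_dirty_trigger`) or application-thread evictions grow quickly between samples, the WriteConflict storm is causing secondary cache pressure.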
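The counter comparisons above can be scripted. A minimal sketch, assuming two `db.serverStatus()` documents captured a known number of seconds apart; `contentionRates` is a hypothetical helper, and `metrics.operation.writeConflicts` is treated as 0 when a server version does not report it. In mongosh, counters may arrive as BSON `Long` values, so the helper converts them with `toNumber()` when that method is present.

```javascript
// Hypothetical helper: per-second rates between two db.serverStatus() snapshots.
function contentionRates(before, after, intervalSec) {
  // Counters can be BSON Long values in mongosh; convert them when toNumber() exists.
  const num = v => (v && typeof v.toNumber === "function") ? v.toNumber() : Number(v || 0);
  // Read a dotted path such as "transactions.totalAborted"; missing fields become undefined.
  const get = (doc, path) => path.split(".").reduce((o, k) => (o == null ? undefined : o[k]), doc);
  const delta = path => num(get(after, path)) - num(get(before, path));

  const aborted = delta("transactions.totalAborted");
  const committed = delta("transactions.totalCommitted");
  const writeOps = delta("opLatencies.writes.ops");
  return {
    userAssertsPerSec: delta("asserts.user") / intervalSec,
    writeConflictsPerSec: delta("metrics.operation.writeConflicts") / intervalSec,
    abortsPerSec: aborted / intervalSec,
    abortToCommitRatio: committed > 0 ? aborted / committed : (aborted > 0 ? Infinity : 0),
    // opLatencies.writes.latency is cumulative microseconds across all write operations
    avgWriteLatencyMicros: writeOps > 0 ? delta("opLatencies.writes.latency") / writeOps : 0,
  };
}
```

In mongosh, for example: `var s1 = db.serverStatus(); sleep(10000); var s2 = db.serverStatus(); printjson(contentionRates(s1, s2, 10));`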

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| `asserts.user` rate | Rising rate signals growing user errors, including WriteConflict | Sustained increase from baseline |
| `transactions.totalAborted` vs `totalCommitted` | Reveals transaction-level retry storms | Abort rate consistently above 50 percent of commit rate |
| `currentOp` max transaction age | Long transactions pin snapshots and hold the documents they modified, so other writers to those documents conflict | Transactions approaching `transactionLifetimeLimitSeconds` (60 seconds by default) |
| `globalLock.currentQueue.writers` | Queued writers indicate contention is becoming system-wide | Sustained nonzero queue |
| `wiredTiger.concurrentTransactions.write.available` | Ticket exhaustion turns document contention into global latency | Below 25 percent of total tickets, sustained |
| Slow query log / `system.profile` rate | Retries increase operation duration | Sudden spike in slow writes |
| `opLatencies.writes` average (delta `latency` / delta `ops`) | Average write latency rises when operations retry internally or abort | Latency doubling from baseline |
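The warning signs in the table can be encoded as simple threshold checks. This sketch assumes a hypothetical `sample` object that your own collector fills from `serverStatus()` and `currentOp()`; the field names and the 80 percent reading of "approaching the lifetime limit" are assumptions. Because the table describes sustained conditions, alert only when a warning persists across several consecutive samples.

```javascript
// Hypothetical sample shape, filled by your own monitoring code:
// { abortToCommitRatio, writeTicketsAvailable, writeTicketsTotal, queuedWriters,
//   oldestTxnAgeSec, txnLifetimeLimitSec, avgWriteLatencyMicros, baselineWriteLatencyMicros }
function contentionWarnings(s) {
  const warnings = [];
  if (s.abortToCommitRatio > 0.5) warnings.push("transaction aborts above 50% of commits");
  if (s.writeTicketsTotal > 0 && s.writeTicketsAvailable / s.writeTicketsTotal < 0.25)
    warnings.push("fewer than 25% of write tickets available");
  if (s.queuedWriters > 0) warnings.push("writers queued");
  // "Approaching the limit" is assumed here to mean 80% of transactionLifetimeLimitSeconds.
  if (s.oldestTxnAgeSec > 0.8 * s.txnLifetimeLimitSec) warnings.push("transaction near lifetime limit");
  if (s.baselineWriteLatencyMicros > 0 && s.avgWriteLatencyMicros >= 2 * s.baselineWriteLatencyMicros)
    warnings.push("write latency doubled from baseline");
  return warnings;
}
```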

Fixes

Hot-document contention

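Replace read-modify-write loops with atomic operators such as $inc and $set, which apply the change in a single server-side operation and avoid lost updates. When an update genuinely depends on a prior read, include a version field in the update filter (with updateOne or findOneAndUpdate), so a write based on a stale read matches nothing and can be retried. Keep the gap between the read and the write short: the longer it is, the more likely another writer changes the document in between.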
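A minimal sketch of the version-predicate pattern, assuming a Node.js-driver-style collection (`findOne()` and `updateOne()` returning promises, with `matchedCount` on the update result) and documents that carry a numeric `version` field. `updateWithVersion` and the `version` field are hypothetical names, not part of any MongoDB API. A `matchedCount` of 0 means another writer changed the document after the read, so the helper re-reads and tries again, up to a cap.

```javascript
// Hypothetical helper: optimistic update guarded by an assumed numeric "version" field.
// mutate(doc) returns the fields to $set; it must not include "version" itself.
async function updateWithVersion(collection, id, mutate, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = await collection.findOne({ _id: id });
    if (doc === null) throw new Error("document " + id + " not found");
    const changes = mutate(doc);                       // new values computed from this read
    const res = await collection.updateOne(
      { _id: id, version: doc.version },               // matches only if nobody wrote since the read
      { $set: changes, $inc: { version: 1 } }
    );
    if (res.matchedCount === 1) return attempt;        // write applied on this attempt
    // matchedCount 0: another writer bumped the version in between; re-read and retry
  }
  throw new Error("gave up after " + maxAttempts + " attempts on " + id);
}
```

Under heavy contention, combine this loop with the jittered backoff described under retry storms rather than re-reading immediately.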

For inherently serial data such as global counters or leaderboards, distribute the hot document into N bucket documents selected by a hash or random value, then aggregate at read time. This spreads contention across multiple documents.
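One way to sketch the bucketing approach, assuming a hypothetical `counters` collection whose documents look like `{ _id: "<name>:<bucket>", counter: "<name>", count: <number> }` and an assumed bucket count of 16. Writers pick a random bucket and increment it with an upsert; readers sum the buckets with an aggregation pipeline. An index on `counter` (also an assumption) keeps the read cheap.

```javascript
// Hypothetical schema: { _id: "<name>:<bucket>", counter: "<name>", count: <number> }
const DEFAULT_BUCKETS = 16;   // assumed; more buckets spread writes further but make reads touch more docs

// Build an updateOne() call that increments one randomly chosen bucket.
function bucketedIncrement(name, by = 1, buckets = DEFAULT_BUCKETS, rand = Math.random) {
  const bucket = Math.floor(rand() * buckets);        // 0 .. buckets-1
  return {
    filter: { _id: name + ":" + bucket },
    update: { $inc: { count: by }, $setOnInsert: { counter: name } },
    options: { upsert: true },                        // the first write to a bucket creates it
  };
}

// Aggregation pipeline that sums every bucket of one counter at read time.
function bucketedReadPipeline(name) {
  return [
    { $match: { counter: name } },
    { $group: { _id: null, total: { $sum: "$count" } } },
  ];
}
```

In mongosh, for example: `var s = bucketedIncrement("page_views"); db.counters.updateOne(s.filter, s.update, s.options);` to write, and `db.counters.aggregate(bucketedReadPipeline("page_views"))` to read the total.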

Long-running transactions

Reduce transaction scope. Split large batch updates into smaller transactions that commit faster. Set transactionLifetimeLimitSeconds (60 seconds by default) to suit your workload so the server aborts runaway transactions.
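A sketch of splitting one large batch into several short transactions, assuming the Node.js driver's session API (`client.startSession()`, `session.withTransaction()`, `session.endSession()`). `chunk`, `applyInBatches`, and the caller-supplied `applyBatch` callback are hypothetical. Each batch commits independently, so the overall job is no longer atomic: a failure partway through leaves earlier batches committed, and the caller must be able to resume.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (size < 1) throw new Error("size must be >= 1");
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Hypothetical: apply items in separate short transactions, one per batch.
// applyBatch(batch, session) must pass { session } to every driver call it makes.
async function applyInBatches(client, items, batchSize, applyBatch) {
  const session = client.startSession();
  try {
    for (const batch of chunk(items, batchSize)) {
      // withTransaction() commits this batch and replays it on transient errors.
      await session.withTransaction(() => applyBatch(batch, session));
    }
  } finally {
    await session.endSession();
  }
}
```

The server-side limit itself is a runtime parameter; for example, `db.adminCommand({ setParameter: 1, transactionLifetimeLimitSeconds: 30 })` lowers it to 30 seconds.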

Warning: Killing operations aborts active work and can disrupt legitimate clients.

If a transaction remains open after the application has finished, kill it with db.killOp(opid) to release its snapshot and locks.
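Before killing anything, it can help to list only the transactions older than a chosen threshold. A minimal sketch over the `inprog` array returned by `db.currentOp()`; `staleTransactions` and the threshold are hypothetical, and the output is meant for review before a manual `db.killOp()`, not for automatic killing.

```javascript
// Hypothetical helper: transactions open longer than maxAgeSec, oldest first.
function staleTransactions(inprog, maxAgeSec) {
  // timeOpenMicros may be a BSON Long in mongosh; convert when toNumber() is available.
  const num = v => (v && typeof v.toNumber === "function") ? v.toNumber() : Number(v || 0);
  return inprog
    .filter(op => op.transaction)
    .map(op => ({ opid: op.opid, ns: op.ns, ageSec: num(op.transaction.timeOpenMicros) / 1e6 }))
    .filter(t => t.ageSec > maxAgeSec)
    .sort((a, b) => b.ageSec - a.ageSec);
}
```

In mongosh, for example: `staleTransactions(db.currentOp({ transaction: { $exists: true } }).inprog, 30).forEach(t => printjson(t));`, then confirm each candidate before passing its `opid` to `db.killOp()`.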

Retry storms in application code

Ensure the application uses jittered exponential backoff between transaction retries. Without backoff, multiple clients retry simultaneously after a conflict, creating thundering-herd behavior that amplifies the problem. Cap total retry attempts to prevent infinite loops if an underlying hot document stays contested. Do not rely on naive immediate retry loops.
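A minimal sketch of capped, full-jitter exponential backoff around a whole-transaction callback, for code that drives its own retries instead of relying on a driver helper. `runWithBackoff`, its options, and the defaults (10 ms base, 1 s cap, 5 attempts) are illustrative assumptions; `runTxn` is assumed to start, run, and commit one complete transaction per call. Full jitter picks a random delay between zero and the exponential ceiling, so clients that conflicted at the same moment spread their retries out instead of colliding again.

```javascript
// Full-jitter exponential backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(attempt, baseMs = 10, capMs = 1000, rand = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Default retry test: transaction WriteConflicts carry the TransientTransactionError label.
function isTransientTxnError(err) {
  if (!err) return false;
  if (typeof err.hasErrorLabel === "function") return err.hasErrorLabel("TransientTransactionError");
  return Array.isArray(err.errorLabels) && err.errorLabels.includes("TransientTransactionError");
}

// Hypothetical wrapper: runTxn() runs one complete transaction (start, writes, commit) per call.
async function runWithBackoff(runTxn, opts = {}) {
  const {
    maxAttempts = 5,
    baseMs = 10,
    capMs = 1000,
    rand = Math.random,
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
    shouldRetry = isTransientTxnError,
  } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await runTxn();
    } catch (err) {
      // Stop on non-transient errors or once the attempt budget is used up.
      if (!shouldRetry(err) || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt, baseMs, capMs, rand));
    }
  }
}
```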

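Ensure retries are idempotent. A transaction never partially commits, but a commit can succeed while its acknowledgment is lost, for example when a network error surfaces as UnknownTransactionCommitResult. If the application then replays the whole transaction instead of retrying only the commit, it can apply the same writes twice unless it tracks an operation ID or relies on unique indexes.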
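A sketch of the unique-index approach, assuming a hypothetical collection (for example `applied_ops`) in which each logical operation records a client-generated `opId` as its `_id`, which MongoDB always indexes uniquely. If a replay hits duplicate-key error 11000, an earlier attempt already committed, and the sketch reports that instead of writing again. Inside a transaction, run this insert first and stop if it reports `already-applied`, since the duplicate-key error can also abort the surrounding transaction.

```javascript
// Hypothetical: record each logical operation once, keyed by a client-generated opId.
// _id is always uniquely indexed, so a second insert with the same opId fails with code 11000.
async function applyOnce(collection, opId, doc, session) {
  try {
    await collection.insertOne({ _id: opId, ...doc }, session ? { session } : {});
    return "applied";
  } catch (err) {
    if (err && err.code === 11000) return "already-applied"; // an earlier attempt already committed
    throw err;
  }
}
```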

Storage-layer pressure

If WriteConflicts correlate with ticket exhaustion or cache pressure, reduce concurrent write load temporarily. Pause batch jobs or throttle ingestion until ticket availability recovers.
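A sketch of throttled ingestion, assuming the caller supplies a hypothetical `getWriteTicketFraction()`, for example one that reads `wiredTiger.concurrentTransactions.write.available / totalTickets` from `serverStatus`. The 25 percent floor mirrors the ticket warning threshold used elsewhere in this guide; the pause length and pause cap are assumptions.

```javascript
// Hypothetical ingestion throttle: before each batch, wait while write-ticket headroom is low.
async function throttledIngest(batches, writeBatch, getWriteTicketFraction, opts = {}) {
  const {
    minFraction = 0.25,   // resume once at least 25% of write tickets are free
    pauseMs = 500,        // assumed pause between checks
    maxPauses = 120,      // give up after about 60 s of continuous pressure per batch
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
  } = opts;
  for (const batch of batches) {
    let pauses = 0;
    while ((await getWriteTicketFraction()) < minFraction) {
      pauses += 1;
      if (pauses > maxPauses) throw new Error("write tickets did not recover; stopping ingestion");
      await sleep(pauseMs);
    }
    await writeBatch(batch);
  }
}
```

With the Node.js driver, the fraction could come from `client.db("admin").command({ serverStatus: 1 })`; check where your server version reports ticket counters before relying on a specific path.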

Warning: Killing operations aborts active work.

Kill unnecessary long-running operations only as a last resort to free tickets immediately.

Prevention

Monitor asserts.user deltas continuously. An anomalous rate reveals contention before application timeouts fire.
Track the ratio of transactions.totalAborted to totalCommitted. A rising ratio is an early warning that transactions in the workload are colliding.
Audit query patterns quarterly for read-modify-write sequences that could be replaced with atomic updates.
Keep transactions short and deterministic. Avoid transactions that scan large ranges or hold cursors open.
Monitor currentOp for operations approaching your transaction timeout threshold.
Set client-side transaction timeouts lower than the server’s transactionLifetimeLimitSeconds so applications fail fast rather than holding snapshots until the server aborts them.

How Netdata helps

Netdata charts asserts.user deltas, exposing WriteConflict storms that internal retries hide from application-level error counters.
Netdata correlates transaction abort rates with globalLock.currentQueue and WiredTiger ticket availability, helping distinguish document contention from storage saturation.
Netdata tracks opLatencies average write latency alongside queue depths, revealing retry-driven latency spikes before they trigger application timeouts.
Netdata monitors WiredTiger cache dirty ratio and application-thread evictions, alerting when transaction snapshots pin cache and amplify contention into a cascade.

The Netdata solution

MongoDB monitoring with Netdata

Netdata monitors MongoDB with per-second metrics and automatic dashboards. Watch WiredTiger cache pressure, oplog window, connection counts, checkpoint stalls, and replication health in one place, correlated with the underlying host.

