MongoDB ticket exhaustion: WiredTiger read/write tickets and queued operations
Your application times out while the OS shows idle CPU and disk utilisation looks survivable. The MongoDB log shows no obvious errors, yet operations stall. The likely cause is WiredTiger ticket exhaustion: the storage engine has run out of read or write concurrency tokens, and new work queues behind slow operations. Confirm ticket starvation, find the root cause, and fix it without raising the ticket limit.
What this means
WiredTiger uses ticket-based admission control. Every operation that touches the storage engine must acquire a read or write ticket before it proceeds. In MongoDB 6.x and earlier, the default is 128 read and 128 write tickets per node. MongoDB 7.0 introduced a dynamic throughputProbing algorithm that adjusts the active ceiling downward from 128 under light load, scaling up under demand but never exceeding 128. In MongoDB 8.0+, the metrics moved from wiredTiger.concurrentTransactions to queues.execution, adding queue-length and timing fields that help distinguish true congestion from a low adaptive baseline.
When all tickets are in use, new operations queue. Ticket exhaustion is a symptom, not the disease. The root cause is almost always that operations hold tickets too long because of slow disk I/O, cache pressure, lock contention, or long-running queries. Raising the ticket limit is almost never the correct fix. It simply allows more operations to enter the storage engine and compound the bottleneck.
Flow control on replica set primaries uses a separate ticket pool to throttle writes and manage replication lag. Flow control starvation produces different symptoms and is governed by flowControlTargetLagSeconds. This article focuses on WiredTiger execution tickets.
flowchart TD
A[Tickets below 25%] --> B{Long ops in currentOp?}
B -->|Yes| C[Kill op or fix query plan]
B -->|No| D{Journal sync above 30ms?}
D -->|Yes| E[Storage bottleneck]
D -->|No| F{Dirty ratio above 10%?}
F -->|Yes| G[Cache pressure cascade]
F -->|No| H{Connection churn?}
H -->|Yes| I[Connection storm]
H -->|No| J[Adaptive floor post-7.0]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Slow disk I/O or storage degradation | Available tickets near zero, journal sync latency sustained above 30 ms, checkpoint duration climbing | OS disk latency (iostat -x 1) and wiredTiger.log sync averages |
| WiredTiger cache pressure | Cache dirty ratio above 10%, application-thread evictions incrementing, ticket availability dropping during peaks | wiredTiger.cache dirty ratio and pages evicted by application threads |
| Long-running operations holding tickets | Few connections but low ticket availability, currentOp shows operations running longer than 60 seconds | db.currentOp({ active: true, secs_running: { $gt: 10 } }) |
| Connection storm or concurrency spike | Connection count spiking rapidly, totalCreated delta high, ticket exhaustion follows a trigger event | serverStatus().connections and client source in currentOp |
| Dynamic algorithm baseline after 7.0 upgrade | totalTickets is 7-15 under light load, but queue length is zero and latency is normal | queues.execution queue length and totalTimeQueuedMicros (8.0+) or correlate with opLatencies |
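The causes above, checked in the order the flowchart suggests, could be sketched as a small triage function. This is a minimal sketch, not part of MongoDB: the function name and the field names of the input object are hypothetical, and the thresholds are the ones quoted in this article (60 seconds for a long operation is taken from the table row above).

```javascript
// Hypothetical triage helper that mirrors the flowchart's order of checks.
// The input is a plain object of signals you have already collected;
// its field names are illustrative, not serverStatus fields.
function triageTicketExhaustion(s) {
  if (s.availablePct >= 25) return "tickets healthy";
  if (s.longestOpSecs > 60) return "long-running operation: kill op or fix query plan";
  if (s.journalSyncMs > 30) return "storage bottleneck";
  if (s.dirtyPct > 10) return "cache pressure cascade";
  if (s.connectionChurn) return "connection storm";
  return "adaptive floor (post-7.0): confirm queue length is zero";
}

// Made-up sample: tickets at 8%, no long operations, journal sync at 45 ms.
const verdict = triageTicketExhaustion({
  availablePct: 8,
  longestOpSecs: 4,
  journalSyncMs: 45,
  dirtyPct: 6,
  connectionChurn: false,
});
console.log(verdict); // storage bottleneck
```

The checks run in order and stop at the first match, so a node with both a runaway query and slow storage reports the query first. Re-run the triage after killing the operation.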
Quick checks
// Ticket availability (MongoDB <= 7.x)
var t = db.serverStatus().wiredTiger.concurrentTransactions;
print("Read: " + t.read.available + "/" + t.read.totalTickets);
print("Write: " + t.write.available + "/" + t.write.totalTickets);
// Ticket availability (MongoDB 8.0+)
// TODO: field names assume the 8.0 queues.execution layout; verify against your server's output.
var q = db.serverStatus().queues.execution;
print("Read: " + q.read.available + "/" + q.read.totalTickets);
print("Write: " + q.write.available + "/" + q.write.totalTickets);
// Long-running operations
db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog.forEach(function(op) {
print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});
// Queue depths
db.serverStatus().globalLock.currentQueue
// Cache pressure
var c = db.serverStatus().wiredTiger.cache;
var max = c["maximum bytes configured"];
print("Dirty: " + (100 * c["tracked dirty bytes in the cache"] / max).toFixed(1) + "%");
print("App evictions: " + c["pages evicted by application threads"]);
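// Journal sync latency
// These counters are cumulative, so this is the lifetime average since startup.
// Diff two samples taken a few seconds apart to see current latency.
var wt = db.serverStatus().wiredTiger.log;
var syncOps = wt["log sync operations"];
var syncTime = wt["log sync time duration (usecs)"];
print("Avg sync µs: " + (syncOps > 0 ? (syncTime / syncOps).toFixed(0) : "N/A"));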
// Connection churn
var conn = db.serverStatus().connections;
print("Current: " + conn.current + ", Available: " + conn.available + ", TotalCreated: " + conn.totalCreated);
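The journal sync counters are cumulative, so dividing one by the other gives an average over the whole process lifetime. That average can hide a recent degradation. A minimal sketch of a delta-based calculation follows. The function name is hypothetical and the sample values are made up. The two counter names are the ones used in the quick check above.

```javascript
// Average journal sync latency in milliseconds between two samples of
// serverStatus().wiredTiger.log. Hypothetical helper, not a MongoDB API.
function journalSyncDeltaMs(before, after) {
  const ops = after["log sync operations"] - before["log sync operations"];
  const usecs =
    after["log sync time duration (usecs)"] - before["log sync time duration (usecs)"];
  // No syncs in the window, or counters went backwards after a restart.
  if (ops <= 0) return null;
  return usecs / ops / 1000;
}

// Made-up samples: the lifetime average at sampleB is about 8 ms,
// but the last 100 syncs averaged 40 ms.
const sampleA = { "log sync operations": 1000, "log sync time duration (usecs)": 5000000 };
const sampleB = { "log sync operations": 1100, "log sync time duration (usecs)": 9000000 };
console.log(journalSyncDeltaMs(sampleA, sampleB)); // 40
```

In mongosh, the two samples would come from calling db.serverStatus().wiredTiger.log twice with a pause in between. A 10 to 60 second window is a reasonable starting assumption.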
How to diagnose it
1. Confirm the direction of exhaustion. Use the version-appropriate serverStatus path to read available read and write tickets. If both are low, suspect storage or cache pressure. If only writes are low, suspect checkpoint stall, journal latency, or write-heavy lock contention.
2. Check for version-specific behavior. On MongoDB 7.0+, the dynamic algorithm intentionally keeps totalTickets low under light load. Low totalTickets alone is not an error. In 8.0+, inspect queueLength and totalTimeQueuedMicros under queues.execution. If queueLength is zero and totalTimeQueuedMicros is not increasing between samples, the system is not queuing and you are seeing normal adaptive behavior.
3. Inspect db.currentOp() for operations running longer than 60 seconds. A single large aggregation or collection scan can hold tickets for minutes and starve the pool. Note the opid, ns, and secs_running.
4. Measure storage health. Compute average journal sync latency from wiredTiger.log. If it exceeds 30 ms sustained, disk I/O is the likely root cause. Check OS-level disk metrics with iostat -x 1 for elevated %util or await.
5. Evaluate WiredTiger cache pressure. If the dirty ratio exceeds 10% or pages evicted by application threads is incrementing, the cache is forcing operations to do eviction work while holding tickets. This creates a feedback loop where slower operations consume more tickets for longer.
6. Look at connection patterns. A spike in totalCreated with stable current indicates connection churn. A spike in current after an election or deploy suggests a connection storm. Each new connection adds a thread that competes for tickets.
7. Review lock wait times in serverStatus().locks. Growing timeAcquiringMicros for Collection or Global locks points to contention that keeps tickets held.
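The first step, reading tickets from the version-appropriate path and classifying the direction of exhaustion, might look like the sketch below. Both function names are hypothetical. The 25% cutoff is this article's warning level, and the field layout is assumed from the paths named in this article, so verify it against your server's output. The reads-only branch is an addition for completeness that the steps above do not cover.

```javascript
// Return whichever ticket subdocument the serverStatus output exposes:
// queues.execution (8.0+) or wiredTiger.concurrentTransactions (earlier).
function getTickets(status) {
  const exec = status.queues && status.queues.execution;
  return exec || (status.wiredTiger && status.wiredTiger.concurrentTransactions) || null;
}

// Classify which ticket pool is low. lowFraction defaults to the 25% warning level.
function exhaustionDirection(tickets, lowFraction = 0.25) {
  const isLow = (t) => t.totalTickets > 0 && t.available / t.totalTickets < lowFraction;
  const readLow = isLow(tickets.read);
  const writeLow = isLow(tickets.write);
  if (readLow && writeLow) return "both low: suspect storage or cache pressure";
  if (writeLow) {
    return "writes low: suspect checkpoint stall, journal latency, or write-heavy lock contention";
  }
  // Assumption: the article does not cover this case; currentOp is a sensible next look.
  if (readLow) return "reads low only: inspect currentOp for long-running reads";
  return "no exhaustion";
}

// Made-up pre-8.0 sample with only 3 of 128 write tickets free.
const sampleStatus = {
  wiredTiger: {
    concurrentTransactions: {
      read: { available: 120, totalTickets: 128 },
      write: { available: 3, totalTickets: 128 },
    },
  },
};
console.log(exhaustionDirection(getTickets(sampleStatus)));
```

In mongosh, the call would be exhaustionDirection(getTickets(db.serverStatus())). On 7.0+ a low totalTickets shrinks the denominator, so pair this result with queue length before treating it as an incident.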
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Available read/write tickets | Headroom before queuing | Below 25% of total, or below 10 absolute |
| globalLock.currentQueue | Direct indicator of queued operations | Sustained total above 20 or growing trend |
| opLatencies reads/writes | User-visible latency from ticket waits | Average sustained above 2x baseline |
| WiredTiger cache dirty ratio | Predicts checkpoint stall and eviction pressure | Above 10% sustained |
| Journal sync latency | Leading indicator of storage degradation | Above 30 ms sustained |
| Checkpoint duration | Whether dirty pages can flush fast enough | Above 30 seconds or approaching the 60-second interval |
| currentOp max age | Catches runaway queries before they exhaust tickets | Above 60 seconds in OLTP workloads |
| Connection count and totalCreated delta | Reveals concurrency spikes and churn | Current above 80% of max, or high totalCreated rate |
| Lock timeAcquiringMicros | Shows contention keeping tickets held | Growing deltas relative to opLatencies totals |
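A few of the warning signs in the table can be evaluated from a single snapshot. The sketch below is illustrative only. The function name and the shape of the input object are hypothetical, and it assumes the connection maximum can be approximated as current plus available from serverStatus().connections. The "sustained" qualifier in the table still needs repeated samples over time, which this does not attempt.

```javascript
// Evaluate single-snapshot warning signs from the table.
// m.tickets:      { read: {available, totalTickets}, write: {available, totalTickets} }
// m.currentQueue: serverStatus().globalLock.currentQueue
// m.connections:  serverStatus().connections
function snapshotWarnings(m) {
  const warnings = [];
  for (const kind of ["read", "write"]) {
    const t = m.tickets[kind];
    // Below 25% of total, or below 10 absolute.
    if (t.available < 0.25 * t.totalTickets || t.available < 10) {
      warnings.push(kind + " tickets low");
    }
  }
  if (m.currentQueue.total > 20) warnings.push("queued operations above 20");
  // Assumption: configured maximum is approximately current + available.
  const maxConns = m.connections.current + m.connections.available;
  if (m.connections.current > 0.8 * maxConns) warnings.push("connections above 80% of max");
  return warnings;
}

// Made-up snapshot.
const snapshot = {
  tickets: {
    read: { available: 100, totalTickets: 128 },
    write: { available: 20, totalTickets: 128 },
  },
  currentQueue: { total: 35, readers: 5, writers: 30 },
  connections: { current: 500, available: 1500 },
};
console.log(JSON.stringify(snapshotWarnings(snapshot)));
// ["write tickets low","queued operations above 20"]
```

On MongoDB 7.0+ the absolute floor of 10 will fire under light load because totalTickets itself can sit below it, so treat that branch as informational unless queue depth is also nonzero.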
Fixes
Fix slow storage
If journal sync latency is elevated or checkpoint duration is climbing, the disk subsystem cannot keep up. On cloud block storage, burst credit exhaustion is a common cause. Look for await above 10 ms or %util approaching 100% in iostat. Provision more IOPS or throughput, or move the primary to a node with healthier storage. Do not raise the ticket limit. If a single node is degraded, step down the primary to shift writes to a healthier member.
Fix cache pressure
Kill unnecessary long-running operations with db.killOp(opid). Warning: the operation may not release tickets immediately, and killing large operations can briefly increase load.
Reduce write throughput by pausing batch jobs or throttling ingestion. If the cache is genuinely undersized for the working set, increase it via storage.wiredTiger.engineConfig.cacheSizeGB and restart the node. See MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches and MongoDB cache too small: sizing the WiredTiger cache for your working set.
Fix long-running queries
If currentOp reveals a collection scan, an unindexed sort, or an unexpected aggregation, terminate it with db.killOp(opid) and fix the query plan. Add missing indexes, add query timeouts at the application level, or rewrite overly broad aggregations. One bad query can hold tickets long enough to degrade the whole node.
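Before calling db.killOp, it helps to shortlist candidates rather than eyeball raw currentOp output. The sketch below is one possible filter, not a recommended policy. The function name is hypothetical, the sample data is made up, and skipping the local and admin namespaces is a conservative assumption meant to avoid touching replication or internal operations.

```javascript
// Shortlist user operations that have run longer than maxSecs, oldest first.
// inprog is the array returned in db.currentOp().inprog.
function killCandidates(inprog, maxSecs) {
  return inprog
    .filter((op) => op.active && op.secs_running > maxSecs)
    // Assumption: leave local.* and admin.* alone (replication and internal work).
    .filter((op) => op.ns && !op.ns.startsWith("local.") && !op.ns.startsWith("admin."))
    .sort((a, b) => b.secs_running - a.secs_running)
    .map((op) => ({ opid: op.opid, ns: op.ns, secs: op.secs_running }));
}

// Made-up currentOp entries.
const sampleOps = [
  { opid: 101, active: true, secs_running: 340, ns: "shop.orders" },
  { opid: 102, active: true, secs_running: 2, ns: "shop.carts" },
  { opid: 103, active: true, secs_running: 900, ns: "local.oplog.rs" },
  { opid: 104, active: true, secs_running: 75, ns: "shop.events" },
];
console.log(JSON.stringify(killCandidates(sampleOps, 60)));
// [{"opid":101,"ns":"shop.orders","secs":340},{"opid":104,"ns":"shop.events","secs":75}]
```

In mongosh, pass db.currentOp({ active: true }).inprog as the first argument. Review each entry by hand before running db.killOp(opid) on it.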
Fix connection storms
Identify the heaviest client sources via db.currentOp() grouped by client. If connections are overwhelming the node, block new sources at the firewall or load balancer to stop the spiral. Restarting with a lower net.maxIncomingConnections can cap the ceiling but requires a planned outage. Then fix the trigger: stabilise the replica set primary, resolve network blips, or fix DNS. See MongoDB connection storm spiral: reconnection floods after an election or deploy.
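Grouping currentOp output by client could be done along these lines. The function name is hypothetical and the addresses are made up. The sketch assumes each entry's client field has the form host:port, as mongod reports it. Entries without a client field are skipped.

```javascript
// Count currentOp entries per client host, heaviest first.
// Returns an array of [host, count] pairs.
function connectionsByClient(inprog) {
  const counts = new Map();
  for (const op of inprog) {
    if (!op.client) continue; // no client field on this entry
    const i = op.client.lastIndexOf(":");
    const host = i === -1 ? op.client : op.client.slice(0, i); // strip the port
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up entries.
const sampleConns = [
  { client: "10.0.0.9:4100" },
  { client: "10.0.0.5:5001" },
  { client: "10.0.0.5:5002" },
  { desc: "internal thread" },
];
console.log(JSON.stringify(connectionsByClient(sampleConns)));
// [["10.0.0.5",2],["10.0.0.9",1]]
```

In mongosh, db.currentOp(true).inprog includes idle connections, which matters here because a storm often consists mostly of connections that are not yet running anything.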
Do not disable dynamic tickets without support
On MongoDB 7.0+, attempting to set wiredTigerConcurrentReadTransactions at runtime may return an error when dynamic adjustment is active. Reverting to static 128 tickets requires setting wiredTigerConcurrentReadTransactions: 128 and wiredTigerConcurrentWriteTransactions: 128 in the configuration file and restarting. Engage MongoDB Technical Support before making this change. In most cases, the dynamic algorithm is not the problem.
Prevention
- Graph available tickets continuously and alert when they drop below 25% of the current totalTickets. Do not wait for zero.
- Correlate ticket graphs with journal sync latency, cache dirty ratio, and checkpoint duration so you can distinguish storage problems from query problems before queues form.
- Track the oldest running operation from currentOp as a continuous metric. Runaway queries are easier to kill at 30 seconds than at 5 minutes.
- Monitor connection churn (totalCreated deltas), not just connection count. Reconnection storms often precede ticket exhaustion by minutes.
- If you run MongoDB 7.0+, remember that totalTickets will appear low under light load. Monitor queue depth and latency instead of fixating on the absolute ticket count. Use queues.execution in 8.0+ to watch queueLength and totalTimeQueuedMicros.
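Two of the metrics in this list, oldest operation age and connection churn, are simple to derive if your collector does not already provide them. The sketch below shows one way to do it. Both function names are hypothetical and the sample numbers are made up.

```javascript
// Age in seconds of the oldest operation in a currentOp inprog array.
// Entries without secs_running count as 0.
function oldestOpSecs(inprog) {
  return inprog.reduce((max, op) => Math.max(max, op.secs_running || 0), 0);
}

// New connections per second between two serverStatus().connections samples.
function connectionChurnPerSec(before, after, elapsedSecs) {
  return (after.totalCreated - before.totalCreated) / elapsedSecs;
}

// Made-up samples taken 60 seconds apart.
console.log(oldestOpSecs([{ secs_running: 3 }, { secs_running: 42 }, {}])); // 42
console.log(connectionChurnPerSec({ totalCreated: 10000 }, { totalCreated: 10600 }, 60)); // 10
```

Long-lived internal operations can dominate the oldest-operation figure, so filtering the inprog array to application namespaces first is a reasonable refinement.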
How Netdata helps
Netdata correlates available read/write tickets, cache dirty ratio, journal sync latency, checkpoint duration, and opLatencies on the same timeline. This shows whether ticket exhaustion follows storage degradation, cache pressure, or a query spike. It computes deltas for cumulative counters like totalCreated and lock wait times, surfacing connection churn and growing contention automatically. Alerting paths cover both pre-8.0 wiredTiger.concurrentTransactions and 8.0+ queues.execution metrics, and highlight currentOp age alongside queue depth to help distinguish a single bad query from a systemic bottleneck.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling
- MongoDB connection storm spiral: reconnection floods after an election or deploy
- MongoDB flow control throttling writes: when the primary slows itself down
- MongoDB journal sync latency high: the storage signal that warns 60 seconds early
- MongoDB monitoring checklist: the signals every production cluster needs







