MongoDB ticket exhaustion: WiredTiger read/write tickets and queued operations

Your application times out while the OS shows idle CPU and disk utilisation looks survivable. The MongoDB log shows no obvious errors, yet operations stall. The likely cause is WiredTiger ticket exhaustion: the storage engine has run out of read or write concurrency tokens, and new work queues behind slow operations. This guide shows how to confirm ticket starvation, find the root cause, and fix it without raising the ticket limit.

What this means

WiredTiger uses ticket-based admission control. Every operation that touches the storage engine must acquire a read or write ticket before it proceeds. In MongoDB 6.x and earlier, the default is 128 read and 128 write tickets per node. MongoDB 7.0 introduced a dynamic throughputProbing algorithm that adjusts the active ceiling at runtime: it sits below 128 under light load and scales up under demand, but never exceeds 128. In MongoDB 8.0+, the metrics moved from wiredTiger.concurrentTransactions to queues.execution, which adds queue-length and timing fields that help distinguish true congestion from a low adaptive baseline.
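
The version split in the metrics path can be absorbed by a small helper. The following is a minimal sketch, not an official API: readTickets and toNum are hypothetical names, and the available and totalTickets field names under queues.execution are assumed to match the pre-8.0 section, so verify them against your own serverStatus output.

```javascript
// Hypothetical helper: normalise counters that mongosh may return as Long objects.
function toNum(v) {
  if (typeof v === "number") return v;
  return v && typeof v.toNumber === "function" ? v.toNumber() : Number(v);
}

// Pick whichever ticket section exists in a serverStatus document.
// Assumption: both sections expose read/write subdocuments with
// "available" and "totalTickets" fields.
function readTickets(status) {
  const newer = status.queues && status.queues.execution; // 8.0+ location
  const older = status.wiredTiger && status.wiredTiger.concurrentTransactions; // earlier releases
  const src = newer || older;
  if (!src) return null; // neither section present
  const side = (s) => ({ available: toNum(s.available), total: toNum(s.totalTickets) });
  return {
    source: newer ? "queues.execution" : "wiredTiger.concurrentTransactions",
    read: side(src.read),
    write: side(src.write),
  };
}

// In mongosh: printjson(readTickets(db.serverStatus()));
```

Returning the source path alongside the numbers makes it obvious which generation of metrics a dashboard or script is reading.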

When all tickets are in use, new operations queue. Ticket exhaustion is a symptom, not the disease. The root cause is almost always that operations hold tickets too long because of slow disk I/O, cache pressure, lock contention, or long-running queries. Raising the ticket limit is almost never the correct fix. It simply allows more operations to enter the storage engine and compound the bottleneck.

Flow control on replica set primaries uses a separate ticket pool to throttle writes and manage replication lag. Flow control starvation produces different symptoms and is governed by flowControlTargetLagSeconds. This article focuses on WiredTiger execution tickets.

flowchart TD
    A[Tickets below 25%] --> B{Long ops in currentOp?}
    B -->|Yes| C[Kill op or fix query plan]
    B -->|No| D{Journal sync above 30ms?}
    D -->|Yes| E[Storage bottleneck]
    D -->|No| F{Dirty ratio above 10%?}
    F -->|Yes| G[Cache pressure cascade]
    F -->|No| H{Connection churn?}
    H -->|Yes| I[Connection storm]
    H -->|No| J[Adaptive floor post-7.0]
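
The decision tree above can also be written as a plain function, which is handy when wiring it into a runbook script. This is only a sketch: the snapshot field names (availableTickets, totalTickets, longOps, journalSyncMs, dirtyPct, connectionChurn) are hypothetical, and the thresholds are the ones shown in the flowchart.

```javascript
// Walk the flowchart in order; the first matching branch wins.
// All field names on "m" are illustrative assumptions, not MongoDB fields.
function classify(m) {
  if (m.availableTickets >= 0.25 * m.totalTickets) return "tickets healthy";
  if (m.longOps > 0) return "kill op or fix query plan";
  if (m.journalSyncMs > 30) return "storage bottleneck";
  if (m.dirtyPct > 10) return "cache pressure cascade";
  if (m.connectionChurn) return "connection storm";
  return "adaptive floor (7.0+)";
}

// Example:
// classify({ availableTickets: 4, totalTickets: 128, longOps: 0,
//            journalSyncMs: 45, dirtyPct: 3, connectionChurn: false })
```

The order of the checks mirrors the diagram, so a long-running operation is always reported before storage or cache conditions are considered.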

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Slow disk I/O or storage degradation | Available tickets near zero, journal sync latency sustained above 30 ms, checkpoint duration climbing | OS disk latency (iostat -x 1) and wiredTiger.log sync averages |
| WiredTiger cache pressure | Cache dirty ratio above 10%, application-thread evictions incrementing, ticket availability dropping during peaks | wiredTiger.cache dirty ratio and pages evicted by application threads |
| Long-running operations holding tickets | Few connections but low ticket availability, currentOp shows operations running longer than 60 seconds | db.currentOp({ active: true, secs_running: { $gt: 10 } }) |
| Connection storm or concurrency spike | Connection count spiking rapidly, totalCreated delta high, ticket exhaustion follows a trigger event | serverStatus().connections and client source in currentOp |
| Dynamic algorithm baseline after 7.0 upgrade | totalTickets is 7-15 under light load, but queue length is zero and latency is normal | queues.execution queue length and totalTimeQueuedMicros (8.0+) or correlate with opLatencies |

Quick checks

// Ticket availability (MongoDB <= 7.x)
var t = db.serverStatus().wiredTiger.concurrentTransactions;
print("Read: " + t.read.available + "/" + t.read.totalTickets);
print("Write: " + t.write.available + "/" + t.write.totalTickets);

// Ticket availability (MongoDB 8.0+)
// The field names under queues.execution are unverified; confirm them against your server's serverStatus output.
var q = db.serverStatus().queues.execution;
print("Read: " + q.read.available + "/" + q.read.totalTickets);
print("Write: " + q.write.available + "/" + q.write.totalTickets);

// Long-running operations
db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});

// Queue depths
db.serverStatus().globalLock.currentQueue

// Cache pressure
var c = db.serverStatus().wiredTiger.cache;
var max = c["maximum bytes configured"];
print("Dirty: " + (100 * c["tracked dirty bytes in the cache"] / max).toFixed(1) + "%");
print("App evictions: " + c["pages evicted by application threads"]);

// Journal sync latency (average since startup, in microseconds)
var wt = db.serverStatus().wiredTiger.log;
var syncOps = wt["log sync operations"];
var syncTime = wt["log sync time duration (usecs)"];
print("Avg sync µs: " + (syncOps > 0 ? (syncTime / syncOps).toFixed(0) : "N/A"));

// Connection churn
var conn = db.serverStatus().connections;
print("Current: " + conn.current + ", Available: " + conn.available + ", TotalCreated: " + conn.totalCreated);
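
The journal counters in wiredTiger.log are cumulative, so the quick check above reports an average over the life of the process. To judge whether latency is elevated right now, compare two samples taken some seconds apart. The sketch below assumes two copies of serverStatus().wiredTiger.log; intervalSyncMs and toNum are hypothetical helper names.

```javascript
// Hypothetical helper: normalise counters that mongosh may return as Long objects.
function toNum(v) {
  if (typeof v === "number") return v;
  return v && typeof v.toNumber === "function" ? v.toNumber() : Number(v);
}

// Average journal sync latency (ms) between two samples of wiredTiger.log.
// Returns null when no sync happened in the interval.
function intervalSyncMs(prevLog, currLog) {
  const ops = toNum(currLog["log sync operations"]) - toNum(prevLog["log sync operations"]);
  const usecs = toNum(currLog["log sync time duration (usecs)"]) -
                toNum(prevLog["log sync time duration (usecs)"]);
  if (ops <= 0) return null;
  return usecs / ops / 1000; // microseconds per sync, converted to milliseconds
}

// In mongosh:
//   var a = db.serverStatus().wiredTiger.log; sleep(10000);
//   var b = db.serverStatus().wiredTiger.log; print(intervalSyncMs(a, b));
```

The same two-sample pattern applies to any other cumulative counter in this section, such as pages evicted by application threads.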

How to diagnose it

  1. Confirm the direction of exhaustion. Use the version-appropriate serverStatus path to read available read and write tickets. If both are low, suspect storage or cache pressure. If only writes are low, suspect checkpoint stall, journal latency, or write-heavy lock contention.

  2. Check for version-specific behavior. On MongoDB 7.0+, the dynamic algorithm intentionally keeps totalTickets low under light load. Low totalTickets alone is not an error. In 8.0+, inspect queueLength and totalTimeQueuedMicros under queues.execution. If those are zero, the system is not queuing and you are seeing normal adaptive behavior.

  3. Inspect db.currentOp() for operations running longer than 60 seconds. A single large aggregation or collection scan can hold tickets for minutes and starve the pool. Note the opid, ns, and secs_running.

  4. Measure storage health. Compute average journal sync latency from wiredTiger.log. If it exceeds 30 ms sustained, disk I/O is the likely root cause. Check OS-level disk metrics with iostat -x 1 for elevated %util or await.

  5. Evaluate WiredTiger cache pressure. If the dirty ratio exceeds 10% or pages evicted by application threads is incrementing, the cache is forcing operations to do eviction work while holding tickets. This creates a feedback loop where slower operations consume more tickets for longer.

  6. Look at connection patterns. A spike in totalCreated with stable current indicates connection churn. A spike in current after an election or deploy suggests a connection storm. Each new connection adds a thread that competes for tickets.

  7. Review lock wait times in serverStatus().locks. Growing timeAcquiringMicros for Collection or Global locks points to contention that keeps tickets held.
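
Step 7 depends on how timeAcquiringMicros changes between samples rather than on its absolute value. As a rough sketch, the helper below sums the wait time across lock modes for the Global and Collection resources and diffs two copies of serverStatus().locks. The function names are hypothetical, and the sketch assumes each resource exposes timeAcquiringMicros keyed by lock mode, which may be absent when no waits have occurred.

```javascript
// Hypothetical helper: normalise counters that mongosh may return as Long objects.
function toNum(v) {
  if (typeof v === "number") return v;
  return v && typeof v.toNumber === "function" ? v.toNumber() : Number(v);
}

// Total microseconds spent waiting for one lock resource, summed over all modes.
function lockWaitMicros(locks, resource) {
  const t = (locks[resource] && locks[resource].timeAcquiringMicros) || {};
  return Object.keys(t).reduce((sum, mode) => sum + toNum(t[mode]), 0);
}

// Growth in wait time between two samples of serverStatus().locks.
function lockWaitDelta(prevLocks, currLocks) {
  const out = {};
  for (const r of ["Global", "Collection"]) {
    out[r] = lockWaitMicros(currLocks, r) - lockWaitMicros(prevLocks, r);
  }
  return out;
}

// In mongosh: sample db.serverStatus().locks twice and pass both to lockWaitDelta.
```

A delta that keeps growing while ticket availability stays low suggests contention is what keeps the tickets held.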

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Available read/write tickets | Headroom before queuing | Below 25% of total, or below 10 absolute |
| globalLock.currentQueue | Direct indicator of queued operations | Sustained total above 20 or growing trend |
| opLatencies reads/writes | User-visible latency from ticket waits | Average sustained above 2x baseline |
| WiredTiger cache dirty ratio | Predicts checkpoint stall and eviction pressure | Above 10% sustained |
| Journal sync latency | Leading indicator of storage degradation | Above 30 ms sustained |
| Checkpoint duration | Whether dirty pages can flush fast enough | Above 30 seconds or approaching the 60-second interval |
| currentOp max age | Catches runaway queries before they exhaust tickets | Above 60 seconds in OLTP workloads |
| Connection count and totalCreated delta | Reveals concurrency spikes and churn | Current above 80% of max, or high totalCreated rate |
| Lock timeAcquiringMicros | Shows contention keeping tickets held | Growing deltas relative to opLatencies totals |

Fixes

Fix slow storage

If journal sync latency is elevated or checkpoint duration is climbing, the disk subsystem cannot keep up. On cloud block storage, burst credit exhaustion is a common cause. Look for await above 10 ms or %util approaching 100% in iostat. Provision more IOPS or throughput, or move the primary to a node with healthier storage. Do not raise the ticket limit. If a single node is degraded, step down the primary to shift writes to a healthier member.

Fix cache pressure

Kill unnecessary long-running operations with db.killOp(opid). Warning: the operation may not release tickets immediately, and killing large operations can briefly increase load.

Reduce write throughput by pausing batch jobs or throttling ingestion. If the cache is genuinely undersized for the working set, increase it via storage.wiredTiger.engineConfig.cacheSizeGB and restart the node. See MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches and MongoDB cache too small: sizing the WiredTiger cache for your working set.

Fix long-running queries

If currentOp reveals a collection scan, an unindexed sort, or an unexpected aggregation, terminate it with db.killOp(opid) and fix the query plan. Add missing indexes, add query timeouts at the application level, or rewrite overly broad aggregations. One bad query can hold tickets long enough to degrade the whole node.
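
To decide what to terminate, it helps to rank active operations by age and see their plan summaries side by side. The following is a minimal sketch over the inprog array returned by db.currentOp(); longRunningOps is a hypothetical helper name, and the 60-second threshold in the example is simply the figure used elsewhere in this article.

```javascript
// Return active operations at or above minSecs, oldest first,
// keeping only the fields needed to decide whether to kill them.
function longRunningOps(inprog, minSecs) {
  return inprog
    .filter((op) => op.active && op.secs_running >= minSecs)
    .sort((a, b) => b.secs_running - a.secs_running)
    .map((op) => ({
      opid: op.opid,
      ns: op.ns,
      secs: op.secs_running,
      planSummary: op.planSummary, // e.g. COLLSCAN points at a missing index
    }));
}

// In mongosh:
//   longRunningOps(db.currentOp({ active: true }).inprog, 60).forEach(printjson);
//   db.killOp(<opid>) for the entries you decide to terminate.
```

On the application side, a per-query limit such as cursor.maxTimeMS() caps how long a single query can hold a ticket, which limits the damage of the next bad plan.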

Fix connection storms

Identify the heaviest client sources via db.currentOp() grouped by client. If connections are overwhelming the node, block new sources at the firewall or load balancer to stop the spiral. Restarting with a lower net.maxIncomingConnections can cap the ceiling but requires a planned outage. Then fix the trigger: stabilise the replica set primary, resolve network blips, or fix DNS. See MongoDB connection storm spiral: reconnection floods after an election or deploy.
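
Grouping by client can be done directly on the currentOp output. The sketch below assumes each entry carries a client field of the form host:port and that entries without one are internal threads; countByClient is a hypothetical helper name.

```javascript
// Count connections per client host, heaviest first.
function countByClient(inprog) {
  const counts = {};
  for (const op of inprog) {
    if (!op.client) continue; // skip entries with no client address
    const host = op.client.replace(/:\d+$/, ""); // drop the ephemeral port
    counts[host] = (counts[host] || 0) + 1;
  }
  return Object.entries(counts).sort((a, b) => b[1] - a[1]);
}

// In mongosh (passing true includes idle connections):
//   countByClient(db.currentOp(true).inprog).slice(0, 10).forEach(function (e) {
//     print(e[0] + " " + e[1]);
//   });
```

Stripping the port matters because every reconnect uses a new ephemeral port, which would otherwise make each connection look like a distinct client.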

Do not disable dynamic tickets without support

On MongoDB 7.0+, attempting to set wiredTigerConcurrentReadTransactions at runtime may return an error when dynamic adjustment is active. Reverting to static 128 tickets requires setting wiredTigerConcurrentReadTransactions: 128 and wiredTigerConcurrentWriteTransactions: 128 in the configuration file and restarting. The 7.0 documentation lists these parameters under the names storageEngineConcurrentReadTransactions and storageEngineConcurrentWriteTransactions, so check which names your release accepts. Engage MongoDB Technical Support before making this change. In most cases, the dynamic algorithm is not the problem.

Prevention

  • Graph available tickets continuously and alert when they drop below 25% of the current totalTickets. Do not wait for zero.
  • Correlate ticket graphs with journal sync latency, cache dirty ratio, and checkpoint duration so you can distinguish storage problems from query problems before queues form.
  • Track the oldest running operation from currentOp as a continuous metric. Runaway queries are easier to kill at 30 seconds than at 5 minutes.
  • Monitor connection churn (totalCreated deltas), not just connection count. Reconnection storms often precede ticket exhaustion by minutes.
  • If you run MongoDB 7.0+, remember that totalTickets will appear low under light load. Monitor queue depth and latency instead of fixating on the absolute ticket count. Use queues.execution in 8.0+ to watch queueLength and totalTimeQueuedMicros.
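
Connection churn is only visible as a rate, because totalCreated is a cumulative counter. As an illustration, the helper below turns two samples of serverStatus().connections into a per-second creation rate; churnPerSecond is a hypothetical name, and the sampling interval is whatever your collector uses.

```javascript
// Hypothetical helper: normalise counters that mongosh may return as Long objects.
function toNum(v) {
  if (typeof v === "number") return v;
  return v && typeof v.toNumber === "function" ? v.toNumber() : Number(v);
}

// Connection creation rate and net change in open connections between two samples.
function churnPerSecond(prevConn, currConn, intervalSecs) {
  const created = toNum(currConn.totalCreated) - toNum(prevConn.totalCreated);
  return {
    createdPerSec: created / intervalSecs,
    currentDelta: toNum(currConn.current) - toNum(prevConn.current),
  };
}

// In mongosh: sample db.serverStatus().connections twice, 60 seconds apart,
// and pass both samples with intervalSecs = 60.
```

A high createdPerSec with a currentDelta near zero is the churn pattern described above: clients are reconnecting constantly while the open-connection count looks stable.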

How Netdata helps

Netdata correlates available read/write tickets, cache dirty ratio, journal sync latency, checkpoint duration, and opLatencies on the same timeline. This shows whether ticket exhaustion follows storage degradation, cache pressure, or a query spike. It computes deltas for cumulative counters like totalCreated and lock wait times, surfacing connection churn and growing contention automatically. Alerting paths cover both pre-8.0 wiredTiger.concurrentTransactions and 8.0+ queues.execution metrics, and highlight currentOp age alongside queue depth to help distinguish a single bad query from a systemic bottleneck.