MongoDB ticket exhaustion: WiredTiger read/write tickets and queued operations

Your application times out while the OS shows idle CPU and disk utilisation looks survivable. The MongoDB log shows no obvious errors, yet operations stall. The likely cause is WiredTiger ticket exhaustion: the storage engine has run out of read or write concurrency tokens, and new work queues behind slow operations. This guide shows how to confirm ticket starvation, find the root cause, and fix it without raising the ticket limit.

What this means

WiredTiger uses ticket-based admission control. Every operation that touches the storage engine must acquire a read or write ticket before it proceeds. In MongoDB 6.x and earlier, the default is 128 read and 128 write tickets per node. MongoDB 7.0 introduced a dynamic throughputProbing algorithm that adjusts the active ceiling downward from 128 under light load, scaling up under demand but never exceeding 128. In MongoDB 8.0+, the metrics moved from wiredTiger.concurrentTransactions to queues.execution, adding queue-length and timing fields that help distinguish true congestion from a low adaptive baseline.

When all tickets are in use, new operations queue. Ticket exhaustion is a symptom, not the disease. The root cause is almost always that operations hold tickets too long because of slow disk I/O, cache pressure, lock contention, or long-running queries. Raising the ticket limit is almost never the correct fix. It simply allows more operations to enter the storage engine and compound the bottleneck.
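A minimal sketch of how the pool accounting described above can be read from a `serverStatus` document. The helper name `ticketUsage` is hypothetical, and the layout of `queues.execution` is an assumption: it is taken to expose the same `available` and `totalTickets` fields as `wiredTiger.concurrentTransactions`, which you should confirm on your own server.

```javascript
// Hypothetical helper: summarise ticket usage from a serverStatus document.
// Assumption: queues.execution (8.0+) carries the same available/totalTickets
// fields as wiredTiger.concurrentTransactions (7.x and earlier).
function ticketUsage(status) {
  const exec = status.queues && status.queues.execution;
  const legacy = status.wiredTiger && status.wiredTiger.concurrentTransactions;
  const pools = exec && exec.read ? exec : legacy;
  if (!pools || !pools.read || !pools.write) return null; // no known path found

  const summarise = function (pool) {
    const total = Number(pool.totalTickets);
    const available = Number(pool.available);
    return {
      available: available,
      total: total,
      // Share of the pool currently held by running operations.
      usedPct: total > 0 ? (100 * (total - available)) / total : 0,
    };
  };
  return { read: summarise(pools.read), write: summarise(pools.write) };
}

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  console.log(JSON.stringify(ticketUsage(db.serverStatus())));
}
```

A `usedPct` near 100 on either pool means new operations of that type are about to queue, or already are.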

Flow control on replica set primaries uses a separate ticket pool to throttle writes and manage replication lag. Flow control starvation produces different symptoms and is governed by flowControlTargetLagSeconds. This article focuses on WiredTiger execution tickets.

flowchart TD
    A[Tickets below 25%] --> B{Long ops in currentOp?}
    B -->|Yes| C[Kill op or fix query plan]
    B -->|No| D{Journal sync above 30ms?}
    D -->|Yes| E[Storage bottleneck]
    D -->|No| F{Dirty ratio above 10%?}
    F -->|Yes| G[Cache pressure cascade]
    F -->|No| H{Connection churn?}
    H -->|Yes| I[Connection storm]
    H -->|No| J[Adaptive floor post-7.0]
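The decision order in the flowchart can be sketched as a small function. This is an illustration only: the function name and the input field names are hypothetical, the thresholds are the ones shown in the chart, and the first branch (tickets not low) is an added guard that the chart does not draw.

```javascript
// Hypothetical triage helper that mirrors the flowchart's decision order.
// `m` holds values you have already measured:
//   availablePct    - available tickets as a percentage of totalTickets
//   longOps         - count of long-running operations seen in currentOp
//   journalSyncMs   - average journal sync latency in milliseconds
//   dirtyPct        - WiredTiger cache dirty ratio in percent
//   connectionChurn - true if connections are being created rapidly
function triageTicketExhaustion(m) {
  if (m.availablePct >= 25) return "Tickets not low"; // guard added here
  if (m.longOps > 0) return "Kill op or fix query plan";
  if (m.journalSyncMs > 30) return "Storage bottleneck";
  if (m.dirtyPct > 10) return "Cache pressure cascade";
  if (m.connectionChurn) return "Connection storm";
  return "Adaptive floor post-7.0";
}

console.log(
  triageTicketExhaustion({
    availablePct: 12,
    longOps: 0,
    journalSyncMs: 45,
    dirtyPct: 4,
    connectionChurn: false,
  })
); // prints "Storage bottleneck"
```

The checks run in the same order as the chart, so a long-running operation is reported before storage or cache pressure even when several conditions hold at once.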

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Slow disk I/O or storage degradation | Available tickets near zero, journal sync latency sustained above 30 ms, checkpoint duration climbing | OS disk latency (`iostat -x 1`) and `wiredTiger.log` sync averages |
| WiredTiger cache pressure | Cache dirty ratio above 10%, application-thread evictions incrementing, ticket availability dropping during peaks | `wiredTiger.cache` dirty ratio and `pages evicted by application threads` |
| Long-running operations holding tickets | Few connections but low ticket availability, `currentOp` shows operations running longer than 60 seconds | `db.currentOp({ active: true, secs_running: { $gt: 10 } })` |
| Connection storm or concurrency spike | Connection count spiking rapidly, `totalCreated` delta high, ticket exhaustion follows a trigger event | `serverStatus().connections` and client source in `currentOp` |
| Dynamic algorithm baseline after 7.0 upgrade | `totalTickets` is 7-15 under light load, but queue length is zero and latency is normal | `queues.execution` queue length and `totalTimeQueuedMicros` (8.0+) or correlate with `opLatencies` |

Quick checks

// Ticket availability (MongoDB <= 7.x)
var t = db.serverStatus().wiredTiger.concurrentTransactions;
print("Read: " + t.read.available + "/" + t.read.totalTickets);
print("Write: " + t.write.available + "/" + t.write.totalTickets);

// Ticket availability (MongoDB 8.0+)
var q = db.serverStatus().queues.execution;
print("Read: " + q.read.available + "/" + q.read.totalTickets);
print("Write: " + q.write.available + "/" + q.write.totalTickets);
// Note: confirm the exact field names under queues.execution on your version before relying on these paths.

// Long-running operations
db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});

// Queue depths
db.serverStatus().globalLock.currentQueue

// Cache pressure
var c = db.serverStatus().wiredTiger.cache;
var max = c["maximum bytes configured"];
print("Dirty: " + (100 * c["tracked dirty bytes in the cache"] / max).toFixed(1) + "%");
print("App evictions: " + c["pages evicted by application threads"]);

// Journal sync latency
var wt = db.serverStatus().wiredTiger.log;
var syncOps = wt["log sync operations"];
var syncTime = wt["log sync time duration (usecs)"];
print("Avg sync µs: " + (syncOps > 0 ? (syncTime / syncOps).toFixed(0) : "N/A"));

// Connection churn
var conn = db.serverStatus().connections;
print("Current: " + conn.current + ", Available: " + conn.available + ", TotalCreated: " + conn.totalCreated);
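The journal sync counters and `totalCreated` are cumulative, so a single reading averages over the whole life of the process and can hide a recent change. The sketch below compares two `serverStatus` samples instead. The helper name `intervalRates` and the 10-second interval are assumptions for illustration.

```javascript
// Hypothetical helper: compare two serverStatus samples taken `seconds` apart
// and report what happened during that interval only.
function intervalRates(before, after, seconds) {
  const logBefore = before.wiredTiger.log;
  const logAfter = after.wiredTiger.log;
  const syncOps =
    Number(logAfter["log sync operations"]) - Number(logBefore["log sync operations"]);
  const syncUsecs =
    Number(logAfter["log sync time duration (usecs)"]) -
    Number(logBefore["log sync time duration (usecs)"]);
  const created =
    Number(after.connections.totalCreated) - Number(before.connections.totalCreated);
  return {
    // Average journal sync latency in milliseconds, or null if no syncs ran.
    avgSyncMs: syncOps > 0 ? syncUsecs / syncOps / 1000 : null,
    // New connections opened per second during the interval.
    newConnectionsPerSec: created / seconds,
  };
}

// Only runs inside mongosh, where `db` and `sleep` are defined.
if (typeof db !== "undefined") {
  const first = db.serverStatus();
  sleep(10000); // wait 10 seconds between samples
  console.log(JSON.stringify(intervalRates(first, db.serverStatus(), 10)));
}
```

An `avgSyncMs` above 30 across several consecutive intervals matches the "sustained" condition used throughout this guide more closely than the lifetime average does.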

How to diagnose it

1. Confirm the direction of exhaustion. Use the version-appropriate serverStatus path to read available read and write tickets. If both are low, suspect storage or cache pressure. If only writes are low, suspect checkpoint stall, journal latency, or write-heavy lock contention.
2. Check for version-specific behavior. On MongoDB 7.0+, the dynamic algorithm intentionally keeps totalTickets low under light load. Low totalTickets alone is not an error. In 8.0+, inspect queueLength and totalTimeQueuedMicros under queues.execution. If those are zero, the system is not queuing and you are seeing normal adaptive behavior.
3. Inspect db.currentOp() for operations running longer than 60 seconds. A single large aggregation or collection scan can hold tickets for minutes and starve the pool. Note the opid, ns, and secs_running.
4. Measure storage health. Compute average journal sync latency from wiredTiger.log. If it exceeds 30 ms sustained, disk I/O is the likely root cause. Check OS-level disk metrics with iostat -x 1 for elevated %util or await.
5. Evaluate WiredTiger cache pressure. If the dirty ratio exceeds 10% or pages evicted by application threads is incrementing, the cache is forcing operations to do eviction work while holding tickets. This creates a feedback loop where slower operations consume more tickets for longer.
6. Look at connection patterns. A spike in totalCreated with stable current indicates connection churn. A spike in current after an election or deploy suggests a connection storm. Each new connection adds a thread that competes for tickets.
7. Review lock wait times in serverStatus().locks. Growing timeAcquiringMicros for Collection or Global locks points to contention that keeps tickets held.
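The first diagnostic step, working out the direction of exhaustion, can be sketched as follows. The function name is hypothetical, the 25% default reuses the warning level from this guide, and the reads-only branch is an assumption because the step above does not describe that case.

```javascript
// Hypothetical helper for the first diagnostic step.
// `read` and `write` are pool documents with `available` and `totalTickets`.
// `lowPct` is the percentage below which a pool counts as low (default 25).
function exhaustionDirection(read, write, lowPct) {
  const limit = lowPct === undefined ? 25 : lowPct;
  const pct = function (pool) {
    const total = Number(pool.totalTickets);
    return total > 0 ? (100 * Number(pool.available)) / total : 100;
  };
  const readLow = pct(read) < limit;
  const writeLow = pct(write) < limit;
  if (readLow && writeLow) return "both low: suspect storage or cache pressure";
  if (writeLow) {
    return "writes low: suspect checkpoint stall, journal latency, or write lock contention";
  }
  // Assumption: the guide gives no hint for this case; long reads are a guess.
  if (readLow) return "reads low: check currentOp for long-running reads";
  return "no exhaustion";
}
```

In mongosh you could pass the `read` and `write` subdocuments from whichever `serverStatus` path your version exposes.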

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Available read/write tickets | Headroom before queuing | Below 25% of total, or below 10 absolute |
| `globalLock.currentQueue` | Direct indicator of queued operations | Sustained total above 20 or growing trend |
| `opLatencies` reads/writes | User-visible latency from ticket waits | Average sustained above 2x baseline |
| WiredTiger cache dirty ratio | Predicts checkpoint stall and eviction pressure | Above 10% sustained |
| Journal sync latency | Leading indicator of storage degradation | Above 30 ms sustained |
| Checkpoint duration | Whether dirty pages can flush fast enough | Above 30 seconds or approaching the 60-second interval |
| `currentOp` max age | Catches runaway queries before they exhaust tickets | Above 60 seconds in OLTP workloads |
| Connection count and `totalCreated` delta | Reveals concurrency spikes and churn | Current above 80% of max, or high `totalCreated` rate |
| Lock `timeAcquiringMicros` | Shows contention keeping tickets held | Growing deltas relative to `opLatencies` totals |
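As a rough illustration, the numeric warning levels from the table can be applied to one snapshot of already-collected values. The function and field names are hypothetical. The sketch checks a single sample, so it does not capture the "sustained" or "growing trend" conditions, and it leaves out the `totalCreated` rate and lock wait deltas, which have no fixed threshold in the table.

```javascript
// Hypothetical checker: returns the names of signals past their warning level.
// `s` is a plain object of values you have already collected.
function warningSignals(s) {
  const warnings = [];
  if (s.ticketsAvailable < 0.25 * s.ticketsTotal || s.ticketsAvailable < 10) {
    warnings.push("tickets");
  }
  if (s.queuedOps > 20) warnings.push("queue");
  if (s.avgLatencyMs > 2 * s.baselineLatencyMs) warnings.push("latency");
  if (s.dirtyPct > 10) warnings.push("cache dirty ratio");
  if (s.journalSyncMs > 30) warnings.push("journal sync");
  if (s.checkpointSecs > 30) warnings.push("checkpoint");
  if (s.oldestOpSecs > 60) warnings.push("currentOp age");
  if (s.connCurrent > 0.8 * s.connMax) warnings.push("connections");
  return warnings;
}

console.log(
  warningSignals({
    ticketsAvailable: 20, ticketsTotal: 128, queuedOps: 5,
    avgLatencyMs: 10, baselineLatencyMs: 8, dirtyPct: 12,
    journalSyncMs: 5, checkpointSecs: 10, oldestOpSecs: 5,
    connCurrent: 100, connMax: 1000,
  })
); // prints [ 'tickets', 'cache dirty ratio' ]
```

On MongoDB 7.0+ the absolute "below 10" rule can fire under light load because `totalTickets` itself may be small, so read the ticket warning together with queue depth and latency.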

Fixes

Fix slow storage

If journal sync latency is elevated or checkpoint duration is climbing, the disk subsystem cannot keep up. On cloud block storage, burst credit exhaustion is a common cause. Look for await above 10 ms or %util approaching 100% in iostat. Provision more IOPS or throughput, or move the primary to a node with healthier storage. Do not raise the ticket limit. If a single node is degraded, step down the primary to shift writes to a healthier member.

Fix cache pressure

Kill unnecessary long-running operations with db.killOp(opid). Warning: the operation may not release tickets immediately, and killing large operations can briefly increase load.

Reduce write throughput by pausing batch jobs or throttling ingestion. If the cache is genuinely undersized for the working set, increase it via storage.wiredTiger.engineConfig.cacheSizeGB and restart the node. See the related guides "MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches" and "MongoDB cache too small: sizing the WiredTiger cache for your working set".

Fix long-running queries

If currentOp reveals a collection scan, an unindexed sort, or an unexpected aggregation, terminate it with db.killOp(opid) and fix the query plan. Add missing indexes, add query timeouts at the application level, or rewrite overly broad aggregations. One bad query can hold tickets long enough to degrade the whole node.
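A minimal sketch for listing kill candidates before acting on them. The helper name `killCandidates` and the 60-second cut-off are assumptions for illustration. The sketch only lists operations and leaves the `db.killOp(opid)` call to a human.

```javascript
// Hypothetical helper: pick active operations older than `minSecs`, oldest first.
// `inprog` is the array found in db.currentOp(...).inprog.
function killCandidates(inprog, minSecs) {
  return inprog
    .filter(function (op) {
      return op.active && Number(op.secs_running) > minSecs;
    })
    .sort(function (a, b) {
      return Number(b.secs_running) - Number(a.secs_running);
    })
    .map(function (op) {
      return { opid: op.opid, ns: op.ns, secs: Number(op.secs_running) };
    });
}

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  // Review the list first; run db.killOp(opid) manually for each chosen entry.
  killCandidates(db.currentOp({ active: true }).inprog, 60).forEach(function (c) {
    console.log(c.opid + " | " + c.ns + " | " + c.secs + "s");
  });
}
```

For the application-level timeouts mentioned above, one option is the `maxTimeMS` cursor setting, for example `db.orders.find({ status: "open" }).maxTimeMS(5000)`, where the `orders` collection and the filter are made up for this example.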

Fix connection storms

Identify the heaviest client sources via db.currentOp() grouped by client. If connections are overwhelming the node, block new sources at the firewall or load balancer to stop the spiral. Restarting with a lower net.maxIncomingConnections can cap the ceiling but requires a planned outage. Then fix the trigger: stabilise the replica set primary, resolve network blips, or fix DNS. See the related guide "MongoDB connection storm spiral: reconnection floods after an election or deploy".
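Grouping `currentOp` output by client can be sketched like this. The helper name `opsByClient` is hypothetical, and the sketch assumes each entry may carry a `client` string of the form `host:port`. Entries without that field are skipped.

```javascript
// Hypothetical helper: count currentOp entries per client host, busiest first.
function opsByClient(inprog) {
  const counts = {};
  inprog.forEach(function (op) {
    if (!op.client) return; // skip entries without a client address
    const colon = op.client.lastIndexOf(":");
    // Drop the port so all connections from one host are counted together.
    const host = colon > 0 ? op.client.slice(0, colon) : op.client;
    counts[host] = (counts[host] || 0) + 1;
  });
  return Object.keys(counts)
    .map(function (host) {
      return { client: host, ops: counts[host] };
    })
    .sort(function (a, b) {
      return b.ops - a.ops;
    });
}

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  // Passing true also includes idle connections and system operations.
  opsByClient(db.currentOp(true).inprog).slice(0, 10).forEach(function (row) {
    console.log(row.client + " | " + row.ops);
  });
}
```

The top few rows are the sources to consider blocking or throttling first.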

Do not disable dynamic tickets without support

On MongoDB 7.0+, attempting to set wiredTigerConcurrentReadTransactions at runtime may return an error when dynamic adjustment is active. Reverting to static 128 tickets requires setting wiredTigerConcurrentReadTransactions: 128 and wiredTigerConcurrentWriteTransactions: 128 under setParameter in the configuration file and restarting. The MongoDB 7.0 documentation lists these parameters as storageEngineConcurrentReadTransactions and storageEngineConcurrentWriteTransactions, so check which names your version accepts. Engage MongoDB Technical Support before making this change. In most cases, the dynamic algorithm is not the problem.

Prevention

- Graph available tickets continuously and alert when they drop below 25% of the current totalTickets. Do not wait for zero.
- Correlate ticket graphs with journal sync latency, cache dirty ratio, and checkpoint duration so you can distinguish storage problems from query problems before queues form.
- Track the oldest running operation from currentOp as a continuous metric. Runaway queries are easier to kill at 30 seconds than at 5 minutes.
- Monitor connection churn (totalCreated deltas), not just connection count. Reconnection storms often precede ticket exhaustion by minutes.
- If you run MongoDB 7.0+, remember that totalTickets will appear low under light load. Monitor queue depth and latency instead of fixating on the absolute ticket count. Use queues.execution in 8.0+ to watch queueLength and totalTimeQueuedMicros.
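One way to combine the ticket and queue advice above into an alert rule is sketched below. The function name and the "ok", "warn", and "critical" levels are assumptions for illustration and not part of any MongoDB or Netdata API.

```javascript
// Hypothetical alert rule: headroom is measured against the current
// totalTickets, and queuing raises the severity.
// `pool` is a ticket pool document; `queuedOps` is a queue depth such as
// globalLock.currentQueue.total.
function ticketAlertLevel(pool, queuedOps) {
  const total = Number(pool.totalTickets);
  const lowHeadroom = total > 0 && Number(pool.available) < 0.25 * total;
  if (!lowHeadroom) return "ok";
  // Low headroom alone warns early; low headroom plus queuing is urgent.
  return Number(queuedOps) > 0 ? "critical" : "warn";
}

console.log(ticketAlertLevel({ available: 2, totalTickets: 12 }, 0)); // prints "warn"
console.log(ticketAlertLevel({ available: 2, totalTickets: 12 }, 4)); // prints "critical"
```

Because the threshold is relative to the current `totalTickets`, a small adaptive pool on 7.0+ does not trigger the rule just by being small.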

How Netdata helps

Netdata correlates available read/write tickets, cache dirty ratio, journal sync latency, checkpoint duration, and opLatencies on the same timeline. This shows whether ticket exhaustion follows storage degradation, cache pressure, or a query spike. It computes deltas for cumulative counters like totalCreated and lock wait times, surfacing connection churn and growing contention automatically. Alerting paths cover both pre-8.0 wiredTiger.concurrentTransactions and 8.0+ queues.execution metrics, and highlight currentOp age alongside queue depth to help distinguish a single bad query from a systemic bottleneck.
