MongoDB disk I/O saturation: correlating iostat with WiredTiger signals
Suppose opLatencies.writes is climbing, globalLock.currentQueue is growing, and db.serverStatus().wiredTiger.transaction shows that the most recent checkpoint took 45 seconds. WiredTiger metrics tell you what is hurting, but not why. The next question is whether the disk is actually saturated.
Disk I/O saturation surfaces as:

- climbing journal sync latency
- checkpoint duration exceeding the 60-second interval
- application-thread evictions
- ticket exhaustion

The most reliable way to separate a storage problem from a query problem is to correlate OS-level disk signals (iostat -x) with WiredTiger internal signals over the same time window. This guide shows how to do that safely during an incident.
What this means
By default, WiredTiger checkpoints run every 60 seconds and flush dirty cache pages to disk, and the journal syncs every 100ms. Both are recurring write commitments that depend on stable, low-latency storage.

When the underlying disk cannot absorb the write rate:

- await rises
- operations hold write tickets longer
- the cache dirty ratio grows, because pages cannot be flushed fast enough

If a checkpoint is still running when the next one is due, the system can enter a stall cycle that freezes all writes.
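One way to see whether checkpoints are eating the whole interval is to measure what fraction of a sampling window WiredTiger spent checkpointing. This sketch makes two assumptions:

- You take two `db.serverStatus()` snapshots a known number of milliseconds apart.
- `wiredTiger.transaction` exposes the cumulative counter `transaction checkpoint total time (msecs)`. Treat that stat name as an assumption, because names can differ between MongoDB versions.

The helper `checkpointBusyFraction` is hypothetical and plain JavaScript, so it runs in mongosh or Node.

```javascript
// Hypothetical helper: fraction of a window spent in completed checkpoints.
// prev and curr are db.serverStatus() results taken elapsedMs apart.
// Assumes this cumulative stat exists under wiredTiger.transaction on your
// MongoDB version; it may be named differently elsewhere.
const CKPT_TOTAL_MS = "transaction checkpoint total time (msecs)";

function checkpointBusyFraction(prev, curr, elapsedMs) {
  const spentMs =
    Number(curr.wiredTiger.transaction[CKPT_TOTAL_MS]) -
    Number(prev.wiredTiger.transaction[CKPT_TOTAL_MS]);
  // Clamp to [0, 1]: a checkpoint that finishes inside the window may have
  // started before it, so the raw ratio can overshoot.
  return Math.min(1, Math.max(0, spentMs / elapsedMs));
}

// Usage in mongosh (a 5-minute window so several checkpoints fall inside it):
//   const a = db.serverStatus();
//   sleep(300000);
//   const b = db.serverStatus();
//   print(checkpointBusyFraction(a, b, 300000));
```

A value near 1.0 suggests the node is checkpointing almost continuously. Treat low values cautiously during an active stall: depending on when the counter is updated, a checkpoint still in progress may not be counted yet.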
MongoDB signals identify the subsystem in distress; OS signals identify the resource constraint. If you only watch MongoDB, you will chase cache tuning and ticket limits while the real problem is a depleted EBS burst credit bucket or a RAID rebuild.
```mermaid
flowchart TD
A["Disk saturation<br/>%util high / await high"] --> B[Journal sync latency spikes]
A --> C[Checkpoint duration climbs]
B --> D[Write tickets held longer]
C --> D
D --> E[Available tickets drop]
E --> F[Queue depth grows]
F --> G[opLatencies spike]
C --> H[Cache dirty ratio rises]
H --> I[Application-thread evictions]
I --> G
```

Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Cloud block storage burst credit depletion (EBS gp2, GCE PD) | await jumps to tens or hundreds of milliseconds while IOPS look modest; journal sync latency spikes first | Cloud metric BurstBalance or PD throughput/throttle counters |
| Local disk or RAID degradation | Sustained high %util and await on the data device across all workloads; no burst pattern | dmesg, RAID controller status, SMART health |
| Journal and data on the same overloaded device | Checkpoint bursts and journal syncs compete; latency spikes roughly every 60 seconds as checkpoint writes delay the 100ms journal syncs | Mount points for dbPath and the journal directory (by default, journal/ under dbPath) |
| Write burst exceeding flush capacity | Dirty ratio climbs above 10-15%, app-thread evictions start, checkpoint duration grows | opcounters and metrics.document deltas |
| Concurrent maintenance, backups, or compactions | Periodic spikes aligned with cron or backup windows; I/O patterns do not match application traffic | Backup schedules, currentOp, and OS process list |
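The burst-credit cliff in the first row can be estimated ahead of time. The sketch below encodes the gp2 figures AWS publishes; check the current EBS documentation before relying on them:

- Baseline is 3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000 IOPS.
- Volumes below 1,000 GiB can burst to 3,000 IOPS.
- A volume starts with a bucket of 5.4 million I/O credits.

The function names are hypothetical.

```javascript
// Hypothetical helpers for the EBS gp2 burst-credit arithmetic.
// Figures follow AWS's published gp2 model; verify against current docs.
const GP2_BURST_IOPS = 3000;
const GP2_MAX_CREDITS = 5.4e6; // I/O credits in a full bucket

// Baseline IOPS: 3 per GiB, never below 100 or above 16,000.
function gp2BaselineIops(sizeGiB) {
  return Math.min(16000, Math.max(100, 3 * sizeGiB));
}

// Seconds a bucket holding `credits` lasts while the workload drives the
// volume at demandIops. Credits refill at the baseline rate and drain at the
// delivered rate, so the net drain is (delivered - baseline) per second.
function gp2SecondsOfBurst(sizeGiB, demandIops = GP2_BURST_IOPS, credits = GP2_MAX_CREDITS) {
  const baseline = gp2BaselineIops(sizeGiB);
  if (baseline >= GP2_BURST_IOPS) return Infinity; // 1,000 GiB and up: baseline covers bursts
  const delivered = Math.min(demandIops, GP2_BURST_IOPS);
  const netDrain = delivered - baseline;
  return netDrain > 0 ? credits / netDrain : Infinity; // below baseline: never drains
}

// A 100 GiB volume: baseline 300 IOPS, a full bucket lasts 5.4e6 / 2700 seconds.
console.log(gp2BaselineIops(100), gp2SecondsOfBurst(100)); // 300 2000
```

That is roughly 33 minutes of full burst before the volume drops to 300 IOPS. This is why the failure shows up as a cliff rather than a ramp. gp3 volumes start from a 3,000 IOPS baseline regardless of size, so this calculation does not apply to them.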
Quick checks
Run these read-only checks from the MongoDB host during the incident. Run them back to back in a single terminal session so the samples cover the same window.
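```shell
# Check that mongod is still responsive
time mongosh --quiet --eval 'db.adminCommand({ping:1})'

# Disk saturation: look at %util and await for the data and journal devices
# (newer sysstat shows r_await/w_await instead of a combined await).
# -t timestamps each report; ignore the first report, which averages since boot.
iostat -x -t 1 5

# Journal sync latency: the counters are cumulative since startup,
# so sample twice and use the deltas for a 10-second average
mongosh --quiet --eval '
var a = db.serverStatus().wiredTiger.log;
sleep(10000);
var b = db.serverStatus().wiredTiger.log;
var ops = Number(b["log sync operations"]) - Number(a["log sync operations"]);
var us = Number(b["log sync time duration (usecs)"]) - Number(a["log sync time duration (usecs)"]);
print("log sync ops in window: " + ops);
print("avg log sync (ms): " + (ops > 0 ? (us / ops / 1000).toFixed(2) : "n/a"));
'

# Most recent checkpoint duration
# (stat names can differ between versions; if a value prints as undefined,
# inspect db.serverStatus().wiredTiger for the equivalent field)
mongosh --quiet --eval '
var t = db.serverStatus().wiredTiger.transaction;
print("last checkpoint ms: " + t["transaction checkpoint most recent time (msecs)"]);
print("total checkpoints: " + t["transaction checkpoints"]);
'

# Cache fill and dirty ratios
mongosh --quiet --eval '
var c = db.serverStatus().wiredTiger.cache;
var max = c["maximum bytes configured"];
var used = c["bytes currently in the cache"];
var dirty = c["tracked dirty bytes in the cache"];
print("cache used: " + (100*used/max).toFixed(1) + "%");
print("cache dirty: " + (100*dirty/max).toFixed(1) + "%");
'

# Available tickets (newer releases may report these under a different
# serverStatus section; check the documentation for your version)
mongosh --quiet --eval '
var tr = db.serverStatus().wiredTiger.concurrentTransactions;
print("read available: " + tr.read.available + "/" + tr.read.totalTickets);
print("write available: " + tr.write.available + "/" + tr.write.totalTickets);
'

# Queue depth and operation latency
# (opLatencies latency/ops counters are cumulative since startup)
mongosh --quiet --eval '
var s = db.serverStatus();
printjson(s.globalLock.currentQueue);
printjson(s.opLatencies);
'
```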
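The `opLatencies` document in the last quick check reports `latency` (total microseconds) and `ops` as counters since mongod started. The raw values therefore mostly describe history rather than the incident.

Below is a minimal sketch of the windowed average. It assumes two snapshots of `db.serverStatus().opLatencies` taken a few seconds apart, and `windowedLatencyMs` is a hypothetical helper name. `Number()` is applied because mongosh may return these counters as 64-bit Long values.

```javascript
// Hypothetical helper: average latency per operation (ms) between two
// db.serverStatus().opLatencies snapshots. Each section holds cumulative
// `latency` (microseconds) and `ops` counters.
function windowedLatencyMs(prev, curr) {
  const result = {};
  for (const kind of ["reads", "writes", "commands", "transactions"]) {
    if (!prev[kind] || !curr[kind]) continue; // section missing on this version
    const ops = Number(curr[kind].ops) - Number(prev[kind].ops);
    const micros = Number(curr[kind].latency) - Number(prev[kind].latency);
    result[kind] = ops > 0 ? micros / ops / 1000 : null; // null: no ops in window
  }
  return result;
}

// Usage in mongosh:
//   const a = db.serverStatus().opLatencies;
//   sleep(10000);
//   const b = db.serverStatus().opLatencies;
//   printjson(windowedLatencyMs(a, b));
```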
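Interpret iostat -x with the disk type in mind:

- **Spinning disks:** %util above 70-80% indicates saturation.
- **SSD and NVMe:** %util is misleading. These devices service many requests in parallel, so they can report close to 100% while still having headroom. Trust await instead.

If await is climbing along with high %util, the workload is exceeding what the device can deliver. Newer sysstat releases drop the combined await column in favor of r_await and w_await. For write pressure from checkpoints and journal syncs, watch w_await.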
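To apply these rules of thumb to a captured `iostat -x` log, you can parse the report text directly. This sketch assumes the output was captured with `LC_ALL=C`, so decimals use a dot. It handles both the older column set (`await`, `avgqu-sz`) and the newer one (`r_await`, `w_await`, `aqu-sz`).

Two thresholds are choices, not documented limits:

- The 75% utilization cut-off is one point inside the 70-80% range given above.
- The 2x-baseline rule for SSD/NVMe await is an assumption.

All function names are hypothetical.

```javascript
// Hypothetical helpers for `LC_ALL=C iostat -x [-t] <interval> <count>` output.

// Returns one object per report, keyed by device name. The first report
// averages everything since boot, so it is dropped.
function parseIostatX(text) {
  const reports = [];
  let cols = null;
  let current = null;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("Device")) {
      cols = line.split(/\s+/);
      cols[0] = "Device"; // older sysstat prints "Device:"
      current = {};
      reports.push(current);
    } else if (line === "") {
      cols = null; // a blank line ends the device table
    } else if (cols) {
      const fields = line.split(/\s+/);
      const row = { Device: fields[0] };
      for (let i = 1; i < cols.length; i++) row[cols[i]] = Number(fields[i]);
      current[row.Device] = row;
    } // anything else: banner, timestamps, avg-cpu lines
  }
  return reports.slice(1);
}

// Combined await (ms). Newer sysstat omits it, so rebuild it from r_await and
// w_await weighted by request rate, which is how the combined value is defined.
function deviceAwaitMs(row) {
  if (Number.isFinite(row.await)) return row.await;
  const r = row["r/s"] || 0;
  const w = row["w/s"] || 0;
  if (r + w === 0) return 0;
  return (r * (row.r_await || 0) + w * (row.w_await || 0)) / (r + w);
}

// Average queue length under either column name.
function queueDepth(row) {
  return row["aqu-sz"] ?? row["avgqu-sz"];
}

// Rough check mirroring the text: on HDDs require high %util and elevated
// await; on SSD/NVMe ignore %util and compare await with a baseline.
function looksSaturated(row, { rotational, baselineAwaitMs }) {
  const awaitHigh = deviceAwaitMs(row) > 2 * baselineAwaitMs; // assumed factor
  return rotational ? row["%util"] >= 75 && awaitHigh : awaitHigh;
}
```

For example, run `LC_ALL=C iostat -x -t 1 > iostat.log` during the incident. Afterwards, pass the file contents to `parseIostatX` and check the data and journal device rows with `looksSaturated`.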
How to diagnose it
1. Confirm the MongoDB symptom. Check two values:
   - `transaction checkpoint most recent time (msecs)`
   - the average journal sync latency: the change in `log sync time duration (usecs)` divided by the change in `log sync operations` over the same interval

   A checkpoint above 30 seconds, or journal syncs averaging above 30 ms, points to storage. These are leading indicators that often appear 30-60 seconds before `opLatencies` spikes.
2. Capture OS disk signals in the same window. Run `iostat -x 1 5` and read the rows for the data and journal devices, ignoring the first report (it averages everything since boot).
   - Look at `%util`, `await` (`r_await`/`w_await` on newer sysstat), and `avgqu-sz` (`aqu-sz` on newer sysstat).
   - Sustained elevated `await` is the strongest saturation signal.
   - One-second sampling averages away sub-second micro-bursts. Keep the capture running for the whole spike (for example, `iostat -x -t 1` with no count) rather than relying on one five-second sample.
3. Correlate direction and timing. If `await` rises at the same time as journal sync latency, the disk is the bottleneck. If `await` is flat while MongoDB latency rises, look for lock contention, ticket exhaustion from a long-running operation, or a cache pressure cascade.
4. Check the dirty ratio and eviction. If the dirty ratio is above 15-20% and `pages evicted by application threads` is increasing, the disk cannot flush dirty pages fast enough. This confirms that storage throughput is the constraint, not just latency.
5. Check ticket availability. If available write tickets drop below 25% of total or below 10 absolute, operations are queuing because they hold tickets too long. Ticket exhaustion is usually a symptom of slow disk I/O, not a root cause.
6. Rule out cloud burst depletion. On AWS, gp2 volumes smaller than 1,000 GiB can burst to 3,000 IOPS. Once credits run out, they sustain only their baseline (3 IOPS per GiB, minimum 100).
   - The inflection looks like a sudden latency cliff, not a gradual ramp.
   - Verify `BurstBalance` in CloudWatch, or the equivalent PD throttling metrics in GCP.
   - Moving to gp3 removes the burst credit system.
7. Identify competing consumers. Check `currentOp` for long-running operations, backup jobs, index builds, or large aggregations running on the primary. If a backup or `mongodump` is saturating the disk, reschedule it to a hidden secondary.
8. Decide whether to step down or throttle. If the primary's storage is degraded and another member has healthy disk metrics, a controlled `rs.stepDown()` can move writes off the bad node. Do not kill a running checkpoint; let it complete.
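You can make step 3 more concrete by correlating two per-second series from the same window. For example, use disk await from `iostat -x -t 1` and the average journal sync latency computed from `serverStatus` deltas.

The guide does not prescribe a statistic. Pearson correlation is swapped in here as one simple choice, and the lagged variant is a rough way to check whether one series tends to move first. Both function names are hypothetical. A short window dominated by a single spike will give misleading values.

```javascript
// Hypothetical helpers: Pearson correlation between two equally spaced series
// covering the same window (e.g. per-second await and journal sync latency).
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two series of equal length (at least 2 points)");
  }
  const n = xs.length;
  const mean = (a) => a.reduce((sum, v) => sum + v, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  // A flat series has no variance; report 0 rather than dividing by zero.
  return sxx === 0 || syy === 0 ? 0 : sxy / Math.sqrt(sxx * syy);
}

// Pairs xs[t] with ys[t + lag]. If the correlation is clearly higher at a
// positive lag than at lag 0, xs tends to move before ys.
function laggedPearson(xs, ys, lag) {
  if (lag < 0 || lag >= xs.length - 1) throw new Error("lag out of range");
  return pearson(xs.slice(0, xs.length - lag), ys.slice(lag));
}
```

A strong positive value at lag 0 between await and journal sync latency supports the storage explanation. Comparing journal sync latency against windowed `opLatencies` at a few positive lags is one way to check the leading-indicator claim on your own data.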
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| iostat -x %util / await on data and journal devices | Direct measure of disk saturation and latency | await climbing with %util sustained above 70-80% on HDD, or await climbing on SSD/NVMe |
| WiredTiger journal sync latency (change in log sync time duration (usecs) / change in log sync operations) | Journal sync is the highest-frequency storage commitment; spikes here often precede application latency by 30-60 seconds | Average above 30ms sustained, or a sharp jump above your baseline |
| WiredTiger checkpoint duration (transaction checkpoint most recent time (msecs)) | Checkpoints flush dirty pages; if they exceed the 60-second interval, stalls follow | Above 30 seconds sustained, or above 60 seconds critical |
| WiredTiger cache dirty ratio (tracked dirty bytes in the cache / maximum bytes configured) | Leading indicator of flush backlog; rises when storage cannot absorb writes | Sustained above 10-15%, critical above 20% |
| Application-thread evictions (pages evicted by application threads) | Confirms cache pressure is translating into user-visible latency | Any sustained nonzero rate after warmup |
| WiredTiger ticket availability (wiredTiger.concurrentTransactions) | Storage engine admission control; low availability means operations are queuing | Available write tickets below 25% of total, or below 10 absolute |
| opLatencies writes and reads | User-visible latency; confirms impact | Average latency (change in latency / change in ops) or tail latency (requires opLatencies histograms) doubling from baseline for more than 5 minutes |
| globalLock.currentQueue | Aggregate queue depth; grows when throughput cannot keep up | Sustained above 20 and increasing |
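The warning signs in this table can be checked together for one sampling window. This is a sketch with several assumptions:

- `storageWarnings` is a hypothetical helper.
- The dirty-ratio cut-offs (15% warning, 20% critical) are picked from the table's ranges.
- "Sustained" still has to be judged by repeating the check over several windows.
- Stat names are the ones used throughout this guide. A missing field (for example, ticket counters on releases that report them elsewhere) evaluates to NaN and never triggers a warning.

```javascript
// Hypothetical helper: evaluate this table's warning signs for one window.
// prev and curr are two db.serverStatus() results taken some seconds apart.
const num = (v) => (v === undefined || v === null ? NaN : Number(v));

function storageWarnings(prev, curr) {
  const wt = curr.wiredTiger;
  const pwt = prev.wiredTiger;
  const delta = (section, key) => num(wt[section][key]) - num(pwt[section][key]);

  const ckptMs = num(wt.transaction["transaction checkpoint most recent time (msecs)"]);
  const syncOps = delta("log", "log sync operations");
  const journalAvgMs =
    syncOps > 0 ? delta("log", "log sync time duration (usecs)") / syncOps / 1000 : NaN;
  const dirtyPct =
    (100 * num(wt.cache["tracked dirty bytes in the cache"])) /
    num(wt.cache["maximum bytes configured"]);
  const appEvictions = delta("cache", "pages evicted by application threads");
  const writeTickets = wt.concurrentTransactions?.write ?? {};
  const avail = num(writeTickets.available);
  const total = num(writeTickets.totalTickets);
  const queued = num(curr.globalLock?.currentQueue?.total);

  // Comparisons against NaN are false, so absent stats never raise a warning.
  const warnings = [];
  if (ckptMs > 60000) warnings.push("checkpoint above 60 s (critical)");
  else if (ckptMs > 30000) warnings.push("checkpoint above 30 s");
  if (journalAvgMs > 30) warnings.push("journal sync average above 30 ms");
  if (dirtyPct > 20) warnings.push("dirty ratio above 20% (critical)");
  else if (dirtyPct > 15) warnings.push("dirty ratio above 15%");
  if (appEvictions > 0) warnings.push("application-thread evictions in window");
  if (avail < 10 || avail < 0.25 * total) warnings.push("available write tickets low");
  if (queued > 20) warnings.push("queue depth above 20");
  return { ckptMs, journalAvgMs, dirtyPct, appEvictions, avail, total, queued, warnings };
}

// Usage in mongosh:
//   const a = db.serverStatus(); sleep(10000); const b = db.serverStatus();
//   printjson(storageWarnings(a, b).warnings);
```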
Fixes
Storage is actually saturated
- Cloud volumes: Resize the volume or migrate to a type without burst limits (AWS gp3, Azure Premium SSD v2, or provisioned IOPS). This is the most common fix for EBS gp2 burst-credit cliffs.
- Local disks: Replace a degraded drive or expand the RAID array. If a RAID rebuild is running, expect temporary saturation and consider throttling writes until it completes.
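- Separate journal and data: Place the WiredTiger journal on a distinct device from the data files. Journal syncs are small sequential writes and should not compete with checkpoint write bursts. MongoDB has no separate journal path setting. The usual approach is to mount or symlink the journal directory under dbPath onto the other device while mongod is stopped.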
Workload is overwhelming flush capacity
- Throttle writes: Pause batch jobs, migrations, or bulk imports. This is often faster than any infrastructure change.
- Kill long-running operations: Use `db.currentOp()` to find operations holding tickets or snapshots, then `db.killOp(opid)` if they are not critical. Long-running transactions or `noCursorTimeout` cursors can pin cache snapshots and make eviction harder.
- Reschedule backups and maintenance: Run `mongodump`, compaction, and index builds on hidden secondaries or during low-traffic windows.
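Before reaching for `db.killOp()`, it helps to shortlist candidates by runtime. This sketch assumes the standard `db.currentOp()` result shape: an `inprog` array whose entries carry `active`, `secs_running`, `opid`, `op`, `ns`, and `desc`. The 60-second default cut-off and the helper name are hypothetical. The output is a list to review, not a kill list.

```javascript
// Hypothetical helper: rank active operations from db.currentOp() output by
// runtime so a human can decide what, if anything, to kill. It never kills.
function longRunningOps(currentOpResult, minSecs = 60) {
  return (currentOpResult.inprog || [])
    .filter((op) => op.active && Number(op.secs_running) >= minSecs)
    .map((op) => ({
      opid: op.opid,
      secs: Number(op.secs_running),
      op: op.op,
      ns: op.ns,
      desc: op.desc, // helps tell client work from internal threads
    }))
    .sort((a, b) => b.secs - a.secs);
}

// Usage in mongosh:
//   printjson(longRunningOps(db.currentOp({ active: true }), 60));
//   // After confirming an operation is not critical:
//   // db.killOp(<opid>)
```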
Configuration adjustments
- Right-size the WiredTiger cache: The default is `max(50% of (RAM - 1GB), 256MB)`. In containers, explicitly set `--wiredTigerCacheSizeGB` based on the container limit, not the host.
- Journal commit interval: `storage.journal.commitIntervalMs` defaults to 100ms. Lowering it syncs the journal more often, which shrinks the window of writes that could be lost but also increases I/O load. Only change this if you understand the tradeoff and the storage can absorb it.
- Do not increase ticket limits: Raising `wiredTigerConcurrentReadTransactions` or `wiredTigerConcurrentWriteTransactions` lets more operations into the storage engine but will not fix a disk bottleneck. It usually makes saturation worse.
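The default cache formula is easy to misapply inside containers. Below is a worked example as a small function, `defaultWiredTigerCacheGB` (a hypothetical name), that applies the formula above to whatever memory figure you pass in. Whether mongod itself sees the host RAM or the container limit depends on your version and environment, which is why the guide recommends setting the value explicitly.

```javascript
// Applies max(50% of (RAM - 1 GB), 256 MB) to a memory size given in GB.
function defaultWiredTigerCacheGB(memGB) {
  return Math.max(0.5 * (memGB - 1), 0.25);
}

// A 4 GB container on a 64 GB host:
console.log(defaultWiredTigerCacheGB(64)); // 31.5 (sized from the host)
console.log(defaultWiredTigerCacheGB(4));  // 1.5 (sized from the container limit)
```

A 31.5 GB cache cannot fit in a 4 GB container. Setting `--wiredTigerCacheSizeGB` to 1.5 or lower keeps the cache inside the limit, with room left for connections and other memory.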
Prevention
- Establish a baseline for `iostat -x` on each MongoDB host during normal peak load. Record baseline `%util`, `await`, and queue depth per device.
- Graph MongoDB storage signals (checkpoint duration, journal sync latency, dirty ratio, available tickets) on the same dashboard as OS disk metrics, over the same time window.
- Prefer cloud disk types without burst-credit mechanics for production database workloads.
- Keep journal and data on separate mount points when hardware allows.
- Size the oplog for peak write throughput and trend the oplog window over time. A shrinking window raises the risk that a lagging secondary falls off the oplog and needs a full initial sync, which adds heavy I/O.
- Review slow queries and index health regularly so a collection scan does not compound a marginal disk situation.
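A baseline is only useful if it is summarized the same way every time. One simple approach, sketched below with hypothetical helper names:

1. Collect per-second samples during normal peak load, for example `w_await` for the data device.
2. Store their nearest-rank percentiles.
3. Flag live values against the p95.

The 2x factor borrows the "doubling from baseline" rule from the opLatencies row of the metrics table. Applying it to disk metrics is an assumption, not a documented limit.

```javascript
// Hypothetical helpers: nearest-rank percentiles of baseline samples and a
// simple comparison of a live value against the stored p95.
function percentile(values, p) {
  if (!values.length) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // nearest-rank method
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

function baselineOf(samples) {
  return { p50: percentile(samples, 50), p95: percentile(samples, 95), p99: percentile(samples, 99) };
}

// Assumed rule: a live value more than `factor` times the baseline p95 is notable.
function exceedsBaseline(value, baseline, factor = 2) {
  return value > factor * baseline.p95;
}
```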
How Netdata helps
- Per-second `iostat` metrics alongside MongoDB collector data let you overlay `%util`, `await`, journal sync latency, and checkpoint duration on one chart.
- Pre-built alerts for WiredTiger cache dirty ratio, application-thread evictions, and available ticket ratio surface storage pressure before writes freeze.
- Netdata overlays `opLatencies` spikes and disk saturation windows on the same chart, so you can see whether a latency jump started with `await` or with a lock holder.
- Per-process I/O visibility helps isolate whether MongoDB or a co-located backup job is consuming disk bandwidth.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB connection churn: high totalCreated rate and thread creation overhead
- MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling
- MongoDB connection storm spiral: reconnection floods after an election or deploy
- MongoDB disk full: emergency recovery when mongod can’t write the journal
- MongoDB exceeded memory limit for $group — aggregation spills and allowDiskUse







