MongoDB disk I/O saturation: correlating iostat with WiredTiger signals

When opLatencies.writes climbs and globalLock.currentQueue grows, db.serverStatus().wiredTiger.transaction often shows the most recent checkpoint took 45 seconds. WiredTiger metrics tell you what is hurting, but they do not tell you why. The next question is whether the disk is actually saturated.

Disk I/O saturation surfaces as climbing journal sync latency, checkpoint duration exceeding the 60-second interval, application-thread evictions, and ticket exhaustion. The reliable way to separate a storage problem from a query problem is to correlate OS-level disk signals (iostat -x) with WiredTiger internal signals in the same time window. This guide shows how to do that safely during an incident.

What this means

WiredTiger checkpoints run every 60 seconds by default and flush dirty cache pages to disk. The journal syncs every 100ms by default. Both are recurring write commitments that depend on stable, low-latency storage. When the underlying disk cannot absorb the write rate, await rises, operations hold write tickets longer, and the cache dirty ratio grows because pages cannot be flushed fast enough. If a checkpoint is still running when the next one is due, checkpoints run back to back and the system can enter a stall cycle that freezes all writes.

MongoDB signals identify the subsystem in distress; OS signals identify the resource constraint. If you only watch MongoDB, you will chase cache tuning and ticket limits while the real problem is a depleted EBS burst credit bucket or a RAID rebuild.

flowchart TD
    A["Disk saturation<br/>%util high / await high"] --> B[Journal sync latency spikes]
    A --> C[Checkpoint duration climbs]
    B --> D[Write tickets held longer]
    C --> D
    D --> E[Available tickets drop]
    E --> F[Queue depth grows]
    F --> G[opLatencies spike]
    C --> H[Cache dirty ratio rises]
    H --> I[Application-thread evictions]
    I --> G

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Cloud block storage burst credit depletion (EBS gp2, GCE PD) | await jumps to tens or hundreds of milliseconds while IOPS look modest; journal sync latency spikes first | Cloud metric BurstBalance or PD throughput/throttle counters |
| Local disk or RAID degradation | Sustained high %util and await on the data device across all workloads; no burst pattern | dmesg, RAID controller status, SMART health |
| Journal and data on the same overloaded device | Checkpoint bursts and journal syncs compete; latency spikes every 60 seconds and every 100ms | Mount points for dbPath and journal directory |
| Write burst exceeding flush capacity | Dirty ratio climbs above 10-15%, app-thread evictions start, checkpoint duration grows | opcounters and metrics.document deltas |
| Concurrent maintenance, backups, or compactions | Periodic spikes aligned with cron or backup windows; I/O patterns do not match application traffic | Backup schedules, currentOp, and OS process list |

Quick checks

Run these read-only checks from the MongoDB host during the incident. Do them in a single terminal session so timestamps line up.

# Check that mongod is still responsive
time mongosh --quiet --eval 'db.adminCommand({ping:1})'

# Disk saturation: look at %util and await for the data device
iostat -x 1 5

# Journal sync counters (cumulative since startup; average latency = time / ops)
mongosh --quiet --eval '
  var s = db.serverStatus().wiredTiger.log;
  print("log sync time total (us): " + s["log sync time duration (usecs)"]);
  print("log sync ops total: " + s["log sync operations"]);
'

# Most recent checkpoint duration
mongosh --quiet --eval '
  var t = db.serverStatus().wiredTiger.transaction;
  print("last checkpoint ms: " + t["transaction checkpoint most recent time (msecs)"]);
  print("total checkpoints: " + t["transaction checkpoints"]);
'

# Cache fill and dirty ratios
mongosh --quiet --eval '
  var c = db.serverStatus().wiredTiger.cache;
  var max = c["maximum bytes configured"];
  var used = c["bytes currently in the cache"];
  var dirty = c["tracked dirty bytes in the cache"];
  print("cache used: " + (100*used/max).toFixed(1) + "%");
  print("cache dirty: " + (100*dirty/max).toFixed(1) + "%");
'

# Available tickets
mongosh --quiet --eval '
  var tr = db.serverStatus().wiredTiger.concurrentTransactions;
  print("read available: " + tr.read.available + "/" + tr.read.totalTickets);
  print("write available: " + tr.write.available + "/" + tr.write.totalTickets);
'

# Queue depth and operation latency (one serverStatus call, so both come from the same snapshot)
mongosh --quiet --eval '
  var s = db.serverStatus();
  printjson(s.globalLock.currentQueue);
  printjson(s.opLatencies);
'

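Interpret iostat -x with disk type in mind. On spinning disks, %util above 70-80% indicates saturation. On SSD or NVMe, %util is misleading: it reports the share of time the device had at least one request in flight, and these devices serve many requests in parallel, so 100% does not mean the device is out of capacity. Trust await instead. If await is climbing while %util is high, the workload is exceeding what the device can deliver.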
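
To compare await and %util against MongoDB signals programmatically, the iostat text has to be turned into numbers first. The following is a minimal sketch, assuming the output was captured to a string; parseIostat, writeAwaitMs, and the sample text are hypothetical names and illustrative values, not real measurements. It reads column names from the header line so it tolerates both older layouts (await, avgqu-sz) and newer ones (w_await, aqu-sz).

```javascript
// Parse the text output of `iostat -x` into one object per device line.
// Column names are taken from the "Device" header, so the layout can vary.
function parseIostat(text) {
  const rows = [];
  let header = null;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") {
      header = null; // a blank line ends the current device block
      continue;
    }
    const cols = trimmed.split(/\s+/);
    if (cols[0] === "Device" || cols[0] === "Device:") {
      header = cols.map((c) => c.replace(/:$/, ""));
      continue;
    }
    // Skip the avg-cpu section and any line that does not match the header.
    if (!header || cols.length !== header.length) continue;
    const row = { device: cols[0] };
    header.slice(1).forEach((name, i) => {
      row[name] = Number(cols[i + 1]);
    });
    rows.push(row);
  }
  return rows;
}

// Prefer the write-specific latency when the column exists.
function writeAwaitMs(row) {
  return row.w_await !== undefined ? row.w_await : row.await;
}

// Illustrative sample with a trimmed column set (values are made up).
const sampleIostat = [
  "avg-cpu:  %user   %nice %system %iowait  %steal   %idle",
  "           3.10    0.00    1.20   22.40    0.00   73.30",
  "",
  "Device   r/s   w/s   r_await   w_await   aqu-sz   %util",
  "nvme1n1  12.00 850.00 0.40 38.20 31.50 99.60",
  "",
].join("\n");

for (const row of parseIostat(sampleIostat)) {
  console.log(row.device, "write await ms:", writeAwaitMs(row), "%util:", row["%util"]);
}
```

The parser assumes a locale that prints decimal points; with decimal commas the values would come back as NaN.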

How to diagnose it

  1. Confirm the MongoDB symptom. Check transaction checkpoint most recent time (msecs) and log sync time duration (usecs). A checkpoint above 30 seconds or journal sync averaging above 30 ms points to storage. These are leading indicators that often appear 30-60 seconds before opLatencies spikes.

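  2. Capture OS disk signals in the same window. Run iostat -x 1 5 on the data volume and the journal volume. Look at %util, await, and avgqu-sz (newer sysstat releases label the queue column aqu-sz and report r_await and w_await separately; for journal and checkpoint pressure, w_await is the one to watch). Sustained elevated await is the strongest saturation signal. A 1-second interval averages away shorter micro-bursts, and five samples cover only five seconds, so keep the capture running for the duration of the spike.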

  3. Correlate direction and timing. If await rises at the same time as journal sync latency, the disk is the bottleneck. If await is flat while MongoDB latency rises, look for lock contention, ticket exhaustion from a long-running operation, or a cache pressure cascade.

  4. Check the dirty ratio and eviction. If dirty ratio is above 15-20% and pages evicted by application threads is increasing, the disk cannot flush dirty pages fast enough. This confirms storage throughput is the constraint, not just latency.

  5. Check ticket availability. If available write tickets drop below 25% of total or below 10 absolute, operations are queuing because they hold tickets too long. Ticket exhaustion is usually a symptom of slow disk I/O, not a root cause.

  6. Rule out cloud burst depletion. On AWS, gp2 volumes earn a baseline of 3 IOPS per GiB (minimum 100). Volumes under 1TiB can burst to 3,000 IOPS but fall back to that baseline once credits run out. The inflection looks like a sudden latency cliff, not a gradual ramp. Verify BurstBalance in CloudWatch or the equivalent PD throttle metrics in GCP. Moving to gp3 removes the burst credit system.

  7. Identify competing consumers. Check db.currentOp() for long-running operations, index builds, or large aggregations running on the primary, and check the OS process list for backup jobs. If a backup or mongodump is saturating the disk, move it to a hidden secondary.

  8. Decide whether to step down or throttle. If the primary’s storage is degraded and another member has healthy disk metrics, a controlled rs.stepDown() can move writes off the bad node. Do not kill a running checkpoint; let it complete.
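
Steps 1 and 3 can be sketched in code. The WiredTiger log counters are cumulative, so an average for the incident window comes from the difference between two snapshots, not from a single reading. The sketch below assumes two wiredTiger.log objects taken some seconds apart and two short series of samples. avgJournalSyncMs, rose, and classifyBottleneck are hypothetical helper names, and the "rose" rule (second-half mean at least 2x the first-half mean) is an arbitrary assumption, not a MongoDB or sysstat definition.

```javascript
// Average journal sync latency in ms between two wiredTiger.log snapshots.
// Number() also converts Long-style values that a shell may return.
function avgJournalSyncMs(prevLog, currLog) {
  const timeKey = "log sync time duration (usecs)";
  const opsKey = "log sync operations";
  const deltaUsecs = Number(currLog[timeKey]) - Number(prevLog[timeKey]);
  const deltaOps = Number(currLog[opsKey]) - Number(prevLog[opsKey]);
  // No syncs in the window, or counters reset by a restart: no usable average.
  if (deltaOps <= 0 || deltaUsecs < 0) return null;
  return deltaUsecs / deltaOps / 1000;
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Crude trend test (assumption): compare the mean of each half of the series.
function rose(series, factor) {
  const mid = Math.floor(series.length / 2);
  if (mid === 0) return false;
  const before = mean(series.slice(0, mid));
  const after = mean(series.slice(mid));
  return after > before && after >= factor * before;
}

// Step 3: did disk await and MongoDB latency rise together?
function classifyBottleneck(awaitSeries, mongoLatencySeries, factor = 2) {
  const diskRose = rose(awaitSeries, factor);
  const mongoRose = rose(mongoLatencySeries, factor);
  if (diskRose && mongoRose) return "disk";
  if (mongoRose) return "inside-mongodb"; // locks, tickets, or cache pressure
  return "no-clear-signal";
}

console.log(classifyBottleneck([2, 2, 40, 45], [3, 3, 35, 50]));
```

In mongosh, the two snapshots could come from calling db.serverStatus().wiredTiger.log before and after a sleep(10000). A result of "inside-mongodb" only says the disk looked flat; it does not identify which of the remaining causes applies.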

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| iostat -x %util / await on data and journal devices | Direct measure of disk saturation and latency | await climbing with %util sustained above 70-80% on HDD, or await climbing on SSD/NVMe |
| WiredTiger journal sync latency (log sync time duration (usecs) / log sync operations) | Journal sync is the highest-frequency storage commitment; spikes here precede application latency by 30-60 seconds | Average above 30ms sustained, or any jump above baseline |
| WiredTiger checkpoint duration (transaction checkpoint most recent time (msecs)) | Checkpoints flush dirty pages; if they exceed the 60-second interval, stalls follow | Above 30 seconds sustained, or above 60 seconds critical |
| WiredTiger cache dirty ratio (tracked dirty bytes in the cache / maximum bytes configured) | Leading indicator of flush backlog; rises when storage cannot absorb writes | Sustained above 10-15%, critical above 20% |
| Application-thread evictions (pages evicted by application threads) | Confirms cache pressure is translating into user-visible latency | Any sustained nonzero rate after warmup |
| WiredTiger ticket availability (wiredTiger.concurrentTransactions) | Storage engine admission control; low availability means operations are queuing | Available write tickets below 25% of total, or below 10 absolute |
| opLatencies writes and reads | User-visible latency; confirms impact | Average or tail latency doubling from baseline for more than 5 minutes |
| globalLock.currentQueue | Aggregate queue depth; grows when throughput cannot keep up | Sustained above 20 and increasing |
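
Several of these warning signs can be checked against a single serverStatus document. The sketch below is one way to do that, assuming the field paths used in the quick checks (they may differ between server versions). evaluateSnapshot is a hypothetical helper, and where the table gives a range (10-15% dirty) the lower bound is used as an assumption. "Sustained" cannot be judged from one snapshot, so each finding is a prompt to look at the trend.

```javascript
// Apply the table's thresholds to one serverStatus-shaped object.
function evaluateSnapshot(status) {
  const wt = status.wiredTiger;
  const findings = [];

  // Checkpoint duration: warning above 30 s, critical above the 60 s interval.
  const checkpointMs = Number(wt.transaction["transaction checkpoint most recent time (msecs)"]);
  if (checkpointMs > 60000) findings.push({ signal: "checkpoint", level: "critical" });
  else if (checkpointMs > 30000) findings.push({ signal: "checkpoint", level: "warning" });

  // Dirty ratio: warning above 10% (lower bound of the 10-15% range), critical above 20%.
  const dirtyPct =
    (100 * Number(wt.cache["tracked dirty bytes in the cache"])) /
    Number(wt.cache["maximum bytes configured"]);
  if (dirtyPct > 20) findings.push({ signal: "dirtyRatio", level: "critical" });
  else if (dirtyPct > 10) findings.push({ signal: "dirtyRatio", level: "warning" });

  // Write tickets: below 25% of total, or below 10 absolute.
  const available = Number(wt.concurrentTransactions.write.available);
  const total = Number(wt.concurrentTransactions.write.totalTickets);
  if (available < 0.25 * total || available < 10) {
    findings.push({ signal: "writeTickets", level: "warning" });
  }

  // Queue depth above 20.
  if (Number(status.globalLock.currentQueue.total) > 20) {
    findings.push({ signal: "queueDepth", level: "warning" });
  }
  return findings;
}
```

In mongosh this could be called as evaluateSnapshot(db.serverStatus()); an empty array means none of the single-snapshot thresholds were crossed.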

Fixes

Storage is actually saturated

  • Cloud volumes: Resize the volume or migrate to a type without burst limits (AWS gp3, Azure Premium SSD v2, or provisioned IOPS). This is the most common fix for EBS gp2 burst-credit cliffs.
  • Local disks: Replace a degraded drive or expand the RAID array. If a RAID rebuild is running, expect temporary saturation and consider throttling writes until it completes.
  • Separate journal and data: Place the WiredTiger journal on a distinct device from the data files. Journal syncs are small sequential writes and should not compete with checkpoint write bursts.
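
For the cloud-volume case, a rough calculation shows how long a gp2 volume can run at its burst rate before it drops to baseline. This is a sketch under stated assumptions: the constants below (3 IOPS per GiB, 100 IOPS floor, 16,000 IOPS ceiling, 3,000 burst IOPS, 5.4 million credit bucket) are gp2 figures recalled from AWS documentation and should be verified against the current EBS docs. The function names are hypothetical.

```javascript
// Assumed gp2 parameters; verify against current AWS documentation.
const GP2_IOPS_PER_GIB = 3;
const GP2_MIN_BASELINE_IOPS = 100;
const GP2_MAX_BASELINE_IOPS = 16000;
const GP2_BURST_IOPS = 3000;
const GP2_FULL_CREDIT_BUCKET = 5400000;

// Baseline IOPS a gp2 volume of the given size sustains without credits.
function gp2BaselineIops(sizeGiB) {
  const scaled = GP2_IOPS_PER_GIB * sizeGiB;
  return Math.min(GP2_MAX_BASELINE_IOPS, Math.max(GP2_MIN_BASELINE_IOPS, scaled));
}

// Seconds of sustained 3,000 IOPS before the credit bucket is empty.
function gp2BurstSeconds(sizeGiB, credits = GP2_FULL_CREDIT_BUCKET) {
  const baseline = gp2BaselineIops(sizeGiB);
  // Volumes whose baseline already meets the burst rate do not depend on credits.
  if (baseline >= GP2_BURST_IOPS) return Infinity;
  return credits / (GP2_BURST_IOPS - baseline);
}

console.log("100 GiB baseline IOPS:", gp2BaselineIops(100));
console.log("100 GiB burst seconds from a full bucket:", gp2BurstSeconds(100));
```

Under these assumptions a small volume drains a full bucket in well under an hour of sustained burst, which matches the sudden cliff described in the diagnosis steps.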

Workload is overwhelming flush capacity

  • Throttle writes: Pause batch jobs, migrations, or bulk imports. This is often faster than any infrastructure change.
  • Kill long-running operations: Use db.currentOp() to find operations holding tickets or snapshots, then db.killOp(opid) if they are not critical. Long-running transactions or noCursorTimeout cursors can pin cache snapshots and make eviction harder.
  • Reschedule backups and maintenance: Run mongodump, compaction, and index builds on hidden secondaries or during low-traffic windows.

Configuration adjustments

  • Right-size the WiredTiger cache: The default is max(50% of (RAM - 1GB), 256MB). In containers, explicitly set --wiredTigerCacheSizeGB based on the container limit, not the host.
  • Journal commit interval: storage.journal.commitIntervalMs defaults to 100ms. Lowering it syncs the journal more often, which shrinks the window of writes at risk but adds I/O load. Only change this if you understand the tradeoff and the storage can absorb it.
  • Do not increase ticket limits: Raising wiredTigerConcurrentReadTransactions or wiredTigerConcurrentWriteTransactions allows more operations into the storage engine but will not fix a disk bottleneck. It usually makes saturation worse.
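
The cache default quoted above is simple arithmetic, and working it through shows why the container limit matters. A minimal sketch, with defaultWiredTigerCacheGiB as a hypothetical helper that applies max(50% of (RAM - 1GB), 256MB); the 4 GiB and 64 GiB figures are illustrative.

```javascript
// Default WiredTiger cache size in GiB for a given amount of RAM in GiB:
// the larger of 50% of (RAM - 1 GiB) and 256 MiB (0.25 GiB).
function defaultWiredTigerCacheGiB(ramGiB) {
  return Math.max(0.5 * (ramGiB - 1), 0.25);
}

// Illustrative comparison: a 4 GiB container on a 64 GiB host.
const fromContainerLimit = defaultWiredTigerCacheGiB(4);
const fromHostRam = defaultWiredTigerCacheGiB(64);
console.log("cache sized from container limit (GiB):", fromContainerLimit);
console.log("cache sized from host RAM (GiB):", fromHostRam);
```

A cache sized from host RAM in this example would be several times larger than the container's entire memory limit, which is the failure the explicit --wiredTigerCacheSizeGB setting avoids.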

Prevention

  • Establish a baseline for iostat -x on each MongoDB host during normal peak load. Baseline %util, await, and queue depth per device.
  • Graph MongoDB storage signals (checkpoint duration, journal sync latency, dirty ratio, available tickets) on the same dashboard as OS disk metrics. Use the same time window.
  • Prefer cloud disk types without burst-credit mechanics for production database workloads.
  • Keep journal and data on separate mount points when hardware allows.
  • Size the oplog for peak write throughput and trend the window over time; with a shrinking window, a lagging secondary is more likely to fall off the oplog, and the resulting catch-up or resync adds I/O.
  • Review slow queries and index health regularly so a collection scan does not compound a marginal disk situation.
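
The baseline in the first bullet can be reduced to a couple of numbers per device. The sketch below is one possible approach, assuming an array of await samples collected during normal peak load. summarizeBaseline and exceedsBaseline are hypothetical helpers, and comparing against 2x the 95th percentile is an assumption borrowed from the "doubling from baseline" rule of thumb, not an established threshold.

```javascript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedValues, p) {
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(rank, 1) - 1];
}

// Summarize a set of samples (for example await in ms) as median and p95.
function summarizeBaseline(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  return { median: percentile(sorted, 50), p95: percentile(sorted, 95) };
}

// Flag a current reading that is at least `factor` times the baseline p95.
function exceedsBaseline(current, baseline, factor = 2) {
  return current >= factor * baseline.p95;
}

// Illustrative samples, not real measurements.
const baselineAwait = summarizeBaseline([5, 1, 3, 2, 4, 10, 9, 8, 7, 6]);
console.log(baselineAwait, exceedsBaseline(25, baselineAwait));
```

Keeping one such summary per device and per signal gives the incident-time comparison a fixed reference instead of a guess.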

How Netdata helps

  • Per-second iostat metrics alongside MongoDB collector data let you overlay %util, await, journal sync latency, and checkpoint duration on one chart.
  • Pre-built alerts for WiredTiger cache dirty ratio, application-thread evictions, and available ticket ratio surface storage pressure before writes freeze.
  • Netdata overlays opLatencies spikes and disk saturation windows on the same chart, so you can see whether a latency jump started with await or with a lock holder.
  • Per-process I/O visibility helps isolate whether MongoDB or a co-located backup job is consuming disk bandwidth.