MongoDB journal sync latency high: the storage signal that warns 60 seconds early

Application write latency spikes. Connections pile up. Look back 60 seconds and WiredTiger journal sync latency was likely already climbing. Every write with j:true or w:"majority" blocks until the journal buffer is fsynced to disk. When storage struggles, journal sync is the first domino to fall.

Journal sync latency is a storage subsystem signal, not a query or cache problem. The block device under mongod cannot absorb small sequential writes fast enough. The result is head-of-line delay for all durable writes, which cascades into ticket exhaustion and connection backlog.

What this means

WiredTiger maintains a write-ahead journal in <dbPath>/journal/. By default, it syncs this journal to disk every 100 ms. A write arriving with j:true triggers an immediate synchronous flush. On replica set secondaries, WiredTiger also syncs the journal after applying each oplog batch.

The counters in db.serverStatus().wiredTiger.log are cumulative since process start. The lifetime average sync latency is:

"log sync time duration (usecs)" / "log sync operations"

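The lifetime average smooths over recent spikes, so judge a node by the interval average instead: the same ratio computed from the deltas between two samples. On modern SSD or NVMe storage, the interval average should stay below 10 ms. Between 10 ms and 50 ms is concerning. Above 100 ms sustained, all j:true and w:"majority" writes on that node are effectively stalled.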
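Because the counters are cumulative, an interval average needs two samples and a subtraction. The following is a minimal sketch, not a MongoDB API: the function name intervalSyncLatencyMs and the { syncUsec, syncOps } sample shape are hypothetical, and it assumes the two counters have already been copied out of db.serverStatus().wiredTiger.log as plain numbers.

```javascript
// Hypothetical helper: interval-average journal sync latency in milliseconds.
// Each sample is { syncUsec, syncOps }, copied from the cumulative counters
// "log sync time duration (usecs)" and "log sync operations".
function intervalSyncLatencyMs(prev, curr) {
  const opsDelta = curr.syncOps - prev.syncOps;
  const usecDelta = curr.syncUsec - prev.syncUsec;
  // A negative delta means the counters reset (mongod restarted); zero ops
  // means no sync ran in the interval. Neither gives a meaningful average.
  if (opsDelta <= 0 || usecDelta < 0) return null;
  return usecDelta / opsDelta / 1000;
}

// Illustrative numbers only: 50 syncs that took 600,000 usec in total.
const before = { syncUsec: 1200000, syncOps: 400 };
const after = { syncUsec: 1800000, syncOps: 450 };
console.log(intervalSyncLatencyMs(before, after)); // 12
```

Returning null for a reset or an idle interval keeps a restart from showing up as a bogus latency value.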

Since MongoDB 3.6, the replica set setting writeConcernMajorityJournalDefault defaults to true unless the deployment changed it. With that default, w:"majority" writes wait for the journal, so they are subject to journal sync latency even when the application does not explicitly set j:true. Reads and j:false writes are not directly blocked, but uncommitted journal entries accumulate, increasing the risk of a larger stall when the buffer eventually flushes.
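The interaction between w, j, and writeConcernMajorityJournalDefault can be summarized in a few lines. This is a simplified model for reasoning about which writes sit behind the journal fsync, not MongoDB's implementation; waitsForJournal is a hypothetical name.

```javascript
// Simplified model (an assumption, not MongoDB source code) of whether a
// write acknowledgment waits for the on-disk journal.
function waitsForJournal(writeConcern, majorityJournalDefault = true) {
  // An explicit j value always wins.
  if (typeof writeConcern.j === "boolean") return writeConcern.j;
  // With j unset, w:"majority" follows writeConcernMajorityJournalDefault.
  if (writeConcern.w === "majority") return majorityJournalDefault;
  // Other write concerns with j unset are acknowledged from memory.
  return false;
}

console.log(waitsForJournal({ w: "majority" }));        // true
console.log(waitsForJournal({ w: "majority" }, false)); // false
console.log(waitsForJournal({ w: 1 }));                 // false
console.log(waitsForJournal({ w: 1, j: true }));        // true
```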

flowchart TD
    A[Storage I/O degrades] --> B[Journal sync latency spikes]
    B --> C[j:true / w:majority writes stall]
    C --> D[Tickets held longer]
    D --> E[Queue depth grows]
    E --> F[Application write latency spikes]
    F --> G[Connection pileup]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Cloud burst credit depletion (EBS gp2, GCE PD) | Latency jumps from <10 ms to >100 ms while write volume stays flat; no local disk errors | Cloud provider volume burst balance or IOPS throttling metrics |
| Checkpoint I/O contention (journal and data share one device) | Journal sync spikes correlate with checkpoint duration spikes every 60 s | Disk layout (df, <dbPath> / <dbPath>/journal mount points); iostat -x |
| Disk subsystem degradation | Elevated OS await across all I/O types; RAID controller or SMART alerts | iostat -x 1, dmesg, RAID controller logs |
| Noisy neighbor / shared storage oversubscription | Latency spikes without a local workload change; common on virtualized or multi-tenant storage | Host-level I/O metrics, hypervisor storage latency, provider status page |
| NFS or network-backed storage latency | High journal sync on nodes using NFS or SAN for dbPath; network RTT correlates | mount options, nfsstat, network RTT between host and storage |

Quick checks

Run these read-only commands to confirm the symptom and narrow the cause.

# These fields are cumulative. Run this command twice, 5-10 seconds apart, and diff the values.
# Interval latency (ms) = (sync_usec_delta / ops_delta) / 1000
mongosh --quiet --eval '
  var s = db.serverStatus().wiredTiger.log;
  print("sync_usec=" + s["log sync time duration (usecs)"] + " ops=" + s["log sync operations"]);
'

# Check OS-level disk latency and utilization.
# Look for %util > 70 or await > 10 ms on the data/journal device.
iostat -x 1 5

# List each replica set member and its state to see whether this node is PRIMARY.
mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " " + m.stateStr))'

# Verify disk topology: are journal and data on the same filesystem?
# Replace paths if your dbPath differs from the default.
df -h /var/lib/mongodb /var/lib/mongodb/journal 2>/dev/null || echo "Check dbPath and journal paths manually"

# Check WiredTiger checkpoint duration.
mongosh --quiet --eval '
  var t = db.serverStatus().wiredTiger.transaction;
  print("Last checkpoint: " + t["transaction checkpoint most recent time (msecs)"] + " ms");
'

# Check WiredTiger write ticket availability.
# The location of this section in serverStatus may differ between MongoDB versions.
mongosh --quiet --eval '
  var w = db.serverStatus().wiredTiger.concurrentTransactions.write;
  print("Write tickets available: " + w.available + " / " + (w.out + w.available));
'

How to diagnose it

  1. Establish the baseline deviation. Sample wiredTiger.log twice and compute the interval average. Compare it to the node’s historical baseline. A sustained jump from 2 ms to 50 ms is more significant than a steady 15 ms on a known slow disk.
  2. Confirm the impact scope. If the node is a primary, j:true and w:"majority" writes from all clients stall. If it is a secondary, replication oplog application stalls, which increases replication lag and may eventually trigger flow control on the primary.
  3. Correlate with OS I/O. Run iostat -x 1 and look at await and %util for the block device hosting dbPath. If await is high across all I/O types and not just journal sync, the disk itself is saturated or degraded.
  4. Check for checkpoint collision. Compare transaction checkpoint most recent time (msecs) with the journal sync spikes. If both spike together every 60 seconds, the shared device is contending between large checkpoint sequential writes and small journal fsyncs.
  5. Investigate cloud storage throttling. On AWS EBS gp2, check volume burst balance. On gp3 or provisioned IOPS volumes, check for IOPS throttling. On GCE PD, check IOPS balance metrics. Sudden latency jumps with no local configuration change often indicate cloud storage throttling.
  6. Verify disk topology. If <dbPath> and <dbPath>/journal reside on the same mount point, checkpoint writes and journal syncs compete for the same device queue. Separating them removes that collision, though it does not fix underlying storage saturation.
  7. Rule out hardware failure. Check dmesg, RAID controller status, and SMART metrics for predictive failure or rebuild activity. A degrading disk often shows elevated await before it shows errors.
  8. Check WiredTiger ticket availability. If available write tickets are near zero while journal sync is high, the storage stall has consumed all concurrency and operations are queuing behind the fsync.
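The steps above amount to a small decision procedure. The sketch below encodes it as a function over observations you have already gathered by hand. Everything here is illustrative: triageJournalStall and its field names are hypothetical, the 10 ms await cutoff is borrowed from the quick checks, and "near zero" tickets is approximated as two or fewer.

```javascript
// Hypothetical triage helper mirroring the diagnosis steps. Field names and
// numeric cutoffs are illustrative assumptions, not MongoDB or OS output.
function triageJournalStall(obs) {
  const hints = [];
  if (obs.diskAwaitMs > 10) {
    hints.push("device saturated or degraded: await is high at the OS level");
  }
  if (obs.checkpointSpikesTogether && obs.journalSharesDevice) {
    hints.push("checkpoint collision: journal and data contend on one device");
  }
  if (obs.cloudThrottled) {
    hints.push("cloud storage throttling: check burst balance or IOPS limits");
  }
  if (obs.hardwareAlerts) {
    hints.push("hardware: dmesg, RAID, or SMART reports a failing disk");
  }
  // "Near zero" is approximated here as two or fewer available write tickets.
  if (obs.writeTicketsAvailable <= 2) {
    hints.push("concurrency exhausted: writes are queuing behind the fsync");
  }
  return hints.length > 0 ? hints : ["no obvious cause: compare against baseline"];
}

// Example: healthy await, but checkpoints collide on a shared device and
// no write tickets are left.
console.log(triageJournalStall({
  diskAwaitMs: 4,
  checkpointSpikesTogether: true,
  journalSharesDevice: true,
  cloudThrottled: false,
  hardwareAlerts: false,
  writeTicketsAvailable: 0,
}));
```

The function returns every matching hint rather than the first one, because storage incidents often have more than one contributing cause.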

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| WiredTiger log sync time / log sync operations | Direct measure of journal fsync latency | Interval average >10 ms sustained; >100 ms is critical |
| opLatencies.writes | User-visible write latency | Rising 30 to 60 seconds after journal sync spikes |
| WiredTiger checkpoint duration | Long checkpoints compete for disk I/O with journal syncs | >30 ms and correlating with sync latency spikes |
| WiredTiger cache dirty ratio | Dirty data accumulation increases pressure on flush paths | >20% sustained |
| OS disk await | Underlying storage device saturation | >10 ms on SSD/NVMe sustained |
| Cloud volume burst balance | Credits fund IOPS above baseline on EBS/GCE PD | Depleting or at zero |

Fixes

Immediate relief

Do not restart mongod to clear a journal sync stall. A restart forces journal replay and cache warmup, which adds I/O load and prolongs the outage.

  • Throttle or pause non-critical writes. Stop batch jobs, disable analytics ingestion, or pause data migrations. Reducing write volume lowers the frequency of forced journal syncs.
  • Step down a saturated primary. If the primary’s storage is degraded and secondaries have healthy disks, run rs.stepDown() to shift write load. This triggers a replica set election and the node will become secondary; only do this if your application handles failover gracefully.
  • Kill unnecessary long-running writes. Use db.currentOp() to find operations that are amplifying write volume, then terminate them with db.killOp(). Target only known-safe operations; killing writes can leave data in an inconsistent application state.
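To shortlist candidates before calling db.killOp(), you can filter the inprog array that db.currentOp() returns. This is a sketch under assumptions: longRunningWrites is a hypothetical helper, it relies on the op, ns, opid, and secs_running fields of each entry, and it treats only insert, update, and remove as writes. It lists candidates and kills nothing.

```javascript
// Hypothetical filter over the `inprog` array returned by db.currentOp().
// Returns write operations that have been running for at least minSeconds.
function longRunningWrites(inprog, minSeconds) {
  const writeOps = new Set(["insert", "update", "remove"]);
  return inprog
    .filter((op) => writeOps.has(op.op) && (op.secs_running || 0) >= minSeconds)
    .map((op) => ({ opid: op.opid, ns: op.ns, op: op.op, secsRunning: op.secs_running }));
}

// Made-up sample entries for illustration.
const sample = [
  { opid: 101, op: "update", ns: "shop.orders", secs_running: 95 },
  { opid: 102, op: "query", ns: "shop.orders", secs_running: 300 },
  { opid: 103, op: "insert", ns: "etl.staging", secs_running: 4 },
];
// Only opid 101 qualifies: it is a write and has run for at least 30 s.
console.log(longRunningWrites(sample, 30));
```

Inside mongosh, the equivalent call would be longRunningWrites(db.currentOp().inprog, 30). Review each returned opid by hand before terminating anything.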

Storage-layer remediation

  • Scale cloud storage baseline IOPS. On AWS, move from gp2 to gp3 and provision higher baseline IOPS, or increase gp2 volume size to raise its baseline. On GCE PD, increase provisioned IOPS or move to Hyperdisk. This addresses burst credit depletion or baseline saturation permanently.
  • Separate journal and data onto independent devices. Mount <dbPath>/journal on a dedicated, low-latency block device. This isolates small sequential journal fsyncs from large checkpoint sequential writes. This is a provisioning change that typically requires a rolling restart to reconfigure mount points or symlinks.
  • Replace failing hardware. If dmesg or RAID logs show predictive failure, replace the disk or migrate the node.
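After moving the journal, it is worth confirming that the two paths really sit on different filesystems. The Node.js sketch below compares the device ID that stat reports for each path. The helper name sameFilesystem is hypothetical, and the paths assume the default dbPath used earlier in this article. Note the limit of the check: different device IDs mean different filesystems, which does not by itself prove different physical disks.

```javascript
const fs = require("fs");

// Hypothetical check: do two paths live on the same filesystem?
// Compares the device ID (st_dev) that stat reports for each path.
function sameFilesystem(pathA, pathB) {
  return fs.statSync(pathA).dev === fs.statSync(pathB).dev;
}

// Example paths assume the default dbPath; replace them with your own.
const dbPath = "/var/lib/mongodb";
const journalPath = "/var/lib/mongodb/journal";
if (fs.existsSync(dbPath) && fs.existsSync(journalPath)) {
  console.log(sameFilesystem(dbPath, journalPath)
    ? "journal shares a filesystem with the data files"
    : "journal is on a separate filesystem");
} else {
  console.log("paths not found; set dbPath and journalPath for this host");
}
```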

Application-layer tradeoffs

  • Reducing write concern is dangerous. Switching from w:"majority" to w:1 or j:false will bypass journal sync stalls, but it also removes durability guarantees. Acknowledged writes can be lost on failover. Do not make this change during an incident unless the business accepts data loss.
  • Adjusting commitIntervalMs is not a fix. The valid range is version-dependent. Lowering it increases sync frequency, which can worsen the stall on already-saturated storage. Raising it reduces durability granularity but does not reduce the fsync cost of j:true writes.

Prevention

  • Baseline journal sync latency per node. Establish per-node baselines during normal peak load. Alert on deviation (for example, 3x baseline for 2 minutes) rather than fixed thresholds, because HDD-backed nodes have different baselines than NVMe.
  • Size storage for sustained IOPS, not burst. Cloud burst credits are for short spikes. If sustained write throughput depletes credits, provision enough baseline IOPS to cover the peak.
  • Monitor dirty ratio and checkpoint duration. A rising dirty ratio often precedes journal sync pressure because checkpoint flushes compete with the same device. Trend both together.
  • Review disk topology during provisioning. Plan for <dbPath>/journal on a dedicated device if the workload is write-heavy and uses j:true or w:"majority" by default.
  • Trend replication lag with journal sync. On secondaries, journal sync latency directly caps oplog application throughput. A secondary with slower disks than the primary will drift during bursts.
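The baseline-deviation alert described in the first bullet (for example, 3x baseline for 2 minutes) can be expressed as a small check over a time series. This is a sketch, not a monitoring product's rule syntax: sustainedDeviation and the { t, latencyMs } sample shape are hypothetical, with t in seconds.

```javascript
// Hypothetical alert check: true when latency stays above factor * baseline
// for at least windowSec without interruption.
// samples: [{ t: seconds, latencyMs }], ordered by time.
function sustainedDeviation(samples, baselineMs, factor = 3, windowSec = 120) {
  let breachStart = null;
  for (const s of samples) {
    if (s.latencyMs > baselineMs * factor) {
      if (breachStart === null) breachStart = s.t;
      if (s.t - breachStart >= windowSec) return true;
    } else {
      // Any sample back under the threshold resets the window.
      breachStart = null;
    }
  }
  return false;
}

// Illustrative data: one sample every 10 s for 130 s, baseline 2 ms.
const steady = Array.from({ length: 14 }, (_, i) => ({ t: i * 10, latencyMs: 7 }));
const dipped = steady.map((s, i) => (i === 6 ? { ...s, latencyMs: 2 } : s));
console.log(sustainedDeviation(steady, 2)); // true
console.log(sustainedDeviation(dipped, 2)); // false
```

Resetting on a single good sample is a deliberately strict choice. A production rule might tolerate brief dips, which is a tuning decision for each environment.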

How Netdata helps

  • Netdata collects wiredTiger.log metrics per second and computes the interval journal sync latency, so you do not need to diff cumulative counters manually during an incident.
  • The MongoDB collector exposes journal sync latency alongside WiredTiger cache dirty ratio, checkpoint duration, and write ticket utilization on the same dashboard. This correlates storage saturation with cache pressure without switching contexts.
  • Per-second granularity captures transient spikes that minute-resolution aggregations miss, preserving the 30 to 60 second lead time this signal provides.