MongoDB journal sync latency high: the storage signal that warns 60 seconds early
Application write latency spikes. Connections pile up. Look back 60 seconds and WiredTiger journal sync latency was likely already climbing. Every write with j:true or w:"majority" blocks until the journal buffer is fsynced to disk. When storage struggles, journal sync is the first domino to fall.
Journal sync latency is a storage subsystem signal, not a query or cache problem. The block device under mongod cannot absorb small sequential writes fast enough. The result is head-of-line delay for all durable writes, which cascades into ticket exhaustion and connection backlog.
What this means
WiredTiger maintains a write-ahead journal in <dbPath>/journal/. By default, it syncs this journal to disk every 100 ms. A write arriving with j:true triggers an immediate synchronous flush. On replica set secondaries, WiredTiger also syncs the journal after applying each oplog batch.
The counters in db.serverStatus().wiredTiger.log are cumulative since process start. The lifetime average sync latency, in microseconds, is:
"log sync time duration (usecs)" / "log sync operations"
A lifetime average hides recent spikes, so sample the counters twice and apply the same ratio to the deltas to get an interval average. On modern SSD or NVMe storage, the interval average should stay below 10 ms. Between 10 ms and 50 ms is concerning. Above 100 ms sustained, all j:true and w:"majority" writes on that node are effectively stalled.
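The interval average can be derived from two samples of the cumulative counters. The following is a minimal sketch of that arithmetic, not a definitive implementation: the function name, the sample values, and the choice to return null for an idle interval are illustrative assumptions, while the field names are the ones quoted above. Values are passed through Number() in case the shell returns them as Long objects.

```javascript
// Interval average journal sync latency from two samples of
// db.serverStatus().wiredTiger.log (cumulative counters).
function intervalSyncLatencyMs(prev, curr) {
  const usecKey = "log sync time duration (usecs)";
  const opsKey = "log sync operations";
  const usecDelta = Number(curr[usecKey]) - Number(prev[usecKey]);
  const opsDelta = Number(curr[opsKey]) - Number(prev[opsKey]);
  // No syncs in the interval, or counters reset by a restart:
  // there is no meaningful average to report.
  if (opsDelta <= 0 || usecDelta < 0) return null;
  return usecDelta / opsDelta / 1000; // microseconds -> milliseconds
}

// Made-up sample values, for illustration only.
const first = { "log sync time duration (usecs)": 4000000, "log sync operations": 2000 };
const second = { "log sync time duration (usecs)": 4900000, "log sync operations": 2020 };

console.log(intervalSyncLatencyMs(first, second)); // 900000 / 20 / 1000 = 45
```

With these sample values the lifetime average at the second sample is still about 2.4 ms (4900000 / 2020 / 1000), while the interval average is 45 ms. That gap is why the interval figure is the one to watch.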
Since MongoDB 3.6, the replica set setting writeConcernMajorityJournalDefault defaults to true, so w:"majority" writes wait for the journal unless that setting has been changed. This means majority writes are subject to journal sync latency even when the application does not explicitly set j:true. Reads and j:false writes are not directly blocked, but uncommitted journal entries accumulate, increasing the risk of a larger stall when the buffer eventually flushes.
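To make the write concern rule concrete, here is a simplified model of which acknowledged writes wait on a journal sync. It is a sketch under stated assumptions: waitsForJournal is a hypothetical helper, not a MongoDB API, and it only encodes the behavior described above (explicit j wins; otherwise only w:"majority" inherits writeConcernMajorityJournalDefault).

```javascript
// Simplified model: does an acknowledged write wait for a journal sync?
// majorityJournalDefault stands in for the replica set's
// writeConcernMajorityJournalDefault setting (true by default).
function waitsForJournal(writeConcern, majorityJournalDefault = true) {
  if (writeConcern.j === true) return true;   // explicit j:true always waits
  if (writeConcern.j === false) return false; // explicit j:false does not wait
  // j unspecified: only w:"majority" inherits the replica set default.
  return writeConcern.w === "majority" && majorityJournalDefault;
}

console.log(waitsForJournal({ w: "majority" }));        // true
console.log(waitsForJournal({ w: 1 }));                 // false
console.log(waitsForJournal({ w: 1, j: true }));        // true
console.log(waitsForJournal({ w: "majority" }, false)); // false
```

Running an application's write concerns through a check like this is a quick way to estimate how much of its write traffic is exposed to a journal sync stall.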
flowchart TD
A[Storage I/O degrades] --> B[Journal sync latency spikes]
B --> C[j:true / w:majority writes stall]
C --> D[Tickets held longer]
D --> E[Queue depth grows]
E --> F[Application write latency spikes]
F --> G[Connection pileup]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Cloud burst credit depletion (EBS gp2, GCE PD) | Latency jumps from <10 ms to >100 ms while write volume stays flat; no local disk errors | Cloud provider volume burst balance or IOPS throttling metrics |
| Checkpoint I/O contention (journal and data share one device) | Journal sync spikes correlate with checkpoint duration spikes every 60 s | Disk layout (df, <dbPath> / <dbPath>/journal mount points); iostat -x |
| Disk subsystem degradation | Elevated OS await across all I/O types; RAID controller or SMART alerts | iostat -x 1, dmesg, RAID controller logs |
| Noisy neighbor / shared storage oversubscription | Latency spikes without a local workload change; common on virtualized or multi-tenant storage | Host-level I/O metrics, hypervisor storage latency, provider status page |
| NFS or network-backed storage latency | High journal sync on nodes using NFS or SAN for dbPath; network RTT correlates | mount options, nfsstat, network RTT between host and storage |
Quick checks
Run these read-only commands to confirm the symptom and narrow the cause.
# These fields are cumulative. Run this command twice, 5-10 seconds apart, and diff the values.
# Interval latency (ms) = (sync_usec_delta / ops_delta) / 1000
mongosh --quiet --eval '
var s = db.serverStatus().wiredTiger.log;
print("sync_usec=" + s["log sync time duration (usecs)"] + " ops=" + s["log sync operations"]);
'
# Check OS-level disk latency and utilization.
# Look for %util > 70 or await > 10 ms on the data/journal device.
iostat -x 1 5
# Check whether the node is primary.
mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " " + m.stateStr))'
# Verify disk topology: are journal and data on the same filesystem?
# Replace paths if your dbPath differs from the default.
df -h /var/lib/mongodb /var/lib/mongodb/journal 2>/dev/null || echo "Check dbPath and journal paths manually"
# Check WiredTiger checkpoint duration.
mongosh --quiet --eval '
var t = db.serverStatus().wiredTiger.transaction;
print("Last checkpoint: " + t["transaction checkpoint most recent time (msecs)"] + " ms");
'
# Check WiredTiger write ticket availability.
mongosh --quiet --eval '
var w = db.serverStatus().wiredTiger.concurrentTransactions.write;
print("Write tickets available: " + w.available + " / " + (w.out + w.available));
'
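The mongosh checks above each call db.serverStatus() separately. As a hedged sketch, the helper below pulls the same values out of a single serverStatus document so they come from one consistent snapshot. The name summarizeQuickChecks and the mock values are illustrative; the fallback to queues.execution is an assumption about newer server releases and should be verified against your version.

```javascript
// Extract the quick-check values from one db.serverStatus() document.
function summarizeQuickChecks(ss) {
  const log = ss.wiredTiger.log;
  const txn = ss.wiredTiger.transaction;
  // Assumption: if the concurrentTransactions path is absent, try
  // queues.execution; report nulls when neither is present.
  const section =
    ss.wiredTiger.concurrentTransactions || (ss.queues && ss.queues.execution) || {};
  const tickets = section.write;
  const out = tickets ? Number(tickets.out) : null;
  const available = tickets ? Number(tickets.available) : null;
  return {
    syncUsec: Number(log["log sync time duration (usecs)"]),
    syncOps: Number(log["log sync operations"]),
    lastCheckpointMs: Number(txn["transaction checkpoint most recent time (msecs)"]),
    writeTicketsAvailable: available,
    writeTicketsTotal: tickets ? out + available : null,
  };
}

// In mongosh: printjson(summarizeQuickChecks(db.serverStatus()))
// Mock document with made-up values, so the sketch also runs outside mongosh.
const mock = {
  wiredTiger: {
    log: { "log sync time duration (usecs)": 4900000, "log sync operations": 2020 },
    transaction: { "transaction checkpoint most recent time (msecs)": 850 },
    concurrentTransactions: { write: { out: 120, available: 8 } },
  },
};
console.log(summarizeQuickChecks(mock));
```

Two such summaries taken a few seconds apart give the deltas needed for the interval latency formula in the first comment block.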
How to diagnose it
- Establish the baseline deviation. Sample wiredTiger.log twice and compute the interval average. Compare it to the node’s historical baseline. A sustained jump from 2 ms to 50 ms is more significant than a steady 15 ms on a known slow disk.
- Confirm the impact scope. If the node is a primary, j:true and w:"majority" writes from all clients stall. If it is a secondary, replication oplog application stalls, which increases replication lag and may eventually trigger flow control on the primary.
- Correlate with OS I/O. Run iostat -x 1 and look at await and %util for the block device hosting dbPath. If await is high across all I/O types and not just journal sync, the disk itself is saturated or degraded.
- Check for checkpoint collision. Compare transaction checkpoint most recent time (msecs) with the journal sync spikes. If both spike together every 60 seconds, the shared device is contending between large checkpoint sequential writes and small journal fsyncs.
- Investigate cloud storage throttling. On AWS EBS gp2, check volume burst balance. On gp3 or provisioned IOPS volumes, check for IOPS throttling. On GCE PD, check IOPS balance metrics. Sudden latency jumps with no local configuration change often indicate cloud storage throttling.
- Verify disk topology. If <dbPath> and <dbPath>/journal reside on the same mount point, checkpoint writes and journal syncs compete for the same device queue. Separating them removes that collision, though it does not fix underlying storage saturation.
- Rule out hardware failure. Check dmesg, RAID controller status, and SMART metrics for predictive failure or rebuild activity. A degrading disk often shows elevated await before it shows errors.
- Check WiredTiger ticket availability. If available write tickets are near zero while journal sync is high, the storage stall has consumed all concurrency and operations are queuing behind the fsync.
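The checkpoint collision step comes down to asking whether journal sync spikes recur on the checkpoint cadence. A rough sketch of that test follows. Both helper names, the 5 second tolerance, and the three-spike minimum are assumptions chosen for illustration, and the input is a list of interval latency samples you have already collected.

```javascript
// Return the timestamps where latency first crosses the threshold
// (rising edges), so one long spike is counted once.
// samples: [{ t: <seconds>, syncMs: <interval latency in ms> }], oldest first.
function spikeStartTimes(samples, thresholdMs) {
  const starts = [];
  let above = false;
  for (const s of samples) {
    const isAbove = s.syncMs > thresholdMs;
    if (isAbove && !above) starts.push(s.t);
    above = isAbove;
  }
  return starts;
}

// True when every gap between spikes is close to a whole multiple of the
// checkpoint period. Multiples allow for a checkpoint that caused no spike.
function spikesMatchCheckpointPeriod(spikeTimesSec, periodSec = 60, toleranceSec = 5) {
  if (spikeTimesSec.length < 3) return false; // too few spikes to call it a pattern
  for (let i = 1; i < spikeTimesSec.length; i++) {
    const gap = spikeTimesSec[i] - spikeTimesSec[i - 1];
    const multiple = Math.max(1, Math.round(gap / periodSec));
    if (Math.abs(gap - multiple * periodSec) > toleranceSec) return false;
  }
  return true;
}

console.log(spikesMatchCheckpointPeriod([10, 71, 129, 190])); // true: gaps of 61, 58, 61 s
console.log(spikesMatchCheckpointPeriod([10, 35, 50, 140]));  // false: first gap is 25 s
```

A periodic pattern points at checkpoint contention on a shared device. Irregular spikes are more consistent with throttling, noisy neighbors, or hardware trouble.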
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| WiredTiger log sync time / log sync operations | Direct measure of journal fsync latency | Interval average >10 ms sustained; >100 ms is critical |
| opLatencies.writes | User-visible write latency | Rising 30 to 60 seconds after journal sync spikes |
| WiredTiger checkpoint duration | Long checkpoints compete for disk I/O with journal syncs | >30 ms and correlating with sync latency spikes |
| WiredTiger cache dirty ratio | Dirty data accumulation increases pressure on flush paths | >20% sustained |
| OS disk await | Underlying storage device saturation | >10 ms on SSD/NVMe sustained |
| Cloud volume burst balance | Credits fund IOPS above baseline on EBS/GCE PD | Depleting or at zero |
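A few of the thresholds in this table can be checked mechanically. The sketch below is one possible encoding, with assumed names throughout: evaluateSignals and its input field names are hypothetical, it evaluates a single sample (deciding whether a condition is sustained is left to the caller), and it leaves out the checkpoint and opLatencies rows because those depend on correlation over time. The two wiredTiger.cache field names are the standard serverStatus statistics.

```javascript
// Cache dirty ratio (percent) from db.serverStatus().wiredTiger.cache.
function dirtyRatioPct(cache) {
  return (100 * Number(cache["tracked dirty bytes in the cache"])) /
    Number(cache["maximum bytes configured"]);
}

// Compare one sample of each signal against the table's thresholds.
// Fields that are missing from the input are simply not evaluated.
function evaluateSignals(m) {
  const findings = [];
  if (m.syncLatencyMs > 100) findings.push("critical: journal sync latency > 100 ms");
  else if (m.syncLatencyMs > 10) findings.push("warning: journal sync latency > 10 ms");
  if (m.dirtyRatioPct > 20) findings.push("warning: cache dirty ratio > 20%");
  if (m.diskAwaitMs > 10) findings.push("warning: disk await > 10 ms");
  if (m.burstBalancePct !== undefined && m.burstBalancePct <= 0) {
    findings.push("warning: burst balance at zero");
  }
  return findings;
}

// Made-up values for illustration: 256 of 1024 bytes dirty is 25%.
const cache = { "tracked dirty bytes in the cache": 256, "maximum bytes configured": 1024 };
console.log(evaluateSignals({
  syncLatencyMs: 45,
  dirtyRatioPct: dirtyRatioPct(cache),
  diskAwaitMs: 4,
}));
```

These fixed thresholds suit SSD or NVMe storage. On slower disks, comparing against a per-node baseline is usually more useful.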
Fixes
Immediate relief
Do not restart mongod to clear a journal sync stall. A restart forces journal replay and cache warmup, which adds I/O load and prolongs the outage.
- Throttle or pause non-critical writes. Stop batch jobs, disable analytics ingestion, or pause data migrations. Reducing write volume lowers the frequency of forced journal syncs.
- Step down a saturated primary. If the primary’s storage is degraded and secondaries have healthy disks, run rs.stepDown() to shift write load. This triggers a replica set election and the node will become secondary; only do this if your application handles failover gracefully.
- Kill unnecessary long-running writes. Use db.currentOp() to find operations that are amplifying write volume, then terminate them with db.killOp(). Target only known-safe operations; killing writes can leave data in an inconsistent application state.
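For the last step, a small filter over the db.currentOp() output helps shortlist candidates before anything is killed. This is a sketch only: longRunningWrites is a hypothetical helper, the 30 second cutoff in the usage comment is an arbitrary example, and writes reported with op "command" (for example findAndModify) are not matched.

```javascript
// Pick active write operations that have run at least minSecs from the
// inprog array returned by db.currentOp().
function longRunningWrites(inprog, minSecs) {
  const writeOps = new Set(["insert", "update", "remove"]);
  return inprog
    .filter(op => op.active !== false && writeOps.has(op.op) &&
      Number(op.secs_running) >= minSecs)
    .map(op => ({ opid: op.opid, op: op.op, ns: op.ns, secs_running: Number(op.secs_running) }));
}

// In mongosh: longRunningWrites(db.currentOp().inprog, 30).forEach(o => printjson(o))
// Review each entry before calling db.killOp(<opid>); nothing here kills anything.

// Made-up entries so the sketch also runs outside mongosh.
const sampleOps = [
  { opid: 101, active: true, op: "update", ns: "app.events", secs_running: 95 },
  { opid: 102, active: true, op: "query", ns: "app.users", secs_running: 200 },
  { opid: 103, active: true, op: "insert", ns: "app.logs", secs_running: 2 },
];
console.log(longRunningWrites(sampleOps, 30)); // only opid 101 qualifies
```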
Storage-layer remediation
- Scale cloud storage baseline IOPS. On AWS, move from gp2 to gp3 and provision higher baseline IOPS, or increase gp2 volume size to raise its baseline. On GCE PD, increase provisioned IOPS or move to Hyperdisk. This addresses burst credit depletion or baseline saturation permanently.
- Separate journal and data onto independent devices. Mount <dbPath>/journal on a dedicated, low-latency block device. This isolates small sequential journal fsyncs from large checkpoint sequential writes. This is a provisioning change that typically requires a rolling restart to reconfigure mount points or symlinks.
- Replace failing hardware. If dmesg or RAID logs show predictive failure, replace the disk or migrate the node.
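To see why a larger gp2 volume or a move to provisioned baseline IOPS helps, it is useful to estimate how long burst credits last under sustained demand. The sketch below uses the gp2 figures as AWS commonly documents them (3 IOPS per GiB baseline with a floor of 100 and a ceiling of 16,000, burst to 3,000 IOPS, 5.4 million credits in a full bucket). Treat those constants as assumptions to verify against current AWS documentation; the function names are illustrative.

```javascript
// gp2 baseline IOPS for a volume size, under the assumed constants above.
function gp2BaselineIops(sizeGiB) {
  return Math.min(16000, Math.max(100, 3 * sizeGiB));
}

// Seconds until a full credit bucket drains at a sustained demand.
// Returns Infinity when demand does not exceed the baseline.
function gp2BurstSeconds(sizeGiB, demandIops) {
  const baseline = gp2BaselineIops(sizeGiB);
  // Credits refill at the baseline rate and are spent at the demand rate,
  // with demand capped at the assumed 3000 IOPS burst ceiling.
  const drainPerSec = Math.min(demandIops, 3000) - baseline;
  return drainPerSec > 0 ? 5400000 / drainPerSec : Infinity;
}

console.log(gp2BaselineIops(100));        // 300
console.log(gp2BurstSeconds(100, 3000));  // 2000 seconds, about 33 minutes
console.log(gp2BurstSeconds(1000, 3000)); // Infinity: baseline is already 3000
```

Under these assumptions a 100 GiB volume sustains a 3,000 IOPS write burst for roughly half an hour before dropping to 300 IOPS. That matches the pattern of a sudden latency jump with flat write volume.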
Application-layer tradeoffs
- Reducing write concern is dangerous. Switching from w:"majority" to w:1 or j:false will bypass journal sync stalls, but it also removes durability guarantees. Acknowledged writes can be lost on failover. Do not make this change during an incident unless the business accepts data loss.
- Adjusting commitIntervalMs is not a fix. The valid range is version-dependent. Lowering it increases sync frequency, which can worsen the stall on already-saturated storage. Raising it reduces durability granularity but does not reduce the fsync cost of j:true writes.
Prevention
- Baseline journal sync latency per node. Establish per-node baselines during normal peak load. Alert on deviation (for example, 3x baseline for 2 minutes) rather than fixed thresholds, because HDD-backed nodes have different baselines than NVMe.
- Size storage for sustained IOPS, not burst. Cloud burst credits are for short spikes. If sustained write throughput depletes credits, provision enough baseline IOPS to cover the peak.
- Monitor dirty ratio and checkpoint duration. A rising dirty ratio often precedes journal sync pressure because checkpoint flushes compete with the same device. Trend both together.
- Review disk topology during provisioning. Plan for <dbPath>/journal on a dedicated device if the workload is write-heavy and uses j:true or w:"majority" by default.
- Trend replication lag with journal sync. On secondaries, journal sync latency directly caps oplog application throughput. A secondary with slower disks than the primary will drift during bursts.
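The baseline-relative alert suggested in the first bullet (for example, 3x baseline for 2 minutes) could be evaluated along these lines. This is a sketch with assumed names and an assumed input shape. It treats the condition as met only when every sample in the window exceeds the multiple and the history actually covers the whole window.

```javascript
// samples: [{ t: <seconds>, syncMs: <interval latency in ms> }], oldest first.
// Returns true when latency stayed above factor * baseline for windowSec.
function sustainedDeviation(samples, baselineMs, factor = 3, windowSec = 120) {
  if (samples.length === 0) return false;
  const latest = samples[samples.length - 1].t;
  const windowStart = latest - windowSec;
  // Without history reaching back to the window start, "sustained" is unproven.
  if (samples[0].t > windowStart) return false;
  return samples
    .filter(s => s.t >= windowStart)
    .every(s => s.syncMs > factor * baselineMs);
}

// Made-up series: one sample every 10 s for 130 s, all at 8 ms,
// against a 2 ms baseline (alert threshold 6 ms).
const series = Array.from({ length: 14 }, (_, i) => ({ t: i * 10, syncMs: 8 }));
console.log(sustainedDeviation(series, 2)); // true
```

Requiring every sample in the window to exceed the threshold keeps a single transient spike from firing the alert, at the cost of reacting up to one window later.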
How Netdata helps
- Netdata collects wiredTiger.log metrics per second and computes the interval journal sync latency, so you do not need to diff cumulative counters manually during an incident.
- The MongoDB collector exposes journal sync latency alongside WiredTiger cache dirty ratio, checkpoint duration, and write ticket utilization on the same dashboard. This correlates storage saturation with cache pressure without switching contexts.
- Per-second granularity captures transient spikes that minute-resolution aggregations miss, preserving the 30 to 60 second lead time this signal provides.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB monitoring checklist: the signals every production cluster needs
- MongoDB monitoring maturity model: from survival to expert
- MongoDB noTimeout cursors causing cache pressure: pinned snapshots and silent eviction stalls







