MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints

You notice transaction checkpoint most recent time (msecs) climbing past 10 seconds, then 30, then 50. It is trending upward, check after check, approaching the 60-second default checkpoint interval. When checkpoint duration meets or exceeds the interval, WiredTiger has no margin left. The next checkpoint starts late, dirty pages accumulate faster than they flush, and the journal can fill to the point where all new writes block until the checkpoint finishes. This is a common production failure mode that starts as a slow climb and ends as a write freeze.

Checkpoint duration is a direct read on your storage subsystem’s ability to absorb large sequential write bursts, but it is not only an I/O signal. Compression CPU, cache dirty pressure, and version-specific behavioral changes can all extend checkpoint time. A 55-second checkpoint every 60 seconds looks stable on a dashboard but leaves only 5 seconds of headroom. Additional dirty page pressure, a RAID rebuild, or burst-credit exhaustion on cloud storage will push it over.

What this means

WiredTiger checkpoints flush dirty pages from the in-memory cache to disk. By default, this happens every 60 seconds. MongoDB versions before 3.6 also triggered a checkpoint once 2 GB of journal data had been written; current versions rely on the time interval. Checkpoint duration measures how long this flush takes.

Treat the 60-second interval as a fixed time budget that each checkpoint spends. If a checkpoint takes 20 seconds, there are 40 seconds of idle time before the next one. If it takes 55 seconds, there are 5 seconds of idle time. If it takes 65 seconds, the next scheduled checkpoint is already overdue when the previous one finishes. Dirty data that should have been flushed accumulates in the cache, and the dirty ratio rises. If the dirty ratio hits eviction_dirty_trigger (default 20%), WiredTiger pulls application threads into eviction, which throttles writers and adds disk writes that compete with the checkpoint. In the extreme case, the journal cannot recycle space until the checkpoint completes, and WiredTiger blocks all writes until journal space is available. Reads may still work from cache, but writes freeze completely.
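The headroom arithmetic above can be written down as a small helper. This is a minimal sketch, assuming the default 60-second interval; the function name and the returned field names are illustrative, not part of MongoDB.

```javascript
// Sketch: idle headroom left between checkpoints.
// CHECKPOINT_INTERVAL_MS assumes the default 60-second interval described above.
const CHECKPOINT_INTERVAL_MS = 60_000;

// durationMs: "transaction checkpoint most recent time (msecs)" from serverStatus.
function checkpointHeadroom(durationMs, intervalMs = CHECKPOINT_INTERVAL_MS) {
  const headroomMs = intervalMs - durationMs;
  return {
    headroomMs,                                   // negative when the checkpoint overran
    utilizationPct: (100 * durationMs) / intervalMs,
    overdue: headroomMs < 0,                      // next checkpoint is already late
  };
}

// The three cases from the text: 20 s, 55 s, and 65 s checkpoints.
for (const ms of [20_000, 55_000, 65_000]) {
  console.log(ms, checkpointHeadroom(ms));
}
```

A utilization figure near 100% is the "stable on a dashboard, no margin in practice" state, even though nothing has failed yet.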

The climb is usually gradual. Storage degrades slowly, working sets grow, or compression throughput plateaus. Alerting only when checkpoint duration exceeds 60 seconds pages you after writes have frozen. Watching the trend from 8 seconds to 25 seconds gives time to act.
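One way to alert on the trend rather than the threshold is to fit a line through recent samples and estimate when it would reach the interval. The sketch below uses an ordinary least-squares slope; the sample shape ({ tSec, durationMs }) and the function name are assumptions for illustration, and a straight-line projection is only a rough guide because real degradation is rarely linear.

```javascript
// Sketch: least-squares slope of checkpoint duration over time, plus a naive
// straight-line estimate of when duration would reach the checkpoint interval.
// samples: [{ tSec, durationMs }] ordered by time (hypothetical shape).
function checkpointTrend(samples, intervalMs = 60_000) {
  const n = samples.length;
  if (n < 2) return null; // a trend needs at least two points
  const meanT = samples.reduce((sum, p) => sum + p.tSec, 0) / n;
  const meanD = samples.reduce((sum, p) => sum + p.durationMs, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.tSec - meanT) * (p.durationMs - meanD);
    den += (p.tSec - meanT) ** 2;
  }
  if (den === 0) return null; // all samples share one timestamp
  const slopeMsPerSec = num / den;
  const latestMs = samples[n - 1].durationMs;
  let secondsToInterval = null; // null means "not climbing"
  if (latestMs >= intervalMs) secondsToInterval = 0;
  else if (slopeMsPerSec > 0) secondsToInterval = (intervalMs - latestMs) / slopeMsPerSec;
  return { slopeMsPerSec, secondsToInterval };
}

// Example: 8 s, 16.5 s, 25 s sampled five minutes apart.
console.log(checkpointTrend([
  { tSec: 0, durationMs: 8000 },
  { tSec: 300, durationMs: 16500 },
  { tSec: 600, durationMs: 25000 },
]));
```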

flowchart TD
  A[Checkpoint duration climbing] --> B{Dirty ratio >15%?}
  B -->|Yes| C[Dirty pressure: eviction writes competing with checkpoints]
  B -->|No| D{Journal sync latency high?}
  D -->|Yes| E[Storage throughput degradation]
  D -->|No| F{CPU saturated during checkpoint?}
  F -->|Yes| G[Compression bottleneck]
  F -->|No| H[Check for RAID rebuild or cloud burst credit exhaustion]

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Storage throughput saturation or degradation | Checkpoint duration climbs steadily; journal sync latency rises in parallel; OS disk %util high | iostat -x 1 on the data volume |
| Cloud storage burst credit exhaustion | Checkpoint duration spikes suddenly after a period of normal operation; cloud disk metrics show credit balance at zero | Cloud provider disk throughput metrics |
| WiredTiger cache dirty ratio pressure | Dirty ratio sustained above 15-20%; eviction writes competing with checkpoint I/O; application-thread evictions incrementing | db.serverStatus().wiredTiger.cache |
| Compression CPU bottleneck | Checkpoint write throughput plateaus near 250-300 MB/s per core; CPU saturated during checkpoint windows; often reported after upgrading to MongoDB 5.0 | db.serverStatus().wiredTiger.transaction trends and per-core CPU usage |
| RAID rebuild or degraded physical disk | Checkpoint duration spikes correlate with storage controller events; disk utility shows a disk in degraded state | Storage controller and disk health logs |
| Container memory misconfiguration | Cache size calculated against host RAM instead of container limit; unexpected eviction pressure and dirty ratio rise despite low apparent cache fill | Container cgroup memory limit vs --wiredTigerCacheSizePct |

Quick checks

Run these read-only commands to characterize the current state.

# Check checkpoint duration and history
mongosh --quiet --eval 'db.serverStatus().wiredTiger.transaction'

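Look for transaction checkpoint most recent time (msecs), transaction checkpoint max time (msecs), and transaction checkpoints count. The time values are reported in milliseconds, and the max is the longest checkpoint since mongod started. If the most recent time is above 30,000 ms (30 seconds) and the max keeps climbing between samples, the trend is active.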

# Check cache dirty ratio
mongosh --quiet --eval '
var c = db.serverStatus().wiredTiger.cache;
var max = c["maximum bytes configured"];
var dirty = c["tracked dirty bytes in the cache"];
print("Dirty ratio: " + (100 * dirty / max).toFixed(1) + "%");
'

Dirty ratio greater than 15% is concerning; greater than 20% is critical.

# Check journal sync latency
mongosh --quiet --eval '
var wt = db.serverStatus().wiredTiger.log;
var syncTime = wt["log sync time duration (usecs)"];
var syncOps = wt["log sync operations"];
if (syncOps === 0) {
  print("No log sync operations recorded");
} else {
  print("Avg journal sync (ms): " + ((syncTime / syncOps) / 1000).toFixed(2));
}
'

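Sustained average greater than 30 ms indicates storage is struggling with small sequential writes. Both counters are cumulative since mongod started, so this figure is a lifetime average; on a long-running node a recent slowdown is diluted, and comparing two samples taken a minute apart gives a truer picture.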
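Because the log sync counters only ever grow, a windowed average computed from two samples reflects current storage behavior better than the lifetime average. The following is a minimal sketch: the stat names are the ones used in the check above, while the function name and the choice to return null on a counter reset are assumptions.

```javascript
// Sketch: average journal sync latency (ms) between two serverStatus samples.
// prev and curr are db.serverStatus().wiredTiger.log objects captured at
// different times. Number() guards against 64-bit wrapper types.
function windowedSyncLatencyMs(prev, curr) {
  const TIME = "log sync time duration (usecs)";
  const OPS = "log sync operations";
  const deltaUsecs = Number(curr[TIME]) - Number(prev[TIME]);
  const deltaOps = Number(curr[OPS]) - Number(prev[OPS]);
  // No syncs in the window, or counters went backwards after a restart.
  if (deltaOps <= 0 || deltaUsecs < 0) return null;
  return deltaUsecs / deltaOps / 1000;
}

// Example: lifetime average is under 2 ms, but the last window averaged 45 ms.
const before = { "log sync time duration (usecs)": 1000000, "log sync operations": 1000 };
const after = { "log sync time duration (usecs)": 1900000, "log sync operations": 1020 };
console.log(windowedSyncLatencyMs(before, after));
```

In mongosh, the two samples could be captured around a `sleep(60000)` call and passed to the function.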

# Check OS disk health
iostat -x 1 5

Focus on %util, await, and w_await for the data volume. Sustained %util near 100% with climbing await means the device is saturated.

# Check for application-thread evictions
mongosh --quiet --eval '
var c = db.serverStatus().wiredTiger.cache;
print("App-thread evictions: " + c["pages evicted by application threads"]);
'

Any sustained nonzero rate means background eviction cannot keep up. The printed value is a cumulative counter, so take two samples and compare them to get a rate.
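Turning the cumulative eviction counter into a per-second rate takes two samples and the elapsed time between them. This is an illustrative sketch; the helper name is hypothetical, and the commented usage assumes it is pasted into a mongosh session.

```javascript
// Sketch: per-second rate from two readings of a cumulative counter.
// Returns null when the inputs cannot produce a meaningful rate.
function counterRatePerSec(prevValue, currValue, elapsedSec) {
  if (elapsedSec <= 0) return null;
  const delta = Number(currValue) - Number(prevValue);
  // A negative delta means the counter reset, typically after a restart.
  return delta < 0 ? null : delta / elapsedSec;
}

// Usage inside mongosh (not run here):
//   const key = "pages evicted by application threads";
//   const a = db.serverStatus().wiredTiger.cache[key];
//   sleep(60000);
//   const b = db.serverStatus().wiredTiger.cache[key];
//   print("App-thread evictions/sec: " + counterRatePerSec(a, b, 60));
console.log(counterRatePerSec(1000, 1600, 60));
```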

# Check current operations for long-running writers
mongosh --quiet --eval '
db.currentOp({ active: true, secs_running: { $gt: 30 } }).inprog.forEach(function(op) {
  print(op.opid + " | " + op.op + " | " + op.secs_running + "s | " + op.ns);
});
'

Long-running transactions or bulk writes can pin dirty pages.

# Check replica set state if considering failover
mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " -> " + m.stateStr))'

Only proceed with a stepdown if another member is healthy and caught up.
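The "healthy and caught up" condition can be checked against the rs.status() document before stepping down. The sketch below is one possible reading of that rule: it uses the stateStr, health, and optimeDate fields that rs.status() reports per member, while the function name and the 10-second lag tolerance are assumptions to adjust for your cluster.

```javascript
// Sketch: list secondaries that look safe to take over as primary.
// status: the document returned by rs.status().
// maxLagSec: hypothetical tolerance for replication lag behind the primary.
function stepdownCandidates(status, maxLagSec = 10) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return []; // no primary to compare against
  return status.members
    .filter((m) => m.stateStr === "SECONDARY" && m.health === 1)
    // Subtracting Date objects yields milliseconds.
    .filter((m) => (primary.optimeDate - m.optimeDate) / 1000 <= maxLagSec)
    .map((m) => m.name);
}

// Usage inside mongosh (not run here):
//   printjson(stepdownCandidates(rs.status()));
```

An empty result means a stepdown would move writes to a lagging or unhealthy node, or leave the set without an electable member.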

How to diagnose it

  1. Confirm the symptom is a trend, not a spike. Sample transaction checkpoint most recent time (msecs) multiple times. Single long checkpoints happen during backups or bulk inserts. A climbing trend over 10+ minutes is the danger signal.

  2. Correlate with cache dirty ratio. If dirty ratio is rising with checkpoint duration, dirty pages are accumulating faster than they can be flushed. This points to either write throughput exceeding storage capacity or checkpoint frequency being too low for the dirty volume.

  3. Check journal sync latency. If journal sync latency is also elevated, the storage subsystem is the likely root cause. Journal writes are small sequential I/O and should be fast. If they are slow, large sequential checkpoint writes will be slower.

  4. Check OS-level disk metrics. Use iostat -x 1 to see if %util or w_await is high. If the disk is saturated, the fix is storage capacity, not MongoDB tuning. If the disk is not saturated but checkpoint duration is high, look at CPU saturation during checkpoints.

  5. Evaluate compression CPU. Check per-core CPU utilization during checkpoint windows. If one core is pegged and checkpoint write throughput is plateaued near 250-300 MB/s, Snappy compression is the bottleneck. This is a known issue reported after MongoDB 5.0 upgrades.

  6. Check whether the dirty trigger is being hit. If eviction_dirty_trigger (default 20%) is reached frequently, application threads are being pulled into eviction, and those eviction writes compete with the checkpoint for disk bandwidth. This accelerates the spiral.

  7. Identify the fix path. Storage saturation requires hardware or cloud provisioning changes. Compression bottlenecks require algorithm changes or version validation. Dirty pressure requires write throttling or eviction tuning.
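The steps above, in the order the flowchart takes them, can be sketched as a small triage function. The thresholds (dirty ratio above 15%, journal sync above 30 ms) come from this guide; the function name, input field names, and returned labels are illustrative only, and the result is a starting point for investigation rather than a verdict.

```javascript
// Sketch: first-pass triage for a climbing checkpoint duration.
// Inputs are values gathered with the quick checks:
//   dirtyRatioPct  - WiredTiger cache dirty ratio, in percent
//   journalSyncMs  - average journal sync latency, in milliseconds
//   cpuSaturated   - true if a core is pegged during checkpoint windows
function likelyCause({ dirtyRatioPct, journalSyncMs, cpuSaturated }) {
  if (dirtyRatioPct > 15) return "cache dirty pressure";
  if (journalSyncMs > 30) return "storage throughput degradation";
  if (cpuSaturated) return "compression CPU bottleneck";
  return "check RAID rebuild or cloud burst credit exhaustion";
}

console.log(likelyCause({ dirtyRatioPct: 6, journalSyncMs: 42, cpuSaturated: false }));
```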

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| transaction checkpoint most recent time (msecs) | Direct measure of flush speed vs the 60-second interval | >10 seconds sustained; >30 seconds critical |
| WiredTiger cache dirty ratio | Indicates how much unflushed data is waiting | >10% concerning; >20% critical |
| Journal sync latency | Leading indicator of storage health; precedes checkpoint stalls by 30-60 seconds | Average >30 ms sustained |
| Application-thread evictions | Shows cache pressure is already affecting user operations | Any sustained nonzero rate |
| pages selected for eviction unable to be evicted | Eviction is failing to free pages | Any positive value |
| Disk %util and w_await | Confirms storage is the bottleneck | %util >90% or w_await >20 ms |
| Per-core CPU during checkpoints | Reveals compression CPU saturation | Core pegged at 100% during checkpoint window |

Fixes

Storage throughput degradation or burst credit exhaustion

If iostat shows saturation or cloud metrics show depleted burst credits, reduce load on the primary immediately. Pause batch writes or throttle ingestion. If a healthy secondary exists, step down the primary to shift write load. Do not kill the checkpoint process. Let it complete.

For cloud volumes, switch to provisioned throughput. For local storage, ensure the journal and data files are on separate physical devices if possible. Avoid NFS for dbPath or journal paths.

Compression CPU bottleneck

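If per-core CPU is saturated during checkpoints and write throughput is plateaued near 250-300 MB/s, Snappy compression is the limiting factor. You can switch the block compressor to zstd or none, but the compressor is fixed when a collection is created: changing the default affects only collections created afterward, and existing collections keep their compressor until the data is copied into a newly created collection. This is a significant storage and CPU tradeoff. zstd typically uses more CPU than Snappy in exchange for higher compression ratios, so it is unlikely to relieve a CPU bottleneck; none removes compression CPU entirely at the cost of higher disk I/O.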
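A per-collection compressor is chosen at creation time through the storageEngine option of db.createCollection. The sketch below builds that options document; the helper name and the collection name events_v2 in the usage comment are hypothetical, and zstd requires MongoDB 4.2 or later.

```javascript
// Sketch: build createCollection options that select a WiredTiger block compressor.
const BLOCK_COMPRESSORS = ["snappy", "zlib", "zstd", "none"];

function collectionOptionsFor(compressor) {
  if (!BLOCK_COMPRESSORS.includes(compressor)) {
    throw new Error("Unsupported block compressor: " + compressor);
  }
  return {
    storageEngine: {
      wiredTiger: { configString: "block_compressor=" + compressor },
    },
  };
}

// Usage inside mongosh (not run here; "events_v2" is a placeholder name):
//   db.createCollection("events_v2", collectionOptionsFor("none"));
console.log(JSON.stringify(collectionOptionsFor("none")));
```

Test the change on a non-production node first, since removing compression trades CPU for disk footprint and write volume.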

If you are on MongoDB 5.0 and experiencing this after upgrading from 4.4, the issue has been reported as a version-specific regression. Validate whether a patch release addresses it in your version, or escalate to MongoDB support.

WiredTiger cache dirty pressure

If the dirty ratio is high but storage is healthy, reduce the volume of dirty pages each checkpoint must flush by tuning eviction_dirty_target and eviction_dirty_trigger downward. For example, lowering eviction_dirty_target to 2% and eviction_dirty_trigger to 10% causes WiredTiger to start flushing dirty pages earlier and more aggressively. The tradeoff is higher background I/O and CPU usage from eviction.
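The example values above can be applied at runtime through the wiredTigerEngineRuntimeConfig server parameter. This is a minimal sketch, assuming percentages are passed as whole numbers; the helper name and its validation rule (target below trigger) are illustrative. A change made this way does not survive a restart unless it is also placed in the server configuration.

```javascript
// Sketch: build the WiredTiger runtime config string for dirty eviction thresholds.
// Both arguments are percentages of the cache size.
function evictionRuntimeConfig(dirtyTargetPct, dirtyTriggerPct) {
  const valid =
    dirtyTargetPct > 0 && dirtyTargetPct < dirtyTriggerPct && dirtyTriggerPct < 100;
  if (!valid) {
    throw new RangeError("Expected 0 < target < trigger < 100");
  }
  return "eviction_dirty_target=" + dirtyTargetPct +
         ",eviction_dirty_trigger=" + dirtyTriggerPct;
}

// Usage inside mongosh (not run here):
//   db.adminCommand({
//     setParameter: 1,
//     wiredTigerEngineRuntimeConfig: evictionRuntimeConfig(2, 10),
//   });
console.log(evictionRuntimeConfig(2, 10));
```

Watch disk %util and application-thread evictions after the change; if background I/O rises without the dirty ratio falling, revert to the defaults (5% target, 20% trigger).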

Terminate any unnecessary long-running transactions or noCursorTimeout cursors that pin old snapshots and prevent dirty page eviction. Killing operations is disruptive; confirm with application owners first. Throttle write-heavy batch jobs if possible.

RAID rebuild or hardware degradation

If a disk in a RAID array is degraded or rebuilding, checkpoint duration will suffer until the rebuild completes. There is no software fix. Monitor the rebuild progress and consider stepping down the primary to a node with healthy storage if the cluster can tolerate the failover.

Prevention

Trend checkpoint duration weekly. A steady climb from 5 seconds to 25 seconds over a month indicates storage degradation or a working set outgrowing flush capacity.

Trend the WiredTiger cache dirty ratio alongside checkpoint duration. Dirty ratio is the earlier signal. Most checkpoint stalls are preceded by dirty ratio climbing past 10%.

Size cloud storage for sustained throughput, not just capacity. Burst credits are designed for burst workloads, not sustained checkpoint flushes.

If running in containers, set the WiredTiger cache size explicitly (--wiredTigerCacheSizeGB, or --wiredTigerCacheSizePct where your version provides it) so the cache is sized against the container memory limit rather than host RAM. A cache that is undersized relative to the working set increases dirty pressure.
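To see how far off a host-based calculation can be, compare the default cache size for the host against the default for the container limit. MongoDB's documented default is the larger of 50% of (RAM minus 1 GB) and 256 MB; the function name and the 64 GB and 8 GB figures below are illustrative assumptions.

```javascript
// Sketch: default WiredTiger cache size in GB for a given amount of RAM in GB.
// Default rule: the larger of 50% of (RAM - 1 GB) and 256 MB.
function defaultCacheGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

const hostRamGB = 64;       // hypothetical host
const containerLimitGB = 8; // hypothetical cgroup memory limit

console.log("Sized against host:      " + defaultCacheGB(hostRamGB) + " GB");
console.log("Sized against container: " + defaultCacheGB(containerLimitGB) + " GB");
```

If mongod reports a configured cache ("maximum bytes configured") close to the host figure while running under the smaller limit, set the cache size explicitly.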

Keep write concern at w:"majority" for critical data. If a checkpoint stall triggers a primary stepdown or a crash, w:1 writes accepted just before the stall are at risk of loss.

How Netdata helps

Netdata collects wiredTiger.transaction and wiredTiger.log metrics, allowing you to alert on checkpoint duration (warning at 10 seconds, critical at 30 seconds) and correlate it with cache dirty ratio, application-thread evictions, and journal sync latency on the same charts. OS disk metrics (%util, w_await) are collected alongside MongoDB storage engine metrics, so you can distinguish storage saturation from compression CPU or concurrency bottlenecks.

  • How MongoDB actually works in production: a mental model for operators: /guides/mongodb/how-mongodb-works-in-production/
  • MongoDB pages evicted by application threads: when eviction becomes user latency: /guides/mongodb/mongodb-application-thread-evictions/
  • MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches: /guides/mongodb/mongodb-cache-dirty-ratio-high/
  • MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes: /guides/mongodb/mongodb-cache-pressure-cascade/
  • MongoDB cache too small: sizing the WiredTiger cache for your working set: /guides/mongodb/mongodb-cache-undersized-working-set/
  • MongoDB monitoring checklist: the signals every production cluster needs: /guides/mongodb/mongodb-monitoring-checklist/
  • MongoDB monitoring maturity model: from survival to expert: /guides/mongodb/mongodb-monitoring-maturity-model/
  • MongoDB noTimeout cursors causing cache pressure: pinned snapshots and silent eviction stalls: /guides/mongodb/mongodb-notimeout-cursors-cache-pressure/