MongoDB replication lag: detection, diagnosis, and fixes

Replication lag is the delay between the primary’s latest oplog entry and a secondary’s last applied entry. When lag exceeds the oplog window, the entries the secondary still needs have been overwritten, so it cannot catch up and requires a full initial sync, which can take hours to days. Until that happens, every second of lag erodes failover safety and read consistency.

What this means

MongoDB replicates by having secondaries tail the primary’s oplog, a capped collection in local.oplog.rs. The primary records every write with a timestamp; the secondary fetches entries and applies them locally. Lag is the delta between the primary’s optimeDate and the secondary’s optimeDate from rs.status().

Replication has two phases. Fetch reads the oplog from the sync source. Apply executes those operations on the secondary. Lag alone does not tell you which phase is bottlenecked. If fetch is slow, the replication buffer stays close to empty and the applier waits for work. If apply is slow, the buffer fills because it drains slower than entries arrive. Diagnosing the wrong phase leads to the wrong fix.

Fixed-second thresholds are misleading. A lag of 30 seconds is harmless when the oplog window spans 72 hours, but critical when the window is 5 minutes. The more reliable signal is lag as a fraction of the oplog window. If lag exceeds roughly 25% of the window, the secondary is at risk. Members configured with secondaryDelaySecs are intentionally delayed and must be excluded from lag alerting.
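The fraction-of-window check can be sketched as a pair of small functions. This is a minimal illustration, not a monitoring implementation: the function names are hypothetical, the inputs are plain numbers you supply (for example the optimeDate delta from rs.status() and the timeDiff value from db.getReplicationInfo()), and the 25% default is the rough guideline above.

```javascript
// Sketch: express replication lag as a fraction of the oplog window.
// Both arguments are plain numbers of seconds supplied by the caller.
function lagFraction(lagSeconds, oplogWindowSeconds) {
  if (!(oplogWindowSeconds > 0)) {
    throw new Error('oplog window must be a positive number of seconds');
  }
  // Clock skew can make the delta slightly negative; treat that as zero lag.
  return Math.max(0, lagSeconds) / oplogWindowSeconds;
}

// riskThreshold defaults to the rough 25% guideline (an assumption you can tune).
function isAtRisk(lagSeconds, oplogWindowSeconds, riskThreshold = 0.25) {
  return lagFraction(lagSeconds, oplogWindowSeconds) > riskThreshold;
}

// The same 30 seconds of lag against a 72-hour window and a 5-minute window.
console.log(lagFraction(30, 72 * 3600)); // a tiny fraction of the window
console.log(lagFraction(30, 5 * 60));    // 0.1, already 10% of the window
```

Alerting on the fraction instead of raw seconds keeps one rule valid across clusters with very different oplog sizes.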

flowchart TD
    A[Replication lag growing] --> B{Is secondary disk await high?}
    B -->|Yes| C[Apply bottleneck: storage or cache pressure]
    B -->|No| D{Is primary replication traffic elevated or network RTT high?}
    D -->|Yes| E[Fetch bottleneck: network or primary load]
    D -->|No| F[Check blocking ops and oplog window shrink]
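
The same decision order can be written as a small function, which is convenient for runbooks or alert annotations. This is only a sketch: the function and field names are hypothetical, and both inputs are booleans you derive from your own thresholds.

```javascript
// Sketch of the triage flowchart. Inputs are booleans computed elsewhere:
//   diskAwaitHigh        - secondary disk await is above your threshold
//   replTrafficOrRttHigh - primary replication traffic or network RTT is elevated
function triageLag({ diskAwaitHigh, replTrafficOrRttHigh }) {
  if (diskAwaitHigh) {
    return 'apply bottleneck: storage or cache pressure';
  }
  if (replTrafficOrRttHigh) {
    return 'fetch bottleneck: network or primary load';
  }
  return 'check blocking ops and oplog window shrink';
}

console.log(triageLag({ diskAwaitHigh: false, replTrafficOrRttHigh: true }));
```

Disk await is checked first, mirroring the flowchart, so a node with both symptoms is treated as an apply bottleneck.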

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Secondary storage bottleneck | Apply rate sustained below primary write rate; disk await elevated | iostat -x 1 on the secondary |
| WiredTiger cache pressure | pages evicted by application threads increasing; dirty ratio climbing; checkpoint duration rising | db.serverStatus().wiredTiger.cache on the secondary |
| Network fetch bottleneck | Lag spikes on geo-distributed secondaries; primary bytesOut elevated | rs.status() member latency and primary network stats |
| Bulk write surge | Oplog window shrinking during batch jobs; primary opcounters spike | rs.printReplicationInfo() and primary write rate |
| Blocking operations on secondary | Long-running operations in db.currentOp() holding tickets or locks | db.currentOp({ active: true, secs_running: { $gt: 10 } }) |
| Intentionally delayed member | Lag equals the configured delay, especially when the primary is idle | Replica set member configuration for secondaryDelaySecs |

Quick checks

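// 1. Lag per secondary in seconds
var s = rs.status();
var p = s.members.find(m => m.stateStr === 'PRIMARY');
// Without a visible primary there is no reference optime to compare against.
if (!p) print('No primary visible from this member; lag cannot be computed');
s.members.filter(m => p && m.stateStr === 'SECONDARY').forEach(x => {
  print(x.name + ' lag: ' + ((p.optimeDate - x.optimeDate) / 1000) + 's');
});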
// 2. Human-readable lag summary
rs.printSecondaryReplicationInfo();
// 3. Oplog window and size
rs.printReplicationInfo();
// 4. Secondary apply rate and batch timing
var r = db.serverStatus().metrics.repl.apply;
print('Applied ops: ' + r.ops);
print('Batches: ' + r.batches.num + ', total ms: ' + r.batches.totalMillis);
// 5. Primary write rate (sample twice, 10s apart)
var a = db.serverStatus().opcounters;
sleep(10000);
var b = db.serverStatus().opcounters;
for (var k in a) print(k + ': ' + ((b[k] - a[k]) / 10) + '/s');
// 6. Cache pressure on secondary
var c = db.serverStatus().wiredTiger.cache;
print('App-thread evictions: ' + c['pages evicted by application threads']);
print('Dirty ratio: ' + (100 * c['tracked dirty bytes in the cache'] / c['maximum bytes configured']).toFixed(1) + '%');
// 7. Long-running operations on secondary
db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog.forEach(o => {
  print(o.opid + ' | ' + o.secs_running + 's | ' + o.ns);
});
// 8. Flow control status on primary
var f = db.serverStatus().flowControl;
print('isLagged: ' + f.isLagged + ' | targetRateLimit: ' + f.targetRateLimit);
# Checks 9 and 10 run in a system shell on the secondary host, not in mongosh.
# 9. Secondary disk health
iostat -x 1 5
# 10. Replication stalls in MongoDB log
grep -i "REPL" /var/log/mongodb/mongod.log | tail -20

How to diagnose it

  1. Confirm the lag is real. An idle primary produces phantom lag equal to the time since the last write. If the primary has active writes and the secondary’s optimeDate is not advancing, the lag is genuine.
  2. Express lag as a fraction of the oplog window. Use rs.printReplicationInfo() to get the window in seconds. Divide lag by the window. If the result exceeds 25%, the secondary is in the danger zone.
  3. Determine whether the bottleneck is fetch or apply. If the secondary’s disk await is high and its apply rate is low, the bottleneck is apply. If network RTT is high and the primary’s replication bytesOut is saturated, the bottleneck is fetch.
  4. Check secondary storage health. Run iostat -x 1. Sustained await above 30 ms or %util near 100% means the disk cannot keep up with oplog application.
  5. Check WiredTiger cache pressure. On the secondary, a dirty ratio above 10% or any sustained increase in pages evicted by application threads means cache pressure is stealing threads from replication.
  6. Check flow control on the primary. If isLagged is true, the primary is throttling writes to protect secondaries. Do not disable flow control. Treat it as confirmation that the secondary bottleneck is severe enough to threaten the oplog window.
  7. Identify blocking operations on the secondary. Long-running aggregations, large transactions, or index builds hold tickets and snapshots. Use db.currentOp() to find them.
  8. Inspect the MongoDB log. Look for replication-stage warnings and slow application entries on the secondary that correlate with the lag increase.
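
The rate comparison behind step 3 can be sketched from two samples taken a fixed interval apart. The helper names are hypothetical. The inputs are assumed to be db.serverStatus().opcounters from the primary and db.serverStatus().metrics.repl.apply.ops from the secondary. The comparison is approximate, because one write operation can produce more than one oplog entry.

```javascript
// Sketch: compare the secondary's apply rate with the primary's write rate.
// Counters go through Number() in case the shell returns Long values.
function writeRate(opsBefore, opsAfter, seconds) {
  // Sum the write opcounters; reads and commands do not generate oplog entries.
  const writes = (o) => Number(o.insert) + Number(o.update) + Number(o.delete);
  return (writes(opsAfter) - writes(opsBefore)) / seconds;
}

function applyRate(appliedBefore, appliedAfter, seconds) {
  return (Number(appliedAfter) - Number(appliedBefore)) / seconds;
}

// True when the secondary applied fewer ops per second than the primary wrote.
function isFallingBehind(primaryWriteRate, secondaryApplyRate) {
  return secondaryApplyRate < primaryWriteRate;
}

// Example with made-up samples taken 10 seconds apart.
const w = writeRate({ insert: 0, update: 0, delete: 0 },
                    { insert: 100, update: 50, delete: 50 }, 10);
const r = applyRate(1000, 1100, 10);
console.log(w, r, isFallingBehind(w, r));
```

A single sample pair is noisy. Treat the result as a signal only when it holds across several consecutive intervals.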

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Lag as fraction of oplog window | Determines how close the secondary is to falling off the oplog | Exceeds 25% of the window |
| Secondary apply rate vs primary write rate | A sustained mismatch guarantees the secondary will eventually require a resync | Apply rate below write rate for more than 10 minutes |
| WiredTiger application-thread evictions | Eviction work on application threads reduces CPU available for oplog application | Any sustained increase above zero |
| Oplog window | Safety margin for secondary downtime and catch-up | Drops below 12 hours, or below 2x your longest expected maintenance window |
| Flow control isLagged | Indicates the primary is throttling writes to prevent window collapse | True with growing timeAcquiringMicros |
| Secondary disk await | Direct measure of storage health limiting apply throughput | Exceeds 30 ms sustained, or 2x baseline |
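
Three of these warning signs reduce to simple arithmetic on one sample, sketched below. The function and the sample field names are hypothetical and should be mapped from whatever your collector reports. The sketch evaluates a single point in time, so the "sustained" conditions still need repeated samples.

```javascript
// Sketch: evaluate three of the table's warning signs against one sample.
function warningSigns(sample) {
  const warnings = [];
  // Lag as a fraction of the oplog window.
  if (sample.lagSeconds / sample.oplogWindowSeconds > 0.25) {
    warnings.push('lag exceeds 25% of the oplog window');
  }
  // Oplog window floor: 12 hours, or 2x the longest maintenance window.
  const windowHours = sample.oplogWindowSeconds / 3600;
  if (windowHours < Math.max(12, 2 * sample.longestMaintenanceHours)) {
    warnings.push('oplog window below safety floor');
  }
  // Disk await: above 30 ms, or above 2x the recorded baseline.
  if (sample.diskAwaitMs > 30 || sample.diskAwaitMs > 2 * sample.baselineAwaitMs) {
    warnings.push('secondary disk await elevated');
  }
  return warnings;
}

console.log(warningSigns({
  lagSeconds: 7200, oplogWindowSeconds: 6 * 3600,
  longestMaintenanceHours: 2, diskAwaitMs: 45, baselineAwaitMs: 8
}));
```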

Fixes

If the secondary cannot keep up (apply bottleneck)

Kill or redirect long-running reads and heavy aggregations on the secondary that compete for tickets and I/O. Use db.currentOp() to identify them and db.killOp() to abort.

Warning: db.killOp() aborts the operation immediately. Client requests will fail and may need retry logic.
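
Before killing anything, it helps to narrow the db.currentOp() output to client operations that are safe candidates. The filter below is a heuristic sketch, not a definitive rule. The function name is hypothetical. It assumes client-initiated operations carry a client field, and it skips anything reading the local database so that oplog readers are left alone.

```javascript
// Sketch: pick opids worth reviewing from an array of currentOp entries.
function killCandidates(inprog, minSeconds = 10) {
  return inprog
    .filter((op) =>
      op.active === true &&
      op.secs_running > minSeconds &&
      typeof op.client === 'string' &&        // skip internal threads with no client
      !(op.ns || '').startsWith('local.'))    // skip oplog and other local.* readers
    .map((op) => op.opid);
}

// Example with made-up entries shaped like db.currentOp().inprog.
console.log(killCandidates([
  { opid: 1, active: true, secs_running: 120, ns: 'shop.orders', client: '10.0.0.5:51000' },
  { opid: 2, active: true, secs_running: 300, ns: 'local.oplog.rs', client: '10.0.0.6:51000' },
  { opid: 3, active: true, secs_running: 500, ns: 'shop.orders' }
]));
```

In mongosh the input would be db.currentOp({ active: true }).inprog. Review the returned opids by hand before passing any of them to db.killOp().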

If the secondary serves production read traffic, temporarily shift read preference to other nodes. Tradeoff: the remaining secondaries absorb more load, which can cascade lag if they are also near capacity.

Scale the secondary’s storage IOPS or CPU if it is under-provisioned relative to the primary. Tradeoff: cost and lead time. Secondaries must replay every write the primary accepts, often while serving reads, so they can be as I/O intensive as the primary or more.

If WiredTiger cache pressure is the root cause, increase the cache size if it is undersized, or reduce the working set. See MongoDB WiredTiger cache pressure cascade and MongoDB cache too small. Tradeoff: more memory or workload changes.

If the network is the bottleneck (fetch bottleneck)

Check whether replication traffic competes with backups or cross-region traffic on the same network link. Isolate replication to dedicated bandwidth if possible.

In geo-distributed clusters, baseline lag includes at minimum one half of the network RTT. If lag is stable at that baseline, it is expected. If it grows beyond that baseline, investigate packet loss or bandwidth saturation.

If write volume is collapsing the oplog window

Reduce write throughput immediately. Pause batch imports, throttle bulk operations, or defer large index builds. Tradeoff: slower data ingestion.

Resize the oplog on each replica set member if you run MongoDB 3.6 or later (size is in megabytes):

db.adminCommand({ replSetResizeOplog: 1, size: <MB> });

Size the oplog to maintain at least 72 hours of coverage at peak write volume. Tradeoff: more disk consumption and longer initial sync times.
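
A rough sizing calculation, assuming you have measured how many gigabytes of oplog the primary generates per hour at peak. The function name and the headroom multiplier are hypothetical. The ratio of usedMB to timeDiffHours from db.getReplicationInfo() gives an average rate, not a peak, so measure during your heaviest write period.

```javascript
// Sketch: estimate the oplog size (in MB) needed to cover a target window.
//   peakOplogGbPerHour - measured oplog growth at peak write volume (assumed input)
//   targetHours        - desired coverage, 72 hours per the guideline above
//   headroom           - hypothetical safety multiplier for growth and bursts
function requiredOplogMb(peakOplogGbPerHour, targetHours = 72, headroom = 1.2) {
  const mb = peakOplogGbPerHour * 1024 * targetHours * headroom;
  // replSetResizeOplog rejects sizes below 990 MB.
  return Math.max(990, Math.ceil(mb));
}

// 0.5 GB/hour at peak, 72 hours of coverage, 20% headroom.
console.log(requiredOplogMb(0.5)); // 44237
```

The result is the value to pass as size to replSetResizeOplog, after confirming that each member has the disk space for it.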

If a secondary has already fallen off the oplog

The only recovery is a full initial sync. Stop mongod on the secondary, wipe its data directory, and restart. The node will rejoin the replica set and begin syncing from a sync source, which is not necessarily the primary.

Warning: this destroys all data on the secondary. The cluster operates with reduced redundancy for hours to days, depending on data size and network speed. Do not attempt to write to the oplog manually.

Prevention

  • Monitor lag as a fraction of the oplog window. Fixed-second thresholds mislead because a 30-second lag is harmless when the window spans days, but critical when it spans minutes.
  • Exclude delayed members from lag alerts. Members with secondaryDelaySecs are intentionally behind and will report lag equal to their configured delay.
  • Match secondary storage to primary IOPS. Secondaries must replay every write the primary accepts, often while serving reads, so they can be as I/O intensive as the primary or more.
  • Watch cache dirty ratio and application-thread evictions on secondaries. Cache pressure steals threads from oplog application.
  • Trend the oplog window monthly. Write volume and document size growth shrink the window over time.
  • Do not disable flow control. Flow control protects the cluster from oplog window collapse. Fix the secondary bottleneck instead.
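
Excluding delayed members from alerting can be sketched by joining the member list from rs.status() with the configuration from rs.conf(). The function name is hypothetical, and the sketch assumes each status member's name matches the host of the corresponding configuration member.

```javascript
// Sketch: return the secondaries that should be covered by lag alerts.
//   statusMembers - the members array from rs.status()
//   configMembers - the members array from rs.conf()
function alertableMembers(statusMembers, configMembers) {
  // Hosts with a non-zero secondaryDelaySecs are intentionally behind.
  const delayed = new Set(
    configMembers
      .filter((m) => Number(m.secondaryDelaySecs || 0) > 0)
      .map((m) => m.host)
  );
  return statusMembers.filter(
    (m) => m.stateStr === 'SECONDARY' && !delayed.has(m.name)
  );
}

// Example with made-up members: c is delayed by one hour and is skipped.
console.log(alertableMembers(
  [{ name: 'a:27017', stateStr: 'PRIMARY' },
   { name: 'b:27017', stateStr: 'SECONDARY' },
   { name: 'c:27017', stateStr: 'SECONDARY' }],
  [{ host: 'a:27017' },
   { host: 'b:27017', secondaryDelaySecs: 0 },
   { host: 'c:27017', secondaryDelaySecs: 3600 }]
).map((m) => m.name));
```

In mongosh the call would be alertableMembers(rs.status().members, rs.conf().members).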

How Netdata helps

  • Netdata charts replication lag alongside the oplog window on the same dashboard, making the lag fraction visible without manual calculation.
  • It correlates secondary apply rates with primary write rates, surfacing a sustained mismatch before lag grows large.
  • WiredTiger cache dirty ratio, application-thread evictions, and secondary disk latency appear on unified timelines, so you can see whether lag is caused by cache pressure or storage saturation.
  • Flow control status and primary write latency are correlated automatically, revealing when throttling is impacting application throughput even though rs.status() shows low lag.