MongoDB oplog window too small: sizing the oplog for your write volume

The oplog window is the only thing standing between a routine secondary restart and a multi-hour full initial sync. It is a fixed-size capped collection that stores a variable amount of history. As your write volume grows, the window compresses. Most teams size the oplog once during initial deployment and never look at it again. Six months later, a routine maintenance window turns into an incident because the secondary fell off the oplog, entered RECOVERING, and forced a resync that saturated the remaining nodes.

This guide explains how the oplog window works, how to size it for your actual write volume, and how to resize it safely without restarting MongoDB.

What the oplog window is and why it matters

MongoDB replicates by having secondaries tail the primary’s operation log. The oplog lives in local.oplog.rs on every replica set member. It is a capped collection with a hard byte limit. When new entries arrive and the collection hits that limit, MongoDB evicts the oldest entries to make room.

The oplog window is the time span between the oldest and newest entry in the oplog. You can inspect it with rs.printReplicationInfo() or programmatically with db.getReplicationInfo(). The timeDiff field reports that span in seconds and timeDiffHours reports it in hours. Either one tells you how far behind a secondary can fall before the primary has overwritten the entries it still needs.

If a secondary is offline, partitioned, or lagging for longer than the oplog window, it cannot catch up incrementally. It enters RECOVERING and must perform a full initial sync. During that sync, the remaining secondaries absorb more load. If the cluster was already near its oplog limit, losing one member can push a second secondary over the edge. The failure is binary and sudden.

How write volume compresses the window

Because the oplog is capped by bytes, not by time, the window is inversely proportional to write velocity. Higher throughput means faster turnover, which means fewer hours of history fit inside the same allocation.
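The inverse relationship can be sketched with a few lines of arithmetic. This is a rough model, not how MongoDB computes the window internally: it assumes a steady oplog generation rate, and both the function name estimateWindowHours and its inputs are hypothetical values you would measure yourself.

```javascript
// Rough estimate: hours of history that fit in a byte-capped oplog.
// oplogSizeMB and oplogMBPerHour are assumed inputs, not MongoDB fields.
function estimateWindowHours(oplogSizeMB, oplogMBPerHour) {
  if (oplogMBPerHour <= 0) {
    return Infinity; // no writes means nothing is ever evicted
  }
  return oplogSizeMB / oplogMBPerHour;
}

// The same 16000 MB oplog at three different write rates.
[100, 400, 1600].forEach(function (rate) {
  var hours = estimateWindowHours(16000, rate).toFixed(1);
  console.log(rate + " MB/h -> " + hours + " h of history");
});
```

Quadrupling the write rate cuts the window to a quarter, with no change to the configured size.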

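Not all operations consume oplog space equally. Bulk updates, multi-document deletes, and large transactions generate disproportionately large amounts of oplog data relative to the net data change, because a multi-document update or delete is logged as one entry per affected document. A multi-document transaction is written as a single entry in MongoDB 4.0 and, from 4.2 onward, as as many entries as it needs, so one large commit can consume a big contiguous slice of the oplog. A migration script that updates every document in a collection can shrink the window from days to hours in minutes.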
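To get a feel for how much of the oplog a bulk job can consume, here is a back-of-the-envelope sketch. It assumes one oplog entry per modified document and a hypothetical average entry size. Both function names and all numbers are illustrative, so measure real entry sizes on your own cluster before relying on the result.

```javascript
// Approximate oplog volume of a bulk job, in MB (1 MB = 1024 * 1024 bytes).
// Assumption: each modified document produces one oplog entry.
function bulkJobOplogMB(docCount, avgEntryBytes) {
  return (docCount * avgEntryBytes) / (1024 * 1024);
}

// Fraction of the configured oplog that the job alone would fill.
function oplogFractionConsumed(docCount, avgEntryBytes, oplogSizeMB) {
  return bulkJobOplogMB(docCount, avgEntryBytes) / oplogSizeMB;
}

// Hypothetical migration: 50 million documents, about 300 bytes per entry.
var fraction = oplogFractionConsumed(50e6, 300, 16000);
console.log("Job fills " + (fraction * 100).toFixed(0) + "% of a 16000 MB oplog");
```

A fraction close to or above 1 means the job by itself can turn over the whole oplog, regardless of how many days the window covered beforehand.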

This is why monitoring logSizeMB alone is a mistake. logSizeMB only shows the configured cap. The actionable metric is timeDiffHours, and you must track its minimum value during peak traffic, not its average during quiet periods.

flowchart TD
    A[Primary write velocity increases] --> B[Oplog turnover accelerates]
    B --> C[Oplog window shrinks]
    C --> D[Secondary lag exceeds window]
    D --> E[Secondary enters RECOVERING]
    E --> F[Forced full initial sync]

Production sizing rules and tradeoffs

Keep the production oplog window above 24 hours at all times, and aim for 72 hours if you run large secondaries that take a long time to rebuild. The window must also be greater than twice the longest expected secondary downtime: if you routinely take a node down for four hours during maintenance, the window should never drop below eight hours. In practice the 24-hour floor usually dominates, and the downtime rule only takes over once planned outages exceed 12 hours.

Size for peak write volume, not average. A bulk import, a backfill job, or a deployment that rebuilds an index can spike writes and temporarily compress the window. If you sized the oplog for average load, that spike becomes an incident.
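The sizing rules above can be turned into a small calculation. This is a minimal sketch under stated assumptions: the peak oplog rate is something you measure yourself, the 25% headroom factor is an arbitrary illustrative choice, and the function names are hypothetical.

```javascript
// Target window: the larger of a fixed floor and twice the longest downtime.
function targetWindowHours(longestDowntimeHours, floorHours) {
  return Math.max(floorHours, 2 * longestDowntimeHours);
}

// Oplog size (MB) needed to hold that window at the peak generation rate.
// headroom > 1 adds a safety margin; 1.25 below is an assumed value.
function requiredOplogSizeMB(peakOplogMBPerHour, windowHours, headroom) {
  return Math.ceil(peakOplogMBPerHour * windowHours * headroom);
}

// Example: 4-hour maintenance outages, 24-hour floor, 500 MB/h at peak.
var windowHours = targetWindowHours(4, 24);
var sizeMB = requiredOplogSizeMB(500, windowHours, 1.25);
console.log("Target window: " + windowHours + " h, oplog size: " + sizeMB + " MB");
```

Feeding in the average rate instead of the peak would produce a smaller number that only holds during quiet periods.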

The classic mistake is to set the oplog once and never trend it. Workloads evolve. Document sizes grow. New batch jobs appear. The window shrinks month by month until it becomes a single-digit hour count.

| Workload pattern | Impact on oplog window |
| --- | --- |
| Bulk imports or migrations | Compresses window dramatically during the event |
| Large multi-document transactions | Consumes large contiguous oplog space per commit |
| High update or delete volume | Generates more oplog bytes than net data change |
| Large individual documents | Larger entry per operation |

How to inspect and trend the window

Check the current window and configured size from the primary:

// Inspect oplog window and configured size
rs.printReplicationInfo()

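In the output, "configured oplog size" is the cap and "log length start to end" is the actual window. db.getReplicationInfo() exposes the same values as logSizeMB and timeDiffHours. The hours are the signal that matters.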

For programmatic monitoring, use db.getReplicationInfo():

var info = db.getReplicationInfo();
print("Oplog window: " + (info.timeDiff / 3600).toFixed(1) + " hours");
print("Configured size: " + info.logSizeMB + " MB");

Correlate this with replication lag to calculate runway:

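var info = db.getReplicationInfo();  // timeDiff is in seconds
var status = rs.status();
var primary = status.members.filter(m => m.stateStr === 'PRIMARY')[0];
if (!primary) {
  print("No primary visible; cannot compute lag");
} else {
  status.members.filter(m => m.stateStr === 'SECONDARY').forEach(function(s) {
    // optimeDate values are Dates, so the difference is in milliseconds
    var lagSec = (primary.optimeDate - s.optimeDate) / 1000;
    var runwayHours = ((info.timeDiff - lagSec) / 3600).toFixed(1);
    print(s.name + " lag: " + lagSec + "s, runway: " + runwayHours + "h");
  });
}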

Track the minimum timeDiffHours during your peak load periods. If that minimum drops below your threshold, the oplog is too small for your actual volume.
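One way to track the minimum is to keep periodic samples and reduce over them. The sketch below assumes you already collect samples shaped like { at, timeDiffHours }. That shape, the function names, and the sample values are all hypothetical.

```javascript
// Return the sample with the smallest window, or null if there are none.
function minWindow(samples) {
  if (samples.length === 0) {
    return null;
  }
  return samples.reduce(function (min, s) {
    return s.timeDiffHours < min.timeDiffHours ? s : min;
  });
}

// True when the worst observed window is under the threshold.
function belowThreshold(samples, thresholdHours) {
  var worst = minWindow(samples);
  return worst !== null && worst.timeDiffHours < thresholdHours;
}

// Illustrative samples: the overnight batch job is the real low point.
var samples = [
  { at: "2024-01-10T14:00Z", timeDiffHours: 61.2 },
  { at: "2024-01-11T02:00Z", timeDiffHours: 9.5 },
  { at: "2024-01-11T14:00Z", timeDiffHours: 58.7 }
];
console.log("Worst window: " + minWindow(samples).timeDiffHours + " h at " + minWindow(samples).at);
console.log("Below 24 h threshold: " + belowThreshold(samples, 24));
```

An average over these samples would look healthy, which is exactly why the minimum is the number to alert on.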

How to resize the oplog

Starting in MongoDB 3.6, you can resize the oplog dynamically on WiredTiger members without restarting the node:

// Increase oplog to 16000 MB
// In mongosh, pass Double(16000) if the server rejects an integer size
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })

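The minimum size is 990 MB and the maximum is 1 PB. The command affects only the member you run it on, so repeat it on every member of the replica set, secondaries first and the primary last. Changes persist across restarts. After resizing, update mongod.conf under replication.oplogSizeMB so that new members or rebuilds inherit the correct size.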

Starting in MongoDB 4.4, you can also set a minimum retention period:

// Retain at least 24 hours of oplog
db.adminCommand({ replSetResizeOplog: 1, minRetentionHours: 24 })

Be careful with minRetentionHours. When this is set, MongoDB will retain entries for the full period even if the oplog exceeds its configured max size. The oplog can then grow unbounded and consume disk space. It also relies on the host wall clock, so clock skew between replica set members can cause unpredictable retention behavior. Monitor disk space closely if you use this setting.
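A rough way to see how far the oplog might grow under minRetentionHours is to compare the configured cap with the volume written during the retention period. This is only a sketch: it assumes a constant peak rate, it ignores on-disk compression, and the function name and inputs are hypothetical.

```javascript
// Approximate logical oplog size (MB) once minRetentionHours is enforced:
// whichever is larger, the configured cap or the data written in the period.
function estimatedOplogMB(configuredSizeMB, minRetentionHours, peakOplogMBPerHour) {
  return Math.max(configuredSizeMB, minRetentionHours * peakOplogMBPerHour);
}

// Example: 16000 MB cap, 24-hour retention, 1000 MB/h during a backfill.
var projected = estimatedOplogMB(16000, 24, 1000);
console.log("Projected oplog size: " + projected + " MB");
```

If the projection exceeds the free disk space on any member, either raise capacity or lower the retention period before the next bulk job runs.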

Shrinking the oplog is more dangerous. Reducing the size immediately truncates the oldest entries. This invalidates open change streams and can force secondaries that have not yet replicated those entries into a full resync. Do not shrink the oplog during production traffic.

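If you do shrink it and need to reclaim disk space, run compact against the local database, where oplog.rs lives. Run it on a secondary rather than on the primary:

// Reclaim disk space after shrinking (oplog.rs is in the local database)
db.getSiblingDB("local").runCommand({ compact: "oplog.rs" })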

On MongoDB versions earlier than 4.4, compact on oplog.rs blocks oplog synchronization. Schedule it during a maintenance window. Starting in 4.4, a secondary can continue replicating oplog entries while the compact runs.

Signals to watch in production

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| timeDiffHours from db.getReplicationInfo() | Direct measure of catch-up margin | Minimum during peak drops below 24 hours |
| Replication lag vs oplog window | Runway before a secondary falls off | Lag sustained above 50% of the window |
| Primary opcounters write rate | Predicts how fast the window will shrink | Sustained spike without matching oplog capacity |
| metrics.repl.apply rate on secondaries | Ability to catch up | Apply rate below primary write rate for >10 minutes |
| flowControl.isLagged | Primary throttling writes to protect window | true with growing timeAcquiringMicros |
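A few of these warning signs can be checked mechanically. The sketch below evaluates a single sample against the 24-hour and 50% thresholds from the table. The sample shape and the function name evaluateOplogSignals are hypothetical, and a real alert would also require the condition to be sustained over time rather than seen once.

```javascript
// Return a list of warning labels for one monitoring sample.
// sample: { timeDiffHours, lagSeconds, flowControlLagged } (assumed shape)
function evaluateOplogSignals(sample) {
  var warnings = [];
  var windowSeconds = sample.timeDiffHours * 3600;
  if (sample.timeDiffHours < 24) {
    warnings.push("window-below-24h");
  }
  if (sample.lagSeconds > 0.5 * windowSeconds) {
    warnings.push("lag-above-half-window");
  }
  if (sample.flowControlLagged) {
    warnings.push("flow-control-lagged");
  }
  return warnings;
}

// Illustrative sample: 10-hour window with a secondary 20000 s behind.
var result = evaluateOplogSignals({
  timeDiffHours: 10,
  lagSeconds: 20000,
  flowControlLagged: true
});
console.log(result.join(", "));
```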

How Netdata helps

Netdata correlates the signals that predict oplog window collapse before a secondary enters RECOVERING.

  • Correlate shrinking timeDiffHours with opcounters write spikes and replication lag on the same timeline to confirm the window is compressing under load, not just reporting a transient blip.
  • Alert on the minimum oplog window during peak periods, not a simple average, so bulk jobs that run overnight do not create a false sense of safety.
  • Surface flowControl throttling alongside primary write pressure and secondary apply rates. This helps you distinguish between “the oplog is too small” and “the secondary cannot keep up because of disk or CPU saturation.”
  • Track metrics.repl.buffer trends where applicable to detect replication pipeline saturation before lag manifests.