MongoDB oplog window too small: sizing the oplog for your write volume
The oplog window is the only thing standing between a routine secondary restart and a multi-hour full initial sync. The oplog itself is a fixed-size capped collection, so the amount of history it holds varies with how fast you write. As your write volume grows, the window compresses. Most teams size the oplog once during initial deployment and never look at it again. Six months later, a routine maintenance window turns into an incident because the secondary fell off the oplog, entered RECOVERING, and forced a resync that saturated the remaining nodes.
This guide explains how the oplog window works, how to size it for your actual write volume, and how to resize it safely without restarting MongoDB.
What the oplog window is and why it matters
MongoDB replicates by having secondaries tail the primary’s operation log. The oplog lives in local.oplog.rs on every replica set member. It is a capped collection with a hard byte limit. When new entries arrive and the collection hits that limit, MongoDB evicts the oldest entries to make room.
The oplog window is the time span between the oldest and newest entry in the oplog. You can inspect it with rs.printReplicationInfo() or programmatically with db.getReplicationInfo(). The field timeDiff (in seconds) or timeDiffHours tells you how far behind a secondary can fall before its sync source has overwritten the entries it still needs.
If a secondary is offline, partitioned, or lagging for longer than the oplog window, it cannot catch up incrementally. It enters RECOVERING and must perform a full initial sync. During that sync, the remaining secondaries absorb more load. If the cluster was already near its oplog limit, losing one member can push a second secondary over the edge. The failure is binary and sudden.
How write volume compresses the window
Because the oplog is capped by bytes, not by time, the window is inversely proportional to write velocity. Higher throughput means faster turnover, which means fewer hours of history fit inside the same allocation.
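As a rough illustration of that inverse relationship, the sketch below estimates the window from the oplog cap and a sustained oplog write rate. The helper name estimateWindowHours and the sample numbers are assumptions made for illustration, not measurements from a real cluster.

```javascript
// Rough estimate: window (hours) = oplog cap (MB) / oplog write rate (MB per hour).
// Hypothetical helper; assumes a steady write rate over the whole window.
function estimateWindowHours(oplogSizeMB, oplogMBPerHour) {
  if (oplogMBPerHour <= 0) {
    return Infinity; // no writes means nothing is ever evicted
  }
  return oplogSizeMB / oplogMBPerHour;
}

// Same 16000 MB cap: doubling the write rate halves the window.
var quiet = estimateWindowHours(16000, 250); // 64
var busy = estimateWindowHours(16000, 500);  // 32
console.log(quiet, busy);
```

Real workloads are bursty, so treat the result as an upper bound on what you will see during a spike.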
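Not all operations consume oplog space equally. Bulk updates, multi-document deletes, and large transactions generate disproportionately large amounts of oplog data relative to the net data change. A multi-document update or delete writes one oplog entry per affected document. A large multi-document transaction writes all of its operations into the oplog at commit time (as a single entry in 4.0, and as a series of entries in 4.2 and later). A migration script that updates every document in a collection can shrink the window from days to hours in minutes.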
This is why monitoring logSizeMB alone is a mistake. logSizeMB only shows the configured cap. The actionable metric is timeDiffHours, and you must track its minimum value during peak traffic, not its average during quiet periods.
flowchart TD
A[Primary write velocity increases] --> B[Oplog turnover accelerates]
B --> C[Oplog window shrinks]
C --> D[Secondary lag exceeds window]
D --> E[Secondary enters RECOVERING]
E --> F[Forced full initial sync]
Production sizing rules and tradeoffs
Keep the production oplog window above 24 hours at all times, and above 72 hours if you run large secondaries that take a long time to rebuild. The window must also be greater than twice the longest expected secondary downtime. If you routinely take a node down for four hours during maintenance, that rule alone requires at least eight hours, but the 24-hour floor still applies.
Size for peak write volume, not average. A bulk import, a backfill job, or a deployment that rebuilds an index can spike writes and temporarily compress the window. If you sized the oplog for average load, that spike becomes an incident.
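The sizing rules above can be sketched as a small calculation. This is a minimal illustration, assuming you already know your peak oplog write rate in MB per hour; the function name requiredOplogSizeMB and the sample rates are hypothetical.

```javascript
// Hypothetical sizing helper based on the rules in this section:
//   required window = max(floorHours, 2 x longest expected downtime)
//   required size   = peak oplog rate (MB/hour) x required window
function requiredOplogSizeMB(peakMBPerHour, longestDowntimeHours, floorHours) {
  var floor = floorHours === undefined ? 24 : floorHours;
  var windowHours = Math.max(floor, 2 * longestDowntimeHours);
  return Math.ceil(peakMBPerHour * windowHours);
}

// 4-hour maintenance at 600 MB/hour peak: the 24-hour floor dominates.
console.log(requiredOplogSizeMB(600, 4));     // 14400
// 20-hour rebuilds: twice the downtime (40 hours) dominates.
console.log(requiredOplogSizeMB(600, 20));    // 24000
// Large secondaries sized against a 72-hour floor.
console.log(requiredOplogSizeMB(600, 4, 72)); // 43200
```

Feeding in an average rate instead of the peak rate produces a number that looks comfortable and fails exactly when a bulk job runs.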
The classic mistake is to set the oplog once and never trend it. Workloads evolve. Document sizes grow. New batch jobs appear. The window shrinks month by month until it becomes a single-digit hour count.
| Workload pattern | Impact on oplog window |
|---|---|
| Bulk imports or migrations | Compresses window dramatically during the event |
| Large multi-document transactions | Consumes large contiguous oplog space per commit |
| High update or delete volume | Generates more oplog bytes than net data change |
| Large individual documents | Larger entry per operation |
How to inspect and trend the window
Check the current window and configured size from the primary:
// Inspect oplog window and configured size
rs.printReplicationInfo()
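In the printed output, "configured oplog size" is the cap and "log length start to end" is the actual window. These correspond to logSizeMB and timeDiffHours in db.getReplicationInfo(). The hours are the signal that matters.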
For programmatic monitoring, use db.getReplicationInfo():
var info = db.getReplicationInfo();
print("Oplog window: " + (info.timeDiff / 3600).toFixed(1) + " hours");
print("Configured size: " + info.logSizeMB + " MB");
Correlate this with replication lag to calculate runway:
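// Uses `info` from the db.getReplicationInfo() call above
var status = rs.status();
var primary = status.members.filter(m => m.stateStr === 'PRIMARY')[0];
status.members.filter(m => m.stateStr === 'SECONDARY').forEach(function(s) {
  // Lag: gap between the primary's and this secondary's last applied operation
  var lagSec = (primary.optimeDate - s.optimeDate) / 1000;
  // Runway: oplog history left before this secondary falls off the window
  var runwayHours = ((info.timeDiff - lagSec) / 3600).toFixed(1);
  print(s.name + " lag: " + lagSec + "s, runway: " + runwayHours + "h");
});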
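The same result object can also give a rough oplog churn rate, which is the input you need for sizing. The sketch below is a hedged illustration: usedMB and timeDiff are fields of the db.getReplicationInfo() result, while the helper name oplogRateMBPerHour and the sample object are assumptions.

```javascript
// Estimate oplog churn in MB/hour from a db.getReplicationInfo()-style object.
// usedMB and timeDiff (seconds) come from that result; the helper name is hypothetical.
function oplogRateMBPerHour(replInfo) {
  if (!replInfo || !(replInfo.timeDiff > 0)) {
    return null; // empty or single-entry oplog: no rate can be derived
  }
  return replInfo.usedMB / (replInfo.timeDiff / 3600);
}

// In mongosh you would pass the live result: oplogRateMBPerHour(db.getReplicationInfo())
// Here a hand-written sample object stands in for it.
var sampleInfo = { logSizeMB: 16000, usedMB: 15000, timeDiff: 108000 };
console.log(oplogRateMBPerHour(sampleInfo)); // 500
```

This is an average over the whole window, so it understates the rate during a burst. Sampling it repeatedly during peak periods gives a more useful number.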
Track the minimum timeDiffHours during your peak load periods. If that minimum drops below your threshold, the oplog is too small for your actual volume.
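A minimal sketch of that check, assuming you collect periodic samples of timeDiffHours yourself. The sample shape ({ at, timeDiffHours }) and the function name summarizeWindow are hypothetical.

```javascript
// Given periodic samples of timeDiffHours, report the minimum and whether it
// breaches a threshold. Sample shape and names are assumptions for illustration.
function summarizeWindow(samples, thresholdHours) {
  if (samples.length === 0) {
    return { minHours: null, at: null, breached: false };
  }
  var worst = samples.reduce(function (a, b) {
    return b.timeDiffHours < a.timeDiffHours ? b : a;
  });
  return {
    minHours: worst.timeDiffHours,
    at: worst.at,
    breached: worst.timeDiffHours < thresholdHours
  };
}

var samples = [
  { at: "02:00", timeDiffHours: 61.3 },
  { at: "14:00", timeDiffHours: 19.8 }, // peak traffic
  { at: "20:00", timeDiffHours: 40.1 }
];
var summary = summarizeWindow(samples, 24);
console.log(summary.minHours, summary.at, summary.breached); // 19.8 14:00 true
```

The average of these three samples is about 40 hours, which would hide the breach entirely.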
How to resize the oplog
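Starting in MongoDB 3.6, you can resize the oplog dynamically with replSetResizeOplog, without restarting the node. The command changes only the member it runs on, so repeat it on every replica set member: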
// Increase oplog to 16000 MB
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })
The minimum size is 990 MB and the maximum is 1 PB. Changes persist across restarts. After resizing, update mongod.conf under replication.oplogSizeMB so that new members or rebuilds inherit the correct size.
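If you generate resize commands from a script, it helps to validate the size before sending it. The sketch below is one way to do that, assuming the bounds stated above; buildResizeCommand is a hypothetical helper, and expressing 1 PB as 1024 * 1024 * 1024 MB assumes binary units.

```javascript
// Build the replSetResizeOplog command document after checking the bounds
// described above. Helper and constant names are hypothetical.
var MIN_OPLOG_MB = 990;
var MAX_OPLOG_MB = 1024 * 1024 * 1024; // 1 PB in MB, assuming binary units

function buildResizeCommand(sizeMB) {
  if (typeof sizeMB !== "number" || !isFinite(sizeMB)) {
    throw new TypeError("size must be a finite number of megabytes");
  }
  if (sizeMB < MIN_OPLOG_MB || sizeMB > MAX_OPLOG_MB) {
    throw new RangeError("size must be between 990 MB and 1 PB");
  }
  return { replSetResizeOplog: 1, size: sizeMB };
}

var cmd = buildResizeCommand(16000);
console.log(JSON.stringify(cmd)); // {"replSetResizeOplog":1,"size":16000}
// In mongosh, on each member: db.adminCommand(cmd)
```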
On MongoDB 4.4 and later, you can also set a minimum retention period:
// Retain at least 24 hours of oplog
db.adminCommand({ replSetResizeOplog: 1, minRetentionHours: 24 })
Be careful with minRetentionHours. When this is set, MongoDB will retain entries for the full period even if the oplog exceeds its configured max size. The oplog can then grow unbounded and consume disk space. It also relies on the host wall clock, so clock skew between replica set members can cause unpredictable retention behavior. Monitor disk space closely if you use this setting.
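To get a feel for the disk risk, the sketch below estimates how large the oplog could become once a retention period is set. It is a simplified model, assuming a known peak oplog write rate; the function name estimateOplogFootprintMB and the sample rates are hypothetical, and actual on-disk usage will also depend on compression.

```javascript
// Hypothetical estimate of oplog data size when minRetentionHours is set.
// Entries are kept until the oplog is over its cap AND they are older than the
// retention period, so the footprint is roughly the larger of the two.
function estimateOplogFootprintMB(configuredSizeMB, peakMBPerHour, minRetentionHours) {
  var retainedMB = peakMBPerHour * minRetentionHours;
  return Math.max(configuredSizeMB, retainedMB);
}

// Normal load: 24 hours at 500 MB/hour fits inside the 16000 MB cap.
console.log(estimateOplogFootprintMB(16000, 500, 24));  // 16000
// Migration-level load: 24 hours at 2000 MB/hour overruns the cap.
console.log(estimateOplogFootprintMB(16000, 2000, 24)); // 48000
```

Comparing the second number against free disk space before a bulk job is a cheap way to avoid filling the volume.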
Shrinking the oplog is more dangerous. Reducing the size immediately truncates the oldest entries. This invalidates open change streams and can force secondaries that have not yet replicated those entries into a full resync. Do not shrink the oplog during production traffic.
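If you do shrink it and need to reclaim disk space, run compact against oplog.rs in the local database:
// Reclaim disk space after shrinking (run on a secondary, not on the primary)
db.getSiblingDB("local").runCommand({ compact: "oplog.rs" })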
On MongoDB versions earlier than 4.4, compact on oplog.rs blocks oplog synchronization. Schedule it during a maintenance window. Starting in 4.4, a secondary can continue replicating oplog entries while the compact runs.
Signals to watch in production
| Signal | Why it matters | Warning sign |
|---|---|---|
| timeDiffHours from db.getReplicationInfo() | Direct measure of catch-up margin | Minimum during peak drops below 24 hours |
| Replication lag vs oplog window | Runway before a secondary falls off | Lag sustained above 50% of the window |
| Primary opcounters write rate | Predicts how fast the window will shrink | Sustained spike without matching oplog capacity |
| metrics.repl.apply rate on secondaries | Ability to catch up | Apply rate below primary write rate for >10 minutes |
| flowControl.isLagged | Primary throttling writes to protect window | true with growing timeAcquiringMicros |
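The warning signs in the table can be combined into a single check. The sketch below is only an illustration of that idea: the input field names and the function evaluateOplogSignals are hypothetical, and it assumes your monitoring pipeline has already collected the values.

```javascript
// Evaluate the thresholds from the table against one snapshot of metrics.
// Field and function names are assumptions, not part of any MongoDB API.
function evaluateOplogSignals(s) {
  var warnings = [];
  if (s.minWindowHours < 24) {
    warnings.push("window below 24 hours at peak");
  }
  if (s.lagSec > 0.5 * s.windowSec) {
    warnings.push("lag above 50% of window");
  }
  if (s.applyBelowWriteMinutes > 10) {
    warnings.push("apply rate below write rate for more than 10 minutes");
  }
  if (s.flowControlLagged) {
    warnings.push("flow control reports lag");
  }
  return warnings;
}

var result = evaluateOplogSignals({
  minWindowHours: 18,        // peak minimum of timeDiffHours
  windowSec: 64800,          // current window: 18 hours
  lagSec: 36000,             // worst secondary lag: 10 hours
  applyBelowWriteMinutes: 3,
  flowControlLagged: false
});
console.log(result.length); // 2
```

A single snapshot cannot tell whether lag is sustained, so a real alert would evaluate this over several consecutive samples.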
How Netdata helps
Netdata correlates the signals that predict oplog window collapse before a secondary enters RECOVERING.
- Correlate shrinking timeDiffHours with opcounters write spikes and replication lag on the same timeline to confirm the window is compressing under load, not just reporting a transient blip.
- Alert on the minimum oplog window during peak periods, not a simple average, so bulk jobs that run overnight do not create a false sense of safety.
- Surface flowControl throttling alongside primary write pressure and secondary apply rates. This helps you distinguish between “the oplog is too small” and “the secondary cannot keep up because of disk or CPU saturation.”
- Track metrics.repl.buffer trends where applicable to detect replication pipeline saturation before lag manifests.