MongoDB flow control throttling writes: when the primary slows itself down
Write throughput on the primary has dropped by 30% or more. Application logs show intermittent write latency spikes, yet the primary’s CPU, memory, and disk metrics look healthy. There are no elections, cache pressure warnings, or obvious errors. Then you check replication and find the secondaries lagging. The primary is not sick; it is throttling itself.
MongoDB 4.2 introduced flow control, a ticket-based admission mechanism that caps the primary’s write rate to keep majority commit lag under a configurable target (flowControlTargetLagSeconds, 10 seconds by default) and so keep secondaries from falling off the oplog. When isLagged is true, the primary is deliberately limiting its own throughput. The fix is rarely on the primary. Look at the replication pipeline: a slow secondary, an oplog window that is too small, or a topology change that distorts majority commit lag.
What this means
Flow control requires the primary to acquire tickets before taking global intent-exclusive locks for writes. db.serverStatus().flowControl exposes the state. When isLagged is true, MongoDB is throttling. targetRateLimit is the enforced operations-per-second ceiling. timeAcquiringMicros grows when writes are blocked waiting for tickets. sustainerRate approximates the apply rate of the secondary sustaining the commit point.
Without flow control, a fast primary could outrun a slow secondary until the secondary’s oplog position fell past the oldest entry in the primary’s oplog. That would force the secondary into RECOVERING and require a full initial sync. Flow control trades primary throughput for replica set stability.
Because the primary self-throttles, primary resource metrics can look clean. The signal lives in the replication subsystem.
flowchart TD
A[Primary write latency spikes] --> B{Check flowControl.isLagged}
B -->|true| C[Flow control is throttling]
B -->|false| D[Investigate cache pressure or locks]
C --> E{Check replication lag}
E -->|lag growing| F[Secondary bottleneck or network]
E -->|no lag| G[Topology false positive]
F --> H{Check oplog window}
H -->|shrinking| I[Reduce writes or resize oplog]
H -->|healthy| J[Fix secondary I/O or network]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Secondary cannot keep up | Replication lag increasing on one or more secondaries; targetRateLimit flat or declining | rs.status() comparing optimeDate between primary and secondaries |
| Oplog window too small for write volume | Oplog window shrinking toward lag duration; high primary write rate | rs.printReplicationInfo() or db.getReplicationInfo() |
| PSA topology member loss | isLagged true after the only secondary goes down; no data-bearing node remains to advance the commit point | rs.status() for member state and commit point implications |
| Long-running operations blocking replication application | Secondary apply rate below primary write rate; lag grows steadily | db.serverStatus().metrics.repl.apply on the secondary |
| Network degradation between primary and secondaries | All secondaries lag simultaneously after network events | Network latency and packet loss between nodes |
Quick checks
Run these read-only checks on the primary to confirm flow control is engaged and gauge severity.
# Flow control state and ticket pressure
mongosh --quiet --eval 'db.serverStatus().flowControl'
Look for isLagged: true and a timeAcquiringMicros value that increases between samples. A declining targetRateLimit means lag is worsening despite the throttle.
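The two-sample comparison can be scripted. This is a minimal sketch, assuming it runs in mongosh against the primary; compareFlowControlSamples is a hypothetical helper name, and the 10-second interval is only an example.

```javascript
// Compare two db.serverStatus().flowControl snapshots taken intervalSeconds apart.
// compareFlowControlSamples is an illustrative helper, not a MongoDB API.
function compareFlowControlSamples(first, second, intervalSeconds) {
  // Number() also converts the 64-bit wrapper values that mongosh returns.
  const waitedMicros =
    Number(second.timeAcquiringMicros) - Number(first.timeAcquiringMicros);
  return {
    isLagged: second.isLagged === true,
    waitedMicros: waitedMicros,
    // Microseconds spent waiting for tickets per second of wall-clock time.
    waitMicrosPerSecond: waitedMicros / intervalSeconds,
    // Negative when the enforced ceiling dropped between samples.
    targetRateLimitChange:
      Number(second.targetRateLimit) - Number(first.targetRateLimit),
    throttling: second.isLagged === true && waitedMicros > 0,
  };
}

// Runs only inside mongosh, where the global db object exists.
if (typeof db !== "undefined") {
  const first = db.serverStatus().flowControl;
  sleep(10 * 1000); // mongosh built-in, milliseconds
  const second = db.serverStatus().flowControl;
  printjson(compareFlowControlSamples(first, second, 10));
}
```

A throttling value of true together with a negative targetRateLimitChange matches the worsening case described above.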
# Replication lag per secondary
mongosh --quiet --eval 'rs.printSecondaryReplicationInfo()'
# Oplog window safety margin
mongosh --quiet --eval 'rs.printReplicationInfo()'
# Primary write volume
mongosh --quiet --eval 'db.serverStatus().opcounters'
Sample opcounters twice and compare the calculated write rate to targetRateLimit. If the primary wants to write faster than the limit, flow control is the bottleneck.
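A rough way to do that arithmetic, assuming two opcounters snapshots and the current targetRateLimit are already in hand. The helper names and the 0.9 tolerance are illustrative assumptions, and the comparison is approximate because opcounters counts operations rather than flow control tickets.

```javascript
// Write operations per second between two db.serverStatus().opcounters snapshots.
function writeRatePerSecond(first, second, intervalSeconds) {
  const writes = (c) => Number(c.insert) + Number(c.update) + Number(c.delete);
  return (writes(second) - writes(first)) / intervalSeconds;
}

// True when the observed write rate sits at or near the enforced ceiling.
// The 0.9 tolerance is an arbitrary illustrative value, not a MongoDB default.
function nearRateLimit(ratePerSecond, targetRateLimit, tolerance = 0.9) {
  return ratePerSecond >= tolerance * Number(targetRateLimit);
}

// Runs only inside mongosh, where the global db object exists.
if (typeof db !== "undefined") {
  const first = db.serverStatus().opcounters;
  sleep(10 * 1000);
  const status = db.serverStatus();
  const rate = writeRatePerSecond(first, status.opcounters, 10);
  printjson({
    writesPerSecond: rate,
    targetRateLimit: status.flowControl.targetRateLimit,
    nearLimit: nearRateLimit(rate, status.flowControl.targetRateLimit),
  });
}
```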
# WiredTiger cache dirty ratio to rule out cache pressure
mongosh --quiet --eval 'var c=db.serverStatus().wiredTiger.cache; var max=c["maximum bytes configured"]; var dirty=c["tracked dirty bytes in the cache"]; print("Dirty ratio: " + (100*dirty/max).toFixed(1) + "%")'
If the dirty ratio is above 15% with application-thread evictions, you are dealing with cache pressure, not flow control.
# Long-running operations on secondaries
mongosh --host <secondary> --quiet --eval 'db.currentOp({active:true, secs_running:{$gt:10}}).inprog.forEach(function(o){print(o.opid + " " + o.secs_running + "s " + o.ns)})'
How to diagnose it
- Confirm flow control engagement. Sample db.serverStatus().flowControl twice, 10 seconds apart. If isLagged is true and timeAcquiringMicros grew, the primary is throttling.
- Quantify replication lag. Use rs.status() to compare optimeDate on every secondary against the primary. Identify whether lag is isolated or widespread.
- Calculate oplog runway. Subtract the maximum replication lag from the oplog window. If the result is under one hour, the secondary is at risk of falling off.
- Determine if the secondary is underconsuming. On the lagging secondary, check db.serverStatus().metrics.repl.apply. If the apply rate is consistently below the primary’s write rate, the secondary is the bottleneck. If apply rate matches but lag still grows, check network throughput and latency.
- Inspect secondary resources. On the lagging secondary, check disk I/O latency (iostat -x 1 5), CPU saturation, and db.currentOp() for long-running reads or index builds that compete with oplog application.
- Rule out cache pressure on the primary. Check db.serverStatus().wiredTiger.cache for dirty ratio above 15% or pages evicted by application threads incrementing. Cache pressure can slow writes and mimic flow control symptoms, but the fix is different.
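The lag and runway steps reduce to simple arithmetic over rs.status() and db.getReplicationInfo(). The sketch below assumes it runs in mongosh on the primary; the function names are hypothetical, and the one-hour threshold is the one used in this guide.

```javascript
// Per-secondary lag in seconds, derived from an rs.status() document.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY member in rs.status() output");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      // Subtracting Date objects yields milliseconds.
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Oplog runway: oplog window minus the worst replication lag, in hours.
function oplogRunwayHours(oplogWindowHours, lags) {
  const maxLagSeconds = Math.max(0, ...lags.map((l) => l.lagSeconds));
  return oplogWindowHours - maxLagSeconds / 3600;
}

// Runs only inside mongosh, where the rs and db globals exist.
if (typeof rs !== "undefined" && typeof db !== "undefined") {
  const lags = replicationLagSeconds(rs.status());
  const runway = oplogRunwayHours(db.getReplicationInfo().timeDiffHours, lags);
  printjson({ lags: lags, runwayHours: runway, atRisk: runway < 1 });
}
```

Arbiters and members in other states are skipped because they carry no comparable optimeDate.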
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| flowControl.isLagged | True when the primary is throttling writes | True for more than one minute |
| flowControl.timeAcquiringMicros | Cumulative microseconds writes wait for flow control tickets | Increasing between consecutive samples |
| flowControl.targetRateLimit | The ops-per-second cap currently enforced | Declining over multiple checks |
| Replication lag | Distance between primary and secondary oplog positions | Greater than 10 seconds sustained |
| Oplog window | Time coverage of the oplog; safety margin for catch-up | Less than 12 hours |
| Secondary oplog apply rate | Whether the secondary can keep up | Consistently below primary write rate |
| WiredTiger cache dirty ratio | Distinguishes cache pressure from replication backpressure | Greater than 15% sustained |
Fixes
Secondary resource bottleneck
If the secondary cannot apply oplog entries fast enough because of slow disk, CPU saturation, or concurrent read traffic:
- Redirect read traffic away from the lagging secondary until it catches up.
- Kill long-running queries or aggregations on the secondary that hold tickets or snapshots. Be careful: killing operations can disrupt applications.
- If the secondary hardware is permanently slower than the primary, upgrade storage or CPU to match primary capacity. A secondary with slower I/O than its primary will always be at risk of flow control throttling during write bursts.
Tradeoff: Removing reads reduces application read capacity, but it allows replication to catch up and lifts the primary throttle.
Oplog window too small
If write volume has grown and the oplog window is shrinking:
- Resize the oplog online by running db.adminCommand({replSetResizeOplog: 1, size: <newSize>}) on each replica set member with MongoDB 4.0+. The size is specified in megabytes.
- After resizing, the window grows as new writes fill the added space.
Tradeoff: A larger oplog consumes more disk space, but it buys time for secondaries to catch up during maintenance or bursts.
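One way to pick a new size is to extrapolate from the current oplog churn. This sketch assumes the write rate seen in the current window is representative of peak load, which may not hold; suggestedOplogSizeMB is a hypothetical helper, the 24-hour target is the coverage this guide recommends, and 990 MB is the minimum size replSetResizeOplog accepts.

```javascript
// Estimate an oplog size (MB) that would cover targetHours at the current churn.
// info is the document returned by db.getReplicationInfo().
function suggestedOplogSizeMB(info, targetHours) {
  if (!(info.timeDiffHours > 0)) {
    throw new Error("oplog window is zero; cannot estimate a write rate");
  }
  const mbPerHour = info.usedMB / info.timeDiffHours;
  // Never suggest less than the 990 MB minimum the resize command accepts.
  return Math.max(990, Math.ceil(mbPerHour * targetHours));
}

// Runs only inside mongosh. It prints a suggestion and does not resize anything.
if (typeof db !== "undefined") {
  const info = db.getReplicationInfo();
  print("Suggested oplog size (MB): " + suggestedOplogSizeMB(info, 24));
}
```

Treat the result as a starting point and check it against available disk space before resizing.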
Topology-induced false lag
In a Primary-Secondary-Arbiter deployment, losing the secondary stalls the majority commit point because the arbiter does not replicate data. Flow control may engage even though the primary can still accept writes and the arbiter maintains election majority. In this scenario, flow control is protecting a commit point that cannot advance until the secondary returns.
- If the outage is temporary, monitor until the node returns.
- If you must operate with reduced redundancy, consider whether the write throughput loss is acceptable. Tuning flow control thresholds upward delays throttling but increases the risk of oplog window collapse. Disabling flow control entirely removes the throttle, but if the secondary falls behind, it can drop off the oplog and require a full resync.
Tradeoff: Relaxing or disabling the safety mechanism trades throughput against the risk of a lengthy initial sync.
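For reference, the thresholds are exposed as the server parameters flowControlTargetLagSeconds and enableFlowControl. The sketch below only reads them and builds the command documents; the helper name and the 20-second value are illustrative, and whether to apply either change is a judgment call for your deployment.

```javascript
// Build the setParameter command that raises the flow control lag target.
// targetLagCommand is an illustrative helper, not a MongoDB API.
function targetLagCommand(seconds) {
  if (!Number.isInteger(seconds) || seconds < 1) {
    throw new Error("target lag must be a positive integer number of seconds");
  }
  return { setParameter: 1, flowControlTargetLagSeconds: seconds };
}

// Command document that disables flow control entirely (removes the safety net).
const disableFlowControlCommand = { setParameter: 1, enableFlowControl: false };

// Runs only inside mongosh. Only the read-only getParameter call is executed.
if (typeof db !== "undefined") {
  printjson(
    db.adminCommand({
      getParameter: 1,
      enableFlowControl: 1,
      flowControlTargetLagSeconds: 1,
    })
  );
  // To apply a change on the primary, uncomment deliberately:
  // db.adminCommand(targetLagCommand(20));
  // db.adminCommand(disableFlowControlCommand);
}
```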
Network degradation
If lag spikes correlate with packet loss or latency between nodes:
- Fix the network path. There is no MongoDB-level knob for packet loss.
- As a temporary measure, reduce primary write volume to give the network time to clear the replication backlog.
Excessive primary write volume
If the workload has outgrown the replica set’s replication capacity:
- Throttle writes at the application layer during bursts.
- Shard the collection to distribute write load across multiple primaries.
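Application-layer throttling can take many forms. A token bucket is one common choice, sketched here under the assumption that the application can delay or queue writes when no token is available. WriteThrottle and its parameters are hypothetical and not part of any MongoDB driver.

```javascript
// Minimal token bucket: allows ratePerSecond writes on average, with bursts up to `burst`.
class WriteThrottle {
  constructor(ratePerSecond, burst, now = () => Date.now()) {
    this.rate = ratePerSecond;
    this.capacity = burst;
    this.tokens = burst; // start with a full bucket
    this.now = now;      // injectable clock (milliseconds), useful for testing
    this.last = now();
  }

  // Returns true and consumes tokens if `count` writes may proceed now.
  tryAcquire(count = 1) {
    const t = this.now();
    // Refill in proportion to elapsed time, capped at the bucket capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((t - this.last) / 1000) * this.rate
    );
    this.last = t;
    if (this.tokens < count) return false;
    this.tokens -= count;
    return true;
  }
}
```

A caller that receives false can queue the write or retry after a short delay. Setting the rate somewhat below the replication capacity you have observed leaves headroom for secondaries to catch up.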
Prevention
- Trend the oplog window over weeks, not just during incidents. Size the oplog to maintain at least 24 hours of coverage during peak write throughput.
- Monitor secondary apply rate versus primary write rate. A sustained gap predicts flow control engagement before it happens.
- Keep secondary storage and CPU capacity equal to the primary. Underprovisioning secondaries guarantees flow control will eventually throttle writes.
- Monitor flowControl.isLagged as a standard replication health signal.
- Avoid long-running operations on secondaries that compete with oplog application. Schedule index builds and large aggregations during low-write windows, or run them on hidden nodes.
How Netdata helps
- Correlates flowControl.isLagged with replication lag and oplog window on the same timeline, so you can see whether throttling starts before or after lag spikes.
- Tracks primary opcounters against secondary apply rates to surface capacity gaps before flow control engages.
- Alerts on oplog window shrinkage and replication lag, giving time to act before the primary self-throttles.
- Surfaces WiredTiger cache dirty ratio and eviction metrics to distinguish cache pressure cascades from replication backpressure.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB journal sync latency high: the storage signal that warns 60 seconds early
- MongoDB monitoring checklist: the signals every production cluster needs
- MongoDB monitoring maturity model: from survival to expert
- MongoDB noTimeout cursors causing cache pressure: pinned snapshots and silent eviction stalls
- MongoDB oplog window collapse: secondaries falling off and forced full resync







