MongoDB flow control throttling writes: when the primary slows itself down

Write throughput on the primary has dropped by 30% or more. Application logs show intermittent write latency spikes, but the primary’s CPU, memory, and disk metrics look healthy. There are no elections, cache pressure warnings, or obvious errors. Check replication: secondaries are lagging. The primary is not sick; it is throttling itself.

MongoDB 4.2 introduced flow control, a ticket-based admission mechanism that caps the primary’s write rate to keep secondaries from falling off the oplog. It engages as majority commit lag approaches flowControlTargetLagSeconds (10 seconds by default); once isLagged is true, the primary deliberately limits its own throughput. The fix is rarely on the primary. Look at the replication pipeline: a slow secondary, an oplog window that is too small, or a topology change that distorts majority commit lag.

What this means

Flow control requires the primary to acquire tickets before taking global intent-exclusive locks for writes. db.serverStatus().flowControl exposes the state. When isLagged is true, MongoDB is throttling. targetRateLimit is the enforced operations-per-second ceiling. timeAcquiringMicros grows when writes are blocked waiting for tickets. sustainerRate approximates the apply rate of the secondary sustaining the commit point.
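As a rough illustration of how to read those fields, the sketch below diffs two flowControl samples. The helper name flowControlDelta and the shape of its return value are hypothetical; only the field names come from db.serverStatus().flowControl.

```javascript
// Hypothetical helper: compare two db.serverStatus().flowControl samples.
// Counters in mongosh may come back as Long objects, so coerce with Number().
function flowControlDelta(before, after, intervalSeconds) {
  const waitedMicros =
    Number(after.timeAcquiringMicros) - Number(before.timeAcquiringMicros);
  return {
    isLagged: Boolean(after.isLagged),
    // Microseconds that writes spent waiting for tickets during the interval.
    waitedMicros: waitedMicros,
    // Ticket wait accumulated per wall-clock second of the interval.
    waitedMicrosPerSecond: waitedMicros / intervalSeconds,
    // A negative value means the ceiling dropped between samples.
    targetRateLimitChange:
      Number(after.targetRateLimit) - Number(before.targetRateLimit),
    sustainerRate: Number(after.sustainerRate),
  };
}

// In mongosh, assuming a 10-second gap between samples:
//   const a = db.serverStatus().flowControl; sleep(10000);
//   const b = db.serverStatus().flowControl;
//   printjson(flowControlDelta(a, b, 10));
```

A growing waitedMicros together with isLagged: true is the combination that indicates active throttling; either value alone is weaker evidence.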

Without flow control, a fast primary could outrun a slow secondary until the secondary’s oplog position fell behind the oldest entry still held in the primary’s oplog. At that point the secondary can no longer catch up by replaying the oplog: it goes into RECOVERING and needs a full initial sync. Flow control trades primary throughput for replica set stability.

Because the primary self-throttles, primary resource metrics can look clean. The signal lives in the replication subsystem.

flowchart TD
    A[Primary write latency spikes] --> B{Check flowControl.isLagged}
    B -->|true| C[Flow control is throttling]
    B -->|false| D[Investigate cache pressure or locks]
    C --> E{Check replication lag}
    E -->|lag growing| F[Secondary bottleneck or network]
    E -->|no lag| G[Topology false positive]
    F --> H{Check oplog window}
    H -->|shrinking| I[Reduce writes or resize oplog]
    H -->|healthy| J[Fix secondary I/O or network]
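
The decision path in the flowchart can also be written as a small function, which is handy if you want to embed the triage in a runbook script. This is only a sketch: the function name triage and the input field names (isLagged, lagGrowing, oplogWindowShrinking) are illustrative, and you would have to derive the booleans from your own samples.

```javascript
// Hypothetical encoding of the flowchart; input field names are illustrative.
function triage(obs) {
  // isLagged false: flow control is not the cause of the latency spikes.
  if (!obs.isLagged) return "Investigate cache pressure or locks";
  // Throttling without growing lag points at the topology, not at load.
  if (!obs.lagGrowing) return "Topology false positive";
  // Lag is growing: the oplog window decides how urgent the situation is.
  if (obs.oplogWindowShrinking) return "Reduce writes or resize oplog";
  return "Fix secondary I/O or network";
}

// Example with made-up observations:
//   triage({ isLagged: true, lagGrowing: true, oplogWindowShrinking: false })
```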

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Secondary cannot keep up | Replication lag increasing on one or more secondaries; targetRateLimit flat or declining | rs.status() comparing optimeDate between primary and secondaries |
| Oplog window too small for write volume | Oplog window shrinking toward lag duration; high primary write rate | rs.printReplicationInfo() or db.getReplicationInfo() |
| PSA topology member loss | isLagged true after the only secondary goes down; no other data-bearing node remains to advance the commit point | rs.status() for member state and commit point implications |
| Long-running operations blocking replication application | Secondary apply rate below primary write rate; lag grows steadily | db.serverStatus().metrics.repl.apply on the secondary |
| Network degradation between primary and secondaries | All secondaries lag simultaneously after network events | Network latency and packet loss between nodes |

Quick checks

Run these read-only checks to confirm flow control is engaged and gauge severity. Run them on the primary unless a command names a secondary.

# Flow control state and ticket pressure
mongosh --quiet --eval 'db.serverStatus().flowControl'

Look for isLagged: true and a timeAcquiringMicros value that increases between samples. A declining targetRateLimit means lag is worsening despite the throttle.

# Replication lag per secondary
mongosh --quiet --eval 'rs.printSecondaryReplicationInfo()'
# Oplog window safety margin
mongosh --quiet --eval 'rs.printReplicationInfo()'
# Primary write volume
mongosh --quiet --eval 'db.serverStatus().opcounters'

Sample opcounters twice and compare the calculated write rate to targetRateLimit. If the primary wants to write faster than the limit, flow control is the bottleneck.
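
A minimal sketch of that comparison follows. It assumes writes are the sum of the insert, update, and delete counters, and it treats an observed rate within 90% of targetRateLimit as "at the ceiling"; both the helper names and the 0.9 tolerance are assumptions rather than anything MongoDB defines. opcounters count operations while flow control hands out tickets, so treat the result as an approximation.

```javascript
// Hypothetical helpers for comparing primary write rate to the flow control cap.
// `first` and `second` are db.serverStatus().opcounters samples.
function writeRatePerSecond(first, second, intervalSeconds) {
  const writes = (s) => Number(s.insert) + Number(s.update) + Number(s.delete);
  return (writes(second) - writes(first)) / intervalSeconds;
}

// An observed rate pinned near the cap suggests demand exceeds the limit.
// The 0.9 tolerance is an arbitrary starting point, not a MongoDB constant.
function isNearCeiling(writeRate, targetRateLimit, tolerance = 0.9) {
  return writeRate >= tolerance * targetRateLimit;
}

// In mongosh, assuming a 10-second gap between samples:
//   const o1 = db.serverStatus().opcounters; sleep(10000);
//   const o2 = db.serverStatus().opcounters;
//   const rate = writeRatePerSecond(o1, o2, 10);
//   print(rate, isNearCeiling(rate, db.serverStatus().flowControl.targetRateLimit));
```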

# WiredTiger cache dirty ratio to rule out cache pressure
mongosh --quiet --eval 'var c=db.serverStatus().wiredTiger.cache; var max=c["maximum bytes configured"]; var dirty=c["tracked dirty bytes in the cache"]; print("Dirty ratio: " + (100*dirty/max).toFixed(1) + "%")'

If the dirty ratio is above 15% with application-thread evictions, you are dealing with cache pressure, not flow control.

# Long-running operations on secondaries
mongosh --host <secondary> --quiet --eval 'db.currentOp({active:true, secs_running:{$gt:10}}).inprog.forEach(function(o){print(o.opid + " " + o.secs_running + "s " + o.ns)})'

How to diagnose it

  1. Confirm flow control engagement. Sample db.serverStatus().flowControl twice, 10 seconds apart. If isLagged is true and timeAcquiringMicros grew, the primary is throttling.
  2. Quantify replication lag. Use rs.status() to compare optimeDate on every secondary against the primary. Identify whether lag is isolated or widespread.
  3. Calculate oplog runway. Subtract the maximum replication lag from the oplog window. If the result is under one hour, the secondary is at risk of falling off.
  4. Determine if the secondary is underconsuming. On the lagging secondary, check db.serverStatus().metrics.repl.apply. If the apply rate is consistently below the primary’s write rate, the secondary is the bottleneck. If apply rate matches but lag still grows, check network throughput and latency.
  5. Inspect secondary resources. On the lagging secondary, check disk I/O latency (iostat -x 1 5), CPU saturation, and db.currentOp() for long-running reads or index builds that compete with oplog application.
  6. Rule out cache pressure on the primary. Check db.serverStatus().wiredTiger.cache for dirty ratio above 15% or pages evicted by application threads incrementing. Cache pressure can slow writes and mimic flow control symptoms, but the fix is different.
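
Steps 2 and 3 reduce to simple arithmetic on rs.status() and db.getReplicationInfo(). The sketch below shows one way to compute them; the function names are hypothetical, and it assumes optimeDate is a Date and that the oplog window is passed in seconds (the timeDiff field of db.getReplicationInfo()). Members that are not in SECONDARY state, such as arbiters or nodes in RECOVERING, are ignored here and need a separate look.

```javascript
// Hypothetical helper: per-secondary lag in seconds from rs.status()-shaped input.
function secondaryLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no primary in rs.status() output");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds:
        (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
    }));
}

// Hypothetical helper: oplog window minus the worst lag, in hours.
function oplogRunwayHours(oplogWindowSeconds, lags) {
  const maxLag = Math.max(0, ...lags.map((l) => l.lagSeconds));
  return (oplogWindowSeconds - maxLag) / 3600;
}

// In mongosh on the primary:
//   const lags = secondaryLagSeconds(rs.status());
//   print(oplogRunwayHours(db.getReplicationInfo().timeDiff, lags));
```

A runway under one hour matches the risk threshold in step 3.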

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| flowControl.isLagged | True when the primary is throttling writes | True for more than one minute |
| flowControl.timeAcquiringMicros | Cumulative microseconds writes wait for flow control tickets | Increasing between consecutive samples |
| flowControl.targetRateLimit | The ops-per-second cap currently enforced | Declining over multiple checks |
| Replication lag | Distance between primary and secondary oplog positions | Greater than 10 seconds sustained |
| Oplog window | Time coverage of the oplog; safety margin for catch-up | Less than 12 hours |
| Secondary oplog apply rate | Whether the secondary can keep up | Consistently below primary write rate |
| WiredTiger cache dirty ratio | Distinguishes cache pressure from replication backpressure | Greater than 15% sustained |
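
If you collect these signals yourself, the warning signs from the table can be folded into one check. The sketch below assumes you have already reduced your samples to a single summary object; every input field name (isLaggedSeconds, timeAcquiringMicrosDelta, and so on) is hypothetical, and "sustained" or "consistently" has to be handled by whatever produces that summary.

```javascript
// Hypothetical evaluator for the warning signs in the table.
// All input field names are illustrative, not MongoDB fields.
function warningSigns(s) {
  const warnings = [];
  if (s.isLaggedSeconds > 60) warnings.push("isLagged true for more than one minute");
  if (s.timeAcquiringMicrosDelta > 0) warnings.push("timeAcquiringMicros increasing");
  if (s.targetRateLimitDelta < 0) warnings.push("targetRateLimit declining");
  if (s.replicationLagSeconds > 10) warnings.push("replication lag above 10 seconds");
  if (s.oplogWindowHours < 12) warnings.push("oplog window under 12 hours");
  if (s.secondaryApplyRate < s.primaryWriteRate) warnings.push("apply rate below write rate");
  if (s.dirtyRatioPercent > 15) warnings.push("cache dirty ratio above 15%");
  return warnings;
}
```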

Fixes

Secondary resource bottleneck

If the secondary cannot apply oplog entries fast enough because of slow disk, CPU saturation, or concurrent read traffic:

  • Redirect read traffic away from the lagging secondary until it catches up.
  • Kill long-running queries or aggregations on the secondary that hold tickets or snapshots. Be careful: killing operations can disrupt applications.
  • If the secondary hardware is permanently slower than the primary, upgrade storage or CPU to match primary capacity. A secondary with slower I/O than its primary will always be at risk of flow control throttling during write bursts.

Tradeoff: Removing reads reduces application read capacity, but it allows replication to catch up and lifts the primary throttle.

Oplog window too small

If write volume has grown and the oplog window is shrinking:

  • Resize the oplog online by running db.adminCommand({replSetResizeOplog: 1, size: <newSize>}) on each replica set member with MongoDB 4.0+. The size is given in megabytes. Resize the secondaries first, then the primary.
  • After resizing, the window grows as new space is utilized.

Tradeoff: A larger oplog consumes more disk space, but it buys time for secondaries to catch up during maintenance or bursts.
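
To pick a new size, one rough approach is to scale the current oplog usage to the window you want. The sketch below assumes the write rate over the current window is representative (peaks may be higher) and adds a 20% headroom factor; the function name and the headroom default are assumptions. The inputs correspond to the usedMB and timeDiffHours fields of db.getReplicationInfo().

```javascript
// Hypothetical sizing helper: oplog megabytes needed to cover targetHours.
function requiredOplogSizeMB(usedMB, windowHours, targetHours, headroom = 1.2) {
  if (windowHours <= 0) throw new RangeError("windowHours must be positive");
  // Average oplog growth per hour over the current window.
  const mbPerHour = usedMB / windowHours;
  return Math.ceil(mbPerHour * targetHours * headroom);
}

// In mongosh, sizing for a 24-hour window:
//   const info = db.getReplicationInfo();
//   const sizeMB = requiredOplogSizeMB(info.usedMB, info.timeDiffHours, 24);
//   // The size argument is in megabytes and is passed as a double:
//   db.adminCommand({ replSetResizeOplog: 1, size: Double(sizeMB) });
```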

Topology-induced false lag

In a Primary-Secondary-Arbiter deployment, losing the secondary stalls the majority commit point because the arbiter does not replicate data. Flow control may engage even though the primary can still accept writes and the arbiter maintains election majority. In this scenario, flow control is protecting a commit point that cannot advance until the secondary returns.

  • If the outage is temporary, monitor until the node returns.
  • If you must operate with reduced redundancy, consider whether the write throughput loss is acceptable. Tuning flow control thresholds upward delays throttling but increases the risk of oplog window collapse. Disabling flow control entirely removes the throttle, but if the secondary falls behind, it can drop off the oplog and require a full resync.

Tradeoff: Relaxing or disabling the safety mechanism trades throughput against the risk of a lengthy initial sync.
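
For reference, the thresholds mentioned above are server parameters: enableFlowControl and flowControlTargetLagSeconds. The sketch below only builds the command documents so you can review them before sending anything; the builder function names are hypothetical, and whether raising the target lag is appropriate depends on your oplog window.

```javascript
// Hypothetical builders; they return command documents and send nothing.
function flowControlInspectCommand() {
  // Reads the current flow control settings.
  return { getParameter: 1, enableFlowControl: 1, flowControlTargetLagSeconds: 1 };
}

function flowControlTargetLagCommand(seconds) {
  // The parameter is an integer number of seconds greater than zero.
  if (!Number.isInteger(seconds) || seconds <= 0) {
    throw new RangeError("seconds must be a positive integer");
  }
  return { setParameter: 1, flowControlTargetLagSeconds: seconds };
}

// In mongosh on the primary:
//   db.adminCommand(flowControlInspectCommand());
//   db.adminCommand(flowControlTargetLagCommand(20)); // 20 is an example value
```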

Network degradation

If lag spikes correlate with packet loss or latency between nodes:

  • Fix the network path. There is no MongoDB-level knob for packet loss.
  • As a temporary measure, reduce primary write volume to give the network time to clear the replication backlog.

Excessive primary write volume

If the workload has outgrown the replica set’s replication capacity:

  • Throttle writes at the application layer during bursts.
  • Shard the collection to distribute write load across multiple primaries.
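
One common way to throttle at the application layer is a token bucket in front of the write path. The sketch below is a generic illustration, not something MongoDB or its drivers provide; the class name WriteThrottle and its parameters are hypothetical, and the clock is injectable so the behavior can be tested.

```javascript
// Hypothetical token bucket for pacing writes in application code.
class WriteThrottle {
  constructor(ratePerSecond, burst, now = Date.now) {
    this.rate = ratePerSecond; // tokens added per second
    this.capacity = burst;     // maximum tokens held
    this.tokens = burst;       // start with a full bucket
    this.now = now;            // clock returning milliseconds
    this.last = now();
  }

  // Returns true if `count` writes may proceed now, false if the caller
  // should wait or queue the work.
  tryAcquire(count = 1) {
    const t = this.now();
    const refill = ((t - this.last) / 1000) * this.rate;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.last = t;
    if (this.tokens < count) return false;
    this.tokens -= count;
    return true;
  }
}

// Usage: allow roughly 500 writes per second with bursts of up to 1000.
//   const throttle = new WriteThrottle(500, 1000);
//   if (throttle.tryAcquire()) { /* issue the write */ }
```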

Prevention

  • Trend the oplog window over weeks, not just during incidents. Size the oplog to maintain at least 24 hours of coverage during peak write throughput.
  • Monitor secondary apply rate versus primary write rate. A sustained gap predicts flow control engagement before it happens.
  • Keep secondary storage and CPU capacity equal to the primary. An underprovisioned secondary makes flow control throttling likely whenever write load is sustained.
  • Monitor flowControl.isLagged as a standard replication health signal.
  • Avoid long-running operations on secondaries that compete with oplog application. Schedule index builds and large aggregations during low-write windows, or run them on hidden nodes.
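
The apply-rate comparison from the list above can be sketched as follows. It assumes you sample db.serverStatus().metrics.repl.apply on the secondary and use its ops counter; the helper names are hypothetical. One client write can produce several oplog entries, so compare trends rather than exact numbers.

```javascript
// Hypothetical helpers for tracking the secondary apply rate.
// `first` and `second` are db.serverStatus().metrics.repl.apply samples.
function applyRatePerSecond(first, second, intervalSeconds) {
  return (Number(second.ops) - Number(first.ops)) / intervalSeconds;
}

// A positive gap means the primary writes faster than the secondary applies.
function replicationGap(primaryWriteRate, secondaryApplyRate) {
  return {
    gap: primaryWriteRate - secondaryApplyRate,
    fallingBehind: secondaryApplyRate < primaryWriteRate,
  };
}

// In mongosh on the secondary, assuming a 10-second gap between samples:
//   const s1 = db.serverStatus().metrics.repl.apply; sleep(10000);
//   const s2 = db.serverStatus().metrics.repl.apply;
//   print(applyRatePerSecond(s1, s2, 10));
```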

How Netdata helps

  • Correlates flowControl.isLagged with replication lag and oplog window on the same timeline, so you can see whether throttling starts before or after lag spikes.
  • Tracks primary opcounters against secondary apply rates to surface capacity gaps before flow control engages.
  • Alerts on oplog window shrinkage and replication lag, giving time to act before the primary self-throttles.
  • Surfaces WiredTiger cache dirty ratio and eviction metrics to distinguish cache pressure cascades from replication backpressure.