MongoDB rollback after failover: silent data loss and the rollback directory
A replica set member in ROLLBACK state, or an application reporting vanished documents after failover, means a former primary held writes that never reached a majority. When that node rejoins, MongoDB erases the divergent history and writes the removed data to files under <dbPath>/rollback/. The application may have received acknowledgment for those writes. With w:1, acknowledgment meant only that the primary applied the write. It did not guarantee replication to a majority or survival through failover. That is silent data loss.
Rollback is a binary data-loss event. Any occurrence in production is page-worthy. Rollback files make recovery possible, but the process is manual and error-prone. In MongoDB versions before 4.0, rollbacks larger than 300 MB could fail entirely, leaving the member in an unrecoverable state and forcing a full initial sync. The only write concern that eliminates rollback risk for acknowledged writes is w:"majority".
What this means
When a former primary rejoins a replica set after a failover, MongoDB compares its oplog to the new primary’s oplog. If the old primary contains write operations that were never replicated to a majority of members before it stepped down, the rejoining member transitions to ROLLBACK. MongoDB walks back the oplog to the last entry that matches the new primary’s history. Every write after that common point is reverted locally. The reverted documents are serialized into files in the rollback/ directory under the node’s dbPath. Once rollback finishes, the member resumes as a SECONDARY and catches up normally.
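The walk back to the common point can be sketched with a toy model. This is not MongoDB's implementation: it assumes each oplog entry is a plain object with hypothetical `ts`, `term`, and `op` fields, and that two entries match when `ts` and `term` are equal. The name `findRollback` is made up for illustration.

```javascript
// Toy model of rollback, for illustration only.
// Assumption: an oplog entry is { ts, term, op } and entries match on (term, ts).
function findRollback(oldPrimaryOplog, newPrimaryOplog) {
  const seen = new Set(newPrimaryOplog.map(e => `${e.term}:${e.ts}`));
  // Walk the old primary's oplog backwards until an entry also exists on the new primary.
  let i = oldPrimaryOplog.length - 1;
  while (i >= 0 && !seen.has(`${oldPrimaryOplog[i].term}:${oldPrimaryOplog[i].ts}`)) {
    i--;
  }
  return {
    commonPoint: i >= 0 ? oldPrimaryOplog[i] : null, // last shared entry, if any
    rolledBack: oldPrimaryOplog.slice(i + 1),        // everything after it is reverted
  };
}

const oldPrimary = [
  { ts: 1, term: 1, op: "insert a" },
  { ts: 2, term: 1, op: "insert b" },
  { ts: 3, term: 1, op: "insert c" }, // never replicated
  { ts: 4, term: 1, op: "insert d" }, // never replicated
];
const newPrimary = [
  { ts: 1, term: 1, op: "insert a" },
  { ts: 2, term: 1, op: "insert b" },
  { ts: 5, term: 2, op: "insert e" }, // written after the election
];

const outcome = findRollback(oldPrimary, newPrimary);
console.log(outcome.commonPoint.op);            // insert b
console.log(outcome.rolledBack.map(e => e.op)); // the two unreplicated inserts
```

In this model, the two entries after the common point correspond to the writes that would end up in the rollback/ directory.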
With w:"majority", a write is not acknowledged until a majority of voting data-bearing members have applied it. Once majority-acknowledged, a write is immune to rollback.
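As a minimal sketch of requesting majority acknowledgment from mongosh, the snippet below builds the options object and passes it to `insertOne`. The collection name `orders`, the sample document, and the 5000 ms `wtimeout` are assumptions chosen for illustration. If the timeout elapses, the server reports a write concern error but does not undo the write, so the application should treat that outcome as unconfirmed rather than failed.

```javascript
// Build write options that request majority acknowledgment.
// wtimeoutMs bounds how long the server waits for the majority (illustrative value).
function majorityWriteOptions(wtimeoutMs) {
  return { writeConcern: { w: "majority", wtimeout: wtimeoutMs } };
}

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  // `orders` is a hypothetical collection name.
  const res = db.orders.insertOne({ sku: "example", qty: 1 }, majorityWriteOptions(5000));
  print(res.acknowledged);
}
```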
flowchart TD
A[Primary accepts writes with w:1] --> B[Failover or stepdown occurs]
B --> C[Old primary becomes unreachable]
C --> D[New primary elected]
D --> E[Old primary rejoins replica set]
E --> F{Unreplicated writes exist?}
F -->|Yes| G[Member enters ROLLBACK]
G --> H[Revert writes locally]
H --> I[Write removed data to rollback/ under dbPath]
I --> J[Resume as SECONDARY]
F -->|No| J
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| w:1 writes during network partition | Primary continues accepting writes while isolated from secondaries | rs.status() for member states and lastHeartbeatMessage |
| Primary crash before replication | Writes acknowledged by primary but not yet replicated to secondaries | MongoDB logs for crash timestamps and election events |
| Slow secondaries falling behind | Replication lag spikes before failover, creating a large unreplicated window | Replication lag trend and oplog window size |
| Forced failover during bulk load | High write volume outpaces replication; old primary holds many unique operations | opcounters and oplog application rate on secondaries |
Quick checks
These commands are safe to run on any replica set member.
# Check for ROLLBACK member state
mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " " + m.stateStr))'
# Search logs for rollback events and unrecoverable errors
grep -iE "rollback|UnrecoverableRollbackError" /var/log/mongodb/mongod.log | tail -20
# Check if a rollback directory exists under dbPath
# (replace <dbPath> with the storage.dbPath value from your mongod configuration)
ls -la "<dbPath>/rollback/" 2>/dev/null || echo "No rollback directory"
# Check for write concern timeouts (writes accepted but not majority-committed)
mongosh --quiet --eval 'printjson(db.serverStatus().metrics.getLastError.wtimeouts)'
How to diagnose it
- Confirm member state. Run `rs.status()` and look for `stateStr: "ROLLBACK"`. If the member is still rolling back, do not restart it. Restarting mid-rollback can abort the process and force a full resync.
- Check MongoDB logs. Search `mongod.log` for `"rollback"` and `"UnrecoverableRollbackError"`. The logs contain the timestamp range being rolled back and the collection namespaces affected.
- Inspect the rollback directory. Check for files under `<dbPath>/rollback/`. Their presence confirms data was removed. Recovery requires manually inspecting these files and deciding whether to re-insert the data.
- Correlate with write concern. Check `db.serverStatus().metrics.getLastError.wtimeouts`. A non-zero rate indicates the application was already experiencing write concern timeouts before the failover, meaning the cluster was struggling to replicate writes.
- Determine the loss window. Compare the rolled-back oplog timestamp range to application logs or the oplog on the current primary to identify which writes were lost.
- Assess member health post-rollback. After the member returns to `SECONDARY`, verify replication lag and oplog window to ensure it can catch up without falling off again.
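The first and last steps can be combined into one pass over the `rs.status()` output. The following is a sketch, not an official tool: `summarizeMembers` is a hypothetical helper, and it estimates lag by subtracting each member's `optimeDate` from the primary's, which is only an approximation of true replication lag.

```javascript
// Summarize member state and approximate lag from an rs.status() document.
function summarizeMembers(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  return status.members.map(m => ({
    name: m.name,
    state: m.stateStr,
    inRollback: m.stateStr === "ROLLBACK",
    // Seconds behind the primary's last applied operation; null when not computable
    // (no primary visible, or the member reports no optimeDate, e.g. an arbiter).
    lagSeconds: primary && primary.optimeDate && m.optimeDate
      ? (primary.optimeDate - m.optimeDate) / 1000
      : null,
  }));
}

// Runs only inside mongosh, where the global `rs` helper exists.
if (typeof rs !== "undefined") {
  printjson(summarizeMembers(rs.status()));
}
```

Running this repeatedly while a member is in ROLLBACK is read-only and does not interfere with the rollback itself.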
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Replica set member state ROLLBACK | Binary indicator of data loss | Any member in ROLLBACK for any duration |
| Replication lag | Lag creates the window for unreplicated writes | Sustained lag >10 seconds on voting members |
| Write concern errors (wtimeouts) | Shows writes accepted but not majority-committed | Non-zero rate of wtimeout in serverStatus or logs |
| Oplog window | Determines how long a secondary can be down before forced resync | Window shrinking toward maintenance windows or lag durations |
| Healthy voting members | Without a majority of healthy, caught-up voting members, w:"majority" cannot be satisfied | Voting members down or lagging such that a majority is unavailable |
| UnrecoverableRollbackError in logs | Member cannot complete rollback automatically | Fatal assertion requiring operator intervention |
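The warning signs in this table can be expressed as a small evaluation function. This is a sketch under stated assumptions: the shape of the `sample` object and the name `evaluateSignals` are invented for illustration, the 10-second lag threshold comes from the table, and the 24-hour oplog window comes from the Prevention section. Tune both for your deployment.

```javascript
// Thresholds taken from this guide; adjust for your own deployment.
const LAG_WARN_SECONDS = 10;
const MIN_OPLOG_WINDOW_HOURS = 24;

// `sample` is a hypothetical shape: values you collect from rs.status(),
// two successive wtimeouts readings, and the oplog window in hours.
function evaluateSignals(sample) {
  const warnings = [];
  if (sample.memberStates.includes("ROLLBACK")) warnings.push("member in ROLLBACK");
  if (sample.maxLagSeconds > LAG_WARN_SECONDS) warnings.push("replication lag above threshold");
  if (sample.wtimeoutsDelta > 0) warnings.push("write concern timeouts increasing");
  if (sample.oplogWindowHours < MIN_OPLOG_WINDOW_HOURS) warnings.push("oplog window below threshold");
  return warnings;
}

console.log(evaluateSignals({
  memberStates: ["PRIMARY", "SECONDARY", "ROLLBACK"],
  maxLagSeconds: 42,
  wtimeoutsDelta: 0,
  oplogWindowHours: 36,
}));
```

The example sample trips two checks: the ROLLBACK member and the lag threshold.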
Fixes
Let rollback complete
If a member enters ROLLBACK, leave it alone. The process is automatic and can take seconds to hours depending on the volume of unreplicated writes. Monitor rs.status() until the state transitions to SECONDARY. Do not restart, do not step down the primary, and do not force a reconfiguration.
Recover from rollback files
After rollback, files remain in <dbPath>/rollback/. These files are BSON and hold the reverted data. There is no automatic replay command. Inspect them with bsondump, compare their contents to the current primary, and re-insert or merge documents manually. Blindly importing rollback files can overwrite newer data written after the failover. If the data is critical, prefer restoring from a point-in-time backup and reconciling with application audit logs.
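The comparison step can be sketched as a classification pass. The function below assumes you have already converted a rollback file to JSON with bsondump and parsed it into an array of objects, and that you have fetched the documents with the same `_id` values from the current primary. `classifyRolledBack` and its bucket names are hypothetical.

```javascript
// Key a document by its _id so extended-JSON ids (e.g. {"$oid": "..."}) compare by content.
const idKey = doc => JSON.stringify(doc._id);

// rolledBackDocs: documents parsed from bsondump output.
// currentDocs: documents with the same _ids read from the current primary.
function classifyRolledBack(rolledBackDocs, currentDocs) {
  const current = new Map(currentDocs.map(d => [idKey(d), d]));
  const result = { missing: [], identical: [], conflicting: [] };
  for (const doc of rolledBackDocs) {
    const live = current.get(idKey(doc));
    if (!live) {
      result.missing.push(doc);      // candidate for re-insert, after review
    } else if (JSON.stringify(live) === JSON.stringify(doc)) {
      result.identical.push(doc);    // nothing to do (comparison is key-order sensitive)
    } else {
      result.conflicting.push(doc);  // newer data exists: merge by hand
    }
  }
  return result;
}
```

Documents in `missing` are not automatically safe to re-insert: the application may have deleted them deliberately after the failover. Documents in `conflicting` are the ones a blind import would overwrite.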
Handle unrecoverable rollback
If logs show UnrecoverableRollbackError, the automatic rollback has aborted. In older versions this happens when the rollback exceeds 300 MB. The safest recovery is to wipe the member’s data directory and perform an initial sync. This is slow but guarantees consistency. Do not copy data files from another member unless you are certain they are consistent with the current primary.
Prevention
- Use `w:"majority"` for acknowledged writes. This is the only write concern that prevents rollback for acknowledged operations. Any application that requires durability across failover must use it.
- Monitor replication lag and oplog window. Keep sustained lag below 10 seconds and ensure the oplog window is at least 24 hours during peak load. A secondary that is already lagging before a failover creates a larger rollback surface.
- Resize the oplog if needed. MongoDB 4.0+ supports online oplog resizing with `replSetResizeOplog` without restarting the node.
- Verify majority-write readiness. Periodically check `rs.status()` to confirm that enough healthy, caught-up members exist to satisfy majority write concern. A primary with only one lagging secondary is one failure away from data loss.
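The majority-readiness check can be sketched by combining `rs.conf()` and `rs.status()`. This is a simplified approximation, not the server's exact calculation: it takes the required count as a simple majority of voting members and counts only healthy, data-bearing voters in PRIMARY or SECONDARY state. `majorityReadiness` is a hypothetical helper, and it does not check how caught-up each member is.

```javascript
// configMembers: rs.conf().members; statusMembers: rs.status().members.
function majorityReadiness(configMembers, statusMembers) {
  const voters = configMembers.filter(m => m.votes > 0);
  const required = Math.floor(voters.length / 2) + 1; // simple majority of voters
  const healthyStates = new Set(["PRIMARY", "SECONDARY"]);
  const available = voters.filter(v => {
    if (v.arbiterOnly) return false; // arbiters vote but hold no data
    const s = statusMembers.find(m => m.name === v.host);
    return s !== undefined && s.health === 1 && healthyStates.has(s.stateStr);
  }).length;
  // spare === 0 means one more failure blocks majority writes.
  return { required, available, spare: available - required };
}

// Runs only inside mongosh, where the global `rs` helper exists.
if (typeof rs !== "undefined") {
  printjson(majorityReadiness(rs.conf().members, rs.status().members));
}
```

A `spare` value of 0 is the situation described in the last bullet: the set can still acknowledge majority writes, but it has no margin left.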
How Netdata helps
- Netdata collects replica set member state and alerts when a member enters `ROLLBACK`.
- Replication lag is charted per secondary, exposing the lag spike that preceded the failover.
- Election events are logged and correlated with member state changes so you can trace the sequence that caused the rollback.
- Write concern timeout metrics (`wtimeouts`) surface when the cluster is failing to confirm majority writes before a failure.
- Netdata correlates oplog window trends with replication lag to warn when a secondary is approaching forced resync.
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB connection churn: high totalCreated rate and thread creation overhead
- MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling
- MongoDB connection storm spiral: reconnection floods after an election or deploy
- MongoDB exceeded memory limit for $group — aggregation spills and allowDiskUse
- MongoDB flow control throttling writes: when the primary slows itself down







