MongoDB rollback after failover: silent data loss and the rollback directory

A replica set member in ROLLBACK state, or an application reporting vanished documents after failover, means a former primary held writes that never reached a majority. When that node rejoins, MongoDB erases the divergent history and writes the removed data to files under <dbPath>/rollback/. The application may have received acknowledgment for those writes. With w:1, acknowledgment meant only that the primary applied the write. It did not guarantee replication to a majority or survival through failover. That is silent data loss.

Rollback is a binary data-loss event. Any occurrence in production is page-worthy. Rollback files make recovery possible, but the process is manual and error-prone. In MongoDB versions before 4.0, a rollback of more than 300 MB of data could not complete automatically, leaving the member stuck and forcing a full initial sync. Since 4.0 the size limit is gone and a time limit applies instead (24 hours by default, set by the rollbackTimeLimitSecs parameter). The only operational configuration that eliminates rollback risk for acknowledged writes is w:"majority".

What this means

When a former primary rejoins a replica set after a failover, MongoDB compares its oplog to the new primary’s oplog. If the old primary contains write operations that were never replicated to a majority of members before it stepped down, the rejoining member transitions to ROLLBACK. MongoDB walks back the oplog to the last entry that matches the new primary’s history. Every write after that common point is reverted locally. The reverted documents are serialized into files in the rollback/ directory under the node’s dbPath. Once rollback finishes, the member resumes as a SECONDARY and catches up normally.
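The common-point search described above can be illustrated with a toy model. This is a sketch, not MongoDB's implementation: oplog entries are reduced to plain objects with numeric `ts` and `term` fields (real entries carry a BSON Timestamp in `ts` and the election term in `t`), and `findCommonPoint` is a hypothetical helper name.

```javascript
// Toy model of rollback's common-point search. Each entry is
// { ts: <number>, term: <number>, op: <string> }; these shapes are
// simplified assumptions, not the real oplog format.
function findCommonPoint(localOplog, primaryOplog) {
  const primaryKeys = new Set(primaryOplog.map(e => `${e.term}:${e.ts}`));
  // Walk the local oplog from newest to oldest until an entry also exists upstream.
  for (let i = localOplog.length - 1; i >= 0; i--) {
    const e = localOplog[i];
    if (primaryKeys.has(`${e.term}:${e.ts}`)) {
      // Everything after the shared entry exists only on the old primary.
      return { commonPoint: e, toRollBack: localOplog.slice(i + 1) };
    }
  }
  // No shared entry at all: in this toy model the whole local history diverged.
  return { commonPoint: null, toRollBack: localOplog.slice() };
}

const oldPrimary = [
  { ts: 1, term: 1, op: "insert A" },
  { ts: 2, term: 1, op: "insert B" },
  { ts: 3, term: 1, op: "insert C" }, // acknowledged with w:1, never replicated
  { ts: 4, term: 1, op: "insert D" }, // acknowledged with w:1, never replicated
];
const newPrimary = [
  { ts: 1, term: 1, op: "insert A" },
  { ts: 2, term: 1, op: "insert B" },
  { ts: 3, term: 2, op: "insert E" }, // written after the election, in a new term
];

const result = findCommonPoint(oldPrimary, newPrimary);
console.log(result.commonPoint.op, result.toRollBack.map(e => e.op));
```

In this example the common point is "insert B", and "insert C" and "insert D" are the writes that would be reverted and written to the rollback directory.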

With w:"majority", a write is not acknowledged until a majority of voting data-bearing members have applied it. Once majority-acknowledged, a write is immune to rollback.
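As a minimal sketch of what requesting majority acknowledgment looks like from mongosh: the helper name `majorityWriteOptions`, the `orders` collection, the sample document, and the 5000 ms timeout are all illustrative assumptions. Only the `writeConcern` option shape (`w`, `wtimeout`) comes from MongoDB itself.

```javascript
// Build write options that request majority acknowledgment.
// wtimeout bounds how long the client waits for acknowledgment; if it expires,
// the write is reported as a write concern error but is NOT undone on the primary.
function majorityWriteOptions(wtimeoutMs) {
  return { writeConcern: { w: "majority", wtimeout: wtimeoutMs } };
}

// The insert runs only inside mongosh, where `db` is defined.
// `orders` is a hypothetical collection used for illustration.
if (typeof db !== "undefined") {
  const res = db.orders.insertOne(
    { sku: "example", qty: 1 },
    majorityWriteOptions(5000)
  );
  printjson(res);
}
```

Setting a `wtimeout` is a design choice: without it, a majority write can block indefinitely when too few members are reachable, while with it the application must treat a timeout as "outcome unknown" rather than as a failed write.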

flowchart TD
    A[Primary accepts writes with w:1] --> B[Failover or stepdown occurs]
    B --> C[Old primary becomes unreachable]
    C --> D[New primary elected]
    D --> E[Old primary rejoins replica set]
    E --> F{Unreplicated writes exist?}
    F -->|Yes| G[Member enters ROLLBACK]
    G --> H[Revert writes locally]
    H --> I[Write removed data to rollback/ under dbPath]
    I --> J[Resume as SECONDARY]
    F -->|No| J

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| w:1 writes during network partition | Primary continues accepting writes while isolated from secondaries | rs.status() for member states and lastHeartbeatMessage |
| Primary crash before replication | Writes acknowledged by primary but not yet replicated to secondaries | MongoDB logs for crash timestamps and election events |
| Slow secondaries falling behind | Replication lag spikes before failover, creating a large unreplicated window | Replication lag trend and oplog window size |
| Forced failover during bulk load | High write volume outpaces replication; old primary holds many unique operations | opcounters and oplog application rate on secondaries |

Quick checks

These commands are safe to run on any replica set member.

# Check for ROLLBACK member state
mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " " + m.stateStr))'

# Search logs for rollback events and unrecoverable errors
grep -iE "rollback|UnrecoverableRollbackError" /var/log/mongodb/mongod.log | tail -20

# Check if a rollback directory exists under dbPath
ls -la <dbPath>/rollback/ 2>/dev/null || echo "No rollback directory"

# Check for write concern timeouts (writes accepted but not majority-committed)
mongosh --quiet --eval 'printjson(db.serverStatus().metrics.getLastError.wtimeouts)'

How to diagnose it

  1. Confirm member state. Run rs.status() and look for stateStr: "ROLLBACK". If the member is still rolling back, do not restart it. Restarting mid-rollback can abort the process and force a full resync.
  2. Check MongoDB logs. Search mongod.log for "rollback" and "UnrecoverableRollbackError". The logs contain the timestamp range being rolled back and the collection namespaces affected.
  3. Inspect the rollback directory. Check for files under <dbPath>/rollback/. Their presence confirms data was removed. Recovery requires manually inspecting these files and deciding whether to re-insert the data.
  4. Correlate with write concern. Check db.serverStatus().metrics.getLastError.wtimeouts. This is a cumulative counter since the mongod process started, so compare successive readings. A value that was already climbing before the failover indicates the application was experiencing write concern timeouts, meaning the cluster was struggling to replicate writes.
  5. Determine the loss window. Compare the rolled-back oplog timestamp range to application logs or the oplog on the current primary to identify which writes were lost.
  6. Assess member health post-rollback. After the member returns to SECONDARY, verify replication lag and oplog window to ensure it can catch up without falling off again.
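Steps 1 and 6 can be combined into one pass over the rs.status() output. The sketch below assumes only fields that rs.status() reports for each member (`name`, `stateStr`, `optimeDate`); the function name `summarizeReplicaSet` and the sample hostnames are hypothetical.

```javascript
// Summarize a document shaped like rs.status() output:
// flag members in ROLLBACK and estimate each member's lag behind the primary.
function summarizeReplicaSet(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  return status.members.map(m => ({
    name: m.name,
    state: m.stateStr,
    inRollback: m.stateStr === "ROLLBACK",
    // Lag is only meaningful when a primary is visible and both optimes are reported.
    lagSeconds:
      primary && primary.optimeDate && m.optimeDate
        ? Math.max(0, (primary.optimeDate - m.optimeDate) / 1000)
        : null,
  }));
}

// Sample input with hypothetical hostnames, so the sketch runs without a cluster.
const sampleStatus = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:30Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:28Z") },
    { name: "db3:27017", stateStr: "ROLLBACK", optimeDate: new Date("2024-01-01T00:00:05Z") },
  ],
};
console.log(summarizeReplicaSet(sampleStatus));

// Inside mongosh, the same function can be applied to the live replica set.
if (typeof rs !== "undefined") {
  printjson(summarizeReplicaSet(rs.status()));
}
```

The lag figure here is derived from optime dates and is an approximation; it depends on how fresh the heartbeat data on the queried member is.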

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Replica set member state ROLLBACK | Binary indicator of data loss | Any member in ROLLBACK for any duration |
| Replication lag | Lag creates the window for unreplicated writes | Sustained lag >10 seconds on voting members |
| Write concern errors (wtimeouts) | Shows writes accepted but not majority-committed | Non-zero rate of wtimeout in serverStatus or logs |
| Oplog window | Determines how long a secondary can be down before forced resync | Window shrinking toward maintenance windows or lag durations |
| Healthy voting members | Without a majority of healthy, caught-up voting members, w:"majority" cannot be satisfied | Voting members down or lagging such that a majority is unavailable |
| UnrecoverableRollbackError in logs | Member cannot complete rollback automatically | Fatal assertion requiring operator intervention |
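Because wtimeouts in serverStatus is a cumulative counter, a "non-zero rate" has to be derived from two samples. The following is one possible sketch: `wtimeoutRate` and the sample shape `{ at, wtimeouts }` are assumptions made for illustration, and the 10-second sampling interval is arbitrary.

```javascript
// Compute write concern timeouts per second between two samples.
// prev and curr are { at: <ms since epoch>, wtimeouts: <counter value> }.
function wtimeoutRate(prev, curr) {
  const elapsedSec = (curr.at - prev.at) / 1000;
  if (elapsedSec <= 0) return null; // samples out of order or identical
  // A counter lower than before suggests mongod restarted, so count from zero.
  const delta =
    curr.wtimeouts >= prev.wtimeouts
      ? curr.wtimeouts - prev.wtimeouts
      : curr.wtimeouts;
  return delta / elapsedSec;
}

console.log(wtimeoutRate({ at: 0, wtimeouts: 2 }, { at: 10000, wtimeouts: 7 }));

// Inside mongosh, take two live samples ten seconds apart.
if (typeof db !== "undefined") {
  const sample = () => ({
    at: Date.now(),
    // Number(String(...)) handles both plain numbers and Long values.
    wtimeouts: Number(String(db.serverStatus().metrics.getLastError.wtimeouts)),
  });
  const first = sample();
  sleep(10000);
  const second = sample();
  print(wtimeoutRate(first, second));
}
```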

Fixes

Let rollback complete

If a member enters ROLLBACK, leave it alone. The process is automatic and can take seconds to hours depending on the volume of unreplicated writes. Monitor rs.status() until the state transitions to SECONDARY. Do not restart, do not step down the primary, and do not force a reconfiguration.

Recover from rollback files

After rollback, files remain in <dbPath>/rollback/. These files are BSON and hold the reverted data. There is no automatic replay command. Inspect them with bsondump, compare their contents to the current primary, and re-insert or merge documents manually. Blindly importing rollback files can overwrite newer data written after the failover. If the data is critical, prefer restoring from a point-in-time backup and reconciling with application audit logs.
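The comparison step can be sketched as a triage function over documents already decoded from the rollback files (for example from bsondump's JSON output). Everything here is an illustrative assumption: the function name `classifyRolledBack`, the action labels, the sample documents, and the use of JSON.stringify for equality, which is sensitive to field order and does not understand BSON types.

```javascript
// Triage rolled-back documents against what the primary holds now.
// rolledBackDocs: array of documents decoded from the rollback files.
// currentById: Map from JSON.stringify(_id) to the document currently on the primary.
function classifyRolledBack(rolledBackDocs, currentById) {
  return rolledBackDocs.map(doc => {
    const current = currentById.get(JSON.stringify(doc._id));
    let action;
    if (current === undefined) {
      // Absent on the primary: a candidate for re-insert, but it may also have
      // been deleted deliberately after the failover, so it still needs review.
      action = "candidate-reinsert";
    } else if (JSON.stringify(current) === JSON.stringify(doc)) {
      action = "already-present";
    } else {
      // The primary holds a different version: never overwrite blindly.
      action = "conflict-review";
    }
    return { _id: doc._id, action };
  });
}

const rolledBack = [
  { _id: 1, status: "paid" },
  { _id: 2, status: "paid" },
  { _id: 3, status: "paid" },
];
const current = new Map([
  ["2", { _id: 2, status: "paid" }],
  ["3", { _id: 3, status: "refunded" }],
]);
console.log(classifyRolledBack(rolledBack, current));
```

The point of separating classification from writing is that nothing is modified until a person has reviewed the "conflict-review" and "candidate-reinsert" groups.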

Handle unrecoverable rollback

If logs show UnrecoverableRollbackError, the automatic rollback has aborted. In versions before 4.0 this happens when the rollback exceeds 300 MB of data. The safest recovery is to wipe the member’s data directory and perform an initial sync. This is slow but guarantees consistency. Do not copy data files from another member unless you are certain they are consistent with the current primary.

Prevention

  • Use w:"majority" for acknowledged writes. This is the only write concern that prevents rollback for acknowledged operations. Any application that requires durability across failover must use it.
  • Monitor replication lag and oplog window. Keep sustained lag below 10 seconds and ensure the oplog window is at least 24 hours during peak load. A secondary that is already lagging before a failover creates a larger rollback surface.
  • Resize the oplog if needed. MongoDB 3.6+ supports online oplog resizing with the replSetResizeOplog command, without restarting the node.
  • Verify majority-write readiness. Periodically check rs.status() to confirm that enough healthy, caught-up members exist to satisfy majority write concern. A primary with only one lagging secondary is one failure away from data loss.
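The majority-readiness check in the last bullet can be sketched as a function over rs.conf() and rs.status() output. The required count follows MongoDB's documented rule for the write majority: the smaller of the majority of all voting members and the number of data-bearing voting members. The function name `majorityReadiness`, the 10-second lag threshold, and the sample hostnames are assumptions.

```javascript
// Estimate how many caught-up, data-bearing voting members exist compared with
// how many a w:"majority" write needs.
// config is shaped like rs.conf(), status like rs.status().
function majorityReadiness(config, status, maxLagSeconds) {
  const voters = config.members.filter(m => m.votes !== 0);
  const dataVoterHosts = new Set(
    voters.filter(m => !m.arbiterOnly).map(m => m.host)
  );
  const required = Math.min(
    Math.floor(voters.length / 2) + 1,
    dataVoterHosts.size
  );
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  const caughtUp = status.members.filter(m => {
    if (!dataVoterHosts.has(m.name) || m.health !== 1) return false;
    if (m.stateStr === "PRIMARY") return true;
    if (m.stateStr !== "SECONDARY" || !primary) return false;
    return (primary.optimeDate - m.optimeDate) / 1000 <= maxLagSeconds;
  }).length;
  // spare === 0 means one more failure or lag spike blocks majority writes.
  return { required, caughtUp, spare: caughtUp - required };
}

// Sample three-member set with one lagging secondary (hypothetical hostnames).
const sampleConf = {
  members: [
    { host: "db1:27017", votes: 1, arbiterOnly: false },
    { host: "db2:27017", votes: 1, arbiterOnly: false },
    { host: "db3:27017", votes: 1, arbiterOnly: false },
  ],
};
const sampleState = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", health: 1, optimeDate: new Date("2024-01-01T00:01:00Z") },
    { name: "db2:27017", stateStr: "SECONDARY", health: 1, optimeDate: new Date("2024-01-01T00:00:58Z") },
    { name: "db3:27017", stateStr: "SECONDARY", health: 1, optimeDate: new Date("2024-01-01T00:00:00Z") },
  ],
};
console.log(majorityReadiness(sampleConf, sampleState, 10));

// Inside mongosh, run the same check against the live replica set.
if (typeof rs !== "undefined") {
  printjson(majorityReadiness(rs.conf(), rs.status(), 10));
}
```

In the sample, two members are needed and exactly two are caught up, so there is no spare capacity: the situation the bullet describes as one failure away from trouble.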

How Netdata helps

  • Netdata collects replica set member state and alerts when a member enters ROLLBACK.
  • Replication lag is charted per secondary, exposing the lag spike that preceded the failover.
  • Election events are logged and correlated with member state changes so you can trace the sequence that caused the rollback.
  • Write concern timeout metrics (wtimeouts) surface when the cluster is failing to confirm majority writes before a failure.
  • Netdata correlates oplog window trends with replication lag to warn when a secondary is approaching forced resync.