MongoDB not master error: writes hitting a non-primary node after failover

A node restart, network partition, or planned stepdown triggers a MongoDB election. Seconds later, application logs show NotWritablePrimary (code 10107) or the legacy string "not master and slaveOk=false". Writes fail against a node that used to be PRIMARY, even though the cluster has elected a new one.

This guide covers how to find the root cause and stop it from recurring.

What this means

MongoDB replica sets elect exactly one PRIMARY at a time. When a failover occurs, the old primary steps down and a secondary is promoted. Application drivers discover the new topology through the replica set seed list and refresh their connection pools automatically. Between stepdown and election completion, there is a brief window with no writable primary. After the new primary is elected, drivers should route writes there.

If writes land on a non-primary node after the election has settled, the driver’s view of the topology is stale, the connection is pinned to a specific host, or the application timed out before discovery completed. The cause is almost always in the driver configuration, connection pool state, or timeout behavior.
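To separate these errors from unrelated write failures in application logs, a small classifier can help. The sketch below is an assumption-laden example, not driver code: the function name classifyWriteError is hypothetical, and the code list goes beyond the single code named above by adding related replica set state-change codes from the MongoDB server error list, which you should verify against your server version.

```javascript
// Error codes that signal a replica set state change.
// Only 10107 is named in this guide; the others are related server codes (assumption: verify for your version).
const NOT_PRIMARY_CODES = new Map([
  [10107, "NotWritablePrimary"],
  [13435, "NotPrimaryNoSecondaryOk"],
  [13436, "NotPrimaryOrSecondary"],
  [189, "PrimarySteppedDown"],
  [11602, "InterruptedDueToReplStateChange"],
]);

// Older servers and drivers may only expose the message text.
const LEGACY_MESSAGE = /not master|node is recovering/i;

// Hypothetical helper: returns whether an error object looks like a "not primary" failure.
function classifyWriteError(err) {
  if (err && NOT_PRIMARY_CODES.has(err.code)) {
    return { notPrimary: true, reason: NOT_PRIMARY_CODES.get(err.code) };
  }
  if (err && typeof err.message === "string" && LEGACY_MESSAGE.test(err.message)) {
    return { notPrimary: true, reason: "legacy message match" };
  }
  return { notPrimary: false, reason: null };
}
```

Matching on the numeric code first is deliberate: message strings changed between server versions, while the codes stayed stable.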

flowchart TD
  A[NotWritablePrimary error] --> B["Run db.runCommand({ hello: 1 }) on target host"]
  B --> C{isWritablePrimary?}
  C -->|false| D[Node is secondary or recovering]
  C -->|true| E[Driver topology stale or directConnection pinned]
  D --> F[Check rs.status for recent election]
  F --> G{Election occurred?}
  G -->|yes| H[Driver has not refreshed topology]
  G -->|no| I[Node stuck in RECOVERING or ROLLBACK]
  H --> J{Check driver URI}
  J -->|directConnection=true| K[Remove directConnection]
  J -->|"serverSelectionTimeoutMS < 10s"| L[Increase timeout to 30s]
  J -->|retryWrites=false| M[Enable retryable writes]
  E --> N[Restart application to rebuild connection pool]
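
The same decision tree can be written as a function, which is handy in a runbook script. This is a minimal sketch that mirrors the flowchart branches; the function name nextSteps and the shape of its input object are assumptions made for illustration.

```javascript
// Hypothetical helper mirroring the flowchart above.
// isWritablePrimary: result of hello on the target host
// electionOccurred: whether rs.status() or the logs show a recent election
// uriOptions: already-parsed connection options (assumed shape)
function nextSteps({ isWritablePrimary, electionOccurred, uriOptions = {} }) {
  if (isWritablePrimary) {
    return ["Driver topology stale or directConnection pinned: restart application to rebuild connection pool"];
  }
  if (!electionOccurred) {
    return ["Node may be stuck in RECOVERING or ROLLBACK: inspect rs.status()"];
  }
  const steps = [];
  if (uriOptions.directConnection === true) {
    steps.push("Remove directConnection");
  }
  if (typeof uriOptions.serverSelectionTimeoutMS === "number" && uriOptions.serverSelectionTimeoutMS < 10000) {
    steps.push("Increase serverSelectionTimeoutMS to 30000");
  }
  if (uriOptions.retryWrites === false) {
    steps.push("Enable retryable writes");
  }
  if (steps.length === 0) {
    steps.push("Driver has not refreshed topology: check connection pool state");
  }
  return steps;
}
```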

Common causes

Cause	What it looks like	First thing to check
Stale connection pool after planned primary switch	Errors point to the former primary; only applications that did not restart are affected	Run `db.runCommand({ hello: 1 })` on the target node to confirm it is no longer primary
`directConnection=true` in the connection URI	Every write fails against the same seed host, even when other nodes are healthy	Inspect the application connection string for `directConnection=true`
Aggressive `serverSelectionTimeoutMS` or socket timeout	Driver raises `MongoTimeoutException` or `NotWritablePrimary` during brief elections lasting 2-12 seconds	Driver timeout settings; compare to the default 30,000 ms
Retryable writes disabled or older driver	Brief election window causes permanent write failures instead of a single automatic retry	Inspect the URI for `retryWrites=false` and check the driver version
Transaction writes during an active election	Multi-document transaction fails and is not individually retried; only commit and abort are retryable	Application transaction retry logic

Quick checks

// Verify the target node's current role
db.runCommand({ hello: 1 }).isWritablePrimary

// Check replica set member states
rs.status().members.forEach(function(m) {
  print(m.name + " -> " + m.stateStr);
});

# Look for recent elections in the log
grep -iE "election|stepping down" /var/log/mongodb/mongod.log | tail -20

// Check current connections and churn
var c = db.serverStatus().connections;
print("Current: " + c.current + ", Available: " + c.available + ", Total created: " + c.totalCreated);

// Check write throughput on the current primary
db.serverStatus().opcounters

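// Check average write latency (cumulative since the server process started)
var lat = db.serverStatus().opLatencies;
var writeOps = Number(lat.writes.ops);
print("Write avg (µs): " + (writeOps > 0 ? Number(lat.writes.latency) / writeOps : "n/a"));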
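
To turn the member-state check into a single summary, the rs.status() document can be reduced to the current primary, its election date, and the members that cannot accept writes. This is a sketch under assumptions: the helper name summarizeReplSet is hypothetical, and it relies only on the name, stateStr, and electionDate fields of each member.

```javascript
// Hypothetical helper: condenses an rs.status() result into the facts needed for this diagnosis.
function summarizeReplSet(status) {
  const members = status.members || [];
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  // A healthy set reports exactly one PRIMARY; anything else is returned as null.
  const primary = primaries.length === 1 ? primaries[0] : null;
  return {
    primary: primary ? primary.name : null,
    electionDate: primary && primary.electionDate ? primary.electionDate : null,
    notWritable: members
      .filter((m) => m.stateStr !== "PRIMARY")
      .map((m) => `${m.name} (${m.stateStr})`),
  };
}
```

In mongosh this could be called as printjson(summarizeReplSet(rs.status())). If the host named in the application error appears in notWritable, the write was misrouted.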

How to diagnose it

Identify the exact error and target host. Modern drivers return NotWritablePrimary (10107). Legacy drivers may return "not master". Note the host the application is targeting.
Confirm the target node is not primary. Connect directly to that host and run db.runCommand({ hello: 1 }). If isWritablePrimary is false, the application is writing to a secondary or recovering node. If it is true, the node is primary at the time of your check, so the error was most likely raised while it was briefly stepped down, or the driver is acting on a stale view of the topology.
Check for a recent election. Search the MongoDB log for Starting an election, Stepping down, or VoteRequester. Elections typically complete in 2-12 seconds, but driver discovery depends on heartbeatFrequencyMS. Cross-reference with rs.status(): compare electionTime and stateStr across members to confirm when the new primary took over.
Inspect the connection URI for directConnection=true. This setting forces the driver into Single topology and pins all operations to the seed host, bypassing replica set discovery entirely. It is a frequent misconfiguration in Kubernetes StatefulSets where each pod exposes its own host.
Compare timeout values to the election window. If serverSelectionTimeoutMS is set to a few seconds, the driver may time out before it discovers the new primary. The default is 30,000 ms. Values below 10,000 ms are risky during failover.
Check if the application disables retryable writes or uses an older driver. Inspect the URI for retryWrites=false. With retryWrites=true, the driver automatically retries single-document writes once after a transient error. If disabled, the application sees the error immediately and must handle it itself.
Evaluate connection pool state. After a planned primary switch, applications that do not restart may retain idle connections to the old primary. Monitor totalCreated over time: a sharp rise after the switch indicates the driver is discarding stale connections and rebuilding the pool. If totalCreated stays flat but errors persist, connections are likely pinned by directConnection=true or the driver has not yet attempted to create new ones.
For transaction errors, verify application-level retry logic. Writes inside a multi-document transaction are not individually retryable. Only the commitTransaction and abortTransaction operations are retryable.
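
The URI checks in the steps above can be automated. The following is a rough sketch, assuming a standard mongodb:// or mongodb+srv:// connection string; the function name auditMongoUri and the finding labels are made up for this example, and the 10,000 ms threshold comes from this guide rather than from any driver.

```javascript
// Hypothetical helper: flags the connection-string settings discussed in this guide.
function auditMongoUri(uri) {
  const isSrv = /^mongodb\+srv:\/\//i.test(uri);
  const rest = uri.replace(/^mongodb(\+srv)?:\/\//i, "");

  // Split off the query string, then drop credentials and the database path.
  const qIndex = rest.indexOf("?");
  const beforeQuery = qIndex === -1 ? rest : rest.slice(0, qIndex);
  const query = qIndex === -1 ? "" : rest.slice(qIndex + 1);
  const hostPart = beforeQuery.slice(beforeQuery.lastIndexOf("@") + 1).split("/")[0];
  const hosts = hostPart.split(",").filter(Boolean);

  // Connection option names are case-insensitive, so normalize the keys.
  const options = new Map();
  for (const [key, value] of new URLSearchParams(query)) {
    options.set(key.toLowerCase(), value);
  }

  const findings = [];
  if ((options.get("directconnection") || "").toLowerCase() === "true") {
    findings.push({ check: "directConnection", detail: "Pins all operations to the seed host" });
  }
  const timeout = options.get("serverselectiontimeoutms");
  if (timeout !== undefined && Number(timeout) < 10000) {
    findings.push({ check: "serverSelectionTimeoutMS", detail: `Set to ${timeout} ms, below 10000 ms` });
  }
  if ((options.get("retrywrites") || "").toLowerCase() === "false") {
    findings.push({ check: "retryWrites", detail: "Retryable writes are disabled" });
  }
  if (!isSrv && hosts.length === 1) {
    findings.push({ check: "seedList", detail: "Only one seed host is listed" });
  }
  return findings;
}
```

An empty result does not prove the configuration is safe: options can also be set in driver code rather than in the URI, which this sketch cannot see.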

Metrics and signals to monitor

Signal	Why it matters	Warning sign
Replica set member state	Shows which node is PRIMARY; any write to another state fails	Target node shows SECONDARY, RECOVERING, or ROLLBACK
Election events	Elections change topology; drivers need time to refresh via heartbeats	Elections outside maintenance windows
Connection count and churn	Stale pools or reconnection storms after failover	`totalCreated` delta spiking after member state changes
Operation latency (`opLatencies`)	High latency triggers aggressive timeouts that abort topology discovery	Write average approaching or exceeding `serverSelectionTimeoutMS`
opcounters write rate	A near-zero write rate on the primary while applications error confirms misrouted traffic	Write opcounters flat on PRIMARY during reported write failures
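
The serverStatus counters behind these signals are cumulative, so warning signs such as a totalCreated spike or a flat write rate only show up as differences between two samples. A minimal sketch of that calculation follows; the function name windowedSignals is hypothetical, and it assumes both samples have already been converted to plain numbers.

```javascript
// Hypothetical helper: derives per-interval signals from two serverStatus samples.
// prev and curr are assumed to contain connections, opcounters, and opLatencies with numeric fields.
function windowedSignals(prev, curr, intervalSeconds) {
  const newConnections = curr.connections.totalCreated - prev.connections.totalCreated;

  // Sum the write-type opcounters: insert, update, delete.
  const writeCount = (s) => s.opcounters.insert + s.opcounters.update + s.opcounters.delete;
  const writes = writeCount(curr) - writeCount(prev);

  // opLatencies.writes.latency is total microseconds; divide by the ops completed in the window.
  const latencyOps = curr.opLatencies.writes.ops - prev.opLatencies.writes.ops;
  const latencyMicros = curr.opLatencies.writes.latency - prev.opLatencies.writes.latency;

  return {
    newConnectionsPerSec: newConnections / intervalSeconds,
    writesPerSec: writes / intervalSeconds,
    avgWriteLatencyMicros: latencyOps > 0 ? latencyMicros / latencyOps : null,
  };
}
```

A writesPerSec of zero on the PRIMARY combined with a high newConnectionsPerSec matches the misrouted-traffic pattern described in the table.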

Fixes

Remove directConnection=true

If the connection URI contains directConnection=true, remove it. This setting forces the driver to treat the seed host as the only node, disabling replica set topology discovery. The fix requires an application redeploy or restart.

Refresh stale connection pools

After a planned primary switch, restart application instances to force connection pools to rebuild against the new primary. Most drivers detect topology changes automatically, but if connections remain pinned, a restart clears them. Warning: restarting causes a brief capacity reduction and disrupts in-flight requests.

Extend serverSelectionTimeoutMS

Increase serverSelectionTimeoutMS to at least 10,000 ms, and preferably back to the default of 30,000 ms. Values below 10,000 ms often expire before the election completes and the driver refreshes its topology. Tradeoff: slower detection of permanently unreachable nodes.

Enable retryable writes

Ensure the URI includes retryWrites=true. This is the default in current MongoDB drivers. It handles transient NotWritablePrimary errors during brief elections by retrying once. Tradeoff: a small latency penalty for the retry handshake.

Add application-level transaction retries

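For multi-document transactions, implement retry logic in the application. Individual writes inside a transaction are not retryable; drivers retry only commitTransaction and abortTransaction on their own. When a failover aborts a transaction, rerun the whole transaction if the error carries the TransientTransactionError label, and retry only the commit if it carries the UnknownTransactionCommitResult label. Tradeoff: requires code changes.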
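
A minimal sketch of such retry logic is shown below. It assumes the driver marks transaction failures that are safe to rerun with the TransientTransactionError label and exposes a hasErrorLabel() method on the error object; the wrapper name runWithTransactionRetry, the attempt limit, and the delay are illustrative choices, not driver defaults.

```javascript
// Hypothetical wrapper: reruns the whole transaction when the error is labeled transient.
// runTransaction is a caller-supplied async function that starts, runs, and commits one transaction.
async function runWithTransactionRetry(runTransaction, { maxAttempts = 3, delayMs = 100 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await runTransaction(attempt);
    } catch (err) {
      const transient =
        err && typeof err.hasErrorLabel === "function" && err.hasErrorLabel("TransientTransactionError");
      // Give up on non-transient errors and on the final attempt.
      if (!transient || attempt === maxAttempts) {
        throw err;
      }
      // Linear backoff gives the election time to finish before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw new RangeError("maxAttempts must be at least 1");
}
```

Many drivers ship a callback-style helper that already behaves this way, for example withTransaction in the Node.js driver. Where one is available, it is usually a better choice than a hand-written loop.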

Fix Kubernetes StatefulSet routing

If the application connects directly to a single pod hostname because of a headless service workaround, switch to the full replica set seed list in the URI and remove directConnection=true. A headless Kubernetes service returns pod IPs, but if the application hardcodes one pod’s DNS name or uses a single-pod endpoint, the driver never sees the other members. Use the StatefulSet headless service DNS names for all pods in the seed list.
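
As an illustration, the seed list for a StatefulSet can be generated from its pod naming pattern. This is a sketch under assumptions: the function name statefulSetUri and all example names are hypothetical, and it assumes the default cluster.local cluster domain and the usual pod-ordinal DNS form behind a headless service.

```javascript
// Hypothetical helper: builds a replica set URI listing every StatefulSet pod.
// Assumes pod DNS names of the form <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local.
function statefulSetUri({ statefulSet, service, namespace, replicas, replicaSet, port = 27017, database = "" }) {
  const hosts = Array.from(
    { length: replicas },
    (_, i) => `${statefulSet}-${i}.${service}.${namespace}.svc.cluster.local:${port}`
  );
  // No directConnection option is emitted, so the driver performs replica set discovery.
  return `mongodb://${hosts.join(",")}/${database}?replicaSet=${encodeURIComponent(replicaSet)}`;
}
```

The hostnames in the seed list should match the member names in the replica set configuration, because drivers switch to the addresses the servers report once discovery starts.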

Prevention

Never use directConnection=true in production replica set connections. Use the full seed list and let the driver discover the primary.
Set serverSelectionTimeoutMS to at least 10,000 ms, preferably 30,000 ms, to survive elections without timing out.
Keep drivers up to date and do not disable retryWrites.
Restart application instances or verify that connection pools refresh after planned primary maintenance to avoid stale connections to the old primary.
Monitor election events and alert when they occur outside of maintenance windows.
Verify that load balancers or proxies between the application and MongoDB do not pin connections to a single node, as this defeats driver topology discovery.
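
The election alert can be prototyped by scanning structured log lines. The sketch below assumes the JSON log format used by MongoDB 4.4 and later, where each line carries a timestamp in t.$date, a component in c, and a message in msg; the function name electionEventsOutsideWindow and the matching rules are assumptions to adapt to your own logs.

```javascript
// Hypothetical helper: returns election-related log events that fall outside a maintenance window.
// logLines: array of raw log lines; windowStart and windowEnd: Date objects (end is exclusive).
function electionEventsOutsideWindow(logLines, windowStart, windowEnd) {
  const events = [];
  for (const line of logLines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that are not structured JSON
    }
    if (!entry || typeof entry !== "object") continue;

    const msg = String(entry.msg || "");
    // Match on the ELECTION component or on the phrases this guide greps for.
    const isElection = entry.c === "ELECTION" || /election|stepping down/i.test(msg);
    if (!isElection) continue;

    const at = new Date(entry.t && entry.t.$date);
    if (Number.isNaN(at.getTime())) continue;

    if (at < windowStart || at >= windowEnd) {
      events.push({ at, msg });
    }
  }
  return events;
}
```

Any non-empty result is a candidate alert: an election nobody scheduled is often the first visible sign of the failover that precedes these write errors.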

How Netdata helps

Correlate replica set member state changes with application error spikes to identify stale topology quickly.
Track MongoDB connection count and totalCreated churn to detect reconnection storms after failovers.
Monitor opLatencies write latency to catch aggressive timeout configurations before they cause mass write failures.
Alert on election events parsed from MongoDB logs to surface the root-cause timeline.
Visualize opcounters drops on the primary to confirm that write traffic is not reaching the new primary.

