MongoDB connection churn: high totalCreated rate and thread creation overhead

db.serverStatus().connections can show low current and a rapidly climbing totalCreated. That mismatch is connection churn: connections open and close rapidly instead of being reused. MongoDB uses a thread-per-connection model, so each cycle costs roughly a megabyte of thread stack, scheduling overhead, and file descriptor work. The result is rising RSS, CPU contention, and latency spikes that do not correlate with the active connection count.

For the broader mental model, see How MongoDB actually works in production: a mental model for operators. For the cascade after a failover, see MongoDB connection storm spiral: reconnection floods after an election or deploy.

What this means

db.serverStatus().connections reports three values:

current: connections open right now.
available: connection slots remaining before the server limit.
totalCreated: cumulative connections created since the mongod process started.

A high delta on totalCreated while current stays flat means clients are not holding connections. They open, authenticate, possibly run one or a few operations, close, and repeat. On a thread-per-connection deployment, a pool of 500 connections that is torn down and rebuilt 100 times per minute produces 50,000 thread create/destroy cycles per minute. That generates memory and scheduler pressure even though current never exceeds 500.

Churn is also a leading indicator of a connection storm spiral. Once latency rises, applications time out and reconnect more aggressively, which raises churn further. Catching the high totalCreated rate early stops the feedback loop before memory or ticket exhaustion forces an outage.

flowchart TD
    A[Stable current connections] --> B[Rising totalCreated delta]
    B --> C[Connection churn]
    C --> D[Thread create/destroy]
    D --> E[Memory RSS growth]
    D --> F[CPU scheduling overhead]
    F --> G[Operation latency spikes]
    E --> H[Ticket contention]
    H --> G

Common causes

Cause	What it looks like	First thing to check
Client created per request, common in FaaS/serverless handlers	`totalCreated` spikes with each request wave; `current` stays flat; many short-lived source IPs	Application logs and `db.currentOp()` grouped by `client`
Driver pool too large or idle timeout too aggressive	Rapid open/close cycles; driver pool metrics show high creation	Driver pool settings for size and idle behavior
Reconnect storm after election, deploy, or network blip	`totalCreated` surges after a topology event; correlates with election log entries	MongoDB logs for `"Starting an election"` or `"Stepping down"`; `rs.status()`
Monitoring or scraping tools opening a fresh connection per check	Steady, low-rate churn from a small set of hosts	Source IPs in `currentOp`; monitoring agent configuration
Load balancer or proxy health checks resetting TCP	Repeated short-lived connections from the LB IP; duration is seconds	`ss` output sorted by source IP and state

Quick checks

Run these in order. All are read-only.

# Check current, available, active, and totalCreated
mongosh --quiet --eval 'JSON.stringify(db.serverStatus().connections)'

// Compute totalCreated delta over 60 seconds
var first = db.serverStatus().connections;
sleep(60000);
var second = db.serverStatus().connections;
print("current: " + second.current);
print("totalCreated delta / min: " + (second.totalCreated - first.totalCreated));
print("active: " + (second.active !== undefined ? second.active : "N/A"));

// Active vs current ratio and utilization against the server limit
var c = db.serverStatus().connections;
var util = 100 * c.current / (c.current + c.available);
print("utilization: " + util.toFixed(1) + "%");
print("active/current: " + (c.active !== undefined ? (c.active / c.current).toFixed(2) : "N/A"));

// Group connections, including idle ones, by client IP to find churn sources
// db.currentOp(true) also lists idle connections; mongos reports client_s instead of client
var counts = {};
db.currentOp(true).inprog.forEach(function(op) {
  var ip = (op.client || op.client_s || "unknown").split(":")[0];
  counts[ip] = (counts[ip] || 0) + 1;
});
printjson(counts);

# Look for recent elections, connection errors, or resets in the logs
grep -iE "Starting an election|Stepping down|connection refused|error accepting" /var/log/mongodb/mongod.log | tail -20

# Compare open file descriptors to the process soft and hard limits (assumes one mongod)
PID=$(pgrep -x mongod)
ls /proc/$PID/fd | wc -l
grep "Max open files" /proc/$PID/limits

# Show established connections by source IP to spot repeat short-lived clients
# Strips the last :port; assumes IPv4 source addresses
ss -tnp | awk 'NR>1 {print $5}' | sed 's/:[^:]*$//' | sort | uniq -c | sort -rn | head

How to diagnose it

1. Confirm churn, not growth. Sample totalCreated twice over 60 seconds. If the delta is high while current is stable or only slightly changed, you have churn rather than legitimate pool growth.
2. Correlate with a trigger. Check MongoDB logs for elections, stepdowns, network errors, or application deployments. Churn that starts within seconds of an election points to a reconnect storm. Churn that tracks application request rate points to per-request client creation.
3. Identify the source hosts. Use db.currentOp() grouped by client to find which application instances or middleware keep opening short-lived connections. If the same IP appears repeatedly with new source ports, that host is the culprit.
4. Check driver and application behavior. Verify whether the application creates a new MongoClient per request or per handler invocation. Verify whether monitoring agents authenticate on every scrape. Verify whether load balancer health checks open a new TCP connection each time.
5. Quantify impact. Correlate the churn window with:
   - mem.resident growth that outpaces your baseline plus WiredTiger cache size and connection overhead (~1 MB per current connection).
   - opLatencies tail latency rising.
   - globalLock.currentQueue growing, or available tickets in wiredTiger.concurrentTransactions dropping.
   - File descriptor usage climbing toward ulimit -n.
6. Classify the root cause. Use the common causes table to decide whether the fix belongs in application code, driver configuration, infrastructure health checks, or the MongoDB network topology.
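
The first step, confirming churn rather than growth, can be sketched as a small pure function over two connections samples. The function name and both thresholds (600 connections created per minute, 10% drift in current) are illustrative assumptions, not MongoDB defaults; tune them against your own baseline.

```javascript
// Classify two db.serverStatus().connections samples taken `seconds` apart.
// Thresholds are hypothetical starting points, not MongoDB defaults.
function classifyConnections(first, second, seconds, opts = {}) {
  const maxCreatedPerMin = opts.maxCreatedPerMin ?? 600; // assumed alert threshold
  const maxCurrentDrift = opts.maxCurrentDrift ?? 0.1;   // 10% growth still counts as "flat"

  // Normalize the totalCreated delta to a per-minute rate.
  const createdPerMin = ((second.totalCreated - first.totalCreated) * 60) / seconds;

  // Relative growth of `current` between the two samples.
  const drift = first.current > 0
    ? (second.current - first.current) / first.current
    : (second.current > 0 ? 1 : 0);

  let verdict = "normal";
  if (createdPerMin > maxCreatedPerMin) {
    // Many new connections with a flat `current` means they are not being kept.
    verdict = drift > maxCurrentDrift ? "growth" : "churn";
  }
  return { createdPerMin, drift, verdict };
}

// In mongosh, the inputs would come from:
//   var first = db.serverStatus().connections; sleep(60000);
//   var second = db.serverStatus().connections;
//   printjson(classifyConnections(first, second, 60));
```

A "growth" verdict still deserves attention, but it points at pool sizing or new application instances rather than connections being thrown away.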

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`connections.totalCreated` delta	Direct measure of churn; more informative than `current` alone	Sustained increase, or high delta with flat `current`
`connections.active / connections.current`	Shows how many open connections are actually doing work	Ratio stays low while `current` is high; many idle connections
`mem.resident`	Each connection costs ~1MB of thread stack; churn drives RSS growth	RSS grows disproportionately to cache size and connection count
`opLatencies` reads and writes	User-visible latency impact	p99 sustained >2x baseline
`globalLock.currentQueue`	Operations queuing behind contention	Sustained total >20
`wiredTiger.concurrentTransactions.available`	Ticket exhaustion from thread overhead	Read or write available tickets <25% of total
File descriptor utilization	Hard ceiling before connection rejections	>80% of `ulimit -n`
Election events	Common trigger for churn spikes	More than 1 per hour outside maintenance windows
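
As a rough illustration, the numeric warning signs in the table above can be checked against a single metrics snapshot. The snapshot shape and field names (p99LatencyMs, queueTotal, openFds, and so on) are assumptions made for this sketch; only the thresholds come from the table. "Sustained" conditions still require several consecutive samples before alerting.

```javascript
// Evaluate one snapshot against the table's thresholds.
// The snapshot shape is hypothetical; map it from your own collector.
function evaluateSignals(s) {
  const warnings = [];

  // opLatencies: p99 above 2x baseline.
  if (s.p99LatencyMs > 2 * s.baselineP99LatencyMs) warnings.push("latency");

  // globalLock.currentQueue: total queued operations above 20.
  if (s.queueTotal > 20) warnings.push("queue");

  // wiredTiger.concurrentTransactions: available tickets below 25% of total.
  for (const kind of ["read", "write"]) {
    const t = s.tickets[kind];
    if (t.available < 0.25 * t.totalTickets) warnings.push(kind + "-tickets");
  }

  // File descriptors: above 80% of the process limit.
  if (s.openFds > 0.8 * s.fdLimit) warnings.push("file-descriptors");

  // Elections: more than one per hour outside maintenance windows.
  if (s.electionsLastHour > 1) warnings.push("elections");

  return warnings;
}
```

Returning a list of labels rather than a boolean keeps the check usable both for alerting and for annotating a dashboard with which signal fired.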

Fixes

Application creates a client per request

The fix is to reuse one MongoClient instance per application process. Creating a client per HTTP request or per FaaS invocation forces a full TCP handshake, authentication, and potentially a topology discovery cycle every time. Cache the client at module or process scope and share it across requests. This is the single most effective fix for churn.
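
A minimal sketch of module-scope caching follows. The helper name getClient and the createClient factory are hypothetical; the factory is injected so the caching logic stays independent of any particular driver version.

```javascript
// Cache one client promise at module scope so every request reuses it.
// `createClient` is a hypothetical factory supplied by the application.
let clientPromise = null;

function getClient(createClient) {
  if (clientPromise === null) {
    // Store the promise, not the resolved client, so concurrent first
    // requests share a single connection attempt.
    clientPromise = Promise.resolve(createClient()).catch((err) => {
      clientPromise = null; // allow a later call to retry after a failure
      throw err;
    });
  }
  return clientPromise;
}

// Assumed usage with the official Node.js driver (MONGODB_URI is a placeholder):
//   const { MongoClient } = require("mongodb");
//   const client = await getClient(() => new MongoClient(process.env.MONGODB_URI).connect());
```

In a FaaS handler, the cache must live outside the handler function so that warm invocations of the same container reuse it.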

Driver pool sizing or idle behavior

If the driver pool is oversized or its idle timeout is aggressive, connections open and close unnecessarily. Reduce the maximum pool size to match actual concurrency, and set an idle timeout longer than typical request inter-arrival times so normal traffic keeps the pool warm.
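
One way to apply these settings is through the standard connection string options maxPoolSize, minPoolSize, and maxIdleTimeMS. The helper below is a sketch: the function name, the hostname, and the numeric values are placeholders, and it assumes the URI already contains a path segment (for example /app) and does not already set these options.

```javascript
// Append pool options to an existing MongoDB connection string.
// Assumes the URI already has a "/" path segment and no duplicate keys.
function withPoolOptions(uri, pool) {
  const params = new URLSearchParams();
  for (const key of ["maxPoolSize", "minPoolSize", "maxIdleTimeMS"]) {
    if (pool[key] !== undefined) params.set(key, String(pool[key]));
  }
  const query = params.toString();
  if (query === "") return uri;
  return uri + (uri.includes("?") ? "&" : "?") + query;
}

// Placeholder values: size the pool to observed concurrency and keep idle
// connections longer than the gap between normal requests.
const uri = withPoolOptions("mongodb://db.example.internal:27017/app", {
  maxPoolSize: 20,
  minPoolSize: 5,
  maxIdleTimeMS: 300000,
});
```

Setting a nonzero minPoolSize keeps a floor of warm connections, so a quiet period does not turn into a burst of new connections when traffic resumes.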

Reconnect storm after a topology event

If churn followed an election or network blip, stabilize the cluster first:

Check rs.status() for flapping member states.
Review application retry configuration so clients back off rather than reconnect immediately.
If connections are approaching the limit and memory is climbing, you can temporarily lower net.maxIncomingConnections so MongoDB rejects new connections cleanly rather than accepting them and crashing from OOM. Warning: this is disruptive and will reject client connections. Coordinate with application owners before applying.
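
For application-level retries, backing off rather than reconnecting immediately can be sketched as capped exponential backoff with full jitter. This is a generic technique, not a MongoDB driver API; the function name, base delay, and cap are assumptions, and drivers manage their own internal reconnection separately.

```javascript
// Capped exponential backoff with full jitter for application retries.
// baseMs and capMs are illustrative defaults; `random` is injectable for tests.
function backoffDelayMs(attempt, { baseMs = 100, capMs = 30000, random = Math.random } = {}) {
  // The ceiling doubles with each attempt until it reaches the cap.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter: pick uniformly in [0, ceiling) so clients do not retry in lockstep.
  return Math.floor(random() * ceiling);
}

// Assumed usage inside a retry loop:
//   await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
```

The jitter matters as much as the exponent: without it, every client that lost its connection at the same election retries at the same instants.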

Monitoring or load balancer churn

If health checks or monitoring scrapers are the source, reconfigure them to use persistent connections or reduce their frequency. Ensure health checks do not perform an expensive handshake on every TCP open. If a proxy sits between the application and MongoDB, verify its idle timeout is not shorter than the driver’s, which causes the proxy to sever connections the driver still considers valid.

OS file descriptor limits

If churn is combined with high connection counts, check that ulimit -n and the systemd LimitNOFILE setting give MongoDB enough descriptors. Verify the actual process limit in /proc/$PID/limits (where PID is your mongod), because systemd unit files often override shell ulimit.
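
As an illustration, the "Max open files" line of /proc/$PID/limits can be parsed to compute descriptor utilization against the soft limit, which is the limit the process actually hits. The function names are hypothetical, and the sketch assumes the standard Linux layout of that file.

```javascript
// Parse the "Max open files" row from the text of /proc/<pid>/limits.
// Returns null if the row is missing.
function parseMaxOpenFiles(limitsText) {
  const label = "Max open files";
  const line = limitsText.split("\n").find((l) => l.startsWith(label));
  if (line === undefined) return null;
  // After the label, the columns are: soft limit, hard limit, units.
  const [soft, hard] = line.slice(label.length).trim().split(/\s+/);
  const toNumber = (v) => (v === "unlimited" ? Infinity : Number(v));
  return { soft: toNumber(soft), hard: toNumber(hard) };
}

// Percentage of the soft limit currently in use.
function fdUtilizationPercent(openFds, limits) {
  return (100 * openFds) / limits.soft;
}

// Assumed usage on the database host (pid is the mongod process ID):
//   const fs = require("fs");
//   const limits = parseMaxOpenFiles(fs.readFileSync(`/proc/${pid}/limits`, "utf8"));
//   const openFds = fs.readdirSync(`/proc/${pid}/fd`).length;
```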

Prevention

Alert on totalCreated delta, not just current or available.
Track active / current so idle connections do not hide in the totals.
Enforce a single MongoClient instance per application process.
Size driver pools to real concurrency and avoid idle timeouts shorter than your traffic cadence.
Test failover behavior under load to confirm clients back off instead of reconnecting as a thundering herd.
Review infrastructure health checks quarterly to ensure they do not open fresh MongoDB connections per probe.
Keep connection headroom: operate below 50% of the effective connection limit so a reconnect storm does not immediately hit the ceiling.

How Netdata helps

Surfaces mongodb.connections_totalCreated as a rate without manual sampling.
Correlates churn with mongodb.memory_resident, CPU, and mongodb.globalLock_currentQueue on the same timeline.
Tracks mongodb.wiredTiger_concurrentTransactions_available to expose whether churn is translating into ticket contention.
Thresholds on total-created rate and RSS growth catch this failure mode before connection count alarms fire.
Per-second resolution catches short churn bursts that one-minute averages miss.

