MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling

Application logs show connection timeouts. MongoDB logs show "connection refused" or "error accepting new connection", yet db.serverStatus().connections shows current well below the configured maximum. This mismatch means you are hitting a hard ceiling at the TCP accept layer, not experiencing gradual degradation.

MongoDB uses a one-thread-per-connection model. Each accepted connection consumes at least one file descriptor and a thread stack sized by the OS ulimit -s. While maxIncomingConnections sets the logical inbound cap, the OS file-descriptor limit (ulimit -n) usually enforces the actual ceiling. Rejections happen before the connection handshake completes, so serverStatus().connections.current never counts refused connections. Look at logs, ratios, and OS-level resource counts to find the real limit.

What this means

maxIncomingConnections controls how many simultaneous inbound TCP connections MongoDB will accept. The default is 65,536. Practical ceilings are lower and are set by RAM, thread scheduling overhead, WiredTiger ticket contention, and file descriptors consumed by data files, journals, and indexes.

serverStatus().connections.current only tracks accepted connections. If the OS file-descriptor limit is lower than maxIncomingConnections, which is common, the server logs "error accepting new connection" while current sits far below the configured maximum. If maxIncomingConnections is the binding constraint, the log shows "connection refused because there are too many open connections". In both cases, rejections occur at accept time and never show up in connections.current.

flowchart TD
    A[Inbound TCP connection] --> B{OS FD limit reached?}
    B -->|Yes| C[Error accepting new connection]
    B -->|No| D{MongoDB maxIncomingConnections reached?}
    D -->|Yes| E[Connection refused too many open connections]
    D -->|No| F[Connection accepted]
    F --> G[Thread plus FD allocated]
    G --> H[connections.current incremented]
    C --> I[Not reflected in serverStatus]
    E --> I
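
The decision order in the flowchart can be sketched as a small shell helper. This is only an illustration: classify_accept and its argument order are hypothetical names, and the numbers are sample values rather than readings from a real server.

```shell
# classify_accept FD_USED FD_LIMIT CURRENT MAX_INCOMING
# Hypothetical helper that mirrors the flowchart: the OS limit is checked
# first, then the MongoDB cap, otherwise the connection is accepted.
classify_accept() {
  local fd_used=$1 fd_limit=$2 current=$3 max_incoming=$4
  if [ "$fd_used" -ge "$fd_limit" ]; then
    echo "os-fd-limit"     # log should show "error accepting new connection"
  elif [ "$current" -ge "$max_incoming" ]; then
    echo "max-incoming"    # log should show "too many open connections"
  else
    echo "accepted"
  fi
}

# Sample values: descriptors exhausted while current is far below the cap.
classify_accept 1024 1024 900 65536    # prints: os-fd-limit
```

With these inputs the helper reports the OS limit even though current (900) is nowhere near 65536, which is the situation described above.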

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Connection pool bloat | current count is high and flat; few application hosts hold most connections | db.serverStatus().connections and db.currentOp() grouped by client |
| Connection leak | current high but active low; connections never drop after traffic spikes | active versus current ratio in serverStatus().connections |
| Connection storm | Sudden spike in current and totalCreated; memory RSS jumps simultaneously | MongoDB logs for election or network events; totalCreated delta over 10 seconds |
| OS file descriptor exhaustion | error accepting new connection appears before current reaches 80% of max | /proc/<pid>/limits and ls /proc/<pid>/fd |
| Long-running operations pinning connections | Connections idle but not closing; ticket utilization drops | db.currentOp() for operations with high secs_running |

Quick checks

Run these safe, read-only checks to assess the boundary that is actually enforcing the ceiling.

# Check MongoDB connection counters
mongosh --quiet --eval 'db.serverStatus().connections'
# Check active versus current ratio
mongosh --quiet --eval 'var c = db.serverStatus().connections; print("current:", c.current, "available:", c.available, "active:", c.active, "totalCreated:", c.totalCreated)'
# Search logs for refusals and accept errors (adjust path for your deployment)
grep -iE 'connection refused|error accepting|too many open connections' /var/log/mongodb/mongod.log | tail -n 20
# Check OS file descriptor usage against the process limit (the soft limit, the first number, is the one enforced)
# NOTE: If multiple mongod processes exist on this host, specify the correct PID manually.
MONGO_PID=$(pgrep -x mongod)
ls /proc/$MONGO_PID/fd | wc -l
grep 'Max open files' /proc/$MONGO_PID/limits
# Check memory RSS
mongosh --quiet --eval 'db.serverStatus().mem.resident'
# Count connections per client host to find heavy sources
# (currentOp(true) includes idle connections; the port is stripped so lines group by host)
mongosh --quiet --eval 'db.currentOp(true).inprog.forEach(function(o){ if (o.client) print(o.client.replace(/:\d+$/, "")) })' | sort | uniq -c | sort -rn | head
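
To turn the two file-descriptor readings above into a single percentage, something like the following sketch may help. fd_util_pct is a hypothetical helper name; it assumes the usual Linux layout of the "Max open files" line, where the soft limit is the fourth whitespace-separated field.

```shell
# fd_util_pct USED: read the contents of /proc/<pid>/limits on stdin and
# print USED as an integer percentage of the soft open-files limit.
fd_util_pct() {
  local used=$1 soft
  soft=$(awk '/^Max open files/ { print $4 }')
  case $soft in
    ''|0|*[!0-9]*) echo "n/a" ;;             # line missing, zero, or "unlimited"
    *) echo $(( used * 100 / soft )) ;;
  esac
}

# Sample limits line (soft 1024, hard 4096) and an assumed count of 870 open fds.
printf 'Max open files            1024                 4096                 files\n' | fd_util_pct 870
# Against a live process the call would look like:
#   fd_util_pct "$(ls /proc/$MONGO_PID/fd | wc -l)" < /proc/$MONGO_PID/limits
```

The sample call prints 84, because integer division truncates 84.96.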

How to diagnose it

  1. Confirm the exact rejection site. "error accepting new connection" points to the OS file-descriptor limit. "connection refused because there are too many open connections" points to maxIncomingConnections.

  2. Calculate the MongoDB utilization ratio: current / (current + available), both taken from serverStatus().connections. Sustained ratios above 80% degrade performance through thread overhead and ticket contention, even if rejections have not started.

  3. Check the real OS limit. File-descriptor exhaustion produces the same client-side symptom but requires a different fix. Compare the fd count in /proc/<pid>/fd to Max open files in /proc/<pid>/limits.

  4. Identify connection sources. Group db.currentOp() by client. A few hosts holding hundreds of connections each indicates oversized driver pools. A broad distribution across many hosts suggests a legitimate traffic surge or a reconnection storm.

  5. Measure connection churn. Sample totalCreated over a 10-second window. If current is stable but totalCreated climbs rapidly, pools are thrashing. Churn burns CPU on thread creation and destroys latency before the connection count reaches the ceiling.

  6. Correlate with recent events. Elections, rolling application deploys, DNS changes, or load-balancer health-check failures can trigger synchronized reconnections. Check rs.status() for recent election timestamps and match them to the start of connection spikes in your metrics.

  7. Check RSS growth. If mem.resident grows in step with connections.current, thread stack allocation is the dominant memory consumer. If RSS is high but connections are low, look for cursor leaks or large aggregations instead.
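
The arithmetic behind steps 2 and 5 can be sketched in shell. conn_util_pct and churn_per_sec are hypothetical helper names, and the inputs below are sample numbers; in practice they would come from two serverStatus().connections samples taken 10 seconds apart.

```shell
# conn_util_pct CURRENT AVAILABLE: integer percent of current / (current + available)
conn_util_pct() {
  local current=$1 available=$2
  local total=$(( current + available ))
  [ "$total" -gt 0 ] || { echo 0; return; }
  echo $(( current * 100 / total ))
}

# churn_per_sec CREATED_BEFORE CREATED_AFTER SECONDS: new connections per second
churn_per_sec() {
  echo $(( ($2 - $1) / $3 ))
}

conn_util_pct 52000 13536        # 52000 of 65536 slots in use -> prints 79
churn_per_sec 120000 123000 10   # 3000 new connections in 10 s -> prints 300
# Live values could be read with, for example:
#   mongosh --quiet --eval 'db.serverStatus().connections.totalCreated'
```

A utilization of 79% with a churn of 300 connections per second would suggest pools that are thrashing just below the warning threshold.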

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Connection utilization ratio | current / (current + available) is the only reliable headroom metric inside MongoDB | Sustained above 80% |
| Connection churn rate | totalCreated delta reveals pool thrashing that static counts hide | totalCreated increasing faster than current by 5x or more over 10 s |
| File descriptor utilization | OS limits enforce the real ceiling before maxIncomingConnections | FD count above 80% of ulimit -n |
| Memory RSS | Each connection allocates thread stack memory; RSS spikes predict OOM | RSS growth correlating with connection spikes |
| Log rejection messages | Rejections are invisible in serverStatus; logs are the primary source | Any sustained rate of connection refused or error accepting messages |
| Active to current ratio | Idle connections waste resources without doing work | active / current below 20% sustained |
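
As a rough illustration, three of the thresholds in the table can be checked together in one pass. warn_signals is a hypothetical helper, the inputs are sample values, and the "sustained" qualifier is not modeled: this sketch evaluates a single sample only.

```shell
# warn_signals CURRENT AVAILABLE ACTIVE FD_USED FD_LIMIT
# Prints one WARN line per threshold from the table that the sample crosses.
warn_signals() {
  local current=$1 available=$2 active=$3 fd_used=$4 fd_limit=$5
  # Skip samples that would divide by zero.
  [ "$current" -gt 0 ] && [ "$fd_limit" -gt 0 ] || return 0
  local util=$(( current * 100 / (current + available) ))
  local fd=$(( fd_used * 100 / fd_limit ))
  local act=$(( active * 100 / current ))
  [ "$util" -gt 80 ] && echo "WARN connection utilization ${util}%"
  [ "$fd" -gt 80 ]   && echo "WARN fd utilization ${fd}%"
  [ "$act" -lt 20 ]  && echo "WARN active/current ${act}%"
  return 0
}

# Sample: 900 of 1000 slots used, 90 active, 850 of 1024 descriptors open.
warn_signals 900 100 90 850 1024
```

The sample call prints three warnings (90%, 83%, and 10%), since every threshold is crossed.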

Fixes

Raise or fix the OS file descriptor limit

If ulimit -n is the blocker, increase LimitNOFILE in the systemd service unit and restart mongod. Use /etc/security/limits.conf only if mongod is not started by systemd, because systemd services do not read that file. This requires a rolling restart in a replica set. Warning: restarting a primary triggers an election; plan maintenance windows accordingly.

Tradeoff: higher fd limits prevent accept errors but do not reduce memory or ticket pressure from open connections.
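
A minimal sketch of the systemd route, written as a dry run so nothing on the host changes. write_nofile_dropin is a hypothetical helper; the unit name mongod.service and the value 64000 are assumptions to adjust for your deployment.

```shell
# write_nofile_dropin DIR LIMIT: write a LimitNOFILE override file into DIR.
write_nofile_dropin() {
  local dir=$1 limit=$2
  mkdir -p "$dir"
  printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/limits.conf"
}

# Dry run into a scratch directory; review the file before using it for real.
scratch=$(mktemp -d)
write_nofile_dropin "$scratch/mongod.service.d" 64000
cat "$scratch/mongod.service.d/limits.conf"

# On a real host (assumed unit name mongod.service), the target directory would be
#   /etc/systemd/system/mongod.service.d
# followed by:
#   sudo systemctl daemon-reload && sudo systemctl restart mongod
```

After the restart, the "Max open files" line in /proc/<pid>/limits should show the new value.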

Right-size application connection pools

If db.currentOp() shows a few hosts holding hundreds of connections, reduce the driver maxPoolSize. Tradeoff: smaller pools may increase application-side queue latency under burst load, but they prevent server-side thread exhaustion.
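
One way to pick a starting maxPoolSize is to divide the server ceiling, minus headroom, across the clients that connect to it. The sketch below assumes every client uses the same pool size and ignores the extra monitoring connections drivers open outside the pool; pool_size_for and the sample numbers are hypothetical.

```shell
# pool_size_for CEILING HEADROOM_PCT CLIENTS
# CEILING: the binding server-side connection limit
# HEADROOM_PCT: share of the ceiling to leave unused
# CLIENTS: number of client instances that each hold their own pool
pool_size_for() {
  local ceiling=$1 headroom_pct=$2 clients=$3
  local budget=$(( ceiling * (100 - headroom_pct) / 100 ))
  echo $(( budget / clients ))
}

# Sample: ceiling of 20000, 20% headroom, 160 client instances.
pool_size_for 20000 20 160    # budget 16000 / 160 -> prints 100
```

Treat the result as an upper bound per client, then lower it if measured concurrency is well below that number.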

Fix connection leaks

If active is low and current is high, applications are not closing connections. Fix client code to close cursors and dispose of clients on shutdown. Tradeoff: requires an application deploy, but it is the only fix that stops memory growth.

Shed load during a connection storm

During a storm, block or rate-limit new connections at the application tier or load balancer until the storm subsides. Do not restart mongod during an active storm unless the instance is unresponsive. If thread overhead consistently drives OOM, plan a maintenance window to lower net.maxIncomingConnections and restart with a harder boundary.

Kill or timeout long-running operations

If connections are held open by slow operations, use db.killOp() on non-essential long runners, or set maxTimeMS in application queries. Tradeoff: killed operations fail on the client, but they free threads and tickets for new connections.
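
Before killing anything, it can help to generate the db.killOp() calls for review instead of running them. The sketch below only prints commands. kill_candidates is a hypothetical helper, the input is sample data, and it assumes numeric opid values as reported by a standalone or replica set member.

```shell
# kill_candidates THRESHOLD: read "opid secs_running" pairs on stdin and print
# a db.killOp(...) line for each operation running THRESHOLD seconds or longer.
# Nothing is executed; the output is meant for human review.
kill_candidates() {
  awk -v t="$1" '$2 + 0 >= t + 0 { printf "db.killOp(%s)  // running %ss\n", $1, $2 }'
}

# Sample input standing in for currentOp output (opid, secs_running).
printf '101 5\n102 340\n103 61\n' | kill_candidates 60
# Live input could be produced with, for example:
#   mongosh --quiet --eval 'db.currentOp().inprog.forEach(function(o){ if (o.secs_running !== undefined) print(o.opid, o.secs_running) })' | kill_candidates 60
```

The sample prints killOp lines for operations 102 and 103 and skips 101, which has been running for only 5 seconds.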

Prevention

  • Monitor current / (current + available), not just absolute connection count.
  • Trend totalCreated deltas to detect churn before pools max out.
  • Monitor OS file-descriptor utilization alongside MongoDB connection metrics.
  • Keep application connection pools sized with headroom; never sum pool maximums to equal the server ceiling.
  • Run monitoring tools and backups against hidden secondaries to reserve primary capacity for application traffic.
  • Review db.currentOp() for the longest-running operation age during peak hours to catch connection holders early.

How Netdata helps

  • Netdata collects serverStatus().connections automatically and charts current, available, and totalCreated deltas, making churn visible without manual sampling.
  • Connection utilization percentage is computed and alerted when the ratio crosses 80%.
  • Connection spikes are correlated on the same timeline with RSS, WiredTiger ticket utilization, and opLatencies, so you can distinguish a connection storm from a storage-layer slowdown.
  • Netdata alerts on rapid totalCreated increases that indicate pool thrashing, even when current appears stable.
  • File descriptor utilization is collected at the OS level and correlated with the MongoDB process, exposing the ulimit -n boundary that MongoDB metrics alone cannot show.