MongoDB Too many open files: file descriptor exhaustion and ulimit tuning

Connection timeouts appear in application logs. MongoDB logs show "Too many open files" or "error accepting new connection". New secondaries fail to sync, or a node that was stable rejects connections after a restart. The mongod process has hit its OS file descriptor (FD) limit. The failure is often silent until a dependent system breaks.

FD exhaustion is not always about connection count. WiredTiger maintains open file descriptors for data files, indexes, and journals. A dense deployment with thousands of collections can hold tens of thousands of FDs in steady state. A connection surge, index build, or restarted node rebuilding caches can push the process over the limit. MongoDB then cannot accept new connections, open new data files, or continue replication.

The most common root cause is a mismatch between the configured limit and how MongoDB actually uses file descriptors. Operators raise ulimit in a shell profile or in limits.conf, restart mongod via systemctl, and find the old limit still in force. systemd services do not read shell profiles or limits.conf, so the unit's LimitNOFILE setting applies instead.

What this means

MongoDB uses file descriptors for client connections, WiredTiger data and index files, and internal files such as journals and logs. The default mongod configuration allows up to 65,536 incoming connections, but the OS soft limit is often 1,024 or 4,096. Once the process reaches the limit, calls that allocate a descriptor fail with EMFILE ("Too many open files"). Examples are accept() and open(). MongoDB then rejects connections or logs errors.

WiredTiger maps each collection and each index to at least one file, so the baseline FD count scales with schema size. In dense deployments, data files can consume more FDs than client connections. On systemd hosts the limit comes from the unit's LimitNOFILE setting. It does not come from /etc/security/limits.conf, which pam_limits applies to login sessions, not to services started by systemd. When the effective limit is lower than connections plus data files plus journals and logs, the process hits the ceiling.
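To see the EMFILE failure in isolation, the sketch below works in a throwaway bash subshell. It lowers the open-file limit to a small value and opens descriptors until the kernel refuses. It demonstrates the mechanism and is not a MongoDB tool. The function name `demo_emfile` is hypothetical, and the code assumes bash 4.1 or later (for the `{fd}` redirection syntax) on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical demo: reproduce EMFILE by exhausting a deliberately low
# RLIMIT_NOFILE. Everything runs in a subshell, so the caller's limit is untouched.
demo_emfile() {
  local limit=$1
  (
    # Lower the open-file limit for this subshell only.
    ulimit -n "$limit" || exit 1
    opened=0
    # Each successful exec {fd}<... keeps one more descriptor open. Once the
    # subshell reaches its limit, allocation fails with EMFILE and exec returns non-zero.
    while exec {fd}</dev/null; do
      opened=$((opened + 1))
      # Safety stop in case the limit could not be applied.
      [ "$opened" -ge 100000 ] && break
    done
    echo "$opened"
  ) 2>/dev/null   # hide bash's "Too many open files" message
}
```

Running `demo_emfile 32` prints how many descriptors the subshell opened before the kernel refused. The number is below 32 because stdin, stdout, stderr and any inherited descriptors already count against the limit. mongod hits the same wall, only at a much higher number.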

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Connection surge | Connection count spikes and logs show "error accepting new connection" | `ls /proc/<PID>/fd` |
| Dense collections and indexes | Steady-state FD count is already high; new secondaries struggle | `db.adminCommand({listDatabases: 1})` and per-collection stats |
| systemd overriding limits.conf | limits.conf is set to 64000 but `/proc/<PID>/limits` shows 4096 | `cat /proc/<PID>/limits` |
| File descriptor leak | FD count grows faster than connection count over days | Compare FD growth to `connections.totalCreated` and `current` |

Quick checks

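# Find the mongod PID. pgrep -x matches the exact process name, so tools such as
# mongodump are not picked up. If several mongod instances run on the host, this
# returns several PIDs; set MONGOD_PID to the one you are investigating.
MONGOD_PID=$(pgrep -x mongod)

# Current FD count (listing another user's /proc/<PID>/fd requires root)
sudo ls /proc/"$MONGOD_PID"/fd | wc -l

# Effective limit for the running process
grep "Max open files" /proc/"$MONGOD_PID"/limits

# System-wide ceiling on open file handles across all processes
cat /proc/sys/fs/file-max

# Open descriptors grouped by kind (socket, pipe, anon_inode, file)
sudo find /proc/"$MONGOD_PID"/fd -type l -printf '%l\n' | sed -E 's/:.*//; s#^/.*#file#' | sort | uniq -c | sort -rn

# MongoDB connection count
mongosh --quiet --eval 'db.serverStatus().connections.current'

# systemd unit limit
systemctl cat mongod | grep -i limitnofile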

These checks are read-only and safe during an incident.
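The count and the limit are most useful combined into a single utilization figure. The sketch below makes that calculation. It assumes Linux /proc and permission to list `/proc/<PID>/fd` (root, or the same user as mongod). The function names are hypothetical, and the 80% default mirrors the alert threshold used later in this guide.

```shell
#!/usr/bin/env bash
# Sketch: FD utilization of one process as a percentage of its soft
# "Max open files" limit. Assumes Linux /proc and read access to /proc/<PID>/fd.

# Soft limit: in "Max open files  <soft>  <hard>  files", field 4 is the soft value.
fd_soft_limit() {
  awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Number of descriptors the process currently has open.
fd_open_count() {
  ls "/proc/$1/fd" | wc -l
}

# Integer percentage of the soft limit in use; returns 1 if /proc is unreadable.
fd_utilization() {
  local open limit
  [ -r "/proc/$1/fd" ] || return 1
  limit=$(fd_soft_limit "$1")
  [ -n "$limit" ] && [ "$limit" -gt 0 ] || return 1
  open=$(fd_open_count "$1")
  echo $(( open * 100 / limit ))
}

# Print OK/WARN against a threshold (default 80).
# Exit status: 0 = OK, 1 = at or above threshold, 2 = could not read /proc.
check_fd_headroom() {
  local pid=$1 threshold=${2:-80} pct
  pct=$(fd_utilization "$pid") || { echo "cannot read /proc/${pid}" >&2; return 2; }
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN pid=${pid} fd_utilization=${pct}% threshold=${threshold}%"
    return 1
  fi
  echo "OK pid=${pid} fd_utilization=${pct}%"
}
```

For example, `check_fd_headroom "$(pgrep -x mongod)"` prints a WARN line and returns 1 once mongod uses 80% or more of its soft limit. That makes it easy to call from cron or an existing alerting hook.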

How to diagnose it

  1. Confirm the effective limit. Read /proc/<PID>/limits. This shows the kernel-enforced limit, ignoring shell profiles.
  2. Count open FDs. ls /proc/<PID>/fd | wc -l. If this is within 10% of the limit, exhaustion is imminent.
  3. Correlate FDs with connections. Compare db.serverStatus().connections.current to the FD count. A low connection count with a high FD count points to data files or journals.
  4. Check schema density. Run db.adminCommand({listDatabases: 1}), then count the collections and indexes in each database. Rapid schema growth explains high baseline usage.
  5. Identify init system enforcement. If /proc/<PID>/limits shows a lower value than limits.conf, mongod was likely started via systemd, which does not apply limits.conf to services. The limit comes from LimitNOFILE in the unit file or a drop-in, or from systemd's DefaultLimitNOFILE if neither sets it.
  6. Look for leaks. If FDs grow while connections.current and schema size stay flat, the extra descriptors are not client sockets. Group them by type to see what is accumulating, and check whether WiredTiger is keeping idle data files open. Also check db.serverStatus().connections.totalCreated. A fast-rising totalCreated with a flat current indicates connection churn from clients that are not pooling.
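Steps 2 and 3 reduce to two comparisons. The sketch below encodes them as a pure bash function so it can run against numbers gathered with the quick checks. The function name is hypothetical. Reading "FDs >> connections" as "non-socket descriptors outnumber connections" is an assumption, not a MongoDB-defined threshold.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_fd_usage OPEN_FDS CONNECTIONS SOFT_LIMIT
# Prints "<headroom> <driver>":
#   headroom = near-limit when open FDs are within 10% of the limit (step 2), else ok
#   driver   = files when non-connection FDs outnumber connections (step 3),
#              else connections. The "outnumber" cutoff is an assumption.
classify_fd_usage() {
  local fds=$1 conns=$2 limit=$3 headroom driver other
  # fds >= 0.9 * limit, kept in integer arithmetic.
  if [ $(( fds * 10 )) -ge $(( limit * 9 )) ]; then
    headroom="near-limit"
  else
    headroom="ok"
  fi
  # Descriptors not accounted for by client sockets: data files, indexes,
  # journal and log files, or a leak.
  other=$(( fds - conns ))
  if [ "$other" -gt "$conns" ]; then
    driver="files"
  else
    driver="connections"
  fi
  echo "$headroom $driver"
}

# Example with live inputs (needs root and a mongosh login):
# classify_fd_usage "$(sudo ls /proc/"$MONGOD_PID"/fd | wc -l)" \
#   "$(mongosh --quiet --eval 'db.serverStatus().connections.current')" \
#   "$(awk '/^Max open files/ {print $4}' /proc/"$MONGOD_PID"/limits)"
```

A result of `near-limit connections` maps to the connection-surge branch of the flowchart below. A result of `ok files` or `near-limit files` sends you to schema density or leak investigation.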
flowchart TD
    A[Errors or connection rejects] --> B{Check /proc/PID/limits}
    B -->|Limit too low| C[systemd or limits.conf mismatch]
    B -->|Limit adequate| D{Check FD count vs connections}
    D -->|FDs >> connections| E[Schema density or leak]
    D -->|FDs ~ connections| F[Connection surge]
    E --> G[Count collections/indexes]
    G -->|Growth normal| H[Investigate leak or restart]
    G -->|Growth high| I[Raise limit or shard]
    F --> J[Throttle clients or raise limit]
    C --> K[Apply systemd drop-in and reload]

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| FD utilization | Direct measure of headroom before EMFILE | >80% of effective limit |
| Connection count | Each connection consumes approximately one FD | `current` trending toward maxIncomingConnections |
| Database and collection count | WiredTiger opens files per collection and index | Rapid growth in multi-tenant schemas |
| Connection errors | Early indicator that MongoDB is rejecting work | Sustained "error accepting new connection" in logs |
| Connection churn | A high `totalCreated` rate without a rising `current` means clients open and close connections instead of pooling | `totalCreated` delta growing while `current` is flat |

Fixes

Raise the OS file descriptor limit

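The recommended production floor is 64,000 for both nofile and nproc. The example below sets nofile to 128,000 for headroom. Replace <mongodb-user> with the account mongod runs as (commonly mongod or mongodb, depending on the package). Edit /etc/security/limits.conf or a file in /etc/security/limits.d/: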

<mongodb-user> hard nofile 128000
<mongodb-user> soft nofile 128000
<mongodb-user> hard nproc 64000
<mongodb-user> soft nproc 64000

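pam_limits applies these values when a new session starts (login, ssh, su). They therefore take effect only for a mongod started from a fresh session after the change, and never for a mongod started by systemd.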

Apply a systemd override

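On systemd hosts, services do not go through PAM, so limits.conf does not apply to mongod at all. The limit comes from LimitNOFILE in the unit, or from systemd's DefaultLimitNOFILE if the unit sets none. Create a drop-in: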

# /etc/systemd/system/mongod.service.d/override.conf
[Service]
LimitNOFILE=128000
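If hosts are provisioned by script, a small bash function can write the same drop-in. This is a sketch under stated assumptions. `write_nofile_dropin` is a hypothetical name, and it overwrites any existing override.conf in the target directory. If you keep other overrides there, use a dedicated file name instead.

```shell
#!/usr/bin/env bash
# Hypothetical provisioning helper: write_nofile_dropin [DIR] [LIMIT]
# Writes a systemd drop-in that sets LimitNOFILE for mongod. Defaults match the
# override shown above; pass a scratch DIR to try it without root.
write_nofile_dropin() {
  local dir=${1:-/etc/systemd/system/mongod.service.d}
  local limit=${2:-128000}
  mkdir -p "$dir" || return 1
  # Overwrite rather than append so repeated runs leave exactly one setting.
  printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "${dir}/override.conf"
}
```

Because the file lives under /etc/systemd/system/, package upgrades that replace the vendor unit leave it in place.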

Then reload and restart:

systemctl daemon-reload
systemctl restart mongod

Warning: systemctl restart mongod is disruptive. It interrupts all connections and, on a replica set primary, triggers an election.

After the restart, mongod has a new PID. Look it up again and verify with /proc/<PID>/limits.
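This verification is worth automating, because package upgrades can reset unit files. The sketch below compares the soft and hard "Max open files" values the kernel enforces against the value you intended. `verify_nofile` is a hypothetical name. The commented example assumes `systemctl show -p LimitNOFILE --value mongod`, which prints the hard limit the unit applies.

```shell
#!/usr/bin/env bash
# Hypothetical post-restart check: verify_nofile PID EXPECTED
# Exit status: 0 = both soft and hard limits >= EXPECTED, 1 = mismatch,
# 2 = /proc/<PID>/limits could not be read.
verify_nofile() {
  local pid=$1 expected=$2 line soft hard
  # Fields 4 and 5 of the "Max open files" line are the soft and hard limits.
  line=$(awk '/^Max open files/ {print $4, $5}' "/proc/${pid}/limits" 2>/dev/null)
  if [ -z "$line" ]; then
    echo "cannot read /proc/${pid}/limits" >&2
    return 2
  fi
  soft=${line%% *}
  hard=${line##* }
  if [ "$soft" -lt "$expected" ] || [ "$hard" -lt "$expected" ]; then
    echo "MISMATCH pid=${pid} soft=${soft} hard=${hard} expected=${expected}"
    return 1
  fi
  echo "OK pid=${pid} soft=${soft} hard=${hard}"
}

# Example: compare the running process against what the unit configures.
# verify_nofile "$(pgrep -x mongod)" "$(systemctl show -p LimitNOFILE --value mongod)"
```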

Account for schema density

For dense schemas, estimate baseline FD demand as follows:

  • one per connection
  • two per collection (its data file plus the _id index)
  • one per additional index
  • plus journals and logs

If the baseline approaches 64,000, raise the limit to 128,000 before the next growth phase. Raising the limit itself costs essentially nothing, because the kernel grows a process's descriptor table only as descriptors are actually opened.
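The estimate is simple arithmetic, so it can be scripted. The sketch below makes three assumptions, all labeled in the code:

  • The journal and log overhead is a flat, hypothetical 100 descriptors.
  • The recommended limit doubles from the 64,000 floor until the baseline sits below 80% of it.
  • Collections and indexes are counted with mongosh.

The function names are hypothetical. The counting helper needs a mongosh login allowed to list databases, collections and indexes.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helpers for the rule of thumb above.

# estimate_baseline_fds CONNECTIONS COLLECTIONS INDEXES [OVERHEAD]
# INDEXES counts every index, including each collection's _id index, so a
# collection with only _id contributes two files (data + _id).
# OVERHEAD (journal, logs, misc) defaults to a hypothetical 100.
estimate_baseline_fds() {
  local conns=$1 colls=$2 indexes=$3 overhead=${4:-100}
  echo $(( conns + colls + indexes + overhead ))
}

# recommend_nofile BASELINE
# Start at the 64,000 floor and double until the baseline uses less than 80%
# of the limit. The doubling policy is an assumption, not a MongoDB rule.
recommend_nofile() {
  local baseline=$1 limit=64000
  while [ $(( baseline * 100 )) -ge $(( limit * 80 )) ]; do
    limit=$(( limit * 2 ))
  done
  echo "$limit"
}

# count_collections_and_indexes: prints "<collections> <indexes>" across all databases.
count_collections_and_indexes() {
  mongosh --quiet --eval '
    let colls = 0, idx = 0;
    db.adminCommand({ listDatabases: 1 }).databases.forEach(d => {
      const dbh = db.getSiblingDB(d.name);
      // Views have no data files, so count only real collections.
      dbh.getCollectionInfos({ type: "collection" }).forEach(c => {
        colls++;
        idx += dbh.getCollection(c.name).getIndexes().length;
      });
    });
    print(colls + " " + idx);'
}
```

For example, 1,000 connections, 5,000 collections and 8,000 indexes give an estimate of 14,100 descriptors. That sits well under the 64,000 floor. A baseline of 60,000 crosses the 80% mark, so the helper recommends 128,000.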

Address connection leaks

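If connections.current climbs steadily along with the FD count, check for application-side connection leaks. Each application process should share one driver client and its connection pool rather than creating clients per request, and clients should close cursors. If FD growth outpaces connection growth, the extra descriptors are not client sockets. Group them by type to find what is accumulating.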

As temporary relief, restarting mongod closes all FDs and resets counts. Warning: this is disruptive and does not fix the leak.
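To tell these cases apart, take two samples hours or days apart and compare the deltas. The sketch below makes this comparison. The helper names are hypothetical. The cutoffs are assumptions, not MongoDB-defined values:

  • "Connections explain the growth" means the rise in current covers at least half of the FD rise.
  • "Churn" means totalCreated rose by more than ten times the change in current.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for telling FD growth from connection growth and churn.

# sample_fd_and_connections PID -> "<epoch> <open_fds> <current> <totalCreated>"
# Needs read access to /proc/<PID>/fd and a mongosh login with serverStatus rights.
sample_fd_and_connections() {
  local pid=$1 fds conn
  fds=$(ls "/proc/${pid}/fd" | wc -l)
  conn=$(mongosh --quiet --eval 'const c = db.serverStatus().connections; print(c.current + " " + c.totalCreated)')
  echo "$(date +%s) ${fds} ${conn}"
}

# fd_trend_verdict FD0 FD1 CUR0 CUR1 CREATED0 CREATED1 -> "<growth> <churn>"
#   growth: no-fd-growth, connections, or non-connection (assumed 50% cutoff)
#   churn:  churn or no-churn (assumed 10x cutoff)
fd_trend_verdict() {
  local fd_delta=$(( $2 - $1 )) cur_delta=$(( $4 - $3 )) created_delta=$(( $6 - $5 ))
  local growth churn cur_abs
  if [ "$fd_delta" -le 0 ]; then
    growth="no-fd-growth"
  elif [ $(( cur_delta * 2 )) -ge "$fd_delta" ]; then
    growth="connections"        # app-side leak or surge
  else
    growth="non-connection"     # data files, journals, or a descriptor leak
  fi
  cur_abs=${cur_delta#-}        # absolute change in current connections
  if [ "$created_delta" -gt $(( (cur_abs + 1) * 10 )) ]; then
    churn="churn"
  else
    churn="no-churn"
  fi
  echo "$growth $churn"
}
```

For example, FDs rising from 10,000 to 14,000 while current moves only from 500 to 510 yields `non-connection no-churn`. That points away from client sockets.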

Prevention

  • Set the limit for peak plus headroom. Use 64,000 as a minimum. Dense multi-tenant deployments often need 128,000 or more.
  • Verify limits after every restart. Automate a check that compares /proc/<PID>/limits against your intended value. Package upgrades can reset systemd unit files.
  • Monitor FD utilization as a percentage. Alert when utilization exceeds 80%.
  • Audit schema growth. Rapid creation of collections or indexes increases the baseline FD footprint. Track collection and index counts alongside connection counts.
  • Ensure systemd drop-ins survive upgrades. Store override files in /etc/systemd/system/mongod.service.d/ rather than editing the vendor unit file directly.

How Netdata helps

  • Correlate process-level FD count with MongoDB connection metrics and error logs.
  • Chart FD utilization as a percentage of the effective limit.
  • Monitor totalCreated connection deltas alongside current to distinguish leaks from surges.
  • Compare systemd unit limits against process usage to catch override mismatches after restarts.