MongoDB TLS certificate expiry: rotating certs without dropping connections
An expired TLS certificate does not degrade gracefully. In MongoDB, client drivers, replica set members, and mongos routers validate certificates during every TLS handshake. Once a server certificate’s notAfter date passes, the connecting peer’s TLS library rejects every new handshake. Existing connections keep working until they close, but nothing can reconnect once they do. The result is a sudden cascade: secondaries miss heartbeats and are marked DOWN, application connection pools are exhausted, and monitoring probes fail. During an active incident, check certificate dates first. For preventive work, the right rotation strategy depends on whether you run MongoDB 5.0 or later.
What this means
On Linux, MongoDB uses OpenSSL for TLS on both client and intra-cluster connections. The server presents its certificate on every new TLS connection. If that certificate has expired, the peer rejects the handshake before any application-level authentication takes place. In a replica set, members cannot open new connections to each other for heartbeats, and secondaries cannot open new connections to their sync source for oplog fetching. In a sharded cluster, mongos instances cannot open new connections to shards. MongoDB may log a warning when a certificate is within 30 days of expiry, but many deployments never alert on that log line. The failure is binary: connections work until suddenly they do not.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Server certificate expired | New TLS connections fail; replica set members report DOWN/UNKNOWN; apps see connection timeouts | openssl s_client notAfter date on each mongod and mongos |
| CA certificate expired or missing | Chain validation fails across all clients and members even if the server cert is valid | openssl verify with the full chain against the configured CAFile |
| Certificate not yet valid | Immediate handshake failures after deploying new certificates | System clock with timedatectl or ntpstat; certificate notBefore date |
| Cluster membership certificate DN mismatch | Members reject each other with “unauthorized” after rotation; states flip to RECOVERING | rs.status() lastHeartbeatMessage; compare subject and issuer DNs |
| Thumbprint selector incompatibility | rotateCertificates appears to succeed but the process continues using the old certificate | Startup config for --tlsCertificateSelector set to thumbprint |
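The table above lists two causes that fail in the same way: an expired certificate and a certificate that is not yet valid. The sketch below tells them apart by comparing both dates against the current time. It assumes GNU date for parsing OpenSSL's date format. The function names openssl_date_to_epoch and classify_validity are illustrative helpers, not part of MongoDB or OpenSSL.

```bash
# Sketch: classify a certificate's validity window from `openssl x509 -noout -dates`
# output, which looks like:
#   notBefore=Jan  1 00:00:00 2024 GMT
#   notAfter=Jan  1 00:00:00 2030 GMT
# Assumption: GNU date (date -d); BSD/macOS date needs a different parser.

# Convert an OpenSSL date such as "Jan  1 00:00:00 2030 GMT" to epoch seconds.
openssl_date_to_epoch() {
  date -u -d "$1" +%s
}

# Reads the date lines on stdin. Optional $1 overrides "now" (epoch seconds).
# Prints NOT_YET_VALID, EXPIRED, VALID, or UNKNOWN if a date line is missing.
classify_validity() {
  local now line not_before not_after
  now=${1:-$(date -u +%s)}
  not_before=""
  not_after=""
  while IFS= read -r line; do
    case $line in
      notBefore=*) not_before=${line#notBefore=} ;;
      notAfter=*) not_after=${line#notAfter=} ;;
    esac
  done
  if [ -z "$not_before" ] || [ -z "$not_after" ]; then
    echo "UNKNOWN"
    return 2
  fi
  if [ "$now" -lt "$(openssl_date_to_epoch "$not_before")" ]; then
    echo "NOT_YET_VALID"   # clock skew or a certificate with a future start date
  elif [ "$now" -gt "$(openssl_date_to_epoch "$not_after")" ]; then
    echo "EXPIRED"
  else
    echo "VALID"
  fi
}
```

For example, pipe the output of openssl x509 -in /etc/mongodb/server.pem -noout -dates into classify_validity. A NOT_YET_VALID result points to clock skew or a future-dated certificate rather than expiry.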
Quick checks
```bash
# Check server certificate expiry on a running mongod
openssl s_client -connect localhost:27017 </dev/null 2>/dev/null | openssl x509 -noout -dates
# Check local certificate file expiry directly
openssl x509 -in /etc/mongodb/server.pem -noout -dates
# Verify certificate subject and issuer
openssl s_client -connect localhost:27017 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
# Check certificate SANs locally
openssl x509 -in /etc/mongodb/server.pem -noout -text | grep -A1 "Subject Alternative Name"
# Count TLS handshakes that exceeded the slow threshold since startup
# (add --tls, --tlsCAFile, and credentials to mongosh as your deployment requires)
mongosh --quiet --eval 'db.serverStatus().network.numSlowSSLOperations'
# Search logs for TLS and certificate errors
grep -iE "ssl|tls|handshake|certificate" /var/log/mongodb/mongod.log | tail -20
# Check replica set member health from a stable node
mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " " + m.stateStr + " " + (m.lastHeartbeatMessage || "")))'
# Verify certificate chain trust
openssl verify -CAfile /etc/mongodb/ca.pem /etc/mongodb/server.pem
```
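The quick checks above test one endpoint at a time. To cover every mongod and mongos, including each backend behind a load balancer, a small loop can print the served notAfter date for each endpoint. This sketch assumes openssl and the coreutils timeout command are installed. check_endpoints is a hypothetical helper name, and the 10-second timeout is an arbitrary choice.

```bash
# Sketch: print the notAfter date each mongod/mongos endpoint actually serves.
# Arguments are host:port pairs. Assumes openssl and coreutils `timeout`.
check_endpoints() {
  local endpoint end_date
  for endpoint in "$@"; do
    # s_client performs a TLS handshake and prints the server certificate;
    # x509 -enddate extracts its notAfter line. </dev/null ends the session.
    end_date=$(timeout 10 openssl s_client -connect "$endpoint" </dev/null 2>/dev/null \
      | openssl x509 -noout -enddate 2>/dev/null)
    if [ -n "$end_date" ]; then
      echo "$endpoint $end_date"
    else
      # Refused connection, timeout, or no certificate presented.
      echo "$endpoint ERROR no certificate retrieved"
    fi
  done
}
```

Call it with host:port arguments, for example check_endpoints db1.example.net:27017 db2.example.net:27017. The host names are placeholders.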
How to diagnose it
- Identify the failing certificate. Run openssl s_client against every mongod and mongos port. If you have multiple mongos instances behind a load balancer, test each backend directly, because each connection through the balancer reaches only one of them. Read the notAfter dates. Also compare notBefore dates across nodes to catch backdated or future-dated certificates. Do not rely on file modification times or issuance paperwork; check what each process actually serves.
- Check db.serverStatus().network.numSlowSSLOperations. A sustained increase above baseline indicates that TLS handshakes are exceeding the slow-operation threshold. This can appear before outright handshake failures and may point to OCSP responder delays or chain validation overhead.
- Inspect replica set member states. Intra-cluster TLS failures show up as missed heartbeats. Check rs.status() for members in DOWN, UNKNOWN, or RECOVERING, and read lastHeartbeatMessage for TLS-related errors.
- Determine your MongoDB version. Version 5.0 and later support rotateCertificates for online reload. Earlier versions require a rolling restart.
- Check for CVE-2024-1351 exposure. If the server was started with net.tls.mode set to allowTLS, preferTLS, or requireTLS and without net.tls.CAFile, peer certificate validation may be skipped. Configure a CAFile whenever TLS is enabled, and upgrade to a patched version.
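The version check in the list above decides between online rotation and a rolling restart. Here is a minimal sketch of that decision. It assumes the version string comes from db.version() or mongod --version, with or without a leading "v". rotation_strategy is a hypothetical name.

```bash
# Sketch: choose a rotation method from a MongoDB version string,
# using the 5.0 cutoff for rotateCertificates.
rotation_strategy() {
  local version major
  version=${1#v}             # tolerate a leading "v", as in "v4.4.29"
  major=${version%%.*}       # text before the first dot
  case $major in
    ''|*[!0-9]*) echo "unknown"; return 2 ;;   # not a numeric major version
  esac
  if [ "$major" -ge 5 ]; then
    echo "rotateCertificates"
  else
    echo "rolling-restart"
  fi
}
```

For example, pass it the output of mongosh --quiet --eval 'db.version()', adding your TLS options to the mongosh command.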
```mermaid
flowchart TD
A[Detect expiry warning or TLS failure] --> B[Check notAfter with openssl s_client]
B --> C{Certificate expired?}
C -->|Yes| D{MongoDB version}
C -->|No| E[Check CA chain and clock skew]
D -->|5.0+| F[rotateCertificates online]
D -->|4.4 and earlier| G[Rolling restart]
E --> H[Fix CA file or sync NTP]
F --> I[Verify with openssl s_client]
G --> I
H --> I
I --> J[Monitor member states and numSlowSSLOperations]
```
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Certificate notAfter date | Expired certificates break all new TLS handshakes immediately | Less than 30 days to expiry; less than 7 days is urgent |
| network.numSlowSSLOperations | Tracks SSL operations exceeding the slow threshold; rising values indicate handshake health issues | Sustained increase above baseline |
| Replica set member state | Intra-cluster TLS failures block heartbeats and replication | Any data-bearing member in DOWN, UNKNOWN, or RECOVERING |
| Connection errors in logs | Direct evidence of rejected handshakes or trust failures | Sustained rate of TLS or connection errors |
| Assertion counts (user) | Authentication and validation failures spike when clients retry after handshake errors | Sudden increase in user assertions |
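The 30-day and 7-day thresholds in the table map directly to an alert level. This sketch assumes the notAfter date has already been converted to epoch seconds, for example with GNU date -d on the output of openssl x509 -enddate. expiry_severity is a hypothetical helper. The optional second argument exists only to make the check reproducible.

```bash
# Sketch: map remaining certificate lifetime to an alert level
# (warning below 30 days, urgent below 7).
# $1 = notAfter in epoch seconds, $2 = optional "now" in epoch seconds.
expiry_severity() {
  local not_after now days_left
  not_after=$1
  now=${2:-$(date -u +%s)}
  days_left=$(( (not_after - now) / 86400 ))   # whole days, rounded down
  if [ "$now" -gt "$not_after" ]; then
    echo "EXPIRED"
  elif [ "$days_left" -lt 7 ]; then
    echo "URGENT ${days_left}d"
  elif [ "$days_left" -lt 30 ]; then
    echo "WARNING ${days_left}d"
  else
    echo "OK ${days_left}d"
  fi
}
```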
Fixes
Online rotation (MongoDB 5.0 and later)
MongoDB 5.0 introduced rotateCertificates, which loads new certificate and key files from disk without terminating existing connections. New connections immediately use the rotated certificates. The command supports rotation of server TLS certificates, CRL files on Linux and Windows, and CA files. Run it on each mongod and mongos process individually.
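Because rotateCertificates must run on each process individually, a loop over all mongod and mongos endpoints fits well. This sketch assumes mongosh is on PATH. TLS and authentication options go in a hypothetical MONGOSH_OPTS variable. A hypothetical MONGOSH variable lets you substitute a stub for a dry run. It also assumes mongosh prints the value of the last --eval expression. Any output other than 1 from the command's ok field, including an error, counts as a failure.

```bash
# Sketch: run rotateCertificates on each endpoint in turn and report the result.
# MONGOSH (hypothetical) overrides the client command, e.g. with a stub;
# MONGOSH_OPTS (hypothetical) holds TLS/auth options, split on whitespace.
rotate_each() {
  local mongosh_cmd host ok failed
  mongosh_cmd=${MONGOSH:-mongosh}
  failed=0
  for host in "$@"; do
    # MONGOSH_OPTS is intentionally unquoted so it splits into separate options.
    ok=$("$mongosh_cmd" --quiet --host "$host" ${MONGOSH_OPTS:-} \
      --eval 'db.adminCommand({ rotateCertificates: 1 }).ok' 2>/dev/null) || ok=""
    if [ "$ok" = "1" ]; then
      echo "$host rotated"
    else
      echo "$host FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

Rotating one process at a time keeps a bad certificate file from taking down every endpoint at once. A FAILED line is the cue to read that process's log before continuing.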
Prerequisites:
- The hostManager role or equivalent privileges.
- New certificate files already written to the paths configured in net.tls.certificateKeyFile, net.tls.CAFile, and net.tls.CRLFile.
- New certificates that are valid, unexpired, and trusted.
- File permissions that let the mongod user read the new keys without exposing them to other users.
- For encrypted PEM key files, the password supplied through the net.tls.certificateKeyFilePassword configuration setting. Interactive password prompts are not supported during rotateCertificates.
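Two of these prerequisites, key file permissions and encrypted keys, can be checked before you run the command. This sketch assumes GNU stat and a single PEM file argument. preflight_key_file is a hypothetical name. The readability check is only meaningful when it runs as the account mongod runs under, for example via sudo -u.

```bash
# Sketch: preflight a PEM key file before rotateCertificates.
# Assumes GNU stat (-c); run as the mongod service account.
preflight_key_file() {
  local file status mode
  file=$1
  status=0
  if [ ! -r "$file" ]; then
    echo "FAIL $file is not readable by $(id -un)"
    return 1
  fi
  mode=$(stat -c '%a' "$file")
  # The last octal digit holds the "other" permissions; anything but 0 exposes the key.
  case $mode in
    *0) ;;
    *) echo "FAIL $file has mode $mode, which is accessible to other users"; status=1 ;;
  esac
  # PKCS#8 encrypted keys and legacy encrypted PEM keys carry these markers.
  if grep -Eq 'BEGIN ENCRYPTED PRIVATE KEY|Proc-Type: 4,ENCRYPTED' "$file"; then
    echo "NOTE $file is encrypted; supply the password through configuration"
  fi
  if [ "$status" -eq 0 ]; then
    echo "OK $file"
  fi
  return "$status"
}
```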
Procedure:
- Back up the current certificate files, then write the new files into place at the configured paths. Keep the backups until rotation succeeds.
- Ensure each mongod or mongos process can read the new files at the configured paths.
- Run: db.adminCommand({ rotateCertificates: 1 })
- Verify by opening a new connection with openssl s_client and confirming the notAfter date.
- If the command fails because the new files are incorrect, expired, or revoked, the running process keeps its old TLS configuration and continues serving. Check the logs for the validation error, fix the files, and retry.
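Checking notAfter confirms that a new certificate is being served. Comparing fingerprints is stricter, because it proves the process presents exactly the certificate on disk. This sketch assumes openssl and timeout are available. served_matches_disk is a hypothetical name. openssl x509 -in reads only the first certificate in the PEM file, so the comparison covers the leaf certificate.

```bash
# Sketch: compare the SHA-256 fingerprint of the certificate served on a new
# TLS connection with the first certificate in the configured PEM file.
# Returns 0 on match, 1 on mismatch, 2 if either certificate could not be read.
served_matches_disk() {
  local endpoint pem served on_disk
  endpoint=$1
  pem=$2
  served=$(timeout 10 openssl s_client -connect "$endpoint" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha256 2>/dev/null)
  on_disk=$(openssl x509 -in "$pem" -noout -fingerprint -sha256 2>/dev/null)
  if [ -z "$served" ] || [ -z "$on_disk" ]; then
    echo "UNKNOWN could not read one of the certificates"
    return 2
  fi
  if [ "$served" = "$on_disk" ]; then
    echo "MATCH $served"
  else
    echo "MISMATCH served=[$served] disk=[$on_disk]"
    return 1
  fi
}
```

For example, served_matches_disk localhost:27017 /etc/mongodb/server.pem after running rotateCertificates on that process.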
Limitations:
- rotateCertificates fails if mongod was started with --tlsCertificateSelector set to thumbprint. Use a rolling restart instead.
- When rotating X.509 cluster membership certificates, members compare DN attributes in peer certificates, and a mismatch causes rejection. To bridge the gap while old and new certificates coexist in the cluster, use the tlsX509ClusterAuthDNOverride server parameter on legacy clusters, or tlsClusterAuthX509Override on clusters that use net.tls.clusterAuthX509.attributes.
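You can find processes affected by the thumbprint limitation before a rotation window by scanning their configuration files. This sketch is a line-based heuristic, not a YAML parser. It assumes the setting appears as net.tls.certificateSelector in nested YAML form. It cannot see a --tlsCertificateSelector flag passed on the command line, so check service definitions separately. uses_thumbprint_selector is a hypothetical name.

```bash
# Sketch: succeed (exit 0) if a mongod/mongos YAML config selects its
# certificate by thumbprint, which rules out rotateCertificates.
# Matches lines like:  certificateSelector: thumbprint=...  (optionally quoted)
uses_thumbprint_selector() {
  grep -Eq "^[[:space:]]*certificateSelector:[[:space:]]*[\"']?thumbprint=" "$1"
}
```

For example, if uses_thumbprint_selector /etc/mongod.conf succeeds, plan a rolling restart for that host.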
Rolling restart (MongoDB 4.4 and earlier)
Deployments running MongoDB 4.4 and earlier cannot reload certificates online. Use a rolling restart to avoid a full outage.
Procedure for a replica set:
- Update the certificate files on one secondary.
- Restart the secondary and wait until it returns to SECONDARY state.
- Repeat for each remaining secondary.
- Confirm that at least one electable secondary is healthy, then step down the primary with rs.stepDown(60).
- Update the certificate files on the former primary.
- Restart the former primary and wait until it rejoins as SECONDARY.
During each restart, connections to that node drop. The replica set remains available if a majority of voting members are up. Confirm that your driver retry settings and write concern tolerate brief unavailability before you begin.
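Waiting for each member to return to SECONDARY is the step most often rushed. A generic polling helper can block until a command prints the expected state or a timeout passes. This is a sketch: wait_for_output is a hypothetical name, and the 2-second poll interval is arbitrary.

```bash
# Sketch: poll a command until it prints the expected value or time runs out.
# $1 = expected output, $2 = timeout in seconds, remaining args = command.
wait_for_output() {
  local expected timeout_s deadline out
  expected=$1
  timeout_s=$2
  shift 2
  deadline=$(( $(date +%s) + timeout_s ))
  out=""
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # Errors are expected while the node is restarting; keep polling.
    out=$("$@" 2>/dev/null) || true
    if [ "$out" = "$expected" ]; then
      return 0
    fi
    sleep 2
  done
  echo "timed out waiting for $expected (last output: ${out:-none})" >&2
  return 1
}
```

For example: wait_for_output SECONDARY 300 mongosh --quiet --host db2.example.net:27017 --eval 'print(rs.status().members.find(m => m.self).stateStr)'. Add your TLS and authentication options to the mongosh command. The host name is a placeholder.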
Prevention
- Automated certificate monitoring. Check certificate expiry dates at least weekly with openssl s_client or an external monitor. Alert at 30 days; escalate at 7.
- Staging rehearsal. Rotate certificates in staging before production to catch PEM formatting errors, chain trust gaps, and DN mismatches.
- Mandatory CAFile configuration. Always set net.tls.CAFile when enabling TLS. Deployments without a CAFile may be vulnerable to CVE-2024-1351, in which peer certificate validation can be bypassed.
- NTP on all hosts. A certificate that is not yet valid because of clock skew produces the same symptoms as an expired certificate.
- Avoid thumbprint selectors for online rotation. rotateCertificates can only reload file-based certificate configuration.
- Stable certificate paths. Store certificates at paths that do not change between rotations, so the mongod configuration stays static and only file contents change.
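You can audit the CAFile recommendation across hosts with a quick config scan. This sketch flags a config that sets net.tls.mode to allowTLS, preferTLS, or requireTLS and has no CAFile line. That is the configuration described above for CVE-2024-1351. It is a line-based heuristic rather than a YAML parser. tls_without_cafile is a hypothetical name.

```bash
# Sketch: succeed (exit 0) and print a warning if a YAML config enables TLS
# without a CAFile line. Heuristic: matches "mode:" keys with a TLS mode value.
tls_without_cafile() {
  local conf
  conf=$1
  if grep -Eq '^[[:space:]]*mode:[[:space:]]*"?(allowTLS|preferTLS|requireTLS)' "$conf" \
     && ! grep -Eq '^[[:space:]]*CAFile:' "$conf"; then
    echo "AT RISK $conf enables TLS without CAFile"
    return 0
  fi
  return 1
}
```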
How Netdata helps
- Tracks network.numSlowSSLOperations over time, establishing a baseline that makes handshake deviations visible before failures cascade.
- Lets you correlate replica set member state changes with connection errors, which helps distinguish certificate expiry from network partitions or elections.
- Connection metrics, including the totalCreated delta, reveal whether a rotation triggers connection churn or pool exhaustion.
- Cross-referencing MongoDB serverStatus metrics with OS-level disk and CPU charts helps rule out resource bottlenecks during rotation windows.
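Netdata derives rates from cumulative counters for you. If you sample serverStatus by hand, the same idea applies: subtract two readings of network.numSlowSSLOperations or connections.totalCreated, then scale by the interval. This sketch uses integer arithmetic. counter_rate_per_min is a hypothetical name. It assumes that a drop between samples means the process restarted and the counter began again from zero.

```bash
# Sketch: per-minute rate from two samples of a cumulative counter.
# $1 = previous sample, $2 = current sample, $3 = seconds between samples.
counter_rate_per_min() {
  local prev curr secs delta
  prev=$1
  curr=$2
  secs=$3
  if [ "$curr" -lt "$prev" ]; then
    delta=$curr            # counter reset (process restart): count from zero
  else
    delta=$(( curr - prev ))
  fi
  echo $(( delta * 60 / secs ))
}
```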
Related guides
- How MongoDB actually works in production: a mental model for operators
- MongoDB pages evicted by application threads: when eviction becomes user latency
- MongoDB balancer stuck and jumbo chunks: permanent imbalance and how to fix it
- MongoDB WiredTiger cache dirty ratio high: the leading indicator nobody watches
- MongoDB WiredTiger cache pressure cascade: eviction stalls and latency spikes
- MongoDB cache too small: sizing the WiredTiger cache for your working set
- MongoDB checkpoint duration climbing: diagnosing slow WiredTiger checkpoints
- MongoDB checkpoint stall write freeze: when all writes stop with no error
- MongoDB chunk migration storms: moveChunk I/O pressure and range locks
- MongoDB connection churn: high totalCreated rate and thread creation overhead
- MongoDB connection refused at maxIncomingConnections: hitting the connection ceiling
- MongoDB connection storm spiral: reconnection floods after an election or deploy