MongoDB TLS certificate expiry: rotating certs without dropping connections

An expired TLS certificate does not degrade gracefully. In MongoDB, client drivers, replica set members, and mongos routers validate the peer’s certificate during every TLS handshake. Once the server certificate’s notAfter date passes, the peer rejects every new handshake. Existing connections persist until they close, but nothing can reconnect once they do. The result is a sudden cascade: members miss heartbeats and are reported as DOWN, application connection pools exhaust, and monitoring probes fail. During an active incident, check certificate dates first. For preventive work, your rotation strategy depends on whether you run MongoDB 5.0 or later.

What this means

On Linux, MongoDB uses OpenSSL for client and intra-cluster TLS (Windows builds use Secure Channel and macOS builds use Secure Transport). The server presents its certificate on every new TLS connection. If the certificate has expired, the peer rejects the handshake before application-level authentication begins. In a replica set, members cannot open new connections to each other for heartbeats, and secondaries cannot open new connections to their sync source for oplog fetching. In a sharded cluster, mongos instances cannot open new connections to shards. MongoDB may log a warning when a certificate is within 30 days of expiry, but many deployments never surface that log line. The failure is binary: connections work until they do not.
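Because a peer accepts a certificate only while the current time falls inside its notBefore/notAfter window, the check itself is a plain comparison. The POSIX shell sketch below shows it; cert_window_state is a hypothetical helper (not part of MongoDB or OpenSSL), and it assumes both dates have already been converted to Unix epoch seconds. The commented example assumes GNU date.

```shell
# Hypothetical helper, not part of MongoDB or OpenSSL: classify a
# certificate's validity window the way a TLS peer checks it during the
# handshake. All three arguments are Unix epoch seconds.
cert_window_state() {
  local now=$1 not_before=$2 not_after=$3
  if [ "$now" -lt "$not_before" ]; then
    echo "not_yet_valid"   # rejected; often clock skew or an early deploy
  elif [ "$now" -gt "$not_after" ]; then
    echo "expired"         # rejected on every new connection
  else
    echo "valid"           # the window is inclusive at both ends
  fi
}

# Example with GNU date and a local PEM file:
#   pem=/etc/mongodb/server.pem
#   nb=$(date -d "$(openssl x509 -in "$pem" -noout -startdate | cut -d= -f2)" +%s)
#   na=$(date -d "$(openssl x509 -in "$pem" -noout -enddate | cut -d= -f2)" +%s)
#   cert_window_state "$(date +%s)" "$nb" "$na"
```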

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Server certificate expired | New TLS connections fail; replica set members report DOWN/UNKNOWN; apps see connection timeouts | openssl s_client notAfter date on each mongod and mongos |
| CA certificate expired or missing | Chain validation fails across all clients and members even if the server cert is valid | openssl verify with the full chain against the configured CAFile |
| Certificate not yet valid | Immediate handshake failures after deploying new certificates | System clock with timedatectl or ntpstat; certificate notBefore date |
| Cluster membership certificate DN mismatch | Members reject each other with “unauthorized” after rotation; states flip to RECOVERING | rs.status() lastHeartbeatMessage; compare subject and issuer DNs |
| Thumbprint selector incompatibility | rotateCertificates appears to succeed but the process continues using the old certificate | Whether the startup config sets --tlsCertificateSelector to thumbprint |
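Several rows of this table can be told apart from the error string that openssl verify prints: depth 0 is the server certificate, and higher depths are intermediates or the CA. The sketch below maps those strings to the causes above; classify_verify_error and its output labels are hypothetical, and it matches OpenSSL's standard verify messages only.

```shell
# Hypothetical helper: read `openssl verify` output on stdin and print the
# matching cause from the table. Patterns are OpenSSL's standard messages.
classify_verify_error() {
  local out
  out=$(cat)
  case $out in
    *": OK"*) echo "chain_ok" ;;
    *"at 0 depth lookup: certificate has expired"*) echo "server_cert_expired" ;;
    *"depth lookup: certificate has expired"*) echo "ca_or_intermediate_expired" ;;
    *"certificate is not yet valid"*) echo "not_yet_valid_check_clock" ;;
    *"unable to get local issuer certificate"*|*"unable to get issuer certificate"*)
      echo "ca_missing_from_cafile" ;;
    *) echo "unclassified" ;;
  esac
}

# Usage (verify writes errors to stderr, so merge it into stdin):
#   openssl verify -CAfile /etc/mongodb/ca.pem /etc/mongodb/server.pem 2>&1 | classify_verify_error
```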

Quick checks

```shell
# Check server certificate expiry on a running mongod
openssl s_client -connect localhost:27017 </dev/null 2>/dev/null | openssl x509 -noout -dates
# Check local certificate file expiry directly
openssl x509 -in /etc/mongodb/server.pem -noout -dates
# Verify certificate subject and issuer
openssl s_client -connect localhost:27017 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
# Check certificate SANs locally
openssl x509 -in /etc/mongodb/server.pem -noout -text | grep -A1 "Subject Alternative Name"
# Count slow TLS handshakes since process start
# (add --tls, --tlsCAFile, and credentials to the mongosh commands as your deployment requires)
mongosh --quiet --eval 'db.serverStatus().network.numSlowSSLOperations'
# Search logs for TLS and certificate errors
grep -iE "ssl|tls|handshake|certificate" /var/log/mongodb/mongod.log | tail -20
# Check replica set member health from a stable node
mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " " + m.stateStr + " " + (m.lastHeartbeatMessage || "")))'
# Verify certificate chain trust
openssl verify -CAfile /etc/mongodb/ca.pem /etc/mongodb/server.pem
```
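The first quick check covers a single endpoint. To apply it to every mongod and mongos, a loop such as the hypothetical sweep_endpoints below can be used. It assumes openssl and the coreutils timeout command are installed; the host names in the usage comment are placeholders.

```shell
# Hypothetical helper: print the notAfter date each endpoint actually serves.
# Arguments are host:port pairs; list every mongod, every mongos, and each
# backend behind a load balancer individually.
sweep_endpoints() {
  local ep end
  for ep in "$@"; do
    # </dev/null ends the session right after the handshake; timeout bounds
    # hosts that silently drop packets.
    end=$(timeout 10 openssl s_client -connect "$ep" </dev/null 2>/dev/null \
      | openssl x509 -noout -enddate 2>/dev/null) || end=""
    if [ -n "$end" ]; then
      echo "$ep ${end#notAfter=}"
    else
      echo "$ep UNREACHABLE_OR_NO_CERT"
    fi
  done
}

# Usage (placeholder host names):
#   sweep_endpoints db1.example.net:27017 db2.example.net:27017 mongos1.example.net:27017
```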

How to diagnose it

  1. Identify the failing certificate. Run openssl s_client against every mongod and mongos port. If you have multiple mongos instances behind a load balancer, test each backend directly. Read the notAfter dates, and compare each notBefore date with the host clock to catch certificates that are not yet valid. Do not rely on file mtimes or issuance paperwork; the certificate a process actually serves is the reliable source.
  2. Check db.serverStatus().network.numSlowSSLOperations. The counter is cumulative since process start, so watch its rate of change: a sustained rise above baseline indicates that TLS handshakes are stalling. Slow handshakes are a separate symptom from expiry and may signal OCSP responder delays or chain validation overhead.
  3. Inspect replica set member states. Intra-cluster TLS failures present as missed heartbeats. Check rs.status() for members in DOWN, UNKNOWN, or RECOVERING, and read lastHeartbeatMessage for TLS-related errors.
  4. Determine your MongoDB version. Version 5.0 and later support rotateCertificates for online reload. Earlier versions require a rolling restart.
  5. Check for CVE-2024-1351 exposure. If the server was started with net.tls.mode set to allowTLS, preferTLS, or requireTLS without net.tls.CAFile, peer certificate validation may be skipped. Configure a CAFile when TLS is enabled, and upgrade to a patched version.
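Step 5 describes a configuration condition, so it can be checked mechanically. A minimal sketch, assuming you pass in the effective net.tls.mode and net.tls.CAFile values; tls_cafile_exposure is a hypothetical name, and it does not check whether the installed version is patched.

```shell
# Hypothetical helper: report whether the effective TLS settings match the
# configuration described for CVE-2024-1351 (TLS enabled, no CAFile).
# Arguments: net.tls.mode and net.tls.CAFile (pass "" when CAFile is unset).
# Checks configuration only; it does not look at the server version.
tls_cafile_exposure() {
  local mode=$1 cafile=$2
  case $mode in
    allowTLS|preferTLS|requireTLS)
      if [ -z "$cafile" ]; then
        echo "exposed: TLS enabled without net.tls.CAFile"
        return 1
      fi
      ;;
  esac
  echo "ok"
}

# Reading the effective values with the getCmdLineOpts admin command
# (options set through legacy net.ssl names appear under parsed.net.ssl):
#   mode=$(mongosh --quiet --eval 'print((db.adminCommand({ getCmdLineOpts: 1 }).parsed.net.tls || {}).mode || "unset")')
#   cafile=$(mongosh --quiet --eval 'print((db.adminCommand({ getCmdLineOpts: 1 }).parsed.net.tls || {}).CAFile || "")')
#   tls_cafile_exposure "$mode" "$cafile"
```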
```mermaid
flowchart TD
    A[Detect expiry warning or TLS failure] --> B[Check notAfter with openssl s_client]
    B --> C{Certificate expired?}
    C -->|Yes| D{MongoDB version}
    C -->|No| E[Check CA chain and clock skew]
    D -->|5.0+| F[rotateCertificates online]
    D -->|4.4 and earlier| G[Rolling restart]
    E --> H[Fix CA file or sync NTP]
    F --> I[Verify with openssl s_client]
    G --> I
    H --> I
    I --> J[Monitor member states and numSlowSSLOperations]
```
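The decision points in this flow can be scripted for a runbook. This is a sketch only: choose_fix is a hypothetical helper, and it assumes you already know whether the served certificate is expired and have the version string from db.version().

```shell
# Hypothetical helper encoding the flowchart's decisions.
# Arguments: "yes" or "no" (is the served certificate expired?) and the
# server version string, e.g. the output of db.version().
choose_fix() {
  local expired=$1 version=$2 major
  if [ "$expired" != "yes" ]; then
    echo "check CA chain and clock skew"
    return 0
  fi
  major=${version%%.*}
  case $major in
    ''|*[!0-9]*) echo "unknown version: $version"; return 1 ;;
  esac
  if [ "$major" -ge 5 ]; then
    echo "rotateCertificates online"
  else
    echo "rolling restart"
  fi
}

# Example:
#   choose_fix yes "$(mongosh --quiet --eval 'db.version()')"
```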

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Certificate notAfter date | Expired certificates break all new TLS handshakes immediately | Less than 30 days to expiry; less than 7 days is urgent |
| network.numSlowSSLOperations | Counts TLS handshakes that exceeded the slow threshold; a rising rate indicates handshake health issues | Sustained increase above baseline |
| Replica set member state | Intra-cluster TLS failures block heartbeats and replication | Any data-bearing member in DOWN, UNKNOWN, or RECOVERING |
| Connection errors in logs | Direct evidence of rejected handshakes or trust failures | Sustained rate of TLS or connection errors |
| User assertions (asserts.user) | Can rise when clients retry and then hit authentication or validation failures | Sudden increase in user assertions |
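Because network.numSlowSSLOperations is cumulative, alerting needs the difference between samples rather than the raw value. A hedged sketch, assuming you choose a per-interval baseline yourself: slow_ssl_check is hypothetical, treats a counter drop as a process restart, and evaluates a single interval, so a "sustained" alert would require several consecutive ALERT results.

```shell
# Hypothetical helper: compare the growth of the cumulative counter
# network.numSlowSSLOperations between two samples with a chosen baseline.
# Arguments: previous sample, current sample, allowed increase per interval.
slow_ssl_check() {
  local prev=$1 cur=$2 baseline=$3 delta
  if [ "$cur" -lt "$prev" ]; then
    delta=$cur                 # counter went down: the process restarted
  else
    delta=$((cur - prev))
  fi
  if [ "$delta" -gt "$baseline" ]; then
    echo "ALERT delta=$delta baseline=$baseline"
  else
    echo "ok delta=$delta"
  fi
}

# Sampling example (add TLS/auth options as needed; Number() normalizes the value):
#   cur=$(mongosh --quiet --eval 'print(Number(db.serverStatus().network.numSlowSSLOperations))')
```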

Fixes

Online rotation (MongoDB 5.0 and later)

MongoDB 5.0 introduced rotateCertificates, which reloads certificate and key files from disk without terminating existing connections. Existing connections keep using the old certificates until they close; new connections use the rotated certificates immediately. The command can rotate server TLS certificates, CA files, and CRL files (CRL rotation is supported on Linux and Windows). Run it on each mongod and mongos process individually.

Prerequisites:

  • The hostManager role or equivalent privileges.
  • New certificate files already written to the paths configured in net.tls.certificateKeyFile, net.tls.CAFile, and net.tls.CRLFile.
  • The new certificates are valid, unexpired, and trusted.
  • File permissions that let the mongod user read the new key files without exposing them to other users.
  • For encrypted PEM key files, a new key protected by the same password as the old one, supplied through net.tls.certificateKeyFilePassword. rotateCertificates cannot prompt for a password interactively.

Procedure:

  1. Back up the current certificate files, then write the new files to the same configured paths. Keep the backups until rotation succeeds.
  2. Ensure the mongod process can read the new files at the configured paths.
  3. Run: db.adminCommand({ rotateCertificates: 1 })
  4. Verify by opening a new connection with openssl s_client and confirming the notAfter date.
  5. If the command fails because the new files are incorrect, expired, or revoked, the running process retains the old TLS configuration and continues serving. Check the logs for the validation error, fix the files, and retry.
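The procedure can be wrapped in small shell functions. This is a sketch under stated assumptions, not an official tool: precheck_pem, install_pem, and rotate_and_verify are hypothetical names, the owner account and paths are examples, and mongosh is assumed to reach the local node with whatever TLS and authentication options your deployment needs. Writing to a temporary file in the same directory and renaming it over the configured path avoids a window in which mongod could read a partially written file.

```shell
# Prerequisite check: the new certificate is unexpired and chains to the CA.
precheck_pem() {
  local pem=$1 ca=$2
  openssl x509 -in "$pem" -noout -checkend 0 >/dev/null || { echo "expired: $pem"; return 1; }
  openssl verify -CAfile "$ca" "$pem" >/dev/null 2>&1 || { echo "untrusted: $pem"; return 1; }
}

# Steps 1-2: back up the file at the configured path, then swap in the new
# contents with a single rename so mongod never reads a half-written file.
# OWNER is the account mongod runs as (for example mongod:mongod).
install_pem() {
  local src=$1 dest=$2 owner=$3 tmp
  cp -p "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)" || return 1
  tmp=$(mktemp "$(dirname "$dest")/.pem.XXXXXX") || return 1   # same filesystem
  if cat "$src" >"$tmp" && chmod 600 "$tmp" && chown "$owner" "$tmp"; then
    mv -f "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Steps 3-4: reload from disk, then read the date a new handshake serves.
rotate_and_verify() {
  local host_port=$1
  mongosh --quiet --eval 'printjson(db.adminCommand({ rotateCertificates: 1 }))' || return 1
  openssl s_client -connect "$host_port" </dev/null 2>/dev/null | openssl x509 -noout -enddate
}

# Usage on one node (paths and owner are examples):
#   precheck_pem /tmp/new-server.pem /etc/mongodb/ca.pem &&
#   install_pem /tmp/new-server.pem /etc/mongodb/server.pem mongod:mongod &&
#   rotate_and_verify localhost:27017
```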

Limitations:

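  • rotateCertificates cannot switch to a new certificate if mongod was started with --tlsCertificateSelector set to thumbprint (a Windows and macOS option): the thumbprint identifies one specific certificate in the system store, so the replacement is never selected. Use a rolling restart with an updated selector instead.
  • When rotating X.509 cluster membership certificates, members compare the O, OU, and DC attributes of each other's certificate subjects by default, and a mismatch causes rejection. If the new certificates change these attributes, use tlsX509ClusterAuthDNOverride for legacy clusters or tlsClusterAuthX509Override for clusters using net.tls.clusterAuthX509.attributes, so that members accept both identities while some of them still present the old certificates.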
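Before rotating membership certificates, you can compare the attributes the default rule looks at. This sketch assumes subjects printed in RFC 2253 form by openssl x509 -noout -subject -nameopt RFC2253; membership_attrs and same_cluster_identity are hypothetical, the sketch compares the attributes as a sorted set, and escaped commas inside values are not handled.

```shell
# Hypothetical helper: extract the O, OU, and DC attributes from an RFC 2253
# subject string, one per line, sorted. Input example:
#   subject=CN=node1.example.net,OU=Databases,O=Example,DC=example,DC=net
membership_attrs() {
  printf '%s\n' "${1#subject=}" | tr ',' '\n' \
    | { grep -E '^(O|OU|DC)=' || true; } | sort
}

# Prints "match" when both subjects carry the same non-empty O/OU/DC set.
same_cluster_identity() {
  local a b
  a=$(membership_attrs "$1")
  b=$(membership_attrs "$2")
  if [ -z "$a" ]; then echo "no O/OU/DC attributes"; return 1; fi
  if [ "$a" = "$b" ]; then echo "match"; else echo "mismatch"; return 1; fi
}

# Usage (old and new certificate for the same member):
#   same_cluster_identity "$(openssl x509 -in old.pem -noout -subject -nameopt RFC2253)" \
#                         "$(openssl x509 -in new.pem -noout -subject -nameopt RFC2253)"
```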

Rolling restart (MongoDB 4.4 and earlier)

Deployments running MongoDB 4.4 and earlier cannot reload certificates online. Use a rolling restart to avoid a full outage.

Procedure for a replica set:

  1. Update the certificate files on one secondary.
  2. Restart the secondary. Wait until it returns to SECONDARY state.
  3. Repeat for each remaining secondary.
  4. Step down the primary with rs.stepDown(60). Ensure an electable secondary is available first.
  5. Update certificate files on the former primary.
  6. Restart the former primary and wait until it rejoins in SECONDARY state.

During each restart, connections to that node drop. The replica set remains available if a majority of voting members are up. Confirm that your driver retry settings and write concern tolerate brief unavailability before you begin.
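The ordering in this procedure (secondaries first, the primary last after a step-down) can be derived from the output of the rs.status() quick check. restart_order is a hypothetical helper; treating arbiters like secondaries and refusing to proceed when any member is in another state are assumptions of this sketch, not MongoDB requirements.

```shell
# Hypothetical helper: read "name state ..." lines (the format printed by the
# rs.status() quick check) and print the restart order: non-primary members
# first, the primary last. Fails if any member is not PRIMARY/SECONDARY/ARBITER.
restart_order() {
  local primary="" others="" bad="" name state rest n
  while read -r name state rest || [ -n "$name" ]; do
    [ -z "$name" ] && continue
    case $state in
      PRIMARY) primary=$name ;;
      SECONDARY|ARBITER) others="$others $name" ;;
      *) bad="$bad $name($state)" ;;
    esac
  done
  if [ -n "$bad" ] || [ -z "$primary" ]; then
    echo "resolve before restarting:${bad:- no PRIMARY found}" >&2
    return 1
  fi
  for n in $others; do echo "$n"; done
  echo "$primary"
}

# Usage:
#   mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " " + m.stateStr))' | restart_order
```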

Prevention

  • Automated certificate monitoring. Query certificate expiry dates weekly with openssl s_client or an external monitor. Alert at 30 days; escalate at 7.
  • Staging rehearsal. Rotate certificates in staging before production to catch PEM formatting errors, chain trust gaps, and DN mismatch issues.
  • Mandatory CAFile configuration. Always set net.tls.CAFile when enabling TLS. Deployments without a CAFile may be vulnerable to CVE-2024-1351, where peer certificate validation can be bypassed.
  • NTP on all hosts. A certificate that is not yet valid because of clock skew produces identical symptoms to an expired certificate.
  • Avoid thumbprint selectors if you plan to rotate online. A thumbprint pins one specific certificate, so rotateCertificates cannot pick up a replacement; configure certificates by file path instead.
  • Store certificates at paths that do not change between rotations. rotateCertificates rereads the configured paths, so the mongod configuration stays static and only the file contents change.
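The 30-day alert and 7-day escalation thresholds translate directly into a severity function that a cron job or external monitor could call. expiry_severity is hypothetical and takes Unix epoch seconds; the commented example assumes GNU date for converting openssl's notAfter output.

```shell
# Hypothetical helper: map time to expiry onto the alert policy
# (alert under 30 days, escalate under 7). Arguments are Unix epoch seconds.
expiry_severity() {
  local not_after=$1 now=$2 days
  days=$(( (not_after - now) / 86400 ))   # whole days remaining, rounded down
  if [ "$not_after" -lt "$now" ]; then
    echo "EXPIRED"
  elif [ "$days" -lt 7 ]; then
    echo "URGENT ${days}d"
  elif [ "$days" -lt 30 ]; then
    echo "WARNING ${days}d"
  else
    echo "OK ${days}d"
  fi
}

# Example with GNU date against a local file:
#   na=$(date -d "$(openssl x509 -in /etc/mongodb/server.pem -noout -enddate | cut -d= -f2)" +%s)
#   expiry_severity "$na" "$(date +%s)"
```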

How Netdata helps

  • Tracks network.numSlowSSLOperations over time, establishing a baseline that makes handshake deviations visible before failures cascade.
  • Correlating replica set member state changes with connection error logs distinguishes certificate expiry from network partitions or elections.
  • Connection utilization metrics, including totalCreated delta, reveal whether a rotation triggers connection churn or pool exhaustion.
  • Cross-referencing MongoDB serverStatus metrics with OS-level disk or CPU charts rules out storage bottlenecks during rotation windows.