Kubernetes kubelet certificate expired: detection, rotation, and recovery

A healthy node suddenly shows NotReady. Pods keep running, but the kubelet stops reporting status. kubectl logs and kubectl exec fail with TLS errors. The cluster event stream is quiet. This is usually an expired kubelet client certificate that failed to rotate.

Every kubelet maintains two independent TLS credentials: a client certificate that authenticates it to the kube-apiserver, and a serving certificate that secures the kubelet’s own HTTPS endpoints. Both typically have a one-year validity. When the client certificate expires, the kubelet cannot authenticate to the API server and the node goes NotReady. Workloads may continue running, and the kubelet still runs probes and restarts containers locally, but the node is cut off from the control plane: no status updates, no new pod scheduling, and no way for the kubelet to learn about evictions or deletions. When the serving certificate expires, metrics-server, kubectl exec, and kubectl logs break even if the node is otherwise Ready.

What this means

Clusters created with kubeadm issue certificates with a one-year validity. The control plane certificates are managed by kubeadm, but the kubelet client certificate is managed by the kubelet’s own certificate manager. Automatic rotation is enabled by default on most modern clusters (rotateCertificates: true in the kubelet configuration). Rotation typically triggers during the final 20 percent of the certificate lifetime. The kubelet generates a new key pair, submits a CertificateSigningRequest (CSR), and the kube-controller-manager approves it via the csrapproving controller.
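To judge whether a certificate is already past the point where rotation should have started, it can help to express its age as a percentage of its total lifetime. The following is a minimal sketch, not part of any Kubernetes tooling: lifetime_elapsed_pct is a hypothetical helper name, and the 80 percent figure mentioned in the comment simply mirrors the "final 20 percent" rule of thumb above.

```shell
# Print the percentage of a certificate's lifetime that has already elapsed.
# Arguments are Unix epoch seconds: notBefore, notAfter, and (optionally) now.
# Hypothetical helper for illustration; a result at or above roughly 80 with
# no new certificate on disk suggests rotation is overdue.
lifetime_elapsed_pct() {
    local not_before="$1"
    local not_after="$2"
    local now="${3:-$(date +%s)}"
    local total=$((not_after - not_before))
    if [ "$total" -le 0 ]; then
        echo "invalid validity window" >&2
        return 1
    fi
    echo $(( (now - not_before) * 100 / total ))
}
```

On a node with GNU date, the epoch values can be derived from the openssl output, for example date -d "$(openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate | cut -d= -f2)" +%s for notAfter.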

If that flow fails silently, the kubelet continues using the existing certificate until the NotAfter date passes. At that moment, all API server requests from the kubelet return 401 Unauthorized. After the node lease expires (default 40 seconds), the node controller marks the node NotReady. The failure is silent until expiry because the kubelet does not log a critical error until it tries to use the expired credential.

A common operator mistake is running kubeadm certs renew all and assuming the kubelet certificate is renewed. That command renews control-plane PKI only. The kubelet client certificate lives under /var/lib/kubelet/pki and is rotated by the kubelet, not by kubeadm.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Automatic rotation disabled or blocked | Node NotReady exactly at certificate anniversary; no pending CSRs | Kubelet config for rotateCertificates and CSR approval controller health |
| CSR approval failure | Pending CSRs for node certificates; kubelet logs show CSR submission but no approval | kubectl get csr and kube-controller-manager leader election status |
| Expired cluster CA | kubeadm certs check-expiration warns that CA bounds cannot be validated; multiple components fail simultaneously | CA certificate dates at /etc/kubernetes/pki/ca.crt |
| Clock skew | Certificate appears expired before or after its actual date; TLS errors on multiple nodes | NTP sync status on the node and control plane |
| Node rename after bootstrap | CSR auto-approval fails because the CSR username does not match the node’s current name | Node hostname and CSR subject |

Quick checks

# Check kubelet client certificate expiry directly on the node
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates

# Check serving certificate expiry
openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -dates

# Check all kubeadm-managed control plane certificate dates
kubeadm certs check-expiration

# List pending or failed certificate signing requests
kubectl get csr

# Check kubelet logs for certificate or TLS errors
journalctl -u kubelet --since "1 day ago" | grep -iE "certificate|tls|unauthorized"

# Check API server 401 rate from kubelets (control plane metric)
kubectl get --raw /metrics | grep "apiserver_request_total" | grep 'code="401"'

# Check kubelet certificate TTL metric (if the metrics endpoint is reachable and authorized)
curl -sk https://localhost:10250/metrics | grep kubelet_certificate_manager_client_ttl_seconds

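A valid certificate shows notAfter in the future. If the date is in the past, the certificate has expired. If it falls within the next seven days, rotation has most likely failed, because a healthy kubelet renews well before that point. A CSR that nobody has approved appears in kubectl get csr with the condition Pending.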
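The date comparison can be scripted with openssl's -checkend option, which exits non-zero when a certificate expires within the given number of seconds. This is a sketch under assumptions: cert_status is a hypothetical helper name, and the seven-day default only mirrors the threshold used in this section.

```shell
# Classify a certificate file as "expired", "expiring", or "ok".
# Hypothetical helper; the second argument is the warning window in seconds
# and defaults to seven days (604800).
cert_status() {
    local cert_file="$1"
    local warn_seconds="${2:-604800}"
    if [ ! -r "$cert_file" ]; then
        echo "unreadable"
        return 2
    fi
    # -checkend N exits non-zero if the certificate expires within N seconds.
    if ! openssl x509 -in "$cert_file" -noout -checkend 0 >/dev/null 2>&1; then
        echo "expired"
    elif ! openssl x509 -in "$cert_file" -noout -checkend "$warn_seconds" >/dev/null 2>&1; then
        echo "expiring"
    else
        echo "ok"
    fi
}
```

A typical call on a node would be cert_status /var/lib/kubelet/pki/kubelet-client-current.pem, run with enough privileges to read the file.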

How to diagnose it

  1. Confirm the symptom is TLS-related. If the node is NotReady but the kubelet process is running, check journalctl -u kubelet for “certificate has expired” or “x509: certificate signed by unknown authority”. If kubectl exec or kubectl logs fail with TLS handshake errors while kubectl get nodes still works, suspect the serving certificate.

  2. Check certificate expiration dates. On the node, run openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates. If notAfter has passed, the client certificate is expired. Run openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -dates to check the serving certificate.

  3. Verify the CA is still valid. If the CA at /etc/kubernetes/pki/ca.crt has expired, kubeadm certs check-expiration emits warnings that certificate bounds cannot be validated. An expired CA is a larger event than a single expired kubelet certificate. Do not proceed with node-level recovery until the CA is regenerated.

  4. Check for pending CSRs. Run kubectl get csr. CSR names are generated (for example csr-xxxxx), so identify the node by the REQUESTOR column, which reads system:node:<node-name> for renewals. If a CSR exists and is Pending, the auto-approval controller is not running or the CSR does not match auto-approval criteria. If no CSR exists, the kubelet may have stopped requesting rotation.

  5. Distinguish client from serving cert failure.

    • Client cert expired: node NotReady, kubelet cannot update status, node lease expires.
    • Serving cert expired: node may still show Ready, but kubectl exec, kubectl logs, and metrics-server scrape fail with TLS errors.
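
  6. Check API server authentication metrics. On the control plane, a spike in apiserver_request_total{code="401"} that coincides with the node going NotReady is consistent with the kubelet presenting an expired or invalid client certificate. The metric carries no client label, so use the API server logs to attribute the failures to a specific node.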

  7. Check kubelet feature gates and config. Verify /var/lib/kubelet/config.yaml contains rotateCertificates: true. If this is false, automatic rotation was disabled.
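The pending-CSR check in step 4 can be narrowed to node identities with a small filter. This is a sketch that assumes the column layout of recent kubectl versions, where REQUESTOR is the fourth column and CONDITION is the last; pending_node_csrs is a hypothetical helper name.

```shell
# Read `kubectl get csr --no-headers` output on stdin and print the names of
# CSRs that are still Pending and were requested by a node identity.
# Assumes REQUESTOR is column 4 and CONDITION is the last column.
pending_node_csrs() {
    awk '$NF == "Pending" && $4 ~ /^system:node:/ { print $1 }'
}
```

Usage would look like kubectl get csr --no-headers | pending_node_csrs. An empty result while a node is NotReady points toward the kubelet not requesting rotation at all.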

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| kubelet_certificate_manager_client_ttl_seconds | Seconds until the client certificate expires. | Value trending toward zero or below 7 days. |
| kubelet_certificate_manager_server_ttl_seconds | Seconds until the serving certificate expires. | Value trending toward zero; breaks metrics-server before NotReady. |
| kubelet_certificate_manager_client_expiration_renew_errors | Rotation is failing silently if this increments. | Any increase, especially one that continues for more than 5 minutes. |
| Node Ready condition | NotReady is the visible symptom of client cert expiry. | Ready=False or Unknown for > 1 minute. |
| API server 401 rate | Expired client certs cause mass authentication failures. | apiserver_request_total{code="401"} spiking while nodes go NotReady. |
| Pending CSRs | Unapproved CSRs block rotation. | Any CSR from system:node:* in Pending for > 10 minutes. |
| Cluster CA expiry | If the CA expires, all dependent certs break. | CA notAfter within 30 days. |
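For the 401 signal, a quick way to get a single number out of the API server's metrics text is to sum every apiserver_request_total series with code="401". This is an illustrative sketch; sum_401_requests is a hypothetical helper name, and because the result is a cumulative counter, only the difference between two samples taken some time apart is meaningful.

```shell
# Sum all apiserver_request_total samples with code="401" from Prometheus
# text-format metrics on stdin. Hypothetical helper for illustration.
sum_401_requests() {
    awk '
        index($0, "apiserver_request_total{") == 1 && /code="401"/ { total += $NF }
        END { printf "%d\n", total }
    '
}
```

Usage would look like kubectl get --raw /metrics | sum_401_requests, repeated after a minute to see whether the count is still growing.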

Fixes

If the client certificate is expired but the CA is valid

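This is the most common scenario. The kubelet can re-bootstrap using a bootstrap token. kubeadm bootstrap tokens expire after 24 hours by default, so the token from the original join has usually expired; create a fresh one with kubeadm token create on a control plane node and make sure /etc/kubernetes/bootstrap-kubelet.conf exists and references it.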

  1. On the affected node, back up then remove the stale kubelet PKI files:
# WARNING: Destructive. This wipes kubelet client certificates and forces re-bootstrap.
# Ensure /etc/kubernetes/bootstrap-kubelet.conf contains a valid token before proceeding.
sudo cp -a /var/lib/kubelet/pki /var/lib/kubelet/pki.bak.$(date +%s)
# Expand the glob inside a root shell; your user may not be able to list /var/lib/kubelet.
sudo sh -c 'rm /var/lib/kubelet/pki/kubelet-client-*'
  2. Restart the kubelet to trigger bootstrap from the bootstrap-kubeconfig:
# Disruptive: restarts the kubelet process.
sudo systemctl restart kubelet
  3. Approve the new CSR if it does not auto-approve:
kubectl certificate approve <csr-name>
  4. Verify the node returns to Ready and the new certificate has a future notAfter date.
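Before wiping the PKI files in step 1, it is worth confirming that the bootstrap kubeconfig actually carries a token. The sketch below checks only that the file is readable and contains a token field in the bootstrap token format (six characters, a dot, sixteen characters). has_bootstrap_token is a hypothetical helper name, and the check cannot tell whether the token is still valid on the API server; kubeadm token list on a control plane node shows that.

```shell
# Return 0 if the bootstrap kubeconfig contains a well-formed bootstrap token.
# Hypothetical helper; checks format only, not whether the token has expired.
has_bootstrap_token() {
    local conf="${1:-/etc/kubernetes/bootstrap-kubelet.conf}"
    [ -r "$conf" ] || return 1
    grep -Eq '^[[:space:]]*token:[[:space:]]*"?[a-z0-9]{6}\.[a-z0-9]{16}"?[[:space:]]*$' "$conf"
}
```

For example, sudo sh -c with the function defined, or a plain has_bootstrap_token && echo "token present" in a root shell, gives a quick go or no-go before the destructive step.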

If the serving certificate is expired

The serving certificate is used for kubectl exec, kubectl logs, and metrics-server scraping. If the client cert is still valid but the serving cert has expired:

  1. Check if serverTLSBootstrap: true is set in the kubelet configuration. If it is false (the default), the kubelet uses a self-signed serving certificate stored as kubelet.crt and kubelet.key under /var/lib/kubelet/pki, and it only generates a new pair at startup when those files are missing. If it is true, serving certificate rotation requires a custom CSR approver or manual approval because the built-in controller does not auto-approve serving CSRs.

  2. If serverTLSBootstrap: true and a pending serving CSR exists, manually approve it. If the certificate is self-signed, move the expired kubelet.crt and kubelet.key aside, then restart the kubelet so it generates a fresh pair.

  3. Restart the kubelet after manually replacing serving certificate files. Do not rely on the kubelet picking up files that were changed on disk by hand.
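To decide which of the two serving-certificate paths applies, the relevant setting can be read from the kubelet configuration file. This is a minimal, line-based sketch rather than a real YAML parser: kubelet_config_value is a hypothetical helper name, it only sees top-level keys, and a result of "unset" for serverTLSBootstrap means the default (false) is in effect.

```shell
# Print the value of a top-level key in the kubelet config, or "unset".
# Hypothetical helper; a line-based read, not a full YAML parser.
kubelet_config_value() {
    local key="$1"
    local conf="${2:-/var/lib/kubelet/config.yaml}"
    if [ ! -r "$conf" ]; then
        echo "unreadable"
        return 1
    fi
    awk -v key="$key" -F': *' '
        $1 == key {
            value = $2
            sub(/#.*$/, "", value)        # drop a trailing comment
            sub(/[ \t\r]+$/, "", value)   # drop trailing whitespace
            print value
            found = 1
            exit
        }
        END { if (!found) print "unset" }
    ' "$conf"
}
```

Calling kubelet_config_value serverTLSBootstrap and kubelet_config_value rotateCertificates on the node covers both the serving and the client rotation settings.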

If the cluster CA is expired

If /etc/kubernetes/pki/ca.crt has expired, kubeadm certs check-expiration warns that bounds cannot be validated. The entire trust chain is broken. You must regenerate the cluster CA, regenerate all dependent certificates (API server, kubelet client certs, etcd certs), distribute the new CA to all nodes, and restart all control plane components and kubelets.

This is a multi-step manual process. Do not attempt node-level kubelet bootstrap until the CA is valid again.

If CSR auto-approval is failing

If CSRs are pending and the CA is valid:

  1. Verify the kube-controller-manager is running and has elected a leader.
  2. Check the kubelet logs to confirm the CSR was submitted with the expected username (system:node:<nodeName> for renewals, system:bootstrap:<token-id> for initial bootstrap). A hostname change blocks auto-approval.
  3. If the controller-manager is healthy but CSRs remain pending, manually approve the CSR as an emergency fix.

Prevention

  • Monitor kubelet_certificate_manager_client_ttl_seconds and kubelet_certificate_manager_server_ttl_seconds with alerts at 30 days, 7 days, and 1 day.
  • Do not rely solely on kubeadm certs check-expiration for kubelet health. It does not monitor the kubelet-managed client certificate under /var/lib/kubelet/pki.
  • Verify that rotateCertificates: true is set in every kubelet’s configuration and that the CSR approver is functional after every control plane upgrade.
  • Verify that bootstrap tokens remain valid if you rely on kubeadm-managed node bootstrapping.
  • Ensure NTP is synchronized across all nodes and control plane hosts to prevent clock skew from causing premature certificate validation failures.
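The 30-day, 7-day, and 1-day alert thresholds can be expressed as a simple mapping from remaining seconds to a severity tier. This is an illustrative sketch: ttl_alert_tier and the tier names are hypothetical, and the input is expected to be whole seconds, so a metric value in floating-point or exponent notation would need to be converted first.

```shell
# Map a certificate TTL in whole seconds to an alert tier.
# Hypothetical helper; tier names are arbitrary labels for the 30-day,
# 7-day, and 1-day thresholds.
ttl_alert_tier() {
    local ttl="$1"
    local day=86400
    if [ "$ttl" -le 0 ]; then
        echo "expired"
    elif [ "$ttl" -lt "$day" ]; then
        echo "critical"   # less than 1 day left
    elif [ "$ttl" -lt $((7 * day)) ]; then
        echo "warning"    # less than 7 days left
    elif [ "$ttl" -lt $((30 * day)) ]; then
        echo "notice"     # less than 30 days left
    else
        echo "ok"
    fi
}
```

In a real alerting setup the same thresholds would normally live in the monitoring system's rules; the function is only meant to make the tier boundaries explicit.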

How Netdata helps

Netdata collects the signals that reveal certificate degradation before it becomes an outage:

  • Correlates node NotReady transitions with kubelet certificate TTL metrics on the same timeline.
  • Charts apiserver_request_total{code="401"} spikes alongside affected node names to identify mass certificate failures.
  • Tracks kubelet certificate manager renew errors to catch silent rotation failures.
  • Monitors control plane certificate expiration via kubeadm certs checks or direct file inspection where available.
  • Displays node-level kubelet metrics and API server authentication error rates together to speed root cause identification.
flowchart TD
    A[Node NotReady or TLS errors] --> B{Check client cert expiry}
    B -->|Expired or < 7 days| C{Check serving cert}
    B -->|Valid| D[Investigate API server or network]
    C -->|Expired| E{Check CA expiry}
    C -->|Valid| F[Check CSR approval and controller-manager]
    E -->|CA expired| G[Regenerate CA and all cluster certs]
    E -->|CA valid| H[Recover via bootstrap token and restart kubelet]
    F -->|Pending CSRs| I[Approve CSRs or fix controller-manager]
    F -->|No CSRs| J[Force rotation by wiping /var/lib/kubelet/pki and restarting]