Kubernetes kubelet certificate expired: detection, rotation, and recovery
A healthy node suddenly shows NotReady. Pods keep running, but the kubelet stops reporting status. kubectl logs and kubectl exec fail with TLS errors. The cluster event stream is quiet. This is usually an expired kubelet client certificate that failed to rotate.
Every kubelet maintains two independent TLS credentials: a client certificate that authenticates it to the kube-apiserver, and a serving certificate that secures the kubelet’s own HTTPS endpoints. Both typically have a one-year validity. When the client certificate expires, the kubelet cannot authenticate to the API server. The node goes NotReady. Workloads may continue running, but the control plane can no longer manage them: status updates stop, probe results are not reported to the API server, pod deletions and evictions issued through the API do not reach the node, and no new pods are scheduled there. When the serving certificate expires, metrics-server, kubectl exec, and kubectl logs break even if the node is otherwise Ready.
What this means
Clusters created with kubeadm issue certificates with a one-year validity. kubeadm manages the control plane certificates, but the kubelet’s own certificate manager manages the kubelet client certificate. Automatic rotation is enabled by default on most modern clusters (rotateCertificates: true in the kubelet configuration). The kubelet does not wait until the last moment: it picks a jittered deadline roughly 70 to 90 percent of the way through the certificate lifetime, so rotation normally starts with 10 to 30 percent of the validity remaining. It then generates a new key pair, submits a CertificateSigningRequest (CSR), and the kube-controller-manager approves it via the csrapproving controller.
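To see when a given certificate should have attempted rotation, the window can be estimated from its validity dates. The sketch below assumes the kubelet's certificate manager picks a jittered deadline between 70 and 90 percent of the total lifetime, and it assumes GNU date; the function name rotation_window is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: estimate the window in which the kubelet would attempt rotation.
# Assumption: the deadline is jittered between 70% and 90% of the lifetime.
# Requires GNU date (for the -d flag).

# rotation_window NOT_BEFORE NOT_AFTER
# Prints "<earliest_epoch> <latest_epoch>" of the expected rotation attempt.
rotation_window() {
  local not_before not_after lifetime
  not_before=$(date -u -d "$1" +%s) || return 1
  not_after=$(date -u -d "$2" +%s) || return 1
  lifetime=$(( not_after - not_before ))
  echo "$(( not_before + lifetime * 7 / 10 )) $(( not_before + lifetime * 9 / 10 ))"
}
```

Feed it the notBefore and notAfter values printed by openssl x509 -noout -dates. If the current time is past the second value and the certificate on disk is unchanged, rotation has probably been failing for a while.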
If that flow fails silently, the kubelet continues using the existing certificate until the NotAfter date passes. At that moment, all API server requests from the kubelet return 401 Unauthorized. After the node lease expires (default 40 seconds), the node controller marks the node NotReady. The failure is silent until expiry because the kubelet does not log a critical error until it tries to use the expired credential.
A common operator mistake is running kubeadm certs renew all and assuming the kubelet certificate is renewed. That command renews control-plane PKI only. The kubelet client certificate lives under /var/lib/kubelet/pki and is rotated by the kubelet, not by kubeadm.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Automatic rotation disabled or blocked | Node NotReady exactly at certificate anniversary; no pending CSRs | Kubelet config for rotateCertificates and CSR approval controller health |
| CSR approval failure | Pending CSRs for node certificates; kubelet logs show CSR submission but no approval | kubectl get csr and kube-controller-manager leader election status |
| Expired cluster CA | kubeadm certs check-expiration warns that CA bounds cannot be validated; multiple components fail simultaneously | CA certificate dates at /etc/kubernetes/pki/ca.crt |
| Clock skew | Certificate appears expired before or after its actual date; TLS errors on multiple nodes | NTP sync status on the node and control plane |
| Node rename after bootstrap | CSR auto-approval fails because the CSR username does not match the node’s current name | Node hostname and CSR subject |
Quick checks
# Check kubelet client certificate expiry directly on the node
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
# Check serving certificate expiry (self-signed default; with serverTLSBootstrap the file is kubelet-server-current.pem)
openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -dates
# Check all kubeadm-managed control plane certificate dates
kubeadm certs check-expiration
# List pending or failed certificate signing requests
kubectl get csr
# Check kubelet logs for certificate or TLS errors
journalctl -u kubelet --since "1 day ago" | grep -iE "certificate|tls|unauthorized"
# Check API server 401 rate from kubelets (control plane metric)
kubectl get --raw /metrics | grep "apiserver_request_total" | grep 'code="401"'
# Check kubelet certificate TTL metric (if the metrics endpoint is reachable and authorized)
curl -sk https://localhost:10250/metrics | grep kubelet_certificate_manager_client_ttl_seconds
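A valid certificate shows notAfter in the future. If the date is in the past, the certificate has expired; if it is within the next seven days, rotation has most likely failed, because a healthy kubelet renews well before that point. A CSR that has not been approved appears in kubectl get csr with the condition Pending.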
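The interpretation rule above can be scripted so that it returns a single word per certificate. This is a minimal sketch, assuming GNU date and the notAfter=... line format that openssl prints; expiry_state is a hypothetical helper name, and the seven-day threshold is the one used in this guide.

```bash
#!/usr/bin/env bash
# Sketch: classify a certificate from the "notAfter=..." line printed by
# `openssl x509 -noout -enddate`. Requires GNU date (for the -d flag).

# expiry_state NOTAFTER_LINE [NOW_EPOCH]
# Prints: expired | rotation-suspect | ok
expiry_state() {
  local end_epoch now left
  end_epoch=$(date -u -d "${1#notAfter=}" +%s) || return 1
  now=${2:-$(date -u +%s)}          # optional override, handy for testing
  left=$(( end_epoch - now ))
  if   [ "$left" -le 0 ];      then echo expired
  elif [ "$left" -le 604800 ]; then echo rotation-suspect   # 7 days
  else                              echo ok
  fi
}
```

On a node it could be called as expiry_state "$(openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate)".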
How to diagnose it
1. Confirm the symptom is TLS-related. If the node is NotReady but the kubelet process is running, check journalctl -u kubelet for “certificate has expired” or “x509: certificate signed by unknown authority”. If kubectl exec or kubectl logs fail with TLS handshake errors while kubectl get nodes still works, suspect the serving certificate.
2. Check certificate expiration dates. On the node, run openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates. If notAfter has passed, the client certificate is expired. Run openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -dates to check the serving certificate.
3. Verify the CA is still valid. If the CA at /etc/kubernetes/pki/ca.crt has expired, kubeadm certs check-expiration emits warnings that certificate bounds cannot be validated. An expired CA is a larger event than a single expired kubelet certificate. Do not proceed with node-level recovery until the CA is regenerated.
4. Check for pending CSRs. Run kubectl get csr. Look for CSRs whose requestor is system:node:<node-name>. If a CSR exists and is Pending, the auto-approval controller is not running or the CSR does not match auto-approval criteria. If no CSR exists, the kubelet may have stopped requesting rotation.
5. Distinguish client from serving cert failure.
   - Client cert expired: node NotReady, kubelet cannot update status, node lease expires.
   - Serving cert expired: node may still show Ready, but kubectl exec, kubectl logs, and metrics-server scrapes fail with TLS errors.
6. Check API server authentication metrics. On the control plane, a spike in apiserver_request_total{code="401"} that coincides with the node going NotReady is consistent with the kubelet presenting an expired or invalid client certificate. The metric does not identify the client, so use the API server logs to attribute the failures to a specific node.
7. Check kubelet feature gates and config. Verify that /var/lib/kubelet/config.yaml contains rotateCertificates: true. If it is false, automatic rotation is disabled.
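The configuration check at the end of the diagnosis can be scripted so that it runs the same way on every node. The following is a rough sketch, not a YAML parser: it only reads top-level key: value lines, and kubelet_setting is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: read a top-level scalar setting from the kubelet config file.
# Limitation: plain "key: value" lines only; no nested keys, no comments.

# kubelet_setting KEY [CONFIG_FILE]
# Prints the value, or "unset" when the key is absent.
kubelet_setting() {
  local key=$1 file=${2:-/var/lib/kubelet/config.yaml} value
  value=$(awk -F': *' -v k="$key" '$1 == k { print $2; exit }' "$file")
  echo "${value:-unset}"
}
```

For example, kubelet_setting rotateCertificates and kubelet_setting serverTLSBootstrap report the two settings that matter for this guide. A result of unset means the kubelet's built-in default applies, and the setting may still be overridden by a command-line flag.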
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| kubelet_certificate_manager_client_ttl_seconds | Seconds until the client certificate expires. | Value trending toward zero or below 7 days. |
| kubelet_certificate_manager_server_ttl_seconds | Seconds until the serving certificate expires. | Value trending toward zero; breaks metrics-server before NotReady. |
| kubelet_certificate_manager_client_expiration_renew_errors | Counter of failed renewals; rotation is failing silently if this increments. | Any increase over a 5-minute window. |
| Node Ready condition | NotReady is the visible symptom of client cert expiry. | Ready=False or Unknown for > 1 minute. |
| API server 401 rate | Expired client certs cause mass authentication failures. | apiserver_request_total{code="401"} rising at the same time as nodes go NotReady. |
| Pending CSRs | Unapproved CSRs block rotation. | Any CSR from system:node:* in Pending for > 10 minutes. |
| Cluster CA expiry | If the CA expires, all dependent certs break. | CA notAfter within 30 days. |
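For a quick manual reading of the TTL gauges in the table, the Prometheus text output can be converted to whole days. This is a minimal sketch that assumes the metric is exposed as an unlabelled gauge on a single line; cert_ttl_days is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: convert a certificate TTL gauge (seconds) to whole days.
# Reads Prometheus text exposition format on stdin.

# cert_ttl_days METRIC_NAME
# Prints the TTL in days; returns non-zero if the metric is not present.
cert_ttl_days() {
  awk -v m="$1" '
    $1 == m { printf "%d\n", $2 / 86400; found = 1 }
    END     { if (!found) exit 1 }
  '
}
```

It can be fed from the kubelet metrics endpoint, for example curl -sk https://localhost:10250/metrics | cert_ttl_days kubelet_certificate_manager_client_ttl_seconds, provided the request is authorized. A non-zero exit status means the metric was missing from the output.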
Fixes
If the client certificate is expired but the CA is valid
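This is the most common scenario. The kubelet can re-bootstrap using a bootstrap token, provided /etc/kubernetes/bootstrap-kubelet.conf exists and holds a token that is still valid. kubeadm bootstrap tokens expire after 24 hours by default, and kubeadm normally removes bootstrap-kubelet.conf once a node has joined, so on a long-running node you will usually need to create a fresh token (kubeadm token create) and restore that file first.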
- On the affected node, back up then remove the stale kubelet PKI files:
# WARNING: Destructive. This wipes kubelet client certificates and forces re-bootstrap.
# Ensure /etc/kubernetes/bootstrap-kubelet.conf contains a valid token before proceeding.
sudo cp -a /var/lib/kubelet/pki /var/lib/kubelet/pki.bak.$(date +%s)
sudo rm /var/lib/kubelet/pki/kubelet-client-*
- Restart the kubelet to trigger bootstrap from the bootstrap-kubeconfig:
# Disruptive: restarts the kubelet process.
sudo systemctl restart kubelet
- Approve the new CSR if it does not auto-approve:
kubectl certificate approve <csr-name>
- Verify the node returns to Ready and the new certificate has a future notAfter date.
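Because the removal step is destructive, a pre-flight check before it is worth having. The sketch below only confirms that the bootstrap kubeconfig is readable and contains something shaped like a bootstrap token (six characters, a dot, sixteen characters). It cannot tell whether the token still exists or has expired; confirm that on the control plane with kubeadm token list. The name has_bootstrap_token is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: refuse to wipe kubelet client certs unless a bootstrap token is
# present. Checks token FORMAT only, not validity on the API server.

# has_bootstrap_token [KUBECONFIG_FILE]
# Returns 0 if the file is readable and holds a token-shaped value.
has_bootstrap_token() {
  local file=${1:-/etc/kubernetes/bootstrap-kubelet.conf}
  [ -r "$file" ] || return 1
  grep -Eq 'token:[[:space:]]*"?[a-z0-9]{6}\.[a-z0-9]{16}"?[[:space:]]*$' "$file"
}
```

A cautious wrapper would run has_bootstrap_token || exit 1 before the backup and rm commands, so that a node without usable bootstrap credentials is left untouched.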
If the serving certificate is expired
The serving certificate is used for kubectl exec, kubectl logs, and metrics-server scraping. If the client cert is still valid but the serving cert has expired:
1. Check whether serverTLSBootstrap: true is set in the kubelet configuration. If it is false (the default), the kubelet self-signs the serving certificate. If it is true, serving certificate rotation requires a custom CSR approver or manual approval because the built-in controller does not auto-approve serving CSRs.
2. If serverTLSBootstrap: true and a pending serving CSR exists, manually approve it. If the certificate is self-signed, restart the kubelet to force regeneration. The kubelet typically generates a new self-signed pair only when kubelet.crt and kubelet.key are missing, so back up the expired files and move them aside before restarting.
3. Restart the kubelet after replacing the serving certificate files. The kubelet generally does not hot-reload serving certificate files that change on disk.
If the cluster CA is expired
If /etc/kubernetes/pki/ca.crt has expired, kubeadm certs check-expiration warns that bounds cannot be validated. The entire trust chain is broken. You must regenerate the cluster CA, regenerate all dependent certificates (API server, kubelet client certs, etcd certs), distribute the new CA to all nodes, and restart all control plane components and kubelets.
This is a multi-step manual process. Do not attempt node-level kubelet bootstrap until the CA is valid again.
If CSR auto-approval is failing
If CSRs are pending and the CA is valid:
- Verify the kube-controller-manager is running and has elected a leader.
- Check the kubelet logs to confirm the CSR was submitted with the expected username (system:node:<nodeName> for renewals, system:bootstrap:<token-id> for initial bootstrap). A hostname change blocks auto-approval.
- If the controller-manager is healthy but CSRs remain pending, manually approve the CSR as an emergency fix.
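When many nodes are affected, it helps to list only the CSRs that matter. The filter below is a sketch that assumes the default tabular output of kubectl get csr, with the condition in the last column and the requestor somewhere on the line; pending_node_csrs is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: pick out pending kubelet CSRs from `kubectl get csr` output.
# Assumes the default table layout (header row, CONDITION as last column).

# pending_node_csrs  (reads `kubectl get csr` output on stdin)
# Prints the names of Pending CSRs requested by nodes or bootstrap tokens.
pending_node_csrs() {
  awk 'NR > 1 && $NF == "Pending" && /system:(node|bootstrap):/ { print $1 }'
}
```

Run it as kubectl get csr | pending_node_csrs, then inspect each result with kubectl describe csr before approving it. Approving every pending request blindly defeats the purpose of the approval step.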
Prevention
- Monitor kubelet_certificate_manager_client_ttl_seconds and kubelet_certificate_manager_server_ttl_seconds with alerts at 30 days, 7 days, and 1 day.
- Do not rely solely on kubeadm certs check-expiration for kubelet health. It does not monitor the kubelet-managed client certificate under /var/lib/kubelet/pki.
- Verify that rotateCertificates: true is set in every kubelet’s configuration and that the CSR approver is functional after every control plane upgrade.
- Verify that bootstrap tokens remain valid if you rely on kubeadm-managed node bootstrapping.
- Ensure NTP is synchronized across all nodes and control plane hosts to prevent clock skew from causing premature certificate validation failures.
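The 30-day, 7-day, and 1-day alert thresholds can be expressed as a small mapping from remaining TTL to alert tier, which is useful in an ad hoc script or cron check where no alerting rules exist yet. This is a sketch with a made-up function name, ttl_alert_tier, and it expects the TTL as a whole number of seconds.

```bash
#!/usr/bin/env bash
# Sketch: map a certificate TTL (integer seconds) to an alert tier using
# the 30-day / 7-day / 1-day thresholds suggested in this guide.

# ttl_alert_tier TTL_SECONDS
# Prints: expired | 1d | 7d | 30d | none
ttl_alert_tier() {
  local ttl=$1 day=86400
  if   [ "$ttl" -le 0 ];                then echo expired
  elif [ "$ttl" -le "$day" ];           then echo 1d
  elif [ "$ttl" -le $(( 7 * day )) ];   then echo 7d
  elif [ "$ttl" -le $(( 30 * day )) ];  then echo 30d
  else                                       echo none
  fi
}
```

Checking the tightest threshold first means each certificate lands in exactly one tier, so a script can page on 1d and expired while only opening a ticket for 30d.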
How Netdata helps
Netdata collects the signals that reveal certificate degradation before it becomes an outage:
- Correlates node NotReady transitions with kubelet certificate TTL metrics on the same timeline.
- Charts apiserver_request_total{code="401"} spikes alongside affected node names to identify mass certificate failures.
- Tracks kubelet certificate manager renew errors to catch silent rotation failures.
- Monitors control plane certificate expiration via kubeadm certs checks or direct file inspection where available.
- Displays node-level kubelet metrics and API server authentication error rates together to speed root cause identification.
Related guides
- See Kubernetes kubelet not responding: PLEG, runtime, and certificate issues for broader kubelet failure modes.
- See Kubernetes node DiskPressure: detection, eviction, and recovery for node-level resource pressure that can accompany certificate incidents.
- See Kubernetes monitoring checklist: the signals every production cluster needs for baseline cluster monitoring.
- See Kubernetes API server slow or unresponsive: causes and fixes if API server latency is blocking CSR approval.
- See Kubernetes eviction cascade: when one node failure takes down the cluster for managing workload impact during node recovery.
flowchart TD
A[Node NotReady or TLS errors] --> B{Check client cert expiry}
B -->|Expired or < 7 days| C{Check serving cert}
B -->|Valid| D[Investigate API server or network]
C -->|Expired| E{Check CA expiry}
C -->|Valid| F[Check CSR approval and controller-manager]
E -->|CA expired| G[Regenerate CA and all cluster certs]
E -->|CA valid| H[Recover via bootstrap token and restart kubelet]
F -->|Pending CSRs| I[Approve CSRs or fix controller-manager]
F -->|No CSRs| J[Force rotation by wiping /var/lib/kubelet/pki and restarting]





