Kubernetes API server slow or unresponsive: causes and fixes

When kubectl hangs, controllers log "context deadline exceeded", and deployments stall, the Kubernetes API server is usually the bottleneck. It is the single funnel for every read and write to cluster state. Slowness propagates to scheduling, pod lifecycle, service discovery, and external automation.

This article covers operational causes and gives a step-by-step diagnostic flow to run during an incident. Use it to distinguish etcd latency, admission webhook stalls, request saturation, and memory pressure.

What this means

The API server is a stateless HTTP front-end to etcd. Every request passes through authentication, authorization, admission control, and then storage. Slowness means one of these stages is blocked. Unresponsiveness means the process is OOM-killed, deadlocked, or unable to reach its backing store.
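
To see which stage is adding the latency, compare the overall request duration with the admission and storage stage metrics the API server already exports. A minimal sketch; the metric names below are standard in recent kube-apiserver releases, but check your version's /metrics output if a grep comes back empty:

# End-to-end server-side latency, by verb and resource
kubectl get --raw '/metrics' | grep '^apiserver_request_duration_seconds_sum'
# Time spent synchronously in admission webhooks
kubectl get --raw '/metrics' | grep '^apiserver_admission_webhook_admission_duration_seconds_sum'
# Time the API server spends waiting on etcd
kubectl get --raw '/metrics' | grep '^etcd_request_duration_seconds_sum'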

In practice, you see three symptom classes:

  • Elevated latency: kubectl commands take seconds, controller reconciliation lags, and scheduling delays grow.
  • Saturation: the API server returns 429 (Too Many Requests) as inflight limits or API Priority and Fairness (APF) queues fill.
  • Outright failure: the process crashes, restarts, or stops passing /livez, causing all cluster operations to halt.

Common causes

Cause | What it looks like | First thing to check
etcd disk latency | Mutating requests slow; /readyz/etcd fails; WAL fsync p99 above 100 ms | etcd_disk_wal_fsync_duration_seconds
Admission webhook timeout | Mutating latency spikes for specific resources; latency plateaus at the webhook timeout value | apiserver_admission_webhook_admission_duration_seconds
Inflight / APF saturation | 429 errors; APF queues full; all controllers lagging | apiserver_current_inflight_requests and APF queue depth
Memory pressure / OOM | Process restarts; LIST latency spikes after restart; RSS near container limit | process_resident_memory_bytes vs limit
Re-list storm | LIST rate spikes; CPU and memory burst; watches reconnecting en masse | apiserver_request_total{verb="LIST"}
Certificate expiry | Sudden 401s; nodes NotReady; TLS handshake errors | kubeadm certs check-expiration

Quick checks

These checks are read-only and safe to run during an active incident.

# Check if the API server process is alive and ready
kubectl get --raw '/livez?verbose'
kubectl get --raw '/readyz?verbose'
# Check etcd cluster health from a control plane node
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health
# Check per-webhook admission latency
kubectl get --raw '/metrics' | grep ^apiserver_admission_webhook_admission_duration_seconds
# Check inflight requests and APF queue depth
kubectl get --raw '/metrics' | grep ^apiserver_current_inflight_requests
kubectl get --raw '/metrics' | grep ^apiserver_flowcontrol_current_inqueue_requests
# Check API server memory and container restarts on the node
crictl ps --name kube-apiserver -q | xargs crictl stats
# Check 5xx and 429 error rates
kubectl get --raw '/metrics' | grep 'apiserver_request_total.*code="5'
kubectl get --raw '/metrics' | grep 'apiserver_request_total.*code="429"'
# Check control plane certificate expiry
kubeadm certs check-expiration
# Look for a LIST rate spike that indicates a re-list storm
kubectl get --raw '/metrics' | grep 'apiserver_request_total.*verb="LIST"'

How to diagnose it

Follow this flow to isolate the root cause.

  1. Scope the impact. In an HA deployment, check whether one instance or all instances are affected. Bypass the load balancer and call /readyz?verbose on each API server directly. If only one instance is degraded, remove it from rotation and investigate locally.
  2. Distinguish hung from overloaded. If /livez fails, the process is likely OOM-killed or deadlocked. Check container restart counts and kernel OOM logs (dmesg | grep -i oom). If /livez passes but /readyz fails, inspect the specific sub-checks (often etcd or poststarthook).
  3. Check etcd WAL fsync latency. Query the etcd metrics endpoint for etcd_disk_wal_fsync_duration_seconds. If p99 is above 100 ms sustained, disk I/O is the root cause. Every etcd write blocks on this fsync, so mutating API latency cannot be lower than this value.
  4. Check admission webhook latency. If apiserver_admission_webhook_admission_duration_seconds is elevated and correlates with total mutating latency, identify the specific webhook by name. Check its Deployment endpoints, pod readiness, and recent logs. A single slow webhook with failurePolicy: Fail can freeze all mutations for matched resources.
  5. Check for request saturation. If apiserver_current_inflight_requests is near the configured limit (default 400 read-only, 200 mutating) or APF queue depth is growing, find the noisy client. Attribute the load by APF flow schema, and by user and user agent from the audit log, to identify runaway controllers or CI/CD pipelines (see the sketch after this list).
  6. Check memory and OOM patterns. If the API server container has restarted and memory was near the limit beforehand, you are likely in an OOM/re-list cycle. The replacement instance starts with cold caches; all clients re-list simultaneously, causing a memory spike that triggers another OOM. Check process_resident_memory_bytes trends.
  7. Check auth errors and certificates. A sudden spike in 401s from known components suggests certificate expiry or service account token rotation failure. Use kubeadm certs check-expiration or openssl x509 -in on the relevant cert files.
  8. Correlate by verb and resource. High LIST latency with normal GET latency points to large un-paginated list calls or a re-list storm. High mutating latency with normal reads points to etcd or webhooks. High latency across all verbs with normal etcd and webhooks points to CPU or memory pressure.
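
A minimal sketch for step 5, attributing load to a specific client. It assumes API Priority and Fairness is enabled (the default on supported versions) and, for the last command, that audit logging is configured; /var/log/kubernetes/audit.log is an example path, so substitute your --audit-log-path:

# Requests dispatched per APF flow schema; a runaway client usually dominates one schema
kubectl get --raw '/metrics' | grep '^apiserver_flowcontrol_dispatched_requests_total'
# Requests APF has rejected, by flow schema and reason
kubectl get --raw '/metrics' | grep '^apiserver_flowcontrol_rejected_requests_total'
# If audit logging is enabled, count requests by user agent (example path)
sudo grep -o '"userAgent":"[^"]*"' /var/log/kubernetes/audit.log | sort | uniq -c | sort -rn | head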

Metrics and signals to monitor

Signal | Why it matters | Warning sign
apiserver_request_duration_seconds p99 | Primary server-side latency indicator | Mutating p99 above 1 s sustained; LIST p99 above 5 s
etcd_disk_wal_fsync_duration_seconds p99 | Root cause of most write latency | p99 above 100 ms
apiserver_admission_webhook_admission_duration_seconds | Adds synchronous latency to every mutation | p99 above 200 ms for any single webhook
apiserver_current_inflight_requests | Measures global request saturation | Above 80% of configured limit
apiserver_flowcontrol_current_inqueue_requests | Early warning before APF rejects traffic | Queue depth above 0 for system or leader-election priority levels
process_resident_memory_bytes | Memory pressure leads to OOM and re-list cascades | Above 80% of container limit
apiserver_request_total{code="429"} | Confirms APF or inflight rejection | Sustained rate above 5% of total requests
apiserver_storage_objects | Object growth drives cache size and etcd storage | Any resource type growing unboundedly

Fixes

If the cause is etcd latency

Run etcdctl check perf to validate disk performance against etcd requirements. Warning: this command writes load-test data. Do not run it on a production etcd cluster during an active incident.

If the disk is shared with other workloads, move etcd data to a dedicated SSD or NVMe volume. Check etcd_server_leader_changes_seen_total. If leader changes are increasing, the disk is too slow for the default Raft heartbeat interval. Schedule compaction and defragmentation during maintenance windows, one member at a time.
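
The follow-up checks as commands, assuming a kubeadm-style layout with etcd on 127.0.0.1:2379 and certificates under /etc/kubernetes/pki/etcd, as in the quick checks above; run the defragmentation only in a maintenance window, one member at a time:

# Leader changes climbing over time mean the disk cannot keep up with Raft heartbeats
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
  https://127.0.0.1:2379/metrics | grep etcd_server_leader_changes_seen_total
# Database size and fragmentation per member
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint status --write-out=table
# Defragment a single member, then move on to the next
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  defrag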

If the cause is admission webhooks

Identify the slow webhook from apiserver_admission_webhook_admission_duration_seconds. Check its Deployment endpoints and pod logs. If the webhook is non-critical, you can temporarily change its failurePolicy to Ignore to unblock mutations. Narrow its rules and namespaceSelector so it matches fewer requests. Scale the webhook horizontally or increase its CPU and memory if it is overloaded.
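
A minimal sketch of those steps; example-policy-webhook and example-system are placeholders for the webhook configuration and namespace you identified, and the patch assumes the slow webhook is the first entry in the list, so confirm the index with the jsonpath command first:

# Per-webhook name, failure policy, and timeout for the suspect configuration
kubectl get mutatingwebhookconfiguration example-policy-webhook \
  -o jsonpath='{range .webhooks[*]}{.name}{"\t"}{.failurePolicy}{"\t"}{.timeoutSeconds}{"\n"}{end}'
# Backing service endpoints and recent logs
kubectl -n example-system get endpoints example-policy-webhook
kubectl -n example-system logs deploy/example-policy-webhook --since=15m | tail
# Temporary mitigation for a non-critical webhook: fail open instead of blocking mutations
kubectl patch mutatingwebhookconfiguration example-policy-webhook --type=json \
  -p='[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Ignore"}]'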

If the cause is inflight or APF saturation

Find the client causing the load: break the traffic down by APF flow schema, and by user and user agent from the audit log, as shown below. If the traffic is legitimate, increase --max-requests-inflight and --max-mutating-requests-inflight only if the node has CPU and memory headroom. Tune APF PriorityLevelConfiguration concurrency shares to protect system and leader-election flows, and isolate noisy operators into dedicated flow schemas with lower priority.
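
A minimal sketch for inspecting APF state; the system and leader-election priority levels named here are Kubernetes built-ins, while any dedicated flow schema for a noisy operator is something you define yourself:

# Configured flow schemas and priority levels, with their concurrency shares
kubectl get flowschemas
kubectl get prioritylevelconfigurations
# Queueing and rejections per priority level and flow schema
kubectl get --raw '/metrics' | grep '^apiserver_flowcontrol_current_inqueue_requests'
kubectl get --raw '/metrics' | grep '^apiserver_flowcontrol_rejected_requests_total'
# The critical built-in levels should never be queuing
kubectl get --raw '/metrics' | grep '^apiserver_flowcontrol_current_inqueue_requests' | grep -E 'system|leader-election'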

If the cause is memory pressure or OOM

Increase the API server container memory limit immediately. Set GOMEMLIMIT (honored by Go 1.19+ binaries, which recent Kubernetes releases are built with) to roughly 90% of the container limit so the Go runtime collects garbage more aggressively before the kernel OOM killer fires. Review apiserver_storage_objects and remove unneeded CRDs or stale objects. Increase --watch-cache-sizes for high-churn resources only if memory allows.
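
On a kubeadm-managed control plane, GOMEMLIMIT is set through the static pod manifest; a minimal sketch assuming the default manifest path and a 4Gi memory limit (roughly 3686MiB for GOMEMLIMIT):

# Current RSS to compare against the container limit
kubectl get --raw '/metrics' | grep '^process_resident_memory_bytes'
# Edit the manifest on each control plane node; the kubelet restarts the pod when the file changes
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
#   Under the kube-apiserver container, add for example:
#     env:
#     - name: GOMEMLIMIT
#       value: "3686MiB"
#     resources:
#       limits:
#         memory: "4Gi"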

If the cause is a re-list storm

If the storm follows an API server restart, allow caches to warm up. If it is spontaneous, check for 410 Gone watch errors, which indicate that client resource versions are falling out of the watch cache. Increase the watch cache size for the affected resource type via --watch-cache-sizes. Ensure clients request watch bookmarks so reflectors can resume from a recent resourceVersion instead of falling back to a full re-list.
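
A minimal sketch for confirming a storm; example-system/example-controller is a placeholder, so grep the logs of the controllers and operators you actually run:

# LIST rate by resource; a storm shows up as a step change here
kubectl get --raw '/metrics' | grep 'apiserver_request_total.*verb="LIST"'
# Client-side evidence: reflectors re-listing after expired watches
kubectl -n example-system logs deploy/example-controller --since=30m | grep -i 'too old resource version'
# API server restarts that would have dropped the watch caches
kubectl -n kube-system get pods -l component=kube-apiserver \
  -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount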

If the cause is certificate expiry

Renew expired certificates with kubeadm certs renew all on kubeadm-managed clusters, or via your certificate management pipeline. Restart the API server and affected kubelets to load the new certificates. Verify that renewal automation covers control plane, etcd, webhook CA, and front-proxy certificates.
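
A minimal sketch for a kubeadm-managed control plane; the paths are kubeadm defaults, and other distributions keep certificates and renewal tooling elsewhere:

# Confirm which certificates are expired or close to it
kubeadm certs check-expiration
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate
# Renew everything kubeadm manages (control plane, etcd, front-proxy)
sudo kubeadm certs renew all
# Restart the API server static pod so it loads the new certificates,
# for example by briefly moving its manifest out of the static pod directory
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/ && sleep 20 && \
  sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/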

Prevention

  • Monitor etcd disk latency directly. Slow disk is the root cause of most API server write latency, yet many teams only notice it after a leader election storm.
  • Alert on webhook latency and fail-opens. A slow or ignored webhook degrades every mutating request silently. Monitor apiserver_admission_webhook_fail_open_count.
  • Size memory for burst headroom. Post-restart re-list storms can spike API server memory 2-3x above baseline. Size limits for the burst, not the steady state.
  • Review APF flow schemas quarterly. Misclassification can starve leader election and kubelet heartbeats while a runaway operator fills the catch-all queue.
  • Automate certificate expiry checks. Alert at 30 days and renew with buffer time for troubleshooting.
  • Enforce object cleanup policies. Remove completed Jobs, stale Events, and unused CRDs before they inflate etcd and watch caches beyond capacity (see the sketch after this list).
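
A minimal sketch for spotting object growth before it becomes a problem; the first command uses GNU sort's general-numeric mode because the metric values can appear in exponent notation:

# Resource types ranked by stored object count
kubectl get --raw '/metrics' | grep '^apiserver_storage_objects' | sort -t' ' -k2 -gr | head -20
# Rough counts of objects that tend to accumulate
kubectl get jobs --all-namespaces --no-headers | wc -l
kubectl get events --all-namespaces --no-headers | wc -l
# For Jobs, set spec.ttlSecondsAfterFinished so the TTL controller deletes them after completion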

How Netdata helps

  • Correlates API server request latency with etcd WAL fsync and host disk I/O latency on a single timeline, making etcd cascades obvious.
  • Surfaces APF queue depth, inflight requests, and 429 rates alongside per-verb latency so you can see saturation before clients fail.
  • Tracks API server memory and container restart counts to catch OOM cycles and memory leaks early.
  • Maps request rate spikes by verb and resource to help identify noisy neighbors and re-list storms.