Kubernetes StatefulSet pod not ready: ordering, PVCs, and recovery

A StatefulSet pod stuck at 0/1 Ready, Pending, or Unknown blocks the entire chain when the default OrderedReady policy is in effect. The controller creates pods sequentially from ordinal 0 to N-1, so one failure halts all higher ordinals. A volumeClaimTemplate adds dependencies on storage binding, node affinity, and CSI attachment that can outlive the pod. Force-deleting a pod risks violating at-most-one semantics and causing split-brain in quorum-sensitive workloads.

This guide shows how to distinguish ordering blockers from storage lifecycle problems, run safe diagnostic commands, and recover without orphaning data or duplicating pod identities.

What this means

A StatefulSet guarantees at most one pod with a given identity runs in the cluster at any time. With the default OrderedReady podManagementPolicy, the controller creates pods in ordinal order and waits for each to reach Running and Ready before launching the next. Termination proceeds in reverse. While pod-0 is not Ready, the controller will not create or replace pod-1 or pod-2 and will not continue a rolling update.
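The ordering rule can be checked from the command line. The following is a minimal sketch, assuming the pods carry an `app=<statefulset-name>` label as in the commands used throughout this guide; the function name `first_blocked_pod` is hypothetical, not part of kubectl.

```shell
# Print the lowest-ordinal pod that is not Running and Ready,
# or print nothing if every pod is healthy.
# Assumption: pods are labeled app=<statefulset-name>; adjust the selector to your labels.
first_blocked_pod() {
  kubectl get pods -l "app=$1" --no-headers \
    -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,READY:.status.conditions[?(@.type=="Ready")].status' |
    awk '{ n = $1; sub(/.*-/, "", n); print n, $0 }' |   # prefix each row with its ordinal
    sort -n |                                             # numeric order: ordinal 2 before ordinal 10
    awk '$3 != "Running" || $4 != "True" { print $2; exit }'
}
```

Calling `first_blocked_pod <statefulset-name>` prints the name of the pod that the controller is waiting on. The numeric sort is deliberate: sorting by `.metadata.name` orders names as strings, so ordinal 10 would appear before ordinal 2 once a set has ten or more replicas.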

Each replica receives its own PVC from the volumeClaimTemplate. By default the StatefulSet controller does not delete these PVCs when pods are removed or the set is scaled down, so they persist until manually deleted (the persistentVolumeClaimRetentionPolicy field can change this behavior). The PV reclaim policy governs the backing volume when the PVC is deleted, not the PVC lifecycle. The PV may carry node-affinity constraints that pin it to a specific node, or a claimRef that binds it exclusively to a PVC. If the replacement pod cannot land on the same node, the volume must detach and reattach through the CSI driver, the attach-detach controller, and the VolumeAttachment object.
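Because each PVC is tied to one ordinal, it helps to know which claim names to expect. PVCs created from a volumeClaimTemplate are named `<template-name>-<statefulset-name>-<ordinal>`. A minimal sketch that lists the expected names, assuming ordinals start at 0 (the default); `expected_pvc_names` is a hypothetical helper:

```shell
# Print the PVC names a StatefulSet expects, one per ordinal.
# $1 = volumeClaimTemplate name, $2 = StatefulSet name, $3 = replica count.
# Assumption: ordinals start at 0, which is the default.
expected_pvc_names() {
  i=0
  while [ "$i" -lt "$3" ]; do
    echo "$1-$2-$i"
    i=$((i + 1))
  done
}
```

For a template named `data` on a StatefulSet named `web` with 3 replicas, `expected_pvc_names data web 3` prints `data-web-0`, `data-web-1`, and `data-web-2`. Comparing that list against `kubectl get pvc` shows which claim is missing or Pending.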

Stable network identity depends on a headless Service with clusterIP: None. Without it, or if the Service selector drifts, pod hostnames will not resolve and application bootstrap may fail readiness checks.
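Each pod's stable hostname follows the pattern `<pod-name>.<service-name>.<namespace>.svc.<cluster-domain>`, where the Service is the headless Service named in the StatefulSet's serviceName field. A small sketch for building the name to test from a debug pod; `sts_pod_fqdn` is a hypothetical helper, and `cluster.local` is only the common default cluster domain, so treat it as an assumption:

```shell
# Build the DNS name a StatefulSet pod should resolve to.
# $1 = pod name, $2 = headless Service name, $3 = namespace,
# $4 = cluster domain (optional; cluster.local is assumed when omitted).
sts_pod_fqdn() {
  echo "$1.$2.$3.svc.${4:-cluster.local}"
}
```

For example, `nslookup "$(sts_pod_fqdn web-0 web default)"` run inside the cluster queries `web-0.web.default.svc.cluster.local`. An NXDOMAIN answer for a pod that exists points at the Service name, selector, or publishNotReadyAddresses setting.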

Common causes

Cause	What it looks like	First thing to check
Ordered startup blocking	Pod N is Pending or NotReady while a lower ordinal is not Ready	`kubectl get pods -l app=<statefulset-name>` and find the first non-Ready ordinal
PVC binding failure	Pod stays Pending with FailedScheduling or volume events	`kubectl get pvc` for the claim; look for Pending or provisioning errors
Volume attachment hang	Replacement pod Pending; volume still attached to a dead or old node	`kubectl get volumeattachments` for the PV
PV node affinity conflict	Replacement cannot schedule; PV node affinity pins volume to the original node	`kubectl get pv <pv-name> -o yaml` for node affinity and claimRef
Headless service mismatch	CoreDNS returns NXDOMAIN for pod hostnames; readiness probes fail during bootstrap	`kubectl get svc <headless-service> -o yaml` for selector and publishNotReadyAddresses
Force-delete aftermath	Pod phase Unknown, volume stuck in Attaching, risk of duplicate identity	`kubectl get pod <pod-name> -o yaml` for phase and remaining finalizers

Quick checks

# Pod status and ordinals
kubectl get pods -l app=<statefulset-name> --sort-by=.metadata.name

# PVC status
kubectl get pvc -l app=<statefulset-name>

# Volume attachments
kubectl get volumeattachments

# Headless service selector match
kubectl get svc <headless-service> -o yaml | grep -A5 selector

# Recent events
kubectl describe pod <pod-name>
kubectl describe pvc <pvc-name>

# Node health
kubectl get node <node-name>

# PV constraints
kubectl get pv <pv-name> -o yaml

How to diagnose it

Find the blocked ordinal. Run kubectl get pods -l app=<statefulset-name> --sort-by=.metadata.name. Identify the lowest-indexed pod that is not Running and Ready. With OrderedReady, the controller will not create or update higher ordinals until this pod is healthy.
Determine whether the problem is scheduling or runtime. Run kubectl describe pod <pod-name> and read Events. If the pod is Pending with Unschedulable or volume-related messages, the issue is upstream of the container. If the pod is Running but not Ready, the issue is likely probes, DNS, or application startup.
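Inspect PVC binding. Run kubectl get pvc <claim-name>. If the phase is Pending, check the StorageClass provisioner, resource quotas, and cloud provider volume limits. With a StorageClass that uses volumeBindingMode: WaitForFirstConsumer, a Pending PVC is expected until the pod is scheduled, so read the pod's scheduling events first. If a PVC was deleted and the replacement pod is Pending because the claim is missing, delete the pod so the controller recreates the pod and PVC together.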
Verify volume attachment state. Run kubectl get volumeattachments. If a VolumeAttachment object still references the old node and the pod was force-deleted after node failure, the CSI attacher may not have detached the volume. The replacement pod will stay Pending because the volume is still exclusively attached elsewhere.
Check the headless Service. Run kubectl get svc <headless-service> -o yaml. Confirm clusterIP: None, that the selector matches the StatefulSet pod labels exactly, and that publishNotReadyAddresses is set to true if your application requires DNS resolution before passing readiness.
Review PV node affinity and claimRef. Run kubectl get pv <pv-name> -o yaml. If the PV is Released and has a claimRef pointing to a deleted PVC, a new PVC cannot bind to it. If the PVC itself is stuck Terminating, a finalizer such as kubernetes.io/pvc-protection is holding it, usually because a pod still references the claim. Resolve that on the PVC instead of patching the PV.
Evaluate finalizers before forcing deletion. Run kubectl get pod <pod-name> -o jsonpath='{.metadata.finalizers}'. If finalizers are present, understand what they protect. Force-deleting a StatefulSet pod can violate the at-most-one identity guarantee. Only force-delete after confirming the old pod is truly gone and will never contact peers.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
StatefulSet ready replicas vs desired	Direct measure of workload health	Ready replicas less than desired for more than 5 minutes
Pod phase by ordinal	OrderedReady blocks on the first non-Ready pod	A higher ordinal is Pending while a lower one is not Ready
PVC phase	Unbound PVC prevents pod scheduling	PVC in Pending phase for more than 5 minutes
VolumeAttachment objects	Stuck attachments block rescheduling after node failure	VolumeAttachment exists for a pod on a terminated or NotReady node
Scheduler pending pods	Indicates capacity or constraint saturation	Pending pods increasing while the scheduler is healthy
API server mutating request latency	Slow admission or etcd delays StatefulSet reconciliation	p99 latency greater than 1 second sustained
etcd disk WAL fsync latency	Slow etcd cascades into all control plane writes	p99 fsync latency greater than 100 ms sustained
Node Ready condition	A dead node can strand attached volumes	Node condition Unknown or NotReady for more than 1 minute
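
The first signal in the table can be read directly from the StatefulSet object. A minimal point-in-time sketch (it does not implement the 5-minute window, which belongs in your alerting system); `sts_ready_gap` is a hypothetical helper:

```shell
# Print "<ready>/<desired>" for StatefulSet $1.
# Returns 0 when ready >= desired, 1 when replicas are missing, 2 if kubectl fails.
sts_ready_gap() {
  out=$(kubectl get statefulset "$1" -o jsonpath='{.status.readyReplicas}/{.spec.replicas}') || return 2
  ready=${out%/*}
  desired=${out#*/}
  ready=${ready:-0}   # readyReplicas is omitted from the object when it is zero
  echo "$ready/$desired"
  [ "$ready" -ge "$desired" ]
}
```

Running `sts_ready_gap <statefulset-name>` in a loop or a cron job gives a quick health check; the empty-value handling matters because a fully broken StatefulSet reports no readyReplicas field at all rather than a zero.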

Fixes

If the cause is ordered startup blocking

Do not force-delete higher ordinals to skip over a blocked pod. Fix the lowest unhealthy ordinal first. Switching to Parallel is not a quick workaround, even when the blocked pod sits on a failed node and cannot recover: podManagementPolicy cannot be changed on an existing StatefulSet, so the switch requires deleting and recreating the StatefulSet, which is disruptive and removes startup ordering guarantees.

If the cause is PVC binding or provisioning failure

Check the StorageClass and provisioner logs. If the PVC is missing and the pod is Pending, deleting the pod causes the StatefulSet controller to recreate the pod and PVC from the template. This is destructive: the new PVC will provision a fresh volume unless the reclaim policy and storage backend are specifically configured to reuse the existing PV. Verify backups before proceeding.

If the cause is a volume stuck attached to a dead node

After confirming the node is truly gone and the pod will not resume, identify the VolumeAttachment object referencing the PV and old node, then delete it:

kubectl delete volumeattachment <volumeattachment-name>

This is disruptive and should only be done when the old pod is confirmed terminated. The replacement pod can then trigger a new attachment on its scheduled node.
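
To identify which VolumeAttachment to inspect, the list can be filtered by PV name. A minimal sketch; `volumeattachments_for_pv` is a hypothetical helper that only reads state and deletes nothing:

```shell
# Print "<attachment-name> <node> <attached>" for every VolumeAttachment
# that references the PV named in $1.
volumeattachments_for_pv() {
  kubectl get volumeattachments --no-headers \
    -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached' |
    awk -v pv="$1" '$2 == pv { print $1, $3, $4 }'
}
```

If the node printed for the PV is the failed node rather than the node where the replacement pod is scheduled, that attachment is the one holding the volume.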

If the cause is a PVC stuck Terminating with Retain policy

If the PVC is stuck in Terminating because of a protecting finalizer, first confirm that no pod still mounts the claim, then remove the finalizer:

kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'

If the PVC was deleted and the PV is Released with a lingering claimRef, clear the claimRef so a new PVC can bind:

# Destructive to binding state; ensure the old PVC is fully deleted
kubectl patch pv <pv-name> -p '{"spec":{"claimRef":null}}'
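
Before clearing a claimRef, it is worth confirming that the reference really is stale. The sketch below compares the UID recorded in the PV's claimRef with the UID of the PVC that currently has that name, if any. `pv_has_stale_claim` is a hypothetical helper, and it is only a read-only pre-check, not a substitute for verifying backups:

```shell
# Return 0 if PV $1 is Released and its claimRef points at a PVC that
# no longer exists or has been recreated with a different UID.
# Return 1 otherwise, 2 if the PV cannot be read.
pv_has_stale_claim() {
  state=$(kubectl get pv "$1" \
    -o jsonpath='{.status.phase} {.spec.claimRef.namespace} {.spec.claimRef.name} {.spec.claimRef.uid}') || return 2
  set -- $state   # $1 = phase, $2 = claim namespace, $3 = claim name, $4 = claim UID
  [ "$1" = "Released" ] || return 1
  # Empty when the PVC does not exist, which also counts as stale.
  live_uid=$(kubectl get pvc "$3" -n "$2" -o jsonpath='{.metadata.uid}' 2>/dev/null)
  [ "$live_uid" != "$4" ]
}
```

A typical use is `pv_has_stale_claim <pv-name> && echo "claimRef is stale"`. The UID comparison matters because the StatefulSet controller may already have recreated a PVC with the same name, which would make a name-only check misleading.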

If the cause is headless service DNS failure

Correct the Service selector to match the StatefulSet pod labels exactly. If the application bootstraps using hostnames before passing readiness probes, set publishNotReadyAddresses: true on the headless Service.
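
One way to apply the publishNotReadyAddresses change without editing the whole manifest is a merge patch. A minimal sketch; the wrapper name `enable_publish_not_ready` is hypothetical, and if the Service is managed by Helm or GitOps the change should be made in the source manifest instead, or it will be reverted:

```shell
# Set publishNotReadyAddresses: true on the headless Service named in $1.
# The merge patch touches only that one field.
enable_publish_not_ready() {
  kubectl patch service "$1" --type merge \
    -p '{"spec":{"publishNotReadyAddresses":true}}'
}
```

After the patch, pod hostnames should resolve even while the pods are failing readiness, which lets peers discover each other during bootstrap.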

If force deletion is the only remaining option

Use the force-delete procedure only as a last resort:

kubectl delete pod <pod-name> --grace-period=0 --force

If the pod remains in Unknown phase, patch finalizers to allow removal:

kubectl patch pod <pod-name> -p '{"metadata":{"finalizers":null}}'

This carries a real risk of violating at-most-one semantics. Do not force-delete unless you are certain the old pod process is gone and will not rejoin the StatefulSet membership.
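
A pre-check can make that certainty less dependent on memory. The sketch below refuses to proceed when the pod's node still reports Ready, because the kubelet there may still be running the container. `node_allows_force_delete` is a hypothetical helper; a NotReady or missing node is necessary but not sufficient evidence, so confirming that the machine is actually powered off remains a manual step:

```shell
# Return 0 if pod $1 is unscheduled or its node is NotReady, Unknown, or gone.
# Return 1 if the node is Ready, 2 if the pod cannot be read.
node_allows_force_delete() {
  node=$(kubectl get pod "$1" -o jsonpath='{.spec.nodeName}') || return 2
  if [ -z "$node" ]; then
    echo "pod $1 was never scheduled, so no container is running"
    return 0
  fi
  status=$(kubectl get node "$node" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
  case "$status" in
    True)
      echo "node $node is Ready: the kubelet may still be running $1"
      return 1 ;;
    *)
      echo "node $node Ready=${status:-missing}: confirm the machine is off before force-deleting"
      return 0 ;;
  esac
}
```

It can gate the command above, for example `node_allows_force_delete <pod-name> && kubectl delete pod <pod-name> --grace-period=0 --force`.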

Prevention

Use podManagementPolicy: Parallel only if the application tolerates simultaneous startup without ordered initialization. Remember that this field is immutable.
Set terminationGracePeriodSeconds high enough for graceful shutdown and CSI detachment.
Monitor PVC binding and VolumeAttachment lifecycle as first-class operational signals.
Create the headless Service before the StatefulSet, and treat selector changes as breaking.
Establish a runbook that checks node status and VolumeAttachment state before allowing force-deletion of a StatefulSet pod.
Define PodDisruptionBudgets for StatefulSets so that voluntary disruptions such as node drains cannot evict many replicas at once and trigger reattachment storms.

How Netdata helps

Correlate node NotReady transitions with kubelet PLEG latency and disk pressure to catch node failures before they strand volumes.
Track API server mutating request latency and etcd WAL fsync latency to detect control plane slowdowns that delay StatefulSet reconciliation.
Monitor conntrack utilization and drops to catch pod-to-pod connection failures early.
Alert on PVC capacity and volume stats to catch storage pressure before binding fails.
Visualize per-node resource saturation to distinguish scheduling constraints from application-level crashes.

Related guides

For volume and CSI-specific failures, see Kubernetes CSI driver failures.
For rollout and replica readiness concepts, see Kubernetes Deployment rollout stuck.
For DNS issues affecting headless Services, see Kubernetes DNS resolution failures inside pods.
For network-level connection drops, see Kubernetes conntrack exhaustion.
For control plane latency affecting all workload operations, see Kubernetes API server etcd latency.

flowchart TD
    A[StatefulSet pod N not ready] --> B{Is pod N-1 ready?}
    B -->|No| C[Ordered startup blocked<br/>Fix lower ordinal first]
    B -->|Yes| D{Is pod pending?}
    D -->|Yes| E{Is PVC bound?}
    E -->|No| F[PVC binding or provisioning failure]
    E -->|Yes| G{VolumeAttachment on old node?}
    G -->|Yes| H[Volume stuck attaching<br/>Clear VolumeAttachment]
    G -->|No| I[Scheduler or node constraint]
    D -->|No| J[Container or runtime issue<br/>Check logs and probes]

The Netdata solution

Kubernetes monitoring with Netdata

Netdata monitors Kubernetes with per-second metrics across the control plane, nodes, and every pod, with ML anomaly detection and zero per-pod configuration. Correlate API-server and etcd latency, kubelet PLEG stalls, scheduling pressure, and OOMKills in one place.
