Kubernetes StatefulSet pod not ready: ordering, PVCs, and recovery

A StatefulSet pod stuck at 0/1 Ready, Pending, or Unknown blocks the entire chain when the default OrderedReady policy is in effect. The controller creates pods sequentially from ordinal 0 to N-1, so one failure halts all higher ordinals. A volumeClaimTemplate adds dependencies on storage binding, node affinity, and CSI attachment that can outlive the pod. Force-deleting a pod risks violating at-most-one semantics and causing split-brain in quorum-sensitive workloads.

This guide shows how to distinguish ordering blockers from storage lifecycle problems, run safe diagnostic commands, and recover without orphaning data or duplicating pod identities.

What this means

A StatefulSet guarantees that at most one pod with a given identity runs in the cluster at any time. With the default OrderedReady podManagementPolicy, the controller creates pods in ordinal order and waits for each to be Running and Ready before launching the next. Termination proceeds in reverse order. An unhealthy pod-0 therefore blocks the creation of pod-1 and pod-2 and stalls any rolling update.
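As a rough illustration of the ordering rule, the shell sketch below picks out the pod that OrderedReady is waiting on. The function name first_blocked_pod is hypothetical, and the filter assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS first). It compares ordinals numerically because a plain name sort places web-10 before web-2.

```shell
# first_blocked_pod: hypothetical helper, not part of kubectl.
# Reads `kubectl get pods --no-headers` rows on stdin and prints the
# lowest-ordinal pod that is not both Running and fully Ready.
first_blocked_pod() {
  awk '
    {
      split($2, ready, "/")          # READY column, e.g. "0/1"
      ord = $1
      sub(/.*-/, "", ord)            # ordinal = text after the last "-"
      if ($3 != "Running" || ready[1] != ready[2]) {
        if (found == 0 || ord + 0 < best_ord) {
          found = 1; best = $1; best_ord = ord + 0
        }
      }
    }
    END { if (found) print best }
  '
}

# Usage (assumes pods carry the label app=<statefulset-name>):
#   kubectl get pods -l app=<statefulset-name> --no-headers | first_blocked_pod
```

An empty result means every listed pod is Running and Ready, so the blocker is probably not ordering.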

Each replica receives its own PVC from the volumeClaimTemplate. By default the StatefulSet controller does not delete these PVCs when pods are removed, so they persist until manually deleted unless persistentVolumeClaimRetentionPolicy is set otherwise. The PV reclaim policy governs what happens to the backing volume when the PVC is deleted; it does not control the PVC lifecycle. The PV may carry node-affinity constraints that pin it to a specific node, or a claimRef that binds it exclusively to one PVC. If the replacement pod cannot land on the same node, the volume must detach and reattach through the CSI driver, the attach-detach controller, and the VolumeAttachment object.
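Knowing which claim belongs to which replica makes the later checks easier. Kubernetes names each PVC after the volumeClaimTemplate, the StatefulSet, and the ordinal. The helper below is a hypothetical sketch of that convention, and the names data and web are placeholders.

```shell
# pvc_name: hypothetical helper that builds the claim name for one replica.
# Kubernetes names these PVCs <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>.
pvc_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Example with assumed names: template "data", StatefulSet "web", ordinal 0
pvc_name data web 0   # prints data-web-0
```

The result can be passed straight to a status check, for example kubectl get pvc "$(pvc_name data web 0)".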

Stable network identity depends on a headless Service with clusterIP: None. Without it, or if the Service selector drifts, pod hostnames will not resolve and application bootstrap may fail readiness checks.
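For reference, each pod's stable hostname is built from the pod name, the headless Service name, and the namespace. The sketch below shows that pattern. pod_fqdn is a hypothetical helper, the example names are placeholders, and cluster.local is assumed as the cluster domain, which may differ in your cluster.

```shell
# pod_fqdn: hypothetical helper that builds the per-pod DNS name served by
# the headless Service. The cluster domain defaults to cluster.local here,
# which is an assumption; clusters can be configured with a different one.
pod_fqdn() {
  pod=$1; svc=$2; ns=$3; domain=${4:-cluster.local}
  printf '%s.%s.%s.svc.%s\n' "$pod" "$svc" "$ns" "$domain"
}

pod_fqdn web-0 web-headless default   # prints web-0.web-headless.default.svc.cluster.local
```

Resolving this name from inside another pod in the cluster is a quick way to tell a DNS problem from an application problem.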

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Ordered startup blocking | Pod N is Pending or NotReady while a lower ordinal is not Ready | kubectl get pods -l app=<statefulset-name> and find the first non-Ready ordinal |
| PVC binding failure | Pod stays Pending with FailedScheduling or volume events | kubectl get pvc for the claim; look for Pending or provisioning errors |
| Volume attachment hang | Replacement pod Pending; volume still attached to a dead or old node | kubectl get volumeattachments for the PV |
| PV node affinity conflict | Replacement cannot schedule; PV node affinity pins volume to the original node | kubectl get pv <pv-name> -o yaml for node affinity and claimRef |
| Headless service mismatch | CoreDNS returns NXDOMAIN for pod hostnames; readiness probes fail during bootstrap | kubectl get svc <headless-service> -o yaml for selector and publishNotReadyAddresses |
| Force-delete aftermath | Pod phase Unknown, volume stuck in Attaching, risk of duplicate identity | kubectl get pod <pod-name> for phase and remaining finalizers |

Quick checks

# Pod status and ordinals
kubectl get pods -l app=<statefulset-name> --sort-by=.metadata.name

# PVC status
kubectl get pvc -l app=<statefulset-name>

# Volume attachments
kubectl get volumeattachments

# Headless service selector match
kubectl get svc <headless-service> -o yaml | grep -A5 selector

# Recent events
kubectl describe pod <pod-name>
kubectl describe pvc <pvc-name>

# Node health
kubectl get node <node-name>

# PV constraints
kubectl get pv <pv-name> -o yaml

How to diagnose it

  1. Find the blocked ordinal. Run kubectl get pods -l app=<statefulset-name> --sort-by=.metadata.name. Identify the lowest-indexed pod that is not Running and Ready. With OrderedReady, the controller will not create or update higher ordinals until this pod is healthy.

  2. Determine whether the problem is scheduling or runtime. Run kubectl describe pod <pod-name> and read Events. If the pod is Pending with Unschedulable or volume-related messages, the issue is upstream of the container. If the pod is Running but not Ready, the issue is likely probes, DNS, or application startup.

  3. Inspect PVC binding. Run kubectl get pvc <claim-name>. If the phase is Pending, check the StorageClass provisioner, resource quotas, and cloud provider volume limits. If a PVC was deleted and the replacement pod is Pending because the claim is missing, delete the pod so the controller recreates the pod and PVC together.

  4. Verify volume attachment state. Run kubectl get volumeattachments. If a VolumeAttachment object still references the old node and the pod was force-deleted after node failure, the CSI attacher may not have detached the volume. The replacement pod will stay Pending because the volume is still exclusively attached elsewhere.

  5. Check the headless Service. Run kubectl get svc <headless-service> -o yaml. Confirm clusterIP: None, that the selector matches the StatefulSet pod labels exactly, and that publishNotReadyAddresses is set to true if your application requires DNS resolution before passing readiness.

  6. Review PV node affinity and claimRef. Run kubectl get pv <pv-name> -o yaml. If the PV is Released and has a claimRef pointing to a deleted PVC, a new PVC cannot bind to it. If the PVC itself is stuck Terminating because of a finalizer, remove the finalizer from the PVC instead of patching the PV.

  7. Evaluate finalizers before forcing deletion. Run kubectl get pod <pod-name> -o jsonpath='{.metadata.finalizers}' to list them. If finalizers are present, understand what they protect. Force-deleting a StatefulSet pod can violate the at-most-one identity guarantee. Only force-delete after confirming the old pod is truly gone and will never contact peers.

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| StatefulSet ready replicas vs desired | Direct measure of workload health | Ready replicas less than desired for more than 5 minutes |
| Pod phase by ordinal | OrderedReady blocks on the first non-Ready pod | A higher ordinal is Pending while a lower one is not Ready |
| PVC phase | Unbound PVC prevents pod scheduling | PVC in Pending phase for more than 5 minutes |
| VolumeAttachment objects | Stuck attachments block rescheduling after node failure | VolumeAttachment exists for a pod on a terminated or NotReady node |
| Scheduler pending pods | Indicates capacity or constraint saturation | Pending pods increasing while the scheduler is healthy |
| API server mutating request latency | Slow admission or etcd delays StatefulSet reconciliation | p99 latency greater than 1 second sustained |
| etcd disk WAL fsync latency | Slow etcd cascades into all control plane writes | p99 fsync latency greater than 100 ms sustained |
| Node Ready condition | A dead node can strand attached volumes | Node condition Unknown or NotReady for more than 1 minute |
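The first signal can be spot-checked by hand. The sketch below compares ready and desired replicas at a single point in time; the 5-minute window still has to come from your monitoring system. replicas_short is a hypothetical helper. It treats an empty ready count as 0, on the assumption that the API omits readyReplicas when no replica is ready.

```shell
# replicas_short: hypothetical check. Takes "ready/desired" (for example "2/3")
# and succeeds (exit 0) when ready is lower than desired.
# An empty ready count is treated as 0, on the assumption that the API
# omits readyReplicas when no replica is ready.
replicas_short() {
  ready=${1%%/*}; desired=${1##*/}
  [ "${ready:-0}" -lt "${desired:-0}" ]
}

# Usage:
#   status=$(kubectl get statefulset <statefulset-name> \
#     -o jsonpath='{.status.readyReplicas}/{.spec.replicas}')
#   replicas_short "$status" && echo "StatefulSet below desired replicas"
```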

Fixes

If the cause is ordered startup blocking

Do not force-delete higher ordinals to skip over a blocked pod. Fix the lowest unhealthy ordinal first. Switching to Parallel is not a quick workaround, even when the blocked pod sits on a failed node and cannot recover: podManagementPolicy cannot be changed on an existing StatefulSet. Moving to Parallel requires deleting and recreating the StatefulSet, which is disruptive and removes startup ordering guarantees.

If the cause is PVC binding or provisioning failure

Check the StorageClass and provisioner logs. If the PVC is missing and the pod is Pending, deleting the pod causes the StatefulSet controller to recreate the pod and PVC from the template. This is destructive: the new PVC will provision a fresh volume unless the reclaim policy and storage backend are specifically configured to reuse the existing PV. Verify backups before proceeding.

If the cause is a volume stuck attached to a dead node

After confirming the node is truly gone and the pod will not resume, identify the VolumeAttachment object referencing the PV and old node, then delete it:

kubectl delete volumeattachment <volumeattachment-name>

This is disruptive and should only be done when the old pod is confirmed terminated. The replacement pod can then trigger a new attachment on its scheduled node.
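To find the VolumeAttachment name to delete, one option is to list the objects with explicit columns and filter by PV name. This is a sketch only: attachments_for_pv is a hypothetical helper and <pv-name> is a placeholder. The column paths come from the VolumeAttachment API.

```shell
# attachments_for_pv: hypothetical filter. Expects rows of
# "NAME PV NODE ATTACHED" on stdin and prints name, node, and attached
# state for the rows that reference the given PV.
attachments_for_pv() {
  awk -v pv="$1" '$2 == pv { print $1, $3, $4 }'
}

# Usage against a live cluster:
#   kubectl get volumeattachments --no-headers \
#     -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached \
#     | attachments_for_pv <pv-name>
```

If the node in the output is the dead node, the first column is the name to pass to kubectl delete volumeattachment.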

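If the cause is a PVC stuck Terminating or a Released PV with Retain policy

If the PVC is stuck in Terminating because of a protecting finalizer, first confirm that no pod still mounts the claim, because the kubernetes.io/pvc-protection finalizer is held while a pod uses it. Then remove the finalizer: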

kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'

If the PVC was deleted and the PV is Released with a lingering claimRef, clear the claimRef so a new PVC can bind:

# Destructive to binding state; ensure the old PVC is fully deleted
kubectl patch pv <pv-name> -p '{"spec":{"claimRef":null}}'

If the cause is headless service DNS failure

Correct the Service selector to match the StatefulSet pod labels exactly. If the application bootstraps using hostnames before passing readiness probes, set publishNotReadyAddresses: true on the headless Service.
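A minimal headless Service with both settings might look like the output of the sketch below. headless_service_yaml is a hypothetical generator. The app label key, the port name peer, and the example arguments are assumptions to replace with your own values. The Service name must match the StatefulSet's serviceName field.

```shell
# headless_service_yaml: hypothetical generator for a minimal headless Service.
# Arguments: service name, value of the pods' "app" label, port number.
# The "app" label key and the port name "peer" are illustrative assumptions.
headless_service_yaml() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: $2
  ports:
    - name: peer
      port: $3
EOF
}

# Review the output, then apply it if it matches your StatefulSet:
#   headless_service_yaml web-headless web 8080 | kubectl apply -f -
```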

If force deletion is the only remaining option

Use the force-delete procedure only as a last resort:

kubectl delete pod <pod-name> --grace-period=0 --force

If the pod remains in Unknown phase, patch finalizers to allow removal:

kubectl patch pod <pod-name> -p '{"metadata":{"finalizers":null}}'

This carries a real risk of violating at-most-one semantics. Do not force-delete unless you are certain the old pod process is gone and will not rejoin the StatefulSet membership.

Prevention

  • Use podManagementPolicy: Parallel only if the application tolerates simultaneous startup without ordered initialization. Remember that this field is immutable.
  • Set terminationGracePeriodSeconds high enough for graceful shutdown and CSI detachment.
  • Monitor PVC binding and VolumeAttachment lifecycle as first-class operational signals.
  • Create the headless Service before the StatefulSet, and treat selector changes as breaking.
  • Establish a runbook that checks node status and VolumeAttachment state before allowing force-deletion of a StatefulSet pod.
  • Use PodDisruptionBudgets conservatively with StatefulSets to avoid mass evictions that trigger reattachment storms.

How Netdata helps

  • Correlate node NotReady transitions with kubelet PLEG latency and disk pressure to catch node failures before they strand volumes.
  • Track API server mutating request latency and etcd WAL fsync latency to detect control plane slowdowns that delay StatefulSet reconciliation.
  • Monitor conntrack utilization and drops to preempt network-related communication failures.
  • Alert on PVC capacity and volume stats to catch storage pressure before binding fails.
  • Visualize per-node resource saturation to distinguish scheduling constraints from application-level crashes.

flowchart TD
    A[StatefulSet pod N not ready] --> B{Is pod N-1 ready?}
    B -->|No| C[Ordered startup blocked<br/>Fix lower ordinal first]
    B -->|Yes| D{Is pod pending?}
    D -->|Yes| E{Is PVC bound?}
    E -->|No| F[PVC binding or provisioning failure]
    E -->|Yes| G{VolumeAttachment on old node?}
    G -->|Yes| H[Volume stuck attaching<br/>Clear VolumeAttachment]
    G -->|No| I[Scheduler or node constraint]
    D -->|No| J[Container or runtime issue<br/>Check logs and probes]