Kubernetes pod creation fails: admission, quota, and CRI errors
Pod creation can fail before a Pod ever runs. The API server can reject the Pod at admission, before the scheduler sees it, or the container runtime can fail to pull images or create the sandbox once the kubelet picks the Pod up. You apply a Deployment, but kubectl get pods returns nothing. Or a Pod sits in ImagePullBackOff and never starts its containers. These cases surface as missing Pods, FailedCreate events on ReplicaSets or Jobs, or explicit API rejections. This guide covers admission control, quota and policy limits, and CRI-level image pull and sandbox failures.
What this means
Pod creation is a pipeline. The API server first authenticates and authorizes the request, then runs the Pod spec through admission controllers: mutating webhooks, validating webhooks, LimitRanger, ResourceQuota, and PodSecurity. If any controller rejects the request, the Pod object is never persisted to etcd. If admission succeeds, the object is stored, the scheduler assigns a node, and the kubelet asks the container runtime (via CRI) to create the pod sandbox and pull images. Every failure in this guide happens before the Pod's containers start. The admission failures happen even earlier, before the Pod is stored or scheduled.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Admission webhook rejection or timeout | Events say failed calling webhook; mutating API latency spikes; all pod creation hangs | kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations |
| ResourceQuota exhausted | ReplicaSet or Job created but zero pods; event says exceeded quota | kubectl describe resourcequota -n <namespace> |
| LimitRange violation | API returns 403; container resources below minimum or above maximum | kubectl describe limitrange -n <namespace> |
| Pod Security Standards (PSS) denial | Event says violates PodSecurity; namespace enforces restricted | Namespace labels and pod securityContext |
| CRI image pull failure | Pod status ImagePullBackOff or ErrImagePull | kubectl describe pod, then crictl pull on the node |
| CRI auth or registry misconfiguration | Pull works via ctr but fails via kubelet/CRI | Runtime registry auth config vs imagePullSecrets |
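As a rough illustration of the table above, the following sketch maps an event or API error message to the stage that most likely produced it. The function name `classify_create_failure` is hypothetical, and the LimitRange match strings are assumptions based on commonly seen message wording rather than a stable API; treat the result as a hint for where to look first.

```shell
# Map a FailedCreate event message or API error to a likely cause.
# Match strings for webhook, quota, and PodSecurity come from this guide;
# the LimitRange patterns are assumptions about typical message wording.
classify_create_failure() {
  local msg="$1"
  case "$msg" in
    *"failed calling webhook"*) echo "admission-webhook" ;;
    *"exceeded quota"*)         echo "resource-quota" ;;
    *"violates PodSecurity"*)   echo "pod-security" ;;
    *"usage per Container"*|*"usage per Pod"*|*"limit to request ratio"*)
                                echo "limit-range" ;;
    *ErrImagePull*|*ImagePullBackOff*)
                                echo "image-pull" ;;
    *)                          echo "unknown" ;;
  esac
}
```

For example, passing the message text of a FailedCreate event from `kubectl describe replicaset` prints one of the labels above, or `unknown` when none of the patterns match.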
Quick checks
# Check for pre-scheduling creation failures in namespace events
kubectl get events --field-selector reason=FailedCreate -n <namespace>
# Check webhook configs, scopes, and failure policies
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o yaml | grep -B 5 -A 2 failurePolicy
# Check ResourceQuota consumption and hard limits
kubectl describe resourcequota -n <namespace>
# Check LimitRange constraints that may reject pod specs
kubectl describe limitrange -n <namespace>
# Check Pod Security Admission level on the namespace
kubectl get namespace <namespace> --show-labels
# Check image availability and pull errors on the assigned node
crictl images | grep <image-name>
# Check kubelet logs for CRI sandbox or pull errors
journalctl -u kubelet --since "10 minutes ago" | grep -iE "pull|cri|sandbox"
# Check API server admission webhook latency; requires metrics endpoint access
kubectl get --raw /metrics | grep apiserver_admission_webhook_admission_duration_seconds
How to diagnose it
- Determine whether the Pod object was created. Run `kubectl get pods -n <namespace>`. If the Pod does not exist, check the owning controller with `kubectl describe`. Look for `FailedCreate` events. If `FailedCreate` appears, the API server rejected the request before persisting the Pod.
- Check if admission webhooks are on the critical path. If `FailedCreate` mentions a webhook name, or mutating requests are timing out, inspect the webhook service endpoints. A webhook with `failurePolicy: Fail` that is unreachable blocks all matching resource creation. If you have access to API server metrics, check `apiserver_admission_webhook_admission_duration_seconds` to identify slow webhooks.
- Check quota and policy limits. Run `kubectl describe resourcequota` and `kubectl describe limitrange`. If quota is exhausted, the error message names the specific resource (pods, cpu, memory). If LimitRange is the issue, the API server returns a 403 explaining the min, max, or default constraint violation.
- Check Pod Security Admission. If the event mentions `PodSecurity`, inspect the namespace labels (`pod-security.kubernetes.io/enforce`). A `restricted` profile rejects privileged containers, host namespaces, and certain volume types. Verify whether the Pod spec complies, or whether the namespace should enforce `baseline` and apply `restricted` only in `audit` or `warn` mode.
- If the Pod exists but is in `ImagePullBackOff`, move to CRI diagnosis. Check `kubectl describe pod` for the exact pull error. Verify the image tag exists in the registry. Confirm the Pod or its ServiceAccount references `imagePullSecrets` if the registry requires authentication. On the node, test with `crictl pull`. If `crictl` fails with an auth error but `ctr` succeeds, the CRI plugin may not be using the registry credentials correctly.
- Check controller workqueues. If quota or webhook issues resolve but Pods still do not appear, the controller-manager may be backlogged. Check `workqueue_depth` metrics for the ReplicaSet or Job controller. A sustained depth above 100 indicates the controller is still catching up.
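The workqueue check in the last step can be sketched as a small filter over Prometheus text-format metrics. This is a minimal sketch: the function name `deep_workqueues` is hypothetical, it assumes each `workqueue_depth` sample carries a `name` label, and how you scrape the kube-controller-manager metrics endpoint depends on your cluster, so that step is not shown.

```shell
# Print "<queue name> <depth>" for every workqueue_depth sample above a threshold.
# Reads Prometheus text-format metrics on stdin; the threshold defaults to 100.
deep_workqueues() {
  local threshold="${1:-100}"
  awk -v t="$threshold" '
    index($0, "workqueue_depth{") == 1 {
      name = $0
      sub(/.*name="/, "", name)   # drop everything up to the name label value
      sub(/".*/, "", name)        # drop everything after the label value
      if ($NF + 0 > t + 0) print name, $NF
    }'
}
```

Piping saved metrics through `deep_workqueues 100` prints only the queues that exceed the threshold. A single sample is a snapshot, so repeat the check over time before concluding that the depth is sustained.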
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| `apiserver_admission_webhook_admission_duration_seconds` | Every mutating request waits for webhooks synchronously. | p99 > 1s sustained, or latency approaching `timeoutSeconds` |
| `apiserver_request_total{code=~"4..\|5.."}` | Rejection rates expose quota, auth, and admission errors. | |
| `kubelet_runtime_operations_errors_total{operation_type="pull_image"}` | CRI-level pull failures block pod startup. | Non-zero rate for critical images |
| ResourceQuota used / hard ratio | Quota exhaustion is silent until creation fails. | > 80% utilization for pods, cpu, or memory |
| Controller `workqueue_depth` | Backlog after recovery delays pod reconciliation. | > 100 sustained for replicaset or job queues |
| Pod phase Pending with FailedScheduling vs no events | Distinguishes quota (no scheduling attempt) from capacity issues. | No node assigned and FailedCreate events present |
Fixes
If the cause is admission webhooks
Identify the slow or failing webhook from apiserver_admission_webhook_admission_duration_seconds. Check whether the webhook service has ready endpoints and whether the webhook pods are healthy. If the cluster is blocked and the webhook is non-critical, you can temporarily change failurePolicy to Ignore to unblock creation.
Warning: Setting failurePolicy: Ignore bypasses the webhook and may allow non-compliant or insecure resources into the cluster. Use only as a break-glass measure.
Narrow the webhook’s scope with namespaceSelector or objectSelector so it does not match kube-system or critical infrastructure. Long-term, treat webhook availability as a hard dependency for any resource type it matches.
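The break-glass change described above can be applied with a JSON Patch. The sketch below only builds the patch body; the helper name `failure_policy_patch` and the configuration name `example-policy-webhook` in the usage comment are hypothetical, and the index must match the position of the entry in the configuration's `webhooks` list. If an operator or Helm release manages the configuration, it may revert the change.

```shell
# Build a JSON Patch that sets failurePolicy on one webhook entry.
# $1 = index of the entry in .webhooks, $2 = Ignore or Fail.
failure_policy_patch() {
  local index="$1" policy="$2"
  case "$policy" in
    Ignore|Fail) ;;
    *) echo "policy must be Ignore or Fail" >&2; return 1 ;;
  esac
  printf '[{"op":"replace","path":"/webhooks/%d/failurePolicy","value":"%s"}]\n' \
    "$index" "$policy"
}

# Usage (hypothetical configuration name):
#   kubectl patch validatingwebhookconfiguration example-policy-webhook \
#     --type=json -p "$(failure_policy_patch 0 Ignore)"
# Revert with "$(failure_policy_patch 0 Fail)" once the webhook is healthy again.
```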
If the cause is ResourceQuota
Increase the quota or reduce existing workload requests in the namespace. Quota is enforced at admission time against the resource requests and limits declared in the Pod spec (whichever the quota tracks, for example requests.cpu or limits.memory), not against actual usage. Pods in Terminating state can keep consuming quota until they are deleted; stuck finalizers or slow garbage collection can delay this. If a Deployment creates a ReplicaSet but no Pods appear, quota exhaustion is a likely cause. For batch workloads, be aware that a Job object can be created successfully even when its namespace has no remaining pod quota. The Job controller then keeps retrying Pod creation, and the quota error shows up only in FailedCreate events on the Job, not as a failure of the Job object itself.
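To see how close a namespace is to its limits, the Used and Hard columns of `kubectl describe resourcequota` can be turned into percentages. This is a minimal sketch: the function name `quota_over_threshold` is hypothetical, it assumes the three-column Resource/Used/Hard layout, and it handles only a subset of Kubernetes quantity suffixes (m, k, M, G, Ki, Mi, Gi).

```shell
# Print "<resource> <percent>%" for every quota row at or above a threshold.
# Reads `kubectl describe resourcequota` output on stdin; threshold defaults to 80.
quota_over_threshold() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    # Convert a quantity such as 3600m or 8Gi to a plain number.
    function qty(s,    n) {
      n = s + 0
      if (s ~ /Ki$/) return n * 1024
      if (s ~ /Mi$/) return n * 1024 * 1024
      if (s ~ /Gi$/) return n * 1024 * 1024 * 1024
      if (s ~ /m$/)  return n / 1000
      if (s ~ /k$/)  return n * 1000
      if (s ~ /M$/)  return n * 1000000
      if (s ~ /G$/)  return n * 1000000000
      return n
    }
    # Data rows have three fields with numeric Used and Hard values.
    NF == 3 && $2 ~ /^[0-9]/ && $3 ~ /^[0-9]/ {
      hard = qty($3)
      if (hard > 0) {
        pct = 100 * qty($2) / hard
        if (pct >= t) printf "%s %.0f%%\n", $1, pct
      }
    }'
}

# Usage:
#   kubectl describe resourcequota -n <namespace> | quota_over_threshold 80
```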
If the cause is LimitRange
Adjust the Pod’s container resource requests and limits to fit within the LimitRange constraints. LimitRange can inject defaults if a container omits resources, which sometimes causes a spec to violate a maxLimitRequestRatio or a hard max unexpectedly. Run kubectl describe limitrange to see the defaults and min/max values before modifying the Pod spec.
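The interaction between injected defaults and maxLimitRequestRatio is easier to see with numbers. The sketch below uses illustrative values only: a container that requests 100m CPU and omits its limit, a LimitRange default limit of 1000m, and a maxLimitRequestRatio of 4. The helper name `violates_ratio` is hypothetical and works on integer millicore values.

```shell
# Return success (0) when limit / request exceeds the allowed ratio.
# All CPU values are integers in millicores; the ratio is an integer.
violates_ratio() {
  local request_m="$1" limit_m="$2" max_ratio="$3"
  [ "$limit_m" -gt $(( request_m * max_ratio )) ]
}

# Illustrative case: request 100m, injected default limit 1000m, ratio 4.
# 1000m > 100m * 4, so this spec would be rejected even though the
# author never wrote a limit.
if violates_ratio 100 1000 4; then
  echo "ratio exceeded: raise the request or set an explicit lower limit"
fi
```

Raising the request to 250m, or setting an explicit limit of 400m or less, brings the same container back within the ratio.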
If the cause is Pod Security Standards
Check the namespace labels. If pod-security.kubernetes.io/enforce is set to restricted or baseline, compare the Pod's securityContext against the PSS profile requirements. If the cluster previously used PodSecurityPolicy (PSP), confirm no RBAC or manifests still reference PSP; the API was removed in Kubernetes 1.25. To avoid production outages, apply the restricted level in audit or warn mode first, review the audit logs and warnings for violations, and only then enforce it.
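One way to stage that rollout is to build the Pod Security Admission labels with a small helper and try them with a server-side dry run before changing anything. This is a sketch: the helper name `psa_label` is hypothetical, while the label keys, modes, and levels are the standard Pod Security Admission ones.

```shell
# Print a Pod Security Admission namespace label for a given mode and level.
psa_label() {
  local mode="$1" level="$2"
  case "$mode" in
    enforce|audit|warn) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  case "$level" in
    privileged|baseline|restricted) ;;
    *) echo "unknown level: $level" >&2; return 1 ;;
  esac
  echo "pod-security.kubernetes.io/${mode}=${level}"
}

# Usage:
#   # Preview which existing Pods would violate the level, without changing the namespace
#   kubectl label --dry-run=server --overwrite namespace <namespace> \
#     "$(psa_label enforce restricted)"
#   # Record and warn about violations without blocking Pod creation
#   kubectl label --overwrite namespace <namespace> \
#     "$(psa_label audit restricted)" "$(psa_label warn restricted)"
```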
If the cause is CRI image pull errors
Verify the image name, tag, and registry reachability. Ensure the Pod's ServiceAccount references the correct imagePullSecrets; Pods that do not set their own imagePullSecrets inherit them from their ServiceAccount. Check node-level registry credentials in the container runtime configuration. If you use containerd and suspect a CRI auth issue, verify that credentials are configured in the CRI plugin's registry auth block and not only in runtime-specific host configuration files.
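Credentials only apply when their registry host matches the one the image reference resolves to, so it helps to confirm which host that is. The sketch below follows the usual Docker-style normalization in simplified form (it ignores the uppercase rule and digests); the function name `image_registry` is hypothetical.

```shell
# Print the registry host an image reference resolves to.
# Simplified Docker-style rule: the first path component is a registry only if
# it contains "." or ":" or equals "localhost"; otherwise docker.io is implied.
image_registry() {
  local ref="$1" first
  case "$ref" in
    */*) first="${ref%%/*}" ;;
    *)   echo "docker.io"; return ;;
  esac
  case "$first" in
    *.*|*:*|localhost) echo "$first" ;;
    *)                 echo "docker.io" ;;
  esac
}
```

Compare the printed host with the registry key in the Secret referenced by imagePullSecrets and with the host configured in the runtime; a mismatch such as a missing port is a common reason the credentials are never used.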
Prevention
Monitor admission webhook latency. Alert on p99 webhook latency above 500ms and treat webhook downtime as a control plane outage. Monitor ResourceQuota utilization per namespace and alert at 80% so teams can resize or clean up before creation is blocked. Use LimitRange defaults to prevent containers from entering the cluster without resource requests; this makes quota accounting predictable. Validate Pod Security Standards in CI before enforcing them in production namespaces. Monitor kubelet_runtime_operations_errors_total for image pull failures and test registry authentication from a node’s CRI interface during onboarding. Do not rely only on Pod status to detect quota issues; monitor ReplicaSet, Job, and FailedCreate event rates.
How Netdata helps
Netdata surfaces the signals that distinguish admission and quota failures from runtime problems:
- API server latency charts break down mutating request latency, showing when admission webhooks are adding seconds to every Pod create.
- Admission webhook latency per webhook name pinpoints which policy engine or custom webhook is stalling the cluster.
- Kubelet CRI operation error rates expose image pull failures at the node level before they cascade into workload outages.
- ResourceQuota utilization per namespace provides a real-time view of capacity headroom without requiring manual `kubectl describe` checks.
Related guides
- Kubernetes API server etcd latency: detection and cascading failures
- Kubernetes API server rate limiting: APF priority levels and starvation
- Kubernetes API server slow or unresponsive: causes and fixes
- Kubernetes conntrack exhaustion: dropped connections under load
- Kubernetes controller-manager leader election failures
- Kubernetes DNS resolution failures inside pods
- Kubernetes eviction cascade: when one node failure takes down the cluster
- Kubernetes kube-proxy iptables sync stall: causes and recovery
- Kubernetes kube-proxy IPVS: stale rules and session affinity issues
- Kubernetes kubelet certificate expired: detection, rotation, and recovery
- Kubernetes kubelet memory leak: detection and OOM cycle
- Kubernetes kubelet not responding: PLEG, runtime, and certificate issues
flowchart TD
A[Pod creation fails] --> B{Pod object exists?}
B -->|No| C[Check FailedCreate events]
B -->|Yes| D[Check pod status]
C --> E{Event mentions webhook?}
E -->|Yes| F[Diagnose admission webhook]
E -->|No| G[Check ResourceQuota and LimitRange]
D --> H{ImagePullBackOff?}
H -->|Yes| I[Diagnose CRI and registry auth]
H -->|No| J[Check PSA and runtime sandbox]
F --> K[Fix webhook latency or policy]
G --> L[Fix quota or resource constraints]
I --> M[Fix image pull config]
J --> N[Fix securityContext or runtime]





