Kubernetes pod creation fails: admission, quota, and CRI errors

Pod creation can fail at two points: the API server rejects the Pod at admission, before the object is ever persisted, or the container runtime fails to create the sandbox or pull images after the Pod is assigned to a node. You apply a Deployment, but kubectl get pods returns nothing. Or a Pod appears, then sits in ErrImagePull or ImagePullBackOff and never starts its containers. These cases surface as missing Pods, FailedCreate events on ReplicaSets or Jobs, explicit API rejections, or Pods that never reach Running. This guide covers admission control, quota and policy limits, and CRI-level image pull and sandbox failures.

What this means

Pod creation is a pipeline. The API server first authenticates and authorizes the request, then runs the Pod spec through admission controllers: mutating webhooks, validating webhooks, LimitRanger, ResourceQuota, and PodSecurity. If any controller rejects the request, the Pod object is never persisted to etcd. If admission succeeds, the object is stored, the scheduler assigns a node, and the kubelet asks the container runtime (via CRI) to create the pod sandbox and pull images. Every failure in this guide happens before the Pod’s containers start: admission failures occur before the Pod object exists, and CRI failures occur after scheduling but before the Pod is Running.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Admission webhook rejection or timeout | Events say failed calling webhook; mutating API latency spikes; all pod creation hangs | kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations |
| ResourceQuota exhausted | ReplicaSet or Job created but zero pods; event says exceeded quota | kubectl describe resourcequota -n <namespace> |
| LimitRange violation | API returns 403; container resources below minimum or above maximum | kubectl describe limitrange -n <namespace> |
| Pod Security Standards (PSS) denial | Event says violates PodSecurity; namespace enforces restricted | Namespace labels and pod securityContext |
| CRI image pull failure | Pod status ImagePullBackOff or ErrImagePull | kubectl describe pod, then crictl on the node |
| CRI auth or registry misconfiguration | Pull works via ctr but fails via kubelet/CRI | Runtime registry auth config vs imagePullSecrets |

Quick checks

# Check for pre-scheduling creation failures in namespace events
kubectl get events --field-selector reason=FailedCreate -n <namespace>
# Check webhook configs, scopes, and failure policies
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o yaml | grep -B 5 -A 2 failurePolicy
# Check ResourceQuota consumption and hard limits
kubectl describe resourcequota -n <namespace>
# Check LimitRange constraints that may reject pod specs
kubectl describe limitrange -n <namespace>
# Check Pod Security Admission level on the namespace
kubectl get namespace <namespace> --show-labels
# On the assigned node: check whether the image is already present locally
crictl images | grep <image-name>
# On the assigned node: check kubelet logs for CRI sandbox or pull errors
journalctl -u kubelet --since "10 minutes ago" | grep -iE "pull|cri|sandbox"
# Check API server admission webhook latency; requires metrics endpoint access
kubectl get --raw /metrics | grep apiserver_admission_webhook_admission_duration_seconds

How to diagnose it

  1. Determine whether the Pod object was created. Run kubectl get pods -n <namespace>. If the pod does not exist, check the owning controller with kubectl describe. Look for FailedCreate events. If FailedCreate appears, the API server rejected the request before persisting the Pod.
  2. Check if admission webhooks are on the critical path. If FailedCreate mentions a webhook name, or mutating requests are timing out, inspect the webhook service endpoints. A webhook with failurePolicy: Fail that is unreachable blocks all matching resource creation. If you have access to API server metrics, check apiserver_admission_webhook_admission_duration_seconds to identify slow webhooks.
  3. Check quota and policy limits. Run kubectl describe resourcequota and kubectl describe limitrange. If quota is exhausted, the error message names the specific resource (pods, cpu, memory). If LimitRange is the issue, the API server returns a 403 explaining the min, max, or default constraint violation.
  4. Check Pod Security Admission. If the event mentions PodSecurity, inspect the namespace labels (pod-security.kubernetes.io/enforce). The restricted level rejects privileged containers, host namespaces, and certain volume types. Verify whether the pod spec complies, or whether the namespace should enforce baseline and apply restricted only in audit or warn mode.
  5. If the Pod exists but is in ImagePullBackOff, move to CRI diagnosis. Check kubectl describe pod for the exact pull error. Verify the image tag exists in the registry. Confirm the Pod’s ServiceAccount references imagePullSecrets if the registry requires authentication. On the node, test with crictl pull. If crictl fails with an auth error but ctr succeeds, the CRI plugin may not be using the registry credentials correctly.
  6. Check controller workqueues. If quota or webhook issues resolve but pods still do not appear, the controller-manager may be backlogged. Check workqueue_depth metrics for the ReplicaSet or Job controller. A sustained depth above 100 indicates the controller is still catching up.
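
Steps 1 through 4 largely come down to reading the message on a FailedCreate event. As a rough sketch, the shell functions below sort a message into a likely cause by substring. The function names are hypothetical, and the substrings are assumptions based on commonly seen API server messages, not an exhaustive list; adjust them to what your cluster actually emits.

```shell
# Sketch: map a FailedCreate / API error message to a likely cause.
# The substrings are assumptions; extend them for your cluster.
classify_failed_create() {
  case "$1" in
    *"failed calling webhook"*|*"admission webhook"*) echo "webhook" ;;
    *"exceeded quota"*)                               echo "quota" ;;
    *"violates PodSecurity"*)                         echo "podsecurity" ;;
    *"usage per "*|*"limit to request ratio"*)        echo "limitrange" ;;
    *)                                                echo "unknown" ;;
  esac
}

# Hypothetical helper: classify every FailedCreate event in a namespace.
# $1: namespace
triage_failed_create() {
  kubectl get events -n "$1" --field-selector reason=FailedCreate \
    -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' |
  while IFS= read -r msg; do
    [ -n "$msg" ] || continue
    printf '%s\t%s\n' "$(classify_failed_create "$msg")" "$msg"
  done
}
```

Running triage_failed_create with a namespace name prints one line per event, with the guessed cause first. An "unknown" result means the message needs to be read by hand.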

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| apiserver_admission_webhook_admission_duration_seconds | Every mutating request waits for webhooks synchronously. | p99 > 1s sustained, or latency approaching timeoutSeconds |
| apiserver_request_total{code=~"[45].."} | Rejection rates expose quota, auth, and admission errors. | |
| kubelet_runtime_operations_errors_total{operation_type="pull_image"} | CRI-level pull failures block pod startup. | Non-zero rate for critical images |
| ResourceQuota used / hard ratio | Quota exhaustion is silent until creation fails. | > 80% utilization for pods, cpu, or memory |
| Controller workqueue_depth | Backlog after recovery delays pod reconciliation. | > 100 sustained for replicaset or job queues |
| Pending Pods with FailedScheduling vs missing Pods with FailedCreate | Distinguishes capacity issues (scheduling was attempted) from quota or admission rejections (no Pod was created). | FailedCreate events on the controller and no Pod objects |

Fixes

If the cause is admission webhooks

Identify the slow or failing webhook from apiserver_admission_webhook_admission_duration_seconds. Check whether the webhook service has ready endpoints and whether the webhook pods are healthy. If the cluster is blocked and the webhook is non-critical, you can temporarily change failurePolicy to Ignore to unblock creation.

Warning: Setting failurePolicy: Ignore bypasses the webhook and may allow non-compliant or insecure resources into the cluster. Use only as a break-glass measure.
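
As a sketch of the break-glass change, the functions below build a JSON patch for failurePolicy and apply it with kubectl patch. The function names are hypothetical, and the sketch assumes a ValidatingWebhookConfiguration where you already know the zero-based index of the entry under .webhooks; for a mutating webhook, the resource kind would be mutatingwebhookconfiguration instead.

```shell
# Build a JSON patch that sets failurePolicy on one webhook entry.
# $1: zero-based index under .webhooks, $2: Fail or Ignore
failure_policy_patch() {
  case "$1" in
    ""|*[!0-9]*) echo "index must be a non-negative integer" >&2; return 1 ;;
  esac
  case "$2" in
    Fail|Ignore) ;;
    *) echo "policy must be Fail or Ignore" >&2; return 1 ;;
  esac
  printf '[{"op":"replace","path":"/webhooks/%s/failurePolicy","value":"%s"}]\n' "$1" "$2"
}

# Hypothetical wrapper: apply the patch to a ValidatingWebhookConfiguration.
# $1: configuration name, $2: webhook index, $3: Fail or Ignore
set_failure_policy() {
  patch=$(failure_policy_patch "$2" "$3") || return 1
  kubectl patch validatingwebhookconfiguration "$1" --type=json -p "$patch"
}
```

Calling set_failure_policy again with Fail restores the original behavior once the webhook is healthy. If the configuration is managed by an operator or a GitOps tool, the manual patch may be reverted, so the change usually belongs in the source manifest as well.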

Narrow the webhook’s scope with namespaceSelector or objectSelector so it does not match kube-system or critical infrastructure. Long-term, treat webhook availability as a hard dependency for any resource type it matches.

If the cause is ResourceQuota

Increase the quota or reduce existing workload requests in the namespace. Quota is enforced at admission time against the requests and limits declared in Pod specs, not against actual usage. Pods in Terminating state can still count against quota while their deletion grace period runs, so long grace periods can delay the release of quota. If a Deployment creates a ReplicaSet but no Pods appear, quota exhaustion is a likely cause. For batch workloads, be aware that a Job object can be created successfully even when its namespace has no remaining pod quota; the Job controller then keeps retrying Pod creation, and the quota error shows up in FailedCreate events on the Job rather than in its status.
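
To see how close a namespace is to its pod quota, something like the following sketch can help. It only handles the pods resource, because pod counts are plain integers; cpu and memory quantities would need unit parsing that is left out here. Both function names and the 80% threshold are illustrative assumptions.

```shell
# Integer percentage of used vs hard.
# Fails (prints nothing) unless both values are plain integers and hard > 0.
quota_pct() {
  for v in "$1" "$2"; do
    case "$v" in ""|*[!0-9]*) return 1 ;; esac
  done
  [ "$2" -gt 0 ] || return 1
  echo $(( $1 * 100 / $2 ))
}

# Hypothetical report: pod quota usage for each ResourceQuota in a namespace.
# $1: namespace
pod_quota_report() {
  kubectl get resourcequota -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.used.pods}{" "}{.status.hard.pods}{"\n"}{end}' |
  while read -r name used hard; do
    # Skip quotas that do not limit pods (empty fields).
    pct=$(quota_pct "$used" "$hard") || continue
    if [ "$pct" -ge 80 ]; then flag="WARN"; else flag="ok"; fi
    echo "$name pods: $used/$hard (${pct}%) $flag"
  done
}
```

Quotas that do not set a hard limit on pods are skipped rather than reported as zero.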

If the cause is LimitRange

Adjust the Pod’s container resource requests and limits to fit within the LimitRange constraints. LimitRange can inject defaults if a container omits resources, which sometimes causes a spec to violate a maxLimitRequestRatio or a hard max unexpectedly. Run kubectl describe limitrange to see the defaults and min/max values before modifying the Pod spec.
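
The min/max comparison itself is simple once quantities share a unit. The sketch below normalizes CPU quantities to millicores and checks a value against a range. It is deliberately limited: it accepts only whole cores (for example 2) or millicores (for example 500m), not decimal forms such as 0.5, and the function names are hypothetical.

```shell
# Convert a CPU quantity to millicores. Supports "N" (cores) and "Nm" only.
cpu_to_milli() {
  q=$1
  case "$q" in
    *m) n=${q%m}; mult=1 ;;
    *)  n=$q;     mult=1000 ;;
  esac
  case "$n" in
    ""|*[!0-9]*) echo "unsupported cpu quantity: $q" >&2; return 1 ;;
  esac
  echo $(( n * mult ))
}

# Succeeds if min <= value <= max; returns 2 if any quantity cannot be parsed.
# $1: value, $2: min, $3: max
cpu_within_range() {
  v=$(cpu_to_milli "$1") && lo=$(cpu_to_milli "$2") && hi=$(cpu_to_milli "$3") || return 2
  [ "$v" -ge "$lo" ] && [ "$v" -le "$hi" ]
}
```

For example, cpu_within_range 1500m 100m 800m fails, which corresponds to a container limit of 1500m under a LimitRange max of 800m. For a check against the real admission chain rather than this approximation, kubectl apply --dry-run=server -f with the manifest submits the object through admission without persisting it.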

If the cause is Pod Security Standards

Check the namespace labels. If pod-security.kubernetes.io/enforce is set to restricted or baseline, compare the Pod’s securityContext against the requirements of that level. If the cluster previously used PodSecurityPolicy (PSP), confirm no RBAC or manifests still reference PSP; the API was removed in Kubernetes 1.25. To avoid production outages, apply restricted in audit or warn mode first (the pod-security.kubernetes.io/audit and pod-security.kubernetes.io/warn labels), review the audit logs and warnings for violations, and only then set it on the enforce label.
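
A staged rollout can be sketched with kubectl label. The helper names below are hypothetical; the label keys and the mode and level values are the standard Pod Security Admission ones. The second command relies on a server-side dry run, which reports existing Pods that would violate the level without changing the namespace.

```shell
# Build a Pod Security Admission label. $1: mode, $2: level
pss_label() {
  case "$1" in enforce|audit|warn) ;; *) return 1 ;; esac
  case "$2" in privileged|baseline|restricted) ;; *) return 1 ;; esac
  echo "pod-security.kubernetes.io/$1=$2"
}

# Hypothetical rollout helper: report first, enforce later.
# $1: namespace
pss_preview_restricted() {
  ns=$1
  # Turn on warnings and audit annotations without blocking anything.
  kubectl label --overwrite namespace "$ns" \
    "$(pss_label warn restricted)" "$(pss_label audit restricted)"
  # Server-side dry run: surfaces warnings for existing Pods that would
  # violate the level, and leaves the enforce label unchanged.
  kubectl label --dry-run=server --overwrite namespace "$ns" \
    "$(pss_label enforce restricted)"
}
```

Once the warnings are resolved, the same kubectl label command without --dry-run=server sets the enforce label.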

If the cause is CRI image pull errors

Verify the image name, tag, and registry reachability. Ensure the Pod’s ServiceAccount lists the correct imagePullSecrets; Pods that do not set imagePullSecrets themselves inherit them from their ServiceAccount at admission, and the referenced Secrets must exist in the Pod’s namespace. Check node-level registry credentials in the container runtime configuration. If you use containerd and suspect a CRI auth issue, verify that credentials are configured in the CRI plugin’s registry auth block and not only in runtime-specific host configuration files.
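
The pull error text in kubectl describe pod usually points at one of three areas: credentials, a missing image or tag, or network reachability. The sketch below guesses the area from substrings and prints the pull secrets a Pod and its ServiceAccount reference. The function names are hypothetical, and the substrings are assumptions drawn from commonly seen runtime messages; they vary by runtime and registry.

```shell
# Guess the failure area from a pull error message (substrings are assumptions).
classify_pull_error() {
  case "$1" in
    # "nauthorized" matches both "unauthorized" and "Unauthorized".
    *"nauthorized"*|*"authorization failed"*|*"pull access denied"*) echo "auth" ;;
    *"manifest unknown"*|*"not found"*)                               echo "missing-image" ;;
    *"no such host"*|*"i/o timeout"*|*"connection refused"*)          echo "network" ;;
    *)                                                                echo "unknown" ;;
  esac
}

# Hypothetical helper: show pull secrets on the Pod and on its ServiceAccount.
# $1: namespace, $2: pod name
show_pull_secrets() {
  echo "pod imagePullSecrets:"
  kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.imagePullSecrets[*].name}{"\n"}'
  sa=$(kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.serviceAccountName}')
  echo "serviceaccount $sa imagePullSecrets:"
  kubectl get serviceaccount "$sa" -n "$1" -o jsonpath='{.imagePullSecrets[*].name}{"\n"}'
}
```

An "auth" result with an empty list from show_pull_secrets suggests the Pod has no credentials at all, while an "auth" result with a secret present points toward the secret’s contents or the runtime’s registry configuration.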

Prevention

  • Monitor admission webhook latency. Alert on p99 webhook latency above 500ms and treat webhook downtime as a control plane outage.
  • Monitor ResourceQuota utilization per namespace and alert at 80% so teams can resize or clean up before creation is blocked.
  • Use LimitRange defaults to prevent containers from entering the cluster without resource requests; this makes quota accounting predictable.
  • Validate Pod Security Standards in CI before enforcing them in production namespaces.
  • Monitor kubelet_runtime_operations_errors_total for image pull failures and test registry authentication from a node’s CRI interface during onboarding.
  • Do not rely only on Pod status to detect quota issues; monitor ReplicaSet, Job, and FailedCreate event rates.

How Netdata helps

Netdata surfaces the signals that distinguish admission and quota failures from runtime problems:

  • API server latency charts break down mutating request latency, showing when admission webhooks are adding seconds to every Pod create.
  • Admission webhook latency per webhook name pinpoints which policy engine or custom webhook is stalling the cluster.
  • Kubelet CRI operation error rates expose image pull failures at the node level before they cascade into workload outages.
  • ResourceQuota utilization per namespace provides a real-time view of capacity headroom without requiring manual kubectl describe checks.
flowchart TD
    A[Pod creation fails] --> B{Pod object exists?}
    B -->|No| C[Check FailedCreate events]
    B -->|Yes| D[Check pod status]
    C --> E{Event mentions webhook?}
    E -->|Yes| F[Diagnose admission webhook]
    E -->|No| G[Check ResourceQuota, LimitRange, and PSA]
    D --> H{ImagePullBackOff?}
    H -->|Yes| I[Diagnose CRI and registry auth]
    H -->|No| J[Check kubelet logs and runtime sandbox]
    F --> K[Fix webhook latency or policy]
    G --> L[Fix quota, resources, or securityContext]
    I --> M[Fix image pull config]
    J --> N[Fix runtime or sandbox setup]