Docker image pull failures: registry, network, and auth diagnosis

What this means

When you run docker pull, the daemon negotiates a TLS connection to the registry, authenticates if required, resolves the manifest for the requested tag and architecture, then downloads missing layers. If any step fails, the pull aborts.

Registry errors surface as HTTP 429 (rate limiting) or 401/403 (authentication or authorization) responses. Network errors appear as timeouts, connection resets, or TLS handshake failures. Local problems such as a full disk or a hung daemon can also abort a pull even when the registry is healthy. In orchestrated environments, a single node’s pull failure can trigger the scheduler to retry on other nodes, turning a localized auth error into a cluster-wide rate limit storm. Distinguishing these layers quickly is the difference between a five-minute fix and a prolonged outage.

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Registry rate limiting | HTTP 429 or “toomanyrequests”; pulls succeed during low traffic | Auth tier: anonymous vs. authenticated; rate limit headers |
| Authentication failure | “denied” or “unauthorized” when pulling private images | ~/.docker/config.json and docker login state |
| Network or DNS path failure | “connection refused”, “no such host”, or “unexpected EOF” mid-download | DNS resolution and TCP connectivity to registry endpoint |
| Missing tag or manifest | “manifest unknown” for a specific tag; other tags pull fine | Tag existence in registry; manifest architecture match |
| Local disk exhaustion | “no space left on device” during layer extraction | docker system df; df -h /var/lib/docker/ |
| TLS certificate problem | “x509: certificate signed by unknown authority” against a private registry | System CA bundle and registry certificate |
| MTU mismatch or proxy | “unexpected EOF” mid-download; first attempt fails but retries succeed | docker0 MTU vs. host interface; proxy and firewall rules |
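
The mapping from error text to cause can be sketched as a small shell function. This is a minimal illustration, assuming the error strings listed in the table; the function name and the category labels are hypothetical, and real daemon messages vary by Docker version and registry.

```shell
# Map a pull error line to one of the cause categories from the table.
# The category labels are illustrative, not Docker terminology.
classify_pull_error() {
  local msg
  # Lowercase the message so matching is case-insensitive.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *toomanyrequests*|*"too many requests"*)  echo "rate-limit" ;;
    *x509*)                                   echo "tls" ;;
    *unauthorized*|*denied*)                  echo "auth" ;;
    *"manifest unknown"*)                     echo "image-reference" ;;
    *"no space left on device"*)              echo "disk" ;;
    *"no such host"*|*"connection refused"*|*"connection reset"*|*"unexpected eof"*|*timeout*)
                                              echo "network" ;;
    *)                                        echo "unknown" ;;
  esac
}
```

For example, `classify_pull_error "$(docker pull <image>:<tag> 2>&1 | tail -n 1)"` prints a single label that tells you which branch of the diagnosis to follow. Treat "unknown" as a prompt to read the daemon logs rather than as a verdict.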

Quick checks

Run these checks first. Favor read-only commands before making changes.

  1. Daemon responsiveness. A hung daemon returns slowly or not at all. This distinguishes local daemon problems from registry problems.

    time curl --max-time 5 --unix-socket /var/run/docker.sock http://localhost/_ping
    
  2. Docker disk usage. Layer extraction needs free space. A full disk can abort a pull mid-download.

    docker system df
    df -h /var/lib/docker/
    
  3. Recent pull events and errors. The events stream shows requested images; daemon logs contain the full registry response.

    docker events --filter type=image --filter event=pull --since 1h --until "$(date +%s)" --format '{{.Time}} {{.Actor.ID}}'
    journalctl -u docker.service --since "1 hour ago" --no-pager | grep -i "pull\|download\|layer"
    
  4. Reproduce the failure. Capture the exact error and measure latency against your baseline. This writes image layers to disk.

    time docker pull <image>:<tag>
    
  5. Host DNS resolution. If the host cannot resolve the registry hostname, Docker cannot connect.

    nslookup <registry-hostname>
    
  6. Authentication state. Expired or missing credentials produce auth errors that look like repository denial. This file can contain base64-encoded credentials, so do not paste its contents into tickets or chat.

    cat ~/.docker/config.json
    
  7. Storage driver status. Corrupt overlay2 metadata causes pull failures that resemble registry errors.

    docker info | grep -A5 "Storage Driver"
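
The daemon responsiveness check from step 1 can be turned into a pass/fail probe. The following is a minimal sketch, assuming the default socket path and a one-second threshold; the function name and the ok/slow/unreachable labels are hypothetical.

```shell
# Classify daemon responsiveness from the /_ping round-trip time.
# Arguments (both optional): socket path, threshold in seconds.
daemon_ping_status() {
  local sock="${1:-/var/run/docker.sock}"
  local threshold="${2:-1.0}"
  local elapsed
  # -w '%{time_total}' prints the request duration in seconds;
  # the response body is discarded with -o /dev/null.
  if ! elapsed=$(curl -s -o /dev/null --max-time 5 --unix-socket "$sock" \
                   -w '%{time_total}' http://localhost/_ping); then
    echo "unreachable"
    return 1
  fi
  # awk does the floating-point comparison that shell arithmetic cannot.
  awk -v t="$elapsed" -v max="$threshold" \
    'BEGIN { if (t + 0 > max + 0) print "slow"; else print "ok" }'
}
```

Calling `daemon_ping_status` with no arguments is read-only. A result of "slow" or "unreachable" points at the local daemon rather than the registry.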
    

How to diagnose it

  1. Read the exact error from the daemon logs. The CLI output is often truncated. Look for “unauthorized” for auth issues, “connection refused” or “EOF” for network issues, “no space left on device” for disk issues, and “manifest unknown” for missing tags. Use the log line to decide which branch to follow next.

  2. Verify local daemon and storage health. Run the /_ping probe and docker system df. A hung daemon or full disk mimics registry failures. If /_ping takes longer than one second or the disk is more than 80 percent full, fix the local condition first. A daemon that is alive but slow can drop connections mid-pull.

  3. Test registry reachability from the host. Use curl -I https://<registry>/v2/ or nc -zv <registry> 443 from the host. If the host cannot connect, the problem is outside Docker: check DNS resolution, routing tables, host firewall rules, and physical links. If the host connects but Docker does not, inspect the daemon’s proxy environment variables and the bridge MTU. An MTU mismatch between the host interface and the Docker bridge causes silent connection drops during large layer downloads.

  4. Verify authentication state. Inspect ~/.docker/config.json and re-run docker login <registry>. Tokens expire, and credential helpers vary by operating system. If re-authenticating fixes the pull, the previous token had expired or the credential helper held stale data.

  5. Confirm the tag and architecture exist. A “manifest unknown” error means the registry has no manifest for the requested tag. When the tag exists but the multi-architecture manifest does not include the host’s platform variant, the daemon typically reports “no matching manifest” for that platform instead. Try pulling by digest instead of tag, or query the registry to verify the tag is still published.

  6. Check for registry rate limiting. Look for HTTP 429 in the daemon logs or ratelimit-remaining headers near zero. If you are limited, authenticate, switch to a mirror, or wait for the window to reset.
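
For the rate limit check in step 6, the remaining quota can be read from the response headers of a manifest request. The sketch below only parses headers from standard input; it assumes the header shape `ratelimit-remaining: <count>;w=<window-seconds>`, and the function name is hypothetical. Registries that do not send this header produce no output.

```shell
# Extract the remaining-pull count from HTTP response headers on stdin.
# Assumes a header such as "ratelimit-remaining: 76;w=21600".
# Exits non-zero when the header is absent.
ratelimit_remaining() {
  # Strip carriage returns, then split each header line on ": ".
  tr -d '\r' | awk -F': *' '
    tolower($1) == "ratelimit-remaining" {
      split($2, parts, ";")   # drop the ";w=<window>" suffix
      print parts[1]
      found = 1
    }
    END { if (!found) exit 1 }'
}
```

One possible use is `curl -sI -H "Authorization: Bearer $TOKEN" https://<registry>/v2/<repository>/manifests/<tag> | ratelimit_remaining`, where `$TOKEN` is a placeholder for a bearer token obtained from the registry's auth service. A value near zero means further pulls are likely to receive HTTP 429.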

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Image pull rate | Cache thrashing or pull storms | Sustained rate above deployment frequency with no new containers |
| Pull latency | Delays scaling and recovery | Greater than 3x baseline or sustained latency above 5 minutes |
| Docker disk usage | Blocks layer writes | Greater than 80 percent used |
| Daemon responsiveness | A hung daemon cannot complete pulls | /_ping response above 1 second or timeout |
| Container network errors | Corrupts layer downloads | Nonzero rx_errors or dropped packets on container interfaces |
| Registry rate limit headers | 429 responses | ratelimit-remaining approaching zero |
| Container creation failures | Pull failures surface as create errors | Failure rate above zero for new images |
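
The disk usage warning sign is easy to check from a script. This is a minimal sketch, assuming the Docker data root is /var/lib/docker and the 80 percent threshold from the table; both function names are hypothetical.

```shell
# Print the percentage of space used on the filesystem that holds a path.
disk_used_pct() {
  # -P forces the POSIX one-line-per-filesystem format, so the fifth
  # column is always the capacity percentage.
  df -P "${1:-/var/lib/docker}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Exit 0 when usage is above the threshold (default 80), 1 when it is
# not, and 2 when usage could not be determined.
disk_over_threshold() {
  local used
  used=$(disk_used_pct "$1")
  [ -n "$used" ] || return 2
  [ "$used" -gt "${2:-80}" ]
}
```

For example, `disk_over_threshold /var/lib/docker 80 && echo "free space before pulling"` prints the warning only when the filesystem is more than 80 percent full.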

Fixes

If the cause is registry rate limiting

Authenticate with docker login to move from the anonymous tier to the authenticated tier. Deploy a local pull-through registry mirror to reduce external requests. Avoid floating tags such as latest; they force unnecessary re-pulls.
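
A pull-through mirror is configured on the daemon through the `registry-mirrors` key in daemon.json. The sketch below only prints a candidate fragment; the function name and the mirror URL are placeholders, and to my understanding the setting affects Docker Hub images only.

```shell
# Print a daemon.json fragment that points the daemon at a registry mirror.
# The URL passed in is a placeholder for your own mirror endpoint.
render_mirror_config() {
  local mirror_url="$1"
  cat <<EOF
{
  "registry-mirrors": ["${mirror_url}"]
}
EOF
}
```

Running `render_mirror_config https://mirror.example.internal` shows the fragment to merge into /etc/docker/daemon.json. Merge it by hand if the file already contains other settings, then restart the daemon for the change to take effect.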

If the cause is authentication failure

Re-run docker login <registry> and verify the entry in ~/.docker/config.json. If you are running in Kubernetes, confirm the pod references an imagePullSecret; Kubernetes does not use the node daemon’s config.json for pod image pulls. For private registries, check that the authentication token has not expired and that the credential helper is storing the secret correctly.

If the cause is network or DNS

Verify DNS resolution for the registry hostname from the host. Check the bridge MTU; a mismatch between the host interface and docker0 causes “unexpected EOF” during large layer downloads. If the host uses an HTTP proxy, ensure the daemon environment includes HTTP_PROXY and HTTPS_PROXY, and restart the daemon after any change. If the proxy terminates TLS, ensure the system trust store includes the proxy’s CA certificate.
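
On systemd hosts, the daemon's proxy variables are usually supplied through a drop-in unit file. The following is a minimal sketch that prints such a drop-in; the function name, the proxy URL, and the default bypass list are assumptions for illustration.

```shell
# Print a systemd drop-in that sets proxy variables for docker.service.
# Arguments: proxy URL, optional comma-separated bypass list.
render_proxy_dropin() {
  local proxy_url="$1"
  local bypass="${2:-localhost,127.0.0.1}"
  cat <<EOF
[Service]
Environment="HTTP_PROXY=${proxy_url}"
Environment="HTTPS_PROXY=${proxy_url}"
Environment="NO_PROXY=${bypass}"
EOF
}
```

One way to apply it is to write the output to /etc/systemd/system/docker.service.d/http-proxy.conf, then run `systemctl daemon-reload` followed by `systemctl restart docker`. Add any internal registry hostnames to the bypass list so pulls from them skip the proxy.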

If the cause is storage or I/O

Free space under /var/lib/docker by pruning dangling images, truncating oversized container logs, and removing unused volumes. If pulls fail with layer errors after an unclean shutdown, overlay2 metadata may be corrupt. Remove the affected image with docker rmi and re-pull it. If removal fails because a stopped container references the image, remove the container first. docker system prune deletes stopped containers, unused networks, dangling images, and unused build cache. Run it only after confirming what will be deleted.
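
Before deleting anything, it helps to list the candidates. This is a minimal read-only sketch using standard docker list filters; the function name is hypothetical, and the listing approximates rather than exactly matches what a prune would remove.

```shell
# List cleanup candidates without deleting anything.
preview_prune() {
  echo "== dangling images =="
  docker image ls --filter dangling=true
  echo "== exited containers =="
  docker ps -a --filter status=exited
  echo "== dangling volumes =="
  docker volume ls --filter dangling=true
}
```

Review the output of `preview_prune` first, and check that no exited container holds data or logs you still need.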

If the cause is a missing or incorrect image reference

Verify the tag exists in the registry. Pull by digest instead of tag for immutable references. Confirm the image manifest includes the host architecture; when a multi-arch image omits the requested platform variant, the pull typically fails with “no matching manifest” for that platform rather than “manifest unknown”. If the publisher recently updated the image, they may have deleted the tag.
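
To compare the host against a manifest's platform list, the kernel's machine name has to be translated into the architecture names used in image manifests. The following is a minimal sketch covering common cases; the function name is hypothetical and the list is not exhaustive.

```shell
# Translate a `uname -m` value into the architecture name used in
# image manifests. With no argument, the current host is inspected.
docker_arch() {
  case "${1:-$(uname -m)}" in
    x86_64|amd64)   echo "amd64" ;;
    aarch64|arm64)  echo "arm64" ;;
    armv7l)         echo "arm/v7" ;;
    armv6l)         echo "arm/v6" ;;
    i386|i686)      echo "386" ;;
    ppc64le)        echo "ppc64le" ;;
    s390x)          echo "s390x" ;;
    riscv64)        echo "riscv64" ;;
    *)              echo "unknown"; return 1 ;;
  esac
}
```

The result, prefixed with the operating system (for example `linux/$(docker_arch)`), can be compared against the platform entries printed by `docker manifest inspect <image>:<tag>`.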

Prevention

Set log rotation and disk cleanup policies. Alert on pull latency above 3x baseline or 5 minutes. Pin images by digest in deployment configs. Use a local registry mirror for frequently pulled base images. Include daemon pull errors in deployment health checks. Set container resource limits so image extraction does not starve the daemon during large pulls.
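
Pinning by digest requires the digest of an image you have already validated. This is a minimal sketch, assuming the image was pulled from a registry and is present locally; the function name is hypothetical, and images built locally and never pushed have no repo digest, in which case the command fails.

```shell
# Print the digest-qualified reference of a locally present image,
# in the form <repository>@sha256:<digest>.
image_digest_ref() {
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}
```

For example, `image_digest_ref <image>:<tag>` prints a reference that can replace the tag in a deployment config, so every node pulls the same content even if the tag later moves.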

How Netdata helps

  • Compare pull latency against host network I/O, disk I/O, and daemon CPU to locate local vs. external bottlenecks.
  • Alert on Docker disk usage before it blocks layer writes.
  • Track daemon /_ping latency to distinguish registry outages from daemon hangs.
  • Monitor container creation failure rates; they spike after pull failures.
  • Plot image pull rate against registry connectivity errors.