Docker image pull failures: registry, network, and auth diagnosis

What This Means

When you run docker pull, the daemon negotiates a TLS connection to the registry, authenticates if required, resolves the manifest for the requested tag and architecture, then downloads missing layers. If any step fails, the pull aborts.

Registry errors surface as HTTP 429 or 401/403 responses. Network errors appear as timeouts, connection resets, or TLS handshake failures. Local problems such as a full disk or a hung daemon can also abort a pull even when the registry is healthy. In orchestrated environments, a single node’s pull failure can trigger the scheduler to retry on other nodes, turning a localized auth error into a cluster-wide rate limit storm. Distinguishing these layers quickly is the difference between a five-minute fix and a prolonged outage.

Common Causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Registry rate limiting | HTTP 429 or “toomanyrequests”; pulls succeed during low traffic | Auth tier: anonymous vs. authenticated; rate limit headers |
| Authentication failure | “denied” or “unauthorized” when pulling private images | `~/.docker/config.json` and `docker login` state |
| Network or DNS path failure | “connection refused”, “no such host”, or “unexpected EOF” mid-download | DNS resolution and TCP connectivity to registry endpoint |
| Missing tag or manifest | “manifest unknown” for a specific tag; other tags pull fine | Tag existence in registry; manifest architecture match |
| Local disk exhaustion | “no space left on device” during layer extraction | `docker system df`; `df -h /var/lib/docker/` |
| TLS certificate problem | “x509: certificate signed by unknown authority” against a private registry | System CA bundle and registry certificate |
| MTU mismatch or proxy | “unexpected EOF” mid-download; first attempt fails but retries succeed | `docker0` MTU vs. host interface; proxy and firewall rules |

Quick Checks

Run these checks first. Favor read-only commands before making changes.

Daemon responsiveness. A hung daemon returns slowly or not at all. This distinguishes local daemon problems from registry problems.
```
time curl --max-time 5 --unix-socket /var/run/docker.sock http://localhost/_ping
```
Docker disk usage. Layer extraction needs free space. A full disk can abort a pull mid-download.
```
docker system df
df -h /var/lib/docker/
```

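Recent pull events and errors. The events stream shows requested images; daemon logs contain the full registry response. Without `--until`, `docker events` keeps streaming until interrupted, so both commands are bounded to the last hour.
```
docker events --filter event=pull --since 1h --until "$(date +%s)" --format '{{.Time}} {{.Actor.ID}}'
journalctl -u docker.service --since "1 hour ago" | grep -i "pull\|download\|layer"
```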
Reproduce the failure. Capture the exact error and measure latency against your baseline. This writes image layers to disk.
```
time docker pull <image>:<tag>
```
Host DNS resolution. If the host cannot resolve the registry hostname, Docker cannot connect.
```
nslookup <registry-hostname>
```
Authentication state. Expired or missing credentials produce auth errors that look like repository denial. This file contains secrets.
```
cat ~/.docker/config.json
```
Storage driver status. Corrupt overlay2 metadata causes pull failures that resemble registry errors.
```
docker info | grep -A5 "Storage Driver"
```

How To Diagnose It

Read the exact error from the daemon logs. The CLI output is often truncated. Look for “unauthorized” for auth issues, “connection refused” or “EOF” for network issues, “no space left on device” for disk issues, and “manifest unknown” for missing tags. Use the log line to decide which branch to follow next.
Verify local daemon and storage health. Run the /_ping probe and docker system df. A hung daemon or full disk mimics registry failures. If /_ping takes longer than one second or the disk is more than 80 percent full, fix the local condition first. A daemon that is alive but slow can drop connections mid-pull.
Test registry reachability from the host. Use curl -I https://<registry>/v2/ or nc -zv <registry> 443 from the host. If the host cannot connect, the problem is outside Docker: check DNS resolution, routing tables, host firewall rules, and physical links. If the host connects but Docker does not, inspect the daemon’s proxy environment variables and the bridge MTU. An MTU mismatch between the host interface and the Docker bridge causes silent connection drops during large layer downloads.
Verify authentication state. Inspect ~/.docker/config.json and re-run docker login <registry>. Tokens expire, and credential helpers vary by operating system. If re-authenticating fixes the pull, the previous token had expired or the credential helper held stale data.
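Confirm the tag and architecture exist. A “manifest unknown” error means the tag does not exist in the repository. If the tag exists but the multi-architecture manifest does not include the host’s platform, Docker typically reports “no matching manifest for <platform> in the manifest list entries” instead. Query the registry to verify the tag is still published, or pull by digest if you have a known-good one.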
Check for registry rate limiting. Look for HTTP 429 in the daemon logs or ratelimit-remaining headers near zero. If you are limited, authenticate, switch to a mirror, or wait for the window to reset.

```mermaid
flowchart TD
    A[docker pull fails] --> B{Check daemon logs for error}
    B -->|auth error| C[Verify docker login and config.json]
    B -->|network error| D[Test DNS and TCP to registry]
    B -->|manifest unknown| E[Verify tag and architecture]
    B -->|disk error| F[Check docker system df and host disk]
    B -->|429 or rate limit| G[Authenticate or use a mirror]
    C --> H[Retry pull]
    D --> I[Fix DNS, proxy, or MTU]
    E --> J[Correct image reference]
    F --> K[Free disk space or re-pull]
    G --> H
    I --> H
    J --> H
    K --> H
```
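
The branching in the diagnosis steps can be sketched as a small shell function that maps an error message to the branch to follow. This is a minimal illustration, not a complete parser: the function name `classify_pull_error` and the category labels are hypothetical, and the substrings are the ones quoted in this guide, so real daemon messages may need additional patterns.

```shell
#!/bin/sh
# Sketch: map a pull error message to a diagnosis branch.
# The function name and the category labels are hypothetical.
classify_pull_error() {
    # Lowercase the message so matching is case-insensitive.
    msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$msg" in
        # Rate limiting is checked first because its message can mention authentication.
        *toomanyrequests*|*"too many requests"*|*"rate limit"*) echo rate_limit ;;
        *unauthorized*|*denied*)                                echo auth ;;
        *"manifest unknown"*|*"no matching manifest"*)          echo manifest ;;
        *"no space left on device"*)                            echo disk ;;
        *x509*)                                                 echo tls ;;
        *"no such host"*|*"connection refused"*|*"connection reset"*|*"unexpected eof"*|*timeout*)
                                                                echo network ;;
        *)                                                      echo unknown ;;
    esac
}
```

For example, `classify_pull_error "$(docker pull <image>:<tag> 2>&1 | tail -n 1)"` prints the branch for the last line of a failed pull. An `unknown` result means the daemon logs need to be read by hand.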

Metrics & Signals To Monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Image pull rate | Cache thrashing or pull storms | Sustained rate above deployment frequency with no new containers |
| Pull latency | Delays scaling and recovery | Greater than 3x baseline or sustained latency above 5 minutes |
| Docker disk usage | Blocks layer writes | Greater than 80 percent used |
| Daemon responsiveness | A hung daemon cannot complete pulls | `/_ping` response above 1 second or timeout |
| Container network errors | Corrupts layer downloads | Nonzero `rx_errors` or dropped packets on container interfaces |
| Registry rate limit headers | 429 responses | `ratelimit-remaining` approaching zero |
| Container creation failures | Pull failures surface as create errors | Failure rate above zero for new images |

Fixes

If The Cause Is Registry Rate Limiting

Authenticate with docker login to move from anonymous to authenticated tier. Deploy a local pull-through registry mirror to reduce external requests. Avoid floating tags such as latest; they force unnecessary re-pulls.
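
To see how close you are to the limit before changing anything, you can read the rate limit headers directly. The sketch below assumes the Docker Hub header format `ratelimit-remaining: <count>;w=<seconds>`; other registries may use different headers or none. The function name `ratelimit_remaining` is hypothetical, and the commented request follows the pattern in Docker Hub's documentation as I understand it, so verify it against the current docs before relying on it.

```shell
#!/bin/sh
# Sketch: print the remaining pull allowance from HTTP response headers on stdin.
# Assumes the Docker Hub format "ratelimit-remaining: <count>;w=<window seconds>".
ratelimit_remaining() {
    # Strip carriage returns, match the header name case-insensitively,
    # and print only the count before the ";w=" window suffix.
    tr -d '\r' | awk -F': *' '
        tolower($1) == "ratelimit-remaining" { split($2, parts, ";"); print parts[1]; exit }
    '
}

# Example use against Docker Hub (needs network access and jq; left commented
# out so that defining the function has no side effects):
#   token=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
#   curl -s --head -H "Authorization: Bearer $token" \
#     https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest | ratelimit_remaining
```

An empty result means the response carried no `ratelimit-remaining` header, which is not the same as having no limit.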

If The Cause Is Authentication Failure

Re-run docker login <registry> and verify the entry in ~/.docker/config.json. If you are running in Kubernetes, confirm the pod references an imagePullSecret; pod image pulls go through the kubelet and the node’s container runtime, so a docker login run in your own shell on the node does not normally apply to them. For private registries, check that the authentication token has not expired and that the credential helper is storing the secret correctly.

If The Cause Is Network Or DNS

Verify DNS resolution for the registry hostname from the host. Check the bridge MTU; a mismatch between the host interface and docker0 causes “unexpected EOF” during large layer downloads. If the host uses an HTTP proxy, ensure the daemon environment includes HTTP_PROXY and HTTPS_PROXY, and restart the daemon after any change. If the proxy terminates TLS, ensure the system trust store includes the proxy’s CA certificate.
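
On systemd hosts, one way to confirm the proxy settings reached the daemon is to inspect the unit's environment with `systemctl show --property=Environment docker`. The helper below is a sketch that parses that output. The function name `daemon_proxy_vars` is hypothetical, and it only checks the upper-case variable names. A proxy can also be configured in the daemon's own configuration file on newer Docker releases, so a `missing` result here is a hint rather than proof.

```shell
#!/bin/sh
# Sketch: read `systemctl show --property=Environment docker` output on stdin
# and report whether each proxy variable appears in the daemon's environment.
daemon_proxy_vars() {
    env_line=$(cat)
    for name in HTTP_PROXY HTTPS_PROXY NO_PROXY; do
        # Pad with spaces so each variable name must start at a word boundary.
        case " ${env_line#Environment=} " in
            *" $name="*) echo "$name=set" ;;
            *)           echo "$name=missing" ;;
        esac
    done
}

# Example use (commented out so that defining the function has no side effects):
#   systemctl show --property=Environment docker | daemon_proxy_vars
```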

If The Cause Is Storage Or I/O

Free space under /var/lib/docker by pruning dangling images, truncating oversized container logs, and removing unused volumes. If pulls fail with layer errors after an unclean shutdown, overlay2 metadata may be corrupt. Remove the affected image with docker rmi and re-pull it. If removal fails because a stopped container references the image, remove the container first. docker system prune deletes stopped containers, unused networks, dangling images, and unused build cache. Run it only after confirming what will be deleted.
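
Before pruning anything, it helps to confirm that the filesystem really is over the threshold. The following read-only sketch parses `df -P` output. The function names are hypothetical, and the default of 80 percent is the warning level used in this guide rather than a Docker requirement.

```shell
#!/bin/sh
# Sketch: print the used-space percentage (integer, no % sign) of the
# filesystem that holds the given path.
disk_used_pct() {
    # -P forces the POSIX one-line-per-filesystem format; column 5 is capacity.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage is above the threshold, which defaults to 80.
disk_over_threshold() {
    pct=$(disk_used_pct "$1")
    # Treat an unreadable path or unparsable output as an error, not as "fine".
    [ -n "$pct" ] || return 2
    [ "$pct" -gt "${2:-80}" ]
}
```

For example, `disk_over_threshold /var/lib/docker && echo "free space before pulling"` prints the reminder only when usage exceeds 80 percent.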

If The Cause Is A Missing Or Incorrect Image Reference

Verify the tag exists in the registry. Pull by digest instead of tag for immutable references. Confirm the image manifest includes the host architecture; a multi-arch image that omits the requested platform typically fails with “no matching manifest for <platform> in the manifest list entries” rather than “manifest unknown”. If the publisher recently updated the image, they may have deleted the tag.
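
A digest reference has the form `<image>@sha256:<64 hex characters>`. As a small sketch, the hypothetical helper below checks whether a reference is pinned that way, which can be useful when reviewing deployment configs. It validates only the shape of the string, not that the digest exists in the registry.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) when an image reference ends in a sha256 digest.
# Checks the string format only; it does not contact any registry.
is_digest_pinned() {
    printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}
```

For example, `is_digest_pinned "registry.example.com/app:latest" || echo "floating tag"` flags a tag-based reference; `registry.example.com` is a placeholder hostname.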

Prevention

Set log rotation and disk cleanup policies. Alert on pull latency above 3x baseline or 5 minutes. Pin images by digest in deployment configs. Use a local registry mirror for frequently pulled base images. Include daemon pull errors in deployment health checks. Set container resource limits so image extraction does not starve the daemon during large pulls.
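
The latency alert rule above can be sketched in shell as follows. Both function names are hypothetical, durations are whole seconds, and `date +%s` is assumed to be available, as it is on typical Linux hosts. A monitoring system would normally evaluate this rule for you; the sketch only makes the threshold logic explicit.

```shell
#!/bin/sh
# Sketch: run a command and print its wall-clock duration in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo $((end - start))
    # Preserve the command's exit status for the caller.
    return $status
}

# Succeed (exit 0) when the observed duration is above 3x the baseline
# or above 300 seconds (5 minutes).
pull_latency_alert() {
    baseline_s=$1
    observed_s=$2
    [ "$observed_s" -gt $((baseline_s * 3)) ] || [ "$observed_s" -gt 300 ]
}
```

With a baseline of 20 seconds, `pull_latency_alert 20 "$(elapsed_seconds docker pull <image>:<tag>)"` succeeds when the pull took longer than 60 seconds.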

How Netdata Helps

Compare pull latency against host network I/O, disk I/O, and daemon CPU to locate local vs. external bottlenecks.
Alert on Docker disk usage before it blocks layer writes.
Track daemon /_ping latency to distinguish registry outages from daemon hangs.
Monitor container creation failure rates; they spike after pull failures.
Plot image pull rate against registry connectivity errors.

If docker ps or docker inspect hangs while you are diagnosing, see Docker commands hang: docker ps, inspect, and exec freezes.
If the daemon itself becomes unresponsive during pulls, see Docker daemon not responding: how to troubleshoot a hung dockerd.
If containers exit immediately after a successful pull, see Docker container exits immediately: how to diagnose it.
For disk space issues that block pulls, see Docker disk space full: how to troubleshoot /var/lib/docker.
For DNS resolution issues inside containers that can mimic registry failures, see Docker DNS not working inside containers.