Docker commands hang: docker ps, inspect, and exec freeze
docker ps, docker inspect, and docker exec hang while containers continue serving traffic. This is a Docker daemon hang: the management plane is dead while the data plane survives. Standard process monitors show dockerd as alive, yet you cannot manage, inspect, or evacuate workloads.
This guide covers how to distinguish a hang from a crash, identify whether the root cause is storage, a stuck shim, a plugin, or an internal deadlock, and recover without unnecessary host reboots or container kills.
What this means
When dockerd hangs, the process remains in the process table but stops making progress on API requests. Lifecycle operations and state queries block on internal mutexes or I/O. Running containers survive because containerd-shim processes supervise them independently. The runtime layer is fine; the control plane is not.
A crashed daemon is detected immediately by systemd and can be restarted. A hung daemon passes process-level health checks while rendering the host unmanageable. Orchestrators relying on docker ps or docker inspect mark the node unhealthy. Automated recovery scripts may themselves deadlock.
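A process check alone cannot tell these states apart. A minimal liveness probe, assuming the default Unix socket path, combines both checks with a hard timeout:

```bash
#!/usr/bin/env bash
# Sketch: distinguish a crashed dockerd from a hung one.
# Assumes the default socket at /var/run/docker.sock.
if ! pgrep -x dockerd >/dev/null; then
  echo "dockerd crashed: process missing"; exit 1
fi
if ! curl -s --max-time 5 --unix-socket /var/run/docker.sock \
     http://localhost/_ping | grep -q OK; then
  echo "dockerd hung: process alive but API unresponsive"; exit 2
fi
echo "dockerd healthy"
```

Distinct exit codes let an orchestrator or alerting script treat a crash and a hang differently.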
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Storage driver deadlock | Most commands hang; high host I/O wait | df -h /var/lib/docker and df -i /var/lib/docker |
| Disk or inode exhaustion | Daemon logs show “no space left on device” | du -sh /var/lib/docker/*/ |
| containerd shim stuck | One container stuck in stopping or removing | ctr -n moby tasks list |
| Volume or network plugin hang | Hangs correlate with specific volume or network operations | journalctl -u docker.service for plugin errors |
| File descriptor exhaustion | Daemon accepts connections but fails new operations | ls /proc/$(pgrep -x dockerd)/fd \| wc -l |
| Internal daemon deadlock | /_ping works but state commands hang | Goroutine dump via /debug/pprof/goroutine |
Quick checks
Run these to triage in under 60 seconds:
```bash
# Is dockerd in the process table?
pgrep -x dockerd && echo "dockerd alive" || echo "dockerd missing"

# Probe the API with a hard timeout
curl -s --max-time 5 --unix-socket /var/run/docker.sock http://localhost/_ping

# Bypass dockerd and check containerd
ctr version
ctr -n moby tasks list

# Check if container processes are still running
pgrep -a containerd-shim

# Check disk space and inodes
df -h /var/lib/docker
df -i /var/lib/docker

# Check dockerd file descriptor usage against its limit
ls /proc/$(pgrep -x dockerd)/fd | wc -l
cat /proc/$(pgrep -x dockerd)/limits | grep "open files"

# Check for recent daemon errors
journalctl -u docker.service --priority=err --since "10 minutes ago"
```
What bad looks like:
- `/_ping` times out or takes >5 seconds while the process exists: the daemon is hung.
- `ctr` commands work but `curl` to the Docker socket fails: dockerd is hung, containerd is fine.
- `df` shows 100% usage on `/var/lib/docker`: disk exhaustion is the likely blocker.
- FD count near the `Max open files` limit: FD exhaustion is choking the daemon.
How to diagnose it
1. Confirm it is a hang, not a crash. Run `pgrep -x dockerd`. If the process is missing, the daemon crashed; see Docker daemon not responding: how to troubleshoot a hung dockerd. If it exists, continue here.
2. Test API responsiveness. Run `curl --max-time 5 --unix-socket /var/run/docker.sock http://localhost/_ping`. If it returns `OK`, the HTTP handler is alive but other subsystems may be locked. If it times out, the daemon is fully hung.
3. Verify containers are still running. Use `ctr -n moby tasks list` or `pgrep -c containerd-shim`. If containers are alive, you are dealing with a daemon-level hang rather than a kernel or containerd failure. This distinction determines whether a dockerd restart is safe.
4. Check host storage health. Run `df -h /var/lib/docker` and `df -i /var/lib/docker`. If the filesystem or inodes are exhausted, storage operations inside dockerd block indefinitely. If I/O is not completely wedged, run `du -sh /var/lib/docker/*/` to identify whether logs, overlay2 layers, or volumes are the largest consumers.
5. Inspect daemon resource exhaustion. Check `ls /proc/$(pgrep -x dockerd)/fd | wc -l` against the process limit in `/proc/$(pgrep -x dockerd)/limits`. If FD usage is above 80% of the limit, the daemon cannot open new sockets or files. Check the thread count in `/proc/$(pgrep -x dockerd)/status`; rapid growth without corresponding workload indicates internal contention.
6. Test containerd independently. Run `ctr version`. If containerd is also unresponsive, the issue is deeper than dockerd, likely a kernel cgroup or storage driver problem affecting both daemons. If containerd is healthy but dockerd is not, the deadlock is in dockerd or in the dockerd-containerd gRPC path.
7. Capture diagnostics before recovery. If debug mode is enabled and `/_ping` responds, fetch a goroutine dump: `curl --max-time 5 --unix-socket /var/run/docker.sock "http://localhost/debug/pprof/goroutine?debug=2" > /tmp/goroutine-dump.txt`. Collect daemon logs with `journalctl -u docker.service --since "30 minutes ago"`. This data is lost after restart; a capture script is sketched after this list.
8. Narrow the subsystem. If `/_ping` works but `docker ps` hangs, the container list lock or storage driver is blocked. If `docker ps` works but `docker inspect` on a specific container hangs, that container's shim or volume is likely stuck. If all commands hang uniformly, suspect global resource exhaustion or a global lock.
9. Check live-restore status before restarting. If `live-restore` is enabled in `/etc/docker/daemon.json`, restarting dockerd (`systemctl restart docker`) will not kill running containers. Without live-restore, a restart terminates all containers. Do not restart blindly.
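The pre-restart capture in step 7 can be scripted so nothing is forgotten under pressure. A sketch, assuming debug mode is on and the default socket path; `/tmp/docker-triage-*` is an arbitrary output location:

```bash
#!/usr/bin/env bash
# Sketch: save dockerd diagnostics before a restart destroys them.
OUT=/tmp/docker-triage-$(date +%s)
mkdir -p "$OUT"

# Goroutine dump: only works while the API still answers and debug is on.
curl -s --max-time 5 --unix-socket /var/run/docker.sock \
  "http://localhost/debug/pprof/goroutine?debug=2" > "$OUT/goroutines.txt"

# If the API is fully hung, SIGUSR1 makes dockerd write a goroutine stack
# dump to disk anyway; the daemon log notes where the file was written:
# kill -USR1 "$(pgrep -x dockerd)"

# Daemon logs, process state, and FD count.
journalctl -u docker.service --since "30 minutes ago" > "$OUT/docker.log"
cat "/proc/$(pgrep -x dockerd)/status" > "$OUT/dockerd-status.txt"
ls "/proc/$(pgrep -x dockerd)/fd" | wc -l > "$OUT/fd-count.txt"

echo "Diagnostics saved in $OUT"
```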
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Docker daemon response latency | Rising latency precedes hangs | /_ping >500ms sustained |
| Docker daemon process health | Distinguishes hang from crash | Process exists but socket unresponsive |
| Docker daemon goroutine count | Rapid growth indicates forming deadlock | Count growing without container growth |
| Docker daemon file descriptor count | FD exhaustion causes silent failures | Usage >75% of process limit |
| Host disk I/O utilization | Storage driver hangs on saturated I/O | %util >80% for back-end device |
| Docker disk usage | Exhaustion blocks all creation and I/O | >80% filesystem usage on /var/lib/docker |
| containerd responsiveness | Isolates dockerd from runtime issues | ctr version slow or failing |
| Container count by state | Dead containers signal prior cleanup failures | Any container in “dead” state |
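Daemon response latency is the leading indicator in this table. A minimal sampler, assuming the default socket path, is enough to trend it:

```bash
# Sample /_ping latency every 10 seconds. Sustained values above ~0.5s
# are an early warning that the daemon is starting to wedge.
while true; do
  t=$(curl -s -o /dev/null -w '%{time_total}' --max-time 5 \
      --unix-socket /var/run/docker.sock http://localhost/_ping)
  echo "$(date -Is) ping_seconds=${t:-timeout}"
  sleep 10
done
```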
Fixes
If the cause is storage exhaustion
Do not restart dockerd first. Free disk space from the host:
- Truncate oversized container log files: `truncate -s 0 /var/lib/docker/containers/<id>/<id>-json.log`. This is safe while the container runs; a sketch for finding the largest logs follows this list.
- If the daemon responds enough to accept commands, run `docker image prune -a --filter "until=48h"` to remove unused images.
- Expand the filesystem or migrate `/var/lib/docker` if growth is legitimate.
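To see which container logs are the worst offenders before truncating, something like this helps (a sketch; review the list before acting on it):

```bash
# List the ten largest container JSON log files, biggest first.
du -ah /var/lib/docker/containers/*/*-json.log 2>/dev/null \
  | sort -rh | head -n 10

# Then truncate a specific one in place; safe while the container runs.
# truncate -s 0 /var/lib/docker/containers/<id>/<id>-json.log
```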
If the cause is a storage driver or I/O deadlock
- If the backing storage is network-attached (NFS, iSCSI), check the storage network path. A hung NFS mount can block dockerd in uninterruptible kernel I/O, which the Go runtime cannot time out.
- If disk and inodes are healthy but I/O wait is extreme, wait for the storm to subside. If the storage layer has wedged in kernel space, reboot the host.
If the cause is a stuck containerd shim
- Identify the stuck task with `ctr -n moby tasks list`.
- Attempt to stop it via `ctr`, as in the sketch below. If that fails, killing the shim process will terminate that container but may unblock dockerd without affecting others.
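A hedged sequence for stopping a stuck task through containerd directly, bypassing dockerd entirely (`<container-id>` is the ID shown by the task list):

```bash
# Find the stuck task in Docker's containerd namespace.
ctr -n moby tasks list

# Force-kill the task, then delete it so the shim can exit.
ctr -n moby tasks kill -s SIGKILL <container-id>
ctr -n moby tasks delete <container-id>

# Last resort: kill the shim itself. This terminates that one container.
# pgrep -a containerd-shim   # find the shim PID for the container
# kill -9 <shim-pid>
```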
If the cause is a plugin failure
- Restart the volume or network plugin if it runs as a separate process.
- As a last resort, restart dockerd with live-restore enabled so running containers survive.
If the cause is file descriptor exhaustion
- Stop non-essential log tailers and monitoring scripts that hold Docker API connections; the sketch after this list shows how to see what the descriptors point at.
- Restart dockerd to clear leaked FDs. This is disruptive without live-restore, but FD leaks rarely resolve without restart.
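To confirm what the leaked descriptors actually reference before restarting, group them by target type (a sketch; run as root):

```bash
# Group dockerd's open file descriptors by what they point at.
# A dominant "socket" count usually means leaked API or plugin connections.
ls -l "/proc/$(pgrep -x dockerd)/fd" 2>/dev/null \
  | awk 'NR>1 {print $NF}' \
  | cut -d: -f1 \
  | sort | uniq -c | sort -rn | head
```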
If the cause is an internal daemon deadlock
- Capture the goroutine dump before restart.
- Restart dockerd. With live-restore, containers survive. Without it, plan a maintenance window or evacuate workloads first.
- After recovery, upgrade Docker if the version is known to have race conditions.
Prevention
- Enable `"live-restore": true` in `/etc/docker/daemon.json`. This allows daemon restarts without container death.
- Monitor daemon API latency, not just process existence. A process check misses deadlocks.
- Configure log rotation. Add `"log-opts": {"max-size": "10m", "max-file": "3"}` to `daemon.json` to prevent log-driven disk exhaustion.
- Alert on `/var/lib/docker` disk usage and inode usage at 70%, not 90%.
- Monitor dockerd FD count and goroutine count trends. Sustained growth without workload increase signals a leak.
- Keep the Docker version patched. Internal deadlocks are often fixed in point releases.
- Set `LimitNOFILE` in the systemd unit high enough for your container density, and monitor actual usage. A combined configuration sketch follows this list.
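Taken together, a minimal hardened setup might look like this. A sketch, not a drop-in: review your existing `daemon.json` before overwriting, and size the limit to your container density:

```bash
# Enable live-restore and log rotation (merge with any existing settings).
cat > /etc/docker/daemon.json <<'EOF'
{
  "live-restore": true,
  "log-opts": { "max-size": "10m", "max-file": "3" }
}
EOF

# Raise dockerd's open-file limit via a systemd drop-in.
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=1048576
EOF

systemctl daemon-reload
systemctl restart docker  # note: this first restart is still disruptive
                          # if live-restore was previously off
```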
How Netdata helps
- Correlates `dockerd` API latency with host disk I/O saturation, exposing storage driver hangs before commands freeze.
- Tracks container count by state, including dead containers that signal prior daemon instability.
- Monitors `dockerd` file descriptor usage and thread count to catch exhaustion trends.
- Alerts on disk space and inode exhaustion on `/var/lib/docker` before they block the daemon.
- Surfaces containerd health independently from `dockerd`, helping isolate the failure layer.
Related guides
- Docker daemon not responding: how to troubleshoot a hung dockerd
- Docker disk space full: how to troubleshoot /var/lib/docker
- Docker container keeps restarting: causes, checks, and fixes
- Docker container exits immediately: how to diagnose it
- Docker CPU throttling: the hidden cause of container latency
- Docker container high memory usage: how to diagnose it
- Docker container high CPU usage: causes and fixes





