Docker commands hang: docker ps, inspect, and exec freeze

docker ps, docker inspect, and docker exec hang while containers continue serving traffic. This is a Docker daemon hang: the management plane is dead while the data plane survives. Standard process monitors show dockerd as alive, yet you cannot manage, inspect, or evacuate workloads.

This guide covers how to distinguish a hang from a crash, identify whether the root cause is storage, a stuck shim, a plugin, or an internal deadlock, and recover without unnecessary host reboots or container kills.

What this means

When dockerd hangs, the process remains in the process table but stops making progress on API requests. Lifecycle operations and state queries block on internal mutexes or I/O. Running containers survive because containerd-shim processes supervise them independently. The runtime layer is fine; the control plane is not.

A crashed daemon is detected immediately by systemd and can be restarted. A hung daemon passes process-level health checks while rendering the host unmanageable. Orchestrators relying on docker ps or docker inspect mark the node unhealthy. Automated recovery scripts may themselves deadlock.
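
This is why a useful liveness probe must test the API with a timeout rather than the process table. A minimal sketch, assuming the default socket path and an arbitrary 5-second budget:

# Distinguish crash (process gone) from hang (process alive, API dead)
pgrep -x dockerd > /dev/null || { echo "dockerd crashed"; exit 1; }
curl -s --max-time 5 --unix-socket /var/run/docker.sock \
  http://localhost/_ping > /dev/null || { echo "dockerd hung"; exit 2; }
echo "dockerd healthy"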

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Storage driver deadlock | Most commands hang; high host I/O wait | df -h /var/lib/docker and df -i /var/lib/docker |
| Disk or inode exhaustion | Daemon logs show "no space left on device" | du -sh /var/lib/docker/*/ |
| containerd shim stuck | One container stuck in stopping or removing | ctr -n moby tasks list |
| Volume or network plugin hang | Hangs correlate with specific volume or network operations | journalctl -u docker.service for plugin errors |
| File descriptor exhaustion | Daemon accepts connections but fails new operations | FD count under /proc/$(pgrep -x dockerd)/fd |
| Internal daemon deadlock | /_ping works but state commands hang | Goroutine dump via /debug/pprof/goroutine |

Quick checks

Run these to triage in under 60 seconds:

# Is dockerd in the process table?
pgrep -x dockerd && echo "dockerd alive" || echo "dockerd missing"

# Probe the API with a hard timeout
curl -s --max-time 5 --unix-socket /var/run/docker.sock http://localhost/_ping

# Bypass dockerd and check containerd
ctr version
ctr -n moby tasks list

# Check if container processes are still running
pgrep -a containerd-shim

# Check disk space and inodes
df -h /var/lib/docker
df -i /var/lib/docker

# Check dockerd file descriptor usage against its limit
ls /proc/$(pgrep -x dockerd)/fd | wc -l
grep "open files" /proc/$(pgrep -x dockerd)/limits

# Check for recent daemon errors
journalctl -u docker.service --priority=err --since "10 minutes ago"

What bad looks like:

  • /_ping times out, or responds only after several seconds, while the process exists: the daemon is hung.
  • ctr commands work but curl to the Docker socket fails: dockerd is hung, containerd is fine.
  • df shows 100% usage on /var/lib/docker: disk exhaustion is the likely blocker.
  • FD count near the "Max open files" limit: FD exhaustion is choking the daemon.

How to diagnose it

  1. Confirm it is a hang, not a crash. Run pgrep -x dockerd. If the process is missing, the daemon crashed. See Docker daemon not responding: how to troubleshoot a hung dockerd. If it exists, continue here.

  2. Test API responsiveness. Run curl --max-time 5 --unix-socket /var/run/docker.sock http://localhost/_ping. If it returns OK, the HTTP handler is alive but other subsystems may be locked. If it times out, the daemon is fully hung.

  3. Verify containers are still running. Use ctr -n moby tasks list or pgrep -c containerd-shim. If containers are alive, you are dealing with a daemon-level hang rather than a kernel or containerd failure. This distinction determines whether a dockerd restart is safe.

  4. Check host storage health. Run df -h /var/lib/docker and df -i /var/lib/docker. If the filesystem or inodes are exhausted, storage operations inside dockerd block indefinitely. If I/O is not completely wedged, run du -sh /var/lib/docker/*/ to identify whether logs, overlay2 layers, or volumes are the largest consumers.

  5. Inspect daemon resource exhaustion. Check ls /proc/$(pgrep -x dockerd)/fd | wc -l against the process limit in /proc/$(pgrep -x dockerd)/limits. If FD usage is >80% of the limit, the daemon cannot open new sockets or files. Check thread count in /proc/$(pgrep -x dockerd)/status; rapid growth without corresponding workload indicates internal contention.

  6. Test containerd independently. Run ctr version. If containerd is also unresponsive, the issue is deeper than dockerd, likely a kernel cgroup or storage driver problem affecting both daemons. If containerd is healthy but dockerd is not, the deadlock is in dockerd or in the dockerd-containerd gRPC path.

  7. Capture diagnostics before recovery. If debug mode is enabled and /_ping responds, fetch a goroutine dump: curl --max-time 5 --unix-socket /var/run/docker.sock http://localhost/debug/pprof/goroutine?debug=2 > /tmp/goroutine-dump.txt. Collect daemon logs with journalctl -u docker.service --since "30 minutes ago". This data is lost after restart; a combined capture sketch follows this list.

  8. Narrow the subsystem. If /_ping works but docker ps hangs, the container list lock or storage driver is blocked. If docker ps works but docker inspect on a specific container hangs, that container’s shim or volume is likely stuck. If all commands hang uniformly, suspect global resource exhaustion or a global lock.

  9. Check live-restore status before restarting. If live-restore is enabled in /etc/docker/daemon.json, restarting dockerd (systemctl restart docker) will not kill running containers. Without live-restore, a restart terminates all containers. Do not restart blindly; the capture sketch below records live-restore status alongside the other diagnostics.
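
Steps 7 and 9 can be folded into a single pre-restart pass. The following is a sketch rather than a hardened script: the output paths are arbitrary, and the goroutine dump succeeds only when debug mode is enabled and the HTTP handler still answers.

# Collect evidence that a restart would destroy
OUT=/tmp/dockerd-hang-$(date +%s)
mkdir -p "$OUT"

# Goroutine dump (requires "debug": true in daemon.json and a responsive /_ping)
curl -s --max-time 5 --unix-socket /var/run/docker.sock \
  "http://localhost/debug/pprof/goroutine?debug=2" > "$OUT/goroutines.txt"

# Daemon logs covering the window leading into the hang
journalctl -u docker.service --since "30 minutes ago" > "$OUT/docker.log"

# Record live-restore status before deciding whether a restart is safe
grep -s '"live-restore"' /etc/docker/daemon.json > "$OUT/live-restore.txt" \
  || echo "live-restore not set" > "$OUT/live-restore.txt"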

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Docker daemon response latency | Rising latency precedes hangs | /_ping >500ms sustained |
| Docker daemon process health | Distinguishes hang from crash | Process exists but socket unresponsive |
| Docker daemon goroutine count | Rapid growth indicates a forming deadlock | Count growing without container growth |
| Docker daemon file descriptor count | FD exhaustion causes silent failures | Usage >75% of process limit |
| Host disk I/O utilization | Storage driver hangs on saturated I/O | %util >80% on the backing device |
| Docker disk usage | Exhaustion blocks all creation and I/O | >80% filesystem usage on /var/lib/docker |
| containerd responsiveness | Isolates dockerd from runtime issues | ctr version slow or failing |
| Container count by state | Dead containers signal prior cleanup failures | Any container in "dead" state |
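
The first signal in the table is the cheapest to collect yourself. A minimal latency sampler, assuming the default socket path; the 10-second interval and stdout logging are arbitrary choices:

# Sample /_ping latency; curl reports time_total in seconds
while true; do
  if ! t=$(curl -s -o /dev/null -w '%{time_total}' --max-time 5 \
        --unix-socket /var/run/docker.sock http://localhost/_ping); then
    t="timeout"
  fi
  echo "$(date -Is) ping_seconds=$t"
  sleep 10
done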

Fixes

If the cause is storage exhaustion

Do not restart dockerd first. Free disk space from the host:

  • Truncate oversized container log files: truncate -s 0 /var/lib/docker/containers/<id>/<id>-json.log. This is safe while the container runs.
  • If the daemon responds enough to accept commands, run docker image prune -a --filter "until=48h" to remove unused images.
  • Expand the filesystem or migrate /var/lib/docker if growth is legitimate.
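
The log cleanup can be scripted. A sketch assuming the default json-file log driver and standard paths; the 100M threshold is an example, not a recommendation:

# Show the largest container logs first
du -ah /var/lib/docker/containers/*/*-json.log 2>/dev/null | sort -rh | head -5

# Truncate any container log over 100M in place; safe while containers run
find /var/lib/docker/containers -name '*-json.log' -size +100M \
  -exec truncate -s 0 {} \;

# See which subtree is actually consuming the space
du -sh /var/lib/docker/*/ 2>/dev/null | sort -rh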

If the cause is a storage driver or I/O deadlock

  • If the backing storage is network-attached (NFS, iSCSI), check the storage network path. A hung NFS mount can block dockerd threads in uninterruptible kernel sleep, which no userspace timeout can break.
  • If disk and inodes are healthy but I/O wait is extreme, wait for the storm to subside. If the storage layer has wedged in kernel space (the D-state check below confirms this), reboot the host.
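
A kernel-wedged I/O path shows up as processes stuck in uninterruptible sleep (state D). This check is a sketch; iostat requires the sysstat package:

# Processes in uninterruptible sleep; persistent D state implicates the I/O path
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

# Device-level saturation; %util pinned near 100 means the backing device is the bottleneck
iostat -x 1 3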

If the cause is a stuck containerd shim

  • Identify the stuck task with ctr -n moby tasks list.
  • Attempt to stop it via ctr (sketched below). If that fails, killing the shim process will terminate that container but may unblock dockerd without affecting others.
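
The escalation below is a sketch against mainline ctr subcommands, with <id> standing in for the stuck container's ID from the task list:

# Graceful stop first, then escalate to SIGKILL
ctr -n moby tasks kill <id>
ctr -n moby tasks kill -s SIGKILL <id>

# Once the process exits, remove the task record so dockerd can reconcile state
ctr -n moby tasks rm <id>

# Last resort: kill the shim itself; this terminates only that container
pkill -f "containerd-shim.*<id>"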

If the cause is a plugin failure

  • Restart the volume or network plugin if it runs as a separate process.
  • As a last resort, restart dockerd with live-restore enabled so running containers survive.
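
Plugin involvement is usually visible in the daemon log rather than in CLI output. A quick confirmation pass; the plugin unit name below is hypothetical, substitute your own:

# Plugin-related errors in the daemon log
journalctl -u docker.service --since "30 minutes ago" | grep -i plugin

# Legacy plugins running as separate services can be restarted independently
systemctl restart my-volume-plugin.service  # hypothetical unit name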

If the cause is file descriptor exhaustion

  • Stop non-essential log tailers and monitoring scripts that hold Docker API connections.
  • Restart dockerd to clear leaked FDs. This is disruptive without live-restore, but FD leaks rarely resolve without restart.
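
Before restarting, it helps to know what the leaked descriptors point at, since that identifies the leaking client or subsystem. A rough breakdown, assuming root access to /proc:

# Group dockerd's open FDs by target (socket, pipe, file paths)
ls -l /proc/$(pgrep -x dockerd)/fd 2>/dev/null \
  | awk 'NR>1 {print $NF}' | sed 's/:.*//' | sort | uniq -c | sort -rn | head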

If the cause is an internal daemon deadlock

  • Capture the goroutine dump before restart.
  • Restart dockerd (a safe sequence is sketched below). With live-restore, containers survive. Without it, plan a maintenance window or evacuate workloads first.
  • After recovery, upgrade Docker if your version has known deadlock fixes in a later release.
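
A minimal safe-restart sequence, under the assumption that live-restore is enabled; verify first, because without it the restart kills every container:

# Confirm live-restore (docker info hangs if the daemon is hung, hence the timeout)
timeout 5 docker info --format '{{.LiveRestoreEnabled}}' \
  || grep '"live-restore"' /etc/docker/daemon.json

# Restart; with live-restore, shims and their containers stay up
systemctl restart docker

# Verify recovery: the API answers and containers survived
curl -s --max-time 5 --unix-socket /var/run/docker.sock http://localhost/_ping
docker ps --format '{{.Names}} {{.Status}}'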

Prevention

  • Enable "live-restore": true in /etc/docker/daemon.json (see the configuration sketch after this list). This allows daemon restarts without container death.
  • Monitor daemon API latency, not just process existence. A process check misses deadlocks.
  • Configure log rotation. Add "log-opts": {"max-size": "10m", "max-file": "3"} to daemon.json to prevent log-driven disk exhaustion.
  • Alert on /var/lib/docker disk usage and inode usage at 70%, not 90%.
  • Monitor dockerd FD count and goroutine count trends. Sustained growth without workload increase signals a leak.
  • Keep the Docker version patched. Internal deadlocks are often fixed in point releases.
  • Set LimitNOFILE in the systemd unit high enough for your container density, and monitor actual usage.
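
The first, third, and last prevention items map directly onto configuration. A sketch of the relevant fragments; the numeric values are starting points, not universal recommendations:

# /etc/docker/daemon.json
{
  "live-restore": true,
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}

# /etc/systemd/system/docker.service.d/limits.conf (drop-in)
[Service]
LimitNOFILE=1048576

# Apply: systemd must re-read units; daemon.json changes need a daemon restart
systemctl daemon-reload
systemctl restart docker

Note that live-restore is among the options dockerd re-reads on SIGHUP, so on a healthy daemon it can often be enabled without a full restart.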

How Netdata helps

  • Correlates dockerd API latency with host disk I/O saturation, exposing storage driver hangs before commands freeze.
  • Tracks container count by state, including dead containers that signal prior daemon instability.
  • Monitors dockerd file descriptor usage and thread count to catch exhaustion trends.
  • Alerts on disk space and inode exhaustion on /var/lib/docker before they block the daemon.
  • Surfaces containerd health independently from dockerd, helping isolate the failure layer.