Docker disk space full: how to troubleshoot /var/lib/docker

You notice deployments failing with “no space left on device,” image pulls hanging, or the Docker daemon becoming sluggish. On a Docker host, everything lives under /var/lib/docker: image layers, container writable layers and logs, named volumes, and build cache. When this filesystem fills, the failure is cascading and abrupt. New containers cannot start, running containers may fail on writes, and daemon operations deadlock.

This guide walks through identifying which of the five major consumers is dominating your disk, safely reclaiming space without deleting data you still need, and fixing the configuration gaps that let it happen again.

What this means

Docker stores all runtime data under /var/lib/docker. On a modern host using the overlay2 storage driver, the key subdirectories are:

  • overlay2/: image layers, container writable layers, and BuildKit snapshotter data
  • containers/: container metadata and json-file logs
  • volumes/: named and anonymous volume data
  • image/: image manifests and configuration JSON
  • buildkit/: BuildKit cache and build state

If your storage driver is not overlay2, the subdirectory layout differs. Note that the devicemapper driver was removed in Docker Engine v25.0; modern hosts should use overlay2.
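
You can confirm both the storage driver and the data root directly from docker info:

# Storage driver in use (expect "overlay2" on modern hosts)
docker info --format '{{ .Driver }}'

# Data root (usually /var/lib/docker, unless relocated in daemon.json)
docker info --format '{{ .DockerRootDir }}'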

The docker system df command breaks usage into four categories: Images, Containers, Local Volumes, and Build Cache. Each shows a SIZE and a RECLAIMABLE column. RECLAIMABLE is what Docker knows it can safely delete, but this number does not include volumes unless you explicitly target them, and it may undercount orphan overlay2 layers that have lost their metadata references.

When /var/lib/docker fills, there is no graceful degradation. The daemon may hang during storage operations, container creates fail, and json-file logs cannot rotate because there is no space to create new files.

Common causes

  • Unbounded container logs (json-file driver). What it looks like: /var/lib/docker/containers/ grows rapidly, with individual *-json.log files in the gigabytes. First thing to check: find /var/lib/docker/containers/ -name "*-json.log" -exec ls -lh {} \;
  • Dangling or unused images. What it looks like: image count is high while the active container count is low, and the Images line of docker system df shows a large reclaimable figure. First thing to check: docker images --format '{{.Size}}\t{{.Repository}}:{{.Tag}}' | sort -hr | head -20
  • Orphaned volumes. What it looks like: volume usage keeps growing even after containers are removed. First thing to check: docker volume ls and du -sh /var/lib/docker/volumes/*/
  • Build cache accumulation (CI/CD hosts). What it looks like: the Build Cache line or /var/lib/docker/buildkit/ reaches tens of gigabytes. First thing to check: the Build Cache line in docker system df, then docker builder prune --dry-run
  • Application writing to the container filesystem. What it looks like: container writable layers grow because data is not on a volume. First thing to check: docker ps -a --size --format '{{.Names}}\t{{.Size}}'
  • Orphan overlay2 layers. What it looks like: du shows a large overlay2/ while docker system df reports 0B reclaimable. First thing to check: ls /var/lib/docker/overlay2/ | wc -l

Quick checks

# Check filesystem utilization for the Docker data directory
df -h /var/lib/docker

# Docker's own accounting of space by category
docker system df

# Verbose per-object breakdown (slow on large hosts)
docker system df -v

# Host-level directory sizes to find the dominant subdirectory
du -sh /var/lib/docker/*/

# Container log file sizes (json-file driver)
find /var/lib/docker/containers/ -name "*-json.log" -exec ls -lh {} \;

# Largest images by size
docker images --format '{{.Size}}\t{{.Repository}}:{{.Tag}}' | sort -hr | head -20

# Writable layer sizes for containers
# WARNING: this triggers a filesystem walk and is expensive; do not poll frequently
docker ps -a --size --format "table {{.Names}}\t{{.Size}}"

# Volume disk usage
sudo du -sh /var/lib/docker/volumes/*/

# Build cache dry-run to see reclaimable space
docker builder prune --dry-run

# Count dangling images
docker images -f "dangling=true" -q | wc -l

# Check active overlay mounts (active containers create merge mounts)
mount | grep overlay2

How to diagnose it

  1. Confirm the filesystem is actually full. Run df -h /var/lib/docker. If the backing filesystem is at 100%, every write operation will fail. If it is separate from the root filesystem, the host may look healthy while Docker is dead.

  2. Get Docker’s accounting. Run docker system df. Compare the SIZE and RECLAIMABLE columns across Images, Containers, Local Volumes, and Build Cache. If RECLAIMABLE is large relative to SIZE, you have tracked garbage that Docker can remove safely.

  3. Cross-check with host-level tools. Run du -sh /var/lib/docker/*/. This tells you which subdirectory is the largest. Note that du can double-count overlay2 layers because stacked overlay mounts include lowerdir data, so treat overlay2/ numbers as directional, not exact. If containers/ is the largest, logs are likely the culprit; if volumes/ dominates, check for orphaned database or data volumes. (Steps 1-3 are collected into a single sketch after this list.)

  4. Inspect container logs. If containers/ is large, run find /var/lib/docker/containers/ -name "*-json.log" -exec ls -lh {} \;. A single multi-gigabyte log file from a verbose or crash-looping container is the most common silent grower. The json-file driver has no max-size limit unless configured.

  5. Audit images and dangling layers. If image storage is large, list images sorted by size. Dangling images (docker images -f "dangling=true") are typically safe to remove, but be aware that parent images shared by multiple tagged images are protected. docker image prune will not delete an image that serves as a parent for another image until the child is removed first.

  6. Check build cache separately. The Build Cache line in docker system df maps to the legacy builder. If you use BuildKit, run docker buildx du for the authoritative view. docker system prune does not cover BuildKit-native cache; you must run docker buildx prune independently.

  7. Evaluate volume bloat. Run docker volume ls and inspect /var/lib/docker/volumes/*/. Volumes persist after their container is removed unless the container was removed with -v. docker system prune --volumes is required to remove unused volumes; without --volumes, they survive even with --all.

  8. Look for untracked overlay2 bloat. If docker system df reports 0B reclaimable but du shows a massive overlay2/ directory, you may have orphaned layer directories left by unclean shutdowns, parallel build races, or old storage drivers. In this state, docker system prune -af recovers nothing. docker buildx prune -af may recover BuildKit-related space. Full recovery of orphaned metadata sometimes requires stopping the Docker daemon and removing /var/lib/docker contents, which is destructive and requires re-pulling all images.

  9. Review container writable layers. Run docker ps -a --size. Containers should ideally write nothing to their writable layer. Layers growing into gigabytes indicate the application is writing logs, temp files, or data inside the container instead of to a mounted volume.
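
For a quick first pass, here is a minimal triage sketch that strings together steps 1-3 above, assuming a Linux host and the default /var/lib/docker data root:

# Step 1: is the backing filesystem actually full?
df -h /var/lib/docker

# Step 2: Docker's own accounting by category
docker system df

# Step 3: which subdirectory dominates (overlay2 figures are directional, not exact)
sudo du -sh /var/lib/docker/*/ 2>/dev/null | sort -hr | head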

Metrics and signals to monitor

  • Filesystem usage on /var/lib/docker: the cliff-edge failure point. Warning sign: more than 80% used.
  • Docker disk usage totals (docker system df): tracks images, containers, volumes, and build cache in one view. Warning sign: RECLAIMABLE above 20% of total SIZE.
  • Container log file sizes: json-file logs grow unbounded without rotation. Warning sign: any single *-json.log over 1GB.
  • Image count and dangling image count: old images accumulate over deployments. Warning sign: more than 100 dangling images or more than 10GB of them.
  • Build cache size: CI/CD runners accumulate cache rapidly. Warning sign: build cache over 20GB or more than 20% of Docker disk usage.
  • Volume usage: orphaned volumes persist silently. Warning sign: rapid growth in /var/lib/docker/volumes/.
  • Container writable layer size: applications writing to overlay2 instead of volumes. Warning sign: any container writable layer over 1GB.
  • Docker daemon errors: storage driver errors precede corruption. Warning sign: “no space left on device” or overlay2 errors in the journal.
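
Two of these signals are easy to script for cron or an existing check framework. A sketch assuming GNU df and find, with the 80% and 1GB thresholds taken from the list above:

# Percentage used on the filesystem backing /var/lib/docker
usage=$(df --output=pcent /var/lib/docker | tail -1 | tr -dc '0-9')
if [ "$usage" -ge 80 ]; then
  echo "WARN: /var/lib/docker filesystem at ${usage}% used"
fi

# Any single json-file log over 1GB
sudo find /var/lib/docker/containers/ -name '*-json.log' -size +1G \
  -printf 'WARN: %p is %s bytes\n'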

Fixes

If logs are consuming the disk

Truncating a live json-file log is safe because the file is opened with O_APPEND. You can zero a large log immediately to recover space:

# Truncate a specific container log (safe while container is running)
truncate -s 0 /var/lib/docker/containers/<container-id>/<container-id>-json.log
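
If many containers are affected, the same idea can be applied in bulk; the 1GB cutoff here is an assumption, adjust to taste:

# Truncate every json-file log larger than 1GB (still safe while containers run)
sudo find /var/lib/docker/containers/ -name "*-json.log" -size +1G -exec truncate -s 0 {} \;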

Then fix the root cause. Add to /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Restart the daemon to apply. Consider switching to the local log driver, which is more efficient and supports rotation by default. For a deeper treatment of log rotation, see Docker log rotation: preventing json-file logs from filling disk.
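
Keep in mind that daemon.json defaults only apply to containers created after the restart. A sketch of restarting and of overriding rotation per container, where web and nginx are placeholder names:

# Apply the new defaults (existing containers keep their old log settings until recreated)
sudo systemctl restart docker

# Per-container override at run time, using the local driver
docker run -d --name web \
  --log-driver local \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  nginx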

If unused images dominate

Targeted pruning is safer than a blanket docker system prune -af. Remove images older than a safe window:

# Remove dangling images
docker image prune -f

# Remove all unused images created more than 48 hours ago
docker image prune -a --filter "until=48h"

Remember that shared parent layers are protected. If an image is a base for others, Docker will retain it until all derived images are removed.

If volumes are consuming space

docker volume prune removes only volumes not attached to any container, and on Docker v23 and later it skips named volumes unless you add --all. If you need to remove stopped containers and their anonymous volumes together, use:

# Destructive: removes stopped containers, unused networks, all unused images, and unused volumes
docker system prune --all --volumes -f

Without --volumes, volumes survive. Always verify volume contents before pruning in production.
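
One way to review candidates before pruning, using <volume-name> as a placeholder:

# Volumes not referenced by any container (what prune would consider)
docker volume ls -f dangling=true

# Where a volume lives and how big it is, before you decide
docker volume inspect <volume-name> --format '{{ .Mountpoint }}'
sudo du -sh /var/lib/docker/volumes/<volume-name>/_data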

If build cache is the culprit

docker system prune does not clear BuildKit-native cache. Run:

# Dry-run first
docker builder prune --dry-run

# Clear legacy builder cache
docker builder prune -f

# Clear BuildKit cache
docker buildx prune -f
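
On CI runners, an alternative to clearing everything is pruning down to a budget. The 10GB and 72h figures below are arbitrary examples, and flag availability can vary by Docker and buildx version:

# Keep roughly the most recently used 10GB of cache and discard the rest
docker builder prune -f --keep-storage 10GB

# Or drop only cache entries that have not been used recently
docker buildx prune -f --filter "until=72h"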

On Docker v25 and later, configure automatic garbage collection in daemon.json:

{
  "builder": {
    "gc": {
      "defaultKeepStorage": "20GB"
    }
  }
}

If container writable layers are large

The application is writing data to the container filesystem instead of a volume. You cannot shrink a writable layer. Stop the container, remove it, and recreate it with a volume mount for the data path. Audit the application for log and temp file paths.
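
A minimal sketch of that recreate-with-a-volume pattern, where app, appdata, /var/log/app, and the image reference are all placeholder names:

# Stop and remove the container whose writable layer has grown
docker stop app && docker rm app

# Recreate it with a named volume mounted at the write-heavy path
docker volume create appdata
docker run -d --name app -v appdata:/var/log/app registry.example.com/app:latest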

If overlay2 has untracked orphan layers

When docker system df shows 0B reclaimable but du shows massive overlay2/ usage, standard prune will not help. Try docker buildx prune -af first. If that fails, the remaining orphan directories have no metadata references. The last resort is to stop the Docker daemon and remove /var/lib/docker entirely, then re-pull images. This is destructive and requires planning for running containers.
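
If you do reach that last resort, one cautious approach on a systemd host is to move the directory aside rather than delete it immediately, so you can fall back until the rebuild is verified:

# DESTRUCTIVE: all images, containers, volumes, and cache on this host are affected
sudo systemctl stop docker docker.socket
sudo mv /var/lib/docker /var/lib/docker.old
sudo systemctl start docker

# Re-pull images and redeploy workloads, verify, then reclaim the space
sudo rm -rf /var/lib/docker.old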

Prevention

  • Configure log rotation before deploying workloads. The default json-file driver has no size limit. Set max-size and max-file in daemon.json, or use the local driver.
  • Automate cleanup with filters, not broad prunes. Schedule docker system prune --filter "until=48h" and docker volume prune via cron or a systemd timer (see the sketch after this list). Avoid unattended docker system prune -af, which removes all stopped containers and can destroy debug data.
  • Mount volumes for all persistent or large data. Never let applications write logs, caches, or databases to the container filesystem.
  • Monitor growth rate, not just absolute usage. Alert when /var/lib/docker exceeds 70-80% or grows faster than 1GB per day. Cleanup at 80% is safe; cleanup at 95% may fail because prune operations need working space.
  • Cap build cache on CI runners. Use daemon.json builder.gc settings on Docker v25+, or run docker builder prune and docker buildx prune as part of the CI teardown.
  • Test your cleanup commands. Run prune with --dry-run or --filter during low-risk windows to verify what would be removed before automating it.
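
A sketch of that schedule as a cron drop-in; the times, retention window, and log path are assumptions to adapt:

# /etc/cron.d/docker-cleanup
# Nightly at 03:00: prune stopped containers, unused networks, and dangling images older than 48h
0 3 * * * root docker system prune -f --filter "until=48h" >> /var/log/docker-cleanup.log 2>&1
# Weekly on Sunday: prune unused volumes (on Docker v23+ this removes anonymous volumes only)
30 3 * * 0 root docker volume prune -f >> /var/log/docker-cleanup.log 2>&1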

How Netdata helps

Netdata correlates host-level disk saturation on the /var/lib/docker filesystem with container runtime signals. When disk usage spikes, correlate it with:

  • Container restart counts and exit codes: Crash loops flood logs and accelerate disk growth.
  • Container block I/O: Identify which container is writing heavily to its writable layer or volumes.
  • Docker daemon error logs: Storage driver errors and “no space left on device” messages appear alongside disk saturation.
  • Container state distribution: A rising count of exited containers indicates cleanup failure before disk pressure becomes critical.