Docker memory usage explained: anonymous, file, slab, and what counts

You look at docker stats, see a container sitting at 1.2 GB of a 1.5 GB limit, and assume it is about to explode. It might be fine. That total includes page cache the kernel will reclaim the second another process asks for memory. Meanwhile, a different container reports 600 MB with no limit set, yet its anonymous memory grows 50 MB per hour and will force a host-level OOM kill before lunch.

This guide explains how Docker accounts for container memory through the Linux cgroup memory controller. You will learn what anonymous, file-backed, and slab memory are, where the numbers come from in cgroup v1 and v2, and which metrics actually predict OOM kills.

What this means

Docker does not allocate memory for containers. The Linux kernel charges every page of RAM to the container’s cgroup, and Docker reads those counters back through the cgroup filesystem. The headline number you see in docker stats is the total cgroup charge. It is accurate but operationally incomplete because it mixes reclaimable and non-reclaimable memory into one figure.

In both cgroup v1 and v2, the canonical breakdown lives in memory.stat, though the field names differ between versions. In cgroup v2 terms, the three fields that matter most are:

  • anon: anonymous pages. This is heap, stack, and anonymous mmap memory. The kernel cannot reclaim these unless swap is configured and available. This is the core memory your application actually needs.
  • file: file-backed pages. This is the page cache. When a container reads from disk, the kernel caches those pages in RAM. This memory is reclaimable without data loss. A high file value looks alarming in docker stats but is usually harmless.
  • slab: kernel slab allocations. These are kernel data structures charged to the cgroup, including caches for dentries, inodes, and other objects. Slab counts toward the cgroup limit. Much of it is reclaimable under pressure, but it still adds to the charge.

cgroup v1 paths are typically /sys/fs/cgroup/memory/docker/<container-id>/, with memory.usage_in_bytes as the total and memory.stat for the breakdown. In v1, memory.stat names the fields differently: rss is roughly the anonymous charge, cache is the page cache, and kernel (slab) memory is reported separately in memory.kmem.usage_in_bytes. cgroup v2 paths are typically /sys/fs/cgroup/system.slice/docker-<container-id>.scope/, with memory.current as the total and memory.stat for the breakdown. If your host uses a different cgroup driver, the exact directory names may differ.
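
One way to locate the cgroup directory without guessing the driver layout is to resolve it from the container's main process. A minimal sketch, assuming cgroup v2 (a single unified hierarchy); <container_name> is a placeholder:

# Sketch: resolve a container's memory cgroup from its main process (assumes cgroup v2)
PID=$(docker inspect --format '{{.State.Pid}}' <container_name>)
CGROUP_PATH=/sys/fs/cgroup$(awk -F: '$1 == "0" {print $3}' /proc/${PID}/cgroup)
echo "${CGROUP_PATH}"
grep -E "^(anon|file|slab) " "${CGROUP_PATH}/memory.stat"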

docker stats pulls from the same cgroup files. The MEM USAGE column maps to memory.usage_in_bytes or memory.current; recent Docker versions subtract the inactive file cache (total_inactive_file in v1, inactive_file in v2) before displaying it, which is why those fields appear in the checks below. The LIMIT column maps to memory.limit_in_bytes or memory.max. Because the usage figure still includes active page cache and slab, it is a poor proxy for application memory pressure. The OOM killer does not care how much page cache you hold. It cares whether the non-reclaimable charge exceeds the limit. In practice, that means anon plus slab and related kernel allocations are what you need to watch.
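
To put a number on that, you can compute anon plus slab as a share of the limit straight from memory.stat. A minimal sketch, assuming cgroup v2 with the systemd-style path shown above; <container_name> is a placeholder:

# Sketch: non-reclaimable charge (anon + slab) as a percentage of the limit (cgroup v2, systemd driver)
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
CG=/sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope
ANON=$(awk '$1 == "anon" {print $2}' ${CG}/memory.stat)
SLAB=$(awk '$1 == "slab" {print $2}' ${CG}/memory.stat)
LIMIT=$(cat ${CG}/memory.max)   # contains "max" when no limit is set
if [ "${LIMIT}" != "max" ]; then
  awk -v a="${ANON}" -v s="${SLAB}" -v l="${LIMIT}" 'BEGIN {printf "non-reclaimable: %.1f%% of limit\n", (a + s) * 100 / l}'
else
  echo "no memory limit set; compare against host MemAvailable instead"
fi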

Common causes

Cause | What it looks like | First thing to check
Page cache dominates usage | docker stats shows usage near the limit, but the application is idle or only reading files | memory.stat file is much larger than anon
Slab accumulation | Total usage drifts up slowly without a matching workload increase | memory.stat slab is a large fraction of total
Anonymous memory leak | Usage climbs steadily and never plateaus | memory.stat anon grows monotonically
Memory limit too tight for baseline | anon plus slab sit consistently above 80% of the limit | Baseline non-reclaimable usage vs memory.max
JVM or runtime overhead | Container OOMs even though the runtime heap looks small | Runtime native memory vs cgroup limit

Quick checks

Detect cgroup version. The paths and file names you read depend on it.

# Detect cgroup version
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then echo "cgroup v2"; else echo "cgroup v1"; fi

Read total usage and limit (cgroup v2).

# Check total usage and limit for a container
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.current
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.max

Read memory breakdown (cgroup v2).

# Check memory.stat breakdown
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.stat | grep -E "^(anon|file|slab|inactive_file) "

Read total usage and limit (cgroup v1).

# Check total usage and limit for a container
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.limit_in_bytes

Read memory breakdown (cgroup v1).

# Check memory.stat breakdown
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.stat | grep -E "^(rss|cache|total_inactive_file) "
# In cgroup v1, rss is the anonymous charge and cache is the page cache; kernel (slab) memory is a separate counter
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.kmem.usage_in_bytes

Quick view from Docker CLI.

# Live memory stats from docker
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"

Check OOM kill history.

# Check if the container was OOM-killed and its exit code
docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container_name>

Check raw cgroup OOM kill counter (cgroup v2).

# Nonzero oom_kill means the kernel has killed at least one process in this cgroup
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.events | grep oom_kill

Check raw cgroup OOM kill counter (cgroup v1).

# The oom_kill counter (present on newer kernels); anchor the grep so it does not also match oom_kill_disable
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.oom_control | grep -E "^oom_kill "

Check host memory headroom when no container limit is set.

# MemAvailable is the kernel's estimate of memory that can be used without swapping
grep MemAvailable /proc/meminfo

How to diagnose it

  1. Determine cgroup version and locate the container’s cgroup. If /sys/fs/cgroup/memory exists, you are on v1. If /sys/fs/cgroup/cgroup.controllers exists, you are on v2. The container path differs, so use the correct files.

  2. Read the total usage and the limit. If the container has no limit, memory.max contains max or memory.limit_in_bytes contains a very large number. Without a limit, the container competes directly with the host and other containers.

  3. Read memory.stat and compare file to anon. If file makes up more than half of memory.current or memory.usage_in_bytes, the container is mostly holding page cache. This is reclaimable. The immediate OOM risk is low unless the host itself is under global memory pressure.

  4. Add anon and slab and treat this as your pressure baseline. If this baseline is above 80% of the limit, the container has almost no headroom for bursts. If it is at the limit, the next allocation triggers an OOM kill.

  5. Check for growth trends. Memory that rises continuously over hours with no corresponding traffic increase is a leak. Leaks almost always live in anon. Slab can grow temporarily but usually plateaus. A sampling sketch follows this list.

  6. Check for past OOM kills. docker inspect shows OOMKilled if PID 1 was killed. The cgroup memory.events (v2) or memory.oom_control (v1) shows the raw oom_kill counter, which also catches child process kills that Docker does not surface in inspect.

  7. Correlate with exit codes and restarts. Exit code 137 means SIGKILL. If OOMKilled is true, the kernel killed the container. If OOMKilled is false but the exit code is 137, something else sent SIGKILL.

  8. For language runtimes like the JVM, compare cgroup limits to runtime flags. If -Xmx is set equal to the container limit, there is no room for metaspace, thread stacks, or native memory. The container will OOM even when the heap looks healthy.
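
For step 5, one way to see the trend is to sample the anon charge at a fixed interval and watch the deltas. A minimal sketch, assuming cgroup v2 with the systemd-style path; <container_name> and the 60-second interval are placeholders:

# Sketch: sample the anon charge once a minute to spot a leak trend (cgroup v2, systemd driver)
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
STAT=/sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.stat
while true; do
  ANON=$(awk '$1 == "anon" {print $2}' "${STAT}")
  echo "$(date '+%H:%M:%S') anon=$((ANON / 1024 / 1024)) MiB"
  sleep 60
done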

Metrics and signals to monitor

Signal | Why it matters | Warning sign
cgroup memory.current or memory.usage_in_bytes | Total charge that Docker reports as usage | Sustained >80% of limit
memory.stat anon | Anonymous pages: heap, stack, mmap. Non-reclaimable without swap. | Steady growth or >60% of limit
memory.stat file | Page cache. Reclaimable under pressure. | High is normal; watch only if host is low on memory
memory.stat slab | Kernel slab charged to the cgroup. | >20% of total usage
Container OOMKilled | Boolean from Docker container state. | Any true in production
cgroup oom_kill counter | Raw kernel OOM kills for the cgroup. | Nonzero increase
Host MemAvailable / MemTotal | Global headroom for unlimited containers. | Host MemAvailable < 20% of total
Exit code 137 + restart count | Pattern of OOM-induced crash loops. | Restarts increasing with exit 137

Fixes

If the cause is page cache bloat

Usually no container-level fix is needed. The kernel reclaims page cache automatically when the host needs memory. If the host is under global pressure, add more RAM or reduce the number of file-heavy containers on the node. Do not raise the container memory limit just to accommodate cache.
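
If you want to confirm the cache really is reclaimable, recent kernels (roughly 5.19 and later) expose memory.reclaim in cgroup v2, which asks the kernel to reclaim a given amount from the cgroup. A hedged sketch; the 512M figure is arbitrary, root is required, and the write returns an error if the full amount cannot be reclaimed:

# Sketch: ask the kernel to proactively reclaim from the cgroup (cgroup v2, kernel 5.19+, run as root)
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
CG=/sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope
echo "512M" > ${CG}/memory.reclaim   # errors if the kernel cannot reclaim the full amount
cat ${CG}/memory.current             # usage should drop if the charge was mostly page cache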

If the cause is slab growth

Slab spikes often follow large numbers of file operations or socket creations. If slab stays high, check whether the application is creating millions of small files or connections. Spreading the workload or reducing object churn usually brings slab down.
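
To see whether the slab charge is something the kernel can shed, cgroup v2 splits it into reclaimable and unreclaimable parts in memory.stat. A small sketch, assuming the same systemd-style path; <container_name> is a placeholder:

# Sketch: split slab into reclaimable vs unreclaimable parts (cgroup v2)
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
grep -E "^slab(_reclaimable|_unreclaimable)? " \
  /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.stat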

If the cause is anonymous memory pressure

If anon is high but stable under load, the limit is too low. Raise memory.max or --memory to at least 120% of the observed baseline. If anon grows forever, the application has a leak. Increase the limit temporarily to stop crash loops, then profile the heap.
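
You can raise the limit on a running container without recreating it. A sketch using docker update; 1500m is a placeholder, and if the container was started with --memory-swap you will usually need to raise that value as well:

# Sketch: raise the memory limit of a running container (values are placeholders)
docker update --memory 1500m --memory-swap 1500m <container_name>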

If the cause is runtime overhead

For the JVM, set -Xmx to roughly 75% of the container memory limit and cap metaspace and code cache explicitly. For Go, check for goroutine leaks that inflate stack memory. For Node.js, inspect the V8 heap and native addon usage.
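
As an illustration for the JVM case, one option is to size the heap as a fraction of the container limit rather than hard-coding -Xmx. A sketch with placeholder values (the my-java-app image is hypothetical and assumed to ship a JDK that supports MaxRAMPercentage):

# Sketch: give the JVM ~75% of the container limit, leaving headroom for metaspace, thread stacks, and native memory
docker run --memory=2g --memory-swap=2g my-java-app \
  java -XX:MaxRAMPercentage=75.0 -XX:MaxMetaspaceSize=256m -jar /app/app.jar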

If the cause is missing limits

Set a limit. Containers without a memory limit can consume all host RAM and trigger a system-wide OOM. The kernel may then kill dockerd, containerd, or other innocent containers.
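
A minimal example of setting a limit at start time; the values and names are placeholders you would size from a load-test baseline of non-reclaimable memory:

# Sketch: start a container with an explicit memory limit (values and names are placeholders)
docker run -d --name my-service --memory=512m --memory-swap=512m my-image:latest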

Prevention

  • Monitor anon and slab, not just the total usage figure from docker stats. A per-container sketch follows this list.
  • Set memory limits based on load-test baselines of non-reclaimable memory, plus burst headroom.
  • Validate language runtime settings against cgroup limits before production deployment.
  • Alert on host MemAvailable when running containers without memory limits.
  • Review container restart counts and exit codes weekly. A restart count of 1 with exit code 137 is an OOM kill that already happened.
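
For the first item, one way to watch the non-reclaimable share across every running container, rather than the docker stats total, is a small loop over docker ps. A sketch assuming cgroup v2 and the systemd-style paths used earlier:

# Sketch: non-reclaimable (anon + slab) memory per running container, in MiB (cgroup v2, systemd driver)
for ID in $(docker ps -q --no-trunc); do
  STAT=/sys/fs/cgroup/system.slice/docker-${ID}.scope/memory.stat
  [ -f "${STAT}" ] || continue
  NAME=$(docker inspect --format '{{.Name}}' "${ID}")
  awk -v name="${NAME}" '$1 == "anon" {a = $2} $1 == "slab" {s = $2} END {printf "%s %.0f MiB\n", name, (a + s) / 1048576}' "${STAT}"
done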

How Netdata helps

  • Netdata collects per-container cgroup memory metrics directly from memory.current and memory.stat, surfacing anon, file, and slab on separate dimensions.
  • Correlate anon growth with container restarts, exit codes, and OOM kill events on the same timeline.
  • Alert on memory usage as a percentage of limit while exposing the reclaimable vs non-reclaimable split, so you do not wake up for page cache.
  • Capture cgroup-level OOM events from memory.events without requiring dmesg access.