Docker container high memory usage: how to diagnose it
Your container is sitting at 90% of its memory limit but has not been OOMKilled. Or it is being killed repeatedly and you cannot tell whether the limit is too low or the application is leaking. docker stats shows a single percentage, but that number mixes reclaimable page cache with anonymous memory that the kernel cannot reclaim. To diagnose this correctly, you need to decompose cgroup memory.stat, map it to your runtime’s actual allocations, and decide whether the problem is cache pressure, a runtime mismatch, or a true leak.
After reading this guide, you will be able to break down container memory by category, distinguish reclaimable cache from chargeable anonymous pages, correlate runtime heap metrics with cgroup RSS, and decide whether to raise the limit, tune the runtime, or profile the application.
What this means
Docker enforces memory limits through the cgroup memory controller. The kernel charges several types of memory against the container’s limit:
- anon: anonymous pages from heap, stack, and mmap. This is the memory your application actively owns. It is non-reclaimable and the primary driver of OOM risk.
- file: page cache from filesystem reads and memory-mapped files. This is reclaimable under pressure. High file memory can make a container look full while leaving ample headroom for allocations.
- slab: kernel slab allocations for the cgroup. Usually small, but unbounded slab growth can exhaust the limit.
- shmem: shared memory pages, including tmpfs mounts inside the container.
docker stats subtracts inactive file cache from the raw cgroup usage before displaying its percentage. The result approximates RSS. However, the kernel OOM killer evaluates the total chargeable footprint, and runtimes like the JVM, .NET, and Node.js maintain native memory outside their heaps that cgroup metrics capture but runtime dashboards do not. This creates a dangerous gap: your application metrics may look healthy while the cgroup is one allocation away from a SIGKILL.
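You can see this gap directly by comparing the raw cgroup usage with the cache-adjusted figure. A minimal sketch for a cgroup v2 host with the systemd driver, reusing the docker-<id>.scope path shown in the quick checks below; paths differ on cgroup v1 hosts:
# Compare raw cgroup usage with the cache-adjusted number docker stats reports
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
CG=/sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope
RAW=$(cat ${CG}/memory.current)
INACTIVE_FILE=$(awk '$1=="inactive_file" {print $2}' ${CG}/memory.stat)
ANON=$(awk '$1=="anon" {print $2}' ${CG}/memory.stat)
echo "raw cgroup usage:    ${RAW} bytes"
echo "approximate RSS:     $((RAW - INACTIVE_FILE)) bytes  # roughly what docker stats shows"
echo "anon (OOM-relevant): ${ANON} bytes"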
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Anonymous memory leak | memory.stat anon grows continuously; container eventually OOMKilled | Application heap profile or per-process RSS inside the container |
| Page cache dominant | docker stats shows high usage but anon is low and stable | memory.stat file vs anon ratio |
| JVM heap/native mismatch | OOMKilled despite JVM heap usage below limit | -Xmx, MaxMetaspaceSize, and native memory relative to --memory |
| .NET aggressive file caching on large hosts | High file cache near limit; GC acts as if memory is exhausted | DOTNET_GCConserveMemory or host free RAM |
| Undersized or missing limit | Usage climbs until it hits a hard ceiling or host-wide OOM | docker inspect HostConfig.Memory |
| Kernel slab growth | memory.stat slab increases without corresponding app growth | cgroup kmem accounting and dmesg |
Quick checks
Run these read-only commands to classify the memory pressure.
# Approximate usage and limit from Docker CLI
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
# cgroup v2: detailed memory breakdown per container
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.stat
# cgroup v1: detailed memory breakdown per container
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.stat
# Current usage and hard limit (v2)
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.current
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.max
# Current usage and hard limit (v1)
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.limit_in_bytes
# OOM status and configured limit
docker inspect --format='OOMKilled={{.State.OOMKilled}} Memory={{.HostConfig.Memory}}' <container_name>
# Kernel OOM log for the container
dmesg | grep -i "memory cgroup out of memory"
# systemd journal alternative for OOM events
journalctl -k | grep -i oom
# OOM kill counter (cgroup v2)
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.events
# OOM kill counter (cgroup v1)
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.oom_control
How to diagnose it
1. Decompose memory.stat to find the real consumer. Read memory.stat inside the container's cgroup. If file dominates and anon is stable, the container is using page cache. This is usually reclaimable and not an immediate threat. If anon is high or growing, the application is holding non-reclaimable memory. That is where your pressure is. A scripted version of this check appears just after this list.
2. Check for actual OOM kills. Run docker inspect --format='{{.State.OOMKilled}}' <container> and check the exit code. Exit code 137 with OOMKilled=true means the kernel killed the container at the cgroup boundary. If OOMKilled=false, the SIGKILL came from outside, such as an orchestrator or operator. Do not assume 137 always means memory.
3. Map runtime memory to cgroup limits. For JVM containers, compare -Xmx plus MaxMetaspaceSize and native overhead to the container limit. A common mistake is setting -Xmx equal to the container limit, leaving no room for metaspace, thread stacks, JIT code cache, or direct byte buffers. Set -Xmx to roughly 75% of the container limit and cap metaspace. For Node.js, note that V8 budgets about 1.5 times --max-old-space-size internally. In a 512MB container, set --max-old-space-size=384 to prevent V8 from attempting to allocate beyond the cgroup boundary. For .NET on hosts with more than 36GB of free RAM, the runtime may fill file cache up to the container memory limit, causing the GC to believe memory is exhausted even though anonymous memory is not. Set DOTNET_GCConserveMemory or reduce the container memory limit to constrain this behavior.
4. Look for leak signatures. If anon increases monotonically over hours or days without a corresponding traffic increase, treat it as a leak until proven otherwise. Collect an application heap dump or use a runtime profiler. A sawtooth pattern where memory grows then drops sharply is usually healthy garbage collection. A staircase pattern that never plateaus is not.
5. Check for child-process OOM kills. The kernel OOM killer may target a child process instead of PID 1. In this case, the container survives but is degraded, and docker inspect may show OOMKilled=false. Check dmesg or memory.events for OOM kill counters. If the counter is rising but Docker does not report an OOM, a child process was sacrificed.
6. Evaluate the limit itself. If memory.max shows max, the container has no hard limit. It can consume host memory until the system-wide OOM killer selects a victim, which may not be the offending container. Set explicit limits on every production container so that misbehavior is contained.
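A scripted version of the step 1 and step 6 checks might look like this on a cgroup v2 host; the 80% threshold and the docker-<id>.scope path are assumptions to adapt (cgroup v1 equivalents are listed in the quick checks above).
# Rough pressure classification for one container (cgroup v2)
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
CG=/sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope
ANON=$(awk '$1=="anon" {print $2}' ${CG}/memory.stat)
FILE=$(awk '$1=="file" {print $2}' ${CG}/memory.stat)
LIMIT=$(cat ${CG}/memory.max)
OOM_KILLS=$(awk '$1=="oom_kill" {print $2}' ${CG}/memory.events)
echo "anon=${ANON} file=${FILE} limit=${LIMIT} oom_kills=${OOM_KILLS}"
if [ "${LIMIT}" = "max" ]; then
  echo "no hard limit set: size the workload and set --memory explicitly"
elif [ "${ANON}" -gt $((LIMIT * 80 / 100)) ]; then
  echo "anon-dominant: application memory is the pressure source"
elif [ "${FILE}" -gt "${ANON}" ]; then
  echo "cache-dominant: likely reclaimable, not an immediate OOM risk"
else
  echo "mixed: watch the anon trend over time"
fi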
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| memory.stat anon | Non-reclaimable application memory; drives OOM risk | Steady growth over 6+ hours |
| memory.stat file | Reclaimable page cache; included in raw usage but not in RSS | High value is fine if anon is low |
| memory.events oom_kill (v2) or memory.oom_control (v1) | Kernel-level counter of kills inside the cgroup | Any increment |
| Container OOMKilled status | Docker's view of whether the last termination was an OOM kill | true after a restart |
| docker stats MemPerc | Approximate RSS as percentage of limit | Sustained >85% |
| Container restart count + exit code 137 | Indicates an OOM crash loop | Restart count increasing with 137 |
| JVM heap committed / .NET GC heap | Runtime’s own view of memory vs cgroup reality | Runtime heap far below cgroup limit but container still OOMs |
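Without a monitoring stack, a crude way to capture the anon trend is a sampling loop; this sketch assumes cgroup v2 and a 60-second interval, both of which you would adjust:
# Append a timestamped anon sample every 60 seconds for later plotting
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
CG=/sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope
while true; do
  echo "$(date +%s) $(awk '$1=="anon" {print $2}' ${CG}/memory.stat)" >> anon_trend.log
  sleep 60
done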
Fixes
If the cause is cache pressure
High file memory means the container is caching disk reads. This is normally harmless. If it is causing the kernel to reclaim cache too aggressively and hurting I/O performance, reduce application read-ahead or move temp files off tmpfs. Do not lower the container memory limit solely because file is large; the limit should be sized for anon plus runtime headroom.
If the cause is a runtime mismatch
For JVM workloads, set the container limit to roughly 20-30% above the intended max heap, then configure -Xmx to 75% of the container limit. Explicitly cap metaspace with -XX:MaxMetaspaceSize and code cache with -XX:ReservedCodeCacheSize.
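As an illustration, a 1GB container sized along these lines could be started as follows; the image name and exact values are placeholders, not prescriptions:
# ~1GB limit with heap at ~75% of it; metaspace and code cache capped explicitly
docker run -d --memory=1g --memory-swap=1g \
  -e JAVA_TOOL_OPTIONS="-Xmx768m -XX:MaxMetaspaceSize=128m -XX:ReservedCodeCacheSize=64m" \
  my-java-app:latest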
For Node.js, set --max-old-space-size to leave headroom below the cgroup limit. Use roughly 75% of the limit as a starting point.
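For example, with a 512MB limit (the image name is a placeholder, and this assumes a Node.js version whose NODE_OPTIONS accepts V8 flags):
# 512MB limit with the V8 old space capped at 384MB to leave native headroom
docker run -d --memory=512m --memory-swap=512m \
  -e NODE_OPTIONS="--max-old-space-size=384" \
  my-node-app:latest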
For .NET on large-memory hosts, set the DOTNET_GCConserveMemory environment variable to reduce the GC’s tendency to treat file cache as unavailable memory.
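A sketch of the same idea; the value 5 sits mid-range on the documented 0-9 scale and, like the image name, is an assumption to tune:
# Ask the .NET GC to behave more conservatively when the cgroup looks nearly full
docker run -d --memory=1g \
  -e DOTNET_GCConserveMemory=5 \
  my-dotnet-app:latest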
If the cause is a memory leak
Profile the application using runtime-specific tools: heap dumps for the JVM, V8 heap snapshots for Node.js, or dotnet-gcdump for .NET. If the leak is in application code, fix and redeploy. If the leak is in a dependency, upgrade or constrain the container limit and accept periodic restarts as a temporary mitigation.
If the cause is an undersized limit
Raise the container --memory limit to match the application’s steady-state anon plus a burst buffer. As a rule, a container consistently above 85% of its limit warrants investigation. A container at 90% with stable, non-growing anon is simply undersized, not leaking.
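If the container is otherwise healthy, the limit can be raised in place with docker update; 1g here stands in for observed steady-state anon plus a burst buffer:
# Raise the hard limit on a running container without recreating it
docker update --memory=1g --memory-swap=1g <container_name>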
Prevention
- Set explicit --memory limits on every production container. Without a limit, a runaway container becomes a host-wide OOM risk.
- Size runtime memory settings relative to the cgroup limit. Never set JVM -Xmx equal to the container limit.
- Monitor the memory.stat anon trend, not just total usage. Total usage includes cache and creates false alarms or false confidence.
- Alert on memory.events oom_kill increments and container restart counts paired with exit code 137.
- Configure container log rotation so that disk exhaustion does not compound memory pressure during incidents.
- Set PID limits to prevent fork bombs from amplifying memory pressure. A run command combining these guards appears after this list.
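Several of these guards can be applied at container start; in this sketch every value and the image name are placeholders to adjust per workload:
# Hard memory limit, no extra swap, bounded PIDs, and rotated json-file logs
docker run -d \
  --memory=1g --memory-swap=1g \
  --pids-limit=256 \
  --log-driver=json-file --log-opt max-size=10m --log-opt max-file=3 \
  my-app:latest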
How Netdata helps
- Per-container cgroup breakdown: Netdata surfaces anon, file, and slab from memory.stat per container, so you can see whether high usage is cache or application memory without logging into the host.
- OOM correlation: Netdata correlates container memory percentage, oom_kill events, and exit code 137 on the same timeline, making it easier to distinguish OOM kills from external SIGKILLs.
- Runtime context: By viewing container memory alongside CPU throttling and disk I/O, you can determine whether memory pressure is isolated or part of a broader resource saturation pattern.




