Docker container high memory usage: how to diagnose it
Your container is sitting at 90% of its memory limit but has not been OOMKilled. Or it is being killed repeatedly and you cannot tell whether the limit is too low or the application is leaking. docker stats shows a single percentage, but that number mixes reclaimable page cache with anonymous memory that the kernel cannot reclaim. To diagnose this correctly, you need to decompose cgroup memory.stat, map it to your runtime’s actual allocations, and decide whether the problem is cache pressure, a runtime mismatch, or a true leak.
After reading this guide, you will be able to break down container memory by category, distinguish reclaimable cache from chargeable anonymous pages, correlate runtime heap metrics with cgroup RSS, and decide whether to raise the limit, tune the runtime, or profile the application.
What this means
Docker enforces memory limits through the cgroup memory controller. The kernel charges several types of memory against the container’s limit:
- anon: anonymous pages from heap, stack, and mmap. This is the memory your application actively owns. It is non-reclaimable and the primary driver of OOM risk.
- file: page cache from filesystem reads and memory-mapped files. This is reclaimable under pressure. High file memory can make a container look full while leaving ample headroom for allocations.
- slab: kernel slab allocations for the cgroup. Usually small, but unbounded slab growth can exhaust the limit.
- shmem: shared memory pages, including tmpfs mounts inside the container.
docker stats subtracts inactive file cache from the raw cgroup usage before displaying its percentage. The result approximates RSS. However, the kernel OOM killer evaluates the total chargeable footprint, and runtimes like the JVM, .NET, and Node.js maintain native memory outside their heaps that cgroup metrics capture but runtime dashboards do not. This creates a dangerous gap: your application metrics may look healthy while the cgroup is one allocation away from a SIGKILL.
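You can see this gap directly by comparing the raw cgroup usage with the cache-adjusted figure. A minimal sketch for a cgroup v2 host with the systemd driver, reusing the docker-<id>.scope path shown in the quick checks below; paths differ on cgroup v1 hosts:
# Compare raw cgroup usage with the cache-adjusted number docker stats reports
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
CG=/sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope
RAW=$(cat ${CG}/memory.current)
INACTIVE_FILE=$(awk '$1=="inactive_file" {print $2}' ${CG}/memory.stat)
ANON=$(awk '$1=="anon" {print $2}' ${CG}/memory.stat)
echo "raw cgroup usage:    ${RAW} bytes"
echo "approximate RSS:     $((RAW - INACTIVE_FILE)) bytes  # roughly what docker stats shows"
echo "anon (OOM-relevant): ${ANON} bytes"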
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Anonymous memory leak | memory.stat anon grows continuously; container eventually OOMKilled | Application heap profile or per-process RSS inside the container |
| Page cache dominant | docker stats shows high usage but anon is low and stable | memory.stat file vs anon ratio |
| JVM heap/native mismatch | OOMKilled despite JVM heap usage below limit | -Xmx, MaxMetaspaceSize, and native memory relative to --memory |
| .NET aggressive file caching on large hosts | High file cache near limit; GC acts as if memory is exhausted | DOTNET_GCConserveMemory or host free RAM |
| Undersized or missing limit | Usage climbs until it hits a hard ceiling or host-wide OOM | docker inspect HostConfig.Memory |
| Kernel slab growth | memory.stat slab increases without corresponding app growth | cgroup kmem accounting and dmesg |
Quick checks
Run these read-only commands to classify the memory pressure.
# Approximate usage and limit from Docker CLI
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
# cgroup v2: detailed memory breakdown per container
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.stat
# cgroup v1: detailed memory breakdown per container
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.stat
# Current usage and hard limit (v2)
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.current
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.max
# Current usage and hard limit (v1)
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.limit_in_bytes
# OOM status and configured limit
docker inspect --format='OOMKilled={{.State.OOMKilled}} Memory={{.HostConfig.Memory}}' <container_name>
# Kernel OOM log for the container
dmesg | grep -i "memory cgroup out of memory"
# systemd journal alternative for OOM events
journalctl -k | grep -i oom
# OOM kill counter (cgroup v2)
cat /sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope/memory.events
# OOM kill counter (cgroup v1)
cat /sys/fs/cgroup/memory/docker/${CONTAINER_ID}/memory.oom_control
How to diagnose it
1. Decompose memory.stat to find the real consumer. Read memory.stat inside the container's cgroup. If file dominates and anon is stable, the container is using page cache. This is usually reclaimable and not an immediate threat. If anon is high or growing, the application is holding non-reclaimable memory. That is where your pressure is. A scripted version of this check appears just after this list.
2. Check for actual OOM kills. Run docker inspect --format='{{.State.OOMKilled}}' <container> and check the exit code. Exit code 137 with OOMKilled=true means the kernel killed the container at the cgroup boundary. If OOMKilled=false, the SIGKILL came from outside, such as an orchestrator or operator. Do not assume 137 always means memory.
3. Map runtime memory to cgroup limits. For JVM containers, compare -Xmx plus MaxMetaspaceSize and native overhead to the container limit. A common mistake is setting -Xmx equal to the container limit, leaving no room for metaspace, thread stacks, JIT code cache, or direct byte buffers. Set -Xmx to roughly 75% of the container limit and cap metaspace. For Node.js, note that V8 budgets about 1.5 times --max-old-space-size internally. In a 512MB container, set --max-old-space-size=384 to prevent V8 from attempting to allocate beyond the cgroup boundary. For .NET on hosts with more than 36GB of free RAM, the runtime may fill file cache up to the container memory limit, causing the GC to believe memory is exhausted even though anonymous memory is not. Set DOTNET_GCConserveMemory or reduce the container memory limit to constrain this behavior.
4. Look for leak signatures. If anon increases monotonically over hours or days without a corresponding traffic increase, treat it as a leak until proven otherwise. Collect an application heap dump or use a runtime profiler. A sawtooth pattern where memory grows then drops sharply is usually healthy garbage collection. A staircase pattern that never plateaus is not.
5. Check for child-process OOM kills. The kernel OOM killer may target a child process instead of PID 1. In this case, the container survives but is degraded, and docker inspect may show OOMKilled=false. Check dmesg or memory.events for OOM kill counters. If the counter is rising but Docker does not report an OOM, a child process was sacrificed.
6. Evaluate the limit itself. If memory.max shows max, the container has no hard limit. It can consume host memory until the system-wide OOM killer selects a victim, which may not be the offending container. Set explicit limits on every production container so that misbehavior is contained.
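A scripted version of the step 1 and step 6 checks might look like this on a cgroup v2 host; the 80% threshold and the docker-<id>.scope path are assumptions to adapt (cgroup v1 equivalents are listed in the quick checks above).
# Rough pressure classification for one container (cgroup v2)
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
CG=/sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope
ANON=$(awk '$1=="anon" {print $2}' ${CG}/memory.stat)
FILE=$(awk '$1=="file" {print $2}' ${CG}/memory.stat)
LIMIT=$(cat ${CG}/memory.max)
OOM_KILLS=$(awk '$1=="oom_kill" {print $2}' ${CG}/memory.events)
echo "anon=${ANON} file=${FILE} limit=${LIMIT} oom_kills=${OOM_KILLS}"
if [ "${LIMIT}" = "max" ]; then
  echo "no hard limit set: size the workload and set --memory explicitly"
elif [ "${ANON}" -gt $((LIMIT * 80 / 100)) ]; then
  echo "anon-dominant: application memory is the pressure source"
elif [ "${FILE}" -gt "${ANON}" ]; then
  echo "cache-dominant: likely reclaimable, not an immediate OOM risk"
else
  echo "mixed: watch the anon trend over time"
fi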
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| memory.stat anon | Non-reclaimable application memory; drives OOM risk | Steady growth over 6+ hours |
| memory.stat file | Reclaimable page cache; included in raw usage but not in RSS | High value is fine if anon is low |
| memory.events oom_kill (v2) or memory.oom_control (v1) | Kernel-level counter of kills inside the cgroup | Any increment |
| Container OOMKilled status | Docker's view of whether the last termination was an OOM kill | true after a restart |
| docker stats MemPerc | Approximate RSS as percentage of limit | Sustained >85% |
| Container restart count + exit code 137 | Indicates an OOM crash loop | Restart count increasing with 137 |
| JVM heap committed / .NET GC heap | Runtime’s own view of memory vs cgroup reality | Runtime heap far below cgroup limit but container still OOMs |
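Without a monitoring stack, a crude way to capture the anon trend is a sampling loop; this sketch assumes cgroup v2 and a 60-second interval, both of which you would adjust:
# Append a timestamped anon sample every 60 seconds for later plotting
CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container_name>)
CG=/sys/fs/cgroup/system.slice/docker-${CONTAINER_ID}.scope
while true; do
  echo "$(date +%s) $(awk '$1=="anon" {print $2}' ${CG}/memory.stat)" >> anon_trend.log
  sleep 60
done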
Fixes
If the cause is cache pressure
High file memory means the container is caching disk reads. This is normally harmless. If it is causing the kernel to reclaim cache too aggressively and hurting I/O performance, reduce application read-ahead or move temp files off tmpfs. Do not lower the container memory limit solely because file is large; the limit should be sized for anon plus runtime headroom.
If the cause is a runtime mismatch
For JVM workloads, set the container limit to roughly 20-30% above the intended max heap, then configure -Xmx to 75% of the container limit. Explicitly cap metaspace with -XX:MaxMetaspaceSize and code cache with -XX:ReservedCodeCacheSize.
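As an illustration, a 1GB container sized along these lines could be started as follows; the image name and exact values are placeholders, not prescriptions:
# ~1GB limit with heap at ~75% of it; metaspace and code cache capped explicitly
docker run -d --memory=1g --memory-swap=1g \
  -e JAVA_TOOL_OPTIONS="-Xmx768m -XX:MaxMetaspaceSize=128m -XX:ReservedCodeCacheSize=64m" \
  my-java-app:latest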
For Node.js, set --max-old-space-size to leave headroom below the cgroup limit. Use roughly 75% of the limit as a starting point.
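For example, with a 512MB limit (the image name is a placeholder, and this assumes a Node.js version whose NODE_OPTIONS accepts V8 flags):
# 512MB limit with the V8 old space capped at 384MB to leave native headroom
docker run -d --memory=512m --memory-swap=512m \
  -e NODE_OPTIONS="--max-old-space-size=384" \
  my-node-app:latest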
For .NET on large-memory hosts, set the DOTNET_GCConserveMemory environment variable to reduce the GC’s tendency to treat file cache as unavailable memory.
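A sketch of the same idea; the value 5 sits mid-range on the documented 0-9 scale and, like the image name, is an assumption to tune:
# Ask the .NET GC to behave more conservatively when the cgroup looks nearly full
docker run -d --memory=1g \
  -e DOTNET_GCConserveMemory=5 \
  my-dotnet-app:latest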
If the cause is a memory leak
Profile the application using runtime-specific tools: heap dumps for the JVM, V8 heap snapshots for Node.js, or dotnet-gcdump for .NET. If the leak is in application code, fix and redeploy. If the leak is in a dependency, upgrade or constrain the container limit and accept periodic restarts as a temporary mitigation.
If the cause is an undersized limit
Raise the container --memory limit to match the application’s steady-state anon plus a burst buffer. As a rule, a container consistently above 85% of its limit warrants investigation. A container at 90% with stable, non-growing anon is simply undersized, not leaking.
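If the container is otherwise healthy, the limit can be raised in place with docker update; 1g here stands in for observed steady-state anon plus a burst buffer:
# Raise the hard limit on a running container without recreating it
docker update --memory=1g --memory-swap=1g <container_name>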
Prevention
- Set explicit --memory limits on every production container. Without a limit, a runaway container becomes a host-wide OOM risk.
- Size runtime memory settings relative to the cgroup limit. Never set JVM -Xmx equal to the container limit.
- Monitor the memory.stat anon trend, not just total usage. Total usage includes cache and creates false alarms or false confidence.
- Alert on memory.events oom_kill increments and container restart counts paired with exit code 137.
- Configure container log rotation so that disk exhaustion does not compound memory pressure during incidents.
- Set PID limits to prevent fork bombs from amplifying memory pressure. A run command combining these guards appears after this list.
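Several of these guards can be applied at container start; in this sketch every value and the image name are placeholders to adjust per workload:
# Hard memory limit, no extra swap, bounded PIDs, and rotated json-file logs
docker run -d \
  --memory=1g --memory-swap=1g \
  --pids-limit=256 \
  --log-driver=json-file --log-opt max-size=10m --log-opt max-file=3 \
  my-app:latest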
How Netdata helps
- Per-container cgroup breakdown: Netdata surfaces anon, file, and slab from memory.stat per container, so you can see whether high usage is cache or application memory without logging into the host.
- OOM correlation: Netdata correlates container memory percentage, oom_kill events, and exit code 137 on the same timeline, making it easier to distinguish OOM kills from external SIGKILLs.
- Runtime context: By viewing container memory alongside CPU throttling and disk I/O, you can determine whether memory pressure is isolated or part of a broader resource saturation pattern.




