Docker exit code 137: OOMKilled or SIGKILL?

A container exits with code 137. Docker restarts it, or it stays down, and you need to know why. The number itself only tells you that the process received SIGKILL. What matters for your next step is whether the kernel’s cgroup OOM killer fired because the container exceeded its memory limit, or whether an external actor sent the signal. The remediation for an undersized memory limit is completely different from fixing a misconfigured stop timeout or an orchestrator sending a premature kill. This guide shows how to classify the cause in under a minute using only the Docker CLI and cgroup files.

What this means

Exit code 137 follows the standard Linux fatal signal convention: 128 plus the signal number. SIGKILL is 9, so 128 + 9 = 137. When a Docker container exits 137, its PID 1 was terminated by SIGKILL. The operational question is who delivered it.
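
You can see the convention directly with a throwaway container; the container name below is only an illustration.

# Reproduce 137 by sending SIGKILL to a container's PID 1: 128 + 9 = 137
docker run -d --name exit137-demo alpine sleep 300
docker kill --signal SIGKILL exit137-demo
docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' exit137-demo
# Prints "137 false": SIGKILL was delivered, but not by the cgroup OOM killer
docker rm exit137-demo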

There are two broad families:

  1. Cgroup OOM kill. The container exceeded its memory limit, or the host ran out of memory and the kernel OOM killer chose a process in the container’s cgroup. If the killer took down PID 1, Docker sets .State.OOMKilled to true.
  2. External SIGKILL. The container was killed by docker kill, docker stop escalation, systemd, Kubernetes, a CI/CD runner, or the host OOM killer acting outside the container’s cgroup. In these cases, .State.OOMKilled is false.

Multi-process containers add a third scenario. In a container running supervisord, a shell wrapper, or sidecar-style helper processes, the kernel may OOM-kill a child worker while PID 1 survives. The container continues running in a degraded state, docker inspect shows OOMKilled: false, and the only trace is an increment in the cgroup’s memory.events counter.
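
If you want to see this behavior in a disposable container, a sketch along these lines works on most hosts; the tiny limit, the busybox memory hog, and the systemd-style cgroup v2 path are all illustrative assumptions.

# Child memory hog is OOM-killed; PID 1 (the outer shell) keeps running
docker run -d --name child-oom-demo --memory 64m alpine sh -c 'tail /dev/zero & wait; sleep 600'

# Docker still reports a running container with no OOM flag
docker inspect --format '{{.State.Status}} {{.State.OOMKilled}}' child-oom-demo

# The only trace is the cgroup counter
CID=$(docker inspect --format '{{.Id}}' child-oom-demo)
grep oom /sys/fs/cgroup/system.slice/docker-$CID.scope/memory.events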

Common causes

Cause | What it looks like | First thing to check
--- | --- | ---
Cgroup OOM kill | Exit 137, .State.OOMKilled: true, memory was near the limit | docker inspect .State.OOMKilled
External SIGKILL from operator or orchestrator | Exit 137, .State.OOMKilled: false, no OOM lines in kernel logs | Docker daemon logs for kill or stop events
docker stop timeout escalation | Exit 137 after docker stop, preceded by SIGTERM; .State.OOMKilled: false | Whether the application handles SIGTERM within the timeout
Host-level OOM (no container memory limit) | Exit 137, .State.OOMKilled: false, dmesg shows kill without a cgroup path | Host memory pressure and MemAvailable
Child-process OOM in multi-process container | Container still running but degraded, or exits later; .State.OOMKilled: false | Cgroup memory.events oom_kill counter

Quick checks

These checks are read-only and safe to run during an incident.

# Check OOMKilled flag and exit code
docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container_id>

# Check kernel OOM logs
dmesg | grep -i "oom\|killed process" | tail -20

# Alternative via journal
journalctl -k | grep -i oom | tail -20

# Check current memory usage against the limit
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}" <container_id>

# cgroup v2: read the OOM kill counter
cat /sys/fs/cgroup/system.slice/docker-<container_id>.scope/memory.events

# cgroup v1: read the OOM kill counter
cat /sys/fs/cgroup/memory/docker/<container_id>/memory.oom_control

Cgroup paths vary by host configuration. On systemd-managed hosts with cgroup v2, the path is typically /sys/fs/cgroup/system.slice/docker-<id>.scope/. On cgroup v1 hosts it is typically /sys/fs/cgroup/memory/docker/<id>/. If your cgroup driver or systemd integration differs, the exact path may vary.
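
If you would rather not guess, you can resolve the path from the container's own cgroup membership while it is running; this sketch assumes cgroup v2 mounted at /sys/fs/cgroup.

# Resolve the cgroup path for a running container instead of guessing it
PID=$(docker inspect --format '{{.State.Pid}}' <container_id>)
CGPATH=$(grep '^0::' /proc/$PID/cgroup | cut -d: -f3-)
cat /sys/fs/cgroup$CGPATH/memory.events
# On cgroup v1, /proc/<pid>/cgroup lists one line per controller; use the memory: line instead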

How to diagnose it

Use this flow to classify the source of the SIGKILL; a consolidated command sketch follows the steps.

  1. Check docker inspect for .State.OOMKilled and .State.ExitCode. If OOMKilled is true, the kernel’s cgroup OOM killer terminated PID 1. The container exceeded its memory limit, or the host was under memory pressure and the kernel selected this cgroup. Move to the fixes section.

  2. If OOMKilled is false, check dmesg or journalctl -k for OOM messages. Look for lines containing Memory cgroup out of memory: Killed process. If you see a cgroup path matching your container, the OOM killer acted inside the cgroup but did not kill PID 1. This is common in multi-process containers where a child process is sacrificed. Proceed to step 4.

  3. If there are no kernel OOM lines, the SIGKILL was external. Check Docker daemon logs for stop-timeout escalations, manual docker kill commands, or orchestrator actions. Also verify whether systemd or a CI/CD runner sent SIGKILL to the container scope.

  4. Read the cgroup memory.events file (v2) or memory.oom_control (v1). On cgroup v2, memory.events contains two relevant counters: oom (times OOM was triggered) and oom_kill (actual kills). If oom_kill is nonzero but docker inspect showed OOMKilled: false, a child process was OOM-killed inside the container. On cgroup v1, memory.oom_control exposes an oom_kill counter.

  5. Check whether the container had a memory limit and how close it was. Use docker inspect to verify the limit. If no limit was set, the host-level OOM killer may have targeted the process when the host ran out of memory. This produces exit code 137 with OOMKilled: false because the kill was not scoped to the container’s cgroup.

  6. Correlate with container restart count and timing. A sawtooth pattern where memory grows until kill, then the container restarts and grows again, confirms an OOM crash loop. If the restart count is climbing and the exit code is always 137, you are looking at either a limit that is too low or an application memory leak.
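
The flow condenses into a short triage sketch. The container ID placeholder, the systemd-style cgroup v2 path, and the printed conclusions are assumptions for illustration; adjust them as described under Quick checks.

# Minimal triage sketch for an exit 137
CID=$(docker inspect --format '{{.Id}}' <container_id>)
OOM=$(docker inspect --format '{{.State.OOMKilled}}' "$CID")
LIMIT=$(docker inspect --format '{{.HostConfig.Memory}}' "$CID")   # 0 means no memory limit was set
if [ "$OOM" = "true" ]; then
  echo "cgroup OOM kill of PID 1: raise the limit ($LIMIT bytes) or fix the leak"
elif dmesg | grep -qi "Memory cgroup out of memory"; then
  echo "cgroup OOM seen in kernel logs but PID 1 survived: check for child-process kills"
  # Counter is only readable while the container's cgroup still exists
  grep oom /sys/fs/cgroup/system.slice/docker-$CID.scope/memory.events 2>/dev/null
else
  echo "no kernel OOM evidence: the SIGKILL was external (stop escalation, operator, orchestrator)"
fi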

Metrics and signals to monitor

Signal | Why it matters | Warning sign
--- | --- | ---
Container exit code 137 | Identifies SIGKILL events | Any unexpected 137 on a long-running container
Container OOMKilled status | Binary confirmation of cgroup OOM | OOMKilled: true in production
Container memory usage vs limit | Predicts OOM before it happens | Sustained usage >75% of limit
cgroup memory.events oom_kill | Catches child OOMs invisible to Docker | Counter increasing while container stays running
Host MemAvailable / memory pressure | Reveals host-level OOM risk | MemAvailable <20% of MemTotal
Container restart count | Detects crash loops | Restart count increasing faster than once per hour

Fixes

If the cause is cgroup OOM kill

  • Increase the memory limit. If the workload legitimately needs more memory, raise the limit with docker update --memory or your orchestrator’s equivalent. Leave headroom for runtime overhead (see the sketch after this list).
  • Fix the memory leak. If usage climbs monotonically until OOM, profile the application. For JVM containers, ensure -Xmx leaves room for metaspace, thread stacks, and native memory. Setting -Xmx equal to the container limit is a common mistake that guarantees OOM.
  • Tune the runtime. Language runtimes often pre-allocate memory based on host size rather than cgroup limits. Verify that your runtime respects container boundaries and size its internal heaps or arenas accordingly.
  • Consider swap. If your workload tolerates swapped pages, enabling swap can delay or prevent OOM kills. Without swap, the OOM killer fires immediately when the limit is hit.
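
For instance, raising a limit in place and sizing a JVM heap below it could look like the following; the values and image name are illustrative, not recommendations.

# Raise the limit on a running container (update swap alongside it if swap is configured)
docker update --memory 1g --memory-swap 1g <container_id>

# Start a JVM workload with the heap capped well below the cgroup limit
docker run -d --memory 1g -e JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0" <jvm_image>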

If the cause is external SIGKILL

  • Fix graceful shutdown. If docker stop produces 137 because the application ignores SIGTERM, implement a signal handler or increase the stop timeout with --stop-timeout; the default is 10 seconds (see the example after this list).
  • Stop aggressive cleanup jobs. CI/CD runners and some orchestration controllers send SIGKILL for fast cleanup. If this is premature, increase the grace period or fix the job logic.
  • Review orchestrator policies. Kubernetes terminationGracePeriodSeconds, systemd TimeoutStopSec, and Docker Swarm’s --stop-grace-period can all escalate to SIGKILL. Ensure the timeout matches the application’s actual shutdown time.
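
A quick end-to-end check of graceful shutdown, assuming a shell entrypoint (the handler and the 30-second window are illustrative):

# PID 1 traps SIGTERM and exits cleanly before the stop timeout escalates to SIGKILL
docker run -d --name graceful-demo --stop-timeout 30 alpine sh -c 'trap "echo draining; exit 0" TERM; while true; do sleep 1; done'
docker stop graceful-demo
docker inspect --format '{{.State.ExitCode}}' graceful-demo
# Prints 0; an app that ignored SIGTERM would be SIGKILLed and show 137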

If the cause is child-process OOM

  • Increase the container memory limit. Even if PID 1 survived, the cgroup is under memory pressure. The child died because the overall limit was too low.
  • Restructure the container. Where possible, use a single-process model per container so that any OOM kill is visible to Docker via OOMKilled: true and triggers a restart if configured.
  • Monitor memory.events directly. Because Docker does not surface child OOM kills in docker inspect, track the cgroup oom_kill counter as a primary signal for multi-process containers.
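
A simple way to do that from a shell, assuming a systemd-managed cgroup v2 host and a container that is still running:

# Poll the oom_kill counter; any increase means a child process was OOM-killed
CID=$(docker inspect --format '{{.Id}}' <container_name>)
watch -n 10 grep oom_kill /sys/fs/cgroup/system.slice/docker-$CID.scope/memory.events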

Prevention

  • Set memory limits with headroom. Do not run production containers without limits, but do not set limits so tight that normal spikes trigger OOM (see the example after this list).
  • Monitor cgroup memory.events and oom_kill. On cgroup v2 hosts, alert on any increase in oom_kill. On cgroup v1, monitor memory.oom_control. This catches child OOMs that Docker hides.
  • Configure log rotation. Exit 137 investigations often happen during incidents, and unrotated container logs add disk pressure that buries the signals you need. Configure max-size and max-file for the json-file driver.
  • Test graceful shutdown. Verify that your application exits cleanly on SIGTERM within the stop timeout. If it cannot, adjust the timeout or the application.
  • Alert on restart counts. A container that restarts even once per hour is degrading. Correlate restart spikes with memory usage and exit codes.
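
As a starting point, a run command that combines a memory limit with headroom and bounded json-file logs might look like this; every value here is illustrative.

# Hard limit with a lower soft reservation, plus log rotation
docker run -d \
  --memory 512m --memory-reservation 384m \
  --log-driver json-file --log-opt max-size=10m --log-opt max-file=3 \
  <image>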

How Netdata helps

Netdata surfaces the signals you need to correlate exit code 137 with its root cause without manual cgroup inspection:

  • Per-container memory charts show usage against the cgroup limit, highlighting when a container is approaching OOM.
  • Cgroup v2 memory.events monitoring tracks oom and oom_kill counters, including child-process kills that Docker does not report.
  • Exit code and restart count visibility across the fleet lets you spot 137 patterns and crash loops without running docker inspect on every host.
  • Host memory pressure correlation shows whether the kill was a container-level limit breach or part of a wider host OOM event.