Docker Engine monitoring with Netdata

What is Docker Engine?

Docker Engine is the industry’s de facto container runtime. It runs on various Linux distributions (CentOS, Debian, Fedora, Oracle Linux, RHEL, SUSE, and Ubuntu) and on Windows Server.

Monitoring Docker Engine with Netdata

To monitor Docker Engine with Netdata, you need both Docker Engine and Netdata installed on your system.

Netdata auto-discovers hundreds of services, and for those it doesn’t, turning on manual discovery is a one-line configuration. For more information on configuring Netdata for Docker Engine monitoring, please read the collector documentation.

Once the collector is running, you should see the Docker Engine section on the Overview tab in Netdata Cloud, already populated with charts for the metrics described below.

Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.

What Docker Engine metrics are important to monitor - and why?

Engine Daemon Container Actions

The number of container actions (such as changes, commit, create, delete, and start) that the Docker Engine daemon performs per second. This metric shows how containers are being used and exposes unexpected spikes in container activity. Watching it helps you catch an overloaded Docker Engine daemon before it causes performance issues.
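A per-second figure like this is typically derived from a counter that only ever increases. The sketch below shows that arithmetic under stated assumptions: the function name, the sample readings, and the reset handling are illustrative and are not taken from Netdata's implementation.

```python
def per_second_rate(prev_count: float, curr_count: float, interval_s: float) -> float:
    """Turn two readings of an increasing counter into a per-second rate."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_count - prev_count
    # A counter that moved backwards usually means the daemon restarted,
    # so treat the current value as the increase since that reset.
    if delta < 0:
        delta = curr_count
    return delta / interval_s


# Hypothetical readings of a "start" action counter taken 10 seconds apart.
print(per_second_rate(120, 150, 10))  # 3.0
```

Handling the reset case explicitly avoids reporting a large negative rate right after a daemon restart.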

Engine Daemon Container States Containers

The number of containers in each state (running, paused, or stopped) that the Docker Engine daemon is managing. This metric gives a quick view of the overall health of the daemon's workload. An unexpected shift between states, such as running containers suddenly moving to stopped, is an early sign that something needs attention.
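To make "unexpected changes between states" concrete, here is a minimal sketch that reads per-state counts from Prometheus-style text and compares two snapshots. It assumes the counts arrive as lines for a metric named engine_daemon_container_states_containers with a state label; that name, the sample text, and both helper functions are assumptions made for illustration.

```python
import re

# Illustrative sample only; real output depends on your Docker Engine setup.
SAMPLE = """\
# TYPE engine_daemon_container_states_containers gauge
engine_daemon_container_states_containers{state="paused"} 0
engine_daemon_container_states_containers{state="running"} 4
engine_daemon_container_states_containers{state="stopped"} 2
"""

LINE_RE = re.compile(
    r'^engine_daemon_container_states_containers\{state="(\w+)"\}\s+(\S+)$'
)


def container_states(text: str) -> dict[str, int]:
    """Extract {state: count} from Prometheus-style exposition text."""
    states = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            states[match.group(1)] = int(float(match.group(2)))
    return states


def state_changes(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return only the states whose count changed, with the signed difference."""
    changes = {}
    for state in sorted(set(before) | set(after)):
        diff = after.get(state, 0) - before.get(state, 0)
        if diff != 0:
            changes[state] = diff
    return changes


print(container_states(SAMPLE))
print(state_changes({"running": 4, "stopped": 2}, {"running": 1, "stopped": 5}))
```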

Builder Builds Failed Total

The number of failed builds per second in the Docker Builder service. This metric helps identify problems with the Docker Builder service, and an unexpected spike in failed builds is a signal to investigate before the failures affect your delivery pipeline.

Engine Daemon Health Checks Failed Total

The number of failed health checks per second in the Docker Engine daemon. This metric helps identify containers that are failing the health checks performed by the daemon, and an unexpected spike in failures is a signal to investigate the affected containers before the problem spreads.
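One simple way to think about an "unexpected spike" is to compare the latest rate with a recent baseline. The sketch below does that with a plain average. The function name, the multiplier, and the floor value are hypothetical choices for illustration, and this is not how Netdata itself detects anomalies.

```python
from statistics import mean


def is_spike(history: list[float], latest: float,
             factor: float = 3.0, floor: float = 0.1) -> bool:
    """Flag `latest` if it exceeds `factor` times the recent average rate.

    `floor` keeps a near-zero baseline from turning every tiny blip into a spike.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = max(mean(history), floor)
    return latest > factor * baseline


# Hypothetical failed-health-check rates (per second) from recent samples.
recent = [0.0, 0.1, 0.0, 0.1]
print(is_spike(recent, 1.0))   # True
print(is_spike(recent, 0.2))   # False
```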

Swarm Manager Leader

A boolean value that indicates whether this Docker Swarm manager is currently the leader of the cluster. This metric is important for monitoring the overall health of the cluster, because an unexpected change of leader often points to a manager failure or a connectivity problem between managers.
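Because each manager reports its own leader flag, a useful cross-check is that exactly one manager in the cluster reports 1. The following sketch illustrates that check. The manager names and the function are hypothetical, and the flags are assumed to have been collected from every manager already.

```python
def describe_leadership(leader_flags: dict[str, int]) -> str:
    """Summarize leadership given {manager_name: 0 or 1} flags."""
    leaders = sorted(name for name, flag in leader_flags.items() if flag == 1)
    if len(leaders) == 1:
        return f"ok: {leaders[0]} is the leader"
    if not leaders:
        return "alert: no manager reports itself as leader"
    return "alert: multiple managers report leadership: " + ", ".join(leaders)


# Hypothetical three-manager cluster.
print(describe_leadership({"mgr-a": 0, "mgr-b": 1, "mgr-c": 0}))
print(describe_leadership({"mgr-a": 0, "mgr-b": 0, "mgr-c": 0}))
```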

Swarm Manager Object Store

The number of objects stored in the Docker Swarm Manager object store, such as nodes, services, tasks, networks, secrets, and configs. This metric helps identify problems with the object store: an unexpected jump or drop in the number of stored objects usually deserves investigation.

Swarm Manager Nodes Per State

The number of nodes in each state (ready, down, unknown, and disconnected) that the Docker Swarm Manager is managing. This metric gives insight into the overall health of the swarm. An unexpected rise in nodes that are not ready reduces the capacity available for scheduling tasks and should be investigated promptly.
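A per-state breakdown like this is often reduced to a single ratio for alerting. The sketch below computes the share of nodes that are not ready. The function name and the sample counts are assumptions chosen for illustration.

```python
def unavailable_ratio(nodes_by_state: dict[str, int]) -> float:
    """Return the fraction of nodes that are in any state other than "ready"."""
    total = sum(nodes_by_state.values())
    if total == 0:
        return 0.0  # an empty cluster has nothing to report as unavailable
    return (total - nodes_by_state.get("ready", 0)) / total


# Hypothetical ten-node swarm with two nodes that are not ready.
nodes = {"ready": 8, "down": 1, "unknown": 1, "disconnected": 0}
print(unavailable_ratio(nodes))  # 0.2
```

The same calculation applies to any other per-state breakdown by swapping in the state that counts as healthy.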

Swarm Manager Tasks Per State

The number of tasks in each state (running, failed, ready, rejected, starting, shutdown, new, orphaned, preparing, pending, complete, remove, accepted, and assigned) that the Docker Swarm Manager is managing. This metric gives insight into how well the swarm is scheduling and running work. An unexpected rise in failed, rejected, or long-pending tasks is a sign that services are not being deployed as intended.
