Netdata Agent v1.24: Prometheus/OpenMetrics collector and multi-host database mode

by | Aug 10, 2020 | Blog, Product

The v1.24.0 release of the Netdata Agent brings enhancements to the breadth of metrics we collect with a new Prometheus/OpenMetrics collector and enhanced storage and querying with a new multi-host database mode. Let’s take a look at each of these enhancements.

Instantly access thousands of metrics with the new Prometheus/OpenMetrics collector

This release broadens our commitment to open standards, interoperability, and extensibility with a new generic Prometheus collector that works seamlessly with any application that makes its metrics available in the Prometheus/OpenMetrics exposition format, including support for Windows 10 via windows_exporter. Netdata will autodetect over 600 Prometheus endpoints and instantly generate charts with all the exposed metrics, meaningfully visualized.

You can also quickly and easily configure the collector with the names and URLs of additional Prometheus endpoints to instantly view automatically generated charts with all the exposed metrics, meaningfully visualized within Netdata at the same high-granularity, per-second frequency you expect, all in real time. To learn more about how to configure, check out our documentation.

 

 What is the Prometheus exposition format?

The Prometheus exposition format is a simple, text-based standard for exposing metrics from third-party software. Netdata now has a turnkey collector for any metrics that are exposed using this format.

What is OpenMetrics?

OpenMetrics is a project aimed at evolving the Prometheus exposition format into a universal standard for transmitting metrics at scale, with support for both text representation and Protocol Buffers. It is a CNCF Sandbox Project with a home on GitHub.

Why does it matter for Netdata?

Netdata is designed to be the best tool possible for collecting every single metric available from any system or application, then presenting those metrics in a way that makes them understandable and actionable. Part of that intention means that Netdata must definitionally be open, interoperable, and extensible so that it can work with the entirety of modern infrastructure.

This new enhancement is exciting because it radically extends the number of metrics available to Netdata out of the box, but also because enables Netdata to support an evolving standard that will allow us to continue to fit into any technology stack possible, now and in the future, with no limitations on the number, kind, or frequency of metrics collected.

Limitations and future direction

This first attempt at visualizing Prometheus metrics with zero configuration has a few limitations that we are aware of.

Metrics with the same name and many different label key-value pairs can potentially have a very high cardinality, namely too many time series to visualize in a single chart. We debated automatically splitting such time series into multiple charts based on the cardinality of the detected label key-value pairs, but we decided that using the wrong labels to organize information is worse than not choosing any labels at all. So we opted to release this first version with a temporary solution that will split the time series arbitrarily and are working on providing configuration options to allow users to define meaningful grouping options. For example, for Prometheus metric X, group the time series into charts based on labels Y and Z.

Zero-configuration autodetection currently works for services running on the same host as the Netdata Agent. You can configure additional endpoints as with any Netdata collector. We will be extending our service discovery capabilities so that we can discover as many OpenMetrics endpoints as possible in Docker and Kubernetes.

When you use Netdata’s built-in, long-term storage (dbengine), the memory usage is currently directly related to the number of dimensions collected and stored in the database. We are working to significantly reduce that memory footprint, so we can collect arbitrary numbers of metrics, and store them for arbitrarily long periods of time, with limited memory requirements.

We need your help!

We are very proud of the direction we are taking with service discovery and automated OpenMetrics, but we need your feedback to improve. Let’s exchange ideas about the new collector in our community forum.

New multi-host database mode

While Netdata is renowned for visualizing metrics in real time, it is also possible for Netdata to support long-term storage of per-second metrics. This capability is user-configurable based on available RAM and disk space.

Additionally, Netdata can be configured to support parent-child relationships between nodes. Any Netdata node is able to replicate/mirror its database to another Netdata node by streaming collected metrics to it in real time. This is quite different to data archiving to third party time-series databases.

In our new, multi-host database mode, parent and child nodes share resources in a single instance. Any pre-existing child node metrics remain in the legacy dbengine paths to ensure backward compatibility. To migrate those nodes to the new multi-host database, simply delete those metric cache paths. This new mode supports distributed queries for the Agent as well as specific scenarios like streaming metrics from the child to parent database, streaming multiple child nodes to a single parent, and remembering which child or children are connected to the database even if streaming hasn’t started.

 

This new mode provides better support for cloud-based infrastructure that relies on ephemeral nodes as well as for any infrastructure that utilizes parent-child node relationships.

Contributors

We wish to extend a  special thanks to these contributors:

  • @lassebm for the FreeBSD interface error alarms
  • @Saruspete for fixing the RPM default permissions for /usr/libexec/netdata
  • @Steve8291 for adjusting check-kernel-config.sh to run in bash
  • @bmatheny for adding pihole to the dns app group
  • @tinyhammers for templatizing the health/megacli alarmsd

Be sure to check out the release notes on GitHub for other notable changes and bug fixes.