Metric Correlations on the Agent

Discovering Relationships for Better Troubleshooting Insights

As of v1.35.0, the Netdata Agent can run Metric Correlations (MC) itself. This means that, for nodes with MC enabled, the Metric Correlations feature just got a whole lot faster!

The Netdata Metric Correlations feature uses a two-sample Kolmogorov-Smirnov test to find which metrics show a significant distributional change around a highlighted window of interest. This can be useful when you are interested in short-term “change detection” and want to try to answer the question “what else changed around this time?”.
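To make the idea concrete, here is a minimal sketch of the two-sample KS statistic, the largest gap between the empirical distributions of a baseline window and a highlighted window, used to rank a few metrics by how much they changed. The metric names, window sizes, and synthetic data are all made up for illustration, and this is not the Agent's actual implementation, just the general technique in plain Python.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the maximum absolute difference
    between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The largest gap between two step functions occurs at a data point,
    # so it is enough to evaluate both CDFs at every pooled observation.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d


def rank_metrics(baseline, highlight):
    """Return (metric, statistic) pairs, most-changed metric first."""
    scores = {name: ks_statistic(baseline[name], highlight[name])
              for name in baseline}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: 240 baseline samples and 60 highlighted samples per metric.
rng = random.Random(0)
baseline = {
    "cpu_user": [rng.gauss(20, 2) for _ in range(240)],
    "disk_io": [rng.gauss(100, 10) for _ in range(240)],
    "net_in": [rng.gauss(50, 5) for _ in range(240)],
}
highlight = {
    "cpu_user": [rng.gauss(35, 2) for _ in range(60)],   # shifted distribution
    "disk_io": [rng.gauss(100, 10) for _ in range(60)],  # unchanged
    "net_in": [rng.gauss(50, 5) for _ in range(60)],     # unchanged
}

ranking = rank_metrics(baseline, highlight)
for name, stat in ranking:
    print(f"{name}: KS statistic = {stat:.3f}")
```

In this toy setup the shifted metric should land at the top of the ranking with a statistic close to 1, while the unchanged metrics score much lower. A full test would also turn the statistic into a p-value, which this sketch leaves out.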

Our original implementation was a Python FastAPI-based cloud service that wrapped the scipy.stats.ks_2samp() function of the popular open source SciPy package.

While this has worked well, it requires the underlying raw input data (all metrics!) to be sent over the network to the cloud service for the statistical test computation to happen. Results are then returned to the Netdata Cloud frontend, which filters for only those metrics that have changed the most (according to the statistical test). Obviously, this means some latency while all this data gets sent around, so users could be waiting anywhere from 5 to 25 seconds for results in some cases.

So, to give users a faster option, we decided to re-implement the whole algorithm on the Agent itself in C (here is the PR if you’d like to geek out a bit). This means that Netdata Agents now have a new /api/v1/metric_correlations endpoint that can run the MC algorithm without having to send any data anywhere.
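As a rough illustration, the snippet below builds a request URL for the new endpoint on a local Agent. The endpoint path comes from this post and 19999 is the Agent's default port, but the query parameter names (baseline_after, baseline_before, after, before) and the timestamps are assumptions here, so check the API documentation for your Agent version before relying on them. Both helper functions are hypothetical names used only for this sketch.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_mc_url(host, baseline_after, baseline_before, after, before, port=19999):
    """Build a metric_correlations URL.

    The parameter names are assumed, not confirmed by this post:
    the baseline_* pair bounds the baseline window and after/before
    bound the highlighted window, all as Unix timestamps.
    """
    query = urlencode({
        "baseline_after": baseline_after,
        "baseline_before": baseline_before,
        "after": after,
        "before": before,
    })
    return f"http://{host}:{port}/api/v1/metric_correlations?{query}"


def fetch_metric_correlations(url, timeout=30):
    """Call the Agent and decode its JSON response (needs a running Agent)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical windows: a 4-minute baseline followed by a 1-minute highlight.
url = build_mc_url("localhost", 1650000000, 1650000240, 1650000240, 1650000300)
print(url)
```

Calling fetch_metric_correlations(url) against a node with MC enabled should return the Agent's JSON result. The request is kept out of the top-level code so the snippet also runs on a machine without an Agent.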

Below you can see that typical latencies on the default cloud-based Metric Correlations service range from 5 to 25 seconds.

[Figure: latency based on cloud microservice]

In comparison, the latencies for the Agent-based Metric Correlations tend to range from about 100 milliseconds to a typical upper bound of around 5 seconds.

[Figure: latency based on agent]

Getting started

Following some amazing optimizations by our CEO (yes, our CEO!), it is actually more efficient to run MC on the Agent than it previously was just to prepare the input data to send to the MC cloud service. As such, it has been enabled by default since v1.35.0-22-nightly.

This means you can get blazing fast metric correlations, out of the box, just by updating your nodes to the latest nightly release.

Coming soon

The ability to run the Metric Correlations computation on the Netdata Agent is the first step towards cross-node Metric Correlations on the Overview tab. We are also planning to implement several different algorithms as part of Metric Correlations so you can try a few different approaches as you troubleshoot.

Learn more

If you would like to learn more, check out the Metric Correlations documentation. Feel free to leave some feedback in our beta launch community post, or come chat in our Discord.