Metric Correlations on the Agent
agent, Machine Learning, Troubleshooting Jun 15, 2022
The Netdata Metric Correlations feature uses a two-sample Kolmogorov-Smirnov test to find the metrics whose distribution changed significantly around a highlighted window of interest. This is useful for short-term “change detection”, when you want to answer the question “what else changed around this time?”
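To make the idea concrete, here is a minimal sketch of a two-sample Kolmogorov-Smirnov statistic used to rank metrics by how much they changed. It is an illustration only, not the Agent's C implementation: the function names `ks_statistic` and `rank_metrics`, the dictionary input layout, and the choice to rank by the raw statistic rather than a p-value are all assumptions made for this example.

```python
from bisect import bisect_right


def ks_statistic(baseline, highlight):
    """Return the largest gap between the empirical CDFs of two samples."""
    if not baseline or not highlight:
        raise ValueError("both samples must be non-empty")
    a = sorted(baseline)
    b = sorted(highlight)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def rank_metrics(baseline_by_metric, highlight_by_metric):
    """Sort metric names by KS statistic, most changed first (hypothetical helper)."""
    scores = {
        name: ks_statistic(baseline_by_metric[name], highlight_by_metric[name])
        for name in baseline_by_metric
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A statistic of 0 means the two windows look identical, and 1 means they do not overlap at all, so the metrics at the top of the ranking are the ones that best answer “what else changed around this time?”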
While this has worked well, it requires the underlying raw input data (all metrics!) to be sent over the network to the cloud service, where the statistical test is computed. The results are then returned to the Netdata Cloud frontend, which filters for only those metrics that changed the most according to the test. Moving all this data around adds latency, so in some cases users could be waiting anywhere from 5 to 25 seconds for results.
So, to give users a faster option, we decided to re-implement the whole algorithm on the Agent itself in C (here is the PR if you’d like to geek out a bit). This means that Netdata Agents now have a new /api/v1/metric_correlations endpoint that can run the MC algorithm without having to send any data anywhere.
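As a rough sketch of how the endpoint might be called directly, the snippet below builds a request URL and fetches the JSON response. Only the /api/v1/metric_correlations path comes from this post. The query parameter names (`baseline_after`, `baseline_before`, `after`, `before`), the `localhost:19999` address, and the helper names are assumptions for illustration, so check them against your Agent's API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_mc_url(host, baseline_after, baseline_before, after, before):
    """Build a metric correlations URL (parameter names are assumed, not confirmed here)."""
    params = {
        "baseline_after": baseline_after,    # start of the baseline window (Unix seconds)
        "baseline_before": baseline_before,  # end of the baseline window
        "after": after,                      # start of the highlighted window
        "before": before,                    # end of the highlighted window
    }
    return f"http://{host}/api/v1/metric_correlations?{urlencode(params)}"


def fetch_correlations(url, timeout=30):
    """Call the Agent and decode the JSON body; needs a reachable Agent."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage against a local Agent (hypothetical timestamps):
# url = build_mc_url("localhost:19999", 1655280000, 1655280240, 1655280240, 1655280300)
# result = fetch_correlations(url)
```

Because the request goes straight to the Agent, no raw metric data needs to leave the node to compute the result.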
Typical latencies for the default cloud-based Metric Correlations service range from 5 to 25 seconds.
In comparison, latencies for the agent-based Metric Correlations tend to fall between 100 milliseconds and a typical upper range of about 5 seconds.
Following some amazing optimizations by our CEO (yes, our CEO!), it is actually more efficient to run MC on the agent than it previously was just to prepare the input data to send to the MC cloud service. As such, it has been enabled by default since
This means you can get blazing fast metric correlations, out of the box, just by updating your nodes to the latest nightly release.
Being able to run the Metric Correlations computation on the Netdata Agent is the first step towards cross-node Metric Correlations on the Overview tab. We are also planning to implement several other algorithms as part of Metric Correlations, so you can try a few different approaches as you troubleshoot.