The only agent that thinks for itself

Autonomous Monitoring with self-learning AI built-in, operating independently across your entire stack.

Unlimited Metrics & Logs
Machine learning & MCP
5% CPU, 150MB RAM
3GB disk, >1 year retention
800+ integrations, zero config
Dashboards, alerts out of the box
> Discover Netdata Agents
Centralized metrics streaming and storage

Aggregate metrics from multiple agents into centralized Parent nodes for unified monitoring across your infrastructure.

Stream from unlimited agents
Long-term data retention
High availability clustering
Data replication & backup
Scalable architecture
Enterprise-grade security
> Learn about Parents
Fully managed cloud platform

Access your monitoring data from anywhere with our SaaS platform. No infrastructure to manage, automatic updates, and global availability.

Zero infrastructure management
99.9% uptime SLA
Global data centers
Automatic updates & patches
Enterprise SSO & RBAC
SOC2 & ISO certified
> Explore Netdata Cloud
Deploy Netdata Cloud in your infrastructure

Run the full Netdata Cloud platform on-premises for complete data sovereignty and compliance with your security policies.

Complete data sovereignty
Air-gapped deployment
Custom compliance controls
Private network integration
Dedicated support team
Kubernetes & Docker support
> Learn about Cloud On-Premises
Powerful, intuitive monitoring interface

Modern, responsive UI built for real-time troubleshooting with customizable dashboards and advanced visualization capabilities.

Real-time chart updates
Customizable dashboards
Dark & light themes
Advanced filtering & search
Responsive on all devices
Collaboration features
> Explore Netdata UI
Monitor on the go

Native iOS and Android apps bring full monitoring capabilities to your mobile device with real-time alerts and notifications.

iOS & Android apps
Push notifications
Touch-optimized interface
Offline data access
Biometric authentication
Widget support
> Download apps

Best energy efficiency

True real-time per-second

100% automated zero config

Centralized observability

Multi-year retention

High availability built-in

Zero maintenance

Always up-to-date

Enterprise security

Complete data control

Air-gap ready

Compliance certified

Millisecond responsiveness

Infinite zoom & pan

Works on any device

Native performance

Instant alerts

Monitor anywhere

80% Faster Incident Resolution
AI-powered troubleshooting from detection, to root cause and blast radius identification, to reporting.
True Real-Time and Simple, even at Scale
Linearly and infinitely scalable full-stack observability, that can be deployed even mid-crisis.
90% Cost Reduction, Full Fidelity
Instead of centralizing the data, Netdata distributes the code, eliminating pipelines and complexity.
Control Without Surrender
SOC 2 Type 2 certified with every metric kept on your infrastructure.
Integrations

800+ collectors and notification channels, auto-discovered and ready out of the box.

800+ data collectors
Auto-discovery & zero config
Cloud, infra, app protocols
Notifications out of the box
> Explore integrations
Real Results
46% Cost Reduction

Reduced monitoring costs by 46% while cutting staff overhead by 67%.

— Leonardo Antunez, Codyas

Zero Pipeline

No data shipping. No central storage costs. Query at the edge.

From Our Users
"Out-of-the-Box"

So many out-of-the-box features! I mostly don't have to develop anything.

— Simon Beginn, LANCOM Systems

No Query Language

Point-and-click troubleshooting. No PromQL, no LogQL, no learning curve.

Enterprise Ready
67% Less Staff, 46% Cost Cut

Enterprise efficiency without enterprise complexity—real ROI from day one.

— Leonardo Antunez, Codyas

SOC 2 Type 2 Certified

Zero data egress. Only metadata reaches the cloud. Your metrics stay on your infrastructure.

Full Coverage
800+ Collectors

Auto-discovered and configured. No manual setup required.

Any Notification Channel

Slack, PagerDuty, Teams, email, webhooks—all built-in.

Built for the People Who Get Paged
Because 3am alerts deserve instant answers, not hour-long hunts.
Every Industry Has Rules. We Master Them.
See how healthcare, finance, and government teams cut monitoring costs 90% while staying audit-ready.
Monitor Any Technology. Configure Nothing.
Install the agent. It already knows your stack.
From Our Users
"A Rare Unicorn"

Netdata gives more than you invest in it. A rare unicorn that obeys the Pareto rule.

— Eduard Porquet Mateu, TMB Barcelona

99% Downtime Reduction

Reduced website downtime by 99% and cloud bill by 30% using Netdata alerts.

— Falkland Islands Government

Real Savings
30% Cloud Cost Reduction

Optimized resource allocation based on Netdata alerts cut cloud spending by 30%.

— Falkland Islands Government

46% Cost Cut

Reduced monitoring staff by 67% while cutting operational costs by 46%.

— Codyas

Real Coverage
"Plugin for Everything"

Netdata has agent capacity or a plugin for everything, including Windows and Kubernetes.

— Eduard Porquet Mateu, TMB Barcelona

"Out-of-the-Box"

So many out-of-the-box features! I mostly don't have to develop anything.

— Simon Beginn, LANCOM Systems

Real Speed
Troubleshooting in 30 Seconds

From 2-3 minutes to 30 seconds—instant visibility into any node issue.

— Matthew Artist, Nodecraft

20% Downtime Reduction

20% less downtime and 40% budget optimization from out-of-the-box monitoring.

— Simon Beginn, LANCOM Systems

Pay per Node. Unlimited Everything Else.

One price per node. Unlimited metrics, logs, users, and retention. No per-GB surprises.

Free tier—forever
No metric limits or caps
Retention you control
Cancel anytime
> See pricing plans
What's Your Monitoring Really Costing You?

Most teams overpay by 40-60%. Let's find out why.

Expose hidden metric charges
Calculate tool consolidation
Customers report 30-67% savings
Results in under 60 seconds
> See what you're really paying
Your Infrastructure Is Unique. Let's Talk.

Because monitoring 10 nodes is different from monitoring 10,000.

On-prem & air-gapped deployment
Volume pricing & agreements
Architecture review for your scale
Compliance & security support
> Start a conversation
Monitoring That Sells Itself

Deploy in minutes. Impress clients in hours. Earn recurring revenue for years.

30-second live demos close deals
Zero config = zero support burden
Competitive margins & deal protection
Response in 48 hours
> Apply to partner
Per-Second Metrics at Homelab Prices

Same engine, same dashboards, same ML. Just priced for tinkerers.

Community: Free forever · 5 nodes · non-commercial
Homelab: $90/yr · unlimited nodes · fair usage
> Start monitoring your lab—free
$1,000 Per Referral. Unlimited Referrals.

Your colleagues get 10% off. You get 10% commission. Everyone wins.

10% of subscriptions, up to $1,000 each
Track earnings inside Netdata Cloud
PayPal/Venmo payouts in 3-4 weeks
No caps, no complexity
> Get your referral link
Cost Proof
40% Budget Optimization

"Netdata's significant positive impact" — LANCOM Systems

Calculate Your Savings

Compare vs Datadog, Grafana, Dynatrace

Savings Proof
46% Cost Reduction

"Cut costs by 46%, staff by 67%" — Codyas

30% Cloud Bill Savings

"Reduced cloud bill by 30%" — Falkland Islands Gov

Enterprise Proof
"Better Than Combined Alternatives"

"Better observability with Netdata than combining other tools." — TMB Barcelona

Real Engineers, <24h Response

DPA, SLAs, on-prem, volume pricing

Why Partners Win
Demo Live Infrastructure

One command, 30 seconds, real data—no sandbox needed

Zero Tickets, High Margins

Auto-config + per-node pricing = predictable profit

Homelab Ready
"Absolutely Incredible"

"We tested every monitoring system under the sun." — Benjamin Gabler, CEO Rocket.Net

76k+ GitHub Stars

3rd most starred monitoring project

Worth Recommending
Product That Delivers

Customers report 40-67% cost cuts, 99% downtime reduction

Zero Risk to Your Rep

Free tier lets them try before they buy

Never Fight Fires Alone

Docs, community, and expert help—pick your path to resolution.

Learn.netdata.cloud docs
Discord, Forums, GitHub
Premium support available
> Get answers now
60 Seconds to First Dashboard

One command to install. Zero config. 850+ integrations documented.

Linux, Windows, K8s, Docker
Auto-discovers your stack
> Read our documentation
See Netdata in Action

Watch real-time monitoring in action—demos, tutorials, and engineering deep dives.

Product demos and walkthroughs
Real infrastructure, not staged
> Start with the 3-minute tour
Level Up Your Monitoring
Real problems. Real solutions. 112+ guides from basic monitoring to AI observability.
76,000+ Engineers Strong
615+ contributors. 1.5M daily downloads. One mission: simplify observability.
Per-Second. 90% Cheaper. Data Stays Home.
Side-by-side comparisons: costs, real-time granularity, and data sovereignty for every major tool.

See why teams switch from Datadog, Prometheus, Grafana, and more.

> Browse all comparisons
Edge-Native Observability, Born Open Source
Per-second visibility, ML on every metric, and data that never leaves your infrastructure.
Founded in 2016
615+ contributors worldwide
Remote-first, engineering-driven
Open source first
> Read our story
Promises We Publish—and Prove
12 principles backed by open code, independent validation, and measurable outcomes.
Open source, peer-reviewed
Zero config, instant value
Data sovereignty by design
Aligned pricing, no surprises
> See all 12 principles
Edge-Native, AI-Ready, 100% Open
76k+ stars. Full ML, AI, and automation—GPLv3+, not premium add-ons.
76,000+ GitHub stars
GPLv3+ licensed forever
ML on every metric, included
Zero vendor lock-in
> Explore our open source
Build Real-Time Observability for the World
Remote-first team shipping per-second monitoring with ML on every metric.
Remote-first, fully distributed
Open source (76k+ stars)
Challenging technical problems
Your code on millions of systems
> See open roles
Talk to a Netdata Human in <24 Hours
Sales, partnerships, press, or professional services—real engineers, fast answers.
Discuss your observability needs
Pricing and volume discounts
Partnership opportunities
Media and press inquiries
> Book a conversation
Your Data. Your Rules.
On-prem data, cloud control plane, transparent terms.
Trust & Scale
76,000+ GitHub Stars

One of the most popular open-source monitoring projects

SOC 2 Type 2 Certified

Enterprise-grade security and compliance

Data Sovereignty

Your metrics stay on your infrastructure

Validated
University of Amsterdam

"Most energy-efficient monitoring solution" — ICSOC 2023, peer-reviewed

ADASTEC (Autonomous Driving)

"Doesn't miss alerts—mission-critical trust for safety software"

Community Stats
615+ Contributors

Global community improving monitoring for everyone

1.5M+ Downloads/Day

Trusted by teams worldwide

GPLv3+ Licensed

Free forever, fully open source agent

Why Join?
Remote-First

Work from anywhere, async-friendly culture

Impact at Scale

Your work helps millions of systems

Compliance
SOC 2 Type 2

Audited security controls

GDPR Ready

Data stays on your infrastructure

Blog

How Netdata's ML-based Anomaly Detection Works

Advanced Algorithms for Proactive System Health Management
by Andrew Maguire · May 23, 2023

title image

How does Netdata’s machine learning (ML) based anomaly detection actually work? Read on to find out!

Design considerations

Lets first start with some of the key design considerations and principles of Netdata’s anomaly detection (and some comments in parenthesis along the way):

  1. We don’t have any labels or examples of previous anomalies. This means we are in an unsupervised setting (best we can try to do is learn what “normal” data looks like assuming the collected data is “mostly” normal).
  2. Needs to be lightweight and run on the agent (or a parent).
    • Need to be very careful of impact on CPU overhead when training and scoring (lots of cheap models are better than a few expensive and heavy ones).
    • Models themselves need to be small so as to not drastically increase the agents memory footprint (model objects need to be small for storage).
    • This has implications for the ML formulation (sorry - no deep learning models yet) and its implementation (we need to be surgical and optimized).
  3. Needs to scale for thousands of metrics and score in realtime every second as metrics are collected.
    • Typical Netdata nodes have thousands of metrics and we want to be able to score every metric every second with minimal latency overhead (we need to use sensible approaches to training like spreading the training cost over a wide training window).
  4. Needs to be able to handle a wide variety of metrics.
    • There is no single perfect model or approach for all types of metrics so we need a good all rounder that can work well enough across any and all different types of time seres metrics (for any given metric of course you could handcraft a better model but thats not feasible here, we need something like a “weak learners” approach of lots of generally useful models adding up to “more than the sum of their parts”).
  5. Needs to be written in C or C++ as that is the language of the Netdata agent (We are using dlib for the current implementation).
  6. We Need to be very careful about taking big or complex dependencies if using third party libraries.
    • We want to be able to easily build and deploy Netdata on any Linux system without having to worry about installing or managing complex dependencies (we need to be careful of more complex algorithms that would have larger dependencies and potentially limit where Netdata can run).

The above considerations are important and useful to keep in mind as we explore the system in more detail.

Two Important Concepts

Key to understanding how anomaly detection works in Netdata are two important concepts:

  1. Anomaly Bit: The anomaly bit is the 0 (normal) or 1 (anomalous) generated each time a metric is collected.
  2. Anomaly Rate: The anomaly rate is the percentage of anomaly bits that are 1 (anomalous) over a given time window and/or set of metrics.

Anomaly Bit

The core concept underlying how anomaly detection works in Netdata is the “Anomaly Bit”. The anomaly bit is the basic lego block on which everything else is built.

The simplest way to think about the anomaly bit is that in addition to Netdata collecting a raw metric each second (e.g. cpu usage) if also generates a corresponding anomaly bit. The anomaly bit is a single bit (literally it is a bit in the internal storage representation of the agent) that is either 0 (normal) or 1 (anomalous) based on the pattern of very recent last few collected values for the metric (more on this later).

To be as clear as possible - here you can see some recent raw user system cpu metrics from one of our demo nodes. It should look something like this:

{
        "labels":["time","user"],
        "data":[
            [1684852570,0.7518797],
            [1684852569,0.5076142],
            [1684852568,1.2531328],
            [1684852567,0.75],
            [1684852566,0.2518892],
            [1684852565,0.5050505],
            [1684852564,0.5076142],
            [1684852563,0.7575758],
            [1684852562,0.5089059],
            [1684852561,0.5089059]
        ]
    }

Here (by adding the options=anomaly-bit param) are the corresponding anomaly bits (probably mostly all 0 in this example):

{
        "labels":["time","user"],
        "data":[
            [1684852570,0],
            [1684852569,0],
            [1684852568,0],
            [1684852567,0],
            [1684852566,0],
            [1684852565,0],
            [1684852564,0],
            [1684852563,0],
            [1684852562,0],
            [1684852561,0]
        ]
    }

Anomaly Rate

The beauty of the anomaly bit is that as soon as we aggregate beyond 1 second in time (or for that matter if you aggregate across multiple metrics or nodes but for the same point in time, you can ignore this for now) the anomaly bit turns into an “Anomaly Rate”.

It is literally the percentage of time in a window that the Netdata agent considered the raw metric values to be anomalous.

So in the same way that Netdata, in addition to collecting each raw metric also has a corresponding “Anomaly Bit”, for each time series in Netdata there is also a corresponding “Anomaly Rate” time series for that metric.

Again an example might help - here is the same user system cpu metric from earlier for the last 60 seconds but aggregated to 10 second intervals.

{
        "labels":["time","user"],
        "data":[
            [1684852770,0.5797338],
            [1684852760,0.7823281],
            [1684852750,0.6829874],
            [1684852740,0.7056189],
            [1684852730,0.9331536],
            [1684852720,0.681133]
        ]
    }

And here is the corresponding anomaly rate for this metric.

{
        "labels":["time","user"],
        "data":[
            [1684852770,0],
            [1684852760,0],
            [1684852750,0],
            [1684852740,0],
            [1684852730,0],
            [1684852720,0]
        ]
    }

So the last building block then is the concept of the “Anomaly Rate” which really is just some (typically average) aggregation over a set of anomaly bits.

Congratulations! We now have all the building blocks needed to understand how anomaly detection works in Netdata.

So how does the ML actually work then?

Now that we understand the anomaly bit and anomaly rate concepts, all that is left is to understand how the anomaly bit itself gets flipped from its default of 0 (normal) to 1 (anomalous). This is where the ML comes in.

As of writing (and we expect this will change over time as we work on the ML), Netdata uses an unsupervised anomaly detection approach based on K-means clustering under the hood (k=2 for the hard core ML folks reading this).

The basic idea is that we train (and regularly retrain) K-means model(s) for each metric. As raw metrics are collected they are scored against a set of trained models (spanning for example the last 24 hours - discussed in this blog post), if a metric is anomalous on all trained models then the anomaly bit is flipped to 1 (anomalous). This is based on an internal score that essentially is the euclidean distance of the most recent data to the cluster centers of the trained models.

If you would like to dive deeper checkout this deepdive notebook, this slide deck or this YouTube Video.

Click see some slides illustrating how we go from raw metrics to anomaly rate.

processing raw data

feature vectors

anomaly score

anomaly bit

anomaly rate

Netdata Anomaly Detection - Slides

Putting it all together

So now we have all the pieces to understand how anomaly detection works in Netdata. Let’s see where the various parts of the Netdata UI fit in.

Anomaly Rate Ribbon

The anomaly rate ribbon sits at the top of every netdata chart. Brighter spots in the ribbon correspond to windows of time where one or more metrics on the chart were anomalous.

anomaly rate ribbon

You can zoom and and hover on the ribbon to see what the AR% was for that window of time for each metric.

anomaly rate ribbon hover

Node Anomaly Rate Chart

The node anomaly rate chart (anomaly_detection.anomaly_rate) gives an overall anomaly rate across all metrics and nodes as a single high level time series.

node ar chart

You can of course group by node to see the anomaly rate for each node.

node ar chart group by node

Anomaly Advisor

The Anomaly Advisor (anomalies tab) builds on the node anomaly rate and lets you drill down into the metrics that are contributing to the overall anomaly rate.

anomaly advisor

You can highlight and area to see an ordered list of the most anomalous metrics in that window of time.

anomaly advisor results

Learn More!

Stay tuned as we plan on doing a series of posts like this to dive deeper into how ML works in Netdata and various related topics from tips and tricks to configuration options and more. In the meantime below is a list of useful links and resources you might be interested in if you have read this far 🙂