Our Approach to Machine Learning

How Machine Learning Powers Enhanced Monitoring Solutions
by Andrew Maguire · March 25, 2022

There is a lot of buzz in the world of machine learning (ML) and as a layperson it can be hard to keep up with it all. Therefore, we decided to write down some of our thoughts and musings on how we are approaching ML at Netdata.

We’ll touch on the current state of applied ML in industry in general, and zoom in on ML in the monitoring industry. We’ll discuss how we can leverage “good honest ML” to punch above our weight and add some useful and novel features for our users over the next few years.

No nonsense, ever

This is first for a reason. Too many companies in too many industries try to mystify and oversell ML-based features. These narratives are used to generate hype, but they put up artificial barriers to a wider understanding of how ML actually works. Ideally, we would like to end up in a world where ML is just another tool in the shed. Therefore, we will actually explain in detail how our ML works under the hood. Knowledge is the best way to empower our community to understand the cases where various ML-based features might not be so reliable and the cases where they can be very useful.

Whenever I see competitors or new startups in the observability and monitoring space putting “AIOps” as a central selling point, I get curious and try to dig deeper. I visit their docs and actually try to understand: (1) How are they actually formulating the problem to make it amenable to ML? (2) What ML techniques are they actually using?

9 times out of 10 I am left none the wiser, even though I have been an ML practitioner for many years.

In contrast, if you want to learn a little about how our unsupervised anomaly detection works, you can just check out the README.md right next to the code. The first few paragraphs tell you straight off the bat that the algorithm is based on good old k-means clustering. In the notes there are further presentations (feel free to add a comment) and even a Python-based Colab notebook that goes into more detail if you are so inclined.
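To make the k-means idea concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the agent's actual code: the lag-window featurization, `k=2`, and the rule "anomalous if farther from every cluster center than anything seen in training" are all choices made for this example, and every function name is hypothetical.

```python
import math
import random


def lag_vectors(series, width):
    """Turn a 1-D series into overlapping windows of `width` recent values."""
    return [tuple(series[i:i + width]) for i in range(len(series) - width + 1)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns the list of k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: distance(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers


def nearest_center_distance(point, centers):
    return min(distance(point, c) for c in centers)


def train(series, width=3, k=2):
    """Fit centers on 'normal' data and record the largest distance seen."""
    points = lag_vectors(series, width)
    centers = kmeans(points, k)
    threshold = max(nearest_center_distance(p, centers) for p in points)
    return centers, threshold


def is_anomalous(window, centers, threshold):
    """True if the window is farther from all centers than any training window."""
    return nearest_center_distance(tuple(window), centers) > threshold


# Hypothetical CPU-like series that hovers around 50
normal = [50 + (i % 5) for i in range(60)]
centers, threshold = train(normal)
print(is_anomalous([51, 52, 53], centers, threshold))  # a window seen in training
print(is_anomalous([95, 97, 99], centers, threshold))  # far from anything seen
```

The appeal of this kind of approach is that the "model" is just a handful of cluster centers and one threshold, which is cheap enough to train and score on the monitored node itself.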

Companies should rejoice and take pride in openly explaining how their ML works. Laying out the various pros and cons of their approach helps users be more informed when they use these features.  

Observability is actually behind when it comes to ML

When you compare the observability industry to other industries like advertising, finance, or technology in general, observability is actually quite far behind when it comes to leveraging ML as a core building block. A lot of what people call “AIOps” is actually just “fancy plumbing” (auto discovery, open standards for integrations, good solid data engineering etc). Fancy plumbing is very important and impressive in its own right (it’s a core part of what Netdata Cloud does) but it is not ML.

There are some exceptions to the rule here. More and more companies in the space are building out ML-driven capabilities, but it’s still considered special or novel to leverage ML in the observability space, whereas in industries like advertising or finance, concepts like click-through rate prediction or churn prediction are core to what those businesses actually do.

Monitoring agents don’t need to be dumb

It seems like a big missed opportunity that we have all these monitoring agents that just blindly collect data and pass it on. What a waste. Sure, this makes sense if all that data lands in some centralized cloud store. There, you can take the “kitchen sink” approach and throw all the popular ML algorithms at the problem, so long as you can eat the cloud costs or pass them on.

What if, instead, the agent could actually not just blindly pass on the data but also learn a little from it as it sees the data? These insights might be useful when things go wrong. Or, even more ambitious, what if that agent could actually “do some ML” when you need it to help you solve a problem?

We can think of two main ways to implement this: “always-on ML” and “push-button ML”.

  • always-on ML - ML that runs continually on the agent as data flows through it. An obvious example here is running anomaly detection on the agent itself, implemented as cheaply and efficiently as possible.
  • push-button ML - This scenario works best when you have a specific use case where you would like to “ask” the agent for some answer about your data. The agent would then run the relevant ML algorithms to get that answer. For example, at Netdata, we are currently implementing a push-button ML approach to move our Metric Correlations from a cloud-based service to a new endpoint on the agent itself. Another example might be clustering all your metrics on demand so that you can see which metrics “naturally group together” in some sense.
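As a rough sketch of what a push-button request could compute, the example below ranks metrics by how much their values changed between a baseline window and a highlighted window. The scoring method (a two-sample Kolmogorov-Smirnov statistic), the function names, and the sample data are assumptions made for illustration; the post does not specify how the agent endpoint works.

```python
from bisect import bisect_right


def ks_statistic(baseline, highlight):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(baseline), sorted(highlight)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def correlate_metrics(metrics, baseline_range, highlight_range):
    """Rank metric names by how much their distribution shifted.

    `metrics` maps a metric name to a list of per-second values;
    the ranges are (start, end) index pairs into those lists.
    """
    scores = {}
    for name, values in metrics.items():
        base = values[baseline_range[0]:baseline_range[1]]
        high = values[highlight_range[0]:highlight_range[1]]
        scores[name] = ks_statistic(base, high)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: disk latency jumps in the last 10 seconds, CPU does not.
metrics = {
    "system.cpu": [20, 21, 19, 20, 22] * 6,
    "disk.latency": [5, 6, 5, 6, 5] * 4 + [40, 42, 41, 43, 40] * 2,
}
ranking = correlate_metrics(metrics, baseline_range=(0, 20), highlight_range=(20, 30))
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Nothing here runs until someone asks for it, which is the point of the push-button style: the cost is only paid during troubleshooting, not on every collected sample.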

There are obvious limitations to how much work you can push onto a monitoring agent before you risk the agent itself taking too many resources from the system it is supposed to be monitoring. Many ML-based features on the agent will initially be opt-in, allowing you to enable these more advanced capabilities as your node allows. To provide even more flexibility, you could set up streaming, so that the ML-processing happens on parent agents only. Ultimately, we want to give you as many options as possible to avoid huge cloud-based centralization costs that other tools often assume in their approach.

Beyond raw metrics

Metrics are one of the pillars of observability and, as such, a lot of people and companies focus on them. But what a raw metric represents often requires a lot of manual context to come from somewhere (What container was it running in? What application does it actually relate to? And so on…). For example, at Netdata we group metrics into semantic contexts and families, which is how they are then visualized on charts and in menus.

What if we go beyond the raw metric alone as a core building block and introduce some notion of “strangeness” along with the raw value itself? When done right, this can provide one potentially useful form of context right out of the box. Based on past observations, does this raw metric look somewhat expected or unexpected?

Our first attempt at this is called the “Anomaly Bit”, which is basically a 0 if the recent raw metrics look normal enough or a 1 if they look sufficiently different from the model’s definition of “normal”.

We’ve recently added the capability for the Netdata agent to produce an “anomaly bit” in addition to each raw metric value every second, with no extra storage overhead and typically negligible CPU cost. For example, have a look at some recent raw CPU metrics from one of our demo servers, and their corresponding anomaly bits (they are probably mostly 0 assuming all is normal on the demo server, but you may see some 100s). When you start aggregating anomaly bits beyond 1 second (to 5 seconds, 10 seconds, etc.), you will get an “anomaly rate” for every metric out of the box. Here are the corresponding anomaly rates for each CPU dimension above (note that in the URL we have added points=1 to average all the underlying anomaly bits into a single-number anomaly rate for each metric).
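The aggregation from anomaly bits to an anomaly rate is just an average. A minimal sketch, assuming the bits are stored as plain 0/1 integers in a hypothetical list (the function names are made up for this example):

```python
def anomaly_rate(bits):
    """Percentage of seconds in the window flagged as anomalous."""
    return 100.0 * sum(bits) / len(bits)


def downsample(bits, seconds):
    """Aggregate per-second anomaly bits into one rate per `seconds` window."""
    return [anomaly_rate(bits[i:i + seconds]) for i in range(0, len(bits), seconds)]


# Hypothetical per-second anomaly bits for one dimension over 20 seconds
bits = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(downsample(bits, 5))   # one anomaly rate per 5-second window
print(anomaly_rate(bits))    # the whole window collapsed to a single number
```

Here the four 5-second windows come out as 0%, 60%, 40% and 0%, and the full 20 seconds collapse to a single 25% anomaly rate.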

The big idea here is that you can look at your raw metrics as usual, and see the corresponding anomaly rates alongside them. That additional context helps you decide if you are looking at normal and expected metrics or if something is maybe a bit strange that might merit further investigation.

Innovate but resist the urge to be too fancy too soon

Building on the idea that observability is actually still quite early on in its ML journey, there is often little need to get too excited, fancy or complex with which ML-based features you explore first.

“Good honest ML” is the term we use internally in this regard. You can go a long way with relatively simple, well-understood algorithms and approaches before you need to get complex and implement the latest hyped deep learning model (which we do of course want to get to, but only when it’s time and we have built a useful v1 of the solution first).

Starting with the simplest approach that is still useful will help you get to the often harder UX challenges and struggles. This means that you end up with a simple, understandable baseline on which to build initial features. If you can’t make a useful enough MVP feature based on fairly simple ML approaches, that’s a sign that there may be something more fundamentally wrong with how you are approaching the problem.

Once you get traction and prove the approach has merit, you can then get as fancy as you want later in terms of the ML running “under the hood”.

Human in the loop

Why would we pretend that the ML actually “understands” the data? It doesn’t. At least not without a lot of effort encoding the semantic meaning of the data, which at best is still very much an open research area. Anyone promising you that it does “understand” is usually selling you snake oil or, at best, does not really understand how it works themselves.

Instead, it seems more obvious to focus on the UX of it all. After all, especially in a monitoring system, it’s going to be the human who decides if the ML insights point to an anomaly or if no action is required. If the ML shows you something useful even 1 in 10 times, that should be considered amazing. The challenge lies in the implementation of the UX for the other 9 times the ML was wrong or not so useful. In this regard we should aspire to make the experience as easy and painless as possible.

Lego blocks

We want ML to be just another “part of the furniture” of Netdata. As such, our focus will be on small, discrete ML features or functionality that can play well with other parts of Netdata. The end goal is to empower users with a whole toolbox of different ML features that they can bring to bear when troubleshooting, and to help them feel confident and educated about when to try one over the other and how to usefully interpret the results.

For example, we are working on making the “anomaly bit” available to the health engine. A user could easily derive an alert based on the anomaly rate of a metric (or group of metrics) in addition to just triggering based on traditional hard-coded rules evaluated against the raw metric values.

In the stylized example below, let’s say you always run your CPU usage steadily around 75%. You could manually configure an alert that fires once it goes outside some upper or lower threshold. But often data turns anomalous in ways you could not have foreseen - so the idea is that you could just set an alert based on the anomaly rate. If the anomaly rate corresponding to the CPU metric (the orange line) passes, say, 50% and stays elevated for long enough, you would be alerted. So we still have thresholds, but the thresholds are set against the anomaly rate, which can more naturally capture all the ways your data might turn anomalous instead of you having to think of a rule for every scenario. It may even be that the new behaviour represents some new normal following a code change you made, in which case you may want to see the impact of the change but also have the ML retrain to learn the new pattern for what’s considered normal, all without having to change any alert rules. This is the essence of ML: instead of having to think about all the logic and decisions in advance, you pass some of that complexity off to the ML model itself, and you are then just a consumer of its outputs.

[Figure: example anomaly]
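The alerting rule described above ("passes 50% and stays elevated for long enough") can be sketched in a few lines. This is not Netdata health configuration syntax; the function name and the `sustain` parameter are hypothetical, and only the 50% threshold comes from the example in the text.

```python
def anomaly_rate_alert(rates, threshold=50.0, sustain=3):
    """Return the index at which the alert fires, or None.

    The alert fires once the anomaly rate has been above `threshold`
    for `sustain` consecutive samples.
    """
    streak = 0
    for i, rate in enumerate(rates):
        streak = streak + 1 if rate > threshold else 0
        if streak >= sustain:
            return i
    return None


# Hypothetical anomaly rates (%) for a CPU metric, one value per interval
brief_blip = [0, 10, 80, 5, 0, 0, 0, 0]
sustained = [0, 10, 60, 75, 90, 85, 40, 0]

print(anomaly_rate_alert(brief_blip))  # never stays elevated long enough
print(anomaly_rate_alert(sustained))   # fires on the third consecutive high sample
```

Note that the rule never looks at the raw CPU value, so the same alert definition keeps working if the metric's normal level changes and the model retrains.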

Another goal will be to expose the anomaly rates corresponding to all alerts to help give some quick additional context when viewing or trying to prioritize alerts. Traditional visual anomaly detection has always been a core part of what Netdata does; built-in anomaly rates should only aid this visual flow.

There is no one-size-fits-all model or solution - we just want to give our users a few extra tools in their toolboxes. Over time, they will become experts in leveraging our ML-based features and we will hopefully make their lives a little easier when using Netdata for monitoring and troubleshooting.

Embrace uncertainty

We can never be 100% sure “if the ML will work” when thinking about new potential ML features. Often, we simply don’t have the data we would need to try and even begin to scratch the surface in answering this question. Nevertheless, it should not bog us down and get in the way of innovation. Even if we validate as best as we can that it works on our own production data, while very useful and encouraging, that’s still a sample of size 1.

Too many times when it comes to new ML projects, new ideas turn into 6-month research projects that struggle to get anywhere. Instead, we want to focus on (almost embarrassing) initial POC- and MVP-based features. We can introduce and explore them in a safe and de-risked way. Then we can focus on telemetry and user feedback to figure out if or how well they work and identify areas for further improvement.

Minimize learning curve

We’ve discussed how ML is often overly mystified and under-used in the monitoring and observability sector. To counter these obstacles, our mission is to make the adoption of ML features as easy and understandable as possible to our users. Apart from good usability, there should be layers of reference material so that users who just want to get a quick overview and start using features can get up and running. Those more curious can go as deep as they like, and ideally even learn a little about machine learning in general along the way. We are stronger together, so the more our features are used, the more we can improve them and lighten the burden of troubleshooting for the Netdata community.

If you’re interested in ML and observability, come join us in the 🤖-ml-powered-monitoring channel of the Netdata Discord, or feel free to create a GitHub discussion if you’d like to learn more.