Actionable alerts with fewer false positives: intelligent alarms with Netdata

Enhancing Alert Accuracy and Operational Efficiency
by Netdata Team · January 21, 2021

Think about any sport or competitive activity, whether that’s football or a spelling bee. They always feature at least one person who acts as a moderator, referee, or judge. With their domain expertise, this person watches everyone’s behavior and constantly compares that against a set of rules. If someone crosses that threshold, they blow a whistle or throw up a flag. They are, in effect, saying that things have gone from OK to not OK.

Deploying an application, running an infrastructure, or even keeping tabs on a single virtual machine (VM) running on a cloud provider is much like playing one of these games or sports. There are a lot of moving parts, but there are also distinct thresholds between OK and not OK.

  • A system running at 99% CPU is not OK.
  • A MySQL database returning 50% slow queries is not OK.
  • An Apache web server returning more 503 errors than 200 successes is not OK.
On the other hand:
  • A system running at 82% CPU utilization might be OK.
  • A MySQL database returning 5% slow queries might be OK.
  • An Apache web server returning a few uncorrelated 503 errors might be OK.
But no one can possibly watch every system, and every application, for every possible broken rule or crossed threshold. Nor can they always be expected to have the domain expertise to know exactly what the OK/not OK threshold is for every single application or process running on their infrastructure. If anyone is meant to tackle these complexities, while also feeling like they’re able to keep tabs on many discrete systems and applications at one time, they need help.

That’s where preconfigured alarms, with smart defaults designed by people who have experience monitoring systems and mission-critical applications, provide so much immediate value. So, to nip some of the first concerns users have about alerting in the bud, like, “My customers will let me know when my website is down,” or “I don’t even know where to start with thresholds, so I’ll just use htop,” let’s look into how these systems work in Netdata.

What are alerting and alarms?

Every monitoring solution that offers these features uses slightly different terminology, but here’s the gist. Alerts or alarms are processes that compare metrics data against thresholds and let their users know when something is not OK.

In the Netdata world, we also call this feature health monitoring: is your node OK or not OK right now? When Netdata’s watchdog process notices something odd, whether that’s in the underlying system, the container layer (when in play), or specific applications, it generates an alarm.

An alarm always begins with metrics data, which is stored in a time-series database (TSDB). The database stores a series of data points with the timestamp at which that data was collected to provide meaningful context for each number. A point of metrics data is only valuable when you can compare it to other points, particularly over large timeframes, which then lets you calculate averages, minimums, maximums, and deviations from “normal.”

On a regular interval, the watchdog process queries the TSDB for a bit of metrics data, and then runs a calculation on that data, such as calculating the average of 10 data points collected over the last 10 seconds. It then compares the result of that calculation against a configured threshold, and if the calculation exceeds the threshold, it raises an alarm.
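The query-calculate-compare loop described above can be sketched in a few lines of Python. This is only an illustration, not Netdata's implementation: the function name `check_alarm`, the sample values, and the 90% threshold are all assumptions made up for the example.

```python
from statistics import mean


def check_alarm(samples, threshold, window=10):
    """Average the last `window` samples and compare the result to `threshold`.

    `samples` stands in for the result of a TSDB query: one value per second,
    oldest first. Returns the calculated value and whether the alarm is raised.
    """
    recent = samples[-window:]       # "query": the last 10 seconds of data
    value = mean(recent)             # "calculate": the average of those points
    return value, value > threshold  # "compare": is the threshold exceeded?


# Hypothetical CPU utilization readings (%), one per second.
cpu_samples = [70, 72, 75, 91, 93, 95, 96, 94, 97, 98, 99, 97]

value, raised = check_alarm(cpu_samples, threshold=90)
print(f"average={value}, raised={raised}")
```

Averaging over a window rather than checking each point on its own means a single one-second spike is less likely to trigger the alarm.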

For example, one of the useful alarms that comes preconfigured with every Netdata installation is out of disk space time. This alarm first queries the database for available disk space metrics over the last hour, then calculates the rate at which the disk is filling. The alarm then calculates whether the disk will fill up soon based on that rate, and if so, fires off a warning or critical alarm.
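To make the arithmetic concrete, here is a rough Python sketch of that kind of estimate. It is a simplification under stated assumptions: the fill rate is taken from the first and last samples only, and the 48-hour and 24-hour thresholds, the function names, and the sample data are hypothetical rather than Netdata's actual defaults.

```python
def hours_until_full(samples, interval_seconds=60):
    """Estimate the hours left until free disk space reaches zero.

    `samples` holds free space in GiB, oldest first, one reading per interval.
    Returns None when there is too little data or free space is not shrinking.
    """
    if len(samples) < 2:
        return None
    elapsed_hours = (len(samples) - 1) * interval_seconds / 3600
    fill_rate = (samples[0] - samples[-1]) / elapsed_hours  # GiB per hour
    if fill_rate <= 0:
        return None
    return samples[-1] / fill_rate


def classify(hours_left, warn_hours=48, crit_hours=24):
    """Map the estimate to a status; both thresholds are assumed values."""
    if hours_left is None or hours_left > warn_hours:
        return "CLEAR"
    return "CRITICAL" if hours_left <= crit_hours else "WARNING"


# One hour of per-minute readings: free space falls from 80 GiB to 78 GiB.
last_hour = [80 - 2 * i / 60 for i in range(61)]

estimate = hours_until_full(last_hour)
print(estimate, classify(estimate))
```

With 78 GiB free and 2 GiB consumed per hour, the estimate is 39 hours, which falls between the two assumed thresholds and so yields a warning.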

On a monitoring platform like Netdata, this chain of query-calculate-compare happens many times every second, using a huge variety of preconfigured thresholds, attached to the most critical metrics you might want to stay aware of.

Alarms, whether they cover a single node or an infrastructure of hundreds of ephemeral nodes, produce two kinds of output: notifications and actions.

Notifications are those annoying (more on that later) pop-ups or pings that you get from other software, like Slack. When it comes to monitoring the health of an infrastructure, system, container, or application, notifications are designed to notify you of a not OK situation and provide some important context to help you take action, such as the system/application affected, which metric(s) are related, which chart(s) you could look at to begin your investigation, and where the alarm is configured if you want to quickly silence an alarm or change a threshold.

Actions are a little broader in nature. They could be manual actions, such as opening up a dashboard to visualize metrics, performing root cause analysis, or running a script designed to help remedy the situation. They could also be automated, like firing off an incident response process on a platform like PagerDuty or StackPulse. Whatever form the action takes, the immediate goal is to identify the source of the problem, come up with a resolution, deploy it, and sit back as the health watchdog goes from wildly waving its arms about a not OK alarm to sitting quietly and waiting for its next time to shine.

All of the above, from the way the health watchdog queries the TSDB, to where notifications are deployed, can create two big problems—if they’re not configured properly.

  • False positives: An alert telling you that a system or application is not OK, when in fact it is. These result in wasted time and resources, and lead to mistrust in the monitoring system itself.
  • Alarm fatigue: A desensitization to the alerts themselves, which leads to either silencing or ignoring them altogether. Fatigue can come from bad experiences with false positives or the sheer volume of emails, Slack pings, or automated incidents created on a third-party platform.

The anatomy of Netdata’s intelligent alarms

Every alarm that ships with Netdata comes with intelligent defaults for each of the anatomical points below. Because they’re designed by people who have monitored these types of systems and applications in production before, they reduce the risk of false positives, which create panic for no reason and lead to alarm fatigue.

Here are a few of the ways the Netdata team, and its community of IT professionals, designs our alarms to generate useful notifications and actions:

  • Metrics data: This is the raw information, stored alongside the time it was collected, about resource usage, interactions between a system’s components, or actions they’re taking. Metrics can come from hardware, the operating system, containerization layers, and the applications running on a node. In Netdata, the collection interval is every second (and at “event frequency” for eBPF metrics, meaning you see every kernel interaction, no matter how quickly they happen), giving you the most precise foundation for alarms.
  • Filtering: Every alarm should only run off a specific series. For example, the Netdata alarm related to disk space only queries for the available disk space on every disk, and doesn’t bother with anything else. Filtering also allows you to run certain alarms against nodes with specific labels, hostnames, or operating systems. Filters often allow some pattern matching.
  • Frequency: This is how often the alarm’s calculation is run against metrics data. Set this based on how quickly you would like to know about a fault in a particular system. While you might not need to know every 10 seconds that your disk isn’t filling up, you definitely want to know within seconds if a MySQL server crashes.
  • Templates: Write once, apply everywhere. Use templates to apply a specific query and calculation to multiple metrics series without having to write the same alarm again and again. Netdata offers this via dimension templates and the ability to apply logic to multiple charts.
  • Calculation: Many alarms convert the raw metrics data into another format, or compare it against a parallel metric series, in order to make the result human-readable. For example, Netdata’s active processes alarm doesn’t just alert you when the volume of processes on the system reaches X. Instead, it multiplies the active processes by 100, then divides that by the system’s maximum processes (from /proc/sys/kernel/pid_max). The result is a percentage, and the alarm crosses its warning threshold at 75%.
  • Thresholds: Every alarm takes queried metrics data, or a calculated result, and decides whether it’s OK or not OK. Most thresholds never change, but some monitoring platforms offer dynamic thresholds based on the system’s baseline.
  • Hysteresis: This prevents floods of alarms for metrics data that is “flapping” around a configured threshold. For example, if your node’s CPU usage fluctuates between 80 and 90%, and the warning threshold is 85%, you’ll get flooded with notifications every time the value crosses that line. A recipe for alarm fatigue. Hysteresis prevents additional alarms until the CPU usage first drops to a normal level, then comes back up again.
  • Severity: Netdata uses CLEAR, WARNING, and CRITICAL alarm statuses, with CLEAR meaning the alarm is OK, and WARNING/CRITICAL meaning it’s in some state of being not OK. Severity is essential to alerting and alarms best practices.
  • Advanced configuration: In Netdata, there are even more variables, syntaxes, and outputs available for the adventurous soul, some of which we use in preconfigured alarms to make them relevant and valuable.
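Two of the points above, calculation and hysteresis, are easier to see in code. The following is a minimal Python sketch, not how Netdata implements either feature: the class and function names are hypothetical, the `pid_max` value of 32768 is only an example, and the 75% "clear" level used for the CPU alarm is an assumption standing in for "a normal level."

```python
def active_processes_percent(active, pid_max):
    """Active processes as a percentage of the kernel's PID limit."""
    return active * 100 / pid_max


class HysteresisAlarm:
    """Raise at `raise_at`, but clear only once the value drops to `clear_at`."""

    def __init__(self, raise_at, clear_at):
        if clear_at >= raise_at:
            raise ValueError("clear_at must be below raise_at")
        self.raise_at = raise_at
        self.clear_at = clear_at
        self.status = "CLEAR"

    def update(self, value):
        """Feed one value; return the new status only when it changes."""
        if self.status == "CLEAR" and value >= self.raise_at:
            self.status = "WARNING"
            return self.status
        if self.status == "WARNING" and value <= self.clear_at:
            self.status = "CLEAR"
            return self.status
        return None  # no transition, so no new notification


# Calculation: 24576 active processes against an example pid_max of 32768.
percent = active_processes_percent(24576, 32768)

# Hysteresis: CPU usage flapping around an 85% warning threshold.
alarm = HysteresisAlarm(raise_at=85, clear_at=75)
cpu = [82, 86, 84, 88, 83, 90, 74, 86]
transitions = [s for s in (alarm.update(v) for v in cpu) if s]
print(percent, transitions)
```

Without the separate clear level, the same CPU series would cross the 85% line five times, and each crossing would produce a notification. With it, only three status changes are reported.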
Properly-designed alarms create the most value in the notifications they spawn.
  • Recipients: To avoid alarm fatigue, organizations should audit the intended recipients of every alarm to ensure only the proper stakeholders, who have the capacity to deal with a particular issue (and are online/at work to address them), receive them.
  • Platforms: Some notification platforms are “dumb,” in that they only receive a notification and display it to you; other “smart” platforms take additional, automated action based on it. An email or Slack notification implies the recipient will take action as necessary, while an incident management platform like PagerDuty spins up new processes, such as creating a formal incident and inviting colleagues to join in troubleshooting.
  • Severity: Send warning notifications to Slack for passive observation, but critical alarms to an external platform to ensure all the right players are paged and hop into a conference right away.
In Netdata, the notifications can come either from individual nodes running the monitoring Agent, or from a centralized source of truth in Netdata Cloud.
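The severity-based routing suggested in the list above might look something like the sketch below. The channel names, the `Alarm` fields, and the routing table are all hypothetical, and no real notification API is called; the point is only that the alarm's status decides where the notification goes.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    name: str
    status: str  # "CLEAR", "WARNING", or "CRITICAL"
    host: str


# Hypothetical routing table: warnings go to chat for passive observation,
# critical alarms go to an incident platform that pages people.
ROUTES = {
    "WARNING": ["slack"],
    "CRITICAL": ["pager"],
}


def route(alarm):
    """Return the channel names that should receive this alarm."""
    return ROUTES.get(alarm.status, [])


for alarm in [
    Alarm("disk_space", "WARNING", "web-1"),
    Alarm("mysql_up", "CRITICAL", "db-1"),
    Alarm("cpu_usage", "CLEAR", "web-2"),
]:
    print(alarm.name, "->", route(alarm))
```

Keeping the routing in one table makes it easier to audit who receives what, which is the same review the recipients point recommends.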

Ready for alarms? Some resources to get you started

If you’re ready to start monitoring the health and performance of your infrastructure, with hundreds of preconfigured alarms and entirely for free, sign up for Netdata.

Once you’ve set up a node and are seeing its metrics in Netdata Cloud, read our documentation on viewing active alarms, configuring existing alarms, or enabling notifications. From there, look into every service and application Netdata integrates with, or make root cause analysis a little more fun with intelligent features like Metric Correlations.