The only agent that thinks for itself

Autonomous monitoring with self-learning AI built in, operating independently across your entire stack.

Unlimited Metrics & Logs
Machine learning & MCP
5% CPU, 150MB RAM
3GB disk, >1 year retention
800+ integrations, zero config
Dashboards, alerts out of the box
> Discover Netdata Agents

Centralized metrics streaming and storage

Aggregate metrics from multiple agents into centralized Parent nodes for unified monitoring across your infrastructure.

Stream from unlimited agents
Long-term data retention
High availability clustering
Data replication & backup
Scalable architecture
Enterprise-grade security
> Learn about Parents

Fully managed cloud platform

Access your monitoring data from anywhere with our SaaS platform. No infrastructure to manage, automatic updates, and global availability.

Zero infrastructure management
99.9% uptime SLA
Global data centers
Automatic updates & patches
Enterprise SSO & RBAC
SOC2 & ISO certified
> Explore Netdata Cloud

Deploy Netdata Cloud in your infrastructure

Run the full Netdata Cloud platform on-premises for complete data sovereignty and compliance with your security policies.

Complete data sovereignty
Air-gapped deployment
Custom compliance controls
Private network integration
Dedicated support team
Kubernetes & Docker support
> Learn about Cloud On-Premises

Powerful, intuitive monitoring interface

Modern, responsive UI built for real-time troubleshooting with customizable dashboards and advanced visualization capabilities.

Real-time chart updates
Customizable dashboards
Dark & light themes
Advanced filtering & search
Responsive on all devices
Collaboration features
> Explore Netdata UI

Monitor on the go

Native iOS and Android apps bring full monitoring capabilities to your mobile device with real-time alerts and notifications.

iOS & Android apps
Push notifications
Touch-optimized interface
Offline data access
Biometric authentication
Widget support
> Download apps

Best energy efficiency

True real-time per-second

100% automated zero config

Centralized observability

Multi-year retention

High availability built-in

Zero maintenance

Always up-to-date

Enterprise security

Complete data control

Air-gap ready

Compliance certified

Millisecond responsiveness

Infinite zoom & pan

Works on any device

Native performance

Instant alerts

Monitor anywhere

80% Faster Incident Resolution

AI-powered troubleshooting from detection, to root cause and blast radius identification, to reporting.

True Real-Time and Simple, even at Scale

Linearly and infinitely scalable full-stack observability that can be deployed even mid-crisis.

90% Cost Reduction, Full Fidelity

Instead of centralizing the data, Netdata distributes the code, eliminating pipelines and complexity.

Control Without Surrender

SOC 2 Type 2 certified with every metric kept on your infrastructure.

Integrations

800+ collectors and notification channels, auto-discovered and ready out of the box.

800+ data collectors
Auto-discovery & zero config
Cloud, infra, app protocols
Notifications out of the box
> Explore integrations
Real Results
46% Cost Reduction

Reduced monitoring costs by 46% while cutting staff overhead by 67%.

— Leonardo Antunez, Codyas

Zero Pipeline

No data shipping. No central storage costs. Query at the edge.

From Our Users
"Out-of-the-Box"

So many out-of-the-box features! I mostly don't have to develop anything.

— Simon Beginn, LANCOM Systems

No Query Language

Point-and-click troubleshooting. No PromQL, no LogQL, no learning curve.

Enterprise Ready
67% Less Staff, 46% Cost Cut

Enterprise efficiency without enterprise complexity—real ROI from day one.

— Leonardo Antunez, Codyas

SOC 2 Type 2 Certified

Zero data egress. Only metadata reaches the cloud. Your metrics stay on your infrastructure.

Full Coverage
800+ Collectors

Auto-discovered and configured. No manual setup required.

Any Notification Channel

Slack, PagerDuty, Teams, email, webhooks—all built-in.

Built for the People Who Get Paged

Because 3am alerts deserve instant answers, not hour-long hunts.

Every Industry Has Rules. We Master Them.

See how healthcare, finance, and government teams cut monitoring costs 90% while staying audit-ready.

Monitor Any Technology. Configure Nothing.

Install the agent. It already knows your stack.
From Our Users
"A Rare Unicorn"

Netdata gives more than you invest in it. A rare unicorn that obeys the Pareto rule.

— Eduard Porquet Mateu, TMB Barcelona

99% Downtime Reduction

Reduced website downtime by 99% and cloud bill by 30% using Netdata alerts.

— Falkland Islands Government

Real Savings
30% Cloud Cost Reduction

Optimized resource allocation based on Netdata alerts cut cloud spending by 30%.

— Falkland Islands Government

46% Cost Cut

Reduced monitoring staff by 67% while cutting operational costs by 46%.

— Codyas

Real Coverage
"Plugin for Everything"

Netdata has agent capacity or a plugin for everything, including Windows and Kubernetes.

— Eduard Porquet Mateu, TMB Barcelona

"Out-of-the-Box"

So many out-of-the-box features! I mostly don't have to develop anything.

— Simon Beginn, LANCOM Systems

Real Speed
Troubleshooting in 30 Seconds

From 2-3 minutes to 30 seconds—instant visibility into any node issue.

— Matthew Artist, Nodecraft

20% Downtime Reduction

20% less downtime and 40% budget optimization from out-of-the-box monitoring.

— Simon Beginn, LANCOM Systems

Pay per Node. Unlimited Everything Else.

One price per node. Unlimited metrics, logs, users, and retention. No per-GB surprises.

Free tier—forever
No metric limits or caps
Retention you control
Cancel anytime
> See pricing plans

What's Your Monitoring Really Costing You?

Most teams overpay by 40-60%. Let's find out why.

Expose hidden metric charges
Calculate tool consolidation
Customers report 30-67% savings
Results in under 60 seconds
> See what you're really paying

Your Infrastructure Is Unique. Let's Talk.

Because monitoring 10 nodes is different from monitoring 10,000.

On-prem & air-gapped deployment
Volume pricing & agreements
Architecture review for your scale
Compliance & security support
> Start a conversation

Monitoring That Sells Itself

Deploy in minutes. Impress clients in hours. Earn recurring revenue for years.

30-second live demos close deals
Zero config = zero support burden
Competitive margins & deal protection
Response in 48 hours
> Apply to partner

Per-Second Metrics at Homelab Prices

Same engine, same dashboards, same ML. Just priced for tinkerers.

Community: Free forever · 5 nodes · non-commercial
Homelab: $90/yr · unlimited nodes · fair usage
> Start monitoring your lab—free

$1,000 Per Referral. Unlimited Referrals.

Your colleagues get 10% off. You get 10% commission. Everyone wins.

10% of subscriptions, up to $1,000 each
Track earnings inside Netdata Cloud
PayPal/Venmo payouts in 3-4 weeks
No caps, no complexity
> Get your referral link
Cost Proof
40% Budget Optimization

"Netdata's significant positive impact" — LANCOM Systems

Calculate Your Savings

Compare vs Datadog, Grafana, Dynatrace

Savings Proof
46% Cost Reduction

"Cut costs by 46%, staff by 67%" — Codyas

30% Cloud Bill Savings

"Reduced cloud bill by 30%" — Falkland Islands Gov

Enterprise Proof
"Better Than Combined Alternatives"

"Better observability with Netdata than combining other tools." — TMB Barcelona

Real Engineers, <24h Response

DPA, SLAs, on-prem, volume pricing

Why Partners Win
Demo Live Infrastructure

One command, 30 seconds, real data—no sandbox needed

Zero Tickets, High Margins

Auto-config + per-node pricing = predictable profit

Homelab Ready
"Absolutely Incredible"

"We tested every monitoring system under the sun." — Benjamin Gabler, CEO Rocket.Net

76k+ GitHub Stars

3rd most starred monitoring project

Worth Recommending
Product That Delivers

Customers report 40-67% cost cuts, 99% downtime reduction

Zero Risk to Your Rep

Free tier lets them try before they buy

Never Fight Fires Alone

Docs, community, and expert help—pick your path to resolution.

Learn.netdata.cloud docs
Discord, Forums, GitHub
Premium support available
> Get answers now

60 Seconds to First Dashboard

One command to install. Zero config. 850+ integrations documented.

Linux, Windows, K8s, Docker
Auto-discovers your stack
> Read our documentation

See Netdata in Action

Watch real-time monitoring in action—demos, tutorials, and engineering deep dives.

Product demos and walkthroughs
Real infrastructure, not staged
> Start with the 3-minute tour

Level Up Your Monitoring

Real problems. Real solutions. 112+ guides from basic monitoring to AI observability.

76,000+ Engineers Strong

615+ contributors. 1.5M daily downloads. One mission: simplify observability.

Per-Second. 90% Cheaper. Data Stays Home.

Side-by-side comparisons: costs, real-time granularity, and data sovereignty for every major tool.

See why teams switch from Datadog, Prometheus, Grafana, and more.

> Browse all comparisons
Edge-Native Observability, Born Open Source
Per-second visibility, ML on every metric, and data that never leaves your infrastructure.
Founded in 2016
615+ contributors worldwide
Remote-first, engineering-driven
Open source first
> Read our story
Promises We Publish—and Prove
12 principles backed by open code, independent validation, and measurable outcomes.
Open source, peer-reviewed
Zero config, instant value
Data sovereignty by design
Aligned pricing, no surprises
> See all 12 principles
Edge-Native, AI-Ready, 100% Open
76k+ stars. Full ML, AI, and automation—GPLv3+, not premium add-ons.
76,000+ GitHub stars
GPLv3+ licensed forever
ML on every metric, included
Zero vendor lock-in
> Explore our open source
Build Real-Time Observability for the World
Remote-first team shipping per-second monitoring with ML on every metric.
Remote-first, fully distributed
Open source (76k+ stars)
Challenging technical problems
Your code on millions of systems
> See open roles
Talk to a Netdata Human in <24 Hours
Sales, partnerships, press, or professional services—real engineers, fast answers.
Discuss your observability needs
Pricing and volume discounts
Partnership opportunities
Media and press inquiries
> Book a conversation
Your Data. Your Rules.
On-prem data, cloud control plane, transparent terms.
Trust & Scale
76,000+ GitHub Stars

One of the most popular open-source monitoring projects

SOC 2 Type 2 Certified

Enterprise-grade security and compliance

Data Sovereignty

Your metrics stay on your infrastructure

Validated
University of Amsterdam

"Most energy-efficient monitoring solution" — ICSOC 2023, peer-reviewed

ADASTEC (Autonomous Driving)

"Doesn't miss alerts—mission-critical trust for safety software"

Community Stats
615+ Contributors

Global community improving monitoring for everyone

1.5M+ Downloads/Day

Trusted by teams worldwide

GPLv3+ Licensed

Free forever, fully open source agent

Why Join?
Remote-First

Work from anywhere, async-friendly culture

Impact at Scale

Your work helps millions of systems

Compliance
SOC 2 Type 2

Audited security controls

GDPR Ready

Data stays on your infrastructure

Blog

Linux CPU Consumption, Load & Pressure | Optimize Performance

Analyzing CPU Usage to Optimize System Performance
by Satyadeep Ashwathnarayana · May 2, 2023


As a system administrator, understanding how your Linux system’s CPU is being utilized is crucial for identifying bottlenecks and optimizing performance. In this blog post, we’ll dive deep into the world of Linux CPU consumption, load, and pressure, and discuss how to use these metrics effectively to identify issues and improve your system’s performance.

CPU Consumption and Utilization

CPU consumption refers to the amount of processing power being used by applications running on your system. The system.cpu chart in Netdata represents the Total CPU utilization of your Linux system, broken down into different dimensions. Each dimension provides insight into how the CPU is being used by various tasks and processes. Here’s a brief explanation of each dimension:

  1. user: This dimension represents the percentage of CPU time spent executing user-level applications or processes (i.e., non-kernel code). It indicates the share of time the CPU is busy running tasks initiated by users.

  2. system: This dimension represents the percentage of CPU time spent executing kernel-level processes, such as handling system calls, managing memory, or controlling hardware. It reflects the share of time the CPU is busy with system-level tasks.

  3. nice: This dimension represents the percentage of CPU time spent executing user-level processes with a positive nice value, which indicates a lower priority. A high nice value can suggest that lower-priority tasks are consuming a significant portion of the CPU time.

  4. iowait: This dimension represents the percentage of CPU time spent waiting for input/output (I/O) operations to complete, such as disk or network access. High iowait values can indicate I/O-bound tasks, slow storage devices, or storage subsystem issues.

  5. irq: This dimension represents the percentage of CPU time spent handling hardware interrupt requests (IRQs), which are signals sent by hardware devices to the CPU to request attention. High irq values can suggest that the system is spending a significant amount of time responding to hardware events.

  6. softirq: This dimension represents the percentage of CPU time spent handling software interrupt requests (soft IRQs), which are kernel-level processes that handle specific hardware-related tasks, such as network packet processing. High softirq values can indicate that the system is spending a considerable amount of time processing software interrupts.

  7. steal: This dimension represents the percentage of CPU time that is “stolen” by the hypervisor (in virtualized environments) for other virtual machines (VMs) running on the same physical host. High steal values can indicate resource contention among VMs or insufficient resources allocated to your VM.

  8. guest: This dimension represents the percentage of CPU time spent on running virtual CPU processes for guest VMs in a virtualized environment. High guest values indicate that the host system is dedicating a significant portion of CPU time to running guest VMs.

  9. guest_nice: This dimension represents the percentage of CPU time spent on running virtual CPU processes for guest VMs with a positive nice value, indicating lower priority.

Internally, Netdata tracks one more dimension: idle, the percentage of time the CPU is not executing any process and is available for work. iowait also represents idle CPU time, but it additionally signals reduced performance due to pending I/O, which is why the two are kept separate.
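To see where these dimensions come from, here is a minimal Python sketch that derives the same per-dimension percentages from /proc/stat on a Linux system. It illustrates the raw kernel counters, and is not how Netdata's collector is actually implemented:

```python
# Compute per-dimension CPU utilization percentages from /proc/stat,
# mirroring the dimensions of the system.cpu chart. Linux-only sketch;
# field order follows proc(5). Note: on Linux, guest time is also
# accounted inside user, so this is an approximation.
import time

FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]

def read_cpu():
    # First line of /proc/stat is the aggregate "cpu" counter line
    with open("/proc/stat") as f:
        values = f.readline().split()[1:len(FIELDS) + 1]
    return dict(zip(FIELDS, map(int, values)))

def utilization(interval=1.0):
    before = read_cpu()
    time.sleep(interval)
    after = read_cpu()
    delta = {k: after[k] - before[k] for k in after}
    total = sum(delta.values()) or 1
    # Percentage of total CPU time per dimension over the interval
    return {k: 100.0 * v / total for k, v in delta.items()}

if __name__ == "__main__":
    for dim, pct in utilization().items():
        print(f"{dim:>10}: {pct:5.1f}%")
```

Because the kernel counters are cumulative, utilization is always computed as the difference between two snapshots over an interval, exactly as collectors do it.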

Keep in mind that Linux supports different schedulers and priority mechanisms apart from the nice value, such as the Completely Fair Scheduler (CFS), real-time schedulers (SCHED_FIFO and SCHED_RR), and the deadline scheduler (SCHED_DEADLINE). Each of these schedulers and priorities influences the dimensions in the system.cpu chart differently:

  • Completely Fair Scheduler (CFS): CFS is the default scheduler used in most Linux systems for normal, non-real-time tasks. It attempts to distribute CPU time fairly among all processes based on their assigned “weight” or “niceness”. While CFS doesn’t introduce additional dimensions to the system.cpu chart, it affects the distribution of CPU time between the user, system, and nice dimensions.

  • Real-time schedulers (SCHED_FIFO and SCHED_RR): Real-time schedulers are designed for time-critical tasks and can preempt other processes to ensure that high-priority real-time tasks receive the CPU time they need. When real-time tasks are running, they can increase the user or system dimensions in the system.cpu chart, depending on whether they are user-level or kernel-level tasks. Real-time tasks may cause other, lower-priority tasks to be delayed or starved for CPU time, which can lead to lower values in the nice dimension or increased iowait if they are waiting for I/O operations to complete.

  • Deadline scheduler (SCHED_DEADLINE): The deadline scheduler assigns a deadline to each task and prioritizes them based on these deadlines. Tasks that are running with the deadline scheduler will also contribute to the user or system dimensions in the system.cpu chart. The deadline scheduler can affect the distribution of CPU time among tasks and may cause lower-priority tasks to experience increased iowait or decreased nice values if they are being preempted by higher-priority deadline tasks.

The “CPUs” section in Netdata provides per-core utilization (cpu.cpu) with the same dimensions as system.cpu, but for each CPU core individually. Per-core utilization is important in several scenarios, and monitoring it can provide valuable insights into your system’s performance. Here are a few scenarios where per-core CPU utilization may help:

  1. Identifying Core-Specific Issues: If you notice performance issues on your system but don’t see high overall CPU utilization, it’s possible that a single core is experiencing high load or contention, while the others are idle or underutilized. Monitoring per core utilization can help you identify these issues and take corrective actions, such as adjusting process affinity or tweaking application settings to better distribute the load.

  2. Ensuring Balanced Workloads: In multi-core systems, it’s essential to maintain a balanced workload across all cores for optimal performance. Monitoring per core utilization can help you detect any imbalances and take necessary steps to distribute workloads evenly. This can be particularly important for applications that are designed to take advantage of multiple cores, such as parallel processing or multi-threaded applications.

  3. Optimizing Multi-Threaded Applications: When developing or optimizing multi-threaded applications, it’s important to ensure that each thread runs efficiently and doesn’t cause unnecessary contention for resources. By monitoring per core utilization, you can assess how well your application is utilizing available cores and identify opportunities to improve its performance through better thread management or parallelism.

  4. Detecting Thermal Throttling: High CPU core utilization can lead to increased temperatures, causing thermal throttling to kick in and reduce the core’s performance to prevent overheating. Monitoring per-core utilization can help you identify cores that consistently run hot, which may indicate a need for better cooling or adjustments to your system’s power management settings. Keep in mind that when Netdata runs on bare-metal systems, it also monitors CPU throttling events (cpu.core_throttling) per core.

  5. Capacity Planning: Monitoring per core utilization can provide valuable data for capacity planning purposes. By understanding how each core is utilized, you can make more informed decisions about hardware upgrades or resource allocation, ensuring that your system remains capable of handling current and future workloads.
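The same idea extends to individual cores. Here is a rough sketch, again reading /proc/stat directly as an approximation of what feeds the per-core charts:

```python
# Per-core busy percentage from /proc/stat, analogous to the cpu.cpu
# per-core charts. Linux-only sketch; treats idle + iowait as "not busy".
import time

def snapshot():
    cores = {}
    with open("/proc/stat") as f:
        for line in f:
            # Per-core lines are "cpu0", "cpu1", ...; skip the aggregate "cpu" line
            if line.startswith("cpu") and line[3].isdigit():
                name, *fields = line.split()
                fields = list(map(int, fields))
                not_busy = fields[3] + fields[4]  # idle + iowait
                cores[name] = (sum(fields), not_busy)
    return cores

before = snapshot()
time.sleep(1)
after = snapshot()
for core in sorted(after, key=lambda c: int(c[3:])):
    total = after[core][0] - before[core][0]
    idle = after[core][1] - before[core][1]
    busy = 100.0 * (total - idle) / total if total else 0.0
    print(f"{core}: {busy:5.1f}% busy")
```

A large spread between the busiest and idlest core in this output is exactly the kind of imbalance scenarios 1 and 2 above describe.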

CPU Load

CPU load is a measure of the number of processes that are either running on the CPU or waiting for system resources, such as CPU time and disk I/O. The “System Load Average” (system.load) chart in Netdata displays three dimensions: load1, load5, and load15, representing the average number of such processes over the past 1, 5, and 15 minutes, respectively.

A high CPU load can indicate that your system is experiencing resource contention or that processes are waiting for resources to become available. It’s essential to keep an eye on your system load and investigate any sudden spikes or consistently high values, as these can be signs of performance issues or bottlenecks.
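For reference, the same three values that system.load charts are available from Python's standard library, which reads /proc/loadavg on Linux; a quick sketch:

```python
# Read the 1/5/15-minute load averages behind the system.load chart.
import os

load1, load5, load15 = os.getloadavg()
cores = os.cpu_count() or 1
print(f"load1={load1:.2f} load5={load5:.2f} load15={load15:.2f} over {cores} cores")
# Rule of thumb: sustained load above the core count means runnable
# tasks are queuing for CPU (Linux also counts tasks blocked in
# uninterruptible sleep, typically waiting on disk I/O).
```

Comparing load against the core count is what turns the raw number into a signal: a load of 4 is benign on 16 cores and saturating on 2.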

CPU Pressure

CPU pressure is a relatively new concept in Linux, introduced by the Pressure Stall Information (PSI) feature in kernel 4.20. It provides information about the share of time in which tasks are stalled on the CPU, either partially (some) or fully (all non-idle tasks).

Netdata displays this information in the “CPU some pressure” (system.cpu_some_pressure) and “CPU full pressure” (system.cpu_full_pressure) charts, each with dimensions representing recent trends over 10, 60, and 300-second windows. In both of these charts, the value shown represents the percentage of time over these windows that some or all tasks have been waiting for CPU resources due to CPU congestion.

The system.cpu_some_pressure_stall_time and system.cpu_full_pressure_stall_time charts display the amount of time some or all processes have been waiting for CPU time due to CPU congestion within the Netdata sampling interval (2 seconds for both charts).

When interpreting these charts, it’s important to look at the values of both the some and the full together, to get a complete picture of CPU pressure. If you see high values for some and low values for full, it may indicate that only some processes are experiencing delays due to CPU congestion, while others may be executing normally. This could happen, for example, if some processes are using a lot of CPU resources while others are idle.

If you see high values for both the some and the full, it indicates that all processes are experiencing delays due to CPU congestion. This could happen if the CPU is overloaded with too many processes or if some processes are using a lot of CPU resources for extended periods of time.

These metrics provide valuable information about how long processes are waiting for CPU resources due to CPU congestion and can help identify cases of CPU saturation and contention.
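The raw PSI values behind these charts are exposed in /proc/pressure/cpu; a small parser sketch (the sample line is illustrative, not real output from any particular system):

```python
# Parse PSI lines as found in /proc/pressure/cpu (kernel >= 4.20 with
# CONFIG_PSI). avg10/avg60/avg300 are the percentages Netdata charts;
# total is cumulative stall time in microseconds.
import os

def parse_psi(text):
    result = {}
    for line in text.strip().splitlines():
        kind, *pairs = line.split()          # kind is "some" or "full"
        result[kind] = {key: float(val)
                        for key, val in (p.split("=") for p in pairs)}
    return result

sample = "some avg10=1.53 avg60=0.87 avg300=0.21 total=400214\n"
print(parse_psi(sample))

if os.path.exists("/proc/pressure/cpu"):     # live values, if PSI is enabled
    with open("/proc/pressure/cpu") as f:
        print(parse_psi(f.read()))
```

On older kernels the file may contain only the some line, which is why the parser keys results by the line's first word rather than assuming both are present.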

Applications CPU Utilization

Netdata also provides CPU utilization breakdown per application, user and group.

The apps.cpu chart displays the total CPU utilization per application group. On this chart, 100% means one full core, so applications can exceed 100% when they utilize more than one CPU core. Application groups are defined in /etc/netdata/apps_groups.conf, and Netdata comes preconfigured with many popular applications. If Netdata is not configured to track the CPU consumption of an application, its utilization is accumulated in a dimension called other.
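For example, a custom entry in /etc/netdata/apps_groups.conf might look like this (the group and process names are hypothetical; patterns support wildcards):

```
# /etc/netdata/apps_groups.conf
# format: group_name: process name patterns (wildcards allowed)
myapp: myapp myapp-worker*
```

After adding a group, restart Netdata and the new dimension appears in apps.cpu instead of being folded into other.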

The users.cpu chart displays CPU consumption by the system user (uid) the applications are running as. Netdata automatically resolves the username to prettify the output, and no additional configuration is required. Again, 100% = 1 core on this chart.

The groups.cpu chart displays CPU consumption by the system group (gid) the applications are running as. Netdata automatically resolves the group name, with no additional configuration required. 100% = 1 core on this chart too.

By monitoring these charts over time, you can identify patterns and trends in CPU usage for different applications, users, and groups and use this information to optimize system performance.

Systemd Services CPU Utilization

By default, systemd runs each service in its own cgroup in the default namespace, which allows full tracking of CPU utilization per service. The services.cpu chart in Netdata tracks the CPU utilization of each systemd service individually, providing a breakdown of the total CPU utilization within the system-wide CPU resources (all cores) for each service running on the system. So, unlike apps.cpu, 100% on this chart means all the CPU resources available.

The chart tracks the amount of time spent by tasks of the cgroup in user and kernel modes, and provides a view of the CPU usage for each service in a separate dimension. This can be useful in identifying which systemd services are consuming the most CPU resources and can help to optimize system performance by limiting the CPU usage of certain services.

To use the services.cpu chart in Netdata, systemd must be running on the system. The chart may not be available on systems that use other init systems. Additionally, it’s important to note that systemd services may not necessarily correspond directly to specific applications or processes, but may instead represent more abstract system components or functions.

Containers and VMs CPU Utilization

Netdata tracks both v1 and v2 cgroups, allowing you to monitor CPU utilization for applications running in either type of cgroup, created by any container or VM orchestrator, including Docker, LXC and LXD, Kubernetes, KVM, libvirt, QEMU, Proxmox, Mesos, Nomad, Docker Swarm, OpenShift, Rancher, and others.

By monitoring the cgroups.cpu chart in Netdata for both v1 and v2 cgroups, you can gain a more comprehensive understanding of CPU utilization across different types of cgroups and use this information to optimize system performance over time.

It’s worth noting that cgroups v2 provides additional features and improvements over cgroups v1, including a simpler interface, better scalability, and improved performance. If possible, it may be beneficial to use cgroups v2 on your system to take advantage of these improvements.
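On a cgroup v2 host, the per-cgroup counters behind these charts live in each cgroup's cpu.stat file; a parser sketch (the sample content is illustrative):

```python
# Parse a cgroup v2 cpu.stat file ("key value" per line).
# usage_usec = user_usec + system_usec, all in microseconds.
import os

def parse_cpu_stat(text):
    return {key: int(val)
            for key, val in (line.split() for line in text.splitlines() if line)}

sample = "usage_usec 125000\nuser_usec 100000\nsystem_usec 25000\n"
print(parse_cpu_stat(sample))

root = "/sys/fs/cgroup/cpu.stat"              # root cgroup, cgroup v2 only
if os.path.exists(root):
    with open(root) as f:
        print(parse_cpu_stat(f.read()))
```

Because these are cumulative microsecond counters, per-container utilization again comes from deltas between snapshots, just like /proc/stat.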

Finding the Root Cause of High CPU Utilization

To find the root cause of high CPU utilization in Netdata, you can use a combination of different charts to get a comprehensive view of how the CPU is being used on your system. Here are some steps you can take:

First verify that the system is under high CPU utilization:

  • Check the system.cpu chart: Start by checking how the CPU is being utilized overall.

  • Check the CPU pressure charts: See whether any tasks are stalled on the CPU due to CPU congestion. High values in the some and full charts indicate that processes are waiting for CPU resources to become available.

  • Check the system.load chart: It displays the average number of processes in the run queue over the past 1, 5, and 15 minutes. High load values can indicate a CPU bottleneck and can help identify performance issues that need further investigation.

Once you have verified the high CPU utilization on the system, you can identify the processes causing it using the following:

  • Check the apps.cpu chart: Next, check the apps.cpu chart to see which applications or groups of applications are using the most CPU resources. If you see high CPU usage in the other dimension of apps.cpu, you are running an application that Netdata does not recognize. You can check which processes are accumulated into the other dimension using the “Processes” function on Netdata Cloud. Once you have identified the processes that need to be monitored, edit /etc/netdata/apps_groups.conf to add your application and restart Netdata to start monitoring its resource consumption. You may also consult the users.cpu and groups.cpu charts; these require no configuration, so as long as your application runs under its own user or group, they will monitor it automatically.

  • Check the services.cpu chart: If you’re using systemd, this chart shows which systemd services are consuming the most CPU resources, helping you identify which system components or functions are responsible.

  • Check the cgroups.cpu charts: If you’re using containers or virtual machines, these show how CPU resources are being utilized by each cgroup, helping you identify container or VM workloads that contribute to high CPU utilization.

By using these charts in Netdata, you can get a comprehensive view of how the CPU is being used on your systems and identify the root cause of high CPU utilization.
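These checks can also be scripted against the agent's REST API. Here is a sketch that builds a query for the /api/v1/data endpoint (19999 is the default agent port; the host is an assumption, adjust for your setup):

```python
# Build a Netdata /api/v1/data query URL and (optionally) fetch it.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def data_url(host, chart, after=-60, fmt="json"):
    # after=-60 asks for the last 60 seconds of data
    query = urlencode({"chart": chart, "after": after, "format": fmt})
    return f"http://{host}:19999/api/v1/data?{query}"

url = data_url("localhost", "system.cpu")
print(url)

# On a host running a Netdata agent:
# with urlopen(url) as resp:
#     payload = json.load(resp)
#     print(payload["labels"])   # time plus the system.cpu dimensions
```

The same pattern works for apps.cpu, services.cpu, and cgroup charts, which makes it easy to fold these checks into incident-response scripts.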

Netdata Functions: Processes

The Netdata Function “Processes” also provides an alternative way of checking the current CPU utilization of processes on any Netdata-monitored system.

Function “Processes” provides a top-like view, listing and sorting individual processes based on their CPU consumption, memory consumption, I/O and more.

Although Functions cannot provide historical data, they are a great tool for identifying the specific processes currently causing high CPU utilization.
