The only agent that thinks for itself

Autonomous Monitoring with self-learning AI built-in, operating independently across your entire stack.

Unlimited Metrics & Logs
Machine learning & MCP
5% CPU, 150MB RAM
3GB disk, >1 year retention
800+ integrations, zero config
Dashboards, alerts out of the box
> Discover Netdata Agents
Centralized metrics streaming and storage

Aggregate metrics from multiple agents into centralized Parent nodes for unified monitoring across your infrastructure.

Stream from unlimited agents
Long-term data retention
High availability clustering
Data replication & backup
Scalable architecture
Enterprise-grade security
> Learn about Parents
Fully managed cloud platform

Access your monitoring data from anywhere with our SaaS platform. No infrastructure to manage, automatic updates, and global availability.

Zero infrastructure management
99.9% uptime SLA
Global data centers
Automatic updates & patches
Enterprise SSO & RBAC
SOC2 & ISO certified
> Explore Netdata Cloud
Deploy Netdata Cloud in your infrastructure

Run the full Netdata Cloud platform on-premises for complete data sovereignty and compliance with your security policies.

Complete data sovereignty
Air-gapped deployment
Custom compliance controls
Private network integration
Dedicated support team
Kubernetes & Docker support
> Learn about Cloud On-Premises
Powerful, intuitive monitoring interface

Modern, responsive UI built for real-time troubleshooting with customizable dashboards and advanced visualization capabilities.

Real-time chart updates
Customizable dashboards
Dark & light themes
Advanced filtering & search
Responsive on all devices
Collaboration features
> Explore Netdata UI
Monitor on the go

Native iOS and Android apps bring full monitoring capabilities to your mobile device with real-time alerts and notifications.

iOS & Android apps
Push notifications
Touch-optimized interface
Offline data access
Biometric authentication
Widget support
> Download apps

Best energy efficiency

True real-time per-second

100% automated zero config

Centralized observability

Multi-year retention

High availability built-in

Zero maintenance

Always up-to-date

Enterprise security

Complete data control

Air-gap ready

Compliance certified

Millisecond responsiveness

Infinite zoom & pan

Works on any device

Native performance

Instant alerts

Monitor anywhere

80% Faster Incident Resolution
AI-powered troubleshooting from detection, to root cause and blast radius identification, to reporting.
True Real-Time and Simple, even at Scale
Linearly and infinitely scalable full-stack observability that can be deployed even mid-crisis.
90% Cost Reduction, Full Fidelity
Instead of centralizing the data, Netdata distributes the code, eliminating pipelines and complexity.
Control Without Surrender
SOC 2 Type 2 certified with every metric kept on your infrastructure.
Integrations

800+ collectors and notification channels, auto-discovered and ready out of the box.

800+ data collectors
Auto-discovery & zero config
Cloud, infra, app protocols
Notifications out of the box
> Explore integrations
Real Results
46% Cost Reduction

Reduced monitoring costs by 46% while cutting staff overhead by 67%.

— Leonardo Antunez, Codyas

Zero Pipeline

No data shipping. No central storage costs. Query at the edge.

From Our Users
"Out-of-the-Box"

So many out-of-the-box features! I mostly don't have to develop anything.

— Simon Beginn, LANCOM Systems

No Query Language

Point-and-click troubleshooting. No PromQL, no LogQL, no learning curve.

Enterprise Ready
67% Less Staff, 46% Cost Cut

Enterprise efficiency without enterprise complexity—real ROI from day one.

— Leonardo Antunez, Codyas

SOC 2 Type 2 Certified

Zero data egress. Only metadata reaches the cloud. Your metrics stay on your infrastructure.

Full Coverage
800+ Collectors

Auto-discovered and configured. No manual setup required.

Any Notification Channel

Slack, PagerDuty, Teams, email, webhooks—all built-in.

From Our Users
"A Rare Unicorn"

Netdata gives more than you invest in it. A rare unicorn that obeys the Pareto rule.

— Eduard Porquet Mateu, TMB Barcelona

99% Downtime Reduction

Reduced website downtime by 99% and cloud bill by 30% using Netdata alerts.

— Falkland Islands Government

Real Savings
30% Cloud Cost Reduction

Optimized resource allocation based on Netdata alerts cut cloud spending by 30%.

— Falkland Islands Government

46% Cost Cut

Reduced monitoring staff by 67% while cutting operational costs by 46%.

— Codyas

Real Coverage
"Plugin for Everything"

Netdata has agent capacity or a plugin for everything, including Windows and Kubernetes.

— Eduard Porquet Mateu, TMB Barcelona

Real Speed
Troubleshooting in 30 Seconds

From 2-3 minutes to 30 seconds—instant visibility into any node issue.

— Matthew Artist, Nodecraft

20% Downtime Reduction

20% less downtime and 40% budget optimization from out-of-the-box monitoring.

— Simon Beginn, LANCOM Systems

Pay per Node. Unlimited Everything Else.

One price per node. Unlimited metrics, logs, users, and retention. No per-GB surprises.

Free tier—forever
No metric limits or caps
Retention you control
Cancel anytime
> See pricing plans
What's Your Monitoring Really Costing You?

Most teams overpay by 40-60%. Let's find out why.

Expose hidden metric charges
Calculate tool consolidation
Customers report 30-67% savings
Results in under 60 seconds
> See what you're really paying
Your Infrastructure Is Unique. Let's Talk.

Because monitoring 10 nodes is different from monitoring 10,000.

On-prem & air-gapped deployment
Volume pricing & agreements
Architecture review for your scale
Compliance & security support
> Start a conversation
Monitoring That Sells Itself

Deploy in minutes. Impress clients in hours. Earn recurring revenue for years.

30-second live demos close deals
Zero config = zero support burden
Competitive margins & deal protection
Response in 48 hours
> Apply to partner
Per-Second Metrics at Homelab Prices

Same engine, same dashboards, same ML. Just priced for tinkerers.

Community: Free forever · 5 nodes · non-commercial
Homelab: $90/yr · unlimited nodes · fair usage
> Start monitoring your lab—free
$1,000 Per Referral. Unlimited Referrals.

Your colleagues get 10% off. You get 10% commission. Everyone wins.

10% of subscriptions, up to $1,000 each
Track earnings inside Netdata Cloud
PayPal/Venmo payouts in 3-4 weeks
No caps, no complexity
> Get your referral link
Cost Proof
40% Budget Optimization

"Netdata's significant positive impact" — LANCOM Systems

Calculate Your Savings

Compare vs Datadog, Grafana, Dynatrace

Savings Proof
46% Cost Reduction

"Cut costs by 46%, staff by 67%" — Codyas

30% Cloud Bill Savings

"Reduced cloud bill by 30%" — Falkland Islands Gov

Enterprise Proof
"Better Than Combined Alternatives"

"Better observability with Netdata than combining other tools." — TMB Barcelona

Real Engineers, <24h Response

DPA, SLAs, on-prem, volume pricing

Why Partners Win
Demo Live Infrastructure

One command, 30 seconds, real data—no sandbox needed

Zero Tickets, High Margins

Auto-config + per-node pricing = predictable profit

Homelab Ready
"Absolutely Incredible"

"We tested every monitoring system under the sun." — Benjamin Gabler, CEO Rocket.Net

76k+ GitHub Stars

3rd most starred monitoring project

Worth Recommending
Product That Delivers

Customers report 40-67% cost cuts, 99% downtime reduction

Zero Risk to Your Rep

Free tier lets them try before they buy

Never Fight Fires Alone

Docs, community, and expert help—pick your path to resolution.

Learn.netdata.cloud docs
Discord, Forums, GitHub
Premium support available
> Get answers now
60 Seconds to First Dashboard

One command to install. Zero config. 850+ integrations documented.

Linux, Windows, K8s, Docker
Auto-discovers your stack
> Start monitoring now
See Netdata in Action

Watch real-time monitoring in action—demos, tutorials, and engineering deep dives.

Product demos and walkthroughs
Real infrastructure, not staged
> Start with the 3-minute tour
Level Up Your Monitoring
Real problems. Real solutions. 112+ guides from basic monitoring to AI observability.
76,000+ Engineers Strong
615+ contributors. 1.5M daily downloads. One mission: simplify observability.
Per-Second. 90% Cheaper. Data Stays Home.
Side-by-side comparisons: costs, real-time granularity, and data sovereignty for every major tool.

See why teams switch from Datadog, Prometheus, Grafana, and more.

> Browse all comparisons
Edge-Native Observability, Born Open Source
Per-second visibility, ML on every metric, and data that never leaves your infrastructure.
Founded in 2016
615+ contributors worldwide
Remote-first, engineering-driven
Open source first
> Read our story
Promises We Publish—and Prove
12 principles backed by open code, independent validation, and measurable outcomes.
Open source, peer-reviewed
Zero config, instant value
Data sovereignty by design
Aligned pricing, no surprises
> See all 12 principles
Edge-Native, AI-Ready, 100% Open
76k+ stars. Full ML, AI, and automation—GPLv3+, not premium add-ons.
76,000+ GitHub stars
GPLv3+ licensed forever
ML on every metric, included
Zero vendor lock-in
> Explore our open source
Build Real-Time Observability for the World
Remote-first team shipping per-second monitoring with ML on every metric.
Remote-first, fully distributed
Open source (76k+ stars)
Challenging technical problems
Your code on millions of systems
> See open roles
Talk to a Netdata Human in <24 Hours
Sales, partnerships, press, or professional services—real engineers, fast answers.
Discuss your observability needs
Pricing and volume discounts
Partnership opportunities
Media and press inquiries
> Book a conversation
Your Data. Your Rules.
On-prem data, cloud control plane, transparent terms.
Trust & Scale
76,000+ GitHub Stars

One of the most popular open-source monitoring projects

SOC 2 Type 2 Certified

Enterprise-grade security and compliance

Data Sovereignty

Your metrics stay on your infrastructure

Validated
University of Amsterdam

"Most energy-efficient monitoring solution" — ICSOC 2023, peer-reviewed

ADASTEC (Autonomous Driving)

"Doesn't miss alerts—mission-critical trust for safety software"

Community Stats
615+ Contributors

Global community improving monitoring for everyone

1.5M+ Downloads/Day

Trusted by teams worldwide

GPLv3+ Licensed

Free forever, fully open source agent

Why Join?
Remote-First

Work from anywhere, async-friendly culture

Impact at Scale

Your work helps millions of systems

Compliance
SOC 2 Type 2

Audited security controls

GDPR Ready

Data stays on your infrastructure

Blog

Understanding Huge Pages

Optimizing Memory Usage
by Satyadeep Ashwathnarayana · May 4, 2023

Memory-intensive applications can benefit from improved performance by using huge pages, as they can reduce TLB pressure and memory fragmentation, and lower the memory management overhead overall. Developers should consider using HugeTLBfs in their mmap() and shmget() calls to take advantage of huge pages.

Transparent Huge Pages (THP) is a Linux kernel feature that provides some of the benefits of huge pages without requiring any development effort. However, THP can cause latency in many applications. Although kernel developers are actively working to address these issues, many system administrators prefer to disable THP altogether.

Netdata can assist in determining whether THP is helpful or harmful to your applications, which can guide your decision regarding its use.

Huge Pages

Huge pages are a memory management technique used in modern computer systems to improve performance by using larger memory blocks than the default page size. They help reduce the pressure on the Translation Lookaside Buffer (TLB) and lower the overhead of managing memory in systems with large amounts of RAM.

To understand what they are and how they improve performance, we first need to understand how modern computers use their physical memory.

Virtual Memory and Paging

Virtual memory and paging are two concepts that work together to manage memory efficiently and provide an abstraction layer between physical memory and applications.

Virtual memory

Virtual memory is a technique that provides an abstraction layer between the physical memory (RAM) and the applications running on a computer. It allows each process to have its own private address space, which is isolated from the address spaces of other processes. This ensures that one process cannot directly access the memory of another process, providing security and stability.

Virtual memory also allows a computer to use more memory than is physically installed by using disk storage as an extension of the physical memory. This is achieved through a technique called “swapping” or “paging” (discussed below), where portions of memory are temporarily stored on disk and loaded back into RAM when needed.

The main benefits of virtual memory include:

  • Isolation: Each process has its own address space, preventing accidental or malicious access to another process’s memory.

  • Efficient memory utilization: Processes only need to have their active portions in physical memory, allowing the system to run more applications concurrently.

  • Simplified memory management: The operating system can provide a consistent view of memory to applications, regardless of the underlying physical memory layout.

Paging

Paging is the mechanism that enables virtual memory to work. It involves dividing the virtual memory address space into fixed-size units called pages, and the physical memory into fixed-size units called frames. The size of a page is typically the same as that of a frame (e.g., 4KB on x86 systems). The operating system maintains a mapping between virtual pages and physical frames, called the page table.
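The base page size is easy to inspect from userspace. A minimal sketch, assuming a typical Linux system with the standard getconf utility:

```shell
# Print the base page size the kernel uses for paging (4096 bytes on x86).
getconf PAGESIZE

# Page-table entries needed to map 1 GiB of memory with 4 KiB pages:
echo $(( (1024 * 1024 * 1024) / 4096 ))   # 262144 entries
```

On architectures other than x86 the base page size may differ (e.g., 16 KiB or 64 KiB on some ARM systems), which changes the arithmetic accordingly.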

When an application accesses memory, it uses virtual addresses. The operating system, with the help of the CPU’s memory management unit (MMU), translates these virtual addresses into physical addresses using the page table. This process is known as “address translation.”

If the required page is not present in the physical memory (a “page fault”), the operating system fetches the page from disk storage (if it was swapped out) or allocates a new page in memory (if it’s a new allocation). Then, the page table is updated with the new mapping, and the application can continue executing.
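Page fault activity is visible system-wide through procfs. A small sketch, assuming a Linux system with /proc/vmstat:

```shell
# Cumulative page faults since boot: pgfault counts all faults,
# pgmajfault counts only major faults (those that required disk I/O).
grep -E '^(pgfault|pgmajfault) ' /proc/vmstat
```

A high rate of major faults usually indicates swapping or heavy file-backed paging, while minor faults are satisfied from memory and are much cheaper.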

Translation Lookaside Buffers (TLB)

When a CPU accesses memory using a virtual address, it needs to translate that address into a physical address. This translation process involves looking up the virtual address in the page table, which contains mappings between virtual pages and physical frames. However, accessing the page table can be slow, as it resides in the main memory (RAM) and can span multiple levels (e.g., in multi-level paging).

To speed up address translation, the CPU uses a TLB to cache recent virtual-to-physical address translations. The TLB is much faster than main memory because it’s a small, specialized cache built within the CPU. When the CPU needs to translate a virtual address, it first checks the TLB for the translation. If the translation is found in the TLB (a “TLB hit”), the CPU can quickly access the corresponding physical memory.

If the translation is not found in the TLB (a “TLB miss”), the CPU must access the page table in main memory to perform the translation. This process is called a “page table walk” and can be time-consuming. After the translation is obtained from the page table, the CPU updates the TLB with the new translation, so future accesses to the same virtual address can benefit from the TLB cache.

Huge Pages

As the amount of physical memory increases, the number of TLB entries required to manage the memory also increases, leading to more TLB misses and reduced performance.

Huge pages address this issue by using larger memory blocks (e.g., 2MB or 1GB) that can be mapped using a single TLB entry. By using huge pages, more memory can be managed with fewer TLB entries, resulting in fewer TLB misses.

This improves performance for memory-intensive applications in the following ways:

  1. Reduced TLB pressure: As huge pages are larger than regular pages, fewer TLB entries are required to manage the same amount of memory. This leads to fewer TLB misses and better performance for memory-intensive applications.

  2. Lower memory management overhead: With fewer pages to manage, the operating system spends less time updating and maintaining page tables.

  3. Reduced memory fragmentation: Large memory allocations are more likely to be contiguous when using huge pages, reducing internal and external fragmentation.
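To put the TLB-pressure argument in concrete numbers, here is the count of mappings needed to cover 64 GiB of RAM at each common page size (plain shell arithmetic):

```shell
total=$((64 * 1024 * 1024 * 1024))   # 64 GiB of RAM
# Base 4 KiB pages, 2 MiB huge pages, 1 GiB huge pages:
for size in $((4 * 1024)) $((2 * 1024 * 1024)) $((1024 * 1024 * 1024)); do
  echo "$((size / 1024)) KiB pages: $((total / size)) mappings to cover 64 GiB"
done
```

With 4 KiB pages, 64 GiB needs over 16 million mappings, while a typical TLB caches only a few thousand entries; with 2 MiB huge pages the same memory needs 32768, and with 1 GiB pages just 64.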

HugeTLBfs

Linux provides a feature called HugeTLBfs, which allows applications to explicitly use huge pages.

HugeTLBfs requires application developers to make changes to their code to use huge pages (e.g., using mmap() with the MAP_HUGETLB flag or shmget() with the SHM_HUGETLB flag). So, HugeTLBfs can only be used if the application has been developed to make use of it.
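The static huge page pool that HugeTLBfs draws from can be inspected without any code changes. A sketch for a typical x86 Linux system (the reservation steps require root and are shown as comments):

```shell
# Size of the static huge page pool and of one huge page (2048 kB on x86).
grep -E 'HugePages_(Total|Free)|Hugepagesize' /proc/meminfo

# To reserve 64 huge pages for HugeTLBfs consumers, root can write:
#   echo 64 > /proc/sys/vm/nr_hugepages
# and mount hugetlbfs so that files mmap()ed from it are huge-page backed:
#   mount -t hugetlbfs none /mnt/huge
```

HugePages_Total is usually 0 unless an administrator has explicitly reserved pages; applications using MAP_HUGETLB or SHM_HUGETLB fail if the pool is empty.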

Transparent HugePages (THP)

On the other hand, THP is a Linux kernel feature that automatically backs virtual memory with huge pages, without requiring any changes to the application code.

THP monitors memory usage and can promote smaller pages to huge pages when it’s beneficial, or demote them back to smaller pages when it’s not. THP works with anonymous memory mappings (memory not backed by files on disk) and tmpfs/shmem (temporary file systems in memory). THP aims to provide the benefits of huge pages with minimal effort from developers and system administrators.
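Whether THP is active, and how much memory it currently backs, can be checked directly. A sketch assuming the usual sysfs and procfs paths:

```shell
# Active THP policy: the bracketed value is the current mode,
# one of [always], [madvise] or [never].
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null \
  || echo "THP not available on this kernel"

# Anonymous memory currently backed by transparent huge pages, system-wide.
grep AnonHugePages /proc/meminfo
```

In madvise mode, only memory regions that applications explicitly mark with madvise(MADV_HUGEPAGE) are eligible for huge page backing.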

Pros of Transparent HugePages (THP):

  1. Improved performance: THP can improve performance for memory-intensive applications by reducing TLB misses and lowering memory management overhead. This is particularly beneficial for systems with large amounts of RAM.

  2. Automatic management: THP automatically manages the promotion and demotion of page sizes, requiring no modifications to application code or explicit huge page allocations. This simplifies the process of using huge pages and makes it accessible to a wider range of applications.

  3. Works with anonymous memory and tmpfs/shmem: THP supports anonymous memory mappings (memory not backed by files on disk) and tmpfs/shmem (temporary file systems in memory), which are common use cases for memory-intensive applications.

  4. Configurable: THP behavior can be tuned through the sysfs interface or kernel boot parameters, allowing system administrators to control aspects such as allocation policies and defragmentation.

Cons of Transparent HugePages (THP):

  1. Suboptimal memory usage: THP may promote smaller pages to huge pages even when it is not beneficial, which can lead to increased memory usage. This can be particularly problematic for applications with irregular or unpredictable memory access patterns.

  2. Latency spikes: The process of promoting and demoting pages can introduce latency spikes, especially during memory compaction and defragmentation. This can negatively impact the performance of latency-sensitive applications.

  3. Limited control: While THP simplifies huge page usage by automating it, this can also limit the control that developers have over memory allocation. For applications that require fine-grained control over huge page allocations, the HugeTLBfs interface may be more suitable.

  4. Compatibility issues: Some applications may not work well with THP or may encounter performance regressions due to the automatic management of page sizes. In such cases, it might be necessary to disable THP for specific applications or system-wide.

  5. Fragmentation: In some cases, THP might cause memory fragmentation, which can make it more difficult to allocate huge pages during runtime. This can lead to suboptimal performance or even allocation failures.

  6. Overhead of background processes: THP introduces additional background processes, such as khugepaged, which is responsible for promoting smaller pages to huge pages. In some cases, the overhead of these processes might outweigh the performance benefits of using THP.

  7. Difficulty in monitoring: When using THP, memory statistics and usage may become more difficult to understand and monitor. For example, the total amount of memory used by huge pages might not be immediately apparent from standard memory statistics.
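Point 7 is easy to demonstrate: per-process THP usage does not appear in top or free, but the kernel does expose it per process. A sketch using the current shell as the example process (smaps_rollup needs kernel 4.14 or later):

```shell
# AnonHugePages here shows how much of this process's anonymous memory
# is currently backed by transparent huge pages.
grep AnonHugePages /proc/self/smaps_rollup

# The khugepaged kernel thread from point 6, present when THP is enabled:
ps -e | grep khugepaged || echo "khugepaged not running"
```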

There are some well-known applications that have recommendations regarding HugePages usage based on their specific memory access patterns and performance characteristics.

Here are a few examples:

  1. MongoDB: The MongoDB documentation recommends disabling THP due to potential performance issues. MongoDB’s memory usage pattern can lead to suboptimal memory allocation when using THP, which may result in reduced performance.

  2. Redis: The Redis documentation also suggests disabling THP. It states that THP may have a negative impact on Redis performance due to the increased latency caused by memory defragmentation during the allocation of huge pages.

  3. PostgreSQL: PostgreSQL does not have an official recommendation regarding THP usage. However, PostgreSQL has added support for HugePages for shared buffers, meaning that it can use them when requested, without THP enabled.

  4. MySQL: There is no official recommendation from MySQL regarding THP usage. However, MySQL has added support for HugePages for InnoDB, meaning that it can use them when requested, without THP enabled.

  5. Oracle Database: Oracle recommends disabling THP, due to memory allocation delays during runtime.

  6. Apache Cassandra: Apache Cassandra documentation recommends disabling THP, due to noticeable performance problems when Linux tries to defrag 2MB pages.

  7. Elasticsearch: Elasticsearch does not have an official recommendation regarding THP usage. However, in one public discussion, Elasticsearch core developers ran tests and observed a performance degradation of about 7%.

  8. Java Virtual Machine (JVM): The use of THP with JVM-based applications (such as Apache Tomcat, Hadoop, or Spark) can have mixed results. Some users have reported performance improvements with THP enabled, while others have experienced performance regressions or increased garbage collection (GC) pauses.

  9. KVM/QEMU (Virtualization): For virtualization platforms such as KVM and QEMU, the benefits of using THP can vary depending on the workloads running on the virtual machines. Some users have reported improved performance with THP enabled, while others have experienced performance regressions. For KVM/QEMU environments, SUSE has observed a 1000x decrease in page faults when running synthetic workloads with contiguous memory access patterns. However, as they note, workloads with sparse memory access patterns (such as databases) may perform poorly with THP, in which case it should be disabled.

Although these recommendations and experiences are discouraging and have led many users to disable THP globally, this technology seems destined to play a crucial role in the future of memory management on Linux. A TLB built around 4KB pages simply cannot keep up with the needs of modern computing: according to one report, as of April 2023 Meta observes on its 64GB servers that close to 20% of execution cycles are spent handling TLB misses, making THP a necessity for common applications.

There is currently a lot of activity around THP. A growing number of kernel patches claim that the latency spikes caused by defragmentation are greatly reduced in their tests, so the Linux community is probably close to a breakthrough in this area.

How can Netdata help?

In the “Memory” section of the dashboard, the “hugepages” subsection will automatically appear if THP is enabled on a system.

The first chart shows the current THP memory. In this specific instance, we see how THP compacts small 4KB pages into 2MB huge pages over time. This process is totally transparent to the applications running.

mem-thp

Next to it, three charts show THP memory allocations. THP performs three kinds of allocations: on page fault, mapped-file allocations, and huge zero page allocations:

mem-thp-faults

After the allocation charts, two charts present how THP a) collapses 4KB pages into 2MB huge pages and b) splits 2MB pages back into 4KB pages:

mem-thp-collapse

And finally, there is a chart that presents the THP compaction process. It includes a dimension called stall, which shows the number of times an application was stalled during a memory allocation because THP was compacting memory to make room for a huge page. In our tests we could not trigger it, probably because it requires severely fragmented memory.

mem-thp-compact

Checking memory fragmentation

Memory fragmentation is exposed via debugfs in Linux. To check the current memory fragmentation, execute:

sudo cat /sys/kernel/debug/extfrag/extfrag_index

The output looks like this:

Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 
Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 
Node 0, zone   Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 

The closer the index is to 0, the more likely an allocation is to fail due to insufficient memory. The closer it is to 1000, the more likely an allocation is to fail due to excessive external fragmentation. An index of -1 means that allocations are expected to succeed.

Disabling THP

To view the current THP configuration:

cat /sys/kernel/mm/transparent_hugepage/enabled

If the value is always, switch THP off at runtime. Note that a plain sudo echo with output redirection will not work, because the redirection is performed by the unprivileged shell; use tee or a root shell instead:

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

To make the change persist across reboots, add transparent_hugepage=never to the kernel boot parameters.