SafetyDetectives: An Interview with Costa Tsaousis

How Netdata is setting new standards in the monitoring industry
by Netdata Team · April 1, 2024

In a recent conversation with SafetyDetectives, Costa Tsaousis, CEO and founder of Netdata, shares insights into the inception and evolution of Netdata, a game-changing monitoring solution. With a background in fintech and a passion for real-time data processing, Tsaousis was driven to create Netdata in response to the significant gaps he identified in traditional monitoring tools. Emphasizing the importance of real-time data, comprehensive metrics collection, and the innovative use of machine learning, Tsaousis discusses how Netdata is setting new standards in the monitoring industry. His vision for Netdata not only challenges the status quo but also introduces a novel approach to cybersecurity, making it an essential tool for organizations worldwide.

Introduction

Could you introduce yourself and explain your role at Netdata?

My name is Costa Tsaousis, and I am the founder of Netdata. I started it several years ago as an open-source project. Today, I’m the CEO. As a startup, we’ve raised some money, and we’re trying to make monitoring a little different from what it has traditionally been.

Inspiration Behind Netdata

What inspired you to start Netdata, and what kind of gaps in the market do you aim to fill?

The inspiration came during a challenging migration of infrastructure from on-prem to the cloud. I was working for a fintech company where real-time processing is crucial. Delays are unacceptable because they create bottlenecks in retail sales. We encountered severe slowdowns without any clear understanding of the cause. Despite spending millions of euros on existing monitoring solutions, assembling a team of eight, and hiring consultants, we couldn’t pinpoint the issue. That’s when I realized something fundamental was lacking in the monitoring industry: certain problems could remain completely undetected.

Driven by the question of how an ideal monitoring solution would operate, one that could detect everything in real time and handle unlimited metrics, I began experimenting. After two years, this led to an application that solved our issues, which turned out to be caused by cloud provider bugs that intermittently froze our VMs.

Upon releasing the solution as open source on GitHub, it quickly became one of the fastest-growing projects, gaining 10,000 stars in just two weeks. Its traction made it clear that many people faced similar challenges. Engaging with the community, I continued to enhance the application.

As its user base expanded, with Netdata attracting five to ten thousand new users daily and over 200,000 Docker Hub downloads per day, I recognized its potential. Netdata introduces a novel approach to monitoring: it’s distributed, high-fidelity, and offers fully automated, out-of-the-box dashboards. Traditional monitoring systems require extensive setup and configuration before you can understand and use their metrics. In contrast, Netdata simplifies this process, providing immediate insights and alerts as soon as it is installed, even in the middle of a crisis.

Realizing the significant impact and the unique value proposition of Netdata, I decided to leave the fintech company, sell my shares, and found Netdata to pursue this vision.

Enhancing Cybersecurity

How does Netdata enhance the cybersecurity posture of an organization?

This is where it gets really interesting. First off, Netdata is real-time. This means you’re not looking at data from the past; you’re seeing everything as it happens. If there’s SSH activity on a server, it appears on your dashboard a second later. We collect a vast number of metrics because we believe any of them could be vital for understanding your system’s condition. We don’t filter out data; if a metric is exposed by a system or an application, we collect it.

We’ve integrated machine learning in a way that’s unique to Netdata. We train multiple machine learning models for every collected metric to understand the normal behavior patterns. This enables us to detect anomalies in real-time. For cybersecurity, this is crucial because security incidents often involve activities that shouldn’t be happening, and our system picks up on these anomalies across a wide array of metrics.
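To make the idea of "multiple models per metric" more concrete, here is a minimal Python sketch. It is not Netdata’s implementation: it swaps in a simple mean/standard-deviation baseline for each model, and the names `MetricBaseline` and `anomaly_bit`, the 3-sigma threshold, the training windows, and the rule that a sample is flagged only when every model agrees are all assumptions made for illustration.

```python
from statistics import mean, pstdev


class MetricBaseline:
    """Hypothetical per-metric model: learns the mean and spread of one
    metric's recent values. A simple stand-in, not Netdata's actual ML."""

    def __init__(self, training_values, threshold=3.0):
        self.mu = mean(training_values)
        # Guard against a perfectly flat training window (zero spread).
        self.sigma = pstdev(training_values) or 1e-9
        self.threshold = threshold

    def is_anomalous(self, value):
        # Flag values more than `threshold` standard deviations from the mean.
        return abs(value - self.mu) / self.sigma > self.threshold


def anomaly_bit(models, value):
    # Assumption: a sample counts as anomalous only if every model agrees,
    # which reduces false positives caused by any single training window.
    return all(m.is_anomalous(value) for m in models)


# Three models for one (made-up) metric, each trained on a different window.
history = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 9]
models = [MetricBaseline(history[i:]) for i in (0, 4, 8)]

print(anomaly_bit(models, 10.5))  # False: within normal behavior
print(anomaly_bit(models, 40))    # True: far outside every model's baseline
```

Training several models on different windows and requiring agreement is one way to keep a single noisy window from flooding the dashboard with false alarms.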

When a security incident occurs, it’s not just a single metric that goes off; it’s a cluster of metrics indicating abnormal activity. This could range from unexpected SSH sessions to unusual system load patterns indicating a potential attack. With Netdata, you’re not just monitoring; you’re actively detecting these anomalies as they occur.

Consider the observability difference: on an empty VM, traditional tools might give you around a hundred metrics, but with Netdata, you’d have access to around 3,000 per-second metrics. This comprehensive coverage is critical for cybersecurity. When we detect anomalies, they usually occur in clusters, meaning multiple related metrics will signal an issue simultaneously.
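The clustering effect can be illustrated with a short Python sketch that counts, for each second, what fraction of metrics are flagged as anomalous. The function name `anomaly_spikes`, the 50% cut-off, and the metric names and anomaly flags in the example are hypothetical; this is not how Netdata computes it internally.

```python
def anomaly_spikes(anomaly_bits, min_fraction=0.5):
    """anomaly_bits: dict of metric name -> list of 0/1 flags, one per second.
    Returns the seconds where at least `min_fraction` of metrics are anomalous."""
    series = list(anomaly_bits.values())
    length = min(len(s) for s in series)
    spikes = []
    for t in range(length):
        flagged = sum(s[t] for s in series)
        if flagged / len(series) >= min_fraction:
            spikes.append(t)
    return spikes


# Made-up per-second anomaly flags for three hypothetical metrics.
bits = {
    "system.cpu":   [0, 0, 0, 1, 1, 1, 0, 0],
    "ssh.sessions": [0, 0, 1, 1, 1, 1, 1, 0],
    "disk.io":      [0, 0, 0, 0, 0, 0, 0, 0],
}

print(anomaly_spikes(bits))  # [3, 4, 5]
```

A single metric flipping briefly (second 2 or 6 above) stays below the cut-off, while several related metrics going anomalous together produces a visible spike.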

This clustering of anomalies, seen across multiple servers or within a short time frame, provides a clear picture of security incidents or attacks. Netdata’s machine learning capabilities allow for this level of detailed observation. When you notice a spike in anomalies, you can easily highlight the area and ask Netdata to sift through thousands of metrics, prioritizing them by their anomaly score. This process gives you a sorted list of potential issues without the need for speculation.
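The "highlight and sort" step can be sketched in Python as ranking metrics by the share of anomalous seconds inside the highlighted window. This is a minimal illustration, assuming per-second 0/1 anomaly flags are already available; the function name `rank_by_anomaly_rate` and the example metric names and data are hypothetical, not Netdata’s API.

```python
def rank_by_anomaly_rate(anomaly_bits, start, end):
    """anomaly_bits: dict of metric name -> list of 0/1 flags, one per second.
    Returns (metric, rate) pairs for the window [start, end), highest rate first."""
    scores = []
    for name, bits in anomaly_bits.items():
        window = bits[start:end]
        if not window:
            continue  # metric has no data in the highlighted window
        rate = sum(window) / len(window)
        scores.append((name, rate))
    # Sort by descending rate; ties are broken by name for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


# Made-up per-second anomaly flags for three hypothetical metrics.
bits = {
    "system.cpu":   [0, 0, 0, 1, 1, 1, 0, 0],
    "ssh.sessions": [0, 0, 1, 1, 1, 1, 1, 0],
    "disk.io":      [0, 0, 0, 0, 0, 0, 0, 0],
}

for name, rate in rank_by_anomaly_rate(bits, 2, 7):
    print(f"{name}: {rate:.0%}")
# ssh.sessions: 100%
# system.cpu: 60%
# disk.io: 0%
```

The result is an ordered list of candidates to investigate, rather than a guess about which subsystem to check first.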

In traditional monitoring systems, you might notice a spike or drop and start guessing at the cause: DNS, storage, and so on. With Netdata, we flip this approach. By highlighting an anomaly spike, you get a prioritized list of all the metrics showing unusual behavior. This direct access to relevant data streamlines the process, making it easier to identify and respond to security threats efficiently.

Remote Work and Distributed IT Environments

With the shift towards remote work, how can tools like Netdata help organizations manage their distributed IT environments?

Netdata is inherently distributed by design, which aligns perfectly with the nature of remote work and distributed IT environments. Unlike traditional monitoring tools that rely on centralization, Netdata emphasizes monitoring and observability directly at the edge, close to where data and activities are happening.

This distributed approach is especially beneficial for organizations with IoT devices scattered globally or those needing to monitor personal computing devices to ensure they operate within specified norms. Our users range from those managing IoT deployments across different locations to individuals tracking the performance of personal devices like Windows PCs, ensuring everything functions as expected.

By flipping the traditional centralized monitoring model, Netdata facilitates a more immediate and contextually relevant view of an organization’s infrastructure. This allows for real-time observability and management across all nodes of an infrastructure, regardless of their physical location. It’s this capability that makes Netdata particularly suited to today’s distributed organizations, providing a seamless solution for monitoring in the age of remote work.

Leveraging AI and Machine Learning

With AI and machine learning becoming increasingly important in cybersecurity, how is Netdata leveraging these technologies?

It’s essential to distinguish between the buzz around AI and machine learning and the actual application of these technologies. While “machine learning observability” has become a trendy phrase for many companies, often what’s being described is more akin to an expert system that doesn’t utilize genuine machine learning techniques. At Netdata, we take a different approach. Our machine learning code is open source, allowing for transparency and community review. This openness ensures that users can not only trust but also verify and understand the machine learning processes we employ.

We’re committed to leveraging real machine learning in observability, setting us apart in a field where true application of these technologies is rare. Our approach aligns with the principle that machine learning isn’t a magical solution that operates autonomously. Instead, it should be seen as an advisor or consultant, enhancing human decision-making processes.

This perspective was echoed in a Google talk in 2019, highlighting the misconception that machine learning can solve any problem autonomously. The reality is that machine learning is a tool to provide insights, making processes more effective, efficient, and results-oriented. At Netdata, we embrace this philosophy, using machine learning to empower users with actionable insights, thereby enhancing the cybersecurity posture and overall efficiency of the systems we monitor.

Future Developments

Are there any new features or developments that Netdata is working on that your users can anticipate?

Absolutely. Netdata is constantly evolving. We are always introducing new features, particularly in visualization, as part of our mission to simplify the user experience. Our ultimate aim is to make monitoring more efficient and easier by stripping away complexity. This commitment to innovation is driven by the need to adapt to shifting technologies and advancements in the field.

Our objectives revolve around reducing the skills, effort, time, and cost associated with observability. By integrating more accurate insights and leveraging tools like machine learning, we strive to offer more value. This journey is ongoing and touches on several aspects, including visualization enhancements, automation improvements, and database optimizations. In essence, every development at Netdata is designed to empower our users to become more effective and efficient in their roles.