Costa Tsaousis spoke at SREday London 2024, held September 19–20, presenting “Practical AI with Machine Learning for Observability in Netdata.” The event ran a promo code – COSTA10 – for 10% off tickets, which brought some extra traffic our way.
The talk was tailored for an SRE audience. These are people who live in dashboards, write alert rules, and get paged at night. They are skeptical of ML claims because they have seen too many tools that promise “intelligent alerting” and deliver more noise. Costa focused on the mechanics: how Netdata trains unsupervised models per metric at the edge, why anomaly convergence across metrics matters more than any single anomaly score, and how this translates to fewer false positives in practice.
The gap between traditional threshold-based alerting and ML-driven anomaly discovery was a recurring topic in hallway conversations. SREs want to reduce alert fatigue without missing real incidents. That is exactly the problem space where Netdata’s approach – multiple independent ML models that flag deviations, combined with a correlation layer that only escalates when anomalies cluster – gets the most traction.
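To make the "escalate only when anomalies cluster" idea concrete, here is a minimal sketch in Python. It is an illustration only, not Netdata's implementation: a simple z-score baseline stands in for the per-metric unsupervised models, and the function names (`train`, `is_anomalous`, `should_escalate`), the metric names, and both thresholds are hypothetical.

```python
from statistics import mean, stdev

def train(history):
    """Fit a trivial per-metric baseline (mean, std) from past samples.
    Stand-in for a real unsupervised model; assumption for illustration."""
    return mean(history), stdev(history)

def is_anomalous(model, value, z_threshold=3.0):
    """Flag a value that deviates strongly from this metric's own baseline."""
    mu, sigma = model
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold

def should_escalate(models, sample, min_fraction=0.5):
    """Escalate only when enough metrics are anomalous at the same time."""
    flags = [is_anomalous(models[name], sample[name]) for name in models]
    return sum(flags) / len(flags) >= min_fraction

# Hypothetical training data: one independent model per metric.
history = {
    "cpu": [20, 22, 21, 19, 20, 21],
    "disk_io": [5, 6, 5, 4, 5, 6],
    "net_errors": [0, 1, 0, 0, 1, 0],
    "latency_ms": [100, 102, 98, 101, 99, 100],
}
models = {name: train(values) for name, values in history.items()}

# One metric spikes on its own: flagged locally, but not escalated.
single_spike = {"cpu": 95, "disk_io": 5, "net_errors": 0, "latency_ms": 100}
# Several metrics deviate together: the cluster is escalated.
clustered = {"cpu": 95, "disk_io": 40, "net_errors": 30, "latency_ms": 900}

print(should_escalate(models, single_spike))
print(should_escalate(models, clustered))
```

In this sketch a lone CPU spike stays below the escalation fraction, while simultaneous deviations across all four metrics cross it. That is the intuition behind treating convergence across metrics as a stronger signal than any single anomaly score.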
SREday drew site reliability engineers and platform engineers from across the UK and Europe for two days of talks and workshops. The audience was technical and direct, which made for productive discussions.