Another release of the Netdata Monitoring solution is here!
- Netdata Growth
- Release Highlights
  - Dashboard Sections' Summary Tiles: Added summary tiles to most sections of the fully-automated dashboards, to provide an instant view of the most important metrics for each section.
  - Silencing of Cloud Alert Notifications: Maintenance window coming up? Active issue being checked? Use the Alert notification silencing engine to mute your notifications.
  - Machine Learning - Extended Training to 24 Hours: Netdata now trains multiple models per metric, to learn the behavior of each metric for the last 24 hours. Trained models are persisted on disk and are loaded back on Netdata restart.
  - Rewritten SSL Support for the Agent: Netdata Agent now features a new SSL layer that allows it to reliably use SSL on all its features, including the API and Streaming.
- Alerts and Notifications
- Visualizations / Charts and Dashboards
- Preliminary steps to split native packages
- Acknowledgements
- Contributions
- Deprecation notice
- Cloud recommended version
- Release meetup
- Support options
- Running survey
Netdata Growth
🚀 Our community is growing steadily. ❤️ Thank you! Your love and acceptance give us the energy and passion to work harder to make monitoring simpler, more effective and more fun to use.
- Over 63,000 GitHub Stars ⭐
- Over 1.5 million online nodes
- Almost 94 million sessions served
- Over 600 thousand total nodes in Netdata Cloud
Wow! Netdata Cloud is about to become the biggest and most scalable monitoring infra ever created!
Let the world know you love Netdata. Give Netdata a ⭐ on GitHub now. Motivate us to keep pushing forward!
Unlimited Docker Hub Pulls!
To help our community use Netdata more broadly, we just signed an agreement with Docker for the purchase of Rate Limit Removal, which will remove all Docker Hub pull limits for the Netdata repos at Docker Hub. We expect this add-on to be applied to our repos in the following few days, so that you will enjoy unlimited Docker Hub pulls of Netdata Docker images for free!
Release Highlights
Dashboard Sections' Summary Tiles
Netdata Cloud dashboards have been improved to provide instant summary tiles for most of their sections. This includes system overview, disks, network interfaces, memory, mysql, postgresql, nginx, apache, and dozens more.
To accomplish this, we extended the query engine of Netdata to support multiple grouping passes, so that queries like “sum metrics by label X, and then average by node” are now possible. At the same time we made room for presenting anomaly rates on them (vertical purple bar on the right) and significantly improved the tile placement algorithm to support multi-line summary headers and precise sizing and positioning, providing a look and feel like this:
The following chart tile types have been added:
- Donut
- Gauge
- Bar
- Trendline
- Number
- Pie chart
To improve the efficiency of using these tiles, each of these tiles supports the following interactive actions:
- Clicking the title of the tile scrolls the dashboard to the data source chart, where you can slice, dice and filter the data on which the tile is based.
- Hovering over the tile with your mouse pointer reveals the NIDL (Nodes, Instances, Dimensions, Labels) framework buttons, allowing you to explore and filter the data set, right on the tile.
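The multi-pass grouping mentioned earlier in this section ("sum metrics by label X, and then average by node") can be illustrated with a minimal Python sketch. This is not the Netdata query engine; the sample data, the label values and the function name are all hypothetical, and it shows only one possible reading of that query: first sum per node and label, then average those sums per node.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (node, value of label X, metric value).
SAMPLES = [
    ("node1", "eth0", 10.0),
    ("node1", "eth0", 5.0),
    ("node1", "eth1", 20.0),
    ("node2", "eth0", 30.0),
    ("node2", "eth1", 10.0),
]


def sum_by_label_then_avg_by_node(samples):
    """Pass 1: sum values per (node, label). Pass 2: average those sums per node."""
    # First grouping pass: one running sum per (node, label) pair.
    sums = defaultdict(float)
    for node, label, value in samples:
        sums[(node, label)] += value

    # Second grouping pass: collect the per-label sums of each node.
    per_node = defaultdict(list)
    for (node, _label), total in sums.items():
        per_node[node].append(total)

    return {node: mean(totals) for node, totals in per_node.items()}


RESULT = sum_by_label_then_avg_by_node(SAMPLES)
```

The point of the second pass is that it operates on the output of the first one, which is what a single group-by cannot express.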
Some examples that you can see from the Netdata Demo space:
Silencing of Cloud Alert Notifications
Although Netdata Agent alerts support silencing, centrally dispatched alert notifications from Netdata Cloud were missing that feature. Today, we release alert notifications silencing rules for Netdata Cloud!
Silencing rules can match on any combination of the following: users, rooms, nodes, host labels, contexts (charts), alert name, alert role. For the matching alerts, silencing can optionally have a starting date and time and/or an ending date and time.
With this feature you can now easily set up silencing rules that apply immediately or on a defined schedule, allowing you to plan for upcoming scheduled maintenance windows - see some examples here.
Read more about Silencing Alert notifications on our documentation.
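To make the matching logic concrete, here is a minimal Python sketch of how a rule with optional attributes and an optional time window could decide whether an alert is silenced. It is an illustration only, not Netdata Cloud's implementation: the `SilencingRule` class, its fields and the alert dictionary layout are assumptions, and only three of the matchable attributes are modelled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SilencingRule:
    # Hypothetical rule: None means "match anything" for that attribute.
    node: Optional[str] = None
    context: Optional[str] = None
    alert_name: Optional[str] = None
    # Optional window: a missing start means "already active",
    # a missing end means "until further notice".
    starts_at: Optional[datetime] = None
    ends_at: Optional[datetime] = None

    def silences(self, alert: dict, now: datetime) -> bool:
        """Return True if this rule mutes notifications for `alert` at `now`."""
        if self.starts_at is not None and now < self.starts_at:
            return False
        if self.ends_at is not None and now >= self.ends_at:
            return False
        for field in ("node", "context", "alert_name"):
            wanted = getattr(self, field)
            if wanted is not None and alert.get(field) != wanted:
                return False
        return True


# A scheduled maintenance window for one (made-up) node.
RULE = SilencingRule(
    node="db1",
    starts_at=datetime(2023, 6, 20, 22, 0),
    ends_at=datetime(2023, 6, 21, 2, 0),
)
ALERT = {"node": "db1", "context": "example.context", "alert_name": "example_alert"}
```

With these values, `RULE.silences(ALERT, datetime(2023, 6, 20, 23, 0))` is true, while the same call one hour before the window starts is false.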
Machine Learning - Extended Training to 24 Hours
Netdata trains ML models for each metric, using its past data. This allows Netdata to detect anomalous behaviors in metrics, based exclusively on the recent past data of the metric itself.
Before this release, Netdata trained one model per metric, learning the behavior of each metric during the last 4 hours. In the previous release we introduced persisting these models to disk and loading them back when Netdata restarts.
In this release we change the default ML settings to maintain multiple trained models per metric, covering the behavior of each metric for the last 24 hours. All these models are now consulted automatically in order to decide if a data collection point is anomalous or not.
This has been implemented in a way that avoids introducing additional CPU overhead on Netdata Agents. Instead of training one model on 24 hours of data, which would introduce significant query overhead on the server, we train each metric every 3 hours using the last 6 hours of data, and we keep 9 models per metric. The most recent model is consulted first during anomaly detection. Additional models are consulted as long as the previous ones predict an anomaly. Only when all 9 models agree that a collected sample is anomalous do we mark it as anomalous in the database.
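The consultation order described above can be sketched in a few lines of Python. This is a simplified illustration, not Netdata's ML code: the models here are hypothetical threshold checks standing in for trained models, and the function names are made up for the example.

```python
from typing import Callable, Sequence

# A "model" here is any callable returning True when it considers the
# sample anomalous. Real models are trained; these are stand-ins.
Model = Callable[[float], bool]


def is_anomalous(sample: float, models_newest_first: Sequence[Model]) -> bool:
    """Return True only if every model flags the sample as anomalous.

    Models are consulted newest first, and evaluation stops at the first
    model that considers the sample normal, so older models are only
    consulted while all newer ones predict an anomaly.
    """
    if not models_newest_first:
        return False  # assumption of this sketch: no models, nothing flagged
    return all(model(sample) for model in models_newest_first)


def threshold_model(low: float, high: float) -> Model:
    """Hypothetical stand-in: flags values outside the range [low, high]."""
    return lambda value: not (low <= value <= high)


# Nine stand-in models, each having "seen" a slightly wider range.
MODELS = [threshold_model(0, 10 + i) for i in range(9)]
```

Because `all()` short-circuits, a sample that the newest model considers normal costs a single model evaluation, which is the common case.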
The impact of these changes is more accurate anomaly detection out of the box, with far fewer false positives.
You can read more about it in this deck presented during a recent office hours (office hours recording).
Rewritten SSL Support for the Agent
The SSL support in the Netdata Agent has been completely rewritten. The new code now reliably supports SSL connections for both the Netdata internal web server and streaming. It is also easier to understand, troubleshoot and expand. At the same time, performance has been improved by removing redundant checks.
During this process a long-standing bug on streaming connection timeouts has been identified and fixed, making streaming reliable and robust overall.
Alerts and Notifications
Mattermost notifications for Business Plan users
To keep expanding our set of alert notification methods, we added Mattermost as another notification integration option on Netdata Cloud. It provides another reliable way to deliver alerts to your team, helping ensure the continuity and reliability of your services.
Business Plan users can now configure Netdata Cloud to send alert notifications to their team on Mattermost.
Visualizations / Charts and Dashboards
Netdata Functions
This builds on the work done in release v1.38, where we introduced real-time functions that enable you to trigger specific routines to be executed by a given Agent on demand. Our initial function provided detailed information on currently running processes on the node, effectively replacing top and iotop.
We have now added the capability to group your results by specific attributes. For example, on the Processes function you are now able to group the results by: Category, Cmd or User. With this capability you can now get a consolidated view of your reported statistics over any of these attributes.
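As a rough illustration of what such a consolidated view means, the Python sketch below groups a process listing by one attribute and sums the numeric columns. The rows, column names and values are invented for the example and do not reflect the actual output format of the Processes function.

```python
from collections import defaultdict

# Hypothetical rows, loosely shaped like a process listing.
PROCESSES = [
    {"Cmd": "nginx", "User": "www", "Category": "web", "cpu_pct": 1.5, "mem_mib": 40},
    {"Cmd": "nginx", "User": "www", "Category": "web", "cpu_pct": 0.5, "mem_mib": 24},
    {"Cmd": "postgres", "User": "postgres", "Category": "sql", "cpu_pct": 3.0, "mem_mib": 256},
    {"Cmd": "sshd", "User": "root", "Category": "system", "cpu_pct": 0.25, "mem_mib": 8},
]


def group_by(rows, attribute, metrics=("cpu_pct", "mem_mib")):
    """Sum the given metrics over all rows sharing the same attribute value."""
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        bucket = totals[row[attribute]]
        for metric in metrics:
            bucket[metric] += row[metric]
    return dict(totals)


BY_CMD = group_by(PROCESSES, "Cmd")
BY_USER = group_by(PROCESSES, "User")
```

Switching the grouping attribute from `"Cmd"` to `"User"` or `"Category"` gives the other consolidated views without touching the underlying rows.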
External plugin integration
The Agent core now integrates better with external plugins. Under certain conditions, a failed plugin would not be correctly acknowledged by the Agent, resulting in a defunct (i.e. zombie) plugin process. This is now fixed.
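For readers unfamiliar with zombie processes: a child that has exited stays in the process table until its parent collects its exit status. The Python sketch below shows that general pattern only; it is not the Agent's code, and the helper name and the stand-in "plugin" command are assumptions made for the example.

```python
import subprocess
import sys
import time


def run_and_reap(argv, timeout=5.0, poll_interval=0.05):
    """Start a child process and collect its exit status so it cannot linger as a zombie."""
    child = subprocess.Popen(argv)
    deadline = time.monotonic() + timeout
    # poll() returns None while the child runs and reaps it once it has exited.
    while child.poll() is None:
        if time.monotonic() >= deadline:
            child.kill()
            child.wait()  # reap the child after killing it
            break
        time.sleep(poll_interval)
    return child.returncode


# Stand-in for a plugin that fails immediately with exit status 3.
EXIT_CODE = run_and_reap([sys.executable, "-c", "import sys; sys.exit(3)"])
```

If the parent never called `poll()` or `wait()`, the exited child would remain listed as defunct for as long as the parent keeps running.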
Preliminary steps to split native packages
Starting with this release, our official DEB/RPM packages have been split so that each external data collection plugin is in its own package instead of having everything bundled into a single package. We have previously had our CUPS and FreeIPMI collectors split out like this, but this change extends that to almost all of our external data collectors. This is the first step towards making these external collectors optional on installs that use our native packages, which will in turn allow users to avoid installing things they don’t actually need.
Short-term, these external collectors are listed as required dependencies to ensure that updates work correctly. At some point in the future almost all of them will be changed to be optional dependencies so that users can pick and choose which ones they want installed.
This change also includes a large number of fixes for minor issues in our native packages, including better handling of user accounts and file permissions and more prevalent usage of file capabilities to improve the security of our native packages.
Acknowledgements
We would like to thank our dedicated, talented contributors that make up this amazing community. The time and expertise that you volunteer are essential to our success. We thank you and look forward to continuing to grow together to build a remarkable product.
- @n0099 for fixing typos in the documentation.
- @mochaaP for fixing cross-compiling issues.
- @jmphilippe for making control address configurable in python.d/tor.
- @TougeAI for documenting the “age” configuration option in python.d/smartd_log.
- @EricAndrechek for adding support of python-oracledb to python.d/oracledb.
Contributions
Collectors
Improvements
- Add parent_table label to table/index metrics (go.d/postgres) (#1199, @ilyam8)
- Make tables and indexes limit configurable (go.d/postgres) (#1200, @ilyam8)
- Add Hyper-V metrics (go.d/windows) (#1164, @thiagoftsm)
- Add “maps per core” config option (ebpf.plugin) (#14691, @thiagoftsm)
- Add plugin that collects metrics from /sys/fs/debugfs (debugfs.plugin) (#15017, @thiagoftsm)
- Add support of python-oracledb (python.d/oracledb) (#15074, @EricAndrechek)
- Make control address configurable (python.d/tor) (#15041, @jmphilippe)
- Make connection protocol configurable (python.d/oracledb) (#15104, @ilyam8)
- Add availability status chart and alarm (freeipmi.plugin) (#15151, @ilyam8)
- Improve error messages when legacy code is not installed (ebpf.plugin) (#15146, @thiagoftsm)
Bug fixes
- Fix handling of newlines in HELP (go.d/prometheus) (#1196, @ilyam8)
- Fix collection of bind mounts (diskspace.plugin) (#14831, @MrZammler)
- Fix collection of zero metrics if Zswap is disabled (debugfs.plugin) (#15054, @ilyam8)
Other
- Document the “age” configuration option (python.d/smartd_log) (#15171, @TougeAI)
- Send EXIT before exiting in (freeipmi.plugin, debugfs.plugin) (#15140, @ilyam8)
Documentation
- Add Mattermost cloud integration docs (#15141, @car12o)
- Update Events and Silencing Rules docs (#15134, @hugovalente-pm)
- Fix a typo in simple patterns readme (#15135, @n0099)
- Add netdata demo rooms to the list of demo urls (#15120, @andrewm4894)
- Add initial draft for the silencing docs (#15112, @hugovalente-pm)
- Create category overview pages for Learn restructure (#15091, @Ancairon)
- Mention waive off of space subscription price (#15082, @hugovalente-pm)
- Update Security doc (#15072, @tkatsoulas)
- Update netdata-security.md (#15068, @cakrit)
- Fix wording in interact with charts doc (#15040, @Ancairon)
- Fix wording in the database readme (#15034, @Ancairon)
- Update troubleshooting-agent-with-cloud-connection.md (#15029, @cakrit)
- Update the billing docs for the flow (#15014, @hugovalente-pm)
- Update chart documentation (#15010, @Ancairon)
Packaging / Installation
- Fix package conflicts policy on deb based packages (#15170, @tkatsoulas)
- Fix user and group handling in DEB packages (#15166, @Ferroin)
- Change mandatory packages for RPMs (#15165, @tkatsoulas)
- Restrict ebpf dep in DEB package to amd64 only (#15161, @Ferroin)
- Make plugin packages hard dependencies (#15160, @Ferroin)
- Update libbpf to v1.2.0 (#15038, @thiagoftsm)
- Provide necessary permission for the kickstart to run the netdata-updater script (#15132, @tkatsoulas)
- Fix bundling of eBPF legacy code for DEB packages (#15127, @Ferroin)
- Fix package versioning issues (#15125, @Ferroin)
- Fix handling of eBPF plugin for DEB packages (#15117, @Ferroin)
- Improve some of the error messages in the kickstart script (#15061, @Ferroin)
- Split plugins to individual packages for DEB/RPM packaging (#13927, @Ferroin)
- Update agent telemetry url to be cloud function instead of posthog (#15085, @andrewm4894)
- Remove Fedora 36 from CI and platform support. (#14938, @Ferroin)
- Fix a fatal in the claiming script when the main action is not claiming (#15039, @ilyam8)
- Remove old logic for handling of legacy stock config files (#14829, @Ferroin)
- Make zlib compulsory dep (#14928, @underhood)
- Replace JudyLTablesGen with generated files (#14984, @mochaaP)
- Update SQLITE to version 3.41.2 (#15031, @stelfrag)
Streaming
Health
- Fix cockroachdb alarms (#15095, @ilyam8)
- Use chart labels to filter alarms (#14982, @MrZammler)
- Remove “families” from alarm configs (#15086, @ilyam8)
Exporting
- Add chart labels to Prometheus exporter (#15099, @thiagoftsm)
- Fix out-of-order labels in Prometheus exporter (#15094, @thiagoftsm)
- Fix out-of-order labels in Prometheus remote write exporter (#15097, @thiagoftsm)
ML
- Update ML defaults to 24h (#15093, @andrewm4894)
Other Notable Changes
Improvements
- Reduce netdatacli size (#15024, @stelfrag)
- Make percentage-of-group aggregatable at cloud (#15126, @ktsaou)
- Add percentage calculation on grouped queries to /api/v2/data (#15100, @ktsaou)
- Add status information and streaming stats to /api/v2/nodes (#15162, @ktsaou)
Bug fixes
- Fix the units when returning percentage of a group (#15105, @ktsaou)
- Fix uninitialized array vh in percentage-of-group (#15106, @ktsaou)
- Fix not respecting maximum message size limit of MQTT server (#15009, @underhood)
- Fix not freeing context when establishing an ACLK connection (#15073, @stelfrag)
- Fix sanitizing square brackets in label value (#15131, @ilyam8)
- Fix crash when UUID is NULL in SQLite (#15147, @stelfrag)
Code organization
- Add initial minimal h2o webserver integration (#14585, @underhood)
- Release buffer in case of error – CID 385075 (#15090, @stelfrag)
- Improve cleanup of health log table (#15045, @MrZammler)
- Simplify loop in alert checkpoint (#15065, @MrZammler)
- Only queue an alert to the cloud when it’s inserted (#15110, @MrZammler)
- Generate, store and transmit a unique alert event_hash_id (#15111, @MrZammler)
- Fix syntax in config.ac (#15139, @underhood)
- Add library to encode/decode Gorilla compressed buffers. (#15128, @vkalintiris)
- Fix coverity issues (#15169, @stelfrag)
- Fix CID 385073 – Uninitialized scalar variable (#15163, @stelfrag)
- Fix CodeQL warning (#15062, @stelfrag)
Deprecation notice
The following items will be removed in our next minor release (v1.41.0). Patch releases (if any) will not be affected.
Component | Type | Will be replaced by |
---|---|---|
python.d/nvidia_smi | collector | go.d/nvidia_smi |
family attribute | alert configuration and Health API | chart labels attribute (more details on netdata#15030) |
Cloud recommended version
When using Netdata Cloud, the Agent version required to get the most out of the latest features is one version before the latest stable.
With this release, this becomes v1.39.1, and you’ll be notified and guided to take action on the UI if you are running agents on lower versions.
Check here for details on how to Update Netdata agents.
Netdata Release Meetup
Join the Netdata team on the 19th of June at 16:00 UTC for the Netdata Release Meetup.
Together we’ll cover:
- Release Highlights.
- Acknowledgements.
- Q&A with the community.
RSVP now - we look forward to meeting you.
Support options
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
- Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
- GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
- GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
- Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
- Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 1400 engineers are already using it!
Running survey
Help us make Netdata even better! We are trying to gather valuable information that is key for us to better position Netdata and ensure we keep bringing more value to you.
We would appreciate it if you could take some time to answer this short survey (4 questions only).