Pulsar monitoring with Netdata

What is Pulsar?

Apache Pulsar is a distributed, real-time messaging system originally created at Yahoo and now part of the Apache Software Foundation. It was designed to provide high-throughput, low-latency messaging for applications that work with streaming data. It supports multiple messaging protocols, scales horizontally, is fault tolerant, and integrates easily with other open source technologies.

Monitoring Pulsar with Netdata

To monitor Pulsar with Netdata, you need both Pulsar and Netdata installed on your system.

Netdata auto-discovers hundreds of services. For those it doesn't, enabling manual discovery takes a one-line configuration change. For more information on configuring Netdata for Pulsar monitoring, please read the collector documentation.

You should now see the Pulsar section on the Overview tab in Netdata Cloud, already populated with charts for all the metrics you care about.

Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.

Which Pulsar metrics are important to monitor, and why?

These metrics are available globally, as well as per namespace and per topic.
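Pulsar brokers expose their metrics in the Prometheus text format, and a per-namespace view can be built by summing per-topic samples that share the same `namespace` label. The sketch below shows that aggregation. It assumes the metric name `pulsar_rate_in` and the `namespace` label, and it uses a deliberately simplified parser that does not handle escaped quotes or commas inside label values; it is an illustration, not how Netdata's collector is implemented.

```python
import re
from collections import defaultdict

# Matches sample lines such as: metric_name{label="value",...} 12.5
# Simplified sketch: assumes label values contain no escaped quotes or '}'.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_namespace(exposition: str, metric: str) -> dict[str, float]:
    """Sum one metric across topics, grouped by the 'namespace' label."""
    totals: dict[str, float] = defaultdict(float)
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE.match(line)
        if not m or m.group("name") != metric:
            continue  # not the metric we are aggregating
        labels = dict(_LABEL.findall(m.group("labels") or ""))
        totals[labels.get("namespace", "<none>")] += float(m.group("value"))
    return dict(totals)
```

Summing per topic keeps the parser independent of how many topics exist; the same function works for any per-topic gauge by changing the `metric` argument.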

Broker Components

Broker components are the building blocks of Apache Pulsar: namespaces, topics, subscriptions, producers, and consumers. Monitoring their counts helps confirm that the system is running as expected. A sudden surge in the number of components could indicate an underlying issue with the system or a potential attack.
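One way to turn "a sudden surge" into an alert is to compare current component counts with a baseline and flag components that both multiply and grow by a meaningful absolute amount. In this sketch the `factor` and `min_increase` thresholds are hypothetical values to tune for your own cluster, and the baseline is assumed to come from an earlier, known-good period.

```python
def detect_surges(
    baseline: dict[str, int],
    current: dict[str, int],
    factor: float = 2.0,
    min_increase: int = 10,
) -> list[str]:
    """Return component names whose count grew sharply versus the baseline.

    A component is flagged only if it grew by at least `factor` times its
    baseline AND by at least `min_increase` in absolute terms, so small
    counts (e.g. 1 -> 3 consumers) do not trigger false alarms.
    """
    surges = []
    for name, now in current.items():
        before = baseline.get(name, 0)  # unseen components start at zero
        if now - before >= min_increase and now >= factor * before:
            surges.append(name)
    return sorted(surges)
```

Requiring both a relative and an absolute increase is a design choice: relative growth alone is noisy on small counts, while absolute growth alone misses surges on small clusters.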

Messages Rate

The message rate measures the rate at which messages are published to and dispatched from the broker. It shows the throughput of the system in messages per second and helps confirm that messages are being processed quickly. An abnormally low message rate could be caused by a misconfigured system or a lack of resources.
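A message rate can be reported directly as a per-second gauge or derived from a cumulative message counter. The sketch below illustrates the second case: it computes messages per second from two samples of a counter. The timestamps (in seconds) and the reset handling are assumptions for illustration, not Netdata's implementation.

```python
def counter_rate(prev_value: float, prev_ts: float, cur_value: float, cur_ts: float) -> float:
    """Per-second rate between two samples of a monotonically increasing counter.

    If the counter went down (for example, after a broker restart), assume it
    restarted from zero and treat the current value as the increase since then.
    """
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = cur_value - prev_value
    if delta < 0:  # counter reset detected
        delta = cur_value
    return delta / elapsed
```

For example, a counter that moves from 1,000 to 1,600 messages over 60 seconds gives 10 messages per second.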

Throughput Rate

The throughput rate measures the volume of data per second flowing into and out of the system. It reflects the overall performance of the system and helps reveal potential bottlenecks. An abnormally low throughput rate could be caused by an inadequate network connection or a lack of resources.
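To check whether the network could be the bottleneck, compare throughput with the capacity of the broker's network link. This sketch assumes throughput in bytes per second, link capacity in bits per second, and a full-duplex link where each direction gets the full capacity; the link speed in the example is hypothetical.

```python
def link_utilization(bytes_in_per_s: float, bytes_out_per_s: float, link_bits_per_s: float) -> float:
    """Fraction of a full-duplex link used by the busier direction.

    Values near or above 1.0 suggest the network is saturated.
    """
    if link_bits_per_s <= 0:
        raise ValueError("link capacity must be positive")
    busiest_bits = max(bytes_in_per_s, bytes_out_per_s) * 8  # bytes -> bits
    return busiest_bits / link_bits_per_s
```

A broker sending 100 MB/s on a 1 Gbit/s link is at 80% utilization in the outbound direction, which leaves little headroom for traffic spikes.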

Storage Size

Storage size measures the amount of storage used by Apache Pulsar. It shows the overall usage of the system and helps ensure that it does not run out of disk space. An abnormally high storage size could be caused by inefficient storage utilization, such as retention settings that keep data longer than needed, or by a growing backlog of unacknowledged messages that cannot yet be deleted.
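Storage size becomes more actionable when you estimate how long remains before storage fills up. The sketch below fits a straight line to (day, bytes used) samples by least squares and extrapolates to a capacity you supply. It assumes roughly linear growth, and the sample values and capacity are hypothetical.

```python
from typing import Optional


def days_until_full(samples: list[tuple[float, float]], capacity_bytes: float) -> Optional[float]:
    """Estimate days until storage reaches capacity from (day, bytes_used) samples.

    Fits bytes_used = intercept + slope * day by least squares. Returns None
    if usage is flat or shrinking; a negative result means the fitted line is
    already past capacity.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples need distinct timestamps")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx  # bytes per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    latest_day = max(x for x, _ in samples)
    used_now = intercept + slope * latest_day  # fitted usage at the latest sample
    return (capacity_bytes - used_now) / slope
```

Using the fitted value rather than the raw latest sample smooths out short-term noise, such as a burst of writes just before the last measurement.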

Storage Operations Rate

Storage operations rate measures the rate at which messages are read from and written to storage. It shows how the storage layer is performing and helps confirm that messages are being processed quickly. An abnormally low storage operations rate could be caused by an inefficient storage system or a lack of resources.

Message Backlog

Message backlog measures the number of messages that are waiting to be consumed and acknowledged. It is important for understanding the overall performance of the system and ensuring that messages are being processed quickly. An abnormally high message backlog could be caused by slow or inefficient message processing on the consumer side, or by a lack of resources.
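A backlog number alone does not say whether consumers are catching up. Combining it with the publish and dispatch rates gives a rough time to drain. This sketch assumes both rates stay constant and treats the dispatch rate as an approximation of the rate at which the backlog shrinks; all inputs are hypothetical.

```python
import math


def backlog_drain_seconds(backlog: float, rate_in: float, rate_out: float) -> float:
    """Seconds to clear a backlog if publish and dispatch rates stay constant.

    Returns math.inf when consumers are not outpacing producers, meaning the
    backlog will not shrink at current rates.
    """
    if backlog <= 0:
        return 0.0
    net_drain = rate_out - rate_in  # messages per second removed from the backlog
    if net_drain <= 0:
        return math.inf
    return backlog / net_drain
```

For example, a backlog of 12,000 messages with 100 msg/s published and 150 msg/s dispatched drains in about 240 seconds, while equal rates mean it never drains.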

Storage Write Latency

Storage write latency measures how long it takes to write messages to storage. Because producers wait for writes to persistent topics to complete, high write latency slows down publishing, so this metric is important for understanding the overall performance of the system. An abnormally high storage write latency could be caused by a poorly configured storage system or a lack of resources.
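Write latency is commonly reported as a histogram: a count of writes per latency bucket rather than a single number. The sketch below finds which bucket contains a given percentile, which gives an upper bound on, say, the 99th-percentile latency. The bucket bounds in milliseconds are an assumption modeled on Pulsar's `pulsar_storage_write_latency_le_*` series, and the counts are assumed to be per bucket (not cumulative).

```python
def percentile_upper_bound(bucket_counts: list[tuple[float, float]], pct: float) -> float:
    """Return the upper bound of the bucket containing the pct-th percentile.

    bucket_counts: (upper_bound_ms, count) pairs sorted by bound, where each
    count covers only its own bucket. Use float("inf") for the overflow bucket.
    """
    total = sum(count for _, count in bucket_counts)
    if total == 0:
        raise ValueError("histogram is empty")
    target = total * pct / 100.0
    running = 0.0
    for upper, count in bucket_counts:
        running += count  # cumulative count up to this bucket
        if running >= target:
            return upper
    return bucket_counts[-1][0]  # guard against floating-point rounding
```

With buckets of 0.5, 1, 5, 10, 20, 50, 100, 200, and 1000 ms plus overflow, the result reads as "99% of writes finished within X ms", which is usually more useful for alerting than an average.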

Entry Size

Entry size measures the size of the entries written to storage, where an entry holds a single message or a batch of messages. This metric helps you understand how the system is used and ensure that entries are not too large for it. Abnormally large entries could be caused by producers sending unusually large payloads or batches.
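Entry sizes are typically reported as a histogram of size buckets, so "abnormally large" can be expressed as the share of entries above a size threshold. This sketch assumes per-bucket (non-cumulative) counts with upper bounds in bytes, counts only buckets that lie entirely above the threshold, and uses hypothetical bucket bounds and counts.

```python
def fraction_above(bucket_counts: list[tuple[float, float]], threshold: float) -> float:
    """Fraction of entries in buckets lying entirely above `threshold` bytes.

    bucket_counts: (upper_bound_bytes, count) pairs sorted by bound, with
    per-bucket counts. A bucket lies above the threshold when its lower bound
    (the previous bucket's upper bound) is >= threshold.
    """
    total = sum(count for _, count in bucket_counts)
    if total == 0:
        return 0.0
    above = 0.0
    lower = 0.0  # the first bucket starts at zero bytes
    for upper, count in bucket_counts:
        if lower >= threshold:
            above += count
        lower = upper
    return above / total
```

Because a bucket straddling the threshold is excluded, the result is a lower bound on the true fraction; pick thresholds that match bucket edges to make it exact.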

Subscription Delayed

Subscription delayed measures the number of messages whose dispatch to a subscription is being delayed. This metric is important for understanding the performance of the system and ensuring that messages are being processed quickly. If this number is abnormally high, it could be caused by an inefficient consumer or a lack of resources.

Subscription Message Rate Redeliver

Subscription message rate redeliver measures the rate at which messages are redelivered to consumers. This metric is useful for understanding the performance of the system and ensuring that messages are being processed quickly. An abnormally high redelivery rate could be caused by consumers that fail to acknowledge messages in time, for example because of processing errors or an acknowledgment timeout that is too short, or by a lack of resources.
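A raw redelivery rate is hard to judge on its own, because busy subscriptions naturally redeliver more. Dividing it by the subscription's dispatch rate normalizes it. In this sketch the subscription names, rates, and the 5% threshold are hypothetical.

```python
def redelivery_ratio(redeliver_rate: float, dispatch_rate: float) -> float:
    """Redelivery rate divided by dispatch rate; 0.0 for an idle subscription."""
    if dispatch_rate <= 0:
        return 0.0
    return redeliver_rate / dispatch_rate


def flag_subscriptions(rates: dict[str, tuple[float, float]], threshold: float = 0.05) -> list[str]:
    """Names of subscriptions whose redelivery ratio exceeds `threshold`.

    rates maps subscription name -> (redeliver_rate, dispatch_rate), both in msg/s.
    """
    return sorted(
        name
        for name, (redeliver, dispatch) in rates.items()
        if redelivery_ratio(redeliver, dispatch) > threshold
    )
```

Treating an idle subscription as 0.0 avoids division by zero and keeps quiet subscriptions out of the alert list.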

Subscription Blocked on Unacked Messages

Subscription blocked on unacked messages measures the number of subscriptions whose message dispatch is blocked because they have too many unacknowledged messages. This metric is important for understanding the performance of the system and ensuring that messages are being processed quickly. If this number is abnormally high, it could be caused by an inefficient consumer or a lack of resources.
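The broker blocks dispatch to a subscription once its unacknowledged messages reach a configured limit, so it helps to warn before that point. This sketch classifies subscriptions by how close they are to a limit you pass in. The limit of 200,000 used in the test and the 80% warning level are assumptions; check `maxUnackedMessagesPerSubscription` in your broker configuration for the actual value.

```python
def unacked_headroom(
    unacked_by_sub: dict[str, int],
    limit: int,
    warn_fraction: float = 0.8,
) -> dict[str, str]:
    """Classify each subscription by how close its unacked count is to `limit`.

    Returns "blocked" at or above the limit, "warning" at or above
    warn_fraction * limit, and "ok" otherwise.
    """
    status = {}
    for sub, unacked in unacked_by_sub.items():
        if unacked >= limit:
            status[sub] = "blocked"
        elif unacked >= warn_fraction * limit:
            status[sub] = "warning"
        else:
            status[sub] = "ok"
    return status
```

A "warning" state gives operators time to scale consumers or fix acknowledgment bugs before the subscription actually stops receiving messages.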

Replication Rate

Replication rate measures the rate at which messages are replicated between this cluster and remote clusters (geo-replication). This metric is important for understanding the overall performance of the system and ensuring that messages are being replicated quickly. An abnormally low replication rate could be caused by an inadequate network connection between clusters or a lack of resources.

Replication Throughput Rate

Replication throughput rate measures the volume of data per second transferred between this cluster and remote clusters for replication. This metric is useful for understanding the performance of replication and ensuring that data is being replicated quickly. An abnormally low replication throughput rate could be caused by an inadequate network connection between clusters or a lack of resources.

Replication Backlog

Replication backlog measures the number of messages that are waiting to be replicated to remote clusters. This metric is important for understanding the overall performance of the system and ensuring that messages are being replicated quickly. An abnormally high replication backlog could be caused by an inefficient replication setup or a lack of resources.
