Kafka LeaderElectionRateAndTimeMs spiking: election storms and slow elections

LeaderElectionRateAndTimeMs climbing outside maintenance windows means the controller is struggling to keep the partition map consistent. Producers and consumers may see NOT_LEADER_FOR_PARTITION (reported as NOT_LEADER_OR_FOLLOWER by newer releases) or request timeouts. Partitions can remain under-replicated or offline while the controller works through a backlog.

This metric has two dimensions. The rate is how often leadership changes. The time is how long the metadata store takes to commit each change. High rate with low time means broker flapping or repeated administrative actions. Normal rate with high time means the controller event queue is backed up, ZooKeeper writes are slow, or the KRaft quorum is lagging. Determine which mode you are in before the cluster degrades further.
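
The rate-versus-time split above can be sketched as a small shell helper. This is an illustration only: the function name, the RATE_MAX and P99_MAX_MS variables, and the 0.1 elections-per-second cutoff are assumptions, and the 1000 ms cutoff simply mirrors the p99 warning level used later in this guide.

```shell
# classify_election_mode RATE P99_MS
#   RATE   : elections per second (for example OneMinuteRate)
#   P99_MS : election time in milliseconds (for example 99thPercentile)
# Thresholds are illustrative assumptions, not Kafka defaults.
classify_election_mode() {
    awk -v rate="$1" -v p99="$2" \
        -v rate_max="${RATE_MAX:-0.1}" -v p99_max="${P99_MAX_MS:-1000}" '
    BEGIN {
        high_rate = (rate + 0 > rate_max + 0)
        high_time = (p99 + 0 > p99_max + 0)
        if (high_rate && high_time) print "storm-and-slow"
        else if (high_rate)         print "election-storm"
        else if (high_time)         print "slow-elections"
        else                        print "healthy"
    }'
}

classify_election_mode 2.5 40      # high rate, fast commits  -> election-storm
classify_election_mode 0.01 3500   # normal rate, slow commits -> slow-elections
```

Tune both cutoffs against your own baseline before relying on the labels.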

In KRaft mode prior to Kafka 3.9.0, kafka.controller:type=ControllerStats MBeans are not available, so LeaderElectionRateAndTimeMs is absent from JMX. If you are running KRaft on an older version, upgrade to observe this signal. ZooKeeper mode was removed in Kafka 4.0.

What this means

kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs exposes a rate (elections per second) and a latency histogram (milliseconds per election). During healthy steady state, both should be near zero.

Bursts are expected during a broker restart, partition reassignment, or when auto.leader.rebalance.enable triggers preferred replica elections. In these cases, the rate spikes briefly and returns to zero within roughly one to two times replica.lag.time.max.ms.
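
As a worked example of the settle window, the sketch below computes the one-to-two-times range from replica.lag.time.max.ms and flags a burst that has outlasted it. The function names are hypothetical, and 30000 ms is the default in Kafka 2.5 and later; substitute the value from your broker configuration.

```shell
# settle_window_ms LAG_MS
#   Prints "LOW HIGH": 1x and 2x replica.lag.time.max.ms, in milliseconds.
settle_window_ms() {
    echo "$(( $1 )) $(( $1 * 2 ))"
}

# burst_overdue ELAPSED_MS LAG_MS
#   Succeeds (exit 0) when a burst has lasted longer than 2x the lag window.
burst_overdue() {
    [ "$1" -gt $(( $2 * 2 )) ]
}

settle_window_ms 30000   # prints: 30000 60000
if burst_overdue 90000 30000; then
    echo "burst has outlasted the expected window"
fi
```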

Sustained nonzero rate outside maintenance means brokers are repeatedly joining and leaving the cluster, or ISR changes are forcing continuous re-elections. This is an election storm. Slow election times, measured in hundreds of milliseconds to seconds, point to the controller event queue backing up, ZooKeeper write latency in ZooKeeper mode, or KRaft quorum communication delays in KRaft mode. If elections happen faster than the controller can commit them, ControllerEventQueueSize grows and partitions wait longer to come back online.

flowchart TD
    A[Election rate spikes] --> B{Burst or sustained?}
    B -->|Burst| C{Correlates with maintenance?}
    C -->|Yes| D[Expected behavior]
    C -->|No| E[Check ControllerEventQueueSize]
    B -->|Sustained| E
    E --> F{Growing?}
    F -->|Yes| G[Controller backlog. Check ZK or KRaft latency.]
    F -->|No| H[Check broker uptimes and logs]
    H --> I{Uptime resetting?}
    I -->|Yes| J[Broker flapping. Remove sick node.]
    I -->|No| K[Check for admin election scripts or Cruise Control.]
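
The same decision flow can be sketched as a shell function for use in a runbook script. The function name and the yes/no argument convention are assumptions. The sketch also routes a burst that does not correlate with maintenance down the sustained path, which is an interpretation rather than something the flowchart states.

```shell
# triage_election_spike PATTERN MAINTENANCE QUEUE_GROWING UPTIME_RESETTING
#   PATTERN is "burst" or "sustained"; the other arguments are "yes" or "no".
triage_election_spike() {
    pattern="$1"; maintenance="$2"; queue_growing="$3"; uptime_resetting="$4"
    if [ "$pattern" = "burst" ] && [ "$maintenance" = "yes" ]; then
        echo "expected behavior"
    elif [ "$queue_growing" = "yes" ]; then
        echo "controller backlog: check ZK or KRaft latency"
    elif [ "$uptime_resetting" = "yes" ]; then
        echo "broker flapping: remove sick node"
    else
        echo "check for admin election scripts or Cruise Control"
    fi
}

triage_election_spike sustained no no yes   # prints: broker flapping: remove sick node
```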

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Broker flapping | Sustained election rate, broker process restarting, waves of URP | Broker logs for repeated shutdowns, systemd status, or process uptime |
| Controller event queue backup | Slow election times, growing ControllerEventQueueSize, offline partitions lagging behind reality | ControllerEventQueueSize on the active controller |
| ZooKeeper or KRaft metadata latency | Elections taking seconds, ZK p99 latency elevated or KRaft commit latency high | ZK request latency (ZK mode) or kafka.server:type=raft-metrics (KRaft) |
| Slow follower spurious elections (KRaft) | Elections without broker failure, follower fetch timeouts | FetchFollower latency and controller.quorum.fetch.timeout.ms |
| Admin-triggered election storm | Spike correlates with kafka-leader-election.sh or Cruise Control rebalance | Controller logs for election trigger source |

Quick checks

Run these read-only checks from a host with JMX access.

# Check election rate and 99th percentile latency
echo "get -b kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs OneMinuteRate" | java -jar jmxterm.jar -l localhost:9999
echo "get -b kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs 99thPercentile" | java -jar jmxterm.jar -l localhost:9999

# Check controller event queue depth
echo "get -b kafka.controller:type=ControllerEventManager,name=EventQueueSize Value" | java -jar jmxterm.jar -l localhost:9999

# Verify active controller identity
echo "get -b kafka.controller:type=KafkaController,name=ActiveControllerCount Value" | java -jar jmxterm.jar -l localhost:9999

# Find under-replicated partitions cluster-wide
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions

# Check for offline partitions
kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions

# Check ZooKeeper request latency (ZK mode only)
echo "get -b kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs 99thPercentile" | java -jar jmxterm.jar -l localhost:9999

# Check KRaft quorum status (KRaft mode only)
# Note: verify the connectivity flag (--bootstrap-server or --bootstrap-controller) for your Kafka version
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
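
For scripting, the jmxterm queries above can be wrapped so that only the bare value is returned. This is a sketch under two assumptions: that jmxterm prints attributes as `Name = value;` on standard output, and that its banner text goes to standard error. Check both against your jmxterm version. The helper names jmx_value and jmx_get are hypothetical.

```shell
# jmx_value: read jmxterm "get" output on stdin and print the bare value.
# Assumes lines of the form "AttributeName = value;".
jmx_value() {
    sed -n 's/^[[:space:]]*[A-Za-z0-9_]* = \(.*\);[[:space:]]*$/\1/p' | head -n 1
}

# jmx_get BEAN ATTRIBUTE [HOST:PORT]
#   Mirrors the invocation used in the checks above.
jmx_get() {
    echo "get -b $1 $2" \
        | java -jar jmxterm.jar -l "${3:-localhost:9999}" 2>/dev/null \
        | jmx_value
}

# Parsing demo on a canned line (no broker needed):
printf 'OneMinuteRate = 0.25;\n' | jmx_value   # prints: 0.25
```

A call such as `jmx_get kafka.controller:type=ControllerEventManager,name=EventQueueSize Value` would then return just the queue depth, which is easier to feed into thresholds or a sampling loop.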

How to diagnose it

  1. Expected burst or incident. During rolling restarts or kafka-reassign-partitions.sh, bursts are normal. Monitor until the rate returns to zero. If it does not resolve within one to two times replica.lag.time.max.ms, treat it as an incident.
  2. Isolate rate from time. High rate with fast times suggests brokers are flapping or an automation tool is triggering elections repeatedly. Normal rate with slow times suggests the controller or metadata store is the bottleneck.
  3. Broker flapping. Inspect broker logs for repeated shutdown sequences, JVM crash files, or zookeeper.session.timeout.ms expirations. Look for correlating spikes in UnderReplicatedPartitions that align with the same broker rejoining. If a broker’s uptime is resetting, isolate it.
  4. Controller event queue. If ControllerEventQueueSize is consistently above 100 or growing without bound, the controller is backlogged. In ZooKeeper mode, check ZK request latency. In KRaft mode, check kafka-metadata-quorum.sh describe --status for voter lag and commit latency. Do not restart additional brokers while the queue is growing; this generates more events.
  5. Partition availability. If LeaderElectionRateAndTimeMs is elevated but OfflinePartitionsCount is not dropping, the controller cannot process elections fast enough to restore availability. This condition is page-worthy.
  6. Administrative triggers. Look in controller logs for kafka-leader-election.sh or Cruise Control rebalance operations. KAFKA-14667 can cause preferred leadership elections triggered by Cruise Control to get stuck in purgatory indefinitely and time out.
  7. KRaft log noise versus real unclean elections. KAFKA-19148 (versions 3.9.x through 4.0.x) causes the controller to log “UNCLEAN partition change” for clean elections. This is a false positive. Verify actual data safety via UncleanLeaderElectionsPerSec and OfflinePartitionsCount rather than log lines alone.
  8. KAFKA-20554 and acks=1 durability. In versions 3.9 through 4.1.2, acks=1 producers can lose acknowledged records during planned leader transitions across both KRaft and ZooKeeper modes. acks=all is unaffected.
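
Step 4 depends on telling a growing queue from a draining one. A minimal sketch, assuming you have already collected a few ControllerEventQueueSize samples at a fixed interval (for example with the JMX query from the quick checks); the function name and output labels are hypothetical, and the 100 cutoff is the one used in step 4.

```shell
# queue_trend SAMPLE1 SAMPLE2 ...
#   Compares successive queue-depth samples, oldest first.
#   Prints growing, draining, flat, mixed, or unknown, plus "above-100"
#   when the most recent sample exceeds 100.
queue_trend() {
    printf '%s\n' "$@" | awk '
    NR > 1 { if ($1 + 0 > prev) up++; else if ($1 + 0 < prev) down++ }
    { prev = $1 + 0; last = prev }
    END {
        if (NR < 2)                    trend = "unknown"
        else if (up > 0 && down == 0)  trend = "growing"
        else if (down > 0 && up == 0)  trend = "draining"
        else if (up == 0 && down == 0) trend = "flat"
        else                           trend = "mixed"
        printf "%s%s\n", trend, (last > 100 ? " above-100" : "")
    }'
}

queue_trend 10 40 90 150   # prints: growing above-100
queue_trend 150 90 40      # prints: draining
```

A "draining" result supports waiting; "growing above-100" is the case where restarting more brokers would make things worse.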

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| LeaderElectionRateAndTimeMs (OneMinuteRate) | Direct measure of election velocity | Sustained > 0 outside maintenance windows |
| LeaderElectionRateAndTimeMs (99thPercentile) | Exposes slow controller metadata commits | p99 > 1000 ms consistently |
| ControllerEventQueueSize | Backed-up queue serializes all controller work | > 100 sustained, or growing without bound |
| ActiveControllerCount | Exactly one controller must exist | Cluster-wide sum != 1 for > 30 seconds |
| UnderReplicatedPartitions | Followers falling behind during leadership churn | Growing across multiple brokers |
| OfflinePartitionsCount | Confirmed unavailability | Nonzero and growing while elections are slow |
| ZK Request Latency (ZK mode) | Metadata store round-trips dominate election time | p99 > 100 ms sustained |
| KRaft quorum health (KRaft) | Raft log append latency replaces ZK | commit-latency-avg > 100 ms or current-leader = -1 |
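
The numeric warning levels in the table can be sketched as a point-in-time check. The short signal names are hypothetical labels chosen for this example. The sketch does not model the "sustained" or "for > 30 seconds" qualifiers, so a real alert needs a duration condition on top.

```shell
# check_signal NAME VALUE
#   Prints "WARN name=value" or "OK name=value" using the table's cutoffs.
check_signal() {
    awk -v name="$1" -v v="$2" 'BEGIN {
        v += 0
        if      (name == "election_rate")         bad = (v > 0)
        else if (name == "election_p99_ms")       bad = (v > 1000)
        else if (name == "event_queue_size")      bad = (v > 100)
        else if (name == "active_controller_sum") bad = (v != 1)
        else if (name == "offline_partitions")    bad = (v > 0)
        else if (name == "zk_p99_ms")             bad = (v > 100)
        else { print "UNKNOWN " name; exit 2 }
        status = bad ? "WARN" : "OK"
        print status " " name "=" v
    }'
}

check_signal election_p99_ms 1500        # prints: WARN election_p99_ms=1500
check_signal active_controller_sum 1     # prints: OK active_controller_sum=1
```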

Fixes

Broker flapping

Identify the flapping broker via logs and uptime. If the root cause is a failing disk, network partition, or GC-induced session timeout, stop restarting the broker in place. Perform a controlled shutdown to trigger clean leader elections for its partitions, then repair or replace the hardware before rejoining. A sick broker that repeatedly rejoins generates controller events faster than they can be processed.
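
One way to confirm flapping from uptime alone is to count how often a broker's process uptime goes backwards across samples. A minimal sketch, assuming uptime samples in seconds gathered oldest first (on Linux, something like `ps -o etimes= -p <pid>` is one possible source); the function name is hypothetical.

```shell
# count_uptime_resets UPTIME1 UPTIME2 ...
#   Each drop in uptime between consecutive samples counts as one restart.
count_uptime_resets() {
    printf '%s\n' "$@" | awk '
    NR > 1 && $1 + 0 < prev { resets++ }
    { prev = $1 + 0 }
    END { print resets + 0 }'
}

count_uptime_resets 3600 3660 45 105 20   # prints: 2
```

More than one reset in a short sampling window is a reasonable cue to stop restarting the broker in place and investigate the host.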

Controller queue backup

Do not restart brokers or trigger additional reassignments. Check the metadata store:

  • In ZooKeeper mode, ensure the ZK transaction log is on a dedicated low-latency disk. ZK latency above 100ms directly inflates election time.
  • In KRaft mode, check disk latency on the controller nodes. Raft is sensitive to fsync latency. Verify voter connectivity with kafka-metadata-quorum.sh.

If the queue is draining, even slowly, allow it to complete. If it is growing, the controller may be in GC distress or the metadata store may be unresponsive. The controller event queue is single-threaded and cannot be scaled horizontally; the only levers are reducing the partition count or improving metadata store latency.

Slow elections from metadata store latency

For ZooKeeper, co-locating ZK with other workloads or using slow disks for the transaction log is a common cause. Move ZK to dedicated nodes with SSD-backed transaction logs.

For KRaft, increase controller.quorum.election.timeout.ms only to tolerate slow bootstrap in containerized environments where pods start sequentially. Do not increase this as a fix for runtime latency; runtime slowness is a disk or network problem.

Spurious KRaft elections from slow followers

If followers do not fetch within controller.quorum.fetch.timeout.ms, KRaft may trigger unnecessary new elections. KAFKA-15489 documents this behavior. Raising the fetch timeout to 1.5x its default value can reduce spurious elections without masking real follower failure.
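
As a worked example of the 1.5x adjustment: the sketch below assumes the documented default of 2000 ms for controller.quorum.fetch.timeout.ms, which you should confirm for your Kafka version before applying the result.

```shell
# Assumed default for controller.quorum.fetch.timeout.ms; verify for your version.
default_fetch_timeout_ms=2000

# 1.5x the default, using integer arithmetic.
raised_fetch_timeout_ms=$(( default_fetch_timeout_ms * 3 / 2 ))

# Line to place in the controller and broker properties files.
echo "controller.quorum.fetch.timeout.ms=${raised_fetch_timeout_ms}"
# prints: controller.quorum.fetch.timeout.ms=3000
```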

Manual election tools timing out

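KAFKA-16015 causes kafka-leader-election.sh to ignore client-supplied timeout settings such as request.timeout.ms and fall back to defaults of roughly 15 seconds. If the tool times out against a stressed cluster, pass --admin.config pointing to a properties file with increased timeout values. On a release affected by KAFKA-16015 those values are overridden, so confirm that your version includes the fix before relying on them.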
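
A minimal sketch of such a properties file and the matching invocation follows. The timeout values are illustrative, not recommendations, and the function name is hypothetical. The function is defined but not called here, because it only makes sense against a live cluster.

```shell
# Write AdminClient overrides to a temporary properties file.
admin_config="$(mktemp)"
cat > "$admin_config" <<'EOF'
# Illustrative values; size them to your cluster.
request.timeout.ms=60000
default.api.timeout.ms=120000
EOF

# run_preferred_election [BOOTSTRAP]
#   Triggers a preferred election for all partitions using the overrides.
run_preferred_election() {
    kafka-leader-election.sh --bootstrap-server "${1:-localhost:9092}" \
        --admin.config "$admin_config" \
        --election-type preferred --all-topic-partitions
}

echo "admin overrides written to $admin_config"
```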

Prevention

  • Monitor LeaderElectionRateAndTimeMs and ControllerEventQueueSize as a pair. Queue growth is the leading indicator for election time spikes.
  • Keep partition counts per broker within tested limits. Controller recovery time scales linearly with partition count, and the event queue is single-threaded.
  • Run game-day failovers. Know how long your cluster takes to process leader elections for all partitions when a broker dies.
  • In KRaft mode, ensure controller nodes have disk read-write latency under 10ms. Raft throughput collapses when fsync latency spikes.
  • Avoid manual preferred replica elections during peak traffic. If you use Cruise Control, watch for KAFKA-14667, where preferred leadership elections can get stuck in purgatory indefinitely.
  • If you are on Kafka versions 3.9 through 4.1.2, be aware that KAFKA-20554 creates a narrow window where acks=1 writes can be lost during clean leader transitions. Use acks=all with min.insync.replicas=2 for topics where durability matters.
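
The durability settings from the last bullet can be applied roughly as follows. This is a sketch: the file location and function name are assumptions, and the topic-level change is wrapped in a function that is not invoked here because it alters a live cluster.

```shell
# Producer-side setting: require acknowledgement from all in-sync replicas.
producer_config="$(mktemp)"
cat > "$producer_config" <<'EOF'
acks=all
EOF

# set_min_isr TOPIC [BOOTSTRAP]
#   Sets min.insync.replicas=2 on one topic via kafka-configs.sh.
set_min_isr() {
    kafka-configs.sh --bootstrap-server "${2:-localhost:9092}" \
        --alter --entity-type topics --entity-name "$1" \
        --add-config min.insync.replicas=2
}

echo "producer settings written to $producer_config"
```

With acks=all and min.insync.replicas=2, a write is acknowledged only after at least two replicas have it, which is why the topic needs a replication factor of at least 3 to tolerate one broker being down.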

How Netdata helps

Netdata collects LeaderElectionRateAndTimeMs, ControllerEventQueueSize, and ActiveControllerCount by default. Composite charts overlay election time with UnderReplicatedPartitions and OfflinePartitionsCount to distinguish maintenance from stabilization failure. Alerts on the 99th percentile election time catch slow controllers before offline partitions accumulate. Per-broker process uptime identifies flapping brokers without JMX queries.