Kafka LeaderElectionRateAndTimeMs spiking: election storms and slow elections
LeaderElectionRateAndTimeMs climbing outside maintenance windows means the controller is struggling to keep the partition map consistent. Producers and consumers may see NOT_LEADER_FOR_PARTITION or request timeouts. Partitions can hang under-replicated or offline while the controller works through a backlog.
This metric has two dimensions. The rate is how often leadership changes. The time is how long the metadata store takes to commit each change. High rate with low time means broker flapping or repeated administrative actions. Normal rate with high time means the controller event queue is backed up, ZooKeeper writes are slow, or the KRaft quorum is lagging. Determine which mode you are in before the cluster degrades further.
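As a rough sketch of the triage above, the two dimensions can be turned into a single label. The function name, the labels, and the rate threshold of 0.01 elections per second are illustrative assumptions, not Kafka defaults; the 1000 ms latency threshold mirrors the p99 warning sign used later in this guide.

```shell
#!/usr/bin/env bash
# classify_election_mode RATE P99_MS [RATE_THRESHOLD] [P99_THRESHOLD_MS]
# RATE is elections per second (e.g. OneMinuteRate), P99_MS is the
# 99thPercentile attribute. Both thresholds are assumptions to tune.
classify_election_mode() {
  local rate="$1" p99_ms="$2"
  local rate_thr="${3:-0.01}" p99_thr="${4:-1000}"
  awk -v r="$rate" -v t="$p99_ms" -v rt="$rate_thr" -v tt="$p99_thr" 'BEGIN {
    high_rate = (r + 0 > rt + 0)   # leadership is changing often
    slow      = (t + 0 > tt + 0)   # each change is slow to commit
    if (high_rate && slow) print "storm-and-slow"
    else if (high_rate)    print "election-storm"
    else if (slow)         print "slow-elections"
    else                   print "steady"
  }'
}
```

Feeding it the two values read from JMX, for example `classify_election_mode 0.5 20`, gives a starting hypothesis rather than a diagnosis: a high rate with fast elections points at flapping or automation, while slow elections point at the controller or metadata store.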
In KRaft mode prior to Kafka 3.9.0, kafka.controller:type=ControllerStats MBeans are not available, so LeaderElectionRateAndTimeMs is absent from JMX. If you are running KRaft on an older version, upgrade to observe this signal. ZooKeeper mode was removed in Kafka 4.0.
What this means
kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs exposes a rate (elections per second) and a latency histogram (milliseconds per election). During healthy steady state, both should be near zero.
Bursts are expected during a broker restart, partition reassignment, or when auto.leader.rebalance.enable triggers preferred replica elections. In these cases, the rate spikes briefly and returns to zero within roughly one to two times replica.lag.time.max.ms.
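The "one to two times replica.lag.time.max.ms" window can be made concrete with a little arithmetic. This sketch assumes the broker is running with the 30000 ms default for replica.lag.time.max.ms unless another value is passed; the function names are hypothetical helpers, not part of any Kafka tooling.

```shell
#!/usr/bin/env bash
# settle_window_ms [LAG_MS]
# Prints the lower and upper bound, in ms, of the period in which a
# maintenance burst is expected to settle (1x to 2x the lag setting).
settle_window_ms() {
  local lag_ms="${1:-30000}"   # assumed replica.lag.time.max.ms
  echo "$lag_ms $((lag_ms * 2))"
}

# burst_status ELAPSED_MS [LAG_MS]
# Labels a burst that has lasted ELAPSED_MS since the maintenance action.
burst_status() {
  local elapsed_ms="$1" lag_ms="${2:-30000}"
  if [ "$elapsed_ms" -le "$((lag_ms * 2))" ]; then
    echo "within-window"
  else
    echo "incident"
  fi
}
```

With the assumed default, `settle_window_ms` prints `30000 60000`, so a burst still running after about a minute is outside the expected window.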
Sustained nonzero rate outside maintenance means brokers are repeatedly joining and leaving the cluster, or ISR changes are forcing continuous re-elections. This is an election storm. Slow election times, measured in hundreds of milliseconds to seconds, point to the controller event queue backing up, ZooKeeper write latency in ZooKeeper mode, or KRaft quorum communication delays in KRaft mode. If elections happen faster than the controller can commit them, ControllerEventQueueSize grows and partitions wait longer to come back online.
flowchart TD
A[Election rate spikes] --> B{Burst or sustained?}
B -->|Burst| C[Correlates with maintenance?]
C -->|Yes| D[Expected behavior]
B -->|Sustained| E[Check ControllerEventQueueSize]
E --> F{Growing?}
F -->|Yes| G[Controller backlog. Check ZK or KRaft latency.]
F -->|No| H[Check broker uptimes and logs]
H --> I{Uptime resetting?}
I -->|Yes| J[Broker flapping. Remove sick node.]
I -->|No| K[Check for admin election scripts or Cruise Control.]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Broker flapping | Sustained election rate, broker process restarting, waves of URP | Broker logs for repeated shutdowns, systemd status, or process uptime |
| Controller event queue backup | Slow election times, growing ControllerEventQueueSize, offline partitions lagging behind reality | ControllerEventQueueSize on the active controller |
| ZooKeeper or KRaft metadata latency | Elections taking seconds, ZK p99 latency elevated or KRaft commit latency high | ZK request latency (ZK mode) or kafka.server:type=raft-metrics (KRaft) |
| Slow follower spurious elections (KRaft) | Elections without broker failure, follower fetch timeouts | FetchFollower latency and controller.quorum.fetch.timeout.ms |
| Admin-triggered election storm | Spike correlates with kafka-leader-election.sh or Cruise Control rebalance | Controller logs for election trigger source |
Quick checks
Run these read-only checks from a host with JMX access.
# Check election rate and 99th percentile latency
echo "get -b kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs OneMinuteRate" | java -jar jmxterm.jar -l localhost:9999
echo "get -b kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs 99thPercentile" | java -jar jmxterm.jar -l localhost:9999
# Check controller event queue depth
echo "get -b kafka.controller:type=ControllerEventManager,name=EventQueueSize Value" | java -jar jmxterm.jar -l localhost:9999
# Verify active controller identity
echo "get -b kafka.controller:type=KafkaController,name=ActiveControllerCount Value" | java -jar jmxterm.jar -l localhost:9999
# Find under-replicated partitions cluster-wide
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions
# Check for offline partitions
kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions
# Check ZooKeeper request latency (ZK mode only)
echo "get -b kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs 99thPercentile" | java -jar jmxterm.jar -l localhost:9999
# Check KRaft quorum status (KRaft mode only)
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
How to diagnose it
- Expected burst or incident. During rolling restarts or kafka-reassign-partitions.sh, bursts are normal. Monitor until the rate returns to zero. If it does not resolve within one to two times replica.lag.time.max.ms, treat it as an incident.
- Isolate rate from time. High rate with fast times suggests brokers are flapping or an automation tool is triggering elections repeatedly. Normal rate with slow times suggests the controller or metadata store is the bottleneck.
- Broker flapping. Inspect broker logs for repeated shutdown sequences, JVM crash files, or zookeeper.session.timeout.ms expirations. Look for correlating spikes in UnderReplicatedPartitions that align with the same broker rejoining. If a broker’s uptime is resetting, isolate it.
- Controller event queue. If ControllerEventQueueSize is consistently above 100 or growing without bound, the controller is backlogged. In ZooKeeper mode, check ZK request latency. In KRaft mode, check kafka-metadata-quorum.sh describe --status for voter lag and commit latency. Do not restart additional brokers while the queue is growing; this generates more events.
- Partition availability. If LeaderElectionRateAndTimeMs is elevated but OfflinePartitionsCount is not dropping, the controller cannot process elections fast enough to restore availability. This is a PAGE-worthy condition.
- Administrative triggers. Look in controller logs for kafka-leader-election.sh or Cruise Control rebalance operations. KAFKA-14667 can cause preferred leadership elections triggered by Cruise Control to get stuck in purgatory indefinitely and time out.
- KRaft log noise versus real unclean elections. KAFKA-19148 (versions 3.9.x through 4.0.x) causes the controller to log “UNCLEAN partition change” for clean elections. This is a false positive. Verify actual data safety via UncleanLeaderElectionsPerSec and OfflinePartitionsCount rather than log lines alone.
- KAFKA-20554 and acks=1 durability. In versions 3.9 through 4.1.2, acks=1 producers can lose acknowledged records during planned leader transitions across both KRaft and ZooKeeper modes. acks=all is unaffected.
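The controller event queue check can be sketched as a small helper that takes a series of queue-depth samples, oldest first. It is a minimal illustration, not a monitoring rule: the function name is hypothetical, "growing without bound" is approximated as strictly rising samples, and the limit of 100 is the threshold quoted in this guide.

```shell
#!/usr/bin/env bash
# queue_state SAMPLE [SAMPLE ...]
# Prints "backlogged" when every sample is above the limit, or when the
# samples rise strictly from one reading to the next; otherwise "ok".
# At least two samples are needed before anything is flagged.
queue_state() {
  local limit=100
  printf '%s\n' "$@" | awk -v limit="$limit" '
    NR == 1 { prev = $1; all_above = 1; rising = 1 }
            { if ($1 + 0 <= limit + 0) all_above = 0 }
    NR > 1  { if ($1 + 0 <= prev + 0) rising = 0; prev = $1 }
    END {
      if (NR >= 2 && (all_above || rising)) print "backlogged"
      else print "ok"
    }'
}
```

For example, `queue_state 10 40 90` reports `backlogged` because the depth keeps rising even though it is still under 100, while `queue_state 3 0 5 2` reports `ok`.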
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| LeaderElectionRateAndTimeMs (OneMinuteRate) | Direct measure of election velocity | Sustained >0 outside maintenance windows |
| LeaderElectionRateAndTimeMs (99thPercentile) | Exposes slow controller metadata commits | p99 > 1000ms consistently |
| ControllerEventQueueSize | Backed-up queue serializes all controller work | >100 sustained, or growing without bound |
| ActiveControllerCount | Exactly one controller must exist | Cluster-wide sum != 1 for >30 seconds |
| UnderReplicatedPartitions | Followers falling behind during leadership churn | Growing across multiple brokers |
| OfflinePartitionsCount | Confirmed unavailability | Nonzero and growing while elections are slow |
| ZK Request Latency (ZK mode) | Metadata store round-trips dominate election time | p99 > 100ms sustained |
| KRaft quorum health (KRaft) | Raft log append latency replaces ZK | commit-latency-avg > 100ms or current-leader = -1 |
Fixes
Broker flapping
Identify the flapping broker via logs and uptime. If the root cause is a failing disk, network partition, or GC-induced session timeout, stop restarting the broker in place. Perform a controlled shutdown to trigger clean leader elections for its partitions, then repair or replace the hardware before rejoining. A sick broker that repeatedly rejoins generates controller events faster than they can be processed.
Controller queue backup
Do not restart brokers or trigger additional reassignments. Check the metadata store:
- In ZooKeeper mode, ensure the ZK transaction log is on a dedicated low-latency disk. ZK latency above 100ms directly inflates election time.
- In KRaft mode, check disk latency on the controller nodes. Raft is sensitive to fsync latency. Verify voter connectivity with kafka-metadata-quorum.sh.
If the queue is draining, even slowly, allow it to complete. If it is growing, the controller may be in GC distress or the metadata store may be unresponsive. The controller event queue is single-threaded, so it cannot be scaled horizontally. The only levers are reducing the partition count or improving metadata store latency.
Slow elections from metadata store latency
For ZooKeeper, co-locating ZK with other workloads or using slow disks for the transaction log is a common cause. Move ZK to dedicated nodes with SSD-backed transaction logs.
For KRaft, increase controller.quorum.election.timeout.ms only to tolerate slow bootstrap in containerized environments where pods start sequentially. Do not increase this as a fix for runtime latency; runtime slowness is a disk or network problem.
Spurious KRaft elections from slow followers
If followers do not fetch within controller.quorum.fetch.timeout.ms, KRaft may trigger unnecessary new elections. KAFKA-15489 documents this behavior. Raising the fetch timeout to 1.5x its default value can reduce spurious elections without masking real follower failure.
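To make the 1.5x adjustment concrete, the sketch below computes the override line for the controller properties file. It assumes the stock default of 2000 ms for controller.quorum.fetch.timeout.ms; pass your current value if it differs. The function name is a hypothetical helper.

```shell
#!/usr/bin/env bash
# fetch_timeout_override [DEFAULT_MS]
# Prints a properties line that raises the KRaft fetch timeout to 1.5x
# the given baseline (assumed default: 2000 ms).
fetch_timeout_override() {
  local default_ms="${1:-2000}"
  echo "controller.quorum.fetch.timeout.ms=$((default_ms * 3 / 2))"
}

fetch_timeout_override
```

With the assumed default this prints `controller.quorum.fetch.timeout.ms=3000`. The setting has to be applied consistently across the quorum nodes, and it only takes effect after the affected processes are restarted.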
Manual election tools timing out
KAFKA-16015 means kafka-leader-election.sh overrides a client-supplied request.timeout.ms with its own default of roughly 15 seconds. If this tool times out against a stressed cluster, pass --admin.config pointing to a properties file with increased timeout values. This only helps on a release that includes the fix for KAFKA-16015; on affected releases the supplied values are still overwritten.
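A properties file for --admin.config might be generated as follows. This is a sketch under assumptions: the 60000 ms value and the function name are illustrative, and whether the tool honors these settings depends on how your release is affected by KAFKA-16015. request.timeout.ms and default.api.timeout.ms are standard admin client settings; the second is kept at twice the first so it is never the smaller of the two.

```shell
#!/usr/bin/env bash
# write_admin_config PATH [REQUEST_TIMEOUT_MS]
# Writes an admin client properties file with raised timeouts.
write_admin_config() {
  local path="$1" timeout_ms="${2:-60000}"
  cat > "$path" <<EOF
request.timeout.ms=${timeout_ms}
default.api.timeout.ms=$((timeout_ms * 2))
EOF
}
```

After `write_admin_config /tmp/election-admin.properties`, the file can be handed to the tool, for example: `kafka-leader-election.sh --bootstrap-server localhost:9092 --admin.config /tmp/election-admin.properties --election-type preferred --all-topic-partitions`. The path is a placeholder.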
Prevention
- Monitor LeaderElectionRateAndTimeMs and ControllerEventQueueSize as a pair. Queue growth is the leading indicator for election time spikes.
- Keep partition counts per broker within tested limits. Controller recovery time scales linearly with partition count, and the event queue is single-threaded.
- Run game-day failovers. Know how long your cluster takes to process leader elections for all partitions when a broker dies.
- In KRaft mode, ensure controller nodes have disk read-write latency under 10ms. Raft throughput collapses when fsync latency spikes.
- Avoid manual preferred replica elections during peak traffic. If you use Cruise Control, watch for KAFKA-14667, where preferred leadership elections can get stuck in purgatory indefinitely.
- If you are on Kafka versions 3.9 through 4.1.2, be aware that KAFKA-20554 creates a narrow window where acks=1 writes can be lost during clean leader transitions. Use acks=all with min.insync.replicas=2 for topics where durability matters.
How Netdata helps
Netdata collects LeaderElectionRateAndTimeMs, ControllerEventQueueSize, and ActiveControllerCount by default. Composite charts overlay election time with UnderReplicatedPartitions and OfflinePartitionsCount to distinguish maintenance from stabilization failure. Alerts on the 99th percentile election time catch slow controllers before offline partitions accumulate. Per-broker process uptime identifies flapping brokers without JMX queries.
Related guides
- How Kafka actually works in production: a mental model for operators
- Kafka ISR shrinking: IsrShrinksPerSec, flapping, and the cascade to offline
- Kafka LEADER_NOT_AVAILABLE: causes during elections, restarts, and topic creation
- Kafka leadership imbalance: LeaderCount skew and preferred replica election
- Kafka min.insync.replicas and acks: configuring durability you actually have
- Kafka monitoring checklist: the signals every production cluster needs
- Kafka monitoring maturity model: from survival to expert
- Kafka ActiveControllerCount not equal to 1: no controller or split brain
- Kafka NotEnoughReplicasException: acks=all writes rejected below min.insync.replicas
- Kafka NOT_LEADER_FOR_PARTITION: stale metadata, controller lag, and client retries
- Kafka OfflinePartitionsCount > 0: partitions with no leader and how to recover
- Kafka replica MaxLag growing: slow followers and replica fetcher health







