Kafka LEADER_NOT_AVAILABLE: causes during elections, restarts, and topic creation

LEADER_NOT_AVAILABLE means a client needed a partition's leader, typically to produce or fetch, and the partition currently has none. Clients most often receive it in metadata responses. In healthy clusters the error is brief and appears during rolling restarts, controller elections, or topic creation. Persistent errors correlate with OfflinePartitionsCount > 0 and indicate that the data plane is broken for those partitions. Distinguish this from NOT_LEADER_FOR_PARTITION (NOT_LEADER_OR_FOLLOWER in newer Kafka versions), which means a leader exists but the client contacted the wrong broker and needs a metadata refresh.

What this means

A partition serves reads and writes only through its leader, and the active controller assigns leadership. When a client requests metadata for a partition that has no leader, the broker reports LEADER_NOT_AVAILABLE for that partition; the client backs off, refreshes its metadata, and retries.

Normal operation:

  • A broker is restarting and its partitions are undergoing leader election.
  • The controller is moving leadership during a preferred replica election.
  • A new topic was created but leader assignments have not yet propagated.

Abnormal operation:

  • Every replica in the ISR is offline, and unclean.leader.election.enable=false keeps out-of-sync replicas from taking over.
  • The controller is absent, crashed, or its event queue is backed up and elections stall.
  • A broker is network-partitioned but not fully down, preventing the controller from cleaning up leadership.

flowchart TD
    A[Client sees LEADER_NOT_AVAILABLE] --> B{Transient?}
    B -->|Yes: restart, election, new topic| C[Wait and let client retry]
    B -->|No: persists >60s| D{ActiveControllerCount == 1?}
    D -->|No| E[Fix controller or quorum]
    D -->|Yes| F{OfflinePartitionsCount > 0?}
    F -->|Yes| G[Find leaderless partitions and replica state]
    F -->|No| H[Likely stale metadata, treat as NOT_LEADER_FOR_PARTITION]
    G --> I{ISR is empty?}
    I -->|Yes| J[Recover replicas or accept unclean election]
    I -->|No| K[Check controller queue and election latency]
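The flowchart's branches can be encoded as a small shell function for a runbook or chat-ops bot. This is a minimal sketch: the function name and argument order are hypothetical, the "Transient?" question is reduced to the flowchart's 60-second persistence threshold, and you supply the controller sum, offline count, and ISR state yourself, for example from the quick checks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: walk the LEADER_NOT_AVAILABLE triage flowchart.
# Arguments (collected by hand or from the quick checks):
#   $1  seconds the error has persisted
#   $2  cluster-wide sum of ActiveControllerCount
#   $3  OfflinePartitionsCount
#   $4  "yes" if the affected partitions have an empty ISR, otherwise "no"
triage_leader_not_available() {
  local persisted_s=$1 controllers=$2 offline=$3 isr_empty=$4

  if (( persisted_s <= 60 )); then
    # Transient: restart, election, or new topic.
    echo "wait: let clients retry"
  elif (( controllers != 1 )); then
    echo "controller: fix the controller or metadata quorum"
  elif (( offline == 0 )); then
    echo "stale-metadata: treat as NOT_LEADER_FOR_PARTITION"
  elif [[ $isr_empty == yes ]]; then
    echo "recover-replicas: recover replicas or accept an unclean election"
  else
    echo "controller-latency: check controller queue and election latency"
  fi
}

# Example: errors for 5 minutes, one controller, 3 offline partitions, empty ISR.
# prints: recover-replicas: recover replicas or accept an unclean election
triage_leader_not_available 300 1 3 yes
```

Treating duration as the only transience signal is a simplification; in practice you would also check whether a restart or topic creation is in progress.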

Common causes

| Cause | Symptoms | First check |
|---|---|---|
| Transient leader election during a rolling restart or broker failure | LeaderElectionRateAndTimeMs spikes; OfflinePartitionsCount and UnderReplicatedPartitions rise briefly, then fall | ActiveControllerCount sums to 1 and election p99 is under 1 second |
| New topic metadata not yet propagated | Errors target only the new topic; other topics are healthy | kafka-topics.sh --describe for the topic; watch leaders appear |
| No in-sync replica available for the partition | OfflinePartitionsCount stays above 0; UncleanLeaderElectionsPerSec is 0 | kafka-topics.sh --describe --unavailable-partitions and broker liveness |
| Controller loss or event queue backup | ActiveControllerCount does not sum to 1, or ControllerEventQueueSize grows without draining | Controller broker logs, ZooKeeper session state, or KRaft quorum health |
| Network-partitioned or flapping broker | Broker process is up but unreachable; leaders on other brokers shrink their ISRs; follower fetch latency rises | Network reachability between brokers, dmesg, interface counters |

Quick checks

# Leaderless partitions
kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions

# Under-replicated partitions
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions

# The jmxterm checks assume jmxterm.jar is present locally and the broker exposes JMX on port 9999.
# Active controller count (query every broker; the cluster-wide sum must equal 1)
echo "get -b kafka.controller:type=KafkaController,name=ActiveControllerCount Value" | java -jar jmxterm.jar -l localhost:9999 -n -v silent

# Controller event queue depth
echo "get -b kafka.controller:type=ControllerEventManager,name=EventQueueSize Value" | java -jar jmxterm.jar -l localhost:9999 -n -v silent

# Leader election rate and latency
echo "get -b kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs OneMinuteRate" | java -jar jmxterm.jar -l localhost:9999 -n -v silent
echo "get -b kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs 99thPercentile" | java -jar jmxterm.jar -l localhost:9999 -n -v silent

# ISR shrink/expand velocity
echo "get -b kafka.server:type=ReplicaManager,name=IsrShrinksPerSec OneMinuteRate" | java -jar jmxterm.jar -l localhost:9999 -n -v silent
echo "get -b kafka.server:type=ReplicaManager,name=IsrExpandsPerSec OneMinuteRate" | java -jar jmxterm.jar -l localhost:9999 -n -v silent

# KRaft quorum state
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status

# Broker process liveness (the systemd unit name varies by install)
systemctl status kafka
ss -tnlp | grep 9092
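The ActiveControllerCount check queries a single JMX endpoint, but the healthy value is a sum over the cluster. A minimal sketch that loops over several endpoints, assuming every broker (or, in KRaft mode, every controller node) exposes JMX, that jmxterm's non-interactive silent mode prints the attribute as `Value = 1;`, and that JMXTERM_JAR points at your jmxterm jar. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: sum ActiveControllerCount across several JMX endpoints.
# JMXTERM_JAR is a placeholder path; the output parsing is an assumption.
JMXTERM_JAR=${JMXTERM_JAR:-jmxterm.jar}

# Print the ActiveControllerCount value (0 or 1) reported by one host:port.
query_active_controller() {
  echo "get -b kafka.controller:type=KafkaController,name=ActiveControllerCount Value" \
    | java -jar "$JMXTERM_JAR" -l "$1" -n -v silent \
    | tr -dc '0-9'   # keep only the digits of "Value = 1;"
}

# Sum the value over every host:port argument; fail if any host is unreadable.
sum_active_controllers() {
  local total=0 value hostport
  for hostport in "$@"; do
    value=$(query_active_controller "$hostport")
    if [[ -z $value ]]; then
      echo "could not read ActiveControllerCount from $hostport" >&2
      return 1
    fi
    total=$(( total + value ))
  done
  echo "$total"
}

# Usage (hypothetical hosts); anything other than 1 means controller trouble:
# sum_active_controllers kafka1:9999 kafka2:9999 kafka3:9999
```

Failing loudly on an unreadable host matters here: silently treating it as 0 could hide the one broker that actually holds the controller role.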

How to diagnose it

  1. Separate LEADER_NOT_AVAILABLE from NOT_LEADER_FOR_PARTITION. LEADER_NOT_AVAILABLE means no leader exists. NOT_LEADER_FOR_PARTITION means the broker contacted is not the leader and the client metadata is stale. If only some clients complain and a metadata refresh fixes it, you are dealing with stale metadata, not a leaderless partition.

  2. Check whether the error is transient. Examine the last few minutes of LeaderElectionRateAndTimeMs, OfflinePartitionsCount, and UnderReplicatedPartitions. If they spike and recover within 30-60 seconds of a rolling restart or topic creation, the errors are expected.

  3. Verify controller health. Sum ActiveControllerCount across all brokers (in KRaft mode, across the controller nodes). The sum must equal exactly 1. If it does not, or if ControllerEventQueueSize stays above 100 and keeps growing, the controller is the bottleneck.

  4. Identify offline partitions. Run kafka-topics.sh --describe --unavailable-partitions, which lists each affected partition with its leader, replicas, and ISR. Cross-reference the replica broker IDs with broker liveness.

  5. Check replica state. In the describe output, look at the ISR. If the ISR is empty and all replicas are on down brokers, the partition stays offline until a replica recovers or an unclean election is allowed.

  6. Investigate broker liveness. Check process state, port reachability, recent restarts, disk I/O latency, and network partitions. A broker that is up but unreachable produces the same symptoms as a dead broker.

  7. Check replication health on surviving leaders. Elevated IsrShrinksPerSec, UnderReplicatedPartitions, and follower fetch latency indicate that followers are being removed from the ISR, which can push more partitions below min.insync.replicas or to zero ISR.

  8. Check metadata store health. In KRaft mode, verify the quorum has a leader and acceptable commit latency. In ZooKeeper mode, check ZooKeeperRequestLatencyMs p99 and ZooKeeperExpiresPerSec for session expirations that can eject the controller.

  9. Read broker and controller logs. Search for ERROR lines around leader election, controller events, and network timeouts. Logs often reveal the first failure before metrics show the full impact.
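Steps 4 and 5 come down to reading the leader and ISR fields of `kafka-topics.sh --describe`. The sketch below filters that output for partitions without a leader and flags those whose ISR is empty. It assumes the tab-separated `Field: value` layout printed by recent Kafka releases, where a missing leader shows as `none`; it also accepts `-1` in case your tooling prints that instead. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: read `kafka-topics.sh --describe` output on stdin and print one line
# per leaderless partition, marking partitions whose ISR is empty.
leaderless_partitions() {
  awk -F'\t' '
    {
      topic = ""; part = ""; leader = ""; replicas = ""; isr = ""; is_part = 0
      for (i = 1; i <= NF; i++) {
        f = $i
        sub(/^ +/, "", f)                          # tolerate leading spaces
        if (f ~ /^Topic: /)     topic    = substr(f, 8)
        if (f ~ /^Partition: /) { part   = substr(f, 12); is_part = 1 }
        if (f ~ /^Leader: /)    leader   = substr(f, 9)
        if (f ~ /^Replicas: /)  replicas = substr(f, 11)
        if (f ~ /^Isr: ?/)      isr      = substr(f, 6)
      }
      # Topic header lines have no "Partition: " field and are skipped.
      if (is_part && (leader == "none" || leader == "-1")) {
        printf "%s-%s replicas=%s isr=%s%s\n", topic, part, replicas,
               (isr == "" ? "<empty>" : isr),
               (isr == "" ? " (needs a replica back or an unclean election)" : "")
      }
    }'
}

# Usage:
# kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions | leaderless_partitions
```

For a partition line with topic `orders`, partition 0, leader `none`, replicas `1,2,3`, and an empty ISR, it prints `orders-0 replicas=1,2,3 isr=<empty> (needs a replica back or an unclean election)`. Fields it does not recognize, such as newer ELR columns, are ignored.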

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| OfflinePartitionsCount | Direct measure of leaderless partitions | Nonzero for more than 60 seconds |
| ActiveControllerCount | Exactly one controller must exist to assign leaders | Cluster-wide sum is not 1 |
| LeaderElectionRateAndTimeMs | Spikes during failures; high latency means elections stall | Rate spikes outside maintenance, or p99 above 1 second |
| ControllerEventQueueSize | Pending metadata operations; a backed-up queue delays elections | Consistently above 100, or growing above 1000 |
| UnderReplicatedPartitions | Replicas are falling behind and may leave the ISR | Nonzero and growing outside maintenance |
| IsrShrinksPerSec | Velocity of replicas leaving the ISR | Sustained above 0 for more than 5 minutes |
| FailedProduceRequestsPerSec | Direct producer-visible impact | Sustained nonzero rate |
| KRaft quorum state | Metadata-plane health in KRaft mode | current-leader = -1, or high commit latency |
| ZooKeeperRequestLatencyMs / ZooKeeperExpiresPerSec | ZK health in ZK mode; high latency or expiry can kill the controller | p99 above 1 second, or any session expiry |

Fixes

Transient errors during elections, restarts, or topic creation

Do not restart brokers or recreate topics while the controller is electing leaders. Let the controller finish and let clients retry with metadata refreshes. LEADER_NOT_AVAILABLE is a retriable error, so if producers are especially sensitive, confirm that retries are enabled and that delivery.timeout.ms is long enough to cover an election. For new topics, wait until kafka-topics.sh --describe shows leaders assigned before directing traffic to the topic.
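To hold traffic until a new topic is ready, a deployment script can poll `kafka-topics.sh --describe` until every partition line shows a leader. A minimal sketch, assuming the tab-separated `Leader: <id>` layout of recent Kafka releases (with `none` or `-1` for a missing leader); KAFKA_TOPICS, BOOTSTRAP, the function name, and the default timeouts are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: wait until every partition of a (new) topic reports a leader.
# KAFKA_TOPICS and BOOTSTRAP are placeholders; override them for your install.
KAFKA_TOPICS=${KAFKA_TOPICS:-kafka-topics.sh}
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}

# Usage: wait_for_leaders TOPIC [TIMEOUT_S] [INTERVAL_S]
wait_for_leaders() {
  local topic=$1 timeout_s=${2:-60} interval_s=${3:-2}
  local deadline=$(( SECONDS + timeout_s )) desc

  while (( SECONDS < deadline )); do
    desc=$("$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" --describe --topic "$topic" 2>/dev/null)
    # Ready once partition lines exist and none shows "Leader: none" or "Leader: -1".
    if grep -q $'\tPartition: ' <<<"$desc" &&
       ! grep -Eq $'\tLeader: (none|-1)(\t|$)' <<<"$desc"; then
      echo "all partitions of $topic have leaders"
      return 0
    fi
    sleep "$interval_s"
  done
  echo "timed out after ${timeout_s}s waiting for leaders on $topic" >&2
  return 1
}

# Usage: wait_for_leaders orders 120 && start_producers   # start_producers is hypothetical
```

Requiring at least one partition line avoids declaring success while the topic is still unknown to the broker being queried.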

Persistent offline partitions with no ISR

If all replicas are down and unclean.leader.election.enable=false, the partition stays offline until a replica returns. Priorities:

  1. Restore the failed brokers or fix the network partition.
  2. If a broker is permanently lost, use kafka-reassign-partitions.sh to move replicas to healthy brokers. This triggers large data movement.
  3. As a last resort, and only if data loss is acceptable, enable unclean leader election temporarily on the topic, then disable it immediately after recovery. Warning: this can silently truncate acknowledged writes.
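If you do take step 3, keep the override scoped to one topic and make the revert an explicit command. A sketch using kafka-configs.sh topic-level overrides; the helper names, BOOTSTRAP, and the DRY_RUN switch (on by default, so commands are only printed) are placeholders. How quickly the controller acts on the dynamic change can depend on your Kafka version and metadata mode, so confirm with kafka-topics.sh --describe that the partitions have leaders before reverting.

```bash
#!/usr/bin/env bash
# Sketch only: toggle a topic-level unclean.leader.election.enable override.
# WARNING: an unclean election can silently truncate acknowledged writes.
# BOOTSTRAP and DRY_RUN are placeholders; DRY_RUN=1 (the default) only prints.
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}
DRY_RUN=${DRY_RUN:-1}

run() {
  if [[ $DRY_RUN == 1 ]]; then
    echo "+ $*"   # print the command instead of executing it
  else
    "$@"
  fi
}

# Allow an out-of-sync replica to become leader for this one topic.
unclean_enable() {
  run kafka-configs.sh --bootstrap-server "$BOOTSTRAP" --alter \
    --entity-type topics --entity-name "$1" \
    --add-config unclean.leader.election.enable=true
}

# Remove the override so the topic falls back to the broker default
# (false unless you changed it).
unclean_revert() {
  run kafka-configs.sh --bootstrap-server "$BOOTSTRAP" --alter \
    --entity-type topics --entity-name "$1" \
    --delete-config unclean.leader.election.enable
}

# Usage: unclean_enable orders; confirm leaders with --describe; unclean_revert orders
```

Deleting the override, rather than setting it back to false, leaves the topic following the cluster-wide setting.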

Controller loss or queue backup

If ActiveControllerCount is not 1:

  • Identify why the controller was lost. Common causes are JVM OOM, long GC pauses, ZK session expiry, or KRaft quorum partition.
  • Do not restart additional brokers; that generates more controller events.
  • In KRaft mode, check voter connectivity and quorum logs. In ZK mode, check ZK latency and whether the ensemble has quorum.
  • If the controller queue is large but a controller exists, monitor drain rate. If it is draining, wait. If it is growing, reduce load by stopping admin operations and reassignment jobs.
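Deciding between waiting and reducing load needs the direction of the controller queue, not a single reading. A minimal sketch that samples the EventQueueSize gauge twice through jmxterm, the same bean used in the quick checks; JMX_HOSTPORT, JMXTERM_JAR, the 30-second default gap, the function names, and the parsing of jmxterm's `Value = N;` output are assumptions. The gauge name may differ in KRaft mode, so confirm it exists on your controllers first.

```bash
#!/usr/bin/env bash
# Sketch: sample the controller event queue twice and report its direction.
# JMXTERM_JAR, JMX_HOSTPORT, and the output parsing are assumptions.
JMXTERM_JAR=${JMXTERM_JAR:-jmxterm.jar}
JMX_HOSTPORT=${JMX_HOSTPORT:-localhost:9999}

# Print the current EventQueueSize gauge as a bare integer.
controller_queue_size() {
  echo "get -b kafka.controller:type=ControllerEventManager,name=EventQueueSize Value" \
    | java -jar "$JMXTERM_JAR" -l "$JMX_HOSTPORT" -n -v silent \
    | tr -dc '0-9'   # keep only the digits of "Value = 42;"
}

# Usage: queue_trend [GAP_SECONDS]
queue_trend() {
  local gap_s=${1:-30} before after
  before=$(controller_queue_size)
  sleep "$gap_s"
  after=$(controller_queue_size)
  if [[ -z $before || -z $after ]]; then
    echo "unknown: could not read EventQueueSize" >&2
    return 1
  elif (( after < before )); then
    echo "draining ($before -> $after): wait for it to empty"
  elif (( after > before )); then
    echo "growing ($before -> $after): stop admin operations and reassignments"
  else
    echo "flat ($before): keep watching"
  fi
}
```

Two samples are the bare minimum; repeating the call a few times before acting avoids reacting to a single burst of events.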

Network-partitioned or flapping broker

A broker that is alive but partitioned can keep leadership while its followers cannot fetch from it, or prevent the controller from cleaning up its state. If the network issue cannot be resolved quickly, stop the broker with a controlled shutdown (kafka-server-stop.sh sends SIGTERM, and controlled.shutdown.enable defaults to true) so the cluster elects clean leaders on reachable replicas.
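Before shutting a broker down, it helps to confirm which peers it can and cannot reach. A minimal sketch using bash's `/dev/tcp` redirection and coreutils `timeout`; run it from each broker, since a broker reachable from some hosts but not others is the partial partition described above. The function names and host list are hypothetical, and a plain TCP connect only proves the listener accepts connections, not that Kafka requests succeed.

```bash
#!/usr/bin/env bash
# Sketch: TCP reachability check for broker listeners from this host.
# Usage: check_reachable HOST PORT [TIMEOUT_S]
check_reachable() {
  local host=$1 port=$2 timeout_s=${3:-3}
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
  timeout "$timeout_s" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage: report_reachability HOST:PORT...; returns nonzero if any is unreachable.
report_reachability() {
  local hp host port unreachable=0
  for hp in "$@"; do
    host=${hp%:*}
    port=${hp##*:}
    if check_reachable "$host" "$port"; then
      echo "reachable   $hp"
    else
      echo "UNREACHABLE $hp"
      unreachable=$(( unreachable + 1 ))
    fi
  done
  (( unreachable == 0 ))
}

# Example (hypothetical hosts):
# report_reachability kafka1:9092 kafka2:9092 kafka3:9092
```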

Prevention

  • Monitor OfflinePartitionsCount and ActiveControllerCount with paging thresholds. Do not rely on UnderReplicatedPartitions alone; under-replication is normal during restarts, but offline partitions are not.
  • Set min.insync.replicas=2 for replication.factor=3 topics. With acks=all, this stops a single replica from acknowledging writes, so after one broker failure an in-sync replica holding every acknowledged write is normally available for a clean leader election.
  • Leave unclean.leader.election.enable=false unless you explicitly accept data loss. It has been the default since Kafka 0.11.0.0.
  • Size controller nodes for your partition count, run ZooKeeper or KRaft controllers on dedicated resources, and watch ControllerEventQueueSize during normal operation to detect creeping overload.
  • Use rack-aware replication. A rack failure should shrink the ISR but not take all replicas of a partition offline.
  • Run game-day rolling restarts and broker failures. Measure how long leader elections take, how high OfflinePartitionsCount spikes, and how long ISR recovery takes. Use that to set alert thresholds and maintenance windows.
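The replication.factor and min.insync.replicas recommendation can be enforced at topic creation. With acks=all, writes continue while at least min.insync.replicas replicas are in sync, so replication.factor=3 with min.insync.replicas=2 tolerates one lost replica before producers fail. A sketch with a guard that rejects settings leaving no headroom; the function name, topic name, partition count, BOOTSTRAP, and DRY_RUN switch (on by default, so the command is only printed) are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: create a topic with replication headroom for one broker failure.
# BOOTSTRAP and DRY_RUN are placeholders; DRY_RUN=1 (the default) only prints.
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}
DRY_RUN=${DRY_RUN:-1}

# Usage: create_replicated_topic TOPIC [PARTITIONS] [RF] [MIN_ISR]
create_replicated_topic() {
  local topic=$1 partitions=${2:-6} rf=${3:-3} min_isr=${4:-2}
  # acks=all producers fail once fewer than min_isr replicas are in sync,
  # so (rf - min_isr) replicas can be lost before writes stop.
  if (( min_isr < 1 || min_isr >= rf )); then
    echo "refusing: min.insync.replicas=$min_isr must be between 1 and $(( rf - 1 )) for rf=$rf" >&2
    return 1
  fi
  local -a cmd
  cmd=(kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --create --topic "$topic"
       --partitions "$partitions" --replication-factor "$rf"
       --config "min.insync.replicas=$min_isr")
  if [[ $DRY_RUN == 1 ]]; then
    echo "+ ${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage: create_replicated_topic orders 12
```

Setting min.insync.replicas equal to the replication factor is rejected because any single broker restart would then block acks=all writes.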

How Netdata helps

  • Correlates OfflinePartitionsCount, ActiveControllerCount, UnderReplicatedPartitions, and ControllerEventQueueSize in one view to separate transient elections from persistent leaderless partitions.
  • Surfaces Kafka request latency breakdowns (RequestQueueTimeMs, LocalTimeMs, RemoteTimeMs) to show whether the bottleneck is replication, disk I/O, or thread saturation.
  • Tracks KRaft quorum health and ZooKeeper latency alongside broker metrics, making metadata-plane failures visible without switching tools.
  • Alerts on disk I/O latency, page cache pressure, and network retransmits that often precede ISR shrinks and leaderless partitions.
  • Provides per-broker LeaderCount and PartitionCount views to catch leadership imbalance: a broker that leads a disproportionate share of partitions is more likely to be overloaded, and its failure triggers more elections at once.