Kafka NOT_LEADER_FOR_PARTITION: stale metadata, controller lag, and client retries
Producers and consumers log NOT_LEADER_FOR_PARTITION. Broker response metrics show spikes in failed produce or fetch requests. The cluster usually self-heals within seconds as clients refresh metadata. When the error persists for minutes, or flaps across many partitions, the root cause is typically a controller that cannot keep up with leadership changes. Distinguishing a routine leader election from a controller queue backup that blocks metadata propagation is the first step.
What this means
Kafka clients cache partition leadership metadata. When a leader moves (rolling restart, broker failure, preferred replica election), a client with a stale view sends requests to the previous leader. That broker returns NOT_LEADER_FOR_PARTITION. The Java client treats this as a retriable error and refreshes metadata eagerly. A short spike during a restart is normal and usually clears immediately.
If the error persists, the controller’s event queue is likely backed up. The active controller processes leadership changes, ISR updates, and topic operations sequentially from a single-threaded queue. When events arrive faster than they drain, metadata changes propagate slowly. Brokers serve conflicting leadership metadata, and clients receive unstable answers even after refresh. The result is sustained NOT_LEADER_FOR_PARTITION responses, often accompanied by growing under-replication and delayed leader elections.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Transient leader election | Errors spike for 10-60 seconds then flatline | LeaderElectionRateAndTimeMs burst; ActiveControllerCount == 1 |
| Controller event queue backup | Errors sustained for minutes; queue growing | ControllerEventQueueSize on the active controller |
| No active controller | Errors spread cluster-wide; no elections completing | ActiveControllerCount summed across brokers != 1 |
| Broker overload | High request latency alongside leadership errors | RequestHandlerAvgIdlePercent below 0.3 |
Quick checks
# Verify exactly one active controller exists
for host in broker1 broker2 broker3; do
  printf "%s: " "$host"
  echo "get -b kafka.controller:type=KafkaController,name=ActiveControllerCount Value" | java -jar jmxterm.jar -l "$host:9999" -n
done
# Check controller event queue depth (run on the active controller)
echo "get -b kafka.controller:type=ControllerEventManager,name=EventQueueSize Value" | java -jar jmxterm.jar -l localhost:9999 -n
# Check recent leader election volume and timing
echo "get -b kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs Count" | java -jar jmxterm.jar -l localhost:9999 -n
echo "get -b kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs 99thPercentile" | java -jar jmxterm.jar -l localhost:9999 -n
# List partitions lacking a full ISR
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions
# List partitions with no leader
kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions
# Check broker-side failed request rates
echo "get -b kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec OneMinuteRate" | java -jar jmxterm.jar -l localhost:9999 -n
echo "get -b kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec OneMinuteRate" | java -jar jmxterm.jar -l localhost:9999 -n
# Rule out broker processing saturation (this MBean is a meter, so read a rate attribute rather than Value)
echo "get -b kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent OneMinuteRate" | java -jar jmxterm.jar -l localhost:9999 -n
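The first check only answers the question once the per-broker values are added up. A minimal sketch of that step, assuming you have already reduced each broker's reading to a bare integer, one per line (how you extract the number from jmxterm output depends on its verbosity settings and is left out here). The function name controller_status is hypothetical:

```bash
#!/usr/bin/env bash
# Sum per-broker ActiveControllerCount values read from stdin
# (one integer per line) and classify the cluster-wide total.
# Returns 0 only when exactly one active controller is reported.
controller_status() {
  local sum=0 value
  while read -r value || [[ -n "$value" ]]; do
    # Ignore blank or non-numeric lines instead of failing on them.
    [[ "$value" =~ ^[0-9]+$ ]] || continue
    sum=$((sum + 10#$value))
  done
  case "$sum" in
    0) echo "no active controller"; return 1 ;;
    1) echo "ok"; return 0 ;;
    *) echo "multiple active controllers ($sum)"; return 1 ;;
  esac
}
```

For example, `printf '0\n1\n0\n' | controller_status` reports ok, while three zeros report that no controller is active.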
How to diagnose it
Use the following flow to isolate whether you are seeing a transient transition or a controller backlog.
flowchart TD
A[NOT_LEADER_FOR_PARTITION errors] --> B{Persistent > 2 min?}
B -->|No| C[Transient leader election]
B -->|Yes| D[ActiveControllerCount == 1?]
D -->|No| E[Controller outage]
D -->|Yes| F[ControllerEventQueueSize]
F -->|Growing > 1000| G[Controller queue backup]
F -->|Spike then drain| H[Large failure recovery]
F -->|Near zero| I[Check client or network]
G --> J[Check ZK or KRaft health]

- Confirm the cluster has one active controller. Query ActiveControllerCount on every broker. The cluster-wide sum must be exactly 1. If it is 0, the cluster cannot elect leaders or update metadata. If it is greater than 1 in ZooKeeper mode, you may have split-brain.
- Check the controller event queue size. On the active controller, read kafka.controller:type=ControllerEventManager,name=EventQueueSize. Near zero is healthy. Sustained values above 100 indicate pressure. Continuous growth above 1000 means the controller is overwhelmed and metadata changes are queuing.
- Evaluate leader election timing. Read LeaderElectionRateAndTimeMs. A brief burst with completion times under 100ms suggests a normal transition. Elections consistently taking over 1 second, or a sustained high election rate outside maintenance, indicate the controller or metadata store is degraded.
- Correlate with replication state. Check UnderReplicatedPartitions. If it is rising across many brokers while NOT_LEADER_FOR_PARTITION persists, followers are also unable to sync, pointing to a broader broker or network issue rather than pure metadata staleness.
- Check for offline partitions. If OfflinePartitionsCount is increasing while the controller queue is backed up, partitions are waiting in line for leader election. This confirms the controller cannot keep up with failure recovery.
- Rule out broker saturation. If RequestHandlerAvgIdlePercent is sustained below 0.3 and RequestQueueSize is elevated, the broker is too slow to process requests. This can produce leadership timeouts that look like metadata issues but are actually resource exhaustion.
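The decision flow can be written down as a small function, which is handy in a runbook script. This is a sketch under stated assumptions: the function name triage, the two-sample comparison used to judge whether the queue is growing, and the "near zero" cutoff of 10 are illustrative choices, not values from Kafka. The 2-minute and 1000-event thresholds come from the flow above.

```bash
#!/usr/bin/env bash
# triage PERSIST_SECONDS CONTROLLER_SUM QUEUE_PREV QUEUE_NOW
#   PERSIST_SECONDS  how long the errors have lasted
#   CONTROLLER_SUM   cluster-wide sum of ActiveControllerCount
#   QUEUE_PREV/NOW   two consecutive EventQueueSize samples
triage() {
  local persist_s=$1 controller_sum=$2 queue_prev=$3 queue_now=$4
  if (( persist_s <= 120 )); then
    echo "transient leader election"
  elif (( controller_sum != 1 )); then
    echo "controller outage"
  elif (( queue_now > 1000 && queue_now > queue_prev )); then
    echo "controller queue backup"
  elif (( queue_now < 10 && queue_prev < 10 )); then
    # "Near zero" cutoff of 10 is an assumption for this sketch.
    echo "check client or network"
  elif (( queue_now < queue_prev )); then
    echo "large failure recovery, queue draining"
  else
    echo "queue under pressure, keep sampling"
  fi
}
```

Calling `triage 300 1 900 1500` yields controller queue backup. Two samples are the bare minimum for a trend; in practice you would compare more points before acting.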
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| ControllerEventQueueSize | Measures controller backlog directly. Only meaningful on the active controller. | Sustained > 100; continuous growth > 1000 |
| LeaderElectionRateAndTimeMs | Reveals election velocity and whether the controller is stalling. | p99 time > 1s; sustained high rate outside maintenance |
| ActiveControllerCount | Confirms a single controller exists to process events. | Cluster-wide sum != 1 |
| UnderReplicatedPartitions | Shows if replication is degraded beyond leadership metadata. | Nonzero and growing across multiple brokers |
| OfflinePartitionsCount | Confirms partitions are truly unavailable, not just misrouted. | Increasing while controller queue is backed up |
| FailedProduceRequestsPerSec / FailedFetchRequestsPerSec | Broker-side view of client-visible errors including this one. | Sustained nonzero rate outside maintenance |
| RequestHandlerAvgIdlePercent | Distinguishes broker overload from metadata propagation delays. | Sustained below 0.3 |
Fixes
If the controller queue is backed up
Do not restart additional brokers. Restarting generates more controller events and worsens the backlog. Check the metadata store:
- In ZooKeeper mode, check ZooKeeperRequestLatencyMs and ZooKeeperExpiresPerSec. High ZK latency slows every controller event. If ZK is shared with other systems, isolate it.
- In KRaft mode, check quorum health by running kafka-metadata-quorum.sh --bootstrap-server broker:9092 describe --status and look for voter lag and commit latency growth. If the quorum has lost its leader, metadata is frozen.
If a specific broker is unhealthy (disk latency spikes, GC pauses) and generating repeated ISR changes, perform a controlled shutdown to remove it cleanly rather than letting it flap and enqueue more events.
If the queue is draining slowly after a large-scale failure, wait. Monitor LeaderElectionRateAndTimeMs for completion. If the queue is growing without bound, the controller itself may need attention; check JVM GC logs and CPU.
If there is no active controller
In ZooKeeper mode, check for ZK session expirations and ensure network connectivity between brokers and ZK nodes. In KRaft mode, verify quorum voter connectivity and that controller nodes are healthy. Without a controller, the cluster cannot self-heal and manual intervention is required to restore the metadata plane.
If the issue is transient
If errors spike during a rolling restart or single broker recovery and clear within 30-60 seconds, no fix is needed. Verify that UnderReplicatedPartitions returns to zero and that clients have resumed normal throughput.
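The verification step can be scripted as a bounded wait. The sketch below assumes each under-replicated partition appears as one line containing "Partition:" in the kafka-topics.sh --describe output. The helper names count_partition_rows and wait_for_zero are hypothetical.

```bash
#!/usr/bin/env bash
# Count partition rows in `kafka-topics.sh --describe` output on stdin.
# grep -c exits non-zero when nothing matches, hence the `|| true`.
count_partition_rows() {
  grep -c 'Partition:' || true
}

# wait_for_zero TIMEOUT_S INTERVAL_S CMD...
# Re-runs CMD until it prints 0 or TIMEOUT_S seconds have elapsed.
wait_for_zero() {
  local timeout_s=$1 interval_s=$2
  shift 2
  local deadline=$((SECONDS + timeout_s)) count
  while :; do
    count=$("$@")
    if [[ "$count" == 0 ]]; then
      echo "recovered"
      return 0
    fi
    if (( SECONDS >= deadline )); then
      echo "still $count after ${timeout_s}s"
      return 1
    fi
    sleep "$interval_s"
  done
}

# Example wiring (not executed here):
#   urp_count() {
#     kafka-topics.sh --bootstrap-server localhost:9092 --describe \
#       --under-replicated-partitions | count_partition_rows
#   }
#   wait_for_zero 120 5 urp_count
```

A non-zero exit from wait_for_zero means the cluster did not settle in time, so the spike should no longer be treated as transient.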
Prevention
- Monitor ControllerEventQueueSize as a primary controller health signal. Alert on sustained values above 100.
- Keep partition counts per broker within tested limits. The controller must process an event per partition during failures. Test recovery time by gracefully shutting down one broker. If recovery exceeds 1-2 minutes, reduce partition density.
- Maintain ZooKeeper or KRaft quorum health independently. Do not share ZK clusters with other applications.
- Avoid coordinated restarts of multiple brokers. Stagger maintenance to prevent controller overload.
- Verify leadership rebalancing after restarts. If auto.leader.rebalance.enable does not rebalance sufficiently, run kafka-leader-election.sh to prevent hot brokers from concentrating metadata churn.
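"Sustained" is the part of the queue-size alert that is easy to get wrong, because a single spike during a routine election should not page anyone. One way to express it, as a sketch: treat the condition as met only when the most recent N samples all exceed the threshold. The function name sustained_above and the choice of N are assumptions. The threshold of 100 is the one suggested above.

```bash
#!/usr/bin/env bash
# sustained_above THRESHOLD MIN_CONSECUTIVE SAMPLE...
# Samples are given oldest first. Succeeds (exit 0) when the trailing
# MIN_CONSECUTIVE samples are all strictly greater than THRESHOLD.
sustained_above() {
  local threshold=$1 need=$2
  shift 2
  local run=0 sample
  for sample in "$@"; do
    if (( sample > threshold )); then
      run=$((run + 1))   # extend the current streak
    else
      run=0              # any dip resets the streak
    fi
  done
  (( run >= need ))
}
```

With samples 20 150 180 210 and N = 3, the check succeeds. With 150 180 20 210 it does not, because the dip to 20 resets the streak.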
How Netdata helps
- Correlate ControllerEventQueueSize with FailedProduceRequestsPerSec in the same time window to confirm that controller lag is causing client errors.
- Alert on LeaderElectionRateAndTimeMs spikes and ActiveControllerCount anomalies.
- Track UnderReplicatedPartitions and OfflinePartitionsCount alongside broker resource metrics to distinguish controller issues from disk or network degradation.
- Visualize request latency breakdowns (RequestQueueTimeMs, LocalTimeMs) to rule out I/O thread saturation that can mimic metadata propagation delays.
Related guides
- How Kafka actually works in production: a mental model for operators
- Kafka ISR shrinking: IsrShrinksPerSec, flapping, and the cascade to offline
- Kafka monitoring checklist: the signals every production cluster needs
- Kafka monitoring maturity model: from survival to expert
- Kafka NotEnoughReplicasException: acks=all writes rejected below min.insync.replicas
- Kafka replica MaxLag growing: slow followers and replica fetcher health
- Kafka UnderMinIsrPartitionCount: confirming the write path is blocked
- Kafka UnderReplicatedPartitions > 0: the most important metric and how to clear it







