Kafka authentication failures: SASL/mTLS errors, credential rotation, and brute force
AuthenticationException lines in broker logs are spiking and failed-authentication-total is climbing on one or more listeners. Producers or consumers are failing to connect, and every failed attempt occupies network-thread time and a file descriptor until the broker closes the connection. The same metric pattern can mean routine credential rotation, a client misconfiguration, or an active brute-force attempt. Telling them apart determines whether you need a config fix, a secret rotation, or a security incident response.
What this means
Kafka brokers authenticate every client connection before admitting requests. When SASL/SCRAM or mTLS fails, the broker closes the connection, emits a log line, and increments the failed-authentication-rate and failed-authentication-total attributes on the kafka.server:type=socket-server-metrics MBean for that listener. A single misconfigured client with a stale secret looks nearly identical in the aggregate metric to a credential-stuffing attack. Correlate the rate spike with deployment timestamps, source IPs, and certificate validity windows.
flowchart TD
A[Failed-auth rate spikes] --> B{Correlate with deployment?}
B -->|Yes| C[Check rotated credentials or config drift]
B -->|No| D{Single unknown IP >10/min?}
D -->|Yes| E[Escalate: credential stuffing]
D -->|No| F{SSL handshake failed?}
F -->|Yes| G[Check client cert and truststore]
F -->|No| H[Check SASL mechanism and JAAS alignment]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Expired or rotated SCRAM credential | Spike in failures from a known principal after a deployment or secret lease expiry | kafka-configs.sh --describe --entity-type users for the principal; correlate the failure start time with your secret rotation tooling |
| Client-broker SASL mechanism mismatch | A new or upgraded client fails consistently while existing clients work | Client sasl.mechanism vs broker sasl.enabled.mechanisms and listener JAAS config |
| mTLS certificate expiry or truststore mismatch | SSL handshake failed in broker logs; failures start exactly at certificate renewal time | Client certificate validity dates and whether the signing CA is present in the broker truststore |
| Brute force / credential stuffing | Sustained burst of more than 10 failures per minute from one or more unknown source IPs | Source IP distribution in broker logs; no correlating deployment or service |
| KRaft inter-broker or controller SASL misconfiguration | Auth failures between brokers and controllers after a rolling restart or upgrade | sasl.mechanism.inter.broker.protocol and sasl.mechanism.controller.protocol alignment |
Quick checks
# Check per-listener failed auth total (adjust listener and processor IDs)
echo "get -b kafka.server:type=socket-server-metrics,listener=SASL_SSL,networkProcessor=0 failed-authentication-total" | java -jar jmxterm.jar -l localhost:9999 -n
# Tail broker logs for authentication exceptions
grep "AuthenticationException\|Failed authentication" /var/log/kafka/server.log | tail -n 50
# List SCRAM users known to the broker
kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type users
# Verify broker SASL mechanism configuration
grep -E "^sasl\.enabled\.mechanisms|^listener\.name\..*\.sasl\.jaas\.config" /etc/kafka/server.properties
# Rough count of active Kafka connections
ss -tnp | grep -c "pid=$(pgrep -f 'kafka\.Kafka' | head -n 1),"
How to diagnose it
1. Quantify the failure rate and compare it to your baseline. Read failed-authentication-rate or failed-authentication-total via JMX for each listener.
2. Determine whether the spike correlates with a known deployment, credential rotation, or certificate renewal window. If yes, treat it as a TICKET and move to credential or config verification. If no, proceed to step 3.
3. Identify source IPs and principals from broker logs. If a single unknown IP generates more than 10 failures per minute, treat it as credential stuffing and escalate.
4. For SASL failures, verify that the client and broker agree on the mechanism. The client must use a mechanism listed in the broker’s sasl.enabled.mechanisms, and the JAAS config must match the listener name.
5. For mTLS listeners, inspect broker logs for SSL handshake failed. Verify that the client certificate has not expired and that its signing CA is in the broker truststore.
6. In KRaft clusters, verify that sasl.mechanism.inter.broker.protocol and sasl.mechanism.controller.protocol match the configured listener mechanisms, especially after upgrades.
7. Check NetworkProcessorAvgIdlePercent and total connection count. A brute-force storm can saturate network threads even though individual auth attempts fail.
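The source-IP check in the steps above can be sketched as a small log filter. This is a minimal sketch, not a definitive parser: it assumes each broker log line starts with a `[YYYY-MM-DD HH:MM:SS,mmm]` timestamp and contains `Failed authentication with /<ip>`, which may differ across Kafka versions and log4j layouts. The function name `failed_auth_by_ip` is hypothetical.

```shell
#!/usr/bin/env bash
# Count "Failed authentication" lines per minute and source IP, then print the
# (minute, IP) pairs whose count exceeds a threshold (default: more than 10).
# ASSUMPTION: lines look like
#   [2024-05-01 12:00:03,123] INFO ... Failed authentication with /203.0.113.7 (...)
# Adjust the patterns if your log layout differs.
failed_auth_by_ip() {
  local threshold="${1:-10}"
  awk -v threshold="$threshold" '
    /Failed authentication with \// {
      # Minute bucket: "[2024-05-01 12:00:03,123]" -> "2024-05-01 12:00"
      minute = substr($1 " " $2, 2, 16)
      # Source address: the text after "with /" up to the next space
      ip = $0
      sub(/.*Failed authentication with \//, "", ip)
      sub(/ .*/, "", ip)
      count[minute " " ip]++
    }
    END {
      for (key in count)
        if (count[key] > threshold)
          print key, count[key]
    }
  ' | sort
}
```

One way to use it is `failed_auth_by_ip 10 < /var/log/kafka/server.log`. Each output line holds the minute, the source IP, and the failure count. An empty result means no single address crossed the threshold in any one minute.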
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| failed-authentication-rate / failed-authentication-total (per listener) | Direct count of rejected authentication attempts | Nonzero sustained rate outside maintenance windows |
| AuthenticationException in broker logs | Reveals principal, mechanism, and source IP | Bursts from unknown or unexpected sources |
| NetworkProcessorAvgIdlePercent | Handshake storms consume network thread CPU | Sustained drop below 0.5 during auth spikes |
| Connection count per listener | Distinguishes organic growth from an attack | 2x baseline with flat producer or consumer count |
| FailedProduceRequestsPerSec | Measures client-visible impact | Spikes correlated with the auth failure window |
| ActiveControllerCount | Auth issues between brokers and controllers can stall metadata | Expected controller shows 0 |
Fixes
Expired or rotated SCRAM credentials
Use kafka-configs.sh --alter --entity-type users --entity-name <principal> to update the password dynamically without restarting brokers.
Warning: This immediately invalidates the old credential. Clients using it will fail to authenticate until they are rolled.
In KRaft mode, the initial admin credential must be written when storage is formatted, using kafka-storage.sh format with --add-scram, before the brokers first start; dynamic updates apply once brokers are running. Roll clients with the new secret. Overlap old and new credentials briefly in your secret store if your tooling supports it to avoid a hard outage.
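The dynamic update described above can be sketched as a wrapper that assembles the full `kafka-configs.sh` call, including the `--add-config` argument that carries the new password. The function name, the `BOOTSTRAP` and `DRY_RUN` variables, and the default mechanism `SCRAM-SHA-512` are illustrative assumptions; use the mechanism your listeners actually enable.

```shell
#!/usr/bin/env bash
# Build, and optionally run, the kafka-configs.sh call that replaces a SCRAM
# credential. ASSUMPTIONS: BOOTSTRAP, DRY_RUN and the default mechanism are
# illustrative; DRY_RUN defaults to 1 so nothing changes until you opt in.
rotate_scram_credential() {
  local principal="$1" new_password="$2" mechanism="${3:-SCRAM-SHA-512}"
  set -- kafka-configs.sh --bootstrap-server "${BOOTSTRAP:-localhost:9092}" \
    --alter --entity-type users --entity-name "$principal" \
    --add-config "${mechanism}=[password=${new_password}]"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the command for review instead of executing it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Calling `rotate_scram_credential alice 'new-secret'` prints the command; setting `DRY_RUN=0` executes it. If the admin connection itself requires authentication, the call would also need `--command-config` pointing at an admin client properties file. A password passed on the command line is visible in the process list, so this is probably best run from a restricted host.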
SASL mechanism or JAAS misconfiguration
Align the client sasl.mechanism with a value in the broker’s sasl.enabled.mechanisms. Check that the listener name in the JAAS config property matches the listener in listeners. Remember config precedence: listener.name.{listener}.{mechanism}.sasl.jaas.config overrides static JAAS files. Restart brokers only if you change static server properties; client-side fixes do not require broker restarts.
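The mechanism alignment check can be approximated by comparing the two properties files directly. This sketch assumes plain `key=value` lines without spaces around the `=` and only reads the global `sasl.enabled.mechanisms`; per-listener overrides of the form `listener.name.<listener>.sasl.enabled.mechanisms` are not considered. The helper names `prop_value` and `mechanism_enabled` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the value of a key from a Java .properties file (last match wins).
# ASSUMPTION: entries are written as key=value with no spaces around "=".
prop_value() {
  local key="$1" file="$2"
  awk -F= -v key="$key" '
    $1 == key { sub(/^[^=]*=/, ""); value = $0 }
    END { print value }
  ' "$file"
}

# Succeed if the client's sasl.mechanism is listed in the broker's
# sasl.enabled.mechanisms; otherwise explain the mismatch on stderr.
mechanism_enabled() {
  local client_props="$1" broker_props="$2" mech enabled
  mech=$(prop_value sasl.mechanism "$client_props")
  enabled=$(prop_value sasl.enabled.mechanisms "$broker_props" | tr -d ' ')
  if [ -z "$mech" ]; then
    echo "client does not set sasl.mechanism" >&2
    return 1
  fi
  case ",${enabled}," in
    *",${mech},"*) return 0 ;;
    *) echo "client uses '${mech}', broker enables '${enabled}'" >&2; return 1 ;;
  esac
}
```

For example, `mechanism_enabled client.properties /etc/kafka/server.properties && echo aligned` prints `aligned` only when the client mechanism appears in the broker list.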
mTLS certificate issues
If the client certificate expired or the CA changed, generate a new client certificate signed by the current CA and import it into the client’s keystore. Ensure the broker truststore contains the full CA chain. Broker truststore changes require a broker restart to take effect in most configurations.
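Before reissuing anything, it can help to confirm whether the client certificate is actually close to expiry. The sketch below assumes the certificate is available as a PEM file (a certificate held in a JKS or PKCS12 keystore would need to be exported first) and relies on `openssl x509 -checkend`. The function name and the 14-day default are illustrative.

```shell
#!/usr/bin/env bash
# Report whether a PEM certificate expires within the next N days.
# `openssl x509 -checkend SECONDS` exits 0 when the certificate will still be
# valid that many seconds from now, and non-zero otherwise.
# ASSUMPTION: the 14-day default window is illustrative, not a recommendation.
cert_expiry_status() {
  local cert_file="$1" days="${2:-14}"
  if [ ! -r "$cert_file" ]; then
    echo "unreadable"
    return 2
  fi
  if openssl x509 -in "$cert_file" -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "expiring"
  fi
}
```

Running `cert_expiry_status client.pem 30` prints `ok`, `expiring`, or `unreadable`. A result of `expiring` for a certificate whose failures began at a renewal boundary points toward the expiry cause rather than a truststore mismatch.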
Brute-force or credential stuffing
Block the offending IP at your firewall, cloud security group, or reverse proxy. Rotate any credentials that may have been exposed. Audit listener exposure; brokers with SASL on public interfaces are high-risk. Restrict listeners to internal networks where possible.
Prevention
- Baseline failed-authentication-rate per listener and alert on deviations rather than absolute values.
- Run SCRAM over SASL_SSL. SASL_PLAINTEXT exposes the SCRAM exchange to eavesdroppers, who can use it to mount offline dictionary attacks; RFC 5802 recommends pairing SCRAM with TLS or an equivalent security layer.
- Automate credential rotation with overlapping validity windows to eliminate hard cutovers.
- Restrict broker listener access to known networks. Exposing authentication ports to the internet increases brute-force noise and risk.
- In KRaft mode, validate inter-broker and controller SASL protocol settings during upgrade testing.
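Alerting on deviation rather than an absolute value can be illustrated with a simple mean-plus-k-standard-deviations rule. This is one possible sketch, not a prescribed alerting method: baseline samples of failed-authentication-rate arrive on stdin, one number per line, and the multiplier `k=3` and the function name are assumptions.

```shell
#!/usr/bin/env bash
# Compare the current failed-authentication-rate with a baseline read from
# stdin (one sample per line). Prints "deviation" when the current value
# exceeds mean + k * stddev, "normal" otherwise.
# ASSUMPTION: k defaults to 3, an illustrative choice.
auth_rate_deviates() {
  local current="$1" k="${2:-3}"
  awk -v current="$current" -v k="$k" '
    NF { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n < 2) { print "insufficient-baseline"; exit }
      mean = sum / n
      var = sumsq / n - mean * mean   # population variance
      if (var < 0) var = 0            # guard against rounding error
      limit = mean + k * sqrt(var)
      print (current > limit ? "deviation" : "normal")
    }
  '
}
```

For instance, `printf '0.1\n0.2\n0.1\n' | auth_rate_deviates 5` reports `deviation`. A listener whose baseline is perfectly flat has zero variance, so any increase at all is flagged; a minimum absolute floor may be worth adding in practice.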
How Netdata helps
- Collects failed-authentication-rate and failed-authentication-total per listener from the Kafka JMX endpoint.
- Correlates authentication spikes with NetworkProcessorAvgIdlePercent and connection count to reveal whether a failure storm is exhausting broker resources.
- Surfaces AuthenticationException patterns through log monitoring.
- Displays FailedProduceRequestsPerSec alongside auth metrics to confirm whether failed handshakes are translating into write-path impact.
- Maintains historical baselines so you can distinguish a gradual certificate expiry from a sudden brute-force attack.
Related guides
- How Kafka actually works in production: a mental model for operators
- Kafka enable.auto.commit data loss: committed offsets that outrun processing
- Kafka ‘Broker may not be available’: clients that can’t connect or stay connected
- Kafka broker out of disk: log.dirs full, the cliff-edge shutdown, and recovery
- Kafka CommitFailedException: rebalanced-out consumers and poll loop timeouts
- Kafka consumer group stuck Empty or Dead: no members consuming
- Kafka consumer group lag growing: detection, lag-as-time, and root causes
- Kafka consumer group rebalancing too often: heartbeats, session timeout, and assignors
- Kafka __consumer_offsets growing huge: compaction failure on the offsets topic
- Kafka consumer rebalance storm: stuck in PreparingRebalance and max.poll.interval.ms
- Kafka controller event queue backing up: overwhelmed controller and stalled metadata
- Kafka disk I/O latency high: await, LocalTimeMs, and the slow-disk broker







