Kafka authentication failures: SASL/mTLS errors, credential rotation, and brute force

AuthenticationException lines in broker logs are spiking and failed-authentication-total is climbing on one or more listeners. Producers or consumers are failing to connect, and every failed attempt occupies network-thread time and a file descriptor until the broker closes the connection. The same metric pattern can mean routine credential rotation, a client misconfiguration, or an active brute-force attempt. Telling the difference determines whether you need a config fix, a secret rotation, or a security incident response.

What this means

Kafka brokers authenticate every client connection before admitting requests. When SASL/SCRAM or mTLS fails, the broker closes the connection, emits a log line, and increments the failed-authentication-rate and failed-authentication-total attributes on the kafka.server:type=socket-server-metrics MBean for that listener. A single misconfigured client with a stale secret looks nearly identical in the aggregate metric to a credential-stuffing attack. Correlate the rate spike with deployment timestamps, source IPs, and certificate validity windows.

flowchart TD
    A[Failed-auth rate spikes] --> B{Correlate with deployment?}
    B -->|Yes| C[Check rotated credentials or config drift]
    B -->|No| D{Single unknown IP >10/min?}
    D -->|Yes| E[Escalate: credential stuffing]
    D -->|No| F{SSL handshake failed?}
    F -->|Yes| G[Check client cert and truststore]
    F -->|No| H[Check SASL mechanism and JAAS alignment]
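
The decision flow above can be written down as a small function, which is handy if you want an alert handler or runbook script to suggest a first action. This is only a sketch: the function name, the `Action` enum, and the three inputs are hypothetical, and it assumes you have already worked out deployment correlation, the peak per-minute failure count from any unknown IP, and whether the logs show SSL handshake failures.

```python
from enum import Enum


class Action(Enum):
    CHECK_CREDENTIALS_OR_CONFIG = "check rotated credentials or config drift"
    ESCALATE_CREDENTIAL_STUFFING = "escalate: credential stuffing"
    CHECK_CERT_AND_TRUSTSTORE = "check client cert and truststore"
    CHECK_SASL_AND_JAAS = "check SASL mechanism and JAAS alignment"


def triage(correlates_with_deployment: bool,
           peak_failures_per_min_unknown_ip: int,
           ssl_handshake_failed: bool) -> Action:
    """Mirror the flowchart: deployment first, then source IP, then TLS vs SASL."""
    if correlates_with_deployment:
        return Action.CHECK_CREDENTIALS_OR_CONFIG
    # The >10/min threshold comes from the flowchart; tune it to your baseline.
    if peak_failures_per_min_unknown_ip > 10:
        return Action.ESCALATE_CREDENTIAL_STUFFING
    if ssl_handshake_failed:
        return Action.CHECK_CERT_AND_TRUSTSTORE
    return Action.CHECK_SASL_AND_JAAS
```

The order of the checks matters: a deployment correlation short-circuits everything else, exactly as in the diagram.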

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Expired or rotated SCRAM credential | Spike in failures from a known principal after a deployment or secret lease expiry | kafka-configs.sh --describe --entity-type users for the principal; correlate the failure start time with your secret rotation tooling |
| Client-broker SASL mechanism mismatch | A new or upgraded client fails consistently while existing clients work | Client sasl.mechanism vs broker sasl.enabled.mechanisms and listener JAAS config |
| mTLS certificate expiry or truststore mismatch | SSL handshake failed in broker logs; failures start exactly at certificate renewal time | Client certificate validity dates and whether the signing CA is present in the broker truststore |
| Brute force / credential stuffing | Sustained burst of more than 10 failures per minute from one or more unknown source IPs | Source IP distribution in broker logs; no correlating deployment or service |
| KRaft inter-broker or controller SASL misconfiguration | Auth failures between brokers and controllers after a rolling restart or upgrade | sasl.mechanism.inter.broker.protocol and sasl.mechanism.controller.protocol alignment |

Quick checks

# Check per-listener failed auth total (adjust listener name, processor ID, and JMX port).
# Query a SASL or SSL listener; a PLAINTEXT listener performs no authentication.
echo "get -b kafka.server:type=socket-server-metrics,listener=SASL_SSL,networkProcessor=0 failed-authentication-total" | java -jar jmxterm.jar -l localhost:9999 -n

# Tail broker logs for authentication exceptions
grep "AuthenticationException\|Failed authentication" /var/log/kafka/server.log | tail -n 50

# List SCRAM users known to the broker
kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type users

# Verify broker SASL mechanism configuration
grep -E '^(listener\.name\.[^.]+\.)?sasl\.enabled\.mechanisms|^listener\.name\..*\.sasl\.jaas\.config' /etc/kafka/server.properties

# Rough count of active Kafka connections (run as root so ss can show the owning process)
ss -tnp | grep -c "pid=$(pgrep -f 'kafka\.Kafka' | head -n 1),"

How to diagnose it

  1. Quantify the failure rate and compare it to your baseline. Read failed-authentication-rate or failed-authentication-total via JMX for each listener.
  2. Determine whether the spike correlates with a known deployment, credential rotation, or certificate renewal window. If yes, treat it as a ticket-level issue rather than an incident and move to credential or config verification. If no, proceed to step 3.
  3. Identify source IPs and principals from broker logs. If a single unknown IP generates more than 10 failures per minute, treat it as credential stuffing and escalate.
  4. For SASL failures, verify that the client and broker agree on the mechanism. The client must use a mechanism listed in the broker’s sasl.enabled.mechanisms, and the JAAS config must match the listener name.
  5. For mTLS listeners, inspect broker logs for SSL handshake failed. Verify that the client certificate has not expired and that its signing CA is in the broker truststore.
  6. In KRaft clusters, verify that sasl.mechanism.inter.broker.protocol and sasl.mechanism.controller.protocol match the configured listener mechanisms, especially after upgrades.
  7. Check NetworkProcessorAvgIdlePercent and total connection count. A brute-force storm can saturate network threads even though individual auth attempts fail.
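
Step 3 is easier with a script than with grep alone. The sketch below counts failed authentications per source IP per minute and reports unknown IPs that exceed the 10-per-minute threshold. It assumes log lines start with a `[YYYY-MM-DD HH:MM:SS,mmm]` timestamp and contain `Failed authentication with /<ip>`; the exact layout depends on your Kafka version and log4j pattern, so adjust the regex to match your logs. The function names and the `known_ips` set are hypothetical, and only IPv4 addresses are handled.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Assumed log layout; adapt to your broker's log4j pattern.
FAILED_AUTH = re.compile(
    r"^\[(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}\]"
    r".*Failed authentication with \S*?/(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def failures_per_minute(lines: Iterable[str]) -> Counter:
    """Count failed-auth log lines keyed by (source IP, minute)."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED_AUTH.search(line)
        if match:
            key: Tuple[str, str] = (match.group("ip"), match.group("minute"))
            counts[key] += 1
    return counts


def suspicious_ips(lines: Iterable[str], known_ips: Set[str],
                   threshold: int = 10) -> Dict[str, int]:
    """Return unknown IPs whose peak per-minute failure count exceeds threshold."""
    peaks: Dict[str, int] = {}
    for (ip, _minute), count in failures_per_minute(lines).items():
        if ip not in known_ips and count > threshold:
            peaks[ip] = max(peaks.get(ip, 0), count)
    return peaks
```

A typical call would pass an open file handle for the broker log together with the set of addresses your own services connect from, for example `suspicious_ips(open("/var/log/kafka/server.log"), known_ips={"10.0.0.5"})`.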

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| failed-authentication-rate / failed-authentication-total (per listener) | Direct count of rejected authentication attempts | Nonzero sustained rate outside maintenance windows |
| AuthenticationException in broker logs | Reveals principal, mechanism, and source IP | Bursts from unknown or unexpected sources |
| NetworkProcessorAvgIdlePercent | Handshake storms consume network thread CPU | Sustained drop below 0.5 during auth spikes |
| Connection count per listener | Distinguishes organic growth from an attack | 2x baseline with flat producer or consumer count |
| FailedProduceRequestsPerSec | Measures client-visible impact | Spikes correlated with the auth failure window |
| ActiveControllerCount | Auth issues between brokers and controllers can stall metadata | Expected controller shows 0 |
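
The socket-server-metrics MBean is keyed by both listener and networkProcessor, so a per-listener figure means summing across processors. If you collect failed-authentication-total yourself, a per-second rate can be derived from two samples. The sketch below assumes you already have the totals for one listener as a mapping of processor ID to counter value; the function name and input shape are hypothetical.

```python
from typing import Mapping


def failed_auth_rate(previous: Mapping[str, float],
                     current: Mapping[str, float],
                     interval_seconds: float) -> float:
    """Failures per second for one listener, summed across network processors."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = 0.0
    for processor, total in current.items():
        before = previous.get(processor, 0.0)
        # A counter lower than the last sample suggests a broker restart;
        # in that case count only what has accumulated since the reset.
        delta += total - before if total >= before else total
    return delta / interval_seconds
```

Note that a processor missing from the previous sample contributes its full total, which can overstate the rate on the first collection after a processor appears.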

Fixes

Expired or rotated SCRAM credentials

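Use kafka-configs.sh --bootstrap-server <host:port> --alter --add-config 'SCRAM-SHA-512=[password=<new-password>]' --entity-type users --entity-name <principal> to update the password dynamically without restarting brokers. Use the SCRAM mechanism your clients are configured for.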

Warning: This immediately invalidates the old credential. Clients using it will fail to authenticate until they are rolled.

In KRaft mode, the initial admin credential must have been added with the --add-scram option of kafka-storage.sh format before first startup for brokers to recognize it; dynamic updates apply once brokers are running. Roll clients with the new secret. Overlap old and new credentials briefly in your secret store if your tooling supports it to avoid a hard outage.
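
If rotation is scripted, it helps to build the kafka-configs.sh argument list in one place so the mechanism and principal cannot drift apart. The following is a minimal sketch, not a complete rotation tool: the function name and parameters are hypothetical, it assumes kafka-configs.sh is on the PATH, and it passes the password as a command-line argument, which is visible in the process list on the host that runs it.

```python
from typing import List, Optional

SCRAM_MECHANISMS = ("SCRAM-SHA-256", "SCRAM-SHA-512")


def scram_alter_command(bootstrap_server: str, principal: str, password: str,
                        mechanism: str = "SCRAM-SHA-512",
                        command_config: Optional[str] = None) -> List[str]:
    """Build the argv for updating one user's SCRAM credential."""
    if mechanism not in SCRAM_MECHANISMS:
        raise ValueError(f"unsupported SCRAM mechanism: {mechanism}")
    cmd = ["kafka-configs.sh", "--bootstrap-server", bootstrap_server]
    if command_config:
        # Admin client properties (security.protocol, sasl.jaas.config, ...).
        cmd += ["--command-config", command_config]
    cmd += [
        "--alter",
        "--add-config", f"{mechanism}=[password={password}]",
        "--entity-type", "users",
        "--entity-name", principal,
    ]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with the square brackets.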

SASL mechanism or JAAS misconfiguration

Align the client sasl.mechanism with a value in the broker’s sasl.enabled.mechanisms. Check that the listener name in the JAAS config property matches the listener in listeners. Remember config precedence: listener.name.{listener}.{mechanism}.sasl.jaas.config overrides static JAAS files. Restart brokers only if you change static server properties; client-side fixes do not require broker restarts.
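
The alignment check can be automated against the text of server.properties. The sketch below reads the global sasl.enabled.mechanisms, lets a listener-prefixed value override it, and checks whether a listener-and-mechanism-scoped JAAS property exists. It is a simplified parser under stated assumptions: it ignores backslash line continuations, it assumes the listener and mechanism appear in lower case in property prefixes, and the function names are hypothetical.

```python
from typing import Dict, Mapping, Set


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks and comments (no continuations)."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def enabled_mechanisms(props: Mapping[str, str], listener: str) -> Set[str]:
    """Mechanisms enabled for a listener; a listener-prefixed value wins."""
    override = props.get(f"listener.name.{listener.lower()}.sasl.enabled.mechanisms")
    # GSSAPI is Kafka's default when sasl.enabled.mechanisms is not set.
    value = override if override is not None else props.get("sasl.enabled.mechanisms", "GSSAPI")
    return {m.strip() for m in value.split(",") if m.strip()}


def has_listener_jaas(props: Mapping[str, str], listener: str, mechanism: str) -> bool:
    """True if listener.name.{listener}.{mechanism}.sasl.jaas.config is set."""
    key = f"listener.name.{listener.lower()}.{mechanism.lower()}.sasl.jaas.config"
    return key in props
```

A client is aligned with a listener when its sasl.mechanism is in `enabled_mechanisms(...)`; if `has_listener_jaas(...)` is false for that pair, the broker falls back to whatever the static JAAS file provides.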

mTLS certificate issues

If the client certificate expired or the CA changed, generate a new client certificate signed by the current CA and import it into the client’s keystore. Ensure the broker truststore contains the full CA chain. Broker truststore changes take effect after a broker restart, unless you apply them through Kafka's dynamic broker configuration for that listener.
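
To catch expiry before it causes handshake failures, the certificate end date can be turned into a number of days remaining. The sketch below uses only the Python standard library and expects the single line printed by `openssl x509 -noout -enddate -in client.pem`, for example `notAfter=May  1 12:00:00 2030 GMT`. The function name and the idea of feeding it openssl output are assumptions made for illustration.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Days until the certificate expires; negative if it already has."""
    # Accept either "notAfter=<date>" or the bare date string.
    _, _, not_after = enddate_line.strip().rpartition("=")
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400
```

A negative result means the certificate has already expired; a small positive one is a cue to renew before clients start failing.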

Brute-force or credential stuffing

Block the offending IP at your firewall, cloud security group, or reverse proxy. Rotate any credentials that may have been exposed. Audit listener exposure; brokers with SASL on public interfaces are high-risk. Restrict listeners to internal networks where possible.

Prevention

  • Baseline failed-authentication-rate per listener and alert on deviations rather than absolute values.
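  • Run SCRAM over SASL_SSL. On SASL_PLAINTEXT an eavesdropper can capture the SCRAM exchange and mount an offline dictionary attack on the password, a risk RFC 5802 describes for exchanges without a security layer such as TLS.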
  • Automate credential rotation with overlapping validity windows to eliminate hard cutovers.
  • Restrict broker listener access to known networks. Exposing authentication ports to the internet increases brute-force noise and risk.
  • In KRaft mode, validate inter-broker and controller SASL protocol settings during upgrade testing.
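
Alerting on deviation from a baseline, as suggested in the first bullet, can be as simple as comparing the current rate with the mean and standard deviation of recent samples. This is one possible approach rather than the only one; the function name, the three-sigma default, and the `min_spread` floor are illustrative assumptions to tune against your own traffic.

```python
from statistics import mean, stdev
from typing import Sequence


def deviates_from_baseline(baseline: Sequence[float], current: float,
                           sigmas: float = 3.0, min_spread: float = 0.01) -> bool:
    """True if current exceeds the baseline mean by more than `sigmas` deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    # Floor the spread so a perfectly flat baseline does not alert on tiny noise.
    spread = max(stdev(baseline), min_spread)
    return current > mean(baseline) + sigmas * spread
```

Because the check is one-sided, a drop in failed authentications never triggers it; only increases above the learned range do.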

How Netdata helps

  • Collects failed-authentication-rate and failed-authentication-total per listener from the Kafka JMX endpoint.
  • Correlates authentication spikes with NetworkProcessorAvgIdlePercent and connection count to reveal whether a failure storm is exhausting broker resources.
  • Surfaces AuthenticationException patterns through log monitoring.
  • Displays FailedProduceRequestsPerSec alongside auth metrics to confirm whether failed handshakes are translating into write-path impact.
  • Maintains historical baselines so you can distinguish a gradual certificate expiry from a sudden brute-force attack.