Kafka authorization failures: ACL denials, wrong-topic clients, and audit trails

Your consumers or producers throw TOPIC_AUTHORIZATION_FAILED or CLUSTER_AUTHORIZATION_FAILED, or your security team spots the same denial repeating in the broker logs. You need to know within minutes whether this is a deployment misconfiguration, a client pointing at the wrong topic, or a security incident.

In clusters where authorizer.class.name is configured, the default log4j configuration routes authorization decisions to kafka-authorizer.log. Denials are logged at INFO level. Allowed operations are logged at DEBUG, so they stay silent unless you raise the authorizer logger to DEBUG.

What this means

When a Kafka broker has an authorizer configured, a request is denied unless an Allow ACL matches the principal, resource, and operation, and a matching Deny ACL always overrides an Allow. Authorization decisions are written to logs/kafka-authorizer.log. A denial means the authenticated principal attempted an operation on a resource with no matching Allow ACL, or with a matching Deny ACL.
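Each denial is a single structured line, so the log converts easily into sortable fields. The sketch below uses a hypothetical helper (parse_denials is not a Kafka tool). It assumes lines worded like "Principal = ... is Denied Operation = ... from host = ... on resource = ... for request = ...". Key capitalization and trailing text can differ between authorizer implementations and versions, so compare against a real line from your brokers first.

```bash
#!/usr/bin/env bash
# Sketch: turn authorizer denial lines into tab-separated fields:
# principal, operation, client host, resource, request API.
# Assumption: each line contains "key = value" triples such as "Principal = User:x";
# keys and the decision word are compared case-insensitively because capitalization
# can differ between authorizer implementations.
parse_denials() {
  awk '
    tolower($0) ~ / is denied / {
      p = op = host = res = req = ""
      for (i = 1; i + 2 <= NF; i++) {
        if ($(i + 1) != "=") continue          # only inspect "key = value" triples
        key = tolower($i)
        if (key == "principal") p = $(i + 2)
        else if (key == "operation") op = $(i + 2)
        else if (key == "host") host = $(i + 2)
        else if (key == "resource") res = $(i + 2)
        else if (key == "request") req = $(i + 2)
      }
      if (p != "") printf "%s\t%s\t%s\t%s\t%s\n", p, op, host, res, req
    }'
}
```

For example, `parse_denials < /var/log/kafka/kafka-authorizer.log | cut -f1,4 | sort | uniq -c | sort -rn | head` lists the most frequently denied principal and resource pairs.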

In KRaft mode, ACLs are stored in the __cluster_metadata topic and enforced by org.apache.kafka.metadata.authorizer.StandardAuthorizer. In ZooKeeper-based clusters, ACLs are stored in ZooKeeper and enforced by kafka.security.authorizer.AclAuthorizer. Super users configured via super.users bypass all ACL checks. The allow.everyone.if.no.acl.found setting defaults to false, so resources without ACLs are not world-accessible.
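In KRaft clusters that run controllers as separate processes, the authorizer must be set on both process types. A broker-only setting is a common source of confusion. This hypothetical check (prop and check_authorizer are illustrative names) compares authorizer.class.name between a broker properties file and a controller properties file. It assumes simple key=value lines and does not cover combined-mode nodes that use a single file.

```bash
#!/usr/bin/env bash
# Sketch: confirm that broker and controller property files name the same authorizer.
# Assumes simple "key=value" lines; file paths are supplied by the caller.
prop() {  # $1 = properties file, $2 = key; prints the last value set for the key
  awk -v key="$2" '
    /^[ \t]*[#!]/ { next }                      # skip comment lines
    index($0, "=") > 0 {
      k = substr($0, 1, index($0, "=") - 1); gsub(/^[ \t]+|[ \t]+$/, "", k)
      if (k == key) { v = substr($0, index($0, "=") + 1); gsub(/^[ \t]+|[ \t]+$/, "", v); found = v }
    }
    END { print found }' "$1"
}

check_authorizer() {  # $1 = broker properties, $2 = controller properties
  local broker controller
  broker=$(prop "$1" authorizer.class.name)
  controller=$(prop "$2" authorizer.class.name)
  if [ -z "$broker" ] || [ -z "$controller" ]; then
    echo "MISSING broker='${broker}' controller='${controller}'"; return 1
  fi
  if [ "$broker" != "$controller" ]; then
    echo "MISMATCH broker='${broker}' controller='${controller}'"; return 1
  fi
  echo "OK ${broker}"
}
```

Usage, with paths adjusted to your layout: `check_authorizer /etc/kafka/broker.properties /etc/kafka/controller.properties`.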

Occasional denials are almost always misconfigurations: a new service missing an ACL, a developer pointing a consumer at prod-events instead of prod-events-v2, or a staging credential reused in production. Sustained denials from an authenticated principal making requests outside its normal pattern suggest a compromised credential, lateral movement, or probing.

The distinction matters because the response differs. Misconfigurations are fixed with kafka-acls.sh. Security incidents require credential rotation, scope investigation, and escalation.
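A rough way to separate occasional from sustained denials is to count, per principal, how many distinct minutes contain at least one denial. The sketch makes two assumptions you should tune: the default log4j timestamp prefix ([YYYY-MM-DD HH:MM:SS,mmm]) and a threshold of 10 minutes. Neither is a Kafka default, and classify_denials is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: label each denied principal "occasional" or "sustained" by the number of
# distinct minutes in which it was denied. Threshold (default 10) is an assumption.
classify_denials() {  # $1 = minute threshold; authorizer log lines on stdin
  local threshold="${1:-10}"
  awk -v threshold="$threshold" '
    tolower($0) ~ / is denied / {
      # Assumed prefix "[YYYY-MM-DD HH:MM:SS,mmm]": keep "YYYY-MM-DD HH:MM".
      minute = substr($1, 2) " " substr($2, 1, 5)
      p = ""
      for (i = 1; i + 2 <= NF; i++)
        if (tolower($i) == "principal" && $(i + 1) == "=") { p = $(i + 2); break }
      if (p == "") next
      if (!((p, minute) in seen)) { seen[p, minute] = 1; minutes[p]++ }
    }
    END {
      for (p in minutes)
        printf "%s\t%d\t%s\n", p, minutes[p], (minutes[p] >= threshold + 0 ? "sustained" : "occasional")
    }' | sort
}
```

Usage: `classify_denials 10 < /var/log/kafka/kafka-authorizer.log`. The output is one line per principal, showing the principal, its minute count, and its label.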

```mermaid
flowchart TD
    A[Authorization denial in kafka-authorizer.log] --> B{Occasional or sustained?}
    B -->|Occasional / deployment correlated| C[ACL misconfig or wrong-topic client]
    B -->|Sustained from known principal| D[Compromised credential or application bug]
    B -->|Sustained from unexpected principal| E[Unauthorized access or lateral movement]
    C --> F[kafka-acls.sh --list to verify]
    D --> G[Check client logs and credential rotation history]
    E --> H[Escalate security incident]
    F --> I[Fix ACL or client configuration]
```

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Misconfigured ACL | Denied operations from a known application right after deployment or credential rotation | kafka-authorizer.log for the principal and exact resource name |
| Client targeting wrong topic | A producer or consumer logs authorization failures for a topic it should not access | The application's configured topic name or subscription regex |
| Compromised or over-permissioned credential | Sustained denials from a principal making requests outside its normal pattern | Historical audit trail for that principal's typical resources |
| Missing broker principal ACLs (PLAINTEXT inter-broker) | Internal broker errors such as UpdateMetadata denials after enabling authorization | security.inter.broker.protocol and whether broker traffic authenticates as ANONYMOUS |
| KRaft controller missing authorizer | "No Authorizer is configured" errors from kafka-acls.sh despite broker config | authorizer.class.name on both broker and controller processes |
| False client-side authorization failure | KafkaJS producer reports an authorization failure for a topic it previously produced to | Client library version and known issues such as KafkaJS #1346 |

Quick checks

Run these read-only checks before making changes.

```bash
# Check recent authorization denials on a broker
grep -E "DENIED|Denied" /var/log/kafka/kafka-authorizer.log | tail -n 50

# Verify the configured authorizer class
grep "authorizer.class.name" /etc/kafka/server.properties

# List ACLs for a specific principal
kafka-acls.sh --bootstrap-server localhost:9092 --list --principal User:app-producer

# List all ACLs on a specific topic
kafka-acls.sh --bootstrap-server localhost:9092 --list --topic orders

# Check super users configured on the broker
grep "super.users" /etc/kafka/server.properties

# Check for admin API requests recorded by the authorizer
# (denials are logged at INFO; allowed requests appear only with DEBUG enabled)
grep -E "request = (CreateTopics|DeleteTopics|AlterConfigs|IncrementalAlterConfigs|CreateAcls|DeleteAcls)" \
  /var/log/kafka/kafka-authorizer.log | tail -n 20
```

On Amazon MSK, you cannot read broker log files such as kafka-authorizer.log directly. Use MSK broker log delivery (CloudWatch Logs, Amazon S3, or Firehose) instead. The kafka-acls.sh checks still work through the --bootstrap-server endpoint.

How to diagnose it

  1. Filter the authorizer log for the incident window. Look for lines containing the denied principal and resource type (Topic, Group, Cluster). Each line records the principal, the decision, the operation, the client host, the resource, and the request API.
  2. Determine if the principal is known. If it belongs to a newly deployed service, this is likely a missing ACL. If it is an existing production service, check whether the resource is new or unexpected.
  3. Verify existing ACLs with kafka-acls.sh --list. Check both the principal and the resource. Principal names are case-sensitive. Prefixed ACLs match any resource starting with the given prefix.
  4. Check for admin operations around the time of first denial. Unauthorized ACL changes, topic creation, or config alterations can explain sudden denials. Look in broker logs or audit logs for changes outside a change window.
  5. Distinguish authZ from authN. A client that cannot authenticate never reaches the authorizer. To rule out authentication problems, check server.log for authentication failures (AuthenticationException) and check the failed-authentication-rate JMX metric (kafka.server:type=socket-server-metrics).
  6. Validate the client library. Some libraries report authorization failures during reconnection even when ACLs are correct. If the denial is intermittent and the client previously produced successfully, verify the client version and check for known issues.
  7. Check KRaft controller configuration. In KRaft mode with separated controller and broker processes, authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer must be set on both. If kafka-acls.sh returns “No Authorizer is configured,” verify the controller config.
  8. Assess the scope and duration. Occasional denials that stop after a few minutes suggest a transient misconfiguration. Sustained denials that continue for tens of minutes, especially from a principal trying multiple resources, warrant escalation as a potential security event.
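Step 8 also asks whether a principal is trying multiple resources. The minimal sketch below assumes that denial lines contain "Principal = <p>" and "resource = <r>" key/value pairs, and denied_resources_by_principal is a hypothetical helper. It counts distinct denied resources per principal, so the broadest probes sort to the top.

```bash
#!/usr/bin/env bash
# Sketch: per principal, count distinct denied resources and list them, widest first.
denied_resources_by_principal() {  # authorizer log lines on stdin
  awk '
    tolower($0) ~ / is denied / {
      p = r = ""
      for (i = 1; i + 2 <= NF; i++) {
        if ($(i + 1) != "=") continue
        if (tolower($i) == "principal") p = $(i + 2)
        else if (tolower($i) == "resource") r = $(i + 2)
      }
      if (p != "" && r != "" && !((p, r) in seen)) {
        seen[p, r] = 1
        n[p]++
        list[p] = list[p] (list[p] == "" ? "" : ",") r   # keep first-seen order
      }
    }
    END { for (p in n) printf "%d\t%s\t%s\n", n[p], p, list[p] }' | sort -k1,1nr -k2,2
}
```

A known service that is denied on one resource usually points to a missing ACL. A principal with a long list of unrelated resources matches the escalation pattern described in step 8.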

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Authorization failure rate | Direct measure of ACL denials from kafka-authorizer.log | Sustained rate from a principal with no recent history of accessing that resource |
| Authentication failure rate | Distinguishes authN from authZ problems | Sustained AuthenticationException before denials appear |
| Admin operations / config changes | Unauthorized changes cause sudden authorization shifts | Topic creation, deletion, or ACL modification outside change windows |
| UnderReplicatedPartitions | Broker principal denials can block internal replication | URP rises after enabling ACLs without granting Cluster operations to brokers |
| Consumer group lag | Authorization failures block consumers, so lag grows | Lag grows for a group whose members recently started logging denials |
| FailedProduceRequestsPerSec | Producers receive hard errors when denied, though authorization rejections may not be counted here on every version; confirm against kafka-authorizer.log | Spike correlating with authorization denials |
| FailedFetchRequestsPerSec | Consumers receive hard errors when denied (same counting caveat) | Spike on topics with new ACL restrictions |

Fixes

Fix the ACL

If kafka-acls.sh --list shows no matching Allow ACL, add the minimum required permission. Prefer resource-specific ACLs over wildcard grants.

```bash
# Example: grant Write and Describe on a topic to a producer principal
kafka-acls.sh --bootstrap-server localhost:9092 --add \
  --allow-principal User:app-producer \
  --operation Write --operation Describe \
  --topic orders
```

Tradeoff: Wildcard topic ACLs (--topic '*') reduce operational overhead but increase blast radius if the principal is compromised.
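Prefixed ACLs are a middle ground between per-topic grants and a '*' wildcard. The sketch wraps real kafka-acls.sh options: --producer, --consumer, --group, and --resource-pattern-type prefixed. The following are illustrative assumptions: the principals, the prefix, the group ID, the KAFKA_BOOTSTRAP default, and the run/DRY_RUN wrapper. A consumer needs Read on its consumer group as well as on the topic. The --consumer option grants that when combined with --group.

```bash
#!/usr/bin/env bash
# Sketch: narrower alternatives to a '*' topic ACL. DRY_RUN=1 prints the command
# instead of running it. Bootstrap address and all names are placeholders.
KAFKA_BOOTSTRAP="${KAFKA_BOOTSTRAP:-localhost:9092}"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Producer role (Write, Describe, Create) on every topic whose name starts with a prefix.
grant_prefixed_producer() {  # $1 = principal, $2 = topic prefix
  run kafka-acls.sh --bootstrap-server "$KAFKA_BOOTSTRAP" --add \
    --allow-principal "$1" --producer \
    --topic "$2" --resource-pattern-type prefixed
}

# Consumer role: Read/Describe on one topic plus Read on one consumer group.
grant_consumer() {  # $1 = principal, $2 = topic, $3 = group id
  run kafka-acls.sh --bootstrap-server "$KAFKA_BOOTSTRAP" --add \
    --allow-principal "$1" --consumer \
    --topic "$2" --group "$3"
}
```

Run each function once with DRY_RUN=1 to review the exact command, for example `DRY_RUN=1 grant_consumer User:app-consumer orders orders-service`. Then run it again without DRY_RUN to apply the ACL.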

Fix the client configuration

If the client targets the wrong topic or consumer group, fix the application configuration. Common mistakes include hardcoded topic names from a previous environment, regex subscriptions that match unintended topics, or MirrorMaker configs pointing to the wrong cluster.

Tradeoff: Fixing the client requires a deployment, which may take longer than adding an ACL. If the client should not access that resource, do not add the ACL.
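Before changing a regex subscription, preview which existing topics it matches. This hypothetical helper reads a topic list, for example from kafka-topics.sh --list. It anchors the pattern because Java consumers match a subscription pattern against the whole topic name. It assumes the pattern uses only syntax shared by POSIX ERE and Java regex, so lookarounds and other Java-only constructs will not behave the same in grep -E.

```bash
#!/usr/bin/env bash
# Sketch: print the topics (one per line on stdin) that a subscription regex would match.
# The pattern is anchored to mimic a whole-name match.
matching_topics() {  # $1 = subscription regex
  grep -E -- "^(${1})\$" || true   # no match is not an error here
}
```

Usage: `kafka-topics.sh --bootstrap-server localhost:9092 --list | matching_topics 'prod-events.*'`. A pattern like this also matches prod-events-v2 and any prod-events-dlq topic, which is exactly the kind of unintended match that produces denials.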

Resolve inter-broker authorization failures

If brokers use PLAINTEXT for security.inter.broker.protocol and ACLs are enabled, internal requests authenticate as ANONYMOUS and may be denied. Switch inter-broker communication to SASL_PLAINTEXT or SASL_SSL. Then make sure the broker principal is authorized for cluster operations, typically by listing it in super.users or by granting the Cluster-level ACLs it needs (such as ClusterAction).

Tradeoff: Switching inter-broker protocol requires a rolling restart. Granting broad Cluster ACLs to brokers reduces security boundaries.
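Before planning the rolling restart, confirm which protocol the brokers use today. This sketch (the helper names are hypothetical) reads a server.properties file whose path you supply. If inter.broker.listener.name is set, it follows that name through listener.security.protocol.map. Otherwise it reports security.inter.broker.protocol, whose Kafka default is PLAINTEXT. It assumes simple key=value lines.

```bash
#!/usr/bin/env bash
# Sketch: report the security protocol used for inter-broker traffic.
prop() {  # $1 = properties file, $2 = key; prints the last value set for the key
  awk -v key="$2" '
    /^[ \t]*[#!]/ { next }                      # skip comment lines
    index($0, "=") > 0 {
      k = substr($0, 1, index($0, "=") - 1); gsub(/^[ \t]+|[ \t]+$/, "", k)
      if (k == key) { v = substr($0, index($0, "=") + 1); gsub(/^[ \t]+|[ \t]+$/, "", v); found = v }
    }
    END { print found }' "$1"
}

inter_broker_protocol() {  # $1 = path to server.properties
  local listener map proto
  listener=$(prop "$1" inter.broker.listener.name)
  if [ -n "$listener" ]; then
    map=$(prop "$1" listener.security.protocol.map)
    # Map entries look like NAME:PROTOCOL, comma-separated.
    proto=$(printf '%s\n' "$map" | tr -d ' ' | tr ',' '\n' | awk -F: -v l="$listener" '$1 == l { print $2 }')
    # Fall back to the listener name, which matches the default map for standard names.
    echo "${proto:-$listener}"
  else
    proto=$(prop "$1" security.inter.broker.protocol)
    echo "${proto:-PLAINTEXT}"
  fi
}
```

If this prints PLAINTEXT on a cluster with an authorizer, inter-broker requests reach the authorizer as User:ANONYMOUS.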

Escalate sustained anomalous denials

If a principal makes sustained authorization requests to resources it has never touched before, treat it as a potential security incident. Rotate the principal’s credentials, audit its recent activity, and check whether other principals from the same source exhibit similar behavior.
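Auditing recent activity goes faster with a baseline of the resources a principal normally uses. This sketch assumes a hypothetical baseline file with one Type:PATTERN:name entry per line, for example built from DEBUG-level authorizer logs or your ACL repository. It also assumes authorizer lines contain "Principal =" and "resource =" key/value pairs. It prints the resources the principal was denied on that are not in its baseline.

```bash
#!/usr/bin/env bash
# Sketch: denied resources for one principal that are absent from its baseline file.
new_resources_for_principal() {  # $1 = principal, $2 = baseline file; log lines on stdin
  awk -v who="$1" '
    tolower($0) ~ / is denied / {
      p = r = ""
      for (i = 1; i + 2 <= NF; i++) {
        if ($(i + 1) != "=") continue
        if (tolower($i) == "principal") p = $(i + 2)
        else if (tolower($i) == "resource") r = $(i + 2)
      }
      if (p == who && r != "") print r
    }' | sort -u | comm -23 - <(sort -u "$2")   # keep lines only in the denial set
}
```

Any output is a resource the principal has not touched before, which is the signal that warrants credential rotation and a wider audit.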

Prevention

  • Enable DEBUG logging for allowed operations. For log4j1, change log4j.logger.kafka.authorizer.logger=INFO, authorizerAppender to DEBUG and keep the appender name. The logger is non-additive, so dropping the appender discards its output. For log4j2, set logger.authorizer.level=DEBUG. Without this, allowed operations are silent and you cannot build a baseline. Expect high log volume; use short-lived toggles or dedicated appenders.
  • Require ACL changes through automation. Topic and ACL provisioning should run through CI/CD or infrastructure-as-code with mandatory review. Ad-hoc kafka-acls.sh commands in production create audit gaps.
  • Monitor admin operations outside change windows. Any topic creation, deletion, or ACL modification by an unexpected principal should generate a ticket immediately.
  • Validate client configs in pre-production. Run integration tests with the same authorizer configuration as production to catch wrong-topic errors before deployment.
  • Document principal-to-resource mappings. Maintain a living document or repo that defines which services access which topics. This accelerates diagnosis when a principal is denied.
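For short-lived toggles, Kafka can change broker log levels at runtime, without a restart, through kafka-configs.sh with --entity-type broker-loggers. The sketch assumes broker ID 1, a five-minute window, and the KAFKA_BOOTSTRAP default. The helper names and the DRY_RUN wrapper are illustrative. Dynamic log levels are not persisted, so a broker restart also returns the logger to its configured level.

```bash
#!/usr/bin/env bash
# Sketch: temporary DEBUG window for the authorizer logger on one broker.
# DRY_RUN=1 prints the commands instead of running them.
KAFKA_BOOTSTRAP="${KAFKA_BOOTSTRAP:-localhost:9092}"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

set_authorizer_log_level() {  # $1 = broker id, $2 = level (DEBUG or INFO)
  run kafka-configs.sh --bootstrap-server "$KAFKA_BOOTSTRAP" \
    --entity-type broker-loggers --entity-name "$1" \
    --alter --add-config "kafka.authorizer.logger=$2"
}

debug_window() {  # $1 = broker id, $2 = seconds to keep DEBUG on
  set_authorizer_log_level "$1" DEBUG
  run sleep "$2"
  # If the window is interrupted, run: set_authorizer_log_level <id> INFO
  set_authorizer_log_level "$1" INFO
}
```

Usage: `debug_window 1 300` captures five minutes of allowed operations for the baseline, then restores INFO.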

How Netdata helps

  • Correlate authorization failure rates with broker health metrics (RequestHandlerAvgIdlePercent, UnderReplicatedPartitions) to detect instability triggered by ACL changes.
  • Track authentication and authorization event trends over time to establish baselines for normal principal behavior.
  • Alert on sustained authorization denials alongside consumer group lag growth to detect when ACL issues block data flow.
  • Monitor broker log patterns for admin operations, surfacing unexpected topic or ACL changes that precede authorization incidents.