Kafka __consumer_offsets growing huge: compaction failure on the offsets topic

One broker’s disk is climbing faster than its peers, or __consumer_offsets has grown to multiple gigabytes while producer traffic is flat. This internal topic is compacted by default; its size should stay roughly proportional to active consumer groups and partitions. Unbounded growth means the log cleaner has stalled or crashed. The failure is silent: producers and consumers keep working, but every offset commit appends a record that compaction will never remove. Growth is not evenly distributed: __consumer_offsets is partitioned by group.id hash, so a stalled cleaner on one broker affects only the partitions in that broker’s log.dirs. Eventually the log directory fills, and the broker may mark it offline.

What this means

__consumer_offsets stores committed offsets as key-value pairs: the key is {group, topic, partition} and the value is the offset. Its cleanup policy is compact, so Kafka should retain only the latest value per key. The log cleaner thread scans closed segments, builds an offset map for each key in the dirty range, and rewrites segments to remove obsolete records. Cleaned segments are swapped into the log atomically.
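
To make "latest value per key" concrete, here is a toy simulation in shell. It is only an illustration of the retention rule, not how Kafka's cleaner is implemented: the function name compact and the input format (one "key offset" record per line, oldest first) are assumptions made for this sketch.

```shell
# Toy model of log compaction: keep only the newest value for each key.
# Input: one record per line, "<key> <offset>", oldest first.
# The key spelling (group,topic,partition) is illustrative, not Kafka's wire format.
compact() {
  awk '
    {
      if (!($1 in latest)) order[++n] = $1   # remember first-seen order of keys
      latest[$1] = $2                        # later records overwrite earlier ones
    }
    END { for (i = 1; i <= n; i++) print order[i], latest[order[i]] }
  '
}

# Three commits, two distinct keys: only two records survive.
printf '%s\n' 'g1,orders,0 100' 'g2,orders,0 40' 'g1,orders,0 250' | compact
```

With a stalled cleaner, all three records would stay on disk indefinitely; after a healthy compaction pass only the newest record per key remains.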

If the cleaner thread dies, compaction stops. Offset commits continue because the Group Coordinator still appends to the active segment, turning the topic into an append-only log. The bloated partition resides on a specific broker’s log directory, so that broker’s disk diverges from its peers while the cluster otherwise looks healthy. Because compaction is a local operation, other brokers hosting the same topic are unaffected unless they also have dead cleaners.

Before calling it a compaction failure, rule out legitimate bloat. A surge in consumer groups or an abnormally long offsets.retention.minutes enlarges the baseline, because compaction must retain the latest record for every active key. A dead cleaner produces uncleanable growth that outpaces the actual number of active groups.

flowchart TD
    A[Offset commits append to __consumer_offsets] --> B{Log cleaner healthy?}
    B -->|Yes| C[Compaction retains latest offset per key]
    B -->|No| D[Dirty ratio climbs]
    D --> E[Topic grows without bound]
    E --> F[Broker log dir fills]
    F --> G[Disk pressure or offline log directory]

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Dead log cleaner thread | max-dirty-percent climbs above 50 (the gauge is a percentage, so 50 corresponds to a ratio of 0.5) and DeadThreadCount is nonzero; broker logs contain cleaner ERROR lines | JMX kafka.log:type=LogCleaner,name=DeadThreadCount |
| Corrupt segment that re-crashes the cleaner | Broker restarts temporarily reduce disk growth, then the cleaner dies again at the same offset | Logs for repeated cleaner exceptions at a specific offset or segment |
| High cardinality of active consumer groups | Topic is large but max-dirty-percent is low and stable; group count recently increased | kafka-consumer-groups.sh --list |
| Expired groups retained by long offset retention | Slow growth over weeks; many inactive groups still have stored offsets | offsets.retention.minutes and the ratio of active to total groups |

Quick checks

# Check log cleaner dirty ratio (requires jmxterm; -n prevents interactive hang)
# The gauge is a percentage (0-100). A value above 50 that keeps climbing indicates the cleaner is not processing segments.
echo "get -b kafka.log:type=LogCleanerManager,name=max-dirty-percent Value" | java -jar jmxterm.jar -n -l localhost:9999

# Check for dead cleaner threads
# Nonzero means a cleaner thread crashed and was not restarted.
echo "get -b kafka.log:type=LogCleaner,name=DeadThreadCount Value" | java -jar jmxterm.jar -n -l localhost:9999

# Inspect __consumer_offsets size per log directory
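# Look for one broker whose replica of a partition is significantly larger than the other replicas.
# The tool prints its JSON on a single line, so filter by topic with --topic-list instead of grep context flags.
kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe --topic-list __consumer_offsets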

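# Search broker logs for cleaner crashes
# Focus on ERROR lines that mention a specific segment filename or offset.
# Kafka's default log4j configuration writes cleaner output to log-cleaner.log, so search both files (paths are site-specific).
grep -iE "cleaner|compaction" /var/log/kafka/server.log /var/log/kafka/log-cleaner.log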

# Count consumer groups to assess key cardinality
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list | wc -l

# Check disk utilization for each log directory
grep '^log.dirs=' /etc/kafka/server.properties | cut -d'=' -f2 | tr ',' '\n' | while read -r d; do df -h "$d"; done
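
The two JMX checks above print a line of text rather than a bare number, which is awkward for scripting. The helper below is a small sketch for extracting the value. It assumes jmxterm prints the attribute as a line of the form "Value = 37;"; the function name jmx_value is hypothetical, and the pattern may need adjusting if your jmxterm version formats output differently.

```shell
# Extract the number from jmxterm "get" output so it can be compared in scripts.
# Assumption: jmxterm prints the attribute as a line like "Value = 37;".
jmx_value() {
  sed -n 's/^[[:space:]]*Value[[:space:]]*=[[:space:]]*\([0-9.]*\);.*$/\1/p' | head -n 1
}

# Example wiring (commented out because it needs a live broker; port 9999 follows the checks above):
# echo "get -b kafka.log:type=LogCleaner,name=DeadThreadCount Value" \
#   | java -jar jmxterm.jar -n -l localhost:9999 2>/dev/null | jmx_value
```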

How to diagnose it

  1. Confirm the bloated partition. kafka-log-dirs.sh --describe lists size per partition per log directory. If one broker holds a much larger replica of a __consumer_offsets partition than the other brokers hosting that partition, the cleaner on that broker is the suspect. If multiple brokers show large sizes, check whether the group count is actually high or whether the cluster has a widespread cleaner failure.

  2. Check cleaner health via JMX. max-dirty-percent is reported as a percentage (0-100), so a value steadily climbing above 50, the equivalent of the default log.cleaner.min.cleanable.ratio of 0.5, means the cleaner is not keeping up. A flat but high value can mean the cleaner is alive but backlogged. Confirm with DeadThreadCount; nonzero means a thread has died.

  3. Inspect broker logs for the crash signature. Look for ERROR lines mentioning LogCleaner or compaction. Note the offset or segment file in the stack trace. CorruptRecordException or IllegalStateException during segment conversion indicates a bad record. If the exception names a specific .log file, record the full path; you will need it for recovery.

  4. Rule out traffic-driven growth. Compare BytesInPerSec for __consumer_offsets against its historical baseline. Flat traffic with growing disk means compaction has failed. A spike in commit rate from consumer group rebalances or flapping consumers can also bloat the topic temporarily; if the cleaner is healthy, it will catch up.

  5. Rule out retention misconfiguration. Check offsets.retention.minutes. Expired groups leave orphaned records, but that produces gradual growth, not the runaway growth of a dead cleaner. If retention is already short, orphaned records are not the root cause.

  6. Determine if a restart will hold. Transient OOM or one-off format errors recover after a broker restart. Watch the dirty ratio for 10-15 minutes after startup. If the crash references the same segment offset every time, the record is corrupt and the cleaner will die again until the segment is removed.
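
The decision logic in steps 2 through 5 can be sketched as a small shell function. This is a rough triage aid under stated assumptions, not a definitive classifier: the function name, its argument order, and the verdict strings are all hypothetical, max-dirty-percent is taken to be an integer percentage (0-100), and the threshold of 50 mirrors the default log.cleaner.min.cleanable.ratio of 0.5.

```shell
# Rough triage of the diagnosis steps. Arguments (hypothetical names):
#   $1 dead_threads  - value of DeadThreadCount
#   $2 dirty_pct     - value of max-dirty-percent, integer 0-100
#   $3 ingest_trend  - "flat" or "rising" (BytesInPerSec versus its baseline)
triage_offsets_growth() {
  dead_threads=$1
  dirty_pct=$2
  ingest_trend=$3
  if [ "$dead_threads" -gt 0 ]; then
    echo "cleaner-dead"                 # look for the crash signature in the logs
  elif [ "$dirty_pct" -gt 50 ] && [ "$ingest_trend" = "flat" ]; then
    echo "cleaner-backlogged"           # alive but not keeping up
  elif [ "$ingest_trend" = "rising" ]; then
    echo "traffic-driven"               # commit surge; a healthy cleaner catches up
  else
    echo "check-groups-and-retention"   # likely legitimate baseline growth
  fi
}

triage_offsets_growth 1 80 flat
```

A "cleaner-dead" verdict still leaves step 6 open: whether a restart holds depends on whether the crash repeats at the same segment offset.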

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| kafka.log:type=LogCleanerManager,name=max-dirty-percent | Percentage (0-100) of compacted log waiting to be cleaned. A rising value means the cleaner cannot keep up with the commit rate. | Sustained value above 50, the percentage equivalent of log.cleaner.min.cleanable.ratio (default 0.5) |
| kafka.log:type=LogCleaner,name=DeadThreadCount | Cleaner threads that have crashed and not restarted. Even one dead thread halts compaction for the partitions that thread was responsible for. | Any value above zero |
| Disk utilization on log.dirs volumes | Uncleaned segments consume disk without bound because obsolete records are never removed. | Steady growth on a broker hosting compacted topics while traffic is flat |
| kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=__consumer_offsets | Separates compaction failure from a traffic surge caused by consumer group rebalances or flapping consumers. | Normal or flat ingest rate alongside growing disk usage |
| Consumer group count | Determines whether the topic baseline is reasonably large. Each unique group-topic-partition key adds one retained record after compaction. | Group count is flat but topic size grows |
| kafka.log:type=LogManager,name=OfflineLogDirectoryCount | Consequence if the disk fills completely and the broker marks the directory offline. | Any value above zero |

Fixes

Restart the broker

Warning: Restarting a broker triggers leader elections and leaves its partitions under-replicated until it rejoins and catches up. Plan for transient client latency and monitor UnderReplicatedPartitions during the restart.

If the cleaner died from a transient error, a broker restart revives the cleaner thread. Watch DeadThreadCount and max-dirty-percent for 10-15 minutes. If the dirty ratio falls and no new dead threads appear, the cleaner has recovered.
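
One way to judge "the dirty ratio falls" is to sample max-dirty-percent periodically after the restart and compare the first and last readings. The sketch below does only the comparison; how the samples are collected is left to the caller. The function name dirty_trend is hypothetical, and comparing endpoints is a deliberate simplification that ignores noise in between.

```shell
# Classify a series of max-dirty-percent samples (one number per line, oldest first).
# Prints "falling" if the last sample is below the first, "rising" if above, else "flat".
# Returns nonzero and prints nothing if no samples were given.
dirty_trend() {
  awk '
    NR == 1 { first = $1 + 0 }
            { last = $1 + 0 }
    END {
      if (NR == 0) exit 1
      verdict = (last < first) ? "falling" : ((last > first) ? "rising" : "flat")
      print verdict
    }
  '
}

# Samples taken a few minutes apart after a restart.
printf '%s\n' 72 70 61 | dirty_trend
```

A "falling" result together with a DeadThreadCount of zero suggests the cleaner has recovered; "rising" after a restart points toward a repeat crash or a backlog.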

Remove the corrupt segment

Destructive: stop the broker before moving or deleting any segment file. When logs show the cleaner crashing repeatedly on the same offset, that segment likely contains a corrupt record. Capture the segment filename from the log, stop the broker, move the segment’s .log, .index, and .timeindex files out of the partition directory, then restart. With the files gone, the cleaner no longer encounters the corrupt record. Any offset commits stored only in that segment are lost; affected consumer groups may need to reset to a valid offset, so be prepared to re-commit offsets or accept reprocessing.
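
The file move can be sketched as a shell function that relocates one segment's files into a quarantine directory instead of deleting them, so they remain available for inspection. The function name, argument layout, and quarantine location are hypothetical, and the sketch assumes the broker is already stopped; it performs no check of that itself.

```shell
# Move one segment's files out of a partition directory. Run only with the broker stopped.
#   $1: partition directory under log.dirs (site-specific path)
#   $2: segment base name without extension, taken from the cleaner's stack trace
#   $3: quarantine directory outside log.dirs (hypothetical location)
quarantine_segment() {
  part_dir=$1
  base=$2
  dest=$3
  if [ ! -f "$part_dir/$base.log" ]; then
    echo "no such segment: $part_dir/$base.log" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  for ext in log index timeindex; do
    # Index files may be absent; move whichever exist.
    if [ -e "$part_dir/$base.$ext" ]; then
      mv -- "$part_dir/$base.$ext" "$dest/" || return 1
    fi
  done
  return 0
}
```

Keeping the files rather than deleting them also leaves a way back if the wrong segment was identified: with the broker stopped again, the files can be moved back into place.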

Increase cleaner throughput

If the cleaner is alive but cannot keep up, raise log.cleaner.threads above its default of 1. Ensure log.cleaner.dedupe.buffer.size is large enough for the unique keys in __consumer_offsets. These settings consume additional CPU and heap; monitor RequestHandlerAvgIdlePercent and JVM GC pause times after changing them.
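
As a rough guide to what "large enough" means, the sketch below estimates a buffer size from a target number of unique keys. It rests on assumptions worth verifying for your Kafka version: 24 bytes per offset-map entry, a load factor of 0.9 (the default of log.cleaner.io.buffer.load.factor), and the buffer being divided evenly among cleaner threads. The function name is hypothetical and the result is an estimate, not a tuning recommendation.

```shell
# Estimate log.cleaner.dedupe.buffer.size (bytes) so that each cleaner thread
# can hold a given number of unique keys in its offset map.
#   $1: unique keys to fit per thread
#   $2: value of log.cleaner.threads
# Assumptions: 24 bytes per entry, 0.9 load factor, buffer split evenly across threads.
dedupe_buffer_bytes() {
  keys=$1
  threads=$2
  per_thread=$(( (keys * 240 + 8) / 9 ))   # ceil(keys * 24 / 0.9) in integer arithmetic
  echo $(( per_thread * threads ))
}

# One million keys per thread with two cleaner threads.
dedupe_buffer_bytes 1000000 2
```

Because the buffer is shared, raising log.cleaner.threads without also raising the buffer size shrinks the map available to each thread.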

Lower offsets retention

If the topic is large due to thousands of expired groups, reduce offsets.retention.minutes. The cleaner then removes inactive-group offsets sooner. The tradeoff is that consumers restarting after a long idle period lose their offsets and fall back to auto.offset.reset.
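
Before changing the setting, it helps to know what is currently in effect. The sketch below reads a broker properties file and prints the retention in whole days. The function name is hypothetical, and it assumes that when the key is absent the broker default of 10080 minutes (7 days in Kafka 2.0 and later) applies; older brokers used a shorter default.

```shell
# Print the configured offsets.retention.minutes from a properties file, in whole days.
#   $1: path to the broker's server.properties
# Assumption: if the key is absent, the default of 10080 minutes (7 days) applies.
offsets_retention_days() {
  minutes=$(sed -n 's/^[[:space:]]*offsets\.retention\.minutes[[:space:]]*=[[:space:]]*\([0-9]*\).*$/\1/p' "$1" | tail -n 1)
  echo $(( ${minutes:-10080} / 1440 ))   # integer division; partial days are truncated
}

# Example (path follows the quick checks; adjust for your installation):
# offsets_retention_days /etc/kafka/server.properties
```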

Prevention

  • Monitor max-dirty-percent and DeadThreadCount. Alert when max-dirty-percent stays above the percentage equivalent of log.cleaner.min.cleanable.ratio (50 for the default of 0.5) or when DeadThreadCount becomes nonzero.
  • Alert on anomalous disk growth for compacted topics. Stable BytesInPerSec with growing disk on a broker hosting __consumer_offsets is a leading indicator.
  • Audit consumer groups regularly. Remove dead groups to limit unique keys.
  • Size cleaner resources for peak commit load. Do not leave log.cleaner.threads at 1 on clusters with high group churn or many compacted topics.
  • Keep offsets.retention.minutes aligned with actual consumer behavior. Excessively long retention bloats the key space and increases cleaner memory pressure.

How Netdata helps

  • Correlate per-broker disk utilization with BytesInPerSec to rule out traffic spikes.
  • Surface JVM heap utilization and GC pause patterns that precede cleaner OOM failures.
  • Track OS-level disk I/O latency (await) to differentiate compaction backlog from disk saturation.
  • Expose JMX metrics such as max-dirty-percent and DeadThreadCount without manual jmxterm queries.
  • Alert on log directory growth rate anomalies for compacted topics before disks fill and the broker marks the directory offline.