Kafka __consumer_offsets growing huge: compaction failure on the offsets topic
One broker’s disk is climbing faster than its peers, or __consumer_offsets has grown to multiple gigabytes while producer traffic is flat. This internal topic is compacted by default; its size should stay roughly proportional to active consumer groups and partitions. Unbounded growth means the log cleaner has stalled or crashed. The failure is silent: producers and consumers keep working, but every offset commit appends a record that compaction will never remove. Growth is not evenly distributed: __consumer_offsets is partitioned by group.id hash, so a stalled cleaner on one broker affects only the partitions in that broker’s log.dirs. Eventually the log directory fills, and the broker may mark it offline.
What this means
__consumer_offsets stores committed offsets as key-value pairs: the key is {group, topic, partition} and the value is the offset. Its cleanup policy is compact, so Kafka should retain only the latest value per key. The log cleaner thread scans closed segments, builds an offset map for each key in the dirty range, and rewrites segments to remove obsolete records. Cleaned segments are swapped into the log atomically.
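To make the "latest value per key" rule concrete, here is a toy model of what compaction does to a stream of commits. It is only an illustration, not Kafka's cleaner: the one-pair-per-line input format and the function name compact_commits are assumptions made for this sketch.

```bash
# Toy model of log compaction: keep only the newest record for each key.
# Input (stdin): one "key value" pair per line, oldest first (hypothetical format).
# Output: the surviving records, still in their original relative order.
compact_commits() {
  awk '
    { key[NR] = $1; rec[NR] = $0; newest[$1] = NR }   # track the last line seen per key
    END {
      for (i = 1; i <= NR; i++)
        if (newest[key[i]] == i) print rec[i]          # emit a record only if nothing newer exists
    }'
}
```

Feeding it three commits for one group and one for another leaves two lines. With a dead cleaner, all four would stay on disk, and every further commit would add one more.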
If the cleaner thread dies, compaction stops. Offset commits continue because the Group Coordinator still appends to the active segment, turning the topic into an append-only log. The bloated partition resides on a specific broker’s log directory, so that broker’s disk diverges from its peers while the cluster otherwise looks healthy. Because compaction is a local operation, other brokers hosting the same topic are unaffected unless they also have dead cleaners.
Before calling it a compaction failure, rule out legitimate bloat. A surge in consumer groups or an abnormally long offsets.retention.minutes enlarges the baseline, because compaction must retain the latest record for every active key. A dead cleaner produces uncleanable growth that outpaces the actual number of active groups.
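One rough way to separate legitimate bloat from uncleanable growth is to compare on-disk size with a back-of-envelope compacted baseline. The sketch below does only that arithmetic. The function name bloat_factor is hypothetical, and the average record size is an operator-supplied assumption, not a figure Kafka reports.

```bash
# bloat_factor SIZE_BYTES LIVE_KEYS AVG_RECORD_BYTES
# Ratio of actual size to an estimated compacted baseline (keys * avg bytes).
# AVG_RECORD_BYTES is an assumption you supply; measure it on your own cluster.
bloat_factor() {
  awk -v size="$1" -v keys="$2" -v avg="$3" 'BEGIN {
    if (keys <= 0 || avg <= 0) { print "n/a"; exit }   # no meaningful baseline
    printf "%.1f\n", size / (keys * avg)
  }'
}
```

For example, bloat_factor 5000000000 20000 250 prints 1000.0. A healthy cleaner still leaves some dirty data and never compacts the active segment, so small multiples are expected. A factor in the hundreds or thousands points at compaction rather than key cardinality.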
flowchart TD
A[Offset commits append to __consumer_offsets] --> B{Log cleaner healthy?}
B -->|Yes| C[Compaction retains latest offset per key]
B -->|No| D[Dirty ratio climbs]
D --> E[Topic grows without bound]
E --> F[Broker log dir fills]
F --> G[Disk pressure or offline log directory]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
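| Dead log cleaner thread | max-dirty-percent climbs above 50 (the gauge is an integer percentage, so 50 corresponds to the default log.cleaner.min.cleanable.ratio of 0.5) and DeadThreadCount is nonzero; broker logs contain cleaner ERROR lines | JMX kafka.log:type=LogCleaner,name=DeadThreadCount |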
| Corrupt segment that re-crashes the cleaner | Broker restarts temporarily reduce disk growth, then the cleaner dies again at the same offset | Logs for repeated cleaner exceptions at a specific offset or segment |
| High cardinality of active consumer groups | Topic is large but max-dirty-percent is low and stable; group count recently increased | kafka-consumer-groups.sh --list |
| Expired groups retained by long offset retention | Slow growth over weeks; many inactive groups still have stored offsets | offsets.retention.minutes and the ratio of active to total groups |
Quick checks
# Check log cleaner dirty ratio (requires jmxterm; -n prevents interactive hang)
# The gauge is an integer percentage (0-100). A value above 50 (ratio 0.5) that keeps climbing indicates the cleaner is not processing segments.
echo "get -b kafka.log:type=LogCleanerManager,name=max-dirty-percent Value" | java -jar jmxterm.jar -n -l localhost:9999
# Check for dead cleaner threads
# Nonzero means a cleaner thread crashed and was not restarted.
echo "get -b kafka.log:type=LogCleaner,name=DeadThreadCount Value" | java -jar jmxterm.jar -n -l localhost:9999
# Inspect __consumer_offsets size per log directory
# Look for one broker whose partition size is significantly larger than its replicas.
kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe | grep -A2 -B2 __consumer_offsets
# Search broker logs for cleaner crashes
# Focus on ERROR lines that mention a specific segment filename or offset.
grep -i "cleaner\|compaction" /var/log/kafka/server.log
# Count consumer groups to assess key cardinality
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list | wc -l
# Check disk utilization for each log directory
grep '^log.dirs=' /etc/kafka/server.properties | cut -d'=' -f2 | tr ',' '\n' | while read -r d; do df -h "$d"; done
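The grep over kafka-log-dirs.sh output is hard to read on large clusters, so a small helper can total __consumer_offsets bytes per broker instead. This is a sketch under assumptions. It expects the tool's single-line JSON, with each "partition" field immediately followed by "size", and skips non-JSON header lines. Field order can differ between Kafka versions, so check a sample of your own output first. A real JSON parser is more robust if one is available. The function name is hypothetical.

```bash
# offsets_bytes_per_broker: read kafka-log-dirs.sh --describe output on stdin
# and print "<broker id> <total __consumer_offsets bytes>" per broker.
# Assumes "partition" is directly followed by "size" in each partition object.
offsets_bytes_per_broker() {
  awk '
    /^[{]/ {                                   # only the JSON line, not the header text
      n = split($0, chunk, /"broker":/)        # chunk[2..n] each start with a broker id
      for (i = 2; i <= n; i++) {
        id = chunk[i]; sub(/[^0-9].*$/, "", id)
        total = 0; rest = chunk[i]
        while (match(rest, /"partition":"__consumer_offsets-[0-9]+","size":[0-9]+/)) {
          rec = substr(rest, RSTART, RLENGTH)
          sub(/^.*"size":/, "", rec)           # keep just the byte count
          total += rec
          rest = substr(rest, RSTART + RLENGTH)
        }
        printf "%s %.0f\n", id, total
      }
    }'
}
```

Pipe the describe command from the quick checks into it. A broker whose total is far above its peers is the one to inspect first.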
How to diagnose it
1. Confirm the bloated partition. kafka-log-dirs.sh --describe lists size per partition per log directory. If one broker shows a much larger __consumer_offsets partition than the other replicas of that partition, the cleaner on that broker is the suspect. If multiple brokers show large sizes, check whether the group count is actually high or whether the cluster has a widespread cleaner failure.
2. Check cleaner health via JMX. max-dirty-percent steadily climbing above log.cleaner.min.cleanable.ratio (default 0.5, which reads as 50 on this percentage gauge) means the cleaner is not keeping up. A flat but high dirty ratio can mean the cleaner is alive but backlogged. Confirm with DeadThreadCount; nonzero means a thread has died.
3. Inspect broker logs for the crash signature. Look for ERROR lines mentioning LogCleaner or compaction. Note the offset or segment file in the stack trace. CorruptRecordException or IllegalStateException during segment conversion indicates a bad record. If the exception names a specific .log file, record the full path; you will need it for recovery.
4. Rule out traffic-driven growth. Compare BytesInPerSec for __consumer_offsets against its historical baseline. Flat traffic with growing disk means compaction has failed. A spike in commit rate from consumer group rebalances or flapping consumers can also bloat the topic temporarily; if the cleaner is healthy, it will catch up.
5. Rule out retention misconfiguration. Check offsets.retention.minutes. Expired groups leave orphaned records, but that produces gradual growth, not the runaway growth of a dead cleaner. If retention is already short, orphaned records are not the root cause.
6. Determine if a restart will hold. Transient OOM or one-off format errors recover after a broker restart. Watch the dirty ratio for 10-15 minutes after startup. If the crash references the same segment offset every time, the record is corrupt and the cleaner will die again until the segment is removed.
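The JMX part of this triage reduces to a small decision rule, sketched below. The function name and the three verdict strings are hypothetical. The default threshold of 50 assumes max-dirty-percent is an integer percentage and that log.cleaner.min.cleanable.ratio is at its default of 0.5.

```bash
# classify_cleaner DIRTY_PERCENT DEAD_THREADS [THRESHOLD_PERCENT]
# Prints a coarse verdict from the two JMX readings.
classify_cleaner() {
  dirty=$1 dead=$2 threshold=${3:-50}   # 50 = default cleanable ratio 0.5 as a percentage
  if [ "$dead" -gt 0 ]; then
    echo "dead-cleaner"                 # a cleaner thread has crashed
  elif [ "$dirty" -gt "$threshold" ]; then
    echo "backlogged"                   # alive, but not keeping up
  else
    echo "healthy-cleaner"              # look at group count and retention instead
  fi
}
```

For instance, classify_cleaner 80 1 prints dead-cleaner, while classify_cleaner 20 0 prints healthy-cleaner. A single sample says nothing about trend, so repeat the reading before acting on a backlogged verdict.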
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| kafka.log:type=LogCleanerManager,name=max-dirty-percent | Dirty percentage of the dirtiest compacted log waiting to be cleaned. A rising value means the cleaner cannot keep up with the commit rate. | Sustained value above log.cleaner.min.cleanable.ratio (default 0.5, which reads as 50 on this gauge) |
| kafka.log:type=LogCleaner,name=DeadThreadCount | Cleaner threads that have crashed and not restarted. Even one dead thread halts compaction for the partitions that thread was responsible for. | Any value above zero |
| Disk utilization on log.dirs volumes | Uncleaned segments consume disk without bound because obsolete records are never removed. | Steady growth on a broker hosting compacted topics while traffic is flat |
| kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=__consumer_offsets | Separates compaction failure from a traffic surge caused by consumer group rebalances or flapping consumers. | Normal or flat ingest rate alongside growing disk usage |
| Consumer group count | Determines whether the topic baseline is reasonably large. Each unique group-partition key adds one retained record after compaction. | Group count is flat but topic size grows |
| kafka.log:type=LogManager,name=OfflineLogDirectoryCount | Consequence if the disk fills completely and the broker marks the directory offline. | Any value above zero |
Fixes
Restart the broker
Warning: Restarting a broker triggers leader election and under-replicated partitions. Plan for transient client latency and monitor UnderReplicatedPartitions during the restart.
If the cleaner died from a transient error, a broker restart revives the cleaner thread. Watch DeadThreadCount and max-dirty-percent for 10-15 minutes. If the dirty ratio falls and no new dead threads appear, the cleaner has recovered.
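Watching the dirty ratio after a restart means comparing a series of readings. The helper below is a minimal sketch of that comparison. The function name is hypothetical, it looks only at the first and last sample, and it expects integer percentages such as those returned by the max-dirty-percent query.

```bash
# dirty_trend SAMPLE SAMPLE [SAMPLE...]
# Prints "falling", "rising", or "flat" by comparing the first and last sample.
dirty_trend() {
  [ "$#" -ge 2 ] || { echo "need at least two samples" >&2; return 2; }
  first=$1
  for last in "$@"; do :; done          # after the loop, $last is the final argument
  if [ "$last" -lt "$first" ]; then
    echo "falling"
  elif [ "$last" -gt "$first" ]; then
    echo "rising"
  else
    echo "flat"
  fi
}
```

Collect one reading a minute for the 10-15 minute window and pass them in order, for example dirty_trend 72 70 61 55. A falling result together with a DeadThreadCount of zero matches the recovery described above.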
Remove the corrupt segment
Warning: Destructive. Stop the broker before moving or deleting any segment file.
When logs show the cleaner crashing repeatedly on the same offset, that segment likely contains a corrupt record. Moving the .log, .index, and .timeindex files out of the partition directory forces the cleaner to skip the segment. Any offset commits stored only in that segment are lost; affected consumer groups may need to reset to a valid offset. Capture the segment filename from the log, stop the broker, move the files, then restart. Be prepared to re-commit offsets or accept reprocessing.
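The capture and move steps can be sketched as two small helpers. Both function names are hypothetical. The first assumes the segment filename (a 20-digit base offset plus .log) appears on the same line as the cleaner ERROR. That may not hold for every log layout, because stack traces often span several lines. The second moves files aside rather than deleting them, so the segment can be restored or examined later.

```bash
# crashing_segments: read broker log text on stdin and count how often each
# segment file is named in cleaner ERROR lines (most frequent first).
crashing_segments() {
  grep -i 'cleaner' | grep 'ERROR' |
    grep -oE '[0-9]{20}\.log' | sort | uniq -c | sort -rn
}

# quarantine_segment PARTITION_DIR BASE_OFFSET DEST_DIR
# Move one segment and its indexes out of the partition directory.
# Run only while the broker is stopped.
quarantine_segment() {
  part_dir=$1 base=$2 dest=$3
  mkdir -p "$dest" || return 1
  for ext in log index timeindex; do
    f="$part_dir/$base.$ext"
    if [ -e "$f" ]; then
      mv -- "$f" "$dest/" || return 1   # stop at the first failed move
    fi
  done
}
```

A segment that tops the count across several restarts is the likely culprit. Keep the quarantine directory on a different volume if the log directory is nearly full.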
Increase cleaner throughput
If the cleaner is alive but cannot keep up, raise log.cleaner.threads above its default of 1. Ensure log.cleaner.dedupe.buffer.size is large enough for the unique keys in __consumer_offsets. These settings consume additional CPU and heap; monitor RequestHandlerAvgIdlePercent and JVM GC pause times after changing them.
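To judge whether the dedupe buffer is "large enough", a rough capacity estimate helps. The sketch below makes three assumptions: the buffer is shared evenly across cleaner threads, each offset-map entry takes 24 bytes, and the load factor is 0.9 (the default for log.cleaner.io.buffer.load.factor). Confirm these against the documentation for your Kafka version. The function name is hypothetical.

```bash
# dedupe_capacity BUFFER_BYTES THREADS
# Approximate unique keys one cleaner thread can index in a single pass.
# Assumes 24 bytes per entry and a 0.9 load factor (integer arithmetic, rounds down).
dedupe_capacity() {
  buffer=$1 threads=$2
  [ "$threads" -gt 0 ] || { echo "threads must be positive" >&2; return 2; }
  echo $(( buffer / threads * 9 / 10 / 24 ))
}
```

With a 128 MiB buffer, dedupe_capacity 134217728 1 gives about 5 million keys, and splitting the same buffer across two threads halves that per thread. When raising log.cleaner.threads, consider raising the buffer in step so per-thread capacity does not shrink.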
Lower offsets retention
If the topic is large due to thousands of expired groups, reduce offsets.retention.minutes. The group coordinator then expires inactive-group offsets sooner by writing tombstones, and a working cleaner removes them. The tradeoff is that consumers restarting after a long idle period lose their offsets and fall back to auto.offset.reset.
Prevention
- Monitor max-dirty-percent and DeadThreadCount. Alert when dirty ratio climbs above log.cleaner.min.cleanable.ratio or DeadThreadCount becomes nonzero.
- Alert on anomalous disk growth for compacted topics. Stable BytesInPerSec with growing disk on a broker hosting __consumer_offsets is a leading indicator.
- Audit consumer groups regularly. Remove dead groups to limit unique keys.
- Size cleaner resources for peak commit load. Do not leave log.cleaner.threads at 1 on clusters with high group churn or many compacted topics.
- Keep offsets.retention.minutes aligned with actual consumer behavior. Excessively long retention bloats the key space and increases cleaner memory pressure.
How Netdata helps
- Correlate per-broker disk utilization with BytesInPerSec to rule out traffic spikes.
- Surface JVM heap utilization and GC pause patterns that precede cleaner OOM failures.
- Track OS-level disk I/O latency (await) to differentiate compaction backlog from disk saturation.
- Expose JMX metrics such as max-dirty-percent and DeadThreadCount without manual jmxterm queries.
- Alert on log directory growth rate anomalies for compacted topics before disks fill and the broker marks the directory offline.
Related guides
- How Kafka actually works in production: a mental model for operators
- Kafka enable.auto.commit data loss: committed offsets that outrun processing
- Kafka broker out of disk: log.dirs full, the cliff-edge shutdown, and recovery
- Kafka CommitFailedException: rebalanced-out consumers and poll loop timeouts
- Kafka consumer group stuck Empty or Dead: no members consuming
- Kafka consumer group lag growing: detection, lag-as-time, and root causes
- Kafka consumer group rebalancing too often: heartbeats, session timeout, and assignors
- Kafka consumer rebalance storm: stuck in PreparingRebalance and max.poll.interval.ms
- Kafka controller event queue backing up: overwhelmed controller and stalled metadata
- Kafka disk I/O latency high: await, LocalTimeMs, and the slow-disk broker
- Kafka disk space planning: retention, replication, and runway estimation
- Kafka fetch request latency high: FetchConsumer vs FetchFollower and page cache misses







