Kafka disk space planning: retention, replication, and runway estimation

When a Kafka broker exhausts a log.dirs volume, the directory goes offline. If it is the only volume, the broker halts. Partitions on that volume become unavailable and the broker cannot rejoin until space is freed. Disk planning is a continuous operational calculation, not a one-time purchase. You need a working estimate of steady-state usage, a runway model against an operational full threshold, and an explicit map of the retention rules that actually delete data.

What steady-state usage looks like

At the cluster level, Kafka stores every produced byte replication.factor times and keeps it for the effective retention window. Per-broker steady state, assuming even partition distribution, is:

steady_state_bytes_per_broker = (bytes_in_per_sec × retention_seconds × replication_factor) / broker_count

Use the compressed producer ingress rate for bytes_in_per_sec. The broker-level kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec counts bytes produced by clients to partitions that broker leads; replication traffic received as a follower is reported separately as ReplicationBytesInPerSec. For capacity planning, sum BytesInPerSec across all brokers (or across all topics) to get total cluster producer ingress, then apply the formula above. The result is an average; heavy partition skew or large compacted topics can push individual brokers above the mean, so size the worst-case broker, not the mean.
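As a worked example of the steady-state formula, the following sketch plugs in illustrative numbers. The function name, the skew_factor parameter, and the example workload are assumptions made for illustration; skew_factor is a crude multiplier for sizing the hottest broker rather than the mean.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

def steady_state_bytes_per_broker(bytes_in_per_sec, retention_seconds,
                                  replication_factor, broker_count,
                                  skew_factor=1.0):
    """Mean per-broker steady-state usage, scaled by an assumed skew factor."""
    if broker_count <= 0:
        raise ValueError("broker_count must be positive")
    mean = (bytes_in_per_sec * retention_seconds * replication_factor) / broker_count
    return mean * skew_factor

# Hypothetical workload: 50 MiB/s compressed ingress, 7 days, RF 3, 6 brokers.
mean_bytes = steady_state_bytes_per_broker(50 * MIB, 7 * 86400, 3, 6)
# Same workload, assuming the hottest broker carries 30% more than the mean.
worst_bytes = steady_state_bytes_per_broker(50 * MIB, 7 * 86400, 3, 6, skew_factor=1.3)
print(f"mean: {mean_bytes / TIB:.2f} TiB, worst case: {worst_bytes / TIB:.2f} TiB")
```

The skew factor is best derived from observed per-broker usage rather than guessed.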

The formula also assumes retention is actually reclaiming data. If retention is misconfigured, the topic is compacted, or the log cleaner has stalled, disk usage grows at the net rate bytes_in - bytes_deleted_by_retention, and that net rate, not raw ingress, determines how quickly the volume fills.

Segments roll at log.segment.bytes (default 1 GiB; segment.bytes at the topic level) or log.roll.ms / log.roll.hours (default 7 days; segment.ms at the topic level), whichever comes first. Retention only deletes closed segments, never the active one. Expect every partition to hold up to one segment's worth of data beyond the retention cap, plus a short delay while eligible segments wait for log.segment.delete.delay.ms (default 1 minute) before they are removed from disk.
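A rough way to bound that overshoot is to estimate the largest segment a partition can accumulate before it rolls. The sketch below is illustrative arithmetic only; the function name and the example write rates are assumptions, not Kafka APIs.

```python
GIB = 1024 ** 3
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000

def max_segment_bytes(partition_bytes_per_sec, segment_bytes=GIB,
                      segment_ms=SEVEN_DAYS_MS):
    """Upper bound on one segment: it rolls at the size or time limit, whichever is first."""
    bytes_before_time_roll = partition_bytes_per_sec * segment_ms / 1000
    return min(segment_bytes, bytes_before_time_roll)

busy = max_segment_bytes(1024 ** 2)  # 1 MiB/s partition: the size limit rolls first
quiet = max_segment_bytes(1024)      # 1 KiB/s partition: the time limit rolls first
print(f"busy: {busy / GIB:.2f} GiB, quiet: {quiet / GIB:.2f} GiB")
```

Multiplying the per-partition bound by the number of partition replicas on a broker gives a conservative allowance for data that retention cannot yet delete.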

The retention trap: time and size are independent

retention.ms and retention.bytes are independent stop conditions. Whichever fires first removes the segment. Setting both does not mean “keep data until both limits are reached”; it means “delete the segment as soon as either limit is reached.” Topic-level retention configs override broker-level log.retention.*. The following command lists a topic's explicit overrides; appending --all (Kafka 2.5 and later) also shows the values inherited from broker defaults:

kafka-configs.sh --bootstrap-server localhost:9092 --describe --topic my-topic

This is a common source of surprise. A team sets retention.bytes to 500 GiB and retention.ms to 7 days, expecting 7 days of data. If a partition crosses 500 GiB on day 3, Kafka deletes its oldest segments at the next retention check and only about 3 days remain. Conversely, if write volume is low, the topic can sit well under 500 GiB while still deleting data at day 7. Note that retention.bytes applies per partition, not per topic: a 12-partition topic with retention.bytes set to 500 GiB can hold up to 6 TiB per replica. Plan against the stricter boundary for your workload.
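To see which limit binds for a given partition, compare how long each one lets data live at the partition's write rate. This is a minimal sketch with a hypothetical function name and example rates; it ignores segment granularity, so real deletion happens in whole-segment steps.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
SEVEN_DAYS_MS = 7 * 86400 * 1000

def binding_retention_limit(partition_bytes_per_sec, retention_ms, retention_bytes):
    """Return (limit_name, retained_seconds) for one partition replica.

    A value of -1 disables that limit, matching Kafka's convention.
    """
    if partition_bytes_per_sec <= 0:
        raise ValueError("partition_bytes_per_sec must be positive")
    candidates = []
    if retention_ms >= 0:
        candidates.append(("retention.ms", retention_ms / 1000))
    if retention_bytes >= 0:
        candidates.append(("retention.bytes", retention_bytes / partition_bytes_per_sec))
    if not candidates:
        return ("none", float("inf"))
    # The limit that allows the shortest lifetime is the one that fires first.
    return min(candidates, key=lambda c: c[1])

busy = binding_retention_limit(2 * MIB, SEVEN_DAYS_MS, 500 * GIB)
quiet = binding_retention_limit(100 * 1024, SEVEN_DAYS_MS, 500 * GIB)
print(busy[0], round(busy[1] / 86400, 2), "days")
print(quiet[0], round(quiet[1] / 86400, 2), "days")
```

At 2 MiB/s the size limit fires just before day 3; at 100 KiB/s the time limit fires at day 7 with the partition far below 500 GiB.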

Pure compact topics ignore retention.ms and retention.bytes. Their size is driven by the number of unique keys and the average value size. A compacted topic with a billion unique keys and 1 KiB messages approaches 1 TiB per replica regardless of retention settings. __consumer_offsets is compacted by default and is a common silent disk consumer. If a topic uses cleanup.policy=compact,delete, both compaction and retention apply.
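The billion-key figure can be checked with simple arithmetic. The sketch below uses a hypothetical helper and estimates only the fully compacted floor; it ignores per-record overhead, compression, tombstones, and the not-yet-cleaned portion of the log, all of which shift the real number.

```python
TIB = 1024 ** 4

def compacted_floor_bytes(unique_keys, avg_record_bytes, replication_factor=1):
    """Approximate size once every key is reduced to its latest record."""
    return unique_keys * avg_record_bytes * replication_factor

per_replica = compacted_floor_bytes(1_000_000_000, 1024)
cluster_wide = compacted_floor_bytes(1_000_000_000, 1024, replication_factor=3)
print(f"per replica: {per_replica / TIB:.2f} TiB, cluster-wide at RF 3: {cluster_wide / TIB:.2f} TiB")
```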

Runway and the 85% threshold

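The operational full line is 85%, not 100%. Worst-case runway in days assumes retention reclaims nothing, and it requires capacity, usage, and ingress to be measured at the same scope (cluster-wide totals, or a single broker with bytes_in_per_sec divided by broker_count):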

runway_days = (disk_capacity × 0.85 - current_used)
              / (bytes_in_per_sec × replication_factor × 86400)

If retention is working and the cluster is near steady state, the numerator should stay roughly flat and runway should remain long. If runway is shrinking, retention is not reclaiming space, ingress is growing, or both. Recompute runway after any change to retention, replication factor, partition reassignment, or burst traffic.
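A minimal sketch of the runway calculation follows. It takes the growth rate landing on the volume as an input so that either the worst-case gross rate or an observed net rate can be used; the function name and the example figures are assumptions for illustration.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

def runway_days(disk_capacity, current_used, growth_bytes_per_sec,
                full_threshold=0.85):
    """Days until usage reaches the operational full line at a constant growth rate."""
    headroom = disk_capacity * full_threshold - current_used
    if headroom <= 0:
        return 0.0           # already at or past the threshold
    if growth_bytes_per_sec <= 0:
        return float("inf")  # flat or shrinking usage never reaches it
    return headroom / (growth_bytes_per_sec * 86400)

# Hypothetical volume: 10 TiB capacity, 6 TiB used, growing at 20 MiB/s.
days = runway_days(10 * TIB, 6 * TIB, 20 * MIB)
print(f"runway: {days:.2f} days")
```

Feeding the function a growth rate measured from recent disk usage samples, rather than raw ingress, gives the net view that reflects whether retention is keeping up.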

Do not wait until 90% to act. By 90% you have lost the margin to absorb a compaction pass, a reassignment, or a traffic spike without crossing the cliff.

Headroom for compaction, reassignment, and bursts

Keep at least 15-20% of each data volume free at all times. That headroom is not idle waste; it covers predictable operational events:

  • Compaction doubling. The log cleaner rewrites closed segments. Until compaction finishes, the old and new copies coexist on disk. Peak usage for a compacted partition can approach clean + dirty before the old segments are deleted.
  • Partition reassignment. kafka-reassign-partitions.sh copies replicas to the target broker before removing them from the source. The target broker needs enough space to hold both the incoming copies and its existing data until the reassignment completes.
  • Burst traffic. Producer batching, seasonal spikes, or failed consumers that stop advancing offsets can all cause a temporary step change in disk usage.
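Before a reassignment, it helps to check whether the target volume stays under the operational threshold while both its existing data and the incoming replicas are on disk. The following is a sketch with a hypothetical helper and example sizes; it treats the incoming replica bytes as known (for instance from current partition sizes) and ignores data written during the copy.

```python
GIB = 1024 ** 3

def reassignment_fits(target_capacity, target_used, incoming_bytes,
                      full_threshold=0.85):
    """Return (fits, peak_utilization) for a target volume during reassignment."""
    peak = target_used + incoming_bytes
    return peak <= target_capacity * full_threshold, peak / target_capacity

# Hypothetical 4 TiB target volume that already holds 2662 GiB.
too_big = reassignment_fits(4096 * GIB, 2662 * GIB, 1024 * GIB)
ok = reassignment_fits(4096 * GIB, 2662 * GIB, 512 * GIB)
print(too_big, ok)
```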

Because log.dirs can specify multiple directories, evaluate each volume independently. Kafka places each new partition in the directory that currently holds the fewest partitions, counting partitions rather than bytes, and existing segments do not move automatically. One disk can reach 85% while another sits at 50%, especially after reassignment or uneven topic growth.

Where the plan breaks in production

JBOD skew

With multiple log.dirs, per-broker aggregates hide per-disk reality. A single disk that hosts the largest partitions, the most compacted topics, or the highest-throughput partitions fills first. Leaders and followers store the same bytes, so leadership alone does not change disk usage. Monitor each mount point separately, not just the host average.
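A per-mount check can be sketched with the Python standard library. The paths passed in are placeholders for your own log.dirs mount points; the helper names are hypothetical, and the used/total ratio may differ slightly from what df reports on filesystems with reserved blocks.

```python
import shutil

def mount_utilization(paths):
    """Map each path to the used fraction of the filesystem that contains it."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        report[path] = usage.used / usage.total
    return report

def mounts_over(report, threshold=0.85):
    """Return the paths at or above the threshold, fullest first."""
    hot = [p for p, frac in report.items() if frac >= threshold]
    return sorted(hot, key=lambda p: report[p], reverse=True)

# "." stands in for real mount points such as the entries of log.dirs.
report = mount_utilization(["."])
for path, frac in report.items():
    print(f"{path}: {frac:.1%}")
print("over threshold:", mounts_over(report))
```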

Compacted topics and silent cleaner death

The log cleaner thread can die silently on a corrupt record or an out-of-memory condition, and it does not restart automatically. Disk usage for compacted topics then grows without bound. Watch kafka.log:type=LogCleanerManager,name=max-dirty-percent and kafka.log:type=LogCleaner,name=DeadThreadCount. A dirty percentage sustained above the level implied by log.cleaner.min.cleanable.ratio (default 0.5, that is, max-dirty-percent above 50) or any dead cleaner thread is a ticket.

Tiered storage local retention

With tiered storage enabled, local disk holds only recent segments controlled by local.retention.bytes and local.retention.ms. Plan local capacity around the local retention window plus upload lag, not the full topic retention. If uploads fall behind, local segments stay on disk longer and your local footprint expands. Track the lag between local generation and remote upload as a disk-sizing risk.
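The local sizing rule above can be written as the steady-state formula with the retention window replaced by local retention plus upload lag. This is a sketch under that assumption, with a hypothetical function name and example workload; it also assumes even partition distribution.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def local_bytes_per_broker(bytes_in_per_sec, local_retention_seconds,
                           upload_lag_seconds, replication_factor, broker_count):
    """Mean local-disk footprint per broker when older segments live in remote storage."""
    if broker_count <= 0:
        raise ValueError("broker_count must be positive")
    window = local_retention_seconds + upload_lag_seconds
    return bytes_in_per_sec * window * replication_factor / broker_count

# Hypothetical: 50 MiB/s ingress, 6 h local retention, RF 3, 6 brokers.
normal = local_bytes_per_broker(50 * MIB, 6 * 3600, 1 * 3600, 3, 6)
lagging = local_bytes_per_broker(50 * MIB, 6 * 3600, 12 * 3600, 3, 6)
print(f"1 h upload lag: {normal / GIB:.0f} GiB, 12 h upload lag: {lagging / GIB:.0f} GiB")
```

Comparing the two results shows how quickly an upload backlog erodes the savings that tiered storage was meant to provide.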

Retention not running

log.retention.check.interval.ms defaults to 5 minutes. Under heavy load, a lot of data can accumulate between checks. More importantly, if both retention.ms and retention.bytes are set but misunderstood, operators may think one policy protects them while the other has already deleted or retained data. Always verify observed disk usage against the stricter expected boundary.

Signals to watch in production

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Disk utilization per log.dirs mount | Kafka takes the directory offline near full | TICKET at 75%; PAGE at 90% or runway under 4 hours |
| BytesInPerSec (producer ingress) | Feeds the steady-state formula and runway model | Sustained growth that outpaces retention |
| kafka.log:type=LogManager,name=OfflineLogDirectoryCount | Confirms a volume has gone offline | Any nonzero value is a PAGE |
| kafka.log:type=LogCleanerManager,name=max-dirty-percent | Detects compaction falling behind | Sustained above 50% |
| kafka.log:type=LogCleaner,name=DeadThreadCount | Silent cleaner death causes unbounded compacted-topic growth | Any value above 0 |
| kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions | Disk pressure often precedes ISR shrink | Nonzero outside maintenance |
| kafka.server:type=ReplicaManager,name=ReassigningPartitions | Reassignment temporarily inflates disk usage | Coordinate capacity checks with reassignment windows |
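The disk and cleaner thresholds in the table can be combined into a single severity decision. The sketch below is one possible encoding, not a prescribed alerting policy: the function and parameter names are hypothetical, the inputs are assumed to be collected elsewhere, and a single sample cannot express "sustained", so the dirty-percent value should be an average over a window.

```python
def classify(utilization, runway_hours, offline_dirs=0,
             dead_cleaner_threads=0, max_dirty_percent=0.0):
    """Return "PAGE", "TICKET", or "OK" for one volume and its broker."""
    if offline_dirs > 0 or utilization >= 0.90 or runway_hours < 4:
        return "PAGE"
    if utilization >= 0.75 or dead_cleaner_threads > 0 or max_dirty_percent > 50:
        return "TICKET"
    return "OK"

print(classify(0.60, 240))                          # healthy volume
print(classify(0.78, 72))                           # past the ticket line
print(classify(0.70, 3))                            # short runway despite low utilization
print(classify(0.50, 500, dead_cleaner_threads=1))  # cleaner thread has died
```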

Check current log directory sizes with:

kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe

This is a read-only operation. Use the per-directory and per-partition sizes to identify a hot mount before the broker takes it offline.
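The tool prints a JSON document after a couple of status lines, which makes per-directory totals easy to compute. The sketch below assumes the JSON has the shape shown in sample_doc (brokers, then logDirs, then partitions with a size field); verify this against the output of your Kafka version. The sample data and function name are made up for illustration.

```python
import json
from collections import defaultdict

def sizes_by_log_dir(tool_output):
    """Sum partition sizes per (broker id, log directory) from kafka-log-dirs output."""
    # Skip the human-readable status lines and take the first line that looks like JSON.
    json_line = next(line for line in tool_output.splitlines()
                     if line.lstrip().startswith("{"))
    doc = json.loads(json_line)
    totals = defaultdict(int)
    for broker in doc["brokers"]:
        for log_dir in broker["logDirs"]:
            key = (broker["broker"], log_dir["logDir"])
            totals[key] += sum(p["size"] for p in log_dir["partitions"])
    return dict(totals)

# Fabricated sample in the assumed output shape.
sample_doc = {"version": 1, "brokers": [{"broker": 1, "logDirs": [
    {"logDir": "/data/kafka-a", "error": None, "partitions": [
        {"partition": "orders-0", "size": 1000, "offsetLag": 0, "isFuture": False},
        {"partition": "orders-1", "size": 500, "offsetLag": 0, "isFuture": False}]},
    {"logDir": "/data/kafka-b", "error": None, "partitions": [
        {"partition": "events-0", "size": 4000, "offsetLag": 0, "isFuture": False}]},
]}]}
sample_output = "Querying brokers for log directories information\n" + json.dumps(sample_doc)

totals = sizes_by_log_dir(sample_output)
for (broker_id, path), size in sorted(totals.items()):
    print(broker_id, path, size)
```

Dividing each total by the capacity of its mount gives the per-disk utilization that a broker-level average hides.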

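For compacted-topic health, search the broker logs for cleaner or compaction errors. The exact path depends on your packaging; common locations include /var/log/kafka/server.log, the systemd journal, or $KAFKA_HOME/logs/server.log. With the default log4j configuration shipped with Kafka, cleaner messages are written to a separate log-cleaner.log in the same directory, so check there as well.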

How Netdata helps

  • Correlates per-mount disk utilization with broker-level BytesInPerSec so you can distinguish normal retention oscillation from true growth.
  • Surfaces disk runway by combining volume capacity, current usage, and recent growth rate in one view.
  • Highlights UnderReplicatedPartitions alongside disk metrics to catch the early phase of disk-pressure-induced ISR shrink.
  • Tracks log cleaner dirty ratio and dead-thread signals where exposed, reducing the risk of silent compaction failure filling the disk.