Kafka network egress saturation: BytesOutPerSec, replication amplification, and fan-out
Kafka is usually sized for producer ingress because that is what the business reports. In practice, the first resource to saturate is almost always network egress. Every byte written to a partition leader is read at least once by each follower replica and once by every active consumer group keeping up with the log. A broker can hit its NIC ceiling while producer throughput looks comfortable and disk I/O is barely warmed up.
The degradation curve is gradual, then abrupt. Between roughly 50% and 70% of line rate, latency rises as kernel queues and TCP buffers fill. Above 70%, retransmits climb and effective throughput drops. Because both producers and consumers retry, the visible symptom is usually a latency spike or an ISR shrink, not a clean throughput plateau. Knowing how BytesOutPerSec is composed, where the amplification comes from, and which signals predict the cliff separates capacity planning from a 03:00 page.
What it is and why it matters
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec measures all data leaving the broker to serve fetch requests: follower replicas pulling to stay in sync and consumers reading for their groups. It does not include produce traffic, which is measured separately by BytesInPerSec.
The egress multiplier on a leader broker is approximately:
(replication_factor - 1) + number_of_active_consumer_groups
With replication.factor=3 and two consumer groups, every byte of ingress generates roughly four bytes of egress on the leader: two for the followers, two for the consumer groups. A third group pushes the multiplier to five. In clusters where many services read the same firehose topic, the NIC becomes the hard bottleneck long before disk or CPU.
Egress saturation is dangerous because replication and consumer traffic share the same network threads and NIC. When replication fetches stall due to network queuing, followers fall behind. If they exceed replica.lag.time.max.ms, the leader shrinks the ISR. Once durability is degraded, a second failure can push partitions below min.insync.replicas and block the write path entirely.
How it works
A partition leader handles two read workloads from the same log segments:
- Follower fetches. Each follower in the ISR pulls data from the leader. The leader emits
(RF-1)full copies of the log to keep replicas in sync. - Consumer fetches. Every active consumer group fetches independently. Four groups reading the same topic means four additional full copies from the leader.
Both request types flow through the broker’s network threads (num.network.threads, default 3) and are ideally served via zero-copy sendfile from the OS page cache to the socket. This avoids JVM heap copies and keeps CPU usage low.
Zero-copy is lost during message down-conversion. When an older client requires format translation, the broker materializes records on-heap, converts them, and sends the result. The metric kafka.server:type=BrokerTopicMetrics,name=MessageConversionsPerSec exposes this. Non-zero conversion volume during peak traffic drops effective network throughput because the same NIC capacity now costs significantly more CPU and memory.
TLS encryption also raises network thread utilization. TLS handshakes are CPU-intensive and run on network threads. In encrypted environments, the default thread count is often too low, causing NetworkProcessorAvgIdlePercent to drop even when BytesOutPerSec is below the NIC line rate.
The saturation curve follows TCP behavior. Below 50% of line rate, throughput is linear. Between 50% and 70%, latency rises as buffers fill. Sustained operation above 70% leaves no headroom for bursts, retransmits, or catch-up replication after a restart. The cliff arrives when queuing causes follower fetch timeouts, ISR shrinks, and producer retry storms that add load exactly when the broker is least able to handle it.
flowchart LR P[Producer] -->|BytesIn| L[Leader broker] L -->|Replication| F1[Follower replica] L -->|Replication| F2[Follower replica] L -->|Consumer fetch| C1[Consumer group A] L -->|Consumer fetch| C2[Consumer group B] L -->|BytesOut| N[NIC capacity]
Where it shows up in production
New consumer groups multiply egress instantly. Onboarding a service that reads an existing topic from latest offset adds a full copy of the read bandwidth to every leader. If the cluster was sized for two consumer groups, adding a third or fourth can push leaders from 50% to 80% NIC utilization without any change in producer traffic.
Backfill and reprocessing. A consumer resetting offsets to earliest, a new mirror pipeline, or a stream processing job rebuilding state reads historical data from disk. The temporary egress spike can saturate NICs and evict hot data from the page cache. Page cache thrashing from backfill consumers raises FetchConsumer LocalTimeMs and degrades latency for all consumers, including tail readers.
Cross-rack replication. In multi-rack or multi-AZ deployments, replication traffic traverses spine links. If inter-rack bandwidth is oversubscribed relative to broker NIC speed, replication becomes the bottleneck before the local interface is full. Rack-aware replication changes how you interpret ISR shrinks: a rack failure should shrink the ISR without losing availability, but if the network layer is the constraint, the remaining leaders may not serve followers fast enough to keep them in sync.
Down-conversion storms. Upgrading brokers before all clients can trigger MessageConversionsPerSec spikes. The same logical egress volume now flows through the JVM heap, increasing GC pressure and reducing the bytes the broker can push per second.
Rolling restarts and catch-up. When a broker restarts, its hosted replicas must catch up from leaders. The leaders serving that catch-up traffic see an egress burst proportional to the backlog. If the cluster was already above 60% sustained egress, the restart can tip it into saturation.
Tradeoffs and common misuses
Sizing NICs for ingress only. Budgeting disk and NIC for the producer throughput target ignores the fan-out multiplier. Size NICs for ingress × (RF-1 + expected_consumer_groups) and keep sustained egress below 70% of capacity.
Using a single listener for replication and clients. Inter-broker and client traffic should be separated onto different listeners or subnets. A consumer backfill or a misbehaving client can starve replication fetches. If replication starves, ISR shrinks follow, adding controller load and write-path latency.
Assuming zero-copy is always on. Monitor MessageConversionsPerSec. If it is non-zero during peak egress, effective NIC capacity is reduced because data is being copied through the heap.
Ignoring follower fetch load after enabling KIP-392. Redirecting consumer reads to follower replicas (using replica.selector.class, Kafka 2.4+) can isolate read traffic. However, follower distribution may be uneven, concentrating fetch load on specific brokers. If you move consumer reads to followers without monitoring per-broker BytesOutPerSec, you may simply move the bottleneck.
Throttling only producers during saturation. Kafka quotas can throttle consumer byte rates, which is useful for backfill isolation. Replication traffic, however, is not throttled by standard client quotas. If the root cause is too many consumer groups, throttling producers only delays the inevitable and increases write latency without solving the egress multiplier.
Monitoring only aggregate broker egress. BytesOutPerSec at the broker level hides which topic is driving saturation. Break it down by topic (topic=TOPIC in the MBean name) to identify a firehose topic before adding more consumer groups.
Signals to watch in production
| Signal | Why it matters | Warning sign |
|---|---|---|
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec | Direct measure of total egress. | Sustained >70% of NIC capacity. |
kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec | Baseline for the amplification ratio. | Per-broker ratio BytesOutPerSec / BytesInPerSec climbs above (RF-1 + known_consumer_groups). |
kafka.server:type=BrokerTopicMetrics,name=MessageConversionsPerSec | Indicates zero-copy is lost; egress becomes heap-bound and CPU-bound. | Sustained non-zero during peak traffic. |
kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer | Time spent pushing fetch responses onto the wire. | p99 increasing while BytesOutPerSec is flat or dropping (sign of kernel or NIC queuing). |
kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent | Network thread saturation. TLS increases thread cost. | Sustained below 0.3; with TLS, monitor for earlier drops. |
kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower | Leader’s ability to serve replication traffic. | p99 approaching replica.lag.time.max.ms (default 30s), which triggers ISR shrinks. |
OS TCP retransmit rate (/proc/net/snmp RetransSegs) | Packet loss and queuing invisible to Kafka JMX. | Per-second rate increasing above baseline; read the counter twice and diff. |
How Netdata helps
- Correlates per-broker
BytesOutPerSecandBytesInPerSecwith OS-level NIC utilization and TCP retransmit rates so you can see the amplification ratio and the hardware ceiling together. - Surfaces
MessageConversionsPerSecalongside CPU and network metrics to flag down-conversion overhead that pure network monitoring would miss. - Tracks
NetworkProcessorAvgIdlePercentandResponseSendTimeMsper broker to distinguish NIC saturation from network thread saturation caused by TLS or connection storms. - Uses per-second granularity to catch transient egress bursts from backfill consumers or replica catch-up that minutely averages would smooth over.
- Cross-broker heatmaps make hot spots from uneven leadership or follower fetch load visible, showing which broker will saturate first.
Related guides
- How Kafka actually works in production: a mental model for operators: /guides/kafka/how-kafka-works-in-production/
- Kafka enable.auto.commit data loss: committed offsets that outrun processing: /guides/kafka/kafka-auto-commit-silent-data-loss/
- Kafka ‘Broker may not be available’: clients that can’t connect or stay connected: /guides/kafka/kafka-broker-may-not-be-available/
- Kafka broker out of disk: log.dirs full, the cliff-edge shutdown, and recovery: /guides/kafka/kafka-broker-out-of-disk/
- Kafka CommitFailedException: rebalanced-out consumers and poll loop timeouts: /guides/kafka/kafka-commit-failed-exception/
- Kafka consumer group stuck Empty or Dead: no members consuming: /guides/kafka/kafka-consumer-group-empty-stuck/
- Kafka consumer group lag growing: detection, lag-as-time, and root causes: /guides/kafka/kafka-consumer-group-lag-growing/
- Kafka consumer group rebalancing too often: heartbeats, session timeout, and assignors: /guides/kafka/kafka-consumer-group-rebalancing-frequently/
- Kafka __consumer_offsets growing huge: compaction failure on the offsets topic: /guides/kafka/kafka-consumer-offsets-topic-growing/
- Kafka consumer rebalance storm: stuck in PreparingRebalance and max.poll.interval.ms: /guides/kafka/kafka-consumer-rebalance-storm/
- Kafka controller event queue backing up: overwhelmed controller and stalled metadata: /guides/kafka/kafka-controller-event-queue-backup/
- Kafka disk I/O latency high: await, LocalTimeMs, and the slow-disk broker: /guides/kafka/kafka-disk-io-latency-high/







