Redis cluster bus port blocked: the port+10000 firewall gotcha
CLUSTER INFO reports cluster_state:fail. Nodes show non-zero cluster_slots_pfail. Clients receive CLUSTERDOWN. Yet redis-cli -p 6379 PING returns PONG on every node, application connections are still accepted, and the client port shows no obvious network outage. The cluster behaves like it is partitioned, but only the bus is broken. Port 16379, or your configured client port plus 10000, is missing from a firewall rule, security group, or container port mapping. The cluster bus carries gossip, failure detection, and node discovery over this separate TCP port. When the bus is unreachable, nodes cannot synchronize the cluster map, so they mark peers as failed and withdraw slot coverage even though the data port stays healthy. Because firewall rules often cover the client port but omit the bus port, this failure mode is common after infrastructure changes, node replacements, or environment migrations.
flowchart TD
A[Bus port blocked] --> B[Heartbeat and gossip loss]
B --> C[Nodes mark peers PFAIL]
C --> D[Master quorum confirms FAIL]
D --> E[Slots lose coverage]
E --> F[cluster_state:fail]
F --> G[CLUSTERDOWN to clients]

What this means
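In Redis Cluster, every node binds a client command port (default 6379) and a cluster bus port. By default the bus port is the client port plus 10000 (16379 for 6379). Redis 7.0 and later can override it with the cluster-port directive, so check the configuration before assuming the offset. The bus carries heartbeat exchange, failure detection, slot migration coordination, and node discovery.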
If a host firewall, cloud security group, Kubernetes NetworkPolicy, or missing container mapping blocks the bus port, nodes on opposite sides of the block lose gossip connectivity. They continue to serve client traffic, but stop trusting each other. Without heartbeats, a node suspects its peer is dead and marks it PFAIL. If the majority of masters agree, the suspicion becomes FAIL. The failed node’s slots become unavailable, cluster_state transitions to fail, and clients receive CLUSTERDOWN. The root cause is not a crashed node or a true network partition. It is a missing allow rule for port+10000.
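The port arithmetic is simple enough to script when generating firewall rules or health checks. The following is a minimal sketch, assuming the default +10000 offset with no custom bus port configured; bus_port is a hypothetical helper name, not part of Redis or redis-cli.

```shell
# Hypothetical helper: derive the default cluster bus port from a client port.
# Assumes the default +10000 offset, i.e. no custom bus port is configured.
bus_port() {
  bp_client="$1"
  bp_bus=$(( bp_client + 10000 ))
  # TCP ports end at 65535, so a client port above 55535 has no valid default bus port.
  if [ "$bp_bus" -gt 65535 ]; then
    echo "client port ${bp_client} has no valid default bus port" >&2
    return 1
  fi
  echo "$bp_bus"
}
```

For example, bus_port 7000 prints 17000, which is the port every firewall layer must allow between nodes listening on 7000.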
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Cloud security group or VPC firewall missing port+10000 | Cross-AZ or cross-subnet nodes cannot form a healthy cluster; new nodes hang during join | Security group ingress and egress rules for 16379 between all node private IPs |
| Host-level firewall (iptables, firewalld, nftables) | A single node or rack appears isolated; CLUSTER INFO on the isolated node shows fewer known nodes | Local firewall rules and ss -tlnp for the bus port |
| Kubernetes Service or NetworkPolicy exposing only port 6379 | Pods restart and the cluster never re-forms; only the client port is routable | Service spec ports and NetworkPolicy ingress rules covering the bus port |
| Container runtime port mapping only forwarding 6379 | Nodes on different Docker hosts cannot gossip, but single-host clusters work fine | Container port mappings and host firewall between container hosts |
Quick checks
Run these safe, read-only commands to confirm the bus port is the problem.
# Verify the client port responds while cluster state is broken
redis-cli -p 6379 PING
# Check cluster state and gossip counters
redis-cli CLUSTER INFO
# Inspect node topology and flags
redis-cli CLUSTER NODES
# Verify Redis is listening on the bus port
ss -tlnp | grep ':16379'
# Test bus port reachability from a peer using bash TCP
# (10.0.0.2 is an example address; set PEER_IP to the peer's real IP)
PEER_IP=10.0.0.2
timeout 2 bash -c "</dev/tcp/${PEER_IP}/16379" && echo "open" || echo "blocked"
# Review local firewall rules for the bus port
sudo iptables -L -n | grep 16379
If CLUSTER INFO shows cluster_state:fail while ss -tlnp confirms Redis is listening on 16379, the process is healthy but something on the network path is dropping packets.
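CLUSTER NODES also reports the bus port each node announces: the address field has the form ip:port@cport, where cport is the bus port. The sketch below pulls that out so you know which port to test for each peer. It assumes that field layout (optionally followed by ,hostname on newer versions), and list_bus_ports is a hypothetical helper name.

```shell
# Hypothetical helper: read CLUSTER NODES output on stdin and print
# "ip client_port bus_port flags link_state" for each node.
list_bus_ports() {
  awk '{
    addr = $2
    sub(/,.*/, "", addr)            # drop an optional ",hostname" suffix
    n = split(addr, parts, "@")     # parts[1] = ip:port, parts[2] = bus port
    if (n < 2) next                 # skip lines without an @cport field
    ip = parts[1]; port = parts[1]
    sub(/:[0-9]+$/, "", ip)         # strip the trailing :port
    sub(/^.*:/, "", port)           # keep only the trailing port
    print ip, port, parts[2], $3, $8
  }'
}
```

A typical invocation would be redis-cli -p 6379 CLUSTER NODES | list_bus_ports. Nodes whose link state is disconnected are the first candidates for a blocked bus port.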
How to diagnose it
1. Collect cluster state from every node. Run redis-cli CLUSTER INFO on each node. Look for cluster_state:fail, non-zero cluster_slots_pfail, or non-zero cluster_slots_fail. Record cluster_stats_messages_sent and cluster_stats_messages_received.
2. Compare gossip asymmetry. If node A shows rising cluster_stats_messages_sent while node B shows flat or zero cluster_stats_messages_received, bus traffic is being dropped in one or both directions. These counters are totals across all peers, so the signal is clearest on a node that is cut off from every other node. This asymmetry is the hallmark of a firewall block.
3. Inspect node topology. Run redis-cli CLUSTER NODES. Look for nodes with failure flags or nodes that appear disconnected from the local view. If a node has been isolated long enough, it may be marked fail.
4. Verify local binding. Run ss -tlnp | grep ':16379' to confirm the redis-server process is listening on the bus port. If the port is not bound, check whether the node was started without cluster mode enabled or whether the port is already in use.
5. Test peer reachability. From each node, test connectivity to every other node's bus port using nc -z or the /dev/tcp bashism. If the client port (6379) connects but the bus port (16379) does not, the path is filtered.
6. Audit all network layers. Check host-level firewalls (iptables, nftables, firewalld), cloud security groups, VPC network ACLs, and container network policies. The most common mistake is an allow rule for 6379 that omits 16379.
7. Check for one-way rules. A firewall that allows outbound but not inbound on 16379, or vice versa, still breaks gossip. The bus port needs bidirectional reachability between every pair of nodes.
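The peer reachability step can be sketched as a small loop that probes both ports on every peer and labels the result. This is an illustration only: the function names are hypothetical, the ports assume the defaults 6379 and 16379, and the probe relies on bash's /dev/tcp and the coreutils timeout command being available.

```shell
# Hypothetical helpers for the reachability step; names are illustrative.

# Returns 0 if a TCP connection to host:port opens within 2 seconds.
probe_port() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Turns two probe results ("open" or "blocked") into a verdict.
classify_peer() {
  case "$1/$2" in
    open/open)    echo "ok" ;;
    open/blocked) echo "bus-blocked" ;;
    blocked/open) echo "client-blocked" ;;
    *)            echo "unreachable" ;;
  esac
}

# Probes the client and bus port of every peer IP passed as an argument.
check_peers() {
  for cp_peer in "$@"; do
    if probe_port "$cp_peer" 6379;  then cp_c=open; else cp_c=blocked; fi
    if probe_port "$cp_peer" 16379; then cp_b=open; else cp_b=blocked; fi
    echo "$cp_peer client=$cp_c bus=$cp_b verdict=$(classify_peer "$cp_c" "$cp_b")"
  done
}
```

Run check_peers with the other nodes' private IPs from each node in turn, since a one-way rule only shows up from one side. A refused connection and a silently dropped one both report blocked here, so pair the result with ss -tlnp on the target to tell a closed port from a filtered path.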
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| cluster_state | Binary indicator of cluster health | fail |
| cluster_slots_pfail | Slots on nodes suspected of failure | Non-zero and growing |
| cluster_slots_fail | Slots on confirmed-failed nodes | Non-zero |
| cluster_stats_messages_sent vs received | Asymmetry means gossip is one-way or dropped | Sent count rising while received is flat or zero |
| cluster_known_nodes | Whether the node sees the full topology | Fewer nodes than expected |
| connected_clients | Client port may still accept connections | Healthy count despite cluster failure |
A healthy client port alongside a failing cluster state strongly suggests the bus port is blocked rather than a total node failure.
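To sample these signals by hand, a single field can be extracted from CLUSTER INFO output. The sketch below assumes the key:value line format and strips carriage returns, which redis-cli output may carry at line ends; cluster_info_field is a hypothetical helper name.

```shell
# Hypothetical helper: print one field from CLUSTER INFO text read on stdin.
# Exits non-zero when the field is absent.
cluster_info_field() {
  tr -d '\r' | awk -F: -v key="$1" '
    $1 == key { print $2; found = 1 }
    END { exit !found }'
}
```

A typical invocation would be redis-cli -p 6379 CLUSTER INFO | cluster_info_field cluster_known_nodes, repeated on each node so the values can be compared side by side.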
Fixes
Open the bus port in all firewalls
Allow TCP traffic on the bus port (default 16379) between all cluster node IPs. Update cloud security groups, VPC ACLs, host firewalls, and container port mappings to permit bidirectional traffic. You do not need to restart Redis for most firewall changes. The bus port does not need to be exposed to applications or the public internet, but every node must reach every other node’s bus port.
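For host firewalls managed with iptables, the allow rules can be generated per peer rather than opened to every source. The sketch below only prints the commands so they can be reviewed first. The peer IPs and bus port are values you supply, and print_bus_rules is a hypothetical helper name.

```shell
# Hypothetical helper: print (not apply) iptables rules that admit bus traffic
# from each peer. Usage: print_bus_rules <bus_port> <peer_ip>...
print_bus_rules() {
  pbr_port="$1"; shift
  for pbr_peer in "$@"; do
    # -I inserts at the top of INPUT so the rule is evaluated before later drops.
    echo "iptables -I INPUT -p tcp -s ${pbr_peer} --dport ${pbr_port} -j ACCEPT"
  done
}
```

Run the printed commands with root privileges on every node, listing all of the other nodes as peers. Whether the rules survive a reboot depends on how your distribution persists iptables state, and cloud security groups or network policies still need the equivalent rule separately.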
Rejoin isolated nodes
If a node was isolated long enough to be marked FAIL, opening the port may not automatically restore full cluster membership. The node may need to be reintroduced manually. If the node was removed from the topology during incident response, use the standard cluster management commands to re-add it, or restart the node after confirming full connectivity so it rejoins via the remaining nodes. Restarting a node is disruptive and will interrupt client connections.
Recover slot coverage
If the cluster entered fail state and slot assignments were altered during the incident, verify slot ownership with CLUSTER NODES. Correct any misassigned slots before returning the cluster to production traffic. In severe cases where multiple masters independently marked each other failed, restart the affected nodes one at a time after ensuring full bus connectivity. Restarting nodes is disruptive; do this during a maintenance window or with traffic rerouted.
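One way to verify slot coverage from CLUSTER NODES is to add up the slot ranges owned by masters that are not flagged fail and compare the total with 16384. The following is a rough sketch under the assumption that slot ranges start at the ninth field and that bracketed entries mark slots in migration; covered_slots is a hypothetical helper name.

```shell
# Hypothetical helper: count slots owned by masters that are not flagged "fail"
# in CLUSTER NODES output read on stdin. Full coverage is 16384 slots.
covered_slots() {
  awk '{
    nflags = split($3, flags, ",")
    is_master = 0; is_failed = 0
    for (i = 1; i <= nflags; i++) {
      if (flags[i] == "master") is_master = 1
      if (flags[i] == "fail")   is_failed = 1   # "fail?" (PFAIL) is not excluded
    }
    if (!is_master || is_failed) next
    for (i = 9; i <= NF; i++) {
      if ($i ~ /^\[/) continue                  # skip migrating/importing markers
      n = split($i, range, "-")
      total += (n == 2) ? range[2] - range[1] + 1 : 1
    }
  } END { print total + 0 }'
}
```

A typical invocation would be redis-cli -p 6379 CLUSTER NODES | covered_slots. Any result below 16384 means some slots have no healthy owner in that node's view, so run it on several nodes to see whether their views agree.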
Prevention
- Infrastructure-as-code checklist. Every Redis Cluster node provisioning template must open both the client port and the bus port (port+10000) in all security layers. A single omission in one security group or container mapping is enough to break the cluster.
- Node bootstrap verification. Before marking a new node as ready, confirm it is listening on the bus port with ss -tlnp | grep ':16379' and verify connectivity from an existing node to the new node's bus port.
- Monitor gossip asymmetry. Alert when cluster_stats_messages_sent grows while cluster_stats_messages_received stays flat. This catches one-way firewall rules, asymmetric network policies, or packet loss before they escalate to cluster_state:fail.
- Track topology size. Monitor cluster_known_nodes during scaling events. A drop immediately after a new node joins indicates the bus port is not open to or from the new member.
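The gossip asymmetry alert comes down to comparing two samples of the bus counters taken some interval apart. The sketch below shows that comparison only. The function name and argument order are hypothetical, and the sampling interval and alert delivery are left to whatever monitoring you already run.

```shell
# Hypothetical check: succeeds when sent grew but received did not.
# Arguments: sent_before sent_after recv_before recv_after
gossip_asymmetric() {
  ga_sent_delta=$(( $2 - $1 ))
  ga_recv_delta=$(( $4 - $3 ))
  # Asymmetric when the node keeps sending but receives nothing new.
  [ "$ga_sent_delta" -gt 0 ] && [ "$ga_recv_delta" -le 0 ]
}
```

For instance, gossip_asymmetric 100 160 90 90 succeeds because 60 messages were sent while none arrived, whereas the same call with a final argument of 150 fails because gossip is flowing both ways.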
How Netdata helps
- Track cluster_state, cluster_slots_pfail, and cluster_slots_fail to correlate cluster-wide failures with the exact moment slot coverage dropped.
- Surface cluster_stats_messages_sent and cluster_stats_messages_received to detect gossip asymmetry without relying on manual CLUSTER INFO sampling.
- Correlate healthy connected_clients with failing cluster state to distinguish a bus port block from a total node outage.
- Monitor cluster_known_nodes per node to detect topology drift before it becomes a full partition.
Related guides
- How Redis actually works in production: a mental model for operators: /guides/redis/how-redis-works-in-production/
- Redis NOAUTH / WRONGPASS authentication failures: ACL LOG and credential drift: /guides/redis/redis-acl-noauth-errors/
- Redis aof_last_write_status:err: AOF write failures and recovery: /guides/redis/redis-aof-last-write-status-err/
- Redis appendfsync always latency: durability vs throughput trade-offs: /guides/redis/redis-appendfsync-always-latency/
- Redis big keys: finding the giant key that blocks the event loop: /guides/redis/redis-big-keys-latency/
- Redis blocked_clients growing: dead consumers vs healthy queues: /guides/redis/redis-blocked-clients-growing/
- Redis BUSY Redis is busy running a script: blocking Lua and how to recover: /guides/redis/redis-busy-running-script/
- Redis Can’t save in background: fork: Cannot allocate memory - diagnosis and fix: /guides/redis/redis-cant-save-in-background-fork/
- Redis client output buffer overflow: slow consumers and client-output-buffer-limit: /guides/redis/redis-client-output-buffer-limit/
- Redis cluster_slots_pfail > 0: impending node failure in a cluster: /guides/redis/redis-cluster-slots-pfail/
- Redis CLUSTERDOWN / cluster_state:fail: slot coverage and recovery: /guides/redis/redis-cluster-state-fail/
- Redis connected_clients climbing: connection leak detection: /guides/redis/redis-connected-clients-climbing/







