
Operations Guides

Cassandra native transport not running: node UP in gossip but refusing CQL clients

A node reports UN in nodetool status but rejects CQL connections on port 9042. Gossip and replication are healthy; the failure is isolated to the CQL client path, either the native transport itself or the network route to it.

Because the node remains in the token ring, it continues to handle internode replication, gossip, and streaming. Applications see it as down; the cluster sees it as up. When the transport has been stopped, the JMX attribute NativeTransportRunning on org.apache.cassandra.db:type=StorageService is false while gossip heartbeats continue. The usual triggers are nodetool disablebinary left active after maintenance or, with the transport still running, a firewall blocking TCP 9042.

flowchart TD
    A[Clients refuse CQL on 9042] --> B{nodetool status}
    B -->|UJ| C[Bootstrap: wait for streaming]
    B -->|DN, netstats Mode DRAINED| D[Drained: restart required]
    B -->|UN| E{nodetool statusbinary}
    E -->|not running| F{Recent maintenance?}
    F -->|yes| G[nodetool enablebinary]
    F -->|no| K[Check system.log for BindException or transport stop]
    E -->|running| H{Port 9042 reachable from clients?}
    H -->|no| I[Firewall or security group]
    H -->|yes| J[Check native_transport_port and client encryption settings]

Common causes

Cause	What it looks like	First thing to check
`nodetool disablebinary` left active after maintenance	Node `UN`; `nodetool statusbinary` returns `not running`; ops logs show recent maintenance	`nodetool statusbinary` and maintenance calendar
Firewall, security group, or host firewall blocks 9042	Node `UN`; `statusbinary` reports `running` locally; clients time out from the application subnet	`nc -vz <node-ip> 9042` from a client host
Node joining the cluster	Node shows `UJ`; native transport binds after bootstrap streaming completes	`nodetool status` for `UJ` state
`nodetool drain` confusion	`drain` stops native transport and gossip; peers show the node as `DN` and it requires a restart	`nodetool netstats` on the node shows `Mode: DRAINED`

Quick checks

# Is native transport enabled?
nodetool statusbinary

# Confirm state in nodetool info
nodetool info | grep "Native Transport"

# Gossip state as seen by the cluster: UN, UJ, DN, etc.
nodetool status

# CQL port reachability from the client subnet
nc -vz <node-ip> 9042

# Local port binding (run as root or the cassandra user to see PIDs)
ss -tlnp | grep 9042

# Configured native transport port
grep -E "^native_transport_port:" /etc/cassandra/cassandra.yaml

# If client encryption is required, check the SSL port
grep -E "^native_transport_port_ssl:" /etc/cassandra/cassandra.yaml

# Connected client count
nodetool clientstats

# Recent transport stop/start activity in logs (message wording varies by version)
grep -iE "disablebinary|enablebinary|native transport" /var/log/cassandra/system.log

# Operation mode (NORMAL, JOINING, DRAINED) and streaming progress
nodetool netstats
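When these checks run from a script rather than by hand, their output has to be parsed. The following is a minimal sketch that assumes the stock `nodetool status` table layout (state letters in the first column, address in the second) and a plain `key: value` line in cassandra.yaml. The function names are hypothetical helpers, not part of Cassandra.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scripting the quick checks.
# Assumes the stock `nodetool status` layout: state letters in column 1, address in column 2.

# Print "ADDRESS STATE" for every node whose state is not UN.
# Usage: nodetool status | nodes_not_un
nodes_not_un() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2, $1 }'
}

# Succeed only when `nodetool statusbinary` output is exactly "running".
# Usage: nodetool statusbinary | binary_running && echo "CQL transport up"
binary_running() {
  local out
  out=$(cat)
  [[ "$out" == "running" ]]
}

# Print the effective native_transport_port from a cassandra.yaml file,
# falling back to the default 9042 when the key is absent or commented out.
# Usage: effective_cql_port /etc/cassandra/cassandra.yaml
effective_cql_port() {
  local yaml="$1" port
  port=$(awk -F: '/^native_transport_port:/ {
    sub(/#.*/, "", $2)            # drop trailing comments
    gsub(/[[:space:]]/, "", $2)   # trim whitespace
    print $2
    exit
  }' "$yaml")
  echo "${port:-9042}"
}
```

The anchored pattern `^native_transport_port:` deliberately skips both commented-out lines and `native_transport_port_ssl`. When the key is missing, the helper reports the documented default.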

How to diagnose it

Confirm gossip state. Run nodetool status, ideally from a healthy peer. UJ means the node is bootstrapping; native transport starts only after streaming finishes and the node reaches NORMAL. DN, combined with Mode: DRAINED in nodetool netstats on the node itself, means the node was drained and needs a restart before it accepts CQL. UN means the node is in the ring but the binary interface is off or unreachable.
Check native transport directly. Run nodetool statusbinary and nodetool info | grep "Native Transport". On a traffic-bearing node, statusbinary should print running and nodetool info should show Native Transport active: true. If they report not running or false, Cassandra is not accepting CQL connections regardless of gossip health.
Distinguish maintenance from incident. Review system.log for disablebinary, enablebinary, or messages such as Native transport service stopped or Stopping native transport (exact wording varies by Cassandra version). Check your maintenance calendar. If nodetool disablebinary was run without a matching enablebinary, the fix is simply to re-enable it. This is the most common cause of a UN node refusing clients.
Verify the network path. Even when native transport is running, a firewall, security group, or iptables rule can block 9042 between the application and the node. Run nc -vz <node-ip> 9042 from an application host. If this fails while ss -tlnp on the Cassandra node shows 9042 in LISTEN, the block is external.
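Validate the configured port. If the node uses a non-default native_transport_port in cassandra.yaml, clients and load balancers may target the wrong port. The default is 9042. If client encryption is required, check whether native_transport_port_ssl is set: it is commented out in the stock cassandra.yaml (with 9142 as the example value), and when it is unset, encrypted connections use native_transport_port. Ensure clients are not connecting in plaintext to a port that requires TLS.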
Assess bootstrap progress. If the node is new or replaced, check nodetool netstats. If streaming is active, the node is UJ and intentionally delays native transport until bootstrap finishes. Do not force enablebinary during bootstrap.
Check for port conflicts. In system.log, look for BindException on port 9042. If another process has bound the port, Cassandra native transport will fail to start. This is rare but can occur during failed restarts or container port collisions.
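The diagnosis steps above can be condensed into a single decision function. This is a sketch under stated assumptions: the operator collects the inputs beforehand, and the argument order and output strings are hypothetical. The drained case is detected from the `Mode:` line of `nodetool netstats` on the node, which reports `DRAINED` after a drain.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper mirroring the diagnosis steps. Arguments:
#   $1  state letters from `nodetool status` on a peer (UN, UJ, DN, ...)
#   $2  operation mode from `nodetool netstats` on the node (NORMAL, JOINING, DRAINED, ...)
#   $3  `nodetool statusbinary` output ("running" or "not running")
#   $4  "yes" if recent maintenance explains a disabled transport, else "no"
#   $5  "yes" if `nc -vz <node-ip> 9042` succeeds from a client host, else "no"
triage_cql_refusal() {
  local state="$1" mode="$2" binary="$3" maintenance="$4" reachable="$5"

  if [[ "$mode" == "DRAINED" ]]; then
    echo "restart: node was drained; enablebinary will not restore service"
  elif [[ "$state" == "UJ" || "$mode" == "JOINING" ]]; then
    echo "wait: bootstrap in progress; do not force enablebinary"
  elif [[ "$binary" != "running" ]]; then
    if [[ "$maintenance" == "yes" ]]; then
      echo "enablebinary: transport left disabled after maintenance"
    else
      echo "logs: check system.log for BindException or why the transport stopped"
    fi
  elif [[ "$reachable" != "yes" ]]; then
    echo "network: firewall, security group, or container policy blocks the port"
  else
    echo "config: compare native_transport_port and client encryption with client settings"
  fi
}
```

For example, `triage_cql_refusal UN NORMAL "not running" yes no` prints the `enablebinary:` line. The order of the checks matters: drain and bootstrap come first because they make the transport and port checks meaningless.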

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`NativeTransportRunning` (JMX: `StorageService`)	Direct boolean for CQL transport state	`false` while node is `UN`
`nodetool statusbinary`	Operator-facing transport state	Returns `not running` on a production node
`connectedNativeClients` (JMX: `Client`)	Active CQL session count	Sudden drop to zero on a traffic-bearing node
Gossip state (`nodetool status`)	Separates bootstrap/drain from binary-only issues	`UJ`, or `DN` with `Mode: DRAINED` in `nodetool netstats`, explains transport unavailability
Port 9042 connectivity	Separates Cassandra state from network policy	Refused or timeout from client subnet
Client request timeouts / unavailables	Direct client impact	Spikes correlate with transport downtime

Fixes

Re-enable native transport after maintenance

If nodetool statusbinary reports not running and the node was taken out of client rotation intentionally:

nodetool enablebinary

Verify:

nodetool statusbinary
nodetool clientstats

No restart is required. If several nodes in the same rack, or nodes holding replicas of the same data, were disabled, re-enable them one at a time and verify that client connections recover before proceeding. Re-enabling many nodes simultaneously can cause a reconnect storm.
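A one-at-a-time re-enable can be scripted. The following is a minimal sketch that assumes `nodetool -h <host>` can reach each node over JMX. `NODETOOL`, `POLL_SECS`, and `MAX_POLLS` are hypothetical knobs for this sketch, and the script stops at the first node whose transport does not come back so an operator can investigate.

```shell
#!/usr/bin/env bash
# Sketch: re-enable native transport on several nodes, one at a time.
# Assumes nodetool reaches each node with `-h <host>`; the knobs below are hypothetical.
NODETOOL="${NODETOOL:-nodetool}"
POLL_SECS="${POLL_SECS:-5}"
MAX_POLLS="${MAX_POLLS:-24}"

# Wait until `nodetool statusbinary` on HOST reports "running"; fail after MAX_POLLS tries.
wait_for_binary() {
  local host="$1" i out
  for ((i = 0; i < MAX_POLLS; i++)); do
    out=$("$NODETOOL" -h "$host" statusbinary 2>/dev/null)
    [[ "$out" == "running" ]] && return 0
    sleep "$POLL_SECS"
  done
  return 1
}

# Re-enable hosts in the order given; stop at the first one that does not recover.
rolling_enablebinary() {
  local host first=1
  for host in "$@"; do
    # Pause between nodes so client connections recover gradually.
    (( first )) || sleep "$POLL_SECS"
    first=0
    echo "enabling native transport on $host"
    "$NODETOOL" -h "$host" enablebinary || return 1
    if ! wait_for_binary "$host"; then
      echo "native transport on $host did not report running; stopping" >&2
      return 1
    fi
  done
}
```

Usage: `rolling_enablebinary 10.0.0.1 10.0.0.2`. This sketch only checks `statusbinary`. Confirming that clients have actually reconnected, for example with `nodetool clientstats`, remains a manual step between nodes.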

If clients continue to time out after enablebinary, the driver may have temporarily marked the node as down. Wait for the driver’s reconnection interval, or restart the application connection pool if the driver does not retry the node automatically.

Unblock port 9042

If the transport is running locally but clients cannot connect:

Cloud security groups: Allow ingress TCP 9042 from the client subnet.
Host firewall (iptables/ufw/firewalld): Add an allow rule for 9042.
Container networking: Verify the container port is mapped and the CNI policy permits the connection.

Restrict the source to your client subnet. Unless client_encryption_options is enabled, native transport traffic is unencrypted, and exposing it broadly is a security risk.
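The following sketch prints host-firewall allow rules scoped to a single client subnet, rather than applying them, so they can be reviewed first. It assumes an IPv4 client subnet, and the function name is hypothetical. It refuses any `/0` prefix because that would expose unencrypted CQL to every address. Cloud security groups and container network policies are configured through their own tooling and are not covered here.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) allow rules for the CQL port, limited to one IPv4 client subnet.
cql_allow_rules() {
  local subnet="$1" port="${2:-9042}"
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'
  # Basic IPv4 CIDR shape check; not a full validator.
  if [[ ! "$subnet" =~ $re ]]; then
    echo "refusing: '$subnet' does not look like an IPv4 CIDR" >&2
    return 1
  fi
  # A /0 prefix would open the port to every address.
  if [[ "${subnet#*/}" == "0" ]]; then
    echo "refusing: pass a specific client subnet, not an open range" >&2
    return 1
  fi
  echo "iptables -I INPUT -p tcp -s $subnet --dport $port -j ACCEPT"
  echo "ufw allow from $subnet to any port $port proto tcp"
  echo "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"$subnet\" port port=\"$port\" protocol=\"tcp\" accept'"
}
```

`cql_allow_rules 10.0.1.0/24` prints one line per tool; apply only the one your host uses. A rule added with `firewall-cmd --permanent` takes effect after `firewall-cmd --reload`, and a plain `iptables` rule does not survive a reboot unless your distribution persists it.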

Wait for bootstrap or restart after drain

If the node is UJ, do not force enablebinary. Native transport binds after bootstrap completes by design. Monitor nodetool netstats until streaming finishes and the state transitions to UN.

If the node was drained with nodetool drain (peers show it as DN, and nodetool netstats on the node reports Mode: DRAINED), enablebinary will not restore service. The node requires a full Cassandra process restart.
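Waiting out a bootstrap and spotting a drained node can both be done from the `Mode:` line of `nodetool netstats`. The following is a minimal sketch that assumes that line is present. `NODETOOL`, `POLL_SECS`, and `MAX_POLLS` are hypothetical knobs. The sketch never calls `enablebinary`; it only reports when the node reaches NORMAL or needs a restart.

```shell
#!/usr/bin/env bash
# Sketch: poll `nodetool netstats` until the node leaves JOINING.
# Assumes netstats prints a "Mode: <MODE>" line; the knobs below are hypothetical.
NODETOOL="${NODETOOL:-nodetool}"
POLL_SECS="${POLL_SECS:-30}"
MAX_POLLS="${MAX_POLLS:-120}"

# Print the operation mode (NORMAL, JOINING, DRAINED, ...) from netstats output on stdin.
operation_mode() {
  awk -F': *' '$1 == "Mode" { print $2; exit }'
}

# Return 0 once the node is NORMAL, 2 if it is DRAINED (restart needed), 1 on timeout.
wait_until_normal() {
  local i mode
  for ((i = 0; i < MAX_POLLS; i++)); do
    mode=$("$NODETOOL" netstats | operation_mode)
    case "$mode" in
      NORMAL)  return 0 ;;
      DRAINED) echo "node is drained; restart the Cassandra process" >&2; return 2 ;;
    esac
    sleep "$POLL_SECS"
  done
  return 1
}
```

Distinct exit codes let a wrapper script branch between "bootstrap finished, verify statusbinary" and "restart required" without parsing messages.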

Prevention

Document runbooks clearly. Distinguish nodetool disablebinary (reversible, gossip stays UP) from nodetool drain (stops gossip too, requires a restart, and nodetool netstats reports Mode: DRAINED). Never use them interchangeably. Post-maintenance verification must include nodetool statusbinary.
Monitor native transport state alongside gossip. Alerting only on nodetool status misses this failure mode. Include NativeTransportRunning or nodetool statusbinary in your availability checks, and suppress alerts on connectedNativeClients = 0 when a node is tagged for maintenance.
Verify 9042 end-to-end. Run periodic connectivity checks from the client subnet, not just localhost, to catch firewall drift before applications fail. A node can pass all local health checks while a security group change blocks remote clients.
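The prevention checks above can be combined into one periodic probe. The following sketch is meant to run from a host in the client subnet. It assumes bash's `/dev/tcp` redirection and coreutils `timeout` are available, and its verdict strings are hypothetical. A disabled transport is suppressed only when the node is tagged for maintenance, while a blocked port always alerts, because firewall drift is never expected.

```shell
#!/usr/bin/env bash
# Sketch: CQL availability probe combining local transport state and remote reachability.

# Succeed if a TCP connection to HOST:PORT opens within SECS seconds.
cql_port_open() {
  local host="$1" port="${2:-9042}" secs="${3:-3}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Arguments: $1 statusbinary output, $2 "yes"/"no" port reachable from clients,
#            $3 "yes" if the node is tagged for maintenance.
availability_verdict() {
  local binary="$1" reachable="$2" maintenance="$3"
  if [[ "$binary" != "running" ]]; then
    if [[ "$maintenance" == "yes" ]]; then
      echo "SUPPRESSED transport-disabled"
    else
      echo "ALERT transport-disabled"
    fi
  elif [[ "$reachable" != "yes" ]]; then
    echo "ALERT port-blocked"
  else
    echo "OK"
  fi
}
```

In practice, `cql_port_open` runs on the client host, while the `statusbinary` output is collected on the Cassandra node by whatever agent or SSH wrapper you already use.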

How Netdata helps

Correlates NativeTransportRunning = false with gossip UP state to surface nodes that look healthy to the cluster but are unreachable by clients.
Tracks connectedNativeClients per node to detect sudden disconnection events that precede application timeouts.
Charts client request rates and errors against transport state changes, distinguishing binary transport outages from quorum loss or GC pauses.
Alerts on zero connected clients for nodes that historically carry traffic, catching disablebinary or firewall issues before downstream services degrade.

The Netdata solution

Cassandra monitoring with Netdata

Netdata monitors Apache Cassandra with per-second metrics and automatic dashboards. Correlate GC pauses, compaction backlog, tombstone rates, pending hints, and disk usage across nodes to catch a creeping cluster before it tips over.
