Cassandra node stuck in joining (UJ): bootstrap diagnosis

You add a node to the ring, run nodetool status, and see it stuck in UJ (Up/Joining) for hours. The cluster sees it in gossip, but it never transitions to UN (Up/Normal). Client drivers do not route traffic to it, so the expansion has not added usable capacity. Until the state changes, the node is a ghost member: visible to the ring but unable to serve reads or writes for its assigned token ranges.

Bootstrap streams the SSTables that belong to the new node’s assigned token ranges from current replica owners. The joining node is passive for client traffic until every byte is received, validated, and made available locally. If a stream stalls, fails silently, or the joining node is interrupted mid-transfer, it stays in UJ indefinitely. The most common root cause is not the joining node; it is the health and capacity of the source nodes serving the stream.

What this means

UJ means the node has passed gossip startup and token allocation, but has not finished ingesting its replica data. It holds a token assignment, so the cluster knows it owns ranges, yet it cannot serve them. Streaming sessions are long-lived TCP transfers of SSTable files. They are vulnerable to anything that interrupts a sustained transfer: disk saturation, GC pauses, or corrupt files on the source side, or network blips between the nodes.

Cassandra 2.2 and later persist partial bootstrap progress, so a restart can resume from the last checkpoint rather than starting over entirely. Even so, the stream will not complete until every source-side blockage is cleared. Because streaming reads raw SSTables from disk, it competes directly with compaction, client reads, and memtable flushes on the source node. When those consumers saturate the disk, the stream starves.

```mermaid
flowchart TD
    A[Node enters UJ state] --> B[Select source replicas]
    B --> C[Stream SSTables per range]
    C --> D{Progress stalls?}
    D -->|No| E[Continue until complete]
    E --> F[Transition to UN]
    D -->|Yes| G[Check source disk I/O]
    G --> H[Check source GC and heap]
    H --> I[Check for corrupt SSTables]
    I --> J[Resume or restart join]
    J --> C
```

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Source node disk saturation | `nodetool netstats` shows no byte increase between samples; source `iostat` shows high `%util` or `await` | `iostat -x 1` on each source node |
| Source node GC pressure | Source node drops messages or flaps between UP and DOWN during the stream | GC logs and `nodetool info` heap usage on source |
| Corrupt SSTable on source | Stream fails repeatedly at the same file or token range; errors in system log | `nodetool verify` on the source replica |
| Too many parallel token ranges | High `num_tokens` creates many concurrent streams, overwhelming source heap or disk | `nodetool netstats` session count and `num_tokens` in cassandra.yaml |
| Thread pool saturation on source | `nodetool tpstats` on source shows pending or blocked MutationStage or ReadStage tasks | `nodetool tpstats` on source nodes |
| File descriptor exhaustion | Source or joining node cannot open new SSTable components | Open FD count of the Cassandra process versus its `Max open files` limit in `/proc/<pid>/limits` |
| Network or internode timeout | Session breaks with timeout errors in logs; network latency between nodes spikes | Connectivity and error logs on both sides |

Quick checks

```shell
# Confirm the node is still joining
nodetool status

# Inspect active streaming sessions and bytes received
nodetool netstats

# Check for backpressure in internal thread pools
nodetool tpstats

# Check source node disk saturation
iostat -x 1

# Review recent errors and timeouts
grep -iE "stream|timeout|corrupt" /var/log/cassandra/system.log

# Check heap and GC health on the source
nodetool info | grep -i "Heap Memory"
grep -i "pause" /var/log/cassandra/gc.log | tail -20

# Check compaction backlog on source
nodetool compactionstats

# Check file descriptor pressure for the Cassandra process
# (Linux; assumes one Cassandra JVM per host; run as root or the cassandra user)
CASSANDRA_PID=$(pgrep -f CassandraDaemon | head -n 1)
ls /proc/"$CASSANDRA_PID"/fd | wc -l
grep "Max open files" /proc/"$CASSANDRA_PID"/limits
```

How to diagnose it

1. Confirm UJ state with nodetool status and identify the streaming sources from nodetool netstats on the joining node.
2. Sample nodetool netstats twice, spaced by a few minutes. If bytes received or files completed do not increment, the stream is stalled.
3. Log into the source nodes identified in netstats. Run iostat -x 1 and check %util and await. If the disk backing the data directory is saturated, streaming reads are queued behind compaction and client traffic.
4. On the source nodes, run nodetool tpstats. Sustained pending tasks in MutationStage, ReadStage, or CompactionExecutor mean the node is too loaded to serve streams promptly.
5. Check the source node GC logs. Stop-the-world pauses longer than a few seconds can cause internode messaging timeouts, which tear down streaming sessions.
6. Search system.log on both sides for CorruptSSTableException, FSError, or stream timeout messages. A single corrupt SSTable on a source replica can block an entire range transfer.
7. If the joining node was restarted mid-bootstrap, check nodetool netstats for resumed progress. On versions that support resumable bootstrap, uncompleted ranges replay from the last checkpoint. Repeated restarts can still leave gaps or conflicting sessions. If sessions look inconsistent, restart the joining node only after all source nodes are stable.
8. If the joining node has many concurrent sessions in nodetool netstats, check num_tokens in cassandra.yaml. A very high vnode count increases parallel stream count and can saturate source-side heap or I/O.
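
The two-sample comparison from the steps above can be scripted. The following is a minimal sketch, not a definitive tool: it compares only the streaming progress lines of two saved netstats snapshots, because other counters in the output keep changing even when a stream is stuck. The function names, the snapshot file names, and the `Receiving` match pattern are assumptions; adjust the pattern to the progress lines your Cassandra version prints.

```shell
#!/bin/sh
# Assumption: incoming-stream progress lines contain this word.
# Override PROGRESS_PATTERN if your netstats output differs.
PROGRESS_PATTERN="${PROGRESS_PATTERN:-Receiving}"

# Print only the streaming progress lines from a saved snapshot file.
progress_lines() {
    grep -- "$PROGRESS_PATTERN" "$1"
}

# stream_stalled SNAPSHOT_A SNAPSHOT_B
# Exit 0: progress lines are identical in both snapshots (stalled).
# Exit 1: progress lines changed between the snapshots.
# Exit 2: a snapshot has no progress lines at all (nothing to compare).
stream_stalled() {
    a=$(progress_lines "$1") || return 2
    b=$(progress_lines "$2") || return 2
    [ "$a" = "$b" ]
}

# Example use on the joining node (hypothetical file names, not run here):
#   nodetool netstats > /tmp/netstats.1
#   sleep 300
#   nodetool netstats > /tmp/netstats.2
#   if stream_stalled /tmp/netstats.1 /tmp/netstats.2; then
#       echo "no streaming progress in the last 5 minutes"
#   fi
```

An exit status of 2 usually means either that streaming has finished or that the pattern does not match your version's output, so inspect the snapshot by hand before drawing conclusions.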

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| Streaming incoming bytes | Direct measure of bootstrap progress | Flat for more than 30 minutes |
| Disk I/O await on source | High await means source disk cannot read SSTables fast enough | `await` greater than 50 ms sustained |
| GC pause duration on source | Long pauses break internode TCP sessions and stall streams | Pauses greater than 2 seconds |
| Pending compactions on source | Compaction competes for the same disk as streaming reads | Count trending upward during bootstrap |
| Thread pool pending tasks | Queued tasks mean the source cannot keep up with requests | Pending greater than 0 in MutationStage or ReadStage |
| Dropped messages on source | The node is shedding load; streams may be next | Any sustained non-zero rate |
| File descriptor usage | FD exhaustion prevents opening SSTable files | Usage greater than 80% of ulimit |
| Pending flushes | Write path saturation delays all disk operations | MemtableFlushWriter pending greater than 0 sustained |
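
Three of the numeric thresholds in the table (disk await, GC pause, file descriptor usage) can be turned into a small check. This is a sketch under stated assumptions: the function names are hypothetical, and the caller is expected to supply the four readings by hand or from whatever collector is already in place. It does not gather the values itself.

```shell
#!/bin/sh
# exceeds VALUE LIMIT: exit 0 when VALUE is greater than LIMIT.
# awk is used so that decimal readings such as 63.2 compare correctly.
exceeds() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 > l + 0) }'
}

# fd_usage_pct OPEN LIMIT: integer percentage of the FD limit in use.
fd_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# check_source AWAIT_MS GC_PAUSE_S FD_OPEN FD_LIMIT
# Prints one WARN line per breached threshold; exit 1 if any was breached.
check_source() {
    warned=0
    if exceeds "$1" 50; then
        echo "WARN await $1 ms is above 50 ms"; warned=1
    fi
    if exceeds "$2" 2; then
        echo "WARN GC pause $2 s is above 2 s"; warned=1
    fi
    pct=$(fd_usage_pct "$3" "$4")
    if exceeds "$pct" 80; then
        echo "WARN FD usage ${pct}% is above 80%"; warned=1
    fi
    return "$warned"
}

# Example (made-up readings): check_source 63.2 0.4 60000 65536
```

The limits are the ones from the table; the "sustained" qualifier still has to be judged by sampling more than once.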

Fixes

Address source node disk saturation

If iostat shows the data device is saturated, streaming cannot proceed until I/O is freed. Pause non-critical repairs, reduce compaction throughput with nodetool setcompactionthroughput, or schedule the bootstrap during a lower-traffic window. Adding IOPS to the source node or moving the commitlog to a separate device are longer-term fixes. Do not raise streaming socket timeouts to mask the stall; the timeout is a symptom, and extending it without fixing the source disk will prolong the incident.
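
Lowering compaction throughput on a source node might look like the sketch below. The wrapper function and the `NODETOOL` override are assumptions added so the commands can be previewed before they run; `getcompactionthroughput` and `setcompactionthroughput` are standard nodetool subcommands. Any number passed in is an illustration, not a recommendation.

```shell
#!/bin/sh
# NODETOOL can be overridden, for example NODETOOL="echo nodetool",
# to print the commands instead of running them (dry run).
NODETOOL="${NODETOOL:-nodetool}"

# throttle_compaction LIMIT_MB_PER_S
# Shows the current cap first so it can be restored after the bootstrap,
# then applies the new cap. The change is not persisted across restarts.
throttle_compaction() {
    $NODETOOL getcompactionthroughput
    $NODETOOL setcompactionthroughput "$1"
}

# Example on a saturated source node (hypothetical value):
#   throttle_compaction 8
# Afterwards, restore the value printed by the first command.
```

Note the previous value before changing it, and restore it once the node reaches UN so the compaction backlog does not build up on the source.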

Reduce pressure on source nodes

If the source node is in a GC death spiral or thread pool saturation, stop increasing load. Do not repeatedly trigger resume operations while the source is unhealthy; the stream will only fail again. Wait for the source node to return to a stable state with zero pending tasks and normal GC before allowing the join to continue.

Handle corrupt SSTables

If nodetool verify on a source node reports corruption, that SSTable must be replaced or repaired. nodetool verify reads the full SSTable data (and, with --extended-verify, validates every cell), so it is expensive on large tables; run it during low traffic. If replication factor is greater than one, you can temporarily take the corrupt source node offline so the joining node streams from healthy replicas instead. After the new node joins, run a full repair on the affected range.

Resume or restart the joining node

If the stream failed but the joining node persists bootstrap state, running nodetool bootstrap resume on the joining node, or cleanly restarting it, resumes from the last checkpoint. Verify with nodetool netstats that progress continues. If the node does not support resumable bootstrap, you may need to wipe the data directory and restart the bootstrap from scratch after fixing the source-side issue.

WARNING: Wiping the data directory is destructive. Stop Cassandra, clear the data, commitlog, and saved_caches directories, and ensure the node is fully removed from the ring before you re-bootstrap.
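
Because the wipe is destructive, it can help to print the exact commands and review them before running anything. The sketch below only prints a plan; it deletes nothing. The base directory `/var/lib/cassandra`, the `cassandra` service name, and the function name are assumptions. Confirm the real locations from data_file_directories, commitlog_directory, and saved_caches_directory in cassandra.yaml.

```shell
#!/bin/sh
# Assumed base directory; override CASSANDRA_BASE_DIR for your install.
CASSANDRA_BASE_DIR="${CASSANDRA_BASE_DIR:-/var/lib/cassandra}"

# Print (do not execute) the stop and wipe commands for the joining node.
print_wipe_plan() {
    # Assumed systemd unit name.
    echo "sudo systemctl stop cassandra"
    for sub in data commitlog saved_caches; do
        # ${VAR:?} aborts if the variable is empty, so a printed path
        # can never collapse to something like "/data/*".
        echo "sudo rm -rf ${CASSANDRA_BASE_DIR:?}/${sub}/*"
    done
}

# Example: review the plan on the joining node only.
#   print_wipe_plan
```

Run the printed commands by hand, on the joining node only, after confirming with nodetool status from another node that it is no longer listed in the ring.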

Lower the parallel stream count

A high num_tokens value increases the number of token ranges and therefore the number of concurrent streaming sessions. If source nodes are running out of memory or saturating disk during bootstrap, a lower num_tokens can make large-node bootstraps stable. The value cannot be changed on a node that already holds tokens, so lowering it means editing cassandra.yaml and re-bootstrapping the joining node.
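
To see what a node is configured with, the setting can be read straight from cassandra.yaml. This is a minimal sketch: the function name is hypothetical and the path in the usage comment is a common package default, not a guarantee.

```shell
#!/bin/sh
# get_num_tokens PATH_TO_CASSANDRA_YAML
# Prints the value of the first uncommented top-level "num_tokens:" key.
# Prints nothing if the key is absent or commented out.
get_num_tokens() {
    sed -n 's/^num_tokens:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Example (assumed path):
#   get_num_tokens /etc/cassandra/cassandra.yaml
```

Comparing the value on the joining node with the value on existing nodes also catches a new node that was provisioned with a different vnode count than the rest of the cluster.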

Run repair after recovery

Any bootstrap that was interrupted or resumed after a timeout may have missed writes, especially if hints were not delivered during the window. A node starts serving client traffic as soon as it reaches UN, so run nodetool repair on it promptly to reconcile any inconsistencies.

Prevention

Validate source node health before bootstrap. Check nodetool compactionstats, heap usage, and disk headroom.
Schedule bootstrap during off-peak hours when source node I/O and GC are stable.
Monitor source node disk latency and thread pools continuously during the operation.
Keep num_tokens aligned with your heap and disk capacity. Very large nodes may need fewer vnodes.
Verify SSTable integrity with nodetool verify before major topology changes.
In containerized environments, use Pod Disruption Budgets to prevent mid-stream pod eviction.
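
The disk headroom part of the pre-bootstrap check can be scripted with standard tools. The sketch below assumes a Linux or other POSIX host; the function name and the path in the usage comment are hypothetical, and what counts as enough headroom is left to the operator.

```shell
#!/bin/sh
# disk_used_pct PATH
# Prints the used-space percentage (digits only) of the filesystem
# that holds PATH, using the portable `df -P` output format.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example on a source node (assumed data path):
#   disk_used_pct /var/lib/cassandra/data
```

Pair the result with nodetool compactionstats and heap usage from nodetool info to cover the other two checks named in the list.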

How Netdata helps

Correlate flat streaming throughput on the joining node with disk latency spikes on the source node in the same time window.
Track GC pause duration on source nodes to preempt streaming timeouts before sessions break.
Alert on sustained pending tasks in the MutationStage and CompactionExecutor during bootstrap operations.
Monitor off-heap memory growth on source nodes to catch OOM risk from too many concurrent SSTable transfers.
Surface file descriptor utilization per node to detect the approach of ulimit exhaustion during heavy streaming.
