Cassandra schema disagreement: nodetool describecluster shows multiple versions

nodetool describecluster should report exactly one schema version UUID cluster-wide. When the Schema versions section lists more than one UUID, the cluster is in schema disagreement. DDL (CREATE, ALTER, DROP) fails or hangs until every node converges. DML and reads against existing tables are unaffected.

Transient disagreement lasting less than five minutes during or immediately after a DDL change is normal. Cassandra propagates schema mutations asynchronously via gossip. If multiple versions persist beyond five minutes, you have a stuck node, a partitioned peer, or a migration stage backlog.
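The five-minute window can be scripted instead of timed by hand. The following is a minimal sketch, assuming nodetool is on the PATH and that describecluster prints one "<uuid>: [nodes]" line per schema version. The function names and the 15-second polling interval are illustrative choices, not part of Cassandra.

```shell
# Count schema version lines ("<uuid>: [nodes]") in describecluster output on stdin.
count_schema_versions() {
    grep -Ec '^[[:space:]]*[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}:' || true
}

# Poll until exactly one schema version remains or the timeout expires.
# Usage: wait_for_schema_agreement [timeout_seconds] [interval_seconds]
wait_for_schema_agreement() {
    local timeout="${1:-300}"   # the five-minute window described above
    local interval="${2:-15}"   # hypothetical polling interval
    local elapsed=0 versions
    while :; do
        versions=$(nodetool describecluster | count_schema_versions)
        if [ "$versions" -eq 1 ]; then
            echo "schema agreement reached after ${elapsed}s"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "still ${versions} schema versions after ${elapsed}s" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}
```

Calling wait_for_schema_agreement with no arguments waits up to 300 seconds. A non-zero exit status means the disagreement outlasted the window and is worth diagnosing.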

What this means

Cassandra stores schema metadata in local system tables and versions the entire schema as a single UUID. When a DDL statement executes, the coordinator proposes a new schema mutation, updates its local version, and gossips the change to the rest of the ring. Each node applies the mutation in its Migration stage and advertises the new version.

If a node is DOWN, unreachable, or its migration thread is stalled, it never applies the mutation. That node retains the old schema UUID while the rest of the cluster moves forward. Clients and drivers wait for schema agreement after each DDL statement, so further schema changes hang or time out until the outlier catches up.

The schema version is a digest of the entire schema definition. Any structural difference produces a different UUID. One UUID means agreement; more than one means divergence.

flowchart TD
    A[nodetool describecluster shows >1 schema version] --> B{Transient? <5 min after DDL}
    B -->|Yes| C[Wait for gossip propagation]
    B -->|No| D{Outlier node status}
    D -->|DOWN| E[Restore node liveness first]
    D -->|UP| F[Check nodetool tpstats Migration pending]
    F -->|Pending > 0| G[Wait or drain node]
    F -->|Pending = 0| H[Run nodetool resetlocalschema on outlier]
    E --> I[Re-check describecluster]
    G --> I
    H --> I
    I -->|Still disagreeing| J[Graceful rolling restart of outlier]
    I -->|One version| K[Resolved]
    J --> K

Common causes

Cause | Looks like | First check
Transient gossip propagation | Multiple versions for seconds to minutes after DDL | nodetool describecluster after 60 seconds
Node DOWN or unreachable | One UUID lists a node that is DN in nodetool status | nodetool status
Migration stage backlog | Schema changes hang; nodetool tpstats shows pending Migration tasks | nodetool tpstats
Rolling upgrade in progress | Nodes on different Cassandra versions show different UUIDs | nodetool version on each node, RELEASE_VERSION in nodetool gossipinfo, or logs
Partitioned or zombie node | Node is UP but retains an old version; may appear in UNREACHABLE | UNREACHABLE entry in nodetool describecluster

Quick checks

# Schema version distribution. Each UUID should list all nodes.
nodetool describecluster

# Node liveness. A DN node cannot apply schema changes.
nodetool status

# Pending schema mutations. In 3.x look for MigrationStage; in 4.x, MIGRATION.
nodetool tpstats | grep -i migration

# Gossip reachability. If the outlier is dead, schema cannot propagate.
nodetool gossipinfo

# Active streaming or topology changes that may delay schema application.
nodetool netstats

Pay attention to any UNREACHABLE entry in nodetool describecluster. It is listed in the Schema versions section, separately from the UUID groupings, and is often the last line of that section. An UNREACHABLE node is a gossip issue that must be resolved before schema reconciliation can succeed.
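To avoid overlooking that entry, the UNREACHABLE line can be extracted on its own. This is a sketch under the assumption that the line has the form "UNREACHABLE: [addr, addr]". The function names unreachable_endpoints and check_unreachable are hypothetical.

```shell
# Print each UNREACHABLE endpoint from describecluster output (stdin), one per line.
unreachable_endpoints() {
    awk -F'[][]' '/^[ \t]*UNREACHABLE:/ { print $2 }' \
        | tr -d ' ' | tr ',' '\n' | awk 'NF'
}

# Fail when any endpoint is unreachable, so gossip gets fixed before schema work.
check_unreachable() {
    local nodes
    nodes=$(nodetool describecluster | unreachable_endpoints)
    if [ -n "$nodes" ]; then
        echo "UNREACHABLE endpoints, restore gossip first:" >&2
        echo "$nodes" >&2
        return 1
    fi
    echo "no UNREACHABLE endpoints"
}
```

Running check_unreachable before any of the fixes below gives a quick go/no-go signal for schema reconciliation.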

How to diagnose it

  1. Confirm the symptom is sustained. Run nodetool describecluster. If multiple versions appear, wait five minutes and run it again. If the cluster has just executed DDL, give gossip time to propagate. If it resolves within the window, stop.

  2. Identify the outlier. The output groups nodes by schema UUID. Note which IP addresses are attached to the older version.

  3. Check node liveness. Run nodetool status. If the outlier is DN (Down Normal), the root cause is node failure or network partition, not a schema bug. Recover the node first. Once it rejoins the ring and gossip stabilizes, it should pull the latest schema automatically.

  4. Check for UNREACHABLE nodes. In nodetool describecluster, the last line may list UNREACHABLE endpoints. These nodes are not responding to gossip. Restore gossip connectivity before expecting schema convergence.

  5. Inspect the Migration stage. Run nodetool tpstats and look for the Migration stage. If Pending is greater than zero and not decreasing, the node has queued schema mutations that are not being processed. Correlate with GC pauses, high CPU, or disk saturation on the outlier.

  6. Determine if a rolling upgrade is in progress. If nodes are running different Cassandra versions, schema disagreement is expected because nodes on different versions may not exchange schema with each other. Do not run nodetool resetlocalschema in a mixed-version cluster; use it only if the disagreement persists after all nodes are on the same version.
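Step 2 above can be sketched as a small parser that groups nodes by schema UUID. The UUID itself does not say which version is older, so this sketch assumes the smallest group is the outlier; treat that as a heuristic, and note that ties are ordered arbitrarily. The function names are hypothetical.

```shell
# Print "<node_count> <uuid> <nodes>" per schema version, largest group first.
# Reads `nodetool describecluster` output on stdin.
schema_groups() {
    grep -E '^[[:space:]]*[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}:' \
        | awk -F'[][]' '{
              uuid = $1
              gsub(/[ \t:]/, "", uuid)        # strip indentation and trailing colon
              n = split($2, nodes, ",")       # count addresses inside the brackets
              print n, uuid, $2
          }' \
        | sort -rn
}

# Print the nodes in every group except the largest one (assumed outliers).
minority_nodes() {
    schema_groups | tail -n +2 | cut -d' ' -f3- \
        | tr -d ' ' | tr ',' '\n' | awk 'NF'
}
```

Piping nodetool describecluster into minority_nodes lists the addresses to check with nodetool status and nodetool tpstats in the following steps.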

Metrics and signals to monitor

Signal | Why it matters | Warning sign
Schema version count | Direct indicator of cluster-wide agreement | >1 UUID sustained for >5 minutes
Migration stage pending tasks | Schema mutations queue here before application | Pending > 0 sustained or growing
Node liveness (nodetool status) | DOWN nodes cannot receive or apply mutations | Any node DN or UJ longer than expected
Gossip unreachable members | Nodes that cannot be reached for schema sync | UNREACHABLE endpoints in describecluster
GC pause duration | Long pauses can stall the migration stage and gossip | Pauses > 2 seconds sustained
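The "Pending > 0 sustained or growing" signal can be approximated by sampling nodetool tpstats twice. This is a rough sketch, assuming the usual column order (Pool Name, Active, Pending, ...) and a pool name that starts with "Migration". The function names and the 30-second default interval are hypothetical.

```shell
# Print the Pending count of the migration stage from tpstats output on stdin.
migration_pending() {
    awk 'tolower($1) ~ /^migration/ { print $3; found = 1; exit }
         END { if (!found) print "unknown" }'
}

# Return 0 when the backlog is non-zero and not shrinking between two samples.
# Usage: migration_backlog_stuck [interval_seconds]
migration_backlog_stuck() {
    local interval="${1:-30}" first second
    first=$(nodetool tpstats | migration_pending)
    sleep "$interval"
    second=$(nodetool tpstats | migration_pending)
    case "${first}${second}" in
        *unknown*) echo "Migration stage not found in tpstats output" >&2; return 2 ;;
    esac
    if [ "$second" -gt 0 ] && [ "$second" -ge "$first" ]; then
        echo "migration backlog not draining: ${first} -> ${second}"
        return 0
    fi
    echo "migration stage ok: ${first} -> ${second}"
    return 1
}
```

Two samples are only a coarse trend. A monitoring system that tracks the same value over time gives a more reliable picture.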

Fixes

Outlier node is DOWN or unreachable

Do not attempt schema repair on a node that is not fully in the ring. Bring the node back to UN (Up Normal) first. Once it is reachable via gossip, schema convergence usually happens automatically. If the node returns but still shows the old schema UUID after five minutes, proceed to the next fix.

Migration backlog or hung schema pull on a live node

If the outlier is UP but refuses to converge, run nodetool resetlocalschema on that node. This drops the local node’s schema tables and repopulates them by pulling the current schema from a gossip peer. It is safe to run on a live node, but it forces a full resync and should not be your first reaction to transient disagreement. The node must reach peers via gossip for the pull to succeed.

After running the command, wait up to five minutes and re-check nodetool describecluster. Most cases resolve here.
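Because the schema pull depends on gossip, it can help to guard the reset with a gossip check. The sketch below assumes nodetool statusgossip prints "running" when gossip is active on the local node. The wrapper name reset_local_schema_guarded is hypothetical.

```shell
# Run resetlocalschema only when gossip is active on this node.
reset_local_schema_guarded() {
    if [ "$(nodetool statusgossip)" != "running" ]; then
        echo "gossip is not running on this node; refusing to reset schema" >&2
        return 1
    fi
    nodetool resetlocalschema || return 1
    echo "schema reset requested; re-check nodetool describecluster in a few minutes"
}
```

Run it on the outlier node itself. It confirms only that local gossip is running, not that peers are reachable, so the UNREACHABLE check in nodetool describecluster still applies.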

Persistent disagreement after resetlocalschema

If the cluster still shows multiple versions, gracefully restart the outlier (one node at a time if there are several). First, drain the node. This disables native transport and stops client traffic on that node.

nodetool drain

Then restart the Cassandra process. drain flushes memtables to disk so the commit log does not need to be replayed at startup, which is safer than a hard kill. After the node rejoins, verify that nodetool describecluster shows a single version.
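The drain-then-restart sequence can be wrapped so the restart never happens after a failed drain. This is a sketch: the restart command depends on how Cassandra is installed, so it is passed in as arguments, and the function name is hypothetical.

```shell
# Drain the local node, then run the supplied restart command.
# Usage: drain_and_restart <restart command...>
drain_and_restart() {
    if [ "$#" -eq 0 ]; then
        echo "usage: drain_and_restart <restart command...>" >&2
        return 2
    fi
    # Abort if drain fails, so the process is never restarted undrained.
    nodetool drain || { echo "drain failed; not restarting" >&2; return 1; }
    "$@"
}
```

On a systemd host the call might look like drain_and_restart sudo systemctl restart cassandra, where the unit name cassandra is an assumption about the installation.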

Rolling upgrade scenario

If schema disagreement appears during a rolling upgrade, finish upgrading all nodes before attempting any schema repair. Nodes running different versions may not agree on schema format. Once every node is on the same version, the disagreement should resolve automatically. If it does not, only then consider nodetool resetlocalschema on the remaining outliers.
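One way to confirm that the upgrade is complete is to compare the RELEASE_VERSION values that nodes advertise through gossip. The sketch below assumes nodetool gossipinfo prints a RELEASE_VERSION line per endpoint with the version as the last colon-separated field. Entries for dead nodes may still carry their last known version. The function names are hypothetical.

```shell
# Print the distinct release versions found in gossipinfo output on stdin.
release_versions() {
    awk -F: '/^[ \t]*RELEASE_VERSION:/ { print $NF }' | sort -u
}

# Succeed only when every endpoint advertises the same release version.
single_release_version() {
    local versions count
    versions=$(nodetool gossipinfo | release_versions)
    count=$(printf '%s\n' "$versions" | awk 'NF { n++ } END { print n + 0 }')
    if [ "$count" -eq 1 ]; then
        echo "all nodes report release version ${versions}"
        return 0
    fi
    echo "found ${count} release versions; finish the upgrade before schema repair:" >&2
    printf '%s\n' "$versions" >&2
    return 1
}
```

A non-zero exit status indicates a mixed-version cluster, in which case nodetool resetlocalschema should wait.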

Prevention

  • Serialize DDL operations. Do not issue a new CREATE, ALTER, or DROP until nodetool describecluster returns exactly one schema version. Parallel or rapid-fire DDL is a common cause of spurious disagreement.
  • Check nodetool status before running DDL. If any node is DN, UJ, or in a non-NORMAL state, wait until the cluster is fully stable.
  • During rolling restarts or upgrades, pause DDL until the topology is fully stable and all nodes report UN.
  • Monitor the Migration stage pending task count via nodetool tpstats or JMX. A growing queue is an early warning that schema mutations are stalling.
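The first two prevention rules can be combined into a preflight check that runs before any DDL. This is a minimal sketch, assuming nodetool status prints one line per node starting with a two-letter state code such as UN or DN. The function names are hypothetical.

```shell
# Count nodes in `nodetool status` output (stdin) whose state is not UN.
non_un_nodes() {
    awk '/^[UD?][NLJM][ \t]/ && $1 != "UN" { n++ } END { print n + 0 }'
}

# Succeed only when every node is UN and exactly one schema version exists.
ddl_preflight() {
    local bad versions
    bad=$(nodetool status | non_un_nodes)
    versions=$(nodetool describecluster \
        | grep -Ec '^[[:space:]]*[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}:' || true)
    if [ "$bad" -gt 0 ]; then
        echo "${bad} node(s) not in UN state; postpone DDL" >&2
        return 1
    fi
    if [ "$versions" -ne 1 ]; then
        echo "${versions} schema versions reported; postpone DDL" >&2
        return 1
    fi
    echo "preflight ok: all nodes UN, one schema version"
}
```

A deployment script could then run something like ddl_preflight && cqlsh -f migration.cql, where migration.cql is a placeholder file name.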

How Netdata helps

Netdata collects the SchemaVersions map via JMX from org.apache.cassandra.db:type=StorageService and exposes whether the cluster is in agreement. You can correlate schema disagreement with node liveness and gossip health.

  • Alerts on sustained schema disagreement (> 5 minutes) without requiring manual nodetool checks.
  • Correlates schema splits with unreachable gossip members to distinguish node failure from migration stalls.
  • Tracks Migration stage pending tasks via JMX thread pool metrics to catch backlog before it blocks DDL.
  • Surfaces the signal alongside GC pause duration and thread pool saturation, helping you determine whether a migration stall is caused by JVM pressure or I/O blocking.