MySQL GTID errant transactions: detecting replication divergence

Before a planned failover, or after an orchestrator aborts with a GTID consistency error, replication can appear healthy: SHOW REPLICA STATUS reports both threads running, Seconds_Behind_Source is near zero, and the error log is quiet. Yet comparing GTID sets between source and replica reveals mismatching numbers. Errant transactions are GTIDs present in a replica’s gtid_executed set that the source never generated.

Ordinary lag resolves as the replica catches up. Errant transactions represent true divergence. Promoting a replica with extra GTIDs propagates those transactions into the new source and downstream replicas, creating split-brain that is expensive to reverse. This guide covers distinguishing benign lag from dangerous divergence, pinpointing offending GTIDs, and deciding between empty-transaction injection and a full rebuild.

What this means

In GTID replication, every committed transaction carries a unique identifier composed of a UUID and a sequence number. Every server tracks its committed transaction history in gtid_executed.

Ordinary lag is benign. When GTID_SUBTRACT(source_executed, replica_executed) returns a non-empty set, the replica is behind. As long as the I/O thread is fetching events and the SQL thread is applying them, the replica converges without operator intervention.

True divergence is the inverse. When GTID_SUBTRACT(replica_executed, source_executed) returns a non-empty set, the replica has transactions the source lacks. This can happen after a failed failover where a replica was briefly promoted and accepted writes, after direct writes to a replica with read_only=OFF, or after maintenance scripts executed without disabling binary logging. The replica is not behind; it has branched.

A separate but equally serious failure is source gtid_purged advancing past the replica’s needs. If the source purges binary logs containing GTIDs the replica has not yet consumed, auto-positioning cannot resolve the gap. The replica is incomplete, not divergent. The only resolution is a full rebuild.

flowchart TD
    A[Collect GTID sets] --> B{Source minus replica
non-empty?} B -->|Yes| C[Ordinary lag] B -->|No| D{Replica minus source
non-empty?} D -->|Yes| E[Errant transactions] D -->|No| F[Consistent] E --> G{Source purged
past replica needs?} G -->|Yes| H[Rebuild required] G -->|No| I[Empty transaction fix]

Common causes

CauseWhat it looks likeFirst thing to check
Failed failover with brief promotionA formerly promoted replica rejoins but carries writes from its promotion windowgtid_executed on the demoted node versus current source
Direct writes to a replicaApplication or operator writes to a replica with read_only=OFFSHOW PROCESSLIST history, application connection strings
Maintenance on replica with binlog enabledAd-hoc repairs or schema changes on the replica propagate as new GTIDsWhether SQL_LOG_BIN was disabled during the session
Transaction skipping under GTIDUsing sql_slave_skip_counter in GTID mode, which is unsupported and can leave gapsError log for skip attempts; Retrieved_Gtid_Set versus Executed_Gtid_Set
Group Replication multi-primary conflictsConflicting writes on multiple primaries create divergent GTID setsPer-node gtid_executed in performance_schema.replication_group_members

Quick checks

Run these read-only checks on the source and replica to classify the problem.

-- On source: collect executed GTID set
SELECT @@global.gtid_executed AS source_executed;
-- On replica: collect executed and retrieved GTID sets
SHOW REPLICA STATUS\G
-- On replica: ordinary lag (expected during catch-up)
SELECT GTID_SUBTRACT('SOURCE_GTID_SET', @@global.gtid_executed) AS missing_on_replica;
-- On replica: errant transactions (critical divergence)
SELECT GTID_SUBTRACT(@@global.gtid_executed, 'SOURCE_GTID_SET') AS errant_on_replica;
-- On replica: returns 1 if replica is a subset of source (clean), 0 if errant
SELECT GTID_SUBSET(@@global.gtid_executed, 'SOURCE_GTID_SET');
-- On source: detect whether needed binlogs have been purged
SELECT GTID_SUBTRACT(@@global.gtid_purged, 'REPLICA_GTID_EXECUTED') AS purged_gap;

If the last query returns non-empty, the source has purged GTIDs the replica never received. The replica cannot recover via replication.

How to diagnose it

  1. Pause automation. Stop orchestrated failover, MHA, MySQL Shell AdminAPI switchover, or any other automated topology changes until you resolve the divergence.
  2. Collect GTID sets from all nodes. Run SELECT @@global.gtid_executed on the current source and every replica. Record Retrieved_Gtid_Set and Executed_Gtid_Set from SHOW REPLICA STATUS on each replica.
  3. Classify the gap. Run GTID_SUBTRACT(source_executed, replica_executed) and GTID_SUBTRACT(replica_executed, source_executed). The first detects ordinary lag. The second detects errant transactions.
  4. Identify specific errant UUIDs. The output of GTID_SUBTRACT gives exact UUID:RANGE values. Cross-reference these against the replica’s error log or SHOW BINLOG EVENTS in the relevant binlog file to determine their origin. If the errant UUID matches the replica’s own server_uuid, the transactions originated locally.
  5. Check for purged gaps. On the source, compare gtid_purged against each replica’s gtid_executed. If GTID_SUBTRACT(source_gtid_purged, replica_gtid_executed) is non-empty, that replica is irrecoverable.
  6. Assess scope. Determine whether the errant GTIDs exist on one replica, many replicas, or the source itself. Multi-replica divergence usually points to a former source that was demoted after accepting writes.
  7. Correlate with events. Check the error log for recent failover events, CHANGE REPLICATION SOURCE TO commands, or RESET MASTER operations. Look for application writes to replica hosts.

Temporary gaps in gtid_executed (for example, uuid:1-5:7-10) can appear on multi-threaded replicas. These fill as replication progresses. Do not confuse transient gaps with errant transactions from a different UUID.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
GTID_SUBTRACT(replica_executed, source_executed)Definitive check for true divergenceAny non-empty result
GTID_SUBSET(replica_executed, source_executed)Returns 1 when replica has no extra GTIDsReturns 0
Source gtid_purged versus replica gtid_executedDetects whether needed binlogs are goneGTID_SUBTRACT(gtid_purged, replica_executed) non-empty
Seconds_Behind_SourceDistinguishes ordinary lag from divergenceLow lag with GTID mismatch suggests errant transactions
Replica_SQL_Running / Replica_IO_RunningBroken threads can hide or mimic divergenceEither thread not Yes
Retrieved_Gtid_Set versus Executed_Gtid_SetApply backlog on multi-threaded replicasGrowing delta with replication running

Fixes

Isolated errant GTIDs on replicas

If the replica has only a few errant GTIDs and the source has not purged them, inject empty transactions on the source so the GTIDs propagate naturally through the topology.

-- Run on the source for each errant GTID
SET GTID_NEXT='UUID:NUMBER';
BEGIN;
COMMIT;
SET GTID_NEXT=AUTOMATIC;

This marks the GTID as executed on the source and all downstream replicas without changing application data. Injecting on the source is preferred over injecting on each replica individually, because replication propagates the empty transaction consistently.

For bulk repair across many replicas, use mysqlslavetrx:

# Inject empty transactions across multiple replicas
mysqlslavetrx --gtid-set='<errant_gtid_set>' --verbose --slaves=user:pass@host:port,...

Warning: injecting empty transactions permanently adds those GTIDs to the source’s gtid_executed. The topology now carries the residue of the divergence in its GTID history. Document the injection event in your runbook.

Source gtid_purged advanced past replica needs

If the source has purged binary logs containing GTIDs the replica lacks, empty transactions cannot help. The replica is permanently inconsistent with the source’s available history.

Fix by rebuilding the replica:

  1. STOP REPLICA; RESET REPLICA ALL; on MySQL 8.0, or STOP SLAVE; RESET SLAVE ALL; on 5.7.
  2. Provision a fresh dataset with mysqldump --set-gtid-purged=OFF or a physical backup tool such as Percona XtraBackup. When the source has already purged needed logs, avoid --set-gtid-purged=ON on an empty replica; rely on SOURCE_AUTO_POSITION=1 after restore.
  3. Repoint replication with CHANGE REPLICATION SOURCE TO SOURCE_AUTO_POSITION=1; START REPLICA;. On MySQL 5.7, use the equivalent CHANGE MASTER TO MASTER_AUTO_POSITION=1; syntax.

Warning: RESET MASTER on the replica resets both binary logs and GTID state. Do not run this on a source or on a replica you intend to keep as a replica without understanding the GTID reset implications.

MySQL Shell AdminAPI ClusterSet

In ClusterSet deployments, errant GTIDs surface in clusterSet.status({extended: 1}) as transactionSetErrantGtidSet with global status OK_NOT_CONSISTENT. Controlled switchover is blocked, but emergency failover remains possible. Use the AdminAPI reconciliation procedure to align the cluster, or perform an emergency failover and rebuild the divergent node.

Group Replication multi-primary conflicts

If using multi-primary Group Replication, errant transactions from conflicting writes require removing the affected node from the group, rebuilding it from an online member, and rejoining it.

Prevention

  • Enforce replica immutability. Set read_only=ON on all replicas. On MySQL 8.0, use super_read_only=ON to prevent even SUPER-privileged writes.
  • Disable binary logging before ad-hoc changes. On a replica, run SET SQL_LOG_BIN=0; before maintenance, then SET SQL_LOG_BIN=1; after.
  • Never use sql_slave_skip_counter under GTID. Skip operations must use GTID_NEXT with an empty transaction, applied deliberately and documented.
  • Automate pre-failover GTID checks. Any orchestrator or runbook should verify GTID_SUBTRACT(replica_executed, source_executed) is empty before promoting a replica.
  • Monitor GTID sets proactively. Schedule periodic consistency checks that compare gtid_executed across the topology and alert on non-empty GTID_SUBTRACT(replica, source).

How Netdata helps

Netdata’s MySQL collector tracks Replica_IO_Running, Replica_SQL_Running, and Seconds_Behind_Source in real time.