MySQL GTID errant transactions: detecting replication divergence
Before a planned failover, or after an orchestrator aborts with a GTID consistency error, replication can appear healthy: SHOW REPLICA STATUS reports both threads running, Seconds_Behind_Source is near zero, and the error log is quiet. Yet comparing GTID sets between source and replica reveals mismatching numbers. Errant transactions are GTIDs present in a replica’s gtid_executed set that the source never generated.
Ordinary lag resolves as the replica catches up. Errant transactions represent true divergence. Promoting a replica with extra GTIDs propagates those transactions into the new source and downstream replicas, creating split-brain that is expensive to reverse. This guide covers distinguishing benign lag from dangerous divergence, pinpointing offending GTIDs, and deciding between empty-transaction injection and a full rebuild.
What this means
In GTID replication, every committed transaction carries a unique identifier composed of a UUID and a sequence number. Every server tracks its committed transaction history in gtid_executed.
Ordinary lag is benign. When GTID_SUBTRACT(source_executed, replica_executed) returns a non-empty set, the replica is behind. As long as the I/O thread is fetching events and the SQL thread is applying them, the replica converges without operator intervention.
True divergence is the inverse. When GTID_SUBTRACT(replica_executed, source_executed) returns a non-empty set, the replica has transactions the source lacks. This can happen after a failed failover where a replica was briefly promoted and accepted writes, after direct writes to a replica with read_only=OFF, or after maintenance scripts executed without disabling binary logging. The replica is not behind; it has branched.
A separate but equally serious failure is source gtid_purged advancing past the replica’s needs. If the source purges binary logs containing GTIDs the replica has not yet consumed, auto-positioning cannot resolve the gap. The replica is incomplete, not divergent. The only resolution is a full rebuild.
flowchart TD
A[Collect GTID sets] --> B{Source minus replica
non-empty?}
B -->|Yes| C[Ordinary lag]
B -->|No| D{Replica minus source
non-empty?}
D -->|Yes| E[Errant transactions]
D -->|No| F[Consistent]
E --> G{Source purged
past replica needs?}
G -->|Yes| H[Rebuild required]
G -->|No| I[Empty transaction fix]Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Failed failover with brief promotion | A formerly promoted replica rejoins but carries writes from its promotion window | gtid_executed on the demoted node versus current source |
| Direct writes to a replica | Application or operator writes to a replica with read_only=OFF | SHOW PROCESSLIST history, application connection strings |
| Maintenance on replica with binlog enabled | Ad-hoc repairs or schema changes on the replica propagate as new GTIDs | Whether SQL_LOG_BIN was disabled during the session |
| Transaction skipping under GTID | Using sql_slave_skip_counter in GTID mode, which is unsupported and can leave gaps | Error log for skip attempts; Retrieved_Gtid_Set versus Executed_Gtid_Set |
| Group Replication multi-primary conflicts | Conflicting writes on multiple primaries create divergent GTID sets | Per-node gtid_executed in performance_schema.replication_group_members |
Quick checks
Run these read-only checks on the source and replica to classify the problem.
-- On source: collect executed GTID set
SELECT @@global.gtid_executed AS source_executed;
-- On replica: collect executed and retrieved GTID sets
SHOW REPLICA STATUS\G
-- On replica: ordinary lag (expected during catch-up)
SELECT GTID_SUBTRACT('SOURCE_GTID_SET', @@global.gtid_executed) AS missing_on_replica;
-- On replica: errant transactions (critical divergence)
SELECT GTID_SUBTRACT(@@global.gtid_executed, 'SOURCE_GTID_SET') AS errant_on_replica;
-- On replica: returns 1 if replica is a subset of source (clean), 0 if errant
SELECT GTID_SUBSET(@@global.gtid_executed, 'SOURCE_GTID_SET');
-- On source: detect whether needed binlogs have been purged
SELECT GTID_SUBTRACT(@@global.gtid_purged, 'REPLICA_GTID_EXECUTED') AS purged_gap;
If the last query returns non-empty, the source has purged GTIDs the replica never received. The replica cannot recover via replication.
How to diagnose it
- Pause automation. Stop orchestrated failover, MHA, MySQL Shell AdminAPI switchover, or any other automated topology changes until you resolve the divergence.
- Collect GTID sets from all nodes. Run
SELECT @@global.gtid_executedon the current source and every replica. RecordRetrieved_Gtid_SetandExecuted_Gtid_SetfromSHOW REPLICA STATUSon each replica. - Classify the gap. Run
GTID_SUBTRACT(source_executed, replica_executed)andGTID_SUBTRACT(replica_executed, source_executed). The first detects ordinary lag. The second detects errant transactions. - Identify specific errant UUIDs. The output of
GTID_SUBTRACTgives exactUUID:RANGEvalues. Cross-reference these against the replica’s error log orSHOW BINLOG EVENTSin the relevant binlog file to determine their origin. If the errant UUID matches the replica’s ownserver_uuid, the transactions originated locally. - Check for purged gaps. On the source, compare
gtid_purgedagainst each replica’sgtid_executed. IfGTID_SUBTRACT(source_gtid_purged, replica_gtid_executed)is non-empty, that replica is irrecoverable. - Assess scope. Determine whether the errant GTIDs exist on one replica, many replicas, or the source itself. Multi-replica divergence usually points to a former source that was demoted after accepting writes.
- Correlate with events. Check the error log for recent failover events,
CHANGE REPLICATION SOURCE TOcommands, orRESET MASTERoperations. Look for application writes to replica hosts.
Temporary gaps in gtid_executed (for example, uuid:1-5:7-10) can appear on multi-threaded replicas. These fill as replication progresses. Do not confuse transient gaps with errant transactions from a different UUID.
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
GTID_SUBTRACT(replica_executed, source_executed) | Definitive check for true divergence | Any non-empty result |
GTID_SUBSET(replica_executed, source_executed) | Returns 1 when replica has no extra GTIDs | Returns 0 |
Source gtid_purged versus replica gtid_executed | Detects whether needed binlogs are gone | GTID_SUBTRACT(gtid_purged, replica_executed) non-empty |
Seconds_Behind_Source | Distinguishes ordinary lag from divergence | Low lag with GTID mismatch suggests errant transactions |
Replica_SQL_Running / Replica_IO_Running | Broken threads can hide or mimic divergence | Either thread not Yes |
Retrieved_Gtid_Set versus Executed_Gtid_Set | Apply backlog on multi-threaded replicas | Growing delta with replication running |
Fixes
Isolated errant GTIDs on replicas
If the replica has only a few errant GTIDs and the source has not purged them, inject empty transactions on the source so the GTIDs propagate naturally through the topology.
-- Run on the source for each errant GTID
SET GTID_NEXT='UUID:NUMBER';
BEGIN;
COMMIT;
SET GTID_NEXT=AUTOMATIC;
This marks the GTID as executed on the source and all downstream replicas without changing application data. Injecting on the source is preferred over injecting on each replica individually, because replication propagates the empty transaction consistently.
For bulk repair across many replicas, use mysqlslavetrx:
# Inject empty transactions across multiple replicas
mysqlslavetrx --gtid-set='<errant_gtid_set>' --verbose --slaves=user:pass@host:port,...
Warning: injecting empty transactions permanently adds those GTIDs to the source’s gtid_executed. The topology now carries the residue of the divergence in its GTID history. Document the injection event in your runbook.
Source gtid_purged advanced past replica needs
If the source has purged binary logs containing GTIDs the replica lacks, empty transactions cannot help. The replica is permanently inconsistent with the source’s available history.
Fix by rebuilding the replica:
STOP REPLICA; RESET REPLICA ALL;on MySQL 8.0, orSTOP SLAVE; RESET SLAVE ALL;on 5.7.- Provision a fresh dataset with
mysqldump --set-gtid-purged=OFFor a physical backup tool such as Percona XtraBackup. When the source has already purged needed logs, avoid--set-gtid-purged=ONon an empty replica; rely onSOURCE_AUTO_POSITION=1after restore. - Repoint replication with
CHANGE REPLICATION SOURCE TO SOURCE_AUTO_POSITION=1; START REPLICA;. On MySQL 5.7, use the equivalentCHANGE MASTER TO MASTER_AUTO_POSITION=1;syntax.
Warning: RESET MASTER on the replica resets both binary logs and GTID state. Do not run this on a source or on a replica you intend to keep as a replica without understanding the GTID reset implications.
MySQL Shell AdminAPI ClusterSet
In ClusterSet deployments, errant GTIDs surface in clusterSet.status({extended: 1}) as transactionSetErrantGtidSet with global status OK_NOT_CONSISTENT. Controlled switchover is blocked, but emergency failover remains possible. Use the AdminAPI reconciliation procedure to align the cluster, or perform an emergency failover and rebuild the divergent node.
Group Replication multi-primary conflicts
If using multi-primary Group Replication, errant transactions from conflicting writes require removing the affected node from the group, rebuilding it from an online member, and rejoining it.
Prevention
- Enforce replica immutability. Set
read_only=ONon all replicas. On MySQL 8.0, usesuper_read_only=ONto prevent even SUPER-privileged writes. - Disable binary logging before ad-hoc changes. On a replica, run
SET SQL_LOG_BIN=0;before maintenance, thenSET SQL_LOG_BIN=1;after. - Never use
sql_slave_skip_counterunder GTID. Skip operations must useGTID_NEXTwith an empty transaction, applied deliberately and documented. - Automate pre-failover GTID checks. Any orchestrator or runbook should verify
GTID_SUBTRACT(replica_executed, source_executed)is empty before promoting a replica. - Monitor GTID sets proactively. Schedule periodic consistency checks that compare
gtid_executedacross the topology and alert on non-emptyGTID_SUBTRACT(replica, source).
How Netdata helps
Netdata’s MySQL collector tracks Replica_IO_Running, Replica_SQL_Running, and Seconds_Behind_Source in real time.







