MySQL semi-synchronous replication stall: commits hanging on ACK
Commits suddenly take 10 seconds or more and then time out. On the MySQL primary, Threads_running climbs while the commit rate flatlines. A moment later, commits resume, but Rpl_semi_sync_source_no_tx is ticking upward. The primary is waiting for a semi-synchronous replica to acknowledge receipt of binlog events, and that ACK is not arriving fast enough or at all.
Confirm the stall, find the missing ACK, and decide whether to fix the replica path or temporarily fall back to asynchronous replication.
What this means
In semi-synchronous replication, the source blocks each commit until at least rpl_semi_sync_source_wait_for_replica_count replicas acknowledge that they have received the event and flushed it to their relay log. The replica does not need to execute the transaction; its I/O thread only needs to persist the event. If the ACK does not arrive within rpl_semi_sync_source_timeout milliseconds, the source stops waiting and reverts to asynchronous replication, and it stays asynchronous until a semi-sync replica catches up.
When a replica is slow, disconnected, or unable to flush its relay log, the source waits. If rpl_semi_sync_source_wait_no_replica is ON (the default), the source keeps waiting for the full timeout even if the number of connected semi-sync replicas drops below the required count during the wait. With the setting OFF, the source reverts to asynchronous replication as soon as the count drops below the requirement. The result of a full-timeout wait is a burst of commit latency that cascades into connection pile-up and application timeouts.
flowchart TD
A[Client commits on source] --> B{Semi-sync enabled?}
B -->|Yes| C[Wait for replica ACK]
C --> D{ACK received within timeout?}
D -->|Yes| E[Commit returns]
D -->|No| F[Silently fall back to async]
F --> E
B -->|No| G[Commit async]
G --> E
H[Slow replica or network loss] --> I[Relay log flush delayed]
I --> C

The default rpl_semi_sync_source_timeout is 10000 ms (10 seconds). In OLTP workloads, a 10-second commit stall is an outage. This is the direct trade-off between durability and availability.
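To make the cost of the timeout concrete, the following Python sketch models how long a single commit blocks and gives a rough count of sessions that pile up behind a stall. It is an illustration, not MySQL's implementation: the function names are hypothetical, the pile-up estimate uses Little's law (sessions = commit rate x wait time), and it ignores max_connections and the fact that the source stops waiting on later commits once it has reverted to asynchronous replication.

```python
from typing import Optional, Tuple

DEFAULT_TIMEOUT_MS = 10_000  # default rpl_semi_sync_source_timeout


def commit_wait(ack_delay_ms: Optional[float],
                timeout_ms: float = DEFAULT_TIMEOUT_MS) -> Tuple[float, bool]:
    """Return (milliseconds the commit blocks, True if it was acknowledged).

    ack_delay_ms is None when the ACK never arrives.
    """
    if ack_delay_ms is not None and ack_delay_ms <= timeout_ms:
        return ack_delay_ms, True
    # No ACK in time: the commit blocks for the whole timeout, then proceeds.
    return timeout_ms, False


def sessions_blocked(commit_rate_per_s: float, wait_ms: float) -> float:
    """Little's law estimate of sessions stuck in commit during the wait."""
    return commit_rate_per_s * wait_ms / 1000.0


if __name__ == "__main__":
    wait_ms, acked = commit_wait(ack_delay_ms=None)
    print(wait_ms, acked, sessions_blocked(200, wait_ms))
```

Under these assumptions, a source handling 200 commits per second with the default timeout and no ACK could have on the order of 2000 sessions waiting before the first commit returns, which is why the timeout should be chosen with the connection limit in mind.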
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Replica network partition or latency spike | Rpl_semi_sync_source_clients drops; replica I/O thread shows “Connecting” | SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_clients' and network latency between nodes |
| Replica relay-log device saturation | Replica I/O thread is running but ACKs are slow; replica disk write queue is high | iostat -x on the replica relay log partition |
| Replica crashed or I/O thread stopped | No ACK possible; replica is unreachable or replication broken | SHOW REPLICA STATUS on the replica for Replica_IO_Running and Last_IO_Error |
| Timeout configured too high for SLO | Commits hang for the full duration of rpl_semi_sync_source_timeout before fallback | SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_source_timeout' |
| Plugin mismatch after upgrade | Semi-sync variables missing; Rpl_semi_sync_source_status stays OFF despite plugin listed in mysql.plugin | SHOW PLUGINS and SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_%' |
Quick checks
Run these read-only checks on the source first, then on the replica.
-- Check semi-sync source status and client count
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_%';
-- Check how many replicas are required vs connected
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_source_wait_for_replica_count';
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_clients';
-- Check whether commits are timing out
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_no_tx';
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_yes_tx';
On the replica:
-- Check replica I/O thread health
SHOW REPLICA STATUS\G
# Check relay log device write latency on the replica
# Look for high await or queue depth on the relay log filesystem
iostat -x 1 5
-- Check if the semi-sync replica plugin is loaded
SHOW PLUGINS WHERE Name LIKE '%semi_sync%';
-- Back on the source: verify timeout and wait point configuration
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_source_timeout';
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_source_wait_point';
How to diagnose it
1. Confirm semi-sync is the bottleneck. On the source, check Rpl_semi_sync_source_status. If it is ON, the plugin is enabled and the source is operating in semi-sync mode; OFF means semi-sync is disabled or the source has already fallen back to asynchronous replication. If Rpl_semi_sync_source_clients is less than rpl_semi_sync_source_wait_for_replica_count, the source will wait. Compare Rpl_semi_sync_source_yes_tx and Rpl_semi_sync_source_no_tx over a 30-second window. If no_tx is increasing while yes_tx is flat, timeouts are occurring.
2. Identify which replica is missing. Rpl_semi_sync_source_clients tells you how many semi-sync-capable replicas are connected. If you expect one and see zero, the replica I/O thread has disconnected. Check SHOW PROCESSLIST on the source for replication connections, or check SHOW REPLICA STATUS on each replica for Replica_IO_Running.
3. Check the replica I/O thread state. The replica ACKs after writing to the relay log and flushing to disk. If Replica_IO_Running is “Connecting”, the replica is trying to reconnect. If it is “No”, check Last_IO_Error. A stopped I/O thread means no ACKs.
4. Measure network health. Use ping, mtr, or TCP latency checks between the source and replica. Semi-sync ACKs are small packets, but a network partition or severe packet loss will delay or drop them. Check whether firewall rules or security groups recently changed.
5. Inspect replica disk I/O. The replica must fsync the relay log before ACKing. If the relay log partition is on a saturated disk, the fsync delays the ACK. On the replica, run iostat -x and look for high await or %util on the relay log device.
6. Review configuration for availability trade-offs. Check rpl_semi_sync_source_wait_point. AFTER_SYNC (the default since MySQL 5.7) waits for the ACK before the storage engine commits, so other sessions cannot see the data until the ACK arrives. AFTER_COMMIT commits to the storage engine first, then waits for the ACK, meaning other sessions may see the data before it is replicated. Neither changes the stall duration, but AFTER_SYNC is lossless on failover.
7. Verify plugin compatibility. In MySQL 8.0.26 and later, semi-sync plugins and variables were renamed from rpl_semi_sync_master_* and rpl_semi_sync_slave_* to rpl_semi_sync_source_* and rpl_semi_sync_replica_*. If my.cnf references the old names on a new server, or if old plugins remain installed after an upgrade, semi-sync may fail to initialize: the plugin might appear in mysql.plugin but not load correctly, leaving the status variables missing. Check SHOW PLUGINS for rpl_semi_sync_source or rpl_semi_sync_master, and ensure the source and replica plugin names match the server version.
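The first two diagnosis steps can be sketched as a small Python helper that compares two samples of the source's status counters taken about 30 seconds apart. This is a minimal sketch with assumptions: the function name is hypothetical, the inputs are plain dictionaries you have already filled from SHOW GLOBAL STATUS (values arrive as strings), and the status value is accepted as either ON or 1 because some collectors convert it to a number.

```python
from typing import Dict, List


def _delta(before: Dict[str, str], after: Dict[str, str], name: str) -> int:
    """Change in a counter between the two samples."""
    return int(after[name]) - int(before[name])


def semi_sync_findings(before: Dict[str, str], after: Dict[str, str],
                       required_replicas: int) -> List[str]:
    """Return human-readable findings; an empty list means no problem seen."""
    findings: List[str] = []

    status = str(after.get("Rpl_semi_sync_source_status", "OFF")).upper()
    if status not in ("ON", "1"):
        findings.append("source is not currently in semi-sync mode")

    clients = int(after.get("Rpl_semi_sync_source_clients", 0))
    if clients < required_replicas:
        findings.append(
            f"{clients} semi-sync replica(s) connected, {required_replicas} required"
        )

    yes_delta = _delta(before, after, "Rpl_semi_sync_source_yes_tx")
    no_delta = _delta(before, after, "Rpl_semi_sync_source_no_tx")
    if no_delta > 0 and yes_delta == 0:
        findings.append("commits are timing out without an ACK")

    return findings
```

Pass rpl_semi_sync_source_wait_for_replica_count as required_replicas. The helper only reports what the counters show; finding out why the ACK is missing still requires the replica-side checks in steps 3 to 5.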
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Rpl_semi_sync_source_clients | Shows how many replicas are connected and capable of sending ACKs | Drops below rpl_semi_sync_source_wait_for_replica_count |
| Rpl_semi_sync_source_no_tx | Counts commits that were not acknowledged by a replica | Sustained increase means the primary is falling back to async |
| Rpl_semi_sync_source_yes_tx | Counts commits that succeeded in semi-sync mode | Flat or dropping while write volume is constant |
| Rpl_semi_sync_source_status | Whether the source is currently operating in semi-sync mode | Dropping to OFF (0) during a write burst |
| Questions rate | Query throughput on the source | Sustained drop while Threads_running rises |
| Threads_running | Active concurrency | Rising above CPU core count while commit rate drops |
| Replica_IO_Running | Whether the replica I/O thread is fetching events | Not “Yes” on a replica expected to be semi-sync |
| Seconds_Behind_Source | SQL apply lag on the replica | Sudden spikes indicate replica health issues, though this lags behind the ACK stall |
Fixes
Reduce the timeout for faster fallback
If availability is more important than durability for every single commit, lower rpl_semi_sync_source_timeout so the source falls back to async faster.
-- Reduce timeout to 5 seconds (default is 10000 ms)
SET GLOBAL rpl_semi_sync_source_timeout = 5000;
Trade-off: You lose the durability guarantee sooner. A failover in the gap between async fallback and replica catch-up can lose data.
Allow immediate async fallback when no replicas are connected
If you prefer async replication to a full commit stall when all replicas disconnect, set rpl_semi_sync_source_wait_no_replica to OFF.
SET GLOBAL rpl_semi_sync_source_wait_no_replica = OFF;
Trade-off: If all replicas are offline, commits proceed without any replication guarantee. This can lead to significant data loss if the primary fails before replicas return.
Temporarily disable semi-sync on the source
If the replica path is broken and you need immediate relief, disable semi-sync on the source. The plugin stays installed, but all commits become asynchronous immediately.
SET GLOBAL rpl_semi_sync_source_enabled = OFF;
Trade-off: You lose all semi-sync durability guarantees until re-enabled. Only use this when the alternative is complete write unavailability.
Fix the replica I/O path
If the replica is disk-bound on relay log writes, move the relay logs to a dedicated, low-latency device or increase the replica’s I/O capacity. If the replica I/O thread has stopped due to an error, resolve the error (for example, a binlog expired on the source before the replica fetched it) and restart replication.
Correct plugin mismatches
If the semi-sync plugin failed to load after an upgrade because old and new plugin names are mixed, uninstall the deprecated plugins and install the current ones. Only run these when the plugin is not actively needed or during a maintenance window.
-- On source
UNINSTALL PLUGIN rpl_semi_sync_master;
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
-- On replica
UNINSTALL PLUGIN rpl_semi_sync_slave;
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
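A server restart is not required to install or uninstall the plugins, but verify that the rpl_semi_sync_* variables appear afterward. A freshly installed plugin is disabled by default, so set rpl_semi_sync_source_enabled on the source and rpl_semi_sync_replica_enabled on the replica to ON, then restart the replica I/O thread so the replica registers as semi-sync.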
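A post-upgrade check for leftover plugin names can be sketched in Python as follows. It assumes a server running MySQL 8.0.26 or later and a list of plugin names you have already collected from SHOW PLUGINS; the function name and the message wording are hypothetical.

```python
from typing import Dict, Iterable, List

# Deprecated plugin name -> name used from MySQL 8.0.26 onward.
RENAMED_PLUGINS: Dict[str, str] = {
    "rpl_semi_sync_master": "rpl_semi_sync_source",
    "rpl_semi_sync_slave": "rpl_semi_sync_replica",
}


def plugin_name_problems(installed: Iterable[str]) -> List[str]:
    """Report deprecated or mixed semi-sync plugin names."""
    names = {name.lower() for name in installed}
    problems: List[str] = []
    for old, new in RENAMED_PLUGINS.items():
        if old in names and new in names:
            problems.append(f"both {old} and {new} are installed")
        elif old in names:
            problems.append(f"deprecated {old} is installed; {new} replaces it")
    return problems
```

An empty result means no old names were found. It does not prove the plugin loaded correctly, so still confirm that the status variables exist.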
Prevention
- Monitor Rpl_semi_sync_source_clients. A drop below your required count is the earliest warning of a stall.
- Set rpl_semi_sync_source_wait_for_replica_count below your total replica count. If you have three replicas, require one. A single replica failure then does not stall commits.
- Set a timeout that matches your application’s connection or lock wait limits. The 10-second default is too long for many workloads.
- Watch the ratio of Rpl_semi_sync_source_no_tx to total commits. A rising rate of async fallbacks indicates intermittent replica path problems before they become full stalls.
- After any MySQL upgrade, confirm SHOW PLUGINS lists the correct semi-sync plugins and that the status variables exist.
- Alert on replica fsync latency and cross-AZ network latency. Semi-sync is only as fast as the slowest ACK path.
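The fallback-ratio check from the list above can be sketched in Python. This is an illustration under stated assumptions: the function names are hypothetical, the 1% threshold is a placeholder rather than a recommendation, and the sum of the yes_tx and no_tx deltas is treated as the total number of commits in the sampling window.

```python
from typing import Optional


def fallback_ratio(yes_delta: int, no_delta: int) -> Optional[float]:
    """Share of commits in the window that were not acknowledged.

    Returns None when the window is unusable.
    """
    if yes_delta < 0 or no_delta < 0:
        return None  # counters went backwards, e.g. after a server restart
    total = yes_delta + no_delta
    if total == 0:
        return None  # no commits observed in this window
    return no_delta / total


def should_alert(yes_delta: int, no_delta: int, threshold: float = 0.01) -> bool:
    """True when the unacknowledged share exceeds the (assumed) threshold."""
    ratio = fallback_ratio(yes_delta, no_delta)
    return ratio is not None and ratio > threshold
```

Feed it the change in Rpl_semi_sync_source_yes_tx and Rpl_semi_sync_source_no_tx between two samples. Skipping windows with negative deltas avoids a false alert right after a restart.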
How Netdata helps
- Correlate Rpl_semi_sync_source_no_tx with Threads_running and the MySQL Questions rate to confirm that commit stalls are causing connection pile-up.
- Alert when Rpl_semi_sync_source_clients drops below rpl_semi_sync_source_wait_for_replica_count before commits start timing out.
- Cross-reference the primary’s semi-sync counters with the replica’s replication lag, disk write latency, and network latency to isolate whether the missing ACK is a network, disk, or replication thread problem.
- Track Rpl_semi_sync_source_yes_tx versus no_tx over time to spot gradual degradation in the durability path, such as a replica flapping due to intermittent packet loss.
Related guides
- How MySQL actually works in production: a mental model for operators
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis
- MySQL adaptive hash index latch contention: high CPU, low throughput
- MySQL InnoDB buffer pool hit ratio collapse: the cliff edge
- MySQL slow after restart: buffer pool warm-up and the cold cache
- MySQL innodb_buffer_pool_size tuning: 60-80% of RAM and when that breaks
- MySQL Innodb_buffer_pool_wait_free > 0: buffer pool memory pressure
- MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches
- MySQL connection exhaustion: detection, diagnosis, and prevention
- MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction
- MySQL FLUSH TABLES WITH READ LOCK stall: backups that freeze the server







