MySQL FLUSH TABLES WITH READ LOCK stall: backups that freeze the server

Your application suddenly cannot write. Threads_connected climbs toward max_connections, but Questions flatlines. The last change was a backup job that started ten minutes ago. The culprit is almost always FLUSH TABLES WITH READ LOCK (FTWRL), and the damage is caused not by the lock itself but by what happens while the server waits to acquire it.

FTWRL is triggered by mysqldump --master-data, mysqldump --lock-tables, and similar tools that need a consistent logical backup. It attempts to close all open tables and acquire a global read lock. While it waits for a long-running query to finish, new writes are already blocked. The result is a whole-server freeze: the query rate drops to zero, connections pile up, and the application sees cascading timeouts.

flowchart TD
    A[Long-running SELECT or open transaction] --> B[FTWRL waits to close tables]
    B --> C[Global read lock blocks writes]
    C --> D[Queries queue behind lock]
    D --> E[Threads_running and Threads_connected rise]
    E --> F[Questions rate drops toward zero]
    F --> G[Application timeouts and connection pool exhaustion]

What this means

When FLUSH TABLES WITH READ LOCK runs, MySQL must close all open table handles. If a long-running SELECT, an uncommitted transaction, or any statement holding a table open is active, FTWRL waits. During that wait, writes are blocked. Once FTWRL succeeds, it holds a global read lock that blocks all writes until UNLOCK TABLES.

The freeze begins before the backup starts. The backup tool may sit waiting for FTWRL to return while the server queues write operations. Because the global read lock is incompatible with any write, every INSERT, UPDATE, DELETE, and DDL operation stalls. Reads that do not require table modifications may still work, but any write-dependent application function stops.

This is different from a metadata lock cascade on a single table. FTWRL affects the entire instance.

Common causes

CauseWhat it looks likeFirst thing to check
mysqldump --lock-tables or --master-dataBackup cron starts; seconds later, write latency spikes and Questions dropsSHOW FULL PROCESSLIST for the backup connection and the oldest running query
Long-running reporting SELECTFTWRL hangs behind a query that has been running for minutes or hoursSHOW FULL PROCESSLIST ordered by Time descending
Idle open transaction holding table contextNo active query visible, but a transaction started long ago holds locksSELECT * FROM information_schema.INNODB_TRX ORDER BY trx_started ASC
Backup tooling with implicit FTWRLThird-party backup tools or wrappers that default to table-locking modeBackup script arguments and environment variables

Quick checks

These checks are read-only and safe during an incident.

-- Check for the FTWRL session and its runtime
SHOW FULL PROCESSLIST;

-- Find the oldest open transaction that may be blocking table closure
SELECT trx_id, trx_mysql_thread_id, trx_started, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC LIMIT 5;

-- Check if query throughput has collapsed
SHOW GLOBAL STATUS LIKE 'Questions';
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Threads_connected';

-- Check InnoDB transaction state for long-running blockers
SHOW ENGINE INNODB STATUS\G

Look for a connection running FLUSH TABLES WITH READ LOCK that has been active for longer than a few seconds. Then look for the oldest query or transaction. If the connection is from mysqldump, the User column or the query text in Info usually reveals it. Queries blocked by FTWRL often show Waiting for global read lock or Waiting for table flush in the State column.

How to diagnose it

  1. Confirm the freeze. Run SHOW GLOBAL STATUS LIKE 'Questions'; twice, five seconds apart. If the counter is flat or barely moving while the application is active, queries are stalling.
  2. Identify FTWRL. Run SHOW FULL PROCESSLIST. Look for a thread executing FLUSH TABLES WITH READ LOCK with a high Time value.
  3. Find the blocker. Look for the oldest running query or the oldest open transaction in information_schema.INNODB_TRX. The blocker may be a large SELECT, an idle transaction, or a slow report.
  4. Verify the source. Check whether the FTWRL connection belongs to a backup user, a cron job, or a deployment script. Cross-reference with your backup schedule.
  5. Assess impact. Check Threads_running and Threads_connected. If Threads_connected is climbing toward max_connections, you have minutes before the server rejects new connections.
  6. Decide. If the blocker is expendable, kill it. If the backup is expendable, kill the FTWRL connection. If neither is safe, you are choosing between a failed backup and a write outage.

Metrics and signals to monitor

SignalWhy it mattersWarning sign
Questions rateDirect measure of query throughputSustained drop >50% from baseline with active clients
Threads_runningActive work piling upRising while Questions drops indicates queuing
Threads_connectedConnection pool exhaustionApproaching max_connections during an FTWRL stall
INNODB_TRX open transaction ageIdentifies the silent blockerAny transaction older than your backup window
Metadata lock waits via performance_schema.metadata_locksDistinguishes metadata locking from row lockingPending locks on multiple objects during backup
Innodb_row_lock_current_waitsRules out row-level contentionNear zero during an FTWRL stall confirms global lock

Fixes

Warning: KILL is disruptive. The targeted query, transaction, or connection fails immediately and must be retried or reconnected.

Kill the blocking query or transaction

If the blocker is a non-essential report or an idle session, killing it allows FTWRL to acquire its lock immediately. The backup proceeds and the global freeze ends.

-- Identify the thread ID of the blocker from SHOW FULL PROCESSLIST
KILL <blocker_thread_id>;

Tradeoff: the killed query fails and must be retried. If the blocker is a replication thread or a critical ETL job, do not kill it.

Kill the FTWRL session

If the backup is not time-critical, kill the FTWRL session. This releases the global read lock and restores writes fastest.

-- Identify the thread ID running FLUSH TABLES WITH READ LOCK
KILL <ftwrl_thread_id>;

Tradeoff: the backup is inconsistent and must be restarted.

Switch to –single-transaction

For InnoDB-only servers, replace --lock-tables or --master-data with --single-transaction where appropriate. This starts a consistent snapshot without a global read lock.

# Instead of:
mysqldump --lock-tables db_name > backup.sql

# Use:
mysqldump --single-transaction db_name > backup.sql

Tradeoff: --single-transaction starts a transaction in REPEATABLE READ and holds an MVCC read view for the duration of the dump. This can prevent purge of old undo log records and cause history list length growth or undo tablespace bloat. It does not prevent concurrent DDL, and any non-InnoDB tables in the dump may still be inconsistent.

Use physical or storage-level backups

For large datasets, avoid logical dumps entirely. Use Percona XtraBackup, MySQL Enterprise Backup, or storage snapshots. These tools copy data files directly and do not require FTWRL for the entire backup duration.

Tradeoff: requires additional tooling, disk space, and recovery procedures. Percona XtraBackup and MySQL Enterprise Backup still briefly execute FTWRL at the end of the backup to snapshot non-InnoDB tables or capture binlog coordinates, but the duration is typically seconds rather than the full backup window. Snapshot consistency depends on filesystem or storage engine support.

Prevention

  • Audit backup commands. Search all cron jobs, Ansible playbooks, and Puppet manifests for mysqldump --lock-tables or --master-data. Replace with --single-transaction where appropriate.
  • Run heavy reads on replicas. Direct long-running reports and analytics queries to a replica so they cannot block FTWRL on the primary.
  • Back up from a replica. If you require FTWRL for consistency, run the backup against a replica rather than the primary. Monitor replication lag; if the replica is lagging, the backup may still stall replica writes, but the primary remains unaffected.
  • Monitor open transaction age. Alert on any transaction in INNODB_TRX older than your backup window.
  • Set query timeouts. Use max_execution_time to cap ad-hoc SELECT duration and prevent runaway queries from blocking backups.
  • Document backup requirements. Ensure every backup job includes a comment explaining why it uses or avoids table locks.

How Netdata helps

Netdata correlates the signals that reveal an FTWRL stall:

  • Questions rate: A sudden drop to near zero while Threads_connected remains high is the signature of a global lock freeze.
  • Threads_running: Spikes during FTWRL show queries piling up behind the lock, even if CPU and disk are idle.
  • Threads_connected: Approaching max_connections during a stall provides early warning before connection refusal.
  • Metadata lock monitoring: Surfaces metadata lock buildup that precedes full cascades.
  • Anomaly detection: Flags unusual throughput drops during scheduled backup windows.