$ guides / mysql / mysql-deadlock-detect-off-high-concurrency ▌

Operations Guides

MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck

At extreme concurrency, InnoDB’s deadlock detector can become the bottleneck. It traverses the wait-for graph on every lock enqueue. That cost is usually negligible until thousands of concurrent writing transactions hit a small set of hot rows. When traversal itself saturates CPU, throughput collapses even though disks are idle and the buffer pool is warm.

innodb_deadlock_detect=OFF removes that traversal. Instead of instant deadlock detection, InnoDB relies exclusively on innodb_lock_wait_timeout to break cycles. The blocked transaction waits, holding its locks, until the timeout expires. This is not a generic performance tuning knob. It is a narrow, high-risk tradeoff that converts instant rollback into delayed timeout, and it is appropriate only when you have confirmed through testing that deadlock detection is the bottleneck.

What it is and why it matters

innodb_deadlock_detect is a dynamic global boolean variable. It defaults to ON in MySQL 5.7.15+ and MySQL 8.0+. It does not exist in MySQL 5.6 or earlier; attempts to set it there return an unknown variable error.

When ON, every time a transaction requests a row or table lock that cannot be granted immediately, InnoDB runs DeadlockChecker::check_and_resolve. The checker walks the wait-for graph to see whether adding this wait edge creates a cycle. If it finds one, it instantly rolls back the smaller transaction (by row count) and returns ER_LOCK_DEADLOCK to the victim.

When OFF, InnoDB skips that traversal. The transaction is enqueued as a waiter and sleeps until the lock is released or innodb_lock_wait_timeout (default 50 seconds) expires. The victim selection logic is bypassed entirely.

Under normal OLTP concurrency, the CPU cost of detection is negligible. Under extreme contention (thousands of threads, hot-row updates, queue-style access patterns), the cost of the graph walk on every lock enqueue becomes the dominant consumer of CPU time. Historical Bug #49047 demonstrated pathological O(n^2) behavior in this path when many locks existed on a single row. While subsequent fixes improved throughput, the fundamental scaling concern remains: detection cost grows with the depth of the wait-for graph.

How it works

When a lock request is blocked, InnoDB must decide whether the new wait edge creates a deadlock.

With detection enabled, the following occurs:

InnoDB enters lock_rec_enqueue_waiting() for row locks or lock_table_enqueue_waiting() for table locks.
It invokes the deadlock checker, which traverses the wait-for list of transactions.
If the wait-for list exceeds 200 transactions, or the checker must inspect more than 1,000,000 locks owned by transactions on that list, InnoDB treats the condition as a deadlock regardless of the setting. The checking transaction is rolled back.
If a cycle is found, the cheaper transaction (by estimated row modifications) is selected as the victim and rolled back immediately.
The surviving transactions continue without waiting.

With detection disabled, steps 2 through 4 are skipped. The blocked transaction is placed on the wait queue and relies entirely on innodb_lock_wait_timeout. When the timeout fires, the waiting statement is rolled back. The transaction continues unless innodb_rollback_on_timeout is enabled.

flowchart TD
    A[Lock request blocked] --> B{innodb_deadlock_detect}
    B -->|ON| C[Traverse wait-for graph]
    C --> D{Deadlock found?}
    D -->|Yes| E[Roll back victim immediately]
    D -->|No| F[Wait for lock or timeout]
    B -->|OFF| F
    F --> G{Wait exceeds innodb_lock_wait_timeout?}
    G -->|Yes| H[Roll back waiter]
    G -->|No| I[Lock granted]

Note that innodb_print_all_deadlocks logs deadlocks to the error log independently of the innodb_deadlock_detect setting. If you are evaluating this change, enable it. SHOW ENGINE INNODB STATUS only shows the most recent deadlock, which is insufficient when deadlocks are frequent.

Where it shows up in production

This bottleneck appears suddenly when concurrency crosses a threshold where the graph traversal dominates useful work.

Typical triggering conditions:

Hot-row contention: Counter increments, inventory decrements, or queue pops that all target the same small set of rows.
High thread counts: Thousands of concurrent transactions, often from connection pool misconfiguration or thread pool absence.
CPU saturation with low throughput: Threads_running is high, CPU is pegged, but Questions rate and Innodb_rows_updated are flat or falling. Disk I/O and buffer pool hit ratio look healthy.

If you see these symptoms, confirm the diagnosis before changing the setting. Use SHOW ENGINE INNODB STATUS and inspect the TRANSACTIONS section for current locks and waits. Correlate with Innodb_row_lock_waits and Innodb_row_lock_current_waits. If the bottleneck is instead missing indexes causing gap locks, or long transactions holding locks, disabling detection will not fix the root cause. It will only remove the CPU cost of noticing the contention.

Tradeoffs and when to use it

Disabling deadlock detection is appropriate only after controlled load testing on representative traffic proves it is the bottleneck. It is not a substitute for fixing transaction design, shortening transactions, or adding indexes that reduce lock ranges.

When to consider it:

Load testing shows CPU saturation from deadlock detection itself, and query latency drops meaningfully when detection is disabled.
The workload has very high concurrency on unavoidable hot rows.
The application already handles lock wait timeouts gracefully, with retry logic at the statement or transaction level.
You reduce innodb_lock_wait_timeout aggressively; consider setting it to a low value such as 2 seconds so that genuine deadlocks do not hold connections for the default 50 seconds.

When to avoid it:

The workload has a non-trivial natural deadlock rate. With detection off, deadlocked transactions hold locks for the full timeout instead of resolving instantly. This inflates Innodb_row_lock_current_waits and can cascade into connection exhaustion.
The application does not distinguish ER_LOCK_WAIT_TIMEOUT from ER_LOCK_DEADLOCK. A timeout-based rollback under genuine deadlock conditions behaves differently from instant victim selection.
You have not load-tested for at least two weeks with representative deadlock patterns. One documented production failure involved cascading lock buildup behind blocked transactions that eventually exhausted max_connections and froze the instance.

Aspect	Detection ON	Detection OFF
CPU cost per blocked lock	Graph traversal	None
Deadlock resolution	Instant victim rollback	Delayed until timeout
Connection slot held during deadlock	Brief	Full `innodb_lock_wait_timeout`
Victim selection	Smaller transaction by row count	Waiter on timeout
Risk of `max_connections` exhaustion	Low	Elevated if deadlocks are frequent

A special case is setting innodb_lock_wait_timeout to 0 with detection off. This prevents any waiting, which eliminates deadlocks entirely, but it produces a storm of timeout errors. It is viable only if the application handles every statement failure with an immediate retry and conflicts are genuinely rare.

Signals to watch in production

If you disable deadlock detection, monitoring shifts from watching deadlock frequency to watching lock wait duration and connection saturation.

Signal	Why it matters	Warning sign
`lock_deadlocks` rate from `INFORMATION_SCHEMA.INNODB_METRICS`	Deadlocks still occur even with detection off. A sustained rate means the application is creating cycles faster than timeouts can clear them.	Nonzero rate sustained above your tested baseline.
`Innodb_row_lock_current_waits`	Shows transactions waiting right now. With detection off, deadlocked transactions inflate this gauge for the full timeout.	Elevated for longer than before the change, or climbing steadily.
`Threads_connected / max_connections`	Blocked transactions consume slots until timeout. Under genuine deadlock, multiple participants sit blocked.	Ratio approaching 0.80 during peak, or sudden spikes correlating with lock waits.
`Questions` rate	Validates whether the change actually improved throughput. If detection was the bottleneck, this should rise.	Drop from baseline while `Threads_connected` remains stable indicates queries are stuck waiting.
`Threads_running`	If CPU drops but this stays high, the system is still contention-bound, just not on the detection path.	Sustained above CPU core count without corresponding throughput.
Error log deadlock output with `innodb_print_all_deadlocks=ON`	Reveals hidden deadlock storms that cumulative counters smooth over.	Frequent deadlock entries, especially with matching timeout errors in application logs.

Enable innodb_print_all_deadlocks during any evaluation period. Without it, you are flying blind on the actual deadlock rate.

How Netdata helps

Innodb_row_lock_waits and Innodb_row_lock_current_waits from SHOW GLOBAL STATUS help correlate throughput drops with lock contention before you change detection settings.
The lock_deadlocks counter from INFORMATION_SCHEMA.INNODB_METRICS gives a baseline deadlock frequency so you can measure whether disabling detection changes the effective rate.
Threads_connected and Threads_running show whether blocked transactions are piling up and threatening connection exhaustion after the change.
Query latency and Questions rate validate the experiment: if latency falls and throughput rises after disabling detection, the deadlock traversal was likely the CPU bottleneck.
MySQL error log monitoring surfaces innodb_print_all_deadlocks entries, exposing deadlock storms that aggregate status variables miss.

How MySQL actually works in production: a mental model for operators: /guides/mysql/how-mysql-works-in-production/
MySQL Aborted_connects and Aborted_clients climbing: diagnosis: /guides/mysql/mysql-aborted-connections/
MySQL connection exhaustion: detection, diagnosis, and prevention: /guides/mysql/mysql-connection-exhaustion/
MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction: /guides/mysql/mysql-deadlock-found/
MySQL gap locks and next-key locks: surprising deadlocks under REPEATABLE READ: /guides/mysql/mysql-gap-locks-next-key-locks/
MySQL ERROR 1205: Lock wait timeout exceeded; try restarting transaction: /guides/mysql/mysql-lock-wait-timeout-exceeded/
MySQL: got a packet bigger than ‘max_allowed_packet’ bytes - causes and fixes: /guides/mysql/mysql-max-allowed-packet-errors/
MySQL monitoring checklist: the signals every production instance needs: /guides/mysql/mysql-monitoring-checklist/
MySQL monitoring maturity model: from survival to expert: /guides/mysql/mysql-monitoring-maturity-model/
MySQL InnoDB row lock contention: finding who blocks whom: /guides/mysql/mysql-row-lock-contention/
MySQL Threads_created climbing: thread cache churn and missing pooling: /guides/mysql/mysql-thread-cache-churn/
MySQL Threads_connected vs Threads_running: which one to actually alert on: /guides/mysql/mysql-threads-connected-vs-threads-running/

The Netdata solution

MySQL monitoring with Netdata

Netdata monitors MySQL and MariaDB with per-second metrics and ML anomaly detection. Track connection usage, query throughput, slow queries, redo-log pressure, and replication lag alongside the host and storage signals that explain them.

See MySQL monitoring → Start monitoring free

MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck

MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck

What it is and why it matters

How it works

Where it shows up in production

Tradeoffs and when to use it

Signals to watch in production

How Netdata helps

Related guides

MySQL monitoring with Netdata