MySQL Threads_connected vs Threads_running: which one to actually alert on

Threads_connected counts every client holding an open socket, including idle pooled connections. Threads_running counts only threads actively executing a statement. Paging on the first while ignoring the second is a common postmortem mistake. Threads_running relative to CPU cores is the real load signal. The gap between the two metrics diagnoses failure modes. Connection count still deserves a ticket, but rarely a page.

What it is and why it matters

Threads_connected is the number of currently open client connections. It includes connections in any state: actively querying, waiting on a lock, or sleeping. Each connection consumes memory for session buffers and holds a slot against max_connections. When Threads_connected reaches that limit, new connections are rejected with ER_CON_COUNT_ERROR (“Too many connections”). The server reserves one slot for a user with SUPER or CONNECTION_ADMIN, but from the application perspective the instance is full.

Threads_running is the number of threads that are not sleeping. It counts connections that are in the middle of executing a statement, including statements blocked on a row lock, a metadata lock, or disk I/O. A thread leaves this count only when its statement finishes and the connection returns to the Sleep state. The invariant Threads_connected >= Threads_running always holds. The delta between them is connections sitting idle between statements.

Check raw values before trusting dashboard ratios:

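-- Threads_cached, Threads_connected, Threads_created, Threads_running
SHOW GLOBAL STATUS LIKE 'Threads_%';
-- Cumulative statement counter since startup; sample twice to get a rate
SHOW GLOBAL STATUS LIKE 'Questions';
-- Connections refused because max_connections was reached
SHOW GLOBAL STATUS LIKE 'Connection_errors_max_connections';
-- Configured connection limit
SHOW VARIABLES LIKE 'max_connections';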
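As a minimal sketch of turning those raw values into the ratios discussed here, the Python below takes the (Variable_name, Value) rows that SHOW GLOBAL STATUS returns and derives slot occupancy and the connected-to-running gap. The function names and the sample numbers are illustrative assumptions, not output from a real server.

```python
from typing import Iterable


def to_status_map(rows: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Turn (Variable_name, Value) rows into a name -> int mapping."""
    return {name: int(value) for name, value in rows}


def connection_summary(status: dict[str, int], max_connections: int) -> dict[str, float]:
    """Derive occupancy and gap figures from raw status counters."""
    connected = status["Threads_connected"]
    running = status["Threads_running"]
    return {
        # Share of connection slots in use.
        "connected_ratio": connected / max_connections,
        # Connections that are open but not executing anything.
        "not_running": connected - running,
        # Fraction of open connections doing work right now.
        "running_share": running / connected if connected else 0.0,
    }


# Illustrative values only (hypothetical server).
rows = [
    ("Threads_cached", "8"),
    ("Threads_connected", "400"),
    ("Threads_created", "1200"),
    ("Threads_running", "4"),
]
summary = connection_summary(to_status_map(rows), max_connections=1000)
print(summary)
```

A large not_running value next to a tiny running_share is the idle-pool shape; a running_share close to 1 means nearly every open connection is busy.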

Threads_connected also includes monitoring and administrative sockets. Exclude these from application capacity planning if they hold persistent connections.

A high connection count with near-zero Threads_running indicates an idle pool leak or an overly generous wait_timeout. The server itself is healthy. Conversely, modest Threads_connected with spiking Threads_running is a genuine crisis: the database is saturated despite few open connections, usually from a surge of short OLTP queries, lock contention, or buffer pool thrashing. Threads_running measures actual concurrent work, not socket occupancy.

How it works

MySQL’s classic threading model assigns one OS thread per client connection. Threads_connected increments at connect and decrements at disconnect. Threads_running increments when the thread begins executing a statement and decrements when that statement finishes and the connection goes back to sleeping.

The gap between the two metrics diagnoses specific pathologies. A gap of hundreds indicates idle connections accumulating in the pool. A narrow gap where both metrics are high indicates queries arriving faster than they complete.

When Threads_running rises while the Questions rate stays flat or drops, queries are spending more time waiting than executing. The server is queuing work, not completing more of it. Under the default one-thread-per-connection model, MySQL typically struggles past roughly 50 concurrently running threads. Once Threads_running consistently exceeds CPU core count, context-switching overhead and contention on internal mutexes dominate. If it climbs toward multiples of core count while throughput collapses, the server is stalled on contention.
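Because Questions is a cumulative counter, the rate has to be computed from two samples. The following is a rough sketch of that calculation and of the "more threads, no more throughput" check; the Sample type, the is_queuing helper, and the 5 percent tolerance are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float       # seconds since some fixed origin
    questions: int         # cumulative Questions counter
    threads_running: int   # Threads_running at sample time


def questions_per_second(prev: Sample, curr: Sample) -> float:
    """Rate of the cumulative Questions counter between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # Note: a server restart resets the counter and would make this negative.
    return (curr.questions - prev.questions) / elapsed


def is_queuing(baseline_qps: float, current_qps: float,
               baseline_running: int, current_running: int,
               tolerance: float = 0.05) -> bool:
    """True when running threads rose but throughput stayed flat or fell."""
    throughput_flat_or_down = current_qps <= baseline_qps * (1 + tolerance)
    return current_running > baseline_running and throughput_flat_or_down


# Illustrative samples taken 10 seconds apart (hypothetical numbers).
s0 = Sample(0.0, 1_000_000, 8)
s1 = Sample(10.0, 1_050_000, 9)
s2 = Sample(20.0, 1_098_000, 40)

baseline = questions_per_second(s0, s1)
current = questions_per_second(s1, s2)
print(baseline, current,
      is_queuing(baseline, current, s1.threads_running, s2.threads_running))
```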

On Linux, the OS threads backing these connections are visible under /proc/<mysqld_pid>/task/. In the classic model, the task count roughly tracks Threads_connected, not Threads_running.
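A small sketch of counting those task entries is shown below. The proc_root parameter is an assumption added so the function can be pointed at a different directory tree, and the count will sit above Threads_connected because mysqld also runs background threads.

```python
import os


def count_os_threads(pid: int, proc_root: str = "/proc") -> int:
    """Count entries under <proc_root>/<pid>/task, one per OS thread."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return len(os.listdir(task_dir))


# Usage on a Linux host, where mysqld_pid is the server's process ID:
#   count_os_threads(mysqld_pid)
```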

Map Threads_running to actual queries with SHOW PROCESSLIST or performance_schema.threads. On MySQL 8.0, restricting performance_schema.threads to TYPE = 'FOREGROUND' and filtering PROCESSLIST_COMMAND != 'Sleep' closely approximates the threads counted in Threads_running; the session running the query counts itself. Grouping by PROCESSLIST_STATE or PROCESSLIST_INFO reveals whether saturation comes from a single query pattern or a broad workload surge.
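The grouping step can be sketched in Python once the rows have been fetched. This assumes each row is a dict keyed by the performance_schema.threads column names and that the caller already limited the query to foreground threads; the active_states helper and the sample rows are hypothetical.

```python
from collections import Counter


def active_states(rows):
    """Count non-sleeping threads per PROCESSLIST_STATE, busiest first."""
    counts = Counter()
    for row in rows:
        # Skip idle connections and rows without a command.
        if row.get("PROCESSLIST_COMMAND") in (None, "Sleep"):
            continue
        counts[row.get("PROCESSLIST_STATE") or "(none)"] += 1
    return counts.most_common()


# Hypothetical rows, as a DB cursor returning dicts might provide them.
rows = [
    {"PROCESSLIST_COMMAND": "Query", "PROCESSLIST_STATE": "Waiting for table metadata lock"},
    {"PROCESSLIST_COMMAND": "Query", "PROCESSLIST_STATE": "Waiting for table metadata lock"},
    {"PROCESSLIST_COMMAND": "Query", "PROCESSLIST_STATE": "executing"},
    {"PROCESSLIST_COMMAND": "Sleep", "PROCESSLIST_STATE": None},
    {"PROCESSLIST_COMMAND": "Query", "PROCESSLIST_STATE": "Waiting for table metadata lock"},
    {"PROCESSLIST_COMMAND": "Sleep", "PROCESSLIST_STATE": ""},
]
print(active_states(rows))
```

One state dominating the list points at a single cause such as a lock cascade; a flat spread across states suggests a broad surge.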

The thread pool plugin multiplexes many connections onto fewer worker threads. In this mode, Threads_running semantics may differ from the classic model, and thresholds derived from OS thread counts may become stale because the pool caps parallelism independently.

flowchart TD
    A[MySQL concurrency check] --> B{Threads_running high?}
    B -->|Yes| C{Threads_connected high?}
    C -->|Yes| D[Genuine overload]
    C -->|No| E[Crisis: few connections, high load]
    B -->|No| F{Threads_connected high?}
    F -->|Yes| G[Idle pool leak: investigate]
    F -->|No| H[Normal state]
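The decision tree above can be sketched as a small classifier. What counts as "high" is not defined by the flowchart, so the two cutoffs here (2x cores for running, 80 percent of max_connections for connected) are assumptions borrowed from the thresholds used elsewhere in this article and should be tuned per workload.

```python
def classify(connected: int, running: int, max_connections: int, cores: int,
             running_per_core_limit: float = 2.0,
             connected_ratio_limit: float = 0.8) -> str:
    """Map the two thread metrics onto the four flowchart outcomes."""
    running_high = running > running_per_core_limit * cores
    connected_high = connected / max_connections > connected_ratio_limit

    if running_high and connected_high:
        return "genuine overload"
    if running_high:
        return "crisis: few connections, high load"
    if connected_high:
        return "idle pool leak: investigate"
    return "normal"


# Hypothetical readings on a 16-core host with max_connections = 151.
print(classify(connected=140, running=70, max_connections=151, cores=16))
print(classify(connected=20, running=2, max_connections=151, cores=16))
```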

Where it shows up in production

Idle pool leak. Connection pools such as HikariCP, or the pools built into application servers, are often configured with 100 or more connections per instance. With ten application nodes, Threads_connected can reach 1,000 even when the database executes fewer than ten queries concurrently. A smaller version of the same pattern is Threads_connected sitting at 400 while Threads_running stays below 5. The server appears saturated on a connection dashboard, but it is nearly idle. Paging here wastes on-call time. Fix application-side pool sizing or reduce wait_timeout to evict idle connections faster. The symptom in SHOW PROCESSLIST is a sea of Sleep entries with Time values approaching wait_timeout.

Actual overload with moderate connections. A write-heavy OLTP workload or a metadata lock cascade causes Threads_running to spike to 60 on a 16-core instance while Threads_connected remains at 80. The server is in trouble, but an alert on Threads_connected / max_connections never fires because, against the default max_connections of 151, the ratio is only 53 percent. The load signal is Threads_running relative to CPU cores, which here is 60 / 16, or 3.75 running threads per core.
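The arithmetic behind this scenario is worked through below. The one assumption is that max_connections is at the MySQL default of 151, which is the value the 53 percent figure implies.

```python
# Numbers from the scenario; max_connections = 151 is an assumption
# (the MySQL default), chosen because it reproduces the 53 percent figure.
max_connections = 151
threads_connected = 80
threads_running = 60
cpu_cores = 16

connected_ratio = threads_connected / max_connections   # about 0.53
running_per_core = threads_running / cpu_cores          # 3.75

print(f"slot occupancy: {connected_ratio:.0%}")
print(f"running threads per core: {running_per_core:.2f}")
```

The connection ratio sits well under an 80 percent ticket threshold, while the per-core figure is already close to four runnable threads competing for each core.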

Connection exhaustion cascade. A buffer pool cliff or checkpoint stall causes queries to slow down. Connections that normally finish in milliseconds now hold their slots for seconds. Threads_running rises first, then Threads_connected follows as the application pool fills with waiting queries. Eventually Threads_connected hits max_connections and new connections are refused. Threads_running is the leading indicator; Threads_connected is the lagging symptom.

Thread pool deployments. With the thread pool enabled, a high Threads_connected may be safely multiplexed onto a small number of worker threads. Threads_running may stay flat even as query latency increases because work is queuing inside the pool rather than consuming OS threads. Operators who migrate to the thread pool and keep legacy Threads_running thresholds may miss saturation signals.

Tradeoffs and common misuses

Paging on connection count alone. Threads_connected should drive a page only when the server is actively rejecting connections and the ratio exceeds 95 percent for a sustained period. Even then, the root cause is usually in Threads_running or slow queries. On managed platforms like AWS RDS, the default max_connections is derived from the instance class memory, and raising it far beyond that default trades connection headroom for memory pressure; approaching the limit there is a legitimate capacity concern. On self-managed instances, prefer a ticket at 80 percent and a page only when rejections occur.

Using absolute thresholds. Neither metric should be alerted on as an absolute number. A micro instance with max_connections of 50 and a bare-metal server with max_connections of 5000 have completely different semantics at 100 connected threads. Express Threads_connected as a ratio to max_connections. Express Threads_running as a multiple of CPU core count from nproc or /proc/cpuinfo. In containers, nproc may report the host’s core count rather than the cgroup CPU limit; use the effective core count available to the database process.
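One way to approximate the effective core count in a container is to read the cgroup v2 cpu.max file, which holds either "max <period>" or "<quota> <period>" in microseconds. The sketch below parses that text and normalizes Threads_running against the result. The helper names are hypothetical, and it assumes cgroup v2 and that the values are read on the database host itself, not on the monitoring host.

```python
from typing import Optional


def cores_from_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max content; None means no quota is set."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def effective_cores(cpu_max_text: Optional[str], visible_cpus: int) -> float:
    """Smaller of the cgroup quota (if any) and the visible CPU count."""
    limit = cores_from_cpu_max(cpu_max_text) if cpu_max_text else None
    return min(limit, visible_cpus) if limit is not None else float(visible_cpus)


def running_per_core(threads_running: int, cores: float) -> float:
    return threads_running / cores


# Hypothetical container: 64 host CPUs visible, quota of 4 CPUs.
cores = effective_cores("400000 100000", visible_cpus=64)
print(cores, running_per_core(12, cores))
```

In practice cpu_max_text would come from reading /sys/fs/cgroup/cpu.max inside the container and visible_cpus from os.cpu_count(); twelve running threads look harmless against 64 cores but are 3x the four cores the container can actually use.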

The 50-thread ceiling heuristic. Outside of a thread pool, MySQL’s default model struggles past roughly 50 concurrently running threads. This is an operational ceiling, not a rigid limit. It predates modern NUMA hardware and faster mutex implementations, but contention simply shifts from table open cache locks to transaction system and buffer pool hashes. If your baseline Threads_running approaches this value during normal peak, you are already in the danger zone.

Ignoring the gap. The difference between Threads_connected and Threads_running is a diagnostic signal. A widening gap with low running threads points to pool leaks. A collapsing gap where both rise together points to query pile-up. Monitor the gap as a derived ratio or visually on the same chart.

Signals to watch in production

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Threads_running / CPU cores | Measures actual concurrent execution load against hardware parallelism. | Sustained above 2x cores (plan), above 4x cores (ticket). |
| Threads_connected / max_connections | Measures connection slot occupancy and headroom before rejection. | Above 0.80 sustained for 10 minutes (ticket); above 0.95 with active rejections (page). |
| Threads_running vs Threads_connected gap | Diagnoses whether high connection count is idle or active. | High connected with low running suggests pool leak or idle accumulation. |
| Connection_errors_max_connections | Confirms that new connections are actually being refused. | Any sustained nonzero rate combined with high connected ratio. |
| Questions rate | Confirms whether throughput is collapsing despite high concurrency. | Drop > 50 percent from baseline while Threads_connected remains stable. |
| Innodb_row_lock_current_waits | Distinguishes lock contention from CPU saturation when Threads_running is high. | > 0 sustained with rising Threads_running and dropping Questions. |
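The first two rows of the table translate into severity rules fairly directly. The sketch below is one possible encoding, not a recommended alert configuration: the function names are hypothetical, and rejections_per_min is assumed to be derived from the change in Connection_errors_max_connections between samples.

```python
def connection_severity(connected_ratio: float, minutes_sustained: float,
                        rejections_per_min: float) -> str:
    """Ticket on sustained high occupancy; page only with active rejections."""
    if connected_ratio > 0.95 and rejections_per_min > 0:
        return "page"
    if connected_ratio > 0.80 and minutes_sustained >= 10:
        return "ticket"
    return "ok"


def running_severity(threads_running: int, cores: float) -> str:
    """Escalate on running threads per core, using the table's 2x and 4x marks."""
    per_core = threads_running / cores
    if per_core > 4:
        return "ticket"
    if per_core > 2:
        return "plan"
    return "ok"


# Hypothetical readings.
print(connection_severity(0.97, minutes_sustained=1, rejections_per_min=3))
print(running_severity(60, cores=16))
```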

How Netdata helps

Correlating Threads_running, CPU utilization, and disk I/O latency on the same chart exposes whether high concurrency is CPU-bound or I/O-bound. Per-second resolution catches Threads_running spikes that minute-level aggregations smooth over, especially during checkpoint stalls or lock contention bursts. A composite line showing the connected-to-running ratio makes idle pool leaks obvious without mental math. Plotting Questions alongside Threads_running highlights queuing: throughput drops while active threads pile up.