$ guides / mysql / mysql-connection-exhaustion ▌

Operations Guides

MySQL connection exhaustion: detection, diagnosis, and prevention

ERROR 1040 (HY000): Too many connections. Health checks fail. Users cannot sign in. If you configured an admin account correctly, a break-glass session may still get through on the reserved slot, and SELECT 1 returns. MySQL is running, but the connection pool is a wall.

This is a hard cliff. One moment queries flow; the next, every new TCP handshake to port 3306 is rejected. The server does not queue connections. Whether the root cause is a connection leak, a retry storm, or genuine overload from slow queries, the symptom is the same: Threads_connected has reached max_connections.

The key to fast diagnosis is the difference between connected threads and running threads. Threads_connected counts every open session, including thousands of idle Sleep connections. Threads_running counts only those executing a query right now. High connected with low running means you are leaking connections. High running means the database is choking on work. Picking the wrong branch wastes minutes during an outage.

What this means

MySQL uses a one-thread-per-connection model in the classic architecture. Each connection holds session memory and a file descriptor. The server enforces a hard limit at max_connections. The default is 151.

When the limit is reached, the server rejects new attempts and increments Connection_errors_max_connections. MySQL reserves one connection beyond max_connections for accounts with the CONNECTION_ADMIN privilege (or the deprecated SUPER privilege in older versions). This is your emergency access, but only if you configured an admin account and did not consume that slot through application misconfiguration.

Exhaustion is usually a symptom, not the disease. Three broad categories:

Idle accumulation: connections hold sessions without closing them, often because wait_timeout is too high or application pools leak.
Active overload: queries run slowly, so connections stay busy longer and the pool backs up.
Retry amplification: an application hits a timeout, opens more connections to retry, and accelerates saturation.

flowchart TD
    A[Connection refused errors] --> B{Threads_running low?}
    B -->|Yes| C[Idle leak]
    B -->|No| D[Active overload]
    C --> E[Check PROCESSLIST Sleep]
    D --> F[Check query time]
    D --> G[Check lock waits]
    E --> H[Kill old Sleep threads]
    F --> I[Kill blockers or tune]
    G --> I

Common causes

Cause	What it looks like	First thing to check
Connection pool leak	`Threads_connected` near limit, `Threads_running` very low, many `Sleep` entries with high `Time`	`SHOW PROCESSLIST` filtered for `Sleep`
Slow query pile-up	Both metrics high, queries active for many seconds	`SHOW PROCESSLIST` for large `Time` values
Retry storm after partial failure	Waves of `Connection_errors_max_connections` spikes correlating with application errors	Application logs for ERROR 1040 and reconnect loops
`wait_timeout` too high	Gradual daily growth of idle `Sleep` connections that never drop	`SHOW GLOBAL VARIABLES LIKE 'wait_timeout'`
Metadata lock cascade	One table affected, processlist shows `Waiting for table metadata lock`	`performance_schema.metadata_locks` for pending locks

Quick checks

These are read-only unless noted.

# Saturation ratio
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW GLOBAL VARIABLES LIKE 'max_connections';"

# Active versus idle split
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_running';"

# Rejections happening now
mysql -e "SHOW GLOBAL STATUS LIKE 'Connection_errors_max_connections';"

-- Oldest idle Sleep connections
SELECT id, user, host, db, command, time, state
FROM information_schema.PROCESSLIST
WHERE command = 'Sleep'
ORDER BY time DESC
LIMIT 20;

-- Pending metadata locks (MySQL 5.7+)
SELECT OBJECT_NAME, OBJECT_TYPE, LOCK_TYPE, LOCK_STATUS, COUNT(*) AS waiting
FROM performance_schema.metadata_locks
WHERE LOCK_STATUS = 'PENDING'
GROUP BY OBJECT_NAME, OBJECT_TYPE, LOCK_TYPE, LOCK_STATUS;

-- Long-running active queries
SELECT id, user, host, db, command, time, state, LEFT(info, 100) AS query_preview
FROM information_schema.PROCESSLIST
WHERE command != 'Sleep' AND time > 10
ORDER BY time DESC;

# Timeout configuration
mysql -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('wait_timeout', 'interactive_timeout');"

# Peak usage since startup
mysql -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections';"

-- Oldest open transactions
SELECT trx_mysql_thread_id, trx_started, TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC
LIMIT 5;

How to diagnose it

Confirm saturation. Verify Threads_connected / max_connections > 0.95 and Connection_errors_max_connections is increasing. If the ratio is high but rejections are zero, you are near the edge but not yet falling.
Determine the branch. Look at Threads_running. If it is low relative to Threads_connected, you have an idle accumulation problem. If it is high and climbing, you have an active overload problem.
For idle accumulation, inspect information_schema.PROCESSLIST for Sleep commands with large Time values. Correlate User and Host with application services. If all idle connections come from one application host, that pool is leaking.
Check wait_timeout and interactive_timeout. The default wait_timeout is 28800 seconds (8 hours). If your application or ORM does not explicitly close connections, reducing this reclaims idle sessions. Interactive clients use interactive_timeout; change both for consistent behavior.
For active overload, inspect active queries. Look for high Time in SHOW PROCESSLIST. Check performance_schema.data_lock_waits (MySQL 8.0+) or information_schema.INNODB_LOCK_WAITS (5.7) for row lock contention backing up the queue.
Check for metadata lock waits. Run the performance_schema.metadata_locks query from Quick checks. If one table has many pending metadata locks and a DDL statement is waiting, you are in a metadata lock cascade. This fills the pool even though the queries themselves are fast.
Check application logs for retry loops. If the application receives ERROR 1040 and immediately retries without backoff, each attempt consumes a brief connection slot and worsens the storm.
Verify admin access. Ensure your monitoring and break-glass accounts hold CONNECTION_ADMIN so you can still connect when the pool is full.

Metrics and signals to monitor

Signal	Why it matters	Warning sign
`Threads_connected / max_connections`	Saturation of the connection pool	> 0.80 sustained for 10 minutes; page if > 0.95 with rejections
`Threads_running`	Distinguishes idle accumulation from genuine overload	Consistently above CPU core count while throughput drops
`Connection_errors_max_connections`	Direct evidence that clients are being refused	Nonzero rate while `Threads_connected` is near the limit
`Max_used_connections / max_connections`	Historical peak trend for capacity planning	Trending upward over weeks, crossing 0.70
`Aborted_connects` rate	Precursor or compounding factor from auth failures and handshake drops	Sustained spike above baseline during saturation events
`Threads_created / Connections` ratio	Thread cache thrashing from connection churn	Ratio > 0.01 indicates cache is too small or pooling is missing

Fixes

Immediate relief without a restart

If you can connect via the admin slot, kill the worst offenders. Killing connections is disruptive and aborts in-flight transactions, but during an outage it may be necessary.

-- Kill the oldest idle Sleep connections first
KILL <id>;

For a metadata lock cascade, decide whether the DDL or the long transaction is more expendable. Killing the DDL provides instant relief; killing the long transaction may be required if the DDL is critical.

-- Kill a blocking DDL or long-running query
KILL <id>;

If idle accumulation is the culprit and you cannot restart the application, lower wait_timeout dynamically.

-- Reclaim idle connections (seconds)
SET GLOBAL wait_timeout = 300;
SET GLOBAL interactive_timeout = 300;

This only affects new idle periods. Existing idle connections are not retroactively subject to the new timeout and must be killed manually.

Short-term configuration changes

Raising max_connections is a band-aid, not a cure. Each connection consumes memory for thread stack, sort buffers, and join buffers. Raising the limit without increasing RAM trades connection rejections for OOM kills or excessive swap. Only raise it after confirming the server has memory headroom.

If you must raise it:

SET GLOBAL max_connections = <new_value>;

Long-term fixes

Fix the application connection pool logic. Ensure connections are returned to the pool and closed on exceptions.
Add exponential backoff with jitter to application retry logic when encountering ERROR 1040.
Deploy a connection pooler such as ProxySQL or MaxScale between the application and MySQL. Middleware multiplexes many application connections into fewer server connections.
Reduce wait_timeout and interactive_timeout permanently to values appropriate for your workload. For many web applications, 300 to 600 seconds is sufficient.
Schedule DDL operations during low-traffic windows and check for long-running transactions with information_schema.INNODB_TRX before starting.

Prevention

Size application connection pools so that total capacity across all instances leaves at least 20-30 percent of max_connections free for spikes, admin access, replication, backups, and monitoring.
Monitor Max_used_connections daily. Track the weekly peak trend. If it grows steadily, investigate before the cliff.
Set alerts at 80 percent of max_connections, not at 100 percent.
Ensure at least one break-glass account has CONNECTION_ADMIN and is not used by applications.
Tune thread_cache_size based on Threads_created rate. If you see constant thread creation, increase the cache to reduce connection latency overhead.
Review ORM and framework configurations. Many frameworks auto-begin transactions and rely on garbage collection or request-finalizers to close connections. Explicit close or try-finally blocks are safer.

How Netdata helps

Correlates Threads_connected, Threads_running, and Connection_errors_max_connections in one view to distinguish idle leaks from active overload.
Tracks Max_used_connections over time to expose gradual growth toward the limit.
Surfaces Aborted_connects and Threads_created alongside connection utilization to highlight pool thrashing or handshake storms.
Supports composite alerts that page only when Threads_connected / max_connections > 0.95 and Connection_errors_max_connections is increasing, filtering transient spikes.
Collects per-second processlist state breakdowns when configured, showing Sleep, Running, and lock wait proportions without manual polling.

The Netdata solution

MySQL monitoring with Netdata

Netdata monitors MySQL and MariaDB with per-second metrics and ML anomaly detection. Track connection usage, query throughput, slow queries, redo-log pressure, and replication lag alongside the host and storage signals that explain them.

See MySQL monitoring → Start monitoring free

MySQL connection exhaustion: detection, diagnosis, and prevention

MySQL connection exhaustion: detection, diagnosis, and prevention

What this means

Common causes

Quick checks

How to diagnose it

Metrics and signals to monitor

Fixes

Immediate relief without a restart

Short-term configuration changes

Long-term fixes

Prevention

How Netdata helps

Related guides

MySQL monitoring with Netdata