MySQL Threads_created climbing: thread cache churn and missing pooling
When the rate of Threads_created climbs to tens or hundreds of new threads per minute, the thread cache is not absorbing your connection churn. In a pooled deployment this rate should stay near zero. Every cache miss pays the full cost of OS thread creation and initialization, on top of the TLS handshake that every new connection pays when require_secure_transport is enabled. The first visible symptom is usually connection latency spikes or CPU time diverted to thread management.
What this means
MySQL retains disconnected threads in a cache up to thread_cache_size. A new connection reuses a cached thread when available; otherwise Threads_created increments and a new OS thread is spawned. The counter is cumulative since restart, so the operational signal is the delta over time.
Under the one-thread-per-connection model, every connection holds a dedicated server thread for its lifetime. If the application opens a fresh connection per query or HTTP request, the cache empties and Threads_created tracks Connections almost one-for-one. Even with pooling, aggressive eviction or a thread_cache_size smaller than your connection fluctuation produces sustained thread creation. Each creation adds latency to connection setup, and with TLS every new connection also pays for a handshake, so churn compounds both costs.
Because both Threads_created and Connections are cumulative counters, their ratio is a lifetime average, not a real-time miss rate. In a stable, pooled environment that ratio should stay well below 0.01.
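To see why the lifetime ratio can hide an active problem, it helps to compute the same ratio over a short window. The following is a minimal Python sketch, not part of any MySQL tooling: the function name windowed_miss_ratio and the sample values are hypothetical, and the four inputs are assumed to be Threads_created and Connections read at the start and end of the window.

```python
from typing import Optional


def windowed_miss_ratio(
    threads_created_start: int,
    threads_created_end: int,
    connections_start: int,
    connections_end: int,
) -> Optional[float]:
    """Fraction of connections opened in the window that needed a new thread."""
    new_connections = connections_end - connections_start
    new_threads = threads_created_end - threads_created_start
    if new_connections < 0 or new_threads < 0:
        # Counters went backwards: the server restarted between samples.
        return None
    if new_connections == 0:
        # No new connections in the window, so there is nothing to measure.
        return None
    return new_threads / new_connections


# Hypothetical samples: the lifetime ratio looks healthy, the last minute does not.
lifetime = 5_600 / 1_000_600
recent = windowed_miss_ratio(5_000, 5_600, 1_000_000, 1_000_600)
print(f"lifetime={lifetime:.4f} recent={recent:.2f}")
```

In this made-up case the lifetime ratio sits below 0.01 while every connection in the most recent window triggered thread creation, which is why the delta is the better signal during an incident.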
flowchart TD
A[Threads_created climbing] --> B{Threads_cached near thread_cache_size?}
B -->|Yes| C[Cache too small]
B -->|No| D{Threads_created tracks Connections?}
D -->|Yes| E[Missing connection pooling]
D -->|No| F[Aggressive disconnects]
C --> G[Increase thread_cache_size]
E --> H[Add client-side pool or proxy]
F --> I[Align idle timeouts]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Missing client-side connection pooling | Microservices, serverless functions, or PHP processes open a new connection per request; Threads_created rises in lockstep with Connections | Application connection string and pool configuration |
| thread_cache_size too small | Threads_cached stays pegged near thread_cache_size while Threads_created climbs steadily | SHOW VARIABLES LIKE 'thread_cache_size' against peak Threads_connected |
| Aggressive pool or proxy eviction | Connection pool min idle is zero, or idle timeout is shorter than the typical inter-request interval; periodic disconnect storms | Pool idle timeout and MySQL wait_timeout alignment |
| Traffic spikes or thundering herd | Sudden Threads_connected burst followed by mass disconnect; cache cannot absorb the churn | Application startup or auto-scaling behavior |
Quick checks
-- Check thread creation, cache state, and total connections
SHOW GLOBAL STATUS WHERE Variable_name IN ('Threads_created','Threads_cached','Connections');
SHOW GLOBAL VARIABLES LIKE 'thread_cache_size';
-- Compute the cumulative creation ratio
SELECT ROUND(
(SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Threads_created')
/
NULLIF((SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Connections'),0)
, 4) AS threads_created_ratio;
# Sample twice, one minute apart, to get per-minute rate
mysqladmin ext | grep Threads_created
sleep 60
mysqladmin ext | grep Threads_created
-- Check if SSL is enforced, which amplifies thread creation cost
SHOW GLOBAL VARIABLES LIKE 'require_secure_transport';
SHOW GLOBAL STATUS LIKE 'Ssl_accepts';
-- Inspect connection churn: many short-lived sleeps indicate throwaway connections
SELECT COMMAND, COUNT(*), MAX(TIME) AS max_time
FROM information_schema.PROCESSLIST
GROUP BY COMMAND;
-- Check peak connection utilization
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
-- Look for abnormal disconnects that correlate with churn
SHOW GLOBAL STATUS LIKE 'Aborted_clients';
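The two-sample check above can also be scripted instead of read by eye. This is a rough Python sketch under stated assumptions: it expects the pipe-delimited table rows that mysqladmin extended-status prints, and the helper names parse_status_rows and per_minute_rate are made up for illustration.

```python
import re
from typing import Dict

# Matches table rows such as "| Threads_created | 1234 |".
_ROW = re.compile(r"^\|\s*(\w+)\s*\|\s*(\d+)\s*\|\s*$")


def parse_status_rows(text: str) -> Dict[str, int]:
    """Extract integer counters from '| Name | Value |' rows; skip other lines."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        match = _ROW.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def per_minute_rate(first: int, second: int, elapsed_seconds: float) -> float:
    """Convert two readings of a cumulative counter into a per-minute rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if second < first:
        raise ValueError("counter decreased; the server likely restarted")
    return (second - first) * 60.0 / elapsed_seconds


# Hypothetical output captured from the two samples taken 60 seconds apart.
before = parse_status_rows("| Threads_created | 1200 |")
after = parse_status_rows("| Threads_created | 1245 |")
print(per_minute_rate(before["Threads_created"], after["Threads_created"], 60))
```

Passing the elapsed time explicitly keeps the rate correct when the samples are not exactly one minute apart.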
How to diagnose it
Confirm the rate is abnormal.
1. Confirm the rate is abnormal. Threads_created is cumulative. Take two samples one minute apart. A sustained rate above ten threads per minute is operationally significant. Brief spikes during deployments are normal; a sustained climb is not.
2. Compute the cumulative creation ratio. Divide Threads_created by Connections. If the result is above 0.01 after meaningful uptime, more than one percent of your lifetime connections have triggered expensive thread creation. For active incidents, compare deltas taken over a one-minute window.
3. Inspect cache saturation. Compare Threads_cached to thread_cache_size. If Threads_cached is consistently at or near the limit and Threads_created is still rising, the cache is too small for your churn. If Threads_cached is well below the limit but Threads_created is high, the problem is connection lifetime, not cache size.
4. Identify the connection pattern. Query information_schema.PROCESSLIST and group by COMMAND. A high count of Sleep with very short TIME values means connections are being opened and closed rapidly. If Threads_running is low while Threads_connected fluctuates wildly, you have a disconnect storm, not genuine load.
5. Correlate with application behavior. Map the churn to a specific service, deployment, or serverless function that is opening connections without a pool. Look for frameworks that default to autocommit with no persistent connections.
6. Check for SSL amplification. If require_secure_transport is ON and Ssl_accepts is climbing alongside Threads_created, each new connection costs a TLS handshake, and each cache miss adds thread spawning on top. The fix is still fewer new connections, but the urgency is higher.
7. Validate server-side limits. If Max_used_connections is approaching max_connections, thread creation may be compounded by connection retries from rejected clients. See MySQL ERROR 1040 (HY000): Too many connections - causes and fixes.
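The cache-saturation and connection-pattern checks above can be written down as a small triage function. This Python sketch is illustrative only: the function name triage and both 0.9 thresholds are assumptions chosen for the example, not values defined by MySQL, and the inputs are assumed to be a current Threads_cached reading plus deltas of Threads_created and Connections over one sampling window.

```python
def triage(
    threads_cached: int,
    thread_cache_size: int,
    new_threads: int,
    new_connections: int,
    near_fraction: float = 0.9,      # assumed meaning of "near the limit"
    tracking_fraction: float = 0.9,  # assumed meaning of "almost one-for-one"
) -> str:
    """Map one sampling window onto the three diagnoses described above."""
    if new_threads <= 0:
        return "no thread creation in this window"
    # Cache saturation: Threads_cached at or near thread_cache_size.
    if threads_cached >= near_fraction * thread_cache_size:
        return "cache too small: increase thread_cache_size"
    # Threads_created rising almost one-for-one with Connections.
    if new_connections > 0 and new_threads >= tracking_fraction * new_connections:
        return "missing connection pooling: add a client-side pool or proxy"
    return "aggressive disconnects: align idle timeouts"


# Hypothetical window: cache nearly empty, 580 new threads for 600 connections.
print(triage(threads_cached=2, thread_cache_size=100,
             new_threads=580, new_connections=600))
```

A single window can mislead, so a result is only worth acting on when it repeats across several samples.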
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Threads_created rate | Direct measure of thread cache misses | > 10/minute sustained |
| Threads_created / Connections | Long-term cache efficiency | Ratio > 0.01 |
| Threads_cached | Current cache utilization | Stays near thread_cache_size while Threads_created rises |
| Threads_connected variance | Indicates connection churn | Rapid oscillation without corresponding traffic spike |
| Aborted_clients | Client-side connection teardowns | Rising rate alongside thread creation |
| Connection_errors_max_connections | Hard rejections from exhaustion | Nonzero rate |
Fixes
Resize the thread cache
If Threads_cached is pinned at thread_cache_size and Threads_created is climbing, increase the cache to cover normal connection fluctuation.
SET GLOBAL thread_cache_size = <N>;
Do not set this arbitrarily high. Each cached thread retains memory for stack and session buffers. If your peak fluctuation is fifty connections, a cache of fifty to one hundred is usually sufficient. If raising the cache does not stop Threads_created from climbing, the root cause is connection lifetime, not cache size.
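The sizing rule of thumb above (a swing of fifty suggests a cache of fifty to one hundred) can be turned into a quick calculation over observed Threads_connected samples. This is a sketch under assumptions: the function name and sample values are hypothetical, and the result is a starting range to validate against memory headroom, not a recommendation.

```python
from typing import Iterable, Tuple

# Documented upper bound for the thread_cache_size system variable.
MAX_THREAD_CACHE_SIZE = 16384


def suggest_thread_cache_range(threads_connected_samples: Iterable[int]) -> Tuple[int, int]:
    """Return (low, high) candidates: one to two times the peak-to-trough swing."""
    samples = list(threads_connected_samples)
    if not samples:
        raise ValueError("need at least one Threads_connected sample")
    swing = max(samples) - min(samples)
    low = min(swing, MAX_THREAD_CACHE_SIZE)
    high = min(2 * swing, MAX_THREAD_CACHE_SIZE)
    return low, high


# Hypothetical Threads_connected readings collected over a busy period.
print(suggest_thread_cache_range([120, 150, 170, 140, 125]))
```

The samples need to span both the busiest and quietest periods, otherwise the swing is underestimated.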
Implement or fix client-side connection pooling
The durable fix for high Threads_created is to stop opening and closing connections constantly. Ensure every application uses a connection pool such as HikariCP, and that the pool maintains a reasonable minimum idle count. For architectures where persistent pooling is impossible, such as short-lived serverless functions or PHP-FPM without persistent connections, place a proxy such as ProxySQL or MaxScale between the application and MySQL. The proxy maintains persistent backend connections and absorbs the connect/disconnect churn.
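To illustrate why pooling removes the churn, here is a deliberately tiny Python pool. It is a teaching sketch, not a replacement for a production pool: the class name TinyPool is made up, the connect callable stands in for whatever driver call opens a MySQL connection, and there is no health checking, reconnect logic, or idle eviction.

```python
import queue
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TinyPool(Generic[T]):
    """Open a fixed number of connections once and hand them out repeatedly."""

    def __init__(self, connect: Callable[[], T], size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        self._idle: "queue.Queue[T]" = queue.Queue(maxsize=size)
        for _ in range(size):
            # Each call here costs one server thread; it happens only at startup.
            self._idle.put(connect())

    def acquire(self, timeout: Optional[float] = None) -> T:
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn: T) -> None:
        self._idle.put(conn)


# A stand-in connect function that only counts how many connections were opened.
opened = []

def fake_connect() -> object:
    opened.append(object())
    return opened[-1]

pool = TinyPool(fake_connect, size=2)
for _ in range(100):
    conn = pool.acquire()
    pool.release(conn)
print(len(opened))  # 2
```

One hundred requests reuse the same two connections, so the server sees two connects instead of one hundred and the thread cache has nothing to absorb.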
Align idle timeouts
When a pool’s minimum idle count is zero or its idle timeout is seconds-long, connections are closed and reopened constantly. Each reopen triggers a cache lookup or thread creation. Worse, if MySQL wait_timeout is shorter than the pool’s idle timeout, MySQL closes the connection from the server side while the pool still believes it is valid, leading to “MySQL server has gone away” errors.
Set the pool idle timeout lower than wait_timeout so the pool always closes gracefully first. Then raise the pool’s minimum idle count to cover typical concurrency so connections survive brief pauses. If the architecture cannot hold idle connections, use ProxySQL or MaxScale to maintain the persistent backend pool and let the application connect and disconnect at will.
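The two alignment rules above are easy to check mechanically during a deploy review. The Python sketch below assumes all timeouts are expressed in seconds; the function name check_timeouts and its parameter names are hypothetical and do not correspond to any particular pool's configuration keys.

```python
from typing import List


def check_timeouts(
    pool_idle_timeout_s: float,
    pool_min_idle: int,
    wait_timeout_s: float,
) -> List[str]:
    """Return human-readable problems; an empty list means the settings align."""
    problems: List[str] = []
    if pool_idle_timeout_s >= wait_timeout_s:
        problems.append(
            "pool idle timeout is not below wait_timeout: "
            "the server may close connections the pool still trusts"
        )
    if pool_min_idle <= 0:
        problems.append(
            "minimum idle count is zero: "
            "connections are closed and reopened after every quiet period"
        )
    return problems


# Hypothetical settings: a pool that outlives wait_timeout and keeps no idle connections.
for problem in check_timeouts(pool_idle_timeout_s=3600, pool_min_idle=0, wait_timeout_s=600):
    print(problem)
```

The wait_timeout value can be read with SHOW GLOBAL VARIABLES LIKE 'wait_timeout' and passed in alongside the pool's own settings.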
Reduce SSL-induced handshake cost
If require_secure_transport is enabled and thread creation remains high, each new connection pays a TLS handshake tax. Do not disable SSL to mitigate this unless you have an explicit risk acceptance. Instead, reduce the number of handshakes by pooling or proxying. The security boundary stays intact while the thread creation rate drops.
Address connection storms
If Max_used_connections is near max_connections, rejected clients may retry in a loop that amplifies thread creation. Increase max_connections only after verifying that memory can support more threads; otherwise you trade thread churn for an OOM kill. The correct response is to fix the leak or storm that is filling the connection slots.
Prevention
- Monitor Threads_created rate from the first day of production, not after latency degrades.
- Size thread_cache_size to absorb your observed peak-to-trough connection swing.
- Require connection pooling in application standards; forbid connect-per-request patterns in microservices.
- Review pool configuration during every application deploy, especially idle timeout and minimum pool size.
- Alert on Threads_created / Connections ratio above 0.01, not just absolute connection count.
How Netdata helps
- Netdata surfaces Threads_created, Threads_cached, Connections, and Threads_connected in real time, so you can see the delta without manual sampling.
- Correlate climbing Threads_created with Threads_connected spikes and drops to distinguish cache churn from genuine load growth.
- Alert on Threads_created rate per minute using the built-in MySQL collector.
- Cross-reference thread creation with Aborted_clients and Connection_errors_max_connections to identify whether the root cause is pool misconfiguration or connection exhaustion.
- Track thread_cache_size as a configuration metric alongside utilization to detect undersizing immediately after traffic shifts.







