MySQL Threads_created climbing: thread cache churn and missing pooling

When the rate of Threads_created climbs to tens or hundreds of new threads per minute, the thread cache is not absorbing your connection churn. In a pooled deployment this rate should stay near zero. Every cache miss pays the full cost of OS thread creation and initialization. When connections use TLS, for example because require_secure_transport is enabled, every new connection also pays a TLS handshake, whether or not it reuses a cached thread. The first visible symptom is usually connection latency spikes or CPU time diverted to thread management.

What this means

MySQL retains disconnected threads in a cache up to thread_cache_size. A new connection reuses a cached thread when available; otherwise Threads_created increments and a new OS thread is spawned. The counter is cumulative since restart, so the operational signal is the delta over time.

Under the one-thread-per-connection model, every connection holds a dedicated server thread for its lifetime. If the application opens a fresh connection per query or HTTP request, the cache empties and Threads_created tracks Connections almost one-for-one. Even with pooling, aggressive eviction or a thread_cache_size smaller than your connection fluctuation produces sustained thread creation. Each creation adds latency to connection setup; over TLS, every new connection also pays a handshake on top of that.

Because both Threads_created and Connections are cumulative counters, their ratio is a lifetime average, not a real-time miss rate. In a stable, pooled environment that ratio should stay well below 0.01.
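
Because the lifetime ratio can hide recent behavior, a ratio computed over two snapshots is often more useful during an incident. The following is a minimal sketch, assuming the two counters have already been read at the start and end of a window; the function name and the sample numbers are illustrative, not taken from a real server.

```python
def windowed_miss_ratio(created_start, created_end, conns_start, conns_end):
    """Fraction of connections in the window that required a new thread.

    Returns None when no connections arrived in the window, or when a
    counter went backwards (for example, the server restarted between samples).
    """
    created_delta = created_end - created_start
    conns_delta = conns_end - conns_start
    if created_delta < 0 or conns_delta <= 0:
        return None
    return created_delta / conns_delta


# Hypothetical numbers: the lifetime average looks healthy,
# but half of the connections in the last window missed the cache.
lifetime_ratio = 5_300 / 1_000_600
recent_ratio = windowed_miss_ratio(5_000, 5_300, 1_000_000, 1_000_600)
print(f"lifetime={lifetime_ratio:.4f} recent={recent_ratio:.2f}")
# prints: lifetime=0.0053 recent=0.50
```

In this example the cumulative ratio stays under 0.01 while the windowed ratio shows that the cache is currently missing on every second connection.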

flowchart TD
    A[Threads_created climbing] --> B{Threads_cached near thread_cache_size?}
    B -->|Yes| C[Cache too small]
    B -->|No| D{Threads_created tracks Connections?}
    D -->|Yes| E[Missing connection pooling]
    D -->|No| F[Aggressive disconnects]
    C --> G[Increase thread_cache_size]
    E --> H[Add client-side pool or proxy]
    F --> I[Align idle timeouts]
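
The flowchart can also be expressed as a small function. This is a sketch under stated assumptions: the two cut-offs (`near_fraction` for "near thread_cache_size" and `lockstep_fraction` for "tracks Connections") are illustrative values chosen here, not thresholds defined by MySQL, and the deltas are expected to come from two samples of the same window.

```python
def classify_thread_churn(threads_cached, thread_cache_size,
                          created_delta, connections_delta,
                          near_fraction=0.9, lockstep_fraction=0.9):
    """Map counter readings to the branches of the flowchart.

    near_fraction and lockstep_fraction are assumed cut-offs; tune them
    for your own workload.
    """
    if created_delta <= 0:
        return "no thread creation in this window"
    # Branch B: is the cache pegged at its limit?
    if thread_cache_size > 0 and threads_cached >= near_fraction * thread_cache_size:
        return "cache too small: increase thread_cache_size"
    # Branch D: does nearly every new connection create a thread?
    if connections_delta > 0 and created_delta >= lockstep_fraction * connections_delta:
        return "missing connection pooling: add a client-side pool or proxy"
    return "aggressive disconnects: align idle timeouts"


print(classify_thread_churn(threads_cached=0, thread_cache_size=8,
                            created_delta=590, connections_delta=600))
# prints: missing connection pooling: add a client-side pool or proxy
```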

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Missing client-side connection pooling | Microservices, serverless functions, or PHP processes open a new connection per request; Threads_created rises in lockstep with Connections | Application connection string and pool configuration |
| thread_cache_size too small | Threads_cached stays pegged near thread_cache_size while Threads_created climbs steadily | SHOW VARIABLES LIKE 'thread_cache_size' against peak Threads_connected |
| Aggressive pool or proxy eviction | Connection pool min idle is zero, or idle timeout is shorter than the typical inter-request interval; periodic disconnect storms | Pool idle timeout and MySQL wait_timeout alignment |
| Traffic spikes or thundering herd | Sudden Threads_connected burst followed by mass disconnect; cache cannot absorb the churn | Application startup or auto-scaling behavior |

Quick checks

-- Check thread creation, cache state, and total connections
SHOW GLOBAL STATUS WHERE Variable_name IN ('Threads_created','Threads_cached','Connections');
SHOW GLOBAL VARIABLES LIKE 'thread_cache_size';

-- Compute the cumulative (since-restart) creation ratio
SELECT ROUND(
  (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Threads_created')
  /
  NULLIF((SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Connections'),0)
, 4) AS threads_created_ratio;

# Shell: sample twice, one minute apart, to get the per-minute rate
mysqladmin extended-status | grep Threads_created
sleep 60
mysqladmin extended-status | grep Threads_created

-- Check whether TLS is enforced, which adds a handshake to every new connection
SHOW GLOBAL VARIABLES LIKE 'require_secure_transport';
SHOW GLOBAL STATUS LIKE 'Ssl_accepts';

-- Inspect connection churn: many sessions in Sleep with small TIME values indicate throwaway connections
SELECT COMMAND, COUNT(*) AS sessions, MIN(TIME) AS min_time, MAX(TIME) AS max_time
FROM information_schema.PROCESSLIST
GROUP BY COMMAND;

-- Check peak connection utilization
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL VARIABLES LIKE 'max_connections';

-- Look for abnormal disconnects that correlate with churn
SHOW GLOBAL STATUS LIKE 'Aborted_clients';
How to diagnose it

  1. Confirm the rate is abnormal. Threads_created is cumulative. Take two samples one minute apart. A sustained rate above ten threads per minute is operationally significant. Brief spikes during deployments are normal; a sustained climb is not.

  2. Compute the cumulative creation ratio. Divide Threads_created by Connections. If the result is above 0.01 after meaningful uptime, more than one percent of your lifetime connections have triggered expensive thread creation. For active incidents, compare deltas taken over a one-minute window.

  3. Inspect cache saturation. Compare Threads_cached to thread_cache_size. If Threads_cached is consistently at or near the limit and Threads_created is still rising, the cache is too small for your churn. If Threads_cached is well below the limit but Threads_created is high, the problem is connection lifetime, not cache size.

  4. Identify the connection pattern. Query information_schema.PROCESSLIST and group by COMMAND. A high count of Sleep with very short TIME values means connections are being opened and closed rapidly. If Threads_running is low while Threads_connected fluctuates wildly, you have a disconnect storm, not genuine load.

  5. Correlate with application behavior. Map the churn to a specific service, deployment, or serverless function that is opening connections without a pool. Look for frameworks or drivers that open a new connection per request by default and do not keep connections persistent.

  6. Check for SSL amplification. If require_secure_transport is ON and Ssl_accepts is climbing alongside Threads_created, each new connection pays a TLS handshake on top of any thread spawning. The fix is still fewer new connections, but the urgency is higher.

  7. Validate server-side limits. If Max_used_connections is approaching max_connections, thread creation may be compounded by connection retries from rejected clients. See MySQL ERROR 1040 (HY000): Too many connections - causes and fixes.
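
Step 1 distinguishes a brief spike from a sustained climb. One way to make that distinction mechanical is to require the rate to stay above the threshold for several consecutive intervals. The sketch below uses the ten-per-minute threshold from step 1; the requirement of three consecutive intervals is an assumption chosen for illustration, as are the function name and sample data.

```python
def sustained_creation(samples, threshold_per_minute=10, min_consecutive=3):
    """Return True when thread creation stays above the threshold.

    samples: list of (unix_seconds, threads_created) tuples in time order.
    min_consecutive is an assumed definition of "sustained".
    """
    streak = 0
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or c1 < c0:
            streak = 0  # out-of-order sample or server restart: start over
            continue
        rate = (c1 - c0) * 60 / elapsed
        streak = streak + 1 if rate > threshold_per_minute else 0
        if streak >= min_consecutive:
            return True
    return False


# A deployment spike: one hot minute, then quiet.
spike = [(0, 100), (60, 160), (120, 162), (180, 163), (240, 165)]
# A sustained climb: 30 new threads every minute.
climb = [(0, 100), (60, 130), (120, 160), (180, 190)]
print(sustained_creation(spike), sustained_creation(climb))  # False True
```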

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Threads_created rate | Direct measure of thread cache misses | > 10/minute sustained |
| Threads_created / Connections | Long-term cache efficiency | Ratio > 0.01 |
| Threads_cached | Current cache utilization | Stays near thread_cache_size while Threads_created rises |
| Threads_connected variance | Indicates connection churn | Rapid oscillation without corresponding traffic spike |
| Aborted_clients | Connections that ended without the client closing them cleanly | Rising rate alongside thread creation |
| Connection_errors_max_connections | Hard rejections from exhaustion | Nonzero rate |

Fixes

Resize the thread cache

If Threads_cached is pinned at thread_cache_size and Threads_created is climbing, increase the cache to cover normal connection fluctuation.

SET GLOBAL thread_cache_size = <N>;

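SET GLOBAL takes effect immediately but does not survive a restart; also set the value in the option file, or use SET PERSIST on MySQL 8.0 and later. Do not set this arbitrarily high. Each cached thread keeps its OS thread and stack memory allocated while idle. If your peak fluctuation is fifty connections, a cache of fifty to one hundred is usually sufficient. If raising the cache does not stop Threads_created from climbing, the root cause is connection lifetime, not cache size.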
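
The sizing rule of thumb above can be sketched as a calculation over observed Threads_connected samples. This is illustrative only: the `headroom` multiplier of 1.5 is an assumption that lands in the middle of the "fifty to one hundred" range, the function name is hypothetical, and the ceiling reflects the documented maximum of thread_cache_size (16384).

```python
import math


def suggest_thread_cache_size(threads_connected_samples, headroom=1.5, ceiling=16384):
    """Suggest a cache size from the peak-to-trough swing in Threads_connected.

    headroom is an assumed safety multiplier; ceiling is the maximum value
    that thread_cache_size accepts.
    """
    if not threads_connected_samples:
        raise ValueError("need at least one Threads_connected sample")
    swing = max(threads_connected_samples) - min(threads_connected_samples)
    return min(ceiling, math.ceil(swing * headroom))


# Hypothetical samples with a swing of fifty connections.
print(suggest_thread_cache_size([120, 150, 170, 135]))  # 75
```

Treat the result as a starting point and confirm it by watching whether Threads_created stops climbing after the change.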

Implement or fix client-side connection pooling

The durable fix for high Threads_created is to stop opening and closing connections constantly. Ensure every application uses a connection pool such as HikariCP, and that the pool maintains a reasonable minimum idle count. For architectures where persistent pooling is impossible, such as short-lived serverless functions or PHP-FPM without persistent connections, place a proxy such as ProxySQL or MaxScale between the application and MySQL. The proxy maintains persistent backend connections and absorbs the connect/disconnect churn.
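
To show why pooling removes the churn, here is a deliberately minimal pool sketch. It is not a substitute for a production pool: it has no health checks, reconnection, or idle eviction. `SimplePool` and `fake_connect` are hypothetical names; in real use the `connect` argument would be whatever callable your database driver provides for opening a connection.

```python
import queue
from contextlib import contextmanager


class SimplePool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # open every connection up front

    @contextmanager
    def connection(self, timeout=5.0):
        # Block until a connection is free instead of opening a new one.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back; it is never closed per request


# Stand-in for a real driver: records how many connections were opened.
opened = []


def fake_connect():
    opened.append(object())
    return opened[-1]


pool = SimplePool(fake_connect, size=4)
for _ in range(1000):
    with pool.connection() as conn:
        pass  # run a query here

print(len(opened))  # 4
```

A thousand requests are served by four connections, so the server would see four connection setups instead of a thousand.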

Align idle timeouts

When a pool’s minimum idle count is zero or its idle timeout is seconds-long, connections are closed and reopened constantly. Each reopen triggers a cache lookup or thread creation. Worse, if MySQL wait_timeout is shorter than the pool’s idle timeout, MySQL closes the connection from the server side while the pool still believes it is valid, leading to “MySQL server has gone away” errors.

Set the pool idle timeout lower than wait_timeout so the pool always closes gracefully first. Then raise the pool’s minimum idle count to cover typical concurrency so connections survive brief pauses. If the architecture cannot hold idle connections, use ProxySQL or MaxScale to maintain the persistent backend pool and let the application connect and disconnect at will.
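
The two alignment rules can be checked mechanically before a deploy. The following is a sketch, assuming you can read the pool settings from your own configuration; the function and parameter names are hypothetical, and 28800 seconds is the MySQL default for wait_timeout.

```python
def check_pool_timeouts(pool_idle_timeout_s, pool_min_idle,
                        wait_timeout_s=28800, typical_concurrency=1):
    """Return a list of misalignments between pool settings and MySQL.

    An empty list means both rules from the text are satisfied.
    """
    problems = []
    if pool_idle_timeout_s >= wait_timeout_s:
        problems.append(
            "pool idle timeout is not below wait_timeout: the server may "
            "close connections the pool still believes are valid")
    if pool_min_idle < typical_concurrency:
        problems.append(
            "minimum idle is below typical concurrency: connections will "
            "be closed and reopened during brief pauses")
    return problems


# Hypothetical settings: idle timeout longer than wait_timeout, no idle floor.
for problem in check_pool_timeouts(pool_idle_timeout_s=30000, pool_min_idle=0,
                                   wait_timeout_s=28800, typical_concurrency=8):
    print(problem)
```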

Reduce SSL-induced handshake cost

If require_secure_transport is enabled and thread creation remains high, each new connection pays a TLS handshake tax. Do not disable SSL to mitigate this unless you have an explicit risk acceptance. Instead, reduce the number of handshakes by pooling or proxying. The security boundary stays intact while the thread creation rate drops.

Address connection storms

If Max_used_connections is near max_connections, rejected clients may retry in a loop that amplifies thread creation. Increase max_connections only after verifying that memory can support more threads; otherwise you trade thread churn for an OOM kill. The correct response is to fix the leak or storm that is filling the connection slots.

Prevention

  • Monitor Threads_created rate from the first day of production, not after latency degrades.
  • Size thread_cache_size to absorb your observed peak-to-trough connection swing.
  • Require connection pooling in application standards; forbid connect-per-request patterns in microservices.
  • Review pool configuration during every application deploy, especially idle timeout and minimum pool size.
  • Alert on the Threads_created / Connections ratio above 0.01, computed over a recent window rather than since restart, not just on absolute connection count.

How Netdata helps

  • Netdata surfaces Threads_created, Threads_cached, Connections, and Threads_connected in real time, so you can see the delta without manual sampling.
  • Correlate climbing Threads_created with Threads_connected spikes and drops to distinguish cache churn from genuine load growth.
  • Alert on Threads_created rate per minute using the built-in MySQL collector.
  • Cross-reference thread creation with Aborted_clients and Connection_errors_max_connections to identify whether the root cause is pool misconfiguration or connection exhaustion.
  • Track thread_cache_size as a configuration metric alongside utilization to detect undersizing immediately after traffic shifts.