MySQL authentication failure spike: brute force vs broken credential rotation
You get paged because Aborted_connects is climbing. The error log shows a wall of authentication failures, application dashboards are yellow, and someone in security is asking if this is an attack. Before you block IP addresses or rotate passwords, you need to know which failure mode you are dealing with. External brute force and internal broken credential rotation produce the same MySQL status counters, but the fixes are opposite. One requires closing the network window and flushing blocked hosts. The other requires finding the application instance that missed the new secret.
Use the signals below to classify the spike before max_connect_errors locks out legitimate clients.
What this means
Aborted_connects increments for every client connection that fails to complete the MySQL handshake. That includes bad credentials, but also network timeouts, protocol mismatches, TLS handshake failures, and clients that drop mid-handshake. It is not a pure authentication metric.
When the failure is authentication-related, MySQL writes Access denied for user ... to the error log and increments COUNT_AUTHENTICATION_ERRORS for that source in performance_schema.host_cache. Host blocking is governed by a separate column, SUM_CONNECT_ERRORS, which the MySQL documentation describes as counting handshake-level errors from validated hosts. If a source reaches max_connect_errors (default 100) on that count without an intervening successful connection, MySQL blocks that source entirely. Once blocked, even a correct password is rejected until the host cache is flushed. A concentrated attack or a misconfigured application pool whose connections also fail during the handshake can cross that threshold in seconds, turning a noisy log into a production outage.
The key operational question is whether the sources are external attackers probing common users, or internal application hosts retrying with a stale service-account password.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Brute force or credential stuffing | Many Access denied entries from one or a few external IPs; common usernames such as root, admin, or mysql; spike is uncorrelated with deploys | performance_schema.host_cache grouped by IP; error log for source hosts and user names |
| Broken credential rotation | Spike begins during or immediately after a secret rotation or deploy; failures come from application subnets; the same service account repeats; no password-guessing pattern | Error log user and host fields; deploy timeline; performance_schema.accounts for the affected user |
| Misconfigured health check or load balancer | Steady low-level failures from a small set of internal IPs; attempts may not send a valid user; error log lacks a guessing pattern | Health check configuration; whether the check opens TCP only or attempts authentication |
| Network or TLS handshake failure | Failures from many clients at once, often after a certificate or firewall change; may be accompanied by SSL errors in the log | Error log for SSL/TLS errors; recent infrastructure changes |
Quick checks
Run these safe, read-only checks before changing anything.
# Check global connection-failure counters
SHOW GLOBAL STATUS LIKE 'Aborted%';
SHOW GLOBAL STATUS LIKE 'Connection_errors%';
SHOW GLOBAL VARIABLES LIKE 'max_connect_errors';
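# Per-source failure counts and first/last error time
# SUM_CONNECT_ERRORS is the count checked against max_connect_errors;
# COUNT_AUTHENTICATION_ERRORS counts failed authentications for that source
SELECT IP, HOST, SUM_CONNECT_ERRORS, COUNT_AUTHENTICATION_ERRORS,
       COUNT_HOST_BLOCKED_ERRORS, FIRST_ERROR_SEEN, LAST_ERROR_SEEN
FROM performance_schema.host_cache
WHERE SUM_CONNECT_ERRORS > 0 OR COUNT_AUTHENTICATION_ERRORS > 0
ORDER BY SUM_CONNECT_ERRORS DESC, COUNT_AUTHENTICATION_ERRORS DESC;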
# Per-user connection footprint
SELECT USER, HOST, CURRENT_CONNECTIONS, TOTAL_CONNECTIONS
FROM performance_schema.accounts
WHERE USER IS NOT NULL
ORDER BY TOTAL_CONNECTIONS DESC;
# Recent authentication failures in the error log
grep "Access denied" /var/log/mysql/error.log | tail -n 50
If SUM_CONNECT_ERRORS for any IP is at or above max_connect_errors, that host is blocked. Aborted_clients climbing alongside Aborted_connects suggests post-connect drops or timeouts, not a pure authentication issue.
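To turn the grep output into per-source and per-user tallies, a minimal Python sketch follows. It assumes the log lines contain the standard Access denied for user 'name'@'host' message text. The function name tally_access_denied is illustrative and not part of any MySQL tool.

```python
import re
from collections import Counter

# Matches the standard MySQL message text: Access denied for user 'name'@'host'
ACCESS_DENIED = re.compile(r"Access denied for user '([^']*)'@'([^']*)'")


def tally_access_denied(lines):
    """Count Access denied entries per source host and per user name.

    `lines` is any iterable of error-log lines; non-matching lines are skipped.
    Returns (by_host, by_user) as two Counter objects.
    """
    by_host = Counter()
    by_user = Counter()
    for line in lines:
        match = ACCESS_DENIED.search(line)
        if match is None:
            continue
        user, host = match.groups()
        by_host[host] += 1
        by_user[user] += 1
    return by_host, by_user
```

Feeding it an open log file and printing by_host.most_common(5) and by_user.most_common(5) gives a quick view of whether one source or one account dominates. Many user names from few hosts hints at guessing; one service account from application hosts hints at a stale secret.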
How to diagnose it
Follow this sequence to classify the spike before acting.
- Confirm auth failures are driving the spike. Compare Aborted_connects rate to Aborted_clients. If Aborted_connects is rising while Aborted_clients is flat, failures are happening during the handshake. Match that to Access denied volume in the error log.
- Map failures to source IPs with performance_schema.host_cache. A small number of IPs dominating the count points to a concentrated attack or application misconfiguration.
- Read the user name pattern. Brute force probes many user names or common names. Broken rotation fails with the same service account every time.
- Correlate with change events. Check deploy logs, secret-rotation jobs, and configuration pushes. A spike that starts within the rotation window is almost always a stale credential.
- Check host blocking status. If any application host has SUM_CONNECT_ERRORS >= max_connect_errors, the application is locked out. Shift priority to immediate unblock plus root-cause fix.
- Estimate blast radius. Blocked application servers mean a production outage. Blocked external scanners mean noise and potential DoS via connection exhaustion.
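The source-based part of this sequence can be sketched in Python as a rough triage helper. This is an illustration under stated assumptions, not a definitive classifier. The function name classify_spike, the app_subnets input, and the 90% cutoff are all hypothetical, and it assumes the source field in each failure is an IP address, not a resolved host name.

```python
import ipaddress


def classify_spike(failures, app_subnets, cutoff=0.9):
    """Classify a spike from (user, source_ip) pairs taken from Access denied entries.

    failures:    list of (user, ip) tuples
    app_subnets: CIDR strings for application hosts (assumed to be known)
    cutoff:      share of failures needed to call the source mostly internal
                 or mostly external (an arbitrary illustrative value)
    """
    if not failures:
        return "no Access denied entries: check network, TLS, or health checks"

    networks = [ipaddress.ip_network(cidr) for cidr in app_subnets]
    internal = sum(
        1
        for _user, ip in failures
        if any(ipaddress.ip_address(ip) in net for net in networks)
    )
    internal_share = internal / len(failures)

    if internal_share >= cutoff:
        return "likely broken credential rotation or misconfigured health check"
    if internal_share <= 1 - cutoff:
        return "likely brute force or credential stuffing"
    return "mixed: inspect user names and timing"
```

The result is only a starting hypothesis. The user name pattern and the deploy timeline still need to be read before acting.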
flowchart TD
A[Aborted_connects rate spikes] --> B{Error log shows Access denied?}
B -->|No| C[Investigate network TLS or health-check failures]
B -->|Yes| D{Sources are on application subnets?}
D -->|Yes| E[Broken credential rotation or misconfigured health check]
D -->|No| F{Sources are external or highly scattered?}
F -->|Yes| G[Brute force or credential stuffing]
F -->|No| H[Inspect user names and timing for mixed cause]
E --> I[Validate secrets and rotate safely]
G --> J[Block sources and flush host cache if needed]
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| Aborted_connects rate | Primary indicator of failed handshakes | Sustained > 10/min; > 100/min is DoS or lockout risk |
| Aborted_clients rate | Distinguishes auth failures from post-connect drops | Rising alongside Aborted_connects suggests network or timeout issues |
| Connection_errors_max_connections | Hard refusal due to full connection slots | Any sustained nonzero rate |
| performance_schema.host_cache.SUM_CONNECT_ERRORS | Per-source failure count; predicts host blocking | Any IP approaching max_connect_errors |
| Error log Access denied rate | Direct evidence of authentication failures with user and source | Sustained spike or new source appearing |
| performance_schema.accounts per-user totals | Reveals which account footprint is growing | One service account dominating new connections |
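Aborted_connects is a cumulative counter, so the per-minute rates in the table have to be derived from two samples. The sketch below shows one way to do that and to apply the 10/min and 100/min warning signs from the table. The function names per_minute_rate and severity are illustrative, and treating a decreasing counter as a reset is an assumption.

```python
def per_minute_rate(prev_value, curr_value, elapsed_seconds):
    """Convert two samples of a cumulative counter into a per-minute rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # Counter went backwards: assume it was reset (for example by a
        # server restart) and count only what accumulated since the reset.
        delta = curr_value
    return delta * 60.0 / elapsed_seconds


def severity(rate_per_minute, warn=10, critical=100):
    """Map a per-minute rate onto the warning signs listed in the table."""
    if rate_per_minute > critical:
        return "critical"
    if rate_per_minute > warn:
        return "warning"
    return "ok"
```

For example, samples of 1000 and 1030 taken 60 seconds apart give a rate of 30 per minute, which falls in the warning band.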
Fixes
Brute force or credential stuffing
Do not rely on max_connect_errors as your primary defense. Once the threshold is crossed, legitimate clients behind the same NAT or proxy can also be blocked.
Block the source at the network layer first with firewall rules, cloud security groups, or a network-level deny list. Flush the host cache only after the attack traffic has stopped, or the attacker immediately re-triggers blocking.
-- WARNING: resets counts for ALL hosts
TRUNCATE TABLE performance_schema.host_cache;
mysqladmin flush-hosts does the same. There is no built-in way to unblock a single IP while preserving counts for others.
If the failure rate threatens to exhaust max_connections, ensure at least one admin account has SUPER or CONNECTION_ADMIN so you can use the reserved admin connection during saturation.
Broken credential rotation
Find the application hosts that still hold the old secret. The error log host field and performance_schema.host_cache point to the subnet or instance. Roll the secret forward to the correct value on those hosts, or roll back the database password if the rotation is incomplete.
For future rotations, use MySQL’s dual-password feature (available in MySQL 8.0.14 and later):
ALTER USER 'app_user'@'%'
IDENTIFIED BY 'new_password'
RETAIN CURRENT PASSWORD;
Update all application instances, verify Aborted_connects returns to baseline, then discard the old password:
ALTER USER 'app_user'@'%' DISCARD OLD PASSWORD;
This removes the window where partial deployments produce auth failures.
Host blocking and lockout
When a legitimate host is blocked, clients see errors indicating the host is blocked because of many connection errors. Flush the host cache only after correcting the credential or stopping the failure source. Flushing without fixing the root cause repeats the cycle.
If you are already at max_connections, MySQL reserves one additional connection for a user with SUPER or CONNECTION_ADMIN. Use that reserved slot to connect and diagnose.
Prevention
- Alert on rate. Alert when Aborted_connects per minute exceeds a workload-specific baseline, not just an absolute threshold. A quiet OLTP instance should see near-zero auth failures.
- Alert on host cache saturation. Set a TICKET-level alert when any IP’s SUM_CONNECT_ERRORS exceeds 50% of max_connect_errors.
- Use dual-password rotation. Never switch a production password atomically without a window for clients to catch up.
- Segregate health checks. Configure load balancer health checks to use correct credentials, or use TCP-only checks that do not initiate the MySQL handshake.
- Restrict network access. MySQL port 3306 should not be reachable from the public internet. Internal access should be segmented by application.
- Test rotation playbooks. Run credential rotations in a non-production environment with the same connection pool and deploy choreography as production.
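The host cache saturation check from the list above can be sketched as a small Python helper. It assumes the rows come from a query against performance_schema.host_cache that returns IP and SUM_CONNECT_ERRORS. The function name hosts_near_block and the tuple layout are hypothetical.

```python
def hosts_near_block(rows, max_connect_errors, fraction=0.5):
    """Return sources whose error count exceeds a fraction of max_connect_errors.

    rows: iterable of (ip, sum_connect_errors) tuples
    Returns a list of (ip, errors, already_blocked) sorted by errors, highest first.
    """
    threshold = max_connect_errors * fraction
    flagged = [
        (ip, errors, errors >= max_connect_errors)
        for ip, errors in rows
        if errors > threshold
    ]
    return sorted(flagged, key=lambda row: row[1], reverse=True)
```

With max_connect_errors at 100, a source with 60 errors is flagged as approaching the limit, and a source with 120 is flagged as already at or past it.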
How Netdata helps
Netdata correlates the signals that distinguish brute force from broken rotation:
- Charts Aborted_connects and Aborted_clients per second, making it obvious whether the spike is handshake-time or post-connect.
- Shows MySQL connection utilization alongside auth-failure rates, so you can see when failures are approaching max_connections.
- Collects the MySQL error log so you can correlate Access denied spikes with metric patterns.
- Retains historical metrics for auth failure rates, so deviations from normal behavior trigger alerts even when absolute values are low.
- Alerts on elevated Aborted_connects and on MySQL availability, giving early warning before host blocking turns into an outage.
Related guides
- How MySQL actually works in production: a mental model for operators
- MySQL Aborted_connects and Aborted_clients climbing: diagnosis
- MySQL adaptive hash index latch contention: high CPU, low throughput
- MySQL binary logs filling the disk: expiry, lagging replicas, and purge
- MySQL InnoDB buffer pool hit ratio collapse: the cliff edge
- MySQL slow after restart: buffer pool warm-up and the cold cache
- MySQL innodb_buffer_pool_size tuning: 60-80% of RAM and when that breaks
- MySQL Innodb_buffer_pool_wait_free > 0: buffer pool memory pressure
- MySQL InnoDB checkpoint age: the redo log capacity signal nobody watches
- MySQL connection exhaustion: detection, diagnosis, and prevention
- MySQL innodb_deadlock_detect=OFF: when deadlock detection becomes the bottleneck
- MySQL ERROR 1213: Deadlock found when trying to get lock; try restarting transaction







