MySQL authentication failure spike: brute force vs broken credential rotation

You get paged because Aborted_connects is climbing. The error log shows a wall of authentication failures, application dashboards are yellow, and someone in security is asking if this is an attack. Before you block IP addresses or rotate passwords, you need to know which failure mode you are dealing with. External brute force and internal broken credential rotation produce the same MySQL status counters, but the fixes are opposite. One requires closing the network window and flushing blocked hosts. The other requires finding the application instance that missed the new secret.

Use the signals below to classify the spike before max_connect_errors locks out legitimate clients.

What this means

Aborted_connects increments for every client connection that fails to complete the MySQL handshake. That includes bad credentials, but also network timeouts, protocol mismatches, TLS handshake failures, and clients that drop mid-handshake. It is not a pure authentication metric.

When the failure is authentication-related, MySQL writes Access denied for user ... to the error log and records the attempt against the source in performance_schema.host_cache. The host cache keeps separate counters: COUNT_AUTHENTICATION_ERRORS tallies failed logins, while SUM_CONNECT_ERRORS tallies the handshake errors that MySQL treats as blocking. If a source accumulates max_connect_errors consecutive blocking errors (default 100) without a successful connection, MySQL blocks that source entirely. Once blocked, even a correct password is rejected until the host cache is flushed. A concentrated attack, or a misconfigured application pool that also abandons handshakes, can cross that threshold in seconds, turning a noisy log into a production outage.

The key operational question is whether the sources are external attackers probing common users, or internal application hosts retrying with a stale service-account password.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Brute force or credential stuffing | Many Access denied entries from one or a few external IPs; common usernames such as root, admin, or mysql; spike is uncorrelated with deploys | performance_schema.host_cache grouped by IP; error log for source hosts and user names |
| Broken credential rotation | Spike begins during or immediately after a secret rotation or deploy; failures come from application subnets; the same service account repeats; no password-guessing pattern | Error log user and host fields; deploy timeline; performance_schema.accounts for the affected user |
| Misconfigured health check or load balancer | Steady low-level failures from a small set of internal IPs; attempts may not send a valid user; error log lacks a guessing pattern | Health check configuration; whether the check opens TCP only or attempts authentication |
| Network or TLS handshake failure | Failures from many clients at once, often after a certificate or firewall change; may be accompanied by SSL errors in the log | Error log for SSL/TLS errors; recent infrastructure changes |

Quick checks

Run these safe, read-only checks before changing anything.

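-- Check global connection-failure counters
SHOW GLOBAL STATUS LIKE 'Aborted%';
SHOW GLOBAL STATUS LIKE 'Connection_errors%';
SHOW GLOBAL VARIABLES LIKE 'max_connect_errors';

-- Per-source failure counts and first/last error time.
-- SUM_CONNECT_ERRORS counts blocking (handshake) errors;
-- COUNT_AUTHENTICATION_ERRORS counts failed logins.
SELECT IP, HOST, SUM_CONNECT_ERRORS, COUNT_AUTHENTICATION_ERRORS,
       FIRST_ERROR_SEEN, LAST_ERROR_SEEN
FROM performance_schema.host_cache
WHERE SUM_CONNECT_ERRORS > 0 OR COUNT_AUTHENTICATION_ERRORS > 0
ORDER BY SUM_CONNECT_ERRORS DESC, COUNT_AUTHENTICATION_ERRORS DESC;

-- Per-user connection footprint
SELECT USER, HOST, CURRENT_CONNECTIONS, TOTAL_CONNECTIONS
FROM performance_schema.accounts
WHERE USER IS NOT NULL
ORDER BY TOTAL_CONNECTIONS DESC;

# From a shell on the database host: recent authentication failures in the error log.
# The log path varies by installation. These entries are written at Note level,
# so they may appear only when log_error_verbosity is set to 3.
grep "Access denied" /var/log/mysql/error.log | tail -n 50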

If SUM_CONNECT_ERRORS for any IP is at or above max_connect_errors, that host is blocked. Aborted_clients climbing alongside Aborted_connects suggests post-connect drops or timeouts, not a pure authentication issue.
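The threshold comparison above can be scripted. The following is a minimal sketch, not an official tool: `classify_hosts` is a hypothetical helper that reads tab-separated `IP<TAB>SUM_CONNECT_ERRORS` rows on standard input (the shape that `mysql -N -B` produces for a two-column query) and labels each source against a max_connect_errors value passed as its argument. The 50% warning mark is an assumption taken from the alerting suggestion in the Prevention section.

```shell
# classify_hosts MAX: read "IP<TAB>SUM_CONNECT_ERRORS" rows on stdin and
# print "IP<TAB>ERRORS<TAB>STATUS" for each row.
# STATUS is BLOCKED at or above MAX, WARN at or above 50% of MAX, else OK.
classify_hosts() {
  awk -F '\t' -v max="$1" '
    NF >= 2 {
      status = "OK"
      if ($2 * 2 >= max + 0) status = "WARN"
      if ($2 + 0 >= max + 0) status = "BLOCKED"
      printf "%s\t%s\t%s\n", $1, $2, status
    }'
}

# Hypothetical usage (credentials and client options are up to you):
#   mysql -N -B -e "SELECT IP, SUM_CONNECT_ERRORS FROM performance_schema.host_cache" |
#     classify_hosts 100
```

Passing the threshold as an argument keeps the helper independent of any one server; read the real value from SHOW GLOBAL VARIABLES rather than assuming the default.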

How to diagnose it

Follow this sequence to classify the spike before acting.

  1. Confirm auth failures are driving the spike. Compare Aborted_connects rate to Aborted_clients. If Aborted_connects is rising while Aborted_clients is flat, failures are happening during the handshake. Match that to Access denied volume in the error log.
  2. Map failures to source IPs with performance_schema.host_cache. A small number of IPs dominating the count points to a concentrated attack or application misconfiguration.
  3. Read the user name pattern. Brute force probes many user names or common names. Broken rotation fails with the same service account every time.
  4. Correlate with change events. Check deploy logs, secret-rotation jobs, and configuration pushes. A spike that starts within the rotation window is almost always a stale credential.
  5. Check host blocking status. If any application host has SUM_CONNECT_ERRORS >= max_connect_errors, the application is locked out. Shift priority to immediate unblock plus root-cause fix.
  6. Estimate blast radius. Blocked application servers mean a production outage. Blocked external scanners mean noise and potential DoS via connection exhaustion.

flowchart TD
  A[Aborted_connects rate spikes] --> B{Error log shows Access denied?}
  B -->|No| C[Investigate network TLS or health-check failures]
  B -->|Yes| D{Sources are on application subnets?}
  D -->|Yes| E[Broken credential rotation or misconfigured health check]
  D -->|No| F{Sources are external or highly scattered?}
  F -->|Yes| G[Brute force or credential stuffing]
  F -->|No| H[Inspect user names and timing for mixed cause]
  E --> I[Validate secrets and rotate safely]
  G --> J[Block sources and flush host cache if needed]
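
Steps 2 and 3 of the sequence above (map failures to sources, then read the user name pattern) can be sketched as a small log summary. This is an illustration under stated assumptions: `denied_by_source` is a hypothetical helper, and it assumes each relevant log entry contains the literal text Access denied for user 'name'@'host', as matched by the grep in the quick checks. A source that tries many distinct user names looks like guessing; a source that repeats one service account looks like a stale credential.

```shell
# denied_by_source LOG: summarize "Access denied" entries per source host.
# Output columns (tab-separated): attempts, distinct user names, source host.
# Rows are sorted by attempts, highest first.
denied_by_source() {
  awk -v q="'" '
    {
      i = index($0, "Access denied for user ")
      if (i == 0) next
      # Split the tail on single quotes: p[2] is the user, p[4] the host.
      n = split(substr($0, i), p, q)
      if (n < 4) next
      user = p[2]; host = p[4]
      total[host]++
      if (!((host, user) in seen)) { seen[host, user] = 1; users[host]++ }
    }
    END {
      for (h in total) printf "%d\t%d\t%s\n", total[h], users[h], h
    }' "$1" | sort -t "$(printf '\t')" -k1,1nr -k3,3
}

# Hypothetical usage:
#   denied_by_source /var/log/mysql/error.log | head -n 20
```

The distinct-user count is only a heuristic. Credential stuffing against a single known account would also show one user name, so read it together with the source address and the deploy timeline.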

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| Aborted_connects rate | Primary indicator of failed handshakes | Sustained > 10/min; > 100/min is DoS or lockout risk |
| Aborted_clients rate | Distinguishes auth failures from post-connect drops | Rising alongside Aborted_connects suggests network or timeout issues |
| Connection_errors_max_connections | Hard refusal due to full connection slots | Any sustained nonzero rate |
| performance_schema.host_cache.SUM_CONNECT_ERRORS | Per-source failure count; predicts host blocking | Any IP approaching max_connect_errors |
| Error log Access denied rate | Direct evidence of authentication failures with user and source | Sustained spike or new source appearing |
| performance_schema.accounts per-user totals | Reveals which account footprint is growing | One service account dominating new connections |
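
The warning signs above are expressed per minute, while Aborted_connects is a cumulative counter. As a rough sketch, the hypothetical helper `rate_per_min` below converts two samples of the counter and the seconds between them into a per-minute rate. How you collect the samples is left open; the comment at the end shows one possibility.

```shell
# rate_per_min PREV CURR SECONDS: per-minute rate between two counter samples.
# Prints "reset" if the counter went backwards (for example after a restart).
rate_per_min() {
  awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
    if (secs + 0 <= 0) { print "invalid interval"; exit 1 }
    if (curr + 0 < prev + 0) { print "reset"; exit 0 }
    printf "%.1f\n", (curr - prev) * 60 / secs
  }'
}

# Hypothetical sampling, 30 seconds apart (client options are up to you):
#   a=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Aborted_connects'" | cut -f2)
#   sleep 30
#   b=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Aborted_connects'" | cut -f2)
#   rate_per_min "$a" "$b" 30
```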

Fixes

Brute force or credential stuffing

Do not rely on max_connect_errors as your primary defense. Once the threshold is crossed, legitimate clients behind the same NAT or proxy can also be blocked.

Block the source at the network layer first with firewall rules, cloud security groups, or a network-level deny list. Flush the host cache only after the attack traffic has stopped, or the attacker immediately re-triggers blocking.

-- WARNING: resets counts for ALL hosts
TRUNCATE TABLE performance_schema.host_cache;

mysqladmin flush-hosts does the same. There is no built-in way to unblock a single IP while preserving counts for others.
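
For the network-layer block described above, one cautious approach is to generate the rules for review before applying anything. The sketch below assumes a Linux host that uses iptables and a server listening on TCP port 3306; cloud security groups or other firewalls need their own equivalent. `print_drop_rules` is a hypothetical helper that only prints commands and executes nothing.

```shell
# print_drop_rules: read one IPv4 address per line on stdin and print an
# iptables rule for each, for review. Nothing is executed here.
# Assumes a Linux host using iptables and MySQL on TCP port 3306.
print_drop_rules() {
  awk '
    /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
      printf "iptables -I INPUT -s %s -p tcp --dport 3306 -j DROP\n", $0
      next
    }
    NF { printf "# skipped (not an IPv4 address): %s\n", $0 }'
}

# Hypothetical usage: review the output, then run it deliberately.
#   printf '203.0.113.5\n' | print_drop_rules
```

Printing rather than applying keeps a human in the loop, which matters when an address may turn out to be a shared NAT or proxy in front of legitimate clients.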

If the failure rate threatens to exhaust max_connections, ensure at least one admin account has SUPER or CONNECTION_ADMIN so you can use the reserved admin connection during saturation.

Broken credential rotation

Find the application hosts that still hold the old secret. The error log host field and performance_schema.host_cache point to the subnet or instance. Roll the secret forward to the correct value on those hosts, or roll back the database password if the rotation is incomplete.

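For future rotations, use MySQL’s dual-password feature (available in MySQL 8.0.14 and later), which keeps the old password valid as a secondary password while clients move to the new one: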

ALTER USER 'app_user'@'%'
  IDENTIFIED BY 'new_password'
  RETAIN CURRENT PASSWORD;

Update all application instances, verify Aborted_connects returns to baseline, then discard the old password:

ALTER USER 'app_user'@'%' DISCARD OLD PASSWORD;

This removes the window where partial deployments produce auth failures.
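
Before discarding the old password, it helps to confirm that the service account has stopped failing. The sketch below is one way to check, under stated assumptions: `denied_since` is a hypothetical helper, and it assumes every error log line starts with an ISO 8601 UTC timestamp (so plain string comparison orders entries correctly) and contains the literal Access denied for user 'name'@'host' text.

```shell
# denied_since LOG USER CUTOFF: count "Access denied" entries for USER whose
# leading timestamp is at or after CUTOFF (for example 2024-05-01T12:05:00).
# Assumes each line starts with an ISO 8601 UTC timestamp.
denied_since() {
  awk -v user="$2" -v cutoff="$3" -v q="'" '
    index($0, "Access denied for user " q user q "@") && $1 >= cutoff { n++ }
    END { print n + 0 }' "$1"
}

# Hypothetical usage: a result of 0 after the last instance was updated
# suggests it is reasonable to run DISCARD OLD PASSWORD.
#   denied_since /var/log/mysql/error.log app_user 2024-05-01T12:05:00
```

A zero count is evidence, not proof. An instance that has not tried to reconnect since the cutoff will not appear in the log, so allow for connection pool lifetimes before discarding.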

Host blocking and lockout

When a legitimate host is blocked, clients receive error 1129 (ER_HOST_IS_BLOCKED): Host 'host_name' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'. Flush the host cache only after correcting the credential or stopping the failure source. Flushing without fixing the root cause repeats the cycle.

If you are already at max_connections, MySQL reserves one additional connection for a user with SUPER or CONNECTION_ADMIN. Use that reserved slot to connect and diagnose.

Prevention

  • Alert on rate. Alert when Aborted_connects per minute exceeds a workload-specific baseline, not just an absolute threshold. A quiet OLTP instance should see near-zero auth failures.
  • Alert on host cache saturation. Set a TICKET-level alert when any IP’s SUM_CONNECT_ERRORS exceeds 50% of max_connect_errors.
  • Use dual-password rotation. Never switch a production password atomically without a window for clients to catch up.
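  • Segregate health checks. Configure load balancer health checks to complete a real login with a dedicated low-privilege account. A check that only opens a TCP connection and closes it before the handshake finishes is itself counted in Aborted_connects and can push the checker toward host blocking.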
  • Restrict network access. MySQL port 3306 should not be reachable from the public internet. Internal access should be segmented by application.
  • Test rotation playbooks. Run credential rotations in a non-production environment with the same connection pool and deploy choreography as production.

How Netdata helps

Netdata correlates the signals that distinguish brute force from broken rotation:

  • Charts Aborted_connects and Aborted_clients per second, making it obvious whether the spike is handshake-time or post-connect.
  • Shows MySQL connection utilization alongside auth-failure rates, so you can see when failures are approaching max_connections.
  • Collects the MySQL error log so you can correlate Access denied spikes with metric patterns.
  • Retains historical metrics for auth failure rates, so deviations from normal behavior trigger alerts even when absolute values are low.
  • Alerts on elevated Aborted_connects and on MySQL availability, giving early warning before host blocking turns into an outage.