Elasticsearch authentication failures: audit logs, brute force, and credential drift

Elasticsearch does not expose authentication failure counts through _nodes/stats or _cluster/health. The security audit log is the only structured source for authentication_failed, access_denied, and run_as_denied events. Without it, brute force attempts, credential stuffing, and expiring service tokens are invisible until they cause an outage.

What this means

When xpack.security.audit.enabled is true, each node writes security events to a local audit log file. The events that matter for auth issues are:

  • authentication_failed: credentials did not match any known user.
  • access_denied: an authenticated user attempted an unauthorized action.
  • run_as_denied: an impersonation attempt was rejected.

There is no stats API counter for these events. With audit logging disabled, the only symptom is a client-side error, or silence until a service stops working.

Recognize three operational patterns:

  • Brute force: a sudden burst of authentication_failed events from a single source IP.
  • Credential stuffing: authentication_failed events spread across many distinct usernames, indicating an attacker is trying common passwords at scale.
  • Credential drift: a slow, steady rise in failures over days or weeks, usually from a service account whose token or password expired or was rotated out of sync with one client.

Common causes

CauseWhat it looks likeFirst thing to check
Brute force attackSudden burst of authentication_failed from one source addressAudit log for source IP concentration
Credential stuffingauthentication_failed spread across many distinct principalsAudit log for username diversity
Expired service account credentialsGradual rise in authentication_failed over days or weeksAudit log for service account principals
Client misconfigurationSteady authentication_failed rate from a single application hostRecent deployments or configuration changes
TLS certificate expiryConnection failures followed by auth errors in certificate-based realmsGET /_ssl/certificates for expiry dates
Unauthorized access attemptsaccess_denied or run_as_denied eventsRole mappings and index privileges

Quick checks

Adjust the path, protocol, and credentials to match your environment.

# Tail recent authentication failures from the audit log
grep 'authentication_failed' /var/log/elasticsearch/*_audit.json | tail -20

# Count total authentication failures in the current audit log
grep 'authentication_failed' /var/log/elasticsearch/*_audit.json | wc -l

# Check for unauthorized access attempts
grep 'access_denied' /var/log/elasticsearch/*_audit.json | tail -20

# Check for denied impersonation attempts
grep 'run_as_denied' /var/log/elasticsearch/*_audit.json | tail -20

# Check TLS certificate expiry
curl -s 'https://localhost:9200/_ssl/certificates' | jq '.[] | {path, expiry, has_private_key}'

# Verify cluster stability before making changes
curl -s 'https://localhost:9200/_cluster/health?filter_path=status,number_of_nodes'

How to diagnose it

  1. Enable audit logging if it is off. Set xpack.security.audit.enabled: true in elasticsearch.yml on every node. If a restart is required, do not restart the master node during an incident unless the cluster has at least two other master-eligible nodes.

  2. Extract authentication_failed events. Grep the audit log and look for concentration by source IP and target username.

  3. Classify the pattern.

    • Brute force: more than 100 failures per minute from a single source IP.
    • Credential stuffing: failures spread across many distinct usernames with low per-user counts.
    • Credential drift: a slow rise in failures from a known service principal over days.
  4. Check for access_denied and run_as_denied. If authentication succeeds but operations fail, the issue is authorization, not credentials. Correlate the denied action with the requesting principal and target indices.

  5. Verify TLS certificate validity. If your cluster uses client-certificate authentication, check GET /_ssl/certificates. Expired or near-expiry certificates cause authentication failures that look like credential issues but are actually trust failures.

  6. Review recent changes. Map the failure timeline to deployments, configuration pushes, or scheduled token rotations. A sudden increase after a rollout usually points to a missed client update.

Metrics and signals to monitor

There is no metrics API for auth failures. Operational monitoring relies on audit log aggregation and certificate tracking.

SignalWhy it mattersWarning sign
authentication_failed rate from single sourceBrute force detectionBurst greater than 100 failures/minute from one IP
authentication_failed username diversityCredential stuffing detectionFailures across many distinct principals
authentication_failed trend for service accountsCredential driftGradual rise over days or weeks
access_denied rate for service accountsMisconfigured roles or privilegesSustained nonzero rate after a deployment
TLS certificate remaining lifetimePrevent cluster splitExpiring within 7 days
run_as_denied eventsImpersonation attemptsAny unexpected denied run_as requests

Fixes

Brute force attacks

Block the source IP at your firewall, load balancer, or reverse proxy. Elasticsearch does not provide native brute force protection, so network-level blocking is the only effective response. Rotate any credentials that may have been exposed. Tradeoff: broad IP blocks can catch legitimate users behind NAT. Target the specific origin address first.

Credential stuffing

Force password resets for any accounts that showed successful authentication around the time of the attack. Review password complexity policies. If the attack targeted the native realm over the internet, restrict the Elasticsearch HTTP port to a VPN or known bastion hosts. Tradeoff: mass password resets create user friction and support load.

Expired service account credentials

Generate new tokens or passwords for the affected service account and update all client configurations simultaneously. Check for clients that missed the rotation; they will continue to produce a low, steady rate of authentication_failed events. Tradeoff: rotating credentials without updating every client prolongs the outage.

Authorization failures (access_denied)

If the user authenticates successfully but cannot perform an operation, review role mappings and index-level privileges. Map the denied action to the smallest role that permits it. Tradeoff: granting broad privileges fixes the symptom but violates least privilege and increases blast radius.

TLS certificate expiry

Renew certificates before expiry. In ES 8.x, security is on by default and certificate expiry can cause nodes to reject connections and split the cluster. Plan rotation at least 7 days before expiration.

Prevention

  • Enable audit logging before you need it. Without audit logs, authentication failures cannot be correlated with other cluster events.
  • Ship audit logs to a SIEM or log aggregator. Local grep during an incident is too slow for distributed clusters.
  • Monitor TLS certificate expiry proactively. Use GET /_ssl/certificates in an automated check or external monitor.
  • Rotate service account credentials on a schedule. Track which clients hold which tokens and rotate all copies simultaneously to avoid drift.
  • Restrict network access to Elasticsearch ports. The HTTP and transport interfaces should not be exposed to the public internet.

How Netdata helps

Netdata cannot replace the audit log, but it provides surrounding context:

  • Correlate authentication failure bursts with node-level CPU, memory, and network spikes.
  • Alert on TLS certificate expiry before connections begin failing.
  • Track node availability and thread pool health to distinguish auth failures caused by cluster stress from credential issues.
  • Surface audit log patterns locally when a SIEM is not in place.