Elasticsearch authentication failures: audit logs, brute force, and credential drift
Elasticsearch does not expose authentication failure counts through _nodes/stats or _cluster/health. The security audit log is the only structured source for authentication_failed, access_denied, and run_as_denied events. Without it, brute force attempts, credential stuffing, and expiring service tokens are invisible until they cause an outage.
What this means
When xpack.security.audit.enabled is true, each node writes security events to a local audit log file. The events that matter for auth issues are:
- authentication_failed: credentials did not match any known user.
- access_denied: an authenticated user attempted an unauthorized action.
- run_as_denied: an impersonation attempt was rejected.
There is no stats API counter for these events. With audit logging disabled, the only symptom is a client-side error, or silence until a service stops working.
Recognize three operational patterns:
- Brute force: a sudden burst of authentication_failed events from a single source IP.
- Credential stuffing: authentication_failed events spread across many distinct usernames, indicating an attacker is trying leaked or common credentials at scale.
- Credential drift: a slow, steady rise in failures over days or weeks, usually from a service account whose token or password expired or was rotated out of sync with one client.
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Brute force attack | Sudden burst of authentication_failed from one source address | Audit log for source IP concentration |
| Credential stuffing | authentication_failed spread across many distinct principals | Audit log for username diversity |
| Expired service account credentials | Gradual rise in authentication_failed over days or weeks | Audit log for service account principals |
| Client misconfiguration | Steady authentication_failed rate from a single application host | Recent deployments or configuration changes |
| TLS certificate expiry | Connection failures followed by auth errors in certificate-based realms | GET /_ssl/certificates for expiry dates |
| Unauthorized access attempts | access_denied or run_as_denied events | Role mappings and index privileges |
Quick checks
Adjust the path, protocol, and credentials to match your environment.
# Tail recent authentication failures from the audit log
grep 'authentication_failed' /var/log/elasticsearch/*_audit.json | tail -20
# Count total authentication failures in the current audit log
grep 'authentication_failed' /var/log/elasticsearch/*_audit.json | wc -l
# Check for unauthorized access attempts
grep 'access_denied' /var/log/elasticsearch/*_audit.json | tail -20
# Check for denied impersonation attempts
grep 'run_as_denied' /var/log/elasticsearch/*_audit.json | tail -20
# Check TLS certificate expiry
curl -s 'https://localhost:9200/_ssl/certificates' | jq '.[] | {path, expiry, has_private_key}'
# Verify cluster stability before making changes
curl -s 'https://localhost:9200/_cluster/health?filter_path=status,number_of_nodes'
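A plain substring grep for authentication_failed also matches realm_authentication_failed events, and access_denied also matches anonymous_access_denied, so the counts above can overstate the three events of interest. The sketch below counts them by exact event.action value in one pass. The function name count_audit_actions is hypothetical, and the pattern assumes the default one-JSON-object-per-line audit layout with flat, dotted keys.

```shell
# count_audit_actions (hypothetical helper): read audit log lines on stdin and
# print how many events carry each of the three auth-related actions.
# Assumes one JSON object per line with a flat "event.action" key.
count_audit_actions() {
  # Match the whole quoted value so realm_authentication_failed and
  # anonymous_access_denied are not counted.
  grep -oE '"event\.action"[[:space:]]*:[[:space:]]*"(authentication_failed|access_denied|run_as_denied)"' \
    | sed -E 's/.*"([a-z_]+)"$/\1/' \
    | sort | uniq -c | sort -rn
}

# Usage sketch (adjust the path to your environment):
#   cat /var/log/elasticsearch/*_audit.json | count_audit_actions
```

The output is one line per action, most frequent first, which gives a quick sense of whether the problem is authentication or authorization.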
How to diagnose it
1. Enable audit logging if it is off. Set xpack.security.audit.enabled: true in elasticsearch.yml on every node. If a restart is required, do not restart the master node during an incident unless the cluster has at least two other master-eligible nodes.
2. Extract authentication_failed events. Grep the audit log and look for concentration by source IP and target username.
3. Classify the pattern.
   - Brute force: more than 100 failures per minute from a single source IP.
   - Credential stuffing: failures spread across many distinct usernames with low per-user counts.
   - Credential drift: a slow rise in failures from a known service principal over days.
4. Check for access_denied and run_as_denied. If authentication succeeds but operations fail, the issue is authorization, not credentials. Correlate the denied action with the requesting principal and target indices.
5. Verify TLS certificate validity. If your cluster uses client-certificate authentication, check GET /_ssl/certificates. Expired or near-expiry certificates cause authentication failures that look like credential issues but are actually trust failures.
6. Review recent changes. Map the failure timeline to deployments, configuration pushes, or scheduled token rotations. A sudden increase after a rollout usually points to a missed client update.
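The extraction and classification steps can be sketched as a single pass over the audit log. This is a minimal illustration, not a definitive parser: the function name summarize_auth_failures is hypothetical, and it assumes the default layout (one JSON object per line, flat keys such as origin.address, user.name, and timestamp or @timestamp) with origin.address carrying a trailing :port. For each source address it prints total failures, the peak count in any single minute, and the number of distinct usernames tried.

```shell
# summarize_auth_failures (hypothetical helper): read audit log lines on stdin
# and print, per source address: total failures, peak failures in any one
# minute, and distinct usernames. Assumes flat JSON keys, one event per line.
summarize_auth_failures() {
  awk '
    # Return the string value of a flat JSON key such as "user.name", or "".
    function field(key,    re, val) {
      re = "\"" key "\"[ \t]*:[ \t]*\"[^\"]*\""
      if (!match($0, re)) return ""
      val = substr($0, RSTART, RLENGTH)
      sub(/^"[^"]*"[ \t]*:[ \t]*"/, "", val)   # drop the key and opening quote
      sub(/"$/, "", val)                        # drop the closing quote
      return val
    }
    field("event\\.action") == "authentication_failed" {
      src = field("origin\\.address")
      # Strip a trailing :port from IPv4 or bracketed IPv6 addresses.
      if (src ~ /^(\[.*\]|[0-9.]+):[0-9]+$/) sub(/:[0-9]+$/, "", src)
      minute = substr(field("@?timestamp"), 1, 16)   # YYYY-MM-DDTHH:MM
      user = field("user\\.name")
      total[src]++
      n = ++per_minute[src, minute]
      if (n > peak[src]) peak[src] = n
      if (!((src, user) in seen)) { seen[src, user] = 1; users[src]++ }
    }
    END { for (s in total) print s, total[s], peak[s], users[s] }
  ' | sort -k2,2nr
}

# Usage sketch (adjust the path to your environment):
#   cat /var/log/elasticsearch/*_audit.json | summarize_auth_failures | head
```

Read the columns against the thresholds above: a peak above 100 from one address suggests brute force, while a high distinct-username count with a modest total per user suggests credential stuffing.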
Metrics and signals to monitor
There is no metrics API for auth failures. Operational monitoring relies on audit log aggregation and certificate tracking.
| Signal | Why it matters | Warning sign |
|---|---|---|
| authentication_failed rate from single source | Brute force detection | Burst greater than 100 failures/minute from one IP |
| authentication_failed username diversity | Credential stuffing detection | Failures across many distinct principals |
| authentication_failed trend for service accounts | Credential drift | Gradual rise over days or weeks |
| access_denied rate for service accounts | Misconfigured roles or privileges | Sustained nonzero rate after a deployment |
| TLS certificate remaining lifetime | Prevent cluster split | Expiring within 7 days |
| run_as_denied events | Impersonation attempts | Any unexpected denied run_as requests |
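The certificate lifetime signal can be checked with a small filter over the output of GET /_ssl/certificates. The sketch below is one possible approach under stated assumptions: the function name certs_expiring_before is hypothetical, the input is tab-separated expiry and path pairs, and both the expiry values and the cutoff are UTC ISO 8601 timestamps, which compare correctly as plain strings.

```shell
# certs_expiring_before CUTOFF (hypothetical helper): read "expiry<TAB>path"
# lines on stdin and print "path<TAB>expiry" for every certificate whose
# expiry sorts before CUTOFF. Relies on UTC ISO 8601 strings sorting in
# chronological order, so no date arithmetic is needed.
certs_expiring_before() {
  awk -F '\t' -v cutoff="$1" '($1 "") < (cutoff "") { print $2 "\t" $1 }'
}

# Usage sketch (GNU date syntax; adjust host, protocol, and credentials):
#   cutoff=$(date -u -d '+7 days' +%Y-%m-%dT%H:%M:%SZ)
#   curl -s 'https://localhost:9200/_ssl/certificates' \
#     | jq -r '.[] | [.expiry, .path] | @tsv' \
#     | certs_expiring_before "$cutoff"
```

Any output means at least one certificate falls inside the 7-day warning window from the table.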
Fixes
Brute force attacks
Block the source IP at your firewall, load balancer, or reverse proxy. Elasticsearch has no built-in account lockout or rate limiting for failed logins, so blocking the source is the most effective immediate response. Rotate any credentials that may have been exposed. Tradeoff: broad IP blocks can catch legitimate users behind NAT. Target the specific origin address first.
Credential stuffing
Force password resets for any accounts that showed successful authentication around the time of the attack. Review password complexity policies. If the attack targeted the native realm over the internet, restrict the Elasticsearch HTTP port to a VPN or known bastion hosts. Tradeoff: mass password resets create user friction and support load.
Expired service account credentials
Generate new tokens or passwords for the affected service account and update all client configurations simultaneously. Check for clients that missed the rotation; they will continue to produce a low, steady rate of authentication_failed events. Tradeoff: rotating credentials without updating every client prolongs the outage.
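To confirm drift and to spot clients that missed a rotation, it can help to count failures per day for one principal. The sketch below assumes the default audit layout (flat keys with no whitespace around the colon, one event per line); the function name daily_failures_for and the principal svc-ingest in the usage line are hypothetical.

```shell
# daily_failures_for PRINCIPAL (hypothetical helper): read audit log lines on
# stdin and print "count day" for authentication_failed events naming that user.
# Fixed-string matching assumes the default layout: "key":"value" with no spaces.
daily_failures_for() {
  grep -F '"event.action":"authentication_failed"' \
    | grep -F "\"user.name\":\"$1\"" \
    | sed -nE 's/.*"@?timestamp"[[:space:]]*:[[:space:]]*"([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/p' \
    | sort | uniq -c
}

# Usage sketch (adjust the path and principal to your environment):
#   cat /var/log/elasticsearch/*_audit.json | daily_failures_for svc-ingest
```

A count that stays above zero after the rotation date points to a client still presenting the old credential.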
Authorization failures (access_denied)
If the user authenticates successfully but cannot perform an operation, review role mappings and index-level privileges. Map the denied action to the smallest role that permits it. Tradeoff: granting broad privileges fixes the symptom but violates least privilege and increases blast radius.
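Before changing any role, it helps to see exactly which actions are being denied and for whom. The sketch below groups access_denied events by user and action; it assumes the default audit layout with flat user.name and action keys, and the function name denied_actions is hypothetical.

```shell
# denied_actions (hypothetical helper): read audit log lines on stdin and print
# "count user action" for access_denied events, most frequent first.
# Assumes flat JSON keys, one event per line.
denied_actions() {
  awk '
    # Return the string value of a flat JSON key such as "user.name", or "".
    function field(key,    re, val) {
      re = "\"" key "\"[ \t]*:[ \t]*\"[^\"]*\""
      if (!match($0, re)) return ""
      val = substr($0, RSTART, RLENGTH)
      sub(/^"[^"]*"[ \t]*:[ \t]*"/, "", val)   # drop the key and opening quote
      sub(/"$/, "", val)                        # drop the closing quote
      return val
    }
    field("event\\.action") == "access_denied" {
      count[field("user\\.name") " " field("action")]++
    }
    END { for (pair in count) print count[pair], pair }
  ' | sort -rn
}

# Usage sketch (adjust the path to your environment):
#   cat /var/log/elasticsearch/*_audit.json | denied_actions | head
```

Each output line names one user and one denied action, which is the unit to map onto the smallest role that permits it.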
TLS certificate expiry
Renew certificates before expiry. In ES 8.x, security is on by default and certificate expiry can cause nodes to reject connections and split the cluster. Plan rotation at least 7 days before expiration.
Prevention
- Enable audit logging before you need it. Without audit logs, authentication failures cannot be correlated with other cluster events.
- Ship audit logs to a SIEM or log aggregator. Local grep during an incident is too slow for distributed clusters.
- Monitor TLS certificate expiry proactively. Use GET /_ssl/certificates in an automated check or external monitor.
- Rotate service account credentials on a schedule. Track which clients hold which tokens and rotate all copies simultaneously to avoid drift.
- Restrict network access to Elasticsearch ports. The HTTP and transport interfaces should not be exposed to the public internet.
How Netdata helps
Netdata cannot replace the audit log, but it provides surrounding context:
- Correlate authentication failure bursts with node-level CPU, memory, and network spikes.
- Alert on TLS certificate expiry before connections begin failing.
- Track node availability and thread pool health to distinguish auth failures caused by cluster stress from credential issues.
- Surface audit log patterns locally when a SIEM is not in place.
Related guides
- Elasticsearch all shards failed: diagnosing search_phase_execution_exception
- Elasticsearch CircuitBreakingException: [parent] Data too large - causes and fixes
- Elasticsearch cluster_block_exception: blocked by, the read-only blocks explained
- Elasticsearch cluster health red: unassigned primaries and how to recover
- Elasticsearch cluster health yellow: unassigned replicas vs real allocation blocks
- Elasticsearch cluster state too large: field count, index count, and per-node heap
- Elasticsearch disk full: emergency recovery and freeing space safely
- Elasticsearch disk watermark cascade: from low watermark to cluster-wide read-only
- Elasticsearch document indexing failures: index_failed, bulk item errors, and version conflicts
- Elasticsearch EsRejectedExecutionException: write thread pool rejections and HTTP 429
- Elasticsearch fielddata circuit breaker tripped: text-field aggregations and the keyword fix
- Elasticsearch FORBIDDEN/12/index read-only / allow delete (api) — flood stage recovery







