SNMP authentication-failure spikes: misconfiguration vs reconnaissance
SNMP authentication-failure traps are one of the few security signals built into the network monitoring stack. When they spike, the question is never “is something wrong?”; it is “is this a broken poller or someone probing my devices?” The answer changes the response from a quiet config fix to a security incident.
The authenticationFailure trap (OID 1.3.6.1.6.3.1.1.5.5) signals that an SNMP agent received a protocol message that was not properly authenticated. On SNMPv2c, that means a wrong community string. On SNMPv3, it most directly means a wrong auth password or auth protocol (a digest mismatch). Unknown usernames and wrong privacy passwords are counted separately by the USM and, depending on the implementation, may not raise the trap. The trap is defined in SNMPv2-MIB and every compliant agent can generate it, but it is only sent when snmpEnableAuthenTraps (.1.3.6.1.2.1.11.30) is enabled, and many vendors ship with it disabled by default. If you have never explicitly enabled it (for example, snmp-server enable traps snmp authentication on Cisco IOS), you may have no signal at all.
The hard part is interpretation. A burst of auth failures from a single IP can be a monitoring station with a stale credential, or an attacker enumerating community strings. A burst across many devices from one source is almost always scanning. The discriminating signals are source-IP distribution, timing regularity, credential variety, and whether the source is inside or outside the management subnet.
Signal types: traps vs cumulative counters
The authenticationFailure trap is event-based: it fires on each failed authentication attempt. This is distinct from the cumulative counters that track the same failure at the agent level.
For SNMPv2c, the relevant counters are:
- `.1.3.6.1.2.1.11.4` (`snmpInBadCommunityNames`): community name not recognized by the agent
- `.1.3.6.1.2.1.11.5` (`snmpInBadCommunityUses`): community recognized but the SNMP operation was not permitted for that community
For SNMPv3, the USM (User-based Security Model) keeps several statistics counters. The two that matter most here are:
- `.1.3.6.1.6.3.15.1.1.3` (`usmStatsUnknownUserNames`): the username was not recognized by the agent
- `.1.3.6.1.6.3.15.1.1.5` (`usmStatsWrongDigests`): the username was recognized but the authentication digest did not match
This distinction matters operationally. Rising usmStatsUnknownUserNames with stable usmStatsWrongDigests means someone is probing usernames that do not exist on the device. Rising usmStatsWrongDigests with stable usmStatsUnknownUserNames means the username is valid but the password or auth protocol is wrong. That usually points to a credential rotation or configuration mismatch. A sustained climb from an unfamiliar source can also be password guessing against a username the attacker already knows. Either way, it is not blind probing.
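Two polls of each counter are enough to apply this rule. The sketch below is a minimal POSIX shell illustration, not a monitoring tool. The function names `counter32_delta` and `classify_usm_delta` and the labels they print are hypothetical. The samples are assumed to come from snmpget polls like those under Quick checks. The agent is assumed not to have restarted between samples, because a restart resets the counters.

```shell
# Hypothetical helpers for classifying SNMPv3 USM failures from two polls.

# Difference between two Counter32 readings, allowing for a single wrap at 2^32.
# Assumes the agent did not restart between the readings (compare sysUpTime).
counter32_delta() {
  echo $(( ($2 - $1 + 4294967296) % 4294967296 ))
}

# classify_usm_delta PREV_UNKNOWN CUR_UNKNOWN PREV_WRONG CUR_WRONG
#   *_UNKNOWN: usmStatsUnknownUserNames samples
#   *_WRONG:   usmStatsWrongDigests samples
classify_usm_delta() {
  local unknown wrong
  unknown=$(counter32_delta "$1" "$2")
  wrong=$(counter32_delta "$3" "$4")
  if [ "$unknown" -gt 0 ] && [ "$wrong" -eq 0 ]; then
    echo "username-probing"   # names that do not exist on the agent
  elif [ "$wrong" -gt 0 ] && [ "$unknown" -eq 0 ]; then
    echo "wrong-credential"   # valid user, wrong password or auth protocol
  elif [ "$wrong" -gt 0 ]; then
    echo "mixed"              # both rising: break down by source IP
  else
    echo "quiet"
  fi
}

# Unknown usernames rose from 10 to 55 while wrong digests stayed at 3.
classify_usm_delta 10 55 3 3   # prints: username-probing
```

The wrap handling matters only on long polling intervals or busy agents. The restart case is deliberately left to the caller, because a reset counter and a wrapped counter look the same from two samples alone.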
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Misconfigured poller | Failures from one known management IP, periodic at the polling interval (often 300s), same credential each time | Verify the poller’s SNMP credential configuration against the device |
| Credential rotation | Failures from one or a few known IPs, starting at a specific timestamp, correct username but wrong digest | Check whether a credential rotation was applied to one side but not the other |
| Vulnerability scanner sweep | Failures across many devices from one source IP, varied credentials attempted, non-periodic timing | Identify the scanner source and verify it is authorized |
| External reconnaissance | Failures from IPs outside the management subnet, rapid succession, varied community strings or usernames | Check perimeter ACL and firewall logs for the source IP |
| Community string “public” still configured | No auth-failure traps and little or no rise in snmpInBadCommunityNames, because scans that try “public” succeed and record no failure | Check device config for default community strings; check ACL or firewall logs for SNMP (UDP 161) traffic from unexpected sources |
Quick checks
```shell
# Check SNMPv2c bad community name counter
snmpget -v2c -c <community> <device> .1.3.6.1.2.1.11.4.0
# Check whether the agent sends auth-failure traps at all
# (snmpEnableAuthenTraps: 1 = enabled, 2 = disabled)
snmpget -v2c -c <community> <device> .1.3.6.1.2.1.11.30.0
# Check SNMPv3 USM unknown usernames counter
# Match -a (SHA/SHA-256/MD5) to the agent's configured auth protocol;
# if the user requires privacy, use -l authPriv -x AES -X <privpass>
snmpget -v3 -l authNoPriv -u <user> -a SHA -A <authpass> <device> .1.3.6.1.6.3.15.1.1.3.0
# Check SNMPv3 USM wrong digests counter
snmpget -v3 -l authNoPriv -u <user> -a SHA -A <authpass> <device> .1.3.6.1.6.3.15.1.1.5.0
# Search trap receiver logs for auth failures (the log path, and whether the
# trap is logged by name or by numeric OID, depend on your snmptrapd setup)
grep -E 'authenticationFailure|\.6\.3\.1\.1\.5\.5' /var/log/snmptrapd.log | tail -50
# List recent auth-failure messages in the device log buffer (Cisco IOS/IOS XE)
ssh <device> 'show logging | include SNMP-3-AUTHFAIL'
# Verify the auth-failure trap is enabled on the device (Cisco IOS)
ssh <device> 'show run | include snmp-server enable traps snmp authentication'
```
How to diagnose it
```mermaid
flowchart TD
A["authFailure spike"] --> B{"Single source IP?"}
B -- Yes, known poller --> C["Misconfigured poller or credential rotation"]
B -- Yes, unknown --> D{"Inside mgmt subnet?"}
D -- Yes --> E["Unauthorized internal tool"]
D -- No --> F["External scanning: escalate to security"]
B -- No, many sources --> G{"Varied credentials?"}
G -- Yes --> H["Reconnaissance or vuln scanner"]
G -- No --> I["Shared wrong credential across estate"]
```

1. Extract source IPs from device syslog, and use the trap stream to see which devices are affected. The standard authenticationFailure trap carries no varbinds beyond sysUpTime and snmpTrapOID. The trap's own source address is the device that rejected the request, not the host that sent it. The source IP of the failed request is therefore more reliably available in device syslog. On Cisco IOS, the `SNMP-3-AUTHFAIL` syslog message includes it directly. Parse both the trap receiver log and the device syslog to build a source-IP frequency table.
2. Classify each source IP. For each source, determine whether it is a known monitoring station and whether it is inside the management subnet. Any nonzero auth-failure rate from a source outside the management subnet is an unauthorized access attempt. Escalate immediately.
3. Check timing regularity. Misconfigured pollers produce failures at their polling interval, typically every 300 seconds. If failures arrive at a precise, repeating interval, the source is almost certainly a monitoring probe with a stale credential. Random timing or rapid bursts suggest active scanning.
4. Distinguish USM error types for SNMPv3. Poll `usmStatsUnknownUserNames` (`.1.3.6.1.6.3.15.1.1.3.0`) and `usmStatsWrongDigests` (`.1.3.6.1.6.3.15.1.1.5.0`) separately. Rising unknown-usernames with stable wrong-digests indicates username probing. The inverse indicates a valid user with a wrong password or auth protocol, which usually means misconfiguration rather than probing.
5. Check for multi-vector scanning. Correlate with syslog for SSH and HTTP authentication failures from the same source IP. An attacker probing SNMP is often probing other protocols simultaneously. Check AAA logs (TACACS+/RADIUS) for login failures from the same source.
6. Verify that the absence of traps is not hiding a problem. Many devices still have the SNMPv2c community string “public” configured for read-only access. Scans that try “public” succeed, so they generate neither auth-failure traps nor increments of `snmpInBadCommunityNames`. Check device configs for default community strings directly. Also poll `snmpInBadCommunityNames` even when no traps are seen. If the counter is rising with no corresponding traps, trap generation is disabled or the traps are being lost before they reach the receiver.
7. Check for silent v3 failures. On some platforms, SNMPv3 authentication failures produce no trap unless auth-failure trapping is explicitly enabled. Unknown-user or decryption failures may only increment usmStats counters. If you suspect v3 failures but see no traps, verify the trap configuration on the device and poll the usmStats counters directly.
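The first three checks above can be sketched as small shell functions: extracting sources, classifying them against the management subnet, and testing timing regularity. This is a minimal illustration under stated assumptions, not a parser for every platform.

- The function names are hypothetical.
- Each Cisco `SNMP-3-AUTHFAIL` line is assumed to end with the requester's IPv4 address. The exact wording varies by release.
- Timestamps are assumed to be already converted to epoch seconds.
- The management subnet in the usage comment (10.20.0.0/16) is only an example.

```shell
# Hypothetical helpers for the first three diagnostic checks.

# 1. Per-source frequency table from Cisco-style syslog on stdin.
#    Assumes each SNMP-3-AUTHFAIL line ends with the requester's IPv4 address.
#    Output: "count ip", busiest source first.
auth_fail_sources() {
  sed -nE 's/.*SNMP-3-AUTHFAIL.* ([0-9]{1,3}(\.[0-9]{1,3}){3}).*/\1/p' |
    sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $1, $2 }'
}

# 2. Is an IPv4 address inside NETWORK/PREFIXLEN? Exit status 0 if yes.
#    Plain dotted quads only (no leading zeros, no IPv6).
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 ))
}

in_subnet() {
  local ip net mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# 3. Timing regularity for one source. Reads sorted epoch-second timestamps
#    on stdin; prints "periodic <mean>s" if every gap is within 10% of the
#    mean gap, otherwise "irregular". Needs at least three timestamps.
interval_pattern() {
  awk '
    NR > 1 { gap[NR] = $1 - prev; sum += gap[NR] }
    { prev = $1 }
    END {
      n = NR - 1
      if (n < 2) { print "insufficient"; exit }
      mean = sum / n
      if (mean == 0) { print "irregular"; exit }
      for (i = 2; i <= NR; i++) {
        d = gap[i] - mean
        if (d < 0) d = -d
        if (d > 0.1 * mean) { print "irregular"; exit }
      }
      printf "periodic %.0fs\n", mean
    }'
}

# Example (hypothetical file and management subnet 10.20.0.0/16):
#   auth_fail_sources < device-syslog.txt | while read -r count ip; do
#     if in_subnet "$ip" 10.20.0.0 16; then where=inside; else where=OUTSIDE; fi
#     echo "$ip $count $where"
#   done
```

The 10% tolerance is an arbitrary starting point. A poller with jitter may need a wider band, and a scanner that deliberately paces itself can look periodic. Treat "periodic" as supporting evidence alongside the source classification, not as proof.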
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| snmpInBadCommunityNames (.1.3.6.1.2.1.11.4) | Cumulative count of community-name mismatches at the agent | Any increase in production; rising without corresponding traps means trap generation is disabled or traps are being lost |
| snmpInBadCommunityUses (.1.3.6.1.2.1.11.5) | Community recognized but operation not permitted | Nonzero usually indicates a tool attempting write operations with read-only credentials |
| usmStatsUnknownUserNames (.1.3.6.1.6.3.15.1.1.3) | SNMPv3 username not recognized by agent | Rising counter with varied usernames indicates reconnaissance |
| usmStatsWrongDigests (.1.3.6.1.6.3.15.1.1.5) | SNMPv3 username valid but auth digest mismatch | Rising counter with a stable unknown-username count indicates credential rotation, misconfiguration, or password guessing against a known user |
| authenticationFailure trap rate (.1.3.6.1.6.3.1.1.5.5) | Event-level signal | A sustained rate above what any poller would produce (for example, more than 10 events/min from a single source) suggests scanning; any event from outside the management subnet is unauthorized |
| SSH/HTTP auth failure rate (syslog/AAA) | Multi-vector scanning probes multiple protocols from one source | Same source IP failing auth across SNMP, SSH, and HTTP is active reconnaissance |
| Per-source trap frequency distribution | A single sender dominating trap volume is a finding | One source IP accounting for the majority of auth-failure traps |
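To check whether traps are keeping up with the counters, compare how much a failure counter rose on one device over a window with how many authenticationFailure traps the receiver logged for that device in the same window. This is a minimal sketch with several assumptions:

- The names `trap_gap` and `per_minute` are hypothetical.
- Both inputs are assumed to be gathered elsewhere.
- The agent is assumed to send one trap per failed request. If your platform throttles traps, treat a "traps-lost" result as a prompt to investigate rather than a verdict.

```shell
# Hypothetical helpers: are auth-failure traps keeping up with the counters?

# trap_gap COUNTER_DELTA TRAPS_SEEN
#   COUNTER_DELTA: rise in snmpInBadCommunityNames (or a usmStats counter)
#                  on one device over a window
#   TRAPS_SEEN:    authenticationFailure traps logged for that device in
#                  the same window
trap_gap() {
  if [ "$1" -eq 0 ]; then
    echo "ok"
  elif [ "$2" -eq 0 ]; then
    # Failures with no traps at all: generation is probably disabled
    # (snmpEnableAuthenTraps = 2), traps are not sent to this receiver,
    # or (for usmStats counters) the platform does not trap that failure type.
    echo "traps-missing"
  elif [ "$2" -lt "$1" ]; then
    # Some traps arrived but fewer than failures: likely loss on UDP 162.
    echo "traps-lost"
  else
    echo "ok"
  fi
}

# per_minute COUNTER_DELTA WINDOW_SECONDS: integer failures per minute,
# for comparison against a rate threshold.
per_minute() {
  echo $(( $1 * 60 / $2 ))
}
```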
Fixes
Misconfigured poller or stale credential
The most common cause. A monitoring station was recently updated, moved, or reconfigured, and its SNMP credentials no longer match the device.
- Identify which poller is generating the failures from the trap source IP or syslog.
- Verify the poller’s configured community string or SNMPv3 credentials against the device’s actual configuration.
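- For SNMPv3, check the auth protocol (MD5, SHA, or a SHA-2 variant), auth password, privacy protocol, and privacy password independently. A mismatch on any one breaks polling. Auth mismatches show up as wrong digests; privacy mismatches show up as decryption errors (usmStatsDecryptionErrors). Implementations commonly enforce the 8-character minimum passphrase length associated with SNMPv3 USM (RFC 3414). Depending on the platform, a shorter password is rejected at configuration time or leaves the two sides unable to authenticate without a clear error.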
- Apply the correct credential to the poller. Do not change the device credential to match the poller unless the poller’s credential is the intended one.
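To check the auth protocol independently of the password, you can try the same user and password with each protocol and see which one the agent accepts. A hedged sketch, with these caveats:

- `probe_auth_protocol` is a hypothetical name.
- SHA-256 needs a Net-SNMP build with SHA-2 support.
- Every wrong guess is itself an auth failure on the device. Tell whoever watches those alerts before running it.

```shell
# Hypothetical helper: find which auth protocol a USM user is configured with.
# Usage: probe_auth_protocol DEVICE USER AUTHPASS
# Sends one short authNoPriv request per protocol. If the user's access policy
# requires privacy, authNoPriv may be refused even with the right protocol;
# then replace "-l authNoPriv" with "-l authPriv -x <proto> -X <privpass>".
# Every wrong guess increments usmStatsWrongDigests on the device.
probe_auth_protocol() {
  local device user authpass proto
  device=$1
  user=$2
  authpass=$3
  for proto in MD5 SHA SHA-256; do
    # sysUpTime.0 is a small object that read views usually include.
    if snmpget -v3 -l authNoPriv -u "$user" -a "$proto" -A "$authpass" \
        -t 2 -r 0 "$device" 1.3.6.1.2.1.1.3.0 >/dev/null 2>&1; then
      echo "$proto"
      return 0
    fi
  done
  echo "none"   # wrong password, unknown user, or a protocol not tried here
  return 1
}
```

A result of "none" does not say which part is wrong. Re-check the password and username before widening the protocol list.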
Credential rotation mismatch
Credential rotation applied to one side but not the other.
- Verify whether a credential rotation was recently scheduled or executed.
- Check both the device and the monitoring system for the current credential.
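- Synchronize. Rotating on the device first and then updating the poller keeps the failure window short. Where the platform allows it, add the new community or SNMPv3 user alongside the old one, move the poller over, then remove the old credential, so there is no failure window at all.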
External reconnaissance or scanning
Failures from IPs outside the management subnet require a security response.
- Check perimeter firewall and ACL logs for the source IP.
- Block the source IP at the perimeter if policy permits.
- Verify that SNMP access (UDP 161) is restricted to the management subnet via ACLs on every device. If it is reachable from outside, that is the configuration error that enabled the scanning.
- Escalate to the security team. Correlate with SSH, HTTP, or other protocol auth failures from the same source.
- If any device still uses community string “public” or “private”, remediate immediately. Scans against “public” succeed silently and never generate auth-failure traps.
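Finding default community strings can be scripted against saved running configs. This is a minimal sketch with a few assumptions:

- Configs use Cisco IOS-style `snmp-server community <string> RO|RW [acl]` lines. Other vendors use different syntax.
- The function name and file path are hypothetical.

```shell
# Hypothetical helper: report default SNMP community strings in an IOS-style
# running config read from stdin ("snmp-server community <string> RO|RW [acl]").
# Other vendors use different syntax; adapt the match accordingly.
default_communities() {
  awk '$1 == "snmp-server" && $2 == "community" &&
       ($3 == "public" || $3 == "private") {
         print $3, ($4 == "" ? "-" : $4)   # community and access mode
       }'
}

# Example (hypothetical path): default_communities < backups/core1.cfg
```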
SNMPv3 credential special-character issues
On some platforms, shell-special characters ($, backticks, !) in SNMPv3 credentials cause silent authentication failures when passed through CLI or configuration management tools. Test credentials that contain only alphanumerics first to isolate parsing from genuine auth failures.
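The quoting problem is easiest to see in the shell itself. Inside double quotes, `$` still expands, so `$$` becomes the shell's process ID before snmpget ever sees the password. In interactive bash, `!` inside double quotes can also trigger history expansion.

A minimal sketch follows. `has_shell_special` is a hypothetical check for characters that commonly get mangled. The sketch also shows the safer pattern: keep the credential in a single-quoted variable and pass that variable quoted.

```shell
# Hypothetical check: does a credential contain characters that shells and
# templating layers commonly rewrite ($, backtick, !, backslash, quotes)?
has_shell_special() {
  case $1 in
    *'$'* | *'`'* | *'!'* | *'\'* | *'"'* | *"'"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Single quotes keep every character literal, including $ and !.
pass='Pa$$w0rd!x'

# Double quotes do not protect $: the next line prints the shell's PID where
# "$$" was, which is exactly how a credential gets silently altered.
echo "altered: Pa$$w0rd"

# Pass the variable itself, quoted, so snmpget receives it byte for byte:
#   snmpget -v3 -l authNoPriv -u <user> -a SHA -A "$pass" <device> 1.3.6.1.2.1.1.3.0
```

Configuration management tools add their own templating layer on top of the shell. A credential that passes this check can still be rewritten there, which is why testing with alphanumeric-only credentials first remains the quickest way to isolate the problem.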
CVE-2025-20352 exposure
CVE-2025-20352 (CVSS 7.7, disclosed September 2025) is a stack overflow vulnerability in the SNMP subsystem of Cisco IOS and IOS XE. It affects all SNMP versions (v1, v2c, v3), and exploitation requires valid SNMP credentials. With low-privileged SNMP credentials, a remote attacker can cause a device reload. On IOS XE, an attacker who also holds administrative (privilege 15) credentials can execute code as root. Because exploitation needs a working credential, treat auth-failure spikes from external sources against Cisco IOS or IOS XE devices as a potential precursor: someone may be trying to obtain one. Restrict SNMP access to trusted management IPs via ACLs and patch to the fixed release.
Prevention
- Enable auth-failure traps on every device. Many vendors ship with this disabled. On Cisco IOS, use `snmp-server enable traps snmp authentication`. Without this, you have no event-level signal.
- Eliminate default community strings. Any device still using “public” or “private” is a silent finding. Scans against “public” succeed without generating any auth-failure trap, so the absence of traps does not mean the absence of scans.
- Restrict SNMP to the management subnet. SNMP on UDP 161 should never be reachable from outside the management network. Apply ACLs on every device.
- Prefer SNMPv3 over v2c. SNMPv2c community strings are transmitted in cleartext and can be captured by passive sniffing on the management VLAN. SNMPv3 with authPriv provides both authentication and encryption.
- Baseline the auth-failure rate. A healthy estate should have zero auth failures in steady state. Any nonzero value is abnormal. Track the per-source distribution so that a new source is immediately visible.
- Monitor cumulative counters, not just traps. Traps can be dropped by a flooded receiver, since UDP 162 is lossy under burst. The cumulative counters (`snmpInBadCommunityNames`, `usmStatsUnknownUserNames`, `usmStatsWrongDigests`) persist at the agent until it restarts and survive trap loss.
- Correlate with SSH and AAA auth failures. Multi-vector scanning probes multiple protocols from the same source. Join SNMP auth-failure events with syslog auth failures by source IP and timestamp.
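Joining by source IP can start as a set intersection: sources that fail SNMP auth and also fail SSH or AAA logins in the same period. This is a minimal sketch with stated assumptions:

- One IP per failure event has already been extracted from each log. For SNMP, that could come from `SNMP-3-AUTHFAIL` lines.
- The function name and file names are hypothetical.
- Timestamps are ignored. A real correlation should also bound the time window.

```shell
# Hypothetical helper: source IPs that failed SNMP auth AND another protocol.
# Usage: multi_vector_sources SNMP_IPS_FILE OTHER_IPS_FILE
# Each file holds one source IP per failure event (duplicates allowed).
multi_vector_sources() {
  awk 'FILENAME == ARGV[1] { snmp[$1] = 1; next }   # first file: SNMP sources
       ($1 in snmp) && !seen[$1]++ { print $1 }    # in both, printed once
      ' "$1" "$2" | sort
}
```

Matching on `FILENAME` rather than the common `NR == FNR` idiom keeps the function correct when the SNMP file is empty. In that case `NR == FNR` would wrongly treat the second file as the first.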
How Netdata helps
- Netdata’s SNMP collector can poll `snmpInBadCommunityNames`, `usmStatsUnknownUserNames`, and `usmStatsWrongDigests` at per-second resolution, giving rate-of-change visibility that 5-minute polling misses.
- Trap receiver metrics expose per-trap-type rates, so an authenticationFailure spike is visible as a distinct signal alongside linkDown, coldStart, and enterprise-specific traps.
- Correlate SNMP auth-failure spikes with syslog auth failures from the same source IP on the unified timeline, without joining across separate tools.
- Anomaly detection on the auth-failure counter rate baselines the normal (zero) state and flags any deviation, including slow-rate probing that stays below fixed thresholds.
- UDP socket buffer drop monitoring (`Udp_RcvbufErrors`) on the trap receiver surfaces when traps are lost under burst, so an auth-failure spike does not silently disappear at the receiver.