Vendor API 429 throttling: Meraki, Cato, and PAN-OS rate limits

Meraki, Cato, or PAN-OS API-polled devices go dark in your dashboard while ICMP and SNMP to the same devices return healthy responses. Every device sourced from the same vendor API flatlines at the same timestamp. The cause: your collector exhausted the vendor API rate-limit budget and is now receiving HTTP 429 instead of data.

This pattern is frequently misdiagnosed because the symptom (devices appearing “down”) sits two layers above the cause (rate limit exhausted). It surfaces most often during incidents, when teams tighten polling intervals for faster data, or silently when multiple tools share a single API key without coordination.

What this means

HTTP 429 means the vendor refused your request because you exceeded the rate limit for the current window. Each vendor enforces limits differently:

Meraki Dashboard API: Two independent dimensions: throughput (10 req/sec per organization, with a burst allowance of up to 30 requests in the first 2 seconds) and concurrency (10 concurrent requests per source IP). The organization budget is shared by every key and tool that calls that organization; the concurrency cap is keyed on the source IP. A 429 response includes a Retry-After header. Some administrative endpoints have stricter limits: 10 requests per 5-minute window per IP.

Cato GraphQL API: Per-query, per-account limits. Default limit: 120 requests/minute. Specific queries have lower ceilings: accountSnapshot at 1/sec, accountMetrics at 15/min, eventsFeed at 100/min. Two users issuing different query names do not share a counter, but two users issuing the same query name share one counter across all API keys on that account. Cato does not formally publish rate-limit response headers; verify empirically.

PAN-OS XML/REST API: No published per-second rate. Palo Alto recommends a maximum of 5 concurrent API calls per firewall or Panorama. Exceeding this does not always produce a 429; instead it degrades the management-plane web server, which serves both API and web UI requests. Separately, Prisma Cloud CSPM (a different Palo Alto Networks API) exposes X-RateLimit-Remaining and related headers keyed per user.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Multiple collectors sharing one API key | All API-polled devices from one vendor go dark simultaneously; no single collector exceeds the limit alone | Count every tool, script, or integration using the same key |
| Polling frequency increased during incident | 429s start shortly after a scrape interval change or ad-hoc query burst | Check collector config history; correlate 429 onset with the change |
| Runaway script or automation loop | Sudden burst of 429s with high request volume from one source | Check vendor audit trail for request volume anomalies |
| Per-query collision (Cato) | One Cato query type throttles while others remain healthy | Identify which GraphQL query name hits its per-query limit |
| Concurrency exhaustion (Meraki) | Slow responses exhaust the 10-connection cap while staying under 10 req/sec | Check collector worker pool size; look for slow API responses |
| API key rotation or expiry | 401/403 responses mixed with 429s; some endpoints work, others fail | Verify key validity; SAML/SSO admins on Meraki cannot generate API keys |

Quick checks

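# Check Meraki response headers (read-only; single org list call, sent as GET rather than HEAD)
# Retry-After is sent with a 429, so empty output usually means no throttling at this moment
curl -s -o /dev/null -D - -H "Authorization: Bearer $MERAKI_KEY" \
  https://api.meraki.com/api/v1/organizations | grep -i 'ratelimit\|retry'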

# Check Cato response headers for rate-limit indicators (requires valid POST)
curl -s -D - -o /dev/null \
  -H "x-api-key: $CATO_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __typename }"}' \
  https://api.catonetworks.com/api/v1/graphql2 | grep -i 'ratelimit\|retry'

# Check PAN-OS API responsiveness and management-plane health
curl -sk "https://$FW_HOST/api/?type=op&cmd=<show><system><info></info></system></show>&key=$PAN_KEY" \
  -w "\nHTTP: %{http_code} Time: %{time_total}s\n"

# Verify ICMP still works to the same devices (should succeed if only API is throttled)
ping -c 3 -i 0.2 <device-ip>

# Check if SNMP still returns data (should succeed if only API is throttled)
snmpget -v2c -c <community> <device> .1.3.6.1.2.1.1.3.0

# Probe actual HTTP status codes from the vendor API over a short window
# Note: each call consumes API quota
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "Authorization: Bearer $MERAKI_KEY" \
    https://api.meraki.com/api/v1/organizations
  sleep 0.5
done

# Verify PAN-OS API key validity (invalid key returns error inside HTTP 200 body)
curl -sk "https://$FW_HOST/api/?type=op&cmd=<show><system><info></info></system></show>&key=$PAN_KEY" | head -5

How to diagnose it

flowchart TD
    A[API-polled devices go dark] --> B{ICMP/SNMP still working?}
    B -- No --> C[Network or device outage]
    B -- Yes --> D{HTTP 429 in collector logs?}
    D -- No --> E[Check 401/403: key rotation]
    D -- Yes --> F{Which vendor?}
    F -- Meraki --> G[Check Retry-After header]
    F -- Cato --> H[Identify throttled query name]
    F -- PAN-OS --> I[Check concurrent connection count]
    G --> J[Audit key consumers and poll cadence]
    H --> J
    I --> J
    J --> K[Reduce consumption or shard API keys]

  1. Confirm the scope. Only API-polled devices affected? Run the ICMP and SNMP quick checks above. If both return healthy, the network and devices are fine; the problem is between your collector and the vendor API.

  2. Identify the HTTP status code. Inspect collector logs or run the manual curl loop. A steady stream of 429s confirms throttling. 401 or 403 responses mixed in suggest key rotation or expiry. Meraki can return 404 (not 403) for a bad API key by design, to avoid leaking resource existence. PAN-OS returns <response status="error"> inside an HTTP 200 body, so the HTTP status alone is misleading.

  3. Identify the throttling dimension. For Meraki, check the Retry-After header on the 429 response. For Cato, identify which GraphQL query name is hitting its limit: the default 120/min applies broadly, but specific queries like accountSnapshot (1/sec) or accountMetrics (15/min) have much lower limits. For PAN-OS, check whether you have more than 5 concurrent API calls in flight to the same firewall or Panorama.

  4. Audit key consumers. The most common root cause is a shared API key consumed by multiple tools with no coordination. Enumerate every system that uses the vendor API key: the NMS, automation scripts, third-party integrations, ad-hoc dashboards, and CI/CD pipelines. For Cato specifically, same-named queries share a counter across all API keys on an account, so splitting keys alone does not solve per-query collisions.

  5. Check for recent polling changes. Correlate the onset of 429s with collector configuration changes. Did someone decrease the poll interval? Add a batch of new devices? Start a new API-driven compliance report?
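
Step 2 above can be sketched as a small classifier that a collector wrapper or log-triage script might call. This is a minimal illustration, not a vendor-provided tool: the function name, the vendor labels (meraki, cato, panos), and the output labels are all assumptions made for the example.

```shell
# classify_response VENDOR HTTP_STATUS [BODY]
# Prints one of: throttled, auth, auth-or-missing, app-error, ok, other.
# VENDOR labels (meraki, cato, panos) are hypothetical names for this sketch.
classify_response() {
  vendor=$1 status=$2 body=${3:-}
  case "$status" in
    429) echo throttled ;;
    401|403) echo auth ;;
    404)
      # Per the text above, Meraki can answer 404 for a bad key,
      # so a 404 there is ambiguous between "bad key" and "no such resource".
      if [ "$vendor" = meraki ]; then echo auth-or-missing; else echo other; fi ;;
    200)
      # PAN-OS reports failures inside an HTTP 200 body.
      case "$vendor:$body" in
        panos:*'status="error"'*) echo app-error ;;
        *) echo ok ;;
      esac ;;
    *) echo other ;;
  esac
}

# Example calls:
#   classify_response meraki 429
#   classify_response panos 200 "$(curl -sk "https://$FW_HOST/api/?...")"
```

Only the `throttled` result points at rate limiting; the other labels send you toward key validity or application-level errors instead.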

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| HTTP 429 rate from vendor API | Direct indicator of active throttling | Any sustained rate above 0 |
| Retry-After header (Meraki) | How long the vendor wants you to back off | Presence means you are throttled |
| API request latency | Rising latency often precedes 429s as the vendor applies backpressure | p99 latency greater than 5x baseline |
| API rate-limit remaining (where exposed) | Leading indicator before the throttle cliff | Below 20% of quota per window |
| Data freshness for API-sourced metrics | Staleness is the downstream symptom of throttling | Time since last successful poll greater than 2x poll interval |
| X-RateLimit-Remaining (Prisma Cloud CSPM) | Per-user remaining budget exposed in headers | Value at 0 means throttled |
| PAN-OS management-plane web UI responsiveness | API over-consumption degrades the shared web server process | Web UI sluggish when API calls are in flight |
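
Two of the warning signs above reduce to simple threshold checks. The sketch below assumes your collector can report the epoch time of the last successful poll and, where the vendor exposes it, the remaining quota; the function names are hypothetical.

```shell
# is_stale LAST_SUCCESS_EPOCH POLL_INTERVAL_SECONDS [NOW_EPOCH]
# Succeeds (exit 0) when the last successful poll is older than
# 2x the poll interval, the data-freshness warning sign above.
is_stale() {
  last=$1 interval=$2 now=${3:-$(date +%s)}
  [ $(( now - last )) -gt $(( 2 * interval )) ]
}

# quota_low REMAINING LIMIT
# Succeeds (exit 0) when remaining quota is below 20% of the window limit.
quota_low() {
  [ $(( $1 * 100 )) -lt $(( $2 * 20 )) ]
}

# Example: warn if a 60-second poller has not succeeded recently.
#   is_stale "$LAST_OK_EPOCH" 60 && echo "API data is stale"
```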

Fixes

Reduce polling frequency

Increase the poll interval for API-sourced data so that total consumption stays below 70% of the documented limit. For Meraki, that means roughly 7 req/sec or less per organization, summed across every tool that calls that organization. For Cato, calculate per-query consumption: if you poll accountSnapshot every second, you are at the 1/sec ceiling with zero headroom for any other consumer. Back off to every 5 to 10 seconds.
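
The arithmetic behind a safe poll interval can be sketched as follows. The helper name is hypothetical, and it assumes the collector spreads its calls evenly across the interval; a collector that fires everything at once can still exceed a per-second limit even when the average fits.

```shell
# min_poll_interval CALLS_PER_CYCLE LIMIT_PER_SECOND [TARGET_PERCENT]
# Prints the shortest poll interval (whole seconds, rounded up) that keeps
# average consumption at or below TARGET_PERCENT (default 70) of the limit.
min_poll_interval() {
  awk -v calls="$1" -v limit="$2" -v pct="${3:-70}" 'BEGIN {
    budget = limit * pct / 100          # usable requests per second
    secs = calls / budget
    whole = int(secs)
    if (whole < secs) whole++           # round up to a whole second
    if (whole < 1) whole = 1
    print whole
  }'
}

# Example: 2000 Meraki calls per polling cycle against 10 req/sec
#   min_poll_interval 2000 10
```

For limits published per minute, convert to a per-second rate first (for example, 15/min is 0.25/sec).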

Shard API keys

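If multiple tools share one API key, assign a separate key per tool so that each tool's consumption can be attributed and revoked independently. Meraki allows multiple API keys per organization, generated by different dashboard admins, but because the 10 req/sec budget is per organization, separate keys do not add headroom on their own; the combined rate still has to fit. For Cato, same-named queries share a counter across all keys on an account, so sharding keys alone does not resolve per-query collisions. Namespace your queries or stagger identical polling schedules across consumers.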
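
Staggering identical polling schedules can be as simple as giving each consumer a fixed start offset within the shared interval. A minimal sketch, with a hypothetical helper name and the assumption that every consumer knows its own index and the total consumer count:

```shell
# stagger_offset INDEX CONSUMERS INTERVAL_SECONDS
# Prints the start offset in seconds for consumer INDEX (0-based) so that
# CONSUMERS identical pollers are spread evenly across one interval.
stagger_offset() {
  echo $(( $1 * $3 / $2 ))
}

# Example: the second of three consumers polling every 60 seconds
# waits before starting its loop:
#   sleep "$(stagger_offset 1 3 60)"
```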

Implement exponential backoff with Retry-After

Respect the Retry-After header on 429 responses. The Meraki Python SDK (meraki/dashboard-api-python) retries automatically on 429 by reading this header. Custom integrations must not assume the header is always present; when it is absent, fall back to exponential backoff with a capped delay rather than retrying immediately. For Cato, pause the throttled query for a few minutes after a 429 before resuming.
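
A minimal sketch of choosing the retry delay for a custom integration: use Retry-After when it is a plain number of seconds, otherwise fall back to capped exponential backoff. The function name and the base and cap values are assumptions, and jitter is omitted for brevity.

```shell
# backoff_delay ATTEMPT [RETRY_AFTER]
# Prints the seconds to wait before retry number ATTEMPT (0-based).
# Uses RETRY_AFTER when it is a whole number of seconds; otherwise falls
# back to capped exponential backoff: min(cap, base * 2^ATTEMPT).
backoff_delay() {
  attempt=$1 retry_after=${2:-}
  base=1 cap=60                        # assumed defaults; tune per vendor
  case "$retry_after" in
    ''|*[!0-9]*) ;;                    # absent or not an integer: fall through
    *) echo "$retry_after"; return ;;
  esac
  delay=$base i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$(( delay * 2 )); i=$(( i + 1 ))
  done
  if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
  echo "$delay"
}

# Example: header values captured by curl end in a carriage return,
# so strip it before passing the value in:
#   sleep "$(backoff_delay "$attempt" "$(printf '%s' "$retry_after" | tr -d '\r')")"
```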

Limit concurrency for PAN-OS

Keep in-flight PAN-OS API calls at or below 5 at any time. Exceeding this degrades the management-plane web server and can cause request failures that look like timeouts. If you need higher throughput, use Panorama as an aggregation point and batch requests. For PAN-OS API key management, keys are generated via /api/?type=keygen&user=<user>&password=<password>. This exposes credentials in the URL; prefer generating keys via the web UI when possible. The key is passed as the X-PAN-KEY header or key= query parameter.
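
One way to cap in-flight calls from a script is to let xargs manage the worker pool. This sketch assumes an xargs that supports -P (GNU and BSD versions do) and a hypothetical per-target polling script; the function name and the MAX_INFLIGHT variable are illustrative.

```shell
# run_limited COMMAND [ARGS...]
# Reads one target per line on stdin and runs "COMMAND ARGS TARGET" for each,
# with at most MAX_INFLIGHT (default 5) invocations running at once.
run_limited() {
  xargs -P "${MAX_INFLIGHT:-5}" -n 1 "$@"
}

# Example with a hypothetical script that issues one API call per target:
#   run_limited ./poll_panos.sh < targets.txt
```

The cap here applies across the whole target list, which is conservative when the targets are different firewalls and exact when every call goes through the same Panorama.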

Collapse redundant queries

Third-party monitoring templates that issue hundreds of API calls per scan interval, for example pulling per-device metrics across thousands of Meraki devices, routinely exhaust the budget. Replace these with lightweight org-level API calls that return aggregate status. For Cato, consolidate multiple accountMetrics calls into fewer broader queries rather than issuing many narrow ones.

Prevention

  • Track API consumption proactively. Monitor the rate of API calls per key per minute against the documented limit, not just the 429 count. Catch consumption at 70% of budget, not at 100%.
  • Alert on data freshness, not just errors. When the API is throttled, the collector may stop logging errors and simply stop updating. Track time since last successful API response per vendor.
  • Document key ownership. Every API key should have a named owner and a list of consuming systems. When a new tool needs vendor API access, it gets its own key.
  • Budget for incident surges. Set steady-state consumption low enough (below 50% of limit) that a 2x surge during an incident does not trigger throttling.
  • Validate response bodies, not just HTTP status. PAN-OS returns <response status="error"> inside HTTP 200. A collector that checks only the status code will miss this failure and treat the response as successful with no data.
  • Watch for vendor-side tightening. Meraki tightened limits on some administrative endpoints in 2023-2024. Operators relying on historical polling cadences can hit unexpected 429s after vendor-side changes.
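
The first bullet above, tracking calls per key per minute against a threshold, can be sketched with awk over a request log. The log format here (epoch seconds followed by a key name, one request per line) is a hypothetical one chosen for the example; adapt the field positions to whatever your collector actually writes.

```shell
# over_budget LIMIT_PER_MINUTE [THRESHOLD_PERCENT]
# Reads "EPOCH_SECONDS KEY_NAME" lines on stdin (hypothetical log format)
# and prints "MINUTE_START KEY_NAME COUNT" for every key/minute bucket whose
# request count exceeds THRESHOLD_PERCENT (default 70) of the limit.
over_budget() {
  awk -v limit="$1" -v pct="${2:-70}" '
    { bucket = $1 - ($1 % 60); count[bucket " " $2]++ }
    END {
      for (k in count)
        if (count[k] * 100 > limit * pct) print k, count[k]
    }' | sort
}

# Example: flag any key using more than 70% of a 120/min limit
#   over_budget 120 < api_requests.log
```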

How Netdata helps

  • Correlate API data freshness with device-level signals. Netdata charts the time since last successful API response alongside ICMP reachability and SNMP data, making it immediately visible when only the API layer is degraded while the network path is healthy.
  • Track HTTP 429 rates as a first-class metric. A dedicated chart for vendor API error rates per vendor lets you spot throttling before it causes data gaps.
  • Monitor collector-side resource pressure. CPU spikes, worker thread saturation, or queue depth on the collector host can indicate that the collector is over-consuming API budgets.
  • Multi-vendor signal correlation. When Meraki, Cato, and PAN-OS API health are charted together, a single-vendor throttling event is immediately distinguishable from a broader connectivity issue or a collector host problem.
  • Alert on staleness thresholds. Configurable alerts on data freshness degradation catch the silent gap that occurs when the collector receives 429s but stops logging them as errors.