nginx: no resolver defined to resolve - dynamic upstream DNS

502 Bad Gateway responses paired with no resolver defined to resolve example.com in the error log mean proxy_pass uses a variable - for example, proxy_pass http://$backend; - and the enclosing context has no resolver directive.

With a literal proxy_pass, nginx resolves the upstream hostname once at startup or reload, caches the result, and does not query DNS again until the next restart or reload. With a variable-based proxy_pass, nginx resolves the hostname at request time through its internal async resolver. Without a resolver directive, there is nothing to send the query to, so the request fails immediately with a 502.

Since nginx 1.27.3, the resolve parameter on server directives inside upstream blocks is available in open-source nginx. It provides background DNS refreshes and automatic peer list updates without requiring a variable in proxy_pass. On older versions, the only dynamic option is variable-based proxy_pass with an explicit resolver.

What this means

Static versus dynamic resolution determines when DNS is queried, what happens when an upstream IP changes, and whether you need extra configuration.

Static resolution bakes the IP into the running configuration at parse time. If the upstream moves to a new IP, nginx sends traffic to the old address until reload or restart. Simple and efficient, but breaks autoscaling, failover, and environments where backend IPs change frequently.

Variable-based resolution moves DNS lookups into the request path. When a request hits proxy_pass http://$backend;, nginx checks its internal resolver cache. If the entry is missing or expired, it triggers an async DNS query through the event loop. The worker does not block; only the request pauses until the lookup completes or resolver_timeout (default 30s) expires. The wait is not counted in $upstream_connect_time, so a DNS delay shows up in $request_time rather than in the upstream timings. The paused request still holds a connection slot for the whole wait, so a DNS outage can cause connection exhaustion even while workers stay responsive.

The third option, upstream resolve, requires nginx 1.27.3 or later. Define an upstream block with a zone for shared memory and add resolve to the server line. nginx refreshes the DNS record in the background and updates the upstream peer list automatically. This avoids per-request lookups and handles headless Kubernetes Services correctly by updating the entire peer list instead of picking one IP per request.

flowchart TD
    A[proxy_pass directive] --> B{Contains variable?}
    B -->|No| C[Resolve at startup only]
    B -->|Yes| D{resolver directive?}
    D -->|Missing| E["Error: no resolver defined"]
    D -->|Present| F[Async lookup per request]
    G[upstream server resolve] --> H[1.27.3+: background refresh]
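
The 1.27.3 cutoff for upstream resolve can be checked in a script before choosing between the two dynamic options. The following is a minimal sketch, not an official check: the function name supports_upstream_resolve is hypothetical, and it assumes the version banner has the usual "nginx version: nginx/X.Y.Z" shape printed by open-source nginx. Distribution or vendor builds with a different banner would need their own parsing.

```shell
# Hypothetical helper: succeed (exit 0) if the nginx version banner in $1
# is at least the minimum version in $2 (default 1.27.3).
supports_upstream_resolve() {
    version_line="$1"                    # e.g. "nginx version: nginx/1.27.3"
    minimum="${2:-1.27.3}"
    current="${version_line##*nginx/}"   # strip everything through the last "nginx/"
    current="${current%% *}"             # drop trailing text such as " (Ubuntu)"
    awk -v cur="$current" -v min="$minimum" 'BEGIN {
        split(cur, c, "."); split(min, m, ".")
        # Compare major, minor, patch numerically, most significant first.
        for (i = 1; i <= 3; i++) {
            if (c[i] + 0 > m[i] + 0) exit 0
            if (c[i] + 0 < m[i] + 0) exit 1
        }
        exit 0
    }'
}
```

A possible call is supports_upstream_resolve "$(nginx -v 2>&1)" && echo "upstream resolve available", since nginx -v prints its banner on stderr.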

Common causes

| Cause | What it looks like | First thing to check |
| Variable proxy_pass without resolver | Exact error in logs; 502 on affected location only | nginx -T output, filtered with grep resolver |
| DNS server unreachable or slow | Intermittent 502s; latency spikes near 30s | dig @<resolver_ip> <hostname> from the nginx host |
| Stale DNS cache after upstream IP change | Old backend IPs used despite DNS update | resolver valid= value and actual DNS TTL |
| Missing zone with resolve parameter | Config test fails with “resolving names at run time requires upstream to be in shared memory” | nginx -t output |
| NXDOMAIN cached in Kubernetes | Persistent 502 after pod scale-down until TTL expires | valid= timeout vs endpoint churn rate |

Quick checks

# Check error log for resolver failures
grep -iE "no resolver defined|host not found|upstream timed out" /var/log/nginx/error.log | tail -20
# Find variable-based proxy_pass directives
nginx -T 2>/dev/null | grep -n 'proxy_pass http://\$'
# Verify resolver configuration and scope
nginx -T 2>/dev/null | grep -B1 -A1 'resolver'
# Check current resolver_timeout (default 30s if unset)
nginx -T 2>/dev/null | grep resolver_timeout
# Test DNS from the nginx host using the configured resolver
dig @127.0.0.11 +short api.example.com
# Check nginx version for resolve support
nginx -v 2>&1
# If using resolve, confirm zone directive exists
# (-w matches the word "resolve" only, not every "resolver" line)
nginx -T 2>/dev/null | grep -w -B2 -A2 'resolve'
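
The dig check above hard-codes 127.0.0.11, which may not be the resolver nginx is actually using. As a rough sketch, the configured addresses can be pulled out of the dumped configuration first. The function name list_resolver_addresses is hypothetical, and the parsing assumes each resolver directive sits on a single line.

```shell
# Hypothetical helper: print every address listed on `resolver` lines of an
# nginx configuration read from stdin (for example, the output of nginx -T).
list_resolver_addresses() {
    awk '
        # Exact match on the first word, so resolver_timeout lines are ignored.
        $1 == "resolver" {
            for (i = 2; i <= NF; i++) {
                field = $i
                sub(/;$/, "", field)        # drop the trailing semicolon
                if (field ~ /=/) continue   # skip parameters such as valid=10s or ipv6=off
                if (field != "") print field
            }
        }
    '
}
```

One way to use it is nginx -T 2>/dev/null | list_resolver_addresses, then run dig @<address> +short api.example.com against each line of output. Addresses written with a port (for example 127.0.0.1:5353) are printed as-is and would need the port split off before being passed to dig.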

How to diagnose it

  1. Locate the exact no resolver defined to resolve <hostname> line in the error log. Note the timestamp.
  2. Find the location block proxying to that hostname. Look for proxy_pass containing a variable.
  3. Check whether a resolver directive exists in the http or server block. Variable-based proxy_pass cannot function without it.
  4. If a resolver is present but errors persist, test DNS reachability from the nginx host using dig or nslookup. Verify the resolver is accessible and returns answers quickly.
  5. If using resolve on an upstream server directive, verify the upstream block includes a zone directive. Without zone, nginx -t fails.
  6. Correlate 502 spikes with DNS TTL expiry or upstream scaling events. In Kubernetes, check whether the error coincides with pod churn and whether negative caching extends the failure window.
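
Steps 1 and 2 are easier when the failing hostnames are grouped first. The sketch below counts no resolver defined errors per hostname; the function name count_resolver_errors is hypothetical, and it assumes the hostname in each log line is followed by a comma or space, as in the usual nginx error log layout.

```shell
# Hypothetical helper: count "no resolver defined" errors per hostname
# in an nginx error log, most frequent first.
count_resolver_errors() {
    log_file="${1:-/var/log/nginx/error.log}"
    grep -o 'no resolver defined to resolve [^, ]*' "$log_file" \
        | awk '{ print $NF }' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'    # normalize to "<count> <hostname>"
}
```

Each hostname in the output points at one variable-based proxy_pass to look for in step 2.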

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| Error log resolver failures | Direct indicator of DNS lookup failures | Sustained no resolver defined or host not found entries |
| 502 rate by location | Isolates affected paths | 502s concentrated in locations with variable proxy_pass |
| Request latency P95 | Resolver timeouts default to 30s | Latency spikes clustering near 30s |
| Active connections | Slow resolver responses hold slots | Connection pile-up without corresponding throughput increase |
| Upstream response time | Distinguishes DNS delay from backend slowness | $upstream_response_time normal but $request_time elevated |
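
The last signal can be checked offline from the access log. This is only a sketch under a loud assumption: the access log must be written with a custom log_format that includes the fields rt=$request_time and urt=$upstream_response_time. Those key names are illustrative, not nginx defaults, and the function name find_dns_gaps is hypothetical. A large gap is a hint rather than proof, because $request_time also covers time spent reading the request and sending the response.

```shell
# Hypothetical helper: print access-log lines where request_time exceeds
# upstream_response_time by more than a threshold (seconds, default 1.0).
# Assumes fields of the form rt=<seconds> and urt=<seconds> in each line.
find_dns_gaps() {
    log_file="$1"
    threshold="${2:-1.0}"
    awk -v t="$threshold" '
        {
            rt = ""; urt = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^rt=/)  rt  = substr($i, 4)
                if ($i ~ /^urt=/) urt = substr($i, 5)
            }
            # Skip lines with no single numeric upstream time, such as "-"
            # or a comma-separated list from several upstream attempts.
            if (rt == "" || urt !~ /^[0-9.]+$/) next
            gap = rt - urt
            if (gap > t) printf "%.3f %s\n", gap, $0
        }
    ' "$log_file"
}
```

For example, find_dns_gaps /var/log/nginx/access.log 2 would list requests that spent more than two seconds outside the upstream exchange, with the gap printed first.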

Fixes

Add a resolver directive for variable proxy_pass

Add the directive at the http or server level:

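http {
    # 127.0.0.11 is Docker's embedded DNS; substitute your own resolver.
    resolver 127.0.0.11 valid=10s;
    ...
    server {
        # location is only valid inside a server block
        location / {
            # Using a variable forces resolution at request time.
            set $backend "http://api.example.com";
            proxy_pass $backend;
        }
    }
}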

Tradeoff: shorter valid= values improve failover speed but increase DNS query volume, and every extra query is another opportunity for a spoofed response to be accepted.

Use upstream resolve for background refresh (1.27.3+)

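# Both directives below belong in the http context.
upstream api {
    zone upstream_dynamic 64k;            # shared memory, required by resolve
    server api.example.com:443 resolve;   # re-resolved in the background
}
resolver 8.8.8.8 valid=30s;               # still required: resolve uses this resolver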

Keeps the peer list fresh without per-request lookups. Tradeoff: requires shared memory via zone. The resolve parameter does not work without it, and nginx -t rejects an upstream that uses resolve but has no zone.

Switch to static upstreams for stable backends

If the upstream IP rarely changes, use a literal proxy_pass or a static upstream block. nginx resolves once at startup and never again. This removes runtime DNS dependencies entirely and is the safest choice for stable environments.

Tune resolver_timeout

resolver_timeout 5s;

Default is 30s. Lower values fail faster, reducing connection slot consumption during DNS outages, but increase 502s during brief resolver latency spikes.

Disable IPv6 on IPv4-only networks

resolver 8.8.8.8 ipv6=off;

nginx queries both A and AAAA records by default. On IPv4-only networks, unnecessary AAAA lookups add latency and can cause silent failures.

Address Kubernetes negative caching

In Kubernetes, nginx caches negative responses such as NXDOMAIN. If a Service endpoint disappears, nginx returns 502s until the cache expires. Avoid variable-based proxy_pass for highly dynamic headless Services, or set a very short valid= value and monitor pod churn closely.

Prevention

  • Prefer static upstream definitions for stable backends. Removes runtime DNS dependencies and eliminates resolver cache management.
  • Always pair variable proxy_pass with an explicit resolver directive. A variable target cannot be resolved at runtime without one on any version, and before 1.27.3 it is the only dynamic option in open-source nginx.
  • Set valid= based on upstream churn rate. 10s to 60s is commonly used in dynamic environments.
  • Use upstream resolve with zone on nginx 1.27.3 or later. Replaces per-request lookups with background refreshes and handles peer list updates automatically.
  • Monitor resolver error logs and 502 rates as separate signals. DNS failures often appear as latency before they appear as errors.

How Netdata helps

  • Correlate 502 spikes with error log resolver failures.
  • Monitor active connections and request latency to catch resolver timeout delays before they cascade into connection exhaustion.
  • Alert on P95 latency deviations clustering around resolver_timeout values.
  • Compare $upstream_response_time against $request_time to isolate DNS delays from backend slowness.