nginx 500 Internal Server Error: how to diagnose it

A 500 from nginx tells you something failed in the request path, but not whether the failure originated in your application, FastCGI/uWSGI backend, or nginx itself. When 500s spike during an incident, first determine which side of the nginx boundary is breaking.

Unlike 502 Bad Gateway or 504 Gateway Time-out, which point at a failing or slow upstream, a 500 can be an application bug passed through by nginx, a configuration error, a permission failure, or resource exhaustion inside an nginx worker.

If your log format includes $upstream_status, you can separate upstream-generated 500s from nginx-generated 500s. If you rely on the default access log alone, both paths look identical.

What this means

In production, a 500 means one of three things.

Upstream application error. The application returned 500 and nginx passed it through. The access log shows status 500 and $upstream_status 500. The nginx error log is often silent; the application logs contain the crash, exception, or assertion.

nginx internal error. nginx generated the 500 itself because of a runtime configuration failure, a permission denied error during request processing, a detected rewrite or internal redirection cycle, or a module failure such as a Lua VM error. $upstream_status is empty (logged as -) because the request never reached an upstream. The error log contains [error] or [crit] entries from nginx.

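Intercepted upstream error. When proxy_intercept_errors on or fastcgi_intercept_errors on is set, nginx intercepts upstream responses with status 300 or above that have a matching error_page and serves that error page instead. Codes with no matching error_page pass through unchanged. If the error_page remaps the code (for example, error_page 502 =500 /error.html) or the error page's own processing fails with a 500 (a rewrite cycle, for example), nginx logs 500 instead of the original upstream status. An upstream 404 or 502 can therefore appear as 500 in the access log.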

Determine which path produced the 500. The fix differs in each case.

```mermaid
flowchart TD
    A[Client sees 500] --> B{upstream status is 500}
    B -->|Yes| C[Upstream application error]
    B -->|No| D{nginx error log has entries}
    D -->|Yes| E[nginx config or module error]
    D -->|No| F[Check intercept and error page config]
    C --> G[Check application logs]
    E --> H[Run nginx -t and review errors]
    F --> I[Review proxy intercept settings]
```
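The first branch of that decision reduces to comparing two log fields. Below is a minimal POSIX shell sketch of the comparison, following the $status / $upstream_status logic rather than the error-log branch. The function name and output labels are hypothetical. It assumes $upstream_status as nginx logs it: - when no upstream was contacted, or a comma-separated list such as 502, 500 when several upstream servers were tried, in which case only the last attempt is compared.

```shell
# Classify one response from its $status and $upstream_status values.
# Output labels are hypothetical: upstream | nginx | masked | not-500
classify_500() {
  status="$1"
  upstream="${2:--}"            # treat a missing or empty value like nginx's "-"
  last="${upstream##*[ ,:]}"    # keep the last attempt: "502, 500" -> "500"
  if [ "$status" != "500" ]; then
    echo "not-500"
  elif [ "$last" = "500" ]; then
    echo "upstream"             # the application returned 500; nginx passed it through
  elif [ "$last" = "-" ]; then
    echo "nginx"                # no upstream contacted: nginx generated the 500
  else
    echo "masked"               # upstream sent another code; intercept/error_page changed it
  fi
}
```

For example, classify_500 500 "502, 500" prints upstream, because the final upstream attempt itself returned 500.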

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Upstream application error | Access log shows 500 with $upstream_status 500; nginx error log is quiet or shows routine upstream messages | Upstream application logs and process state |
| nginx config or permission error | Access log shows 500 with empty $upstream_status; error log contains nginx [error] or [crit] entries | nginx -t output and the error log for config or permission details |
| FastCGI/uWSGI/Lua module error | 500s isolated to dynamic content endpoints; error log references the backend protocol or the Lua VM | Backend process logs (php-fpm, uWSGI) or Lua traces |
| Intercepted upstream error | 500s appear after enabling proxy_intercept_errors or fastcgi_intercept_errors; the original upstream status was non-2xx | error_page directives and intercept settings |
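For the error-log side of the table above, a rough first pass is to bucket lines by well-known message fragments. This is a sketch under stated assumptions: the bucket names are hypothetical, and the match strings cover only a few common nginx and lua-nginx-module messages, so expect to extend them for your modules.

```shell
# First-pass triage of nginx error log lines (stdin) into rough cause buckets.
# Each line is counted in the first bucket that matches; bucket names are hypothetical.
errlog_buckets() {
  awk '
    /rewrite or internal redirection cycle/   { c["rewrite-loop"]++;  next }
    /Permission denied/                       { c["permission"]++;    next }
    /Too many open files/                     { c["fd-exhaustion"]++; next }
    /lua entry thread aborted/                { c["lua"]++;           next }
    /FastCGI sent in stderr/                  { c["fastcgi"]++;       next }
    /upstream timed out|connect\(\) failed/   { c["upstream"]++;      next }
    /\[(error|crit|alert|emerg)\]/            { c["other"]++ }
    END { for (k in c) print c[k], k }
  ' | sort -rn
}
```

Typical use is tail -n 5000 /var/log/nginx/error.log | errlog_buckets, which prints the most frequent bucket first. Lines below [error] that match no fragment are ignored.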

Quick checks

```shell
# Count critical-or-worse entries in the last 1000 error log lines
tail -n 1000 /var/log/nginx/error.log | grep -cE '\[(emerg|alert|crit)\]'

# Count 5xx responses by status in the last 10000 access log lines
# (default combined format: $status is field 9)
tail -n 10000 /var/log/nginx/access.log | \
  awk '{if ($9 >= 500 && $9 < 600) count[$9]++}
       END {for (s in count) print s": "count[s]}'

# Test nginx configuration validity
nginx -t

# Count nginx child processes (includes workers, cache loader, and cache manager)
# The pid file path varies by distribution (/var/run/nginx.pid or /run/nginx.pid)
pgrep -c -P "$(cat /var/run/nginx.pid)"

# Verify the master process is alive
kill -0 "$(cat /var/run/nginx.pid)" 2>/dev/null && echo "master alive" || echo "master dead"

# Check stub_status for active connections and request counters
# (requires a location with stub_status enabled at /nginx_status)
curl -s http://127.0.0.1/nginx_status

# Compare open file descriptors against the limit for one worker (run as root)
WORKER_PID=$(pgrep -f 'nginx: worker process' | head -n 1)
ls /proc/"$WORKER_PID"/fd | wc -l
prlimit -n -p "$WORKER_PID"
```

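These commands are read-only; nginx -t parses the configuration without applying it. The pgrep command counts all children of the master, not only workers. A sudden drop in this count, combined with 500s, suggests workers are crashing from module errors or OOM kills. Because the master respawns workers that die, the count can recover quickly; [alert] entries of the form worker process <pid> exited on signal in the error log are a more durable record of crashes.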
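To turn the file descriptor check into a single utilization figure, one option is to read /proc directly. This is a Linux-only sketch, and the function name is hypothetical; reading an nginx worker's /proc entries usually requires root. It reports the soft limit, which is the one a process actually hits.

```shell
# Open descriptors as an integer percentage of the soft "Max open files" limit.
# Linux-only: relies on /proc/PID/limits and /proc/PID/fd.
fd_usage_pct() {
  pid="$1"
  # Row format: Max open files <soft> <hard> files  -> soft limit is field 4
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits" 2>/dev/null)
  [ -n "$limit" ] || return 1          # unreadable or missing process
  used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  if [ "$limit" = "unlimited" ]; then
    echo 0                             # no soft limit to exhaust
  else
    echo $(( used * 100 / limit ))
  fi
}
```

As root, fd_usage_pct "$(pgrep -f 'nginx: worker process' | head -n 1)" gives a number you can compare against the 95% warning level in the metrics table.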

How to diagnose it

  1. Quantify the scope. Determine whether 500s hit all endpoints or only one location. If they are confined to one location, suspect that location's upstream pool, rewrite rules, or backend application.

  2. Check $upstream_status against $status. If your log format includes $upstream_status, compare it to the status sent to the client. When both are 500, the upstream application generated the error. When $upstream_status is - or empty, nginx generated the 500 internally. When they differ, proxy_intercept_errors or an error_page directive may be masking the original code.

  3. Read the error log for the same time window. Look for [error], [crit], [alert], or [emerg] entries. If nginx is generating 500s, you will see permission denied, rewrite, or module errors here. If the upstream is failing, you may see upstream timed out or connect() failed.

  4. Validate configuration. Run nginx -t. A passing test does not guarantee runtime correctness. A failing test means the configuration on disk is not the one serving traffic: a reload that fails validation is rejected and nginx keeps running the previous configuration, which can mask recent misconfigurations.

  5. Check upstream health directly. Probe upstream servers with curl or nc from the nginx host. If the upstream returns 500 independently, the problem is the application. If the upstream is healthy but nginx returns 500, the problem is in nginx or the transport path.

  6. Inspect FastCGI/uWSGI/Lua backends. For PHP-FPM, check php-fpm.log for worker crashes or slow requests. For uWSGI, check the master log. For Lua, look for VM initialization failures or runtime errors in the nginx error log. These backends often return 500 through nginx without nginx being misconfigured.

  7. Review resource limits. Check whether worker_connections or file descriptor limits are exhausted. While exhaustion more commonly causes silent connection drops, severe memory pressure inside a worker can cause allocation failures that manifest as 500s. Check dmesg for OOM killer activity targeting nginx workers.
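Steps 1 and 2 can be combined into one pass over the access log. The sketch below assumes a custom log format that is the combined format with $upstream_status appended as the last field (for example, a log_format ending in "$http_user_agent" $upstream_status); that format and the function name are assumptions, not nginx defaults. Because a value like 502, 500 splits into two whitespace fields, the last field is always the final upstream attempt.

```shell
# Group 500 responses by cause and request path (access log on stdin).
# Assumes: combined format ($status = field 9, path = field 7) plus
# $upstream_status appended as the LAST field.
summarize_500s() {
  awk '$9 == 500 {
         up = $NF
         if (up == "500") kind = "upstream"     # application error passed through
         else if (up == "-") kind = "nginx"     # nginx generated the 500
         else kind = "masked"                   # intercept/error_page changed the code
         path = $7
         sub(/\?.*/, "", path)                  # drop the query string so paths group
         count[kind " " path]++
       }
       END { for (k in count) print count[k], k }' | sort -rn
}
```

Run it as tail -n 10000 /var/log/nginx/access.log | summarize_500s. A single dominant path points at one location (step 1), and the cause column answers step 2 without opening each log line.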

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| HTTP 5xx response rate | Tracks server-side errors returned to clients | Sustained 500 rate above 1%; above 5% at meaningful traffic volume is page-worthy |
| Error log rate and severity | Captures nginx internal failures that generate 500s | Any [emerg], [alert], or [crit]; [error] rate above 10/minute |
| $upstream_status vs $status | Distinguishes upstream errors from nginx internal errors | A mismatch indicates intercept_errors masking or an nginx-generated 500 |
| Worker process count | Crashing workers can drop requests or generate 500s | Count below configured worker_processes for more than 30s |
| File descriptor usage | Exhaustion prevents proper request handling | Above 95% of the limit, with “too many open files” errors |
| Active connections | Connection pileup can precede resource exhaustion | Approaching worker_connections × worker_processes |
| Upstream response time | Rising upstream latency often precedes upstream-generated 500s | P95 trending up more than 20% from baseline |
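The 5xx-rate threshold in the metrics table can be checked by hand from the access log. The sketch below assumes the default combined format ($status in field 9); both function names are hypothetical. awk does the floating-point comparison because POSIX shell arithmetic is integer-only.

```shell
# Percentage of 5xx responses among access log lines on stdin.
# Assumes the combined format; lines whose field 9 is not a 3-digit code are skipped.
fivexx_pct() {
  awk '$9 ~ /^[0-9][0-9][0-9]$/ {
         total++
         if ($9 >= 500 && $9 < 600) errors++
       }
       END {
         if (total == 0) { print "0.00"; exit }
         printf "%.2f\n", errors * 100 / total
       }'
}

# Exit status 0 when the first percentage is strictly greater than the second.
above_threshold() {
  awk -v p="$1" -v t="$2" 'BEGIN { exit !((p + 0) > (t + 0)) }'
}
```

For example: pct=$(tail -n 10000 /var/log/nginx/access.log | fivexx_pct); above_threshold "$pct" 1 && echo "5xx rate ${pct}% is above 1%". A single sample is a snapshot; the table's "sustained" condition means repeating this over several intervals.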

Fixes

Upstream application errors. Remove the failing backend from the upstream pool and reload nginx, or reduce its weight. If all backends are failing, serve stale cache with proxy_cache_use_stale if caching is configured, to buy time while the application is repaired.

nginx config or permission errors. Run nginx -t before reloading. Fix the offending directive, adjust permissions, or correct the rewrite rule. If a recent reload introduced the problem, roll back to the previous configuration, confirm it with nginx -t, and reload. A reload that fails validation is rejected and the old configuration stays active, so the broken file on disk is not necessarily what is serving traffic.
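The "test before reloading" rule can be wrapped in a small guard so a reload is never attempted on a configuration that fails nginx -t. This is a minimal sketch; NGINX_BIN is a hypothetical override (defaulting to nginx) that exists only so the helper can be dry-run with a stand-in binary.

```shell
# Validate the configuration first; reload only if the test passes.
# NGINX_BIN is a hypothetical override for dry runs; it defaults to "nginx".
safe_reload() {
  nginx_bin="${NGINX_BIN:-nginx}"
  if ! "$nginx_bin" -t; then
    echo "nginx -t failed; keeping the running configuration" >&2
    return 1
  fi
  "$nginx_bin" -s reload
}
```

nginx already rejects an invalid configuration on reload; the guard mainly makes the failure loud and gives scripts a non-zero exit status to act on, for example by restoring the previous file.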

FastCGI/uWSGI/Lua failures. Restart the backend process pool. For PHP-FPM, a graceful restart often clears worker state. For uWSGI, check the master log for worker deaths. For Lua, check that the Lua VM has sufficient memory and that worker_rlimit_nofile is adequate for the module’s needs. If the Lua script is corrupt or incompatible with the nginx version, disable the location block and reload.
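For PHP-FPM, worker deaths are recorded in php-fpm.log as lines saying a child exited on a signal or with an exit code. The sketch below assumes the stock php-fpm wording ("exited on signal", "exited with code N"); verify it against your PHP version. Code 0 exits are excluded because php-fpm also logs them for normal recycling, such as after pm.max_requests. The function name is hypothetical.

```shell
# Count abnormal PHP-FPM child exits in php-fpm.log lines on stdin.
# Assumes stock php-fpm log wording; exit code 0 (normal recycling) is ignored.
fpm_crashes() {
  awk '/exited on signal/        { sig++ }
       /exited with code [1-9]/  { nonzero++ }
       END { printf "signal=%d nonzero_exit=%d\n", sig, nonzero }'
}
```

A rising signal count (segfaults in particular) alongside 500s on PHP locations points at the backend rather than nginx.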

Intercepted upstream errors. If proxy_intercept_errors on or fastcgi_intercept_errors on is turning upstream errors into 500s, review the error_page directives that match the intercepted codes. An =500 override, or an error page location that itself fails (for example, by looping), makes an upstream 404 or 502 surface as 500; codes with no matching error_page are passed through unchanged. The tradeoff is between hiding upstream error responses from clients (intercept on) and passing the upstream's own status through to clients and the access log (intercept off). With intercept on, $upstream_status still records what the upstream actually returned.

Prevention

Log both $status and $upstream_status in your access log format. Without both, you cannot separate upstream-generated 500s from nginx-generated 500s during triage. Test every configuration change with nginx -t before applying it in production. Size worker_connections and worker_rlimit_nofile generously, and monitor the accepts - handled gap from stub_status to detect silent capacity issues before they generate errors. Keep backend health check endpoints independent of nginx so you can verify upstream state without relying on user traffic as probes.
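The accepts - handled gap can be computed from raw stub_status output without a monitoring agent. stub_status prints the three counters on the line after the "server accepts handled requests" header, and handled normally equals accepts unless a resource limit such as worker_connections was hit. The function name below is hypothetical, and the counters are cumulative since nginx started, so watch the gap's growth rather than its absolute value.

```shell
# Parse stub_status output on stdin; print accepts, handled, and their gap.
# %.0f rather than %d so very large counters print exactly in every awk.
stub_gap() {
  awk 'grab { printf "accepts=%.0f handled=%.0f gap=%.0f\n", $1, $2, $1 - $2; exit }
       /accepts handled requests/ { grab = 1 }'
}
```

Using the endpoint from the quick checks: curl -s http://127.0.0.1/nginx_status | stub_gap.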

How Netdata helps

  • Correlate 500 spikes with upstream response time, error log severity, and worker process state.
  • Track the accepts - handled gap and active connection utilization from stub_status.
  • Alert on 5xx rate thresholds and error log [emerg] / [crit] entries.
  • Monitor per-worker CPU and RSS alongside system memory and file descriptor limits to catch resource exhaustion before it causes 500s.
  • Compare $request_time against $upstream_response_time to isolate latency to nginx or upstream.