nginx 500 Internal Server Error: how to diagnose it

A 500 from nginx tells you something failed in the request path, but not whether the failure originated in your application, a FastCGI/uWSGI backend, or nginx itself. When 500s spike during an incident, first determine which side of the nginx boundary is breaking.

Unlike 502 Bad Gateway or 504 Gateway Time-out, which point upstream, a 500 can be an application bug passed through by nginx, a configuration error, a permission failure, or resource exhaustion inside an nginx worker.

If your log format includes $upstream_status, you can separate upstream-generated 500s from nginx-generated 500s. If you rely on the default access log alone, both paths look identical.

What this means

In production, a 500 means one of three things.

Upstream application error. The application returned 500 and nginx passed it through. The access log shows status 500 and $upstream_status 500. The nginx error log is often silent; the application logs contain the crash, exception, or assertion.

nginx internal error. nginx generated the 500 itself because of a runtime configuration failure, a permission denied error during request processing, a rewrite or internal redirection cycle, or a module failure such as a Lua VM error. $upstream_status is empty or - because the request never reached an upstream. The error log contains [error] or [crit] entries from nginx.

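Intercepted upstream error. With proxy_intercept_errors on or fastcgi_intercept_errors on, nginx intercepts upstream responses with status 300 or higher when an error_page directive covers that status, and serves the configured error page instead. If processing the error page itself fails (for example, the error_page target loops through internal redirects), or if error_page rewrites the response code to 500, the client receives 500 instead of the original upstream status. An upstream 404 or 502 can then appear as 500 in the access log.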

Determine which path produced the 500. The fix differs in each case.
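
The three paths can be told apart from the access log once the upstream status is logged. The following is a minimal sketch, assuming a log format that appends `us=$upstream_status` to the default combined format. The `us=` key and the function name `classify_500s` are illustrative choices, not nginx defaults.

```shell
# classify_500s LOGFILE
# Assumes combined format ($status is field 9) plus a trailing token such as
# us=500 or us=- that holds $upstream_status (hypothetical "us=" key).
# If nginx tried several upstreams, $upstream_status is a comma-separated
# list; this sketch reads only the first value.
classify_500s() {
  awk '
    $9 == 500 {
      us = "-"
      for (i = 10; i <= NF; i++)
        if ($i ~ /^us=/) us = substr($i, 4)
      if (us == "500")                upstream++    # application returned 500
      else if (us == "-" || us == "") internal++    # no upstream was contacted
      else                            rewritten++   # upstream sent a different status
    }
    END {
      printf "upstream=%d internal=%d rewritten=%d\n", upstream, internal, rewritten
    }
  ' "$1"
}
```

Calling `classify_500s /var/log/nginx/access.log` prints one line of counts. A high `rewritten` count points at intercept or error_page settings, and a high `internal` count points at the nginx error log.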

flowchart TD
    A[Client sees 500] --> B{upstream status is 500}
    B -->|Yes| C[Upstream application error]
    B -->|No| D{nginx error log has entries}
    D -->|Yes| E[nginx config or module error]
    D -->|No| F[Check intercept and error page config]
    C --> G[Check application logs]
    E --> H[Run nginx -t and review errors]
    F --> I[Review proxy intercept settings]

Common causes

Cause	What it looks like	First thing to check
Upstream application error	Access log shows 500 with `$upstream_status` 500; nginx error log is quiet or shows routine upstream messages	Upstream application logs and process state
nginx config or permission error	Access log shows 500 with empty `$upstream_status`; error log contains nginx `[error]` or `[crit]`	`nginx -t` output and error log for config or permission details
FastCGI/uWSGI/Lua module error	500s isolated to dynamic content endpoints; error log references backend protocol or Lua VM	Backend process logs (php-fpm, uWSGI) or Lua traces
Intercepted upstream error	500s appear after enabling `proxy_intercept_errors` or `fastcgi_intercept_errors`; original upstream status was non-2xx	`error_page` directives and intercept settings

Quick checks

# Check error log for critical entries in the last 1000 lines
tail -n 1000 /var/log/nginx/error.log | grep -cE '\[(emerg|alert|crit)\]'

# Count 5xx responses by status code in the last 10000 requests (default combined format: $status is field 9)
tail -n 10000 /var/log/nginx/access.log | \
  awk '{if ($9 >= 500 && $9 < 600) count[$9]++}
       END {for (s in count) print s": "count[s]}'

# Test nginx configuration validity
nginx -t

# Count nginx child processes (includes workers, cache loader, and cache manager)
pgrep -c -P $(cat /var/run/nginx.pid)

# Verify master process is alive
kill -0 $(cat /var/run/nginx.pid) 2>/dev/null && echo "master alive" || echo "master dead"

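# Check stub_status for active connections and the cumulative accepts/handled/requests counters
# (the /nginx_status path depends on where stub_status is enabled in your config)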
curl -s http://127.0.0.1/nginx_status

# Show the file descriptor limit for one child process (the limit only, not current usage)
prlimit -n -p $(pgrep -P $(cat /var/run/nginx.pid) | head -1) 2>/dev/null

These commands are read-only. nginx -t parses configuration without applying changes. The pgrep command counts all children of the master, not only workers. A sudden drop in this count, combined with 500s, suggests workers are crashing from module errors or OOM kills.
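
The prlimit check reports the limit but not how many descriptors are in use. A minimal Linux-only sketch that puts the two side by side by reading /proc is shown here; the function name `fd_usage` is illustrative.

```shell
# fd_usage PID
# Prints "open=<n> limit=<soft limit> pct=<n>" for one process, read from /proc.
# Reading another user's /proc/PID/fd usually requires root.
fd_usage() {
  pid=$1
  open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits" 2>/dev/null)
  case $limit in
    ''|*[!0-9]*) echo "open=$open limit=unknown"; return 1 ;;
  esac
  echo "open=$open limit=$limit pct=$((100 * open / limit))"
}
```

For example, `fd_usage "$(pgrep -P "$(cat /var/run/nginx.pid)" | head -1)"` inspects the first child of the master. This only reads process state.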

How to diagnose it

Quantify the scope. Determine whether 500s hit all endpoints or one location. Isolation points to a specific upstream pool, location rewrite rule, or backend application.
Check $upstream_status against $status. If your log format includes $upstream_status, compare it to the status sent to the client. When both are 500, the upstream application generated the error. When $upstream_status is - or empty, nginx generated the 500 internally. When they differ, proxy_intercept_errors or an error_page directive may be masking the original code.
Read the error log for the same time window. Look for [error], [crit], [alert], or [emerg] entries. If nginx is generating 500s, you will see permission denied, rewrite, or module errors here. If the upstream is failing, you may see upstream timed out or connect() failed.
Validate configuration. Run nginx -t. A passing test does not guarantee runtime correctness. A failing test means the files on disk cannot be loaded: a failed reload does not stop nginx, and the previous config stays active, so the running behavior may not match the configuration you are reading, and a recent misconfiguration can stay hidden until the next successful reload or restart.
Check upstream health directly. Probe upstream servers with curl or nc from the nginx host. If the upstream returns 500 independently, the problem is the application. If the upstream is healthy but nginx returns 500, the problem is in nginx or the transport path.
Inspect FastCGI/uWSGI/Lua backends. For PHP-FPM, check php-fpm.log for worker crashes or slow requests. For uWSGI, check the master log. For Lua, look for VM initialization failures or runtime errors in the nginx error log. These backends often return 500 through nginx without nginx being misconfigured.
Review resource limits. Check whether worker_connections or file descriptor limits are exhausted. While exhaustion more commonly causes silent connection drops, severe memory pressure inside a worker can cause allocation failures that manifest as 500s. Check dmesg for OOM killer activity targeting nginx workers.
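
The first step, quantifying scope, can be sketched as a per-path count of 500s. This assumes the default combined log format; the function name `top_500_paths` is illustrative.

```shell
# top_500_paths LOGFILE [N]
# Lists the N request paths (query string removed) with the most 500 responses.
# Assumes the default combined format: $7 is the request URI, $9 is $status.
top_500_paths() {
  awk '$9 == 500 { split($7, p, "?"); count[p[1]]++ }
       END { for (u in count) print count[u], u }' "$1" |
    sort -k1,1nr -k2,2 | head -n "${2:-10}"
}
```

If one or two paths dominate the output, the problem is likely isolated to a location block or a single backend. An even spread suggests something shared, such as nginx itself or a common upstream pool.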

Metrics and signals to monitor

Signal	Why it matters	Warning sign
HTTP 5xx response rate	Tracks server-side errors returned to clients	>1% sustained; >5% under real traffic warrants a page
Error log rate and severity	Captures nginx internal failures that generate 500s	Any `[emerg]`, `[alert]`, or `[crit]`; `[error]` rate >10/minute
`$upstream_status` vs `$status`	Distinguishes upstream errors from nginx internal errors	Discrepancy indicates intercept_errors masking or nginx internal generation
Worker process count	Crashing workers can drop requests or generate 500s	Count below configured `worker_processes` sustained for >30s
File descriptor usage	Exhaustion prevents proper request handling	>95% of limit with “too many open files” errors
Active connections	Connection pileup can precede resource exhaustion	Approaching `worker_connections × worker_processes`
Upstream response time	Rising upstream latency often precedes upstream-generated 500s	P95 trending up >20% from baseline
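
As a rough way to compare the 5xx response rate against the thresholds in the table without a metrics pipeline, the rate can be computed from the access log. This is a sketch that assumes the default combined format; the function name `error_rate_5xx` is illustrative.

```shell
# error_rate_5xx [LOGFILE]
# Prints the share of 5xx responses as a percentage with two decimals.
# Reads stdin when no file is given. Assumes $status is field 9.
error_rate_5xx() {
  awk '{ total++ } $9 >= 500 && $9 < 600 { errors++ }
       END { if (total) printf "%.2f\n", 100 * errors / total; else print "0.00" }' "$@"
}
```

For a recent window, pipe in the tail of the log: `tail -n 10000 /var/log/nginx/access.log | error_rate_5xx`.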

Fixes

Upstream application errors. Remove the failing backend from the upstream pool and reload nginx, or reduce its weight. If all backends are failing, serve stale cache with proxy_cache_use_stale if caching is configured, to buy time while the application is repaired.

nginx config or permission errors. Run nginx -t before reloading. Fix the offending directive, adjust permissions, or correct the rewrite rule. If a recent reload introduced the problem, remember that nginx does not stop serving on a failed reload; the old config stays active. Roll back and reload.
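
The test-then-reload habit can be wrapped in a small helper so a reload is never attempted with a config that fails validation. This is a sketch; `safe_reload` and the `NGINX_BIN` override are illustrative names, not nginx features.

```shell
# safe_reload
# Reloads nginx only when the configuration test passes.
# NGINX_BIN is an illustrative override for the binary path.
safe_reload() {
  nginx_bin=${NGINX_BIN:-nginx}
  if "$nginx_bin" -t; then
    "$nginx_bin" -s reload
  else
    echo "config test failed; nginx keeps running with the previous configuration" >&2
    return 1
  fi
}
```

Unlike the quick checks, this changes server state when the test passes, so run it only when you intend to apply the configuration on disk.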

FastCGI/uWSGI/Lua failures. Restart the backend process pool. For PHP-FPM, a graceful restart often clears worker state. For uWSGI, check the master log for worker deaths. For Lua, check that the Lua VM has sufficient memory and that worker_rlimit_nofile is adequate for the module’s needs. If the Lua script is corrupt or incompatible with the nginx version, disable the location block and reload.

Intercepted upstream errors. If proxy_intercept_errors on or fastcgi_intercept_errors on is causing nginx to turn upstream errors into 500s, verify that each error_page directive covers the expected status codes and that its target URI or named location resolves without errors. A broken error_page target, for example one that loops through internal redirects, can produce a 500 in place of the original 404 or 502 from upstream. The tradeoff is between hiding upstream errors from clients (intercept on) and seeing the true status in access logs (intercept off).

Prevention

Log both $status and $upstream_status in your access log format. Without both, you cannot separate upstream-generated 500s from nginx-generated 500s during triage. Test every configuration change with nginx -t before applying it in production. Size worker_connections and worker_rlimit_nofile generously, and monitor the accepts - handled gap from stub_status to detect silent capacity issues before they generate errors. Keep backend health check endpoints independent of nginx so you can verify upstream state without relying on user traffic as probes.
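
The accepts - handled gap can be read directly from stub_status output. A minimal sketch, with the illustrative function name `stub_gap`:

```shell
# stub_gap
# Reads stub_status output on stdin and prints accepts - handled.
# The counters sit on the line after "server accepts handled requests".
stub_gap() {
  awk '/^server accepts handled requests/ { getline; print $1 - $2 }'
}
```

For example, `curl -s http://127.0.0.1/nginx_status | stub_gap`, where the URL depends on where stub_status is enabled. Both counters are cumulative since nginx started, so watch whether the gap grows between samples rather than reading a single value.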

How Netdata helps

Correlate 500 spikes with upstream response time, error log severity, and worker process state.
Track the accepts - handled gap and active connection utilization from stub_status.
Alert on 5xx rate thresholds and error log [emerg] / [crit] entries.
Monitor per-worker CPU and RSS alongside system memory and file descriptor limits to catch resource exhaustion before it causes 500s.
Compare $request_time against $upstream_response_time to isolate latency to nginx or upstream.

