nginx 500 Internal Server Error: how to diagnose it
A 500 from nginx tells you something failed in the request path, but not whether the failure originated in your application, FastCGI/uWSGI backend, or nginx itself. When 500s spike during an incident, first determine which side of the nginx boundary is breaking.
Unlike 502 Bad Gateway or 504 Gateway Time-out, which point upstream, a 500 can be an application bug passed through by nginx, a configuration error, a permission failure, or resource exhaustion inside an nginx worker.
If your log format includes $upstream_status, you can separate upstream-generated 500s from nginx-generated 500s. If you rely on the default access log alone, both paths look identical.
What this means
In production, a 500 means one of three things.
Upstream application error. The application returned 500 and nginx passed it through. The access log shows status 500 and $upstream_status 500. The nginx error log is often silent; the application logs contain the crash, exception, or assertion.
nginx internal error. Nginx generated the 500 because of a runtime configuration failure, permission denied during request processing, rewrite loop detection, or a module failure like a Lua VM error. $upstream_status is empty or - because the request never reached an upstream. The error log contains [error] or [crit] entries from nginx.
Intercepted upstream error. When proxy_intercept_errors on or fastcgi_intercept_errors on is set, nginx intercepts upstream responses with status 300 or higher that have a matching error_page directive and serves the configured error page instead. Responses with no matching error_page pass through unchanged. If handling the error page itself fails, for example through an internal redirection cycle, nginx can return 500 instead of the original upstream status, so an upstream 404 or 502 can appear as 500 in the access log.
Determine which path produced the 500. The fix differs in each case.
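The three cases can be told apart mechanically once both status values are logged. The following is a minimal sketch, not a definitive parser: it assumes the combined log format with one extra trailing field written as upstream_status="$upstream_status" (a hypothetical field name; adapt the pattern to your own log_format), and the function name classify_500s is illustrative.

```shell
# classify_500s: read access-log lines on stdin and bucket each 500 by origin.
# ASSUMPTION: combined format plus a trailing upstream_status="..." field
# (hypothetical name, not an nginx default).
classify_500s() {
  awk '
    $9 == 500 {
      us = "-"
      if (match($0, /upstream_status="[^"]*"/)) {
        # Strip the key and quotes: upstream_status="500" -> 500
        us = substr($0, RSTART + 17, RLENGTH - 18)
      }
      # With retries nginx logs several codes ("502, 500"); keep the last one.
      n = split(us, parts, "[,:] *")
      last = parts[n]
      if (last == "500")                  origin = "upstream"
      else if (last == "-" || last == "") origin = "nginx"
      else                                origin = "masked"
      count[origin]++
    }
    END {
      printf "upstream=%d nginx=%d masked=%d\n",
             count["upstream"], count["nginx"], count["masked"]
    }'
}
```

A possible invocation is `tail -n 10000 /var/log/nginx/access.log | classify_500s`. A non-zero masked count suggests an intercepted upstream error; a non-zero nginx count points at the error log.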
flowchart TD
A[Client sees 500] --> B{upstream status is 500}
B -->|Yes| C[Upstream application error]
B -->|No| D{nginx error log has entries}
D -->|Yes| E[nginx config or module error]
D -->|No| F[Check intercept and error page config]
C --> G[Check application logs]
E --> H[Run nginx -t and review errors]
F --> I[Review proxy intercept settings]
Common causes
| Cause | What it looks like | First thing to check |
|---|---|---|
| Upstream application error | Access log shows 500 with $upstream_status 500; nginx error log is quiet or shows routine upstream messages | Upstream application logs and process state |
| nginx config or permission error | Access log shows 500 with empty $upstream_status; error log contains nginx [error] or [crit] | nginx -t output and error log for config or permission details |
| FastCGI/uWSGI/Lua module error | 500s isolated to dynamic content endpoints; error log references backend protocol or Lua VM | Backend process logs (php-fpm, uWSGI) or Lua traces |
| Intercepted upstream error | 500s appear after enabling proxy_intercept_errors or fastcgi_intercept_errors; original upstream status was non-2xx | error_page directives and intercept settings |
Quick checks
# Check error log for critical entries in the last 1000 lines
tail -1000 /var/log/nginx/error.log | grep -cE '\[(emerg|alert|crit)\]'
# Check 500 rate in access log (default combined format: $status is field 9)
tail -n 10000 /var/log/nginx/access.log | \
awk '{if ($9 >= 500 && $9 < 600) count[$9]++}
END {for (s in count) print s": "count[s]}'
# Test nginx configuration validity
nginx -t
# Count nginx child processes (includes workers, cache loader, and cache manager)
pgrep -c -P $(cat /var/run/nginx.pid)
# Verify master process is alive
kill -0 $(cat /var/run/nginx.pid) 2>/dev/null && echo "master alive" || echo "master dead"
# Check stub_status for active connections and request counters (assumes stub_status is exposed at /nginx_status)
curl -s http://127.0.0.1/nginx_status
# Show the open file descriptor limit for one child process (count entries in /proc/<pid>/fd for current usage)
prlimit -n -p $(pgrep -P $(cat /var/run/nginx.pid) | head -1) 2>/dev/null
These commands are read-only. nginx -t parses configuration without applying changes. The pgrep command counts all children of the master, not only workers. A sudden drop in this count, combined with 500s, suggests workers are crashing from module errors or OOM kills.
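To turn the descriptor check into a single number, usage can be divided by the soft limit. This is a rough sketch that assumes a Linux procfs layout; the function name fd_usage_pct and the PROC_ROOT override are hypothetical helpers introduced here, not part of nginx or any standard tool.

```shell
# fd_usage_pct PID: print open file descriptors as a percentage of the soft limit.
# ASSUMPTION: Linux procfs layout. PROC_ROOT is a hypothetical override that
# points the function at a different procfs mount; it defaults to /proc.
fd_usage_pct() {
  proc=${PROC_ROOT:-/proc}
  # Each entry under fd/ is one open descriptor (needs root or the worker user).
  used=$(ls "$proc/$1/fd" 2>/dev/null | wc -l)
  # Row format in limits: "Max open files  <soft>  <hard>  files"
  limit=$(awk '/^Max open files/ { print $4 }' "$proc/$1/limits" 2>/dev/null)
  case $limit in
    ''|0|*[!0-9]*) echo "unknown"; return 1 ;;
  esac
  echo $(( used * 100 / limit ))
}
```

For example, `fd_usage_pct "$(pgrep -P "$(cat /var/run/nginx.pid)" | head -1)"` reports the figure for the first child of the master. It is still read-only, like the commands above.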
How to diagnose it
Quantify the scope. Determine whether 500s hit all endpoints or one location. Isolation points to a specific upstream pool, location rewrite rule, or backend application.
Check $upstream_status against $status. If your log format includes $upstream_status, compare it to the status sent to the client. When both are 500, the upstream application generated the error. When $upstream_status is - or empty, nginx generated the 500 internally. When they differ, proxy_intercept_errors or an error_page directive may be masking the original code.
Read the error log for the same time window. Look for [error], [crit], [alert], or [emerg] entries. If nginx is generating 500s, you will see permission denied, rewrite, or module errors here. If the upstream is failing, you may see upstream timed out or connect() failed.
Validate configuration. Run nginx -t. A passing test does not guarantee runtime correctness, but a failing test immediately explains 500s from configuration that never loaded properly. A failed reload does not stop nginx; the previous config stays active, which can mask recent misconfigurations.
Check upstream health directly. Probe upstream servers with curl or nc from the nginx host. If the upstream returns 500 independently, the problem is the application. If the upstream is healthy but nginx returns 500, the problem is in nginx or the transport path.
Inspect FastCGI/uWSGI/Lua backends. For PHP-FPM, check php-fpm.log for worker crashes or slow requests. For uWSGI, check the master log. For Lua, look for VM initialization failures or runtime errors in the nginx error log. These backends often return 500 through nginx without nginx being misconfigured.
Review resource limits. Check whether worker_connections or file descriptor limits are exhausted. While exhaustion more commonly causes silent connection drops, severe memory pressure inside a worker can cause allocation failures that manifest as 500s. Check dmesg for OOM killer activity targeting nginx workers.
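For the first step, scoping the problem, a per-path tally often shows whether the 500s are isolated. This is a small sketch assuming the default combined log format, where field 7 is the request URI and field 9 is the status; the function name top_500_paths is illustrative.

```shell
# top_500_paths [N]: print the N request paths with the most 500s (default 10).
# ASSUMPTION: combined log format, so $9 is the status and $7 the request URI.
top_500_paths() {
  awk '$9 == 500 {
         path = $7
         sub(/\?.*/, "", path)   # drop the query string so variants group together
         hits[path]++
       }
       END { for (p in hits) print hits[p], p }' | sort -rn | head -n "${1:-10}"
}
```

Running `tail -n 10000 /var/log/nginx/access.log | top_500_paths 5` lists the five worst paths. If one path dominates, start with the location block and upstream pool that serve it.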
Metrics and signals to monitor
| Signal | Why it matters | Warning sign |
|---|---|---|
| HTTP 5xx response rate | Tracks server-side errors returned to clients | >1% sustained 500s; >5% under meaningful traffic warrants paging on-call |
| Error log rate and severity | Captures nginx internal failures that generate 500s | Any [emerg], [alert], or [crit]; [error] rate >10/minute |
| $upstream_status vs $status | Distinguishes upstream errors from nginx internal errors | Discrepancy indicates intercept_errors masking or nginx internal generation |
| Worker process count | Crashing workers can drop requests or generate 500s | Count below configured worker_processes sustained for >30s |
| File descriptor usage | Exhaustion prevents proper request handling | >95% of limit with “too many open files” errors |
| Active connections | Connection pileup can precede resource exhaustion | Approaching worker_connections × worker_processes |
| Upstream response time | Rising upstream latency often precedes upstream-generated 500s | P95 trending up >20% from baseline |
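The 5xx rate row can be approximated from the access log when no metrics pipeline is at hand. The sketch below assumes the combined log format and reuses the 1% and 5% thresholds from the table; the function name five_xx_rate and the ok/warn/page labels are illustrative choices, and a single log sample says nothing about whether the rate is sustained.

```shell
# five_xx_rate: read access-log lines on stdin, print the 5xx share and a level.
# ASSUMPTION: combined log format ($9 is the status). Thresholds follow the
# table above: more than 1% is a warning, more than 5% is page-worthy.
five_xx_rate() {
  awk '{ total++ }
       $9 >= 500 && $9 < 600 { errors++ }
       END {
         if (total == 0) { print "no requests"; exit }
         rate = 100 * errors / total
         level = (rate > 5) ? "page" : ((rate > 1) ? "warn" : "ok")
         printf "%.2f%% %s\n", rate, level
       }'
}
```

For instance, `tail -n 10000 /var/log/nginx/access.log | five_xx_rate` prints one line with the percentage and the level for that sample.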
Fixes
Upstream application errors. Remove the failing backend from the upstream pool and reload nginx, or reduce its weight. If all backends are failing, serve stale cache with proxy_cache_use_stale if caching is configured, to buy time while the application is repaired.
nginx config or permission errors. Run nginx -t before reloading. Fix the offending directive, adjust permissions, or correct the rewrite rule. If a recent reload introduced the problem, remember that nginx does not stop serving on a failed reload; the old config stays active. Roll back and reload.
FastCGI/uWSGI/Lua failures. Restart the backend process pool. For PHP-FPM, a graceful restart often clears worker state. For uWSGI, check the master log for worker deaths. For Lua, check that the Lua VM has sufficient memory and that worker_rlimit_nofile is adequate for the module’s needs. If the Lua script is corrupt or incompatible with the nginx version, disable the location block and reload.
Intercepted upstream errors. If 500s began after enabling proxy_intercept_errors on or fastcgi_intercept_errors on, verify that an error_page directive exists for each status code you expect to intercept and that its target can be served without raising another error. Upstream responses with no matching error_page pass through unchanged, but a target that fails during the internal redirect, for example by looping, can turn an intercepted 404 or 502 into a 500. The tradeoff is between hiding upstream errors from clients (intercept on) and seeing the true status in access logs (intercept off).
Prevention
Log both $status and $upstream_status in your access log format. Without both, you cannot separate upstream-generated 500s from nginx-generated 500s during triage. Test every configuration change with nginx -t before applying it in production. Size worker_connections and worker_rlimit_nofile generously, and monitor the accepts - handled gap from stub_status to detect silent capacity issues before they generate errors. Keep backend health check endpoints independent of nginx so you can verify upstream state without relying on user traffic as probes.
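The accepts - handled gap can be read straight from the stub_status text output. This is a minimal sketch assuming the standard four-line stub_status format; the function name stub_status_gap is illustrative, and the counters are cumulative since nginx started, so watching how the gap changes over time is more useful than a single reading.

```shell
# stub_status_gap: read stub_status output on stdin, print active connections
# and the accepts - handled gap (connections accepted but never handled).
# ASSUMPTION: the standard stub_status text format.
stub_status_gap() {
  awk '/^Active connections:/ { active = $3 }
       # The counters line is the one made up of exactly three integers.
       NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
         accepts = $1; handled = $2
       }
       END { printf "active=%d dropped=%d\n", active, accepts - handled }'
}
```

With stub_status exposed at the path used in the quick checks, `curl -s http://127.0.0.1/nginx_status | stub_status_gap` prints both values. A non-zero and growing dropped value suggests nginx is hitting a connection or descriptor limit.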
How Netdata helps
- Correlate 500 spikes with upstream response time, error log severity, and worker process state.
- Track the accepts - handled gap and active connection utilization from stub_status.
- Alert on 5xx rate thresholds and error log [emerg]/[crit] entries.
- Monitor per-worker CPU and RSS alongside system memory and file descriptor limits to catch resource exhaustion before it causes 500s.
- Compare $request_time against $upstream_response_time to isolate latency to nginx or upstream.
Related guides
- How NGINX actually works in production: a mental model for operators
- nginx 502 Bad Gateway: causes and how to fix it
- nginx 503 Service Temporarily Unavailable: causes and fixes
- nginx 504 Gateway Time-out: causes and fixes
- NGINX active connections climbing: reading, writing, waiting explained
- NGINX backend cascade failure: when slow upstreams take down everything
- nginx connect() failed (111: Connection refused) while connecting to upstream
- NGINX connection exhaustion: detection, diagnosis, and prevention
- NGINX dropped connections: the accepts vs handled gap
- NGINX monitoring checklist: the signals every production server needs
- NGINX monitoring maturity model: from survival to expert
- nginx no live upstreams while connecting to upstream: what it means







