NGINX SSL session cache: improving TLS resumption and cutting CPU
TLS handshakes are the most CPU-intensive routine operation in nginx. A worker terminating RSA-2048 TLS can handle only a few hundred full handshakes per second, compared to tens of thousands of plain HTTP requests. In production, every reconnecting client that repeats a full handshake wastes CPU and adds latency. The ssl_session_cache directive exists to eliminate that waste by allowing session resumption across connections. Yet many configurations either omit it, under-size it, or misunderstand how TLS 1.3 changes resumption behavior. This article explains the mechanism, sizing, and the signals that tell you whether your cache is working.
What it is and why it matters
Without session resumption, every new TLS connection performs a full handshake. That means asymmetric cryptography, certificate chain verification, and key exchange. For high-churn workloads, such as APIs with short-lived connections, mobile clients, or CDN origin pulls, this cost accumulates fast. The result is SSL termination overload: workers peg CPU, the event loop slows, and latency rises for all requests.
Session caching mitigates this by storing negotiated session parameters so that compatible clients can resume with an abbreviated handshake on their next visit. nginx supports two mechanisms: a shared-memory session ID cache (shared:NAME:SIZE) and session tickets (ssl_session_tickets). The shared cache is visible to all workers, while tickets are self-contained encrypted blobs issued to clients. Both reduce CPU, but they behave differently under TLS 1.2 and TLS 1.3. In nginx 1.23.2 and later, the ticket encryption keys are also kept in the shared zone, so the two mechanisms depend on the same memory.
How it works
flowchart TD
Client([Client]) -->|New connection| Check{Session ID or ticket presented?}
Check -->|No| Full[Full TLS handshake]
Check -->|Yes| Lookup{Cache lookup or ticket validation}
Lookup -->|Valid| Resume[Resumed handshake]
Lookup -->|Invalid| Full
Full -->|Update shared cache| Cache[(ssl_session_cache shared:SSL)]
Resume -->|Low CPU| Worker[nginx worker]
Full -->|High CPU| Worker
The directive syntax includes off, none, and the two storage variants, builtin and shared. The production pattern is shared:SSL:SIZE alone. The per-worker builtin OpenSSL cache fragments memory across processes and prevents workers from resuming sessions negotiated by their peers, so avoid it in multi-worker deployments.
A shared cache of 1 MB holds approximately 4000 sessions, at roughly 262 bytes per entry. The default ssl_session_timeout is five minutes. To size the zone, use:
required_MB = new_tls_sessions_per_second * ssl_session_timeout_seconds / 4000
For example, 1000 new sessions per second with a ten-minute timeout needs roughly 150 MB. Undersizing causes eviction, which manifests as a dropping hit rate and rising CPU.
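The worked example above maps onto configuration roughly as follows. This is a minimal sketch: the zone name SSL and the placement in the http block are illustrative choices, and 150m is the raw result of the formula with no headroom added.

```nginx
http {
    # 1000 new sessions/s * 600 s / 4000 sessions per MB = 150 MB.
    # The zone name "SSL" is arbitrary; use the same name wherever the
    # zone is referenced.
    ssl_session_cache   shared:SSL:150m;

    # Matches the ten-minute reuse window assumed in the calculation.
    ssl_session_timeout 10m;
}
```

In practice you would round the size up to leave headroom for traffic peaks.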
To measure effectiveness, add $ssl_session_reused to your log format. It returns "r" for resumed and "." for full handshakes. Calculate hit rate by dividing resumed sessions by total TLS connections over a stable window. Because a cold cache produces a temporary floor, measure at least one full ssl_session_timeout after startup or reload to get a baseline. A healthy production deployment should see a hit rate above 90%. Below 80% is worth investigating; below 50% means the cache is either cold, too small, or unreachable.
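A log format along these lines would expose the resumption flag. Treat it as a sketch: the format name tls, the log path, and the certificate paths are assumptions chosen for illustration, and the extra connection variables are one possible way to avoid counting the same connection more than once.

```nginx
http {
    # Hypothetical format name "tls".
    # $ssl_session_reused logs "r" for a resumed session and "." otherwise.
    # $connection and $connection_requests let you keep only the first
    # request of each connection when computing the hit rate.
    log_format tls '$remote_addr [$time_local] "$request" $status '
                   '$connection $connection_requests '
                   '$ssl_protocol $ssl_session_reused';

    server {
        listen 443 ssl;

        # Placeholder certificate paths; substitute your own.
        ssl_certificate     /etc/nginx/tls/example.crt;
        ssl_certificate_key /etc/nginx/tls/example.key;

        access_log /var/log/nginx/tls_access.log tls;
    }
}
```

Because the access log is written per request, keepalive connections repeat the same flag on every line. Filtering on lines where $connection_requests is 1 gives a per-connection count.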
Session tickets enable stateless resumption. The server encrypts session state into a ticket sent to the client, which presents it on reconnect. With ssl_session_tickets on, nginx stores ticket keys in the shared zone so all workers can issue and validate them consistently. If you disable tickets, TLS 1.2 clients must rely on the shared ID cache, which increases memory pressure. In TLS 1.3, disabling tickets disables resumption entirely, because TLS 1.3 uses tickets exclusively.
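A configuration that keeps both resumption paths available might look like the sketch below. The zone size and certificate paths are placeholders, not recommendations.

```nginx
server {
    listen 443 ssl;

    # Placeholder certificate paths; substitute your own.
    ssl_certificate     /etc/nginx/tls/example.crt;
    ssl_certificate_key /etc/nginx/tls/example.key;

    ssl_protocols       TLSv1.2 TLSv1.3;

    # Shared zone for session IDs, visible to every worker.
    ssl_session_cache   shared:SSL:10m;

    # "on" is the default; it is spelled out here only to make the
    # intent explicit.
    ssl_session_tickets on;
}
```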
Where it shows up in production
Cold starts. After an nginx restart, the shared cache is empty. Every reconnecting client performs a full handshake until the cache warms. This produces a predictable CPU spike that decays over the first few thousand connections. If you see sustained high CPU after the cache should be warm, the zone is likely undersized or sessions are expiring too quickly.
Reload vs. restart. The shared cache survives nginx -s reload because the master process retains the zone. It is lost only on full process restart. If CPU spikes after a restart but not after a reload, the cache is behaving as expected.
TLS 1.3-only edge termination. If your nginx handles only TLS 1.3, sizing the cache for thousands of session IDs is largely wasted. The zone still needs enough space for ticket keys and metadata, but the dominant CPU win comes from ticket-based resumption, not from session ID lookup.
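Vhost consolidation. Multiple server blocks can reference the same shared cache name (shared:SSL:10m). Subsequent declarations reuse the existing zone rather than creating a new one, so every vhost that names the zone draws on the same memory. The declared size has to agree everywhere: nginx rejects a configuration that declares one zone name with two different sizes, so a mismatch shows up at nginx -t rather than at runtime. Audit your configuration with nginx -T | grep ssl_session_cache to find every declaration and confirm that the single zone is sized for the combined load of all vhosts that share it.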
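One way to avoid divergent declarations is to declare the zone once at the http level and let every server block inherit it. A sketch, with hypothetical server names, certificate paths, and zone size:

```nginx
http {
    # Declared once; inherited by every server block that does not
    # override it.
    ssl_session_cache   shared:SSL:50m;
    ssl_session_timeout 10m;

    server {
        listen 443 ssl;
        server_name a.example.com;                     # hypothetical
        ssl_certificate     /etc/nginx/tls/a.crt;      # placeholder
        ssl_certificate_key /etc/nginx/tls/a.key;      # placeholder
    }

    server {
        listen 443 ssl;
        server_name b.example.com;                     # hypothetical
        ssl_certificate     /etc/nginx/tls/b.crt;      # placeholder
        ssl_certificate_key /etc/nginx/tls/b.key;      # placeholder
    }
}
```

With a single declaration, the grep audit should return exactly one line.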
Tradeoffs and when to use it
Shared cache sizing. A larger cache retains more sessions and improves hit rate, but consumes resident memory that is no longer available for connection buffers or the operating system page cache. Size for your peak new-session rate multiplied by your desired reuse window, then add headroom.
Ticket key synchronization. Without a shared cache, each worker generates independent ticket keys. A client that receives a ticket from one worker may fail resumption when reconnecting to another. This manifests as an unexpectedly low hit rate on multi-worker deployments even when tickets are enabled. The shared zone is required to synchronize keys across workers.
Session tickets vs. session IDs. Disabling session tickets forces TLS 1.2 clients to rely on the shared ID cache. This increases memory pressure and reduces resumption rates for mobile clients that rotate network interfaces. In TLS 1.3, disabling tickets disables resumption entirely. TLS 1.3 early data (0-RTT) is possible but introduces replay risk and is not enabled by default. If you disable tickets, ensure the shared cache is large enough to absorb the entire ID-based workload.
Timeout selection. The default ssl_session_timeout of five minutes is conservative. Production deployments often use 10m, 1h, or 1d depending on client behavior. Longer timeouts reduce CPU but increase the memory footprint and widen the window for session replay. Adjust to match your security model and connection churn rate.
Expired session retention. nginx does not actively purge expired sessions from the shared cache. They remain until evicted by LRU churn. For deployments that rely solely on ID-based resumption, stale session data persists beyond the timeout. This wastes zone capacity but does not affect performance beyond reducing available space for valid sessions.
Avoid mixing cache types. Combining a per-worker cache with a shared zone is less efficient than using the shared zone alone.
Signals to watch in production
| Signal | Why it matters | Warning sign |
|---|---|---|
| SSL session cache hit rate ($ssl_session_reused) | Directly measures how many connections avoid a full handshake | Sustained below 80%; sudden drop after a config change |
| Worker CPU utilization per process | High CPU with high connection rate indicates handshake saturation | Per-worker CPU above 80% sustained with elevated new connection rate |
| New connection rate vs. request rate | Handshake storms show high connection rate with stagnant request throughput | Connection rate spikes but requests per second do not |
| accepts - handled gap | Dropped connections under load can follow CPU saturation if workers cannot keep up | Gap increasing for more than 60 seconds |
| TLS version distribution ($ssl_protocol) | TLS 1.3-only deployments benefit less from ID cache sizing; signal guides capacity decisions | Sudden shift to TLS 1.2 may increase cache pressure unexpectedly |
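The accepts and handled counters in the table come from the stub_status module. If it is not already exposed, a minimal sketch could look like this. The listen address, port, and location path are assumptions, and the module must be compiled into your nginx build.

```nginx
server {
    # Bind to loopback only so the endpoint is not publicly reachable.
    listen 127.0.0.1:8080;

    # Hypothetical path; pick whatever your collector expects.
    location = /nginx_status {
        stub_status;        # reports accepts, handled, and requests
        allow 127.0.0.1;
        deny  all;
    }
}
```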
How Netdata helps
- Correlate per-worker CPU utilization with SSL session cache hit rate to distinguish handshake saturation from application-level CPU load.
- Track the accepts - handled gap alongside connection rate to detect admission loss that precedes visible errors.
- Monitor TLS version distribution shifts that change the effective value of your session cache sizing.
- Alert on dropping SSL session cache hit rate before CPU saturation becomes visible.
- Compare hit rate against time since process start to separate cold-start behavior from chronic undersizing.







