How MongoDB actually works in production: a mental model for operators

A write latency spike in MongoDB is rarely a query problem. It is more often a storage journal stall, cache eviction cascade, ticket exhaustion, or replication flow-control throttle. Operators who treat MongoDB as a monolithic document database often tune indexes while the real bottleneck is a WiredTiger checkpoint falling behind.

This guide is a runtime mental model for the subsystems that matter in production: the WiredTiger storage engine, replication state machine, thread-per-connection execution model, and sharding coordination layer. Understanding how these compete for memory, disk, tickets, and connections lets you read signals correctly instead of guessing.

What it is and why it matters

MongoDB is not a single process that uniformly slows down under load. It is a set of distinct subsystems that share finite resources. WiredTiger manages an uncompressed in-memory cache separate from the OS page cache. The replication subsystem maintains a capped oplog that determines how long a secondary can survive an outage before it needs a full resync. The connection layer allocates roughly one megabyte of stack memory per client thread. In sharded deployments, the config server replica set holds the cluster metadata that mongos routers cache.

Each subsystem has its own saturation point and failure mode. A secondary that falls off the oplog is a capacity problem, not a replication bug. A primary that stalls writes during a long checkpoint is a storage flush problem, not a query planner regression. An operator without this model reaches for db.currentOp(), kills a slow query, and wonders why latency does not improve. The answer is usually that the bottleneck is in the storage engine, not the query layer.

How it works

WiredTiger storage engine

WiredTiger is the default and only production storage engine in modern MongoDB. It maintains an in-memory cache sized by default to the larger of 50 percent of RAM minus 1 GB, or 256 MB. This cache holds documents and indexes in an uncompressed B-tree format, while on-disk data is compressed using snappy by default, or zstd for time-series collections in MongoDB 7.0 and later. Because the cache is uncompressed, its footprint can substantially exceed the compressed on-disk size.

The engine uses multiversion concurrency control (MVCC). Every write creates a new version of a document in cache. Old versions are retained until no transaction or cursor needs them, which means long-running operations can silently inflate cache pressure by pinning old snapshots.

Checkpoints flush dirty pages to disk every 60 seconds by default. Between checkpoints, WiredTiger relies on a write-ahead journal synced at configurable intervals, with a default of 100 ms. Journal writes are small sequential I/O, and their latency is one of the purest measures of storage health.

Admission to the storage engine is controlled by read and write tickets. In MongoDB 6.0 and earlier, the default is 128 read tickets and 128 write tickets. MongoDB 7.0 introduced a dynamic adjustment algorithm that starts low and scales up, capped at 128. MongoDB 8.0 renamed the counters to queues.execution and added a separate queues.ingress metric. Every operation that touches storage must acquire a ticket. When tickets are exhausted, new operations queue at the engine layer.

Eviction threads work in the background to keep cache usage below thresholds. Background eviction begins when cache fill reaches roughly 80 percent (the eviction target). Aggressive eviction starts at 95 percent (the eviction trigger). Dirty page eviction targets 5 percent and triggers aggressively at 20 percent of cache. If background eviction cannot keep up, application threads are forced to evict pages themselves. That transition is when user-visible latency spikes appear.

Replication

All write operations are recorded sequentially in the oplog, a capped collection named local.oplog.rs on the primary. Secondaries tail this oplog and apply entries to maintain consistency. Secondaries apply oplog entries in batches. While operations to different collections or unrelated documents can be parallelized, entries with dependencies are applied serially to maintain order. A write-heavy workload with contended document keys therefore limits secondary apply parallelism and can inflate lag even when the secondary has ample CPU and disk.

The oplog window is the time span between its oldest and newest entries. If a secondary falls behind by more than this window, it cannot catch up and must perform a full initial sync, which can take hours or days.

Replica set members exchange heartbeats every 2 seconds. If a member does not respond within electionTimeoutMillis (10 seconds by default), the remaining members trigger an election. Elections typically complete in 2 to 12 seconds, during which writes are unavailable. A new primary can optionally wait for secondaries to catch up recent writes before fully stepping up, governed by catchUpTimeoutMillis.

Starting in MongoDB 4.2, flow control actively throttles the primary’s write rate when secondaries fall behind, preventing oplog window collapse by slowing the source rather than letting the consumer fail.

Connection handling

MongoDB uses a thread-per-connection model. Each client connection is assigned a dedicated thread with roughly a 1 MB stack. Ten thousand connections therefore consume roughly 10 GB of RAM in thread stacks alone, before any data is touched. The default maxIncomingConnections is 65,536, but practical limits are usually set by file descriptors, memory, or ticket contention long before that theoretical cap is reached. MongoDB does support an adaptive service executor, but the thread-per-connection model remains the most common deployment configuration.

Sharding

In a sharded cluster, mongos acts as a stateless query router. It reads chunk-to-shard mappings from the config server replica set (CSRS) and routes operations accordingly. The balancer runs on the primary of the config server replica set and moves chunks between shards to maintain even distribution. Chunk migrations are expensive two-phase operations that copy documents, delete from the source, and hold range locks during the critical section.

The config server replica set stores the authoritative mapping of chunks to shards. Each mongos caches this metadata. If the config servers lose their majority, the cluster halts splits and migrations, though existing routing continues from cached metadata until the cache expires or the mongos restarts. Since MongoDB 8.0, direct connections to shard nodes for maintenance are restricted; operators must route through mongos or use a whitelisted maintenance role.

flowchart TD
    Client[Client] -->|request| Mongos[mongos]
    Mongos -->|route| mongod[mongod]
    mongod -->|~1MB stack| Thread[Thread-per-connection]
    Thread -->|acquire| Ticket[WiredTiger Ticket]
    Ticket -->|admit| WT[WiredTiger Cache MVCC]
    WT -->|flush| Disk[Checkpoint 60s
Journal 100ms] WT -->|evict| Evict[Eviction at 80% / 95%] mongod -->|generate| Oplog[local.oplog.rs] Oplog -->|tail| Repl[Secondary
heartbeat 2s] Mongos -->|read map| Config[Config Server] Config -->|schedule| Balancer[Balancer]

Where it shows up in production

The mental model changes based on deployment topology. A standalone node has no replication signals and should never be used in production. A replica set adds oplog, election, and lag monitoring. A sharded cluster multiplies all replica set signals by the number of shards and adds mongos, config server, balancer, and chunk migration signals.

WiredTiger versus the in-memory engine is a completely different failure domain. In-memory has no journal, no checkpoints, and no disk I/O signals; its failure mode is out-of-memory kill rather than I/O stall.

Version differences alter behavior materially. MongoDB 7.0 introduced dynamic ticket scaling, so post-upgrade ticket utilization looks lower by design. MongoDB 8.0 replaced wiredTiger.concurrentTransactions with queues.execution, changed null versus undefined equality matching, and blocked most direct shard connections. MongoDB 6.1 and later always enable journaling; the storage.journal.enabled option is removed.

Multi-document transactions, introduced in MongoDB 4.0, add another dimension. They pin WiredTiger snapshots, hold locks, consume oplog space, and can block replication until committed. A single long-running transaction is often the hidden root cause of cache pressure that appears to come from nowhere.

Production failures usually follow a small set of interaction patterns. A cache pressure cascade begins when dirty data accumulates faster than the 60-second checkpoint interval and background eviction can absorb. An oplog window collapse occurs when primary write volume outpaces secondary apply rate, shrinking the safety margin until a secondary must perform a full initial sync. A connection storm spiral starts with a network blip or election, triggers mass reconnection from connection pools, and creates a positive feedback loop of thread creation, memory pressure, and ticket contention. A checkpoint stall happens when checkpoint duration approaches or exceeds the interval, allowing dirty data to accumulate faster than it can be flushed. Application threads eventually block on eviction, freezing writes. Silent index regression, election storms, and sharding hot shards each map to a specific subsystem misunderstanding: query planner behavior, heartbeat timeout sensitivity, or shard key cardinality.

Tradeoffs and common misuses

WiredTiger cache oversizing. The default cache formula is a safe starting point because data in cache is uncompressed. Raising the cache beyond the default can starve the OS page cache and leave insufficient memory for connection threads and internal overhead. In containerized environments, mongod may detect host RAM instead of cgroup limits. Always set --wiredTigerCacheSizeGB explicitly when running in containers.

Monitoring cache fill but ignoring dirty ratio. A cache at 80 percent fill with 2 percent dirty is healthy. A cache at 70 percent fill with 18 percent dirty is approaching a checkpoint stall. Dirty ratio is the stronger leading indicator.

Treating connection count as unlimited. Thread-per-connection means memory and scheduling overhead scale linearly with connections. High connection counts also inflate ticket contention because more threads compete for the same read and write tickets.

Assuming w:1 is safe. Write concern w:1 means the primary accepted the write in memory. It does not guarantee the write is journaled or replicated. A crash or failover can silently lose those writes.

Routing maintenance commands directly to shards. Starting in MongoDB 8.0, direct connections to shard nodes are restricted. Scripts that bypass mongos will break. Route through mongos or grant the maintenance-only direct-shard-operations role.

Running without trending the oplog window. Teams size the oplog at deployment and forget. Organic write growth shrinks the window month by month until a maintenance event forces a full resync.

Signals to watch in production

SignalWhy it mattersWarning sign
WiredTiger cache dirty ratioPredicts checkpoint stall risk before user latency degradesSustained >15%, trending toward the 20% eviction trigger
WiredTiger available ticketsStorage engine admission control; exhaustion causes global queuingAvailable drops below 25% of total during peak load
Journal sync latencyDirect measure of storage health; rises before application latency degradesAverage >30 ms sustained
Replication lag versus oplog windowDetermines how long a secondary can survive before full resyncLag exceeds 25% of the current oplog window
Checkpoint durationIf it exceeds the 60-second interval, dirty data accumulates uncontrollablySustained >30 seconds, or trending upward over days
Connection churn (totalCreated delta)Reconnection storms generate thread-creation overhead and memory spikestotalCreated rate spikes without proportional increase in current
Application-thread evictionsThe moment background eviction fails and user threads do the workAny sustained nonzero rate of pages evicted by application threads

How Netdata helps

Netdata collects and correlates these signals so you can debug without manually sampling serverStatus() deltas across nodes.

  • WiredTiger cache fill, dirty ratio, and eviction rates are plotted alongside OS disk I/O latency and memory usage. This distinguishes cache-pressure cascades from storage-device failures.
  • Ticket availability and globalLock.currentQueue depths are tracked with opLatencies. When reads and writes spike together, the correlation points to storage engine saturation rather than a single bad query.
  • Replication lag, oplog window hours, and election events are monitored across replica set members in one view, revealing when flow control throttles the primary.
  • In sharded clusters, per-shard CPU, disk, and latency metrics surface hot shards that aggregate dashboards hide.

No related guides are available in this section yet.