Troubleshooting

10 Real-World PostgreSQL Deadlock Stack Traces and How They Were Resolved

From classic update order conflicts to subtle index contention: learn from real production incidents and their fixes

It’s 3 AM. The pager screams. Your application is throwing a cascade of errors, and users are reporting that the system is completely frozen. You dive into the logs and see the same ominous message repeating over and over: ERROR: deadlock detected. A PostgreSQL deadlock is one of the most abrupt and disruptive failures a database can experience. It’s not a performance degradation; it’s a hard stop where two or more transactions are locked in a fatal embrace, each waiting for a resource the other holds.

PostgreSQL is smart enough to break the deadlock by terminating one of the transactions, but this is cold comfort. The incident postmortem reveals a failed operation, a frustrated user, and the unsettling fact that it could happen again at any moment. To truly solve the problem, you need to go beyond just restarting the process: you have to analyze the lock tree captured in the log's deadlock report and understand the root cause. This guide presents 10 real-world PostgreSQL deadlock scenarios, showing you how to dissect the evidence and apply the right resolution.

The Tools of the Investigation

Before diving into the case studies, remember the primary tools for a live PostgreSQL debugging session:

  • The Deadlock Log: PostgreSQL’s error log is your most important piece of evidence. It contains the full deadlock report, showing the PIDs of the processes involved and the exact queries that collided.
  • pg_stat_activity: This view shows you what every process is currently doing, including its wait_event. Watching it during an incident is crucial for understanding the context of a live contention issue.
  • pg_locks: This view provides a detailed, low-level look at all the locks currently held and awaited in the system. Joining it with pg_stat_activity can help unravel complex lock chains, as in the query sketched below.
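
When contention is unfolding live, the two views can be combined. The following is a minimal sketch using pg_blocking_pids() (available since PostgreSQL 9.6) to list every session that is stuck behind a lock together with the PIDs holding it up:

    -- Sessions currently waiting on a lock, and the PIDs blocking them.
    SELECT waiting.pid                    AS waiting_pid,
           waiting.wait_event_type,
           waiting.wait_event,
           pg_blocking_pids(waiting.pid)  AS blocked_by_pids,
           waiting.query                  AS waiting_query
    FROM   pg_stat_activity AS waiting
    WHERE  cardinality(pg_blocking_pids(waiting.pid)) > 0;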

Let’s examine the evidence from ten real production deadlock scenarios.

1. The Classic Row Update Order Deadlock

This is the “hello, world” of deadlocks, most often seen in financial or inventory systems.

  • The Scenario: Two separate application workers are processing transfers between two accounts, but they access the accounts in a different order. Worker 1 transfers from Account A to Account B, while Worker 2 transfers from Account B to Account A.
  • The Deadlock Log:
    ERROR: deadlock detected
    DETAIL: Process 101 waits for ShareLock on transaction 2002; blocked by process 102.
            Process 102 waits for ShareLock on transaction 2001; blocked by process 101.
            Process 101: UPDATE accounts SET balance = balance + 10 WHERE account_id = 'B';
            Process 102: UPDATE accounts SET balance = balance + 20 WHERE account_id = 'A';
    CONTEXT: while updating tuple (0,1) in relation "accounts"
    
  • Lock_Tree_Analysis: Process 101 locked account A, then tried to lock account B. At the same time, Process 102 locked account B, then tried to lock account A. A perfect circle.
  • Deadlock_Resolution: Enforce a consistent locking order in the application logic. Always lock the rows in a deterministic order, for example, by their primary key. Lock the account with the lower account_id first, regardless of which is the source or destination, as in the sketch below.
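
A minimal sketch of that fix, assuming an accounts(account_id, balance) table: every worker locks both rows in primary-key order before touching either balance, so the circular wait can never form.

    BEGIN;

    -- Lock both rows in a deterministic (primary-key) order, regardless of
    -- which account is the source and which is the destination.
    SELECT account_id
    FROM   accounts
    WHERE  account_id IN ('A', 'B')
    ORDER  BY account_id
    FOR UPDATE;

    UPDATE accounts SET balance = balance - 10 WHERE account_id = 'A';
    UPDATE accounts SET balance = balance + 10 WHERE account_id = 'B';

    COMMIT;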

2. The Foreign Key Frenzy

Foreign key constraints create implicit locks that can be a surprising source of transaction conflicts.

  • The Scenario: One transaction is inserting a new order_item for a product, while another transaction is attempting to DELETE that same product.
  • The Deadlock Log:
    ERROR: deadlock detected
    DETAIL: Process 201 waits for ShareLock on transaction 3002; blocked by process 202.
            Process 202 waits for RowExclusiveLock on relation "order_items"; blocked by process 201.
            Process 201: DELETE FROM products WHERE product_id = 123;
            Process 202: INSERT INTO order_items (order_id, product_id, quantity) VALUES (99, 123, 1);
    
  • Lock_Tree_Analysis: The INSERT into order_items takes a FOR KEY SHARE row lock on the referenced products row so the product cannot be deleted out from under it. The DELETE needs an exclusive lock on that same row, and its foreign-key trigger must also check order_items for referencing rows, which collides with the in-flight INSERT.
  • Deadlock_Resolution: The application logic for deleting a product should first acquire an exclusive row lock on the target product (SELECT ... FOR UPDATE) before attempting the DELETE. This forces any concurrent INSERTs into the child table to wait, as shown below.
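
A sketch of the delete path under that rule, assuming the products and order_items tables from the log above:

    BEGIN;

    -- Take the row lock up front; concurrent INSERTs into order_items must
    -- queue behind this transaction instead of interleaving with the delete.
    SELECT product_id FROM products WHERE product_id = 123 FOR UPDATE;

    DELETE FROM products WHERE product_id = 123;

    COMMIT;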

3. The Uncooperative Advisory Lock

Advisory locks are powerful for distributed locking but can cause chaos if not used carefully.

  • The Scenario: A job scheduler uses a database advisory lock to ensure only one worker runs a task. A separate database maintenance script, written by another team, uses the same integer for its own advisory lock.
  • The Deadlock Log: The log is confusing, showing two unrelated processes, a job_worker and a cleanup_script, blocking each other while waiting for an Advisory wait_event.
  • Lock_Tree_Analysis: Both processes called pg_advisory_lock(12345). The deadlock occurs because the process holding the advisory lock then tries to access a table that the other process has locked.
  • Deadlock_Resolution: Implement a robust key-design strategy for advisory locks. Never use magic numbers. Use a combination of table OIDs and primary keys, or a hash of the job name, to generate unique integer keys, as in the sketch below.
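
One possible key scheme, sketched with illustrative names: the two-argument form of the advisory lock functions gives each subsystem its own class key, and the second key is derived from the job name with hashtext() (an internal but callable hash function) rather than hard-coded.

    -- Transaction-scoped advisory lock: released automatically at COMMIT/ROLLBACK.
    SELECT pg_advisory_xact_lock(
        'job_queue'::regclass::oid::int,  -- class key: the table this lock guards (illustrative)
        hashtext('nightly-billing-run')   -- object key: stable hash of the job name
    );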

4. The CREATE INDEX CONCURRENTLY Collision

This DDL command is designed to minimize blocking, but it’s not entirely lock-free.

  • The Scenario: A DBA runs CREATE INDEX CONCURRENTLY on a very busy table during production hours.
  • Lock_Tree_Analysis: CREATE INDEX CONCURRENTLY builds the index in several phases, and between phases it must wait for every transaction that can still “see” the old state of the table to finish. If one of those transactions then tries to acquire a lock that conflicts with the CREATE INDEX command’s own lock, a deadlock occurs.
  • Deadlock_Resolution: Always run DDL like this during low-traffic maintenance windows. Before starting, query pg_stat_activity to ensure there are no long-running transactions, as in the check below.
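
A quick pre-flight check along those lines (the 5-minute threshold is arbitrary):

    -- Transactions that have been open long enough to stall the index build.
    SELECT pid,
           now() - xact_start AS xact_age,
           state,
           query
    FROM   pg_stat_activity
    WHERE  xact_start IS NOT NULL
      AND  now() - xact_start > interval '5 minutes'
    ORDER  BY xact_age DESC;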

5. The Autovacuum Ambush

Even the database’s own maintenance processes can become the blocking process.

  • The Scenario: On a heavy-write table, an autovacuum process kicks in. At the same time, a long-running application transaction is performing multi-row updates.
  • Lock_Tree_Analysis: Autovacuum takes a ShareUpdateExclusiveLock on the table, a deliberately weak mode, but it also needs brief exclusive access to individual heap and index pages as it cleans them. If an application transaction holds something vacuum needs and then requests a resource the vacuum worker has already locked, they deadlock. (A normal autovacuum cancels itself when it blocks other sessions; anti-wraparound vacuums do not.)
  • Deadlock_Resolution: Tune autovacuum to be more aggressive and faster on heavy-write tables. Lowering autovacuum_vacuum_scale_factor and raising autovacuum_work_mem (or its fallback, maintenance_work_mem) lets vacuum run more often and finish sooner, reducing the window for conflict; a per-table example follows this list.
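
Per-table settings are usually the safest place to start. The sketch below assumes a heavy-write table named events and makes autovacuum trigger after roughly 2% churn instead of the default 20%; the memory settings live in postgresql.conf rather than on the table.

    ALTER TABLE events SET (
        autovacuum_vacuum_scale_factor = 0.02,  -- vacuum after ~2% of rows change
        autovacuum_vacuum_cost_delay   = 1      -- ms; lets each vacuum pass finish sooner
    );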

6. The Partial Index Predicament

Partial indexes are great for performance but have subtle locking behaviors.

  • The Scenario: A jobs table has a partial index ON jobs (id) WHERE status = 'pending'. Two transactions try to update two different jobs from pending to running.
  • Lock_Tree_Analysis: The two UPDATE statements on different rows have deadlocked. pg_locks shows both sessions waiting on pages of the partial index itself. When an update moves a row into or out of the index’s WHERE condition, the index has to be maintained, and the brief page-level locks taken during that maintenance can interleave badly between two concurrent sessions.
  • Deadlock_Resolution: This is a tricky query-pair deadlock. The solution usually involves catching the deadlock error and retrying the transaction in the application code; the schema shape and the error code to catch are shown below.
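
For reference, the shape of the schema involved and the error code the retry logic should watch for (names and ids are illustrative):

    CREATE INDEX jobs_pending_idx ON jobs (id) WHERE status = 'pending';

    -- Each of these moves a row out of the index's WHERE clause; if the two
    -- sessions collide, the loser receives SQLSTATE 40P01 (deadlock_detected)
    -- and the application should rerun its transaction.
    UPDATE jobs SET status = 'running' WHERE id = 101;  -- session 1
    UPDATE jobs SET status = 'running' WHERE id = 202;  -- session 2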

7. The Page-Based Lock Escalation

  • The Scenario: Two transactions are rapidly updating many different rows that happen to reside on the same 8KB data page.
  • Lock_Tree_Analysis: PostgreSQL never escalates ordinary row locks, which are recorded in the tuples themselves, but predicate locks taken under SERIALIZABLE isolation are promoted from tuple to page granularity when they grow too numerous, and index pages are briefly locked during inserts and page splits. If Transaction A effectively holds Page 1 while updating Row X, and Transaction B holds Page 2 while updating Row Y, a deadlock can occur if A then needs Page 2 and B needs Page 1.
  • Deadlock_Resolution: Re-ordering rows with CLUSTER, or using FILLFACTOR to spread hot rows across more pages, can help mitigate this; see the sketch after this list.
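
A sketch of the FILLFACTOR approach, assuming a hot counters table (the 70% figure is a starting point, not a recommendation):

    -- Leave 30% of each heap page free so updates spread out across pages.
    ALTER TABLE counters SET (fillfactor = 70);

    -- Existing rows only pick up the new fillfactor after a rewrite;
    -- note that VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table.
    VACUUM FULL counters;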

8. The Serializable Isolation Struggle

A serialization conflict is not technically a deadlock, but it aborts a transaction just as abruptly and usually lands in the same incident review.

  • The Scenario: Two transactions running at ISOLATION LEVEL SERIALIZABLE have read/write dependencies. Transaction A reads a row, B reads a different row. Then, A tries to write to the row B read, and B tries to write to the row A read.
  • Lock_Tree_Analysis: The dependency is not on locks, but on the “information” each transaction has read.
  • Deadlock_Resolution: The only solution is to build retry logic into the application. Serializable transactions are expected to fail under certain concurrent patterns, and the application must be prepared to retry; the timeline below shows the failure it has to catch.
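
An illustrative timeline of the write-skew pattern described above (table and ids are hypothetical). Whichever session loses receives SQLSTATE 40001 (serialization_failure), and the application must rerun that transaction from the start:

    -- Session A
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT balance FROM accounts WHERE id = 1;
    UPDATE accounts SET balance = 0 WHERE id = 2;
    COMMIT;

    -- Session B, interleaved with A
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT balance FROM accounts WHERE id = 2;
    UPDATE accounts SET balance = 0 WHERE id = 1;
    COMMIT;  -- ERROR: could not serialize access due to read/write dependencies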

9. The SKIP LOCKED Queue Worker Conflict

FOR UPDATE SKIP LOCKED is a fantastic tool for building queues, but it’s not immune to deadlocks.

  • The Scenario: A job queue worker uses SKIP LOCKED to grab a job. While processing, it tries to access a separate settings table that is also being updated by an administrative task.
  • Lock_Tree_Analysis: The worker locked a job row and then tried to get a lock on the settings table. The admin task locked the settings table and then tried to access the specific job row that the worker had already locked.
  • Deadlock_Resolution: Keep worker logic simple. The transaction that dequeues a job should ideally do nothing but process that job, as in the sketch below. If access to shared configuration is needed, read it before the main work transaction begins.
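
A sketch of a dequeue transaction that touches nothing but the job row (table and column names are illustrative):

    BEGIN;

    UPDATE jobs
    SET    status = 'running',
           started_at = now()
    WHERE  id = (
        SELECT id
        FROM   jobs
        WHERE  status = 'pending'
        ORDER  BY id
        LIMIT  1
        FOR UPDATE SKIP LOCKED   -- other workers skip this row instead of waiting
    )
    RETURNING id, payload;

    -- Do the job's work here; avoid touching shared tables such as settings.

    COMMIT;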

10. The TRUNCATE and Trigger Tangle

  • The Scenario: A nightly maintenance script runs TRUNCATE on a log table. The table has a statement-level ON TRUNCATE trigger that writes an audit entry (regular ON DELETE row triggers do not fire for TRUNCATE). Simultaneously, another process is updating the audit table.
  • Lock_Tree_Analysis: TRUNCATE requires an AccessExclusiveLock. The trigger fired by TRUNCATE then tries to get a lock on the audit table but is blocked by the application process. The application, in turn, needs a resource locked by the TRUNCATE transaction.
  • Deadlock_Resolution: Be extremely cautious with triggers on tables that are subject to heavy-handed commands like TRUNCATE. The best fix is often to remove the trigger and handle the auditing logic in the application code; if the trigger has to stay for now, the defensive pattern below at least keeps the script from deadlocking.
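
One defensive option for the maintenance script, sketched with illustrative table names: take both locks up front, in a fixed order, and fail fast rather than queueing into a deadlock. ACCESS EXCLUSIVE on the audit table is heavy-handed but keeps the example simple.

    BEGIN;
    SET LOCAL lock_timeout = '5s';   -- give up quickly instead of waiting forever

    -- LOCK TABLE acquires locks in the order listed, so every script that
    -- touches both tables should list them in the same order.
    LOCK TABLE app_logs, audit_log IN ACCESS EXCLUSIVE MODE;

    TRUNCATE app_logs;               -- the ON TRUNCATE trigger can now write audit rows freely

    COMMIT;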

From Post-Mortem to Proactive Prevention

Performing an incident postmortem for each deadlock is crucial for learning. However, the ultimate goal is to prevent them. The common thread running through nearly every fix is reducing the duration and scope of transactions and enforcing a consistent order for acquiring locks.

This is where comprehensive, real-time monitoring becomes a strategic advantage. Netdata can trace pg_stat_activity and monitor wait_event types across your entire PostgreSQL fleet. By setting up alerts for long-running transactions or unusual lock wait times, you can investigate potential transaction conflicts before they escalate into a full-blown deadlock, turning your monitoring system into a proactive defense mechanism.
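
For an ad-hoc spot check of the same signals, a query like the following (thresholds are arbitrary) surfaces long-running transactions and sessions currently stuck behind a lock:

    SELECT pid,
           now() - xact_start AS xact_age,
           wait_event_type,
           wait_event,
           left(query, 60)    AS query_snippet
    FROM   pg_stat_activity
    WHERE  state <> 'idle'
      AND  (now() - xact_start > interval '1 minute'
            OR wait_event_type = 'Lock')
    ORDER  BY xact_age DESC NULLS LAST;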

Stop being surprised by deadlocks. Get the deep visibility you need to prevent them with Netdata.