It’s 3 AM. The pager screams. Your application is throwing a cascade of errors, and users are reporting that the system is completely frozen. You dive into the logs and see the same ominous message repeating over and over: ERROR: deadlock detected. A PostgreSQL deadlock is one of the most abrupt and disruptive failures a database can experience. It’s not a performance degradation; it’s a hard stop where two or more transactions are locked in a fatal embrace, each waiting for a resource the other holds.
PostgreSQL is smart enough to break the deadlock by terminating one of the transactions, but this is cold comfort. The incident_postmortem reveals a failed operation, a frustrated user, and the unsettling fact that it could happen again at any moment. To truly solve the problem, you need to go beyond just restarting the process. You need to perform a lock_tree_analysis on the stack_trace_postgres provided in the logs and understand the fundamental deadlock_root_cause. This guide presents 10 real-world postgres_deadlock_example scenarios, showing you how to dissect the evidence and apply the right deadlock_resolution.
The Tools of the Investigation
Before diving into the case studies, remember the primary tools for a live postgres_debug session:
- The Deadlock Log: PostgreSQL’s error log is your most important piece of evidence. It provides the stack_trace_postgres, showing the PIDs of the blocking_process and the exact queries involved.
- pg_stat_activity: This view shows you what every process is currently doing, including its wait_event. A pg_stat_activity_trace is crucial for understanding the context of a live contention issue. A quick triage query built on this view is shown after this list.
- pg_locks: This view provides a detailed, low-level look at all the locks currently held and awaited in the system. A pg_locks_analysis can help unravel complex lock chains.
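During a live postgres_debug session, a single query is often enough to see who is waiting and who is blocking them. A minimal triage sketch (it assumes PostgreSQL 9.6+ for pg_blocking_pids):

```sql
-- Who is waiting, what are they waiting on, and which backends block them?
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```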
Let’s examine the evidence from ten real deadlock_in_production scenarios.
1. The Classic Row Update Order Deadlock
This is the “hello, world” of deadlocks, most often seen in financial or inventory systems.
- The Scenario: Two separate application workers are processing transfers between two accounts, but they access the accounts in a different order. Worker 1 transfers from Account A to Account B, while Worker 2 transfers from Account B to Account A.
- The Deadlock Log:

```
ERROR: deadlock detected
DETAIL: Process 101 waits for ShareLock on transaction 2002; blocked by process 102.
Process 102 waits for ShareLock on transaction 2001; blocked by process 101.
Process 101: UPDATE accounts SET balance = balance + 10 WHERE account_id = 'B';
Process 102: UPDATE accounts SET balance = balance + 20 WHERE account_id = 'A';
CONTEXT: while updating tuple (0,1) in relation "accounts"
```

- Lock_Tree_Analysis: Process 101 locked account A, then tried to lock account B. At the same time, Process 102 locked account B, then tried to lock account A. A perfect circle.
- Deadlock_Resolution: Enforce a consistent locking order in the application logic. Always lock the rows in a deterministic order, for example, by their primary key. Lock the account with the lower account_id first, regardless of which is the source or destination (see the sketch below).
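As an illustration of a consistent locking order, here is a minimal sketch of the transfer transaction. The table and values are assumptions for the example; the point is that both workers lock the rows in ascending account_id order before touching either balance.

```sql
BEGIN;

-- Lock both rows in primary-key order, regardless of transfer direction.
SELECT 1 FROM accounts WHERE account_id = 'A' FOR UPDATE;  -- lower key first
SELECT 1 FROM accounts WHERE account_id = 'B' FOR UPDATE;

-- Only then apply the transfer in whichever direction is needed.
UPDATE accounts SET balance = balance - 10 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 10 WHERE account_id = 'B';

COMMIT;
```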
2. The Foreign Key Frenzy
Foreign key constraints create implicit locks that can be a surprising source of a transaction_conflict.
- The Scenario: One transaction is inserting a new order_item for a product, while another transaction is attempting to DELETE that same product.
- The Deadlock Log:

```
ERROR: deadlock detected
DETAIL: Process 201 waits for ShareLock on transaction 3002; blocked by process 202.
Process 202 waits for RowExclusiveLock on relation "order_items"; blocked by process 201.
Process 201: DELETE FROM products WHERE product_id = 123;
Process 202: INSERT INTO order_items (order_id, product_id, quantity) VALUES (99, 123, 1);
```

- Lock_Tree_Analysis: The INSERT on order_items takes a ShareLock on the products row to verify it exists. The DELETE takes an ExclusiveLock on the products row and then tries to check for referencing rows in order_items, which conflicts with the INSERT.
- Deadlock_Resolution: The application logic for deleting a product should first acquire an EXCLUSIVE lock on the target product row (SELECT ... FOR UPDATE) before attempting the DELETE. This forces any concurrent INSERTs into the child table to wait (see the sketch below).
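A sketch of that delete path, reusing the table names from the log above:

```sql
BEGIN;

-- Lock the parent row first; concurrent INSERTs into order_items that
-- reference this product must now wait instead of interleaving with the DELETE.
SELECT 1 FROM products WHERE product_id = 123 FOR UPDATE;

DELETE FROM products WHERE product_id = 123;

COMMIT;
```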
3. The Uncooperative Advisory Lock
Advisory locks are powerful for distributed locking but can cause chaos if not used carefully.
- The Scenario: A job_scheduler_db uses an advisory lock to ensure only one worker runs a task. A separate database maintenance script, written by another team, uses the same integer for its own advisory lock.
- The Deadlock Log: The log is confusing, showing two unrelated processes, a job_worker and a cleanup_script, blocking each other while waiting for an Advisory wait_event.
- Lock_Tree_Analysis: Both processes called pg_advisory_lock(12345). The deadlock occurs because the process holding the advisory lock then tries to access a table that the other process has locked.
- Deadlock_Resolution: Implement a robust lock_key_design strategy. Never use magic numbers. Use a combination of table OIDs and primary keys, or a hash of the job name, to generate unique integer keys for advisory locks (a sketch follows below).
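A sketch of such a lock_key_design, with illustrative table and job names (the lock is held until the surrounding transaction ends):

```sql
-- Option 1: namespace the key with the table's OID plus the row's primary key,
-- using the two-argument (int, int) form of the advisory lock functions.
SELECT pg_advisory_xact_lock('jobs'::regclass::oid::int, 42);

-- Option 2: hash a descriptive job name into a 64-bit key instead of a magic number.
SELECT pg_advisory_xact_lock(hashtextextended('reindex-orders-nightly', 0));
```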
4. The CREATE INDEX CONCURRENTLY Collision
This DDL command is designed to minimize blocking, but it’s not entirely lock-free.
- The Scenario: A DBA runs CREATE INDEX CONCURRENTLY on a very busy table during production hours.
- Lock_Tree_Analysis: CREATE INDEX CONCURRENTLY has two phases. In its second phase, it needs to wait for all transactions that can “see” the old state of the table to finish. If one of those transactions then tries to acquire a lock that conflicts with the CREATE INDEX command’s lock, a deadlock occurs.
- Deadlock_Resolution: Always run DDL like this during low-traffic maintenance windows. Before starting, query pg_stat_activity to ensure there are no long-running transactions (a pre-flight query is sketched below).
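The pre-flight check can be as simple as listing transactions that have been open for more than a few minutes (the threshold is illustrative):

```sql
SELECT pid,
       state,
       xact_start,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```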
5. The Autovacuum Ambush
Even the database’s own maintenance processes can become a blocking_process.
- The Scenario: On a heavy_write_table, an autovacuum process kicks in. At the same time, a long-running application transaction is performing multi-row updates.
- Lock_Tree_Analysis: autovacuum holds a lightweight lock on the table but may need to briefly lock index pages. If an application transaction holds a row lock that autovacuum needs and then requests another resource that autovacuum has already locked, they deadlock.
- Deadlock_Resolution: Tune autovacuum to be more aggressive and faster on heavy-write tables. Lowering the autovacuum_vacuum_scale_factor and increasing maintenance_work_mem allows vacuum to run more frequently for shorter durations, reducing the window for conflict (see the sketch below).
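Per-table settings are usually the least disruptive way to apply this. A sketch with an illustrative table name and thresholds that need tuning to your own workload:

```sql
ALTER TABLE heavy_write_table SET (
    autovacuum_vacuum_scale_factor = 0.02,  -- vacuum after roughly 2% of rows change
    autovacuum_vacuum_cost_delay   = 2      -- let each vacuum pass finish faster
);

-- maintenance_work_mem is set globally or per session, not per table, e.g.:
-- ALTER SYSTEM SET maintenance_work_mem = '1GB';  followed by  SELECT pg_reload_conf();
```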
6. The Partial Index Predicament
Partial indexes are great for performance but have subtle locking behaviors.
- The Scenario: A jobs table has a partial index ON jobs (id) WHERE status = 'pending'. Two transactions try to update two different jobs from pending to running.
- Lock_Tree_Analysis: The two UPDATE statements on different rows have deadlocked. The pg_locks_analysis shows they are both waiting for a lock on the partial index itself. When a row is updated such that it no longer meets the partial index’s WHERE condition, PostgreSQL may need a broader lock on the index.
- Deadlock_Resolution: This is a tricky query_pair_deadlock. The solution often involves retrying the transaction in the application code (a retry sketch follows below).
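Retry logic normally lives in the application driver, but the same idea can be sketched server-side in PL/pgSQL, where each attempt runs inside an implicit savepoint and a caught deadlock_detected error rolls back only that attempt (table and values are illustrative):

```sql
DO $$
DECLARE
    attempts int := 0;
BEGIN
    LOOP
        BEGIN
            UPDATE jobs SET status = 'running'
            WHERE id = 42 AND status = 'pending';
            EXIT;                                -- success: leave the loop
        EXCEPTION WHEN deadlock_detected THEN
            attempts := attempts + 1;
            IF attempts >= 3 THEN
                RAISE;                           -- give up after three tries
            END IF;
            PERFORM pg_sleep(0.1 * attempts);    -- brief backoff before retrying
        END;
    END LOOP;
END;
$$;
```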
7. The Page-Based Lock Escalation
- The Scenario: Two transactions are rapidly updating many different rows that happen to reside on the same 8KB data page.
- Lock_Tree_Analysis: PostgreSQL’s locking mechanism can, under high contention, lock an entire page instead of individual rows to save resources. If Transaction A locks Page 1 to update Row X, and Transaction B locks Page 2 to update Row Y, a deadlock can occur if A then needs Page 2 and B needs Page 1.
- Deadlock_Resolution: Re-ordering rows with CLUSTER or using FILLFACTOR to spread rows across more pages can help mitigate this (see the sketch below).
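A sketch of the mitigation, with an illustrative table and index name (note that CLUSTER takes an AccessExclusiveLock, so run it in a maintenance window):

```sql
-- Keep free space on each page so hot rows spread across more pages.
ALTER TABLE hot_counters SET (fillfactor = 70);

-- Rewrite the table in index order so existing rows pick up the new fillfactor.
CLUSTER hot_counters USING hot_counters_pkey;
```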
8. The Serializable Isolation Struggle
A serializable_conflict is not technically a deadlock, but PostgreSQL can report it as one.
- The Scenario: Two transactions running at ISOLATION LEVEL SERIALIZABLE have read/write dependencies. Transaction A reads a row, B reads a different row. Then, A tries to write to the row B read, and B tries to write to the row A read.
- Lock_Tree_Analysis: The dependency is not on locks, but on the “information” each transaction has read.
- Deadlock_Resolution: The only solution is to build retry logic into the application. Serializable transactions are expected to fail under certain concurrent patterns, and the application must be prepared to retry (see the sketch below).
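Under SERIALIZABLE, the losing transaction aborts with SQLSTATE 40001 (serialization_failure) rather than waiting, so the retry loop has to wrap the whole transaction. A minimal sketch with an illustrative table:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

SELECT balance FROM accounts WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 10 WHERE account_id = 'B';

COMMIT;  -- may fail with "could not serialize access due to read/write dependencies"
```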
9. The SKIP LOCKED Queue Worker Conflict
FOR UPDATE SKIP LOCKED is a fantastic tool for building queues, but it’s not immune to deadlocks.
- The Scenario: A job queue worker uses SKIP LOCKED to grab a job. While processing, it tries to access a separate settings table that is also being updated by an administrative task.
- Lock_Tree_Analysis: The worker locked a job row and then tried to get a lock on the settings table. The admin task locked the settings table and then tried to access the specific job row that the worker had already locked.
- Deadlock_Resolution: Keep worker logic simple. The transaction that dequeues a job should ideally do nothing but process that job. If access to shared configuration is needed, read it before the main work transaction begins (see the sketch below).
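A typical dequeue transaction, kept deliberately narrow (table and column names are illustrative), touches only the jobs table; any shared configuration is read in a separate, earlier transaction:

```sql
BEGIN;

SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;   -- grab one unclaimed job without waiting on locked ones

-- ... process the job, then mark it done using the id returned above ...
UPDATE jobs SET status = 'done' WHERE id = 42;

COMMIT;
```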
10. The TRUNCATE and Trigger Tangle
- The Scenario: A nightly maintenance script runs TRUNCATE on a log table. This table has a trigger that, upon deletion, writes an audit entry. Simultaneously, another process is updating the audit table.
- Lock_Tree_Analysis: TRUNCATE requires an AccessExclusiveLock. The trigger fired by TRUNCATE then tries to get a lock on the audit table but is blocked by the application process. The application, in turn, needs a resource locked by the TRUNCATE transaction.
- Deadlock_Resolution: Be extremely cautious with triggers on tables that are subject to DDL like TRUNCATE. The best fix is often to remove the trigger and handle the auditing logic in the application code (a quick check for such triggers is sketched below).
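Before a maintenance script runs TRUNCATE, it is worth checking which triggers exist on the target table and whether any of them fire on TRUNCATE (the table name is illustrative):

```sql
SELECT tgname,
       pg_get_triggerdef(oid) AS definition
FROM pg_trigger
WHERE tgrelid = 'log_table'::regclass
  AND NOT tgisinternal;   -- skip the internal FK enforcement triggers
```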
From Post-Mortem to Proactive Prevention
Performing an incident_postmortem for each postgres_deadlock_example is crucial for learning. However, the ultimate goal is to prevent them. The common thread running through nearly all of these deadlock_fix_steps is the same: reduce the duration and scope of transactions, and enforce a consistent order for acquiring locks.
This is where comprehensive, real-time monitoring becomes a strategic advantage. Netdata can provide a granular pg_stat_activity_trace and monitor wait_event types across your entire PostgreSQL fleet. By setting up alerts for long-running transactions or unusual lock wait times, you can investigate potential transaction_conflict scenarios before they escalate into a full-blown deadlock, turning your monitoring system into a proactive defense mechanism.
Stop being surprised by deadlocks. Get the deep visibility you need to prevent them with Netdata.