It’s 3 AM. The pager screams. Your application is throwing a cascade of errors, and users are reporting that the system is completely frozen. You dive into the logs and see the same ominous message repeating over and over: `ERROR: deadlock detected`. A PostgreSQL deadlock is one of the most abrupt and disruptive failures a database can experience. It’s not a performance degradation; it’s a hard stop where two or more transactions are locked in a fatal embrace, each waiting for a resource the other holds.
PostgreSQL is smart enough to break the deadlock by terminating one of the transactions, but this is cold comfort. The `incident_postmortem` reveals a failed operation, a frustrated user, and the unsettling fact that it could happen again at any moment. To truly solve the problem, you need to go beyond just restarting the process. You need to perform a `lock_tree_analysis` on the `stack_trace_postgres` provided in the logs and understand the fundamental `deadlock_root_cause`. This guide presents 10 real-world `postgres_deadlock_example` scenarios, showing you how to dissect the evidence and apply the right `deadlock_resolution`.
The Tools of the Investigation
Before diving into the case studies, remember the primary tools for a live `postgres_debug` session:
- The Deadlock Log: PostgreSQL’s error log is your most important piece of evidence. It provides the `stack_trace_postgres`, showing the PIDs of the `blocking_process` and the exact queries involved.
- `pg_stat_activity`: This view shows you what every process is currently doing, including its `wait_event`. A `pg_stat_activity_trace` is crucial for understanding the context of a live contention issue.
- `pg_locks`: This view provides a detailed, low-level look at all the locks currently held and awaited in the system. A `pg_locks_analysis` can help unravel complex lock chains.
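As a starting point, you can join `pg_stat_activity` against the built-in `pg_blocking_pids()` function (PostgreSQL 9.6+) to see live blocking chains. A minimal sketch:

```sql
-- List every session waiting on a lock, alongside the session(s)
-- blocking it and the queries on both sides of the conflict.
SELECT waiting.pid       AS waiting_pid,
       waiting.wait_event_type,
       waiting.wait_event,
       waiting.query     AS waiting_query,
       blocking.pid      AS blocking_pid,
       blocking.query    AS blocking_query
FROM pg_stat_activity AS waiting
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(waiting.pid))
WHERE cardinality(pg_blocking_pids(waiting.pid)) > 0;
```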
Let’s examine the evidence from ten real `deadlock_in_production` scenarios.
1. The Classic Row Update Order Deadlock
This is the “hello, world” of deadlocks, most often seen in financial or inventory systems.
- The Scenario: Two separate application workers are processing transfers between two accounts, but they access the accounts in a different order. Worker 1 transfers from Account `A` to Account `B`, while Worker 2 transfers from Account `B` to Account `A`.
- The Deadlock Log:
```
ERROR:  deadlock detected
DETAIL:  Process 101 waits for ShareLock on transaction 2002; blocked by process 102.
         Process 102 waits for ShareLock on transaction 2001; blocked by process 101.
         Process 101: UPDATE accounts SET balance = balance + 10 WHERE account_id = 'B';
         Process 102: UPDATE accounts SET balance = balance + 20 WHERE account_id = 'A';
CONTEXT:  while updating tuple (0,1) in relation "accounts"
```
- `Lock_Tree_Analysis`: Process 101 locked account `A`, then tried to lock account `B`. At the same time, Process 102 locked account `B`, then tried to lock account `A`. A perfect circle.
- `Deadlock_Resolution`: Enforce a consistent locking order in the application logic. Always lock the rows in a deterministic order, for example, by their primary key. Lock the account with the lower `account_id` first, regardless of which is the source or destination, as in the sketch below.
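A minimal sketch of that ordering rule, assuming the `accounts` schema from the log and a hypothetical transfer amount:

```sql
BEGIN;

-- Lock both rows up front, always in primary-key order, so every
-- concurrent transfer acquires its locks in the same sequence.
SELECT balance
FROM accounts
WHERE account_id IN ('A', 'B')
ORDER BY account_id
FOR UPDATE;

-- With both locks held, the direction of the transfer no longer matters.
UPDATE accounts SET balance = balance - 10 WHERE account_id = 'A';
UPDATE accounts SET balance = balance + 10 WHERE account_id = 'B';

COMMIT;
```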
2. The Foreign Key Frenzy
Foreign key constraints create implicit locks that can be a surprising source of a `transaction_conflict`.
- The Scenario: One transaction is inserting a new `order_item` for a `product`, while another transaction is attempting to `DELETE` that same `product`.
- The Deadlock Log:
```
ERROR:  deadlock detected
DETAIL:  Process 201 waits for ShareLock on transaction 3002; blocked by process 202.
         Process 202 waits for RowExclusiveLock on relation "order_items"; blocked by process 201.
         Process 201: DELETE FROM products WHERE product_id = 123;
         Process 202: INSERT INTO order_items (order_id, product_id, quantity) VALUES (99, 123, 1);
```
- `Lock_Tree_Analysis`: The `INSERT` on `order_items` takes a `ShareLock` on the `products` row to verify it exists. The `DELETE` takes an `ExclusiveLock` on the `products` row and then tries to check for referencing rows in `order_items`, which conflicts with the `INSERT`.
- `Deadlock_Resolution`: The application logic for deleting a product should first acquire an `EXCLUSIVE` lock on the target product row (`SELECT ... FOR UPDATE`) before attempting the `DELETE`. This forces any concurrent `INSERT`s into the child table to wait. See the sketch below.
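A minimal sketch of that delete path, using the `products` and `order_items` tables from the log:

```sql
BEGIN;

-- Take the row lock on the parent explicitly, before the DELETE's own
-- locking and foreign-key checks begin. A concurrent INSERT into
-- order_items must now wait here instead of deadlocking mid-flight.
SELECT 1
FROM products
WHERE product_id = 123
FOR UPDATE;

DELETE FROM products
WHERE product_id = 123;

COMMIT;
```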
3. The Uncooperative Advisory Lock
Advisory locks are powerful for distributed locking but can cause chaos if not used carefully.
- The Scenario: A `job_scheduler_db` uses an advisory lock to ensure only one worker runs a task. A separate database maintenance script, written by another team, uses the same integer for its own advisory lock.
- The Deadlock Log: The log is confusing, showing two unrelated processes, a `job_worker` and a `cleanup_script`, blocking each other while waiting for an `Advisory` `wait_event`.
- `Lock_Tree_Analysis`: Both processes called `pg_advisory_lock(12345)`. The deadlock occurs because the process holding the advisory lock then tries to access a table that the other process has locked.
- `Deadlock_Resolution`: Implement a robust `lock_key_design` strategy. Never use magic numbers. Use a combination of table OIDs and primary keys, or a hash of the job name, to generate unique integer keys for advisory locks. One such key scheme is sketched after this list.
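One way to build collision-resistant keys is the two-argument form of `pg_advisory_xact_lock()`: the table’s OID becomes a namespace, and the row’s primary key (or a hash of the job name) becomes the second key. A sketch, assuming a hypothetical `jobs` table:

```sql
-- Namespace the lock by table OID, keyed by the job's primary key.
-- Two teams locking around different tables can no longer collide.
SELECT pg_advisory_xact_lock('jobs'::regclass::oid::int, 42);

-- For string-identified jobs, hash the name into a 32-bit key instead
-- (hashtext() is an internal function, but widely used for this).
SELECT pg_advisory_xact_lock('jobs'::regclass::oid::int, hashtext('nightly-cleanup'));
```

The `_xact` variant also releases the lock automatically at commit or rollback, which removes a whole class of leaked-lock bugs.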
4. The `CREATE INDEX CONCURRENTLY` Collision
This DDL command is designed to minimize blocking, but it’s not entirely lock-free.
- The Scenario: A DBA runs `CREATE INDEX CONCURRENTLY` on a very busy table during production hours.
- `Lock_Tree_Analysis`: `CREATE INDEX CONCURRENTLY` builds the index in multiple phases, and between them it must wait for all transactions that can “see” the old state of the table to finish. If one of those transactions then tries to acquire a lock that conflicts with the `CREATE INDEX` command’s lock, a deadlock occurs.
- `Deadlock_Resolution`: Always run DDL like this during low-traffic maintenance windows. Before starting, query `pg_stat_activity` to ensure there are no long-running transactions, as shown below.
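For example, a quick pre-flight check for transactions open longer than five minutes (an illustrative threshold):

```sql
-- Transactions this old are the ones CREATE INDEX CONCURRENTLY
-- would end up waiting on.
SELECT pid,
       now() - xact_start AS xact_age,
       state,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_age DESC;
```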
5. The Autovacuum Ambush
Even the database’s own maintenance processes can become a `blocking_process`.
- The Scenario: On a `heavy_write_table`, an `autovacuum` process kicks in. At the same time, a long-running application transaction is performing multi-row updates.
- `Lock_Tree_Analysis`: `autovacuum` holds a lightweight lock on the table but may need to briefly lock index pages. If an application transaction holds a row lock that `autovacuum` needs and then requests another resource that `autovacuum` has already locked, they deadlock.
- `Deadlock_Resolution`: Tune `autovacuum` to be more aggressive and faster on heavy-write tables. Lowering `autovacuum_vacuum_scale_factor` and increasing `maintenance_work_mem` allows vacuum to run more frequently for shorter durations, reducing the window for conflict. A per-table sketch follows this list.
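A sketch of that tuning, with illustrative values, against the `heavy_write_table` from the scenario:

```sql
-- Per-table override: vacuum after ~1% of rows change instead of the
-- default 20%, so each run is smaller and shorter.
ALTER TABLE heavy_write_table
    SET (autovacuum_vacuum_scale_factor = 0.01);

-- More memory lets each vacuum pass finish faster. Takes effect on
-- config reload; no restart required.
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();
```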
6. The Partial Index Predicament
Partial indexes are great for performance but have subtle locking behaviors.
- The Scenario: A `jobs` table has a partial index `ON jobs (id) WHERE status = 'pending'`. Two transactions try to update two different jobs from `pending` to `running`.
- `Lock_Tree_Analysis`: The two `UPDATE` statements on different rows have deadlocked. The `pg_locks_analysis` shows they are both waiting for a lock on the partial index itself. When a row is updated such that it no longer meets the partial index’s `WHERE` condition, PostgreSQL may need a broader lock on the index.
- `Deadlock_Resolution`: This is a tricky `query_pair_deadlock`. The solution often involves retrying the transaction in the application code. The setup is sketched below.
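For reference, a sketch of the setup described above, with a hypothetical index name and job IDs:

```sql
CREATE INDEX jobs_pending_idx ON jobs (id) WHERE status = 'pending';

-- Run concurrently in two transactions, each of these moves a row out
-- of the index's WHERE condition and must touch the index structure:
UPDATE jobs SET status = 'running' WHERE id = 101;  -- transaction 1
UPDATE jobs SET status = 'running' WHERE id = 202;  -- transaction 2
```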
7. The Page-Based Lock Escalation
- The Scenario: Two transactions are rapidly updating many different rows that happen to reside on the same 8KB data page.
- `Lock_Tree_Analysis`: PostgreSQL’s locking mechanism can, under high contention, lock an entire page instead of individual rows to save resources. If Transaction A locks Page 1 to update Row `X`, and Transaction B locks Page 2 to update Row `Y`, a deadlock can occur if A then needs Page 2 and B needs Page 1.
- `Deadlock_Resolution`: Re-ordering rows with `CLUSTER` or using `FILLFACTOR` to spread rows across more pages can help mitigate this, as sketched below.
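A sketch of the `FILLFACTOR` approach, with hypothetical table and index names:

```sql
-- Leave 30% of each page empty so hot rows spread across more pages.
-- The setting only applies to newly written pages...
ALTER TABLE hot_counters SET (fillfactor = 70);

-- ...so rewrite the table to repack existing rows under the new setting.
-- Note: CLUSTER takes an AccessExclusiveLock; run it in a maintenance window.
CLUSTER hot_counters USING hot_counters_pkey;
```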
8. The Serializable Isolation Struggle
A `serializable_conflict` is not technically a deadlock, but PostgreSQL can report it as one.
- The Scenario: Two transactions running at `ISOLATION LEVEL SERIALIZABLE` have read/write dependencies. Transaction A reads a row, B reads a different row. Then, A tries to write to the row B read, and B tries to write to the row A read.
- `Lock_Tree_Analysis`: The dependency is not on locks, but on the “information” each transaction has read.
- `Deadlock_Resolution`: The only solution is to build retry logic into the application. Serializable transactions are expected to fail under certain concurrent patterns, and the application must be prepared to retry. The failure pattern is sketched below.
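A minimal two-session sketch of the pattern, assuming a hypothetical table `t(id, val)`:

```sql
-- Session A:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT val FROM t WHERE id = 1;

-- Session B, concurrently:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT val FROM t WHERE id = 2;

-- Session A writes the row B read, then commits:
UPDATE t SET val = val + 1 WHERE id = 2;
COMMIT;

-- Session B writes the row A read. PostgreSQL detects the dangerous
-- cycle and aborts B with SQLSTATE 40001 (serialization_failure),
-- either here or at COMMIT; the application catches that code and
-- re-runs the whole transaction.
UPDATE t SET val = val + 1 WHERE id = 1;
COMMIT;
```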
9. The `SKIP LOCKED` Queue Worker Conflict
`FOR UPDATE SKIP LOCKED` is a fantastic tool for building queues, but it’s not immune to deadlocks.
- The Scenario: A job queue worker uses `SKIP LOCKED` to grab a job. While processing, it tries to access a separate `settings` table that is also being updated by an administrative task.
- `Lock_Tree_Analysis`: The worker locked a `job` row and then tried to get a lock on the `settings` table. The admin task locked the `settings` table and then tried to access the specific `job` row that the worker had already locked.
- `Deadlock_Resolution`: Keep worker logic simple. The transaction that dequeues a job should ideally do nothing but process that job. If access to shared configuration is needed, read it before the main work transaction begins. A narrow dequeue transaction is sketched after this list.
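A sketch of that shape, assuming hypothetical `jobs` and `settings` schemas:

```sql
-- Read configuration first, outside the work transaction (autocommit).
SELECT value FROM settings WHERE key = 'batch_size';

-- The dequeue transaction touches nothing but the jobs table.
BEGIN;

SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY id
FOR UPDATE SKIP LOCKED
LIMIT 1;

-- ... process the job, then mark it done using the id returned above:
UPDATE jobs SET status = 'done' WHERE id = 101;

COMMIT;
```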
10. The `TRUNCATE` and Trigger Tangle
- The Scenario: A nightly maintenance script runs `TRUNCATE` on a log table. This table has a statement-level `ON TRUNCATE` trigger that writes an audit entry (row-level `ON DELETE` triggers do not fire for `TRUNCATE`). Simultaneously, another process is updating the audit table.
- `Lock_Tree_Analysis`: `TRUNCATE` requires an `AccessExclusiveLock`. The trigger fired by `TRUNCATE` then tries to get a lock on the audit table but is blocked by the application process. The application, in turn, needs a resource locked by the `TRUNCATE` transaction.
- `Deadlock_Resolution`: Be extremely cautious with triggers on tables that are subject to DDL like `TRUNCATE`. The best fix is often to remove the trigger and handle the auditing logic in the application code. The trigger shape in question is sketched below.
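For illustration, the kind of trigger that creates this tangle, with hypothetical table and function names:

```sql
-- A statement-level ON TRUNCATE trigger makes TRUNCATE reach into the
-- audit table while it already holds an AccessExclusiveLock on app_logs.
CREATE OR REPLACE FUNCTION audit_truncate() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (event, happened_at)
    VALUES ('app_logs truncated', now());
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER app_logs_truncate_audit
    AFTER TRUNCATE ON app_logs
    FOR EACH STATEMENT
    EXECUTE FUNCTION audit_truncate();
```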
From Post-Mortem to Proactive Prevention
Performing an `incident_postmortem` for each `postgres_deadlock_example` is crucial for learning. However, the ultimate goal is to prevent them. The common thread in nearly every set of `deadlock_fix_steps` is reducing the duration and scope of transactions and enforcing a consistent order for acquiring locks.
This is where comprehensive, real-time monitoring becomes a strategic advantage. Netdata can provide a granular `pg_stat_activity_trace` and monitor `wait_event` types across your entire PostgreSQL fleet. By setting up alerts for long-running transactions or unusual lock wait times, you can investigate potential `transaction_conflict` scenarios before they escalate into a full-blown deadlock, turning your monitoring system into a proactive defense mechanism.
Stop being surprised by deadlocks. Get the deep visibility you need to prevent them with Netdata.