Using FOR UPDATE SKIP LOCKED for Queue-Based Workflows without Deadlocks

How a single SQL clause can transform your database into a high-throughput, parallel-processing task queue

One of the most common and powerful patterns in modern application development is the job queue. Whether you’re sending emails, processing images, or running complex calculations, offloading tasks to background workers is essential for building responsive and scalable systems. Many developers reach for dedicated queueing software like RabbitMQ or Redis, but for many use cases, your primary PostgreSQL database already has all the tools you need to build a robust, transactional, and highly performant job queue.

The challenge, however, lies in coordination. How do you allow multiple concurrent consumers (workers) to safely grab jobs from a table without them all trying to work on the same task, or worse, getting stuck in a traffic jam of database locks? The naive approach often leads to race conditions, while traditional locking (FOR UPDATE) can serialize your workers, defeating the purpose of parallelism.

This is where a powerful but often overlooked PostgreSQL feature comes into play: FOR UPDATE SKIP LOCKED. This single clause provides an elegant and highly efficient mechanism to create a deadlock-free queue, allowing your workers to operate in parallel with minimal contention.

The Classic Job Queue Problem: A Recipe for Contention

Let’s imagine a simple jobs table with a status column (pending, processing, completed). The most basic worker logic looks something like this:

  1. Select a Job: Find a job where the status is ‘pending’.
  2. Update the Status: Change the job’s status to ‘processing’ to mark it as taken.
  3. Process the Job: Perform the actual work.

The problem occurs between steps 1 and 2. If two workers run the SELECT query at nearly the same time, they will both see the same ‘pending’ job. They will then both try to UPDATE it, leading to either duplicate processing (a disaster for non-idempotent tasks) or one worker’s update failing.
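To make the race concrete, here is a minimal sketch of the naive approach. The `jobs` table schema (`id`, `status`, `payload`, `created_at`) is a hypothetical example, not prescribed by the article:

```sql
-- Hypothetical schema, for illustration only:
-- CREATE TABLE jobs (
--     id         bigserial PRIMARY KEY,
--     status     text NOT NULL DEFAULT 'pending',
--     payload    jsonb,
--     created_at timestamptz NOT NULL DEFAULT now()
-- );

-- Step 1: find a pending job.
-- Racy: two workers running this at the same time can both see the same row.
SELECT id, payload
FROM   jobs
WHERE  status = 'pending'
ORDER  BY created_at
LIMIT  1;

-- Step 2: claim it. By now, another worker may have claimed it too.
UPDATE jobs SET status = 'processing' WHERE id = $1;
```

Nothing stops both workers from passing step 1 with the same `id` before either reaches step 2.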

A more robust approach uses pessimistic locking with FOR UPDATE. The worker’s query would look for a pending job and lock the row immediately. When a second worker tries to select that same row, the database forces it to wait until the first worker’s transaction completes. This prevents duplicate work but introduces a new bottleneck. Your workers, which are supposed to run in parallel, now form a “convoy,” waiting in a single-file line for the first worker to finish. This leads to long blocking times and poor queue performance.
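A sketch of the convoy problem, using the same hypothetical `jobs` table:

```sql
BEGIN;

-- Lock the oldest pending job.
SELECT id, payload
FROM   jobs
WHERE  status = 'pending'
ORDER  BY created_at
LIMIT  1
FOR UPDATE;

-- Every other worker running this same query now blocks on this row
-- until the transaction commits or rolls back: parallelism is gone.
```

The lock solves correctness, but because all workers target the same oldest row, they queue up behind whichever worker got there first.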

The Elegant Solution: FOR UPDATE SKIP LOCKED

Introduced in PostgreSQL 9.5, SKIP LOCKED is a modifier for FOR UPDATE that completely changes its behavior. When a query with SKIP LOCKED tries to acquire a row-level lock on a row that is already locked by another transaction, it doesn’t wait. Instead, it simply skips that row as if it didn’t exist and moves on to the next one that meets the WHERE clause criteria.

This is the perfect primitive for a Postgres-backed worker queue. Each worker can now execute a query to find and lock the first available pending job.

A highly efficient dequeue operation can be built into a single, atomic query. This query finds a pending job, locks it (skipping any that are already locked by other workers), updates its status to ‘processing’, and returns the job’s data to the worker application all in one go. Because this happens in a single statement, it’s incredibly fast and eliminates any chance of a race condition between selecting and updating.
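The atomic dequeue described above can be sketched as a single UPDATE with a locking subquery (column names follow the hypothetical schema used earlier):

```sql
-- Claim the oldest available pending job in one atomic statement.
UPDATE jobs
SET    status = 'processing'
WHERE  id = (
    SELECT id
    FROM   jobs
    WHERE  status = 'pending'
    ORDER  BY created_at
    LIMIT  1
    FOR UPDATE SKIP LOCKED   -- skip rows other workers have already claimed
)
RETURNING id, payload;
```

If every pending job is currently locked, the subquery returns nothing, the UPDATE touches zero rows, and the worker simply gets an empty result instead of blocking.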

Building a Robust Worker

The full workflow for a worker using this transactional queue model would be:

  1. Begin Transaction: The worker starts a new database transaction.
  2. Dequeue a Job: The worker executes the atomic SKIP LOCKED query.
  3. Check for Work: If the query returns a job, the worker proceeds. If it returns nothing (meaning all pending jobs are currently locked by other workers), the worker can sleep for a short period before trying again.
  4. Process the Job: The worker performs the business logic using the data from the dequeued job.
  5. Finalize the Job:
    • On success, the worker updates the job’s status to completed.
    • On failure, it updates the status to failed and potentially records the error message. This is a good place to implement retry logic, perhaps by incrementing a retry_count and setting the status back to pending if the count is below a threshold.
  6. Commit Transaction: The worker commits the transaction, releasing the lock and making the final status change permanent.
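The six steps above can be sketched end to end. The `retry_count` and `last_error` columns, and the retry limit of 3, are illustrative assumptions:

```sql
BEGIN;

-- Steps 1-2: dequeue one unlocked pending job, if any.
UPDATE jobs
SET    status = 'processing'
WHERE  id = (
    SELECT id
    FROM   jobs
    WHERE  status = 'pending'
    ORDER  BY created_at
    LIMIT  1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;

-- Steps 3-4: the application checks the result and runs the business logic.

-- Step 5a: on success.
UPDATE jobs SET status = 'completed' WHERE id = $1;

-- Step 5b: on failure, retry up to an assumed maximum of 3 attempts.
UPDATE jobs
SET    status      = CASE WHEN retry_count < 3 THEN 'pending' ELSE 'failed' END,
       retry_count = retry_count + 1,
       last_error  = $2
WHERE  id = $1;

-- Step 6: release the lock and persist the final status.
COMMIT;
```

Because the status change to ‘processing’ happens inside the same transaction, a worker crash rolls everything back and the job reverts to ‘pending’ automatically.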

If the worker process crashes for any reason, the database automatically rolls back the transaction. The lock is released, and the job’s status reverts to ‘pending’, making it available for another worker to pick up.

SKIP LOCKED vs. Advisory Locks

Before SKIP LOCKED was available, a common pattern for building a Postgres queue involved using advisory locks. An advisory lock is a cooperative lock that developers manage manually in their application code. A worker would try to acquire a lock using a function call. If successful, it would process the job; if not, it would move on.
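For comparison, a minimal sketch of the advisory-lock pattern, using PostgreSQL’s transaction-scoped `pg_try_advisory_xact_lock` against the same hypothetical table:

```sql
-- Return the first pending job whose advisory lock this worker can grab.
-- The lock is released automatically when the transaction ends.
SELECT id, payload
FROM   jobs
WHERE  status = 'pending'
  AND  pg_try_advisory_xact_lock(id)
ORDER  BY created_at
LIMIT  1;
```

One caveat worth noting: the planner may evaluate `pg_try_advisory_xact_lock` against more rows than the query ultimately returns, acquiring locks on rows that are then discarded.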

So, how do advisory locks stack up against SKIP LOCKED?

  • Complexity: Advisory locks are more complex. The application is responsible for managing the lock lifecycle, including ensuring locks are always released. A bug could lead to a “stuck” lock that requires manual intervention. SKIP LOCKED ties the lock directly to the row and the transaction, so it’s managed automatically by the database.
  • Safety: SKIP LOCKED is safer. Because the lock is tied to the transaction, a worker crash guarantees the lock is released. With session-level advisory locks, a crashed worker could hold a lock until the database connection times out, which could be a very long time.
  • Performance: For this specific use case, SKIP LOCKED is generally more performant as it’s a native, engine-level feature designed for this purpose.

For building a database-backed task queue, FOR UPDATE SKIP LOCKED is the idiomatic, simpler, and more robust solution in modern PostgreSQL.

Performance Considerations and Best Practices

To ensure your queue runs at peak performance, follow these best practices:

  • Create the Right Index: This is the most critical factor for queue performance. Your jobs table must have a composite index on the columns used for finding work. For example, an index on (status, created_at) would allow PostgreSQL to very quickly find the oldest pending jobs.
  • Order Your Queue: Use an ORDER BY clause in your query to define the order in which jobs are processed (e.g., ORDER BY created_at for a FIFO queue, or ORDER BY priority, created_at for a priority queue).
  • Partition Large Tables: If your jobs table grows to hundreds of millions of rows, consider using PostgreSQL’s native table partitioning. You could partition by date or status, which can significantly improve query performance and make maintenance easier.
  • Monitor Your Queue: You can’t optimize what you can’t see. Continuously monitor key queue metrics:
    • Queue Depth: The number of jobs with status = 'pending'. A constantly growing number indicates your workers can’t keep up.
    • Worker Throughput: The rate of message processing (jobs moved to completed or failed per second).
    • Lock Contention: Even with SKIP LOCKED, monitoring overall database lock contention is a good health check.
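A sketch of the indexing and monitoring advice above. The partial-index variant and the `completed_at` column are assumptions, not requirements from the article:

```sql
-- A partial index covers only the rows workers actually search,
-- so it stays small no matter how many completed jobs accumulate.
CREATE INDEX jobs_pending_idx
    ON jobs (created_at)
    WHERE status = 'pending';

-- Queue depth: how many jobs are waiting right now?
SELECT count(*) AS queue_depth
FROM   jobs
WHERE  status = 'pending';

-- Throughput over the last minute (assumes a completed_at column
-- set when a job finishes).
SELECT count(*) AS completed_last_minute
FROM   jobs
WHERE  status = 'completed'
  AND  completed_at > now() - interval '1 minute';
```

The partial index is a common alternative to the composite (status, created_at) index: because the dequeue query only ever filters on `status = 'pending'`, indexing just those rows keeps lookups fast as the table grows.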

By combining the power of FOR UPDATE SKIP LOCKED with smart indexing and proactive monitoring, you can build a highly concurrent, reliable, and scalable job queue directly within the database you already trust, without adding another piece of infrastructure to your stack.

Netdata’s auto-discovery and detailed PostgreSQL dashboards make it easy to monitor these critical queue metrics in real-time, helping you fine-tune your worker pool and ensure your background job processing runs smoothly. Try Netdata for free today.