One of the most common and powerful patterns in modern application development is the job queue. Whether you’re sending emails, processing images, or running complex calculations, offloading tasks to background workers is essential for building responsive and scalable systems. Many developers reach for dedicated queueing software like RabbitMQ or Redis, but for many use cases, your primary PostgreSQL database already has all the tools you need to build a robust, transactional, and incredibly performant job queue.
The challenge, however, lies in coordination. How do you allow multiple concurrent consumers (workers) to safely grab jobs from a table without them all trying to work on the same task, or worse, getting stuck in a traffic jam of database locks? The naive approach often leads to race conditions, while traditional locking (`FOR UPDATE`) can serialize your workers, defeating the purpose of parallelism.
This is where a powerful but often overlooked PostgreSQL feature comes into play: `FOR UPDATE SKIP LOCKED`. This single clause provides an elegant and highly efficient mechanism to create a deadlock-free queue, allowing your workers to operate in parallel with minimal contention.
The Classic Job Queue Problem: A Recipe for Contention
Let’s imagine a simple `jobs` table with a `status` column (`pending`, `processing`, `completed`). The most basic worker logic looks something like this:
- Select a Job: Find a job where the status is ‘pending’.
- Update the Status: Change the job’s status to ‘processing’ to mark it as taken.
- Process the Job: Perform the actual work.
The problem occurs between steps 1 and 2. If two workers run the `SELECT` query at nearly the same time, they will both see the same ‘pending’ job. They will then both try to `UPDATE` it, leading to either duplicate processing (a disaster for non-idempotent tasks) or one worker’s update failing.
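To make this concrete, here is a minimal sketch of such a table and the naive two-step logic. The schema (column names and types) is illustrative, not taken from any particular application:

```sql
-- Illustrative jobs table for the examples that follow.
CREATE TABLE jobs (
    id         bigserial PRIMARY KEY,
    status     text NOT NULL DEFAULT 'pending',   -- pending | processing | completed | failed
    payload    jsonb,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Naive worker logic: two separate statements with a race window between them.
SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 1;
-- ...a second worker can run the same SELECT here and see the same row...
UPDATE jobs SET status = 'processing' WHERE id = $1;   -- $1 = id returned by the SELECT
```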
A more robust approach uses pessimistic locking with `FOR UPDATE`. The worker’s query would look for a pending job and lock the row immediately. When a second worker tries to select that same row, the database forces it to wait until the first worker’s transaction is complete. This prevents duplicate work but introduces a new bottleneck. Your workers, which are supposed to run in parallel, now form a “convoy,” waiting in a single-file line for the first worker to finish. This leads to high blocking time and poor queue performance.
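A sketch of the locking variant against the same illustrative table:

```sql
BEGIN;
-- Correct, but every other worker now queues up behind this row.
SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE;   -- a second worker running the same query blocks here until COMMIT
-- ...mark the job as processing, do the work, then COMMIT...
```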
The Elegant Solution: FOR UPDATE SKIP LOCKED
Introduced in PostgreSQL 9.5, `SKIP LOCKED` is a modifier for `FOR UPDATE` that completely changes its behavior. When a query with `SKIP LOCKED` tries to acquire a row-level lock on a row that is already locked by another transaction, it doesn’t wait. Instead, it simply skips that row as if it didn’t exist and moves on to the next one that meets the `WHERE` clause criteria.
This is the perfect primitive for a Postgres worker queue. Each worker can now execute a query to find and lock the first available pending job.
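In its simplest form, that query looks like this (reusing the illustrative schema from above):

```sql
-- Each worker gets the oldest pending row that no other transaction has locked.
SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
```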
A highly efficient dequeue operation can be built into a single, atomic query. This query finds a pending job, locks it (skipping any that are already locked by other workers), updates its status to ‘processing’, and returns the job’s data to the worker application all in one go. Because this happens in a single statement, it’s incredibly fast and eliminates any chance of a race condition between selecting and updating.
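One common way to express this, sketched against the same assumed schema, is an `UPDATE` wrapped around the locking subquery:

```sql
-- Atomic dequeue: find, lock, and mark a job 'processing' in one statement.
UPDATE jobs
SET status = 'processing'
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;   -- hand the job's data back to the worker
```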
Building a Robust Worker
The full workflow for a worker using this transactional queue model would be:
- Begin Transaction: The worker starts a new database transaction.
- Dequeue a Job: The worker executes the atomic `SKIP LOCKED` query.
- Check for Work: If the query returns a job, the worker proceeds. If it returns nothing (meaning all pending jobs are currently locked by other workers), the worker can sleep for a short period before trying again.
- Process the Job: The worker performs the business logic using the data from the dequeued job.
- Finalize the Job:
  - On success, the worker updates the job’s status to `completed`.
  - On failure, it updates the status to `failed` and potentially records the error message. This is a good place to implement retry logic, perhaps by incrementing a retry count and setting the status back to `pending` if the count is below a threshold (see the sketch after this list).
- Commit Transaction: The worker commits the transaction, releasing the lock and making the final status change permanent.
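A sketch of the finalize step with simple retry logic. The `retry_count` and `last_error` columns, and the threshold of 5, are hypothetical additions to the example schema:

```sql
-- On success: mark the job done.
UPDATE jobs SET status = 'completed' WHERE id = $1;

-- On failure: record the error and either requeue or give up.
-- Assumes extra columns: retry_count int NOT NULL DEFAULT 0, last_error text.
UPDATE jobs
SET retry_count = retry_count + 1,
    last_error  = $2,
    status      = CASE WHEN retry_count + 1 < 5    -- arbitrary retry threshold
                       THEN 'pending'              -- put it back in the queue
                       ELSE 'failed'
                  END
WHERE id = $1;
```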
If the worker process crashes for any reason, the database automatically rolls back the transaction. The lock is released, and the job’s status reverts to ‘pending’, making it available for another worker to pick up.
SKIP LOCKED vs. Advisory Locks
Before `SKIP LOCKED` was available, a common pattern for building a Postgres queue involved using advisory locks. An advisory lock is a cooperative lock that developers manage manually in their application code. A worker would try to acquire a lock using a function call. If successful, it would process the job; if not, it would move on.
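A rough sketch of that older pattern using PostgreSQL’s built-in `pg_try_advisory_lock` function, keyed on the job id. This is illustrative only; in practice the planner may evaluate the lock function against more rows than the `LIMIT` suggests, which is one of the sharp edges of this approach:

```sql
-- Try to claim a pending job by taking a session-level advisory lock on its id.
-- pg_try_advisory_lock returns true immediately if the lock was acquired.
SELECT id, payload
FROM jobs
WHERE status = 'pending'
  AND pg_try_advisory_lock(id)
LIMIT 1;

-- The application must remember to release the lock when the job is done:
SELECT pg_advisory_unlock($1);   -- $1 = the id claimed above
```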
So, how do advisory locks stack up against `SKIP LOCKED`?
- Complexity: Advisory locks are more complex. The application is responsible for managing the lock lifecycle, including ensuring locks are always released. A bug could lead to a “stuck” lock that requires manual intervention. `SKIP LOCKED` ties the lock directly to the row and the transaction, so it’s managed automatically by the database.
- Safety: `SKIP LOCKED` is safer. Because the lock is tied to the transaction, a worker crash guarantees the lock is released. With session-level advisory locks, a crashed worker could hold a lock until the database connection times out, which could be a very long time.
- Performance: For this specific use case, `SKIP LOCKED` is generally more performant, as it’s a native, engine-level feature designed for this purpose.
For building a task queue in the database, `FOR UPDATE SKIP LOCKED` is the idiomatic, simpler, and more robust solution in modern PostgreSQL.
Performance Considerations and Best Practices
To ensure your queue runs at peak performance, follow these best practices:
- Create the Right Index: This is the most critical factor for queue performance. Your `jobs` table must have a composite index on the columns used for finding work. For example, an index on `(status, created_at)` would allow PostgreSQL to very quickly find the oldest pending jobs (see the sketch after this list).
- Order Your Queue: Use an `ORDER BY` clause in your query to define the order in which jobs are processed (e.g., `ORDER BY created_at` for a FIFO queue, or `ORDER BY priority, created_at` for a priority queue).
- Partition Large Tables: If your `jobs` table grows to hundreds of millions of rows, consider using PostgreSQL’s native table partitioning. You could partition by date or status, which can significantly improve query performance and make maintenance easier.
- Monitor Your Queue: You can’t optimize what you can’t see. Continuously monitor key queue metrics:
  - Queue Depth: The number of jobs with `status = 'pending'`. A constantly growing number indicates your workers can’t keep up.
  - Worker Throughput: The rate of message processing (jobs moved to `completed` or `failed` per second).
  - Lock Contention: Even with `SKIP LOCKED`, monitoring overall database lock contention is a good health check.
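For example, a partial index tailored to the dequeue query, plus a simple queue-depth check (both against the illustrative schema from earlier):

```sql
-- A partial index covers only pending rows, which stay a small fraction
-- of the table once jobs start completing.
CREATE INDEX idx_jobs_pending ON jobs (created_at) WHERE status = 'pending';

-- Alternatively, a composite index across all statuses:
-- CREATE INDEX idx_jobs_status_created ON jobs (status, created_at);

-- Queue depth: how many jobs are waiting right now?
SELECT count(*) AS queue_depth FROM jobs WHERE status = 'pending';
```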
By combining the power of `FOR UPDATE SKIP LOCKED` with smart indexing and proactive monitoring, you can build a highly concurrent, reliable, and scalable job queue directly within the database you already trust, without adding another piece of infrastructure to your stack.
Netdata’s auto-discovery and detailed PostgreSQL dashboards make it easy to monitor these critical queue metrics in real-time, helping you fine-tune your worker pool and ensure your background job processing runs smoothly. Try Netdata for free today.