$ guides / postgres / postgres-vacuum-full-vs-pg-repack ▌

Operations Guides

PostgreSQL VACUUM FULL vs pg_repack: choosing the right bloat fix

Table bloat is the gap between logical data size and on-disk consumption. Heavy UPDATE and DELETE activity leaves dead tuples in the heap. Autovacuum reclaims those tuples for reuse but does not shrink the underlying file. Once sequential scans slow and disk space vanishes, operators must choose between VACUUM FULL and pg_repack. The wrong choice locks a table for hours, exhausts disk mid-operation, or leaves orphaned triggers behind. This guide explains how each tool rewrites the heap, what locks it takes, and how to pick the right one.

What bloat is and why VACUUM alone is not enough

Regular VACUUM marks dead tuples as free space and updates the visibility map, but the on-disk file size does not shrink. Only a full table rewrite returns space to the OS. See PostgreSQL dead tuples piling up: why autovacuum can’t keep up for the autovacuum failure model. When dead tuples outpace vacuum, the table swells until the only remedy is to rebuild the heap and indexes. VACUUM FULL and pg_repack both produce a compact new table; the difference is availability, locking, and prerequisites.

How VACUUM FULL reclaims space

VACUUM FULL rewrites the table heap from scratch, copying live tuples into a new file and rebuilding all indexes. It acquires an AccessExclusiveLock on the table for the entire duration. No SELECT, INSERT, UPDATE, or DELETE can run against the table; queries referencing it queue until the lock releases. The operation returns freed space to the file system. Because PostgreSQL builds the new copy before dropping the old one, ensure free disk space roughly equal to the current table plus indexes. For a 500 GB table with 300 GB of indexes, plan for at least 800 GB of free space. Duration scales with table and index size; large tables can take hours. VACUUM FULL is simplest when you can afford complete unavailability or when the table lacks a primary key or unique NOT NULL index. It also works for small tables where the lock lasts seconds, or for emergency space reclamation after autovacuum has been disabled.

How pg_repack reclaims space online

pg_repack is an extension that rebuilds a table without holding an exclusive lock for more than a brief moment at the end. It creates a shadow table, rebuilds indexes on the shadow, and installs a trigger on the original table to capture concurrent DML into a log table. It copies rows from the original to the shadow in batches. Once the bulk copy finishes, it replays logged changes, acquires an AccessExclusiveLock only for the final relfilenode swap, and drops the old table. The final lock typically lasts milliseconds.

Because pg_repack must uniquely identify rows to replay changes, it requires a primary key or a unique index on a NOT NULL column. Tables without one must have a constraint added first, or use VACUUM FULL instead. Disk space requirements are roughly twice the target table plus indexes, plus space for the DML logged during the copy phase. On a heavily updated table, the trigger log can grow significantly, so monitor disk space during the operation.

Operational constraints that drive the choice

Disk space requirements

Both VACUUM FULL and pg_repack need enough free space to hold a second copy of the table and indexes while the original remains. Ensure at least 2x the target table size is available on the data volume before starting. If disk space is insufficient, the operation fails partway through. VACUUM FULL may leave temporary relations behind if the backend crashes. pg_repack may leave the shadow table and trigger in place if it fails; use the tool’s own cleanup mechanism rather than sending a hard SIGTERM. pg_repack carries additional risk on busy tables: if writes are heavy during the copy phase, the trigger log table grows beyond the 2x estimate.

Lock behavior and blocking

VACUUM FULL holds AccessExclusiveLock from start to finish. Every query that touches the table waits or times out. pg_repack holds ShareUpdateExclusiveLock during the bulk of the copy, allowing reads and writes. It upgrades to AccessExclusiveLock only for the final swap. The danger is that if a long-running transaction holds a lock on the source table when pg_repack attempts the swap, the upgrade stalls until that transaction completes. A pg_repack that appears almost done can sit waiting indefinitely. Before starting, identify long-running transactions:

-- Find transactions that could block the final swap
SELECT pid, usename, state, xact_start, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND xact_start < NOW() - INTERVAL '5 minutes'
ORDER BY xact_start;

After the copy phase, watch pg_locks for granted = false on the target table.

pg_repack prerequisites and gotchas

pg_repack requires a primary key or a unique NOT NULL index. Partitioned tables must be processed one child partition at a time . The trigger on the source table adds overhead for every DML operation during the copy phase. On extremely high-write tables, this can degrade application throughput. Parallel index build options can speed up a full-table repack, but they do not change locking behavior. If you must cancel pg_repack, avoid SIGTERM; use the tool’s cleanup mechanism to avoid leaving orphaned triggers and temporary objects.

PostgreSQL 19 native REPACK

PostgreSQL 19 introduces a native REPACK command. The initial implementation behaves like VACUUM FULL regarding locking, holding AccessExclusiveLock for the full duration. A planned REPACK CONCURRENTLY variant uses a logical replication slot instead of triggers to capture concurrent changes, holding ShareUpdateExclusiveLock during the copy and only AccessExclusiveLock at the final swap. REPACK CONCURRENTLY has strict limitations: it cannot be used on UNLOGGED tables, partitioned tables, system catalogs, or TOAST tables; it requires a primary key or index-based replica identity; it consumes one replication slot; only one REPACK process can run system-wide at a time; and there is a documented risk of deadlock or abort during the final lock upgrade.

flowchart TD
    A[Table bloat needs fixing] --> B{PK or unique index?}
    B -->|No| C[VACUUM FULL]
    B -->|Yes| D{Can lock for hours?}
    D -->|Yes| C
    D -->|No| E[pg_repack]
    E --> F{Disk >= 2x table?}
    F -->|No| G[Add disk first]
    F -->|Yes| H[Run pg_repack]
    C --> I[AccessExclusiveLock
for full duration]
    H --> J[Brief lock at swap
only]

Choosing between VACUUM FULL and pg_repack

Situation	Use VACUUM FULL	Use pg_repack
No PK or unique NOT NULL index	Only option	Cannot run
Maintenance window allows hours of exclusive lock	Preferred	Not necessary
Zero-downtime online fix required	Wrong tool	Required
Free disk < 2x table + indexes	Neither is safe	Neither is safe
Small table (< 1 GB)	Simple and fast	Overhead not worth it
High write rate during operation	No trigger overhead	Trigger may slow writes
Partitioned table	Target child only	Target child only

VACUUM FULL fits small tables where the exclusive lock lasts seconds, tables that lack a primary key, or emergency space reclamation during a maintenance window when the application is offline. It has no extension dependencies and no trigger overhead. pg_repack fits large production tables that cannot tolerate extended downtime. It is the standard tool for online bloat removal.

pg_squeeze is a third alternative that uses logical decoding instead of triggers to track changes during a rebuild. It avoids trigger overhead and long locks on system catalogs, but it is newer and less battle-tested than pg_repack. It requires PostgreSQL 14+ and is not universally available on managed platforms. For most production environments today, pg_repack remains the default choice for online reorganization. Keep pg_repack installed and tested before bloat becomes critical; discovering you need it during a disk-space emergency is the wrong time to install an extension for the first time.

Signals to watch during and after a rewrite

Signal	Why it matters	Warning sign
`pg_stat_progress_vacuum`	Tracks VACUUM FULL completion	Stuck at same value for > 30 minutes
Free disk space on data volume	Both tools write a full second copy	Drops below 20% during operation
`pg_locks` wait queue	Detects blocker for pg_repack final swap	`granted = false` on table lock
`pg_stat_user_tables.n_dead_tup`	Confirms bloat was actually removed	Dead tuple ratio still > 20% after completion
Table physical size	Verifies space returned to OS	Size reduction < expected

How Netdata helps

Correlate table bloat trends with n_dead_tup ratios over time to spot tables that outgrow autovacuum capacity.
Alert on disk utilization for PostgreSQL data volumes before a rewrite starts.
Track pg_stat_activity for long-running transactions that would block pg_repack’s final lock upgrade.
Monitor age(datfrozenxid) to distinguish bloat from wraparound pressure, which needs a different response.
Alert on replication slot consumption when testing PostgreSQL 19 REPACK CONCURRENTLY.

The Netdata solution

PostgreSQL monitoring with Netdata

Netdata monitors PostgreSQL with per-second metrics, pre-built dashboards, and ML-powered anomaly detection. Correlate connection saturation, lock waits, autovacuum progress, replication lag, and checkpoint I/O against the rest of your stack so you catch the incidents in these runbooks before they page anyone.

See PostgreSQL monitoring → Start monitoring free

PostgreSQL VACUUM FULL vs pg_repack: choosing the right bloat fix

PostgreSQL VACUUM FULL vs pg_repack: choosing the right bloat fix

What bloat is and why VACUUM alone is not enough

How VACUUM FULL reclaims space

How pg_repack reclaims space online

Operational constraints that drive the choice

Disk space requirements

Lock behavior and blocking

pg_repack prerequisites and gotchas

PostgreSQL 19 native REPACK

Choosing between VACUUM FULL and pg_repack

Signals to watch during and after a rewrite

How Netdata helps

Related guides

PostgreSQL monitoring with Netdata