PostgreSQL VACUUM: Maintenance and Optimization

Keep your PostgreSQL database healthy and fast.

Why VACUUM Exists

PostgreSQL uses a multi-version concurrency control (MVCC) system. When you UPDATE or DELETE a row, PostgreSQL doesn't immediately remove the old data—it marks it as "dead" and keeps it around for active transactions that might still need it.

Over time, these dead rows accumulate. VACUUM cleans them up.

SQL
-- This UPDATE creates a dead row (the old version)
UPDATE users SET name = 'Alice Updated' WHERE id = 1;
-- The old "Alice" row is now dead but still taking space

Basic VACUUM

Run VACUUM on a specific table:

SQL
VACUUM users;

Or all tables in the database:

SQL
VACUUM;

This marks dead rows as reusable space but doesn't return space to the operating system.

VACUUM FULL

For serious space reclamation:

SQL
VACUUM FULL users;

VACUUM FULL rewrites the entire table, removing all dead space and shrinking the file. But there's a catch:

Warning: VACUUM FULL locks the table exclusively. No reads or writes during the operation. For large tables, this can take hours. Only use this during maintenance windows.

VACUUM ANALYZE

Combine cleanup with statistics gathering:

SQL
VACUUM ANALYZE users;

ANALYZE updates statistics that the query planner uses. After bulk data changes, this helps PostgreSQL choose better query plans.

Autovacuum: Let PostgreSQL Handle It

PostgreSQL runs autovacuum in the background. It automatically vacuums tables when dead rows exceed thresholds.

Check if autovacuum is working:

SQL
SELECT relname, last_autovacuum, last_autoanalyze,
       n_dead_tup, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Tuning Autovacuum

Default settings are conservative. For high-write tables, you might need to tune:

SQL
-- Make autovacuum more aggressive for a specific table
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.05,  -- Vacuum when 5% dead (default 20%)
  autovacuum_analyze_scale_factor = 0.02  -- Analyze when 2% changed
);

Or globally in postgresql.conf:

autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
autovacuum_vacuum_cost_delay = 10ms

When to Run Manual VACUUM

Autovacuum handles most cases, but run manual VACUUM after:

  1. Bulk deletes — Deleted millions of rows? Don't wait for autovacuum.
SQL
DELETE FROM logs WHERE created_at < '2023-01-01';
VACUUM logs;
  1. Bulk updates — Each update creates dead rows.
SQL
UPDATE orders SET status = 'archived' WHERE created_at < '2023-01-01';
VACUUM orders;
  1. After initial data load — Fresh tables need statistics.
SQL
COPY users FROM '/data/users.csv' CSV HEADER;
VACUUM ANALYZE users;

Monitoring Bloat

Dead rows cause "bloat"—tables larger than they need to be. Check table bloat:

SQL
SELECT
  schemaname || '.' || relname AS table_name,
  pg_size_pretty(pg_relation_size(relid)) AS table_size,
  n_dead_tup AS dead_rows,
  n_live_tup AS live_rows,
  ROUND(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

If dead_pct is consistently high (>20%), autovacuum might not be keeping up.

VACUUM vs VACUUM FULL vs pg_repack

| Method | Locks Table? | Reclaims Disk Space? | Speed | |--------|-------------|---------------------|-------| | VACUUM | No | No (marks reusable) | Fast | | VACUUM FULL | Yes (exclusive) | Yes | Slow | | pg_repack | No | Yes | Medium |

pg_repack is an extension that reclaims space without exclusive locks—best of both worlds for production systems.

SQL
-- Install extension
CREATE EXTENSION pg_repack;

-- Repack a table (run from command line)
-- pg_repack -d mydb -t users

Transaction ID Wraparound

PostgreSQL uses 32-bit transaction IDs. After ~2 billion transactions, it needs to "freeze" old rows to prevent wraparound. VACUUM handles this.

Check how close you are:

SQL
SELECT datname,
       age(datfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age') AS freeze_max
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

If xid_age approaches freeze_max, PostgreSQL will force aggressive vacuuming—or in extreme cases, shut down to prevent corruption.

Quick Reference

| Command | Description | |---------|-------------| | VACUUM table | Clean dead rows, mark space reusable | | VACUUM FULL table | Rewrite table, reclaim disk space (locks!) | | VACUUM ANALYZE table | Clean + update statistics | | ANALYZE table | Update statistics only | | VACUUM VERBOSE table | Show detailed progress |

Best Practices

  1. Trust autovacuum — It handles most cases. Don't disable it.

  2. Tune for hot tables — High-write tables need lower thresholds.

  3. Monitor bloat — Check pg_stat_user_tables regularly.

  4. VACUUM after bulk operations — Don't wait for autovacuum after large deletes/updates.

  5. Avoid VACUUM FULL in production — Use pg_repack if you need space reclamation without downtime.

  6. Watch for wraparound warnings — Don't ignore "preventing transaction ID wraparound" messages.

Summary

VACUUM is essential PostgreSQL maintenance that runs automatically via autovacuum. For most databases, you just need to monitor it's working. For high-write workloads, tune autovacuum thresholds. After bulk operations, run manual VACUUM. And if you need to reclaim disk space without downtime, use pg_repack instead of VACUUM FULL.