Sale: Use codesave50for 50% off
Back to Blog

How to Fix Slow Postgres Queries

JayJay

Postgres rarely gets slow in a single dramatic moment. It degrades. A query that ran in 4ms last quarter starts taking 40ms, then 400ms, then it shows up at the top of an oncall page when a marketing campaign triples traffic. Nothing changed in the code. The query plan flipped because the row count crossed a threshold, or autovacuum fell behind, or someone added a BEFORE UPDATE trigger that turned a HOT update into a heap rewrite.

That slow-creep failure mode is what makes Postgres performance work different from chasing a 500ms log spike in MySQL. The good news: the diagnostic tooling is excellent, almost all of it ships with the database, and most slow queries fall into a handful of recognizable shapes. This guide is the long version of the playbook: every tool worth installing, the EXPLAIN output worth memorizing, the index types worth knowing, and the configuration knobs that change query plans.

Why Postgres queries get slow

Three things drive almost every slow-query report:

  1. The planner picked the wrong plan. This is usually because statistics are stale, or because the planner's cost model doesn't reflect your hardware (it assumes spinning disks unless you tell it otherwise).

  2. The plan is fine, but the table is full of garbage. Postgres uses MVCC, which means every UPDATE and DELETE leaves a dead tuple. If autovacuum can't keep up, queries spend time skipping over dead rows and indexes balloon.

  3. The query is doing something the planner can't optimize. A function on an indexed column, an OR across two columns, a deep OFFSET, a regex without a leading anchor. These read fine but force sequential scans.

The diagnostic process is figuring out which of those three you're hitting, then either changing the plan, fixing the storage, or rewriting the query. The rest of this post walks through the tools and techniques for each.

Step 1: Build a real diagnostic stack

You can't fix what you can't see. Postgres ships with deep instrumentation, but most of it is off by default. A production Postgres installation worth keeping should have at least these turned on.

pg_stat_statements

This is non-negotiable. pg_stat_statements records aggregated execution stats for every query the server has run since startup or stats reset. It is the single best tool for finding which queries are worth your attention.

Enable it in postgresql.conf:

INI
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000
pg_stat_statements.track_planning = on   # PG 13+

Restart, then in each database you want to track:

SQL
CREATE EXTENSION pg_stat_statements;

The view itself has a lot of columns. The ones worth keeping in muscle memory:

SQL
SELECT
  substring(query, 1, 80) AS query_preview,
  calls,
  round(total_exec_time::numeric, 1) AS total_ms,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_total,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Sort by total_exec_time, not mean_exec_time. A query that runs 100,000 times at 5ms is a much bigger problem than one that runs once at 5 seconds. The 5-second query is annoying. The 100,000x5ms query is what's eating your CPU.

A few less-obvious tricks:

  • The track_planning flag (Postgres 13 and later) records planning time as well as execution time. If you see a query where total_plan_time rivals total_exec_time, you have a prepared-statement plan-cache problem or a complex view that's expensive to plan. JIT can also blow up planning time on simple queries; check mean_plan_time.
  • Reset stats deliberately. SELECT pg_stat_statements_reset(); clears the slate. Run it before a deployment, then look 24 hours later to see what the new workload looks like.
  • The queryid column is stable across resets (since PG 14). Use it to track a specific query's performance across deploys without depending on text matching.

pg_stat_statements has overhead, but it's small. The cost of not running it in production is much higher.

log_min_duration_statement

pg_stat_statements aggregates. The log captures individual slow queries with their parameter values, which is what you need to reproduce a problem.

INI
log_min_duration_statement = 1000      # ms; log anything slower
log_line_prefix = '%m [%p] %u@%d %a '
log_statement_sample_rate = 1.0        # PG 13+
log_min_duration_sample = 100          # PG 13+
log_statement_sample_rate = 0.1        # PG 13+

The Postgres 13 additions matter. log_min_duration_sample plus log_statement_sample_rate let you sample queries in a window faster than log_min_duration_statement. So you can log every query above 1 second, plus 10% of queries above 100ms. Without sampling, lowering log_min_duration_statement on a busy server fills the disk in hours.

auto_explain

auto_explain logs the actual execution plan for every slow query, so you don't have to reproduce the problem on your laptop to see what the planner did.

INI
shared_preload_libraries = 'pg_stat_statements,auto_explain'
auto_explain.log_min_duration = 2000
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_format = 'json'
auto_explain.log_nested_statements = on

A warning that the manual buries: auto_explain.log_analyze = on adds measurable overhead because it runs the query through the instrumentation path. Use a high log_min_duration (a few seconds), or run with log_analyze = off and only flip it on temporarily when you're hunting something specific.

JSON format is the right default. It's painful to read in the log but trivial to parse with pgbadger or pgMustard.

pgbadger

Built-in tools give you the data. pgbadger turns it into a readable report. It's a Perl script that parses the Postgres log and produces an HTML report with the top queries, the time distribution, the most frequent errors, the connection patterns, and a dozen other things. Install it once and run it nightly:

BASH
pgbadger /var/log/postgresql/postgresql-*.log -o report.html

There's no good reason not to have this on a cron. It costs nothing to run on the log files you're already producing.

pgHero

Where pgbadger is a log analyzer, pgHero is a live dashboard with opinionated recommendations. It's a Rails app you point at your Postgres instance. The two features worth it on their own:

  1. Unused indexes. It queries pg_stat_user_indexes and shows you indexes with zero scans. Caveat: on a replica, this is unreliable. Drop based on stats from the primary.
  2. Slow query suggestions. It reads pg_stat_statements and proposes indexes.

The suggestions aren't always right (they don't know your write patterns or partial-index opportunities), but they're a useful prompt.

auto_explain plus pgMustard or explain.dalibo.com

pgMustard is a paid SaaS that takes an EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) plan and tells you what's wrong with it in plain English. It's $50 a month per seat. Worth it for teams who don't have a Postgres expert. For the same job free, paste the plan into explain.dalibo.com or explain.depesz.com. Dalibo's renders the plan as a flame graph, which makes the hot spots obvious. Depesz's is more compact and easier to read on small plans.

If you're going to pick one to learn first, learn explain.dalibo.com. The flame graph format clicks fast.

pg_stat_kcache

For attribution beyond what Postgres sees: which queries are causing actual disk I/O versus reading from the OS page cache, and which queries are CPU-bound. It hooks into Linux getrusage and exposes the per-query stats. Less common but invaluable when you're trying to figure out whether to add RAM or add indexes.

pg_stat_activity for real-time

For the "something is wrong right now" view, nothing beats pg_stat_activity. The view shows every backend, what query it's running, how long it's been there, what it's waiting on, and what's blocking it.

A query I keep in my back pocket:

SQL
SELECT
  pid,
  now() - pg_stat_activity.query_start AS duration,
  wait_event_type,
  wait_event,
  state,
  substring(query, 1, 80) AS query
FROM pg_stat_activity
WHERE state != 'idle'
  AND (now() - pg_stat_activity.query_start) > interval '5 seconds'
ORDER BY duration DESC;

That gives you every active query running longer than 5 seconds, sorted by duration. The wait_event column is gold: Lock means you have a blocker, IO:DataFileRead means you're hitting disk, IPC: events usually mean parallel query coordination.

Step 2: Read EXPLAIN like a power user

The base form of EXPLAIN is almost useless for performance work. It only shows estimates.

SQL
EXPLAIN SELECT * FROM orders
WHERE customer_id = 12345
AND status = 'pending';
Seq Scan on orders  (cost=0.00..12834.00 rows=23 width=120)
  Filter: ((customer_id = 12345) AND (status = 'pending'::text))

That cost number is in arbitrary units. The first number is the cost to return the first row, the second is the cost to return the last. The planner picks the plan with the lowest total cost. Knowing the absolute value isn't useful; knowing whether two plans have similar costs is.

What you want is EXPLAIN (ANALYZE, BUFFERS). ANALYZE executes the query and reports real timings. BUFFERS reports how many pages were read from the cache and from disk:

SQL
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE customer_id = 12345
AND status = 'pending';
Seq Scan on orders  (cost=0.00..12834.00 rows=23 width=120)
                    (actual time=0.052..2834.123 rows=23 loops=1)
  Filter: ((customer_id = 12345) AND (status = 'pending'::text))
  Rows Removed by Filter: 499977
  Buffers: shared hit=8200 read=4600
Planning Time: 0.234 ms
Execution Time: 2834.456 ms

This is a query you can fix in 30 seconds, and you can see exactly why it's slow. It scanned 500,000 rows to return 23. Buffers: shared read=4600 means 4,600 pages came from disk. At a typical 8KB page, that's 36MB of I/O for 23 rows.

The fields worth memorizing

What to look atWhat it tells you
actual time=A..BA is the time to first row, B is the time to last. A high A on the outer node means the inner work is expensive.
loops=NThe node ran N times. Common in nested loop joins. Multiply actual time by loops for total.
rows estimated vs actualIf they're off by 10x or more, statistics are wrong. Run ANALYZE.
Buffers: shared hit/readhit is buffer cache, read is disk. Lots of read on a hot query means it's not staying in cache.
Buffers: shared dirtied/writtenThe query is modifying pages. A read query with dirtied > 0 is doing setting hint bits, which is normal but worth knowing.
Heap FetchesOn Index Only Scan, this is how many times the visibility map missed and the query had to hit the heap anyway. High Heap Fetches means VACUUM hasn't been keeping up.
Workers Planned vs LaunchedIf they differ, you hit max_parallel_workers_per_gather or max_parallel_workers.
Sort Method: quicksort Memory: NSort fit in work_mem.
Sort Method: external merge Disk: NSort spilled to disk. Bad. Bump work_mem for this query.

Estimation errors are the most common bug

If the planner thinks a WHERE clause will return 100 rows and the real answer is 100,000, it will pick a nested loop join. That nested loop is fine for 100 rows and disastrous for 100,000. Nine times out of ten, this is what "the planner picked a bad plan" comes down to.

To check, look for nodes where rows=N (estimate) is far off from actual rows=M. Postgres can be wrong by orders of magnitude on:

  • Correlated columns. If country = 'US' AND state = 'CA' is hugely more selective than the product of their independent selectivities, the planner doesn't know unless you tell it. Fix: extended statistics.
    SQL
    CREATE STATISTICS orders_country_state (dependencies, ndistinct)
      ON country, state FROM orders;
    ANALYZE orders;
    
  • LIKE with patterns. The planner uses a fixed selectivity estimate for non-prefix patterns. Often badly wrong.
  • JSONB containment. data @> '{...}' selectivity is hard. Add extended statistics or rely on pg_stat_statements to surface the actual hit rate.

Use ANALYZE BUFFERS in production audits, ANALYZE alone in dev

BUFFERS matters more on production-like data sizes. On a 100-row dev database every query is in cache and Buffers: shared hit=2 is the answer for everything. Run your analysis on production-shaped data, or against a real production read replica.

JIT can lie to you

In Postgres 11 and later with JIT enabled (the default on most distributions in Postgres 12+), EXPLAIN ANALYZE for short queries includes JIT compilation time in the total. For OLTP workloads, JIT is often a net loss because compilation time outweighs the runtime savings. Look for:

JIT:
  Functions: 12
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.345 ms, Inlining 0.000 ms, Optimization 1.234 ms, Emission 3.456 ms, Total 7.035 ms

7ms of JIT overhead on a 5ms query is a problem. The fix is to raise jit_above_cost so JIT only kicks in for plans where it might pay off, or disable JIT entirely for OLTP workloads:

INI
jit = off
# or, more surgical:
jit_above_cost = 500000
jit_inline_above_cost = 1000000
jit_optimize_above_cost = 1000000

When the visualizer earns its keep

For plans with more than ten nodes, the text format is a mess. Run the same query with FORMAT JSON, paste it into explain.dalibo.com, and the slow node lights up.

SQL
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT ...

The "Children Time" column on Dalibo is the killer feature. It tells you which node owns the time, accounting for the work done by its children. That's the thing you want to optimize.

Step 3: Index design that pays for itself

Most slow Postgres queries are missing an index. The hard part isn't adding one; it's adding the right one, and not adding the wrong one.

Postgres supports six index access methods. You'll use B-tree for 90% of cases, GIN for 8%, and the rest essentially never. Understanding which one to reach for is the difference between a 1000x speedup and an index that wastes disk.

B-tree: the default for a reason

B-tree handles =, <, >, BETWEEN, IN, prefix LIKE, ordering, and IS NULL. It's the default and almost always the right answer.

SQL
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

A few B-tree details that matter:

Sort order is part of the index. An index on (created_at DESC) is different from (created_at). They both answer equality queries, but the descending version is what you want when paginating recent records:

SQL
CREATE INDEX idx_orders_created_desc ON orders(created_at DESC);

-- This uses the index for the sort
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20;

Postgres can scan a B-tree backward at almost the same speed, so a single ascending index handles both directions for most workloads. The descending version matters when you're combining sort with another column:

SQL
CREATE INDEX idx_orders_customer_created
  ON orders(customer_id, created_at DESC);

-- Reads in index order, no Sort node
SELECT * FROM orders
WHERE customer_id = 12345
ORDER BY created_at DESC
LIMIT 20;

INCLUDE is for non-key columns. Postgres 11 added INCLUDE to make covering indexes explicit:

SQL
CREATE INDEX idx_orders_customer_covering
  ON orders(customer_id) INCLUDE (status, total);

The included columns aren't part of the index key. They live in the leaf pages so the query can satisfy the projection without hitting the heap. The plan shows Index Only Scan.

fillfactor for high-update tables. B-tree leaf pages are filled to 90% by default. On a table with frequent in-place updates, dropping the fillfactor leaves room for HOT (Heap-Only Tuple) updates, which avoid index churn:

SQL
ALTER INDEX idx_orders_customer_id SET (fillfactor = 80);
ALTER TABLE orders SET (fillfactor = 80);
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;

This is a niche optimization. Don't bother unless you've measured high index bloat on a write-heavy table.

GIN: for JSONB, arrays, and full-text

When you index a jsonb column or an array, you almost certainly want a GIN index. The default operator class works for general containment:

SQL
CREATE INDEX idx_events_data ON events USING GIN (data);

SELECT * FROM events WHERE data @> '{"type": "signup"}';

There's also jsonb_path_ops, which indexes only the paths and is smaller and faster for the @> operator specifically, at the cost of supporting fewer operators:

SQL
CREATE INDEX idx_events_data ON events USING GIN (data jsonb_path_ops);

If your queries are only @> containment (the common case), jsonb_path_ops is the better default.

For full-text search, the standard recipe:

SQL
CREATE INDEX idx_posts_search ON posts
  USING GIN (to_tsvector('english', body));

SELECT * FROM posts
WHERE to_tsvector('english', body) @@ to_tsquery('postgres & performance');

Or, if you want to index a generated column to avoid recomputing the tsvector:

SQL
ALTER TABLE posts ADD COLUMN body_tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
CREATE INDEX idx_posts_body_tsv ON posts USING GIN (body_tsv);

For trigram-based fuzzy and substring search, pg_trgm:

SQL
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING GIN (name gin_trgm_ops);

SELECT * FROM users WHERE name ILIKE '%smith%';

This is one of the few ways to get a non-prefix LIKE to use an index. It costs disk and CPU on writes, but for search-style queries it's transformative.

GiST: range types and geo

For range overlap queries (range_a && range_b), geometric types, and exclusion constraints, GiST is the right choice:

SQL
CREATE INDEX idx_bookings_range ON bookings USING GiST (room_id, during);

-- "Find any booking that conflicts with this room and time range"
SELECT * FROM bookings
WHERE room_id = 7 AND during && '[2026-05-21 10:00, 2026-05-21 11:00)'::tstzrange;

The classic use case is double-booking prevention via exclusion constraints, which need GiST under the hood.

BRIN: huge append-mostly tables

BRIN (Block Range INdex) stores a min/max for ranges of physical pages instead of one entry per row. On a billion-row time-series table where the data is roughly ordered by created_at, a BRIN index can be a few hundred kilobytes when the equivalent B-tree would be tens of gigabytes:

SQL
CREATE INDEX idx_events_created_brin
  ON events USING BRIN (created_at)
  WITH (pages_per_range = 32);

BRIN only works if your data is physically ordered by the indexed column. If rows are inserted in random order, BRIN is useless. For time-series data with INSERTs that mostly arrive in chronological order, it's hard to beat.

A common BRIN mistake: pairing BRIN with high-selectivity equality queries. BRIN is bad at equality. It's great at "give me the rows in this hour-long window."

Hash indexes: rehabilitated but rarely worth it

Before Postgres 10, hash indexes weren't WAL-logged and broke after a crash with no warning. Now they work. They're slightly smaller and faster than B-tree for pure equality lookups on large keys. In practice, B-tree's penalty for "supports range queries too" is so small that the hash index rarely earns its place. Skip them unless you've measured.

SP-GiST: skip it

Specialized data structures for non-balanced trees (phone networks, IP ranges). You'll know if you need it.

Partial indexes: the underused win

If you only query a fraction of the rows, index only that fraction:

SQL
-- 95% of orders are 'shipped'; you only query the 5% that aren't
CREATE INDEX idx_orders_unshipped ON orders(customer_id)
  WHERE status != 'shipped';

The index is smaller (5% of the size), cheaper to update, and faster to scan. The query has to include the partial predicate for the planner to use it:

SQL
-- Planner uses idx_orders_unshipped
SELECT * FROM orders WHERE customer_id = 12345 AND status != 'shipped';

-- Cannot use it: status condition doesn't match
SELECT * FROM orders WHERE customer_id = 12345;

Partial indexes are one of the best tools Postgres gives you and most teams ignore them. Look for queries that always include a constant WHERE clause and replace the full-table index with a partial.

Expression indexes: the function-on-column escape hatch

A query that wraps an indexed column in a function can't use a normal index. The expression index fixes it:

SQL
-- Without an expression index, this is a sequential scan
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

CREATE INDEX idx_users_email_lower ON users(LOWER(email));

The expression has to match exactly. LOWER(email) = 'foo' uses it. email = LOWER('Foo') does not (and doesn't need to, because the comparison is on the raw column).

For jsonb field lookups, this is the way to index a specific path:

SQL
CREATE INDEX idx_events_user_id ON events ((data->>'user_id'));

SELECT * FROM events WHERE data->>'user_id' = '12345';

Concurrent index builds

Building an index locks writes by default. On a production table, that's not acceptable. Use CONCURRENTLY:

SQL
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

It takes longer (Postgres makes two passes), uses more I/O, and can leave behind an INVALID index if it fails. Always check after the build:

SQL
SELECT indexrelid::regclass, indisvalid FROM pg_index
WHERE indexrelid = 'idx_orders_customer_id'::regclass;

If indisvalid = false, drop it and try again. The CREATE INDEX documentation has a multi-paragraph warning about this. Read it before doing it in production.

Find and drop dead indexes

SQL
SELECT
  schemaname || '.' || relname AS table,
  indexrelname AS index,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size,
  idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelid NOT IN (
    SELECT conindid FROM pg_constraint WHERE conindid != 0
  )
ORDER BY pg_relation_size(indexrelid) DESC;

The conindid filter excludes indexes backing unique and primary key constraints, which you can't drop without breaking the constraint.

Two caveats before deleting:

  1. Read from the primary. Index stats on a replica don't include replicated queries.
  2. The window matters. A nightly batch report might use an index zero times for 23 hours. Reset stats and wait at least 24 hours, ideally a week.

pgHero automates this check.

Bloated indexes and pg_repack

After lots of updates, B-tree indexes accumulate dead entries that VACUUM can clean but not always reclaim space from. Symptom: the index is much bigger than its B-tree theoretical size, queries are slow, and pgstattuple reports a high dead-tuple percentage:

SQL
CREATE EXTENSION pgstattuple;
SELECT * FROM pgstatindex('idx_orders_customer_id');

The cheap fix is REINDEX CONCURRENTLY (PG 12+):

SQL
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;

For table-level bloat with downtime constraints, pg_repack is the standard tool. It rebuilds tables online by swapping pages behind a trigger.

Step 4: Rewrite queries the planner can optimize

A surprising fraction of slow queries are slow because they're written in a way that prevents Postgres from using indexes it has. These rewrites are mechanical.

SELECT * disables index-only scans

If you only need three columns, ask for three columns:

SQL
-- Hits the heap for every row
SELECT * FROM orders WHERE customer_id = 12345;

-- Can satisfy from a covering index
SELECT id, status, total FROM orders WHERE customer_id = 12345;

This is doubly important when the table has a TOASTed column (any column over ~2KB stored out-of-line). SELECT * reads the TOAST chain for every row even if you didn't want the field.

OFFSET pagination is broken past page 5

SQL
-- Reads 10,000 rows, throws them away, returns 20
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 10000;

It gets quadratically worse with depth. Use keyset pagination:

SQL
-- Index lookup, no scan
SELECT * FROM orders WHERE id > :last_seen_id ORDER BY id LIMIT 20;

If you need page numbers (a UX requirement, usually), the fix is hybrid: use keyset for "next page" and "previous page" navigation, accept that "jump to page 47" is slow.

Markus Winand's site use-the-index-luke.com has the definitive treatment of this. Read it once and stop writing OFFSET queries forever.

Functions on indexed columns

The most common form is date truncation:

SQL
-- Disables the index on created_at
SELECT * FROM orders WHERE DATE(created_at) = '2026-05-21';

-- Range query the index can use
SELECT * FROM orders
WHERE created_at >= '2026-05-21'
  AND created_at < '2026-05-22';

If you genuinely need the function form (because the query is generated and you can't change it), add an expression index.

The same applies to type casts:

SQL
-- If user_id is BIGINT, this cast disables the index
SELECT * FROM events WHERE user_id::text = '12345';

-- Fix the caller's parameter type instead
SELECT * FROM events WHERE user_id = 12345;

ORMs are a frequent offender here. Watch for ones that wrap parameters in unnecessary casts.

OR across columns and BitmapOr

Postgres can sometimes combine multiple indexes with BitmapOr, but the planner often picks a sequential scan instead when the selectivities are uncertain:

SQL
-- May or may not use both indexes
SELECT * FROM orders
WHERE customer_id = 12345 OR external_ref = 'abc-123';

The deterministic rewrite is UNION ALL:

SQL
SELECT * FROM orders WHERE customer_id = 12345
UNION ALL
SELECT * FROM orders
WHERE external_ref = 'abc-123' AND customer_id <> 12345;

If the planner is picking the right plan with OR, leave it. If EXPLAIN ANALYZE shows it falling back to Seq Scan, the UNION ALL form forces the issue.

Lateral joins for "top N per group"

A classic query: "give me the 5 most recent orders for each customer." A bare GROUP BY can't do this. The naive self-join is expensive. The right tool is LATERAL:

SQL
SELECT c.id, c.name, o.*
FROM customers c
CROSS JOIN LATERAL (
  SELECT * FROM orders o
  WHERE o.customer_id = c.id
  ORDER BY o.created_at DESC
  LIMIT 5
) o;

Combined with a (customer_id, created_at DESC) index, this is the fastest way to do top-N-per-group in Postgres.

CTEs are no longer an optimization fence (in PG 12+)

Before Postgres 12, a WITH clause acted as an optimization barrier: the CTE was always materialized as a temporary result. Some teams relied on this to force a specific plan shape. As of PG 12, the planner can inline simple CTEs, which is usually faster.

If you want to keep the old behavior:

SQL
WITH recent AS MATERIALIZED (
  SELECT * FROM orders WHERE created_at > now() - interval '1 day'
)
SELECT * FROM recent WHERE customer_id = 12345;

Conversely, NOT MATERIALIZED forces inlining. If you're upgrading from PG 11 or earlier and queries got slower after the upgrade, check whether you relied on CTE materialization.

EXISTS vs IN

In modern Postgres (10+), the planner treats IN (subquery) and EXISTS identically in almost all cases. Pick whichever reads more clearly. The old "always use EXISTS" advice is outdated.

The exception: NOT IN with a subquery that can return NULL is a footgun. It evaluates to UNKNOWN if any subquery row is NULL, which returns zero results with no error. Always use NOT EXISTS for the negation:

SQL
-- Subtle bug: if orders has any NULL customer_id, returns nothing
SELECT * FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);

-- Always correct
SELECT * FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

Step 5: Tune the configuration knobs that change query plans

Most of Postgres's defaults are conservative and tuned for a small machine. A real production box needs at least these set.

The single best starting point is pgtune.leopard.in.ua. It takes your hardware specs and workload type and outputs a config. Use it as a baseline, then tune based on what your workload does in practice.

shared_buffers: 25% of RAM

This is Postgres's in-process page cache. Above ~40% of RAM, you start fighting with the OS page cache for the same pages, and both lose. 25% is the standard recommendation and it holds up.

INI
shared_buffers = 8GB

On a 32GB instance, that's 8GB to Postgres, 24GB to the OS cache. Postgres reads pages through shared_buffers but the OS caches the underlying files, so the practical working set is the sum.

effective_cache_size: not memory, a hint

This setting tells the planner roughly how much memory is available across shared_buffers and the OS page cache. The planner uses it to decide between index and sequential scans. Set it to about half to three-quarters of total RAM:

INI
effective_cache_size = 24GB

This is the most misunderstood Postgres setting. It does not allocate memory. It's a number the planner uses to compute cost estimates. Setting it too low makes the planner avoid index scans; too high makes it overestimate index scan benefit.

work_mem: per-operation, not per-connection

Each Sort, Hash, or Materialize node gets up to work_mem of memory. Crucially, a single query with three sorts can use three times work_mem. A hundred concurrent queries with three sorts each can use 300x.

Set it conservatively as a default and bump it per-query when needed:

INI
work_mem = 16MB
SQL
-- For a specific session or transaction:
SET LOCAL work_mem = '256MB';

Symptom of low work_mem: EXPLAIN ANALYZE shows Sort Method: external merge Disk: 50000kB. That sort spilled. Bump work_mem for the query.

maintenance_work_mem: for VACUUM, CREATE INDEX, and ALTER TABLE

Set this generously. It's only used briefly and dramatically speeds up maintenance:

INI
maintenance_work_mem = 2GB

random_page_cost: drop to 1.1 on SSDs

The default random_page_cost = 4.0 assumes spinning disks where random I/O is much slower than sequential. On SSDs they're nearly equivalent. The default makes the planner avoid index scans:

INI
random_page_cost = 1.1
seq_page_cost = 1.0

If your database lives entirely in memory, you could go lower (random_page_cost = 1.0), but 1.1 is the safe default for SSD-backed cloud Postgres.

effective_io_concurrency: tell Postgres about your SSD

For Bitmap Heap Scans, Postgres can prefetch pages. The setting tells it how many concurrent prefetches your storage can handle:

INI
effective_io_concurrency = 200       # SSD
maintenance_io_concurrency = 100     # PG 13+

200 is the right ballpark for a single SSD. NVMe can go higher. Cloud disks vary; 200 is a safe default.

Parallelism settings

Modern Postgres can parallelize sequential scans, hash joins, and aggregates. The defaults are low:

INI
max_worker_processes = 8
max_parallel_workers = 8
max_parallel_workers_per_gather = 4
max_parallel_maintenance_workers = 4

max_parallel_workers_per_gather = 4 is the one that matters for OLAP. On a 16-core box, raise it to 8.

Connection pooling: pgBouncer, not max_connections

The common reflex when "out of connections" errors appear is to raise max_connections. This is almost always wrong. Each Postgres connection has nontrivial memory and scheduling overhead. Past 100-200 active connections, performance degrades.

The right answer is a connection pooler. pgBouncer is the standard. It accepts thousands of client connections and multiplexes them onto a small pool of real backend connections.

Two pooling modes:

  • Transaction pooling: a backend connection is held for the duration of a transaction. Compatible with most apps. The default for most setups.
  • Session pooling: a backend connection is held for the duration of the client session. Use only when you need session-scoped features (prepared statements that aren't recreated, LISTEN/NOTIFY, advisory locks, SET outside a transaction).

Transaction pooling breaks a few features. Anything that depends on state outside a transaction stops working. Prepared statements work in transaction mode in pgBouncer 1.21+; before that they didn't. Check your client library.

For PgCat (a Rust rewrite of pgBouncer with built-in load balancing across replicas), the same concepts apply.

transparent_huge_pages: madvise or never

Linux's transparent huge pages can cause latency spikes during compaction. The standard advice on Postgres servers:

BASH
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
echo madvise > /sys/kernel/mm/transparent_hugepage/defrag

Or disable entirely. madvise lets Postgres opt out per allocation; never opts out unconditionally. Both prevent the latency spikes you'd see with the default always.

Step 6: The specific slow-query patterns to recognize

The same shapes show up over and over.

Autovacuum starvation

Postgres marks deleted and updated tuples as dead. VACUUM reclaims their space. Autovacuum runs in the background to do this for you, but it has limits. On a table with frequent updates, autovacuum can fall behind and the table accumulates millions of dead tuples.

Check:

SQL
SELECT
  schemaname || '.' || relname AS table,
  n_live_tup,
  n_dead_tup,
  round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 1) AS dead_pct,
  last_autovacuum,
  last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

If dead_pct is above 20% on a frequently-queried table, tune autovacuum. Common moves:

SQL
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.02,    -- vacuum at 2% dead instead of 20%
  autovacuum_vacuum_cost_limit = 2000,      -- let autovacuum work faster
  autovacuum_analyze_scale_factor = 0.01
);

For a deep dive on the mechanics, see our PostgreSQL VACUUM guide.

Stale statistics

The planner uses pg_statistic to estimate selectivities. After a bulk load, large migration, or sudden change in data distribution, stats can be wildly off. Run ANALYZE manually:

SQL
ANALYZE orders;

For tables where statistics matter and they change fast, bump the autoanalyze frequency (see above) or run a nightly explicit ANALYZE.

For correlated columns where the planner makes bad estimates, use extended statistics:

SQL
CREATE STATISTICS orders_corr (ndistinct, dependencies)
  ON country, state, city FROM orders;
ANALYZE orders;

Idle in transaction

A backend that's idle in transaction is holding row locks and preventing VACUUM from cleaning rows that fall after its snapshot. One forgotten BEGIN; from a debug session can wreck a database.

SQL
SELECT pid, usename, application_name,
       now() - xact_start AS xact_age,
       state, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;

The defense:

INI
idle_in_transaction_session_timeout = 60000   # 60 seconds

Postgres terminates idle-in-transaction backends after the timeout. This is rude to misbehaving apps but kind to your database.

Lock contention

Find what's blocked and what's blocking:

SQL
SELECT
  blocked.pid AS blocked_pid,
  blocked.usename AS blocked_user,
  blocked.query AS blocked_query,
  blocked.wait_event,
  blocking.pid AS blocking_pid,
  blocking.usename AS blocking_user,
  blocking.query AS blocking_query,
  blocking.state AS blocking_state
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';

If the blocker is itself stuck behind another backend, the chain can be long. pg_blocking_pids and the recursive query patterns in the Postgres wiki cover the full version.

For job-queue style workloads where you want to grab work without blocking, FOR UPDATE SKIP LOCKED is the answer. See PostgreSQL FOR UPDATE SKIP LOCKED for the pattern.

The TOAST tax

Wide rows with one big column (a jsonb blob, a long text) get split into a separate TOAST table when the row exceeds about 2KB. The main row stays small (which is good), but SELECT * triggers a TOAST read on every row.

If you're doing SELECT * on a table with TOASTed columns and you don't need the big column, project it out:

SQL
SELECT id, customer_id, status, total FROM orders WHERE customer_id = 12345;

You can see TOAST hits in EXPLAIN ANALYZE BUFFERS output: extra Buffers accounting on a tiny-looking query usually means TOAST reads.

JSONB without an index

A query against jsonb without a GIN index does a sequential scan and deserializes the JSON for every row. On a million-row table, that's seconds:

SQL
-- Sequential scan, JSON parse per row
SELECT * FROM events WHERE data @> '{"type": "signup"}';

-- GIN index makes it instant
CREATE INDEX idx_events_data ON events USING GIN (data jsonb_path_ops);

For specific paths (like data->>'user_id'), an expression B-tree is usually better than a full GIN:

SQL
CREATE INDEX idx_events_user_id ON events ((data->>'user_id'));

N+1 queries

The classic ORM problem. The fix isn't database-side; it's in the app:

SQL
-- 1 query for orders, then N queries for items
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM order_items WHERE order_id = 1;  -- repeated N times

Use eager loading (Order.includes(:items) in Rails, prefetch_related in Django, Include in EF Core), or write the JOIN by hand:

SQL
SELECT o.*, i.*
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.customer_id = 123;

Look for "N+1 detector" middleware in your framework. Bullet (Rails), Django Debug Toolbar's SQL panel, Hibernate's hibernate.generate_statistics. Catch this in development.

Sort spilling

Sort Method: external merge Disk: 80224kB

The sort wanted more memory than work_mem. Either bump work_mem for this query, or add an index that returns rows already sorted:

SQL
CREATE INDEX idx_orders_customer_created
  ON orders(customer_id, created_at DESC);

The index removes the Sort node entirely.

Wrong join order from bad estimates

If the planner thinks the left side of a nested loop is small (say, 100 rows) but the true count is huge (1M rows), the inner loop runs a million times. Symptom: a nested loop with high loops=N and high actual time per loop.

Fixes, in order of preference:

  1. ANALYZE the table.
  2. Add extended statistics if columns are correlated.
  3. Rewrite the query to make the smaller side obvious.
  4. As a last resort, SET enable_nestloop = off for the session to force a different plan. Use this only for diagnostics; don't ship it.

Sequence churn

SERIAL and IDENTITY columns use sequences. By default, each session caches a few sequence values to reduce contention. On a write-heavy workload with many short connections, the cache thrash burns CPU. Bump the cache size:

SQL
ALTER SEQUENCE orders_id_seq CACHE 50;

You'll see gaps in IDs (each session reserves a chunk and may not use it all), which is fine for surrogate keys.

Hot row contention

If hundreds of clients are updating the same row (a counter, a global config, a queue head), they serialize. The fix is to remove the contention: use a shard pattern, batch updates, or use atomic counters in Redis instead.

A common version is incrementing a view_count column. Don't. Either roll up periodic aggregates from a log, or use a Redis counter that you flush periodically.

Step 7: When query tuning isn't enough

Sometimes the query is fine. The problem is the architecture. Recognize the inflection points.

Read replicas

When your read load exceeds what one Postgres can handle, add a streaming replica and direct read queries to it. Logical replication (Postgres 10+) gives you more flexibility (different schemas, partial replication) at the cost of more complexity.

Watch out for replication lag. A query that reads from a replica might see slightly stale data. For "did my just-written value land?" reads, route to the primary.

Partitioning

When a single table gets past a few hundred million rows, partition it. Declarative partitioning has been usable since PG 10 and is solid as of 12+. Range partitioning by month is the standard pattern for time-series:

SQL
CREATE TABLE events (
  id bigserial,
  created_at timestamptz NOT NULL,
  ...
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_05 PARTITION OF events
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

pg_partman automates partition creation and retention. Drop old partitions to delete old data in one operation instead of a slow DELETE.

Citus for horizontal scale

When one machine isn't enough, Citus shards Postgres across multiple nodes while preserving SQL semantics. It's the right answer when you have a clear shard key (typically tenant_id or customer_id) and read/write traffic that doesn't fit on a single primary.

TimescaleDB for time-series

If your workload is "lots of timestamps, time-bucket aggregations, retention policies," TimescaleDB extends Postgres with hypertables that automate partitioning, compression, and downsampling. Worth evaluating before you write your own time-series layer.

Materialized views

If a dashboard query takes 30 seconds and runs every minute, materialize it:

SQL
CREATE MATERIALIZED VIEW orders_daily AS
SELECT date_trunc('day', created_at) AS day,
       customer_id,
       count(*) AS orders,
       sum(total) AS revenue
FROM orders GROUP BY 1, 2;

CREATE UNIQUE INDEX ON orders_daily (day, customer_id);

-- Refresh without blocking reads
REFRESH MATERIALIZED VIEW CONCURRENTLY orders_daily;

Concurrent refresh requires a unique index on the view. Schedule the refresh from cron or pg_cron.

Application-side caching

For genuinely hot reads of slowly-changing data, put Redis or memcached in front. The right answer when you've already optimized the query and the bottleneck is "this is a lot of requests."

The tools worth installing

A curated list, in rough order of value:

ToolWhat it doesCost
pg_stat_statementsQuery stats aggregationFree, built in
auto_explainLogs slow plans automaticallyFree, built in
pgbadgerLog report generationFree
pgHeroLive dashboard, index suggestionsFree
pgBouncerConnection poolerFree
PGTuneConfig calculator (web)Free
explain.dalibo.comPlan visualizerFree
pg_repackOnline table/index rebuildFree
pg_partmanPartition managementFree
hypopgHypothetical indexes ("what if I add this?")Free
pg_qualstatsCaptures WHERE-clause stats for index recommendationsFree
pg_stat_kcacheKernel-level I/O attributionFree
pgcliBetter psql with autocompletionFree
pgMustardPlan analysis with English explanationsPaid

For most teams, the right starting set is: pg_stat_statements, auto_explain, pgBouncer, pgbadger on a cron, and pgHero for the live dashboard. That gets you 80% of the benefit at zero cost.

hypopg deserves a special mention

SQL
CREATE EXTENSION hypopg;

-- Pretend the index exists
SELECT hypopg_create_index('CREATE INDEX ON orders(customer_id, created_at)');

-- See if the planner would use it
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345 ORDER BY created_at DESC LIMIT 20;

You can test an index without paying the cost of building it. Then drop it cheaply if it doesn't help:

SQL
SELECT hypopg_reset();

For a candidate index on a large table, this saves hours of build time per experiment.

Quick checklist

When you encounter a slow query:

  1. Find it in pg_stat_statements. Check whether it's a top consumer or a one-off. Fix the top consumers first.
  2. Run EXPLAIN (ANALYZE, BUFFERS) on production-shaped data. Visualize with explain.dalibo.com if it's a multi-node plan.
  3. Look for Seq Scan on a large table where you only return a fraction of rows. That's a missing index.
  4. Compare estimates to actuals. Off by 10x or more means stale statistics. ANALYZE. If the columns are correlated, add extended statistics.
  5. Check Buffers. Heavy shared read means the working set doesn't fit in cache, or there's a missing index forcing repeated reads.
  6. Audit the query shape. SELECT *? Functions on indexed columns? Deep OFFSET? Each has a mechanical fix.
  7. Verify the index gets used. If pgHero suggested an index and the planner ignores it, check random_page_cost and run ANALYZE.
  8. For sustained issues, look beyond the query. Dead tuples? Lock contention? Idle-in-transaction backends? pg_stat_activity and pg_stat_user_tables tell you.

Most slow Postgres queries are one of three things: a missing or wrong index, stale statistics misleading the planner, or storage bloat from autovacuum falling behind. The tooling above turns each of those from a research project into a few minutes of work. The expensive failure mode is not knowing which one you're in.

Keep Reading