How to Fix Slow Postgres Queries
Postgres rarely gets slow in a single dramatic moment. It degrades. A query that ran in 4ms last quarter starts taking 40ms, then 400ms, then it shows up at the top of an oncall page when a marketing campaign triples traffic. Nothing changed in the code. The query plan flipped because the row count crossed a threshold, or autovacuum fell behind, or someone added a BEFORE UPDATE trigger that turned a HOT update into a heap rewrite.
That slow-creep failure mode is what makes Postgres performance work different from chasing a 500ms log spike in MySQL. The good news: the diagnostic tooling is excellent, almost all of it ships with the database, and most slow queries fall into a handful of recognizable shapes. This guide is the long version of the playbook: every tool worth installing, the EXPLAIN output worth memorizing, the index types worth knowing, and the configuration knobs that change query plans.
Why Postgres queries get slow
Three things drive almost every slow-query report:
-
The planner picked the wrong plan. This is usually because statistics are stale, or because the planner's cost model doesn't reflect your hardware (it assumes spinning disks unless you tell it otherwise).
-
The plan is fine, but the table is full of garbage. Postgres uses MVCC, which means every
UPDATEandDELETEleaves a dead tuple. If autovacuum can't keep up, queries spend time skipping over dead rows and indexes balloon. -
The query is doing something the planner can't optimize. A function on an indexed column, an
ORacross two columns, a deepOFFSET, a regex without a leading anchor. These read fine but force sequential scans.
The diagnostic process is figuring out which of those three you're hitting, then either changing the plan, fixing the storage, or rewriting the query. The rest of this post walks through the tools and techniques for each.
Step 1: Build a real diagnostic stack
You can't fix what you can't see. Postgres ships with deep instrumentation, but most of it is off by default. A production Postgres installation worth keeping should have at least these turned on.
pg_stat_statements
This is non-negotiable. pg_stat_statements records aggregated execution stats for every query the server has run since startup or stats reset. It is the single best tool for finding which queries are worth your attention.
Enable it in postgresql.conf:
Restart, then in each database you want to track:
The view itself has a lot of columns. The ones worth keeping in muscle memory:
Sort by total_exec_time, not mean_exec_time. A query that runs 100,000 times at 5ms is a much bigger problem than one that runs once at 5 seconds. The 5-second query is annoying. The 100,000x5ms query is what's eating your CPU.
A few less-obvious tricks:
- The
track_planningflag (Postgres 13 and later) records planning time as well as execution time. If you see a query wheretotal_plan_timerivalstotal_exec_time, you have a prepared-statement plan-cache problem or a complex view that's expensive to plan. JIT can also blow up planning time on simple queries; checkmean_plan_time. - Reset stats deliberately.
SELECT pg_stat_statements_reset();clears the slate. Run it before a deployment, then look 24 hours later to see what the new workload looks like. - The
queryidcolumn is stable across resets (since PG 14). Use it to track a specific query's performance across deploys without depending on text matching.
pg_stat_statements has overhead, but it's small. The cost of not running it in production is much higher.
log_min_duration_statement
pg_stat_statements aggregates. The log captures individual slow queries with their parameter values, which is what you need to reproduce a problem.
The Postgres 13 additions matter. log_min_duration_sample plus log_statement_sample_rate let you sample queries in a window faster than log_min_duration_statement. So you can log every query above 1 second, plus 10% of queries above 100ms. Without sampling, lowering log_min_duration_statement on a busy server fills the disk in hours.
auto_explain
auto_explain logs the actual execution plan for every slow query, so you don't have to reproduce the problem on your laptop to see what the planner did.
A warning that the manual buries: auto_explain.log_analyze = on adds measurable overhead because it runs the query through the instrumentation path. Use a high log_min_duration (a few seconds), or run with log_analyze = off and only flip it on temporarily when you're hunting something specific.
JSON format is the right default. It's painful to read in the log but trivial to parse with pgbadger or pgMustard.
pgbadger
Built-in tools give you the data. pgbadger turns it into a readable report. It's a Perl script that parses the Postgres log and produces an HTML report with the top queries, the time distribution, the most frequent errors, the connection patterns, and a dozen other things. Install it once and run it nightly:
There's no good reason not to have this on a cron. It costs nothing to run on the log files you're already producing.
pgHero
Where pgbadger is a log analyzer, pgHero is a live dashboard with opinionated recommendations. It's a Rails app you point at your Postgres instance. The two features worth it on their own:
- Unused indexes. It queries
pg_stat_user_indexesand shows you indexes with zero scans. Caveat: on a replica, this is unreliable. Drop based on stats from the primary. - Slow query suggestions. It reads
pg_stat_statementsand proposes indexes.
The suggestions aren't always right (they don't know your write patterns or partial-index opportunities), but they're a useful prompt.
auto_explain plus pgMustard or explain.dalibo.com
pgMustard is a paid SaaS that takes an EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) plan and tells you what's wrong with it in plain English. It's $50 a month per seat. Worth it for teams who don't have a Postgres expert. For the same job free, paste the plan into explain.dalibo.com or explain.depesz.com. Dalibo's renders the plan as a flame graph, which makes the hot spots obvious. Depesz's is more compact and easier to read on small plans.
If you're going to pick one to learn first, learn explain.dalibo.com. The flame graph format clicks fast.
pg_stat_kcache
For attribution beyond what Postgres sees: which queries are causing actual disk I/O versus reading from the OS page cache, and which queries are CPU-bound. It hooks into Linux getrusage and exposes the per-query stats. Less common but invaluable when you're trying to figure out whether to add RAM or add indexes.
pg_stat_activity for real-time
For the "something is wrong right now" view, nothing beats pg_stat_activity. The view shows every backend, what query it's running, how long it's been there, what it's waiting on, and what's blocking it.
A query I keep in my back pocket:
That gives you every active query running longer than 5 seconds, sorted by duration. The wait_event column is gold: Lock means you have a blocker, IO:DataFileRead means you're hitting disk, IPC: events usually mean parallel query coordination.
Step 2: Read EXPLAIN like a power user
The base form of EXPLAIN is almost useless for performance work. It only shows estimates.
Seq Scan on orders (cost=0.00..12834.00 rows=23 width=120)
Filter: ((customer_id = 12345) AND (status = 'pending'::text))
That cost number is in arbitrary units. The first number is the cost to return the first row, the second is the cost to return the last. The planner picks the plan with the lowest total cost. Knowing the absolute value isn't useful; knowing whether two plans have similar costs is.
What you want is EXPLAIN (ANALYZE, BUFFERS). ANALYZE executes the query and reports real timings. BUFFERS reports how many pages were read from the cache and from disk:
Seq Scan on orders (cost=0.00..12834.00 rows=23 width=120)
(actual time=0.052..2834.123 rows=23 loops=1)
Filter: ((customer_id = 12345) AND (status = 'pending'::text))
Rows Removed by Filter: 499977
Buffers: shared hit=8200 read=4600
Planning Time: 0.234 ms
Execution Time: 2834.456 ms
This is a query you can fix in 30 seconds, and you can see exactly why it's slow. It scanned 500,000 rows to return 23. Buffers: shared read=4600 means 4,600 pages came from disk. At a typical 8KB page, that's 36MB of I/O for 23 rows.
The fields worth memorizing
| What to look at | What it tells you |
|---|---|
actual time=A..B | A is the time to first row, B is the time to last. A high A on the outer node means the inner work is expensive. |
loops=N | The node ran N times. Common in nested loop joins. Multiply actual time by loops for total. |
rows estimated vs actual | If they're off by 10x or more, statistics are wrong. Run ANALYZE. |
Buffers: shared hit/read | hit is buffer cache, read is disk. Lots of read on a hot query means it's not staying in cache. |
Buffers: shared dirtied/written | The query is modifying pages. A read query with dirtied > 0 is doing setting hint bits, which is normal but worth knowing. |
Heap Fetches | On Index Only Scan, this is how many times the visibility map missed and the query had to hit the heap anyway. High Heap Fetches means VACUUM hasn't been keeping up. |
Workers Planned vs Launched | If they differ, you hit max_parallel_workers_per_gather or max_parallel_workers. |
Sort Method: quicksort Memory: N | Sort fit in work_mem. |
Sort Method: external merge Disk: N | Sort spilled to disk. Bad. Bump work_mem for this query. |
Estimation errors are the most common bug
If the planner thinks a WHERE clause will return 100 rows and the real answer is 100,000, it will pick a nested loop join. That nested loop is fine for 100 rows and disastrous for 100,000. Nine times out of ten, this is what "the planner picked a bad plan" comes down to.
To check, look for nodes where rows=N (estimate) is far off from actual rows=M. Postgres can be wrong by orders of magnitude on:
- Correlated columns. If
country = 'US' AND state = 'CA'is hugely more selective than the product of their independent selectivities, the planner doesn't know unless you tell it. Fix: extended statistics.SQL LIKEwith patterns. The planner uses a fixed selectivity estimate for non-prefix patterns. Often badly wrong.- JSONB containment.
data @> '{...}'selectivity is hard. Add extended statistics or rely onpg_stat_statementsto surface the actual hit rate.
Use ANALYZE BUFFERS in production audits, ANALYZE alone in dev
BUFFERS matters more on production-like data sizes. On a 100-row dev database every query is in cache and Buffers: shared hit=2 is the answer for everything. Run your analysis on production-shaped data, or against a real production read replica.
JIT can lie to you
In Postgres 11 and later with JIT enabled (the default on most distributions in Postgres 12+), EXPLAIN ANALYZE for short queries includes JIT compilation time in the total. For OLTP workloads, JIT is often a net loss because compilation time outweighs the runtime savings. Look for:
JIT:
Functions: 12
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 2.345 ms, Inlining 0.000 ms, Optimization 1.234 ms, Emission 3.456 ms, Total 7.035 ms
7ms of JIT overhead on a 5ms query is a problem. The fix is to raise jit_above_cost so JIT only kicks in for plans where it might pay off, or disable JIT entirely for OLTP workloads:
When the visualizer earns its keep
For plans with more than ten nodes, the text format is a mess. Run the same query with FORMAT JSON, paste it into explain.dalibo.com, and the slow node lights up.
The "Children Time" column on Dalibo is the killer feature. It tells you which node owns the time, accounting for the work done by its children. That's the thing you want to optimize.
Step 3: Index design that pays for itself
Most slow Postgres queries are missing an index. The hard part isn't adding one; it's adding the right one, and not adding the wrong one.
Postgres supports six index access methods. You'll use B-tree for 90% of cases, GIN for 8%, and the rest essentially never. Understanding which one to reach for is the difference between a 1000x speedup and an index that wastes disk.
B-tree: the default for a reason
B-tree handles =, <, >, BETWEEN, IN, prefix LIKE, ordering, and IS NULL. It's the default and almost always the right answer.
A few B-tree details that matter:
Sort order is part of the index. An index on (created_at DESC) is different from (created_at). They both answer equality queries, but the descending version is what you want when paginating recent records:
Postgres can scan a B-tree backward at almost the same speed, so a single ascending index handles both directions for most workloads. The descending version matters when you're combining sort with another column:
INCLUDE is for non-key columns. Postgres 11 added INCLUDE to make covering indexes explicit:
The included columns aren't part of the index key. They live in the leaf pages so the query can satisfy the projection without hitting the heap. The plan shows Index Only Scan.
fillfactor for high-update tables. B-tree leaf pages are filled to 90% by default. On a table with frequent in-place updates, dropping the fillfactor leaves room for HOT (Heap-Only Tuple) updates, which avoid index churn:
This is a niche optimization. Don't bother unless you've measured high index bloat on a write-heavy table.
GIN: for JSONB, arrays, and full-text
When you index a jsonb column or an array, you almost certainly want a GIN index. The default operator class works for general containment:
There's also jsonb_path_ops, which indexes only the paths and is smaller and faster for the @> operator specifically, at the cost of supporting fewer operators:
If your queries are only @> containment (the common case), jsonb_path_ops is the better default.
For full-text search, the standard recipe:
Or, if you want to index a generated column to avoid recomputing the tsvector:
For trigram-based fuzzy and substring search, pg_trgm:
This is one of the few ways to get a non-prefix LIKE to use an index. It costs disk and CPU on writes, but for search-style queries it's transformative.
GiST: range types and geo
For range overlap queries (range_a && range_b), geometric types, and exclusion constraints, GiST is the right choice:
The classic use case is double-booking prevention via exclusion constraints, which need GiST under the hood.
BRIN: huge append-mostly tables
BRIN (Block Range INdex) stores a min/max for ranges of physical pages instead of one entry per row. On a billion-row time-series table where the data is roughly ordered by created_at, a BRIN index can be a few hundred kilobytes when the equivalent B-tree would be tens of gigabytes:
BRIN only works if your data is physically ordered by the indexed column. If rows are inserted in random order, BRIN is useless. For time-series data with INSERTs that mostly arrive in chronological order, it's hard to beat.
A common BRIN mistake: pairing BRIN with high-selectivity equality queries. BRIN is bad at equality. It's great at "give me the rows in this hour-long window."
Hash indexes: rehabilitated but rarely worth it
Before Postgres 10, hash indexes weren't WAL-logged and broke after a crash with no warning. Now they work. They're slightly smaller and faster than B-tree for pure equality lookups on large keys. In practice, B-tree's penalty for "supports range queries too" is so small that the hash index rarely earns its place. Skip them unless you've measured.
SP-GiST: skip it
Specialized data structures for non-balanced trees (phone networks, IP ranges). You'll know if you need it.
Partial indexes: the underused win
If you only query a fraction of the rows, index only that fraction:
The index is smaller (5% of the size), cheaper to update, and faster to scan. The query has to include the partial predicate for the planner to use it:
Partial indexes are one of the best tools Postgres gives you and most teams ignore them. Look for queries that always include a constant WHERE clause and replace the full-table index with a partial.
Expression indexes: the function-on-column escape hatch
A query that wraps an indexed column in a function can't use a normal index. The expression index fixes it:
The expression has to match exactly. LOWER(email) = 'foo' uses it. email = LOWER('Foo') does not (and doesn't need to, because the comparison is on the raw column).
For jsonb field lookups, this is the way to index a specific path:
Concurrent index builds
Building an index locks writes by default. On a production table, that's not acceptable. Use CONCURRENTLY:
It takes longer (Postgres makes two passes), uses more I/O, and can leave behind an INVALID index if it fails. Always check after the build:
If indisvalid = false, drop it and try again. The CREATE INDEX documentation has a multi-paragraph warning about this. Read it before doing it in production.
Find and drop dead indexes
The conindid filter excludes indexes backing unique and primary key constraints, which you can't drop without breaking the constraint.
Two caveats before deleting:
- Read from the primary. Index stats on a replica don't include replicated queries.
- The window matters. A nightly batch report might use an index zero times for 23 hours. Reset stats and wait at least 24 hours, ideally a week.
pgHero automates this check.
Bloated indexes and pg_repack
After lots of updates, B-tree indexes accumulate dead entries that VACUUM can clean but not always reclaim space from. Symptom: the index is much bigger than its B-tree theoretical size, queries are slow, and pgstattuple reports a high dead-tuple percentage:
The cheap fix is REINDEX CONCURRENTLY (PG 12+):
For table-level bloat with downtime constraints, pg_repack is the standard tool. It rebuilds tables online by swapping pages behind a trigger.
Step 4: Rewrite queries the planner can optimize
A surprising fraction of slow queries are slow because they're written in a way that prevents Postgres from using indexes it has. These rewrites are mechanical.
SELECT * disables index-only scans
If you only need three columns, ask for three columns:
This is doubly important when the table has a TOASTed column (any column over ~2KB stored out-of-line). SELECT * reads the TOAST chain for every row even if you didn't want the field.
OFFSET pagination is broken past page 5
It gets quadratically worse with depth. Use keyset pagination:
If you need page numbers (a UX requirement, usually), the fix is hybrid: use keyset for "next page" and "previous page" navigation, accept that "jump to page 47" is slow.
Markus Winand's site use-the-index-luke.com has the definitive treatment of this. Read it once and stop writing OFFSET queries forever.
Functions on indexed columns
The most common form is date truncation:
If you genuinely need the function form (because the query is generated and you can't change it), add an expression index.
The same applies to type casts:
ORMs are a frequent offender here. Watch for ones that wrap parameters in unnecessary casts.
OR across columns and BitmapOr
Postgres can sometimes combine multiple indexes with BitmapOr, but the planner often picks a sequential scan instead when the selectivities are uncertain:
The deterministic rewrite is UNION ALL:
If the planner is picking the right plan with OR, leave it. If EXPLAIN ANALYZE shows it falling back to Seq Scan, the UNION ALL form forces the issue.
Lateral joins for "top N per group"
A classic query: "give me the 5 most recent orders for each customer." A bare GROUP BY can't do this. The naive self-join is expensive. The right tool is LATERAL:
Combined with a (customer_id, created_at DESC) index, this is the fastest way to do top-N-per-group in Postgres.
CTEs are no longer an optimization fence (in PG 12+)
Before Postgres 12, a WITH clause acted as an optimization barrier: the CTE was always materialized as a temporary result. Some teams relied on this to force a specific plan shape. As of PG 12, the planner can inline simple CTEs, which is usually faster.
If you want to keep the old behavior:
Conversely, NOT MATERIALIZED forces inlining. If you're upgrading from PG 11 or earlier and queries got slower after the upgrade, check whether you relied on CTE materialization.
EXISTS vs IN
In modern Postgres (10+), the planner treats IN (subquery) and EXISTS identically in almost all cases. Pick whichever reads more clearly. The old "always use EXISTS" advice is outdated.
The exception: NOT IN with a subquery that can return NULL is a footgun. It evaluates to UNKNOWN if any subquery row is NULL, which returns zero results with no error. Always use NOT EXISTS for the negation:
Step 5: Tune the configuration knobs that change query plans
Most of Postgres's defaults are conservative and tuned for a small machine. A real production box needs at least these set.
The single best starting point is pgtune.leopard.in.ua. It takes your hardware specs and workload type and outputs a config. Use it as a baseline, then tune based on what your workload does in practice.
shared_buffers: 25% of RAM
This is Postgres's in-process page cache. Above ~40% of RAM, you start fighting with the OS page cache for the same pages, and both lose. 25% is the standard recommendation and it holds up.
On a 32GB instance, that's 8GB to Postgres, 24GB to the OS cache. Postgres reads pages through shared_buffers but the OS caches the underlying files, so the practical working set is the sum.
effective_cache_size: not memory, a hint
This setting tells the planner roughly how much memory is available across shared_buffers and the OS page cache. The planner uses it to decide between index and sequential scans. Set it to about half to three-quarters of total RAM:
This is the most misunderstood Postgres setting. It does not allocate memory. It's a number the planner uses to compute cost estimates. Setting it too low makes the planner avoid index scans; too high makes it overestimate index scan benefit.
work_mem: per-operation, not per-connection
Each Sort, Hash, or Materialize node gets up to work_mem of memory. Crucially, a single query with three sorts can use three times work_mem. A hundred concurrent queries with three sorts each can use 300x.
Set it conservatively as a default and bump it per-query when needed:
Symptom of low work_mem: EXPLAIN ANALYZE shows Sort Method: external merge Disk: 50000kB. That sort spilled. Bump work_mem for the query.
maintenance_work_mem: for VACUUM, CREATE INDEX, and ALTER TABLE
Set this generously. It's only used briefly and dramatically speeds up maintenance:
random_page_cost: drop to 1.1 on SSDs
The default random_page_cost = 4.0 assumes spinning disks where random I/O is much slower than sequential. On SSDs they're nearly equivalent. The default makes the planner avoid index scans:
If your database lives entirely in memory, you could go lower (random_page_cost = 1.0), but 1.1 is the safe default for SSD-backed cloud Postgres.
effective_io_concurrency: tell Postgres about your SSD
For Bitmap Heap Scans, Postgres can prefetch pages. The setting tells it how many concurrent prefetches your storage can handle:
200 is the right ballpark for a single SSD. NVMe can go higher. Cloud disks vary; 200 is a safe default.
Parallelism settings
Modern Postgres can parallelize sequential scans, hash joins, and aggregates. The defaults are low:
max_parallel_workers_per_gather = 4 is the one that matters for OLAP. On a 16-core box, raise it to 8.
Connection pooling: pgBouncer, not max_connections
The common reflex when "out of connections" errors appear is to raise max_connections. This is almost always wrong. Each Postgres connection has nontrivial memory and scheduling overhead. Past 100-200 active connections, performance degrades.
The right answer is a connection pooler. pgBouncer is the standard. It accepts thousands of client connections and multiplexes them onto a small pool of real backend connections.
Two pooling modes:
- Transaction pooling: a backend connection is held for the duration of a transaction. Compatible with most apps. The default for most setups.
- Session pooling: a backend connection is held for the duration of the client session. Use only when you need session-scoped features (prepared statements that aren't recreated,
LISTEN/NOTIFY, advisory locks,SEToutside a transaction).
Transaction pooling breaks a few features. Anything that depends on state outside a transaction stops working. Prepared statements work in transaction mode in pgBouncer 1.21+; before that they didn't. Check your client library.
For PgCat (a Rust rewrite of pgBouncer with built-in load balancing across replicas), the same concepts apply.
transparent_huge_pages: madvise or never
Linux's transparent huge pages can cause latency spikes during compaction. The standard advice on Postgres servers:
Or disable entirely. madvise lets Postgres opt out per allocation; never opts out unconditionally. Both prevent the latency spikes you'd see with the default always.
Step 6: The specific slow-query patterns to recognize
The same shapes show up over and over.
Autovacuum starvation
Postgres marks deleted and updated tuples as dead. VACUUM reclaims their space. Autovacuum runs in the background to do this for you, but it has limits. On a table with frequent updates, autovacuum can fall behind and the table accumulates millions of dead tuples.
Check:
If dead_pct is above 20% on a frequently-queried table, tune autovacuum. Common moves:
For a deep dive on the mechanics, see our PostgreSQL VACUUM guide.
Stale statistics
The planner uses pg_statistic to estimate selectivities. After a bulk load, large migration, or sudden change in data distribution, stats can be wildly off. Run ANALYZE manually:
For tables where statistics matter and they change fast, bump the autoanalyze frequency (see above) or run a nightly explicit ANALYZE.
For correlated columns where the planner makes bad estimates, use extended statistics:
Idle in transaction
A backend that's idle in transaction is holding row locks and preventing VACUUM from cleaning rows that fall after its snapshot. One forgotten BEGIN; from a debug session can wreck a database.
The defense:
Postgres terminates idle-in-transaction backends after the timeout. This is rude to misbehaving apps but kind to your database.
Lock contention
Find what's blocked and what's blocking:
If the blocker is itself stuck behind another backend, the chain can be long. pg_blocking_pids and the recursive query patterns in the Postgres wiki cover the full version.
For job-queue style workloads where you want to grab work without blocking, FOR UPDATE SKIP LOCKED is the answer. See PostgreSQL FOR UPDATE SKIP LOCKED for the pattern.
The TOAST tax
Wide rows with one big column (a jsonb blob, a long text) get split into a separate TOAST table when the row exceeds about 2KB. The main row stays small (which is good), but SELECT * triggers a TOAST read on every row.
If you're doing SELECT * on a table with TOASTed columns and you don't need the big column, project it out:
You can see TOAST hits in EXPLAIN ANALYZE BUFFERS output: extra Buffers accounting on a tiny-looking query usually means TOAST reads.
JSONB without an index
A query against jsonb without a GIN index does a sequential scan and deserializes the JSON for every row. On a million-row table, that's seconds:
For specific paths (like data->>'user_id'), an expression B-tree is usually better than a full GIN:
N+1 queries
The classic ORM problem. The fix isn't database-side; it's in the app:
Use eager loading (Order.includes(:items) in Rails, prefetch_related in Django, Include in EF Core), or write the JOIN by hand:
Look for "N+1 detector" middleware in your framework. Bullet (Rails), Django Debug Toolbar's SQL panel, Hibernate's hibernate.generate_statistics. Catch this in development.
Sort spilling
Sort Method: external merge Disk: 80224kB
The sort wanted more memory than work_mem. Either bump work_mem for this query, or add an index that returns rows already sorted:
The index removes the Sort node entirely.
Wrong join order from bad estimates
If the planner thinks the left side of a nested loop is small (say, 100 rows) but the true count is huge (1M rows), the inner loop runs a million times. Symptom: a nested loop with high loops=N and high actual time per loop.
Fixes, in order of preference:
ANALYZEthe table.- Add extended statistics if columns are correlated.
- Rewrite the query to make the smaller side obvious.
- As a last resort,
SET enable_nestloop = offfor the session to force a different plan. Use this only for diagnostics; don't ship it.
Sequence churn
SERIAL and IDENTITY columns use sequences. By default, each session caches a few sequence values to reduce contention. On a write-heavy workload with many short connections, the cache thrash burns CPU. Bump the cache size:
You'll see gaps in IDs (each session reserves a chunk and may not use it all), which is fine for surrogate keys.
Hot row contention
If hundreds of clients are updating the same row (a counter, a global config, a queue head), they serialize. The fix is to remove the contention: use a shard pattern, batch updates, or use atomic counters in Redis instead.
A common version is incrementing a view_count column. Don't. Either roll up periodic aggregates from a log, or use a Redis counter that you flush periodically.
Step 7: When query tuning isn't enough
Sometimes the query is fine. The problem is the architecture. Recognize the inflection points.
Read replicas
When your read load exceeds what one Postgres can handle, add a streaming replica and direct read queries to it. Logical replication (Postgres 10+) gives you more flexibility (different schemas, partial replication) at the cost of more complexity.
Watch out for replication lag. A query that reads from a replica might see slightly stale data. For "did my just-written value land?" reads, route to the primary.
Partitioning
When a single table gets past a few hundred million rows, partition it. Declarative partitioning has been usable since PG 10 and is solid as of 12+. Range partitioning by month is the standard pattern for time-series:
pg_partman automates partition creation and retention. Drop old partitions to delete old data in one operation instead of a slow DELETE.
Citus for horizontal scale
When one machine isn't enough, Citus shards Postgres across multiple nodes while preserving SQL semantics. It's the right answer when you have a clear shard key (typically tenant_id or customer_id) and read/write traffic that doesn't fit on a single primary.
TimescaleDB for time-series
If your workload is "lots of timestamps, time-bucket aggregations, retention policies," TimescaleDB extends Postgres with hypertables that automate partitioning, compression, and downsampling. Worth evaluating before you write your own time-series layer.
Materialized views
If a dashboard query takes 30 seconds and runs every minute, materialize it:
Concurrent refresh requires a unique index on the view. Schedule the refresh from cron or pg_cron.
Application-side caching
For genuinely hot reads of slowly-changing data, put Redis or memcached in front. The right answer when you've already optimized the query and the bottleneck is "this is a lot of requests."
The tools worth installing
A curated list, in rough order of value:
| Tool | What it does | Cost |
|---|---|---|
pg_stat_statements | Query stats aggregation | Free, built in |
auto_explain | Logs slow plans automatically | Free, built in |
pgbadger | Log report generation | Free |
pgHero | Live dashboard, index suggestions | Free |
pgBouncer | Connection pooler | Free |
PGTune | Config calculator (web) | Free |
explain.dalibo.com | Plan visualizer | Free |
pg_repack | Online table/index rebuild | Free |
pg_partman | Partition management | Free |
hypopg | Hypothetical indexes ("what if I add this?") | Free |
pg_qualstats | Captures WHERE-clause stats for index recommendations | Free |
pg_stat_kcache | Kernel-level I/O attribution | Free |
pgcli | Better psql with autocompletion | Free |
pgMustard | Plan analysis with English explanations | Paid |
For most teams, the right starting set is: pg_stat_statements, auto_explain, pgBouncer, pgbadger on a cron, and pgHero for the live dashboard. That gets you 80% of the benefit at zero cost.
hypopg deserves a special mention
You can test an index without paying the cost of building it. Then drop it cheaply if it doesn't help:
For a candidate index on a large table, this saves hours of build time per experiment.
Quick checklist
When you encounter a slow query:
- Find it in
pg_stat_statements. Check whether it's a top consumer or a one-off. Fix the top consumers first. - Run
EXPLAIN (ANALYZE, BUFFERS)on production-shaped data. Visualize withexplain.dalibo.comif it's a multi-node plan. - Look for
Seq Scanon a large table where you only return a fraction of rows. That's a missing index. - Compare estimates to actuals. Off by 10x or more means stale statistics.
ANALYZE. If the columns are correlated, add extended statistics. - Check
Buffers. Heavyshared readmeans the working set doesn't fit in cache, or there's a missing index forcing repeated reads. - Audit the query shape.
SELECT *? Functions on indexed columns? DeepOFFSET? Each has a mechanical fix. - Verify the index gets used. If
pgHerosuggested an index and the planner ignores it, checkrandom_page_costand runANALYZE. - For sustained issues, look beyond the query. Dead tuples? Lock contention? Idle-in-transaction backends?
pg_stat_activityandpg_stat_user_tablestell you.
Most slow Postgres queries are one of three things: a missing or wrong index, stale statistics misleading the planner, or storage bloat from autovacuum falling behind. The tooling above turns each of those from a research project into a few minutes of work. The expensive failure mode is not knowing which one you're in.
Keep Reading
How to Fix Slow MongoDB Queries
A practitioner's guide to diagnosing and fixing slow MongoDB queries. The diagnostic stack, reading explain output, indexing with the ESR rule, schema design that holds up at scale, aggregation pipeline optimization, and the tools worth knowing.
How to Fix Slow MySQL Queries
Your MySQL query is slow. Here's how to find out why and fix it, from basic EXPLAIN analysis to indexing strategies and query rewrites.
ClickHouse vs PostgreSQL: OLAP vs OLTP
ClickHouse and PostgreSQL serve different purposes. Here's when to use a columnar analytics database vs a general-purpose relational database.