PostgreSQL CREATE INDEX

Speed up your queries by teaching PostgreSQL where to look.

What is an Index?

Imagine you're in a library with millions of books, and you need to find every book written by Stephen King. Without any system, you'd have to walk through every single aisle, checking every single book. That could take days.

But libraries have a card catalog (or these days, a computer system) that lets you look up "Stephen King" and instantly see exactly which shelves contain his books. You go straight there. Done in seconds.

A database index works the same way. Without an index, PostgreSQL performs a "sequential scan"—it reads every single row in the table to find matches. With an index, it jumps directly to the matching rows.

Basic Syntax

Creating an index in PostgreSQL is straightforward:

SQL
CREATE INDEX index_name ON table_name (column_name);

For example, if you have a users table and frequently search by email:

SQL
CREATE INDEX idx_users_email ON users (email);

Now queries like this become much faster:

SQL
SELECT * FROM users WHERE email = 'alice@example.com';

Index Types

PostgreSQL offers several index types, each optimized for different use cases:

B-tree (Default)

The workhorse of indexes. Great for equality and range queries. This is what you get if you don't specify a type.

SQL
-- These are equivalent
CREATE INDEX idx_users_created ON users (created_at);
CREATE INDEX idx_users_created ON users USING btree (created_at);

-- B-tree handles all these queries well:
SELECT * FROM users WHERE created_at = '2024-01-15';
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;

Hash

Optimized for simple equality comparisons. Slightly faster than B-tree for = operations, but can't handle ranges.

SQL
CREATE INDEX idx_sessions_token ON sessions USING hash (token);

-- Good for:
SELECT * FROM sessions WHERE token = 'abc123xyz';

-- Won't help with:
SELECT * FROM sessions WHERE token > 'abc';  -- Range query

GIN (Generalized Inverted Index)

Perfect for columns containing multiple values—arrays, JSONB, and full-text search.

SQL
-- For JSONB columns
CREATE INDEX idx_products_metadata ON products USING gin (metadata);

-- Now you can efficiently query:
SELECT * FROM products WHERE metadata @> '{"color": "red"}';

-- For array columns
CREATE INDEX idx_posts_tags ON posts USING gin (tags);

-- Query posts with specific tags:
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'tutorial'];

GiST (Generalized Search Tree)

Used for geometric data, ranges, and full-text search. Essential for PostGIS spatial queries.

SQL
-- For geometric/spatial data
CREATE INDEX idx_locations_coords ON locations USING gist (coordinates);

-- For range types
CREATE INDEX idx_events_duration ON events USING gist (time_range);

-- Query overlapping time ranges:
SELECT * FROM events
WHERE time_range && '[2024-01-01, 2024-01-31]'::daterange;

Common Patterns

Unique Index

Enforces uniqueness while also providing fast lookups:

SQL
CREATE UNIQUE INDEX idx_users_email_unique ON users (email);

-- This will fail if duplicate emails exist:
-- ERROR: could not create unique index "idx_users_email_unique"
-- DETAIL: Key (email)=(duplicate@example.com) is duplicated.

Composite Index (Multiple Columns)

When you frequently filter or sort by multiple columns together:

SQL
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC);

-- Efficiently handles:
SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at DESC;
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2024-01-01';

Column order matters! The index above helps queries filtering by user_id, but won't help queries that only filter by created_at.

Partial Index

Index only a subset of rows. Smaller, faster, and more efficient:

SQL
-- Only index active users
CREATE INDEX idx_users_active_email ON users (email)
WHERE status = 'active';

-- Only index unprocessed orders
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE processed_at IS NULL;

Expression Index

Index the result of an expression:

SQL
-- Index lowercase emails for case-insensitive search
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- Now this query uses the index:
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

Concurrent Index Creation

By default, CREATE INDEX locks the table for writes. On a production database with millions of rows, this could mean minutes of downtime.

Use CONCURRENTLY to build the index without blocking:

SQL
CREATE INDEX CONCURRENTLY idx_orders_user ON orders (user_id);

Caveat: Concurrent index creation takes longer and cannot run inside a transaction. If it fails partway through, you'll have an invalid index that needs to be dropped.

When NOT to Use Indexes

Indexes aren't free. They come with tradeoffs:

  • Write overhead: Every INSERT, UPDATE, and DELETE must also update the index
  • Storage space: Indexes consume disk space (sometimes significant)
  • Maintenance: Indexes can become bloated and need periodic maintenance

Skip indexes when:

  • The table is small (under ~10,000 rows)—sequential scan is often faster
  • The column has very low cardinality (e.g., a boolean or status with 3 values)
  • The table is write-heavy and rarely queried
  • You're selecting most of the table anyway

Checking Index Usage

Use EXPLAIN to see if your query uses an index:

SQL
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

-- With index (good):
-- Index Scan using idx_users_email on users
--   Index Cond: (email = 'alice@example.com')

-- Without index (slow):
-- Seq Scan on users
--   Filter: (email = 'alice@example.com')

Check which indexes exist and their sizes:

SQL
SELECT
    indexname,
    indexdef,
    pg_size_pretty(pg_relation_size(indexname::regclass)) as size
FROM pg_indexes
WHERE tablename = 'users';

Quick Reference

| Command | Description | |---------|-------------| | CREATE INDEX name ON table (col) | Create a B-tree index | | CREATE UNIQUE INDEX ... | Create index that enforces uniqueness | | CREATE INDEX ... USING gin | Create GIN index for arrays/JSONB | | CREATE INDEX ... WHERE condition | Create partial index | | CREATE INDEX CONCURRENTLY ... | Create without locking table | | DROP INDEX name | Remove an index | | REINDEX INDEX name | Rebuild a corrupted/bloated index |

Summary

Indexes are one of the most powerful tools for query optimization. Start with B-tree indexes on columns you frequently filter or sort by, use EXPLAIN to verify they're being used, and remember that the best index is one that matches your actual query patterns.

When in doubt, measure. A query that takes 500ms without an index might take 2ms with one—but an unnecessary index just wastes space and slows down writes.